Patrick Scott [GDD] Byzantine Generals 29 November 2018

The general consensus when it comes to scaling is that you can do so in two ways.
 
 
Horizontally, or Vertically.
 
This is what most people will tell you. I actually think there are three ways, since sharding is a scaling strategy as well, but that’s a topic for another time.
 
 
Vertical scaling is probably how you’ve gone through your life with personal computers and/or laptops.
 
 
You bought one and eventually you needed more power.
 
 
Adding more RAM, CPU, or hard drive space gives you that power. You use “Vertical Scaling” when you need more power on a single machine.
 
 
The other way to scale is by adding more machines. This is called “Horizontal Scaling”.
 
 
This introduces different complexities.
 
 
You’ve now entered the realm of distributed computing and distributed systems.
 
Realities of production systems.
 
 
Welcome.
 
 
I want to introduce a popular thought experiment, framed as a historical metaphor and applied to computer science, to help in understanding this problem.
 
This is known as the “Byzantine Generals” problem.
 
I spent hours looking through YouTube clips and found one short video (about 2 minutes) that does a great job explaining the problem in a way that’s easy to understand and visualize:
 
 
 
Go ahead and watch it now.
 
 
 
 
 
Back?
 
 
Ok.
 
 
Needless to say, it’s quite a complicated problem.
 
 
Generally, the way to tackle unsolvable problems like this is to introduce constraints that make the results acceptable, even if still not perfect.
 
 
Over the years, technology has been introduced, developed, refined, and iterated on to ease the burden of working with distributed systems, and work towards solving problems such as these.
 
 
Essentially, we want to find a way to work with many computers, as if they were one.
 
 
I specifically want to talk about a way to do just this, known as “clustering”.
 
 
A cluster is a group of servers that all act in unison, as if they were one machine. Clusters are a solution to the complexities of horizontal scaling.
 
The computers in the cluster need to be able to reach consensus on a shared, agreed-upon state.
 
 
With the advent of blockchain, the idea of clusters is becoming more mainstream. Even many in the general public understand that a blockchain is decentralized: made up of many different servers working together and making decisions based on the consensus of the majority.
 
 
Ethereum is even often described as a “global computer”.
 
 
The container orchestrators Kubernetes and Docker Swarm do a lot of things. One of the core pieces is allowing you to group servers together into a private cluster.
 
 
This means that the orchestrator is aware of which computers, known as “Nodes”, make up the cluster, and what resources are available on each Node (memory, CPU, etc.).
 
 
When given tasks or services to run, the orchestrator evaluates the available resources on each Node in the cluster and decides the best place to run each task based on its requirements.
 
 
It schedules tasks essentially the same way your computer or smartphone does, except that instead of being limited to a single machine, it schedules them across a cluster made up of many machines.
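To make that a bit more concrete, here’s a minimal, hypothetical sketch of how you might express a task’s requirements in a Compose file for Swarm (the service and image names are made up, and Kubernetes expresses the same idea with resource requests and limits in its own manifests). The deploy.resources section is the kind of information the scheduler looks at when deciding which Node has room:

    version: "3.7"
    services:
      api:                            # hypothetical service name
        image: example/api:latest     # hypothetical image
        deploy:
          replicas: 3                 # run 3 copies of this task somewhere in the cluster
          resources:
            reservations:             # what the scheduler sets aside on a Node before placing the task
              cpus: "0.25"
              memory: 128M
            limits:                   # the ceiling the task is allowed to consume once it's running
              cpus: "0.50"
              memory: 256M

The exact fields vary between orchestrators, but the principle is the same: you declare what a task needs, and the scheduler finds a Node that can satisfy it.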
 
 
An important part of any cluster is consensus. There are different types of consensus algorithms (the proof-of-work behind most blockchains is one of them), each with different requirements for agreeing upon the state.
 
 
The important thing is that the Nodes need to agree upon the state.
 
 
Docker Swarm and Kubernetes both work very similarly in regard to their consensus algorithm.
 
 
They use one known as “Raft” (Docker Swarm has Raft built in, while Kubernetes gets it through etcd, the datastore it’s built on).
 
 
Raft has become very popular over the past few years due to its power and its relative ease of understanding and implementation.
 
 
“Paxos” is another famous consensus algorithm you may have heard of. It is notorious for being very technical and difficult to understand and implement.
 
I once met Greg Young, of DDD and CQRS/ES fame, and he told me I basically wouldn't know anything as an engineer until I'd mastered Paxos. I haven't, but maybe someday :)
 
 
With either, though, the more Nodes in the system, the more parties need to agree on a state.
 
Meaning the algorithm becomes less efficient as you add Nodes, partly because more data needs to be replicated to more places. This is why people are all like "the blockchain can't scale!" and other people are like "we'll figure it out... probably..."
 
 
To deal with this in container orchestration cluster land, only a subset of the nodes is responsible for reaching consensus.
 
The nodes in this subset are “Manager” nodes. Once a majority of managers agree on the state, they schedule tasks on “Worker” nodes.
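As a quick illustration, here’s a small, hypothetical Compose snippet (the service and image names are invented) that uses a Swarm placement constraint to keep a workload on Worker nodes only, which is a common way to leave the managers free to focus on consensus and scheduling:

    version: "3.7"
    services:
      crunch:                          # hypothetical service name
        image: example/crunch:latest   # hypothetical image
        deploy:
          placement:
            constraints:
              - node.role == worker    # only schedule this task on Worker nodes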
 
 
For this reason, it’s generally recommended to run a maximum of 5 manager nodes, and a minimum of 3.
 
 
The reason for the minimum is that if too many manager nodes are lost at once, it will be impossible for the cluster to reach a majority consensus, and a manual recovery command will need to be run. A cluster of N managers needs more than half of them (a quorum) to agree, so it can tolerate losing at most (N - 1) / 2 managers, rounding down.
 
 
When running three managers, you can safely lose 1 entire manager Node without issues, as the 2 remaining managers still form a majority and can stay in consensus. It’s not safe to remain in this state for long; however, the orchestrator will still know what state it SHOULD be in, and (depending on how your infrastructure is set up) a new manager node can be spun up to replace the lost one.
 
 
If you are running 5 managers, you can safely lose 2 without losing consensus.
 
Although it's not important for our discussion, a large difference with blockchains is that essentially ALL nodes are managers, rather than a small subset.
 
 
This was a lot to take in for today.
 
 
Don’t worry about it too much. You just need to know, at a high level, that there is a consensus algorithm operating in the background of your cluster, making sure all manager nodes agree upon the state of the cluster, and that you need to run enough of them to have redundancy.
 
 
And just to elaborate a bit on what that state is: it’s the description of how many nodes are in the cluster, and of which services, volumes, secrets, configs, and other cloud primitives make up the cluster.
 
What do you think would happen if you were able to describe the state that you want to exist to the orchestrator by inputting it as a config?
 
 
The docker-compose files we used earlier for development orchestration were examples of this. They described, in a YAML configuration file, the desired state of the services, networks, and volumes that should be running.
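Just as a refresher, here’s roughly the shape of that kind of desired-state description (the service names and images below are placeholders, not the exact files from earlier):

    version: "3.7"
    services:
      web:                                     # hypothetical service name
        image: example/web:latest              # hypothetical image
        ports:
          - "80:80"
        networks:
          - webnet
      db:
        image: postgres:10-alpine
        volumes:
          - db-data:/var/lib/postgresql/data   # persist database files in a named volume
        networks:
          - webnet
    networks:
      webnet:
    volumes:
      db-data:

The orchestrator’s job is then to make reality match that description, and to keep it matching.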
 
---
 
 
So can we use docker-compose to describe the production state, or do we need to use something else?
 
 
Join me tomorrow as we dive deeper.
Patrick “Byzantine General” Scott
P.S. Just in case you’re feeling overwhelmed, I just want to reiterate that you do NOT need to learn the details of how consensus algorithms work to run either of these orchestrators. It is important, however, to understand at a high level what’s happening behind the scenes, so you can avoid setting up your cluster incorrectly.
 
 
Btw, can you think of other applications you have used that might use a consensus algorithm under the hood? They are all around you! Let me know!