Etcd in Kubernetes

Credits: https://kubernetes.io https://raft.github.io/

Etcd is a distributed key-value store that provides a reliable way to store data that needs to be accessed by large-scale distributed systems. An etcd cluster is meant to provide key-value storage with best-in-class stability, reliability, scalability, and performance. etcd never tolerates split-brain operation and is willing to sacrifice availability to achieve this end.

It serves as the backbone of many distributed systems, providing a reliable way to store data across a cluster of servers. It works on a variety of operating systems, including Linux, BSD, and OS X.
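To make this concrete, here is a minimal sketch of talking to etcd from Go with the official client library (go.etcd.io/etcd/client/v3), assuming a server is reachable at localhost:2379; the endpoint, key, and timeouts are illustrative, not prescriptive.

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a single-node etcd; a production client would list
	// every cluster endpoint and configure TLS.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// Write a key, then read it back.
	if _, err := cli.Put(ctx, "/demo/greeting", "hello"); err != nil {
		panic(err)
	}
	resp, err := cli.Get(ctx, "/demo/greeting")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```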

How does etcd work?

Consensus is a fundamental problem in fault-tolerant distributed systems. Consensus involves multiple servers agreeing on values. Once they reach a decision on a value, that decision is final. Typical consensus algorithms make progress when any majority of their servers is available; for example, a cluster of 5 servers can continue to operate even if 2 servers fail. If more servers fail, they stop making progress (but will never return an incorrect result).

The Raft Consensus Algorithm aims to solve the consensus problem, and it was designed with simplicity and understandability as its primary goals.

The basis of the algorithm is that for any action to take place there must be a quorum, defined as (N/2)+1 using integer division, where N is the number of nodes or voting members of the distributed system. When an action is requested, the etcd members vote on it. If it receives more than 50% of the votes, the action is allowed to take place and is written to a log on each node, which serves as the source of truth. Distributed consensus gives you the assurance that anything written to the log is an allowed action, along with log replication and leader election.

Because quorum requires more than 50% of the votes, an odd number of nodes increases your level of fault tolerance; adding a node to make the count even does not increase the number of failures the cluster can tolerate. With only two nodes, the chance of losing quorum actually increases: if either node goes down, no new action can take place. The short sketch below makes this concrete.
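A quick worked example, computing quorum size and fault tolerance for cluster sizes 1 through 7 (Go's integer division gives (N/2)+1 directly):

```go
package main

import "fmt"

func main() {
	// Quorum is floor(N/2)+1; the remaining members are the failures
	// the cluster can survive while still making progress.
	for n := 1; n <= 7; n++ {
		quorum := n/2 + 1
		tolerated := n - quorum
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n",
			n, quorum, tolerated)
	}
}
```

Note the output: 4 members tolerate the same single failure as 3, and 6 tolerate the same two failures as 5, which is why odd-sized clusters are recommended.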

For instance, a 3-node cluster at your bank might store your bank account information, with each node holding a copy. Replication ensures that if a node dies, the information isn’t lost.

Every node has two components:

  1. A state machine — think of it as the current information stored on the node. For instance, the 3 nodes of your bank’s cluster may each store a state: your bank account number, your balance, your history of transactions. These things define a “state” for your bank account. The aim is to keep this state identical across all the nodes (that is, to ensure that the nodes proceed in synchrony).
  2. A commit log — think of it as the sequence of instructions applied to the state machine. For instance, an instruction might look like: deduct $100 from the bank account corresponding to my name and record it in the history of transactions. Applying an instruction changes the state of the machine (a minimal sketch follows this list).
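The following simplified sketch (not etcd's actual implementation) illustrates the relationship between the two components: a commit log of instructions is replayed, in order, against a key-value state machine, so every node that replays the same log ends up in the same state.

```go
package main

import "fmt"

// Entry is one instruction in the commit log, e.g. "set balance to 900".
type Entry struct {
	Key   string
	Value string
}

// StateMachine holds the current state; applying committed entries in
// log order keeps every replica identical.
type StateMachine struct {
	state map[string]string
}

func (sm *StateMachine) Apply(e Entry) {
	sm.state[e.Key] = e.Value
}

func main() {
	// The same commit log, replicated to every node.
	log := []Entry{
		{"account", "12345"},
		{"balance", "1000"},
		{"balance", "900"}, // deduct $100
	}

	node := &StateMachine{state: map[string]string{}}
	for _, e := range log {
		node.Apply(e)
	}
	fmt.Println(node.state) // map[account:12345 balance:900]
}
```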

In Kubernetes

Etcd’s job within Kubernetes is to safely store critical data for distributed systems. It is best known as Kubernetes’ primary datastore, used to store the cluster’s configuration data, state, and metadata. Since Kubernetes usually runs on a cluster of several machines, it is a distributed system that requires a distributed datastore like etcd.
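As an illustration, Kubernetes stores its objects under the /registry key prefix by default. Assuming direct, authenticated access to the control plane's etcd endpoints (real clusters usually require client TLS certificates, omitted here for brevity), the Go client can list the keys for pods:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		// Control-plane etcd endpoint; TLS configuration omitted.
		Endpoints:   []string{"https://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// List pod keys without fetching their (protobuf-encoded) values.
	resp, err := cli.Get(ctx, "/registry/pods/",
		clientv3.WithPrefix(), clientv3.WithKeysOnly())
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Println(string(kv.Key)) // e.g. /registry/pods/default/nginx
	}
}
```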

Prerequisites

  1. Run etcd as a cluster with an odd number of members.
  2. etcd is a leader-based distributed system. Ensure that the leader periodically sends heartbeats on time to all followers to keep the cluster stable.
  3. Ensure that no resource starvation occurs.

Performance and stability of the cluster are sensitive to network and disk I/O. Any resource starvation can lead to heartbeat timeouts, causing instability of the cluster. An unstable etcd cluster means that no leader can be elected; under such circumstances, the cluster cannot make any changes to its current state, which implies that no new pods can be scheduled.
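The heartbeat interval and election timeout that govern this behavior are tunable. As a rough sketch using etcd's embedded-server package (go.etcd.io/etcd/server/v3/embed), the TickMs and ElectionMs fields correspond to the --heartbeat-interval and --election-timeout flags; the values below are etcd's defaults, and the usual guidance is an election timeout of roughly 10x the heartbeat interval, both tuned to your network's round-trip time.

```go
package main

import (
	"log"
	"time"

	"go.etcd.io/etcd/server/v3/embed"
)

func main() {
	cfg := embed.NewConfig()
	cfg.Dir = "default.etcd"
	cfg.TickMs = 100      // heartbeat interval in ms (--heartbeat-interval)
	cfg.ElectionMs = 1000 // election timeout in ms (--election-timeout)

	e, err := embed.StartEtcd(cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer e.Close()

	select {
	case <-e.Server.ReadyNotify():
		log.Println("etcd member is ready")
	case <-time.After(60 * time.Second):
		e.Server.Stop()
		log.Fatal("etcd took too long to start")
	}
}
```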

  4. Keeping etcd clusters stable is critical to the stability of Kubernetes clusters. Therefore, run etcd clusters on dedicated machines or in isolated environments with guaranteed resource requirements.
  5. The minimum recommended version of etcd to run in production is 3.2.10+.

When running etcd clusters in production, take into consideration the guidelines offered by the official Kubernetes documentation (https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/). That page is a good starting point for a robust production deployment.

References:

  1. https://raft.github.io/
  2. https://kubernetes.io

Author:

Mukul Kazla

Cloud Native Engineer at Cloudical GmbH