Recently, I learned the concepts of application and cluster maintenance in Kubernetes.
Rolling Updates and Rollbacks
When we update a Deployment, there are basically two ways in Kubernetes to do it:
- Recreate: all existing replicas are scaled down to zero before the new ones come up, which means the Deployment will have downtime.
- Rolling Update: the replicas are taken down and updated one by one, which means the Deployment will not have downtime.
Similarly, if we find something wrong in the latest update, we can run kubectl rollout undo deployment/<deployment-name> to roll back the update.
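For example, here is a minimal sketch of triggering a rolling update and then undoing it, assuming a Deployment named my-app with a container named nginx (both names are placeholders):

```sh
# Change the container image; by default this triggers a rolling update
kubectl set image deployment/my-app nginx=nginx:1.19

# Watch the rollout and inspect its revision history
kubectl rollout status deployment/my-app
kubectl rollout history deployment/my-app

# If the new version misbehaves, roll back to the previous revision
kubectl rollout undo deployment/my-app
```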
Drain and Cordon
In cluster maintenance, if a node is unhealthy we can drain it so that the applications running on it are moved to other nodes (as long as they are managed by a ReplicaSet). If an application on that node has no ReplicaSet behind it, it will simply be gone after we drain the node.
Cordon is a little different from drain. Drain empties the node, while cordon only marks the node as unschedulable: the existing deployments or applications stay where they are, but no new ones will be scheduled onto the node.
After we are done with the maintenance of the node, we can run kubectl uncordon <node-name> to make the node schedulable again.
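A quick sketch of these commands, assuming a node named node-1:

```sh
# Mark the node unschedulable and evict its Pods before maintenance
kubectl drain node-1 --ignore-daemonsets

# Or only mark it unschedulable; the Pods already on it keep running
kubectl cordon node-1

# After the maintenance, allow scheduling on the node again
kubectl uncordon node-1
```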
Cluster Upgrade Strategy
Kubernetes consists of multiple components, and their versions cannot drift apart arbitrarily because of the dependencies between them. For example, if kube-apiserver is at version X (say 1.10.0), controller-manager and kube-scheduler may be at most one minor version behind (1.9.0 or 1.10.0), kubelet and kube-proxy may be at most two minor versions behind (1.8.0, 1.9.0, or 1.10.0), and kubectl may range from one minor version ahead to one behind (1.11.0, 1.10.0, or 1.9.0).
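To check how far apart the components actually are, a simple starting point is:

```sh
# Client (kubectl) and server (kube-apiserver) versions
kubectl version

# The VERSION column shows the kubelet version of each node
kubectl get nodes
```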
It is recommended to upgrade one minor version at a time instead of jumping several minor versions at once. We upgrade the master node first. During the upgrade of the master, the workloads on the worker nodes keep running; we just cannot use the control-plane API to manage them until the upgrade is done. There are three strategies for upgrading the worker nodes:
- Upgrade all worker nodes at once, which requires downtime.
- Upgrade one node at a time: the applications on the node being upgraded are moved to other nodes, and once the upgrade finishes the node can take workloads again (see the kubeadm sketch after this list).
- Add new nodes with the latest version and move the applications or deployments over one by one. This approach works especially well in a cloud environment, since it is easy to provision new nodes.
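As a rough sketch of the one-node-at-a-time strategy on a kubeadm-managed cluster (node-1 and <new-version> are placeholders, and the package commands assume a Debian-based system):

```sh
# On the control-plane node: install the new kubeadm, review the plan, then apply it
apt-get update && apt-get install -y kubeadm=<new-version>
kubeadm upgrade plan
kubeadm upgrade apply v<new-version>
apt-get install -y kubelet=<new-version> && systemctl restart kubelet

# For each worker node: drain it, upgrade it, then bring it back
kubectl drain node-1 --ignore-daemonsets      # from a machine with kubectl access
apt-get install -y kubeadm=<new-version>      # on node-1
kubeadm upgrade node                          # on node-1
apt-get install -y kubelet=<new-version> && systemctl restart kubelet   # on node-1
kubectl uncordon node-1                       # from a machine with kubectl access
```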
Backup and Restore Methods
Backup in Kubernetes refers to two things: backing up the resource configuration, and backing up etcd (the key-value store that holds the cluster state).
Backing up the config is easy to understand: we just need to save the resource definitions somewhere. One typical way to do this is to run kubectl get pod <pod-name> -o yaml > pod-definition.yaml.
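To capture more than a single Pod, one rough approach is to dump every standard resource in one go (this is only a sketch; it does not cover things like ConfigMaps, Secrets, or custom resources):

```sh
# Dump the common namespaced resources of all namespaces into one file
kubectl get all --all-namespaces -o yaml > all-resources.yaml
```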
Backing up etcd deserves a bit more detail. We can dump etcd by running ETCDCTL_API=3 etcdctl snapshot save snapshot.db. To restore the dump, we should stop kube-apiserver first and then run ETCDCTL_API=3 etcdctl snapshot restore snapshot.db. The ETCDCTL_API=3 prefix simply tells etcdctl which version of the etcd API to use.
Sometimes we need to restore the dump as a new cluster so that the restored member does not accidentally join an existing one. Apart from this, we also need to specify the endpoint and certificate arguments when talking to the etcd server: --endpoints, --cacert, --cert, and --key.
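Putting it together, here is a sketch of taking and restoring a snapshot on a kubeadm-style cluster; the certificate paths and data directory below are the usual defaults and may differ on your setup:

```sh
# Take a snapshot from the running etcd server (the certificates authenticate etcdctl)
ETCDCTL_API=3 etcdctl snapshot save /opt/snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Restore into a fresh data directory so the member starts as a new cluster
ETCDCTL_API=3 etcdctl snapshot restore /opt/snapshot.db \
  --data-dir=/var/lib/etcd-from-backup

# Then point etcd at the new data directory (for example by editing the etcd
# static Pod manifest) and restart kube-apiserver
```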