Kubernetes Upgrade v1.20.x to v1.22.x Using Kubespray - Docker to Containerd Migration
This is a template post, but the steps below were tested on our clusters, so you can use them as they are (after adapting the Kubespray configuration in this article to your own setup, of course).
Procedure
1. Upgrade Kubernetes to v1.21.6 Using Kubespray v2.17
All steps are performed on the bootstrap node (or one of the master nodes) as the root user:
sudo su -
export CLUSTER_NAME=<put cluster name here>
Uninstall old ansible package
pip uninstall ansible
Install the required Python packages and optional tools
yum -y install python3-pip python3-devel vim wget unzip tmux
# tmux is optional, but it is useful to have it
Install virtualenv and virtualenvwrapper if not already present
pip3 install virtualenv
mkdir -p ~/.virtualenvs
pip3 install virtualenvwrapper
echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
echo 'source /usr/bin/virtualenvwrapper.sh' >> ~/.bashrc
source ~/.bashrc
Install required kubespray release
mkdir -p /kubespray
cd /kubespray
wget https://github.com/kubernetes-sigs/kubespray/archive/release-2.17.zip
unzip release-2.17.zip
cd kubespray-release-2.17
Create virtualenv and install requirements
mkvirtualenv -p python3 kspray_217
pip install -r requirements.txt
mkdir -p inventory/$CLUSTER_NAME
cp -rfp inventory/sample/* inventory/$CLUSTER_NAME/
cp ../kubespray-release-2.16/inventory/$CLUSTER_NAME/inventory.cfg inventory/$CLUSTER_NAME/inventory.cfg
# if you didn't use kubespray v2.16, you can create a new inventory.cfg file using the sample file
Edit inventory/$CLUSTER_NAME/inventory.cfg if needed (e.g. add/remove nodes):
vim inventory/$CLUSTER_NAME/inventory.cfg
Edit inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml:
- Change kube_version to v1.21.6
- Uncomment and set kube_token_auth: true
- Set cluster_name: $CLUSTER_NAME
- Set kube_network_plugin: calico (use your own network plugin here if you use something other than Calico)
vim inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml
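After editing, the changed lines in k8s-cluster.yml should look roughly like this (the cluster name below is illustrative):
kube_version: v1.21.6
kube_token_auth: true
cluster_name: mycluster.local
kube_network_plugin: calico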
Edit inventory/$CLUSTER_NAME/group_vars/all/all.yml:
- kube_read_only_port: 10255
vim inventory/$CLUSTER_NAME/group_vars/all/all.yml
Apply upgrade
ansible-playbook upgrade-cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root
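Once the playbook finishes, you can check that every node reports the new version:
kubectl get nodes
# the VERSION column should show v1.21.6 on all nodes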
2. Migrate to Containerd on Worker Nodes
Cordon and drain one of the worker nodes
kubectl cordon <worker node name>
kubectl drain <worker node name> --ignore-daemonsets --delete-emptydir-data
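To confirm the drain completed, list what is still scheduled on the node (only DaemonSet pods should remain):
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<worker node name>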
Stop the kubelet, docker and containerd services
systemctl stop kubelet docker containerd
systemctl status kubelet docker containerd
Remove docker and containerd packages
yum -y remove docker-ce-cli.x86_64 docker-ce-rootless-extras.x86_64 \
docker-ce.x86_64 docker-scan-plugin.x86_64 containerd
Install containerd. We need a specific containerd version:
yum -y install yum-utils
yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io-1.4.9-3.1.el7
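You can confirm the installed version before continuing:
rpm -q containerd.io
# should report containerd.io-1.4.9-3.1.el7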
Configure containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
Edit /etc/containerd/config.toml and add SystemdCgroup = true under the runc options section:
vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
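Setting SystemdCgroup = true makes runc use the systemd cgroup driver; this should match the kubelet's cgroup driver, and a mismatch between the two is a common source of instability.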
Enable & Start containerd
systemctl enable containerd
systemctl start containerd
systemctl status containerd
Append these two parameters to the KUBELET_ARGS variable in the /etc/kubernetes/kubelet.env file:
vim /etc/kubernetes/kubelet.env
KUBELET_ARGS=" \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
"
Restart the kubelet service and make sure it is running correctly. It may take a few minutes for the kubelet to fully start, and error logs may appear during this time; 'Not found $NODENAME' messages can be ignored.
systemctl restart kubelet
systemctl status kubelet
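You can also verify that the node is now running on containerd (a quick check, assuming crictl is installed and configured for the containerd socket):
kubectl get nodes -o wide
# the CONTAINER-RUNTIME column should show containerd://1.4.9 for this node
crictl ps
# lists the containers now running under containerd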
Check docker, containerd and kubelet services
systemctl status docker containerd kubelet
#docker should be stopped
Reboot, just to make sure everything restarts fresh before the node is uncordoned
reboot
systemctl status docker containerd kubelet
#docker should be stopped
Finally, uncordon the node
kubectl uncordon <node name>
3. Migrate to Containerd on Master Nodes
Follow the same steps as for the worker nodes, plus the etcd-related steps below.
Cordon and drain one of the master nodes
kubectl cordon <master node name>
kubectl drain --ignore-daemonsets --delete-emptydir-data <master node name>
Stop etcd, kubelet, containerd and docker services
systemctl stop kubelet etcd docker containerd
systemctl status kubelet docker etcd containerd
Remove docker and containerd packages
yum -y remove docker-ce-cli.x86_64 docker-ce-rootless-extras.x86_64 \
docker-ce.x86_64 docker-scan-plugin.x86_64 containerd
Install containerd. We need a specific containerd version:
yum -y install yum-utils
yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io-1.4.9-3.1.el7
Configure containerd
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
Edit /etc/containerd/config.toml and add SystemdCgroup = true under the runc options section:
vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
Enable & Start containerd
systemctl enable containerd
systemctl start containerd
systemctl status containerd
Append these two parameters to the KUBELET_ARGS variable in the /etc/kubernetes/kubelet.env file:
vim /etc/kubernetes/kubelet.env
KUBELET_ARGS=" \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
"
Restart the kubelet service and make sure it is running correctly. It may take a few minutes for the kubelet to fully start, and error logs may appear during this time; 'Not found $NODENAME' messages can be ignored.
systemctl restart kubelet
systemctl status kubelet
After that, we need to run the Kubespray 2.17 cluster.yml playbook again with the following changes and the --limit=$NODENAME parameter (run this for each master node).
Edit inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml:
- Change resolvconf_mode: host_resolvconf
- Change container_manager: containerd
vim inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml
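For context: resolvconf_mode: host_resolvconf is needed because the default docker_dns mode relies on Docker, which is being removed, and container_manager: containerd tells Kubespray to manage containerd as the container runtime.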
Edit inventory/$CLUSTER_NAME/group_vars/etcd.yml:
- Change etcd_deployment_type: host
vim inventory/$CLUSTER_NAME/group_vars/etcd.yml
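With etcd_deployment_type: host, Kubespray runs etcd as a systemd service directly on the host instead of inside a Docker container, which is necessary once Docker is removed.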
Apply changes
ansible-playbook cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root --limit=$NODENAME
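For example, assuming your masters are named master-1, master-2 and master-3 in inventory.cfg (illustrative names), run:
export NODENAME=master-1
ansible-playbook cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root --limit=$NODENAME
# then repeat with NODENAME=master-2 and NODENAME=master-3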
Check etcd, docker, containerd and kubelet services
systemctl status etcd docker containerd kubelet
#docker should be stopped
Reboot, just to make sure everything restarts fresh before the node is uncordoned
reboot
systemctl status docker containerd kubelet etcd
# the etcd service exists on master nodes only
Uncordon the node
kubectl uncordon <node name>
4. Upgrade Cluster to Kubernetes v1.22.8 Using Kubespray v2.18 (Optional)
Install kubespray release
mkdir -p /kubespray
cd /kubespray
wget https://github.com/kubernetes-sigs/kubespray/archive/release-2.18.zip
unzip release-2.18.zip
cd kubespray-release-2.18
Create virtualenv and install required python packages
mkvirtualenv -p python3 kspray_218
pip install -r requirements.txt
mkdir -p inventory/$CLUSTER_NAME
cp -rfp inventory/sample/* inventory/$CLUSTER_NAME/
cp ../kubespray-release-2.17/inventory/$CLUSTER_NAME/inventory.cfg inventory/$CLUSTER_NAME/inventory.cfg
Edit inventory/$CLUSTER_NAME/inventory.cfg if needed (e.g. add/remove nodes):
vim inventory/$CLUSTER_NAME/inventory.cfg
Edit inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml:
- Change kube_version to v1.22.8
- Uncomment and set kube_token_auth: true
- Set cluster_name: $CLUSTER_NAME
- Set kube_network_plugin: calico (or your own network plugin)
vim inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml
Edit inventory/$CLUSTER_NAME/group_vars/all/all.yml
- kube_read_only_port: 10255
vim inventory/$CLUSTER_NAME/group_vars/all/all.yml
Apply upgrade
ansible-playbook upgrade-cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root
After Upgrade
Check Ingress Controller Version
NGINX ingress controller versions below v1 do not support Kubernetes 1.22 (the networking.k8s.io/v1beta1 Ingress API they depend on was removed), so the ingress controller needs to be upgraded. There are multiple ingress controllers available for this purpose; this example uses the NGINX ingress controller maintained by the Kubernetes project.
The following steps are experimental and will affect you negatively if you have made manual changes to the ingress controller. Proceed at your own risk.
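As a precaution (not part of the original procedure), you may want to save a copy of the current controller resources first:
kubectl get all,configmaps -n ingress-nginx -o yaml > ingress-nginx-backup.yaml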
Delete the existing ingress controller and its resources (careful: this deletes everything in the ingress-nginx namespace):
kubectl delete namespace ingress-nginx
Install new ingress controller
- Development environment:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/cloud/deploy.yaml
- Production environment:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/baremetal/deploy.yaml
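After applying the manifest, make sure the new controller comes up:
kubectl get pods -n ingress-nginx
# the ingress-nginx-controller pod should reach the Running state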
Edit Existing Ingress Objects
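With the networking.k8s.io/v1 API, an existing Ingress should look like the following (names, namespace and IP below are illustrative):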
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: example-ingress
  namespace: app-namespace
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          service:
            name: example-service
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - example.com
    secretName: example-com-ssl-2022-2023
status:
  loadBalancer:
    ingress:
    - ip: X.X.X.X
Each Ingress must include the following annotation:
kubernetes.io/ingress.class: nginx
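If an existing Ingress is missing the annotation, it can be added in place (using the illustrative names from the example above):
kubectl annotate ingress example-ingress -n app-namespace kubernetes.io/ingress.class=nginx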
Check Log Forwarder Configuration (If Any)
Docker and containerd save container logs in different formats. Tools like fluentd that were configured to collect Docker logs need to be reconfigured for logs saved in containerd's format.
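For example, the same container log line is stored differently by the two runtimes (content below is illustrative):
# Docker (JSON file logging driver):
{"log":"hello world\n","stream":"stdout","time":"2022-01-01T12:00:00.000000000Z"}
# containerd (CRI log format):
2022-01-01T12:00:00.000000000Z stdout F hello world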
An example fluentd configuration for docker is as follows:
<parse>
  @type multi_format
  <pattern>
    # Parse the Docker JSON stdout/stderr log files.
    format json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </pattern>
  <pattern>
    # Parse non-JSON (non-Docker) stdout/stderr logs and set the message into the "log" key.
    format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
    time_format %Y-%m-%dT%H:%M:%S.%N%:z
  </pattern>
</parse>
An example fluentd configuration for containerd is as follows:
<parse>
  @type cri
  <parse>
    @type multi_format
    <pattern>
      format json
      keep_time_key true
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</parse>
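Note that the cri parser type is not built into fluentd; it is provided by the fluent-plugin-parser-cri plugin, which must be installed first:
fluent-gem install fluent-plugin-parser-cri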