Kubernetes Upgrade v1.20.x to v1.22.x Using Kubespray - Docker to Containerd Migration

This is a template post, but the steps below were tested on our clusters, so you can use them as they are (after adapting the kubespray configuration in this article to your own setup, of course).

Procedure

1. Upgrade k8s to v1.21.6 using kubespray v2.17

All steps are performed on the bootstrap node (or one of the master nodes) as the root user

sudo su -

export CLUSTER_NAME=<put cluster name here>

Uninstall the old ansible package

pip uninstall ansible

Install newer Python packages and optional tools

yum -y install python3-pip python3-devel vim wget unzip tmux
# tmux is optional, but it is useful to have it

Create a virtualenv if one does not exist

pip3 install virtualenv
mkdir -p ~/.virtualenvs
pip3 install virtualenvwrapper

echo 'export WORKON_HOME=$HOME/.virtualenvs' >> ~/.bashrc
echo 'source /usr/bin/virtualenvwrapper.sh' >> ~/.bashrc
source ~/.bashrc

Install required kubespray release

mkdir -p /kubespray
cd /kubespray
wget https://github.com/kubernetes-sigs/kubespray/archive/release-2.17.zip
unzip release-2.17.zip
cd kubespray-release-2.17

Create virtualenv and install requirements

mkvirtualenv -p python3 kspray_217
pip install -r requirements.txt

mkdir -p inventory/$CLUSTER_NAME
cp -rfp inventory/sample/* inventory/$CLUSTER_NAME/
cp ../kubespray-release-2.16/inventory/$CLUSTER_NAME/inventory.cfg inventory/$CLUSTER_NAME/inventory.cfg
# if you didn't use kubespray v2.16, you can create a new inventory.cfg file using the sample file

Edit inventory/$CLUSTER_NAME/inventory.cfg if needed (e.g. add/remove nodes, etc)

vim inventory/$CLUSTER_NAME/inventory.cfg
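
For reference, a minimal inventory.cfg sketch with hypothetical hostnames and addresses (kubespray 2.16+ uses the kube_control_plane/kube_node group names; a real inventory will have more nodes and groups):

[all]
master-1 ansible_host=10.0.0.1 ip=10.0.0.1
worker-1 ansible_host=10.0.0.11 ip=10.0.0.11

[kube_control_plane]
master-1

[etcd]
master-1

[kube_node]
worker-1

[k8s_cluster:children]
kube_control_plane
kube_node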

Edit inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml

  • Change kube_version to v1.21.6

  • Uncomment and change kube_token_auth: true

  • cluster_name: $CLUSTER_NAME

  • kube_network_plugin: calico (replace calico with your network plugin if you use a different one)

vim inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml

Edit inventory/$CLUSTER_NAME/group_vars/all/all.yml

  • kube_read_only_port: 10255

vim inventory/$CLUSTER_NAME/group_vars/all/all.yml

Apply upgrade

ansible-playbook upgrade-cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root
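
Once the playbook finishes, a quick sanity check; the VERSION column should report v1.21.6 on every node:

kubectl get nodes -o wide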

Migrate to Containerd on Worker Nodes

Cordon and drain one of the worker nodes

kubectl cordon <worker node name>
kubectl drain <worker node name> --ignore-daemonsets --delete-emptydir-data
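
To confirm the drain completed, list what is still scheduled on the node; only DaemonSet-managed pods should remain:

kubectl get pods -A -o wide --field-selector spec.nodeName=<worker node name>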

Stop kubelet and docker services

systemctl stop kubelet docker containerd
systemctl status kubelet docker containerd

Remove docker and containerd packages

yum -y remove docker-ce-cli.x86_64 docker-ce-rootless-extras.x86_64 \
docker-ce.x86_64 docker-scan-plugin.x86_64 containerd

Install containerd. We need a specific version of containerd:

yum -y install yum-utils
yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io-1.4.9-3.1.el7

Configure containerd

mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

Edit /etc/containerd/config.toml

  • Add SystemdCgroup = true under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] (this switches containerd to the systemd cgroup driver, which should match the cgroup driver the kubelet uses):

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

Enable & Start containerd

systemctl enable containerd
systemctl start containerd
systemctl status containerd
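
As a quick check that the daemon answers on its socket and reports the pinned version (the ctr CLI ships with the containerd.io package):

ctr version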

Append these two parameters to KUBELET_ARGS in the /etc/kubernetes/kubelet.env file

KUBELET_ARGS=" \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
"

Restart the kubelet service and make sure it is running correctly. It may take a few minutes for the kubelet to fully start, and error logs may appear during this time. Ignore 'Not found $NODENAME' messages.

systemctl restart kubelet
systemctl status kubelet
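
From a node with kubectl access, verify that the kubelet re-registered with the new runtime; the CONTAINER-RUNTIME column should now read containerd://1.4.9 instead of docker://...:

kubectl get node <worker node name> -o wide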

Check docker, containerd and kubelet services

systemctl status docker containerd kubelet
# docker should not be running

Reboot, just to make sure everything restarts fresh before the node is uncordoned

reboot

systemctl status docker containerd kubelet
# docker should not be running

Finally, uncordon the node

kubectl uncordon <node name>
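
The node should go back to Ready without the SchedulingDisabled flag, and new pods can be scheduled on it again:

kubectl get node <worker node name>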

Migrate to Containerd on Master Nodes

Follow the same steps as for the worker nodes, with the etcd-related additions below

Cordon and drain one of the master nodes

kubectl cordon <master node name>
kubectl drain --ignore-daemonsets --delete-emptydir-data <master node name>

Stop etcd, kubelet, containerd and docker services

systemctl stop kubelet etcd docker containerd
systemctl status kubelet docker etcd containerd

Remove docker and containerd packages

yum -y remove docker-ce-cli.x86_64 docker-ce-rootless-extras.x86_64 \
docker-ce.x86_64 docker-scan-plugin.x86_64 containerd

Install containerd. We need a specific version of containerd:

yum -y install yum-utils
yum-config-manager --add-repo \
https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io-1.4.9-3.1.el7

Configure containerd

mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

Edit /etc/containerd/config.toml

  • Add SystemdCgroup = true under [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]:

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true

vim /etc/containerd/config.toml

Enable & Start containerd

systemctl enable containerd
systemctl start containerd
systemctl status containerd

Append these two parameters to KUBELET_ARGS in the /etc/kubernetes/kubelet.env file

vim /etc/kubernetes/kubelet.env
KUBELET_ARGS=" \
--container-runtime=remote \
--container-runtime-endpoint=unix:///run/containerd/containerd.sock \
"

Restart the kubelet service and make sure it is running correctly. It may take a few minutes for the kubelet to fully start, and error logs may appear during this time. Ignore 'Not found $NODENAME' messages.

systemctl restart kubelet
systemctl status kubelet

After that, we need to run the kubespray 2.17 cluster.yml playbook again with the following changes and the --limit=$NODENAME parameter (run this command once for each master node).

Edit inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml

  • Change resolvconf_mode: host_resolvconf

  • Change container_manager: containerd

vim inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml

Edit inventory/$CLUSTER_NAME/group_vars/etcd.yml

  • Change etcd_deployment_type: host

vim inventory/$CLUSTER_NAME/group_vars/etcd.yml

Apply changes

ansible-playbook cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root --limit=$NODENAME

Check etcd, docker, containerd and kubelet services

systemctl status etcd docker containerd kubelet
# docker should not be running
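
Since etcd runs on the masters, it is worth checking cluster health too. A sketch assuming kubespray's default certificate layout under /etc/ssl/etcd/ssl/ (adjust paths and endpoints to your environment):

ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-$(hostname).pem \
  --key=/etc/ssl/etcd/ssl/admin-$(hostname)-key.pem \
  endpoint health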

Reboot, just to make sure everything restarts fresh before the node is uncordoned

reboot

systemctl status docker containerd kubelet etcd
# the etcd service exists on master nodes only

Uncordon the node

kubectl uncordon <node name>

Upgrade Cluster to Kubernetes 1.22.8 Using Kubespray 2.18 (Optional)

Install kubespray release

mkdir -p /kubespray
cd /kubespray
wget https://github.com/kubernetes-sigs/kubespray/archive/release-2.18.zip
unzip release-2.18.zip
cd kubespray-release-2.18

Create virtualenv and install required python packages

mkvirtualenv -p python3 kspray_218
pip install -r requirements.txt

mkdir -p inventory/$CLUSTER_NAME
cp -rfp inventory/sample/* inventory/$CLUSTER_NAME/
cp ../kubespray-release-2.17/inventory/$CLUSTER_NAME/inventory.cfg inventory/$CLUSTER_NAME/inventory.cfg

Edit inventory/$CLUSTER_NAME/inventory.cfg if needed (e.g. add/remove nodes, etc)

vim inventory/$CLUSTER_NAME/inventory.cfg

Edit inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml

  • Change kube_version to v1.22.8

  • Uncomment and change kube_token_auth: true

  • cluster_name: $CLUSTER_NAME

  • kube_network_plugin: calico

vim inventory/$CLUSTER_NAME/group_vars/k8s_cluster/k8s-cluster.yml

Edit inventory/$CLUSTER_NAME/group_vars/all/all.yml

  • kube_read_only_port: 10255

vim inventory/$CLUSTER_NAME/group_vars/all/all.yml

Apply upgrade

ansible-playbook upgrade-cluster.yml -b -i inventory/$CLUSTER_NAME/inventory.cfg --user root
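
As before, confirm the result once the playbook finishes; the server and every node should report v1.22.8:

kubectl version --short
kubectl get nodes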

After Upgrade

Check Ingress Controller Version

Ingress-nginx versions older than v1.0 are not supported on Kubernetes 1.22, because the v1beta1 Ingress APIs they rely on were removed. Therefore, the ingress controller needs to be upgraded as well. There are multiple ingress controllers in use for this purpose; I will walk through the example using the NGINX ingress controller maintained by the Kubernetes project.

The following steps are experimental and may cause problems if you have made manual changes to the ingress controller. Proceed at your own risk.

Delete the existing ingress controller and its resources (careful: this deletes everything in the ingress-nginx namespace):

kubectl delete namespace ingress-nginx

Install the new ingress controller

  • Development environment:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/cloud/deploy.yaml

  • Production environment:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.4.0/deploy/static/provider/baremetal/deploy.yaml
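
To make sure the new controller is actually up before moving on, you can use the readiness check from the ingress-nginx quick start:

kubectl wait --namespace ingress-nginx \
  --for=condition=ready pod \
  --selector=app.kubernetes.io/component=controller \
  --timeout=120s
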
Edit Existing Ingress Objects
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: example-ingress
  namespace: app-namespace
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          service:
            name: example-service
            port:
              number: 80
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - example.com
    secretName: example-com-ssl-2022-2023
status:
  loadBalancer:
    ingress:
    - ip: X.X.X.X

Each Ingress must include the following annotation:

kubernetes.io/ingress.class: nginx
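
If an existing Ingress is missing it, the annotation can be added in place (example-ingress and app-namespace are the hypothetical names from the manifest above):

kubectl annotate ingress example-ingress -n app-namespace kubernetes.io/ingress.class=nginx --overwrite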

Check Log Forwarder Configuration (If Exists)

Docker and containerd save container logs in different formats: Docker writes JSON lines, while containerd uses the CRI log format. Tools like fluentd that were configured to collect Docker logs need to be reconfigured to parse containerd's format.

An example fluentd configuration for docker is as follows:

<parse>
  @type multi_format
  <pattern>
    # Parse the docker stdout/stderr JSON log lines.
    format json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </pattern>
  <pattern>
    # Get the non-JSON (non-docker) stdout or stderr logs and set the message into the "log" key.
    format /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
    time_format %Y-%m-%dT%H:%M:%S.%N%:z
  </pattern>
</parse>

An example fluentd configuration for containerd is as follows:

<parse>
  @type cri
  <parse>
    @type multi_format
    <pattern>
      format json
      keep_time_key true
    </pattern>
    <pattern>
      format none
    </pattern>
  </parse>
</parse>
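
For context, here is the same container log line as each runtime writes it to disk; this difference is why the parser must change (timestamps are illustrative):

# Docker (json-file logging driver):
{"log":"hello world\n","stream":"stdout","time":"2022-01-01T00:00:00.000000000Z"}

# containerd (CRI format: timestamp, stream, partial/full tag, message):
2022-01-01T00:00:00.000000000Z stdout F hello world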