Upgrade Procedures
How to upgrade Talos, Kubernetes, and everything else.
Before You Upgrade
Always a good idea to check things are healthy first:
kubectl get nodes
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
And maybe take a manual backup of anything critical:
task volsync:snapshot APP=<app> NS=<ns>
Talos Upgrades
Single Node
task talos:upgrade-node NODE=m0 VERSION=v1.9.0
This downloads the Talos version from the factory, applies it with secure boot, and reboots. Times out after 10 minutes.
Rolling Upgrade
For the whole cluster, just do them one at a time and wait for each to come back:
task talos:upgrade-node NODE=m0 VERSION=v1.9.0
# wait for it to rejoin
task talos:upgrade-node NODE=m1 VERSION=v1.9.0
# wait
task talos:upgrade-node NODE=m2 VERSION=v1.9.0
Between each, verify the node is Ready and Ceph is healthy:
kubectl get nodes
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
Kubernetes Upgrades
task talos:upgrade-k8s
This upgrades Kubernetes across all nodes. The version comes from kubernetes/apps/system-upgrade/tuppr/upgrades/kubernetes.yaml.
Flux and Helm Charts
Renovate handles this automatically - it creates PRs when updates are available. Just review and merge them.
To force a reconcile after merging:
task kubernetes:reconcile
Merge Renovate PRs
# List open PRs
task github:pr:list
# Merge one
task github:pr:merge ID=123
# Merge all of them
task github:pr:merge:all
ARC Upgrade
Actions Runner Controller needs a special upgrade process because of CRD stuff:
task kubernetes:upgrade-arc
This uninstalls the runner and controller, waits a bit, then reconciles them back via Flux.
Rollback
Talos
Talos keeps the previous install around. Reboot and pick the old one from the boot menu:
task talos:reboot-node NODE=<node> MODE=powercycle
Flux/Helm
Just revert the commit and push:
git revert <commit>
git push
task kubernetes:reconcile
If Things Go Wrong
Node Stuck During Upgrade
Check whats happening:
talosctl -n <node> dmesg | tail -100
Force a reboot if needed:
task talos:reboot-node NODE=<node> MODE=powercycle
Can't Connect After Upgrade
Regenerate kubeconfig:
task talos:kubeconfig
Helm Releases Failing
Restart failed releases:
task kubernetes:hr:restart