Introduction
Deploying new versions of applications is a critical, yet often nerve-wracking, part of the software development lifecycle. The fear of downtime, service interruptions, or a broken user experience looms large with every release. Traditional deployment strategies often involve taking services offline, leading to frustrated users and lost revenue. In today’s always-on world, such disruptions are simply unacceptable. This is where Kubernetes shines, offering robust mechanisms to perform updates without a single hiccup in service availability.
Kubernetes’ rolling update strategy is a powerful feature that allows you to update your applications with zero downtime. Instead of replacing all instances of your application at once, Kubernetes intelligently replaces old Pods with new ones in a controlled, gradual manner. This ensures that a sufficient number of healthy Pods are always running, maintaining continuous service availability throughout the deployment process. Understanding and mastering rolling updates is fundamental for anyone managing production-grade applications on Kubernetes.
This guide will thoroughly demystify Kubernetes rolling updates, walking you through the concepts, configuration, and practical steps to achieve seamless, zero-downtime deployments. We’ll cover everything from defining your deployment strategy to monitoring its progress and rolling back when necessary. By the end, you’ll be equipped to confidently deploy new application versions without breaking a sweat, guaranteeing a smooth experience for your users and peace of mind for your operations team.
TL;DR: Kubernetes Rolling Updates
Kubernetes rolling updates enable zero-downtime deployments by gradually replacing old application Pods with new ones. This ensures continuous service availability. Key configurations include maxSurge and maxUnavailable in your Deployment’s strategy.
Key Commands:
- Create Deployment: kubectl apply -f my-deployment.yaml
- Update Image: kubectl set image deployment/my-app my-container=my-registry/my-app:v2.0
- Check Status: kubectl rollout status deployment/my-app
- View History: kubectl rollout history deployment/my-app
- Undo Rollout: kubectl rollout undo deployment/my-app
- Pause Rollout: kubectl rollout pause deployment/my-app
- Resume Rollout: kubectl rollout resume deployment/my-app
Always define readiness and liveness probes for robust updates.
Prerequisites
Before diving into rolling updates, ensure you have the following:
- Kubernetes Cluster: A running Kubernetes cluster (e.g., Minikube, Kind, or a cloud-managed cluster like GKE, EKS, AKS). For local development, Minikube or Kind are excellent choices.
- kubectl: The Kubernetes command-line tool, configured to connect to your cluster. You can find installation instructions in the official Kubernetes documentation.
- Basic Kubernetes Knowledge: Familiarity with core Kubernetes concepts such as Pods, Deployments, Services, and ReplicaSets.
- Text Editor: Any text editor for creating Kubernetes YAML manifest files.
Step-by-Step Guide: Zero-Downtime Deployments with Rolling Updates
1. Understanding the RollingUpdate Strategy
Kubernetes Deployments default to the RollingUpdate strategy. This strategy updates Pods in a controlled sequence, ensuring that your application remains available throughout the process. The key parameters that govern this behavior are maxUnavailable and maxSurge, defined within the .spec.strategy.rollingUpdate field of your Deployment manifest. These values can be specified as either absolute numbers or percentages.
* maxUnavailable: This defines the maximum number of Pods that can be unavailable during the update process. If set to 25% (default), at most 25% of the total desired Pods can be down at any given time. This ensures that your application maintains a minimum level of availability.
* maxSurge: This defines the maximum number of Pods that can be created above the desired number of Pods. If set to 25% (default), the Deployment can temporarily scale up to 125% of its desired size during the update. This allows new Pods to be brought up and become ready before old Pods are terminated, minimizing downtime.
Together, these parameters allow you to fine-tune the speed and risk of your rolling updates. A higher maxSurge and maxUnavailable will lead to faster updates but with potentially more disruption, while lower values will result in slower but safer updates.
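For example, with replicas: 3 and the 25% defaults, Kubernetes converts the percentages to absolute numbers, rounding maxUnavailable down and maxSurge up:

maxUnavailable = floor(3 * 0.25) = 0   # rounds down: no Pod may be removed before a replacement is ready
maxSurge       = ceil(3 * 0.25)  = 1   # rounds up: one extra Pod may run temporarily

So during the update, the Deployment always keeps 3 Pods available and runs at most 4 at once.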
2. Creating an Initial Deployment
Let’s start by creating a simple Nginx Deployment with three replicas. This will serve as our baseline application that we’ll later update. Notice that we’ve included basic liveness and readiness probes. These are crucial for robust rolling updates, as they tell Kubernetes when a Pod is truly ready to receive traffic (readiness) and when it’s healthy enough to remain running (liveness). Without them, Kubernetes might prematurely route traffic to unready Pods or terminate Pods that are still processing requests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-app
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25% # At most 25% of Pods can be unavailable
      maxSurge: 25%       # At most 25% more Pods than desired can be created
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19.0 # Our initial image version
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
Apply this manifest to your cluster:
kubectl apply -f deployment-v1.yaml
Verify:
Check the status of your Deployment and Pods. You should see three running Pods.
kubectl get deployment my-nginx-app
Expected Output:
NAME READY UP-TO-DATE AVAILABLE AGE
my-nginx-app 3/3 3 3 Xs
kubectl get pods -l app=nginx
Expected Output:
NAME READY STATUS RESTARTS AGE
my-nginx-app-79888998c-abcde 1/1 Running 0 Xs
my-nginx-app-79888998c-fghij 1/1 Running 0 Xs
my-nginx-app-79888998c-klmno 1/1 Running 0 Xs
3. Performing a Rolling Update
Now, let’s update our Nginx application to a newer version. We’ll change the image from nginx:1.19.0 to nginx:1.21.0. Kubernetes will detect this change in the Deployment’s Pod template and initiate a rolling update. It will create new Pods with the nginx:1.21.0 image, wait for them to become ready (as per readiness probes), and then terminate old Pods with the nginx:1.19.0 image. This process repeats until all old Pods are replaced by new ones.
You can modify the YAML file directly and re-apply, or use the kubectl set image command for a quick update.
kubectl set image deployment/my-nginx-app nginx=nginx:1.21.0
This command is a convenient way to update a container image within a Deployment. The format is kubectl set image deployment/<deployment-name> <container-name>=<new-image>.
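The declarative alternative mentioned above is to edit the image in your manifest and re-apply it. Assuming the manifest from step 2 is saved as deployment-v1.yaml:

# In deployment-v1.yaml, change the container image line:
#   image: nginx:1.21.0
kubectl apply -f deployment-v1.yaml

Either approach triggers the same rolling update, but the declarative route keeps your manifest as the source of truth, which is preferable when the file lives in version control.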
Verify:
Monitor the rollout status. You’ll see new Pods being created and old ones terminating.
kubectl rollout status deployment/my-nginx-app
Expected Output (during rollout):
Waiting for deployment "my-nginx-app" rollout to finish: 1 out of 3 new replicas have been updated...
Waiting for deployment "my-nginx-app" rollout to finish: 1 old replicas are pending termination...
Waiting for deployment "my-nginx-app" rollout to finish: 2 out of 3 new replicas have been updated...
Waiting for deployment "my-nginx-app" rollout to finish: 1 old replicas are pending termination...
deployment "my-nginx-app" successfully rolled out
Once the rollout is complete, check the Pods again to confirm they are all running the new image.
kubectl get pods -l app=nginx -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
Expected Output:
NAME IMAGE
my-nginx-app-7c7f7f7f7-abcde nginx:1.21.0
my-nginx-app-7c7f7f7f7-fghij nginx:1.21.0
my-nginx-app-7c7f7f7f7-klmno nginx:1.21.0
Notice the change in the Pod names (due to the new ReplicaSet created for the 1.21.0 image) and the updated image version.
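You can also observe the ReplicaSet handover that drives this behavior. After the rollout, the old ReplicaSet is scaled down to zero while the new one owns all replicas (the hash suffixes below are illustrative):

kubectl get rs -l app=nginx

NAME                     DESIRED   CURRENT   READY   AGE
my-nginx-app-79888998c   0         0         0       Xm
my-nginx-app-7c7f7f7f7   3         3         3       Xs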
4. Inspecting Rollout History
Kubernetes keeps a history of your Deployments, allowing you to track changes and revert to previous versions if needed. Each update creates a new revision.
kubectl rollout history deployment/my-nginx-app
Expected Output:
deployment.apps/my-nginx-app
REVISION  CHANGE-CAUSE
1         <none>
2         kubectl set image deployment/my-nginx-app nginx=nginx:1.21.0
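Note that the CHANGE-CAUSE column is read from the Deployment's kubernetes.io/change-cause annotation (historically set automatically by the now-deprecated --record flag). If a revision shows <none>, you can record a cause yourself:

kubectl annotate deployment/my-nginx-app kubernetes.io/change-cause="update nginx to 1.21.0"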
To see detailed information for a specific revision, use the --revision flag. This is useful for understanding what exactly changed in a previous version.
kubectl rollout history deployment/my-nginx-app --revision=1
Expected Output (truncated):
deployment.apps/my-nginx-app with revision #1
Pod Template:
  Labels:       app=nginx
                pod-template-hash=79888998c
  Containers:
   nginx:
    Image:      nginx:1.19.0
    Port:       80/TCP
    Host Port:  0/TCP
    Environment:  <none>
    Mounts:       <none>
5. Rolling Back a Deployment
If a new deployment introduces issues, you can quickly revert to a previous, stable version using the kubectl rollout undo command. This is a critical feature for maintaining service reliability. You can undo to the immediately previous version or specify a particular revision.
To undo to the previous revision (revision 1 in our case):
kubectl rollout undo deployment/my-nginx-app
Expected Output:
deployment.apps/my-nginx-app rolled back
Alternatively, to roll back to a specific revision (e.g., revision 1):
kubectl rollout undo deployment/my-nginx-app --to-revision=1
Verify:
Monitor the rollout status and then check the Pod images. They should revert to nginx:1.19.0.
kubectl rollout status deployment/my-nginx-app
Expected Output:
Waiting for deployment "my-nginx-app" rollout to finish: 1 out of 3 new replicas have been updated...
deployment "my-nginx-app" successfully rolled out
kubectl get pods -l app=nginx -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
Expected Output:
NAME IMAGE
my-nginx-app-79888998c-abcde nginx:1.19.0
my-nginx-app-79888998c-fghij nginx:1.19.0
my-nginx-app-79888998c-klmno nginx:1.19.0
6. Pausing and Resuming a Rollout
Sometimes, you might need to temporarily halt a rolling update to investigate an issue or perform manual steps. Kubernetes allows you to pause and resume rollouts. This is particularly useful in complex deployments or when integrating with external systems.
Let’s start another update (e.g., to nginx:1.22.0) and then immediately pause it.
kubectl set image deployment/my-nginx-app nginx=nginx:1.22.0
Expected Output:
deployment.apps/my-nginx-app image updated
Now, pause the rollout:
kubectl rollout pause deployment/my-nginx-app
Expected Output:
deployment.apps/my-nginx-app paused
Verify:
Check the rollout status. Because the update is paused partway through, the command reports partial progress and waits; you'll likely see a mix of old and new Pods, neither fully deployed nor fully reverted.
kubectl rollout status deployment/my-nginx-app
Expected Output (while paused):
Waiting for deployment "my-nginx-app" rollout to finish: 1 out of 3 new replicas have been updated...
You can also check the Pods. Depending on when you paused, you might see some nginx:1.19.0 and some nginx:1.22.0 Pods.
kubectl get pods -l app=nginx -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
Once you’ve addressed the issue or are ready to proceed, you can resume the rollout:
kubectl rollout resume deployment/my-nginx-app
Expected Output:
deployment.apps/my-nginx-app resumed
Verify:
The rollout will now continue until all Pods are updated to nginx:1.22.0.
kubectl rollout status deployment/my-nginx-app
Expected Output:
Waiting for deployment "my-nginx-app" rollout to finish: 1 out of 3 new replicas have been updated...
...
deployment "my-nginx-app" successfully rolled out
kubectl get pods -l app=nginx -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image
Expected Output:
NAME IMAGE
my-nginx-app-xxxxxxxx-abcde nginx:1.22.0
my-nginx-app-xxxxxxxx-fghij nginx:1.22.0
my-nginx-app-xxxxxxxx-klmno nginx:1.22.0
Production Considerations
While rolling updates provide excellent zero-downtime capabilities, several factors need careful consideration for production environments:
- Liveness and Readiness Probes: These are paramount. A misconfigured readiness probe can route traffic to an unready Pod, causing errors, or stall the rollout because new Pods never become Ready. A faulty liveness probe can result in a Pod being repeatedly restarted, leading to a crash loop. Refer to the official Kubernetes documentation on probes for best practices.
- Pod Disruption Budgets (PDBs): For critical applications, especially those managed by Karpenter for cost optimization or during node maintenance, PDBs ensure that a minimum number of Pods remains available during voluntary disruptions. This works in conjunction with rolling updates to guarantee availability (a minimal example follows this list). Learn more about PDBs.
- Resource Limits and Requests: Proper resource allocation prevents Pods from being evicted or throttled during high load, which can be exacerbated during a rolling update when new Pods are spinning up.
- PreStop Hooks and Termination Grace Period: Implement preStop hooks to gracefully shut down your application. This allows your application to finish processing in-flight requests and clean up before the Pod is terminated, preventing data loss or client errors. Increase terminationGracePeriodSeconds if your application needs more time to shut down.
- Monitoring and Alerting: During a rollout, closely monitor key metrics like error rates, latency, and resource utilization. Set up alerts to notify you immediately if any anomalies occur. Tools like Prometheus and Grafana are essential here. For advanced eBPF-based observability, consider exploring eBPF Observability with Hubble.
- Traffic Management: For more sophisticated traffic routing, especially during blue/green or canary deployments, consider using a service mesh like Istio with Ambient Mesh or the Kubernetes Gateway API. These tools offer advanced control over traffic flow, allowing for finer-grained control than standard rolling updates.
- Immutable Deployments: Always ensure your container images are immutable. Never modify an image tag; instead, build a new image with a new tag for every change. This ensures reproducibility and simplifies rollbacks.
- Automated Testing: Integrate automated tests (unit, integration, end-to-end) into your CI/CD pipeline. These tests should run against the new version before, during, and after a deployment to catch regressions early.
- Security Considerations: Ensure your images are scanned for vulnerabilities. For enhanced supply chain security, integrate tools like Sigstore and Kyverno to verify image authenticity and enforce policies.
- Network Policies: While not directly related to rolling updates, robust Kubernetes Network Policies ensure that your application Pods can only communicate with authorized services, enhancing overall security, especially when new Pods are introduced during a rollout. For advanced networking, consider solutions like Cilium WireGuard Encryption.
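As referenced in the PDB bullet above, here is a minimal PodDisruptionBudget for the example app; the name and threshold are illustrative, so choose a minAvailable that matches your availability target:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-nginx-app-pdb
spec:
  minAvailable: 2        # keep at least 2 of the 3 replicas up during voluntary disruptions
  selector:
    matchLabels:
      app: nginx

Note that PDBs guard against voluntary disruptions such as node drains; the rolling update itself is governed by maxUnavailable and maxSurge.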
Troubleshooting
Here are some common issues encountered during rolling updates and their solutions:
- Stuck Rollout / Pending Pods:
  Problem: The rollout stalls, and new Pods remain in a Pending state, often preventing old Pods from terminating.
  Cause: Insufficient cluster resources (CPU, memory), incorrect node selectors/taints/tolerations, or issues with Persistent Volume Claims (PVCs).
  Solution:
  - Check Pod events: kubectl describe pod <pod-name> for scheduling issues.
  - Verify cluster resources: kubectl top nodes and kubectl describe nodes.
  - Ensure node selectors and tolerations are correctly configured.
  - If using PVCs, ensure storage classes are available and claims can be bound. You might need to scale up your cluster or free up resources.
- CrashLoopBackOff Pods:
  Problem: New Pods enter a CrashLoopBackOff state, indicating the application inside the container is failing to start.
  Cause: Application bugs, incorrect configuration (e.g., environment variables, mounted volumes), missing dependencies, or port conflicts.
  Solution:
  - Inspect Pod logs: kubectl logs <pod-name> to identify application errors.
  - Check Pod events: kubectl describe pod <pod-name> for container-level issues.
  - Verify your container image runs successfully locally.
  - Compare configuration between the old and new versions for discrepancies.
  - Roll back the deployment with kubectl rollout undo deployment/<deployment-name> while you debug.
- Readiness Probe Failure:
  Problem: New Pods start but never become Ready, preventing them from receiving traffic and the rollout from completing.
  Cause: The application takes too long to start, the readiness probe path/port is incorrect, or the application isn't serving traffic on the expected endpoint.
  Solution:
  - Increase initialDelaySeconds and periodSeconds in the readiness probe to give the application more time (see the sketch after this item).
  - Verify the readiness probe's path and port are correct and that the application exposes a healthy endpoint.
  - Check application logs for startup errors.
  - Exec into the Pod and try to access the readiness endpoint manually: kubectl exec -it <pod-name> -- curl localhost:<port>/<path>.
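As a rough sketch of the first suggestion (the numbers are illustrative; size them to your application's real startup time):

readinessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 15   # wait longer before the first check
  periodSeconds: 10         # probe less aggressively
  failureThreshold: 6       # tolerate a slow warm-up before marking the Pod unready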
- Application Errors During Rollout (5xx errors):
  Problem: Users experience errors during the rollout, even though Pods appear to be starting and terminating correctly.
  Cause: The application does not handle termination gracefully (no preStop hook), the readiness probe is too aggressive (marks a Pod ready before it can handle traffic), or the load balancer/Service is not properly updated.
  Solution:
  - Implement a preStop hook in your container to shut down gracefully (a minimal sketch follows this item).
  - Increase terminationGracePeriodSeconds for your Pods.
  - Refine your readiness probe to accurately reflect when the application is truly ready to serve traffic.
  - Ensure your Service is correctly configured to select only ready Pods.
  - Consider increasing maxUnavailable slightly if brief disruptions are acceptable for faster rollouts, or decreasing maxSurge for more cautious rollouts.
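A minimal graceful-shutdown sketch for the Nginx example; the sleep duration is an assumption, sized to however long your load balancer takes to stop sending traffic to a terminating Pod:

    spec:
      terminationGracePeriodSeconds: 60   # default is 30; raise it if shutdown takes longer
      containers:
      - name: nginx
        image: nginx:1.21.0
        lifecycle:
          preStop:
            exec:
              # Keep serving briefly so in-flight requests finish and endpoint
              # controllers have time to remove this Pod from rotation.
              command: ["sh", "-c", "sleep 10 && nginx -s quit"]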
- Rollout Stuck on "Waiting for deployment…":
  Problem: The kubectl rollout status command hangs indefinitely.
  Cause: Often related to Pods not becoming ready or old Pods not terminating. This can be a symptom of readiness probe failures or resource constraints.
  Solution:
  - Check for Pods in Pending or CrashLoopBackOff states (see the solutions above).
  - Examine events for the deployment: kubectl describe deployment <deployment-name>.
  - Verify that the new Pods are actually being created and the old ones are being deleted.
  - If the Deployment controller itself is unhealthy, check the controller-manager logs.
- Rollback Fails or Is Inconsistent:
  Problem: Attempting to roll back to a previous version doesn't work as expected, or the application remains unstable.
  Cause: The previous revision itself was faulty, external dependencies changed, or stateful data was corrupted.
  Solution:
  - Inspect the deployment history carefully: kubectl rollout history deployment/<deployment-name> --revision=<N> to ensure the target revision is indeed stable.
  - If the problem persists, try rolling back to an even earlier, known-good revision.
  - Consider whether the issue lies with external services (databases, caches) that were not rolled back or are incompatible with the older application version.
  - For stateful applications, rolling back requires careful consideration of data compatibility.
FAQ Section
- What's the difference between maxSurge and maxUnavailable?
  maxSurge specifies the maximum number of Pods that can be created *above* the desired number of replicas during an update. For example, if you have 3 replicas and maxSurge: 1, Kubernetes can temporarily run up to 4 Pods (3 old + 1 new, 2 old + 2 new, etc.). This lets new Pods become ready before old ones are terminated, minimizing downtime.
  maxUnavailable specifies the maximum number of Pods that can be *unavailable* (not ready) during an update. If you have 3 replicas and maxUnavailable: 1, at most 1 Pod can be down or not ready at any point, guaranteeing a minimum level of service availability.
  Both parameters work together to control the speed and safety of your rolling updates.
- Can I perform a Blue/Green or Canary deployment with just Kubernetes Deployments?
  While Deployments provide the foundation for rolling updates, they primarily support a "rolling" update strategy, which is a form of in-place update. For true Blue/Green or Canary deployments, you typically need additional tools such as a service mesh (Istio Ambient Mesh), an Ingress controller with advanced traffic-splitting capabilities, or the Kubernetes Gateway API. These tools let you direct a percentage of traffic to a new version (Canary) or switch all traffic to a completely new, parallel environment (Blue/Green).
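  As a sketch of the Gateway API approach, an HTTPRoute can split traffic by weight between two Services; the Gateway my-gateway and the Services my-app-v1/my-app-v2 here are assumptions:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-canary
spec:
  parentRefs:
  - name: my-gateway
  rules:
  - backendRefs:
    - name: my-app-v1    # stable version: 90% of traffic
      port: 80
      weight: 90
    - name: my-app-v2    # canary version: 10% of traffic
      port: 80
      weight: 10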
- How do I ensure my database migrations are handled during a rolling update?
  Database migrations are often the trickiest part of a zero-downtime deployment. It's crucial to separate application deployment from database schema changes. Best practice is to use an init container or a separate Kubernetes Job to run schema migrations. Ensure migrations are backward-compatible so that both the old and new application versions can run against the updated schema during the transition. For more complex scenarios, consider tools like Flyway or Liquibase.
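  A minimal sketch of the Job approach; the migration image and its command are hypothetical placeholders for whatever your project actually uses:

apiVersion: batch/v1
kind: Job
metadata:
  name: my-app-migrate-v2
spec:
  backoffLimit: 2                    # retry transient failures a couple of times
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrate
        image: my-registry/my-app-migrations:v2.0   # hypothetical migration image
        command: ["./migrate", "up"]                # hypothetical migration entrypoint

  Run the Job and gate the application rollout on its completion, for example with kubectl wait --for=condition=complete job/my-app-migrate-v2.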
- What happens if a rolling update fails midway?
  If a rolling update encounters issues (e.g., new Pods crash, readiness probes fail), Kubernetes will detect this and halt the rollout. The Deployment will remain in a "failed" state, with a mix of old and new Pods (or only old Pods if new ones never became ready). At this point, you can inspect the issue, fix your image/configuration, and re-trigger the update, or perform a rollback using kubectl rollout undo to revert to the last stable version.
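  The controller decides a rollout has failed via .spec.progressDeadlineSeconds (600 seconds by default): if no progress is made within that window, the Deployment's Progressing condition becomes False with reason ProgressDeadlineExceeded, and kubectl rollout status exits with a non-zero code. To fail fast, you can tighten it:

spec:
  progressDeadlineSeconds: 120   # report the rollout as failed after 2 minutes without progress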
- How many replicas should I have for zero-downtime updates?
  For truly zero-downtime updates, you should ideally have at least two replicas, and often more, depending on your maxUnavailable and maxSurge settings. If you have only one replica, maxUnavailable must be 0, which means maxSurge must be at least 1, temporarily creating two replicas. If maxUnavailable is 1 and you have only one replica, the service will experience downtime. A common recommendation is at least 3 replicas for critical services, to ensure high availability and smooth rolling updates.
Cleanup Commands
To remove the resources created during this guide:
kubectl delete deployment my-nginx-app
Expected Output:
deployment.apps "my-nginx-app" deleted
Next Steps / Further Reading
You’ve now mastered the fundamentals of Kubernetes rolling updates for zero-downtime deployments. To further enhance your deployment strategies and application resilience, consider exploring:
- Kubernetes Deployment Documentation: Dive deeper into all the configuration options for Deployments.
- Pod Disruption Budgets (PDBs): Learn how to protect your applications during voluntary disruptions.
- Kubernetes Gateway API vs Ingress: Understand how modern traffic management solutions can enable advanced deployment patterns like canary releases.
- Istio Ambient Mesh Production Guide: Explore service meshes for advanced traffic management, observability, and security features.
- Kubernetes Jobs: For one-off tasks like database migrations.
- Blue/Green Deployments on GKE: A cloud provider’s perspective on advanced deployment strategies.
- Running LLMs on Kubernetes: GPU Scheduling Best Practices: If you’re dealing with demanding AI workloads, optimizing resource scheduling is crucial for stable updates.
Conclusion
Kubernetes rolling updates are a cornerstone of modern, cloud-native application deployment. By intelligently orchestrating the replacement of old application Pods with new ones, Kubernetes empowers you to achieve seamless, zero-downtime deployments, keeping your services continuously available and your users happy. Mastering the configuration of maxSurge and maxUnavailable, coupled with robust readiness and liveness probes, is key to successful and stress-free releases. With the knowledge gained from this guide, you are well-equipped to manage your application deployments with confidence and efficiency in any Kubernetes environment. Continue to explore the rich ecosystem of Kubernetes tools and practices to further refine your deployment strategies and build highly resilient systems.