Introduction
In the dynamic world of Kubernetes, optimizing resource utilization is a perennial challenge. Workloads often exhibit fluctuating resource demands, making it difficult to set static CPU and memory requests and limits effectively. Over-provisioning leads to wasted resources and increased cloud costs, while under-provisioning results in performance degradation, application instability, and even outages. This delicate balancing act is where the Kubernetes Vertical Pod Autoscaler (VPA) steps in as a powerful ally, offering a sophisticated solution to automatically adjust container resource requests based on historical usage.
The Horizontal Pod Autoscaler (HPA) scales pods horizontally by adding or removing replicas, but it doesn’t address the individual resource needs of each pod. VPA, on the other hand, focuses on vertically scaling the resources (CPU and memory) allocated to containers within a pod. By continuously monitoring actual resource consumption, VPA recommends and, optionally, enforces optimal resource requests, ensuring your applications have just enough resources to perform efficiently without breaking the bank. This guide will walk you through the intricacies of configuring and leveraging VPA to achieve significant improvements in resource efficiency and application performance within your Kubernetes clusters.
TL;DR: Kubernetes Vertical Pod Autoscaler (VPA) Configuration
The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests for containers in a pod based on historical usage, optimizing resource utilization.
Key Commands:
- Install VPA:
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
- Create a VPA resource:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 50Mi
      maxAllowed:
        cpu: 2
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
- Check VPA recommendations:
kubectl get vpa my-app-vpa -o yaml
- Uninstall VPA:
./hack/vpa-down.sh
VPA can operate in four modes: Off, Initial, Recreate, and Auto. Use Off to collect recommendations without touching pods, Initial to set requests only at pod creation, and Auto for full automation. Recreate currently behaves like Auto, evicting pods to apply new requests. Remember to set proper minAllowed and maxAllowed in resourcePolicy to prevent runaway scaling.
Prerequisites
Before diving into VPA configuration, ensure you have the following:
- A Kubernetes Cluster: A running Kubernetes cluster recent enough for the VPA release you install. Each VPA release supports a specific range of Kubernetes versions, listed in the compatibility table in the VPA README. You can use Minikube, Kind, or any cloud provider's managed Kubernetes service (EKS, GKE, AKS).
- kubectl Command-Line Tool: Configured to interact with your cluster. Refer to the official Kubernetes documentation for installation instructions.
- Metrics Server: VPA relies on the Metrics Server to collect resource utilization data. Ensure it's installed and running in your cluster. You can check its status with:
kubectl get apiservice v1beta1.metrics.k8s.io
If it's not running, install it using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
- Basic Kubernetes Knowledge: Familiarity with Deployments, Pods, and resource requests/limits.
- Git: To clone the VPA repository for installation.
Step-by-Step Guide: Kubernetes Vertical Pod Autoscaler Configuration
Step 1: Install the Vertical Pod Autoscaler
The VPA is not part of the core Kubernetes distribution and needs to be installed separately. It consists of three components: the VPA Recommender, the VPA Updater, and the VPA Admission Controller.
- The Recommender analyzes historical and current resource usage to propose optimal resource requests.
- The Updater, in the modes that allow it, evicts pods whose requests are out of line with the recommendation. Their controller (for example, a Deployment) then recreates them.
- The Admission Controller intercepts pod creation requests and injects the recommended resource requests before the pod is scheduled. This is how both new and recreated pods receive updated values.
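As a rough mental model of the Recommender's output, the sketch below turns a list of usage samples into a lower bound, target, and upper bound. It takes percentiles of the samples and adds a safety margin. It is a heavy simplification under stated assumptions:
- The real Recommender uses decaying histograms and confidence-based adjustments.
- The 50th/90th/95th percentiles and the 15% margin are assumed values, loosely modeled on its defaults.
- `recommend` is a hypothetical helper, not part of any VPA API.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile (p in [0, 1]) of a non-empty sample list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered)))
    return ordered[rank - 1]


def recommend(samples: list[float], margin: float = 0.15) -> dict[str, float]:
    """Hypothetical, simplified VPA-style recommendation from usage samples.

    Assumed percentiles: lowerBound = 50th, target = 90th, upperBound = 95th,
    each inflated by a safety margin (default 15%).
    """
    if not samples:
        # Mirrors the period before VPA has any metrics to work with.
        raise ValueError("no usage samples yet")
    scale = 1.0 + margin
    return {
        "lowerBound": percentile(samples, 0.50) * scale,
        "target": percentile(samples, 0.90) * scale,
        "upperBound": percentile(samples, 0.95) * scale,
    }


if __name__ == "__main__":
    # Made-up CPU usage samples in millicores, for illustration only.
    cpu_millicores = [40, 55, 60, 62, 70, 75, 80, 85, 90, 120]
    print(recommend(cpu_millicores))
```

The target tracks a high percentile rather than the average. Occasional spikes therefore influence the recommendation, while a single outlier mostly moves only the upper bound.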
We’ll install VPA by cloning its official GitHub repository and using the provided deployment scripts. This ensures you get all necessary components configured correctly.
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
Verify Installation
After running the script, verify that all VPA components are running in the kube-system namespace. You should see deployments for vpa-recommender, vpa-updater, and vpa-admission-controller.
kubectl get deployments -n kube-system | grep vpa
kubectl get pods -n kube-system | grep vpa
Expected Output (may vary slightly based on version):
vpa-admission-controller 1/1 1 1 5m
vpa-recommender 1/1 1 1 5m
vpa-updater 1/1 1 1 5m
vpa-admission-controller-7b9d6c7b-abcde 1/1 Running 0 5m
vpa-recommender-6b8c7d6b-fghij 1/1 Running 0 5m
vpa-updater-5c7d8e9f-klmno 1/1 Running 0 5m
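If you want to script this check (for example, in CI), a small parser over the `kubectl get deployments -n kube-system | grep vpa` text is enough. This is a sketch that assumes the default column layout shown above (NAME, READY, UP-TO-DATE, AVAILABLE, AGE). `vpa_ready` is a hypothetical helper name.

```python
EXPECTED = ("vpa-admission-controller", "vpa-recommender", "vpa-updater")


def vpa_ready(kubectl_output: str) -> dict[str, bool]:
    """Map each expected VPA deployment to True if READY shows all replicas up.

    Assumes lines like 'vpa-recommender 1/1 1 1 5m' (NAME READY ... columns).
    Deployments missing from the output are reported as False.
    """
    status = {name: False for name in EXPECTED}
    for line in kubectl_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] not in status:
            continue
        ready, _, desired = fields[1].partition("/")
        # Ready only when e.g. "1/1": both numeric, equal, and at least one replica.
        status[fields[0]] = ready.isdigit() and desired.isdigit() and int(ready) == int(desired) > 0
    return status


if __name__ == "__main__":
    sample = """vpa-admission-controller 1/1 1 1 5m
vpa-recommender 1/1 1 1 5m
vpa-updater 0/1 1 0 5m"""
    print(vpa_ready(sample))
```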
Step 2: Deploy a Sample Application
To demonstrate VPA’s functionality, we need an application whose resource requests can be adjusted. We’ll deploy a simple Nginx deployment without any explicit resource requests or limits initially, allowing VPA to make its recommendations.
This deployment creates three Nginx pods. Because we haven't specified resource requests, the containers start with none at all, unless a LimitRange in the namespace injects default values. VPA will observe their actual usage and recommend requests. For more advanced networking configurations for your applications, consider exploring topics like Kubernetes Network Policies to secure traffic or even Kubernetes Gateway API for modern ingress management.
# my-nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
  labels:
    app: my-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19.0
        ports:
        - containerPort: 80
        # No resource requests/limits defined here initially
        # VPA will recommend and set them.
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-service
spec:
  selector:
    app: my-nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
kubectl apply -f my-nginx-deployment.yaml
Verify Deployment
Ensure your Nginx pods are running. Note that their resource requests will likely be empty or default at this stage.
kubectl get pods -l app=my-nginx
kubectl describe pod $(kubectl get pods -l app=my-nginx -o jsonpath='{.items[0].metadata.name}') | grep -A 5 "Limits:"
Expected Output (pods running; the Limits/Requests section appears only if a LimitRange in the namespace injects defaults, as assumed here):
my-nginx-deployment-7b8b7c8d-abcde 1/1 Running 0 2m
my-nginx-deployment-7b8b7c8d-fghij 1/1 Running 0 2m
my-nginx-deployment-7b8b7c8d-klmno 1/1 Running 0 2m
    Limits:
      cpu:     250m
      memory:  64Mi
    Requests:
      cpu:        250m
      memory:     64Mi
# Note: Without a LimitRange, the pod has no requests or limits and the grep prints nothing.
# VPA replaces the requests with its recommendation and, by default, scales any existing limits proportionally.
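Kubernetes expresses CPU in cores or millicores (`250m`) and memory with binary or decimal suffixes (`64Mi`, `1Gi`, `500M`). When comparing requests before and after VPA acts, it helps to normalize them to plain numbers. The sketch below covers only the common binary (Ki to Ti) and decimal (m, k to T) suffixes. It does not validate input the way Kubernetes does, and `parse_quantity` is a hypothetical helper.

```python
from decimal import Decimal

# Common suffixes only; the full Kubernetes quantity grammar has more forms.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as '250m', '2', or '64Mi' to a plain number.

    CPU results are in cores; memory results are in bytes.
    """
    q = q.strip()
    # Check two-letter suffixes first so 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)


if __name__ == "__main__":
    for raw in ("250m", "2", "64Mi", "1Gi", "500M"):
        print(raw, parse_quantity(raw))
```

`Decimal` is used instead of `float` so that values like `250m` compare exactly.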
Step 3: Create a Vertical Pod Autoscaler Resource
Now, we’ll define a VPA resource that targets our Nginx deployment. The VPA resource tells the VPA components which pods to monitor and how to apply resource recommendations. Key fields include targetRef to specify the target workload, updatePolicy to control how recommendations are applied, and resourcePolicy to set bounds and control which resources are managed.
The updateMode field is crucial:
- Off: VPA only provides recommendations; it does not apply them.
- Initial: VPA sets resource requests only when a pod is first created. It does not update existing pods.
- Recreate: VPA sets requests at pod creation and also updates existing pods by evicting them so they are recreated with new requests. This can cause temporary service disruptions.
- Auto: VPA sets requests at pod creation and updates existing pods with its preferred mechanism. In current releases, that mechanism is the same eviction used by Recreate. This is generally the most automated mode and suits production if your application can tolerate pod restarts.
The resourcePolicy allows you to define minimum and maximum allowed resources for containers, preventing VPA from recommending excessively low or high values. It's a good practice to set these bounds based on your application's known requirements to avoid over-provisioning or under-provisioning. For instance, if your application has a known memory leak, a maxAllowed memory value stops VPA from chasing the leak with ever-larger memory requests. To cap the memory the container can actually consume, you still need a memory limit.
# my-nginx-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: "Auto" # Or "Recreate", "Initial", "Off"
  resourcePolicy:
    containerPolicies:
    - containerName: 'nginx' # Target the 'nginx' container within the pod
      minAllowed:
        cpu: 50m
        memory: 20Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
      controlledResources: ["cpu", "memory"] # Explicitly control both CPU and Memory
    - containerName: 'istio-proxy' # Example for a sidecar, if you were using Istio
      mode: "Off" # Do not manage Istio proxy resources with this VPA
      controlledResources: ["cpu", "memory"]
  # Note: the label-based `selector` field from the older v1beta1 API is not
  # part of autoscaling.k8s.io/v1; targetRef is the way to select the workload.
kubectl apply -f my-nginx-vpa.yaml
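The minAllowed/maxAllowed bounds act as a clamp on what the Recommender computes. This is why a VPA's status can show an `uncappedTarget` (the raw recommendation) that differs from `target` (the value after the policy is applied). Below is a minimal sketch of that capping step. It uses millicores for CPU and MiB for memory to keep the numbers simple. `cap_target` is a hypothetical helper, and the real VPA applies further adjustments that this ignores.

```python
def cap_target(uncapped: dict[str, float],
               min_allowed: dict[str, float],
               max_allowed: dict[str, float]) -> dict[str, float]:
    """Clamp each resource in `uncapped` into [min_allowed, max_allowed].

    A resource missing from a bound is left unbounded on that side.
    """
    capped = {}
    for resource, value in uncapped.items():
        low = min_allowed.get(resource, float("-inf"))
        high = max_allowed.get(resource, float("inf"))
        capped[resource] = min(max(value, low), high)
    return capped


if __name__ == "__main__":
    # Bounds from my-nginx-vpa.yaml: cpu 50m..1 (1000m), memory 20Mi..500Mi.
    bounds_min = {"cpu": 50, "memory": 20}
    bounds_max = {"cpu": 1000, "memory": 500}
    # Made-up raw recommendation: CPU below the floor, memory above the ceiling.
    print(cap_target({"cpu": 10, "memory": 800}, bounds_min, bounds_max))
    # {'cpu': 50, 'memory': 500}
```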
Verify VPA Creation
Check that the VPA resource has been created. It will take some time (a few minutes) for the VPA Recommender to gather metrics and provide recommendations.
kubectl get vpa my-nginx-vpa -o yaml
Expected Output (initial state, illustrative: condition types and messages vary between VPA versions, and recommendations will appear later):
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  creationTimestamp: "2023-10-27T10:00:00Z"
  name: my-nginx-vpa
  namespace: default
  resourceVersion: "12345"
  uid: a1b2c3d4-e5f6-7890-1234-567890abcdef
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      controlledResources:
      - cpu
      - memory
      maxAllowed:
        cpu: "1"
        memory: 500Mi
      minAllowed:
        cpu: 50m
        memory: 20Mi
    - containerName: istio-proxy
      controlledResources:
      - cpu
      - memory
      mode: "Off"
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2023-10-27T10:00:00Z"
    message: Controller has not yet received metrics for the specified workload
    reason: NoMetrics
    status: "False"
    type: RecommendationProvided
  - lastTransitionTime: "2023-10-27T10:00:00Z"
    message: Successfully restored VPA object from the checkpoint
    reason: CheckpointRestored
    status: "True"
    type: CheckpointRestored
  # Recommendations will appear here after some time
  # recommendation:
  #   containerRecommendations:
  #   - containerName: nginx
  #     target:
  #       cpu: 100m
  #       memory: 40Mi
  #     lowerBound:
  #       cpu: 80m
  #       memory: 30Mi
  #     upperBound:
  #       cpu: 150m
  #       memory: 60Mi
  #     uncappedTarget:
  #       cpu: 100m
  #       memory: 40Mi
Step 4: Observe VPA Recommendations and Actions
Once VPA has gathered enough metrics (typically a few minutes), it will start providing recommendations. If updateMode is set to Auto or Recreate, VPA will also evict pods so they are recreated with the recommended requests. You'll see the affected pods replaced: new pod names with a fresh AGE appear in kubectl get pods -l app=my-nginx.
To simulate some load on the Nginx pods, you can exec into one and run a CPU-bound command, or use a proper load testing tool.
# Optional: Generate some load (e.g., in another terminal)
# Find one of your Nginx pod names
NGINX_POD=$(kubectl get pods -l app=my-nginx -o jsonpath='{.items[0].metadata.name}')
# Burn CPU inside the pod for 60 seconds; `timeout` stops `yes` on its own,
# and `; true` keeps kubectl from reporting timeout's non-zero exit code.
# This is a crude example; real load testing tools are better.
kubectl exec "$NGINX_POD" -- sh -c 'timeout 60 yes > /dev/null; true'
Wait a few minutes, then check the VPA object again. You should now see the recommendation section populated.
kubectl get vpa my-nginx-vpa -o yaml
Expected Output (with recommendations; illustrative values):
# ... (previous output)
status:
  conditions:
  - lastTransitionTime: "2023-10-27T10:05:00Z"
    status: "True"
    type: RecommendationProvided
  # ... (other conditions)
  recommendation:
    containerRecommendations:
    - containerName: nginx
      target:
        cpu: 100m
        memory: 40Mi
      lowerBound:
        cpu: 80m
        memory: 30Mi
      upperBound:
        cpu: 150m
        memory: 60Mi
      uncappedTarget:
        cpu: 100m
        memory: 40Mi
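If you'd rather check for recommendations from a script than by reading YAML, you can request the object as JSON with `kubectl get vpa my-nginx-vpa -o json`. Then look for the `RecommendationProvided` condition and the `status.recommendation` field. Below is a standard-library sketch; `recommendation_state` is a hypothetical helper, and the sample input is a trimmed, made-up object.

```python
import json


def recommendation_state(vpa_json: str) -> tuple[bool, dict[str, dict[str, str]]]:
    """Return (recommendation_provided, {container: target}) for a VPA object.

    `recommendation_provided` reflects the RecommendationProvided condition;
    targets come from status.recommendation.containerRecommendations.
    """
    vpa = json.loads(vpa_json)
    status = vpa.get("status", {})
    provided = any(
        c.get("type") == "RecommendationProvided" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    targets = {
        rec["containerName"]: rec.get("target", {})
        for rec in status.get("recommendation", {}).get("containerRecommendations", [])
    }
    return provided, targets


if __name__ == "__main__":
    # Trimmed, made-up example of `kubectl get vpa my-nginx-vpa -o json` output.
    sample = json.dumps({"status": {
        "conditions": [{"type": "RecommendationProvided", "status": "True"}],
        "recommendation": {"containerRecommendations": [
            {"containerName": "nginx", "target": {"cpu": "100m", "memory": "40Mi"}}]},
    }})
    print(recommendation_state(sample))
```

An object that has not received metrics yet comes back as `(False, {})`, which makes this a convenient readiness gate before you look at pod requests.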
Also, observe the pods. If updateMode is Auto or Recreate, the Updater evicts pods whose current requests fall outside the recommended range (between lowerBound and upperBound). As the Deployment recreates them, the Admission Controller applies the new requests. Check the pod descriptions to confirm.
kubectl describe pod $(kubectl get pods -l app=my-nginx -o jsonpath='{.items[0].metadata.name}') | grep -A 2 "Requests:"
Expected Output (showing VPA-injected requests):
    Requests:
      cpu:        100m
      memory:     40Mi
Notice that the Requests now reflect VPA's target recommendation.

VPA does not set limits to maxAllowed. By default (controlledValues: RequestsAndLimits), it scales any existing limits proportionally to keep the original limit-to-request ratio. If a pod had no limits to begin with, none are added.

If you run a sidecar-based service mesh (for example, Istio in sidecar mode; Istio Ambient Mesh avoids per-pod sidecars), your pods will include extra containers. Remember to configure containerPolicies for them, or set their mode to Off if VPA should not manage them.
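The Updater does not evict on every change in the recommendation. Roughly, it checks whether a pod's current requests fall outside the recommended range between `lowerBound` and `upperBound`. Below is a simplified sketch of that check. The real Updater also weighs how large the change is, pod age, eviction rate limits, and PodDisruptionBudgets, none of which this models. `needs_update` is hypothetical, and values are in millicores and MiB.

```python
def needs_update(current: dict[str, float],
                 lower: dict[str, float],
                 upper: dict[str, float]) -> bool:
    """Return True if any resource's request lies outside [lower, upper].

    `lower` and `upper` are assumed to share the same keys. A missing request
    (no requests set at all) counts as out of range.
    Simplified: ignores pod age, change size, rate limits, and PDBs.
    """
    for resource in lower:
        request = current.get(resource)
        if request is None or not (lower[resource] <= request <= upper[resource]):
            return True
    return False


if __name__ == "__main__":
    # Bounds from the example recommendation above.
    lower = {"cpu": 80, "memory": 30}
    upper = {"cpu": 150, "memory": 60}
    print(needs_update({}, lower, upper))                           # True: no requests set
    print(needs_update({"cpu": 100, "memory": 40}, lower, upper))  # False: already in range
    print(needs_update({"cpu": 250, "memory": 64}, lower, upper))  # True: above upperBound
```

A pod whose requests already sit inside the range is left alone. That is one reason a populated recommendation does not always trigger an immediate restart.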
Step 5: Understanding VPA Modes and Policies
The behavior of VPA is heavily influenced by its configuration. Let’s delve deeper into the updatePolicy and resourcePolicy.
updatePolicy
This field determines how VPA applies its recommendations. The default is Auto.
- Auto: VPA assigns requests when pods are created and updates running pods by evicting them so they are recreated with new values. This is the most hands-off approach but requires your application to handle restarts gracefully. In current releases, Auto is equivalent to Recreate.
- Recreate: VPA assigns requests at pod creation and evicts running pods when their requests differ significantly from the recommendation. The VPA documentation recommends it only when you need pods restarted whenever requests change. Otherwise it suggests Auto, which can adopt restart-free update mechanisms as they become available.
- Initial: VPA only sets resource requests during pod creation. It will not modify existing pods. Useful if you want VPA to provide a good starting point but prevent runtime changes.
- Off: VPA calculates recommendations but does not apply them. The status.recommendation field will be populated, but pods won't be modified. This mode is excellent for auditing and understanding resource usage patterns before enabling full automation.
# Example: VPA in 'Off' mode for observation
# Use it instead of my-nginx-vpa, not alongside it: multiple VPA objects
# matching the same pods is not a supported configuration.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-nginx-vpa-off
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: "Off" # Only recommend, do not apply
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 20Mi
      maxAllowed:
        cpu: 2
        memory: 1Gi
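One way to keep the four modes straight is to ask two questions of each. Does VPA set requests when a pod is created? Does it evict running pods to apply new ones? The sketch below encodes those answers per the VPA documentation at the time of writing, where Auto is equivalent to Recreate. The `UPDATE_MODES` table and `behavior` helper are illustrative only, not part of VPA.

```python
# (sets requests at pod creation, evicts running pods to apply updates)
UPDATE_MODES = {
    "Off": (False, False),
    "Initial": (True, False),
    "Recreate": (True, True),
    "Auto": (True, True),  # currently behaves like Recreate
}


def behavior(mode: str) -> str:
    """Describe what a VPA updateMode does, according to the table above."""
    at_creation, evicts = UPDATE_MODES[mode]
    if not at_creation:
        return f"{mode}: recommendations only, pods untouched"
    if not evicts:
        return f"{mode}: requests set at pod creation only"
    return f"{mode}: requests set at creation and running pods evicted to apply updates"


if __name__ == "__main__":
    for m in UPDATE_MODES:
        print(behavior(m))
```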
resourcePolicy
This policy allows fine-grained control over resource recommendations for specific containers within a pod. It’s an array of containerPolicies.
- containerName: The name of the container to which this policy applies. Use * for all containers.
- mode: Can be Auto (default) or Off. If set to Off for a specific container, VPA will not manage its resources. This is useful for sidecars or containers whose resources are managed externally.
- controlledResources: An array specifying which resources VPA should manage (e.g., ["cpu", "memory"]).
- minAllowed/maxAllowed: Define the lower and upper bounds for VPA's recommendations. These are crucial for preventing VPA from setting requests too low or too high.
  - Too low leaves the container starved under contention, or OOMKilled if its memory limit is scaled down along with the request.
  - Too high leads to excessive costs and to pods that no longer fit on your nodes.
- controlledValues: Specifies whether VPA controls only requests (RequestsOnly) or both requests and limits (RequestsAndLimits). The default is RequestsAndLimits, in which case VPA scales limits proportionally to preserve the original limit-to-request ratio. If set to RequestsOnly, VPA modifies only requests and leaves limits as defined in the pod spec.
# Example: Advanced resourcePolicy
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-advanced
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: 'main-app'
      minAllowed:
        cpu: 100m
        memory: 100Mi
      maxAllowed:
        cpu: 2 # 2 Cores
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: "RequestsAndLimits" # VPA will manage both requests and limits
    - containerName: 'data-loader' # A container that only needs CPU, but not much memory
      minAllowed:
        cpu: 50m
      maxAllowed:
        cpu: 500m
        memory: 200Mi # Only takes effect if memory is added to controlledResources
      controlledResources: ["cpu"] # Only control CPU for this container
      controlledValues: "RequestsOnly" # Only manage requests for this container
    - containerName: 'logging-sidecar'
      mode: "Off" # Do not manage this container's resources
      controlledResources: ["cpu", "memory"]
# Apply the advanced VPA (after changing targetRef to your app)
# kubectl apply -f my-app-vpa-advanced.yaml
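To make the controlledValues behavior concrete, here is a minimal sketch of the limit arithmetic. With RequestsAndLimits, the original limit-to-request ratio is preserved. With RequestsOnly, the original limit is kept. Without an original limit, none is set. Quantities are plain positive numbers, `new_limit` is a hypothetical helper, and the real implementation handles edge cases this ignores.

```python
from typing import Optional


def new_limit(orig_request: float, orig_limit: Optional[float],
              new_request: float, controlled_values: str = "RequestsAndLimits") -> Optional[float]:
    """Return the limit VPA would pair with `new_request` (simplified).

    - No original limit: no limit is set.
    - RequestsOnly: the original limit is kept unchanged.
    - RequestsAndLimits: the original limit/request ratio is preserved.
    Assumes orig_request > 0.
    """
    if orig_limit is None:
        return None
    if controlled_values == "RequestsOnly":
        return orig_limit
    if controlled_values != "RequestsAndLimits":
        raise ValueError(f"unknown controlledValues: {controlled_values}")
    return new_request * (orig_limit / orig_request)


if __name__ == "__main__":
    # Original spec: request 100m, limit 200m (ratio 2). VPA recommends 300m.
    print(new_limit(100, 200, 300))                  # 600.0: ratio 2 preserved
    print(new_limit(100, 200, 300, "RequestsOnly"))  # 200: limit unchanged
    print(new_limit(100, None, 300))                 # None: no limit to scale
```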
Step 6: Combining VPA with HPA (Carefully!)
VPA and HPA manage different aspects of scaling: VPA manages individual pod resources (vertical scaling), while HPA manages the number of pods (horizontal scaling). Using them together can lead to conflicts if not configured correctly, as both might try to manage CPU/memory, leading to a “thrashing” effect.
The VPA documentation is explicit here: VPA should not be used together with an HPA that scales on the same resource metric (CPU or memory). If VPA raises a pod's CPU request, the HPA sees lower utilization for the same load and scales in. If VPA lowers it, the HPA scales out, and the two controllers can keep reacting to each other. Combining them is supported when they act on different signals. For example, VPA can manage memory while HPA scales on CPU, or HPA can scale on custom or external metrics.
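The conflict follows directly from the HPA's core formula, which the Kubernetes documentation gives as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). For resource utilization, the metric is measured against the pod's requests. Below is a sketch with made-up numbers that ignores HPA's tolerance band, replica bounds, and stabilization windows. `hpa_desired_replicas` is a hypothetical helper.

```python
import math


def hpa_desired_replicas(current_replicas: int, usage_per_pod_m: float,
                         request_per_pod_m: float, target_utilization_pct: float) -> int:
    """Core HPA formula: ceil(currentReplicas * currentUtilization / targetUtilization).

    Utilization is CPU usage divided by the pod's CPU request (both in
    millicores), so changing the request changes the answer even when the
    actual load is constant. Ignores HPA's tolerance band, min/max replicas,
    and stabilization windows.
    """
    current_utilization_pct = 100 * usage_per_pod_m / request_per_pod_m
    return math.ceil(current_replicas * current_utilization_pct / target_utilization_pct)


if __name__ == "__main__":
    # Same load in both cases: 4 pods, each using 80m CPU; HPA target is 50%.
    print(hpa_desired_replicas(4, 80, 100, 50))  # 100m request -> 80% utilization -> 7
    print(hpa_desired_replicas(4, 80, 200, 50))  # VPA raises request to 200m -> 40% -> 4
```

Nothing about the traffic changed between the two calls; only the request did. That is exactly the signal VPA manipulates.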
For more robust combined scaling, consider external autoscalers or advanced scheduling solutions. For instance, tools like Karpenter Cost Optimization can dynamically provision nodes based on pod resource requests, complementing both HPA and VPA. If you’re dealing with demanding workloads like LLMs, understanding LLM GPU Scheduling Best Practices becomes critical, as VPA primarily focuses on CPU/memory and not specialized hardware.
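Recommendation: If using VPA and HPA on the same workload:
- Use VPA in Initial mode to set optimal starting requests. HPA can then scale based on these requests. Pods created by later scale-outs receive the latest recommendation, so requests can still shift over time.
- Alternatively, split responsibilities as the VPA documentation suggests: let VPA control only memory (controlledResources: ["memory"]) while HPA scales on CPU, or have HPA scale on custom or external metrics.
- Avoid Auto or Recreate on a workload whose HPA scales on the same CPU or memory metric. It's usually better to let HPA manage replica counts starting from VPA's initial recommendations.
- For advanced scenarios, consult the known limitations section of the VPA documentation.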
# my-nginx-vpa-hpa.yaml
# Example: VPA in Initial mode with HPA
# First, update VPA to Initial mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: "Initial" # VPA sets requests only at pod creation
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 50m
        memory: 20Mi
      maxAllowed:
        cpu: 1
        memory: 500Mi
---
# Then, create an HPA for the same deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Target 50% CPU utilization, measured against the (VPA-set) requests
kubectl apply -f my-nginx-vpa-hpa.yaml
Verify Combined Setup
Check both VPA and HPA status. The VPA should be in Initial mode and the HPA should be active.
kubectl get vpa my-nginx-vpa -o yaml
kubectl get hpa my-nginx-hpa -o yaml
Expected Output (VPA in Initial, HPA active):
# ... VPA output showing updateMode: Initial ...
status:
  recommendation:
    containerRecommendations:
    - containerName: nginx
      target:
        cpu: 100m
        memory: 40Mi
  # ...
# ... HPA output
status:
  currentMetrics:
  - resource:
      name: cpu
      current:
        averageUtilization: 10
        averageValue: 10m
    type: Resource
  currentReplicas: 3
  desiredReplicas: 3
  lastScaleTime: "2023-10-27T10:15:00Z"
  # ...
Production Considerations
Deploying VPA in a production environment requires careful planning and consideration:
- Application Tolerance to Restarts: If using Auto or Recreate modes, your applications must be stateless or gracefully handle pod restarts (e.g., with proper termination grace periods, readiness/liveness probes, and graceful shutdown). For stateful applications, Initial or Off modes are often preferred.
- Resource Limits: Always define maxAllowed in your resourcePolicy. This stops VPA from chasing a misbehaving application's runaway consumption with ever-larger requests, protecting node capacity. Similarly, minAllowed ensures your applications always have a baseline amount of resources, preventing them from being starved.
- Monitoring and Alerting: Monitor VPA's behavior closely. Track pod restarts, resource utilization changes, and VPA recommendations. Set up alerts for unexpected increases or decreases in resource requests. Integrate VPA metrics into your existing observability stack. For advanced observability, consider tools like eBPF Observability with Hubble to gain deeper insights into network and application performance.
- Interaction with HPA: Understand the implications of using VPA and HPA together. The VPA documentation advises against letting both act on the same CPU or memory metric. For workloads also managed by HPA, consider VPA in Initial mode, or VPA on memory with HPA scaling on CPU.
- Node Capacity: VPA optimizes individual pod requests, but it doesn't provision new nodes. Ensure your cluster has sufficient node capacity (or use a cluster autoscaler like Karpenter) to accommodate VPA's recommendations, especially for memory. A pod whose requests fit on no node stays Pending.
- Rollout Strategy: Introduce VPA gradually. Start with Off mode to gather recommendations, then move to Initial for safe initial settings, and finally consider Auto for non-critical, restart-tolerant applications.
- Sidecars and Shared Resources: Be mindful of sidecar containers (e.g., Istio proxies, logging agents). You might want to exclude them from VPA management using containerPolicies with mode: Off, or configure specific policies for them, since their resource needs might be different or managed by the sidecar's own control plane. For securing your software supply chain, tools like Sigstore and Kyverno can ensure that only trusted images are deployed, which can be critical for sidecars too.
- Cost Optimization: VPA directly contributes to cost savings by rightsizing resources. Combine it with cluster autoscalers and cloud provider cost management tools for maximum impact.
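For the monitoring point above, one simple alerting rule is to flag containers whose VPA target differs from their current request by more than some threshold. This sketch assumes you have already extracted current requests and VPA targets into plain numbers (for example, from `kubectl get vpa -o json` and the pod spec). The 30% threshold and the `drifted` helper are arbitrary, hypothetical choices.

```python
def drifted(current: dict[str, float], target: dict[str, float],
            threshold: float = 0.30) -> dict[str, float]:
    """Return {resource: relative_change} for resources whose VPA target
    differs from the current request by more than `threshold` (0.30 = 30%).

    Resources with no (or zero) current request are reported as inf.
    """
    flagged = {}
    for resource, new in target.items():
        old = current.get(resource)
        if not old:
            flagged[resource] = float("inf")
            continue
        change = (new - old) / old
        if abs(change) > threshold:
            flagged[resource] = change
    return flagged


if __name__ == "__main__":
    # cpu in millicores, memory in MiB (made-up values)
    print(drifted({"cpu": 250, "memory": 64}, {"cpu": 100, "memory": 60}))
    # {'cpu': -0.6}
```

Reporting the signed change rather than a boolean lets an alert distinguish VPA shrinking a workload from VPA growing it.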
Troubleshooting
Here are common issues you might encounter with VPA and their solutions:
- VPA recommendations are not showing up (NoMetrics status).
Problem: The VPA status shows NoMetrics, and no recommendations appear.
Solution: VPA relies on the Metrics Server to gather CPU and memory usage.
  - Verify Metrics Server is installed and running:
kubectl get apiservice v1beta1.metrics.k8s.io
kubectl get pods -n kube-system -l k8s-app=metrics-server
  - Check Metrics Server logs for errors:
kubectl logs -n kube-system $(kubectl get pods -n kube-system -l k8s-app=metrics-server -o jsonpath='{.items[0].metadata.name}')
  - Ensure your pods are generating some load so Metrics Server has data to report.
  - Ensure the VPA Recommender pod is healthy:
kubectl get pods -n kube-system -l app=vpa-recommender
- Pods are not restarting or updating with new recommendations.
Problem: VPA shows recommendations, but pods are not being recreated or updated.
Solution:
  - Check the updateMode in your VPA object. If it's Off or Initial, pods won't be updated after creation. Change it to Auto or Recreate if you want dynamic updates.
kubectl get vpa my-nginx-vpa -o yaml | grep updateMode
  - Verify the VPA Updater pod is running and healthy:
kubectl get pods -n kube-system -l app=vpa-updater
  - Check VPA Updater logs for errors related to evicting or updating pods.
  - Ensure there are no