Master Kubernetes VPA Configuration

Introduction

In the dynamic world of Kubernetes, optimizing resource utilization is a perennial challenge. Workloads often exhibit fluctuating resource demands, making it difficult to set static CPU and memory requests and limits effectively. Over-provisioning leads to wasted resources and increased cloud costs, while under-provisioning results in performance degradation, application instability, and even outages. This delicate balancing act is where the Kubernetes Vertical Pod Autoscaler (VPA) steps in as a powerful ally, offering a sophisticated solution to automatically adjust container resource requests based on historical usage.

The Horizontal Pod Autoscaler (HPA) scales pods horizontally by adding or removing replicas, but it doesn’t address the individual resource needs of each pod. VPA, on the other hand, focuses on vertically scaling the resources (CPU and memory) allocated to containers within a pod. By continuously monitoring actual resource consumption, VPA recommends and, optionally, enforces optimal resource requests, ensuring your applications have just enough resources to perform efficiently without breaking the bank. This guide will walk you through the intricacies of configuring and leveraging VPA to achieve significant improvements in resource efficiency and application performance within your Kubernetes clusters.

TL;DR: Kubernetes Vertical Pod Autoscaler (VPA) Configuration

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests for containers in a pod based on historical usage, optimizing resource utilization.

Key Commands:

  • Install VPA:
    git clone https://github.com/kubernetes/autoscaler.git
    cd autoscaler/vertical-pod-autoscaler
    ./hack/vpa-up.sh
  • Create a VPA resource:
    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
    spec:
      targetRef:
        apiVersion: "apps/v1"
        kind: Deployment
        name: my-app-deployment
      updatePolicy:
        updateMode: "Auto"
      resourcePolicy:
        containerPolicies:
          - containerName: '*'
            minAllowed:
              cpu: 100m
              memory: 50Mi
            maxAllowed:
              cpu: 2
              memory: 4Gi
            controlledResources: ["cpu", "memory"]
  • Check VPA recommendations:
    kubectl get vpa my-app-vpa -o yaml
  • Uninstall VPA:
    ./hack/vpa-down.sh

VPA supports four update modes: Off, Initial, Recreate, and Auto. Off only publishes recommendations, Initial applies them only at pod creation, and Recreate also evicts running pods to apply new values; Auto currently behaves the same as Recreate. Set sensible minAllowed and maxAllowed values in resourcePolicy to keep recommendations within known bounds.

Prerequisites

Before diving into VPA configuration, ensure you have the following:

  • A Kubernetes Cluster: A running Kubernetes cluster (v1.13 or higher is the historical baseline; check the compatibility table in the VPA README for the minimum version your VPA release supports). You can use Minikube, Kind, or any cloud provider’s managed Kubernetes service (EKS, GKE, AKS).
  • kubectl Command-Line Tool: Configured to interact with your cluster. Refer to the official Kubernetes documentation for installation instructions.
  • Metrics Server: VPA relies on the Metrics Server to collect resource utilization data. Ensure it’s installed and running in your cluster. You can check its status with kubectl get apiservice v1beta1.metrics.k8s.io. If it’s not running, install it using:
    kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
  • Basic Kubernetes Knowledge: Familiarity with Deployments, Pods, and resource requests/limits.
  • Git: To clone the VPA repository for installation.

Step-by-Step Guide: Kubernetes Vertical Pod Autoscaler Configuration

Step 1: Install the Vertical Pod Autoscaler

The VPA is not part of the core Kubernetes distribution and needs to be installed separately. It consists of several components: the VPA Recommender, VPA Updater, and VPA Admission Controller. The Recommender analyzes historical and real-time resource usage to propose optimal resource requests. The Updater then applies these recommendations by evicting and recreating pods with updated resource requests (in certain modes). The Admission Controller intercepts pod creation requests and injects the recommended resource requests before the pod is scheduled.

We’ll install VPA by cloning its official GitHub repository and using the provided deployment scripts. This ensures you get all necessary components configured correctly.

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Verify Installation

After running the script, verify that all VPA components are running in the kube-system namespace. You should see deployments for vpa-recommender, vpa-updater, and vpa-admission-controller.

kubectl get deployments -n kube-system | grep vpa
kubectl get pods -n kube-system | grep vpa

Expected Output (may vary slightly based on version):

vpa-admission-controller   1/1     1            1           5m
vpa-recommender            1/1     1            1           5m
vpa-updater                1/1     1            1           5m

vpa-admission-controller-7b9d6c7b-abcde   1/1     Running   0          5m
vpa-recommender-6b8c7d6b-fghij            1/1     Running   0          5m
vpa-updater-5c7d8e9f-klmno                1/1     Running   0          5m
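
Rather than grepping and eyeballing the output, you can wait for each rollout to finish. The helper below is a minimal sketch: the function name wait_for_vpa is made up for this guide, and it assumes the three Deployments live in kube-system under the names shown above.

```shell
# Wait until each VPA Deployment reports a completed rollout.
# Assumption: components are installed in kube-system, as in the step above.
wait_for_vpa() {
  for component in vpa-recommender vpa-updater vpa-admission-controller; do
    kubectl rollout status "deployment/${component}" \
      --namespace kube-system --timeout=120s || return 1
  done
}

# Usage: wait_for_vpa && echo "VPA components are ready"
```

The function stops at the first component that fails to become ready, which makes it usable as a gate in a setup script.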

Step 2: Deploy a Sample Application

To demonstrate VPA’s functionality, we need an application whose resource requests can be adjusted. We’ll deploy a simple Nginx deployment without any explicit resource requests or limits initially, allowing VPA to make its recommendations.

The manifest below creates three Nginx pods. Because it specifies no resource requests, the pods run without any unless a LimitRange in the namespace injects defaults; either way, VPA will observe actual usage and recommend values. For more advanced networking configurations for your applications, consider exploring topics like Kubernetes Network Policies to secure traffic or even Kubernetes Gateway API for modern ingress management.

# my-nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-deployment
  labels:
    app: my-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.19.0
        ports:
        - containerPort: 80
        # No resource requests/limits defined here initially
        # VPA will recommend and set them.
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-service
spec:
  selector:
    app: my-nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80

Save the manifest as my-nginx-deployment.yaml and apply it:

kubectl apply -f my-nginx-deployment.yaml

Verify Deployment

Ensure your Nginx pods are running. Note that their resource requests will likely be empty or default at this stage.

kubectl get pods -l app=my-nginx
kubectl describe pod $(kubectl get pods -l app=my-nginx -o jsonpath='{.items[0].metadata.name}') | grep -A 5 "Limits:"

Expected Output (pods running; the requests and limits below appear only if a LimitRange in the namespace injects defaults, otherwise the grep prints nothing):

my-nginx-deployment-7b8b7c8d-abcde   1/1     Running   0          2m
my-nginx-deployment-7b8b7c8d-fghij   1/1     Running   0          2m
my-nginx-deployment-7b8b7c8d-klmno   1/1     Running   0          2m

    Limits:
      cpu:     250m
      memory:  64Mi
    Requests:
      cpu:     250m
      memory:  64Mi
# Note: The above output might show default requests/limits injected by the cluster if no explicit ones are set.
# VPA will override these.
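
If grepping the describe output feels brittle, a custom-columns query lists the same fields for every pod at once. This is a sketch: show_pod_requests is a made-up helper name, and it reads only the first container of each pod, which is enough for the single-container Nginx pods used here.

```shell
# Print requests and limits of the first container in each matching pod.
# $1: label selector, for example app=my-nginx
show_pod_requests() {
  kubectl get pods -l "$1" -o custom-columns='NAME:.metadata.name,CPU_REQ:.spec.containers[0].resources.requests.cpu,MEM_REQ:.spec.containers[0].resources.requests.memory,CPU_LIM:.spec.containers[0].resources.limits.cpu,MEM_LIM:.spec.containers[0].resources.limits.memory'
}

# Usage: show_pod_requests app=my-nginx
```

Fields that are not set on a pod are typically printed as <none>, so the same command works before and after VPA has applied its values.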

Step 3: Create a Vertical Pod Autoscaler Resource

Now, we’ll define a VPA resource that targets our Nginx deployment. The VPA resource tells the VPA components which pods to monitor and how to apply resource recommendations. Key fields include targetRef to specify the target workload, updatePolicy to control how recommendations are applied, and resourcePolicy to set bounds and control which resources are managed.

The updateMode field is crucial:

  • Off: VPA only provides recommendations; it does not apply them.
  • Initial: VPA sets resource requests only when a pod is first created. It does not update existing pods.
  • Recreate: VPA sets resource requests at pod creation and also updates running pods by evicting and recreating them when their requests differ significantly from the recommendation. Evictions can cause temporary service disruptions.
  • Auto: VPA sets requests at pod creation and updates running pods using its preferred update mechanism. At present this is equivalent to Recreate. It is the most automated mode and suits production workloads that tolerate pod restarts.

The resourcePolicy allows you to define minimum and maximum allowed resources for containers, preventing VPA from recommending excessively low or high values. It’s good practice to set these bounds based on your application’s known requirements to avoid over-provisioning or under-provisioning. For instance, if your application has a known memory leak, a maxAllowed memory value stops VPA from raising the memory request ever higher to follow the leak.
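
The effect of minAllowed and maxAllowed can be pictured as a clamp applied to the raw recommendation (reported as uncappedTarget in the VPA status) to produce the final target. The snippet below is only an illustration of that arithmetic in millicores, using the 50m and 1 CPU bounds from the manifest in this step; clamp_millicores is a made-up helper, not part of VPA.

```shell
# Clamp a raw CPU recommendation (millicores) into [minAllowed, maxAllowed].
# $1: raw recommendation, $2: minAllowed, $3: maxAllowed
clamp_millicores() {
  if [ "$1" -lt "$2" ]; then
    echo "$2"
  elif [ "$1" -gt "$3" ]; then
    echo "$3"
  else
    echo "$1"
  fi
}

clamp_millicores 30 50 1000    # below minAllowed (50m): prints 50
clamp_millicores 100 50 1000   # within bounds: prints 100
clamp_millicores 1500 50 1000  # above maxAllowed (1 CPU): prints 1000
```

When target and uncappedTarget differ in the status output, one of the bounds is in effect and may be worth revisiting.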

# my-nginx-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: "Auto" # Or "Recreate", "Initial", "Off"
  resourcePolicy:
    containerPolicies:
      - containerName: 'nginx' # Target the 'nginx' container within the pod
        minAllowed:
          cpu: 50m
          memory: 20Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
        controlledResources: ["cpu", "memory"] # Explicitly control both CPU and Memory
      - containerName: 'istio-proxy' # Example for a sidecar, if you were using Istio
        mode: "Off" # Do not manage Istio proxy resources with this VPA
        controlledResources: ["cpu", "memory"]
  # Note: the v1 API selects pods through targetRef only; the label
  # selector field from the older v1beta1 API is not available here.

Save the manifest as my-nginx-vpa.yaml and apply it:

kubectl apply -f my-nginx-vpa.yaml

Verify VPA Creation

Check that the VPA resource has been created. It will take some time (a few minutes) for the VPA Recommender to gather metrics and provide recommendations.

kubectl get vpa my-nginx-vpa -o yaml

Expected Output (initial state, recommendations will appear later):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  creationTimestamp: "2023-10-27T10:00:00Z"
  name: my-nginx-vpa
  namespace: default
  resourceVersion: "12345"
  uid: a1b2c3d4-e5f6-7890-1234-567890abcdef
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: nginx
      controlledResources:
      - cpu
      - memory
      maxAllowed:
        cpu: "1"
        memory: 500Mi
      minAllowed:
        cpu: 50m
        memory: 20Mi
    - containerName: istio-proxy
      controlledResources:
      - cpu
      - memory
      mode: "Off"
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: Auto
status:
  conditions:
  - lastTransitionTime: "2023-10-27T10:00:00Z"
    message: Controller has not yet received metrics for the specified workload
    reason: NoMetrics
    status: "False"
    type: RecommendationProvided
  # Recommendations will appear here after some time
  # recommendation:
  #   containerRecommendations:
  #   - containerName: nginx
  #     target:
  #       cpu: 100m
  #       memory: 40Mi
  #     lowerBound:
  #       cpu: 80m
  #       memory: 30Mi
  #     upperBound:
  #       cpu: 150m
  #       memory: 60Mi
  #     uncappedTarget:
  #       cpu: 100m
  #       memory: 40Mi

Step 4: Observe VPA Recommendations and Actions

Once VPA has gathered enough metrics (typically a few minutes), it will start providing recommendations. If updateMode is set to Auto or Recreate, the VPA Updater will also evict pods whose requests differ enough from the recommendation, and the Deployment will create replacements with the new requests. You’ll see the old pods terminate and new pods, with new names, take their place.

To simulate some load on the Nginx pods, you can exec into one and run a simple command or use a load testing tool.

# Optional: Generate some load (e.g., in another terminal)
# Find one of your Nginx pod names
NGINX_POD=$(kubectl get pods -l app=my-nginx -o jsonpath='{.items[0].metadata.name}')

# Run a CPU-bound command inside the pod for 60 seconds.
# This is a simple example; real load-testing tools are better.
kubectl exec "$NGINX_POD" -- /bin/bash -c "timeout 60 yes > /dev/null"
# The command stops on its own after 60 seconds. kubectl reports a
# non-zero exit code because timeout had to terminate yes; that is expected.

Wait a few minutes, then check the VPA object again. You should now see the recommendation section populated.

kubectl get vpa my-nginx-vpa -o yaml

Expected Output (with recommendations):

# ... (previous output)
status:
  conditions:
  - lastTransitionTime: "2023-10-27T10:05:00Z"
    status: "True"
    type: RecommendationProvided
  # ... (other conditions)
  recommendation:
    containerRecommendations:
    - containerName: nginx
      target:
        cpu: 100m
        memory: 40Mi
      lowerBound:
        cpu: 80m
        memory: 30Mi
      upperBound:
        cpu: 150m
        memory: 60Mi
      uncappedTarget:
        cpu: 100m
        memory: 40Mi

Also, observe the pods. If updateMode is Auto or Recreate, VPA will have recreated your Nginx pods with the new resource requests. Check the pod descriptions to confirm.

kubectl describe pod $(kubectl get pods -l app=my-nginx -o jsonpath='{.items[0].metadata.name}') | grep -A 5 "Limits:"

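Expected Output (requests set by VPA; limits appear only if the pod had limits to begin with):

    Limits:
      cpu:     100m
      memory:  40Mi
    Requests:
      cpu:     100m
      memory:  40Mi

Notice that the Requests now reflect VPA’s target recommendation. VPA does not set limits to maxAllowed: maxAllowed only caps the recommended requests. With the default controlledValues of RequestsAndLimits, VPA scales any existing limit in proportion to the new request, preserving the original request-to-limit ratio; here the injected defaults had limits equal to requests, so the limits follow the requests. If the pod has no limits at all, none are added. If you are also running a service mesh such as Istio in sidecar mode, your pods will have sidecar containers. Remember to configure containerPolicies for them or set their mode to Off if VPA should not manage them.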

Step 5: Understanding VPA Modes and Policies

The behavior of VPA is heavily influenced by its configuration. Let’s delve deeper into the updatePolicy and resourcePolicy.

updatePolicy

This field determines how VPA applies its recommendations. The default is Auto.

  • Auto: VPA sets resource requests at pod creation and updates running pods by evicting and recreating them. This is the most hands-off approach but requires your application to handle restarts gracefully.
  • Recreate: VPA sets requests at pod creation and updates running pods by eviction. Auto currently behaves the same way; Recreate makes the eviction-based mechanism explicit.
  • Initial: VPA only sets resource requests during pod creation. It will not modify existing pods. Useful if you want VPA to provide a good starting point but prevent runtime changes.
  • Off: VPA calculates recommendations but does not apply them. The status.recommendation field will be populated, but pods won’t be modified. This mode is excellent for auditing and understanding resource usage patterns before enabling full automation.
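
# Example: VPA in 'Off' mode for observation
# Use this in place of my-nginx-vpa rather than alongside it: multiple VPA
# objects matching the same pods have undefined behavior.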
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-nginx-vpa-off
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: "Off" # Only recommend, do not apply
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 20Mi
        maxAllowed:
          cpu: 2
          memory: 1Gi

resourcePolicy

This policy allows fine-grained control over resource recommendations for specific containers within a pod. It’s an array of containerPolicies.

  • containerName: The name of the container to which this policy applies. Use * for all containers.
  • mode: Can be Auto (default) or Off. If set to Off for a specific container, VPA will not manage its resources. This is useful for sidecars or containers whose resources are managed externally.
  • controlledResources: An array specifying which resources VPA should manage (e.g., ["cpu", "memory"]).
  • minAllowed / maxAllowed: Define the lower and upper bounds for VPA’s recommendations. These are crucial for preventing VPA from setting requests too low (leading to OOMKills) or too high (leading to excessive costs).
  • controlledValues: Specifies whether VPA controls only requests (RequestsOnly) or both requests and limits (RequestsAndLimits, the default). With RequestsOnly, VPA modifies only requests and leaves limits as defined in the workload.
# Example: Advanced resourcePolicy
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa-advanced
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app-deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: 'main-app'
        minAllowed:
          cpu: 100m
          memory: 100Mi
        maxAllowed:
          cpu: 2 # 2 Cores
          memory: 4Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: "RequestsAndLimits" # VPA will manage both requests and limits
      - containerName: 'data-loader' # A container that only needs CPU, but not much memory
        minAllowed:
          cpu: 50m
        maxAllowed:
          cpu: 500m
          memory: 200Mi # Set a reasonable max memory even if not actively controlling
        controlledResources: ["cpu"] # Only control CPU for this container
        controlledValues: "RequestsOnly" # Only manage requests for this container
      - containerName: 'logging-sidecar'
        mode: "Off" # Do not manage this container's resources
        controlledResources: ["cpu", "memory"]

# Apply the advanced VPA (after changing targetRef to your app)
# kubectl apply -f my-app-vpa-advanced.yaml
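
When controlledValues is RequestsAndLimits, the VPA documentation describes limits as being scaled in proportion to the new request, so the request-to-limit ratio from the original pod spec is kept. The arithmetic below illustrates that idea with made-up numbers; scaled_limit is a hypothetical helper, and VPA's own rounding may differ slightly.

```shell
# Scale a limit so that the original request-to-limit ratio is preserved.
# $1: original request, $2: original limit, $3: new request (same unit)
scaled_limit() {
  echo $(( $3 * $2 / $1 ))
}

# Original spec: request 250m, limit 500m (ratio 1:2).
# VPA recommends a 100m request, so the limit becomes 200m.
scaled_limit 250 500 100   # prints 200
```

A container with no limit in its spec has no ratio to preserve, so no limit is derived for it.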

Step 6: Combining VPA with HPA (Carefully!)

VPA and HPA manage different aspects of scaling: VPA manages individual pod resources (vertical scaling), while HPA manages the number of pods (horizontal scaling). Using them together can lead to conflicts if not configured correctly, as both might try to manage CPU/memory, leading to a “thrashing” effect.
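
A small worked example shows where the conflict comes from. HPA computes CPU utilization as usage divided by the request, and derives the replica count as ceil(currentReplicas × currentUtilization / targetUtilization). The numbers below are invented for illustration, and the sketch ignores HPA's tolerance band and stabilization windows.

```shell
# Utilization in percent. $1: usage per pod (m), $2: request per pod (m)
utilization_pct() {
  echo $(( $1 * 100 / $2 ))
}

# Replica count HPA would aim for, rounded up.
# $1: current replicas, $2: current utilization %, $3: target utilization %
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# 3 replicas, each using 80m CPU, target utilization 50%.
utilization_pct 80 100    # request 100m: prints 80
desired_replicas 3 80 50  # prints 5, so HPA scales out

# VPA raises the request to 200m while usage stays at 80m.
utilization_pct 80 200    # prints 40
desired_replicas 3 40 50  # prints 3, so HPA no longer scales out
```

The same workload under the same load gets a different replica count purely because the request changed, which is how the two autoscalers end up undoing each other's decisions.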

The VPA project’s guidance is that VPA should not be used together with an HPA that scales on the same resource metric (CPU or memory). VPA changes the requests that HPA’s utilization percentages are computed against, so each autoscaler keeps invalidating the other’s decisions. The combination is considered safe when HPA scales on custom or external metrics instead, leaving CPU and memory requests to VPA.

For more robust combined scaling, consider external autoscalers or advanced scheduling solutions. For instance, tools like Karpenter Cost Optimization can dynamically provision nodes based on pod resource requests, complementing both HPA and VPA. If you’re dealing with demanding workloads like LLMs, understanding LLM GPU Scheduling Best Practices becomes critical, as VPA primarily focuses on CPU/memory and not specialized hardware.

Recommendation: If using VPA and HPA on the same workload:

  1. Use VPA in Initial mode to set optimal starting requests. HPA can then scale based on these stable requests.
  2. If using Auto or Recreate with an HPA that scales on CPU or memory, expect the two autoscalers to work against each other. It’s often better to let HPA manage replica counts based on VPA’s initial recommendations, or to drive HPA from custom or external metrics.
  3. For advanced scenarios, consult the known limitations listed in the VPA project documentation.

# Example: VPA in Initial mode with HPA
# First, update VPA to Initial mode
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-nginx-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-nginx-deployment
  updatePolicy:
    updateMode: "Initial" # VPA sets requests only at pod creation
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 50m
          memory: 20Mi
        maxAllowed:
          cpu: 1
          memory: 500Mi
---
# Then, create an HPA for the same deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-nginx-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # Target 50% CPU utilization based on VPA's requests

Save both manifests as my-nginx-vpa-hpa.yaml and apply them:

kubectl apply -f my-nginx-vpa-hpa.yaml

Verify Combined Setup

Check both VPA and HPA status. The VPA should be in Initial mode and the HPA should be active.

kubectl get vpa my-nginx-vpa -o yaml
kubectl get hpa my-nginx-hpa -o yaml

Expected Output (VPA in Initial, HPA active):

# ... VPA output showing updateMode: Initial ...
status:
  recommendation:
    containerRecommendations:
    - containerName: nginx
      target:
        cpu: 100m
        memory: 40Mi
      # ...

# ... HPA output
status:
  currentMetrics:
  - resource:
      name: cpu
      current:
        averageUtilization: 10
        averageValue: 10m
    type: Resource
  currentReplicas: 3
  desiredReplicas: 3
  lastScaleTime: "2023-10-27T10:15:00Z"
  # ...

Production Considerations

Deploying VPA in a production environment requires careful planning and consideration:

  1. Application Tolerance to Restarts: If using Auto or Recreate modes, your applications must be stateless or gracefully handle pod restarts (e.g., with proper termination grace periods, readiness/liveness probes, and graceful shutdown). For stateful applications, Initial or Off modes are often preferred.
  2. Resource Bounds: Always define maxAllowed in your resourcePolicy. This keeps VPA from raising requests without bound when an application misbehaves (for example, a memory leak), which could otherwise make pods unschedulable or crowd out other workloads. Similarly, minAllowed ensures recommendations never drop below the baseline your application needs.
  3. Monitoring and Alerting: Monitor VPA’s behavior closely. Track pod restarts, resource utilization changes, and VPA recommendations. Set up alerts for unexpected increases or decreases in resource requests. Integrate VPA metrics into your existing observability stack. For advanced observability, consider tools like eBPF Observability with Hubble to gain deeper insights into network and application performance.
  4. Interaction with HPA: Understand the implications of using VPA and HPA together. The two should not act on the same CPU or memory metric for one workload. Consider VPA in Initial mode, or an HPA driven by custom or external metrics, for workloads that need both.
  5. Node Capacity: VPA optimizes individual pod requests, but it doesn’t provision new nodes. Ensure your cluster has sufficient node capacity (or use a cluster autoscaler like Karpenter) to accommodate VPA’s recommendations, especially for memory.
  6. Rollout Strategy: Introduce VPA gradually. Start with Off mode to gather recommendations, then move to Initial for safe initial settings, and finally consider Auto for non-critical, restart-tolerant applications.
  7. Sidecars and Shared Resources: Be mindful of sidecar containers (e.g., Istio proxies, logging agents). You might want to exclude them from VPA management using containerPolicies with mode: Off, or configure specific policies for them, as their resource needs might be different or managed by the sidecar’s own control plane. For securing your software supply chain, tools like Sigstore and Kyverno can ensure that only trusted images are deployed, which can be critical for sidecars too.
  8. Cost Optimization: VPA directly contributes to cost savings by rightsizing resources. Combine it with cluster autoscalers and cloud provider cost management tools for maximum impact.
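
On restart tolerance: the VPA Updater evicts pods through the eviction API, which respects PodDisruptionBudgets, so a PDB is a simple way to cap how many replicas VPA can take down at once. The sketch below targets the sample Nginx deployment from this guide; the helper name apply_nginx_pdb, the PDB name my-nginx-pdb, and the minAvailable value are assumptions to adapt.

```shell
# Create a PodDisruptionBudget that keeps at least 2 Nginx pods available
# while VPA (or anything else using the eviction API) replaces pods.
apply_nginx_pdb() {
  kubectl apply -f - <<'EOF'
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-nginx-pdb   # hypothetical name
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-nginx
EOF
}

# Usage: apply_nginx_pdb
```

With three replicas and minAvailable: 2, evictions proceed one pod at a time; setting minAvailable equal to the replica count would block VPA updates entirely.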

Troubleshooting

Here are common issues you might encounter with VPA and their solutions:

  1. VPA recommendations are not showing up (NoMetrics status).

    Problem: The VPA status shows NoMetrics, and no recommendations appear.

    Solution: VPA relies on the Metrics Server to gather CPU and memory usage.

    1. Verify Metrics Server is installed and running:
      kubectl get apiservice v1beta1.metrics.k8s.io
      kubectl get pods -n kube-system -l k8s-app=metrics-server
    2. Check Metrics Server logs for errors:
      kubectl logs -n kube-system $(kubectl get pods -n kube-system -l k8s-app=metrics-server -o jsonpath='{.items[0].metadata.name}')
    3. Ensure your pods are generating some load so Metrics Server has data to report.
    4. Ensure the VPA Recommender pod is healthy:
      kubectl get pods -n kube-system -l app=vpa-recommender
  2. Pods are not restarting or updating with new recommendations.

    Problem: VPA shows recommendations, but pods are not being recreated or updated.

    Solution:

    1. Check the updateMode in your VPA object. If it’s Off or Initial, pods won’t be updated after creation. Change it to Auto or Recreate if you want dynamic updates.
      kubectl get vpa my-nginx-vpa -o yaml | grep updateMode
    2. Verify the VPA Updater pod is running and healthy:
      kubectl get pods -n kube-system -l app=vpa-updater
    3. Check VPA Updater logs for errors related to evicting or updating pods.
    4. Ensure there are no
