Introduction
In the dynamic world of Kubernetes, optimizing infrastructure costs while maintaining performance and availability is a perennial challenge. Traditional cluster autoscalers often struggle with the nuances of cloud provider pricing models, leading to either over-provisioning or slow scaling responses. This is particularly true when attempting to leverage the significant savings offered by AWS EC2 Spot Instances, which can reduce compute costs by up to 90% compared to on-demand instances, but come with the risk of interruption.
Enter Karpenter, an open-source, high-performance Kubernetes cluster autoscaler built by AWS. Unlike its predecessors, Karpenter directly interfaces with the cloud provider’s APIs to provision new nodes in response to unschedulable pods. Its intelligent consolidation and flexible provisioning capabilities make it an ideal tool for implementing sophisticated cost-optimization strategies, especially when mixing Spot and On-Demand instances. This guide will walk you through setting up Karpenter to intelligently manage a mixed Spot and On-Demand instance fleet, ensuring your workloads are always scheduled on the most cost-effective resources available.
By the end of this tutorial, you’ll have a robust, cost-optimized Kubernetes environment that dynamically scales to meet your application demands, leveraging the best of both Spot and On-Demand instances. We’ll dive deep into Karpenter’s configuration, exploring how to define NodePools and AWSNodeTemplates to achieve this optimal balance. For further insights into cost reduction, consider exploring our comprehensive guide on Karpenter Cost Optimization.
TL;DR: Karpenter Spot/On-Demand Mix
Leverage Karpenter to intelligently provision a mix of Spot and On-Demand instances for cost savings and reliability.
- Install Karpenter: Use Helm to deploy Karpenter into your cluster.
- Create IAM Roles: Set up necessary IAM roles for Karpenter and node instances.
- Define AWSNodeTemplate: Specify instance types, AMIs, and subnets.
- Configure NodePool: Prioritize Spot instances with fallback to On-Demand using the karpenter.sh/capacity-type requirement.
- Deploy Workload: Observe Karpenter provisioning nodes based on your defined strategy.
# Install Karpenter (after prerequisites)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="${KARPENTER_IAM_ROLE_ARN}" \
  --set settings.aws.clusterName="${CLUSTER_NAME}" \
  --set settings.aws.defaultInstanceProfile="${INSTANCE_PROFILE_NAME}" \
  --set settings.aws.interruptionQueue="${QUEUE_NAME}"
# Example AWSNodeTemplate (simplified)
kubectl apply -f - <<EOF
apiVersion: karpenter.k8s.aws/v1beta1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  subnetSelector:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelector:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
EOF
# Example NodePool (prioritizing Spot)
kubectl apply -f - <<EOF
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"] # Example categories; adjust to your workloads
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"] # Prioritize Spot implicitly
      kubelet:
        maxPods: 110 # Example value
  limits:
    cpu: "1000"
  disruption:
    consolidationPolicy: WhenEmpty
    expireAfter: 720h # Nodes expire after 30 days
EOF
# Deploy a sample workload (requests enough CPU to trigger scale-up)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 20
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"
EOF
Prerequisites
Before we begin, ensure you have the following:
- An Amazon EKS cluster running. Karpenter works best with EKS.
- kubectl configured to communicate with your EKS cluster.
- Helm 3 installed.
- AWS CLI v2 installed and configured with appropriate credentials.
- An IAM OIDC provider associated with your EKS cluster. This is standard for EKS and required for IRSA (IAM Roles for Service Accounts).
- Familiarity with Kubernetes concepts like Deployments, Pods, and Custom Resources.
Cluster and Account Setup
First, let's define some environment variables for convenience.
export CLUSTER_NAME="karpenter-spot-demo" # Replace with your EKS cluster name
export AWS_REGION="us-east-1" # Replace with your AWS region
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
echo "Cluster Name: $CLUSTER_NAME"
echo "AWS Region: $AWS_REGION"
echo "AWS Account ID: $ACCOUNT_ID"
Verify that your OIDC provider is set up; the describe-cluster command below prints its issuer URL. If one isn't associated yet, you can create it with eksctl:
aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text
# Expected output: https://oidc.eks.${AWS_REGION}.amazonaws.com/id/EXAMPLED999999999999999999999999999999
# If not present, create it:
# eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} --approve
Step-by-Step Guide
1. Create Karpenter IAM Roles and Instance Profile
Karpenter requires specific IAM permissions to interact with AWS services like EC2, EC2 Spot, and IAM. We'll create an IAM Role for Karpenter itself (for IRSA) and an Instance Profile that Karpenter will attach to the nodes it provisions. This Instance Profile grants permissions to the EC2 instances.
First, create an IAM role for the Karpenter controller. This role will be assumed by the Karpenter service account in your cluster.
# Create a trust policy for the Karpenter controller role
cat <<EOF > karpenter-trust-policy.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5):sub": "system:serviceaccount:karpenter:karpenter",
"oidc.eks.${AWS_REGION}.amazonaws.com/id/$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5):aud": "sts.amazonaws.com"
}
}
}
]
}
EOF
aws iam create-role --role-name KarpenterControllerRole-${CLUSTER_NAME} --assume-role-policy-document file://karpenter-trust-policy.json
# Attach the Karpenter controller policy.
# Note: there is no AWS-managed Karpenter policy. Create a customer-managed
# KarpenterControllerPolicy-${CLUSTER_NAME} first (e.g., from the CloudFormation
# template in the Karpenter getting-started guide), then attach it:
aws iam attach-role-policy --role-name KarpenterControllerRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::${ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}
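If you don't want to run the full getting-started CloudFormation template, here is a minimal, illustrative sketch of creating the controller policy yourself. The action list below is abbreviated for readability; the official template defines the complete, scoped-down version, so treat this as an assumption-laden starting point rather than a production policy.

```shell
# Create an abbreviated controller policy (illustrative permission set)
cat <<EOF > karpenter-controller-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateFleet", "ec2:CreateLaunchTemplate", "ec2:CreateTags",
        "ec2:DeleteLaunchTemplate", "ec2:RunInstances", "ec2:TerminateInstances",
        "ec2:Describe*", "pricing:GetProducts", "ssm:GetParameter",
        "sqs:DeleteMessage", "sqs:GetQueueAttributes", "sqs:GetQueueUrl",
        "sqs:ReceiveMessage", "eks:DescribeCluster"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::${ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}"
    }
  ]
}
EOF
aws iam create-policy --policy-name KarpenterControllerPolicy-${CLUSTER_NAME} \
  --policy-document file://karpenter-controller-policy.json
```

The iam:PassRole statement is what lets the controller hand the node role to EC2 instances it launches; scope it to the node role ARN rather than "*".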
Next, create the EC2 Instance Profile. Karpenter will use this to launch new instances. This profile needs permissions for EC2, ECR, and to register with the EKS cluster.
# Create the instance profile role
aws iam create-role --role-name KarpenterNodeRole-${CLUSTER_NAME} --assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}'
# Attach necessary policies
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore # Optional, for SSM access
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy # If using AWS VPC CNI
# Create the instance profile
aws iam create-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME}
# Add the role to the instance profile
aws iam add-role-to-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME} --role-name KarpenterNodeRole-${CLUSTER_NAME}
Verify
Confirm the roles and instance profile are created:
aws iam get-role --role-name KarpenterControllerRole-${CLUSTER_NAME} --query Role.Arn --output text
aws iam get-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME} --query InstanceProfile.Arn --output text
# Expected output (ARNs will vary):
# arn:aws:iam::123456789012:role/KarpenterControllerRole-karpenter-spot-demo
# arn:aws:iam::123456789012:instance-profile/KarpenterNodeInstanceProfile-karpenter-spot-demo
2. Install Karpenter with Helm
Now we'll install Karpenter into your cluster using Helm. We'll specify the IAM role for the service account and the instance profile for the nodes. Karpenter will also need to know your cluster name and an SQS queue for interruption events.
First, create an SQS queue for Spot interruption and health events. This allows Karpenter to gracefully drain nodes before they are interrupted or terminated.
aws sqs create-queue --queue-name karpenter-${CLUSTER_NAME}
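The queue by itself receives nothing: EventBridge rules must forward interruption events into it. A minimal sketch for the Spot interruption warning follows (rule and target names here are illustrative; the getting-started setup also adds rules for rebalance recommendations, scheduled changes, and instance state changes, plus an SQS queue policy allowing events.amazonaws.com to SendMessage):

```shell
# Look up the queue ARN, then route Spot interruption warnings to it
QUEUE_URL=$(aws sqs get-queue-url --queue-name karpenter-${CLUSTER_NAME} --query QueueUrl --output text)
QUEUE_ARN=$(aws sqs get-queue-attributes --queue-url ${QUEUE_URL} \
  --attribute-names QueueArn --query Attributes.QueueArn --output text)

aws events put-rule --name karpenter-${CLUSTER_NAME}-spot-interruption \
  --event-pattern '{"source": ["aws.ec2"], "detail-type": ["EC2 Spot Instance Interruption Warning"]}'

aws events put-targets --rule karpenter-${CLUSTER_NAME}-spot-interruption \
  --targets "Id"="karpenter-queue","Arn"="${QUEUE_ARN}"
```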
export KARPENTER_IAM_ROLE_ARN=$(aws iam get-role --role-name KarpenterControllerRole-${CLUSTER_NAME} --query Role.Arn --output text)
export INSTANCE_PROFILE_NAME=KarpenterNodeInstanceProfile-${CLUSTER_NAME}
export QUEUE_NAME=karpenter-${CLUSTER_NAME}
export KARPENTER_VERSION="0.33.0" # Check https://github.com/aws/karpenter-provider-aws/releases for the latest stable version
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} --namespace karpenter --create-namespace \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"="${KARPENTER_IAM_ROLE_ARN}" \
  --set settings.aws.clusterName="${CLUSTER_NAME}" \
  --set settings.aws.defaultInstanceProfile="${INSTANCE_PROFILE_NAME}" \
  --set settings.aws.interruptionQueue="${QUEUE_NAME}" \
  --wait
Verify
Check if Karpenter pods are running and the service account has the correct annotation:
kubectl get pods -n karpenter
kubectl get sa karpenter -n karpenter -o yaml | grep "eks.amazonaws.com/role-arn"
# Expected output for pods:
# NAME READY STATUS RESTARTS AGE
# karpenter-6789abcd-efghj 1/1 Running 0 2m
# Expected output for service account:
# eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/KarpenterControllerRole-karpenter-spot-demo
3. Configure AWSNodeTemplate
The AWSNodeTemplate custom resource defines the cloud provider-specific settings for the nodes Karpenter provisions. This includes things like AMI selection, subnets, and security groups. We'll create a generic template that Karpenter can use to provision instances into your EKS cluster.
Karpenter automatically discovers subnets and security groups tagged with karpenter.sh/discovery: ${CLUSTER_NAME}. Ensure your subnets and security groups are correctly tagged. You can do this manually or via eksctl during cluster creation.
# Tag your subnets (example for public subnets)
# Replace subnet-xxxxxxxxxxxxxxxxx with your actual subnet IDs
# aws ec2 create-tags --resources subnet-xxxxxxxxxxxxxxxxx subnet-yyyyyyyyyyyyyyyyy --tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}
# Tag your security groups (example for cluster security group)
# Replace sg-xxxxxxxxxxxxxxxxx with your actual security group ID
# aws ec2 create-tags --resources sg-xxxxxxxxxxxxxxxxx --tags Key=karpenter.sh/discovery,Value=${CLUSTER_NAME}
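Before applying the template, it's worth confirming the discovery tags are actually in place so Karpenter can find the resources:

```shell
# List subnets and security groups carrying the discovery tag
aws ec2 describe-subnets \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
  --query "Subnets[].SubnetId" --output text
aws ec2 describe-security-groups \
  --filters "Name=tag:karpenter.sh/discovery,Values=${CLUSTER_NAME}" \
  --query "SecurityGroups[].GroupId" --output text
```

If either command returns nothing, fix the tags before proceeding; missing discovery tags are a common cause of provisioning failures.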
# Create a default AWSNodeTemplate
kubectl apply -f - <<EOF
apiVersion: karpenter.k8s.aws/v1beta1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  subnetSelector:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelector:
    karpenter.sh/discovery: "${CLUSTER_NAME}"
EOF
Verify
Check if the AWSNodeTemplate is created:
kubectl get awsnodetemplate default -o yaml
# Expected output (truncated):
# apiVersion: karpenter.k8s.aws/v1beta1
# kind: AWSNodeTemplate
# metadata:
#   name: default
# spec:
#   amiFamily: Bottlerocket
#   securityGroupSelector:
#     karpenter.sh/discovery: karpenter-spot-demo
#   subnetSelector:
#     karpenter.sh/discovery: karpenter-spot-demo
4. Configure NodePool for Spot and On-Demand Mix
The NodePool custom resource is Karpenter's primary configuration object. It defines the constraints and preferences for node provisioning. Here, we'll configure a NodePool that prioritizes Spot instances while allowing Karpenter to fall back to On-Demand if Spot capacity is unavailable or insufficient.
The key to mixing Spot and On-Demand is the karpenter.sh/capacity-type requirement. By including both spot and on-demand in the values array, Karpenter will evaluate both options. Karpenter's internal logic is designed to prioritize the cheapest available options, which typically means Spot instances if they meet the other requirements. If Spot capacity is constrained or if a higher priority workload explicitly requests On-Demand, Karpenter will provision On-Demand instances.
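Conversely, when a workload must explicitly request On-Demand capacity (as mentioned above), a node selector on the capacity-type label pins it there. This is a pod template fragment, not a full manifest, and the surrounding deployment is assumed:

```yaml
# Deployment pod template fragment: force this workload onto On-Demand nodes
spec:
  template:
    spec:
      nodeSelector:
        karpenter.sh/capacity-type: on-demand
```

Pods without this selector remain eligible for both capacity types and will typically land on Spot.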
kubectl apply -f - <<EOF
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"] # Example categories; adjust to your workloads
  limits:
    cpu: "1000"
    memory: 2000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # Nodes expire after 30 days
EOF
Verify
Check if the NodePool is created and its status:
kubectl get nodepool default -o yaml
# Expected output (truncated):
# apiVersion: karpenter.sh/v1beta1
# kind: NodePool
# metadata:
#   name: default
# spec:
#   disruption:
#     consolidationPolicy: WhenUnderutilized
#     expireAfter: 720h0m0s
#   limits:
#     cpu: "1000"
#     memory: "2000Gi"
#   template:
#     spec:
#       nodeClassRef:
#         name: default
#       requirements:
#       - key: karpenter.sh/capacity-type
#         operator: In
#         values:
#         - spot
#         - on-demand
# ...
5. Deploy a Sample Workload and Observe Scaling
Now, let's deploy a sample workload that will require Karpenter to provision new nodes. We'll create a Deployment with many pods, exceeding the capacity of any existing nodes, thus triggering Karpenter.
Karpenter will detect the unschedulable pods and provision new nodes according to the NodePool configuration. You should see a mix of Spot and On-Demand instances being launched, prioritizing Spot if available and cost-effective.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 20
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: "1"
EOF
Verify
Monitor the pods and nodes. You should see new nodes being provisioned and pods transitioning from Pending to Running.
kubectl get pods -w # Watch pods until they are running
kubectl get nodes -l karpenter.sh/nodepool=default -o custom-columns='NAME:.metadata.name,CAPACITY_TYPE:.metadata.labels.karpenter\.sh/capacity-type,INSTANCE_TYPE:.metadata.labels.node\.kubernetes\.io/instance-type'
# Expected output (example):
# NAME CAPACITY_TYPE INSTANCE_TYPE
# ip-10-0-10-100.us-east-1.compute.internal spot c6i.large
# ip-10-0-20-200.us-east-1.compute.internal spot m5.xlarge
# ip-10-0-30-30.us-east-1.compute.internal on-demand r6i.large
You can also check the Karpenter logs for detailed provisioning decisions:
kubectl logs -f -n karpenter $(kubectl get pod -n karpenter -l app.kubernetes.io/name=karpenter -o name)
Look for messages indicating "launching new instance" and detailing the instance type and capacity type (Spot/On-Demand).
Production Considerations
Deploying Karpenter in a production environment requires careful planning beyond the basic setup:
- NodePool Segmentation: Instead of a single default NodePool, create multiple NodePools for different workload types (e.g., CPU-intensive, memory-intensive, GPU workloads). This allows for fine-grained control over instance types, AMIs, and scaling behavior. For example, you might have a NodePool exclusively for LLM GPU scheduling.
- Advanced AMI Selection: Beyond Bottlerocket, consider using custom AMIs for specific security hardening or pre-installed software. You can specify AMI IDs directly or use more complex selectors in your AWSNodeTemplate.
- Security Groups and Networking: Ensure your securityGroupSelector and subnetSelector in the AWSNodeTemplate are precise. For enhanced security, consider using Kubernetes Network Policies to restrict pod-to-pod communication. For secure pod-to-pod traffic, especially across different nodes, solutions like Cilium WireGuard encryption can be invaluable.
- Disruption Budgets and Pod Anti-Affinity: For critical applications, use Pod Disruption Budgets (PDBs) to ensure sufficient replicas remain available during node consolidation or Spot interruptions. Combine this with pod anti-affinity to spread critical pods across different nodes or availability zones.
- Monitoring and Alerting: Integrate Karpenter metrics with your existing monitoring solutions (Prometheus, Grafana). Monitor for provisioning failures, consolidation events, and Spot interruptions. eBPF observability tools such as Hubble can add deep insight into network and application performance.
- Cost Monitoring: Use AWS Cost Explorer or third-party tools to track the actual cost savings achieved by Karpenter's Spot instance usage. Regularly review your instance type selections.
- Node Expiration and Rolling Updates: The expireAfter setting in the NodePool is crucial for security and software updates. It ensures nodes are regularly replaced, allowing you to update the underlying AMI or Kubernetes version without manual intervention.
- Cluster Autoscaler Coexistence: If you're migrating from Cluster Autoscaler, ensure it's completely disabled for the node groups Karpenter manages to avoid conflicts.
- Pod Scheduling Constraints: Use node selectors, node affinity, and tolerations to guide Karpenter's provisioning decisions. For example, if a NodePool applies a specific taint to Spot instances, add a matching toleration to your deployment to ensure it lands on a Spot node.
- Service Mesh Integration: If you're using a service mesh like Istio Ambient Mesh, ensure that Karpenter-provisioned nodes are correctly configured to join the mesh and that sidecars (or ztunnels in Ambient Mesh) are automatically injected or configured.
- Gateway API Integration: If you are using the Kubernetes Gateway API for advanced traffic management, ensure your ingress controllers or gateways can properly utilize the dynamic nodes provisioned by Karpenter.
- Supply Chain Security: For critical workloads, consider integrating Karpenter with tools like Sigstore and Kyverno to enforce policies on container images and ensure their integrity on Karpenter-provisioned nodes.
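As a concrete example of the PDB recommendation above, the following keeps at least two replicas of a hypothetical critical-api workload available while Karpenter consolidates nodes or drains interrupted Spot instances (the name and label are placeholders for your own):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-api-pdb
spec:
  minAvailable: 2          # never voluntarily evict below two replicas
  selector:
    matchLabels:
      app: critical-api    # hypothetical label; match your deployment's pods
```

Note that an overly strict PDB (e.g., minAvailable equal to the replica count) will block consolidation entirely, so leave some eviction headroom.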
Troubleshooting
Here are some common issues you might encounter with Karpenter and their solutions:
- Pods stuck in Pending state, Karpenter not provisioning nodes.
  Explanation: This is the most common issue. Karpenter isn't detecting the need for new nodes or is failing to provision them.
  Solution:
  - Check Karpenter controller logs: kubectl logs -f -n karpenter $(kubectl get pod -n karpenter -l app.kubernetes.io/name=karpenter -o name). Look for errors related to API calls, permissions, or resource limits.
  - Verify NodePool and AWSNodeTemplate configurations: Ensure the requirements in the NodePool match your pod requests, and that the AMI settings, subnetSelector, and securityGroupSelector in the AWSNodeTemplate are correct and discoverable by Karpenter.
  - Check IAM permissions: Ensure the Karpenter controller role and node instance profile have all necessary permissions.
  - Cordoned/unschedulable nodes: Make sure your existing nodes aren't cordoned or marked unschedulable.
  - Resource limits: Check whether your NodePool limits block or cluster-wide quotas are preventing new nodes.
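To quickly surface the scheduling failures described above, listing recent FailedScheduling events is often the fastest first check:

```shell
# Show recent scheduling failures across all namespaces, newest last
kubectl get events --all-namespaces \
  --field-selector reason=FailedScheduling \
  --sort-by=.lastTimestamp
```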
- Karpenter provisions On-Demand instances instead of Spot, even when Spot is specified.
  Explanation: While you've specified spot in the karpenter.sh/capacity-type requirement, Karpenter might still opt for On-Demand under certain conditions.
  Solution:
  - Spot capacity unavailability: There might be no available Spot capacity for the requested instance types and zones at that moment. Check the Karpenter logs; they will usually indicate when Spot capacity couldn't be found.
  - Instance type constraints: Your instance-family, instance-size, or other requirements might be too restrictive, limiting Karpenter's ability to find suitable Spot instances. Broaden your instance selection if possible.
  - Pricing: In rare cases, an On-Demand instance might be cheaper than a specific Spot instance if your requirements are very broad and there's a temporary spike in Spot prices. Karpenter prioritizes cost.
  - Pod tolerations/node selectors: Ensure your pods don't have node selectors or tolerations that inadvertently push them away from potential Spot nodes.
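To check whether Spot pricing or capacity in your zones is the culprit, you can inspect recent Spot prices for candidate instance types directly (instance types below are examples; substitute your own):

```shell
# Recent Spot price history for a couple of candidate instance types
aws ec2 describe-spot-price-history \
  --instance-types c6i.large m5.xlarge \
  --product-descriptions "Linux/UNIX" \
  --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
  --query "SpotPriceHistory[].[InstanceType,AvailabilityZone,SpotPrice]" \
  --output table
```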
- Nodes are provisioned but pods remain Pending or crash.
  Explanation: The node itself might not be healthy or correctly configured for your workload.
  Solution:
  - Check node status: kubectl describe node <node-name>. Look for taints, events, or conditions that indicate issues.
  - Review the AMI: Ensure the AMI specified (or auto-selected) in your AWSNodeTemplate is compatible with your cluster version and workloads.
  - CRI, CNI, and kubelet: Verify that the container runtime (CRI), Container Network Interface (CNI), and kubelet are functioning correctly on the new nodes. Check node logs (e.g., via SSM or SSH).
  - Resource requests: Ensure pod resource requests do not exceed node capacity.
- Karpenter is not consolidating or terminating idle nodes.
  Explanation: Nodes are staying around longer than expected, leading to increased costs.
  Solution:
  - Disruption settings: Review the consolidationPolicy and expireAfter settings in your NodePool. WhenUnderutilized or WhenEmpty are good choices; expireAfter ensures nodes are eventually replaced.
  - Pod Disruption Budgets: Overly strict PDBs, or pods that cannot be evicted, can prevent Karpenter from draining and consolidating nodes; review them if consolidation stalls.