Introduction
In the dynamic world of cloud-native applications, optimizing resource utilization and managing costs are paramount for any organization running Kubernetes. While Kubernetes provides powerful orchestration capabilities, effectively managing the underlying infrastructure to balance performance and expenditure remains a significant challenge. Traditional cluster autoscalers often struggle with the intricacies of diverse workloads, leading to over-provisioning or slow scaling, both of which impact your bottom line.
Enter Karpenter, an open-source, high-performance Kubernetes cluster autoscaler originally built by AWS. Like the Kubernetes Cluster Autoscaler, Karpenter reacts to unschedulable pods. However, instead of resizing pre-defined node groups, it launches EC2 instances directly and chooses instance types that fit the pending pods' requirements. Combined with a deliberate mix of Spot and On-Demand instances, this provisioning model can substantially reduce what you pay to run Kubernetes. This guide walks you through using Karpenter to provision the right blend of Spot and On-Demand capacity for your workloads.
By the end of this tutorial, you'll know how to configure Karpenter to run fault-tolerant workloads on AWS EC2 Spot Instances, which are discounted but can be reclaimed by AWS with a two-minute notice. Critical services that should not be interrupted stay on On-Demand Instances. Orchestrated by Karpenter, this hybrid approach gives your applications the capacity they need, when they need it, at a lower cost than manual provisioning or static node groups. For more advanced cost optimization strategies, check out our guide on Karpenter Cost Optimization.
TL;DR: Karpenter Spot & On-Demand Mix
Optimize Kubernetes costs by using Karpenter to provision a mix of AWS Spot and On-Demand instances. Spot instances are much cheaper but can be interrupted with a two-minute notice. On-Demand instances are reliable but pricier. You define one Karpenter Provisioner per capacity type and steer workloads with node selectors, so fault-tolerant workloads land on Spot and critical ones on On-Demand. Karpenter then launches and consolidates the matching nodes, which leads to significant savings and better resource utilization.
Key Commands:
# Install Karpenter (if not already installed)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} \
--namespace karpenter --create-namespace \
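--set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \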
--set serviceAccount.name=karpenter \
--set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
--set settings.aws.clusterName=${CLUSTER_NAME} \
--set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
--wait # Wait for the installation to complete
# Create a Provisioner for Spot Instances (spot-provisioner.yaml is defined in Step 3)
envsubst < spot-provisioner.yaml | kubectl apply -f -
Prerequisites
Before diving into Karpenter's intelligent provisioning, ensure you have the following ready:
- A running Kubernetes cluster on AWS EKS: Karpenter is designed for AWS and integrates deeply with EC2. You can set up an EKS cluster using eksctl or the AWS console. Refer to the eksctl documentation for detailed instructions. The cluster needs some existing capacity, such as a managed node group or a Fargate profile, to run the Karpenter controller itself.
- kubectl installed and configured: Your kubectl command-line tool must be configured to connect to your EKS cluster.
- helm installed: We'll use Helm to install Karpenter. If you don't have it, follow the Helm installation guide.
- envsubst installed (part of GNU gettext): Used to fill shell variables such as ${CLUSTER_NAME} into the manifests before applying them.
- AWS CLI installed and configured: Ensure your AWS CLI is configured with credentials that have sufficient permissions to create IAM roles, policies, EC2 instances, and other AWS resources.
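- IAM permissions and an OIDC provider: Karpenter needs IAM permissions to launch and terminate EC2 instances (On-Demand and Spot), manage launch templates, and pass the node IAM role. We'll set these up in Step 1. The controller obtains these permissions through IAM Roles for Service Accounts, so the cluster must have an IAM OIDC provider associated with it. You can create one with, for example, eksctl utils associate-iam-oidc-provider --cluster <cluster-name> --approve.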
- Basic understanding of Kubernetes concepts: Familiarity with Deployments, Pods, Nodes, and Labels is assumed.
- Basic understanding of AWS EC2 Spot Instances: Knowing the benefits and limitations of Spot Instances will help you configure Karpenter effectively.
Step-by-Step Guide
1. Set Up Environment Variables and IAM Roles
First, we need to define some environment variables that will be used throughout the setup. Then, we'll create the necessary IAM roles and policies that Karpenter needs to operate within your AWS account. This step is crucial for Karpenter to be able to provision and manage EC2 instances on your behalf. We'll create an IAM Role for Karpenter itself and an Instance Profile for the nodes it provisions.
# Define environment variables
export CLUSTER_NAME="karpenter-spot-ondemand-demo"
export AWS_DEFAULT_REGION="us-east-1" # Or your desired region
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export CLUSTER_ENDPOINT=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.endpoint" --output text)
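export KARPENTER_VERSION="v0.32.0" # v0.32.x is the last release line that serves the v1alpha5 Provisioner and v1alpha1 AWSNodeTemplate APIs used in this guide; later releases use NodePool/EC2NodeClass instead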
# Create Karpenter Controller IAM Role and Policy
# This role grants Karpenter permissions to manage EC2 instances, launch templates, etc.
# The IAM OIDC provider ID is the cluster's issuer URL without the "https://" prefix.
# Compute it once; it already contains "oidc.eks.<region>.amazonaws.com/id/...".
export OIDC_PROVIDER=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
aws iam create-role --role-name KarpenterControllerRole-${CLUSTER_NAME} --assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::'${AWS_ACCOUNT_ID}':oidc-provider/'${OIDC_PROVIDER}'"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "'${OIDC_PROVIDER}':aud": "sts.amazonaws.com",
          "'${OIDC_PROVIDER}':sub": "system:serviceaccount:karpenter:karpenter"
        }
      }
    }
  ]
}'
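# Attach the Karpenter controller policy. There is no AWS-managed policy for Karpenter:
# KarpenterControllerPolicy-${CLUSTER_NAME} is a customer-managed policy you must create first,
# for example from the CloudFormation template in the Karpenter getting-started guide for your version.
aws iam attach-role-policy --role-name KarpenterControllerRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/KarpenterControllerPolicy-${CLUSTER_NAME}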
# Create an Instance Profile for Karpenter-provisioned nodes
# This profile grants necessary permissions to the nodes (e.g., join EKS cluster, pull images)
aws iam create-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME}
aws iam create-role --role-name KarpenterNodeRole-${CLUSTER_NAME} --assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "ec2.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}'
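aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
# Needed by the VPC CNI plugin unless aws-node runs with its own IRSA role
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
aws iam attach-role-policy --role-name KarpenterNodeRole-${CLUSTER_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore
aws iam add-role-to-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME} --role-name KarpenterNodeRole-${CLUSTER_NAME}
# Authorize nodes that use this role to join the cluster (adds a mapping to the aws-auth ConfigMap)
eksctl create iamidentitymapping --cluster ${CLUSTER_NAME} \
--arn arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME} \
--username "system:node:{{EC2PrivateDNSName}}" \
--group system:bootstrappers --group system:nodes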
# Tag your subnets for Karpenter discovery
# Karpenter needs to know which subnets to use for provisioning nodes.
# This tags the subnets the EKS cluster was created with; adjust the list if your worker nodes use other subnets.
export SUBNET_IDS=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.resourcesVpcConfig.subnetIds" --output text)
for SUBNET_ID in ${SUBNET_IDS}; do
  aws ec2 create-tags --resources ${SUBNET_ID} --tags Key="karpenter.sh/discovery/${CLUSTER_NAME}",Value="true"
done
# Tag the cluster security group for Karpenter discovery
# (required: the AWSNodeTemplate in Step 4 selects security groups by this tag)
export SECURITY_GROUP_IDS=$(aws eks describe-cluster --name ${CLUSTER_NAME} --query "cluster.resourcesVpcConfig.clusterSecurityGroupId" --output text)
for SG_ID in ${SECURITY_GROUP_IDS}; do
  aws ec2 create-tags --resources ${SG_ID} --tags Key="karpenter.sh/discovery/${CLUSTER_NAME}",Value="true"
done
Verify: You should see the IAM roles and instance profile created in your AWS console. The subnets and security groups should also have the karpenter.sh/discovery/${CLUSTER_NAME} tag. This tagging is essential for Karpenter to automatically discover the network resources it needs. For more details on IAM roles for service accounts, refer to the AWS EKS documentation.
aws iam get-role --role-name KarpenterControllerRole-${CLUSTER_NAME}
aws iam get-instance-profile --instance-profile-name KarpenterNodeInstanceProfile-${CLUSTER_NAME}
aws ec2 describe-subnets --filters Name=tag:karpenter.sh/discovery/${CLUSTER_NAME},Values=true --query "Subnets[*].Tags"
Expected Output (truncated):
# For get-role
{
"Role": {
"Path": "/",
"RoleName": "KarpenterControllerRole-karpenter-spot-ondemand-demo",
...
}
}
# For get-instance-profile
{
"InstanceProfile": {
"Path": "/",
"InstanceProfileName": "KarpenterNodeInstanceProfile-karpenter-spot-ondemand-demo",
...
}
}
# For describe-subnets
[
[
{
"Key": "karpenter.sh/discovery/karpenter-spot-ondemand-demo",
"Value": "true"
},
...
],
...
]
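The trust policy in this step depends on one derived string: the IAM OIDC provider identifier. This identifier is the cluster's issuer URL with the https:// scheme removed, and it already includes the oidc.eks.<region>.amazonaws.com/id/ part. The shell sketch below isolates that derivation so you can check it offline. The issuer value and account ID are made-up placeholders, and the function names are hypothetical helpers, not part of the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch: derive the IRSA trust-policy values from an EKS OIDC issuer URL.
# The issuer and account ID used below are hypothetical placeholders.
set -euo pipefail

# IAM identifies an OIDC provider by the issuer URL minus its "https://" scheme.
oidc_provider_from_issuer() {
  local issuer="$1"
  printf '%s\n' "${issuer#https://}"
}

# Value for "Principal": {"Federated": ...} in the trust policy.
federated_principal() {
  local account_id="$1" issuer="$2"
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' "$account_id" "$(oidc_provider_from_issuer "$issuer")"
}

# Condition key that pins the role to a single Kubernetes service account.
sub_condition_key() {
  printf '%s:sub\n' "$(oidc_provider_from_issuer "$1")"
}

ISSUER="https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
federated_principal 111122223333 "$ISSUER"
sub_condition_key "$ISSUER"
```

In a real setup, ISSUER comes from aws eks describe-cluster --query "cluster.identity.oidc.issuer". If you prepend oidc.eks.<region>.amazonaws.com/id/ to the stripped value a second time, you get a provider ARN that does not exist, and the role cannot be assumed.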
2. Install Karpenter
Now that the IAM roles and network tags are in place, we can install Karpenter into your EKS cluster using Helm. We'll configure it to use the IAM role and instance profile we just created, as well as specify the cluster name and endpoint for proper integration.
# Install Karpenter using Helm.
# The chart creates the "karpenter" service account and annotates it with the controller
# IAM role, so it exists before the controller pods start (required for --wait to succeed).
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version ${KARPENTER_VERSION} \
--namespace karpenter --create-namespace \
--set serviceAccount.name=karpenter \
--set "serviceAccount.annotations.eks\.amazonaws\.com/role-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterControllerRole-${CLUSTER_NAME}" \
--set settings.aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \
--set settings.aws.clusterName=${CLUSTER_NAME} \
--set settings.aws.clusterEndpoint=${CLUSTER_ENDPOINT} \
--wait # Wait for the installation to complete
Verify: Ensure that the Karpenter pods are running in the karpenter namespace. In this release line, the admission webhook runs inside the controller pod, so you should see the controller Deployment's pods (two replicas by default) rather than a separate webhook pod. For more on installing Karpenter, refer to the official Karpenter documentation.
kubectl get pods -n karpenter
Expected Output (pod name suffixes will differ):
NAME READY STATUS RESTARTS AGE
karpenter-<hash>-<id1> 1/1 Running 0 2m
karpenter-<hash>-<id2> 1/1 Running 0 2m
3. Create Karpenter Provisioners for Spot and On-Demand
This is the core of our cost optimization strategy. We will define two distinct Karpenter Provisioner resources (API version karpenter.sh/v1alpha5). One will provision only AWS EC2 Spot Instances, ideal for stateless, fault-tolerant workloads. The other will provision On-Demand Instances, suitable for critical applications that cannot tolerate interruptions. Karpenter then uses these definitions, together with each pod's node selector, to decide which kind of node to launch for a pending pod.
Notice the karpenter.sh/capacity-type requirement. This is how we instruct Karpenter to request either Spot or On-Demand capacity. The other requirements (instance category, family, size, vCPU count, and memory, which Karpenter expresses in MiB) narrow the candidate instance types. Karpenter only launches types that satisfy all of them at once, so they must not contradict each other. Consolidation is crucial for cost efficiency because it removes empty nodes and removes or replaces underutilized ones. In the v1alpha5 Provisioner API, consolidation.enabled and ttlSecondsAfterEmpty are mutually exclusive, so each provisioner should set only one of them.
# spot-provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"] # Compute optimized, general purpose, memory optimized
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["c5", "m5", "r5", "c6i", "m6i", "r6i"] # Example instance families
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn
      values: ["large"] # Exclude the 2-vCPU size; "xlarge" (4 vCPU) must stay allowed for the "4" below
    - key: karpenter.k8s.aws/instance-cpu
      operator: In
      values: ["4", "8", "16"] # Only provision nodes with 4, 8, or 16 vCPUs
    - key: karpenter.k8s.aws/instance-memory
      operator: In
      values: ["8192", "16384", "32768"] # Memory is in MiB: 8, 16, or 32 GiB
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"] # THIS IS THE KEY FOR SPOT INSTANCES
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["${AWS_DEFAULT_REGION}a", "${AWS_DEFAULT_REGION}b", "${AWS_DEFAULT_REGION}c"] # Filled in by envsubst
  limits:
    resources:
      cpu: "1000" # Max total CPU this provisioner may launch
      memory: "1000Gi" # Max total memory this provisioner may launch
  providerRef:
    name: default # Refers to the AWSNodeTemplate defined in Step 4
  # Consolidation also removes empty nodes; in v1alpha5 it cannot be combined with ttlSecondsAfterEmpty
  consolidation:
    enabled: true
---
# on-demand-provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: on-demand-provisioner
spec:
  requirements:
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["c5", "m5", "r5", "c6i", "m6i", "r6i"]
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn
      values: ["large"]
    - key: karpenter.k8s.aws/instance-cpu
      operator: In
      values: ["4", "8", "16"]
    - key: karpenter.k8s.aws/instance-memory
      operator: In
      values: ["8192", "16384", "32768"]
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["on-demand"] # THIS IS THE KEY FOR ON-DEMAND INSTANCES
    - key: topology.kubernetes.io/zone
      operator: In
      values: ["${AWS_DEFAULT_REGION}a", "${AWS_DEFAULT_REGION}b", "${AWS_DEFAULT_REGION}c"]
  limits:
    resources:
      cpu: "1000"
      memory: "1000Gi"
  providerRef:
    name: default
  # Consolidation only; ttlSecondsAfterEmpty cannot be set alongside it in v1alpha5
  consolidation:
    enabled: true
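Apply these provisioners, substituting ${AWS_DEFAULT_REGION} first (kubectl does not expand shell variables):
envsubst < spot-provisioner.yaml | kubectl apply -f -
envsubst < on-demand-provisioner.yaml | kubectl apply -f -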
Verify: You should see the two provisioners created in your cluster. You can inspect them using kubectl get provisioners. These resources tell Karpenter how to behave when it needs to provision new nodes.
kubectl get provisioners
Expected Output:
NAME AGE
on-demand-provisioner 1m
spot-provisioner 1m
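Karpenter launches only the instance types that satisfy every requirement of a Provisioner at the same time. Overlapping filters can therefore quietly shrink the candidate set, or empty it entirely. For example, allowing a 4-vCPU count while excluding the xlarge size that provides it leaves no 4-vCPU type to launch. The shell sketch below applies three filters to a hand-picked sample of c5/m5/r5 sizes: any size other than large; 4, 8, or 16 vCPUs; and 8, 16, or 32 GiB of memory, written in MiB. The sample table and the function names are illustrative assumptions. Karpenter itself evaluates the full EC2 catalog, including the c6i/m6i/r6i families.

```shell
#!/usr/bin/env bash
# Sketch: show which instance types survive a set of Karpenter-style requirements.
# The table is a small hand-picked sample (name, size, vCPU, memory in MiB);
# Karpenter itself evaluates the full EC2 catalog.
set -euo pipefail

instance_sample() {
  cat <<'EOF'
c5.large    large    2   4096
c5.xlarge   xlarge   4   8192
c5.2xlarge  2xlarge  8   16384
c5.4xlarge  4xlarge  16  32768
m5.large    large    2   8192
m5.xlarge   xlarge   4   16384
m5.2xlarge  2xlarge  8   32768
m5.4xlarge  4xlarge  16  65536
r5.large    large    2   16384
r5.xlarge   xlarge   4   32768
r5.2xlarge  2xlarge  8   65536
r5.4xlarge  4xlarge  16  131072
EOF
}

# Keep rows where ALL conditions hold: instance-size NotIn [large],
# instance-cpu In [4, 8, 16], instance-memory In [8192, 16384, 32768] (MiB).
matching_types() {
  instance_sample | awk '$2 != "large" && ($3 == 4 || $3 == 8 || $3 == 16) && ($4 == 8192 || $4 == 16384 || $4 == 32768) { print $1 }'
}

matching_types
```

In this sample, only six types survive: c5.xlarge, c5.2xlarge, c5.4xlarge, m5.xlarge, m5.2xlarge, and r5.xlarge. Most r5 sizes drop out because their memory exceeds 32 GiB. If a workload needs more memory per node, widen the instance-memory list rather than the category list.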
4. Create an AWSNodeTemplate
The AWSNodeTemplate custom resource (API version karpenter.k8s.aws/v1alpha1) defines the common AWS configuration for all nodes provisioned by Karpenter, regardless of whether they are Spot or On-Demand. This includes the AMI family, instance profile, subnets, security groups, and tags. Keeping these settings in one central AWSNodeTemplate avoids duplicating them across multiple provisioners. The ${CLUSTER_NAME} placeholders are ordinary shell variables that envsubst fills in before the manifest is applied. You do not need to write a bootstrap script. For the AL2 AMI family, Karpenter generates the user data that joins the node to the cluster (cluster name, API endpoint, and CA bundle) on its own.
# aws-node-template.yaml
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: default
spec:
  amiFamily: AL2 # Amazon Linux 2 (default)
  instanceProfile: KarpenterNodeInstanceProfile-${CLUSTER_NAME}
  securityGroupSelector:
    karpenter.sh/discovery/${CLUSTER_NAME}: "true" # Selects security groups tagged earlier
  subnetSelector:
    karpenter.sh/discovery/${CLUSTER_NAME}: "true" # Selects subnets tagged earlier
  tags:
    karpenter.sh/cluster-name: ${CLUSTER_NAME}
    Environment: Development # Example tag
  # No userData is needed for AL2: Karpenter generates the EKS bootstrap user data itself.
  # Custom MIME-formatted userData added here is merged with Karpenter's generated bootstrap,
  # so it should not call /etc/eks/bootstrap.sh or override the karpenter.sh/provisioner-name label.
Apply the AWSNodeTemplate:
# Replace placeholders in the YAML before applying
envsubst < aws-node-template.yaml | kubectl apply -f -
Verify: You can check the created AWSNodeTemplate. This resource is referenced by the Provisioners and provides the underlying AWS infrastructure configuration for the nodes.
kubectl get awsnodetemplates
Expected Output:
NAME AGE
default 1m
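The manifests in this guide contain ${...} placeholders. GNU envsubst replaces any unset variable with an empty string instead of failing, so a missing CLUSTER_NAME silently produces an empty instance profile name and selector. The sketch below is a hypothetical preflight helper (require_vars is not part of any tool) that refuses to continue unless every listed variable is non-empty.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: fail fast if any variable the manifests
# reference is unset or empty, instead of letting envsubst substitute "".
set -euo pipefail

require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable whose name is in $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage (kept as a comment so the sketch runs without a cluster):
#   require_vars CLUSTER_NAME AWS_DEFAULT_REGION && envsubst < aws-node-template.yaml | kubectl apply -f -
```

GNU envsubst also accepts a SHELL-FORMAT argument, for example envsubst '${CLUSTER_NAME}', which restricts substitution to the listed variables and leaves any other $ text in the file untouched.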
5. Deploy Applications with Node Selectors
Now, we'll deploy two different applications. One will be a "fault-tolerant" application, designed to run on cheaper Spot Instances. The other will be a "critical" application, requiring the stability of On-Demand Instances. We achieve this by adding a nodeSelector to our deployment specifications, pointing to the respective Karpenter provisioner names.
The nodeSelector: karpenter.sh/provisioner-name: spot-provisioner tells Karpenter to use the spot-provisioner we defined earlier when scheduling pods for the spot-app. Similarly, nodeSelector: karpenter.sh/provisioner-name: on-demand-provisioner directs the critical-app pods to nodes provisioned by the on-demand-provisioner. This is the mechanism by which we enforce our Spot/On-Demand strategy.
# spot-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spot-app
spec:
  replicas: 5
  selector:
    matchLabels:
      app: spot-app
  template:
    metadata:
      labels:
        app: spot-app
    spec:
      terminationGracePeriodSeconds: 30 # Keep shutdown well within the two-minute Spot interruption notice
      containers:
        - name: pause
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
      nodeSelector:
        karpenter.sh/provisioner-name: spot-provisioner # Target the Spot provisioner
---
# critical-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      containers:
        - name: nginx
          image: nginx
          resources:
            requests:
              cpu: 1000m
              memory: 1Gi
      nodeSelector:
        karpenter.sh/provisioner-name: on-demand-provisioner # Target the On-Demand provisioner
Apply these deployments:
kubectl apply -f spot-app.yaml
kubectl apply -f critical-app.yaml
Verify: Watch Karpenter provision new nodes and schedule your pods. New EC2 instances should appear in your AWS console, and your pods should move from Pending to Running. The key check is that Spot nodes are provisioned for spot-app and On-Demand nodes for critical-app. You can confirm this by inspecting the nodes' karpenter.sh/capacity-type label.
kubectl get pods -w
kubectl get nodes -l karpenter.sh/provisioner-name=spot-provisioner -o wide
kubectl get nodes -l karpenter.sh/provisioner-name=on-demand-provisioner -o wide
Expected Output (truncated):
# kubectl get pods -w
NAME READY STATUS RESTARTS AGE
spot-app-79c5c7d84-abcde 1/1 Running 0 10s
spot-app-79c5c7d84-fghij 1/1 Running 0 10s
critical-app-678f90a12-klmno 1/1 Running 0 5s
# kubectl get nodes -l karpenter.sh/provisioner-name=spot-provisioner -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-100-123.ec2.internal Ready <none> 1m v1.28.3 10.0.100.123 3.23.45.67 Amazon Linux 2 5.10.179-176.762.amzn2.x86_64 containerd://1.6.20
# kubectl get nodes -l karpenter.sh/provisioner-name=on-demand-provisioner -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
ip-10-0-101-234.ec2.internal Ready <none> 1m v1.28.3 10.0.101.234 4.56.78.90 Amazon Linux 2 5.10.179-176.762.amzn2.x86_64 containerd://1.6.20
You can further inspect the node labels to confirm the capacity type:
kubectl get node ip-10-0-100-123.ec2.internal -o jsonpath='{.metadata.labels.karpenter\.sh/capacity-type}{"\n"}'
kubectl get node ip-10-0-101-234.ec2.internal -o jsonpath='{.metadata.labels.karpenter\.sh/capacity-type}{"\n"}'
Expected Output:
spot
on-demand
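Checking nodes one at a time does not scale beyond a handful of replicas. The shell sketch below uses a hypothetical helper, check_placement, which joins two listings: pod to node, and node to capacity type. It reports any pod whose node lacks the expected capacity type. The kubectl commands in the comments show one way to produce those listings. The helper name and input format are assumptions made for this sketch, and the test data is made up.

```shell
#!/usr/bin/env bash
# Sketch: verify that every pod of a workload landed on nodes of the expected capacity type.
# check_placement is a hypothetical helper; its inputs are two whitespace-separated files:
#   nodes file: "<node-name> <capacity-type>", e.g. from
#     kubectl get nodes --no-headers -o custom-columns=NODE:.metadata.name,CAPACITY:.metadata.labels.karpenter\.sh/capacity-type
#   pods file:  "<pod-name> <node-name>", e.g. from
#     kubectl get pods -l app=spot-app --no-headers -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
set -euo pipefail

# Usage: check_placement <expected-capacity-type> <nodes-file> <pods-file>
# Prints each misplaced pod and returns non-zero if any pod is misplaced.
# Assumes the nodes file is non-empty (the NR == FNR idiom relies on it).
check_placement() {
  local expected="$1" nodes_file="$2" pods_file="$3"
  awk -v want="$expected" '
    NR == FNR { capacity[$1] = $2; next }   # first file: remember node -> capacity type
    {
      have = ($2 in capacity) ? capacity[$2] : "unknown"
      if (have != want) { print $1 " is on " $2 " (" have ")"; bad = 1 }
    }
    END { exit bad }
  ' "$nodes_file" "$pods_file"
}
```

A pod that is still Pending shows <none> as its node in the custom-columns output and is reported as unknown. This flags workloads that Karpenter has not yet found capacity for.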
Production Considerations
When moving your Karpenter Spot/On-Demand mix strategy to production, several key factors need careful attention to ensure stability, cost-effectiveness, and operational efficiency: