Introduction
The rise of microservices architecture has brought immense benefits in terms of scalability, resilience, and development velocity. However, managing the increasing complexity of inter-service communication, traffic management, security, and observability has become a significant challenge. This is where the concept of a service mesh shines, providing a dedicated infrastructure layer to handle these concerns transparently. Traditionally, service meshes like Istio have relied on the “sidecar proxy” model, injecting a separate container alongside each application pod to intercept and manage network traffic.
While effective, the sidecar model introduces its own set of overheads: increased resource consumption (CPU, memory), operational complexity (managing sidecar lifecycle, upgrades), and potential latency impacts. Enter Cilium Service Mesh, a groundbreaking approach that leverages the power of eBPF (extended Berkeley Packet Filter) to deliver a sidecar-less service mesh experience. By operating at the kernel level, Cilium can provide advanced networking, security, and observability features without the need for an additional proxy container, fundamentally changing how we think about service mesh deployments in Kubernetes.
This guide takes a deep dive into Cilium Service Mesh, exploring its architecture, benefits, and practical implementation. We’ll walk through setting up a Kubernetes cluster with Cilium, enabling its service mesh features, and demonstrating how to apply advanced traffic management and security policies without a single sidecar in sight. Prepare to revolutionize your Kubernetes networking with a truly lightweight and high-performance service mesh.
TL;DR: Cilium Sidecar-less Service Mesh
Cilium leverages eBPF to provide a high-performance, sidecar-less service mesh, eliminating the overhead of traditional proxy-based meshes. It offers advanced traffic management, security, and observability directly at the kernel level.
Key Takeaways:
- eBPF Powered: Moves service mesh logic into the kernel for efficiency.
- Sidecar-less: No proxy containers, reducing resource consumption and complexity.
- Unified Platform: Combines CNI, Network Policies, and Service Mesh.
- Advanced Features: Traffic management (L7 routing, load balancing), mutual TLS (mTLS), and rich observability.
Quick Commands:
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://api.github.com/repos/cilium/cilium-cli/releases/latest | grep -oP "v\d+\.\d+\.\d+" | head -n 1)
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz{,.sha256sum}
# Install Kind (if not already installed)
go install sigs.k8s.io/kind@v0.20.0
# Create a Kind cluster with Cilium Service Mesh enabled
kind create cluster --name cilium-mesh --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.0.0.0/16"
  serviceSubnet: "10.96.0.0/16"
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
Prerequisites
Before we embark on our journey with Cilium Service Mesh, ensure you have the following tools and knowledge:
- Kubernetes Fundamentals: A basic understanding of Kubernetes concepts like Pods, Services, Deployments, and Namespaces.
- kubectl: The Kubernetes command-line tool, configured to connect to your cluster. You can download it from the official Kubernetes documentation.
- Helm: The Kubernetes package manager, used for deploying Cilium. Install it by following the instructions on the Helm website.
- Kind (Kubernetes in Docker): We'll use Kind to create a local Kubernetes cluster for this tutorial. Install Kind by following the instructions on its official documentation page.
- Cilium CLI: The command-line interface for Cilium, essential for interacting with your Cilium installation and debugging.
- Go Language (for Kind installation): If you don't have Go installed, you'll need it to install Kind. Follow the instructions on the official Go website.
- Docker: Kind uses Docker to run Kubernetes nodes. Ensure Docker is installed and running on your system.
Having these tools ready will ensure a smooth experience throughout the tutorial.
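Before moving on, you can sanity-check that the required client tools are installed with a small helper script (a minimal sketch; the tool names match the prerequisite list above):

```shell
#!/bin/sh
# Report which of the required client tools are missing from PATH.
check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -z "$missing" ]; then
    echo "all tools present"
  else
    echo "missing:$missing"
  fi
}

check_tools kubectl helm kind docker cilium go
```

If anything is reported missing, install it from the links above before continuing.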
Step-by-Step Guide
Step 1: Install Cilium CLI and Kind
First, we need to install the Cilium CLI, which provides useful tools for managing and troubleshooting Cilium, and Kind, which will create our local Kubernetes cluster. The Cilium CLI is crucial for checking the status of your Cilium deployment and enabling its various features. Kind offers a lightweight way to spin up a multi-node Kubernetes cluster locally using Docker containers, making it ideal for development and testing environments.
We'll fetch the latest stable version of the Cilium CLI from GitHub and install it into our system's PATH. For Kind, we'll use go install to get the latest version. These steps ensure you have the necessary client-side tools before provisioning the cluster.
# Install Cilium CLI
CILIUM_CLI_VERSION=$(curl -s https://api.github.com/repos/cilium/cilium-cli/releases/latest | grep -oP "v\d+\.\d+\.\d+" | head -n 1)
echo "Installing Cilium CLI version: $CILIUM_CLI_VERSION"
curl -L --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-amd64.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-amd64.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-amd64.tar.gz /usr/local/bin
rm cilium-linux-amd64.tar.gz{,.sha256sum}
# Verify Cilium CLI installation
cilium version
# Install Kind (if not already installed)
echo "Installing Kind..."
go install sigs.k8s.io/kind@v0.20.0
# Verify Kind installation
kind version
Verify:
You should see version information for both cilium and kind.
cilium version
cilium: v0.15.0 compiled with go1.21.6 from 5534c0e
kind version
kind v0.20.0 go1.20.2 linux/amd64
Step 2: Create a Kind Cluster with Cilium Configuration
Next, we'll create a Kind Kubernetes cluster. A crucial step here is to disable the default CNI (Container Network Interface) that Kind would normally install. This is because Cilium will act as our CNI, and we want to ensure it's the sole network provider from the start. We'll also define specific pod and service subnets. This configuration prepares the cluster for a clean Cilium installation, where Cilium will manage all networking aspects, including the advanced service mesh features.
The Kind configuration specifies a control-plane node and two worker nodes, providing a more realistic environment for testing distributed applications and service mesh capabilities. By setting disableDefaultCNI: true, we explicitly tell Kind not to install its default networking plugin, making way for Cilium to take over completely.
kind create cluster --name cilium-mesh --config - <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
  podSubnet: "10.0.0.0/16"
  serviceSubnet: "10.96.0.0/16"
nodes:
- role: control-plane
- role: worker
- role: worker
EOF
Verify:
Confirm the cluster is running and your kubectl context is set correctly.
kubectl get nodes
NAME                        STATUS   ROLES           AGE   VERSION
cilium-mesh-control-plane   Ready    control-plane   2m    v1.28.3
cilium-mesh-worker          Ready    <none>          2m    v1.28.3
cilium-mesh-worker2         Ready    <none>          2m    v1.28.3
Step 3: Deploy Cilium with Service Mesh and Hubble
Now, it's time to deploy Cilium itself. We'll use Helm, the Kubernetes package manager, for this. The Helm chart for Cilium is highly configurable. For our service mesh setup, we'll enable serviceMesh.enabled=true. Additionally, we'll enable Hubble, Cilium's observability platform, by setting hubble.enabled=true and hubble.ui.enabled=true. Hubble is indispensable for gaining deep insights into network flows and service mesh traffic, helping you visualize and troubleshoot communication patterns. For more on advanced eBPF observability, check out our guide on eBPF Observability with Hubble.
We're also configuring kubeProxyReplacement=strict for optimal performance, ensuring Cilium handles all service load balancing. The k8sServiceHost and k8sServicePort parameters are essential for Kind clusters so Cilium can properly connect to the Kubernetes API server. These settings enable Cilium to fully take over network policy enforcement, load balancing, and service mesh functionalities directly within the kernel, leveraging eBPF for maximum efficiency.
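If you prefer keeping these settings under version control rather than on the command line, the same configuration can be expressed as a Helm values file (a sketch that simply mirrors the --set flags used in this step; substitute the API server host and port you extracted above):

```yaml
# values.yaml — mirrors the --set flags used in this step
kubeProxyReplacement: strict
k8sServiceHost: "<control-plane-internal-ip>"  # value of $K8S_SERVICE_HOST
k8sServicePort: 443                            # value of $K8S_SERVICE_PORT
ipam:
  mode: kubernetes
serviceMesh:
  enabled: true
hubble:
  enabled: true
  listenAddress: ":4244"
  ui:
    enabled: true
  relay:
    enabled: true
    listenAddress: ":4245"
externalIPs:
  enabled: true
nodePort:
  enabled: true
hostPort:
  enabled: true
```

You would then install with `helm install cilium cilium/cilium --version 1.15.4 --namespace kube-system -f values.yaml`.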
helm repo add cilium https://helm.cilium.io/
helm repo update
# Get Kubernetes API server details for Cilium configuration in Kind
K8S_SERVICE_HOST=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
K8S_SERVICE_PORT=$(kubectl get service kubernetes -n default -o jsonpath='{.spec.ports[0].port}')
helm install cilium cilium/cilium --version 1.15.4 \
--namespace kube-system \
--set kubeProxyReplacement=strict \
--set k8sServiceHost=${K8S_SERVICE_HOST} \
--set k8sServicePort=${K8S_SERVICE_PORT} \
--set ipam.mode=kubernetes \
--set serviceMesh.enabled=true \
--set hubble.enabled=true \
--set hubble.ui.enabled=true \
--set hubble.listenAddress=":4244" \
--set hubble.relay.enabled=true \
--set hubble.relay.listenAddress=":4245" \
--set externalIPs.enabled=true \
--set nodePort.enabled=true \
--set hostPort.enabled=true
Verify:
Wait for all Cilium components to become ready. This might take a few minutes.
cilium status --wait
    /¯¯\
 /¯¯\__/¯¯\    Cilium:           OK
 \__/¯¯\__/    Operator:         OK
 /¯¯\__/¯¯\    Hubble:           OK
 \__/¯¯\__/    ClusterMesh:      disabled
    \__/       Egress Gateway:   disabled
Deployment cilium-operator Desired: 1, Ready: 1/1, Available: 1/1
Deployment cilium-hubble-ui Desired: 1, Ready: 1/1, Available: 1/1
Deployment cilium-hubble-relay Desired: 1, Ready: 1/1, Available: 1/1
DaemonSet cilium Desired: 3, Ready: 3/3, Available: 3/3
Containers: cilium Running: 3
cilium-operator Running: 1
cilium-hubble-ui Running: 1
cilium-hubble-relay Running: 1
Cluster Pods: 3/3 managed by Cilium
Image versions cilium: v1.15.4
cilium-operator: v1.15.4
cilium-hubble-ui: v0.13.0
cilium-hubble-relay: v1.15.4
Step 4: Deploy a Sample Application
To demonstrate the service mesh capabilities, we need a sample application. We'll deploy a simple application consisting of a few microservices (tiefighter, xwing, deathstar) that communicate with each other. This application setup is commonly used in Cilium examples to showcase network policies and service mesh features. It provides a clear scenario for applying traffic management and security policies.
The application YAML defines several deployments and services, simulating a typical microservices environment where different components interact. Once deployed, these services will automatically be part of the Cilium Service Mesh, allowing us to apply policies without modifying application code or injecting sidecar proxies.
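For orientation, each service in the manifest follows the usual Deployment-plus-Service shape, roughly like this (a sketch only; the image is a placeholder and the labels are assumptions based on the pod listing in this step — apply the upstream URL for the real manifest):

```yaml
# Sketch of one of the sample services (labels assumed, image placeholder)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deathstar
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deathstar
  template:
    metadata:
      labels:
        app: deathstar
    spec:
      containers:
      - name: deathstar
        image: registry.example.com/deathstar:latest  # placeholder
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: deathstar
spec:
  selector:
    app: deathstar
  ports:
  - port: 80
    targetPort: 80
```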
kubectl apply -f https://raw.githubusercontent.com/cilium/cilium/1.15.4/examples/servicemesh/hubble-observe-l7/deployment.yaml
Verify:
Check that all application pods are running.
kubectl get pods -l app -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deathstar-7489564c7-j75w2 1/1 Running 0 2m4s 10.0.0.125 cilium-mesh-worker2
tiefighter-69c766b44c-28s8k 1/1 Running 0 2m4s 10.0.0.19 cilium-mesh-worker
xwing-79b84654c6-s2k9l 1/1 Running 0 2m4s 10.0.0.218 cilium-mesh-worker2
Step 5: Enable mTLS for a Namespace
One of the core features of a service mesh is mutual TLS (mTLS), which provides strong identity-based authentication and encryption for inter-service communication. With Cilium Service Mesh, enabling mTLS is as simple as annotating a namespace. Cilium then automatically handles certificate management and encryption for all traffic within that namespace, without requiring any sidecar proxies. This is a significant advantage over traditional service meshes, which often require extensive configuration for mTLS.
By annotating the default namespace, we instruct Cilium to enforce mTLS for all pods deployed within it. This means that any communication between pods in the default namespace will be encrypted and mutually authenticated at the kernel level, providing robust security without application changes. For a deeper dive into Kubernetes security, consider reading our Network Policies Security Guide.
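As an aside, recent Cilium releases (1.14+) also expose mutual authentication on a per-policy basis through an `authentication` field in CiliumNetworkPolicy. A sketch of that alternative form (labels assumed from the sample app):

```yaml
# Sketch: require mutual authentication for ingress to deathstar (Cilium >= 1.14)
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: deathstar-mutual-auth
spec:
  endpointSelector:
    matchLabels:
      app: deathstar
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: tiefighter
    authentication:
      mode: "required"
```

This scopes mutual authentication to specific flows rather than an entire namespace.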
kubectl annotate namespace default "cilium.io/servicemesh=enabled"
Verify:
Check the annotations on the namespace.
kubectl get namespace default -o jsonpath='{.metadata.annotations}' | grep servicemesh
"cilium.io/servicemesh":"enabled"
Step 6: Apply Layer 7 HTTP Policy
Cilium's eBPF-powered service mesh allows for advanced Layer 7 (application-layer) traffic policies without sidecars. This means you can define rules based on HTTP methods, paths, or headers. In this step, we'll create a CiliumNetworkPolicy that restricts access to the deathstar service. Specifically, we'll only allow GET requests to the /public path from tiefighter pods, demonstrating fine-grained control over application traffic.
This policy showcases the power of Cilium's L7 enforcement. It prevents unauthorized access to sensitive endpoints and ensures that only specific types of requests are allowed to pass. This level of control is typically found in traditional service meshes but is achieved directly in the kernel with Cilium, leading to lower latency and resource consumption. For more advanced traffic routing, you might also explore the Kubernetes Gateway API Migration Guide.
kubectl apply -f - <<EOF
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: deathstar-l7-policy
spec:
  description: "Allow only GET /public on deathstar from tiefighter"
  endpointSelector:
    matchLabels:
      app: deathstar  # labels assumed from the sample app's pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: tiefighter
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/public"
EOF
Verify:
Attempt to make a request that should be allowed and one that should be denied. You can exec into the tiefighter pod to test.
TIEFIGHTER_POD=$(kubectl get pods -l app=tiefighter -o jsonpath='{.items[0].metadata.name}')
echo "Allowed request (GET /public):"
kubectl exec -it $TIEFIGHTER_POD -- curl -s deathstar/public
echo ""
echo "Denied request (GET /private):"
kubectl exec -it $TIEFIGHTER_POD -- curl -s deathstar/private
echo ""
echo "Denied request (POST /public):"
kubectl exec -it $TIEFIGHTER_POD -- curl -X POST -s deathstar/public
Expected Output:
Allowed request (GET /public):
Hello from deathstar!
Denied request (GET /private):
Denied request (POST /public):
The missing payload for the denied requests indicates that Cilium successfully blocked them before they reached the application (with L7 rules, Cilium's in-kernel datapath hands the request to its embedded proxy, which typically answers denied requests with an "Access denied" error).
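The decision logic of the policy above can be summarized as a tiny shell function (purely illustrative; real enforcement happens in the kernel and Cilium's embedded proxy, not in your shell):

```shell
#!/bin/sh
# Illustrative model of the L7 rule: allow only GET /public, deny everything else.
policy_verdict() {
  method="$1"; path="$2"
  if [ "$method" = "GET" ] && [ "$path" = "/public" ]; then
    echo allowed
  else
    echo denied
  fi
}

policy_verdict GET /public    # allowed
policy_verdict GET /private   # denied
policy_verdict POST /public   # denied
```

Any request not explicitly matched by an http rule in the policy is dropped, which is exactly the behavior the curl tests above demonstrate.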
Step 7: Observe Traffic with Hubble UI
Hubble, Cilium's observability platform, provides deep visibility into network flows and policy enforcement. With Hubble UI, you can visualize service dependencies, troubleshoot connectivity issues, and see how your L7 policies are impacting traffic in real-time. This is a critical tool for understanding the behavior of your service mesh. For more on Hubble, refer to our eBPF Observability with Hubble guide.
By port-forwarding the Hubble UI service, we can access its web interface from our local machine. The UI will display a graphical representation of network connections, show details of allowed and denied traffic, and help confirm that our L7 policy is actively working. This visual feedback is invaluable for verifying complex network configurations.
# Port-forward Hubble UI
kubectl port-forward -n kube-system svc/hubble-ui 8080:80 &
# Open Hubble UI in your browser
echo "Open Hubble UI in your browser: http://localhost:8080"
Verify:
Open your web browser and navigate to http://localhost:8080. You should see the Hubble UI dashboard, displaying network flows between your application pods. You can filter by namespace, pod, or even L7 protocol to observe the effects of your policies.
Production Considerations
While Cilium Service Mesh offers compelling advantages, deploying it in production requires careful planning:
- eBPF Kernel Compatibility: Ensure your Kubernetes nodes are running a Linux kernel version that fully supports the eBPF features required by Cilium (typically 4.9+ for basic features, 5.4+ for advanced ones). Always check the Cilium system requirements for the version you're deploying.
- Resource Management: Although sidecar-less, Cilium still consumes resources on your nodes. Monitor CPU and memory usage of Cilium agents and the operator. Optimize eBPF map sizes and policy complexity to prevent resource contention.
- Observability Integration: Leverage Hubble for deep observability. Integrate Hubble's metrics and logs with your existing monitoring stack (Prometheus, Grafana, ELK) for comprehensive insights. Consider using eBPF Observability with Hubble for custom metrics.
- Security Best Practices:
- Implement strong Kubernetes Network Policies in conjunction with Cilium's L7 policies.
- Enable mTLS across all critical namespaces.
- Regularly audit Cilium policies and configurations. Consider integrating with tools like Kyverno for policy enforcement, similar to the concepts discussed in Securing Container Supply Chains with Sigstore and Kyverno.
- High Availability: Deploy Cilium Operator in a highly available configuration. Ensure your etcd or Kubernetes API server is robust.
- Upgrades: Plan for rolling upgrades of Cilium components. Always test upgrades in a staging environment first. Refer to the official Cilium upgrade documentation.
- Integration with Cloud Providers: If running on a cloud (AWS EKS, GCP GKE, Azure AKS), follow specific integration guides provided by Cilium for optimal performance and compatibility with cloud-native load balancers and services.
- Traffic Management Complexity: For very complex, multi-cluster, or multi-cloud traffic routing scenarios, evaluate if Cilium's current service mesh features meet all requirements. While powerful, some advanced scenarios might still benefit from supplementary tools or a more feature-rich service mesh like Istio, especially when combined with its Ambient Mesh for sidecar-less operation as discussed in Istio Ambient Mesh Production Guide.
- Cost Optimization: The sidecar-less approach inherently saves resources. Further optimize costs by right-sizing nodes and leveraging autoscaling solutions like Karpenter, as explored in Karpenter Cost Optimization.
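The kernel-compatibility point above can be pre-checked on each node with a small script (a sketch; the 5.4 threshold is illustrative, so always confirm against the Cilium system requirements for your release):

```shell
#!/bin/sh
# Check whether the running kernel meets a minimum major.minor version.
kernel_at_least() {
  want_major=${1%%.*}
  want_minor=${1#*.}
  have=$(uname -r | cut -d- -f1)          # e.g. "5.15.0-91-generic" -> "5.15.0"
  have_major=$(echo "$have" | cut -d. -f1)
  have_minor=$(echo "$have" | cut -d. -f2)
  [ "$have_major" -gt "$want_major" ] || {
    [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]
  }
}

if kernel_at_least 5.4; then
  echo "kernel OK for advanced eBPF features"
else
  echo "kernel may be too old; check Cilium system requirements"
fi
```

Running this as a DaemonSet init step (or via your node-provisioning tooling) catches incompatible nodes before Cilium agents start crash-looping.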
Troubleshooting
Here are some common issues you might encounter when working with Cilium Service Mesh and their solutions:
- Issue: Cilium pods are not starting or are in CrashLoopBackOff.
Solution: This often indicates a problem with the underlying kernel or required modules. Check the Cilium agent logs for specific error messages and verify kernel compatibility. Ensure CONFIG_BPF and related options are enabled in your kernel. A common cause in Kind is missing kube-proxy replacement requirements.
kubectl logs -n kube-system -l k8s-app=cilium --tail 100
kubectl describe pod -n kube-system -l k8s-app=cilium
Ensure your kernel version meets Cilium's requirements. For Kind, double-check the kubeProxyReplacement settings during installation.
- Issue: Application pods are not getting IP addresses or are in Pending state.
Solution: If Cilium is acting as the CNI, this means Cilium isn't properly assigning IPs. Check Cilium's status and logs. Ensure ipam.mode is correctly configured (e.g., kubernetes for Kubernetes-managed IPAM).
cilium status
kubectl logs -n kube-system -l k8s-app=cilium-operator --tail 100
Verify that the Cilium operator is healthy and has permissions to allocate IPs.
- Issue: L7 policies are not being enforced, or traffic is unexpectedly blocked/allowed.
Solution: Policy enforcement issues can be tricky. Use cilium policy get to verify the policy is correctly applied. Crucially, use Hubble to visualize traffic flows and see if packets are being dropped by a specific policy. The Hubble UI (http://localhost:8080 after port-forwarding) or CLI (cilium hubble observe) can pinpoint the exact policy causing the issue.
cilium policy get
cilium hubble observe --type L7 --last 5m
Ensure your endpointSelector and fromEndpoints match the labels of your pods precisely. Remember that CiliumNetworkPolicies are additive, and if no policy explicitly allows traffic, it is implicitly denied.
- Issue: mTLS is not working, or services cannot communicate after enabling it.
Solution: Verify that the namespace is correctly annotated for service mesh enablement. Check Cilium agent logs for mTLS-related errors. Ensure that all communicating pods are within the mTLS-enabled namespace or that appropriate policies are in place for cross-namespace communication if needed.
kubectl get namespace default -o jsonpath='{.metadata.annotations}'
kubectl logs -n kube-system -l k8s-app=cilium --tail 100 | grep "mTLS"
Sometimes, restarting application pods after enabling mTLS on a namespace can resolve initial connection issues as they pick up new network configurations.
- Issue: Hubble UI is not accessible or not showing any flows.
Solution: First, ensure the Hubble UI and Hubble Relay pods are running and healthy. Verify that you are correctly port-forwarding the hubble-ui service. If no flows appear, check if Hubble is enabled in your Cilium installation and if the Cilium agents are correctly configured to send flow data to Hubble Relay.
kubectl get pods -n kube-system -l k8s-app=hubble-ui
kubectl get pods -n kube-system -l k8s-app=hubble-relay
kubectl logs -n kube-system -l k8s-app=cilium --tail 100 | grep "hubble"
Confirm that hubble.enabled=true and hubble.relay.enabled=true were set during Helm installation. Also, check for any firewall rules blocking the port-forwarded traffic.
- Issue: Performance degradation or high resource usage by Cilium.
Solution: While sidecar-less, Cilium still uses eBPF programs in the kernel. High policy count, complex regex in L7 policies, or a large number of endpoints can increase resource usage. Use cilium metrics to inspect Cilium's internal metrics and identify bottlenecks. Optimize your policies where possible.
cilium metrics list
kubectl top pods -n kube-system -l k8s-app=cilium
Consider simplifying overly complex policies or breaking them down. Ensure your nodes have sufficient CPU and memory resources. For deeper insights, you might export Cilium metrics to Prometheus and visualize them in Grafana.
FAQ Section
- What is eBPF and how does Cilium use it for service mesh?
eBPF (extended Berkeley Packet Filter) is a revolutionary technology that allows programs to run in the Linux kernel without changing kernel source code or loading kernel modules. Cilium leverages eBPF to implement networking, security, and observability directly in the kernel's data path. For service mesh, this means it can perform L4/L7 traffic management, mTLS, and policy enforcement at wire speed, without the overhead of injecting sidecar proxies. This is the core of Cilium's sidecar-less architecture, offering significant performance and resource efficiency benefits. You can learn more about eBPF on the eBPF.io website.
- How does Cilium Service Mesh compare to traditional sidecar-based meshes like Istio?
The primary difference lies in the architecture. Traditional meshes like Istio inject a proxy (e.g., Envoy) sidecar into every application pod. This sidecar intercepts all network traffic. Cilium, on the other hand, uses eBPF programs loaded into the kernel to achieve the same (and often more) functionality without any sidecar proxies. This results in lower resource consumption, reduced latency, and simplified operations (no sidecar lifecycle management). While Istio offers a broader range of features, especially for complex multi-cluster scenarios, Cilium's approach is often preferred for its efficiency and tight integration with Kubernetes networking. It's worth noting that Istio is also moving towards a sidecar-less model with its Ambient Mesh, which we cover in our Istio Ambient Mesh Production Guide.
- Can Cilium Service Mesh be used with other CNIs?
No, Cilium functions as a CNI itself. To utilize Cilium's full capabilities, including its service mesh features, it must be installed as the primary CNI for your Kubernetes cluster. It replaces other CNIs like Calico, Flannel, or the default cloud provider CNI. This allows Cilium to have complete control over the network data path and inject eBPF programs for advanced functionality.
- What kind of traffic management features does Cil