Cilium BGP: Native Kubernetes Peering

Introduction

Kubernetes defines its own networking model, but integrating a cluster with existing on-premises or cloud network infrastructure is often difficult, especially when Service IPs or Pod CIDRs must be advertised to external routers. Border Gateway Protocol (BGP) addresses this problem, and Cilium, with its eBPF-based data plane, offers native BGP peering so that a cluster can exchange routes directly with the rest of the network.

Traditional Kubernetes deployments often rely on external load balancers or complex Ingress controllers to expose services. However, for advanced use cases like bare-metal deployments, hybrid cloud environments, or scenarios requiring direct routing to Pods and Services, BGP integration becomes indispensable. Cilium’s native BGP support simplifies this by allowing your Kubernetes nodes to peer directly with your network routers, announcing Kubernetes service IP addresses (LoadBalancer IPs, ExternalIPs) and even Pod CIDRs, enabling highly efficient and direct traffic flow. This guide will walk you through setting up Cilium BGP, transforming your Kubernetes networking from isolated islands into a fully integrated part of your broader network topology.

TL;DR: Cilium BGP in a Nutshell

Cilium’s native BGP support allows Kubernetes nodes to peer directly with network routers, advertising service IPs (LoadBalancer, ExternalIPs) and Pod CIDRs. This enhances external connectivity, simplifies network integration, and enables advanced routing scenarios.

Key Steps:

  1. Install Cilium with BGP enabled.
  2. Configure BGPPeer and BGPAdvertisement Custom Resources.
  3. Expose Services using type: LoadBalancer or ExternalIPs.
  4. Verify BGP sessions and route advertisements.

# Install Cilium with BGP
helm install cilium cilium/cilium --version 1.15.0 \
  --namespace kube-system \
  --set bpf.masquerade=true \
  --set k8s.requireIPv4PodCIDR=true \
  --set loadBalancer.mode=snat \
  --set loadBalancer.enableExternalIPs=true \
  --set loadBalancer.enableSharedIP=true \
  --set bgp.enabled=true \
  --set bgp.announce.loadBalancerIP=true \
  --set bgp.announce.podCIDR=false # Set to true if you want to announce Pod CIDRs

# Apply the BGPPeer manifest (the full example is in Step 2)
kubectl apply -f bgppeer-router1.yaml

Prerequisites

Before diving into Cilium BGP, ensure you have the following:

  • Kubernetes Cluster: A working Kubernetes cluster (v1.23+ recommended). This guide assumes a bare-metal or self-managed cloud environment where you control the underlying networking. On cloud providers with native LoadBalancer implementations (e.g., AWS ELB/NLB, GCP Network Load Balancer), Cilium BGP is usually not the primary choice unless you need to advertise Pod CIDRs or manage your own BGP fabric.
  • Cilium CLI: The cilium CLI installed. Refer to the Cilium installation guide.
  • Helm: Helm 3 installed for deploying Cilium.
  • Network Routers: Access to one or more network routers (physical or virtual) that support BGP and with which your Kubernetes nodes can establish BGP sessions. You'll need their IP addresses and Autonomous System Numbers (ASNs).
  • IP Address Management (IPAM): A clear plan for IP address allocation, especially for LoadBalancer IPs. Cilium can integrate with external IPAM solutions or use its own.
  • Basic Networking Knowledge: Familiarity with TCP/IP, routing, and BGP concepts is highly recommended.
  • Firewall Rules: Ensure that your network firewalls allow TCP port 179 for BGP communication between your Kubernetes nodes and your BGP peers.
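
The firewall requirement can be checked from a node before any BGP configuration exists. The following is a minimal Python sketch, not part of Cilium: the function name tcp_port_open is an assumption made for illustration. It only tests whether a TCP connection to port 179 can be opened, and says nothing about whether a BGP session would actually establish.

```python
import socket


def tcp_port_open(host: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call from a Kubernetes node (192.168.1.1 is this guide's sample router):
#   tcp_port_open("192.168.1.1")
```

A False result points at firewalls or routing rather than at Cilium, which narrows down later troubleshooting.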

Step-by-Step Guide: Setting Up Cilium BGP

Step 1: Install Cilium with BGP Enabled

The first step is to install Cilium with BGP support activated by setting several Helm values during installation. We'll enable BGP and configure Cilium to announce LoadBalancer IPs. We'll also set loadBalancer.mode=snat for simplicity; dsr and hybrid are the alternatives. You can additionally announce Pod CIDRs, which is useful if you want external routers to route directly to Pods within your cluster. Helm value names change between Cilium releases, so compare the flags below with the output of helm show values cilium/cilium --version 1.15.0 before installing.

For more advanced networking configurations, such as WireGuard encryption for Pod-to-Pod traffic, consider exploring features like those detailed in our Cilium WireGuard Encryption guide.

# Add the Cilium Helm repository
helm repo add cilium https://helm.cilium.io/

# Update your Helm repositories
helm repo update

# Install Cilium with BGP enabled
# Choose a version appropriate for your cluster, e.g., 1.15.0
helm install cilium cilium/cilium --version 1.15.0 \
  --namespace kube-system \
  --set bpf.masquerade=true \
  --set k8s.requireIPv4PodCIDR=true \
  --set loadBalancer.mode=snat \
  --set loadBalancer.enableExternalIPs=true \
  --set loadBalancer.enableSharedIP=true \
  --set bgp.enabled=true \
  --set bgp.announce.loadBalancerIP=true \
  --set bgp.announce.podCIDR=false \
  --set bgp.announce.nodes=false \
  --set bgp.serviceSelector.matchLabels.bgp-advertisement=true \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDRList={10.0.0.0/8} # Example: Adjust to your Pod CIDR range
  # --set bgpControlPlane.enabled=true # Enable if you want to use the BGP Control Plane CRD
  # --set bgp.gracefulRestart.enabled=true # Enable BGP graceful restart for high availability
  # --set bgp.gracefulRestart.restartTimeSeconds=120

Explanation:

  • --set bpf.masquerade=true: Enables masquerading for outbound traffic from Pods.
  • --set k8s.requireIPv4PodCIDR=true: Ensures IPv4 Pod CIDRs are assigned.
  • --set loadBalancer.mode=snat: Configures the load balancer to use Source Network Address Translation (SNAT).
  • --set loadBalancer.enableExternalIPs=true: Allows Cilium to advertise ExternalIPs.
  • --set loadBalancer.enableSharedIP=true: Enables multiple services to share the same external IP, useful with BGP.
  • --set bgp.enabled=true: The crucial flag to enable Cilium's BGP daemon.
  • --set bgp.announce.loadBalancerIP=true: Tells Cilium to announce IPs assigned to Services of type LoadBalancer.
  • --set bgp.announce.podCIDR=false: Set to true if you want to advertise the CIDR blocks of your Pods, allowing external routers to route directly to Pods.
  • --set bgp.announce.nodes=false: Set to true if you want to advertise the IP addresses of your Kubernetes nodes.
  • --set bgp.serviceSelector.matchLabels.bgp-advertisement=true: This is a powerful feature. It allows you to selectively advertise only those LoadBalancer services that have the label bgp-advertisement: true. This prevents all LoadBalancer services from being advertised by default.
  • --set ipam.mode=cluster-pool: Configures Cilium's IPAM to allocate per-node Pod CIDRs from a cluster-wide pool. The Helm value is spelled cluster-pool, with a hyphen.
  • --set ipam.operator.clusterPoolIPv4PodCIDRList={10.0.0.0/8}: Defines the overall Pod CIDR range for the cluster. Adjust this to your network's requirements.
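
Each --set flag addresses a nested key in the chart's values by its dotted path, so a long flag list is equivalent to a values file. The sketch below is illustrative Python, not Helm's own parser: the helper nest_values is hypothetical and ignores Helm's escaping, list, and type-coercion rules. It shows how a few of the flags above map onto nested structure.

```python
def nest_values(flat: dict) -> dict:
    """Expand dotted keys such as 'bgp.announce.podCIDR' into nested dicts."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# A subset of the flags used in this step.
flags = {
    "bpf.masquerade": True,
    "bgp.enabled": True,
    "bgp.announce.loadBalancerIP": True,
    "bgp.announce.podCIDR": False,
}
values = nest_values(flags)
print(values)
```

Keeping the values in a file and passing it with helm install -f values.yaml is usually easier to review than a long chain of flags.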

Verify:

Check that the Cilium Pods are running and healthy. The agent runs as a DaemonSet whose Pods are named cilium-<suffix>, and the operator runs as the cilium-operator Deployment, both in the kube-system namespace.

kubectl get pods -n kube-system -l k8s-app=cilium
NAME           READY   STATUS    RESTARTS   AGE
cilium-abcde   1/1     Running   0          2m
cilium-fghij   1/1     Running   0          2m

kubectl get deployment -n kube-system cilium-operator

Step 2: Configure BGP Peers

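Next, tell Cilium which routers it should peer with. This guide does so with a BGPPeer custom resource, where each BGPPeer object defines a peering session with one external router. Cilium establishes a BGP session to the specified peer from every node in the cluster, or from the subset of nodes matched by nodeSelector. The resource kinds and schemas that Cilium ships differ by release (Cilium 1.15, for example, configures peering through the CiliumBGPPeeringPolicy resource), so list what your cluster actually serves with kubectl api-resources --api-group=cilium.io and adapt the manifest before applying it.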

# bgppeer-router1.yaml
apiVersion: cilium.io/v2alpha1
kind: BGPPeer
metadata:
  name: my-router-peer
spec:
  peerAddress: 192.168.1.1 # Replace with your router's IP address
  peerASN: 65001 # Replace with your router's Autonomous System Number (ASN)
  localASN: 64512 # Replace with your cluster's ASN (must be unique in your network)
  # Optional: nodeSelector to limit which nodes peer with this router
  # nodeSelector:
  #   matchLabels:
  #     kubernetes.io/hostname: k8s-node01

kubectl apply -f bgppeer-router1.yaml

Explanation:

  • peerAddress: The IP address of your external BGP router.
  • peerASN: The ASN of your external BGP router.
  • localASN: The ASN of your Kubernetes cluster. This must be a unique ASN within your network. Private ASNs (64512-65534) are commonly used for internal networks.
  • nodeSelector: (Optional) If you have a complex network topology, you might want only specific nodes to peer with certain routers. This allows you to select nodes based on labels.
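
The private ASN range mentioned above can be checked programmatically before writing a manifest. This is a minimal Python sketch: the function name is an assumption, and the 32-bit private range (4200000000-4294967294, from RFC 6996) is added here for completeness and does not come from the text above.

```python
PRIVATE_ASN_RANGES = (
    (64512, 65534),            # 16-bit private-use range
    (4200000000, 4294967294),  # 32-bit private-use range (RFC 6996)
)


def is_private_asn(asn: int) -> bool:
    """Return True if asn falls inside a private-use range."""
    return any(low <= asn <= high for low, high in PRIVATE_ASN_RANGES)


# 64512 and 65001 are the sample ASNs used in this guide.
for asn in (64512, 65001, 65535, 13335):
    print(asn, is_private_asn(asn))
```

Both sample ASNs in the manifest (64512 and 65001) fall inside the 16-bit private range, while 65535 is reserved and sits just outside it.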

Verify:

Check the status of the BGPPeer and the BGP sessions. It might take a moment for the sessions to establish.

cilium bgp peers
# Expected output (example):
BGP Peering Status:
  Peer: 192.168.1.1 (ASN: 65001)
    Local ASN: 64512
    State: Established
    Uptime: 1m30s
    Advertised Routes: 0
    Received Routes: 0

If the state is not Established, double-check your router configuration, firewall rules, and the IP/ASN in your BGPPeer manifest. You can also inspect Cilium agent logs for BGP-related errors.

kubectl logs -n kube-system -l k8s-app=cilium --tail=50 | grep -i bgp

Step 3: Configure BGP Advertisements (Optional, for advanced control)

While bgp.announce.loadBalancerIP=true and bgp.announce.podCIDR=true cover cluster-wide announcements, the BGPAdvertisement and CiliumLoadBalancerIPPool custom resources give more granular control. They let you define specific IP ranges for LoadBalancer services and choose which prefixes are advertised. This matters when routable address space is limited and only selected services should be reachable from outside the cluster.

First, let's define an IP pool for our LoadBalancer services. This is crucial for Cilium to know which IPs it can assign and advertise.

# ciliumloadbalancerippool.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: my-lb-pool
spec:
  blocks:
    - cidr: 192.168.100.0/24 # A dedicated CIDR block for LoadBalancer IPs

kubectl apply -f ciliumloadbalancerippool.yaml

Next, define a BGPAdvertisement to specify what to advertise. This example advertises the IP pool we just created.

# bgpadvertisement.yaml
apiVersion: cilium.io/v2alpha1
kind: BGPAdvertisement
metadata:
  name: advertise-lb-pool
spec:
  # This advertisement will apply to any BGPPeer that doesn't have a specific `advertisements` list
  # Or you can explicitly link it via `BGPPeer`'s `advertisements` field
  loadBalancerIPs:
    - ipPoolRef:
        name: my-lb-pool
  # You can also advertise Pod CIDRs or Node IPs here if bgp.announce.podCIDR/nodes is false
  # podCIDRs:
  #   - {} # Advertise all Pod CIDRs
  # nodePodCIDRs:
  #   - {} # Advertise all Node Pod CIDRs
  # nodeExternalIPs:
  #   - {} # Advertise Node External IPs

kubectl apply -f bgpadvertisement.yaml

Explanation:

  • CiliumLoadBalancerIPPool: Defines a pool of IP addresses that Cilium can allocate to LoadBalancer services. This is essential for Cilium to manage and advertise these IPs.
  • BGPAdvertisement: This CRD provides fine-grained control over what BGP routes are advertised. You can specify whether to advertise LoadBalancer IPs (from a pool), Pod CIDRs, or Node IPs.
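
Before applying a pool, it can help to sanity-check the chosen block against the rest of the address plan. The sketch below uses Python's standard ipaddress module with the example values from this guide; it is an offline check only and does not query Cilium. How many of the addresses Cilium actually hands out may depend on the pool's settings, so treat the count as an upper bound.

```python
import ipaddress

lb_pool = ipaddress.ip_network("192.168.100.0/24")   # pool block from the manifest above
pod_cidr = ipaddress.ip_network("10.0.0.0/8")        # Pod CIDR from Step 1
requested = ipaddress.ip_address("192.168.100.10")   # example service IP

pool_size = lb_pool.num_addresses            # total addresses in the block
ip_in_pool = requested in lb_pool            # can the pool serve this IP?
overlaps_pods = lb_pool.overlaps(pod_cidr)   # pool and Pod CIDR should be disjoint

print(pool_size, ip_in_pool, overlaps_pods)  # 256 True False
```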

Verify:

Check that the IP pool and the advertisement were accepted by the API server. These commands confirm only that the objects exist; route advertisement itself is verified in Step 4.

kubectl get ciliumloadbalancerippool
kubectl get bgpadvertisement

Step 4: Expose a Service with LoadBalancer Type

Now, let's create a sample application and expose it using a Service of type LoadBalancer. Cilium allocates an IP from a matching CiliumLoadBalancerIPPool and advertises it via BGP. If no pool can serve the Service, its EXTERNAL-IP stays in the <pending> state, so make sure the pool from Step 3 exists first.

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

# nginx-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    bgp-advertisement: "true" # Only advertise services with this label if bgp.serviceSelector is set
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
  externalTrafficPolicy: Cluster # or Local
  # Optional: Request a specific IP from the pool
  # loadBalancerIP: 192.168.100.10

kubectl apply -f nginx-deployment.yaml
kubectl apply -f nginx-service.yaml

Explanation:

  • type: LoadBalancer: This is the key. Instead of relying on an external cloud load balancer, Cilium allocates an IP for the Service and advertises it to your routers via BGP.
  • bgp-advertisement: "true": If you configured bgp.serviceSelector.matchLabels.bgp-advertisement=true during Cilium installation, this label is necessary for this service's IP to be advertised. This provides an excellent mechanism for controlling which services are exposed via BGP.
  • externalTrafficPolicy: Cluster: Every node can accept traffic for the Service and may forward it to a Pod on another node. This spreads load evenly but adds a hop, and with SNAT the Pod does not see the original client IP.
  • externalTrafficPolicy: Local: Traffic is delivered only to Pods on the node that received it. This preserves the client source IP, but the service IP should then be announced only by nodes that host a ready Pod.
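
The effect of externalTrafficPolicy on which nodes should attract traffic can be modelled in a few lines. This is a simplified Python sketch of the behaviour described above, not Cilium code: the function name and node names are hypothetical, and it ignores node selectors on BGP peers.

```python
def advertising_nodes(policy: str, all_nodes: set, nodes_with_pods: set) -> set:
    """Return the nodes expected to announce the service IP."""
    if policy == "Local":
        # Only nodes that host a ready Pod should attract traffic.
        return all_nodes & nodes_with_pods
    if policy == "Cluster":
        # Any node can accept the traffic and forward it on.
        return set(all_nodes)
    raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")


nodes = {"node-a", "node-b", "node-c"}
pods_on = {"node-a", "node-c"}  # nodes running an nginx replica

print(sorted(advertising_nodes("Local", nodes, pods_on)))    # ['node-a', 'node-c']
print(sorted(advertising_nodes("Cluster", nodes, pods_on)))  # ['node-a', 'node-b', 'node-c']
```

With two replicas on three nodes, Local leaves one node out of the announcement, which is why a rescheduled Pod changes the set of next hops the router sees.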

Verify:

Check the service status. You should see an EXTERNAL-IP assigned. This IP should now be advertised via BGP to your configured routers.

kubectl get svc nginx-service
NAME            TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)        AGE
nginx-service   LoadBalancer   10.96.10.100   192.168.100.10   80:30000/TCP   1m

Now, verify that Cilium is advertising this route. You might need to wait a few moments for BGP to converge.

cilium bgp routes
# Expected output (example):
Prefix          NextHop         LocalPref  MED  Communities  Origin  ASPath  Age
192.168.100.10/32 192.168.1.20   100        0    -            IGP     64512   5s # NextHop will be the Cilium node IP

Finally, verify on your external router that it has learned the route for 192.168.100.10/32 (or whatever IP your service got). The exact command depends on your router type (e.g., Cisco, Juniper, VyOS, etc.).

# Example for a VyOS/FRR based router:
show ip bgp
show ip route 192.168.100.10

From an external machine, you should now be able to access your Nginx service using its EXTERNAL-IP.

curl 192.168.100.10
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...

Production Considerations

Deploying Cilium BGP in a production environment requires careful planning and consideration beyond the basic setup:

  1. High Availability (HA):
    • Multiple BGP Peers: Configure multiple BGPPeer objects to peer with different routers for redundancy. Each Kubernetes node will attempt to establish sessions with all defined peers.
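    • Graceful Restart: Enable BGP graceful restart in Cilium (bgp.gracefulRestart.enabled=true) and on your routers. The router then keeps a node's routes for the configured restart time while the Cilium agent restarts, which avoids route flapping during upgrades. During a full node reboot the same mechanism keeps sending traffic to a node that cannot forward it, so choose a short restart time.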
  2. IP Address Management (IPAM):
    • Dedicated IP Pools: Use CiliumLoadBalancerIPPool to define specific IP ranges for your services. This helps in organizing your network and preventing IP conflicts.
    • Integration with External IPAM: For large-scale deployments, consider integrating with an external IPAM solution to manage your IP address space consistently.
  3. Network Security:
    • Router ACLs/Firewalls: Implement strict Access Control Lists (ACLs) on your BGP routers to limit which Kubernetes nodes can peer and what routes they can advertise.
    • BGP Authentication: Configure MD5 authentication for BGP sessions between Cilium nodes and your routers to prevent unauthorized peering. This is a critical security measure.
    • Kubernetes Network Policies: Complement BGP routing with robust Kubernetes Network Policies to control traffic within the cluster and to/from external endpoints.
  4. Route Filtering and Advertisement Control:
    • BGPAdvertisement CRD: Use the BGPAdvertisement CRD to precisely control which prefixes are advertised (e.g., specific LoadBalancer IPs, Pod CIDRs, or Node IPs).
    • bgp.serviceSelector: Leverage this Cilium Helm value to only advertise LoadBalancer IPs for services with specific labels, giving you granular control over exposure.
    • Router-side Filtering: Implement route filtering on your external BGP routers to accept only expected routes from your Kubernetes cluster.
  5. Monitoring and Observability:
    • Cilium Metrics: Cilium exposes extensive metrics, including BGP session status and route advertisements. Integrate these into your Prometheus/Grafana stack.
    • cilium bgp CLI: Regularly use cilium bgp peers and cilium bgp routes to inspect the state of your BGP sessions and advertised routes.
    • Router Monitoring: Monitor BGP sessions and learned routes on your external routers.
    • eBPF Observability: For deep insights into network traffic, consider using tools like Hubble, as discussed in our guide on eBPF Observability with Hubble.
  6. Scalability:
    • Number of Peers: Be mindful of the number of BGP sessions each Kubernetes node needs to establish. While modern BGP daemons are efficient, an excessive number of peers per node can impact performance.
    • Route Scale: Consider the number of routes your routers need to manage. Advertising every Pod CIDR in a very large cluster might generate a significant routing table.
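
The scalability points above reduce to simple arithmetic that is worth running before enabling Pod CIDR advertisement. The following Python sketch gives rough upper bounds only; the function name, the parameters, and the cluster size in the example are assumptions chosen for illustration.

```python
def bgp_scale(nodes: int, peers_per_node: int, lb_services: int,
              advertise_pod_cidrs: bool, pod_cidrs_per_node: int = 1) -> dict:
    """Rough upper bounds for BGP sessions, prefixes, and paths."""
    pod_prefixes = nodes * pod_cidrs_per_node if advertise_pod_cidrs else 0
    return {
        # Every node opens one session to every configured peer.
        "sessions": nodes * peers_per_node,
        # One /32 per LoadBalancer IP plus the per-node Pod CIDRs.
        "prefixes": lb_services + pod_prefixes,
        # Worst case: every node announces every LoadBalancer IP.
        "paths_per_router": lb_services * nodes + pod_prefixes,
    }


estimate = bgp_scale(nodes=200, peers_per_node=2, lb_services=50,
                     advertise_pod_cidrs=True)
print(estimate)  # {'sessions': 400, 'prefixes': 250, 'paths_per_router': 10200}
```

The path count grows with nodes times services, which is where externalTrafficPolicy: Local or label-based selection helps keep router tables small.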

Troubleshooting

Here are common issues encountered when setting up Cilium BGP and their solutions:

  1. BGP Session Not Established (State: Idle/Active)

    Problem: The cilium bgp peers command shows the BGP session in an Idle or Active state, not Established.

    Solution:

    • Firewall: Ensure TCP port 179 is open between your Kubernetes nodes and the BGP router. Check both host-level firewalls (firewalld, ufw, security groups) and network-level firewalls.
    • IP Address/ASN Mismatch: Double-check that peerAddress, peerASN, and localASN in your BGPPeer CRD exactly match the configuration on your router.
    • Router Configuration: Verify that the BGP configuration on your external router is correct, especially the neighbor IP and ASN.
    • Network Reachability: Ping the peerAddress from your Kubernetes nodes to ensure basic network connectivity.
    • Cilium Logs: Check Cilium agent logs for specific BGP errors:
      kubectl logs -n kube-system -l k8s-app=cilium --tail=100 | grep -i bgp
      
  2. LoadBalancer IP Not Assigned to Service

    Problem: A Service of type LoadBalancer doesn't get an EXTERNAL-IP.

    Solution:

    • CiliumLoadBalancerIPPool: If you're using IP pools, ensure a CiliumLoadBalancerIPPool exists and has available IPs within its CIDR range. Verify it's applied correctly.
    • Cilium BGP Configuration: Check your Cilium Helm values. Ensure bgp.enabled=true and bgp.announce.loadBalancerIP=true (or that your BGPAdvertisement CRD is correctly configured for LoadBalancer IPs).
    • Service Selector Label: If bgp.serviceSelector is configured in Cilium's Helm values, ensure your service has the required label (e.g., bgp-advertisement: "true").
    • Cilium Operator Logs: The Cilium operator is responsible for IP allocation. Check its logs:
      kubectl logs -n kube-system deployment/cilium-operator --tail=50
      
  3. Routes Not Advertised to Router

    Problem: The EXTERNAL-IP is assigned, and BGP session is Established, but the router doesn't learn the route.

    Solution:

    • cilium bgp routes: Verify that Cilium itself sees the route as advertised:
      cilium bgp routes
      

      If it's not listed, review your Cilium Helm configuration (bgp.announce.loadBalancerIP, bgp.announce.podCIDR) and BGPAdvertisement CRDs.

    • Router Route Filtering: Check your router's BGP configuration for inbound route filters (e.g., prefix-lists, route-maps) that might be blocking the routes from your Kubernetes nodes.
    • Next-Hop Reachability: Ensure the router can reach the BGP next-hop (which will be the IP of the Kubernetes node advertising the route).
    • BGP Peer Group/Templates: If your router uses peer groups or templates, ensure the correct policies are applied to the Kubernetes node peers.
  4. Traffic Not Reaching Service After BGP Advertisement

    Problem: The route is advertised and learned by the router, but traffic sent to the EXTERNAL-IP doesn't reach the Kubernetes service.

    Solution:

    • Service Endpoints: Verify that your Kubernetes service has healthy endpoints (Pods are running and ready).
      kubectl get ep nginx-service
      
    • Cilium Network Policies: Ensure no Cilium Network Policies are blocking ingress traffic to your service Pods.
    • externalTrafficPolicy: If externalTrafficPolicy: Local is set, ensure the traffic is hitting a node that actually hosts a Pod for that service. If not, try externalTrafficPolicy: Cluster.
    • Packet Capture (tcpdump): Run tcpdump on the Kubernetes node that should receive the traffic (for example over SSH) to see whether packets arrive. Replace eth0 with the node's actual uplink interface.
      sudo tcpdump -ni eth0 host <EXTERNAL_IP>
      
  5. Pod CIDRs Not Advertised

    Problem: You configured bgp.announce.podCIDR=true but don't see Pod CIDRs advertised.

    Solution:

    • Cilium Helm Config: Double-check bgp.announce.podCIDR=true in your Cilium Helm values.
    • Node IPAM: Ensure your Kubernetes cluster's IPAM (either Cilium's or Kubernetes' default) is correctly assigning Pod CIDRs to nodes.
    • cilium bgp routes: Verify Cilium lists the Pod CIDRs:
      cilium bgp routes
      
    • Router Configuration: Ensure your router is configured to accept and process routes for the Pod CIDR ranges.
  6. BGP Authentication Failure

    Problem: BGP sessions fail to establish with authentication errors in logs.

    Solution:

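    • Password Mismatch: Ensure the MD5 password referenced by your peer configuration matches the password configured on the router exactly. Cilium reads the password from a Kubernetes Secret (in CiliumBGPPeeringPolicy the neighbor field is authSecretRef), so also confirm that the Secret exists in the expected namespace and holds the right value.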
    • Router MD5 Configuration: Verify MD5 authentication is correctly enabled on the router interface or BGP neighbor configuration.
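
The session states that appear in the checks above come from the BGP state machine defined in RFC 4271. The sketch below maps each state to the most likely place to look, based on the checklist in this section. It is a hypothetical Python helper, not a Cilium tool, and the hints are starting points rather than diagnoses.

```python
# Hints keyed by the BGP session states defined in RFC 4271.
BGP_STATE_HINTS = {
    "idle": "No connection attempt in progress: check that the peer is configured on both sides.",
    "connect": "TCP connection in progress: check routing and reachability to the peer address.",
    "active": "TCP connection attempts are failing: check firewalls for TCP port 179, "
              "the peer address, and MD5 passwords.",
    "opensent": "TCP is up and OPEN was sent: if the session falls back from here, check the ASNs.",
    "openconfirm": "OPEN accepted, waiting for KEEPALIVE: check hold and keepalive timers.",
    "established": "Session is up: move on to checking advertised routes.",
}


def hint_for(state: str) -> str:
    """Return a troubleshooting hint for a BGP session state name."""
    return BGP_STATE_HINTS.get(
        state.strip().lower(),
        "Unknown state: inspect the Cilium agent logs.",
    )


for state in ("Idle", "Active", "Established"):
    print(f"{state}: {hint_for(state)}")
```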

FAQ Section

  1. What is the difference between Cilium BGP and MetalLB?

    Both Cilium BGP and MetalLB provide LoadBalancer functionality for bare-metal Kubernetes clusters by advertising service IPs. The key difference lies in their architecture and integration. MetalLB is a standalone LoadBalancer implementation that can use ARP/NDP (Layer 2) or BGP (Layer 3) to announce IPs. Cilium BGP, on the other hand, is an integrated feature of the Cilium CNI. It leverages Cilium's eBPF data plane for efficient packet handling and offers a unified control plane for both CNI and BGP. If you're already using Cilium for your CNI, its native BGP integration offers a more cohesive and potentially more performant solution.

  2. Can Cilium BGP advertise Pod CIDRs directly?

    Yes. By setting bgp.announce.podCIDR=true during Cilium installation or configuring a BGPAdvertisement CRD, you can have each node advertise the Pod CIDR block allocated to it. External routers then hold direct routes to Pods, which avoids NAT and can simplify network debugging. However, this adds at least one prefix per node, so be mindful of the routing table size in large clusters.

  3. How does Cilium BGP handle high availability for LoadBalancer IPs?

    Cilium BGP achieves high availability by advertising the same LoadBalancer IP from multiple Kubernetes nodes (only the nodes hosting the service's Pods with externalTrafficPolicy: Local, or all peering nodes with externalTrafficPolicy: Cluster). When multipath is enabled on the routers, Equal-Cost Multi-Path (ECMP) routing distributes traffic across these paths. If a node fails or a BGP session drops, the routers withdraw that path and send traffic to the remaining healthy nodes. Enabling BGP graceful restart further helps by preventing route withdrawals during brief, planned agent restarts.

  4. Is Cilium BGP suitable for cloud environments?

    Cilium BGP is primarily designed for bare-metal or self-managed Kubernetes clusters where you have direct control over the underlying network and BGP routers. In most managed cloud Kubernetes services (e.g., GKE, EKS, AKS), the cloud provider offers its own native LoadBalancer integration, which typically leverages their proprietary network fabric. While you could
