
Kubernetes BGP: Cilium Native Peering

Introduction

In the ever-evolving landscape of Kubernetes networking, efficiently routing traffic in and out of your cluster remains a critical challenge. Traditional solutions often involve complex external load balancers or intricate Ingress controllers that can introduce latency and operational overhead. For those running Kubernetes on bare-metal, edge, or hybrid cloud environments, advertising service IPs and Egress IPs directly to the underlying network infrastructure is highly desirable. This is where Cilium, an eBPF-powered CNI, steps in with its native Border Gateway Protocol (BGP) integration.

Cilium’s BGP capabilities transform Kubernetes into a first-class citizen in your data center’s routing topology. By peering directly with network routers, Cilium can dynamically advertise Kubernetes Service IPs (ServiceType=LoadBalancer), Pod CIDRs, and Egress NAT IPs. This eliminates the need for external load balancers, reduces network hops, and simplifies network architecture, leading to improved performance and reduced costs. This guide will walk you through configuring Cilium BGP, enabling your Kubernetes services to be directly reachable from your physical network, and leveraging advanced routing features for robust and scalable deployments.

TL;DR: Cilium BGP Quick Start

Cilium BGP enables Kubernetes to advertise Service IPs, Pod CIDRs, and Egress NAT IPs directly to your network routers, simplifying network architecture and improving performance. Here’s a quick summary of the key steps:

  • Install Cilium with the BGP control plane enabled:

    helm install cilium cilium/cilium --version 1.15.0 \
      --namespace kube-system \
      --set ipam.mode=cluster-pool \
      --set cluster.name=kubezilla-cluster \
      --set bgpControlPlane.enabled=true \
      --set kubeProxyReplacement=true
  • Configure a BGP Peering Policy:

    apiVersion: cilium.io/v2alpha1
    kind: CiliumBGPPeeringPolicy
    metadata:
      name: bgp-policy-to-router1
    spec:
      nodeSelector:
        matchLabels:
          kubernetes.io/hostname: k8s-node1 # Or a broader selector
      virtualRouters:
        - localASN: 64512 # Cilium's ASN
          neighbors:
            - peerAddress: 192.168.1.1/32 # Your router's IP (CIDR notation)
              peerASN: 65001 # Your router's ASN
  • Create a LoadBalancer Service (Cilium's LB-IPAM assigns it an external IP from a CiliumLoadBalancerIPPool, which is then advertised via BGP):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-nginx-service
    spec:
      selector:
        app: nginx
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
      type: LoadBalancer
  • Verify BGP Advertisements: Use cilium bgp peers and check your router’s BGP table.

Prerequisites

Before diving into Cilium BGP, ensure you have the following:

  • Kubernetes Cluster: A running Kubernetes cluster (v1.20+ recommended). This guide assumes a bare-metal or self-managed environment where you control the underlying network.
  • Cilium CLI: The cilium CLI installed on your local machine. Refer to the Cilium documentation for installation.
  • Helm: Helm v3+ installed for deploying Cilium.
  • Network Routers: One or more BGP-capable network routers in your infrastructure. You’ll need their IP addresses and Autonomous System Numbers (ASNs).
  • IP Address Management (IPAM): A clear plan for IP address allocation. If you plan to advertise Service IPs, ensure they are not conflicting with existing IPs and are routable within your network.
  • Basic BGP Knowledge: Familiarity with BGP concepts like ASNs, peering, and route advertisements will be beneficial.
  • Admin Privileges: Sufficient permissions to install and configure CNI plugins in your Kubernetes cluster.

Step-by-Step Guide: Configuring Cilium BGP

Step 1: Install Cilium with BGP Enabled

The first step is to install Cilium in your Kubernetes cluster with the BGP control plane enabled. We’ll use Helm for this. For bare-metal deployments, ipam.mode=cluster-pool is a common choice, but you might use AWS ENI or Azure IPAM for cloud environments. Note that with the BGP control plane, what each node advertises (LoadBalancer IPs, Pod CIDRs) is configured later through CiliumBGPPeeringPolicy resources rather than through Helm flags.

helm repo add cilium https://helm.cilium.io/
helm repo update

helm install cilium cilium/cilium --version 1.15.0 \
  --namespace kube-system \
  --set ipam.mode=cluster-pool \
  --set cluster.name=kubezilla-cluster \
  --set bgpControlPlane.enabled=true \
  --set kubeProxyReplacement=true # Recommended for bare-metal

Explanation

This command installs Cilium and enables its BGP control plane. Key parameters are:

  • ipam.mode=cluster-pool: Configures Cilium to manage IP addresses from a cluster-wide pool. This is ideal for bare-metal.
  • bgpControlPlane.enabled=true: Activates the BGP control plane within Cilium. Which routes each node announces (LoadBalancer IPs, Pod CIDRs) is controlled per policy via CiliumBGPPeeringPolicy resources, not via global announce flags.
  • kubeProxyReplacement=true: This is important for bare-metal as it allows Cilium to fully replace kube-proxy, handling service load balancing via eBPF and improving performance. For more on networking, consider exploring our guide on Cilium WireGuard Encryption for secure pod-to-pod communication.

Verify

Check that the Cilium pods are running and healthy. The BGP control plane runs inside the regular cilium agent pods (and the operator); there is no separate BGP pod.

kubectl -n kube-system get pods -l k8s-app=cilium

Expected output:

NAME           READY   STATUS    RESTARTS   AGE
cilium-xxxx    1/1     Running   0          5m

Step 2: Configure a Cilium BGP Peer Policy

Now that Cilium is installed with BGP capabilities, you need to tell it which routers to peer with. This is done using a Kubernetes Custom Resource Definition (CRD) called CiliumBGPPeeringPolicy. This policy defines one or more virtual routers, each with a local ASN and a list of BGP neighbors. You can scope these policies to specific nodes using nodeSelector.

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: bgp-policy-to-router1
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/hostname: k8s-node1 # Apply this policy only to k8s-node1
      # Alternatively, label your nodes and match that label instead, e.g.:
      # bgp-policy: router1-peers
  virtualRouters:
    - localASN: 64512 # The ASN Cilium will use for peering
      exportPodCIDR: false # Set true to advertise each node's Pod CIDR
      serviceSelector: # Which LoadBalancer services to advertise; this matches all
        matchExpressions:
          - {key: somekey, operator: NotIn, values: ["never-used-value"]}
      neighbors:
        - peerAddress: 192.168.1.1/32 # IP address of your BGP router (CIDR notation)
          peerASN: 65001 # ASN of your BGP router
          # Optionally configure MD5 authentication:
          # authSecretRef: bgp-auth-secret
        - peerAddress: 192.168.1.2/32 # Another router, if applicable
          peerASN: 65002

Explanation

This YAML defines a BGP peering policy. The nodeSelector ensures that only specific Kubernetes nodes will attempt to establish BGP sessions with the defined peers. This is crucial for environments with diverse network topologies or for testing. Each entry under virtualRouters defines a BGP instance: localASN is the ASN Cilium presents for itself, peerAddress is the IP of your physical router (in CIDR notation), and peerASN is its ASN. It’s common to choose a private ASN (e.g., 64512-65534) for your Kubernetes cluster. If your router requires authentication, reference a secret via authSecretRef.
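If your router requires MD5 authentication, the secret referenced by authSecretRef is a plain Kubernetes Secret with a password key, created in the namespace the BGP control plane reads secrets from (kube-system by default). The secret name and password value below are illustrative; check your Cilium version's documentation for the exact field names:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: bgp-auth-secret   # referenced as authSecretRef: bgp-auth-secret
  namespace: kube-system
type: Opaque
stringData:
  password: "my-bgp-md5-password" # must match the BGP password on the router
```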

Apply this policy to your cluster:

kubectl apply -f bgp-peer-policy.yaml

Verify

Check the status of your BGP peers using the Cilium CLI. This command runs inside one of the Cilium agent pods.

kubectl exec -ti -n kube-system cilium-xxxx -- cilium bgp peers

Expected output (after some time for peering to establish):

BGP Peers:
  Peer Address: 192.168.1.1
  Peer ASN: 65001
  Local ASN: 64512
  Session State: Established
  Uptime: 2m3s
  Advertised Routes: 0
  Received Routes: 0

  Peer Address: 192.168.1.2
  Peer ASN: 65002
  Local ASN: 64512
  Session State: Established
  Uptime: 1m15s
  Advertised Routes: 0
  Received Routes: 0

If the session state is not Established, double-check your router configuration, firewall rules, and the IP addresses/ASNs in your CiliumBGPPeeringPolicy. You should also verify BGP peering on your physical router.

Step 3: Create a Service of Type LoadBalancer

With Cilium BGP peering established, you can now create a standard Kubernetes Service of type LoadBalancer. Cilium’s LB-IPAM feature assigns the service an external IP from a CiliumLoadBalancerIPPool, and Cilium then advertises that IP to the peered routers. It’s crucial that the pool you define contains addresses that are routable within your network and not already in use. You might need to coordinate with your network team for IP allocation.
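For Cilium to hand out an external IP at all, a pool covering the desired range must exist. A minimal sketch for the example address follows; the pool name is arbitrary, and note that in newer Cilium releases the cidrs field was renamed to blocks, so adjust to your version:

```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: bgp-lb-pool
spec:
  cidrs:
    - cidr: "192.168.1.100/30" # small routable range reserved for LoadBalancer services
```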

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx-service
  annotations:
    # Optional: request a specific IP from LB-IPAM
    # (must be covered by a CiliumLoadBalancerIPPool)
    io.cilium/lb-ipam-ips: "192.168.1.100"
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
  type: LoadBalancer
  # The loadBalancerIP field is deprecated in recent Kubernetes releases;
  # the annotation above is the preferred way to request a specific IP.
  loadBalancerIP: 192.168.1.100 # IMPORTANT: must be available and routable in your network

Explanation

This YAML defines a simple Nginx deployment and a corresponding LoadBalancer Service. The critical part is the requested IP, 192.168.1.100. When Cilium sees this service, it assigns the IP via LB-IPAM and advertises it via BGP to the routers specified in your CiliumBGPPeeringPolicy. Your routers will then learn that 192.168.1.100 is reachable via the Kubernetes nodes that are peering with them. This allows external traffic to directly hit your Kubernetes cluster and be load-balanced by Cilium’s eBPF data plane.

For more advanced traffic management, consider using the Kubernetes Gateway API, which offers more flexibility than traditional Ingress, though it typically integrates with external load balancers or proxy solutions.

kubectl apply -f nginx-lb-service.yaml

Verify

Check the service status and then verify BGP advertisements from Cilium’s perspective and on your router.

kubectl get svc my-nginx-service

Expected output:

NAME               TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)        AGE
my-nginx-service   LoadBalancer   10.96.123.45   192.168.1.100   80:3xxxx/TCP   1m

Now, verify with Cilium:

kubectl exec -ti -n kube-system cilium-xxxx -- cilium bgp routes

Expected output: a host route for the service IP, 192.168.1.100/32, advertised toward each established peer. The exact output format (and whether the command needs arguments such as advertised ipv4 unicast) varies between Cilium versions.

Finally, log into your physical router and check its BGP routing table. The exact command varies by vendor (e.g., show ip bgp on Cisco IOS and Arista EOS, show route protocol bgp on Junos).

# Example on a Cisco-like router
show ip bgp

You should see an entry for 192.168.1.100/32 with the next hop pointing to your Kubernetes node’s IP address.

Step 4: (Optional) Advertise Pod CIDRs or Egress IPs

While advertising LoadBalancer IPs is the most common use case, Cilium BGP can also advertise Pod CIDRs or Egress NAT IPs. Advertising Pod CIDRs directly might be useful for highly specialized routing or security requirements, though it can bloat router tables. Egress NAT IPs are crucial when you want a stable, external IP for outgoing traffic from your cluster.

Advertise Pod CIDRs

To advertise Pod CIDRs, set exportPodCIDR: true on the virtual router in your CiliumBGPPeeringPolicy. Each matching node then advertises its allocated Pod CIDR block to its BGP peers. This needs careful planning to avoid routing conflicts.

spec:
  virtualRouters:
    - localASN: 64512
      exportPodCIDR: true # each node announces its own Pod CIDR
      neighbors:
        - peerAddress: 192.168.1.1/32
          peerASN: 65001

After applying the updated policy, verify with cilium bgp routes and your router’s BGP table. You’ll see a route for each node’s pod CIDR (e.g., 10.0.1.0/24).

Advertise Egress Gateway IPs

Cilium’s Egress Gateway functionality allows you to assign specific external IPs for outgoing traffic from selected pods. Cilium BGP can then advertise these Egress Gateway IPs. This is particularly useful for whitelisting applications in external firewalls or for specific compliance needs.

  1. Enable Egress Gateway in Cilium: This is typically done during installation or upgrade. The feature requires BPF masquerading:

     helm upgrade cilium cilium/cilium --version 1.15.0 \
       --namespace kube-system \
       --reuse-values \
       --set egressGateway.enabled=true \
       --set bpf.masquerade=true

  2. Create a CiliumEgressGatewayPolicy: This CRD specifies which pods should use an egress gateway, which destinations the policy applies to, and which IP to use.

     apiVersion: cilium.io/v2
     kind: CiliumEgressGatewayPolicy
     metadata:
       name: egress-to-internet
     spec:
       selectors:
         - podSelector:
             matchLabels:
               app: my-backend # Pods with this label will use the egress gateway
       destinationCIDRs:
         - "0.0.0.0/0" # Apply to all external destinations
       egressGateway:
         nodeSelector:
           matchLabels:
             kubernetes.io/hostname: k8s-node2 # Traffic will exit from this node
         egressIP: 192.168.1.200 # The external IP to use for egress

     Apply this policy:

     kubectl apply -f egress-gateway-policy.yaml

  3. Make the Egress IP Reachable: The egressIP must be assigned to an interface on the gateway node so that return traffic reaches it. Whether Cilium can advertise egress IPs over BGP depends on your Cilium version; if yours cannot, advertise or route 192.168.1.200/32 toward the gateway node by other means.

This allows you to control the source IP for outgoing connections, which is a powerful feature for network segmentation and security. For more on network segmentation, refer to our Kubernetes Network Policies: Complete Security Hardening Guide.

Production Considerations

  • ASN Planning: Carefully choose your internal ASNs. Use private ASNs (64512-65534) for your Kubernetes cluster unless you have specific public peering requirements.
  • IP Address Management (IPAM): Implement a robust IPAM strategy. The IPs advertised by Cilium (LoadBalancer IPs, Egress IPs) must be unique and routable within your network. Coordinate closely with your network team.
  • Router Configuration: Ensure your physical routers are correctly configured to peer with your Kubernetes nodes. This includes allowing BGP sessions, defining neighbor statements, and potentially configuring route maps or prefix lists to control advertised/received routes.
  • Redundancy and High Availability:
    • Multiple Peers: Configure each Kubernetes node to peer with multiple routers for redundancy. If one router fails, traffic can be routed via another.
    • Multi-node Announcements: Cilium automatically handles advertising LoadBalancer IPs from multiple nodes. If a node hosting the primary endpoint for a service goes down, traffic will automatically fail over to a healthy node.
  • Security:
    • BGP Authentication: Always use BGP MD5 authentication (authSecretRef in CiliumBGPPeeringPolicy) to prevent unauthorized peering.
    • Route Filtering: Implement prefix lists and route maps on your physical routers to control which routes are accepted from and advertised to Cilium. This prevents accidental or malicious route injection.
    • Firewall Rules: Ensure firewalls allow TCP port 179 (BGP) between your Kubernetes nodes and BGP routers.
  • Observability and Monitoring:
    • Monitor BGP session status on both Kubernetes nodes (using cilium bgp peers) and your physical routers.
    • Integrate BGP metrics into your monitoring stack. Cilium exposes Prometheus metrics that can be scraped. For deeper insights into network behavior, consider leveraging eBPF Observability with Hubble.
  • Scalability: Be mindful of the number of routes you advertise. Advertising individual Pod CIDRs for a very large cluster might strain router resources. Prioritize advertising LoadBalancer IPs and Egress IPs.
  • Upgrade Strategy: Plan Cilium upgrades carefully. Test in a staging environment. Pay attention to breaking changes in BGP CRDs or configuration options between versions.
  • Integration with Cloud Providers: While Cilium BGP shines in bare-metal, some cloud providers offer native BGP services (e.g., AWS Direct Connect, Google Cloud Router). Cilium BGP can still be used in hybrid scenarios or to extend routing within a VPC.
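To make the router-side requirements above concrete, here is a sketch of what the peering could look like on an FRR-based router for the example topology. The node IP (192.168.1.10), password, and prefix filter are illustrative assumptions; adapt them to your addressing plan and vendor syntax:

```
router bgp 65001
 bgp router-id 192.168.1.1
 neighbor 192.168.1.10 remote-as 64512
 neighbor 192.168.1.10 password my-bgp-md5-password
 !
 address-family ipv4 unicast
  neighbor 192.168.1.10 activate
  neighbor 192.168.1.10 prefix-list FROM-CILIUM in
 exit-address-family
!
! Accept only host routes from the reserved LoadBalancer range
ip prefix-list FROM-CILIUM seq 10 permit 192.168.1.96/27 ge 32
```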

Troubleshooting

  1. BGP Session Not Established

    Issue: cilium bgp peers shows “Session State: Connect” or “Idle” instead of “Established”.

    Solution:

    • Firewall: Ensure TCP port 179 is open between the Kubernetes node and the BGP router in both directions.
    • IP Address/ASN Mismatch: Double-check peerAddress (CIDR notation, e.g., 192.168.1.1/32), peerASN, and localASN in your CiliumBGPPeeringPolicy against your router’s configuration.
    • Router Configuration: Verify that your router is configured to peer with the Kubernetes node’s IP address and expects the correct ASN.
    • Authentication: If using MD5 authentication, ensure the secret is correctly configured in Kubernetes and matches the router’s password.
    • Network Reachability: Ping the router’s IP from the Kubernetes node to ensure basic network connectivity.
    • Logs: Check the Cilium agent logs on the affected node for BGP errors: kubectl -n kube-system logs <cilium-agent-pod> | grep -i bgp.
  2. Service IP Not Advertised

    Issue: The loadBalancerIP of your service is not appearing in cilium bgp routes or your router’s BGP table.

    Solution:

    • Cilium Installation Flags: Ensure bgpControlPlane.enabled=true was set during Cilium installation, and that your CiliumBGPPeeringPolicy’s serviceSelector matches the service. If not, upgrade Cilium and/or adjust the policy.
    • Service Type: Confirm your service is of type: LoadBalancer and has been assigned an external IP (check kubectl get svc; with LB-IPAM the IP must fall within a CiliumLoadBalancerIPPool).
    • Endpoints: Verify the service has healthy endpoints (pods running and ready). A service with no ready endpoints might not have its IP advertised.
    • BGP Session: Ensure the BGP session to the router is ‘Established’.
    • Cilium Agent Logs: Check the agent logs for any errors related to route advertisement.
  3. External Traffic Not Reaching Service

    Issue: The LoadBalancer IP is advertised and reachable from the router, but external clients cannot access the service.

    Solution:

    • Network Path: Trace the network path from the client to the LoadBalancer IP. Ensure no firewalls or ACLs are blocking traffic.
    • Router Forwarding: Verify your router is correctly forwarding traffic to the Kubernetes node’s IP (the BGP next-hop).
    • Service Endpoints: Ensure the service’s pods are healthy and listening on the correct port. Test with kubectl port-forward to a pod.
    • Cilium Network Policies: If you have Cilium Network Policies, ensure they are not blocking incoming traffic to your service pods.
    • kube-proxy Replacement: If kubeProxyReplacement is enabled, ensure Cilium is fully managing service load balancing. Otherwise, ensure kube-proxy is running and configured correctly.
  4. BGP Routes Flapping

    Issue: BGP sessions or routes are frequently going up and down.

    Solution:

    • Network Instability: Check for underlying network issues (e.g., flaky links, overloaded routers).
    • Resource Constraints: Ensure the Kubernetes node running the Cilium agent has sufficient CPU/memory.
    • Router Configuration: Review BGP timers (keepalive, holdtime) on the router and ensure they are compatible.
    • Cilium Logs: Look for repeated BGP error messages in the Cilium agent logs.
  5. Incorrect Next-Hop for Advertised Routes

    Issue: The next-hop for advertised routes in the router’s BGP table is incorrect or points to an unreachable IP.

    Solution:

    • Node IP: Cilium uses the node’s primary IP as the next-hop. Ensure this IP is correct and reachable from the router.
    • Network Interface: If a node has multiple network interfaces, ensure Cilium is properly configured to use the correct interface for BGP.
    • Cilium IPAM: Verify Cilium’s IPAM is correctly assigning IPs to nodes and pods.
  6. Too Many Routes Advertised

    Issue: Your routers are complaining about too many BGP routes, especially if exportPodCIDR: true is set.

    Solution:

    • Disable Pod CIDR Advertising: Set exportPodCIDR: false in your CiliumBGPPeeringPolicy. This is the most common recommendation to keep routing tables lean.
    • Route Summarization: If you must advertise pod CIDRs, consider configuring route summarization on your physical routers to aggregate routes.
    • Prefix Limits: Use BGP prefix limits on your routers to prevent them from accepting an excessive number of routes from Cilium.

FAQ Section

  1. What is the difference between Cilium BGP and a traditional LoadBalancer controller like MetalLB?

    Cilium BGP integrates BGP directly into the CNI, allowing Kubernetes to peer directly with network routers and advertise IPs. It leverages Cilium’s eBPF data plane for efficient load balancing. MetalLB also provides LoadBalancer services for bare-metal, but it can use either BGP or ARP/NDP. MetalLB runs as a separate component and typically uses kube-proxy for data plane forwarding, whereas Cilium can fully replace kube-proxy with eBPF for higher performance and more advanced features. Cilium BGP offers a more integrated and performant solution when you’re already using Cilium as your CNI.

  2. Can I use Cilium BGP in a cloud environment (AWS, GCP, Azure)?

    While Cilium BGP is primarily beneficial for bare-metal, it can be used in cloud environments for specific hybrid or advanced routing scenarios. However, cloud providers typically offer their own native LoadBalancer services and BGP integration (e.g., AWS Network Load Balancer, Google Cloud Load Balancing) that are often simpler to integrate with. Cilium BGP might be used to advertise IPs over a Direct Connect or VPN connection to an on-premises network, or to advertise custom Egress IPs.

  3. Does Cilium BGP support IPv6?

    Yes, Cilium BGP supports IPv6. You can configure IPv6 peering and advertise IPv6 service IPs or Pod CIDRs. You would need to ensure your underlying network infrastructure and routers are also configured for IPv6 BGP peering and routing.

  4. How does Cilium BGP handle LoadBalancer IP failover?

    Cilium BGP leverages BGP’s inherent fast convergence mechanisms. When a node that is advertising a LoadBalancer IP becomes unhealthy or the service endpoints on that node fail, Cilium will withdraw the route from that node. Your BGP routers will then update their routing tables and direct traffic to other healthy nodes that are also advertising the same LoadBalancer IP (if multiple nodes have healthy endpoints for the service). This provides robust and rapid failover.

  5. Can I use Cilium BGP with other CNI plugins?

    No, Cilium BGP is an integral part of the Cilium CNI. It relies on Cilium’s eBPF data plane and controller logic. You cannot use Cilium BGP with another CNI like Calico or Flannel. If you require BGP capabilities, you must use Cilium as your CNI.

Cleanup Commands

To remove the resources created in this guide:

# Delete the Nginx deployment and service
kubectl delete -f nginx-lb-service.yaml

# Delete the Cilium BGP Peer Policy
kubectl delete -f bgp-peer-policy.yaml

# If you created an Egress Gateway Policy
# kubectl delete -f egress-gateway-policy.yaml

# Uninstall Cilium
helm uninstall cilium --namespace kube-system

# Clean up Cilium CRDs (optional, only if you want a complete wipe)
# kubectl get crd -o name | grep cilium.io | xargs kubectl delete
# Note: Be careful with this command, it removes all Cilium CRDs.

Next Steps / Further Reading

  • Explore advanced Cilium features like Istio Ambient Mesh integration, which can provide service mesh capabilities without sidecars, further enhancing your network’s efficiency.
  • Deep dive into Cilium’s official BGP documentation for more advanced configurations and troubleshooting.
