
Jaeger Distributed Tracing on Kubernetes: Debug Microservices Like a Pro

Master Jaeger distributed tracing on Kubernetes in 2025. Learn how to debug microservices, identify bottlenecks, and optimize performance with hands-on examples and visual guides.

Ever felt like your microservices are playing hide-and-seek in production? One API call enters your Kubernetes cluster, bounces through 15 different services, and emerges 3 seconds later (or doesn’t emerge at all). Welcome to the world of distributed systems debugging—where traditional logging feels like searching for a needle in a haystack while blindfolded.

Enter Jaeger, your X-ray vision for microservices architecture.

graph TD
    SDK["OpenTelemetry SDK"] --> |HTTP or gRPC| COLLECTOR
    COLLECTOR["Jaeger Collector"] --> STORE[Storage]
    COLLECTOR --> |gRPC| PLUGIN[Storage Plugin]
    COLLECTOR --> |gRPC/sampling| SDK
    PLUGIN --> STORE
    QUERY[Jaeger Query Service] --> STORE
    QUERY --> |gRPC| PLUGIN
    UI[Jaeger UI] --> |HTTP| QUERY
    subgraph Application Host
        subgraph User Application
            SDK
        end
    end

What is Distributed Tracing? (The Non-Technical Explanation)

Imagine you’re tracking a package through a delivery network. The package starts at the warehouse, goes through multiple sorting centers, delivery hubs, and finally reaches your doorstep. At each checkpoint, someone scans the barcode and records the timestamp.

That’s essentially what distributed tracing does for your API requests—it follows the journey of each request through your microservices ecosystem, recording every hop, every delay, and every interaction.
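The checkpoint analogy maps directly onto tracing vocabulary: each scanned checkpoint is a span, and the shared tracking number is the trace ID. Here is a minimal, stdlib-only sketch of that data model (a conceptual illustration, not the Jaeger or OpenTelemetry API):

```python
# Conceptual model: a trace is a set of spans that share one trace_id,
# each recording where a request went and how long it stayed there.
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    name: str                       # e.g. "payment-service"
    trace_id: str                   # shared by every span in one request
    parent: Optional[str] = None    # links the span into the call tree
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def finish(self) -> None:
        self.end = time.monotonic()

    @property
    def duration_ms(self) -> float:
        return (self.end - self.start) * 1000

# One request flowing through two services:
trace_id = uuid.uuid4().hex
root = Span("api-gateway", trace_id)
child = Span("payment-service", trace_id, parent="api-gateway")
time.sleep(0.05)                    # simulated work in the payment service
child.finish()
root.finish()

print(f"payment-service took {child.duration_ms:.0f} ms")
```

Because every span carries the same trace ID, a backend like Jaeger can stitch them back into a single end-to-end timeline, exactly like reconstructing a package's journey from its scan history.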

graph LR
    A[User Request] --> B[API Gateway]
    B --> C[Auth Service]
    B --> D[Order Service]
    D --> E[Payment Service]
    D --> F[Inventory Service]
    E --> G[Notification Service]
    F --> G
    
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#ffe1e1
    style D fill:#e1ffe1
    style E fill:#f0e1ff
    style F fill:#ffe1f5
    style G fill:#fff9e1

Why Jaeger on Kubernetes?

Jaeger, originally built by Uber to track millions of transactions daily, is now a graduated CNCF project perfectly aligned with Kubernetes-native architectures. The newly released Jaeger v2 (as of November 2024) brings game-changing improvements built on the OpenTelemetry Collector framework.

Here’s why it matters:

  • Performance bottleneck detection in seconds, not hours
  • Root cause analysis across distributed services
  • Service dependency mapping to understand system topology
  • OpenTelemetry compatibility for future-proof instrumentation

The Architecture: How Jaeger Works

sequenceDiagram
    participant App as Your Application
    participant Agent as Jaeger Agent
    participant Collector as Jaeger Collector
    participant Storage as Storage Backend
    participant UI as Jaeger UI
    
    App->>Agent: Send trace spans
    Agent->>Collector: Batch and forward
    Collector->>Storage: Write traces
    UI->>Storage: Query traces
    Storage->>UI: Return results
    UI->>UI: Visualize trace timeline

Think of it like this:

  • Your apps = Witnesses reporting what they see
  • Jaeger Agent = Local police station collecting reports
  • Collector = Central headquarters processing information
  • Storage = Archive for historical records
  • UI = Detective board connecting all the dots

Quick Start: Deploy Jaeger on Kubernetes

Let’s get our hands dirty. Here’s how to deploy Jaeger using the OpenTelemetry Operator (the recommended approach for v2):

Step 1: Install the Operator

First, add the OpenTelemetry Operator to your cluster:

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

Step 2: Create Jaeger Instance Configuration

Create a file named jaeger-instance.yaml with the following configuration:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: jaeger-all-in-one
  namespace: observability
spec:
  mode: deployment
  image: jaegertracing/jaeger:2.13.0
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      zipkin:
        endpoint: 0.0.0.0:9411
    
    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024
    
    exporters:
      jaeger:
        endpoint: localhost:14250
        tls:
          insecure: true
    
    service:
      pipelines:
        traces:
          receivers: [otlp, zipkin]
          processors: [batch]
          exporters: [jaeger]

Step 3: Deploy with Service Exposure

apiVersion: v1
kind: Service
metadata:
  name: jaeger-ui
  namespace: observability
spec:
  type: LoadBalancer
  ports:
  - name: ui
    port: 16686
    targetPort: 16686
  - name: otlp-grpc
    port: 4317
    targetPort: 4317
  - name: otlp-http
    port: 4318
    targetPort: 4318
  selector:
    app.kubernetes.io/name: jaeger-all-in-one

Apply both configurations:

kubectl create namespace observability
kubectl apply -f jaeger-instance.yaml
kubectl apply -f jaeger-service.yaml

Step 4: Verify Installation

Check if Jaeger is running:

kubectl get pods -n observability
kubectl get svc -n observability

Access the UI by port-forwarding:

kubectl port-forward svc/jaeger-ui 16686:16686 -n observability

Navigate to http://localhost:16686 and you’ll see the Jaeger dashboard!
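Before instrumenting a real service, you can smoke-test the pipeline by posting one hand-built OTLP/JSON trace to the collector. This stdlib-only sketch assumes you also port-forward the OTLP HTTP port (`kubectl port-forward svc/jaeger-ui 4318:4318 -n observability`); afterwards, look for the service `smoke-test` in the UI:

```python
# Post one minimal OTLP/JSON trace to the OTLP HTTP receiver (port 4318).
import json
import os
import time
import urllib.request

now = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "smoke-test"}}
        ]},
        "scopeSpans": [{"spans": [{
            "traceId": os.urandom(16).hex(),    # 32 hex chars
            "spanId": os.urandom(8).hex(),      # 16 hex chars
            "name": "manual-test-span",
            "kind": 1,                          # SPAN_KIND_INTERNAL
            "startTimeUnixNano": str(now - 5_000_000),
            "endTimeUnixNano": str(now),
        }]}],
    }]
}

req = urllib.request.Request(
    "http://localhost:4318/v1/traces",          # OTLP/HTTP trace endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print("collector answered:", resp.status)
except OSError as exc:
    print("could not reach the collector:", exc)
```

If the collector answers with a 2xx status, ingestion works end to end and any missing traces are an instrumentation problem, not a deployment problem.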

Real-World Debugging Scenario

Let’s say you’re experiencing slow checkout times in your e-commerce platform. Here’s how you’d use Jaeger to investigate:

graph TB
    A[Slow Checkout Alert] --> B{Check Jaeger}
    B --> C[View Checkout Traces]
    C --> D[Identify 2s Delay]
    D --> E[Payment Service]
    E --> F[Database Query Issue]
    F --> G[Missing Index Found]
    G --> H[Add Index]
    H --> I[Checkout Time: 200ms]
    
    style A fill:#ff6b6b
    style I fill:#51cf66
    style F fill:#ffd43b

What Jaeger Reveals:

  1. Total request time: 3.2 seconds
  2. Breakdown by service:
    • API Gateway: 50ms
    • Auth Service: 100ms
    • Order Service: 200ms
    • Payment Service: 2.5s ⚠️
    • Inventory Service: 150ms

You immediately see that 78% of your checkout time is spent in the Payment Service. Drilling down further, you discover a database query without proper indexing.
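The arithmetic behind that conclusion is just per-service span duration divided by the end-to-end request time. Recomputing it from the breakdown above:

```python
# Per-service time as a share of the 3.2 s end-to-end checkout request
# (numbers taken from the trace breakdown above).
durations_ms = {
    "API Gateway": 50,
    "Auth Service": 100,
    "Order Service": 200,
    "Payment Service": 2500,
    "Inventory Service": 150,
}
total_request_ms = 3200  # end-to-end time reported by the root span

# Sort by duration so the bottleneck tops the list.
for service, ms in sorted(durations_ms.items(), key=lambda kv: -kv[1]):
    print(f"{service:18s} {ms:5d} ms  {ms / total_request_ms:6.1%}")
```

Payment Service comes out at roughly 78% of the request, which is the signal to drill into its child spans (in this case, the unindexed database query).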

Advanced Configuration: Production-Ready Setup

For production environments, you’ll want persistent storage. Here’s a configuration using Elasticsearch:

apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-config
  namespace: observability
data:
  config.yaml: |
    service:
      extensions: [health_check, zpages]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [batch, memory_limiter]
          exporters: [elasticsearch]
    
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    
    processors:
      batch:
        timeout: 5s
        send_batch_size: 512
      
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
    
    exporters:
      elasticsearch:
        endpoints: 
          - http://elasticsearch:9200
        index: jaeger-traces
        mapping:
          mode: ecs
    
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      
      zpages:
        endpoint: localhost:55679

Monitoring Jaeger Itself

Here’s a monitoring configuration to ensure Jaeger stays healthy:

apiVersion: v1
kind: Service
metadata:
  name: jaeger-metrics
  namespace: observability
  labels:
    app: jaeger
spec:
  ports:
  - name: metrics
    port: 8888
    targetPort: 8888
  selector:
    app.kubernetes.io/name: jaeger-all-in-one
  
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app: jaeger
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics

Best Practices for Production

  1. Sampling Strategy: Don’t trace everything—start with 1% sampling and adjust based on traffic
  2. Resource Limits: Set appropriate memory and CPU limits to prevent resource exhaustion
  3. Data Retention: Configure storage retention policies (7-30 days is typical)
  4. Security: Enable TLS and authentication for production deployments
  5. High Availability: Run multiple collector and query instances for redundancy
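To see why a 1% sampling rate still gives consistent traces, here is a sketch of the idea behind head-based, trace-ID-ratio sampling (the approach used by OpenTelemetry's `TraceIdRatioBased` sampler, simplified here): the keep/drop decision is a pure function of the trace ID, so every service in the call chain makes the same decision for a given trace.

```python
# Deterministic trace-ID-ratio sampling: keep a trace when the integer
# value of (part of) its random trace ID falls below a fixed bound.
import os

SAMPLE_RATIO = 0.01                      # start at 1% and tune from there
BOUND = int(SAMPLE_RATIO * (1 << 64))    # threshold over a 64-bit range

def sampled(trace_id: bytes) -> bool:
    """Same answer for the same trace ID, in every service."""
    return int.from_bytes(trace_id[:8], "big") < BOUND

# With random 16-byte trace IDs, roughly 1% of traces are kept.
kept = sum(sampled(os.urandom(16)) for _ in range(100_000))
print(f"kept {kept} of 100000 traces (~{kept / 1000:.1f}%)")
```

Because the decision is made once at the head of the trace and is reproducible from the ID alone, you never end up with half-recorded traces.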

Troubleshooting Common Issues

Traces not appearing?

  • Check if your application is properly instrumented with OpenTelemetry SDK
  • Verify network connectivity between services and Jaeger collector
  • Confirm the correct endpoint configuration (usually jaeger-collector:4317)
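A quick way to separate network problems from instrumentation problems is a plain TCP probe from inside a pod. A stdlib-only sketch (the host `jaeger-collector` is the usual in-cluster service name; substitute yours):

```python
# Probe whether the OTLP gRPC port is reachable at the TCP level.
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # DNS failure, refusal, or timeout
        return False

print("collector reachable:", can_reach("jaeger-collector", 4317))
```

If this prints `False`, fix DNS, NetworkPolicies, or the Service before touching the SDK configuration; if it prints `True`, the problem is on the instrumentation side.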

High memory usage?

  • Reduce sampling rate
  • Adjust batch processor settings
  • Enable memory limiter processor

The Bottom Line

Jaeger transforms microservices debugging from a frustrating guessing game into a data-driven investigation. With the new v2 architecture built on OpenTelemetry, you’re not just adopting a tracing tool; you’re investing in the future of cloud-native observability.

Whether you’re troubleshooting production incidents at 2 AM or optimizing performance during business hours, Jaeger gives you the visibility you need to debug microservices like a pro.

Ready to see what’s really happening inside your Kubernetes cluster? Install Jaeger today and watch your debugging productivity skyrocket.

Quick Reference

Essential Ports:

  • 16686 – Jaeger UI
  • 4317 – OTLP gRPC receiver
  • 4318 – OTLP HTTP receiver
  • 9411 – Zipkin compatible endpoint
  • 8888 – Prometheus metrics
  • 13133 – Health check endpoint

Useful Commands:

# Check Jaeger logs
kubectl logs -n observability deployment/jaeger-all-in-one -f

# Get Jaeger UI URL
kubectl get svc -n observability jaeger-ui

# View collector metrics
kubectl port-forward -n observability svc/jaeger-metrics 8888:8888
curl http://localhost:8888/metrics

# Restart Jaeger
kubectl rollout restart deployment/jaeger-all-in-one -n observability

Have questions about implementing Jaeger in your Kubernetes environment? Drop a comment below or connect with me on LinkedIn. Let’s debug those microservices together! 🔍
