Ever felt like your microservices are playing hide-and-seek in production? One API call enters your Kubernetes cluster, bounces through 15 different services, and emerges 3 seconds later (or doesn’t emerge at all). Welcome to the world of distributed systems debugging—where traditional logging feels like searching for a needle in a haystack while blindfolded.
Enter Jaeger, your X-ray vision for microservices architecture.
```mermaid
graph TD
    SDK["OpenTelemetry SDK"] -->|HTTP or gRPC| COLLECTOR
    COLLECTOR["Jaeger Collector"] --> STORE[Storage]
    COLLECTOR -->|gRPC| PLUGIN[Storage Plugin]
    COLLECTOR -->|gRPC/sampling| SDK
    PLUGIN --> STORE
    QUERY[Jaeger Query Service] --> STORE
    QUERY -->|gRPC| PLUGIN
    UI[Jaeger UI] -->|HTTP| QUERY
    subgraph Application Host
        subgraph User Application
            SDK
        end
    end
```
What is Distributed Tracing? (The Non-Technical Explanation)
Imagine you’re tracking a package through a delivery network. The package starts at the warehouse, goes through multiple sorting centers, delivery hubs, and finally reaches your doorstep. At each checkpoint, someone scans the barcode and records the timestamp.
That’s essentially what distributed tracing does for your API requests—it follows the journey of each request through your microservices ecosystem, recording every hop, every delay, and every interaction.
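To make the analogy concrete, here's a toy model of what a trace actually is — a tree of spans, each recording an operation name, its parent, and start/end timestamps. This is plain illustrative Python, not Jaeger's or OpenTelemetry's API; the class and field names are invented for the sketch.

```python
import time
import uuid

class Span:
    """One 'checkpoint scan': a named operation with timing and a parent link."""
    def __init__(self, name, trace_id, parent_id=None):
        self.name = name
        self.trace_id = trace_id           # shared by every span in one request
        self.span_id = uuid.uuid4().hex[:16]
        self.parent_id = parent_id         # links spans into a tree
        self.start = time.time()
        self.end = None

    def finish(self):
        self.end = time.time()

    def duration_ms(self):
        return (self.end - self.start) * 1000

# One request = one trace_id, carried across every hop
trace_id = uuid.uuid4().hex
root = Span("checkout", trace_id)                          # the whole journey
child = Span("charge-card", trace_id, parent_id=root.span_id)  # one hop inside it
child.finish()
root.finish()
```

Jaeger's UI is essentially this tree rendered as a timeline: the shared `trace_id` groups the spans, and the `parent_id` links reconstruct who called whom.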
```mermaid
graph LR
    A[User Request] --> B[API Gateway]
    B --> C[Auth Service]
    B --> D[Order Service]
    D --> E[Payment Service]
    D --> F[Inventory Service]
    E --> G[Notification Service]
    F --> G
    style A fill:#e1f5ff
    style B fill:#fff4e1
    style C fill:#ffe1e1
    style D fill:#e1ffe1
    style E fill:#f0e1ff
    style F fill:#ffe1f5
    style G fill:#fff9e1
```
Why Jaeger on Kubernetes?
Jaeger, originally built by Uber to track millions of transactions daily, is now a graduated CNCF project perfectly aligned with Kubernetes-native architectures. The newly released Jaeger v2 (as of November 2024) brings game-changing improvements built on the OpenTelemetry Collector framework.
Here’s why it matters:
- Performance bottleneck detection in seconds, not hours
- Root cause analysis across distributed services
- Service dependency mapping to understand system topology
- OpenTelemetry compatibility for future-proof instrumentation
The Architecture: How Jaeger Works
```mermaid
sequenceDiagram
    participant App as Your Application
    participant Agent as Jaeger Agent
    participant Collector as Jaeger Collector
    participant Storage as Storage Backend
    participant UI as Jaeger UI
    App->>Agent: Send trace spans
    Agent->>Collector: Batch and forward
    Collector->>Storage: Write traces
    UI->>Storage: Query traces
    Storage->>UI: Return results
    UI->>UI: Visualize trace timeline
```
Think of it like this:
- Your apps = Witnesses reporting what they see
- Jaeger Agent = Local police station collecting reports
- Collector = Central headquarters processing information
- Storage = Archive for historical records
- UI = Detective board connecting all the dots
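What stitches all these "witness reports" into one case file is context propagation: each service forwards the trace ID to the next over the W3C `traceparent` HTTP header, which OpenTelemetry SDKs read and write for you. Here's a simplified parser for that header format — a hypothetical helper, not part of any SDK:

```python
def parse_traceparent(header):
    """Split a W3C traceparent header: version-traceid-spanid-flags."""
    version, trace_id, parent_span_id, flags = header.split("-")
    if len(trace_id) != 32 or len(parent_span_id) != 16:
        raise ValueError("malformed traceparent")
    return {
        "trace_id": trace_id,                   # 16-byte trace ID, hex-encoded
        "parent_span_id": parent_span_id,       # 8-byte ID of the caller's span
        "sampled": int(flags, 16) & 0x01 == 1,  # bit 0 = sampled flag
    }

# Example header from the W3C Trace Context spec
ctx = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
```

Because the trace ID rides along in the header, the Payment Service's spans land in the same trace as the API Gateway's, even though the two never talk to Jaeger about each other.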
Quick Start: Deploy Jaeger on Kubernetes
Let’s get our hands dirty. Here’s how to deploy Jaeger using the OpenTelemetry Operator (the recommended approach for v2):
Step 1: Install the Operator
First, add the OpenTelemetry Operator to your cluster:
```shell
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```
Step 2: Create Jaeger Instance Configuration
Create a file named jaeger-instance.yaml with the following configuration:
```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: jaeger-all-in-one
  namespace: observability
spec:
  mode: deployment
  image: jaegertracing/jaeger:2.13.0
  config: |
    extensions:
      jaeger_storage:
        backends:
          memstore:
            memory:
              max_traces: 100000
      jaeger_query:
        storage:
          traces: memstore
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      zipkin:
        endpoint: 0.0.0.0:9411
    processors:
      batch:
        timeout: 10s
        send_batch_size: 1024
    exporters:
      jaeger_storage_exporter:
        trace_storage: memstore
    service:
      extensions: [jaeger_storage, jaeger_query]
      pipelines:
        traces:
          receivers: [otlp, zipkin]
          processors: [batch]
          exporters: [jaeger_storage_exporter]
```
Note: Jaeger v2 writes spans through its own `jaeger_storage` extension (here an in-memory backend) rather than the legacy `jaeger` exporter, which has been removed from the OpenTelemetry Collector.
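With the collector listening on 4318, any service can ship spans as plain OTLP/HTTP JSON. The sketch below hand-builds a minimal payload (field names follow the OTLP JSON encoding; the endpoint assumes the ports configured above) — in practice the OpenTelemetry SDK assembles and sends this for you:

```python
import json
import time
import uuid

# Assumed reachable after exposing the collector, e.g. via port-forward
OTLP_ENDPOINT = "http://localhost:4318/v1/traces"

now = time.time_ns()
payload = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "demo-service"}}
        ]},
        "scopeSpans": [{
            "spans": [{
                "traceId": uuid.uuid4().hex,      # 32 hex chars (16 bytes)
                "spanId": uuid.uuid4().hex[:16],  # 16 hex chars (8 bytes)
                "name": "manual-test-span",
                "kind": 2,                        # SPAN_KIND_SERVER
                "startTimeUnixNano": str(now - 50_000_000),  # started 50 ms ago
                "endTimeUnixNano": str(now),
            }]
        }]
    }]
}
body = json.dumps(payload)
# Send it with: curl -X POST -H 'Content-Type: application/json' \
#   -d "$body" http://localhost:4318/v1/traces
```

Posting one of these by hand is a handy smoke test: if the span shows up in the UI under `demo-service`, the receive-to-storage path works end to end.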
Step 3: Deploy with Service Exposure
Save the following as jaeger-service.yaml:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-ui
  namespace: observability
spec:
  type: LoadBalancer
  ports:
    - name: ui
      port: 16686
      targetPort: 16686
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
    - name: otlp-http
      port: 4318
      targetPort: 4318
  selector:
    app.kubernetes.io/name: jaeger-all-in-one
```
Apply both configurations:
```shell
kubectl create namespace observability
kubectl apply -f jaeger-instance.yaml
kubectl apply -f jaeger-service.yaml
```
Step 4: Verify Installation
Check if Jaeger is running:
```shell
kubectl get pods -n observability
kubectl get svc -n observability
```
Access the UI by port-forwarding:
```shell
kubectl port-forward svc/jaeger-ui 16686:16686 -n observability
```
Navigate to http://localhost:16686 and you’ll see the Jaeger dashboard!
Real-World Debugging Scenario
Let’s say you’re experiencing slow checkout times in your e-commerce platform. Here’s how you’d use Jaeger to investigate:
```mermaid
graph TB
    A[Slow Checkout Alert] --> B{Check Jaeger}
    B --> C[View Checkout Traces]
    C --> D[Identify 2s Delay]
    D --> E[Payment Service]
    E --> F[Database Query Issue]
    F --> G[Missing Index Found]
    G --> H[Add Index]
    H --> I[Checkout Time: 200ms]
    style A fill:#ff6b6b
    style I fill:#51cf66
    style F fill:#ffd43b
```
What Jaeger Reveals:
- Total request time: 3.2 seconds
- Breakdown by service:
  - API Gateway: 50ms
  - Auth Service: 100ms
  - Order Service: 200ms
  - Payment Service: 2.5s ⚠️
  - Inventory Service: 150ms
You immediately see that 78% of your checkout time is spent in the Payment Service. Drilling down further, you discover a database query without proper indexing.
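The 78% figure falls straight out of the span durations — the slowest service's self time divided by the total request time. A quick check of the numbers above (plain Python, values copied from the trace breakdown):

```python
# Per-service self time from the trace breakdown (milliseconds)
breakdown = {
    "API Gateway": 50,
    "Auth Service": 100,
    "Order Service": 200,
    "Payment Service": 2500,
    "Inventory Service": 150,
}
total_request_ms = 3200                        # root span: 3.2 s end to end

worst = max(breakdown, key=breakdown.get)      # service with the largest self time
share = breakdown[worst] / total_request_ms

print(f"{worst}: {share:.0%} of the request")  # → Payment Service: 78% of the request
```

This kind of arithmetic is exactly what Jaeger's trace timeline shows visually: the Payment Service span dwarfs everything else, so that's where you drill down first.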



Advanced Configuration: Production-Ready Setup
For production environments, you’ll want persistent storage. Here’s a configuration using Elasticsearch:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: jaeger-config
  namespace: observability
data:
  config.yaml: |
    service:
      extensions: [health_check, zpages]
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [elasticsearch]
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch:
        timeout: 5s
        send_batch_size: 512
      memory_limiter:
        check_interval: 1s
        limit_mib: 512
    exporters:
      elasticsearch:
        endpoints:
          - http://elasticsearch:9200
        traces_index: jaeger-traces
        mapping:
          mode: ecs
    extensions:
      health_check:
        endpoint: 0.0.0.0:13133
      zpages:
        endpoint: localhost:55679
```
Note: `memory_limiter` should come first in the pipeline so it can reject data before batching consumes memory.
Monitoring Jaeger Itself
Here’s a monitoring configuration to ensure Jaeger stays healthy:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: jaeger-metrics
  namespace: observability
  labels:
    app: jaeger
spec:
  ports:
    - name: metrics
      port: 8888
      targetPort: 8888
  selector:
    app.kubernetes.io/name: jaeger-all-in-one
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: jaeger-monitor
  namespace: observability
spec:
  selector:
    matchLabels:
      app: jaeger
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
```
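The metrics endpoint speaks the Prometheus text format, and the span-throughput counters (`otelcol_receiver_accepted_spans` vs. `otelcol_exporter_sent_spans`, the default OpenTelemetry Collector telemetry names) are the first things to compare when traces go missing. A small parser sketch, run here against sample scrape output rather than a live endpoint:

```python
def parse_counters(metrics_text):
    """Sum counter values per metric name from Prometheus text-format output."""
    counters = {}
    for line in metrics_text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{")[0]  # strip the {label="..."} part
        counters[name] = counters.get(name, 0.0) + float(value)
    return counters

# Sample output, as curl http://localhost:8888/metrics would return it
sample = """\
# TYPE otelcol_receiver_accepted_spans counter
otelcol_receiver_accepted_spans{receiver="otlp"} 1024
otelcol_exporter_sent_spans{exporter="jaeger_storage_exporter"} 1010
"""
c = parse_counters(sample)
```

If accepted spans greatly exceed sent spans over time, the collector is dropping data between receive and export — usually a signal to check the memory limiter or the storage backend.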
Best Practices for Production
- Sampling Strategy: Don’t trace everything—start with 1% sampling and adjust based on traffic
- Resource Limits: Set appropriate memory and CPU limits to prevent resource exhaustion
- Data Retention: Configure storage retention policies (7-30 days is typical)
- Security: Enable TLS and authentication for production deployments
- High Availability: Run multiple collector and query instances for redundancy
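Head-based sampling works best when it's deterministic on the trace ID, so every service makes the same keep/drop decision without coordinating. This sketch mirrors, in simplified form, the idea behind OpenTelemetry's `TraceIdRatioBased` sampler (the real implementation differs in detail):

```python
def should_sample(trace_id_hex, ratio):
    """Keep a trace iff the lower 8 bytes of its ID fall below ratio * 2^64."""
    # The lower 64 bits of a random trace ID act as a uniform random value
    lower64 = int(trace_id_hex[16:32], 16)
    return lower64 < int(ratio * (1 << 64))

# Every service computes the same answer for the same trace ID
tid = "4bf92f3577b34da6a3ce929d0e0e4736"
keep_all = should_sample(tid, 1.0)   # ratio 1.0 keeps everything
keep_none = should_sample(tid, 0.0)  # ratio 0.0 keeps nothing
```

Start at a low ratio (the 1% suggested above) and raise it for services you're actively debugging; because the decision is a pure function of the trace ID, a kept trace is kept end to end.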
Troubleshooting Common Issues
Traces not appearing?
- Check if your application is properly instrumented with OpenTelemetry SDK
- Verify network connectivity between services and Jaeger collector
- Confirm the correct endpoint configuration (usually jaeger-collector:4317)
High memory usage?
- Reduce sampling rate
- Adjust batch processor settings
- Enable memory limiter processor
The Bottom Line
Jaeger transforms microservices debugging from a frustrating guessing game into a data-driven investigation. With the new v2 architecture built on OpenTelemetry, you're not just adopting a tracing tool; you're investing in the future of cloud-native observability.
Whether you’re troubleshooting production incidents at 2 AM or optimizing performance during business hours, Jaeger gives you the visibility you need to debug microservices like a pro.
Ready to see what’s really happening inside your Kubernetes cluster? Install Jaeger today and watch your debugging productivity skyrocket.
Quick Reference
Essential Ports:
- 16686 – Jaeger UI
- 4317 – OTLP gRPC receiver
- 4318 – OTLP HTTP receiver
- 9411 – Zipkin compatible endpoint
- 8888 – Prometheus metrics
- 13133 – Health check endpoint
Useful Commands:
```shell
# Check Jaeger logs
kubectl logs -n observability deployment/jaeger-all-in-one -f

# Get Jaeger UI URL
kubectl get svc -n observability jaeger-ui

# View collector metrics
kubectl port-forward -n observability svc/jaeger-metrics 8888:8888
curl http://localhost:8888/metrics

# Restart Jaeger
kubectl rollout restart deployment/jaeger-all-in-one -n observability
```
Have questions about implementing Jaeger in your Kubernetes environment? Drop a comment below or connect with me on LinkedIn. Let’s debug those microservices together! 🔍