OpenTelemetry Collector: Unified Observability Pipeline
In the complex world of cloud-native applications, gaining comprehensive visibility into your systems is paramount. Microservices, distributed architectures, and dynamic Kubernetes environments generate an unprecedented volume of telemetry data—logs, metrics, and traces. Correlating this data across disparate services and infrastructure components is a significant challenge. Traditional monitoring solutions often struggle to keep pace, leading to data silos, vendor lock-in, and increased operational overhead. This is where OpenTelemetry steps in, offering a vendor-agnostic, open-source standard for instrumenting, generating, and exporting telemetry data.
At the heart of the OpenTelemetry ecosystem lies the OpenTelemetry Collector. This powerful, flexible, and vendor-agnostic component acts as a central processing pipeline for your observability data. It can receive telemetry in various formats, process it (e.g., filter, transform, enrich), and export it to multiple backend destinations, all before it ever leaves your infrastructure. Deploying the OpenTelemetry Collector within your Kubernetes cluster allows you to standardize your observability strategy, reduce the burden on your application services, and ensure that your critical operational insights are always available, regardless of your chosen backend analytics platform. This guide will walk you through deploying and configuring the OpenTelemetry Collector on Kubernetes, establishing a robust and unified observability pipeline.
TL;DR: OpenTelemetry Collector on Kubernetes
Deploy the OpenTelemetry Collector as a DaemonSet or Deployment to collect, process, and export metrics, traces, and logs from your Kubernetes cluster. Here’s how to quickly get started:
- Install the OTel Operator:
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
- Deploy the Collector (example configuration, saved as otel-collector.yaml):
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: daemonset # or deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      k8s_cluster:
        auth_type: serviceAccount
    processors:
      batch:
      memory_limiter:
        check_interval: 1s
        limit_mib: 100
        spike_limit_mib: 20
    exporters:
      logging:
        loglevel: debug
      # To forward to a backend instead, define and wire in an otlp exporter:
      # otlp:
      #   endpoint: "my-backend.example.svc.cluster.local:4317"
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging]
        metrics:
          receivers: [otlp, k8s_cluster]
          processors: [memory_limiter, batch]
          exporters: [logging]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [logging]
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector-metrics
spec:
  selector:
    app.kubernetes.io/instance: otel-collector
    app.kubernetes.io/component: otel-collector
  ports:
    - protocol: TCP
      port: 4317
      targetPort: 4317
      name: otlp-grpc
    - protocol: TCP
      port: 4318
      targetPort: 4318
      name: otlp-http
  type: ClusterIP
- Apply the manifest:
kubectl apply -f otel-collector.yaml
- Verify the deployment:
kubectl get pods -l app.kubernetes.io/instance=otel-collector
This setup uses the OpenTelemetry Operator to deploy a Collector as a DaemonSet, receiving OTLP data and exporting it to logs (for demonstration) or another OTLP endpoint.
Prerequisites
Before diving into the deployment, ensure you have the following:
- Kubernetes Cluster: A running Kubernetes cluster (v1.19+ recommended). You can use Minikube, Kind, or a cloud-managed cluster like EKS, AKS, or GKE.
- kubectl: The Kubernetes command-line tool, configured to connect to your cluster. Download instructions can be found in the official Kubernetes documentation.
- Helm (Optional but Recommended): For easier management of the OpenTelemetry Operator and Collector. Install instructions are on the Helm website.
- Basic understanding of Kubernetes concepts: Pods, Deployments, DaemonSets, Services, and ConfigMaps.
- Basic understanding of OpenTelemetry concepts: Receivers, Processors, Exporters, and Pipelines. Refer to the OpenTelemetry documentation for more details.
Step-by-Step Guide: Deploying OpenTelemetry Collector on Kubernetes
1. Install the OpenTelemetry Operator
The OpenTelemetry Operator simplifies the deployment and management of the OpenTelemetry Collector within Kubernetes. It provides custom resources (CRDs) like OpenTelemetryCollector, allowing you to define your collector configuration declaratively. This is the recommended approach for production environments as it handles lifecycle management, scaling, and updates.
First, we’ll install the operator using its official manifest. This will deploy the necessary CRDs, RBAC roles, and the operator itself into your cluster, typically in the opentelemetry-operator-system namespace. Note that this manifest relies on cert-manager for the operator’s admission webhook certificates, so install cert-manager first if your cluster doesn’t already have it. Once running, the operator watches for OpenTelemetryCollector custom resources and manages the actual collector deployments.
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
Verify:
After applying the manifest, you should see the operator pod running in its dedicated namespace. It might take a moment for the pod to reach the Running state.
kubectl get pods -n opentelemetry-operator-system
Expected Output:
NAME READY STATUS RESTARTS AGE
opentelemetry-operator-controller-manager-XXXXX 1/1 Running 0 2m
2. Define Your OpenTelemetry Collector Configuration
The heart of the OpenTelemetry Collector is its configuration. This YAML defines how the collector receives, processes, and exports telemetry data. We’ll create a basic configuration that receives data via OTLP (OpenTelemetry Protocol), processes it with batching, and exports it to the console (for demonstration) and potentially to another OTLP endpoint. For collecting infrastructure metrics, we’ll also include the k8s_cluster receiver, which gathers cluster-level metrics about Kubernetes objects.
This configuration defines three main components:
- Receivers: How data gets into the collector. Here, otlp receives traces, metrics, and logs from applications, and k8s_cluster collects cluster-level metrics about Kubernetes resources.
- Processors: How data is modified. batch buffers data to reduce export calls, and memory_limiter prevents the collector from consuming too much memory.
- Exporters: Where data goes. logging prints telemetry to the collector’s standard output (useful for debugging), and otlp exports data to another OTLP endpoint (e.g., a backend like Jaeger, Prometheus, or a cloud observability service).
- Service: Orchestrates the pipelines, connecting receivers to processors and exporters for traces, metrics, and logs.
Create a file named otel-collector.yaml with the following content:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
name: otel-collector
spec:
mode: daemonset # Can be deployment, daemonset, or statefulset
image: "otel/opentelemetry-collector-contrib:0.96.0" # Use a specific version, contrib includes more receivers/exporters
# Set resource limits for the collector pods
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
      # k8s_cluster collects cluster-level metrics about Kubernetes objects
      # (pods, nodes, deployments, replicasets, and more). Note: in daemonset
      # mode every collector pod reports these metrics; to avoid duplicates,
      # run this receiver in a single-replica deployment-mode collector.
      # For node-level host data, pair daemonset mode with receivers such as
      # hostmetrics (included in the contrib image).
      k8s_cluster:
        auth_type: serviceAccount
    processors:
      memory_limiter:
        # Required: how often memory usage is checked.
        check_interval: 1s
        # Keep comfortably below the pod memory limit (256Mi above).
        limit_mib: 150
        # Extra headroom allowed for short spikes.
        spike_limit_mib: 50
      batch:
        send_batch_size: 1000
        timeout: 5s
      resourcedetection/system:
        detectors: [system]
        system:
          resource_attributes:
            os.type:
              enabled: true
            os.description:
              enabled: true
            host.arch:
              enabled: true
            host.name:
              enabled: true
      # Processor to add Kubernetes metadata to telemetry
      k8sattributes:
        auth_type: "serviceAccount"
        passthrough: false
        # Associate incoming telemetry with pods; each rule needs a sources list
        pod_association:
          - sources:
              - from: resource_attribute
                name: k8s.pod.uid
          - sources:
              - from: resource_attribute
                name: k8s.pod.name
          - sources:
              - from: connection
        # Add metadata to traces, metrics, and logs
        extract:
          metadata:
            - k8s.pod.name
            - k8s.deployment.name
            - k8s.namespace.name
            - k8s.node.name
          # k8s.cluster.name is not extracted by this processor; set it via
          # OTEL_RESOURCE_ATTRIBUTES or a resource processor if you need it.
    exporters:
      # Console exporter for debugging
      logging:
        loglevel: debug
      # OTLP exporter to send data to another OTLP endpoint
      # (e.g., a central collector, Jaeger v1.35+, or a vendor backend)
      otlp:
        endpoint: "otel-collector-backend.observability.svc.cluster.local:4317" # Replace with your actual backend endpoint
        tls:
          insecure: true # Use false and configure certs for production
      # Example: Prometheus remote write exporter
      # prometheusremotewrite:
      #   endpoint: "http://prometheus-server.monitoring.svc.cluster.local:9090/api/v1/write"
    service:
      telemetry:
        logs:
          level: "debug"
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resourcedetection/system, batch]
          exporters: [logging, otlp] # Export to console and OTLP backend
        metrics:
          receivers: [otlp, k8s_cluster]
          processors: [memory_limiter, k8sattributes, resourcedetection/system, batch]
          exporters: [logging, otlp]
        logs:
          receivers: [otlp]
          processors: [memory_limiter, k8sattributes, resourcedetection/system, batch]
          exporters: [logging, otlp]
---
# Define a Service to expose the OTLP receiver endpoint for applications
apiVersion: v1
kind: Service
metadata:
name: otel-collector-ingest
labels:
app.kubernetes.io/instance: otel-collector
app.kubernetes.io/component: otel-collector
spec:
selector:
app.kubernetes.io/instance: otel-collector
ports:
- protocol: TCP
port: 4317 # OTLP gRPC port
targetPort: 4317
name: otlp-grpc
- protocol: TCP
port: 4318 # OTLP HTTP port
targetPort: 4318
name: otlp-http
type: ClusterIP
---
# Define a Service for internal OTLP export, if the collector exports to another collector
apiVersion: v1
kind: Service
metadata:
name: otel-collector-backend
labels:
app.kubernetes.io/instance: otel-collector
app.kubernetes.io/component: otel-collector
spec:
selector:
app.kubernetes.io/instance: otel-collector
ports:
- protocol: TCP
port: 4317
targetPort: 4317
name: otlp-grpc
- protocol: TCP
port: 4318
targetPort: 4318
name: otlp-http
type: ClusterIP
Explanation:
- The OpenTelemetryCollector custom resource defines a collector instance.
- mode: daemonset means a collector pod runs on every node. This is ideal for host-level metrics, logs, and low-latency collection from applications on that node. Alternatively, deployment can be used for a centralized collector that receives data from all applications.
- image specifies the collector image. Using otel/opentelemetry-collector-contrib provides a wider range of receivers, processors, and exporters. Always pin to a specific version for stability.
- resources defines CPU and memory requests and limits, crucial for stability in a shared Kubernetes environment.
- The config block contains the core OpenTelemetry Collector configuration.
- The k8sattributes processor is vital for enriching telemetry with Kubernetes metadata like pod names, namespaces, and deployment names, making your observability data much more useful. For more on Kubernetes networking and context, see our Kubernetes Network Policies Security Guide.
- Two Service resources are defined:
  - otel-collector-ingest: Exposes the OTLP gRPC (port 4317) and HTTP (port 4318) endpoints so applications can send telemetry to the collector.
  - otel-collector-backend: An example service endpoint for internal OTLP export, if you have a multi-tier collector architecture (e.g., agent collectors sending to a central gateway collector).
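The memory_limiter values in a configuration like this are typically derived from the pod’s memory limit. As a rough sizing sketch (the 80%/25% ratios are common community guidance, not an official formula):

```python
# Rough heuristic for sizing the memory_limiter processor from the pod's
# memory limit. The ratios below (hard limit ~80% of the cgroup limit,
# spike headroom ~25% of that) are community guidance, not an official rule.

def memory_limiter_settings(pod_limit_mib: int) -> dict:
    """Suggest limit_mib and spike_limit_mib for a given pod memory limit."""
    limit_mib = int(pod_limit_mib * 0.8)     # stay below the cgroup limit
    spike_limit_mib = int(limit_mib * 0.25)  # headroom for short-lived spikes
    return {"limit_mib": limit_mib, "spike_limit_mib": spike_limit_mib}

# For the 256Mi pod limit used in the manifest above:
print(memory_limiter_settings(256))  # {'limit_mib': 204, 'spike_limit_mib': 51}
```

Whatever ratios you choose, re-check them whenever you change the pod’s resource limits, since a limit_mib above the cgroup limit defeats the purpose of the processor.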
3. Deploy the OpenTelemetry Collector
Now, apply the otel-collector.yaml manifest to your Kubernetes cluster. The OpenTelemetry Operator will detect the new OpenTelemetryCollector resource and create the necessary Kubernetes Deployments/DaemonSets, Services, and ConfigMaps.
kubectl apply -f otel-collector.yaml
Verify:
Check if the collector pods are running and the services are created.
kubectl get pods -l app.kubernetes.io/instance=otel-collector
kubectl get svc -l app.kubernetes.io/instance=otel-collector
Expected Output (Pods):
NAME READY STATUS RESTARTS AGE
otel-collector-XXXXX-yyyy 1/1 Running 0 30s
otel-collector-XXXXX-zzzz 1/1 Running 0 30s
# ... one pod per node if mode: daemonset
Expected Output (Services):
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
otel-collector-ingest ClusterIP 10.96.XXX.XXX <none> 4317/TCP,4318/TCP 30s
otel-collector-backend ClusterIP 10.96.YYY.YYY <none> 4317/TCP,4318/TCP 30s
4. Configure Applications to Send Telemetry to the Collector
Once the collector is deployed, you need to instrument your applications to send their telemetry data to it. OpenTelemetry provides SDKs for various languages (official documentation). The most common approach is to configure your application’s OpenTelemetry SDK to export data via OTLP to the collector’s service endpoint.
For example, if your application runs in the same namespace as the collector (here, default), it can reach the service simply as otel-collector-ingest, or by the fully qualified name otel-collector-ingest.default.svc.cluster.local. From a different namespace (e.g., my-app-ns), use the fully qualified name.
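These in-cluster addresses all follow Kubernetes’ standard &lt;service&gt;.&lt;namespace&gt;.svc.&lt;cluster-domain&gt; DNS convention. A tiny illustrative helper (not part of any SDK) makes the pattern explicit:

```python
# Build the OTLP endpoint for a collector Service, following Kubernetes'
# standard <service>.<namespace>.svc.<cluster-domain> DNS convention.

def otlp_endpoint(service: str, namespace: str,
                  port: int = 4317, scheme: str = "") -> str:
    host = f"{service}.{namespace}.svc.cluster.local"
    return f"{scheme}://{host}:{port}" if scheme else f"{host}:{port}"

# gRPC exporters usually take host:port; HTTP exporters take a full URL.
print(otlp_endpoint("otel-collector-ingest", "default"))
# otel-collector-ingest.default.svc.cluster.local:4317
print(otlp_endpoint("otel-collector-ingest", "default", 4318, "http"))
# http://otel-collector-ingest.default.svc.cluster.local:4318
```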
Here’s an example of a Kubernetes Deployment that uses environment variables to point an instrumented application’s OpenTelemetry SDK at the collector:
apiVersion: apps/v1
kind: Deployment
metadata:
name: my-instrumented-app
spec:
replicas: 1
selector:
matchLabels:
app: my-instrumented-app
template:
metadata:
labels:
app: my-instrumented-app
spec:
containers:
- name: my-app
image: my-instrumented-app:latest # Replace with your application image
ports:
- containerPort: 8080
env:
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://otel-collector-ingest.default.svc.cluster.local:4318" # OTLP HTTP endpoint
- name: OTEL_EXPORTER_OTLP_PROTOCOL
value: "http/protobuf" # or grpc
- name: OTEL_RESOURCE_ATTRIBUTES
value: "service.name=my-instrumented-app,service.version=1.0.0"
# ... other OpenTelemetry environment variables for traces, metrics, logs, etc.
Explanation:
The environment variables OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL are crucial. They tell the OpenTelemetry SDK within your application where to send its collected telemetry data. We’re pointing it to the otel-collector-ingest service, which then routes to the collector pods.
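Per the OTLP exporter specification, SDKs also honor signal-specific variables such as OTEL_EXPORTER_OTLP_TRACES_ENDPOINT, which take precedence over the generic endpoint. A simplified sketch of that resolution order (illustrative, not actual SDK code):

```python
import os

# Sketch of how OpenTelemetry SDKs resolve the exporter endpoint from the
# environment: a signal-specific variable (e.g. OTEL_EXPORTER_OTLP_TRACES_ENDPOINT)
# overrides the generic OTEL_EXPORTER_OTLP_ENDPOINT, which overrides the default.

def resolve_otlp_endpoint(signal: str,
                          default: str = "http://localhost:4318") -> str:
    specific = os.environ.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    generic = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    return specific or generic or default

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = \
    "http://otel-collector-ingest.default.svc.cluster.local:4318"
print(resolve_otlp_endpoint("traces"))
# http://otel-collector-ingest.default.svc.cluster.local:4318
```

This is why setting only the generic variable in the Deployment above is enough: all three signals fall back to it unless you override one explicitly.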
Verify:
Once your application is deployed and running, check the logs of your OpenTelemetry Collector pods. You should start seeing telemetry data being received and processed, visible via the logging exporter.
kubectl logs -l app.kubernetes.io/instance=otel-collector -f
Expected Output (example trace log):
2023-10-27T10:30:00.123Z INFO logging/logging.go:78 TracesExporter {"#spans": 1, "resource spans": [{"resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "my-instrumented-app"}}, {"key": "host.name", "value": {"stringValue": "otel-collector-XXXXX"}}]}, "scope_spans": [{"scope": {"name": "my-app-scope"}, "spans": [{"trace_id": "...", "span_id": "...", "parent_span_id": "...", "name": "doSomething", "start_time": "...", "end_time": "...", "kind": "SPAN_KIND_INTERNAL", "attributes": [{"key": "http.method", "value": {"stringValue": "GET"}}]}]}]}]}
5. Integrate with a Backend Observability Platform
The final step is to ensure your processed telemetry reaches your desired backend. In our collector configuration, we’ve included an otlp exporter pointing to otel-collector-backend.observability.svc.cluster.local:4317. This is a placeholder. You’ll need to replace this with the actual endpoint of your observability backend. This could be:
- Jaeger / Zipkin: For distributed tracing.
- Prometheus / Mimir / Thanos: For metrics (often via Prometheus remote write or OTLP).
- Loki / Elasticsearch / Splunk: For logs.
- Cloud Provider Services: AWS CloudWatch, Google Cloud Operations, Azure Monitor.
- Commercial APM Tools: Datadog, New Relic, Dynatrace (often with specific OpenTelemetry exporters).
For example, to send to a Jaeger collector, you would change the otlp exporter endpoint. If you are using a service mesh like Istio Ambient Mesh, you might find additional ways to collect and export telemetry data.
Here’s an example of modifying the otlp exporter to point to a Jaeger collector (assuming Jaeger is deployed in the observability namespace):
exporters:
logging:
loglevel: debug
otlp:
endpoint: "jaeger-collector.observability.svc.cluster.local:4317" # Jaeger OTLP gRPC endpoint
tls:
insecure: true # Adjust for production
Re-apply the updated manifest:
kubectl apply -f otel-collector.yaml
Verify:
Check your Jaeger UI (or other backend) to confirm that traces, metrics, and logs are being received and displayed correctly. This confirms your unified observability pipeline is fully operational.
Production Considerations
Deploying the OpenTelemetry Collector in production requires careful planning for reliability, scalability, and security:
- Resource Management: Always define requests and limits for CPU and memory. Monitor collector resource usage carefully and adjust as needed. Collectors can be memory-intensive, especially with high-cardinality metrics or large trace payloads.
- High Availability and Scalability:
- DaemonSet vs. Deployment: For agents collecting from nodes, a DaemonSet is appropriate. For central gateway collectors, a Deployment with multiple replicas and a Horizontal Pod Autoscaler (HPA) based on CPU/memory usage or custom metrics is recommended.
- Multi-tier Architecture: Consider a two-tier setup: an agent collector (DaemonSet) on each node for local collection, forwarding to a central gateway collector (Deployment) for advanced processing and exporting to backends. This reduces the blast radius and centralizes complex logic.
- Security:
- TLS/mTLS: Always enable TLS for OTLP communication, especially between collectors and backends, and ideally from applications to collectors. Use Kubernetes secrets to manage certificates.
- Authentication/Authorization: Implement authentication for exporters (e.g., API keys, OAuth2) if your backend supports it. Refer to the Collector security documentation.
- Network Policies: Restrict network access to your collector’s ingress ports and egress ports. Only allow trusted applications to send data to the collector and only allow the collector to send data to approved backend endpoints. For detailed guidance, consult our Kubernetes Network Policies: Complete Security Hardening Guide.
- Image Security: Use trusted, signed container images. Tools like Sigstore and Kyverno can enforce image signing policies.
- Data Persistence: If using stateful collectors (e.g., for disk-based buffering), ensure persistent storage is configured.
- Configuration Management: Use GitOps practices to manage your collector configurations. Store them in a version control system and apply changes declaratively.
- Monitoring the Collector Itself: Expose the collector’s internal metrics (e.g., scraped via the prometheus receiver or inspected with the zpages extension) to monitor its health, data throughput, and error rates.
- Cost Optimization: Efficiently configure processors (e.g., batch, tail_sampling, filter) to reduce the volume of data sent to expensive backends. For overall Kubernetes cost savings, tools like Karpenter Cost Optimization can help manage underlying infrastructure.
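The TLS guidance above can be sketched as an otlp exporter that reads certificates mounted from a Kubernetes Secret. The endpoint, mount path, and file names below are illustrative assumptions, not values from this guide’s manifests:

```yaml
# Illustrative otlp exporter with TLS enabled. The cert paths assume a Secret
# mounted at /etc/otel/certs via volumes/volumeMounts in the collector spec.
exporters:
  otlp:
    endpoint: "otel-gateway.observability.svc.cluster.local:4317"
    tls:
      insecure: false
      ca_file: /etc/otel/certs/ca.crt
      cert_file: /etc/otel/certs/tls.crt
      key_file: /etc/otel/certs/tls.key
```

Pair this with a cert issuer such as cert-manager so the Secret is rotated automatically rather than managed by hand.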
Troubleshooting
- Collector Pods Not Running / CrashLoopBackOff:
  Issue: The collector pods fail to start or repeatedly crash.
  Solution: Check the pod logs and events for error messages:
  kubectl logs <collector-pod-name>
  kubectl describe pod <collector-pod-name>
  Common causes include:
  - Configuration Errors: YAML syntax errors or incorrect receiver/processor/exporter names in the config: | block.
  - Resource Constraints: The pod might be running out of memory or CPU. Increase requests and limits in the OpenTelemetryCollector manifest.
  - Image Pull Issues: Incorrect image name or tag, or authentication failure to the container registry.
- No Telemetry Data Reaching the Collector:
  Issue: Applications are instrumented, but the collector logs show no incoming data.
  Solution:
  - Check Application Configuration: Ensure OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_PROTOCOL are correctly set in your application’s environment variables, pointing to the collector’s service (e.g., otel-collector-ingest.default.svc.cluster.local:4317).
  - Network Connectivity: Verify that your application pods can reach the collector service:
    kubectl exec -it <your-app-pod> -- /bin/sh
    # Inside the pod (if the tools are available in the image):
    # nc -vz otel-collector-ingest.default.svc.cluster.local 4317
    # curl -v http://otel-collector-ingest.default.svc.cluster.local:4318
  - Collector Receiver Configuration: Ensure the otlp receiver is correctly configured in your collector YAML, and its ports match the application’s export protocol (gRPC on 4317, HTTP on 4318).
- Telemetry Data Not Reaching the Backend:
  Issue: Collector logs show data being received, but it’s not appearing in your backend (Jaeger, Prometheus, etc.).
  Solution:
  - Collector Exporter Configuration: Double-check the exporter endpoint (e.g., exporters.otlp.endpoint) in the collector configuration. Ensure it points to the correct backend address and port.
  - Backend Connectivity: Verify the collector pods can reach the backend. This might involve checking Kubernetes Network Policies (see our Network Policies Security Guide), firewall rules, or DNS resolution.
  - Backend Health: Ensure your observability backend (e.g., Jaeger collector, Prometheus remote write endpoint) is healthy and listening on the expected port.
  - Backend Authentication: If your backend requires API keys or other credentials, ensure they are correctly configured in the collector’s exporter section, usually via Kubernetes Secrets.
- Missing Kubernetes Metadata:
  Issue: Telemetry data is exported, but lacks Kubernetes attributes like pod name, namespace, etc.
  Solution:
  - k8sattributes Processor: Ensure the k8sattributes processor is enabled and correctly wired into your collector’s pipelines.
  - RBAC for Collector: The collector needs appropriate RBAC permissions to read Kubernetes object metadata. The OpenTelemetry Operator usually handles this, but verify that the collector’s ServiceAccount has get and list permissions on pods, nodes, deployments, etc.:
    kubectl auth can-i get pods --namespace <collector-namespace> --as=system:serviceaccount:<collector-namespace>:<collector-service-account>
  - Pod Association: Check the pod_association settings in k8sattributes to ensure the processor can correctly link incoming telemetry to Kubernetes pods.
- High Resource Consumption by the Collector:
  Issue: Collector pods are consuming excessive CPU or memory.
  Solution:
  - Memory Limiter Processor: Ensure the memory_limiter processor is configured and active in all pipelines. Adjust its limit_mib and spike_limit_mib values.
  - Batch Processor: The batch processor helps reduce the number of export calls. Tune send_batch_size and timeout.
  - Sampling: Implement sampling (e.g., the tail_sampling processor) to reduce the volume of traces, especially high-volume, low-value ones.
  - Filtering: Use filter processors to drop unwanted metrics, traces, or logs at the collector level before they are processed and exported.
  - Cardinality: High-cardinality metrics can consume significant memory. Identify and address sources of high cardinality.
  - Image Version: Ensure you are using a recent, stable version of the collector image, as performance improvements are continuous.
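As a concrete sketch of the sampling option above, a tail_sampling processor (available in the contrib image) can keep all error traces while sampling the rest. The policy names and percentages here are illustrative:

```yaml
# Illustrative tail_sampling configuration: keep every trace containing an
# error, probabilistically sample 10% of the remainder. Add it to the traces
# pipeline before the batch processor.
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-10-percent
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

Note that tail sampling holds spans in memory for the decision_wait window, so budget extra memory (and revisit memory_limiter) when enabling it.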
FAQ Section
1. What is OpenTelemetry and why do I need the Collector?
OpenTelemetry is a CNCF project that provides a set of APIs, SDKs, and tools to instrument, generate, collect, and export telemetry data (metrics, logs, and traces). The OpenTelemetry Collector acts as a proxy for this data, allowing you to receive data in various formats, process it (filter, transform, enrich), and then export it to multiple backend destinations. It decouples your application instrumentation from your backend choices, offering flexibility and reducing application overhead.
2. Should I deploy the Collector as a DaemonSet or a Deployment?
It depends on your use case:
- DaemonSet: Ideal for “agent” collectors that run on each node to collect host-level metrics, logs (e.g., from /var/log), and local application telemetry. It ensures a collector is present on every node.
- Deployment: Suited for “gateway” or “aggregator” collectors. These typically receive data from multiple agent collectors or directly from applications and perform more intensive processing before exporting to final backends. They can be scaled horizontally. Many production setups use a hybrid approach (DaemonSet agents -> Deployment gateways).
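The hybrid pattern from the answer above can be sketched as a second, gateway-tier OpenTelemetryCollector resource. The name, replica count, and backend endpoint below are illustrative:

```yaml
# Illustrative gateway-tier collector: a Deployment with multiple replicas
# that receives OTLP from the node agents and forwards to the backend.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-gateway
spec:
  mode: deployment
  replicas: 2
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
    processors:
      memory_limiter:
        check_interval: 1s
        limit_mib: 400
        spike_limit_mib: 100
      batch:
    exporters:
      otlp:
        endpoint: "your-backend.example.com:4317" # replace with your backend
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp]
```

The node agents then point their otlp exporter at the gateway’s Service instead of directly at the backend, keeping backend credentials and heavy processing in one place.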
3. How do I send data from my application to the OpenTelemetry Collector?
Your applications, instrumented with OpenTelemetry SDKs, should be configured to export telemetry data using the OpenTelemetry Protocol (OTLP). You point the OTLP exporter in your application’s SDK to the Kubernetes Service that exposes the collector’s OTLP ports (for example, the otel-collector-ingest service created above).