Introduction
In the dynamic world of cloud-native applications, robust monitoring is not just a luxury; it’s a fundamental necessity. As Kubernetes clusters grow in complexity, manually configuring Prometheus to scrape metrics from hundreds of services and pods becomes an unmanageable nightmare. This is where the Prometheus Operator steps in, transforming monitoring from a tedious, error-prone task into an automated, declarative process. By introducing custom resource definitions (CRDs) like `ServiceMonitor` and `PodMonitor`, the Operator allows you to define monitoring targets directly within your Kubernetes manifests, bringing a GitOps-friendly approach to observability.
This guide will demystify the Prometheus Operator, focusing specifically on the power of `ServiceMonitor` and `PodMonitor` resources. We’ll explore how these CRDs enable automatic discovery and scraping of metrics from your applications, whether they expose metrics via a Kubernetes `Service` or directly from individual `Pod` endpoints. By the end of this tutorial, you’ll be equipped to leverage the Prometheus Operator to build a scalable, resilient, and fully automated monitoring infrastructure for your Kubernetes workloads, ensuring you always have a clear view into the health and performance of your applications.
TL;DR: Prometheus Operator with ServiceMonitor and PodMonitor
The Prometheus Operator automates Prometheus setup in Kubernetes using CRDs like ServiceMonitor and PodMonitor to dynamically discover and scrape metrics. ServiceMonitor targets services, while PodMonitor targets individual pod endpoints.
- Install Prometheus Operator: Use Helm to deploy the `kube-prometheus-stack`.
- ServiceMonitor: Scrapes metrics exposed by a Kubernetes Service.
- PodMonitor: Scrapes metrics directly from Pod IPs, bypassing Services.
- Key Fields: `selector` (to match services/pods), `namespaceSelector`, `endpoints` (port/path).
- Verification: Check Prometheus UI (`/targets`) and `kubectl get servicemonitors`.
# Install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
# Example ServiceMonitor for an Nginx service
kubectl apply -f - <
Prerequisites
Before diving into `ServiceMonitor` and `PodMonitor`, ensure you have the following:
- A Kubernetes Cluster: Access to a functional Kubernetes cluster (e.g., Minikube, Kind, GKE, EKS, AKS).
- kubectl: The Kubernetes command-line tool, configured to interact with your cluster. Refer to the official Kubernetes documentation for installation.
- Helm: The Kubernetes package manager. Install it by following the official Helm installation guide.
- Basic understanding of Prometheus: Familiarity with what Prometheus is and how it collects metrics.
- Basic understanding of Kubernetes Services and Pods: Knowledge of how these core Kubernetes resources function.
Step-by-Step Guide
This guide will walk you through deploying the Prometheus Operator and then configuring `ServiceMonitor` and `PodMonitor` resources to scrape metrics from example applications.
Step 1: Install the Prometheus Operator
The easiest way to install the Prometheus Operator, along with Prometheus itself, Grafana, and Alertmanager, is by using the `kube-prometheus-stack` Helm chart. This chart provides a comprehensive monitoring solution out-of-the-box.
The Prometheus Operator is a powerful controller that watches for specific custom resources (like `Prometheus`, `ServiceMonitor`, `PodMonitor`, `Alertmanager`, etc.) and manages Prometheus and Alertmanager instances accordingly. It abstracts away the complexities of deploying and managing Prometheus in a Kubernetes environment, allowing you to define your monitoring targets declaratively.
# Add the prometheus-community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
# Update your Helm repositories
helm repo update
# Install the kube-prometheus-stack chart into a new 'monitoring' namespace
helm install prometheus prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--set grafana.adminPassword='prom-operator' \
--set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
--set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false
The `--set` flags matter here. By default, the chart generates a `Prometheus` resource whose `serviceMonitorSelector` and `podMonitorSelector` match only monitors labeled with the Helm release name (here `release: prometheus`). Setting `serviceMonitorSelectorNilUsesHelmValues=false` and `podMonitorSelectorNilUsesHelmValues=false` leaves those selectors empty instead, so Prometheus picks up every `ServiceMonitor` and `PodMonitor` in the namespaces it watches, regardless of labels. That is convenient for a tutorial cluster. In multi-tenant or complex environments, keep the chart defaults and label each monitor explicitly, so that a Prometheus instance scrapes only the monitors intended for it.
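The same settings can live in a values file instead of a chain of `--set` flags, which tends to be easier to review in Git. A minimal sketch, assuming a file named `values.yaml` (the filename is an assumption; the keys mirror the flags used above):

```yaml
# values.yaml (hypothetical filename)
grafana:
  # Same demo password as the --set flag; use a secret in real deployments
  adminPassword: prom-operator
prometheus:
  prometheusSpec:
    # Same effect as the two --set prometheus.prometheusSpec.* flags
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
```

It would then be passed to the install command with `-f values.yaml` in place of the three `--set` flags.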
Verify: Prometheus Operator Installation
Check if the pods are running in the `monitoring` namespace. It might take a few minutes for all components to come up.
kubectl get pods --namespace monitoring
Expected Output (truncated):
NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 5m
prometheus-grafana-7bbd56485-6r7w9 3/3 Running 0 5m
prometheus-kube-prometheus-operator-7c87c9f86-w2z2c 1/1 Running 0 5m
prometheus-kube-prometheus-prometheus-0 2/2 Running 0 4m
prometheus-kube-state-metrics-5f89c4456c-x8b2t 1/1 Running 0 5m
prometheus-prometheus-node-exporter-7ljhp 1/1 Running 0 5m
prometheus-prometheus-node-exporter-g5mjs 1/1 Running 0 5m
You should see pods for Grafana, Alertmanager, the Prometheus Operator, Prometheus itself, kube-state-metrics, and node-exporter, all in `Running` status.
Step 2: Deploy an Example Application with Metrics
To demonstrate `ServiceMonitor` and `PodMonitor`, we need an application that exposes Prometheus-compatible metrics. We'll use a simple Nginx server with the `nginx-prometheus-exporter` sidecar.
This application is composed of an Nginx deployment, which serves traffic, and an `nginx-prometheus-exporter` container running as a sidecar in the same pod. The exporter exposes Nginx metrics on port 9113 at the `/metrics` path. We'll also create a Kubernetes `Service` that exposes both the Nginx HTTP port and the exporter's metrics port.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-metrics-app
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
          name: http-web
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:1.1.0
        args:
        # Exporter 1.x parses long flags with a double dash
        - --nginx.scrape-uri=http://localhost:80/stub_status
        ports:
        - containerPort: 9113
          name: http-metrics
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-metrics-service
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: http-web
    name: http-web
  - protocol: TCP
    port: 9113
    targetPort: http-metrics
    name: http-metrics
Save the above YAML to a file named `nginx-app.yaml` and apply it:
kubectl apply -f nginx-app.yaml
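One caveat worth checking: the exporter reads from `/stub_status`, and the stock `nginx` image does not enable that endpoint in its default configuration. In that case the exporter would likely report `nginx_up 0` and no connection metrics. A minimal sketch of one way to enable it, assuming a ConfigMap named `nginx-stub-status` and a volume named `nginx-conf` (both names are hypothetical and not part of the manifests above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-stub-status   # hypothetical name
data:
  default.conf: |
    server {
      listen 80;
      location / {
        root /usr/share/nginx/html;
        index index.html;
      }
      # Status endpoint read by the exporter sidecar over localhost
      location /stub_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
      }
    }
```

The ConfigMap then needs to be mounted over `/etc/nginx/conf.d`. The fragment below is a patch to merge into the Deployment in `nginx-app.yaml`, not a standalone manifest:

```yaml
spec:
  template:
    spec:
      containers:
      - name: nginx
        volumeMounts:
        - name: nginx-conf          # hypothetical volume name
          mountPath: /etc/nginx/conf.d
      volumes:
      - name: nginx-conf
        configMap:
          name: nginx-stub-status
```

Restricting `/stub_status` to `127.0.0.1` keeps it reachable from the sidecar, which shares the pod's network namespace, while hiding it from other clients.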
Verify: Example Application Deployment
Check that the Nginx pod and service are running.
kubectl get deployment nginx-metrics-app
kubectl get service nginx-metrics-service
kubectl get pods -l app=nginx
Expected Output:
NAME READY UP-TO-DATE AVAILABLE AGE
nginx-metrics-app 1/1 1 1 2m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
nginx-metrics-service ClusterIP 10.104.148.156 <none> 80/TCP,9113/TCP 2m
NAME READY STATUS RESTARTS AGE
nginx-metrics-app-7b8f9f688-abcde 2/2 Running 0 2m
You should see one deployment, one service, and one pod for `nginx-metrics-app`, all in healthy states.
Step 3: Creating a ServiceMonitor
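A `ServiceMonitor` tells the Prometheus Operator to find Kubernetes `Services` that match a specified label selector and scrape metrics from the endpoints behind them. This is the most common and recommended way to collect application metrics in Kubernetes. Prometheus still scrapes each backing pod individually (it resolves the `Service` to its endpoints rather than going through the cluster IP), but the `Service` gives you a stable, named grouping of those pods.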
The `ServiceMonitor` resource includes a `selector` to match `Service` labels, a `namespaceSelector` to restrict which namespaces to watch, and `endpoints` which define the port name and path for scraping. The `release: prometheus` label on the `ServiceMonitor` is what the chart's default `serviceMonitorSelector` matches on. With the selector flags used in Step 1, Prometheus selects every monitor, so the label is not strictly required here, but keeping it means the monitor is still discovered if you return to the chart defaults.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-app-servicemonitor
  labels:
    release: prometheus # Matched by the chart's default serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: nginx # Selects the Service with label app: nginx
  endpoints:
  - port: http-metrics # Refers to the named port 'http-metrics' in the Service
    path: /metrics # The path where metrics are exposed
    interval: 15s # Scrape interval, defaults to Prometheus global config if not set
  namespaceSelector:
    matchNames:
    - default # Only look for Services in the 'default' namespace
Save the above YAML to `nginx-servicemonitor.yaml` and apply it:
kubectl apply -f nginx-servicemonitor.yaml
Verify: ServiceMonitor Creation and Prometheus Targets
First, check if the `ServiceMonitor` resource is created.
kubectl get servicemonitors
Expected Output:
NAME AGE
nginx-app-servicemonitor 1m
Now, let's verify that Prometheus is actually scraping the metrics. We'll port-forward to the Prometheus UI.
# Get the Prometheus service name (it's usually 'prometheus-kube-prometheus-prometheus')
kubectl get svc -n monitoring | grep prometheus-kube-prometheus-prometheus
# Port-forward to the Prometheus UI
kubectl -n monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
Open your browser to `http://localhost:9090/targets`. You should see the Nginx exporter endpoint (port 9113) in the `serviceMonitor/default/nginx-app-servicemonitor/0` scrape pool, labeled with `service="nginx-metrics-service"` and `namespace="default"`, with a state of `UP`.
You can also navigate to the "Graph" tab and query for `nginx_exporter_build_info` or `nginx_connections_accepted` to see the collected metrics.
Step 4: Creating a PodMonitor
While `ServiceMonitor` is generally preferred, there are scenarios where you might need to scrape metrics directly from individual pods, bypassing the `Service` abstraction. This is where `PodMonitor` comes in. Examples include:
- StatefulSets: Scrape individual pod instances of a StatefulSet that might expose unique metrics.
- DaemonSets: Scrape metrics from DaemonSet pods running on every node, like node exporters, where you might want to bypass the `Service` and get pod-specific metadata.
- Sidecar Scenarios: Scrape metrics from a specific sidecar container within a pod, if that container isn't exposed via the primary service.
Similar to `ServiceMonitor`, `PodMonitor` uses a `selector` to match `Pod` labels, a `namespaceSelector`, and `podMetricsEndpoints` to define the port name and path. The `release: prometheus` label plays the same role here: it is what the chart's default `podMonitorSelector` matches on.
Let's create a `PodMonitor` for our Nginx exporter directly.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: nginx-app-podmonitor
  labels:
    release: prometheus # Matched by the chart's default podMonitorSelector
spec:
  selector:
    matchLabels:
      app: nginx # Selects Pods with label app: nginx
  podMetricsEndpoints:
  - port: http-metrics # Refers to the named port 'http-metrics' in the Pod spec
    path: /metrics # The path where metrics are exposed
    interval: 30s # Scrape interval
  namespaceSelector:
    matchNames:
    - default # Only look for Pods in the 'default' namespace
Save the above YAML to `nginx-podmonitor.yaml` and apply it:
kubectl apply -f nginx-podmonitor.yaml
Verify: PodMonitor Creation and Prometheus Targets
First, check if the `PodMonitor` resource is created.
kubectl get podmonitors
Expected Output:
NAME AGE
nginx-app-podmonitor 1m
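Now, refresh `http://localhost:9090/targets` in your browser (keep the `port-forward` running). You should now see *two* targets for your Nginx application: one in the `serviceMonitor/default/nginx-app-servicemonitor/0` scrape pool and one in `podMonitor/default/nginx-app-podmonitor/0`. Both should be in an `UP` state, and both point at the pod IP on port 9113, because a `ServiceMonitor` also scrapes the pods behind the `Service`. The visible difference is that the `ServiceMonitor` target carries a `service` label. Scraping the same endpoint through both monitors stores every series twice, so outside a demo, pick one or the other.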
This demonstrates the flexibility of the Prometheus Operator. You can choose the most appropriate method for your application's architecture. For pod-to-pod communication and direct metric scraping, consider exploring advanced networking solutions like Cilium WireGuard Encryption to secure your metric endpoints.
Production Considerations
Deploying monitoring in production requires more than just getting targets scraped. Here are key considerations:
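- Resource Management: Prometheus can be resource-intensive, especially with many targets or high scrape frequency. Monitor Prometheus's own resource usage and adjust CPU/memory requests and limits. A Horizontal Pod Autoscaler is not a good fit, because every Prometheus replica scrapes the same targets and adding replicas does not spread the load. If a single instance cannot keep up, split targets across instances or use the `shards` field of the `Prometheus` resource.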
- Scalability and Federation: For very large clusters or multiple clusters, a single Prometheus instance may not suffice. Explore Prometheus federation or Thanos/Cortex for long-term storage and global querying.
- High Availability: Run multiple replicas of Prometheus and Alertmanager. The `kube-prometheus-stack` supports this. Ensure persistent storage is configured for stateful components.
- Security:
  - Network Policies: Restrict Prometheus's access to only the necessary metric endpoints. For a comprehensive guide, refer to our Kubernetes Network Policies: Complete Security Hardening Guide.
  - RBAC: Ensure Prometheus has only the necessary RBAC permissions to discover `Services` and `Pods` via `ServiceMonitor` and `PodMonitor` resources.
  - Authentication/Authorization: Secure access to the Prometheus and Grafana UIs, potentially integrating with OIDC or other identity providers.
  - TLS: Encrypt metric scrapes using TLS. Prometheus supports this, and you can configure `ServiceMonitor`/`PodMonitor` to use `tlsConfig`.
  - Supply Chain Security: Ensure the container images used for Prometheus and its components are from trusted sources and scanned for vulnerabilities. Tools like Sigstore and Kyverno can help enforce supply chain security policies.
- Storage: Prometheus needs persistent storage for its time-series database. Configure appropriate `PersistentVolume` and `PersistentVolumeClaim` with adequate size and performance (e.g., SSD-backed storage).
- Alerting: Configure Alertmanager rules effectively. Integrate with your preferred notification channels (Slack, PagerDuty, email).
- Dashboards: Leverage Grafana to build meaningful dashboards. The `kube-prometheus-stack` comes with many pre-built dashboards for Kubernetes components.
- Label Management: Use consistent labels across your Kubernetes resources. This is critical for effective `selector` matching in `ServiceMonitor` and `PodMonitor`, and for organizing metrics in Prometheus.
- Cost Optimization: Efficient monitoring can also impact costs. Understanding how your nodes are utilized and potentially scaling them down when not needed can save money. Check out our guide on Reduce Kubernetes Costs by 60% with Karpenter for node optimization strategies.
- Service Mesh Integration: If you're using a service mesh like Istio, you might leverage its built-in observability features or integrate Prometheus with the mesh's proxy metrics. For advanced Istio deployments, see our Istio Ambient Mesh Production Guide.
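Several of the points above (resources, high availability, storage) map onto values of the `kube-prometheus-stack` chart. The fragment below is a rough sketch rather than a recommendation: the sizes, retention period, replica counts, and the `fast-ssd` storage class name are placeholder assumptions to adapt to your cluster.

```yaml
prometheus:
  prometheusSpec:
    replicas: 2                 # two identical replicas for availability
    retention: 15d              # placeholder retention period
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        memory: 4Gi
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: fast-ssd   # hypothetical storage class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi
alertmanager:
  alertmanagerSpec:
    replicas: 3                 # Alertmanager replicas form a cluster
```

Each Prometheus replica scrapes all targets and keeps its own copy of the data, so replicas add redundancy rather than capacity.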
Troubleshooting
Here are common issues you might encounter and how to resolve them:
- Issue: Prometheus is not scraping targets defined by `ServiceMonitor` or `PodMonitor`.
  Solution:
  - Check `release` label: If your `Prometheus` resource selects monitors by label (the chart default), ensure your `ServiceMonitor`/`PodMonitor` has a `metadata.labels.release: prometheus` (or whatever label your Prometheus instance is configured to select) that matches the `serviceMonitorSelector` or `podMonitorSelector` of your `Prometheus` resource.
  - Check labels: Verify that the `spec.selector.matchLabels` in your `ServiceMonitor` accurately matches the labels on your Kubernetes `Service`, or for `PodMonitor`, the labels on your `Pod`'s `metadata`.
  - Check namespaceSelector: Ensure `spec.namespaceSelector` in your `ServiceMonitor`/`PodMonitor` includes the namespace where your `Service`/`Pod` resides. If omitted, it defaults to the `ServiceMonitor`/`PodMonitor`'s namespace.
  - Check port name: The `port` field in `endpoints` (for `ServiceMonitor`) or `podMetricsEndpoints` (for `PodMonitor`) must exactly match the *name* of the port defined in the `Service` or `Pod` spec, respectively, not the port number.
  - Check path: Ensure the `path` field is correct (defaults to `/metrics`).
  - Prometheus UI: Access the Prometheus UI (`/targets`) to see if the target appears, and check the error message if it's `DOWN`.
  - Prometheus Operator logs: Check the logs of the Prometheus Operator in the `monitoring` namespace for any errors or warnings related to `ServiceMonitor` or `PodMonitor` processing. The Deployment name below follows from the operator pod name shown in Step 1.
    kubectl logs -n monitoring deployment/prometheus-kube-prometheus-operator
- Issue: Metrics are not appearing in Prometheus, even if the target is `UP`.
  Solution:
  - Application exposing metrics: Verify that your application is actually exposing metrics on the specified port and path. You can `kubectl exec` into the pod and use `curl localhost:9113/metrics` (adjust port/path) to confirm.
  - Firewall/Network Policies: Ensure no Kubernetes Network Policies are blocking Prometheus from reaching the application's metrics endpoint. If you have strict policies, you might need to explicitly allow ingress from Prometheus. Our Network Policies Security Guide offers detailed instructions.
  - Scrape interval: If the scrape interval is very long, it might just take time for metrics to appear.
- Issue: Prometheus pod is crashing or in `CrashLoopBackOff`.
  Solution:
  - Resource limits: Prometheus can consume significant resources. Check the pod's logs (`kubectl logs prometheus-kube-prometheus-prometheus-0 -n monitoring`) for out-of-memory errors. Increase CPU and memory requests/limits in the Helm chart values.
  - Persistent volume issues: If using persistent storage, ensure the PVC is bound and the underlying storage is healthy and has sufficient IOPS. Check events related to the Prometheus pod and PVC.
  - Configuration errors: Review the Prometheus configuration generated by the Operator. Malformed rules or scrape configs can cause issues.
- Issue: Grafana dashboards are not showing data.
  Solution:
  - Prometheus data source: Verify that Grafana is correctly configured to use Prometheus as a data source. Check Grafana logs.
  - Metric names: Ensure the metric names used in the Grafana dashboards actually exist in Prometheus. Query Prometheus directly to confirm.
  - Time range: Adjust the time range in Grafana to ensure you are looking at a period when data was being collected.
- Issue: `ServiceMonitor`/`PodMonitor` is created, but not listed when running `kubectl get servicemonitors` or `kubectl get podmonitors`.
  Solution:
  - CRD not installed: The Custom Resource Definition (CRD) for `ServiceMonitor` or `PodMonitor` might not be installed. The `kube-prometheus-stack` chart usually installs these, but if you're installing components manually, ensure the CRDs are present.
    kubectl get crd servicemonitors.monitoring.coreos.com
    kubectl get crd podmonitors.monitoring.coreos.com
  - Namespace issue: You might be looking in the wrong namespace. Ensure you specify the correct namespace with `-n <namespace>` or use `-A` to list across all namespaces.
- Issue: Metrics are scraped, but labels are missing or incorrect.
  Solution:
  - Relabeling: Prometheus uses `relabel_configs` to modify target labels before a scrape and `metric_relabel_configs` to modify series before ingestion. `ServiceMonitor` and `PodMonitor` expose these as `relabelings` and `metricRelabelings` within their `endpoints` or `podMetricsEndpoints` sections. You can use them to add, remove, or modify labels based on Kubernetes metadata. Consult the Prometheus documentation on relabeling for syntax.
  - Kubernetes metadata: The scrape configuration generated by the Operator automatically adds some Kubernetes metadata as labels (e.g., `namespace`, `pod`, `container`, `endpoint`, and, for a `ServiceMonitor`, `service`). If you need more, you might need custom relabeling.
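To make the relabeling point above concrete, here is a hedged sketch of a variant of the Step 3 `ServiceMonitor` that uses both hooks. The target label name `node` and the choice to drop `go_*` series are illustrative assumptions, not something the example application requires.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-app-servicemonitor
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: http-metrics
    path: /metrics
    relabelings:
    # Before the scrape: copy the node name from discovery metadata
    # onto the target as a label called 'node' (hypothetical label name)
    - sourceLabels: [__meta_kubernetes_pod_node_name]
      targetLabel: node
    metricRelabelings:
    # After the scrape, before ingestion: drop Go runtime series, if present
    - sourceLabels: [__name__]
      regex: go_.*
      action: drop
```

`relabelings` run against discovery metadata (the `__meta_*` labels), while `metricRelabelings` only see the labels of the scraped series, so a rule has to be placed in the stage where the labels it reads actually exist.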
FAQ Section
- Q: What is the main difference between `ServiceMonitor` and `PodMonitor`?
  A: A `ServiceMonitor` instructs Prometheus to discover targets through a Kubernetes `Service` and scrape every pod endpoint behind it. This is generally preferred, as a `Service` provides a stable grouping over changing pods. A `PodMonitor`, on the other hand, tells Prometheus to discover pods directly by label, with no `Service` required. `PodMonitor` is useful for scenarios like StatefulSets, DaemonSets, or when you need to scrape specific sidecar containers within a pod that aren't exposed via a Service.
- Q: Why is the `release: prometheus` label important on my `ServiceMonitor`/`PodMonitor`?
  A: When you install the `kube-prometheus-stack` with its default values, the `Prometheus` custom resource rendered by the chart includes a `serviceMonitorSelector` and `podMonitorSelector` that match the Helm release label (`release: prometheus` for a release named `prometheus`). A `ServiceMonitor`/`PodMonitor` without that label is then not discovered or scraped. If you set `serviceMonitorSelectorNilUsesHelmValues=false` and `podMonitorSelectorNilUsesHelmValues=false`, as in Step 1, the selectors are empty and match every monitor, so the label becomes optional. Label-based selection is the mechanism that allows you to have multiple Prometheus instances in a cluster, each scraping a distinct set of targets.
- Q: Can I use `ServiceMonitor` and `PodMonitor` in different namespaces than Prometheus?
  A: Yes. The `ServiceMonitor` and `PodMonitor` resources themselves can live in any namespace that the `Prometheus` resource's `serviceMonitorNamespaceSelector`/`podMonitorNamespaceSelector` allows. You control which namespaces they watch for targets using the `spec.namespaceSelector` field. For example, if `namespaceSelector` is set to `matchNames: [default, my-app-namespace]`, it will look for `Services`/`Pods` in those specified namespaces. If `namespaceSelector` is omitted, it defaults to the namespace where the `ServiceMonitor`/`PodMonitor` resource itself is deployed.
- Q: How do I secure the metrics endpoints?
  A: There are several ways:
  - Network Policies: Implement Kubernetes Network Policies to restrict access to the metrics port to only the Prometheus pods.
  - TLS: Configure your application to expose metrics over HTTPS and then configure the `ServiceMonitor` or `PodMonitor` with `tlsConfig` to enable secure scraping.
  - Authentication: For highly sensitive metrics, you can configure your application to require basic authentication or other forms of authentication, and then provide credentials in the `ServiceMonitor`/`PodMonitor`'s `basicAuth` or `bearerTokenFile` sections.
- Q: How can I debug why a `ServiceMonitor` or `PodMonitor` isn't working?
  A: Follow these steps systematically:
  - Check resource existence: `kubectl get servicemonitors -A` or `kubectl get podmonitors -A`.
  - Check Prometheus Operator logs: `kubectl logs -n monitoring deployment/prometheus-kube-prometheus-operator`. Look for errors related to your monitor.
  - Check `Prometheus` resource status: `kubectl get prometheus -n monitoring prometheus-kube-prometheus-prometheus -o yaml`. Look at the selectors in `spec` and at the `status` section for any issues with `ServiceMonitor`/`PodMonitor` discovery.
  - Prometheus UI `/targets` page: This is the most direct way to see if Prometheus has discovered your target and if it's `UP` or `DOWN`, along with any error messages.
  - Manual scrape test: `kubectl port-forward` to your application pod and `curl` the metrics endpoint directly to ensure the application is exposing metrics as expected.
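For the question on securing metrics endpoints, the two manifests below sketch what the Network Policy and TLS/authentication options might look like. Everything named `secure-app*`, the `https-metrics` port, and the policy name are hypothetical. The Prometheus pod label and the namespace label are assumptions based on the labels the Operator and recent Kubernetes versions normally set, so verify them in your cluster.

```yaml
# Allow scrapes of the exporter port from Prometheus pods only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape        # hypothetical name
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring
      podSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus   # assumed Prometheus pod label
    ports:
    - protocol: TCP
      port: 9113
  # Keep regular HTTP traffic open; an ingress policy denies whatever it does not list
  - ports:
    - protocol: TCP
      port: 80
---
# Scrape a (hypothetical) HTTPS endpoint with a custom CA and basic auth
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: secure-app
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: secure-app
  endpoints:
  - port: https-metrics
    scheme: https
    tlsConfig:
      serverName: secure-app.default.svc
      ca:
        secret:
          name: secure-app-ca
          key: ca.crt
    basicAuth:
      username:
        name: secure-app-metrics-auth
        key: username
      password:
        name: secure-app-metrics-auth
        key: password
```

The Secrets referenced by `tlsConfig` and `basicAuth` are expected to live in the same namespace as the `ServiceMonitor`.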
Cleanup Commands
To remove the resources created in this guide, run the following commands:
# Delete the example application and its service/monitors
kubectl delete -f nginx-app.yaml
kubectl delete -f nginx-servicemonitor.yaml
kubectl delete -f nginx-podmonitor.yaml
# Uninstall the kube-prometheus-stack Helm chart
helm uninstall prometheus --namespace monitoring
# Optionally, delete the monitoring namespace
kubectl delete namespace monitoring
Next Steps / Further Reading
Congratulations! You've successfully deployed the Prometheus Operator and used `ServiceMonitor` and `PodMonitor` to scrape metrics. This is just the beginning of building a robust observability stack.
- Explore Grafana: Dive deeper into building custom dashboards in Grafana to visualize your metrics. The `kube-prometheus-stack` comes with many pre-built dashboards that you can adapt.
- Alerting with Alertmanager: Configure Alertmanager to send notifications based on your Prometheus alerts. Learn about defining `PrometheusRule` resources.
- Custom Metrics: Instrument your own applications to expose custom metrics. Refer to the Prometheus client libraries documentation.
- Advanced Relabeling: Master Prometheus relabeling rules for advanced target filtering, label manipulation, and metric optimization.
- eBPF Observability: For even deeper insights into your cluster's networking and application performance, explore technologies like eBPF. Our guide on eBPF Observability: Building Custom Metrics with Hubble can get you started.
- Kubernetes Gateway API: For advanced traffic management and potentially exposing metrics from your ingress layer, consider the Kubernetes Gateway API vs Ingress: The Complete Migration Guide.
- GPU Scheduling for LLMs: If you're working with AI/ML workloads, monitoring GPU usage is critical. Learn about best practices in our Running LLMs on Kubernetes: GPU Scheduling Best Practices.
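As a starting point for the alerting item above, here is a minimal `PrometheusRule` sketch built on the exporter's `nginx_up` metric. The rule name, alert name, threshold duration, and severity are illustrative assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nginx-exporter-rules     # hypothetical name
  labels:
    release: prometheus          # matched by the chart's default ruleSelector
spec:
  groups:
  - name: nginx.rules
    rules:
    - alert: NginxDown           # hypothetical alert name
      expr: nginx_up == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "The nginx exporter cannot reach nginx"
```

Like monitors, rules are selected by label under the chart defaults, which is why the `release` label is included here.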
Conclusion
The Prometheus Operator, with its `ServiceMonitor` and `PodMonitor` CRDs, fundamentally changes how we approach monitoring in Kubernetes. By enabling declarative, automated discovery of metric endpoints, it significantly reduces operational overhead and enhances the reliability of your monitoring infrastructure. You've learned how to set up the Operator, deploy an example application, and configure both `ServiceMonitor` and `PodMonitor` to effectively scrape metrics. With these tools, you're well on your way to building a scalable, resilient, and insightful observability platform for your cloud-native applications. Keep exploring, keep monitoring, and keep your clusters healthy!