Kubernetes has become the de facto standard for container orchestration, but even experienced developers fall into common traps that can lead to production nightmares. After working with hundreds of K8s deployments, I’ve identified the most frequent mistakes that could be silently breaking your clusters right now.
1. Not Setting Resource Requests and Limits
The Mistake: Deploying pods without defining CPU and memory constraints.
Why It’s Bad: Without resource limits, a single misbehaving pod can consume all node resources, causing cascading failures across your cluster.
Wrong Way:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: myapp:latest
      # No resources defined - recipe for disaster!
Right Way:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: myapp:latest
      resources:
        requests:
          memory: "256Mi"
          cpu: "250m"
        limits:
          memory: "512Mi"
          cpu: "500m"
Pro Tip: Start with requests at 50% of limits, then tune based on actual metrics from your monitoring tools.
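If you'd rather not guess, the Vertical Pod Autoscaler (a separate add-on, not installed by default) can generate request and limit recommendations from observed usage. A minimal sketch, assuming the VPA operator is installed and your Deployment is named my-app:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommend only; never evict pods to resize them
```

With updateMode: "Off", recommendations show up under kubectl describe vpa my-app-vpa without any pods being restarted, so you can tune requests by hand from real data.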
2. Using latest Tag in Production
The Mistake: Deploying containers with the latest tag or no tag at all.
Why It’s Bad: You lose reproducibility, can’t roll back reliably, and introduce unpredictable behavior when images update.
Wrong Way:
spec:
  containers:
    - name: web
      image: nginx:latest  # Which version is this really?
Right Way:
spec:
  containers:
    - name: web
      image: nginx:1.25.3  # Explicit, reproducible, rollback-friendly
      imagePullPolicy: IfNotPresent
Bonus: Implement semantic versioning for your own images: myapp:v1.2.3 or use git commit SHAs: myapp:a4f2c8d.
3. Running Containers as Root
The Mistake: Not specifying a security context, defaulting to root user (UID 0).
Why It’s Bad: If a container is compromised, the attacker has root-level access, making privilege escalation trivial.
Wrong Way:
spec:
  containers:
    - name: app
      image: myapp:v1.0
      # Runs as root by default
Right Way:
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
  containers:
    - name: app
      image: myapp:v1.0
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:  # capabilities is a container-level field, not pod-level
          drop:
            - ALL
Implementation: Update your Dockerfile to create a non-root user:
FROM node:18-alpine
RUN addgroup -g 1000 appgroup && \
    adduser -D -u 1000 -G appgroup appuser
USER appuser
WORKDIR /app
COPY --chown=appuser:appgroup . .
CMD ["node", "server.js"]
4. Not Implementing Readiness and Liveness Probes
The Mistake: Deploying applications without health checks.
Why It’s Bad: Kubernetes can’t determine if your app is healthy, leading to traffic being sent to broken pods or pods being restarted unnecessarily.
Wrong Way:
spec:
  containers:
    - name: api
      image: myapi:v2.0
      # No health checks
Right Way:
spec:
  containers:
    - name: api
      image: myapi:v2.0
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
Quick Guide:
- Liveness Probe: Is the app alive? If not, restart it.
- Readiness Probe: Is the app ready to serve traffic? If not, remove from service.
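For slow-starting applications, cranking up initialDelaySeconds on the liveness probe is a blunt instrument. Since Kubernetes 1.18, a startupProbe can hold off the other probes until the app has finished booting. A sketch, reusing the /healthz endpoint from the example above:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30  # Allow up to 30 x 10s = 300s for startup
  periodSeconds: 10
```

Once the startup probe succeeds, the liveness and readiness probes take over on their normal schedules.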
5. Ignoring Pod Disruption Budgets (PDB)
The Mistake: Not protecting critical applications from voluntary disruptions during cluster maintenance.
Why It’s Bad: During node drains or cluster upgrades, all your pods might go down simultaneously, causing downtime.
Wrong Way:
# Just the deployment, no PDB
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  # ...
Right Way:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment
          image: payment:v1.0
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
spec:
  minAvailable: 2  # Keep at least 2 pods running during disruptions
  selector:
    matchLabels:
      app: payment-service
Alternative: Use maxUnavailable: 1 to ensure only one pod is disrupted at a time.
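The maxUnavailable variant of the same PDB would look like this:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
spec:
  maxUnavailable: 1  # Evict at most one pod at a time during voluntary disruptions
  selector:
    matchLabels:
      app: payment-service
```

Because maxUnavailable is relative to the current replica count, it tends to pair better with autoscaled workloads than a fixed minAvailable.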
6. Not Using Namespaces for Multi-Tenancy
The Mistake: Deploying everything in the default namespace.
Why It’s Bad: No logical separation, difficult RBAC management, and resource quotas can’t be applied effectively.
Wrong Way:
kubectl apply -f app.yaml # Goes to default namespace
Right Way:
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: prod-quota
  namespace: production
spec:
  hard:
    requests.cpu: "100"
    requests.memory: "200Gi"
    persistentvolumeclaims: "10"
    pods: "50"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  # ... your deployment spec
Namespace Strategy:
- dev – Development workloads
- staging – Pre-production testing
- production – Production workloads
- monitoring – Prometheus, Grafana
- ingress – Ingress controllers
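One caveat: once a ResourceQuota with requests.cpu or requests.memory is active, any pod that doesn't declare those requests is rejected outright. Pairing the quota with a LimitRange that injects per-container defaults avoids that friction. A sketch for the production namespace above (the default values here are illustrative):

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: prod-defaults
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:  # Applied when a container omits resource requests
        cpu: 250m
        memory: 256Mi
      default:         # Applied when a container omits resource limits
        cpu: 500m
        memory: 512Mi
```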
7. Exposing Secrets in Environment Variables
The Mistake: Storing sensitive data in ConfigMaps or plain environment variables.
Why It’s Bad: Secrets are visible in pod specs, logs, and can be easily extracted from a compromised container.
Wrong Way:
spec:
  containers:
    - name: app
      env:
        - name: DB_PASSWORD
          value: "SuperSecret123!"  # Plain text password!
        - name: API_KEY
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: api-key  # ConfigMaps aren't encrypted!
Right Way:
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  password: "SuperSecret123!"
  username: "dbadmin"
---
spec:
  containers:
    - name: app
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
      volumeMounts:
        - name: secret-volume
          mountPath: /etc/secrets
          readOnly: true
  volumes:
    - name: secret-volume
      secret:
        secretName: db-credentials
Better Yet: Use external secret management:
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-secret
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: db-credentials
  data:
    - secretKey: password
      remoteRef:
        key: database/production
        property: password
8. Not Implementing Network Policies
The Mistake: Leaving pod-to-pod communication wide open.
Why It’s Bad: A compromised pod can communicate with any other pod in your cluster, enabling lateral movement for attackers.
Wrong Way:
# No NetworkPolicy = All pods can talk to each other
Right Way:
# Default deny all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
---
# Allow backend to database
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgres
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: backend
      ports:
        - protocol: TCP
          port: 5432
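These policies only lock down ingress; a compromised pod can still open outbound connections and exfiltrate data. A default-deny egress policy closes that gap, but you then have to explicitly re-allow DNS or service discovery breaks. A sketch (the DNS rule here is deliberately broad; in practice, tighten the selectors to your cluster's DNS pods, e.g. kube-dns):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector: {}  # Any namespace, but DNS ports only
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Additional egress rules (backend to postgres, for example) are then added per application, mirroring the ingress policies above.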
9. Forgetting About Horizontal Pod Autoscaling (HPA)
The Mistake: Setting a fixed number of replicas regardless of load.
Why It’s Bad: You’re either over-provisioned (wasting money) or under-provisioned (causing performance issues).
Wrong Way:
spec:
  replicas: 3  # Always 3, whether you need 1 or 10
Right Way:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2  # Initial minimum
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: webapp:v1.0
          resources:
            requests:
              cpu: 200m
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 30
        - type: Pods
          value: 2
          periodSeconds: 30
      selectPolicy: Max
10. Not Implementing Proper Logging and Monitoring
The Mistake: Deploying applications without centralized logging or metrics collection.
Why It’s Bad: When things go wrong (and they will), you’re flying blind with no way to debug issues.
Wrong Way:
# Deploy and hope for the best
spec:
  containers:
    - name: app
      image: myapp:v1.0
Right Way:
Logging with Fluent Bit:
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush        5
        Daemon       Off
        Log_Level    info

    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Parser            docker
        Tag               kube.*
        Refresh_Interval  5

    [FILTER]
        Name             kubernetes
        Match            kube.*
        Kube_URL         https://kubernetes.default.svc:443
        Kube_CA_File     /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File  /var/run/secrets/kubernetes.io/serviceaccount/token

    [OUTPUT]
        Name   es
        Match  *
        Host   elasticsearch.logging.svc
        Port   9200
        Index  fluent-bit
Application Instrumentation:
spec:
  containers:
    - name: app
      image: myapp:v1.0
      env:
        - name: OTEL_EXPORTER_OTLP_ENDPOINT
          value: "http://otel-collector:4317"
        - name: OTEL_SERVICE_NAME
          value: "my-app"
      ports:
        - containerPort: 8080
          name: http
        - containerPort: 8081
          name: metrics  # Prometheus metrics endpoint
ServiceMonitor for Prometheus:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
Bonus: Quick Checklist Before Production
Before deploying to production, verify:
- [ ] Resource requests and limits are set
- [ ] Image tags are specific (no latest)
- [ ] Security context is configured (non-root user)
- [ ] Liveness and readiness probes are implemented
- [ ] Pod Disruption Budget is in place
- [ ] Namespaces are used for separation
- [ ] Secrets are managed securely (not in ConfigMaps)
- [ ] Network policies are configured
- [ ] HPA is configured for variable workloads
- [ ] Logging and monitoring are set up
- [ ] RBAC is configured properly
- [ ] Backup strategy is in place
- [ ] Resource quotas are set per namespace
- [ ] Anti-affinity rules for critical pods
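The last checklist item deserves its own snippet: without anti-affinity, the scheduler may happily place all three payment-service replicas on the same node, and the PDB from mistake #5 won't help when that node dies. A sketch using preferred (soft) anti-affinity inside the Deployment's pod template:

```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: payment-service
            topologyKey: kubernetes.io/hostname  # Spread replicas across nodes
```

Use requiredDuringSchedulingIgnoredDuringExecution instead if you'd rather leave a pod unscheduled than co-locate it.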
Conclusion
These mistakes are incredibly common, even in production clusters at major companies. The good news? They’re all fixable with proper configuration and a security-first mindset.
Start by auditing your current deployments:
# Find pods without resource limits
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.containers[].resources.limits == null) | .metadata.name'

# Find pods not explicitly set to run as non-root (pod-level check only;
# containers can also set runAsNonRoot individually)
kubectl get pods --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.securityContext.runAsNonRoot != true) | .metadata.name'

# List deployments pinned to a fixed replica count (candidates for HPA)
kubectl get deployments --all-namespaces -o json | \
  jq -r '.items[] | select(.spec.replicas != null) | "\(.metadata.namespace)/\(.metadata.name)"'
Remember: Kubernetes gives you the tools to build resilient, secure, and scalable applications. But like any powerful tool, it requires knowledge and discipline to use correctly.
What Kubernetes mistakes have you made? Share your experiences in the comments below, or join the Kubezilla community where we discuss Kubernetes best practices daily.