What is K8sGPT?
K8sGPT is a tool that scans your Kubernetes clusters, diagnoses issues, and uses AI to provide explanations and recommendations. It gives Kubernetes superpowers to everyone by triaging cluster findings and enriching them with AI.
Key Features:
- Automated cluster scanning
- Issue detection and diagnosis
- AI-powered explanations
- Support for multiple AI backends
- Integration with monitoring tools
- Multi-language support
- Custom analyzer plugins
Installation
Using Homebrew (macOS/Linux)
brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt
Using Binary Release
# Linux AMD64
curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.30/k8sgpt_amd64.deb
sudo dpkg -i k8sgpt_amd64.deb
# Linux ARM64
curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.30/k8sgpt_arm64.deb
sudo dpkg -i k8sgpt_arm64.deb
# RPM-based (RHEL/CentOS/Fedora)
curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.30/k8sgpt_amd64.rpm
sudo rpm -i k8sgpt_amd64.rpm
Using Go
go install github.com/k8sgpt-ai/k8sgpt/cmd/k8sgpt@latest
Using Docker
docker run -it --rm \
-v ~/.kube/config:/root/.kube/config \
ghcr.io/k8sgpt-ai/k8sgpt:latest analyze --explain
Install in Kubernetes (Operator)
# Add Helm repository
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
# Install K8sGPT Operator
helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
--namespace k8sgpt-operator-system \
--create-namespace
Verify Installation
k8sgpt version
Quick Start
Basic Analysis
# Analyze cluster without AI explanations
k8sgpt analyze
# Analyze with AI explanations (requires configured backend)
k8sgpt analyze --explain
# Analyze with output filters
k8sgpt analyze --explain --filter=Pod
# Analyze specific namespace
k8sgpt analyze --namespace default --explain
AI Backend Configuration
OpenAI
# Add OpenAI backend
k8sgpt auth add openai
# With specific token
k8sgpt auth add openai --token $OPENAI_TOKEN
# With custom model
k8sgpt auth add openai --model gpt-4 --token $OPENAI_TOKEN
# Set as default
k8sgpt auth default openai
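In automation it helps to fail fast when the token is missing rather than store an empty credential. A minimal sketch; the `setup_openai` function, `DRY_RUN` switch, and `K8SGPT_MODEL` variable are conventions of this example, not k8sgpt features:

```shell
#!/bin/sh
# Sketch: configure the OpenAI backend only when a token is present.
# DRY_RUN=1 prints the command (token elided) instead of running it.
setup_openai() {
  if [ -z "${OPENAI_TOKEN:-}" ]; then
    echo "OPENAI_TOKEN is not set" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "k8sgpt auth add openai --model ${K8SGPT_MODEL:-gpt-4} --token ***"
  else
    k8sgpt auth add openai --model "${K8SGPT_MODEL:-gpt-4}" --token "$OPENAI_TOKEN"
    k8sgpt auth default openai
  fi
}
```

Run it from CI with `DRY_RUN=1` first to verify the generated command before wiring in the real secret.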
Azure OpenAI
k8sgpt auth add azureopenai \
--baseurl https://your-instance.openai.azure.com/ \
--engine your-deployment-name \
--token $AZURE_OPENAI_TOKEN
Local AI (LocalAI)
k8sgpt auth add localai \
--baseurl http://localhost:8080/v1 \
--model ggml-gpt4all-j
# No token needed for local
k8sgpt auth default localai
Ollama
k8sgpt auth add ollama \
--baseurl http://localhost:11434 \
--model llama2
k8sgpt auth default ollama
Anthropic Claude
k8sgpt auth add claude \
--token $ANTHROPIC_API_KEY \
--model claude-3-sonnet-20240229
k8sgpt auth default claude
Cohere
k8sgpt auth add cohere --token $COHERE_TOKEN
k8sgpt auth default cohere
Google Gemini
k8sgpt auth add google \
--token $GOOGLE_API_KEY \
--model gemini-pro
k8sgpt auth default google
Amazon Bedrock
k8sgpt auth add amazonbedrock \
--region us-east-1 \
--model anthropic.claude-v2
List Configured Backends
# List all auth backends
k8sgpt auth list
# Show default backend
k8sgpt auth default
Remove Backend
k8sgpt auth remove openai
Analysis Commands
Basic Analysis
# Simple analysis
k8sgpt analyze
# With explanations
k8sgpt analyze --explain
# With specific backend
k8sgpt analyze --explain --backend openai
# Analyze all namespaces
k8sgpt analyze --explain --all-namespaces
# Analyze specific namespace
k8sgpt analyze --namespace kube-system --explain
Filtering
# Filter by resource type
k8sgpt analyze --explain --filter=Pod
k8sgpt analyze --explain --filter=Service
k8sgpt analyze --explain --filter=Deployment
# Multiple filters
k8sgpt analyze --explain --filter=Pod,Service,Deployment
# Disable an analyzer for all future runs (instead of a per-run exclude)
k8sgpt filters remove Service
Available Filters:
- Pod
- Service
- Deployment
- ReplicaSet
- StatefulSet
- Ingress
- PersistentVolumeClaim
- NetworkPolicy
- Node
- CronJob
- HPA (HorizontalPodAutoscaler)
- PodDisruptionBudget
- GatewayClass
- Gateway
- HTTPRoute
Output Formats
# JSON output
k8sgpt analyze --explain --output=json
# YAML output
k8sgpt analyze --explain --output=yaml
# Save to file
k8sgpt analyze --explain > analysis-report.txt
# JSON to file
k8sgpt analyze --explain --output=json > analysis.json
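JSON output is handy for scripting. The report shape below is a stand-in for illustration only; field names can differ between k8sgpt versions, so check your own `--output=json` result first:

```shell
#!/bin/sh
# Count findings in a saved JSON report using only POSIX tools.
# The schema in this heredoc is assumed; verify against your k8sgpt version.
cat > analysis.json <<'EOF'
{"results": [
  {"kind": "Service", "name": "default/my-service", "error": [{"Text": "Service has no endpoints"}]},
  {"kind": "Pod", "name": "default/crashing-pod", "error": [{"Text": "Back-off restarting failed container"}]}
]}
EOF
issues=$(grep -c '"kind"' analysis.json)
echo "issues found: $issues"
```

For anything beyond counting, a YAML/JSON-aware tool is safer than grep.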
Data Anonymization
# Anonymize sensitive data in output
k8sgpt analyze --explain --anonymize
# This replaces:
# - Pod names
# - Namespace names
# - Node names
# - Container names
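For intuition, masking a finding before sharing looks roughly like this. This is purely illustrative: the real `--anonymize` flag does the work for you, and the sed pattern here is this example's own:

```shell
#!/bin/sh
# Illustrative only: mask resource names in a finding, mimicking what
# --anonymize does. Use the real flag for actual reports.
finding='0: Pod payments/billing-worker-7d9f - Error: Back-off restarting failed container'
masked=$(printf '%s\n' "$finding" | sed -E 's#(Pod|Service|Node) [^ ]+#\1 <anonymized>#')
echo "$masked"
```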
Filters Configuration
List Available Filters
k8sgpt filters list
Add Custom Filters
# Enable specific analyzers
k8sgpt filters add Pod
k8sgpt filters add Service
k8sgpt filters add Deployment
# Enable multiple
k8sgpt filters add Pod,Service,Ingress
Remove Filters
k8sgpt filters remove Pod
Integration Commands
Trivy Integration
# Enable Trivy for vulnerability scanning
k8sgpt integration activate trivy
# List integrations
k8sgpt integration list
# Deactivate integration
k8sgpt integration deactivate trivy
Prometheus Integration
# Activate Prometheus (experimental)
k8sgpt integration activate prometheus
Serve Mode
Start Server
# Start K8sGPT server
k8sgpt serve
# With specific port
k8sgpt serve --port 8080
# With backend
k8sgpt serve --backend openai
API Endpoints
# Analyze endpoint
curl -X POST http://localhost:8080/analyze \
-H "Content-Type: application/json" \
-d '{
"namespace": "default",
"explain": true,
"filters": ["Pod", "Service"]
}'
# Health check
curl http://localhost:8080/health
K8sGPT Operator
Deploy Operator
# Install via Helm
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update
helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
--namespace k8sgpt-operator-system \
--create-namespace \
--set serviceMonitor.enabled=true
Create K8sGPT Resource
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: openai
    model: gpt-4
    secret:
      name: k8sgpt-secret
      key: openai-api-key
  noCache: false
  version: v0.3.30
  filters:
    - Pod
    - Service
    - Deployment
  sink:
    type: slack
    webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
kubectl apply -f k8sgpt-resource.yaml
Create Secret for API Key
kubectl create secret generic k8sgpt-secret \
--from-literal=openai-api-key=$OPENAI_TOKEN \
-n k8sgpt-operator-system
Check Operator Status
# Get K8sGPT resources
kubectl get k8sgpt -n k8sgpt-operator-system
# Describe K8sGPT resource
kubectl describe k8sgpt k8sgpt-sample -n k8sgpt-operator-system
# Check operator logs
kubectl logs -n k8sgpt-operator-system \
-l control-plane=controller-manager -f
# Get Results (CRD)
kubectl get results -n k8sgpt-operator-system
# Describe a result
kubectl describe result <result-name> -n k8sgpt-operator-system
Custom Resource Definitions (Operator)
Result CRD
apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  name: pod-crashloopbackoff-example
  namespace: default
spec:
  kind: Pod
  name: failing-pod
  error:
    - text: "Back-off restarting failed container"
  details: "Container is in CrashLoopBackOff state"
  parentObject: "Deployment/my-deployment"
Check Results
# List all results
kubectl get results --all-namespaces
# Get specific result
kubectl get result <result-name> -o yaml
# Watch results
kubectl get results -w
Common Analysis Scenarios
Analyze Pod Issues
# Check all pod problems
k8sgpt analyze --filter=Pod --explain
# Specific namespace
k8sgpt analyze --namespace production --filter=Pod --explain
Analyze Service Issues
# Check service configurations
k8sgpt analyze --filter=Service --explain
Analyze Network Issues
# Check networking components
k8sgpt analyze --filter=Ingress,Service,NetworkPolicy --explain
Analyze Storage Issues
# Check PVC problems
k8sgpt analyze --filter=PersistentVolumeClaim --explain
Analyze Node Issues
# Check node health
k8sgpt analyze --filter=Node --explain
Analyze Deployment Issues
# Check deployments, replicasets
k8sgpt analyze --filter=Deployment,ReplicaSet --explain
Full Cluster Scan
# Comprehensive analysis
k8sgpt analyze --explain --all-namespaces
Output Examples
Without Explanation
0: Service default/my-service
- Error: Service has no endpoints, pods may not be matching the selector
With AI Explanation
0: Service default/my-service
- Error: Service has no endpoints, pods may not be matching the selector
- AI Explanation: This error typically occurs when:
1. The selector in the Service doesn't match any Pod labels
2. All Pods matching the selector are not in Running state
3. The Pods are in a different namespace than expected
Solutions:
1. Check that your Pod labels match the Service selector
2. Verify Pods are running: kubectl get pods -l app=my-app
3. Update Service selector or Pod labels as needed
Configuration File
Config Location
# Default config location
~/.config/k8sgpt/k8sgpt.yaml
# View config
cat ~/.config/k8sgpt/k8sgpt.yaml
Example Config
ai:
  backend: openai
  model: gpt-4
  baseurl: ""
  engine: ""
  temperature: 0.7
  topP: 1.0
kubeconfig: ~/.kube/config
kubecontext: ""
language: english
filters:
  - Pod
  - Service
  - Deployment
integrations:
  trivy:
    enabled: true
    skipInstall: false
sink:
  type: ""
  webhook: ""
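Shell scripts can read settings straight out of this file. A small awk sketch against the layout above; it assumes the key appears once, and a YAML-aware tool is safer for anything deeply nested:

```shell
#!/bin/sh
# Sketch: pull the configured backend out of the config file with awk.
# Assumes the layout shown above; awk ignores leading indentation via $1.
cat > k8sgpt.yaml <<'EOF'
ai:
  backend: openai
  model: gpt-4
language: english
EOF
backend=$(awk '$1 == "backend:" { print $2; exit }' k8sgpt.yaml)
echo "configured backend: $backend"
```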
Advanced Usage
Language Support
# Set language for AI responses
k8sgpt analyze --explain --language spanish
k8sgpt analyze --explain --language french
k8sgpt analyze --explain --language german
k8sgpt analyze --explain --language chinese
# Available languages:
# english, spanish, french, german, italian, portuguese, dutch,
# russian, chinese, japanese, korean, turkish
Custom Analyzers
# List available analyzers
k8sgpt filters list
# Include links to the official Kubernetes documentation in results
k8sgpt analyze --explain --with-doc
Caching
# Disable cache
k8sgpt analyze --no-cache --explain
# Default: cache enabled for performance
Max Concurrency
# Control concurrent analysis
k8sgpt analyze --max-concurrency 10
Troubleshooting with K8sGPT
Scenario 1: CrashLoopBackOff
# Analyze pod crashes
k8sgpt analyze --filter=Pod --explain
# Example output with AI:
# "The pod is crashing because the container is exiting with error code 1.
# This typically indicates:
# 1. Missing environment variables
# 2. Application configuration error
# 3. Failed health checks
# Check logs with: kubectl logs <pod-name>"
Scenario 2: Service Not Accessible
k8sgpt analyze --filter=Service,Ingress --explain
# AI will check:
# - Service selector matches
# - Endpoints exist
# - Port configurations
# - Ingress rules
Scenario 3: Resource Limits
k8sgpt analyze --filter=Pod,Node --explain
# AI will identify:
# - Memory/CPU pressure
# - OOMKilled pods
# - Node resource constraints
# - Recommendations for limits
Scenario 4: Storage Issues
k8sgpt analyze --filter=PersistentVolumeClaim --explain
# AI will diagnose:
# - PVC binding issues
# - Storage class problems
# - Volume mount errors
Scenario 5: Network Policies
k8sgpt analyze --filter=NetworkPolicy --explain
# AI will analyze:
# - Policy conflicts
# - Connectivity issues
# - Ingress/egress rules
Integration with CI/CD
GitLab CI
k8sgpt-scan:
  stage: test
  image: ghcr.io/k8sgpt-ai/k8sgpt:latest
  script:
    - k8sgpt auth add openai --token $OPENAI_TOKEN
    - k8sgpt analyze --explain --output=json > k8sgpt-report.json
  artifacts:
    paths:
      - k8sgpt-report.json
GitHub Actions
name: K8sGPT Analysis
on: [push]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup K8sGPT
        run: |
          curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/latest/download/k8sgpt_amd64.deb
          sudo dpkg -i k8sgpt_amd64.deb
      - name: Analyze Cluster
        env:
          OPENAI_TOKEN: ${{ secrets.OPENAI_TOKEN }}
        run: |
          k8sgpt auth add openai --token $OPENAI_TOKEN
          k8sgpt analyze --explain --output=json
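A pipeline usually wants to fail when findings exist. A hedged sketch of that gate; the report shape and the `"kind"`-counting heuristic are assumptions, so adapt them to your version's actual JSON:

```shell
#!/bin/sh
# Sketch: gate a CI job on the number of findings in the report.
# Stand-in report; in CI this would come from: k8sgpt analyze --output=json
printf '%s\n' '{"results": []}' > k8sgpt-report.json
count=$(grep -c '"kind"' k8sgpt-report.json || true)
if [ "$count" -gt 0 ]; then
  echo "k8sgpt found $count issues, failing the build"
  exit 1
fi
echo "no issues found"
```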
Jenkins Pipeline
pipeline {
    agent any
    stages {
        stage('K8sGPT Analysis') {
            steps {
                sh '''
                    k8sgpt auth add openai --token $OPENAI_TOKEN
                    k8sgpt analyze --explain > k8sgpt-report.txt
                '''
                archiveArtifacts artifacts: 'k8sgpt-report.txt'
            }
        }
    }
}
Slack/Teams Integration
Slack Webhook (Operator)
apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-slack
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: openai
    secret:
      name: k8sgpt-secret
      key: openai-api-key
  filters:
    - Pod
    - Service
  sink:
    type: slack
    webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
Microsoft Teams
spec:
  sink:
    type: msteams
    webhook: https://outlook.office.com/webhook/YOUR-WEBHOOK
Security Best Practices
1. API Key Management
# Use environment variables
export OPENAI_TOKEN=sk-xxx
k8sgpt auth add openai
# Don't hardcode tokens in scripts
# Use secrets in Kubernetes
# Rotate keys regularly
k8sgpt auth remove openai
k8sgpt auth add openai --token $NEW_TOKEN
2. Anonymization
# Always use anonymization for sharing reports
k8sgpt analyze --explain --anonymize > report.txt
# This protects:
# - Internal naming conventions
# - Resource identifiers
# - Namespace names
3. RBAC for Operator
apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8sgpt-sa
  namespace: k8sgpt-operator-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8sgpt-role
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "nodes", "namespaces"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets", "statefulsets"]
    verbs: ["get", "list"]
Performance Tips
1. Use Filters
# Don't analyze everything unless needed
k8sgpt analyze --filter=Pod,Service
# More targeted = faster results
2. Namespace Scoping
# Analyze specific namespace
k8sgpt analyze --namespace production
# Avoid --all-namespaces for large clusters
3. Caching
# Use cache for repeated analysis
k8sgpt analyze --explain # cache enabled by default
# Clear cache when needed
k8sgpt cache clear
4. Concurrency Control
# Reduce concurrency for stability
k8sgpt analyze --max-concurrency 5
Comparing AI Backends
| Backend         | Speed  | Cost   | Quality   | Local |
|-----------------|--------|--------|-----------|-------|
| OpenAI GPT-4    | Medium | High   | Excellent | No    |
| OpenAI GPT-3.5  | Fast   | Low    | Good      | No    |
| Claude          | Medium | Medium | Excellent | No    |
| LocalAI         | Fast   | Free   | Good      | Yes   |
| Ollama          | Fast   | Free   | Good      | Yes   |
| Gemini          | Fast   | Medium | Very Good | No    |
Recommendations:
- Production: OpenAI GPT-4 or Claude
- Development: GPT-3.5 or Ollama
- Air-gapped: LocalAI or Ollama
- Cost-sensitive: Ollama or LocalAI
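One way to encode these recommendations in automation; the environment-to-backend mapping is this example's own convention, not a k8sgpt feature:

```shell
#!/bin/sh
# Sketch: choose a backend per environment, following the guidance above.
pick_backend() {
  case "$1" in
    production)  echo "openai" ;;   # or claude
    airgapped)   echo "localai" ;;  # or ollama
    *)           echo "ollama" ;;   # development / cost-sensitive default
  esac
}
backend=$(pick_backend "${DEPLOY_ENV:-development}")
echo "selected backend: $backend"
# then apply it with: k8sgpt auth default "$backend"
```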
Useful Commands Reference
# Installation
brew install k8sgpt
# Authentication
k8sgpt auth add openai --token $TOKEN
k8sgpt auth list
k8sgpt auth default openai
# Analysis
k8sgpt analyze --explain
k8sgpt analyze --filter=Pod --explain
k8sgpt analyze --namespace default --explain
k8sgpt analyze --all-namespaces --explain
k8sgpt analyze --explain --anonymize
# Output formats
k8sgpt analyze --explain --output=json
k8sgpt analyze --explain --output=yaml
# Filters
k8sgpt filters list
k8sgpt filters add Pod,Service
# Integrations
k8sgpt integration activate trivy
k8sgpt integration list
# Server mode
k8sgpt serve --port 8080
# Operator
helm install k8sgpt-operator k8sgpt/k8sgpt-operator
kubectl get k8sgpt -n k8sgpt-operator-system
kubectl get results --all-namespaces
# Version
k8sgpt version
# Help
k8sgpt --help
k8sgpt analyze --help
Real-World Examples
Example 1: Debug Failed Deployment
$ k8sgpt analyze --filter=Deployment,Pod --explain
0: Deployment default/nginx-deployment
- Error: Deployment has minimum availability warning
- AI Explanation: The deployment shows 0/3 pods are ready.
This is likely because:
1. Image pull failures - check ImagePullBackOff status
2. Resource constraints - insufficient CPU/memory on nodes
3. Configuration errors in pod spec
Run: kubectl describe deployment nginx-deployment
Check pod events: kubectl describe pod <pod-name>
Example 2: Service Discovery Issue
$ k8sgpt analyze --filter=Service --explain
0: Service default/api-service
- Error: Service has no endpoints
- AI Explanation: The service selector (app: api) doesn't match
any running pods. Either:
1. Pods aren't running: kubectl get pods -l app=api
2. Label mismatch: Check pod labels vs service selector
3. Pods in different namespace
Fix: Update service selector or pod labels to match
Example 3: Resource Exhaustion
$ k8sgpt analyze --filter=Node,Pod --explain
0: Node ip-10-0-1-100.ec2.internal
- Error: Node has MemoryPressure condition
- AI Explanation: Node is running low on memory (>85% used).
This causes pod evictions and scheduling failures.
Actions:
1. Check top memory consuming pods: kubectl top pods
2. Add resource limits to pods
3. Scale down non-critical workloads
4. Add more nodes to cluster
Community Resources
- GitHub: https://github.com/k8sgpt-ai/k8sgpt
- Documentation: https://docs.k8sgpt.ai/
- Slack: https://k8sgpt.slack.com
- Discord: https://discord.gg/k8sgpt
- Blog: https://k8sgpt.ai/blog
Contributing
# Clone repository
git clone https://github.com/k8sgpt-ai/k8sgpt.git
cd k8sgpt
# Build from source
go build -o k8sgpt main.go
# Run tests
go test ./...
# Create custom analyzer
# See: https://github.com/k8sgpt-ai/k8sgpt/tree/main/pkg/analyzer
Tips & Tricks
- Combine with kubectl: Use K8sGPT to identify issues, kubectl to fix them
- Regular scans: Run K8sGPT in CI/CD for continuous cluster health checks
- Use filters: Focus on specific resource types for faster analysis
- Anonymize reports: Always anonymize before sharing externally
- Local AI for dev: Use Ollama/LocalAI for free development testing
- Operator for production: Deploy operator for continuous monitoring
- Slack integration: Get real-time alerts for cluster issues
- Cost optimization: Use GPT-3.5 for routine checks, GPT-4 for complex issues
- Cache wisely: Enable cache for repeated analysis, disable for fresh data
- Document patterns: Save common issues and AI solutions for team reference