K8sGPT Cheatsheet

What is K8sGPT?

K8sGPT is a tool that scans your Kubernetes clusters, diagnoses issues, and uses AI to provide recommendations and explanations. It gives Kubernetes Superpowers to everyone by triaging and enriching cluster information with AI.

Key Features:

  • Automated cluster scanning
  • Issue detection and diagnosis
  • AI-powered explanations
  • Support for multiple AI backends
  • Integration with monitoring tools
  • Multi-language support
  • Custom analyzer plugins

Installation

Using Homebrew (macOS/Linux)

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

Using Binary Release

# Linux AMD64
curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.30/k8sgpt_amd64.deb
sudo dpkg -i k8sgpt_amd64.deb

# Linux ARM64
curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.30/k8sgpt_arm64.deb
sudo dpkg -i k8sgpt_arm64.deb

# RPM-based (RHEL/CentOS/Fedora)
curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.3.30/k8sgpt_amd64.rpm
sudo rpm -i k8sgpt_amd64.rpm

Using Go

go install github.com/k8sgpt-ai/k8sgpt/cmd/k8sgpt@latest

Using Docker

docker run -it --rm \
  -v ~/.kube/config:/root/.kube/config \
  ghcr.io/k8sgpt-ai/k8sgpt:latest analyze --explain

Install in Kubernetes (Operator)

# Add Helm repository
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update

# Install K8sGPT Operator
helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
  --namespace k8sgpt-operator-system \
  --create-namespace

Verify Installation

k8sgpt version

Quick Start

Basic Analysis

# Analyze cluster without AI explanations
k8sgpt analyze

# Analyze with AI explanations (requires configured backend)
k8sgpt analyze --explain

# Analyze with output filters
k8sgpt analyze --explain --filter=Pod

# Analyze specific namespace
k8sgpt analyze --namespace default --explain

AI Backend Configuration

OpenAI

# Add OpenAI backend
k8sgpt auth add openai

# With specific token
k8sgpt auth add openai --token $OPENAI_TOKEN

# With custom model
k8sgpt auth add openai --model gpt-4 --token $OPENAI_TOKEN

# Set as default
k8sgpt auth default openai

Azure OpenAI

k8sgpt auth add azureopenai \
  --baseurl https://your-instance.openai.azure.com/ \
  --engine your-deployment-name \
  --token $AZURE_OPENAI_TOKEN

Local AI (LocalAI)

k8sgpt auth add localai \
  --baseurl http://localhost:8080/v1 \
  --model ggml-gpt4all-j

# No token needed for local
k8sgpt auth default localai

Ollama

k8sgpt auth add ollama \
  --baseurl http://localhost:11434 \
  --model llama2

k8sgpt auth default ollama

Anthropic Claude

k8sgpt auth add claude \
  --token $ANTHROPIC_API_KEY \
  --model claude-3-sonnet-20240229

k8sgpt auth default claude

Cohere

k8sgpt auth add cohere --token $COHERE_TOKEN

k8sgpt auth default cohere

Google Gemini

k8sgpt auth add google \
  --token $GOOGLE_API_KEY \
  --model gemini-pro

k8sgpt auth default google

Amazon Bedrock

k8sgpt auth add amazonbedrock \
  --region us-east-1 \
  --model anthropic.claude-v2

List Configured Backends

# List all auth backends
k8sgpt auth list

# Show default backend
k8sgpt auth default

Remove Backend

k8sgpt auth remove openai

Analysis Commands

Basic Analysis

# Simple analysis
k8sgpt analyze

# With explanations
k8sgpt analyze --explain

# With specific backend
k8sgpt analyze --explain --backend openai

# Analyze all namespaces
k8sgpt analyze --explain --all-namespaces

# Analyze specific namespace
k8sgpt analyze --namespace kube-system --explain

Filtering

# Filter by resource type
k8sgpt analyze --explain --filter=Pod
k8sgpt analyze --explain --filter=Service
k8sgpt analyze --explain --filter=Deployment

# Multiple filters
k8sgpt analyze --explain --filter=Pod,Service,Deployment

# Exclude filters
k8sgpt analyze --explain --filter=Pod --no-filter=Service

Available Filters:

  • Pod
  • Service
  • Deployment
  • ReplicaSet
  • StatefulSet
  • Ingress
  • PersistentVolumeClaim
  • NetworkPolicy
  • Node
  • CronJob
  • HPA (HorizontalPodAutoscaler)
  • PodDisruptionBudget
  • GatewayClass
  • Gateway
  • HTTPRoute

Output Formats

# JSON output
k8sgpt analyze --explain --output=json

# YAML output
k8sgpt analyze --explain --output=yaml

# Save to file
k8sgpt analyze --explain > analysis-report.txt

# JSON to file
k8sgpt analyze --explain --output=json > analysis.json
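A JSON report can gate a CI job. The field names below (a top-level `problems` count) are an assumption about the report shape; verify against the output of your k8sgpt version before relying on them:

```shell
# Write a sample report (stands in for: k8sgpt analyze --output=json).
# NOTE: the "problems" field is an assumed shape; check your actual output.
cat > analysis.json <<'EOF'
{"provider":"openai","problems":2,"results":[{"kind":"Service","name":"default/my-service"}]}
EOF

# Extract the problem count with grep/cut (avoids a jq dependency)
problems=$(grep -o '"problems":[0-9]*' analysis.json | cut -d: -f2)

if [ "${problems:-0}" -gt 0 ]; then
  echo "k8sgpt found $problems problem(s)"
  # exit 1   # uncomment to fail the pipeline on findings
fi
```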

Anonymizing Data

# Anonymize sensitive data in output
k8sgpt analyze --explain --anonymize

# This replaces:
# - Pod names
# - Namespace names
# - Node names
# - Container names
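For illustration, the masking amounts to this kind of substitution. The sed line below is a sketch of the idea, not k8sgpt's actual implementation — the tool performs its own masking internally:

```shell
# Sketch only: k8sgpt --anonymize masks identifiers itself; this just
# demonstrates the namespace/name substitution on one report line.
line="0: Pod production/payments-api-7f9c"
masked=$(echo "$line" | sed -E 's|(Pod )[^/]+/[-a-zA-Z0-9]+|\1<ns>/<pod>|')
echo "$masked"
```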

Filters Configuration

List Available Filters

k8sgpt filters list

Add Custom Filters

# Enable specific analyzers
k8sgpt filters add Pod
k8sgpt filters add Service
k8sgpt filters add Deployment

# Enable multiple
k8sgpt filters add Pod,Service,Ingress

Remove Filters

k8sgpt filters remove Pod

Integration Commands

Trivy Integration

# Enable Trivy for vulnerability scanning
k8sgpt integration activate trivy

# List integrations
k8sgpt integration list

# Deactivate integration
k8sgpt integration deactivate trivy

Prometheus Integration

# Activate Prometheus (experimental)
k8sgpt integration activate prometheus

Serve Mode

Start Server

# Start K8sGPT server
k8sgpt serve

# With specific port
k8sgpt serve --port 8080

# With backend
k8sgpt serve --backend openai

API Endpoints

# Analyze endpoint
curl -X POST http://localhost:8080/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "default",
    "explain": true,
    "filters": ["Pod", "Service"]
  }'

# Health check
curl http://localhost:8080/health

K8sGPT Operator

Deploy Operator

# Install via Helm
helm repo add k8sgpt https://charts.k8sgpt.ai/
helm repo update

helm install k8sgpt-operator k8sgpt/k8sgpt-operator \
  --namespace k8sgpt-operator-system \
  --create-namespace \
  --set serviceMonitor.enabled=true

Create K8sGPT Resource

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-sample
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: openai
    model: gpt-4
    secret:
      name: k8sgpt-secret
      key: openai-api-key
  noCache: false
  version: v0.3.30
  filters:
    - Pod
    - Service
    - Deployment
  sink:
    type: slack
    webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL

kubectl apply -f k8sgpt-resource.yaml

Create Secret for API Key

kubectl create secret generic k8sgpt-secret \
  --from-literal=openai-api-key=$OPENAI_TOKEN \
  -n k8sgpt-operator-system

Check Operator Status

# Get K8sGPT resources
kubectl get k8sgpt -n k8sgpt-operator-system

# Describe K8sGPT resource
kubectl describe k8sgpt k8sgpt-sample -n k8sgpt-operator-system

# Check operator logs
kubectl logs -n k8sgpt-operator-system \
  -l control-plane=controller-manager -f

# Get Results (CRD)
kubectl get results -n k8sgpt-operator-system

# Describe a result
kubectl describe result <result-name> -n k8sgpt-operator-system

Custom Resource Definitions (Operator)

Result CRD

apiVersion: core.k8sgpt.ai/v1alpha1
kind: Result
metadata:
  name: pod-crashloopbackoff-example
  namespace: default
spec:
  kind: Pod
  name: failing-pod
  error:
    - text: "Back-off restarting failed container"
  details: "Container is in CrashLoopBackOff state"
  parentObject: "Deployment/my-deployment"

Check Results

# List all results
kubectl get results --all-namespaces

# Get specific result
kubectl get result <result-name> -o yaml

# Watch results
kubectl get results -w

Common Analysis Scenarios

Analyze Pod Issues

# Check all pod problems
k8sgpt analyze --filter=Pod --explain

# Specific namespace
k8sgpt analyze --namespace production --filter=Pod --explain

Analyze Service Issues

# Check service configurations
k8sgpt analyze --filter=Service --explain

Analyze Network Issues

# Check networking components
k8sgpt analyze --filter=Ingress,Service,NetworkPolicy --explain

Analyze Storage Issues

# Check PVC problems
k8sgpt analyze --filter=PersistentVolumeClaim --explain

Analyze Node Issues

# Check node health
k8sgpt analyze --filter=Node --explain

Analyze Deployment Issues

# Check deployments, replicasets
k8sgpt analyze --filter=Deployment,ReplicaSet --explain

Full Cluster Scan

# Comprehensive analysis
k8sgpt analyze --explain --all-namespaces

Output Examples

Without Explanation

0: Service default/my-service
- Error: Service has no endpoints, pods may not be matching the selector

With AI Explanation

0: Service default/my-service
- Error: Service has no endpoints, pods may not be matching the selector
- AI Explanation: This error typically occurs when:
  1. The selector in the Service doesn't match any Pod labels
  2. All Pods matching the selector are not in Running state
  3. The Pods are in a different namespace than expected
  
  Solutions:
  1. Check that your Pod labels match the Service selector
  2. Verify Pods are running: kubectl get pods -l app=my-app
  3. Update Service selector or Pod labels as needed
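The selector mismatch above can also be checked mechanically. In a live cluster the two values would come from `kubectl get svc my-service -o jsonpath='{.spec.selector}'` and `kubectl get pods --show-labels`; they are hard-coded here so the sketch is self-contained:

```shell
# Hypothetical values; replace with the kubectl outputs described above.
selector="app=my-app"
pod_labels="app=my-app,tier=backend"

# A pod matches when every selector key=value appears in its label set;
# for a single-entry selector a substring check is enough.
case ",$pod_labels," in
  *",$selector,"*) result="selector matches pod labels" ;;
  *)               result="selector does NOT match pod labels" ;;
esac
echo "$result"
```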

Configuration File

Config Location

# Default config location
~/.config/k8sgpt/k8sgpt.yaml

# View config
cat ~/.config/k8sgpt/k8sgpt.yaml

Example Config

ai:
  backend: openai
  model: gpt-4
  baseurl: ""
  engine: ""
  temperature: 0.7
  topP: 1.0
kubeconfig: ~/.kube/config
kubecontext: ""
language: english
filters:
  - Pod
  - Service
  - Deployment
integrations:
  trivy:
    enabled: true
    skipInstall: false
sink:
  type: ""
  webhook: ""

Advanced Usage

Language Support

# Set language for AI responses
k8sgpt analyze --explain --language spanish
k8sgpt analyze --explain --language french
k8sgpt analyze --explain --language german
k8sgpt analyze --explain --language chinese

# Available languages:
# english, spanish, french, german, italian, portuguese, dutch,
# russian, chinese, japanese, korean, turkish

Analyzer Options

# List available analyzers
k8sgpt filters list

# Include official Kubernetes documentation in explanations
k8sgpt analyze --explain --with-doc

Caching

# Disable cache
k8sgpt analyze --no-cache --explain

# Default: cache enabled for performance

Max Concurrency

# Control concurrent analysis
k8sgpt analyze --max-concurrency 10

Troubleshooting with K8sGPT

Scenario 1: CrashLoopBackOff

# Analyze pod crashes
k8sgpt analyze --filter=Pod --explain

# Example output with AI:
# "The pod is crashing because the container is exiting with error code 1.
# This typically indicates:
# 1. Missing environment variables
# 2. Application configuration error
# 3. Failed health checks
# Check logs with: kubectl logs <pod-name>"

Scenario 2: Service Not Accessible

k8sgpt analyze --filter=Service,Ingress --explain

# AI will check:
# - Service selector matches
# - Endpoints exist
# - Port configurations
# - Ingress rules

Scenario 3: Resource Limits

k8sgpt analyze --filter=Pod,Node --explain

# AI will identify:
# - Memory/CPU pressure
# - OOMKilled pods
# - Node resource constraints
# - Recommendations for limits

Scenario 4: Storage Issues

k8sgpt analyze --filter=PersistentVolumeClaim --explain

# AI will diagnose:
# - PVC binding issues
# - Storage class problems
# - Volume mount errors

Scenario 5: Network Policies

k8sgpt analyze --filter=NetworkPolicy --explain

# AI will analyze:
# - Policy conflicts
# - Connectivity issues
# - Ingress/egress rules

Integration with CI/CD

GitLab CI

k8sgpt-scan:
  stage: test
  image: ghcr.io/k8sgpt-ai/k8sgpt:latest
  script:
    - k8sgpt auth add openai --token $OPENAI_TOKEN
    - k8sgpt analyze --explain --output=json > k8sgpt-report.json
  artifacts:
    paths:
      - k8sgpt-report.json

GitHub Actions

name: K8sGPT Analysis
on: [push]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup K8sGPT
        run: |
          curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/latest/download/k8sgpt_amd64.deb
          sudo dpkg -i k8sgpt_amd64.deb
      - name: Analyze Cluster
        env:
          OPENAI_TOKEN: ${{ secrets.OPENAI_TOKEN }}
        run: |
          k8sgpt auth add openai --token $OPENAI_TOKEN
          k8sgpt analyze --explain --output=json

Jenkins Pipeline

pipeline {
    agent any
    stages {
        stage('K8sGPT Analysis') {
            steps {
                sh '''
                    k8sgpt auth add openai --token $OPENAI_TOKEN
                    k8sgpt analyze --explain > k8sgpt-report.txt
                '''
                archiveArtifacts artifacts: 'k8sgpt-report.txt'
            }
        }
    }
}

Slack/Teams Integration

Slack Webhook (Operator)

apiVersion: core.k8sgpt.ai/v1alpha1
kind: K8sGPT
metadata:
  name: k8sgpt-slack
  namespace: k8sgpt-operator-system
spec:
  ai:
    enabled: true
    backend: openai
    secret:
      name: k8sgpt-secret
      key: openai-api-key
  filters:
    - Pod
    - Service
  sink:
    type: slack
    webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL

Microsoft Teams

spec:
  sink:
    type: msteams
    webhook: https://outlook.office.com/webhook/YOUR-WEBHOOK

Security Best Practices

1. API Key Management

# Use environment variables
export OPENAI_TOKEN=sk-xxx
k8sgpt auth add openai

# Don't hardcode tokens in scripts
# Use secrets in Kubernetes

# Rotate keys regularly
k8sgpt auth remove openai
k8sgpt auth add openai --token $NEW_TOKEN

2. Anonymization

# Always use anonymization for sharing reports
k8sgpt analyze --explain --anonymize > report.txt

# This protects:
# - Internal naming conventions
# - Resource identifiers
# - Namespace names

3. RBAC for Operator

apiVersion: v1
kind: ServiceAccount
metadata:
  name: k8sgpt-sa
  namespace: k8sgpt-operator-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: k8sgpt-role
rules:
- apiGroups: [""]
  resources: ["pods", "services", "nodes", "namespaces"]
  verbs: ["get", "list"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets"]
  verbs: ["get", "list"]
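The ClusterRole above grants nothing until it is bound to the service account. A minimal binding, with names matching the hypothetical resources above:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: k8sgpt-rolebinding
subjects:
- kind: ServiceAccount
  name: k8sgpt-sa
  namespace: k8sgpt-operator-system
roleRef:
  kind: ClusterRole
  name: k8sgpt-role
  apiGroup: rbac.authorization.k8s.io
```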

Performance Tips

1. Use Filters

# Don't analyze everything unless needed
k8sgpt analyze --filter=Pod,Service

# More targeted = faster results

2. Namespace Scoping

# Analyze specific namespace
k8sgpt analyze --namespace production

# Avoid --all-namespaces for large clusters

3. Caching

# Use cache for repeated analysis
k8sgpt analyze --explain  # cache enabled by default

# Clear cache when needed
k8sgpt cache clear

4. Concurrency Control

# Reduce concurrency for stability
k8sgpt analyze --max-concurrency 5

Comparing AI Backends

| Backend | Speed | Cost | Quality | Local |
|---------|-------|------|---------|-------|
| OpenAI GPT-4 | Medium | High | Excellent | No |
| OpenAI GPT-3.5 | Fast | Low | Good | No |
| Claude | Medium | Medium | Excellent | No |
| LocalAI | Fast | Free | Good | Yes |
| Ollama | Fast | Free | Good | Yes |
| Gemini | Fast | Medium | Very Good | No |

Recommendations:

  • Production: OpenAI GPT-4 or Claude
  • Development: GPT-3.5 or Ollama
  • Air-gapped: LocalAI or Ollama
  • Cost-sensitive: Ollama or LocalAI

Useful Commands Reference

# Installation
brew install k8sgpt

# Authentication
k8sgpt auth add openai --token $TOKEN
k8sgpt auth list
k8sgpt auth default openai

# Analysis
k8sgpt analyze --explain
k8sgpt analyze --filter=Pod --explain
k8sgpt analyze --namespace default --explain
k8sgpt analyze --all-namespaces --explain
k8sgpt analyze --explain --anonymize

# Output formats
k8sgpt analyze --explain --output=json
k8sgpt analyze --explain --output=yaml

# Filters
k8sgpt filters list
k8sgpt filters add Pod,Service

# Integrations
k8sgpt integration activate trivy
k8sgpt integration list

# Server mode
k8sgpt serve --port 8080

# Operator
helm install k8sgpt-operator k8sgpt/k8sgpt-operator
kubectl get k8sgpt -n k8sgpt-operator-system
kubectl get results --all-namespaces

# Version
k8sgpt version

# Help
k8sgpt --help
k8sgpt analyze --help

Real-World Examples

Example 1: Debug Failed Deployment

$ k8sgpt analyze --filter=Deployment,Pod --explain

0: Deployment default/nginx-deployment
- Error: Deployment has minimum availability warning
- AI Explanation: The deployment shows 0/3 pods are ready.
  This is likely because:
  1. Image pull failures - check ImagePullBackOff status
  2. Resource constraints - insufficient CPU/memory on nodes
  3. Configuration errors in pod spec
  
  Run: kubectl describe deployment nginx-deployment
  Check pod events: kubectl describe pod <pod-name>

Example 2: Service Discovery Issue

$ k8sgpt analyze --filter=Service --explain

0: Service default/api-service  
- Error: Service has no endpoints
- AI Explanation: The service selector (app: api) doesn't match
  any running pods. Either:
  1. Pods aren't running: kubectl get pods -l app=api
  2. Label mismatch: Check pod labels vs service selector
  3. Pods in different namespace
  
  Fix: Update service selector or pod labels to match

Example 3: Resource Exhaustion

$ k8sgpt analyze --filter=Node,Pod --explain

0: Node ip-10-0-1-100.ec2.internal
- Error: Node has MemoryPressure condition
- AI Explanation: Node is running low on memory (>85% used).
  This causes pod evictions and scheduling failures.
  
  Actions:
  1. Check top memory consuming pods: kubectl top pods
  2. Add resource limits to pods
  3. Scale down non-critical workloads
  4. Add more nodes to cluster


Contributing

# Clone repository
git clone https://github.com/k8sgpt-ai/k8sgpt.git
cd k8sgpt

# Build from source
go build -o k8sgpt main.go

# Run tests
go test ./...

# Create custom analyzer
# See: https://github.com/k8sgpt-ai/k8sgpt/tree/main/pkg/analyzer

Tips & Tricks

  1. Combine with kubectl: Use K8sGPT to identify issues, kubectl to fix them
  2. Regular scans: Run K8sGPT in CI/CD for continuous cluster health checks
  3. Use filters: Focus on specific resource types for faster analysis
  4. Anonymize reports: Always anonymize before sharing externally
  5. Local AI for dev: Use Ollama/LocalAI for free development testing
  6. Operator for production: Deploy operator for continuous monitoring
  7. Slack integration: Get real-time alerts for cluster issues
  8. Cost optimization: Use GPT-3.5 for routine checks, GPT-4 for complex issues
  9. Cache wisely: Enable cache for repeated analysis, disable for fresh data
  10. Document patterns: Save common issues and AI solutions for team reference
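Tip 2 (regular scans) can be wrapped in a small script. The stub fallback below is only there so the sketch can be dry-run on a machine without k8sgpt installed:

```shell
# Nightly scan wrapper (sketch). When k8sgpt is not on PATH, a stub
# function stands in so the script can be exercised anywhere.
if ! command -v k8sgpt >/dev/null 2>&1; then
  k8sgpt() { echo '{"problems":0,"results":[]}'; }   # dry-run stub
fi

report="scan-$(date +%F).json"
k8sgpt analyze --explain --output=json > "$report"
echo "wrote $report"
```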