Orchestration

Deploy Rook Ceph for Kubernetes Storage

Introduction

In the dynamic world of Kubernetes, persistent storage has long been a critical challenge. While stateless applications thrive on the ephemeral nature of containers, stateful workloads—databases, message queues, and AI/ML data stores—demand robust, reliable, and scalable storage solutions that can survive pod restarts and node failures. Traditional network-attached storage (NAS) or storage area networks (SAN) often struggle to integrate seamlessly with the cloud-native paradigm, creating operational overhead and bottlenecks. This is where Rook Ceph steps in, transforming your Kubernetes cluster into a self-managing, self-scaling, and self-healing distributed storage system.

Rook is an open-source Cloud Native Computing Foundation (CNCF) project that brings block, file, and object storage capabilities directly into your Kubernetes environment. It does this by orchestrating Ceph, a powerful open-source distributed storage system, on top of your cluster’s nodes. By leveraging Kubernetes primitives like Deployments, DaemonSets, and Custom Resource Definitions (CRDs), Rook automates the deployment, management, and scaling of Ceph clusters, making enterprise-grade storage accessible and manageable for cloud-native applications. Imagine no longer needing external storage appliances or complex storage management interfaces; with Rook Ceph, your storage becomes an integral, declarative part of your Kubernetes infrastructure.

This guide will walk you through the process of deploying and configuring Rook Ceph on your Kubernetes cluster, enabling you to provision highly available and resilient persistent volumes for your stateful applications. We’ll cover everything from the initial setup to creating storage classes and consuming block, file, and object storage, ensuring you have a comprehensive understanding of how to leverage this powerful combination. By the end, you’ll be equipped to provide your Kubernetes workloads with the persistent storage they deserve, fully integrated and managed within the cloud-native ecosystem.

TL;DR: Rook Ceph on Kubernetes

Rook orchestrates Ceph to provide highly available, scalable, and self-managing distributed storage (block, file, object) directly within your Kubernetes cluster. It simplifies the deployment and lifecycle management of Ceph via Kubernetes CRDs.

Key Takeaways:

  • Distributed Storage: Turns your Kubernetes nodes into a robust, fault-tolerant storage cluster.
  • Kubernetes Native: Managed entirely through Kubernetes CRDs, Deployments, and DaemonSets.
  • Versatile: Supports Block (RBD), File (CephFS), and Object (RGW) storage.
  • Automation: Handles deployment, scaling, healing, and upgrades of Ceph components.

Quick Setup Commands:

# Clone Rook Ceph repository
git clone --single-branch --branch v1.12.5 https://github.com/rook/rook.git
cd rook/deploy/examples

# Create Rook operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Wait for operator to be ready
kubectl -n rook-ceph get pod -l app=rook-ceph-operator --watch

# Create Ceph cluster (ensure OSDs are configured correctly)
kubectl create -f cluster.yaml

# Verify Ceph cluster health
kubectl -n rook-ceph get cephcluster

# (Optional) Deploy the Rook toolbox and check detailed Ceph status
kubectl create -f toolbox.yaml
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status

# Create StorageClass for RBD (Block Storage)
kubectl create -f csi/rbd/storageclass.yaml

# Create CephFilesystem and StorageClass for CephFS (File Storage)
kubectl create -f filesystem.yaml -f csi/cephfs/storageclass.yaml

Prerequisites

Before embarking on your Rook Ceph journey, ensure you have the following:

  • A Kubernetes Cluster: Version 1.22+ is recommended. This can be a local cluster (e.g., Kind, Minikube) or a cloud-based cluster (EKS, GKE, AKS). Ensure your nodes have sufficient resources (CPU, RAM). For production, dedicated worker nodes for storage are often preferred.
  • `kubectl` Configured: You need `kubectl` installed and configured to interact with your Kubernetes cluster. Refer to the official Kubernetes documentation for installation instructions.
  • Storage Devices: Each node intended to host Ceph OSDs (Object Storage Daemons) must have at least one raw, unformatted, and unpartitioned block device. These can be physical disks, cloud provider volumes (e.g., EBS, Persistent Disk), or even loop devices for testing. Rook will format and manage these devices, so do NOT point it at disks holding data you care about. A quick check for clean devices is shown after this list.
  • Network Connectivity: Ensure proper network connectivity between your Kubernetes nodes. Ceph is sensitive to network latency, so a fast, reliable network is crucial for performance. For advanced networking considerations, especially in a production environment, you might explore solutions like Cilium WireGuard Encryption for secure and high-performance pod-to-pod communication.
  • `git` Installed: To clone the Rook repository.
  • Basic Kubernetes Knowledge: Familiarity with Kubernetes concepts such as Pods, Deployments, Services, Persistent Volumes (PVs), and Persistent Volume Claims (PVCs) is assumed.
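
To confirm a disk is clean before handing it to Rook, run a quick check on each storage node. The snippet below is a minimal sketch; device names such as /dev/sdb are placeholders for your own hardware.

# List block devices and any existing filesystems; a candidate OSD disk
# should show an empty FSTYPE column and have no partitions or children.
lsblk -f

# If a disk was used before and you are certain its contents are disposable,
# wipe its signatures so Rook can consume it (DESTRUCTIVE):
# sudo wipefs --all /dev/sdb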

Step-by-Step Guide: Deploying Rook Ceph on Kubernetes

This section will guide you through the complete process of setting up Rook Ceph, from deploying the operator to provisioning different types of storage.

Step 1: Clone the Rook Repository

First, we need to clone the Rook repository from GitHub. This repository contains all the necessary Kubernetes manifests for deploying the Rook operator and a sample Ceph cluster. We’ll clone a specific stable branch to ensure compatibility and avoid potential breaking changes from the `master` branch.

# Clone the Rook repository for a stable release (e.g., v1.12.5)
git clone --single-branch --branch v1.12.5 https://github.com/rook/rook.git

# Navigate to the appropriate directory containing example manifests
cd rook/deploy/examples

Explanation: The `git clone` command fetches the Rook project’s source code. We specify `--single-branch --branch v1.12.5` to get only the `v1.12.5` branch, which is a stable release. It’s always best practice to use a specific version for production deployments to ensure stability and predictability. The `cd rook/deploy/examples` command changes your current directory to where the example Kubernetes YAML files are located, which we will use for deployment.

Step 2: Deploy the Rook Operator

The Rook operator is the brain of your Rook Ceph deployment. It watches for Rook-specific custom resources, defined by CRDs such as `CephCluster`, `CephBlockPool`, `CephFilesystem`, and `CephObjectStore`. When these resources are created or modified, the operator translates the declarations into actual Ceph components (Monitors, OSDs, Managers, etc.) and manages their lifecycle within Kubernetes.

# Create the Rook CRDs, common resources, and the Rook operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

Explanation:

  • `crds.yaml`: Defines the Custom Resource Definitions (CRDs) that Rook uses to represent Ceph components and configurations within Kubernetes. These CRDs extend the Kubernetes API, allowing you to declare Ceph resources just like you would declare a Pod or a Deployment.
  • `common.yaml`: Contains common resources like the `rook-ceph` Namespace, necessary RBAC (Role-Based Access Control) configurations, and service accounts that the Rook operator will use.
  • `operator.yaml`: Deploys the Rook operator itself as a Kubernetes Deployment. This operator is responsible for orchestrating the Ceph cluster.

Verify:
Check if the Rook operator pod is running. It might take a minute or two for the pod to transition to the `Running` state.

# Watch the Rook operator pod status
kubectl -n rook-ceph get pod -l app=rook-ceph-operator --watch

Expected Output (after some time):

NAME                                READY   STATUS    RESTARTS   AGE
rook-ceph-operator-7c8d9f4c5-abcde   1/1     Running   0          2m

Step 3: Deploy the Ceph Cluster

With the Rook operator running, we can now define and deploy our Ceph cluster. The `cluster.yaml` manifest declares a `CephCluster` custom resource, specifying details like the number of Ceph monitors, which nodes to use for OSDs, and the storage devices to consume.

Before applying `cluster.yaml`, you must modify it to match your environment’s storage configuration. The most critical section is `storage`. You can choose to use all available devices, specific devices by path, or even a specific node selector.

Example Modification for `cluster.yaml`:

Open `cluster.yaml` in your editor. Locate the `storage` section. Here are common configurations:

Option A: Use all unformatted devices on all nodes: (Simplest for testing)

# ...
spec:
  cephVersion:
    image: "quay.io/ceph/ceph:v17.2.7" # Or another desired Ceph version
  dataDirHostPath: /var/lib/rook # Important for persistent metadata
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    useAllNodes: true # Use all available nodes
    useAllDevices: true # Use all available unformatted devices on those nodes
    # deviceFilter: "^sd[b-g]" # Uncomment to filter devices by regex, e.g., sdb, sdc, etc.
    # config:
    #   databaseSizeMB: "1024" # Size of the OSD metadata database (RocksDB)
# ...

Option B: Use specific devices on specific nodes: (Recommended for production)

# ...
spec:
  cephVersion:
    image: "quay.io/ceph/ceph:v17.2.7"
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  storage:
    useAllNodes: false # Do not use all nodes by default
    nodes: # Specify nodes and their devices
      - name: "kube-worker-1" # Replace with your actual node name
        devices:
          - name: "sdb" # Replace with your actual device name (e.g., sdb, vdb, nvme0n1)
          - name: "sdc"
      - name: "kube-worker-2"
        devices:
          - name: "sdb"
      - name: "kube-worker-3"
        devices:
          - name: "sdb"
# ...

Important considerations for `cluster.yaml`:

  • `cephVersion.image`: Ensure you use a recent, stable Ceph image. The example uses `v17.2.7` (Quincy). Check Quay.io/ceph/ceph for available tags.
  • `dataDirHostPath`: This path on the host is used for Ceph metadata and configuration. It should ideally be on a persistent disk or a robust filesystem.
  • `mon.count`: For production, always use an odd number (3 or 5) for high availability.
  • `storage`: This is the most crucial part. Ensure the `devices` or `deviceFilter` correctly identifies raw, unformatted disks on your nodes. Rook will format these disks.

Once you’ve edited `cluster.yaml` to suit your storage configuration, apply it:

# Create the Ceph cluster
kubectl create -f cluster.yaml

Explanation: The Rook operator detects the `CephCluster` resource and begins deploying Ceph components: Monitors (`mon`), Managers (`mgr`), and Object Storage Daemons (`osd`). The OSDs are the workhorses, storing your data across the specified disks. This process can take several minutes, depending on the number of OSDs and the speed of your disks.

Verify:
Monitor the deployment of Ceph pods. You should see `mon` (monitors), `mgr` (managers), and `osd` (object storage daemons) pods coming up.

# Watch all Ceph pods
kubectl -n rook-ceph get pod --watch

# Check Ceph cluster health
kubectl -n rook-ceph get cephcluster

Expected Output (after several minutes, once all pods are running):

# kubectl -n rook-ceph get pod
NAME                                           READY   STATUS    RESTARTS   AGE
csi-cephfsplugin-658b4d8d9-2r8z5               3/3     Running   0          5m
csi-cephfsplugin-provisioner-789f5bc9-d89z6    5/5     Running   0          5m
csi-rbdplugin-5b9c7d4f9-w3x2v                  3/3     Running   0          5m
csi-rbdplugin-provisioner-7b7c8d9-x1y2z        5/5     Running   0          5m
rook-ceph-mgr-a-7bb69d7b5f-m5s2x               1/1     Running   0          6m
rook-ceph-mon-a-69d7b5f-m5s2x                  1/1     Running   0          7m
rook-ceph-mon-b-7c8d9f4c5-w3x2v                1/1     Running   0          7m
rook-ceph-mon-c-8d9f4c5-x1y2z                  1/1     Running   0          7m
rook-ceph-operator-7c8d9f4c5-abcde             1/1     Running   0          12m
rook-ceph-osd-0-5b9c7d4f9-qwer                 1/1     Running   0          5m
rook-ceph-osd-1-7c8d9f4c5-tyui                 1/1     Running   0          5m
rook-ceph-osd-2-8d9f4c5-asdf                   1/1     Running   0          5m
# ... (more OSDs if you have more disks/nodes)

# kubectl -n rook-ceph get cephcluster
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     3          7m    Ready   Cluster created successfully   HEALTH_OK

If `HEALTH` is `HEALTH_OK`, your Ceph cluster is successfully deployed and running!
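
For a closer look than the `CephCluster` status provides, you can deploy the Rook toolbox pod (shipped in the same examples directory) and run the Ceph CLI directly. The commands below assume the default toolbox deployment name.

# Deploy the toolbox pod (from rook/deploy/examples)
kubectl create -f toolbox.yaml

# Run Ceph CLI commands inside the toolbox
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df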

Step 4: Configure Ceph Block Storage (RBD)

Ceph Block Device (RBD) provides block-level storage, ideal for databases and other applications that require high-performance, raw device access. Rook exposes RBD through the CSI (Container Storage Interface) driver, allowing Kubernetes to dynamically provision `PersistentVolume`s.

# Navigate to the CSI RBD examples directory (from rook/deploy/examples)
cd csi/rbd/

# Create the CephBlockPool and StorageClass, both defined in storageclass.yaml
kubectl create -f storageclass.yaml

Explanation:

  • The `storageclass.yaml` in this directory defines a `StorageClass` named `rook-ceph-block`. This `StorageClass` tells Kubernetes how to provision `PersistentVolume`s using the Ceph RBD CSI driver.
  • It also references a `CephBlockPool` named `replicapool`, defined in the same `storageclass.yaml` manifest, which sets the replication level for data stored in the pool. By default this is 3x replication, so the cluster can tolerate the loss of a single host without losing data. A simplified sketch of this pairing follows below.
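
For orientation only, the pairing looks roughly like the abbreviated sketch below. The real `csi/rbd/storageclass.yaml` also sets image format/features and the CSI provisioner/node secrets, so apply the upstream file rather than this excerpt.

# Abbreviated sketch of the CephBlockPool + StorageClass pairing
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # spread replicas across hosts
  replicated:
    size: 3             # keep three copies of every object
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
reclaimPolicy: Delete
allowVolumeExpansion: true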

Verify:
Check if the `StorageClass` has been created.

# Get the StorageClass
kubectl get storageclass rook-ceph-block

Expected Output:

NAME              PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   2m

Now, let’s test it by provisioning a `PersistentVolumeClaim` (PVC) and using it with a simple Pod.

# rbd-test-pvc-pod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce # Can be mounted as read-write by a single node
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-test-app
  template:
    metadata:
      labels:
        app: rbd-test-app
    spec:
      containers:
        - name: rbd-test-container
          image: busybox
          command: ["sh", "-c", "echo 'Hello from Rook Ceph RBD!' > /mnt/test/hello.txt && sleep 3600"]
          volumeMounts:
            - name: rbd-volume
              mountPath: /mnt/test
      volumes:
        - name: rbd-volume
          persistentVolumeClaim:
            claimName: rbd-pvc

# Apply the PVC and Deployment
kubectl apply -f rbd-test-pvc-pod.yaml

Verify:
Check if the PVC is bound and the Pod is running.

# Check PVC status
kubectl get pvc rbd-pvc

# Check Pod status
kubectl get pod -l app=rbd-test-app

# Verify data persistence
kubectl exec deployment/rbd-test-app -- cat /mnt/test/hello.txt

Expected Output:

# kubectl get pvc rbd-pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-1a2b3c4d-5e6f-7890-1234-567890abcdef   1Gi        RWO            rook-ceph-block   1m

# kubectl get pod -l app=rbd-test-app
NAME                             READY   STATUS    RESTARTS   AGE
rbd-test-app-7c8d9f4c5-vwxyz     1/1     Running   0          1m

# kubectl exec deployment/rbd-test-app -- cat /mnt/test/hello.txt
Hello from Rook Ceph RBD!

Step 5: Configure Ceph File Storage (CephFS)

Ceph Filesystem (CephFS) provides a POSIX-compliant shared filesystem. This is perfect for use cases where multiple pods or applications need to access the same data simultaneously, such as content management systems, shared logs, or machine learning datasets. For large-scale AI/ML workloads, understanding how to efficiently schedule pods to access this shared data can be crucial. For more on that, see our guide on Running LLMs on Kubernetes: GPU Scheduling Best Practices.

# Navigate back to the examples directory
cd ../..

# Create a CephFilesystem and a StorageClass.
# `filesystem.yaml` defines the CephFilesystem; `csi/cephfs/storageclass.yaml` creates the K8s StorageClass.
kubectl create -f filesystem.yaml -f csi/cephfs/storageclass.yaml

Explanation:

  • `filesystem.yaml`: Defines a `CephFilesystem` custom resource. This instructs Rook to set up a CephFS within your Ceph cluster, including metadata servers (MDS) and data pools.
  • `csi/cephfs/storageclass.yaml`: Defines a `StorageClass` named `rook-cephfs`. This `StorageClass` uses the CephFS CSI driver to provision `PersistentVolume`s backed by paths within the CephFS.

Verify:
Check if the `StorageClass` and Ceph MDS pods have been created.

# Get the StorageClass
kubectl get storageclass rook-cephfs

# Check CephFS MDS pods (should be two, one active, one standby)
kubectl -n rook-ceph get pod -l app=rook-ceph-mds

Expected Output:

# kubectl get storageclass rook-cephfs
NAME          PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-cephfs   rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   1m

# kubectl -n rook-ceph get pod -l app=rook-ceph-mds
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-mds-myfs-a-69d7b5f-m5s2x    1/1     Running   0          1m
rook-ceph-mds-myfs-b-7c8d9f4c5-w3x2v  1/1     Running   0          1m

Now, let’s test CephFS with a PVC and a Pod.

# cephfs-test-pvc-pod.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
    - ReadWriteMany # Can be mounted as read-write by multiple nodes
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cephfs-test-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cephfs-test-app
  template:
    metadata:
      labels:
        app: cephfs-test-app
    spec:
      containers:
        - name: cephfs-test-container
          image: busybox
          command: ["sh", "-c", "echo 'Hello from Rook CephFS!' > /mnt/test/hello.txt && sleep 3600"]
          volumeMounts:
            - name: cephfs-volume
              mountPath: /mnt/test
      volumes:
        - name: cephfs-volume
          persistentVolumeClaim:
            claimName: cephfs-pvc

# Apply the PVC and Deployment
kubectl apply -f cephfs-test-pvc-pod.yaml

Verify:
Check if the PVC is bound and the Pod is running, then verify the data.

# Check PVC status
kubectl get pvc cephfs-pvc

# Check Pod status
kubectl get pod -l app=cephfs-test-app

# Verify data persistence
kubectl exec deployment/cephfs-test-app -- cat /mnt/test/hello.txt

Expected Output:

# kubectl get pvc cephfs-pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cephfs-pvc   Bound    pvc-1b2c3d4e-6f70-8901-2345-67890abcdef    1Gi        RWX            rook-cephfs    1m

# kubectl get pod -l app=cephfs-test-app
NAME                              READY   STATUS    RESTARTS   AGE
cephfs-test-app-7c8d9f4c5-xyzab   1/1     Running   0          1m

# kubectl exec deployment/cephfs-test-app -- cat /mnt/test/hello.txt
Hello from Rook CephFS!

Step 6: Configure Ceph Object Storage (RGW)

Ceph Object Gateway (RGW) provides S3/Swift compatible object storage, making it ideal for cloud-native applications that need to store large amounts of unstructured data like images, videos, or backups.

# Make sure you are back in the rook/deploy/examples directory.

# Create a CephObjectStore and a bucket StorageClass.
# `object.yaml` defines the CephObjectStore; `storageclass-bucket-delete.yaml` creates the K8s StorageClass.
kubectl create -f object.yaml -f storageclass-bucket-delete.yaml

Explanation:

  • `object.yaml`: Defines a `CephObjectStore` custom resource. This instructs Rook to deploy Ceph RGW instances (an S3-compatible service) within your cluster.
  • `storageclass-bucket-delete.yaml`: Defines a `StorageClass` named `rook-ceph-bucket` backed by the bucket provisioner (`rook-ceph.ceph.rook.io/bucket`). Unlike the block and file classes, it doesn’t provision volumes; it creates S3 buckets in the CephObjectStore when you submit an ObjectBucketClaim, as sketched below.
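
Buckets are then requested declaratively with an ObjectBucketClaim. The sketch below mirrors the `object-bucket-claim-delete.yaml` example shipped with the other manifests; the claim name and bucket prefix here are arbitrary placeholders.

# object-bucket-claim.yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: ceph-bucket
spec:
  generateBucketName: ceph-bkt   # prefix; a unique suffix is appended
  storageClassName: rook-ceph-bucket

Once the claim is bound, Rook creates a ConfigMap and a Secret named after the claim, containing the bucket endpoint and S3 credentials for applications to consume.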

Verify:
Check if the RGW pods are running and the `StorageClass` is created.

# Check RGW pods
kubectl -n rook-ceph get pod -l app=rook-ceph-rgw

# Get the StorageClass
kubectl get storageclass rook-ceph-bucket

Expected Output:

# kubectl -n rook-ceph get pod -l app=rook-ceph-rgw
NAME                                      READY   STATUS    RESTARTS   AGE
rook-ceph-rgw-my-store-a-69d7b5f-m5s2x    1/1     Running   0          1m

# kubectl get storageclass rook-ceph-bucket
NAME               PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-bucket   rook-ceph.ceph.rook.io/bucket   Delete          Immediate           false                  1m

To interact with the RGW directly, you’ll need S3 credentials. Create a CephObjectStoreUser (the `object-user.yaml` example defines a user named `my-user` for the `my-store` object store); Rook generates the keys and stores them in a Kubernetes `Secret`.

# Create an object store user
kubectl create -f object-user.yaml

# Get the RGW credentials (access key and secret key)
kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.AccessKey}' | base64 --decode; echo
kubectl -n rook-ceph get secret rook-ceph-object-user-my-store-my-user -o jsonpath='{.data.SecretKey}' | base64 --decode; echo

Expected Output:

YOUR_ACCESS_KEY
YOUR_SECRET_KEY

You can now use these credentials with any S3-compatible client (e.g., `s3cmd`, AWS CLI) to interact with your Ceph Object Store. You’ll also need the RGW endpoint, which is a Kubernetes Service:

# Get the RGW service endpoint
kubectl -n rook-ceph get svc rook-ceph-rgw-my-store -o jsonpath='{.spec.clusterIP}:{.spec.ports[0].port}'

Expected Output:

10.43.123.45:80
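
As a quick smoke test, you can port-forward the RGW Service and point any S3 client at it. This sketch assumes the AWS CLI is installed and reuses the credentials retrieved above; the bucket name is a placeholder.

# Forward the RGW service to localhost
kubectl -n rook-ceph port-forward svc/rook-ceph-rgw-my-store 8080:80 &

# Point the S3 client at the forwarded endpoint
export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY
export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_KEY
aws --endpoint-url http://127.0.0.1:8080 s3 mb s3://rook-demo-bucket
aws --endpoint-url http://127.0.0.1:8080 s3 ls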

For external access, you would typically expose this service using a `NodePort`, `LoadBalancer`, or an Ingress/Gateway API resource. For example, to expose it via a LoadBalancer (if your cloud provider supports it):

# rgw-lb.yaml
apiVersion: v1
kind: Service
metadata:
  name: rook-ceph-rgw-my-store-external
  namespace: rook-ceph
spec:
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: 8080 # RGW default port inside the container
  selector:
    app: rook-ceph-rgw
    ceph_daemon_id: my-store
  type: LoadBalancer

# Apply the external Service
kubectl apply -f rgw-lb.yaml

Wait for the LoadBalancer to get an external IP, then use that IP/port with your S3 client.
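
To watch for the external address (assuming the Service name from the manifest above):

kubectl -n rook-ceph get svc rook-ceph-rgw-my-store-external -w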

Production Considerations

Deploying Rook Ceph in production requires careful planning beyond the basic setup. Here are key aspects to consider:

  1. Dedicated Storage Nodes: For optimal performance and isolation, consider using dedicated worker nodes for your Ceph OSDs. This prevents storage I/O from contending with application workloads and simplifies resource management. These nodes should have fast, reliable disks and network interfaces.
  2. Disk Selection:
    • Type: Use enterprise-grade SSDs or NVMe drives for OSDs for best performance. Avoid HDD-only setups for critical workloads.
    • Raw Devices: Ensure disks are raw, unformatted, and unpartitioned. Rook will manage them.
    • Journal/WAL Separation: For very high-performance requirements, consider placing the Ceph OSD WAL (write-ahead log) and RocksDB metadata on smaller, faster NVMe drives while the data resides on larger SSDs. Rook supports this via the `metadataDevice` setting in the storage `config` (see the sketch after this list).
  3. Network Configuration:
    • Dedicated Network: For large clusters, a dedicated network for Ceph inter-node communication (public and cluster network) can significantly improve performance and stability.
    • High Bandwidth/Low Latency: Ceph is very network-intensive. Ensure your network can handle the traffic.
    • Security: Implement Kubernetes Network Policies to restrict communication to and from Ceph components to only necessary services. Consider encrypting traffic, potentially with solutions like Cilium WireGuard Encryption if your CNI supports it.
  4. Ceph Version: Always use a stable, well-tested Ceph version. Rook’s compatibility matrix with Ceph versions is crucial. Check the official Rook documentation for supported versions.
  5. Monitoring and Alerting:
    • Rook-Ceph Dashboard: Enable the Ceph dashboard for a visual overview of your cluster health.
    • Prometheus/Grafana: Integrate Ceph metrics with Prometheus and Grafana for detailed monitoring and custom alerts. Rook provides exporters for this.
    • Logging: Ensure centralized logging is configured for Ceph pods to aid in troubleshooting. You can gain deeper insights into network and system activity using tools like eBPF Observability with Hubble.
  6. Backup and Disaster Recovery:
    • Snapshots: Utilize Ceph’s snapshot capabilities for quick recovery from data corruption.
    • Replication: Ceph provides data replication, but this is not a backup. You still need off-cluster backups. For object storage, consider cross-region replication or external backup solutions.
    • Disaster Recovery Plan: Have a clear plan for recovering your Ceph cluster and data in case of a major outage.
  7. Resource Management:
    • Limits and Requests: Set appropriate CPU and memory requests/limits for Ceph pods to prevent resource starvation or overconsumption.
    • Autoscaling: While Rook manages Ceph scaling, consider node autoscaling solutions like Karpenter to dynamically add nodes with storage as your cluster demands grow.
  8. Security:
    • RBAC: Ensure strict RBAC policies are in place for Rook and Ceph components.
    • Encryption: Rook supports encryption of OSD data at rest via the `encryptedDevice` storage setting in your `CephCluster` manifest (see the sketch after this list).
    • Secrets: Manage Ceph access keys and other sensitive information using Kubernetes Secrets.
  9. Upgrades: Plan Rook operator and Ceph version upgrades in advance. Follow the official Rook upgrade guide, move one Rook minor version at a time, and confirm the cluster reports `HEALTH_OK` before and after each step.
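
For reference, the sketch below shows how the WAL/DB separation (item 2) and at-rest encryption (item 8) settings fit into a `CephCluster` manifest. Node and device names are placeholders; adapt them to your hardware.

# Excerpt of a CephCluster spec (sketch only)
spec:
  storage:
    useAllNodes: false
    nodes:
      - name: "kube-worker-1"           # placeholder node name
        devices:
          - name: "sdb"                 # data device (SSD)
            config:
              metadataDevice: "nvme0n1" # place RocksDB/WAL on a faster device
              encryptedDevice: "true"   # encrypt OSD data at rest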
