
Kubernetes Persistent Storage with CSI Drivers

Introduction

In the dynamic world of containerized applications, statelessness is often lauded as a virtue. However, real-world applications, from databases to message queues and logging systems, inherently require persistent storage to retain data across pod restarts, scaling events, and even node failures. Without a robust storage solution, your application’s state would vanish, leading to data loss and operational nightmares. This is where Kubernetes Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) come into play, providing an abstraction layer that decouples storage consumption from storage provision.

While PVs and PVCs offer a powerful framework, the underlying mechanism for connecting Kubernetes to diverse storage backends is handled by Container Storage Interface (CSI) drivers. CSI is a standard that allows Kubernetes to expose arbitrary storage systems to containerized workloads, enabling seamless integration with cloud provider storage (AWS EBS, GCP Persistent Disk, Azure Disk), network-attached storage (NFS, CephFS), and even local storage. This tutorial will demystify the process of leveraging CSI drivers to provision and consume persistent storage in your Kubernetes clusters, ensuring your stateful applications can thrive in a resilient and scalable environment.

TL;DR: Persistent Volumes with CSI Drivers

Kubernetes Persistent Volumes (PVs) and Persistent Volume Claims (PVCs) abstract storage, allowing stateful applications to retain data. CSI drivers enable Kubernetes to interface with various storage backends (cloud, NFS, local). This guide shows you how to provision storage via CSI, create PVCs, and attach them to Pods.

Key Commands:

  • kubectl get storageclass: List available StorageClasses.
  • kubectl apply -f storageclass.yaml: Create a custom StorageClass.
  • kubectl apply -f pvc.yaml: Request storage with a PersistentVolumeClaim.
  • kubectl apply -f pod-with-pvc.yaml: Deploy a Pod consuming the PVC.
  • kubectl get pv,pvc: Monitor Persistent Volumes and Claims.
  • kubectl delete -f .: Clean up all resources.

Prerequisites

To follow this guide, you’ll need:

  • A running Kubernetes cluster (v1.13+ for CSI support). This could be a local cluster like Minikube or Kind, or a cloud-managed cluster (EKS, GKE, AKS).
  • kubectl installed and configured to connect to your cluster. You can find installation instructions on the official Kubernetes documentation.
  • Basic understanding of Kubernetes concepts like Pods, Deployments, and Services.
  • Familiarity with YAML syntax for Kubernetes object definitions.

Step-by-Step Guide

This guide will walk you through setting up a simple NGINX application that uses a Persistent Volume provisioned by a CSI driver. We’ll cover dynamic provisioning, where the storage is created on demand.

Step 1: Identify Available CSI Drivers and StorageClasses

Before you can provision storage, you need to know what CSI drivers are installed and what StorageClasses are available in your cluster. StorageClasses abstract the underlying storage system, allowing administrators to define “classes” of storage with different performance characteristics, access modes, and reclaim policies. When a user requests storage via a PersistentVolumeClaim, they refer to a StorageClass, and the appropriate CSI driver provisions the actual storage.

Most cloud-managed Kubernetes clusters come with pre-installed CSI drivers and default StorageClasses. For example, GKE clusters often have standard and premium StorageClasses backed by GCP Persistent Disk CSI.

kubectl get storageclass

Verify: You should see a list of StorageClasses. Note the PROVISIONER column, which indicates the CSI driver. If you’re on a cloud provider, you’ll likely see provisioners like ebs.csi.aws.com, pd.csi.storage.gke.io, or disk.csi.azure.com.

NAME                 PROVISIONER                                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
standard (default)   pd.csi.storage.gke.io                           Delete          Immediate           true                   2d
premium              pd.csi.storage.gke.io                           Delete          Immediate           true                   2d
nfs-client           cluster.local/nfs-subdir-external-provisioner   Delete          Immediate           true                   1d

If you don’t see any StorageClasses, or if you want to use a specific type of storage (e.g., NFS, Ceph, or a custom local path provisioner), you’ll need to install the relevant driver. For instance, to use NFS, you might install the CSI Driver for NFS (csi-driver-nfs) or the community NFS Subdir External Provisioner (note that the latter is an external provisioner rather than a true CSI driver). The installation process for CSI drivers varies significantly, but usually involves deploying a set of Kubernetes manifests (Deployments, DaemonSets, RBAC) that run the driver components.
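Besides StorageClasses, you can inspect which CSI drivers are registered with the cluster through the CSIDriver API object (available since Kubernetes v1.14). A quick sketch (the driver name in the second command is a GKE example; substitute your own):

```shell
# List CSI drivers registered in the cluster, with their attach/mount capabilities
kubectl get csidriver

# Inspect a specific driver's capabilities in detail (example name; adjust for your cluster)
kubectl describe csidriver pd.csi.storage.gke.io
```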

Step 2: Create a Custom StorageClass (Optional, but Recommended)

While you can often use default StorageClasses, creating your own gives you fine-grained control over storage parameters. This is especially useful for specifying performance tiers, replication settings, or reclaim policies. For this example, let’s assume we want a `ReadWriteOnce` volume with a `Delete` reclaim policy. We’ll use the GKE `pd.csi.storage.gke.io` provisioner as an example, requesting a `standard` disk type. You would adapt the `provisioner` and `parameters` to match your chosen CSI driver and desired storage characteristics.

# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-fast-storage
provisioner: pd.csi.storage.gke.io # Replace with your cluster's CSI provisioner (e.g., ebs.csi.aws.com, disk.csi.azure.com)
parameters:
  type: pd-standard # Example for GCP. For AWS, 'type: gp2' or 'gp3'. For Azure, 'skuName: Standard_LRS'.
reclaimPolicy: Delete # Retain or Delete. Delete is common for dynamic provisioning.
volumeBindingMode: Immediate
allowVolumeExpansion: true

The reclaimPolicy: Delete means that when the PersistentVolumeClaim is deleted, the underlying physical volume (e.g., EBS volume, GCP Persistent Disk) will also be deleted. If you set it to Retain, the volume will persist, which is useful for manual data recovery but requires manual cleanup. The volumeBindingMode: Immediate means the PV is provisioned as soon as the PVC is created. WaitForFirstConsumer is another option, which delays provisioning until a Pod is scheduled to use the PVC, potentially improving scheduling decisions. If your storage backend communicates over the network, it is also worth considering how it interacts with transport-encryption solutions like Cilium WireGuard Encryption.

kubectl apply -f storageclass.yaml

Verify: Ensure your custom StorageClass is created.

kubectl get storageclass my-fast-storage
NAME              PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
my-fast-storage   pd.csi.storage.gke.io   Delete          Immediate           true                   10s
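The Retain and WaitForFirstConsumer options described above can be combined into a variant StorageClass. This is a sketch only (reusing the GKE provisioner as an example; adapt provisioner and parameters to your cluster), a combination commonly used for topology-aware provisioning of critical data:

```yaml
# storageclass-retain.yaml (illustrative variant)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-safe-storage
provisioner: pd.csi.storage.gke.io      # replace with your cluster's CSI provisioner
parameters:
  type: pd-standard
reclaimPolicy: Retain                    # PV (and underlying disk) survive PVC deletion
volumeBindingMode: WaitForFirstConsumer  # provision only once a Pod is scheduled
allowVolumeExpansion: true
```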

Step 3: Create a PersistentVolumeClaim (PVC)

Now that we have a StorageClass, we can request storage using a PersistentVolumeClaim. The PVC is a request for storage by a user, specifying the desired size, access mode, and optionally, a StorageClass. The Kubernetes control plane then finds or provisions a PersistentVolume that matches the PVC’s requirements.

# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-pvc
spec:
  accessModes:
    - ReadWriteOnce # Can be mounted as read-write by a single node
  storageClassName: my-fast-storage # Reference our custom StorageClass
  resources:
    requests:
      storage: 1Gi # Request 1 Gigabyte of storage

accessModes define how the volume can be mounted. Common modes include:

  • ReadWriteOnce: The volume can be mounted as read-write by a single node.
  • ReadOnlyMany: The volume can be mounted as read-only by many nodes.
  • ReadWriteMany: The volume can be mounted as read-write by many nodes. (Less common for block storage, more typical for file storage like NFS).

The resources.requests.storage field specifies the minimum size required. The actual provisioned size might be larger depending on the CSI driver and StorageClass parameters. For applications that require high performance or specific storage characteristics, understanding these parameters is crucial. On the cost side, tools like Karpenter focus primarily on compute, but efficient storage provisioning also plays a role in overall cloud spend.

kubectl apply -f pvc.yaml

Verify: Check the status of your PVC and the automatically provisioned PV. The PVC should transition to Bound status, indicating it’s successfully linked to a PV. The PV will have a unique name generated by the CSI driver.

kubectl get pvc my-app-pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
my-app-pvc   Bound    pvc-b78c2e6c-f1d0-4a81-9b0d-1e1b1d7d0a2d   1Gi        RWO            my-fast-storage   15s
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS      REASON   AGE
pvc-b78c2e6c-f1d0-4a81-9b0d-1e1b1d7d0a2d   1Gi        RWO            Delete           Bound    default/my-app-pvc   my-fast-storage            20s

Step 4: Deploy a Pod Using the PVC

Now that we have a PVC bound to a PV, we can deploy an application that uses this persistent storage. We’ll deploy a Pod with an NGINX container plus a busybox sidecar that writes a timestamp to a file on the mounted volume every 5 seconds.

# pod-with-pvc.yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-nginx-pod
spec:
  containers:
    - name: nginx
      image: nginx:latest
      ports:
        - containerPort: 80
      volumeMounts:
        - name: my-persistent-storage
          mountPath: /usr/share/nginx/html # Mount point inside the container
    - name: writer
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - while true; do echo "$(date -u) - Hello from Kubezilla!" >> /data/index.html; sleep 5; done
      volumeMounts:
        - name: my-persistent-storage
          mountPath: /data # Mount point for the writer container
  volumes:
    - name: my-persistent-storage
      persistentVolumeClaim:
        claimName: my-app-pvc # Reference our PVC

In this Pod definition, we declare a volume named my-persistent-storage and link it to our my-app-pvc. The NGINX container mounts this volume at /usr/share/nginx/html, which is NGINX’s default document root. A busybox sidecar container continuously appends data to /data/index.html, ensuring our persistent storage is being used. If you’re building more complex, distributed applications, you might consider a service mesh like Istio Ambient Mesh to manage traffic and policies between your application components, even those interacting with storage.

kubectl apply -f pod-with-pvc.yaml

Verify: Check the Pod’s status and then exec into the Pod to confirm data persistence. You can also port-forward to the NGINX service to see the content.

kubectl get pod my-nginx-pod
NAME           READY   STATUS    RESTARTS   AGE
my-nginx-pod   2/2     Running   0          20s
kubectl exec -it my-nginx-pod -c writer -- cat /data/index.html
Thu Mar 28 10:30:00 UTC 2024 - Hello from Kubezilla!
Thu Mar 28 10:30:05 UTC 2024 - Hello from Kubezilla!
Thu Mar 28 10:30:10 UTC 2024 - Hello from Kubezilla!
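Since the writer appends to index.html under NGINX’s document root, you can also view the same content over HTTP by port-forwarding to the Pod, as the verify step suggests:

```shell
# Forward local port 8080 to the Pod's port 80 in the background
kubectl port-forward pod/my-nginx-pod 8080:80 &

# Fetch the page served by NGINX (should show the timestamped lines)
curl http://localhost:8080/

# Stop the background port-forward
kill %1
```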

If you were to delete and recreate this Pod (while keeping the PVC), the data would persist. Try it: delete the Pod, wait a few seconds, then recreate it, and check the file content again.

kubectl delete pod my-nginx-pod
# Wait for the pod to terminate
kubectl apply -f pod-with-pvc.yaml
kubectl exec -it my-nginx-pod -c writer -- cat /data/index.html

You should see the old content along with new timestamps, proving persistence!

Step 5: Expand the Persistent Volume (Optional)

Many CSI drivers and StorageClasses support volume expansion, allowing you to increase the size of an existing Persistent Volume without downtime. This is particularly useful for growing databases or logging systems. The allowVolumeExpansion: true in our StorageClass is crucial for this.

To expand, simply edit the PVC to request a larger size.

kubectl edit pvc my-app-pvc

Change storage: 1Gi to storage: 2Gi and save the file. The Kubernetes controller, in conjunction with the CSI driver, will then attempt to expand the underlying volume.

# ... (excerpt from kubectl edit pvc)
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi # Changed from 1Gi
  storageClassName: my-fast-storage
  volumeName: pvc-b78c2e6c-f1d0-4a81-9b0d-1e1b1d7d0a2d
# ...
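If you prefer a non-interactive alternative to kubectl edit, the same change can be applied with a strategic merge patch:

```shell
# Bump the PVC's storage request from 1Gi to 2Gi in one command
kubectl patch pvc my-app-pvc -p '{"spec":{"resources":{"requests":{"storage":"2Gi"}}}}'
```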

Verify: Check the PVC and PV status. The PVC’s capacity should update, and the PV’s capacity will eventually reflect the new size. Note that sometimes the filesystem inside the container also needs to be expanded, which some CSI drivers handle automatically, while others might require a Pod restart or manual intervention within the container.

kubectl get pvc my-app-pvc -o jsonpath='{.status.capacity.storage}'
2Gi
kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                STORAGECLASS      REASON   AGE
pvc-b78c2e6c-f1d0-4a81-9b0d-1e1b1d7d0a2d   2Gi        RWO            Delete           Bound    default/my-app-pvc   my-fast-storage            5m

Production Considerations

Deploying persistent storage in production requires careful planning beyond just provisioning.

  • Backup and Restore: Implement robust backup and restore strategies for your persistent data. This often involves snapshotting the underlying volumes (if supported by your CSI driver) or using application-level backups. Tools like Velero can help with Kubernetes-native backup and restore operations.
  • Performance: Choose the right StorageClass and underlying disk type for your application’s performance needs (IOPS, throughput). Monitor storage performance using tools that integrate with your cloud provider or CSI driver. For example, eBPF Observability with Hubble can provide deep insights into network and application performance, but storage I/O often requires specific monitoring solutions.
  • Availability and Durability: Understand the replication and durability characteristics of your chosen storage. Cloud provider disks are typically highly durable, but network storage solutions might require specific configurations for high availability across availability zones.
  • Security: Ensure data at rest and in transit is encrypted. Most cloud CSI drivers support encryption at rest by default. For data in transit, ensure your applications and storage backend use secure protocols. Consider tools like Sigstore and Kyverno for ensuring the integrity and security of your container images and configurations, which indirectly impacts the security posture of your data.
  • Access Modes: Carefully select accessModes. ReadWriteOnce is suitable for single-replica stateful applications. For multi-replica stateful applications (e.g., clustered databases), you’ll likely need ReadWriteMany (e.g., NFS, CephFS) or a shared filesystem CSI driver.
  • Reclaim Policy: For production, Retain reclaim policy is often preferred for critical data to prevent accidental data loss upon PVC deletion, though it requires manual cleanup. Delete is convenient for ephemeral data or development environments.
  • Storage Quotas: Implement resource quotas to limit the amount of storage PVCs can consume in a namespace, preventing resource exhaustion.
  • Monitoring and Alerting: Set up monitoring for disk usage, IOPS, and latency on your persistent volumes. Configure alerts for low disk space or performance degradation.
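As a concrete example of the storage-quota point above, a ResourceQuota can cap both the number of PVCs and the total storage they may request in a namespace. The limits below are illustrative:

```yaml
# storage-quota.yaml (illustrative limits; tune for your environment)
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: default
spec:
  hard:
    persistentvolumeclaims: "10"   # max number of PVCs in the namespace
    requests.storage: 50Gi         # total storage all PVCs may request
    # Optional per-StorageClass cap:
    my-fast-storage.storageclass.storage.k8s.io/requests.storage: 20Gi
```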

Troubleshooting

Here are some common issues you might encounter when working with Kubernetes Persistent Volumes and CSI drivers.

  1. PVC stuck in Pending state:

    Explanation: This is a very common issue, often meaning Kubernetes couldn’t find or provision a PV that matches the PVC’s requests.

    Solution:

    • Check the PVC events: kubectl describe pvc <pvc-name>. Look for messages indicating why it’s pending (e.g., “no volume plugin matched,” “no space left on device,” “StorageClass not found”).
    • Verify the StorageClass: Ensure the storageClassName specified in the PVC exists and is spelled correctly (kubectl get storageclass).
    • Check the CSI driver: Is the CSI driver provisioner running and healthy? For cloud providers, check the cloud console for any API errors or resource limits.
    • Check resource availability: Does your cloud account have sufficient quotas for the requested storage type and size?
  2. Pod stuck in ContainerCreating or Pending due to volume issues:

    Explanation: The Pod can’t start because it failed to mount the PVC.

    Solution:

    • Check Pod events: kubectl describe pod <pod-name>. Look for “FailedMount” or “VolumeAttachLimit” errors.
    • Verify PVC status: Ensure the PVC is in the Bound state (kubectl get pvc <pvc-name>). If it’s pending, refer to the previous troubleshooting step.
    • Check node logs: Sometimes the actual mount error occurs on the node. SSH into the node where the Pod is scheduled (or trying to schedule) and check system logs (e.g., journalctl -u kubelet) for mount-related errors.
    • Access mode mismatch: Ensure the PVC’s accessModes are compatible with how the Pod is trying to use it (e.g., trying to use ReadWriteOnce by multiple Pods simultaneously on different nodes will fail).
  3. Volume expansion fails or doesn’t reflect new size:

    Explanation: The underlying volume might have been expanded, but the filesystem inside the container hasn’t been resized, or the StorageClass doesn’t support expansion.

    Solution:

    • Check StorageClass: Ensure allowVolumeExpansion: true is set in the StorageClass.
    • Check PVC events: kubectl describe pvc <pvc-name> for any errors during expansion.
    • Restart Pod: In some cases, especially for older CSI drivers or specific filesystems, a Pod restart might be required for the filesystem to recognize the new size.
    • Manual filesystem resize: For certain setups, you might need to manually exec into the Pod and run filesystem resize commands (e.g., resize2fs for ext4).
  4. Data loss after PVC deletion:

    Explanation: This happens when the reclaimPolicy of the StorageClass (or the PV itself) is set to Delete.

    Solution:

    • Prevention: For critical data, always set the reclaimPolicy to Retain. This prevents the underlying physical volume from being deleted when the PVC is removed.
    • Backup: Implement robust backup strategies to recover from accidental deletions.
  5. CSI driver components are not running or healthy:

    Explanation: The CSI driver itself consists of Kubernetes components (Deployments, DaemonSets) that need to be operational for storage provisioning and management to work.

    Solution:

    • Check CSI driver pods: kubectl get pods -n kube-system | grep csi (adjust namespace and grep for your specific driver). Ensure all pods are running.
    • Check logs: kubectl logs <csi-driver-pod-name> -n kube-system for any errors or warnings.
    • Review installation: Re-check the installation instructions for your specific CSI driver to ensure all components are correctly deployed and configured.
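The checks above can be condensed into a quick triage sequence. Names in angle brackets are placeholders, and the kube-system namespace is an assumption — adjust for where your CSI driver is deployed:

```shell
# 1. Why is the PVC pending? Events usually name the cause.
kubectl describe pvc <pvc-name>

# 2. Why is the Pod stuck? Look for FailedMount / FailedAttachVolume.
kubectl describe pod <pod-name>

# 3. Are the CSI driver components healthy?
kubectl get pods -n kube-system | grep -i csi
kubectl logs <csi-driver-pod-name> -n kube-system --tail=50
```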

FAQ Section

  1. What’s the difference between a PersistentVolume (PV) and a PersistentVolumeClaim (PVC)?

    A PersistentVolume (PV) is a piece of storage in the cluster, provisioned by an administrator or dynamically by a CSI driver. It’s a cluster resource, independent of any Pod. A PersistentVolumeClaim (PVC) is a request for storage by a user. It consumes PV resources. Think of PVs as the actual physical storage units and PVCs as requests for those units, much like a Pod requests CPU and memory.

  2. Can I use a single PersistentVolumeClaim with multiple Pods?

    Yes, but it depends on the accessModes of the PVC and the underlying storage technology.

    • ReadWriteOnce (RWO): Only one node can mount the volume read-write. This means Pods on the same node can share it, but Pods on different nodes cannot.
    • ReadOnlyMany (ROX): Many nodes can mount the volume read-only.
    • ReadWriteMany (RWX): Many nodes can mount the volume read-write. This is typically supported by network file systems like NFS or CephFS, but not by block storage like EBS or GCP Persistent Disk.
  3. What happens to my data if I delete a Pod that uses a PVC?

    If you delete a Pod, the data on the associated Persistent Volume remains intact, as long as you don’t delete the PersistentVolumeClaim. The PVC and its bound PV are independent of the Pod lifecycle. When a new Pod is created and references the same PVC, it will access the existing data.

  4. How do I choose the right StorageClass?

    Choosing the right StorageClass depends on your application’s requirements:

    • Performance: Do you need high IOPS (databases) or high throughput (logging)? Select a StorageClass that maps to a high-performance disk type.
    • Access Mode: Does your application need RWO, ROX, or RWX?
    • Cost: Different StorageClasses (and their underlying storage) have different cost implications.
    • Reclaim Policy: For critical data, use Retain; for ephemeral data, Delete.
    • Availability: Consider if the storage is zonal or regional and how that impacts your application’s high availability strategy.

    Consult your cloud provider’s documentation or your CSI driver’s documentation for details on available parameters.

  5. Can I use local storage with CSI?

    Yes, you can. Local Persistent Volumes let you expose directories or raw block devices on worker nodes as Persistent Volumes, either statically or via provisioners such as the sig-storage local static provisioner or Rancher’s local-path-provisioner. This is useful for performance-sensitive applications that benefit from local I/O, or for edge deployments. However, managing local storage requires careful consideration for data availability and disaster recovery, as the data is tied to a specific node.
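As a sketch, a statically provisioned local PV pins the volume to a node with nodeAffinity. The path, node name, and StorageClass name below are placeholders:

```yaml
# local-pv.yaml (illustrative; path and node name are placeholders)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1          # pre-formatted directory or mount on the node
  nodeAffinity:                    # required for local volumes: ties PV to its node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-node-1
```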

Cleanup Commands

It’s crucial to clean up your resources to avoid unnecessary cloud costs and clutter in your cluster. Since our StorageClass had a reclaimPolicy: Delete, deleting the PVC will also delete the underlying physical volume.

# Delete the Pod
kubectl delete -f pod-with-pvc.yaml

# Delete the PersistentVolumeClaim
kubectl delete -f pvc.yaml

# Delete the StorageClass (optional, if you want to remove it)
kubectl delete -f storageclass.yaml

# Verify everything is gone (any default StorageClasses shipped with your cluster will remain)
kubectl get pod,pvc,pv
No resources found in default namespace.

Next Steps / Further Reading

You’ve now mastered the basics of Kubernetes Persistent Volumes with CSI drivers. To deepen your knowledge and explore more advanced topics:

  • StatefulSets: For deploying stateful applications that require stable, unique network identifiers and persistent storage, explore Kubernetes StatefulSets. They are ideal for databases, message queues, and other distributed systems that rely on persistent, ordered deployment.
  • Volume Snapshots: Learn about Volume Snapshots for creating point-in-time copies of your persistent data, which are invaluable for backups and disaster recovery.
  • CSI Driver Development: If you’re interested in how CSI drivers work under the hood or want to integrate a custom storage solution, delve into the CSI Specification and Documentation.
  • Advanced Storage Patterns: Explore shared filesystems, object storage, and how to integrate them with Kubernetes for specific use cases.
  • Networking and Storage Security: Understand how Kubernetes Network Policies can be used to secure traffic to your storage endpoints, especially for network-attached storage.
  • Cost Optimization: Continue your journey into optimizing Kubernetes costs, perhaps by exploring how efficient storage tiering relates to overall cluster expenses alongside compute optimization tools like Karpenter.
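To connect the StatefulSets bullet back to this tutorial: a StatefulSet can stamp out one PVC per replica via volumeClaimTemplates. Here is a minimal sketch reusing the my-fast-storage class from Step 2 (names are illustrative):

```yaml
# statefulset-excerpt.yaml (illustrative)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
  volumeClaimTemplates:   # creates one PVC per replica: www-web-0, www-web-1, ...
    - metadata:
        name: www
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: my-fast-storage
        resources:
          requests:
            storage: 1Gi
```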

Conclusion

Persistent storage is a fundamental requirement for most real-world applications running on Kubernetes. By leveraging the power of Persistent Volumes, Persistent Volume Claims, and the extensible Container Storage Interface, you can provide robust, scalable, and resilient storage solutions for your stateful workloads. Understanding how to choose the right StorageClass, manage access modes, and troubleshoot common issues will empower you to build and operate complex, data-driven applications effectively in your Kubernetes clusters. Embrace the persistence, and let your applications truly thrive!
