
MIG & Time-Slicing: Share GPUs Efficiently

Introduction

In the burgeoning world of AI, machine learning, and high-performance computing, Graphics Processing Units (GPUs) have become indispensable. However, these powerful accelerators often come with a hefty price tag and can be underutilized if not managed efficiently. Running a single, small workload on an entire GPU is akin to driving a sports car to pick up groceries – overkill and wasteful. This challenge is amplified in multi-tenant Kubernetes environments where diverse workloads compete for shared resources.

Traditional GPU allocation in Kubernetes typically assigns an entire GPU to a single pod, leading to resource fragmentation and increased infrastructure costs. To combat this, we turn to advanced GPU sharing techniques: NVIDIA Multi-Instance GPU (MIG) and Time-Slicing. MIG allows a single physical GPU to be partitioned into multiple, fully isolated GPU instances, each with its own dedicated memory, cache, and compute cores. This provides robust isolation and predictable performance for concurrent workloads. Complementing MIG, Time-Slicing enables multiple pods to share a single GPU (or a MIG-partitioned GPU) by rapidly switching between their contexts, offering a more flexible sharing model suitable for best-effort or burstable workloads. Together, these technologies unlock significant cost savings and improve GPU utilization within your Kubernetes clusters, especially when paired with intelligent scheduling strategies, as discussed in our LLM GPU Scheduling Guide.

This comprehensive guide will walk you through setting up and configuring both NVIDIA MIG and Time-Slicing within your Kubernetes cluster. We’ll cover the necessary prerequisites, detailed step-by-step instructions for driver installation, Kubernetes device plugin deployment, and how to define your workloads to leverage these powerful sharing mechanisms. By the end, you’ll be equipped to maximize your GPU investments, reduce operational costs, and provide a more efficient platform for your AI and ML initiatives.

TL;DR: GPU Sharing with MIG and Time-Slicing

Maximize GPU utilization in Kubernetes by enabling NVIDIA MIG for hardware partitioning and Time-Slicing for software sharing.

  • Prerequisites: NVIDIA GPUs (Ampere or newer for MIG), NVIDIA driver, Kubernetes cluster.
  • Install NVIDIA Drivers: Ensure correct driver version matching your kernel.
  • Install NVIDIA Container Toolkit: Required for GPU access within containers.
  • Deploy NVIDIA Device Plugin: Essential for Kubernetes to recognize and schedule GPUs.
  • Configure MIG: Use nvidia-smi mig to create GPU instances, then configure device plugin.
  • Configure Time-Slicing: Modify device plugin configmap to enable time-slicing.
  • Deploy Workloads: Request nvidia.com/gpu for time-slicing or nvidia.com/mig-<profile> for MIG.

# Example: Install NVIDIA Device Plugin (simplified)
helm repo add nvdp https://nvidia.github.io/helm-charts
helm repo update
helm install \
    --generate-name \
    nvdp/nvidia-device-plugin \
    --namespace nvidia-device-plugin \
    --create-namespace

# Example: Deploy a Pod requesting a MIG 1g.5gb instance
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: mig-demo
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/mig-1g.5gb: 1
EOF

Prerequisites

Before diving into GPU sharing, ensure you have the following in place:

  • Kubernetes Cluster: A running Kubernetes cluster (v1.18+ recommended for device plugin features).
  • NVIDIA GPUs:
    • For MIG: NVIDIA Ampere architecture GPUs that support MIG (e.g., A100, A30) or newer (e.g., H100). Check the NVIDIA MIG User Guide for compatible GPUs.
    • For Time-Slicing: Any NVIDIA GPU supported by the NVIDIA Container Toolkit.
  • NVIDIA Drivers: The appropriate NVIDIA GPU drivers installed on all your Kubernetes worker nodes that contain GPUs. These drivers must match your kernel version.
  • NVIDIA Container Toolkit: Installed on all GPU-enabled worker nodes. This enables Docker/containerd to interact with NVIDIA GPUs. Follow the official NVIDIA Container Toolkit installation guide.
  • Helm: Helm v3+ installed on your local machine for deploying the NVIDIA Device Plugin.
  • kubectl: Configured to interact with your Kubernetes cluster.
  • Administrative Access: Root or sudo privileges on your worker nodes to install drivers and the container toolkit.
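The tooling checks above can be scripted before you start; below is a minimal sketch (`check_tools` is a hypothetical helper, not part of any NVIDIA tooling) that flags any missing CLI tools:

```shell
# Sketch: report any required CLI tools missing from PATH.
check_tools() {
  missing=""
  for t in "$@"; do
    # command -v is POSIX and works in both bash and sh
    command -v "$t" >/dev/null 2>&1 || missing="$missing $t"
  done
  if [ -z "$missing" ]; then echo "OK"; else echo "Missing:$missing"; fi
}

# On a GPU worker node you would run: check_tools nvidia-smi helm kubectl
check_tools sh   # demo with a tool that is always present; prints OK
```

Run it on each worker node before Step 1 to catch a missing driver or toolkit early.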

Step-by-Step Guide

Step 1: Install NVIDIA GPU Drivers

This step is crucial as the Kubernetes device plugin relies on the underlying NVIDIA drivers to expose GPU resources. Ensure you install the correct driver version compatible with your operating system and kernel. A common pitfall is driver mismatch, leading to GPUs not being detected. For cloud instances, often a specific driver version is recommended or pre-installed. For bare metal, you'll typically download directly from NVIDIA.

Note: The exact commands may vary based on your Linux distribution (Ubuntu, CentOS, etc.). We'll use Ubuntu as an example. Always reboot your node after driver installation.


# On each GPU-enabled worker node:

# 1. Update package lists
sudo apt update

# 2. Install kernel headers and build tools (if not already present)
sudo apt install -y linux-headers-$(uname -r) build-essential

# 3. Add NVIDIA CUDA repository (adjust for your OS and CUDA version)
# For Ubuntu 20.04 and CUDA 11.8 (example)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"

# 4. Install NVIDIA drivers (e.g., nvidia-driver-535 for a specific version)
# Or for the latest recommended driver:
sudo apt update
sudo apt install -y nvidia-driver-535 # Replace with desired version or 'nvidia-driver-535-server' for server versions

# 5. Reboot the node
sudo reboot now

Verify: After rebooting, log back into the node and check the driver installation using nvidia-smi.


nvidia-smi

Expected Output: You should see details about your NVIDIA GPUs, including driver version, CUDA version, and GPU utilization.


+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0              46W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |                  Off |
+-----------------------------------------+----------------------+----------------------+
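If you automate driver checks across a fleet, prefer the machine-readable query `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; if all you have is a captured banner, the version can also be scraped from it. A sketch of the parsing (the banner string is the sample output above):

```shell
# Sketch: extract the driver version from a captured nvidia-smi banner line.
# In practice prefer: nvidia-smi --query-gpu=driver_version --format=csv,noheader
banner="| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |"
driver=$(printf '%s\n' "$banner" | sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p')
echo "driver=$driver"   # driver=535.104.05
```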

Step 2: Install NVIDIA Container Toolkit

The NVIDIA Container Toolkit is essential for enabling containers to access NVIDIA GPUs. It provides a runtime hook that injects GPU capabilities into containers. Without this, your pods won't be able to utilize the GPUs even if the drivers are installed. This step also needs to be performed on all GPU-enabled worker nodes.


# On each GPU-enabled worker node:

# 1. Add the NVIDIA Container Toolkit repository
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/ubuntu2004/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# 2. Update package lists
sudo apt update

# 3. Install the NVIDIA Container Toolkit
sudo apt install -y nvidia-container-toolkit

# 4. Configure the Docker daemon to use the NVIDIA runtime (if using Docker)
# If using containerd, the setup is slightly different and often handled by the toolkit itself.
# For containerd, ensure your /etc/containerd/config.toml has the NVIDIA runtime configured.
# Example for containerd:
# sudo nvidia-ctk runtime configure --runtime=containerd
# sudo systemctl restart containerd

# For Docker:
sudo systemctl restart docker

Verify: Test the container toolkit installation by running a simple CUDA container. This verifies that your container runtime can access the GPU.


# For Docker:
sudo docker run --rm --gpus all nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04 nvidia-smi

# For containerd (assuming 'ctr' is available, or use a Kubernetes Pod later):
# This is usually verified via the Kubernetes device plugin and a test pod.

Expected Output: Similar to the nvidia-smi output from Step 1, showing GPU details from within the container.


+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0              46W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |                  Off |
+-----------------------------------------+----------------------+----------------------+

Step 3: Deploy NVIDIA Kubernetes Device Plugin

The NVIDIA Device Plugin for Kubernetes is responsible for exposing GPU resources to the Kubernetes scheduler. It discovers NVIDIA GPUs on the nodes, reports them to the Kubernetes API server as schedulable resources (e.g., nvidia.com/gpu), and allows containers to request them. We'll deploy it using Helm.


# On your local machine (with kubectl and helm configured):

# 1. Add the NVIDIA Helm repository
helm repo add nvdp https://nvidia.github.io/helm-charts
helm repo update

# 2. Install the NVIDIA Device Plugin
# We'll install a basic version first, then modify for MIG/Time-Slicing.
helm install \
    --generate-name \
    nvdp/nvidia-device-plugin \
    --namespace nvidia-device-plugin \
    --create-namespace

Verify: Check if the device plugin pods are running and if Kubernetes nodes are reporting GPU resources.


kubectl get pods -n nvidia-device-plugin

# Check node resources (look for nvidia.com/gpu)
kubectl describe node <node-name> | grep "nvidia.com/gpu"

Expected Output:


# For kubectl get pods:
NAME                                      READY   STATUS    RESTARTS   AGE
nvidia-device-plugin-1678890123-xxxxx     1/1     Running   0          2m

# For kubectl describe node:
  nvidia.com/gpu:                 1

Step 4: Configure NVIDIA MIG (Multi-Instance GPU)

MIG allows you to partition a single Ampere-class GPU into multiple, smaller, fully isolated GPU instances. Each MIG instance appears as a separate GPU to the system. This provides strong isolation and guaranteed QoS. This step involves two parts: configuring MIG on the physical GPU and then configuring the NVIDIA Device Plugin to expose these MIG instances to Kubernetes.

First, on your GPU-enabled worker node, create MIG profiles. You can create various profiles (e.g., 1g.5gb, 2g.10gb, etc.) depending on your GPU model. Refer to the NVIDIA MIG User Guide for available profiles for your specific GPU.


# On the GPU-enabled worker node:

# 1. Check current MIG mode (disabled by default)
nvidia-smi --query-gpu=index,mig.mode.current --format=csv

# 2. Enable MIG mode (requires GPU reset, so no active workloads)
sudo nvidia-smi -mig 1

# 3. Create MIG instances. Example for an A100 40GB:
# This creates a 1g.5gb instance on GPU 0, then a 2g.10gb, and another 1g.5gb.
# List the profiles (and their IDs) available on your GPU with: sudo nvidia-smi mig -lgip
# First, delete any existing MIG configurations if you are reconfiguring
# sudo nvidia-smi mig -dci # Delete all compute instances
# sudo nvidia-smi mig -dgi # Delete all GPU instances

# Create a 1g.5gb instance
sudo nvidia-smi mig -cgi 19 -C -i 0 # 19 corresponds to 1g.5gb profile on A100

# Create a 2g.10gb instance
sudo nvidia-smi mig -cgi 14 -C -i 0 # 14 corresponds to 2g.10gb profile on A100

# Create another 1g.5gb instance
sudo nvidia-smi mig -cgi 19 -C -i 0 # Another 1g.5gb instance

# 4. Verify the created MIG instances
sudo nvidia-smi -L

Expected Output (nvidia-smi -L after MIG configuration):


GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
  MIG 1g.5gb      Device  0: (UUID: MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
  MIG 2g.10gb     Device  1: (UUID: MIG-yyyyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy)
  MIG 1g.5gb      Device  2: (UUID: MIG-zzzzzzzz-zzzz-zzzz-zzzz-zzzzzzzzzzzz)
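A layout like the one above can be sanity-checked before you create it: on an A100, each GPU instance consumes a fixed number of the GPU's 7 compute slices (1g = 1, 2g = 2, 3g = 3, 4g = 4, 7g = 7, per the MIG User Guide). A quick sketch of the arithmetic:

```shell
# Sketch: check that the planned profiles fit within the A100's 7 compute slices.
# Slice costs: 1g.5gb=1, 2g.10gb=2 (from the MIG User Guide).
planned="1 2 1"   # 1g.5gb + 2g.10gb + 1g.5gb, as created above
total=0
for s in $planned; do total=$((total + s)); done
echo "planned compute slices: $total of 7"
if [ "$total" -le 7 ]; then echo "layout fits"; else echo "layout exceeds capacity"; fi
```

Here 4 of 7 slices are used, so 3 slices remain for additional instances.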

Next, configure the NVIDIA Device Plugin to expose these MIG resources. The device plugin needs to be deployed with MIG strategy enabled. First, delete the existing device plugin, then reinstall it with the correct configuration.


# On your local machine (with kubectl and helm configured):

# 1. Find the name of your existing device plugin release
helm list -n nvidia-device-plugin

# 2. Uninstall the existing device plugin
helm uninstall <release-name> -n nvidia-device-plugin # Replace <release-name> with the name from step 1

# 3. Reinstall the device plugin with the mixed MIG strategy, so each
#    profile is exposed as its own resource (nvidia.com/mig-<profile>)
helm install \
    --generate-name \
    nvdp/nvidia-device-plugin \
    --namespace nvidia-device-plugin \
    --create-namespace \
    --set migStrategy=mixed

Verify: Check node resources again. You should now see specific MIG resources reported by the node.


kubectl describe node <node-name> | grep "nvidia.com/mig"

Expected Output:


  nvidia.com/mig-1g.5gb:          2
  nvidia.com/mig-2g.10gb:         1

Now, deploy a pod requesting a MIG instance. Notice the resource request format: nvidia.com/mig-<profile>.


apiVersion: v1
kind: Pod
metadata:
  name: mig-workload
spec:
  restartPolicy: OnFailure
  containers:
    - name: mig-container
      image: nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04
      command: ["bash", "-c", "echo 'Running on MIG instance!' && nvidia-smi && sleep 3600"]
      resources:
        limits:
          # Request one 1g.5gb MIG instance
          nvidia.com/mig-1g.5gb: 1
  nodeSelector:
    # Ensure this pod lands on a GPU-enabled node with MIG configured
    kubernetes.io/hostname: <node-name>

kubectl apply -f mig-workload.yaml
kubectl logs mig-workload

Expected Output (from logs): nvidia-smi inside the pod should only show the allocated MIG instance, not the full GPU.


Running on MIG instance!
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:07:00.0 Off |                  N/A |
| N/A   32C    P0              46W / 400W |      0MiB / 5120MiB  |      0%      Default |
|                                         |                      |                1g.5gb|
+-----------------------------------------+----------------------+----------------------+

Step 5: Configure NVIDIA Time-Slicing

Time-Slicing allows multiple pods to share a single GPU by rapidly switching contexts. Unlike MIG, it doesn't provide hardware isolation, but it's highly flexible and efficient for workloads that don't require dedicated GPU resources or have bursty usage patterns. This is configured through the NVIDIA Device Plugin's configmap.

First, get the current device plugin configmap. (A default Helm install may not create one; if none exists, create a new ConfigMap with the content below and point the plugin at it via the chart's config.name value.)


# On your local machine:

# Get the name of the device plugin configmap
kubectl get configmap -n nvidia-device-plugin

# Example output might be nvidia-device-plugin-config-xxxxx

# Get the YAML of the configmap
kubectl get configmap <configmap-name> -n nvidia-device-plugin -o yaml > device-plugin-config.yaml

Now, edit the device-plugin-config.yaml file to enable time-slicing. You'll need to modify the config.yaml data entry. The sharing.timeSlicing.resources[0].replicas value determines how many pods can share a single GPU. For instance, 2 means two pods can share one GPU; note that the GPU schedules their work best-effort, with no guaranteed 50/50 split and no memory isolation. If you have MIG configured, you can also time-slice within a MIG instance.


# device-plugin-config.yaml (modified)
apiVersion: v1
kind: ConfigMap
metadata:
  name: <configmap-name> # Keep the original name
  namespace: nvidia-device-plugin
data:
  config.yaml: |
    version: v1
    # For time-slicing on full GPUs
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4 # Allows 4 pods to share one full GPU
    # If MIG is enabled and you want to time-slice MIG instances:
    # flags:
    #   migStrategy: mixed
    # sharing:
    #   timeSlicing:
    #     resources:
    #       - name: nvidia.com/mig-1g.5gb # Time-slice 1g.5gb MIG instances
    #         replicas: 2 # Allows 2 pods to share one 1g.5gb MIG instance
    #       - name: nvidia.com/mig-2g.10gb # Time-slice 2g.10gb MIG instances
    #         replicas: 2 # Allows 2 pods to share one 2g.10gb MIG instance

Apply the modified configmap and then restart the device plugin pods for changes to take effect.


# On your local machine:

# Apply the modified configmap
kubectl apply -f device-plugin-config.yaml -n nvidia-device-plugin

# Restart the device plugin pods (this will cause a brief interruption in GPU scheduling)
kubectl delete pods -l app.kubernetes.io/name=nvidia-device-plugin -n nvidia-device-plugin

Verify: After the device plugin pods restart, check the node resources. The nvidia.com/gpu count should now reflect the time-slicing configuration (e.g., if you set replicas: 4 for nvidia.com/gpu on a node with 1 physical GPU, the node will report nvidia.com/gpu: 4).


kubectl describe node <node-name> | grep "nvidia.com/gpu"

Expected Output:


  nvidia.com/gpu:                 4 # If you configured 4 slices per GPU
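The advertised capacity is simply physical GPUs multiplied by the configured replica count; a one-line sketch of the arithmetic:

```shell
# Sketch: advertised nvidia.com/gpu capacity = physical GPUs x time-slicing replicas
physical_gpus=1
replicas=4
echo "nvidia.com/gpu: $((physical_gpus * replicas))"   # nvidia.com/gpu: 4
```

Keep in mind these 4 "GPUs" are logical shares of one device, not additional hardware.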

Now, deploy multiple pods requesting a time-sliced GPU. Each pod will request nvidia.com/gpu: 1.


# time-sliced-workload-1.yaml
apiVersion: v1
kind: Pod
metadata:
  name: time-sliced-workload-1
spec:
  restartPolicy: OnFailure
  containers:
    - name: time-sliced-container-1
      image: nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04
      command: ["bash", "-c", "echo 'Running time-sliced workload 1...' && nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    kubernetes.io/hostname: <node-name>
---
# time-sliced-workload-2.yaml
apiVersion: v1
kind: Pod
metadata:
  name: time-sliced-workload-2
spec:
  restartPolicy: OnFailure
  containers:
    - name: time-sliced-container-2
      image: nvcr.io/nvidia/cuda:11.4.0-base-ubuntu20.04
      command: ["bash", "-c", "echo 'Running time-sliced workload 2...' && nvidia-smi && sleep 3600"]
      resources:
        limits:
          nvidia.com/gpu: 1
  nodeSelector:
    kubernetes.io/hostname: <node-name>

kubectl apply -f time-sliced-workload-1.yaml
kubectl apply -f time-sliced-workload-2.yaml
kubectl logs time-sliced-workload-1
kubectl logs time-sliced-workload-2

Expected Output (from logs): Both pods should show the full GPU in their nvidia-smi output, but they are logically sharing it via time-slicing. You might observe varying GPU utilization if both pods are actively using the GPU.


# Output from time-sliced-workload-1
Running time-sliced workload 1...
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:07:00.0 Off |                    0 |
| N/A   32C    P0              46W / 400W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |                  Off |
+-----------------------------------------+----------------------+----------------------+

# Output from time-sliced-workload-2 (similar)
Running time-sliced workload 2...
... (similar nvidia-smi output) ...

Production Considerations

Deploying GPU sharing in production requires careful planning beyond just technical setup:

  • Resource Management and Quotas: While time-slicing and MIG increase utilization, implement Kubernetes Resource Quotas to prevent individual teams or namespaces from monopolizing shared GPU resources. This is especially critical for time-sliced GPUs where a single greedy workload can starve others.
  • Monitoring and Observability: Implement robust monitoring for GPU utilization, memory, temperature, and power consumption. Tools like Prometheus and Grafana, integrated with NVIDIA DCGM (Data Center GPU Manager), can provide critical insights. For advanced eBPF-based observability, consider solutions like Hubble, as detailed in our eBPF Observability with Hubble guide, to understand network and process interactions with GPUs.
  • Workload Characterization: Understand your workloads.
    • MIG: Ideal for critical, performance-sensitive workloads that require guaranteed isolation and predictable performance (e.g., real-time inference, small training jobs).
    • Time-Slicing: Best for bursty, less critical, or development workloads where slight performance variance is acceptable (e.g., interactive notebooks, batch processing with variable load).
  • Node Autoscaling: Integrate with cluster autoscalers like Cluster Autoscaler or Karpenter. Ensure that your autoscaling policies correctly account for node capacity based on MIG instances or time-sliced GPU counts, not just raw physical GPUs. Karpenter, for instance, can be configured to provision nodes with specific GPU types and MIG profiles.
  • Security and Isolation: MIG provides hardware isolation, which is a strong security boundary. Time-slicing offers softer isolation. For multi-tenant clusters, consider additional security measures like Kubernetes Network Policies to restrict communication between namespaces, and Pod Security Standards to enforce secure pod configurations. Ensure your container images are signed and verified using solutions like Sigstore and Kyverno.
  • Driver Management: Keep NVIDIA drivers updated to benefit from performance improvements and bug fixes, but always test thoroughly in a staging environment before rolling out to production. Automate driver updates where possible, perhaps using tools like NVIDIA GPU Operator.
  • Failure Domains: Distribute GPU-enabled nodes across different availability zones to ensure high availability for your GPU workloads.
  • Cost Optimization: Regularly review GPU utilization metrics. If MIG instances are consistently underutilized, consider reconfiguring MIG profiles. If time-sliced GPUs are overloaded, it might be time to scale up or add more physical GPUs.
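As a sketch of the quota point above (the namespace and limits are illustrative), a ResourceQuota can cap extended-resource requests per namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota
  namespace: team-a            # illustrative namespace
spec:
  hard:
    requests.nvidia.com/gpu: "8"        # at most 8 time-sliced GPU replicas
    requests.nvidia.com/mig-1g.5gb: "4" # at most 4 small MIG instances
```

Note that Kubernetes only supports the requests. prefix for quotas on extended resources, and pods must set limits equal to requests for them.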

Troubleshooting

1. Issue: nvidia-smi command not found or showing error.

Problem: NVIDIA drivers are not installed correctly, or the PATH is not set.

Solution:

  1. Verify driver installation by checking kernel modules:
    lsmod | grep nvidia
  2. Reinstall drivers following Step 1.
  3. Ensure /usr/bin or /usr/local/bin (where nvidia-smi typically resides) is in your system's PATH.
  4. Check if the node was rebooted after driver installation.

2. Issue: Pods requesting GPUs are stuck in Pending state.

Problem: Kubernetes scheduler cannot find a node with available GPU resources. This could be due to the NVIDIA Device Plugin not running, GPU resources not being reported correctly, or insufficient resources.

Solution:

  1. Check that the device plugin pods are running: kubectl get pods -n nvidia-device-plugin
  2. Confirm the node advertises the resource: kubectl describe node <node-name> | grep nvidia.com
  3. Inspect the pending pod's events for the scheduler's reason: kubectl describe pod <pod-name>
  4. Make sure the pod requests a resource name the node actually reports (e.g., nvidia.com/mig-1g.5gb will not match a node that only exposes nvidia.com/gpu).
