NVIDIA Developer Blog

Get Real-Time Visibility into GPU Usage Across Kubernetes Clusters

May 21, 2026

Maximizing the value of AI infrastructure demands deep visibility into GPU utilization. Yet many platform teams running AI workloads on Kubernetes operate with limited visibility into how their GPUs are used. Most don’t know who’s consuming them, how much memory is in use, or whether Kubernetes pods are pending or silently idle. Without that signal, GPU fleets are routinely underutilized, and scheduling bottlenecks stay hidden until users escalate.

The GPU Usage Monitor, built on the NVIDIA Data Center GPU Manager (DCGM) Exporter, enables real-time visibility into GPU allocation, compute utilization, memory consumption, and pod status across an entire Kubernetes cluster through a single Helm chart deployment.

The observability gap in GPU-accelerated Kubernetes clusters

For site reliability engineers (SREs) and platform teams managing GPU-accelerated Kubernetes clusters, two failure modes are common and costly.

  • Over-provisioning: Engineers request entire GPUs to avoid contention, but models frequently use only 30-50% of available memory and compute. Without visibility into consumption, there’s no signal to right-size these allocations. The result is a cluster with high nominal demand but low effective utilization – paying for hardware that sits idle.
  • Pod starvation and scheduling blind spots: GPU requests can stack up, leaving pods queued in a Pending state and causing model training jobs or inference endpoints to stall before they start. Without a cluster-wide view of pending versus running GPU pods, these scheduling bottlenecks are often discovered too late – typically when a user reports a failure, rather than through a monitoring alert.
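
Before any dashboard exists, the second failure mode can be roughly checked by hand by counting GPU pods per phase. The sketch below assumes GPUs are requested through the standard nvidia.com/gpu resource limit and that kubectl already points at the cluster. The helper name gpu_pod_phase_counts is hypothetical and not part of the project.

```shell
# Count GPU pods by phase from rows of "NAMESPACE NAME PHASE GPU" on stdin.
# Rows whose GPU column is "<none>" (no nvidia.com/gpu limit) are skipped.
gpu_pod_phase_counts() {
  awk '$4 != "<none>" && $4 != "" { count[$3]++ }
       END { printf "running=%d pending=%d\n", count["Running"], count["Pending"] }'
}

# Intended use against a live cluster (not run here):
#   kubectl get pods --all-namespaces --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase,GPU:.spec.containers[*].resources.limits.nvidia\.com/gpu' \
#     | gpu_pod_phase_counts
```

A pending count that keeps growing while the running count stays flat is the scheduling pressure described above. This only gives a point-in-time snapshot, which is the limitation a continuously scraped dashboard removes.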

The standard Kubernetes metrics stack – including kube-state-metrics and node-exporter – doesn’t surface GPU-specific signals. DCGM Exporter exposes per-GPU hardware metrics, but wiring it into Prometheus and Grafana with production-quality dashboards requires significant manual configuration effort. Teams end up with inconsistent, one-off monitoring setups, or no GPU monitoring at all.
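
To give a sense of what DCGM Exporter exposes before any Prometheus wiring, the sketch below summarizes the DCGM_FI_DEV_GPU_UTIL samples from the exporter's plain-text metrics output. It assumes the exporter listens on its usual port 9400 and has been made reachable on localhost (for example with kubectl port-forward). The helper name gpu_util_summary is hypothetical.

```shell
# Read Prometheus text-format metrics on stdin and print one line per
# DCGM_FI_DEV_GPU_UTIL sample: the "gpu" label value and the sample value.
gpu_util_summary() {
  awk '/^DCGM_FI_DEV_GPU_UTIL[{]/ {
         if (match($0, /[{,]gpu="[^"]*"/)) {
           # Skip the 6-character prefix ({gpu=" or ,gpu=") and the closing quote.
           gpu = substr($0, RSTART + 6, RLENGTH - 7)
           printf "gpu=%s util=%s\n", gpu, $NF
         }
       }'
}

# Intended use once the exporter is reachable (assumed address, not run here):
#   curl -s http://localhost:9400/metrics | gpu_util_summary
```

This shows the raw signal only. Turning it into per-namespace, per-pod views with history is the integration work that the rest of the stack handles.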

What is the GPU Usage Monitor?

The GPU Usage Monitor is an open-source project that deploys a fully integrated GPU observability stack for Kubernetes. Rather than requiring SRE and platform teams to assemble and configure individual components, the GPU Usage Monitor integrates DCGM Exporter, kube-state-metrics, Prometheus, and Grafana into a single deployment, complete with pre-built dashboards designed specifically for GPU-accelerated workloads.

The design principle is operational simplicity. A single helm install command results in actionable GPU visibility within minutes, with no custom dashboard authoring or scrape configuration required.

GPU Usage Monitor architecture

The tool consists of four main components:

  • DCGM Exporter: Exposes NVIDIA GPU metrics (external – deployed via GPU Operator)
  • kube-state-metrics: Exposes Kubernetes pod and resource metrics
  • Prometheus: Collects and stores metrics from DCGM and kube-state-metrics
  • Grafana: Provides visualization through the GPU Usage Monitor Dashboard
Figure 1. GPU Usage Monitor architecture diagram

DCGM handles the hardware layer, and kube-state-metrics handles the Kubernetes layer. Prometheus and Grafana tie them together into a unified observability plane. Platform teams already understand each component well on its own; the value of the chart is the integration.

How to get started with the GPU Usage Monitor

The GPU Usage Monitor is open source under the Apache 2.0 license and available now on GitHub.

Prerequisites
Before installing, verify the following:

  • Kubernetes 1.19 or later
  • Helm 3.0 or later
  • DCGM Exporter running on GPU nodes
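
The version prerequisites can be checked with a small helper. This is a minimal sketch, assuming a sort that supports -V (GNU coreutils and recent BSD/macOS do); the function name version_ge is hypothetical, and the commented usage lines additionally assume jq is installed.

```shell
# Return success when version $1 is at least version $2.
# A leading "v" and any "+build" or "-suffix" part are dropped first.
version_ge() {
  have=$(printf '%s' "${1#v}" | sed 's/[+-].*//')
  want=$(printf '%s' "${2#v}" | sed 's/[+-].*//')
  lowest=$(printf '%s\n%s\n' "$have" "$want" | sort -V | head -n 1)
  [ "$lowest" = "$want" ]
}

# Intended use against a live cluster (not run here):
#   version_ge "$(kubectl version -o json | jq -r '.serverVersion.gitVersion')" 1.19
#   version_ge "$(helm version --template '{{.Version}}')" 3.0
#   kubectl get pods --all-namespaces | grep -i dcgm-exporter
```

The last command is only a loose check that some DCGM Exporter pods exist; the exact pod names and namespace depend on how the exporter was deployed.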

Installation
Deploying the full monitoring stack takes three commands.

# Update chart dependencies (run from the chart's root directory)
helm dependency update

# Install into a dedicated namespace
helm install gpu-usage-monitor . \
  --namespace gpu-usage-monitor \
  --create-namespace

# Forward Grafana to localhost
kubectl port-forward \
  -n gpu-usage-monitor \
  svc/gpu-usage-monitor-grafana 3000:80

Navigate to http://localhost:3000 and log in with the default credentials (admin / admin). For any environment beyond a local development cluster, update credentials through values.yaml before exposing the dashboard to wider teams.

What the dashboards surface

Once deployed, the pre-built Grafana dashboards give operators an immediate read on the state of GPU resources across the cluster.

Figure 2. The GPU Usage Monitor Grafana dashboard

Key insights you can get from the dashboard:

  • GPU allocation trends. Track which namespaces and workloads hold GPU allocations over time. Allocations that were made but never actively used are a direct signal for reclaiming idle capacity.
  • Compute utilization with thresholds. Per-GPU utilization percentages displayed against configurable thresholds. Set warning and critical bounds to get ahead of saturation before it degrades inference latency or training throughput.
  • Memory usage per workload. Real-time GPU memory consumption broken down by pod. This is the foundational signal for right-sizing resource requests: if a workload consistently consumes 12 GB on an 80 GB NVIDIA GPU, it doesn’t need a full GPU allocation.
  • Running and pending pod counts. A single-pane view of how many GPU-enabled Kubernetes pods are actively running versus stuck in Pending. A growing pending count is an early warning of scheduling pressure – visible before users notice anything is wrong.
  • GPU type filtering. Filter all metrics by NVIDIA GPU platform (Hopper, Blackwell, Blackwell Ultra, and others). Useful for heterogeneous fleets where GPU type affects what workloads are appropriate and what utilization numbers are expected.
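
The right-sizing arithmetic behind the memory example above is simple enough to sketch. The helper name mem_pct is hypothetical; in practice the inputs would come from per-pod memory figures such as DCGM's framebuffer metrics, and the units only need to match each other.

```shell
# Print used memory as a percentage of total, to one decimal place.
# Fails (non-zero exit, no output) when total is not positive.
mem_pct() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    if (total <= 0) exit 1
    printf "%.1f\n", 100 * used / total
  }'
}

mem_pct 12 80   # 12 GB used on an 80 GB GPU; prints 15.0
```

A workload that sits at 15% of GPU memory over a representative period is a candidate for a smaller allocation or for sharing a GPU. A single reading is not enough to decide, since peak usage may be much higher.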

Configuration

The Helm chart is designed to fit into existing infrastructure rather than replace it. Key configuration options in values.yaml fall into three areas.

  • External Prometheus integration: If an organization operates a managed or self-hosted Prometheus instance, the chart can be configured to ship GPU metrics to the existing stack instead of deploying a new Prometheus alongside it. This keeps metric retention, alerting rules, and data lifecycle management centralized.
  • Custom resource allocation: CPU and memory requests and limits for all chart components are configurable. Tune these values to fit the cluster’s resource budget, especially for the Prometheus instance if long-term metric retention is required.
  • Credential management: Override default Grafana credentials before any broader rollout. The chart exposes these through standard Helm values, making them straightforward to manage via existing secret management workflows.
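
As an illustration of the credential override, the sketch below generates a small values file. It rests on an assumption that should be confirmed against the chart's values.yaml: that the chart forwards a top-level grafana key to the upstream Grafana subchart, which reads adminUser and adminPassword. The function name grafana_values_override and the file name in the usage comment are hypothetical.

```shell
# Print a values override that sets Grafana admin credentials.
# ASSUMPTION: the chart forwards a top-level "grafana" key to the upstream
# Grafana subchart (adminUser / adminPassword). Confirm against values.yaml.
# Values are wrapped in double quotes, so they must not contain '"' or '\'.
grafana_values_override() {
  printf 'grafana:\n  adminUser: "%s"\n  adminPassword: "%s"\n' "$1" "$2"
}

# Intended use (not run here):
#   grafana_values_override admin "$(openssl rand -base64 24)" > grafana-credentials.yaml
#   helm upgrade gpu-usage-monitor . \
#     --namespace gpu-usage-monitor \
#     -f grafana-credentials.yaml
```

Keeping the override in a separate file makes it easier to source the password from an existing secret store instead of committing it alongside the chart.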

Whether managing a small GPU cluster for a single ML team or running a large-scale platform serving hundreds of workloads, complete GPU observability is a prerequisite for operating that infrastructure efficiently. The GPU Usage Monitor makes that observability accessible in minutes.

Learn more

Access the comprehensive Helm chart for monitoring GPU resources in Kubernetes clusters.

About the Authors

About Guy Saltoun
Guy Saltoun is the director of Solutions Engineering at Run:ai. He manages a global team responsible for installations, implementations, and training sessions of cloud-native AI solutions.
