
Python package

gpu

Real-time GPU monitoring and diagnostic capabilities for NVIDIA and AMD graphics hardware.

The GPU diagnostics module provides tools for monitoring graphics processing unit (GPU) performance, memory usage, and utilization. It supports NVIDIA GPUs through NVML (NVIDIA Management Library) and AMD GPUs through ROCm SMI, exposing the same statistics through a single interface regardless of vendor. The API supports both synchronous queries, which return current metrics immediately, and asynchronous background collection, which samples metrics continuously during long-running inference sessions.
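Asynchronous background collection of this kind is commonly built on a sampling thread that runs alongside the workload. The block below is a minimal standard-library sketch of that pattern, assuming a caller-supplied sampling function. `PeriodicSampler`, `sample_fn`, and `interval_s` are hypothetical names and do not describe the actual interface of `BackgroundRecorder`.

```python
import threading
import time
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class PeriodicSampler(Generic[T]):
    """Hypothetical sketch: call `sample_fn` every `interval_s` seconds on a
    daemon thread until the context exits. Not the package's BackgroundRecorder API."""

    def __init__(self, sample_fn: Callable[[], T], interval_s: float = 0.1) -> None:
        self._sample_fn = sample_fn
        self._interval_s = interval_s
        self._samples: List[T] = []
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Take a sample first, then wait. Event.wait acts as an interruptible
        # sleep: it returns True as soon as the stop event is set.
        while True:
            value = self._sample_fn()
            with self._lock:
                self._samples.append(value)
            if self._stop.wait(self._interval_s):
                break

    def __enter__(self) -> "PeriodicSampler[T]":
        self._thread.start()
        return self

    def __exit__(self, *exc) -> None:
        # Signal the thread and wait for it, so samples are stable afterwards.
        self._stop.set()
        self._thread.join()

    @property
    def samples(self) -> List[T]:
        # Return a copy so callers never observe a list being appended to.
        with self._lock:
            return list(self._samples)


# Usage: a counter stands in for a real GPU query (e.g. an NVML call).
counter = iter(range(1_000_000))
with PeriodicSampler(lambda: next(counter), interval_s=0.01) as rec:
    time.sleep(0.1)  # stand-in for a long-running inference call
print(len(rec.samples), "samples collected")
```

Because the loop samples before it waits, at least one sample is recorded even if the monitored workload finishes immediately. The lock keeps reads from the main thread consistent while the sampler is still appending.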

Classes

  • BackgroundRecorder: Asynchronous GPU metrics collection.
  • GPUDiagContext: Context manager providing unified access to GPU diagnostic information across NVIDIA and AMD hardware.
  • GPUStats: Snapshot of a GPU's state, combining memory and utilization statistics.
  • MemoryStats: Detailed GPU memory usage statistics including total, free, used, and reserved memory.
  • UtilizationStats: GPU compute and memory activity utilization percentages.
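The following hypothetical sketch shows how a per-device snapshot, its memory and utilization parts, and a vendor-neutral context manager could fit together. It uses only the standard library. `MemoryInfo`, `UtilizationInfo`, `Snapshot`, `Backend`, `StaticBackend`, and `diag_context` are illustrative names, not the package's actual classes or signatures. A real backend would wrap NVML or ROCm SMI calls instead of returning fixed numbers.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class MemoryInfo:
    # Byte counts; field names follow the MemoryStats description above.
    total: int
    free: int
    used: int
    reserved: int

    @property
    def used_percent(self) -> float:
        return 100.0 * self.used / self.total if self.total else 0.0


@dataclass(frozen=True)
class UtilizationInfo:
    gpu_percent: float     # compute activity
    memory_percent: float  # memory read/write activity


@dataclass(frozen=True)
class Snapshot:
    device_index: int
    memory: MemoryInfo
    utilization: UtilizationInfo


class Backend(Protocol):
    # Hypothetical vendor interface; an NVIDIA or AMD implementation would
    # translate vendor-specific results into the same Snapshot type.
    def query(self, device_index: int) -> Snapshot: ...
    def close(self) -> None: ...


class StaticBackend:
    """Stand-in backend returning fixed numbers so the sketch runs without a GPU."""

    def __init__(self) -> None:
        self.closed = False

    def query(self, device_index: int) -> Snapshot:
        gib = 2**30
        return Snapshot(
            device_index,
            MemoryInfo(total=8 * gib, free=6 * gib, used=2 * gib, reserved=0),
            UtilizationInfo(gpu_percent=35.0, memory_percent=12.0),
        )

    def close(self) -> None:
        self.closed = True


@contextmanager
def diag_context(backend: Backend) -> Iterator[Backend]:
    # Ensures the backend is shut down even if the body raises.
    try:
        yield backend
    finally:
        backend.close()


backend = StaticBackend()
with diag_context(backend) as ctx:
    snap = ctx.query(0)
print(f"{snap.memory.used_percent:.1f}% memory used")  # prints "25.0% memory used"
```

Keeping the snapshot types vendor-neutral means code that consumes them does not need to know whether the numbers came from NVML or ROCm SMI. The context manager ties backend cleanup to scope exit.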