Iris Coleman
Oct 14, 2025 16:42
NVIDIA introduces Coherent Driver-based Memory Management (CDMM) to improve GPU memory control on hardware-coherent platforms, addressing issues faced by developers and cluster administrators.
NVIDIA has launched a new memory management mode, Coherent Driver-based Memory Management (CDMM), designed to improve the control and performance of GPU memory on hardware-coherent platforms such as GH200, GB200, and GB300. This development aims to address the challenges posed by non-uniform memory access (NUMA), which can lead to inconsistent system performance when applications aren't fully NUMA-aware, according to NVIDIA.
NUMA vs. CDMM
NUMA mode, the current default for NVIDIA drivers on hardware-coherent platforms, exposes both CPU and GPU memory to the operating system (OS). This setup allows memory to be allocated through standard Linux and CUDA APIs and enables dynamic memory migration between CPU and GPU. However, it also means GPU memory can be treated as a generic pool, potentially hurting application performance.
In contrast, CDMM mode prevents GPU memory from being exposed to the OS as a software NUMA node. Instead, the NVIDIA driver directly manages GPU memory, providing more precise control and potentially boosting application performance. This approach is similar to the PCIe-attached GPU model, where GPU memory remains distinct from system memory.
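The contrast is easiest to see from the CUDA side. Below is a minimal sketch, assuming a hardware-coherent platform such as GH200 (the kernel name and sizes are illustrative, not from NVIDIA's announcement): plain malloc'd memory that the GPU dereferences directly, next to driver-managed memory from cudaMalloc, which is the model CDMM extends to all GPU memory.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void incr(int *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

int main() {
    const int n = 1 << 20;

    // System-allocated memory: on hardware-coherent platforms the GPU can
    // dereference ordinary malloc'd pointers. In NUMA mode these pages may
    // be migrated between CPU and GPU memory; in CDMM mode they stay in
    // system memory and are accessed over the NVLink-C2C interconnect.
    int *sys = (int *)malloc(n * sizeof(int));
    for (int i = 0; i < n; ++i) sys[i] = i;
    incr<<<(n + 255) / 256, 256>>>(sys, n);
    cudaDeviceSynchronize();

    // Driver-managed GPU memory: cudaMalloc returns device memory that the
    // NVIDIA driver controls in both modes, akin to the PCIe-attached model.
    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    incr<<<(n + 255) / 256, 256>>>(dev, n);
    cudaDeviceSynchronize();

    cudaFree(dev);
    free(sys);
    return 0;
}
```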
Implications for Kubernetes
The introduction of CDMM is particularly significant for Kubernetes, a widely used platform for managing large GPU clusters. In NUMA mode, Kubernetes may encounter unexpected behaviors, such as memory over-reporting and incorrect application of pod memory limits, which can lead to performance issues and application failures. CDMM mode helps mitigate these issues by ensuring better isolation and control over GPU memory.
Impact on Developers and System Administrators
For CUDA developers, CDMM mode changes how system-allocated memory is handled. While the GPU can still access system-allocated memory across the NVLink chip-to-chip (C2C) connection, memory pages will not migrate as they might in NUMA mode. This change requires developers to adapt their memory management strategies to fully leverage CDMM.
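Since migration behavior differs between the modes, a reasonable first step for a developer is to query how the device reaches system-allocated memory. This sketch uses two standard CUDA device attributes; interpreting them as a CDMM-versus-NUMA signal is an assumption here, not something NVIDIA's announcement spells out.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int dev = 0, pageable = 0, hostTables = 0;

    // pageable == 1: the GPU can access system-allocated (malloc) memory.
    cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, dev);

    // hostTables == 1: those accesses go through the host's page tables,
    // i.e. coherent access in place rather than page migration.
    cudaDeviceGetAttribute(&hostTables,
                           cudaDevAttrPageableMemoryAccessUsesHostPageTables,
                           dev);

    printf("pageable access: %d, via host page tables: %d\n",
           pageable, hostTables);
    return 0;
}
```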
System administrators will find that tools like numactl or mbind are ineffective for GPU memory management in CDMM mode, since GPU memory is not presented to the OS. However, these tools can still be used to manage system memory, as sketched below.
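Here is a minimal libnuma sketch of that split (link with -lnuma; the node number and allocation size are illustrative assumptions): NUMA policies keep working for system memory, while GPU memory is simply absent from the OS's node list in CDMM mode.

```cuda
#include <numa.h>   // libnuma; build with: nvcc example.cu -lnuma
#include <cstdio>

int main() {
    if (numa_available() < 0) {
        printf("libnuma is not available on this system\n");
        return 1;
    }

    // In NUMA mode, GPU memory shows up as extra software NUMA nodes and
    // is counted here; in CDMM mode only the CPU memory nodes are visible.
    printf("highest NUMA node visible to the OS: %d\n", numa_max_node());

    // Bind a 1 MiB system-memory allocation to CPU node 0. In CDMM mode,
    // policies like this apply only to system memory; GPU memory is managed
    // by the NVIDIA driver and cannot be targeted with numactl/mbind.
    void *buf = numa_alloc_onnode(1 << 20, 0);
    if (buf) numa_free(buf, 1 << 20);
    return 0;
}
```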
Guidelines for Choosing Between CDMM and NUMA
When deciding between CDMM and NUMA modes, consider the specific memory management needs of your applications. NUMA mode suits applications that rely on the OS to manage combined CPU and GPU memory. CDMM mode, by contrast, is ideal for applications that require direct control of GPU memory, bypassing the OS for better performance and control.
Ultimately, CDMM mode gives developers and administrators the ability to harness the full potential of NVIDIA's hardware-coherent memory architectures, optimizing performance for GPU-accelerated workloads. For those using platforms like GH200, GB200, or GB300, enabling CDMM mode could provide significant benefits, especially in Kubernetes environments.
Image source: Shutterstock