Tony Kim
May 22, 2026 00:35
NVIDIA GB200 NVL72 leverages Slurm’s topology-aware scheduling for efficient AI workloads, unlocking exascale performance.

NVIDIA’s GB200 NVL72, a rack-scale AI supercomputer, is now achieving optimized performance through topology-aware job scheduling with Slurm. This development is critical because AI models, particularly trillion-parameter large language models (LLMs), demand both unprecedented compute power and efficient resource allocation. The system, built on NVIDIA’s Blackwell architecture, delivers up to 130 terabytes per second (TB/s) of GPU communication bandwidth and supports training and inference for some of the most complex AI workloads.
The GB200 NVL72 integrates 72 NVIDIA Blackwell GPUs and 36 Grace CPUs in a single rack, interconnected via NVIDIA NVLink. According to NVIDIA, this setup not only supports large-scale training but also accelerates real-time inference, with over 1.5 million tokens per second for OpenAI GPT models. However, maximizing this performance in shared clusters requires strategic scheduling, as highlighted in NVIDIA’s collaboration with SchedMD to enhance Slurm’s topology-aware capabilities.
Why Scheduling Matters for Exascale Systems
AI workloads often run on shared clusters, where multiple jobs compete for resources. Without topology-aware scheduling, jobs may span NVLink domains inefficiently, leading to resource fragmentation and reduced performance. The newly introduced Slurm topology/block plugin aligns jobs with the physical network layout of the GB200 NVL72, preserving locality and minimizing fragmentation. This ensures that GPU resources are allocated in a way that maximizes bandwidth and compute efficiency.
For example, NVIDIA’s simulation of a 5,000-node GB200 NVL72 cluster showed that the new scheduling policies achieved GPU occupancy within 1% of a theoretical maximum while maintaining high job efficiency. The plugin also strategically placed smaller jobs to free up resources for larger AI training tasks, striking a balance between utilization and performance.
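The placement idea described above can be illustrated with a minimal Python sketch. It is not Slurm's actual algorithm: it uses a simple best-fit heuristic as a stand-in, and the block names, the `place_in_block` helper, and the figure of 18 nodes per rack are assumptions made for illustration.

```python
from typing import Dict, Optional

# Assumption: one NVL72 rack is treated as one block (NVLink domain) of 18 nodes.
NODES_PER_BLOCK = 18


def place_in_block(free: Dict[str, int], nodes_needed: int) -> Optional[str]:
    """Place a job entirely inside one block, using a best-fit heuristic.

    `free` maps a hypothetical block name to its number of idle nodes.
    Returns the chosen block name, or None if no single block can hold the job.
    """
    candidates = [(idle, name) for name, idle in free.items() if idle >= nodes_needed]
    if not candidates:
        return None
    # Best fit: choose the block with the fewest idle nodes that still fits,
    # so emptier blocks stay whole for larger jobs.
    idle, name = min(candidates)
    free[name] = idle - nodes_needed
    return name


free_nodes = {"rack0": NODES_PER_BLOCK, "rack1": 6, "rack2": 10}
small = place_in_block(free_nodes, 4)   # lands in the partly used rack1
large = place_in_block(free_nodes, 16)  # rack0 is still whole, so the job fits
print(small, large, free_nodes)
```

Because the small job is packed into an already fragmented rack, the untouched rack remains available for the 16-node job instead of being split across NVLink domains.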
Segment Sizing and Best Practices
One of the key features of the GB200 NVL72 system is its support for larger segment sizes. While previous systems like the NVIDIA HGX H100 were limited to a single-node segment size, the GB200 NVL72 can handle segments of up to 18 nodes. This flexibility allows operators to tailor segment sizes to specific workloads, such as using 16-node segments for high-bandwidth workloads like mixture-of-experts (MoE) training, or smaller segments for less demanding tasks.
In practice, NVIDIA recommends segment sizes that align with workload characteristics. For example, large jobs of 128 GPUs or more should use 16-node segments, while smaller jobs can be allocated to single-node segments. These configurations avoid over-constraining the scheduler and maintain high cluster utilization, even as job profiles evolve over time.
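The arithmetic behind these recommendations can be sketched in a few lines of Python. The sketch assumes 4 GPUs per node (72 GPUs spread over an 18-node rack) and rounds each job up to whole segments; the function names are hypothetical and the rounding rule is an illustrative choice, not a statement of how Slurm sizes allocations.

```python
import math
from typing import Tuple

GPUS_PER_NODE = 4        # assumption: 72 GPUs / 18 nodes per NVL72 rack
MAX_SEGMENT_NODES = 18   # largest segment size cited for the GB200 NVL72


def recommended_segment(gpus: int) -> int:
    """Rule of thumb from the text: 16-node segments at 128 GPUs or more, else 1."""
    return 16 if gpus >= 128 else 1


def segment_plan(gpus: int, segment_nodes: int) -> Tuple[int, int]:
    """Return (nodes, segments), rounding the node count up to whole segments."""
    if not 1 <= segment_nodes <= MAX_SEGMENT_NODES:
        raise ValueError("segment size must be between 1 and 18 nodes")
    nodes = math.ceil(gpus / GPUS_PER_NODE)
    segments = math.ceil(nodes / segment_nodes)
    return segments * segment_nodes, segments


for gpus in (8, 128, 130):
    size = recommended_segment(gpus)
    nodes, segments = segment_plan(gpus, size)
    print(f"{gpus} GPUs -> {segments} segment(s) of {size} node(s), {nodes} nodes total")
```

Under these assumptions, a 128-GPU job needs 32 nodes, which is exactly two 16-node segments, while a 130-GPU job would spill into a third segment.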
Market Context and Adoption
Commercial deployments of the GB200 NVL72 began ramping up in 2025, with systems priced between $2.8 million and $3.4 million per rack. As of March 2026, prices have reportedly climbed to as high as $8.8 million for fully configured systems, reflecting soaring demand for advanced AI infrastructure. NVIDIA’s data center revenue, which reached $39.1 billion in Q1 FY26, underscores the growing reliance on systems like the GB200 NVL72 for AI and HPC workloads.
For traders, NVIDIA’s stock (NASDAQ: NVDA) is currently trading at $221.42 with a market cap of $5.40 trillion. The company’s leadership in AI hardware, combined with innovations in software like Slurm’s topology-aware scheduling, positions it strongly in the rapidly expanding AI and HPC markets.
Looking Ahead
The GB200 NVL72 represents a significant leap forward in AI supercomputing, but its full potential hinges on efficient workload management. NVIDIA’s partnership with SchedMD to refine Slurm demonstrates how software can complement hardware to achieve exascale performance. For organizations deploying these systems, continuous monitoring and simulation-based testing of scheduling policies will be key to maintaining both high utilization and peak performance.
As AI models continue to grow in complexity, the GB200 NVL72 and similar architectures will likely become foundational to large-scale AI training and inference. With further advancements in scheduling algorithms and hardware integration, the era of exascale AI computing is just beginning.
Image source: Shutterstock
