Alvin Lang
Mar 18, 2025 20:51
NVIDIA introduces NeMo Curator, a GPU-accelerated streaming pipeline for efficient video processing on DGX Cloud, optimizing AI model development and reducing costs.
The arrival of physical AI has significantly increased video content generation, with a single autonomous vehicle producing over 1 TB of video per day, according to NVIDIA. To manage and utilize this massive volume of data efficiently, NVIDIA has introduced NeMo Curator, a GPU-accelerated streaming pipeline available on NVIDIA DGX Cloud.
Challenges with Traditional Processing
Traditional batch processing systems have struggled to keep pace with this exponential data growth, often leading to underutilized GPUs and increased costs. These systems accumulate large volumes of data before processing, which can introduce inefficiencies and latency into AI model development.
GPU-Accelerated Streaming Solution
To address these challenges, NeMo Curator introduces a flexible streaming pipeline that leverages GPU acceleration for large-scale video curation. The pipeline incorporates auto-scaling and load-balancing techniques to optimize throughput across its stages, maximizing hardware utilization and reducing total cost of ownership (TCO).
Optimized Throughput and Resource Utilization
The streaming approach pipes intermediate data directly between stages, reducing latency and improving efficiency. By separating CPU-intensive tasks from GPU-intensive ones, the system can better match the actual capacity of the available infrastructure, avoiding idle resources and keeping throughput balanced across stages.
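As a rough illustration of why this separation matters, the sketch below sizes a hypothetical CPU decode pool against a GPU embedding pool so that neither side sits idle. The per-clip timings and worker counts are invented for illustration and are not NeMo Curator measurements.

```python
import math

# Back-of-envelope pool sizing to balance a CPU-bound and a GPU-bound stage.
# The per-clip times below are hypothetical, not measured NeMo Curator numbers.
decode_s_per_clip = 2.0    # assumed CPU decode time per clip
embed_s_per_clip = 0.5     # assumed GPU embedding time per clip

gpu_workers = 8
# To keep the GPUs saturated, the CPU pool must match the GPU pool's throughput.
gpu_throughput = gpu_workers / embed_s_per_clip                 # clips per second
cpu_workers = math.ceil(gpu_throughput * decode_s_per_clip)     # decode workers needed

print(f"GPU throughput: {gpu_throughput:.1f} clips/s -> need ~{cpu_workers} CPU decode workers")
```

In a streaming pipeline with auto-scaling, this ratio is adjusted continuously as stage speeds vary, rather than fixed up front.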
Architecture and Implementation
The NeMo Curator pipeline, built on the Ray framework, is divided into multiple stages, from video decoding to embedding computation. Each stage uses a pool of Ray actors to process data in parallel, with an orchestration thread managing input and output queues to maintain optimal throughput. The system dynamically adjusts actor pool sizes to accommodate differing stage speeds, ensuring steady flow and efficiency.
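The following is a minimal, generic sketch of such a staged, queue-connected actor pipeline in Ray. The stage functions, pool sizes, and queue wiring are illustrative assumptions, not NVIDIA's actual NeMo Curator implementation.

```python
# Minimal sketch of a staged, actor-based streaming pipeline in Ray.
# Stage logic and pool sizes are placeholders; a real system would
# auto-scale the pools and run CPU and GPU stages on suitable hardware.
import ray
from ray.util.queue import Queue

ray.init(ignore_reinit_error=True)

@ray.remote
class StageWorker:
    """Pulls items from an input queue, processes them, pushes results downstream."""
    def __init__(self, fn, in_queue, out_queue):
        self.fn = fn
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            item = self.in_queue.get()
            if item is None:          # sentinel: shut this worker down
                break
            self.out_queue.put(self.fn(item))

def decode(clip):                     # stand-in for a CPU-bound decode stage
    return {"clip": clip, "frames": f"frames({clip})"}

def embed(decoded):                   # stand-in for a GPU-bound embedding stage
    return {"clip": decoded["clip"], "embedding": [0.0] * 8}

# Queues pipe intermediate data directly between stages instead of
# accumulating a full batch on disk between steps.
decode_q, embed_q, out_q = Queue(), Queue(), Queue()

# Separate pools per stage let CPU-heavy and GPU-heavy work scale independently.
decoders = [StageWorker.remote(decode, decode_q, embed_q) for _ in range(2)]
embedders = [StageWorker.remote(embed, embed_q, out_q) for _ in range(1)]
handles = [w.run.remote() for w in decoders + embedders]

clips = ["clip_000.mp4", "clip_001.mp4", "clip_002.mp4"]
for clip in clips:
    decode_q.put(clip)

print([out_q.get() for _ in clips])   # results stream out as they finish

# Shut down: one sentinel per worker in each pool.
for _ in decoders:
    decode_q.put(None)
for _ in embedders:
    embed_q.put(None)
ray.get(handles)
ray.shutdown()
```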
Performance and Future Prospects
Compared to traditional batch processing, the streaming pipeline achieves a 1.8x speedup, processing one hour of video per GPU in roughly 195 seconds. The NeMo Curator pipeline has shown an 89x performance improvement over the baseline, capable of processing around 1 million hours of 720p video on 2,000 H100 GPUs in a single day. NVIDIA continues to work with early access partners to refine the system further and expand its capabilities.
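Those two figures are roughly consistent with each other, as a quick back-of-envelope check using only the numbers reported above shows:

```python
# Sanity check of the reported throughput figures (article numbers, rounded).
seconds_per_day = 24 * 60 * 60
seconds_per_video_hour = 195          # ~195 s to process 1 hour of video per GPU
gpus = 2000

video_hours_per_gpu_per_day = seconds_per_day / seconds_per_video_hour   # ~443
total_video_hours_per_day = video_hours_per_gpu_per_day * gpus           # ~886,000

print(f"{total_video_hours_per_day:,.0f} video-hours/day")  # on the order of 1 million
```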
For more detailed insights, visit the NVIDIA blog.
Image source: Shutterstock