Terrill Dicki
May 13, 2026 17:28
NVIDIA’s XANI workflow slashes nanoscale imaging data analysis from 9 months to under 4 hours using Grace Blackwell Superchips.

NVIDIA has unveiled a major breakthrough in nanoscale imaging with its Accelerated X-ray Analysis for Nanoscale Imaging (XANI) workflow. Using its Grace Blackwell Superchips, the company has cut data processing time for X-ray free-electron laser (XFEL) facilities from 9 months to under 4 hours, an improvement of over 1,000x.
XFEL facilities, such as LCLS-II in the U.S. and European XFEL in Germany, generate massive datasets while probing the atomic and electronic dynamics of advanced materials like semiconductors, batteries, and catalysts. These facilities produce up to 1 million X-ray pulses per second, capturing structural shifts at the atomic level in real time. However, analyzing the resulting terabytes of multidimensional data has traditionally been a computational bottleneck.
NVIDIA’s XANI solution leverages the GB200 Grace Blackwell Superchips to accelerate this process. By combining GPU-based processing with CUDA Python and distributed computing, the team compressed the analysis of 42 terabytes of data to under 4 hours while maintaining precision. This is a stark contrast to traditional CPU-bound workflows, which often process just 10% of a dataset during experiments.
Key Innovations in XANI
Several technical advancements underpin XANI’s performance:
- GPU Acceleration: XANI achieved a 43x speedup on a single GPU and a 1,000x boost on 64 GPUs compared to earlier CPU-based methods.
- cuPyNumeric Libraries: New libraries, like LMFIT and multithreaded HDF5, improved GPU utilization and enabled 165x faster I/O throughput.
- GPUDirect Storage (GDS): By directly loading data into GPU memory, XANI bypasses CPU bottlenecks, enabling read speeds of up to 700 GB/s across 16 Grace Blackwell nodes.
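As a rough sanity check, the figures quoted above can be worked through in a few lines of Python. The 30-day month, the decimal units (1 TB = 10^12 bytes), and the idea of streaming the full 42 TB dataset at the peak 700 GB/s rate are assumptions made here for illustration; the article does not state them.

```python
# Back-of-envelope checks on the quoted figures.
# Assumptions (not from the article): 30-day months, decimal units,
# and a sustained peak read rate over the whole dataset.
HOURS_BEFORE = 9 * 30 * 24      # "9 months" expressed in hours
HOURS_AFTER = 4                 # "under 4 hours" (upper bound)
DATASET_BYTES = 42e12           # 42 TB
PEAK_READ_BYTES_PER_S = 700e9   # 700 GB/s across all nodes
NODES = 16

# End-to-end speedup implied by "9 months to under 4 hours".
speedup = HOURS_BEFORE / HOURS_AFTER

# Average rate needed to get through 42 TB in 4 hours.
effective_gb_per_s = DATASET_BYTES / (HOURS_AFTER * 3600) / 1e9

# Time to read the whole dataset once at the peak GDS rate.
read_seconds = DATASET_BYTES / PEAK_READ_BYTES_PER_S

# Share of the peak read rate handled by each node.
per_node_gb_per_s = PEAK_READ_BYTES_PER_S / NODES / 1e9

# Gain from 1 GPU (43x) to 64 GPUs (roughly 1,000x), both rounded figures.
multi_gpu_gain = 1000 / 43

print(f"speedup: about {speedup:.0f}x")
print(f"effective throughput: about {effective_gb_per_s:.2f} GB/s")
print(f"full read at peak rate: about {read_seconds:.0f} s")
print(f"per-node read rate: about {per_node_gb_per_s:.2f} GB/s")
print(f"64-GPU gain over one GPU: about {multi_gpu_gain:.1f}x")
```

Under these assumptions the end-to-end speedup comes out to about 1,620x, which is consistent with "over 1,000x", and a single pass over 42 TB would take about a minute if the peak read rate were sustained.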
The workflow also introduces a distributed memory architecture that simplifies scientific computing. By swapping NumPy imports for cuPyNumeric, researchers can automatically parallelize operations across clusters without writing complex MPI code. This makes XANI accessible to fields beyond physics, including materials chemistry and quantum computing.
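The import swap described above can be sketched as follows. This is a minimal illustration, not code from XANI: the function name `mean_subtracted_power` and the toy data are hypothetical, and the fallback to plain NumPy is added here only so the sketch also runs on a machine without cuPyNumeric installed.

```python
# Sketch of the "swap the import" pattern. Only the import line changes;
# the array code below is written against the NumPy API either way.
try:
    import cupynumeric as np  # NVIDIA's NumPy-compatible library
except ImportError:
    import numpy as np        # fallback so the sketch runs anywhere


def mean_subtracted_power(frames):
    """Hypothetical helper: subtract the per-pixel mean across frames,
    then sum the squared residual over each frame's pixels."""
    residual = frames - frames.mean(axis=0)
    return (residual ** 2).sum(axis=(1, 2))


# Toy stand-in for detector data: 2 frames of 3x4 pixels.
frames = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
power = mean_subtracted_power(frames)
print(power)
```

The point of the pattern is that the analysis function contains no explicit communication or device code; whether the work is spread across GPUs and nodes is decided by which library backs the `np` name and how the script is launched.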
Scaling for Next-Gen Research
The XANI architecture is designed for scalability. With its GPU-centric distributed model, scientists can now analyze data in real time, providing live feedback during experiments. This capability could redefine how XFEL facilities operate, reducing delays between data collection and actionable insights.
Thanks to advances in nonlinear least-squares algorithms and batched GPU computation, XANI can process high-resolution imaging data down to the pixel level. The workflow’s ability to fit damped oscillations to detector data in parallel ensures faster and more precise results than ever before.
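To make the idea of fitting damped oscillations for many pixels at once more concrete, here is a minimal sketch of a batched Levenberg-Marquardt fit in plain NumPy. It is not XANI's implementation: the model form A·exp(-γt)·cos(ωt + φ), the function names, the damping schedule, and the synthetic data are all assumptions chosen for illustration, and the code runs on the CPU.

```python
import numpy as np


def model(t, p):
    """Damped cosine for every pixel. p has shape (P, 4) with columns
    A, gamma, omega, phi; t has shape (T,). Returns shape (P, T)."""
    A, g, w, phi = (p[:, i:i + 1] for i in range(4))
    return A * np.exp(-g * t) * np.cos(w * t + phi)


def jacobian(t, p):
    """Analytic derivatives of the model, shape (P, T, 4)."""
    A, g, w, phi = (p[:, i:i + 1] for i in range(4))
    env = np.exp(-g * t)
    c = np.cos(w * t + phi)
    s = np.sin(w * t + phi)
    return np.stack(
        [env * c, -t * A * env * c, -t * A * env * s, -A * env * s],
        axis=-1,
    )


def fit_batched(t, y, p0, n_iter=50, lam0=1e-3):
    """Levenberg-Marquardt on all pixels at once: every iteration solves
    one small 4x4 system per pixel in a single batched call."""
    p = np.array(p0, dtype=float)
    lam = np.full(p.shape[0], lam0)
    r = y - model(t, p)
    cost = np.sum(r ** 2, axis=1)
    eye = np.eye(4)
    for _ in range(n_iter):
        J = jacobian(t, p)
        JtJ = np.einsum("pti,ptj->pij", J, J)   # (P, 4, 4)
        Jtr = np.einsum("pti,pt->pi", J, r)     # (P, 4)
        # Marquardt damping: scale the diagonal of each normal matrix.
        diag = np.diagonal(JtJ, axis1=1, axis2=2)
        damp = lam[:, None, None] * eye * diag[:, None, :] + 1e-12 * eye
        step = np.linalg.solve(JtJ + damp, Jtr[..., None])[..., 0]
        p_new = p + step
        r_new = y - model(t, p_new)
        cost_new = np.sum(r_new ** 2, axis=1)
        # Accept the step only for pixels whose residual improved.
        better = cost_new < cost
        p[better] = p_new[better]
        r[better] = r_new[better]
        cost[better] = cost_new[better]
        lam = np.where(better, lam * 0.3, lam * 10.0)
    return p


# Synthetic, noise-free example with three "pixels".
t = np.linspace(0.0, 10.0, 200)
p_true = np.array([
    [1.0, 0.3, 2.0, 0.1],
    [2.0, 0.2, 3.0, -0.4],
    [0.5, 0.5, 1.5, 0.8],
])
y = model(t, p_true)
p0 = p_true + np.array([0.1, 0.03, 0.03, 0.05])  # nearby starting guess
p_fit = fit_batched(t, y, p0)
print("max abs parameter error:", np.abs(p_fit - p_true).max())
```

Because each pixel's fit is independent, the per-iteration work reduces to a handful of array operations over the whole batch, which is the kind of workload that maps well onto GPUs. Real detector data would also need noise handling and sensible initial guesses, which this sketch leaves out.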
Implications for Scientific Discovery
NVIDIA’s XANI workflow represents a paradigm shift for high-performance computing in scientific research. By reducing analysis times from months to hours, it accelerates discoveries in materials science, quantum physics, and beyond. XFEL facilities worldwide now stand to benefit from these efficiencies, unlocking new opportunities for real-time experimentation.
For researchers, the implications are clear: advanced GPU-based systems like Grace Blackwell Superchips are becoming indispensable tools in tackling the data challenges of modern science.
Image source: Shutterstock
