Zach Anderson
Mar 11, 2026 22:27
NVIDIA releases the 120B-parameter Nemotron 3 Super with 5x throughput gains for agentic AI. Major enterprises including Siemens and Palantir are already deploying it.
NVIDIA released its Nemotron 3 Super model on March 11, 2026. The 120-billion-parameter open-source AI system delivers 5x higher throughput than its predecessor, according to the company. The launch coincided with NVDA stock trading at $185.49, up 0.40% on the day, as the company pushes deeper into the enterprise AI agent market.
The model tackles two problems plaguing multi-agent AI deployments: context explosion and what NVIDIA calls the "thinking tax." Multi-agent workflows generate up to 15x more tokens than standard chatbots because every interaction requires resending full conversation histories, tool outputs, and reasoning chains. That gets expensive fast.
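To see why resending histories gets expensive, consider a toy model that assumes each turn adds a fixed number of new tokens (a hypothetical simplification). If every turn resends everything so far, total input tokens grow roughly quadratically with the number of turns. Sending only new content would grow linearly. The function names and numbers are illustrative and are not meant to reproduce NVIDIA's 15x figure.

```python
def cumulative_prompt_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens processed when every turn resends the full history.

    Hypothetical toy model: each turn appends a fixed number of new tokens,
    so turn k sends k * tokens_per_turn tokens.
    """
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


def incremental_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens if only the new content were sent each turn."""
    return tokens_per_turn * turns


if __name__ == "__main__":
    per_turn, turns = 2_000, 20  # illustrative values, not from NVIDIA
    resent = cumulative_prompt_tokens(per_turn, turns)
    fresh = incremental_tokens(per_turn, turns)
    print(resent, fresh, resent / fresh)
```

With 2,000 new tokens per turn over 20 turns, resending the full history processes 420,000 input tokens, compared with 40,000 for new content alone. That 10.5x gap widens as workflows get longer.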
Nemotron 3 Super's answer is a 1-million-token context window that lets agents hold entire workflow states in memory. In practice, a software development agent can load an entire codebase at once. Financial analysts can process thousands of pages of reports without re-reasoning across fragmented conversations.
Architecture Choices Matter
The hybrid mixture-of-experts design keeps only 12 billion of its 120 billion parameters active during inference. NVIDIA introduced a technique called Latent MoE that activates four experts for the computational cost of one. The company claims that combining this with multi-token prediction, which generates several tokens at once, yields 3x faster inference.
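The announcement doesn't explain how Latent MoE works internally. The sketch here shows only the generic top-k routing idea behind mixture-of-experts layers: a router scores every expert, only the k best-scoring experts run, and the rest sit idle for that token. This is why a model can hold 120 billion parameters while activating about 10% of them. The function names, toy experts, and router scores are hypothetical, and this is not NVIDIA's implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of router scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x: Vector, router_logits: Sequence[float],
              experts: Sequence[Expert], k: int) -> Vector:
    """Generic top-k mixture-of-experts layer (illustrative, not Latent MoE).

    Only the k experts with the highest router scores are evaluated; the
    rest are never called, so their parameters stay idle for this input.
    """
    # Indices of the k best-scoring experts (ties keep original order).
    top = sorted(range(len(experts)), key=lambda i: router_logits[i],
                 reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only the selected experts run
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Four hypothetical toy "experts" that just scale their input.
    experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
    print(top_k_moe([1.0, 2.0], [0.0, 5.0, 5.0, -1.0], experts, k=2))  # [2.5, 5.0]

    # Article's figures: 12B active out of 120B total parameters.
    print(f"active fraction: {12e9 / 120e9:.0%}")  # active fraction: 10%
```

In this example, the router picks the experts that scale by 2 and 3 with equal weight, so the output is their average, 2.5x the input. The other two experts are never evaluated.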
NVIDIA's benchmarks show that inference on Blackwell hardware at NVFP4 precision runs up to 4x faster than FP8 on the previous-generation Hopper, with no accuracy loss.
Enterprise Adoption Already Underway
The launch announcement reads like a customer list. Perplexity offers users access to the model for search and as part of its 20-model orchestration system. Software development platforms CodeRabbit, Factory, and Greptile are integrating it into their AI coding agents.
Heavier industrial applications are coming from Siemens, Dassault Systèmes, and Cadence, which are using it for manufacturing and semiconductor design automation. Palantir and Amdocs are deploying it for cybersecurity and telecom workflows, respectively.
The model is available on Google Cloud's Vertex AI and Oracle Cloud Infrastructure, and Amazon Bedrock and Microsoft Azure support is coming soon. Inference providers including Fireworks AI, DeepInfra, and Cloudflare are already serving the model.
Open Supply Play
NVIDIA released the model with open weights under a permissive license, along with more than 10 trillion tokens of training data and 15 reinforcement learning environments. That is a significant departure from the closed-model approach that dominates frontier AI development.
The model topped the Artificial Analysis efficiency leaderboard. It also powered NVIDIA's AI-Q research agent to first place on both DeepResearch Bench leaderboards, which measure multi-step research across large document sets.
For investors watching NVIDIA, a company with a $4.51 trillion market cap, Nemotron 3 Super represents another push to make its hardware indispensable for enterprise AI deployment. The real test will be whether these enterprise integrations translate into sustained Blackwell chip demand through 2026.
Image source: Shutterstock

