Ted Hisokawa
Jun 12, 2026 15:13
MiniMax M3, the 428B-parameter model, launches on NVIDIA infrastructure, offering long-context reasoning and multimodal workflows for enterprise AI.

MiniMax M3, a 428-billion-parameter AI model, is now available on NVIDIA's accelerated infrastructure, including its Blackwell GPUs. The model, launched by Shanghai-based MiniMax on June 1, 2026, aims to simplify enterprise AI workflows by combining long-context reasoning, multimodal capabilities, and agentic task optimization in a single system.
The standout feature of MiniMax M3 is its ability to process up to 1 million tokens in context, a large increase over most existing models. This allows extended coding sessions, complex legal document analysis, or long-form video understanding without breaking context. Additionally, the model supports native multimodal input (text, images, and video), eliminating the need for separate pipelines and reducing complexity for developers.
Architectural Advances: MiniMax Sparse Attention
At the heart of M3's performance is the new MiniMax Sparse Attention (MSA) architecture. Unlike traditional quadratic attention mechanisms, MSA uses a pre-filtering stage to focus only on relevant context blocks, dramatically improving speed and efficiency. According to MiniMax, this reduces computational costs to just 1/20th of those of its predecessor, MiniMax M2, for 1M-token contexts. Prefill speeds are reportedly 9 times faster, while decoding is 15 times faster compared to older sparse attention implementations.
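To make the pre-filtering idea concrete, here is a minimal, generic sketch of block-level sparse attention for a single query. It is not MiniMax's MSA implementation, whose details are not described here. The function name, the mean-key block score, and the `block_size` and `top_k` parameters are all assumptions chosen for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring key blocks.

    Illustrative assumption: a block is scored by the dot product of the
    query with the block's mean key. Real systems may score differently.
    """
    # Split key positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(positions):
        mean_key = [
            sum(keys[i][d] for i in positions) / len(positions)
            for d in range(len(query))
        ]
        return dot(query, mean_key)

    # Pre-filter: keep only the positions in the top_k blocks.
    ranked = sorted(blocks, key=block_score, reverse=True)
    selected = sorted(i for block in ranked[:top_k] for i in block)

    # Ordinary scaled dot-product attention, restricted to those positions.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in selected])
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected))
        for d in range(len(values[0]))
    ]
    return output, selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Eight keys in four blocks of two; only the third block aligns with the query.
    keys = [[0.0, 1.0]] * 4 + [[1.0, 0.0]] * 2 + [[0.0, 1.0]] * 2
    values = [[float(i)] for i in range(8)]
    output, selected = block_sparse_attention(query, keys, values, block_size=2, top_k=1)
    print("attended positions:", selected)
    print("output:", output)
```

With a fixed `top_k`, the per-query attention work scales with `top_k * block_size` plus one score per block, rather than with the full context length. That is the general reason this family of methods is cheaper than dense attention on long inputs.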
The model is also trained natively across text, images, and video from the ground up, with no need for post-training multimodality hacks, a key differentiator in the frontier model space.
Enterprise Deployment and Customization
MiniMax M3 can be deployed using popular open-source inference engines like NVIDIA TensorRT LLM, SGLang, and vLLM. NVIDIA has integrated the model into its Dynamo distributed inference platform, which boosts performance for long-sequence workloads by separating prefill and decode tasks across GPUs. This approach reportedly delivers a 4x improvement in interactivity at 32k-token input sequence lengths on NVIDIA Blackwell hardware.
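The split between prefill and decode can be pictured with a toy, single-process simulation. This is a conceptual sketch only and does not use or resemble Dynamo's actual API. The `Request` class, the worker functions, and the placeholder tokens are all hypothetical, and the "cache" is simply a list standing in for a real key-value cache.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    request_id: str
    prompt_tokens: list
    max_new_tokens: int
    kv_cache: list = field(default_factory=list)
    output_tokens: list = field(default_factory=list)


def prefill_worker(request):
    """Process the whole prompt in one pass and build the cache (stand-in: copy tokens)."""
    request.kv_cache = list(request.prompt_tokens)
    return request


def decode_worker(request):
    """Generate tokens one at a time, extending the cache at each step."""
    for step in range(request.max_new_tokens):
        next_token = f"tok{step}"  # placeholder for a token sampled from a real model
        request.output_tokens.append(next_token)
        request.kv_cache.append(next_token)
    return request


def serve(requests):
    """Run prefill and decode as separate stages joined by a handoff queue."""
    handoff = deque()
    for request in requests:
        handoff.append(prefill_worker(request))  # stage 1: prefill pool
    finished = []
    while handoff:
        finished.append(decode_worker(handoff.popleft()))  # stage 2: decode pool
    return finished


if __name__ == "__main__":
    done = serve([Request("r1", ["a", "b", "c"], max_new_tokens=2)])
    print(done[0].output_tokens, len(done[0].kv_cache))
```

In a real deployment the two stages would run on different GPUs and the cache would be transferred between them. The point of the sketch is only that the compute-heavy prompt pass and the step-by-step generation loop are distinct phases that can be scheduled independently.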
For those looking to customize M3, NVIDIA's NeMo Framework offers robust tools for fine-tuning, including support for sequence lengths up to 128k tokens. Developers can also perform reinforcement learning with the model to optimize it for specific applications like agent-based workflows or document parsing.
Competitive Market Position
MiniMax M3 is entering a crowded AI model market but aims to differentiate itself through its technical capabilities and open-weight approach. On coding benchmarks, MiniMax claims a 59.0% score on SWE-Bench Pro, narrowly outperforming GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%). While these results are company-reported, they position M3 as a leading contender in the coding and multimodal AI space.
Crucially, the model undercuts many closed-source competitors on cost, with pricing reported at $0.60 per million input tokens at launch. This aggressive pricing strategy targets cost-sensitive enterprises deploying large-scale AI workflows.
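As a rough illustration of what the reported input price means per request, the sketch below multiplies a token count by the $0.60 per million figure. The function name is hypothetical, output-token pricing is not given in this article and is left out, and actual billing may differ.

```python
# Launch price for input tokens, as reported in this article.
REPORTED_INPUT_PRICE_PER_MILLION_USD = 0.60


def estimate_input_cost_usd(input_tokens, price_per_million=REPORTED_INPUT_PRICE_PER_MILLION_USD):
    """Estimate the input-side cost only; output-token pricing is not covered."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return input_tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    for tokens in (32_000, 128_000, 1_000_000):
        print(f"{tokens:>9,} input tokens -> ${estimate_input_cost_usd(tokens):.4f}")
```

At the reported rate, a prompt that fills the full 1-million-token context would cost about $0.60 in input tokens.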
What's Next?
Developers can start working with MiniMax M3 immediately via NVIDIA's GPU-accelerated API or by downloading model weights from Hugging Face. With its open-weight design, the model is expected to see wide adoption in domains like legal tech, autonomous systems, and multimodal content generation.
While the AI world will be watching closely to verify MiniMax's claims on efficiency and benchmarks, the model's technical innovations and cost structure make it a compelling option for enterprises looking to streamline complex workflows.
Image source: Shutterstock
