Briefly
- DeepSeek launched its new V4-Professional mannequin with 1.6 trillion parameters.
- It prices $1.74/$3.48 per million enter/output tokens, roughly 1/twentieth the value of Claude Opus 4.7 and 98% lower than GPT 5.5 Professional.
- DeepSeek educated V4 partly on Huawei Ascend chips, circumventing U.S. export restrictions, and says that after 950 new supernodes come on-line later in 2026, the Professional mannequin’s already-low value will drop additional.
DeepSeek is again, and it confirmed up just a few hours after OpenAI dropped GPT-5.5. Coincidence? Possibly. However in case you’re a Chinese language AI lab that the U.S. authorities has been attempting to decelerate with chip export bans for the previous three years, your sense of timing will get fairly sharp.
The Hangzhou-based lab launched preview variations of DeepSeek-V4-Professional and DeepSeek-V4-Flash at this time, each open-weight, each with a million token context home windows. Which means you’ll be able to principally work with a context roughly the scale of the Lord of the Rings Trilogy earlier than the mannequin collapses. Each are additionally priced nicely beneath something comparable within the West, and each are free for these able to operating regionally.
DeepSeek’s final main disruption—R1 in January 2025—wiped $600 billion from Nvidia’s market cap in a single day as investor questioned whether or not American firms actually wanted such large investments to provide outcomes {that a} small chinese language lab achieved with a fraction of the price. V4 is a distinct sort of transfer: quieter, extra technical, and extra centered on effectivity for anybody truly constructing with AI.
Two fashions, very completely different jobs
Of the 2 new fashions, DeepSeek’s V4-Professional is the massive one, with 1.6 trillion whole parameters. To place that in perspective, parameters are the interior “settings” or “mind cells” {that a} mannequin makes use of to retailer information and acknowledge patterns—the extra parameters a mannequin has, the extra advanced data it may well theoretically maintain. That makes it the largest open-source mannequin within the LLM market so far. The dimensions could sound ridiculous till you be taught it solely prompts 49 billion of them per inference go.
That is the Combination-of-Consultants trick DeepSeek has refined since V3: The complete mannequin sits there, however solely the related slice of it wakes up for any given request. Extra information, identical compute invoice.
“DeepSeek-V4-Professional-Max, the utmost reasoning effort mode of DeepSeek-V4-Professional, considerably advances the information capabilities of open-source fashions, firmly establishing itself as the most effective open-source mannequin out there at this time,” Deepseek wrote within the mannequin’s official card on Huggingface. “It achieves top-tier efficiency in coding benchmarks and considerably bridges the hole with main closed-source fashions on reasoning and agentic duties.”
V4-Flash is the sensible one: 284 billion whole parameters, 13 billion lively. It’s designed to be sooner, cheaper, and in line with DeepSeek’s personal benchmarks, “achieves comparable reasoning efficiency to the Professional model when given a bigger considering finances.”
Each help a million tokens of context. That is roughly 750,000 phrases—roughly your complete “Lord of the Rings” trilogy plus change. And that’s as a regular function, not a premium tier.
Deepseek’s (not so) secret sauce: Making consideration not horrible at scale
This is the technical half for nerds or these within the magic powering the mannequin. Deepseek doesn’t disguise its secrets and techniques, and all the things is obtainable at no cost—the total paper is obtainable on Github.
Customary AI consideration—the mechanism that lets a mannequin perceive relationships between phrases—has a brutal scaling downside. Each time you double the context size, the compute price roughly quadruples. So operating a mannequin on 1,000,000 tokens is not simply twice as costly as 500,000 tokens. It is 4 occasions as costly. Because of this lengthy context has traditionally been a checkbox labs add after which silently throttle behind price limits.
DeepSeek invented two new consideration sorts to get round this. The primary, Compressed Sparse Consideration, works in two steps. It first compresses teams of tokens—say, each 4 tokens—right into a single entry. Then, as an alternative of attending to all of these compressed entries, it makes use of a “Lightning Indexer” to select solely probably the most related outcomes for any given question. Your mannequin goes from attending to 1,000,000 tokens to attending to a a lot smaller set of a very powerful chunks, sort of like a librarian who would not learn each guide however is aware of precisely which shelf to test.
The second, Closely Compressed Consideration, is extra aggressive. It collapses each 128 tokens right into a single entry—no sparse choice, simply brutal compression. You lose fine-grained element, however you get a particularly low-cost world view. The 2 consideration sorts run in alternating layers, so the mannequin will get each the element and the overview.

The consequence, from the technical paper: At a million tokens, V4-Professional makes use of 27% of the compute its predecessor (V3.2) wanted. KV cache—the reminiscence the mannequin wants to trace context—drops to only 10% of V3.2. V4-Flash pushes that additional: 10% of compute, 7% of reminiscence.
And this ended up with Deepseek having the ability to supply a less expensive value per token than its opponents, whereas offering comparable outcomes. To place that in greenback phrases: GPT-5.5 launched yesterday at $5 enter and $30 output per million tokens with GPT-5.5 Professional priced at $30 per million enter tokens and $180 per million output tokens.
Deepseek V4-Professional is $1.74 enter and $3.48 output. V4-Flash is $0.14 enter and $0.28 output. Cline CEO Saoud Rizwan identified that if Uber had used DeepSeek as an alternative of Claude, its 2026 AI finances—reportedly sufficient for 4 months of utilization—would have lasted seven years.
deepseek v4 is now the most cost effective sota mannequin out there at 1/twentieth the price of opus 4.7.
for perspective, if uber used deepseek as an alternative of claude their 2026 ai finances would have lasted 7 years as an alternative of solely 4 months. pic.twitter.com/i9rJZzvRBV
— Saoud Rizwan (@sdrzn) April 24, 2026
The benchmarks
DeepSeek does one thing uncommon in its technical report: It publishes the gaps. Most mannequin releases cherry-pick the benchmarks the place they win. DeepSeek ran the total comparability towards GPT-5.4 and Gemini-3.1-Professional, discovered that V4-Professional’s reasoning lags behind these fashions by about three to 6 months, and printed it anyway.
The place V4-Professional-Max truly wins: Codeforces, aggressive programming benchmark, rated like human chess. V4-Professional scored 3,206, inserting it round twenty third amongst precise human contest individuals. On Apex Shortlist, a curated set of arduous math and STEM issues, it scored a go price and hit 90.2% versus Opus 4.6’s 85.9% and GPT-5.4’s 78.1%. On SWE-Verified, which measures whether or not a mannequin can resolve actual GitHub points pulled from precise open-source repositories, it scored 80.6%—matching Claude Opus 4.6.

The place it trails: multitasking benchmark MMLU-Professional (Gemini-3.1-Professional at 91.0% vs V4-Professional at 87.5%), skilled information benchmark GPQA Diamond (Gemini 94.3 vs V4-Professional 90.1), and Humanity’s Final Examination, a graduate-level benchmark the place Gemini-3.1-Professional’s 44.4% nonetheless beats V4-Professional’s 37.7%.
On lengthy context particularly, V4-Professional leads open-source fashions and beats Gemini-3.1-Professional on the CorpusQA benchmark (a take a look at simulating actual doc evaluation at a million tokens), however loses to Claude Opus 4.6 on MRCR—a take a look at measuring how nicely a mannequin retrieves particular needles buried deep in a really lengthy haystack.
Constructed to run brokers, not simply reply questions
The agentic stuff is the place this launch will get attention-grabbing for builders truly transport merchandise.
V4-Professional can run in Claude Code, OpenCode, and different AI coding instruments. Based on DeepSeek’s inside survey of 85 builders who used V4-Professional as their major coding agent, 52% stated it was able to be their default mannequin, 39% leaned towards sure, and fewer than 9% stated no. Inner workers stated it outperforms Claude Sonnet and approaches Claude Opus 4.5 on agentic coding duties.

Synthetic Evaluation, which runs impartial evaluations of AI fashions on real-world duties, ranked V4-Professional first amongst all open-weight fashions on GDPval-AA—a benchmark testing economically precious information work throughout finance, authorized, and analysis duties, scored by way of Elo. V4-Professional-Max scored 1,554 Elo, forward of GLM-5.1 (1,535) and MiniMax’s M2.7 (1,514). For reference, Claude Opus 4.6 scores 1,619 on the identical benchmark—nonetheless forward, however the hole is closing.
DeepSeek V4 Professional is the #1 open weights mannequin on GDPval-AA, our agentic real-world work duties analysis@deepseek_ai has launched V4 Professional (1.6T whole / 49B lively) and V4 Flash (284B whole / 13B lively). V4 is DeepSeek’s first new measurement since V3, with all intermediate fashions… pic.twitter.com/2kJWVrKQjF
— Synthetic Evaluation (@ArtificialAnlys) April 24, 2026
Deepseek’s V4 additionally introduces one thing referred to as “interleaved considering.” In earlier fashions, in case you had been operating an agent that made a number of device calls—say, it searched the net, then ran some code, then searched once more—the mannequin’s reasoning context acquired flushed between rounds. Every new step, the mannequin needed to rebuild its psychological mannequin from scratch. V4 retains the total chain of thought throughout device calls, so a 20-step agent workflow would not undergo from amnesia midway by way of. This issues greater than it sounds for anybody operating advanced automated pipelines.
Deepseek and the U.S.-China AI struggle
The U.S. has been proscribing high-end Nvidia chip exports to China since 2022. The acknowledged objective was to sluggish Chinese language AI growth, however the chip ban did not cease DeepSeek and as an alternative made them invent a extra environment friendly structure and construct out home {hardware} provide.
DeepSeek did not launch V4 in a vacuum—the AI area has been flush with exercise as of late: Anthropic shipped Claude Opus 4.7 on April 16—a mannequin Decrypt examined and located sturdy on coding and reasoning, with notably excessive token utilization. The day earlier than that, Anthropic was additionally sitting on Claude Mythos, a cybersecurity mannequin it says it may well’t launch publicly as a result of it is too good at autonomous community assaults.
Xiaomi dropped MiMo V2.5 Professional on April 22, going full multimodal—picture, audio, video. Prices $1 enter and $3 output per million tokens. It matches Opus 4.6 on most coding benchmarks. Three months in the past, no person was speaking about Xiaomi as a frontier AI firm. Now it is transport aggressive fashions sooner than most Western labs.
OpenAI’s GPT-5.5 landed yesterday with prices spiking as much as $180 per million tokens of output within the Professional model. It beats V4-Professional on Terminal Bench 2.0 (82.7% vs 70.0%), which assessments advanced command-line agent workflows. However it prices significantly greater than V4-Professional for equal duties. That very same day Tencent launched Hy3, one other state-of-the-art mannequin centered on effectivity.
What this implies for you
So with so many new fashions out there, the query builders are literally asking: When is the premium value it?
For enterprise, the mathematics could have modified. A mannequin that leads open-source benchmarks at $1.74 per million enter tokens means large-scale doc processing, authorized overview, or code technology pipelines that had been costly six months in the past are actually less expensive. The one-million-token context means you’ll be able to feed total codebases or regulatory filings in a single request as an alternative of chunking them throughout a number of calls.
Moreover, its open-source nature means it cannot solely be run at no cost on native {hardware}, however it may be custom-made and improved primarily based on the corporate’s wants and use circumstances.
For builders and solo builders, V4-Flash is the one to observe. At $0.14 enter and $0.28 output, it is cheaper than fashions that had been thought-about finances choices a yr in the past—and it handles most duties the Professional model handles. DeepSeek’s current deepseek-chat and deepseek-reasoner endpoints already path to V4-Flash in non-thinking and considering modes respectively, so in case you’re on the API, you are already utilizing it.
The fashions are text-only for now. DeepSeek stated it is engaged on multimodal capabilities, which implies different huge labs from Xiaomi to OpenAI nonetheless have that edge. Each fashions are MIT licensed and out there on Hugging Face at this time. The outdated deepseek-chat and deepseek-reasoner endpoints retire on July 24, 2026.
Day by day Debrief Publication
Begin each day with the highest information tales proper now, plus authentic options, a podcast, movies and extra.
