In brief
- IplanRIO launched Rio 3.5 Open 397B on June 13, billing it as a government-built frontier AI model with benchmark scores topping Qwen 3.7 Plus.
- AI firm Nex published a mathematical proof showing the model is a direct 0.6 Nex / 0.4 Qwen weight merge.
- IplanRIO updated the model card, credited Nex, pulled the benchmark claims, and blamed an “incorrect upload.”
Rio de Janeiro’s IplanRIO launched Rio 3.5 on June 13. The city’s IT company called it a frontier-class model: 397 billion parameters, with a permissive open-source license, built by the municipal government of a city in the Global South.
Rio 3.5’s launch timing was good: Brazil was playing its World Cup opener, and social media was already on fire. Commentary about the model soon spread beyond Brazil.
But just as quickly as it gained attention, a dispute arose over who exactly created the model.
The original model card described Rio 3.5 as a post-train of Qwen 3.5 397B, Alibaba’s open base model, with a new reasoning layer called SwiReasoning added on top. The development cost was reported at R$500,000 (Rio did not confirm this), or nearly $100,000 USD, roughly 30 times cheaper than equivalent off-the-shelf AI systems.
The architecture is Mixture-of-Experts, which means only around 17 billion of the 397 billion parameters fire on any given token. That makes inference cheaper than the headline size suggests. The model also supports vision and text, handles over a dozen languages, and ships under a fully open MIT license.
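To put the Mixture-of-Experts figures in proportion, here is a minimal arithmetic sketch in Python. It uses only the two parameter counts quoted above; the variable names are illustrative, and the ratio says nothing about real-world latency or memory use, which depend on factors the article does not cover.

```python
# Parameter counts as reported for Rio 3.5 (approximate, per the article).
TOTAL_PARAMS = 397e9   # total parameters
ACTIVE_PARAMS = 17e9   # parameters that fire on any given token

# Share of the network that does work for a single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Active per token: {active_fraction:.1%} of all parameters")
```

In other words, only a few percent of the weights are involved in producing each token, which is why the per-token compute is much lower than the 397B headline implies.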
SwiReasoning is the technical centerpiece. It is a training-free inference framework that switches dynamically between two modes. When the model is confident about the next word (low entropy in the probability distribution), it reasons in plain language. When uncertain, it shifts to latent reasoning, thinking in hidden internal states without emitting tokens. IplanRIO said Rio 3.5 was specifically trained to take advantage of this, and that the gains show up in the benchmark numbers.
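The entropy-based switch described above can be illustrated with a toy sketch in Python. This is not the SwiReasoning implementation: the fixed `threshold` value, the function names, and the two mode labels are assumptions made for illustration, and the real framework may use a different confidence signal and switching rule.

```python
import math

def softmax(logits):
    """Convert raw next-token scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats: low when one token dominates."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_mode(logits, threshold=1.0):
    """Hypothetical rule: reason in plain language when the model is
    confident (low entropy), otherwise fall back to latent reasoning."""
    return "explicit" if entropy(softmax(logits)) < threshold else "latent"

confident_logits = [10.0, 0.0, 0.0, 0.0]  # one token clearly preferred
uncertain_logits = [1.0, 1.0, 1.0, 1.0]   # all tokens equally likely

print(choose_mode(confident_logits))
print(choose_mode(uncertain_logits))
```

With these toy inputs, the peaked distribution falls under the threshold and the flat one does not, so the first call selects the explicit mode and the second the latent mode.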

The self-reported numbers were eye-catching. Terminal-Bench 2.1, which measures autonomous terminal command execution and is scored as the percentage of tasks passed, came in at 70.8% for Rio 3.5, edging out Qwen 3.7 Plus at 70.3% and the powerful DeepSeek v4 Professional at 67.9%.
On IMOAnswerBench, a math olympiad benchmark scored as percentage correct, Rio 3.5 hit 89.5%. On HLE (Humanity’s Last Exam, a near-unsolvable multi-domain expert battery scored as a percentage), Rio 3.5 landed at 36.5%, ahead of Qwen 3.7 Plus’s 34.7%.
A municipal government beating the most important flagship models on the most meaningful quality benchmarks: That is the headline that spread, especially after the Mayor of Rio de Janeiro tweeted about it.
“An open AI model trained in Rio and publicly funded over the last year by [the Municipality of Rio] has just surpassed all other models,” Eduardo Cavaliere wrote. “Today, the world is talking about an open AI model trained in Rio.”
🇧🇷 An open AI model trained in Rio with public funding over the last year by @Prefeitura_Rio, surpassing all other models. Artificial intelligence is not something distant, foreign, from a billion-dollar lab… it doesn’t exist only to make text, images… https://t.co/GK1ThytVV9
— Eduardo Cavaliere (@CavaliereRio) June 14, 2026
Then Nex showed up
“Trained in Rio” proved to be not entirely accurate.
Nex-AGI, a Shanghai-based open-source AI alliance, posted on X days after the release. The opener: “The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Professional, wearing a different hat.”
They had analyzed the weights. The math was exact: Rio 3.5 ≈ 0.6 × Nex N2 Professional + 0.4 × Qwen 3.5. A verification script and a full GitHub report followed.
The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Professional, wearing a different hat.
🤯 We analyzed the weights, and the recipe is exact: Rio 3.5 ≈ 0.6 * Nex N2 Professional + 0.4 * Qwen 3.5
It even actually introduces itself… pic.twitter.com/yHRRu37aut
— Nex (@NexEcosystem) June 14, 2026
The proof came in two parts.
First, behavioral. Nex stripped the hardcoded “You are Rio” system prompt from the deployed model and sent it 120 identity questions. Without the mask, Nex reports, the model called itself “Nex, from Nex-AGI” 79.2% of the time. It called itself “Rio” exactly 0% of the time. Nex said the model also recited the company’s specific backstory verbatim, mentioning the “Shanghai Innovation Institute” and “a large-model ecosystem alliance.” That is Nex’s own training data, surfacing in someone else’s model.
Second, mathematical. In a true weight merge, every parameter in the new model sits on a straight line between the two source models. Nex measured this collinearity across all 60 layers. The result came back at 0.993. Two unrelated models in the same parameter space would score near zero by chance. Hitting 0.993 across every single layer is not a coincidence. The mixing ratio held at α ≈ 0.571, stable to three decimal places.
Basically, it was nearly 60% Nex, with the rest being the base Qwen model.
“Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen, across all 60 layers and every component of the network,” Nex wrote. “There is no innocent explanation.”
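The logic of a collinearity test like the one described above can be sketched on toy data in Python. This is not Nex’s verification script: the function `merge_stats`, the use of cosine similarity between weight differences as the collinearity measure, and the synthetic “weights” are all assumptions made for illustration. The idea is that if a candidate equals α × donor + (1 − α) × base, then candidate − base points in exactly the same direction as donor − base, and α can be read off by projection.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def merge_stats(candidate, donor, base):
    """Return (cosine, alpha) for flattened weight vectors.

    cosine: alignment between (candidate - base) and (donor - base);
            close to 1.0 for a linear merge, near 0 for unrelated changes.
    alpha:  least-squares estimate of the donor's share in the blend.
    """
    d_cand = [c - b for c, b in zip(candidate, base)]
    d_donor = [n - b for n, b in zip(donor, base)]
    cosine = dot(d_cand, d_donor) / math.sqrt(
        dot(d_cand, d_cand) * dot(d_donor, d_donor)
    )
    alpha = dot(d_cand, d_donor) / dot(d_donor, d_donor)
    return cosine, alpha

# Synthetic stand-ins for one weight tensor (hypothetical data).
rng = random.Random(0)
base = [rng.gauss(0, 1) for _ in range(10_000)]
donor = [b + rng.gauss(0, 0.1) for b in base]      # a fine-tune of base
merged = [0.6 * n + 0.4 * b for n, b in zip(donor, base)]
unrelated = [b + rng.gauss(0, 0.1) for b in base]  # an independent fine-tune

cos_m, alpha_m = merge_stats(merged, donor, base)
cos_u, alpha_u = merge_stats(unrelated, donor, base)
print(f"merged:    cosine={cos_m:.3f}, alpha={alpha_m:.3f}")
print(f"unrelated: cosine={cos_u:.3f}, alpha={alpha_u:.3f}")
```

On this toy data the merged tensor recovers a cosine of essentially 1 and an α of 0.6, while the independently modified tensor lands near zero on both. A real check would repeat this for every tensor in every layer.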

The numbers also told a quieter story. Nex N2 Professional, released just days before Rio 3.5, scores 75.3% on Terminal-Bench 2.1, higher than Rio’s 70.8%. On GDPval, an economic forecasting benchmark scored as an Elo-style rating, Nex sits at 1,585 against Rio’s 1,533. If Rio is 60% Nex, then you’d expect it to score below Nex on Nex’s own benchmarks. It does.

IplanRIO responds
IplanRIO updated the Hugging Face model card: the benchmark table came down and the attribution changed.
“The model is built through a merge of nex-agi/Nex-N2-Professional and Qwen/Qwen3.5-397B-A17B, preceded by On-Policy Distillation from a stronger model,” the updated README says. “We detected an incorrect upload in the previous version, where the base merged version was uploaded instead of the final distilled model. We’re sorry for the confusion and apologize profusely.”
No different public assertion from IplanRIO has come out. Nex is now credited.
The “incorrect upload” explanation is the key claim. IplanRIO says the intended release was a distilled version of the merged base, not the raw merge itself. On-policy distillation means a stronger teacher model generates outputs, and the student trains on those while also generating its own. It is more expensive than a raw merge, but still cheaper than training from scratch. If that step was real, then it would represent at least some original work on top of the merge.
What really shipped, per IplanRIO, was the merged base with nothing on high.
Community observers split on what that means. Tech commentator Rafael Quintanilha gave the charitable read: Since Nex N2 Professional is itself built on Qwen, the team may have credited the underlying architecture and left it there. He also pointed out the model went viral during a World Cup match, “not necessarily ‘ready for public consumption.’”
about the Rio 3.5 situation
merging two ~400B-class models and then applying policy distillation isn’t trivial
that said, they made two mistakes:
– a technical error (probably caused by a lack of attention to detail)
– and a communication one (we can debate the integrity of…
— montano (@lucas_montano) June 15, 2026
Developer and AI YouTuber Lucas Montano noted that “merging two ~400B-class models and then applying policy distillation isn’t trivial,” while acknowledging both a technical error and a communication failure.
AI researcher Diego Ambrosio was less generous. The original release described Rio 3.5 as the result of “autonomous post-training and proprietary fine-tuning,” framing that implied original research, not a merge.
Legal? Yes. Ethical? Well…
Model merging is entirely legal. Nex N2 Professional is Apache 2.0: you can use it, modify it, and redistribute it, as long as you credit it. Qwen 3.5 is openly licensed too. Nobody’s going to court here.
The problem was presenting the output as independently developed work without naming all the source models. The open-source community has seen this before. Earlier this year, Cursor’s Composer 2 was found to be built on Moonshot’s Kimi K2.5 without disclosure. The backlash was fast and reputational: no lawyers, just screenshots.
Building on existing open models is normal. As Decrypt has covered, stacking and merging open weights is practically its own subculture. The norm isn’t “don’t build on others’ work.” The norm is: Say what you used.
What made this louder than a typical attribution miss was the institutional wrapper. A pseudonymous developer shipping a frankenmerge under their own name is one thing. A municipal government using it to claim public-sector AI sovereignty, during the World Cup, is another. “It was a waste of resources,” one Brazilian commentator wrote.
Nex didn’t make it a fight. “We’re flattered that the City of Rio used our work to achieve SOTA performance,” the company wrote on X. “But in the open-source world, attribution matters.”
IplanRIO is working to upload the corrected, distilled model with full attribution in place. When that lands, the same tests will run again, and the community will find out whether the distillation actually changed anything, or whether it’s still mostly Nex with a different system prompt.
