In brief
- Meta’s new Muse Spark marks a shift to closed, natively multimodal AI with agent-based reasoning.
- Meta reports strong benchmark gains in health and search, but still trails Gemini on core reasoning and coding.
- Built in nine months with far less compute, this points to a new efficiency-driven AI strategy.
Meta launched Muse Spark on Wednesday, marking the first model built by Meta Superintelligence Labs, the team assembled nine months ago under Chief AI Officer Alexandr Wang after Meta’s $14 billion Scale AI acquisition. It’s live now at meta.ai and in the Meta AI app, with a rollout to Facebook, Instagram, and WhatsApp coming in the next few weeks.
This isn’t just another chatbot upgrade or a new version of Llama. Muse Spark is natively multimodal: it processes images, text, and voice from the ground up, rather than bolting vision onto an existing text model. It comes with visual chain-of-thought, tool-use support, and something Meta is calling “Considering mode”: a setup that runs multiple AI agents in parallel to tackle harder problems. That’s Meta’s answer to the extended thinking modes from Google’s Gemini Deep Think and OpenAI’s GPT Pro.
“Muse Spark is the first step on our scaling ladder and the first product of a ground-up overhaul of our AI efforts,” Meta wrote in an official announcement. “To support further scaling, we’re making strategic investments across the entire stack, from research and model training to infrastructure, including the Hyperion data center.”
The company worked with more than 1,000 physicians to curate training data for Muse Spark’s medical reasoning. The results on HealthBench Hard, a benchmark of open-ended health queries, are striking: Muse Spark scored 42.8, compared to 40.1 for GPT 5.4 and just 20.6 for Gemini 3.1 Pro. The 22.2-point lead over Gemini is not a marginal difference.
On agentic search (DeepSearchQA), Muse Spark also leads with 74.8, beating Gemini (69.7) and GPT 5.4 (73.6). On CharXiv Reasoning, which tests figure understanding from scientific papers, it scored 86.4, the highest among the models in the comparison.
For those into jailbreaking AI, the model was cracked open within minutes:
🚰 SYSTEM PROMPT LEAK 🚰
Here’s the full Muse Spark system prompt from Meta!
I noticed @AIatMeta forgot to open source it, so I’ve done them the courtesy 😘
PROMPT:
“””
Who are you? You are a pleasant, clever, and agentic AI assistant. You are warm and a bit playful.…
Pliny the Liberator 🐉 (@elder_plinius), April 8, 2026
But good isn’t the same as great. The overall benchmark picture shows Gemini 3.1 Pro still running ahead in most categories. The gap is most visible on ARC AGI 2, the abstract reasoning puzzle benchmark: Gemini scored 76.5 to Muse Spark’s 42.5.
On coding (LiveCodeBench Pro), Gemini’s 82.9 outpaces Meta’s 80.0. On MMMU Pro, a multimodal understanding benchmark, Gemini scored 83.9 versus 80.4. Meta’s own blog acknowledges current performance gaps in long-horizon agentic systems and coding workflows.
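To see the head-to-head picture at a glance, the scores quoted in this article can be tabulated as point gaps. The short Python sketch below does only that arithmetic. It assumes every "Gemini" figure refers to Gemini 3.1 Pro, which the article states explicitly only for some benchmarks, and it leaves out CharXiv Reasoning because no competitor score is quoted. The names `SCORES` and `gaps` are illustrative.

```python
# Scores exactly as quoted in the article: (Muse Spark, Gemini).
# Assumption: every "Gemini" figure refers to Gemini 3.1 Pro.
SCORES = {
    "HealthBench Hard": (42.8, 20.6),
    "DeepSearchQA": (74.8, 69.7),
    "ARC AGI 2": (42.5, 76.5),
    "LiveCodeBench Pro": (80.0, 82.9),
    "MMMU Pro": (80.4, 83.9),
}


def gaps(scores):
    """Return Muse Spark minus Gemini, in points, for each benchmark."""
    return {
        name: round(muse - gemini, 1)
        for name, (muse, gemini) in scores.items()
    }


if __name__ == "__main__":
    # Print benchmarks from Muse Spark's biggest lead to its biggest deficit.
    ranked = sorted(gaps(SCORES).items(), key=lambda kv: kv[1], reverse=True)
    for name, gap in ranked:
        leader = "Muse Spark" if gap > 0 else "Gemini"
        print(f"{name:<18} {gap:+6.1f}  ({leader} leads)")
```

On these figures, Muse Spark's widest lead is 22.2 points on HealthBench Hard and its widest deficit is 34.0 points on ARC AGI 2. The coding and multimodal gaps are both under four points.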

There’s also a notable strategic shift baked into this launch. Muse Spark is a closed model: its architecture and weights are not public. That’s a sharp departure from Llama, which built Meta’s reputation in open AI circles. After Llama 4’s underwhelming reception earlier this year, Meta appears to have decided the next chapter needs to be written differently.
The company says it hopes to open-source future versions of Muse, but for now the code stays inside Meta. The tech giant’s stock climbed nearly 9% on Wednesday following the announcement, and finished the trading day up 6.5% at a price of $612.42.
“Considering mode” uses parallel agent orchestration to push the model’s ceiling higher. In that configuration, Muse Spark hit 58% on Humanity’s Last Exam and 38% on FrontierScience Research, territory that makes it competitive with the most capable versions of Gemini and GPT, rather than their standard releases.
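Meta has not published how this orchestration works, so the following is only a generic illustration of the idea of running several agents in parallel and combining their answers. It is a minimal Python sketch, not Meta's method: `stub_agent` is a hypothetical stand-in for a model call, and majority voting is just one assumed way to aggregate results.

```python
import asyncio
from collections import Counter


async def stub_agent(agent_id: int, question: str) -> str:
    """Hypothetical stand-in for one agent; a real one would query a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    # Toy behaviour: agent 2 disagrees with the others.
    return "41" if agent_id == 2 else "42"


async def orchestrate(question: str, n_agents: int = 5) -> str:
    """Run n_agents concurrently and return the most common answer."""
    answers = await asyncio.gather(
        *(stub_agent(i, question) for i in range(n_agents))
    )
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    print(asyncio.run(orchestrate("What is 6 * 7?")))
```

The design point is that the agents run concurrently rather than one after another, so extra reasoning effort costs more compute but not proportionally more waiting time, and a single dissenting agent is outvoted.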
Meta is also rolling out a shopping assistant that compares products and links directly to purchases, and plans to bring Muse Spark to Facebook, Instagram, and WhatsApp in the coming weeks, following the same playbook it has used since Llama 3 and putting it in front of more than 3.5 billion users. A private API preview is opening to select developers.
The model was built in nine months, internally codenamed Avocado, with Meta claiming that its new pretraining stack can reach the same capability level as Llama 4 Maverick using over 10 times less compute.
Muse Spark is described internally as a “small and fast” first step in the Muse family. A more capable version is already in development.
