Satya Nadella wants AI to be your next doctor.
The Microsoft CEO announced two healthcare AI advances on social media this week, including MAI-DxO, a system that simulates multiple virtual doctors working together to solve medical mysteries.
In testing against 304 complex cases from the New England Journal of Medicine, Microsoft reported that the AI correctly diagnosed 85.5% of them. A group of 21 experienced physicians tackling the same cases? They got 20% right.
“Excited to share two advances that bring us closer to real-world impact in healthcare AI,” Nadella wrote. “MAI-DxO is a model-agnostic orchestrator that simulates a panel of virtual physicians. It achieves 85.5% diagnostic accuracy, 4 times that of experienced doctors, while cutting diagnostic costs.”
Excited to share two advances that bring us closer to real-world impact in healthcare AI:
SDBench introduces a new benchmark that transforms 304 NEJM cases into interactive diagnostic simulations. AI must ask questions, order tests, and weigh costs, mirroring the complexity of… pic.twitter.com/lASC4hK730
— Satya Nadella (@satyanadella) June 30, 2025
The announcement comes as Microsoft joins a crowded field of tech companies racing to apply AI to healthcare’s thorniest problems.
With Americans spending nearly $5 trillion annually on healthcare, and diagnostic errors affecting 12 million people each year, according to Johns Hopkins University, the idea of using AI to address human-related problems seems like a no-brainer.
How Microsoft’s Medical Council Works
MAI-DxO works like a medical dream team trapped in a computer. The system tackles cases through what Microsoft calls the Sequential Diagnosis Benchmark, or SDBench.
Instead of posing multiple-choice questions like traditional medical AI tests, it mirrors how doctors actually work: starting with limited information about a patient, asking follow-up questions, ordering tests, and adjusting theories as new data arrives.
Each test incurs a cost in virtual money, forcing the AI to balance thoroughness against healthcare spending.
In other words, it simulates a medical council debating a case, with different models playing different roles. The models debate, disagree, and eventually reach a consensus, just as your physicians would if you were a challenging case to study.
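The ask, order, and pay loop described above can be sketched in a few lines of Python. This is only an illustration, not Microsoft's benchmark code: the `Encounter` class, the `run_case` helper, and the test prices are hypothetical, and treating questions as free is an assumption. Only the $300 visit cost comes from the article.

```python
from dataclasses import dataclass, field

# The $300 visit figure is from the article; the test prices are invented.
VISIT_COST = 300
TEST_PRICES = {"cbc": 50, "chest_xray": 120, "ct_scan": 1200}


@dataclass
class Encounter:
    """Tracks what the diagnostic agent has learned and spent so far."""
    findings: list = field(default_factory=list)
    spent: int = VISIT_COST

    def ask(self, question: str, answer: str) -> None:
        # Assumption: follow-up questions add no cost in this sketch.
        self.findings.append((question, answer))

    def order(self, test: str, result: str) -> None:
        # Every ordered test adds its price to the running total.
        self.spent += TEST_PRICES[test]
        self.findings.append((test, result))


def run_case(script, budget):
    """Replay scripted (kind, name, outcome) steps, stopping before overspending."""
    enc = Encounter()
    for kind, name, outcome in script:
        if kind == "test":
            if enc.spent + TEST_PRICES[name] > budget:
                break  # the next test would exceed the budget
            enc.order(name, outcome)
        else:
            enc.ask(name, outcome)
    return enc


script = [
    ("ask", "Any fever?", "yes"),
    ("test", "cbc", "normal"),
    ("test", "ct_scan", "mass seen"),
    ("test", "chest_xray", "clear"),
]
result = run_case(script, budget=1000)
print(result.spent, len(result.findings))
```

With a $1,000 budget, the replay stops before the CT scan, which is the kind of thoroughness-versus-spending trade-off the benchmark is meant to force.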
In one configuration, MAI-DxO achieved 80% accuracy while spending $2,397 per case, roughly 20% less than the $2,963 that physicians typically spend.
At peak performance, it achieved 85.5% accuracy at a cost of $7,184 per case. By comparison, OpenAI’s standalone o3 model achieved 78.6% accuracy but cost $7,850.
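For readers who want to check the arithmetic, here is a small sketch using only the figures quoted above. The helper names `relative_saving` and `cost_per_point` are made up for this example, and "dollars per accuracy point" is a rough comparison of our own, not a metric Microsoft reports.

```python
def relative_saving(baseline: float, cost: float) -> float:
    """Fraction saved relative to a baseline cost."""
    return (baseline - cost) / baseline


def cost_per_point(cost: float, accuracy_pct: float) -> float:
    """Dollars spent per percentage point of diagnostic accuracy."""
    return cost / accuracy_pct


# (cost per case in dollars, accuracy in percent), as quoted in the article
configs = {
    "physicians": (2963, 20.0),
    "MAI-DxO (cheaper configuration)": (2397, 80.0),
    "MAI-DxO (peak)": (7184, 85.5),
    "o3 standalone": (7850, 78.6),
}

saving = relative_saving(2963, 2397)
print(f"Saving vs. physicians: {saving:.1%}")

for name, (cost, acc) in configs.items():
    print(f"{name}: ${cost_per_point(cost, acc):.2f} per accuracy point")
```

The saving works out to a little over 19%, which matches the "roughly 20%" in the text.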
The virtual physician panel includes Dr. Hypothesis, who maintains a running list of the three most likely diagnoses using Bayesian probability methods.
Dr. Test-Chooser selects up to three diagnostic tests per round, aiming for maximum information gain.
Dr. Challenger acts as the contrarian, looking for evidence that contradicts the prevailing theory. Dr. Stewardship vetoes expensive tests with low diagnostic value.
Meanwhile, Dr. Checklist ensures all test names are valid and the team’s reasoning stays consistent.
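The article does not say how these roles are implemented inside MAI-DxO, so the following is a textbook-style sketch of the ideas they are named after: a Bayesian update for the running differential, expected information gain for choosing tests, and a cost-based veto. All function names, probabilities, and thresholds are hypothetical.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over diagnoses after one finding, via Bayes' rule."""
    unnorm = {d: prior[d] * likelihood[d] for d in prior}
    total = sum(unnorm.values())
    return {d: p / total for d, p in unnorm.items()}


def top_three(posterior):
    """The three most probable diagnoses (the 'Dr. Hypothesis' list)."""
    return sorted(posterior, key=posterior.get, reverse=True)[:3]


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(prior, p_positive):
    """Expected entropy drop from a binary test.

    p_positive[d] is the assumed P(test positive | diagnosis d).
    """
    p_pos = sum(prior[d] * p_positive[d] for d in prior)
    p_negative = {d: 1 - p_positive[d] for d in prior}
    gain = entropy(prior)
    for outcome_prob, lik in ((p_pos, p_positive), (1 - p_pos, p_negative)):
        if outcome_prob > 0:
            gain -= outcome_prob * entropy(bayes_update(prior, lik))
    return gain


def stewardship_veto(tests, min_gain_per_dollar):
    """Keep only tests whose information gain per dollar clears a threshold."""
    return [name for name, gain, cost in tests
            if gain / cost >= min_gain_per_dollar]


prior = {"pneumonia": 0.5, "pulmonary embolism": 0.5}
perfect_test = {"pneumonia": 1.0, "pulmonary embolism": 0.0}
useless_test = {"pneumonia": 0.5, "pulmonary embolism": 0.5}
print(expected_information_gain(prior, perfect_test))
print(expected_information_gain(prior, useless_test))
```

A test that perfectly separates two equally likely diagnoses yields one full bit of information, while a test that comes back positive equally often under both yields none, which is the intuition behind both the test chooser and the veto.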
Microsoft tested the system on cases published in the New England Journal of Medicine between 2024 and 2025, after the AI’s training cutoff date, eliminating any possibility the model had memorized the answers.
These were difficult cases that required thorough examination to be properly diagnosed.
The 21 physicians Microsoft recruited for comparison had between 5 and 20 years of experience, with a median of 12 years.
They worked without access to colleagues, textbooks, or AI assistance to ensure a fair comparison of raw diagnostic ability. They reported a 20% success rate on these admittedly difficult cases.
The system operates in several modes. “Instant Answer” provides a diagnosis based solely on initial information for $300, the cost of one physician visit.
“Question Only” allows follow-up questions without ordering tests. “Budgeted” tracks costs with a maximum spending limit. “No Budget” gives the panel free rein, while “Ensemble” runs multiple panels and aggregates their conclusions for maximum accuracy.
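One way to picture the five modes is as a setting that restricts which actions the panel may take. The sketch below is an assumption-laden illustration: the `Mode` enum, the action names, and the majority-vote aggregation are hypothetical, since the article does not describe how Ensemble mode combines the panels' conclusions.

```python
from collections import Counter
from enum import Enum


class Mode(Enum):
    INSTANT_ANSWER = "instant_answer"
    QUESTION_ONLY = "question_only"
    BUDGETED = "budgeted"
    NO_BUDGET = "no_budget"
    ENSEMBLE = "ensemble"


def allowed_actions(mode: Mode) -> set:
    """Actions the panel may take in each mode (hypothetical action names)."""
    if mode is Mode.INSTANT_ANSWER:
        return {"diagnose"}
    if mode is Mode.QUESTION_ONLY:
        return {"ask", "diagnose"}
    return {"ask", "order_test", "diagnose"}


def aggregate(diagnoses: list) -> str:
    """Majority vote across panels; ties go to the diagnosis seen first."""
    return Counter(diagnoses).most_common(1)[0][0]


panel_outputs = ["sarcoidosis", "lymphoma", "sarcoidosis"]
print(allowed_actions(Mode.QUESTION_ONLY))
print(aggregate(panel_outputs))
```

Majority voting is the simplest possible aggregation rule; a real system could just as easily weight panels by confidence or have another model adjudicate.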
The Future of Medicine?
MAI-DxO represents Microsoft’s broader push into consumer health AI.
The company reports over 50 million health-related sessions daily across its Bing and Copilot products. From knee pain searches to urgent care lookups, Microsoft sees search engines and AI assistants becoming the new front door for healthcare.
Of course, this is just one more step in a very long timeline of medical tech.
For context, Stanford’s MYCIN system diagnosed bacterial infections in the 1970s, and Google’s AMIE simulated doctor-patient conversations just last year.
Microsoft developed MAI-DxO as a model-agnostic system, meaning it can work with AI models from different companies.
In testing, it boosted performance across models from OpenAI, Google, Anthropic, Meta, and others by an average of 11%. The improvement was statistically significant across all tested models.
Dr. Dominic King and Harsha Nori, who led the research at Microsoft AI, emphasized in a blog post that the technology remains a research demonstration.
“Important challenges remain before generative AI can be safely and responsibly deployed across healthcare,” they wrote. The system excels at complex diagnostic challenges but needs testing on routine cases.
Microsoft plans to submit the research for peer review and is working with healthcare organizations to validate the approach in clinical settings.
The company has made clear that any deployment would require “rigorous safety testing, clinical validation, and regulatory reviews.”
For now, MAI-DxO remains confined to research labs. But with diagnostic errors contributing to nearly 10% of patient deaths and affecting millions annually, Microsoft’s virtual physician panel represents another step toward AI-assisted healthcare.
The five-doctor AI team may diagnose better than 21 human physicians combined, but it’s still too early to expect a mainstream implementation.
Microsoft says AI won’t replace doctors; it’ll augment them. The 21 physicians who scored 20% on these brutal NEJM cases are probably hoping that’s true.
Edited by Sebastian Sinclair and Josh Quittner