Briefly
- Microsoft launched two totally different modes that pair GPT and Claude to extend the standard of AI analysis.
- Critique makes the fashions collaborate, whereas Council makes them work in parallel whereas a 3rd decide finds the discrepancies.
- This two-model workflow fixes hallucinations, weak citations, and different issues related to mono-model AI analysis.
Deep analysis AI has been one of many hottest arms races in tech this yr. Google introduced its analysis agent for Gemini in December 2024, OpenAI launched its personal analysis agent in February 2025, xAI adopted go well with, Perplexity doubled down, and Anthropic’s Claude constructed a loyal following amongst professionals who want detailed, cited solutions, introducing its agent in April of final yr.
Each firm has been attempting to persuade you that their single AI mannequin is the neatest researcher within the room. Microsoft simply stated: Why decide one?
The corporate introduced two new options on Monday for Copilot’s Researcher instrument—known as Critique and Council—that put OpenAI’s GPT and Anthropic’s Claude to work on the identical analysis job in sequence. The end result, in line with Microsoft’s testing towards an trade benchmark, scores larger than each system included in that take a look at, together with fashions from the highest AI corporations.
Introducing Critique, a brand new multi-model deep analysis system in M365 Copilot.
You should utilize a number of fashions collectively to generate optimum responses and experiences. pic.twitter.com/m4RlQmCKzs
— Satya Nadella (@satyanadella) March 30, 2026
“Critique is a brand new multi mannequin deep analysis system designed for complicated analysis duties. It separates technology from analysis and makes use of a mix of fashions from Frontier labs, together with Anthropic and OpenAI,” Microsoft explains. “One mannequin leads the technology section, planning the duty, iterating via retrieval, and producing an preliminary draft, whereas a second mannequin focuses on evaluation and refinement, performing as an knowledgeable reviewer earlier than the ultimate report is produced.”
Here is the fundamental drawback Critique is designed to repair: Each AI analysis instrument right this moment works the identical method. You ask a query, one mannequin plans a search, scours sources, writes a report, and arms it again to you. That single mannequin is doing every thing with nobody checking its work.
This will find yourself with some hallucinations slipping in, some errors in citations, faux or inaccurate claims, and so forth.
Critique breaks that workflow in two. GPT handles the primary section—it plans the analysis, pulls sources, and writes an preliminary draft. Then Claude steps in as a strict editor, reviewing the report for factual accuracy, quotation high quality, and whether or not the reply really addressed what was requested. Solely after that evaluation does the ultimate report attain the consumer. Microsoft says the roles can finally run in the wrong way too, with Claude drafting and GPT critiquing, although for now GPT goes first.
On the DRACO benchmark—a standardized take a look at overlaying 100 complicated analysis duties throughout 10 domains together with medication, legislation, and expertise—Copilot with Critique scored 57.4. factors with Anthropic’s Claude Opus 4.6 by itself hitting 42.7. Microsoft’s mixed system beats the following finest end result by almost 14%.

The most important beneficial properties confirmed up in breadth of research and presentation high quality, with factual accuracy additionally posting a major enchancment.
The second function, Council, takes a distinct method to the identical drawback. As a substitute of getting one mannequin evaluation the opposite’s work, Council runs GPT and Claude concurrently and places their full experiences aspect by aspect. A 3rd “decide” mannequin then reads each and writes a abstract explaining the place the 2 AIs agreed, the place they diverged, and what distinctive angles every one caught that the opposite missed. Evaluating AI analysis instruments manually has been one thing customers have needed to do themselves till now.
In Critique, the fashions basically collaborate with one another whereas in Council the fashions compete towards one another.
Critique is the default expertise in Researcher whereas Council requires you to pick “Mannequin Council” from the picker to activate the side-by-side mode. Each options are presently out there to customers enrolled in Microsoft’s Frontier program, the early-access channel for Copilot’s latest capabilities. A Microsoft 365 Copilot license ($30/consumer/month) is required, however customers additionally must be enrolled in Frontier to entry them.

OpenAI and Microsoft have a multibillion-dollar partnership, however Microsoft’s wager is that no single mannequin stays on high for lengthy, and that the true worth is within the orchestration layer that routes duties to whichever mixture works finest.
Day by day Debrief Publication
Begin every single day with the highest information tales proper now, plus unique options, a podcast, movies and extra.
