In short
- Prophet Arena tests AI models by having them predict real-world, unresolved events, with GPT-5 currently leading the leaderboard.
- AI models show distinct prediction “personalities” and often diverge from market consensus, sometimes generating high returns.
- Early results suggest AI can forecast as accurately as prediction markets, potentially transforming institutional decision-making.
A new artificial intelligence benchmark launched in August shows that AI models can forecast real-world events as accurately as prediction markets, and sometimes better, according to researchers at the University of Chicago’s SIGMA Lab.
Prophet Arena evaluates AI systems by having them predict the outcomes of live, unresolved events drawn from platforms like Kalshi and Polymarket, ranging from election results to sports matches and economic indicators. Unlike traditional benchmarks that test models on historical data with known answers, Prophet Arena tests AI on events whose outcomes are not yet known.
“By anchoring evaluations in unresolved, real-world events, Prophet Arena ensures a level playing field. There is no pre-training advantage, no secret fine-tuning trick, no leakage of test samples,” the Prophet Arena team said in the benchmark’s official blog post.
The benchmark says it is trying to address a fundamental question about artificial intelligence: “Can AI systems reliably predict the future by connecting the dots across current real-world information?”
Early results suggest they can. GPT-5 currently leads the leaderboard with a Brier score of 82.21%. Meanwhile, OpenAI’s o3-mini model has emerged as the profit champion, generating the highest average returns when its predictions are translated into simulated bets (an underdog with a real chance of winning can pay out far more than a favorite, under the right circumstances).
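The Brier score is the mean squared difference between forecast probabilities and what actually happened (1 if the event occurred, 0 if not), so on its standard scale lower is better. Since GPT-5 leads with 82.21%, the leaderboard evidently reports a scale where higher is better; the article does not say how that figure is derived. The minimal Python sketch below computes the standard score, and its `skill_style_score` helper (1 minus the Brier score) is a hypothetical rescaling for illustration only, not Prophet Arena's formula.

```python
"""Minimal sketch of the Brier score for binary forecasts.

Assumption: each forecast is a probability p in [0, 1] that an event
happens, and each outcome is 1 (happened) or 0 (did not).
"""


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    Lower is better: 0.0 is a perfect forecaster, always answering 0.5
    earns 0.25, and 1.0 is maximally wrong.
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if not probs:
        raise ValueError("need at least one forecast")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def skill_style_score(probs: list[float], outcomes: list[int]) -> float:
    """Hypothetical rescaling (1 - Brier) so that higher is better.

    This is NOT Prophet Arena's documented formula; it only illustrates
    how a lower-is-better error can be flipped into a higher-is-better score.
    """
    return 1.0 - brier_score(probs, outcomes)


if __name__ == "__main__":
    forecasts = [0.9, 0.2, 0.6]  # hypothetical model probabilities
    results = [1, 0, 0]          # hypothetical resolved outcomes
    print("Brier score:", round(brier_score(forecasts, results), 4))
    print("1 - Brier:  ", round(skill_style_score(forecasts, results), 4))
```

Squaring the error punishes confident misses much more than hedged ones, which is why a model cannot score well simply by always predicting extreme probabilities.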
DeepSeek R1 appears to be the contrarian AI in the group, frequently making predictions that diverge sharply from both the other models and market consensus, so it is probably not the best model to trust if you want to make a quick buck on Myriad Markets.
The platform reveals distinct “personalities” among AI models facing identical information. In one example, when predicting whether AI regulation would become federal law before 2026, the market assigned just a 25% probability. But the models diverged wildly: Qwen 3 predicted 75%, GPT-4.1 estimated 60%, while Llama 4 Maverick stayed conservative at 35%.
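One simple way to put numbers on this kind of disagreement is the absolute gap between each model's probability and the market price. The Python sketch below applies that to the federal AI regulation example; the metric and the `divergence_from_market` and `mean_divergence` helpers are illustrative assumptions, not how Prophet Arena itself measures contrarian behavior.

```python
"""Sketch: how far each model's forecast sits from the market price.

The absolute-gap metric is an illustrative assumption, not Prophet Arena's
own method. The numbers are the federal AI regulation example above.
"""

MARKET_PROB = 0.25  # market-implied probability of the event
MODEL_PROBS = {
    "Qwen 3": 0.75,
    "GPT-4.1": 0.60,
    "Llama 4 Maverick": 0.35,
}


def divergence_from_market(model_probs: dict[str, float], market: float) -> dict[str, float]:
    """Absolute difference, in probability units, between each model and the market."""
    return {name: abs(p - market) for name, p in model_probs.items()}


def mean_divergence(gaps: dict[str, float]) -> float:
    """Average gap across models: a crude 'contrarian' index for one question."""
    return sum(gaps.values()) / len(gaps)


if __name__ == "__main__":
    gaps = divergence_from_market(MODEL_PROBS, MARKET_PROB)
    # List models from most to least contrarian on this question.
    for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:>18}: {gap:.2f} away from the market")
    print(f"mean gap: {mean_divergence(gaps):.2f}")
```

Averaging such gaps over many questions would give a rough per-model contrarian ranking, though a single question says little on its own.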
In another case, o3-mini earned a simulated $9 return on a $1 bet by backing Toronto FC to beat San Diego FC in a Major League Soccer match. The model gave Toronto a 30% chance of winning, while the market priced it at just 11%. Toronto won.
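The Toronto FC payout follows from simple prediction-market arithmetic. The Python sketch below assumes a binary contract that pays $1 if the event happens, bought at the market-implied probability with no fees or slippage; the article does not describe Prophet Arena's exact simulation, so `simulate_bet` and `has_edge` are hypothetical helpers.

```python
"""Sketch: turning a forecast into a simulated prediction-market bet.

Assumptions (not confirmed by the article): a binary contract pays $1 if the
event happens and $0 otherwise, it can be bought at the market-implied
probability, and there are no fees or slippage.
"""

from dataclasses import dataclass


@dataclass
class BetResult:
    contracts: float        # contracts bought with the stake
    payout_if_win: float    # gross amount returned if the event happens
    expected_profit: float  # profit expected under the model's own probability


def simulate_bet(stake: float, market_price: float, model_prob: float) -> BetResult:
    """Buy YES contracts with `stake` at `market_price` and value the position."""
    if not 0.0 < market_price < 1.0:
        raise ValueError("market_price must be strictly between 0 and 1")
    contracts = stake / market_price
    payout_if_win = contracts * 1.0  # each contract settles at $1 on a win
    expected_profit = model_prob * payout_if_win - stake
    return BetResult(contracts, payout_if_win, expected_profit)


def has_edge(market_price: float, model_prob: float) -> bool:
    """Per the model, a YES bet has positive expected value only if it is more bullish than the market."""
    return model_prob > market_price


if __name__ == "__main__":
    # Toronto FC vs San Diego FC numbers from the article: model 30%, market 11%.
    result = simulate_bet(stake=1.0, market_price=0.11, model_prob=0.30)
    print(f"contracts bought: {result.contracts:.2f}")
    print(f"payout if Toronto wins: ${result.payout_if_win:.2f}")
    print(f"expected profit (model's view): ${result.expected_profit:.2f}")
    print("model sees an edge:", has_edge(0.11, 0.30))
```

Under these assumptions, a $1 stake at 11 cents buys about 9.09 contracts, so a Toronto win returns roughly $9.09, in line with the article's $9 figure (about $8.09 of it profit). The same logic explains the underdog point above: when a model rates a long shot well above its market price, the payout multiplier is large enough that the bet has positive expected value by the model's own estimate.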
“[Prophet Arena] tests models’ forecasting capability, a high form of intelligence that demands a broad range of capabilities, including understanding current information and news sources, reasoning under uncertainty, and making time-sensitive predictions about unfolding events,” the researchers wrote.
Prophet Arena also enables human-AI collaboration. Users can supply additional information and context to see how predictions shift, while AI models provide detailed rationales for their forecasts.
As prediction markets themselves integrate AI (Kalshi recently partnered with Elon Musk’s Grok, while Polymarket generates AI-powered market summaries), Prophet Arena offers the first systematic comparison of machine forecasting against collective human judgment.
And if they get really good at it, machines could forecast on facts alone, with no sentiment or emotion shaping their decisions. They could potentially match or exceed the wisdom of crowds, changing the way institutions approach risk assessment, investment decisions, and strategic planning.
The Prophet Arena platform continues updating daily as events resolve, providing an evolving picture of whether artificial intelligence can truly predict the future by connecting today’s dots.