Sam Altman’s OpenAI o3 model, which was deprecated late last week with the release of GPT-5, demolished Elon Musk’s Grok 4 in four straight games Thursday to win Google’s Kaggle Game Arena AI Chess Exhibition.
You might think it was a super complex spectacle of high-tech behemoths putting their reasoning to the ultimate test, but as an appetizer, consider that former world champion Magnus Carlsen compared both bots to “a gifted child who doesn’t know how the pieces move.”
The three-day tournament, which ran August 5-7, forced general-purpose chatbots (yes, the same ones that help you write emails and claim to be approaching human-level intelligence) to play chess without any specialized training. No chess engines, no looking up moves, just whatever chess knowledge they’d randomly absorbed from the internet.
The results were about as elegant as you’d expect from forcing a language model to play a board game. Carlsen, who co-commentated the final, estimated both AIs were playing at the level of casual players who recently learned the rules, around 800 Elo. For context, he’s arguably the best chess player who ever lived, with an Elo rating of 2839. These AIs were playing like they’d learned chess from a corrupted PDF.
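To put that rating gap in perspective, here is a minimal sketch using the standard logistic Elo expected-score formula. The function name is illustrative, and the model is not meant to be trusted at gaps this large, so treat the number as an order-of-magnitude illustration rather than a prediction.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for player A against player B (win = 1, draw = 0.5, loss = 0)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


bot = 800       # Carlsen's rough estimate for the chatbots
carlsen = 2839  # rating cited in the article

# Expected score per game for an 800-rated player facing a 2839-rated one.
print(f"{elo_expected_score(bot, carlsen):.2e}")
```

Under this formula, the 800-rated side would be expected to score on the order of one point in a hundred thousand games.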
“They oscillate between really, really good play and incomprehensible sequences,” Carlsen said on the broadcast as he followed the game. At one point, after watching Grok walk its king directly into danger, he joked it might think they were playing King of the Hill instead of chess.
The actual games were like a masterclass in how not to play chess, even for people who don’t know the game. In the first game, Grok essentially gave away one of its important pieces for free, then made things worse by trading off more pieces while already behind.
Game two got even weirder. Grok tried to execute what chess players call the “Poisoned Pawn,” a risky but legitimate strategy where you grab an enemy pawn that looks free but isn’t. Except Grok grabbed the wrong pawn entirely, one that was clearly defended. Its queen (the most powerful piece on the board) got trapped and captured immediately.
By game three, Grok had built what looked like a solid position: good positional control, no obvious dangers, and basically a setup that can help you win the game. Then in the middlegame, it fumbled the ball straight to the opponent. It lost piece after piece in rapid succession.
This was actually weird, considering that before the match against o3, Grok was a pretty strong contender, showing solid potential, so much so that chess grandmaster Hikaru Nakamura praised it. “Grok is easily the best so far, just being objective, easily the best.”
The fourth (and last) game provided the only real suspense. OpenAI’s o3 made a huge blunder early in the game, which is a serious danger in any reasonable match. Nakamura, who was streaming the match, said there were still “a few tricks” left for o3 despite the disadvantage.
He was right: o3 clawed its way back to win its queen back and slowly squeezed out a victory while Grok’s endgame play fell apart like wet cardboard.
“Grok made so many mistakes in these games, but OpenAI did not,” Nakamura said during his livestream. This was quite the reversal from earlier in the week.
The timing couldn’t have been worse for Elon Musk. After Grok’s strong early rounds, he’d posted on X that his AI’s chess abilities were just a “side effect” and that xAI had “spent almost no effort on chess.” That turned out to be an understatement.
Before this “official” chess tournament, International Master Levy Rozman hosted his own tournament earlier this year with less advanced models. He played every move the chatbots recommended, and the whole thing ended up being a complete mess with illegal moves, piece summonings, and incorrect calculations. Stockfish, an AI built specifically for chess, ended up winning the tournament against ChatGPT. Altman’s AI was matched against Musk’s in the semifinals there, and Grok lost. So it’s 2-0 for Sam.
However, this tournament was different. Each bot got four chances to make a legal move; if it failed four times, it automatically lost. This wasn’t hypothetical. In early rounds, AIs tried to teleport pieces across the board, bring dead pieces back to life, and move pawns sideways like they were playing some fever-dream version of chess they’d invented themselves.
They got disqualified.
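The four-attempt rule can be sketched in a few lines. This is a toy illustration only, not the tournament's actual harness: `request_legal_move`, `propose_move`, and the move strings are hypothetical names made up for this example, and the set of legal moves is passed in by hand rather than computed from a board.

```python
from typing import Callable, Iterable, Optional

MAX_ATTEMPTS = 4  # from the article: four chances to produce a legal move


def request_legal_move(
    propose_move: Callable[[int], str],
    legal_moves: Iterable[str],
    max_attempts: int = MAX_ATTEMPTS,
) -> Optional[str]:
    """Return the first legal move proposed, or None if every attempt is illegal.

    `propose_move` is a hypothetical stand-in for querying the chatbot; it
    receives the 1-based attempt number and returns a move as a string.
    """
    legal = set(legal_moves)
    for attempt in range(1, max_attempts + 1):
        move = propose_move(attempt)
        if move in legal:
            return move
    return None  # the caller treats this as an automatic loss


# Toy usage: a "bot" that tries a sideways pawn move three times, then a legal one.
scripted = ["e2f2", "e2f2", "e2f2", "e2e4"]
move = request_legal_move(lambda n: scripted[n - 1], {"e2e4", "d2d4", "g1f3"})
print(move)  # e2e4
```

A bot that never produces a move from the legal set gets `None` back, which corresponds to the forfeit described above.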
Google’s Gemini grabbed third place by beating another OpenAI model, salvaging some dignity for the tournament organizers. That bronze medal match featured a particularly absurd drawn game where both AIs had completely winning positions at different points but couldn’t figure out how to finish.
Carlsen pointed out that the AIs were better at counting captured pieces than actually delivering checkmate: they understood material advantage but not how to win. It’s like being great at collecting ingredients but unable to cook a meal.
These are the same AI models that tech executives claim are approaching human intelligence, threatening white-collar jobs, and revolutionizing how we work. Yet they can’t play a board game that has existed for 1,500 years without trying to cheat or forgetting the rules.
So it’s probably safe to say we’re safe: AI won’t take control of humanity, for now.