OpenAI launched GPT-4.5 on Thursday, just days after Anthropic launched Claude 3.7 Sonnet and only a week after xAI’s Grok-3 debut and DeepSeek’s announcement of a new model coming soon.
It appears to be a new competitive phase in the AI race, with companies scrambling to outdo each other with increasingly capable, and increasingly expensive, models.
And expensive is the operative word here. OpenAI’s new model comes with an eye-watering API price tag of $75 per million input tokens and $150 per million output tokens.
For context, that is ten times pricier than Claude 3.7 Sonnet, making it potentially prohibitive for many developers and startups looking to build on the technology.
GPT-4o (its predecessor) costs $2.50 per 1M input tokens and $10.00 per 1M output tokens, making GPT-4.5 2,900% more expensive on input and 1,400% more expensive on output.
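For readers who want to check the arithmetic, here is a minimal Python sketch that uses only the per-million-token prices quoted above. The helper names `percent_increase` and `request_cost` and the sample workload (2,000 input tokens and 500 output tokens per request) are illustrative assumptions, not anything published by OpenAI.

```python
# Prices in USD per 1M tokens, as quoted in this article.
PRICES = {
    "gpt-4.5": {"input": 75.00, "output": 150.00},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def percent_increase(new: float, old: float) -> float:
    """Return how much more `new` costs than `old`, as a percentage."""
    return (new - old) / old * 100


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-1M-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    for kind in ("input", "output"):
        inc = percent_increase(PRICES["gpt-4.5"][kind], PRICES["gpt-4o"][kind])
        print(f"{kind}: {inc:.0f}% more expensive")

    # Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 2_000, 500):.4f} per request")
```

Swapping in a different token mix shows how quickly the gap compounds for output-heavy workloads.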
Sam Altman, OpenAI’s CEO, did not shy away from acknowledging the model’s massive resource requirements in his announcement. “Bad news: It’s a giant, expensive model,” he said.
“A heads up: this isn’t a reasoning model and won’t crush benchmarks. It’s a different kind of intelligence,” Altman said. “There’s a magic to it I haven’t felt before.”
GPT-4.5 is ready!
good news: it’s the first model that feels like talking to a thoughtful person to me. i’ve had several moments where i’ve sat back in my chair and been astonished at getting actually good advice from an AI.
bad news: it’s a giant, expensive model. we…
— Sam Altman (@sama) February 27, 2025
And this seems to be the key. Users are paying 1,400% more per output token not for a more intelligent model, but for a nicer model that feels more human.
For example, one area in which GPT-4.5 shines, according to OpenAI, is what the company calls “vibes,” or essentially the model’s EQ, warmth, and collaborative feel.
The company created a “Vibes test set” measuring creative intelligence and conversational quality, on which GPT-4.5 purportedly outperformed other models.
The examples shared during the presentation didn’t exactly introduce anything new.
The first demonstration literally used this prompt: “UGHHH! My friend cancelled on me again!!! Write a text message telling them that I HATE THEM!!!!” which arguably isn’t something for which you’d use a competent large language model.
In a following demonstration comparing GPT-4.5 to OpenAI’s o1 model, researchers asked both AIs to explain the need for AI alignment and to help craft a message to a friend who had canceled plans.
The responses, while showing some improved nuance in GPT-4.5, hardly seemed revolutionary. The difference was in the tone.
In another example, the research team asked the powerful GPT-4.5 why ocean water is salty.
The new model responded using simpler words (“because of rain, rivers, and rocks”) than previous models.
GPT-4-Turbo gave a more comprehensive and detailed answer, which the team didn’t like, arguing that “you get the feeling that it wants you to know how smart it is.”
One amusing detail from the presentation was an Easter egg hinting at a possible GPT-6, with a query that read: “Num GPUs for GPT-6 Training.”
Perhaps when that model arrives, the demos will be more impressive.
The benchmarks presented paint a mixed picture. GPT-4.5 scores 71.4% on GPQA (a science evaluation), compared to GPT-4o’s 53.6%.
However, it still trails OpenAI’s o3-mini model, which scores 79.7% thanks to its reasoning capabilities.
Similar patterns emerged across other benchmarks. On the AIME ’24 math evaluation, GPT-4.5 scored 36.7%, beating GPT-4o’s 9.3% but still far behind o3-mini’s 87.3%.
For coding tasks, GPT-4.5 outperformed its predecessor and o3-mini on the SWE-Lancer Diamond benchmark but fell short on SWE-Bench Verified compared to the reasoning-focused model.
Altman described the model in almost mystical terms, calling it “the first model that feels like talking to a thoughtful person.”
He added: “I’ve had several moments where I’ve sat back in my chair and been astonished at getting actually good advice from an AI.”
During the model’s presentation, OpenAI researchers explained that the company advances AI through two distinct approaches: unsupervised learning and reasoning.
While reasoning teaches models to “think before responding,” unsupervised learning helps improve “world model accuracy and intuition.” GPT-4.5 doubles down on the latter.
“GPT-4.5 is our next step in scaling up unsupervised learning, increasing world knowledge, intuition, and reducing hallucinations,” an OpenAI research lead explained in the presentation.
Creating GPT-4.5 required massive technical innovation, according to the team. They had to build new inference systems to serve such a large model efficiently, use low-precision training to maximize GPU utilization, and even train across multiple data centers simultaneously.
The release comes at a time when consumer expectations for AI are sky-high and competition in the field is intensifying. Whether GPT-4.5’s “different kind of intelligence” and improved “vibes” justify its enormous resource requirements and steep pricing remains to be seen.
GPT-4.5 is currently available for Pro users who pay $200 a month. Plus users paying $20 a month will have access to the model next week.
Edited by Sebastian Sinclair