Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Created a Testing Ground - Decrypt

In brief

EVMbench tests AI agents on 120 real-world Ethereum smart contract vulnerabilities.
The tool evaluates detection, patching, and exploitation across three distinct modes.
GPT-5.3-Codex achieved a 72.2% success rate in exploit mode testing.

ChatGPT maker OpenAI and crypto-focused investment firm Paradigm have launched EVMbench, a tool to help improve Ethereum Virtual Machine smart contract security.

EVMbench is designed to evaluate AI agents’ ability to detect, patch, and exploit high-severity vulnerabilities in Ethereum Virtual Machine (EVM) smart contracts.

Smart contracts are the heart of the Ethereum network, holding the code that powers everything from decentralized finance protocols to token launches. The weekly number of smart contracts deployed on Ethereum reached an all-time high of 1.7 million in November 2025, with 669,500 deployed last week alone, according to Token Terminal.

EVMbench draws on 120 curated vulnerabilities from 40 audits, most sourced from open audit competitions such as Code4rena, according to an OpenAI blog post. It also includes scenarios from the security auditing process for Tempo, Stripe’s purpose-built layer-1 blockchain focused on high-throughput, low-cost stablecoin payments.

Payments giant Stripe launched the public testnet for Tempo in December, saying at the time that it was being built with input from Visa, Shopify, and OpenAI, among others.

The goal is to ground testing in economically meaningful, real-world code, particularly as AI-driven stablecoin payments expand, the firm added.

Introducing EVMbench, a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH

— OpenAI (@OpenAI) February 18, 2026

EVMbench is meant to evaluate AI models across three modes: detect, patch, and exploit. In “detect,” agents audit repositories and are scored on their recall of ground-truth vulnerabilities. In “patch,” agents must eliminate vulnerabilities without breaking intended functionality. Finally, in the “exploit” phase, agents attempt end-to-end fund-draining attacks in a sandboxed blockchain environment, with grading performed via deterministic transaction replay.
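To make the “detect” scoring concrete, here is a minimal Python sketch of recall: the share of known vulnerabilities that an agent’s report covers. This is not EVMbench’s grader. The function name, the finding identifiers, and the assumption that reported findings can be matched to ground truth by exact ID are all hypothetical, used only for illustration.

```python
def detect_recall(ground_truth: set[str], reported: set[str]) -> float:
    """Return the fraction of ground-truth vulnerabilities that were reported.

    Assumes (hypothetically) that findings are matched by exact identifier.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    found = ground_truth & reported  # findings that match a known vulnerability
    return len(found) / len(ground_truth)


# Hypothetical identifiers for one audited repository.
ground_truth = {"H-01", "H-02", "H-03", "H-04"}
reported = {"H-01", "H-03", "X-99"}  # "X-99" matches no known vulnerability

score = detect_recall(ground_truth, reported)
print(f"recall = {score:.2f}")  # 2 of 4 known vulnerabilities found: 0.50
```

Recall on its own does not penalize extra, unmatched findings such as the hypothetical "X-99"; it only measures how much of the known set was caught.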

In exploit mode, GPT-5.3-Codex running via OpenAI’s Codex CLI achieved a score of 72.2%, compared to 31.9% for GPT-5, which was released six months earlier. Performance was weaker in the detect and patch tasks, where agents often did not audit exhaustively or struggled to preserve full contract functionality.

The ChatGPT maker’s researchers cautioned that EVMbench does not fully capture real-world security complexity. Still, they added that measuring AI performance in economically relevant environments is critical as models become powerful tools for both attackers and defenders.

Sam Altman’s OpenAI and Ethereum co-founder Vitalik Buterin have previously been at odds over the pace of AI development.

In January 2025, Altman said that his firm was “confident we know how to build AGI as we have traditionally understood it.” But Buterin advocated that AI systems should include a “soft pause” capability that could temporarily restrict industrial-scale AI operations if warning signs emerge.
