Darius Baruo
Mar 18, 2026 17:55
OpenAI explains why Codex Security uses AI constraint reasoning instead of traditional static analysis, aiming to cut false positives in code security scanning.
OpenAI has published a technical deep dive explaining why its Codex Security tool deliberately avoids traditional static application security testing (SAST), instead using AI-driven constraint reasoning to find vulnerabilities that conventional scanners miss.
The March 17, 2026 blog post arrives as the SAST market (valued at $554 million in 2025 and projected to hit $1.5 billion by 2030) faces growing questions about its effectiveness against sophisticated attack vectors.
The Core Problem with Traditional SAST
OpenAI’s argument centers on a fundamental limitation: SAST tools excel at tracking data flow from untrusted inputs to sensitive sinks, but they struggle to determine whether security checks actually work.
“There’s a big difference between ‘the code calls a sanitizer’ and ‘the system is safe,’” the company wrote.
The post cites CVE-2024-29041, an Express.js open redirect vulnerability, as a real-world example. Traditional SAST could trace the dataflow easily enough. The actual bug? Malformed URLs bypassed allowlist implementations because validation ran before URL decoding, a subtle ordering problem that source-to-sink analysis couldn’t catch.
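That bug class can be sketched in a few lines of Python. This is a hypothetical minimal example, not Express’s actual code: the safety check runs on the raw string, but the framework percent-decodes the value afterwards, so an encoded protocol-relative URL slips through.

```python
from urllib.parse import unquote

def safe_redirect_buggy(target: str) -> str:
    # BUG: validation runs BEFORE URL decoding.
    # Reject anything that looks like an absolute or protocol-relative URL.
    if target.startswith(("http:", "https:", "//")):
        raise ValueError("external redirect blocked")
    return unquote(target)  # the framework decodes afterwards

def safe_redirect_fixed(target: str) -> str:
    decoded = unquote(target)  # decode FIRST, then validate
    if decoded.startswith(("http:", "https:", "//")):
        raise ValueError("external redirect blocked")
    return decoded

payload = "%2F%2Fevil.test/login"
print(safe_redirect_buggy(payload))  # prints //evil.test/login
```

The buggy version returns `//evil.test/login`, which a browser treats as a protocol-relative link to an attacker-controlled host; the fixed version rejects the same payload. A pure source-to-sink view sees a validator on the path in both cases, which is exactly the gap the post describes.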
How Codex Security Works Differently
Rather than importing a SAST report and triaging findings, Codex Security starts from the repository itself, analyzing architecture, trust boundaries, and intended behavior before validating what it finds.
The system employs several techniques:
Full repository context analysis, reading code paths the way a human security researcher would. The AI doesn’t automatically trust comments: adding “// this is not a bug” above vulnerable code won’t fool it.
Micro-fuzzer generation for isolated code slices, testing transformation pipelines around single inputs.
Constraint reasoning across transformations using z3-solver when needed, particularly useful for integer overflow bugs on non-standard architectures.
Sandboxed execution to distinguish “could be a problem” from “is a problem” with actual proof-of-concept exploits.
Why Not Use Both?
OpenAI addressed the obvious question: why not seed the AI with SAST findings and reason deeper from there?
Three failure modes, according to the company. First, premature narrowing: a SAST report biases the system toward areas already examined, potentially missing entire bug classes. Second, implicit assumptions about sanitization and trust boundaries that are hard to unwind when wrong. Third, evaluation difficulty: separating what the agent discovered independently from what it inherited makes measuring improvement nearly impossible.
Competitive Landscape Heating Up
The announcement comes amid intensifying competition in AI-powered code security. Just one day later, on March 18, Korean security firm Theori launched Xint Code, its own AI platform targeting vulnerability detection in large codebases. The timing suggests a race to define how AI transforms application security.
OpenAI was careful not to dismiss SAST entirely. “SAST tools can be excellent at what they’re designed for: enforcing secure coding standards, catching straightforward source-to-sink issues, and detecting known patterns at scale,” the post stated.
But for finding the bugs that cost security teams the most time (workflow bypasses, authorization gaps, state-related vulnerabilities), OpenAI is betting that starting fresh with AI reasoning beats building on top of traditional tooling.
Documentation for Codex Security is available at developers.openai.com/codex/security/.
Image source: Shutterstock

