In short
- Nvidia, Carnegie Mellon, and UC Berkeley have launched ENPIRE, a framework that lets AI coding agents run the full loop of teaching robots new skills with no human supervision.
- Agents running Codex, Claude Code, and Kimi Code pushed an eight-robot fleet to a 99% success rate on tasks including pin insertion, GPU insertion, and zip-tie cutting.
- Scaling from one robot to eight cut the time needed to master a task by more than half, though the token bill grew even faster than the time saved.
A fleet of eight robotic arms at Nvidia’s GEAR lab spent the past few weeks teaching themselves to insert pins, seat graphics cards, and cut zip ties. The only humans involved were the ones who wrote the paper afterward.
The capability came from ENPIRE, a framework detailed in a paper published Tuesday by researchers at Nvidia, Carnegie Mellon University, and UC Berkeley. ENPIRE hands the entire job of training a robot to AI coding agents, the same software that already writes and tests its own code, and lets them run that process directly on physical hardware.

Coding agents like OpenAI’s Codex, Anthropic’s Claude Code, and Moonshot’s Kimi Code have spent the past year running what researchers call autoresearch: writing code, testing it, and rewriting it again with no person in the loop. That loop has mostly stayed on a screen, where resetting a failed experiment costs nothing. ENPIRE drags it into the physical world, where resetting an experiment means moving an actual robot arm.
Building the ‘Enpire’
The system splits the work into two phases. In the first, a human walks the agent through building two permanent tools: a reset routine that returns the workspace to a fresh starting position, and a reward function that watches camera footage to score success, basically a referee that never blinks and never takes a lunch break. That setup happens once, then gets reused for every attempt that follows.
Once those tools exist, the agent takes over completely. It searches published research for ideas, picks between training methods like imitation learning, reinforcement learning, or hand-written rules, then rewrites its own code and tests the result on the robot. Nothing in that loop requires a person to watch, which is either liberating or slightly unsettling depending on how you feel about a robot holding scissors unsupervised.
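The two-phase design can be pictured with a toy loop. The sketch below is not ENPIRE's code: every name in it (`reset_workspace`, `reward`, `Candidate`, `autoresearch`) is hypothetical, the robot is replaced by a random-number generator, and a single `skill` number stands in for how good a policy is. It only shows the shape of the idea: the reset and reward tools are built once, then reused for every attempt while the agent proposes, tests, and keeps whatever scores better.

```python
import random
from dataclasses import dataclass
from typing import Tuple

METHODS = ("imitation", "reinforcement", "scripted")


@dataclass
class Candidate:
    method: str   # which training approach the agent picked
    skill: float  # hypothetical stand-in for policy quality, in [0, 1]


def reset_workspace() -> None:
    """Phase-one tool (hypothetical): put the workspace back in its start state."""
    # On real hardware this would move the arm and the objects; here it is a no-op.


def reward(outcome: bool) -> float:
    """Phase-one tool (hypothetical): score one attempt, 1.0 for success."""
    return 1.0 if outcome else 0.0


def evaluate(candidate: Candidate, trials: int, rng: random.Random) -> float:
    """Run `trials` simulated attempts and return the measured success rate."""
    total = 0.0
    for _ in range(trials):
        reset_workspace()
        outcome = rng.random() < candidate.skill  # simulated rollout, not a robot
        total += reward(outcome)
    return total / trials


def autoresearch(rounds: int, trials: int, seed: int = 0) -> Tuple[Candidate, float]:
    """Phase two: propose a revision, test it, keep it only if it scores higher."""
    rng = random.Random(seed)
    best = Candidate(method="scripted", skill=0.2)
    best_score = evaluate(best, trials, rng)
    for _ in range(rounds):
        new_skill = min(1.0, max(0.0, best.skill + rng.uniform(-0.1, 0.2)))
        proposal = Candidate(method=rng.choice(METHODS), skill=new_skill)
        score = evaluate(proposal, trials, rng)
        if score > best_score:
            best, best_score = proposal, score
    return best, best_score


if __name__ == "__main__":
    best, score = autoresearch(rounds=30, trials=20)
    print(best.method, f"{score:.0%}")
```

In this toy version, the kept score can only stay flat or rise, because a proposal replaces the current best only when it measures higher. A real system would also have to cope with noisy measurements, hardware faults, and the cost of each attempt.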
Nvidia ran the experiment on eight bimanual robot stations, each with its own hardware, computer, and coding agent. The stations trade progress via Git, the same tool coders use to merge code, so a successful idea spreads fleet-wide within minutes.
Researchers measured the payoff on “Push-T,” a task where a robot slides a T-shaped block into a target zone using only pushes, and pin insertion, where it threads pins into 4-millimeter holes. Scaling from one robot to eight cut the time to master Push-T from roughly 5 hours to 2, and pin insertion from more than 90 minutes to about 40.
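Those figures make for some quick arithmetic. The sketch below treats the article's rounded numbers as exact (5 hours to 2 for Push-T, 90 minutes to 40 for pin insertion), which they are not, so the results are ballpark only. The helper names are illustrative, and "robot-hours" is a rough proxy for resources used, not the token cost the paper reports.

```python
def speedup(t_single: float, t_fleet: float) -> float:
    """How many times faster the fleet finished than a single robot."""
    return t_single / t_fleet


def parallel_efficiency(t_single: float, t_fleet: float, n_robots: int) -> float:
    """Speedup divided by fleet size; 1.0 would mean perfect linear scaling."""
    return speedup(t_single, t_fleet) / n_robots


N_ROBOTS = 8

# Push-T: roughly 5 hours on one robot, about 2 hours on eight.
push_t_speedup = speedup(5.0, 2.0)                           # 2.5x
push_t_efficiency = parallel_efficiency(5.0, 2.0, N_ROBOTS)  # 0.3125

# Pin insertion: more than 90 minutes on one robot, about 40 on eight.
pin_speedup = speedup(90.0, 40.0)                            # 2.25x
pin_efficiency = parallel_efficiency(90.0, 40.0, N_ROBOTS)   # 0.28125

# Wall-clock time fell, but total robot-hours spent on Push-T rose.
push_t_robot_hours_single = 1 * 5.0                          # 5 robot-hours
push_t_robot_hours_fleet = N_ROBOTS * 2.0                    # 16 robot-hours

if __name__ == "__main__":
    print(f"Push-T: {push_t_speedup:.2f}x faster, {push_t_efficiency:.0%} efficiency")
    print(f"Pin insertion: {pin_speedup:.2f}x faster, {pin_efficiency:.0%} efficiency")
```

On these rounded numbers, eight robots finish a bit more than twice as fast while spending roughly three times the robot-hours, which fits the pattern of costs growing faster than the time saved.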

Across the four real-world tasks tested, the agents drove their policies to a 99% success rate, according to the paper. For pin insertion, the agents reached near-perfect reliability faster than a comparable human-in-the-loop method, the kind that still needs someone to show up every morning.
Nvidia’s Jim Fan, the GEAR Lab co-lead who directs the company’s AI research, called the project an effort to enable AutoResearch in the physical world for the first time. Fan said the team handed the agents a fleet of robots, a GPU allocation, and a token budget, then stepped back and let the robots take over.
Today, we enable AutoResearch in the physical world for the first time! Introducing ENPIRE: we give 8 Codex agents a fleet of robots, an allocation of GPUs, and generous token budget. We set them free with a simple goal: solve the task as quickly as possible, keep the robots busy… pic.twitter.com/zC0OQNzDBs
— Jim Fan (@DrJimFan) June 16, 2026
The gap between simulation and reality showed up almost immediately. All three coding agents solved Push-T inside a simulator, but two of the three failed once the same task moved onto a physical robot, the paper notes.
Simulators don’t have friction problems. Real tables do.
Nvidia also tested ENPIRE inside RoboCasa, a simulated kitchen benchmark that scores robots on chores like opening cabinets or turning off stoves by success rate, mercifully without any risk of burning the place down. There, ENPIRE outperformed both Nvidia’s own end-to-end model GR00T and CaP-X, a tool-using agent that skips the autoresearch loop entirely.
ENPIRE extends an idea Nvidia first floated with Eureka, a 2023 system that used a language model to write reward functions for robots inside a simulator instead of having human engineers do it by hand. ENPIRE moves that self-improvement loop off the simulator and onto real hardware, with the agent designing its own tests rather than just its own rewards.
The release lands the same week Alibaba unveiled its own embodied-AI push, the Qwen-Robotic Suite, a trio of foundation models for robot navigation, manipulation, and physics simulation. Alibaba is building software brains for robot bodies it doesn’t manufacture; Nvidia is testing whether agents can run the whole research loop on hardware it owns end to end. Both point to the same trend: physical robots are becoming the next arena for coding agents to compete in.
