In brief
- X-OmniClaw is an open-source Android AI agent from Oppo that keeps its core logic on-device and only calls the cloud for high-level reasoning.
- The framework builds a long-term semantic memory from your photo gallery and session history, letting it act as a continuous assistant rather than a one-shot chatbot.
- A behavior cloning feature lets users record a navigation path once so the agent can replay it instantly via Android deeplink, bypassing multi-step app navigation in future sessions.
Your phone already has a camera, a microphone, and a screen. It can see what you're looking at in real life and what’s happening on its own display. And now, the AI team at Chinese smartphone manufacturer Oppo has figured out that all that hardware, which mostly sits there underused, is exactly what you need to build a genuinely useful mobile AI agent.
That project is X-OmniClaw, published by the Multi-X Team. It's an open-source AI agent framework for Android that turns your phone into a hands-free, context-aware assistant capable of running real tasks across real apps, without routing everything through a cloud copy of your device.
Most mobile AI systems don't actually run on your phone. They run on cloud servers that host virtual copies of Android, letting an AI tap and scroll through apps remotely. The result: no access to your real camera, your actual photos, or your local files, just a stranger using a copy of your phone.
X-OmniClaw takes the opposite approach. Per the technical report, it introduces “an edge-native architecture that executes directly on the user’s physical device, thereby eliminating the gap between simulated environments and real-world interaction contexts.”
The report uses a car analogy: The smartphone is “the vehicle,” X-OmniClaw is “the internal engine for control and perception,” and the cloud-based language model is only called in as “the fuel” when heavy reasoning is needed. Everything else stays local.
How the Oppo AI phone agent works
X-OmniClaw’s overall architecture is based on three pillars, Omni Perception, Omni Action, and Omni Memory, which work as one continuous loop, with cloud LLMs called in only for heavy reasoning, according to Oppo.
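To make the loop concrete, here is a minimal Python sketch of a perceive, plan, act, and remember cycle in which only some requests reach the cloud. It is an illustration under stated assumptions, not Oppo's code: the `Agent` class, its method names, and the word-count rule for deciding when reasoning is "heavy" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Toy perceive -> plan -> act loop; every name here is hypothetical."""
    memory: list = field(default_factory=list)  # stand-in for Omni Memory
    cloud_calls: int = 0                         # counts trips to a cloud LLM

    def perceive(self, camera: str, screen: str, voice: str) -> dict:
        # Stand-in for Omni Perception: merge the inputs into one observation.
        return {"camera": camera, "screen": screen, "voice": voice}

    def needs_heavy_reasoning(self, observation: dict) -> bool:
        # Assumed heuristic: only longer, open-ended requests go to the cloud.
        return len(observation["voice"].split()) > 6

    def plan(self, observation: dict) -> str:
        if self.needs_heavy_reasoning(observation):
            self.cloud_calls += 1
            return "cloud_plan"
        return "local_plan"

    def step(self, camera: str, screen: str, voice: str) -> str:
        observation = self.perceive(camera, screen, voice)
        plan = self.plan(observation)
        # Stand-in for Omni Action: "execute" the plan, then remember the turn.
        self.memory.append((observation, plan))
        return plan
```

In this sketch a short command such as "open camera" is planned locally, while a longer multi-part request triggers a single cloud call. Either way the turn is appended to memory, so later steps can build on it.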

Omni Perception covers everything the phone can sense. It combines camera feeds, screen content, and voice input into a single pipeline. A vision-language model interprets the scene before the agent does anything else. So if you point your camera at a bottle and ask, “how much does this cost?”, the agent first figures out what you're looking at, then opens the relevant shopping app and starts searching. No guessing required.
Omni Memory is what separates X-OmniClaw from a one-shot chatbot. The agent maintains context across tasks, app switches, and sessions. It also builds a long-term semantic memory from your photo gallery, turning raw images into structured notes about objects, scenes, and events. The report states “runtime continuity is what lets X-OmniClaw function as an ongoing system agent rather than a one-shot response system.”
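As a rough picture of what "structured notes" over a gallery could look like, the sketch below stores a few labels per image and searches them by keyword. The `PhotoNote` fields, the file names, and the keyword-overlap matching are assumptions made for illustration. The article does not describe the real note schema or retrieval method, which may well rely on a vision-language model and embeddings instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoNote:
    """Hypothetical structured note derived from one gallery image."""
    path: str
    objects: frozenset
    scene: str


def search_notes(notes, query):
    """Return paths of notes whose objects or scene match any query word."""
    words = {w.lower() for w in query.split()}
    hits = []
    for note in notes:
        labels = set(note.objects) | {note.scene}
        if words & labels:
            hits.append(note.path)
    return hits


# Made-up example notes; a real system would generate these from images.
notes = [
    PhotoNote("img_001.jpg", frozenset({"parrot", "cage"}), "indoor"),
    PhotoNote("img_002.jpg", frozenset({"bottle"}), "kitchen"),
    PhotoNote("img_003.jpg", frozenset({"parrot", "tree"}), "park"),
]
```

Calling `search_notes(notes, "Parrot video")` on this sample data returns the two parrot images in gallery order, which is the kind of lookup a request for a parrot-themed highlight video would need.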
Omni Action handles execution. It combines XML interface data with an on-device visual model and OCR, a character-recognition layer, to determine exactly what to tap, even on ad-heavy screens where structure alone isn't enough. It also includes behavior cloning: record yourself navigating to a buried app page once, and the agent can replay that route instantly using an Android deeplink shortcut next time.
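The record-once, replay-by-deeplink idea can be sketched as a small route store. Everything here is hypothetical: the `RouteStore` class, the route name, and the `exampleapp://` URI are invented, and the article does not say how X-OmniClaw derives a deeplink from a recording. The replay command uses the standard `adb shell am start` form for opening a URI from a development machine. An on-device agent would more likely fire the equivalent intent directly.

```python
class RouteStore:
    """Toy store for recorded routes; names and URIs are hypothetical."""

    def __init__(self):
        self._routes = {}

    def record(self, name, taps, deeplink):
        # Keep the recorded taps for reference and the deeplink for replay.
        self._routes[name] = {"taps": list(taps), "deeplink": deeplink}

    def replay_command(self, name):
        # Build (but do not run) an adb command that opens the deeplink.
        uri = self._routes[name]["deeplink"]
        return ["adb", "shell", "am", "start",
                "-a", "android.intent.action.VIEW", "-d", uri]

    def steps_saved(self, name):
        # One deeplink launch replaces every recorded tap.
        return len(self._routes[name]["taps"]) - 1


store = RouteStore()
store.record(
    "order_history",
    ["Profile", "Settings", "Orders", "History"],
    "exampleapp://orders/history",  # made-up URI for illustration
)
```

Here a four-tap route collapses into one launch, so `store.steps_saved("order_history")` is 3. The list returned by `replay_command` could be handed to `subprocess.run` on a machine with a device attached.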
What the Oppo AI agent can actually do

Oppo shared some examples of what the agent can do. In one, it identifies a physical product via the camera, opens Taobao, scrolls through the results, and returns a price summary, with no typing required.
Oppo also demoed a floating on-screen companion that helps a user work through math exercises step by step: autonomously reading the screen, processing each question, and advancing when done.
It also offered another example in which a user asks the agent to assemble a highlight video from parrot-themed photos. The system scans the gallery, finds matching images using its semantic memory, opens CapCut’s video editor via deeplink, batch-selects the files, and generates the video. What used to take “a few minutes or longer” becomes a handful of automated steps.

2026: The year of agentic AI
AI agents have become one of the most discussed categories in tech. OpenClaw, the open-source agent framework that reached over 373,000 GitHub stars and was eventually backed by OpenAI, launched the current wave by showing what persistent, locally-run agents could do on PCs. Hermes Agent by Nous Research took things further with a self-improving learning loop that compounds capabilities over time.
Both run primarily on desktop hardware. X-OmniClaw extends the same architecture to the device you actually carry everywhere. The team built on the open-source HermesApp codebase, and the paper explicitly credits OpenClaw’s structured skill model as foundational inspiration, then adapted it for the multimodal, always-on nature of a smartphone.
The code is on GitHub now. Oppo says it’ll release all assets and keep updating the project as the system evolves.
