Over the past decade, artificial intelligence has grown primarily by feeding on the same resource: public web data. Text, images, documents, forums, news, blogs, repositories… an enormous amount of material that models have absorbed to build their language and cognitive abilities. But this phase is about to end.
According to projections cited by Messari, the total amount of public text available for model training (roughly 300 trillion tokens) could be completely exhausted between 2026 and 2032. This means that large models have “eaten the internet,” and now they need something else. The next frontier for AI is not the web: it is the real world.
And this is where the concept of frontier data comes into play, the resource that will define the competitiveness of future models. Video, audio, sensory, motor, and robotic data, action data, data generated from interaction with the physical world or with complex digital interfaces. Data that cannot simply be downloaded: it must be collected, coordinated, verified, and, above all, incentivized.
For this reason, the blockchain is not a detail or a marginal addition: it is the infrastructure that enables the orchestration of this new data economy.
The End of “Web Scraping” and the Beginning of High-Value Data
The most advanced models of 2025 (not only linguistic but also multimodal, agentic, and reasoning-oriented) no longer improve with the mere addition of generic text datasets. They require something far more specific and far more expensive to collect: data that reflects actions, intentions, movement, interaction, manipulation, and context.
This is the case, for example, with computer-use agents, AI capable of interacting directly with a computer as a human would. To train these systems, textual descriptions are not enough: “trajectories” are needed, which are actual recordings of people performing tasks on screen.
A protocol like Chakra, mentioned in the report, has developed an extension that allows users to record their screen while performing daily tasks: navigating a management system, preparing an Excel document, editing images, using professional software. These recordings become valuable material for training models like GLADOS-1, the first computer-use model built almost entirely on crowdsourced data.
And this is precisely the point: this data does not exist until someone produces it. And it must be paid for, just as energy or inference is paid for.
The Growing Value of Gameplay-Action Pairs
Another striking example comes from the gaming world. A platform like Shaga, born as a decentralized cloud gaming network, produces an extremely valuable byproduct: the so-called Gameplay-Action Pairs (GAP), which are synchronized pairs of what happens on screen and the commands the player issues.
This is data that cannot be retrieved simply by watching videos on YouTube: it has to be captured at the source, on the player’s device. And this type of dataset, according to estimates reported by Messari, can be worth up to $50–$100 per hour of gameplay.
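To make the idea of a synchronized pair concrete, here is a minimal sketch of how frames and player commands captured on the same device clock might be joined. The record layout and every name in it (`InputEvent`, `GameplayActionPair`, `pair_frames_with_inputs`) are assumptions made for illustration; the report does not describe Shaga's actual format.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InputEvent:
    t_ms: int      # capture time on the player's device, in milliseconds
    command: str   # hypothetical command label, e.g. "W_down"


@dataclass(frozen=True)
class GameplayActionPair:
    frame_t_ms: int
    frame_id: str
    commands: Tuple[str, ...]


def pair_frames_with_inputs(
    frames: List[Tuple[int, str]], events: List[InputEvent]
) -> List[GameplayActionPair]:
    """Attach to each frame the commands issued since the previous frame.

    Both lists must be sorted by time and share the same clock.
    """
    times = [e.t_ms for e in events]
    pairs = []
    start = 0
    for t_ms, frame_id in frames:
        end = bisect_right(times, t_ms)  # events timestamped at or before this frame
        commands = tuple(e.command for e in events[start:end])
        pairs.append(GameplayActionPair(t_ms, frame_id, commands))
        start = end
    return pairs


# Hypothetical capture: three frames at roughly 30 fps and three key events.
frames = [(0, "f0"), (33, "f1"), (66, "f2")]
events = [InputEvent(10, "W_down"), InputEvent(30, "jump"), InputEvent(50, "W_up")]
pairs = pair_frames_with_inputs(frames, events)
```

The join only works because both streams carry timestamps from the same clock, which is why a video uploaded to YouTube, with no input log attached, cannot be turned into this kind of dataset after the fact.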
To put it into context: Shaga has already accumulated over 259,000 hours of gameplay, with an estimated value of more than 26 million dollars. And it is no coincidence that OpenAI, a year earlier, offered half a billion dollars to acquire Medal, a similar platform specializing precisely in gameplay recording.
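The valuation follows from simple multiplication, sketched below using only the two figures quoted in the text (259,000 hours and $50 to $100 per hour). The function name is illustrative.

```python
HOURS_RECORDED = 259_000                 # "over 259,000 hours", per the text
RATE_LOW_USD, RATE_HIGH_USD = 50, 100    # quoted per-hour value range


def valuation_range(hours: int, low_rate: int, high_rate: int) -> tuple:
    """Return the (low, high) dataset value in dollars for a given hour count."""
    return hours * low_rate, hours * high_rate


low_usd, high_usd = valuation_range(HOURS_RECORDED, RATE_LOW_USD, RATE_HIGH_USD)
print(f"${low_usd:,} to ${high_usd:,}")  # $12,950,000 to $25,900,000
```

At exactly 259,000 hours the top of the range comes to $25.9 million, so the "more than 26 million" figure implies the highest per-hour rate and an hour count somewhat above 259,000 (260,000 hours at $100 would give $26 million). At the low rate the same archive would be worth about $13 million.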
This data is used to train world models, models that do not merely interpret language but simulate physics, causality, and agent-environment interaction. These are the models that will enable more intelligent robots, autonomous agents, advanced forecasting systems, and AI capable of “moving” in complex environments.
Physical AI: intelligence entering the physical world
And this is precisely where we arrive at the second major wave of frontier data: robotic data.
The AI of the future will not live only in data centers. It will live in robots, drones, autonomous cars, distributed sensors, and smart home devices. Every robot will need data to learn how to move, identify objects, make decisions, and manipulate its environment. And this data collection is extremely costly: it requires physical hardware, human operators for teleoperation, continuous maintenance, and coordination.
Projects like PrismaX, BitRobot, GEODNET, and NATIX are beginning to use the incentive mechanisms typical of Web3 to distribute this cost across a global network of contributors. Instead of a single company collecting robotic data, thousands of users can do so in a coordinated manner, receiving direct compensation.
It is the same logic as mining, but instead of computational power, the contribution here is real-world data.
Machine-to-machine coordination: when AI acts in the real world
If robots and AI agents really begin to interact with the physical world, an entirely new level of coordination is required. Robots will need to:
- identify each other,
- make payments,
- purchase services,
- consume data,
- execute tasks in a verifiable manner,
- prove that they have performed an action,
- rely on shared ledgers of identity and reputation.
This is where projects like OpenMind and Peaq emerge, attempting to build an onchain infrastructure dedicated to the communication and identity of robots. An equivalent of DNS, but for machines. A system where drones, autonomous cars, robotic arms, or industrial systems can signal their presence, certify their actions, pay other systems, and exchange services.
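What "certifying an action" means can be illustrated with a small, deliberately simplified sketch: a machine attaches an authentication tag to a record of what it did, and a verifier checks that the record has not been altered. This is not the OpenMind or Peaq API. It uses a shared-secret HMAC from the Python standard library as a stand-in, whereas an onchain system would more plausibly rely on public-key signatures tied to a registered machine identity. All names and values are hypothetical.

```python
import hashlib
import hmac
import json


def attest_action(machine_id: str, secret: bytes, action: dict) -> dict:
    """Serialize an action record and attach an HMAC-SHA256 tag to it."""
    payload = json.dumps({"machine_id": machine_id, "action": action}, sort_keys=True)
    tag = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}


def verify_attestation(secret: bytes, attestation: dict) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = hmac.new(
        secret, attestation["payload"].encode(), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, attestation["tag"])


# Hypothetical machine and task, for illustration only.
secret = b"demo-key-for-drone-7"
record = attest_action("drone-7", secret, {"task": "deliver", "status": "done"})
ok = verify_attestation(secret, record)

# Changing the record after the fact invalidates the tag.
tampered = dict(record, payload=record["payload"].replace("done", "failed"))
tampered_ok = verify_attestation(secret, tampered)
```

Here `ok` is `True` and `tampered_ok` is `False`. A shared ledger adds what this sketch lacks: a public place to look up which key belongs to which machine, and a durable record of past attestations on which reputation can be built.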
It is the beginning of the machine economy, an economy populated by non-human entities that interact autonomously on decentralized networks.
Certified Real Data: The Role of IoTeX and DePIN Networks
The report also places significant focus on IoTeX, a protocol that in recent years has transformed its infrastructure into a comprehensive platform for the collection, certification, and orchestration of real-world data.
IoTeX enables the connection of sensors, IoT devices, home systems, and industrial equipment, providing:
- a verified onchain identity for each device,
- a data aggregation system,
- a layer of cryptographic attestation via zero-knowledge (ZK) proofs,
- APIs that allow AI agents to use that data in real time.
Today, IoTeX coordinates over 16,000 devices and dozens of vertical projects, giving AI agents the ability to access verified data from the real world. A significant difference compared to simple scraping.
The Endpoint: Data as a Financial Asset
According to Messari, the trajectory is clear: data is becoming a financial asset in every respect. Just as today one can invest in compute, GPUs, and colocation, in the future it will be possible to invest in “data streams,” purchase usage rights, support networks that collect frontier data, and receive economic returns in exchange.
It is an almost inevitable evolution: if data becomes scarce, valuable, and difficult to produce, it will have a market, a price, demand, and supply.
Blockchain, once again, is the ideal layer to:
- coordinate this economy,
- verify its integrity,
- trace provenance,
- distribute compensation,
- protect users,
- support global scalability.
Conclusion
AI will not advance through ever-larger models, but through richer data, sourced from the real world and collected via global networks of contributors. It is the greatest gold rush of the next decade: not that of chips, but that of data.
Web3 protocols are not a mere detail: they are the natural platform for collecting, verifying, and distributing this data, and for compensating those who provide it. If the web was the raw material of the first AI wave, the real world will be the raw material of the second.
And this time, for the first time, collection will not be controlled by a few giants, but by the networks.
Open, incentivized, decentralized networks: the new infrastructure of frontier data.
