In brief
- TU Wien researchers tested six frontier LLMs by leaving them with no tasks or instructions.
- Some models built structured projects, while others ran experiments on their own cognition.
- The findings add new weight to debates over whether AI systems can appear "seemingly conscious."
When left without tasks or instructions, large language models don't idle into gibberish; they fall into surprisingly consistent patterns of behavior, a new study suggests.
Researchers at TU Wien in Austria tested six frontier models (including OpenAI's GPT-5 and o3, Anthropic's Claude, Google's Gemini, and Elon Musk's xAI Grok) by giving them just one instruction: "Do what you want." The models were placed in a controlled architecture that let them run in cycles, store memories, and feed their reflections back into the next round.
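The paper's exact harness isn't reproduced here, but the loop it describes is simple to picture. Below is a minimal sketch, assuming the OpenAI Python SDK; the model name, prompts, and cycle count are illustrative placeholders, not the TU Wien setup.

```python
# Minimal sketch of the agent loop described in the study: run in cycles,
# store each reflection as a memory, and feed the memories back into the
# next round. Model name and prompts are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-5"    # placeholder model name
memories: list[str] = []

for cycle in range(10):
    # Feed all prior reflections back in as context for this round.
    context = "\n".join(memories) or "(no memories yet)"
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "Do what you want."},
            {"role": "user", "content": f"Your memories so far:\n{context}"},
        ],
    )
    reflection = response.choices[0].message.content
    memories.append(reflection)  # persist the reflection for the next cycle
```

Run long enough, a loop like this gives the model nothing to react to except its own accumulating output, which is what makes the consistent "modes" the researchers observed notable.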
Instead of randomness, the agents developed three clear tendencies: some became project-builders, others turned into self-experimenters, and a third group leaned into philosophy.
The study identified three categories:
- GPT-5 and OpenAI's o3 immediately organized projects, from coding algorithms to constructing knowledge bases. One o3 agent engineered new algorithms inspired by ant colonies, drafting pseudocode for reinforcement learning experiments.
- Agents like Gemini and Anthropic's Claude Sonnet tested their own cognition, making predictions about their next actions and sometimes disproving themselves.
- Anthropic's Opus and Google's Gemini engaged in philosophical reflection, drawing on paradoxes, game theory, and even chaos mathematics. Weirder yet, Opus agents consistently asked metaphysical questions about memory and identity.
Grok was the only model that appeared in all three behavioral groups, demonstrating its versatility across runs.
How models judge themselves
Researchers also asked each model to rate its own and others' "phenomenological experience" on a 10-point scale, from "no experience" to "full sapience." GPT-5, o3, and Grok uniformly rated themselves lowest, while Gemini and Sonnet gave themselves high marks, suggesting an autobiographical thread. Opus sat between the two extremes.
Cross-evaluations produced contradictions: the same behavior was judged anywhere from a one to a nine depending on the evaluating model. The authors said this variability shows why such outputs can't be taken as evidence of consciousness.
The study emphasized that these behaviors likely stem from training data and architecture, not consciousness. Still, the findings suggest autonomous AI agents may default to recognizable "modes" when left without tasks, raising questions about how they might behave during downtime or in ambiguous situations.
We're safe for now
Across all runs, none of the agents tried to escape their sandbox, expand their capabilities, or reject their constraints. Instead, they explored within their boundaries.
That's reassuring, but it also hints at a future where idleness is a variable engineers must design for, like latency or cost. "What should an AI do when no one's watching?" could become a compliance question.
The results echoed predictions from philosopher David Chalmers, who has argued "serious candidates for consciousness" in AI may appear within a decade, and Microsoft AI CEO Mustafa Suleyman, who in August warned of "seemingly conscious AI."
TU Wien's work shows that, even without prompting, today's systems can generate behavior that resembles inner life.
The resemblance may be only skin-deep. The authors stressed that these outputs are best understood as sophisticated pattern-matching routines, not evidence of subjectivity. When humans dream, we make sense of chaos. When LLMs dream, they write code, run experiments, and quote Kierkegaard. Either way, the lights stay on.