In brief
- A new study shows LLMs can mimic human purchase intent by mapping free-text answers to Likert scores via semantic similarity.
- The method achieved 90% of human test-retest reliability on 9,300 real survey responses.
- The study raises questions about bias, generalization, and how far “synthetic consumers” can stand in for real people.
Forget focus groups: a new study has found that large language models can forecast whether you want to buy something with striking accuracy, dramatically outperforming traditional marketing tools.
Researchers at the University of Mannheim and ETH Zürich found that large language models can replicate human purchase intent, the “How likely are you to buy this?” metric beloved by marketers, by transforming free-form text into structured survey data.
In a paper published last week, the team introduced a method called “Semantic Similarity Rating” (SSR), which converts the model’s open-ended responses into numerical Likert scores, the five-point scale used in traditional consumer research.
Rather than asking a model to pick a number between one and five, the researchers had it answer naturally (“I’d definitely buy this,” or “Maybe if it were on sale”) and then measured how semantically close those statements were to canonical answers like “I would definitely buy this” or “I would not buy this.”
Each answer was mapped in embedding space to the closest reference statement, effectively turning LLM text into statistical scores. “We show that optimizing for semantic similarity rather than numeric labels yields purchase-intent distributions that closely match human survey data,” the authors wrote. “LLM-generated responses achieved 90% of the reliability of repeated human surveys while preserving natural variation in attitudes.”
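In code, the idea looks something like the minimal sketch below. This is not the authors’ implementation; the embedding model and the five anchor wordings are illustrative assumptions, and any off-the-shelf sentence-embedding model could stand in.

```python
# Minimal sketch of embedding-based Likert mapping (not the authors' code).
# Assumes the sentence-transformers package; anchor wordings are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical reference statements for a 1-5 purchase-intent scale.
anchors = {
    1: "I would definitely not buy this.",
    2: "I would probably not buy this.",
    3: "I might or might not buy this.",
    4: "I would probably buy this.",
    5: "I would definitely buy this.",
}

def likert_score(response: str) -> int:
    """Map a free-text answer to the Likert point whose anchor is closest in embedding space."""
    response_vec = model.encode(response, convert_to_tensor=True)
    anchor_vecs = model.encode(list(anchors.values()), convert_to_tensor=True)
    similarities = util.cos_sim(response_vec, anchor_vecs)[0]  # cosine similarity to each anchor
    return list(anchors.keys())[int(similarities.argmax())]

print(likert_score("Maybe if it were on sale."))  # should land near the middle of the scale
```

Taking the single nearest anchor, as here, is the simplest reading of the approach; the authors’ quote suggests the similarity scores themselves, rather than hard numeric labels, are what preserve the human-like spread of responses.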
In tests across 9,300 real human survey responses about personal-care products, the SSR method produced synthetic respondents whose Likert distributions nearly mirrored the originals. In other words: when asked to “think like consumers,” the models did.
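How close is “nearly mirrored”? As a rough illustration with invented numbers (not the study’s data or its reported metric), one simple way to quantify the gap between a human and a synthetic Likert distribution is the earth mover’s distance:

```python
# Hypothetical comparison of two Likert distributions; the shares below are invented.
import numpy as np
from scipy.stats import wasserstein_distance

points = np.arange(1, 6)                              # Likert scale points 1-5
human = np.array([0.10, 0.15, 0.30, 0.30, 0.15])      # illustrative human response shares
synthetic = np.array([0.12, 0.14, 0.28, 0.31, 0.15])  # illustrative SSR-derived shares

# Earth mover's distance: 0 means the distributions are identical.
print(wasserstein_distance(points, points, human, synthetic))
```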
Why it matters
The finding could reshape how companies conduct product testing and market research. Consumer surveys are notoriously expensive, slow, and prone to bias. Synthetic respondents, if they behave like real ones, could let companies screen thousands of products or messages for a fraction of the cost.
It also validates a deeper claim: that the geometry of an LLM’s semantic space encodes not just language understanding but attitudinal reasoning. By comparing answers in embedding space rather than treating them as literal text, the study shows that model semantics can stand in for human judgment with surprising fidelity.
At the same time, it raises familiar ethical and methodological risks. The researchers tested only one product category, leaving open whether the same approach would hold for financial decisions or politically charged topics. And synthetic “consumers” could easily become synthetic targets: the same modeling techniques could help optimize political persuasion, advertising, or behavioral nudges.
As the authors put it, “market-driven optimization pressures can systematically erode alignment,” a phrase that resonates far beyond marketing.
A note of skepticism
The authors acknowledge that their test domain, personal-care products, is narrow and may not generalize to high-stakes or emotionally charged purchases. The SSR mapping also depends on carefully chosen reference statements: small wording changes can skew results. Moreover, the study relies on human survey data as “ground truth,” even though such data is notoriously noisy and culturally biased.
Critics point out that embedding-based similarity assumes language vectors map neatly onto human attitudes, an assumption that can fail when context or irony enters the mix. The paper’s own reliability figure (90% of human test-retest consistency) sounds impressive but still leaves room for significant drift. In short, the method works on average, but it’s not yet clear whether those averages capture real human diversity or simply reflect the model’s training priors.
The bigger picture
Academic interest in “synthetic consumer modeling” has surged in 2025 as companies experiment with AI-based focus groups and predictive polling. Similar work from MIT and the University of Cambridge has shown that LLMs can mimic demographic and psychometric segments with moderate reliability, but none had previously demonstrated such a close statistical match to real purchase-intent data.
For now, the SSR method remains a research prototype, but it hints at a future in which LLMs don’t just answer questions but represent the public itself.
Whether that’s an advance or a hallucination in the making is still up for debate.