In brief
- Law professors preferred AI-generated contract law answers over those written by fellow professors about 75% of the time.
- AI responses were flagged as harmful less often than professor-written responses.
- Researchers said the results show that large language models can align with professional standards.
Law professors preferred answers generated by artificial intelligence over answers written by fellow professors, according to a recent study led by Stanford University that examined how large language models perform on legal reasoning tasks.
In the study, 16 professors from 14 U.S. law schools (including Stanford, Yale, New York University, the University of Chicago, Georgetown, UCLA, and the University of Virginia) created 40 contract law questions covering legal doctrine, case law, hypotheticals, and policy issues. Researchers saw it as a good way to test the capabilities of modern AI.
“Large language models (LLMs) are increasingly promoted as educational tutors, yet most evaluations focus on domains with a single ground truth,” the researchers wrote. “Many disciplines, however, hinge on judgment: reasoning, weighing ambiguity, and reaching defensible conclusions. Law provides a sharp test.”
In 2,918 blinded comparisons, professors selected the answer they would rather give a student. Google’s Gemini 2.5 Pro won 75.92% of its matchups against human instructors, while the tech giant’s NotebookLM won 74.75% of the time, giving AI-generated answers the nod over human-written ones in roughly three-quarters of comparisons.
To determine whether the results reflected a broader professional consensus, the researchers analyzed how often professors agreed when evaluating the same answer pairs.
“Observed agreement exceeded the level expected if judgments were entirely idiosyncratic, indicating that the LLMs’ success reflects alignment with common disciplinary criteria,” they wrote.
The study found that AI models also outperformed human instructors across multiple categories, including recall questions on cases, code, or doctrine; hypotheticals; and policy discussions.
“To probe whether any LLM advantage might be driven by surface-level writing style rather than substantive content, we additionally engineered a set of lexico-syntactic features (answer length, structural organization, reasoning nuance, legal anchors, confidence tone, readability, and pedagogical support) and tested how much of the preference pattern they could explain,” the study said.
AI-generated answers were also flagged as harmful less often than those written by professors, with Gemini recording a 3.41% harmfulness rate and NotebookLM 3.64%, compared with 12.06% for human instructors. In a separate analysis of additional models, Anthropic’s Claude Opus 4.7 ranked first, followed by OpenAI’s ChatGPT 5.4 and Gemini 2.5 Pro, while every AI model evaluated outperformed human instructors on average.
The researchers cautioned that the study did not measure whether the answers matched each professor’s individual teaching preferences, leaving open the possibility that AI-generated responses were viewed as generally acceptable rather than tailored to any one instructor’s approach.
“While LLM responses are generally preferred over those of human instructors, our evaluation setting does not allow us to directly measure the extent to which instructor preferences are satisfied,” the study said. “It is at least theoretically possible that LLMs, although generally delivering stronger responses, still generate answers that are merely viewed as ‘ok.’”
The study comes as courts, law firms, and law schools increasingly grapple with how artificial intelligence should be used in the legal profession.
In March, the Los Angeles Superior Court began testing AI tools to help judges manage rising caseloads, while law schools are adding AI training programs.
“The potential benefits of these new technologies as a force multiplier in the practice of law simply can’t be ignored,” Mississippi College School of Law Dean John P. Anderson previously told Decrypt. “Whether our students plan to be litigators or transactional lawyers, their future employers will expect familiarity with these AI tools. We want the firms hiring our students to be confident that every MC Law grad is competent in AI technologies.”
At the same time, however, law firms continue to confront cases undermined by hallucinations and other AI-generated errors. In April, law firm Sullivan & Cromwell admitted to a U.S. bankruptcy court that a recent filing in a high-profile case contained fake citations generated by AI.

