James Ding
Feb 18, 2026 20:38
Harvey’s BigLaw Bench: Global doubles benchmark size, testing AI legal capabilities across jurisdictions as model scores hit 90% on core tasks.
Harvey AI launched BigLaw Bench: Global on February 18, more than doubling its public benchmark dataset with new evaluations for UK, Australian, and Spanish legal systems. The expansion marks the first major update since Harvey announced plans to scale BLB fivefold earlier this month.
The timing matters. Leading foundation models now hit roughly 90% on BLB’s core legal tasks, up from around 60% in 2024. But Harvey’s internal research shows that performance degrades when models tackle jurisdiction-specific work. BLB: Global aims to quantify exactly where that localization gap exists.
Six Task Categories Under the Microscope
Harvey built the benchmark around six workflows its enterprise clients actually use: drafting, long document review, document comparison, public research, multi-document analysis, and extraction. Each task was designed by local practitioners in collaboration with Mercor, then cross-reviewed by Harvey’s applied legal researchers.
The scenarios get specific. One UK task asks models to advise on FCA enforcement risks when a CSO sells shares before a failed drug trial announcement. A Spanish benchmark involves analyzing CNMC antitrust exposure for tech companies caught in a no-poach agreement. Australian tasks include FIRB approval determinations for infrastructure fund acquisitions.
“The goal of BLB: Global is to help understand and remediate where foundation models struggle to localize effectively on core AI tasks,” Harvey stated in the announcement.
Why This Matters for Enterprise AI Adoption
Law firms operating across borders face a real problem: an AI assistant that handles Delaware corporate law brilliantly may stumble over UK financial regulations or Spanish competition law. Without standardized benchmarks, there is no way to verify consistent quality across offices.
Harvey’s approach, building jurisdiction-specific tasks with over two dozen local experts, creates a baseline for measuring that consistency. The company plans to extend BLB: Arena, its preference-based evaluation system launched in November 2025, to international markets as well.
More countries are coming. Harvey indicated it will continue building local expert cohorts and deepening existing datasets based on customer feedback. For legal tech buyers evaluating AI vendors, BLB: Global offers something that did not exist before: a standardized way to compare model performance on real legal work across multiple jurisdictions.
Image source: Shutterstock