Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
2510.17900v1
cs.CY, cs.AI, cs.CL
2025-10-23
Authors:
Kush Juvekar, Arghya Bhattacharya, Sai Khadloya, Utkarsh Saxena
Abstract
Large language models (LLMs) are entering legal workflows, yet we lack a
jurisdiction-specific framework to assess their baseline competence therein. We
use India's public legal examinations as a transparent proxy. Our multi-year
benchmark assembles objective screens from top national and state exams and
evaluates open and frontier LLMs under real-world exam conditions. To probe
beyond multiple-choice questions, we also include a lawyer-graded,
paired-blinded study of long-form answers from the Supreme Court's
Advocate-on-Record exam. This is, to our knowledge, the first exam-grounded,
India-specific yardstick for LLM court-readiness released with datasets and
protocols. Our work shows that while frontier systems consistently clear
historical cutoffs and often match or exceed recent top-scorer bands on
objective exams, none surpasses the human topper on long-form reasoning. Grader
notes converge on three reliability failure modes: procedural or format
compliance, authority or citation discipline, and forum-appropriate voice and
structure. These findings delineate where LLMs can assist (checks,
cross-statute consistency, statute and precedent lookups) and where human
leadership remains essential: forum-specific drafting and filing, procedural
and relief strategy, reconciling authorities and exceptions, and ethical,
accountable judgment.