The Massive Legal Embedding Benchmark (MLEB)

arXiv:2510.19365v1 [cs.CL, cs.AI, cs.IR], 2025-10-24

Authors:

Umar Butler, Abdur-Rahman Butler, Adrian Lucas Malec

Abstract

We present the Massive Legal Embedding Benchmark (MLEB), the largest, most diverse, and most comprehensive open-source benchmark for legal information retrieval to date. MLEB consists of ten expert-annotated datasets spanning multiple jurisdictions (the US, UK, EU, Australia, Ireland, and Singapore), document types (cases, legislation, regulatory guidance, contracts, and literature), and task types (search, zero-shot classification, and question answering). Seven of the datasets in MLEB were newly constructed in order to fill domain and jurisdictional gaps in the open-source legal information retrieval landscape. We document our methodology in building MLEB and creating the new constituent datasets, and release our code, results, and data openly to assist with reproducible evaluations.
