Beyond Ranked Lists: The SARAL Framework for Cross-Lingual Document Set Retrieval

2511.03228v1 cs.CL, cs.IR 2025-11-07
Authors:

Shantanu Agarwal, Joel Barry, Elizabeth Boschee, Scott Miller

Abstract

Machine Translation for English Retrieval of Information in Any Language (MATERIAL) is an IARPA initiative aimed at advancing the state of cross-lingual information retrieval (CLIR). This report describes in detail Information Sciences Institute's (ISI's) Summarization and domain-Adaptive Retrieval Across Language's (SARAL's) effort for MATERIAL. Specifically, we outline our team's novel approach to CLIR, with an emphasis on retrieving a query-relevant document set rather than only a ranked document list. In MATERIAL's Phase-3 evaluations, SARAL outperformed the other teams in five of six evaluation conditions spanning three languages (Farsi, Kazakh, and Georgian).
