You Have Been LaTeXpOsEd: A Systematic Analysis of Information Leakage in Preprint Archives Using Large Language Models
2510.03761v1
cs.CR, cs.AI
2025-10-08
Авторы:
Richard A. Dubniczky, Bertalan Borsos, Tihanyi Norbert
Abstract
The widespread use of preprint repositories such as arXiv has accelerated the
communication of scientific results but also introduced overlooked security
risks. Beyond PDFs, these platforms provide unrestricted access to original
source materials, including LaTeX sources, auxiliary code, figures, and
embedded comments. In the absence of sanitization, submissions may disclose
sensitive information that adversaries can harvest using open-source
intelligence. In this work, we present the first large-scale security audit of
preprint archives, analyzing more than 1.2 TB of source data from 100,000 arXiv
submissions. We introduce LaTeXpOsEd, a four-stage framework that integrates
pattern matching, logical filtering, traditional harvesting techniques, and
large language models (LLMs) to uncover hidden disclosures within
non-referenced files and LaTeX comments. To evaluate LLMs' secret-detection
capabilities, we introduce LLMSec-DB, a benchmark on which we tested 25
state-of-the-art models. Our analysis uncovered thousands of PII leaks,
GPS-tagged EXIF files, publicly available Google Drive and Dropbox folders,
editable private SharePoint links, exposed GitHub and Google credentials, and
cloud API keys. We also uncovered confidential author communications, internal
disagreements, and conference submission credentials, exposing information that
poses serious reputational risks to both researchers and institutions. We urge
the research community and repository operators to take immediate action to
close these hidden security gaps. To support open science, we release all
scripts and methods from this study but withhold sensitive findings that could
be misused, in line with ethical principles. The source code and related
material are available at the project website https://github.com/LaTeXpOsEd
Ссылки и действия
Дополнительные ресурсы: