Robust Neural Audio Fingerprinting using Music Foundation Models
2511.05399v1
cs.SD, cs.AI
2025-11-11
Authors:
Shubhr Singh, Kiran Bhat, Xavier Riley, Benjamin Resnick, John Thickstun, Walter De Brouwer
Abstract
The proliferation of distorted, compressed, and manipulated music on modern
media platforms like TikTok motivates the development of more robust audio
fingerprinting techniques to identify the sources of musical recordings. In
this paper, we develop and evaluate new neural audio fingerprinting techniques
with the aim of improving their robustness. We make two contributions to neural
fingerprinting methodology: (1) we use a pretrained music foundation model as
the backbone of the neural architecture and (2) we expand the use of data
augmentation to train fingerprinting models under a wide variety of audio
manipulations, including time stretching, pitch modulation, compression, and
filtering. We systematically evaluate our methods in comparison to two
state-of-the-art neural fingerprinting models: NAFP and GraFPrint. Results show
that fingerprints extracted with music foundation models (e.g., MuQ, MERT)
consistently outperform models trained from scratch or pretrained on
non-musical audio. Segment-level evaluation further shows that these models can
accurately localize fingerprint matches, an important practical feature for
catalog management.
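The abstract does not say how matches are localized at the segment level. The sketch below is only an illustration of one generic approach, not the authors' method. It assumes that each audio segment has already been mapped to a fixed-length fingerprint vector by some model. It then slides the query's sequence of segment fingerprints over a reference track's sequence and returns the offset (in segments) with the highest mean cosine similarity. The function names `cosine` and `localize` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localize(query: List[Vector], reference: List[Vector]) -> Tuple[int, float]:
    """Hypothetical segment-level matcher: slide the query's per-segment
    fingerprints over the reference and return (best_offset, mean_similarity),
    where best_offset is measured in segments from the start of the reference."""
    if not query or len(query) > len(reference):
        raise ValueError("query must be non-empty and no longer than the reference")
    best_offset, best_score = 0, -math.inf
    # Try every alignment where the whole query fits inside the reference.
    for offset in range(len(reference) - len(query) + 1):
        score = sum(
            cosine(q, reference[offset + i]) for i, q in enumerate(query)
        ) / len(query)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

For example, with toy 2-D fingerprints `reference = [[1, 0], [0, 1], [1, 1], [1, -1]]` and `query = [[1, 1], [1, -1]]`, `localize(query, reference)` picks offset 2 with a mean similarity very close to 1.0. A production system would instead search a large catalog with an approximate nearest-neighbour index rather than this exhaustive scan.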