EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
2510.19414v1
eess.AS, cs.AI, cs.SD
2025-10-24
Authors:
Tong Zhang, Yihuan Huang, Yanzhen Ren
Abstract
The growing prevalence of speech deepfakes has raised serious concerns,
particularly in real-world scenarios such as telephone fraud and identity
theft. While many anti-spoofing systems have demonstrated promising performance
on lab-generated synthetic speech, they often fail when confronted with
physical replay attacks, a common and low-cost form of attack used in practical
settings. Our experiments show that models trained on existing datasets exhibit
severe performance degradation, with average accuracy dropping to 59.6% when
evaluated on replayed audio. To bridge this gap, we present EchoFake, a
comprehensive dataset comprising more than 120 hours of audio from over 13,000
speakers, featuring both cutting-edge zero-shot text-to-speech (TTS) speech and
physical replay recordings collected under varied devices and real-world
environmental settings. Additionally, we evaluate three baseline detection
models and show that models trained on EchoFake achieve lower average EERs
across datasets, indicating better generalization. By introducing more
practical challenges relevant to real-world deployment, EchoFake offers a more
realistic foundation for advancing spoofing detection methods.