EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection

2510.19414v1 eess.AS, cs.AI, cs.SD 2025-10-24

Авторы:

Tong Zhang, Yihuan Huang, Yanzhen Ren

Abstract

The growing prevalence of speech deepfakes has raised serious concerns, particularly in real-world scenarios such as telephone fraud and identity theft. While many anti-spoofing systems have demonstrated promising performance on lab-generated synthetic speech, they often fail when confronted with physical replay attacks-a common and low-cost form of attack used in practical settings. Our experiments show that models trained on existing datasets exhibit severe performance degradation, with average accuracy dropping to 59.6% when evaluated on replayed audio. To bridge this gap, we present EchoFake, a comprehensive dataset comprising more than 120 hours of audio from over 13,000 speakers, featuring both cutting-edge zero-shot text-to-speech (TTS) speech and physical replay recordings collected under varied devices and real-world environmental settings. Additionally, we evaluate three baseline detection models and show that models trained on EchoFake achieve lower average EERs across datasets, indicating better generalization. By introducing more practical challenges relevant to real-world deployment, EchoFake offers a more realistic foundation for advancing spoofing detection methods.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection

Авторы:

Abstract

Ссылки и действия

Связанные статьи

BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical ...

DroneAudioset: An Audio Dataset for Drone-based Search and Rescue

Unsupervised Speech Enhancement using Data-defined Priors

Data-Efficient ASR Personalization for Non-Normative Speech Using an Uncertainty...

Selective Classifier-free Guidance for Zero-shot Text-to-speech

Навигация