On Deepfake Voice Detection -- It's All in the Presentation
2509.26471v1
eess.AS, cs.AI
2025-10-02
Authors:
Héctor Delgado, Giorgio Ramondetti, Emanuele Dalmasso, Gennady Karvitsky, Daniele Colibro, Haydar Talib
Abstract
While the technologies empowering malicious audio deepfakes have dramatically
evolved in recent years due to generative AI advances, the same cannot be said
of global research into spoofing (deepfake) countermeasures. This paper
highlights how current deepfake datasets and research methodologies led to
systems that failed to generalize to real-world applications. The main reason is
the difference between raw deepfake audio and deepfake audio that has
been presented through a communication channel, e.g. by phone. We propose a new
framework for data creation and research methodology, allowing for the
development of spoofing countermeasures that would be more effective in
real-world scenarios. By following the guidelines outlined here we improved
deepfake detection accuracy by 39% in more robust and realistic lab setups, and
by 57% on a real-world benchmark. We also demonstrate that improving
datasets would have a bigger impact on deepfake detection accuracy than
choosing larger SOTA models over smaller ones; that is, it would be
more important for the scientific community to make greater investments in
comprehensive data collection programs than to simply train larger models with
higher computational demands.
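
To illustrate what "presented through a communication channel" can mean in practice, the following is a minimal sketch of a crude telephone-channel simulation: naive 16 kHz to 8 kHz decimation followed by 8-bit mu-law companding. It is an assumption made for illustration only, not the data-creation pipeline described in the paper. All function names are hypothetical, the continuous mu-law formula stands in for the segmented G.711 encoding, and block averaging stands in for a proper anti-aliasing filter.

```python
import math

MU = 255  # mu-law companding constant (assumed; the value used in 8-bit telephony)


def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to the mu-law companded domain [-1, 1]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range input
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mulaw_expand(y: float) -> float:
    """Invert mulaw_compress (continuous formula, not the G.711 segment table)."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(y: float) -> float:
    """Snap a value in [-1, 1] to one of 256 uniformly spaced levels."""
    level = round((y + 1.0) / 2.0 * 255)
    return level / 255 * 2.0 - 1.0


def simulate_phone_channel(samples, src_rate=16000, dst_rate=8000):
    """Return a degraded copy of `samples` (hypothetical helper).

    Each group of src_rate // dst_rate samples is averaged (a crude
    low-pass and decimation step), then passed through 8-bit mu-law
    compression and expansion to mimic narrowband codec distortion.
    """
    step = src_rate // dst_rate
    out = []
    for i in range(0, len(samples) - step + 1, step):
        mean = sum(samples[i:i + step]) / step
        out.append(mulaw_expand(quantize_8bit(mulaw_compress(mean))))
    return out


if __name__ == "__main__":
    # 10 ms of a 440 Hz tone at 16 kHz as a stand-in for raw (unpresented) audio.
    raw = [0.8 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
    presented = simulate_phone_channel(raw)
    print(len(raw), len(presented))
```

A detector trained only on signals like `raw` never sees the bandwidth loss and quantization noise present in `presented`, which is the kind of mismatch the abstract attributes poor real-world generalization to.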