Sparse deepfake detection promotes better disentanglement
2510.05696v1
cs.SD, cs.AI, cs.LG
2025-10-09
Authors:
Antoine Teissier, Marie Tahon, Nicolas Dugué, Aghilas Sini
Abstract
Due to the rapid progress of speech synthesis, deepfake detection has become
a major concern in the speech processing community. Because it is a critical
task, systems must not only be efficient and robust, but also provide
interpretable explanations. Among the different approaches for explainability,
we focus on the interpretation of latent representations. In this paper, we
focus on the last layer of embeddings of AASIST, a deepfake detection
architecture. We use a TopK activation inspired by SAEs on this layer to obtain
sparse representations which are used in the decision process. We demonstrate
that sparse deepfake detection can improve detection performance, with an EER
of 23.36% on the ASVSpoof5 test set at 95% sparsity. We then show that these
representations provide better disentanglement, using completeness and
modularity metrics based on mutual information. Notably, some attacks are
directly encoded in the latent space.
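
To illustrate the TopK activation mentioned above, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `topk_activation` and `sparsity`, the 20-dimensional toy vector, and the choice of k are assumptions made for this example. The abstract does not give the embedding size or say whether another nonlinearity precedes the TopK step. The sketch keeps the k largest values of a vector and sets the rest to zero, so keeping 1 value out of 20 gives 95% sparsity.

```python
import heapq


def topk_activation(embedding, k):
    """Keep the k largest values of the vector and set all others to zero."""
    if not 0 < k <= len(embedding):
        raise ValueError("k must be between 1 and len(embedding)")
    # Indices of the k largest activations.
    keep = set(heapq.nlargest(k, range(len(embedding)), key=lambda i: embedding[i]))
    return [v if i in keep else 0.0 for i, v in enumerate(embedding)]


def sparsity(vector):
    """Fraction of entries that are exactly zero."""
    return sum(1 for v in vector if v == 0.0) / len(vector)


# Toy 20-dimensional embedding with made-up values (not taken from AASIST).
embedding = [0.1 * ((7 * i) % 20 + 1) for i in range(20)]

# Keeping 1 of 20 dimensions leaves 19 zeros, i.e. 95% sparsity.
sparse = topk_activation(embedding, k=1)
print(sparsity(sparse))
```

In a real detector, the sparse vector would replace the dense embedding as input to the final classification layer, so the decision depends only on the few active dimensions.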