XLSR-Kanformer: A KAN-Integrated model for Synthetic Speech Detection
2510.06706v1
cs.SD, cs.CL, eess.AS
2025-10-10
Authors:
Phuong Tuan Dat, Tran Huy Dat
Abstract
Recent advancements in speech synthesis technologies have led to increasingly
sophisticated spoofing attacks, posing significant challenges for automatic
speaker verification systems. While systems based on self-supervised learning
(SSL) models, particularly the XLSR-Conformer architecture, have demonstrated
remarkable performance in synthetic speech detection, there remains room for
architectural improvements. In this paper, we propose a novel approach that
replaces the traditional Multi-Layer Perceptron (MLP) in the XLSR-Conformer
model with a Kolmogorov-Arnold Network (KAN), a powerful universal approximator
based on the Kolmogorov-Arnold representation theorem. Our experimental results
on ASVspoof2021 demonstrate that integrating KAN into the XLSR-Conformer model
yields a relative improvement of 60.55% in Equal Error Rate (EER) on the LA and
DF sets, and achieves 0.70% EER on the 21LA set. Moreover, the proposed
replacement is robust across various SSL architectures. These findings suggest
that incorporating KAN into SSL-based models is a promising direction for
advances in synthetic speech detection.
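
To make the MLP-to-KAN replacement described in the abstract more concrete, the following is a minimal sketch of a KAN-style layer in plain Python. It is not the authors' implementation, which the abstract does not show. The class name `KANLayerSketch`, its parameters, and the random initialization ranges are hypothetical, and Gaussian radial basis functions are swapped in for the B-splines commonly used in KANs, purely to keep the example short. The idea it illustrates is that each input-output edge carries its own learnable univariate function, and each output is the sum of those functions over the inputs.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU activation, used here as the fixed 'base' part of each edge function."""
    return x / (1.0 + math.exp(-x))


class KANLayerSketch:
    """Toy KAN-style layer (hypothetical name): out[j] = sum_i phi_ji(x[i]).

    Each edge function is
        phi_ji(x) = w_base[j][i] * silu(x) + sum_k coef[j][i][k] * rbf_k(x).
    Assumption: Gaussian RBFs stand in for B-splines to keep the sketch short.
    """

    def __init__(self, in_dim, out_dim, num_basis=5, x_min=-1.0, x_max=1.0, seed=0):
        if num_basis < 2:
            raise ValueError("num_basis must be at least 2")
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        # Evenly spaced RBF centers on [x_min, x_max]; width equals the spacing.
        step = (x_max - x_min) / (num_basis - 1)
        self.centers = [x_min + k * step for k in range(num_basis)]
        self.width = step
        # One base weight and one coefficient vector per (output, input) edge.
        self.w_base = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)]
                       for _ in range(out_dim)]
        self.coef = [[[rng.uniform(-0.1, 0.1) for _ in range(num_basis)]
                      for _ in range(in_dim)]
                     for _ in range(out_dim)]

    def basis(self, x):
        """Evaluate every Gaussian basis function at the scalar x."""
        return [math.exp(-((x - c) / self.width) ** 2) for c in self.centers]

    def forward(self, x):
        """Map a list of in_dim floats to a list of out_dim floats."""
        if len(x) != self.in_dim:
            raise ValueError("expected %d inputs, got %d" % (self.in_dim, len(x)))
        # The base and basis values depend only on the input, so compute them once.
        feats = [(silu(xi), self.basis(xi)) for xi in x]
        out = []
        for j in range(self.out_dim):
            total = 0.0
            for i, (base, phi) in enumerate(feats):
                total += self.w_base[j][i] * base
                total += sum(c * b for c, b in zip(self.coef[j][i], phi))
            out.append(total)
        return out


# Usage: a 4-input, 2-output layer applied to one feature vector.
layer = KANLayerSketch(in_dim=4, out_dim=2)
print(layer.forward([0.1, -0.3, 0.5, 0.0]))
```

In contrast to an MLP, which applies a fixed nonlinearity after a learned linear map, the nonlinearity here is itself parameterized per edge through `coef`. In a trainable version these coefficients would be optimized by gradient descent inside a deep learning framework.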