XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection

2510.06706v1 cs.SD, cs.CL, eess.AS 2025-10-10

Авторы:

Phuong Tuan Dat, Tran Huy Dat

Abstract

Recent advancements in speech synthesis technologies have led to increasingly sophisticated spoofing attacks, posing significant challenges for automatic speaker verification systems. While systems based on self-supervised learning (SSL) models, particularly the XLSR-Conformer architecture, have demonstrated remarkable performance in synthetic speech detection, there remains room for architectural improvements. In this paper, we propose a novel approach that replaces the traditional Multi-Layer Perceptron (MLP) in the XLSR-Conformer model with a Kolmogorov-Arnold Network (KAN), a powerful universal approximator based on the Kolmogorov-Arnold representation theorem. Our experimental results on ASVspoof2021 demonstrate that the integration of KAN to XLSR-Conformer model can improve the performance by 60.55% relatively in Equal Error Rate (EER) LA and DF sets, further achieving 0.70% EER on the 21LA set. Besides, the proposed replacement is also robust to various SSL architectures. These findings suggest that incorporating KAN into SSL-based models is a promising direction for advances in synthetic speech detection.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection

Авторы:

Abstract

Ссылки и действия

Связанные статьи

SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level

emg2speech: synthesizing speech from electromyography using self-supervised spee...

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models

Sci-Phi: A Large Language Model Spatial Audio Descriptor

Навигация