emg2speech: synthesizing speech from electromyography using self-supervised speech models

2510.23969v1 cs.SD, cs.CL, eess.AS 2025-10-30

Authors:

Harshavardhana T. Gowda, Lee M. Miller

Abstract

We present a neuromuscular speech interface that translates electromyographic (EMG) signals collected from orofacial muscles during speech articulation directly into audio. We show that self-supervised speech (SS) representations exhibit a strong linear relationship with the electrical power of muscle action potentials: SS features can be linearly mapped to EMG power with a correlation of $r = 0.85$. Moreover, EMG power vectors corresponding to different articulatory gestures form structured and separable clusters in feature space. This relationship, $\text{SS features}$ $\xrightarrow{\texttt{linear mapping}}$ $\text{EMG power}$ $\xrightarrow{\texttt{gesture-specific clustering}}$ $\text{articulatory movements}$, highlights that SS models implicitly encode articulatory mechanisms. Leveraging this property, we directly map EMG signals to SS feature space and synthesize speech, enabling end-to-end EMG-to-speech generation without explicit articulatory models or vocoder training.
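The linear-mapping claim in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data below are synthetic stand-ins, the dimensions (`n_ss_dims`, `n_emg_channels`) and the helper names `fit_linear_map`, `predict`, and `mean_pearson_r` are hypothetical, and ordinary least squares is assumed as the fitting method. The correlation it prints reflects only the synthetic noise level and says nothing about the reported $r = 0.85$.

```python
import numpy as np

def fit_linear_map(ss_features, emg_power):
    """Least-squares fit of emg_power ~ [ss_features, 1] @ W (bias in last row)."""
    X = np.hstack([ss_features, np.ones((ss_features.shape[0], 1))])
    W, *_ = np.linalg.lstsq(X, emg_power, rcond=None)
    return W

def predict(ss_features, W):
    """Apply the fitted affine map to new SS feature frames."""
    X = np.hstack([ss_features, np.ones((ss_features.shape[0], 1))])
    return X @ W

def mean_pearson_r(y_true, y_pred):
    """Pearson correlation per output channel, averaged over channels."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return float(((yt * yp).sum(axis=0) / denom).mean())

# Synthetic stand-in data (assumption): frames of SS features and
# per-channel EMG power that depend on them linearly, plus noise.
rng = np.random.default_rng(0)
n_frames, n_ss_dims, n_emg_channels = 2000, 64, 8
ss = rng.standard_normal((n_frames, n_ss_dims))
true_W = rng.standard_normal((n_ss_dims, n_emg_channels))
emg = ss @ true_W + 0.5 * rng.standard_normal((n_frames, n_emg_channels))

# Fit on the first 1500 frames, evaluate correlation on the held-out rest.
train, test = slice(0, 1500), slice(1500, None)
W = fit_linear_map(ss[train], emg[train])
r = mean_pearson_r(emg[test], predict(ss[test], W))
print(f"held-out mean Pearson r: {r:.3f}")
```

Evaluating on held-out frames keeps the correlation from being inflated by overfitting; with real recordings, the SS features and EMG power would first need to be aligned to a common frame rate.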

Related articles

SP-MCQA: Evaluating Intelligibility of TTS Beyond the Word Level

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models

XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection

Sci-Phi: A Large Language Model Spatial Audio Descriptor
