emg2speech: synthesizing speech from electromyography using self-supervised speech models
2510.23969v1
cs.SD, cs.CL, eess.AS
2025-10-30
Authors:
Harshavardhana T. Gowda, Lee M. Miller
Abstract
We present a neuromuscular speech interface that translates electromyographic
(EMG) signals collected from orofacial muscles during speech articulation
directly into audio. We show that self-supervised speech (SS) representations
exhibit a strong linear relationship with the electrical power of muscle action
potentials: SS features can be linearly mapped to EMG power with a correlation
of $r = 0.85$. Moreover, EMG power vectors corresponding to different
articulatory gestures form structured and separable clusters in feature space.
This relationship,
$\text{SS features} \xrightarrow{\texttt{linear mapping}} \text{EMG power} \xrightarrow{\texttt{gesture-specific clustering}} \text{articulatory movements}$,
highlights that SS models implicitly encode
articulatory mechanisms. Leveraging this property, we directly map EMG signals
to SS feature space and synthesize speech, enabling end-to-end EMG-to-speech
generation without explicit articulatory models or vocoder training.
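
To make the "linear mapping with a correlation of $r$" claim concrete, the following is a minimal sketch of how such a relationship could be measured: fit an ordinary least-squares map from feature vectors to per-channel power values, then report the Pearson correlation on held-out frames. It is not the authors' pipeline. The data are synthetic stand-ins, and the dimensions (16 features, 4 channels, 500 frames), the noise level, and the helper names `fit_linear_map`, `predict`, and `mean_pearson_r` are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_map(X, Y):
    """Least-squares fit of Y ~ [X, 1] @ W; returns W with shape (d_in + 1, d_out)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
    return W


def predict(X, W):
    """Apply a map fitted by fit_linear_map to new inputs."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return X_aug @ W


def mean_pearson_r(Y_true, Y_pred):
    """Pearson correlation per output channel, averaged over channels."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return float(((yt * yp).sum(axis=0) / denom).mean())


# Synthetic stand-ins (assumption): rows are time frames.
rng = np.random.default_rng(0)
ss_features = rng.normal(size=(500, 16))   # placeholder for SS feature vectors
true_map = rng.normal(size=(16, 4))        # hidden linear relationship
emg_power = ss_features @ true_map + 0.5 * rng.normal(size=(500, 4))

# Fit on the first 400 frames, evaluate on the remaining 100.
W = fit_linear_map(ss_features[:400], emg_power[:400])
r = mean_pearson_r(emg_power[400:], predict(ss_features[400:], W))
print(f"held-out mean Pearson r = {r:.2f}")
```

With real recordings, `ss_features` would be replaced by time-aligned frames from a self-supervised speech model and `emg_power` by the corresponding per-channel EMG power estimates. How the paper aligns frames and computes power is not stated in the abstract.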