LTA-L2S: Lexical Tone-Aware Lip-to-Speech Synthesis for Mandarin with Cross-Lingual Transfer Learning

arXiv:2509.25670v1 [cs.SD, cs.CV], 2025-10-02

Authors:

Kang Yang, Yifan Liang, Fangkun Liu, Zhenping Xie, Chengshi Zheng

Abstract

Lip-to-speech (L2S) synthesis for Mandarin is a significant challenge, hindered by complex viseme-to-phoneme mappings and the critical role of lexical tones in intelligibility. To address this issue, we propose Lexical Tone-Aware Lip-to-Speech (LTA-L2S). To tackle viseme-to-phoneme complexity, our model adapts an English pre-trained audio-visual self-supervised learning (SSL) model via a cross-lingual transfer learning strategy. This strategy not only transfers universal knowledge learned from extensive English data to the Mandarin domain but also circumvents the prohibitive cost of training such a model from scratch. To specifically model lexical tones and enhance intelligibility, we further employ a flow-matching model to generate the F0 contour. This generation process is guided by ASR-fine-tuned SSL speech units, which contain crucial suprasegmental information. The overall speech quality is then elevated through a two-stage training paradigm, where a flow-matching postnet refines the coarse spectrogram from the first stage. Extensive experiments demonstrate that LTA-L2S significantly outperforms existing methods in both speech intelligibility and tonal accuracy.
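The flow-matching step mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function names, the straight-line path between noise and data, the fixed-step Euler sampler, and the example F0 values are all assumptions made for illustration. In particular, `oracle_velocity` is a hypothetical stand-in for a trained network; it reads the target contour directly, whereas a real model would be conditioned on speech units and would never see the target at inference time.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x0, x1, t, cond):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    pred = predict_velocity(xt, t, cond)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def euler_sample(predict_velocity, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays strictly below 1, so the oracle never divides by zero
        v = predict_velocity(x, t, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for a trained network (assumption).

    Here `cond` is simply the target contour, which gives the exact
    conditional velocity (x1 - x) / (1 - t) for the straight-line path.
    """
    return [(c - xi) / (1.0 - t) for c, xi in zip(cond, x)]

# Made-up falling F0 contour in Hz, one value per frame (illustrative only).
f0_target = [220.0, 210.0, 180.0, 150.0]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in f0_target]

# Training-side quantity: the regression loss at a single time step.
loss = flow_matching_loss(oracle_velocity, noise, f0_target, 0.3, f0_target)

# Inference-side quantity: a contour obtained by integrating the velocity field.
sampled = euler_sample(oracle_velocity, noise, f0_target, steps=10)
```

With the oracle in place of a learned model, the loss is zero up to rounding and the sampler lands on the target contour, which makes the sketch easy to check. In a trained system the velocity predictor would be a neural network, and the loss would be averaged over random time steps and noise samples.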
