FLToP CTC: Frame-Level Token Pruning via Relative Threshold for Efficient and Memory-Saving Decoding on Diverse Platforms
2510.09085v1
cs.LG, cs.SD, eess.AS
2025-10-14
Authors:
Atul Shree, Harshith Jupuru
Abstract
CTC-based ASR systems face computational and memory bottlenecks in
resource-limited environments. Traditional CTC decoders can account for up to
90% of processing time in some systems (e.g., wav2vec2-large on L4 GPUs) and
are inefficient because of exhaustive token-level operations. This paper introduces
Frame Level Token Pruning for Connectionist Temporal Classification (FLToP
CTC), a novel decoding algorithm that employs frame-level token pruning guided
by a relative threshold probability. By dynamically eliminating low-probability
tokens in each frame, FLToP CTC reduces compute and memory demands with
negligible WER degradation. On LibriSpeech, FLToP CTC achieves a
10.5x runtime speedup and 2.78x memory reduction versus standard CTC decoders.
Its simplicity enables seamless integration into CTC decoders across platforms
(CPUs, GPUs, etc.). FLToP CTC addresses CTC decoding bottlenecks, offering
scalability for resource-limited environments and real-time applications and
improving the accessibility and efficiency of speech recognition.
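
The abstract does not spell out the exact pruning rule, so the following is only a minimal sketch of one plausible reading: in each frame, keep the tokens whose probability is at least a fixed fraction of that frame's highest token probability, and pass only those tokens on to the decoder. The function names `prune_frame` and `prune_frames`, the parameter `rel_threshold`, and the example values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def prune_frame(probs: Sequence[float], rel_threshold: float) -> List[int]:
    """Return indices of tokens that survive pruning in one frame.

    Assumed rule (not confirmed by the abstract): a token is kept when its
    probability is at least rel_threshold times the frame's top probability.
    """
    if not 0.0 <= rel_threshold <= 1.0:
        raise ValueError("rel_threshold must lie in [0, 1]")
    cutoff = rel_threshold * max(probs)
    # The top token always satisfies p >= cutoff, so no frame ends up empty.
    return [i for i, p in enumerate(probs) if p >= cutoff]


def prune_frames(
    frames: Sequence[Sequence[float]], rel_threshold: float
) -> List[List[int]]:
    """Apply the per-frame rule independently to every frame."""
    return [prune_frame(frame, rel_threshold) for frame in frames]


# Hypothetical per-frame token posteriors (2 frames, 4 tokens).
frames = [
    [0.90, 0.05, 0.03, 0.02],  # confident frame: one dominant token
    [0.40, 0.35, 0.15, 0.10],  # ambiguous frame: two close candidates
]
kept = prune_frames(frames, rel_threshold=0.5)
print(kept)
```

Under this assumed rule, a confident frame keeps very few tokens while an ambiguous frame keeps more, so the number of candidates a downstream decoder has to expand adapts to each frame rather than being fixed in advance.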