Neural Codecs as Biosignal Tokenizers
2510.09095v1
cs.LG, cs.NE
2025-10-14
Authors:
Kleanthis Avramidis, Tiantian Feng, Woojae Jeong, Jihwan Lee, Wenhui Cui, Richard M Leahy, Shrikanth Narayanan
Abstract
Neurophysiological recordings such as electroencephalography (EEG) offer
accessible and minimally invasive means of estimating physiological activity
for applications in healthcare, diagnostic screening, and even immersive
entertainment. However, these recordings yield high-dimensional, noisy
time-series data that typically require extensive pre-processing and
handcrafted feature extraction to reveal meaningful information. Recently,
there has been a surge of interest in applying representation learning
techniques from large pre-trained (foundation) models to effectively decode and
interpret biosignals. We discuss the challenges of incorporating such
methods and introduce BioCodec, an alternative representation learning
framework inspired by neural codecs to capture low-level signal characteristics
in the form of discrete tokens. Pre-trained on thousands of hours of EEG,
BioCodec proves effective across multiple downstream tasks, ranging from
clinical diagnosis and sleep physiology to decoding speech and motor imagery,
particularly in low-resource settings. Additionally, we provide a qualitative
analysis of codebook usage and estimate the spatial coherence of codebook
embeddings from EEG connectivity. Notably, we also document the suitability of
our method for other biosignal data, specifically electromyographic (EMG) signals.
Overall, the proposed approach provides a versatile solution for biosignal
tokenization that performs competitively with state-of-the-art models. The
source code and model checkpoints are shared.
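The abstract does not describe BioCodec's quantizer in detail. Neural audio codecs such as SoundStream and EnCodec typically discretize encoder outputs with residual vector quantization (RVQ), where each stage quantizes whatever the earlier stages left unexplained. The Python below is a minimal sketch under that assumption, not BioCodec's actual implementation. It tokenizes toy 2-D "frames" with two hand-picked codebooks and summarizes codebook usage with perplexity, which is one common usage statistic. All function names, codebooks, and data are hypothetical.

```python
import math
from collections import Counter


def nearest(codebook, x):
    """Return the index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))


def rvq_encode(frames, codebooks):
    """Residual VQ: each stage quantizes the residual left by the previous stages.

    Returns one list of token ids per frame, with one id per stage.
    """
    tokens = []
    for x in frames:
        residual = list(x)
        ids = []
        for cb in codebooks:
            i = nearest(cb, residual)
            ids.append(i)
            residual = [r - c for r, c in zip(residual, cb[i])]
        tokens.append(ids)
    return tokens


def rvq_decode(tokens, codebooks, n_stages=None):
    """Reconstruct each frame as the sum of its selected codewords (optionally only the first n_stages)."""
    n_stages = len(codebooks) if n_stages is None else n_stages
    dim = len(codebooks[0][0])
    frames = []
    for ids in tokens:
        y = [0.0] * dim
        for cb, i in list(zip(codebooks, ids))[:n_stages]:
            y = [a + b for a, b in zip(y, cb[i])]
        frames.append(y)
    return frames


def squared_error(xs, ys):
    """Total squared reconstruction error over all frames."""
    return sum((a - b) ** 2 for x, y in zip(xs, ys) for a, b in zip(x, y))


def usage_perplexity(ids):
    """exp(entropy) of token usage: 1 if a single code is used, K if K codes are used uniformly."""
    n = len(ids)
    probs = [c / n for c in Counter(ids).values()]
    return math.exp(-sum(p * math.log(p) for p in probs))


# Hypothetical toy data: 2-D "frames" standing in for encoder outputs,
# and two hand-picked 4-entry codebooks (coarse stage, then fine stage).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]
frames = [[1.2, 1.0], [-1.0, -1.3], [0.9, -0.8], [0.1, 0.2]]

tokens = rvq_encode(frames, codebooks)
print(tokens)  # [[1, 1], [2, 3], [3, 2], [0, 2]]
print(squared_error(frames, rvq_decode(tokens, codebooks, n_stages=1)))
print(squared_error(frames, rvq_decode(tokens, codebooks)))
print([usage_perplexity([t[s] for t in tokens]) for s in range(len(codebooks))])
```

Each codebook here contains a zero vector, so adding a stage can never increase the reconstruction error. In this toy example the two-stage error (0.0925) is therefore lower than the one-stage error (0.23). A stage-1 usage perplexity of 4 means all four codes are used equally often. Values well below the codebook size would indicate underused entries.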