NeuroSwift: A Lightweight Cross-Subject Framework for fMRI Visual Reconstruction of Complex Scenes
2510.02266v1
cs.CV, cs.HC
2025-10-04
Authors:
Shiyi Zhang, Dong Liang, Yihang Zhou
Abstract
Reconstructing visual information from brain activity via computer vision
technology provides an intuitive understanding of visual neural mechanisms.
Despite progress in decoding fMRI data with generative models, achieving
accurate cross-subject reconstruction of visual stimuli remains challenging and
computationally demanding. This difficulty arises from inter-subject
variability in neural representations and the brain's abstract encoding of core
semantic features in complex visual inputs. To address these challenges, we
propose NeuroSwift, which integrates complementary adapters via diffusion:
AutoKL for low-level features and CLIP for semantics. NeuroSwift's CLIP Adapter
is trained on Stable Diffusion generated images paired with COCO captions to
emulate higher visual cortex encoding. For cross-subject generalization, we
pretrain on one subject and then fine-tune only 17 percent of parameters (fully
connected layers) for new subjects, while freezing other components. This
enables state-of-the-art performance with only one hour of training per subject
on lightweight hardware (three RTX 4090 GPUs), outperforming existing methods.
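
The cross-subject strategy described in the abstract (pretrain on one subject, then update only the fully connected layers for a new subject) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ParamGroup` type, the component names, and the parameter counts are all hypothetical, chosen only so that the fully connected portion comes to 17 percent of the total, as stated in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParamGroup:
    """A named block of model parameters (hypothetical bookkeeping type)."""
    name: str
    num_params: int
    is_fully_connected: bool
    trainable: bool = True


def freeze_all_but_fc(groups: List[ParamGroup]) -> List[ParamGroup]:
    """Mark only fully connected groups as trainable; freeze everything else."""
    for group in groups:
        group.trainable = group.is_fully_connected
    return groups


def trainable_fraction(groups: List[ParamGroup]) -> float:
    """Return the share of parameters that would be updated during fine-tuning."""
    total = sum(g.num_params for g in groups)
    trainable = sum(g.num_params for g in groups if g.trainable)
    return trainable / total


# Hypothetical names and sizes, NOT taken from the paper: they are picked so
# that the fully connected layers account for 17% of all parameters.
groups = [
    ParamGroup("fc_subject_mapping", 17_000_000, is_fully_connected=True),
    ParamGroup("autokl_adapter", 33_000_000, is_fully_connected=False),
    ParamGroup("clip_adapter", 50_000_000, is_fully_connected=False),
]

freeze_all_but_fc(groups)
print(f"Trainable share for a new subject: {trainable_fraction(groups):.0%}")
```

In a real deep learning framework the same idea is usually expressed by disabling gradient updates on the frozen components, so that only the subject-specific fully connected layers are optimized for each new subject.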