Editing Physiological Signals in Videos Using Latent Representations
2509.25348v2
cs.CV, cs.HC, cs.MM
2025-10-02
Authors:
Tianwen Zhou, Akshay Paruchuri, Josef Spjut, Kaan Akşit
Abstract
Camera-based physiological signal estimation provides a non-contact and
convenient means to monitor Heart Rate (HR). However, the presence of vital
signals in facial videos raises significant privacy concerns, as they can
reveal sensitive personal information related to the health and emotional
states of an individual. To address this, we propose a learned framework that
edits physiological signals in videos while preserving visual fidelity. First,
we encode an input video into a latent space via a pretrained 3D Variational
Autoencoder (3D VAE), while a target HR prompt is embedded through a frozen
text encoder. We fuse them using a set of trainable spatio-temporal layers with
Adaptive Layer Normalizations (AdaLN) to capture the strong temporal coherence
of remote Photoplethysmography (rPPG) signals. We apply Feature-wise Linear
Modulation (FiLM) in the decoder with a fine-tuned output layer to avoid the
degradation of physiological signals during reconstruction, enabling accurate
physiological modulation in the reconstructed video. Empirical results show
that our method preserves visual quality with an average PSNR of 38.96 dB and
SSIM of 0.98 on selected datasets, while achieving an average HR modulation
error of 10.00 bpm MAE and 10.09% MAPE using a state-of-the-art rPPG estimator.
Our design's controllable HR editing is useful for applications such as
anonymizing biometric signals in real videos or synthesizing realistic videos
with desired vital signs.
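
The reported metrics can be illustrated with a minimal sketch. It assumes that the HR modulation error compares the target HR prompt with the HR estimated from the edited video, and that PSNR is computed on 8-bit pixel values; the abstract does not spell out either convention. The function names and sample values are hypothetical and are not taken from the paper.

```python
import math


def mae(target_bpm, estimated_bpm):
    """Mean absolute error, in bpm, between target and estimated heart rates."""
    pairs = list(zip(target_bpm, estimated_bpm))
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


def mape(target_bpm, estimated_bpm):
    """Mean absolute percentage error, relative to the target heart rate (assumed convention)."""
    pairs = list(zip(target_bpm, estimated_bpm))
    return 100.0 * sum(abs(t - e) / t for t, e in pairs) / len(pairs)


def psnr(original, edited, max_value=255.0):
    """Peak signal-to-noise ratio in dB over two equal-length flat pixel sequences."""
    pairs = list(zip(original, edited))
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical values for illustration only; these are not results from the paper.
targets = [60.0, 80.0, 100.0]      # HR prompts in bpm
estimates = [66.0, 72.0, 110.0]    # HR recovered by an rPPG estimator in bpm
print(f"MAE:  {mae(targets, estimates):.2f} bpm")
print(f"MAPE: {mape(targets, estimates):.2f} %")
print(f"PSNR: {psnr([100, 100, 100, 100], [101, 99, 101, 99]):.2f} dB")
```

In this reading, a lower MAE and MAPE mean the edited video carries a pulse closer to the requested HR, while a higher PSNR means the edit is less visible.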