MVP4D: Multi-View Portrait Video Diffusion for Animatable 4D Avatars
2510.12785v1
cs.CV, cs.AI, cs.GR
2025-10-16
Authors:
Felix Taubner, Ruihang Zhang, Mathieu Tuli, Sherwin Bahmani, David B. Lindell
Abstract
Digital human avatars aim to simulate the dynamic appearance of humans in
virtual environments, enabling immersive experiences across gaming, film,
virtual reality, and more. However, the conventional process for creating and
animating photorealistic human avatars is expensive and time-consuming,
requiring large camera capture rigs and significant manual effort from
professional 3D artists. With the advent of capable image and video generation
models, recent methods enable automatic rendering of realistic animated avatars
from a single casually captured reference image of a target subject. While
these techniques significantly lower barriers to avatar creation and offer
compelling realism, they lack the constraints provided by multi-view information or
an explicit 3D representation. As a result, image quality and realism degrade when
avatars are rendered from viewpoints that deviate strongly from the reference image. Here,
we build a video model that generates animatable multi-view videos of digital
humans based on a single reference image and target expressions. Our model,
MVP4D, is based on a state-of-the-art pre-trained video diffusion model and
generates hundreds of frames simultaneously from viewpoints varying by up to
360 degrees around a target subject. We show how to distill the outputs of this
model into a 4D avatar that can be rendered in real-time. Our approach
significantly improves the realism, temporal consistency, and 3D consistency of
generated avatars compared to previous methods.