Bimanual 3D Hand Motion and Articulation Forecasting in Everyday Images
2510.06145v1
cs.CV, cs.AI, cs.LG
2025-10-09
Authors:
Aditya Prakash, David Forsyth, Saurabh Gupta
Abstract
We tackle the problem of forecasting bimanual 3D hand motion and articulation
from a single image in everyday settings. To address the lack of 3D hand
annotations in diverse settings, we design an annotation pipeline consisting of
a diffusion model to lift 2D hand keypoint sequences to 4D hand motion. For the
forecasting model, we adopt a diffusion loss to account for the multimodality
in the hand motion distribution. Extensive experiments across 6 datasets show
the benefits of training on diverse data with imputed labels (14% improvement)
and the effectiveness of our lifting (42% better) and forecasting (16.4% gain)
models over the best baselines, especially in zero-shot generalization to
everyday images.