Web-Scale Collection of Video Data for 4D Animal Reconstruction
2511.01169v1
cs.CV, I.2.10; I.4.5
2025-11-07
Авторы:
Brian Nlong Zhao, Jiajun Wu, Shangzhe Wu
Abstract
Computer vision for animals holds great promise for wildlife research but
often depends on large-scale data, while existing collection methods rely on
controlled capture setups. Recent data-driven approaches show the potential of
single-view, non-invasive analysis, yet current animal video datasets are
limited--offering as few as 2.4K 15-frame clips and lacking key processing for
animal-centric 3D/4D tasks. We introduce an automated pipeline that mines
YouTube videos and processes them into object-centric clips, along with
auxiliary annotations valuable for downstream tasks like pose estimation,
tracking, and 3D/4D reconstruction. Using this pipeline, we amass 30K videos
(2M frames)--an order of magnitude more than prior works. To demonstrate its
utility, we focus on the 4D quadruped animal reconstruction task. To support
this task, we present Animal-in-Motion (AiM), a benchmark of 230 manually
filtered sequences with 11K frames showcasing clean, diverse animal motions. We
evaluate state-of-the-art model-based and model-free methods on
Animal-in-Motion, finding that 2D metrics favor the former despite unrealistic
3D shapes, while the latter yields more natural reconstructions but scores
lower--revealing a gap in current evaluation. To address this, we enhance a
recent model-free approach with sequence-level optimization, establishing the
first 4D animal reconstruction baseline. Together, our pipeline, benchmark, and
baseline aim to advance large-scale, markerless 4D animal reconstruction and
related tasks from in-the-wild videos. Code and datasets are available at
https://github.com/briannlongzhao/Animal-in-Motion.
Ссылки и действия
Дополнительные ресурсы: