Geometry-Aware Losses for Structure-Preserving Text-to-Sign Language Generation
2509.23011v1
cs.CV, cs.CL
2025-10-01
Authors:
Zetian Wu, Tianshuo Zhou, Stefan Lee, Liang Huang
Abstract
Sign language translation from text to video plays a crucial role in enabling
effective communication for Deaf and hard-of-hearing individuals. A major
challenge lies in generating accurate and natural body poses and movements that
faithfully convey intended meanings. Prior methods often neglect the anatomical
constraints and coordination patterns of human skeletal motion, resulting in
rigid or biomechanically implausible outputs. To address this, we propose a
novel approach that explicitly models the relationships among skeletal
joints (including shoulders, arms, and hands) by incorporating geometric
constraints on joint positions, bone lengths, and movement dynamics. During
training, we introduce a parent-relative reweighting mechanism to enhance
finger flexibility and reduce motion stiffness. Additionally, bone-pose losses
and bone-length constraints enforce anatomically consistent structures. Our
method narrows the performance gap between the previous best and the
ground-truth oracle by 56.51%, and further reduces discrepancies in bone length
and movement variance by 18.76% and 5.48%, respectively, demonstrating
significant gains in anatomical realism and motion naturalness.
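
To make the geometric terms named in the abstract concrete, the following is a minimal NumPy sketch of what a bone-length constraint and a parent-relative bone-pose loss could look like. It is an illustration under stated assumptions, not the authors' implementation: the four-joint PARENTS chain, the function names, the L1 and squared-error forms, and the optional per-bone weights are all hypothetical choices, since the abstract does not specify them.

```python
import numpy as np

# Hypothetical toy skeleton (assumption): PARENTS[j] is the parent of joint j,
# and -1 marks the root. Example chain: shoulder -> elbow -> wrist -> fingertip.
PARENTS = np.array([-1, 0, 1, 2])


def bone_vectors(pose, parents=PARENTS):
    """Parent-relative offsets (child minus parent) for every non-root joint.

    pose has shape (..., J, 3); the result has shape (..., B, 3).
    """
    child = np.flatnonzero(parents >= 0)
    return pose[..., child, :] - pose[..., parents[child], :]


def bone_length_loss(pred, target, parents=PARENTS):
    """Mean absolute difference between predicted and reference bone lengths."""
    pred_len = np.linalg.norm(bone_vectors(pred, parents), axis=-1)
    target_len = np.linalg.norm(bone_vectors(target, parents), axis=-1)
    return float(np.mean(np.abs(pred_len - target_len)))


def bone_pose_loss(pred, target, parents=PARENTS, weights=None):
    """Mean squared error on parent-relative bone vectors.

    weights (assumption) is an optional per-bone array of shape (B,), e.g. to
    emphasize finger bones; the paper's actual reweighting scheme may differ.
    """
    diff = bone_vectors(pred, parents) - bone_vectors(target, parents)
    per_bone = np.mean(diff ** 2, axis=-1)
    if weights is not None:
        per_bone = per_bone * weights
    return float(np.mean(per_bone))
```

Because both losses are computed on child-minus-parent offsets, they ignore a global translation of the skeleton and respond only to changes in bone geometry, which is the property a structure-preserving term needs.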