DINO-CVA: A Multimodal Goal-Conditioned Vision-to-Action Model for Autonomous Catheter Navigation
2510.17038v1
cs.RO, cs.AI, cs.CV
2025-10-22
Авторы:
Pedram Fekri, Majid Roshanfar, Samuel Barbeau, Seyedfarzad Famouri, Thomas Looi, Dale Podolsky, Mehrdad Zadeh, Javad Dargahi
Abstract
Cardiac catheterization remains a cornerstone of minimally invasive
interventions, yet it continues to rely heavily on manual operation. Despite
advances in robotic platforms, existing systems are predominantly follow-leader
in nature, requiring continuous physician input and lacking intelligent
autonomy. This dependency contributes to operator fatigue, more radiation
exposure, and variability in procedural outcomes. This work moves towards
autonomous catheter navigation by introducing DINO-CVA, a multimodal
goal-conditioned behavior cloning framework. The proposed model fuses visual
observations and joystick kinematics into a joint embedding space, enabling
policies that are both vision-aware and kinematic-aware. Actions are predicted
autoregressively from expert demonstrations, with goal conditioning guiding
navigation toward specified destinations. A robotic experimental setup with a
synthetic vascular phantom was designed to collect multimodal datasets and
evaluate performance. Results show that DINO-CVA achieves high accuracy in
predicting actions, matching the performance of a kinematics-only baseline
while additionally grounding predictions in the anatomical environment. These
findings establish the feasibility of multimodal, goal-conditioned
architectures for catheter navigation, representing an important step toward
reducing operator dependency and improving the reliability of catheterbased
therapies.
Ссылки и действия
Дополнительные ресурсы: