AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation
2510.01433v1
cs.RO, cs.AI
2025-10-04
Авторы:
Anukriti Singh, Kasra Torshizi, Khuzema Habib, Kelin Yu, Ruohan Gao, Pratap Tokekar
Abstract
Vision-based robot learning often relies on dense image or point-cloud
inputs, which are computationally heavy and entangle irrelevant background
features. Existing keypoint-based approaches can focus on manipulation-centric
features and be lightweight, but either depend on manual heuristics or
task-coupled selection, limiting scalability and semantic understanding. To
address this, we propose AFFORD2ACT, an affordance-guided framework that
distills a minimal set of semantic 2D keypoints from a text prompt and a single
image. AFFORD2ACT follows a three-stage pipeline: affordance filtering,
category-level keypoint construction, and transformer-based policy learning
with embedded gating to reason about the most relevant keypoints, yielding a
compact 38-dimensional state policy that can be trained in 15 minutes, which
performs well in real-time without proprioception or dense representations.
Across diverse real-world manipulation tasks, AFFORD2ACT consistently improves
data efficiency, achieving an 82% success rate on unseen objects, novel
categories, backgrounds, and distractors.
Ссылки и действия
Дополнительные ресурсы: