EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation
2511.05397v1
cs.RO, cs.CV
2025-11-11
Авторы:
Samarth Chopra, Alex McMoil, Ben Carnovale, Evan Sokolson, Rajkumar Kubendran, Samuel Dickerson
Abstract
While Vision-Language-Action (VLA) models map visual inputs and language
instructions directly to robot actions, they often rely on costly hardware and
struggle in novel or cluttered scenes. We introduce EverydayVLA, a 6-DOF
manipulator that can be assembled for under $300, capable of modest payloads
and workspace. A single unified model jointly outputs discrete and continuous
actions, and our adaptive-horizon ensemble monitors motion uncertainty to
trigger on-the-fly re-planning for safe, reliable operation. On LIBERO,
EverydayVLA matches state-of-the-art success rates, and in real-world tests it
outperforms prior methods by 49% in-distribution and 34.9% out-of-distribution.
By combining a state-of-the-art VLA with cost-effective hardware, EverydayVLA
democratizes access to a robotic foundation model and paves the way for
economical use in homes and research labs alike. Experiment videos and details:
https://everydayvla.github.io/
Ссылки и действия
Дополнительные ресурсы: