Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
2509.26539v1
cs.CV, cs.CL, cs.LG
2025-10-02
Авторы:
Zhen Yang, Zi-Yi Dou, Di Feng, Forrest Huang, Anh Nguyen, Keen You, Omar Attia, Yuhao Yang, Michael Feng, Haotian Zhang, Ram Ramrakhya, Chao Jia, Jeffrey Nichols, Alexander Toshev, Yinfei Yang, Zhe Gan
Abstract
Developing autonomous agents that effectively interact with Graphic User
Interfaces (GUIs) remains a challenging open problem, especially for small
on-device models. In this paper, we present Ferret-UI Lite, a compact,
end-to-end GUI agent that operates across diverse platforms, including mobile,
web, and desktop. Utilizing techniques optimized for developing small models,
we build our 3B Ferret-UI Lite agent through curating a diverse GUI data
mixture from real and synthetic sources, strengthening inference-time
performance through chain-of-thought reasoning and visual tool-use, and
reinforcement learning with designed rewards. Ferret-UI Lite achieves
competitive performance with other small-scale GUI agents. In GUI grounding,
Ferret-UI Lite attains scores of $91.6\%$, $53.3\%$, and $61.2\%$ on the
ScreenSpot-V2, ScreenSpot-Pro, and OSWorld-G benchmarks, respectively. For GUI
navigation, Ferret-UI Lite achieves success rates of $28.0\%$ on AndroidWorld
and $19.8\%$ on OSWorld. We share our methods and lessons learned from
developing compact, on-device GUI agents.
Ссылки и действия
Дополнительные ресурсы: