WALT: Web Agents that Learn Tools
2510.01524v1
cs.CV, cs.AI, cs.LG
2025-10-04
Авторы:
Viraj Prabhu, Yutong Dai, Matthew Fernandez, Jing Gu, Krithika Ramakrishnan, Yanqi Luo, Silvio Savarese, Caiming Xiong, Junnan Li, Zeyuan Chen, Ran Xu
Abstract
Web agents promise to automate complex browser tasks, but current methods
remain brittle -- relying on step-by-step UI interactions and heavy LLM
reasoning that break under dynamic layouts and long horizons. Humans, by
contrast, exploit website-provided functionality through high-level operations
like search, filter, and sort. We introduce WALT (Web Agents that Learn Tools),
a framework that reverse-engineers latent website functionality into reusable
invocable tools. Rather than hypothesizing ad-hoc skills, WALT exposes robust
implementations of automations already designed into websites -- spanning
discovery (search, filter, sort), communication (post, comment, upvote), and
content management (create, edit, delete). Tools abstract away low-level
execution: instead of reasoning about how to click and type, agents simply call
search(query) or create(listing). This shifts the computational burden from
fragile step-by-step reasoning to reliable tool invocation. On VisualWebArena
and WebArena, WALT achieves higher success with fewer steps and less
LLM-dependent reasoning, establishing a robust and generalizable paradigm for
browser automation.
Ссылки и действия
Дополнительные ресурсы: