Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
2509.25873v1
cs.AI, cs.CL, cs.LG, cs.PL, cs.SE
2025-10-02
Авторы:
Hankun Dai, Maoquan Wang, Mengnan Qi, Yikai Zhang, Zijian Jin, Yongqiang Yao, Yufan Huang, Shengyu Fu, Elsie Nallipogu
Abstract
Large language models (LLMs) are increasingly being applied to programming
tasks, ranging from single-turn code completion to autonomous agents. Current
code agent designs frequently depend on complex, hand-crafted workflows and
tool sets. However, this reliance on elaborate scaffolding presents several
challenges: agent performance becomes overly dependent on prompt tuning and
custom design choices, heavy human intervention obscures a model's true
underlying capabilities, and intricate pipelines are costly to build and
maintain. Furthermore, optimizing complex task prompts increases the risk of
data leakage. Currently, when introducing new models, LLM providers like OpenAI
and Anthropic often publish benchmark scores to demonstrate their models'
coding proficiency, but keep their proprietary evaluation frameworks
confidential. To address these limitations, we introduce Lita (Lite Agent),
which operationalizes liteness, a principle of minimizing manual design while
retaining the essential elements of a fully autonomous agent. Lita enables a
more faithful and unified evaluation without elaborate scaffolding. Experiments
on the Aider Polyglot and SWE-Bench with frontier models demonstrate that Lita
achieves competitive or superior performance compared to workflow-based and
agentic baselines. Crucially, Lita also consumes fewer tokens and requires
significantly less design effort. Our results suggest that Lita is sufficient
to reveal the underlying coding competence of modern LLMs. Finally, we propose
the Agent Complexity Law: the performance gap between agents of varying
complexity, from simple to sophisticated designs, will shrink as the core model
improves, ultimately converging to a negligible difference.