DRAMA: Unifying Data Retrieval and Analysis for Open-Domain Analytic Queries
2510.27238v1
cs.DB, cs.AI, cs.CL, cs.IR
2025-11-04
Авторы:
Chuxuan Hu, Maxwell Yang, James Weiland, Yeji Lim, Suhas Palawala, Daniel Kang
Abstract
Manually conducting real-world data analyses is labor-intensive and
inefficient. Despite numerous attempts to automate data science workflows, none
of the existing paradigms or systems fully demonstrate all three key
capabilities required to support them effectively: (1) open-domain data
collection, (2) structured data transformation, and (3) analytic reasoning.
To overcome these limitations, we propose DRAMA, an end-to-end paradigm that
answers users' analytic queries in natural language on large-scale open-domain
data. DRAMA unifies data collection, transformation, and analysis as a single
pipeline. To quantitatively evaluate system performance on tasks representative
of DRAMA, we construct a benchmark, DRAMA-Bench, consisting of two categories
of tasks: claim verification and question answering, each comprising 100
instances. These tasks are derived from real-world applications that have
gained significant public attention and require the retrieval and analysis of
open-domain data. We develop DRAMA-Bot, a multi-agent system designed following
DRAMA. It comprises a data retriever that collects and transforms data by
coordinating the execution of sub-agents, and a data analyzer that performs
structured reasoning over the retrieved data. We evaluate DRAMA-Bot on
DRAMA-Bench together with five state-of-the-art baseline agents. DRAMA-Bot
achieves 86.5% task accuracy at a cost of $0.05, outperforming all baselines
with up to 6.9 times the accuracy and less than 1/6 of the cost. DRAMA is
publicly available at https://github.com/uiuc-kang-lab/drama.