Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis
2510.11143v1
cs.AI, cs.HC, 68U35, 62P30, I.2.2
2025-10-15
Authors:
Chuke Chen, Biao Luo, Nan Li, Boxiang Wang, Hang Yang, Jing Guo, Ming Xu
Abstract
The rapid expansion of scientific data has widened the gap between analytical
capability and research intent. Existing AI-based analysis tools, ranging from
AutoML frameworks to agentic research assistants, either favor automation over
transparency or depend on manual scripting that hinders scalability and
reproducibility. We present ARIA (Automated Research Intelligence Assistant), a
spec-driven, human-in-the-loop framework for automated and interpretable data
analysis. ARIA integrates six interoperable layers, namely Command, Context,
Code, Data, Orchestration, and AI Module, within a document-centric workflow
that unifies human reasoning and machine execution. Through natural-language
specifications, researchers define analytical goals while ARIA autonomously
generates executable code, validates computations, and produces transparent
documentation. Beyond achieving high predictive accuracy, ARIA can rapidly
identify optimal feature sets and select suitable models, minimizing redundant
tuning and repetitive experimentation. In the Boston Housing case study, ARIA
identified 25 key features and selected XGBoost as the best-performing model
(R² = 0.93) with minimal overfitting. Evaluations across heterogeneous
domains demonstrate ARIA's strong performance, interpretability, and efficiency
compared with state-of-the-art systems. By combining AI for research and AI for
science principles within a spec-driven architecture, ARIA establishes a new
paradigm for transparent, collaborative, and reproducible scientific discovery.
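
The abstract reports model quality as R² and describes the overfitting as minimal, but it does not say how overfitting was measured. The sketch below shows one common reading, given as an assumption rather than ARIA's actual procedure: compute R² = 1 - SS_res / SS_tot on the training and held-out sets and compare the two. The function names (`r_squared`, `overfit_gap`) and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = fmean(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def overfit_gap(train_r2, test_r2):
    """Assumed overfitting indicator: how much R^2 drops on held-out data."""
    return train_r2 - test_r2


# Toy values for illustration only (not results from the paper).
y_train, pred_train = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.0, 4.0]
y_test, pred_test = [2.0, 4.0, 6.0], [2.5, 4.0, 5.5]

train_r2 = r_squared(y_train, pred_train)
test_r2 = r_squared(y_test, pred_test)
gap = overfit_gap(train_r2, test_r2)
print(f"train R2={train_r2:.4f}  test R2={test_r2:.4f}  gap={gap:.4f}")
```

Under this reading, a small positive gap means the model generalizes about as well as it fits. A large gap would suggest the model has memorized the training data.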