LLM Agents for Interactive Workflow Provenance: Reference Architecture and Evaluation Methodology
2509.13978v1
cs.DC, cs.AI, cs.DB, 68M14, 68M20, 68T07, C.2.4; D.1.3; I.2.0
2025-09-19
Authors:
Renan Souza, Timothy Poteet, Brian Etz, Daniel Rosendo, Amal Gueroudji, Woong Shin, Prasanna Balaprakash, Rafael Ferreira da Silva
Summary
## Context
In modern scientific discovery, workflows spanning the Edge, Cloud, and High Performance Computing (HPC) continuum are crucial for processing and analyzing data. These workflows enable hypothesis validation, anomaly detection, reproducibility, and impactful findings. However, as workflows scale, provenance data, which are essential for understanding and analyzing these processes, become increasingly complex and hard to explore. Current systems rely on custom scripts, structured queries, or static dashboards, which limit interactivity and flexibility.
To address this challenge, the authors turn to interactive Large Language Model (LLM) agents, which let researchers query provenance data in natural language at runtime instead of writing scripts or queries by hand. The work defines a reference architecture and an evaluation methodology for such systems and provides an open-source implementation.
## Method
The proposed methodology combines a reference architecture and an evaluation framework for interactive provenance analysis using LLM agents. The reference architecture is lightweight and metadata-driven, translating natural language queries into structured provenance queries. It integrates Retrieval-Augmented Generation (RAG) to enhance LLM responses with contextual metadata.
Key components include:
1. **Metadata-driven design**: The agent uses a structured schema describing the provenance data to translate natural language into provenance queries.
2. **LLM agent integration**: LLMs such as LLaMA, GPT, Gemini, and Claude are used for query interpretation and response generation.
3. **Prompt tuning**: Refining the prompts improves the accuracy and relevance of LLM responses.
4. **Diverse query testing**: A range of query classes, including temporal, spatial, and entity-based queries, is evaluated.
5. **Real-world evaluation**: The methodology is tested on a chemistry workflow, showcasing practical applicability.
This modular approach is intended to let the system adapt to various scientific workflows while maintaining accuracy and usability.
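The metadata-driven translation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the schema fields, the toy records, the JSON query format, and the `fake_llm` stand-in are all assumptions made for illustration. The idea it shows is that only a description of the provenance schema is placed in the prompt, the model returns a structured query, and that query is validated against the schema before it runs.

```python
import json

# Hypothetical provenance schema: field name -> short description.
SCHEMA = {
    "task_id": "unique identifier of a workflow task",
    "activity": "name of the function or step that ran",
    "duration_s": "wall-clock run time in seconds",
    "host": "node where the task executed",
}

# Toy provenance records that follow the schema above (made-up values).
RECORDS = [
    {"task_id": "t1", "activity": "optimize_geometry", "duration_s": 12.5, "host": "node01"},
    {"task_id": "t2", "activity": "compute_energy", "duration_s": 48.0, "host": "node02"},
    {"task_id": "t3", "activity": "compute_energy", "duration_s": 7.2, "host": "node01"},
]


def build_prompt(question: str) -> str:
    """Put schema metadata (not the raw records) into the LLM prompt."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in SCHEMA.items())
    return (
        "Translate the question into a JSON query with keys "
        '"filter" (field -> value) and "select" (list of fields).\n'
        f"Available fields:\n{fields}\n"
        f"Question: {question}\nJSON:"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns one canned query."""
    return '{"filter": {"activity": "compute_energy"}, "select": ["task_id", "duration_s"]}'


def run_query(query: dict, records: list[dict]) -> list[dict]:
    """Validate the query against the schema, then filter and project."""
    unknown = (set(query["filter"]) | set(query["select"])) - set(SCHEMA)
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")
    matches = [r for r in records if all(r[k] == v for k, v in query["filter"].items())]
    return [{k: r[k] for k in query["select"]} for r in matches]


def answer(question: str, llm=fake_llm) -> list[dict]:
    """Natural language in, structured query out, query executed on records."""
    query = json.loads(llm(build_prompt(question)))
    return run_query(query, RECORDS)


if __name__ == "__main__":
    print(answer("How long did each energy computation take?"))
```

Swapping `fake_llm` for a call to a hosted or local model would leave the rest of the flow unchanged. The schema check in `run_query` is one simple way to reject queries that reference fields the provenance store does not have.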
## Results
Evaluations were conducted using LLaMA, GPT, Gemini, and Claude LLMs across multiple query classes and a real-world chemistry workflow. The results demonstrate the following:
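1. **Accuracy**: With a modular design, prompt tuning, and RAG, the LLM agents gave accurate and insightful responses when interpreting natural language queries and generating structured provenance queries, including responses that go beyond the recorded provenance.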
2. **Query diversity**: The system performed well across temporal, spatial, and entity-based queries, showcasing its versatility.
3. **Comparison with baselines**: LLM-based approaches outperformed traditional methods, such as static dashboards and structured queries, in terms of interactivity and depth of analysis.
4. **Scalability**: The metadata-driven design proved scalable, handling large-scale provenance data efficiently.
The open-source implementation provides a blueprint for integrating LLM agents into existing provenance systems, offering a practical solution for enhancing workflow provenance analysis.
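An evaluation across several models and query classes can be summarized as per-model, per-class accuracy. The sketch below is a hedged illustration of that bookkeeping only: the model names, query class names, and correctness flags are placeholder values invented for the example and are not results from the paper, and the function name `accuracy_by_model_and_class` is hypothetical.

```python
from collections import defaultdict

# Placeholder evaluation log: one row per (model, query class, question).
# In practice "correct" would come from comparing the agent's answer
# with a reference answer; these values are made up for illustration.
RESULTS = [
    {"model": "model_a", "query_class": "class_1", "correct": True},
    {"model": "model_a", "query_class": "class_1", "correct": False},
    {"model": "model_a", "query_class": "class_2", "correct": True},
    {"model": "model_b", "query_class": "class_1", "correct": True},
    {"model": "model_b", "query_class": "class_2", "correct": False},
    {"model": "model_b", "query_class": "class_2", "correct": False},
]


def accuracy_by_model_and_class(rows):
    """Return {(model, query_class): fraction of correct answers}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        key = (row["model"], row["query_class"])
        totals[key] += 1
        hits[key] += int(row["correct"])
    return {key: hits[key] / totals[key] for key in totals}


if __name__ == "__main__":
    for (model, qclass), acc in sorted(accuracy_by_model_and_class(RESULTS).items()):
        print(f"{model:8s} {qclass:8s} {acc:.2f}")
```

Keeping the breakdown per query class, rather than reporting a single overall score, makes it easier to see which kinds of questions a given model handles poorly.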
## Significance
The proposed approach has significant implications across multiple domains:
1. **Scientific research**: Enables more interactive and insightful analysis of workflow provenance, supporting hypothesis validation and reproducibility.
2. **Data-intensive applications**: Facilitates complex data exploration in fields such as chemistry, biology, and environmental science.
3. **Real-world impact**: The modular design and open-source nature allow for easy adoption and customization across different scientific and industrial workflows.
Integrating LLM agents offers a more intuitive alternative to custom scripts, structured queries, and static dashboards for provenance analysis. The potential for broader adoption is high, given the growing demand for interactive and scalable data analysis tools.
## Conclusions
The research introduces a reference architecture and evaluation methodology for leveraging LLM agents in interactive workflow provenance analysis. Key achievements include:
1. Demonstration of the feasibility and effectiveness of LLM-based approaches in provenance analysis.
2. Development of a modular and scalable design that enhances interactivity and accuracy.
3. Practical evaluation across diverse query classes and a real-world workflow, showcasing the system's potential.
Future work will focus on expanding the scope of query types, improving LLM prompt tuning, and exploring additional scientific domains for broader applicability. This work lays the foundation for transformative advancements in scientific data analysis and workflow provenance.
Abstract
Modern scientific discovery increasingly relies on workflows that process
data across the Edge, Cloud, and High Performance Computing (HPC) continuum.
Comprehensive and in-depth analyses of these data are critical for hypothesis
validation, anomaly detection, reproducibility, and impactful findings.
Although workflow provenance techniques support such analyses, at large scale,
the provenance data become complex and difficult to analyze. Existing systems
depend on custom scripts, structured queries, or static dashboards, limiting
data interaction. In this work, we introduce an evaluation methodology,
reference architecture, and open-source implementation that leverages
interactive Large Language Model (LLM) agents for runtime data analysis. Our
approach uses a lightweight, metadata-driven design that translates natural
language into structured provenance queries. Evaluations across LLaMA, GPT,
Gemini, and Claude, covering diverse query classes and a real-world chemistry
workflow, show that modular design, prompt tuning, and Retrieval-Augmented
Generation (RAG) enable accurate and insightful LLM agent responses beyond
recorded provenance.