OpenDerisk: An Industrial Framework for AI-Driven SRE, with Design, Implementation, and Case Studies
2510.13561v2
cs.SE, cs.AI, 68N30
2025-10-17
Авторы:
Peng Di, Faqiang Chen, Xiao Bai, Hongjun Yang, Qingfeng Li, Ganglin Wei, Jian Mou, Feng Shi, Keting Chen, Peng Tang, Zhitao Shen, Zheng Li, Wenhui Shi, Junwei Guo, Hang Yu
Abstract
The escalating complexity of modern software imposes an unsustainable
operational burden on Site Reliability Engineering (SRE) teams, demanding
AI-driven automation that can emulate expert diagnostic reasoning. Existing
solutions, from traditional AI methods to general-purpose multi-agent systems,
fall short: they either lack deep causal reasoning or are not tailored for the
specialized, investigative workflows unique to SRE. To address this gap, we
present OpenDerisk, a specialized, open-source multi-agent framework
architected for SRE. OpenDerisk integrates a diagnostic-native collaboration
model, a pluggable reasoning engine, a knowledge engine, and a standardized
protocol (MCP) to enable specialist agents to collectively solve complex,
multi-domain problems. Our comprehensive evaluation demonstrates that
OpenDerisk significantly outperforms state-of-the-art baselines in both
accuracy and efficiency. This effectiveness is validated by its large-scale
production deployment at Ant Group, where it serves over 3,000 daily users
across diverse scenarios, confirming its industrial-grade scalability and
practical impact. OpenDerisk is open source and available at
https://github.com/derisk-ai/OpenDerisk/
Ссылки и действия
Дополнительные ресурсы: