xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems

2509.13021v1 cs.CR, cs.AI 2025-09-18

Authors:

Phung Duc Luong, Le Tran Gia Bao, Nguyen Vu Khai Tam, Dong Huu Nguyen Khoa, Nguyen Huu Quyen, Van-Hau Pham, Phan The Duy

Summary (translated from Russian)

**Summary** Current automated penetration testing methods often face limitations in scalability, accuracy, and efficiency, especially when they rely on general-purpose large language models (LLMs). To address this, the paper presents xOffense, an AI-driven, multi-agent penetration testing framework that uses a fine-tuned mid-scale open-source LLM (Qwen3-32B) to improve decision-making during penetration testing. The framework divides the work among specialized agents for reconnaissance, vulnerability scanning, and exploitation, coordinated through an orchestration layer. Fine-tuning on Chain-of-Thought penetration testing data enables the model to generate precise tool commands and carry out multi-step reasoning. In evaluations on AutoPenBench and AI-Pentest-Benchmark, xOffense achieved a sub-task completion rate of 79.17% and outperformed competing systems such as VulnBot and PentestGPT. These results point to the promise of domain-adapted mid-scale LLMs for building scalable, cost-efficient, and reproducible solutions for autonomous penetration testing.

Abstract

This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.
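The phase structure described in the abstract (specialized agents for reconnaissance, vulnerability scanning, and exploitation, coordinated by an orchestration layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `EngagementState`, `Orchestrator`, and the three stub agents are hypothetical, the agents return placeholder strings instead of calling an LLM or any security tool, and the fixed phase order with an early stop is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class EngagementState:
    """Shared context handed from phase to phase (hypothetical structure)."""
    target: str
    findings: Dict[str, List[str]] = field(default_factory=dict)
    completed_phases: List[str] = field(default_factory=list)


# A phase agent reads the shared state and returns a list of findings.
PhaseAgent = Callable[[EngagementState], List[str]]


def recon_agent(state: EngagementState) -> List[str]:
    # Stub: a real agent would ask the model for a tool command and parse its output.
    return [f"service placeholder on {state.target}"]


def scan_agent(state: EngagementState) -> List[str]:
    # Stub: only considers what the reconnaissance phase reported.
    recon = state.findings.get("reconnaissance", [])
    return [f"candidate issue for: {item}" for item in recon]


def exploit_agent(state: EngagementState) -> List[str]:
    # Stub: only considers what the scanning phase reported.
    issues = state.findings.get("vulnerability_scanning", [])
    return [f"validation plan for: {item}" for item in issues]


class Orchestrator:
    """Runs phases in order and stops as soon as a phase yields nothing."""

    def __init__(self, phases: List[Tuple[str, PhaseAgent]]) -> None:
        self.phases = phases

    def run(self, target: str) -> EngagementState:
        state = EngagementState(target=target)
        for name, agent in self.phases:
            results = agent(state)
            if not results:
                break  # later phases depend on earlier findings
            state.findings[name] = results
            state.completed_phases.append(name)
        return state


def build_default_orchestrator() -> Orchestrator:
    # Assumed phase order, mirroring the order the abstract lists them in.
    return Orchestrator([
        ("reconnaissance", recon_agent),
        ("vulnerability_scanning", scan_agent),
        ("exploitation", exploit_agent),
    ])
```

Calling `build_default_orchestrator().run("lab-host")` walks all three stub phases and records each phase's findings in the shared state. Keeping the state in one object is one simple way to let later phases build on earlier results; the paper's actual coordination mechanism may differ.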
