LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models
2510.24031v1
cs.AI, cs.CR, H.3.3, I.2.7, I.5.3, I.2.5,
2025-10-30
Авторы:
Peng Cai, Reza Ryan, Nickson M. Karie
Abstract
System logs are a cornerstone of cybersecurity, supporting proactive breach
prevention and post-incident investigations. However, analyzing vast amounts of
diverse log data remains significantly challenging, as high costs, lack of
in-house expertise, and time constraints make even basic analysis difficult for
many organizations. This study introduces LLMLogAnalyzer, a clustering-based
log analysis chatbot that leverages Large Language Models (LLMs) and Machine
Learning (ML) algorithms to simplify and streamline log analysis processes.
This innovative approach addresses key LLM limitations, including context
window constraints and poor structured text handling capabilities, enabling
more effective summarization, pattern extraction, and anomaly detection tasks.
LLMLogAnalyzer is evaluated across four distinct domain logs and various tasks.
Results demonstrate significant performance improvements over state-of-the-art
LLM-based chatbots, including ChatGPT, ChatPDF, and NotebookLM, with consistent
gains ranging from 39% to 68% across different tasks. The system also exhibits
strong robustness, achieving a 93% reduction in interquartile range (IQR) when
using ROUGE-1 scores, indicating significantly lower result variability. The
framework's effectiveness stems from its modular architecture comprising a
router, log recognizer, log parser, and search tools. This design enhances LLM
capabilities for structured text analysis while improving accuracy and
robustness, making it a valuable resource for both cybersecurity experts and
non-technical users.