Uncovering Code Insights: Leveraging GitHub Artifacts for Deeper Code Understanding
2511.03549v1
cs.SE, cs.AI
2025-11-07
Авторы:
Ziv Nevo, Orna Raz, Karen Yorav
Abstract
Understanding the purpose of source code is a critical task in software
maintenance, onboarding, and modernization. While large language models (LLMs)
have shown promise in generating code explanations, they often lack grounding
in the broader software engineering context. We propose a novel approach that
leverages natural language artifacts from GitHub -- such as pull request
descriptions, issue descriptions and discussions, and commit messages -- to
enhance LLM-based code understanding. Our system consists of three components:
one that extracts and structures relevant GitHub context, another that uses
this context to generate high-level explanations of the code's purpose, and a
third that validates the explanation. We implemented this as a standalone tool,
as well as a server within the Model Context Protocol (MCP), enabling
integration with other AI-assisted development tools. Our main use case is that
of enhancing a standard LLM-based code explanation with code insights that our
system generates. To evaluate explanations' quality, we conducted a small scale
user study, with developers of several open projects, as well as developers of
proprietary projects. Our user study indicates that when insights are generated
they often are helpful and non trivial, and are free from hallucinations.
Ссылки и действия
Дополнительные ресурсы: