CodeGenLink: A Tool to Find the Likely Origin and License of Automatically Generated Code
2510.01077v1
cs.SE, cs.AI
2025-10-04
Authors:
Daniele Bifolco, Guido Annicchiarico, Pierluigi Barbiero, Massimiliano Di Penta, Fiorella Zampetti
Abstract
Large Language Models (LLMs) are now widely used in software development
tasks. Unlike code reused from the Web, LLM-generated code comes with no
provenance information, so developers are concerned about its trustworthiness
and about possible copyright or licensing violations. This paper proposes
CodeGenLink, a GitHub Copilot extension for Visual Studio Code that (i)
suggests links to pages containing code very similar to the automatically
generated code and (ii) whenever possible, indicates the license of the
code's likely origin. CodeGenLink retrieves candidate links
by combining LLMs with their web search features and then performs similarity
analysis between the generated and retrieved code. Preliminary results show
that CodeGenLink effectively filters unrelated links via similarity analysis
and provides licensing information when available.
Tool URL: https://github.com/danielebifolco/CodeGenLink
Tool Video: https://youtu.be/M6nqjBf9_pw
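
The filtering step described in the abstract (compare generated code against code retrieved from candidate links, and keep only the close matches) can be illustrated with a minimal sketch. This is not CodeGenLink's implementation: the abstract does not name the similarity measure, so Python's standard `difflib.SequenceMatcher` stands in for it, and the function names, the `url`/`code`/`license` keys, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] for two snippets after collapsing whitespace."""
    norm_a = " ".join(a.split())
    norm_b = " ".join(b.split())
    # SequenceMatcher is a stand-in metric, not the one used by the tool.
    return SequenceMatcher(None, norm_a, norm_b).ratio()


def filter_candidates(generated: str, candidates: list, threshold: float = 0.8) -> list:
    """Keep candidate links whose code is similar enough to the generated code.

    `candidates` is a list of dicts with hypothetical keys 'url', 'code',
    and an optional 'license'. The threshold value is an assumption.
    """
    kept = []
    for cand in candidates:
        score = similarity(generated, cand["code"])
        if score >= threshold:
            kept.append(
                {
                    "url": cand["url"],
                    # License may be unknown; report None in that case.
                    "license": cand.get("license"),
                    "score": score,
                }
            )
    # Most similar candidates first.
    return sorted(kept, key=lambda c: c["score"], reverse=True)
```

Given a list of candidates retrieved by some search step, `filter_candidates` returns only the links that pass the threshold, each with its license if one was found, which mirrors the two outputs the abstract describes: likely-origin links and, where available, licensing information.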