Constraint-Driven Small Language Models Based on Agent and OpenAlex Knowledge Graph: Mining Conceptual Pathways and Discovering Innovation Points in Academic Papers
2510.14303v1
cs.CL, cs.LG, I.2.7
2025-10-18
Authors:
Ziye Xia, Sergei S. Ospichev
Abstract
In recent years, the rapid increase in academic publications across various
fields has posed severe challenges for academic paper analysis: scientists
struggle to keep track of the latest research findings and methodologies in a
timely and comprehensive way. Key concept extraction has proven to be an effective analytical
paradigm, and its automation has been achieved with the widespread application
of language models in industrial and scientific domains. However, existing
paper databases are mostly limited to similarity matching and basic
classification of key concepts, failing to deeply explore the relational
networks between concepts. This paper builds on the OpenAlex open-source
knowledge graph. By analyzing data from nearly 8,000 open-source papers from
Novosibirsk State University, we discovered a strong correlation between the
distribution patterns of paper key concept paths and both innovation points and
rare paths. We propose a prompt-engineering-based key concept path analysis
method. This method leverages small language models to achieve precise key
concept extraction and innovation point identification, and constructs an agent
based on a knowledge graph constraint mechanism to enhance analysis accuracy.
By fine-tuning the Qwen and DeepSeek models, we achieved significant
improvements in accuracy; the fine-tuned models are publicly available on the
Hugging Face platform.
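
To make the notion of a "rare path" concrete, the following is a minimal sketch, not the authors' implementation. It assumes each paper carries an ordered list of key concepts (for example, from broad to specific), treats every run of consecutive concepts as a path, and flags paths that occur in few papers. The function names `concept_paths` and `rare_paths`, the `max_count` threshold, and the toy data are all hypothetical.

```python
from collections import Counter


def concept_paths(concepts, length=2):
    """Return consecutive concept n-grams (paths) from an ordered concept list."""
    return [tuple(concepts[i:i + length])
            for i in range(len(concepts) - length + 1)]


def rare_paths(papers, length=2, max_count=1):
    """Return paths that appear in at most `max_count` papers (assumed threshold)."""
    counts = Counter()
    for concepts in papers.values():
        # Count each path once per paper so long papers do not dominate.
        counts.update(set(concept_paths(concepts, length)))
    return {path: n for path, n in counts.items() if n <= max_count}


# Toy data (invented for illustration, not taken from the paper's corpus).
papers = {
    "paper_a": ["Computer science", "Machine learning", "Language model"],
    "paper_b": ["Computer science", "Machine learning", "Knowledge graph"],
    "paper_c": ["Computer science", "Machine learning", "Language model"],
}

print(rare_paths(papers))
```

In this toy corpus only the path from "Machine learning" to "Knowledge graph" occurs in a single paper, so it is the one flagged as rare; in the setting the abstract describes, such a path would be a candidate for closer inspection as a possible innovation point.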