Contrastive Learning Using Graph Embeddings for Domain Adaptation of Language Models in the Process Industry
2510.04631v2
cs.CL, cs.IR
2025-10-08
Authors:
Anastasia Zhukova, Jonas Lührs, Christian E. Lobmüller, Bela Gipp
Abstract
Recent trends in NLP use knowledge graphs (KGs) to enhance pretrained
language models with additional knowledge from graph structures, helping them
learn domain-specific terminology or relationships between documents that
might otherwise be overlooked. This paper explores how SciNCL, a graph-aware
neighborhood contrastive learning methodology originally designed for
scientific publications, can be applied to the process industry domain, where
text logs contain crucial information about daily operations and are often
structured as sparse KGs. Our experiments demonstrate that language models
fine-tuned with triplets derived from graph embeddings (GE) outperform the
state-of-the-art mE5-large text encoder by 9.8-14.3% (5.45-7.96 points) on the
proprietary process industry text embedding benchmark (PITEB) while having
one third as many parameters.
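The following is a minimal sketch of how triplets might be derived from graph embeddings in the spirit of SciNCL. It assumes that each document (KG node) already has a precomputed graph-embedding vector. For each anchor, the sketch takes a positive from its nearest neighbors in graph-embedding space and a "hard" negative from neighbors that are slightly farther away. It then scores a triplet with a standard triplet margin loss. The functions `mine_triplets` and `triplet_margin_loss`, the neighbor-rank windows `pos_window` and `neg_window`, the L2 distance, and the margin value are all hypothetical choices made for illustration. They are not the paper's implementation or settings.

```python
import math
import random


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mine_triplets(graph_emb, pos_window=(0, 2), neg_window=(3, 5), seed=0):
    """Hypothetical SciNCL-style triplet mining over graph embeddings.

    graph_emb: dict mapping node id -> graph-embedding vector (assumed precomputed).
    pos_window: [start, end) neighbor ranks from which a positive is sampled.
    neg_window: [start, end) neighbor ranks from which a hard negative is sampled.
    Returns a list of (anchor, positive, negative) node ids.
    """
    rng = random.Random(seed)
    ids = list(graph_emb)
    triplets = []
    for anchor in ids:
        # Rank every other node by its distance to the anchor in graph space.
        ranked = sorted(
            (n for n in ids if n != anchor),
            key=lambda n: l2(graph_emb[anchor], graph_emb[n]),
        )
        pos_pool = ranked[pos_window[0]:pos_window[1]]
        neg_pool = ranked[neg_window[0]:neg_window[1]]
        if pos_pool and neg_pool:
            triplets.append((anchor, rng.choice(pos_pool), rng.choice(neg_pool)))
    return triplets


def triplet_margin_loss(a, p, n, margin=1.0):
    """Hinge loss that pushes the anchor closer to the positive than to the negative."""
    return max(l2(a, p) - l2(a, n) + margin, 0.0)


# Toy usage: six nodes laid out on a line in a 2-D graph-embedding space.
toy_graph_emb = {name: [float(i), 0.0] for i, name in enumerate("ABCDEF")}
toy_triplets = mine_triplets(toy_graph_emb)
for t in toy_triplets:
    print(t)
```

In an actual fine-tuning loop, the loss would be computed on the text encoder's embeddings of the anchor, positive, and negative documents. The graph embeddings serve only to select the triplets. Sampling negatives from a near-but-not-nearest rank window is what makes them "hard". Such negatives are similar enough to the anchor to be informative, yet far enough away in the graph that they are unlikely to be true positives.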