Contrastive Learning Using Graph Embeddings for Domain Adaptation of Language Models in the Process Industry

arXiv:2510.04631v2 [cs.CL, cs.IR], 2025-10-08

Authors:

Anastasia Zhukova, Jonas Lührs, Christian E. Lobmüller, Bela Gipp

Abstract

Recent NLP research uses knowledge graphs (KGs) to enhance pretrained language models with additional knowledge from graph structure. This helps the models learn domain-specific terminology or relationships between documents that might otherwise be overlooked. This paper explores how SciNCL, a graph-aware neighborhood contrastive learning method originally designed for scientific publications, can be applied to the process industry domain, where text logs contain crucial information about daily operations and are often structured as sparse KGs. Our experiments show that language models fine-tuned with triplets derived from graph embeddings (GE) outperform the state-of-the-art mE5-large text encoder by 9.8-14.3% (5.45-7.96 points) on the proprietary process industry text embedding benchmark (PITEB), while having about one third as many parameters.
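The abstract names the technique but not its mechanics. What follows is a minimal sketch of SciNCL-style triplet mining from graph embeddings. It assumes that positives are drawn from an anchor's nearest neighbors in graph-embedding space and that hard negatives come from a band of more distant neighbors, in the spirit of the original SciNCL setup. It also assumes a triplet margin loss as the training objective. The function names `mine_triplets` and `triplet_margin_loss`, the neighbor counts, the band limits, and the random stand-in embeddings are all hypothetical. In real fine-tuning, the loss would be computed on a trainable text encoder's outputs rather than on fixed arrays.

```python
import numpy as np


def mine_triplets(graph_emb, k_pos=2, neg_band=(5, 10), seed=0):
    """Hypothetical SciNCL-style sampler (an assumption, not the paper's code).

    For each anchor node, the positive is drawn from its k_pos nearest
    neighbors in graph-embedding space. The hard negative is drawn from
    neighbor ranks neg_band[0]..neg_band[1]-1: close, but not too close.
    Requires len(graph_emb) > neg_band[1] so the band never reaches the anchor.
    """
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between all node embeddings.
    diff = graph_emb[:, None, :] - graph_emb[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is never its own neighbor
    order = np.argsort(dist, axis=1)  # neighbor indices, nearest first
    triplets = []
    for anchor in range(len(graph_emb)):
        pos = rng.choice(order[anchor, :k_pos])
        neg = rng.choice(order[anchor, neg_band[0]:neg_band[1]])
        triplets.append((anchor, int(pos), int(neg)))
    return triplets


def triplet_margin_loss(a, p, n, margin=1.0):
    """Mean of max(d(a, p) - d(a, n) + margin, 0) with Euclidean distance."""
    d_ap = np.linalg.norm(a - p, axis=-1)
    d_an = np.linalg.norm(a - n, axis=-1)
    return float(np.maximum(d_ap - d_an + margin, 0.0).mean())


# Hypothetical stand-ins: 20 KG nodes, each linked to one text log.
rng = np.random.default_rng(42)
graph_emb = rng.normal(size=(20, 8))  # e.g. node embeddings of a sparse KG
text_emb = rng.normal(size=(20, 16))  # e.g. text-encoder outputs per log
triplets = mine_triplets(graph_emb)
idx = np.array(triplets)
loss = triplet_margin_loss(text_emb[idx[:, 0]], text_emb[idx[:, 1]], text_emb[idx[:, 2]])
print(len(triplets), loss)
```

The graph supplies only the sampling signal. The loss pulls a log's text embedding toward logs that are close in the graph and pushes it away from logs that are moderately far. This is how structural knowledge from a sparse KG can reach a pure text encoder without requiring the graph at inference time.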
