Intent Clustering with Shared Pseudo-Labels

arXiv:2510.14640v2 [cs.CL, cs.IR] 2025-10-20

Authors:

I-Fan Lin, Faegheh Hasibi, Suzan Verberne

Abstract

In this paper, we propose an intuitive, training-free and label-free method for intent clustering that makes minimal assumptions and uses lightweight, open-source LLMs. Many current approaches rely on commercial LLMs, which are costly and offer limited transparency. These approaches also often depend explicitly on knowing the number of clusters in advance, which is often not the case in realistic settings. To address these challenges, instead of asking the LLM to match similar texts directly, we first ask it to generate pseudo-labels for each text, and then perform multi-label classification over the resulting pseudo-label set for each text. This approach is based on the hypothesis that texts belonging to the same cluster will share more labels, and will therefore be closer when encoded into embeddings. These pseudo-labels are also more human-readable than direct similarity matches. Our evaluation on four benchmark sets shows that our approach achieves results comparable to or better than recent baselines, while remaining simple and computationally efficient. Our findings indicate that our method can be applied in low-resource scenarios and is stable across multiple models and datasets.
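
The shared-label hypothesis in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the pseudo-labels below are hard-coded stand-ins for LLM output, Jaccard overlap between label sets replaces the embedding step, and the threshold-based grouping (with its value of 0.3) is an assumption chosen only to show that texts sharing labels end up together without fixing the number of clusters in advance.

```python
from itertools import combinations

# Hypothetical result of the two LLM steps: the labels assigned to each text
# from the shared pseudo-label set. Hard-coded here; no model is called.
assigned = {
    "I lost my card": {"lost card", "card issue"},
    "My card is missing": {"lost card", "card issue"},
    "Card was stolen yesterday": {"card issue", "stolen card"},
    "How do I reset my PIN?": {"pin reset"},
    "I forgot my PIN": {"pin reset", "forgot pin"},
}


def jaccard(a, b):
    """Share of labels two texts have in common (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_shared_labels(assigned, threshold=0.3):
    """Group texts whose label overlap reaches the (assumed) threshold.

    Uses union-find, so the number of clusters is not specified up front.
    """
    texts = list(assigned)
    parent = {t: t for t in texts}

    def find(t):
        # Walk up to the root, shortening the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(texts, 2):
        if jaccard(assigned[a], assigned[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in texts:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())


clusters = cluster_by_shared_labels(assigned)
for members in clusters:
    print(members)
```

With this toy input, the three card-related texts form one group and the two PIN-related texts form another. In a fuller pipeline, the label sets would presumably be encoded with a text encoder and clustered in embedding space, as the abstract describes.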

