Intent Clustering with Shared Pseudo-Labels
2510.14640v2
cs.CL, cs.IR
2025-10-20
Authors:
I-Fan Lin, Faegheh Hasibi, Suzan Verberne
Abstract
In this paper, we propose an intuitive, training-free and label-free method
for intent clustering that makes minimal assumptions using lightweight and
open-source LLMs. Many current approaches rely on commercial LLMs, which are
costly and offer limited transparency. Additionally, these approaches often
depend explicitly on knowing the number of clusters in advance, which is
frequently not the case in realistic settings. To address these challenges, instead of
asking the LLM to match similar texts directly, we first ask it to generate
pseudo-labels for each text, and then perform multi-label classification over
the resulting pseudo-label set for each text. This approach is based on the hypothesis
that texts belonging to the same cluster will share more labels, and will
therefore be closer when encoded into embeddings. These pseudo-labels are more
human-readable than direct similarity matches. Our evaluation on four benchmark
sets shows that our approach achieves results comparable to or better than
recent baselines, while remaining simple and computationally efficient. Our
findings indicate that our method can be applied in low-resource scenarios and
is stable across multiple models and datasets.
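
The shared-label hypothesis can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pseudo-label assignments are hard-coded stand-ins for LLM output, multi-hot vectors replace the embedding encoder mentioned in the abstract, and the similarity threshold and the connected-components grouping are assumptions chosen only so that the number of clusters does not have to be fixed in advance.

```python
from itertools import combinations
from math import sqrt

# Hypothetical result of the multi-label step: for each text, the subset of the
# shared pseudo-label pool judged applicable (hard-coded here, no LLM call).
assigned = {
    "How do I reset my password?": {"reset password", "account access"},
    "I forgot my login password": {"reset password", "account access", "login issue"},
    "Where is my refund?": {"refund status", "billing"},
    "I was charged twice, I want my money back": {"refund status", "billing", "duplicate charge"},
}


def multi_hot(labels, pool):
    """Encode a label set as a 0/1 vector over the shared label pool."""
    return [1.0 if p in labels else 0.0 for p in pool]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(assigned, threshold=0.5):
    """Group texts whose label vectors are similar (assumed threshold rule)."""
    texts = list(assigned)
    pool = sorted(set().union(*assigned.values()))
    vecs = {t: multi_hot(assigned[t], pool) for t in texts}

    # Union-find over texts: link every pair above the similarity threshold.
    parent = {t: t for t in texts}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(texts, 2):
        if cosine(vecs[a], vecs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in texts:
        groups.setdefault(find(t), []).append(t)
    return list(groups.values())


clusters = cluster(assigned)
for group in clusters:
    print(group)
```

In this toy input, the two password texts share two labels and the two refund texts share two labels, while no label is shared across the two topics, so texts with more labels in common end up closer together and are grouped without specifying the cluster count.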