Detecting and Rectifying Noisy Labels: A Similarity-based Approach
2509.23964v1
cs.LG, cs.CL
2025-10-01
Authors:
Dang Huu-Tien, Naoya Inoue
Abstract
Label noise in datasets can degrade the performance of trained neural networks.
As modern deep networks grow in size, so does the demand for automated tools
that detect such errors. In this paper, we propose post-hoc, model-agnostic
methods for error detection and rectification that use the penultimate-layer
features of a neural network. Our idea is based on the observation that the
penultimate feature of a mislabeled data point is more similar to the features
of data points from its true class than to those of data points from other
classes. This makes the probability of a label occurring within a tight cluster
of similar points informative for detecting and rectifying errors. Extensive
experiments show that our method not only achieves high performance across
various types of noise but also automatically rectifies these errors, improving
the quality of datasets.
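
The core idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain k-nearest-neighbour vote under cosine similarity, and the function name `knn_label_check`, the neighbourhood size `k`, and the 0.5 threshold are all hypothetical choices made for illustration. Given penultimate-layer features and the (possibly noisy) labels, it estimates how often each point's label occurs among its most similar points, flags points whose label is rare there, and suggests the most frequent neighbouring label as a correction.

```python
import numpy as np


def knn_label_check(features, labels, k=5, threshold=0.5):
    """Flag likely label errors via a cosine-similarity neighbour vote.

    Illustrative sketch only; k and threshold are assumed values.
    Returns (suspect mask, suggested labels, probability of the given label).
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Normalise rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a point must not vote for itself

    # Indices of the k most similar points for every row.
    neighbours = np.argsort(-sim, axis=1)[:, :k]

    # Fraction of neighbours carrying each class label.
    classes = np.unique(labels)
    votes = np.stack(
        [(labels[neighbours] == c).mean(axis=1) for c in classes], axis=1
    )

    # Probability of the given label occurring among the neighbours.
    given_idx = np.searchsorted(classes, labels)
    p_given = votes[np.arange(len(labels)), given_idx]

    suspect = p_given < threshold              # label is rare among similar points
    suggested = classes[votes.argmax(axis=1)]  # most frequent neighbouring label
    return suspect, suggested, p_given
```

In practice `features` would be the penultimate-layer activations of a trained network, and points with `suspect` set could be relabeled with `suggested` or sent for manual review.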