How Scale Breaks "Normalized Stress" and KL Divergence: Rethinking Quality Metrics
2510.08660v1
cs.LG, stat.ML
2025-10-14
Authors:
Kiran Smelser, Kaviru Gunaratne, Jacob Miller, Stephen Kobourov
Abstract
Complex, high-dimensional data is ubiquitous across many scientific
disciplines, including machine learning, biology, and the social sciences. One
of the primary methods of visualizing these datasets is with two-dimensional
scatter plots that visually capture some properties of the data. Because
visually determining the accuracy of these plots is challenging, researchers
often use quality metrics to measure the projection's accuracy and faithfulness
to the original data. One of the most commonly employed metrics, normalized
stress, is sensitive to uniform scaling (stretching, shrinking) of the
projection, even though such scaling does not meaningfully change the
projection. Another quality metric, the Kullback-Leibler (KL) divergence used
in the popular t-Distributed Stochastic Neighbor Embedding (t-SNE) technique,
is also sensitive to scale. We investigate the effect of scaling on stress and
KL divergence analytically and empirically, showing how much the values change
and how this affects evaluations of dimension reduction techniques. We
introduce a simple technique to make both metrics
scale-invariant and show that it accurately captures expected behavior on a
small benchmark.
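
The scale sensitivity described above can be illustrated with a small sketch. The abstract does not spell out the authors' technique, so the code below rests on an assumption: it takes normalized stress to be the sum of squared differences between high-dimensional and projected pairwise distances, divided by the sum of squared high-dimensional distances, and it obtains a scale-invariant value by minimizing over a uniform scale factor alpha, which has a closed form. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for all pairs i < j, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def normalized_stress(d_high, d_low, alpha=1.0):
    """Normalized stress with the projection distances scaled by alpha."""
    num = sum((dh - alpha * dl) ** 2 for dh, dl in zip(d_high, d_low))
    den = sum(dh ** 2 for dh in d_high)
    return num / den


def scale_invariant_stress(d_high, d_low):
    """Minimum of normalized stress over all uniform scalings (assumed approach).

    Setting the derivative with respect to alpha to zero gives
    alpha* = sum(dh * dl) / sum(dl ** 2).
    """
    alpha = sum(dh * dl for dh, dl in zip(d_high, d_low)) / sum(
        dl ** 2 for dl in d_low
    )
    return normalized_stress(d_high, d_low, alpha)


# Toy data: 50 points in 10 dimensions; the "projection" keeps two coordinates.
rng = random.Random(0)
high = [[rng.gauss(0, 1) for _ in range(10)] for _ in range(50)]
low = [p[:2] for p in high]

d_high = pairwise_distances(high)
d_low = pairwise_distances(low)

# Scaling every projected point by s scales every projected distance by s.
for s in (0.5, 1.0, 2.0, 10.0):
    scaled = [s * dl for dl in d_low]
    print(s, normalized_stress(d_high, scaled), scale_invariant_stress(d_high, scaled))
```

In this sketch the raw normalized stress changes with each scale factor, while the minimized value stays the same up to floating-point error, because the optimal alpha absorbs any uniform rescaling of the projection.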