The Tree-SNE Tree Exists
2510.15014v1
stat.ML, cs.LG, math.OC
2025-10-21
Authors:
Jack Kendrick
Abstract
The clustering and visualisation of high-dimensional data is a ubiquitous
task in modern data science. Popular techniques include nonlinear
dimensionality reduction methods like t-SNE or UMAP. These methods face the
'scale-problem' of clustering: when dealing with the MNIST dataset, do we want
to distinguish different digits or do we want to distinguish different ways of
writing the digits? The answer depends on the task and on the scale. We
revisit an idea of Robinson & Pierce-Hoffman that exploits an underlying
scaling symmetry in t-SNE to replace 2-dimensional with (2+1)-dimensional
embeddings where the additional parameter accounts for scale. This gives rise
to the t-SNE tree (short: tree-SNE). We prove that the optimal embedding
depends continuously on the scaling parameter for all initial conditions
outside a set of measure 0: the tree-SNE tree exists. This idea conceivably
extends to other attraction-repulsion methods and is illustrated on several
examples.