Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation
2510.23039v1
cs.LG, cs.DS, stat.ML
2025-10-29
Authors:
Ved Danait, Srijan Das, Sujoy Bhore
Abstract
Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density
Estimation (A-KDE) are fundamental problems at the core of modern machine
learning, with broad applications in data analysis, information systems, and
large-scale decision making. In massive and dynamic data streams, a central
challenge is to design compact sketches that preserve essential structural
properties of the data while enabling efficient queries.
In this work, we develop new sketching algorithms that achieve sublinear
space and query time guarantees for both ANN and A-KDE for a dynamic stream of
data. For ANN in the streaming model, under natural assumptions, we design a
sublinear sketch that requires only $\mathcal{O}(n^{1+\rho-\eta})$ memory by
storing only a sublinear ($n^{-\eta}$) fraction of the total inputs, where
$\rho$ is a parameter of the LSH family, and $0<\eta<1$. Our method supports
sublinear query time, batch queries, and extends to the more general Turnstile
model. While earlier works have focused on Exact NN, this is the first result
on ANN that achieves near-optimal trade-offs between memory size and
approximation error.
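The memory bound can be read as roughly $n^{1-\eta}$ retained points, each indexed in about $n^{\rho}$ LSH tables. The following toy Python sketch illustrates that reading only; it is not the authors' algorithm. The class name `SampledLSHSketch`, the uniform sampling rule, the random-hyperplane hash family, and the Euclidean re-ranking step are all assumptions made for illustration.

```python
import random


class SampledLSHSketch:
    """Toy streaming ANN sketch (illustrative assumption, not the paper's method).

    Each arriving point is kept with probability n**(-eta); kept points are
    indexed in about n**rho random-hyperplane LSH tables, so the number of
    stored table entries is on the order of n**(1 + rho - eta).
    """

    def __init__(self, dim, n_expected, eta, rho, bits=8, seed=0):
        self.rng = random.Random(seed)
        self.keep_prob = n_expected ** (-eta)          # sublinear sampling rate
        self.num_tables = max(1, round(n_expected ** rho))
        # One set of `bits` random hyperplanes per table.
        self.planes = [
            [[self.rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
            for _ in range(self.num_tables)
        ]
        self.tables = [dict() for _ in range(self.num_tables)]
        self.stored = 0

    def _key(self, t, x):
        # Sign pattern of x against the hyperplanes of table t.
        return tuple(
            sum(p * v for p, v in zip(plane, x)) >= 0 for plane in self.planes[t]
        )

    def insert(self, x):
        """Process one stream item; return True if it was retained."""
        if self.rng.random() >= self.keep_prob:
            return False
        x = tuple(x)
        for t in range(self.num_tables):
            self.tables[t].setdefault(self._key(t, x), []).append(x)
        self.stored += 1
        return True

    def query(self, q):
        """Return the closest retained point among colliding buckets, or None."""
        candidates = set()
        for t in range(self.num_tables):
            candidates.update(self.tables[t].get(self._key(t, q), []))
        if not candidates:
            return None
        # Re-rank candidates by squared Euclidean distance (toy choice).
        return min(candidates, key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))
```

With `n_expected=1000`, `eta=0.5`, and `rho=0.5`, this keeps each point with probability about 0.03 and builds 32 tables. A query can only return a retained point, which is why guarantees of this kind need assumptions on the data, as the abstract notes.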
Next, for A-KDE in the Sliding-Window model, we propose a sketch of size
$\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+\epsilon} - 1} \log^2 N\right)$,
where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window
size, and $\epsilon$ is the approximation error. This, to the best of our
knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the
Sliding-Window model.
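To get a feel for how the stated size bound scales, the short Python helper below evaluates the expression inside the $\mathcal{O}(\cdot)$ directly. The function name `akde_sketch_size`, the choice of base-2 logarithm, and the omission of constant factors are assumptions; the result is a unitless scaling figure, not a byte count.

```python
import math


def akde_sketch_size(rows, lsh_range, window, eps):
    """Evaluate R * W * 1/(sqrt(1+eps) - 1) * log^2(N), ignoring constants.

    rows      -- R, number of sketch rows
    lsh_range -- W, range of the LSH function
    window    -- N, sliding-window size
    eps       -- approximation error (must be > 0)
    Assumes base-2 logarithms; the abstract does not specify the base.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    accuracy_factor = 1.0 / (math.sqrt(1.0 + eps) - 1.0)
    return rows * lsh_range * accuracy_factor * math.log2(window) ** 2


# Example: R = 2, W = 3, N = 1024, eps = 3 gives 2 * 3 * 1 * 10**2.
print(akde_sketch_size(2, 3, 1024, 3.0))  # 600.0
```

For small $\epsilon$, $\sqrt{1+\epsilon} - 1 \approx \epsilon/2$, so the accuracy factor behaves like $2/\epsilon$, while the dependence on the window size $N$ is only polylogarithmic.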
We complement our theoretical results with experiments on various real-world
datasets, which show that the proposed sketches are lightweight and achieve
consistently low error in practice.