Learning-Based Hashing for ANN Search: Foundations and Early Advances
2510.04127v1
cs.IR, cs.AI, cs.CV, cs.LG
2025-10-08
Авторы:
Sean Moran
Abstract
Approximate Nearest Neighbour (ANN) search is a fundamental problem in
information retrieval, underpinning large-scale applications in computer
vision, natural language processing, and cross-modal search. Hashing-based
methods provide an efficient solution by mapping high-dimensional data into
compact binary codes that enable fast similarity computations in Hamming space.
Over the past two decades, a substantial body of work has explored learning to
hash, where projection and quantisation functions are optimised from data
rather than chosen at random.
This article offers a foundational survey of early learning-based hashing
methods, with an emphasis on the core ideas that shaped the field. We review
supervised, unsupervised, and semi-supervised approaches, highlighting how
projection functions are designed to generate meaningful embeddings and how
quantisation strategies convert these embeddings into binary codes. We also
examine extensions to multi-bit and multi-threshold models, as well as early
advances in cross-modal retrieval.
Rather than providing an exhaustive account of the most recent methods, our
goal is to introduce the conceptual foundations of learning-based hashing for
ANN search. By situating these early models in their historical context, we aim
to equip readers with a structured understanding of the principles, trade-offs,
and open challenges that continue to inform current research in this area.