📊 Digest statistics
Total digests: 34022 Added today: 82
Last updated: today
Authors:
Philipp Jung, Nicholas Chandler, Sebastian Jäger, Felix Biessmann
Russian summary not found
Available fields: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']
Annotation:
Data quality monitoring is a core challenge in modern information processing systems. While many approaches to detect data errors or shifts have been proposed, few studies investigate the mechanisms governing error generation. We argue that knowing how errors were generated can be key to tracing and fixing them. In this study, we build on existing work in the statistics literature on missing values and propose MechDetect, a simple algorithm to investigate error generation mechanisms. Given a tab...
📄 Learning Filter-Aware Distance Metrics for Nearest Neighbor Search with Multiple Filters
2025-11-08
Authors:
Ananya Sutradhar, Suryansh Gupta, Ravishankar Krishnaswamy, Haiyang Xu, Aseem Rastogi, Gopal Srinivasa
Russian summary not found
Annotation:
Filtered Approximate Nearest Neighbor (ANN) search retrieves the closest
vectors for a query vector from a dataset. It enforces that a specified set of
discrete labels $S$ for the query must be included in the labels of each
retrieved vector. Existing graph-based methods typically incorporate filter
awareness by assigning fixed penalties or prioritizing nodes based on filter
satisfaction. However, since these methods use fixed, data-independent
penalties, they often fail to generalize across d...
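The filter constraint described in this abstract can be illustrated with a minimal exact (brute-force) sketch. This is not the paper's graph-based method or its learned distance metric; it only shows the problem being solved: among vectors whose label set contains every query label in $S$, return the closest ones. The function name `filtered_nearest`, its parameters, the Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import math


def filtered_nearest(query, query_labels, vectors, vector_labels, k=1):
    """Exact filtered k-nearest-neighbor search (illustrative baseline only).

    A vector qualifies only if its labels include every label in query_labels.
    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    required = set(query_labels)
    # Keep the indices of vectors whose label set is a superset of the query labels.
    candidates = [
        i for i, labels in enumerate(vector_labels) if required <= set(labels)
    ]
    # Rank the surviving candidates by distance to the query (stable sort).
    candidates.sort(key=lambda i: math.dist(query, vectors[i]))
    return candidates[:k]


# Hypothetical toy data: four 2-D vectors with discrete label sets.
vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
vector_labels = [{"a"}, {"a", "b"}, {"b"}, {"a", "b", "c"}]

result = filtered_nearest((0.0, 0.0), {"a", "b"}, vectors, vector_labels, k=2)
print(result)
```

In this toy data, vector 0 coincides with the query but is excluded because it lacks label "b", so the filter changes which neighbors are returned. Approximate methods aim for the same result without scanning every vector.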
Authors:
Dingyi Kang, Dongming Jiang, Hanshen Yang, Hang Liu, Bingzhe Li
Russian summary not found
Annotation:
Approximate Nearest Neighbor Search (ANNS), as the core of vector databases
(VectorDBs), has become widely used in modern AI and ML systems, powering
applications from information retrieval to bio-informatics. While graph-based
ANNS methods achieve high query efficiency, their scalability is constrained by
the available host memory. Recent disk-based ANNS approaches mitigate memory
usage by offloading data to Solid-State Drives (SSDs). However, they still
suffer from issues such as long I/O trav...