📊 Digest statistics
Total digests: 34022 Added today: 82
Last updated: today
📄 The Role of Parametric Injection-A Systematic Study of Parametric Retrieval-Augmented Generation
2025-10-16
Authors:
Minghao Tang, Shiyu Ni, Jingtong Wu, Zengxin Han, Keping Bi
Russian summary not found
Annotation:
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by
retrieving external documents. As an emerging form of RAG, parametric
retrieval-augmented generation (PRAG) encodes documents as model parameters
(i.e., LoRA modules) and injects these representations into the model during
inference, enabling interaction between the LLM and documents at parametric
level. Compared with directly placing documents in the input context, PRAG is
more efficient and has the potential to offer...
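The abstract says PRAG stores a document as LoRA modules and injects them into the model at inference. Under the standard LoRA formulation (an assumption here; the abstract does not give PRAG's exact update rule), injection amounts to adding a scaled low-rank product to a frozen weight matrix, W' = W + (alpha / r) · B A. The sketch below shows only that arithmetic in plain Python; the helper names `matmul` and `inject_lora` are hypothetical, and how PRAG trains the per-document factors is not shown.

```python
# Minimal sketch of LoRA-style parametric injection (hypothetical helpers,
# assuming the standard LoRA update W' = W + (alpha / r) * B @ A).
# A document is assumed to be encoded as low-rank factors B (d x r) and A (r x k).

Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    inner = len(y)
    if any(len(row) != inner for row in x):
        raise ValueError("inner dimensions do not match")
    cols = len(y[0])
    return [[sum(row[t] * y[t][j] for t in range(inner)) for j in range(cols)] for row in x]


def inject_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float = 1.0) -> Matrix:
    """Return a new matrix W + (alpha / r) * B @ A; W itself is not modified."""
    r = len(a)  # LoRA rank = number of rows of A
    delta = matmul(b, a)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

For example, with W the 2×2 identity, B = [[1.0], [2.0]] and A = [[0.5, 0.5]] (rank 1, alpha = 1), `inject_lora(W, B, A)` returns [[1.5, 0.5], [1.0, 2.0]]. The function returns a new matrix instead of mutating W, so the base weights stay intact and a different document's factors can be applied to the same W.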
📄 CardRewriter: Leveraging Knowledge Cards for Long-Tail Query Rewriting on Short-Video Platforms
2025-10-15
Authors:
Peiyuan Gong, Feiran Zhu, Yaqi Yin, Chenglei Dai, Chao Zhang, Kai Zheng, Wentian Bao, Jiaxin Mao, Yi Zhang
Russian summary not found
Annotation:
Short-video platforms have rapidly become a new generation of information
retrieval systems, where users formulate queries to access desired videos.
However, user queries, especially long-tail ones, often suffer from spelling
errors, incomplete phrasing, and ambiguous intent, resulting in mismatches
between user expectations and retrieved results. While large language models
(LLMs) have shown success in long-tail query rewriting within e-commerce, they
struggle on short-video platforms, where pr...
📄 QDER: Query-Specific Document and Entity Representations for Multi-Vector Document Re-Ranking
2025-10-15
Authors:
Shubham Chatterjee, Jeff Dalton
Russian summary not found
Annotation:
Neural IR has advanced through two distinct paths: entity-oriented approaches
leveraging knowledge graphs and multi-vector models capturing fine-grained
semantics. We introduce QDER, a neural re-ranking model that unifies these
approaches by integrating knowledge graph semantics into a multi-vector model.
QDER's key innovation lies in its modeling of query-document relationships:
rather than computing similarity scores on aggregated embeddings, we maintain
individual token and entity representat...
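The abstract contrasts computing similarity on aggregated embeddings with keeping individual token (and entity) representations. As a generic illustration of that contrast only, the sketch below compares mean-pooled cosine similarity with ColBERT-style MaxSim late interaction, a well-known multi-vector scoring rule. It is not QDER's scoring function: the abstract is truncated before describing it, and entity representations are left out. All function names are hypothetical.

```python
import math

Vector = list[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def mean_pool(vectors: list[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    if not vectors:
        raise ValueError("cannot pool an empty list")
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def pooled_score(query: list[Vector], doc: list[Vector]) -> float:
    """Single-vector scoring: compare the averaged query and document embeddings."""
    return cosine(mean_pool(query), mean_pool(doc))


def maxsim_score(query: list[Vector], doc: list[Vector]) -> float:
    """Late-interaction (MaxSim) scoring: each query vector is matched to its
    most similar document vector, and the per-vector maxima are summed."""
    return sum(max(cosine(q, d) for d in doc) for q in query)
```

With query vectors [[1, 0], [0, 1]], a document holding the single blended vector [[1, 1]] and a document holding [[1, 0], [0, 1]] both get a pooled score of about 1.0, so pooling cannot tell them apart. MaxSim gives the first document about 1.414 (√2) and the second 2.0, because each query vector finds an exact match only in the second document.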
Authors:
Shubham Chatterjee
Russian summary not found
Annotation:
Current neural re-rankers often struggle with complex information needs and
long, content-rich documents. The fundamental issue is not computational--it is
intelligent content selection: identifying what matters in lengthy,
multi-faceted texts. While humans naturally anchor their understanding around
key entities and concepts, neural models process text within rigid token
windows, treating all interactions as equally important and missing critical
semantic signals. We introduce REGENT, a neural ...
Authors:
Peiyang Liu, Ziqiang Cui, Di Liang, Wei Ye
Russian summary not found
Annotation:
Retrieval-augmented generation (RAG) enhances Large Language Models (LLMs) by
mitigating hallucinations and outdated information issues, yet simultaneously
facilitates unauthorized data appropriation at scale. This paper addresses this
challenge through two key contributions. First, we introduce RPD, a novel
dataset specifically designed for RAG plagiarism detection that encompasses
diverse professional domains and writing styles, overcoming limitations in
existing resources. Second, we develop ...
Authors:
Jianlyu Chen, Junwei Lan, Chaofan Li, Defu Lian, Zheng Liu
Russian summary not found
Annotation:
In this paper, we introduce ReasonEmbed, a novel text embedding model
developed for reasoning-intensive document retrieval. Our work includes three
key technical contributions. First, we propose ReMixer, a new data synthesis
method that overcomes the triviality problem prevalent in previous synthetic
datasets, enabling large-scale production of 82K high-quality training samples.
Second, we design Redapter, a self-adaptive learning algorithm that dynamically
adjusts each training sample's weight ...
Authors:
Elena Senger, Yuri Campbell, Rob van der Goot, Barbara Plank
Russian summary not found
Annotation:
Automatic Term Extraction (ATE) is a critical component in downstream NLP
tasks such as document tagging, ontology construction and patent analysis.
Current state-of-the-art methods require expensive human annotation and
struggle with domain transfer, limiting their practical deployment. This
highlights the need for more robust, scalable solutions and realistic
evaluation settings. To address this, we introduce a comprehensive benchmark
spanning seven diverse domains, enabling performance evalua...
Authors:
Simon Lupart, Daniël van Dijk, Eric Langezaal, Ian van Dort, Mohammad Aliannejadi
Russian summary not found
Annotation:
Personalized Conversational Information Retrieval (CIR) has seen rapid
progress in recent years, driven by the development of Large Language Models
(LLMs). Personalized CIR aims to enhance document retrieval by leveraging
user-specific information, such as preferences, knowledge, or constraints, to
tailor responses to individual needs. A key resource for this task is the TREC
iKAT 2023 dataset, designed to evaluate personalization in CIR pipelines.
Building on this resource, Mo et al. explored s...
Authors:
Yu-Fei Shih, An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen
Russian summary not found
Annotation:
People often struggle to remember specific details of past experiences, which
can lead to the need to revisit these memories. Consequently, lifelog retrieval
has emerged as a crucial application. Various studies have explored methods to
facilitate rapid access to personal lifelogs for memory recall assistance. In
this paper, we propose a Captioning-Integrated Visual Lifelog (CIVIL) Retrieval
System for extracting specific images from a user's visual lifelog based on
textual queries. Unlike tradi...
Authors:
Jingjie Ning, Yibo Kong, Yunfan Long, Jamie Callan
Russian summary not found
Annotation:
Retrieval-Augmented Generation (RAG) couples document retrieval with large
language models (LLMs). While scaling generators improves accuracy, it also
raises cost and limits deployability. We explore an orthogonal axis: enlarging
the retriever's corpus to reduce reliance on large LLMs. Experimental results
show that corpus scaling consistently strengthens RAG and can often serve as a
substitute for increasing model size, though with diminishing returns at larger
scales. Small- and mid-sized gene...
Showing 21-30 of 67 entries