📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 The Role of Parametric Injection-A Systematic Study of Parametric Retrieval-Augmented Generation

2025-10-16

Авторы:

Minghao Tang, Shiyu Ni, Jingtong Wu, Zengxin Han, Keping Bi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by retrieving external documents. As an emerging form of RAG, parametric retrieval-augmented generation (PRAG) encodes documents as model parameters (i.e., LoRA modules) and injects these representations into the model during inference, enabling interaction between the LLM and documents at parametric level. Compared with directly placing documents in the input context, PRAG is more efficient and has the potential to offer...

ID: 2510.12668v1 cs.IR, cs.CL

arXiv PDF

📄 CardRewriter: Leveraging Knowledge Cards for Long-Tail Query Rewriting on Short-Video Platforms

2025-10-15

Авторы:

Peiyuan Gong, Feiran Zhu, Yaqi Yin, Chenglei Dai, Chao Zhang, Kai Zheng, Wentian Bao, Jiaxin Mao, Yi Zhang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Short-video platforms have rapidly become a new generation of information retrieval systems, where users formulate queries to access desired videos. However, user queries, especially long-tail ones, often suffer from spelling errors, incomplete phrasing, and ambiguous intent, resulting in mismatches between user expectations and retrieved results. While large language models (LLMs) have shown success in long-tail query rewriting within e-commerce, they struggle on short-video platforms, where pr...

ID: 2510.10095v1 cs.IR, cs.CL

arXiv PDF

📄 QDER: Query-Specific Document and Entity Representations for Multi-Vector Document Re-Ranking

2025-10-15

Авторы:

Shubham Chatterjee, Jeff Dalton

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Neural IR has advanced through two distinct paths: entity-oriented approaches leveraging knowledge graphs and multi-vector models capturing fine-grained semantics. We introduce QDER, a neural re-ranking model that unifies these approaches by integrating knowledge graph semantics into a multi-vector model. QDER's key innovation lies in its modeling of query-document relationships: rather than computing similarity scores on aggregated embeddings, we maintain individual token and entity representat...

ID: 2510.11589v1 cs.IR, cs.CL

arXiv PDF

📄 REGENT: Relevance-Guided Attention for Entity-Aware Multi-Vector Neural Re-Ranking

2025-10-15

Авторы:

Shubham Chatterjee

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Current neural re-rankers often struggle with complex information needs and long, content-rich documents. The fundamental issue is not computational--it is intelligent content selection: identifying what matters in lengthy, multi-faceted texts. While humans naturally anchor their understanding around key entities and concepts, neural models process text within rigid token windows, treating all interactions as equally important and missing critical semantic signals. We introduce REGENT, a neural ...

ID: 2510.11592v1 cs.IR, cs.CL

arXiv PDF

📄 Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft

2025-10-11

Авторы:

Peiyang Liu, Ziqiang Cui, Di Liang, Wei Ye

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Retrieval-augmented generation (RAG) enhances Large Language Models (LLMs) by mitigating hallucinations and outdated information issues, yet simultaneously facilitates unauthorized data appropriation at scale. This paper addresses this challenge through two key contributions. First, we introduce RPD, a novel dataset specifically designed for RAG plagiarism detection that encompasses diverse professional domains and writing styles, overcoming limitations in existing resources. Second, we develop ...

ID: 2510.07728v1 cs.IR, cs.CL

arXiv PDF

📄 ReasonEmbed: Enhanced Text Embeddings for Reasoning-Intensive Document Retrieval

2025-10-11

Авторы:

Jianlyu Chen, Junwei Lan, Chaofan Li, Defu Lian, Zheng Liu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

In this paper, we introduce ReasonEmbed, a novel text embedding model developed for reasoning-intensive document retrieval. Our work includes three key technical contributions. First, we propose ReMixer, a new data synthesis method that overcomes the triviality problem prevalent in previous synthetic datasets, enabling large-scale production of 82K high-quality training samples. Second, we design Redapter, a self-adaptive learning algorithm that dynamically adjusts training each sample's weight ...

ID: 2510.08252v1 cs.IR, cs.CL

arXiv PDF

📄 Crossing Domains without Labels: Distant Supervision for Term Extraction

2025-10-10

Авторы:

Elena Senger, Yuri Campbell, Rob van der Goot, Barbara Plank

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Automatic Term Extraction (ATE) is a critical component in downstream NLP tasks such as document tagging, ontology construction and patent analysis. Current state-of-the-art methods require expensive human annotation and struggle with domain transfer, limiting their practical deployment. This highlights the need for more robust, scalable solutions and realistic evaluation settings. To address this, we introduce a comprehensive benchmark spanning seven diverse domains, enabling performance evalua...

ID: 2510.06838v1 cs.IR, cs.CL

arXiv PDF

📄 Investigating LLM Variability in Personalized Conversational Information Retrieval

2025-10-08

Авторы:

Simon Lupart, Daniël van Dijk, Eric Langezaal, Ian van Dort, Mohammad Aliannejadi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Personalized Conversational Information Retrieval (CIR) has seen rapid progress in recent years, driven by the development of Large Language Models (LLMs). Personalized CIR aims to enhance document retrieval by leveraging user-specific information, such as preferences, knowledge, or constraints, to tailor responses to individual needs. A key resource for this task is the TREC iKAT 2023 dataset, designed to evaluate personalization in CIR pipelines. Building on this resource, Mo et al. explored s...

ID: 2510.03795v1 cs.IR, cs.CL

arXiv PDF

📄 Visual Lifelog Retrieval through Captioning-Enhanced Interpretation

2025-10-08

Авторы:

Yu-Fei Shih, An-Zi Yen, Hen-Hsen Huang, Hsin-Hsi Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

People often struggle to remember specific details of past experiences, which can lead to the need to revisit these memories. Consequently, lifelog retrieval has emerged as a crucial application. Various studies have explored methods to facilitate rapid access to personal lifelogs for memory recall assistance. In this paper, we propose a Captioning-Integrated Visual Lifelog (CIVIL) Retrieval System for extracting specific images from a user's visual lifelog based on textual queries. Unlike tradi...

ID: 2510.04010v1 cs.IR, cs.CL, cs.CV, cs.MM

arXiv PDF

📄 Less LLM, More Documents: Searching for Improved RAG

2025-10-07

Авторы:

Jingjie Ning, Yibo Kong, Yunfan Long, Jamie Callan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Retrieval-Augmented Generation (RAG) couples document retrieval with large language models (LLMs). While scaling generators improves accuracy, it also raises cost and limits deployability. We explore an orthogonal axis: enlarging the retriever's corpus to reduce reliance on large LLMs. Experimental results show that corpus scaling consistently strengthens RAG and can often serve as a substitute for increasing model size, though with diminishing returns at larger scales. Small- and mid-sized gene...

ID: 2510.02657v2 cs.IR, cs.CL, H.3.3; I.2.7

arXiv PDF

Показано 21 - 30 из 67 записей