📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Iterative Layer-wise Distillation for Efficient Compression of Large Language Models

2025-11-11

Авторы:

Grigory Kovalev, Mikhail Tikhomirov

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

This work investigates distillation methods for large language models (LLMs) with the goal of developing compact models that preserve high performance. Several existing approaches are reviewed, with a discussion of their respective strengths and limitations. An improved method based on the ShortGPT approach has been developed, building upon the idea of incorporating iterative evaluation of layer importance. At each step, importance is assessed by measuring performance degradation when individual...

ID: 2511.05085v1 cs.CL, cs.LG

arXiv PDF

📄 QUESTER: Query Specification for Generative Retrieval

2025-11-11

Авторы:

Arthur Satouf, Yuxuan Zong, Habiboulaye Amadou-Boubacar, Pablo Piantanida, Benjamin Piwowarski

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Generative Retrieval (GR) differs from the traditional index-then-retrieve pipeline by storing relevance in model parameters and directly generating document identifiers. However, GR often struggles to generalize and is costly to scale. We introduce QUESTER (QUEry SpecificaTion gEnerative Retrieval), which reframes GR as query specification generation - in this work, a simple keyword query handled by BM25 - using a (small) LLM. The policy is trained using reinforcement learning techniques (GRPO)...

ID: 2511.05301v1 cs.IR, cs.CL, cs.LG, 68P20, 68T50, H.3

arXiv PDF

📄 Steering Language Models with Weight Arithmetic

2025-11-11

Авторы:

Constanza Fierro, Fabien Roger

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Providing high-quality feedback to Large Language Models (LLMs) on a diverse training distribution can be difficult and expensive, and providing feedback only on a narrow distribution can result in unintended generalizations. To better leverage narrow training data, we propose contrastive weight steering, a simple post-training method that edits the model parameters using weight arithmetic. We isolate a behavior direction in weight-space by subtracting the weight deltas from two small fine-tunes...

ID: 2511.05408v1 cs.CL, cs.LG

arXiv PDF

📄 GRAD: Graph-Retrieved Adaptive Decoding for Hallucination Mitigation

2025-11-08

Авторы:

Manh Nguyen, Sunil Gupta, Dai Do, Hung Le

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Hallucination mitigation remains a persistent challenge for large language models (LLMs), even as model scales grow. Existing approaches often rely on external knowledge sources, such as structured databases or knowledge graphs, accessed through prompting or retrieval. However, prompt-based grounding is fragile and domain-sensitive, while symbolic knowledge integration incurs heavy retrieval and formatting costs. Motivated by knowledge graphs, we introduce Graph-Retrieved Adaptive Decoding (GRAD...

ID: 2511.03900v1 cs.CL, cs.LG

arXiv PDF

📄 REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs

2025-11-08

Авторы:

Liran Cohen, Yaniv Nemcovesky, Avi Mendelson

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Machine unlearning aims to remove the influence of specific training data from a model without requiring full retraining. This capability is crucial for ensuring privacy, safety, and regulatory compliance. Therefore, verifying whether a model has truly forgotten target data is essential for maintaining reliability and trustworthiness. However, existing evaluation methods often assess forgetting at the level of individual inputs. This approach may overlook residual influence present in semantical...

ID: 2511.04228v1 cs.CL, cs.LG, I.2.7; I.2.6; K.4.1

arXiv PDF

📄 DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration

2025-11-08

Авторы:

Narjes Nourzad, Hanqing Yang, Shiyu Chen, Carlee Joe-Wong

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Cooperative multi-agent planning requires agents to make joint decisions with partial information and limited communication. Coordination at the trajectory level often fails, as small deviations in timing or movement cascade into conflicts. Symbolic planning mitigates this challenge by raising the level of abstraction and providing a minimal vocabulary of actions that enable synchronization and collective progress. We present DR. WELL, a decentralized neurosymbolic framework for cooperative mult...

ID: 2511.04646v1 cs.AI, cs.CL, cs.LG, cs.MA

arXiv PDF

📄 Automatic Machine Translation Detection Using a Surrogate Multilingual Translation Model

2025-11-07

Авторы:

Cristian García-Romero, Miquel Esplà-Gomis, Felipe Sánchez-Martínez

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Modern machine translation (MT) systems depend on large parallel corpora, often collected from the Internet. However, recent evidence indicates that (i) a substantial portion of these texts are machine-generated translations, and (ii) an overreliance on such synthetic content in training data can significantly degrade translation quality. As a result, filtering out non-human translations is becoming an essential pre-processing step in building high-quality MT systems. In this work, we propose a ...

ID: 2511.02958v1 cs.CL, cs.LG

arXiv PDF

📄 Data-Efficient Adaptation and a Novel Evaluation Method for Aspect-based Sentiment Analysis

2025-11-07

Авторы:

Yan Cathy Hua, Paul Denny, Jörg Wicker, Katerina Taškova

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Aspect-based Sentiment Analysis (ABSA) is a fine-grained opinion mining approach that identifies and classifies opinions associated with specific entities (aspects) or their categories within a sentence. Despite its rapid growth and broad potential, ABSA research and resources remain concentrated in commercial domains, leaving analytical needs unmet in high-demand yet low-resource areas such as education and healthcare. Domain adaptation challenges and most existing methods' reliance on resource...

ID: 2511.03034v1 cs.CL, cs.LG

arXiv PDF

📄 PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech

2025-11-07

Авторы:

Michel Wong, Ali Alshehri, Sophia Kao, Haotian He

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Text Normalization (TN) is a key preprocessing step in Text-to-Speech (TTS) systems, converting written forms into their canonical spoken equivalents. Traditional TN systems can exhibit high accuracy, but involve substantial engineering effort, are difficult to scale, and pose challenges to language coverage, particularly in low-resource settings. We propose PolyNorm, a prompt-based approach to TN using Large Language Models (LLMs), aiming to reduce the reliance on manually crafted rules and ena...

ID: 2511.03080v1 cs.CL, cs.LG

arXiv PDF

📄 BanglaSTEM: A Parallel Corpus for Technical Domain Bangla-English Translation

2025-11-07

Авторы:

Kazi Reyazul Hasan, Mubasshira Musarrat, A. B. M. Alim Al Islam, Muhammad Abdullah Adnan

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models work well for technical problem solving in English but perform poorly when the same questions are asked in Bangla. A simple solution would be to translate Bangla questions into English first and then use these models. However, existing Bangla-English translation systems struggle with technical terms. They often mistranslate specialized vocabulary, which changes the meaning of the problem and leads to wrong answers. We present BanglaSTEM, a dataset of 5,000 carefully selecte...

ID: 2511.03498v1 cs.CL, cs.LG

arXiv PDF

Показано 101 - 110 из 573 записей