📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR

2025-10-02

Авторы:

Fanding Huang, Guanbo Huang, Xiao Fan, Yi He, Xiao Liang, Xiao Chen, Qinting Jiang, Faisal Nadeem Khan, Jingyan Jiang, Zhi Wang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

A prevailing view in Reinforcement Learning for Verifiable Rewards (RLVR) interprets recent progress through the lens of an exploration-exploitation trade-off, a perspective largely shaped by token-level metrics. We re-examine this perspective, proposing that this perceived trade-off may not be a fundamental constraint but rather an artifact of the measurement level. To investigate this, we shift the analysis to the semantically rich hidden-state space, adopting Effective Rank (ER) to quantify e...

ID: 2509.23808v2 cs.LG, cs.CL

arXiv PDF

📄 SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression

2025-10-02

Авторы:

Haoming Wen, Yushi Bai, Juanzi Li, Jie Tang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce SIRI, Scaling Iterative Reinforcement Learning with Interleaved Compression, a simple yet effective RL approach for Large Reasoning Models (LRMs) that enables more efficient and accurate reasoning. Existing studies have observed repetitive thinking patterns in LRMs, and attempts to reduce them often come at the cost of performance. In this paper, we show that this trade-off can be overcome through a training regime that iteratively alternates between compressing and expanding the re...

ID: 2509.25176v1 cs.LG, cs.CL

arXiv PDF

📄 Nudging the Boundaries of LLM Reasoning

2025-10-02

Авторы:

Justin Chih-Yao Chen, Becky Xiangyu Peng, Prafulla Kumar Choubey, Kung-Hsiang Huang, Jiaxin Zhang, Mohit Bansal, Chien-Sheng Wu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Current online reinforcement learning (RL) algorithms like GRPO share a key limitation in LLM reasoning: they cannot learn from problems that are "unsolvable" to the model. In other words, they can only improve performance on problems where the model is capable of exploring the correct answer. Consequently, the model's "upper limit" remains unchanged after RL training, even though the likelihood of solving easier, solvable problems may increase. These hard samples cannot contribute to training, ...

ID: 2509.25666v1 cs.LG, cs.CL

arXiv PDF

📄 Can VLM Pseudo-Labels Train a Time-Series QA Model That Outperforms the VLM?

2025-10-02

Авторы:

Takuya Fujimura, Kota Dohi, Natsuo Yamashita, Yohei Kawaguchi

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Time-series question answering (TSQA) tasks face significant challenges due to the lack of labeled data. Alternatively, with recent advancements in large-scale models, vision-language models (VLMs) have demonstrated the potential to analyze time-series signals in a zero-shot manner. In this paper, we propose a training approach that uses pseudo labels generated by a VLM. Although VLMs can produce incorrect labels, TSQA models can still be effectively trained based on the property that deep neura...

ID: 2509.25696v1 cs.LG, cs.CL, eess.SP

arXiv PDF

📄 MuPlon: Multi-Path Causal Optimization for Claim Verification through Controlling Confounding

2025-10-02

Авторы:

Hanghui Guo, Shimin Di, Pasquale De Meo, Zhangze Chen, Jia Zhu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As a critical task in data quality control, claim verification aims to curb the spread of misinformation by assessing the truthfulness of claims based on a wide range of evidence. However, traditional methods often overlook the complex interactions between evidence, leading to unreliable verification results. A straightforward solution represents the claim and evidence as a fully connected graph, which we define as the Claim-Evidence Graph (C-E Graph). Nevertheless, claim verification methods ba...

ID: 2509.25715v1 cs.LG, cs.CL, stat.ME

arXiv PDF

📄 Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space

2025-10-02

Авторы:

Xiang Zhang, Kun Wei, Xu Yang, Chenghao Xu, Su Yan, Cheng Deng

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

As Large Language Models (LLMs) become increasingly prevalent, their security vulnerabilities have already drawn attention. Machine unlearning is introduced to seek to mitigate these risks by removing the influence of undesirable data. However, existing methods not only rely on the retained dataset to preserve model utility, but also suffer from cumulative catastrophic utility loss under continuous unlearning requests. To solve this dilemma, we propose a novel method, called Rotation Control Unl...

ID: 2509.25743v1 cs.LG, cs.CL

arXiv PDF

📄 CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models

2025-10-02

Авторы:

Weiyu Huang, Yuezhou Hu, Jun Zhu, Jianfei Chen

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Sparsity-aware training is an effective approach for transforming large language models (LLMs) into hardware-friendly sparse patterns, thereby reducing latency and memory consumption during inference. In this paper, we propose Continuous Adaptive Sparse Trainer (CAST), a fully continuous and differentiable sparsity-aware training framework for semi-structured (or "N:M") sparse models. Unlike previous approaches that optimize sparsity patterns and weights separately, CAST enables seamless joint o...

ID: 2509.25996v1 cs.LG, cs.CL

arXiv PDF

📄 FITS: Towards an AI-Driven Fashion Information Tool for Sustainability

2025-10-02

Авторы:

Daphne Theodorakopoulos, Elisabeth Eberling, Miriam Bodenheimer, Sabine Loos, Frederic Stahl

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Access to credible sustainability information in the fashion industry remains limited and challenging to interpret, despite growing public and regulatory demands for transparency. General-purpose language models often lack domain-specific knowledge and tend to "hallucinate", which is particularly harmful for fields where factual correctness is crucial. This work explores how Natural Language Processing (NLP) techniques can be applied to classify sustainability data for fashion brands, thereby ad...

ID: 2509.26017v1 cs.LG, cs.CL

arXiv PDF

📄 Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners

2025-10-02

Авторы:

Xin Xu, Cliveb AI, Kai Yang, Tianhao Chen, Yang Wang, Saiyong Yang, Can Yang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement Learning with Verifiable Reward (RLVR) effectively solves complex tasks but demands extremely long context lengths during training, leading to substantial computational costs. While multi-stage training can partially mitigate this, starting with overly short contexts often causes irreversible performance degradation, ultimately failing to reduce overall training compute significantly. In this paper, we introduce **T**hinking-**F**ree **P**olicy **I**nitialization (**TFPI**), a simp...

ID: 2509.26226v1 cs.LG, cs.CL

arXiv PDF

📄 Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

2025-10-02

Авторы:

Runze Liu, Jiakang Wang, Yuling Shi, Zhihui Xie, Chenxin An, Kaiyan Zhang, Jian Zhao, Xiaodong Gu, Lei Lin, Wenping Hu, Xiu Li, Fuzheng Zhang, Guorui Zhou, Kun Gai

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Reinforcement Learning (RL) has shown remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). Process-Supervised RL (PSRL) has emerged as a more effective paradigm compared to outcome-based RL. However, existing PSRL approaches suffer from limited exploration efficiency, both in terms of branching positions and sampling. In this paper, we introduce a novel PSRL framework (AttnRL), which enables efficient exploration for reasoning models. Motivated by prelimina...

ID: 2509.26628v1 cs.LG, cs.CL

arXiv PDF

Показано 131 - 140 из 233 записей