📊 Статистика дайджестов

Всего дайджестов: 34022 Добавлено сегодня: 82

Последнее обновление: сегодня

📄 Domain-Shift-Aware Conformal Prediction for Large Language Models

2025-10-09

Авторы:

Zhexiao Lin, Yuanyuan Li, Neeraj Sarna, Yuanyuan Gao, Michael von Gablenz

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Confo...

ID: 2510.05566v1 stat.ML, cs.AI, cs.CL, cs.LG, stat.AP

arXiv PDF

📄 In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

2025-10-09

Авторы:

Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Outcome-driven reinforcement learning has advanced reasoning in large language models (LLMs), but prevailing tool-augmented approaches train a single, monolithic policy that interleaves thoughts and tool calls under full context; this scales poorly with long horizons and diverse tools and generalizes weakly to new scenarios. Agentic systems offer a promising alternative by decomposing work across specialized modules, yet most remain training-free or rely on offline training decoupled from the li...

ID: 2510.05592v1 cs.AI, cs.CL, cs.LG, cs.MA

arXiv PDF

📄 ARM: Discovering Agentic Reasoning Modules for Generalizable Multi-Agent Systems

2025-10-09

Авторы:

Bohan Yao, Shiva Krishna Reddy Malay, Vikas Yadav

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Language Model (LLM)-powered Multi-agent systems (MAS) have achieved state-of-the-art results on various complex reasoning tasks. Recent works have proposed techniques to automate the design of MASes, eliminating the need for manual engineering. However, these techniques perform poorly, often achieving similar or inferior performance to simple baselines. Furthermore, they require computationally expensive re-discovery of architectures for each new task domain and expensive data annotation ...

ID: 2510.05746v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning

2025-10-09

Авторы:

Jiaru Zou, Soumya Roy, Vinay Kumar Verma, Ziyi Wang, David Wipf, Pan Lu, Sumit Negi, James Zou, Jingrui He

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Process Reward Models (PRMs) have recently emerged as a powerful framework for enhancing the reasoning capabilities of large reasoning models (LRMs), particularly in the context of test-time scaling (TTS). However, their potential for supervising LRMs on tabular reasoning domains remains underexplored. Through detailed empirical analyses, we identify that existing PRMs, though widely adopted for supervising text-only reasoning steps, struggle with table-specific operations such as sub-table retr...

ID: 2510.06217v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 MedCLM: Learning to Localize and Reason via a CoT-Curriculum in Medical Vision-Language Models

2025-10-08

Авторы:

Soo Yong Kim, Suin Cho, Vincent-Daniel Yun, Gyeongyeon Hwang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Bridging clinical diagnostic reasoning with AI remains a central challenge in medical imaging. We introduce MedCLM, an automated pipeline that converts detection datasets into large-scale medical visual question answering (VQA) data with Chain-of-Thought (CoT) reasoning by linking lesion boxes to organ segmentation and structured rationales. These contextual signals enable medical vision-language models to generate question-answer pairs with step-by-step reasoning. To utilize this data effective...

ID: 2510.04477v1 cs.CV, cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Speak, Edit, Repeat: High-Fidelity Voice Editing and Zero-Shot TTS with Cross-Attentive Mamba

2025-10-08

Авторы:

Baher Mohammad, Magauiya Zhussip, Stamatios Lefkimmiatis

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

We introduce MAVE (Mamba with Cross-Attention for Voice Editing and Synthesis), a novel autoregressive architecture for text-conditioned voice editing and high-fidelity text-to-speech (TTS) synthesis, built on a cross-attentive Mamba backbone. MAVE achieves state-of-the-art performance in speech editing and very competitive results in zero-shot TTS, while not being explicitly trained on the latter task, outperforming leading autoregressive and diffusion models on diverse, real-world audio. By in...

ID: 2510.04738v1 cs.SD, cs.AI, cs.CL, cs.LG, eess.AS

arXiv PDF

📄 BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

2025-10-08

Авторы:

Ivo Petrov, Jasper Dekoninck, Martin Vechev

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models (LLMs) have recently shown strong performance on mathematical benchmarks. At the same time, they are prone to hallucination and sycophancy, often providing convincing but flawed proofs for incorrect mathematical statements provided by users. This significantly limits the applicability of LLMs in theorem proving, as verification of these flawed proofs must be done manually by expert mathematicians. However, existing benchmarks that measure sycophancy in mathematics are limit...

ID: 2510.04721v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 MARS: Optimizing Dual-System Deep Research via Multi-Agent Reinforcement Learning

2025-10-08

Авторы:

Guoxin Chen, Zile Qiao, Wenqing Wang, Donglei Yu, Xuanzhong Chen, Hao Sun, Minpeng Liao, Kai Fan, Yong Jiang, Penguin Xie, Wayne Xin Zhao, Ruihua Song, Fei Huang

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large Reasoning Models (LRMs) often exhibit a tendency for overanalysis in simple tasks, where the models excessively utilize System 2-type, deliberate reasoning, leading to inefficient token generation. Furthermore, these models face challenges in adapting their reasoning capabilities to rapidly changing environments due to the static nature of their pretraining data. To address these issues, advancing Large Language Models (LLMs) for complex reasoning tasks requires innovative approaches that ...

ID: 2510.04935v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 On the Role of Temperature Sampling in Test-Time Scaling

2025-10-07

Авторы:

Yuheng Wu, Azalia Mirhoseini, Thierry Tambe

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Large language models (LLMs) can improve reasoning at inference time through test-time scaling (TTS), where multiple reasoning traces are generated and the best one is selected. Prior work shows that increasing the number of samples K steadily improves accuracy. In this paper, we demonstrate that this trend does not hold indefinitely: at large K, further scaling yields no gains, and certain hard questions remain unsolved regardless of the number of traces. Interestingly, we find that different s...

ID: 2510.02611v1 cs.AI, cs.CL, cs.LG

arXiv PDF

📄 Pareto-optimal Non-uniform Language Generation

2025-10-07

Авторы:

Moses Charikar, Chirag Pabbaraju

Саммари на русском не найдено
Доступные поля: ['id', 'arxiv_id', 'title', 'authors', 'abstract', 'summary_ru', 'categories', 'published_date', 'created_at']

Annotation:

Kleinberg and Mullainathan (2024) recently proposed an interesting model for language generation in the limit: Given a countable collection of languages, and an adversary enumerating the strings of some language $L$ from the collection, the objective is to generate new strings from the target language, such that all strings generated beyond some finite time are valid. Li, Raman and Tewari (2024) and Charikar and Pabbaraju (2024) showed strong non-uniform generation guarantees in this model, givi...

ID: 2510.02795v1 cs.DS, cs.AI, cs.CL, cs.LG

arXiv PDF

Показано 81 - 90 из 196 записей