Test-time Prompt Intervention

2508.02511v1 cs.AI, cs.CL 2025-08-09

Authors:

Chenxu Yang, Qingyi Si, Mz Dai, Dingyu Yao, Mingyu Zheng, Minghui Chen, Zheng Lin, Weiping Wang

Summary

**Summary** Problem: Many modern LLMs, especially those that use long chains of thought (CoTs) to improve reasoning, produce reasoning chains with excessive redundancy and inconsistency. This stems from post-training that focuses on achieving a high outcome reward rather than on optimizing the reasoning process itself. Data for regulating intermediate steps is scarce and hard to obtain at scale. Solution: We propose Test-time Prompt Intervention (PI), a new framework for dynamically steering reasoning at inference time. It comprises three modules: When, How, and Which. These modules make it possible to intervene while the model is generating, guiding the process with targeted interventions and improving controllability and interpretability. Key findings: Experiments show that PI substantially shortens CoTs, reduces hallucination, and makes model reasoning more reliable. This is a new step toward the practical integration of expert reasoning principles into LLMs.

Abstract

Test-time compute has led to remarkable success in the large language model (LLM) community, particularly for complex tasks, where longer chains of thought (CoTs) are generated to enhance reasoning capabilities. However, growing evidence reveals that such reasoning models often produce CoTs plagued by excessive redundancy, including unnecessary verification steps and repetitive reasoning shifts. The root cause lies in their post-training, which relies too heavily on outcome reward paradigms, because data for process reward paradigms, which regulate intermediate reasoning steps, is difficult to construct at scale. To address this, we propose PI, a novel framework for Test-time Prompt Intervention. PI provides an interface to dynamically guide and regulate reasoning paths during inference through timely (When module) and proper (How module) interventions and post-intervention sampling (Which module). This allows human problem-solving expertise and cognitive science principles to be seamlessly integrated into LLMs' reasoning processes, enhancing controllability and interpretability. Extensive experiments across multiple models and datasets demonstrate that PI significantly shortens CoTs while reducing hallucination, yielding more concise and reliable reasoning.
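
The abstract names three modules but gives no implementation detail, so the following is only a minimal sketch of how a When/How/Which loop could be wired together. Everything in it is an assumption made for illustration: the trigger markers, the steering prompt, the shortest-candidate selection rule, the choice to discard the step that triggered the intervention, and the `toy_model` stand-in for an LLM. None of it is taken from the paper.

```python
from typing import Callable, List

# Hypothetical redundancy markers; the source does not say how the When module decides.
TRIGGER_MARKERS = ("wait", "let me double-check", "alternatively")

# Hypothetical steering text; the source does not give the actual intervention prompts.
STEER_PROMPT = "The reasoning so far is sufficient; proceed to the next step."


def when_to_intervene(step: str) -> bool:
    """When module (assumed heuristic): fire if the step contains a marker."""
    lowered = step.lower()
    return any(marker in lowered for marker in TRIGGER_MARKERS)


def how_to_intervene(context: List[str]) -> str:
    """How module (assumed): return the prompt to insert into the context."""
    return STEER_PROMPT


def which_continuation(candidates: List[str]) -> str:
    """Which module (assumed rule): prefer marker-free, then shortest, candidates."""
    return min(candidates, key=lambda c: (when_to_intervene(c), len(c)))


def run_with_intervention(
    generate_step: Callable[[List[str]], str],
    max_steps: int = 8,
    num_samples: int = 3,
) -> List[str]:
    """Generate step by step, intervening whenever the When module fires."""
    context: List[str] = []
    for _ in range(max_steps):
        step = generate_step(context)
        if when_to_intervene(step):
            # Drop the flagged step, insert the prompt, and resample after it.
            prompt = how_to_intervene(context)
            candidates = [generate_step(context + [prompt]) for _ in range(num_samples)]
            step = which_continuation(candidates)
            context.append(prompt)
        context.append(step)
        if step.startswith("Answer:"):
            break
    return context


def toy_model(context: List[str]) -> str:
    """Toy stand-in for an LLM: keeps re-verifying until it sees the steering prompt."""
    if STEER_PROMPT in context:
        return "Answer: 4"
    if not context:
        return "2 + 2 = 4"
    return "Wait, let me double-check that."


if __name__ == "__main__":
    for line in run_with_intervention(toy_model):
        print(line)
```

In this toy run the second generated step contains a marker, so it is replaced by the steering prompt followed by the selected continuation. In a real system `generate_step` would wrap a decoding call, and each of the three functions would be replaced by whatever criteria the paper actually defines.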
