Conformal Prediction Beyond the Horizon: Distribution-Free Inference for Policy Evaluation

2510.26026v1 stat.ML, cs.LG 2025-11-01

Authors:

Feichen Gan, Youcun Lu, Yingying Zhang, Yukun Liu

Abstract

Reliable uncertainty quantification is crucial for reinforcement learning (RL) in high-stakes settings. We propose a unified conformal prediction framework for infinite-horizon policy evaluation that constructs distribution-free prediction intervals for returns in both on-policy and off-policy settings. Our method integrates distributional RL with conformal calibration, addressing challenges such as unobserved returns, temporal dependencies, and distributional shifts. We introduce a modular pseudo-return construction based on truncated rollouts and a time-aware calibration strategy using experience replay and weighted subsampling. These innovations mitigate model bias and restore approximate exchangeability, enabling uncertainty quantification even under policy shifts. Our theoretical analysis provides coverage guarantees that account for model misspecification and importance weight estimation. Empirical results, including experiments in synthetic and benchmark environments such as Mountain Car, show that our method significantly improves coverage and reliability over standard distributional RL baselines.
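The abstract names two ingredients, pseudo-returns built from truncated rollouts and weighted conformal calibration, without giving formulas. The Python sketch below is one plausible reading, not the authors' algorithm: the function names, the absolute-residual score, the bootstrap form of the pseudo-return, and the convention of placing the test point's weight on an infinite score are all assumptions made for illustration.

```python
import numpy as np

def pseudo_return(rewards, bootstrap_value, gamma):
    """Assumed form: k observed discounted rewards plus a discounted
    model estimate of the unobserved tail of the infinite-horizon return."""
    rewards = np.asarray(rewards, dtype=float)
    k = len(rewards)
    discounts = gamma ** np.arange(k)
    return float(np.dot(discounts, rewards) + gamma ** k * bootstrap_value)

def weighted_quantile(scores, weights, level):
    """Smallest score whose normalized cumulative weight reaches `level`."""
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(scores)
    cum = np.cumsum(weights[order]) / weights.sum()
    idx = int(np.searchsorted(cum, level, side="left"))
    if idx >= len(scores):
        return float("inf")
    return float(scores[order][idx])

def conformal_interval(point_pred, cal_preds, cal_pseudo_returns,
                       cal_weights, test_weight, alpha):
    """Weighted split-conformal interval around `point_pred`.

    Scores are absolute residuals between calibration predictions and
    pseudo-returns (an assumed choice). The test point contributes its
    weight at score +inf, so the interval can only widen, never shrink.
    """
    residuals = np.abs(np.asarray(cal_pseudo_returns, dtype=float)
                       - np.asarray(cal_preds, dtype=float))
    scores = np.append(residuals, np.inf)
    weights = np.append(np.asarray(cal_weights, dtype=float), test_weight)
    q = weighted_quantile(scores, weights, 1.0 - alpha)
    return point_pred - q, point_pred + q
```

With all weights equal, this reduces to ordinary split conformal prediction. In an off-policy setting the weights would be estimated importance ratios between target and behavior policies; how the paper estimates and subsamples them is not specified in the abstract.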
