From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning?
2510.01571v1
cs.LG, cs.AI, q-bio.BM
2025-10-04
Авторы:
Hanqun Cao, Hongrui Zhang, Junde Xu, Zhou Zhang, Lingdong Shen, Minghao Sun, Ge Liu, Jinbo Xu, Wu-Jun Li, Jinren Ni, Cesar de la Fuente-Nunez, Tianfan Fu, Yejin Choi, Pheng-Ann Heng, Fang Wu
Abstract
Protein language models (PLMs) have advanced computational protein science
through large-scale pretraining and scalable architectures. In parallel,
reinforcement learning (RL) has broadened exploration and enabled precise
multi-objective optimization in protein design. Yet whether RL can push PLMs
beyond their pretraining priors to uncover latent sequence-structure-function
rules remains unclear. We address this by pairing RL with PLMs across four
domains: antimicrobial peptide design, kinase variant optimization, antibody
engineering, and inverse folding. Using diverse RL algorithms and model
classes, we ask if RL improves sampling efficiency and, more importantly, if it
reveals capabilities not captured by supervised learning. Across benchmarks, RL
consistently boosts success rates and sample efficiency. Performance follows a
three-factor interaction: task headroom, reward fidelity, and policy capacity
jointly determine gains. When rewards are accurate and informative, policies
have sufficient capacity, and tasks leave room beyond supervised baselines,
improvements scale; when rewards are noisy or capacity is constrained, gains
saturate despite exploration. This view yields practical guidance for RL in
protein design: prioritize reward modeling and calibration before scaling
policy size, match algorithm and regularization strength to task difficulty,
and allocate capacity where marginal gains are largest. Implementation is
available at https://github.com/chq1155/RL-PLM.
Ссылки и действия
Дополнительные ресурсы: