Control Barrier Function for Aligning Large Language Models
2511.03121v2
cs.CL, cs.AI, cs.SY, eess.SY
2025-11-07
Авторы:
Yuya Miyaoka, Masaki Inoue
Abstract
This paper proposes a control-based framework for aligning large language
models (LLMs) by leveraging a control barrier function (CBF) to ensure
user-desirable text generation. The presented framework applies the CBF safety
filter to the predicted token generated from the baseline LLM, to intervene in
the generated text. The safety filter includes two significant advantages: this
safety filter is an add-on type, allowing it to be used for alignment purposes
without fine-tuning the baseline LLM, and if there is an evaluation model
regarding the desired alignment, it can be directly applied to the filter
design. The overall text-generation system is implemented with open-source
language models, aiming to generate positive text.