Steering Autoregressive Music Generation with Recursive Feature Machines
2510.19127v1
cs.LG, cs.AI, cs.SD, eess.AS
2025-10-24
Authors:
Daniel Zhao, Daniel Beaglehole, Taylor Berg-Kirkpatrick, Julian McAuley, Zachary Novack
Abstract
Controllable music generation remains a significant challenge, with existing
methods often requiring model retraining or introducing audible artifacts. We
introduce MusicRFM, a framework that adapts Recursive Feature Machines (RFMs)
to enable fine-grained, interpretable control over frozen, pre-trained music
models by directly steering their internal activations. RFMs analyze a model's
internal gradients to produce interpretable "concept directions", or specific
axes in the activation space that correspond to musical attributes like notes
or chords. We first train lightweight RFM probes to discover these directions
within MusicGen's hidden states; then, during inference, we inject them back
into the model to guide the generation process in real time without per-step
optimization. We present advanced mechanisms for this control, including
dynamic, time-varying schedules and methods for the simultaneous enforcement of
multiple musical properties. Our method successfully navigates the trade-off
between control and generation quality: we can increase the accuracy of
generating a target musical note from 0.23 to 0.82, while text prompt adherence
remains within approximately 0.02 of the unsteered baseline, demonstrating
effective control with minimal impact on prompt fidelity. We release code to
encourage further exploration of RFMs in the music domain.
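
The steering step described in the abstract (adding concept directions to hidden states, with time-varying strength and several concepts at once) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the additive update, the function names `linear_schedule` and `steer`, the array shapes, and the random placeholder directions are all assumptions made for illustration. In MusicRFM the directions would come from trained RFM probes, and the update would be applied to MusicGen's hidden states during decoding.

```python
import numpy as np


def linear_schedule(num_steps, start, end):
    """Hypothetical schedule: strength ramps linearly from start to end."""
    return np.linspace(start, end, num_steps)


def steer(hidden, directions, strengths):
    """Add scaled, unit-norm concept directions to hidden states.

    hidden:     (T, d) activations for T generation steps
    directions: (k, d) one direction per concept (assumed given)
    strengths:  (T, k) per-step, per-concept coefficients
    """
    # Normalize so each coefficient directly sets the size of the shift.
    unit = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    # (T, k) @ (k, d) sums the weighted directions for every step.
    return hidden + strengths @ unit


rng = np.random.default_rng(0)
T, d = 8, 16
hidden = rng.normal(size=(T, d))      # stand-in for model activations
directions = rng.normal(size=(2, d))  # placeholders, not learned by an RFM

# Concept 0 ramps up over time; concept 1 is held constant.
strengths = np.stack(
    [linear_schedule(T, 0.0, 4.0), np.full(T, 1.5)], axis=1
)
steered = steer(hidden, directions, strengths)
```

Because the update is a single addition per step, it needs no per-step optimization, which matches the inference-time cost profile the abstract claims. How the strengths should be chosen per layer and per concept is not specified here and is left open in this sketch.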