Disentangled Representation Learning via Modular Compositional Bias
2510.21402v1
cs.LG, cs.CV
2025-10-28
Авторы:
Whie Jung, Dong Hoon Lee, Seunghoon Hong
Abstract
Recent disentangled representation learning (DRL) methods heavily rely on
factor specific strategies-either learning objectives for attributes or model
architectures for objects-to embed inductive biases. Such divergent approaches
result in significant overhead when novel factors of variation do not align
with prior assumptions, such as statistical independence or spatial
exclusivity, or when multiple factors coexist, as practitioners must redesign
architectures or objectives. To address this, we propose a compositional bias,
a modular inductive bias decoupled from both objectives and architectures. Our
key insight is that different factors obey distinct recombination rules in the
data distribution: global attributes are mutually exclusive, e.g., a face has
one nose, while objects share a common support (any subset of objects can
co-exist). We therefore randomly remix latents according to factor-specific
rules, i.e., a mixing strategy, and force the encoder to discover whichever
factor structure the mixing strategy reflects through two complementary
objectives: (i) a prior loss that ensures every remix decodes into a realistic
image, and (ii) the compositional consistency loss introduced by Wiedemer et
al. (arXiv:2310.05327), which aligns each composite image with its
corresponding composite latent. Under this general framework, simply adjusting
the mixing strategy enables disentanglement of attributes, objects, and even
both, without modifying the objectives or architectures. Extensive experiments
demonstrate that our method shows competitive performance in both attribute and
object disentanglement, and uniquely achieves joint disentanglement of global
style and objects. Code is available at
https://github.com/whieya/Compositional-DRL.
Ссылки и действия
Дополнительные ресурсы: