Connecting Jensen-Shannon and Kullback-Leibler Divergences: A New Bound for Representation Learning
2510.20644v1
cs.LG, cs.IT, math.IT
2025-10-25
Authors:
Reuben Dorent, Polina Golland, William Wells III
Abstract
Mutual Information (MI) is a fundamental measure of statistical dependence
widely used in representation learning. While direct optimization of MI via its
definition as a Kullback-Leibler divergence (KLD) is often intractable, many
recent methods have instead maximized alternative dependence measures, most
notably the Jensen-Shannon divergence (JSD) between the joint distribution and
the product of its marginals, via discriminative losses. However, the connection
between these surrogate objectives and MI remains poorly understood. In this
work, we bridge this gap by deriving a new, tight, and tractable lower bound on
KLD as a function of JSD in the general case. By specializing this bound to
the joint distribution and the product of its marginals, we demonstrate that
maximizing the JSD-based information increases a guaranteed lower bound on
mutual information.
Furthermore, we revisit the practical implementation of JSD-based objectives
and observe that minimizing the cross-entropy loss of a binary classifier
trained to distinguish joint from marginal pairs recovers a known variational
lower bound on the JSD. Extensive experiments demonstrate that our lower bound
is tight when applied to MI estimation. We compare our lower bound with
state-of-the-art neural estimators based on variational lower bounds across a
range of established reference scenarios. Our lower bound estimator consistently
provides a stable, low-variance estimate of a tight lower bound on MI. We also
demonstrate its practical usefulness in the context of the Information
Bottleneck framework. Taken together, our results provide new theoretical
justifications and strong empirical evidence for using discriminative learning
in MI-based representation learning.
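
The relationship between classifier cross-entropy and JSD mentioned in the abstract can be checked numerically on a toy discrete example. The sketch below is not the paper's estimator, and it does not reproduce the paper's KLD-JSD bound, whose form the abstract does not state. It assumes a hypothetical 2x2 joint distribution, equal class priors for joint and product-of-marginals pairs, and natural logarithms. Under those assumptions, log 2 minus the balanced cross-entropy of any classifier is a lower bound on the JSD, with equality for the optimal classifier p / (p + q).

```python
import math

# Hypothetical 2x2 joint distribution p(x, y), chosen only for illustration.
joint = [[0.4, 0.1],
         [0.1, 0.4]]

# Marginals and their product q(x, y) = p(x) p(y).
px = [sum(row) for row in joint]
py = [sum(joint[i][j] for i in range(2)) for j in range(2)]
product = [[px[i] * py[j] for j in range(2)] for i in range(2)]

# Flatten both distributions over the four (x, y) pairs.
p = [v for row in joint for v in row]
q = [v for row in product for v in row]


def kl(a, b):
    """Kullback-Leibler divergence KL(a || b) in nats."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)


def jsd(a, b):
    """Jensen-Shannon divergence between a and b in nats."""
    m = [(ai + bi) / 2 for ai, bi in zip(a, b)]
    return 0.5 * kl(a, m) + 0.5 * kl(b, m)


def balanced_cross_entropy(d, a, b):
    """Expected binary cross-entropy with equal class priors (an assumption).

    d[k] is the classifier's probability that pair k was drawn from a.
    """
    loss_a = -sum(ai * math.log(di) for ai, di in zip(a, d))
    loss_b = -sum(bi * math.log(1 - di) for bi, di in zip(b, d))
    return 0.5 * (loss_a + loss_b)


mi = kl(p, q)          # MI is the KLD between joint and product of marginals
jsd_value = jsd(p, q)  # JSD between the same two distributions

# Optimal classifier and a deliberately worse one (shrunk toward 0.5).
d_opt = [ai / (ai + bi) for ai, bi in zip(p, q)]
d_sub = [0.5 + 0.5 * (di - 0.5) for di in d_opt]

bound_opt = math.log(2) - balanced_cross_entropy(d_opt, p, q)
bound_sub = math.log(2) - balanced_cross_entropy(d_sub, p, q)

print(f"MI (KLD)                  = {mi:.6f}")
print(f"JSD                       = {jsd_value:.6f}")
print(f"log 2 - CE (optimal)      = {bound_opt:.6f}")
print(f"log 2 - CE (suboptimal)   = {bound_sub:.6f}")
```

In practice the classifier would be a neural network trained on samples rather than a table of exact probabilities, so the cross-entropy term would be an empirical estimate and the resulting quantity only an approximate lower bound on the JSD.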