Sliced mutual information: a scalable measure of statistical dependence
Mutual information (MI) is a fundamental measure of statistical dependence, with a myriad of applications in information theory, statistics, and machine learning. While it possesses many desirable properties, the estimation of high-dimensional MI suffers from the curse of dimensionality, whereby the number of data points needed to obtain reliable estimates grows exponentially with dimension. To overcome this bottleneck, this talk will introduce sliced MI (SMI) as a surrogate measure of dependence. SMI is defined as an average of MI terms between one-dimensional random projections. We will first show that SMI preserves many of the structural properties of classic MI, from identification of independence to (sliced) entropy decomposition and variational forms. Next, we will focus on formal guarantees for empirical estimation, demonstrating that SMI between high-dimensional random vectors can be estimated at a rate corresponding to that of classic MI between scalar variables. We will also discuss key differences between SMI and MI, and present simple numerical experiments on independence testing and feature extraction, highlighting the potential gains SMI offers over classic MI for high-dimensional inference and learning.
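For concreteness, the "average of MI terms between one-dimensional random projections" can be written as follows; this is a sketch under the assumption that the projection directions are drawn uniformly from the unit spheres, with the symbol SI and the sphere notation introduced here for illustration:

SI(X;Y) = \int_{\mathbb{S}^{d_x-1}} \int_{\mathbb{S}^{d_y-1}} I\big(\theta^{\top} X;\, \phi^{\top} Y\big)\, \mathrm{d}\sigma_{d_x}(\theta)\, \mathrm{d}\sigma_{d_y}(\phi),

where X \in \mathbb{R}^{d_x}, Y \in \mathbb{R}^{d_y}, and \sigma_{d} denotes the uniform distribution on the unit sphere \mathbb{S}^{d-1}. In words, SMI averages the classic scalar MI between random one-dimensional projections of X and Y, which is what makes estimation behave like the one-dimensional case.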