# On the Estimation of Mutual Information

## Abstract


## 1. Introduction

## 2. Mutual Information

## 3. Non-Parametric Estimation

### 3.1. The Vanilla KSG Estimator

### 3.2. The LNC Correction to KSG

## 4. Robustness Tests of NP Estimators

### 4.1. Coordinate Transformations

### 4.2. Redundancy vs. Noise

## 5. Discussion

## Acknowledgments

## References

- Carrara, N.; Ernst, J.A. On the Upper Limit of Separability. arXiv **2017**, arXiv:1708.09449.
- Tishby, N.; Pereira, F.C.; Bialek, W. The information bottleneck method. arXiv **2000**, arXiv:physics/0004057.
- Tishby, N.; Zaslavsky, N. Deep Learning and the Information Bottleneck Principle. arXiv **2015**, arXiv:1503.02406.
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. arXiv **2016**, arXiv:1606.03657.
- Ver Steeg, G.; Galstyan, A. The Information Sieve. arXiv **2015**, arXiv:1507.02284.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
- Carrara, N.; Vanslette, K. The Inferential Foundations of Mutual Information. arXiv **2019**, arXiv:1907.06992.
- Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E **2004**, 69, 066138.
- Gao, S.; Ver Steeg, G.; Galstyan, A. Efficient Estimation of Mutual Information for Strongly Dependent Variables. arXiv **2014**, arXiv:1411.2003.
- Kozachenko, L.F.; Leonenko, N.N. A statistical estimate for the entropy of a random vector. Probl. Peredachi Inform. **1987**, 2, 9–16.
- Ver Steeg, G. Non-parametric Entropy Estimation Toolbox. Available online: https://github.com/gregversteeg/NPEET (accessed on 14 January 2020).
- Baldi, P.; Sadowski, P.; Whiteson, D. Searching for exotic particles in high-energy physics with deep learning. Nat. Commun. **2014**, 5, 4308.

**Figure 1.** Comparison of KSG mutual information estimates for a bivariate normal distribution before (npeet_1) and after (npeet_2) a linear transformation of one variable by a factor of ${10}^{5}$. The second plot shows the difference between (npeet_1) and (npeet_2).
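The before/after comparison described in the caption can be reproduced with a compact KSG (algorithm 1) estimator. The sketch below is an illustrative implementation built on SciPy's k-d tree, not the NPEET code used to produce the figures; the function name `ksg_mi` is our own:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG (algorithm 1) estimate of I(X;Y) in nats."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])
    # eps_i = Chebyshev distance from point i to its k-th nearest neighbour
    # in the joint space (k + 1 because the query returns the point itself).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    # n_x, n_y = number of strictly closer neighbours in each marginal space
    # (subtract 1 to exclude the point itself from each count).
    nx = cKDTree(x).query_ball_point(x, eps - 1e-12, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, eps - 1e-12, p=np.inf, return_length=True) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Bivariate normal with correlation rho: true MI = -0.5 * ln(1 - rho**2).
rng = np.random.default_rng(0)
rho = 0.9
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=4000)
est = ksg_mi(xy[:, 0], xy[:, 1])
est_scaled = ksg_mi(1e5 * xy[:, 0], xy[:, 1])  # rescale one variable, as in the figure
```

On the raw samples the estimate should land near the true value $-\frac{1}{2}\ln(1-\rho^{2})\approx 0.83$ nats; comparing `est` with `est_scaled` reproduces the kind of before/after check plotted in the figure.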

**Figure 2.** Comparison of LNC mutual information estimates for a bivariate normal distribution before (lnc_1) and after (lnc_2) a linear transformation of one variable by a factor of ${10}^{5}$. The second plot shows the difference between (lnc_1) and (lnc_2).

**Figure 3.** Comparison of KSG mutual information estimates for a multivariate normal distribution with equal correlation coefficients ${\rho}_{ij}$ before (npeet_1) and after (npeet_2) a linear transformation of one variable by a factor of 10. The second plot compares the KSG estimates after four variables are multiplied by a factor of 10.

**Figure 4.** Comparison of the true mutual information between the eight variables of a multivariate normal distribution where all correlation coefficients are equal to $1/2$ (black line) and the values estimated from $N=10,000$ and $N=100,000$ samples before (npeet_1) and after (npeet_2) a linear transformation $x\to {x}^{\prime}=10x$ is applied to one variable.
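The "true mutual information" baseline in this comparison has a closed form: for a multivariate normal, the total correlation between the components is $-\frac{1}{2}\ln\det R$, where $R$ is the correlation matrix. A small sketch (the function name is ours, not from the paper's code):

```python
import numpy as np

def gaussian_total_correlation(d, rho):
    """Total correlation (in nats) of a d-variate normal whose
    correlation matrix has every off-diagonal entry equal to rho."""
    corr = np.full((d, d), rho)
    np.fill_diagonal(corr, 1.0)
    # For a Gaussian, I(X_1; ...; X_d) = -1/2 * ln det(R).
    sign, logdet = np.linalg.slogdet(corr)
    return -0.5 * logdet

# Eight variables with all rho_ij = 1/2, as in the figure.
baseline = gaussian_total_correlation(8, 0.5)
```

For the equicorrelated case the determinant also factors as $\det R = (1-\rho)^{d-1}\bigl(1+(d-1)\rho\bigr)$, which gives the same baseline without the linear-algebra call; for $d=2$ the expression reduces to the familiar $-\frac{1}{2}\ln(1-\rho^{2})$.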

**Figure 5.** Comparison of MI values as discriminating variables are successively added for the SUSY data set. The first eight variables are low-level and the last ten are functions of the first eight. The first plot compares neural network (NN) performance when the last ten variables are redundant, while the second shows how KSG's accuracy deteriorates when the high-level variables are shuffled.

**Figure 6.** Comparison of the true mutual information between the eight variables of a multivariate normal distribution where all correlation coefficients are equal to $1/2$ (black line) and the value estimated from $N=10,000$ samples before (npeet_1) and after (npeet_2) a randomization is applied to the set of redundant variables.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Carrara, N.; Ernst, J.
On the Estimation of Mutual Information. *Proceedings* **2019**, *33*, 31.
https://doi.org/10.3390/proceedings2019033031
