On the Estimation of Mutual Information^{†}

## Abstract

## 1. Introduction

## 2. Mutual Information

## 3. Non-Parametric Estimation

#### 3.1. The Vanilla KSG Estimator

#### 3.2. The LNC Correction to KSG

## 4. Robustness Tests of NP Estimators

#### 4.1. Coordinate Transformations

#### 4.2. Redundancy vs. Noise

## 5. Discussion

## Acknowledgments

## References

**Figure 1.** Comparison of KSG mutual information estimates for a bivariate normal distribution before (npeet_1) and after (npeet_2) one variable is rescaled by a factor of ${10}^{5}$. The second plot shows the difference between npeet_1 and npeet_2.

**Figure 2.** Comparison of LNC mutual information estimates for a bivariate normal distribution before (lnc_1) and after (lnc_2) one variable is rescaled by a factor of ${10}^{5}$. The second plot shows the difference between lnc_1 and lnc_2.
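As a point of reference for Figures 1 and 2, the exact mutual information of a bivariate normal depends only on the correlation coefficient, $I=-\frac{1}{2}\ln(1-{\rho}^{2})$, so rescaling one variable cannot change it. The sketch below checks this from the second moments. The function name `gaussian_mi` and the value of `rho` are illustrative assumptions and are not taken from the paper.

```python
import math

def gaussian_mi(var_x, var_y, cov_xy):
    """Exact mutual information (in nats) of a bivariate normal,
    computed from its variances and covariance."""
    det = var_x * var_y - cov_xy ** 2  # determinant of the 2x2 covariance matrix
    return 0.5 * math.log(var_x * var_y / det)

rho = 0.6      # illustrative correlation coefficient (assumption)
scale = 1e5    # rescaling factor applied to x, as in the captions

# Unit-variance case: covariance equals the correlation coefficient.
mi_before = gaussian_mi(1.0, 1.0, rho)

# x -> scale * x multiplies var_x by scale**2 and cov_xy by scale.
mi_after = gaussian_mi(scale ** 2, 1.0, scale * rho)
```

Both values agree up to floating-point rounding, so any difference between the curves before and after the transformation reflects the estimator rather than the underlying distribution.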

**Figure 3.** Comparison of KSG mutual information estimates for a multivariate normal distribution with equal correlation coefficients ${\rho}_{ij}$ before (npeet_1) and after (npeet_2) one variable is rescaled by a factor of 10. The second plot compares the KSG estimates after four variables are multiplied by a factor of 10.

**Figure 4.** Comparison of the true mutual information between the eight variables of a multivariate normal distribution where all correlation coefficients are equal to $1/2$ (black line) and the value estimated from $N=10{,}000$ and $N=100{,}000$ samples before (npeet_1) and after (npeet_2) a linear transformation is applied to one variable, $x\to {x}^{\prime}=10x$.
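A closed-form reference value of the kind plotted as the black line can be sketched as follows. This assumes that "mutual information between the eight variables" means the multi-information (total correlation), $I=-\frac{1}{2}\ln\det R$, where $R$ is the correlation matrix. That reading, and the function name `equicorrelated_total_correlation`, are assumptions rather than statements from the paper.

```python
import math

def equicorrelated_total_correlation(d, rho):
    """Multi-information (in nats) of d jointly normal variables that all
    share the same pairwise correlation rho. Assumed interpretation."""
    # The equicorrelated matrix has one eigenvalue 1 + (d - 1) * rho
    # and d - 1 eigenvalues equal to 1 - rho.
    log_det = (d - 1) * math.log(1.0 - rho) + math.log(1.0 + (d - 1) * rho)
    return -0.5 * log_det

# Eight variables with all correlation coefficients equal to 1/2.
reference_mi = equicorrelated_total_correlation(d=8, rho=0.5)
```

Under this assumption the reference value is about 1.674 nats. It depends only on $R$, so the rescaling $x\to {x}^{\prime}=10x$ leaves it unchanged.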

**Figure 5.** Comparison of MI values as discriminating variables are added successively for the SUSY data set. The first eight variables are low-level and the last ten are functions of the first eight. The first plot shows a comparison of (NN) performance when the last ten variables are redundant, while the second shows how KSG’s accuracy deteriorates when the high-level variables are shuffled.

**Figure 6.** Comparison of the true mutual information between the eight variables of a multivariate normal distribution where all correlation coefficients are equal to $1/2$ (black line) and the value estimated from $N=10{,}000$ samples before (npeet_1) and after (npeet_2) a randomization is applied to the set of redundant variables.

