# Nearest Neighbor Estimates of Entropy for Multivariate Circular Distributions

## Abstract

## 1. Introduction

## 2. Nearest Neighbor Estimates of Entropy Based on Circular Distance ${d}_{1}$

**Lemma 2.1.** Let $r\in [0,\sqrt{m}\pi ]$, $\psi \in {[0,2\pi )}^{m}$ and let ${V}_{r}$ be the volume of the ball ${N}_{r}(\psi )$, defined by (5). Then

**Proof.** Without loss of generality, we may take $\psi =(0,\dots ,0)$. Then

**Remark 2.1.** (i) For $m=1$ and $x\in [0,1]$, we have ${A}_{1}(x)=\sqrt{x}$. For $m=2$ and $x\in [0,2]$, it can be verified that

**Theorem 2.1.** Suppose that there exists an $\epsilon>0$ such that

**Proof.** Let

**Corollary 2.1.** Under the assumptions of Theorem 2.1, the estimator ${\widehat{H}}_{k,n}^{(1)}$ is asymptotically unbiased for estimating the entropy $H(f)$.

**Theorem 2.2.** Suppose that there exists an $\epsilon>0$ such that

**Remark 2.2.** For small values of $m\ge 4$, the function ${A}_{m}(\cdot)$ involved in the evaluation of the estimate ${\widehat{H}}_{k,n}^{(1)}$ can be computed by numerical integration. For moderate and large values of $m$, which is the case for many molecules encountered in the molecular sciences, the central limit theorem gives a reasonable approximation of ${A}_{m}(\cdot)$ by the cumulative distribution function of a normal distribution with mean $m/3$ and variance $4m/45$.
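The normal approximation in Remark 2.2 can be checked numerically with a short sketch. It rests on an assumption that is not stated in the text reproduced here: that ${A}_{m}(x)$ is the distribution function of ${U}_{1}^{2}+\dots +{U}_{m}^{2}$ for independent ${U}_{i}$ uniform on $[0,1]$. This reading is consistent with ${A}_{1}(x)=\sqrt{x}$ and with the stated mean $m/3$ and variance $4m/45$, but it should be compared with the paper's own definition of ${A}_{m}$. The function names `a_m_normal` and `a_m_monte_carlo` are illustrative only.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def a_m_normal(x, m):
    """Normal approximation of A_m(x) with mean m/3 and variance 4m/45 (Remark 2.2)."""
    mean = m / 3.0
    std = math.sqrt(4.0 * m / 45.0)
    return normal_cdf((x - mean) / std)


def a_m_monte_carlo(x, m, n_samples=50_000, seed=0):
    """Monte Carlo estimate of A_m(x).

    ASSUMPTION (not taken from the text above): A_m(x) is the probability
    that the sum of squares of m independent uniform[0, 1] variables is <= x.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        total = sum(rng.random() ** 2 for _ in range(m))
        if total <= x:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Compare the two approximations at the mean x = m/3 for a few dimensions.
    for m in (5, 10, 30):
        x = m / 3.0
        print(m, a_m_normal(x, m), a_m_monte_carlo(x, m))
```

Under this assumption the two values should agree more closely as $m$ grows, since the skewness of the sum decays like $1/\sqrt{m}$; for small $m$ the numerical integration mentioned in the remark remains the safer choice.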

## 3. Nearest Neighbor Estimates of Entropy Based on Circular Distance ${d}_{2}$

**Lemma 3.1.** Let $r\in [0,2\sqrt{m}]$, $\psi \in {[0,2\pi )}^{m}$ and let ${W}_{r}$ be the volume of the ball ${S}_{r}(\psi )$, defined by (20). Then

**Proof.** We have

**Remark 3.1.** (i) For $m=1$ and $x\in (0,1)$,

**Theorem 3.1.** Suppose that there exists an $\epsilon>0$ such that (10) holds and

**Corollary 3.1.** Under the assumptions of Theorem 3.1, the estimator ${\widehat{H}}_{k,n}^{(2)}$ is asymptotically unbiased for estimating the entropy $H(f)$.

**Theorem 3.2.** Suppose that there exists an $\epsilon>0$ such that (18) holds and

**Remark 3.2.** (i) Using Remark 3.1 (i) and the fact that $\arccos(x)\in [0,\pi ]$ is a decreasing function of $x\in [-1,1]$, for $m=1$ and $i\in \{1,\dots ,n\}$, we have

## 4. Monte Carlo Results and a Molecular Entropy Example

**Table 1.** Circular- and Euclidean-distance estimates ${\widehat{H}}_{k,n}^{(1)}$ and ${\widehat{H}}_{k,n}$, respectively, from samples of size $n$ of the analytic distribution of 6 circular variables; the exact entropy value is (to 4 decimals) $H=1.8334$.

$n\times {10}^{-6}$ | Estimate | $k=1$ | $k=2$ | $k=3$ | $k=4$ | $k=5$ |
---|---|---|---|---|---|---|
0.05 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.86851 | 1.92019 | 1.96379 | 2.00395 | 2.03653 |
0.05 | ${\widehat{H}}_{k,n}$ | 2.01225 | 2.09929 | 2.16540 | 2.22380 | 2.27390 |
0.10 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.81503 | 1.86968 | 1.90481 | 1.92899 | 1.95164 |
0.10 | ${\widehat{H}}_{k,n}$ | 1.95640 | 2.01952 | 2.07339 | 2.11255 | 2.14865 |
0.20 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.80070 | 1.83713 | 1.85672 | 1.87498 | 1.89122 |
0.20 | ${\widehat{H}}_{k,n}$ | 1.92305 | 1.96691 | 2.00110 | 2.03086 | 2.05650 |
0.40 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.8007 | 1.81582 | 1.82840 | 1.83904 | 1.84862 |
0.40 | ${\widehat{H}}_{k,n}$ | 1.89363 | 1.92652 | 1.95001 | 1.96939 | 1.98668 |
0.60 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.80066 | 1.80693 | 1.81496 | 1.82296 | 1.83097 |
0.60 | ${\widehat{H}}_{k,n}$ | 1.88696 | 1.90952 | 1.92764 | 1.94358 | 1.95783 |
0.80 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.79837 | 1.80297 | 1.81063 | 1.81660 | 1.82222 |
0.80 | ${\widehat{H}}_{k,n}$ | 1.87936 | 1.89971 | 1.91649 | 1.92946 | 1.94125 |
1.00 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.79539 | 1.80017 | 1.80566 | 1.81030 | 1.81566 |
1.00 | ${\widehat{H}}_{k,n}$ | 1.87317 | 1.89238 | 1.90694 | 1.91850 | 1.92915 |
2.00 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.79660 | 1.79533 | 1.79736 | 1.79970 | 1.80266 |
2.00 | ${\widehat{H}}_{k,n}$ | 1.86471 | 1.87555 | 1.88493 | 1.89298 | 1.90073 |
4.00 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.79673 | 1.79383 | 1.79404 | 1.79480 | 1.79613 |
4.00 | ${\widehat{H}}_{k,n}$ | 1.85696 | 1.86477 | 1.87136 | 1.87702 | 1.88211 |
6.00 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.79795 | 1.79491 | 1.79385 | 1.79419 | 1.79458 |
6.00 | ${\widehat{H}}_{k,n}$ | 1.85403 | 1.86071 | 1.86552 | 1.87029 | 1.87414 |
8.00 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.79893 | 1.79484 | 1.79373 | 1.79322 | 1.79337 |
8.00 | ${\widehat{H}}_{k,n}$ | 1.85240 | 1.85745 | 1.86197 | 1.86545 | 1.86891 |
10.00 | ${\widehat{H}}_{k,n}^{(1)}$ | 1.80036 | 1.79562 | 1.79426 | 1.79350 | 1.79329 |
10.00 | ${\widehat{H}}_{k,n}$ | 1.85170 | 1.85578 | 1.85969 | 1.86287 | 1.86583 |

**Figure 1.** Plots of the circular- and Euclidean-distance estimates ${\widehat{H}}_{1,n}^{(1)}$ and ${\widehat{H}}_{1,n}$, respectively, as functions of the sample size $n$, for the analytic distribution of 6 circular variables; the exact entropy value is (to 4 decimals) $H=1.8334$.

**Figure 2.** Smoothed marginal histograms of the internal-rotation angles ${\varphi}_{i}$, $i=1,\dots ,7$, of the (R,S) isomer of tartaric acid obtained by molecular dynamics simulations.

**Figure 3.** Circular-distance nearest-neighbor estimates ${\widehat{H}}_{k,n}^{(1)}$, $k=1,\dots ,5$, of the entropy of the 7-dimensional joint distribution of internal-rotation angles in the (R,S) isomer of tartaric acid as functions of the sample size $n$. At a fixed $n\lesssim 7$ million, the estimates ${\widehat{H}}_{k,n}^{(1)}$ increase in value as $k$ increases.

**Figure 4.** Circular-distance nearest-neighbor estimates ${\widehat{H}}_{1,n}^{(1)}$ and ${\widehat{H}}_{5,n}^{(1)}$ of the internal-rotation entropy of tartaric acid as functions of the sample size $n$, compared with the Euclidean-distance nearest-neighbor estimates ${\widehat{H}}_{1,n}$ and ${\widehat{H}}_{5,n}$. An $n\to \infty $ extrapolated estimate is $\widehat{H}=5.04\pm 0.01$ [21].

## Acknowledgments

## Appendix

## References

