# Nearest Neighbor Estimates of Entropy for Multivariate Circular Distributions


## Abstract


## 1. Introduction

## 2. Nearest Neighbor Estimates of Entropy Based on Circular Distance ${d}_{1}$

**Lemma 2.1.** Let $r\in [0,\sqrt{m}\pi ]$, $\psi \in {[0,2\pi )}^{m}$ and let ${V}_{r}$ be the volume of the ball ${N}_{r}(\psi )$, defined by (5). Then

**Proof.** Without loss of generality, we may take $\psi =(0,\dots ,0)$. Then

**Remark 2.1.** (i) For $m=1$ and $x\in [0,1]$, we have ${A}_{1}(x)=\sqrt{x}$. For $m=2$ and $x\in [0,2]$, it can be verified that
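Consistent with ${A}_{1}(x)=\sqrt{x}$ above and with the mean $m/3$ and variance $4m/45$ used in Remark 2.2, ${A}_{m}$ can be read as the distribution function of a sum of $m$ independent squared standard-uniform variables; this is an inference from the quantities quoted in this section, not a restatement of definition (5):

```latex
% A_m read as the CDF of a sum of m squared Uniform(0,1) variables
A_m(x) \;=\; \Pr\!\Big(\textstyle\sum_{i=1}^{m} U_i^2 \le x\Big),
\qquad U_i \overset{\text{iid}}{\sim} \mathrm{Uniform}(0,1),
\qquad x \in [0,\, m].
```

Under that reading, $\mathbb{E}[U_i^2]=1/3$ and $\operatorname{Var}(U_i^2)=1/5-1/9=4/45$, so the sum has mean $m/3$ and variance $4m/45$, matching the normal approximation of Remark 2.2.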

**Theorem 2.1.** Suppose that there exists an $\epsilon >0$ such that

**Proof.** Let

**Corollary 2.1.** Under the assumptions of Theorem 2.1, the estimator ${\widehat{H}}_{k,n}^{(1)}$ is asymptotically unbiased for estimating the entropy $H(f)$.

**Theorem 2.2.** Suppose that there exists an $\epsilon >0$ such that

**Remark 2.2.** For small values of $m\ge 4$, the function ${A}_{m}(\cdot)$ involved in the evaluation of the estimate ${\widehat{H}}_{k,n}^{(1)}$ can be computed using numerical integration. For moderate and large values of $m$, which is the case for many molecules encountered in the molecular sciences, the central limit theorem yields a reasonable approximation of ${A}_{m}(\cdot)$ by the cumulative distribution function of a normal distribution with mean $m/3$ and variance $4m/45$.
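A minimal Python sketch of this remark, assuming (as is consistent with ${A}_{1}(x)=\sqrt{x}$ and the moments above) that ${A}_{m}$ is the distribution function of a sum of $m$ squared Uniform(0,1) variables; the Monte Carlo evaluation stands in for the numerical integration, and all function names are illustrative:

```python
import math
import random

def A_m_mc(m, x, trials=100_000, seed=0):
    """Monte Carlo estimate of A_m(x), modeling A_m as the CDF of a sum of
    m squared Uniform(0,1) variables (an assumption consistent with
    A_1(x) = sqrt(x) and the moments m/3 and 4m/45 in Remark 2.2)."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.random() ** 2 for _ in range(m)) <= x
        for _ in range(trials)
    )
    return hits / trials

def A_m_normal(m, x):
    """Normal approximation from Remark 2.2: the N(m/3, 4m/45) CDF."""
    z = (x - m / 3) / math.sqrt(4 * m / 45)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# For moderate m the two evaluations should agree closely.
m = 20
for x in (m / 4, m / 3, m / 2):
    print(f"x={x:6.2f}  MC={A_m_mc(m, x):.4f}  normal={A_m_normal(m, x):.4f}")
```

For $m=20$ the two evaluations should agree to roughly two decimal places near the center of the distribution, in line with the remark's claim that the normal approximation suffices once $m$ is moderate or large.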

## 3. Nearest Neighbor Estimates of Entropy Based on Circular Distance ${d}_{2}$

**Lemma 3.1.** Let $r\in [0,2\sqrt{m}]$, $\psi \in {[0,2\pi )}^{m}$ and let ${W}_{r}$ be the volume of the ball ${S}_{r}(\psi )$, defined by (20). Then

**Proof.** We have

**Remark 3.1.** (i) For $m=1$ and $x\in (0,1)$,

**Theorem 3.1.** Suppose that there exists an $\epsilon >0$ such that (10) holds and

**Corollary 3.1.** Under the assumptions of Theorem 3.1, the estimator ${\widehat{H}}_{k,n}^{(2)}$ is asymptotically unbiased for estimating the entropy $H(f)$.

**Theorem 3.2.** Suppose that there exists an $\epsilon >0$ such that (18) holds and

**Remark 3.2.** (i) Using Remark 3.1 (i) and the fact that $\arccos (x)\in [0,\pi ]$ is a decreasing function of $x\in [-1,1]$, for $m=1$ and $i\in \{1,\dots ,n\}$, we have

## 4. Monte Carlo Results and a Molecular Entropy Example

**Table 1.** Circular- and Euclidean-distance estimates ${\widehat{H}}_{k,n}^{(1)}$ and ${\widehat{H}}_{k,n}$, respectively, from samples of size n of the analytic distribution of 6 circular variables; the exact entropy value is (to 4 decimals) $H=1.8334$.

| $n\times {10}^{-6}$ | ${\widehat{H}}_{1,n}^{(1)}/{\widehat{H}}_{1,n}$ | ${\widehat{H}}_{2,n}^{(1)}/{\widehat{H}}_{2,n}$ | ${\widehat{H}}_{3,n}^{(1)}/{\widehat{H}}_{3,n}$ | ${\widehat{H}}_{4,n}^{(1)}/{\widehat{H}}_{4,n}$ | ${\widehat{H}}_{5,n}^{(1)}/{\widehat{H}}_{5,n}$ |
| --- | --- | --- | --- | --- | --- |
| 0.05 | 1.86851 | 1.92019 | 1.96379 | 2.00395 | 2.03653 |
|      | 2.01225 | 2.09929 | 2.16540 | 2.22380 | 2.27390 |
| 0.10 | 1.81503 | 1.86968 | 1.90481 | 1.92899 | 1.95164 |
|      | 1.95640 | 2.01952 | 2.07339 | 2.11255 | 2.14865 |
| 0.20 | 1.80070 | 1.83713 | 1.85672 | 1.87498 | 1.89122 |
|      | 1.92305 | 1.96691 | 2.00110 | 2.03086 | 2.05650 |
| 0.40 | 1.8007 | 1.81582 | 1.82840 | 1.83904 | 1.84862 |
|      | 1.89363 | 1.92652 | 1.95001 | 1.96939 | 1.98668 |
| 0.60 | 1.80066 | 1.80693 | 1.81496 | 1.82296 | 1.83097 |
|      | 1.88696 | 1.90952 | 1.92764 | 1.94358 | 1.95783 |
| 0.80 | 1.79837 | 1.80297 | 1.81063 | 1.81660 | 1.82222 |
|      | 1.87936 | 1.89971 | 1.91649 | 1.92946 | 1.94125 |
| 1.00 | 1.79539 | 1.80017 | 1.80566 | 1.81030 | 1.81566 |
|      | 1.87317 | 1.89238 | 1.90694 | 1.91850 | 1.92915 |
| 2.00 | 1.79660 | 1.79533 | 1.79736 | 1.79970 | 1.80266 |
|      | 1.86471 | 1.87555 | 1.88493 | 1.89298 | 1.90073 |
| 4.00 | 1.79673 | 1.79383 | 1.79404 | 1.79480 | 1.79613 |
|      | 1.85696 | 1.86477 | 1.87136 | 1.87702 | 1.88211 |
| 6.00 | 1.79795 | 1.79491 | 1.79385 | 1.79419 | 1.79458 |
|      | 1.85403 | 1.86071 | 1.86552 | 1.87029 | 1.87414 |
| 8.00 | 1.79893 | 1.79484 | 1.79373 | 1.79322 | 1.79337 |
|      | 1.85240 | 1.85745 | 1.86197 | 1.86545 | 1.86891 |
| 10.00 | 1.80036 | 1.79562 | 1.79426 | 1.79350 | 1.79329 |
|       | 1.85170 | 1.85578 | 1.85969 | 1.86287 | 1.86583 |
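The estimators themselves are defined earlier in the paper; as a minimal illustration of the approach behind these Monte Carlo results, the following Python sketch applies a Kozachenko–Leonenko-type kNN entropy estimate [13] to a single circular variable, replacing the Euclidean metric with the wrap-around distance (on the circle, a ball of radius $r\le \pi$ has length exactly $2r$). The $m=1$ setup and function names are illustrative, not the paper's general ${\widehat{H}}_{k,n}^{(1)}$:

```python
import math
import random

def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    gamma = 0.5772156649015329  # Euler-Mascheroni constant
    return -gamma + sum(1.0 / j for j in range(1, n))

def circ_dist(a, b):
    """Wrap-around (circular) distance between two angles in [0, 2*pi)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def knn_entropy_circle(sample, k=1):
    """Kozachenko-Leonenko style kNN entropy estimate for one circular
    variable: (1/n) sum ln r_i + ln 2 + psi(n) - psi(k), where r_i is the
    circular distance to the k-th nearest neighbor of sample point i."""
    n = len(sample)
    total = 0.0
    for i, x in enumerate(sample):
        dists = sorted(circ_dist(x, y) for j, y in enumerate(sample) if j != i)
        total += math.log(dists[k - 1])
    return total / n + math.log(2.0) + digamma_int(n) - digamma_int(k)

random.seed(42)
n = 1000
sample = [random.uniform(0.0, 2 * math.pi) for _ in range(n)]
est = knn_entropy_circle(sample, k=1)
exact = math.log(2 * math.pi)  # entropy of the uniform circular distribution
print(f"estimate = {est:.3f}, exact = {exact:.3f}")
```

For a uniform sample the estimate should land close to the exact entropy $\ln (2\pi )\approx 1.8379$. The brute-force neighbor search above costs $O(n^{2})$, which is why computations at the multi-million sample sizes of Table 1 call for tree-based nearest-neighbor searches [22,23].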

**Figure 1.** Plots of the circular- and Euclidean-distance estimates ${\widehat{H}}_{1,n}^{(1)}$ and ${\widehat{H}}_{1,n}$, respectively, as functions of the sample size n, for the analytic distribution of 6 circular variables; the exact entropy value is (to 4 decimals) $H=1.8334$.

**Figure 2.** Smoothed marginal histograms of the internal-rotation angles ${\varphi}_{i}$, $i=1,\cdots ,7$ of the (R,S) isomer of tartaric acid obtained by molecular dynamics simulations.

**Figure 3.** Circular-distance nearest-neighbor estimates ${\widehat{H}}_{k,n}^{(1)}$, $k=1,\cdots ,5$ of the entropy of the 7-dimensional joint distribution of internal-rotation angles in the (R,S) isomer of tartaric acid as functions of the sample size n. The estimates ${\widehat{H}}_{k,n}^{(1)}$ at a fixed $n\lesssim 7$ million increase in value as k increases.

**Figure 4.** Circular-distance nearest-neighbor estimates ${\widehat{H}}_{1,n}^{(1)}$ and ${\widehat{H}}_{5,n}^{(1)}$ of the internal-rotation entropy of tartaric acid as functions of the sample size n compared with the Euclidean-distance nearest-neighbor estimates ${\widehat{H}}_{1,n}$ and ${\widehat{H}}_{5,n}$. An $n\to \infty $ extrapolated estimate is $\widehat{H}=5.04\pm 0.01$ [21].

## Acknowledgments

## Appendix

## References

1. Karplus, M.; Kushick, J.N. Method for estimating the configurational entropy of macromolecules. Macromolecules **1981**, 14, 325–332.
2. Misra, N.; Singh, H.; Demchuk, E. Estimation of the entropy of a multivariate normal distribution. J. Multivar. Anal. **2005**, 92, 324–342.
3. Demchuk, E.; Singh, H. Statistical thermodynamics of hindered rotation from computer simulations. Mol. Phys. **2001**, 99, 627–636.
4. Singh, H.; Hnizdo, V.; Demchuk, E. Probabilistic modeling of two dependent circular variables. Biometrika **2002**, 89, 719–723.
5. Mardia, K.V.; Hughes, G.; Taylor, C.C.; Singh, H. A multivariate von Mises distribution with applications to bioinformatics. Can. J. Stat. **2008**, 36, 99–109.
6. Hnizdo, V.; Fedorowicz, A.; Singh, H.; Demchuk, E. Statistical thermodynamics of internal rotation in a hindering potential of mean force obtained from computer simulations. J. Comput. Chem. **2003**, 24, 1172–1183.
7. Darian, E.; Hnizdo, V.; Fedorowicz, A.; Singh, H.; Demchuk, E. Estimation of the absolute internal-rotation entropy of molecules with two torsional degrees of freedom from stochastic simulations. J. Comput. Chem. **2005**, 26, 651–660.
8. Beirlant, J.; Dudewicz, E.J.; Gyorfi, L.; van der Meulen, E.C. Nonparametric estimation of entropy: An overview. Internat. J. Math. Stat. **1997**, 6, 17–39.
9. Scott, D. Multivariate Density Estimation: Theory, Practice and Visualization; Wiley: New York, NY, USA, 1992.
10. Vasicek, O. On a test for normality based on sample entropy. J. R. Stat. Soc. Series B **1976**, 38, 54–59.
11. Dudewicz, E.J.; van der Meulen, E.C. Entropy-based tests of uniformity. J. Am. Stat. Assoc. **1981**, 76, 967–974.
12. Singh, H.; Misra, N.; Hnizdo, V.; Fedorowicz, E.; Demchuk, E. Nearest neighbor estimates of entropy. Am. J. Math. Manag. Sci. **2003**, 23, 301–321.
13. Kozachenko, L.F.; Leonenko, N.N. Sample estimates of entropy of a random vector. Prob. Inf. Trans. **1987**, 23, 95–101.
14. Goria, M.N.; Leonenko, N.N.; Novi Inveradi, P.L. A new class of random vector entropy estimators and its applications. Nonparam. Stat. **2005**, 17, 277–297.
15. Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E **2004**, 69, 066138-1–066138-16.
16. Tsybakov, A.B.; van der Meulen, E.C. Root-n consistent estimators of entropy for densities with unbounded support. Scan. J. Stat. **1996**, 23, 75–83.
17. Loftsgaarden, D.O.; Quesenberry, C.P. A non-parametric estimate of a multivariate density function. Ann. Math. Stat. **1965**, 36, 1049–1051.
18. Mnatsakanov, R.M.; Misra, N.; Li, Sh.; Harner, E.J. $k_n$-Nearest neighbor estimators of entropy. Math. Meth. Stat. **2008**, 17, 261–277.
19. Lebesgue, H. Sur l'intégration des fonctions discontinues. Ann. Ecole Norm. **1910**, 27, 361–450.
20. Hnizdo, V.; Tan, J.; Killian, B.J.; Gilson, M.K. Efficient calculation of configurational entropy from molecular simulations by combining the mutual-information expansion and nearest-neighbor methods. J. Comput. Chem. **2008**, 29, 1605–1614.
21. Hnizdo, V.; Darian, E.; Fedorowicz, A.; Demchuk, E.; Li, S.; Singh, H. Nearest-neighbor nonparametric method for estimating the configurational entropy of complex molecules. J. Comput. Chem. **2006**, 28, 655–668.
22. Arya, S.; Mount, D.M. Approximate nearest neighbor searching. In Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, 25–27 January 1993; p. 271. Available online: http://www.cs.umd.edu/~mount/ANN/ (accessed on 5 May 2010).
23. Friedman, J.H.; Bentley, J.L.; Finkel, R.A. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Software **1977**, 3, 209–226.

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.

## Share and Cite

Misra, N.; Singh, H.; Hnizdo, V. Nearest Neighbor Estimates of Entropy for Multivariate Circular Distributions. *Entropy* **2010**, *12*, 1125–1144. https://doi.org/10.3390/e12051125