# A New Tight Upper Bound on the Entropy of Sums

## Abstract

## 1. Introduction

- The first one is an upper bound on the entropy of random variables (RVs) having a finite second moment, by virtue of the fact that Gaussian distributions maximize entropy under a second moment constraint. The (differential) entropy $h\left(Y\right)$ of a random variable Y having a probability density function $p\left(y\right)$ is defined as $$h\left(Y\right)=-{\int}_{-\infty}^{+\infty}p\left(y\right)\ln p\left(y\right)\,dy,$$ whenever the integral exists. For independent RVs X and Z with finite variances ${\sigma}_{X}^{2}$ and ${\sigma}_{Z}^{2}$, $$h(X+Z)\le \frac{1}{2}\ln 2\pi e\left({\sigma}_{X}^{2}+{\sigma}_{Z}^{2}\right). \tag{1}$$
- The second one is a lower bound on the entropy of independent sums of RVs, commonly known as the Entropy Power Inequality (EPI). The EPI states that given two real independent RVs X, Z such that $h\left(X\right)$, $h\left(Z\right)$ and $h(X+Z)$ exist, then (Corollary 3, [2]) $$N(X+Z)\ge N\left(X\right)+N\left(Z\right), \tag{2}$$ where the entropy power $N\left(X\right)$ is defined as $$N\left(X\right)=\frac{1}{2\pi e}{e}^{2h\left(X\right)}.$$
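Both bounds can be checked numerically on a simple case. The sketch below is only an illustration: the choice of X and Z as independent uniform RVs on [0, 1] (whose sum has a triangular density on [0, 2]), the midpoint-rule integrator, and all function names are assumptions made here, not part of the paper.

```python
import math


def differential_entropy(pdf, lo, hi, n=100_000):
    """Midpoint-rule estimate of h = -integral of p ln p over [lo, hi], in nats."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = pdf(lo + (i + 0.5) * dx)
        if p > 0.0:  # the integrand p ln p tends to 0 as p -> 0
            total -= p * math.log(p) * dx
    return total


def entropy_power(h):
    """Entropy power N = exp(2h) / (2 pi e) of a scalar RV with entropy h."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)


def uniform_pdf(x):
    """Density of a uniform RV on [0, 1] (illustrative choice)."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


def triangular_pdf(y):
    """Density of the sum of two independent uniform RVs on [0, 1]."""
    if 0.0 <= y <= 1.0:
        return y
    if 1.0 < y <= 2.0:
        return 2.0 - y
    return 0.0


h_x = differential_entropy(uniform_pdf, 0.0, 1.0)
h_z = h_x  # Z has the same distribution as X
h_sum = differential_entropy(triangular_pdf, 0.0, 2.0)

var_x = var_z = 1.0 / 12.0  # variance of a uniform RV on [0, 1]
gaussian_bound = 0.5 * math.log(2.0 * math.pi * math.e * (var_x + var_z))
epi_lower = entropy_power(h_x) + entropy_power(h_z)

print(f"EPI lower bound: {epi_lower:.4f} <= N(X+Z) = {entropy_power(h_sum):.4f}")
print(f"Gaussian upper bound: h(X+Z) = {h_sum:.4f} <= {gaussian_bound:.4f}")
```

Numerical integration is used instead of closed forms so that the same helper can be reused with any other bounded-support density.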

1. The Fisher Information Inequality (FII): The Fisher information $J\left(Y\right)$ of a random variable Y having a probability density function $p\left(y\right)$ is defined as $$J\left(Y\right)={\int}_{-\infty}^{+\infty}\frac{1}{p\left(y\right)}\,{p}^{\prime 2}\left(y\right)\,dy,$$ whenever the derivative and the integral exist. Let X and Z be two independent RVs such that the respective Fisher informations $J\left(X\right)$ and $J\left(Z\right)$ exist. Then $$\frac{1}{J(X+Z)}\ge \frac{1}{J\left(X\right)}+\frac{1}{J\left(Z\right)}.$$
2. The de Bruijn identity: For any $\epsilon>0$, $$\frac{d}{d\epsilon}\,h(X+\sqrt{\epsilon}Z)=\frac{{\sigma}^{2}}{2}J(X+\sqrt{\epsilon}Z),$$ where Z is a Gaussian RV with variance ${\sigma}^{2}$ that is independent of X. Rioul proved that the de Bruijn identity holds at $\epsilon={0}^{+}$ for any finite-variance RV Z (Proposition 7, p. 39, [5]).
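As a sanity check of the two statements above, the Gaussian case can be worked out in closed form. The symbols ${\sigma}_{X}^{2}$ and ${\sigma}_{Z}^{2}$ are illustrative choices made here, with $X\sim \mathcal{N}(0,{\sigma}_{X}^{2})$ independent of $Z\sim \mathcal{N}(0,{\sigma}_{Z}^{2})$. The computation uses only the standard facts that a Gaussian RV with variance ${s}^{2}$ has Fisher information $1/{s}^{2}$ and entropy $\frac{1}{2}\ln 2\pi e{s}^{2}$. The FII then holds with equality:

$$\frac{1}{J(X+Z)}={\sigma}_{X}^{2}+{\sigma}_{Z}^{2}=\frac{1}{J\left(X\right)}+\frac{1}{J\left(Z\right)},$$

and differentiating the entropy of $X+\sqrt{\epsilon}Z$, which is Gaussian with variance ${\sigma}_{X}^{2}+\epsilon {\sigma}_{Z}^{2}$, recovers the de Bruijn identity with ${\sigma}^{2}={\sigma}_{Z}^{2}$:

$$\frac{d}{d\epsilon}\,\frac{1}{2}\ln 2\pi e\left({\sigma}_{X}^{2}+\epsilon {\sigma}_{Z}^{2}\right)=\frac{{\sigma}_{Z}^{2}}{2\left({\sigma}_{X}^{2}+\epsilon {\sigma}_{Z}^{2}\right)}=\frac{{\sigma}_{Z}^{2}}{2}J(X+\sqrt{\epsilon}Z).$$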

## 2. Main Result

- ${Z}_{1}\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$ is a Gaussian RV with mean ${\mu}_{1}$ and positive variance ${\sigma}_{1}^{2}$.
- ${Z}_{2}$ is an infinitely divisible RV with mean ${\mu}_{2}$ and finite (possibly zero) variance ${\sigma}_{2}^{2}$ that is independent of ${Z}_{1}$.

**Theorem 1.**

1. While the usefulness of this upper bound is clear for RVs X having an infinite second moment, for which Equation (1) fails, it can in some cases present a tighter upper bound than the one provided by Shannon for finite second moment variables X. This is the case, for example, when $Z\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$ and X is a RV having the following PDF: $${p}_{X}\left(x\right)=\left\{\begin{array}{ll}f(x+a) & -1-a\le x\le 1-a\\ f(x-a) & -1+a\le x\le 1+a,\end{array}\right.$$ where $$f\left(x\right)=\left\{\begin{array}{ll}\frac{3}{4}{(1+x)}^{2} & -1\le x\le 0\\ \frac{3}{4}{(1-x)}^{2} & 0<x\le 1\\ 0 & \mathrm{otherwise}.\end{array}\right.$$ For this X, the bound of Theorem 1 evaluates to $$h(X+Z)\le h\left(X\right)+\frac{1}{2}\ln \left(1+{\sigma}_{1}^{2}J\left(X\right)\right)=\ln \frac{4}{3}+\frac{2}{3}+\frac{1}{2}\ln (1+12\,{\sigma}_{1}^{2}),$$ whereas Shannon's bound in Equation (1) gives $$h(X+Z)\le \frac{1}{2}\ln 2\pi e\left({\sigma}_{X}^{2}+{\sigma}_{1}^{2}\right)=\frac{1}{2}\ln 2\pi e+\frac{1}{2}\ln \left({a}^{2}+\frac{1}{10}+{\sigma}_{1}^{2}\right).$$ The first bound does not depend on a, whereas the second grows without bound as a increases.
2. Theorem 1 gives an analytical bound on the change in the transmission rates of the linear Gaussian channel as a function of an input scaling operation. In fact, let X be a RV satisfying the conditions of Theorem 1 and $Z\sim \mathcal{N}({\mu}_{1},{\sigma}_{1}^{2})$. Then $aX$ satisfies similar conditions for any positive scalar a. Hence $$h(aX+Z)\le h\left(aX\right)+\frac{1}{2}\ln \left(1+{\sigma}_{1}^{2}J\left(aX\right)\right)=h\left(X\right)+\ln a+\frac{1}{2}\ln \left(1+\frac{{\sigma}_{1}^{2}}{{a}^{2}}J\left(X\right)\right),$$ where we used $h\left(aX\right)=h\left(X\right)+\ln a$ and $J\left(aX\right)=J\left(X\right)/{a}^{2}$. Subtracting $h(X+Z)$ from both sides and using the fact that $h(X+Z)\ge h\left(X\right)$ gives $$I(aX+Z;X)-I(X+Z;X)\le \frac{1}{2}\ln \left({a}^{2}+{\sigma}_{1}^{2}J\left(X\right)\right).$$
3. If the EPI is regarded as a lower bound on the entropy of sums, Equation (10) can be considered as its upper bound counterpart whenever one of the variables is Gaussian. In fact, using both of these inequalities gives, for $Y=X+Z$: $$N\left(X\right)+N\left(Z\right)\le N\left(Y\right)\le N\left(X\right)+N\left(Z\right)\left[N\left(X\right)J\left(X\right)\right].$$
4. The result of Theorem 1 is more powerful than the IIE in Equation (5). Indeed, using the fact that $h\left(Z\right)\le h(X+Z)$, Equation (10) gives the looser inequality $$h\left(Z\right)\le h\left(X\right)+\frac{1}{2}\ln \left(1+{\sigma}^{2}J\left(X\right)\right),$$ which implies that $$N\left(X\right)J\left(X\right)\ge \frac{{\sigma}^{2}J\left(X\right)}{1+{\sigma}^{2}J\left(X\right)}$$ for any positive ${\sigma}^{2}$. Since the left-hand side is scale invariant, $$N\left(aX\right)J\left(aX\right)=N\left(X\right)J\left(X\right)\ge \frac{{\sigma}^{2}J\left(X\right)}{{a}^{2}+{\sigma}^{2}J\left(X\right)}$$ for any positive a, and letting $a\to 0$ recovers the IIE, $N\left(X\right)J\left(X\right)\ge 1$.
5. Finally, in the context of communicating over a channel, it is well known that, under a second moment constraint, the best way to “fight” Gaussian noise is to use Gaussian inputs. This follows from the fact that Gaussian variables maximize entropy under a second moment constraint. Conversely, when using a Gaussian input, the worst noise in terms of minimizing the transmission rates is also Gaussian. This is a direct result of the EPI and is also due to the fact that Gaussian distributions have the highest entropy and therefore are the worst noise to deal with. A similar statement can be made when the Fisher information, instead of the second moment, is constrained: if the input X is subject to a Fisher information constraint $J\left(X\right)\le A$ for some $A>0$, then the input minimizing the mutual information of the additive white Gaussian channel is Gaussian distributed. This is a result of the EPI in Equation (2) and the IIE in Equation (5), which together give in this setting $$\mathrm{arg}\underset{X:J\left(X\right)\le A}{\min}h(X+Z)\sim \mathcal{N}\left(0,\frac{1}{A}\right).$$ Reciprocally, for a Gaussian input X with variance p, one may consider the noise solving $$\mathrm{arg}\underset{Z:J\left(Z\right)\le A}{\max}h(X+Z).$$ Applying Equation (10) with the roles of the input and the noise exchanged gives, for $Y=X+Z$, $$I(Y;X)\le \frac{1}{2}\ln \left(1+pJ\left(Z\right)\right),$$ so that under the constraint $J\left(Z\right)\le A$ the mutual information is at most $\frac{1}{2}\ln \left(1+pA\right)$, a value attained when Z is Gaussian with variance $1/A$.
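The closed-form values quoted for the example PDF in the first remark above ($J\left(X\right)=12$, $h\left(X\right)=\ln \frac{4}{3}+\frac{2}{3}$, ${\sigma}_{X}^{2}={a}^{2}+\frac{1}{10}$) can be checked numerically. The following is a minimal sketch under stated assumptions: the values $a=3$ and ${\sigma}_{1}^{2}=1$ are arbitrary illustrative choices, $a\ge 1$ is assumed so that the two pieces of the PDF do not overlap, and the function names are hypothetical.

```python
import math


def f(x):
    """The piecewise-quadratic bump f from the example; it integrates to 1/2."""
    if -1.0 <= x <= 0.0:
        return 0.75 * (1.0 + x) ** 2
    if 0.0 < x <= 1.0:
        return 0.75 * (1.0 - x) ** 2
    return 0.0


def f_prime(x):
    """Derivative of f on the interior of its support, 0 elsewhere."""
    if -1.0 < x <= 0.0:
        return 1.5 * (1.0 + x)
    if 0.0 < x < 1.0:
        return -1.5 * (1.0 - x)
    return 0.0


def example_quantities(a, n=200_000):
    """Midpoint-rule estimates of (h(X), J(X), E[X^2]) for p_X(x) = f(x+a) + f(x-a).

    Assumes a >= 1 so the two shifted copies of f have disjoint supports.
    E[X^2] equals the variance because p_X is symmetric about 0.
    """
    lo, hi = -1.0 - a, 1.0 + a
    dx = (hi - lo) / n
    h = fisher = second_moment = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = f(x + a) + f(x - a)
        dp = f_prime(x + a) + f_prime(x - a)
        if p > 0.0:
            h -= p * math.log(p) * dx
            fisher += dp * dp / p * dx
            second_moment += x * x * p * dx
    return h, fisher, second_moment


a, var_noise = 3.0, 1.0  # illustrative values, not taken from the paper
h_x, j_x, var_x = example_quantities(a)

fisher_bound = h_x + 0.5 * math.log(1.0 + var_noise * j_x)
shannon_bound = 0.5 * math.log(2.0 * math.pi * math.e * (var_x + var_noise))

print(f"h(X) = {h_x:.4f}, J(X) = {j_x:.4f}, var(X) = {var_x:.4f}")
print(f"Fisher-information bound: {fisher_bound:.4f}")
print(f"Second-moment bound:      {shannon_bound:.4f}")
```

Which of the two bounds is smaller depends on the chosen $a$ and ${\sigma}_{1}^{2}$; rerunning the sketch with other values shows where the crossover occurs.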

## 3. Proof of the Upper Bound

#### 3.1. Concavity of Differential Entropy

**Lemma 1.**

**Proof.**

#### 3.2. Perturbations along ${U}_{t}$: An Identity of the de-Bruijn Type

- The quantity ${C}_{X}$ was found by Verdú [20] to be equal to the channel capacity per unit cost of the linear average power constrained additive noise channel where the noise is independent of the input and is distributed according to X.
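- Using the above interpretation, one can infer that for independent RVs X and W, $${C}_{X+W}\le {C}_{X}.$$ This also follows formally from the convexity of the relative entropy, which gives, for every x, $$D\left({p}_{X+W}(u-x)\parallel {p}_{X+W}\left(u\right)\right)\le D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right),$$ since ${p}_{X+W}$ is a mixture of shifted copies of ${p}_{X}$.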
- Using Kullback’s well-known result on the divergence (Section 2.6, [21]), $${C}_{X}\ge \underset{x\to {0}^{+}}{\lim}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}}=\frac{1}{2}J\left(X\right).$$
- Whenever the supremum is attained in the limit $x\to 0$, $${C}_{X}=\frac{1}{2}J\left(X\right).$$
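The Gaussian case gives a concrete instance in which the last equality holds. Taking ${C}_{X}$ to be the supremum over $x\ne 0$ of the ratio appearing in the limit above, and assuming $X\sim \mathcal{N}(\mu ,{\sigma}^{2})$ (an illustrative choice made here), the divergence between two Gaussian densities with equal variance whose means differ by x is ${x}^{2}/(2{\sigma}^{2})$, so the ratio does not depend on x:

$$\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}}=\frac{1}{2{\sigma}^{2}}\quad \text{for all } x\ne 0,\qquad \text{hence}\qquad {C}_{X}=\frac{1}{2{\sigma}^{2}}=\frac{1}{2}J\left(X\right).$$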

**Lemma 2.**

- X has a positive PDF ${p}_{X}\left(x\right)$.
- The integrals ${\left\{{\displaystyle {\int}_{\mathbb{R}}{\left|\omega \right|}^{k}\left|{\varphi}_{X}\left(\omega \right)\right|\phantom{\rule{0.166667em}{0ex}}d\omega}\right\}}_{k}$ are finite for all $k\in \mathbb{N}\backslash \left\{0\right\}$.
- ${C}_{X}=\underset{x\ne 0}{\sup}\frac{D\left({p}_{X}(u-x)\parallel {p}_{X}\left(u\right)\right)}{{x}^{2}}$ is finite.

**Proof.**

#### 3.3. Proof of Theorem 1

**Proof.**

## 4. Extension

- When $\mathbf{Z}$ has n IID Gaussian components, i.e., with covariance matrix ${\Lambda}_{Z}={\sigma}^{2}\mathsf{I}$, following similar steps leads to $$N(\mathbf{X}+\mathbf{Z})\le N\left(\mathbf{X}\right)+N\left(\mathbf{Z}\right)\frac{N\left(\mathbf{X}\right)J\left(\mathbf{X}\right)}{n}.$$
- In general, for any positive-definite matrix ${\Lambda}_{Z}$ with a singular value decomposition $\mathsf{U}\mathsf{D}{\mathsf{U}}^{T}$, if we denote $\mathsf{B}=\mathsf{U}{\mathsf{D}}^{-\frac{1}{2}}{\mathsf{U}}^{T}$, then $$\mathsf{B}\mathbf{Y}=\mathsf{B}(\mathbf{X}+\mathbf{Z})=\mathsf{B}\mathbf{X}+\mathsf{B}\mathbf{Z}=\mathsf{B}\mathbf{X}+{\mathbf{Z}}^{\prime},$$ where ${\mathbf{Z}}^{\prime}=\mathsf{B}\mathbf{Z}$ is Gaussian with identity covariance matrix. The previous bound then applies and gives $$\begin{aligned} N\left(\mathsf{B}(\mathbf{X}+\mathbf{Z})\right) &\le N\left(\mathsf{B}\mathbf{X}\right)+N\left({\mathbf{Z}}^{\prime}\right)\frac{N\left(\mathsf{B}\mathbf{X}\right)J\left(\mathsf{B}\mathbf{X}\right)}{n}\\ \Longleftrightarrow\; N(\mathbf{X}+\mathbf{Z}) &\le N\left(\mathbf{X}\right)+\frac{N\left(\mathbf{X}\right)J\left(\mathsf{B}\mathbf{X}\right)}{n}\\ \Longleftrightarrow\; N(\mathbf{X}+\mathbf{Z}) &\le N\left(\mathbf{X}\right)+\frac{N\left(\mathbf{X}\right)\mathrm{Tr}\left(\mathsf{J}\left(\mathbf{X}\right){\Lambda}_{Z}\right)}{n}, \end{aligned}$$ where $\mathsf{J}\left(\mathbf{X}\right)$ denotes the Fisher information matrix of $\mathbf{X}$.
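The last inequality can be sanity-checked in the case where $\mathbf{X}$ is itself Gaussian, for which everything is available in closed form. The sketch below rests on assumptions that are not stated in the text above: it takes the n-dimensional entropy power to be $\frac{1}{2\pi e}{e}^{\frac{2}{n}h}$, so that a Gaussian vector with covariance $\Lambda$ has entropy power ${(\det \Lambda )}^{1/n}$, and it uses the fact that the Fisher information matrix of a Gaussian vector is the inverse of its covariance. The 2×2 covariance matrices and helper names are arbitrary illustrative choices.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def add2(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]


def trace_of_product2(a, b):
    """Trace of the matrix product a @ b for 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))


n = 2
cov_x = [[2.0, 0.5], [0.5, 1.0]]    # illustrative covariance of Gaussian X
cov_z = [[1.0, -0.3], [-0.3, 0.5]]  # illustrative covariance of Gaussian Z

# Entropy powers of Gaussian vectors (assumed definition: det(cov) ** (1/n)).
n_x = det2(cov_x) ** (1.0 / n)
n_z = det2(cov_z) ** (1.0 / n)
n_sum = det2(add2(cov_x, cov_z)) ** (1.0 / n)

# Fisher information matrix of a Gaussian vector is its inverse covariance.
fisher_x = inv2(cov_x)
trace_term = trace_of_product2(fisher_x, cov_z)

epi_lower = n_x + n_z
upper = n_x + n_x * trace_term / n

print(f"{epi_lower:.4f} <= {n_sum:.4f} <= {upper:.4f}")
```

In this Gaussian case the upper bound is an instance of the arithmetic-geometric mean inequality applied to the eigenvalues of $\mathsf{I}+{\Lambda}_{X}^{-1}{\Lambda}_{Z}$, which is why it holds for any choice of the two covariance matrices.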

## 5. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Shannon, C.E. A mathematical theory of communication, part I. Bell Syst. Tech. J. **1948**, 27, 379–423.
2. Bobkov, S.G.; Chistyakov, G.P. Entropy power inequality for the Renyi entropy. IEEE Trans. Inf. Theory **2015**, 61, 708–714.
3. Stam, A.J. Some inequalities satisfied by the quantities of information of Fisher and Shannon. Inf. Control **1959**, 2, 101–112.
4. Blachman, N.M. The convolution inequality for entropy powers. IEEE Trans. Inf. Theory **1965**, 11, 267–271.
5. Rioul, O. Information theoretic proofs of entropy power inequality. IEEE Trans. Inf. Theory **2011**, 57, 33–55.
6. Dembo, A.; Cover, T.M.; Thomas, J.A. Information Theoretic Inequalities. IEEE Trans. Inf. Theory **1991**, 37, 1501–1518.
7. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006.
8. Ruzsa, I.Z. Sumsets and entropy. Random Struct. Algorithms **2009**, 34, 1–10.
9. Tao, T. Sumset and inverse sumset theory for Shannon entropy. Comb. Probab. Comput. **2010**, 19, 603–639.
10. Kontoyiannis, I.; Madiman, M. Sumset and inverse sumset inequalities for differential entropy and mutual information. IEEE Trans. Inf. Theory **2014**, 60, 4503–4514.
11. Madiman, M. On the entropy of sums. In Proceedings of the 2008 IEEE Information Theory Workshop, Oporto, Portugal, 5–9 May 2008.
12. Cover, T.M.; Zhang, Z. On the maximum entropy of the sum of two dependent random variables. IEEE Trans. Inf. Theory **1994**, 40, 1244–1246.
13. Ordentlich, E. Maximizing the entropy of a sum of independent bounded random variables. IEEE Trans. Inf. Theory **2006**, 52, 2176–2181.
14. Bobkov, S.; Madiman, M. On the Problem of Reversibility of the Entropy Power Inequality. In Limit Theorems in Probability, Statistics and Number Theory; Springer-Verlag: Berlin/Heidelberg, Germany, 2013; pp. 61–74.
15. Miclo, L. Notes on the speed of entropic convergence in the central limit theorem. Progr. Probab. **2003**, 56, 129–156.
16. Luisier, F.; Blu, T.; Unser, M. Image denoising in mixed Poisson-Gaussian noise. IEEE Trans. Image Process. **2011**, 20, 696–708.
17. Fahs, J.; Abou-Faycal, I. Using Hermite bases in studying capacity-achieving distributions over AWGN channels. IEEE Trans. Inf. Theory **2012**, 58, 5302–5322.
18. Heyer, H. Structural Aspects in the Theory of Probability: A Primer in Probabilities on Algebraic-Topological Structures; World Scientific: Singapore, Singapore, 2004; Volume 7.
19. Costa, M.H.M. A new entropy power inequality. IEEE Trans. Inf. Theory **1985**, 31, 751–760.
20. Verdú, S. On channel capacity per unit cost. IEEE Trans. Inf. Theory **1990**, 36, 1019–1030.
21. Kullback, S. Information Theory and Statistics; Dover Publications: Mineola, NY, USA, 1968.
22. Steutel, F.W.; Harn, K.V. Infinite Divisibility of Probability Distributions on the Real Line; Marcel Dekker Inc.: New York, NY, USA, 2006.
23. Fahs, J.; Abou-Faycal, I. On the finiteness of the capacity of continuous channels. IEEE Trans. Commun. **2015**.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons by Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Fahs, J.; Abou-Faycal, I.
A New Tight Upper Bound on the Entropy of Sums. *Entropy* **2015**, *17*, 8312-8324.
https://doi.org/10.3390/e17127881
