Article

Capacity-Achieving Input Distributions of Additive Vector Gaussian Noise Channels: Even-Moment Constraints and Unbounded or Compact Support

Department of Electrical & Computer Engineering, University of Waterloo, 200 University Ave W, Waterloo, ON N2L 3G1, Canada
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(8), 1180; https://doi.org/10.3390/e25081180
Submission received: 25 July 2023 / Revised: 4 August 2023 / Accepted: 5 August 2023 / Published: 8 August 2023
(This article belongs to the Special Issue Shannon Entropy: Mathematical View)

Abstract

We investigate the support of a capacity-achieving input to a vector-valued Gaussian noise channel. The input is subjected to a radial even-moment constraint and is either allowed to take any value in $\mathbb{R}^n$ or is restricted to a given compact subset of $\mathbb{R}^n$. It is shown that the support of the capacity-achieving distribution is composed of a countable union of submanifolds, each of dimension $n-1$ or less. When the input is restricted to a compact subset of $\mathbb{R}^n$, this union is finite. Finally, the support of the capacity-achieving distribution is shown to have Lebesgue measure 0 and to be nowhere dense in $\mathbb{R}^n$.

1. Introduction

In this paper, we consider the support of the capacity-achieving input to a vector-valued channel that is subject to additive non-degenerate Gaussian noise. Vector-valued channels are used in a variety of applications, including the complex-valued inputs and outputs of quadrature channels, which have alternate representations as two-dimensional real vectors. Larger antenna arrays enable Multiple-Input Multiple-Output (MIMO) channels, whose inputs have $n \geq 1$ complex components. Additionally, noise with memory can be expressed by correlated noise components in a vector-valued channel.
Throughout the paper, the average input power is bounded to limit the consumption of environmental, battery, and monetary resources. Since the output of the amplifiers used in transmitters is severely distorted when the input is too large [1,2,3] and signals that are too small can be challenging to produce, it is also of practical interest to restrict the input to an arbitrary compact set.
There has been appreciable prior effort dedicated to understanding the capacity-achieving input to vector-valued channels subject to average power constraints and restrictions to compact sets. Nevertheless, there are significant technical challenges in working with vector-valued inputs. Therefore, much of the work to this point has been limited to either one-dimensional channels [4] or spherically symmetric channels [5,6,7,8], where the latter case ensures that the capacity-achieving distribution can be expressed as a univariate function of the radius. However, this restriction limits the scope of study to channels in which the input is constrained to a ball and the noise components are independent and identically distributed. In this paper, the only assumption made on the Gaussian noise distribution is that it is non-degenerate. We consider both cases, those in which inputs are restricted to arbitrary compact sets and those in which inputs are allowed to take any value in $\mathbb{R}^n$.
The power of a vector-valued signal is equivalent to the second moment of its Euclidean norm. A constraint on the fourth moment then has the practical interpretation of limiting the second moment of the instantaneous power. Furthermore, imposing a moment constraint of order $2k$ ensures that the tails of the input distribution decay at least as quickly as a degree-$2k$ monomial. Therefore, increasing the even-moment constraint penalizes large inputs without imposing a strict cutoff. This motivates us to generalize the average power constraint by limiting some even moment of the input's Euclidean norm.
The results in this paper apply to any combination of input constraints described above, except for the special case where the input is allowed to take any value in $\mathbb{R}^n$ and is subject to a second-moment constraint. This case reduces to a classical result in which the capacity-achieving distribution is known to be Gaussian. For all other cases, we show that the capacity-achieving distribution is contained in a countable union of $i$-dimensional submanifolds, where $i$ ranges over $\{0,\ldots,n-1\}$. Furthermore, this union is finite when the input is restricted to a compact set. We then show that the support of the capacity-achieving distribution is a nowhere dense set with Lebesgue measure 0.
The paper is organized as follows. We first give a review of prior work in Section 2. Section 3.1, Section 3.2 and Section 3.3 provide intermediary results prior to the main results in Section 3.4. Section 4 concludes the paper.

2. Prior Work

Dating back to Shannon’s work in [9], much of the research on continuous channels has focused on average power (equivalently, second-moment) constraints on the input. A transmitter’s inability to produce arbitrarily large powers then led to the consideration of additional peak power constraints, modeled by restricting the input almost surely to compact sets.
The first major result on amplitude-constrained channels considers a scalar Additive White Gaussian Noise (AWGN) model, both with and without a variance constraint [4]. In each case, the support of the capacity-achieving distribution has a finite number of points.
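This discreteness can also be observed numerically. The following sketch (not taken from [4]; the amplitude limit, noise variance, and grids are illustrative choices) discretizes a scalar amplitude-constrained AWGN channel and maximizes the mutual information over input probability mass functions with the Blahut–Arimoto algorithm; the optimizing mass tends to concentrate on a small number of points, consistent with the finitely supported optimum of [4].

```python
# Minimal sketch (hypothetical parameters): discretize the scalar amplitude-
# constrained AWGN channel Y = X + N, N ~ N(0, 1), |X| <= A, and maximize I(F)
# over input pmfs on a grid with the Blahut-Arimoto algorithm.
import numpy as np

A, sigma = 3.0, 1.0
x = np.linspace(-A, A, 61)                       # candidate input mass points
y = np.linspace(-A - 5 * sigma, A + 5 * sigma, 601)
W = np.exp(-(y[None, :] - x[:, None]) ** 2 / (2 * sigma ** 2))
W /= W.sum(axis=1, keepdims=True)                # W[i, j] ~ p(y_j | x_i) on the y grid

p = np.full(len(x), 1.0 / len(x))                # start from the uniform input pmf
for _ in range(2000):
    q = p @ W                                    # induced output pmf
    D = np.sum(W * np.log(W / q[None, :]), axis=1)
    p *= np.exp(D)
    p /= p.sum()

capacity = float(np.sum(p * np.sum(W * np.log(W / (p @ W)[None, :]), axis=1)))
mask = p > 1e-3
print(f"capacity ~ {capacity:.3f} nats")
print("mass points:", np.round(x[mask], 2), "probabilities:", np.round(p[mask], 3))
```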
The use of the Identity Theorem for functions of a single complex variable is key to the argument of [4] and many papers that follow. The theorem can be applied to any univariate analytic function that has an accumulation point of zeros. By contrast, the Identity Theorem in $n$ complex dimensions requires an analytic function with an open set of zeros in $\mathbb{C}^n$. Therefore, to apply the Identity Theorem directly for $n > 1$, a random vector with support containing an open subset of $\mathbb{C}^n$ must be considered. It was suspected by some authors that, since $\mathbb{R}^n$ is not open in $\mathbb{C}^n$, no topological assumption on the support of the capacity-achieving distribution would be sufficient for this purpose [5,10,11]. Therefore, many papers restrict their models to ones that maintain spherical symmetry so that the capacity-achieving distribution can be expressed as a one-dimensional function of radius (e.g., Refs. [5,6,7,8]).
In [5] and [6], the results of [4] are extended to multivariate spherically symmetric channels, in n dimensions and 2 dimensions, respectively. In both papers, the inputs are subject to average and peak radial constraints. It is shown that it is optimal to concentrate the input on a finite number of concentric shells.
In [7] and [8], the number and positioning of optimal concentric shells under a peak radial constraint are studied. In [7], the least restrictive amplitude constraint for which the optimal distribution is concentrated on a single sphere is found. In [8], it is shown that the number of shells grows at most quadratically in the amplitude constraint. A similar result is found for $n = 1$ under an additional average power constraint.
While much of the prior work has focused on spherically symmetric channels, some research has considered spherically asymmetric channels. The case of inputs constrained to arbitrary compact sets and subject to a finite number of quadratic cost constraints as well as non-degenerate multivariate Gaussian noise is considered in [12]. It is concluded that the support of the capacity-achieving distribution must be “sparse”—that is, there must exist a not identically zero analytic function that is 0 on the support of the capacity-achieving distribution. Assuming otherwise leads to a contradiction by the $n$-dimensional Identity Theorem and Fourier analysis. These results, while quite general, do not consider either inputs of unbounded support or inputs subject to higher-moment constraints. Furthermore, outside of the special cases of $n = 1$ and spherically symmetric channels, they do not explore a characterization of sparse sets in $\mathbb{R}^n$.
In [13], MIMO channels with inputs that are restricted to compact sets, yet have no average power constraints, are considered. Using the Real-Analytic Identity Theorem and steps similar to [4], it is determined that the support of the optimal input distribution is nowhere dense in $\mathbb{R}^n$ and has Lebesgue measure 0. For the case considered in [12] that coincides with this setup, [13] gives an instance of sparsity in terms of subsets of $\mathbb{R}^n$, rather than analytic functions.
There has also been work dedicated to generalizing the classic quadratic average cost constraint. In [14], a scalar channel with the input subject to a combination of even-moment constraints and restrictions to compact or non-negative subsets of $\mathbb{R}$ is studied. It is shown that, in most of the cases considered, the support has a finite number of points.
In [15], a complex-valued non-dispersive optical channel is considered, where the input is subject to an average cost that grows super-quadratically in radius, a peak constraint, or both. The noise is taken to be circularly symmetric and, under these conditions, so is the optimal input. The number of concentric circles composing the support of the distribution is shown to be finite.
In this paper, we study an $n$-dimensional channel subject to non-degenerate Gaussian noise. The input can either take any value in $\mathbb{R}^n$ or is restricted to a compact subset of $\mathbb{R}^n$, and its norm is subject to even-moment constraints. The noise need not be spherically symmetric.
This paper gives a characterization of the capacity-achieving distribution to spherically asymmetric channels under peak and average power constraints that improves on prior work in three respects. Firstly, when our cases overlap with [12], our characterization of the capacity-achieving distribution is more detailed than the notion of sparsity used there. Secondly, our results apply to multivariate channels with inputs subject to even-moment constraints of order greater than 2. Thirdly, we consider both inputs that are restricted to compact sets and those that are allowed to take any value in $\mathbb{R}^n$.

3. Results

In this section, we consider $\mathbb{R}^n$-valued inputs subject to additive non-degenerate multivariate Gaussian noise. In Section 3.1, the capacity-achieving distribution, $F^*$, is characterized as the solution of an optimization problem; its support is then framed in terms of the zero set of a certain real-analytic function, which is dependent on $F^*$ and referred to as $s(\cdot\,;F^*)$. Section 3.2 finds an equivalent expression for $s(\cdot\,;F^*)$, which is an intermediary step to showing in Section 3.3 that $s(\cdot\,;F^*)$ is non-constant. Section 3.4 uses the result that $s(\cdot\,;F^*)$ is non-constant to show that the support of the capacity-achieving distribution is contained in a countable union of submanifolds of dimensions in the range $\{0,\ldots,n-1\}$. This union is finite when the input is constrained to a compact subset of $\mathbb{R}^n$. It is then shown that the support of the capacity-achieving input has Lebesgue measure 0 and is nowhere dense in $\mathbb{R}^n$.
Appendix A is dedicated to showing the convexity and compactness of the optimization space used in Section 3.1. Appendix B establishes a pointwise characterization of $\mathrm{supp}(F^*)$, which justifies the definition of $s(\cdot\,;F^*)$. Appendix C provides integrability results, which are used throughout the paper. Appendix D shows that the objective functional is weakly continuous, strictly concave, and weakly differentiable on the optimization space. Appendix E shows that $s(\cdot\,;F^*)$ has an analytic extension to $\mathbb{C}^n$. Finally, Appendix F supports Section 3.3 by finding bounds for certain functions.
As a first step towards defining the set of feasible input distributions, let $\mathcal{F}(\mathbb{R}^n)$ be the set of finite Borel measures on $\mathbb{R}^n$. Note that $\mathcal{F}(\mathbb{R}^n)$ is contained in the set of finite signed Borel measures on $\mathbb{R}^n$, which has an intrinsic vector space structure and can be equipped with a norm [16]. Since $\mathcal{F}(\mathbb{R}^n)$ lies within a normed vector space, the convexity and compactness of its subsets can be discussed.
The possibility that the transmitter is unable to produce arbitrary signals in $\mathbb{R}^n$ is modeled by restricting the input to an alphabet $\mathcal{A} \subseteq \mathbb{R}^n$. Denote the set of distributions for which the associated random variable is almost surely in $\mathcal{A}$ by
$$\mathcal{F}_n(\mathcal{A}) \triangleq \{F \in \mathcal{F}(\mathbb{R}^n) \mid F(\mathcal{A}) = 1\}.$$
Two cases for $\mathcal{A}$ are considered:
  • $\mathcal{A} = \mathbb{R}^n$;
  • $\mathcal{A} \subset \mathbb{R}^n$ is compact.
In addition to the restriction to $\mathcal{A}$, a radial even-moment constraint is associated with the input. For $k \in \mathbb{Z}_{>0}$, the input must belong to the set
$$\mathcal{P}_n(\mathcal{A},k,a) \triangleq \left\{F \in \mathcal{F}_n(\mathcal{A}) \,\middle|\, \mathbb{E}_{X \sim F}\!\left[\|X\|^{2k}\right] \le a\right\}.$$
The resulting channel model, with input $X \sim F \in \mathcal{P}_n(\mathcal{A},k,a)$, is
$$Y = AX + N,$$
where $Y$ and $N \sim \mathcal{N}(0,\Sigma)$ are the output and noise, respectively, and $A$ is an invertible matrix known to the transmitter and receiver. It is assumed that the noise covariance matrix $\Sigma$ is positive-definite.
We will simplify the analysis of (3) by showing that no generality is lost in assuming that $A = I_n$, the $n$-dimensional identity matrix, and that $\Sigma$ is diagonal. Since $\Sigma$ is positive-definite and $A$ is invertible, the positive-definite matrix $A^{-1}\Sigma(A^{-1})^T$ can be diagonalized by an orthogonal matrix $Q$.
Now, multiplying the output $Y$ in (3) by $QA^{-1}$, the receiver obtains
$$\tilde{Y} \triangleq QA^{-1}Y = \tilde{X} + \tilde{N},$$
where $\tilde{X} = QX$ and the covariance matrix of $\tilde{N} = QA^{-1}N$ is diagonal. Since $QA^{-1}$ is invertible, $I(X;Y) = I(\tilde{X};\tilde{Y})$. Furthermore, since $Q$ is orthogonal,
$$\mathbb{E}\!\left[\|\tilde{X}\|^{2k}\right] = \mathbb{E}\!\left[\|X\|^{2k}\right]$$
and the set $\{Qx \mid x \in \mathcal{A}\}$ is merely a rotated version of $\mathcal{A}$. Hence, no generality is lost by dropping the tildes and adopting the following channel model for the remainder of the paper:
$$Y = X + N,$$
where $N \sim \mathcal{N}(0,\Sigma)$ and $\Sigma$ is diagonal with entries $0 < \sigma_1^2 \le \sigma_2^2 \le \cdots \le \sigma_n^2$. The density of $N$ is denoted by $p_N(\cdot)$.
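The reduction above is easy to check numerically. The sketch below (the channel matrix and noise covariance are arbitrary illustrative choices, not values from the paper) verifies that the covariance of $QA^{-1}N$ is diagonal and that the radial even moment of the input is unchanged by the rotation $Q$.

```python
# Sketch of the whitening/rotation argument with illustrative A and Sigma.
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 2
A = rng.normal(size=(n, n)) + n * np.eye(n)          # invertible channel matrix (assumed)
B = rng.normal(size=(n, n))
Sigma = B @ B.T + np.eye(n)                           # positive-definite noise covariance

M = np.linalg.inv(A) @ Sigma @ np.linalg.inv(A).T     # covariance of A^{-1} N
eigval, eigvec = np.linalg.eigh(M)
Q = eigvec.T                                          # orthogonal; Q M Q^T is diagonal

Sigma_tilde = Q @ M @ Q.T                             # covariance of N_tilde = Q A^{-1} N
print("max off-diagonal entry:",
      np.max(np.abs(Sigma_tilde - np.diag(np.diag(Sigma_tilde)))))

X = rng.normal(size=(100000, n))                      # any input sample
lhs = np.mean(np.sum((X @ Q.T) ** 2, axis=1) ** k)    # E||QX||^{2k}
rhs = np.mean(np.sum(X ** 2, axis=1) ** k)            # E||X||^{2k}
print("radial even moment preserved:", np.isclose(lhs, rhs))
```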

3.1. Optimization Problem

By Theorem 3.6.2 of [17], the capacity of the channel in (7) is given by the optimization problem
$$C = \sup_{X \sim F \in \mathcal{P}_n(\mathcal{A},k,a)} I(X;Y) = \sup_{\substack{F \in \mathcal{F}_n(\mathcal{A}) \\ \mathbb{E}_{X\sim F}[\|X\|^{2k}] \le a}} I(X;Y).$$
Since the relationship between $Y$, $X$, and $N$ is known, the mutual information is a function of the distribution of $X$ alone. Thus, the mutual information induced between $X \sim F$ and $Y$ will be denoted by $I(F)$. Similarly, we express the even-moment constraint in terms of a functional $g_k: \mathcal{F}_n(\mathcal{A}) \to \mathbb{R} \cup \{\infty\}$ given by
$$g_k(F) \triangleq \int_{\mathcal{A}} \|x\|^{2k}\, dF(x) - a,$$
where $g_k(F) \le 0$ is equivalent to $\mathbb{E}_{X\sim F}[\|X\|^{2k}] \le a$. Rewriting (8) in terms of $I(\cdot)$ and $g_k(\cdot)$ yields
$$C = \sup_{F \in \mathcal{P}_n(\mathcal{A},k,a)} I(F) = \sup_{\substack{F \in \mathcal{F}_n(\mathcal{A}) \\ g_k(F) \le 0}} I(F).$$
Much of the appendix is dedicated to understanding properties of the problem presented in (10). It is shown in Theorem A1 that $\mathcal{P}_n(\mathcal{A},k,a)$ is convex and compact. Furthermore, by Theorems A3 and A4, $I(\cdot)$ is a weakly continuous and strictly concave function on $\mathcal{P}_n(\mathcal{A},k,a)$. Therefore, the supremum is achieved by a unique input distribution $F^* \in \mathcal{P}_n(\mathcal{A},k,a)$ (see, e.g., Appendix C of [14])—that is,
$$C = \max_{F \in \mathcal{P}_n(\mathcal{A},k,a)} I(F) = \max_{\substack{F \in \mathcal{F}_n(\mathcal{A}) \\ g_k(F) \le 0}} I(F) = I(F^*).$$
We use the notation $X^*$ to describe a capacity-achieving input directly (i.e., $X^* \sim F^*$).
Before proceeding, we require some definitions and notation. In the first definition, and throughout the paper, for $x \in \mathbb{R}^n$ and $r > 0$, we denote the ball of radius $r$ centered at $x$ by $\mathcal{B}_r(x) \subset \mathbb{R}^n$, and its closure by $\overline{\mathcal{B}_r}(x)$. The output density induced by an input $X \sim F$ is denoted $p(\cdot\,;F)$.
Definition 1. 
Let $V$ be a random variable with alphabet $\mathcal{A} \subseteq \mathbb{R}^n$. Then, the support of $V$ is the set given by
$$\mathrm{supp}(V) \triangleq \{x \in \mathcal{A} \mid \forall r > 0,\ P\{V \in \mathcal{B}_r(x)\} > 0\}.$$
If $V$ has distribution $F_V$, we may alternatively refer to $\mathrm{supp}(F_V) \triangleq \mathrm{supp}(V)$.
Definition 2. 
For $F \in \mathcal{F}_n(\mathcal{A})$, the output differential entropy is given by
$$h_Y(F) \triangleq -\int_{\mathbb{R}^n} p(y;F)\ln p(y;F)\, dy$$
and the marginal entropy density at $x \in \mathbb{R}^n$ is given by
$$h(x;F) \triangleq -\int_{\mathbb{R}^n} p_N(y-x)\ln p(y;F)\, dy,$$
whenever the integrals exist.
The relationship between the differential entropy and the marginal entropy density can be seen as follows. For any $b > 0$ and $F_1, F_2 \in \mathcal{P}_n(\mathcal{A},k,b)$, we have that
$$\int_{\mathcal{A}} h(x;F_1)\, dF_2(x) = -\int_{\mathcal{A}}\int_{\mathbb{R}^n} p_N(y-x)\ln p(y;F_1)\, dy\, dF_2(x) = -\int_{\mathbb{R}^n}\left(\int_{\mathcal{A}} p_N(y-x)\, dF_2(x)\right)\ln p(y;F_1)\, dy = -\int_{\mathbb{R}^n} p(y;F_2)\ln p(y;F_1)\, dy.$$
The equivalence of (15)–(17) is due to the Fubini–Tonelli Theorem [18] and Lemma A4. If $F = F_1 = F_2$, then
$$\int_{\mathcal{A}} h(x;F)\, dF(x) = h_Y(F).$$
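For instance, the identity in (18) can be verified numerically in the scalar case with a simple two-point input; the mass points, probabilities, and noise variance below are illustrative choices only.

```python
# Check of (18): integrating the marginal entropy density h(x; F) against F
# recovers the output differential entropy h_Y(F) (scalar case, illustrative F).
import numpy as np

sigma = 1.0
xs, ps = np.array([-1.0, 2.0]), np.array([0.4, 0.6])   # two-point input distribution F
y = np.linspace(-14.0, 16.0, 40001)
dy = y[1] - y[0]

p_N = lambda t: np.exp(-t ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
p_Y = sum(p * p_N(y - x) for x, p in zip(xs, ps))       # output density p(y; F)

h_Y = -np.sum(p_Y * np.log(p_Y)) * dy                   # h_Y(F) by numerical integration
h_marg = np.array([-np.sum(p_N(y - x) * np.log(p_Y)) * dy for x in xs])
print(h_Y, float(ps @ h_marg))                          # the two values agree
```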
Lastly, define
$$\mathcal{Q}_n(\mathcal{A},k,a) \triangleq \bigcup_{b \ge a} \mathcal{P}_n(\mathcal{A},k,b)$$
and let $s(\cdot\,;F^*): \mathbb{R}^n \to \mathbb{R}$ be given by
$$s(x;F^*) \triangleq \gamma(\|x\|^{2k} - a) + C + h(N) - h(x;F^*) = \gamma(\|x\|^{2k} - a) + C + h(N) + \int_{\mathbb{R}^n} p_N(y-x)\ln p(y;F^*)\, dy.$$
Since $F^* \in \mathcal{P}_n(\mathcal{A},k,a)$, we have that $h(x;F^*)$ is finite for all $x \in \mathbb{R}^n$ and conclude that $s(x;F^*)$ is also finite for all $x \in \mathbb{R}^n$. Furthermore, by Lemma A8, $s(\cdot\,;F^*)$ can be extended to a complex analytic function of $z \in \mathbb{C}^n$; hence, it is continuous.
The remainder of Section 3.1 consists of two steps:
  • We show that $F^*$ solves (11) if and only if there exists $\gamma \ge 0$ such that
$$\int_{\mathbb{R}^n} h(x;F^*)\, dF(x) - C - h(N) - \gamma g_k(F) \le 0,$$
    for all $F \in \mathcal{Q}_n(\mathcal{A},k,a)$.
  • We show that, for the choice of $\gamma$ in (22), the inequality in (22) is equivalent to the condition that for all $x \in \mathcal{A}$,
$$h(x;F^*) \le \gamma(\|x\|^{2k} - a) + C + h(N),$$
    and if $x \in \mathrm{supp}(F^*)$, then
$$h(x;F^*) = \gamma(\|x\|^{2k} - a) + C + h(N).$$
    For $x \in \mathcal{A}$, (24) is satisfied if and only if $s(x;F^*) = 0$.
To establish (22), we will first use a Lagrange multiplier to reformulate (11) as an unconstrained problem over $\mathcal{Q}_n(\mathcal{A},k,a)$. We will then obtain (22) by taking the weak derivative of the resulting objective functional and applying a Karush–Kuhn–Tucker condition for optimality. We choose to work in the space $\mathcal{Q}_n(\mathcal{A},k,a)$ since, when $\mathcal{A} = \mathbb{R}^n$, the functionals $I(\cdot)$ and $g_k(\cdot)$ are not weakly differentiable on the larger space $\mathcal{F}_n(\mathcal{A}) = \mathcal{F}_n(\mathbb{R}^n)$.
Since $\{F \in \mathcal{Q}_n(\mathcal{A},k,a) \mid g_k(F) \le 0\} = \mathcal{P}_n(\mathcal{A},k,a)$, (11) can equivalently be written as
$$C = \max_{\substack{F \in \mathcal{Q}_n(\mathcal{A},k,a) \\ g_k(F) \le 0}} I(F) = I(F^*),$$
where $F^*$ is the same as in (11). By Theorem A5, $g_k(\cdot)$ is convex. Moreover, letting $F_s \in \mathcal{Q}_n(\mathcal{A},k,a)$ be a Heaviside step function at $0 \in \mathbb{R}^n$ (the distribution of an input that is almost surely $0$), $F_s$ is an interior point of the feasible region since $g_k(F_s) = -a < 0$. Since $\mathcal{Q}_n(\mathcal{A},k,a)$ is convex by Theorem A1, there exists $\gamma \ge 0$ such that
$$C = \max_{F \in \mathcal{Q}_n(\mathcal{A},k,a)} J_\gamma(F) = J_\gamma(F^*),$$
where
$$J_\gamma(F) \triangleq I(F) - \gamma g_k(F),$$
and $\gamma g_k(F^*) = 0$ (see, e.g., Appendix C of [14]). Furthermore, for an arbitrary $b \ge a$, $F^* \in \mathcal{P}_n(\mathcal{A},k,b) \subseteq \mathcal{Q}_n(\mathcal{A},k,a)$. Therefore, for this choice of $\gamma$, we also have
$$C = \max_{F \in \mathcal{P}_n(\mathcal{A},k,b)} J_\gamma(F) = J_\gamma(F^*).$$
By Lemmas A5 and A6, $J_\gamma(\cdot)$ has a weak derivative at $F^*$ in the direction of any $F \in \mathcal{P}_n(\mathcal{A},k,b)$ given by
$$J'_\gamma(F^*,F) = I'(F^*,F) - \gamma g'_k(F^*,F) = \int_{\mathcal{A}} h(x;F^*)\, dF(x) - h_Y(F^*) - \gamma\left(g_k(F) - g_k(F^*)\right) = \int_{\mathcal{A}} h(x;F^*)\, dF(x) - h_Y(F^*) - \gamma g_k(F),$$
where (31) is due to $\gamma g_k(F^*) = 0$. Substituting $C = h_Y(F^*) - h(N)$ gives
$$J'_\gamma(F^*,F) = \int_{\mathcal{A}} h(x;F^*)\, dF(x) - C - h(N) - \gamma g_k(F),$$
where the differential entropy of the noise, $h(N) = \frac{1}{2}\ln|2\pi e\Sigma|$, is finite since $\Sigma$ is positive-definite.
Now, $J_\gamma(\cdot)$ is the difference between a strictly concave function (see Theorem A4) and a convex function (due to Theorem A5 and the non-negativity of $\gamma$). Therefore, $J_\gamma(\cdot)$ is strictly concave and $F^*$ is optimal if and only if, for all $F \in \mathcal{P}_n(\mathcal{A},k,b)$,
$$J'_\gamma(F^*,F) = \int_{\mathcal{A}} h(x;F^*)\, dF(x) - C - h(N) - \gamma g_k(F) \le 0.$$
However, $b \ge a$ is arbitrary and each $F \in \mathcal{Q}_n(\mathcal{A},k,a)$ satisfies $F \in \mathcal{P}_n(\mathcal{A},k,b)$ for some $b \ge a$. Therefore, $F^*$ is optimal if and only if, for all $F \in \mathcal{Q}_n(\mathcal{A},k,a)$,
$$J'_\gamma(F^*,F) = \int_{\mathcal{A}} h(x;F^*)\, dF(x) - C - h(N) - \gamma g_k(F) \le 0,$$
which is the statement we sought to show in (22).
The condition (34) is on the capacity-achieving distribution $F^*$ itself. Since our objective is to characterize $\mathrm{supp}(F^*)$, we find an equivalent condition to (34) that describes the behavior of $F^*$ at individual points in the input alphabet $\mathcal{A}$. Thus, by (34) and Theorem A2, for all $x \in \mathcal{A} \subseteq \mathbb{R}^n$,
$$s(x;F^*) = \gamma(\|x\|^{2k} - a) + C + h(N) + \int_{\mathbb{R}^n} p_N(y-x)\ln p(y;F^*)\, dy \ge 0,$$
and if $x \in \mathrm{supp}(F^*)$, then
$$s(x;F^*) = \gamma(\|x\|^{2k} - a) + C + h(N) + \int_{\mathbb{R}^n} p_N(y-x)\ln p(y;F^*)\, dy = 0.$$
The rest of Section 3 is dedicated to exploiting the relationship
$$\mathrm{supp}(F^*) \subseteq Z(s;F^*) \cap \mathcal{A},$$
where $Z(s;F^*)$ is the zero set of $s(\cdot\,;F^*)$.

3.2. Hilbert Space and Hermite Polynomial Representation

In this subsection, an equivalent expression for (36) is found by viewing the integral as an inner product in a Hilbert space and writing ln p ( · ; F * ) in terms of a Hermite polynomial basis for that space. Hermite polynomial bases are well-suited to analysis of Gaussian noise channels and they have been used in a number of information-theoretic papers (see, e.g., Refs. [14,19]).
Consider the Hilbert space
$$L^2_{p_N}(\mathbb{R}^n) \triangleq \left\{\xi: \mathbb{R}^n \to \mathbb{R} \,\middle|\, \int_{\mathbb{R}^n} \xi^2(x)\, p_N(x)\, dx < \infty\right\},$$
equipped with the inner product
$$\langle \xi, \psi \rangle_{L^2_{p_N}(\mathbb{R}^n)} \triangleq \int_{\mathbb{R}^n} \xi(x)\psi(x)\, p_N(x)\, dx.$$
The inner product’s subscript is omitted when the space can be inferred.
Since the components of $N$ are independent, with $N_i$ having variance $\sigma_i^2 > 0$, the density of $N$ factors into
$$p_N(y) = \prod_{i=1}^n p_{N_i}(y_i) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{y_i^2}{2\sigma_i^2}}.$$
We will construct an orthogonal basis for $L^2_{p_N}(\mathbb{R}^n)$ from orthogonal bases for the spaces $L^2_{p_{N_i}}(\mathbb{R})$.
First, with $Z \sim \mathcal{N}(0,1)$, an orthogonal basis for $L^2_{p_Z}(\mathbb{R})$ is given by the Hermite polynomials $\{H_m\}_{m=0}^{\infty}$ [20], which are defined through the generating function
$$e^{-\frac{1}{2}x^2 + yx} = \sum_{m=0}^{\infty} H_m(y)\,\frac{x^m}{m!}.$$
For any $m \in \mathbb{Z}_{\ge 0}$, the $m$th Hermite polynomial has degree $m$ and a positive leading coefficient. Next, for each $i \in \{1,\ldots,n\}$ and $m_i \in \mathbb{Z}_{\ge 0}$, define the stretched Hermite polynomials
$$H_{m_i}(y;\sigma_i^2) \triangleq \frac{1}{\sigma_i^{m_i}}\, H_{m_i}\!\left(\frac{y}{\sqrt{\sigma_i^2}}\right).$$
The inner product of $L^2_{p_{N_i}}(\mathbb{R})$ is related to that of $L^2_{p_Z}(\mathbb{R})$ by
$$\langle \xi, \psi \rangle_{L^2_{p_{N_i}}(\mathbb{R})} = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-\frac{y^2}{2\sigma_i^2}}\, \xi(y)\psi(y)\, dy = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{u^2}{2}}\, \xi\!\left(\sqrt{\sigma_i^2}\,u\right)\psi\!\left(\sqrt{\sigma_i^2}\,u\right) du = \langle \xi_{\sigma_i^2}, \psi_{\sigma_i^2} \rangle_{L^2_{p_Z}(\mathbb{R})},$$
where $\xi_{\sigma_i^2}$ and $\psi_{\sigma_i^2}$ are versions of $\xi$ and $\psi$ that have been stretched horizontally by a factor of $1/\sqrt{\sigma_i^2}$. Substituting any $H_{m_i}(\cdot\,;\sigma_i^2)$ and $H_{l_i}(\cdot\,;\sigma_i^2)$ into (44) shows that the set $\{H_{m_i}(\cdot\,;\sigma_i^2)\}_{m_i=0}^{\infty}$ is orthogonal. Furthermore, if there was a non-zero function (in an $L^2_{p_{N_i}}(\mathbb{R})$ sense) that had a zero $L^2_{p_{N_i}}(\mathbb{R})$ inner product with $H_{m_i}(\cdot\,;\sigma_i^2)$ for each $m_i$, then a stretched version of this function would also have a zero $L^2_{p_Z}(\mathbb{R})$ inner product with $H_m$ for each $m$. This would contradict the completeness of the Hermite polynomials in $L^2_{p_Z}(\mathbb{R})$; hence, $\{H_{m_i}(\cdot\,;\sigma_i^2)\}_{m_i=0}^{\infty}$ forms a basis for $L^2_{p_{N_i}}(\mathbb{R})$. Lastly, the stretched Hermite polynomials $\{H_{m_i}(\cdot\,;\sigma_i^2)\}_{m_i=0}^{\infty}$ have the generating function
$$e^{-\frac{x^2}{2\sigma_i^2} + \frac{yx}{\sigma_i^2}} = \sum_{m_i=0}^{\infty} H_{m_i}(y;\sigma_i^2)\,\frac{x^{m_i}}{m_i!}.$$
Now, $L^2_{p_N}(\mathbb{R}^n)$ is isomorphic to the tensor product of the $L^2_{p_{N_i}}(\mathbb{R})$ spaces, $i \in \{1,\ldots,n\}$; consequently, $\{H_{\mathbf{m}}(\cdot\,;\sigma^2) \mid \mathbf{m} \in \mathbb{Z}_{\ge 0}^n\}$ forms an orthogonal basis for $L^2_{p_N}(\mathbb{R}^n)$ [21], where
$$H_{\mathbf{m}}(y;\sigma^2) \triangleq \prod_{i=1}^n H_{m_i}(y_i;\sigma_i^2).$$
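Numerically, the orthogonality of the stretched Hermite polynomials under the $\mathcal{N}(0,\sigma_i^2)$ weight can be confirmed with the probabilists' Hermite polynomials available in NumPy (the generating function above is exactly theirs); the variance below is an illustrative choice.

```python
# Check: the stretched Hermite polynomials H_m(y; sigma^2) = sigma^{-m} H_m(y / sigma)
# are orthogonal in L^2_{p_N}(R) with <H_m, H_m> = m! / sigma^{2m} (illustrative sigma).
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial

sigma = 1.7
nodes, weights = hermegauss(80)            # quadrature for the weight exp(-u^2 / 2)
weights = weights / np.sqrt(2 * np.pi)     # normalize to the standard normal density

def stretched(m, y):
    c = np.zeros(m + 1); c[m] = 1.0        # coefficient vector selecting He_m
    return hermeval(y / sigma, c) / sigma ** m

y = sigma * nodes                          # substitution y = sigma * u in the integral
G = np.array([[np.sum(weights * stretched(m, y) * stretched(l, y))
               for l in range(6)] for m in range(6)])
print(np.round(G, 6))                      # diagonal ~ m!/sigma^(2m); off-diagonal ~ 0
print([factorial(m) / sigma ** (2 * m) for m in range(6)])
```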
Since, by Lemma A2, $\ln p(\cdot\,;F^*) \in L^2_{p_N}(\mathbb{R}^n)$, there exist constants $\{c_{\mathbf{i}}\}_{\mathbf{i} \in \mathbb{Z}_{\ge 0}^n}$ for which
$$\ln p(y;F^*) = \sum_{\mathbf{i} \in \mathbb{Z}_{\ge 0}^n} c_{\mathbf{i}}\, H_{\mathbf{i}}(y;\sigma^2),$$
where equality is in an $L^2_{p_N}(\mathbb{R}^n)$ sense. Then, substituting (47), and using the notations
$$\mathbf{m}! \triangleq m_1!\cdots m_n!$$
and
$$x^{\mathbf{m}} \triangleq x_1^{m_1}\cdots x_n^{m_n}$$
for $\mathbf{m} \in \mathbb{Z}_{\ge 0}^n$ and $x \in \mathbb{R}^n$, we write
$$e^{-\frac{1}{2}x^T\Sigma^{-1}x + x^T\Sigma^{-1}y} = \prod_{i=1}^n e^{-\frac{x_i^2}{2\sigma_i^2} + \frac{y_i x_i}{\sigma_i^2}} = \prod_{i=1}^n \sum_{m_i=0}^{\infty} H_{m_i}(y_i;\sigma_i^2)\,\frac{x_i^{m_i}}{m_i!} = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} \prod_{i=1}^n H_{m_i}(y_i;\sigma_i^2)\,\frac{x_i^{m_i}}{m_i!} = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} H_{\mathbf{m}}(y;\sigma^2)\,\frac{x^{\mathbf{m}}}{\mathbf{m}!}.$$
Substituting (49) and (55) into the integral term in (36) yields
$$\int_{\mathbb{R}^n} p_N(y-x)\ln p(y;F^*)\, dy = \int_{\mathbb{R}^n} p_N(y)\, e^{-\frac{1}{2}x^T\Sigma^{-1}x + x^T\Sigma^{-1}y} \sum_{\mathbf{i} \in \mathbb{Z}_{\ge 0}^n} c_{\mathbf{i}}\, H_{\mathbf{i}}(y;\sigma^2)\, dy = \int_{\mathbb{R}^n} p_N(y) \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} \frac{x^{\mathbf{m}}}{\mathbf{m}!}\, H_{\mathbf{m}}(y;\sigma^2) \sum_{\mathbf{i} \in \mathbb{Z}_{\ge 0}^n} c_{\mathbf{i}}\, H_{\mathbf{i}}(y;\sigma^2)\, dy = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} \frac{x^{\mathbf{m}}}{\mathbf{m}!} \sum_{\mathbf{i} \in \mathbb{Z}_{\ge 0}^n} c_{\mathbf{i}}\, \langle H_{\mathbf{m}}(\cdot\,;\sigma^2), H_{\mathbf{i}}(\cdot\,;\sigma^2) \rangle = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} c'_{\mathbf{m}}\, x^{\mathbf{m}},$$
where $c'_{\mathbf{m}} = c_{\mathbf{m}}/\rho_{\mathbf{m}}$, with
$$\infty > \rho_{\mathbf{m}} \triangleq \frac{\mathbf{m}!}{\langle H_{\mathbf{m}}(\cdot\,;\sigma^2), H_{\mathbf{m}}(\cdot\,;\sigma^2) \rangle} > 0.$$
This simplification to a polynomial will be helpful since the cost function associated with the even-moment constraint is also a polynomial. This relationship is exploited in Section 3.3.

3.3. Non-Constancy of $s(\cdot\,;F^*)$

Recall the relationship $\mathrm{supp}(F^*) \subseteq Z(s;F^*) \cap \mathcal{A}$ from (37). Since $\mathrm{supp}(F^*) \neq \emptyset$, $s(\cdot\,;F^*)$ has at least one zero, so it is constant if and only if $Z(s;F^*) = \mathbb{R}^n$. This subsection is dedicated to ruling out the latter equivalent condition. The immediate implication is that $\mathrm{supp}(F^*)$ is a strict subset of $\mathbb{R}^n$; moreover, the fact that $s(\cdot\,;F^*)$ is a real-analytic function that is not identically 0 will be used in Section 3.4 to prove the main results.
By way of contradiction, suppose that $s(x;F^*) = 0$ for all $x \in \mathbb{R}^n$. Substituting (59) into (36), this is equivalent to
$$[\gamma a - C - h(N)] - \gamma\left(\sum_{i=1}^n x_i^2\right)^{\!k} = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} c'_{\mathbf{m}}\, x_1^{m_1}\cdots x_n^{m_n},$$
for all $x \in \mathbb{R}^n$. The discussion proceeds in two cases: $k = 1$ and $k > 1$.
  • Case k = 1 :
In the case that $k = 1$ and $\mathcal{A} = \mathbb{R}^n$, $X^*$ is known to be Gaussian [22] and there is no contradiction with (61). Therefore, for $k = 1$, we focus only on compact input alphabets $\mathcal{A} \subset \mathbb{R}^n$.
With $k = 1$ and $\mathcal{A} \subset \mathbb{R}^n$, (61) reduces to
$$[\gamma a - C - h(N)] - \gamma\sum_{i=1}^n x_i^2 = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} c'_{\mathbf{m}}\, x_1^{m_1}\cdots x_n^{m_n},$$
for all $x \in \mathbb{R}^n$. Let $\mathbf{e}_i$ be the $i$th row of the $n \times n$ identity matrix and let $\mathbf{0} \in \mathbb{Z}_{\ge 0}^n$ be the all-zero vector. Since (62) holds for all $x \in \mathbb{R}^n$, matching coefficients gives
$$c'_{\mathbf{m}} = \begin{cases} \gamma a - C - h(N), & \text{if } \mathbf{m} = \mathbf{0}, \\ -\gamma, & \text{if } \mathbf{m} = 2\mathbf{e}_i,\ i \in \{1,\ldots,n\}, \\ 0, & \text{otherwise}. \end{cases}$$
Since, for each $i \in \{1,\ldots,n\}$ and $\mathbf{m} \in \mathbb{Z}_{\ge 0}^n$, $H_{m_i}(y_i;\sigma_i^2)$ has degree $m_i$ and a positive leading coefficient,
$$H_{\mathbf{m}}(y;\sigma^2) = \prod_{i=1}^n H_{m_i}(y_i;\sigma_i^2)$$
also has degree $m_i$ in $y_i$, and the unique term with total degree $d(\mathbf{m}) = m_1 + \cdots + m_n$ is $y_1^{m_1}\cdots y_n^{m_n}$, which has a positive coefficient. Therefore, the polynomials present in the sum are of the form
$$H_{\mathbf{0}}(y;\sigma^2) = \kappa_{\mathbf{0}} > 0$$
and, for $i \in \{1,\ldots,n\}$,
$$H_{2\mathbf{e}_i}(y;\sigma^2) = \kappa_{2\mathbf{e}_i}\, y_i^2 + \alpha_{2\mathbf{e}_i}\, y_i + \beta_{2\mathbf{e}_i}.$$
The constants $\kappa_{\mathbf{0}}$ and $\kappa_{2\mathbf{e}_i}$ are positive, while $\alpha_{2\mathbf{e}_i}$ and $\beta_{2\mathbf{e}_i}$ are real. Substituting this and the identity $c_{\mathbf{m}} = c'_{\mathbf{m}}\rho_{\mathbf{m}}$ into (49) yields
$$\ln p(y;F^*) = c'_{\mathbf{0}}\rho_{\mathbf{0}}\, H_{\mathbf{0}}(y;\sigma^2) + \sum_{i=1}^n c'_{2\mathbf{e}_i}\rho_{2\mathbf{e}_i}\, H_{2\mathbf{e}_i}(y;\sigma^2) = \kappa_{\mathbf{0}}\rho_{\mathbf{0}}[\gamma a - C - h(N)] - \gamma\sum_{i=1}^n \rho_{2\mathbf{e}_i}\left(\kappa_{2\mathbf{e}_i} y_i^2 + \alpha_{2\mathbf{e}_i} y_i + \beta_{2\mathbf{e}_i}\right),$$
or equivalently,
$$p(y;F^*) = e^{\kappa_{\mathbf{0}}\rho_{\mathbf{0}}[\gamma a - C - h(N)]} \prod_{i=1}^n e^{-\gamma\rho_{2\mathbf{e}_i}\left(\kappa_{2\mathbf{e}_i} y_i^2 + \alpha_{2\mathbf{e}_i} y_i + \beta_{2\mathbf{e}_i}\right)}.$$
By definition, $\gamma \ge 0$; however, $\gamma = 0$ results in a constant density on $\mathbb{R}^n$, which is invalid. Then, it must be the case that $\gamma > 0$. Thus, the output achieved by $X^*$, namely $Y^* \triangleq X^* + N$, has independent Gaussian components. Since $X^*$ and $N$ are independent and $N$ is an $n$-variate Gaussian random variable, $X^*$ must either be an $n$-variate Gaussian random variable or be almost surely equal to some $x_0 \in \mathbb{R}^n$. In the former case, $X^*$ violates the stipulation that the input alphabet $\mathcal{A}$ is compact, contradicting the assumption that $s(\cdot\,;F^*)$ is identically 0. In the latter case, $\mathrm{supp}(F^*) = \{x_0\}$ is trivial and satisfies the main results of the paper.
  • Case k > 1 :
For the case k > 1 , we derive a contradiction to (61) using results on the rate of decay of a function compared with that of its Fourier transform to conclude that s ( · ; F * ) is not identically 0.
Lemma 1. 
Let $U \in \mathbb{R}^n$ have, for some $\beta > 0$, a characteristic function satisfying
$$|\phi_U(\omega)| \triangleq \left|\mathbb{E}\!\left[e^{i\omega^T U}\right]\right| \le e^{-\frac{\beta\|\omega\|^2}{2}}$$
for all $\omega \in \mathbb{R}^n$. Let $V$ be a random variable independent of $U$. Then, the characteristic function of $W = U + V$ satisfies, for all $\omega \in \mathbb{R}^n$,
$$|\phi_W(\omega)| \le e^{-\frac{\beta\|\omega\|^2}{2}}.$$
Proof. 
By the independence of $U$ and $V$, and the fact that characteristic functions have pointwise moduli upper-bounded by 1,
$$|\phi_W(\omega)| = |\phi_U(\omega)|\,|\phi_V(\omega)| \le |\phi_V(\omega)|\, e^{-\frac{\beta\|\omega\|^2}{2}} \le e^{-\frac{\beta\|\omega\|^2}{2}}. \qquad \square$$
Lemma 2. 
Let $U \in \mathbb{R}^n$ have, for some constant $\beta > 0$, a characteristic function satisfying
$$|\phi_U(\omega)| \triangleq \left|\mathbb{E}\!\left[e^{i\omega^T U}\right]\right| \le e^{-\frac{\beta\|\omega\|^2}{2}}$$
for all $\omega \in \mathbb{R}^n$. Let $V$ be a random variable independent of $U$ and let $W = U + V$ have density $p_W(\cdot)$. If there exist positive constants $\alpha$ and $K$ such that, for all $x \in \mathbb{R}^n$,
$$p_W(x) \le K e^{-\alpha\|x\|^2},$$
then $\alpha\beta \le 0.5$.
Proof. 
Apply Lemma 1 and Theorem 4 of [23], noting that an identically 0 function cannot be a density. □
We make use of Lemma 2 by setting $U = N$, $V = X^*$, and $W = Y^*$ and deriving a contradiction to the assumption that $s(\cdot\,;F^*)$ is identically 0. Note that, using Rayleigh quotients, the modulus of the characteristic function of $N$ can be upper-bounded for any $\omega \in \mathbb{R}^n$ by
$$\left|\mathbb{E}\!\left[e^{i\omega^T N}\right]\right| = e^{-\frac{1}{2}\omega^T\Sigma\,\omega} \le e^{-\frac{1}{2}\sigma_1^2\|\omega\|^2}.$$
That is, the characteristic function of $N$ satisfies (75) with $\beta = \sigma_1^2$.
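This bound is straightforward to confirm numerically; the covariance matrix and test frequencies below are arbitrary illustrative choices.

```python
# Check of |phi_N(w)| = exp(-w' Sigma w / 2) <= exp(-sigma_1^2 ||w||^2 / 2),
# with sigma_1^2 the smallest eigenvalue of Sigma (illustrative covariance).
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + 0.5 * np.eye(3)                   # positive-definite covariance
sigma1_sq = np.linalg.eigvalsh(Sigma)[0]            # smallest eigenvalue

w = rng.normal(size=(1000, 3)) * 5.0                # random test frequencies
lhs = np.exp(-0.5 * np.einsum('ij,jk,ik->i', w, Sigma, w))
rhs = np.exp(-0.5 * sigma1_sq * np.sum(w ** 2, axis=1))
print("bound holds at every test point:", bool(np.all(lhs <= rhs + 1e-15)))
```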
To complete the contradiction, we show that there exist $\alpha > 0.5/\beta$ and $K_\alpha > 0$ such that $p(\cdot\,;F^*)$ satisfies the bound in (76). The assumption that $s(\cdot\,;F^*)$ is identically 0 yields (61); substituting the Multinomial Theorem,
$$[\gamma a - C - h(N)] - \gamma \sum_{k_1 + \cdots + k_n = k} \frac{k!}{k_1!\cdots k_n!}\, x_1^{2k_1}\cdots x_n^{2k_n} = \sum_{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n} c'_{\mathbf{m}}\, x_1^{m_1}\cdots x_n^{m_n}.$$
By coefficient matching in (78), the set of non-zero coefficients, other than $c'_{\mathbf{0}}$, is indexed by the set
$$\mathcal{M} \triangleq \left\{\mathbf{m} \in \mathbb{Z}_{\ge 0}^n \,\middle|\, \sum_{i=1}^n m_i = 2k \ \text{ and } \ m_i \text{ is even } \forall\, i \in \{1,\ldots,n\}\right\}.$$
Furthermore, for $\mathbf{m} \in \mathcal{M}$,
$$c'_{\mathbf{m}} = -\gamma\, \frac{\left(\sum_{i=1}^n m_i/2\right)!}{(m_1/2)!\cdots(m_n/2)!}.$$
Therefore, substituting (80) into (49),
$$p(y;F^*) = e^{\kappa_{\mathbf{0}}\rho_{\mathbf{0}}(\gamma a - C - h(N))}\, e^{\sum_{\mathbf{m} \in \mathcal{M}} c'_{\mathbf{m}}\rho_{\mathbf{m}} H_{\mathbf{m}}(y;\sigma^2)}$$
for some positive constant $\kappa_{\mathbf{0}} = H_{\mathbf{0}}(y;\sigma^2)$. As with the case $k = 1$, $\gamma = 0$ results in a constant output density over $\mathbb{R}^n$ and can be disregarded as a possibility. Thus, for each $\mathbf{m} \in \mathcal{M}$, we have
$$c'_{\mathbf{m}} < 0.$$
With $W = Y^*$ and $\alpha > 0$, showing that there exists $K_\alpha$ for which (76) holds is equivalent to showing that
$$p(y;F^*)\, e^{\alpha\|y\|^2} = e^{\kappa_{\mathbf{0}}\rho_{\mathbf{0}}(\gamma a - C - h(N))}\, e^{\sum_{\mathbf{m} \in \mathcal{M}} c'_{\mathbf{m}}\rho_{\mathbf{m}} H_{\mathbf{m}}(y;\sigma^2) + \alpha\sum_{i=1}^n y_i^2}$$
is bounded. This, in turn, is equivalent to showing that the polynomial in the exponent,
$$q_\alpha(y) \triangleq \sum_{\mathbf{m} \in \mathcal{M}} c'_{\mathbf{m}}\rho_{\mathbf{m}}\, H_{\mathbf{m}}(y;\sigma^2) + \alpha\sum_{i=1}^n y_i^2,$$
is upper bounded.
is upper bounded. We proceed by considering the degrees of the terms of q α ( y ) to determine the behavior of (84) as y increases.
For each m Z 0 n and i { 1 , , n } , H m ( y ; σ 2 ) has degree m i in y i . Furthermore, H m ( y ; σ 2 ) has total degree
d ( m ) i = 1 n m i
and the unique highest degree term, y 1 m 1 y n m n , has coefficient κ m > 0 . Note that, since k > 1 , and by the definition of M , q α ( · ) has total degree 2 k 4 . Hence, ref. (84) can be rewritten as q α ( y ) = q α ( 0 ) ( y ) + q α ( 1 ) ( y ) + q α ( 2 ) ( y ) , where
q α ( 0 ) ( y ) i = 1 n c 2 k e i ρ 2 k e i κ 2 k e i y i 2 k = γ i = 1 n ρ 2 k e i κ 2 k e i y i 2 k ,
q α ( 1 ) ( y ) m M m { 2 k e i i { 1 , , n } } c m ρ m κ m y m ,
and q α ( 2 ) ( y ) is the sum of the remaining terms, each with a total degree of at most 2 k 2 .
Note the following:
  • For each $y \in \mathbb{R}^n$, $q_\alpha^{(0)}(y) \le 0$ and, by Lemma A9, the minimal value of $|q_\alpha^{(0)}(y)|$—evaluated on a sphere $\|y\| = L$ of radius $L \ge 0$—is at least $\gamma \min_{i \in \{1,\ldots,n\}}\{\rho_{2k\mathbf{e}_i}\kappa_{2k\mathbf{e}_i}\}\, L^{2k}/n^{k}$.
  • For each $y \in \mathbb{R}^n$, we have that $q_\alpha^{(1)}(y) \le 0$. Indeed, for each $\mathbf{m} \in \mathcal{M}$, $c'_{\mathbf{m}} < 0$ by (82), $\rho_{\mathbf{m}} > 0$, and $\kappa_{\mathbf{m}} > 0$; further, for each $i \in \{1,\ldots,n\}$, $m_i$ is even, so $y^{\mathbf{m}} \ge 0$.
  • The maximum value of $|q_\alpha^{(2)}(y)|$, evaluated on a sphere $\|y\| = L$ of radius $L \ge 1$, is at most $A L^{2k-2}$ for some $A > 0$—that is, each term of $q_\alpha^{(2)}(y)$ is either of the form $\alpha y_i^2$ or $c'_{\mathbf{m}}\rho_{\mathbf{m}}\nu_{\mathbf{m},\mathbf{l}}\, y^{\mathbf{l}}$ for some $\mathbf{m} \in \mathcal{M}$, $\mathbf{l} \in \mathbb{Z}_{\ge 0}^n$, and $\nu_{\mathbf{m},\mathbf{l}} \in \mathbb{R}$, where $d(\mathbf{l}) \le 2k-2$. Lemma A10 shows that these are no more than $\alpha L^2$ or $|c'_{\mathbf{m}}\rho_{\mathbf{m}}\nu_{\mathbf{m},\mathbf{l}}|\, L^{d(\mathbf{l})}$ in magnitude.
We conclude that, since $q_\alpha^{(0)}(y) \le 0$ and $q_\alpha^{(1)}(y) \le 0$ for all $y \in \mathbb{R}^n$,
$$\lim_{\|y\|\to\infty} q_\alpha(y) = \lim_{\|y\|\to\infty}\left(q_\alpha^{(0)}(y) + q_\alpha^{(2)}(y)\right) + \lim_{\|y\|\to\infty} q_\alpha^{(1)}(y) \le \lim_{\|y\|\to\infty}\left(q_\alpha^{(0)}(y) + q_\alpha^{(2)}(y)\right) = -\infty.$$
Thus, since $q_\alpha(y)$ is a continuous function that satisfies (90), it is bounded from above. Let $M_{q,\alpha} = \sup_{y \in \mathbb{R}^n} q_\alpha(y)$ and
$$K_\alpha = e^{\kappa_{\mathbf{0}}\rho_{\mathbf{0}}(\gamma a - C - h(N))}\, e^{M_{q,\alpha}}.$$
Then, for all $y \in \mathbb{R}^n$,
$$p(y;F^*) \le K_\alpha\, e^{-\alpha\|y\|^2}.$$
Recall that, with $\sigma_1^2 > 0$ the smallest eigenvalue of $\Sigma$ and $\beta = \sigma_1^2$, the characteristic function of $N$ satisfies (75). Let $\alpha = 1/\sigma_1^2$ and choose $K_\alpha$ according to (91). Then, $p(y;F^*)$ satisfies (92), yet $\alpha\beta = 1 > 0.5$. Hence, the bound on the characteristic function of $N$ given by (77) and the bound on the density of $Y^*$ given in (92) contradict Lemma 2. Therefore, the coefficient matching Equation (78) cannot hold for all $x \in \mathbb{R}^n$ and we conclude that, for $k > 1$, $s(\cdot\,;F^*)$ cannot be identically 0 on $\mathbb{R}^n$.
We summarize the results of the two cases, k = 1 and k > 1 , in a theorem.
Theorem 1. 
Suppose that either
1. 
$\mathcal{A} \subset \mathbb{R}^n$ is compact, or
2. 
$\mathcal{A} = \mathbb{R}^n$, with $k \neq 1$.
Then, either $\mathrm{supp}(F^*) = \{x_0\}$ for some $x_0 \in \mathbb{R}^n$ or
$$Z(s;F^*) \subsetneq \mathbb{R}^n.$$
An immediate consequence of Theorem 1 is that $\mathrm{supp}(F^*)$ is a strict subset of $\mathbb{R}^n$. Recall from Section 3.1 that $s(\cdot\,;F^*)$ has an analytic extension to $\mathbb{C}^n$. Therefore, Theorem 1 shows that $\mathrm{supp}(F^*)$ is “sparse” in the sense used by [12]—that is, there exists a non-zero function with an analytic extension to $\mathbb{C}^n$ that is zero on $\mathrm{supp}(F^*)$. However, the primary importance of Theorem 1 is as an intermediary result that is used in Section 3.4 to obtain a better understanding of the structure of $\mathrm{supp}(F^*)$.

3.4. Main Results

In this section, we use geometry to show that $\mathrm{supp}(F^*)$ is contained in a countable disjoint union of submanifolds of dimensions ranging over $\{0,\ldots,n-1\}$. Furthermore, this union is finite when $\mathcal{A}$ is compact. We then show that $\mathrm{supp}(F^*)$ has Lebesgue measure 0 and is nowhere dense in $\mathbb{R}^n$.
The discussions in this section consider subsets of a vector's components; so, for $x \in \mathbb{R}^n$ and $i \in \{1,\ldots,n\}$, we introduce the notation
$$x^{(i)} \triangleq (x_1,\ldots,x_i) \in \mathbb{R}^i.$$
Recall from (37) in Section 3.1 that
$$\mathrm{supp}(F^*) \subseteq Z(s;F^*) \cap \mathcal{A}.$$
Since, by Lemma A8, $s(\cdot\,;F^*)$ has an analytic extension to $\mathbb{C}^n$, it is real-analytic, which motivates us to study the geometry of zero sets of real-analytic functions. We start by restating Theorem 6.3.3 of [24] to the level that is needed in this paper.
Theorem 2 
(Structure Theorem). Let $\psi(\cdot): \mathbb{R}^n \to \mathbb{R}$ be a real-analytic function, where $\psi(0,\ldots,0,x_n)$ is not identically 0 in $x_n$. After a rotation of the coordinates $x_1,\ldots,x_{n-1}$, there exist constants $\delta_m$, $m \in \{1,\ldots,n\}$, such that, with
$$Q \triangleq \{x \in \mathbb{R}^n \mid |x_m| < \delta_m\ \forall\, m \in \{1,\ldots,n\}\},$$
we have
$$\{x \in Q \mid \psi(x) = 0\} = \bigcup_{i=0}^{n-1} V_i,$$
where $V_0$ is either empty or contains only the origin and $V_i$, $i \in \{1,\ldots,n-1\}$, is a finite disjoint union of $i$-dimensional submanifolds—that is, for each $i \in \{1,\ldots,n-1\}$, there exists $n_i$ for which
$$V_i = \bigcup_{j=0}^{n_i} \Gamma_{ij},$$
where each $\Gamma_{ij}$ is an $i$-dimensional submanifold. Furthermore, letting
$$Q_i \triangleq \{x^{(i)} \in \mathbb{R}^i \mid |x_m| < \delta_m\ \forall\, m \in \{1,\ldots,i\}\},$$
there exist an open set $\Omega_{ij} \subseteq Q_i$ and real-analytic functions $\alpha_{ij,m}(\cdot)$, $m \in \{i+1,\ldots,n\}$, on $\Omega_{ij}$ for which
$$\Gamma_{ij} = \left\{\left(x^{(i)}, \alpha_{ij,i+1}(x^{(i)}), \ldots, \alpha_{ij,n}(x^{(i)})\right) \in \mathbb{R}^n \,\middle|\, x^{(i)} \in \Omega_{ij}\right\}.$$
We apply Theorem 2 to characterize the zero set of s ( · ; F * ) in the form of (97) and obtain the following result.
Theorem 3. 
Suppose that either
1. 
$\mathcal{A} \subset \mathbb{R}^n$ is compact, or
2. 
$\mathcal{A} = \mathbb{R}^n$, with $k \neq 1$.
Then,
$$\mathrm{supp}(F^*) \subseteq Z(s;F^*) \cap \mathcal{A} = \mathcal{A} \cap \left(\bigcup_{i=0}^{n-1} T_i\right),$$
where $T_0$ is a countable union of isolated points and $T_i$, $i \in \{1,\ldots,n-1\}$, is a countable disjoint union of $i$-dimensional submanifolds. Furthermore, if $\mathcal{A}$ is compact, these unions are finite.
Proof. 
First, note that, by Theorem 1, either $\mathrm{supp}(F^*) = \{x_0\}$ for some $x_0 \in \mathbb{R}^n$ or $s(\cdot\,;F^*)$ is not identically 0 on $\mathbb{R}^n$. In the former case, the result is trivially true; so, assume that $s(\cdot\,;F^*)$ is not identically 0 on $\mathbb{R}^n$. Therefore, for any $q \in \mathbb{Q}^n$, we can translate $s(\cdot\,;F^*)$ by $q$ and rotate its coordinate system to apply Theorem 2—that is, there exists a sufficiently small open set $Q_q$ around $q$ such that
$$Z(s;F^*) \cap Q_q = \bigcup_{i=0}^{n-1} V_i^q,$$
where the $V_i^q$ are as in Theorem 2.
Since $\mathbb{Q}^n$ is dense in $\mathbb{R}^n$,
$$\mathcal{A} \subseteq \bigcup_{q \in \mathbb{Q}^n} Q_q.$$
Furthermore, if $\mathcal{A}$ is compact, the open cover $\{Q_q\}_{q \in \mathbb{Q}^n}$ has a finite subcover $\{Q_{q_j}\}_{j=1}^m$—that is,
$$\mathcal{A} \subseteq \bigcup_{j=1}^m Q_{q_j}.$$
Defining the index set
$$M \triangleq \begin{cases} \mathbb{Q}^n, & \mathcal{A} = \mathbb{R}^n, \\ \{q_j\}_{j=1}^m, & \mathcal{A} \subset \mathbb{R}^n \text{ is compact}, \end{cases}$$
we obtain
$$Z(s;F^*) \cap \mathcal{A} = \mathcal{A} \cap \left(\bigcup_{q \in M}\left(Z(s;F^*) \cap Q_q\right)\right) = \mathcal{A} \cap \left(\bigcup_{q \in M}\bigcup_{i=0}^{n-1} V_i^q\right) = \mathcal{A} \cap \left(\bigcup_{i=0}^{n-1}\bigcup_{q \in M} V_i^q\right).$$
Since, for each $q \in M$, $V_0^q$ is either empty or a single point,
$$T_0 \triangleq \bigcup_{q \in M} V_0^q$$
is a countable set of points and is finite when $\mathcal{A}$ is compact. Furthermore, each $V_i^q$, where $i \in \{1,\ldots,n-1\}$, is itself a finite union of $i$-dimensional submanifolds. Hence,
$$T_i \triangleq \bigcup_{q \in M} V_i^q$$
is a countable union of $i$-dimensional submanifolds. When $\mathcal{A}$ is compact, this union is also finite. □
Note that Theorem 3 agrees with the results of [5] when the cases overlap. Indeed, consider the case in which $\mathcal{A}$ is a ball centered at the origin, $k = 1$, and the noise covariance matrix is $\Sigma = t I_n$, where $I_n$ is the $n \times n$ identity matrix and $t > 0$. Then, [5] shows that the capacity-achieving distribution is supported on a finite number of concentric $(n-1)$-spheres. Each $(n-1)$-sphere is an $(n-1)$-dimensional submanifold.
In the next two theorems, we show that $\mathrm{supp}(F^*)$ has Lebesgue measure 0 and is nowhere dense in $\mathbb{R}^n$.
Theorem 4. 
Suppose that either
1. 
$\mathcal{A} \subset \mathbb{R}^n$ is compact, or
2. 
$\mathcal{A} = \mathbb{R}^n$, with $k \neq 1$.
Let $\mu(\cdot)$ denote the $n$-dimensional Lebesgue measure. Then,
$$\mu(\mathrm{supp}(F^*)) = 0.$$
Proof. 
By Theorem 3, we have
$$\mathrm{supp}(F^*) \subseteq \mathcal{A} \cap \left(\bigcup_{i=0}^{n-1} T_i\right) \subseteq \bigcup_{i=0}^{n-1}\bigcup_{q \in M} V_i^q,$$
where $M$ is countable. Note that, for each $q \in M$, $V_0^q$ is either empty or a single point; so, $\mu(V_0^q) = 0$. Furthermore, for each $q \in M$ and $i \in \{1,\ldots,n-1\}$, $V_i^q$ is a finite disjoint union of $n_i^q$ $i$-dimensional submanifolds, and for $i \le n-1$, each submanifold has Lebesgue measure 0. Therefore, $\mu(\mathrm{supp}(F^*)) = 0$. □
We will now define the notion of a subset being nowhere dense in its superset and show that $\mathrm{supp}(F^*)$ is nowhere dense in $\mathbb{R}^n$.
Definition 3. 
Let $B \subseteq \mathbb{R}^n$. A set $A \subseteq B$ is said to be dense in $B$ if, for every $b \in B$, there exists a sequence $\{a_i\}_{i=0}^{\infty} \subseteq A$ such that
$$\lim_{i\to\infty} a_i = b.$$
Definition 4. 
Let $B \subseteq \mathbb{R}^n$. A set $A \subseteq B$ is called nowhere dense in $B$ if, for every open set $U \subseteq B$, $A \cap U$ is not dense in $U$.
Theorem 5. 
Suppose that either
1. 
$\mathcal{A} \subset \mathbb{R}^n$ is compact, or
2. 
$\mathcal{A} = \mathbb{R}^n$, with $k \neq 1$.
Then, $\mathrm{supp}(F^*)$ is nowhere dense in $\mathbb{R}^n$.
Proof. 
By Theorem 1, either $\mathrm{supp}(F^*) = \{x_0\}$ for some $x_0 \in \mathbb{R}^n$ or
$$Z(s;F^*) \subsetneq \mathbb{R}^n.$$
Since, in the former case, $\mathrm{supp}(F^*)$ is nowhere dense in $\mathbb{R}^n$, assume the latter and let (115) hold. Let $U \subseteq \mathbb{R}^n$ be a non-empty open set; we will show the result by proving that $Z(s;F^*) \cap U$ is not dense in $U$.
Fix $x \in U$. Translating $s(\cdot\,;F^*)$ by $x$, rotating the coordinate system, and applying Theorem 2 shows that there exists a sufficiently small open set $Q$ containing $x$ on which
$$Z(s;F^*) \cap Q = \bigcup_{i=0}^{n-1} V_i = V_0 \cup \left(\bigcup_{i=1}^{n-1}\bigcup_{j=0}^{n_i} \Gamma_{ij}\right).$$
It suffices to show the existence of a point of the form $(x^{(n-1)}, u_n) \in U \cap Q$ that is not the limit of any sequence in $Z(s;F^*) \cap U \cap Q$. Let $(y_m^{(n-1)}, y_{n,m})$ be a convergent sequence in $Z(s;F^*) \cap U \cap Q$, indexed by $m$, for which
$$\lim_{m\to\infty} y_m^{(n-1)} = x^{(n-1)}.$$
Using the parameterization from (100), the $n$th component of the sequence at index $m$ satisfies one of the following:
  • $y_{n,m} \in \{v_n \mid v \in V_0\}$, or
  • for some $i_m \in \{1,\ldots,n-1\}$ and $j_m \in \{1,\ldots,n_{i_m}\}$,
$$y_{n,m} = \alpha_{i_m j_m, n}\!\left(y_m^{(i_m)}\right).$$
    Since each $\alpha_{i_m j_m, n}(\cdot)$ is real-analytic, it is continuous. Then, for $y_{n,m}$ satisfying (119), if $\lim_{m\to\infty} y_{n,m}$ exists, we have $\lim_{m\to\infty} y_{n,m} = \alpha_{ij,n}(x^{(i)})$ for some $i \in \{1,\ldots,n-1\}$ and $j \in \{1,\ldots,n_i\}$. Since $V_0$ is either empty or a single point, the number of possible values for $\lim_{m\to\infty} y_{n,m}$ is at most
$$|V_0| + \sum_{i=1}^{n-1} n_i < \infty.$$
However, since $x \in U \cap Q$, where $U \cap Q$ is open, the set $\{t \in \mathbb{R} \mid (x^{(n-1)}, x_n + t) \in U \cap Q\}$ is uncountable. Thus, there exists $t$ such that $(x^{(n-1)}, x_n + t) \in U \cap Q$ is not the limit of any sequence in $Z(s;F^*) \cap U \cap Q$. □

4. Discussion

This paper has considered vector-valued channels with additive Gaussian noise. Unlike much of the prior work in this area, the noise was not limited to having independent and identically distributed components. The support of the capacity-achieving input distribution was discussed when inputs were subjected to an even-moment radial constraint of order $2k$. Furthermore, the inputs were either allowed to take any value in $\mathbb{R}^n$ or restricted to a compact set. When the input alphabet was the entire space, $\mathbb{R}^n$, only the case $k \ge 2$ was considered since, for $k = 1$, the optimal input distribution is well known to be Gaussian.
The problem was framed as a convex optimization problem that was shown to be solved by a unique input distribution $F^*$. The conditions for optimality yielded a real-analytic function $s(\cdot\,;F^*)$ whose zero set contains $\mathrm{supp}(F^*)$, the support of $F^*$. Using the framework of an $L^2$ space weighted by the noise density, $s(\cdot\,;F^*)$ was simplified and shown to be non-constant on $\mathbb{R}^n$. Through geometric analysis of the zero set of $s(\cdot\,;F^*)$, $\mathrm{supp}(F^*)$ was shown to be contained in a countable union of single points and submanifolds of dimensions ranging over $\{1,\ldots,n-1\}$. When the input alphabet was compact, this union was further shown to be finite. Finally, it was determined that $\mathrm{supp}(F^*)$ has Lebesgue measure 0 and is nowhere dense in $\mathbb{R}^n$.
This paper is an expansion of the work concerning even-moment input constraints in [14] to vector-valued channels that are not necessarily spherically symmetric. Viewed as a generalization of [12], it considers order-$2k$ rather than second-moment radial constraints and includes $\mathbb{R}^n$ as a possible input alphabet. Unlike prior work, it also provides geometric results on the supports of capacity-achieving inputs to spherically asymmetric channels.

Author Contributions

Conceptualization, J.E., R.R.M. and P.M.; methodology, J.E., R.R.M. and P.M.; validation, J.E., R.R.M. and P.M.; formal analysis, J.E., R.R.M. and P.M.; investigation, J.E.; resources, J.E., R.R.M. and P.M.; writing—original draft preparation, J.E.; writing—review and editing, J.E., R.R.M. and P.M.; supervision, R.R.M. and P.M.; project administration, R.R.M. and P.M.; funding acquisition, J.E., R.R.M. and P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) through a CGS-M Grant (J.E.) and Discovery Grants (R.R.M. and P.M.).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Convexity and Compactness of Optimization Space

Theorem A1. 
The following properties hold for the sets defined in Section 3.1:
  • $\mathcal{F}_n(\mathcal{A})$ is convex;
  • for any $b > 0$, $\mathcal{P}_n(\mathcal{A},k,b)$ is convex and compact;
  • $\mathcal{Q}_n(\mathcal{A},k,a)$ is convex.
Proof. 
We first show the convexity of $\mathcal{F}_n(\mathcal{A})$. Let $F_1, F_2 \in \mathcal{F}_n(\mathcal{A})$, $t \in [0,1]$, and $F_t = tF_1 + (1-t)F_2$. Then, since
$$\mathrm{supp}(F_t) \subseteq \mathrm{supp}(F_1) \cup \mathrm{supp}(F_2) \subseteq \mathcal{A},$$
$\mathcal{F}_n(\mathcal{A})$ is convex.
To show the convexity of $\mathcal{P}_n(\mathcal{A},k,b)$ for $b > 0$, let $F_1, F_2 \in \mathcal{P}_n(\mathcal{A},k,b)$, $t \in [0,1]$, and $F_t = tF_1 + (1-t)F_2$. Since $\mathcal{P}_n(\mathcal{A},k,b) \subseteq \mathcal{F}_n(\mathcal{A})$, it suffices to show that $F_t$ satisfies the radial even-moment constraint:
$$\int_{\mathbb{R}^n} \|x\|^{2k}\, dF_t(x) = t\int_{\mathbb{R}^n} \|x\|^{2k}\, dF_1(x) + (1-t)\int_{\mathbb{R}^n} \|x\|^{2k}\, dF_2(x) \le tb + (1-t)b = b.$$
We now show the convexity of $\mathcal{Q}_n(\mathcal{A},k,a)$. For any $F_1, F_2 \in \mathcal{Q}_n(\mathcal{A},k,a)$, there exists $b_{F_1,F_2} > 0$ for which $F_1, F_2 \in \mathcal{P}_n(\mathcal{A},k,b_{F_1,F_2})$. Since $\mathcal{P}_n(\mathcal{A},k,b_{F_1,F_2})$ is convex, $tF_1 + (1-t)F_2 \in \mathcal{P}_n(\mathcal{A},k,b_{F_1,F_2}) \subseteq \mathcal{Q}_n(\mathcal{A},k,a)$. Hence, $\mathcal{Q}_n(\mathcal{A},k,a)$ is convex.
It remains to show the compactness of $\mathcal{P}_n(\mathcal{A},k,b)$ for any $b > 0$. Note that the Lévy–Prokhorov metric metrizes weak convergence in $\mathcal{F}(\mathbb{R}^n)$ [25]; so, sequential compactness is equivalent to compactness. To prove the compactness of $\mathcal{P}_n(\mathcal{A},k,b)$, we first show relative compactness, which allows us to conclude that any sequence in $\mathcal{P}_n(\mathcal{A},k,b)$ has a subsequence that converges to some $F \in \mathcal{F}_n(\mathcal{A})$. Further, showing that $F \in \mathcal{P}_n(\mathcal{A},k,b)$ will complete the proof.
Observe that each $F \in \mathcal{F}(\mathbb{R}^n)$ is defined on the complete separable metric space $\mathbb{R}^n$ equipped with the Euclidean distance. By Prokhorov's Theorem (Theorem 3.2.1 of [25]), the relative compactness of $\mathcal{P}_n(\mathcal{A},k,b)$ is equivalent to the tightness of $\mathcal{P}_n(\mathcal{A},k,b)$; so, we will prove the latter.
To show the tightness of $\mathcal{P}_n(\mathcal{A},k,b)$, let $X \sim F \in \mathcal{P}_n(\mathcal{A},k,b)$, $\epsilon > 0$, and $D = (b/\epsilon)^{1/(2k)}$. Then, applying Markov's inequality,
$$P\{X \in \mathbb{R}^n \setminus \overline{\mathcal{B}_D}(0)\} = P\{\|X\| > D\} \le P\{\|X\|^{2k} \ge D^{2k}\} \le \frac{\mathbb{E}[\|X\|^{2k}]}{D^{2k}} \le \frac{b}{D^{2k}} = \epsilon.$$
This is a uniform upper bound over $F \in \mathcal{P}_n(\mathcal{A},k,b)$; so, $\mathcal{P}_n(\mathcal{A},k,b)$ is tight and, thus, relatively compact.
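As a quick sanity check of this uniform tail bound (the distribution and parameters below are illustrative and merely rescaled to meet the moment constraint):

```python
# Monte Carlo check of the Markov-type bound P{||X|| > D} <= b / D^{2k} = eps
# with D = (b / eps)^{1/(2k)} (illustrative distribution and parameters).
import numpy as np

rng = np.random.default_rng(2)
n, k, b, eps = 3, 2, 4.0, 0.05
X = rng.normal(size=(200000, n))
X *= (b / np.mean(np.sum(X ** 2, axis=1) ** k)) ** (1 / (2 * k))  # so E||X||^{2k} ~ b
D = (b / eps) ** (1 / (2 * k))
tail = np.mean(np.sum(X ** 2, axis=1) ** k > D ** (2 * k))
print(f"P(||X|| > D) ~ {tail:.4f} <= eps = {eps}")
```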
By the relative compactness of $\mathcal{P}_n(\mathcal{A},k,b)$, any sequence $\{F_m\}_{m=0}^{\infty} \subseteq \mathcal{P}_n(\mathcal{A},k,b)$ has a subsequence $\{F_{m_j}\}_{j=0}^{\infty}$ that converges weakly to some $F \in \mathcal{F}(\mathbb{R}^n)$. To show compactness, we must show that $F \in \mathcal{P}_n(\mathcal{A},k,b)$.
Since each $F_{m_j} \in \mathcal{P}_n(\mathcal{A},k,b)$, it follows that
$$\int_{\mathcal{A}} \|x\|^{2k}\, dF_{m_j}(x) \le b.$$
By Theorem A.3.12 of [26], since $\|x\|^{2k}$ is non-negative and lower semicontinuous,
$$0 \le \int_{\mathcal{A}} \|x\|^{2k}\, dF(x) \le \liminf_{j\to\infty} \int_{\mathcal{A}} \|x\|^{2k}\, dF_{m_j}(x) \le b.$$
Therefore, the limiting distribution $F$ satisfies the radial even-moment constraint imposed by $\mathcal{P}_n(\mathcal{A},k,b)$.
When $\mathcal{A} = \mathbb{R}^n$, we conclude that $F \in \mathcal{P}_n(\mathcal{A},k,b)$. However, for the case that $\mathcal{A}$ is compact, we must also show that $X \in \mathcal{A}$ almost surely. For any index $m_j$ of the subsequence,
$$\int_{\mathcal{A}} dF_{m_j}(x) = 1.$$
By the Portmanteau Theorem [27], since $\mathcal{A}$ is closed,
$$\int_{\mathcal{A}} dF(x) \ge \limsup_{j\to\infty} \int_{\mathcal{A}} dF_{m_j}(x) = 1. \qquad \square$$

Appendix B. Necessary Conditions for the Capacity-Achieving Distribution

Theorem A2. 
Recall that
$$\mathcal{Q}_n(\mathcal{A},k,a) = \bigcup_{b \ge a} \mathcal{P}_n(\mathcal{A},k,b).$$
Suppose that $F^*$ solves the optimization problem in (26), where $\gamma \ge 0$ is the Lagrange multiplier corresponding to the problem in (11). Then, the following are equivalent:
P.1 
For every $F \in \mathcal{Q}_n(\mathcal{A},k,a)$,
$$\int_{\mathcal{A}} h(x;F^*)\, dF(x) \le \gamma\left(\int_{\mathcal{A}} \|x\|^{2k}\, dF(x) - a\right) + C + h(N).$$
P.2 
For all $x \in \mathcal{A}$,
$$h(x;F^*) \le \gamma(\|x\|^{2k} - a) + C + h(N),$$
and if $x \in \mathrm{supp}(F^*)$, then
$$h(x;F^*) = \gamma(\|x\|^{2k} - a) + C + h(N).$$
Proof. 
For any $F \in \mathcal{Q}_n(\mathcal{A},k,a)$, integrating both sides of (A15) with respect to $dF(\cdot)$ yields that (P.2) implies (P.1).
It remains to be shown that (P.1) implies (P.2). Suppose this implication is false—that is, (P.1) holds but either there exists $v \in \mathcal{A}$ for which
$$h(v;F^*) > \gamma(\|v\|^{2k} - a) + C + h(N),$$
or there exists $w \in \mathrm{supp}(F^*)$ such that
$$h(w;F^*) \neq \gamma(\|w\|^{2k} - a) + C + h(N).$$
If (A17) holds, let $b = \max\{\|v\|^{2k}, a\}$ and let $F(x) = \prod_{i=1}^n u_{-1}(x_i - v_i)$, where $u_{-1}(\cdot)$ is the Heaviside step function. Then, $F \in \mathcal{P}_n(\mathcal{A},k,b) \subseteq \mathcal{Q}_n(\mathcal{A},k,a)$ and
$$\int_{\mathcal{A}} h(x;F^*)\, dF(x) - \gamma\left(\int_{\mathcal{A}} \|x\|^{2k}\, dF(x) - a\right) = h(v;F^*) - \gamma(\|v\|^{2k} - a) > C + h(N),$$
contradicting (A14). Therefore, (A17) cannot be satisfied for any $x \in \mathcal{A}$ and we are left with the alternative that there exists $w \in \mathrm{supp}(F^*) \subseteq \mathcal{A}$ for which (A18) holds—that is,
$$h(w;F^*) < \gamma(\|w\|^{2k} - a) + C + h(N).$$
By Lemma A7, the extension of $h(\cdot\,;F)$ to $\mathbb{C}^n$ is continuous; hence, $h(\cdot\,;F)$ is continuous on $\mathbb{R}^n$. Since $\|\cdot\|^{2k}$ is continuous as well, there exists $\delta > 0$ such that
$$h(x;F^*) < \gamma(\|x\|^{2k} - a) + C + h(N)$$
for every $x \in \mathcal{B}_\delta(w)$. Furthermore, since $w \in \mathrm{supp}(F^*)$, there exists $\epsilon$ such that $P\{X^* \in \mathcal{B}_\delta(w)\} = \epsilon > 0$.
Recall from (18) that
$$h_Y(F^*) = \int_{\mathcal{A}} h(x;F^*)\, dF^*(x),$$
and since $\gamma g_k(F^*) = 0$ (see (27)),
$$\gamma a = \gamma\int_{\mathcal{A}} \|x\|^{2k}\, dF^*(x).$$
Substituting (A23) and (A24), and noting that $F^*(\mathcal{B}_\delta(w) \cap \mathcal{A}^c) = 0$, yields
$$C + h(N) - \gamma a = h_Y(F^*) - \gamma a = \int_{\mathcal{A}} \left[h(x;F^*) - \gamma\|x\|^{2k}\right] dF^*(x) = \int_{\mathcal{A} \cup \mathcal{B}_\delta(w)} \left[h(x;F^*) - \gamma\|x\|^{2k}\right] dF^*(x) = \int_{\mathcal{B}_\delta(w)} \left[h(x;F^*) - \gamma\|x\|^{2k}\right] dF^*(x) + \int_{\mathcal{A} \setminus \mathcal{B}_\delta(w)} \left[h(x;F^*) - \gamma\|x\|^{2k}\right] dF^*(x) < \epsilon\left[C + h(N) - \gamma a\right] + (1-\epsilon)\left[C + h(N) - \gamma a\right] = C + h(N) - \gamma a,$$
where the first term of (A29) is due to (A22) and the second is due to the contradiction derived from (A17). The above is a contradiction, which completes the proof. □

Appendix C. Integrability Results

Lemma A1. 
Let $b > 0$ and $F \in \mathcal{P}_n(\mathbb{R}^n,k,b)$. Then, there exist positive constants $\eta$ and $\kappa$ for which
$$|\ln p(y;F)| \le \eta\|y\|^2 + \kappa$$
for any $y \in \mathbb{R}^n$.
Proof. 
With $\eta = 2\sigma_1^{-2}$, $D = (2b)^{\frac{1}{2k}}$, and
$$M = \frac{1}{2\sqrt{(2\pi)^n|\Sigma|}},$$
let
$$\zeta_y = M e^{-\eta\max\{\|y\|^2,\, D^2\}}.$$
Then, for any $y \in \mathbb{R}^n$, $0 < \zeta_y \le p(y;F) \le 2M$. Since $|\ln(\cdot)|$ is continuous and has no local maxima,
$$|\ln p(y;F)| \le \max\{|\ln\zeta_y|,\, |\ln(2M)|\} \le \eta\max\{\|y\|^2,\, D^2\} + |\ln M| + |\ln(2M)| \le \eta\|y\|^2 + \eta D^2 + |\ln M| + |\ln(2M)|. \qquad \square$$
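The two-sided bound $\zeta_y \le p(y;F) \le 2M$ underlying this argument can be checked numerically for a point-mass input (so that $p(y;F) = p_N(y - x_0)$); the diagonal covariance, moment bound, and mass point below are illustrative choices.

```python
# Check of zeta_y <= p(y; F) <= 2M from the proof of Lemma A1, for F = delta_{x0}
# (illustrative diagonal covariance, k, b, and x0 satisfying ||x0||^{2k} <= b).
import numpy as np

sigma_sq = np.array([0.5, 1.0, 2.0])            # diagonal entries of Sigma (ordered)
n, k, b = len(sigma_sq), 2, 4.0
x0 = np.array([0.8, -0.5, 0.3])
assert np.sum(x0 ** 2) ** k <= b                # x0 meets the radial moment constraint

eta = 2.0 / sigma_sq[0]                         # eta = 2 / sigma_1^2
D = (2 * b) ** (1 / (2 * k))
M = 0.5 / np.sqrt((2 * np.pi) ** n * np.prod(sigma_sq))

rng = np.random.default_rng(3)
Y = rng.normal(size=(100000, n)) * 4.0          # test points
pF = 2 * M * np.exp(-0.5 * np.sum((Y - x0) ** 2 / sigma_sq, axis=1))   # p_N(y - x0)
zeta = M * np.exp(-eta * np.maximum(np.sum(Y ** 2, axis=1), D ** 2))
print(bool(np.all(zeta <= pF)), bool(np.all(pF <= 2 * M)))             # True True
```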
Lemma A2. 
For any $F \in \mathcal{P}_n(\mathbb{R}^n,k,a)$, $\ln p(\cdot\,;F) \in L^2_{p_N}(\mathbb{R}^n)$.
Proof. 
The result follows by Lemma A1. □
Lemma A3. 
Let $\beta \ge \alpha \ge 1$ and let $Z$ be a random variable taking values in $\mathbb{R}^n$. If $\mathbb{E}[\|Z\|^\beta] = \rho$, then $\mathbb{E}[\|Z\|^\alpha] \le \rho + 1$.
Proof. 
Since $\beta \ge \alpha \ge 1$, we have
$$\mathbb{E}[\|Z\|^\alpha] \le \mathbb{E}\!\left[\max\{1,\|Z\|\}^\alpha\right] \le \mathbb{E}\!\left[\max\{1,\|Z\|\}^\beta\right] = \mathbb{E}\!\left[\max\{1,\|Z\|^\beta\}\right] \le \rho + 1. \qquad \square$$
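A quick Monte Carlo check of Lemma A3 (with an illustrative heavy-tailed distribution whose $\beta$-th radial moment exists):

```python
# Check of Lemma A3: if E[||Z||^beta] = rho and beta >= alpha >= 1,
# then E[||Z||^alpha] <= rho + 1 (illustrative multivariate t samples).
import numpy as np

rng = np.random.default_rng(4)
alpha, beta = 1.5, 4.0
Z = rng.standard_t(df=9, size=(500000, 3))       # any R^3-valued random variable
r = np.linalg.norm(Z, axis=1)
rho = np.mean(r ** beta)
print(np.mean(r ** alpha), "<=", rho + 1)        # the inequality holds with slack
```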
Lemma A4. 
For any $b > 0$ and $F_0, F_1 \in \mathcal{P}_n(\mathbb{R}^n,k,b)$,
$$\int_{\mathbb{R}^n} |p(y;F_0)\ln p(y;F_1)|\, dy < \infty.$$
Proof. 
Note that
$$\int_{\mathbb{R}^n} |p(y;F_0)\ln p(y;F_1)|\, dy = \int_{\mathbb{R}^n} p(y;F_0)\, |\ln p(y;F_1)|\, dy = \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_N(y-x)\, |\ln p(y;F_1)|\, dF_0(x)\, dy.$$
We proceed by first proving that
$$\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_N(y-x)\, |\ln p(y;F_1)|\, dy\, dF_0(x) < \infty.$$
Then, since the integrand is non-negative, the Tonelli–Fubini Theorem [18] justifies interchanging the order of integration in (A44) to conclude that (A43) and, hence, the left side of (A42) are finite.
By Lemma A1, for any $y \in \mathbb{R}^n$,
$$|\ln p(y;F_1)| \le \eta\|y\|^2 + \kappa_b.$$
Initially considering only the inner integral in (A44), the substitution $u = y - x$ yields
$$\int_{\mathbb{R}^n} p_N(y-x)\, |\ln p(y;F_1)|\, dy \le \eta\int_{\mathbb{R}^n} p_N(y-x)\, \|y\|^2\, dy + \kappa_b = \eta\int_{\mathbb{R}^n} p_N(u)\, \|u+x\|^2\, du + \kappa_b \le \eta\int_{\mathbb{R}^n} p_N(u)\, (\|u\|+\|x\|)^2\, du + \kappa_b \le \eta\int_{\mathbb{R}^n} p_N(u)\, \left(4\|u\|^2 + 4\|x\|^2\right) du + \kappa_b \le 8\eta\,\mathbb{E}[\|N\|^2] + 8\eta\|x\|^2 + \kappa_b,$$
where (A48) is due to the triangle inequality. Since $\mathbb{E}[\|N\|^2] = \mathrm{tr}(\Sigma)$ is finite, there are positive constants
$$\kappa_0 = 8\eta\,\mathbb{E}[\|N\|^2] + \kappa_b \quad \text{and} \quad \kappa_2 = 8\eta,$$
for which
$$\int_{\mathbb{R}^n} p_N(y-x)\, |\ln p(y;F_1)|\, dy \le \kappa_2\|x\|^2 + \kappa_0.$$
Furthermore, since $k \ge 1$ and $\mathbb{E}_{F_0}[\|X\|^{2k}] \le b$, by Lemma A3, $\mathbb{E}_{F_0}[\|X\|^2] \le b + 1$. Substituting this and (A53) into (A44), we obtain
$$\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} p_N(y-x)\, |\ln p(y;F_1)|\, dy\, dF_0(x) \le \int_{\mathbb{R}^n} \left(\kappa_2\|x\|^2 + \kappa_0\right) dF_0(x) = \kappa_2\,\mathbb{E}_{F_0}[\|X\|^2] + \kappa_0 \le (b+1)\kappa_2 + \kappa_0 < \infty. \qquad \square$$

Appendix D. Properties of the Objective Functional

The aim of this section is to discuss the weak continuity, strict concavity, and weak differentiability of the objective functional,
$$J_\gamma(F) = I(F) - \gamma g_k(F),$$
for the optimization problem posed in (26). These properties are instrumental in the establishment and subsequent analysis of the convex optimization problem considered in Section 3. To support the proof of Theorem A2, we show that, for arbitrary $b > 0$, the required properties hold on $\mathcal{P}_n(\mathcal{A},k,b)$.
Theorem A3. 
$I(\cdot)$ is weakly continuous on $\mathcal{P}_n(\mathcal{A},k,a)$.
Proof. 
For any $F \in \mathcal{P}_n(\mathcal{A},k,a)$, we write
$$I(F) = h_Y(F) - h(N) = h_Y(F) - \tfrac{1}{2}\ln|2\pi e\Sigma|,$$
where $h_Y(F)$ is finite by Lemma A4. Therefore, weak continuity of $I(\cdot)$ on $\mathcal{P}_n(\mathcal{A},k,a)$ is equivalent to weak continuity of $h_Y(\cdot)$ on $\mathcal{P}_n(\mathcal{A},k,a)$.
Let $\{F_m\}_{m=0}^{\infty}$ be a sequence in $\mathcal{P}_n(\mathcal{A},k,a)$ converging weakly to $F \in \mathcal{P}_n(\mathcal{A},k,a)$. By the Helly–Bray Theorem, since $p_N(\cdot)$ is bounded and continuous,
$$\lim_{m\to\infty} p(y;F_m) = \lim_{m\to\infty} \int_{\mathbb{R}^n} p_N(y-x)\, dF_m(x) = \int_{\mathbb{R}^n} p_N(y-x)\, dF(x) = p(y;F),$$
for any $y \in \mathbb{R}^n$. Therefore, by Scheffé's Lemma, the sequence $\{p(\cdot\,;F_m)\}_{m=0}^{\infty}$ converges in total variation to $p(\cdot\,;F)$. It suffices to show that the differential entropy is uniformly continuous over $\{p(\cdot\,;\hat{F}) \mid \hat{F} \in \mathcal{P}_n(\mathcal{A},k,a)\}$ with respect to the total variation metric.
The family of densities $\{p(\cdot\,;\hat{F}) \mid \hat{F} \in \mathcal{P}_n(\mathcal{A},k,a)\}$ is uniformly upper bounded. Furthermore, the corresponding random vectors, $Y = X + N$ for some $X \sim \hat{F} \in \mathcal{P}_n(\mathcal{A},k,a)$, uniformly satisfy the bound
$$\mathbb{E}[\|Y\|] \le a + 1 + \mathbb{E}[\|N\|]$$
given by Lemma A3. The result follows by Theorem 1 of [28]. □
Theorem A4. 
For any $b > 0$, $I(\cdot)$ is strictly concave on $\mathcal{P}_n(\mathcal{A},k,b)$.
Proof. 
See Appendix E of [14]. □
Theorem A5. 
$g_k(\cdot)$ is convex on $\mathcal{Q}_n(\mathcal{A},k,a)$.
Proof. 
Let $t \in [0,1]$ and $F_0, F_1 \in \mathcal{Q}_n(\mathcal{A},k,a)$. Then, $g_k(F_0)$ and $g_k(F_1)$ are finite and
$$g_k(tF_0 + (1-t)F_1) = \int_{\mathbb{R}^n} \|x\|^{2k}\, d\big(tF_0\big)(x) + \int_{\mathbb{R}^n} \|x\|^{2k}\, d\big((1-t)F_1\big)(x) - a = t\left(\int_{\mathbb{R}^n} \|x\|^{2k}\, dF_0(x) - a\right) + (1-t)\left(\int_{\mathbb{R}^n} \|x\|^{2k}\, dF_1(x) - a\right) = t\, g_k(F_0) + (1-t)\, g_k(F_1). \qquad \square$$
We make use of the following notion of a derivative of a function defined on a convex set Ω [14].
Definition A1. 
Define the weak derivative of $L: \Omega \to \mathbb{R}$ at $F_0$ in the direction of $F$ by
$$L'(F_0,F) \triangleq \lim_{t\downarrow 0} \frac{L((1-t)F_0 + tF) - L(F_0)}{t},$$
whenever it exists.
Lemma A5. 
$g_k(\cdot)$ is weakly differentiable on $\mathcal{Q}_n(\mathcal{A},k,a)$. For any $F_0, F \in \mathcal{Q}_n(\mathcal{A},k,a)$, the weak derivative is finite and given by
$$g'_k(F_0,F) = g_k(F) - g_k(F_0).$$
Proof. 
Let $t \in [0,1]$ and $F_0, F \in \mathcal{Q}_n(\mathcal{A},k,a)$. Then, noting that $g_k(F_0)$ and $g_k(F)$ are finite,
$$g_k(tF + (1-t)F_0) - g_k(F_0) = g_k(t(F - F_0) + F_0) - g_k(F_0) = \int_{\mathcal{A}} \|x\|^{2k}\, d\big[t(F(x) - F_0(x)) + F_0(x)\big] - a - \left(\int_{\mathcal{A}} \|x\|^{2k}\, dF_0(x) - a\right) = t\left(\int_{\mathcal{A}} \|x\|^{2k}\, dF(x) - \int_{\mathcal{A}} \|x\|^{2k}\, dF_0(x)\right) = t\left[\int_{\mathcal{A}} \|x\|^{2k}\, dF(x) - a - \left(\int_{\mathcal{A}} \|x\|^{2k}\, dF_0(x) - a\right)\right] = t\left(g_k(F) - g_k(F_0)\right).$$
Dividing by $t$ and taking the limit as $t \downarrow 0$ gives (A69). □
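Because $g_k(\cdot)$ is affine in the distribution, the difference quotient in Definition A1 is constant in $t$; the sketch below illustrates this with two discrete distributions (all points, probabilities, and parameters are illustrative).

```python
# Illustration of Lemma A5: the difference quotient of g_k equals g_k(F) - g_k(F0)
# for every t, so the weak derivative is exact (illustrative discrete F0 and F).
import numpy as np

k, a = 2, 5.0
def g_k(points, probs):                          # g_k(F) = E_F[||X||^{2k}] - a
    return float(probs @ np.sum(points ** 2, axis=1) ** k - a)

pts0, pr0 = np.array([[0.5, 0.0], [-1.0, 1.0]]), np.array([0.3, 0.7])   # F0
pts1, pr1 = np.array([[1.0, 1.0], [0.0, -2.0]]), np.array([0.5, 0.5])   # F
for t in [0.5, 0.1, 0.01]:
    mix = g_k(np.vstack([pts0, pts1]), np.concatenate([(1 - t) * pr0, t * pr1]))
    print(t, (mix - g_k(pts0, pr0)) / t, g_k(pts1, pr1) - g_k(pts0, pr0))
```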
Lemma A6. 
$I(\cdot)$ is weakly differentiable on $\mathcal{Q}_n(\mathcal{A},k,a)$. For any $F_0, F \in \mathcal{Q}_n(\mathcal{A},k,a)$, the weak derivative at $F_0$ in the direction of $F$ is given by
$$I'(F_0,F) = \int_{\mathcal{A}} h(x;F_0)\, dF(x) - h_Y(F_0).$$
Proof. 
The proof largely follows Appendix E of [14]. The step that requires special attention is the application of the Dominated Convergence Theorem in (27) of [14]—that is, we would like to show the integrability of
$$\left|\left(p(y;F) - p(y;F_0)\right)\ln\left(\tfrac{1}{2}\,p(y;F_0)\right)\right| \le |p(y;F)\ln p(y;F_0)| + |p(y;F_0)\ln p(y;F_0)| + \left(p(y;F) + p(y;F_0)\right)\ln 2.$$
Since $F_0, F \in \mathcal{Q}_n(\mathcal{A},k,a)$, there exists $b > 0$ for which $F_0, F \in \mathcal{P}_n(\mathcal{A},k,b)$. Then, (A77) follows by Lemma A4. □

Appendix E. Analyticity of Marginal Entropy Density

Lemma A7. 
For any $b > 0$ and $F \in \mathcal{P}_n(\mathcal{A},k,b)$, the extension of $h(x;F)$ to $z \in \mathbb{C}^n$ given by
$$h(z;F) = -\int_{\mathbb{R}^n} p_N(y-z)\ln p(y;F)\, dy$$
is continuous in $z$.
Proof. 
Let $z \in \mathbb{C}^n$. Fix $\epsilon > 0$ and consider $\overline{\mathcal{B}_{\mathbb{C}^n,\epsilon}}(z)$, the closed ball of radius $\epsilon$ around $z$ in $\mathbb{C}^n$. For any sequence $\{z_m\}_{m=0}^{\infty} \subseteq \mathbb{C}^n$ converging to $z$, there exists $M_0$ such that $z_m \in \overline{\mathcal{B}_{\mathbb{C}^n,\epsilon}}(z)$ for each $m \ge M_0$. Therefore, it suffices to show that $\lim_{m\to\infty} h(z_m;F) = h(z;F)$ for each sequence $\{z_m\}_{m=0}^{\infty} \subseteq \overline{\mathcal{B}_{\mathbb{C}^n,\epsilon}}(z)$.
Since the extension of $p_N(u)$ to $\mathbb{C}^n$ is continuous,
$$\lim_{m\to\infty} h(z_m;F) = -\lim_{m\to\infty} \int_{\mathbb{R}^n} p_N(y-z_m)\ln p(y;F)\, dy = -\int_{\mathbb{R}^n} \lim_{m\to\infty} p_N(y-z_m)\ln p(y;F)\, dy = -\int_{\mathbb{R}^n} p_N(y-z)\ln p(y;F)\, dy = h(z;F),$$
where (A80) is due to the Dominated Convergence Theorem, which will be justified next.
Let $y \in \mathbb{R}^n$ and $z_m = \alpha_m + i\beta_m \in \overline{\mathcal{B}_{\mathbb{C}^n,\epsilon}}(z)$. Prior to finding a dominating function for the entire integrand in (A80), we establish the following upper bound on $|p_N(y-z_m)|$:
$$|p_N(y-z_m)| = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\left|e^{-\frac{1}{2}(y-z_m)^T\Sigma^{-1}(y-z_m)}\right| = \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\, e^{-\frac{1}{2}(y-\alpha_m)^T\Sigma^{-1}(y-\alpha_m)}\, e^{\frac{1}{2}\beta_m^T\Sigma^{-1}\beta_m} \le \frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\, e^{-\frac{1}{2\sigma_n^2}\|y-\alpha_m\|^2}\, e^{\frac{1}{2\sigma_1^2}\|\beta_m\|^2} \le \frac{e^{\frac{\epsilon^2}{2\sigma_1^2}}}{\sqrt{(2\pi)^n|\Sigma|}}\, e^{-\frac{1}{2\sigma_n^2}\|y-\alpha_m\|^2} \le \frac{e^{\frac{\epsilon^2}{2\sigma_1^2}}}{\sqrt{(2\pi)^n|\Sigma|}}\, \max_{u \in \overline{\mathcal{B}_\epsilon}(z)} e^{-\frac{1}{2\sigma_n^2}\|y-u\|^2} \le \frac{e^{\frac{\epsilon^2}{2\sigma_1^2}}}{\sqrt{(2\pi)^n|\Sigma|}} \cdot \begin{cases} e^{-\frac{1}{2\sigma_n^2}\left(1-\frac{\epsilon}{\|y\|}\right)^2\|y\|^2}, & \text{if } y \notin \{0\} \cup \overline{\mathcal{B}_\epsilon}(z), \\ 1, & \text{if } y \in \{0\} \cup \overline{\mathcal{B}_\epsilon}(z), \end{cases} \triangleq \Theta(y).$$
Now, by Lemma A4,
$$| p_{\mathbf{N}}(\mathbf{y} - \mathbf{z}_m) \ln p(\mathbf{y}; F) | \le \Theta(\mathbf{y}) \big( \eta \|\mathbf{y}\|^2 + \kappa_b \big),$$
which is integrable with respect to y . □
Lemma A8. 
For any $F \in \mathcal{P}_n(A, k, a)$, $h(\mathbf{x}; F)$ has an analytic extension to an entire function on $\mathbb{C}^n$.
Proof. 
For convenience of notation, we will prove the case of n = 2 here.
Consider the extension of $h(\mathbf{x}; F)$ to $\mathbf{z} \in \mathbb{C}^2$:
$$\begin{aligned}
h(\mathbf{z}; F) &= -\int_{\mathbb{R}^2} p_{\mathbf{N}}(\mathbf{y} - \mathbf{z}) \ln p(\mathbf{y}; F) \, d\mathbf{y} \\
&= -\int_{\mathbb{R}^2} p_{N_1}(y_1 - z_1) \, p_{N_2}(y_2 - z_2) \ln p(\mathbf{y}; F) \, d\mathbf{y} \\
&= -\int_{\mathbb{R}^2} \kappa_1 e^{-\frac{(y_1 - z_1)^2}{2 \sigma_1^2}} \, \kappa_2 e^{-\frac{(y_2 - z_2)^2}{2 \sigma_2^2}} \ln p(\mathbf{y}; F) \, d\mathbf{y},
\end{aligned}$$
where, for $i \in \{1, 2\}$,
$$\kappa_i \triangleq \frac{1}{\sqrt{2 \pi \sigma_i^2}}.$$
We will show that $h((z_1, z_2); F)$ is an entire function of $z_1 \in \mathbb{C}$ for fixed $z_2$; by the symmetry of the problem, it is then also an entire function of $z_2 \in \mathbb{C}$ for fixed $z_1$, and Hartogs's Theorem [29] allows us to conclude that $h((z_1, z_2); F)$ is entire on $\mathbb{C}^2$. Therefore, it suffices to show that $h((\cdot, z_2); F)$ is entire, for which we use Morera's Theorem.
Morera's Theorem requires that the function under consideration, in this case $h((\cdot, z_2); F)$, be continuous, which holds by Lemma A7. If, for any closed smooth curve $\theta(t)$, defined for $0 \le t \le 1$, and any fixed $z_2 \in \mathbb{C}$,
$$\begin{aligned}
\int_0^1 h(\theta(t), z_2; F) \, \theta'(t) \, dt &= \int_0^1 \left[ -\int_{\mathbb{R}} \int_{\mathbb{R}} \kappa_1 e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \, p_{N_2}(y_2 - z_2) \ln p((y_1, y_2); F) \, dy_1 \, dy_2 \right] \theta'(t) \, dt \qquad \text{(A95)} \\
&= 0, \qquad \text{(A96)}
\end{aligned}$$
then, by Morera’s Theorem, h ( ( · , z 2 ) ; F ) is entire. Furthermore, by the Fubini–Tonelli Theorem [18], if
$$\int_0^1 \int_{\mathbb{R}} \int_{\mathbb{R}} \left| \kappa_1 e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \, p_{N_2}(y_2 - z_2) \ln p((y_1, y_2); F) \right| dy_1 \, dy_2 \, |\theta'(t)| \, dt < \infty, \tag{A97}$$
then the order of integration in (A95) can be interchanged such that integration with respect to t is performed first. Under this condition, since the extension of
$$p_{N_1}(u_1) = \kappa_1 e^{-\frac{u_1^2}{2 \sigma_1^2}}$$
to $z_1 \in \mathbb{C}$ is analytic, we obtain
$$\int_0^1 \kappa_1 e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \, p_{N_2}(y_2 - z_2) \ln p((y_1, y_2); F) \, \theta'(t) \, dt = 0,$$
thereby fulfilling the condition for Morera’s Theorem in (A96). It remains only to justify the application of the Fubini–Tonelli theorem by showing (A97).
To upper bound the integrand on the left side of (A97), let α 0 , β 0 > 0 be sufficiently large such that
$$\{ \theta(t) \mid 0 \le t \le 1 \} \subseteq \{ z \in \mathbb{C} \mid \mathrm{Re}\{z\} \in [-\alpha_0, \alpha_0], \; \mathrm{Im}\{z\} \in [-\beta_0, \beta_0] \}.$$
By Lemma A1, there exists $\kappa \ge 0$ such that, for all $\mathbf{y} \in \mathbb{R}^2$,
$$| \ln p(\mathbf{y}; F) | \le \eta \|\mathbf{y}\|^2 + \kappa.$$
We proceed by splitting the integral with respect to y 1 into two intervals:
$$A_1 \triangleq (-\infty, 0), \qquad A_2 \triangleq [0, \infty).$$
For $0 \le t \le 1$, let $\theta(t) = \alpha(t) + i \beta(t)$. Let $y_1 \in A_2$ and note that, since $y_1 \ge 0$,
$$\begin{aligned}
-\frac{1}{2 \sigma_1^2} (y_1 - \alpha(t))^2 &\le \frac{1}{2 \sigma_1^2} \big( -y_1^2 + 2 y_1 \alpha_0 - (\alpha(t))^2 \big) \\
&\le \frac{1}{2 \sigma_1^2} \big( -y_1^2 + 2 y_1 \alpha_0 \big) \\
&= -\frac{1}{2 \sigma_1^2} (y_1 - \alpha_0)^2 + \frac{1}{2 \sigma_1^2} \alpha_0^2.
\end{aligned}$$
From (A106), we obtain the upper bound
$$\begin{aligned}
\left| e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \right| &= \left| e^{-\frac{1}{2 \sigma_1^2} \left( (y_1 - \alpha(t))^2 - (\beta(t))^2 - 2 i (y_1 - \alpha(t)) \beta(t) \right)} \right| \\
&= e^{\frac{1}{2 \sigma_1^2} (\beta(t))^2} \, e^{-\frac{1}{2 \sigma_1^2} (y_1 - \alpha(t))^2} \\
&\le e^{\frac{1}{2 \sigma_1^2} (\alpha_0^2 + \beta_0^2)} \, e^{-\frac{1}{2 \sigma_1^2} (y_1 - \alpha_0)^2}.
\end{aligned}$$
Therefore, substituting (A105), for any $0 \le t \le 1$,
$$\begin{aligned}
0 \le \int_{A_2} \kappa_1 \left| e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \right| \big| \ln p((y_1, y_2); F) \big| \, dy_1 &\le \kappa_1 e^{\frac{1}{2 \sigma_1^2} (\alpha_0^2 + \beta_0^2)} \int_0^{\infty} e^{-\frac{1}{2 \sigma_1^2} (y_1 - \alpha_0)^2} \big( \eta (y_1^2 + y_2^2) + \kappa \big) \, dy_1 \\
&= c_0 y_2^2 + c_1,
\end{aligned}$$
for some constants $0 \le c_0, c_1 < \infty$. Applying similar reasoning when $y_1 \in A_1$ shows that there are constants $0 \le c_2, c_3 < \infty$ for which
$$0 \le \int_{A_1} \kappa_1 \left| e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \right| \big| \ln p((y_1, y_2); F) \big| \, dy_1 \le c_2 y_2^2 + c_3.$$
We now integrate with respect to $y_2$. For any $z_2 = \tau + i r \in \mathbb{C}$ and $0 \le t \le 1$,
$$\begin{aligned}
\psi(\theta(t), z_2) &\triangleq \int_{\mathbb{R}} \int_{\mathbb{R}} \kappa_1 \left| e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \right| \big| p_{N_2}(y_2 - z_2) \big| \, \big| \ln p((y_1, y_2); F) \big| \, dy_1 \, dy_2 \\
&= \sum_{i=1}^{2} \int_{\mathbb{R}} \int_{A_i} \kappa_1 \left| e^{-\frac{(y_1 - \theta(t))^2}{2 \sigma_1^2}} \right| \kappa_2 \left| e^{-\frac{(y_2 - z_2)^2}{2 \sigma_2^2}} \right| \big| \ln p((y_1, y_2); F) \big| \, dy_1 \, dy_2 \\
&\le \kappa_2 \, e^{\frac{r^2}{2 \sigma_2^2}} \int_{\mathbb{R}} e^{-\frac{(y_2 - \tau)^2}{2 \sigma_2^2}} \big[ c_0 y_2^2 + c_1 + c_2 y_2^2 + c_3 \big] \, dy_2 \\
&< \infty.
\end{aligned}$$
Therefore, $\psi(\theta(t), z_2)$ is uniformly bounded over $\{ \theta(t) \mid 0 \le t \le 1 \}$ and
$$\int_0^1 | \psi(\theta(t), z_2) \, \theta'(t) | \, dt < \infty. \qquad \square$$
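The vanishing contour integral at the heart of the Morera argument can also be checked numerically. The sketch below does this in the scalar case for simplicity (the proof above works with $n = 2$ and Hartogs's theorem): it evaluates the complex extension of the marginal entropy density along a closed contour and verifies that the integral is numerically zero. The noise variance, the two-point input distribution, the contour, and the quadrature grid are all assumptions for this illustration.

```python
import numpy as np

# Numerical illustration of the Morera/Cauchy step in the scalar case: the
# extension h(z;F) = -∫ p_N(y - z) ln p(y;F) dy is entire, so its integral
# over any closed contour should vanish.  All concrete choices (sigma, the
# two-point input F, the contour, the quadrature grid) are assumptions.

sigma = 1.0
y = np.linspace(-15.0, 15.0, 20001)
dy = y[1] - y[0]

def p_noise(u):
    """Gaussian density, also accepted at complex arguments u."""
    return np.exp(-u ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# F: equiprobable two-point input on {-1, +1}
ln_p_out = np.log(0.5 * p_noise(y - 1.0) + 0.5 * p_noise(y + 1.0))

def h_ext(z):
    """Analytic extension of the marginal entropy density to z in C."""
    return -np.sum(p_noise(y - z) * ln_p_out) * dy

# closed contour: a circle of radius 0.7 centred at 0.3 + 0.2i
m = 400
ts = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
theta = 0.3 + 0.2j + 0.7 * np.exp(1j * ts)       # theta(t)
dtheta = 0.7j * np.exp(1j * ts)                  # theta'(t)

vals = np.array([h_ext(z) for z in theta])
contour_integral = np.sum(vals * dtheta) * (2.0 * np.pi / m)
print(abs(contour_integral))     # should be ~0, up to quadrature error
```

By contrast, a non-entire integrand such as $1/(z - 0.3 - 0.2i)$ would yield $2\pi i$ on the same contour, so the near-zero output is an informative check.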

Appendix F. Polynomial Bounds

Lemma A9. 
The function
$$f(L) = \min_{\|\mathbf{y}\| = L} \sum_{i=1}^{n} y_i^{2k}$$
satisfies $f(L) \ge L^{2k} / n^{k}$.
Proof. 
First, note that, for any $L \ge 0$ and $\mathbf{y} \in \mathbb{R}^n$ satisfying $\|\mathbf{y}\| = L$, we have
$$\max_{i \in \{1, \dots, n\}} y_i^2 \ge \frac{L^2}{n}.$$
Then, for any $L \ge 0$, we obtain the lower bound
$$\begin{aligned}
f(L) &= \min_{\|\mathbf{y}\| = L} \sum_{i=1}^{n} y_i^{2k} \\
&\ge \min_{\|\mathbf{y}\| = L} \max_{i \in \{1, \dots, n\}} y_i^{2k} \\
&\ge \frac{L^{2k}}{n^{k}}. \qquad \square
\end{aligned}$$
Lemma A10. 
For any $\mathbf{l} \in \mathbb{Z}_{\ge 0}^{n}$, the function
$$\xi_{\mathbf{l}}(L) = \max_{\|\mathbf{y}\| = L} | \mathbf{y}^{\mathbf{l}} |$$
satisfies $\xi_{\mathbf{l}}(L) \le L^{d(\mathbf{l})}$, where $d(\mathbf{l}) = l_1 + \cdots + l_n$.
Proof. 
For $\|\mathbf{y}\| = L$,
$$\begin{aligned}
\xi_{\mathbf{l}}(L) &= \max_{\|\mathbf{y}\| = L} | \mathbf{y}^{\mathbf{l}} | \\
&= \max_{\|\mathbf{y}\| = L} | y_1^{l_1} \cdots y_n^{l_n} | \\
&\le \max_{\|\mathbf{y}\| = L} L^{l_1} \cdots L^{l_n} \\
&= L^{d(\mathbf{l})}. \qquad \square
\end{aligned}$$
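Both bounds in this appendix are easy to probe numerically. The sketch below samples points on the sphere of radius $L$ and compares the sampled extrema with the bounds of Lemmas A9 and A10; the values of $n$, $k$, $\mathbf{l}$, and $L$ are arbitrary assumptions.

```python
import numpy as np

# Quick numerical check of the two polynomial bounds, using random points on
# the sphere of radius L.  The values of n, k, l and L are illustrative
# assumptions; the sampled minimum/maximum only approximate f(L) and xi_l(L).

rng = np.random.default_rng(1)
n, k, L = 4, 3, 2.5
l = np.array([2, 0, 1, 3])                        # multi-index with d(l) = 6

u = rng.normal(size=(100_000, n))                 # random directions
y = L * u / np.linalg.norm(u, axis=1, keepdims=True)   # points with ||y|| = L

f_sample = np.min(np.sum(y ** (2 * k), axis=1))   # over-estimates f(L)
xi_sample = np.max(np.abs(np.prod(y ** l, axis=1)))    # under-estimates xi_l(L)

print(f"sampled min of sum y_i^(2k) = {f_sample:10.4f} >= L^(2k)/n^k = {L ** (2 * k) / n ** k:.4f}")
print(f"sampled max of |y^l|        = {xi_sample:10.4f} <= L^d(l)    = {L ** l.sum():.4f}")
```

The sampled minimum lies above $L^{2k}/n^k$, as it must; in fact, the bound of Lemma A9 is not tight, since the minimum of $\sum_i y_i^{2k}$ over $\|\mathbf{y}\| = L$ is attained at $y_i = \pm L/\sqrt{n}$ and equals $L^{2k}/n^{k-1}$.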

References

1. Alireza Banani, S.; Vaughan, R.G. Compensating for Non-Linear Amplifiers in MIMO Communications Systems. IEEE Trans. Antennas Propag. 2012, 60, 700–714.
2. Liang, C.P.; Jong, J.H.; Stark, W.E.; East, J.R. Nonlinear amplifier effects in communications systems. IEEE Trans. Microw. Theory Tech. 1999, 47, 1461–1466.
3. Raich, R.; Zhou, G.T. On the modeling of memory nonlinear effects of power amplifiers for communication applications. In Proceedings of the 2002 IEEE 10th Digital Signal Processing Workshop and the 2nd Signal Processing Education Workshop, Pine Mountain, GA, USA, 16 October 2002; pp. 7–10.
4. Smith, J.G. The information capacity of amplitude- and variance-constrained scalar Gaussian channels. Inf. Control 1971, 18, 203–219.
5. Rassouli, B.; Clerckx, B. On the Capacity of Vector Gaussian Channels With Bounded Inputs. IEEE Trans. Inf. Theory 2016, 62, 6884–6903.
6. Shamai, S.; Bar-David, I. The capacity of average and peak-power-limited quadrature Gaussian channels. IEEE Trans. Inf. Theory 1995, 41, 1060–1071.
7. Dytso, A.; Al, M.; Poor, H.V.; Shamai Shitz, S. On the Capacity of the Peak Power Constrained Vector Gaussian Channel: An Estimation Theoretic Perspective. IEEE Trans. Inf. Theory 2019, 65, 3907–3921.
8. Dytso, A.; Yagli, S.; Poor, H.V.; Shamai Shitz, S. The Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points. IEEE Trans. Inf. Theory 2020, 66, 2006–2022.
9. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
10. Sommerfeld, J.; Bjelakovic, I.; Boche, H. On the boundedness of the support of optimal input measures for Rayleigh fading channels. In Proceedings of the 2008 IEEE International Symposium on Information Theory, Toronto, ON, Canada, 6–11 July 2008; pp. 1208–1212.
11. Dytso, A.; Goldenbaum, M.; Shamai, S.; Poor, H.V. Upper and Lower Bounds on the Capacity of Amplitude-Constrained MIMO Channels. In Proceedings of the GLOBECOM 2017—2017 IEEE Global Communications Conference, Singapore, 4–8 December 2017; pp. 1–6.
12. Chan, T.H.; Hranilovic, S.; Kschischang, F.R. Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2005, 51, 2073–2088.
13. Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai (Shitz), S. Amplitude Constrained MIMO Channels: Properties of Optimal Input Distributions and Bounds on the Capacity. Entropy 2019, 21, 200.
14. Fahs, J.J.; Abou-Faycal, I.C. Using Hermite Bases in Studying Capacity-Achieving Distributions Over AWGN Channels. IEEE Trans. Inf. Theory 2012, 58, 5302–5322.
15. Fahs, J.; Tchamkerten, A.; Yousefi, M.I. On the Optimal Input of the Nondispersive Optical Fiber. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 131–135.
16. Bhaskara Rao, K.; Bhaskara Rao, M. Theory of Charges; Pure and Applied Mathematics; Academic Press: New York, NY, USA, 1983.
17. Han, T.S. Information-Spectrum Methods in Information Theory; Springer: Berlin/Heidelberg, Germany, 2003.
18. Dudley, R.M. Integration. In Real Analysis and Probability, 2nd ed.; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 2002; pp. 114–151.
19. Rose, K. A mapping approach to rate-distortion computation and analysis. IEEE Trans. Inf. Theory 1994, 40, 1939–1952.
20. Johnston, W. The Weighted Hermite Polynomials Form a Basis for L2(R). Am. Math. Mon. 2014, 121, 249–253.
21. Reed, M.; Simon, B. II—Hilbert Spaces. In Methods of Modern Mathematical Physics; Reed, M., Simon, B., Eds.; Academic Press: New York, NY, USA, 1972; pp. 36–66.
22. Alajaji, F.; Chen, P. An Introduction to Single-User Information Theory; Springer Undergraduate Texts in Mathematics and Technology; Springer: Singapore, 2018.
23. Sitaram, A.; Sundari, M.; Thangavelu, S. Uncertainty principles on certain Lie groups. Proc. Indian Acad. Sci. Math. Sci. 1995, 105, 135–151.
24. Krantz, S.; Parks, H. A Primer of Real Analytic Functions; Advanced Texts Series; Birkhäuser: Boston, MA, USA, 2002.
25. Shiryaev, A.; Chibisov, D. Probability-1; Graduate Texts in Mathematics; Springer: New York, NY, USA, 2016.
26. Dupuis, P.; Ellis, R. A Weak Convergence Approach to the Theory of Large Deviations; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 1997.
27. Billingsley, P. Convergence of Probability Measures; Wiley Series in Probability and Statistics; Wiley: New York, NY, USA, 2013.
28. Ghourchian, H.; Gohari, A.; Amini, A. Existence and Continuity of Differential Entropy for a Class of Distributions. IEEE Commun. Lett. 2017, 21, 1469–1472.
29. Gunning, R.; Rossi, H. Analytic Functions of Several Complex Variables; AMS Chelsea Publishing: Englewood Cliffs, NJ, USA, 2009.