Next Article in Journal
A Semi-Quantum Secret-Sharing Protocol with a High Channel Capacity
Next Article in Special Issue
Overview of Tensor-Based Cooperative MIMO Communication Systems—Part 1: Tensor Modeling
Previous Article in Journal
Position-Wise Gated Res2Net-Based Convolutional Network with Selective Fusing for Sentiment Analysis
Previous Article in Special Issue
Information Rates for Channels with Fading, Side Information and Adaptive Codewords
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Amplitude Constrained Vector Gaussian Wiretap Channel: Properties of the Secrecy-Capacity-Achieving Input Distribution

1
Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milano, Italy
2
Qualcomm, Bridgewater, NJ 08807, USA
*
Author to whom correspondence should be addressed.
Entropy 2023, 25(5), 741; https://doi.org/10.3390/e25050741
Submission received: 13 February 2023 / Revised: 23 April 2023 / Accepted: 25 April 2023 / Published: 30 April 2023
(This article belongs to the Special Issue Wireless Networks: Information Theoretic Perspectives III)

Abstract

:
This paper studies the secrecy capacity of an n-dimensional Gaussian wiretap channel under a peak power constraint. This work determines the largest peak power constraint R ¯ n , such that an input distribution uniformly distributed on a single sphere is optimal; this regime is termed the low-amplitude regime. The asymptotic value of R ¯ n as n goes to infinity is completely characterized as a function of noise variance at both receivers. Moreover, the secrecy capacity is also characterized in a form amenable to computation. Several numerical examples are provided, such as the example of the secrecy-capacity-achieving distribution beyond the low-amplitude regime. Furthermore, for the scalar case ( n = 1 ) , we show that the secrecy-capacity-achieving input distribution is discrete with finitely many points at most at the order of R 2 σ 1 2 , where σ 1 2 is the variance of the Gaussian noise over the legitimate channel.

1. Introduction

Consider the vector Gaussian wiretap channel with outputs
Y 1 = X + N 1 ,
Y 2 = X + N 2 ,
where X R n , N 1 N ( 0 n , σ 1 2 I n ) and N 2 N ( 0 n , σ 2 2 I n ) , and with ( X , N 1 , N 2 ) being mutually independent. The output Y 1 is observed by the legitimate receiver, whereas the output Y 2 is observed by the malicious receiver. In this work, we are interested in the scenario where the input X is limited by a peak power constraint or amplitude constraint, and assume that X B 0 ( R ) = { x : x R } , i.e., B 0 ( R ) is an n-ball centered at the origin and of radius R . For this setting, the secrecy capacity is given by
C s ( σ 1 2 , σ 2 2 , R , n ) = max X B 0 ( R ) I ( X ; Y 1 ) I ( X ; Y 2 )
= max X B 0 ( R ) I ( X ; Y 1 | Y 2 ) ,
where the last expression holds due to the (stochastically) degraded nature of the channel. It can be shown that for σ 1 2 σ 2 2 the secrecy capacity is equal to zero. Therefore, in the remainder, we assume that σ 1 2 < σ 2 2 .
We are interested in studying the input distribution P X that maximizes (3) in the low (but not vanishing) amplitude regime. Since closed-form expressions for secrecy capacity are rare, we derive the secrecy capacity in an integral form that is easy to evaluate. For the scalar case ( n = 1 ) , we establish an upper bound on the number of mass points of P X , valid for any amplitude regime. We also argue in Section 2.3 that the solution to the secrecy capacity can shed light on other problems seemingly unrelated to security. The paper also provides a number of numerical simulations of P X and C s , the data for which are made available at [1].

1.1. Literature Review

The wiretap channel was introduced by Wyner in [2], who also established the secrecy capacity of the degraded wiretap channel. The results of [2] were extended to the Gaussian wiretap channel in [3]. The wiretap channel plays a central role in network information theory; the interested reader is referred to [4,5,6,7,8] and references therein for a detailed treatment of the topic. Furthermore, for an in-depth discussion on the wiretap fading channel, refer to [9,10,11,12].
In [3], it was shown that the secrecy-capacity-achieving input distribution of the Gaussian wiretap channel, under an average power constraint, is Gaussian. In [13], the authors investigated the Gaussian wiretap channel consisting of two antennas, both at the transmitter and receiver sides, and of a single antenna for the eavesdropper. The secrecy capacity of the MIMO wiretap channel was characterized in [14,15], where the Gaussian input was shown to be optimal. An elegant proof, using the I-MMSE relationship [16], of the optimality of Gaussian input, is given in [17]. Moreover, an alternative approach in the characterization of the secrecy capacity of a MIMO wiretap channel was proposed in [18]. In [19,20], the authors discuss the optimal signaling for secrecy rate maximization under average power constraints.
The secrecy capacity of the Gaussian wiretap channel under the peak power constraint has received far less attention. The secrecy capacity of the scalar Gaussian wiretap channel with an amplitude and power constraint was considered in [21], where the authors showed that the capacity-achieving input distribution P X is discrete with finitely many support points.
The work of [21] was extended to noise-dependent channels by Soltani and Rezki in [22]. For further studies on the properties of the secrecy-capacity-achieving input distribution for a class of degraded wiretap channels, refer to [23,24,25].
The secrecy capacity for the vector wiretap channel with a peak power constraint was considered in [25], where it was shown that the optimal input distribution is concentrated on finitely many co-centric shells.

1.2. Contributions and Paper Outline

In Section 2, we introduce the mathematical tools, assumptions, and definitions used throughout the paper. Specifically, in Section 2.1, we introduce the oscillation theorem. In Section 2.2, we give a definition of low-amplitude regimes. Moreover, in Section 2.3, we show how the wiretap channel can be seen as a generalization of point-to-point channels and the evaluation of the largest minimum mean square error (MMSE), both under the assumption of amplitude-constrained input. In Section 2.4, we provide a definition of the Karush–Kuhn–Tucker (KKT) conditions for the wiretap channel.
In Section 3, we detail our main results. Theorem 2 provides a sufficient condition for the optimality of a single hypersphere. Theorem 3 and Theorem 4 give the conditions under which we can fully characterize the behavior of R ¯ n , that is, the radius below which we are in the low-amplitude regime, i.e., the optimal input distribution is composed of a single shell. Furthermore, Theorem 5 gives an implicit and an explicit upper bound on the number of mass points of the secrecy-capacity-achieving input distribution when n = 1 .
In Section 4, we derive the secrecy capacity expression for the low-amplitude regime in Theorem 6. We also investigate its behavior when the number of antennas n goes to infinity.
Section 5 extends the investigation of the secrecy capacity beyond the low-amplitude regime. We numerically estimate both the optimal input pmf and the resulting capacity via an algorithmic procedure based on the KKT conditions introduced in Lemma 2.
Section 6, Section 7, Section 8 and Section 9 provide the proof for Theorem 3 and Theorem 4–6, respectively. Finally, Section 10 concludes the paper.

1.3. Notation

We use bold letters for vectors ( x ) and uppercase letters for random variables (X). We denote by x the Euclidean norm of the vector x . Given a vector x R n and a scalar a, with a little abuse of notation, we denote a · e 1 + x by a + x , where e 1 = [ 1 , 0 , , 0 ] is the first vector in the standard basis of the Euclidean vector space R n . Given a random variable X, its probability density function (pdf), pmf, and cumulative distribution function are denoted by f X , P X , and F X , respectively. The support set of P X is denoted and defined as
supp ( P X ) = { x : for every open set D x we have that P X ( D ) > 0 } .
We denote by N ( μ , Σ ) a multivariate Gaussian distribution with mean vector μ and covariance matrix Σ . The pdf of a Gaussian random variable with zero mean and variance σ 2 is denoted by ϕ σ ( · ) . We denote by χ n 2 ( λ ) the noncentral chi-square distribution with n degrees of freedom and with noncentrality parameter λ . We represent the n × 1 vector of zeros by 0 n and the n × n identity matrix by I n . Furthermore, we represent by D the relative entropy. The minimum mean squared error is denoted by
mmse ( X | X + N ) = E X E [ X | X + N ] 2 .
The modified Bessel function of the first kind of order v 0 is denoted by I v ( x ) , x R . The following ratio of the Bessel functions is commonly used in this work:
h v ( x ) = I v ( x ) I v 1 ( x ) , x R , v 0 .
Finally, the number of zeros (counted in accordance with their multiplicities) of a function f : R R on the interval I is denoted by N ( I , f ) . Similarly, if f : C C is a function on the complex domain, N ( D , f ) denotes the number of its zeros within the region D .

2. Preliminaries

2.1. Oscillation Theorem

In this work, we often need to upper bound the number of oscillations of a function, i.e., its number of sign changes. This is useful, for example, to bound the number of zeros of a function or the number of roots of an equation. To be more precise, let us define the number of sign changes as follows.
Definition 1
(Sign Changes of a Function). The number of sign changes of a function ξ : Ω R is given by
S ( ξ ) = sup m N sup y 1 < < y m Ω N { ξ ( y i ) } i = 1 m ,
where N { ξ ( y i ) } i = 1 m is the number of sign changes of the sequence { ξ ( y i ) } i = 1 m .
Definition 2
(Totally Positive Kernel). A function f : I 1 × I 2 R is said to be a totally positive kernel of order n if det [ f ( x i , y j ) ] i , j = 1 m > 0 for all 1 m n , for all x 1 < < x m I 1 , and y 1 < < y m I 2 . If f is a totally positive kernel of order n for all n N , then f is a strictly totally positive kernel.
In [26], Karlin noticed that some integral transformations have a variation-diminishing property, which is described in the following theorem.
Theorem 1
(Oscillation Theorem). Given domains I 1 and I 2 , let p : I 1 × I 2 R be a strictly totally positive kernel. For an arbitrary y, suppose p ( · , y ) : I 1 R is an n-times differentiable function. Assume that μ is a measure on I 2 , and let ξ : I 2 R be a function with S ( ξ ) = n . For x I 1 , define
Ξ ( x ) = ξ ( y ) p ( x , y ) d μ ( y ) .
If Ξ : I 1 R is an n-times differentiable function, then either N ( I 1 , Ξ ) n , or Ξ 0 .
The above theorem says that the number of zeros of a function Ξ , which is the output of the integral transformation, is less than the number of sign changes of the function ξ , which is the input to the integral transformation.

2.2. Low-Amplitude Regime

In this work, a low-amplitude regime is defined as follows.
Definition 3.
Let X R P X R be uniform on C ( R ) = { x : x = R } . The capacity in (3) is said to be in the low-amplitude regime if R R ¯ n ( σ 1 2 , σ 2 2 ) , where
R ¯ n ( σ 1 2 , σ 2 2 ) = max R : P X R = arg max P X : X B 0 ( R ) I ( X ; Y 1 | Y 2 ) .
If the set in (9) is empty, then we assign R ¯ n ( σ 1 2 , σ 2 2 ) = 0 .
The quantity R ¯ n ( σ 1 2 , σ 2 2 ) represents the largest radius R , for which P X R is secrecy-capacity-achieving.
One of the main objectives of this work is to characterize R ¯ n ( σ 1 2 , σ 2 2 ) .

2.3. Connections to Other Optimization Problems

The distribution P X R occurs in a variety of statistical and information-theoretic applications. For example, consider the following two optimization problems:
max P X : X B 0 ( R ) I ( X ; X + N ) ,
max P X : X B 0 ( R ) mmse ( X | X + N ) ,
where N N ( 0 n , σ 2 I n ) . The first problem seeks to characterize the capacity of the point-to-point channel under an amplitude constraint, and the second problem seeks to find the largest minimum mean squared error under the assumption that the signal has bounded amplitude; the interested reader is referred to [27,28,29] for a detailed background on both problems.
Similarly to the wiretap channel, we can define the low-amplitude regime for both problems as the largest R such that P X R is optimal and denote these by R ¯ n ptp ( σ 2 ) and R ¯ n MMSE ( σ 2 ) . We now argue that both R ¯ n ptp ( σ 2 ) and R ¯ n MMSE ( σ 2 ) can be seen as a special case of the wiretap solution. Hence, the wiretap channel provides an interesting unification and generalization of these two problems.
First, note that the point-to-point solution can be recovered from the wiretap by simply specializing the wiretap channel to the point-to-point channel, that is,
R ¯ n ptp ( σ 2 ) = lim σ 2 R ¯ n ( σ 2 , σ 2 2 ) .
Second, to see that the MMSE solution can be recovered from the wiretap, recall that by the I-MMSE relationship [16] we have that
max P X : X B 0 ( R ) I ( X ; Y 1 ) I ( X ; Y 2 )
= max P X : X B 0 ( R ) 1 2 σ 1 2 mmse ( X | X + s Z ) s 2 d s 1 2 σ 2 2 mmse ( X | X + s Z ) s 2 d s
= max P X : X B 0 ( R ) 1 2 σ 1 2 σ 2 2 mmse ( X | X + s Z ) s 2 d s
where Z is standard Gaussian. Now, note that if we choose σ 2 2 = σ 1 2 + ϵ , then by the mean value theorem we arrive at
max P X : X B 0 ( R ) I ( X ; Y 1 ) I ( X ; Y 2 ) = max P X : X B 0 ( R ) ϵ 2 mmse ( X | X + σ 1 2 Z ) σ 1 4 + o ( ϵ ) ,
where lim ϵ 0 + o ( ϵ ) / ϵ = 0 . Consequently, for a small enough ϵ > 0 ,
R ¯ n MMSE ( σ 2 ) = R ¯ n ( σ 2 , σ 2 + ϵ ) .

2.4. KKT Conditions

Let us define the secrecy density for the vector Gaussian wiretap channel as
Ξ ( x ; P X ) = D ( f Y 1 | X ( · | x ) f Y 1 ) D ( f Y 2 | X ( · | x ) f Y 2 ) ,
where D ( · · ) is the relative entropy.
For the scalar case ( n = 1 ) , the KKT conditions are necessary and sufficient to ensure that P X is capacity-achieving [21].
Lemma 1.
P X maximizes (3) if, and only if,
Ξ ( x ) = C s ( σ 1 2 , σ 2 2 , R , 1 ) , x supp ( P X ) ,
Ξ ( x ) C s ( σ 1 2 , σ 2 2 , R , 1 ) , x [ R , R ] ,
where for x R
Ξ ( x ) = D ( f Y 1 | X ( · | x ) f Y 1 ) D ( f Y 2 | X ( · | x ) f Y 2 )
= E g ( Y 1 ) | X = x + log σ 2 σ 1 ,
and where
g ( y ) = E log f Y 2 ( y + N ) f Y 1 ( y ) , y R ,
with N N ( 0 , σ 2 2 σ 1 2 ) .
Proof. 
The first part of Lemma 1 was shown in [21]. The proof of (21) goes as follows:
D ( f Y 1 | X ( · | x ) f Y 1 ) D ( f Y 2 | X ( · | x ) f Y 2 ) log σ 2 σ 1
= log 1 f Y 1 ( y ) ϕ σ 1 ( y x ) d y log 1 f Y 2 ( y ) E [ ϕ σ 1 ( y x N ) ] d y
= log 1 f Y 1 ( y ) ϕ σ 1 ( y x ) d y E log 1 f Y 2 ( y + N ) ϕ σ 1 ( y x ) d y
= E log f Y 2 ( y + N ) f Y 1 ( y ) ϕ σ 1 ( y x ) d y
= g ( y ) ϕ σ 1 ( y x ) d y ,
where N N ( 0 , σ 2 2 σ 1 2 ) and (24) hold by noticing that ϕ σ 2 ( y x ) can be reformulated as the convolution of Gaussian pdfs E [ ϕ σ 1 ( y x N ) ] ; in (25) we applied the change in variable y y + N . This concludes the proof.    □
The convexity of the optimization problem is also guaranteed for the vector wiretap model in (1) with n > 1 . Then, the results of Lemma 1 can be extended to the vector case as follows.
Lemma 2.
P X maximizes (3) if, and only if,
Ξ ( x ; P X ) = C s ( σ 1 2 , σ 2 2 , R , n ) , x supp ( P X ) ,
Ξ ( x ; P X ) C s ( σ 1 2 , σ 2 2 , R , n ) , x B 0 ( R ) ,
where for x R n
Ξ ( x ; P X ) = D ( f Y 1 | X ( · | x ) f Y 1 ) D ( f Y 2 | X ( · | x ) f Y 2 )
= E g ( Y 1 ) | X = x ,
and where
g ( y ) = E log f Y 2 ( y + N ) f Y 1 ( y ) + n log σ 2 σ 1 , y R n ,
with N N ( 0 n , ( σ 2 2 σ 1 2 ) I n ) .
Proof. 
This is a straightforward vector extension of Lemma 1.    □
Thanks to the spherical symmetry of the additive noise distributions and of P X , the secrecy density Ξ ( x ; P X ) can be expressed as a function of x only. Therefore, we denote the secrecy density in spherical coordinates by Ξ ˜ ( x ; P X ) , and give a rigorous definition in (A9).

3. Main Results

3.1. A New Sufficient Condition on the Optimality of P X R

Our first main result provides a sufficient condition for the optimality of P X R .
Theorem 2.
If
R < σ 1 2 n 1 σ 1 2 1 σ 2 2 ,
then P X R is secrecy-capacity-achieving.
Proof. 
Let us consider the equivalent definition of the secrecy density in spherical coordinates (A9). Note that if the derivative of Ξ ˜ ( x ; P X R ) makes at most one sign change, from negative to positive, then the maximum of x Ξ ˜ ( x ; P X R ) occurs at either x = 0 or x = R .
From Lemma A1 in the Appendix B, the derivative of Ξ ˜ is as given below
Ξ ˜ ( x ; P X R ) = x E M ˜ 2 ( σ 1 Q n + 2 ) M 1 ( σ 1 Q n + 2 )
where Q n + 2 2 is a noncentral chi-square random variable with n + 2 degrees of freedom and noncentrality parameter x 2 σ 1 2 , and
M i ( y ) = 1 σ i 2 R y h n 2 R σ i 2 y 1 , i { 1 , 2 }
M ˜ 2 ( y ) = E M 2 ( y + W ) ,
where W N ( 0 n + 2 , ( σ 2 2 σ 1 2 ) I n + 2 ) . A calculation related to (33) was erroneously performed in [27]. However, this error does not change the results of [27] as only the sign of the derivative is important and not the value itself. Note that Ξ ˜ ( 0 ; P X R ) = 0 and that Ξ ˜ ( x ; P X R ) > 0 for a sufficiently large x ; in fact, we have
Ξ ˜ ( x ; P X R ) > x 1 σ 1 2 1 σ 2 2 x σ 1 2 E R σ 1 Q n + 2
= x 1 σ 1 2 1 σ 2 2 x σ 1 2 E R x h n 2 x σ 1 Q n
x 1 σ 1 2 1 σ 2 2 R σ 1 2 ,
where (36) follows from 0 h n 2 ( x ) 1 for x 0 ; (37) follows by noticing that R σ 1 t f Q n + 2 2 ( t ) = R x h n 2 x σ 1 t f Q n 2 ( t ) ; and finally, (38) holds by h n 2 ( x ) 1 .
Then, to show that Ξ ˜ ( x ; P X R ) is maximized in x = R , we need to prove that Ξ ˜ ( x ; P X R ) changes sign at most once. To that end, we need Karlin’s oscillation theorem presented in Section 2.1. By using (33), the fact that the pdf of a chi-square is a positive defined kernel [26], and Theorem 1, the number of sign changes of Ξ ˜ ( x ; P X R ) is upper-bounded by the number of sign changes of
G σ 1 , σ 2 , R , n ( y ) = M ˜ 2 ( y ) M 1 ( y ) ,
for y R + . Note that
G σ 1 , σ 2 , R , n ( y ) 1 σ 2 2 + 1 σ 1 2 R σ 1 2 y h n 2 R σ 1 2 y
1 σ 2 2 + 1 σ 1 2 R 2 σ 1 4 n ,
where the inequality in (40) follows from h n 2 ( x ) 0 for x 0 , and (41) follows from h n 2 ( x ) x n for x 0 and n N . We conclude by noting that (41) is nonnegative, hence has no sign change, for
R < σ 1 2 n 1 σ 1 2 1 σ 2 2
for all y R + , thus guaranteeing that P X R is secrecy-capacity-achieving.    □
Remark 1.
As a consequence of the proof of Theorem 2, for any R 0 , σ 2 σ 1 0 and n N , if G σ 1 , σ 2 , R , n ( y ) has at most one sign change, then P X R is secrecy-capacity-achieving if, and only if, for all x = R
Ξ ( 0 ; P X R ) Ξ ( x ; P X R ) .
Because of the difficulty in evaluating analytical properties of (39), proving that G σ 1 , σ 2 , R , n has at most one sign change does not seem easy. However, in Appendix A, we show via extensive numerical evaluations that G σ 1 , σ 2 , R , n changes sign at most once for any n , R , σ 1 , σ 2 that we tried.

3.2. Characterizing the Low-Amplitude Regime

Let us characterize the low-amplitude regime as follows.
Theorem 3.
Consider a function
f ( R ) = σ 1 2 σ 2 2 E h n 2 2 s Z R s + h n 2 2 R + s Z R s 1 s 2 d s
where Z N ( 0 n , I n ) . If G σ 1 , σ 2 , R , n of (39) has at most one sign change, the input X R is secrecy-capacity-achieving if, and only if, R R ¯ n ( σ 1 2 , σ 2 2 ) , where R ¯ n ( σ 1 2 , σ 2 2 ) is given as the solution of
f ( R ) = 0 .
Remark 2.
Note that (45) always has a solution. To see this, observe that f ( 0 ) = 1 σ 2 2 1 σ 1 2 < 0 and f ( ) = 1 σ 1 2 1 σ 2 2 > 0 . Moreover, the solution is unique because f ( R ) monotonically increases for R 0 .
The solution to (45) needs to be found numerically. To avoid any loss of accuracy in the numerical evaluation of h v ( x ) for large values of x, we used the exponential scaling provided in the MATLAB implementation of I v ( x ) . Since evaluating f ( R ) is rather straightforward and not time-consuming, we opted for a binary search algorithm.
In Table 1, we show the values of R ¯ n ( 1 , σ 2 2 ) for some values of  σ 2 2 and n. Moreover, we report the values of R ¯ n ptp ( 1 ) and R ¯ n MMSE ( 1 ) from [27] in the first and the last row, respectively. As predicted by (12), we can appreciate the close match of the R ¯ n ptp ( 1 ) row with the one of R ¯ n ( 1 , 1000 ) . Similarly, the agreement between the R ¯ n MMSE ( 1 ) row and the R ¯ n ( 1 , 1.001 ) row is justified by (16).

3.3. Large n Asymptotics

We now use the result in Theorem 3 to characterize the asymptotic behavior of R ¯ n ( σ 1 2 , σ 2 2 ) . In particular, it is shown that R ¯ n ( σ 1 2 , σ 2 2 ) increases as n .
Theorem 4.
For σ 1 2 σ 2 2
lim n R ¯ n ( σ 1 2 , σ 2 2 ) n = c ( σ 1 2 , σ 2 2 ) ,
where c = c ( σ 1 2 , σ 2 2 ) is the solution of
σ 1 2 σ 2 2 c 2 s 2 + s 4 + c 2 2 + c 2 ( c 2 + s ) s 2 + s 2 4 + c 2 ( c 2 + s ) 2 1 s 2 d s = 0 .
Proof. 
See Section 7.    □
In Figure 1, for σ 1 2 = 1 and σ 2 2 = 1.001 , 1.5 , 10 , 1000 , we show the behavior of R ¯ n ( 1 , σ 2 2 ) / n and how its asymptotic converges to c ( 1 , σ 2 2 ) .

3.4. Scalar Case ( n = 1 )

For the scalar case, the optimal input distribution P X is discrete. In this regime, we provide an implicit and an explicit upper bound on the number of support points of the optimal input probability mass function (pmf) P X .
Theorem 5.
Let Y 1 and Y 2 be the secrecy-capacity-achieving output distributions at the legitimate and malicious receivers, respectively, and let
g ( y ) = E log f Y 2 ( y + N ) f Y 1 ( y ) , y R ,
with N N ( 0 , σ 2 2 σ 1 2 ) . For R > 0 , an implicit upper bound on the number of support points of P X is
| supp ( P X ) | N [ L , L ] , g ( · ) + κ 1 <
where
κ 1 = log σ 2 σ 1 C s ,
L = R σ 2 + σ 1 σ 2 σ 1 + σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2 .
Moreover, an explicit upper bound on the number of support points of P X is obtained by using
N [ L , L ] , g ( · ) + κ 1 ρ R 2 σ 1 2 + O ( log ( R ) ) ,
where ρ = ( 2 e + 1 ) 2 σ 2 + σ 1 σ 2 σ 1 2 + σ 2 + σ 1 σ 2 σ 1 + 1 2 .
The upper bounds in Theorem 5 are generalizations of the upper bounds on the number of points presented in [30] in the context of a point-to-point AWGN channel with an amplitude constraint. Indeed, if we let σ 2 , while keeping σ 1 and R fixed, then the wiretap channel reduces to the AWGN point-to-point channel.
To find a lower bound on the number of mass points, a possible approach consists of the following steps:
C s ( σ 1 2 , σ 2 2 , R , 1 ) = I ( X ; Y 1 ) I ( X ; Y 2 )
H ( X ) I ( X ; Y 2 )
log ( | supp ( P X ) | ) I ( X ; Y 2 ) ,
where the above uses the nonnegativity of the entropy and the fact that entropy is maximized by a uniform distribution. Furthermore, by using a suboptimal uniform (continuous) distribution on [ R , R ] as an input and the entropy power inequality, the secrecy capacity is lower-bounded by
C s ( σ 1 2 , σ 2 2 , R , 1 ) 1 2 log 1 + 2 R 2 π e σ 1 2 1 + R 2 σ 2 2 .
Combining the bounds in (55) and (56), we arrive at the following lower bound on the number of points:
| supp ( P X ) | 1 + 2 R 2 π e σ 1 2 1 + R 2 σ 2 2 e I ( X ; Y 2 ) .
At this point, one needs to determine the behavior of I ( X ; Y 2 ) . A trivial lower bound on | supp ( P X ) | can be found by lower-bounding I ( X ; Y 2 ) by zero. However, this lower bound on | supp ( P X ) | does not grow with R , while the upper bound does increase with R . A possible way of establishing a lower bound that increases in R is by showing that I ( X ; Y 2 ) 1 2 log 1 + R 2 σ 2 2 . However, because not much is known about the structure of the optimal input distribution P X , it is not immediately evident how one can establish such an approximation or whether it is valid.

4. Secrecy Capacity Expression in the Low-Amplitude Regime

The result in Theorem 3 can also be used to establish the secrecy capacity for all R R ¯ n ( σ 1 2 , σ 2 2 ) , as is performed next.
Theorem 6.
If G σ 1 , σ 2 , R , n of (39) has at most one sign change and if R R ¯ n ( σ 1 2 , σ 2 2 ) , then
C s ( σ 1 2 , σ 2 2 , R , n ) = 1 2 σ 1 2 σ 2 2 R 2 R 2 E h n 2 2 R + s Z R s s 2 d s .
Proof. 
See Section 9.    □

Large n Asymptotics

It is important to note that as R ¯ n ( σ 1 2 , σ 2 2 ) grows as n , according to Theorem 4, when we keep R constant and increase the number of antennas to infinity, the low-amplitude regime becomes the only regime. The next theorem characterizes the secrecy capacity in this ‘massive-MIMO’ regime (i.e., where R is fixed and n goes to infinity).
Theorem 7.
Consider the expression in (58) and fix R 0 and σ 1 2 σ 2 2 , then
lim n C s ( σ 1 2 , σ 2 2 , R , n ) = R 2 1 2 σ 1 2 1 2 σ 2 2 .
Proof. 
See Appendix C.    □
Remark 3.
The result in Theorem 7 is reminiscent of the capacity in the wideband regime [31, Ch. 9], where the capacity increases linearly in the signal-to-noise ratio. Similarly, Theorem 7 shows that in the large antenna regime, the secrecy capacity grows linearly with the difference in the single-to-noise ratio between the legitimate user and the eavesdropper.
In Theorem 7, R was held fixed. It is also interesting to study the case when R is a function of n. Specifically, it is interesting to study the case when R = c n for some coefficient c.
Theorem 8.
Suppose that c c ( σ 1 2 , σ 2 2 ) . Then,
lim n C s ( σ 1 2 , σ 2 2 , c n , n ) n = 1 2 log 1 + c 2 / σ 1 2 1 + c 2 / σ 2 2 .
Proof. 
See Appendix D.    □
Notice that (60) is equivalent to the secrecy capacity of a vector Gaussian wiretap channel subject to an average power constraint. Gaussian wiretap channels under average power constraints have been extensively investigated [3,32] and, for an average power constraint E [ X 2 ] P , the resulting secrecy capacity is given by [3]
C G ( σ 1 2 , σ 2 2 , P , n ) = n 2 log 1 + P / σ 1 2 1 + P / σ 2 2 .
Thus, the result in (60) can be restated as
lim n C s ( σ 1 2 , σ 2 2 , c n , n ) C G ( σ 1 2 , σ 2 2 , c 2 , n ) = 1 .
In other words, for the regime considered in Theorem 8, for a large enough n the secrecy capacity under the amplitude constraint R n = c n behaves as the secrecy capacity under the average power constraint c 2 .

5. Beyond the Low-Amplitude Regime

To evaluate the secrecy capacity and find the optimal distribution P X beyond R ¯ n we rely on numerical estimations. We remark that, as pointed out in [25], the secrecy-capacity-achieving distribution is isotropic and consists of finitely many co-centric shells. Keeping this in mind, we can find the optimal input distribution P X by just optimizing over P X with X R .

5.1. Numerical Algorithm

In the case of scalar Gaussian wiretap channels, the secrecy capacity and the optimal input pmf can be estimated via the algorithm described in [33], i.e., a numerical procedure that takes inspiration from the deterministic annealing algorithm sketched in [34]. Let us denote by C ^ s ( σ 1 2 , σ 2 2 , R , n ) the numerical estimate of the secrecy capacity, and by P ^ X , the estimate of the optimal pmf on the input norm. To numerically evaluate C ^ s ( σ 1 2 , σ 2 2 , R , n ) and P ^ X , we extend to the vector case the algorithm in [33]. Our extension is defined in Algorithm 1. The input parameters of the main function are the noise variances σ 1 2 and σ 2 2 , the radius R , the vectors ρ and p being, respectively, the mass points positions and probabilities of a tentative input pmf, the number of iterations in the while loop N c , and finally, a tolerance ε to set the precision of the secrecy capacity estimate.
Algorithm 1 Secrecy capacity and optimal input pmf estimation
1:
procedure Main σ 1 2 , σ 2 2 , R , ρ , p , N c , ε
2:
    repeat
3:
         k 0
4:
        while  k < N c  do
5:
            k k + 1
6:
            ρ Gradient Ascent ( ρ , p )
7:
            p BlahutArimoto ( ρ , p )
8:
        end while
9:
        valid ← KKT Validation ( ρ , p , ε )
10:
        if valid = False then
11:
            ( ρ , p ) AddPoint ( ρ , p )
12:
        end if
13:
    until valid = True  
14:
     P ^ X ( ρ , p )  
15:
     C ^ s σ 1 2 , σ 2 2 , R , n I s ( X ; P ^ X )  
16:
    return  P ^ X , C ^ s σ 1 2 , σ 2 2 , R , n  
17:
end procedure
At its core, the numerical procedure iteratively refines its estimate of P X by running a gradient ascent algorithm to update the vector ρ and a variant of the Blahut–Arimoto algorithm [35] to update p .
The Gradient Ascent procedure uses the secrecy information as the objective function and stops either when ρ has reached convergence or at a given maximum number of iterations. Let us denote by I s ( X ; P X ) the secrecy information as a function of the input norm. Notice that, given a tentative pmf P ^ X of mass points ρ , probabilities p , and | supp ( P ^ X ) | = K , we have
I s ( X ; P ^ X ) = i = 1 K p i · Ξ ˜ ρ i ; P ^ X ,
where Ξ ˜ ( t ; P ^ X ) is the secrecy density, with respect to the input norm, defined in (A9) and where p i and ρ i are, respectively, the ith element of p and ρ . Then, the Gradient Ascent updates are given by
ρ i = ρ i + α · ρ i I s ( X ; P ^ X ) , i = 1 , , K ,
where the partial derivatives are defined in Appendix E and α is the step size in the gradient ascent. We remark that, to ensure convergence to a local maximum, we use the gradient ascent algorithm in a backtracking line search version [36]. By suitably adjusting the step size α at each iteration, the backtracking line search version guarantees us that each new update of ρ provides a nondecreasing associated secrecy information, compared to the previous update of ρ .
The Blahut–Arimoto function runs a variant of the Blahut–Arimoto algorithm. For the scalar case, an example of the Blahut–Arimoto optimization, applied to wiretap channels, is given in [37]. Similar results can be extended to the case of vector wiretap channels. Given the current probabilities p i ’s, the updates are obtained by evaluating
p i = p i exp Ξ ˜ ρ i ; P ^ X , i = 1 , , K ,
and finally, by normalizing each p i and assigning them to the entries of the vector p
p i = p i k = 1 K p i , i = 1 , , K .
Similarly to Gradient Ascent, the Blahut–Arimoto procedure stops either when the values of p have reached a stable convergence or after a set number of updates.
Since the joint optimization of ρ and p is not numerically feasible, we need to reiterate both the Blahut–Arimoto and the Gradient Ascent procedures a given number of times, namely N c . The parameter N c is chosen empirically in such a way that ρ and p become fairly stable, and therefore we can expect to have reached joint convergence for both of them.
Then, the KKT Validation procedure ensures that the values of ρ and p are indeed close to the optimal ones. We check the optimality of P ^ X by verifying whether the KKT conditions in Lemma 2 are satisfied. Since the algorithm has to verify the KKT conditions numerically, i.e., with finite precision, we find it more convenient to check the negated version of (28), where a tolerance parameter ε is introduced that trades off accuracy with computational burden. Specifically, P ^ X is not an optimal input pmf if any of the following conditions are satisfied:
| Ξ ˜ ( t ; P ^ X ) I s ( X ; P ^ X ) | > ε , for some t supp ( P ^ X )
I s ( X ; P ^ X ) + ε < Ξ ˜ ( t ; P ^ X ) , for some t [ 0 , R ] .
Note that in (67), in place of the secrecy capacity C s ( σ 1 2 , σ 2 2 , R , n ) , which is unknown, we used the secrecy information given by the tentative pmf P ^ X , i.e., I s ( X ; P ^ X ) . Condition (67a) is derived by negating (28a): there exists a t supp ( P ^ X ) , such that Ξ ˜ ( t ; P ^ X ) is ε -away from the secrecy information I s ( X ; P ^ X ) . Condition (67b) is the negated version of (28b): there exists a t [ 0 , R ] such that Ξ ˜ ( t ; P ^ X ) is at least ε -larger than the secrecy information I s ( X ; P ^ X ) . With some abuse of notation, we refer to (67) as to the ε -KKT conditions. If the tentative pmf P ^ X does not pass the check of the ε -KKT conditions, then the algorithm checks whether a new point has to be added to the pmf.
The Add Point procedure evaluates the position of the new mass point
ρ new = arg max t [ 0 , R ] Ξ ˜ ( t ; P ^ X ) .
The point ρ new is appended to the vector ρ and the probabilities p are set to be equiprobable.
The whole procedure is repeated until KKT Validation gives a positive outcome, and at that point the algorithm returns P ^ X as the optimal pmf estimate and C ^ s ( σ 1 2 , σ 2 2 , R , n ) as the secrecy capacity estimate.
Remark 4.
In this work, we focus on the secrecy capacity and on the secrecy-capacity-achieving input distribution. However, it is possible to study other points of the rate-equivocation region of the degraded wiretap Gaussian channel by suitably changing the KKT conditions, as reported in [21], Equations (33) and (34). With the due modifications, the proposed optimization algorithm can find the optimal input distribution for any point of the rate-equivocation region.

5.2. Numerical Results

In Figure 2, we show with black dots the numerical estimate C ^ s ( σ 1 2 , σ 2 2 , R , n ) versus R , evaluated via Algorithm 1, for σ 1 2 = 1 , σ 2 2 = 1.5 , 10 , n = 2 , 4 , and tolerance ε = 10 6 . For the same values of σ 1 2 , σ 2 2 , and n we also show, with the red lines, the analytical low-amplitude regime secrecy capacity C s ( σ 1 2 , σ 2 2 , R , n ) versus R from Theorem 6. In addition, we show with blue dotted lines the secrecy capacity under the average power constraint E X 2 R 2 :
C G ( σ 1 2 , σ 2 2 , R 2 , n ) = n 2 log 1 + R 2 / σ 1 2 1 + R 2 / σ 2 2 C s ( σ 1 2 , σ 2 2 , R , n ) ,
where the inequality follows by noting that the average power constraint E X 2 R 2 is weaker than the amplitude constraint X R . Finally, the dashed vertical lines show R ¯ n , i.e., the upper limit of the low-amplitude regime, for the considered values of σ 1 2 , σ 2 2 , and n.
In Figure 3, we consider discrete values for R and for each value of R we plot the corresponding estimated pmf P ^ X , evaluated via Algorithm 1, for σ 1 2 = 1 , σ 2 2 = 1.5 , n = 2 , 8 , and tolerance ε = 10 6 . The figure shows, at each R , the normalized amplitude of support points in the estimated pmf, while the size of the circles qualitatively shows the probability associated with each support point. Similarly, Figure 4 shows the evolution of the pmf estimate for σ 1 2 = 1 , σ 2 2 = 10 , n = 2 , 8 , and ε = 10 6 . It is interesting to notice how in both Figure 3 and Figure 4 when a new mass point is added to the pmf, it appears in zero. Moreover, the mass point of radius R always seems to be optimal.
Finally, Figure 5 shows the output distributions of the legitimate user and of the eavesdropper in the case of σ 1 2 = 1 , σ 2 2 = 10 , n = 2 , and for two values of R . At the top of the figure, the distributions are shown for R = 2.25 , which is a value close to R ¯ 2 ( 1 , 10 ) . At the bottom of the figure, the distributions are shown for R = 7.5 . For both values of R , the legitimate user sees an output distribution where the co-centric rings of the input distribution are easily distinguishable. On the other hand, as expected, the output distribution seen by the eavesdropper is close to a Gaussian.

6. Proof of Theorem 3

Estimation Theoretic Representation

By Remark 1, if G σ 1 , σ 2 , R , n has at most one sign change, P X R is secrecy-capacity-achieving if, and only if, for all x = R
Ξ ( 0 ; P X R ) Ξ ( x ; P X R ) .
We seek to re-write the condition (70) in the estimation theoretic form. To that end, we need the following representation of the relative entropy [38]:
D ( P X 1 + t Z P X 2 + t Z ) = 1 2 t g ( s ) s 2 d s ,
where
g ( s ) = E X 1 2 ( X 1 + s Z ) 2 E X 1 1 ( X 1 + s Z ) 2
and where
i ( y ) = E [ X i | X i + s Z = y ]
= x i f X i | X i + s Z ( x i y ) d x i , i { 1 , 2 } .
Another fact that will be important for our expression is
E X R X R + s Z = y = R y y h n 2 y R s ,
see, for example [27], for the proof.
Next, using (71) and (75) note that for any x = R we have that for i { 1 , 2 }
D ( P x + σ i 2 Z P X R + σ i 2 Z ) = 1 2 σ i 2 E x R ( x + s Z ) x + s Z h n 2 x + s Z R s 2 s 2 d s
= 1 2 σ i 2 E x 2 E R ( x + s Z ) x + s Z h n 2 x + s Z R s 2 s 2 d s
= 1 2 σ i 2 R 2 R 2 E h n 2 2 x + s Z R s s 2 d s ,
where (77) follows from
mmse ( X R | Y ) = E X R E [ X R | Y ] 2
= E X R 2 E E [ X R | Y ] 2 .
Moreover, for x = 0 , it holds
D ( P 0 + σ i 2 Z P X R + σ i 2 Z ) = 1 2 σ i 2 R 2 E h n 2 2 R Z s s 2 d s .
Now, note that by using the definition of Ξ ( x ; P X R ) in (30), (78), and (81) we have that for x = R
Ξ ( x ; P X R ) = D ( P x + σ 1 2 Z P X R + σ 1 2 Z ) D ( P x + σ 2 2 Z P X R + σ 2 2 Z )
= 1 2 σ 1 2 σ 2 2 R 2 R 2 E h n 2 2 x + s Z R s s 2 d s ,
and
Ξ ( 0 ; P X R ) = D ( P 0 + σ 1 2 Z P X R + σ 1 2 Z ) D ( P 0 + σ 2 2 Z P X R + σ 2 2 Z )
= 1 2 σ 1 2 σ 2 2 R 2 E h n 2 2 s Z R s s 2 d s
Consequently, the necessary and sufficient condition in Theorem 2 can be equivalently written as
σ 1 2 σ 2 2 E h n 2 2 s Z R s + h n 2 2 x + s Z R s 1 s 2 d s 0 .
Now R ¯ n ( σ 1 2 , σ 2 2 ) will be the largest R that satisfies (86), which concludes the proof of Theorem 3.

7. Proof of Theorem 4

The objective of the proof is to understand how the condition in (45) behaves as n . To study the large n behavior, we need to the following bounds on the h ν [39,40]: for ν > 1 2
h ν ( x ) = x 2 ν 1 2 + ( 2 ν 1 ) 2 4 + x 2 · g ν ( x ) ,
where
1 g ν ( x ) 2 ν 1 2 + ( 2 ν 1 ) 2 4 + x 2 ν + ν 2 + x 2 .
Now let R = c n for some c > 0 . The goal is to understand the behavior of
E h n 2 2 s Z R s + h n 2 2 x + s Z R s
as n goes to infinity. First, let
V n = Z n ,
and note that
lim n E h n 2 2 s Z c n s = lim n E c V n s n 1 2 n + ( n 1 ) 2 4 n 2 + c V n s 2 · g n 2 c V n s n 2
= E lim n c V n s n 1 2 n + ( n 1 ) 2 4 n 2 + c V n s 2 · g n 2 c V n s n 2
= c 2 s 2 + s 4 + c 2 2 ,
where (92) follows from the dominated convergence theorem, and (93) follows since, by the law of large numbers we have, almost surely,
lim n V n 2 = lim n 1 n i = 1 n Z i 2 = E [ Z 2 ] = 1 .
Second, let
W n = x + s Z n ,
where, without loss of generality, we take x = [ R , 0 , , 0 ]
lim n E h n 2 2 x + s Z c n s = lim n E c W n s · g n 2 c W n s n n 1 2 n + ( n 1 ) 2 4 n 2 + c W n s 2 2
= E lim n c W n s · g n 2 c W n s n n 1 2 n + ( n 1 ) 2 4 n 2 + c W n s 2 2
= c 2 ( c 2 + s ) s 2 + s 2 4 + c 2 ( c 2 + s ) 2 ,
where (97) follows from the dominated convergence theorem and where (98) follows since, by the strong law of large numbers we have, almost surely,
lim n W n 2 = lim n 1 n ( s Z 1 + c n ) 2 + s lim n 1 n i = 2 n Z i 2
= c 2 + s .
Combining (93) and (98) with (45), we arrive at
σ 1 2 σ 2 2 c 2 s 2 + s 4 + c 2 2 + c 2 ( c 2 + s ) s 2 + s 2 4 + c 2 ( c 2 + s ) 2 1 s 2 d s = 0 .

8. Proof of Theorem 5

8.1. Implicit Upper Bound

A consequence of the KKT conditions of Lemma 1 is the inclusion
supp ( P X ) x [ R , R ] : Ξ ( x ) C s = 0
which suggests the following upper bound on the number of support points of P X :
| supp ( P X ) | N [ R , R ] , Ξ ( x ) C s ( σ 1 2 , σ 2 2 , R , 1 )
= N [ R , R ] , E g ( Y 1 ) + log σ 2 σ 1 C s | X = x
S g ( · ) + log σ 2 σ 1 C s
N R , g ( · ) + log σ 2 σ 1 C s
= N [ L , L ] , g ( · ) + log σ 2 σ 1 C s
< ,
where (104) follows from using (21); (105) follows from applying Karlin’s oscillation Theorem 1 and the fact that the Gaussian pdf is a strictly totally positive kernel, which was shown in [26]; (107) is proved in Lemma A3 in the Appendix B; and (108) follows because g ( · ) is an analytic function in ( L , L ) . The implicit upper bound (49) of Theorem 5 follows from (107) and (108).

8.2. Explicit Upper Bound

The key to finding an explicit upper bound on the number of zeros will be the following complex-analytic result.
Lemma 3
(Tijdeman’s Number of Zeros Lemma [41]). Let L , s , t be positive numbers, such that s > 1 . For the complex valued function f 0 , which is analytic on | z | < ( s t + s + t ) L , its number of zeros N ( D L , f ) within the disk D L = { z : | z | L } satisfies
N ( D L , f ) 1 log s log max | z | ( s t + s + t ) L | f ( z ) | log max | z | t L | f ( z ) | .
Furthermore, the following loosened version of the implicit upper bound in (49) will be useful.
Lemma 4.
| supp ( P X ) | N [ L , L ] , h ( · ) + 1
where
h ( y ) σ 1 2 f Y 1 ( y ) = E N E [ X | Y 2 = y + N ] y σ 2 2 E [ X | Y 1 = y ] y σ 1 2
= E N log f Y 2 ( y + N ) σ 2 2 σ 1 2 E [ X | Y 1 = y ] y σ 1 2 ,
and where N N ( 0 , σ 2 2 σ 1 2 ) .
Proof. 
Starting from (107), we can write
| supp ( P X ) | N [ L , L ] , g ( · ) + log σ 2 σ 1 C s
N [ L , L ] , g ( · ) + 1
= N [ L , L ] , σ 1 2 f Y 1 ( · ) g ( · ) + 1
where in step (114), we applied Rolle’s theorem, and in step (115), we used the fact that multiplying by a strictly positive function (i.e., σ 1 2 f Y 1 ) does not change the number of zeros. The first derivative of g can be computed as follows:
g ( y ) = E d d y log f Y 2 ( y + N ) d d y log f Y 1 ( y )
= E N E [ X | Y 2 = y + N ] y σ 2 2 E [ X | Y 1 = y ] y σ 1 2 ,
where in the last step, we used the well-known Tweedie’s formula (see for example [42,43]):
E [ X | Y i = y ] = y + σ i 2 d d y log f Y i ( y ) .
An alternative expression for the first term in the right-hand side (RHS) of (116) is as follows:
E d d y log f Y 2 ( y + N ) = f N ( n ) d d y log f Y 2 ( y + n ) d n
= d d n f N ( n ) · log f Y 2 ( y + n ) d n
= n σ 2 2 σ 1 2 f N ( n ) · log f Y 2 ( y + n ) d n
= 1 σ 2 2 σ 1 2 E N log f Y 2 ( y + N ) ,
where f N ( n ) = ϕ σ 2 2 σ 1 2 ( n ) . The proof is concluded by letting
h ( y ) σ 1 2 f Y 1 ( y ) g ( y ) .
To apply Tijdeman’s number of zeros Lemma, upper and lower bounds to the maximum module of the complex analytic extension of h over the disk D L = { z : | z | L } are proposed in Lemmas A4 and A5 in the Appendix B. Using those bounds, we can provide an upper bound on the number of mass points as follows:
N [ L , L ] , h ( · )
N D L , h ˘ ( · )
min s > 1 , t > 0 log max | z | ( s t + s + t ) L | h ˘ ( z ) | max | z | t L | h ˘ ( z ) | log s
log e ( 2 e + 1 ) 2 L 2 2 σ 1 2 2 π σ 1 2 a 1 ( 2 e + 1 ) 2 L 2 + a 2 ( 2 e + 1 ) L + a 3 c 1 L c 2 R exp ( L + R ) 2 2 σ 1 2 2 π σ 1 2
= ( 2 e + 1 ) 2 L 2 2 σ 1 2 + ( L + R ) 2 2 σ 1 2 + log a 1 ( 2 e + 1 ) 2 L 2 + a 2 ( 2 e + 1 ) L + a 3 c 1 L c 2 R = ( 2 e + 1 ) 2 ( d 1 R + d 2 ) 2 2 σ 1 2 + ( ( d 1 + 1 ) R + d 2 ) 2 2 σ 1 2
+ log a 1 ( 2 e + 1 ) 2 ( d 1 R + d 2 ) 2 + a 2 ( 2 e + 1 ) ( d 1 R + d 2 ) + a 3 ( c 1 d 1 c 2 ) R + c 1 d 2
b 1 R 2 σ 1 2 + b 2 + log b 3 R 2 + b 4 R + b 5 b 6 R + b 7
b 1 R 2 σ 1 2 + O ( log ( R ) ) ,
where (124) follows because extending to a larger domain can only increase the number of zeros; (125) follows from the Tijdeman’s Number of Zeros Lemma; (126) follows from choosing s = e and t = 1 and using bounds in Lemmas A4 and A5; (128) follows from using the value of L in (A38); (129) using the bound ( a + b ) 2 2 ( a 2 + b 2 ) and defining
b 1 = ( 2 e + 1 ) 2 d 1 2 + ( d 1 + 1 ) 2
= ( 2 e + 1 ) 2 σ 2 + σ 1 σ 2 σ 1 2 + σ 2 + σ 1 σ 2 σ 1 + 1 2
b 2 = ( ( 2 e + 1 ) 2 + 1 ) d 2 2 σ 1 2
= ( ( 2 e + 1 ) 2 + 1 ) σ 1 2 σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2
= ( ( 2 e + 1 ) 2 + 1 ) 1 + 2 σ 2 2 σ 2 2 σ 1 2 C s
b 3 = 2 ( 2 e + 1 ) 2 a 1 d 1 2
= 2 ( 2 e + 1 ) 2 3 σ 1 2 σ 2 2 σ 2 2 σ 1 2 σ 2 + σ 1 σ 2 σ 1 2
b 4 = ( 2 e + 1 ) d 1 a 2
= ( 2 e + 1 ) σ 2 + σ 1 σ 2 σ 1 2 σ 1 2 σ 2 2 σ 2 2 σ 1 2 + 2
b 5 = 2 ( 2 e + 1 ) 2 a 1 d 2 2 + ( 2 e + 1 ) a 2 d 2 + a 3 = 2 ( 2 e + 1 ) 2 3 σ 1 2 σ 2 2 σ 2 2 σ 1 2 σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2 + ( 2 e + 1 ) 2 σ 1 2 σ 2 2 σ 2 2 σ 1 2 + 2 σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2
+ σ 1 2 σ 2 2 σ 1 2 · | log ( 2 π σ 2 2 ) | 2 + 24 ( σ 2 2 σ 1 2 ) 2 σ 2 4 + π 2
b 6 = c 1 d 1 c 2
= σ 2 2 σ 1 2 σ 2 2 σ 2 + σ 1 σ 2 σ 1 σ 2 2 + σ 1 2 σ 2 2 = 2 σ 1 σ 2
b 7 = c 1 d 2
= σ 2 2 σ 1 2 σ 2 2 σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2 ;
and (130) follows from the fact that the b 1 , b 3 , b 4 , and b 6 coefficients do not depend on R and the fact that the coefficients b 2 , b 5 , and b 4 , while they do depend on R through C s , do not grow with R . The fact that C s does not grow with R follows from the bound in (69).
Finally, the explicit upper bound on the number of support points of P X in (52) is a consequence of (130).

9. Proof of Theorem 6

Using the KKT conditions in (28), we have that for x = [ R , 0 , , 0 ]
C s ( σ 1 2 , σ 2 2 , R , n ) = Ξ ( x ; P X R )
= D ( f Y 1 | X ( · | x ) f Y 1 ) D ( f Y 2 | X ( · | x ) f Y 2 )
= 1 2 σ 1 2 σ 2 2 R 2 R 2 E h n 2 2 R + s Z R s s 2 d s
where the last expression was computed in (83). This concludes the proof.

10. Conclusions

This paper has focused on the secrecy capacity of the n-dimensional vector Gaussian wiretap channel under the peak power (or amplitude constraint) in a so-called low (but not vanishing) amplitude regime. In this regime, the optimal input distribution P X R is supported on a single n-dimensional sphere of radius R . The paper has identified the largest R ¯ n , such that the distribution P X R is optimal. In addition, the asymptotic of R ¯ n has been completely characterized as dimension n approaches infinity. As a by-product of the analysis, the capacity in the low-amplitude regime has also been characterized in a more or less closed form. The paper has also provided a number of supporting numerical examples. Implicit and explicit upper bounds have been proposed on the number of mass points for the optimal input distribution P X in the scalar case with n = 1 .
There are several interesting future directions. For example, one interesting direction would be to determine a regime in which a mixture of a mass point at zero and P X R is optimal. It would also be interesting to establish a lower bound on the number of mass points in the support of the optimal input distribution when n = 1 . We note that such a lower bound was obtained for a point-to-point channel in [30]. We finally remark that the extension of the results of this paper to nondegraded wiretap channels is not trivial and also constitutes an interesting but ambitious future direction.

Author Contributions

A.F., L.B. and A.D. contributed equally to this work. All authors have read and agreed to the published version of the manuscript. Part of this work was presented at the 2021 IEEE Information Theory Workshop [44], at the 2022 IEEE International Symposium on Information Theory [45], at the 2022 IEEE International Mediterranean Conference on Communications and Networking [33], and in the PhD dissertation in [46].

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Datasets for the numerical results provided in this work are available at [1].

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Examples of the Function Gσ1,σ2,R,n

In this section, we give supporting numerical arguments that the function G σ 1 , σ 2 , R , n defined in (39) has at most one sign change. Figure A1 demonstrates the behavior of the function G σ 1 , σ 2 , R , n . In addition, the code that generates the function G σ 1 , σ 2 , R , n for various values of n , σ 1 , and σ 2 is provided in [1].
Figure A1. Examples of the function G σ 1 , σ 2 , R , n defined in (39). (a) n = 3 , σ 1 = 1 , and σ 2 = 2 . (b) n = 11 , σ 1 = 1 , and σ 2 = 2 . (c) n = 4 , σ 1 = 3 , and σ 2 = 3.1 . (d) n = 11 , σ 1 = 3 , and σ 2 = 3.1 .
Figure A1. Examples of the function G σ 1 , σ 2 , R , n defined in (39). (a) n = 3 , σ 1 = 1 , and σ 2 = 2 . (b) n = 11 , σ 1 = 1 , and σ 2 = 2 . (c) n = 4 , σ 1 = 3 , and σ 2 = 3.1 . (d) n = 11 , σ 1 = 3 , and σ 2 = 3.1 .
Entropy 25 00741 g0a1

Appendix B. Derivative of the Secrecy-Density

Lemma A1.
The derivative of the secrecy density for the input P X R is
Ξ ˜ ( x ; P X R ) = x E M ˜ 2 ( σ 1 Q n + 2 ) M 1 ( σ 1 Q n + 2 )
where Q n + 2 2 is a noncentral chi-square random variable with n + 2 degrees of freedom and noncentrality parameter x 2 σ 1 2 and
M i ( y ) = 1 σ i 2 R y h n 2 R σ i 2 y 1 , i { 1 , 2 }
M ˜ 2 ( y ) = E M 2 ( y + W ) ,
where W N ( 0 n + 2 , ( σ 2 2 σ 1 2 ) I n + 2 ) .
Proof. 
We start with the secrecy density expressed in spherical coordinates. A quick way to obtain the information densities in this coordinate system is to note that:
I ( X ; Y i )
= h ( Y i ) h ( N i )
= h ( Y i ) + ( n 1 ) E [ log Y i ] + h λ Y i Y i h ( N i )
= h ( Y i 2 ) + n 2 1 E [ log Y i 2 ] + log π n 2 Γ n 2 n 2 log ( 2 π e σ i 2 )
= h σ i 2 X σ i + N ˜ i 2 + n 2 1 E log σ i 2 X σ i + N ˜ i 2 + log π n 2 Γ n 2 n 2 log ( 2 π e σ i 2 )
= h X σ i + N ˜ i 2 + n 2 1 E log X σ i + N ˜ i 2 log ( 2 e ) n 2 Γ n 2 ,
where (A5) holds by [47], Lemma 6.17, and by independence between Y i and Y i Y i ; the term h λ ( · ) is a differential entropy-like quantity for random vectors on the n-dimensional unit sphere ([47], Lemma 6.16); (A6) holds because Y i Y i is uniform on the unit sphere and thanks to [47], Lemma 6.15; the term Γ ( z ) is the gamma function; and in (A7) we have N ˜ i N ( 0 n , I n ) . It is now required to write the secrecy density as follows:
Ξ ˜ ( x ; P X ) = i 1 ( x ; P X ) i 2 ( x ; P X )
where
i j ( x ; P X ) = 0 f χ n 2 ( x 2 σ j 2 ) ( y ) log 0 R f χ n 2 ( t 2 σ j 2 ) ( y ) d P X ( t ) y n 2 1 d y log ( 2 e ) n 2 Γ n 2 ,
for j { 1 , 2 } . The term f χ n 2 ( λ ) ( y ) is the noncentral chi-square pdf with n degrees of freedom and noncentrality parameter λ .
Given two values ρ 1 , ρ 2 with ρ 1 > ρ 2 , write
i j ( ρ 1 ; P X ) i j ( ρ 2 ; P X ) = 0 f χ n 2 ( ρ 1 2 σ j 2 ) ( y ) f χ n 2 ( ρ 2 2 σ j 2 ) ( y ) log y n 2 1 f Y σ j 2 ( y ; P X ) d y
= 0 F χ n 2 ( ρ 2 2 σ j 2 ) ( y ) F χ n 2 ( ρ 1 2 σ j 2 ) ( y ) d d y log y n 2 1 f Y σ j 2 ( y ; P X ) d y
where we have integrated by parts and where F χ n 2 ( λ ) ( y ) is the cumulative distribution function of χ n 2 ( λ ) . Now notice that
0 F χ n 2 ( ρ 2 2 σ j 2 ) ( y ) F χ n 2 ( ρ 1 2 σ j 2 ) ( y ) d y = ρ 1 2 ρ 2 2 σ j 2 .
Since χ n 2 ( ρ 1 2 σ j 2 ) statistically dominates χ n 2 ( ρ 2 2 σ j 2 ) , the integrand function in (A13) is always positive. We can introduce an auxiliary output random variable Q j , for j { 1 , 2 } , with pdf
f Q j ( y ; ρ 1 , ρ 2 ) = σ j 2 ρ 1 2 ρ 2 2 F χ n 2 ( ρ 2 2 σ j 2 ) ( y ) F χ n 2 ( ρ 1 2 σ j 2 ) ( y ) ,
for y > 0 , to rewrite (A12) as follows:
i j ( ρ 1 ; P X ) i j ( ρ 2 ; P X ) = ρ 1 2 ρ 2 2 σ j 2 0 f Q j ( y ; ρ 1 , ρ 2 ) d d y log f Y σ j 2 ( y ; P X ) y n 2 1 d y .
We evaluate the derivative in (A15) as:
d d y log f Y σ j 2 ( y ; P X ) y n 2 1
= y n 2 1 f Y σ j 2 ( y ; P X ) 0 R d d y f χ n 2 ( t 2 σ j 2 ) ( y ) y n 2 1 d P X ( t )
= y n 2 1 f Y σ j 2 ( y ; P X ) 0 R f χ n 2 2 ( t 2 σ j 2 ) ( y ) 2 y n 2 1 1 2 + n 2 1 y f χ n 2 ( t 2 σ j 2 ) ( y ) y n 2 1 d P X ( t )
= E 1 2 f χ n 2 2 ( X 2 σ j 2 ) ( Y 2 σ j 2 ) f χ n 2 ( X 2 σ j 2 ) ( Y 2 σ j 2 ) 1 2 + n 2 1 Y 2 σ j 2 | Y 2 σ j 2 = y
= E 1 2 X Y I n 2 2 ( X Y σ j 2 ) I n 2 1 ( X Y σ j 2 ) 1 2 + n 2 1 Y 2 σ j 2 | Y 2 σ j 2 = y
= E 1 2 X Y h n 2 X Y σ j 2 1 2 | Y 2 σ j 2 = y
where, in (A16), we used
f Y σ j 2 ( y ; P X ) = 0 R f χ n 2 ( t 2 σ j 2 ) ( y ) d P X ( t ) ;
in (A17), we used the relationship
d d y f χ n 2 ( ρ 2 ) ( y ) = 1 2 f χ n 2 2 ( ρ 2 ) ( y ) 1 2 f χ n 2 ( ρ 2 ) ( y ) ;
and (A20) follows from the recurrence relationship
I ν 1 ( z ) I ν + 1 ( z ) = 2 ν z I ν ( z ) .
Putting together (A15) and (A20), we find
i j ( ρ 1 ; P X ) i j ( ρ 2 ; P X ) = ρ 1 2 ρ 2 2 2 σ j 2 E E X Y h n 2 X Y σ j 2 1 | Y 2 σ j 2 = Q j .
We are now in the position to compute the derivative of the information density as
i j ( ρ ; P X ) = lim h 0 i j ( ρ + h ; P X ) i j ( ρ ; P X ) h
= ρ σ j 2 E E X Y h n 2 X Y σ j 2 1 | Y 2 σ j 2 = Q ,
where Q χ n + 2 2 ( ρ 2 σ j 2 ) thanks to Lemma A2.
The final result is obtained by letting
Ξ ˜ ( x ; P x ) = i 1 ( x ; P X ) i 2 ( x ; P X )
and by specializing the result to the input P X R . □
Lemma A2.
Consider the pdf f Q j ( y ; ρ 1 , ρ 2 ) defined in (A14). For any ρ 0 we have
lim h 0 f Q j ( y ; ρ + h , ρ ) = f χ n + 2 2 ( ρ 2 σ j 2 ) ( y ) , y > 0 .
Proof. 
Thanks to the definition (A14), we have
lim h 0 f Q j ( y ; ρ + h , ρ )
= lim h 0 σ j 2 h ( 2 ρ + h ) F χ n 2 ( ρ 2 σ j 2 ) ( y ) F χ n 2 ( ( ρ + h ) 2 σ j 2 ) ( y )
= lim h 0 σ j 2 h ( 2 ρ + h ) 0 y f χ n 2 ( ρ 2 σ j 2 ) ( t ) f χ n 2 ( ( ρ + h ) 2 σ j 2 ) ( t ) d t
= σ j 2 2 ρ 0 y i = 0 lim h 0 1 h e ρ 2 2 σ j 2 ρ 2 2 σ j 2 i i ! e ( ρ + h ) 2 2 σ j 2 ( ρ + h ) 2 2 σ j 2 i i ! f χ n + 2 i 2 ( t ) d t
= σ j 2 2 ρ 0 y i = 0 d d ρ e ρ 2 2 σ j 2 ρ 2 2 σ j 2 i i ! f χ n + 2 i 2 ( t ) d t
= 1 2 0 y i = 0 e ρ 2 2 σ j 2 ρ 2 2 σ j 2 i i ! + e ρ 2 2 σ j 2 ρ 2 2 σ j 2 i 1 ( i 1 ) ! 1 ( i 1 ) f χ n + 2 i 2 ( t ) d t
= 1 2 0 y f χ n 2 ( ρ 2 σ j 2 ) ( t ) + f χ n + 2 2 ( ρ 2 σ j 2 ) ( t ) d t
= 0 y d d t f χ n + 2 2 ( ρ 2 σ j 2 ) ( t ) d t
= f χ n + 2 2 ( ρ 2 σ j 2 ) ( y ) ,
where 1 ( · ) is the indicator function; in (A31) we used the Poisson-weighted mixture representation of the noncentral chi-square pdf, and in (A35), we used (A22). □
Lemma A3.
There exists some L = L ( σ 1 , σ 2 , R ) < such that
N R , g ( · ) + log σ 2 σ 1 C s = N [ L , L ] , g ( · ) + log σ 2 σ 1 C s < .
Furthermore, L can be upper-bounded as follows:
L R d 1 + d 2
where
d 1 = σ 2 + σ 1 σ 2 σ 1 ,
d 2 = σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2 σ 2 2 σ 1 2 σ 2 2 + 2 C G 1 σ 1 2 1 σ 2 2 ,
with
C G ( σ 1 2 , σ 2 2 , R 2 , 1 ) = 1 2 log 1 + R 2 / σ 1 2 1 + R 2 / σ 2 2 .
Proof. 
First, note that C s C G thanks to (69). Second, for | y | R , we can lower-bound the function g as follows:
g ( y ) = E log f Y 2 ( y + N ) log f Y 1 ( y )
= E log E [ ϕ σ 2 ( y + N X ) | N ] log E [ ϕ σ 1 ( y X ) ]
E log ϕ σ 2 ( y + N X ) log E [ ϕ σ 1 ( y X ) ]
log σ 1 σ 2 E ( y + N X ) 2 2 σ 2 2 + ( | y | R ) 2 2 σ 1 2
= log σ 1 σ 2 E ( y X ) 2 2 σ 2 2 σ 2 2 σ 1 2 2 σ 2 2 + ( | y | R ) 2 2 σ 1 2
log σ 1 σ 2 ( | y | + R ) 2 2 σ 2 2 σ 2 2 σ 1 2 2 σ 2 2 + ( | y | R ) 2 2 σ 1 2 ,
where (A44) follows from applying Jensen’s inequality and the law of iterated expectation to the first term; (A45) follows from
E [ ϕ σ 1 ( y X ) ] ϕ σ 1 ( | y | R ) , | y | R ;
and (A47) follows from ( y X ) 2 ( | y | + R ) 2 for all | y | R | X | . The RHS of
g ( y ) + log σ 2 σ 1 C s ( | y | + R ) 2 2 σ 2 2 σ 2 2 σ 1 2 2 σ 2 2 + ( | y | R ) 2 2 σ 1 2 C s
is strictly positive when
| y | > R 1 σ 1 2 + 1 σ 2 2 + 4 R 2 σ 1 2 σ 2 2 + 1 σ 1 2 1 σ 2 2 σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2 .
By using the bound a + b a + b , we arrive at
| y | R σ 2 + σ 1 σ 2 σ 1 + σ 2 2 σ 1 2 σ 2 2 + 2 C s 1 σ 1 2 1 σ 2 2 .
This concludes the proof for the bound on L. □
Lemma A4.
Let h ˘ : C C denote the complex extension of the function h in (123). Then, for B R , we have that
max | z | B | h ˘ ( z ) | 1 2 π σ 1 2 e B 2 2 σ 1 2 a 1 B 2 + a 2 B + a 3
where
a 1 = 3 σ 1 2 σ 2 2 σ 2 2 σ 1 2 ,
a 2 = 2 σ 1 2 σ 2 2 σ 2 2 σ 1 2 + 2 ,
a 3 = σ 1 2 σ 2 2 σ 1 2 | log ( 2 π σ 2 2 ) | 2 + 24 ( σ 2 2 σ 1 2 ) 2 σ 2 4 + π 2 .
Proof. 
Let us denote z = z R + i z I , where z R and z I are real numbers and i = 1 is the imaginary unit. Then, by triangular inequality, we have:
| h ˘ ( z ) | = σ 1 2 f Y 1 ( z ) E N log f Y 2 ( z + N ) σ 2 2 σ 1 2 E X ϕ σ 1 ( z X ) + z f Y 1 ( z )
f Y 1 ( z ) σ 1 2 σ 2 2 σ 1 2 E | N | · | log f Y 2 ( z + N ) | + | z | + E | X | · | ϕ σ 1 ( z X ) | .
Next, let us upper-bound each contribution of (A57). For | z | B , we have
log f Y 2 ( z + n ) 2
= log f Y 2 ( z + n ) + i arg ( f Y 2 ( z + n ) ) 2
= log 2 | f Y 2 ( z + n ) | + arg 2 ( f Y 2 ( z + n ) )
= log 2 E ϕ σ 2 ( z + n X ) + arg 2 E ϕ σ 2 ( z + n X )
log 2 1 2 π σ 2 2 E exp ( z R + n X ) 2 z I 2 2 σ 2 2 + arg 2 x α x exp ( i θ x )
z I 2 2 σ 2 2 1 2 log ( 2 π σ 2 2 ) + log E e ( z R + n X ) 2 2 σ 2 2 2 + π 2
2 z I 2 2 σ 2 2 1 2 log ( 2 π σ 2 2 ) 2 + 2 log 2 E e ( z R + n X ) 2 2 σ 2 2 + π 2
2 z I 2 2 σ 2 2 1 2 log ( 2 π σ 2 2 ) 2 + 2 E 2 ( z R + n X ) 2 4 σ 2 4 + π 2
2 z I 2 2 σ 2 2 1 2 log ( 2 π σ 2 2 ) 2 + 2 ( z R + n ) 2 + R 2 2 4 σ 2 4 + π 2
2 B 2 σ 2 2 + | log ( 2 π σ 2 2 ) | 2 + 8 ( B 4 + n 4 ) + R 4 σ 2 4 + π 2 ,
where step (A61) holds by triangular inequality; step (A62) holds by noticing that
π < arg x supp ( P X ) α x exp ( i θ x ) π ,
where { α x } and { θ x } are real numbers that depend on x; (A63) follows from using the bound ( a + b ) 2 2 ( a 2 + b 2 ) ; (A64) holds because x log 2 ( x ) is a decreasing function for x < 1 and because E e ( z R + n X ) 2 2 σ 2 2 e E ( z R + n X ) 2 2 σ 2 2 , which follows from Jensen’s inequality; (A65) follows from E [ X ] = 0 and E [ ( X ) 2 ] R 2 ; and (A66) follows from the bound | a + b | k 2 k 1 ( | a | k + | b | k ) for k 1 . Furthermore, given that | z R | B and | z I | B , we arrive at the bound
( z R + n ) 2 + R 2 2 2 8 ( B 4 + n 4 ) + R 4 .
Consequently,
\frac{ \mathbb{E}\left[ |N| \cdot \left| \log f_{Y_2}(z+N) \right| \right] }{ \sigma_2^2 - \sigma_1^2 }
\le \frac{ \sqrt{ \mathbb{E}\left[ |N|^2 \right] }\, \sqrt{ \mathbb{E}\left[ \left| \log f_{Y_2}(z+N) \right|^2 \right] } }{ \sigma_2^2 - \sigma_1^2 }  (A69)
\le \frac{1}{\sqrt{\sigma_2^2 - \sigma_1^2}} \sqrt{ \left( \frac{2B^2}{\sigma_2^2} + \left| \log(2\pi\sigma_2^2) \right| \right)^2 + \frac{8\left( B^4 + \mathbb{E}[N^4] \right) + R^4}{\sigma_2^4} + \pi^2 }
= \frac{1}{\sqrt{\sigma_2^2 - \sigma_1^2}} \sqrt{ \left( \frac{2B^2}{\sigma_2^2} + \left| \log(2\pi\sigma_2^2) \right| \right)^2 + \frac{8B^4 + 24\left( \sigma_2^2 - \sigma_1^2 \right)^2 + R^4}{\sigma_2^4} + \pi^2 },  (A70)
where (A69) follows from the Cauchy–Schwarz inequality together with \mathbb{E}[|N|^2] = \sigma_2^2 - \sigma_1^2, and (A70) follows from \mathbb{E}[N^4] = 3(\sigma_2^2 - \sigma_1^2)^2. Moreover, we have
|f_{Y_1}(z)| \le \mathbb{E}\left[ \left| \phi_{\sigma_1}(z-X) \right| \right]
= \frac{1}{\sqrt{2\pi\sigma_1^2}}\, \mathbb{E}\left[ \exp\left( -\frac{(z_R-X)^2 - z_I^2}{2\sigma_1^2} \right) \right]
\le \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left( \frac{B^2}{2\sigma_1^2} \right),
and finally
\mathbb{E}\left[ |X| \cdot \left| \phi_{\sigma_1}(z-X) \right| \right] \le R\, \mathbb{E}\left[ \left| \phi_{\sigma_1}(z-X) \right| \right]
\le \frac{R}{\sqrt{2\pi\sigma_1^2}} \exp\left( \frac{B^2}{2\sigma_1^2} \right).
Putting all contributions together, we get
|\breve{h}(z)| \le \frac{ e^{\frac{B^2}{2\sigma_1^2}} }{ \sqrt{2\pi\sigma_1^2} } \left( \frac{\sigma_1^2}{\sqrt{\sigma_2^2 - \sigma_1^2}} \sqrt{ \left( \frac{2B^2}{\sigma_2^2} + \left| \log(2\pi\sigma_2^2) \right| \right)^2 + \frac{8B^4 + 24\left( \sigma_2^2 - \sigma_1^2 \right)^2 + R^4}{\sigma_2^4} + \pi^2 } + B + R \right)
\le \frac{ e^{\frac{B^2}{2\sigma_1^2}} }{ \sqrt{2\pi\sigma_1^2} } \left( a_1 B^2 + a_2 B + a_3 \right),
where, in the last step, we have used \sqrt{\sum_i x_i} \le \sum_i \sqrt{x_i} and the fact that R \le B. □
Lemma A5.
Let \breve{h}: \mathbb{C} \to \mathbb{C} denote the complex extension of the function h in (123). Then, for
B \ge R\, \frac{\sigma_2^2 + \sigma_1^2}{\sigma_2^2 - \sigma_1^2},  (A79)
we have that
\max_{|z| \le B} |\breve{h}(z)| \ge \frac{ c_1 B - c_2 R }{ \sqrt{2\pi\sigma_1^2} } \exp\left( -\frac{(B+R)^2}{2\sigma_1^2} \right) \ge 0,
where c_1 = 1 - \frac{\sigma_1^2}{\sigma_2^2} and c_2 = 1 + \frac{\sigma_1^2}{\sigma_2^2}.
Proof. 
First, note that
\mathbb{E}_N\left[ \frac{ \mathbb{E}[X \mid Y_2 = B + N] }{ \sigma_2^2 } \right] - \frac{ \mathbb{E}[X \mid Y_1 = B] }{ \sigma_1^2 } \ge -\frac{R}{\sigma_2^2} - \frac{R}{\sigma_1^2}.  (A81)
Second, note that the condition in (A79) implies that
0 \le B\left( \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2} \right) - \frac{R}{\sigma_2^2} - \frac{R}{\sigma_1^2}.  (A82)
Therefore, by using (111) together with (A81) and (A82), we arrive at
\max_{|z| \le B} |\breve{h}(z)| \ge \breve{h}(B)
= \left( \mathbb{E}\left[ \frac{ \mathbb{E}[X \mid Y_2 = B+N] - B }{ \sigma_2^2 } \right] - \frac{ \mathbb{E}[X \mid Y_1 = B] - B }{ \sigma_1^2 } \right) \sigma_1^2 f_{Y_1}(B)
\ge \left( B\left( \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2} \right) - \frac{R}{\sigma_2^2} - \frac{R}{\sigma_1^2} \right) \sigma_1^2 f_{Y_1}(B)
\ge \left( B\left( \frac{1}{\sigma_1^2} - \frac{1}{\sigma_2^2} \right) - \frac{R}{\sigma_2^2} - \frac{R}{\sigma_1^2} \right) \frac{ \sigma_1^2 }{ \sqrt{2\pi\sigma_1^2} } \exp\left( -\frac{(B+R)^2}{2\sigma_1^2} \right),
where in the last bound we have used Jensen's inequality to arrive at
f_{Y_1}(B) = \mathbb{E}\left[ \phi_{\sigma_1}(B - X) \right]
= \frac{1}{\sqrt{2\pi\sigma_1^2}}\, \mathbb{E}\left[ \exp\left( -\frac{(B-X)^2}{2\sigma_1^2} \right) \right]
\ge \frac{1}{\sqrt{2\pi\sigma_1^2}} \exp\left( -\frac{(B+R)^2}{2\sigma_1^2} \right).
This concludes the proof. □

Appendix C. Proof of Theorem 7

To study the large-n behavior, we need the following bounds on the function h_ν [39,40]: for ν > 1/2,
h_\nu(x) = \frac{ x }{ \frac{2\nu-1}{2} + \sqrt{ \frac{(2\nu-1)^2}{4} + x^2 } } \cdot g_\nu(x),  (A90)
where
1 \ge g_\nu(x) \ge \frac{ \frac{2\nu-1}{2} + \sqrt{ \frac{(2\nu-1)^2}{4} + x^2 } }{ \nu + \sqrt{\nu^2 + x^2} }.  (A91)
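Although the proof is purely analytic, the sandwich (A90)–(A91) is easy to probe numerically. The following Python sketch assumes that h_ν(x) denotes the modified-Bessel-function ratio I_ν(x)/I_{ν−1}(x), which is the object to which the bounds of [39,40] apply; the (ν, x) grid and the tolerance are arbitrary choices.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled I_v(x), avoids overflow

# Sandwich check for the ratio h_nu, assuming h_nu(x) = I_nu(x) / I_{nu-1}(x).
def h(nu, x):
    return ive(nu, x) / ive(nu - 1, x)  # the scaling factors e^{-x} cancel

for nu in [0.75, 1.0, 5.0, 50.0]:           # the bounds require nu > 1/2
    for x in [0.1, 1.0, 10.0, 1000.0]:
        d1 = (2 * nu - 1) / 2 + np.sqrt((2 * nu - 1) ** 2 / 4 + x**2)
        d2 = nu + np.sqrt(nu**2 + x**2)
        g = h(nu, x) * d1 / x               # g_nu(x) implied by (A90)
        assert d1 / d2 - 1e-9 <= g <= 1 + 1e-9, (nu, x, g)
print("1 >= g_nu(x) >= d1/d2 holds on the sampled grid")
```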
Moreover, let
U_n = \bar{\mathbf{R}} + \sqrt{s}\, \mathbf{Z}  (A92)
with \mathbf{Z} \sim \mathcal{N}(\mathbf{0}_n, \mathbf{I}_n) and \bar{\mathbf{R}} \in \mathbb{R}^n any fixed vector with \| \bar{\mathbf{R}} \| = R. Consequently,
\lim_{n \to \infty} \mathbb{E}\left[ h_{\frac{n}{2}}^2\left( \frac{ \| \bar{\mathbf{R}} + \sqrt{s}\mathbf{Z} \|\, R }{ s } \right) \right]
= \mathbb{E}\left[ \lim_{n \to \infty} h_{\frac{n}{2}}^2\left( \frac{ \| \bar{\mathbf{R}} + \sqrt{s}\mathbf{Z} \|\, R }{ s } \right) \right]  (A93)
= \mathbb{E}\left[ \lim_{n \to \infty} \frac{ \frac{ \|U_n\|^2 R^2 }{ s^2 } }{ \left( \frac{n-1}{2} + \sqrt{ \frac{(n-1)^2}{4} + \frac{ \|U_n\|^2 R^2 }{ s^2 } } \right)^2 } \cdot g_{\frac{n}{2}}^2\left( \frac{ \|U_n\|\, R }{ s } \right) \right]  (A94)
= \mathbb{E}\left[ \lim_{n \to \infty} \frac{ \frac{ \|U_n\|^2 }{ n } \cdot \frac{ R^2 }{ s^2 n } }{ \left( \frac{1}{2} + \sqrt{ \frac{1}{4} + \left( \frac{ \|U_n\|\, R }{ n s } \right)^2 } \right)^2 } \cdot g_{\frac{n}{2}}^2\left( \frac{ \|U_n\|\, R }{ s } \right) \right]  (A95)
= 0,  (A96)
where (A93) follows from the dominated convergence theorem, since |h_ν| ≤ 1; (A94) follows from using (A90); (A95) follows from dividing both the numerator and the denominator by n²; and (A96) follows from (A91), which implies g_{n/2}(‖U_n‖R/s) → 1, and from using the strong law of large numbers to note that
\lim_{n \to \infty} \frac{ \|U_n\|^2 }{ n } = \lim_{n \to \infty} \frac{ \| \bar{\mathbf{R}} + \sqrt{s}\mathbf{Z} \|^2 }{ n } = s.  (A97)
Now, combining the capacity expression in (58) and (A96), we have that
\lim_{n \to \infty} C_s(\sigma_1^2, \sigma_2^2, R, n) = \frac{1}{2} \int_{\sigma_1^2}^{\sigma_2^2} \frac{R^2}{s^2}\, \mathrm{d}s = R^2 \left( \frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_2^2} \right).  (A98)
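A quick Monte Carlo experiment can illustrate the limit (A96). The sketch below uses the same Bessel-ratio reading of h_ν as above and the reconstructed argument ‖R̄ + √s Z‖R/s; the values of R, s, and the sample size are illustrative.

```python
import numpy as np
from scipy.special import ive

rng = np.random.default_rng(0)

# Monte Carlo sketch of (A96): for fixed R, the term
# E[h_{n/2}^2(||Rbar + sqrt(s) Z|| R / s)] decays (roughly like 1/n) as the
# dimension n grows. Assumes h_nu(x) = I_nu(x) / I_{nu-1}(x).
def h(nu, x):
    return ive(nu, x) / ive(nu - 1, x)

R, s, trials = 2.0, 1.2, 5000
for n in [2, 8, 32, 128, 512]:
    Z = rng.standard_normal((trials, n))
    Z[:, 0] += R / np.sqrt(s)        # realizes Rbar + sqrt(s) Z with Rbar = R e_1
    U = np.sqrt(s) * np.linalg.norm(Z, axis=1)
    print(f"n = {n:3d}:  E[h^2] ~ {np.mean(h(n / 2, U * R / s) ** 2):.5f}")
```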

Appendix D. Proof of Theorem 8

Let R_n = c\sqrt{n}. Then,
\lim_{n \to \infty} \frac{ C_s(\sigma_1^2, \sigma_2^2, R_n, n) }{ n } = \frac{c^2}{2} \int_{\sigma_1^2}^{\sigma_2^2} \frac{ 1 - \lim_{n \to \infty} \mathbb{E}\left[ h_{\frac{n}{2}}^2\left( \frac{ \| \bar{\mathbf{R}}_n + \sqrt{s}\mathbf{Z} \|\, R_n }{ s } \right) \right] }{ s^2 }\, \mathrm{d}s  (A99)
= \frac{c^2}{2} \int_{\sigma_1^2}^{\sigma_2^2} \frac{ 1 - \frac{ c^2 (c^2 + s) }{ \left( \frac{s}{2} + \sqrt{ \frac{s^2}{4} + c^2 (c^2 + s) } \right)^2 } }{ s^2 }\, \mathrm{d}s  (A100)
= \frac{1}{2} \log \frac{ \sigma_2^2 \left( c^2 + \sigma_1^2 \right) }{ \sigma_1^2 \left( c^2 + \sigma_2^2 \right) },  (A101)
where (A100) follows from the limit established in (98). The last equality holds since s^2/4 + c^2(c^2+s) = (s/2 + c^2)^2, so that the integrand in (A100) reduces to \frac{1}{s(c^2+s)}, whose antiderivative is \frac{1}{c^2} \log \frac{s}{c^2+s}. This concludes the proof.
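As a sanity check of (A101), one can compare a numerical quadrature of (A100) against the closed form; given the simplification noted above, the agreement is unsurprising. The parameter values in this sketch are arbitrary.

```python
import numpy as np
from scipy.integrate import quad

# Numerical check of the closed form (A101); c, s1, s2 are illustrative.
c, s1, s2 = 1.3, 1.0, 10.0   # c = R_n / sqrt(n); s1, s2 = noise variances

def integrand(s):
    D = s / 2 + np.sqrt(s**2 / 4 + c**2 * (c**2 + s))  # D simplifies to s + c^2
    return (1 - c**2 * (c**2 + s) / D**2) / s**2       # equals 1 / (s (c^2 + s))

numeric, _ = quad(integrand, s1, s2)
numeric *= c**2 / 2
closed_form = 0.5 * np.log(s2 * (c**2 + s1) / (s1 * (c**2 + s2)))
print(numeric, closed_form)  # agree to quadrature accuracy
```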

Appendix E. Partial Derivatives for the Gradient Ascent Algorithm

The partial derivatives of the secrecy information with respect to any mass point ρ_l ∈ supp(P_X) are defined as
\frac{\partial}{\partial \rho_l} I_s(X; P_X) = \sum_{k=1}^{K} p_k \cdot \frac{\partial}{\partial \rho_l} \tilde{\Xi}\left( \rho_k; \hat{P}_X \right), \qquad l = 1, \ldots, K.  (A102)
By (A9), we have that \tilde{\Xi}(x; \hat{P}_X) = i_1(x; P_X) - i_2(x; P_X), where i_j(x; P_X), for j = 1, 2, is defined in (A10). Therefore, to compute (A102), we define the following derivatives
\frac{\partial}{\partial \rho_l} i_j(\rho_k; P_X) = \int_0^\infty \frac{\partial}{\partial \rho_l} \left[ f_{\chi_n^2(\rho_k^2 / \sigma_j^2)}(y) \log \frac{ y^{\frac{n}{2}-1} }{ \sum_{m=1}^K p_m f_{\chi_n^2(\rho_m^2 / \sigma_j^2)}(y) } \right] \mathrm{d}y,  (A103)
where f_{\chi_n^2(\rho_k^2 / \sigma_j^2)}(y) is the noncentral chi-square pdf with noncentrality parameter ρ_k²/σ_j² and n degrees of freedom. Notice that the derivative of f_{\chi_n^2(\rho_k^2 / \sigma_j^2)}(y) with respect to ρ_l is different from zero only when k = l and is given by
\frac{\partial}{\partial \rho_l} f_{\chi_n^2(\rho_l^2 / \sigma_j^2)}(y) = \frac{\rho_l}{\sigma_j^2} \left( f_{\chi_{n+2}^2(\rho_l^2 / \sigma_j^2)}(y) - f_{\chi_n^2(\rho_l^2 / \sigma_j^2)}(y) \right).  (A104)
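The identity (A104) is convenient precisely because both terms on the right-hand side are standard noncentral chi-square pdfs. A finite-difference check, with illustrative parameter values, is sketched below.

```python
import numpy as np
from scipy.stats import ncx2

# Finite-difference check of the derivative identity (A104) at illustrative
# values of (n, rho, sigma^2, y); ncx2.pdf(y, df, nc) is the noncentral
# chi-square pdf with df degrees of freedom and noncentrality nc.
n, rho, sigma_sq, y, eps = 4, 1.7, 1.2, 3.0, 1e-6

lam = rho**2 / sigma_sq
exact = rho / sigma_sq * (ncx2.pdf(y, n + 2, lam) - ncx2.pdf(y, n, lam))
fd = (ncx2.pdf(y, n, (rho + eps) ** 2 / sigma_sq)
      - ncx2.pdf(y, n, (rho - eps) ** 2 / sigma_sq)) / (2 * eps)
print(exact, fd)  # agree to O(eps^2)
```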
Moreover, given the probability p_l associated with ρ_l, we have that
\frac{\partial}{\partial \rho_l} \log \frac{ y^{\frac{n}{2}-1} }{ \sum_{k=1}^K p_k f_{\chi_n^2(\rho_k^2 / \sigma_j^2)}(y) } = -\, \frac{ p_l\, \frac{\partial}{\partial \rho_l} f_{\chi_n^2(\rho_l^2 / \sigma_j^2)}(y) }{ \sum_{k=1}^K p_k f_{\chi_n^2(\rho_k^2 / \sigma_j^2)}(y) }.  (A105)
Finally, by combining everything together, we find
\frac{\partial}{\partial \rho_l} I_s(X; P_X) = p_l \int_0^\infty \frac{\rho_l}{\sigma_1^2} \left( f_{\chi_{n+2}^2(\rho_l^2 / \sigma_1^2)}(y) - f_{\chi_n^2(\rho_l^2 / \sigma_1^2)}(y) \right) \left[ \log \frac{ y^{\frac{n}{2}-1} }{ \sum_{k=1}^K p_k f_{\chi_n^2(\rho_k^2 / \sigma_1^2)}(y) } - 1 \right] \mathrm{d}y
- p_l \int_0^\infty \frac{\rho_l}{\sigma_2^2} \left( f_{\chi_{n+2}^2(\rho_l^2 / \sigma_2^2)}(y) - f_{\chi_n^2(\rho_l^2 / \sigma_2^2)}(y) \right) \left[ \log \frac{ y^{\frac{n}{2}-1} }{ \sum_{k=1}^K p_k f_{\chi_n^2(\rho_k^2 / \sigma_2^2)}(y) } - 1 \right] \mathrm{d}y.  (A106)
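For completeness, the following sketch shows how (A106) could be evaluated numerically for a candidate input distribution, for example inside one step of the gradient ascent algorithm. The specific mass points, probabilities, and channel parameters are illustrative placeholders, and the quadrature settings are not tuned.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import ncx2

# Sketch of one gradient evaluation following (A106). The mass points rho,
# their probabilities p, and the channel parameters are illustrative.
n, sigma1_sq, sigma2_sq = 2, 1.0, 1.5
rho = np.array([0.5, 1.8])   # mass-point amplitudes rho_1, ..., rho_K
p = np.array([0.4, 0.6])     # associated probabilities

def mixture(y, sig_sq):
    # K-term mixture of noncentral chi-square pdfs appearing in the log
    return sum(pk * ncx2.pdf(y, n, rk**2 / sig_sq) for pk, rk in zip(p, rho))

def grad_term(l, sig_sq):
    rl = rho[l]
    def integrand(y):
        d_f = rl / sig_sq * (ncx2.pdf(y, n + 2, rl**2 / sig_sq)
                             - ncx2.pdf(y, n, rl**2 / sig_sq))   # (A104)
        return d_f * (np.log(y ** (n / 2 - 1) / mixture(y, sig_sq)) - 1.0)
    val, _ = quad(integrand, 1e-9, np.inf, limit=200)
    return p[l] * val

grad = [grad_term(l, sigma1_sq) - grad_term(l, sigma2_sq) for l in range(len(rho))]
print(grad)  # d I_s / d rho_l, for l = 1, ..., K
```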

References

1. Favano, A.; Barletta, L.; Dytso, A. Simulated Data. Available online: https://github.com/ucando83/WiretapCapacity (accessed on 26 April 2023).
2. Wyner, A.D. The wire-tap channel. Bell Syst. Tech. J. 1975, 54, 1355–1387.
3. Leung-Yan-Cheong, S.; Hellman, M. The Gaussian wire-tap channel. IEEE Trans. Inf. Theory 1978, 24, 451–456.
4. Bloch, M.; Barros, J. Physical-Layer Security: From Information Theory to Security Engineering; Cambridge University Press: Cambridge, UK, 2011.
5. Oggier, F.; Hassibi, B. A Perspective on the MIMO Wiretap Channel. Proc. IEEE 2015, 103, 1874–1882.
6. Liang, Y.; Poor, H.V.; Shamai (Shitz), S. Information theoretic security. Found. Trends Commun. Inf. Theory 2009, 5, 355–580.
7. Poor, H.V.; Schaefer, R.F. Wireless physical layer security. Proc. Natl. Acad. Sci. USA 2017, 114, 19–26.
8. Mukherjee, A.; Fakoorian, S.A.A.; Huang, J.; Swindlehurst, A.L. Principles of physical layer security in multiuser wireless networks: A survey. IEEE Commun. Surv. Tutor. 2014, 16, 1550–1573.
9. Gopala, P.K.; Lai, L.; El Gamal, H. On the secrecy capacity of fading channels. IEEE Trans. Inf. Theory 2008, 54, 4687–4698.
10. Bloch, M.; Barros, J.; Rodrigues, M.R.; McLaughlin, S.W. Wireless information-theoretic security. IEEE Trans. Inf. Theory 2008, 54, 2515–2534.
11. Khisti, A.; Tchamkerten, A.; Wornell, G.W. Secure broadcasting over fading channels. IEEE Trans. Inf. Theory 2008, 54, 2453–2469.
12. Liang, Y.; Poor, H.V.; Shamai, S. Secure communication over fading channels. IEEE Trans. Inf. Theory 2008, 54, 2470–2492.
13. Shafiee, S.; Liu, N.; Ulukus, S. Towards the secrecy capacity of the Gaussian MIMO wire-tap channel: The 2-2-1 channel. IEEE Trans. Inf. Theory 2009, 55, 4033–4039.
14. Khisti, A.; Wornell, G.W. Secure transmission with multiple antennas–Part II: The MIMOME wiretap channel. IEEE Trans. Inf. Theory 2010, 56, 5515–5532.
15. Oggier, F.; Hassibi, B. The secrecy capacity of the MIMO wiretap channel. IEEE Trans. Inf. Theory 2011, 57, 4961–4972.
16. Guo, D.; Shamai, S.; Verdú, S. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inf. Theory 2005, 51, 1261–1282.
17. Bustin, R.; Liu, R.; Poor, H.V.; Shamai, S. An MMSE approach to the secrecy capacity of the MIMO Gaussian wiretap channel. Eurasip J. Wirel. Commun. Netw. 2009, 2009, 370970.
18. Liu, T.; Shamai, S. A note on the secrecy capacity of the multiple-antenna wiretap channel. IEEE Trans. Inf. Theory 2009, 55, 2547–2553.
19. Loyka, S.; Charalambous, C.D. An algorithm for global maximization of secrecy rates in Gaussian MIMO wiretap channels. IEEE Trans. Commun. 2015, 63, 2288–2299.
20. Loyka, S.; Charalambous, C.D. Optimal signaling for secure communications over Gaussian MIMO wiretap channels. IEEE Trans. Inf. Theory 2016, 62, 7207–7215.
21. Ozel, O.; Ekrem, E.; Ulukus, S. Gaussian wiretap channel with amplitude and variance constraints. IEEE Trans. Inf. Theory 2015, 61, 5553–5563.
22. Soltani, M.; Rezki, Z. Optical wiretap channel with input-dependent Gaussian noise under peak- and average-intensity constraints. IEEE Trans. Inf. Theory 2018, 64, 6878–6893.
23. Soltani, M.; Rezki, Z. The Degraded Discrete-Time Poisson Wiretap Channel. arXiv 2021, arXiv:2101.03650.
24. Nam, S.H.; Lee, S.H. Secrecy Capacity of a Gaussian Wiretap Channel with One-bit ADCs is Always Positive. In Proceedings of the 2019 IEEE Information Theory Workshop (ITW), Visby, Sweden, 25–28 August 2019; pp. 1–5.
25. Dytso, A.; Egan, M.; Perlaza, S.M.; Poor, H.V.; Shitz, S.S. Optimal Inputs for Some Classes of Degraded Wiretap Channels. In Proceedings of the 2018 IEEE Information Theory Workshop (ITW), Guangzhou, China, 25–29 November 2018; pp. 1–5.
26. Karlin, S. Pólya type distributions, II. Ann. Math. Stat. 1957, 28, 281–308.
27. Dytso, A.; Al, M.; Poor, H.V.; Shamai Shitz, S. On the Capacity of the Peak Power Constrained Vector Gaussian Channel: An Estimation Theoretic Perspective. IEEE Trans. Inf. Theory 2019, 65, 3907–3921.
28. Favano, A.; Ferrari, M.; Magarini, M.; Barletta, L. The Capacity of the Amplitude-Constrained Vector Gaussian Channel. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 426–431.
29. Berry, J.C. Minimax estimation of a bounded normal mean vector. J. Multivar. Anal. 1990, 35, 130–139.
30. Dytso, A.; Yagli, S.; Poor, H.V.; Shamai (Shitz), S. The Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points. IEEE Trans. Inf. Theory 2020, 66, 2006–2022.
31. Cover, T.; Thomas, J. Elements of Information Theory, 2nd ed.; Wiley: Hoboken, NJ, USA, 2006.
32. Han, T.S.; Endo, H.; Sasaki, M. Reliability and Secrecy Functions of the Wiretap Channel Under Cost Constraint. IEEE Trans. Inf. Theory 2014, 60, 6819–6843.
33. Barletta, L.; Dytso, A. Amplitude-Constrained Gaussian Wiretap Channel: Computation of the Optimal Input Distribution. In Proceedings of the 2022 IEEE International Mediterranean Conference on Communications and Networking (MeditCom), Athens, Greece, 5–8 September 2022; pp. 106–111.
34. Rose, K. A mapping approach to rate-distortion computation and analysis. IEEE Trans. Inf. Theory 1994, 40, 1939–1952.
35. Blahut, R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473.
36. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
37. Yasui, K.; Suko, T.; Matsushima, T. An algorithm for computing the secrecy capacity of broadcast channels with confidential messages. In Proceedings of the 2007 IEEE International Symposium on Information Theory (ISIT), Nice, France, 24–29 June 2007; pp. 936–940.
38. Verdú, S. Mismatched estimation and relative entropy. IEEE Trans. Inf. Theory 2010, 56, 3712–3720.
39. Segura, J. Bounds for ratios of modified Bessel functions and associated Turán-type inequalities. J. Math. Anal. Appl. 2011, 374, 516–528.
40. Baricz, Á. Bounds for Turánians of modified Bessel functions. Expo. Math. 2015, 33, 223–251.
41. Tijdeman, R. On the number of zeros of general exponential polynomials. In Proceedings of the Indagationes Mathematicae; North-Holland: Amsterdam, The Netherlands, 1971; Volume 74, pp. 1–7.
42. Esposito, R. On a relation between detection and estimation in decision theory. Inf. Control 1968, 12, 116–120.
43. Dytso, A.; Poor, H.V.; Shitz, S.S. A general derivative identity for the conditional mean estimator in Gaussian noise and some applications. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020; pp. 1183–1188.
44. Barletta, L.; Dytso, A. Scalar Gaussian Wiretap Channel: Bounds on the Support Size of the Secrecy-Capacity-Achieving Distribution. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021; pp. 1–6.
45. Favano, A.; Barletta, L.; Dytso, A. On the Capacity Achieving Input of Amplitude Constrained Vector Gaussian Wiretap Channel. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 850–855.
46. Favano, A. The Capacity of Amplitude-Constrained Vector Gaussian Channels. Ph.D. Dissertation, Politecnico di Milano, Milan, Italy, 2022.
47. Lapidoth, A.; Moser, S.M. Capacity bounds via duality with applications to multiple-antenna systems on flat-fading channels. IEEE Trans. Inf. Theory 2003, 49, 2426–2467.
Figure 1. Asymptotic behavior of R̄_n(1, σ2²)/√n versus n for σ1² = 1 and σ2² = 1.001, 1.5, 10, 1000. In red, we show c(1, σ2²) defined in (46).
Figure 2. Secrecy capacity in bits per channel use (bpcu) versus R for σ2² = 1.5, 10 and n = 2, 4. The secrecy capacity under average power constraints C_G(σ1², σ2², R², n) is defined in (69), while under peak power constraints, i.e., C_s(σ1², σ2², R, n), is defined in (58).
Figure 3. Evolution of the numerically estimated P̂_X versus R for σ1² = 1, σ2² = 1.5, (a) n = 2, and (b) n = 8.
Figure 4. Evolution of the numerically estimated P̂_X versus R for σ1² = 1, σ2² = 10, (a) n = 2, and (b) n = 8.
Figure 5. Output pdf of the legitimate user and of the eavesdropper for σ1² = 1, σ2² = 10, n = 2, (a,b) R = 2.25, and (c,d) R = 7.5. An animation showing the evolution of the output pdf as R varies can be found in [1].
Table 1. Values of R̄_n^MMSE(1), R̄_n(1, σ2²), and R̄_n^ptp(1).

  n | R̄_n^MMSE(1) | R̄_n(1, 1.001) | R̄_n(1, 1.5) | R̄_n(1, 10) | R̄_n(1, 1000) | R̄_n^ptp(1)
  1 | 1.057 | 1.057 | 1.161 | 1.518 |  1.664 |  1.666
  2 | 1.535 | 1.535 | 1.687 | 2.221 |  2.450 |  2.454
  3 | 1.908 | 1.909 | 2.098 | 2.768 |  3.061 |  3.065
  4 | 2.223 | 2.224 | 2.444 | 3.229 |  3.575 |  3.580
  5 | 2.501 | 2.501 | 2.750 | 3.634 |  4.026 |  4.031
  6 | 2.751 | 2.752 | 3.025 | 3.999 |  4.432 |  4.438
  7 | 2.981 | 2.982 | 3.278 | 4.334 |  4.805 |  4.811
  8 | 3.195 | 3.196 | 3.513 | 4.646 |  5.151 |  5.158
  9 | 3.395 | 3.396 | 3.733 | 4.937 |  5.475 |  5.483
 10 | 3.585 | 3.586 | 3.941 | 5.213 |  5.781 |  5.789
 11 | 3.765 | 3.766 | 4.139 | 5.475 |  6.072 |  6.080
 12 | 3.936 | 3.938 | 4.328 | 5.725 |  6.350 |  6.359
 13 | 4.101 | 4.102 | 4.509 | 5.964 |  6.616 |  6.625
 14 | 4.259 | 4.260 | 4.683 | 6.195 |  6.872 |  6.881
 15 | 4.412 | 4.413 | 4.851 | 6.417 |  7.119 |  7.128
 16 | 4.560 | 4.561 | 5.013 | 6.632 |  7.357 |  7.367
 17 | 4.702 | 4.704 | 5.170 | 6.839 |  7.588 |  7.598
 18 | 4.841 | 4.842 | 5.323 | 7.041 |  7.812 |  7.823
 19 | 4.976 | 4.977 | 5.471 | 7.238 |  8.030 |  8.041
 20 | 5.107 | 5.109 | 5.616 | 7.429 |  8.242 |  8.254
 21 | 5.235 | 5.237 | 5.756 | 7.615 |  8.449 |  8.461
 22 | 5.360 | 5.362 | 5.894 | 7.797 |  8.651 |  8.663
 23 | 5.483 | 5.484 | 6.028 | 7.974 |  8.848 |  8.860
 24 | 5.602 | 5.603 | 6.159 | 8.148 |  9.041 |  9.054
 25 | 5.719 | 5.720 | 6.288 | 8.318 |  9.230 |  9.243
 26 | 5.834 | 5.835 | 6.414 | 8.485 |  9.416 |  9.428
 27 | 5.946 | 5.948 | 6.538 | 8.649 |  9.597 |  9.610
 28 | 6.056 | 6.058 | 6.659 | 8.809 |  9.775 |  9.789
 29 | 6.165 | 6.166 | 6.778 | 8.967 |  9.951 |  9.964
 30 | 6.271 | 6.273 | 6.895 | 9.122 | 10.123 | 10.136
 31 | 6.376 | 6.378 | 7.010 | 9.274 | 10.292 | 10.306
 32 | 6.479 | 6.481 | 7.124 | 9.424 | 10.458 | 10.472
 33 | 6.580 | 6.582 | 7.235 | 9.571 | 10.622 | 10.636
 34 | 6.680 | 6.682 | 7.345 | 9.717 | 10.783 | 10.798
 35 | 6.779 | 6.780 | 7.453 | 9.860 | 10.942 | 10.957