Next Article in Journal
Energy Stability Property of the CPR Method Based on Subcell Second-Order CNNW Limiting in Solving Conservation Laws
Next Article in Special Issue
Amplitude Constrained Vector Gaussian Wiretap Channel: Properties of the Secrecy-Capacity-Achieving Input Distribution
Previous Article in Journal
Deep Classification with Linearity-Enhanced Logits to Softmax Function
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Information Rates for Channels with Fading, Side Information and Adaptive Codewords

School of Computation, Information and Technology, Technical University of Munich (TUM), 80333 Munich, Germany
Entropy 2023, 25(5), 728; https://doi.org/10.3390/e25050728
Submission received: 3 March 2023 / Revised: 26 March 2023 / Accepted: 22 April 2023 / Published: 27 April 2023
(This article belongs to the Special Issue Wireless Networks: Information Theoretic Perspectives III)

Abstract

:
Generalized mutual information (GMI) is used to compute achievable rates for fading channels with various types of channel state information at the transmitter (CSIT) and receiver (CSIR). The GMI is based on variations of auxiliary channel models with additive white Gaussian noise (AWGN) and circularly-symmetric complex Gaussian inputs. One variation uses reverse channel models with minimum mean square error (MMSE) estimates that give the largest rates but are challenging to optimize. A second variation uses forward channel models with linear MMSE estimates that are easier to optimize. Both model classes are applied to channels where the receiver is unaware of the CSIT and for which adaptive codewords achieve capacity. The forward model inputs are chosen as linear functions of the adaptive codeword’s entries to simplify the analysis. For scalar channels, the maximum GMI is then achieved by a conventional codebook, where the amplitude and phase of each channel symbol are modified based on the CSIT. The GMI increases by partitioning the channel output alphabet and using a different auxiliary model for each partition subset. The partitioning also helps to determine the capacity scaling at high and low signal-to-noise ratios. A class of power control policies is described for partial CSIR, including a MMSE policy for full CSIT. Several examples of fading channels with AWGN illustrate the theory, focusing on on-off fading and Rayleigh fading. The capacity results generalize to block fading channels with in-block feedback, including capacity expressions in terms of mutual and directed information.

1. Introduction

The capacity of fading channels is a topic of interest in wireless communications [1,2,3,4]. Fading refers to model variations over time, frequency, and space. A common approach to track fading is to insert pilot symbols into transmit symbol strings, have receivers estimate fading parameters via the pilot symbols, and have the receivers share their estimated channel state information (CSI) with the transmitters. The CSI available at the receiver (CSIR) and transmitter (CSIT) may be different and imperfect.
Information-theoretic studies on fading channels distinguish between average (ergodic) and outage capacity, causal and non-causal CSI, symbol and rate-limited CSI, and different qualities of CSIR and CSIT that are coarsely categorized as no, perfect, or partial. We refer to [5] for a review of the literature up to 2008. We here focus exclusively on average capacity and causal CSIT as introduced in [6]. Codes for such CSIT, or more generally for noisy feedback [7], are based on Shannon strategies, also called codetrees ([8], Chapter 9.4), or adaptive codewords ([9], Section 4.1). (The term “adaptive codeword” was suggested to the author by J. L. Massey.) Adaptive codewords are usually implemented by a conventional codebook and by modifying the codeword symbols as a function of the CSIT. This approach is optimal for some channels [10] and will be our main interest.

1.1. Block Fading

A model that accounts for the different time scales of data transmission (e.g., nanoseconds) and channel variations (e.g., milliseconds) is block fading [11,12]. Such fading has the channel parameters constant within blocks of L symbols and varying across blocks. A basic setup is as follows.
  • The fading is described by a state process S H 1 , S H 2 , independent of the transmitter messages and channel noise. The subscript “H” emphasizes that the states S H i may be hidden from the transceivers.
  • Each receiver sees a state process S R 1 , S R 2 , where S R i is a noisy function of S H i for all i.
  • Each transmitter sees a state process S T 1 , S T 2 , where S T i is a noisy function of S H i for all i.
The state processes may be modeled as memoryless [11,12] or governed by a Markov chain [13,14,15,16,17,18,19,20,21]. The memoryless models are particular cases of Shannon’s model [6]. For scalar channels, S H i is usually a complex number H i . Similarly, for vector or multi-input, multi-output (MIMO) channels with M- and N-dimensional inputs and outputs, respectively, S H i is a N × M matrix H i .
Consider, for example, a point-to-point channel with block-fading and complex-alphabet inputs X i and outputs
Y i = H i X i + Z i
where the index i, i = 1 , , n , enumerates the blocks and the index , = 1 , , L , enumerates the symbols of each block. The additive white Gaussian noise (AWGN) Z 11 , Z 12 , is a sequence of independent and identically distributed (i.i.d.) random variables that have a common circularly-symmetric complex Gaussian (CSCG) distribution.

1.2. CSI and In-Block Feedback

The motivation for modeling CSI as independent of the messages is simplicity. If one uses only pilot symbols to estimate the H i in (1), for example, then the independence is valid, and the capacity analysis may be tractable. However, to improve performance, one can implement data and parameter estimation jointly, and one can actively adjust the transmit symbols X i using past received symbols Y i k , k = 1 , , 1 , if in-block feedback is available. (Across-block feedback does not increase capacity if the state processes are memoryless; see ([22], Remark 16).) An information theory for such feedback was developed in [22], where a challenge is that code design is based on adaptive codewords that are more sophisticated than conventional codewords.
For example, suppose the CSIR is S R i = H i . Then, one might expect that CSCG signaling is optimal, and the capacity is an average of log ( 1 + SNR ) terms, where SNR is a signal-to-noise ratio. However, this simplification is based on constraints, e.g., that the CSIT is a function of the CSIR and that the X i cannot influence the CSIT. The former constraint can be realistic, e.g., if the receiver quantizes a pilot-based estimate of H i and sends the quantization bits to the transmitter via a low-latency and reliable feedback link. On the other hand, the latter constraint is unrealistic in general.

1.3. Auxiliary Models

This paper’s primary motivation is to further develop information theory for adaptive codewords. To gain insight, it is helpful to have achievable rates with log ( 1 + SNR ) terms. A common approach to obtain such expressions is to lower bound the channel mutual information I ( X ; Y ) as follows.
Suppose X is continuous and consider two conditional densities: the density p ( x | y ) and an auxiliary density q ( x | y ) . We will refer to such densities as reverse models; similarly, p ( y | x ) and q ( y | x ) are called forward models. One may write the differential entropy of X given Y as
h ( X | Y ) = E log p ( X | Y ) = E log q ( X | Y ) average cross entropy E log p ( X | Y ) q ( X | Y ) average divergence 0
where the first expectation in (2) is an average cross-entropy, and the second is an average informational divergence, which is non-negative. Several criteria affect the choice of q ( x | y ) : the cross-entropy should be simple enough to admit theoretical or numerical analysis, e.g., by Monte Carlo simulation; the cross-entropy should be close to h ( X | Y ) ; and the cross-entropy should suggest suitable transmitter and receiver structures.
We illustrate how reverse and forward auxiliary models have been applied to bound mutual information. Assume that E X = E Y = 0 for simplicity.
Reverse Model: Consider the reverse density that models X , Y as jointly CSCG:
q ( x | y ) = 1 π σ L 2 exp x x ^ L 2 / σ L 2
where X ^ L = E X Y * / E | Y | 2 Y and
σ L 2 = E X X ^ L 2 = E | X | 2 | E X Y * | 2 E | Y | 2
is the mean square error (MSE) of the estimate X ^ L . In fact, X ^ L is the linear estimate with the minimum MSE (MMSE), and σ L 2 is the linear MMSE (LMMSE) which is independent of Y = y ; see Section 2.5. The bound in (2) gives
h ( X | Y ) log π e σ L 2 .
Thus, if X is CSCG, then we have the desired form
I ( X ; Y ) = h ( X ) h ( X | Y ) log 1 + | h | 2 E | X | 2 σ 2
where the parameters h and σ 2 are
h = E Y X * E | X | 2 , σ 2 = E | Y h X | 2 .
The bound (6) is apparently due to Pinsker [23,24,25] and is widely used in the literature; see e.g., [18,26,27,28,29,30,31,32,33,34,35,36,37,38]. The bound is usually related to channels p ( y | x ) with additive noise but (2)–(6) show that it applies generally. The extension to vector channels is given in Section 2.7 below.
Forward Model: A more flexible approach is to choose the reverse density as
q ( x | y ) = p ( x ) q ( y | x ) s q ( y )
where q ( y | x ) is a forward auxiliary model (not necessarily a density), s 0 is a parameter to be optimized, and
q ( y ) = C p ( x ) q ( y | x ) s d x .
Inserting (8) into (2) we compute
I ( X ; Y ) max s 0 E log q ( Y | X ) s q ( Y ) .
The right-hand side (RHS) of (10) is called a generalized mutual information (GMI) [39,40] and has been applied to problems in information theory [41], wireless communications [42,43,44,45,46,47,48,49,50,51], and fiber-optic communications [52,53,54,55,56,57,58,59,60,61]. For example, the bounds (6) and (10) are the same if s = 1 and
q ( y | x ) = exp | y h x | 2 / σ 2
where h and σ 2 are given by (7). Note that (11) is not a density unless σ 2 = 1 / π but q ( x | y ) is a density. (We require q ( x | y ) to be a density to apply the divergence bound in (2).)
We compare the two approaches. The bound (5) is simple to apply and works well since the choices (7) give the maximal GMI for CSCG X; see Proposition 1 below. However, there are limitations: one must use continuous X, the auxiliary model q ( y | x ) is fixed as (11), and the bound does not show how to design the receiver. Instead, the GMI applies to continuous/discrete/mixed X and has an operational interpretation: the receiver uses q ( y | x ) rather than p ( y | x ) to decode. The framework of such mismatched receivers appeared in ([62], Exercise 5.22); see also [63].

1.4. Refined Auxiliary Models

The two approaches above can be refined in several ways, and we review selected variations in the literature.
Reverse Models: The model q ( x | y ) can be different for each Y = y , e.g., on may choose X as Gaussian with mean E X | Y = y and variance
Var X | Y = y = E | X | 2 | Y = y | E X | Y = y | 2
and where
q ( x | y ) = 1 π Var X | Y = y exp x E X | Y = y 2 Var X | Y = y .
Inserting (13) in (2) we have the bound
h ( X | Y ) E log π e Var X | Y
which improves (5) in general, since Var X | Y = y is the MMSE of X given the event Y = y . In other words, we have Var X | Y = y σ L 2 for all Y = y and the following bound improves (6) for CSCG X:
I ( X ; Y ) E log E | X | 2 Var X | Y .
In fact, the bound (15) was derived in ([50], Section III.B) by optimizing the GMI in (10) over all forward models of the form
q ( y | x ) = exp g ˜ y f ˜ y x 2
where f ˜ y , g ˜ y depend on y; see also [47,48,49]. We provide a simple proof. By inserting (16) into (8) and (9), absorbing the s parameter in f ˜ y and g ˜ y , and completing squares, one can equivalently optimize over all reverse densities of the form
q ( x | y ) = exp g y f y x 2 + h y
where | f y | 2 = π e h y so that q ( x | y ) is a density. We next bound the cross-entropy as
E log q ( X | Y = y ) = E g y / f y X 2 | f y | 2 h y Var X | Y = y π e h y h y
with equality if g y / f y = E X | Y = y ; see Section 2.5. The RHS of (18) is minimized by Var X | Y = y π e h y = 1 , so the best choice for f y , g y , h y gives the bound (14).
Remark 1.
The model (16) uses generalized nearest-neighbor decoding, improving the rules proposed in [42,43,44]. The authors of [50] pointed out that (6) and (15) use the LMMSE and MMSE, respectively; see ([50], Equation (87)).
Remark 2.
A corresponding forward model can be based on (8) and (13), namely
q ( y | x ) s = q ( x | y ) p ( x ) q ( y ) = 1 .
Remark 3.
The RHS of (15) has a more complicated form than the RHS of (6) due to the outer expectation and conditional variance, and this makes optimizing X challenging when there is CSIR and CSIT. Also, if p ( y | x ) is known, then it seems sensible to numerically compute p ( y ) and I ( X ; Y ) directly, e.g., via Monte Carlo or numerical integration.
Remark 4.
Decoding rules for discrete X can be based on decision theory as well as estimation theory; see ([64], Equation (11)).
Forward Models: Refinements of (11) appear in the optical fiber literature where the non-linear Schrödinger equation describes wave propagation [52]. Such channels exhibit complicated interactions of attenuation, dispersion, nonlinearity, and noise, and the channel density is too challenging to compute. One thus resorts to capacity lower bounds based on GMI and Monte Carlo simulation. The simplest models are memoryless, and they work well if chosen carefully. For example, the paper [52] used auxiliary models of the form
q ( y | x ) = exp | y h x | 2 / σ | x | 2
where h accounts for attenuation and self-phase modulation, and where the noise variance σ | x | 2 depends on | x | . Also, X was chosen to have concentric rings rather than a CSCG density. Subsequent papers applied progressively more sophisticated models with memory to better approximate the actual channel; see [53,54,55,56,57,58,59]. However, the rate gains over the model (20) are minor (≈12%) for 1000 km links, and the newer models do not suggest practical receiver structures.
A related application is short-reach fiber-optic systems that use direct detection (DD) receivers [65] with photodiodes. The paper [60] showed that sampling faster than the symbol rate increases the DD capacity. However, spectrally efficient filtering gives the channel a long memory, motivating auxiliary models q ( y | x ) with reduced memory to simplify GMI computations [61,66]. More generally, one may use channel-shortening filters [67,68,69] to increase the GMI.
Remark 5.
The ultimate GMI is I ( X ; Y ) , and one can compute this quantity numerically for the channels considered in this paper. We are motivated to focus on forward auxiliary models q ( y | x ) to understand how to improve information rates for more complex channels. For instance, simple q ( y | x ) let one understand properties of optimal codes, see Lemma 3, and they suggest explicit power control policies, see Theorem 2.
Remark 6.
The paper [37] (see also ([2], Equation (3.3.45)) and ([70], Equation (6))) derives two capacity lower bounds for massive MIMO channels. These bounds are designed for problems where the fading parameters have small variance so that, in effect, σ 2 in (7) is small. We will instead encounter cases where σ 2 grows in proportion to E | X | 2 and the RHS of (6) quickly saturates as E | X | 2 grows; see Remark 20.

1.5. Organization

This paper is organized as follows. Section 2 defines notation and reviews basic results. Section 3 develops two results for the GMI of scalar auxiliary models with AWGN:
  • Proposition 1 in Section 3.1 states a known result, namely that the RHS of (6) is the maximum GMI for the AWGN auxiliary model (11) and a CSCG X.
  • Lemma 1 in Section 3.2 generalizes Proposition 1 by partitioning the channel output alphabet into K subsets, K 1 . We use K = 2 to establish capacity properties at high and low SNR.
Section 4 and Section 5 apply the GMI to channels with CSIT and CSIR.
  • Section 4.3 treats adaptive codewords and develops structural properties of their optimal distribution.
  • Lemma 2 in Section 4.4 generalizes Proposition 1 to MIMO channels and adaptive codewords. The receiver models each transmit symbol as a weighted sum of the entries of the corresponding adaptive symbol.
  • Lemma 3 in Section 4.5 states that the maximum GMI for scalar channels, an AWGN auxiliary model, adaptive codewords with jointly CSCG entries, and K = 1 is achieved by using a conventional codebook where each symbol is modified based on the CSIT.
  • Lemma 4 in Section 4.6 extends Lemma 3 to MIMO channels, including diagonal or parallel channels.
  • Theorem 1 in Section 5.1 generalizes Lemma 3 to include CSIR; we use this result several times in Section 6.
  • Lemma 5 in Section 5.3 generalizes Lemmas 1 and 2 by partitioning the channel output alphabet.
Section 6, Section 7 and Section 8 apply the GMI to fading channels with AWGN and illustrate the theory for on-off and Rayleigh fading.
  • Lemma 6 in Section 6 gives a general capacity upper bound.
  • Section 6.5 introduces a class of power control policies for full CSIT. Theorem 2 develops the optimal policy with an MMSE form.
  • Theorem 3 in Section 6.6 provides a quadratic waterfilling expression for the GMI with partial CSIR.
Section 9 develops theory for block fading channels with in-block feedback (or in-block CSIT) that is a function of the CSIR and past channel inputs and outputs.
  • Theorem 4 in Section 9.2 generalizes Lemma 4 to MIMO block fading channels;
  • Section 9.3 develops capacity expressions in terms of directed information;
  • Section 9.4 specializes the capacity to fading channels with AWGN and delayed CSIR;
  • Proposition 3 generalizes Proposition 2 to channels with special CSIR and CSIT.
Section 10 concludes the paper. Finally, Appendix A, Appendix B, Appendix C, Appendix D, Appendix E, Appendix F and Appendix G provide results on special functions, GMI calculations, and proofs.

2. Preliminaries

2.1. Basic Notation

Let 1 ( · ) be the indicator function that takes on the value 1 if its argument is true and 0 otherwise. Let δ ( . ) be the Dirac generalized function with X δ ( x ) f ( x ) d x = f ( 0 ) · 1 ( 0 X ) . For x R , define ( x ) + = max ( 0 , x ) . The complex-conjugate, absolute value, and phase of x C are written as x * , | x | , and arg ( x ) , respectively. We write j = 1 and ϵ ¯ = 1 ϵ .
Sets are written with calligraphic font, e.g., S = { 1 , , n } and the cardinality of S is | S | . The complement of S in T is S c where T is understood from the context.

2.2. Vectors and Matrices

Column vectors are written as x ̲ = [ x 1 , , x M ] T where M is the dimension, and T denotes transposition. The complex-conjugate transpose (or Hermitian) of x ̲ is written as x ̲ . The Euclidean norm of x ̲ is x ̲ . Matrices are written with bold letters such as A . The letter I denotes the identity matrix. The determinant and trace of a square matrix A are written as det A and tr A , respectively.
A singular value decomposition (SVD) is A = U Σ V where U and V are unitary matrices and Σ is a rectangular diagonal matrix with the singular values of A on the diagonal. The square matrix A is positive semi-definite if x ̲ A x ̲ 0 for all x ̲ . The notation A B means that B A is positive semi-definite. Similarly, A is positive definite if x ̲ A x ̲ > 0 for all x ̲ , and we write A B if B A is positive definite.

2.3. Random Variables

Random variables are written with uppercase letters, such as X, and their realizations with lowercase letters, such as x. We write the distribution of discrete X with alphabet X = { 0 , , n 1 } as P X = [ P X ( 0 ) , , P X ( n 1 ) ] . The density of a real- or complex-valued X is written as p X . Mixed discrete-continuous distributions are written using mixtures of densities and Dirac- δ functions.
Conditional distributions and densities are written as P X | Y and p X | Y , respectively. We usually drop subscripts if the argument is a lowercase version of the random variable, e.g., we write p ( y | x ) for p Y | X ( y | x ) . One exception is that we consistently write the distributions P S R ( . ) and P S T ( . ) of the CSIR and CSIT with the subscript to avoid confusion with power notation.

2.4. Second-Order Statistics

The expectation and variance of the complex-valued random variable X are E X and Var X = E | X E X | 2 , respectively. The correlation coefficient of X 1 and X 2 is ρ = E U 1 U 2 * where
U i = ( X i E X i ) / Var X i
for i = 1 , 2 . We say that X 1 and X 2 are fully correlated if ρ = e j ϕ for some real ϕ . Conditional expectation and variance are written as E X | A = a and
Var X | A = a = E ( X E X ) ( X E X ) * | A = a .
The expressions E X | A , Var X | A are random variables that take on the values E X | A = a , Var X | A = a if A = a .
The expectation and covariance matrix of the random column vector X ̲ = [ X 1 , , X M ] T are E X ̲ and Q X ̲ = E ( X ̲ E X ̲ ) ( X ̲ E X ̲ ) , respectively. We write Q X ̲ , Y ̲ for the covariance matrix of the stacked vector [ X ̲ T Y ̲ T ] T . We write Q X ̲ | Y ̲ = y ̲ for the covariance matrix of X ̲ conditioned on the event Y ̲ = y ̲ . Q X ̲ | Y ̲ is a random matrix that takes on the matrix value Q X ̲ | Y ̲ = y ̲ when Y ̲ = y ̲ .
We often consider CSCG random variables and vectors. A CSCG X ̲ has density
p ( x ̲ ) = exp x ̲ Q X ̲ 1 x ̲ π M det Q X ̲
and we write X ̲ CN ( 0 ̲ , Q X ̲ ) .

2.5. MMSE and LMMSE Estimation

Assume that E X ̲ = E Y ̲ = 0 ̲ . The MMSE estimate of X ̲ given the event Y ̲ = y ̲ is the vector X ̲ ^ ( y ̲ ) that minimizes
E X ̲ X ̲ ^ ( y ̲ ) 2 Y ̲ = y ̲ .
Direct analysis gives ([71], Chapter 4)
X ̲ ^ ( y ̲ ) = E X ̲ | Y ̲ = y ̲
E X ̲ X ̲ ^ 2 = E X ̲ 2 E X ̲ ^ 2
Q X ̲ X ̲ ^ = Q X ̲ Q X ̲ ^
E X ̲ X ̲ ^ Y ̲ = 0
where the last identity is called the orthogonality principle.
The LMMSE estimate of X ̲ given Y ̲ with invertible Q Y ̲ is the vector X ̲ ^ L = C Y ̲ where C is chosen to minimize E X ̲ X ̲ ^ L 2 . We compute
X ̲ ^ L = E X ̲ Y ̲ Q Y ̲ 1 Y ̲
and we also have the properties (22)–(24) with X ̲ ^ replaced by X ̲ ^ L . Moreover, if X ̲ and Y ̲ are jointly CSCG, then the MMSE and LMMSE estimators coincide, and the orthogonality principle (24) implies that the error X ̲ X ̲ ^ is independent of Y ̲ , i.e., we have
E X ̲ X ̲ ^ X ̲ X ̲ ^ Y ̲ = y ̲ = E X ̲ X ̲ Y ̲ = y ̲ E X ̲ Y ̲ Q Y ̲ 1 y ̲ y ̲ Q Y ̲ 1 E X ̲ Y ̲ = Q X ̲ Q X ̲ ^ .

2.6. Entropy, Divergence, and Information

Entropies of random vectors with densities p are written as
h ( X ̲ ) = E log p ( X ̲ ) , h ( X ̲ | Y ̲ ) = E log p ( X ̲ | Y ̲ )
where we use logarithms to the base e for analysis. The informational divergence of the densities p and q is
D p q = E log p ( X ̲ ) q ( X ̲ )
and D ( p q ) 0 with equality if and only if p = q almost everywhere. The mutual information of X ̲ and Y ̲ is
I ( X ̲ ; Y ̲ ) = D p ( X ̲ , Y ̲ ) p ( X ̲ ) p ( Y ̲ ) = E log p ( Y ̲ | X ̲ ) p ( Y ̲ ) .
The average mutual information of X ̲ and Y ̲ conditioned on Z ̲ is I ( X ̲ ; Y ̲ | Z ̲ ) . We write strings as X L = ( X 1 , X 2 , , X L ) and use the directed information notation (see [9,72])
I ( X L Y L | Z ) = = 1 L I ( X ; Y | Y 1 , Z )
I ( X L Y L Z L | W ) = = 1 L I ( X ; Y | Y 1 , Z , W )
where Y 0 = 0 .

2.7. Entropy and Information Bounds

The expression (2) applies to random vectors. Choosing q ( x ̲ | y ̲ ) as the conditional density where the X ̲ , Y ̲ are modeled as jointly CSCG we obtain a generalization of (5):
h ( X ̲ | Y ̲ ) log det π e Q X ̲ , Y ̲ det π e Q Y ̲ = log det π e Q X ̲ E X ̲ Y ̲ Q Y ̲ 1 E Y ̲ X ̲ .
The vector generalization of (6) for CSCG X ̲ is
I ( X ̲ ; Y ̲ ) = h ( X ̲ ) h ( X ̲ | Y ̲ ) log det Q X ̲ E X ̲ Y ̲ Q Y ̲ 1 E Y ̲ X ̲ 1 Q X ̲ = ( a ) log det I + Q Z ̲ 1 H Q X ̲ 1 H
where (cf. (7))
H = E Y ̲ X ̲ Q X ¯ ̲ 1 , Q Z ̲ = Q Y ̲ H Q X ̲ H
and step ( a ) in (30) follows by the Woodbury identity
A + B C D 1 = A 1 A 1 B C 1 + D A 1 B 1 D A 1
and the Sylvester identity
det I + A B = det I + B A .
We also have vector generalizations of (14) and (15):
h ( X ̲ | Y ̲ ) E log det π e Q X ̲ | Y ̲
I ( X ̲ ; Y ̲ ) E log det Q X ̲ det Q X ̲ | Y ̲ for CSC G X ̲ .

2.8. Capacity and Wideband Rates

Consider the complex-alphabet AWGN channel with output Y = X + Z and noise Z CN ( 0 , 1 ) . The capacity with the block power constraint 1 n i = 1 n | X i | 2 P is
C ( P ) = max E | X | 2 P I ( X ; Y ) = log ( 1 + P ) .
The low SNR regime (small P) is known as the wideband regime [73]. For well-behaved channels such as AWGN channels, the minimum E b / N 0 and the slope S of the capacity vs. E b / N 0 in bits/(3 dB) at the minimum E b / N 0 are (see ([73], Equation (35)) and ([73], Theorem 9))
E b N 0 min = log 2 C ( 0 ) , S = 2 [ C ( 0 ) ] 2 C ( 0 )
where C ( P ) and C ( P ) are the first and second derivatives of C ( P ) (measured in nats) with respect to P, respectively. For example, the wideband derivatives for (36) are C ( 0 ) = 1 and C ( 0 ) = 1 so that the wideband values (37) are
E b N 0 min = log 2 , S = 2 .
The minimal E b / N 0 is usually stated in decibels, for example 10 log 10 ( log 2 ) = 1.59 dB. An extension of the theory to general channels is described in ([74], Section III).
Remark 7.
A useful method is flash signaling, where one sends with zero energy most of the time. In particular, we will consider the CSCG flash density
p ( x ) = ( 1 p ) δ ( x ) + p e | x | 2 / ( P / p ) π ( P / p )
where 0 < p 1 so that the average power is E | X | 2 = P . Note that flash signaling is defined in ([73], Definition 2) as a family of distributions satisfying a particular property as P 0 . We use the terminology informally.

2.9. Uniformly-Spaced Quantizer

Consider a uniformly-spaced scalar quantizer q u ( . ) with B bits, domain [ 0 , ) , and reconstruction points
s { Δ / 2 , 3 Δ / 2 , , Δ / 2 + ( 2 B 1 ) Δ }
where Δ > 0 . The quantization intervals are
I ( s ) = s Δ 2 , s + Δ 2 , s s max s Δ 2 , , s = s max
where s max = Δ / 2 + ( 2 B 1 ) Δ . We will consider B = 0 , 1 , . For B = we choose q u ( x ) = x .
Suppose one applies the quantizer to the non-negative random variable G with density p ( g ) to obtain S T = q u ( G ) . Let P S T and P S T | G be the probability mass functions of S T without and with conditioning on G, respectively. We have
P S T | G ( s | g ) = 1 g I ( s ) , P S T ( s ) = g I ( s ) p ( g ) d g
and using Bayes’ rule, we obtain
p ( g | s ) = p ( g ) / P S T ( s ) , g I ( s ) 0 , else .

3. Generalized Mutual Information

We re-derive the GMI in the usual way, where one starts with the forward model q ( y | x ) rather than the reverse density q ( x | y ) in (8). Consider the joint density p ( x , y ) and define q ( y ) as in (9) for s 0 . Note that neither q ( y | x ) nor q ( y ) must be densities. The GMI is defined in [39] to be max s 0 I s ( X ; Y ) where (see the RHS of (10))
I s ( X ; Y ) = E log q ( Y | X ) s q ( Y )
and where the expectation is with respect to p ( x , y ) . The GMI is a lower bound on the mutual information since
I s ( X ; Y ) = I ( X ; Y ) D p X , Y p Y q X | Y .
Moreover, by using Gallager’s derivation of error exponents, but without modifying his “s” variable, the GMI I s ( X ; Y ) is achievable with a mismatched decoder that uses q ( y | x ) for its decoding metric [39].

3.1. AWGN Forward Model with CSCG Inputs

A natural metric is based on the AWGN auxiliary channel Y a = h X + Z where h is a channel parameter and Z CN ( 0 , σ 2 ) is independent of X, i.e., we have the auxiliary model (here a density)
q ( y | x ) = 1 π σ 2 exp | y h x | 2 / σ 2
where h and σ 2 are to be optimized. A natural input is X CN ( 0 , P ) so that (9) is
q ( y ) = π σ 2 / s ( π σ 2 ) s · exp | y | 2 σ 2 / s + | h | 2 P π ( σ 2 / s + | h | 2 P ) .
We have the following result, see [43] that considers channels of the form (1) and ([47], Proposition 1) that considers general p ( y | x ) .
Proposition 1.
The maximum GMI (42) for the channel p ( y | x ) , a CSCG input X with variance P > 0 , and the auxiliary model (44) with σ 2 > 0 is
I 1 ( X ; Y ) = log 1 + | h ˜ | 2 P σ ˜ 2
where s = 1 and (cf. (7))
h ˜ = E Y X * / P
σ ˜ 2 = E | Y h ˜ X | 2 = E | Y | 2 | h ˜ | 2 P .
The expectations are with respect to the actual density p ( x , y ) .
Proof. 
The GMI (42) for the model (44) is
I s ( X ; Y ) = log 1 + | h | 2 P σ 2 / s + E | Y | 2 σ 2 / s + | h | 2 P E | Y h X | 2 σ 2 / s .
Since (49) depends only on the ratio σ 2 / s one may as well set s = 1 . Thus, choosing h = h ˜ and σ 2 = σ ˜ 2 gives (46).
Next, consider Y a = h ˜ X + Z ˜ where Z ˜ CN ( 0 , σ ˜ 2 ) is independent of X. We have
E | Y a | 2 = E | Y | 2
E | Y a h ˜ X | 2 = E | Y h ˜ X | 2 .
In other words, the second-order statistics for the two channels with outputs Y (the actual channel output) and Y a are the same. But the GMI (46) is the mutual information I ( X ; Y a ) . Using (43) and (49), for any s, h and σ 2 we have
I ( X ; Y a ) = log 1 + | h ˜ | 2 P σ ˜ 2 I s ( X ; Y a ) = I s ( X ; Y )
and equality holds if h = h ˜ and σ 2 / s = σ ˜ 2 . □
Remark 8.
The rate (46) is the same as the RHS of (6).
Remark 9.
Proposition 1 generalizes to vector models and adaptive input symbols; see Section 4.4.
Remark 10.
The estimate h ˜ is the MMSE estimate of h:
h ˜ = arg min h E | Y h X | 2
and σ ˜ 2 is the variance of the error. To see this, expand
E | Y h X | 2 = E | ( Y h ˜ X ) + ( h ˜ h ) X | 2 = σ ˜ 2 + | h ˜ h | 2 P
where the final step follows by the definition of h ˜ in (47).
Remark 11.
Suppose that h is an estimate other than (53). Then if E | Y | 2 > E Y h X 2 we may choose
σ 2 / s = | h | 2 P · E Y h X 2 E | Y | 2 E Y h X 2
and the GMI (49) simplifies to
I s ( X ; Y ) = log E | Y | 2 E Y h X 2 .
Remark 12.
The LM rate (for “lower bound to the mismatch capacity”) improves the GMI for some q ( y | x ) [40,75]. The LM rate replaces q ( y | x ) with q ( y | x ) e t ( x ) / s for some function t ( . ) and permits optimizing s and t ( . ) ; see ([41], Section 2.3.2). For example, if p ( y | x ) has the form q ( y | x ) s e t ( x ) then the LM rate can be larger than the GMI; see [76,77].

3.2. CSIR and K-Partitions

We consider two generalizations of Proposition 1. The first is for channels with a state S R known at the receiver but not at the transmitter. The second expands the class of CSCG auxiliary models. The motivation is to obtain more precise models under partial CSIR, especially to better deal with channels at high SNR and with high rates. We here consider discrete S R and later extend to continuous S R .
CSIR: Consider the average GMI
I 1 ( X ; Y | S R ) = s R P S R ( s R ) I 1 ( X ; Y | S R = s R )
where I 1 ( X ; Y | S R = s R ) is the usual GMI where all densities are conditioned on S R = s R . The parameters (47) and (48) for the event S R = s R are now
h ˜ ( s R ) = E Y X * S R = s R E | X | 2 S R = s R
σ ˜ 2 ( s R ) = E | Y h ˜ ( s R ) X | 2 S R = s R .
The GMI (57) is thus
I 1 ( X ; Y | S R ) = s R P S R ( s R ) log 1 + | h ˜ ( s R ) | 2 P σ ˜ ( s R ) 2 .
K-Partitions: Let { Y k : k = 1 , , K } be a K-partition of Y and define the auxiliary model
q ( y | x ) = 1 π σ k 2 e | y h k x | 2 / σ k 2 , y Y k .
Observe that q ( y | x ) is not necessarily a density. We choose X CN ( 0 , P ) so that (9) becomes (cf. (45))
q ( y ) = π σ k 2 / s ( π σ k 2 ) s · exp | y | 2 σ k 2 / s + | h k | 2 P π ( σ k 2 / s + | h k | 2 P ) , y Y k .
Define the events E k = { Y Y k } for k = 1 , , K . We have
I s ( X ; Y ) = k = 1 K Pr E k · E log q ( Y | X ) s q ( Y ) E k
and inserting (61) and (62) we have the following lemma.
Lemma 1.
The GMI (42) for the channel p ( y | x ) , s = 1 , a CSCG input X with variance P, and the auxiliary model (61) is (see (49))
I 1 ( X ; Y ) = k = 1 K Pr E k log 1 + | h k | 2 P σ k 2 + E | Y | 2 | E k σ k 2 + | h k | 2 P E | Y h k X | 2 | E k σ k 2 .
Remark 13.
K-partitioning formally includes (57) as a special case by including S R as part of the receiver’s “overall” channel output Y ˜ = [ Y , S R ] . For example, one can partition Y ˜ as { Y ˜ s R : s R S R } where Y ˜ s R = Y × { s R } .
Remark 14.
The models (16) and (61) suggest building receivers based on adaptive Gaussian statistics. However, we are motivated to introduce (61) to prove capacity scaling results. For this purpose, we will use K = 2 with the partition
E 1 = { | Y | 2 < t R } , E 2 = { | Y | 2 t R }
and h 1 = 0 , σ 1 2 = 1 . The GMI (64) thus has only the k = 2 term and it remains to choose h 2 , σ 2 2 , and t R .
Remark 15.
One can generalize Lemma 1 and partition X × Y rather than Y only. However, the q ( y ) in (62) might not have a CSCG form.
Remark 16.
Define P k = E | X | 2 | E k and choose the LMMSE auxiliary models with
h k = E Y X * E k / P k
σ k 2 = E | Y h k X | 2 E k = E | Y | 2 E k | h k | 2 P k
for k = 1 , , K . The expression (64) is then
I 1 ( X ; Y ) = k = 1 K Pr E k log 1 + | h k | 2 P E | Y | 2 | E k | h k | 2 P k | h k | 2 ( P P k ) E | Y | 2 | E k + | h k | 2 ( P P k ) .
Remark 17.
The LMMSE-based GMI (68) reduces to the GMI of Proposition 1 by choosing the trivial partition with K = 1 and Y 1 = Y . However, the GMI (68) may not be optimal for K 2 . What can be said is that the phase of h k in (64) should be the same as the phase of E Y X * | E k for all k. We thus have K two-dimensional optimization problems, one for each pair ( | h k | , σ k 2 ) , k = 1 , , K .
Remark 18.
Suppose we choose a different auxiliary model for each Y = y , i.e., consider K . The reverse density GMI uses the auxiliary model (19) which gives the RHS of (15):
I 1 ( X ; Y ) = C p ( y ) log P Var X | Y = y d y .
Instead, the suboptimal (68) is the complicated expression
I 1 ( X ; Y ) = C p ( y ) log 1 + | E X | Y = y | 2 ( P / P y ) Var X | Y = y | E X | Y = y | 2 ( P / P y 1 ) Var X | Y = y + | E X | Y = y | 2 ( P / P y ) d y .
where P y = E | X | 2 | Y = y . We show how to compute these GMIs in Appendix C.

3.3. Example: On-Off Fading

Consider the channel Y = H X + Z where H , X , Z are mutually independent, P H ( 0 ) = P H ( 2 ) = 1 / 2 , and Z CN ( 0 , 1 ) . The channel exhibits particularly simple fading, giving basic insight into more realistic fading models. We consider two basic scenarios: full CSIR and no CSIR.
Full CSIR: Suppose S R = H and
q ( y | x , h ) = p ( y | x , h ) = 1 π σ 2 e | y h x | 2 / σ 2
which corresponds to having (58) and (59) as
h ˜ ( 0 ) = 0 , h ˜ 2 = 2 , σ ˜ 2 ( 0 ) = σ 2 2 = 1 .
The GMI (60) with X CN ( 0 , P ) thus gives the capacity
C ( P ) = 1 2 log 1 + 2 P .
The wideband values (37) are
E b N 0 min = log 2 , S = 1 .
Compared with (38), the minimal E b / N 0 is the same as without fading, namely 1.59 dB. However, fading reduces the capacity slope S; see the dashed curve in Figure 1.
No CSIR: Suppose S R = 0 and X CN ( 0 , P ) and consider the densities
p ( y | x ) = e | y | 2 2 π + e | y 2 x | 2 2 π
p ( y ) = e | y | 2 2 π + e | y | 2 / ( 1 + 2 P ) 2 π ( 1 + 2 P ) .
The mutual information can be computed by numerical integration or by Monte Carlo integration:
I ( X ; Y ) 1 N i = 1 N log p Y | X ( y i | x i ) p Y ( y i )
where the RHS of (77) converges to I ( X ; Y ) for long strings x N , y N sampled from p ( x , y ) . The results for X CN ( 0 , P ) are shown in Figure 1 as the curve labeled “ I ( X ; Y ) Gauss”.
Next, Proposition 1 gives h = 1 / 2 , σ 2 = 1 + P / 2 , and
I 1 ( X ; Y ) = log 1 + P 2 + P .
The wideband values (37) are
E b N 0 min = log 4 , S = 2 / 3
so the minimal E b / N 0 is 1.42 dB and the capacity slope S has decreased further. Moreover, the rate saturates at large SNR at 1 bit per channel use.
The “ I ( X ; Y ) Gauss” curve in Figure 1 suggests that the no-CSIR capacity approaches the full-CSIR capacity for large SNR. To prove this, consider the K = 2 partition specified in Remark 14 with h 1 = 0 , h 2 = 2 , and σ 2 2 = 1 . Since we are not using LMMSE auxiliary models, we must compute the GMI using the general expression (64), which is
I 1 ( X ; Y ) = Pr E 2 log ( 1 + 2 P ) + E | Y | 2 | E 2 1 + 2 P E Y 2 X 2 | E 2 .
In Appendix B.1, we show that choosing t R = P λ R + b where 0 < λ R < 1 and b is a real constant makes all terms behave as desired as P increases:
Pr E 2 1 / 2 , E | Y | 2 | E 2 1 + 2 P 1 , E Y 2 X 2 E 2 1 .
The GMI (80) of Lemma 1 thus gives the maximal value (73) for large P:
lim P 1 2 log ( 1 + 2 P ) I 1 ( X ; Y ) = 0 .
Figure 1 shows the behavior of I 1 ( X ; Y ) for K = 2 , λ R = 0.4 , and b = 3 . Effectively, at large SNR, the receiver can estimate H accurately, and one approaches the full-CSIR capacity.
Remark 19.
For on-off fading, one may compute I ( X ; Y ) directly and use the densities (75) and (76) to decode. Nevertheless, the partitioning of Lemma 1 helps prove the capacity scaling (82).
Consider next the reverse density GMI (69) and the forward model GMI (70). Appendix C.1 shows how to compute E X | Y = y , E | X | 2 | Y = y , and Var X | Y = y , and Figure 1 plots the GMIs as the curves labeled “rGMI” and “GMI, K = ”, respectively. The rGMI curve gives the best possible rates for AWGN auxiliary models, as shown in Section 1.4. The results also show that the large-K GMI (70) is worse than the K = 1 GMI at low SNR but better than the K = 2 GMI of Remark 14.
Finally, the curve labeled “ I ( X ; Y ) Gauss” in Figure 1 suggests that the minimal E b / N 0 is 1.42 dB even for the capacity-achieving distribution. However, we know from ([73], Theorem 1) that flash signaling (39) can approach the minimal E b / N 0 of 1.59 dB. For example, the flash rates I ( X ; Y ) with p = 0.05 are plotted in Figure 1. Unfortunately, the wideband slope is S = 0 ([73], Theorem 17), and one requires very large flash powers (very small p) to approach 1.59 dB.
Remark 20.
As stated in Remark 6, the paper [37] (see also [2,70]) derives two capacity lower bounds. These bounds are the same for our problem, and they are derived using the following steps (see ([37], Lemmas 3 and 4)):
I ( X ; Y ) = I ( X , S H ; Y ) I ( S H ; Y | X ) I ( X ; Y | S H ) I ( S H ; Y | X ) .
Now consider Y = H X + Z where H , X , Z are mutually independent, S H = H , Var Z = 1 , and X CN ( 0 , P ) . We have
I ( X ; Y | H ) E log ( 1 + | H | 2 P )
I ( H ; Y | X ) = h ( Y | X ) h ( Z ) log π e ( 1 + Var H P ) h ( Z )
where (84) and (85) follow by (5), in the latter case with the roles of X and Y reversed. The bound (85) works well if Var H is small, as for massive MIMO with “channel hardening”. However, for our on-off fading model, the bound (83) is
I ( X ; Y ) E log 1 + | H | 2 P log ( 1 + Var H P ) = 1 2 log ( 1 + 2 P ) log ( 1 + P / 2 )
which is worse than the K = 1 and K = GMIs and is not shown in Figure 1.

4. Channels with CSIT

This section studies Shannon’s channel with side information, or state, known causally at the transmitter [5,6]. We begin by treating general channels and then focus mainly on complex-alphabet channels. The capacity expression has a random variable A that is either a list (for discrete-alphabet states) or a function (for continuous-alphabet states). We refer to A as an adaptive symbol of an adaptive codeword.

4.1. Model

The problem is specified by the functional dependence graph (FDG) in Figure 2. The model has a message M, a CSIT string S T n , and a noise string Z n . The variables M, S T n , Z n are mutually statistically independent, and S T n and Z n are strings of i.i.d. random variables with the same distributions as S T and Z, respectively. S T n is available causally at the transmitter, i.e., the channel input X i , i = 1 , , n , is a function of M and the sub-string S T i . The receiver sees the channel outputs
Y i = f ( X i , S T i , Z i )
for some function f ( . ) and i = 1 , 2 , , n .
Each A i represents a list of possible choices of X i at time i. More precisely, suppose that S T has alphabet S T = { 0 , 1 , , ν 1 } and define the adaptive symbol
A = X ( 0 ) , , X ( ν 1 )
whose entries have alphabet X . Here S T = s T means that X ( s T ) is transmitted, i.e., we have X = X ( S T ) . If S T has a continuous alphabet, we make A a function rather than a list, and we may again write X = X ( S T ) . Some authors therefore write A as X ( . ) . (Shannon in [6] denoted our A and X as the respective X and x.)
Remark 21.
The conventional choice for A if X = C is
A = P ( 0 ) e j ϕ ( 0 ) , , P ( ν 1 ) e j ϕ ( ν 1 ) · U
where U has E | U | 2 = 1 , P ( s T ) = E | X ( s T ) | 2 , and ϕ ( s T ) is a phase shift. The interpretation is that U represents the symbol of a conventional codebook without CSIT, and these symbols are scaled and rotated. In other words, one separates the message-carrying U from an adaptation due to S T via
X = P ( S T ) e j ϕ ( S T ) U .
Remark 22.
One may define the channel by the functional relation (87), by p ( y | a ) , or by p ( y | x , s T ) ; see Shannon’s emphasis in ([6], Theorem); see ([22], Remark 3). We generally prefer to use p ( y | a ) since we interpret A as a channel input.
Remark 23.
One can add feedback and let X i be a function of ( M , S T i , Y i 1 ) , but feedback does not increase the capacity if the state and noise processes are memoryless ([22], Section V).
Remark 24.
The model (87) permits block fading and MIMO transmission by choosing X i and Y i as vectors [11,78].

4.2. Capacity

The capacity of the model under study is (see [6])
C = max A I ( A ; Y )
where A [ S T , X ] Y forms a Markov chain. One may limit attention to A with cardinality | A | satisfying (see ([22], Equation (56)), [79], ([80], Theorem 1))
| A | min | Y | , 1 + | S T | ( | X | 1 ) .
As usual, for the cost function c ( x , y ) and the average block cost constraint
1 n i = 1 n E c ( X i , Y i ) P
the unconstrained maximization in (90) becomes a constrained maximization over the A for which E c ( X , Y ) P . Also, a simple upper bound on the capacity is
C ( P ) max A : E c ( X , Y ) P I ( A ; Y , S T ) = ( a ) max X ( S T ) : E c ( X ( S T ) , Y ) P I ( X ; Y | S T )
where step ( a ) follows by the independence of A and S T . This bound is tight if the receiver knows S T .
Remark 25.
The chain rule for mutual information gives
I ( A ; Y ) = I X ( 0 ) X ( ν 1 ) ; Y
= s T = 0 ν 1 I X ( s T ) ; Y | X ( 0 ) , , X ( s T 1 ) .
The RHS of (94) suggests treating the channel as a multi-input, single-output (MISO) channel, and the expression (95) suggests using multi-level coding with multi-stage decoding [81]. For example, one may use polar coded modulation [82,83,84] with Honda-Yamamoto shaping [85,86].
Remark 26.
For X = C and the conventional adaptive symbol (88), we compute I ( A ; Y ) = I ( U ; Y ) and
C ( P ) = max P ( S T ) , ϕ ( S T ) : E c ( X ( S T ) , Y ) P I ( U ; Y ) .

4.3. Structure of the Optimal Input Distribution

Let A be the alphabet of A and let X = C , i.e., we have A = C ν for discrete S T . Consider the expansions
p ( y | a ) = s T P S T ( s T ) p ( y | x ( s T ) , s T ) p ( y ) = A p ( a ) p ( y | a ) d a
= s T P S T ( s T ) C p ( x ( s T ) ) p ( y | x ( s T ) , s T ) d x ( s T ) .
Observe that p ( y ) , and hence h ( Y ) , depends only on the marginals p ( x ( s T ) ) of A; see ([80], Section III). So define the set of densities having the same marginals as A:
P ( A ) = p ( a ˜ ) : p ( x ˜ ( s T ) ) = p ( x ( s T ) ) for all s T S T .
This set is convex, since for any p ( 1 ) ( a ) , p ( 2 ) ( a ) P ( A ) and 0 λ 1 we have
λ p ( 1 ) ( a ) + ( 1 λ ) p ( 2 ) ( a ) P ( A ) .
Moreover, for fixed p ( y ) , the expression I ( A ; Y ) is a convex function of p ( a | y ) , and p ( a | y ) = p ( a ) p ( y | a ) / p ( y ) is a linear function of p ( a ) . Maximizing I ( A ; Y ) over P ( A ) is thus the same as minimizing the concave function h ( Y | A ) over the convex set P ( A ) . An optimal p ( a ) is thus an extreme of P ( A ) . Some properties of such extremes are developed in [87,88].
For example, consider | S T | = 2 and X = S T = { 0 , 1 } , for which (91) states that at most | A | = 3 adaptive symbols need have positive probability (and at most | A | = 2 adaptive symbols if | Y | = 2 ). Suppose the marginals have P X ( 0 ) ( 0 ) = 1 / 2 , P X ( 1 ) ( 0 ) = 3 / 4 and consider the matrix notation
P A = P A ( 0 , 0 ) P A ( 0 , 1 ) P A ( 1 , 0 ) P A ( 1 , 1 )
where we write P A ( x 1 , x 2 ) for P A ( [ x 1 , x 2 ] ) . The optimal P A must then be one of the two extremes
P A = 1 / 2 0 1 / 4 1 / 4 , P A = 1 / 4 1 / 4 1 / 2 0 .
For the first P A , the codebook has the property that if X ( 0 ) = 0 then X ( 1 ) = 0 while if X ( 0 ) = 1 then X ( 1 ) is uniformly distributed over X = { 0 , 1 } .
Next, consider | S T | = 2 and marginals P X ( 0 ) , P X ( 1 ) that are uniform over X = { 0 , 1 , , | X | 1 } . This case was treated in detail in ([80], Section VI.A), see also [89], and we provide a different perspective. A classic theorem of Birkhoff [90] ensures that the extremes of P ( A ) are the | X | ! distributions P A for which the | X | × | X | matrix
P A = P A ( 0 , 0 ) P A ( 0 , | X | 1 ) P A ( | X | 1 , 0 ) P A ( | X | 1 , | X | 1 ) .
is a permutation matrix multiplied by 1 / | X | . For example, for | X | = 2 we have the two extremes
P A = 1 2 1 0 0 1 , P A = 1 2 0 1 1 0 .
The permutation property means that X ( s T ) is a function of X ( 0 ) , i.e., the encoding simplifies to a conventional codebook as in Remark 21 with uniformly-distributed U and a permutation π s T ( . ) indexed by s T such that X ( S T ) = π S T ( U ) . For example, for the first P A in (101) we may choose X ( S T ) = U , which is independent of S T . On the other hand, for the second P A in (101) we may choose X ( S T ) = U S T where ⊕ denotes addition modulo-2.
For | S T | > 2 , the geometry of P ( A ) is more complicated; see ([80], Section VI.B). For example, consider X = { 0 , 1 } and suppose the marginals P X ( s T ) , s T S T , are all uniform. Then the extremes include P A related to linear codes and their cosets, e.g., two extremes for | S T | = 3 are related to the repetition code and single parity check code:
P A ( a ) = 1 / 2 , a { [ 0 , 0 , 0 ] , [ 1 , 1 , 1 ] } P A ( a ) = 1 / 4 , a { [ 0 , 0 , 0 ] , [ 0 , 1 , 1 ] , [ 1 , 0 , 1 ] , [ 1 , 1 , 0 ] } .
This observation motivates concatenated coding, where the message is first encoded by an outer encoder followed by an inner code that is the coset of a linear code. The transmitter then sends the entries at position S T of the inner codewords, which are vectors of dimension | S T | . We do not know if there are channels for which such codes are helpful.

4.4. Generalized Mutual Information

Consider the vector channel p ( y ̲ | x ̲ ) with input set X = C M and output set Y = C N . The GMI for adaptive symbols is max s 0 I s ( A ; Y ̲ ) where
I s ( A ; Y ̲ ) = E log q ( Y ̲ | A ) s q ( Y ̲ )
and the expectation is with respect to p ( a , y ̲ ) . Suppose the auxiliary model is q ( y ̲ | a ) and define
q ( y ̲ ) = A p ( a ) q ( y ̲ | a ) s d a .
The GMI again provides a lower bound on the mutual information since (cf. (43))
I s ( A ; Y ̲ ) = I ( A ; Y ̲ ) D p A , Y ̲ p Y ̲ q A | Y ̲
where q ( a | y ̲ ) = p ( a ) q ( y ̲ | a ) s / q ( y ̲ ) is a reverse channel density.
We next study reverse and forward models as in Section 1.3 and Section 1.4. Suppose the entries X ̲ ( s T ) of A are jointly CSCG.
Reverse Model: We write A ̲ when we consider A to be a column vector that stacks the X ̲ ( s T ) . Consider the following reverse density motivated by (13):
q ( a ̲ | y ̲ ) = exp ( a ̲ E A ̲ | Y ̲ = y ̲ ) Q A ̲ | Y ̲ = y ̲ 1 ( a ̲ E A ̲ | Y ̲ = y ̲ ) π ν M det Q A ̲ | Y ̲ = y ̲ .
A corresponding forward model is q y ̲ | a = q a | y ̲ / p ( a ) and the GMI with s = 1 becomes (cf. (35))
I 1 ( A ; Y ̲ ) = E log det Q A ̲ det Q A ̲ | Y ̲ .
To simplify, one may focus on adaptive symbols as in (89):
X ̲ = Q X ̲ ( S T ) 1 / 2 · U ̲
where U ̲ CN ( 0 ̲ , I ) and the Q X ̲ ( s T ) are covariance matrices. We thus have I ( A ; Y ̲ ) = I ( U ̲ ; Y ̲ ) (cf. (96)) and using (105) but with A ̲ replaced with U ̲ we obtain
I 1 ( A ; Y ̲ ) = E log det Q U ̲ | Y ̲ .
Forward Model: Perhaps the simplest forward model is q ( y ̲ | a ) = p ( y ̲ | x ̲ ( s T ) ) for some fixed value s T S T . One may interpret this model as having the receiver assume that S T = s T . A natural generalization of this idea is as follows: define the auxiliary vector
X ¯ ̲ = s T W ( s T ) X ̲ ( s T )
where the W ( s T ) are M × M complex matrices, i.e., X ¯ ̲ is a linear function of the entries of A = [ X ̲ ( s T ) : s T S T ] . For example, the matrices might be chosen based on P S T ( . ) . However, observe that X ¯ ̲ is independent of S T . Now define the auxiliary model
q ( y ̲ | a ) = q ( y ̲ | x ¯ ̲ )
where we abuse notation by using the same q ( . ) . The expression (103) becomes
q ( y ̲ ) = A p ( a ) q ( y ̲ | a ) s d a = C p ( x ¯ ̲ ) q ( y ̲ | x ¯ ̲ ) s d x ¯ ̲ .
Remark 27.
We often consider S T to be a discrete set, but for CSCG channels we also consider S T = C so that the sum over S T in (109) is replaced by an integral over C .
We now specialize further by choosing the auxiliary channel Y ̲ a = H X ¯ ̲ + Z ̲ where H is an N × M complex matrix, Z ̲ is an N-dimensional CSCG vector that is independent of X ¯ ̲ and has invertible covariance matrix Q Z ̲ , and H and Q Z ̲ are to be optimized. Further choose A = [ X ̲ ( s T ) : s T S T ] whose entries are jointly CSCG with correlation matrices
R ( s T 1 , s T 2 ) = E X ̲ ( s T 1 ) X ̲ ( s T 2 ) .
Since X ¯ ̲ in (109) is independent of S T , we have
q ( y ̲ | a ) = exp y ̲ H x ¯ ̲ Q Z ̲ 1 y ̲ H x ¯ ̲ π N det Q Z ̲ .
Moreover, X ¯ ̲ is CSCG so (110) is
q ( y ̲ ) = π N det Q Z ̲ / s π N det Q Z ̲ s · exp y ̲ Q Z ̲ / s + H Q X ¯ ̲ H 1 y ̲ π N det Q Z ̲ / s + H Q X ¯ ̲ H
where
Q X ¯ ̲ = s T 1 , s T 2 W ( s T 1 ) R ( s T 1 , s T 2 ) W ( s T 2 ) .
We have the following generalization of Proposition 1.
Lemma 2.
The maximum GMI (102) for the channel p ( y ̲ | a ) , an adaptive vector A = [ X ̲ ( s T ) : s T S T ] that has jointly CSCG entries, an X ¯ ̲ as in (109) with Q X ¯ ̲ 0 , and the auxiliary model (111) with Q Z ̲ 0 is
I 1 ( A ; Y ̲ ) = log det I + Q Z ˜ ̲ 1 H ˜ Q X ¯ ̲ H ˜
where (cf. (31))
H ˜ = E Y ̲ X ¯ ̲ Q X ¯ ̲ 1
Q Z ˜ ̲ = Q Y ̲ H ˜ Q X ¯ ̲ H ˜ .
The expectation is with respect to the actual channel with joint distribution/density p ( a , y ̲ ) .
Proof. 
See Appendix D. □
Remark 28.
Since X ¯ ̲ is a function of A, the rate (112) can alternatively be derived by using I ( A ; Y ̲ ) I ( X ¯ ̲ ; Y ̲ ) and applying the bound (30) with X ̲ replaced with X ¯ ̲ .
Remark 29.
The estimate H ˜ is the MMSE estimate of H :
H ˜ = arg min H E Y ̲ H X ¯ ̲ 2
and Q Z ̲ ˜ is the resulting covariance matrix of the error. To see this, expand (cf. (54))
E Y ̲ H X ¯ ̲ 2 = E ( Y ̲ H ˜ X ¯ ̲ ) + ( H ˜ H ) X ¯ ̲ 2 = E Y ̲ H ˜ X ¯ ̲ 2 + tr ( H ˜ H ) Q X ¯ ̲ ( H ˜ H )
where the final step follows by the definition of H ˜ in (113).
Remark 30.
Suppose that H is an estimate other than (115). Generalizing (55), if Q Y ̲ Q Z ¯ ̲ we may choose
Q Z ̲ / s = H Q X ¯ ̲ H 1 / 2 Q Y ̲ Q Z ¯ ̲ 1 / 2 Q Z ¯ ̲ Q Y ̲ Q Z ¯ ̲ 1 / 2 H Q X ¯ ̲ H 1 / 2
where
Q Z ¯ ̲ = E Y ̲ H X ¯ ̲ Y ̲ H X ¯ ̲ .
Appendix D shows that (102) then simplifies to (cf. (56))
I s ( A ; Y ̲ ) = log det Q Z ¯ ̲ 1 Q Y ̲ .
Remark 31.
The GMI (112) does not depend on the scaling of X ¯ ̲ since this is absorbed in H ˜ . For example, one can choose the weighting matrices in (109) so that E X ¯ ̲ 2 = P .

4.5. Optimal Codebooks for CSCG Forward Models

The following Lemma maximizes the GMI for scalar channels and A with CSCG entries without requiring A to have the form (89). Nevertheless, this form is optimal, and we refer to ([10], page 2013) and Section 6.4 for similar results. In the following, let U ( s T ) CN ( 0 , 1 ) for all s T .
Lemma 3.
The maximum GMI (102) for the channel p ( y | a ) , an adaptive symbol A with jointly CSCG entries, the forward model (111), and with fixed P ( s T ) = E | X ( s T ) | 2 is
I 1 ( A ; Y ) = log 1 + P ˜ E | Y | 2 P ˜
where, writing X ( s T ) = P ( s T ) U ( s T ) for all s T , we have
P ˜ = E E Y U ( S T ) * S T 2 .
This GMI is achieved by choosing fully-correlated symbols:
X ( s T ) = P ( s T ) e j ϕ ( s T ) U
and X ¯ = c U for some non-zero constant c and a common U CN ( 0 , 1 ) , and where
ϕ ( s T ) = arg E Y U ( s T ) * S T = s T .
Proof. 
See Appendix E. □
Remark 32.
The expression (121) is based on (A58) in Appendix E and can alternatively be written as P ˜ = | h ˜ | 2 P ¯ where
h ˜ = E Y X ¯ * / P ¯ .
Remark 33.
The power levels P ( s T ) may be optimized, usually under a constraint such as E P ( S T ) P .
Remark 34.
By the Cauchy-Schwarz inequality, we have
E E Y U ( S T ) * S T 2 E | Y | 2 .
Furthermore, equality holds if and only if | Y U ( s T ) * | is a constant for each s T , but this case is not interesting.

4.6. Forward Model GMI for MIMO Channels

The following lemma generalizes Lemma 3 to MIMO channels without claiming a closed-form expression for the optimal GMI. Let U ̲ ( s T ) CN ( 0 ̲ , I ) for all s T .
Lemma 4.
A GMI (102) for the channel p ( y ̲ | a ) , an adaptive vector A with jointly CSCG entries, the auxiliary model (111), and with fixed Q X ̲ ( s T ) is given by (112) that we write as
I 1 ( A ; Y ̲ ) = log det Q Y ̲ det Q Y ̲ D ˜ D ˜ .
where for M × M unitary V R ( s T ) we have
D ˜ = E U T ( S T ) Σ ( S T ) V R ( S T )
and U T ( s T ) and Σ ( s T ) are N × N unitary and N × M rectangular diagonal matrices, respectively, of the SVD
E Y ̲ U ̲ ( s T ) S T = s T = U T ( s T ) Σ ( s T ) V T ( s T )
for all s T , and the V T ( s T ) are M × M unitary matrices. The GMI (124) is achieved by choosing the symbols (cf. (122) and (A87) below):
X ̲ ( s T ) = Q X ̲ ( s T ) 1 / 2 V T ( s T ) U ̲
and X ¯ ̲ = C U ̲ for some invertible M × M matrix C and a common M-dimensional vector U ̲ CN ( 0 ̲ , I ) . One may maximize (124) over the unitary V R ( s T ) .
Proof. 
See Appendix G. □
Using Lemma 4, the theory for MISO channels with N = 1 is similar to the scalar case of Lemma 3; see Remark 35 below. However, optimizing the GMI is more difficult for N > 1 because one must optimize over the unitary matrices V R ( s T ) in (125); see Remark 36 below.
Remark 35.
Consider N = 1 in which case one may set U T ( s T ) = 1 and (126) is a 1 × M vector where Σ ( s T ) has as the only non-zero singular value
σ ( s T ) = E Y U ̲ ( s T ) S T = s T = m = 1 M E Y U m ( s T ) * S T = s T 2 1 / 2 .
The absolute value of the scalar (125) is maximized by choosing V R ( s T ) = I for all s T to obtain (cf. (121))
D ˜ D ˜ = E σ ( S T ) 2 .
Remark 36.
Consider M = 1 in which case one may set V T ( s T ) = 1 and (126) is a N × 1 vector where Σ ( s T ) has as the only non-zero singular value
σ ( s T ) = E Y ̲ U ( s T ) S T = s T = n = 1 N E Y n U ( s T ) * S T = s T 2 1 / 2 .
We should now find the V R ( s T ) = e j ϕ R ( s T ) that minimize the determinant in the denominator of (124) where (see (125))
D ˜ = E u ̲ T ( S T ) σ ( S T ) e j ϕ R ( S T )
and where each u ̲ T ( s T ) is one of the columns of the N × N unitary matrix U T ( s T ) .
Remark 37.
Consider M = N and the product channel
p ( y ̲ | a ) = m = 1 M p y m | [ x m ( s T ) : s T S T ]
where x m ( s T ) is the m’th entry of x ̲ ( s T ) . We choose Q X ̲ ( s T ) as diagonal with diagonal entries P m ( s T ) , m = 1 , , M . Also choosing V R ( s T ) = I makes the matrix D ˜ D ˜ diagonal with the diagonal entries (cf. (121) where M = N = 1 )
s T P S T ( s T ) E Y m U m ( s T ) * S T = s T 2
for m = 1 , , M . The GMI (124) is thus (cf. (120))
I 1 ( A ; Y ̲ ) = m = 1 M log E | Y m | 2 E | Y m | 2 E | E Y m U m ( S T ) * S T | 2 .
Remark 38.
For general p ( y ̲ | a ) , one might wish to choose diagonal Q X ̲ ( s T ) and a product model
q ( y ̲ | a ) = m = 1 M q m ( y m | x ¯ m )
where the q m ( . ) are scalar AWGN channels
q m ( y | x ) = 1 π σ m 2 exp | y h m x | 2 / σ m 2
with possibly different h m and σ m 2 for each m. Consider also
X ¯ m = s T w m ( s T ) X m ( s T )
for some complex weights w m ( s T ) , i.e., X ¯ m is a weighted sum of entries from the list [ X m ( s T ) : s T S T ] . The maximum GMI is now the same as (134) but without requiring the actual channel to have the form (132).
Remark 39.
If the actual channel is Y ̲ = H X ̲ + Z ̲ then
E Y ̲ U ̲ ( s T ) | S T = s T = E H X ̲ ( s T ) U ̲ ( s T ) | S T = s T = E H | S T = s T Q X ̲ ( s T ) 1 / 2
where the final step follows because U ̲ ( S T ) S T H forms a Markov chain. The expression (135) is useful because it separates the effects of the channel and the transmitter.
Remark 40.
Combining Remarks 37 and 39, suppose the actual channel is Y ̲ = H X ̲ + Z ̲ with M = N and where H is diagonal with diagonal entries H m , m = 1 , , M . The GMI (124) is then (cf. (134))
I 1 ( A ; Y ̲ ) = m = 1 M log E | Y m | 2 E | Y m | 2 E E H m P m ( S T ) S T 2
where E | Y m | 2 = 1 + E | H m | 2 P m ( S T ) .

5. Channels with CSIR and CSIT

Shannon's model includes CSIR [11]. The FDG is shown in Figure 3, where there is a hidden state $S_H$, the CSIR $S_R$ and CSIT $S_T$ are functions of $S_H$, and the receiver sees the channel outputs
$$[\, Y_i, S_{R,i} \,] = [\, f(X_i, S_{H,i}, Z_i),\; S_{R,i} \,]$$
for some function $f(\cdot)$ and $i = 1, 2, \dots, n$. (By defining $S_H = [S_{H1}, Z_H]$ and calling $S_{H1}$ the hidden channel state, we can include the case where $S_R$ and $S_T$ are noisy functions of $S_{H1}$.) As before, $M$, $S_H^n$, $Z^n$ are mutually statistically independent, and $S_H^n$ and $Z^n$ are i.i.d. strings of random variables with the same distributions as $S_H$ and $Z$, respectively. Observe that we have changed the notation by writing $Y$ for only part of the channel output. The new $Y$ (without the $S_R$) is usually called the "channel output".

5.1. Capacity and GMI

We begin with scalar channels for which (90) is
$$C = \max_{A} I(A; Y, S_R) = \max_{A} I(A; Y \mid S_R)$$
where $A$ and $S_R$ are independent.
Reverse Model: The expression (108) with the adaptive symbol (88) is
$$I_1(A; Y, S_R) = \mathrm{E}\big[ -\log \mathrm{Var}[\, U \mid Y, S_R \,] \big].$$
Forward Model: Consider the expansion
$$I_1(A; Y \mid S_R) = \int_{\mathcal{S}_R} p(s_R)\, I_1(A; Y \mid S_R = s_R)\, \mathrm{d}s_R$$
where $I_1(A; Y \mid S_R = s_R)$ is the GMI (102) with all densities conditioned on $S_R = s_R$. We choose the forward model
$$q(y \mid a, s_R) = \frac{1}{\pi \sigma(s_R)^2} \exp\!\left( -\frac{|y - h(s_R)\, \bar{x}(s_R)|^2}{\sigma(s_R)^2} \right)$$
where, similar to (109), we define
$$\bar{X}(s_R) = \sum_{s_T} w(s_T, s_R)\, X(s_T)$$
for complex weights $w(s_T, s_R)$, i.e., $\bar{X}(s_R)$ is a weighted sum of entries from the list $A = [\, X(s_T) : s_T \in \mathcal{S}_T \,]$. We have the following straightforward generalization of Lemma 3.
Theorem 1.
The maximum GMI (140) for the channel $p(y \mid a, s_R)$, an adaptive symbol $A$ with jointly CSCG entries, the model (141), and with fixed $P(s_T) = \mathrm{E}[\, |X(s_T)|^2 \,]$ is
$$I_1(A; Y \mid S_R) = \mathrm{E}\!\left[ \log\!\left( 1 + \frac{\tilde{P}(S_R)}{\mathrm{E}\big[ |Y|^2 \mid S_R \big] - \tilde{P}(S_R)} \right) \right]$$
where for all $s_R \in \mathcal{S}_R$ we have
$$\tilde{P}(s_R) = \mathrm{E}\big[\, \big| \mathrm{E}[\, Y\, U(S_T)^* \mid S_T,\, S_R = s_R \,] \big| \,\big]^2.$$
Remark 41.
To establish Theorem 1, the receiver may choose $\bar{X} = \sqrt{P}\, U$ to be independent of $s_R$. Alternatively, the receiver may choose $\bar{X}(s_R) = \sqrt{\mathrm{E}[\, |X|^2 \mid S_R = s_R \,]}\; U$. Both choices give the same GMI since the expectation in (144) does not depend on the scaling of $\bar{X}$; see Remark 31.
Remark 42.
The partition idea of Lemmas 1 and 5 carries over to Theorem 1. We may generalize (143) as
$$I_1(A; Y \mid S_R) = \int_{\mathcal{S}_R} p(s_R) \sum_{k=1}^{K} \Pr\big[ \mathcal{E}_k \mid S_R = s_R \big] \left[ \log\!\left( 1 + \frac{|h_k(s_R)|^2 P}{\sigma_k^2(s_R)} \right) + \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_k, S_R = s_R \big]}{\sigma_k^2(s_R) + |h_k(s_R)|^2 P} - \frac{\mathrm{E}\big[ |Y - h_k(s_R)\sqrt{P}\, U|^2 \mid \mathcal{E}_k, S_R = s_R \big]}{\sigma_k^2(s_R)} \right] \mathrm{d}s_R$$
where the $X(s_T)$, $s_T \in \mathcal{S}_T$, are given by (122), and the $h_k(s_R)$ and $\sigma_k^2(s_R)$, $k = 1, \dots, K$, $s_R \in \mathcal{S}_R$, can be optimized.
Remark 43.
One is usually interested in the optimal power control policy $P(s_T)$ under the constraint $\mathrm{E}[P(S_T)] \le P$. Taking the derivative of (143) with respect to $\sqrt{P(s_T)}$ and setting it to zero, we obtain
$$\mathrm{E}\!\left[ \frac{ \mathrm{E}\big[ |Y|^2 \mid S_R \big]\, \tilde{P}(S_R)' - \tilde{P}(S_R)\, \mathrm{E}\big[ |Y|^2 \mid S_R \big]' }{ \mathrm{E}\big[ |Y|^2 \mid S_R \big] \Big( \mathrm{E}\big[ |Y|^2 \mid S_R \big] - \tilde{P}(S_R) \Big) } \right] = 2 \lambda \sqrt{P(s_T)}\; P_{S_T}(s_T)$$
where $\tilde{P}(S_R)'$ and $\mathrm{E}[\, |Y|^2 \mid S_R \,]'$ are derivatives with respect to $\sqrt{P(s_T)}$. We use (146) below to derive power control policies; a direct numerical approach is sketched below.
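The stationarity condition (146) rarely has a closed form, but for finite CSIT/CSIR alphabets one can maximize the GMI of Theorem 1 directly. The following is a minimal sketch, assuming a channel $Y = HX + Z$ with discrete $S_R, S_T$; the inputs `p_joint`, `mu`, and `m2` are hypothetical toy statistics, not objects from the paper.

```python
# Sketch: numerical maximization of the GMI (143) over a power policy P(s_T).
# Assumed inputs: p_joint[r, t] = Pr[S_R=r, S_T=t], mu[r, t] = E[H | s_R, s_T],
# m2[r, t] = E[|H|^2 | s_R, s_T]; all values below are illustrative only.
import numpy as np
from scipy.optimize import minimize

p_joint = np.array([[0.4, 0.1], [0.1, 0.4]])
mu = np.array([[0.3, 0.9], [0.8, 0.4]])
m2 = np.abs(mu) ** 2 + 0.05
P, p_R, p_T = 2.0, p_joint.sum(1), p_joint.sum(0)

def neg_gmi(Pt):
    rate = 0.0
    for r in range(len(p_R)):
        p_t = p_joint[r] / p_R[r]                             # p(s_T | s_R = r)
        Ptil = np.sum(p_t * np.abs(mu[r]) * np.sqrt(Pt)) ** 2  # cf. (144)
        Ey2 = 1.0 + np.sum(p_t * m2[r] * Pt)                   # E[|Y|^2 | s_R]
        rate += p_R[r] * np.log1p(Ptil / (Ey2 - Ptil))         # cf. (143)
    return -rate

cons = {"type": "eq", "fun": lambda Pt: np.sum(p_T * Pt) - P}
res = minimize(neg_gmi, x0=np.full(2, P), bounds=[(0, None)] * 2,
               constraints=[cons], method="SLSQP")
print("P(s_T) =", res.x, " GMI =", -res.fun / np.log(2), "bits")
```

By Cauchy-Schwarz, the $\tilde{P}$ computed above never exceeds $\mathrm{E}[|Y|^2 \mid s_R] - 1$, so the objective is well defined for any feasible power vector.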
Remark 44.
A related model is a compound channel where p ( y | a , s R ) is indexed by the parameter s R ([91], Chapter 4). The problem is to find the maximum worst-case reliable rate if the transmitter does not know s R . Alternatively, the transmitter must send its message to all | S R | receivers indexed by s R S R . A compound channel may thus be interpreted as a broadcast channel with a common message.

5.2. CSIT@R

An interesting specialization of Shannon's model is when the receiver knows $S_T$ and can determine $X(S_T)$. We refer to this scenario as CSIT@R. The model was considered in ([10], Section II) when $S_T$ is a function of $S_R$. More generally, suppose $S_T$ is a function of $[Y, S_R]$. The capacity (138) then simplifies to (see ([10], Proposition 1))
$$C \overset{(a)}{=} \max_{A} I(A; Y, S_T \mid S_R) \overset{(b)}{=} \max_{A} I(X; Y \mid S_R, S_T) \overset{(c)}{=} \sum_{s_T} P_{S_T}(s_T) \max_{X(s_T)} I\big( X(s_T); Y \,\big|\, S_R, S_T = s_T \big)$$
where step $(a)$ follows because $S_T$ is a function of $[Y, S_R]$; step $(b)$ follows because $A$ and $(S_R, S_T)$ are independent, $X$ is a function of $[A, S_T]$, and $A - [S_T, X] - Y$ forms a Markov chain; and step $(c)$ follows because one may optimize $X(s_T)$ separately for each $s_T \in \mathcal{S}_T$.
As discussed in [10], a practical motivation for this model is when the CSIT is based on error-free feedback from the receiver to the transmitter. In this case, where $S_T$ is a function of $S_R$, the expression (144) becomes
$$\tilde{P}(s_R) = \big| \mathrm{E}[\, Y\, U(s_T)^* \mid S_R = s_R \,] \big|^2.$$
Remark 45.
The insight that one can replace adaptive symbols A with channel inputs X when X is a function of A and past Y appeared for two-way channels in ([9], Section 4.2.3) and networks in ([22], Section V.A), ([72], Section IV.F).

5.3. MIMO Channels and K-Partitions

We consider generalizations to MIMO channels and K-partitions as in Section 3.2.
MIMO Channels: Consider the average GMI
$$I_1(A; \underline{Y} \mid S_R) = \int_{\mathcal{S}_R} p(s_R)\, I_1(A; \underline{Y} \mid S_R = s_R)\, \mathrm{d}s_R$$
and choose the parameters (113) and (114) for the event $S_R = s_R$. We have
$$\tilde{\mathrm{H}}(s_R) = \mathrm{E}\big[\, \underline{Y}\, \underline{\bar{X}}^\dagger \,\big|\, S_R = s_R \,\big]\; \mathrm{E}\big[\, \underline{\bar{X}}\, \underline{\bar{X}}^\dagger \,\big|\, S_R = s_R \,\big]^{-1}$$
$$Q_{\underline{\tilde{Z}}}(s_R) = \mathrm{E}\big[\, \underline{Y}\, \underline{Y}^\dagger \,\big|\, S_R = s_R \,\big] - \tilde{\mathrm{H}}(s_R)\; \mathrm{E}\big[\, \underline{\bar{X}}\, \underline{\bar{X}}^\dagger \,\big|\, S_R = s_R \,\big]\; \tilde{\mathrm{H}}(s_R)^\dagger$$
and the GMI (149) is (cf. (60) and (112))
$$I_1(A; \underline{Y} \mid S_R) = \mathrm{E}\Big[ \log \det\Big( \mathrm{I} + Q_{\underline{\tilde{Z}}}(S_R)^{-1}\, \tilde{\mathrm{H}}(S_R)\, Q_{\underline{\bar{X}}}\, \tilde{\mathrm{H}}(S_R)^\dagger \Big) \Big].$$
K-Partitions: Let $\{ \underline{\mathcal{Y}}_k : k = 1, \dots, K \}$ be a $K$-partition of $\underline{\mathcal{Y}}$ and define the events $\mathcal{E}_k = \{ \underline{Y} \in \underline{\mathcal{Y}}_k \}$ for $k = 1, \dots, K$. As in Remark 13, $K$-partitioning formally includes (149) as a special case by including $S_R$ as part of the receiver's "overall" channel output $\underline{\tilde{Y}} = [\underline{Y}, S_R]$. The following lemma generalizes Lemma 1.
Lemma 5.
A GMI with $s = 1$ for the channel $p(\underline{y} \mid a)$ is
$$I_1(A; \underline{Y}) = \sum_{k=1}^{K} \Pr[\mathcal{E}_k] \left[ \log \det\Big( \mathrm{I} + Q_{\underline{Z}_k}^{-1} \mathrm{H}_k Q_{\underline{\bar{X}}} \mathrm{H}_k^\dagger \Big) + \mathrm{E}\Big[ \underline{Y}^\dagger \big( Q_{\underline{Z}_k} + \mathrm{H}_k Q_{\underline{\bar{X}}} \mathrm{H}_k^\dagger \big)^{-1} \underline{Y} \,\Big|\, \mathcal{E}_k \Big] - \mathrm{E}\Big[ \big( \underline{Y} - \mathrm{H}_k \underline{\bar{X}} \big)^\dagger Q_{\underline{Z}_k}^{-1} \big( \underline{Y} - \mathrm{H}_k \underline{\bar{X}} \big) \,\Big|\, \mathcal{E}_k \Big] \right]$$
where the $\mathrm{H}_k$ and $Q_{\underline{Z}_k}$, $k = 1, \dots, K$, can be optimized.
Remark 46.
For scalars, the GMI (153) is
$$I_1(A; Y) = \sum_{k=1}^{K} \Pr[\mathcal{E}_k] \left[ \log\!\left( 1 + \frac{|h_k|^2 \bar{P}}{\sigma_k^2} \right) + \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_k \big]}{\sigma_k^2 + |h_k|^2 \bar{P}} - \frac{\mathrm{E}\big[ |Y - h_k \bar{X}|^2 \mid \mathcal{E}_k \big]}{\sigma_k^2} \right]$$
which is the same as (64) except that $\bar{X}, \bar{P}$ replace $X, P$. If we follow (66) and (67), then (154) becomes (68) but with
$$h_k = \mathrm{E}\big[\, Y \bar{X}^* \mid \mathcal{E}_k \,\big] / P_k, \quad P_k = \mathrm{E}\big[\, |\bar{X}|^2 \mid \mathcal{E}_k \,\big].$$
Remark 47.
Consider Remark 14 and choose K = 2 , h 1 = 0 , σ 1 2 = 1 . The GMI (154) then has only the k = 2 term, and it again remains to select h 2 , σ 2 2 , and t R .
Remark 48.
If we define
$$Q_{\underline{\bar{X}}}(k) = \mathrm{E}\big[\, \underline{\bar{X}}\, \underline{\bar{X}}^\dagger \,\big|\, \mathcal{E}_k \,\big], \quad Q_{\underline{Y}}(k) = \mathrm{E}\big[\, \underline{Y}\, \underline{Y}^\dagger \,\big|\, \mathcal{E}_k \,\big]$$
and choose the LMMSE auxiliary models with
$$\mathrm{H}_k = \mathrm{E}\big[\, \underline{Y}\, \underline{\bar{X}}^\dagger \,\big|\, \mathcal{E}_k \,\big]\; Q_{\underline{\bar{X}}}(k)^{-1}$$
$$Q_{\underline{Z}_k} = Q_{\underline{Y}}(k) - \mathrm{H}_k\, Q_{\underline{\bar{X}}}(k)\, \mathrm{H}_k^\dagger$$
for $k = 1, \dots, K$, then the expression (153) is (cf. (68))
$$I_1(A; \underline{Y}) = \sum_{k=1}^{K} \Pr[\mathcal{E}_k] \left[ \log \det\Big( \mathrm{I} + Q_{\underline{Z}_k}^{-1} \mathrm{H}_k Q_{\underline{\bar{X}}} \mathrm{H}_k^\dagger \Big) - \mathrm{tr}\Big( \big( Q_{\underline{Y}}(k) + \mathrm{H}_k D_{\underline{\bar{X}}}(k) \mathrm{H}_k^\dagger \big)^{-1}\, \mathrm{H}_k D_{\underline{\bar{X}}}(k) \mathrm{H}_k^\dagger \Big) \right]$$
where $D_{\underline{\bar{X}}}(k) = Q_{\underline{\bar{X}}} - Q_{\underline{\bar{X}}}(k)$.
Remark 49.
We may proceed as in Remark 18 and consider large K. These steps are given in Appendix F.

6. Fading Channels with AWGN

This section treats scalar, complex-alphabet, AWGN channels with CSIR for which the channel output is
$$[\, Y, S_R \,] = [\, H X + Z,\; S_R \,]$$
where $H, A, Z$ are mutually independent, $\mathrm{E}[|H|^2] = 1$, and $Z \sim \mathcal{CN}(0, 1)$. The capacity under the power constraint $\mathrm{E}[|X|^2] \le P$ is (cf. (138))
$$C(P) = \max_{A:\, \mathrm{E}[|X|^2] \le P} I(A; Y \mid S_R).$$
However, the optimization in (160) is often intractable, and we desire expressions with $\log(1 + \mathrm{SNR})$ terms to gain insight. We develop three such expressions: an upper bound and two lower bounds. It will be convenient to write $G = |H|^2$.
Capacity Upper Bound: We state this bound as a lemma since we use it to prove Proposition 2 below.
Lemma 6.
The capacity (160) is upper bounded as
$$C(P) \le \max \mathrm{E}\big[ \log\big( 1 + G\, P(S_T) \big) \big]$$
where the maximization is over policies $P(S_T)$ with $\mathrm{E}[P(S_T)] = P$.
Proof. 
Consider the steps
$$I(A; Y \mid S_R) \le I(A; Y, S_T, H \mid S_R) \overset{(a)}{=} I(A; Y \mid S_R, S_T, H) = h(Y \mid S_R, S_T, H) - h(Z) \overset{(b)}{\le} \mathrm{E}\big[ \log \mathrm{Var}[\, Y \mid S_R, S_T, H \,] \big]$$
where step $(a)$ is because $A$ and $[S_R, S_T, H]$ are independent, and step $(b)$ follows by the entropy bound
$$h(Y \mid B = b) \le \log\big( \pi e\, \mathrm{Var}[\, Y \mid B = b \,] \big)$$
which we applied with $B = [S_R, S_T, H]$. Finally, we compute $\mathrm{Var}[\, Y \mid S_R, S_T, H \,] = 1 + G\, P(S_T)$. □
Reverse Model GMI: Consider the adaptive symbol (88) and the GMI (139). We expand the variances in (139) as
$$\mathrm{Var}\big[ U \mid Y = y, S_R = s_R \big] = \mathrm{E}\big[ |U|^2 \mid Y = y, S_R = s_R \big] - \big| \mathrm{E}[\, U \mid Y = y, S_R = s_R \,] \big|^2.$$
Appendix C shows that one may write
$$\mathrm{E}\big[ U \mid Y = y, S_R = s_R \big] = \int_{\mathbb{C} \times \mathcal{S}_T} p(h, s_T \mid y, s_R)\, \frac{h^* \sqrt{P(s_T)}\, e^{-j\phi(s_T)}\, y}{1 + |h|^2 P(s_T)}\, \mathrm{d}s_T\, \mathrm{d}h$$
and
$$\mathrm{E}\big[ |U|^2 \mid Y = y, S_R = s_R \big] = \int_{\mathbb{C} \times \mathcal{S}_T} p(h, s_T \mid y, s_R) \left[ \frac{1}{1 + |h|^2 P(s_T)} + \frac{|h|^2 P(s_T)\, |y|^2}{\big( 1 + |h|^2 P(s_T) \big)^2} \right] \mathrm{d}s_T\, \mathrm{d}h.$$
We use the expressions (164) and (165) to compute achievable rates by numerical integration. For example, suppose that $S_T = 0$ and $S_R = H$, i.e., we have full CSIR and no CSIT. The averaging density is then
$$p(h, s_T \mid y, s_R) = \delta(h - s_R)\, \delta(s_T)$$
and the variance simplifies to the capacity-achieving form
$$\mathrm{Var}\big[ U \mid Y = y, S_R = h \big] = \frac{1}{1 + |h|^2 P}.$$
Forward Model GMI: A forward model GMI is given by Theorem 1 where
$$\tilde{P}(s_R) = \mathrm{E}\Big[\, \big| \mathrm{E}\big[ H \sqrt{P(S_T)} \,\big|\, S_T,\, S_R = s_R \big] \big| \;\Big|\; S_R = s_R \Big]^2$$
$$\mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big] = 1 + \mathrm{E}\big[ G\, P(S_T) \mid S_R = s_R \big]$$
so that (143) becomes
$$I_1(A; Y \mid S_R) = \mathrm{E}\!\left[ \log\!\left( 1 + \frac{\tilde{P}(S_R)}{1 + \mathrm{E}\big[ G P(S_T) \mid S_R \big] - \tilde{P}(S_R)} \right) \right].$$
Remark 50.
Jensen's inequality implies that the denominator in (168) is greater than or equal to
$$1 + \mathrm{Var}\Big[ \sqrt{G\, P(S_T)} \;\Big|\; S_R \Big].$$
Equality requires that for all $S_R = s_R$ we have
$$\tilde{P}(s_R) = \mathrm{E}\Big[ \sqrt{G\, P(S_T)} \;\Big|\; S_R = s_R \Big]^2$$
which is valid if $H$ is a function of $[S_R, S_T]$, for example. However, if there is channel uncertainty after conditioning on $[S_R, S_T]$, then $\tilde{P}(s_R)$ is usually smaller than the RHS of (170).
Remark 51.
Consider $S_R = H$ or $S_R = H \sqrt{P(S_T)}$. For both cases, $H$ is a function of $[S_R, S_T]$ and the denominator in (168) is the variance expression (169). In fact, for $S_R = H \sqrt{P(S_T)}$, the expression (169) takes on the minimal value 1. This CSIR is thus the best possible; see Proposition 2.
Remark 52.
For MIMO channels, we replace (159) with
$$[\, \underline{Y}, S_R \,] = [\, \mathrm{H}\, \underline{X} + \underline{Z},\; S_R \,]$$
where $\mathrm{H}, A, \underline{Z}$ are mutually independent and $\underline{Z} \sim \mathcal{CN}(\underline{0}, \mathrm{I})$. One usually considers the constraint $\mathrm{E}\big[ \| \underline{X} \|^2 \big] \le P$.
Remark 53.
The model (171) includes block fading. For example, choosing $M = N$ and $\mathrm{H} = H\, \mathrm{I}$ gives scalar block fading. Moreover, the capacity per symbol without in-block feedback is the same as for the $M = N = 1$ case except that $P$ is replaced with $P/M$; see [11] and Section 9.

6.1. CSIR and CSIT Models

We study two classes of CSIR, as shown in Table 1. The first class has full (or "perfect") CSIR, by which we mean either $S_R = H$ or $S_R = H \sqrt{P(S_T)}$. The motivation for studying the latter case is that it models block fading channels with long blocks, where the receiver estimates $H \sqrt{P(S_T)}$ using pilot symbols, and the number of pilot symbols is much smaller than the block length [10]. Moreover, one achieves the upper bound (161); see Proposition 2 below.
We coarsely categorize the CSIT as follows:
  • Full CSIT: S T = H ;
  • CSIT@R: $S_T = q_u(G)$ where $q_u(\cdot)$ is the quantizer of Section 2.9 with $B = 0, 1, \dots$;
  • Partial CSIT: S T is not known exactly at the receiver.
The capacity of the CSIT@R models is given by log ( 1 + SNR ) expressions [10,92]; see also [93]. The partial CSIT model is interesting because achieving capacity generally requires adaptive codewords and closed-form capacity expressions are unavailable. The GMI lower bound of Theorem 1 and Remark 42 and the capacity upper bound of Lemma 6 serve as benchmarks.
The partial CSIR models have $S_R$ being a lossy function of $H$. For example, a common model is based on LMMSE channel estimation with
$$H = \sqrt{\bar{\epsilon}}\, S_R + \sqrt{\epsilon}\, Z_R$$
where $0 \le \epsilon \le 1$ and $S_R, Z_R$ are uncorrelated. The CSIT is categorized as above, except that we consider $S_T = f_T(S_R)$ for some function $f_T(\cdot)$ rather than $S_T = q_u(G)$.
To illustrate the theory, we study two types of fading: one with discrete $H$ and one with continuous $H$, namely
  • Section 7: on-off fading with $P_H(0) = P_H(\sqrt{2}) = 1/2$;
  • Section 8: Rayleigh fading with $H \sim \mathcal{CN}(0, 1)$.
For on-off fading we have $p(g) = \frac{1}{2}\delta(g) + \frac{1}{2}\delta(g - 2)$, and for Rayleigh fading we have $p(g) = e^{-g} \cdot 1(g \ge 0)$.
Remark 54.
For channels with partial CSIR, we will study the GMI for partitions with K = 1 and K = 2 . The full CSIT model has received relatively little attention in the literature, perhaps because CSIR is usually more accurate than CSIT ([5], Section 4.2.3).

6.2. No CSIR, No CSIT

Without CSIR or CSIT, the channel is a classic memoryless channel [94] for which the capacity (160) becomes the usual expression with $S_R = 0$ and $A = X$. For CSCG $X$ and $U = X / \sqrt{\mathrm{E}[|X|^2]}$, the reverse and forward model GMIs (139) and (168) are the respective
$$I_1(X; Y) = \mathrm{E}\big[ -\log \mathrm{Var}[\, U \mid Y \,] \big]$$
$$I_1(X; Y) = \log\!\left( 1 + \frac{P\, |\mathrm{E}[H]|^2}{1 + P\, \mathrm{Var}[H]} \right).$$
For example, the forward model GMI is zero if $\mathrm{E}[H] = 0$.

6.3. Full CSIR, CSIT@R

Consider the full CSIR models with $S_R = H$ and CSIT@R. The capacity is given by $\log(1 + \mathrm{SNR})$ expressions that we review.
First, the capacity with $B = 0$ (no CSIT) is
$$C(P) = \mathrm{E}\big[ \log( 1 + G P ) \big] = \int_0^\infty p(g) \log( 1 + g P )\, \mathrm{d}g.$$
The wideband derivatives are (see (37))
$$C'(0) = \mathrm{E}[G] = 1, \quad C''(0) = -\mathrm{E}\big[ G^2 \big]$$
so that the wideband values (37) are (see ([73], Theorem 13))
$$\left( \frac{E_b}{N_0} \right)_{\min} = \log 2, \quad S = \frac{2}{\mathrm{E}[G^2]}.$$
The minimal $E_b/N_0$ is the same as without fading, namely $-1.59$ dB. However, Jensen's inequality gives $\mathrm{E}[G^2] \ge \mathrm{E}[G]^2 = 1$ with equality if and only if $G = 1$. Thus, fading reduces the capacity slope $S$.
More generally, the capacity with full CSIR and $S_T = q_u(G)$ is (see [10])
$$C(P) = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P} \mathrm{E}\big[ \log\big( 1 + G\, P(S_T) \big) \big] = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P} \int_0^\infty \!\!\int p(g, s_T) \log\big( 1 + g P(s_T) \big)\, \mathrm{d}g\, \mathrm{d}s_T.$$
To optimize the power levels $P(s_T)$, consider the Lagrangian
$$\mathrm{E}\big[ \log\big( 1 + G P(S_T) \big) \big] + \lambda\big( P - \mathrm{E}[P(S_T)] \big)$$
where $\lambda \ge 0$ is a Lagrange multiplier. Taking the derivative with respect to $P(s_T)$, we have
$$\lambda = \mathrm{E}\!\left[ \frac{G}{1 + G P(s_T)} \,\middle|\, S_T = s_T \right] = \int_0^\infty p(g \mid s_T)\, \frac{g}{1 + g P(s_T)}\, \mathrm{d}g$$
as long as $P(s_T) \ge 0$. If this equation cannot be satisfied, choose $P(s_T) = 0$. Finally, set $\lambda$ so that $\mathrm{E}[P(S_T)] = P$.
For example, consider $B = \infty$ and $S_T = G$. We then have $p(g \mid s_T) = \delta(g - s_T)$ and therefore the waterfilling policy
$$P(g) = \left[ \frac{1}{\lambda} - \frac{1}{g} \right]^+$$
where $\lambda$ is chosen so that $\mathrm{E}[P(G)] = P$. The capacity (178) is then (see ([95], Equation (7)))
$$C(P) = \int_\lambda^\infty p(g) \log( g/\lambda )\, \mathrm{d}g.$$
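For a concrete feel, here is a minimal numerical sketch (our own helper names, not the paper's) that implements (181) and (182): bisection on the water level $\lambda$ to meet $\mathrm{E}[P(G)] = P$, with quadrature for the capacity; Rayleigh $p(g) = e^{-g}$ is used as the example density.

```python
# Sketch: waterfilling P(g) = [1/lam - 1/g]^+ for full CSIT (S_T = G),
# with bisection on lam so that E[P(G)] = P; generic density p(g) assumed.
import numpy as np
from scipy import integrate

def avg_power(lam, p):
    return integrate.quad(lambda g: p(g) * (1 / lam - 1 / g), lam, np.inf)[0]

def capacity_nats(lam, p):
    return integrate.quad(lambda g: p(g) * np.log(g / lam), lam, np.inf)[0]

def solve_lam(P, p, lo=1e-9, hi=1e3):
    mid = np.sqrt(lo * hi)
    for _ in range(200):               # avg_power decreases in lam
        mid = np.sqrt(lo * hi)
        if avg_power(mid, p) > P:
            lo = mid
        else:
            hi = mid
    return mid

p_rayleigh = lambda g: np.exp(-g)
for P in [0.1, 1.0, 10.0]:
    lam = solve_lam(P, p_rayleigh)
    print(P, lam, capacity_nats(lam, p_rayleigh) / np.log(2), "bits/symbol")
```

Geometric bisection is used because $\lambda$ spans many decades as $P$ varies.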
Consider now the quantizer $q_u(\cdot)$ of Section 2.9 with $B = 1$. We have two equations for $\lambda$, namely
$$\lambda = \int_0^\Delta \frac{p(g)}{P_{S_T}(\Delta/2)} \cdot \frac{g}{1 + g P(\Delta/2)}\, \mathrm{d}g$$
$$\lambda = \int_\Delta^\infty \frac{p(g)}{P_{S_T}(3\Delta/2)} \cdot \frac{g}{1 + g P(3\Delta/2)}\, \mathrm{d}g.$$
Observe the following for (183) and (184):
  • both $P(\Delta/2)$ and $P(3\Delta/2)$ decrease as $\lambda$ increases;
  • the maximal $\lambda$ permitted by (183) is $\mathrm{E}[\, G \mid G \le \Delta \,]$, which is obtained with $P(\Delta/2) = 0$;
  • the maximal $\lambda$ permitted by (184) is $\mathrm{E}[\, G \mid G \ge \Delta \,]$, which is obtained with $P(3\Delta/2) = 0$.
Thus, if $\mathrm{E}[\, G \mid G \ge \Delta \,] > \mathrm{E}[\, G \mid G \le \Delta \,]$, then at $P$ below some threshold, we have $P(\Delta/2) = 0$ and $P(3\Delta/2) = P / P_{S_T}(3\Delta/2)$. The capacity in nats per symbol at low power and for fixed $\Delta$ is thus
$$C(P) = \int_\Delta^\infty p(g) \log\big( 1 + g P(3\Delta/2) \big)\, \mathrm{d}g \approx P\, \mathrm{E}[\, G \mid G \ge \Delta \,] - \frac{P^2}{2\, P_{S_T}(3\Delta/2)}\, \mathrm{E}\big[ G^2 \,\big|\, G \ge \Delta \big]$$
where we used
$$\log(1 + x) \approx x - \frac{x^2}{2}$$
for small $x$. The wideband values (37) are
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\mathrm{E}[\, G \mid G \ge \Delta \,]}$$
$$S = \frac{2\, P_{S_T}(3\Delta/2)\, \mathrm{E}[\, G \mid G \ge \Delta \,]^2}{\mathrm{E}[\, G^2 \mid G \ge \Delta \,]}.$$
One can thus make the minimum $E_b/N_0$ approach $-\infty$ if one can make $\mathrm{E}[\, G \mid G \ge \Delta \,]$ as large as desired by increasing $\Delta$.
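The wideband quantities (186) and (187) are easy to evaluate numerically. Below is a small sketch (the helper name is ours) for a generic gain density, with Rayleigh fading as the example; for $p(g) = e^{-g}$ and $\Delta = 1$ it reproduces the $-4.6$ dB value of Section 8.2.

```python
# Sketch: wideband quantities (186)-(187) for one-bit CSIT with threshold Delta.
import numpy as np
from scipy import integrate

def one_bit_wideband(p, Delta):
    prob_hi = integrate.quad(p, Delta, np.inf)[0]          # P_{S_T}(3*Delta/2)
    m1 = integrate.quad(lambda g: g * p(g), Delta, np.inf)[0] / prob_hi
    m2 = integrate.quad(lambda g: g * g * p(g), Delta, np.inf)[0] / prob_hi
    ebn0_min_db = 10 * np.log10(np.log(2) / m1)            # (186)
    slope = 2 * prob_hi * m1 ** 2 / m2                     # (187)
    return ebn0_min_db, slope

p_rayleigh = lambda g: np.exp(-g)
for Delta in [0.5, 1.0, 2.0]:
    db, S = one_bit_wideband(p_rayleigh, Delta)
    print(f"Delta={Delta}: Eb/N0_min = {db:.2f} dB, slope S = {S:.3f}")
```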
Remark 55.
Consider the MIMO model (171) with $S_R = \mathrm{H}$. Suppose the CSIT is $S_T = f_T(S_R)$ for some function $f_T(\cdot)$. The capacity (178) generalizes to
$$C(P) = \max_{\underline{X}(S_T):\, \mathrm{E}[\| \underline{X}(S_T) \|^2] \le P} I\big( \underline{X}; \mathrm{H}\underline{X} + \underline{Z} \,\big|\, \mathrm{H}, S_T \big) = \max_{Q(S_T):\, \mathrm{E}[\mathrm{tr}\, Q(S_T)] \le P} \mathrm{E}\Big[ \log \det\Big( \mathrm{I} + \mathrm{H}\, Q(S_T)\, \mathrm{H}^\dagger \Big) \Big].$$
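The inner optimization over $Q$ for a fixed channel realization is classic eigenmode waterfilling. The sketch below illustrates this special case only (one fixed $\mathrm{H}$ known at the transmitter), not the full optimization over policies $Q(S_T)$; the function names are ours.

```python
# Sketch: eigenmode waterfilling for the log-det objective with a fixed H.
import numpy as np

def waterfill_Q(H, P):
    U, s, Vh = np.linalg.svd(H, full_matrices=False)
    gains = s ** 2
    lo, hi = 0.0, P + np.sum(1.0 / gains)       # bracket the water level mu
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        powers = np.maximum(mu - 1.0 / gains, 0.0)
        lo, hi = (mu, hi) if powers.sum() < P else (lo, mu)
    return Vh.conj().T @ np.diag(powers) @ Vh    # transmit on right singular vectors

rng = np.random.default_rng(0)
H = (rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))) / np.sqrt(2)
Q = waterfill_Q(H, P=4.0)
rate = np.log2(np.linalg.det(np.eye(2) + H @ Q @ H.conj().T)).real
print("rate =", rate, "bits/use")
```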

6.4. Full CSIR, Partial CSIT

Consider first the full CSIR $S_R = H \sqrt{P(S_T)}$ and then the less informative $S_R = H$.
$S_R = H \sqrt{P(S_T)}$: We have the following capacity result, which implies that this CSIR is the best possible, since one can achieve the same rate as if the receiver sees both $H$ and $S_T$; see the first step of (162). We could thus have classified this model as CSIT@R.
Proposition 2
(see ([10], Proposition 3)). The capacity of the channel (159) with $S_R = H \sqrt{P(S_T)}$ and general $S_T$ is
$$C(P) = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P} \int_{\mathbb{C}} p(s_R) \log\big( 1 + |s_R|^2 \big)\, \mathrm{d}s_R = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P} \mathrm{E}\big[ \log\big( 1 + G\, P(S_T) \big) \big].$$
Proof. 
Achievability follows by Theorem 1 with Remark 51. The converse is given by Lemma 6. □
Remark 56.
Proposition 2 gives an upper bound and (thus) a target rate when the receiver has partial CSIR. For example, we will use the K-partition idea of Lemma 1 (see also Remark 46) to approach the upper bound for large SNR.
Remark 57.
Proposition 2 partially generalizes to block-fading channels; see Proposition 3 in Section 9.5.
$S_R = H$: The capacity is (138) with
$$I(A; Y \mid H) = \mathrm{E}\!\left[ \log \frac{p(Y \mid A, H)}{p(Y \mid H)} \right]$$
where $\mathrm{E}[|X|^2] \le P$ and where
$$p(y \mid a, h) = \int_{\mathbb{C}} p(s_T \mid h)\, \frac{e^{-|y - h\, x(s_T)|^2}}{\pi}\, \mathrm{d}s_T$$
and
$$p(y \mid h) = \int_{\mathbb{C}} p(s_T \mid h) \int_{\mathcal{A}} p(a)\, p(y \mid a, h, s_T)\, \mathrm{d}a\, \mathrm{d}s_T = \int_{\mathbb{C}} p(s_T \mid h) \int_{\mathbb{C}} p(x(s_T))\, \frac{e^{-|y - h\, x(s_T)|^2}}{\pi}\, \mathrm{d}x(s_T)\, \mathrm{d}s_T.$$
For example, if each entry $X(s_T)$ of $A$ is CSCG with variance $P(s_T)$, then
$$p(y \mid h) = \int_{\mathbb{C}} p(s_T \mid h)\, \frac{\exp\big( -|y|^2 / ( 1 + g P(s_T) ) \big)}{\pi\big( 1 + g P(s_T) \big)}\, \mathrm{d}s_T.$$
In general, one can compute I ( A ; Y | H ) numerically by using (190)–(192), but the calculations are hampered if the integrals in (191) and (192) do not simplify.
For the reverse model GMI (139), the averaging density in (164) and (165) is here
$$p(h, s_T \mid y, s_R) = \delta(h - s_R)\, p(s_T \mid h)\, \frac{p(y \mid h, s_T)}{p(y \mid h)}.$$
We use numerical integration to compute the GMI.
To obtain more insight, we state the forward model rates of Theorem 1 and Remark 51 as a Corollary.
Corollary 1.
An achievable rate for the fading channels (159) with $S_R = H$ and partial CSIT is the forward model GMI
$$I_1(A; Y \mid H) = \mathrm{E}\big[ \log\big( 1 + \mathrm{SNR}(H) \big) \big]$$
where
$$\mathrm{SNR}(h) = \frac{|h|^2\, \tilde{P}_T(h)}{1 + |h|^2\, \mathrm{Var}\big[ \sqrt{P(S_T)} \,\big|\, H = h \big]}$$
and
$$\tilde{P}_T(h) = \mathrm{E}\Big[ \sqrt{P(S_T)} \;\Big|\; H = h \Big]^2.$$
Remark 58.
Jensen's inequality gives
$$\tilde{P}_T(h) \le \mathrm{E}\big[ P(S_T) \mid H = h \big]$$
by the concavity of the square root. Equality holds if and only if $P(S_T)$ is a constant given $H = h$.
Remark 59.
Choosing P ( s T ) = P for all s T in Corollary 1 gives P ˜ T ( h ) = P for all h and the rate (195) is the capacity (175) without CSIT.
Remark 60.
For large $P$, the $\mathrm{SNR}(h)$ in (196) saturates unless $P(s_T)/P \to 1$ for all $s_T$, i.e., the high-SNR capacity is the same as the capacity without CSIT. The CSIT thus must become more accurate as $P$ increases to improve the rate.
Remark 61.
To optimize the power levels, consider (146) and the derivatives
$$\tilde{P}(h)' = 2\, |h|^2 \sqrt{\tilde{P}_T(h)}\; p(s_T \mid h)$$
$$\mathrm{E}\big[ |Y|^2 \mid H = h \big]' = 2\, |h|^2 \sqrt{P(s_T)}\; p(s_T \mid h).$$
However, the resulting equations give little insight due to the expectation over $H$ in (146). An exception is the on-off fading case, where the expectation has only one term; see (254) and (255).

6.5. Partial CSIR, Full CSIT

Suppose $S_R$ is a (perhaps noisy) function of $H$; see (172). The capacity is given by (160), for which we need to compute $p(y \mid a, s_R)$ and $p(y \mid s_R)$. The GMI with a $K$-partition of the output space $\mathcal{Y} \times \mathcal{S}_R$ can be helpful for these problems. We assume that the CSIR is either $S_R = 0$ or $S_R = 1(G \ge t)$ for some transmitter threshold $t$; see [95].
Suppose that $S_T = H$. We then have
$$p(y \mid a, s_R) = \int_{\mathbb{C}} p(h \mid s_R)\, \frac{\exp\big( -| y - h\, x(h) |^2 \big)}{\pi}\, \mathrm{d}h, \quad p(y \mid s_R) = \int_{\mathbb{C}^2} p(h \mid s_R)\, p(x(h))\, \frac{\exp\big( -| y - h\, x(h) |^2 \big)}{\pi}\, \mathrm{d}x(h)\, \mathrm{d}h.$$
Now select the $X(h)$ to be jointly CSCG with variances $\mathrm{E}[|X(h)|^2] = P(h)$ and correlation coefficients
$$\rho(h, h') = \frac{\mathrm{E}\big[ X(h)\, X(h')^* \big]}{\sqrt{P(h)\, P(h')}}$$
and where $\mathrm{E}[P(H)] \le P$. We then have
$$p(y \mid s_R) = \int_{\mathbb{C}} p(h \mid s_R)\, \frac{e^{-|y|^2/(|h|^2 P(h) + 1)}}{\pi\big( |h|^2 P(h) + 1 \big)}\, \mathrm{d}h.$$
As in (97), $p(y \mid s_R)$ and therefore $h(Y \mid S_R)$ depend only on the marginals $p(x(h))$ of $A$ and not on the $\rho(h, h')$. We thus have the problem of finding the $\rho(h, h')$ that minimize
$$h(Y \mid S_R, A) = \int_{\mathcal{A}} p(a)\, h(Y \mid S_R, A = a)\, \mathrm{d}a.$$
However, we study the conventional $A$ in (88) for simplicity.
For the reverse model GMI (139), the averaging density in (164) and (165) is (cf. (194))
$$p(h, s_T \mid y, s_R) = \delta(s_T - h)\, p(h \mid s_R)\, \frac{p(y \mid h, s_R)}{p(y \mid s_R)}.$$
We again use numerical integration to compute the GMI.
For the forward model GMI, consider the same model and CSCG $X$ as in Theorem 1. Since $H$ is a function of $S_T$, we use (169) in Remark 50 to write
$$I_1(A; Y \mid S_R) = \mathrm{E}\!\left[ \log\!\left( 1 + \frac{\tilde{P}(S_R)}{1 + \mathrm{Var}\big[ \sqrt{G\, P(H)} \,\big|\, S_R \big]} \right) \right]$$
where (see (170))
$$\tilde{P}(s_R) = \mathrm{E}\Big[ \sqrt{G\, P(H)} \;\Big|\; S_R = s_R \Big]^2$$
$$\mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big] = 1 + \mathrm{E}\big[ G\, P(H) \mid S_R = s_R \big].$$
The transmitter compensates for the phase of H, and it remains to adjust the transmit power levels P ( h ) . We study five power control policies and two types of CSIR; see Table 2.
Heuristic Policies: The first three policies are reasonable heuristics and have the form
$$P(h) = \begin{cases} \hat{P}\, g^a, & g \ge t \\ 0, & \text{else} \end{cases}$$
for some choice of real $a$, and where
$$\hat{P} = \frac{P}{\int_t^\infty p(g)\, g^a\, \mathrm{d}g}.$$
In particular, choosing $a = 0, +1, -1$, we obtain policies that we call truncated constant power (TCP), truncated matched filtering (TMF), and truncated channel inversion (TCI), respectively; see ([5], page 487), [95]. For such policies, we compute
$$\tilde{P}(s_R) = \hat{P} \left( \int_t^\infty p(g \mid s_R)\, \sqrt{g^{1+a}}\, \mathrm{d}g \right)^2$$
$$\mathrm{E}\big[ G\, P(H) \mid S_R = s_R \big] = \hat{P} \int_t^\infty p(g \mid s_R)\, g^{1+a}\, \mathrm{d}g.$$
These policies all have the form $P(h) = P \cdot f(h)$ for some function $f(\cdot)$ that is independent of $P$. The minimum $E_b/N_0$ in (37), with $C(P)$ replaced by the GMI, is thus
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\int_t^\infty p(g)\, g^a\, \mathrm{d}g}{\mathrm{E}\!\left[ \left( \int_t^\infty p(g \mid S_R)\, \sqrt{g^{1+a}}\, \mathrm{d}g \right)^2 \right]}\; \log 2.$$
For instance, consider the threshold $t = 0$ (no truncation). The TCP ($a = 0$) and TMF ($a = 1$) policies have $\hat{P} = P$, while TCI ($a = -1$) has $\hat{P} = P / \mathrm{E}[G^{-1}]$. For TCP, TMF, and TCI, we compute the respective
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\mathrm{E}\big[ \mathrm{E}[\sqrt{G} \mid S_R]^2 \big]}$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\mathrm{E}\big[ \mathrm{E}[G \mid S_R]^2 \big]}$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \mathrm{E}\big[ G^{-1} \big]\, \log 2.$$
Applying Jensen's inequality to the square root, square, and inverse functions in (210)–(212), we find that for $t = 0$:
  • the minimum $E_b/N_0$ of TCP and TCI is larger (worse) than $-1.59$ dB unless there is no fading;
  • the minimum $E_b/N_0$ of TMF is smaller (better) than $-1.59$ dB unless $\mathrm{E}[G \mid S_R] = \mathrm{E}[G] = 1$.
However, we emphasize that these claims apply to the GMI and not necessarily the mutual information; see Section 8.4. A numerical check of these claims is sketched below.
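The following Monte Carlo sketch checks (210)–(212) for Rayleigh fading with a hypothetical one-bit CSIR $S_R = 1(G \ge 1)$ (our choice for illustration, not a model from the paper):

```python
# Sketch: Monte Carlo check of (210)-(212), Rayleigh fading, t = 0,
# hypothetical CSIR S_R = 1(G >= 1).
import numpy as np

rng = np.random.default_rng(1)
G = rng.exponential(1.0, size=2_000_000)
S = G >= 1.0
pr = np.array([np.mean(~S), np.mean(S)])

def cond(fn):
    # (E[fn(G) | S_R = 0], E[fn(G) | S_R = 1])
    return np.array([fn(G[~S]).mean(), fn(G[S]).mean()])

tcp = np.log(2) / np.sum(pr * cond(np.sqrt) ** 2)       # (210)
tmf = np.log(2) / np.sum(pr * cond(lambda g: g) ** 2)   # (211)
# (212): E[1/G] diverges for Rayleigh, so the TCI minimum Eb/N0 is unbounded
print("TCP min Eb/N0: %.2f dB" % (10 * np.log10(tcp)))
print("TMF min Eb/N0: %.2f dB (below -1.59 dB)" % (10 * np.log10(tmf)))
```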
GMI-Optimal Policy: The fourth policy is optimal for the GMI (202) and has the form of an MMSE precoder. This policy motivates a truncated MMSE (TMMSE) policy that generalizes and improves TMF and TCI.
Taking the derivative of the Lagrangian
$$I_1(A; Y \mid S_R) + \lambda\big( P - \mathrm{E}[P(H)] \big)$$
with respect to $\sqrt{P(h)}$, we have the following result.
Theorem 2.
The optimal power control policy for the GMI $I_1(A; Y \mid S_R)$ for the fading channels (159) with $S_T = H$ is
$$\sqrt{P(h)} = \frac{\alpha(h)\, |h|}{\lambda + \beta(h)\, |h|^2}$$
where $\lambda > 0$ is chosen so that $\mathrm{E}[P(H)] = P$, and
$$\alpha(h) = \int_{\mathbb{C}} p(s_R \mid h)\, \frac{\sqrt{\tilde{P}(s_R)}}{\mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big] - \tilde{P}(s_R)}\, \mathrm{d}s_R$$
$$\beta(h) = \int_{\mathbb{C}} p(s_R \mid h)\, \frac{\tilde{P}(s_R)}{\Big( \mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big] - \tilde{P}(s_R) \Big)\, \mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big]}\, \mathrm{d}s_R.$$
Proof. 
Apply (146) with (203) and (204) to obtain
P ˜ ( s R ) = 2 | h | P ˜ ( s R ) p ( h | s R )
E | Y | 2 | S R = s R = 2 | h | 2 P ( h ) p ( h | s R ) .
Inserting into (146) and rearranging terms we obtain (214) with (215) and (216). □
Remark 62.
The expressions (215) and (216) are self-referencing, as P ˜ ( s R ) itself depends on α ( h ) and β ( h ) . However, one simplification occurs if S R is a function of H: α ( h ) and β ( h ) are functions of s R only since the p ( s R | h ) in (215) and (216) is a Dirac generalized function.
Remark 63.
Consider the expression (214). We effectively have a matched filter for small | h | ; for large | h | , we effectively have a channel inversion. Recall that LMMSE filtering has similar behavior for low and high SNR, respectively.
Remark 64.
A heuristic based on the optimal policy is a TMMSE policy where the transmitter sets P ( h ) = 0 if G < t , and otherwise uses (214) but where α ( h ) , β ( h ) are independent of h. There are thus four parameters to optimize: λ, α, β, and t. This TMMSE policy will outperform TMF and TCI in general, as these are special cases where β = 0 and λ = 0 , respectively.
$S_R = 0$: For this CSIR, the GMI (202) simplifies to $I_1(A; Y)$ and the heuristic policy (TCP, TMF, TCI) rates are
$$I_1(A; Y) = \log\!\left( 1 + \frac{\hat{P}\, \mathrm{E}\big[ \sqrt{G^{1+a}} \cdot 1(G \ge t) \big]^2}{1 + \hat{P}\, \mathrm{Var}\big[ \sqrt{G^{1+a}} \cdot 1(G \ge t) \big]} \right).$$
Moreover, the expression (209) gives
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\mathrm{E}\big[ G^a \cdot 1(G \ge t) \big]}{\mathrm{E}\big[ \sqrt{G^{1+a}} \cdot 1(G \ge t) \big]^2}\; \log 2.$$
For TCP, TMF, and TCI, we compute the respective
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\Pr[G \ge t]\; \mathrm{E}\big[ \sqrt{G} \,\big|\, G \ge t \big]^2}$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\int_t^\infty p(g)\, g\, \mathrm{d}g}$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\mathrm{E}\big[ G^{-1} \,\big|\, G \ge t \big]}{\Pr[G \ge t]}\; \log 2.$$
Again applying Jensen's inequality to the various functions in (221)–(223), we find that:
  • the minimum $E_b/N_0$ of TMF is smaller (better) than that of TCP and TCI unless there is no fading, or unless the minimal $E_b/N_0$ is $-\infty$;
  • the best threshold for TMF is $t = 0$, and the minimal $E_b/N_0$ is $-1.59$ dB.
For the optimal policy, the parameters $\alpha(h)$ and $\beta(h)$ in (215) and (216) are constants independent of $h$ (see Remark 62), and the TMMSE policy with $t = 0$ is the GMI-optimal policy.
Remark 65.
The TCI channel densities are
$$p(y \mid a) = \Pr[G < t]\, \frac{e^{-|y|^2}}{\pi} + \Pr[G \ge t]\, \frac{e^{-|y - \sqrt{\hat{P}}\, u|^2}}{\pi}, \quad p(y) = \Pr[G < t]\, \frac{e^{-|y|^2}}{\pi} + \Pr[G \ge t]\, \frac{e^{-|y|^2/(1 + \hat{P})}}{\pi (1 + \hat{P})}.$$
Remark 66.
At high SNR, one might expect that the receiver can estimate P ( S T ) precisely even if S R = 0 . We show that this is indeed the case for on-off fading by using the K = 2 partition (154) of Remark 46. Moreover, the results prove that at high SNR one can approach I ( A ; Y ) ; see Section 7.3.
Remark 67.
For Rayleigh fading, the GMI with $K = 2$ in (154) is helpful at both high and low SNR. For instance, for $S_R = 0$ and TCI, the $K = 2$ GMI approaches the mutual information for $S_R = 1(G \ge t)$ as the SNR increases; see Remark 74 in Section 8.4. We further show that for $S_R = 0$, the TCI policy can achieve a minimal $E_b/N_0$ of $-\infty$ dB; see (301) in Section 8.4.
$S_R = 1(G \ge t)$: The heuristic policy rates are now (cf. (219); note the $\Pr[G \ge t]$ factor and the conditioning)
$$I_1(A; Y \mid S_R) = \Pr[G \ge t]\, \log\!\left( 1 + \frac{\hat{P}\, \mathrm{E}\big[ \sqrt{G^{1+a}} \,\big|\, G \ge t \big]^2}{1 + \hat{P}\, \mathrm{Var}\big[ \sqrt{G^{1+a}} \,\big|\, G \ge t \big]} \right).$$
Moreover, the expression (209) is
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\mathrm{E}\big[ G^a \,\big|\, G \ge t \big]}{\mathrm{E}\big[ \sqrt{G^{1+a}} \,\big|\, G \ge t \big]^2}\; \log 2.$$
For TCP, TMF, and TCI, we compute the respective
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\mathrm{E}\big[ \sqrt{G} \,\big|\, G \ge t \big]^2}$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\mathrm{E}\big[ G \,\big|\, G \ge t \big]}$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \mathrm{E}\big[ G^{-1} \,\big|\, G \ge t \big]\, \log 2.$$
Again applying Jensen's inequality to the various functions in (226)–(228), we find that:
  • the minimum $E_b/N_0$ of all policies can be made better than $-1.59$ dB by choosing $t > 0$;
  • the minimum $E_b/N_0$ of TMF is smaller (better) than that of TCP and TCI unless there is no fading or the minimal $E_b/N_0$ is $-\infty$.
For the optimal policy, Remark 62 points out that $\alpha(h)$ and $\beta(h)$ depend on $s_R$ only. We compute
$$\sqrt{P(h)} = \begin{cases} \dfrac{\alpha_0\, |h|}{\lambda + \beta_0\, |h|^2}, & g < t \\[2mm] \dfrac{\alpha_1\, |h|}{\lambda + \beta_1\, |h|^2}, & g \ge t \end{cases}$$
where for $s_R \in \{0, 1\}$ we have
$$\alpha_{s_R} = \frac{\sqrt{\tilde{P}(s_R)}}{\mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big] - \tilde{P}(s_R)}, \quad \beta_{s_R} = \frac{\tilde{P}(s_R)}{\Big( \mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big] - \tilde{P}(s_R) \Big)\, \mathrm{E}\big[ |Y|^2 \mid S_R = s_R \big]}.$$
Remark 68.
The GMI (224) for TCI ( a = 1 ) is the mutual information I ( A ; Y | S R ) . To see this, observe that the model q ( y | a , s R ) has
q ( y | a , 0 ) = e | y | 2 π , q ( y | a , 1 ) = e y P ^ u 2 π
and thus we have q ( y | a , s R ) = p ( y | a , s R ) for all y , a , s R .

6.6. Partial CSIR, CSIT@R

Suppose next that $S_R$ is a noisy function of $H$ (see, for instance, (172)) and $S_T = f_T(S_R)$. The capacity is given by (147), and we compute
$$I(X; Y \mid S_R) = \mathrm{E}\!\left[ \log \frac{p(Y \mid X, S_R)}{p(Y \mid S_R)} \right]$$
where, writing $s_T = f_T(s_R)$, we have
$$p(y \mid s_R, x) = \int_{\mathbb{C}} p(h \mid s_R)\, \frac{e^{-|y - h\, x(s_T)|^2}}{\pi}\, \mathrm{d}h$$
$$p(y \mid s_R) = \int_{\mathbb{C}^2} p(h \mid s_R)\, p(x(s_T))\, \frac{e^{-|y - h\, x(s_T)|^2}}{\pi}\, \mathrm{d}x(s_T)\, \mathrm{d}h.$$
For example, if $X(s_T)$ is CSCG with variance $P(s_T)$, then
$$p(y \mid s_R) = \int_{\mathbb{C}} p(h \mid s_R)\, \frac{\exp\big( -|y|^2 / ( 1 + |h|^2 P(s_T) ) \big)}{\pi\big( 1 + |h|^2 P(s_T) \big)}\, \mathrm{d}h.$$
One can compute I ( X ; Y | S R ) numerically using (231) and (232). However, optimizing over X ( s T ) is usually difficult.
For the reverse model GMI (139), the averaging density in (164) and (165) is now (cf. (194) and (201))
$$p(h, s_T \mid y, s_R) = \delta\big( s_T - f_T(s_R) \big)\, p(h \mid s_R)\, \frac{p(y \mid h, s_R)}{p(y \mid s_R)}.$$
We use numerical integration to compute the rates.
The forward model GMI again gives more insight. Define the channel gain and variance as the respective
$$\tilde{g}(s_R) = \big| \mathrm{E}[\, H \mid S_R = s_R \,] \big|^2$$
$$\tilde{\sigma}^2(s_R) = \mathrm{Var}[\, H \mid S_R = s_R \,].$$
Theorem 3.
An achievable rate for the AWGN fading channels (159) with power constraint $\mathrm{E}[|X|^2] \le P$, partial CSIR $S_R$, and $S_T = f_T(S_R)$ is
$$I_1(X; Y \mid S_R) = \mathrm{E}\!\left[ \log\!\left( 1 + \frac{\tilde{g}(S_R)\, P(S_T)}{1 + \tilde{\sigma}^2(S_R)\, P(S_T)} \right) \right]$$
where $\mathrm{E}[P(S_T)] = P$. The optimal power levels $P(s_T)$ are obtained by solving
$$\lambda = \int_{\mathcal{S}_R} p(s_R \mid s_T)\, \frac{\tilde{g}(s_R)}{\Big( 1 + \big( \tilde{g}(s_R) + \tilde{\sigma}^2(s_R) \big) P(s_T) \Big)\Big( 1 + \tilde{\sigma}^2(s_R)\, P(s_T) \Big)}\, \mathrm{d}s_R.$$
In particular, if $S_T$ determines $S_R$ (CSIR@T), then we have the quadratic waterfilling expression
$$f\big( P(s_T), \tilde{g}(s_R), \tilde{\sigma}^2(s_R) \big) = \left[ \frac{1}{\lambda} - \frac{1}{\tilde{g}(s_R)} \right]^+$$
where
$$f(Q, g, \sigma^2) = \left( 1 + \frac{2\sigma^2}{g} \right) Q + \left( 1 + \frac{\sigma^2}{g} \right) \sigma^2 Q^2$$
and where $\lambda$ is chosen so that $\mathrm{E}[P(S_T)] = P$.
Proof. 
Apply Theorem 1 with
P ˜ ( s R ) = g ˜ ( s R ) P ( s T )
E | Y | 2 | S R = s R = 1 + g ˜ ( s R ) + σ ˜ 2 ( s R ) P ( s T )
to obtain (237). To optimize the power levels P ( s T ) with (146), consider the derivatives
P ˜ ( s R ) = 2 g ˜ ( s R ) P ( s T ) 1 ( s T = f T ( s R ) )
E | Y | 2 | S R = s R = 2 g ˜ ( s R ) + σ ˜ 2 ( s R ) P ( s T ) 1 ( s T = f T ( s R ) ) .
The expression (146) thus becomes (238). If S T determines S R then the expression simplifies to
λ = g ˜ ( s R ) 1 + g ˜ ( s R ) + σ ˜ 2 ( s R ) P ( s T ) 1 + σ ˜ 2 ( s R ) P ( s T )
from which we obtain (239). □
Remark 69.
The optimal power control policy with CSIT@R and CSIR@T can be written explicitly by solving the quadratic in (239). The result is
$$P(s_T) = \frac{\tilde{g} + 2\tilde{\sigma}^2}{2\tilde{\sigma}^2 (\tilde{g} + \tilde{\sigma}^2)} \left( \sqrt{ 1 + 4\tilde{\sigma}^2 \left[ \frac{1}{\lambda} - \frac{1}{\tilde{g}} \right]^+ \frac{\tilde{g}\, (\tilde{g} + \tilde{\sigma}^2)}{(\tilde{g} + 2\tilde{\sigma}^2)^2} } - 1 \right)$$
where we have dropped the dependence on $s_R$ for convenience. The alternative form (239) relates to the usual waterfilling, where the left-hand side of (239) plays the role of $P(s_T)$. Observe that $\tilde{\sigma}^2 = 0$ gives conventional waterfilling.
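The closed form (245) is straightforward to implement; the sketch below (illustrative per-state values, helper names ours) pairs it with a bisection on $\lambda$ to meet the average-power constraint.

```python
# Sketch: quadratic waterfilling (245) for CSIT@R with CSIR@T and discrete
# CSIT states; g_t and sig2_t play the roles of g~ and sigma~^2 in Theorem 3.
import numpy as np

def quad_waterfill(g, sig2, lam):
    c = np.maximum(1.0 / lam - 1.0 / g, 0.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        b2a = (g + 2 * sig2) / (2 * sig2 * (g + sig2))
        disc = 1 + 4 * sig2 * c * g * (g + sig2) / (g + 2 * sig2) ** 2
        Q = b2a * (np.sqrt(disc) - 1)
    return np.where(sig2 > 0, Q, c)   # sigma~^2 = 0: conventional waterfilling

p_st = np.array([0.5, 0.5])           # Pr[S_T = s_T]
g_t = np.array([0.4, 1.6])            # illustrative gains
sig2_t = np.array([0.1, 0.1])         # illustrative estimation variances
P, lo, hi = 2.0, 1e-6, 10.0
for _ in range(100):                  # E[P(S_T)] decreases in lam
    lam = np.sqrt(lo * hi)
    if np.sum(p_st * quad_waterfill(g_t, sig2_t, lam)) > P:
        lo = lam
    else:
        hi = lam
print("lambda =", lam, " P(s_T) =", quad_waterfill(g_t, sig2_t, lam))
```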
Remark 70.
As in Section 3.3, we show that at high SNR the K = 2 GMI of Remark 42 approaches the upper bound of Proposition 2 in some cases; see Section 7.4. The channel parameters depend on s R , and we choose h 1 ( s R ) = 0 and σ 1 2 ( s R ) = σ 2 2 ( s R ) = 1 for all s R .

7. On-Off Fading

Consider again on-off fading with $P_G(0) = P_G(2) = 1/2$. We study the scenarios listed in Table 1. The case of no CSIR and no CSIT was studied in Section 3.3.

7.1. Full CSIR, CSIT@R

Consider $S_R = H$. The capacity with $B = 0$ (no CSIT) is given by (175) (cf. (73)):
$$C(P) = \frac{1}{2} \log( 1 + 2P )$$
and the wideband values are given by (177) (cf. (74)); the minimal $E_b/N_0$ is $\log 2$ and the slope is $S = 1$.
The capacity with $B = \infty$ (or $S_T = G$) increases to
$$C(P) = \frac{1}{2} \log( 1 + 4P )$$
where $P(0) = 0$ and $P(2) = 2P$. This capacity is also achieved with $B = 1$ since there are only two values for $G$. We compute $C'(0) = 2$ and $C''(0) = -8$, and therefore
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{2}, \quad S = 1.$$
The power gain due to CSIT compared to no fading is thus $3.01$ dB, but the capacity slope is the same. The rate curves are compared in Figure 4.

7.2. Full CSIR, Partial CSIT

Consider next noisy CSIT with $0 \le \epsilon \le \frac{1}{2}$ and
$$\Pr[\, S_T = G \,] = \bar{\epsilon}, \quad \Pr[\, S_T \ne G \,] = \epsilon.$$
$S_R = H \sqrt{P(S_T)}$: The capacity of Proposition 2 is
$$C(P) = \max_{P(0) + P(2) = 2P} \frac{\epsilon}{2} \log\big( 1 + 2 P(0) \big) + \frac{\bar{\epsilon}}{2} \log\big( 1 + 2 P(2) \big).$$
Optimizing the power levels, we have
$$P(0) = \left[ 2\epsilon P - \frac{\bar{\epsilon} - \epsilon}{2} \right]^+, \quad P(2) = 2P - P(0).$$
Figure 4 shows $C(P)$ for $\epsilon = 0.1$ as the curve labeled "Best CSIR". For $P \ge (\bar{\epsilon} - \epsilon)/(4\epsilon)$, we compute
$$C(P) = \frac{1}{2} \log( 1 + 2P ) + \frac{1}{2} \big[ 1 - H_2(\epsilon) \big] \log 2$$
where $H_2(\epsilon) = -\epsilon \log_2 \epsilon - \bar{\epsilon} \log_2 \bar{\epsilon}$ is the binary entropy function. For example, if $\epsilon = 0.1$, then for $P \ge 2$ one gains $\Delta C = [1 - H_2(0.1)]/2 \approx 0.27$ bits over the capacity without CSIT. This translates to an SNR gain of $2 \Delta C \cdot 10 \log_{10}(2) \approx 1.60$ dB. On the other hand, for $P \le (\bar{\epsilon} - \epsilon)/(4\epsilon)$ we have $P(0) = 0$, $P(2) = 2P$, and the capacity is
$$C(P) = \frac{\bar{\epsilon}}{2} \log( 1 + 4P ).$$
We have $C'(0) = 2\bar{\epsilon}$ and lose a factor of $\bar{\epsilon}$ in power as compared to having full CSIT ($\epsilon = 0$). For example, if $\epsilon = 0.1$, the minimal $E_b/N_0$ is approximately $-4.14$ dB.
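As a quick numerical companion, the sketch below evaluates (249) with the optimal split (250); the function name is ours, not the paper's.

```python
# Sketch: capacity (249)-(250) for on-off fading with S_R = H*sqrt(P(S_T))
# and CSIT error probability eps.
import numpy as np

def c_best_csir(P, eps):
    eb = 1.0 - eps
    P0 = max(2 * eps * P - (eb - eps) / 2, 0.0)      # (250)
    P2 = 2 * P - P0
    return 0.5 * eps * np.log2(1 + 2 * P0) + 0.5 * eb * np.log2(1 + 2 * P2)

eps = 0.1
for P in [0.5, 2.0, 10.0]:
    gain = c_best_csir(P, eps) - 0.5 * np.log2(1 + 2 * P)   # vs. no CSIT
    print(f"P={P}: C={c_best_csir(P, eps):.3f} bits, gain={gain:.3f} bits")
# for P >= 2 the gain approaches (1 - H2(0.1))/2 ~ 0.27 bits
```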
$S_R = H$: To compute $I(A; Y \mid H)$ in (190), we write (191) and (193) for CSCG $X(s_T)$ as
$$p_{Y|A,H}(y \mid a, 0) = p_{Y|H}(y \mid 0) = \frac{e^{-|y|^2}}{\pi}$$
$$p_{Y|A,H}\big( y \,\big|\, a, \sqrt{2} \big) = \epsilon\, \frac{e^{-|y - \sqrt{2}\, x(0)|^2}}{\pi} + \bar{\epsilon}\, \frac{e^{-|y - \sqrt{2}\, x(2)|^2}}{\pi}$$
$$p_{Y|H}\big( y \,\big|\, \sqrt{2} \big) = \epsilon\, \frac{\exp\big( -|y|^2 / ( 1 + 2P(0) ) \big)}{\pi\big( 1 + 2P(0) \big)} + \bar{\epsilon}\, \frac{\exp\big( -|y|^2 / ( 1 + 2P(2) ) \big)}{\pi\big( 1 + 2P(2) \big)}.$$
Figure 4 shows the rates as the curve labeled "$I(A;Y|H)$". This curve was computed by Monte Carlo integration with $P(0) = 0.1 \cdot P$ and $P(2) = 1.9 \cdot P$, which is near-optimal for the range of SNRs depicted.
The reverse model GMI (139) requires $\mathrm{Var}[\, U \mid Y, H \,]$. We show how to compute this variance in Appendix C.2 by applying (164) and (165). Figure 4 shows the GMIs as the curve labeled "rGMI", where we used the same power levels as for the $I(A;Y|H)$ curve. The two curves are indistinguishable for small $P$, but the "rGMI" rates are poor at large $P$. This example shows that the forward model GMI with optimized powers can be substantially better than the reverse model GMI with a reasonable but suboptimal power policy.
The forward model GMI (195) is
$$I_1(A; Y \mid H) = \frac{1}{2} \log\big( 1 + \mathrm{SNR}(\sqrt{2}) \big)$$
where $\mathrm{SNR}(\sqrt{2})$ is given by (196) with
$$\tilde{P}_T(\sqrt{2}) = \Big( \epsilon \sqrt{P(0)} + \bar{\epsilon} \sqrt{P(2)} \Big)^2, \quad 1 + 2\, \mathrm{Var}\Big[ \sqrt{P(S_T)} \;\Big|\; H = \sqrt{2} \Big] = 1 + 2 \epsilon \bar{\epsilon} \Big( \sqrt{P(2)} - \sqrt{P(0)} \Big)^2.$$
Applying Remark 61, the optimal power control policy is
$$\sqrt{P(s_T)} = \frac{p_{H|S_T}(\sqrt{2} \mid s_T)}{\gamma + \beta\, p_{H|S_T}(\sqrt{2} \mid s_T)} = \begin{cases} \dfrac{\epsilon}{\gamma + \beta \epsilon}, & s_T = 0 \\[2mm] \dfrac{\bar{\epsilon}}{\gamma + \beta \bar{\epsilon}}, & s_T = 2 \end{cases}$$
where
$$\beta = \frac{2 \sqrt{\tilde{P}_T(\sqrt{2})}}{\mathrm{E}\big[ |Y|^2 \,\big|\, H = \sqrt{2} \big]}$$
and $\gamma \ge 0$ is chosen so that $P(0) + P(2) = 2P$. Figure 4 shows the resulting GMI as the curve labeled "GMI, K = 1". At low SNR, we achieve the rate $\tilde{P}_T(\sqrt{2})$, and the optimal power control has $\beta \to 0$, so that
$$P(0) = 2P\, \frac{\epsilon^2}{\epsilon^2 + \bar{\epsilon}^2}, \quad P(2) = 2P\, \frac{\bar{\epsilon}^2}{\epsilon^2 + \bar{\epsilon}^2}$$
and therefore
$$\tilde{P}_T(\sqrt{2}) = 2\big( \epsilon^2 + \bar{\epsilon}^2 \big) P.$$
We have $C'(0) = 2(\epsilon^2 + \bar{\epsilon}^2)$ and lose a factor of $\epsilon^2 + \bar{\epsilon}^2$ in power as compared to having full CSIT ($\epsilon = 0$). For example, if $\epsilon = 0.1$, the minimal $E_b/N_0$ is approximately $-3.74$ dB.
We remark that the I ( A ; Y | H ) and reverse model GMI curves lie above the forward model curve if we choose the same power policy as for the forward channel.

7.3. Partial CSIR, Full CSIT

This section studies $S_T = H$. The capacity with partial CSIR is given by (138), for which we need to compute $p(y \mid a, s_R)$ and $p(y \mid s_R)$. We consider two cases.
$S_R = 1(G \ge t)$: Here we recover the case with full CSIR by choosing $t$ to satisfy $0 < t \le 2$.
$S_R = 0$: The best power policy clearly has $P(0) = 0$ and $P(\sqrt{2}) = 2P$. The mutual information is thus $I(A; Y) = I\big( X(\sqrt{2}); Y \big)$ and the channel densities are (cf. (75) and (76))
$$p(y \mid a) = \frac{e^{-|y|^2}}{2\pi} + \frac{e^{-|y - \sqrt{4P}\, u|^2}}{2\pi}, \quad p(y) = \frac{e^{-|y|^2}}{2\pi} + \frac{e^{-|y|^2/(1 + 4P)}}{2\pi (1 + 4P)}.$$
The rates I ( A ; Y ) are shown in Figure 5. Observe that the low-SNR rates are larger than without fading; this is a consequence of the slightly bursty nature of transmission.
The reverse model GMI (139) requires $\mathrm{Var}[\, U \mid Y \,]$. We compute this variance in Appendix C.3 by using (164) and (165) with (201) and $\phi(s_T) = 0$. Figure 5 shows the GMIs as the curve labeled "rGMI".
Next, the TCP, TMF, TCI, and TMMSE policies are the same for $0 < t \le 2$, since they use $P(0) = 0$ and $P(\sqrt{2}) = 2P$. The resulting rate is given by (202)–(204) with $\tilde{P}(0) = 0$, $\tilde{P}(1) = P$, and $\mathrm{Var}\big[ \sqrt{G\, P(S_T)} \,\big|\, S_R = 1 \big] = P$, and
$$I_1(A; Y) = \log\!\left( 1 + \frac{P}{1 + P} \right).$$
The rates are plotted in Figure 5 as the curve labeled “GMI, K = 1”. This example again shows that choosing K = 1 is a poor choice at high SNR.
To improve the auxiliary model at high SNR, consider the GMI (154) with $K = 2$ and the subsets (65). We further choose the parameters $h_1 = 0$, $\sigma_1^2 = 1$, $h_2 = 2$, $\sigma_2^2 = 1$, and adaptive coding with $X(0) = 0$, $X(\sqrt{2}) = \sqrt{2P}\, U$, $\bar{X} = \sqrt{P}\, U$, where $U \sim \mathcal{CN}(0, 1)$. The GMI (154) is
$$I_1(A; Y) = \Pr[\mathcal{E}_2] \left[ \log( 1 + 4P ) + \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2 \big]}{1 + 4P} - \mathrm{E}\Big[ \big| Y - \sqrt{4P}\, U \big|^2 \,\Big|\, \mathcal{E}_2 \Big] \right].$$
In Appendix B.2, we show that choosing $t_R = P^{\lambda_R} + b$, where $0 < \lambda_R < 1$ and $b$ is a real constant, makes all terms behave as desired as $P$ increases:
$$\Pr[\mathcal{E}_2] \to \frac{1}{2}, \quad \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2 \big]}{1 + 4P} \to 1, \quad \mathrm{E}\Big[ \big| Y - \sqrt{4P}\, U \big|^2 \,\Big|\, \mathcal{E}_2 \Big] \to 1.$$
We thus have
$$\lim_{P \to \infty} \left[ \frac{1}{2} \log( 1 + 4P ) - I_1(X; Y) \right] = 0.$$
Figure 5 shows the behavior of I 1 ( A ; Y ) for λ R = 1 / 2 and b = 3 as the curve labeled “GMI, K = 2”. As for the case without CSIT, the receiver can estimate H accurately at large SNR, and one approaches the capacity with full CSIR.
Finally, the large-$K$ forward model rates are computed using (70), but where $\bar{X}$ replaces $X$. One may again use the results of Appendix C.3 and the relations
$$\mathrm{E}\big[ \bar{X} \mid Y = y \big] = \sqrt{P}\; \mathrm{E}[\, U \mid Y = y \,], \quad \mathrm{E}\big[ |\bar{X}|^2 \,\big|\, Y = y \big] = P\; \mathrm{E}\big[ |U|^2 \,\big|\, Y = y \big], \quad \mathrm{Var}\big[ \bar{X} \mid Y = y \big] = P\; \mathrm{Var}[\, U \mid Y = y \,].$$
The rates are shown as the curve labeled "GMI, K = ∞" in Figure 5. So again, the large-$K$ forward model is good at high SNR but worse than the best $K = 1$ model at low SNR.

7.4. Partial CSIR, CSIT@R

Consider partial CSIR with $S_T = S_R$ and
$$\Pr[\, S_R = H \,] = \bar{\epsilon}, \quad \Pr[\, S_R \ne H \,] = \epsilon$$
where $0 \le \epsilon \le \frac{1}{2}$. We thus have both CSIT@R and CSIR@T. To compute $I(X; Y \mid S_R)$ in (230), we write (231) and (232) as
$$p_{Y|S_R,X}(y \mid 0, x) = \bar{\epsilon}\, \frac{e^{-|y|^2}}{\pi} + \epsilon\, \frac{e^{-|y - \sqrt{2}\, x(0)|^2}}{\pi}$$
$$p_{Y|S_R,X}\big( y \,\big|\, \sqrt{2}, x \big) = \bar{\epsilon}\, \frac{e^{-|y - \sqrt{2}\, x(\sqrt{2})|^2}}{\pi} + \epsilon\, \frac{e^{-|y|^2}}{\pi}$$
$$p_{Y|S_R}(y \mid 0) = \bar{\epsilon}\, \frac{e^{-|y|^2}}{\pi} + \epsilon\, \frac{e^{-|y|^2/[1 + 2P(0)]}}{\pi [1 + 2P(0)]}$$
$$p_{Y|S_R}\big( y \,\big|\, \sqrt{2} \big) = \bar{\epsilon}\, \frac{e^{-|y|^2/[1 + 2P(\sqrt{2})]}}{\pi [1 + 2P(\sqrt{2})]} + \epsilon\, \frac{e^{-|y|^2}}{\pi}$$
where $X(s_T)$ is CSCG. We choose the transmit powers $P(0)$ and $P(\sqrt{2})$ as in (250) to compare with the best CSIR. Figure 6 shows the resulting rates for $\epsilon = 0.1$ as the curve labeled "Partial CSIR, $I(X;Y|S_R)$". Observe that at high SNR, the curve seems to approach the best CSIR curve from Figure 4 with $S_R = H \sqrt{P(S_T)}$. We prove this by studying a forward model GMI with $K = 2$.
The reverse model GMI requires $\mathrm{Var}[\, U \mid Y, S_R \,]$, which can be computed by simulation; see Appendix C.4. However, optimizing the powers seems difficult. We instead focus on the forward model GMI of Theorem 3, for which we compute
$$\tilde{g}(0) = 2\epsilon^2, \quad \tilde{g}(\sqrt{2}) = 2\bar{\epsilon}^2, \quad \tilde{\sigma}^2(0) = \tilde{\sigma}^2(\sqrt{2}) = 2\epsilon\bar{\epsilon}$$
and therefore (237) is
$$I_1(X; Y \mid S_R) = \frac{1}{2} \log\!\left( 1 + \frac{2\epsilon^2\, P(0)}{1 + 2\epsilon\bar{\epsilon}\, P(0)} \right) + \frac{1}{2} \log\!\left( 1 + \frac{2\bar{\epsilon}^2\, P(\sqrt{2})}{1 + 2\epsilon\bar{\epsilon}\, P(\sqrt{2})} \right).$$
For CSIR@T, the optimal power control policy is given by the quadratic waterfilling specified by (239) or (245):
$$P(0) = \frac{1 + \bar{\epsilon}}{4\epsilon\bar{\epsilon}} \left( \sqrt{ 1 + 8\epsilon\bar{\epsilon} \left[ \frac{1}{\lambda} - \frac{1}{2\epsilon^2} \right]^+ \frac{\epsilon}{(1 + \bar{\epsilon})^2} } - 1 \right)$$
$$P(\sqrt{2}) = \frac{1 + \epsilon}{4\epsilon\bar{\epsilon}} \left( \sqrt{ 1 + 8\epsilon\bar{\epsilon} \left[ \frac{1}{\lambda} - \frac{1}{2\bar{\epsilon}^2} \right]^+ \frac{\bar{\epsilon}}{(1 + \epsilon)^2} } - 1 \right).$$
The rates are shown in Figure 6 as the curve labeled "Partial CSIR, GMI, K = 1". Observe that at high SNR, the GMI (263) saturates at
$$\frac{1}{2} \log\!\left( 1 + \frac{\epsilon}{\bar{\epsilon}} \right) + \frac{1}{2} \log\!\left( 1 + \frac{\bar{\epsilon}}{\epsilon} \right).$$
For example, for $\epsilon = 0.1$, we approach $1.74$ bits at high SNR. On the other hand, at low SNR, the rate is maximized with $P(0) = 0$ and $P(\sqrt{2}) = 2P$, so that $I_1(X; Y \mid S_R) \approx 2\bar{\epsilon}^2 P$. We thus achieve a fraction $\bar{\epsilon}^2$ of the power compared to full CSIT. For example, if $\epsilon = 0.1$, the minimal $E_b/N_0$ is approximately $-3.69$ dB.
Figure 6 also shows the conventional waterfilling rates as the curve labeled "Partial CSIR, GMI, c-waterfill". These rates are almost the same as the quadratic waterfilling rates except for the range of $E_b/N_0$ between 9 and 13 dB shown in the inset.
To improve the auxiliary model at high SNR, we use a $K = 2$ GMI with (see Remark 70)
$$h_1(s_R) = 0, \quad h_2(s_R) = \sqrt{2}, \quad \sigma_1^2(s_R) = \sigma_2^2(s_R) = 1$$
for $s_R \in \{0, \sqrt{2}\}$. The receiver chooses $\bar{X}(s_R) = \sqrt{P(s_R)}\, U$ (see Remark 41), and we have (see Remark 42)
$$I_1(X; Y \mid S_R) = \frac{1}{2} \Pr\big[ \mathcal{E}_2 \mid S_R = 0 \big] \left[ \log\big( 1 + 2P(0) \big) + \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2, S_R = 0 \big]}{1 + 2P(0)} - \mathrm{E}\Big[ \big| Y - \sqrt{2}\, X(0) \big|^2 \,\Big|\, \mathcal{E}_2, S_R = 0 \Big] \right] + \frac{1}{2} \Pr\big[ \mathcal{E}_2 \mid S_R = \sqrt{2} \big] \left[ \log\big( 1 + 2P(\sqrt{2}) \big) + \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2, S_R = \sqrt{2} \big]}{1 + 2P(\sqrt{2})} - \mathrm{E}\Big[ \big| Y - \sqrt{2}\, X(\sqrt{2}) \big|^2 \,\Big|\, \mathcal{E}_2, S_R = \sqrt{2} \Big] \right]$$
where the $X(s_T)$, $s_T \in \mathcal{S}_T$, are given by (122). We consider $P(0)$ and $P(\sqrt{2})$ that scale in proportion to $P$. In this case, Appendix B.3 shows that choosing $t_R = P^{\lambda_R}$, where $0 < \lambda_R < 1$, gives the (best) full-CSIR capacity for large $P$, which is the rate specified in (249):
$$\lim_{P \to \infty} \left[ \frac{\epsilon}{2} \log\big( 1 + 2P(0) \big) + \frac{\bar{\epsilon}}{2} \log\big( 1 + 2P(\sqrt{2}) \big) - I_1(X; Y \mid S_R) \right] = 0.$$
In other words, by optimizing $P(0)$ and $P(\sqrt{2})$, at high SNR the $K = 2$ GMI can approach the capacity of Proposition 2. This is expected since the receiver can estimate $H \sqrt{P(S_T)}$ reliably at high SNR.
Figure 6 shows the behavior of this GMI with $t_R = P^{0.4}$ and where we have chosen $P(0)$ and $P(\sqrt{2})$ according to (250). The abrupt change in slope at approximately $2.5$ dB occurs because $P(0)$ becomes positive beyond this $E_b/N_0$. Keeping $P(0) = 0$ for $E_b/N_0$ up to about 12 dB gives better rates, but at high SNR one should choose the powers according to (250).

8. Rayleigh Fading

Rayleigh fading has $H \sim \mathcal{CN}(0, 1)$. The random variable $G = |H|^2$ thus has the density $p(g) = e^{-g} \cdot 1(g \ge 0)$. Section 8.1 and Section 8.2 review known results.

8.1. No CSIR, No CSIT

Suppose $S_R = S_T = 0$ and $X \sim \mathcal{CN}(0, P)$. The densities to compute $I(X; Y)$ for CSCG $X$ are
$$p(y \mid x) = \frac{e^{-|y|^2/(|x|^2 + 1)}}{\pi (|x|^2 + 1)}$$
$$p(y) = \int_0^\infty \frac{e^{-g/P}}{P}\, \frac{e^{-|y|^2/(g + 1)}}{\pi (g + 1)}\, \mathrm{d}g.$$
The minimum $E_b/N_0$ is approximately $9.2$ dB, and the forward model GMI (174) is zero. The capacity is achieved by discrete and finite $X$ [96], and at large SNR, the capacity behaves as $\log \log P$ [97]. Further results are derived in [98,99,100,101,102].

8.2. Full CSIR, CSIT@R

The capacity (175) for $B = 0$ (no CSIT) is
$$C(P) = \int_0^\infty e^{-g} \log( 1 + g P )\, \mathrm{d}g = e^{1/P}\, E_1(1/P)\, \log(e)$$
where the exponential integral $E_1(\cdot)$ is given by (A4) below. The wideband values are given by (177):
$$\left( \frac{E_b}{N_0} \right)_{\min} = \log 2, \quad S = 1.$$
The minimal $E_b/N_0$ is $-1.59$ dB, but the fading reduces the capacity slope. At high SNR, we have
$$C(P) \approx \log(P) - \gamma$$
where $\gamma \approx 0.57721$ is Euler's constant. The capacity thus behaves as for the case without fading but with an SNR loss of approximately $2.5$ dB.
The capacity (182) with $B = \infty$ (or $S_T = G$) is (see ([95], Equation (7)))
$$C(P) = \int_\lambda^\infty e^{-g} \log( g/\lambda )\, \mathrm{d}g = E_1(\lambda)$$
where $P(g)$ is given by (181) and $\lambda$ is chosen so that
$$P = \int_\lambda^\infty e^{-g}\, P(g)\, \mathrm{d}g = \frac{e^{-\lambda}}{\lambda} - E_1(\lambda).$$
At low SNR, we have large $\lambda$, and using the approximation (A7) below we compute
$$C(P) \approx e^{-\lambda}/\lambda \quad \text{and} \quad P \approx e^{-\lambda}/\lambda^2.$$
We thus have $E_b/N_0 \approx \log(2)/\lambda$, and the minimal $E_b/N_0$ is $-\infty$.
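The closed forms above are convenient for plotting: sweep the water level $\lambda$ and evaluate $P$ and $C$ parametrically. A minimal sketch, using SciPy's `exp1` for $E_1$:

```python
# Sketch: parametric (lambda -> P, C) evaluation of the full-CSIT Rayleigh
# capacity, with Eb/N0 = P*log(2)/C for C in nats.
import numpy as np
from scipy.special import exp1   # the exponential integral E_1

for lam in [0.01, 0.1, 1.0, 3.0, 6.0]:
    P = np.exp(-lam) / lam - exp1(lam)
    C = exp1(lam)                                   # nats per symbol
    ebn0_db = 10 * np.log10(P * np.log(2) / C)
    print(f"lam={lam}: P={P:.3g}, C={C/np.log(2):.3g} bits, Eb/N0={ebn0_db:.2f} dB")
# for large lam, Eb/N0 ~ log(2)/lam, so the minimal Eb/N0 is unbounded below
```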
Consider now $B = 1$, for which $P_{S_T}(3\Delta/2) = e^{-\Delta}$ and
$$\mathrm{E}[\, G \mid G \ge \Delta \,] = 1 + \Delta$$
$$\mathrm{E}\big[ G^2 \,\big|\, G \ge \Delta \big] = 2 + 2\Delta + \Delta^2.$$
We thus have the wideband quantities in (186) and (187):
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{1 + \Delta}$$
$$S = \frac{2\, e^{-\Delta} (1 + \Delta)^2}{2 + 2\Delta + \Delta^2}.$$
Figure 7 shows the capacities for $B = 1$ and $\Delta = 1, 2, 1/2$. The minimum $E_b/N_0$ value is
$$-1.59\ \mathrm{dB} - 10 \log_{10}( 1 + \Delta )$$
and for $\Delta = 1, 2, 1/2$ we gain 3 dB, 4.8 dB, and 1.8 dB, respectively, over no CSIT at low power. Note that one bit of feedback allows one to closely approach the full CSIT rates.
Remark 71.
For the scalar channel (159), knowing H at both the transmitter and receiver provides significant gains at low SNR [73] but small gains at high SNR ([95], Figure 4) as compared to knowing H at the receiver only. Furthermore, the reliability can be improved ([78], Figures 5–7). Significant gains are also possible for MIMO channels.
Remark 72.
An alternative way to derive (272)–(275) is as follows. Define $\hat{P} = P e^{\Delta}$, so for small $P$ the capacity is
$$C(P) = \int_\Delta^\infty e^{-g} \log\big( 1 + g \hat{P} \big)\, \mathrm{d}g = e^{1/\hat{P}}\, E_1\!\left( \frac{1}{\hat{P}} + \Delta \right) + e^{-\Delta} \log( 1 + \hat{P} \Delta ) \approx P (1 + \Delta) - \frac{1}{2} P^2 e^{\Delta} \big( 2 + 2\Delta + \Delta^2 \big).$$

8.3. Full CSIR, Partial CSIT

Consider noisy CSIT with
$$\Pr\big[ S_T = 1(G \ge \Delta) \big] = \bar{\epsilon}, \quad \Pr\big[ S_T \ne 1(G \ge \Delta) \big] = \epsilon.$$
We begin with the most informative CSIR.
$S_R = \sqrt{P(S_T)}\, H$: Proposition 2 gives the capacity
$$C(P) = \int_0^\infty e^{-g} \sum_{s_T} P(s_T \mid g) \log\big( 1 + g P(s_T) \big)\, \mathrm{d}g = \int_0^\Delta e^{-g} \Big[ \bar{\epsilon} \log\big( 1 + g P(0) \big) + \epsilon \log\big( 1 + g P(1) \big) \Big]\, \mathrm{d}g + \int_\Delta^\infty e^{-g} \Big[ \bar{\epsilon} \log\big( 1 + g P(1) \big) + \epsilon \log\big( 1 + g P(0) \big) \Big]\, \mathrm{d}g.$$
It remains to optimize $P(0)$, $P(1)$, and $\Delta$. The two equations for the Lagrange multiplier $\lambda$ are
$$\lambda \cdot P_{S_T}(0) = \int_0^\Delta e^{-g}\, \frac{\bar{\epsilon}\, g}{1 + g P(0)}\, \mathrm{d}g + \int_\Delta^\infty e^{-g}\, \frac{\epsilon\, g}{1 + g P(0)}\, \mathrm{d}g$$
$$\lambda \cdot P_{S_T}(1) = \int_0^\Delta e^{-g}\, \frac{\epsilon\, g}{1 + g P(1)}\, \mathrm{d}g + \int_\Delta^\infty e^{-g}\, \frac{\bar{\epsilon}\, g}{1 + g P(1)}\, \mathrm{d}g$$
where $P_{S_T}(0) = \bar{\epsilon} - (\bar{\epsilon} - \epsilon) e^{-\Delta}$ and $P_{S_T}(1) = \epsilon + (\bar{\epsilon} - \epsilon) e^{-\Delta}$. The rates are shown in Figure 8.
For fixed $\Delta$ and large $P$, we have $1/\lambda \approx P(0) \approx P(1) \approx P$ and approach the capacity (269) without CSIT. In contrast, for small $P$ we may use similar steps as for (183) and (184). Observe the following for (278) and (279):
  • both $P(0)$ and $P(1)$ decrease as $\lambda$ increases;
  • the maximal $\lambda$ in (278) is obtained with $P(0) = 0$; this value is
$$\mathrm{E}[\, G \mid S_T = 0 \,] = \frac{\bar{\epsilon} - (\bar{\epsilon} - \epsilon)(1 + \Delta) e^{-\Delta}}{P_{S_T}(0)}$$
  • the maximal $\lambda$ in (279) is obtained with $P(1) = 0$; this value is
$$\mathrm{E}[\, G \mid S_T = 1 \,] = \frac{\epsilon + (\bar{\epsilon} - \epsilon)(1 + \Delta) e^{-\Delta}}{P_{S_T}(1)}.$$
Thus, if $\mathrm{E}[\, G \mid S_T = 0 \,] < \mathrm{E}[\, G \mid S_T = 1 \,]$ and $0 \le \epsilon < 1/2$, then for $P$ below some threshold we have $P(0) = 0$, $P(1) = P / P_{S_T}(1)$, and the capacity is
$$C(P) = \int_0^\Delta e^{-g}\, \epsilon \log\!\left( 1 + \frac{g P}{P_{S_T}(1)} \right) \mathrm{d}g + \int_\Delta^\infty e^{-g}\, \bar{\epsilon} \log\!\left( 1 + \frac{g P}{P_{S_T}(1)} \right) \mathrm{d}g.$$
We compute $C'(0) = \mathrm{E}[\, G \mid S_T = 1 \,]$, which is given by (281), so that $1 \le C'(0) \le 1 + \Delta$, as expected from (274). For example, for $\epsilon = 0.1$ and $\Delta = 1$ we have $C'(0) \approx 1.75$, and therefore the minimal $E_b/N_0$ is approximately $-4.01$ dB.
The best $\Delta$ is the unique solution $\hat{\Delta}$ of the equation
$$e^{-\Delta} = \frac{\epsilon}{\bar{\epsilon} - \epsilon}\, (\Delta - 1)$$
and the result is $C'(0) = \hat{\Delta} \ge 1$. We have the simple bounds
$$1 + \frac{1}{2} \log\!\left( \frac{1}{\epsilon} - 2 \right) \le C'(0) \le 1 + \frac{1}{e} \left( \frac{1}{\epsilon} - 2 \right)$$
where the left inequality follows by taking logarithms and using $\log(\Delta - 1) \le \Delta - 2$, and the right inequality follows by using $e^{-\Delta} \le e^{-1}$ in (283). For example, for $\epsilon \to 0$ we have $C'(0) \to \infty$, and for $\epsilon \to 1/2$ we have $C'(0) \to 1$.
$S_R = H$: For the less informative CSIR, one may use (191) and (193) to compute $I(A; Y \mid H)$. The reverse model GMI requires $\mathrm{Var}[\, U \mid Y, S_R \,]$, which can be computed by simulation; see Appendix C.2. Again, however, optimizing the powers seems difficult. We instead focus on the forward model GMI of Corollary 1, which is
$$I_1(A; Y \mid H) = \int_0^\infty e^{-g} \log\big( 1 + \mathrm{SNR}(g) \big)\, \mathrm{d}g$$
where
$$\mathrm{SNR}(g) = \frac{g\, \tilde{P}_T(g)}{1 + g\, \epsilon \bar{\epsilon} \Big( \sqrt{P(0)} - \sqrt{P(1)} \Big)^2}$$
and
$$\tilde{P}_T(g) = \begin{cases} \Big( \bar{\epsilon} \sqrt{P(0)} + \epsilon \sqrt{P(1)} \Big)^2, & g < \Delta \\[2mm] \Big( \epsilon \sqrt{P(0)} + \bar{\epsilon} \sqrt{P(1)} \Big)^2, & g \ge \Delta. \end{cases}$$
It remains to optimize $P(0)$, $P(1)$, and $\Delta$. Computing the derivatives seems complicated, so we use numerical optimization for fixed $\Delta = 1$ as in Figure 8. The results are shown in Figure 9. For fixed $\Delta$ and large $P$, it is best to choose $P(0) \approx P(1)$, so that $\mathrm{SNR}(g) \approx g P$ and we approach the rate without CSIT. For small $P$, however, the best $P(0)$ is no longer zero, and $C'(0)$ is smaller than (281).

8.4. Partial CSIR, Full CSIT

Consider $S_T = H$ and suppose we choose the $X(h)$ to be jointly CSCG with variances $\mathrm{E}[|X(h)|^2] = P(h)$ and correlation coefficients
$$\rho(h, h') = \frac{\mathrm{E}\big[ X(h)\, X(h')^* \big]}{\sqrt{P(h)\, P(h')}}$$
and where $\mathrm{E}[P(H)] \le P$. We then have
$$p(y \mid s_R) = \int_{\mathbb{C}} p(h \mid s_R)\, \frac{e^{-|y|^2/(|h|^2 P(h) + 1)}}{\pi (|h|^2 P(h) + 1)}\, \mathrm{d}h.$$
As in (97), $p(y \mid s_R)$ and $h(Y \mid S_R)$ depend only on the marginals of $A$ and not on the $\rho(h, h')$. We thus have the problem of finding the $\rho(h, h')$ that minimize
$$h(Y \mid A, S_R) = \int_{\mathcal{A}} p(a)\, h(Y \mid S_R, A = a)\, \mathrm{d}a.$$
We will use fully-correlated $X(h)$ as discussed in Section 6.5. We again consider $S_R = 0$ and $S_R = 1(G \ge t)$.
$S_R = 0$: For the heuristic policies, the power (206) is
$$\hat{P} = \frac{P}{\Gamma(1 + a, t)}$$
and the rate (219) is
$$I_1(A; Y) = \log\!\left( 1 + \frac{P\, \Gamma\big( \tfrac{3 + a}{2}, t \big)^2}{\Gamma(1 + a, t) + P \Big[ \Gamma(2 + a, t) - \Gamma\big( \tfrac{3 + a}{2}, t \big)^2 \Big]} \right)$$
where $\Gamma(s, x)$ is the upper incomplete gamma function; see Appendix A.3. Moreover, the expression (220) is
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\Gamma(1 + a, t)}{\Gamma\big( \tfrac{3 + a}{2}, t \big)^2} \cdot \log 2.$$
We remark that $\Gamma(s, 0) = \Gamma(s)$, where $\Gamma(x)$ is the gamma function. We further have
$$\Gamma(0, t) = E_1(t), \quad \Gamma(1, t) = e^{-t}, \quad \Gamma(2, t) = e^{-t}(t + 1), \quad \Gamma(3, t) = e^{-t}(t^2 + 2t + 2).$$
For example, the TCP policy ($a = 0$) has $\hat{P} = P e^{t}$. At low SNR, it turns out that the best choice is $t = 0.283$, for which we have $\Gamma(1, t)/\Gamma(3/2, t)^2 \approx 1.174$. The minimum $E_b/N_0$ in (222) is thus $-0.90$ dB. At high SNR, the best choice is $t = 0$, so that (289) with $\Gamma(3/2, 0) = \Gamma(3/2) = \sqrt{\pi}/2$ gives
$$I_1(A; Y) = \log\!\left( 1 + \frac{P\, \pi/4}{1 + P (1 - \pi/4)} \right).$$
The TCP rate thus saturates at 2.22 bits per channel use; see the curve labeled “TCP, GMI, K = 1” in Figure 10.
The TMF policy ($a = 1$) has $\hat{P} = P e^{t}/(t + 1)$. The best choice is $t = 0$, for which we have $\Gamma(2) = 1$ and $\Gamma(3) = 2$, and therefore (289) is
$$I_1(A; Y) = \log\!\left( 1 + \frac{P}{1 + P} \right).$$
The minimum $E_b/N_0$ in (222) is $-1.59$ dB, and at high SNR, the TMF rate saturates at 1 bit per channel use. The rates are shown as the curve labeled "TMF, GMI, K = 1" in Figure 10.
The TCI policy ($a = -1$) has $\hat{P} = P / E_1(t)$, and using $\Gamma(0, t) = E_1(t)$ and $\Gamma(1, t) = e^{-t}$ gives
$$I_1(A; Y) = \log\!\left( 1 + \frac{P\, e^{-2t}}{E_1(t) + P\, e^{-t} (1 - e^{-t})} \right).$$
The minimum $E_b/N_0$ in (290) is
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{E_1(t)}{e^{-2t}} \cdot \log 2.$$
Optimizing over $t$ by taking derivatives (see (A5) below), the best $t$ satisfies the equation $2 t e^{t} E_1(t) = 1$, which gives $t \approx 0.61$, and the minimal $E_b/N_0$ is approximately $0.194$ dB. On the other hand, for large SNR, we may choose $t = 1/P$, and using $E_1(t) \approx \log(1/t)$ for small $t$ gives
$$I_1(A; Y) \approx \log\!\left( 1 + \frac{P}{1 + \log P} \right).$$
Since the pre-log is at most 1, the capacity grows with pre-log 1 for large $P$. We see that TMF is best at small $P$, while TCI is best at large $P$. The rates are shown as the curve labeled "TCI, GMI, K = 1" in Figure 10.
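The three $K = 1$ rates are easy to evaluate with SciPy; a minimal sketch (our helper names), using $\Gamma(s, t) = \texttt{gammaincc}(s, t)\, \Gamma(s)$ with the special case $\Gamma(0, t) = E_1(t)$:

```python
# Sketch: TCP/TMF/TCI K = 1 GMIs (289) for Rayleigh fading with S_R = 0.
import numpy as np
from scipy.special import gamma, gammaincc, exp1

def Gamma(s, t):
    if s == 0:
        return exp1(t)                 # Gamma(0, t) = E_1(t)
    return gammaincc(s, t) * gamma(s)  # upper incomplete gamma

def gmi_bits(P, a, t):
    num = P * Gamma((3 + a) / 2, t) ** 2
    den = Gamma(1 + a, t) + P * (Gamma(2 + a, t) - Gamma((3 + a) / 2, t) ** 2)
    return np.log2(1 + num / den)

P = 100.0
print("TCP:", gmi_bits(P, 0, 0.0))       # saturates near 2.22 bits
print("TMF:", gmi_bits(P, 1, 0.0))       # saturates at 1 bit
print("TCI:", gmi_bits(P, -1, 1.0 / P))  # grows like log2(1 + P/(1 + ln P))
```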
The simple channel output of TCI permits further analysis. Using Remark 65, we compute the mutual information I ( A ; Y ) by numerical integration; see the curve labeled “TCI, I ( A ; Y ) ” in Figure 10. We see that at high SNR, the TCI mutual information is larger than the GMI for TCP, TMF, and (of course) TCI. Moreover, as we show, the TCI mutual information can work well at low SNR.
Motivated by Section 7.3 and Figure 5, we again use the GMI (154) with $K = 2$ and (65). We further choose $h_1 = 0$, $\sigma_1^2 = \sigma_2^2 = 1$, and
$$\bar{X} = \frac{\sqrt{\hat{P}}}{h_2}\, U, \quad U \sim \mathcal{CN}(0, 1).$$
The expression (154) simplifies to
$$I_1(A; Y) = \Pr[\mathcal{E}_2] \left[ \log\big( 1 + \hat{P} \big) + \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2 \big]}{1 + \hat{P}} - \mathrm{E}\Big[ \big| Y - \sqrt{\hat{P}}\, U \big|^2 \,\Big|\, \mathcal{E}_2 \Big] \right].$$
The GMI (295) exhibits interesting high- and low-SNR scaling for the following choices of the thresholds $t, t_R$.
  • For high SNR, we choose
$$t = P^{-\lambda} \quad \text{and} \quad t_R = \hat{P}^{\lambda_R}$$
    where $0 < \lambda < 1$ and $0 < \lambda_R < 1$. As $P$ increases, $t$ decreases, and Appendix B.4 shows that
$$\Pr[\mathcal{E}_2] \to 1, \quad \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2 \big]}{1 + \hat{P}} \to 1, \quad \mathrm{E}\Big[ \big| Y - \sqrt{\hat{P}}\, U \big|^2 \,\Big|\, \mathcal{E}_2 \Big] \to 1.$$
    Inserting $\hat{P} = P/E_1(t)$, we thus have
$$\lim_{P \to \infty} \left[ I_1(A; Y) - \log\!\left( 1 + \frac{P}{E_1(t)} \right) \right] = 0.$$
    We further have $E_1(t) \approx \lambda \log P$ by using (A6) in Appendix A.2, and the high-SNR slope of the GMI matches the slope of $\log P$, but the additive gap to $\log P$ increases. The high-SNR rates are shown as the curve labeled "TCI, GMI, K = 2" in Figure 10 for $\lambda = \lambda_R = 0.4$.
  • For low SNR, we choose
$$t = \log( c/P ) \quad \text{and} \quad t_R = \hat{P}$$
    for a constant $c > 0$. As $P$ decreases, both $t$ and $\hat{P} = P/E_1(t)$ increase, and Appendix B.4 shows that
$$\frac{\Pr[\mathcal{E}_2]}{e^{-t}} \to 1, \quad \frac{\mathrm{E}\big[ |Y|^2 \mid \mathcal{E}_2 \big]}{1 + 2\hat{P}} \to 1, \quad \mathrm{E}\Big[ \big| Y - \sqrt{\hat{P}}\, U \big|^2 \,\Big|\, \mathcal{E}_2 \Big] \to 1.$$
    Using (A7), we have $I_1(A; Y) \approx e^{-(t+1)} \log t$, which vanishes as $t$ grows. But we also have
$$\frac{E_b}{N_0} = \frac{P}{R}\, \log 2 \approx \frac{c\, e^{-t} \log 2}{e^{-(t+1)} \log t} = \frac{c\, e \log 2}{\log t} \approx \frac{c\, e \log 2}{\log \log(1/P)}$$
    which decreases (very slowly) as $P$ decreases. The minimal $E_b/N_0$ is therefore $-\infty$. The low-SNR rates are shown as the curve labeled "TCI, GMI, K = 2" in Figure 11 for $c = 1.4$.
Figure 11. Low-SNR rates for Rayleigh fading with $S_T = H$ and $S_R = 0$. The threshold $t$ was optimized for the $K = 1$ curves, while $t = \log(1.4/P)$ for the $I(A;Y)$, rGMI, and $K = 2$ curves. The $K = 2$ GMI uses $t_R = \hat{P}$. The TMF and TMMSE GMIs are indistinguishable for this range of rates.
Figure 11 shows that the TCI mutual information achieves a minimal $E_b/N_0$ below $-1.59$ dB. At $E_b/N_0 = -2$ dB, we computed $I_1(A; Y) \approx 6 \times 10^{-7}$ and $I(A; Y) \approx 3 \times 10^{-4}$. The $K = 2$ partition is thus useful to prove that TCI can achieve an $E_b/N_0$ arbitrarily close to zero. Figure 11 also shows the reverse model GMI as the curve labeled "TCI, rGMI", which has the rate $I_1(A; Y) \approx 8 \times 10^{-6}$ at $E_b/N_0 = -2$ dB.
We compare the full CSIR and full CSIT rates. At high SNR, the GMI for $S_R = 0$ achieves the same capacity pre-log as $S_R = H$. At low SNR, recall from (271) that with full CSIR/CSIT we have $E_b/N_0 \approx \log(2)/\lambda$. To compare the rates at similar $E_b/N_0$, we set $\lambda = \log t$, where $t$ is as in (299) and $c \approx 1$. The TCI $K = 2$ GMI without CSIR is approximately $e^{-t} \log t$, while the full CSIR rate (271) is approximately $e^{-\lambda}/\lambda = 1/(t \log t)$. Thus, the $K = 2$ GMI with no CSIR is a fraction $t\, e^{-t} (\log t)^2$ of the full CSIR capacity.
$S_R = 1(G \ge t)$: The power in (206) is again (288), and the rate (224) is
$$I_1(A; Y \mid S_R) = e^{-t} \cdot \log\!\left( 1 + \frac{P\, e^{2t}\, \Gamma\big( \tfrac{3 + a}{2}, t \big)^2}{\Gamma(1 + a, t) + P \Big[ e^{t}\, \Gamma(2 + a, t) - e^{2t}\, \Gamma\big( \tfrac{3 + a}{2}, t \big)^2 \Big]} \right).$$
Moreover, the expression (225) is
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\Gamma(1 + a, t)}{e^{t} \cdot \Gamma\big( \tfrac{3 + a}{2}, t \big)^2} \cdot \log 2$$
which is the same as (290) except for the factor $e^{t}$ in the denominator. This implies that the minimal $E_b/N_0$ can be improved for $t > 0$.
The TCP, TMF, and TCI rates (302) are the respective
$$I_1(A; Y \mid S_R) = e^{-t} \log\!\left( 1 + \frac{P\, e^{2t}\, \Gamma(3/2, t)^2}{e^{-t} + P \big[ (t + 1) - e^{2t}\, \Gamma(3/2, t)^2 \big]} \right)$$
$$I_1(A; Y \mid S_R) = e^{-t} \log\!\left( 1 + \frac{P\, (t + 1)^2}{e^{-t}(t + 1) + P} \right)$$
$$I_1(A; Y \mid S_R) = e^{-t} \log\!\left( 1 + \frac{P}{E_1(t)} \right).$$
Remark 73.
As pointed out in Remark 68, the TCI GMI (306) is I ( A ; Y | S R ) . One can also understand this by observing that the receiver knows G P ( G ) for all G. The mutual information is thus related to the rate (189) of Proposition 2.
The minimal $E_b/N_0$ in (303) are the respective
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{1}{e^{2t}\, \Gamma(3/2, t)^2} \cdot \log 2$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{1}{t + 1} \cdot \log 2$$
$$\left( \frac{E_b}{N_0} \right)_{\min} = e^{t}\, E_1(t) \cdot \log 2.$$
The above expressions mean that, for all three policies, we can make the minimal $E_b/N_0$ as small as desired by increasing $t$. For example, for TCI, we can bound (see (A9) below)
$$\frac{1}{t + 1} < e^{t}\, E_1(t) < \frac{1}{t}.$$
TCI thus has a slightly larger (slightly worse) minimal $E_b/N_0$ than TMF for the same $t$, as discussed after (212).
For large $P$, the TCP rate (304) is optimized by $t \approx 0.163$, and the rate saturates at $2.35$ bits per channel use. The TMF rate (305) is optimized with $t = 0$, and the rate saturates at 1 bit per channel use. For the TCI rate (306), we again choose $t = 1/P$ and use $E_1(t) \approx \log(1/t)$ for small $t$ to show that the capacity grows with pre-log 1:
$$I_1(A; Y \mid S_R) \approx \log\!\left( 1 + \frac{P}{\log P} \right).$$
Again, TMF is best at small $P$, while TCI is best at large $P$.
Remark 74.
Comparing (298) and (306), the $S_R = 0$, $K = 2$, TCI GMI in (295) approaches the $S_R = 1(G \ge t)$ mutual information $I(A; Y \mid S_R)$ in (306) at high SNR.
Optimal Policy: Consider now the optimal power control policy. Suppose first that $S_R = 0$, for which Theorem 2 gives the TMMSE policy with $t = 0$:
$$\sqrt{P(h)} = \frac{\alpha\, |h|}{\beta + |h|^2}.$$
For Rayleigh fading, we thus have (see (A13) below)
$$P = \int_0^\infty e^{-g}\, \frac{\alpha^2\, g}{(\beta + g)^2}\, \mathrm{d}g = \alpha^2 \Big[ (\beta + 1)\, e^{\beta} E_1(\beta) - 1 \Big]$$
with the two expressions (see (A12) and (A14) below)
$$\tilde{P} = \left( \int_0^\infty e^{-g}\, \frac{\alpha\, g}{\beta + g}\, \mathrm{d}g \right)^2 = \alpha^2 \Big[ 1 - \beta\, e^{\beta} E_1(\beta) \Big]^2$$
$$\mathrm{E}\big[ G\, P(H) \big] = \int_0^\infty e^{-g}\, \frac{\alpha^2\, g^2}{(\beta + g)^2}\, \mathrm{d}g = \alpha^2 \Big[ 1 + \beta - \beta (\beta + 2)\, e^{\beta} E_1(\beta) \Big].$$
Given $P$ and $\beta$, we may compute $\alpha^2$ from (312). We then search for the optimal $\beta$ for fixed $P$. The rates are shown as the curve labeled "TMMSE, GMI, K = 1" in Figure 10 and Figure 11, and we see that the TMMSE strategy has the best $K = 1$ rates.
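This search is a one-dimensional sweep; a minimal sketch using the closed forms (312)–(314) (function names ours):

```python
# Sketch: K = 1 TMMSE GMI for Rayleigh fading with S_R = 0.  Fix beta, get
# alpha^2 from the power constraint (312), then evaluate the forward GMI
# I_1 = log(1 + P~/(E[|Y|^2] - P~)) with (313)-(314).
import numpy as np
from scipy.special import exp1

def tmmse_gmi_bits(P, beta):
    eE1 = np.exp(beta) * exp1(beta)
    alpha2 = P / ((beta + 1) * eE1 - 1)                   # (312)
    Pt = alpha2 * (1 - beta * eE1) ** 2                   # (313)
    Ey2 = 1 + alpha2 * (1 + beta - beta * (beta + 2) * eE1)  # 1 + (314)
    return np.log2(1 + Pt / (Ey2 - Pt))

P = 1.0
betas = np.linspace(0.01, 5.0, 500)
rates = [tmmse_gmi_bits(P, b) for b in betas]
i = int(np.argmax(rates))
print("best beta =", betas[i], " rate =", rates[i], "bits/use")
```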
Consider next $S_R = 1(G \ge t)$ and the TMMSE policy. We compute (see (A13) below)
$$P = \int_t^\infty e^{-g}\, \frac{\alpha^2\, g}{(\beta + g)^2}\, \mathrm{d}g = \alpha^2 \left[ (\beta + 1)\, e^{\beta} E_1(t + \beta) - \frac{e^{-t} \beta}{t + \beta} \right]$$
and (see (A12) and (A14) below)
$$\sqrt{\tilde{P}(1)} = \int_t^\infty \frac{e^{-g}}{e^{-t}}\, \frac{\alpha\, g}{\beta + g}\, \mathrm{d}g = \alpha \Big[ 1 - \beta\, e^{t + \beta} E_1(t + \beta) \Big]$$
$$\mathrm{E}\big[ |Y|^2 \,\big|\, S_R = 1 \big] = \int_t^\infty \frac{e^{-g}}{e^{-t}} \left( 1 + \frac{\alpha^2\, g^2}{(\beta + g)^2} \right) \mathrm{d}g = 1 + \alpha^2 \left[ 1 + \frac{\beta^2}{t + \beta} - \beta (\beta + 2)\, e^{t + \beta} E_1(t + \beta) \right].$$
We optimize as for the $S_R = 0$ case: given $P$, $\beta$, and $t$, we compute $\alpha^2$ from (315). We then search for the optimal $\beta$ for fixed $P$ and $t$. The optimal $t$ is approximately a factor of $1.1$ smaller than for the TCI policy. The rates are shown in Figure 12 as the curve labeled "TMMSE, GMI".

8.5. Partial CSIR, CSIT@ R

Suppose S R is defined by (see (172))
H = ϵ ¯ S R + ϵ Z R
where 0 ϵ 1 and S R , Z R are independent with distribution CN ( 0 , 1 ) . We further consider the CSIT S T = | S R | 2 .
The reverse model GMI again requires Var U | Y , S R , which can be computed by simulation; see Appendix C.4. However, as in Section 7.4 and Section 8.3, optimizing the powers seems difficult, and we instead focus on forward models. The expressions (235) and (236) are
g ˜ ( s R ) = ϵ ¯ s T , σ ˜ 2 ( s R ) = ϵ .
The GMI (237) of Theorem 3 is
I 1 ( X ; Y | S R ) = λ / ϵ ¯ e s T log 1 + ϵ ¯ s T P ( s T ) 1 + ϵ P ( s T ) d s T
where the power control policy P ( s T ) is given by (245). The parameter λ is chosen so that E P ( S T ) = P . For example, for ϵ 0 we recover the waterfilling solution (181). Figure 13 shows the quadratic and conventional waterfilling rates, which lie almost on top of each other. For example, the inset shows the rates for ϵ = 0.2 and a small range of E b / N 0 .

9. Channels with In-Block Feedback

This section generalizes Shannon’s model described in Section 4.1 to include block fading with in-block feedback. For example, the model lets one include delay in the CSIT and permits many other generalizations for network models [22].

9.1. Model and Capacity

The problem is specified by the FDG in Figure 14. The model has a message M, and the channel input and output strings
X i L = ( X i 1 , , X i L ) , Y i L = ( Y i 1 , , Y i L )
for blocks i = 1 , , n . The channel is specified by a string S H n = ( S H 1 , , S H n ) of i.i.d. hidden channel states. The CSIR S R i is a (possibly noisy) function of S H i for all i and . The receiver sees the channel outputs (see (159))
( Y i , S R i ) = f X i , S H i , Z i L , S R i
for some functions f ( · ) , = 1 , , L . Observe that the X i influence the Y i in a causal fashion. The random variables M , S H 1 , , S H n , Z 1 L , , Z n L are mutually independent.
We now permit past channel symbols to influence the CSIT; see Section 1.2. Suppose the CSIT has the form
S T i = f T S H i , X i 1 , Y i 1
for some function f T ( . ) and for all i and . The motivation for (321) is that useful CSIR may not be available until the end of a block or even much later. In the meantime, the receiver can, e.g., quantize the Y i 1 and transmit the quantization bits via feedback. This lets one study fast power control and beamforming without precise knowledge of the channel coefficients.
Define the string of past and current states as
s T i = s T 1 L , , s T ( i 1 ) L , s T i .
The channel input at time i is X ( s T i ) and the adaptive codeword A n L is defined by the ordered lists
A i = X ( s T i ) , s T i
for 1 i n and 1 L . The adaptive codeword A n L is a function of M and is thus independent of S H n and S R n L .
The model under consideration is a special case of the channels introduced in ([22], Section V). However, the model in [22] has transmission and reception begin at time = 2 rather than = 1 . To compare the theory, one must thus shift the time indexes by 1 unit and increase L to L + 1 . The capacity for our model is given by ([22], Theorem 2) which we write as
C = ( a ) max A L 1 L I ( A L ; Y L , S R L ) = ( b ) max A L 1 L I ( A L ; Y L | S R L ) .
where ( a ) follows by normalizing by L rather than L + 1 , and step ( b ) follows by the independence of A L and S R L .

9.2. GMI for Scalar Channels

We will study scalar block fading channels; extensions to vector channels follow as described in Section 4.4. Let Y ̲ = [ Y 1 , , Y L ] T be the vector form of Y L and similarly for other strings with L symbols. The GMI with parameter s is
I s ( A L ; Y L | S R L ) = E log q ( Y ̲ | A ̲ , S ̲ R ) s q ( Y ̲ | S ̲ R )
Reverse Model: For the reverse model, let A ̲ be a column vector that stacks the X ( s T ) for all s T and . Consider a reverse density as in (105):
q a L | y L = exp z ̲ ( y ̲ , s ̲ R ) Q A ̲ | Y ̲ = y ̲ , S ̲ R = s ̲ R 1 z ̲ ( y ̲ , s ̲ R ) π N det Q A ̲ | Y ̲ = y ̲ , S ̲ R = s ̲ R
where
z ̲ ( y ̲ , s ̲ R ) = a ̲ E A ̲ | Y ̲ = y ̲ , S ̲ R = s ̲ R .
Using the forward model q ( y L | a L ) = q ( a L | y L ) / p ( a L ) , the GMI with s = 1 becomes
I 1 ( A L ; Y L , S R L ) = E log det Q A ̲ det Q A ̲ | Y ̲ , S ̲ R .
To simplify, consider adaptive symbols as in (89) (cf. (107)):
X ( S T ) = P ( S T ) e j ϕ ( S T ) U
where U ̲ CN ( 0 ̲ , I ) . In other words, consider a conventional codebook represented by the U and adapt the power and phase based on the available CSIT. The mutual information becomes I ( A L ; Y L , S R L ) = I ( U L ; Y L , S R L ) (cf. (96)) and the GMI with s = 1 is (cf. (108))
I 1 ( A L ; Y L | S R L ) = E log det Q U ̲ Y ̲ , S ̲ R .
In fact, one may also consider choosing U = U for all in which case we compute (cf. (139))
I 1 ( A L ; Y L | S R L ) = E log Var U | Y ̲ , S ̲ R .
Forward Model: Consider the following forward model (cf. (111) and (141)):
q ( y ̲ | a ̲ , s ̲ R ) = exp z ̲ ( s ̲ R ) Q Z ̲ ( s ̲ R ) 1 z ̲ ( s ̲ R ) π L det Q Z ̲ ( s ̲ R ) .
with
z ̲ ( s ̲ R ) = y ̲ H ( s ̲ R ) x ¯ ̲ ( s ̲ R )
and where similar to (142) we define
X ¯ ̲ ( s ̲ R ) = s ̲ T W ( s ̲ T , s ̲ R ) X ̲ ( s ̲ T )
where the W ( s ̲ T , s ̲ R ) are L × L complex matrices. Note that
X ̲ ( s ̲ T ) = [ X 1 ( s T 1 ) , X 2 ( s T 2 ) , , X 2 ( s T L ) ] T
so X is a function of A L and S T , = 1 , , L .
We have the following generalization of Lemma 4 (see also Theorem 1) where the novelty is that S T is replaced with S ̲ T . Define U ̲ ( s ̲ T ) CN ( 0 ̲ , I ) and X ̲ ( s ̲ T ) = Q X ̲ ( s ̲ T ) 1 / 2 U ̲ ( s ̲ T ) for all s ̲ T .
Theorem 4.
A GMI (325) for the scalar block fading channel p ( y L | a L , s R L ) , an adaptive codeword A L with jointly CSCG entries, the auxiliary model (330), and with fixed Q X ( s ̲ T ) is
I 1 ( A L ; Y L | S R L ) = E log det Q Y ̲ ( S ̲ R ) det Q Y ̲ ( S ̲ R ) D ˜ ( S ̲ R ) D ˜ ( S ̲ R ) .
where
Q Y ̲ ( s ̲ R ) = E Y ̲ Y ̲ S ̲ R = s ̲ R
and for M × M unitary V R ( s ̲ T , s ̲ R ) the matrix D ˜ ( s ̲ R ) is
E U T ( S ̲ T , s ̲ R ) Σ ( S ̲ T , s ̲ R ) V R ( S ̲ T , s ̲ R ) S ̲ R = s ̲ R
and U T ( s ̲ T , s ̲ R ) and Σ ( s ̲ T , s ̲ R ) are N × N unitary and N × M rectangular diagonal matrices, respectively, of the SVD
E Y ̲ U ̲ ( s ̲ T ) S ̲ T = s ̲ T , S ̲ R = s ̲ R = U T ( s ̲ T , s ̲ R ) Σ ( s ̲ T , s ̲ R ) V T ( s ̲ T , s ̲ R )
for all s ̲ T , s ̲ R and the V T ( s ̲ T , s ̲ R ) are M × M unitary matrices. One may maximize (333) over the unitary V R ( s ̲ T , s ̲ R ) .
Suppose next that the actual channel is Y ̲ = H X ̲ + Z ̲ where Z ̲ CN ( 0 ̲ , I ) . The extension of (136) and (168) to block fading channels with CSIR is
I 1 ( A L ; Y L | S R L ) = = 1 L E log 1 + P ˜ ( S ̲ R ) 1 + E G P ( S T ) | S ̲ R P ˜ ( S ̲ R )
where (cf. (166) and (167))
P ˜ ( s ̲ R ) = E E H P ( S T ) S T , S ̲ R = s ̲ R 2 E | Y | 2 | S ̲ R = s ̲ R = 1 + E G P ( S T ) | S ̲ R = s ̲ R .

9.3. CSIT@ R

Continuing as in Section 5.2, suppose the CSIT in (321) can be written by replacing S H i with S R i for all i and :
S T i = f T S R i , X i 1 , Y i 1 .
The capacity (324) then simplifies to a directed information. To see this, expand the mutual information in (324) as
I ( A L ; Y L | S R L ) = ( a ) = 1 L I A L , X ; Y | S R L , Y 1 = ( b ) = 1 L I ( X ; Y | S R L , Y 1 )
where step ( a ) follows because X is a function of A L and S T in (338), and step ( b ) follows by the Markov chains
A L [ S R L , X , Y 1 ] Y .
The capacity is therefore (see the definition (27))
C = max X ( S T ) , = 1 , , L 1 L I ( X L Y L | S R L ) .
The maximization in (341) under a cost constraint becomes a constrained maximization for which E c ( X L , Y L ) L P for some cost function c ( · ) .
Remark 75.
As outlined at the end of Section 9.1, the capacity (341) is a special case of the theory in ([22], Equation (48)). To see this, define the extended and time-shifted strings
A ^ L + 1 = ( 0 , A L ) , X ^ L + 1 = ( 0 , X L ) , Y ^ L + 1 = ( 0 , Y L ) .
Since A L and S R L are independent, one may expand (339) as
I ( A L ; Y L | S R L ) = I ( A L ; ( S R 2 , , S R L , 0 ) , Y L | S R 1 ) = ( a ) = 1 L I ( A L , X ; S R ( + 1 ) , Y | S R , Y 1 ) = ( b ) = 1 L I ( X ; S R ( + 1 ) , Y | S R , Y 1 ) = = 2 L + 1 I ( X ^ ; S R , Y ^ | S R 1 , Y ^ 1 )
where step ( a ) follows because X is a function of A L and S T in (338), and where S R ( L + 1 ) = 0 , and step ( b ) follows by the Markov chains
A L [ X , Y 1 , S R ] [ Y , S R ( + 1 ) ] .
The expression (342) is the desired directed information
I ( A L ; Y L , S R L ) = I ( X ^ L + 1 Y ^ L + 1 , S R L + 1 ) .
Remark 76.
Consider the basic CSIT model
S T i = f T ( S R i )
for some function f T ( · ) and for = 1 , , L and i = 1 , , n . This model was studied in ([103], Section III.C) and its capacity is given as (see ([103], Equation (35) with Equation (13)))
C = max X ( S T ) , = 1 , , L 1 L I ( X L ; Y L | S R L , S T L ) .
To see that (346) is a special case of (341), observe that
I ( X L Y L | S R L ) = ( a ) = 1 L I ( X ; Y | S R L , S T L , Y 1 ) = ( b ) = 1 L I ( X L ; Y | S R L , S T L , Y 1 )
where step ( a ) follows by (339), and step ( b ) follows by the Markov chains
[ X + 1 , , X L ] [ S R L , S T L , Y 1 , X ] Y .
The expression (347) gives (346). Related results are available in ([10], Section III) and [104,105].
Remark 77.
The capacity (341) has only S R L in the conditioning while (346) has both S R L and S T L in the conditioning. This subtle difference is due to permitting X 1 to influence the S T in (338), and it complicates the analysis. On the other hand, if we remove only X 1 from (338) then the receiver knows S T at time ℓ and the capacity (341) can be written as (see the definition (28))
C = max X ( S T ) , = 1 , , L 1 L I ( X L Y L S T L | S R L ) .
We treat such a model in Section 9.7 below.

9.4. Fading Channels with AWGN

The expression (341) is valid for general statistics. We next specialize to the block-fading AWGN model
Y = H X + Z
where = 1 , , L , Z L CN ( 0 ̲ , I ) , and ( H , S R L ) , A L , Z L are mutually independent. Consider the power constraint
= 1 L E P S T L P
where P ( s T ) = E | X ( s T ) | 2 . The optimization of (341) under the constraint (351) is usually intractable, and we again desire expressions with log ( 1 + SNR ) terms to obtain insight.
Capacity Upper Bound: Using similar steps as in (162), we have
I ( A L ; Y L | S R L ) I ( A L ; Y L , H | S R L ) = = 1 L I A L ; Y | S R L , H , Y 1 = 1 L h ( Y | S R L , H , Y 1 ) h ( Z ) ( a ) = 1 L E log 1 + E G P ( S T ) | S R L , H , Y 1
where G = | H | 2 and step ( a ) follows by (163). However, CSCG inputs do not necessarily maximize the RHS of (352) because the inputs affect the CSIT.
Remark 78.
The expectation inside the logarithm in (352) becomes G P ( S T ) if S T is a function of S R L , H , Y 1 ; see (161), Remark 77, and Proposition 3 below.
Achievable Rates: Deriving achievable rates is more subtle than in Section 6. Consider the CSIT model (338) where for each block, we have
S T = f T ( H , X 1 , Y 1 )
for all . The capacity (341) is
C ( P ) = max X ( S T ) , = 1 , , L 1 L I ( X L Y L | H )
= max X ( S T ) , = 1 , , L 1 L h ( Y L | H ) log ( π e ) .
However, CSCG inputs are not necessarily optimal since the inputs affect the CSIT.
Instead of trying to optimize the input, consider X that are CSCG. We may write
I ( X L Y L | H ) = = 1 L E log 1 + G P ( S T )
and the Lagrangians to maximize (355) are
= 1 L E log 1 + G P ( S T ) + λ L P = 1 L E P ( S T ) .
Suppose the S T are discrete random variables. Taking the derivative with respect to P ( s T ) , we obtain
λ = 0 p ( g | s T ) g 1 + g P ( s T ) d g + k = + 1 L s T k 0 p ( g ) d P S T k | G ( s T k | g ) d P ( s T ) log 1 + g P k ( s T k ) P S T ( s T ) d g
as long as P ( s T ) > 0 . This expression is complicated because the choice of transmit powers P ( s T ) influences the statistics of the future CSIT S T ( + 1 ) , , S T L . If (357) cannot be satisfied, choose P ( s T ) = 0 . Finally, set λ so that = 1 L E P ( S T ) = L P .
Instead of the above, consider the simpler CSIT model with S T = f T ( H ) for all , cf. (345). The capacity (346) is now given by (355) with CSCG inputs and (357) simplifies because the derivatives with respect to P ( s T ) are zero, i.e., the double sum in (357) disappears and for all and s T we have
λ = 0 p ( g | s T ) g 1 + g P ( s T ) d g .
We use (358) for (362)–(364) in Section 9.7 below.

9.5. Full CSIR, Partial CSIT

We next generalize Proposition 2 in Section 6.4 to the block-fading AWGN model (350) with the CSIR
S R = H P ( S T ) , = 1 , , L
and where S T = f T ( S H ) , i.e., we have discarded X i 1 and Y i 1 in (321). We then have the following capacity result that implies this CSIR is the best possible since one achieves a capacity upper bound similar to (161).
Proposition 3.
The capacity of the channel (350) with the CSIR (359) and S T = f T ( S H ) for = 1 , , L is
C ( P ) = max 1 L = 1 L E log 1 + G P ( S T )
where the maximization is over the power control policies P ( S T ) such that = 1 L E P ( S T ) L P . One may use (358) to compute the P ( S T ) .
Proof. 
For achievability, apply (337) with
P ˜ ( S ̲ R ) = G P ( S T ) and E | Y | 2 | S ̲ R = 1 + P ˜ ( S ̲ R ) .
The converse follows by applying similar steps as in (162):
I ( A L ; Y L | S R L ) I ( A L ; Y L , S T L , H | S R L ) = = 1 L I A L ; Y | S R L , S T L , H , Y 1 = 1 L h ( Y | S R L , S T L , H , Y 1 ) h ( Z ) ( a ) = 1 L E log Var Y | S R L , S T L , H , Y 1 .
Finally, insert Var Y | S R L , S T L , H , Y 1 = 1 + G P ( S T ) . □
The RHS of (361) is at most the RHS of (352) and hence (361) gives a better bound. However, the bound (361) is valid only for particular CSIT, as in Remark 78.

9.6. On-Off Fading with Delayed CSIT

Consider on-off fading where the CSIT is delayed by D symbols, i.e., we have S T = 0 for = 1 , , D and S T ( D + 1 ) = H . Define the transmit powers as P ( s T ) = E | X ( s T ) | 2 for = 1 , , L . The capacity is
C ( P ) = D 2 L log 1 + 2 P 1 + L D 2 L log 1 + 2 P D + 1
where we write P D + 1 = P D + 1 s T D + 1 . Optimizing the powers, we obtain
P 1 = P L D 4 L P D + 1 = 2 P + D 2 L if P L D 4 L P 1 = 0 P D + 1 = 2 L P L D else .
For large P, we thus have C ( P ) 1 2 log ( P ) for all 0 D L . For small P, we have
C ( P ) = L D 2 L log 1 + 4 L P L D , if 0 D < L log ( 1 + 2 P ) / 2 , if D = L 2 P 4 L L D P 2 log ( e ) , if 0 D < L P P 2 log ( e ) , if D = L .
The CSIT thus gives a 3 dB power gain at low SNR since C ( P ) 2 P log ( e ) for 0 D < L and C ( P ) P log ( e ) for D = L . Furthermore, using (37), the slope of the capacity versus E b / N 0 in bits/s/Hz/(3 dB) is
1 D / L if 0 D < L 1 if D = L .
In other words, the delay reduces the low-SNR rate by a factor of 1 D / L for 0 D < L .

9.7. Rayleigh Fading and One-Bit Feedback

Let q u ( . ) be the one-bit ( B = 1 ) quantizer in Section 2.9. We study Rayleigh fading for two scenarios with S R L = H , i.e., the receiver knows H after the L transmissions of each block.
  • For the CSIT (345), we study delayed feedback where S T = 0 for = 1 , , L 1 and S T L = q u ( G ) . The delay is thus D = L 1 in the sense of Section 9.6.
  • For the CSIT (338), we study the case S T 1 = 0 , S T 2 = q u ( | Y 1 | ) , and S T = 0 for = 3 , , L . The delay is thus D = 1 in the sense of Section 9.6.
Delayed Quantized CSIR Feedback: Consider S T = 0 for = 1 , , L 1 and S T L = q u ( G ) . CSCG inputs are optimal, and (347) has the same form as (360). The Lagrangians are given by (356), and we again obtain (358). For the case at hand, we have L + 1 equations for λ , namely
λ = 0 e g g 1 + g P d g , = 1 , , L 1
λ = 0 Δ e g 1 e Δ g 1 + g P L ( Δ / 2 ) d g
λ = Δ e g e Δ g 1 + g P L ( 3 Δ / 2 ) d g
where we used (40) and (41) and abused notation by writing P L ( s T L ) for P L ( s T L ) . We thus have P 1 = = P L 1 and obtain three equations. We now search for λ such that
( L 1 ) P 1 + s P S T L ( s ) P L ( s ) = L P
and the capacity (353) is
C ( P ) = L 1 L e 1 / P 1 E 1 1 / P 1 + 1 L s I ( s ) e g log 1 + g P L ( s ) d g
where the sums are over s = Δ / 2 , 3 Δ / 2 and
I ( Δ / 2 ) = [ 0 , Δ ) , I ( 3 Δ / 2 ) = [ Δ , ) .
We remark that, if P 1 = 0 , then we set e 1 / P 1 E 1 1 / P 1 = 0 since lim x e x E 1 ( x ) = 0 .
Figure 15 shows these capacities for L = 1 , 2 , 3 and Δ = 1 . At low SNR (e.g., for L = 3 below 2.97 dB) we have P 1 = 0 and P L ( Δ / 2 ) = 0 , i.e., the transmitter is silent unless S T L = 3 Δ / 2 and it uses power at time = L only. Observe that, as in Section 9.6, a delay of L steps reduces the low-SNR slope, and therefore the low-SNR rates, by a factor of L. Delay can thus be costly at low SNR.
Quantized Channel Output Feedback: Consider S T 1 = 0 , S T 2 = q u ( | Y 1 | ) , and S T = 0 for = 3 , , L . As discussed in Remark 77, the capacity is given by the directed information expression (349). However, optimizing the input statistics seems difficult, i.e., CSCG inputs are not necessarily optimal. Instead, we compute achievable rates for a strategy where one symbol partially acts as a pilot.
Suppose the transmitter sends X 1 = P 1 e j Φ as the first symbol of each block, where Φ is uniformly distributed in [ 0 , 2 π ) . The idea is that | X 1 | = P 1 is known at the receiver, and thus X 1 acts as a pilot to test the channel amplitude. Next, we choose a variation of flash signaling. Define the event E = { | Y 1 | Δ } = { S T 2 = 3 Δ / 2 } . If this event does not occur, the transmitter sends X = 0 for = 2 , , L . Otherwise, the transmitter sends independent CSCG X with variance P 2 / Pr E for = 2 , , L . Define P ( s T ) = E | X ( s T ) | 2 . We have P = P 2 for 2 and the power constraint is P 1 + ( L 1 ) P 2 L P .
We use (347) to write
C ( P ) 1 L I ( X 1 ; Y 1 | H ) + L 1 L I ( X 2 ; Y 2 | H , Y 1 ) .
The first mutual information in (366) is
I ( X 1 ; Y 1 | H ) = h ( Y 1 | H ) log ( π e )
and we compute (see ([52], Appendix A))
p ( y 1 | h ) = 1 π e ( | y 1 | 2 + P 1 | h | 2 ) I 0 2 | y 1 | | h | P 1
where I 0 ( . ) is the modified Bessel function of the first kind of order zero. The Jacobian of the mapping from Cartesian coordinates [ ( y 1 ) , ( y 1 ) ] to polar coordinates [ | y 1 | , arg y 1 ] is | y 1 | , so we have
h ( Y 1 | H = h ) = 0 p ( y 1 | h ) log ( p ( y 1 | h ) ) 2 π | y 1 | d | y 1 | .
We further compute
I X 2 ; Y 2 | H , Y 1 = 0 e g Pr E | G = g log 1 + g P 2 Pr E d g .
The conditional probability of a high-energy Y 1 is
Pr E | G = g = Q 1 2 g P 1 , 2 Δ
where Q 1 ( . ) is the Marcum Q-function of order 1; see (A3) in Appendix A.1. For Rayleigh fading, we compute
Pr E = Pr H P 1 e j Φ + Z 1 2 Δ 2 = e Δ 2 / ( P 1 + 1 ) .
The resulting rates are shown in Figure 16 for the block lengths L = 10 , 20 , 100 . Observe that each curve turns back on itself, which reflects the non-concavity of the directed information rates in P; see ([74], Section III). All rates below the curves are achievable by “time-wasting”, i.e., by transmitting for some fraction of the time only. This suggests that flash signaling [73] will improve the rates since one sends information by choosing whether to transmit energy.

10. Conclusions

This paper reviewed and derived achievable rates for channels with CSIR, CSIT, block fading, and in-block feedback. GMI expressions were developed for adaptive codewords and two classes of auxiliary channel models with AWGN and CSCG inputs: reverse and forward channel models. The forward model inputs were chosen as linear functions of the adaptive codeword’s symbols. We showed that, for scalar channels, an input distribution that maximizes the GMI generates a conventional codebook, where the codeword symbols are multiplied by a complex number that depends on the CSIT. The GMI increases by partitioning the channel output alphabet and modifying the auxiliary model parameters for each partition subset. The partitioning helps to determine the capacity scaling at high and low SNR. Power control policies were developed for full CSIT, including TMMSE policies. The theory was applied to channels with on-off fading and Rayleigh fading. The capacities with in-block feedback simplify to directed information expressions if the CSIT is a function of the CSIR and past channel inputs and outputs.
There are many possible applications and extensions of this work. For example, adaptive coding and modulation are important for all practical communication systems, including wireless, copper, and fiber-optic networks. Shannon’s adaptive codewords can improve current systems since the CSIT is usually a noisy version of the CSIR; see Remark 25. Moreover, the information theory for in-block feedback [22] applies to beamforming [106] and intelligent reflecting surfaces [107,108]. One may also apply GMI to multi-user channels with in-block feedback, such as multi-access and broadcast channels. Finally, it is important to develop improved capacity upper bounds. The standard approach here is the duality framework described in [97,109]; see also ([110], page 128).

Funding

This work was supported by the 6G Future Lab Bavaria funded by the Bavarian State Ministry of Science and the Arts, the project 6G-life funded by the Germany Federal Ministry for Education and Research (BMBF), and by the German Research Foundation (DFG) through projects 390777439 and 509917421.

Acknowledgments

The author wishes to thank the reviewers for their helpful comments and W. Zhang for sending his recent paper [50].

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Special Functions

This appendix reviews three classes of functions that we use to analyze information rates: the non-central chi-squared distribution, the exponential integral, and gamma functions.

Appendix A.1. Non-Central Chi-Squared Distribution

The non-central chi-squared distribution with two degrees of freedom is the probability distribution of Y = | x + Z | 2 where x C and Z CN ( 0 , 2 ) . The density is
p ( y ) = 1 2 e ( y + | x | 2 ) / 2 I 0 ( | x | y ) · 1 ( y 0 )
where I 0 ( . ) is the modified Bessel function of the first kind of order zero. The cumulative distribution function is
Pr Y t = 1 Q 1 | x | , t
where Q 1 ( . ) is the Marcum Q-function of order 1. Observe that if we change Z to Z CN ( 0 , σ 2 ) then for Y = | x + Z | 2 we instead have
Pr Y t = 1 Q 1 2 | x | 2 / σ 2 , 2 t / σ 2 .

Appendix A.2. Exponential Integral

The exponential integral is defined for x > 0 as
E 1 ( x ) = x e t t d t .
The derivative of E 1 ( x ) is
d E 1 ( x ) d x = e x x .
For small x one may apply ([111], Equation (3))
E 1 ( x ) γ log x + x
where γ 0.57721 is Euler’s constant. For large x we have
E 1 ( x ) e x x 1 1 x + 2 x 2 6 x 3 .
We have the bounds [112]
1 2 log 1 + 2 x < e x E 1 ( x ) < log 1 + 1 x
1 x + 1 < e x E 1 ( x ) < x + 1 x ( x + 2 ) .
Using integration by parts, for x > 0 , we have
x e t log t d t = E 1 ( x ) + e x log ( x )
x e t 1 t 2 d t = e x x E 1 ( x ) .
Using the translation t ˜ = t + y we also have
x e t t t + y d t = e x y e y E 1 ( x + y )
x e t t ( t + y ) 2 d t = e x y x + y + ( y + 1 ) e y E 1 ( x + y )
x e t t 2 ( t + y ) 2 d t = e x 1 + y 2 x + y y ( y + 2 ) e y E 1 ( x + y ) .

Appendix A.3. Gamma Functions

The upper and lower incomplete gamma functions are the respective
Γ ( s , t ) = t e g g s 1 d g
γ ( s , t ) = 0 t e g g s 1 d g .
For instance, we have Γ ( 1 , t ) = e t and γ ( 1 , t ) = 1 e t . We further have Γ ( 0 , t ) = E 1 ( t ) where E 1 ( x ) is the exponential integral defined in Appendix A.2.
The Gamma function is Γ ( s ) = Γ ( s , 0 ) = γ ( s , ) and for positive integers n we have
Γ ( n ) = ( n 1 ) ! , Γ n 1 2 = ( 2 n 2 ) ! 4 n 1 ( n 1 ) ! π .
For example, the following cases are used in Section 8.4:
Γ ( 1 ) = Γ ( 2 ) = 1 , Γ 1 2 = π , Γ 3 2 = π 2 , Γ 5 2 = 3 4 π .
The value Γ ( 0 ) is undefined but we have lim x 0 + Γ ( x ) = .

Appendix B. Forward Model GMIs with K = 2

This appendix studies K = 2 GMIs to develop high and low SNR capacity scaling results. Consider the independent random variables Z CN ( 0 , 1 ) and X CN ( 0 , P ) . We need the following expression for the event E = { | X + Z | 2 t R } :
E | Z | 2 E = C p Z | E ( z ) | z | 2 d z = 1 Pr E C e | z | 2 π | z | 2 Pr | X + z | 2 t R d z = e t R / ( 1 + P ) 0 e g g Q 1 2 g P , 2 t R P d g .
The integral can be computed using ([113], Equation (12)) with k = 2 , m = 1 , p = 1 , the Gamma functions above, and the following identities for Kummer’s confluent hypergeometric function:
1 F 1 ( 1 ; 2 ; z ) = ( e z 1 ) / z , 1 F 1 ( 2 ; 2 ; z ) = e z .
The result is
E | Z | 2 | X + Z | 2 t R = 1 + t R ( 1 + P ) 2 .

Appendix B.1. On-Off Fading

Consider on-off fading as in Section 3.3 and the K = 2 partition in Remark 14 with h 2 = 2 . We compute
Pr E 2 = h = 0 , 2 Pr H = h Pr E 2 | H = h = 1 2 e t R + 1 2 e t R / ( 1 + 2 P ) .
If t R = P λ R + b where 0 < λ R < 1 and b is a real constant then Pr E 2 1 / 2 as P , as desired. We further have
Pr H = 0 E 2 = e t R 2 Pr E 2
Pr H = 2 E 2 = e t R / ( 1 + 2 P ) 2 Pr E 2 .
The choice t R = P λ R + b gives Pr H = 2 E 2 1 as P . In other words, the receiver can reliably determine H by choosing t R to grow with P, but not too fast.
We next compute
E | Y | 2 | E 2 = h = 0 , 2 Pr H = h | E 2 E | Y | 2 | E 2 , H = h = e t R ( t R + 1 ) + e t R / ( 1 + 2 P ) ( t R + 1 + 2 P ) 2 Pr E 2 .
The choice t R = P λ R + b makes E | Y | 2 | E 2 / ( 1 + 2 P ) 1 as P . Finally, we compute
E | Y 2 X | 2 | E 2 = h = 0 , 2 Pr H = h | E 2 E Y 2 X 2 E 2 , H = h = 1 2 Pr E 2 e t R ( t R + 1 + 2 P ) + e t R / ( 1 + 2 P ) 1 + t R ( 1 + 2 P ) 2
where the last step uses (A18). The choice t R = P λ R + b makes E | Y 2 X | 2 | E 2 1 as P .

Appendix B.2. On-Off Fading, Partial CSIR, and Full CSIT

The analysis for Section 7.3 is similar to that of Appendix B.1. Consider the GMI (259) and observe that we can replace 2 P with 4 P in (A19)–(A22). We also have
E | Y 4 P U | 2 E 2 = h = 0 , 2 Pr H = h | E 2 E | Y 4 P U | 2 | E 2 , H = h = 1 2 Pr E 2 e t R ( t R + 1 + 4 P ) + e t R / ( 1 + 4 P ) 1 + t R ( 1 + 4 P ) 2 .
The choice t R = P λ R + b as in Appendix B.1 gives (260).

Appendix B.3. On-Off Fading, Partial CSIR, and CSIT@R

The analysis for Section 7.4 is similar to that of Appendices Appendix B.1 and Appendix B.2. We compute
Pr E 2 | S R = 0 = ϵ ¯ e t R + ϵ e t R / [ 1 + 2 P ( 0 ) ]
Pr E 2 | S R = 2 = ϵ e t R + ϵ ¯ e t R / [ 1 + 2 P ( 2 ) ] .
Suppose P ( 0 ) and P ( 2 ) both scale in proportion to P. If we choose t R = P λ R + b as in Appendix B.1 then Pr E 2 | S R = 0 ϵ and Pr E 2 | S R = 2 ϵ ¯ as P . We also have
Pr H = 0 E 2 , S R = 0 = ϵ ¯ e t R Pr E 2 | S R = 0
Pr H = 2 E 2 , S R = 0 = ϵ e t R / [ 1 + 2 P ( 0 ) ] Pr E 2 | S R = 0
and similarly for the probabilities Pr H = 0 | E 2 , S R = 2 and Pr H = 2 | E 2 , S R = 2 . Choosing t R = P λ R + b gives the desired behavior Pr H = 2 | E 2 , S R = 0 1 and Pr H = 2 | E 2 , S R = 2 1 as P . Again, the receiver can reliably determine H by choosing t R to grow with P, but not too fast.
We next have
E | Y | 2 | E 2 , S R = 0 = ϵ ¯ e t R ( t R + 1 ) + ϵ e t R / [ 1 + 2 P ( 0 ) ] ( t R + 1 + 2 P ( 0 ) ) Pr E 2 | S R = 0 .
The expression for E | Y | 2 | E 2 , S R = 2 is similar but ϵ and ϵ ¯ are swapped and P ( 0 ) is replaced with P ( 2 ) . We also have
E | Y 2 X ( 0 ) | 2 E 2 , S R = 0 = 1 Pr E 2 | S R = 0 { ϵ ¯ e t R ( t R + 1 + 2 P ( 0 ) ) + ϵ e t R / [ 1 + 2 P ( 0 ) ] 1 + t R ( 1 + 2 P ( 0 ) ) 2 .
The expression for E | Y 2 X ( 0 ) | 2 E 2 , S R = 2 is similar: swap ϵ and ϵ ¯ and replace P ( 0 ) with P ( 2 ) . The choice t = P λ R + b makes all terms in (265) behave as desired. We thus obtain (266).

Appendix B.4. Rayleigh Fading, No CSIR, full CSIT, and TCI

The analysis for Section 8.4 is similar to that of Appendices Appendix B.1Appendix B.3, but we now have a continuous H. Recall that E 2 = { | Y | 2 t R } and Y = P ( h ) U + Z where P ( h ) = 0 for g < t and P ( h ) = P ^ otherwise. We compute
Pr E 2 = Pr G < t Pr E 2 | G < t + Pr G t Pr E 2 | G t = ( 1 e t ) e t R + e t e t R / ( 1 + P ^ )
where we used Pr E 2 | G < t = Pr | Z | 2 t R and similarly for Pr E 2 | G t . For example, for the t and t R in (296) we find that Pr E 2 1 as P grows. Similarly, for the t and t R in (299) we find that Pr E 2 e t 1 as P decreases.
We write
E | Y | 2 | E 2 = Pr G < t | E 2 E | Z | 2 E 2 , G < t + Pr G t | E 2 E P ^ U + Z 2 E 2 , G t = ( 1 e t ) e t R ( t R + 1 ) + e t e t R / ( 1 + P ^ ) ( t R + 1 + P ^ ) Pr E 2 .
For the t and t R in (296) we have E | Y | 2 | E 2 / ( 1 + P ^ ) 1 as P grows. Similarly, for the t and t R in (299) we find that E | Y | 2 | E 2 / ( 1 + 2 P ^ ) 1 as P decreases. Next, we write
E Y P ^ U 2 E 2 = Pr G < t | E 2 E Z P ^ U 2 | Z | 2 t R + Pr G t | E 2 E | Z | 2 P ^ U + Z 2 t R = 1 Pr E 2 ( 1 e t ) e t R ( t R + 1 + P ^ ) + e t e t R / ( 1 + P ^ ) 1 + t R 1 + P ^ 2 .
For the t and t R in (296) the expression (A33) approaches 1 as P grows. Similarly, for the t and t R in (299) we find that (A33) approaches 1 as P decreases.

Appendix C. Conditional Second-Order Statistics

This appendix shows how to compute conditional second-order statistics for the reverse model GMIs and the forward model GMIs with K = . Suppose that U , Y are jointly CSCG given H = h . Using (25) and (26), we have
E U | Y = y , H = h = E U Y * | H = h E | Y | 2 | H = h · y
Var U | Y = y , H = h = E | U | 2 | H = h E U Y * | H = h 2 E | Y | 2 | H = h .
Now consider the channel Y = H X + Z where X = P ( S T ) e j ϕ ( S T ) U with U CN ( 0 , 1 ) . We may write
E U | Y = y , S R = s R = C × S T p ( h , s T | y , s R ) h * P ( s T ) e j ϕ ( s T ) y 1 + | h | 2 P ( s T ) d s T d h
and
E | U | 2 | Y = y , S R = s R = C × S T p ( h , s T | y , s R ) 1 1 + | h | 2 P ( s T ) + | h | 2 P ( s T ) | y | 2 1 + | h | 2 P ( s T ) 2 d s T d h .

Appendix C.1. No CSIR, No CSIT

Consider S R = S T = 0 . The expectations in (A36) and (A37) are computed via
p ( h | y ) = p ( h ) p ( y | h ) p ( y ) .
The expression (A36) with ϕ ( 0 ) = 0 gives
E U | Y = y = C p ( h | y ) h * P y 1 + | h | 2 P d h .
Similarly, the expression (A37) gives
E | U | 2 | Y = y = C p ( h | y ) E | X | 2 Y = y , H = h d h = C p ( h | y ) 1 1 + | h | 2 P + | h | 2 P | y | 2 1 + | h | 2 P 2 d h
We may now compute Var U | Y = y using (A39) and (A40). For the expressions (69) and (70), one may use
E X | Y = y = P E U | Y = y , E | X | 2 | Y = y = P E | U | 2 | Y = y .
For example, for on-off fading as in Section 3.3 we compute
E X | Y = y = P H | Y 2 y 2 P 1 + 2 P · y
E | X | 2 Y = y = P H | Y ( 0 | y ) P + P H | Y 2 y P 1 + 2 P + 2 P 2 | y | 2 ( 1 + 2 P ) 2
and therefore
Var X | Y = y = P H | Y ( 0 | y ) P + P H | Y 2 y P 1 + 2 P + 2 P 2 | y | 2 ( 1 + 2 P ) 2 P H | Y ( 0 | y )
where P H | Y 2 y = 1 P H | Y ( 0 | y ) and
P H | Y ( 0 | y ) = e | y | 2 e | y | 2 + 1 1 + 2 P e | y | 2 / ( 1 + 2 P ) .
For Rayleigh fading as in Section 8.1, the density (A38) is
p ( h | y ) = e g e | y | 2 / ( 1 + g P ) π 2 ( 1 + g P ) · 1 p ( y )
where g = | h | 2 . Moreover, p ( y ) in (268) depends on g only. We thus have E U | Y = y = 0 and the integrand in (A40) depends on g and | y | 2 only.

Appendix C.2. Full CSIR, Partial CSIT

Consider S R = H and partial S T . The expectations in (A36) and (A37) are computed via (194) that we repeat here:
p ( h , s T | y , s R ) = δ ( h s R ) p ( s T | h ) p ( y | h , s T ) p ( y | h ) .
For on-off fading as in Section 7.2, the expression (A36) with ϕ ( 0 ) = 0 gives the expectations E U | Y = y , H = 0 = 0 and
E U | Y = y , H = 2 = s T = 0 , 2 P S T | Y , H ( s T | y , 2 ) 2 P ( s T ) y 1 + 2 P ( s T )
and, similarly, (A37) gives E | U | 2 | Y = y , H = 0 = 1 and
E | U | 2 | Y = y , H = 2 = s T = 0 , 2 P S T | Y , H ( s T | y , 2 ) 1 1 + 2 P ( s T ) + 2 P ( s T ) | y | 2 1 + 2 P ( s T ) 2
where P S T | Y , H ( 2 | y , 2 ) = 1 P S T | Y , H ( 0 | y , 2 ) and
P S T | Y , H ( 0 | y , 2 ) = ϵ 1 + 2 P ( 0 ) e | y | 2 / ( 1 + 2 P ( 0 ) ) ϵ 1 + 2 P ( 0 ) e | y | 2 / ( 1 + 2 P ( 0 ) ) + ϵ ¯ 1 + 2 P ( 2 ) e | y | 2 / ( 1 + 2 P ( 2 ) ) .
For Rayleigh fading as in Section 8.3, the sums over s T = 0 , 2 become sums over s T = 0 , 1 and the probabilities P ( s T | y , h ) take on similar forms as above.

Appendix C.3. Partial CSIR, Full CSIT

Consider S T = H and partial S R . The expectations in (A36) and (A37) are computed via (201) that we repeat here:
p ( h , s T | y , s R ) = δ ( s T h ) p ( h | s R ) p ( y | h , s R ) p ( y | s R ) .
For on-off fading with S R = 0 as in Section 7.3, the expression (A36) with ϕ ( 0 ) = 0 gives
E U | Y = y = P H | Y 2 y 4 P y 1 + 4 P
and (A37) gives
E | U | 2 | Y = y = P H | Y ( 0 | y ) + P H | Y 2 y 1 1 + 4 P + 4 P | y | 2 ( 1 + 4 P ) 2
where P H | Y 2 y = 1 P H | Y ( 0 | y ) and
P H | Y ( 0 | y ) = e | y | 2 e | y | 2 + 1 1 + 4 P e | y | 2 / ( 1 + 4 P ) .
For Rayleigh fading with S R = 0 and TCI as in Section 8.4, the expressions (A36) and (A37) give (cf. (A41) and (A42))
E U | Y = y = Pr G t | Y = y P ^ y 1 + P ^ E | U | 2 | Y = y = Pr G < t | Y = y + Pr G t | Y = y 1 1 + P ^ + P ^ | y | 2 ( 1 + P ^ ) 2
and therefore (cf. (A43))
Var U | Y = y = Pr G < t | Y = y + Pr G t | Y = y · 1 1 + P ^ + P ^ | y | 2 ( 1 + P ^ ) 2 Pr G < t | Y = y
where (cf. (A44))
Pr G < t | Y = y = 1 e t e | y | 2 1 e t e | y | 2 + e t 1 1 + P ^ e | y | 2 / ( 1 + P ^ ) .

Appendix C.4. Partial CSIR, CSIT@R

Consider S T = S R and partial S R . The expectations in (A36) and (A37) are computed via (234) that we repeat here:
p ( h , s T | y , s R ) = δ s T f ( s R ) p ( h | s R ) p ( y | h , s R ) p ( y | s R ) .
For on-off fading as in Section 7.3, the expression (A36) with ϕ ( 0 ) = 0 gives
E U | Y = y , S R = 0 = P H | Y , S R ( 1 | y , 0 ) 2 P ( 0 ) y 1 + 2 P ( 0 ) E U | Y = y , S R = 2 = P H | Y , S R ( 1 | y , 2 ) 2 P ( 2 ) y 1 + 2 P ( 2 )
and (A37) gives
E | U | 2 | Y = y , S R = 0 = P H | Y , S R ( 0 | y , 0 ) + P H | Y , S R ( 1 | y , 0 ) 1 1 + 2 P ( 0 ) ) + 2 P ( 0 ) | y | 2 1 + 2 P ( 0 ) 2 E | U | 2 | Y = y , S R = 2 = P H | Y , S R ( 0 | y , 2 ) + P H | Y , S R ( 1 | y , 2 ) 1 1 + 2 P ( 2 ) + 2 P ( 2 ) | y | 2 1 + 2 P ( 2 ) 2
where P H | Y , S R ( 2 | y , s R ) = 1 P H | Y , S R ( 0 | y , s R ) and
P H | Y , S R ( 0 | y , 0 ) = ϵ ¯ e | y | 2 ϵ ¯ e | y | 2 + ϵ 1 + 2 P ( 0 ) e | y | 2 / ( 1 + 2 P ( 0 ) ) P H | Y , S R ( 0 | y , 2 ) = ϵ e | y | 2 ϵ e | y | 2 + ϵ ¯ 1 + 2 P ( 2 ) e | y | 2 / ( 1 + 2 P ( 2 ) ) .
For Rayleigh fading as in Section 8.5, the probabilities P ( h | y , s R ) take on similar forms as above.

Appendix D. Proof of Lemma 2 and (119)

We prove Lemma 2 by using the same steps as in the proof of Proposition 1. The GMI (102) with a vector Y ̲ is
I s ( A ; Y ̲ ) = log det I + Q Z ̲ / s 1 H Q X ¯ ̲ H + E Y ̲ Q Z ̲ / s + H Q X ¯ ̲ H 1 Y ̲ E Y ̲ H X ¯ ̲ Q Z ̲ / s 1 Y ̲ H X ¯ ̲ .
One can again set s = 1 . Choosing H = H ˜ and Q Z ̲ = Q ˜ Z ˜ ̲ then gives (112).
Next, consider the channel Y ̲ a = H ˜ X ¯ ̲ + Z ˜ ̲ where Z ˜ ̲ is CSCG with covariance matrix Q Z ˜ ̲ and Z ˜ ̲ is independent of X ¯ ̲ . Generalizing (50) and (51), we compute Q Y ̲ a = Q Y ̲ and
E Y ̲ a H ˜ X ¯ ̲ Y ̲ a H ˜ X ¯ ̲ = E Y ̲ H X ¯ ̲ Y ̲ H X ¯ ̲ .
In other words, the second-order statistics for the two channels with outputs Y ̲ (the actual channel output) and Y ̲ a are the same. Moreover, the GMI (112) is the mutual information I ( A ; Y ̲ a ) . Using (104) and (A45), for any s, H and Q Z ̲ we have
I ( A ; Y ̲ a ) = log det I + Q Z ˜ ̲ 1 H ˜ Q X ¯ ̲ H ˜ I s ( A ; Y ̲ a ) = I s ( A ; Y ̲ )
and equality holds if H = H ˜ and Q Z ̲ / s = Q Z ˜ ̲ .
To prove (119), recall that tr AB = tr BA for matrices A and B with appropriate dimensions. Furthermore, for Hermitian matrices A , B , C with the same dimensions we have
tr A B C = tr ( A B C ) = tr C B A = tr A C B .
For notational convenience, consider the covariance matrix (117) with s = 1 and use
A = Q Z ¯ ̲ , B = H Q X ¯ ̲ H 1 / 2 Q Y ̲ Q Z ¯ ̲ 1 / 2 C = Q Z ¯ ̲ 1 Q Y ̲ Q Z ¯ ̲ 1 / 2 H Q X ¯ ̲ H 1 / 2
to compute (cf. (A45))
E Y ̲ H X ¯ ̲ Q Z ̲ 1 Y ̲ H X ¯ ̲ = tr Q Z ¯ ̲ Q Z ̲ 1 = ( a ) tr Q Y ̲ Q Z ¯ ̲ H Q X ¯ ̲ H 1
where step ( a ) follows by (A48). Next, by using (117) we have
Q Z ̲ + H Q X ¯ ̲ H 1 = Q Y ̲ Q Z ¯ ̲ 1 / 2 H Q X ¯ ̲ H 1 / 2 Q Y ̲ 1 · H Q X ¯ ̲ H 1 / 2 Q Y ̲ Q Z ¯ ̲ 1 / 2
and therefore (cf. (A45))
E Y ̲ Q Z ̲ + H Q X ¯ ̲ H 1 Y ̲ = tr Q Y ̲ Q Z ̲ + H Q X ¯ ̲ H 1 = ( a ) tr Q Y ̲ Q Z ¯ ̲ H Q X ¯ ̲ H 1
where step ( a ) again follows by (A48). We are thus left with the logarithm term in (A45). Finally, the determinant in (A45) is
det I + Q Z ̲ 1 H Q X ¯ ̲ H = det Q Z ¯ ̲ 1 Q Y ̲
where we applied (117) and Sylvester’s identity (33).

Appendix E. Proof of Lemma 3

Let P ¯ = E | X ¯ | 2 and write
X ¯ = P ¯ U ¯ , X ( s T ) = P ( s T ) U ( s T ) .
Since the U ( s T ) are CSCG we have
U ( s T ) = ρ ( s T , s T ) U ( s T ) + Z ( s T )
where ρ ( s T , s T ) = E U ( s T ) U ( s T ) * and
Z ( s T ) CN ( 0 , 1 | ρ ( s T , s T ) | 2 )
is independent of U ( s T ) . As in (109), define
X ¯ = s T w ( s T ) X ( s T ) = s T w ( s T ) P ( s T ) U ( s T ) ρ ( s T , s T ) + Z ( s T ) = P ¯ ρ ¯ ( s T ) U ( s T ) + s T w ( s T ) P ( s T ) Z ( s T )
where, assuming that P ¯ > 0 , we have
ρ ¯ ( s T ) = E U ¯ U ( s T ) * = s T w ( s T ) P ( s T ) P ¯ ρ ( s T , s T ) .
Observe that P ¯ ρ ¯ ( s T ) U ( s T ) is the LMMSE estimate of X ¯ given U ( s T ) .
Using Lemma 2, we have the auxiliary variables
h ˜ = E Y X ¯ * P ¯ , σ ˜ 2 = E | Y | 2 | h ˜ | 2 P ¯
and the GMI
I 1 ( A ; Y ) = log E | Y | 2 E | Y | 2 | h ˜ | 2 P ¯ .
If the P ( s T ) are fixed, then so is E | Y | 2 because U ( s T ) is CSCG and independent of Z given S T = s T . The GMI (A59) is thus maximized by maximizing | h ˜ | 2 P ¯ . We compute
| h ˜ | 2 P ¯ = s T P S T ( s T ) E Y X ¯ * S T = s T P ¯ 2 = ( a ) s T P S T ( s T ) E Y U ( s T ) * S T = s T ρ ¯ ( s T ) * 2
s T P S T ( s T ) E Y U ( s T ) * S T = s T 2
where step ( a ) follows because we have the Markov chain A [ U ( S T ) , S T ] Y which implies that Y and the Z ( s T ) in (A56) are independent give S T = s T .
Equality holds in (A61) if the summands in (A60) all have the same phase and | ρ ¯ ( s T ) | = 1 for all s T . But this is possible by choosing X ( s T ) as given in (122) so that U ( s T ) = e j ϕ ( s T ) U . Moreover, choose the receiver weights as
w ( s ˜ T ) = P ¯ P ( s ˜ T ) e j ϕ ( s ˜ T )
for one s ˜ T S T with P ( s ˜ T ) > 0 , and w ( s T ) = 0 otherwise. We then have X ¯ = P ¯ U and
ρ ( s T , s T ) = e j ( ϕ ( s T ) ϕ ( s T ) ) , ρ ¯ ( s T ) = e j ϕ ( s T )
and the resulting maximal I 1 ( A ; Y ) is given by (120) and (121).
Remark A1.
The full correlation permits many choices for the w ( s T ) ; hence, these weights do not seem central to the design. However, including weights can be useful if the codebook is not designed for the CSIR. For example, suppose A has independent entries X ( s T ) for which we compute
ρ ¯ ( s T ) = w ( s T ) P ( s T ) s T | w ( s T ) | 2 P ( s T )
and thus (A60) becomes
s T P S T ( s T ) E Y X ( s T ) * S T = s T w ( s T ) * 2 s T | w ( s T ) | 2 P ( s T ) .
Using Bergström’s inequality (or the Cauchy-Schwarz inequality), the expression (A65) is maximized by
w ( s T ) = P S T ( s T ) E Y X ( s T ) * S T = s T P ( s T ) · c
for some constant c 0 . The expression (A60) is therefore
s T P S T ( s T | h ) 2 E Y U ( s T ) * S T = s T 2
which is generally smaller than E E Y U ( S T ) * S T 2 (apply i a i 2 ( i a i ) 2 for non-negative a i ).
Remark A2.
The following example shows that more general signaling and more general X ¯ can be useful. Consider the channel with two equally-likely states S T = { + 1 , 1 } and Y = | X | exp ( j s T arg ( X ) ) + Z . We compute
E Y U ( + 1 ) * | S T = + 1 = P ( 1 ) E Y U ( 1 ) * | S T = 1 = 0 ρ ¯ ( + 1 ) = w ( 1 ) P ( 1 ) + w ( 1 ) P ( 1 ) ρ ( 1 , + 1 ) P ¯
and one should choose P ( 1 ) = 0 and P ( 1 ) = 2 P if the power constraint is E P ( S T ) P . We thus have
E | Y | 2 = P + 1 , P ˜ = P 2
and therefore (120) gives
I 1 ( A ; Y ) = log 1 + P 2 + P .
However, one can achieve the rate log ( 1 + P ) with other Gaussian X ¯ , namely linear combinations of both the X ( s T ) and the X ( s T ) * in (A56). This idea permits circularly asymmetric X ¯ , also known as improper X ¯ [114]. Alternatively, the transmitter can send the complex-conjugate symbols if S T = 1 .

Appendix F. Large K for Section 5.3

We complete Remark 49 by proceeding as in Appendix C.1. To generalize (70), we must deal with unit-rank matrices y ̲ y ̲ that do not have inverses. Consider first finite K. Conditioned on the event E k , we may write
Y ̲ = y ̲ k + ϵ 1 / 2 Z ˜ ̲ k
where y ̲ k = E Y ̲ | E k and E Z ˜ ̲ k E k = 0 ̲ . We abuse notation and write the conditional covariance matrix of Z ˜ ̲ k as Q Z ˜ ̲ k , and we assume that Q Z ˜ ̲ k is invertible. Define y ˜ ̲ k = Q Z ˜ ̲ k 1 / 2 y ̲ k and compute
Q Y ̲ ( k ) = ϵ Q Z ˜ ̲ k 1 / 2 I + 1 ϵ y ˜ ̲ k y ˜ ̲ k Q Z ˜ ̲ k 1 / 2
Q Y ̲ ( k ) 1 = 1 ϵ Q Z ˜ ̲ k 1 / 2 I y ˜ ̲ k y ˜ ̲ k ϵ + y ˜ ̲ 2 Q Z ˜ ̲ k 1 / 2 .
We further compute approximations for small ϵ :
y ̲ k Q Y ̲ ( k ) 1 y ̲ k = y ˜ ̲ k 2 ϵ + y ˜ ̲ k 2 1
H k = y ̲ k E X ¯ ̲ E k + ϵ 1 / 2 E Z ˜ ̲ k X ¯ ̲ E k Q X ¯ ̲ ( k ) 1 y ̲ k E X ¯ ̲ E k Q X ¯ ̲ ( k ) 1 .
We can now treat the limit of large K for which ϵ approaches zero, i.e., we choose a different auxiliary model for each Y ̲ = y ̲ . Applying the Woodbury and Sylvester identities (32) and (33) several times, (158) becomes
I 1 ( A ; Y ̲ ) = C N p ( y ̲ ) log det I + Q X ¯ ̲ ( y ̲ ) E ̲ y ̲ E ̲ y ̲ 1 Q X ¯ ̲ Q X ¯ ̲ ( y ̲ ) 1 E ̲ y ̲ E ̲ y ̲ tr Q X ¯ ̲ ( y ̲ ) D X ¯ ̲ ( y ̲ ) 1 Q X ¯ ̲ ( y ̲ ) E ̲ y ̲ E ̲ y ̲ 1 E ̲ y ̲ E ̲ y ̲ d y ̲
where
E ̲ y ̲ = E X ¯ ̲ | Y ̲ = y ̲ , Q X ¯ ̲ ( y ̲ ) = E X ¯ ̲ X ¯ ̲ Y ̲ = y ̲ , D X ¯ ̲ ( y ̲ ) = Q X ¯ ̲ Q X ¯ ̲ ( y ̲ ) .
If X ¯ ̲ , Y ̲ are jointly CSCG, then using (25) and (26) we have
E ̲ y ̲ = E X ¯ ̲ Y ̲ Q Y ̲ 1 · y ̲
Q X ¯ ̲ ( y ̲ ) E ̲ y ̲ E ̲ y ̲ = Q X ¯ ̲ E X ¯ ̲ Y ̲ Q Y ̲ 1 E X ¯ ̲ Y ̲ .
For example, if Y ̲ = H X ̲ + Z ̲ where H , X ̲ , Z ̲ are mutually independent and E Z ̲ = 0 , then we have (cf. (A39))
E ̲ y ̲ = C N × M p ( h | y ̲ ) E X ¯ ̲ | Y ̲ = y ̲ , H = h d h = C N × M p ( h | y ̲ ) Q X ¯ ̲ h I + h Q X ̲ h 1 y ̲ d h
= E Q X ¯ ̲ H I + H Q X ̲ H 1 Y ̲ = y ̲ · y ̲
where we have applied (A74) with conditioning on the event H = h . Similarly, we apply a conditional version of (A75) and the step (A76) to compute (cf. (A40))
Q X ¯ ̲ ( y ̲ ) = C N × M p ( h | y ̲ ) E X ¯ ̲ X ¯ ̲ Y ̲ = y ̲ , H = h d h = C N × M p ( h | y ̲ ) Q X ¯ ̲ ( y ̲ , h ) + E ̲ y ̲ , h E ̲ y ̲ , h d h = E Q X ¯ ̲ ( y ̲ , H ) + E ̲ y ̲ , H E ̲ y ̲ , H Y ̲ = y ̲
where
Q X ¯ ̲ ( y ̲ , h ) = Q X ¯ ̲ Q X ¯ ̲ h I + h Q X ̲ h 1 h Q X ¯ ̲ E ̲ y ̲ , h = E X ¯ ̲ | Y ̲ = y ̲ , H = h = Q X ¯ ̲ h I + h Q X ̲ h 1 y ̲ .

Appendix G. Proof of Lemma 4

We mimic the steps of Appendix E. Consider the SVDs
Q X ¯ ̲ = V X ¯ ̲ Σ X ¯ ̲ V X ¯ ̲ , Q X ̲ ( s T ) = V X ̲ ( s T ) Σ X ̲ ( s T ) V X ̲ ( s T ) .
Let U ¯ ̲ CN ( 0 , I ) and write
X ¯ ̲ = Q X ¯ ̲ 1 / 2 U ¯ ̲ .
Since the U ̲ ( s T ) are CSCG, we have
U ̲ ( s T ) = R ( s T , s T ) U ̲ ( s T ) + Z ̲ ( s T )
where R ( s T , s T ) = E U ̲ ( s T ) U ̲ ( s T ) and
Z ̲ ( s T ) CN ( 0 , I R ( s T , s T ) R ( s T , s T ) )
is independent of U ̲ ( s T ) . As in (109), define
X ¯ ̲ = s T W ( s T ) X ̲ ( s T ) = s T W ( s T ) Q X ̲ ( s T ) 1 / 2 R ( s T , s T ) U ̲ ( s T ) + Z ̲ ( s T ) = Q X ¯ ̲ 1 / 2 R ¯ ( s T ) U ̲ ( s T ) + s T W ( s T ) Q X ̲ ( s T ) 1 / 2 Z ̲ ( s T )
where as in (A57), and assuming Q X ¯ ̲ 0 , we write
R ¯ ( s T ) = E U ¯ ̲ U ̲ ( s T ) = s T Q X ¯ ̲ 1 / 2 W ( s T ) Q X ̲ ( s T ) 1 / 2 R ( s T , s T ) .
Observe that the vector Q X ¯ ̲ 1 / 2 R ¯ ( s T ) U ̲ ( s T ) is the LMMSE estimate of X ¯ ̲ given U ̲ ( s T ) .
Using Lemma 2, we have (see (A58))
H ˜ = E Y ̲ X ¯ ̲ Q X ¯ ̲ 1 , Q Z ̲ ˜ = Q Y ̲ H ˜ Q X ¯ ̲ H ˜
and we have the GMI (124) that we repeat here:
I 1 ( A ; Y ̲ ) = log det Q Y ̲ det Q Y ̲ H ˜ Q X ¯ ̲ H ˜ .
As in Appendix E, if the Q X ̲ ( s T ) are fixed, then so is Q Y ̲ because U ̲ ( s T ) CN ( 0 ̲ , I ) is independent of Z ̲ given S T = s T . We want to maximize the GMI (A84). Similar to (A60), we have the decomposition
H ˜ Q X ¯ ̲ H ˜ = D ˜ D ˜
where
D ˜ = s T P S T ( s T ) E Y ̲ U ̲ ( s T ) S T = s T R ¯ ( s T ) .
As in (A60), we have the Markov chain A [ U ̲ ( S T ) , S T ] Y ̲ which implies that Y ̲ and the Z ̲ ( s T ) in (A81) are independent give S T = s T . It is natural to expect that the matrix R ¯ ( s T ) of correlation coefficients should be “maximized” somehow. Indeed, the Cauchy-Schwarz inequality gives
v ̲ 1 R ¯ ( s T ) v ̲ 2 = E v ̲ 1 U ¯ ̲ · U ̲ ( s T ) v ̲ 2 E U ¯ ̲ v ̲ 1 2 · E U ̲ ( s T ) v ̲ 2 2 = v ̲ 1 · v ̲ 2
for any complex M-dimensional vectors v ̲ 1 and v ̲ 2 . The singular values of R ( s T ) are thus at most 1. We will choose the U ̲ ( s T ) so that the R ( s T ) are unitary matrices, and thus all singular values are 1.
Consider the SVD decompositions (126) and a codebook based on scaling and rotating a common U ̲ CN ( 0 ̲ , I ) of dimension N (see (122)):
U ̲ ( s T ) = V T ( s T ) U ̲ .
The receiver chooses M × M unitary matrices V R ( s T ) for all s T and uses the weighting matrix (cf. (A62))
W ( s ˜ T ) = Q X ¯ ̲ 1 / 2 V R ( s ˜ T ) V T ( s ˜ T ) Q X ̲ ( s ˜ T ) 1 / 2
for one s ˜ T S T with Q X ̲ ( s ˜ T ) 0 , and W ( s ˜ T ) = 0 otherwise. These choices give X ¯ ̲ = Q X ¯ ̲ 1 / 2 U ̲ and (cf. (A63))
R ( s T , s T ) = V T ( s T ) V T ( s T ) , R ¯ ( s T ) = V R ( s T ) V T ( s T ) .
Using (126), (A86), and (A89), we have
D ˜ = s T P S T ( s T ) U T ( s T ) Σ ( s T ) V R ( s T ) .

References

  1. Ozarow, L.; Shamai, S.; Wyner, A.D. Information theoretic consideration for cellular mobile radio. IEEE Trans. Inf. Theory 1994, 43, 359–378. [Google Scholar] [CrossRef]
  2. Biglieri, E.; Proakis, J.; Shamai (Shitz), S. Fading channels: Information-theoretic and communications aspects. IEEE Trans. Inf. Theory 1998, 44, 2619–2692. [Google Scholar] [CrossRef]
  3. Love, D.J.; Heath, R.W., Jr.; Lau, V.K.N.; Gesbert, D.; Rao, B.D.; Andrews, M. An overview of limited feedback in wireless communication systems. IEEE J. Select. Areas Commun. 2008, 26, 1341–1365. [Google Scholar] [CrossRef]
  4. Kim, Y.H.; Kramer, G. Information theory for cellular wireless networks. In Information Theoretic Perspectives on 5G Systems and Beyond; Cambridge University Press: Cambridge, UK, 2022; pp. 10–92. [Google Scholar] [CrossRef]
  5. Keshet, G.; Steinberg, Y.; Merhav, N. Channel coding in the presence of side information. Found. Trends Commun. Inf. Theory 2008, 4, 445–586. [Google Scholar] [CrossRef] [Green Version]
  6. Shannon, C.E. Channels with side information at the transmitter. IBM J. Res. Develop. 1958, 2, 289–293, Reprinted in Claude Elwood Shannon: Collected Papers; Sloane, N.J.A.,Wyner, A.D., Eds.; IEEE Press: Piscataway, NJ, USA, 1993; pp. 273–278. [Google Scholar] [CrossRef]
  7. Shannon, C.E. Two-way communication channels. In Proceedings of the Proc. 4th Berkeley Symp. on Mathematical Statistics and Probability; Neyman, J., Ed.; Univ. Calif. Press: Berkeley, CA, USA, 1961; Volume 1, pp. 611–644, Reprinted in Claude Elwood Shannon: Collected Papers; Sloane, N.J.A.,Wyner, A.D., Eds.; IEEE Press: Piscataway, NJ, USA, 1993; pp. 351–384. [Google Scholar]
  8. Blahut, R. Principles and Practice of Information Theory; Addison-Wesley: Reading, MA, USA, 1987. [Google Scholar]
  9. Kramer, G. Directed Information for Channels with Feedback; Vol. ETH Series in Information Processing; Hartung-Gorre Verlag: Konstanz, Germany, 1998; Volume 11. [Google Scholar] [CrossRef]
  10. Caire, G.; Shamai (Shitz), S. On the capacity of some channels with channel state information. IEEE Trans. Inf. Theory 1999, 45, 2007–2019. [Google Scholar] [CrossRef] [Green Version]
  11. McEliece, R.J.; Stark, W.E. Channels with block interference. IEEE Trans. Inf. Theory 1984, 30, 44–53. [Google Scholar] [CrossRef]
  12. Stark, W.; McEliece, R. On the capacity of channels with block memory. IEEE Trans. Inf. Theory 1988, 34, 322–324. [Google Scholar] [CrossRef] [Green Version]
  13. Wang, H.S.; Moayeri, N. Finite-state Markov channel-a useful model for radio communication channels. IEEE Trans. Vehic. Technol. 1995, 44, 163–171. [Google Scholar] [CrossRef]
  14. Wang, H.S.; Chang, P.C. On verifying the first-order Markovian assumption for a Rayleigh fading channel model. IEEE Trans. Vehic. Technol. 1996, 45, 353–357. [Google Scholar] [CrossRef]
  15. Viswanathan, H. Capacity of Markov channels with receiver CSI and delayed feedback. IEEE Trans. Inf. Theory 1999, 45, 761–771. [Google Scholar] [CrossRef]
  16. Zhang, Q.; Kassam, S. Finite-state Markov model for Rayleigh fading channels. IEEE Trans. Commun. 1999, 47, 1688–1692. [Google Scholar] [CrossRef]
  17. Tan, C.C.; Beaulieu, N.C. On first-order Markov modeling for the Rayleigh fading channel. IEEE Trans. Commun. 2000, 48, 2032–2040. [Google Scholar] [CrossRef]
  18. Médard, M. The effect upon channel capacity in wireless communications of perfect and imperfect knowledge of the channel. IEEE Trans. Inf. Theory 2000, 46, 933–946. [Google Scholar] [CrossRef] [Green Version]
  19. Riediger, M.; Shwedyk, E. Communication receivers based on Markov models of the fading channel. In Proceedings of the IEEE CCECE2002, Canadian Conference on Electrical and Computer Engineering, Conference Proceedings (Cat. No.02CH37373), Winnipeg, MB, Canada, 12–15 May 2002; Volume 3, pp. 1255–1260. [Google Scholar] [CrossRef] [Green Version]
  20. Agarwal, M.; Honig, M.L.; Ata, B. Adaptive training for correlated fading channels With feedback. IEEE Trans. Inf. Theory 2012, 58, 5398–5417. [Google Scholar] [CrossRef]
  21. Ezzine, R.; Wiese, M.; Deppe, C.; Boche, H. A rigorous proof of the capacity of MIMO Gauss-Markov Rayleigh fading channels. In Proceedings of the 2022 IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 2732–2737. [Google Scholar] [CrossRef]
  22. Kramer, G. Information networks with in-block memory. IEEE Trans. Inf. Theory 2014, 60, 2105–2120. [Google Scholar] [CrossRef] [Green Version]
  23. Pinsker, M.S. Calculation of the rate of information production by means of stationary random processes and the capacity of stationary channel. Dokl. Akad. Nauk USSR 1956, 111, 753–756. [Google Scholar]
  24. Ihara, S. On the capacity of channels with additive non-Gaussian noise. Inf. Control 1978, 37, 34–39. [Google Scholar] [CrossRef] [Green Version]
  25. Pinsker, M.; Prelov, V.; Verdú, S. Sensitivity of channel capacity. IEEE Trans. Inf. Theory 1995, 41, 1877–1888. [Google Scholar] [CrossRef] [Green Version]
  26. Shamai, S. On the capacity of a twisted-wire pair: Peak-power constraint. IEEE Trans. Commun. 1990, 38, 368–378. [Google Scholar] [CrossRef]
  27. Kalet, I.; Shamai, S. On the capacity of a twisted-wire pair: Gaussian model. IEEE Trans. Commun. 1990, 38, 379–383. [Google Scholar] [CrossRef] [Green Version]
  28. Diggavi, S.; Cover, T. The worst additive noise under a covariance constraint. IEEE Trans. Inf. Theory 2001, 47, 3072–3081. [Google Scholar] [CrossRef]
  29. Klein, T.; Gallager, R. Power control for the additive white Gaussian noise channel under channel estimation errors. In Proceedings of the 2001 IEEE International Symposium on Information Theory (IEEE Cat. No.01CH37252), Washington, DC, USA, 29–29 June 2001; p. 304. [Google Scholar] [CrossRef]
  30. Bhashyam, S.; Sabharwal, A.; Aazhang, B. Feedback gain in multiple antenna systems. IEEE Trans. Commun. 2002, 50, 785–798. [Google Scholar] [CrossRef]
  31. Hassibi, B.; Hochwald, B. How much training is needed in multiple-antenna wireless links? IEEE Trans. Inf. Theory 2003, 49, 951–963. [Google Scholar] [CrossRef] [Green Version]
  32. Yoo, T.; Goldsmith, A. Capacity and power allocation for fading MIMO channels with channel estimation error. IEEE Trans. Inf. Theory 2006, 52, 2203–2214. [Google Scholar] [CrossRef] [Green Version]
  33. Agarwal, M.; Honig, M.L. Wideband fading channel capacity With training and partial feedback. IEEE Trans. Inf. Theory 2010, 56, 4865–4873. [Google Scholar] [CrossRef]
  34. Soysal, A.; Ulukus, S. Joint channel estimation and resource allocation for MIMO systems-part I: Single-user analysis. IEEE Trans. Wireless Commun. 2010, 9, 624–631. [Google Scholar] [CrossRef] [Green Version]
  35. Marzetta, T.L.; Larsson, E.G.; Yang, H.; Ngo, H.Q. Fundamentals of Massive MIMO; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar] [CrossRef]
  36. Li, Y.; Tao, C.; Lee Swindlehurst, A.; Mezghani, A.; Liu, L. Downlink achievable rate analysis in massive MIMO systems with one-bit DACs. IEEE Commun. Lett. 2017, 21, 1669–1672. [Google Scholar] [CrossRef]
  37. Caire, G. On the ergodic rate lower bounds With applications to massive MIMO. IEEE Trans. Wireless Communi. 2018, 17, 3258–3268. [Google Scholar] [CrossRef]
  38. Noam, Y.; Zaidel, B.M. On the two-user MISO interference channel with single-user decoding: Impact of imperfect CSIT and channel dimension reduction. IEEE Trans. Signal Proc. 2019, 67, 2608–2623. [Google Scholar] [CrossRef]
39. Kaplan, G.; Shamai, S. Information rates and error exponents of compound channels with application to antipodal signaling in a fading environment. Arch. für Elektron. und Übertragungstechnik 1993, 47, 228–239.
40. Merhav, N.; Kaplan, G.; Lapidoth, A.; Shamai, S. On information rates for mismatched decoders. IEEE Trans. Inf. Theory 1994, 40, 1953–1967.
41. Scarlett, J.; Guillén i Fàbregas, A.; Somekh-Baruch, A.; Martinez, A. Information-theoretic foundations of mismatched decoding. Found. Trends Commun. Inf. Theory 2020, 17, 149–401.
42. Lapidoth, A. Nearest neighbor decoding for additive non-Gaussian noise channels. IEEE Trans. Inf. Theory 1996, 42, 1520–1529.
43. Lapidoth, A.; Shamai, S. Fading channels: How perfect need "perfect side information" be? IEEE Trans. Inf. Theory 2002, 48, 1118–1134.
44. Weingarten, H.; Steinberg, Y.; Shamai, S. Gaussian codes and weighted nearest neighbor decoding in fading multiple-antenna channels. IEEE Trans. Inf. Theory 2004, 50, 1665–1686.
45. Asyhari, A.T.; Guillén i Fàbregas, A. MIMO block-fading channels with mismatched CSI. IEEE Trans. Inf. Theory 2014, 60, 7166–7185.
46. Östman, J.; Lancho, A.; Durisi, G.; Sanguinetti, L. URLLC with massive MIMO: Analysis and design at finite blocklength. IEEE Trans. Wireless Commun. 2021, 20, 6387–6401.
47. Zhang, W. A general framework for transmission with transceiver distortion and some applications. IEEE Trans. Commun. 2012, 60, 384–399.
48. Zhang, W.; Wang, Y.; Shen, C.; Liang, N. A regression approach to certain information transmission problems. IEEE J. Sel. Areas Commun. 2019, 37, 2517–2531.
49. Pang, S.; Zhang, W. Generalized nearest neighbor decoding for MIMO channels with imperfect channel state information. In Proceedings of the IEEE Information Theory Workshop, Kanazawa, Japan, 17–21 October 2021; pp. 1–6.
50. Wang, Y.; Zhang, W. Generalized nearest neighbor decoding. IEEE Trans. Inf. Theory 2022, 68, 5852–5865.
51. Nedelcu, A.S.; Steiner, F.; Kramer, G. Low-resolution precoding for multi-antenna downlink channels and OFDM. Entropy 2022, 24, 504.
52. Essiambre, R.J.; Kramer, G.; Winzer, P.J.; Foschini, G.J.; Goebel, B. Capacity limits of optical fiber networks. IEEE/OSA J. Lightw. Technol. 2010, 28, 662–701.
53. Dar, R.; Shtaif, M.; Feder, M. New bounds on the capacity of the nonlinear fiber-optic channel. Opt. Lett. 2014, 39, 398–401.
54. Secondini, M.; Agrell, E.; Forestieri, E.; Marsella, D.; Camara, M.R. Nonlinearity mitigation in WDM systems: Models, strategies, and achievable rates. IEEE/OSA J. Lightw. Technol. 2019, 37, 2270–2283.
55. García-Gómez, F.J.; Kramer, G. Mismatched models to lower bound the capacity of optical fiber channels. IEEE/OSA J. Lightw. Technol. 2020, 38, 6779–6787.
56. García-Gómez, F.J.; Kramer, G. Mismatched models to lower bound the capacity of dual-polarization optical fiber channels. IEEE/OSA J. Lightw. Technol. 2021, 39, 3390–3399.
57. García-Gómez, F.J.; Kramer, G. Rate and power scaling of space-division multiplexing via nonlinear perturbation. J. Lightw. Technol. 2022, 40, 5077–5082.
58. Secondini, M.; Civelli, S.; Forestieri, E.; Khan, L.Z. New lower bounds on the capacity of optical fiber channels via optimized shaping and detection. J. Lightw. Technol. 2022, 40, 3197–3209.
59. Shtaif, M.; Antonelli, C.; Mecozzi, A.; Chen, X. Challenges in estimating the information capacity of the fiber-optic channel. Proc. IEEE 2022, 110, 1655–1678.
60. Mecozzi, A.; Shtaif, M. Information capacity of direct detection optical transmission systems. IEEE/OSA J. Lightw. Technol. 2018, 36, 689–694.
61. Plabst, D.; Prinz, T.; Wiegart, T.; Rahman, T.; Stojanović, N.; Calabrò, S.; Hanik, N.; Kramer, G. Achievable rates for short-reach fiber-optic channels with direct detection. IEEE/OSA J. Lightw. Technol. 2022, 40, 3602–3613.
62. Gallager, R.G. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968.
63. Divsalar, D. Performance of Mismatched Receivers on Bandlimited Channels. Ph.D. Thesis, University of California, Los Angeles, CA, USA, 1978.
64. Ozarow, L.; Wyner, A. On the capacity of the Gaussian channel with a finite number of input levels. IEEE Trans. Inf. Theory 1990, 36, 1426–1428.
65. Chagnon, M. Optical communications for short reach. IEEE/OSA J. Lightw. Technol. 2019, 37, 1779–1797.
66. Arnold, D.; Loeliger, H.A.; Vontobel, P.; Kavcic, A.; Zeng, W. Simulation-based computation of information rates for channels with memory. IEEE Trans. Inf. Theory 2006, 52, 3498–3508.
67. Abou-Faycal, I.; Lapidoth, A. On the capacity of reduced-complexity receivers for intersymbol interference channels. In Proceedings of the Convention of the Electrical and Electronic Engineers in Israel, Tel-Aviv, Israel, 11–12 April 2000; pp. 263–266.
68. Rusek, F.; Prlja, A. Optimal channel shortening for MIMO and ISI channels. IEEE Trans. Wireless Commun. 2012, 11, 810–818.
69. Hu, S.; Rusek, F. On the design of channel shortening demodulators for iterative receivers in linear vector channels. IEEE Access 2018, 6, 48339–48359.
70. Mezghani, A.; Nossek, J.A. Analysis of 1-bit output noncoherent fading channels in the low SNR regime. In Proceedings of the IEEE International Symposium on Information Theory, Seoul, Republic of Korea, 28 June–3 July 2009; pp. 1080–1084.
71. Papoulis, A. Probability, Random Variables, and Stochastic Processes, 2nd ed.; McGraw-Hill: New York, NY, USA, 1984.
72. Kramer, G. Capacity results for the discrete memoryless network. IEEE Trans. Inf. Theory 2003, 49, 4–21.
73. Verdú, S. Spectral efficiency in the wideband regime. IEEE Trans. Inf. Theory 2002, 48, 1319–1343.
74. Kramer, G.; Ashikhmin, A.; van Wijngaarden, A.; Wei, X. Spectral efficiency of coded phase-shift keying for fiber-optic communication. IEEE/OSA J. Lightw. Technol. 2003, 21, 2438–2445.
75. Hui, J.Y.N. Fundamental Issues of Multiple Accessing. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1983.
76. Scarlett, J.; Martinez, A.; Guillén i Fàbregas, A. Mismatched decoding: Error exponents, second-order rates and saddlepoint approximations. IEEE Trans. Inf. Theory 2014, 60, 2647–2666.
77. Asadi Kangarshahi, E.; Guillén i Fàbregas, A. A single-letter upper bound to the mismatch capacity. IEEE Trans. Inf. Theory 2021, 67, 2013–2033.
78. Lau, V.K.N.; Liu, Y.; Chen, T.A. Capacity of memoryless channels and block-fading channels with designable cardinality-constrained channel state feedback. IEEE Trans. Inf. Theory 2004, 50, 2038–2049.
79. Shannon, C.E. Geometrische Deutung einiger Ergebnisse bei der Berechnung der Kanalkapazität. Nachrichtentechnische Z. 1957, 10, 1–4; English version in Claude Elwood Shannon: Collected Papers; Sloane, N.J.A., Wyner, A.D., Eds.; IEEE Press: Piscataway, NJ, USA, 1993; pp. 259–264.
80. Farmanbar, H.; Khandani, A.K. Precoding for the AWGN channel with discrete interference. IEEE Trans. Inf. Theory 2009, 55, 4019–4032.
81. Wachsmann, U.; Fischer, R.; Huber, J. Multilevel codes: Theoretical concepts and practical design rules. IEEE Trans. Inf. Theory 1999, 45, 1361–1391.
82. Stolte, N. Rekursive Codes mit der Plotkin-Konstruktion und ihre Decodierung. Ph.D. Thesis, Technische Universität Darmstadt, Darmstadt, Germany, 2002.
83. Arikan, E. Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels. IEEE Trans. Inf. Theory 2009, 55, 3051–3073.
84. Seidl, M.; Schenk, A.; Stierstorfer, C.; Huber, J.B. Polar-coded modulation. IEEE Trans. Commun. 2013, 61, 4108–4119.
85. Honda, J.; Yamamoto, H. Polar coding without alphabet extension for asymmetric models. IEEE Trans. Inf. Theory 2013, 59, 7829–7838.
86. Runge, C.; Wiegart, T.; Lentner, D.; Prinz, T. Multilevel binary polar-coded modulation achieving the capacity of asymmetric channels. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Espoo, Finland, 26 June–1 July 2022; pp. 2595–2600.
87. Parthasarathy, K.R. Extreme points of the convex set of joint probability distributions with fixed marginals. Proc. Math. Sci. 2007, 117, 505–515.
88. Nadkarni, M.G.; Navada, K.G. On the number of extreme measures with fixed marginals. arXiv 2008, arXiv:0806.1214.
89. Farmanbar, H.; Gharan, S.O.; Khandani, A.K. Channel code design with causal side information at the encoder. Eur. Trans. Telecommun. 2010, 21, 337–351.
90. Birkhoff, G. Three observations on linear algebra. Univ. Nac. Tucumán Revista A 1946, 5, 147–151.
91. Wolfowitz, J. Coding Theorems of Information Theory, 2nd ed.; Springer: Berlin, Germany, 1964.
92. Kim, T.T.; Skoglund, M. On the expected rate of slowly fading channels with quantized side information. IEEE Trans. Commun. 2007, 55, 820–829.
93. Rosenzweig, A.; Steinberg, Y.; Shamai, S. On channels with partial channel state information at the transmitter. IEEE Trans. Inf. Theory 2005, 51, 1817–1830.
94. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423, 623–656; reprinted in Claude Elwood Shannon: Collected Papers; Sloane, N.J.A., Wyner, A.D., Eds.; IEEE Press: Piscataway, NJ, USA, 1993; pp. 5–83.
95. Goldsmith, A.J.; Varaiya, P.P. Capacity of fading channels with channel side information. IEEE Trans. Inf. Theory 1997, 43, 1986–1992.
96. Abou-Faycal, I.; Trott, M.; Shamai, S. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inf. Theory 2001, 47, 1290–1301.
97. Lapidoth, A.; Moser, S.M. Capacity bounds via duality with applications to multiple-antenna systems on flat fading channels. IEEE Trans. Inf. Theory 2003, 49, 2426–2467.
98. Taricco, G.; Elia, M. Capacity of fading channel with no side information. Electron. Lett. 1997, 33, 1368–1370.
99. Marzetta, T.L.; Hochwald, B.M. Capacity of a mobile multiple-antenna communication link in Rayleigh flat fading. IEEE Trans. Inf. Theory 1999, 45, 139–157.
100. Zheng, L.; Tse, D. Communication on the Grassmann manifold: A geometric approach to the noncoherent multiple-antenna channel. IEEE Trans. Inf. Theory 2002, 48, 359–383.
101. Gursoy, M.; Poor, H.; Verdú, S. The noncoherent Rician fading channel—Part I: Structure of the capacity-achieving input. IEEE Trans. Wireless Commun. 2005, 4, 2193–2206.
102. Chowdhury, M.; Goldsmith, A. Capacity of block Rayleigh fading channels without CSI. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 1884–1888.
103. Goldsmith, A.J.; Médard, M. Capacity of time-varying channels with causal channel side information. IEEE Trans. Inf. Theory 2007, 53, 881–899.
104. Jelinek, F. Indecomposable channels with side information at the transmitter. Inf. Control 1965, 8, 36–55.
105. Das, A.; Narayan, P. Capacities of time-varying multiple-access channels with side information. IEEE Trans. Inf. Theory 2002, 48, 4–25.
106. Van Veen, B.; Buckley, K. Beamforming: A versatile approach to spatial filtering. IEEE ASSP Mag. 1988, 5, 4–24.
107. Liaskos, C.; Nie, S.; Tsioliaridou, A.; Pitsillides, A.; Ioannidis, S.; Akyildiz, I. A new wireless communication paradigm through software-controlled metasurfaces. IEEE Commun. Mag. 2018, 56, 162–169.
108. Renzo, M.; Debbah, M.; Phan-Huy, D.T.; Zappone, A.; Alouini, M.S.; Yuen, C.; Sciancalepore, V.; Alexandropoulos, G.C.; Hoydis, J.; Gacanin, H.; et al. Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 129.
109. Thangaraj, A.; Kramer, G.; Böcherer, G. Capacity bounds for discrete-time, amplitude-constrained, additive white Gaussian noise channels. IEEE Trans. Inf. Theory 2017, 63, 4172–4182.
110. Csiszár, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Channels; Akadémiai Kiadó: Budapest, Hungary, 1981.
111. Cody, W.J.; Thacher, H.C., Jr. Rational Chebyshev approximations for the exponential integral E1(x). Math. Comp. 1968, 22, 641–649.
112. Nantomah, K. On some bounds for the exponential integral function. J. Nepal Math. Soc. 2021, 4, 28–34.
113. Sofotasios, P.C.; Muhaidat, S.; Karagiannidis, G.K.; Sharif, B.S. Solutions to integrals involving the Marcum Q-function and applications. IEEE Signal Process. Lett. 2015, 22, 1752–1756.
114. Neeser, F.; Massey, J. Proper complex random processes with applications to information theory. IEEE Trans. Inf. Theory 1993, 39, 1293–1302.
Figure 1. Rates for on-off fading with S_R = 0. The curve "Full CSIR" refers to S_R = H and is a capacity upper bound. Flash signaling uses p = 0.05; the GMI for the K = 2 partition uses the threshold t_R = P^0.4 + 3.
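For orientation, the "Full CSIR" baseline can be evaluated in closed form as the ergodic capacity E[log2(1 + |H|^2 P)]. The sketch below is a minimal illustration, not the paper's code; it assumes, for the example, the on-off fading law |H|^2 ∈ {0, 2} with equal probabilities (so that E[|H|^2] = 1).

```python
import numpy as np

# Minimal sketch (assumption: on-off fading with |H|^2 in {0, 2},
# each with probability 1/2, so E[|H|^2] = 1). The full-CSIR curve
# is the ergodic capacity E[log2(1 + |H|^2 P)] in bits per channel use.

def full_csir_rate(P):
    # The |H|^2 = 0 state contributes nothing; |H|^2 = 2 occurs half the time.
    return 0.5 * np.log2(1.0 + 2.0 * P)

for snr_db in range(-10, 31, 10):
    P = 10.0 ** (snr_db / 10.0)
    print(f"SNR = {snr_db:3d} dB: rate = {full_csir_rate(P):.3f} bits/use")
```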
Figure 2. FDG for n = 2 uses of a channel with CSIT. Open nodes represent statistically independent random variables, and filled nodes represent random variables that are functions of their parent variables. Dashed lines represent the CSIT influence on X^n.
Figure 3. FDG for n = 2 channel uses with different CSIT and CSIR. The hidden channel state S_Hi permits dependent S_Ri and S_Ti.
Figure 4. Rates for on-off fading with full CSIR and partial CSIT with noise parameter ϵ = 0.1. The curve "Best CSIR" shows the capacity with S_R = H P(S_T). The curves for I(A; Y|H), the reverse model GMI (rGMI), and the forward model GMI (GMI, K = 1) are for S_R = H with CSCG inputs X(s_T). The I(A; Y|H) and rGMI curves are indistinguishable in the inset.
Figure 5. Rates for on-off fading with S_T = H and S_R = 0. The GMI for the K = 2 partition uses the threshold t_R = P + 3.
Figure 6. Rates for on-off fading with partial CSIR and CSIT@R. The curve "Best CSIR" shows the capacity with S_R = H P(S_T). The mutual information I(X; Y|S_R) and the GMI are for Pr[S_R ≠ H] = 0.1 and CSCG inputs X(s_T). The GMI for the K = 2 partition uses t_R = P^0.4. The curve labeled "c-waterfill" shows the conventional waterfilling rates.
Figure 7. Capacities for Rayleigh fading with full CSIR, a one-bit quantizer with threshold Δ, and CSIT@R.
Figure 8. Capacities for Rayleigh fading with S_R = P(S_T) H, a one-bit quantizer with threshold Δ = 1, and various CSIT error probabilities ϵ.
Figure 9. Rates for Rayleigh fading with S_R = H and S_R = H P(S_T), a one-bit quantizer with threshold Δ = 1, and various ϵ. The curves labeled "best CSIR" show the capacities with S_R = H P(S_T). The curves labeled "GMI" show the rates of Equation (285) for the optimal powers P(0) and P(1).
Figure 10. Rates for Rayleigh fading with S_T = H and S_R = 0. The threshold t was optimized for the K = 1 curves, while t = P^0.4 for the I(A; Y) and K = 2 curves. The K = 2 GMI uses t_R = P^0.4.
Figure 12. Rates for Rayleigh fading with full CSIT and S_R = 1(G ≥ t).
Figure 13. Rates for Rayleigh fading with partial CSIR and CSIT@R. The curves labeled "q-waterfill" and "c-waterfill" are the quadratic and conventional waterfilling rates, respectively.
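As context for the "c-waterfill" curves, the sketch below computes the conventional waterfilling rate over sampled fading gains: the power allocation is P(g) = max(0, μ − 1/g), with the water level μ set by bisection so that the average power equals P. This is a generic textbook construction offered for orientation, not the paper's implementation, and the Rayleigh gain law G ~ Exp(1) is an assumption of the example.

```python
import numpy as np

# Generic conventional waterfilling sketch (assumption: all sampled
# gains are strictly positive, as for Rayleigh fading G ~ Exp(1)).
def waterfill_rate(gains, P, iters=60):
    g = np.asarray(gains, dtype=float)
    lo, hi = 0.0, P + np.max(1.0 / g)   # the water level mu lies in [lo, hi]
    for _ in range(iters):              # bisect on E[max(0, mu - 1/G)] = P
        mu = 0.5 * (lo + hi)
        if np.mean(np.maximum(0.0, mu - 1.0 / g)) < P:
            lo = mu
        else:
            hi = mu
    p = np.maximum(0.0, mu - 1.0 / g)   # per-state power allocation
    return np.mean(np.log2(1.0 + g * p))

rng = np.random.default_rng(0)
G = rng.exponential(1.0, 200_000)       # Rayleigh fading: |H|^2 ~ Exp(1)
print(waterfill_rate(G, P=10.0))        # ergodic rate in bits per channel use
```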
Figure 14. FDG for a block fading model with n = 2 blocks of length L = 2 and in-block feedback. Across-block dependence via past S_T^i is not shown.
Figure 15. Capacities for Rayleigh block fading with L = 1, 2, 3 and a CSIT delay of D = L − 1. The CSIT at symbol L is S_TL = q_u(G).
Figure 16. Rates for Rayleigh block fading with block lengths L = 10, 20, 100. The CSIT at symbol 2 is S_T2 = q_u(|Y_1|).
Table 1. Models Studied in Section 6 (General Fading), Section 7 (On-Off Fading) and Section 8 (Rayleigh Fading).

                      CSIR: Full     CSIR: Partial/No
  CSIT: Full          Section 6.3    Section 6.5
  CSIT: @R            Section 6.3    Section 6.6
  CSIT: Partial/No    Section 6.4    Section 6.2
Table 2. Power Control Policies and Minimal SNRs.

  Policy               CSIR: None (S_R = 0)    CSIR: S_R = 1(G ≥ t)
  TCP                  Equation (221)          Equation (226)
  TMF                  Equation (222)          Equation (227)
  TCI                  Equation (223)          Equation (228)
  GMI-Optimal          See Theorem 2
  TMMSE                See Remark 64
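Table 2 references equations that are not reproduced in this back matter, so the following sketch is a hedged illustration only: it implements truncated channel inversion (TCI) in its standard form, transmitting only when the gain G exceeds a cutoff t and inverting the channel there, with the inversion scaled to meet the average power constraint. The paper's Equations (223) and (228) may differ in detail, and the Rayleigh gain law is again an assumption.

```python
import numpy as np

# Standard truncated channel inversion (TCI) sketch; an assumption,
# since Equations (223)/(228) of the paper are not reproduced here.
def tci_rate(G, P, t):
    G = np.asarray(G, dtype=float)
    on = G >= t                          # transmit only above the cutoff
    if not np.any(on):
        return 0.0
    # Scale the inversion so the *average* transmit power equals P:
    # the per-state power is snr/G on the "on" states and 0 otherwise.
    snr = P / np.mean(np.where(on, 1.0 / G, 0.0))
    return np.mean(on) * np.log2(1.0 + snr)

rng = np.random.default_rng(1)
G = rng.exponential(1.0, 200_000)        # Rayleigh fading gains (assumed)
rates = [tci_rate(G, P=10.0, t=t) for t in np.linspace(0.01, 2.0, 100)]
print(max(rates))                        # best cutoff on this grid, bits/use
```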