Article

Tight Bounds on the Rényi Entropy via Majorization with Applications to Guessing and Compression

Department of Electrical Engineering, Technion—Israel Institute of Technology, Haifa 3200003, Israel
Entropy 2018, 20(12), 896; https://doi.org/10.3390/e20120896
Submission received: 16 October 2018 / Revised: 5 November 2018 / Accepted: 20 November 2018 / Published: 22 November 2018

Abstract

This paper provides tight bounds on the Rényi entropy of a function of a discrete random variable with a finite number of possible values, where the considered function is not one to one. To that end, a tight lower bound on the Rényi entropy of a discrete random variable with a finite support is derived as a function of the size of the support, and the ratio of the maximal to minimal probability masses. This work was inspired by the recently published paper by Cicalese et al., which is focused on the Shannon entropy, and it strengthens and generalizes the results of that paper to Rényi entropies of arbitrary positive orders. In view of these generalized bounds and the works by Arikan and Campbell, non-asymptotic bounds are derived for guessing moments and lossless data compression of discrete memoryless sources.

1. Introduction

Majorization theory is a simple and productive concept in the theory of inequalities, which also unifies a variety of familiar bounds [1,2]. These mathematical tools find various applications in diverse fields (see, e.g., [3]) such as economics [2,4,5], combinatorial analysis [2,6], geometric inequalities [2], matrix theory [2,6,7,8], Shannon theory [5,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25], and wireless communications [26,27,28,29,30,31,32,33].
This work, which relies on majorization theory, has been greatly inspired by the recent insightful paper by Cicalese et al. [12] (the research in the present paper was initiated while the author handled [12] as an associate editor). The work in [12] provides tight bounds on the Shannon entropy of a function of a discrete random variable with a finite number of possible values, where the considered function is not one-to-one. For that purpose, and while being of interest in its own right (see [12], Section 6), a tight lower bound on the Shannon entropy of a discrete random variable with a finite support was derived in [12] as a function of the size of the support and the ratio of the maximal to minimal probability masses. The present paper extends the bounds in [12] to Rényi entropies of arbitrary positive orders (note that the Shannon entropy is equal to the Rényi entropy of order 1), and studies the information-theoretic applications of these (non-trivial) generalizations in the context of non-asymptotic analysis of guessing moments and lossless data compression.
The motivation for this work is rooted in the diverse information-theoretic applications of Rényi measures [34]. These include (but are not limited to) asymptotically tight bounds on guessing moments [35], information-theoretic applications such as guessing subject to distortion [36], joint source-channel coding and guessing with application to sequential decoding [37], guessing with a prior access to a malicious oracle [38], guessing while allowing the guesser to give up and declare an error [39], guessing in secrecy problems [40,41], guessing with limited memory [42], and guessing under source uncertainty [43]; encoding tasks [44,45]; Bayesian hypothesis testing [9,22,23], and composite hypothesis testing [46,47]; Rényi generalizations of the rejection sampling problem in [48], motivated by the communication complexity in distributed channel simulation, where these generalizations distinguish between causal and noncausal sampler scenarios [49]; Wyner’s common information in distributed source simulation under Rényi divergence measures [50]; various other source coding theorems [23,39,51,52,53,54,55,56,57,58], channel coding theorems [23,58,59,60,61,62,63,64], including coding theorems in quantum information theory [65,66,67].
The presentation in this paper is structured as follows: Section 2 provides notation and essential preliminaries for the analysis in this paper. Section 3 and Section 4 strengthen and generalize, in a non-trivial way, the bounds on the Shannon entropy in [12] to Rényi entropies of arbitrary positive orders (see Theorems 1 and 2). Section 5 relies on the generalized bound from Section 4 and the work by Arikan [35] to derive non-asymptotic bounds for guessing moments (see Theorem 3); Section 5 also relies on the generalized bound in Section 4 and the source coding theorem by Campbell [51] (see Theorem 4) for the derivation of non-asymptotic bounds for lossless compression of discrete memoryless sources (see Theorem 5).

2. Notation and Preliminaries

Let
  • P be a probability mass function defined on a finite set X;
  • p_max and p_min be, respectively, the maximal and minimal positive masses of P;
  • G_P(k) be the sum of the k largest masses of P for k ∈ {1, …, |X|} (note that G_P(1) = p_max and G_P(|X|) = 1);
  • P_n, for an integer n ≥ 2, be the set of all probability mass functions defined on X with |X| = n; without any loss of generality, let X = {1, …, n};
  • P_n(ρ), for ρ ≥ 1 and an integer n ≥ 2, be the subset of all probability measures P ∈ P_n such that
    p_max / p_min ≤ ρ.
Definition 1 (Majorization).
Consider discrete probability mass functions P and Q defined on the same (finite or countably infinite) set X. It is said that P is majorized by Q (or Q majorizes P), and it is denoted by P ≺ Q, if G_P(k) ≤ G_Q(k) for all k ∈ {1, …, |X| − 1} (recall that G_P(|X|) = G_Q(|X|) = 1). If P and Q are defined on finite sets of different cardinalities, then the probability mass function which is defined over the smaller set is first padded by zeros for making the cardinalities of these sets be equal.
By Definition 1, a unit mass majorizes any other distribution; on the other hand, the equiprobable distribution on a finite set is majorized by any other distribution defined on the same set.
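To make the majorization relation concrete, here is a minimal Python sketch (ours, not part of the paper; the function names are illustrative) that checks P ≺ Q by comparing the partial sums G_P(k) and G_Q(k) of Definition 1:

```python
import numpy as np

def G(p, k):
    """G_P(k): the sum of the k largest masses of the distribution p."""
    return np.sort(np.asarray(p, dtype=float))[::-1][:k].sum()

def majorized_by(p, q, tol=1e-12):
    """True if p is majorized by q; shorter vectors are zero-padded as in Definition 1."""
    n = max(len(p), len(q))
    p = np.pad(np.asarray(p, dtype=float), (0, n - len(p)))
    q = np.pad(np.asarray(q, dtype=float), (0, n - len(q)))
    return all(G(p, k) <= G(q, k) + tol for k in range(1, n))

# The equiprobable distribution is majorized by any distribution on the same set,
# and a unit mass majorizes any other distribution.
print(majorized_by([0.25, 0.25, 0.25, 0.25], [0.7, 0.2, 0.1, 0.0]))  # True
print(majorized_by([0.7, 0.2, 0.1], [1.0, 0.0, 0.0]))                # True
```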
Definition 2 (Schur-convexity/concavity).
A function f : P_n → R is said to be Schur-convex if, for every P, Q ∈ P_n such that P ≺ Q, we have f(P) ≤ f(Q). Likewise, f is said to be Schur-concave if −f is Schur-convex, i.e., P, Q ∈ P_n and P ≺ Q imply that f(P) ≥ f(Q).
Definition 3 
(Rényi entropy [34]). Let X be a random variable taking values on a finite or countably infinite set X, and let P_X be its probability mass function. The Rényi entropy of order α ∈ (0, 1) ∪ (1, ∞) is given by
H_α(X) = H_α(P_X) = (1/(1 − α)) log Σ_{x ∈ X} P_X^α(x).
Unless explicitly stated, the logarithm base can be chosen by the reader, with exp indicating the inverse function of log.
By its continuous extension,
H_0(X) = log |{x ∈ X : P_X(x) > 0}|,
H_1(X) = H(X),
H_∞(X) = log (1/p_max),
where H(X) is the (Shannon) entropy of X.
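For reference, a small Python helper (ours, not from the paper) that evaluates the Rényi entropy of a finite distribution, including the continuous extensions to the orders α = 0, 1 and ∞ listed above:

```python
import numpy as np

def renyi_entropy(p, alpha, base=2.0):
    """Rényi entropy H_alpha(P) of a finite distribution p, in units of the chosen log base.
    The orders alpha = 0, 1 and infinity are handled via their continuous extensions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 0:
        return np.log(len(p)) / np.log(base)           # log of the support size
    if alpha == 1:
        return -(p * np.log(p)).sum() / np.log(base)   # Shannon entropy
    if np.isinf(alpha):
        return -np.log(p.max()) / np.log(base)         # log(1 / p_max)
    return np.log((p ** alpha).sum()) / ((1.0 - alpha) * np.log(base))

p = [0.5, 0.25, 0.125, 0.125]
print([round(renyi_entropy(p, a), 4) for a in (0, 0.5, 1, 2, np.inf)])  # non-increasing in alpha
```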
Proposition 1
(Schur-concavity of the Rényi entropy (Appendix F.3.a (p. 562) of [2])). The Rényi entropy of an arbitrary order α > 0 is Schur-concave; in particular, for α = 1 , the Shannon entropy is Schur-concave.
Remark 1.
[17] (Theorem 2) strengthens Proposition 1, though it is not needed for our analysis.
Definition 4 
(Rényi divergence [34]). Let P and Q be probability mass functions defined on a finite or countably infinite set X. The Rényi divergence of order α ∈ [0, ∞] is defined as follows:
  • If α ∈ (0, 1) ∪ (1, ∞), then
    D_α(P‖Q) = (1/(α − 1)) log Σ_{x ∈ X} P^α(x) Q^{1−α}(x).
  • By the continuous extension of D_α(P‖Q),
    D_0(P‖Q) = max_{A : P(A) = 1} log (1/Q(A)),
    D_1(P‖Q) = D(P‖Q),
    D_∞(P‖Q) = log sup_{x ∈ X} (P(x)/Q(x)),
    where D(P‖Q) in the right side of (8) is the relative entropy (a.k.a. the Kullback–Leibler divergence).
Throughout this paper, for a ∈ R, ⌈a⌉ denotes the ceiling of a (i.e., the smallest integer not smaller than the real number a), and ⌊a⌋ denotes the floor of a (i.e., the greatest integer not greater than a).

3. A Tight Lower Bound on the Rényi Entropy

We provide in this section a tight lower bound on the Rényi entropy of an arbitrary order α > 0 when the probability mass function of the discrete random variable is defined on a finite set of cardinality n, and the ratio of the maximal to minimal probability masses is upper bounded by an arbitrary fixed value ρ ∈ [1, ∞). In other words, we derive the largest possible gap between the order-α Rényi entropies of an equiprobable distribution and a non-equiprobable distribution (defined on a finite set of the same cardinality) with a given value for the ratio of the maximal to minimal probability masses. The basic tool used for the development of our result in this section is majorization theory. Our result strengthens the result in [12] (Theorem 2) for the Shannon entropy, and it further provides a generalization for the Rényi entropy of an arbitrary order α > 0 (recall that the Shannon entropy is equal to the Rényi entropy of order α = 1, see (4)). Furthermore, the approach for proving the main result in this section differs significantly from the proof in [12] for the Shannon entropy. The main result in this section is a key result for all that follows in this paper.
The following lemma is a restatement of [12] (Lemma 6).
Lemma 1.
Let P ∈ P_n(ρ) with ρ ≥ 1 and an integer n ≥ 2, and assume without any loss of generality that the probability mass function P is defined on the set X = {1, …, n}. Let Q ∈ P_n be defined on X as follows:
Q(j) = ρ p_min,                        j ∈ {1, …, i},
Q(j) = 1 − (n + iρ − i − 1) p_min,     j = i + 1,
Q(j) = p_min,                          j ∈ {i + 2, …, n},
where
i := ⌊(1 − n p_min) / ((ρ − 1) p_min)⌋.
Then,
(1) 
Q ∈ P_n(ρ), and Q(1) ≥ Q(2) ≥ … ≥ Q(n) > 0;
(2) 
P ≺ Q.
Proof. 
See [12] (p. 2236) (top of the second column). ☐
Lemma 2.
Let ρ > 1, α > 0, and let n ≥ 2 be an integer. For
β ∈ [1/(1 + (n − 1)ρ), 1/n] =: Γ_ρ(n),
let Q_β ∈ P_n(ρ) be defined on X = {1, …, n} as follows:
Q_β(j) = ρβ,                            j ∈ {1, …, i_β},
Q_β(j) = 1 − (n + i_β ρ − i_β − 1)β,    j = i_β + 1,
Q_β(j) = β,                             j ∈ {i_β + 2, …, n},
where
i_β := ⌊(1 − nβ) / ((ρ − 1)β)⌋.
Then, for every α > 0,
min_{P ∈ P_n(ρ)} H_α(P) = min_{β ∈ Γ_ρ(n)} H_α(Q_β).
Proof. 
See Appendix A. ☐
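Lemma 2 turns the n-dimensional minimization of H_α over P_n(ρ) into a one-dimensional search over β ∈ Γ_ρ(n). The following Python sketch (ours; a plain grid search with an arbitrary resolution, not an exact solver) approximates min_{β ∈ Γ_ρ(n)} H_α(Q_β), and hence the gap log n − min_{P ∈ P_n(ρ)} H_α(P) that is studied below:

```python
import numpy as np

def renyi(p, alpha):
    """Order-alpha Rényi entropy in bits (alpha = 1 treated as the Shannon entropy)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -(p * np.log2(p)).sum()
    return np.log2((p ** alpha).sum()) / (1.0 - alpha)

def Q_beta(n, rho, beta):
    """The distribution Q_beta of (13): i_beta masses rho*beta, one middle mass, n - i_beta - 1 masses beta."""
    i = int(np.floor((1 - n * beta) / ((rho - 1) * beta)))
    i = min(n - 1, max(0, i))                      # guard against floating-point edge cases
    middle = 1 - (n + i * rho - i - 1) * beta
    return np.array([rho * beta] * i + [middle] + [beta] * (n - i - 1))

def entropy_gap(alpha, n, rho, grid=20000):
    """Approximate log2(n) - min_{P in P_n(rho)} H_alpha(P) via the scan over beta in Gamma_rho(n)."""
    betas = np.linspace(1.0 / (1.0 + (n - 1) * rho), 1.0 / n, grid)
    return np.log2(n) - min(renyi(Q_beta(n, rho, b), alpha) for b in betas)

print(round(entropy_gap(alpha=1.0, n=16, rho=2.0), 4))   # Shannon-entropy gap for rho = 2, n = 16
```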
Lemma 3.
For ρ > 1 and α > 0, let
c_α^(n)(ρ) := log n − min_{P ∈ P_n(ρ)} H_α(P),   n = 2, 3, …
with c_α^(1)(ρ) := 0. Then, for every n ∈ N,
0 ≤ c_α^(n)(ρ) ≤ log ρ,
c_α^(n)(ρ) ≤ c_α^(2n)(ρ),
and c_α^(n)(ρ) is monotonically increasing in α ∈ [0, ∞].
Proof. 
See Appendix B. ☐
Lemma 4.
For α > 0 and ρ > 1, the limit
c_α^(∞)(ρ) := lim_{n → ∞} c_α^(n)(ρ)
exists, having the following properties:
(a) 
If α ∈ (0, 1) ∪ (1, ∞), then
c_α^(∞)(ρ) = (1/(α − 1)) log [1 + (1 + α(ρ − 1) − ρ^α) / ((1 − α)(ρ − 1))] − (α/(α − 1)) log [1 + (1 + α(ρ − 1) − ρ^α) / ((1 − α)(ρ^α − 1))],
and
lim_{α → ∞} c_α^(∞)(ρ) = log ρ.
(b) 
If α = 1 , then
c_1^(∞)(ρ) = lim_{α → 1} c_α^(∞)(ρ) = (ρ log ρ)/(ρ − 1) − log (e ρ ln ρ / (ρ − 1)).
(c) 
For all α > 0 ,
lim_{ρ → 1} c_α^(∞)(ρ) = 0.
  • For every n ∈ N, α > 0 and ρ ≥ 1,
    0 ≤ c_α^(n)(ρ) ≤ c_α^(2n)(ρ) ≤ c_α^(∞)(ρ) ≤ log ρ.
Proof. 
See Appendix C. ☐
In view of Lemmata 1–4, we obtain the following main result in this section:
Theorem 1.
Let α > 0, ρ > 1, n ≥ 2, and let c_α^(n)(ρ) in (16) designate the maximal gap between the order-α Rényi entropies of equiprobable and arbitrary distributions in P_n(ρ). Then,
(a) 
The non-negative sequence { c α ( n ) ( ρ ) } n = 2 can be calculated by the real-valued single-parameter optimization in the right side of (15).
(b) 
The asymptotic limit as n → ∞, denoted by c_α^(∞)(ρ), admits the closed-form expressions in (20) and (22), and it satisfies the properties in (21), (23) and (24).
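A small Python sketch (ours) of the closed-form asymptotic gap in Theorem 1-(b), i.e., Equations (20) and (22) evaluated in bits, together with sanity checks of the limit (21) and of the α = 2 case discussed in Remark 2 below:

```python
import numpy as np

def c_inf(alpha, rho):
    """Closed-form c_alpha^(infinity)(rho) of Eqs. (20) and (22), in bits."""
    if rho <= 1:
        return 0.0
    if np.isclose(alpha, 1.0):
        # Eq. (22): rho*log(rho)/(rho - 1) - log(e * rho * ln(rho)/(rho - 1))
        return rho * np.log2(rho) / (rho - 1) - np.log2(np.e * rho * np.log(rho) / (rho - 1))
    t = 1 + alpha * (rho - 1) - rho ** alpha
    a1 = np.log2(1 + t / ((1 - alpha) * (rho - 1)))
    a2 = np.log2(1 + t / ((1 - alpha) * (rho ** alpha - 1)))
    return (a1 - alpha * a2) / (alpha - 1)

print(round(c_inf(1.0, 2.0), 5))    # ~0.08607 bits, the constant denoted v(1) in Section 4
print(round(c_inf(2.0, 3.0), 5))    # equals log2((1 + rho)^2 / (4*rho)) for rho = 3, cf. Remark 2
print(round(c_inf(60.0, 2.0), 3))   # approaches log2(rho) = 1 as alpha grows, cf. (21)
```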
Remark 2.
Setting α = 2 in Theorem 1 gives that, for all P ∈ P_n(ρ) (with ρ > 1 and an integer n ≥ 2),
H_2(P) ≥ log n − c_2^(n)(ρ)
≥ log n − c_2^(∞)(ρ)
= log (4ρn / (1 + ρ)^2),
where (25)–(27) hold, respectively, due to (16), (24) and (20). This strengthens the result in [68] (Proposition 2), which gives the same lower bound as in the right side of (27) for H_∞(P) rather than for H_2(P) (recall that H_∞(P) ≤ H_2(P)).
For a numerical illustration of Theorem 1, Figure 1 provides a plot of c_α^(∞)(ρ) in (20) and (22) as a function of ρ ≥ 1, confirming numerically the properties in (21) and (23). Furthermore, Figure 2 provides plots of c_α^(n)(ρ) in (16) as a function of α > 0, for ρ = 2 (left plot) and ρ = 256 (right plot), with several values of n ≥ 2; the calculation of the curves in these plots relies on (15), (20) and (22), and they illustrate the monotonicity and boundedness properties in (24).
Remark 3.
Theorem 1 strengthens the result in [12] (Theorem 2) for the Shannon entropy (i.e., for α = 1 ), in addition to its generalization to Rényi entropies of arbitrary orders α > 0 . This is because our lower bound on the Shannon entropy is given by
H(P) ≥ log n − c_1^(n)(ρ),   P ∈ P_n(ρ),
whereas the looser bound in [12] is given by (see [12] ((7)) and (22) here)
H(P) ≥ log n − c_1^(∞)(ρ),   P ∈ P_n(ρ),
and we recall that 0 ≤ c_1^(n)(ρ) ≤ c_1^(∞)(ρ) (see (24)). Figure 3 shows the improvement of the new lower bound (28) over (29) by comparing c_1^(∞)(ρ) versus c_1^(n)(ρ) for ρ ∈ [1, 10^5] and several values of n. Figure 3 indicates that the improvement of the lower bound (28) on the Shannon entropy over the bound in (29) is very marginal if ρ ≤ 30 (even for small values of n), whereas it is significant for large values of ρ; as n increases, the value of ρ also needs to increase for an improvement of (28) over (29) to be visible (see Figure 3).
An improvement of the bound in (28) over (29) leads to a tightening of the upper bound in [12] (Theorem 4) on the compression rate of Tunstall codes for discrete memoryless sources, which further tightens the bound by Jelinek and Schneider in [69] (Equation (9)). More explicitly, in view of [12] (Section 6), an improved upper bound on the compression rate of these variable-to-fixed lossless source codes is obtained by combining [12] (Equations (36) and (38)) with a tightened lower bound on the entropy H(W) of the leaves of the tree graph for Tunstall codes. From (28), the latter lower bound is given by H(W) ≥ log₂ n − c_1^(n)(ρ), where c_1^(n)(ρ) is expressed in bits, ρ := 1/p_min is the reciprocal of the minimal positive probability of the source symbols, and n is the number of codewords (so all codewords are of length ⌈log₂ n⌉ bits). This yields a reduction in the upper bound on the non-asymptotic compression rate R of Tunstall codes from ⌈log₂ n⌉ H(X) / (log₂ n − c_1^(∞)(ρ)) (see [12] (Equation (40)) and (22)) to ⌈log₂ n⌉ H(X) / (log₂ n − c_1^(n)(ρ)) bits per source symbol, where H(X) denotes the source entropy (both bounds converging, in view of (17), to H(X) as n → ∞).
Remark 4.
Equality (15) with the minimizing probability mass function of the form (13) holds, in general, by replacing the Rényi entropy with an arbitrary Schur-concave function (as it can be easily verified from the proof of Lemma 2 in Appendix A). However, the analysis leading to Lemmata 3–4 and Theorem 1 applies particularly to the Rényi entropy.

4. Bounds on the Rényi Entropy of a Function of a Discrete Random Variable

This section relies on Theorem 1 and majorization for extending [12] (Theorem 1), which applies to the Shannon entropy, to Rényi entropies of any positive order. More explicitly, let α ∈ (0, ∞) and
  • X and Y be finite sets of cardinalities |X| = n and |Y| = m with n > m ≥ 2; without any loss of generality, let X = {1, …, n} and Y = {1, …, m};
  • X be a random variable taking values on X with a probability mass function P_X ∈ P_n;
  • F_{n,m} be the set of deterministic functions f : X → Y; note that f ∈ F_{n,m} is not one-to-one since m < n.
The main result in this section sharpens the inequality H_α(f(X)) ≤ H_α(X), for every deterministic function f ∈ F_{n,m} with n > m ≥ 2 and α > 0, by obtaining non-trivial upper and lower bounds on max_{f ∈ F_{n,m}} H_α(f(X)). The calculation of the exact value of min_{f ∈ F_{n,m}} H_α(f(X)) is much easier, and it is expressed in closed form by capitalizing on the Schur-concavity of the Rényi entropy.
The following main result extends [12] (Theorem 1) to Rényi entropies of arbitrary positive orders.
Theorem 2.
Let X ∈ {1, …, n} be a random variable which satisfies P_X(1) ≥ P_X(2) ≥ … ≥ P_X(n).
(a) 
For m ∈ {2, …, n − 1}, if P_X(1) < 1/m, let X̃_m be the equiprobable random variable on {1, …, m}; otherwise, if P_X(1) ≥ 1/m, let X̃_m ∈ {1, …, m} be a random variable with the probability mass function
P_{X̃_m}(i) = P_X(i),                                   i ∈ {1, …, n*},
P_{X̃_m}(i) = (1/(m − n*)) Σ_{j=n*+1}^{n} P_X(j),        i ∈ {n* + 1, …, m},
where n* is the maximal integer i ∈ {1, …, m − 1} such that
P_X(i) ≥ (1/(m − i)) Σ_{j=i+1}^{n} P_X(j).
Then, for every α > 0 ,
max_{f ∈ F_{n,m}} H_α(f(X)) ∈ [H_α(X̃_m) − v(α), H_α(X̃_m)],
where
v(α) := c_α^(∞)(2) = log((α − 1)/(2^α − 2)) − (α/(α − 1)) log(α/(2^α − 1)) if α ≠ 1, and v(1) := log(2/(e ln 2)) ≈ 0.08607 bits.
(b) 
There exists an explicit construction of a deterministic function f* ∈ F_{n,m} such that
H_α(f*(X)) ∈ [H_α(X̃_m) − v(α), H_α(X̃_m)],
where f* is independent of α, and it is obtained by using Huffman coding (as in [12] for α = 1).
(c) 
Let Ỹ_m ∈ {1, …, m} be a random variable with the probability mass function
P_{Ỹ_m}(1) = Σ_{k=1}^{n−m+1} P_X(k),   and   P_{Ỹ_m}(i) = P_X(n − m + i)  for  i ∈ {2, …, m}.
Then, for every α > 0,
min_{f ∈ F_{n,m}} H_α(f(X)) = H_α(Ỹ_m).
Remark 5.
Setting α = 1 specializes Theorem 2 to [12] (Theorem 1) (regarding the Shannon entropy). This point is further elaborated in Remark 8, after the proof of Theorem 2.
Remark 6.
Similarly to [12] (Lemma 1), an exact solution of the maximization problem in the left side of (32) is strongly NP-hard [70]; this means that, unless P = NP, there is no polynomial-time algorithm which, for an arbitrarily small ε > 0, computes an admissible deterministic function f_ε ∈ F_{n,m} such that
H_α(f_ε(X)) ≥ (1 − ε) max_{f ∈ F_{n,m}} H_α(f(X)).
This motivates the derivation of the bounds in (32), and the simple construction of a deterministic function f* ∈ F_{n,m} achieving (34).
A proof of Theorem 2 relies on the following lemmata.
Lemma 5.
Let X ∈ {1, …, n}, m < n and α > 0. Then,
max_{Q ∈ P_m : P_X ≺ Q} H_α(Q) = H_α(X̃_m),
where the probability mass function of X̃_m is given in (30).
Proof. 
Since P_X ≺ P_{X̃_m} (see [12] (Lemma 2)) with P_{X̃_m} ∈ P_m, and P_{X̃_m} ≺ Q for all Q ∈ P_m such that P_X ≺ Q (see [12] (Lemma 4)), the result follows from the Schur-concavity of the Rényi entropy. ☐
Lemma 6.
Let X ∈ {1, …, n}, α > 0, and f ∈ F_{n,m} with m < n. Then,
H_α(f(X)) ≤ H_α(X̃_m).
Proof. 
Since f is a deterministic function in F_{n,m} with m < n, the probability mass function of f(X) is an element of P_m which majorizes P_X (see [12] (Lemma 3)). Inequality (39) then follows from Lemma 5. ☐
We are now ready to prove Theorem 2.
Proof. 
In view of (39),
max_{f ∈ F_{n,m}} H_α(f(X)) ≤ H_α(X̃_m).
We next construct a function f* ∈ F_{n,m} such that, for all α > 0,
H_α(f*(X)) ≥ max_{Q ∈ P_m : P_X ≺ Q} H_α(Q) − v(α)
≥ max_{f ∈ F_{n,m}} H_α(f(X)) − v(α),
where the function v : (0, ∞) → (0, ∞) in the right side of (41) is given in (33), and (42) holds due to (38) and (40). The function f* in our proof coincides with the construction in [12], and it is, therefore, independent of α.
We first review and follow the concept of the proof of [12] (Lemma 5), and we then deviate from the analysis there for proving our result. The idea behind the proof of [12] (Lemma 5) relies on the following algorithm:
(1)
Start from the probability mass function P_X ∈ P_n with P_X(1) ≥ … ≥ P_X(n);
(2)
Merge successively pairs of probability masses by applying the Huffman algorithm;
(3)
Stop the merging process in Step 2 when a probability mass function Q ∈ P_m is obtained (with Q(1) ≥ … ≥ Q(m));
(4)
Construct the deterministic function f* ∈ F_{n,m} by setting f*(k) = j ∈ {1, …, m} for all probability masses P_X(k), with k ∈ {1, …, n}, being merged in Steps 2–3 into the node of Q(j) (a short code sketch of this merging procedure is given right after this list).
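A minimal Python sketch of Steps 1–4 (ours; a heap-based Huffman-type merging in which ties among equal masses are broken arbitrarily):

```python
import heapq

def huffman_aggregation(p, m):
    """Steps 1-4 above: repeatedly merge the two smallest masses of p (Huffman-style) until m
    masses remain; return the map f*: original index -> {0, ..., m-1} and the aggregated pmf Q,
    with Q listed in decreasing order."""
    heap = [(mass, [idx]) for idx, mass in enumerate(p)]   # (node mass, original indices merged into it)
    heapq.heapify(heap)
    while len(heap) > m:
        m1, idx1 = heapq.heappop(heap)
        m2, idx2 = heapq.heappop(heap)
        heapq.heappush(heap, (m1 + m2, idx1 + idx2))
    groups = sorted(heap, reverse=True)                    # the m remaining nodes, largest mass first
    f_star = {i: j for j, (_, idxs) in enumerate(groups) for i in idxs}
    Q = [mass for mass, _ in groups]
    return f_star, Q

p = [0.4, 0.2, 0.15, 0.1, 0.08, 0.07]      # P_X(1) >= ... >= P_X(n), indexed from 0 here
f_star, Q = huffman_aggregation(p, m=3)
print(f_star, [round(q, 2) for q in Q])
```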
Let i ∈ {0, …, m − 1} be the largest index such that P_X(1) = Q(1), …, P_X(i) = Q(i) (note that i = 0 corresponds to the case where each node Q(j), with j ∈ {1, …, m}, is constructed by merging at least two masses of the probability mass function P_X). Then, according to [12] (p. 2225),
Q(i + 1) ≤ 2 Q(m).
Let
S := Σ_{j=i+1}^{m} Q(j)
be the sum of the m − i smallest masses of the probability mass function Q. In view of (43), the vector
Q̄ := (Q(i + 1)/S, …, Q(m)/S)
represents a probability mass function where the ratio of its maximal to minimal masses is upper bounded by 2.
At this point, our analysis deviates from [12] (p. 2225). Applying Theorem 1 to Q̄ with ρ = 2 gives
H_α(Q̄) ≥ log(m − i) − c_α^(∞)(2),
with
c_α^(∞)(2) = (1/(α − 1)) log [1 + (1 + α − 2^α)/(1 − α)] − (α/(α − 1)) log [1 + (1 + α − 2^α)/((1 − α)(2^α − 1))]
= log((α − 1)/(2^α − 2)) − (α/(α − 1)) log(α/(2^α − 1))
= v(α),
where (47) follows from (20); (48) is straightforward algebra, and (49) is the definition in (33).
For α ∈ (0, 1) ∪ (1, ∞), we get
H_α(Q) = (1/(1 − α)) log Σ_{j=1}^{m} Q^α(j)
= (1/(1 − α)) log [ Σ_{j=1}^{i} Q^α(j) + Σ_{j=i+1}^{m} Q^α(j) ]
= (1/(1 − α)) log [ Σ_{j=1}^{i} Q^α(j) + S^α exp((1 − α) H_α(Q̄)) ]
≥ (1/(1 − α)) log [ Σ_{j=1}^{i} Q^α(j) + S^α exp((1 − α)(log(m − i) − v(α))) ]
= (1/(1 − α)) log [ Σ_{j=1}^{i} Q^α(j) + S^α (m − i)^{1−α} exp((α − 1) v(α)) ],
where (51) holds since i ∈ {0, …, m − 1}; (52) follows from (2) and (45); (53) holds by (46)–(49).
In view of (44), let Q′ ∈ P_m be the probability mass function which is given by
Q′(j) = Q(j) for j = 1, …, i, and Q′(j) = S/(m − i) for j = i + 1, …, m.
From (50)–(55), we get
H_α(Q) ≥ (1/(1 − α)) log [ Σ_{j=1}^{i} Q′(j)^α + Σ_{j=i+1}^{m} Q′(j)^α exp((α − 1) v(α)) ]
= (1/(1 − α)) log [ Σ_{j=1}^{m} Q′(j)^α + Σ_{j=i+1}^{m} Q′(j)^α ( exp((α − 1) v(α)) − 1 ) ]
= H_α(Q′) + (1/(1 − α)) log [ 1 + T ( exp((α − 1) v(α)) − 1 ) ],
with
T := Σ_{j=i+1}^{m} Q′(j)^α / Σ_{j=1}^{m} Q′(j)^α ∈ [0, 1].
Since T ∈ [0, 1] and v(α) > 0 for α > 0, it can be verified from (56)–(58) that, for α ∈ (0, 1) ∪ (1, ∞),
H_α(Q) ≥ H_α(Q′) − v(α).
The validity of (60) is extended to α = 1 by taking the limit α → 1 on both sides of this inequality, and due to the continuity of v(·) in (33) at α = 1. Applying the majorization result Q′ ≺ P_{X̃_m} in [12] ((31)), it follows from (60) and the Schur-concavity of the Rényi entropy that, for all α > 0,
H_α(Q) ≥ H_α(Q′) − v(α) ≥ H_α(X̃_m) − v(α),
which, together with (40), proves Items (a) and (b) of Theorem 2 (note that, in view of the construction of the deterministic function f* ∈ F_{n,m} in Step 4 of the above algorithm, we get H_α(f*(X)) = H_α(Q)).
We next prove Item (c). Equality (36) is due to the Schur-concavity of the Rényi entropy and the following observations:
  • f(X) is an aggregation of X, i.e., the probability mass function Q ∈ P_m of f(X) satisfies Q(j) = Σ_{i ∈ I_j} P_X(i) (1 ≤ j ≤ m), where I_1, …, I_m partition {1, …, n} into m disjoint subsets as follows:
    I_j := {i ∈ {1, …, n} : f(i) = j},   j = 1, …, m;
  • By the assumption P_X(1) ≥ P_X(2) ≥ … ≥ P_X(n), it follows that Q ≺ P_{Ỹ_m} for every such Q ∈ P_m;
  • From (35), Ỹ_m = f̃(X), where the function f̃ ∈ F_{n,m} is given by f̃(k) := 1 for all k ∈ {1, …, n − m + 1}, and f̃(n − m + i) := i for all i ∈ {2, …, m}. Hence, P_{Ỹ_m} is an element in the set of the probability mass functions of f(X) with f ∈ F_{n,m} which majorizes every other element of this set.
 ☐
Remark 7.
The solid line in the left plot of Figure 2 depicts v(α) := c_α^(∞)(2) in (33) for α > 0. In view of Lemma 4, and by the definition in (33), the function v : (0, ∞) → (0, ∞) is indeed monotonically increasing and continuous.
Remark 8.
Inequality (43) leads to the application of Theorem 1 with ρ = 2 (see (46)). In the derivation of Theorem 2, we refer to v(α) := c_α^(∞)(2) (see (47)–(49)) rather than referring to c_α^(n)(2) (although, from (24), we have 0 ≤ c_α^(n)(2) ≤ v(α) for all α > 0). We do so since, for n ≥ 16, the difference between the curves of c_α^(n)(2) (as a function of α > 0) and the curve of c_α^(∞)(2) is marginal (see the dashed and solid lines in the left plot of Figure 2), and also because the function v in (33) is expressed in closed form whereas c_α^(n)(2) is subject to numerical optimization for finite n (see (15) and (16)). For this reason, Theorem 2 coincides with the result in [12] (Theorem 1) for the Shannon entropy (i.e., for α = 1) while providing a generalization of the latter result for Rényi entropies of arbitrary positive orders α. Theorem 1, however, both strengthens the bounds in [12] (Theorem 2) for the Shannon entropy with finite cardinality n (see Remark 3), and also generalizes these bounds to Rényi entropies of all positive orders.
Remark 9.
The minimizing probability mass function (35) for the optimization problem (36), and the maximizing probability mass function (30) for the optimization problem (38), remain valid in general when the Rényi entropy of a positive order is replaced by an arbitrary Schur-concave function. However, the main results in (32)–(34) hold particularly for the Rényi entropy.
Remark 10.
Theorem 2 makes use of the random variables denoted by X̃_m and Ỹ_m, rather than (more simply) X_m and Y_m respectively, because Section 5 considers i.i.d. samples {X_i}_{i=1}^{k} and {Y_i}_{i=1}^{k} with X_i ∼ P_X and Y_i ∼ P_Y; note, however, that the probability mass functions of X̃_m and Ỹ_m are different from P_X and P_Y, respectively, and for that reason we make use of the tilde symbols in the left sides of (30) and (35).

5. Information-Theoretic Applications: Non-Asymptotic Bounds for Lossless Compression and Guessing

Theorem 2 is applied in this section to derive non-asymptotic bounds for lossless compression of discrete memoryless sources and guessing moments. Each of the two subsections starts with a short background for making the presentation self-contained.

5.1. Guessing

5.1.1. Background

The problem of guessing discrete random variables has various theoretical and operational aspects in information theory (see [35,36,37,38,40,41,43,56,71,72,73,74,75,76,77,78,79,80,81]). The central object of interest is the distribution of the number of guesses required to identify a realization of a random variable X, taking values on a finite or countably infinite set X = {1, …, |X|}, by successively asking questions of the form “Is X equal to x?” until the value of X is guessed correctly. A guessing function is a one-to-one function g : X → X, which can be viewed as a permutation of the elements of X in the order in which they are guessed. The required number of guesses is therefore equal to g(x) when X = x with x ∈ X.
Lower and upper bounds on the minimal expected number of required guesses for correctly identifying the realization of X, expressed as a function of the Shannon entropy H(X), have been respectively derived by Massey [77] and by McEliece and Yu [78], followed by a derivation of improved upper and lower bounds by De Santis et al. [80]. More generally, given a probability mass function P_X on X, it is of interest to minimize the generalized guessing moment E[g^ρ(X)] = Σ_{x ∈ X} P_X(x) g^ρ(x) for ρ > 0. For an arbitrary positive ρ, the ρ-th moment of the number of guesses is minimized by selecting the guessing function to be a ranking function g_X, for which g_X(x) = ℓ if P_X(x) is the ℓ-th largest mass [77]. Although tie breaking affects the choice of g_X, the distribution of g_X(X) does not depend on how ties are resolved. Not only does this strategy minimize the average number of guesses, but it also minimizes the ρ-th moment of the number of guesses for every ρ > 0. Upper and lower bounds on the ρ-th moment of ranking functions, expressed in terms of Rényi entropies, were derived by Arikan [35] and Boztaş [71], followed by recent improvements in the non-asymptotic regime by Sason and Verdú [56]. Although it is straightforward to evaluate guessing moments numerically when |X| is small, the benefit of bounds expressed in terms of Rényi entropies is particularly relevant when dealing with a random vector X^k = (X_1, …, X_k) whose letters belong to a finite alphabet X; computing all the probabilities of the mass function P_{X^k} over the set X^k, and then sorting them in decreasing order to calculate the ρ-th moment of the optimal guessing function for the elements of X^k, becomes infeasible even for moderate values of k. In contrast, regardless of the value of k, bounds on guessing moments which depend on the Rényi entropy are readily computable if, for example, {X_i}_{i=1}^{k} are independent; in this case, the Rényi entropy of the vector is equal to the sum of the Rényi entropies of its components. Arikan’s bounds in [35] are asymptotically tight for random vectors of length k as k → ∞, thus providing the correct exponential growth rate of the guessing moments for sufficiently large k.
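For a small alphabet, the optimal guessing moments described above can be evaluated directly from a ranking function; a minimal Python sketch (ours, with illustrative names) is:

```python
import numpy as np

def guessing_moment(p, rho):
    """rho-th moment of the optimal guesser: the j-th largest mass is guessed at attempt j,
    i.e., E[g_X(X)^rho] with g_X a ranking function of the distribution p."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    ranks = np.arange(1, len(p) + 1)
    return float((p * ranks ** rho).sum())

p = [0.5, 0.25, 0.125, 0.125]
print(guessing_moment(p, rho=1))   # expected number of guesses
print(guessing_moment(p, rho=2))   # second moment of the number of guesses
```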

5.1.2. Analysis

We next analyze the following setup of guessing. Let {X_i}_{i=1}^{k} be i.i.d. random variables where X_1 ∼ P_X takes values on a finite set X with |X| = n. To cluster the data [82] (see also [12] (Section 3.A) and references therein), suppose that each X_i is mapped to Y_i = f(X_i), where f ∈ F_{n,m} is an arbitrary deterministic function (independent of the index i) with m < n. Consequently, {Y_i}_{i=1}^{k} are i.i.d., and each Y_i takes values on a finite set Y with |Y| = m < |X|.
Let g_{X^k} : X^k → {1, …, n^k} and g_{Y^k} : Y^k → {1, …, m^k} be, respectively, the ranking functions of the random vectors X^k = (X_1, …, X_k) and Y^k = (Y_1, …, Y_k), obtained by sorting in separate decreasing orders the probabilities P_{X^k}(x^k) = Π_{i=1}^{k} P_X(x_i) for x^k ∈ X^k, and P_{Y^k}(y^k) = Π_{i=1}^{k} P_Y(y_i) for y^k ∈ Y^k, where ties in both cases are resolved arbitrarily. In view of Arikan’s bounds on the ρ-th moment of ranking functions (see [35] (Theorem 1) for the lower bound, and [35] (Proposition 4) for the upper bound), since |X^k| = n^k and |Y^k| = m^k, the following bounds hold for all ρ > 0:
ρ H_{1/(1+ρ)}(X) − (ρ/k) log(1 + k ln n) ≤ (1/k) log E[g_{X^k}^ρ(X^k)] ≤ ρ H_{1/(1+ρ)}(X),
ρ H_{1/(1+ρ)}(Y) − (ρ/k) log(1 + k ln m) ≤ (1/k) log E[g_{Y^k}^ρ(Y^k)] ≤ ρ H_{1/(1+ρ)}(Y).
In the following, we rely on Theorem 2 and the bounds in (63) and (64) to obtain bounds on the exponential reduction of the ρ -th moment of the ranking function of X k as a result of its mapping to Y k . First, the combination of (63) and (64) yields
ρ [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(Y)] − (ρ/k) log(1 + k ln n) ≤ (1/k) log ( E[g_{X^k}^ρ(X^k)] / E[g_{Y^k}^ρ(Y^k)] )
≤ ρ [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(Y)] + (ρ/k) log(1 + k ln m).
In view of Theorem 2-(a) and (65), it follows that, for an arbitrary f ∈ F_{n,m} and ρ > 0,
(1/k) log ( E[g_{X^k}^ρ(X^k)] / E[g_{Y^k}^ρ(Y^k)] ) ≥ ρ [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(X̃_m)] − (ρ/k) log(1 + k ln n),
where X̃_m is a random variable whose probability mass function is given in (30). Note that
H_{1/(1+ρ)}(X̃_m) ≤ H_{1/(1+ρ)}(X),    (ρ/k) log(1 + k ln n) → 0 as k → ∞,
where the first inequality in (68) holds since P_X ≺ P_{X̃_m} (see Lemma 5) and the Rényi entropy is Schur-concave.
By the explicit construction of the function f* ∈ F_{n,m} according to the algorithm in Steps 1–4 in the proof of Theorem 2 (based on the Huffman procedure), and by setting Y_i := f*(X_i) for every i ∈ {1, …, k}, it follows from (34) and (66) that, for all ρ > 0,
(1/k) log ( E[g_{X^k}^ρ(X^k)] / E[g_{Y^k}^ρ(Y^k)] ) ≤ ρ [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(X̃_m) + v(1/(1+ρ))] + (ρ/k) log(1 + k ln m),
where the monotonically increasing function v : (0, ∞) → (0, ∞) is given in (33), and it is depicted by the solid line in the left plot of Figure 2. In view of (33), it can be shown that the linear approximation v(α) ≈ v(1) α is excellent for all α ∈ [0, 1], and therefore, for all ρ > 0,
v(1/(1+ρ)) ≈ 0.08607/(1+ρ) bits.
Hence, for a sufficiently large value of k, the gap between the lower and upper bounds in (67) and (69) is marginal, being approximately equal to 0.08607 ρ/(1+ρ) bits for all ρ > 0.
The following theorem summarizes our result in this section.
Theorem 3.
Let
  • {X_i}_{i=1}^{k} be i.i.d. with X_1 ∼ P_X taking values on a set X with |X| = n;
  • Y_i = f(X_i), for every i ∈ {1, …, k}, where f ∈ F_{n,m} is a deterministic function with m < n;
  • g_{X^k} : X^k → {1, …, n^k} and g_{Y^k} : Y^k → {1, …, m^k} be, respectively, ranking functions of the random vectors X^k = (X_1, …, X_k) and Y^k = (Y_1, …, Y_k).
Then, for every ρ > 0 ,
(a) 
The lower bound in (67) holds for every deterministic function f ∈ F_{n,m};
(b) 
The upper bound in (69) holds for the specific f* ∈ F_{n,m}, whose construction relies on the Huffman algorithm (see Steps 1–4 of the procedure in the proof of Theorem 2);
(c) 
The gap between these bounds, for f = f* and sufficiently large k, is at most ρ v(1/(1+ρ)) ≈ 0.08607 ρ/(1+ρ) bits.

5.1.3. Numerical Result

The following simple example illustrates the tightness of the achievable upper bound and the universal lower bound in Theorem 3, especially for sufficiently long sequences.
Example 1.
Let X be geometrically distributed, restricted to {1, …, n}, with the probability mass function
P_X(j) = (1 − a) a^{j−1} / (1 − a^n),   j ∈ {1, …, n},
where a = 24/25 and n = 128. Assume that X_1, …, X_k are i.i.d. with X_1 ∼ P_X, and let Y_i = f(X_i) for a deterministic function f ∈ F_{n,m} with n = 128 and m = 16. We compare the upper and lower bounds in Theorem 3 for the two cases where the sequence X^k = (X_1, …, X_k) is of length k = 100 or k = 1000. The lower bound in (67) holds for an arbitrary deterministic f ∈ F_{n,m}, and the achievable upper bound in (69) holds for the construction of the deterministic function f = f* ∈ F_{n,m} (based on the Huffman algorithm) in Theorem 3. Numerical results are shown in Figure 4, providing plots of the upper and lower bounds on (1/k) log₂ ( E[g_{X^k}^ρ(X^k)] / E[g_{Y^k}^ρ(Y^k)] ) in Theorem 3, and illustrating the improved tightness of these bounds when the value of k is increased from 100 (left plot) to 1000 (right plot). From Theorem 3-(c), for sufficiently large k, the gap between the upper and lower bounds is less than 0.08607 bits (for all ρ > 0); this is consistent with the right plot of Figure 4 where k = 1000.
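A Python sketch (ours) that evaluates the two bounds of Theorem 3 for the parameters of this example; it implements (30), (33), (67) and (69) directly, so the printed numbers are the bound values themselves rather than a reproduction of Figure 4, and the chosen values of ρ are arbitrary:

```python
import numpy as np

def renyi(p, alpha):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -(p * np.log2(p)).sum()
    return np.log2((p ** alpha).sum()) / (1.0 - alpha)

def x_tilde(p, m):
    """The probability mass function of X~_m in (30); p must be sorted in decreasing order."""
    p = np.asarray(p, dtype=float)
    if p[0] < 1.0 / m:
        return np.full(m, 1.0 / m)
    n_star = max(i for i in range(1, m) if p[i - 1] >= p[i:].sum() / (m - i))
    tail = p[n_star:].sum() / (m - n_star)
    return np.concatenate([p[:n_star], np.full(m - n_star, tail)])

def v(alpha):
    """v(alpha) = c_alpha^(inf)(2) of (33), in bits."""
    if np.isclose(alpha, 1.0):
        return np.log2(2.0 / (np.e * np.log(2.0)))
    return np.log2((alpha - 1) / (2 ** alpha - 2)) - alpha / (alpha - 1) * np.log2(alpha / (2 ** alpha - 1))

a, n, m, k = 24 / 25, 128, 16, 1000                        # parameters of Example 1
p_x = (1 - a) * a ** np.arange(n) / (1 - a ** n)           # truncated geometric pmf (71)
p_xt = x_tilde(np.sort(p_x)[::-1], m)
for rho in (0.5, 1.0, 2.0):
    alpha = 1.0 / (1.0 + rho)
    gap = rho * (renyi(p_x, alpha) - renyi(p_xt, alpha))
    lower = gap - rho * np.log2(1 + k * np.log(n)) / k                    # bound (67), any f in F_{n,m}
    upper = gap + rho * v(alpha) + rho * np.log2(1 + k * np.log(m)) / k   # bound (69), f = f*
    print(rho, round(lower, 3), round(upper, 3))
```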

5.2. Lossless Source Coding

5.2.1. Background

For uniquely decodable (UD) lossless source coding, Campbell [51,83] proposed the cumulant generating function of the codeword lengths as a generalization to the frequently used design criterion of average code length. Campbell’s motivation in [51] was to control the contribution of the longer codewords via a free parameter in the cumulant generating function: if the value of this parameter tends to zero, then the resulting design criterion becomes the average code length per source symbol; on the other hand, by increasing the value of the free parameter, the penalty for longer codewords is more severe, and the resulting code optimization yields a reduction in the fluctuations of the codeword lengths.
We introduce the coding theorem by Campbell [51] for lossless compression of a discrete memoryless source (DMS) with UD codes, which serves for our analysis jointly with Theorem 2.
Theorem 4 
(Campbell 1965, [51]). Consider a DMS which emits symbols with a probability mass function P_X defined on a (finite or countably infinite) set X. Consider a UD fixed-to-variable source code operating on source sequences of k symbols with an alphabet of the codewords of size D. Let ℓ(x^k) be the length of the codeword which corresponds to the source sequence x^k := (x_1, …, x_k) ∈ X^k. Consider the scaled cumulant generating function of the codeword lengths
Λ_k(ρ) := (1/k) log_D Σ_{x^k ∈ X^k} P_{X^k}(x^k) D^{ρ ℓ(x^k)},   ρ > 0,
where
P_{X^k}(x^k) = Π_{i=1}^{k} P_X(x_i),   x^k ∈ X^k.
Then, for every ρ > 0 , the following hold:
(a) 
Converse result:
Λ_k(ρ)/ρ ≥ (1/log D) H_{1/(1+ρ)}(X).
(b) 
Achievability result: there exists a UD source code, for which
Λ_k(ρ)/ρ ≤ (1/log D) H_{1/(1+ρ)}(X) + 1/k.
The term scaled cumulant generating function is used in view of [56] (Remark 20). The bounds in Theorem 4, expressed in terms of the Rényi entropy, imply that for sufficiently long source sequences, it is possible to make the scaled cumulant generating function of the codeword lengths approach the Rényi entropy as closely as desired by a proper fixed-to-variable UD source code; moreover, the converse result shows that there is no UD source code for which the scaled cumulant generating function of its codeword lengths lies below the Rényi entropy. By invoking L’Hôpital’s rule, one gets from (72)
lim_{ρ → 0} Λ_k(ρ)/ρ = (1/k) Σ_{x^k ∈ X^k} P_{X^k}(x^k) ℓ(x^k) = (1/k) E[ℓ(X^k)].
Hence, by letting ρ tend to zero in (74) and (75), it follows from (4) that Campbell’s result in Theorem 4 generalizes the well-known bounds on the optimal average length of UD fixed-to-variable source codes (see, e.g., [84] ((5.33) and (5.37))):
(1/log D) H(X) ≤ (1/k) E[ℓ(X^k)] ≤ (1/log D) H(X) + 1/k,
and (77) is satisfied by Huffman coding (see, e.g., [84] (Theorem 5.8.1)). Campbell’s result therefore generalizes Shannon’s fundamental result in [85] for the average codeword lengths of lossless compression codes, expressed in terms of the Shannon entropy.
Following the work by Campbell [51], Courtade and Verdú derived in [52] non-asymptotic bounds for the scaled cumulant generating function of the codeword lengths for P X -optimal variable-length lossless codes [23,86]. These bounds were used in [52] to obtain simple proofs of the asymptotic normality of the distribution of codeword lengths, and the reliability function of memoryless sources allowing countably infinite alphabets. Sason and Verdú recently derived in [56] improved non-asymptotic bounds on the cumulant generating function of the codeword lengths for fixed-to-variable optimal lossless source coding without prefix constraints, and non-asymptotic bounds on the reliability function of a DMS, tightening the bounds in [52].

5.2.2. Analysis

The following analysis for lossless source compression with UD codes relies on a combination of Theorems 2 and 4.
Let X_1, …, X_k be i.i.d. symbols which are emitted from a DMS according to a probability mass function P_X whose support is a finite set X with |X| = n. Similarly to Section 5.1, to cluster the data, suppose that each symbol X_i is mapped to Y_i = f(X_i), where f ∈ F_{n,m} is an arbitrary deterministic function (independent of the index i) with m < n. Consequently, the i.i.d. symbols Y_1, …, Y_k take values on a set Y with |Y| = m < |X|. Consider two UD fixed-to-variable source codes: one operating on the sequences x^k ∈ X^k, and the other operating on the sequences y^k ∈ Y^k; let D be the size of the alphabets of both source codes. Let ℓ(x^k) and ℓ̄(y^k) denote the lengths of the codewords for the source sequences x^k and y^k, respectively, and let Λ_k(·) and Λ̄_k(·) denote their corresponding scaled cumulant generating functions (see (72)).
In view of Theorem 4-(b), for every ρ > 0, there exists a UD source code for the sequences in X^k such that the scaled cumulant generating function of its codeword lengths satisfies (75). Furthermore, from Theorem 4-(a), we get
Λ̄_k(ρ)/ρ ≥ (1/log D) H_{1/(1+ρ)}(Y).
From (75), (78) and Theorem 2 (a) and (b), for every ρ > 0, there exist a UD source code for the sequences in X^k, and a construction of a deterministic function f* ∈ F_{n,m} (as specified by Steps 1–4 in the proof of Theorem 2, borrowed from [12]), such that the difference between the two scaled cumulant generating functions satisfies
Λ_k(ρ) − Λ̄_k(ρ) ≤ (ρ/log D) [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(X̃_m) + v(1/(1+ρ))] + ρ/k,
where (79) holds for every UD source code operating on the sequences in Y^k with Y_i = f*(X_i) (for i = 1, …, k) and the specific construction of f* ∈ F_{n,m} as above, and X̃_m in the right side of (79) is a random variable whose probability mass function is given in (30). The right side of (79) can be very well approximated (for all ρ > 0) by using (70).
We proceed with a derivation of a lower bound on the left side of (79). In view of Theorem 4, it follows that (74) is satisfied for every UD source code which operates on the sequences in X^k; furthermore, Theorems 2 and 4 imply that, for every f ∈ F_{n,m}, there exists a UD source code which operates on the sequences in Y^k such that
Λ̄_k(ρ)/ρ ≤ (1/log D) H_{1/(1+ρ)}(Y) + 1/k
≤ (1/log D) H_{1/(1+ρ)}(X̃_m) + 1/k,
where (81) is due to (39) since Y_i = f(X_i) (for i = 1, …, k) with an arbitrary deterministic function f ∈ F_{n,m}, and Y_i ∼ P_Y for every i; hence, from (74), (80) and (81),
Λ_k(ρ) − Λ̄_k(ρ) ≥ (ρ/log D) [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(X̃_m)] − ρ/k.
We summarize our result as follows.
Theorem 5.
Let
  • X_1, …, X_k be i.i.d. symbols which are emitted from a DMS according to a probability mass function P_X whose support is a finite set X with |X| = n;
  • Each symbol X_i be mapped to Y_i = f(X_i), where f ∈ F_{n,m} is the deterministic function (independent of the index i) with m < n, as specified by Steps 1–4 in the proof of Theorem 2 (borrowed from [12]);
  • Two UD fixed-to-variable source codes be used: one code encodes the sequences x^k ∈ X^k, and the other code encodes their mappings y^k ∈ Y^k; let the common size of the alphabets of both codes be D;
  • Λ_k(·) and Λ̄_k(·) be, respectively, the scaled cumulant generating functions of the codeword lengths of the k-length sequences in X^k (see (72)) and their mapping to Y^k.
Then, for every ρ > 0 , the following holds for the difference between the scaled cumulant generating functions Λ k ( · ) and Λ ¯ k ( · ) :
(a) 
There exists a UD source code for the sequences in X^k such that the upper bound in (79) is satisfied for every UD source code which operates on the sequences in Y^k;
(b) 
There exists a UD source code for the sequences in Y^k such that the lower bound in (82) holds for every UD source code for the sequences in X^k; furthermore, the lower bound in (82) holds in general for every deterministic function f ∈ F_{n,m};
(c) 
The gap between the upper and lower bounds in (79) and (82), respectively, is at most (ρ/log D) v(1/(1+ρ)) + 2ρ/k (the function v : (0, ∞) → (0, ∞) is introduced in (33)), which is approximately 0.08607 ρ log_D 2/(1+ρ) + 2ρ/k;
(d) 
The UD source codes in Items (a) and (b) for the sequences in X k and Y k , respectively, can be constructed to be prefix codes by the algorithm in Remark 11.
Remark 11 (An Algorithm for Theorem 5 (d)).
A construction of the UD source codes for the sequences in X^k and Y^k, whose existence is assured by Theorem 5 (a) and (b) respectively, is obtained by the following algorithm (of three steps), which also constructs them as prefix codes:
(1) 
As a preparatory step, we first calculate the probability mass function P_Y from the given probability mass function P_X and the deterministic function f ∈ F_{n,m} which is obtained by Steps 1–4 in the proof of Theorem 2; accordingly, P_Y(y) = Σ_{x ∈ X : f(x) = y} P_X(x) for all y ∈ Y. We then further calculate the probability mass functions for the i.i.d. sequences in X^k and Y^k (see (73)); recall that the number of types in X^k and Y^k is polynomial in k (being upper bounded by (k + 1)^{n−1} and (k + 1)^{m−1}, respectively), and the values of these probability mass functions are fixed over each type;
(2) 
The sets of codeword lengths of the two UD source codes, for the sequences in X^k and Y^k, can (separately) be designed according to the achievability proof in Campbell’s paper (see [51] (p. 428)). More explicitly, let α := 1/(1 + ρ); for all x^k ∈ X^k, let ℓ(x^k) ∈ N be given by
ℓ(x^k) = ⌈ −α log_D P_{X^k}(x^k) + log_D Q_k ⌉
with
Q_k := Σ_{x^k ∈ X^k} P_{X^k}^α(x^k) = ( Σ_{x ∈ X} P_X^α(x) )^k,
and let ℓ̄(y^k) ∈ N, for all y^k ∈ Y^k, be given similarly to (83) and (84) by replacing P_X with P_Y, and P_{X^k} with P_{Y^k}. This suggests codeword lengths for the two codes which fulfil (75) and (80), and both sets of lengths satisfy Kraft’s inequality (a small numerical sketch of this length assignment is given after this remark);
(3) 
The separate construction of two prefix codes (a.k.a. instantaneous codes) based on their given sets of codeword lengths {ℓ(x^k)}_{x^k ∈ X^k} and {ℓ̄(y^k)}_{y^k ∈ Y^k}, as determined in Step 2, is standard (see, e.g., the construction in the proof of [84] (Theorem 5.2.1)).
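A small Python sketch (ours) of the length assignment in Step 2, i.e., Equations (83) and (84); it enumerates X^k by brute force, so it is meant only for tiny alphabets and block lengths, and it verifies Kraft’s inequality numerically:

```python
import numpy as np
from itertools import product

def campbell_lengths(p, k, D, rho):
    """Codeword lengths per (83)-(84): for each x^k, ceil(-alpha*log_D P(x^k) + log_D Q_k)
    with alpha = 1/(1 + rho); the resulting lengths satisfy Kraft's inequality."""
    alpha = 1.0 / (1.0 + rho)
    p = np.asarray(p, dtype=float)
    Q_k = (p ** alpha).sum() ** k                          # Eq. (84)
    logD = np.log(D)
    lengths = {}
    for x in product(range(len(p)), repeat=k):             # brute-force enumeration of X^k
        prob = float(np.prod(p[list(x)]))
        lengths[x] = int(np.ceil(-alpha * np.log(prob) / logD + np.log(Q_k) / logD))
    kraft = sum(D ** (-l) for l in lengths.values())
    return lengths, kraft

p = [0.6, 0.3, 0.1]
lengths, kraft = campbell_lengths(p, k=2, D=2, rho=1.0)
print(round(kraft, 4), sorted(lengths.values()))           # Kraft sum <= 1
```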
Theorem 5 is of interest since it provides upper and lower bounds on the reduction in the cumulant generating function of close-to-optimal UD source codes because of clustering data, and Remark 11 suggests an algorithm to construct such UD codes which are also prefix codes. For long enough sequences (as k → ∞), the upper and lower bounds on the difference between the scaled cumulant generating functions of the suggested source codes for the original and clustered data almost match (see (79) and (82)), being roughly equal to ρ [H_{1/(1+ρ)}(X) − H_{1/(1+ρ)}(X̃_m)] (with logarithms on base D, which is the alphabet size of the source codes); as k → ∞, the gap between these upper and lower bounds is less than 0.08607 log_D 2. Furthermore, in view of (76),
lim_{ρ → 0} (Λ_k(ρ) − Λ̄_k(ρ))/ρ = (1/k) ( E[ℓ(X^k)] − E[ℓ̄(Y^k)] ),
so, it follows from (4), (33), (79) and (82) that the difference between the average code lengths (normalized by k) of the original and clustered data satisfies
(H(X) − H(X̃_m))/log D − 1/k ≤ (1/k) ( E[ℓ(X^k)] − E[ℓ̄(Y^k)] ) ≤ (H(X) − H(X̃_m) + 0.08607 log 2)/log D,
where the gap between the upper and lower bounds is equal to 0.08607 log_D 2 + 1/k.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proof of Lemma 2

We first find the extreme values of p_min under the assumption that P ∈ P_n(ρ). If p_max/p_min = 1, then P is the equiprobable distribution on X and p_min = 1/n. On the other hand, if p_max/p_min = ρ, then the minimal possible value of p_min is obtained when P is the one-odd-mass distribution with n − 1 masses equal to ρ p_min and a smaller mass equal to p_min. The latter case yields p_min = 1/(1 + (n − 1)ρ).
Let β := p_min, so β can take any value in the interval [1/(1 + (n − 1)ρ), 1/n] =: Γ_ρ(n). From Lemma 1, P ≺ Q_β and Q_β ∈ P_n(ρ), and the Schur-concavity of the Rényi entropy yields H_α(P) ≥ H_α(Q_β) for all P ∈ P_n(ρ) with p_min = β. Minimizing H_α(P) over P ∈ P_n(ρ) can therefore be restricted to minimizing H_α(Q_β) over β ∈ Γ_ρ(n).

Appendix B. Proof of Lemma 3

The sequence {c_α^(n)(ρ)}_{n ∈ N} is non-negative since H_α(P) ≤ log n for all P ∈ P_n. To prove (17),
0 ≤ c_α^(n)(ρ) = log n − min_{P ∈ P_n(ρ)} H_α(P)
≤ log n − min_{P ∈ P_n(ρ)} H_∞(P)
≤ log n − log(n/ρ) = log ρ,
where (A2) holds since H_α(P) is monotonically decreasing in α, and (A3) is due to (5) and p_max ≤ ρ/n.
Let U_n denote the equiprobable probability mass function on {1, …, n}. By the identity
D_α(P‖U_n) = log n − H_α(P),
and since, by Lemma 2, H_α(·) attains its minimum over the set of probability mass functions P_n(ρ), it follows that D_α(·‖U_n) attains its maximum over this set. Let P* ∈ P_n(ρ) be the probability measure which achieves the minimum in c_α^(n)(ρ) (see (16)); then, from (A4),
c_α^(n)(ρ) = max_{P ∈ P_n(ρ)} D_α(P‖U_n)
= D_α(P*‖U_n).
Let Q be the probability mass function which is defined on {1, …, 2n} as follows:
Q(i) = (1/2) P*(i) for i ∈ {1, …, n}, and Q(i) = (1/2) P*(i − n) for i ∈ {n + 1, …, 2n}.
Since by assumption P* ∈ P_n(ρ), it is easy to verify from (A7) that
Q ∈ P_{2n}(ρ).
Furthermore, from (A7),
D_α(Q‖U_{2n}) = (1/(α − 1)) log Σ_{i=1}^{2n} Q(i)^α (1/(2n))^{1−α}
= (1/(α − 1)) log [ (1/2) Σ_{i=1}^{n} P*(i)^α (1/n)^{1−α} + (1/2) Σ_{i=n+1}^{2n} P*(i − n)^α (1/n)^{1−α} ]
= (1/(α − 1)) log Σ_{i=1}^{n} P*(i)^α (1/n)^{1−α}
= D_α(P*‖U_n).
Combining (A5)–(A12) yields
c_α^(2n)(ρ) = max_{Q′ ∈ P_{2n}(ρ)} D_α(Q′‖U_{2n})
≥ D_α(Q‖U_{2n})
= D_α(P*‖U_n)
= c_α^(n)(ρ),
proving (18). Finally, in view of (A5), c α ( n ) ( ρ ) is monotonically increasing in α since so is the Rényi divergence of order α (see [87] (Theorem 3)).

Appendix C. Proof of Lemma 4

From Lemma 2, the minimizing distribution of H_α is given by Q_β ∈ P_n(ρ), where
Q_β = (ρβ, …, ρβ, 1 − (n + iρ − i − 1)β, β, …, β),
with ρβ repeated i times and β repeated n − i − 1 times, β ∈ [1/(1 + (n − 1)ρ), 1/n], and 1 − (n + iρ − i − 1)β ≤ ρβ ≤ ρ/n. It therefore follows that the influence of the middle probability mass of Q_β on H_α(Q_β) tends to zero as n → ∞. Therefore, in this asymptotic case, one can instead minimize H_α(Q̃_m), where
Q̃_m = (ρβ, …, ρβ, β, …, β),
with ρβ repeated m times and β repeated n − m times, the free parameter m ∈ {1, …, n}, and β = 1/(n + m(ρ − 1)) (so that the total mass of Q̃_m is equal to 1).
For α ∈ (0, 1) ∪ (1, ∞), a straightforward calculation shows that
H_α(Q̃_m) = (1/(1 − α)) log Σ_{j=1}^{n} Q̃_m^α(j) = log n − (1/(α − 1)) log [ (1 + (m/n)(ρ^α − 1)) / (1 + (m/n)(ρ − 1))^α ],
and by letting n → ∞, the limit of the sequence {c_α^(n)(ρ)}_{n ∈ N} exists, and it is equal to
c_α^(∞)(ρ) := lim_{n → ∞} c_α^(n)(ρ) = lim_{n → ∞} [ log n − min_{m ∈ {1, …, n}} H_α(Q̃_m) ] = lim_{n → ∞} max_{m ∈ {1, …, n}} (1/(α − 1)) log [ (1 + (m/n)(ρ^α − 1)) / (1 + (m/n)(ρ − 1))^α ] = max_{x ∈ [0, 1]} (1/(α − 1)) log [ (1 + (ρ^α − 1)x) / (1 + (ρ − 1)x)^α ].
Let f_α : [0, 1] → R be given by
f_α(x) = (1 + (ρ^α − 1)x) / (1 + (ρ − 1)x)^α,   x ∈ [0, 1].
Then, f_α(0) = f_α(1) = 1, and a straightforward calculation shows that the derivative f_α′ vanishes if and only if
x = x* := (1 + α(ρ − 1) − ρ^α) / ((1 − α)(ρ − 1)(ρ^α − 1)).
We rely here on a specialized version of the mean value theorem, known as Rolle’s theorem, which states that a real-valued differentiable function that attains equal values at two distinct points must have a point between them at which its first derivative is zero. By Rolle’s theorem, and due to the uniqueness of the point x* in (A22), it follows that x* ∈ (0, 1). Substituting (A22) into (A20) gives (20). Taking the limit of (20) as α → ∞ gives the result in (21).
In the limit where α → 1, the Rényi entropy of order α tends to the Shannon entropy. Hence, letting α → 1 in (20), it follows that for the Shannon entropy
c_1^(∞)(ρ) = lim_{α → 1} c_α^(∞)(ρ) = lim_{α → 1} { (1/(α − 1)) log [1 + (1 + α(ρ − 1) − ρ^α)/((1 − α)(ρ − 1))] − (α/(α − 1)) log [1 + (1 + α(ρ − 1) − ρ^α)/((1 − α)(ρ^α − 1))] } = (ρ log ρ)/(ρ − 1) − log e − log (ρ ln ρ/(ρ − 1)),
where (A23) follows by invoking L’Hôpital’s rule. This proves (22).
From (17)–(19), we get 0 ≤ c_α^(n)(ρ) ≤ c_α^(∞)(ρ). Since c_α^(n)(ρ) is monotonically increasing in α ∈ [0, ∞] for every n ∈ N, so is c_α^(∞)(ρ); hence, (21) yields c_α^(∞)(ρ) ≤ log ρ. This proves (24).

References

  1. Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities, 2nd ed.; Cambridge University Press: Cambridge, UK, 1952. [Google Scholar]
  2. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer: New York, NY, USA, 2011. [Google Scholar]
  3. Arnold, B.C. Majorization: Here, there and everywhere. Stat. Sci. 2007, 22, 407–413. [Google Scholar] [CrossRef]
  4. Arnold, B.C.; Sarabia, J.M. Majorization and the Lorenz Order with Applications in Applied Mathematics and Economics; Statistics for Social and Behavioral Sciences; Springer: New York, NY, USA, 2018. [Google Scholar]
  5. Cicalese, F.; Gargano, L.; Vaccaro, U. Information theoretic measures of distances and their econometric applications. In Proceedings of the 2013 IEEE International Symposium on Information Theory, Istanbul, Turkey, 7–12 July 2013; pp. 409–413. [Google Scholar]
  6. Steele, J.M. The Cauchy-Schwarz Master Class; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  7. Bhatia, R. Matrix Analysis Graduate Texts in Mathematics; Springer: New York, NY, USA, 1997. [Google Scholar]
  8. Horn, R.A.; Johnson, C.R. Matrix Analysis, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  9. Ben-Bassat, M.; Raviv, J. Rényi’s entropy and probability of error. IEEE Trans. Inf. Theory 1978, 24, 324–331. [Google Scholar] [CrossRef]
  10. Cicalese, F.; Vaccaro, U. Bounding the average length of optimal source codes via majorization theory. IEEE Trans. Inf. Theory 2004, 50, 633–637. [Google Scholar] [CrossRef]
  11. Cicalese, F.; Gargano, L.; Vaccaro, U. How to find a joint probability distribution with (almost) minimum entropy given the marginals. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 2178–2182. [Google Scholar]
  12. Cicalese, F.; Gargano, L.; Vaccaro, U. Bounds on the entropy of a function of a random variable and their applications. IEEE Trans. Inf. Theory 2018, 64, 2220–2230. [Google Scholar] [CrossRef]
  13. Cicalese, F.; Vaccaro, U. Maximum entropy interval aggregations. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–20 June 2018; pp. 1764–1768. [Google Scholar]
  14. Harremoës, P. A new look on majorization. In Proceedings of the 2004 IEEE International Symposium on Information Theory and Its Applications, Parma, Italy, 10–13 October 2004; pp. 1422–1425. [Google Scholar]
  15. Ho, S.W.; Yeung, R.W. The interplay between entropy and variational distance. IEEE Trans. Inf. Theory 2010, 56, 5906–5929. [Google Scholar] [CrossRef]
  16. Ho, S.W.; Verdú, S. On the interplay between conditional entropy and error probability. IEEE Trans. Inf. Theory 2010, 56, 5930–5942. [Google Scholar] [CrossRef]
  17. Ho, S.W.; Verdú, S. Convexity/concavity of the Rényi entropy and α-mutual information. In Proceedings of the 2015 IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 745–749. [Google Scholar]
  18. Joe, H. Majorization, entropy and paired comparisons. Ann. Stat. 1988, 16, 915–925. [Google Scholar] [CrossRef]
  19. Joe, H. Majorization and divergence. J. Math. Anal. Appl. 1990, 148, 287–305. [Google Scholar] [CrossRef]
  20. Koga, H. Characterization of the smooth Rényi entropy using majorization. In Proceedings of the 2013 IEEE Information Theory Workshop, Seville, Spain, 9–13 September 2013; pp. 604–608. [Google Scholar]
  21. Puchala, Z.; Rudnicki, L.; Zyczkowski, K. Majorization entropic uncertainty relations. J. Phys. A Math. Theor. 2013, 46, 1–12. [Google Scholar] [CrossRef]
  22. Sason, I.; Verdú, S. Arimoto-Rényi conditional entropy and Bayesian M-ary hypothesis testing. IEEE Trans. Inf. Theory 2018, 64, 4–25. [Google Scholar] [CrossRef]
  23. Verdú, S. Information Theory, 2018; in preparation.
  24. Witsenhhausen, H.S. Some aspects of convexity useful in information theory. IEEE Trans. Inf. Theory 1980, 26, 265–271. [Google Scholar] [CrossRef]
  25. Xi, B.; Wang, S.; Zhang, T. Schur-convexity on generalized information entropy and its applications. In Information Computing and Applications; Lecture Notes in Computer Science; Springer: New York, NY, USA, 2011; Volume 7030, pp. 153–160. [Google Scholar]
  26. Inaltekin, H.; Hanly, S.V. Optimality of binary power control for the single cell uplink. IEEE Trans. Inf. Theory 2012, 58, 6484–6496. [Google Scholar] [CrossRef]
  27. Jorswieck, E.; Boche, H. Majorization and matrix-monotone functions in wireless communications. Found. Trends Commun. Inf. Theory 2006, 3, 553–701. [Google Scholar] [CrossRef]
  28. Palomar, D.P.; Jiang, Y. MIMO transceiver design via majorization theory. Found. Trends Commun. Inf. Theory 2006, 3, 331–551. [Google Scholar] [CrossRef]
  29. Roventa, I. Recent Trends in Majorization Theory and Optimization: Applications to Wireless Communications; Editura Pro Universitaria & Universitaria Craiova: Bucharest, Romania, 2015. [Google Scholar]
  30. Sezgin, A.; Jorswieck, E.A. Applications of majorization theory in space-time cooperative communications. In Cooperative Communications for Improved Wireless Network Transmission: Framework for Virtual Antenna Array Applications; Information Science Reference; Uysal, M., Ed.; IGI Global: Hershey, PA, USA, 2010; pp. 429–470. [Google Scholar]
  31. Viswanath, P.; Anantharam, V. Optimal sequences and sum capacity of synchronous CDMA systems. IEEE Trans. Inf. Theory 1999, 45, 1984–1993. [Google Scholar] [CrossRef]
  32. Viswanath, P.; Anantharam, V.; Tse, D.N.C. Optimal sequences, power control, and user capacity of synchronous CDMA systems with linear MMSE multiuser receivers. IEEE Trans. Inf. Theory 1999, 45, 1968–1983. [Google Scholar] [CrossRef] [Green Version]
  33. Viswanath, P.; Anantharam, V. Optimal sequences for CDMA under colored noise: A Schur-saddle function property. IEEE Trans. Inf. Theory 2002, 48, 1295–1318. [Google Scholar] [CrossRef]
  34. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 1961; pp. 547–561. [Google Scholar]
  35. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105. [Google Scholar] [CrossRef] [Green Version]
  36. Arikan, E.; Merhav, N. Guessing subject to distortion. IEEE Trans. Inf. Theory 1998, 44, 1041–1056. [Google Scholar] [CrossRef] [Green Version]
  37. Arikan, E.; Merhav, N. Joint source-channel coding and guessing with application to sequential decoding. IEEE Trans. Inf. Theory 1998, 44, 1756–1769. [Google Scholar] [CrossRef] [Green Version]
  38. Burin, A.; Shayevitz, O. Reducing guesswork via an unreliable oracle. IEEE Trans. Inf. Theory 2018, 64, 6941–6953. [Google Scholar] [CrossRef]
  39. Kuzuoka, S. On the conditional smooth Rényi entropy and its applications in guessing and source coding. arXiv, 2018; arXiv:1810.09070. [Google Scholar]
  40. Merhav, N.; Arikan, E. The Shannon cipher system with a guessing wiretapper. IEEE Trans. Inf. Theory 1999, 45, 1860–1866. [Google Scholar] [CrossRef] [Green Version]
  41. Sundaresan, R. Guessing based on length functions. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 716–719. [Google Scholar]
  42. Salamatian, S.; Beirami, A.; Cohen, A.; Médard, M. Centralized versus decentralized multi-agent guesswork. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 2263–2267. [Google Scholar]
  43. Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287. [Google Scholar] [CrossRef]
  44. Bracher, A.; Lapidoth, A.; Pfister, C. Distributed task encoding. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 1993–1997. [Google Scholar]
  45. Bunte, C.; Lapidoth, A. Encoding tasks and Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 5065–5076. [Google Scholar] [CrossRef]
  46. Shayevitz, O. On Rényi measures and hypothesis testing. In Proceedings of the 2011 IEEE International Symposium on Information Theory, Saint Petersburg, Russia, 31 July–5 August 2011; pp. 800–804. [Google Scholar]
  47. Tomamichel, M.; Hayashi, M. Operational interpretation of Rényi conditional mutual information via composite hypothesis testing against Markov distributions. In Proceedings of the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, 10–15 July 2016; pp. 585–589. [Google Scholar]
  48. Harsha, P.; Jain, R.; McAllester, D.; Radhakrishnan, J. The communication complexity of correlation. IEEE Trans. Inf. Theory 2010, 56, 438–449. [Google Scholar] [CrossRef]
  49. Liu, J.; Verdú, S. Rejection sampling and noncausal sampling under moment constraints. In Proceedings of the 2018 IEEE International Symposium on Information Theory, Vail, CO, USA, 17–22 June 2018; pp. 1565–1569. [Google Scholar]
  50. Yu, L.; Tan, V.Y.F. Wyner’s common information under Rényi divergence measures. IEEE Trans. Inf. Theory 2018, 64, 3616–3632. [Google Scholar] [CrossRef]
  51. Campbell, L.L. A coding theorem and Rényi’s entropy. Inf. Control 1965, 8, 423–429. [Google Scholar] [CrossRef]
  52. Courtade, T.; Verdú, S. Cumulant generating function of codeword lengths in optimal lossless compression. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2494–2498. [Google Scholar]
  53. Courtade, T.; Verdú, S. Variable-length lossy compression and channel coding: Non-asymptotic converses via cumulant generating functions. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2499–2503. [Google Scholar]
  54. Hayashi, M.; Tan, V.Y.F. Equivocations, exponents, and second-order coding rates under various Rényi information measures. IEEE Trans. Inf. Theory 2017, 63, 975–1005. [Google Scholar] [CrossRef]
  55. Kuzuoka, S. On the smooth Rényi entropy and variable-length source coding allowing errors. In Proceedings of the 2016 IEEE International Symposium on Information Theory, Barcelona, Spain, 10–15 July 2016; pp. 745–749. [Google Scholar]
  56. Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346. [Google Scholar] [CrossRef]
  57. Tan, V.Y.F.; Hayashi, M. Analysis of remaining uncertainties and exponents under various conditional Rényi entropies. IEEE Trans. Inf. Theory 2018, 64, 3734–3755. [Google Scholar] [CrossRef]
  58. Tyagi, H. Coding theorems using Rényi information measures. In Proceedings of the 2017 IEEE Twenty-Third National Conference on Communications, Chennai, India, 2–4 March 2017; pp. 1–6. [Google Scholar]
  59. Csiszár, I. Generalized cutoff rates and Rényi information measures. IEEE Trans. Inf. Theory 1995, 41, 26–34. [Google Scholar] [CrossRef]
  60. Polyanskiy, Y.; Verdú, S. Arimoto channel coding converse and Rényi divergence. In Proceedings of the Forty-Eighth Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, 29 September–1 October 2010; pp. 1327–1333. [Google Scholar]
  61. Sason, I. On the Rényi divergence, joint range of relative entropies, and a channel coding theorem. IEEE Trans. Inf. Theory 2016, 62, 23–34. [Google Scholar] [CrossRef]
  62. Yu, L.; Tan, V.Y.F. Rényi resolvability and its applications to the wiretap channel. In Lecture Notes in Computer Science, Proceedings of the 10th International Conference on Information Theoretic Security, Hong Kong, China, 29 November–2 December 2017; Springer: New York, NY, USA, 2017; Volume 10681, pp. 208–233. [Google Scholar]
  63. Arimoto, S. On the converse to the coding theorem for discrete memoryless channels. IEEE Trans. Inf. Theory 1973, 19, 357–359. [Google Scholar] [CrossRef]
  64. Arimoto, S. Information measures and capacity of order α for discrete memoryless channels. In Proceedings of the 2nd Colloquium on Information Theory, Keszthely, Hungary, 25–30 August 1975; Csiszár, I., Elias, P., Eds.; Colloquia Mathematica Societatis János Bolyai: Amsterdam, The Netherlands, 1977; Volume 16, pp. 41–52. [Google Scholar]
  65. Dalai, M. Lower bounds on the probability of error for classical and classical-quantum channels. IEEE Trans. Inf. Theory 2013, 59, 8027–8056. [Google Scholar] [CrossRef]
  66. Leditzky, F.; Wilde, M.M.; Datta, N. Strong converse theorems using Rényi entropies. J. Math. Phys. 2016, 57, 1–33. [Google Scholar] [CrossRef]
  67. Mosonyi, M.; Ogawa, T. Quantum hypothesis testing and the operational interpretation of the quantum Rényi relative entropies. Commun. Math. Phys. 2015, 334, 1617–1648. [Google Scholar] [CrossRef]
  68. Simic, S. Jensen’s inequality and new entropy bounds. Appl. Math. Lett. 2009, 22, 1262–1265. [Google Scholar] [CrossRef] [Green Version]
  69. Jelinek, F.; Schneider, K.S. On variable-length-to-block coding. IEEE Trans. Inf. Theory 1972, 18, 765–774. [Google Scholar] [CrossRef]
  70. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman and Company: New York, NY, USA, 1979. [Google Scholar]
  71. Boztaş, S. Comments on “An inequality on guessing and its application to sequential decoding”. IEEE Trans. Inf. Theory 1997, 43, 2062–2063. [Google Scholar] [CrossRef]
  72. Bracher, A.; Hof, E.; Lapidoth, A. Guessing attacks on distributed-storage systems. In Proceedings of the 2015 IEEE International Symposium on Information Theory, Hong Kong, China, 14–19 June 2015; pp. 1585–1589. [Google Scholar]
  73. Christiansen, M.M.; Duffy, K.R. Guesswork, large deviations, and Shannon entropy. IEEE Trans. Inf. Theory 2013, 59, 796–802. [Google Scholar] [CrossRef]
  74. Hanawal, M.K.; Sundaresan, R. Guessing revisited: A large deviations approach. IEEE Trans. Inf. Theory 2011, 57, 70–78. [Google Scholar] [CrossRef]
  75. Hanawal, M.K.; Sundaresan, R. The Shannon cipher system with a guessing wiretapper: General sources. IEEE Trans. Inf. Theory 2011, 57, 2503–2516. [Google Scholar] [CrossRef]
  76. Huleihel, W.; Salamatian, S.; Médard, M. Guessing with limited memory. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 2258–2262. [Google Scholar]
  77. Massey, J.L. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204. [Google Scholar]
  78. McEliece, R.J.; Yu, Z. An inequality on entropy. In Proceedings of the 1995 IEEE International Symposium on Information Theory, Whistler, BC, Canada, 17–22 September 1995; p. 329. [Google Scholar]
  79. Pfister, C.E.; Sullivan, W.G. Rényi entropy, guesswork moments and large deviations. IEEE Trans. Inf. Theory 2004, 50, 2794–2800. [Google Scholar] [CrossRef]
  80. De Santis, A.; Gaggia, A.G.; Vaccaro, U. Bounds on entropy in a guessing game. IEEE Trans. Inf. Theory 2001, 47, 468–473. [Google Scholar] [CrossRef]
  81. Yona, Y.; Diggavi, S. The effect of bias on the guesswork of hash functions. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 2253–2257. [Google Scholar]
  82. Gan, G.; Ma, C.; Wu, J. Data Clustering: Theory, Algorithms, and Applications; ASA-SIAM Series on Statistics and Applied Probability; SIAM: Philadelphia, PA, USA, 2007. [Google Scholar]
  83. Campbell, L.L. Definition of entropy by means of a coding problem. Probab. Theory Relat. Fields 1966, 6, 113–118. [Google Scholar] [CrossRef]
  84. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006. [Google Scholar]
  85. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  86. Kontoyiannis, I.; Verdú, S. Optimal lossless data compression: Non-asymptotics and asymptotics. IEEE Trans. Inf. Theory 2014, 60, 777–795. [Google Scholar] [CrossRef]
  87. Van Erven, T.; Harremoës, P. Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 2014, 60, 3797–3820. [Google Scholar] [CrossRef]
Figure 1. A plot of $c_\alpha^{(\infty)}(\rho)$ in (20) and (22) (logarithms are to the base 2) as a function of $\rho$, confirming numerically the properties in (21) and (23).
Figure 2. Plots of $c_\alpha^{(n)}(\rho)$ in (16) (logarithms are to the base 2) as a function of $\alpha > 0$, for $\rho = 2$ (left plot) and $\rho = 256$ (right plot), with several values of $n \geq 2$.
Figure 3. A plot of $c_1^{(\infty)}(\rho)$ in (22) versus $c_1^{(n)}(\rho)$ for finite $n$ ($n = 512, 128, 32$, and $8$) as a function of $\rho$.
Figure 4. Plots of the upper and lower bounds on $\frac{1}{k}\,\log_2 \frac{\mathbb{E}\left[g^{\rho}_{X^k}(X^k)\right]}{\mathbb{E}\left[g^{\rho}_{Y^k}(Y^k)\right]}$ in Theorem 3, as a function of $\rho > 0$, for random vectors of length $k = 100$ (left plot) or $k = 1000$ (right plot) in the setting of Example 1. Each plot shows the universal lower bound for an arbitrary deterministic $f \in \mathcal{F}_{128,16}$, and the achievable upper bound with the construction of the deterministic function $f \in \mathcal{F}_{128,16}$ (based on the Huffman algorithm) in Theorem 3 (see, respectively, (67) and (69)).