Article

Some Families of Jensen-like Inequalities with Application to Information Theory

Neri Merhav

The Viterbi Faculty of Electrical and Computer Engineering, Technion—Israel Institute of Technology, Technion City, Haifa 3200003, Israel
Entropy 2023, 25(5), 752; https://doi.org/10.3390/e25050752
Submission received: 4 April 2023 / Revised: 23 April 2023 / Accepted: 3 May 2023 / Published: 4 May 2023
(This article belongs to the Collection Feature Papers in Information Theory)

Abstract

It is well known that the traditional Jensen inequality is proved by lower bounding the given convex function, $f(x)$, by the tangential affine function that passes through the point $(E\{X\}, f(E\{X\}))$, where $E\{X\}$ is the expectation of the random variable $X$. While this tangential affine function yields the tightest lower bound among all lower bounds induced by affine functions that are tangential to $f$, it turns out that when the function $f$ is just part of a more complicated expression whose expectation is to be bounded, the tightest lower bound might belong to a tangential affine function that passes through a point different from $(E\{X\}, f(E\{X\}))$. In this paper, we take advantage of this observation by optimizing the point of tangency with regard to the specific expression at hand in a variety of cases, and thereby derive several families of inequalities, henceforth referred to as “Jensen-like” inequalities, which are new to the best knowledge of the author. The degree of tightness and the potential usefulness of these inequalities are demonstrated in several application examples related to information theory.

In memory of Jacob Ziv,
a shining star in the sky of information theory,
whose legacy as a researcher will continue to inspire me and many others
for years to come.

1. Introduction

As is well known, the Jensen inequality is one of the most fundamental and useful mathematical tools in a variety of fields, including information theory. Interestingly, it includes many other very well-known inequalities, which are important in their own right, as special cases. Among many examples, we mention the Cauchy–Schwarz inequality (which in turn supports uncertainty principles and the Cramér–Rao bound), the Lyapunov inequality, the Hölder inequality, and the inequalities among the harmonic, geometric and arithmetic means. In the field of information theory, the Jensen inequality stands at the basis of the information inequality (i.e., the non-negativity of the relative entropy), the data processing inequality (which in turn leads to the Fano inequality), and the inequality between conditional and unconditional entropies. Moreover, it plays a central role in support of the derivation of single-letter formulas in Shannon theory and in the theory of maximum entropy under moment constraints (see, for example, Chapter 12 of [1]).
During the last two decades, there have been many research efforts around Jensen’s inequality, which included refinements [2,3,4,5], variations [6,7,8], improvements [9,10,11], and extensions [12], just to name a few. There have also been many derivations of reversed versions of the Jensen inequality. For a non-exhaustive list of works, see, e.g., ref. [13] for mixtures of exponential families, refs. [14,15,16,17] for global bounds on the difference between the two sides of Jensen’s inequality, ref. [18] for functions of self-adjoint operators in Hilbert spaces, refs. [19,20] for inequalities via Green functions, refs. [21,22] for inequalities via Chebyshev and Chernoff bounds, ref. [23] for quantum Simpson’s and quantum Newton’s inequalities, and ref. [24] for new quantum Hermite–Hadamard-like inequalities. In most of them, the derived inequalities are exemplified in many applications, for instance, useful relationships between arithmetic and geometric means, converse bounds on the entropy, the relative entropy, as well as the more general f-divergence, converse forms of the Hölder inequality, and so on. In many of these works, the main results are given in the form of an upper bound on the difference, $E\{f(X)\} - f(E\{X\})$, where $f$ is a convex function, $E\{\cdot\}$ is the expectation operator, and $X$ is the random variable. However, those bounds depend mostly on global parameters associated with $f$, for example, its range and domain, but not particularly on the underlying probability function (probability density function in the continuous case, or probability mass function in the discrete case) of $X$. For one thing, a desirable property of a reverse Jensen inequality would be that it is tight when $X$ is well concentrated in the vicinity of its mean, just like the same well-known property of the ordinary Jensen inequality. In [22], there is an attempt to address this issue.
This paper revisits the Jensen inequality from a completely different angle. It is not meant to be another improvement of earlier bounds in an existing line of work. It is meant to propose a different approach for generating useful inequalities in the spirit of Jensen’s inequality. It is based on the following simple observation, which is rooted in the proof of Jensen’s inequality: the given convex function, $f(x)$, is lower bounded by the tangential affine function, $\ell(x) = f(a) + f'(a)(x - a)$, where $a$ is an arbitrary number in the domain of $x$ and $f'(a)$ is the derivative of $f$ at $x = a$ (provided that $f$ is differentiable at $x = a$). By selecting $a = E\{X\}$ and taking expectations of both sides of the inequality, $f(X) \ge \ell(X)$, the Jensen inequality is readily proved. The point to be remembered is that here, $a = E\{X\}$ is the optimal choice of $a$ in the sense of maximizing $E\{\ell(X)\}$ over all possible values of $a$, thus yielding the tightest lower bound within this class of lower bounds on $E\{f(X)\}$. The optimal choice of $a$, however, might be different from $E\{X\}$ when the function $f(X)$ is only a part of a more complicated expression whose expectation is to be lower bounded. For example, one might be interested in lower bounding $E\{g[f(X)]\}$, where $g$ is a monotonically non-decreasing function, or $E\{f(X)g(X)\}$, where $g$ is a non-negative and/or convex function, or a combination of both, etc.
To demonstrate this fact, consider the example (to be treated in detail in Section 2) of lower bounding $E\{f(X)g(X)\}$, where $g$ is a non-negative function. In this case,
$$E\{f(X)g(X)\} \ge E\{[f(a) + f'(a)(X - a)]\,g(X)\}, \tag{1}$$
and by maximizing the right-hand side (r.h.s.) over $a$, we easily obtain that the optimal choice of $a$ here is $a = E\{Xg(X)\}/E\{g(X)\}$, yielding the inequality,
$$E\{f(X)g(X)\} \ge f\!\left(\frac{E\{Xg(X)\}}{E\{g(X)\}}\right)\cdot E\{g(X)\}, \tag{2}$$
which is useful as long as $g$ is such that we can easily calculate both $E\{g(X)\}$ and $E\{Xg(X)\}$. While this particular inequality could have been obtained also by applying the (ordinary) Jensen inequality, $E\{f(X)\} \ge f(E\{X\})$, w.r.t. (with respect to) the density, $\tilde{p}(x) = p(x)g(x)/\int p(x)g(x)\,dx$, we will see in the sequel also various examples of inequalities with no apparent simple interpretations such as this. We henceforth refer to these classes of inequalities as Jensen-like inequalities, since they are derived using the same general idea that underlies the proof of the classical Jensen inequality. We will also demonstrate the usefulness of these inequalities in information theory.
Our contributions, in this work, have the following features:
  • In many cases (such as the one above), the optimal value of the parameter(s) (e.g., the parameter a in the above discussion) can be found in closed form. In other cases, the resulting expressions may not lend themselves to closed-form optimization, and then we have two possibilities: (i) carry out the optimization numerically, and (ii) select an arbitrary choice of a and obtain a valid lower bound, bearing in mind that an educated guess can potentially result in a good bound.
  • Our inequalities provide two types of bounds: (i) bounds that require the calculation of the first two moments (or equivalently, the first two cumulants) of X, and (ii) bounds that require the calculation of the moment-generating function (MGF) of X and its derivative, or equivalently, the cumulant-generating function (CGF) of X and its derivative. All these types of moments are often easily calculable in closed form, especially in situations where X is given by the sum of independent and identically distributed (i.i.d.) random variables, which is frequently encountered in information–theoretic applications.
  • Most of our derivations extend to convex functions of more than one variable.
  • The classes of Jensen-like inequalities that we consider allow enough flexibility to obtain derivations of lower bounds on functions that are not necessarily convex, and even for some concave functions, and thereby open the door for another route to reverse Jensen inequalities. This can be accomplished by representing the given function in one of the categories discussed (e.g., a product of a convex function and a non-negative function, a product of two non-negative convex functions, a composition of a monotone function and a convex function, etc.).
  • We demonstrate the utility of the Jensen-like inequalities in several examples of information–theoretic relevance. We also display numerical results that exemplify the degree of tightness of these bounds.
  • Our Jensen-like inequalities have the desirable property of becoming tighter as X becomes more and more concentrated around its mean, just like the ordinary Jensen inequality.
  • Throughout the paper, we confine ourselves to lower bounds on expectations of expressions that include a convex function f, but it should be understood that they all continue to apply also if f is concave and the inequalities are reversed.
  • It should be understood that the classes of Jensen-like inequalities that we derive in this work are just examples that demonstrate the basic underlying idea of optimizing the point of tangency to the given convex function for the specific expression at hand. It is conceivable that the same idea can be applied to many more situations of theoretical and practical interest.
In all forthcoming derivations, it will be assumed that the convex functions involved are weakly convex and differentiable. In other words, we will rely on the well-known fact that a differentiable convex function, $f(x)$, is nowhere below the supporting line, $\ell(x) = f(a) + f'(a)(x - a)$, for every value of the parameter $a$ in the domain of the independent variable, $x$ [25] (p. 69, eq. (3.2)). In order to show that the point of zero derivative of the lower bound (w.r.t. $a$) indeed yields a maximum (and not a minimum, etc.) of the lower bound, we will need to further assume that $f$ is twice differentiable, but such an assumption will not limit the applicability of the claimed lower bound, because the lower bound applies to any value of $a$, including the point of zero derivative, even if this point cannot be proved to yield the maximum of the lower bound using the standard methods. Similar comments apply when the lower bound depends on more than one parameter.
In the remaining part of this article, each section is devoted to a different class of Jensen-like inequalities, which corresponds to a different form of an expression that includes the convex function, f.

2. A Product of a Convex Function and a Non-Negative Function

In this section, we focus on lower bounding expressions of the form $E\{f(X)g(X)\}$, where $f$ is convex and $g$ is non-negative. Indeed, let $f:\mathbb{R}\to\mathbb{R}$ be a convex function and let $g:\mathbb{R}\to\mathbb{R}_+$ be a non-negative function. Then, for any $a\in\mathbb{R}$,
$$E\{f(X)g(X)\} \ge E\{[f(a) + f'(a)(X - a)]\,g(X)\} \tag{3}$$
$$= [f(a) - af'(a)]\,E\{g(X)\} + f'(a)\,E\{Xg(X)\}. \tag{4}$$
To find the value of $a$ that maximizes the r.h.s., we equate the derivative to zero and obtain:
$$[f'(a) - f'(a) - af''(a)]\,E\{g(X)\} + f''(a)\,E\{Xg(X)\} = 0 \tag{5}$$
or equivalently,
$$f''(a)\,[E\{Xg(X)\} - aE\{g(X)\}] = 0, \tag{6}$$
whose solution is readily obtained as
$$a = a^\star = \frac{E\{Xg(X)\}}{E\{g(X)\}}, \tag{7}$$
and it is easy to verify that the second derivative at $a = a^\star$ is $-f''(a^\star)E\{g(X)\} < 0$, which means that it is a maximum (at least a local one). The resulting lower bound on $E\{f(X)g(X)\}$ is then given by
$$E\{f(X)g(X)\} \ge f\!\left(\frac{E\{Xg(X)\}}{E\{g(X)\}}\right)\cdot E\{g(X)\}. \tag{8}$$
This result extends straightforwardly to the case where $X$ is a vector, provided that $f$ is jointly convex and differentiable in all components of $X$. In particular, it extends to the case where $f$ and $g$ act on different random variables, $X$ and $Y$, with a joint distribution:
$$E\{f(X)g(Y)\} \ge f\!\left(\frac{E\{Xg(Y)\}}{E\{g(Y)\}}\right)\cdot E\{g(Y)\}. \tag{9}$$
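The following snippet is not part of the original derivation; it is a minimal Monte Carlo sanity check of Inequality (8), assuming NumPy is available. The choices $f(x) = x^2$, $g(x) = e^{-x}$ and $X\sim\mathrm{Exp}(1)$ are arbitrary illustrative ones, and the script also evaluates the tangent-line bound (4) at the default point $a = E\{X\}$ to show the effect of optimizing the point of tangency.

```python
import numpy as np

# Monte Carlo sanity check of Inequality (8):
#   E{f(X)g(X)} >= f( E{Xg(X)}/E{g(X)} ) * E{g(X)},
# with the arbitrary choices f(x) = x^2 (convex), g(x) = exp(-x) (non-negative)
# and X ~ Exp(1).
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)

f, fprime = (lambda x: x**2), (lambda x: 2 * x)   # convex f and its derivative
g = lambda x: np.exp(-x)                          # non-negative g

Eg, EXg = np.mean(g(X)), np.mean(X * g(X))

def tangent_bound(a):
    # r.h.s. of (4): [f(a) - a f'(a)] E{g(X)} + f'(a) E{X g(X)}
    return (f(a) - a * fprime(a)) * Eg + fprime(a) * EXg

a_star = EXg / Eg                                      # optimal point of tangency, Eq. (7)
print("E{f(X)g(X)}     ≈", np.mean(f(X) * g(X)))       # exact value here is 1/4
print("bound at a*     ≈", tangent_bound(a_star))      # equals f(a*) E{g(X)}; 1/8 with exact moments
print("bound at a=E{X} ≈", tangent_bound(np.mean(X)))  # degrades to about 0 for this example
```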
We next consider several examples.
Example 1. 
Let $f(x) = -\ln x$ and $g(x) = x$, $x > 0$. Applying Inequality (8),
$$E\{-X\ln X\} \ge -E\{X\}\cdot\ln\frac{E\{X^2\}}{E\{X\}} = -E\{X\}\cdot\ln(E\{X\}) - E\{X\}\cdot\ln\!\left(1 + \frac{\mathrm{Var}\{X\}}{[E\{X\}]^2}\right). \tag{10}$$
Note that the function $-x\ln x$ is concave, rather than convex, yet we have here a lower bound (rather than an upper bound) to its expectation, namely, a reversed Jensen inequality. The first term on the right-most side is the (ordinary) Jensen upper bound on $E\{-X\ln X\}$, and the second term is the gap, which depends not only on the expectation of $X$ but also on its variance, which manifests the fluctuations around $E\{X\}$. Clearly, if $\mathrm{Var}\{X\} = 0$, the second term vanishes, which makes sense, because when $X$ is a degenerate random variable, Jensen’s inequality is achieved with equality and there is no gap. This inequality has an immediate application for obtaining a lower bound to the expectation of the empirical entropy of a sequence drawn from a memoryless source, which is relevant in the context of universal source coding [26]. Each term of the empirical entropy is of the form $-X\ln X$, where $X = N(u)/N$, $N(u)$ is the number of occurrences of a letter $u$ in a randomly drawn $N$-tuple from a memoryless source, $P$, with a finite alphabet, $\mathcal{U}$. Clearly, each $N(u)$ is a binomial random variable with $N$ trials and probability of success, $P(u)$. In this case, $E\{X\} = P(u)$ and $\mathrm{Var}\{X\} = P(u)[1 - P(u)]/N$. Thus, denoting the entropy and the empirical entropy, respectively, by
$$H = -\sum_{u\in\mathcal{U}} P(u)\ln P(u) \tag{11}$$
$$\hat{H} = -\sum_{u\in\mathcal{U}} \frac{N(u)}{N}\ln\frac{N(u)}{N}, \tag{12}$$
with the convention that $0\ln 0 = 0$, we have:
$$E\{\hat{H}\} \ge -\sum_{u\in\mathcal{U}} P(u)\ln P(u) - \sum_{u\in\mathcal{U}} P(u)\ln\!\left(1 + \frac{P(u)[1 - P(u)]/N}{P^2(u)}\right) = H - \sum_{u\in\mathcal{U}} P(u)\ln\!\left(1 + \frac{1 - P(u)}{NP(u)}\right)$$
$$\ge H - \sum_{u\in\mathcal{U}} P(u)\cdot\frac{1 - P(u)}{NP(u)} = H - \frac{1}{N}\sum_{u\in\mathcal{U}} [1 - P(u)] = H - \frac{|\mathcal{U}| - 1}{N}, \tag{13}$$
where $|\mathcal{U}|$ is the cardinality of $\mathcal{U}$. The use of the ordinary Jensen inequality yields an upper bound rather than a lower bound, $E\{\hat{H}\} \le H$. We conclude that the expected empirical entropy, $E\{\hat{H}\}$, is sandwiched between $H$ and $H - (|\mathcal{U}| - 1)/N$, which is reasonable because the variance of the empirical probabilities, $N(u)/N$, decays at the rate of $1/N$.
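As a quick illustration (not from the paper), the following sketch, assuming NumPy, estimates $E\{\hat{H}\}$ by Monte Carlo for an arbitrarily chosen source $P$ and block length $N$, and checks that it indeed falls between $H - (|\mathcal{U}|-1)/N$ and $H$.

```python
import numpy as np

# Monte Carlo illustration of the sandwich  H - (|U|-1)/N  <=  E{H_hat}  <=  H
# for the empirical entropy of a memoryless source.  The source P and the block
# length N below are arbitrary illustrative choices, not taken from the paper.
rng = np.random.default_rng(1)
P = np.array([0.5, 0.3, 0.15, 0.05])   # memoryless source over |U| = 4 letters
N, trials = 100, 200_000

counts = rng.multinomial(N, P, size=trials)          # N(u) for each trial
freqs = counts / N
with np.errstate(divide="ignore", invalid="ignore"):
    terms = -freqs * np.log(freqs)                   # -X ln X, term by term
emp_entropy = np.nansum(terms, axis=1)               # 0*ln(0) treated as 0

H = -np.sum(P * np.log(P))
print("H             =", H)
print("E{H_hat} (MC) =", emp_entropy.mean())
print("H - (|U|-1)/N =", H - (len(P) - 1) / N)
```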
Example 2. 
Let $s$ and $t$ be two real numbers whose difference, $s - t$, is either negative or larger than unity. Now, let $g(x) = x^t$ and $f(x) = x^{s-t}$. Then,
$$E\{X^s\} = E\{X^t\cdot X^{s-t}\} \ge \left(\frac{E\{X^{t+1}\}}{E\{X^t\}}\right)^{s-t}\cdot E\{X^t\} = \frac{(E\{X^{t+1}\})^{s-t}}{(E\{X^t\})^{s-t-1}}. \tag{14}$$
In particular, for $t = 1$ and $s\notin(1,2)$, this becomes
$$E\{X^s\} \ge \frac{(E\{X^2\})^{s-1}}{(E\{X\})^{s-2}} = [E\{X\}]^s\cdot\left(1 + \frac{\mathrm{Var}\{X\}}{[E\{X\}]^2}\right)^{s-1}, \tag{15}$$
which is, once again, a bound that depends only on the first two moments of $X$. For $s\in(0,1)$, the function $x^s$ is concave, and so, this is a reversed version of the Jensen inequality. For $s < 0$ and for $s > 2$, the function $x^s$ is convex, and so, this is an improved version of the Jensen inequality: while the first factor, $[E\{X\}]^s$, corresponds to the ordinary Jensen inequality, the second factor expresses the improvement, which depends on the relative fluctuation term, $\mathrm{Var}\{X\}/[E\{X\}]^2$. The degree of improvement depends, of course, on the variance of $X$. If the variance vanishes, there is nothing to improve because the ordinary Jensen inequality becomes an equality. On the other hand, the larger the variance, the larger the gap between the ordinary Jensen bound, $[E\{X\}]^s$, and the improved one. Accordingly, this also demonstrates the role of the optimization of the parameter $a$, as opposed to the default choice of $a = E\{X\}$ of the ordinary Jensen inequality.
To particularize this example even further, consider the problem of randomized guessing under a distribution $Q$ (see, e.g., [27] and many references therein). Then, the probability of a single successful guess of a discrete alphabet random variable, $X$, given that $X = x$ (a value known to us, but not to the guesser), is $Q(x)$. In sequential guessing until the first success, the number of guesses, $G$, is a geometric RV with parameter $p = Q(x)$, whose mean and variance are $1/p$ and $(1-p)/p^2$, respectively. For $s\notin(1,2)$,
$$E\{G^s\} \ge \frac{1}{p^s}\cdot\left(1 + \frac{(1-p)/p^2}{1/p^2}\right)^{s-1} = \frac{(2-p)^{s-1}}{p^s} = \frac{[2 - Q(x)]^{s-1}}{[Q(x)]^s}. \tag{16}$$
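A short numerical check of the last bound, under the illustrative (and arbitrary) choices $p = 0.3$ and $s = 3$; this is only a sketch assuming NumPy, not part of the original example.

```python
import numpy as np

# Check of the lower bound  E{G^s} >= (2-p)^(s-1) / p^s  for a geometric G with
# parameter p, for an exponent s outside (1, 2).  p and s are arbitrary choices.
rng = np.random.default_rng(2)
p, s = 0.3, 3.0                       # s > 2, so x^(s-1) is convex as required
G = rng.geometric(p, size=2_000_000)  # number of guesses until the first success

print("E{G^s} (MC) =", np.mean(G.astype(float)**s))
print("lower bound =", (2 - p)**(s - 1) / p**s)
```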
Example 3. 
Let $f$ be an arbitrary convex function and let $g(x) = e^{sx}$, where $s$ is a given real number. Then, Inequality (8) becomes:
$$E\{f(X)e^{sX}\} \ge f(\psi'(s))\cdot e^{\psi(s)}, \tag{17}$$
where
$$\psi(s) = \ln E\{e^{sX}\} \tag{18}$$
is the CGF of $X$ and $\psi'(s)$ is its derivative. This gives a lower bound in terms of the CGF of $X$ and its derivative. The ordinary Jensen inequality is obtained as the special case of $s = 0$, where $\psi(0) = 0$ and $\psi'(0) = E\{X\}$.
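The following sketch (not from the paper, assuming NumPy) checks Inequality (17) for the illustrative choices $f(x) = x^2$ and $X\sim\mathrm{Exp}(1)$, for which the CGF $\psi(s) = -\ln(1-s)$, $s < 1$, is available in closed form.

```python
import numpy as np

# Check of Inequality (17):  E{f(X) e^{sX}} >= f(psi'(s)) * e^{psi(s)},
# with f(x) = x^2 and X ~ Exp(1), whose CGF is psi(s) = -ln(1-s), s < 1,
# so that psi'(s) = 1/(1-s).  These choices are illustrative only.
rng = np.random.default_rng(3)
X = rng.exponential(1.0, size=2_000_000)
s = 0.25

psi = -np.log(1 - s)
psi_prime = 1 / (1 - s)
lhs = np.mean(X**2 * np.exp(s * X))
bound = psi_prime**2 * np.exp(psi)     # f(psi'(s)) e^{psi(s)} with f(x) = x^2

print("E{X^2 e^{sX}} (MC) =", lhs)     # exact value is 2/(1-s)^3 ≈ 4.74
print("Jensen-like bound  =", bound)   # 1/(1-s)^3 ≈ 2.37
```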

3. A Composition of a Monotone Function and a Convex Function

Another family of Jensen-like inequalities corresponds to the need to lower bound an expression of the form $E\{g[f(X)]\}$, where $f$ is convex as before and $g$ is a monotonically non-decreasing function. The general idea is to carry out the optimization of the r.h.s. of the following inequality:
$$E\{g[f(X)]\} \ge \sup_a E\{g[f(a) + f'(a)(X - a)]\}. \tag{19}$$
In the important special case where $g(x) = e^x$, we have:
$$E\{e^{f(X)}\} \ge \sup_a E\{e^{f(a) + f'(a)(X - a)}\} = \sup_a e^{f(a) - af'(a)}E\{e^{Xf'(a)}\} = \exp\left\{\sup_a\{f(a) - af'(a) + \psi[f'(a)]\}\right\}, \tag{20}$$
where $\psi(\cdot)$ is again the CGF of $X$. The optimal value, $a^\star$, of $a$, is the solution to the equation obtained by equating the derivative of the exponent to zero, i.e.,
$$\psi'[f'(a)] = a, \qquad \text{provided that } f''(a)\,\psi''[f'(a)] < 1, \tag{21}$$
where $\psi'(\cdot)$ and $\psi''(\cdot)$ are the first and the second derivatives of $\psi(\cdot)$, respectively.
Example 4. 
Consider the case where $f(x) = sx^2/2$ and $X\sim\mathcal{N}(\mu,\sigma^2)$, where $\sigma^2 < 1/s$, as otherwise, $E\{e^{sX^2/2}\} = \infty$. In this case, the condition $f''(a)\,\psi''[f'(a)] < 1$ is equivalent to $\sigma^2 < 1/s$, and we have $f'(a) = sa$, $\psi(t) = \mu t + \sigma^2 t^2/2$, and so, $\psi'(t) = \mu + \sigma^2 t$, which means that $\psi'[f'(a)] = \mu + \sigma^2 sa$. The equation for the optimal $a$ then becomes
$$\mu + \sigma^2 sa = a, \tag{22}$$
whose solution is
$$a = a^\star = \frac{\mu}{1 - \sigma^2 s}, \tag{23}$$
which yields
$$E\left\{e^{sX^2/2}\right\} \ge \exp\left\{s(a^\star)^2/2 - s(a^\star)^2 + \mu sa^\star + \sigma^2 s^2(a^\star)^2/2\right\} = \exp\left\{\frac{\mu^2 s}{2(1 - \sigma^2 s)}\right\}. \tag{24}$$
The ordinary Jensen inequality yields
$$E\left\{e^{sX^2/2}\right\} \ge \exp\left\{sE\{X^2\}/2\right\} = e^{s(\mu^2 + \sigma^2)/2}, \tag{25}$$
which does not capture the singularity at $s = 1/\sigma^2$. The exact calculation yields
$$E\left\{e^{sX^2/2}\right\} = \frac{1}{\sqrt{1 - \sigma^2 s}}\cdot\exp\left\{\frac{\mu^2 s}{2(1 - \sigma^2 s)}\right\}, \tag{26}$$
namely, the Jensen-like bound (24) gives the correct exponential term (along with the singularity at $s = 1/\sigma^2$) and differs from the exact quantity only in the pre-exponential factor. Once again, this demonstrates the fact that optimizing the point of tangency, $a$, rather than using the default value, $a = E\{X\}$, can make a significant difference.
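The three quantities of this example are easy to tabulate; the following minimal script (not part of the paper) evaluates the exact value (26), the Jensen-like bound (24) and the ordinary Jensen bound (25) for arbitrary illustrative values of $\mu$, $\sigma^2$ and $s$.

```python
import numpy as np

# Comparison of the Jensen-like bound (24), the ordinary Jensen bound (25) and
# the exact value (26) of E{e^{sX^2/2}} for X ~ N(mu, sigma^2), sigma^2 < 1/s.
# The numbers below (mu, sigma^2, s) are arbitrary illustrative choices.
mu, sigma2, s = 1.0, 0.5, 1.5          # sigma^2 * s = 0.75 < 1
assert sigma2 * s < 1

exact = np.exp(mu**2 * s / (2 * (1 - sigma2 * s))) / np.sqrt(1 - sigma2 * s)
jensen_like = np.exp(mu**2 * s / (2 * (1 - sigma2 * s)))     # bound (24)
jensen = np.exp(s * (mu**2 + sigma2) / 2)                    # bound (25)

print("exact      =", exact)
print("bound (24) =", jensen_like)   # correct exponent; misses only the prefactor
print("bound (25) =", jensen)        # ordinary Jensen; misses the singularity at s = 1/sigma^2
```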

4. A Product of a Convex Function and a Monotone-Convex Composition

Yet another class of Jensen-like inequalities corresponds to lower bounding the expectation of the product of two functions, where one is convex and the other is a composition of a non-negative, monotonically non-decreasing function and a convex function, i.e.,
$$E\{h[f(X)]g(X)\} \ge \sup_{a,b} E\{h[f(a) + f'(a)(X - a)]\cdot[g(b) + g'(b)(X - b)]\}, \tag{27}$$
where $f$ and $g$ are convex and $h$ is monotonically non-decreasing and non-negative. For the case where $h(x) = e^x$, we end up with a bound that depends on the CGF of $X$ and its derivative:
$$E\{e^{f(X)}g(X)\} \ge E\left\{e^{f(a) + f'(a)(X - a)}[g(b) + g'(b)(X - b)]\right\} \tag{28}$$
$$= e^{f(a) - af'(a)}E\left\{e^{Xf'(a)}[g(b) - bg'(b) + g'(b)X]\right\} \tag{29}$$
$$= \exp\{f(a) - af'(a) + \psi[f'(a)]\}\cdot\{g(b) + g'(b)(\psi'[f'(a)] - b)\}. \tag{30}$$
Maximizing with respect to $b$ while $a$ is kept fixed yields $b = \psi'[f'(a)]$, and we obtain:
$$E\{e^{f(X)}g(X)\} \ge \sup_a \exp\{f(a) - af'(a) + \psi[f'(a)]\}\cdot g(\psi'[f'(a)]). \tag{31}$$
Example 5. 
Considering the case where $f(x) = -\ln x$ and $g(x) = x\ln x$, we may obtain a reversed Jensen-like inequality, namely, a lower bound to the expectation of the concave function $\ln X$:
$$E\{\ln X\} = E\left\{e^{-\ln X}\cdot X\ln X\right\} \tag{32}$$
$$\ge \sup_{a\ge 0}\exp\{-\ln a + 1 + \psi(-1/a)\}\cdot\psi'(-1/a)\ln\psi'(-1/a) \tag{33}$$
$$= \sup_{\alpha\ge 0}\exp\{\ln\alpha + 1 + \psi(-\alpha)\}\,\psi'(-\alpha)\ln\psi'(-\alpha) \tag{34}$$
$$= e\cdot\sup_{\alpha\ge 0}\alpha e^{\psi(-\alpha)}\,\psi'(-\alpha)\ln\psi'(-\alpha) \tag{35}$$
$$= e\cdot\sup_{\alpha\ge 0}\alpha E\{Xe^{-\alpha X}\}\ln\frac{E\{Xe^{-\alpha X}\}}{E\{e^{-\alpha X}\}}. \tag{36}$$
Defining the MGF, $\phi(s) = E\{e^{sX}\} = e^{\psi(s)}$, we have:
$$E\{\ln X\} \ge e\cdot\sup_{\alpha\ge 0}\alpha\,\phi'(-\alpha)\ln\psi'(-\alpha) \tag{37}$$
$$= e\cdot\sup_{\alpha\ge 0}\alpha\,\phi(-\alpha)\,\psi'(-\alpha)\ln\psi'(-\alpha) \tag{38}$$
$$= e\cdot\sup_{\alpha\ge 0}\alpha\,\phi'(-\alpha)\ln\frac{\phi'(-\alpha)}{\phi(-\alpha)}. \tag{39}$$
We obtained a lower bound in terms of the MGF and its derivative (or, equivalently, the CGF and its derivative), which is appealing in cases where X is the sum of i.i.d. random variables.
Accordingly, we now particularize this example further by examining the case where $X = 1 + \sum_{i=1}^k Y_i^2$, with $Y_i\sim\mathcal{N}(0,\sigma^2)$, $i = 1,\ldots,k$, being independent random variables. The motivation for assessing an expression of the form, $E\left\{\ln\left(1 + \sum_{i=1}^k Y_i^2\right)\right\}$, is two-fold. The first is that it is useful for bounding the ergodic capacity of the single-input, multiple-output (SIMO) channel, where $\{Y_i\}$ designates random channel transfer coefficients (see, e.g., [22,28,29] and references therein). The second is that it is relevant for bounding the joint differential entropy associated with the multivariate Cauchy density. Here, $(Y_1,\ldots,Y_k)$ are not Gaussian as defined above, but their multivariate Cauchy density can be represented as a continuous mixture of i.i.d. zero-mean Gaussian random variables, where the mixture is taken over all possible variances—see [22] (Example 6) for the details. In this case,
$$\phi(s) = E\left\{\exp\left[s\left(1 + \sum_{i=1}^k Y_i^2\right)\right]\right\} \tag{40}$$
$$= e^s\left(E\{e^{sY^2}\}\right)^k \tag{41}$$
$$= \frac{e^s}{(1 - 2s\sigma^2)^{k/2}}, \qquad s < \frac{1}{2\sigma^2}. \tag{42}$$
Thus,
$$\psi(s) = s - \frac{k}{2}\ln(1 - 2s\sigma^2), \tag{43}$$
and
$$\psi'(s) = 1 + \frac{k\sigma^2}{1 - 2s\sigma^2}. \tag{44}$$
It follows that
$$E\left\{\ln\left(1 + \sum_{i=1}^k Y_i^2\right)\right\} \ge e\cdot\sup_{\alpha\ge 0}\left\{\alpha e^{-\alpha}(1 + 2\alpha\sigma^2)^{-k/2}\left(1 + \frac{k\sigma^2}{1 + 2\alpha\sigma^2}\right)\ln\left(1 + \frac{k\sigma^2}{1 + 2\alpha\sigma^2}\right)\right\}. \tag{45}$$
The Jensen upper bound, $\ln(1 + k\sigma^2)$, and the lower bound (45) are displayed in Figure 1 for $\sigma^2 = 1$ and $k = 1, 2, \ldots, 100$. As can be seen, the bounds are quite close. Interestingly, the choice $\alpha = 1/(k\sigma^2)$ yields results that are very close to those of the optimal $\alpha$.
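A sketch of how such a comparison can be computed (the grid over $\alpha$ and the values of $k$ below are arbitrary choices; this is not the exact script behind Figure 1):

```python
import numpy as np

# Evaluation of the lower bound (45) on E{ln(1 + sum_i Y_i^2)}, Y_i ~ N(0, sigma^2)
# i.i.d., together with the Jensen upper bound ln(1 + k*sigma^2).
sigma2 = 1.0
alphas = np.arange(1e-3, 10.0, 1e-3)   # search grid over alpha (arbitrary resolution)

def lower_bound_45(k):
    r = 1 + k * sigma2 / (1 + 2 * alphas * sigma2)
    vals = np.e * alphas * np.exp(-alphas) * (1 + 2 * alphas * sigma2)**(-k / 2) \
           * r * np.log(r)
    return vals.max()

for k in (1, 10, 100):
    print(f"k={k:4d}  Jensen UB = {np.log(1 + k * sigma2):7.4f}  "
          f"lower bound (45) = {lower_bound_45(k):7.4f}")
```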
Another instance of this example is the circularly symmetric complex Gaussian channel whose signal-to-noise ratio (SNR), Z, is a random variable (e.g., due to fading), which is known to both the transmitter and the receiver. The capacity is given by C = E { ln ( 1 + g Z ) } , where g is a certain deterministic gain factor and the expectation is with respect to the randomness of Z. For simplicity, let us assume that Z is distributed exponentially, i.e.,
$$p(z) = \begin{cases}\theta e^{-\theta z} & z\ge 0\\ 0 & z < 0\end{cases} \tag{46}$$
where the parameter $\theta > 0$ is given. In this case,
$$\phi(-\alpha) = \frac{\theta e^{-\alpha}}{\theta + g\alpha} \tag{47}$$
and
$$\psi(-\alpha) = \ln\theta - \ln(\theta + g\alpha) - \alpha, \tag{48}$$
and so,
$$E\{\ln(1 + gZ)\} \ge e\theta\cdot\sup_{\alpha\ge 0}\left\{\frac{\alpha e^{-\alpha}}{\theta + g\alpha}\cdot\left(1 + \frac{g}{\theta + g\alpha}\right)\ln\left(1 + \frac{g}{\theta + g\alpha}\right)\right\}. \tag{49}$$
In Figure 2, we plot this lower bound as a function of θ for g = 5 and compare it to the Jensen upper bound, ln ( 1 + g / θ ) (red curve) and to the lower bound of [22] (Sect. 4.1, Example 1). As can be seen, the lower bound proposed here is considerably tighter, especially for small θ .
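The following sketch (assuming NumPy; $\theta = 2$ is an arbitrary choice, while $g = 5$ matches the setting of Figure 2) evaluates the lower bound (49) on a grid of $\alpha$ and compares it with a Monte Carlo estimate and with the Jensen upper bound.

```python
import numpy as np

# Evaluation of the lower bound (49) on the ergodic capacity E{ln(1 + g*Z)} for
# an exponentially distributed SNR Z with parameter theta, next to the Jensen
# upper bound ln(1 + g/theta) and a Monte Carlo estimate.
rng = np.random.default_rng(4)
g, theta = 5.0, 2.0
alphas = np.arange(1e-3, 20.0, 1e-3)

r = 1 + g / (theta + g * alphas)
lb_49 = np.e * theta * np.max(alphas * np.exp(-alphas) / (theta + g * alphas)
                              * r * np.log(r))
Z = rng.exponential(1 / theta, size=1_000_000)

print("Monte Carlo        =", np.mean(np.log(1 + g * Z)))
print("lower bound (49)   =", lb_49)
print("Jensen upper bound =", np.log(1 + g / theta))
```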
Example 6. 
Yet another example of this family of Jensen-like inequalities applies to obtaining a lower bound to $E\{X^t\}$, where $t$ is an arbitrary real. For a given $t$, let $s\ge 0$ be either larger than $1 - t$ or smaller than $-t$, and consider the case where $f(x) = x^{t+s}$, $g(x) = -s\ln x$ and $h(x) = e^x$. Then,
$$E\{X^t\} = E\left\{e^{-s\ln X}X^{t+s}\right\} \tag{50}$$
$$\ge E\left\{\exp\left\{-s\left[\ln a + \frac{1}{a}(X - a)\right]\right\}\cdot\left[b^{t+s} + (t+s)b^{t+s-1}(X - b)\right]\right\} \tag{51}$$
$$= e^{s[1 - \ln a]}\,\phi\!\left(-\frac{s}{a}\right)\left\{b^{t+s} + (t+s)b^{t+s-1}\left[\psi'\!\left(-\frac{s}{a}\right) - b\right]\right\}. \tag{52}$$
Choosing $b = \psi'(-s/a)$, and changing the optimization variable $a$ into $\alpha = 1/a$, we obtain
$$E\{X^t\} \ge \sup_{\alpha\ge 0}(\alpha e)^s\,\phi(-\alpha s)\,[\psi'(-\alpha s)]^{t+s}. \tag{53}$$
More specifically, if $X = \sum_{i=1}^n Y_i$, where $\{Y_i\}$ are Bernoulli i.i.d. with parameter $p$, then $\phi(s) = (pe^s + q)^n$, where $q = 1 - p$. We then obtain
$$E\{X^t\} \ge \sup_{\alpha\ge 0}(\alpha e)^s(pe^{-\alpha s} + q)^n\cdot\left(\frac{npe^{-\alpha s}}{pe^{-\alpha s} + q}\right)^{t+s}. \tag{54}$$
Selecting $\alpha = 1/(np)$, we obtain
$$E\{X^t\} \ge (np)^t\cdot\frac{e^s(pe^{-s/(np)} + q)^n e^{-s(t+s)/(np)}}{(pe^{-s/(np)} + q)^{t+s}}. \tag{55}$$
The first factor is $(E\{X\})^t$. The second factor tends to unity as $n$ grows, because $pe^{-s/(np)} + q \approx p(1 - s/(np)) + q = 1 - s/n$, and so, $(pe^{-s/(np)} + q)^n \approx (1 - s/n)^n \approx e^{-s}$. For $t\ge 1$ and for $t\le 0$, the function $f(x) = x^t$ is convex, and so, $(E\{X\})^t$ is the ordinary Jensen lower bound. In this case, the bound is valuable if the multiplicative factor,
$$\frac{e^s(pe^{-s/(np)} + q)^n e^{-s(t+s)/(np)}}{(pe^{-s/(np)} + q)^{t+s}}, \tag{56}$$
is larger than unity. If $0 < t < 1$, the function $f(x) = x^t$ is concave, and then $(E\{X\})^t$ is an upper bound. Of course, the parameter $s$ can be optimized, too. Some numerical results for $t = 0.5$ are depicted in Figure 3. As can be seen, the upper and the lower bounds are fairly close.
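A minimal numerical check of the bound (54) for one value of $n$ (the script below is not the one behind Figure 3; $n = 50$ is an arbitrary choice, while the grids over $\alpha$ and $s$ follow the ranges mentioned in the caption of Figure 3):

```python
import numpy as np

# Check of the lower bound (54) on E{X^t} for X = sum of n i.i.d. Bernoulli(p)
# variables, with t = 0.5, next to the Jensen upper bound sqrt(np) and a Monte
# Carlo estimate.
rng = np.random.default_rng(5)
n, p, t = 50, 0.2, 0.5
q = 1 - p

alphas = np.arange(0.01, 10.0, 0.01)[:, None]      # column: grid over alpha
ss = np.arange(1 - t + 0.01, 10.0, 0.01)[None, :]  # row: grid over s, s > 1 - t
phi = (p * np.exp(-alphas * ss) + q)**n            # phi(-alpha*s)
psi_prime = n * p * np.exp(-alphas * ss) / (p * np.exp(-alphas * ss) + q)
lb_54 = np.max((alphas * np.e)**ss * phi * psi_prime**(t + ss))

X = rng.binomial(n, p, size=1_000_000)
print("E{X^t} (MC)        =", np.mean(X.astype(float)**t))
print("lower bound (54)   =", lb_54)
print("Jensen upper bound =", (n * p)**t)
```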
Another application of this example is related to estimation theory. Let $\theta\in\mathbb{R}$ and let $Y_1,\ldots,Y_n$ be i.i.d., with mean $\theta$ and variance $\sigma^2$. Consider the $t$-th moment of the estimation error, $E_\theta\left\{\left|\frac{1}{n}\sum_{i=1}^n Y_i - \theta\right|^t\right\}$. Defining $X = \left(\frac{1}{n}\sum_{i=1}^n Y_i - \theta\right)^2$, we have
$$\phi(s) = \frac{1}{\sqrt{1 - 2s\sigma^2/n}}; \qquad \psi(s) = -\frac{1}{2}\ln\left(1 - \frac{2s\sigma^2}{n}\right), \tag{57}$$
and so,
$$\phi(-\alpha s) = \frac{1}{\sqrt{1 + 2\alpha s\sigma^2/n}}; \qquad \psi'(-\alpha s) = \frac{\sigma^2/n}{1 + 2\alpha s\sigma^2/n}.$$
$$E_\theta\left\{\left|\frac{1}{n}\sum_{i=1}^n Y_i - \theta\right|^t\right\} = E_\theta\left\{X^{t/2}\right\} \ge \frac{(\alpha e)^s}{\sqrt{1 + 2\alpha s\sigma^2/n}}\left(\frac{\sigma^2/n}{1 + 2\alpha s\sigma^2/n}\right)^{t/2+s} \tag{58}$$
$$= \left(\frac{\sigma^2}{n}\right)^{t/2+s}\cdot\frac{(\alpha e)^s}{(1 + 2\alpha s\sigma^2/n)^{(t+1)/2+s}}, \tag{59}$$
with either $s\ge 1 - t/2$ or $s\le -t/2$. For $\alpha = \zeta n/\sigma^2$ ($\zeta > 0$ being a constant), we have:
$$E_\theta\left\{\left|\frac{1}{n}\sum_{i=1}^n Y_i - \theta\right|^t\right\} \ge \frac{\sigma^t}{n^{t/2}}\cdot\sup_{\zeta>0,\,s>1-t/2}\frac{(\zeta e)^s}{(1 + 2\zeta s)^{(t+1)/2+s}}, \tag{60}$$
where for $t\in[0,2]$, the first factor, $\sigma^t/n^{t/2}$, is the Jensen upper bound. The second factor,
$$\mu_t = \sup_{\zeta>0,\,s>1-t/2}\frac{(\zeta e)^s}{(1 + 2\zeta s)^{(t+1)/2+s}}, \tag{61}$$
is the gap between the Jensen upper bound and the proposed lower bound. In Figure 4, we display this factor. The result $\mu_2 = 1$ is expected, because for $t = 2$ and $s = 0$, the calculation is trivially exact. Note that the maximization over $\zeta$, for a given $s$, can be carried out in closed form by equating to zero the partial derivative of $\ln[(\zeta e)^s/(1 + 2\zeta s)^{(t+1)/2+s}]$ with respect to $\zeta$. The optimal $\zeta$ turns out to be equal to $1/(t+1)$ (independently of $s$), and so,
$$\mu_t = \sup_{s>1-t/2}\left(\frac{t+1}{t + 2s + 1}\right)^{(t+1)/2}\cdot\left(\frac{e}{t + 2s + 1}\right)^s. \tag{62}$$
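The gap factor (62) is a one-dimensional optimization and is easy to evaluate numerically; the following sketch (not from the paper) does so on a grid of $s$ that includes the left endpoint of the admissible range, so that $\mu_2 = 1$ is recovered.

```python
import numpy as np

# Evaluation of the gap factor mu_t of Equation (62), with the closed-form
# optimal zeta = 1/(t+1) already substituted, as displayed in Figure 4.
# The grid over s is an arbitrary discretization.
def mu(t, s_max=10.0, step=1e-3):
    s = np.arange(max(1 - t / 2, 0.0), s_max, step)   # admissible s, s >= max(1 - t/2, 0)
    vals = ((t + 1) / (t + 2 * s + 1))**((t + 1) / 2) * (np.e / (t + 2 * s + 1))**s
    return vals.max()

for t in (0.5, 1.0, 1.5, 2.0):
    print(f"t = {t:3.1f}   mu_t ≈ {mu(t):.4f}")       # mu_2 = 1, as expected
```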
Finally, it should be pointed out that this family of Jensen-like bounds opens the door also to lower-bound calculations for expressions of the form $E\{f(X)/g(X)\}$, where $f$ is non-negative and convex and $g$ is non-negative and concave. Using the identity $1/s = \int_0^\infty e^{-st}\,dt$, we have:
$$E\left\{\frac{f(X)}{g(X)}\right\} = E\left\{f(X)\cdot\int_0^\infty e^{-tg(X)}\,dt\right\} \tag{63}$$
$$= \int_0^\infty E\left\{e^{-tg(X)}f(X)\right\}dt, \tag{64}$$
and we can apply the same ideas as before to the integrand, having the freedom to optimize the bound parameters with possible dependence on t.

5. A Product of Two Non-Negative Convex Functions

The last family of Jensen-like bounds that we present in this work is associated with the product of two non-negative convex functions. Let both $f$ and $g$ be non-negative convex functions of $x\ge 0$. Then,
$$E\{f(X)g(X)\} \ge E\{[f(a) + f'(a)(X - a)]\cdot g(X)\} \tag{65}$$
$$= [f(a) - af'(a)]\,E\{g(X)\} + f'(a)\,E\{Xg(X)\} \tag{66}$$
$$\overset{f(a) - af'(a)\,\ge\,0}{\ge} [f(a) - af'(a)]\,E\{g(b) + g'(b)(X - b)\} + f'(a)\,E\{X[g(c) + g'(c)(X - c)]\} \tag{67}$$
$$= [f(a) - af'(a)]\cdot[g(b) - bg'(b) + g'(b)E\{X\}] + f'(a)\,[(g(c) - cg'(c))E\{X\} + g'(c)E\{X^2\}]. \tag{68}$$
The optimal $b$ and $c$ are $b = E\{X\}$ and $c = E\{X^2\}/E\{X\}$, respectively. Thus,
$$E\{f(X)g(X)\} \ge [f(a) - af'(a)]\cdot g(E\{X\}) + f'(a)\,E\{X\}\cdot g\!\left(\frac{E\{X^2\}}{E\{X\}}\right). \tag{69}$$
Let
$$a^\star = \frac{E\{X\}\cdot g(E\{X^2\}/E\{X\})}{g(E\{X\})} \tag{70}$$
and assume that $f(a^\star) - a^\star f'(a^\star)\ge 0$. Then, $a^\star$ is the optimal value of $a$, which yields
$$E\{f(X)g(X)\} \ge f\!\left(\frac{E\{X\}\cdot g(E\{X^2\}/E\{X\})}{g(E\{X\})}\right)\cdot g(E\{X\}). \tag{71}$$
More generally, when $X$ and $Y$ are two random variables with a joint distribution, the above derivation easily extends to
$$E\{f(X)g(Y)\} \ge f\!\left(\frac{E\{X\}\cdot g(E\{XY\}/E\{X\})}{g(E\{Y\})}\right)\cdot g(E\{Y\}). \tag{72}$$
If f and g are both concave, rather than convex, then the inequalities are reversed.
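The following Monte Carlo sketch (not from the paper, assuming NumPy) checks the bound (71) for the arbitrary illustrative choices $f(x) = e^x$, $g(x) = x^2$ and $X\sim\mathrm{Uniform}(0,1)$, for which the side condition $f(a^\star) - a^\star f'(a^\star)\ge 0$ holds since $a^\star < 1$.

```python
import numpy as np

# Monte Carlo check of the bound (71) for a product of two non-negative convex
# functions:  E{f(X)g(X)} >= f( E{X} g(E{X^2}/E{X}) / g(E{X}) ) * g(E{X}).
rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, size=1_000_000)

f = np.exp              # non-negative convex, with f'(x) = e^x = f(x)
g = lambda x: x**2      # non-negative convex

EX, EX2 = X.mean(), np.mean(X**2)
a_star = EX * g(EX2 / EX) / g(EX)              # Eq. (70)
assert f(a_star) - a_star * f(a_star) >= 0     # side condition; here a* < 1

print("E{f(X)g(X)} (MC) =", np.mean(f(X) * g(X)))   # exact value is e - 2 ≈ 0.718
print("bound (71)       =", f(a_star) * g(EX))
```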
Example 7. 
Consider again the example of the capacity of the AWGN channel with a random SNR, $c(Z) = \ln(1 + gZ)$, and suppose that we wish to bound the variance of $c(Z)$ in order to assess the fluctuations (e.g., for the purpose of bounding the outage probability). Then, obviously,
$$\mathrm{Var}\{c(Z)\} = E\{c^2(Z)\} - [E\{c(Z)\}]^2 = E\{\ln^2(1 + gZ)\} - [E\{\ln(1 + gZ)\}]^2. \tag{73}$$
To upper bound $\mathrm{Var}\{c(Z)\}$, we may derive an upper bound to $E\{\ln^2(1 + gZ)\}$ and a lower bound to $E\{\ln(1 + gZ)\}$. For the latter, a lower bound was already proposed earlier in Example 5. For the former, we may use the present inequality with the choice $f(z) = g(z) = \ln(1 + gz)$, which can easily be shown to satisfy the requirements. We then obtain the following upper bound, which depends merely on the first two moments of $Z$:
$$E\{\ln^2(1 + gZ)\} \le \ln(1 + gE\{Z\})\cdot\ln\!\left(1 + gE\{Z\}\cdot\frac{\ln(1 + gE\{Z^2\}/E\{Z\})}{\ln(1 + gE\{Z\})}\right). \tag{74}$$
Interestingly, the function $\ln^2(1 + gx)$ is neither convex nor concave, yet our approach offers an upper bound, which is fairly easy to calculate provided that one can compute the first two moments of $Z$.
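Putting the pieces of this example together, the following sketch (not from the paper) combines the upper bound (74) on the second moment with the lower bound (49) on the first moment to produce an upper bound on $\mathrm{Var}\{\ln(1 + gZ)\}$ for an exponentially distributed $Z$; the values of $g$ and $\theta$ are arbitrary, and the resulting bound is valid though not necessarily tight.

```python
import numpy as np

# Sketch of the variance bound of Example 7 for an exponentially distributed SNR:
#   Var{ln(1+gZ)} <= [UB (74) on E{ln^2(1+gZ)}] - [LB (49) on E{ln(1+gZ)}]^2.
rng = np.random.default_rng(7)
g, theta = 5.0, 2.0
EZ, EZ2 = 1 / theta, 2 / theta**2          # first two moments of Z ~ Exp(theta)

# upper bound (74) on E{ln^2(1+gZ)}
ub_2nd = np.log(1 + g * EZ) * np.log(1 + g * EZ * np.log(1 + g * EZ2 / EZ)
                                     / np.log(1 + g * EZ))
# lower bound (49) on E{ln(1+gZ)}
alphas = np.arange(1e-3, 20.0, 1e-3)
r = 1 + g / (theta + g * alphas)
lb_1st = np.e * theta * np.max(alphas * np.exp(-alphas) / (theta + g * alphas)
                               * r * np.log(r))

Z = rng.exponential(1 / theta, size=1_000_000)
print("Var{ln(1+gZ)} (MC) =", np.var(np.log(1 + g * Z)))
print("upper bound        =", ub_2nd - lb_1st**2)
```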

6. Conclusions

In this work, we have revisited the Jensen inequality on the basis of taking advantage of the freedom to optimize the choice of the supporting line that is tangential to the given convex function. This optimal choice might be different from the ordinary one when the convex function does not stand alone, but is rather only part of a more complicated expression. This more complicated expression can sometimes be created in an artificial manner, such as in Examples 2, 5 and 6. The resulting bounds depend on either the first two moments of the independent variable, X, or on its MGF and its derivative. Both types of moments often lend themselves to relatively easy calculations. The proposed methodology can be used both for improving on the ordinary Jensen inequality (such as in Examples 2 and 4), and for generating lower bounds to expectations of non-convex or even concave (rather than convex) functions (such as in Examples 1, 2, 5 and 7). Several families of Jensen-like inequalities have been derived, along with a demonstration of numerical examples with application to information theory. The tightness of the inequalities obtained was also demonstrated in those examples.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  2. Xiao, L.; Lu, G. A new refinement of Jensen’s inequality with applications in information theory. Open Math. 2020, 18, 1748–1759.
  3. Deng, Y.; Ullah, H.; Khan, M.A.; Iqbal, S.; Wu, S. Refinements of Jensen’s inequality via majorization results with applications in information theory. Hindawi J. Math. 2021, 2021, 1951799.
  4. Wu, S.; Khan, M.A.; Saeed, T.; Sayed, Z.M.M.M. A refined Jensen inequality connected to an arbitrary positive finite sequence. Mathematics 2022, 10, 4817.
  5. Sayyari, Y.; Barsam, H.; Sattarzadeh, A.R. On new refinement of the Jensen inequality using uniformly convex functions with applications. Appl. Anal. 2023.
  6. Jaafari, E.; Asgari, M.S.; Hosseini, M.S.; Moosavi, B. On the Jensen’s inequality and its variants. AIMS Math. 2020, 5, 1177–1185.
  7. Matković, A.; Pečarić, J. A variant of Jensen’s inequality for convex functions of several variables. J. Math. 2007, 1, 45–51.
  8. Bakula, M.K.; Matković, A.; Pečarić, J. On a variant of Jensen’s inequality for functions of nondecreasing increments. J. Korean Math. Soc. 2008, 45, 821–834.
  9. Seuret, A.; Gouaisbaut, F. Reducing the Gap of Jensen’s Inequality by Using Wirtinger Inequality. Preprint Submitted to Automatica. 12 July 2012. Available online: https://www.semanticscholar.org/paper/Reducing-the-gap-of-Jensen’s-inequality-by-using-Seuret-Gouaisbaut/1751273aa96d157ee143e3e7212fa04e1798ef11 (accessed on 23 March 2023).
  10. Walker, S.G. On a lower bound for the Jensen inequality. SIAM J. Math. Anal. 2014, 46, 3151–3157.
  11. Liao, J.G.; Berg, A. Sharpening Jensen’s inequality. Am. Stat. 2019, 73, 278–281.
  12. Simić, S.; Almohsen, B. Some generalizations of Jensen’s inequality. Contemp. Math. 2021, 2, 1–14.
  13. Jebara, T.; Pentland, A. On reversing Jensen’s inequality. In Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS 2000), Denver, CO, USA, 1 January 2000; pp. 213–219.
  14. Budimir, I.; Dragomir, S.S.; Pečarić, J. Further reverse results for Jensen’s discrete inequality and applications in information theory. J. Inequalities Pure Appl. Math. 2001, 2, 1–14.
  15. Simić, S. On an upper bound for Jensen’s inequality. J. Inequalities Pure Appl. Math. 2009, 10, 60.
  16. Simić, S. On a new converse of Jensen’s inequality. Publ. L’inst. Math. 2009, 85, 107–110.
  17. Dragomir, S.S. Some reverses of the Jensen inequality with applications. Bull. Aust. Math. Soc. 2013, 87, 177–194.
  18. Dragomir, S.S. Some reverses of the Jensen inequality for functions of selfadjoint operators in Hilbert spaces. J. Inequalities Appl. 2010, 2010, 496821.
  19. Khan, S.; Khan, M.A.; Chu, Y.-M. New converses of Jensen inequality via Green functions with applications. Rev. Real Acad. Cienc. Exactas Fis. Nat. Ser. A Mat. 2020, 114, 1–14.
  20. Khan, S.; Khan, M.A.; Chu, Y.-M. Converses of Jensen inequality derived from the Green functions with applications in information theory. Math. Methods Appl. Sci. 2020, 43, 2577–2587.
  21. Wunder, G.; Groß, B.; Fritschek, R.; Schaefer, R.F. A reverse Jensen inequality result with application to mutual information estimation. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW 2021), Kanazawa, Japan, 17–21 October 2021; Available online: https://arxiv.org/pdf/2111.06676.pdf (accessed on 25 March 2023).
  22. Merhav, N. Reversing Jensen’s inequality for information–theoretic analyses. Information 2022, 13, 39.
  23. Ali, M.A.; Budak, H.; Zhang, Z. A new extension of quantum Simpson’s and quantum Newton’s inequalities for quantum differentiable convex functions. Math. Methods Appl. Sci. 2022, 45, 1845–1863.
  24. Budak, H.; Ali, M.A.; Tarhanaci, M. Some new quantum Hermite-Hadamard like inequalities for co-ordinated convex functions. J. Optim. Appl. 2020, 186, 899–910.
  25. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004.
  26. Krichevsky, R.E.; Trofimov, V.K. The performance of universal encoding. IEEE Trans. Inform. Theory 1981, 27, 199–207.
  27. Merhav, N.; Cohen, A. Universal randomized guessing with application to asynchronous decentralized brute-force attacks. IEEE Trans. Inform. Theory 2020, 66, 114–129.
  28. Dong, A.; Zhang, H.; Wu, D.; Yuan, D. Logarithmic expectation of the sum of exponential random variables for wireless communication performance evaluation. In Proceedings of the 2015 IEEE 82nd Vehicular Technology Conference (VTC2015-Fall), Boston, MA, USA, 6–9 September 2015.
  29. Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
Figure 1. Upper and lower bounds on $E\left\{\ln\left(1 + \sum_{i=1}^k Y_i^2\right)\right\}$, where $Y_i\sim\mathcal{N}(0,\sigma^2)$ are i.i.d., for $\sigma^2 = 1$ and $k = 1, 2, \ldots, 100$. The red curve is the upper bound, $\ln(1 + k\sigma^2)$, which is obtained by applying the ordinary Jensen inequality. The blue curve is the lower bound of Equation (45), where the search over $\alpha$ was carried out with a resolution of 0.001.
Figure 2. Upper and lower bounds on $E\{\ln(1 + gZ)\}$, where $Z$ is distributed exponentially with parameter $\theta$, as functions of $\theta$, for $g = 5$. The red curve is the upper bound, $\ln(1 + g/\theta)$, obtained by applying the ordinary Jensen inequality. The blue curve is the lower bound of Equation (49), where the search over $\alpha$ was carried out with a resolution of 0.001. The green curve is the lower bound of [22] (Example 1).
Figure 3. Upper and lower bounds on $E\left\{\sqrt{\sum_{t=1}^n Y_t}\right\}$ as functions of $n$, where $\{Y_t\}$ are i.i.d., Bernoulli(0.2). The red curve is the Jensen upper bound, $\sqrt{np}$, and the blue curve is the proposed lower bound, where $\alpha$ is optimized in the range $[0, 10]$ and $s$ is optimized in the range $[0.5, 10]$, both with a resolution of 0.01.
Figure 4. The gap factor, $\mu_t$, as a function of $t$. The parameter $s$ is optimized in the range $[1 - t/2, 10]$ with a resolution of 0.001.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

