Article

Refinement of Discrete Lah–Ribarič Inequality and Applications on Csiszár Divergence

by Đilda Pečarić 1, Josip Pečarić 2 and Jurica Perić 3,*
1 Department of Media and Communication, University North, 48000 Koprivnica, Croatia
2 Croatian Academy of Sciences and Arts, 10000 Zagreb, Croatia
3 Department of Mathematics, Faculty of Science, University of Split, 21000 Split, Croatia
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(5), 755; https://doi.org/10.3390/math10050755
Submission received: 24 January 2022 / Revised: 17 February 2022 / Accepted: 23 February 2022 / Published: 26 February 2022
(This article belongs to the Special Issue Mathematical Inequalities with Applications)

Abstract

In this paper, we give a new refinement of the Lah–Ribarič inequality and, using the same technique, a refinement of the Jensen inequality. Using these results, a refinement of the discrete Hölder inequality and refinements of some inequalities for discrete weighted power means and discrete weighted quasi-arithmetic means are obtained. We also give applications in information theory; namely, we give some interesting estimations for the discrete Csiszár divergence and for its important special cases.

1. Introduction

Research on classical inequalities, such as the Jensen and Hölder inequalities, has expanded greatly. These inequalities first appeared in discrete and integral forms, and many generalizations and improvements have since been proved (see, for instance, [1,2]). Lately, they have also proven to be very useful in information theory (see, for instance, [3]).
Let $I$ be an interval in $\mathbb{R}$ and $f:I\to\mathbb{R}$ a convex function. If $\mathbf{x}=(x_1,\ldots,x_n)$ is any $n$-tuple in $I^n$ and $\mathbf{p}=(p_1,\ldots,p_n)$ a nonnegative $n$-tuple such that $P_n=\sum_{i=1}^n p_i>0$, then the well-known Jensen inequality
$$f\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i\right)\le\frac{1}{P_n}\sum_{i=1}^n p_if(x_i)\qquad(1)$$
holds (see [4,5] or, for example, [6] (p. 43)). If $f$ is strictly convex, then (1) is strict unless $x_i=c$ (for some $c\in I$) for all $i\in\{j:p_j>0\}$.
Jensen’s inequality is one of the most famous inequalities in convex analysis, and many other well-known inequalities (such as Hölder’s inequality, the A-G-H inequality, etc.) are special cases of it. Besides mathematics, it has many applications in statistics, information theory, and engineering.
Strongly related to Jensen’s inequality is the Lah–Ribarič inequality (see [7]),
$$\frac{1}{P_n}\sum_{i=1}^n p_if(x_i)\le\frac{M-\bar{x}}{M-m}f(m)+\frac{\bar{x}-m}{M-m}f(M),\qquad(2)$$
which holds when $f:I\to\mathbb{R}$ is a convex function on $I$, $m,M\in I$, $-\infty<m<M<+\infty$, $\mathbf{p}$ is as in (1), $\mathbf{x}=(x_1,\ldots,x_n)$ is any $n$-tuple in $[m,M]^n$ and $\bar{x}=\frac{1}{P_n}\sum_{i=1}^n p_ix_i$. If $f$ is strictly convex, then (2) is strict unless $x_i\in\{m,M\}$ for all $i\in\{j:p_j>0\}$.
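Both inequalities are easy to check numerically. The following sketch verifies (1) and (2) for the convex function $f(x)=x^2$; the weights, points and bounds used here are arbitrary illustrative assumptions, not data from the paper.

```python
# Minimal numerical sanity check of Jensen's inequality (1) and the
# Lah-Ribaric inequality (2) for the convex function f(x) = x**2.

def f(x):
    return x ** 2

p = [0.5, 1.0, 2.0, 1.5]          # nonnegative weights, P_n > 0
x = [0.2, 0.7, 1.3, 1.9]          # points in [m, M]
m, M = 0.0, 2.0                   # bounds with x_i in [m, M]

P = sum(p)
xbar = sum(pi * xi for pi, xi in zip(p, x)) / P
avg_f = sum(pi * f(xi) for pi, xi in zip(p, x)) / P

# Jensen (1): f(xbar) <= weighted average of f
assert f(xbar) <= avg_f

# Lah-Ribaric (2): weighted average of f <= chord of f over [m, M] at xbar
lr_bound = (M - xbar) / (M - m) * f(m) + (xbar - m) / (M - m) * f(M)
assert avg_f <= lr_bound

print(f"f(xbar) = {f(xbar):.4f} <= {avg_f:.4f} <= {lr_bound:.4f}")
```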
The Lah–Ribarič inequality has been extensively investigated, and the interested reader can find many related results in the recent literature as well as in monographs such as [6,8,9]. It is therefore of interest to find further refinements of this inequality.
Our main result is a refinement of inequality (2).
Using the same technique, we also give a refinement of inequality (1) (see [10]).
In addition, we deal with the notion of f-divergences, which measure the distance between two probability distributions. One of the most important is the Csiszár f-divergence, whose special cases include the Shannon entropy, Jeffrey’s distance, the Kullback–Leibler divergence, the Hellinger distance, and the Bhattacharyya distance. We deduce the corresponding relations for these f-divergences.
Let us say a few words about the organization of the paper. In the following section, we give a new refinement of the Lah–Ribarič inequality and, using the same technique, state a refinement of the Jensen inequality. Using the obtained results, we give a refinement of the famous Hölder inequality and some new refinements for the weighted power means and quasi-arithmetic means. In addition, we give a historical remark regarding the Jensen–Boas inequality. In Section 3, we give the results for various f-divergences. These are further examined for the Zipf–Mandelbrot law.

2. New Refinements

The starting point of this consideration is the following lemma (see [11]).
Lemma 1.
Let $f$ be a convex function on an interval $I$. If $a,b,c,d\in I$ are such that $a\le b<c\le d$, then the inequality
$$\frac{c-u}{c-b}f(b)+\frac{u-b}{c-b}f(c)\le\frac{d-u}{d-a}f(a)+\frac{u-a}{d-a}f(d)$$
holds for any $u\in[b,c]$.
The main result is a refinement of the Lah–Ribarič inequality (2). As we will see, its proof is based on the idea from the proof of the Jensen–Boas inequality.
Theorem 1.
Let $f:I\to\mathbb{R}$ be a convex function on $I$, $m,M\in I$, $-\infty<m<M<+\infty$, let $\mathbf{p}$ be as in (1), and let $\mathbf{x}=(x_1,\ldots,x_n)$ be any $n$-tuple in $[m,M]^n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}p_j>0$ for $i=1,\ldots,m$, and $m_i=\min\{x_j:j\in N_i\}$, $M_i=\max\{x_j:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$\frac{1}{P_n}\sum_{i=1}^n p_if(x_i)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left[\frac{M_i-\bar{x}_i}{M_i-m_i}f(m_i)+\frac{\bar{x}_i-m_i}{M_i-m_i}f(M_i)\right]\le\frac{M-\bar{x}}{M-m}f(m)+\frac{\bar{x}-m}{M-m}f(M)\qquad(3)$$
holds, where
$$\bar{x}=\frac{1}{P_n}\sum_{i=1}^n p_ix_i,\qquad\bar{x}_i=\frac{1}{\sum_{j\in N_i}p_j}\sum_{j\in N_i}p_jx_j.$$
If $f$ is concave on $I$, then the inequalities in (3) are reversed.
Proof. 
We have
$$\frac{1}{P_n}\sum_{i=1}^n p_if(x_i)=\frac{1}{P_n}\sum_{i=1}^m\sum_{j\in N_i}p_jf(x_j)=\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\frac{1}{\sum_{j\in N_i}p_j}\sum_{j\in N_i}p_jf(x_j).$$
Using the Lah–Ribarič inequality (2) for each of the subsets $N_i$, we obtain
$$\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\frac{1}{\sum_{j\in N_i}p_j}\sum_{j\in N_i}p_jf(x_j)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left[\frac{M_i-\bar{x}_i}{M_i-m_i}f(m_i)+\frac{\bar{x}_i-m_i}{M_i-m_i}f(M_i)\right].$$
Using $m\le m_i\le\bar{x}_i\le M_i\le M$, $m<M$, $m_i<M_i$ and Lemma 1 (with $a=m$, $b=m_i$, $c=M_i$, $d=M$, $u=\bar{x}_i$), we obtain
$$\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left[\frac{M_i-\bar{x}_i}{M_i-m_i}f(m_i)+\frac{\bar{x}_i-m_i}{M_i-m_i}f(M_i)\right]\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left[\frac{M-\bar{x}_i}{M-m}f(m)+\frac{\bar{x}_i-m}{M-m}f(M)\right]=\frac{M-\bar{x}}{M-m}f(m)+\frac{\bar{x}-m}{M-m}f(M).\qquad\square$$
Remark 1.
If $N_i=\{j\}$ (that is, $|N_i|=1$), the related term in the sum on the right-hand side of the first inequality in the proof of Theorem 1 remains unaltered (i.e., it is equal to $f(x_j)$).
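A quick numerical illustration of the chain (3) may be useful. The sketch below uses $f(x)=e^x$, an arbitrary positive weight vector and an arbitrary two-set partition of the indices; all concrete values are assumptions chosen only for this demonstration.

```python
# Illustrative numerical check of the refinement (3) from Theorem 1,
# using f(x) = exp(x) and an arbitrary partition {N_1, N_2} of the indices.
import math

def f(x):
    return math.exp(x)

p = [1.0, 2.0, 0.5, 1.5, 1.0]
x = [0.1, 0.4, 0.9, 1.4, 1.8]
m, M = 0.0, 2.0
groups = [[0, 1, 2], [3, 4]]      # N_1, N_2: disjoint, covering all indices

P = sum(p)
xbar = sum(pi * xi for pi, xi in zip(p, x)) / P

lhs = sum(pi * f(xi) for pi, xi in zip(p, x)) / P

mid = 0.0
for N in groups:
    Pg = sum(p[j] for j in N)
    xbar_g = sum(p[j] * x[j] for j in N) / Pg
    m_g, M_g = min(x[j] for j in N), max(x[j] for j in N)
    mid += Pg * ((M_g - xbar_g) / (M_g - m_g) * f(m_g)
                 + (xbar_g - m_g) / (M_g - m_g) * f(M_g))
mid /= P

rhs = (M - xbar) / (M - m) * f(m) + (xbar - m) / (M - m) * f(M)

assert lhs <= mid <= rhs
print(f"{lhs:.4f} <= {mid:.4f} <= {rhs:.4f}")
```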
Using the same technique, we obtain the following refinement of the Jensen inequality (1).
Theorem 2.
Let $I$ be an interval in $\mathbb{R}$ and $f:I\to\mathbb{R}$ a convex function. Let $\mathbf{x}=(x_1,\ldots,x_n)$ be any $n$-tuple in $I^n$ and $\mathbf{p}=(p_1,\ldots,p_n)$ a nonnegative $n$-tuple such that $P_n=\sum_{i=1}^n p_i>0$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}p_j>0$, $i=1,\ldots,m$. Then
$$f\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i\right)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)f\left(\frac{\sum_{j\in N_i}p_jx_j}{\sum_{j\in N_i}p_j}\right)\le\frac{1}{P_n}\sum_{i=1}^n p_if(x_i)\qquad(4)$$
holds.
If $f$ is concave on $I$, then the inequalities in (4) are reversed.
Proof. 
We have
$$f\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i\right)=f\left(\frac{1}{P_n}\sum_{i=1}^m\sum_{j\in N_i}p_jx_j\right)=f\left(\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\frac{\sum_{j\in N_i}p_jx_j}{\sum_{j\in N_i}p_j}\right).$$
Applying Jensen’s inequality (1) first to the group-level weights $\sum_{j\in N_i}p_j$ and then within each subset $N_i$, we obtain
$$f\left(\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\frac{\sum_{j\in N_i}p_jx_j}{\sum_{j\in N_i}p_j}\right)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)f\left(\frac{\sum_{j\in N_i}p_jx_j}{\sum_{j\in N_i}p_j}\right)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\frac{1}{\sum_{j\in N_i}p_j}\sum_{j\in N_i}p_jf(x_j)=\frac{1}{P_n}\sum_{i=1}^m\sum_{j\in N_i}p_jf(x_j),$$
which is (4). □
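The refinement (4) can likewise be checked numerically; in the sketch below the convex function $f(x)=x\log x$ and the partition are arbitrary illustrative choices.

```python
# Illustrative check of the refined Jensen inequality (4) from Theorem 2,
# with f(x) = x*log(x) (convex on (0, inf)) and an arbitrary partition.
import math

def f(x):
    return x * math.log(x)

p = [0.3, 1.2, 0.7, 1.8]
x = [0.5, 1.1, 2.4, 3.0]
groups = [[0, 2], [1, 3]]         # disjoint subsets covering {0, 1, 2, 3}

P = sum(p)
left = f(sum(pi * xi for pi, xi in zip(p, x)) / P)
middle = sum(sum(p[j] for j in N)
             * f(sum(p[j] * x[j] for j in N) / sum(p[j] for j in N))
             for N in groups) / P
right = sum(pi * f(xi) for pi, xi in zip(p, x)) / P

assert left <= middle <= right
print(f"{left:.4f} <= {middle:.4f} <= {right:.4f}")
```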
The idea used to prove our main result (and the refinement of the Jensen inequality) can also be found in another well-known result (see [6] (pp. 55–60)).
Jensen’s inequality contains the condition “$\mathbf{p}=(p_1,\ldots,p_n)$ is a nonnegative $n$-tuple such that $P_n=\sum_{i=1}^n p_i>0$”. In 1919, Steffensen proved the same inequality (1) under slightly relaxed conditions (see [12]).
Theorem 3
(Jensen–Steffensen). If $f:I\to\mathbb{R}$ is a convex function, $\mathbf{x}$ is a real monotonic $n$-tuple such that $x_i\in I$, $i=1,\ldots,n$, and $\mathbf{p}$ is a real $n$-tuple such that
$$0\le P_k\le P_n,\quad k=1,\ldots,n,\qquad P_n>0,$$
where $P_k=\sum_{i=1}^k p_i$, then (1) holds. If $f$ is strictly convex, then inequality (1) is strict unless $x_1=x_2=\cdots=x_n$.
One of the many generalizations of the Jensen inequality is its Riemann–Stieltjes integral form.
Theorem 4
(the Riemann–Stieltjes form of Jensen’s inequality). Let $\phi:I\to\mathbb{R}$ be a continuous convex function, where $I$ is the range of the continuous function $f:[a,b]\to\mathbb{R}$. The inequality
$$\phi\left(\frac{\int_a^b f(x)\,d\lambda(x)}{\int_a^b d\lambda(x)}\right)\le\frac{\int_a^b \phi(f(x))\,d\lambda(x)}{\int_a^b d\lambda(x)}\qquad(5)$$
holds provided that $\lambda$ is increasing, bounded and $\lambda(a)\neq\lambda(b)$.
Analogously, an integral form of the Jensen–Steffensen inequality holds.
Theorem 5
(the Jensen–Steffensen inequality, integral form). If $f$ is continuous and monotonic (either increasing or decreasing) and $\lambda$ is either continuous or of bounded variation satisfying
$$\lambda(a)\le\lambda(x)\le\lambda(b)\ \text{for all}\ x\in[a,b],\qquad\lambda(a)<\lambda(b),$$
then (5) holds.
In 1970, Boas gave the integral analogue of the Jensen–Steffensen inequality under slightly different conditions.
Theorem 6
(the Jensen–Boas inequality). If $\lambda$ is continuous or of bounded variation satisfying
$$\lambda(a)\le\lambda(x_1)\le\lambda(y_1)\le\lambda(x_2)\le\cdots\le\lambda(y_{n-1})\le\lambda(x_n)\le\lambda(b)$$
for all $x_k\in(y_{k-1},y_k)$ (where $y_0=a$, $y_n=b$) and $\lambda(b)>\lambda(a)$, and if $f$ is continuous and monotonic (either increasing or decreasing) in each of the intervals $(y_{k-1},y_k)$, $k=1,\ldots,n$, then inequality (5) holds.
In 1982, J. Pečarić gave the following proof of the Jensen–Boas inequality.
Proof. 
If $\lambda(a)<\lambda(x_1)<\lambda(y_1)<\lambda(x_2)<\cdots<\lambda(y_{n-1})<\lambda(x_n)<\lambda(b)$, then with the notation
$$p_k=\int_{y_{k-1}}^{y_k}d\lambda(x),\qquad t_k=\frac{\int_{y_{k-1}}^{y_k}f(x)\,d\lambda(x)}{\int_{y_{k-1}}^{y_k}d\lambda(x)},\qquad k=1,\ldots,n,$$
we have
$$\phi\left(\frac{\int_a^b f(x)\,d\lambda(x)}{\int_a^b d\lambda(x)}\right)=\phi\left(\frac{\sum_{k=1}^n\int_{y_{k-1}}^{y_k}f(x)\,d\lambda(x)}{\sum_{k=1}^n\int_{y_{k-1}}^{y_k}d\lambda(x)}\right)=\phi\left(\frac{\sum_{k=1}^n p_kt_k}{\sum_{k=1}^n p_k}\right).$$
Using Jensen’s inequality (1), we obtain
$$\phi\left(\frac{\sum_{k=1}^n p_kt_k}{\sum_{k=1}^n p_k}\right)\le\frac{1}{\sum_{k=1}^n p_k}\sum_{k=1}^n p_k\,\phi(t_k)=\frac{1}{\sum_{k=1}^n p_k}\sum_{k=1}^n p_k\,\phi\left(\frac{\int_{y_{k-1}}^{y_k}f(x)\,d\lambda(x)}{\int_{y_{k-1}}^{y_k}d\lambda(x)}\right).$$
Using the Jensen–Steffensen inequality (5) on each subinterval $[y_{k-1},y_k]$, $k=1,\ldots,n$, we obtain
$$\frac{1}{\sum_{k=1}^n p_k}\sum_{k=1}^n p_k\,\phi\left(\frac{\int_{y_{k-1}}^{y_k}f(x)\,d\lambda(x)}{\int_{y_{k-1}}^{y_k}d\lambda(x)}\right)\le\frac{1}{\sum_{k=1}^n p_k}\sum_{k=1}^n p_k\frac{1}{\int_{y_{k-1}}^{y_k}d\lambda(x)}\int_{y_{k-1}}^{y_k}\phi(f(x))\,d\lambda(x)=\frac{1}{\sum_{k=1}^n\int_{y_{k-1}}^{y_k}d\lambda(x)}\sum_{k=1}^n\int_{y_{k-1}}^{y_k}\phi(f(x))\,d\lambda(x)=\frac{\int_a^b\phi(f(x))\,d\lambda(x)}{\int_a^b d\lambda(x)}.$$
If $\lambda(y_{j-1})=\lambda(y_j)$ for some $j$, then $d\lambda(x)=0$ on $[y_{j-1},y_j]$, and it is easy to see that the Jensen–Boas inequality remains valid. □
If we look at the previous proof, we see that the technique is the same as for our main result and the refinement of the Jensen inequality.
By using Theorem 2, we obtain the following refinement of the discrete Hölder inequality (see [13,14]).
Corollary 1.
Let $p,q>1$ be such that $\frac1p+\frac1q=1$. Let $\mathbf{a}=(a_1,a_2,\ldots,a_n)$, $\mathbf{b}=(b_1,b_2,\ldots,b_n)$ be such that $a_i,b_i>0$, $i=1,\ldots,n$, and let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, satisfy the conditions of Theorem 2. Then
$$\sum_{i=1}^n a_ib_i\le\Big(\sum_{i=1}^n b_i^q\Big)^{\frac1q}\left[\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)^{1-p}\Big(\sum_{j\in N_i}a_jb_j\Big)^p\right]^{\frac1p}\le\Big(\sum_{i=1}^n a_i^p\Big)^{\frac1p}\Big(\sum_{i=1}^n b_i^q\Big)^{\frac1q}.\qquad(6)$$
Proof. 
We use Theorem 2 with $p_i=b_i^q>0$ and $x_i=a_ib_i^{-q/p}>0$. Then $p_ix_i=b_i^qa_ib_i^{-q/p}=a_ib_i^{q-\frac qp}=a_ib_i^{q(1-\frac1p)}=a_ib_i^{q\cdot\frac1q}=a_ib_i$, and from (4) we obtain
$$f\left(\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^n a_ib_i\right)\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)f\left(\frac{\sum_{j\in N_i}a_jb_j}{\sum_{j\in N_i}b_j^q}\right)\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^n b_i^qf\big(a_ib_i^{-q/p}\big).\qquad(7)$$
For the function $f(t)=t^p$, from (7) we obtain
$$\left(\frac{\sum_{i=1}^n a_ib_i}{\sum_{i=1}^n b_i^q}\right)^p\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)\left(\frac{\sum_{j\in N_i}a_jb_j}{\sum_{j\in N_i}b_j^q}\right)^p\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^n b_i^q\big(a_ib_i^{-q/p}\big)^p=\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^n a_i^p.$$
Multiplying by $\left(\sum_{i=1}^n b_i^q\right)^p$ and raising to the power $\frac1p$, we obtain
$$\sum_{i=1}^n a_ib_i\le\Big(\sum_{i=1}^n b_i^q\Big)^{1-\frac1p}\left[\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)^{1-p}\Big(\sum_{j\in N_i}a_jb_j\Big)^p\right]^{\frac1p}\le\Big(\sum_{i=1}^n b_i^q\Big)^{1-\frac1p}\Big(\sum_{i=1}^n a_i^p\Big)^{\frac1p},$$
which is (6), since $1-\frac1p=\frac1q$. □
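The chain (6) can be checked directly; in the sketch below the positive tuples, the exponent $p=3$ (so $q=3/2$) and the two-group partition are arbitrary assumptions made only for the illustration.

```python
# Illustrative check of the refined Holder inequality (6).
p_exp = 3.0
q_exp = p_exp / (p_exp - 1.0)

a = [0.4, 1.2, 0.8, 2.1, 1.5]
b = [1.0, 0.6, 1.4, 0.9, 1.1]
groups = [[0, 1, 4], [2, 3]]

lhs = sum(ai * bi for ai, bi in zip(a, b))

Bq = sum(bi ** q_exp for bi in b)
inner = sum(sum(b[j] ** q_exp for j in N) ** (1.0 - p_exp)
            * sum(a[j] * b[j] for j in N) ** p_exp
            for N in groups)
mid = Bq ** (1.0 / q_exp) * inner ** (1.0 / p_exp)

rhs = sum(ai ** p_exp for ai in a) ** (1.0 / p_exp) * Bq ** (1.0 / q_exp)

assert lhs <= mid <= rhs
print(f"{lhs:.4f} <= {mid:.4f} <= {rhs:.4f}")
```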
Corollary 2.
Under the same conditions as in the previous corollary, for $p\in\mathbb{R}$, $p<1$, $p\neq0$, we obtain
$$\Big(\sum_{i=1}^n a_i^p\Big)^{\frac1p}\Big(\sum_{i=1}^n b_i^q\Big)^{\frac1q}\le\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)^{\frac1q}\Big(\sum_{j\in N_i}a_j^p\Big)^{\frac1p}\le\sum_{i=1}^n a_ib_i.\qquad(8)$$
Proof. 
First, let $0<p<1$. We use Theorem 2 with $p_i=b_i^q>0$ and $x_i=a_i^pb_i^{-q}>0$. Then $p_ix_i=b_i^qa_i^pb_i^{-q}=a_i^p$, and from (4) we obtain
$$f\left(\frac{\sum_{i=1}^n a_i^p}{\sum_{i=1}^n b_i^q}\right)\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)f\left(\frac{\sum_{j\in N_i}a_j^p}{\sum_{j\in N_i}b_j^q}\right)\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^n b_i^qf\big(a_i^pb_i^{-q}\big).$$
For the (convex) function $f(t)=t^{\frac1p}$, we obtain
$$\left(\frac{\sum_{i=1}^n a_i^p}{\sum_{i=1}^n b_i^q}\right)^{\frac1p}\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)\left(\frac{\sum_{j\in N_i}a_j^p}{\sum_{j\in N_i}b_j^q}\right)^{\frac1p}\le\frac{1}{\sum_{i=1}^n b_i^q}\sum_{i=1}^n b_i^q\big(a_i^pb_i^{-q}\big)^{\frac1p}.$$
Multiplying by $\left(\sum_{i=1}^n b_i^q\right)^{\frac1p}$ and then by $\left(\sum_{i=1}^n b_i^q\right)^{\frac1q}$, we obtain
$$\Big(\sum_{i=1}^n a_i^p\Big)^{\frac1p}\Big(\sum_{i=1}^n b_i^q\Big)^{\frac1q}\le\sum_{i=1}^m\Big(\sum_{j\in N_i}b_j^q\Big)^{\frac1q}\Big(\sum_{j\in N_i}a_j^p\Big)^{\frac1p}\le\sum_{i=1}^n a_ib_i,$$
which is (8).
If $p<0$, then $0<q<1$, and the same result follows by symmetry, interchanging the roles of $p$ and $q$ and of $\mathbf{a}$ and $\mathbf{b}$ in the argument above. □
It is interesting to show how the previously obtained results impact the study of the weighted discrete power means and the weighted discrete quasi-arithmetic means.
Let $n\in\mathbb{N}$, $n\ge2$, $\mathbf{x}=(x_1,\ldots,x_n)$, $\mathbf{p}=(p_1,\ldots,p_n)$, $x_i,p_i\in\mathbb{R}^+$. The weighted discrete power means of order $r\in\mathbb{R}$ are defined as
$$M_r(\mathbf{x},\mathbf{p})=\begin{cases}\left(\dfrac{1}{P_n}\displaystyle\sum_{i=1}^n p_ix_i^r\right)^{1/r},& r\neq0,\\[2mm]\left(\displaystyle\prod_{i=1}^n x_i^{p_i}\right)^{1/P_n},& r=0.\end{cases}$$
Using Theorem 2, we obtain the following inequalities for the weighted discrete power means. Note that the left-hand and right-hand sides of both inequalities coincide; only the mixed means in the middle, which provide the refinement, differ.
Corollary 3.
Let $n\in\mathbb{N}$, $n\ge2$, $\mathbf{x}=(x_1,\ldots,x_n)$, $\mathbf{p}=(p_1,\ldots,p_n)$, $x_i,p_i\in\mathbb{R}^+$, and let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, satisfy the conditions of Theorem 2. Let $s,t\in\mathbb{R}$ be such that $s\le t$. Then
$$M_s(\mathbf{x},\mathbf{p})\le\left[\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)M_s^t(\mathbf{x}_{N_i},\mathbf{p}_{N_i})\right]^{1/t}\le M_t(\mathbf{x},\mathbf{p}),\qquad(9)$$
$$M_s(\mathbf{x},\mathbf{p})\le\left[\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)M_t^s(\mathbf{x}_{N_i},\mathbf{p}_{N_i})\right]^{1/s}\le M_t(\mathbf{x},\mathbf{p}),\qquad(10)$$
where $\mathbf{x}_{N_i}=(x_{j_1^i},\ldots,x_{j_{k_i}^i})$, $\mathbf{p}_{N_i}=(p_{j_1^i},\ldots,p_{j_{k_i}^i})$, $k_i=|N_i|$, $N_i=\{j_1^i,\ldots,j_{k_i}^i\}$, for $i=1,\ldots,m$.
Proof. 
We use Theorem 2 with $f(x)=x^{t/s}$ for $x>0$, where $s,t\in\mathbb{R}$, $t>0$, $s\neq0$, $s\le t$. From (4), we obtain
$$\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i\right)^{t/s}\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left(\frac{\sum_{j\in N_i}p_jx_j}{\sum_{j\in N_i}p_j}\right)^{t/s}\le\frac{1}{P_n}\sum_{i=1}^n p_ix_i^{t/s}.$$
Substituting $x_i$ with $x_i^s$ and then raising to the power $\frac1t$, we obtain
$$\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i^s\right)^{1/s}\le\left[\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left(\frac{\sum_{j\in N_i}p_jx_j^s}{\sum_{j\in N_i}p_j}\right)^{t/s}\right]^{1/t}\le\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i^t\right)^{1/t},$$
which is (9).
Similarly, we use Theorem 2 with $f(x)=x^{s/t}$ for $x>0$, where $s,t\in\mathbb{R}$, $s,t>0$, $s\le t$. Since this function is concave, we obtain
$$\left(\frac{1}{P_n}\sum_{i=1}^n p_ix_i\right)^{s/t}\ge\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\left(\frac{\sum_{j\in N_i}p_jx_j}{\sum_{j\in N_i}p_j}\right)^{s/t}\ge\frac{1}{P_n}\sum_{i=1}^n p_ix_i^{s/t}.$$
Substituting $x_i$ with $x_i^t$ and then raising to the power $\frac1s$, inequality (10) easily follows. The other cases follow similarly. □
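A short numerical illustration of (9): the weights, points, exponents $s=1$, $t=3$ and the partition below are arbitrary assumptions chosen for the demonstration.

```python
# Illustrative check of the mixed-mean refinement (9) for weighted power means.
def power_mean(xs, ws, r):
    W = sum(ws)
    if r == 0:
        prod = 1.0
        for xi, wi in zip(xs, ws):
            prod *= xi ** (wi / W)
        return prod
    return (sum(wi * xi ** r for xi, wi in zip(xs, ws)) / W) ** (1.0 / r)

s, t = 1.0, 3.0
x = [0.5, 1.2, 2.0, 3.1, 0.9]
p = [1.0, 0.5, 2.0, 1.5, 1.0]
groups = [[0, 2, 4], [1, 3]]

P = sum(p)
left = power_mean(x, p, s)
mid = (sum(sum(p[j] for j in N)
           * power_mean([x[j] for j in N], [p[j] for j in N], s) ** t
           for N in groups) / P) ** (1.0 / t)
right = power_mean(x, p, t)

assert left <= mid <= right
print(f"M_s = {left:.4f} <= {mid:.4f} <= M_t = {right:.4f}")
```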
Let $I$ be an interval in $\mathbb{R}$. Let $n\in\mathbb{N}$, $n\ge2$, $\mathbf{x}=(x_1,\ldots,x_n)$, $\mathbf{p}=(p_1,\ldots,p_n)$, $x_i\in I$, $p_i\in\mathbb{R}^+$. Then, for a strictly monotone continuous function $h:I\to\mathbb{R}$, the discrete weighted quasi-arithmetic mean is defined as
$$M_h(\mathbf{x},\mathbf{p})=h^{-1}\left(\frac{1}{P_n}\sum_{i=1}^n p_ih(x_i)\right).$$
Using Theorem 2, we obtain the following inequalities for quasi-arithmetic means.
Corollary 4.
Let $I$ be an interval in $\mathbb{R}$. Let $n\in\mathbb{N}$, $n\ge2$, $\mathbf{x}=(x_1,\ldots,x_n)$, $\mathbf{p}=(p_1,\ldots,p_n)$, $x_i\in I$, $p_i\in\mathbb{R}^+$. Let $h:I\to\mathbb{R}$ be a strictly monotone continuous function such that $f\circ h^{-1}$ is convex. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}p_j>0$, $i=1,\ldots,m$. Then
$$f\big(M_h(\mathbf{x},\mathbf{p})\big)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)f\big(M_h(\mathbf{x}_{N_i},\mathbf{p}_{N_i})\big)\le\frac{1}{P_n}\sum_{i=1}^n p_if(x_i),$$
where $\mathbf{x}_{N_i}=(x_{j_1^i},\ldots,x_{j_{k_i}^i})$, $\mathbf{p}_{N_i}=(p_{j_1^i},\ldots,p_{j_{k_i}^i})$, $k_i=|N_i|$, $N_i=\{j_1^i,\ldots,j_{k_i}^i\}$, for $i=1,\ldots,m$.
Proof. 
Theorem 2 applied with $f\to f\circ h^{-1}$ and $x_i\to h(x_i)$ gives
$$f\left(h^{-1}\left(\frac{1}{P_n}\sum_{i=1}^n p_ih(x_i)\right)\right)\le\frac{1}{P_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)f\left(h^{-1}\left(\frac{\sum_{j\in N_i}p_jh(x_j)}{\sum_{j\in N_i}p_j}\right)\right)\le\frac{1}{P_n}\sum_{i=1}^n p_if(x_i).\qquad\square$$
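As an illustration of Corollary 4, the sketch below takes $h=\log$ (so that $M_h$ is the weighted geometric mean) and $f(x)=x^2$, for which $f\circ h^{-1}(u)=e^{2u}$ is convex; the data are arbitrary assumptions.

```python
# Illustrative check of the quasi-arithmetic refinement from Corollary 4,
# with h(x) = log(x) (weighted geometric mean) and f(x) = x**2.
import math

def quasi_mean(xs, ws):            # M_h with h = log
    W = sum(ws)
    return math.exp(sum(wi * math.log(xi) for xi, wi in zip(xs, ws)) / W)

def f(x):
    return x ** 2

x = [0.8, 1.5, 2.2, 3.0]
p = [1.0, 2.0, 0.5, 1.5]
groups = [[0, 3], [1, 2]]

P = sum(p)
left = f(quasi_mean(x, p))
mid = sum(sum(p[j] for j in N) * f(quasi_mean([x[j] for j in N],
                                              [p[j] for j in N]))
          for N in groups) / P
right = sum(pi * f(xi) for pi, xi in zip(p, x)) / P

assert left <= mid <= right
print(f"{left:.4f} <= {mid:.4f} <= {right:.4f}")
```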

3. Applications in Information Theory

In this section we give basic results concerning the discrete Csiszár f-divergence. In addition, bounds for the divergence of the Zipf–Mandelbrot law are obtained.
Let us denote the set of all probability densities by $\mathcal{P}$, i.e., $\mathbf{p}=(p_1,\ldots,p_n)\in\mathcal{P}$ if $p_i\in[0,1]$ for $i=1,\ldots,n$ and $\sum_{i=1}^n p_i=1$.
In [15], Csiszár introduced the $f$-divergence functional
$$D_f(\mathbf{p},\mathbf{q})=\sum_{i=1}^n q_if\left(\frac{p_i}{q_i}\right),$$
where $f:[0,+\infty)\to\mathbb{R}$ is a convex function; it represents a “distance function” on the set of probability distributions $\mathcal{P}$.
In order to be able to use nonnegative probability distributions in the $f$-divergence functional, we adopt, as usual, the conventions
$$f(0):=\lim_{t\to0^+}f(t),\qquad 0\cdot f\Big(\frac00\Big):=0,\qquad 0\cdot f\Big(\frac a0\Big):=\lim_{t\to0^+}tf\Big(\frac at\Big),\quad a>0,$$
and the following definition of a generalized $f$-divergence functional is given.
Definition 1
(the Csiszár f-divergence functional). Let $J\subseteq\mathbb{R}$ be an interval, and let $f:J\to\mathbb{R}$ be a function. Let $\mathbf{p}=(p_1,\ldots,p_n)$ be an $n$-tuple of real numbers and $\mathbf{q}=(q_1,\ldots,q_n)$ an $n$-tuple of nonnegative real numbers such that $p_i/q_i\in J$ for every $i=1,\ldots,n$. The Csiszár $f$-divergence functional is defined as
$$\hat{D}_f(\mathbf{p},\mathbf{q}):=\sum_{i=1}^n q_if\left(\frac{p_i}{q_i}\right).$$
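The functional of Definition 1 is straightforward to implement. The sketch below is a minimal generic implementation; the two generator functions and the sample tuples are standard or illustrative choices made for the example, not prescriptions from the paper.

```python
# A small, generic implementation of the Csiszar f-divergence functional.
import math

def csiszar_divergence(f, p, q):
    """Return sum_i q_i * f(p_i / q_i) for positive q_i."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))

kl_generator = lambda t: t * math.log(t)                  # Kullback-Leibler
hellinger_generator = lambda t: (math.sqrt(t) - 1) ** 2   # squared Hellinger

p = [0.1, 0.4, 0.2, 0.3]
q = [0.25, 0.25, 0.25, 0.25]
print("KL(p, q)        =", csiszar_divergence(kl_generator, p, q))
print("Hellinger(p, q) =", csiszar_divergence(hellinger_generator, p, q))
```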
Theorem 7.
Let $I$ be an interval in $\mathbb{R}$ and $f:I\to\mathbb{R}$ a convex function. Let $\mathbf{p}=(p_1,\ldots,p_n)$ be an $n$-tuple of real numbers and $\mathbf{q}=(q_1,\ldots,q_n)$ an $n$-tuple of nonnegative real numbers such that $p_i/q_i\in I$ for every $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}q_j>0$, $i=1,\ldots,m$, and $\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\in I$, $i=1,\ldots,m$. Then
$$f\left(\frac{P_n}{Q_n}\right)\le\frac{1}{Q_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)f\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\right)\le\frac{1}{Q_n}\hat{D}_f(\mathbf{p},\mathbf{q})\qquad(13)$$
holds, where $P_n=\sum_{i=1}^n p_i$ and $Q_n=\sum_{i=1}^n q_i$.
Proof. 
Using Theorem 2 with $p_i\to q_i$ and $x_i\to p_i/q_i$, we obtain
$$f\left(\frac{1}{Q_n}\sum_{i=1}^n q_i\frac{p_i}{q_i}\right)\le\frac{1}{Q_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)f\left(\frac{\sum_{j\in N_i}q_j\frac{p_j}{q_j}}{\sum_{j\in N_i}q_j}\right)\le\frac{1}{Q_n}\sum_{i=1}^n q_if\left(\frac{p_i}{q_i}\right),$$
which is (13). □
Corollary 5.
If, in the previous theorem, we take $\mathbf{p}$ and $\mathbf{q}$ to be probability distributions, we directly obtain the following result:
$$f(1)\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)f\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\right)\le D_f(\mathbf{p},\mathbf{q}).\qquad(14)$$
Theorem 8.
Let $f:I\to\mathbb{R}$ be a convex function on $I$, $m,M\in I$, $-\infty<m<M<+\infty$. Let $\mathbf{p}=(p_1,\ldots,p_n)$ be an $n$-tuple of real numbers and $\mathbf{q}=(q_1,\ldots,q_n)$ an $n$-tuple of nonnegative real numbers such that $m\le\frac{p_i}{q_i}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}q_j>0$ for $i=1,\ldots,m$, and $m_i=\min\{p_j/q_j:j\in N_i\}$, $M_i=\max\{p_j/q_j:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$\hat{D}_f(\mathbf{p},\mathbf{q})\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}}{M_i-m_i}f(m_i)+\frac{\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}f(M_i)\right]\le Q_n\left[\frac{M-\frac{P_n}{Q_n}}{M-m}f(m)+\frac{\frac{P_n}{Q_n}-m}{M-m}f(M)\right]\qquad(15)$$
holds.
Proof. 
Using Theorem 1 with $p_i\to q_i$ and $x_i\to p_i/q_i$, we obtain
$$\frac{1}{Q_n}\sum_{i=1}^n q_if\left(\frac{p_i}{q_i}\right)\le\frac{1}{Q_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{1}{\sum_{j\in N_i}q_j}\sum_{j\in N_i}q_j\frac{p_j}{q_j}}{M_i-m_i}f(m_i)+\frac{\frac{1}{\sum_{j\in N_i}q_j}\sum_{j\in N_i}q_j\frac{p_j}{q_j}-m_i}{M_i-m_i}f(M_i)\right]\le\frac{M-\frac{1}{\sum_{i=1}^n q_i}\sum_{i=1}^n q_i\frac{p_i}{q_i}}{M-m}f(m)+\frac{\frac{1}{\sum_{i=1}^n q_i}\sum_{i=1}^n q_i\frac{p_i}{q_i}-m}{M-m}f(M).$$
Multiplying by $Q_n$, we obtain (15). □
Corollary 6.
If, in the previous theorem, we take $\mathbf{p}$ and $\mathbf{q}$ to be probability distributions, we directly obtain the following result:
$$D_f(\mathbf{p},\mathbf{q})\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}}{M_i-m_i}f(m_i)+\frac{\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}f(M_i)\right]\le\frac{M-1}{M-m}f(m)+\frac{1-m}{M-m}f(M).\qquad(16)$$
If $\mathbf{p}$ and $\mathbf{q}$ are probability distributions, the Kullback–Leibler divergence (also called relative entropy or KL divergence) is defined as
$$D_{KL}(\mathbf{p},\mathbf{q}):=\sum_{i=1}^n p_i\log\frac{p_i}{q_i}.$$
The next corollary provides bounds for the Kullback–Leibler divergence of two probability distributions.
Corollary 7.
Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}q_j>0$, $i=1,\ldots,m$.
  • Let $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$ be $n$-tuples of nonnegative real numbers. Then
    $$\frac{P_n}{Q_n}\log\frac{P_n}{Q_n}\le\frac{1}{Q_n}\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\log\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\le\frac{1}{Q_n}\sum_{i=1}^n p_i\log\frac{p_i}{q_i}.$$
  • Let $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)\in\mathcal{P}$ be probability distributions. Then
    $$0\le\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j\Big)\log\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\le D_{KL}(\mathbf{p},\mathbf{q}).$$
Proof. 
Let $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$ be $n$-tuples of nonnegative real numbers. Since the function $t\mapsto t\log t$ is convex, the first inequality follows from Theorem 7 by setting $f(t)=t\log t$.
The second inequality is a special case of the first one for probability distributions $\mathbf{p}$ and $\mathbf{q}$. □
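The second part of Corollary 7 can be checked directly; in the sketch below the distributions and the partition are arbitrary assumptions.

```python
# Illustrative check of the second part of Corollary 7: grouping the
# indices gives an intermediate lower bound for the Kullback-Leibler
# divergence of two (arbitrarily chosen) probability distributions.
import math

p = [0.10, 0.40, 0.20, 0.30]
q = [0.25, 0.25, 0.25, 0.25]
groups = [[0, 3], [1, 2]]

dkl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
grouped = sum(sum(p[j] for j in N) * math.log(sum(p[j] for j in N)
                                              / sum(q[j] for j in N))
              for N in groups)

assert 0.0 <= grouped <= dkl + 1e-12     # small tolerance for rounding
print(f"0 <= {grouped:.6f} <= D_KL = {dkl:.6f}")
```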
Corollary 8.
Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}q_j>0$ for $i=1,\ldots,m$.
  • Let $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$ be $n$-tuples of nonnegative real numbers. Let $m=\min\{p_i/q_i:i=1,\ldots,n\}$, $M=\max\{p_i/q_i:i=1,\ldots,n\}$, $m_i=\min\{p_j/q_j:j\in N_i\}$ and $M_i=\max\{p_j/q_j:j\in N_i\}$, for $i=1,\ldots,m$. Then
    $$\sum_{i=1}^n p_i\log\frac{p_i}{q_i}\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\frac{1}{M_i-m_i}\log\left[m_i^{\,m_i\big(M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\big)}M_i^{\,M_i\big(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i\big)}\right]\le\frac{Q_n}{M-m}\log\left[m^{\,m\big(M-\frac{P_n}{Q_n}\big)}M^{\,M\big(\frac{P_n}{Q_n}-m\big)}\right].$$
  • Let $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)\in\mathcal{P}$ be probability distributions. Let $m$, $M$, $m_i$ and $M_i$ be as above. Then
    $$D_{KL}(\mathbf{p},\mathbf{q})\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\frac{1}{M_i-m_i}\log\left[m_i^{\,m_i\big(M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\big)}M_i^{\,M_i\big(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i\big)}\right]\le\frac{1}{M-m}\log\left[m^{\,m(M-1)}M^{\,M(1-m)}\right].$$
Proof. 
Let $\mathbf{p}=(p_1,\ldots,p_n)$ and $\mathbf{q}=(q_1,\ldots,q_n)$ be $n$-tuples of nonnegative real numbers. Since the function $t\mapsto t\log t$ is convex, the first inequality follows from Theorem 8 by setting $f(t)=t\log t$.
The second inequality is a special case of the first one for probability distributions $\mathbf{p}$ and $\mathbf{q}$. □
Now we deduce the relations for some more special cases of the Csiszár f-divergence.
Definition 2
(the Shannon entropy). For $\mathbf{p}\in\mathcal{P}$, the discrete Shannon entropy is defined as
$$S_E(\mathbf{p})=-\sum_{i=1}^n p_i\log p_i.$$
Corollary 9.
Let $\mathbf{q}\in\mathcal{P}$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}q_j>0$, $i=1,\ldots,m$. Then
$$S_E(\mathbf{q})\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\log\frac{|N_i|}{\sum_{j\in N_i}q_j}\le\log n.\qquad(17)$$
Proof. 
Using Theorem 7 with the convex function $f(t)=-\log t$, $t\in\mathbb{R}^+$, and $\mathbf{q}\in\mathcal{P}$, we obtain
$$\log P_n\ge\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\log\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\ge\sum_{i=1}^n q_i\log\frac{p_i}{q_i}.$$
For $p_i=1$, $i=1,\ldots,n$, inequality (17) follows. □
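The entropy bounds of Corollary 9 are easy to verify numerically; the distribution $\mathbf{q}$ and the partition used below are arbitrary assumptions.

```python
# Illustrative check of Corollary 9: the grouped expression sits between
# the Shannon entropy of q and log(n).
import math

q = [0.05, 0.15, 0.30, 0.10, 0.40]
groups = [[0, 1, 2], [3, 4]]
n = len(q)

entropy = -sum(qi * math.log(qi) for qi in q)
grouped = sum(sum(q[j] for j in N) * math.log(len(N) / sum(q[j] for j in N))
              for N in groups)

assert entropy <= grouped <= math.log(n) + 1e-12
print(f"S_E(q) = {entropy:.4f} <= {grouped:.4f} <= log n = {math.log(n):.4f}")
```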
Corollary 10.
Let $m,M\in\mathbb{R}^+$, $m<M$, and $\mathbf{q}\in\mathcal{P}$ be such that $m\le\frac{1}{q_i}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}q_j>0$ for $i=1,\ldots,m$, and $m_i=\min\{1/q_j:j\in N_i\}$, $M_i=\max\{1/q_j:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$S_E(\mathbf{q})\ge\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{|N_i|}{\sum_{j\in N_i}q_j}}{M_i-m_i}\log m_i+\frac{\frac{|N_i|}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}\log M_i\right]\ge\frac{M-n}{M-m}\log m+\frac{n-m}{M-m}\log M$$
holds.
Proof. 
Using Theorem 8 with $f(t)=-\log t$, $t\in\mathbb{R}^+$, $\mathbf{q}\in\mathcal{P}$ and $p_i=1$, $i=1,\ldots,n$, we obtain
$$\sum_{i=1}^n q_i\log\frac{1}{q_i}\ge\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{|N_i|}{\sum_{j\in N_i}q_j}}{M_i-m_i}\log m_i+\frac{\frac{|N_i|}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}\log M_i\right],$$
and the stated inequality easily follows. □
Definition 3
(Jeffrey’s distance). For $\mathbf{p},\mathbf{q}\in\mathcal{P}$, the discrete Jeffrey distance is defined as
$$J_d(\mathbf{p},\mathbf{q})=\sum_{i=1}^n(p_i-q_i)\log\frac{p_i}{q_i}.$$
Corollary 11.
Let $\mathbf{p},\mathbf{q}\in\mathcal{P}$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}q_j>0$, $i=1,\ldots,m$. Then
$$0\le\sum_{i=1}^m\Big(\sum_{j\in N_i}p_j-\sum_{j\in N_i}q_j\Big)\log\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\le J_d(\mathbf{p},\mathbf{q}).\qquad(18)$$
Proof. 
Using Corollary 5 with $f(t)=(t-1)\log t$, $t\in\mathbb{R}^+$, we obtain
$$(1-1)\log1\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-1\right)\log\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}\le\sum_{i=1}^n q_i\left(\frac{p_i}{q_i}-1\right)\log\frac{p_i}{q_i},$$
and (18) easily follows. □
Corollary 12.
Let $m,M\in\mathbb{R}^+$, $m<M$, and $\mathbf{p},\mathbf{q}\in\mathcal{P}$ be such that $m\le\frac{p_i}{q_i}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}q_j>0$ for $i=1,\ldots,m$, and $m_i=\min\{p_j/q_j:j\in N_i\}$, $M_i=\max\{p_j/q_j:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$J_d(\mathbf{p},\mathbf{q})\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}}{M_i-m_i}(m_i-1)\log m_i+\frac{\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}(M_i-1)\log M_i\right]\le\frac{(1-m)(M-1)}{M-m}\log\frac Mm\qquad(19)$$
holds.
Proof. 
Using Corollary 6 with $f(t)=(t-1)\log t$, $t\in\mathbb{R}^+$, we obtain
$$\sum_{i=1}^n q_i\left(\frac{p_i}{q_i}-1\right)\log\frac{p_i}{q_i}\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}}{M_i-m_i}(m_i-1)\log m_i+\frac{\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}(M_i-1)\log M_i\right]\le\frac{M-1}{M-m}(m-1)\log m+\frac{1-m}{M-m}(M-1)\log M,$$
and (19) easily follows. □
Definition 4
(the Hellinger distance). For $\mathbf{p},\mathbf{q}\in\mathcal{P}$, the discrete Hellinger distance is defined as
$$H_d(\mathbf{p},\mathbf{q})=\sum_{i=1}^n\big(\sqrt{p_i}-\sqrt{q_i}\big)^2.$$
Corollary 13.
Let $\mathbf{p},\mathbf{q}\in\mathcal{P}$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}q_j>0$, $i=1,\ldots,m$. Then
$$0\le\sum_{i=1}^m\left(\sqrt{\sum_{j\in N_i}p_j}-\sqrt{\sum_{j\in N_i}q_j}\right)^2\le H_d(\mathbf{p},\mathbf{q}).\qquad(20)$$
Proof. 
Using Corollary 5 with $f(t)=(\sqrt{t}-1)^2$, $t\in\mathbb{R}^+$, (20) follows. □
Corollary 14.
Let $m,M\in\mathbb{R}^+$, $m<M$, and $\mathbf{p},\mathbf{q}\in\mathcal{P}$ be such that $m\le\frac{p_i}{q_i}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}q_j>0$ for $i=1,\ldots,m$, and $m_i=\min\{p_j/q_j:j\in N_i\}$, $M_i=\max\{p_j/q_j:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$H_d(\mathbf{p},\mathbf{q})\le\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\left[\frac{M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}}{M_i-m_i}\big(\sqrt{m_i}-1\big)^2+\frac{\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}-m_i}{M_i-m_i}\big(\sqrt{M_i}-1\big)^2\right]\le\frac{M-1}{M-m}\big(\sqrt{m}-1\big)^2+\frac{1-m}{M-m}\big(\sqrt{M}-1\big)^2\qquad(21)$$
holds.
Proof. 
Using Corollary 6 with $f(t)=(\sqrt{t}-1)^2$, $t\in\mathbb{R}^+$, (21) follows. □
Definition 5
(the Bhattacharyya distance). For $\mathbf{p},\mathbf{q}\in\mathcal{P}$, the discrete Bhattacharyya distance is defined as
$$B_d(\mathbf{p},\mathbf{q})=\sum_{i=1}^n\sqrt{p_iq_i}.$$
Corollary 15.
Let $\mathbf{p},\mathbf{q}\in\mathcal{P}$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$ and $\sum_{j\in N_i}q_j>0$, $i=1,\ldots,m$. Then
$$1\ge\sum_{i=1}^m\sqrt{\Big(\sum_{j\in N_i}p_j\Big)\Big(\sum_{j\in N_i}q_j\Big)}\ge B_d(\mathbf{p},\mathbf{q}).\qquad(22)$$
Proof. 
Using Corollary 5 with $f(t)=-\sqrt{t}$, $t\in\mathbb{R}^+$, (22) follows. □
Corollary 16.
Let $m,M\in\mathbb{R}^+$, $m<M$, and $\mathbf{p},\mathbf{q}\in\mathcal{P}$ be such that $m\le\frac{p_i}{q_i}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\sum_{j\in N_i}q_j>0$ for $i=1,\ldots,m$, and $m_i=\min\{p_j/q_j:j\in N_i\}$, $M_i=\max\{p_j/q_j:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$B_d(\mathbf{p},\mathbf{q})\ge\sum_{i=1}^m\Big(\sum_{j\in N_i}q_j\Big)\frac{\sqrt{m_iM_i}+\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}q_j}}{\sqrt{m_i}+\sqrt{M_i}}\ge\frac{\sqrt{mM}+1}{\sqrt{m}+\sqrt{M}}\qquad(23)$$
holds.
Proof. 
Using Corollary 6 with $f(t)=-\sqrt{t}$, $t\in\mathbb{R}^+$, (23) follows. □
Now we are going to derive results from Theorems 7 and 8 for the Zipf–Mandelbrot law.
The Zipf–Mandelbrot law is a discrete probability distribution defined by the probability mass function
$$f(i;M,s,t)=\frac{1}{(i+t)^sH_{M,s,t}},\qquad i=1,\ldots,M,$$
where
$$H_{M,s,t}=\sum_{i=1}^{M}\frac{1}{(i+t)^s}$$
is a generalization of the harmonic number, and $M\in\mathbb{N}$, $s>0$ and $t\in[0,\infty)$ are parameters.
If we define $\mathbf{q}$ as a Zipf–Mandelbrot law $M$-tuple, we have
$$q_i=\frac{1}{(i+t_2)^{s_2}H_{M,s_2,t_2}},\qquad i=1,\ldots,M,$$
where
$$H_{M,s_2,t_2}=\sum_{i=1}^{M}\frac{1}{(i+t_2)^{s_2}},$$
and the Csiszár functional becomes
$$\hat{D}_f(\mathbf{p},i,M,s_2,t_2)=\sum_{i=1}^{M}\frac{1}{(i+t_2)^{s_2}H_{M,s_2,t_2}}\,f\big(p_i(i+t_2)^{s_2}H_{M,s_2,t_2}\big),$$
where $f:I\to\mathbb{R}$, $I\subseteq\mathbb{R}$, and the parameters $M\in\mathbb{N}$, $s_2>0$, $t_2\ge0$ are such that $p_i(i+t_2)^{s_2}H_{M,s_2,t_2}\in I$, $i=1,\ldots,M$.
If $\mathbf{p}$ and $\mathbf{q}$ are both defined as Zipf–Mandelbrot law $M$-tuples, then the Csiszár functional becomes
$$\hat{D}_f(i,M,s_1,s_2,t_1,t_2)=\sum_{i=1}^{M}\frac{1}{(i+t_2)^{s_2}H_{M,s_2,t_2}}\,f\left(\frac{(i+t_2)^{s_2}H_{M,s_2,t_2}}{(i+t_1)^{s_1}H_{M,s_1,t_1}}\right),$$
where $f:I\to\mathbb{R}$, $I\subseteq\mathbb{R}$, and the parameters $M\in\mathbb{N}$, $s_1,s_2>0$, $t_1,t_2\ge0$ are such that $\frac{(i+t_2)^{s_2}H_{M,s_2,t_2}}{(i+t_1)^{s_1}H_{M,s_1,t_1}}\in I$, $i=1,\ldots,M$.
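For numerical experiments with these expressions, the Zipf–Mandelbrot weights can be generated as follows; the parameter values in the example are arbitrary assumptions.

```python
# A sketch of the Zipf-Mandelbrot probability mass function used above.
def zm_harmonic(M, s, t):
    """Generalized harmonic number H_{M,s,t} = sum_{i=1}^{M} 1/(i+t)^s."""
    return sum(1.0 / (i + t) ** s for i in range(1, M + 1))

def zm_pmf(M, s, t):
    """Zipf-Mandelbrot probabilities f(i; M, s, t), i = 1, ..., M."""
    H = zm_harmonic(M, s, t)
    return [1.0 / ((i + t) ** s * H) for i in range(1, M + 1)]

if __name__ == "__main__":
    q = zm_pmf(M=10, s=1.2, t=0.5)
    print(q)
    print("sums to", sum(q))       # should be 1 up to rounding
```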
Now, from Theorem 7, we have the following result.
Corollary 17.
Let $I$ be an interval in $\mathbb{R}$ and $f:I\to\mathbb{R}$ a convex function, and let $\mathbf{p}=(p_1,\ldots,p_n)$ be an $n$-tuple of real numbers. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$ and $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$. Suppose $s_2>0$, $t_2\ge0$ are such that $p_i(i+t_2)^{s_2}H_{n,s_2,t_2}\in I$, $i=1,\ldots,n$, and $\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\in I$, $i=1,\ldots,m$. Then
$$f(P_n)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}\right)f\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\right)\le\hat{D}_f(\mathbf{p},i,n,s_2,t_2)\qquad(24)$$
holds.
Proof. 
If we define $\mathbf{q}$ as a Zipf–Mandelbrot law $n$-tuple with parameters $s_2$, $t_2$, then $Q_n=\sum_{i=1}^n q_i=1$, and Theorem 7 gives
$$f(P_n)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}\right)f\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\right)\le\sum_{i=1}^{n}\frac{1}{(i+t_2)^{s_2}H_{n,s_2,t_2}}\,f\big(p_i(i+t_2)^{s_2}H_{n,s_2,t_2}\big),$$
which is (24). □
From Theorem 8 we have the following result.
Corollary 18.
Let $f:I\to\mathbb{R}$ be a convex function on $I$, $m,M\in I$, $-\infty<m<M<+\infty$. Let $\mathbf{p}=(p_1,\ldots,p_n)$ be an $n$-tuple of real numbers. Suppose $s_2>0$, $t_2\ge0$ are such that $m\le p_i(i+t_2)^{s_2}H_{n,s_2,t_2}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\in I$, $i=1,\ldots,m$, and $m_i=\min\{p_j(j+t_2)^{s_2}H_{n,s_2,t_2}:j\in N_i\}$, $M_i=\max\{p_j(j+t_2)^{s_2}H_{n,s_2,t_2}:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$\hat{D}_f(\mathbf{p},i,n,s_2,t_2)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}\right)\left[\frac{M_i-\bar{x}_i}{M_i-m_i}f(m_i)+\frac{\bar{x}_i-m_i}{M_i-m_i}f(M_i)\right]\le\frac{M-P_n}{M-m}f(m)+\frac{P_n-m}{M-m}f(M)\qquad(25)$$
holds, where $\bar{x}_i=\sum_{j\in N_i}p_j\Big/\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}$.
Proof. 
If we define $\mathbf{q}$ as a Zipf–Mandelbrot law $n$-tuple with parameters $s_2$, $t_2$, then $q_i=\frac{1}{(i+t_2)^{s_2}H_{n,s_2,t_2}}$, $\sum_{i=1}^n q_i=1$ and $\frac{p_i}{q_i}=p_i(i+t_2)^{s_2}H_{n,s_2,t_2}$, so Theorem 8 gives
$$\sum_{i=1}^{n}\frac{1}{(i+t_2)^{s_2}H_{n,s_2,t_2}}\,f\big(p_i(i+t_2)^{s_2}H_{n,s_2,t_2}\big)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}\right)\left[\frac{M_i-\bar{x}_i}{M_i-m_i}f(m_i)+\frac{\bar{x}_i-m_i}{M_i-m_i}f(M_i)\right]\le\frac{M-P_n}{M-m}f(m)+\frac{P_n-m}{M-m}f(M),$$
which is (25). □
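The chain (25) can be checked numerically; in the sketch below, $\mathbf{q}$ is a Zipf–Mandelbrot $n$-tuple, $f(t)=(t-1)^2$ is a convenient convex choice, and $\mathbf{p}$, the partition and the parameter values are arbitrary assumptions made only for the demonstration.

```python
# Illustrative check of the chain in Corollary 18 with q a Zipf-Mandelbrot
# law, f(t) = (t - 1)**2, and an arbitrary nonnegative p.
def f(t):
    return (t - 1.0) ** 2

n, s2, t2 = 6, 1.1, 0.3
H = sum(1.0 / (i + t2) ** s2 for i in range(1, n + 1))   # H_{n,s2,t2}
q = [1.0 / ((i + t2) ** s2 * H) for i in range(1, n + 1)]
p = [0.4, 0.1, 0.3, 0.2, 0.6, 0.4]
groups = [[0, 1, 2], [3, 4, 5]]

r = [pi / qi for pi, qi in zip(p, q)]        # ratios p_i / q_i
m, M = min(r), max(r)
P = sum(p)

d_hat = sum(qi * f(ri) for qi, ri in zip(q, r))
mid = 0.0
for N in groups:
    Qg, Pg = sum(q[j] for j in N), sum(p[j] for j in N)
    m_g, M_g = min(r[j] for j in N), max(r[j] for j in N)
    xbar_g = Pg / Qg
    mid += Qg * ((M_g - xbar_g) / (M_g - m_g) * f(m_g)
                 + (xbar_g - m_g) / (M_g - m_g) * f(M_g))
right = (M - P) / (M - m) * f(m) + (P - m) / (M - m) * f(M)

assert d_hat <= mid <= right
print(f"{d_hat:.4f} <= {mid:.4f} <= {right:.4f}")
```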
Now, from Theorem 7, we also have the following result.
Corollary 19.
Let $I$ be an interval in $\mathbb{R}$ and $f:I\to\mathbb{R}$ a convex function. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$ and $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$. Suppose $s_1,s_2>0$, $t_1,t_2\ge0$ are such that $\frac{(i+t_2)^{s_2}H_{n,s_2,t_2}}{(i+t_1)^{s_1}H_{n,s_1,t_1}}\in I$, $i=1,\ldots,n$, and $\frac{\sum_{j\in N_i}\frac{1}{(j+t_1)^{s_1}H_{n,s_1,t_1}}}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\in I$, $i=1,\ldots,m$. Then
$$f(1)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}\right)f\left(\frac{\sum_{j\in N_i}\frac{1}{(j+t_1)^{s_1}H_{n,s_1,t_1}}}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\right)\le\hat{D}_f(i,n,s_1,s_2,t_1,t_2)\qquad(26)$$
holds.
Proof. 
If we define $\mathbf{p}$ and $\mathbf{q}$ as Zipf–Mandelbrot law $n$-tuples with parameters $s_1,t_1$ and $s_2,t_2$, respectively, then $P_n=Q_n=1$, and Theorem 7 gives (26). □
From Theorem 8, we have the following result.
Corollary 20.
Let $f:I\to\mathbb{R}$ be a convex function on $I$, $m,M\in I$, $-\infty<m<M<+\infty$. Suppose $s_1,s_2>0$, $t_1,t_2\ge0$ are such that $m\le\frac{(i+t_2)^{s_2}H_{n,s_2,t_2}}{(i+t_1)^{s_1}H_{n,s_1,t_1}}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\frac{\sum_{j\in N_i}\frac{1}{(j+t_1)^{s_1}H_{n,s_1,t_1}}}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\in I$, $i=1,\ldots,m$, and $m_i=\min\left\{\frac{(j+t_2)^{s_2}H_{n,s_2,t_2}}{(j+t_1)^{s_1}H_{n,s_1,t_1}}:j\in N_i\right\}$, $M_i=\max\left\{\frac{(j+t_2)^{s_2}H_{n,s_2,t_2}}{(j+t_1)^{s_1}H_{n,s_1,t_1}}:j\in N_i\right\}$ for $i=1,\ldots,m$. Then
$$\hat{D}_f(i,n,s_1,s_2,t_1,t_2)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}\right)\left[\frac{M_i-\bar{r}_i}{M_i-m_i}f(m_i)+\frac{\bar{r}_i-m_i}{M_i-m_i}f(M_i)\right]\le\frac{M-1}{M-m}f(m)+\frac{1-m}{M-m}f(M)\qquad(27)$$
holds, where $\bar{r}_i=\sum_{j\in N_i}\frac{1}{(j+t_1)^{s_1}H_{n,s_1,t_1}}\Big/\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}$.
Proof. 
If we define $\mathbf{p}$ and $\mathbf{q}$ as Zipf–Mandelbrot law $n$-tuples with parameters $s_1,t_1$ and $s_2,t_2$, respectively, then from Theorem 8 we obtain (27). □
Since the minimal value of $q_i$ is $\min\{q_i\}=\frac{1}{(n+t_2)^{s_2}H_{n,s_2,t_2}}$ and its maximal value is $\max\{q_i\}=\frac{1}{(1+t_2)^{s_2}H_{n,s_2,t_2}}$, from the right-hand side of (24) and the left-hand side of (25) we obtain the following result.
Corollary 21.
Let $f:I\to\mathbb{R}^+$ be a convex function on $I$, $m,M\in I$, $-\infty<m<M<+\infty$. Let $\mathbf{p}=(p_1,\ldots,p_n)$ be an $n$-tuple of real numbers. Suppose $s_2>0$, $t_2\ge0$ are such that $m\le p_i(i+t_2)^{s_2}H_{n,s_2,t_2}\le M$, $i=1,\ldots,n$. Let $N_i\subseteq\{1,2,\ldots,n\}$, $i=1,\ldots,m$, where $N_i\cap N_j=\emptyset$ for $i\neq j$, $\bigcup_{i=1}^m N_i=\{1,2,\ldots,n\}$, $\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\in I$, $i=1,\ldots,m$, and $m_i=\min\{p_j(j+t_2)^{s_2}H_{n,s_2,t_2}:j\in N_i\}$, $M_i=\max\{p_j(j+t_2)^{s_2}H_{n,s_2,t_2}:j\in N_i\}$ for $i=1,\ldots,m$. Then
$$\frac{1}{(n+t_2)^{s_2}H_{n,s_2,t_2}}\sum_{i=1}^m|N_i|\,f\!\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\right)\le\hat{D}_f(\mathbf{p},i,n,s_2,t_2)\le\frac{1}{(1+t_2)^{s_2}H_{n,s_2,t_2}}\sum_{i=1}^m\left[\frac{M_i|N_i|-(1+t_2)^{s_2}H_{n,s_2,t_2}\sum_{j\in N_i}p_j}{M_i-m_i}f(m_i)+\frac{(n+t_2)^{s_2}H_{n,s_2,t_2}\sum_{j\in N_i}p_j-m_i|N_i|}{M_i-m_i}f(M_i)\right]\qquad(28)$$
holds.
Proof. 
Using $\min\{q_i\}=\frac{1}{(n+t_2)^{s_2}H_{n,s_2,t_2}}$ and $\max\{q_i\}=\frac{1}{(1+t_2)^{s_2}H_{n,s_2,t_2}}$ (together with $f\ge0$) on the right-hand side of (24) and the left-hand side of (25), we obtain
$$\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(n+t_2)^{s_2}H_{n,s_2,t_2}}\right)f\!\left(\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(j+t_2)^{s_2}H_{n,s_2,t_2}}}\right)\le\hat{D}_f(\mathbf{p},i,n,s_2,t_2)\le\sum_{i=1}^m\left(\sum_{j\in N_i}\frac{1}{(1+t_2)^{s_2}H_{n,s_2,t_2}}\right)\left[\frac{M_i-\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(1+t_2)^{s_2}H_{n,s_2,t_2}}}}{M_i-m_i}f(m_i)+\frac{\frac{\sum_{j\in N_i}p_j}{\sum_{j\in N_i}\frac{1}{(n+t_2)^{s_2}H_{n,s_2,t_2}}}-m_i}{M_i-m_i}f(M_i)\right],$$
and (28) follows. □

4. Conclusions

In this paper, we have obtained a refinement of the Lah–Ribarič inequality and a refinement of the Jensen inequality; both follow from applying the Lah–Ribarič inequality and the Jensen inequality on disjoint subsets of $\{1,2,\ldots,n\}$.
Using these results, we find a refinement of the discrete Hölder inequality and a refinement of some inequalities for the discrete weighted power means and the discrete weighted quasi-arithmetic means. In addition, some interesting estimations for the discrete Csiszár divergence and for its important special cases are obtained.
It would be interesting to see whether using this method one can give refinements of some other inequalities. In addition, we can try to use this method for refining the Jensen inequality and the Lah–Ribarič inequality for operators.

Author Contributions

All authors jointly worked on the results. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dragomir, S.S.; Adil Khan, M.; Abathun, A. Refinement of Jensen’s integral inequality. Open Math. 2016, 14, 221–228.
  2. Jessen, B. Bemaerkninger om konvekse Funktioner og Uligheder imellem Middelvaerdier. I. Mat. Tidsskr. B 1931, 17–29.
  3. Merhav, N. Reversing Jensen’s Inequality for Information-Theoretic Analyses. Information 2022, 13, 39.
  4. Jensen, J.L.W.V. Om konvexe funktioner og uligheder mellem Middelvaerdier. Nyt Tidsskr. Math. 1905, 16B, 49–69.
  5. Nikolova, L.; Persson, L.E.; Varošanec, S. A new look at classical inequalities involving Banach lattice norms. J. Inequal. Appl. 2017, 2017, 302.
  6. Pečarić, J.E.; Proschan, F.; Tong, Y.L. Convex Functions, Partial Orderings, and Statistical Applications; Mathematics in Science and Engineering 187; Academic Press: Boston, MA, USA, 1992; ISBN 0-12-549250-2.
  7. Lah, P.; Ribarič, M. Converse of Jensen’s inequality for convex functions. Univ. Beograd Publ. Elektrotehn. Fak. Ser. Mat. Fiz. 1973, 412–460, 201–205.
  8. Mitrinović, D.S.; Pečarić, J.E.; Fink, A.M. Classical and New Inequalities in Analysis; Mathematics and Its Applications (East European Series) 61; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1993; ISBN 0-7923-2064-6.
  9. Andrić, M.; Pečarić, J. Lah–Ribarič type inequalities for (h, g; m)-convex functions. Rev. Real Acad. Cienc. Exactas Fís. Nat. Ser. A Mat. 2022, 116, 39.
  10. Popescu, P.G.; Sluşanschi, E.I.; Iancu, V.; Pop, F. A New Upper Bound for Shannon Entropy. A Novel Approach in Modeling of Big Data Applications. Concurr. Comput. Pract. Exp. 2016, 28, 351–359.
  11. Pečarić, J.; Perić, J. Refinements of the integral form of Jensen’s and the Lah–Ribarič inequalities and applications for Csiszár divergence. J. Inequal. Appl. 2020, 2020, 108.
  12. Niculescu, C.P.; Persson, L.-E. Convex Functions and Their Applications. A Contemporary Approach; CMS Books in Mathematics; Springer: New York, NY, USA, 2005.
  13. Beckenbach, E.F.; Bellman, R. Inequalities; Springer: Berlin/Göttingen/Heidelberg, Germany, 1961.
  14. Hardy, G.H.; Littlewood, J.E.; Pólya, G. Inequalities; Cambridge University Press: Cambridge, UK, 1934.
  15. Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Studia Sci. Math. Hungar. 1967, 2, 299–318.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
