A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization

Tanaka, Hisa-Aki; Nakagawa, Masaki; Oohama, Yasutada

doi:10.3390/e21060549

Open AccessArticle

A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization

by

Hisa-Aki Tanaka

^*,

Masaki Nakagawa

and

Yasutada Oohama

Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan

^*

Author to whom correspondence should be addressed.

Entropy 2019, 21(6), 549; https://doi.org/10.3390/e21060549

Submission received: 30 April 2019 / Revised: 27 May 2019 / Accepted: 27 May 2019 / Published: 30 May 2019

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

The well-known Hölder’s inequality has been recently utilized as an essential tool for solving several optimization problems. However, such an essential role of Hölder’s inequality does not seem to have been reported in the context of generalized entropy, including Rényi–Tsallis entropy. Here, we identify a direct link between Rényi–Tsallis entropy and Hölder’s inequality. Specifically, we demonstrate yet another elegant proof of the Rényi–Tsallis entropy maximization problem. Especially for the Tsallis entropy maximization problem, only with the equality condition of Hölder’s inequality is the q-Gaussian distribution uniquely specified and also proved to be optimal.

Keywords:

Rényi–Tsallis entropy; generalized entropy; optimization; Hölder’s inequality

1. Introduction

Tsallis entropy [1,2] has been recently utilized as a versatile framework for expanding the realm of Shannon–Boltzmann entropy for nonlinear processes, in particular, those that exhibit power–law behavior. It shares a structure in common with Rényi entropy [3], Daróczy entropy [4], and probability moment presented in Moriguti [5], since the essential part of all these functionals is

\int p^{q} (x) d x

(or

\sum p_{i}^{q}

) for certain constrained probability density functions

p (x)

(or

p_{i}

). This naturally has been of interest for a variety of issues in information theory and related areas. For instance, in his pioneering work, Campbell [6] stated that “Implicit in the use of average code length as a criterion of performance is the assumption that cost varies linearly with code length. This is not always the case.” Then, Campbell [6] introduced a nonlinear average length measure defined as

\begin{matrix} L (t) = \frac{1}{t} {log}_{D} \sum_{i = 1}^{N} p_{i} D^{t l_{i}}, \end{matrix}

being an extension of the one by Shannon,

\begin{matrix} L_{0} = \sum_{i = 1}^{N} p_{i} l_{i}, \end{matrix}

in which D is the size of the alphabet,

p_{i}

is the probability for a source to produce symbols

x_{i}

,

l_{i}

is the length of a codeword

c_{i}

mapped from symbol

x_{i}

(using D letters of the alphabet) in the context of source coding, and t is an arbitrary parameter (

0 < t < \infty

). One of the surprising facts proved in [6] is that the lower bound to the moment-generating function of code lengths, namely,

L (t)

, is given by

H_{\frac{1}{1 + t}} (p)

, namely, Rényi entropy of order

{(1 + t)}^{- 1}

of the source

p = {p_{i}}_{i = 1}^{N}

. Moreover, Ref. [6] also realizes that, if

\begin{matrix} l_{i} = \frac{1}{1 + t} {log}_{D} \frac{1}{p_{i}} + \frac{t}{1 + t} H_{\frac{1}{1 + t}} (p), \end{matrix}

which is a mixture of the Shannon code length

{log}_{D} \frac{1}{p_{i}}

and Rényi entropy of order

{(1 + t)}^{- 1}

, we have the lower bound

L (t) = H_{\frac{1}{1 + t}} (p)

. So far, Baer [7] has further generalized this result and constructed an algorithm for finding optimal binary codes under quasiarithmetic penalties. In addition, new extensions of [6] were obtained by Bercher [8] and by Bunte and Lapidoth [9].

Such an instance, where “a nonlinear measure” (i.e., generalized entropy) naturally arises, is also known for channel capacities. Daróczy [4] first analyzed a generalized channel capacity, which is a natural consequence of his extension of Shannon entropy (i.e., Daróczy entropy). This result has initiated extensive work in this direction. For instance, Landsberg and Vedral [10] first introduced Rényi entropy and Tsallis entropy for a binary symmetric channel, and they suggested the possibility of “super-Shannon channel capacities.” More recently, Ilić, Djordjević, and Küeppers [11] obtained new expressions for generalized channel capacities by introducing Daróczy–Tsallis entropy even for a weakly symmetric channel, binary erasure channel, and z-channel. Similar extensions have been explored for rate distortion theory. For instance, Venkatesan and Plastino [12] developed nonextensive rate distortion theory by introducing Tsallis entropy and constructed a minimization algorithm for generalized mutual information. More recently, Girardin and Lhote [13] covered the setting in [12] in a general framework of generalized entropy rates, which includes Rènyi–Tsallis entropy.

In the context of generalized entropy just described, the q-Gaussian distribution [1,2] often emerges as a maximizer of Rényi–Tsallis entropy under certain constraints, and, hence, it has been extensively studied. Since the q-Gaussian effectively models power–law behavior with a one-parameter q, its utility is widespread in various areas, including new random number generators proposed by Thistleton, Marsh, Nelson, and Tsallis [14] and by Umeno and Sato [15]. In addition to such an important application in communication systems, queuing theory has recently incorporated the q-Gaussian, reflecting the heavy-tailed traffic characteristics observed in broadband networks [16,17,18,19]. For instance, Karmeshu and Sharma [16] introduced Tsallis entropy maximization, and, there, the q-Gaussian emerges as the queue length distributions, which suggests that Jaynes’ maximum entropy principle [20,21,22] can be generalized to a framework of Tsallis entropy.

Some of the above issues are formulated as nonlinear optimizations with “a nonlinear measure” under certain constraints (which depend on each issue). As mentioned above, Rényi–Tsallis entropy and q-Gaussian is one such instance. In other words, the q-Gaussian maximizes Tsallis entropy under certain constraints. Therefore, it is useful to obtain a deeper understanding of such nonlinear optimization problems. In this study, we find a direct link between Rényi–Tsallis entropy and Hölder’s inequality that leads to yet another elegant proof of Rényi–Tsallis entropy maximization. The idea of the proof is different from those offered in previous studies (for instance, [23,24,25,26,27]) as explained below. Interestingly, the technique developed in this study might possibly be useful for tackling more complicated problems regarding optimization issues in information theory and other research areas, such as the conditional Rényi entropy (as in [28,29,30]), for instance.

Previous studies [23,24,25,26,27] are based on a common standpoint, the generalization of the moment–entropy inequality (cf. [25,26]). Namely, they intend to generalize the situation that a continuous random variable with a given second moment and maximal Shannon entropy is a Gaussian distribution (cf. [3], Theorem 8.6.5). In doing so, a generalized relative entropy is devised, which takes a different form (and has a different name) depending on the problem. First of all, Tsukada and Suyari’s beautiful work [23] has given proofs for Rényi entropy maximization, which is also known as a bound of Moriguti’s probability moment [5] (as posed in R1 in Section 2). Namely, they prove that the q-Gaussian distribution [1,2] is a unique optimal solution by utilizing the fact that all feasible solutions constitute a convex set. Although [23] does not explicitly construct a generalized relative entropy, the essential structure of the proofs inherits the one in the proof of the moment-entropy inequality ([3], Theorem 8.6.5)).

Moreover, they have identified an explicit one-to-one correspondence between feasible solutions to the problems of Rényi entropy maximization and Tsallis entropy maximization, which is also shown in ([31], p. 754). This implies that an ‘indirect’ proof to Tsallis entropy maximization (as posed in T1 in Section 2) has been first obtained in [23]. In contrast to this proof, the first ‘direct’ proof to Tsallis entropy maximization is obtained in Furuichi’s elegant work [24]. The proof in [24] utilizes nonnegativity of the Tsallis relative entropy defined between the q-Gaussian distribution (i.e., a possible maximizer) and any other feasible solution. On the other hand, the remarkable work of Lutwak, Yang, and Zhang first clarified that generalized Gaussians maximize

λ

-Rényi entropy power under a constraint on the p-th moment of the distribution, for univariate distributions [25] and for the associated n-dimensional extensions [26].The essential point in the proofs in [25,26] is construction of relative

λ

-Rényi entropy power, which is nonnegative and takes a quite different form compared to the Tsallis relative entropy in [24]. (More precisely, in [25], they prove nonnegativity of the relative

λ

-Rényi entropy

log N_{λ} [f, g]

([25], Lemma 1). Starting from this nonnegativity:

log N_{λ} [f, G t] \geq 1

, they construct a series of inequalities that saturate at the generalized Gaussian ([25], Lemma 2). Note, however, that, as observed in this

N_{λ} [f, G t]

, they start by giving a candidate of the maximizer ab initio, which is the generalized Gaussian

G t

.) Furthermore, Vignat, Hero, and Costa [32] obtained a general, sharp result using the Bregman information divergence for an n-dimensional extension of Tsallis entropy. In addition to [25,26,32], Eguchi, Komori, and Kato’s interesting results [27] include the same n-dimensional extension to Tsallis entropy. (Ref. [32] has also identified an elegant structure regarding the projective divergence and the

γ

-loss functions in maximum likelihood estimation.)Similar to [24,25,26,32], the key component of the proof in [27] is the projective power divergence, which again takes a quite different form compared to the ones in [24,25,26,32]. To prove nonnegativity of the generalized relative entropy, Refs. [25,26,27] utilize Hölder’s inequality, but Refs. [23,24,32] do not. Namely, Hölder’s inequality has been an auxiliary useful tool, and it has never played an essential role in these previous studies. In addition to the construction of generalized relative entropies, the optimal q-Gaussian distribution needs to be ‘given ab initio’ [23,24,25,26,27,32], inheriting the framework showing that the Gaussian distribution maximizes Shannon entropy ([3], Theorem 8.6.5).

Now natural questions arise: is it possible to systematically solve the problems of Rényi–Tsallis entropy maximization in a different (and hopefully simpler) way than the previous study? In addition, is it possible to ‘construct’ the q-Gaussian distribution? These questions are positively answered from a new viewpoint as follows. First, only by the equality (i.e., saturation) condition of Hölder’s inequality, the q-Gaussian distribution is specified, and, at the same time, its optimality is proved by Hölder’s inequality for a Tsallis entropy maximization of

1 < q < 3

(Theorem 1) and of

0 \leq q < 1

(Theorems 2 and 3). This clarifies how and why the q-Gaussian distribution emerges as the maximizer in an explicit way for the first time in the literature. (To the authors’ knowledge, such a characterization of the q-Gaussian distribution has never been reported.) However, for a Rényi entropy maximization of

q > 1

(Theorem 4) and of

\frac{1}{3} < q < 1

(Theorem 5), the q-Gaussian distribution is specified with the aid of the equality condition of Hölder’s inequality. In addition, the proof of its optimality requires a simple inequality inspired from Moriguti [5]. Note that we do not intend to provide an explicit characterization of the q-Gaussian distribution in terms of the parameter q, since numerous previous studies (including [23,24,25,26,27]) have already clarified this. Nevertheless, regarding Tsallis entropy maximization when

q = 0

, which has previously been studied in [2], a rigorous result (as in Theorem 3) is now obtained for the first time thanks to Hölder’s inequality. (For instance, in the framework of [24], the case for

q = 0

cannot be incorporated because the Tsallis relative entropy is not defined adequately.)

We note that Hölder’s inequality has been recently utilized as an essential tool for optimization in Campbell [6], Bercher [8], and Bunte and Lapidoth [9]; on source coding, in Bercher [33,34]; on generalized Cramér–Rao inequalities; and in Tanaka [35,36] on a physical limit of injection locking. However, such an essential role of Hölder’s inequality does not seem to be reported in the context of generalized entropy, including Rényi entropy (cf. [37]), except for the use as a means for proving nonnegativity of a generalized relative entropy, as mentioned above.

In what follows, Section 2 introduces basic definitions required for the analysis. Section 3 includes the main results regarding Rényi–Tsallis entropy maximization problems, and it also contains an explanation on the link to Moriguti’s argument in [5]. Section 4 lists the proofs to the results presented in Section 3. Finally, one Appendix at the end provides further supplementary information.

2. Basic Definitions and Problem Formulation

In this section, we first define Tsallis entropy [1,24] and Rényi entropy ([3], pp. 676–679). Next, we reformulate Rényi–Tsallis entropy maximization problems in a unified way. Finally, we introduce Hölder’s inequality in relation to the problems in this study.

2.1. Tsallis Entropy and Rényi Entropy

Tsallis entropy is beautifully presented in the context of q-analysis (cf. [1], p. 41) as follows. First, the q-exponential function

{exp}_{q}

, whose domain and range satisfies

\begin{matrix} {exp}_{q} : \{\begin{matrix} [\frac{- 1}{1 - q}, \infty) \to R^{+} \cup {0} & if q < 1, \\ (- \infty, \frac{1}{q - 1}] \to R^{+} & if q > 1, \end{matrix} \end{matrix}

is defined by

\begin{matrix} {exp}_{q} x = {[1 + (1 - q) x]}^{\frac{1}{1 - q}} . \end{matrix}

While, the inverse of the q-exponential function, namely q-logarithmic function, is defined by

\begin{matrix} {ln}_{q} x = \frac{x^{1 - q} - 1}{1 - q} . \end{matrix}

Note that, as

q \to 1

, we have

{exp}_{q} x \to e^{x}

and

{ln}_{q} x \to ln x

. We also note that the above definition of

\exp_{q} x

and

\ln_{q} x

has been recently revised by Oikonomou and Bagci [38]. (In [38], they have further developed ‘complete’ q-exponentials and q-logarithms.) Then, the Tsallis entropy

H_{q}^{Tsallis}

is defined by

\begin{matrix} H_{q}^{Tsallis} [p] = - \int_{- \infty}^{\infty} p^{q} (x) \ln_{q} p (x) d x = - ⟪p^{q} (x) \ln_{q} p (x)⟫, \end{matrix}

(1)

for univariate probability density functions (PDFs) p on

R

, which is a natural generalization of Boltzmann–Gibbs entropy and Shannon entropy. Hereafter,

⟪\cdot⟫ = \int_{- \infty}^{\infty} \cdot d x

in (1) is used for notational simplicity. The reason why

⟪\cdot⟫

is used, instead of

〈\cdot〉

, is due to the fact that

〈\cdot〉

is generally used for the expectation value. On the other hand, Rényi entropy is well-known and can be found in textbooks of information theory (cf. [3], pp. 676–679), which is defined simply by

H_{q}^{R é nyi} [p] = \frac{ln ⟪p^{q} (x)⟫}{1 - q} (0 < q < \infty (q \neq 1)) .

Finally, we note that only differential entropies (i.e., continuous probability distributions) are considered in this study, although our technique with Hölder’s inequality can be applied to discrete probability distributions.

2.2. Problem Formulation

Let

D

be the set of all PDFs on

R

. We then define the set, as introduced in [24],

\begin{matrix} C_{q} = \{p | p \in D and \frac{⟪x^{2} p^{q} (x)⟫}{⟪p^{q} (x)⟫} < \infty\} (\subset D) . \end{matrix}

Following the problem formulation in [1,2,23,24,31], we first introduce the Tsallis entropy maximization problem for univariate PDFs p on

R

:

\begin{matrix} T 1 : & \underset{p \in C_{q}}{maximize} H_{q}^{Tsallis} [p] = - ⟪p^{q} (x) \ln_{q} p (x)⟫ = \frac{1 - ⟪p^{q} (x)⟫}{q - 1} (for q \geq 0 (q \neq 1)) \end{matrix}

(2a)

\begin{matrix} subject to ⟪p (x)⟫ = 1, \end{matrix}

(2b)

\begin{matrix} \frac{⟪x^{2} p^{q} (x)⟫}{⟪p^{q} (x)⟫} = ⟪x^{2} \tilde{P_{q}} (x)⟫ = σ^{2}, \end{matrix}

(2c)

in which q and

σ^{2}

have fixed values, and

\tilde{P_{q}} (x) = p^{q} (x) / ⟪p^{q} (x)⟫

. Note that

⟪p^{q} (x)⟫ < \infty

and

⟪x^{2} p^{q} (x)⟫ < \infty

are assumed in T1.

{\tilde{P}}_{q} (x)

is often called the escort probability [27,31]. This somewhat unusual form of expectation

⟪x^{2} {\tilde{P}}_{q} (x)⟫ = σ^{2}

is called the q-normalized expectation [31], which has been usually assumed in Tsallis statistics. In contrast to the q-normalized expectation, as [31] pointed out, the usual expectation

⟪x^{2} p (x)⟫ = σ^{2}

is also valid in Tsallis statistics. We note the Tsallis entropy maximization problem under the constraint of this usual expectation is considered later in problem R2.

For problem T1, using Tsallis relative entropy, Furuichi [24] first proved that for

0 < q < 3

the q-Gaussian distribution

p (x) = \frac{1}{Z_{q}} \exp_{q} (- β_{q} x^{2})

maximizes the Tsallis entropy among any univariate PDFs in

C_{q}

, where

Z_{q}

and

β_{q}

are constants determined by q and

σ

.

Here, we formulate a slightly generalized optimization problem T2, as follows. First, replace (2c) with

\begin{matrix} ⟪x^{2} p^{q} (x)⟫ - σ^{2} ⟪p^{q} (x)⟫ = ⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0 . \end{matrix}

Note that now, as opposed to T1, it is not necessarily required that both

⟪x^{2} p^{q} (x)⟫

and

⟪p^{q} (x)⟫

are finite, and hence,

C_{q}

is not required, and it is replaced with

D

. Next, notice that Tsallis entropy is maximal at

p (x)

, such that

⟪p^{q} (x)⟫

is minimal (or correspondingly, maximal at

p (x)

, such that

⟪p^{q} (x)⟫

is maximal) for

q > 1

(correspondingly, for

0 \leq q < 1

). Then, by introducing an additional arbitrary parameter

λ_{q}

, T1 is reformulated as

\begin{matrix} T 2 : & \underset{p \in D}{minimize} (or maximize) T_{q} [p; λ_{q}] = σ^{2} ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p^{q} (x)⟫ \end{matrix}

\begin{matrix} = ⟪p^{q} (x) [λ_{q} x^{2} + (1 - λ_{q}) σ^{2}]⟫ (λ_{q} \in R, for q > 1 (correspondingly, for 0 \leq q < 1)) \end{matrix}

(3a)

\begin{matrix} subject to ⟪p (x)⟫ = 1, \end{matrix}

(3b)

\begin{matrix} ⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0, \end{matrix}

(3c)

where the constant

σ^{2}

is multiplied with

⟪p^{q} (x)⟫

in the first term of (3a) simply due to notational convenience for later analysis in Section 4.

As opposed to the Tsallis entropy maximization problem T1, the Rényi entropy maximization problem is usually considered under the constraint of the usual expectation

⟪x^{2} p (x)⟫ = σ^{2}

, in other words,

\begin{matrix} R 1 : & maximize H_{q}^{R é nyi} [p] = \frac{ln ⟪p^{q} (x)⟫}{1 - q} (for 0 < q < \infty (q \neq 1)) \end{matrix}

(4a)

\begin{matrix} subject to ⟪p (x)⟫ = 1, \end{matrix}

(4b)

\begin{matrix} ⟪x^{2} p (x)⟫ = σ^{2}, \end{matrix}

(4c)

which is equivalent to:

\begin{matrix} \underset{p \in D}{minimize} (or maximize) ⟪p^{q} (x)⟫ (q > 1 (correspondingly, 0 \leq q < 1)) \end{matrix}

(5a)

\begin{matrix} subject to ⟪p (x)⟫ = 1, \end{matrix}

(5b)

\begin{matrix} ⟪x^{2} p (x)⟫ = σ^{2} . \end{matrix}

(5c)

We note this very problem for

q > 1

was first posed and solved by Moriguti in 1952 [5]. (Later in [39], cases

q > 1

and

0 < q < 1

are both analyzed in an n-dimensional spherical symmetric extension of [5] with the same approach as [5].)Similar to T1, by introducing an additional parameter

λ_{q}

and the constraint

\begin{matrix} ⟪x^{2} p (x)⟫ - σ^{2} ⟪p (x)⟫ = ⟪(x^{2} - σ^{2}) p (x)⟫ = 0, \end{matrix}

(6)

which is obtained from (5b) and (5c), R1 is now reformulated as

\begin{matrix} R 2 : & \underset{p \in D}{minimize} (or maximize) R_{q} [p; λ_{q}] = ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p (x)⟫ \end{matrix}

\begin{matrix} = ⟪p^{q} (x) [1 + λ_{q} (x^{2} - σ^{2}) p^{1 - q} (x)]⟫ (for q > 1 (correspondingly, for 0 \leq q < 1), λ_{q} \in R) \end{matrix}

(7a)

\begin{matrix} subject to ⟪p (x)⟫ = 1, \end{matrix}

(7b)

\begin{matrix} ⟪(x^{2} - σ^{2}) p (x)⟫ = 0 . \end{matrix}

(7c)

As we observe (3a) in T2 and (7a) in R2, both become the inner products of two functions;

p^{q} (x)

and

λ_{q} x^{2} + (1 - λ_{q}) σ^{2}

and

p^{q} (x)

and

1 + λ_{q} (x^{2} - σ^{2}) p^{1 - q} (x)

, respectively. This suggests a direct link to Hölder’s inequality.

2.3. Hölder’s Inequality for Later Analysis

Here, we provide minimum information about Hölder’s inequality for later analysis in Section 3 and Section 4. The standard Hölder’s inequality is given by

\begin{matrix} {∥ f g ∥}_{1} \leq {∥ f ∥}_{α} {∥ g ∥}_{β}, \end{matrix}

(8)

with

1 \leq α

,

β \leq \infty

and

α^{- 1} + β^{- 1} = 1

(cf. [40] for the one-demensional case and [41] for general measurable functions). In general, f and g are measurable functions defined on a subset

S \subseteq R^{n}

and

μ (S) > 0

, and we employ a compact notation as

\begin{matrix} {∥ f ∥}_{α} & = & {(\int_{S} {| f |}^{α} d μ)}^{\frac{1}{α}}, \\ {∥ g ∥}_{β} & = & {(\int_{S} {| g |}^{β} d μ)}^{\frac{1}{β}} . \end{matrix}

Although

{∥ \cdot ∥}_{α}

and

{∥ \cdot ∥}_{β}

are no longer norms for

α, β < 1

, now in the context of this study, we set

α = q^{- 1}

and

β = {(1 - q)}^{- 1}

. Then, Hölder’s inequality (8) is given in the following form:

\begin{matrix} {∥ f g ∥}_{1} \leq {∥ f ∥}_{\frac{1}{q}} {∥ g ∥}_{\frac{1}{1 - q}} (0 \leq q \leq 1) . \end{matrix}

(9)

For the case

0 < q < 1

, the equality in (9) holds if and only if there exists constants A and B, not both 0 (cf. [40], p. 140), (More specifically, if f is null (i.e.,

f (s) = 0 (a . e . s \in S)

), then

B = 0

. In addition, if g is null,

A = 0

.) such that

\begin{matrix} A {|f (s)|}^{\frac{1}{q}} = B {|g (s)|}^{\frac{1}{1 - q}} (a . e . s \in S) . \end{matrix}

(10)

In addition, for the exceptional case

q = 0

(as well as

q = 1

), we can argue a condition for the equality in (9) separately, as shown in Section 4.3, although the expression of (10) is no more valid for this case.

In contrast to (9), reverse Hölder’s inequality is given by

\begin{matrix} {∥ f g ∥}_{1} \geq {∥ f ∥}_{\frac{1}{q}} {∥ g ∥}_{\frac{- 1}{q - 1}} (q > 1), \end{matrix}

(11)

which is directly obtained from Hölder’s inequality [40]. We note that f can be 0 over any subset

U \subseteq S

. As for g, on the other hand, we assume

g (s) \neq 0

for almost everywhere (a.e.)

s \in S

, taking care that

\frac{- 1}{q - 1} < 0

in (11) (cf. [40], p. 140). Then, for the case

q > 1

, the equality in (11) holds if and only if there exists

A \geq 0

, such that

\begin{matrix} |f (s)| = A {|g (s)|}^{\frac{- q}{q - 1}} (a . e . s \in S \subseteq R^{n}) . \end{matrix}

(12)

3. Main Results

In this study, we focus on the univariate PDFs on

R

, and we consider

f (x)

and

g (x)

defined on

R

as a special case of general

f (s)

and

g (s)

in Section 2.3. Hereafter, we refer to (10) and (12), as the equality condition of Hölder’s inequality and reverse Hölder’s inequality, respectively. Thanks to these equality conditions, we obtained our results systematically.

Let

p (x)

be a univariate PDF defined on

R

. Assume that

p (x)

is a measurable function which is integrable with respect to x. In addition, let

B (\cdot, \cdot)

denote the Beta function (cf. [42], p. 253). Then, we can form the following statements.

Theorem 1.

(Tsallis entropy maximization for

1 < q < 3

): Suppose

1 < q < 3

.

p_{opt} (x)

, defined by

\begin{matrix} p_{opt} (x) = \frac{1}{Z_{q}} {exp}_{q} (- β_{q} x^{2}) with Z_{q} = \sqrt{\frac{3 - q}{q - 1}} σ B (\frac{1}{q - 1} - \frac{1}{2}, \frac{1}{2}) and β_{q} = \frac{1}{(3 - q) σ^{2}}, \end{matrix}

is the unique maximizer of the Tsallis entropy

H_{q}^{Tsallis} [p]

in (2a) under the constraints

⟪p (x)⟫ = 1

of (3b) and

⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0

of (3c) inT2.

Corollary 1.

For

q \geq 3

, Tsallis entropy

H_{q}^{Tsallis} [p]

is bounded, but has no maximizer. Namely, there exist PDFs

p (x)

, such that

H_{q}^{Tsallis} [p] \to \frac{1}{q - 1} \leq \frac{1}{2}

, in other words,

⟪p^{q} (x)⟫ \to 0

. (The idea for constructing such PDFs is from Tsukada and Suyari [23], where they proved thatR1for

q \leq \frac{1}{3}

becomes unbounded, i.e.,

⟪p^{q} (x)⟫ \to + \infty

.)

The proof of this theorem (and corollary) is given in Section 4.1. As mentioned in Section 1, the above statement itself has already appeared in [24]. However, our proof is quite different to the one in [24], in the sense that it does not require generalized relative entropy, and the maximizer is explicitly ‘specified’ (not ‘given ab initio’). Namely, reverse Hölder’s inequality aids in finding the optimal solution.

The outline of the proof is as follows. First, for

1 < q < 3

, the maximization of the Tsallis entropy

H_{q}^{Tsallis} [p]

, in other words, the minimization of

T_{q} [p; λ_{q}]

in (3a), is related to reverse Hölder’s inequality in (11). Second, we observe that

T_{q} [p; λ_{q}]

has the lower bound through reverse Hölder’s inequality. Third, the minimizer

p_{opt} (x)

achieving this bound is explicitly and uniquely constructed from the equality condition (12):

\begin{matrix} p_{opt} (x) = A^{\frac{1}{q}} {[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}]}^{\frac{- 1}{q - 1}}, \end{matrix}

(13)

where

A = {[\frac{3 - q}{2} σ^{2}]}^{\frac{q}{q - 1}} {[\sqrt{\frac{3 - q}{q - 1}} σ B (\frac{1}{q - 1} - \frac{1}{2}, \frac{1}{2})]}^{- q}

and

λ_{q, opt} = \frac{1}{2} (q - 1)

.

Remark 1.

Even if we assume the additional constraint:

p \in C_{q} (\subset D)

inT2, the proof of this theorem (as well as of Theorems 2 and 3) remains the same, since we do not require the finiteness of

⟪x^{2} p^{q} (x)⟫

and

⟪p^{q} (x)⟫

(i.e.,

p \in C_{q}

) in the proof.

Remark 2.

Another simple proof for optimality of

p_{opt}

is given as follows. The idea is due to Moriguti’s argument (cf. [5], p. 288), where

p_{opt}^{q} (x)

and any

p^{q} (x)

are directly related by the Taylor expansion for each

x \in R

:

\begin{matrix} p^{q} (x) = p_{opt}^{q} (x) + q p_{opt}^{q - 1} (x) [p (x) - p_{opt} (x)] + \frac{q (q - 1)}{2} p_{int}^{q - 2} (x) {[p (x) - p_{opt} (x)]}^{2}, \end{matrix}

(14)

where

p_{int} (x) (\geq 0)

has a value between

p (x)

and

p_{opt} (x)

. Substituting (13) into the second term of the right-hand side of (14), we have

\begin{matrix} [λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] [p^{q} (x) - p_{opt}^{q} (x)] \\ = & q A^{\frac{q - 1}{q}} [p (x) - p_{opt} (x)] + [λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] \frac{q (q - 1)}{2} p_{int}^{q - 2} (x) {[p (x) - p_{opt} (x)]}^{2} . \end{matrix}

(15)

With the constraints (3b) and (3c):

⟪x^{2} p_{opt}^{q} (x)⟫ = σ^{2} ⟪p_{opt}^{q} (x)⟫

,

⟪x^{2} p^{q} (x)⟫ = σ^{2} ⟪p^{q} (x)⟫

, and

⟪p (x)⟫ = ⟪p_{opt} (x)⟫ = 1

, integrating (15) over

R

, for any

p (x)

, we find

\begin{matrix} σ^{2} [⟪p^{q} (x)⟫ - ⟪p_{opt}^{q} (x)⟫] = \frac{q (q - 1)}{2} ⟪[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] p_{int}^{q - 2} (x) {[p (x) - p_{opt} (x)]}^{2}⟫ \geq 0, \end{matrix}

(16)

since

λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2} > 0

follows from

0 < λ_{q, opt} (= \frac{1}{2} (q - 1)) < 1

. Therefore,

p_{opt}

in (13) is a unique optimal solution toT2, as the equality holds only if

p = p_{opt}

.

Theorem 2.

(Tsallis entropy maximization for

0 < q < 1

): Suppose

0 < q < 1

.

p_{opt} (x)

, as defined by

\begin{matrix} p_{opt} (x) = \{\begin{matrix} \frac{1}{Z_{q}} {exp}_{q} (- β_{q} x^{2}) (x \in {\bar{S}}_{q, opt}) \\ 0 (x \in R \ {\bar{S}}_{q, opt}), \end{matrix} \\ with {\bar{S}}_{q, opt} = [- \sqrt{\frac{3 - q}{1 - q}} σ, \sqrt{\frac{3 - q}{1 - q}} σ], Z_{q} = \sqrt{\frac{3 - q}{1 - q}} σ B (\frac{1}{1 - q} + 1, \frac{1}{2}), and β_{q} = \frac{1}{(3 - q) σ^{2}}, \end{matrix}

is the unique maximizer of the Tsallis entropy

H_{q}^{Tsallis} [p]

in (2a) under the constraints

⟪p (x)⟫ = 1

of (3b) and

⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0

of (3c) inT2.

The proof of this theorem is given in Section 4.2. In the case for

0 < q < 1

, the maximization of

T_{q} [p; λ_{q}]

is recast as Hölder’s inequality in (9), where, similar to the argument in the proof of Theorem 1, construction of

p_{opt}

and

λ_{q, opt}

and verification of its optimality are carried out simultaneously. The maximizer

p_{opt}

for

0 < q < 1

is uniquely determined from the equality condition (10):

\begin{matrix} p_{opt} (x) = \{\begin{matrix} \frac{A}{B} {[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}} (x \in {\bar{S}}_{q, opt}) \\ 0 (x \in R \ {\bar{S}}_{q, opt}), \end{matrix} \end{matrix}

(17)

where

A / B = {[\frac{3 - q}{2} σ^{2}]}^{\frac{1}{q - 1}} {[\sqrt{\frac{3 - q}{1 - q}} σ B (\frac{1}{1 - q} + 1, \frac{1}{2})]}^{- 1}

and

λ_{q, opt} = \frac{1}{2} (q - 1) (< 0)

are uniquely determined, and the associated

{\bar{S}}_{q, opt}

is uniquely determined as

{\bar{S}}_{q, opt} = [- \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ, \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ]

.

Remark 3.

Another simple proof for optimality of

p_{opt}

is given by following Moriguti’s argument ([5], p. 288). Similar to the case for

1 < q < 3

in Remark 2, (14) holds for

x \in {\bar{S}}_{q, opt} \subset R

. As for

x \in R \ {\bar{S}}_{q, opt}

,

p_{opt} (x) = 0

from (17). We then have

\begin{matrix} [λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] [p^{q} (x) - p_{opt}^{q} (x)] \\ = {\begin{cases} q A^{\frac{q - 1}{q}} [p (x) - p_{opt} (x)] + [λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] \frac{q (q - 1)}{2} p_{int}^{q - 2} (x) {[p (x) - p_{opt} (x)]}^{2} (x \in {\bar{S}}_{q, opt}) \\ {⟪[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] p^{q} (x)⟫}_{R \ {\bar{S}}_{q, opt}} (x \in R \ {\bar{S}}_{q, opt}), \end{cases} \end{matrix}

(18)

where

{⟪\cdot⟫}_{R \ {\bar{S}}_{q, opt}} = \int_{R \ {\bar{S}}_{q, opt}} \cdot d x

is used for notational simplicity. Integrating (18) over

R

, for any p satisfying the constraints (3b) and (3c), we find

\begin{matrix} σ^{2} [⟪p^{q} (x)⟫ - ⟪p_{opt}^{q} (x)⟫] \\ = q A^{\frac{q - 1}{q}} {⟪p (x) - p_{opt} (x)⟫}_{{\bar{S}}_{q, opt}} + \frac{q (q - 1)}{2} {⟪[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] p_{int}^{q - 2} (x) {[p (x) - p_{opt} (x)]}^{2}⟫}_{{\bar{S}}_{q, opt}} \\ + {⟪[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}] p^{q} (x)⟫}_{R \ {\bar{S}}_{q, opt}} \leq 0, \end{matrix}

(19)

since the first term on the right-hand side

\leq 0

as

{⟪p (x)⟫}_{{\bar{S}}_{q, opt}} \leq 1

, the second term

\leq 0

from the definition of

{\bar{S}}_{q, opt}

and the fact that

0 < q < 1

, and the third term

\leq 0

because of the definition of

{\bar{S}}_{q, opt}

. Since the equality in (19) holds only if

p = p_{opt}

, this implies that

p_{opt}

in (17) is a unique optimal solution toT2.

Theorem 3.

(Tsallis entropy maximization for

q = 0

): Suppose

q = 0

.

p_{opt} (x)

, defined by

\begin{matrix} p_{opt} (x) = \{\begin{matrix} arbitrary positive value (x \in {\bar{S}}_{q, opt} = [- \sqrt{3} σ, \sqrt{3} σ]) \\ 0 (x \in R \ {\bar{S}}_{q, opt}), \end{matrix} \\ w i t h {⟪p_{opt} (x)⟫}_{{\bar{S}}_{q, opt}} = 1, \end{matrix}

(20)

is the unique representation of the maximizer of the Tsallis entropy

H_{q}^{Tsallis} [p]

in (2a) under the constraints

⟪p (x)⟫ = 1

of (3b) and

⟪(x^{2} - σ^{2}) p^{0} (x)⟫ = 0

of (3c) inT2.

The proof of this theorem is given in Section 4.3, where the associated Hölder’s inequality is given as

{∥ f g ∥}_{1} \leq {∥ f ∥}_{\infty} {∥ g ∥}_{1}

, and we follow the arguments in the proof of Theorem 2 for

0 < q < 1

. (This exceptional case (

q = 0

) is also considered in T1, where the same result is obtained through a more direct graphical argument, after proving that any candidate

p_{*} (x)

for the maximizer

p_{opt} (x)

is defined only on a simply connected interval

S_{*}

that is symmetric about the origin

O

. The proof is straightforward but lengthy, so we omit it here.) However, as opposed to the case for

0 < q < 1

, the equality condition is not available in the form of (10) for

q = 0

, and we directly verify that

\begin{matrix} f_{*} (x) = \{\begin{matrix} sgn [g (x)] (a . e . x \in {\bar{S}}_{q, opt}) \\ 0 (x \in R \ {\bar{S}}_{q, opt}) \end{matrix} \end{matrix}

is the unique solution satisfying the equality in (9), as shown in Lemma 1. Namely, for any feasible solutions

p (x)

satisfying the constraints (3b) and (3c), we find that

f_{*} (x)

is associated with the unique maximizer

p_{opt} (x)

of

T_{q} [p; λ_{q}]

from (52), and hence,

p_{opt} (x)

for

q = 0

is obtained as in (20).

Remark 4.

We note that the optimal solution shown in ([2], p. 2399, Figure 1) for

q = 0

, which is obtained by setting

q \to 0

in (17), is a special case of (20).

Theorem 4.

(Rényi entropy maximization for

q > 1

): Suppose

q > 1

.

p_{opt} (x)

, as defined by

\begin{matrix} p_{opt} (x) = \{\begin{matrix} \frac{1}{Z_{q}^{*}} {exp}_{q^{*}} (- β_{q}^{*} x^{2}) (x \in {\underset{\bar{}}{S}}_{opt}) \\ 0 (x \in R \ {\underset{\bar{}}{S}}_{opt}), \end{matrix} \\ with q_{*} = 2 - q, {\underset{\bar{}}{S}}_{opt} = [- \sqrt{\frac{3 q - 1}{q - 1}} σ, \sqrt{\frac{3 q - 1}{q - 1}} σ], Z_{q}^{*} = \sqrt{\frac{3 q - 1}{q - 1}} σ B (\frac{q}{q - 1}, \frac{1}{2}), and β_{q}^{*} = \frac{1}{(3 q - 1) σ^{2}}, \end{matrix}

is the unique maximizer of Rényi entropy in (4a) under the constraints

⟪p (x)⟫ = 1

of (5b) (or (7b)) and

⟪x^{2} p (x)⟫ = σ^{2}

of (5c) (or (7c)).

The proof of this theorem is given in Section 4.4. The minimization of

R_{q} [p; λ_{q}]

in (7b) for

q > 1

is related to reverse Hölder’s inequality in (11). In contrast to those of Theorems 1–3, the proof of Theorem 4, which can be found in Section 4.4, follows from two steps. In the first step, we construct a candidate for the minimizer (i.e.,

p_{opt} (x)

, see (61) below), whose support becomes

{\underset{\bar{}}{S}}_{opt}

, and we determine the associated

λ_{q, opt}

and

{\underset{\bar{}}{S}}_{opt}

through the equality condition of reverse Hölder’s inequality. In doing so, as shown in Figure 1, we introduce a subset of feasible solutions

p (x)

, in other words,

Q

, which satisfies the constraints (5b) and (5c), and an additional constraint:

p^{q} (x) + λ_{q} (x^{2} - σ^{2}) p (x) > 0 (x \in R)

. In the second step, after obtaining a candidate

p_{opt} (x) \in Q

, we verify that this

p_{opt} (x)

is indeed the unique minimizer of

R_{q} [p; λ_{q}]

by directly comparing

⟪p_{opt}^{q} (x)⟫

and

⟪p^{q} (x)⟫

for any feasible solutions

p (x)

satisfying the constraints (5b) and (5c).

Remark 5.

We note that the first proof for this optimality of

p_{opt}

has been given in Moriguti [5], in which the essential idea is the Taylor expansion shown in the argument below (18).

Theorem 5.

(Rényi entropy maximization for

\frac{1}{3} < q < 1

): Suppose

\frac{1}{3} < q < 1

.

p_{opt} (x)

, defined by

\begin{matrix} p_{opt} (x) = \frac{1}{Z_{q}^{*}} {exp}_{q^{*}} (- β_{q}^{*} x^{2}) \\ with q_{*} = 2 - q, Z_{q}^{*} = \sqrt{\frac{3 q - 1}{1 - q}} σ B (\frac{1}{1 - q} - \frac{1}{2}, \frac{1}{2}) and β_{q}^{*} = \frac{1}{(3 q - 1) σ^{2}}, \end{matrix}

is the unique maximizer of Rényi entropy (4a) under the constraints (5b) and (5c) (or (7b) and (7c)).

The proof of this theorem is given in Section 4.5. Maximization of

R_{q} [p; λ_{q}]

is related to Hölder’s inequality in (9) and the proof follows two steps, similar to the proof for Theorem 4. In the first step, we construct a candidate for the maximizer (i.e.,

p_{opt} (x)

, given below by (72)) and determine

λ_{q, opt}

through the equality condition of Hölder’s inequality. In the second step, after obtaining a candidate

p_{opt}

, we verify that this

p_{opt}

is indeed the unique maximizer of

R_{q} [p; λ_{q}]

by directly comparing

⟪p_{opt}^{q} (x)⟫

and

⟪p^{q} (x)⟫

for any feasible solutions

p (x)

satisfying the constraints (5b) and (5c). This verification is done as in the proofs for Theorem 4. Although omitted here, using essentially the same argument as in Remark 1, another simple proof based on Moriguti [5] is possible.

Remark 6.

Tsukada and Suyari [23] have proved thatR1for

0 < q \leq \frac{1}{3}

becomes unbounded. As for the exceptional case of

q = 0

, the upper and lower bounds of

⟪p^{q} (x)⟫ (= ⟪p^{0} (x)⟫)

are argued as follows. First, if we consider the Gaussian distribution that satisfies (5b) and (5c), this gives us

⟪p^{0} (x)⟫ = {⟪1⟫}_{R} = \infty

, and it implies there is no maximizer. Next, consider a particular distribution given by

\begin{matrix} Δ (x) = \{\begin{matrix} δ^{- 1} & (x \in [\bar{σ}, \bar{σ} + δ]) \\ 0 & (o t h e r w i s e), \end{matrix} \end{matrix}

(21)

with

δ > 0

. This

Δ (x)

satisfies

⟪Δ (x)⟫ = δ^{- 1} δ = 1

in (5b), and it also satisfies

⟪x^{2} Δ (x)⟫ = σ^{2}

in (5c) when δ is arbitrary small, in other words,

\begin{matrix} \forall σ, δ (≪ σ) \exists \bar{σ} ⟪x^{2} Δ (x)⟫ = {\bar{σ}}^{2} + δ \bar{σ} + \frac{δ^{2}}{3} = σ^{2}, \end{matrix}

(22)

and, this particular distribution gives

⟪Δ^{0} (x)⟫ = {⟪1⟫}_{[\bar{σ}, \bar{σ} + δ]} = δ \to 0 (δ \to 0)

, which implies there is no minimizer. Therefore, problemR1(andR2) has no maximizer nor minimizer for

q = 0

.

4. Proof of Main Results

Following the outlines leading to Theorems 1–5 in Section 3, here we give their proofs.

4.1. Proof of Theorem 1

Proof.

Let p be arbitrary feasible solutions to T2 for

1 < q < 3

, and let

p_{opt}

be its optimal solution, which is eventually constructed in (25). Let

λ_{q, opt}

be a particular value of the additional parameter

λ_{q}

in T2, which is associated with

p_{opt}

and is eventually constructed in (29). Then, for any p and a particular

λ_{q, opt}

(

= \frac{1}{2} (q - 1)

in (29)), we define f and g as

\begin{matrix} f (x) & = & p^{q} (x) (\geq 0), \end{matrix}

(23a)

\begin{matrix} g (x) & = & λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2} (> 0) . \end{matrix}

(23b)

First, we show

T_{q} [p; λ_{q}]

is minimized in the following way:

\begin{matrix} T_{q} [p; λ_{q}] & = & T_{q} [p; λ_{q, opt}] = ⟪f g⟫ \end{matrix}

(24a)

\begin{matrix} = & {∥ f g ∥}_{1} \geq ∥ p^{q} ∥_{\frac{1}{q}} {∥ g ∥}_{\frac{- 1}{q - 1}} \end{matrix}

(24b)

\begin{matrix} = & {∥ g ∥}_{\frac{- 1}{q - 1}} (the lower bound) . \end{matrix}

(24c)

The first “=” in (24a) follows from the fact that

T_{q} [p; λ_{q}] = σ^{2} ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p^{q} (x)⟫

in (3a) is independent from the value of

λ_{q}

, since any feasible solution p satisfies

⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0

in (3c), and the second “=” in (24a) is immediate from (23). The “=” in (24b) follows from

p^{q} (x) \geq 0

and

g (x) > 0

in (23b), since in (29)

λ_{q, opt} = \frac{1}{2} (q - 1)

, and it satisfies

0 < λ_{q, opt} < 1

. The “≥” in (24b) follows from reverse Hölder’s inequality (11). The “=” in (24c) follows from

∥ p^{q} ∥_{\frac{1}{q}} = {⟪| p (x) |⟫}^{q} = 1

. (24c) implies that

T_{q} [p; λ_{q}]

has the lower bound (i.e., the Tsallis entropy

H_{q}^{Tsallis} [p]

in (1) has the upper bound).

Next, we construct a maximizer

p_{opt}

achieving this bound and show its uniqueness, which is done by checking the conditions where the “≥” in (24) become “=”; the only “≥” in (24b) becomes “=” if and only if the equality condition (12) is satisfied. Now, we rewrite the equality condition (12) (after assuming

S = R

in (12)) by using (23), which constructs (a candidate of)

p_{opt}

:

\begin{matrix} p_{opt} (x) = A^{\frac{1}{q}} {[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}]}^{\frac{- 1}{q - 1}} = \frac{1}{Z_{q}} {exp}_{q} (- β_{q} x^{2}), \end{matrix}

(25)

where A and

λ_{q, opt} (= \frac{1}{2} (q - 1))

are uniquely determined, and hence,

Z_{q}

and

β_{q}

are also uniquely determined, as shown in the following calculations. We note the formula ([42], p. 253):

\begin{matrix} \int_{0}^{\infty} \frac{d x}{x^{α} {(1 + x^{γ})}^{β}} = \frac{1}{γ} B (β - \frac{1 - α}{γ}, \frac{1 - α}{γ}) \end{matrix}

is repeatedly used for the integrations below. (More precisely, for (26), we set

α = 0 (< 1)

,

β = \frac{1}{q - 1} (> 0)

,

γ = 2 (> 0)

, and

β γ > 1 - α

is satisfied. For (27b), we set

α = - 2

,

0 (< 1)

,

β = \frac{q}{q - 1} (> 0)

,

γ = 2 (> 0)

, and

β γ > 1 - α

is satisfied.)First, by substituting

p_{opt}

in (25) into the constraint (3b), and for

1 < q < 3

(finiteness of the left-hand side of (3b) requires the condition

q < 3

), we have

\begin{matrix} ⟪p_{opt} (x)⟫ = A^{\frac{1}{q}} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}} \sqrt{\frac{1 - λ_{q, opt}}{λ_{q, opt}}} σ B (\frac{1}{q - 1} - \frac{1}{2}, \frac{1}{2}) = 1 . \end{matrix}

(26)

On the other hand, substituting (25) into

n ⟪p^{q} (x)⟫

and

⟪x^{2} p^{q} (x)⟫

in the constraint (3c), we have

\begin{matrix} ⟪p_{opt}^{q} (x)⟫ & = & A {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{q}{1 - q}} σ [\sqrt{\frac{1 - λ_{q, opt}}{λ_{q, opt}}} B (\frac{1}{q - 1} + \frac{1}{2}, \frac{1}{2})], \end{matrix}

(27a)

\begin{matrix} ⟪x^{2} p_{opt}^{q} (x)⟫ & = & A {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{q}{1 - q}} σ^{3} [{(\frac{1 - λ_{q, opt}}{λ_{q, opt}})}^{\frac{3}{2}} B (\frac{1}{q - 1} - \frac{1}{2}, \frac{3}{2})], \end{matrix}

(27b)

and substitution of (27a) and (27b) into (3c) yields

\begin{matrix} A {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{q}{1 - q}} σ^{3} [{(\frac{1 - λ_{q, opt}}{λ_{q, opt}})}^{\frac{3}{2}} B (\frac{1}{q - 1} - \frac{1}{2}, \frac{3}{2}) - \sqrt{\frac{1 - λ_{q, opt}}{λ_{q, opt}}} B (\frac{1}{q - 1} + \frac{1}{2}, \frac{1}{2})] = 0 . \end{matrix}

(28)

Then, using the formula

B (x, y + 1) = \frac{y}{x} B (x + 1, y)

([42], p. 254) in (28),

λ_{q, opt}

is uniquely determined as

\begin{matrix} \frac{λ_{q, opt}}{1 - λ_{q, opt}} = \frac{q - 1}{3 - q}, i . e ., λ_{q, opt} = \frac{1}{2} (q - 1), \end{matrix}

(29)

and from (26) and (29) A is uniquely determined as

\begin{matrix} A = {[\frac{3 - q}{2} σ^{2}]}^{\frac{q}{q - 1}} {[\sqrt{\frac{3 - q}{q - 1}} σ B (\frac{1}{q - 1} - \frac{1}{2}, \frac{1}{2})]}^{- q} . \end{matrix}

(30)

In (25), equating the second term to the third term,

β_{q}

is uniquely determined as

\begin{matrix} β_{q} = \frac{λ_{q, opt}}{1 - λ_{q, opt}} {(q - 1)}^{- 1} σ^{- 2} = \frac{1}{(3 - q) σ^{2}}, \end{matrix}

and from (29) and (30),

Z_{q}

is uniquely determined as

\begin{matrix} Z_{q} = {\{A^{\frac{1}{q}} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}}\}}^{- 1} = \sqrt{\frac{3 - q}{q - 1}} σ B (\frac{1}{q - 1} - \frac{1}{2}, \frac{1}{2}) . \end{matrix}

This proves that

p_{opt}

in (25) is a unique minimizer to T2 for

1 < q < 3

.

Finally, we prove the Corollary to Theorem 1 in Section 3. For

q \geq 3

, let

p_{*}

be the distribution satisfying (3b) and (3c), defined as

\begin{matrix} p_{*} (x) = Z^{- 1} {(| x | + α)}^{- (1 + ε)}, \end{matrix}

where a normalization factor

Z = \int {(| x | + α)}^{- (1 + ε)} d x

and

α, ε > 0

. What we are going to prove is

⟪p_{*}^{q} (x)⟫ \to 0 (ε \to 0)

, which is done as follows. First, straightforward integrations yield:

\begin{matrix} Z & = & 2 α^{- ε} ε^{- 1}, \end{matrix}

(31a)

\begin{matrix} ⟪p_{*}^{q} (x)⟫ & = & 2 α^{1 - \bar{q}} {(\bar{q} - 1)}^{- 1} Z^{- q}, \end{matrix}

(31b)

\begin{matrix} ⟪x^{2} p_{*}^{q} (x)⟫ & = & 4 α^{3 - \bar{q}} {(\bar{q} - 1)}^{- 1} {(\bar{q} - 2)}^{- 1} {(\bar{q} - 3)}^{- 1} Z^{- q}, \end{matrix}

(31c)

where

\bar{q} = (1 + ε) q > 3

. Second, from the constraint in (3c),

2 {(\bar{q} - 2)}^{- 1} {(\bar{q} - 3)}^{- 1} α^{2} = σ^{2}

is obtained, and this shows that

α

becomes finite and is determined by

ε

, q, and

σ

. Finally, substituting (31a) into (31b), we obtain

\begin{matrix} ⟪p_{*}^{q} (x)⟫ = 2^{1 - q} {(\bar{q} - 1)}^{- 1} ε^{q}, \end{matrix}

(32)

and hence

⟪p_{*}^{q} (x)⟫ \to 0 (ε \to 0)

, since for

q \geq 3

in (32)

{(\bar{q} - 1)}^{- 1} \to {(q - 1)}^{- 1}

and

ε^{q} \to 0 (ε \to 0)

in (32). □

4.2. Proof of Theorem 2

Proof.

Let p be arbitrary feasible solutions to T2 for

0 < q < 1

, and let

p_{opt}

be its optimal solution, which is eventually constructed in (38). Let

λ_{q, opt}

be a particular value of the additional parameter

λ_{q}

in T2, which is associated with

p_{opt}

and is eventually constructed in (42). Then, for any p and a particular

λ_{q, opt}

(

= \frac{1}{2} (q - 1)

in (42)), we define f and g as

\begin{matrix} f (x) & = & p^{q} (x), \end{matrix}

(33a)

\begin{matrix} g (x) & = & λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}, \end{matrix}

(33b)

and we define an interval

\begin{matrix} {\bar{S}}_{q, opt} = [- \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ, \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ] . \end{matrix}

First, we show that

T_{q} [p; λ_{q}]

is maximized in the following way:

\begin{matrix} T_{q} [p; λ_{q}] & = & T_{q} [p; λ_{q, opt}] = ⟪f g⟫ \end{matrix}

(34a)

\begin{matrix} \leq & {⟪f g⟫}_{{\bar{S}}_{q, opt}} = {⟪| f g |⟫}_{{\bar{S}}_{q, opt}} \end{matrix}

(34b)

\begin{matrix} = & {∥ f g ∥}_{1, {\bar{S}}_{q, opt}} \leq {∥ f ∥}_{\frac{1}{q}, {\bar{S}}_{q, opt}} {∥ g ∥}_{\frac{1}{1 - q}, {\bar{S}}_{q, opt}} \end{matrix}

(34c)

\begin{matrix} \leq & {∥ g ∥}_{\frac{1}{1 - q}, {\bar{S}}_{q, opt}} (the upper bound), \end{matrix}

(34d)

where

⟪\cdot⟫ = \int_{- \infty}^{\infty} \cdot d x

,

{⟪\cdot⟫}_{{\bar{S}}_{q, opt}} = \int_{{\bar{S}}_{q, opt}} \cdot d x

, and

{∥ \cdot ∥}_{α, {\bar{S}}_{q, opt}} = {(\int_{{\bar{S}}_{q, opt}} {| \cdot |}^{α} d x)}^{\frac{1}{α}}

. The first “=” in (34a) follows from the fact that

T_{q} [p; λ_{q}] = σ^{2} ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p^{q} (x)⟫

in (3a) is independent from the value of

λ_{q}

, since any feasible solution p satisfies

⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0

in (3c), and the second “=” in (34a) is immediate from (33). The “≤” in (34b) is obtained from the following observation. By plotting the graph of

g (x)

for (any negative)

λ_{q, opt}

, we observe that

{\bar{S}}_{q, opt}

is the set of x on which

g (x)

becomes positive. For any f and g in (34a), we also observe from this graph that

{⟪f g⟫}_{{\bar{S}}_{q, opt}} > {⟪f g⟫}_{S_{*}}

for any set

S_{*} (\subset R)

but

{\bar{S}}_{q, opt}

, since

\begin{matrix} f (x) \geq 0 (\forall x \in R), g (x) \geq 0 (x \in {\bar{S}}_{q, opt}), and g (x) < 0 (x \in R \ {\bar{S}}_{q, opt}) . \end{matrix}

(35)

On the other hand, the “=” in (34b) is immediate from

f (x) g (x) \geq 0 (\forall x \in {\bar{S}}_{q, opt})

. The first “=” in (34c) follows from the definition of

{∥ \cdot ∥}_{1, {\bar{S}}_{q, opt}}

, and the inequality in (34c) follows from Hölder’s inequality (9). In view of (33a), the final “≤” in (34d) follows from

\begin{matrix} {∥ f ∥}_{\frac{1}{q}, {\bar{S}}_{q, opt}} = {(\int_{{\bar{S}}_{q, opt}} | p (x) | d x)}^{q} \leq {(\int_{- \infty}^{\infty} | p (x) | d x)}^{q} = 1, \end{matrix}

and the resulting

{∥ g ∥}_{\frac{1}{1 - q}, {\bar{S}}_{q, opt}}

implies the upper bound of

T_{q} [p; λ_{q}]

if

λ_{q, opt}

exists for a given q and

σ

.

Next, we construct a maximizer

p_{opt}

achieving this bound and show its uniqueness, which is done by checking the conditions where all three “≤” in (34) become “=”. As for the “≤” in (34b), it becomes “=” if and only if

p (x)

becomes positive only in

{\bar{S}}_{q, opt}

. Namely,

\begin{matrix} f (x) = p^{q} (x) \geq 0 (x \in {\bar{S}}_{q, opt}) and f (x) = 0 (a . e . x \in R \ {\bar{S}}_{q, opt}) . \end{matrix}

(36)

In other words, the “≤” in (34b) becomes “<” if the above condition (36) is violated, which is easily verified from the graph of

g (x)

and the above argument for the “≤” in (34b). On the other hand, in (34c), the “≤” becomes “=” if and only if the equality condition (10) is satisfied for

x \in {\bar{S}}_{q, opt}

, in other words,

\begin{matrix} A p (x) = B {|λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}|}^{\frac{1}{1 - q}} . \end{matrix}

(37)

If p satisfies (36), it is immediate that

A \neq 0

and

B \neq 0

in (37), since

{[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}} \geq 0 (x \in {\bar{S}}_{q, opt})

, and A and B are not both 0 (cf. [40], p. 140). The conclusion is that, if these two conditions (36) and (37) are satisfied, a maximizer

p_{opt}

achieving the upper bound of

T_{q} [p; λ]

is uniquely determined:

\begin{matrix} p_{opt} (x) = \{\begin{matrix} \frac{A}{B} {[λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}} = \frac{1}{Z_{q}} {exp}_{q} (- β_{q} x^{2}) (x \in {\bar{S}}_{q, opt}) \\ 0 (x \in R \ {\bar{S}}_{q, opt}), \end{matrix} \end{matrix}

(38)

in which

A / B

,

λ_{q, opt}

,

Z_{q} = \sqrt{\frac{3 - q}{1 - q}} σ B (\frac{1}{1 - q} + 1, \frac{1}{2})

,

β_{q} = \frac{1}{(3 - q) σ^{2}}

, and

{\bar{S}}_{q, opt} = [- \sqrt{\frac{3 - q}{1 - q}} σ, \sqrt{\frac{3 - q}{1 - q}} σ]

are uniquely determined, as shown in the following calculations. We note the formula ([42], p. 253):

\begin{matrix} \int_{0}^{1} x^{α} {(1 - x^{γ})}^{β} d x = \frac{1}{γ} B (1 + β, \frac{1 + α}{γ}) \end{matrix}

is repeatedly used for the integrations below. (More precisely, for (39), we set

α = 0 (> - 1)

,

β = \frac{1}{1 - q} (> - 1)

, and

γ = 2 (> 0)

. For (40b), we set

α = 2

,

0 (> - 1)

,

β = \frac{q}{1 - q} (> - 1)

, and

γ = 2 (> 0)

.)By substituting

p_{opt} (x)

in (38) into the constraint (3b), we have

\begin{matrix} ⟪p_{opt} (x)⟫ = \frac{A}{B} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}} \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ B (\frac{1}{1 - q} + 1, \frac{1}{2}) = 1 . \end{matrix}

(39)

On the other hand, by substituting (38) into

⟪p^{q} (x)⟫

and

⟪x^{2} p^{q} (x)⟫

in the constraint (3c), we have

\begin{matrix} ⟪p_{opt}^{q} (x)⟫ & = & {(\frac{A}{B})}^{q} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{q}{1 - q}} σ [\sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} B (\frac{1}{1 - q}, \frac{1}{2})], \end{matrix}

(40a)

\begin{matrix} ⟪x^{2} p_{opt}^{q} (x)⟫ & = & {(\frac{A}{B})}^{q} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{q}{1 - q}} σ^{3} [{(\frac{λ_{q, opt} - 1}{λ_{q, opt}})}^{\frac{3}{2}} B (\frac{1}{1 - q}, \frac{3}{2})], \end{matrix}

(40b)

and substitution of (40a) and (40b) into (3c) yields

\begin{matrix} {(\frac{A}{B})}^{q} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}} σ^{3} [{(\frac{λ_{q, opt} - 1}{λ_{q, opt}})}^{\frac{3}{2}} B (\frac{1}{1 - q}, \frac{3}{2}) - \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} B (\frac{1}{1 - q}, \frac{1}{2})] = 0 . \end{matrix}

(41)

Then, using the formula

B (x, y + 1) = \frac{y}{x + y} B (x, y)

([42], p. 254) in (41),

λ_{q, opt}

is uniquely determined as

\begin{matrix} \frac{λ_{q, opt}}{λ_{q, opt} - 1} = \frac{1 - q}{3 - q}, i . e ., λ_{q, opt} = \frac{1}{2} (q - 1) \end{matrix}

(42)

and from (39) and (42)

A / B

is uniquely determined as

\begin{matrix} \frac{A}{B} = {[\frac{3 - q}{2} σ^{2}]}^{\frac{1}{q - 1}} {[\sqrt{\frac{3 - q}{1 - q}} σ B (\frac{1}{1 - q} + 1, \frac{1}{2})]}^{- 1} . \end{matrix}

In (38), equating the second term to the third term,

β_{q}

is uniquely determined as

\begin{matrix} β_{q} = \frac{λ_{q, opt}}{1 - λ_{q, opt}} {(q - 1)}^{- 1} σ^{- 2} = \frac{1}{(3 - q) σ^{2}}, \end{matrix}

(43)

and from (29) and (30),

Z_{q}

is uniquely determined as

\begin{matrix} Z_{q} = {\{\frac{A}{B} {[(1 - λ_{q, opt}) σ^{2}]}^{\frac{1}{1 - q}}\}}^{- 1} = \sqrt{\frac{3 - q}{1 - q}} σ B (\frac{1}{1 - q} + 1, \frac{1}{2}) . \end{matrix}

(44)

Thus, from (43), and (44),

p_{opt}

is uniquely obtained as in (38).

To see that

p_{opt}

makes all “≤” in (34) “=”, finally, we check the last “≤” in (34d) becomes “=”, which is immediate since in (34c)

{∥ f ∥}_{\frac{1}{q}, {\bar{S}}_{q, opt}} = \int_{{\bar{S}}_{q, opt}} p_{opt} (x) d x = 1

. Therefore, it is concluded that

T_{q} [p; λ_{q}]

is uniquely maximized by

p_{opt}

in (38) for

0 < q < 1

. □

4.3. Proof of Lemma 1 and Theorem 3

Let p be arbitrary feasible solutions to T2 for

q = 0

, and let

p_{opt}

be its optimal solution, which is eventually constructed in (53). Let

λ_{q, opt}

be a particular value of the additional parameter

λ_{q}

in T2, which is associated with

p_{opt}

and is eventually constructed in (54). Then, for any p and a particular

λ_{q, opt}

(

= - \frac{1}{2}

in (54)), we define f and g as

\begin{matrix} f (x) & = & p^{0} (x) = \{\begin{matrix} 1 if 0 < p (x) < \infty \\ 0 if p (x) = 0, \end{matrix} \end{matrix}

(45a)

\begin{matrix} g (x) & = & λ_{q, opt} x^{2} + (1 - λ_{q, opt}) σ^{2}, \end{matrix}

(45b)

and we define an interval

\begin{matrix} {\bar{S}}_{q, opt} = [- \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ, \sqrt{\frac{λ_{q, opt} - 1}{λ_{q, opt}}} σ] . \end{matrix}

(46)

In (45a), as a convention, we take

0^{0} = 0

, and

p (x) < \infty (a . e . x \in {\bar{S}}_{q, opt})

. Then,

{∥ f ∥}_{\infty} = 1

follows from (45a). Now, we define

f_{*}

as

\begin{matrix} f_{*} (x) = \{\begin{matrix} sgn [g (x)] (a . e . x \in {\bar{S}}_{q, opt}) \\ 0 (x \in R \ {\bar{S}}_{q, opt}) . \end{matrix} \end{matrix}

(47)

We note this particular

f_{*} (x) = sgn [g (x)]

is proved to be the unique maximizer of

{∥ f g ∥}_{1, {\bar{S}}_{q, opt}}

in the following Lemma 1 (as a minor modification of Lemma 4 in [35]).

Lemma 1.

(cf. Lemma 4 in [35]). Let S be an arbitrary subset in

R

, with

μ (S) > 0

. For

f \in L^{\infty} (S)

and

g \in L^{1} (S)

, assume

g (x) \neq 0, a . e . on S

. Then,

f_{*} (x) = sgn [g (x)] (a . e . x \in S)

is the unique maximizer of the functional

{∥ f g ∥}_{1, S}

in (9).

Proof of Lemma 1.

First, thanks to Hölder’s inequality, see (9),

{∥ f g ∥}_{1, S}

is maximized by

f_{*}

, since

\begin{matrix} ∥ f_{*} {g ∥}_{1, S} = {⟪sgn [g (x)] g (x)⟫}_{S} = {⟪| g (x) | t⟫}_{S} = {∥ g ∥}_{1, S} . \end{matrix}

Second, the unique representation of this maximizer

f_{*}

is shown by proof by contradiction, as follows. Suppose another maximizer

{\bar{f}}_{*}

exists and it maximizes

{∥ f g ∥}_{1, S}

, in other words,

∥ {\bar{f}}_{*} {g ∥}_{1, S} = {∥ g ∥}_{1, S}

. Then, for any given

g \in L^{1} (S)

, the following is satisfied:

\begin{matrix} ∥ f_{*} {g ∥}_{1, S} - {∥ {\bar{f}}_{*} g ∥}_{1, S} = 0 . \end{matrix}

(48)

Now, using the identities

f_{*} (x) g (x) = sgn [g (x)] \cdot g (x) \geq 0

and

| g (x) | = f_{*} (x) g (x)

, we obtain

| f_{*} (x) g (x) | - | {\bar{f}}_{*} (x) g (x) | = f_{*} (x) g (x) - | {\bar{f}}_{*} (x) | | g (x) |

and

f_{*} (x) g (x) - | {\bar{f}}_{*} (x) | | g (x) | = f_{*} (x) g (x) - | {\bar{f}}_{*} (x | f_{*} (x) g (x)

, respectively, resulting in the equality

\begin{matrix} | f_{*} (x) g (x) | - | {\bar{f}}_{*} (x) g (x) | = f_{*} (x) g (x) - | {\bar{f}}_{*} (x) | f_{*} (x) g (x) . \end{matrix}

(49)

Substituting (49) into the left-hand side of (48) and using

| g (x) | = f_{*} (x) g (x)

, (48) is rewritten as

\begin{matrix} {⟪(1 - | {\bar{f}}_{*} (x) |) f_{*} (x) g (x)⟫}_{S} = {⟪(1 - | {\bar{f}}_{*} (x) |) | g (x) |⟫}_{S} = 0 . \end{matrix}

(50)

Now, keeping

0 \leq {\bar{f}}_{*} (x) \leq 1

and the assumption that

g (x) \neq 0, a . e . on S

in mind, (50) implies

\begin{matrix} | {\bar{f}}_{*} (x) | = 1, or equivalently \\ {\bar{f}}_{*} (x) = σ (x), a . e . on S, \end{matrix}

where

σ

takes either

- 1

or 1. However, among such functions

{\bar{f}}_{*}

having either

- 1

or 1 values, it is clear that

sgn [g (x)] (= f_{*})

is the only one that makes

{⟪f g⟫}_{S}

maximal. Thus, no

{\bar{f}}_{*}

can exist except for

f_{*}

, and the uniqueness of the maximizer

f_{*}

is verified. □

Proof of Theorem 3.

First, we show that

T_{q} [p; λ_{q}]

is maximized in the following way:

\begin{matrix} T_{q} [p; λ_{q}] & = & T_{q} [p; λ_{q, opt}] = ⟪f g⟫ \end{matrix}

(51a)

\begin{matrix} \leq & {⟪f g⟫}_{{\bar{S}}_{q, opt}} = {⟪| f g |⟫}_{{\bar{S}}_{q, opt}} \end{matrix}

(51b)

\begin{matrix} = & {∥ f g ∥}_{1, {\bar{S}}_{q, opt}} \leq {∥ f ∥}_{\infty, {\bar{S}}_{q, opt}} {∥ g ∥}_{1, {\bar{S}}_{q, opt}} \end{matrix}

(51c)

\begin{matrix} \leq & {∥ g ∥}_{1, {\bar{S}}_{q, opt}} (the upper bound), \end{matrix}

(51d)

where

⟪\cdot⟫ = \int_{- \infty}^{\infty} \cdot d x

,

{⟪\cdot⟫}_{{\bar{S}}_{q, opt}} = \int_{{\bar{S}}_{q, opt}} \cdot d x

, and

{∥ \cdot ∥}_{1, {\bar{S}}_{q, opt}} = \int_{{\bar{S}}_{q, opt}} | \cdot | d x

, and

{∥ f ∥}_{\infty, {\bar{S}}_{q, opt}}

is the infinity norm of

f (x) (x \in {\bar{S}}_{q, opt})

, in other words, the essential supremum of

| f (x) | (x \in {\bar{S}}_{q, opt})

. The first “=” in (51a) follows from the fact that

T_{q} [p; λ_{q}] = σ^{2} ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p^{q} (x)⟫

in (3a) is independent from the value of

λ_{q}

, since any feasible solution

p (x)

satisfies

⟪(x^{2} - σ^{2}) p^{q} (x)⟫ = 0

in (3c), and the second “=” in (51a) is immediate from (45). The “≤” in (51b) is obtained from the same argument of the inequality (34b) and (35) in the proof of Theorem 2. On the other hand, the equality in (51b) is immediate from

f (x) g (x) \geq 0 (\forall x \in {\bar{S}}_{q, opt})

. The first “=” in (51c) follows from the definition of

{∥ \cdot ∥}_{1, {\bar{S}}_{q, opt}}

, and the “≤” in (51c) follows from the Hölder’s inequality (9). The final “≤” in (51d) follows from the definition of

f (x)

in (45a), in other words,

{∥ f ∥}_{\infty, {\bar{S}}_{q, opt}} \leq 1

, and the resulting

{∥ g ∥}_{1, {\bar{S}}_{q, opt}}

implies the upper bound of

T_{q} [p; λ_{q}]

if

λ_{q, opt}

exists for given q and

σ

.

Next, we construct a maximizer

p_{opt} (x)

achieving this bound and show its uniqueness, which is done by checking the conditions where all three “≤” in (51) become “=”. As for the first “≤” in (51b), it becomes “=” if and only if

p (x)

becomes positive only in

{\bar{S}}_{q, opt}

. Namely,

\begin{matrix} f (x) = p^{0} (x) = 0 or 1 (x \in {\bar{S}}_{q, opt}) and f (x) = 0 (a . e . x \in R \ {\bar{S}}_{q, opt}), \end{matrix}

(52)

in other words, the “≤” in (51b) becomes “<” if the above condition (52) is violated, which is easily verified from the graph of

g (x)

and the above argument for the “≤” in (51b). On the other hand, in (51c), the second “≤” becomes “=” if and only if

f (x) = sgn [g (x)] (a . e . x \in {\bar{S}}_{q, opt})

due to Lemma 1 (simply by replacing S with

{\bar{S}}_{q, opt}

, in Lemma 1). The final “≤” in (51d) becomes “=” if and only if

{∥ f ∥}_{\infty, {\bar{S}}_{q, opt}} = 1

in (51c). From these three conditions, f is uniquely determined as

f_{*}

in (47), and from (45a) the associated maximizer

p_{opt}

for

q = 0

is obtained as

\begin{matrix} p_{opt} (x) = \{\begin{matrix} arbitrary positive value (x \in {\bar{S}}_{q, opt}) \\ 0 (x \in R \ {\bar{S}}_{q, opt}), \end{matrix} \end{matrix}

(53)

where

p_{opt} (x)

should satisfy

{⟪p_{opt} (x)⟫}_{{\bar{S}}_{q, opt}} = 1

. Finally, substituting (53) into the constraint (3c), we have

⟪(x^{2} - σ^{2}) p_{opt}^{0} (x)⟫ = {⟪x^{2} - σ^{2}⟫}_{{\bar{S}}_{q, opt}} = 0

, and from (46)

λ_{q, opt}

and

{\bar{S}}_{q, opt}

are uniquely obtained as

\begin{matrix} λ_{q, opt} = - \frac{1}{2} (< 0) and {\bar{S}}_{q, opt} = [- \sqrt{3} σ, \sqrt{3} σ], \end{matrix}

(54)

respectively. This shows the uniqueness of the representation of

p_{opt}

in (53). (This exceptional case

q = 0

is also argued in T1, where the same result is obtained through a more direct graphical argument, after proving that any candidate

p_{*} (x)

for the maximizer

p_{opt} (x)

is defined only on a simply connected interval

S_{*}

that is symmetric about the origin

O

. The proof is straightforward but lengthy, and we omit it here.) □

4.4. Proof of Theorem 4

Proof.

Let p be arbitrary feasible solutions to R2 for

q > 1

, and let

p_{opt}

be its optimal solution, which is eventually constructed in (61). Let

λ_{q, opt}

be a particular value of the additional parameter

λ_{q}

in R2, which is associated with

p_{opt}

and is eventually constructed in (65). First, for any p and a particular

λ_{q, opt}

, in (65), we define f and g as

\begin{matrix} f (x) & = & p^{q} (x), \end{matrix}

(55a)

\begin{matrix} g (x) & = & 1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x), \end{matrix}

(55b)

and we define a set

{\underset{\bar{}}{S}}_{q}

in

R

:

\begin{matrix} {\underset{\bar{}}{S}}_{q} = \{x | p^{q} (x) + λ_{q, opt} (x^{2} - σ^{2}) p (x) (= f (x) g (x)) > 0\} . \end{matrix}

(56)

Next, we introduce a subset

Q

of the feasible solutions p,

\begin{matrix} Q = \{p (x) | (5 b), (5 c), and p^{q} (x) + λ_{q, opt} (x^{2} - σ^{2}) p (x) \geq 0 (\forall x \in R)\}, \end{matrix}

(57)

which is proved to be non-empty in Appendix A.1.

First, we show that the following holds: if

p \in Q

,

\begin{matrix} R_{q} [p; λ_{q}] & = & R_{q} [p; λ_{q, opt}] = {⟪p^{q} (x) + λ_{q, opt} (x^{2} - σ^{2}) p (x)⟫}_{{\underset{\bar{}}{S}}_{q}} \end{matrix}

(58a)

\begin{matrix} = & {⟪|p^{q} (x) + λ_{q, opt} (x^{2} - σ^{2}) p (x)|⟫}_{{\underset{\bar{}}{S}}_{q}} = {∥ f g ∥}_{1, {\underset{\bar{}}{S}}_{q}} \end{matrix}

(58b)

\begin{matrix} \geq & {∥ f ∥}_{\frac{1}{q}, {\underset{\bar{}}{S}}_{q}} {∥ g ∥}_{\frac{1}{1 - q}, {\underset{\bar{}}{S}}_{q}}, \end{matrix}

(58c)

where

⟪\cdot⟫ = \int_{- \infty}^{\infty} \cdot d x

,

{⟪\cdot⟫}_{{\underset{\bar{}}{S}}_{q}} = \int_{{\underset{\bar{}}{S}}_{q}} \cdot d x

, and

{∥ \cdot ∥}_{α, {\underset{\bar{}}{S}}_{q}} = {(\int_{{\underset{\bar{}}{S}}_{q}} {| \cdot |}^{α} d x)}^{\frac{1}{α}}

. The first “=” in (58a) follows from the fact that

R_{q} [p; λ_{q}] = ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p (x)⟫

in (7c) is independent from the value of

λ_{q}

, since any feasible solution

p (x)

satisfies

⟪(x^{2} - σ^{2}) p (x)⟫ = 0

in (6), and the second “=” in (58a) is immediate from the definitions (56) and (57). The first “=” in (58b) is also immediate from (56). The “≥” in (58c) follows from reverse Hölder’s inequality (11).

If

R_{q} [p; λ_{q}]

achieves the lower bound, and

{∥ f ∥}_{\frac{1}{q}, {\underset{\bar{}}{S}}_{q}} {∥ g ∥}_{\frac{1}{1 - q}, {\underset{\bar{}}{S}}_{q}}

in (58c) saturates at this bound for

p_{opt} (\in Q)

and

λ_{q, opt}

, then, from (58), the following has to be satisfied:

\begin{matrix} R_{q} [p_{opt}; λ_{q, opt}] = {∥ f ∥}_{\frac{1}{q}, {\underset{\bar{}}{S}}_{q}} {∥ g ∥}_{\frac{1}{1 - q}, {\underset{\bar{}}{S}}_{q}} = the lower bound . \end{matrix}

Therefore, to construct a candidate

p_{opt}

achieving this bound, we consider the condition where the “≥” in (58c) becomes “=”. Namely, we rewrite the equality condition (12) by using (55):

\begin{matrix} p^{q} (x) = A {[1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x)]}^{\frac{q}{1 - q}} > 0 (x \in {\underset{\bar{}}{S}}_{q}), i . e ., \end{matrix}

(59a)

\begin{matrix} p^{1 - q} (x) = C [1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x)] > 0 (x \in {\underset{\bar{}}{S}}_{q}), \end{matrix}

(59b)

where

C = A^{\frac{1 - q}{q}} > 0

. From (59b), it is immediate that

\begin{matrix} p^{1 - q} (x) [1 + C λ_{q, opt} (σ^{2} - x^{2})] = C (x \in {\underset{\bar{}}{S}}_{q}), \end{matrix}

(60)

and, hence,

\begin{matrix} p (x) = {(\frac{1}{C})}^{\frac{1}{q - 1}} {[1 + C λ_{q, opt} (σ^{2} - x^{2})]}^{\frac{1}{q - 1}} = {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{\frac{1}{q - 1}} (x \in {\underset{\bar{}}{S}}_{q}), \end{matrix}

since

[1 + C λ_{q, opt} (σ^{2} - x^{2})] > 0

because

C > 0

and

p^{1 - q} (x) > 0 (x \in {\underset{\bar{}}{S}}_{q}

) in (60). On the other hand, for

p \in Q

, it is also immediate that

p (x) = 0 (x \in R \ {\underset{\bar{}}{S}}_{q})

from (56) and (57). Thereby, a candidate of the minimizer (in

Q

) is constructed as:

\begin{matrix} p_{opt} (x) = \{\begin{matrix} {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{\frac{1}{q - 1}} = \frac{1}{Z_{q}^{*}} {exp}_{q^{*}} (- β_{q}^{*} x^{2}) (x \in {\underset{\bar{}}{S}}_{opt}) \\ 0 (x \in R \ {\underset{\bar{}}{S}}_{opt}), \end{matrix} \end{matrix}

(61)

where

q^{*} = 2 - q

. (The distribution in (61) is called the

q^{*}

-Gaussian [31].) In (61),

{\underset{\bar{}}{S}}_{opt} = [- \sqrt{σ^{2} + \frac{1}{C λ_{q, opt}}}, \sqrt{σ^{2} + \frac{1}{C λ_{q, opt}}}]

is now specified and

λ_{q, opt}

,

Z_{q}^{*}

, and

β_{q}^{*}

are uniquely determined as

λ_{q, opt} = \frac{1}{2 C σ^{2}} \frac{q - 1}{q} > 0

,

Z_{q}^{*} = \sqrt{\frac{3 q - 1}{q - 1}} σ B (\frac{q}{q - 1}, \frac{1}{2})

, and

β_{q}^{*} = \frac{1}{(3 q - 1) σ^{2}}

, respectively, as shown in the following calculations. We note the formula ([42], p. 253):

\begin{matrix} \int_{0}^{1} x^{α} {(1 - x^{γ})}^{β} d x = \frac{1}{γ} B (1 + β, \frac{1 + α}{γ}) \end{matrix}

is repeatedly used for the integrations below. (More precisely, for (62) we set

α = 0 (> - 1)

,

β = \frac{1}{q - 1} (> - 1)

, and

γ = 2 (> 0)

.) For (63), we set

α = 2 (> - 1)

,

β = \frac{1}{1 - q} (> - 1)

, and

γ = 2 (> 0)

. Substituting (61) into the constraint (5b), we have

\begin{matrix} ⟪p_{opt} (x)⟫ = r^{(\frac{2}{q - 1} + 1)} {λ_{q, opt}}^{\frac{1}{q - 1}} B (\frac{q}{q - 1}, \frac{1}{2}) = 1, \end{matrix}

(62)

where r is defined by

r^{2} = σ^{2} + \frac{1}{λ_{q, opt} C} .

Note that

r^{2} = \frac{3 q - 1}{q - 1} σ^{2} > 0

as shown in (65). On the other hand, substituting (61) into the constraint (5c), we have

\begin{matrix} ⟪x^{2} p_{opt} (x)⟫ = r^{(\frac{2}{q - 1} + 3)} {λ_{q, opt}}^{\frac{1}{q - 1}} B (\frac{q}{q - 1}, \frac{3}{2}) = σ^{2} . \end{matrix}

(63)

Substitution of (62) and (63) (multiplied with

σ^{2}

to its both sides) into (7a) yields

\begin{matrix} r^{2} B (\frac{q}{q - 1}, \frac{3}{2}) = σ^{2} B (\frac{q}{q - 1}, \frac{1}{2}), i . e ., r^{2} = \frac{3 q - 1}{q - 1} σ^{2} (= σ^{2} + \frac{1}{λ_{q, opt} C}), \end{matrix}

(64)

in which

B (x, y + 1) = \frac{y}{x + y} B (x, y)

([42], p. 254) is used. Thus,

λ_{q, opt}

is obtained as

\begin{matrix} λ_{q, opt} = \frac{1}{2 C σ^{2}} \frac{q - 1}{q} . \end{matrix}

(65)

Since

\begin{matrix} r^{2} = σ^{2} + \frac{1}{λ_{q, opt} C} = \frac{3 q - 1}{q - 1} σ^{2} . \end{matrix}

While, from (62), or (63), and (64) that determines r with q and

σ

,

λ_{q, opt} (> 0)

is uniquely obtained for any given

q (> 1)

and

σ

, and hence, C is also uniquely determined from (65):

\begin{matrix} C = \frac{1}{2 σ^{2}} \frac{q - 1}{q} r^{q + 1} B^{q - 1} (\frac{q}{q - 1}, \frac{1}{2}) . \end{matrix}

Next, we obtain

Z_{q}^{*}

and

β_{q}^{*}

from (61) and (65):

\begin{matrix} \begin{matrix} {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{\frac{1}{q - 1}} & = {(\frac{3 q - 1}{2 C q})}^{\frac{1}{q - 1}} {[1 - \frac{1 - q_{*}}{(3 q - 1) σ^{2}} x^{2}]}^{\frac{1}{q_{*}}} \\ = \frac{1}{Z_{q}^{*}} {exp}_{q_{*}} [- \frac{x^{2}}{(3 q - 1) σ^{2}}], \end{matrix} \end{matrix}

which yields

\begin{matrix} β_{q}^{*} = \frac{1}{(3 q - 1) σ^{2}}, Z_{q}^{*} = {(\frac{3 q - 1}{2 C q})}^{\frac{1}{1 - q}} = \sqrt{\frac{3 q - 1}{q - 1}} σ B (\frac{q}{q - 1}, \frac{1}{2}) . \end{matrix}

Therefore, from (61), (62), and (65),

p_{opt} (x)

is uniquely determined as in (61). Note that

p_{opt} (x) \in Q

, since

\begin{matrix} 1 + λ_{q, opt} (x^{2} - σ^{2}) p_{opt}^{1 - q} (x) = \frac{1}{C} {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{- 1} > 0 (x \in {\underset{\bar{}}{S}}_{opt}) \end{matrix}

is immediate from (61).

Next, to prove that the candidate in (61) is the unique minimizer, we directly compare

⟪p^{q} (x)⟫

and

⟪p_{opt}^{q} (x)⟫

in the following way:

\begin{matrix} ⟪p^{q} (x)⟫ - ⟪p_{opt}^{q} (x)⟫ \end{matrix}

\begin{matrix} = & ⟪p^{q} (x) - p_{opt}^{q} (x) + q λ_{q, opt} (x^{2} - σ^{2}) [p (x) - p_{opt} (x)]⟫ - \frac{q}{C} ⟪p (x) - p_{opt} (x)⟫ \end{matrix}

(66a)

\begin{matrix} = & {⟪p^{q} (x) + q λ_{q, opt} (x^{2} - σ^{2}) p (x) - \frac{q}{C} p (x)⟫}_{R \ {\underset{\bar{}}{S}}_{opt}} \end{matrix}

(66b)

\begin{matrix} + {⟪p^{q} (x) - p_{opt}^{q} (x) - q p_{opt}^{q - 1} (x) [p (x) - p_{opt} (x)]⟫}_{{\underset{\bar{}}{S}}_{opt}} . \end{matrix}

(66c)

Note that (66a) follows from

⟪p_{opt} (x)⟫ = ⟪p (x)⟫ = 1

in (5b) and

⟪x^{2} p_{opt} (x)⟫ = ⟪x^{2} p (x)⟫ = σ^{2}

in (5c), (66b) follows from (61), in other words,

p_{opt} (x) = 0 (x \in R \ {\underset{\bar{}}{S}}_{opt})

, and (66c) follows from (66a) by using

q λ_{q, opt} (x^{2} - σ^{2}) - \frac{q}{C} = - q p_{opt}^{q - 1} (x) (x \in {\underset{\bar{}}{S}}_{opt})

, which is immediate from (61). Finally, substituting

λ_{q, opt} = \frac{1}{2 C σ^{2}} \frac{q - 1}{q}

in (65) into (66b), we obtain

\begin{matrix} {⟪p^{q} (x) + \frac{q - 1}{2 C σ^{2}} p (x) (x^{2} - \frac{3 q - 1}{q - 1} σ^{2})⟫}_{R \ {\underset{\bar{}}{S}}_{opt}} \geq 0, \end{matrix}

in which the equality holds if and only if

p (x) = 0 (x \in R \ {\underset{\bar{}}{S}}_{opt})

, since

\begin{matrix} {\underset{\bar{}}{S}}_{opt} = [- \sqrt{σ^{2} + \frac{1}{C λ_{q, opt}}}, \sqrt{σ^{2} + \frac{1}{C λ_{q, opt}}}] = [- \sqrt{\frac{3 q - 1}{q - 1}} σ, \sqrt{\frac{3 q - 1}{q - 1}} σ], \end{matrix}

and hence,

\begin{matrix} \frac{q - 1}{2 C σ^{2}} (x^{2} - \frac{3 q - 1}{q - 1} σ^{2}) > 0 (x \in R \ {\underset{\bar{}}{S}}_{opt}) . \end{matrix}

On the other hand, the term (66c) can be expressed as

\begin{matrix} {⟪p_{opt}^{q} (x) \{{[\frac{p (x)}{p_{opt} (x)}]}^{q} - q \frac{p (x)}{p_{opt} (x)} + q - 1\}⟫}_{{\underset{\bar{}}{S}}_{opt}} . \end{matrix}

(67)

Because, for

q > 1

,

h (X) = X^{q} - q X + q - 1 \geq 0

for any

X \geq 0

, and because

h (X) = 0

only when

X = 1

, (67) is nonnegative and it becomes 0 if and only if

X = 1

, in other words,

p (x) = p_{opt} (x)

(

x \in {\underset{\bar{}}{S}}_{opt}

). This proves that

p_{opt} (x)

in (61) is the unique minimizer to R2 for

q > 1

. □

4.5. Proof of Theorem 5

Proof.

Let p be arbitrary feasible solutions to R2 for

\frac{1}{3} < q < 1

, and let

p_{opt}

be its optimal solution, which is eventually constructed in (72). Let

λ_{q, opt}

be a particular value of the additional parameter

λ_{q}

in R2, which is associated with

p_{opt}

and is eventually constructed in (76). Then, for any p and a particular

λ_{q, opt}

, in (76), we define f and g as

\begin{matrix} f (x) & = & p^{q} (x), \end{matrix}

(68a)

\begin{matrix} g (x) & = & 1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x) . \end{matrix}

(68b)

First, we show the following holds for any feasible solution p,

\begin{matrix} R_{q} [p; λ_{q}] & = & R_{q} [p; λ_{q, opt}] = ⟪f g⟫ \end{matrix}

(69a)

\begin{matrix} \leq & {∥ f g ∥}_{1} \leq {∥ f ∥}_{\frac{1}{q}} {∥ g ∥}_{\frac{1}{1 - q}} \end{matrix}

(69b)

\begin{matrix} = & {∥ g ∥}_{\frac{1}{1 - q}} . \end{matrix}

(69c)

The first “=” in (69a) follows from the fact that

R_{q} [p; λ_{q}] = ⟪p^{q} (x)⟫ + λ_{q} ⟪(x^{2} - σ^{2}) p (x)⟫

in (7c) is independent from the value of

λ_{q}

, since any feasible solution

p (x)

satisfies

⟪(x^{2} - σ^{2}) p (x)⟫ = 0

in (6), and the second “=” in (69a) is immediate from (68). The first “≤” in (69b) follows from

f (x) g (x) \leq | f (x) g (x) | (\forall x \in R)

, since

f (x)

is always nonnegative but

g (x)

in (68b) can be negative on some intervals in

R

by choosing certain

p (x)

. The second “≤” in (69b) follows from Hölder’s inequality (9). The final “=” in (69c) follows from

{∥ f ∥}_{\frac{1}{q}} = {∥ p^{q} ∥}_{\frac{1}{q}} = {⟪| p (x) |⟫}^{q} = 1

in (69b).

Next, if

R_{q} [p; λ_{q}]

achieves the upper bound, and

{∥ g ∥}_{\frac{1}{1 - q}}

in (69c) saturates at this bound for

p_{opt}

and

λ_{q, opt}

, then from (69) the following has to be satisfied:

R_{q} [p_{opt}; λ_{q, opt}] = {∥ f g ∥}_{1} = {∥ f ∥}_{\frac{1}{q}} {∥ g ∥}_{\frac{1}{1 - q}} = the upper bound .

Therefore, to construct a candidate

p_{opt}

achieving this bound, we consider the condition where the two “≤” in (69) become “=”. As for the first “≤” in (69b), it becomes “=” if

g (x) > 0 (a . e . x \in R)

, in other words,

\begin{matrix} 1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x) > 0 (a . e . x \in R), \end{matrix}

(70)

which is eventually verified in (78). On the other hand, the second “≤” in (69b) becomes “=” if and only if the equality condition (10) is satisfied,

A p (x) = B | 1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x) |^{\frac{1}{1 - q}}

, in other words,

\begin{matrix} p^{1 - q} (x) = C |1 + λ_{q, opt} (x^{2} - σ^{2}) p^{1 - q} (x)| (\forall x \in R), \end{matrix}

(71)

where

C =

{(B / A)}^{1 - q}

> 0

. Note that

A \neq

0 and

B \neq 0

because of (70). From (70) and (71), a candidate of the maximizer is uniquely constructed:

\begin{matrix} p_{opt} (x) = {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{\frac{- 1}{1 - q}} = \frac{1}{Z_{q}^{*}} {exp}_{q^{*}} (- β_{q}^{*} x^{2}) (\forall x \in R), \end{matrix}

(72)

in which C =

{(B / A)}^{1 - q}

,

λ_{q, opt} = \frac{1}{2 C σ^{2}} \frac{q - 1}{q} (< 0)

,

Z_{q}^{*} = \sqrt{\frac{3 q - 1}{1 - q}} σ B (\frac{1}{1 - q} - \frac{1}{2}, \frac{1}{2})

, and

β_{q}^{*} = \frac{1}{(3 q - 1) σ^{2}}

are uniquely determined, and

λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C} > 0

is verified, as shown in the following calculations. We note the formula ([42], p. 253):

\begin{matrix} \int_{0}^{\infty} \frac{d x}{x^{α} {(1 + x^{γ})}^{β}} = \frac{1}{γ} B (β - \frac{1 - α}{γ}, \frac{1 - α}{γ}) \end{matrix}

is repeatedly used for the integrations below. (More precisely, for (73) we set

α = 0 (< 1)

,

β = \frac{1}{1 - q} (> 0)

,

γ = 2 (> 0)

, and

β γ > 1 - α

is satisfied.) For (74), we set

α = - 2 (< 1)

,

β = \frac{1}{1 - q} (> 0)

,

γ = 2 (> 0)

, and

β γ > 1 - α

is satisfied. Substituting (72) into the constraint (5b), we have

\begin{matrix} ⟪p_{opt} (x)⟫ = r^{(\frac{2}{q - 1} + 1)} {(- λ_{q, opt})}^{\frac{1}{q - 1}} B (\frac{1}{1 - q} - \frac{1}{2}, \frac{1}{2}) = 1, \end{matrix}

(73)

where r is defined by

r^{2} = - (σ^{2} + \frac{1}{λ_{q, opt} C}) .

Note that

r^{2} = \frac{3 q - 1}{1 - q} σ^{2} > 0

as shown in (76). On the other hand, substituting (72) into the constraint (5c), we have

\begin{matrix} ⟪x^{2} p_{opt} (x)⟫ = r^{(\frac{2}{q - 1} + 3)} {(- λ_{q, opt})}^{\frac{1}{q - 1}} B (\frac{1}{1 - q} - \frac{3}{2}, \frac{3}{2}) = σ^{2} . \end{matrix}

(74)

Substitution of (73) and (74) after multiplying

σ^{2}

to both sides into (7a) yields

\begin{matrix} r^{2} B (\frac{1}{1 - q} - \frac{3}{2}, \frac{3}{2}) = σ^{2} B (\frac{1}{1 - q} - \frac{1}{2}, \frac{1}{2}), i . e ., r^{2} = \frac{3 q - 1}{1 - q} σ^{2}, \end{matrix}

(75)

in which

B (x + 1, y - 1) = \frac{x}{y - 1} B (x, y)

([42], p. 254) is used. Thus, we first obtain

λ_{q, opt}

as

\begin{matrix} λ_{q, opt} = \frac{1}{2 C σ^{2}} \frac{q - 1}{q}, \end{matrix}

(76)

since

\begin{matrix} r^{2} = - (σ^{2} + \frac{1}{λ_{q, opt} C}) = \frac{3 q - 1}{1 - q} σ^{2} . \end{matrix}

Meanwhile, from (73), or (74), and (75) that determines r with q and

σ

,

λ_{q, opt}

(

< 0

) is uniquely determined for any q (

< 1

) and

σ

, and hence, C (

> 0

) is also uniquely determined from (76):

\begin{matrix} C = \frac{1}{2 σ^{2}} \frac{1 - q}{q} r^{q + 1} B^{q - 1} (\frac{1}{1 - q} - \frac{1}{2}, \frac{1}{2}) . \end{matrix}

Next, we obtain

Z_{q}^{*}

and

β_{q}^{*}

from (72) and (76):

\begin{matrix} {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{\frac{- 1}{1 - q}} & = & {(\frac{3 q - 1}{2 C q})}^{\frac{1}{q - 1}} {[1 - \frac{1 - q_{*}}{(3 q - 1) σ^{2}} x^{2}]}^{\frac{1}{1 - q_{*}}} \\ = & \frac{1}{Z_{q}^{*}} {exp}_{q^{*}} [- \frac{x^{2}}{(3 q - 1) σ^{2}}], \end{matrix}

which yields

\begin{matrix} β_{q}^{*} = \frac{1}{(3 q - 1) σ^{2}}, Z_{q}^{*} = {(\frac{3 q - 1}{2 C q})}^{\frac{1}{1 - q}} = \sqrt{\frac{3 q - 1}{1 - q}} σ B (\frac{1}{1 - q} - \frac{1}{2}, \frac{1}{2}) . \end{matrix}

Now, we verify (70) is satisfied by

p_{opt}

. Note that if

\frac{1}{3} < q < 1

, using (76) we have

\begin{matrix} λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C} = \frac{1}{C} (\frac{1 - q}{2 σ^{2} q} x^{2} + \frac{3 q - 1}{2 q}) > 0 \end{matrix}

(77)

and hence (70) is satisfied by

p_{opt}

:

\begin{matrix} 1 + λ_{q, opt} (x^{2} - σ^{2}) p_{opt}^{1 - q} (x) = \frac{1}{C} {[λ_{q, opt} (σ^{2} - x^{2}) + \frac{1}{C}]}^{- 1} > 0 (a . e . x \in R), \end{matrix}

(78)

which is immediate from (71), (72), and (77). Thus,

p_{opt}

is uniquely determined as in (72).

Finally, to prove that this candidate (72) is the unique maximizer, we directly compare

⟪p^{q} (x)⟫

and

⟪p_{opt}^{q} (x)⟫

as follows. Similar to (66), the following holds here:

\begin{matrix} ⟪p^{q} (x)⟫ - ⟪p_{opt}^{q} (x)⟫ \end{matrix}

\begin{matrix} = & ⟪p^{q} (x) - p_{opt}^{q} (x) + q λ_{q, opt} (x^{2} - σ^{2}) [p (x) - p_{opt} (x)]⟫ - \frac{q}{C} ⟪p (x) - p_{opt} (x)⟫ \end{matrix}

(79a)

\begin{matrix} = & ⟪p^{q} (x) - p_{opt}^{q} (x) - q p_{opt}^{q - 1} (x) [p (x) - p_{opt} (x)]⟫ . \end{matrix}

(79b)

Note that (79a) follows from

⟪p_{opt} (x)⟫ = ⟪p (x)⟫ = 1

in (5b) and

⟪x^{2} p_{opt} (x)⟫ = ⟪x^{2} p (x)⟫ = σ^{2}

in (5c), and (79b) follows from (79a) by using

q λ_{q, opt} (x^{2} - σ^{2}) - \frac{q}{C} = - q p_{opt}^{q - 1} (x) (x \in R)

, which is immediate from (72). Because for

0 < q < 1

,

h (X) = X^{q} - q X + q - 1 \leq 0

for any

X \geq 0

and because

h (X) = 0

only when

X = 1

, (79b) is not positive, and it becomes 0 if and only if

X = 1

, in other words,

p (x) = p_{opt} (x)

(

x \in R

). This proves that

p_{opt} (x)

in (72) is the unique maximizer to R2 for

\frac{1}{3} < q < 1

. □

5. Conclusions and Discussion

We obtained a new insight about a direct link between generalized entropy and Hölder’s inequality, and yet another proof for Rényi–Tsallis entropy maximization; the q-Gaussian distribution is directly obtained from the equality condition of Hölder’s inequality, and its optimality is proved by Hölder’s inequality through Moriguti’s argument. The simplicity in the proofs of Tsallis entropy maximization (Theorem 1, 2, and 3) is worth noting; essentially, several lines of inequalities (including Hölder’s inequality) are sufficient for the proof.

As an analogy, what we have described in this study can be explained as mountain climbing; as for Tsallis entropy maximization, the top of the mountain, in other words, the upper/lower bound is clearly seen from the starting point. Namely, the bounds in (24c), (34d), and (51d) are explicitly given by q and

σ

. Therefore, all we need to do is to keep climbing to the top, in other words, to construct a series of inequalities (24), (34), and (51) that saturate at the bound. On the other hand, for Rényi entropy maximization, the top of the mountain is not clearly seen from the starting point. Namely, the upper/lower bound is not given only by q and

σ

but contains

p (x)

, as in (58c) or (69c). Even in such a case, Hölder’s inequality is still useful for finding a peak of the mountain, in other words, it leads to a candidate of the global optimal, and then we verify this candidate is really the top by using a GPS (global positioning system). In addition, this GPS is obtained as in (66) or (79), thanks to Moriguti [5].

Our technique with Hölder’s inequality plus the additional parameter

λ_{q}

can be useful for other inequalities (e.g., Young’s inequality), and it seems an interesting open problem to clarify what sort of optimization problems can be solved from such a technique.

Author Contributions

Conceptualization, H.-A.T.; methodology, H.-A.T.; writing–original draft preparation, H.-A.T.; writing–review and editing, H.-A.T. and M.N.; supervision, Y.O.; funding acquisition, H.-A.T.

Funding

This work has been supported by the Japan Ministry of Education, Culture, Sports, Science and Technology (MEXT) (Grant No. 26286086) and by the Support Center for Advanced Telecommunications Technology Research (SCAT).

Acknowledgments

The authors would like to express their gratitude to the referees for their careful reading of an earlier draft of this paper and valuable suggestions for improving the paper. The authors are indebted to Hiroki Suyari, Makoto Tsukada, Hideki Takayasu, Hayato Waki, Hayato Chiba, Yutaka Jitsumatsu, Fumito Mori, Yasuhiro Tsubo, Takashi Shimada, Akitoshi Takayasu, Masahide Kashiwagi, and Shinichi Oishi for their enlightening suggestions. One of the authors (H. T.) appreciates Norikazu Takahashi for his critical reading of the manuscript and valuable suggestions which improved the manuscript. H. T. appreciates Jürgen Kurths and Istvan Z. Kiss for their inspiring suggestion that motivated this work, and H. T. also appreciates Constantino Tsallis for his critical comments at Social Modeling and Simulations + Econophysics Colloquium (SMSEC2014). One of the authors (H. T.) would like to dedicate this work to the memory of Sigeiti Moriguti.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Example of Non-Empty set Q

Here, we illustrate an example of

Q

introduced in Section 4.4. Having obtained

p_{opt}

and

λ_{q, opt}

in Section 4.4, an element

p (\in Q)

, which satisfies (5b), (5c), and

\begin{matrix} p^{q} (x) + λ_{q, opt} (x^{2} - σ^{2}) p (x) \geq 0 (\forall x \in R) \end{matrix}

(A1)

is constructed from

p_{opt}

in (61) as follows. Figure A1 shows how we are going to construct p from

p_{opt}

; the basic idea is that such p is obtained only by slightly modifying

p_{opt}

at its edge while keeping the constraint (A1). First, we choose small adjacent intervals

I_{1}

and

I_{2}

inside the interval

[σ, \sqrt{\frac{3 q - 1}{q - 1}} σ]

. This choice is consistent to the fact that the constraint (A1) is equivalent to

p (x) \geq {[λ_{q, opt} (σ^{2} - x^{2}), 0]}_{+}^{\frac{1}{q - 1}} (as shown by the red dotted line in Figure A 1),

and hence

p (x)

can be 0 in

[σ, \sqrt{\frac{3 q - 1}{q - 1}} σ]

(as observed in the inset of Figure A1). Second, we shift

I_{1}

,

I_{2}

, and the associated value of

p_{opt} (x)

originally defined on

I_{1}

and

I_{2}

, altogether, while keeping the original value of

⟪(x^{2} - σ^{2}) p (x)⟫

at 0. As shown in the inset of Figure A1, an option for this shift is:

I_{1}

to the right and

I_{2}

to the left. Such an option for small shifts always exists because of the continuity of integration

⟪(x^{2} - σ^{2}) p (x)⟫

with respect to

I_{1}

and

I_{2}

. Note that the resulting p shown in the inset of Figure A1 satisfies (5b), (5c), and (A1). The above constructed p constitutes a non-empty set

Q

and it is straightforwardly verified to be convex.

Figure A1. Construction of

p (\in Q)

from

p_{opt}

.

Figure A1. Construction of

p (\in Q)

from

p_{opt}

.

References

Tsallis, C. Introduction to Nonextensive Statistical Mechanics; Springer: New York, NY, USA, 2009; ISBN 978-0-387-85358-1. [Google Scholar]
Prato, D.; Tsallis, C. Nonextensive foundation of Lévy distributions. Phys. Rev. E 1999, 60, 2398–2401. [Google Scholar] [CrossRef]
Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2006; ISBN 978-0-471-24195-9. [Google Scholar]
Daróczy, Z. Generalized information functions. Inf. Control 1970, 16, 36–51. [Google Scholar] [CrossRef] [Green Version]
Moriguti, S. A lower bound for a probability moment of any absolutely continuous distribution with finite variance. Ann. Math. Stat. 1952, 23, 286–289. [Google Scholar] [CrossRef]
Campbell, L.L. A coding theorem and Rényi’s entropy. Inf. Control 1965, 8, 423–429. [Google Scholar] [CrossRef]
Baer, M.B. Source coding for quasiarithmetic penalties. IEEE Trans. Inf. Theory 2006, 52, 4380–4393. [Google Scholar] [CrossRef]
Bercher, J.-F. Source coding with scaled distributions and Rényi entropy bounds. Phys. Lett. A 2009, 373, 3235–3238. [Google Scholar] [CrossRef]
Bunte, C.; Lapidoth, A. Encoding tasks and Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 5065–5076. [Google Scholar] [CrossRef]
Landsberg, P.T.; Vedral, V. Distributions and channel capacities in generalized statistical mechanics. Phys. Lett. A 1998, 247, 211–217. [Google Scholar] [CrossRef]
Ilić, V.M.; Djordjević, I.B.; Küeppers, F. On the Daróczy–Tsallis capacities of discrete channels. In Sciforum Electronic Conference Series, Proceedings of the 2nd International Electronic Conference on Entropy and Its Applications, 15–30 November 2015; MDPI: Basel, Switzerland, 2015; B004; pp. 1–11. [Google Scholar]
Venkatesan, R.C.; Plastino, A. Generalized statistics framework for rate distortion theory. Phys. A 2009, 388, 2337–2353. [Google Scholar] [CrossRef] [Green Version]
Girardin, V.; Lhote, L. Rescaling entropy and divergence rates. IEEE Trans. Inf. Theory 2015, 61, 5868–5882. [Google Scholar] [CrossRef]
Thistleton, W.J.; Marsh, J.A.; Nelson, K.; Tsallis, C. Generalized Box-Müller method for generating q-Gaussian random deviates. IEEE Trans. Inf. Theory 2007, 53, 4805–4810. [Google Scholar] [CrossRef]
Umeno, K.; Sato, A. Chaotic method for generating q-Gaussian random variables. IEEE Trans. Inf. Theory 2013, 59, 3199–3209. [Google Scholar] [CrossRef]
Karmeshu; Sharma, S. Queue length distribution of network packet traffic: Tsallis entropy maximization with fractional moments. IEEE Commun. Lett. 2006, 10, 34–36. [Google Scholar] [CrossRef]
Sharma, S.; Karmeshu. Power law characteristic and loss probability: finite buffer queueing systems. IEEE Commun. Lett. 2009, 13, 971–973. [Google Scholar] [CrossRef]
Singh, A.K.; Karmeshu. Power law behavior of queue size: Maximum entropy principle with shifted geometric mean constraint. IEEE Commun. Lett. 2014, 18, 1335–1338. [Google Scholar] [CrossRef]
Singh, A.K.; Singh, H.P.; Karmeshu. Analysis of finite buffer queue: Maximum entropy probability distribution with shifted fractional geometric and arithmetic means. IEEE Commun. Lett. 2015, 19, 163–166. [Google Scholar] [CrossRef]
Jaynes, E.T. On the rationale of maximum-entropy methods. Proc. IEEE 1982, 70, 939–952. [Google Scholar] [CrossRef]
Shore, J.E.; Johnson, R.W. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37. [Google Scholar] [CrossRef] [Green Version]
Shore, J.E.; Johnson, R.W. Properties of cross-entropy minimization. IEEE Trans. Inf. Theory 1981, 27, 427–482. [Google Scholar] [CrossRef]
Tsukada, M.; Suyari, H.; Kato, M. On the probability distribution max imizing generalized entropies. In Proceedings of 2005 Symposium on Applied Functional Analysis: Information Sciences and Related Fields; Murohashi, T., Takahashi, W., Tsukada, M., Eds.; Yokohama Publisher: Yokohama, Japan, 2007; pp. 99–111. [Google Scholar]
Furuichi, S. On the maximum entropy principle and the minimization of the Fisher information in Tsallis statistics. J. Math. Phys. 2009, 50, 013303:1–013303:13. [Google Scholar] [CrossRef]
Lutwak, E.; Yang, D.; Zhang, G. Cramér-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information. IEEE Trans. Inf. Theory 2005, 51, 473–478. [Google Scholar] [CrossRef]
Lutwak, E.; Yang, D.; Zhang, G. Moment-entropy inequalities for a random vector. IEEE Trans. Inf. Theory 2007, 53, 1603–1607. [Google Scholar] [CrossRef]
Eguchi, S.; Komori, O.; Kato, S. Projective power entropy and maximum Tsallis entropy distributions. Entropy 2011, 13, 1746–1764. [Google Scholar] [CrossRef]
Watanabe, S.; Oohama, Y. Secret key agreement from vector Gaussian sources by rate limited public communication. IEEE Trans. Inf. Forensic Secur. 2011, 6, 541–550. [Google Scholar] [CrossRef]
Fehr, S.; Berens, S. On the conditional Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 6801–6810. [Google Scholar] [CrossRef]
Sakai, Y.; Iwata, K. Sharp bounds on Arimoto’s conditional Rényi entropies between two distinct orders. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2985–2989. [Google Scholar] [CrossRef]
Suyari, H.; Tsukada, M. Law of error in Tsallis statistics. IEEE Trans. Inf. Theory 2005, 51, 753–757. [Google Scholar] [CrossRef]
Vignat, C.; Hero, A.O., III; Costa, J.A. About closedness by convolution of the Tsallis maximizers. Phys. A 2004, 340, 147–152. [Google Scholar] [CrossRef]
Bercher, J.-F. On generalized Cramér-Rao inequalities, generalized Fisher information and characterizations of generalized q-Gaussian distributions. J. Phys. A Math. Gen. 2012, 45, 255303:1–255303:15. [Google Scholar] [CrossRef]
Bercher, J.-F. On multidimensional generalized Cramér-Rao inequalities, uncertainty relations and characterizations of generalized q-Gaussian distributions. J. Phys. A Math. Theor. 2013, 46, 095303:1–095303:18. [Google Scholar] [CrossRef]
Tanaka, H.-A. Optimal entrainment with smooth, pulse, and square signals in weakly forced nonlinear oscillators. Phys. D 2014, 288, 1–22. [Google Scholar] [CrossRef] [Green Version]
Tanaka, H.-A. Synchronization limit of weakly forced nonlinear oscillators. J. Phys. A Math. Theor. 2014, 47, 402002:1–402002:10. [Google Scholar] [CrossRef]
Dembo, A.; Cover, T.M.; Thomas, J.A. Information theoretic inequalities. IEEE Trans. Inf. Theory 1991, 37, 1501–1518. [Google Scholar] [CrossRef]
Oikonomou, T.; Bagci, G.B. A note on the definition of deformed exponential and logarithm functions. J. Math. Phys. 2009, 50, 103301:1–103301:9. [Google Scholar] [CrossRef]
Dehesa, J.S.; Galvez, F.J.; Porras, I. Bounds to density-dependent quantities of D-dimensional many-particle systems in position and momentum spaces: Applications to atomic systems. Phys. Rev. A 1989, 40, 35–40. [Google Scholar] [CrossRef]
Hardy, G.; Littlewood, J.E.; Pólya, G. Inequalities, 2nd ed.; Cambridge University Press: Cambridge, UK, 1988; ISBN 978-0-521-35880-4. [Google Scholar]
Rudin, W. Real and Complex Analysis, 3rd ed.; McGraw-Hill: New York, NY, USA, 1987; pp. 63–65. ISBN1 0070542341. ISBN2 9780070542341. [Google Scholar]
Whittaker, E.T.; Watson, G.N. A Course of Modern Analysis, 4th ed.; Cambridge University Press: Cambridge, UK, 1927. [Google Scholar]

Figure 1. This figure illustrates how our approach for Theorem 4 works in a possible structure of our optimization problem. The whole curve represents all feasible solutions, and the dotted points represent the subset

Q

in all feasible solutions.

Figure 1. This figure illustrates how our approach for Theorem 4 works in a possible structure of our optimization problem. The whole curve represents all feasible solutions, and the dotted points represent the subset

Q

in all feasible solutions.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tanaka, H.-A.; Nakagawa, M.; Oohama, Y. A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization. Entropy 2019, 21, 549. https://doi.org/10.3390/e21060549

AMA Style

Tanaka H-A, Nakagawa M, Oohama Y. A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization. Entropy. 2019; 21(6):549. https://doi.org/10.3390/e21060549

Chicago/Turabian Style

Tanaka, Hisa-Aki, Masaki Nakagawa, and Yasutada Oohama. 2019. "A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization" Entropy 21, no. 6: 549. https://doi.org/10.3390/e21060549

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Direct Link between Rényi–Tsallis Entropy and Hölder’s Inequality—Yet Another Proof of Rényi–Tsallis Entropy Maximization

Abstract

1. Introduction

2. Basic Definitions and Problem Formulation

2.1. Tsallis Entropy and Rényi Entropy

2.2. Problem Formulation

2.3. Hölder’s Inequality for Later Analysis

3. Main Results

4. Proof of Main Results

4.1. Proof of Theorem 1

4.2. Proof of Theorem 2

4.3. Proof of Lemma 1 and Theorem 3

4.4. Proof of Theorem 4

4.5. Proof of Theorem 5

5. Conclusions and Discussion

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Appendix A

Appendix A.1. Example of Non-Empty set Q

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI