Article

Extended Divergence on a Foliation by Deformed Probability Simplexes

Faculty of Engineering, Tohoku Gakuin University, Tagajo 985-8537, Miyagi, Japan
Entropy 2022, 24(12), 1736; https://doi.org/10.3390/e24121736
Submission received: 30 October 2022 / Revised: 25 November 2022 / Accepted: 26 November 2022 / Published: 28 November 2022

Abstract

This study considers a new decomposition of an extended divergence on a foliation by deformed probability simplexes from the information geometry perspective. In particular, we treat the case where each deformed probability simplex corresponds to a set of q-escort distributions. For the foliation, different q-parameters and the corresponding α-parameters of dualistic structures are defined on each of the various leaves. We propose a divergence decomposition theorem that quantifies the proximity of q-escort distributions with different q-parameters, and compare the new theorem with the previous theorem for the standard divergence on a Hessian manifold with a fixed α-parameter.

1. Introduction

In the field of nonextensive statistics, q-normal distributions and their generalization, q-exponential families, play an important role [1,2,3]. Since Ohara first pointed out the correspondence between the q-parameter of nonextensive statistics and the α-parameter of information geometry [4,5], the information geometric structure of q-exponential families has been investigated [6,7,8,9,10,11,12,13,14].
On a set of probability distributions, divergences are usually defined for a fixed α-parameter of the dualistic structure. Building on those results, we defined an extended divergence on a foliation by sets of probability distributions, setting different α-parameters on each leaf. In particular, we treated a foliation by deformed probability simplexes [15].
In this paper, we again study deformed probability simplexes corresponding to sets of escort distributions with q-parameters, which satisfy $q = (1-\alpha)/2$ for the α-parameters of information geometry. We clarify the relationship among affine spaces, affine immersions, and the extended divergence in more detail than in our previous paper. We also compare the extended divergence with the duo Bregman divergence used in machine learning [16].
First, we explain the dualistic structures, α-divergences, and the Tsallis relative entropy on the probability simplex, using the concepts of affine geometry and information geometry. The relationship between an α-parameter and the Tsallis q-parameter is stated. Next, we describe the dualistic structures and the divergences generated by affine immersions on the deformed probability simplexes corresponding to sets of escort distributions. This part also covers Hessian manifolds and their level surfaces. We then define an extended divergence on a foliation by deformed probability simplexes. Finally, we propose a new decomposition of an extended divergence on the foliation.

2. The Tsallis Relative Entropy and the Kullback–Leibler Divergence on the Probability Simplex

In this section, we explain dualistic structures, α-divergences, and the Tsallis relative entropy on the probability simplex [4,5,12].
Let $A^{n+1}$ be an $(n+1)$-dimensional real affine space and $\{x^1, \dots, x^{n+1}\}$ be the canonical affine coordinate system on $A^{n+1}$, i.e., $\tilde{D}\,dx^i = 0$, where $\tilde{D}$ is the canonical flat affine connection on $A^{n+1}$. Let $S^n$ be a simplex in $A_+^{n+1}$ defined by
$$S^n = \Big\{ p \;\Big|\; p \in A_+^{n+1},\ \sum_{i=1}^{n+1} x^i(p) = 1 \Big\}. \tag{1}$$
If $x^1(p), \dots, x^{n+1}(p)$ are regarded as probabilities of $n+1$ states, $S^n$ is called the $n$-dimensional probability simplex. Let $\{\bar{p}^1, \dots, \bar{p}^n\}$ be an affine coordinate system on $S^n$ defined by $\bar{p}^i(p) = x^i(p) - x^{n+1}(p)$ for $i = 1, \dots, n$, and
$$\{\partial_1, \dots, \partial_n\}, \quad \text{where } \partial_i|_p = \Big( \frac{\partial}{\partial x^i} - \frac{\partial}{\partial x^{n+1}} \Big)\Big|_p, \quad p \in S^n, \tag{2}$$
be a frame of tangent vector fields on $S^n$.
The Fisher metric $g = (g_{ij})$ on $S^n$ is defined by
$$g_{ij}(p) \equiv g(\partial_i, \partial_j)|_p = \sum_{k=1}^{n+1} x^k(p)\,(\partial_i \log x^k)|_p\,(\partial_j \log x^k)|_p = \frac{1}{x^i(p)}\,\delta_{ij} + \frac{1}{x^{n+1}(p)}, \quad p \in S^n,\ i, j = 1, \dots, n, \tag{3}$$
where $\delta_{ij}$ is the Kronecker delta. We define an α-connection $\nabla^{(\alpha)}$ on $S^n$ by
$$\nabla^{(\alpha)}_{\partial_i} \partial_j = \sum_{k=1}^{n} \Gamma^{(\alpha)k}_{ij}\,\partial_k, \tag{4}$$
$$\Gamma^{(\alpha)k}_{ij}\big|_p = \frac{1+\alpha}{2} \Big( -\frac{1}{x^k(p)}\,\delta^k_{ij} + x^k(p)\,g_{ij}(p) \Big), \quad i, j, k = 1, \dots, n, \tag{5}$$
where $\delta^k_{ij} = 1$ if $i = j = k$, and $\delta^k_{ij} = 0$ otherwise. Then, the Levi–Civita connection of $g$ coincides with $\nabla^{(0)}$. For $\alpha \in \mathbb{R}$, we have
$$X g(Y, Z) = g(\nabla^{(\alpha)}_X Y, Z) + g(Y, \nabla^{(-\alpha)}_X Z) \quad \text{for } X, Y, Z \in \mathfrak{X}(S^n), \tag{6}$$
where $\mathfrak{X}(S^n)$ is the set of all smooth tangent vector fields on $S^n$. Then, $\nabla^{(-\alpha)}$ is called the dual connection of $\nabla^{(\alpha)}$. For each α, $\nabla^{(\alpha)}$ is torsion-free and $\nabla^{(\alpha)} g$ is symmetric. Therefore, the triple $(S^n, \nabla^{(\alpha)}, g)$ is a statistical manifold, and $(S^n, \nabla^{(-\alpha)}, g)$ is its dual statistical manifold.
Note that the affine connections $\nabla^{(1)}$ and $\nabla^{(-1)}$ in Equations (4)–(6) are the dual connection and the canonical connection, respectively.
It is known that when $n \geq 2$, the curvature of the statistical manifold $(S^n, \nabla^{(\alpha)}, g)$ is the constant value
$$\kappa = \frac{1+\alpha}{2} \cdot \frac{1-\alpha}{2} = \frac{1-\alpha^2}{4}.$$
Therefore, the curvature of the dual statistical manifold $(S^n, \nabla^{(-\alpha)}, g)$ is also $\kappa = (1-\alpha^2)/4$. The curvature of $(S^n, \nabla^{(\alpha)}, g)$ is zero iff $\alpha = \pm 1$, and in that case $(\nabla^{(\alpha)}, \nabla^{(-\alpha)}, g)$ is called the dually flat structure.
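For $\alpha \neq \pm 1$, an α-divergence $D^{(\alpha)}$ on $A_+^{n+1}$ is often defined by
$$D^{(\alpha)}(p, r) = \frac{4}{1-\alpha^2} \Big\{ \frac{1-\alpha}{2} \sum_{i=1}^{n+1} x^i(p) + \frac{1+\alpha}{2} \sum_{i=1}^{n+1} x^i(r) - \sum_{i=1}^{n+1} x^i(p)^{\frac{1-\alpha}{2}}\, x^i(r)^{\frac{1+\alpha}{2}} \Big\}, \quad p, r \in A_+^{n+1}. \tag{7}$$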
If $q = (1-\alpha)/2$, it holds that
$$D^{(\alpha)}(p, r) = \frac{1}{q}\,K_q(p, r), \quad p, r \in S^n,$$
for the Tsallis relative entropy $K_q$ on $S^n$ defined by
$$K_q(p, r) \equiv -\sum_{i=1}^{n+1} x^i(p)\,\ln_q \frac{x^i(r)}{x^i(p)} = \frac{1}{1-q} \Big\{ 1 - \sum_{i=1}^{n+1} x^i(p)^q\, x^i(r)^{1-q} \Big\}, \quad p, r \in S^n,$$
where $\ln_q$ is the q-logarithmic function defined by
$$\ln_q x \equiv \frac{x^{1-q} - 1}{1-q}, \quad q \neq 1,\ x > 0$$
(see [1,2]). The Tsallis relative entropy $K_q$ converges to the Kullback–Leibler divergence as $q \to 1$, because $\lim_{q \to 1} \ln_q x = \log x$. In the information geometric view, the α-divergence $D^{(\alpha)}$ converges to the Kullback–Leibler divergence as $\alpha \to -1$.
For the Tsallis q-parameter, the curvature of the statistical manifold $(S^n, \nabla^{(\alpha)}, g)$ is $\kappa = q(1-q)$.
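The relation between the α-divergence of Equation (7) and the Tsallis relative entropy can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the sample distributions `p` and `r` are our own illustrative choices, and points of the simplex are represented as plain Python lists of probabilities.

```python
import math

def alpha_divergence(p, r, alpha):
    """alpha-divergence of Equation (7) for positive vectors (alpha != +-1)."""
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    mixed = sum(pi ** a * ri ** b for pi, ri in zip(p, r))
    return 4 / (1 - alpha ** 2) * (a * sum(p) + b * sum(r) - mixed)

def ln_q(x, q):
    """q-logarithm, defined for q != 1 and x > 0."""
    return (x ** (1 - q) - 1) / (1 - q)

def tsallis_relative_entropy(p, r, q):
    """Tsallis relative entropy K_q(p, r) on the probability simplex."""
    return -sum(pi * ln_q(ri / pi, q) for pi, ri in zip(p, r))

def kl_divergence(p, r):
    """Kullback-Leibler divergence, the q -> 1 limit of K_q."""
    return sum(pi * math.log(pi / ri) for pi, ri in zip(p, r))

# Hypothetical sample points of S^2 (three states).
p = [0.2, 0.3, 0.5]
r = [0.4, 0.4, 0.2]
q = 0.3
alpha = 1 - 2 * q  # equivalent to q = (1 - alpha) / 2

lhs = alpha_divergence(p, r, alpha)
rhs = tsallis_relative_entropy(p, r, q) / q
near_kl = tsallis_relative_entropy(p, r, 1 - 1e-6)
print(lhs, rhs)                       # the two values agree up to rounding
print(near_kl, kl_divergence(p, r))   # K_q approaches KL as q -> 1
```

The first comparison uses only the restriction of Equation (7) to the simplex, where both sums of coordinates equal one.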

3. Divergences Generated by Affine Immersions as Level Surfaces

In this section, we describe the general theory of affine immersions and divergences related to level surfaces of the Hessian domain.
If the Hessian $\tilde{D} d\varphi = \sum_{i,j} (\partial^2 \varphi / \partial x^i \partial x^j)\, dx^i dx^j$ of a function $\varphi$ on a domain $\Omega \subset A^{n+1}$ is non-degenerate, the triple $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$ is called a Hessian domain. A statistical manifold is said to be flat if the curvature tensor of its affine connection vanishes. A flat statistical manifold is locally a Hessian domain. Conversely, a Hessian domain is a flat statistical manifold [12,17].
In a previous study, we showed the following theorem on the level surfaces of a Hessian function.
Theorem 1 ([18]). Let $M$ be a simply connected $n$-dimensional level surface of $\varphi$ on an $(n+1)$-dimensional Hessian domain $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$ with a Riemannian metric $\tilde{g}$, and suppose that $n \geq 2$. If we consider $(\Omega, \tilde{D}, \tilde{g})$ a flat statistical manifold, $(M, D, g)$ is a 1-conformally flat statistical submanifold of $(\Omega, \tilde{D}, \tilde{g})$, where $D$ and $g$ denote the connection and the Riemannian metric on $M$ induced by $\tilde{D}$ and $\tilde{g}$, respectively.
Here, “1-conformally flat” represents the characterization of surfaces projected by a flat statistical manifold along dual coordinates. We continue to explain the terms used in Theorem 1 and the outline of the proof.
For $\alpha \in \mathbb{R}$, statistical manifolds $(N, \nabla, h)$ and $(N, \bar{\nabla}, \bar{h})$ are α-conformally equivalent if there exists a function $\phi$ on $N$ such that
$$\bar{h}(X, Y) = e^{\phi} h(X, Y), \quad h(\bar{\nabla}_X Y, Z) = h(\nabla_X Y, Z) - \frac{1+\alpha}{2}\, d\phi(Z)\, h(X, Y) + \frac{1-\alpha}{2} \{ d\phi(X)\, h(Y, Z) + d\phi(Y)\, h(X, Z) \}, \quad X, Y, Z \in \mathfrak{X}(N).$$
If $(N, \bar{\nabla}, \bar{h})$ is 1-conformally equivalent to a flat statistical manifold $(N, \nabla, h)$, then $(N, \bar{\nabla}, \bar{h})$ is called a 1-conformally flat statistical manifold. A statistical manifold $(N, \nabla, h)$ is 1-conformally flat iff the dual statistical manifold $(N, \nabla^*, h)$ is $(-1)$-conformally flat [19].
In terms of affine geometry, $(N, \nabla, h)$ and $(N, \bar{\nabla}, h)$ are $(-1)$-conformally equivalent if and only if $\nabla$ and $\bar{\nabla}$ are projectively equivalent [20,21].
For an $(n+1)$-dimensional Hessian domain $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$, an $n$-dimensional level surface of $\varphi$ has the dualistic structure as the statistical submanifold structure. On the other hand, the level surface also has the structure induced by the affine immersion. It is essential for Theorem 1 that the statistical submanifold structure coincides with the dualistic structure by the affine immersion on a level surface of $\varphi$.
For $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$, let $x$ be the canonical immersion of an $n$-dimensional level surface $M$ into $\Omega$. Let $E$ be a transversal vector field on $M$ defined by
$$E = -d\varphi(\tilde{E})^{-1}\,\tilde{E}, \tag{11}$$
where $\tilde{E}$ is the gradient vector field of $\varphi$ on $\Omega$ defined by
$$\tilde{g}(\tilde{X}, \tilde{E}) = d\varphi(\tilde{X}), \quad \tilde{X} \in \mathfrak{X}(\Omega). \tag{12}$$
For an affine immersion $(x, E)$ and the canonical flat affine connection $\tilde{D}$ on $\Omega \subset A^{n+1}$, the induced affine connection $D^E$, the affine fundamental form $g^E$, the shape operator $S^E$, and the transversal connection form $\tau^E$ on $M$ are defined by
$$\tilde{D}_X Y = D^E_X Y + g^E(X, Y)\,E, \tag{13}$$
$$\tilde{D}_X E = -S^E(X) + \tau^E(X)\,E, \quad X, Y \in \mathfrak{X}(M). \tag{14}$$
See [21,22]. Then, $D^E$ and $g^E$ coincide with the restricted affine connection of $\tilde{D}$ and the restricted Riemannian metric of $\tilde{g}$, respectively. For the level surface $M$, the transversal connection form satisfies $\tau^E \equiv 0$. Therefore, $(x, E)$ is called an equiaffine immersion. It is known that a simply connected statistical manifold can be realized in $A^{n+1}$ by a non-degenerate equiaffine immersion iff it is 1-conformally flat [19]. Thus, Theorem 1 holds.
Next, we introduce a divergence on a Hessian domain, treating it as a flat statistical manifold.
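The canonical divergence $\rho$ of a Hessian domain $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$ is defined by
$$\rho(p, r) = \varphi(p) + \varphi^*(\tilde{\iota}(r)) + \sum_{i=1}^{n+1} x^i(p)\,x_i^*(r) \quad \text{for } p, r \in \Omega, \tag{15}$$
where $\tilde{\iota}$ is the gradient mapping from $\Omega$ to the dual affine space $A^*_{n+1}$, i.e.,
$$x_i^* \circ \tilde{\iota} = -\frac{\partial \varphi}{\partial x^i}, \tag{16}$$
and $\{x_1^*, \dots, x_{n+1}^*\}$ is the dual affine coordinate system of $\{x^1, \dots, x^{n+1}\}$; we also write $x_i^*$ for the function $x_i^* \circ \tilde{\iota}$ on $\Omega$. The Legendre transform $\varphi^*$ of $\varphi$ is defined by
$$\varphi^* \circ \tilde{\iota} = -\sum_{i=1}^{n+1} x^i x_i^* - \varphi. \tag{17}$$
See [12].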
Let $\iota$ be the conormal immersion for the affine immersion $(x, E)$ defined by Equations (11) and (12). By the definition of a conormal immersion, $\iota$ satisfies
$$\langle \iota(p), Y_p \rangle = 0, \quad Y_p \in T_p M, \qquad \langle \iota(p), E_p \rangle = 1 \quad \text{for } p \in M,$$
where $\langle a, b \rangle$ is the pairing of $a \in A^*_{n+1}$ and $b \in A^{n+1}$. It is known that the conormal immersion $\iota$ coincides with the restriction of the gradient mapping $\tilde{\iota}$ to the level surface $M$.
The next definition is given in relation to affine immersions and divergences.
Definition 1 ([19]). Let $(N, \nabla, h)$ be a 1-conformally flat statistical manifold realized by a non-degenerate affine immersion $(v, \xi)$ into $A^{n+1}$, and $w$ the conormal immersion for $v$. Then the divergence $\rho_{conf}$ of $(N, \nabla, h)$ is defined by
$$\rho_{conf}(p, r) = \langle w(r), v(p) - v(r) \rangle \quad \text{for } p, r \in N.$$
The definition of $\rho_{conf}$ is independent of the choice of a realization of $(N, \nabla, h)$.
The divergence $\rho_{conf}$ is referred to as the Kurose geometric divergence in affine geometry and as the Fenchel–Young divergence in the machine learning community [23,24]. Since an $n$-dimensional level surface $M$ of $(\Omega, \tilde{D}, \tilde{g} = \tilde{D} d\varphi)$ is a 1-conformally flat statistical manifold realized by a non-degenerate affine immersion $(x, E)$, $\rho_{conf}$ on $M$ is as follows:
$$\rho_{conf}(p, r) = \langle \iota(r), x(p) - x(r) \rangle \quad \text{for } p, r \in M.$$
Let $\rho_{sub}$ be the restriction of the canonical divergence $\rho$ to $(M, D, g)$ as a statistical submanifold of $(\Omega, \tilde{D}, \tilde{g})$. From Equations (15), (17) and (18), the next theorem holds.
Theorem 2 ([20]). For a 1-conformally flat statistical submanifold $(M, D, g)$ of $(\Omega, \tilde{D}, \tilde{g})$, the two divergences $\rho_{conf}$ and $\rho_{sub}$ coincide.
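The coincidence in Theorem 2 can be traced in a few lines. The following is only a sketch and not the proof given in [20]; it assumes the sign convention $x_i^* \circ \tilde{\iota} = -\partial \varphi / \partial x^i$ for the gradient mapping, the Legendre transform $\varphi^* \circ \tilde{\iota} = -\sum_i x^i x_i^* - \varphi$, and the fact that $\varphi(p) = \varphi(r)$ for two points of the same level surface $M$.

```latex
\begin{aligned}
\rho_{sub}(p, r)
  &= \varphi(p) + \varphi^*(\tilde{\iota}(r)) + \sum_{i=1}^{n+1} x^i(p)\, x_i^*(r) \\
  &= \varphi(p) - \varphi(r) - \sum_{i=1}^{n+1} x^i(r)\, x_i^*(r) + \sum_{i=1}^{n+1} x^i(p)\, x_i^*(r) \\
  &= \sum_{i=1}^{n+1} x_i^*(r)\,\bigl( x^i(p) - x^i(r) \bigr)
     \qquad (\varphi(p) = \varphi(r) \text{ on } M) \\
  &= \langle \iota(r),\, x(p) - x(r) \rangle = \rho_{conf}(p, r),
     \qquad p, r \in M .
\end{aligned}
```

The last step uses that the conormal immersion $\iota$ is the restriction of the gradient mapping $\tilde{\iota}$ to $M$.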

4. Deformed Probability Simplexes and Escort Distributions Generated by Affine Immersions

In this section, we explain dualistic structures on deformed probability simplexes, which correspond to sets of escort distributions via affine immersion.
We set $p^i = x^i(p)$, $i = 1, \dots, n+1$, for $p \in S^n$, where $S^n$ and $\{x^1, \dots, x^{n+1}\}$ are the probability simplex and the canonical affine coordinate system on $A^{n+1}$, respectively. For $n+1$ states $p^1, \dots, p^{n+1}$ on $S^n$ and $0 < q < 1$, if each probability $P(p^i)$ satisfies
$$P(p^i) = \frac{(p^i)^q}{\sum_{j=1}^{n+1} (p^j)^q}, \quad i = 1, \dots, n+1,$$
the probability distribution $P$ is called the escort distribution [1,2], where $(p^i)^q$ is $p^i$ raised to the power $q$.
The dualistic structure of a set of escort distributions is realized via an affine immersion into $A_+^{n+1}$ [4,5]. For $0 < q < 1$, let $f_q$ be the affine immersion of $S^n$ into $A_+^{n+1}$ defined by
$$x^i(f_q(p)) = \frac{1}{q}\,(x^i(p))^q, \quad i = 1, \dots, n+1, \quad \text{for } p \in S^n.$$
Then, the escort distribution $P$ is also represented as follows:
$$P(p^i) = \frac{\theta^i}{\sum_{j=1}^{n+1} \theta^j}, \quad \theta^i = \frac{1}{q}\,(p^i)^q, \quad i = 1, \dots, n+1.$$
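The two representations of the escort distribution can be compared directly. The sketch below is illustrative only: `escort` and `f_q` are our own helper names, and the sample point `p` and the value of `q` are arbitrary assumptions.

```python
def escort(p, q):
    """Escort distribution: p_i^q normalized over all states."""
    weights = [pi ** q for pi in p]
    total = sum(weights)
    return [w / total for w in weights]

def f_q(p, q):
    """Affine immersion of the simplex: x^i(f_q(p)) = p_i^q / q."""
    return [pi ** q / q for pi in p]

# Hypothetical point of S^2 and q-parameter.
p = [0.2, 0.3, 0.5]
q = 0.5

theta = f_q(p, q)
escort_from_theta = [t / sum(theta) for t in theta]
print(escort(p, q))
print(escort_from_theta)  # same distribution: the factor 1/q cancels
```

Normalizing the coordinates $\theta^i$ of the deformed simplex therefore recovers the escort distribution, which is why each leaf can be read as a set of escort distributions.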
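For a function $\psi_q$ on $A_+^{n+1}$ defined by
$$\psi_q = \frac{1}{1-q} \sum_{i=1}^{n+1} (q x^i)^{\frac{1}{q}},$$
the image $f_q(S^n)$ is a level surface of $\psi_q$ satisfying $\psi_q = 1/(1-q)$. For $0 < q < 1$, the Hessian matrix of the function $\psi_q$ is positive definite on $A_+^{n+1}$. Then, $\psi_q$ induces the Hessian structure $(A_+^{n+1}, \tilde{D}, \tilde{g}_q \equiv (\partial^2 \psi_q / \partial x^i \partial x^j))$. By the definitions
$$\tilde{\Gamma}_{ijk} = \sum_{l=1}^{n+1} \tilde{g}_{q\,kl}\,\tilde{\Gamma}^l_{ij} = \frac{\partial^3 \psi_q}{\partial x^i \partial x^j \partial x^k}, \quad i, j, k = 1, \dots, n+1,$$
$$\tilde{D}^{(\alpha)}_{\frac{\partial}{\partial x^i}} \frac{\partial}{\partial x^j} = \frac{1-\alpha}{2} \sum_{k=1}^{n+1} \tilde{\Gamma}^k_{ij}\,\frac{\partial}{\partial x^k}, \quad \alpha = 1 - 2q,$$
the tetrad $(A_+^{n+1}, \tilde{D}, \tilde{D}^{(-1)}, \tilde{g}_q)$ is the dually flat structure. The connection $\tilde{D}^{(0)}$ coincides with the Levi–Civita connection of the Riemannian metric $\tilde{g}_q$.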
We denote by $D$ and $g_q$ the restrictions of $\tilde{D}$ and $\tilde{g}_q$ to $f_q(S^n)$, and regard $(f_q(S^n), D, g_q)$ as carrying the dualistic structure induced as a submanifold of $(A_+^{n+1}, \tilde{D}, \tilde{g}_q)$. From the discussion in Section 3, $(f_q(S^n), D, g_q)$ coincides with the dualistic structure induced by the equiaffine immersion $(f_q, E_q)$, where
$$E_q \equiv -d\psi_q(\tilde{E}_q)^{-1}\,\tilde{E}_q$$
for the gradient vector field $\tilde{E}_q$ of $\psi_q$ on $A_+^{n+1}$ defined by
$$\tilde{g}_q(\tilde{X}, \tilde{E}_q) = d\psi_q(\tilde{X}) \quad \text{for } \tilde{X} \in \mathfrak{X}(A_+^{n+1}).$$
The pullback of $(f_q(S^n), D, g_q)$ to $S^n$ is $(-1)$-conformally equivalent to $(S^n, \nabla^{(\alpha)}, g)$ defined by Equations (3)–(5). In addition, $(f_q(S^n), D, g_q)$ has the constant curvature $\kappa = q(1-q) = (1-\alpha^2)/4$ [5].
On $(f_q(S^n), D, g_q)$, the divergence $\rho_q$ restricted from the canonical divergence of $(A_+^{n+1}, \tilde{D}, \tilde{g}_q)$ coincides with the geometric divergence by Equation (18) from the affine immersion $(f_q, E_q)$. For the dual affine coordinate system $\{x_1^*, \dots, x_{n+1}^*\}$ on $A^*_{n+1}$ defined by
$$x_i^* = -\frac{\partial \psi_q}{\partial x^i} = -\frac{1}{1-q}\,(q x^i)^{\frac{1-q}{q}}, \tag{27}$$
the divergence $\rho_q$ of $(f_q(S^n), D, g_q)$ is described as
$$\rho_q(a, b) = \sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)), \quad a, b \in f_q(S^n). \tag{28}$$
In addition, the pullback of $\rho_q$ to $S^n$ coincides with $D^{(\alpha)}$, and hence with the Tsallis relative entropy $K_q$ up to the factor $1/q$ [4].
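The statements of this section can be verified on a small example. The sketch below is our own illustration and assumes the sign convention $x_i^* = -\partial \psi_q / \partial x^i$ for the dual coordinates; the helper names and the sample points are hypothetical. It checks that $f_q(S^n)$ lies on the level $\psi_q = 1/(1-q)$ and that the pullback of $\rho_q$ agrees with the α-divergence for $\alpha = 1 - 2q$.

```python
def f_q(p, q):
    """Affine immersion of the simplex: x^i(f_q(p)) = p_i^q / q."""
    return [pi ** q / q for pi in p]

def psi_q(x, q):
    """Potential psi_q = (1 / (1 - q)) * sum_i (q x^i)^(1/q)."""
    return sum((q * xi) ** (1 / q) for xi in x) / (1 - q)

def dual_coords(x, q):
    """Dual coordinates x*_i = -d psi_q / d x^i (assumed sign convention)."""
    return [-((q * xi) ** ((1 - q) / q)) / (1 - q) for xi in x]

def rho_q(a, b, q):
    """Geometric divergence on one leaf: sum_i x*_i(b) (x^i(a) - x^i(b))."""
    return sum(s * (ai - bi) for s, ai, bi in zip(dual_coords(b, q), a, b))

def alpha_divergence(p, r, alpha):
    """alpha-divergence of Equation (7) for positive vectors (alpha != +-1)."""
    u, v = (1 - alpha) / 2, (1 + alpha) / 2
    mixed = sum(pi ** u * ri ** v for pi, ri in zip(p, r))
    return 4 / (1 - alpha ** 2) * (u * sum(p) + v * sum(r) - mixed)

# Hypothetical points of S^2 and q-parameter.
p = [0.2, 0.3, 0.5]
r = [0.4, 0.4, 0.2]
q = 0.3

a, b = f_q(p, q), f_q(r, q)
level = psi_q(a, q)
pulled_back = rho_q(a, b, q)
direct = alpha_divergence(p, r, 1 - 2 * q)
print(level, 1 / (1 - q))   # f_q(p) lies on the level surface
print(pulled_back, direct)  # pullback of rho_q equals the alpha-divergence
```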
At the end of this section, we mention the divergence of $(A_+^{n+1}, \tilde{D}, \tilde{g}_q)$. By Equation (17), the Legendre transform $\psi_q^*$ of $\psi_q$ is
$$\psi_q^*(x^*(a)) = -\psi_q(a) - \sum_{i=1}^{n+1} x^i(a)\,x_i^*(a), \quad a \in A_+^{n+1}. \tag{29}$$
By Equations (15) and (16), the canonical divergence $\rho_q$ of $(A_+^{n+1}, \tilde{D}, \tilde{g}_q)$ is defined by
$$\rho_q(a, b) = \psi_q(a) - \psi_q(b) + \sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)), \quad a, b \in A_+^{n+1}, \tag{30}$$
which we denote by the same symbol $\rho_q$ as the divergence of $(f_q(S^n), D, g_q)$.

5. Extended Divergence on a Foliation by Deformed Probability Simplexes

Previous sections described the divergence for each fixed $q$ and each fixed $\alpha$. This section defines an extended divergence on a foliation by deformed probability simplexes $(f_q(S^n), D, g_q)$ for all $0 < q < 1$, and shows the divergence decomposition theorem. The contents of our paper [15] are included, but are explained in more detail in the setting of affine geometry.
To measure the proximity of q-escort distributions with different q-parameters, we define an extended divergence on a foliation by deformed probability simplexes as follows.
Definition 2. Let $S_{fol} = \bigcup_{0<q<1} f_q(S^n) = \{ p \mid p \in A_+^{n+1},\ \sum_{i=1}^{n+1} x^i(p) > 1 \}$, which corresponds to a foliation $\mathcal{F} = \{ f_q(S^n) \mid 0 < q < 1 \}$. We call a function $\rho_{fol}$ on $S_{fol} \times S_{fol}$ defined by Equation (31) an extended divergence on a foliation by deformed probability simplexes.
$$\rho_{fol}(a, b) \equiv \psi_{q(a)}(a) - \psi_{q(b)}(b) + \sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)) \tag{31}$$
for $a \in f_{q(a)}(S^n)$, $b \in f_{q(b)}(S^n)$, $0 < q(a) < 1$, $0 < q(b) < 1$.
The $i$-th component of the conormal immersion of $(f_q, E_q)$ is $-\partial \psi_q / \partial x^i$. By the right-hand side of Equation (27), the dual coordinate of $b$, denoted by $x^*(b)$, satisfies
$$-x^*(b) \equiv (-x_1^*(b), \dots, -x_{n+1}^*(b)) \in f_{1-q(b)}(S^n).$$
Therefore, we consider $f_{1-q}(S^n)$ as the dual simplex of $f_q(S^n)$ for $0 < q < 1$. When $q = 1/2$, $f_q(S^n)$ is self-dual [4]. Note that the $i$-th component of the dual coordinate of $b$ is denoted by $\eta_i(b) = -x_i^*(b) = (\partial \psi_q / \partial x^i)|_b$ in [15].
On the extended divergence, the next proposition holds.
Proposition 1. The extended divergence $\rho_{fol}$ on $S_{fol}$ satisfies the following:
(i) If $a, b \in f_{q(a)}(S^n)$,
$$\rho_{fol}(a, b) = \rho_{q(a)}(a, b) = D^{(\alpha(a))}\big( f_{q(a)}^{-1}(a), f_{q(a)}^{-1}(b) \big),$$
where $\rho_q$ is the divergence of $(f_q(S^n), D, g_q)$ by Equation (28), $D^{(\alpha)}$ is the α-divergence defined by Equation (7), and $\alpha(a) = 1 - 2q(a)$.
(ii) In the case of $q(a) \geq q(b)$,
$$\rho_{fol}(a, b) \geq 0 \quad \text{for } (a, b) \in S_{fol} \times S_{fol},$$
and $\rho_{fol}(a, b) = 0$ if and only if $a = b$.
Proof. 
If $a, b \in f_{q(a)}(S^n)$, then $\psi_{q(a)}(a) = \psi_{q(b)}(a) = \psi_{q(b)}(b)$. By Equations (28) and (31),
$$\rho_{fol}(a, b) = \sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)) = \rho_{q(a)}(a, b).$$
Then, (i) holds. If $1 > q(a) \geq q(b) > 0$, it holds that $\psi_{q(a)}(a) \geq \psi_{q(b)}(b)$ because
$$\psi_{q(a)}(a) = \frac{1}{1-q(a)}, \quad \psi_{q(b)}(b) = \frac{1}{1-q(b)}$$
are induced by the definition of $f_q(S^n)$. In addition, $f_{q(a)}(S^n)$ and $f_{q(b)}(S^n)$ are convex surfaces centered on the origin of $A_+^{n+1}$, and the surface $f_{q(a)}(S^n)$ is closer to the origin than $f_{q(b)}(S^n)$. Then, $\sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)) \geq 0$. Thus, (ii) holds. □
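Proposition 1 (ii) can be probed numerically by sampling pairs of points on leaves with $q(a) \geq q(b)$. This is a sanity check under our own assumptions, not a substitute for the proof: the helper names, the sampling scheme, and the ranges of the q-parameters are hypothetical, and the dual coordinates use the sign convention $x_i^* = -\partial \psi_q / \partial x^i$.

```python
import random

def f_q(p, q):
    """Affine immersion of the simplex: x^i(f_q(p)) = p_i^q / q."""
    return [pi ** q / q for pi in p]

def psi_q(x, q):
    """Potential psi_q = (1 / (1 - q)) * sum_i (q x^i)^(1/q)."""
    return sum((q * xi) ** (1 / q) for xi in x) / (1 - q)

def dual_coords(x, q):
    """Dual coordinates x*_i = -d psi_q / d x^i (assumed sign convention)."""
    return [-((q * xi) ** ((1 - q) / q)) / (1 - q) for xi in x]

def rho_fol(a, q_a, b, q_b):
    """Extended divergence of Equation (31) for a on leaf q_a, b on leaf q_b."""
    pairing = sum(s * (ai - bi) for s, ai, bi in zip(dual_coords(b, q_b), a, b))
    return psi_q(a, q_a) - psi_q(b, q_b) + pairing

def random_simplex_point(rng, n_states):
    """Random interior point of the probability simplex."""
    weights = [rng.random() + 1e-3 for _ in range(n_states)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
min_value = float("inf")
for _ in range(1000):
    q_b = rng.uniform(0.05, 0.9)
    q_a = rng.uniform(q_b, 0.95)          # enforce q(a) >= q(b)
    a = f_q(random_simplex_point(rng, 3), q_a)
    b = f_q(random_simplex_point(rng, 3), q_b)
    min_value = min(min_value, rho_fol(a, q_a, b, q_b))
print(min_value)  # smallest sampled value; non-negative up to rounding
```

The condition $q(a) \geq q(b)$ matters here: the function is not symmetric in its arguments, and the proposition makes no claim for the opposite ordering.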
We define the extended dual divergence $\rho_{fol}^*$ of $\rho_{fol}$ as follows:
$$\rho_{fol}^*(a, b) \equiv \psi_{q(a)}^*(x^*(a)) - \psi_{q(b)}^*(x^*(b)) + \sum_{i=1}^{n+1} x^i(b)\,(x_i^*(a) - x_i^*(b))$$
for $a \in f_{q(a)}(S^n)$, $b \in f_{q(b)}(S^n)$, $0 < q(a) < 1$, $0 < q(b) < 1$,
where $\psi_q^*$ is the Legendre transform of $\psi_q$ for $0 < q < 1$. Then, the following holds.
Proposition 2. The functions $\rho_{fol}$ and $\rho_{fol}^*$ satisfy
$$\rho_{fol}^*(b, a) = \rho_{fol}(a, b) \quad \text{for } a \in f_{q(a)}(S^n),\ b \in f_{q(b)}(S^n).$$
Proof. 
By the definition of the Legendre transform, we have
$$\begin{aligned}
\rho_{fol}^*(b, a) &= \psi_{q(b)}^*(x^*(b)) - \psi_{q(a)}^*(x^*(a)) + \sum_{i=1}^{n+1} x^i(a)\,(x_i^*(b) - x_i^*(a)) \\
&= -\psi_{q(b)}(b) - \sum_{i=1}^{n+1} x^i(b)\,x_i^*(b) + \psi_{q(a)}(a) + \sum_{i=1}^{n+1} x^i(a)\,x_i^*(a) + \sum_{i=1}^{n+1} x^i(a)\,(x_i^*(b) - x_i^*(a)) \\
&= \psi_{q(a)}(a) - \psi_{q(b)}(b) + \sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)) = \rho_{fol}(a, b).
\end{aligned}$$
□
The extended divergence is related to the duo Bregman (pseudo-)divergence, where the parameters also define the convex functions [16]. To work with the entire parametrized probability distribution families and to explore the application of divergences, we must investigate their relationship.

6. Decomposition of an Extended Divergence

In this section, we explain the orthogonal foliation of $\mathcal{F}$. Next, we give a decomposition of an extended divergence along the orthogonal leaf and the original leaf.
For the foliation $\mathcal{F} = \{ f_q(S^n) \mid 0 < q < 1 \}$, we consider the flow on $S_{fol}$ defined by the following equation:
$$\frac{dx_i^*}{dt} = x_i^*, \quad i = 1, \dots, n+1, \tag{35}$$
where the function $x_i^*$ on $S_{fol}$ takes the $i$-th component of the dual coordinate on $f_q(S^n)$ as in Equation (27) for each $0 < q < 1$. An integral curve of Equation (35) is orthogonal to $f_q(S^n)$ for each $q$ with respect to the pairing $\langle\ ,\ \rangle$. The set of integral curves becomes the orthogonal foliation of $\mathcal{F}$. We denote it by $\mathcal{F}^{\perp}$.
Translating into the primal coordinate system, we have the following equations:
$$\frac{dx^i}{dt} = \tilde{E}^i, \quad i = 1, \dots, n+1, \quad \text{on } S_{fol}, \tag{36}$$
$$\tilde{E}^i = \tilde{E}_q^i = \sum_{j=1}^{n+1} g_q^{ij}\,\frac{\partial \psi_q}{\partial x^j} \quad \text{on } f_q(S^n), \tag{37}$$
where $(g_q^{ij})$ is the inverse matrix of $(g_{q\,ij})$. The right-hand side of Equation (37) is calculated using Equations (11) and (12) for $\psi_q$. A leaf of $\mathcal{F}^{\perp}$ is an integral curve of the vector field $\tilde{E}$ that takes the value $\tilde{E}_q$ on $f_q(S^n)$ for each $q$.
The following theorem is on the decomposition of the extended divergence.
Theorem 3. Let $S^n$ be the probability simplex, and $(f_q(S^n), D, g_q)$ the 1-conformally flat statistical manifold generated by the affine immersion $(f_q, E_q)$, where $f_q$ is defined as
$$x^i(f_q(p)) = \frac{1}{q}\,(x^i(p))^q, \quad i = 1, \dots, n+1, \quad \text{for } p \in S^n, \tag{38}$$
$\psi_q \equiv 1/(1-q) \sum_{i=1}^{n+1} (q x^i)^{1/q}$, $E_q \equiv -d\psi_q(\tilde{E}_q)^{-1}\,\tilde{E}_q$, $\tilde{E}_q^i \equiv \sum_{j=1}^{n+1} g_q^{ij}\,\partial \psi_q / \partial x^j$, and $g_q$ is the restriction of $(g_{q\,ij}) = \tilde{D} d\psi_q$ to $f_q(S^n)$. Let $a, b \in f_{q(a)}(S^n)$, $0 < q(a) < 1$, and $c \in S_{fol} \equiv \bigcup_{0<q<1} f_q(S^n)$. If there exists an orthogonal leaf $L \in \mathcal{F}^{\perp}$ that includes $b$ and $c$, we have
$$\rho_{fol}(a, c) = \mu\,\rho_{fol}(a, b) + \rho_{fol}(b, c), \quad x^*(c) = \mu\,x^*(b), \quad \mu > 0, \tag{39}$$
where $x^*(\cdot)$ is the dual coordinate of $f_q(S^n)$ for each $q$.
Proof. 
From $a, b \in f_{q(a)}(S^n)$, it holds that $\psi_{q(a)}(a) = \psi_{q(b)}(b)$, where $q(b) = q(a)$. By the definition in Equations (22) and (23), we have
$$\begin{aligned}
\rho_{fol}(a, c) &= \psi_{q(a)}(a) - \psi_{q(c)}(c) + \sum_{i=1}^{n+1} x_i^*(c)\,(x^i(a) - x^i(c)) \\
&= \psi_{q(b)}(b) - \psi_{q(c)}(c) + \sum_{i=1}^{n+1} \big\{ x_i^*(c)\,(x^i(a) - x^i(b)) + x_i^*(c)\,(x^i(b) - x^i(c)) \big\} \\
&= \mu \sum_{i=1}^{n+1} x_i^*(b)\,(x^i(a) - x^i(b)) + \Big\{ \psi_{q(b)}(b) - \psi_{q(c)}(c) + \sum_{i=1}^{n+1} x_i^*(c)\,(x^i(b) - x^i(c)) \Big\} \\
&= \mu\,\rho_{fol}(a, b) + \rho_{fol}(b, c).
\end{aligned}$$
□
See Figure 1 for a decomposition of the extended divergence and graphs of the deformed simplexes $f_q(S^n)$.
A decomposition similar to Equation (39) is also available on a foliation by Hessian level surfaces of one Hessian manifold [20]. Theorem 3 generalizes the previous decomposition.
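The decomposition of Theorem 3 can be checked on a concrete configuration. The sketch below is our own construction under stated assumptions: a leaf of the orthogonal foliation through $b$ is modelled by the condition $x^*(c) = \mu\,x^*(b)$, the dual coordinates use the sign convention $x_i^* = -\partial \psi_q / \partial x^i$, and the helper `point_on_dual_ray`, the sample points, and the q-values (taken from the caption of Figure 1) are hypothetical choices.

```python
def f_q(p, q):
    """Affine immersion of the simplex: x^i(f_q(p)) = p_i^q / q."""
    return [pi ** q / q for pi in p]

def psi_q(x, q):
    """Potential psi_q = (1 / (1 - q)) * sum_i (q x^i)^(1/q)."""
    return sum((q * xi) ** (1 / q) for xi in x) / (1 - q)

def dual_coords(x, q):
    """Dual coordinates x*_i = -d psi_q / d x^i (assumed sign convention)."""
    return [-((q * xi) ** ((1 - q) / q)) / (1 - q) for xi in x]

def rho_fol(a, q_a, b, q_b):
    """Extended divergence of Equation (31) for a on leaf q_a, b on leaf q_b."""
    pairing = sum(s * (ai - bi) for s, ai, bi in zip(dual_coords(b, q_b), a, b))
    return psi_q(a, q_a) - psi_q(b, q_b) + pairing

def point_on_dual_ray(p_b, q_b, q_c):
    """Point c on leaf q_c whose dual coordinates are proportional to those
    of b = f_{q_b}(p_b); obtained by reweighting p_b and renormalizing."""
    exponent = (1 - q_b) / (1 - q_c)
    weights = [pi ** exponent for pi in p_b]
    total = sum(weights)
    return f_q([w / total for w in weights], q_c)

q_ab, q_c = 0.75, 0.6
a = f_q([0.2, 0.3, 0.5], q_ab)
p_b = [0.5, 0.1, 0.4]
b = f_q(p_b, q_ab)
c = point_on_dual_ray(p_b, q_ab, q_c)

# mu is the common ratio x*_i(c) / x*_i(b).
ratios = [sc / sb for sc, sb in zip(dual_coords(c, q_c), dual_coords(b, q_ab))]
mu = ratios[0]

lhs = rho_fol(a, q_ab, c, q_c)
rhs = mu * rho_fol(a, q_ab, b, q_ab) + rho_fol(b, q_ab, c, q_c)
print(ratios)    # all components equal: x*(c) = mu x*(b)
print(lhs, rhs)  # both sides of Equation (39)
```

Once $a$ and $b$ share a leaf and the dual coordinates of $b$ and $c$ are proportional, the identity is purely algebraic, which is what the proof above exploits.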
Finally, we describe the gradient flow on a leaf $f_q(S^n)$ using the extended divergence.
Theorem 4. For a submanifold $(f_q(S^n), D, g_q)$ of $S_{fol}$, we denote by $\{x^1, \dots, x^n\}$ an affine coordinate system on $f_q(S^n)$ such that $D\,dx^i = 0$, $i = 1, \dots, n$, and set $g_{q\,ij} = g_q(\partial / \partial x^i, \partial / \partial x^j)$, $(g_q^{ij}) = (g_{q\,ij})^{-1}$. For a fixed point $c \in L$, the gradient flow on $f_q(S^n)$ defined by
$$\frac{dx^i}{dt} = -\sum_{j=1}^{n} g_q^{ij}\,\frac{\partial}{\partial x^j}\,\rho_{fol}(a_x, c), \quad i = 1, \dots, n, \quad a_x \in f_q(S^n), \tag{40}$$
converges to the unique point $b \in L \cap f_q(S^n)$, where $a_x$ is a variable point parametrized as $\{x^1(t), \dots, x^n(t)\}$.
Proof. 
By Theorem 3, for any $a_x \in f_q(S^n)$, there exists $\mu > 0$ such that
$$\rho_{fol}(a_x, c) = \mu\,\rho_{fol}(a_x, b) + \rho_{fol}(b, c), \quad x^*(c) = \mu\,x^*(b).$$
Equation (40) is described by the dual coordinate system $\{x_1^*, \dots, x_n^*\}$ on $f_q(S^n)$ as follows:
$$\frac{dx_i^*}{dt} = \mu\,\frac{\partial}{\partial x^i}\,\rho_{fol}(a_x, b), \quad i = 1, \dots, n.$$
On $f_q(S^n)$, from Proposition 1 (i), $\rho_{fol}$ coincides with the geometric divergence $\rho_q$ generated by the affine immersion $(f_q, E_q)$. Differentiating the geometric divergence with respect to $x^i$ yields the dual coordinate $x_i^*$, which satisfies $D^* dx_i^* = 0$, $i = 1, \dots, n$ [19]. Then, it holds that
$$\frac{dx_i^*}{dt} = -\mu\,(x_i^*(a_x) - x_i^*(b)), \quad i = 1, \dots, n,$$
and that
$$x_i^* = x_i^*(b) + \big( x_i^*(a|_{t=0}) - x_i^*(b) \big)\,e^{-\mu t}, \quad i = 1, \dots, n,$$
where $a|_{t=0}$ is an initial point of Equation (40). Then, the gradient flow of Equation (40) converges to $b \in L \cap f_q(S^n)$ following a geodesic for the dual coordinate system. □
A gradient flow similar to Equation (40) has been provided on a flat statistical submanifold [25]. A similar flow on a Hessian level surface, i.e., a 1-conformally flat statistical submanifold, has been given in [20]. Theorem 4 generalizes the previous theorems on gradient flows.
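The last step of the proof of Theorem 4 is a linear ordinary differential equation in the dual coordinates, so its exponential convergence is easy to illustrate. The sketch below treats the dual coordinates as plain numbers and compares an explicit Euler integration with the closed-form solution; the starting point, the target, the value of $\mu$, and the step count are hypothetical values chosen only for illustration.

```python
import math

def closed_form(x_start, x_target, mu, t):
    """x*(t) = x*(b) + (x*(a at t=0) - x*(b)) * exp(-mu t), componentwise."""
    return [b + (s - b) * math.exp(-mu * t) for s, b in zip(x_start, x_target)]

def euler_flow(x_start, x_target, mu, t_end, steps):
    """Explicit Euler integration of dx*/dt = -mu (x* - x*(b))."""
    x = list(x_start)
    dt = t_end / steps
    for _ in range(steps):
        x = [xi - dt * mu * (xi - bi) for xi, bi in zip(x, x_target)]
    return x

# Hypothetical dual coordinates of the initial point and of the limit b.
x_start = [-1.2, -0.7]
x_target = [-0.9, -1.0]
mu = 1.3

exact = closed_form(x_start, x_target, mu, 5.0)
approx = euler_flow(x_start, x_target, mu, 5.0, 50000)
print(exact)
print(approx)  # both are close to x_target and to each other
```

Each dual coordinate moves along a straight segment towards its limit, which matches the statement that the flow follows a geodesic for the dual coordinate system.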

7. Conclusions

This study considers a foliation by deformed probability simplexes corresponding to sets of escort distributions with q-parameters, for the continuous transition of α-parameters in information geometry. Since these are typical q-exponential families, we still need to provide details on the extended divergence and a natural definition of the foliation for general q-exponential families.
The extended divergence measures the proximity of q-exponential distributions with different q-parameters. Therefore, our theory provides a mathematical basis for generalizing methods of machine learning and statistical mechanics to q-distribution families in which different q-parameters are mixed. The decomposition theorem may be applicable to the problem of the optimal choice of the q-parameter; concrete application methods remain open. It also remains to investigate the relationship with a new λ-duality on nonextensive statistical mechanics with mixed parameters [26,27].

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We are grateful to the referees for their constructive comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: New York, NY, USA, 2009.
  2. Naudts, J. Generalised Thermostatistics; Springer: London, UK, 2011.
  3. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Ineq. Pure Appl. Math. 2004, 5, 102.
  4. Ohara, A. Geometry of distributions associated with Tsallis statistics and properties of relative entropy minimization. Phys. Lett. A 2007, 370, 184–193.
  5. Ohara, A. Geometric study for the Legendre duality of generalized entropies and its application to the porous medium equation. Eur. Phys. J. B 2009, 70, 15–28.
  6. Ohara, A.; Wada, T. Information geometry of q-Gaussian densities and behaviors of solutions to related diffusion equations. J. Phys. A Math. Theor. 2010, 43, 035002.
  7. Matsuzoe, H.; Ohara, A. Geometry for q-exponential families. In Recent Progress in Differential Geometry and its Related Fields; Adachi, T., Hashimoto, H., Hristov, M.J., Eds.; World Scientific Publishing: Hackensack, NJ, USA, 2011; pp. 55–71.
  8. Amari, S.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometry. Phys. A 2012, 391, 4308–4319.
  9. Matsuzoe, H.; Henmi, M. Hessian structures and divergence functions on deformed exponential families. In Geometric Theory of Information, Signals and Communication Technology; Nielsen, F., Ed.; Springer: Basel, Switzerland, 2014; pp. 57–80.
  10. Matsuzoe, H.; Wada, T. Deformed algebras and generalizations of independence on deformed exponential families. Entropy 2015, 17, 5729–5751.
  11. Wada, T.; Matsuzoe, H.; Scarfone, A.M. Dualistic Hessian structures among the thermodynamic potentials in the κ-thermostatistics. Entropy 2015, 17, 7213–7229.
  12. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
  13. Scarfone, A.M.; Matsuzoe, H.; Wada, T. Information geometry of κ-exponential families: Dually-flat, Hessian and Legendre structures. Entropy 2018, 20, 436.
  14. Matsuzoe, H. A sequence of escort distributions and generalizations of expectations on q-exponential family. Entropy 2017, 19, 7.
  15. Uohashi, K. A foliation by deformed probability simplexes for transition of α-parameters. In Proceedings of the International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, IHP, Paris, France, 18–22 July 2022.
  16. Nielsen, F. Statistical divergences between densities of truncated exponential families with nested supports: Duo Bregman and duo Jensen divergences. Entropy 2022, 24, 421.
  17. Shima, H. The Geometry of Hessian Structures; World Scientific: Singapore, 2007.
  18. Uohashi, K.; Ohara, A.; Fujii, T. 1-conformally flat statistical submanifolds. Osaka J. Math. 2000, 37, 501–507.
  19. Kurose, T. On the divergences of 1-conformally flat statistical manifolds. Tohoku Math. J. 1994, 46, 427–433.
  20. Uohashi, K.; Ohara, A.; Fujii, T. Foliations and divergences of flat statistical manifolds. Hiroshima Math. J. 2000, 30, 403–414.
  21. Nomizu, K.; Pinkall, U. On the geometry of affine immersions. Math. Z. 1987, 195, 165–178.
  22. Nomizu, K.; Sasaki, T. Affine Differential Geometry: Geometry of Affine Immersions; Cambridge University Press: Cambridge, UK, 1994.
  23. Azoury, K.S.; Warmuth, M.K. Relative loss bounds for on-line density estimation with the exponential family of distributions. Mach. Learn. 2001, 43, 211–246.
  24. Blondel, M.; Martins, A.F.T.; Niculae, V. Learning with Fenchel-Young losses. J. Mach. Learn. Res. 2020, 21, 1–69.
  25. Fujiwara, A.; Amari, S. Gradient systems in view of information geometry. Phys. D 1995, 80, 317–327.
  26. Zhang, J.; Wong, T.-K.L. λ-Deformation: A canonical framework for statistical manifolds of constant curvature. Entropy 2022, 24, 193.
  27. Wong, T.-K.L.; Zhang, J. Tsallis and Rényi deformations linked via a new λ-duality. IEEE Trans. Inf. Theory 2022, 68, 5353–5373.
Figure 1. A decomposition of the extended divergence $\rho_{fol}(a, c)$, graphs of the standard simplex ($q \to 1$), and deformed simplexes for $q = 0.75, 0.6, 0.5, 0.4, 0.25$ in $A_+^2$. For primal coordinates $a, b \in f_{0.75}(S^1)$ and $c \in f_{0.6}(S^1)$, the dual coordinates satisfy $-x^*(a), -x^*(b) \in f_{0.25}(S^1)$ and $-x^*(c) \in f_{0.4}(S^1)$. The primal geodesic between $a$ and $b$ is orthogonal to the dual one between $b$ and $c$ with respect to the metric $g_{0.75}$.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Uohashi, K. Extended Divergence on a Foliation by Deformed Probability Simplexes. Entropy 2022, 24, 1736. https://doi.org/10.3390/e24121736
