Article

Global Geometry of Bayesian Statistics

Department of Mathematics, Osaka Dental University, Osaka 573-1121, Japan
Entropy 2020, 22(2), 240; https://doi.org/10.3390/e22020240
Submission received: 15 January 2020 / Revised: 18 February 2020 / Accepted: 18 February 2020 / Published: 20 February 2020
(This article belongs to the Special Issue Information Geometry III)

Abstract

In the previous work of the author, a non-trivial symmetry of the relative entropy in the information geometry of normal distributions was discovered. The same symmetry also appears in the symplectic/contact geometry of Hilbert modular cusps. Further, it was observed that a contact Hamiltonian flow presents a certain Bayesian inference on normal distributions. In this paper, we describe Bayesian statistics and the information geometry in the language of current geometry in order to spread our interest in statistics among general geometers and topologists. Then, we foliate the space of multivariate normal distributions by symplectic leaves to generalize the above result of the author. This foliation arises from the Cholesky decomposition of the covariance matrices.


1. Introduction

Suppose that a smooth manifold U is embedded in the space of positive probability densities defined on a fixed domain. Then, the relative entropy defines a separating premetric $D : U \times U \to \mathbb{R}_{\geq 0}$ on U. Here a premetric on U is a non-negative function on $U \times U$ vanishing along the diagonal set $\Delta \subset U \times U$, and it is separating if it vanishes only on $\Delta$. Its jet of order 3 at $\Delta$ induces a family of differential geometric structures on U, which is the main subject of the information geometry. There is a large body of literature on the information geometry (see [1,2] and references therein). It is worth noting that another “canonical” choice of premetric other than the above D is discussed in [3].
In the case where U is the space of univariate normal distributions, the half plane $H = \mathbb{R} \times \mathbb{R}_{>0} \ni (m, s)$ presents U, where m denotes the mean and s the standard deviation. Since the convolution of two normal densities is a normal density, it induces a product ∗ on H called the convolution product. On the other hand, since the pointwise product of two normal densities is proportional to a normal density, it induces another product · on H called the Bayesian product. Their expressions are
$$(m, s) * (m', s') = \left(m + m',\ \sqrt{s^2 + s'^2}\right), \qquad (m, s) \cdot (m', s') = \left(\frac{m s'^2 + m' s^2}{s^2 + s'^2},\ \frac{s s'}{\sqrt{s^2 + s'^2}}\right).$$
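The two products can be checked directly on densities. The following sketch (not part of the original paper; the helper names are ours) implements both operations and verifies on a grid that the Bayesian product is the normalized pointwise product of the two densities.

```python
import numpy as np

def conv(p, q):
    """Convolution product (m, s) * (m', s')."""
    (m, s), (m2, s2) = p, q
    return (m + m2, np.sqrt(s**2 + s2**2))

def bayes(p, q):
    """Bayesian product (m, s) . (m', s')."""
    (m, s), (m2, s2) = p, q
    return ((m * s2**2 + m2 * s**2) / (s**2 + s2**2),
            s * s2 / np.sqrt(s**2 + s2**2))

def normal_pdf(x, m, s):
    return np.exp(-(x - m)**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

# The Bayesian product is the normal law proportional to the pointwise product
# of the two densities; check this on a grid.
x = np.linspace(-10.0, 10.0, 4001)
p, q = (1.0, 2.0), (-0.5, 0.7)
m_dot, s_dot = bayes(p, q)
prod = normal_pdf(x, *p) * normal_pdf(x, *q)
prod /= np.trapz(prod, x)                      # renormalize the pointwise product
assert np.allclose(prod, normal_pdf(x, m_dot, s_dot), atol=1e-6)
print("convolution:", conv(p, q), "  Bayesian product:", (m_dot, s_dot))
```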
In the previous work of the author ([4]), a Fourier-like transformation is defined as the diffeomorphism $\hat{F} : H \to H$ sending $(m, s)$ to $(M, S) = \left(-\frac{m}{s^2},\ \frac{1}{s}\right)$. It is an involution interchanging the two operations ∗ and ·. Accordingly, the stereograph $f : H \times H \to \mathbb{R}_{\geq 0}$ of the above D is defined by
$$f : (m, s, M, S) \mapsto \frac{1}{2}\left(\frac{M}{S} + sS\,\frac{m}{s}\right)^2 + \frac{1}{2}(sS)^2 - \frac{1}{2} - \ln(sS) = \frac{1}{2}\left(\frac{(m - m')^2 + s^2}{s'^2} - 1 + \ln\frac{s'^2}{s^2}\right) = D\left((m, s), (m', s')\right),$$
where $(M, S) = \hat{F}(m', s')$. The flow $(m, s, M, S) \mapsto \left(e^{t} m,\ e^{t} s,\ e^{-t} M,\ e^{-t} S\right)$ $(t \in \mathbb{R})$ preserves f as well as the graph $F \subset H \times H$ of $\hat{F}$. The same symmetry appears in the contact/symplectic geometry related to the algebraic geometry of Hilbert modular cusps. Moreover, there exists a contact Hamiltonian flow whose restriction to the graph F presents a certain Bayesian inference. Its application appears in [5].
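The following sketch (again only an illustration, with our own helper names) checks numerically that $\hat{F}$ is an involution interchanging ∗ and ·, that the stereograph f reproduces the relative entropy D, and that the flow above preserves both f and the graph of $\hat{F}$.

```python
import numpy as np

def F(m, s):                       # the Fourier-like transformation
    return (-m / s**2, 1.0 / s)

def conv(p, q):
    return (p[0] + q[0], np.hypot(p[1], q[1]))

def bayes(p, q):
    (m, s), (m2, s2) = p, q
    return ((m * s2**2 + m2 * s**2) / (s**2 + s2**2), s * s2 / np.hypot(s, s2))

def D(p, q):                       # relative entropy between univariate normals
    (m, s), (m2, s2) = p, q
    return 0.5 * (((m - m2)**2 + s**2) / s2**2 - 1.0 + np.log(s2**2 / s**2))

def f(m, s, M, S):                 # the stereograph of D
    return 0.5 * (M / S + s * S * (m / s))**2 + 0.5 * (s * S)**2 - 0.5 - np.log(s * S)

p, q = (0.3, 1.5), (-1.2, 0.4)
assert np.allclose(F(*F(*p)), p)                               # involution
assert np.allclose(F(*conv(p, q)), bayes(F(*p), F(*q)))        # * <-> .
assert np.isclose(f(*p, *F(*q)), D(p, q))                      # stereograph reproduces D

t = 0.7                                                        # the symmetry flow
m, s = p
M, S = F(*q)
assert np.isclose(f(np.e**t * m, np.e**t * s, np.e**-t * M, np.e**-t * S), f(m, s, M, S))
Mg, Sg = F(m, s)                                               # a point on the graph of F
assert np.allclose(F(np.e**t * m, np.e**t * s), (np.e**-t * Mg, np.e**-t * Sg))
print("all identities verified")
```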
In this paper, we describe Bayesian statistics in the language of current geometry in order to share the problems among general geometers and topologists. Then, generalizing the above result of the author, we foliate the space of multivariate normal distributions by using the Cholesky decomposition of the covariance matrices and define on each leaf the Fourier-like transformation, the stereograph of the relative entropy, and the contact Hamiltonian flow presenting a Bayesian inference. The ultimate aim of this research is to construct a Bayesian statistical model of space-time on which everything learns by changing its inner distribution along the leaf.

2. Results

2.1. Symplectic/Contact Geometry

Current geometry does not rely heavily on tensor calculus. Instead, it uses (exterior) differential forms, which can be integrated along cycles, pulled back under smooth maps, and differentiated without affine connections. To read about symplectic/contact geometry, the reader must be familiar with differential forms. This subsection is a minimal summary of the definitions; for the details, refer to [6].
A (positive) symplectic form on an oriented 2n-manifold is a closed 2-form ω satisfying $\omega^n > 0$, where $\omega^n = \omega \wedge \dots \wedge \omega$. If the orientation is reversed, the 2-form ω becomes a negative symplectic form. In either case, a symplectic form ω identifies a vector field X with an exact 1-form dH through the one-to-one correspondence defined by Hamilton's equation $\iota_X \omega = dH$. Here ι denotes the interior product. Then, X is called a Hamiltonian vector field of the primitive function H (+constant). The flow generated by X preserves the symplectic form ω; namely, the Lie derivative $L_X \omega\ (= \iota_X d\omega + d\iota_X \omega)$ vanishes. A Lagrangian submanifold is an n-manifold which is immersed in a symplectic 2n-manifold so that the pull-back of the symplectic form vanishes. The word “symplectic” is a calque of “complex”. Indeed, there exists an almost complex structure J which is compatible with a given symplectic structure, i.e., for which the composition $\omega(\cdot, J\cdot)$ is a Riemannian metric. In the case where J is integrable, ω is called a Kähler form of the complex manifold.
On the other hand, a (positive) contact form on an oriented (2n−1)-manifold N is a 1-form η satisfying $\eta \wedge (d\eta)^{n-1} > 0$. A (co-oriented) contact structure on N is the conformal class of a contact form. It can be presented as the oriented hyperplane field $\ker \eta$. The product manifold $\mathbb{R}(\ni t) \times N$ carries the exact symplectic form $d(e^t \eta)$. Take a function h on N. Let X be the Hamiltonian vector field of the function $e^t h$ defined on the product manifold $\mathbb{R} \times N$. Then, the push-forward Y of X under the projection of $\mathbb{R} \times N$ to the second factor is well-defined. The vector field Y is called the contact Hamiltonian vector field of the function h on N. The pair of equations $\eta(Y) = h$ and $\eta \wedge L_Y \eta = 0$ uniquely determines Y. A Legendrian submanifold is an (n−1)-manifold which is immersed in a contact (2n−1)-manifold so that the pull-back of the contact form vanishes.

2.2. Bayesian Statistics

Suppose that any point y of a smooth manifold M equipped with a volume form $d\mathrm{vol}$ presents a positive probability density or probability $\rho_y : W \to \mathbb{R}_{>0}$ defined on a (possibly discrete) measurable space W, where $\rho_y$ depends smoothly on y, and $\rho_y \neq \rho_{y'}$ for $y \neq y' \in M$. Let $\mathcal{V}$ be the space of positive volume forms with finite total volume on M. Take an arbitrary element $V \in \mathcal{V}$ and consider it as the initial state of the mind M of an agent. Here W stands for (a part of) the world for the agent. Finding a datum $w \in W$ in his world, the agent can take the value $\rho_y(w)$ as a smooth positive function $\bar{\rho}_w : y \mapsto \rho_y(w)$ on M, which is called the likelihood of the datum w. Then, he can multiply the volume form V by $\bar{\rho}_w > 0$ to obtain a new element of $\mathcal{V}$. This defines the updating map
$$\varphi : W \times \mathcal{V} \ni (w, V) \mapsto \bar{\rho}_w V \in \mathcal{V}.$$
The “psychological” probability density $p_V$ on the mind M defined by $p_V\, d\mathrm{vol} = V / \int_M V$ is accordingly updated into the density $p_{\varphi(w,V)} \propto p_V \bar{\rho}_w$, which is called the conditional probability density given the datum w. Practically, Bayes' rule on conditional probabilities is expressed as
$$P(y \in \Delta y \mid w \in \Delta w) = \frac{P(w \in \Delta w \mid y \in \Delta y)}{P(w \in \Delta w)} \cdot P(y \in \Delta y).$$
Here P denotes the probability of an event, and $\Delta y$ (respectively, $\Delta w$) a small portion of M (respectively, W). Since the state of the world does not depend on the mind of the agent, the probability $P(w \in \Delta w)$ is independent of y, and therefore approximates a constant on M. On the other hand, the conditional probability $P(w \in \Delta w \mid y \in \Delta y)$ of the datum w approximates a function of y which is clearly proportional to the above likelihood. This implies that the factor $\frac{P(w \in \Delta w \mid y \in \Delta y)}{P(w \in \Delta w)}$ on the right-hand side of Bayes' rule is approximately proportional to the likelihood. Thus, Bayes' rule implies the updating of $p_V$ via the updating map φ above. The Bayesian product · mentioned in the introduction appears in this context. Namely, the variable of the first factor is the mean y of a normal distribution on W. The density of the normal distribution at the datum w can be considered as a function of y, which is proportional to a normal distribution on M. Thus the Bayesian product of normal distributions on M presents the updating of the density of the predictive mean in the mind of the agent.
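As a concrete illustration (a sketch under the assumption that the likelihood is normal with a known standard deviation; none of the names below come from the paper), the updating map φ can be run on a discretized mind M, and for a normal prior it reproduces the Bayesian product of the introduction.

```python
import numpy as np

y = np.linspace(-8.0, 8.0, 2001)            # grid model of the mind M
dy = y[1] - y[0]
sigma_w = 1.3                               # assumed observation noise of the world W

def normal_pdf(x, mean, sd):
    return np.exp(-(x - mean)**2 / (2 * sd**2)) / (np.sqrt(2 * np.pi) * sd)

def update(V, w):
    """The updating map phi: multiply the volume form V by the likelihood of w."""
    return normal_pdf(w, y, sigma_w) * V    # the likelihood, viewed as a function of y

# Start from a normal prior (m0, s0) for the predictive mean and observe two data.
m0, s0 = 0.0, 2.0
V = normal_pdf(y, m0, s0)
for w in (1.0, 2.5):
    V = update(V, w)
posterior = V / (np.sum(V) * dy)            # the "psychological" density p_V

# Conjugate update in closed form: iterate the Bayesian product with (w, sigma_w).
def bayes(p, q):
    (m, s), (m2, s2) = p, q
    return ((m * s2**2 + m2 * s**2) / (s**2 + s2**2), s * s2 / np.hypot(s, s2))

m, s = (m0, s0)
for w in (1.0, 2.5):
    m, s = bayes((m, s), (w, sigma_w))
assert np.allclose(posterior, normal_pdf(y, m, s), atol=1e-6)
print("posterior mean/sd:", m, s)
```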
For many people, the aim of Bayesian updating is practical: indeed, the aim of the above updating is the estimation of the mean. Nevertheless, it is quite natural for a geometer to multiply a volume form by a positive function once he is given them. In this regard, we can say that Bayesian updating provides a geometric setting for a dynamical system. In particular, a Bayesian updating within a conjugate prior is, at first, simply the iteration of a Bayesian product.

2.3. The Information Geometry

Suppose that a manifold U is embedded in the subset $\left\{V \in \mathcal{V} \mid \int_M V = 1\right\} = \left\{p_V\, d\mathrm{vol} \mid V \in \mathcal{V}\right\}$. Hereafter, we identify the element $p_V\, d\mathrm{vol} \in U$ with the “psychological” probability density $p_V$. We call U a conjugate prior for the updating map φ if the cone $\tilde{U} = \{e^t V \mid t \in \mathbb{R},\ V \in U\}$ satisfies $\varphi(W \times \tilde{U}) \subset \tilde{U}$. (Whether there exists a preferred conjugate prior, and how to determine the initial state of the mind, is another interesting problem. For example, one may fix the asymptotic behavior of the state of mind according to the aim of the Bayesian inference and search for the optimal choice of the initial state. See [7] for an approach to this problem via the information geometry.)
Now we define the “distance” $\tilde{D} : \tilde{U} \times \tilde{U} \to \mathbb{R}$ on $\tilde{U}$, which satisfies none of the axioms of a distance, by
$$\tilde{D}(V_1, V_2) = \int_M V_1 \ln\frac{V_1}{V_2} \quad (\text{the relative entropy}).$$
From the convexity of $-\ln$, we see that the restriction $D = \tilde{D}|_{U \times U}\ \left(\geq -\ln \int_M V_2 = 0\right)$ is a separating premetric on U, which is called the Kullback–Leibler divergence in information theory. This implies that the germ of D along the diagonal set Δ of $U \times U$ represents the zero section of the cotangent sheaf of U, that is, for any point $x = (x_1, \dots, x_n)$ of any chart of U, the Taylor expansion of $D\left(x + \frac{1}{2}dx,\ x - \frac{1}{2}dx\right)$ has no linear terms. Thus the differential $dD : TU \times TU \to \mathbb{R}$ also vanishes on the diagonal set Δ of $TU \times TU$. We regard the 1-form on TU represented by the germ of dD along Δ as a quadratic tensor, and denote it by g (note that $g_x : T_xU \times T_xU \to \mathbb{R}$ is bilinear). It appears as two times the quadratic terms $\frac{1}{2}\sum_{i,j} g_{ij}\, dx_i\, dx_j$ ($g_{ij} = g_{ji}$) in the above Taylor expansion. Of course, it also appears in the Taylor expansion of $D\left(x - \frac{1}{2}dx,\ x + \frac{1}{2}dx\right)$. Thus it can also be considered as the quadratic terms of the symmetric sum $D\left(x + \frac{1}{2}dx,\ x - \frac{1}{2}dx\right) + D\left(x - \frac{1}{2}dx,\ x + \frac{1}{2}dx\right)$. The symmetric matrix $[g_{ij}]$ is called the Fisher information in information theory. From the non-negativity of D, we may assume generically that g is a Riemannian metric. We would like to notice that this construction of a Riemannian metric by means of the symmetric sum also works over $\tilde{U}$. Indeed, we have
$$\tilde{D}\left(e^{t + \frac{1}{2}dt} V,\ e^{t - \frac{1}{2}dt} V\right) + \tilde{D}\left(e^{t - \frac{1}{2}dt} V,\ e^{t + \frac{1}{2}dt} V\right) = e^t\, dt^2 + O(dt^3).$$
Let $\nabla^0$ be the Levi–Civita connection of g. We write the lowest degree terms in the Taylor expansion of $D\left(x + \frac{1}{2}dx,\ x - \frac{1}{2}dx\right) - D\left(x - \frac{1}{2}dx,\ x + \frac{1}{2}dx\right)$ as $\frac{1}{3}\sum_{i,j,k} T_{ijk}\, dx_i\, dx_j\, dx_k$ ($T_{ijk} = T_{jik} = T_{ikj}$). This presents the symmetric cubic tensor T, which can be constructed from the anti-symmetric difference $D(x_1, x_2) - D(x_2, x_1)$ similarly as above. One can use it to deform the Levi–Civita connection $\nabla^0$ into the torsion-free α-connections $\nabla^\alpha = \nabla^0 - \alpha\, g^* T$ ($\alpha \in \mathbb{R}$), where $g^*$ denotes the dual metric. In particular, we call $\nabla^1$ and $\nabla^{-1}$ respectively the e-connection and the m-connection. The symmetric tensor T is sometimes called the skewness since it presents the asymmetry of D. The information geometry concerns the family of α-connections as well as the Fisher information metric on U. We usually do not extend it over $\tilde{U}$, since the symmetric sum of $\tilde{D}$ lacks the asymmetry needed to define T.
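The constructions above can be tested numerically. The sketch below (our own illustration for the univariate normal family with coordinates (μ, σ); not from the paper) recovers the Fisher information from the symmetric sum of D by finite differences, and checks that the anti-symmetric difference is of cubic order.

```python
import numpy as np

def D(x1, x2):
    (m, s), (m2, s2) = x1, x2
    return 0.5 * (((m - m2)**2 + s**2) / s2**2 - 1.0 + np.log(s2**2 / s**2))

x = np.array([0.7, 1.4])                      # base point (mu, sigma)
eps = 1e-4

def sym(dx):                                  # D(x+dx/2, x-dx/2) + D(x-dx/2, x+dx/2)
    return D(x + dx / 2, x - dx / 2) + D(x - dx / 2, x + dx / 2)

# Recover g_ij by polarization of the quadratic term of the symmetric sum.
g = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        dxi, dxj = np.eye(2)[i] * eps, np.eye(2)[j] * eps
        g[i, j] = (sym(dxi + dxj) - sym(dxi - dxj)) / (4 * eps**2)

expected = np.diag([1 / x[1]**2, 2 / x[1]**2])   # Fisher information of N(mu, sigma^2)
assert np.allclose(g, expected, rtol=1e-3, atol=1e-6)

# The anti-symmetric difference has no linear or quadratic terms: it is O(|dx|^3).
for scale in (1e-2, 1e-3):
    dx = scale * np.array([1.0, -0.5])
    antisym = D(x + dx / 2, x - dx / 2) - D(x - dx / 2, x + dx / 2)
    print(f"|dx| ~ {scale:.0e}: anti-symmetric difference = {antisym:.3e}")
```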

2.4. The Geometry of Normal Distributions

We consider the space U of multivariate normal distributions on $M = \mathbb{R}^n$. A vector $\mu = (\mu_i)_{1 \le i \le n} \in \mathbb{R}^n$ and an upper triangular matrix $C = [c_{ij}]_{1 \le i, j \le n} \in \mathrm{Mat}(n, \mathbb{R})$ with positive diagonal entries parameterize U by declaring that μ presents the mean and $C^T C$ the Cholesky decomposition of the covariance matrix. Further we put
$$\sigma_i = c_{ii}, \quad r_{ij} = \frac{c_{ij}}{c_{ii}} \quad (i, j \in \{1, \dots, n\}), \qquad C = \mathrm{diag}(\sigma)\, [r_{ij}].$$
The matrix $[r_{ij}]$ is unitriangular, i.e., a triangular matrix whose diagonal entries are all 1. Then, each point $x = (\mu, \sigma, r) \in U = \mathbb{R}^n \times (\mathbb{R}_{>0})^n \times \mathbb{R}^{n(n-1)/2}$ presents the volume form
$$V_x = \frac{1}{(\sqrt{2\pi})^n\, \sigma_1 \cdots \sigma_n} \exp\left(-\frac{1}{2}\left\| C(\sigma, r)^{-T} (y - \mu) \right\|^2\right) d\mathrm{vol}.$$
Let $\|\cdot\|^2$ denote the sum of the squares of the entries of a matrix as well as of a vector. The relative entropy defines the premetric $D(x, x')$, $x' = (\mu', \sigma', r')$, by
$$D(x, x') = \frac{\left\| C(\sigma', r')^{-T}(\mu - \mu') \right\|^2}{2} + \frac{\left\| C(\sigma, r)\, C(\sigma', r')^{-1} \right\|^2 - n}{2} - \sum_{i=1}^{n} \ln\frac{\sigma_i}{\sigma_i'}.$$
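The closed form above can be compared with the usual trace formula for the relative entropy of multivariate normals. The following sketch (illustration only; the ordering of the entries of r inside the script is an assumption of the script, not of the paper) performs this check for n = 3.

```python
import numpy as np

def build_C(sigma, r):
    """Upper-triangular Cholesky factor C = diag(sigma) [r_ij] (r_ii = 1)."""
    n = len(sigma)
    R = np.eye(n)
    R[np.triu_indices(n, k=1)] = r
    return np.diag(sigma) @ R

def D_cholesky(mu1, sigma1, r1, mu2, sigma2, r2):
    C1, C2 = build_C(sigma1, r1), build_C(sigma2, r2)
    n = len(mu1)
    term_mu = 0.5 * np.sum(np.linalg.solve(C2.T, mu1 - mu2)**2)
    term_C = 0.5 * (np.sum((C1 @ np.linalg.inv(C2))**2) - n)
    term_log = -np.sum(np.log(sigma1 / sigma2))
    return term_mu + term_C + term_log

def D_standard(mu1, S1, mu2, S2):
    n = len(mu1)
    S2inv = np.linalg.inv(S2)
    return 0.5 * (np.trace(S2inv @ S1) + (mu2 - mu1) @ S2inv @ (mu2 - mu1)
                  - n + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

rng = np.random.default_rng(0)
n = 3
mu1, mu2 = rng.normal(size=n), rng.normal(size=n)
sigma1, sigma2 = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
r1, r2 = rng.normal(size=n * (n - 1) // 2), rng.normal(size=n * (n - 1) // 2)
C1, C2 = build_C(sigma1, r1), build_C(sigma2, r2)
assert np.isclose(D_cholesky(mu1, sigma1, r1, mu2, sigma2, r2),
                  D_standard(mu1, C1.T @ C1, mu2, C2.T @ C2))
print("Cholesky form of D agrees with the standard formula")
```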
Let $1_n$ denote the unit matrix, and ΔC the difference $C(\sigma + \Delta\sigma, r + \Delta r) - C(\sigma, r)$. We have
$$D(x + \Delta x, x) = \frac{\left\| C^{-T} \Delta\mu \right\|^2}{2} + \frac{\left\| \Delta C\, C^{-1} \right\|^2}{2} + \mathrm{tr}\left(\Delta C\, C^{-1}\right) - \ln\det\left(1_n + \Delta C\, C^{-1}\right).$$
Let $r^{ij}$ be the entries of the inverse matrix of $[r_{ij}]$. We have
$$(\text{the } ij\text{-entry of } \Delta C\, C^{-1}) = \begin{cases} \dfrac{\Delta\sigma_i}{\sigma_i} & (i = j), \\[1ex] \dfrac{\sigma_i + \Delta\sigma_i}{\sigma_j} \displaystyle\sum_{k=i+1}^{j} r^{kj}\, \Delta r_{ik} & (i < j), \\[1ex] 0 & (i > j). \end{cases}$$
From the above two expressions, we see that the Fisher information metric g is expressed as
$$g = \sum_{k=1}^{n} \left(\frac{1}{\sigma_k} \sum_{i=1}^{k} r^{ik}\, d\mu_i\right)^2 + 2 \sum_{i=1}^{n} \left(\frac{d\sigma_i}{\sigma_i}\right)^2 + \sum_{l=1}^{n-1} \sum_{k=l+1}^{n} \left(\frac{\sigma_l}{\sigma_k} \sum_{i=l+1}^{k} r^{ik}\, dr_{li}\right)^2.$$
Put
$$g_{\mu\mu} = \left[g_{\mu_i, \mu_j}\right] = \left[\sum_{k \ge i, j} \frac{r^{ik} r^{jk}}{\sigma_k^2}\right] = C^{-1} C^{-T}, \qquad g_{\sigma\sigma} = \mathrm{diag}\left(\frac{2}{\sigma_i^2}\right), \qquad g_{rr,l} = \left[g_{r_{li}, r_{lj}}\right]_{i, j > l} = \sigma_l^2 \left[g_{\mu_i, \mu_j}\right]_{i, j > l} \quad (l = 1, \dots, n-1).$$
Then, the representing matrix of g is the following block diagonal matrix:
$$\mathrm{diag}\left(g_{\mu\mu},\ g_{\sigma\sigma},\ g_{rr,1},\ \dots,\ g_{rr,n-1}\right).$$
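The block structure can be verified by finite differences: the Hessian of $D(x + \Delta x, x)$ at $\Delta x = 0$ must coincide with g. The sketch below (our own check for n = 2; not the author's computation) does this.

```python
import numpy as np

def unpack(x):
    mu, sigma, r = x[:2], x[2:4], x[4]
    R = np.array([[1.0, r], [0.0, 1.0]])
    return mu, sigma, np.diag(sigma) @ R

def D(x1, x2):
    mu1, sigma1, C1 = unpack(x1)
    mu2, sigma2, C2 = unpack(x2)
    return (0.5 * np.sum(np.linalg.solve(C2.T, mu1 - mu2)**2)
            + 0.5 * (np.sum((C1 @ np.linalg.inv(C2))**2) - 2)
            - np.sum(np.log(sigma1 / sigma2)))

x0 = np.array([0.3, -0.8, 1.2, 0.6, 0.5])     # base point (mu1, mu2, sigma1, sigma2, r12)
eps = 1e-4
hess = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        ei, ej = np.eye(5)[i] * eps, np.eye(5)[j] * eps
        hess[i, j] = (D(x0 + ei + ej, x0) - D(x0 + ei - ej, x0)
                      - D(x0 - ei + ej, x0) + D(x0 - ei - ej, x0)) / (4 * eps**2)

_, sigma, C = unpack(x0)
Cinv = np.linalg.inv(C)
g_mumu = Cinv @ Cinv.T
g = np.zeros((5, 5))
g[:2, :2] = g_mumu                            # block g_mumu = C^{-1} C^{-T}
g[2:4, 2:4] = np.diag(2 / sigma**2)           # block g_sigmasigma
g[4, 4] = sigma[0]**2 * g_mumu[1, 1]          # block g_rr,1 = sigma_1^2 [g_mumu]_{22}
assert np.allclose(hess, g, atol=1e-4)
print("Fisher information matches the block-diagonal formula")
```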
Lowering the upper index of the α-connection with $\sum_L g_{KL}\, (\Gamma^\alpha)^L_{IJ} = \Gamma^\alpha_{\{I,J\},K}$, we have
$$\Gamma^0_{\{\mu_i, \mu_j\}, \sigma_k} = -\Gamma^0_{\{\mu_i, \sigma_k\}, \mu_j} = \frac{r^{ik} r^{jk}}{\sigma_k^3},$$
$$\Gamma^0_{\{\sigma_i, \sigma_i\}, \sigma_i} = -\frac{2}{\sigma_i^3},$$
$$\Gamma^0_{\{\mu_i, \mu_j\}, r_{ab}} = -\Gamma^0_{\{\mu_i, r_{ab}\}, \mu_j} = \sum_{k=b}^{n} \frac{r^{bk}\left(r^{ia} r^{jk} + r^{ik} r^{ja}\right)}{2\sigma_k^2},$$
$$\Gamma^0_{\{r_{li}, r_{lj}\}, \sigma_l} = -\Gamma^0_{\{r_{li}, \sigma_l\}, r_{lj}} = -\sum_{k \ge i, j} \frac{\sigma_l\, r^{ik} r^{jk}}{\sigma_k^2},$$
$$\Gamma^0_{\{r_{li}, r_{lj}\}, \sigma_k} = -\Gamma^0_{\{r_{li}, \sigma_k\}, r_{lj}} = \frac{\sigma_l^2\, r^{ik} r^{jk}}{\sigma_k^3} \quad (k \ge i, j),$$
$$\Gamma^0_{\{r_{li}, r_{lj}\}, r_{ab}} = -\Gamma^0_{\{r_{li}, r_{ab}\}, r_{lj}} = \sigma_l^2\, \Gamma^0_{\{\mu_i, \mu_j\}, r_{ab}} \quad (a > l),$$
$$\Gamma^0_{\{I, J\}, K} = 0 \quad (\text{for the other choices of } \{I, J\} \text{ and } K).$$
With respect to the same coordinates, the coefficients of the e-connection are
$$\Gamma^1_{\{\mu_i, \sigma_k\}, \mu_j} = 2\, \Gamma^0_{\{\mu_i, \sigma_k\}, \mu_j},$$
$$\Gamma^1_{\{\sigma_i, \sigma_i\}, \sigma_i} = 3\, \Gamma^0_{\{\sigma_i, \sigma_i\}, \sigma_i},$$
$$\Gamma^1_{\{\mu_i, r_{ab}\}, \mu_j} = 2\, \Gamma^0_{\{\mu_i, r_{ab}\}, \mu_j},$$
$$\Gamma^1_{\{r_{li}, r_{lj}\}, \sigma_l} = 2\, \Gamma^0_{\{r_{li}, r_{lj}\}, \sigma_l},$$
$$\Gamma^1_{\{r_{li}, \sigma_k\}, r_{lj}} = 2\, \Gamma^0_{\{r_{li}, \sigma_k\}, r_{lj}} \quad (k \ge i, j),$$
$$\Gamma^1_{\{r_{li}, r_{ab}\}, r_{lj}} = 2\, \Gamma^0_{\{r_{li}, r_{ab}\}, r_{lj}} \quad (a > l),$$
$$\Gamma^1_{\{I, J\}, K} = 0 \quad (\text{for the other choices of } \{I, J\} \text{ and } K).$$
Those of the m-connection are
$$\Gamma^{(-1)}_{\{\mu_i, \mu_j\}, \sigma_k} = 2\, \Gamma^0_{\{\mu_i, \mu_j\}, \sigma_k},$$
$$\Gamma^{(-1)}_{\{\sigma_i, \sigma_i\}, \sigma_i} = -\Gamma^0_{\{\sigma_i, \sigma_i\}, \sigma_i},$$
$$\Gamma^{(-1)}_{\{\mu_i, \mu_j\}, r_{ab}} = 2\, \Gamma^0_{\{\mu_i, \mu_j\}, r_{ab}},$$
$$\Gamma^{(-1)}_{\{r_{li}, \sigma_l\}, r_{lj}} = 2\, \Gamma^0_{\{r_{li}, \sigma_l\}, r_{lj}},$$
$$\Gamma^{(-1)}_{\{r_{li}, r_{lj}\}, \sigma_k} = 2\, \Gamma^0_{\{r_{li}, r_{lj}\}, \sigma_k} \quad (k \ge i, j),$$
$$\Gamma^{(-1)}_{\{r_{li}, r_{lj}\}, r_{ab}} = 2\, \Gamma^0_{\{r_{li}, r_{lj}\}, r_{ab}} \quad (a > l),$$
$$\Gamma^{(-1)}_{\{I, J\}, K} = 0 \quad (\text{for the other choices of } \{I, J\} \text{ and } K).$$
There is a particular system of coordinates for describing the e-connection. Namely, all of its coefficients vanish with respect to the natural parameter $(C^{-1} C^{-T} \mu,\ \xi)$, where $\xi = (\xi_{ab})_{1 \le a \le b \le n}$ is the upper half of $C^{-1} C^{-T}$. On the other hand, all of the coefficients of the m-connection vanish with respect to the expectation parameter $(\mu,\ \nu)$, where $\nu = (\nu_{ab})_{1 \le a \le b \le n}$ is the upper half of $C^T C + \mu \mu^T$.

2.5. The Generalization

This subsection is devoted to the generalization of the result of the author mentioned in the introduction to the above multivariate setting. We fix the third component r of the coordinate system $(\mu, \sigma, r)$, and change the presentation of the others. Namely, we take the natural projection $\pi : U = H^n \times \mathbb{R}^{n(n-1)/2} \to \mathbb{R}^{n(n-1)/2}$ and replace the coordinates $(\mu, \sigma)$ on the fiber $L(r) = \pi^{-1}(r)$ by the coordinates $(m, s)$ appearing in the next proposition. The generalization is then straightforward.
Proposition 1.
The fiber $L(r) = \pi^{-1}(r)$ is an affine subspace of U with respect to the e-connection $\nabla^1$. It can be parameterized by the affine parameters $\frac{m_i}{s_i^2}$ and $\frac{1}{s_i^2}$, where $m = [r^{ij}]^T \mu$ and $s = \sqrt{2}\, \sigma$.
Proof. 
The natural parameters $C^{-1} C^{-T} \mu = \left(2 \sum_{k=i}^{n} \frac{r^{ik} m_k}{s_k^2}\right)_{1 \le i \le n}$ and $\xi = \left(2 \sum_{k=b}^{n} \frac{r^{ak} r^{bk}}{s_k^2}\right)_{1 \le a \le b \le n}$ are affine provided that the $r^{ij}$ are constant and $\frac{m_i}{s_i^2}$ and $\frac{1}{s_i^2}$ are affine. □
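The following sketch (an illustration for n = 2 with one fixed r; the specific numbers are arbitrary and the conventions $m = [r^{ij}]^T \mu$, $s = \sqrt{2}\,\sigma$ are those of the proposition) confirms numerically that moving affinely in the parameters $(m_i/s_i^2,\ 1/s_i^2)$ with r fixed moves the natural parameters affinely as well.

```python
import numpy as np

r12 = 0.4
R = np.array([[1.0, r12], [0.0, 1.0]])          # fixed unitriangular part

def natural_params(m, s):
    """Natural parameters from the leaf coordinates (m, s), with m = [r^{ij}]^T mu, s = sqrt(2) sigma."""
    mu = R.T @ m                                # invert m = R^{-T} mu
    sigma = s / np.sqrt(2.0)
    Cinv = np.linalg.inv(np.diag(sigma) @ R)
    Sigma_inv = Cinv @ Cinv.T
    return Sigma_inv @ mu, Sigma_inv[np.triu_indices(2)]

# A curve that is affine in the parameters (m_i/s_i^2, 1/s_i^2) with r fixed.
a, b = np.array([0.3, -0.2]), np.array([1.0, 0.5])
c, d = np.array([0.8, 0.4]), np.array([1.5, 2.0])

def point(t):
    inv_s2 = c * t + d                          # 1/s_i^2 affine in t
    m_over_s2 = a * t + b                       # m_i/s_i^2 affine in t
    s = 1.0 / np.sqrt(inv_s2)
    return m_over_s2 * s**2, s

etas = [np.concatenate(natural_params(*point(t))) for t in (0.0, 1.0, 2.0)]
# Affine in t means that equally spaced t give equally spaced natural parameters.
assert np.allclose(etas[2] - etas[1], etas[1] - etas[0])
print("the natural parameters move affinely while r is fixed")
```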
Note that $dr\left(\nabla^{\alpha}_{\partial_{\mu_i}} \partial_{\mu_j}\right)$ is identically zero on some/any fiber L(r) if and only if α = 1. The fiber satisfies the following two properties.
Proposition 2.
L(r) is closed under the convolution ∗ and the Bayesian product ·, and thus inherits them.
Proof. 
The covariance of the density at $(\mu, \sigma, r) * (\mu', \sigma', r)$ is
$$[r_{ij}]^T\, \mathrm{diag}\left((\sigma_i^2)\right) [r_{ij}] + [r_{ij}]^T\, \mathrm{diag}\left((\sigma_i'^2)\right) [r_{ij}] = [r_{ij}]^T\, \mathrm{diag}\left((\sigma_i^2 + \sigma_i'^2)\right) [r_{ij}].$$
This coincides with that of the density at $\left(\mu + \mu',\ \left(\sqrt{\sigma_i^2 + \sigma_i'^2}\right),\ r\right)$. Thus L(r) is closed under ∗, and inherits it as $(m, s) * (m', s') = \left(m + m',\ \left(\sqrt{s_i^2 + s_i'^2}\right)\right)$.
Put $u = (\sigma_i^{-2})$ and $y = [r^{ij}]^T x$. The density at $(\mu, \sigma, r)$ is proportional to $\exp\left(-\frac{y(x)^T \mathrm{diag}(u)\, y(x)}{2} + m^T \mathrm{diag}(u)\, y(x)\right)$. From this we see that $(\mu, \sigma, r) \cdot (\mu', \sigma', r) = (\mu'', \sigma'', r'')$ implies $r'' = r$, $u'' = u + u'$ and $\mathrm{diag}(u'')\, m'' = \mathrm{diag}(u)\, m + \mathrm{diag}(u')\, m'$. Thus L(r) is closed under ·, and inherits it as $(m, s) \cdot (m', s') = \left(\left(\frac{m_i s_i'^2 + m_i' s_i^2}{s_i^2 + s_i'^2}\right),\ \left(\frac{s_i s_i'}{\sqrt{s_i^2 + s_i'^2}}\right)\right)$. □
Proposition 3.
The fiber L(r) with the metric induced from g admits a Kähler complex structure.
Proof. 
The restriction of g is $2 \sum_{i=1}^{n} \frac{dm_i^2 + ds_i^2}{s_i^2}$. We define the complex structure $J : TL(r) \to TL(r)$ by $J\left(\frac{\partial}{\partial m_i}\right) = \frac{\partial}{\partial s_i}$ (and $J\left(\frac{\partial}{\partial s_i}\right) = -\frac{\partial}{\partial m_i}$). Then, the 2-form $\omega = 2 \sum_{i=1}^{n} \frac{dm_i \wedge ds_i}{s_i^2}$ satisfies $\omega(\cdot, J\cdot) = g(\cdot, \cdot)$. □
We write the restriction $D|_L$ of the premetric D using the coordinates (m, s) as follows, where we omit r since the expression does not depend on r:
$$D|_L\left((m, s), (m', s')\right) = \frac{1}{2} \sum_{i=1}^{n} \left[2\left(\frac{m_i - m_i'}{s_i'}\right)^2 + \frac{s_i^2}{s_i'^2} - 1 - \ln\frac{s_i^2}{s_i'^2}\right].$$
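This expression can be checked against the full premetric D of Section 2.4 for two points on the same leaf. The sketch below (illustration only, n = 3, with $m = [r^{ij}]^T \mu$ and $s = \sqrt{2}\,\sigma$ as in Proposition 1) performs the comparison.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
R = np.eye(n)
R[np.triu_indices(n, k=1)] = rng.normal(size=n * (n - 1) // 2)   # common unitriangular part

def D_full(mu1, sigma1, mu2, sigma2):
    C1, C2 = np.diag(sigma1) @ R, np.diag(sigma2) @ R
    return (0.5 * np.sum(np.linalg.solve(C2.T, mu1 - mu2)**2)
            + 0.5 * (np.sum((C1 @ np.linalg.inv(C2))**2) - n)
            - np.sum(np.log(sigma1 / sigma2)))

def leaf_coords(mu, sigma):
    return np.linalg.solve(R.T, mu), np.sqrt(2.0) * sigma        # m = R^{-T} mu, s = sqrt(2) sigma

def D_leaf(m1, s1, m2, s2):
    return 0.5 * np.sum(2 * ((m1 - m2) / s2)**2 + s1**2 / s2**2 - 1 - np.log(s1**2 / s2**2))

mu1, mu2 = rng.normal(size=n), rng.normal(size=n)
sigma1, sigma2 = rng.uniform(0.5, 2.0, n), rng.uniform(0.5, 2.0, n)
assert np.isclose(D_full(mu1, sigma1, mu2, sigma2),
                  D_leaf(*leaf_coords(mu1, sigma1), *leaf_coords(mu2, sigma2)))
print("restriction of D to the leaf agrees with the (m, s) expression")
```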
We take the product $U_1 \times U_2$ of two copies of U. Then, the products $L_1(r) \times L_2(R)$ of the fibers foliate $U_1 \times U_2$. We call this the primary foliation of $U_1 \times U_2$. For each $(r, R) \in \mathbb{R}^{n(n-1)}$, we have the coordinate system $(m, s, M, S)$ on the leaf $L_1(r) \times L_2(R)$. From the Kähler forms $\omega_1 = 2 \sum_{i=1}^{n} \frac{dm_i \wedge ds_i}{s_i^2}$ and $\omega_2 = 2 \sum_{i=1}^{n} \frac{dM_i \wedge dS_i}{S_i^2}$ on $L_1(r)$ and $L_2(R)$ respectively, we define the symplectic forms $\omega_1 \pm \omega_2$ on $L_1(r) \times L_2(R)$, which induce mutually opposite orientations in the case where n is odd. Hereafter, we consider the pair of regular Poisson structures defined by these symplectic structures on the primary foliation, and fix the primitive 1-forms $\lambda_\pm = 2 \sum_{i=1}^{n} \left(\frac{dm_i}{s_i} \pm \frac{dM_i}{S_i}\right)$. The corresponding pair of Poisson bi-vectors defined on $U_1 \times U_2$ is $\Pi_\pm = \frac{1}{2} \sum_{i=1}^{n} \left(s_i^2\, \frac{\partial}{\partial m_i} \wedge \frac{\partial}{\partial s_i} \pm S_i^2\, \frac{\partial}{\partial M_i} \wedge \frac{\partial}{\partial S_i}\right)$. We take the 2n-dimensional submanifolds $F_{\varepsilon, \delta} = \left\{\frac{m_i}{s_i} + \frac{M_i - \varepsilon_i}{S_i} = 0,\ s_i S_i = \delta_i\ (i = 1, \dots, n)\right\}$ of the leaf $L_1(r) \times L_2(R)$ for $\varepsilon \in \mathbb{R}^n$ and $\delta \in (\mathbb{R}_{>0})^n$. The secondary foliation of $U_1 \times U_2$ foliates any leaf $L_1(r) \times L_2(R)$ by the 3n-dimensional submanifolds $F_\varepsilon = \bigcup_{\delta \in (\mathbb{R}_{>0})^n} F_{\varepsilon, \delta}$ for $\varepsilon \in \mathbb{R}^n$. The tertiary foliation of $U_1 \times U_2$ foliates all leaves $F_\varepsilon$ of the secondary foliation by the 2n-dimensional submanifolds $F_{\varepsilon, \delta}$ for $\delta \in (\mathbb{R}_{>0})^n$.
Proposition 4.
With respect to the Kähler form $d\lambda_-$, the tertiary leaves $F_{\varepsilon, \delta}$ are Lagrangian correspondences.
Proof. 
The tangent space $TF_{\varepsilon, \delta}$ is generated by the vectors $s_i \frac{\partial}{\partial m_i} - S_i \frac{\partial}{\partial M_i}$ and $2 m_i \frac{\partial}{\partial m_i} + s_i \frac{\partial}{\partial s_i} - S_i \frac{\partial}{\partial S_i}$. We have $d\lambda_-\left(s_i \frac{\partial}{\partial m_i} - S_i \frac{\partial}{\partial M_i},\ \frac{\partial}{\partial m_i}\right) = 0$ and $d\lambda_-\left(s_i \frac{\partial}{\partial m_i} - S_i \frac{\partial}{\partial M_i},\ s_i \frac{\partial}{\partial s_i} - S_i \frac{\partial}{\partial S_i}\right) = 2\left(\frac{s_i^2}{s_i^2} - \frac{(-S_i)^2}{S_i^2}\right) = 0$. Thus the $F_{\varepsilon, \delta}$ are Lagrangian submanifolds of $\left(L_1(r) \times L_2(R),\ d\lambda_-\right)$. □
The restrictions $\lambda_\pm|_N$ to the hypersurface $N = \left\{(m, s, M, S) \in L_1(r) \times L_2(R)\ \middle|\ \prod_{i=1}^{n} (s_i S_i) = 1\right\}$ are contact forms. Let $\eta_\pm$ denote them.
Proposition 5.
For any ε and δ with $\prod_{i=1}^{n} \delta_i = 1$, the submanifold $F_{\varepsilon, \delta} \subset N$ is a disjoint union of the n-dimensional submanifolds $\{s = \mathrm{const}\} \cap F_{\varepsilon, \delta}$, which are integral submanifolds of the hyperplane field $\ker \eta_+$ on N.
Proof. 
We have $\lambda_+\left(s_i \frac{\partial}{\partial m_i} - S_i \frac{\partial}{\partial M_i}\right) = 2(1 - 1) = 0$. □
For each point $(\varepsilon, \delta) \in H^n$, we have the diffeomorphism $\hat{F}_{\varepsilon, \delta} : H^n \to H^n$ sending $(m, s) \in H^n$ to the point $(M, S) \in H^n$ with $(m, s, M, S) \in F_{\varepsilon, \delta}$. We put $h_i = \ln\frac{s_i S_i}{\delta_i}$ and
$$f_{\varepsilon, \delta}(m, s, M, S) = \frac{1}{2} \sum_{i=1}^{n} \left[2\left(\frac{M_i - \varepsilon_i}{S_i} + e^{h_i}\, \frac{m_i}{s_i}\right)^2 + e^{2 h_i} - 1 - 2 h_i\right].$$
Then, we have $D|_L\left((m, s), (m', s')\right) = f_{\varepsilon, \delta}\left((m, s),\ \hat{F}_{\varepsilon, \delta}(m', s')\right)$. For any $\zeta \in (\mathbb{R}_{>0})^{2n}$, we define the diffeomorphism $\varphi_{\varepsilon, \zeta} : (m, s, M, S) \mapsto \left((\zeta_{2i-1} m_i),\ (\zeta_{2i-1} s_i),\ (\varepsilon_i + \zeta_{2i} (M_i - \varepsilon_i)),\ (\zeta_{2i} S_i)\right)$, which preserves the 1-forms $\lambda_\pm$. It is easy to prove the following proposition.
Proposition 6.
In the case where $\zeta_{2i-1} \zeta_{2i} = 1$ for $i = 1, \dots, n$, the diffeomorphism $\varphi_{\varepsilon, \zeta}$ preserves $f_{\varepsilon, \delta}$.
For each $\varepsilon \in \mathbb{R}^n$, we take the set $f_\varepsilon = \left\{\left(f_{\varepsilon, \delta},\ F_{\varepsilon, \delta}\right) \mid \delta \in (\mathbb{R}_{>0})^n\right\}$, and consider it as a structure on the secondary leaf $F_\varepsilon$.
Proposition 7.
For any $\zeta \in (\mathbb{R}_{>0})^{2n}$, the diffeomorphism $\varphi_{\varepsilon, \zeta}$ preserves the set $f_\varepsilon$ for any $\varepsilon \in \mathbb{R}^n$. In the case where ζ satisfies $\prod_{i=1}^{n} (\zeta_{2i-1} \zeta_{2i}) = 1$, the diffeomorphism $\varphi_{\varepsilon, \zeta}$ also preserves the hypersurface N.
Proof. 
$F_{\varepsilon, \delta}$ maps to $F_{\varepsilon, (\zeta_{2i-1} \zeta_{2i} \delta_i)}$, and $f_{\varepsilon, (\zeta_{2i-1} \zeta_{2i} \delta_i)} \circ \varphi_{\varepsilon, \zeta} = f_{\varepsilon, \delta}$ holds. Further, $\prod_{i=1}^{n} \delta_i = 1$ implies $\prod_{i=1}^{n} (\zeta_{2i-1} \zeta_{2i} \delta_i) = 1$ provided that $\prod_{i=1}^{n} (\zeta_{2i-1} \zeta_{2i}) = 1$. □
For any $\delta \in (\mathbb{R}_{>0})^n$, the diffeomorphism $\hat{F}_{0, \delta}$ interchanges the operation $(m, s) * (m', s') = \left(m + m',\ \left(\sqrt{s_i^2 + s_i'^2}\right)\right)$ with the operation $(m, s) \cdot (m', s') = \left(\left(\frac{m_i s_i'^2 + m_i' s_i^2}{s_i^2 + s_i'^2}\right),\ \left(\frac{s_i s_i'}{\sqrt{s_i^2 + s_i'^2}}\right)\right)$.
Proposition 8.
If $(m, s, M, S),\ (m', s', M', S') \in F_{0, \delta}$, then
$$\left((m, s) \cdot (m', s'),\ (M, S) * (M', S')\right) \in F_{0, \delta} \quad \text{and} \quad \left((m, s) * (m', s'),\ (M, S) \cdot (M', S')\right) \in F_{0, \delta}.$$
Proof. 
Putting $(m'', s'') = (m, s) \cdot (m', s')$, we have $M_i + M_i' = -\frac{\delta_i m_i}{s_i^2} - \frac{\delta_i m_i'}{s_i'^2} = -\frac{\delta_i m_i''}{s_i''^2}$ and $S_i^2 + S_i'^2 = \frac{\delta_i^2}{s_i^2} + \frac{\delta_i^2}{s_i'^2} = \frac{\delta_i^2}{s_i''^2}$, hence the first assertion. The second assertion is proved similarly. □
A curve $(m(t), s(t)) \in H^n$ is a geodesic with respect to the e-connection $\nabla^1$ if and only if $\frac{m_i}{s_i^2}$ and $\frac{1}{s_i^2}$ are affine functions of t for $i = 1, \dots, n$.
Definition 1.
We say that an e-geodesic $(m(t), s(t)) \in H^n$ is intensive if it admits an affine parameterization such that the $\frac{1}{s_i^2}$ are linear functions of t for $i = 1, \dots, n$.
Note that any e-geodesic is intensive in the case where n = 1.
Proposition 9.
Given an intensive e-geodesic $(m(t), s(t))$, we can change the parameterization of its image $(M(t), S(t)) = \left(\left(\varepsilon_i - \frac{\delta_i m_i(t)}{s_i(t)^2}\right),\ \left(\frac{\delta_i}{s_i(t)}\right)\right)$ under the diffeomorphism $\hat{F}_{\varepsilon, \delta}$ to obtain an intensive e-geodesic.
Proof. 
Put $\frac{m_i}{s_i^2} = a_i t + b_i$ and $\frac{1}{s_i^2} = c_i t$. We have $\frac{M_i - \varepsilon_i}{S_i^2} = -\frac{a_i}{c_i \delta_i} - \frac{b_i}{c_i \delta_i} \cdot \frac{1}{t}$ and $\frac{1}{S_i^2} = \frac{1}{c_i \delta_i^2} \cdot \frac{1}{t}$. They are respectively an affine function and a linear function of $\frac{1}{t}$. □
We have the hypersurface $N = \left\{\prod_{i=1}^{n} s_i S_i = 1\right\} \subset H^n \times H^n$ which carries the contact forms $\eta_\pm = 2 \sum_{i=1}^{n} \left(\frac{dm_i}{s_i} \pm \frac{dM_i}{S_i}\right)\Big|_N$. This is defined on any leaf $L_1(r) \times L_2(R) \cong H^n \times H^n$ of the primary foliation of $U_1 \times U_2$. Now we state the main result.
Theorem 1.
The contact Hamiltonian vector field Y of the restriction of the function $\sum_{i=1}^{n} \frac{m_i}{s_i}$ to N for the contact form $\eta_+$ coincides with that for the other contact form $\eta_-$. It is tangent to the tertiary leaves $F_{\varepsilon, \delta}$ and defines flows on them. Here each flow line presents the correspondence between intensive e-geodesics in Proposition 9.
Corollary 1.
For any $\delta \in (\mathbb{R}_{>0})^n$, the flow on the leaf $F_{0, \delta}$ presents the iteration of the operation ∗ on the first factor of $U_1 \times U_2$ and that of the operation · on the second factor, as described in Proposition 8 (see Figure 1).
From Corollary 1, we see that the flow model of Bayesian inference studied in [4,5] also works in the multivariate case. We now prove the theorem.
Proof. 
Take the vector field $\tilde{Y} = \frac{1}{4} \sum_{i=1}^{n} \left(2 m_i \frac{\partial}{\partial m_i} + s_i \frac{\partial}{\partial s_i} - S_i \frac{\partial}{\partial S_i}\right)$ on $U_1 \times U_2$. It satisfies $\iota_{\tilde{Y}} \lambda_\pm = \sum_{i=1}^{n} \frac{m_i}{s_i}$, $L_{\tilde{Y}} \lambda_\pm = \frac{1}{4} \lambda_\pm$, and $\iota_{\tilde{Y}}\, d\ln(s_i S_i) = 0$, and thus its restriction to N is the contact Hamiltonian vector field Y. It also satisfies $\iota_{\tilde{Y}}\, d\left(\frac{m_i}{s_i} + \frac{M_i - \varepsilon_i}{S_i}\right) = \frac{1}{4}\left(\frac{m_i}{s_i} + \frac{M_i - \varepsilon_i}{S_i}\right)$ ($i = 1, \dots, n$), where the right-hand sides vanish along $F_{\varepsilon, \delta}$. Given a point $(P_0, Q_0) = (m_0, s_0, r_0, M_0, S_0, R_0) \in N \cap \left(L_1(r_0) \times L_2(R_0)\right) \subset U \times U$, we have the flow line $(m(t), s(t), r(t), M(t), S(t), R(t)) = \left(e^{2t} m_0,\ e^{t} s_0,\ r_0,\ M_0,\ e^{-t} S_0,\ R_0\right)$ of Y (up to a constant rescaling of the time parameter) with initial point $(P_0, Q_0)$. We can change the parameter of the curve $(m(t), s(t), r_0)$ on the first factor with $t' = e^{-2t}$ to obtain an intensive e-geodesic. □
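The content of the theorem and the corollary can be illustrated numerically. The sketch below (univariate case, one leaf, ε = 0; an illustration rather than the author's computation) checks that the flow line stays on $F_{0,\delta}$ and that, at times with $e^{2t} = k$, it passes through the k-fold convolution on the first factor and the k-fold Bayesian product on the second.

```python
import numpy as np

delta = 1.0
m0, s0 = 0.6, 1.1
M0, S0 = -delta * m0 / s0**2, delta / s0          # (m0, s0, M0, S0) lies on F_{0, delta}

def flow(t):                                      # the flow of Theorem 1 on one leaf
    return np.e**(2 * t) * m0, np.e**t * s0, M0, np.e**(-t) * S0

def on_leaf(m, s, M, S):
    return np.isclose(m / s + M / S, 0.0) and np.isclose(s * S, delta)

def conv(p, q):
    return (p[0] + q[0], np.hypot(p[1], q[1]))

def bayes(p, q):
    (m, s), (m2, s2) = p, q
    return ((m * s2**2 + m2 * s**2) / (s**2 + s2**2), s * s2 / np.hypot(s, s2))

for k in (2, 3, 4):
    t = 0.5 * np.log(k)                           # exp(2t) = k
    m, s, M, S = flow(t)
    assert on_leaf(m, s, M, S)
    # k-fold iteration of * on the first factor and of . on the second factor
    p, q = (m0, s0), (M0, S0)
    for _ in range(k - 1):
        p, q = conv(p, (m0, s0)), bayes(q, (M0, S0))
    assert np.allclose(p, (m, s)) and np.allclose(q, (M, S))
print("the flow on F_{0,delta} lifts the iterations of * and .")
```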

2.6. The Symmetry

The diffeomorphism $\varphi_{\varepsilon, \zeta}$ with $\prod_{i=1}^{n} (\zeta_{2i-1} \zeta_{2i}) = 1$ in Proposition 7 also appears in the standard construction of Hilbert modular cusps in [8]. We sketch the construction.
Since the function $H = \ln\prod_{i=1}^{n} (s_i S_i)$ is linear in the logarithmic space $\mathbb{R}^{2n} \ni \left((\ln s_i), (\ln S_i)\right)$, we can take $2n - 1$ points $\left((\ln \zeta^{(k)}_{2i-1}), (\ln \zeta^{(k)}_{2i})\right) \in \{H = 0\}$ ($k = 1, \dots, 2n-1$) so that the quotient of the primary leaf $L_1(r) \times L_2(R)$ under the $\mathbb{Z}^{2n-1}$-action generated by $\varphi_{\varepsilon, \zeta^{(1)}}, \dots, \varphi_{\varepsilon, \zeta^{(2n-1)}}$ is the total space of a vector bundle over $T^{2n-1} \times \mathbb{R}$. Here $T^{2n-1}$ is the quotient of $\{H = 0\}$ under the $\mathbb{Z}^{2n-1}$-lattice generated by the above points, and the fiber $\mathbb{R}^{2n}$ consists of the vectors $(m, M - \varepsilon)$.
In the univariate case (n = 1), on the logarithmic sS-plane, we can take any point $(\ln s, \ln S) = (\ln \zeta^{(1)}_1, \ln \zeta^{(1)}_2)$ of the line $H = \ln s + \ln S = 0$ other than the origin. That is, we put $\zeta^{(1)}_1 = e^t$ and $\zeta^{(1)}_2 = e^{-t}$ ($t \neq 0$). Then, the map $\varphi_{\varepsilon, \zeta^{(1)}}$ sends $(m, s, M, S)$ to $\left(e^t m,\ e^t s,\ \varepsilon + e^{-t}(M - \varepsilon),\ e^{-t} S\right)$. The $\mathbb{Z}$-action generated by it rolls up the level sets of H, so that the quotient of the logarithmic sS-plane becomes the cylinder $T^1 \times \mathbb{R}$, which is the base space. On the other hand, the mM-plane, which is the fiber, expands horizontally and contracts vertically. This is the inverse monodromy along $T^1$. In general, we obtain a similar $\mathbb{R}^{2n}$-bundle over $T^{2n-1} \times \mathbb{R}$ if we take the $2n - 1$ points of $\{H = 0\}$ in general position.
From Proposition 7, we see that the leaf $F_\varepsilon$ of the secondary foliation with the set $f_\varepsilon$, as well as the pair of the 1-forms $\lambda_\pm$ with the function H, descends to the $\mathbb{R}^{2n}$-bundle. If there exists further a $\mathbb{Z}^{2n}$-lattice on the fiber $\mathbb{R}^{2n}$ which is simultaneously preserved by the maps $\left((m_i), (M_i - \varepsilon_i)\right) \mapsto \left((\zeta^{(k)}_{2i-1} m_i),\ (\zeta^{(k)}_{2i} (M_i - \varepsilon_i))\right)$ ($k = 1, \dots, 2n-1$), we obtain a $T^{2n}$-bundle over $T^{2n-1} \times \mathbb{R}$. Such a choice of $\zeta^{(k)}$ would be number theoretical. Indeed, this is the case for Hilbert modular cusps. Moreover, we are considering the 1-forms $\lambda_\pm$, which descend to the $T^{2n}$-bundle. See [9] for the standard construction with special attention to these 1-forms.
We should notice that the vector field Y does not descend to the $T^{2n}$-bundle. However, every actual Bayesian inference along Y eventually stops. Thus, we may take a sufficiently large $T^{2n}$ and consider Y as a locally supported vector field in order to perform the inference in the quotient space.

3. Discussion

Finally, we would like to comment on the transverse geometry of the primary foliation. The author conjectures that it has some relation to M-theory. See, e.g., [10] for a relation between Poisson geometry and matrix-theoretical or noncommutative-geometrical physics.
The premetric $D\left((\mu + \Delta\mu, \sigma + \Delta\sigma, r + \Delta r),\ (\mu, \sigma, r)\right)$ on U can be decomposed as
$$D\left((\mu + \Delta\mu, \sigma + \Delta\sigma, r),\ (\mu, \sigma, r)\right) + \frac{1}{2} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left(\frac{\sigma_i + \Delta\sigma_i}{\sigma_j} \sum_{k=i+1}^{j} r^{kj}\, \Delta r_{ik}\right)^2.$$
The first term on the right-hand side presents the fiber premetric $D|_L\left((m + \Delta m, s + \Delta s),\ (m, s)\right)$, where $\Delta m = [r^{ij}]^T \Delta\mu$ and $\Delta s = \sqrt{2}\, \Delta\sigma$. If $\Delta\sigma = 0$ ($\Delta s = 0$), then we have the (non-information-geometrical) Pythagorean-type formula
$$D\left((\mu + \Delta\mu, \sigma, r + \Delta r),\ (\mu, \sigma, r)\right) = D\left((\mu + \Delta\mu, \sigma, r),\ (\mu, \sigma, r)\right) + D\left((\mu, \sigma, r + \Delta r),\ (\mu, \sigma, r)\right).$$
Note that each term in this expression does not depend on μ. The second term on the right-hand side can be expressed as
$$D\left((\mu, \sigma, r + \Delta r),\ (\mu, \sigma, r)\right) = \frac{1}{2} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left(\frac{\sigma_i}{\sigma_j} \sum_{k=i+1}^{j} r^{kj}\, \Delta r_{ik}\right)^2.$$
This presents the discretized version of the following restriction of the Fisher information g:
$$g|_{\{(\mu, \sigma) = \mathrm{const}\}} = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left(\frac{\sigma_i}{\sigma_j} \sum_{k=i+1}^{j} r^{kj}\, dr_{ik}\right)^2.$$
We have the orthonormal frame with respect to this metric, which consists of the vector fields
$$e_{ij} = \frac{\sigma_j}{\sigma_i} \sum_{k=j}^{n} r_{jk}\, \frac{\partial}{\partial r_{ik}} \quad (1 \le i < j \le n).$$
This frame satisfies the relations $[e_{ij}, e_{kl}] = \delta_{il}\, e_{kj} - \delta_{kj}\, e_{il}$ of the unitriangular algebra, where $\delta_{\cdot,\cdot}$ denotes Kronecker's delta. Using the dual coframe $e^{ij}$, the relations can be expressed as
$$de^{ij} = \sum_{k=i+1}^{j-1} e^{ik} \wedge e^{kj}.$$
The transverse section of the primary foliation of $U_1 \times U_2$ is the product of two copies of the unitriangular Lie group, which we would like to call the bi-unitriangular group. We fix the frame (respectively, the coframe) of the transverse section consisting of the above $e_{ij}$ (respectively, $e^{ij}$) in the first factor $U_1$ and their copies $E_{ij}$ (respectively, $E^{ij}$) in the second factor $U_2$. The quotient manifold of the bi-unitriangular group by a cocompact lattice inherits a Riemannian metric from the sum of the Fisher informations, and carries the following global (n−2)-plectic structure Ω ($d\Omega = 0$, $\Omega^n > 0$):
$$\Omega = \sum_{i=1}^{n} e^{i, i+1} \wedge \dots \wedge e^{i, n} \wedge E^{n-i+1, n-i+2} \wedge \dots \wedge E^{n-i+1, n}.$$
We notice that, in the symplectic case where n = 3, the quotient manifold admits no Kähler structure (see [11]). However, it is still remarkable that the transverse symplectic 6-manifold is naturally ignored in the Bayesian inference described in this paper. Conjecturally, a similar model would help us to treat events in parallel worlds (or branes) in the same “psychological” procedure.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Amari, S. Information Geometry and Its Applications; Springer: Tokyo, Japan, 2016.
2. Ay, N.; Jost, J.; Le, H.V.; Schwachhöfer, L. Information Geometry; Springer: Cham, Switzerland, 2017.
3. Felice, D.; Ay, N. Towards a Canonical Divergence within Information Geometry. arXiv 2018, arXiv:1806.11363.
4. Mori, A. Information geometry in a global setting. Hiroshima Math. J. 2018, 48, 291–306.
5. Mori, A. A concurrence theorem for alpha-connections on the space of t-distributions and its application. Hokkaido Math. J. 2020, in press.
6. Geiges, H. An Introduction to Contact Topology; Cambridge University Press: Cambridge, UK, 2008.
7. Snoussi, H. Bayesian information geometry: Application to prior selection on statistical manifolds. In Advances in Imaging and Electron Physics 146; Hawkes, P., Ed.; Elsevier: San Diego, CA, USA, 2007; pp. 163–207.
8. Hirzebruch, F. Hilbert modular surfaces. Enseign. Math. 1973, 19, 183–281.
9. Massot, P.; Niederkrüger, K.; Wendl, C. Weak and strong fillability of higher dimensional contact manifolds. Invent. Math. 2013, 192, 287–373.
10. Kuntner, N.; Steinacker, H. On Poisson geometries related to noncommutative emergent gravity. J. Geom. Phys. 2012, 62, 1760–1777.
11. Cordero, L.; Fernández, M.; Gray, A. Symplectic manifolds with no Kähler structure. Topology 1986, 25, 375–380.
Figure 1. On any leaf of the primary foliation of $U_1 \times U_2$, there is a bi-contact hypersurface N carrying the bi-contact Hamiltonian vector field Y. Because of the dimension, the surface F in the figure presents simultaneously a leaf of the secondary foliation and a leaf of the tertiary foliation of that leaf. The flow on the tertiary leaf $F = F_{0, \delta}$ traces the common lift of the iteration of ∗ on $L_1$ and the iteration of · on $L_2$.

