Article

On Voronoi Diagrams on the Information-Geometric Cauchy Manifolds

Frank Nielsen
Sony Computer Science Laboratories, Tokyo 141-0022, Japan
Entropy 2020, 22(7), 713; https://doi.org/10.3390/e22070713
Submission received: 11 June 2020 / Revised: 24 June 2020 / Accepted: 24 June 2020 / Published: 28 June 2020
(This article belongs to the Special Issue Information Geometry III)

Abstract:
We study the Voronoi diagrams of a finite set of Cauchy distributions and their dual complexes from the viewpoint of information geometry by considering the Fisher-Rao distance, the Kullback-Leibler divergence, the chi square divergence, and a flat divergence derived from Tsallis entropy related to the conformal flattening of the Fisher-Rao geometry. We prove that the Voronoi diagrams of the Fisher-Rao distance, the chi square divergence, and the Kullback-Leibler divergence all coincide with a hyperbolic Voronoi diagram on the corresponding Cauchy location-scale parameters, and that the dual Cauchy hyperbolic Delaunay complexes are Fisher orthogonal to the Cauchy hyperbolic Voronoi diagrams. The dual Voronoi diagrams with respect to the dual flat divergences amount to dual Bregman Voronoi diagrams, and their dual complexes are regular triangulations. The primal Bregman Voronoi diagram is the Euclidean Voronoi diagram and the dual Bregman Voronoi diagram coincides with the Cauchy hyperbolic Voronoi diagram. In addition, we prove that the square root of the Kullback-Leibler divergence between Cauchy distributions yields a metric distance which is Hilbertian for the Cauchy scale families.

1. Introduction

Let $\mathcal{P} = \{P_1, \ldots, P_n\}$ be a finite set of points in a space $\mathcal{X}$ equipped with a measure of dissimilarity $D(\cdot,\cdot): \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$. The Voronoi diagram [1] of $\mathcal{P}$ partitions $\mathcal{X}$ into elementary Voronoi cells $\mathrm{Vor}(P_1), \ldots, \mathrm{Vor}(P_n)$ (also called Dirichlet cells [2]) such that
$$\mathrm{Vor}_D(P_i) := \left\{ X \in \mathcal{X} \,:\, D(P_i, X) \leq D(P_j, X), \ \forall j \in \{1, \ldots, n\} \right\}$$
denotes the proximity cell of the point generator $P_i$ (also called a Voronoi site), i.e., the locus of points $X \in \mathcal{X}$ closer with respect to $D$ to $P_i$ than to any other generator $P_j$.
When the dissimilarity $D$ is chosen as the Euclidean distance $\rho_E$, we recover the ordinary Voronoi diagram [1]. The Euclidean distance $\rho_E(P, Q)$ between two points $P$ and $Q$ is defined as
$$\rho_E(P, Q) = \| p - q \|_2,$$
where $p$ and $q$ denote the Cartesian coordinates of points $P$ and $Q$, respectively, and $\|\cdot\|_2$ the $\ell_2$-norm. Figure 1 (left) displays the Voronoi cells of an ordinary Voronoi diagram for a given set of generators.
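To make the definition concrete, here is a small Python/NumPy sketch (the helper name is ours, not from the paper) that assigns query points to the cells of an ordinary Voronoi diagram by brute-force nearest-site search:

```python
import numpy as np

def voronoi_cell_labels(generators, points):
    """Assign each query point to its nearest generator (ordinary Voronoi diagram).

    generators: (n, d) array of Voronoi sites.
    points: (m, d) array of query points.
    Returns an (m,) array of indices into `generators`.
    """
    # Pairwise Euclidean distances via broadcasting: shape (m, n).
    diffs = points[:, None, :] - generators[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return np.argmin(dists, axis=1)

# Example: three planar sites; (0.9, 0) falls in the cell of site 1, (0.1, 0.8) in that of site 2.
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
labels = voronoi_cell_labels(sites, np.array([[0.9, 0.0], [0.1, 0.8]]))
```

For large point sets one would build the diagram explicitly with a computational-geometry library rather than classify points one by one.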
The Voronoi diagram and its dual Delaunay complex [3] are fundamental data structures of computational geometry [4]. These core geometric data-structures find many applications in robotics, 3D reconstruction, geographic information systems (GISs), etc. See the textbook [1] for some of their applications. The Delaunay simplicial complex is obtained by drawing a straight edge between two generators iff their Voronoi cells share an edge (Figure 1, right). In Euclidean geometry, the Delaunay simplicial complex triangulates the convex hull of the generators, and is therefore called the Delaunay triangulation. Figure 1 depicts the dual Delaunay triangulations corresponding to ordinary Voronoi diagrams. In general, when considering arbitrary dissimilarity D, the Delaunay simplicial complex may not triangulate the convex hull of the generators (see [5] and Section 4).
When the dissimilarity is oriented or asymmetric, i.e., D ( P , Q ) D ( Q , P ) , one can define the reverse or dual dissimilarity D * ( P , Q ) : = D ( Q , P ) . This duality is termed reference duality in [6], and is an involution:
$$(D^*)^*(P, Q) = D(P, Q).$$
The dissimilarity D ( P : Q ) is called the forward dissimilarity.
In the remainder, we shall use the ‘:’ notational convention [7] between the arguments of the dissimilarity to emphasize that a dissimilarity D is asymmetric: D ( P : Q ) D ( Q : P ) . For an oriented dissimilarity D ( · : · ) , we can define two types of dual Voronoi cells as follows:
$$\mathrm{Vor}_D(P_i) := \left\{ X \in \mathcal{X} \,:\, D(P_i : X) \leq D(P_j : X), \ \forall j \in \{1, \ldots, n\} \right\},$$
and
$$\begin{aligned} \mathrm{Vor}^*_D(P_i) &:= \left\{ X \in \mathcal{X} \,:\, D(X : P_i) \leq D(X : P_j), \ \forall j \in \{1, \ldots, n\} \right\} \\ &= \left\{ X \in \mathcal{X} \,:\, D^*(P_i : X) \leq D^*(P_j : X), \ \forall j \in \{1, \ldots, n\} \right\} \\ &= \mathrm{Vor}_{D^*}(P_i). \end{aligned}$$
That is, the dual Voronoi cell $\mathrm{Vor}^*_D(P_i)$ with respect to a dissimilarity $D$ is the primal Voronoi cell $\mathrm{Vor}_{D^*}(P_i)$ for the dual (reverse) dissimilarity $D^*$.
In general, we can build a Voronoi diagram as a minimization diagram [8] by defining the n functions f i ( X ) : = D ( P i : X ) . Then X Vor D ( P i ) iff f i ( X ) f j ( X ) for all j { 1 , , n } . Thus, by building the lower envelope [8] of the n functions f 1 ( X ) , , f n ( X ) , we can retrieve the Voronoi diagram.
An important class of smooth asymmetric dissimilarities is the class of Bregman divergences [9]. A Bregman divergence $B_F$ is defined for a smooth and strictly convex generator $F(\theta)$ by
$$B_F(\theta_1 : \theta_2) := F(\theta_1) - F(\theta_2) - (\theta_1 - \theta_2)^\top \nabla F(\theta_2),$$
where $\nabla F$ denotes the gradient of $F$. In information geometry [7,10,11], Bregman divergences are the canonical divergences of dually flat spaces [7]. Dually flat spaces generalize the (self-dual) Euclidean geometry obtained for the generator $F_{\mathrm{Eucl}}(\theta) = \frac{1}{2}\theta^\top\theta$. In information sciences, dually flat spaces can be obtained, for example, as the induced information geometry of the Kullback-Leibler divergence [12] of an exponential family manifold [7,13] or a mixture manifold [14]. The dual Bregman Voronoi diagrams and their dual regular complexes have been studied in [15,16].
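As a minimal illustration of the definition (generator and helper names are ours), here is a Bregman divergence sketch; with the Euclidean generator $F(\theta) = \frac{1}{2}\theta^\top\theta$ it reduces to half the squared Euclidean distance:

```python
import numpy as np

def bregman_divergence(F, grad_F, theta1, theta2):
    """B_F(theta1 : theta2) = F(theta1) - F(theta2) - <theta1 - theta2, grad F(theta2)>."""
    return F(theta1) - F(theta2) - np.dot(theta1 - theta2, grad_F(theta2))

# Euclidean generator F(theta) = (1/2) theta^T theta.
F_eucl = lambda t: 0.5 * np.dot(t, t)
grad_F_eucl = lambda t: t  # gradient of the Euclidean generator
t1, t2 = np.array([1.0, 2.0]), np.array([4.0, 6.0])
d = bregman_divergence(F_eucl, grad_F_eucl, t1, t2)  # 0.5 * ((1-4)**2 + (2-6)**2) = 12.5
```

For non-quadratic generators the divergence is asymmetric, which is why the oriented ':' notation is used.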
In this paper, we study the Voronoi diagrams induced by the Fisher-Rao distance [17,18,19], the Kullback-Leibler (KL) divergence [12] and the chi square divergence [20] for the family $\mathcal{C}$ of Cauchy distributions. Cauchy distributions are also called Lorentzian distributions in the literature [21,22].
The paper is organized with our main contributions as follows:
In Section 2, we concisely review the information geometry of the Cauchy family: We first describe the hyperbolic Fisher-Rao geometry in Section 2.1 and make a connection between the Fisher-Rao distance and the chi square divergence, then we point out the remarkable fact that any α -geometry coincides with the Fisher-Rao geometry (Section 2.2), and we finally present dually flat geometric structures on the Cauchy manifold related to Tsallis’ quadratic entropy [23,24] which amount to a conformal flattening of the Fisher-Rao geometry (Section 2.4). Section 3.3 proves that the square root of the KL divergence between any two Cauchy distributions yields a metric distance (Theorem 3), and that this metric distance can be isometrically embedded in a Hilbert space for the case of Cauchy scale families (Theorem 4). Section 4 shows that the Cauchy Voronoi diagrams induced either by the Fisher-Rao distance, the chi-square divergence, or the Kullback-Leibler divergence (and its square root metrization) all coincide with a hyperbolic Voronoi diagram [25] calculated on the Cauchy 2D location-scale parameters. This result yields a practical and efficient construction algorithm of hyperbolic Cauchy Voronoi diagrams [25,26] (Theorem 5) and their dual hyperbolic Cauchy Delaunay complexes (explained in detail in Section 6). We prove that the hyperbolic Cauchy Voronoi diagrams are Fisher orthogonal to the dual Cauchy Delaunay complexes (Theorem 6). In Section 4.2, we show that the primal Voronoi diagram with respect to the flat divergence coincides with the hyperbolic Voronoi diagram, and that the Voronoi diagram with respect to the reverse flat divergence matches the ordinary Euclidean Voronoi diagram. Finally, we conclude this work in Section 5.

2. Information Geometry of the Cauchy Family

We start by reporting the Fisher-Rao geometry of the Cauchy manifold (Section 2.1), then show that all α -geometries coincide with the Fisher-Rao geometry (Section 2.2). Then we recall that we can associate an information-geometric structure to any parametric divergence (Section 2.3), and finally dually flatten this Fisher-Rao curved geometry using Tsallis’s quadratic entropy [23,24] (Section 2.4) and a conformal Fisher metric.

2.1. Fisher-Rao Geometry of the Cauchy Manifold

Information geometry [7,10,11] investigates the geometry of families of probability measures. The 2D family $\mathcal{C}$ of Cauchy distributions
$$\mathcal{C} := \left\{ p_\lambda(x) := \frac{s}{\pi (s^2 + (x - l)^2)}, \quad \lambda := (l, s) \in \mathbb{H} := \mathbb{R} \times \mathbb{R}_+ \right\}$$
is a location-scale family [27] (and also a univariate elliptical distribution family [28]), where $l \in \mathbb{R}$ and $s > 0$ denote the location parameter and the scale parameter, respectively:
$$p_{l,s}(x) := \frac{1}{s}\, p\!\left(\frac{x - l}{s}\right),$$
where
$$p(x) := \frac{1}{\pi (1 + x^2)} =: p_{0,1}(x)$$
is the standard Cauchy distribution.
Let $\ell_\lambda(x) := \log p_\lambda(x)$ denote the log-density. The parameter space $\mathbb{H} := \mathbb{R} \times \mathbb{R}_+$ of the Cauchy family is called the upper plane. The Fisher-Rao geometry [17,19,29] of $\mathcal{C}$ consists in modeling $\mathcal{C}$ as a Riemannian manifold $(\mathcal{C}, g_{\mathrm{FR}})$ by choosing the Fisher information metric [7] (FIm)
$$g_{\mathrm{FR}}(\lambda) = [g_{ij}^{\mathrm{FR}}(\lambda)], \qquad g_{ij}^{\mathrm{FR}}(\lambda) := E_{p_\lambda}\!\left[ \partial_i \ell_\lambda(x)\, \partial_j \ell_\lambda(x) \right],$$
as the Riemannian metric tensor, where $\partial_m := \frac{\partial}{\partial \lambda^{(m)}}$ for $m \in \{1, 2\}$ (i.e., $\partial_1 = \frac{\partial}{\partial l}$ and $\partial_2 = \frac{\partial}{\partial s}$). The matrix $[g_{ij}^{\mathrm{FR}}]$ is called the Fisher Information Matrix (FIM), and is the expression of the FIm tensor in a local coordinate system $\{e_1, e_2\}$: $g_{ij}^{\mathrm{FR}}(\lambda) = g(e_i, e_j)$ with $i, j \in \{1, 2\}$.
The Fisher-Rao distance $\rho_{\mathrm{FR}}[p_{\lambda_1}, p_{\lambda_2}] = \rho_{\mathrm{FR}}[p_{l_1,s_1}, p_{l_2,s_2}]$ is then defined as the Riemannian geodesic length distance on the Cauchy manifold $(\mathcal{C}, g_{\mathrm{FR}})$:
$$\rho_{\mathrm{FR}}\left[ p_{\lambda_1}, p_{\lambda_2} \right] = \min_{\lambda(t) \,:\, \lambda(0) = \lambda_1,\ \lambda(1) = \lambda_2} \int_0^1 \sqrt{ \left( \frac{\mathrm{d}\lambda(t)}{\mathrm{d}t} \right)^{\!\top} g_{\mathrm{FR}}(\lambda(t))\, \frac{\mathrm{d}\lambda(t)}{\mathrm{d}t} } \ \mathrm{d}t.$$
The Fisher information metric tensor for the Cauchy family [28] is
$$g_{\mathrm{FR}}(\lambda) = g_{\mathrm{FR}}(l, s) = \frac{1}{2 s^2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
where $\lambda = (l, s) \in \mathbb{H}$.
A generic formula for the Fisher-Rao distance between two univariate elliptical distributions is reported in [28]. This formula, when instantiated for the Cauchy distributions, yields the following closed-form formula for the Fisher-Rao distance:
$$\rho_{\mathrm{FR}}[p_{l_1,s_1}, p_{l_2,s_2}] = \frac{1}{\sqrt{2}} \left| \log \frac{\tan\frac{\psi_1}{2}}{\tan\frac{\psi_2}{2}} \right|,$$
where
$$\psi_i = \arcsin\frac{s_i}{A}, \quad i \in \{1, 2\},$$
$$A^2 = s_1^2 + \frac{\left( (l_2 - l_1)^2 - (s_1^2 - s_2^2) \right)^2}{4 (l_2 - l_1)^2}.$$
However, by noticing that the metric tensor for the Cauchy family (Equation (14)) is equal to the scaled metric tensor $g_P$ of the Poincaré (P) hyperbolic upper plane [30]:
$$g_P(x, y) = \frac{1}{y^2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$
we get a relationship between the squared infinitesimal lengths (line elements) $\mathrm{d}s_{\mathrm{FR}}^2 = \frac{\mathrm{d}l^2 + \mathrm{d}s^2}{2 s^2}$ and $\mathrm{d}s_P^2 = \frac{\mathrm{d}x^2 + \mathrm{d}y^2}{y^2}$ as follows:
$$\mathrm{d}s_{\mathrm{FR}} = \frac{1}{\sqrt{2}}\, \mathrm{d}s_P.$$
It follows that the Fisher-Rao distance between two Cauchy distributions is simply obtained by rescaling the 2D hyperbolic distance expressed in the Poincaré upper plane [30]:
$$\rho_{\mathrm{FR}}[p_{l_1,s_1}, p_{l_2,s_2}] = \frac{1}{\sqrt{2}}\, \rho_P(l_1, s_1; l_2, s_2),$$
where
$$\rho_P(l_1, s_1; l_2, s_2) := \mathrm{arccosh}\left( 1 + \delta(l_1, s_1; l_2, s_2) \right),$$
with
$$\mathrm{arccosh}(x) := \log\left( x + \sqrt{x^2 - 1} \right), \quad x \geq 1,$$
and
$$\delta(l_1, s_1; l_2, s_2) := \frac{(l_2 - l_1)^2 + (s_2 - s_1)^2}{2 s_1 s_2}.$$
This latter term $\delta$ shall naturally appear in Section 2.4 when studying the dually flat space obtained by conformally flattening the Fisher-Rao geometry. The expression $\delta(l_1, s_1; l_2, s_2)$ of Equation (23) can be interpreted as a conformal divergence for the squared Euclidean distance [31,32,33].
We may also write the delta term using the 2D Cartesian coordinates $\lambda = (\lambda^{(1)}, \lambda^{(2)})$ as:
$$\delta(\lambda_1, \lambda_2) := \frac{(\lambda_2^{(1)} - \lambda_1^{(1)})^2 + (\lambda_2^{(2)} - \lambda_1^{(2)})^2}{2 \lambda_1^{(2)} \lambda_2^{(2)}} = \frac{\| \lambda_1 - \lambda_2 \|_2^2}{2 \lambda_1^{(2)} \lambda_2^{(2)}},$$
where $\lambda_1, \lambda_2 \in \mathbb{H}$.
In particular, when $l_1 = l_2$, we get the simplified Fisher-Rao distance for Cauchy scale families:
$$\rho_{\mathrm{FR}}[p_{l,s_1}, p_{l,s_2}] = \frac{1}{\sqrt{2}} \left| \log \frac{s_1}{s_2} \right|.$$
Proposition 1.
The Fisher-Rao distance between two Cauchy distributions is
$$\rho_{\mathrm{FR}}[p_{l_1,s_1}, p_{l_2,s_2}] = \begin{cases} \frac{1}{\sqrt{2}} \left| \log \frac{s_1}{s_2} \right| & \text{when } l_1 = l_2, \\[4pt] \frac{1}{\sqrt{2}}\, \mathrm{arccosh}\left( 1 + \frac{(l_2 - l_1)^2 + (s_2 - s_1)^2}{2 s_1 s_2} \right) & \text{when } l_1 \neq l_2. \end{cases}$$
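Proposition 1 translates directly into a short routine (our naming); the two branches agree in the limit $l_1 \to l_2$ since $\cosh(\log(s_1/s_2)) = 1 + (s_1 - s_2)^2/(2 s_1 s_2)$:

```python
import math

def fisher_rao_cauchy(l1, s1, l2, s2):
    """Fisher-Rao distance between Cauchy(l1, s1) and Cauchy(l2, s2), per Proposition 1."""
    if l1 == l2:
        return abs(math.log(s1 / s2)) / math.sqrt(2.0)
    delta = ((l2 - l1) ** 2 + (s2 - s1) ** 2) / (2.0 * s1 * s2)
    return math.acosh(1.0 + delta) / math.sqrt(2.0)

# Scale-only branch versus the arccosh branch with a vanishing location offset:
d_scale = fisher_rao_cauchy(0.0, 1.0, 0.0, 4.0)    # log(4)/sqrt(2)
d_limit = fisher_rao_cauchy(1e-12, 1.0, 0.0, 4.0)  # nearly identical value
```

The distance is symmetric in its two arguments, as expected of a metric.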
The Fisher-Rao manifold of Cauchy distributions has constant negative scalar curvature $\kappa = -2$; see [28] for detailed calculations.
Remark 1.
It is well known that the Fisher-Rao geometry of location-scale families amounts to a hyperbolic geometry [27]. For $d$-variate scale-isotropic Cauchy distributions $p_\lambda(x)$ with $\lambda = (l, s) \in \mathbb{R}^d \times \mathbb{R}_+$, the Fisher information metric is $g_{\mathrm{FR}}(\lambda) = \frac{1}{2 s^2} I$, where $I$ denotes the $(d+1) \times (d+1)$ identity matrix. It follows that
$$\rho_{\mathrm{FR}}[p_{l_1,s_1}, p_{l_2,s_2}] = \frac{1}{\sqrt{2}}\, \mathrm{arccosh}\left( 1 + \Delta(l_1, s_1, l_2, s_2) \right),$$
where
$$\Delta(l_1, s_1, l_2, s_2) := \frac{\| l_2 - l_1 \|_2^2 + (s_2 - s_1)^2}{2 s_1 s_2},$$
and $\| \cdot \|_2$ is the $d$-dimensional Euclidean $\ell_2$-norm $\| x \|_2 = \sqrt{x^\top x}$. That is, $\rho_{\mathrm{FR}}[p_{l_1,s_1}, p_{l_2,s_2}]$ is the scaled $d$-dimensional real hyperbolic distance [30] expressed in the Poincaré upper space model.
Let us mention that recently the Riemannian geometry of location-scale models was also studied from the complementary viewpoint of warped metrics [34,35].
Remark 2.
Li and Zhao [36] proposed to use the Wasserstein information metric (WIm), expressed in the distribution parameter coordinates by the Wasserstein Information Matrix (WIM). They reported the explicit formula of the WIM for generic location-scale families:
$$I_W(l, s) = \begin{pmatrix} \frac{E_{p_\lambda}[x^2] - 2 l\, E_{p_\lambda}[x] + l^2}{s^2} & 0 \\ 0 & 1 \end{pmatrix}.$$
In particular, the WIM of the Gaussian family (a location-scale family) is the identity matrix and yields the Euclidean geometry (see the Wasserstein geometry of Gaussians [37]). Although the WIM can be calculated for the Gaussian location-scale family, let us notice that the moments of order greater than or equal to one (i.e., $E[X]$ and $E[X^2]$) are not finite for the Cauchy distributions. Thus, the WIM is not well-defined for the Cauchy family, since Equation (28) makes sense only for finite moments.

2.2. The Dualistic α -Geometry of the Statistical Cauchy Manifold

A statistical manifold [38] is a triplet $(M, g, T)$ where $g$ is a Riemannian metric tensor and $T$ is a cubic totally symmetric tensor (i.e., $T_{\sigma(i)\sigma(j)\sigma(k)} = T_{ijk}$ for any permutation $\sigma$). For a parametric family of probability densities $M = \{p_\lambda(x)\}$, the cubic tensor is called the skewness tensor [7], and is defined by:
$$T_{ijk}(\lambda) := E_{p_\lambda}\!\left[ \partial_i \ell_\lambda(x)\, \partial_j \ell_\lambda(x)\, \partial_k \ell_\lambda(x) \right].$$
A statistical manifold structure $(M, g, T)$ allows one to construct Amari's dualistic α-geometry [7] for any $\alpha \in \mathbb{R}$: namely, a quadruplet $(M, g_{\mathrm{FR}}, \nabla^{-\alpha}, \nabla^{\alpha})$ where $\nabla^{-\alpha}$ and $\nabla^{\alpha}$ are dual torsion-free affine connections coupled to the Fisher metric $g_{\mathrm{FR}}$ (i.e., $\nabla^{-\alpha} = (\nabla^{\alpha})^*$). We refer the reader to the textbook [7] and the overview [11] for further details.
The Fisher-Rao geometry $(M, g_{\mathrm{FR}})$ corresponds to the 0-geometry, i.e., the self-dual geometry where $\nabla^0 := {}^{g}\nabla$ is the Levi-Civita metric connection [7] induced by the metric tensor (with $({}^{g}\nabla)^* = {}^{g}\nabla$). That is, we have
$$(\mathcal{C}, g_{\mathrm{FR}}) = (\mathcal{C}, g_{\mathrm{FR}}, \nabla^0, \nabla^0).$$
In information geometry, the invariance principle states that the geometry should be invariant under the transformation of a random variable $X$ into $Y$ provided that $Y = t(X)$ is a sufficient statistic [7] of $X$. The α-geometry $(M, g_{\mathrm{FR}}, \nabla^{-\alpha}, \nabla^{\alpha})$ and its special case of the Fisher-Rao geometry are invariant geometries [7,11] for any $\alpha \in \mathbb{R}$.
A remarkable fact is that all the α-geometries of the Cauchy family coincide with the Fisher-Rao geometry since the cubic skewness tensor $T$ vanishes everywhere [28], i.e., $T_{ijk} = 0$. The non-zero coefficients of the Christoffel symbols of the α-connections (including the Levi-Civita metric connection derived from the Fisher metric tensor) are:
$${}^{\alpha}\Gamma^{1}_{12} = {}^{\alpha}\Gamma^{1}_{21} = {}^{\alpha}\Gamma^{2}_{22} = -\frac{1}{s},$$
$${}^{\alpha}\Gamma^{2}_{11} = \frac{1}{s}.$$
Thus, all α-geometries coincide and have constant negative scalar curvature $\kappa = -2$. In other words, we cannot choose a value of α that makes the Cauchy manifold dually flat [7]. To contrast with this result, Mitchell [28] reported values of α for which the α-geometry is dually flat for some parametric location-scale families of distributions: for example, it is well known that the manifold $\mathcal{N}$ of univariate Gaussian distributions is $\pm 1$-flat [7]. The manifold $\mathcal{S}_k$ of Student's $t$-distributions with $k$ degrees of freedom is proven dually flat when $\alpha = \pm\frac{k+5}{k-1}$ [28]. Dually flat manifolds are Hessian manifolds [39] with dual geodesics being straight lines in one of the two dual global affine coordinate systems. On a global Hessian manifold, the canonical divergences are Bregman divergences. Thus, these dually flat Bregman manifolds are computationally friendly [15], as many techniques of computational geometry [4] can be naturally extended to these Hessian spaces (e.g., the smallest enclosing balls [40]).

2.3. Dualistic Structures Induced by a Divergence

A divergence or contrast function [13] is a smooth parametric dissimilarity. Let $M$ denote the manifold of its parameter space. Eguchi [13] showed how to associate to any divergence $D$ a canonical information-geometric structure $(M, {}^{D}g, {}^{D}\nabla, {}^{D}\nabla^*)$. Moreover, the construction shows that ${}^{D}\nabla^* = {}^{D^*}\nabla$: the dual connection ${}^{D}\nabla^*$ for the divergence $D$ corresponds to the primal connection for the reverse divergence $D^*$ (see [7,11] for details).
Conversely, Matsumoto [41] proved that given an information-geometric structure $(M, g, \nabla, \nabla^*)$, one can build a divergence $D$ such that $(M, g, T) = (M, {}^{D}g, {}^{D}T)$, from which we can derive the structure $(M, {}^{D}g, {}^{D}\nabla, {}^{D}\nabla^*)$. Thus, when calculating the Voronoi diagram $\mathrm{Vor}_D$ for an arbitrary divergence $D$, we may use the induced information-geometric structure $(M, {}^{D}g, {}^{D}\nabla, {}^{D}\nabla^*)$ to investigate some of the properties of the Voronoi diagram: for example, is the bisector $\mathrm{Bi}_D$ ${}^{D}\nabla$-autoparallel? Is the bisector $\mathrm{Bi}_D$ of two generators orthogonal with respect to the metric ${}^{D}g$ to their ${}^{D}\nabla$-geodesic? Section 4 will study these questions in particular cases.

2.4. Dually Flat Geometry of the Cauchy Manifold by Conformal Flattening

The Cauchy distributions are usually handled in information geometry using the wider scope of q-Gaussians [7,22,42] (deformed exponential families [43]). The q-Gaussians also include the Student’s t-distributions. Cauchy distributions are q-Gaussians for q = 2 . These q-Gaussians are also called q-normal distributions [44], and they can be obtained as maximum entropy distributions with respect to Tsallis’ entropy T q ( · ) [23,24] (see Theorem 4.12 of [7]):
$$T_q(p) := \frac{1}{q - 1}\left( 1 - \int p^q(x)\, \mathrm{d}x \right), \quad q \neq 1.$$
When q = 2 , we have the following Tsallis’ quadratic entropy:
$$T_2(p) := 1 - \int p^2(x)\, \mathrm{d}x.$$
We have $\lim_{q \to 1} T_q(p) = S(p) := -\int p(x) \log p(x)\, \mathrm{d}x$, the Shannon entropy.
Thus, q-Gaussians are q-exponential families [21], generalizing the MaxEnt exponential families derived from Shannon entropy [45]. The integral $E(p) := \int p^2(x)\, \mathrm{d}x$ corresponds to Onicescu's informational energy [46,47]. Tsallis' entropy is considered in non-extensive statistical physics [24].
A dually flat structure construction for q-Gaussians is reported in [7] (Sec. 4.3, pp. 84–89). We instantiate this construction for the Cauchy distributions (2-Gaussians):
Let
$$\exp_C(u) := \frac{1}{1 - u}, \quad u < 1,$$
denote the deformed q-exponential (for $q = 2$) and
$$\log_C(u) := 1 - \frac{1}{u}, \quad u > 0,$$
its compositional inverse, the deformed q-logarithm.
The probability density of a 2-Gaussian can be factorized as
$$p_\theta(x) = \exp_C\!\left( \theta^\top t(x) - F(\theta) \right),$$
where $\theta$ denotes the 2D natural parameter and $t(x) = (x, x^2)$ the sufficient statistics. We have:
$$\log_C(p_\theta(x)) = 1 - \frac{\pi (s^2 + (x - l)^2)}{s} = 1 - \pi\left( s + \frac{(x - l)^2}{s} \right) = \underbrace{\frac{2 \pi l}{s}\, x - \frac{\pi}{s}\, x^2}_{\theta^\top t(x)} - \underbrace{\left( \pi s + \frac{\pi l^2}{s} - 1 \right)}_{F(\theta)} =: \theta^\top t(x) - F(\theta).$$
Therefore the natural parameter is $\theta(l, s) = (\theta_1, \theta_2) = \left( \frac{2 \pi l}{s}, -\frac{\pi}{s} \right) \in \Theta = \mathbb{R} \times \mathbb{R}_-$ (for $t(x) = (x, x^2)$), and the deformed log-normalizer is
$$F(\theta(\lambda)) = \pi s + \frac{\pi l^2}{s} - 1 =: F_\lambda(\lambda),$$
$$F(\theta) = -\frac{\pi^2}{\theta_2} - \frac{\theta_1^2}{4 \theta_2} - 1.$$
In general, we obtain a strictly convex and $C^3$ function $F_q(\theta)$, called the q-free energy, for a q-Gaussian family. Here, we let $F(\theta) := F_2(\theta)$ for the Cauchy family: $F(\theta)$ is the Cauchy free energy.
We convert back the natural parameter $\theta \in \Theta$ to the ordinary parameter $\lambda \in \mathbb{H}$ as follows:
$$\lambda(\theta) = (l, s) = \left( -\frac{\theta_1}{2 \theta_2}, -\frac{\pi}{\theta_2} \right).$$
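The factorization can be checked numerically: evaluating $\exp_C(\theta^\top t(x) - F(\theta))$ with the conversions above reproduces the Cauchy density (a sketch; function names are ours):

```python
import math

def exp_C(u):
    """Deformed 2-exponential: exp_C(u) = 1/(1 - u), defined for u < 1."""
    return 1.0 / (1.0 - u)

def cauchy_direct(x, l, s):
    """Cauchy density in the ordinary (l, s) parameterization."""
    return s / (math.pi * (s * s + (x - l) ** 2))

def cauchy_via_q_exponential(x, l, s):
    """Evaluate the same density through the 2-Gaussian factorization p = exp_C(theta.t(x) - F)."""
    t1, t2 = 2.0 * math.pi * l / s, -math.pi / s   # natural parameter theta(l, s)
    F = math.pi * (s + l * l / s) - 1.0            # deformed log-normalizer F(theta(lambda))
    return exp_C(t1 * x + t2 * x * x - F)
```

Both evaluations agree pointwise; the argument of exp_C stays below 1 for every x, as required.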
The gradient of the deformed log-normalizer is:
$$\nabla F(\theta) = \left( -\frac{\theta_1}{2 \theta_2},\ \frac{\pi^2}{\theta_2^2} + \frac{\theta_1^2}{4 \theta_2^2} \right).$$
The gradient $\nabla F(\theta)$ defines the dual global affine coordinate system $\eta := \nabla F(\theta)$, where $\eta$ ranges over the dual parameter space $H = \{ (\eta_1, \eta_2) \,:\, \eta_2 > \eta_1^2 \}$.
It follows the divergence $D_{\mathrm{flat}}[p_{\lambda_1} : p_{\lambda_2}]$ [7] between Cauchy densities, which is by construction equivalent to a Bregman divergence $B_F(\theta_1 : \theta_2)$ (the canonical divergence in a dually flat space) between their corresponding natural parameters (Eq. (4.95) of [7] instantiated for $q = 2$):
$$\begin{aligned} D_{\mathrm{flat}}[p_{\lambda_1} : p_{\lambda_2}] &:= \frac{1}{\int p_{\lambda_2}^2(x)\, \mathrm{d}x} \left( \int \frac{p_{\lambda_2}^2(x)}{p_{\lambda_1}(x)}\, \mathrm{d}x - 1 \right) \\ &= 2 \pi s_2 \left( \frac{s_1^2 + s_2^2 + (l_1 - l_2)^2}{2 s_1 s_2} - 1 \right) \\ &= 2 \pi s_2\, \frac{(s_1 - s_2)^2 + (l_1 - l_2)^2}{2 s_1 s_2} \\ &= 2 \pi s_2\, \delta(l_1, s_1; l_2, s_2) \\ &= B_F(\theta_1 : \theta_2), \end{aligned}$$
where $\theta_1 := \theta(\lambda_1)$ and $\theta_2 := \theta(\lambda_2)$. We term $B_F(\theta_1 : \theta_2)$ the Bregman-Tsallis (quadratic) divergence ($B_{F_q}$ for general q-Gaussians).
We used a computer algebra system (CAS, see Section 7) to calculate the closed-form formulas of the following definite integrals:
$$\int p_{\lambda_2}^2(x)\, \mathrm{d}x = \frac{1}{2 \pi s_2},$$
$$\int \frac{p_{\lambda_2}^2(x)}{p_{\lambda_1}(x)}\, \mathrm{d}x = \frac{s_1^2 + s_2^2 + (l_1 - l_2)^2}{2 s_1 s_2}.$$
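These integrals can be reproduced with an open-source CAS. A sympy sketch for the first one (assuming sympy is available; the paper used its own CAS, see Section 7; by translation invariance we may set $l_2 = 0$):

```python
import sympy as sp

x = sp.symbols('x', real=True)
s2 = sp.symbols('s_2', positive=True)

# Cauchy density with location 0 and scale s_2; the integral of p^2 does not depend on the location.
p2 = s2 / (sp.pi * (s2**2 + x**2))
energy = sp.integrate(p2**2, (x, -sp.oo, sp.oo))  # Onicescu informational energy

assert sp.simplify(energy - 1 / (2 * sp.pi * s2)) == 0
```

The second integral can be checked the same way, at the cost of a slower symbolic integration with two scale symbols.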
Here, observe that the equivalent Bregman divergence does not swap the parameter order, as is the case for ordinary exponential families: $D_{\mathrm{KL}}[p_{\theta_1} : p_{\theta_2}] = B_F(\theta_2 : \theta_1)$, where $F$ denotes the cumulant function of the exponential family; see [7,11].
We term the divergence $D_{\mathrm{flat}}$ the flat divergence because its induced affine connection [13] ${}^{D_{\mathrm{flat}}}\nabla$ has zero curvature (i.e., the Riemann-Christoffel curvature tensor induced by the connection vanishes; see [7], p. 134).
Since $D_{\mathrm{flat}}[p_{\lambda_1} : p_{\lambda_2}] = 2 \pi s_2\, \delta(l_1, s_1; l_2, s_2) = \frac{\pi}{s_1}\left( (s_1 - s_2)^2 + (l_1 - l_2)^2 \right)$, the flat divergence is interpreted as a conformal squared Euclidean distance [33], with conformal factor $\frac{\pi}{s_1}$. In general, the Fisher-Rao geometry of q-Gaussians has scalar curvature [44] $\kappa = -\frac{q}{3 - q}$. Thus, we recover the scalar curvature $\kappa = -2$ for the Fisher-Rao Cauchy manifold since $q = 2$.
Theorem 1.
The flat divergence D flat [ p λ 1 : p λ 2 ] between two Cauchy distributions is equivalent to a Bregman divergence B F ( θ 1 : θ 2 ) on the corresponding natural parameters, and yields the following closed-form formula using the ordinary location-scale parameterization:
$$D_{\mathrm{flat}}[p_{\lambda_1} : p_{\lambda_2}] = 2 \pi s_2\, \delta(l_1, s_1; l_2, s_2) = \frac{\pi}{s_1}\left( (s_1 - s_2)^2 + (l_1 - l_2)^2 \right) = \frac{\pi}{s_1} \| \lambda_1 - \lambda_2 \|_2^2.$$
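Theorem 1 can be verified numerically: the closed-form location-scale expression and the Bregman divergence on natural parameters agree (a sketch; helper names are ours):

```python
import math

def nat_param(l, s):
    """Natural parameter theta(l, s) = (2*pi*l/s, -pi/s)."""
    return (2.0 * math.pi * l / s, -math.pi / s)

def F(theta):
    """Cauchy free energy F(theta) = -pi^2/theta2 - theta1^2/(4 theta2) - 1."""
    t1, t2 = theta
    return -math.pi**2 / t2 - t1 * t1 / (4.0 * t2) - 1.0

def grad_F(theta):
    """Gradient of F; equals the dual coordinates eta = (l, l^2 + s^2)."""
    t1, t2 = theta
    return (-t1 / (2.0 * t2), math.pi**2 / t2**2 + t1 * t1 / (4.0 * t2**2))

def bregman_F(theta1, theta2):
    g = grad_F(theta2)
    return F(theta1) - F(theta2) - sum((a - b) * c for a, b, c in zip(theta1, theta2, g))

def flat_divergence(l1, s1, l2, s2):
    """Closed form of Theorem 1: D_flat = (pi/s1) * ((s1 - s2)^2 + (l1 - l2)^2)."""
    return math.pi / s1 * ((s1 - s2) ** 2 + (l1 - l2) ** 2)

d1 = flat_divergence(0.0, 1.0, 1.0, 2.0)                   # = 2*pi
d2 = bregman_F(nat_param(0.0, 1.0), nat_param(1.0, 2.0))   # matches d1
```

Note the asymmetry: swapping the arguments rescales the divergence by the ratio of the scales.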
The conversion of η-coordinates to θ-coordinates is calculated as follows:
$$\theta(\eta) = \left( \frac{2 \pi \eta_1}{\sqrt{\eta_2 - \eta_1^2}},\ -\frac{\pi}{\sqrt{\eta_2 - \eta_1^2}} \right) = \nabla F^*(\eta),$$
where
$$F^*(\eta) := \theta(\eta)^\top \eta - F(\theta(\eta))$$
is the Legendre-Fenchel convex conjugate [7]:
$$F^*(\eta) = 1 - 2 \pi \sqrt{\eta_2 - \eta_1^2}.$$
Since
$$\eta(\lambda) = \eta(\theta(\lambda)) = (l, l^2 + s^2),$$
we have
$$F^*_\lambda(\lambda) := F^*(\eta(\lambda)) = 1 - 2 \pi \sqrt{l^2 + s^2 - l^2} = 1 - 2 \pi s,$$
which is independent of the location parameter $l$. Moreover, we have [7]
$$F^*_\lambda(\lambda) = 1 - \frac{1}{\int p_\lambda^2(x)\, \mathrm{d}x} = 1 - 2 \pi s.$$
We can convert the dual parameter η to the ordinary parameter $\lambda \in \mathbb{H}$ as follows:
$$\lambda(\eta) = (l, s) = \left( \eta_1, \sqrt{\eta_2 - \eta_1^2} \right).$$
It follows that we have the following equivalent expressions for the flat divergence:
$$D_{\mathrm{flat}}[p_{\lambda_1} : p_{\lambda_2}] = B_F(\theta_1 : \theta_2) = B_{F^*}(\eta_2 : \eta_1) = A_F(\theta_1 : \eta_2) = A_{F^*}(\eta_2 : \theta_1),$$
where
$$A_F(\theta_1 : \eta_2) := F(\theta_1) + F^*(\eta_2) - \theta_1^\top \eta_2$$
is the Legendre-Fenchel divergence measuring the inequality gap of the Fenchel-Young inequality:
$$F(\theta_1) + F^*(\eta_2) \geq \theta_1^\top \eta_2.$$
That is, $A_F(\theta_1 : \eta_2) = \mathrm{rhs}(\theta_1 : \eta_2) - \mathrm{lhs}(\theta_1 : \eta_2) \geq 0$, where $\mathrm{rhs}(\theta_1 : \eta_2) := F(\theta_1) + F^*(\eta_2)$ and $\mathrm{lhs}(\theta_1 : \eta_2) := \theta_1^\top \eta_2$.
The Hessian metrics of the dual convex potential functions $F(\theta)$ and $F^*(\eta)$ are:
$$\nabla^2 F(\theta) = \begin{pmatrix} -\frac{1}{2 \theta_2} & \frac{\theta_1}{2 \theta_2^2} \\ \frac{\theta_1}{2 \theta_2^2} & -\frac{\theta_1^2 + 4 \pi^2}{2 \theta_2^3} \end{pmatrix} =: g_F(\theta),$$
$$\nabla^2 F^*(\eta) = \pi \begin{pmatrix} \frac{2 \eta_2}{(\eta_2 - \eta_1^2)^{3/2}} & -\frac{\eta_1}{(\eta_2 - \eta_1^2)^{3/2}} \\ -\frac{\eta_1}{(\eta_2 - \eta_1^2)^{3/2}} & \frac{1}{2 (\eta_2 - \eta_1^2)^{3/2}} \end{pmatrix} =: g_{F^*}(\eta).$$
We check the Crouzeix identity [11,48]:
$$\nabla^2 F(\theta)\, \nabla^2 F^*(\eta(\theta)) = \nabla^2 F(\theta(\eta))\, \nabla^2 F^*(\eta) = I,$$
where I denotes the 2 × 2 identity matrix.
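The Crouzeix identity can be checked numerically with the Hessians above (a NumPy sketch; the coordinate conversions follow the formulas of this section):

```python
import numpy as np

def hess_F(t1, t2):
    """Hessian of F(theta) = -pi^2/theta2 - theta1^2/(4 theta2) - 1."""
    return np.array([[-1.0 / (2 * t2), t1 / (2 * t2**2)],
                     [t1 / (2 * t2**2), -(t1**2 + 4 * np.pi**2) / (2 * t2**3)]])

def hess_Fstar(e1, e2):
    """Hessian of F*(eta) = 1 - 2*pi*sqrt(eta2 - eta1^2)."""
    w = (e2 - e1**2) ** 1.5
    return np.pi * np.array([[2 * e2 / w, -e1 / w],
                             [-e1 / w, 1.0 / (2 * w)]])

# theta- and eta-coordinates of the same Cauchy density (l, s) = (1, 2):
l, s = 1.0, 2.0
theta = (2 * np.pi * l / s, -np.pi / s)
eta = (l, l**2 + s**2)
product = hess_F(*theta) @ hess_Fstar(*eta)  # should be the 2x2 identity matrix
```

The identity holds at every point of the manifold, since the two Hessians are inverse Jacobians of the dual coordinate changes.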
The Hessian metric $\nabla^2 F(\theta)$ is also called the q-Fisher metric [44] (for $q = 2$). Let $g_{\mathrm{FR}}^\lambda(\lambda)$ and $g_{\mathrm{FR}}^\theta(\theta)$ denote the Fisher information metric expressed using the λ-coordinates and the θ-coordinates, respectively. Then, we have
$$g_{\mathrm{FR}}^\theta(\theta) = \mathrm{Jac}_\lambda(\theta)^\top \times g_{\mathrm{FR}}^\lambda(\lambda(\theta)) \times \mathrm{Jac}_\lambda(\theta),$$
where $\mathrm{Jac}_\lambda(\theta)$ denotes the Jacobian matrix:
$$\mathrm{Jac}_\lambda(\theta) := \left[ \frac{\partial \lambda_i}{\partial \theta_j} \right].$$
Similarly, we can express the Hessian metric $g_F := \nabla^2 F(\theta)$ using the λ-coordinate system:
$$g_F^\lambda(\lambda) = \mathrm{Jac}_\theta(\lambda)^\top \times g_F^\theta(\theta(\lambda)) \times \mathrm{Jac}_\theta(\lambda).$$
We calculate explicitly the following Jacobian matrices:
$$\mathrm{Jac}_\theta(\lambda) = \begin{pmatrix} \frac{2 \pi}{s} & -\frac{2 \pi l}{s^2} \\ 0 & \frac{\pi}{s^2} \end{pmatrix},$$
and
$$\mathrm{Jac}_\lambda(\theta) = \begin{pmatrix} -\frac{1}{2 \theta_2} & \frac{\theta_1}{2 \theta_2^2} \\ 0 & \frac{\pi}{\theta_2^2} \end{pmatrix}.$$
We check that we have
$$g_F^\theta(\theta) = -\frac{4 \pi^2}{\theta_2}\, g_{\mathrm{FR}}^\theta(\theta),$$
$$g_F^\lambda(\lambda) = 4 \pi s\, g_{\mathrm{FR}}^\lambda(\lambda).$$
That is, the Riemannian metric tensors $g_{\mathrm{FR}}^\lambda(\lambda)$ and $g_F^\lambda(\lambda)$ (or $g_{\mathrm{FR}}^\theta(\theta)$ and $g_F^\theta(\theta)$) are conformally equivalent: there exists a smooth function $u(\lambda) = \log(4 \pi s)$ such that $g_F^\lambda(\lambda) = e^{u(\lambda)}\, g_{\mathrm{FR}}^\lambda(\lambda)$.
This dually flat space construction of the Cauchy manifold
$$\left( \mathcal{C},\ g(\theta) = \nabla^2 F(\theta),\ {}^{D_{\mathrm{flat}}}\nabla,\ ({}^{D_{\mathrm{flat}}}\nabla)^* = {}^{D_{\mathrm{flat}}^*}\nabla \right)$$
can be interpreted as a conformal flattening of the curved α -geometry [7,44,49]. The relationships between the curvature tensors of dual ± α -connections are studied in [50].
Notice that this dually flat geometry can be recovered from the divergence-based structure of Section 2.3 by considering the Bregman-Tsallis divergence. Figure 2 illustrates the relationships between the invariant α-geometry and the dually flat geometry of the Cauchy manifold. The q-Gaussians can be further generalized by the χ-family, with corresponding deformed logarithm and exponential functions [7,45]. The χ-family unifies the dually flat exponential families with the dually flat mixture families [45].
A statistical dissimilarity $D[p_{\lambda_1} : p_{\lambda_2}]$ between two parametric distributions $p_{\lambda_1}$ and $p_{\lambda_2}$ amounts to an equivalent dissimilarity $D(\theta_1 : \theta_2)$ between their parameters: $D(\theta_1 : \theta_2) := D[p_{\lambda_1} : p_{\lambda_2}]$. When the parametric dissimilarity is smooth, one can construct the divergence-based α-geometry [11,51]. Thus, the dually flat space structure of the Cauchy manifold can also be obtained from the divergence-based $\pm\alpha$-geometry derived from the flat divergence $D_{\mathrm{flat}}$ (see Figure 2). It can be shown that the dually flat space q-geometry is the unique geometry in the intersection of the conformal Fisher-Rao geometry with the deformed χ-geometry (Theorem 13 of [45]) when the manifold is the positive orthant $\mathbb{R}_+^{d+1}$. Please note that a dually flat space in information geometry is usually not Riemannian flat (with respect to the Levi-Civita connection; e.g., the Gaussian manifold). In particular, Matsuzoe proved in [52] that the Riemannian manifold $(\mathcal{C}, \nabla^2 F(\theta))$ induced by the q-Fisher metric is of constant curvature 1 when $q = 2$.
There are many alternative possible ways to build a dually flat space from a q-Gaussian family once a convex Bregman generator F ( θ ) has been built from the density p q ( θ ) of a q-Gaussian. The method presented above is a natural generalization of the dually flat space construction for exponential families. To give another approach, let us mention that Matsuzoe [52] also introduced another Hessian metric g M ( θ ) = [ g i j M ( θ ) ] defined by:
$$g_{ij}^M(\theta) := \int \partial_i p_\theta(x)\, \partial_j \log_q p_\theta(x)\, \mathrm{d}x.$$
This metric is conformal to both the Fisher metric and the q-Fisher metric, and is obtained by generalizing equivalent representations of the Fisher information matrix (see α -representations in [7]).

3. Invariant Divergences: f -Divergences and α -Divergences

3.1. Invariant Divergences in Information Geometry

The f-divergence [20,53] between two densities $p(x)$ and $q(x)$ is defined for a positive convex function $f$, strictly convex at 1, with $f(1) = 0$, as:
$$I_f[p : q] := \int_{\mathcal{X}} p(x)\, f\!\left( \frac{q(x)}{p(x)} \right) \mathrm{d}x.$$
The KL divergence is an f-divergence obtained for the generator $f(u) = -\log(u)$.
An invariant divergence is a divergence D which satisfies the information monotonicity [7]: D [ p X : p Y ] D [ p t ( X ) : p t ( Y ) ] with equality iff t ( X ) is a sufficient statistic. The invariant divergences are the f-divergences for the simplex sample space [7]. Moreover, the standard f-divergences (calibrated with f ( 1 ) = 0 and f ( 1 ) = f ( 1 ) = 1 ) induce the Fisher information metric (FIm) for its metric tensor I f g when the sample space is the probability simplex: I f g = g FR , see [7].

3.2. α -Divergences between Location-Scale Densities

Let $I_\alpha[p : q]$ denote the α-divergence [7,54,55] between $p$ and $q$:
$$I_\alpha[p : q] := \frac{1}{\alpha (1 - \alpha)} \left( 1 - C_\alpha[p : q] \right), \quad \alpha \notin \{0, 1\},$$
where $C_\alpha[p : q]$ is the Chernoff α-coefficient [56,57]:
$$C_\alpha[p : q] := \int p^\alpha(x)\, q^{1 - \alpha}(x)\, \mathrm{d}x = \int q(x) \left( \frac{p(x)}{q(x)} \right)^{\!\alpha} \mathrm{d}x = C_{1 - \alpha}[q : p].$$
We have $I_\alpha[p : q] = I_{1 - \alpha}[q : p] = I_\alpha^*[p : q]$.
The α-divergences include the chi square divergence ($\alpha = 2$), the squared Hellinger divergence ($\alpha = \frac{1}{2}$, symmetric) and, in the limit cases, the Kullback-Leibler (KL) divergence ($\alpha \to 1$) and the reverse KL divergence ($\alpha \to 0$). The α-divergences are f-divergences for the generator:
$$f_\alpha(u) = \begin{cases} \frac{u^{1 - \alpha} - u}{\alpha (\alpha - 1)}, & \text{if } \alpha \neq 0, \alpha \neq 1, \\ u \log(u), & \text{if } \alpha = 0 \ (\text{reverse Kullback-Leibler divergence}), \\ -\log(u), & \text{if } \alpha = 1 \ (\text{Kullback-Leibler divergence}). \end{cases}$$
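Since Cauchy densities are heavy-tailed, the coefficient $C_\alpha$ can be evaluated numerically after the substitution $x = \tan(t)$, which maps the real line to $(-\pi/2, \pi/2)$ and makes the integrand bounded. A sketch (our helpers); for $\alpha = 2$ the result can be checked against the closed form of the $\int p^2/q$-type integral given in Section 2.4:

```python
import math

def cauchy_pdf(x, l, s):
    return s / (math.pi * (s * s + (x - l) ** 2))

def chernoff_coefficient(alpha, l1, s1, l2, s2, n=100000):
    """C_alpha = int p1^alpha p2^(1-alpha) dx, via x = tan(t) and the midpoint rule."""
    h = math.pi / n
    acc = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        x = math.tan(t)
        jac = 1.0 + x * x  # dx = (1 + tan^2 t) dt
        acc += cauchy_pdf(x, l1, s1) ** alpha * cauchy_pdf(x, l2, s2) ** (1 - alpha) * jac
    return acc * h

def alpha_divergence(alpha, l1, s1, l2, s2):
    return (1.0 - chernoff_coefficient(alpha, l1, s1, l2, s2)) / (alpha * (1.0 - alpha))

# For alpha = 2 and (l1, s1) = (0, 1), (l2, s2) = (1, 2), the closed form gives
# C_2 = (s1^2 + s2^2 + (l1 - l2)^2) / (2 s1 s2) = 6/4 = 1.5.
c2 = chernoff_coefficient(2.0, 0.0, 1.0, 1.0, 2.0)
```

The same routine can be used to probe the symmetry of the α-divergences between Cauchy densities for other values of α.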
For location-scale families, let
$$C_\alpha(l_1, s_1; l_2, s_2) := C_\alpha\left[ p_{l_1, s_1} : p_{l_2, s_2} \right].$$
Using changes of variables in the integrals, one can show the following identities:
$$\begin{aligned} C_\alpha(l_1, s_1; l_2, s_2) &= C_\alpha\!\left( 0, 1; \frac{l_2 - l_1}{s_1}, \frac{s_2}{s_1} \right) \\ &= C_\alpha\!\left( \frac{l_1 - l_2}{s_2}, \frac{s_1}{s_2}; 0, 1 \right) \\ &= C_{1 - \alpha}\!\left( 0, 1; \frac{l_1 - l_2}{s_2}, \frac{s_1}{s_2} \right) \\ &= C_{1 - \alpha}(l_2, s_2; l_1, s_1). \end{aligned}$$
For the location-scale families, which include the normal family $\mathcal{N}$, the Cauchy family $\mathcal{C}$, and the Student $t$-families $\mathcal{S}_k$ with fixed degrees of freedom $k$, the α-divergences are not symmetric in general (e.g., the α-divergences between two normal distributions). However, we show that the chi square divergence and the KL divergence are symmetric when the densities belong to the Cauchy family. Thus, it is of interest to prove whether the α-divergences between Cauchy densities are symmetric or not, and to report their closed-form formula for all $\alpha \in \mathbb{R}$.
Using symbolic integration described in Section 7, we found that
$$C_3(p_{\lambda_1}; p_{\lambda_2}) = \frac{3 s_2^4 + \left( 2 s_1^2 + 6 (l_2 - l_1)^2 \right) s_2^2 + 3 s_1^4 + 6 (l_2 - l_1)^2 s_1^2 + 3 (l_2 - l_1)^4}{8 s_1^2 s_2^2},$$
and checked that this Chernoff similarity coefficient is symmetric:
$$C_3(p_{\lambda_1}; p_{\lambda_2}) = C_3(p_{\lambda_2}; p_{\lambda_1}).$$
Therefore the 3-divergence I 3 between two Cauchy distributions is symmetric. In particular, when l 1 = l 2 = l , we find that
$$C_3(p_{l, s_1}; p_{l, s_2}) = \frac{3 (s_1^4 + s_2^4) + 2 s_1^2 s_2^2}{8 s_1^2 s_2^2} = 1 + \frac{3}{4} \cdot \frac{(s_1^2 - s_2^2)^2}{2 s_1^2 s_2^2} = 1 + \frac{3}{4}\, \delta(l^2, s_1^2; l^2, s_2^2).$$
In Section 7, we prove by symbolic calculations that the α-divergences are symmetric for $\alpha \in \{0, 1, 2, 3, 4\}$.
Remark 3.
The Cauchy family can also be interpreted as a family of univariate elliptical distributions [28]. A univariate elliptical distribution has canonical parametric density:
q_{\mu,\sigma}(x) := \frac{1}{\sigma}\, h\!\left(\left(\frac{x-\mu}{\sigma}\right)^2\right),
for some function h(u). For example, the Gaussian distributions are elliptical distributions obtained for h(u) = \frac{1}{\sqrt{2\pi}} \exp(-\frac{u}{2}). Location-scale densities p_{l,s} with standard density p_{0,1} can be interpreted as univariate elliptical distributions q_{\mu,\sigma} with h(u) = p_{0,1}(\sqrt{u}) and (\mu,\sigma) = (l,s): p_{l,s} = q_{\mu,\sigma}. It follows that the Cauchy densities are elliptical distributions for h(u) = \frac{1}{\pi(1+u)}. By doing a change of variable in the KL divergence integral, we find again the following identity:
D_{\mathrm{KL}}\left[q_{\mu_1,\sigma_1} : q_{\mu_2,\sigma_2}\right] = D_{\mathrm{KL}}\left[q_{0,1} : q_{\frac{\mu_2-\mu_1}{\sigma_1}, \frac{\sigma_2}{\sigma_1}}\right].

3.3. Metrization of the Kullback-Leibler Divergence

The Kullback-Leibler divergence [12] D KL [ p : q ] between two continuous probability densities p and q defined over the real line support is an oriented dissimilarity measure defined by:
D_{\mathrm{KL}}[p:q] := \int p(x) \log\frac{p(x)}{q(x)}\, dx.
Obtaining the closed-form formula for the KL divergence between two Cauchy distributions requires performing a (non-trivial) integration. The following closed-form expression has been reported in [58] using advanced symbolic integration:
D_{\mathrm{KL}}\left[p_{l_1,s_1} : p_{l_2,s_2}\right] = \log\left(1 + \frac{(s_1-s_2)^2 + (l_1-l_2)^2}{4 s_1 s_2}\right).
Although the KL divergence is usually asymmetric, it is a remarkable fact that it is symmetric between any two Cauchy densities. However, the KL divergence of Equations (92) and (93) does not satisfy the triangle inequality, and therefore although symmetric, it is not a metric distance.
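This closed form, and the symmetry claim, are easy to check numerically. The sketch below is not from the paper; `kl_numeric` is an illustrative helper that integrates over the real line via the substitution x = tan(t).

```python
import math

def cauchy(x, l, s):
    # Cauchy density p_{l,s}(x) = s / (pi ((x - l)^2 + s^2))
    return s / (math.pi * ((x - l) ** 2 + s ** 2))

def kl_closed_form(l1, s1, l2, s2):
    # log(1 + ((s1 - s2)^2 + (l1 - l2)^2) / (4 s1 s2))
    return math.log(1.0 + ((s1 - s2) ** 2 + (l1 - l2) ** 2) / (4.0 * s1 * s2))

def kl_numeric(l1, s1, l2, s2, n=200_000):
    # int p log(p/q) dx via x = tan(t), dx = (1 + x^2) dt (midpoint rule)
    h = math.pi / n
    acc = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        x = math.tan(t)
        p = cauchy(x, l1, s1)
        q = cauchy(x, l2, s2)
        acc += p * math.log(p / q) * (1.0 + x * x)
    return acc * h
```

Both the numeric-versus-closed-form agreement and the symmetry in the two Cauchy parameters hold to high precision.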
The KL divergence between two Cauchy distributions is related to the Pearson D χ P 2 [ p : q ] and Neyman D χ N 2 [ p : q ] chi square divergences [20]:
D_{\chi^2_P}[p:q] := \int \frac{(q(x)-p(x))^2}{p(x)}\, dx,
D_{\chi^2_N}[p:q] := \int \frac{(q(x)-p(x))^2}{q(x)}\, dx = D_{\chi^2_P}^*[p:q] = D_{\chi^2_P}[q:p].
Indeed, the formulas for the Pearson and Neyman chi square divergences between two Cauchy distributions coincide, and (surprisingly) amount to the δ distance:
D_{\chi^2_P}\left[p_{l_1,s_1} : p_{l_2,s_2}\right] = D_{\chi^2_N}\left[p_{l_1,s_1} : p_{l_2,s_2}\right],
= \frac{(s_1-s_2)^2 + (l_2-l_1)^2}{2 s_1 s_2},
=: \delta(l_1,s_1; l_2,s_2).
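The coincidence of the Pearson and Neyman chi square divergences with δ can likewise be checked numerically. This sketch (with illustrative helper names, not from the paper) integrates both definitions using the substitution x = tan(t).

```python
import math

def cauchy(x, l, s):
    return s / (math.pi * ((x - l) ** 2 + s ** 2))

def integrate_real_line(f, n=200_000):
    # midpoint rule after the substitution x = tan(t), dx = (1 + x^2) dt
    h = math.pi / n
    acc = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        x = math.tan(t)
        acc += f(x) * (1.0 + x * x)
    return acc * h

def chi2_pearson(l1, s1, l2, s2):
    # int (q - p)^2 / p dx with p = p_{l1,s1} and q = p_{l2,s2}
    return integrate_real_line(
        lambda x: (cauchy(x, l2, s2) - cauchy(x, l1, s1)) ** 2 / cauchy(x, l1, s1))

def chi2_neyman(l1, s1, l2, s2):
    # D_N[p : q] = D_P[q : p]
    return chi2_pearson(l2, s2, l1, s1)

def delta(l1, s1, l2, s2):
    # closed-form delta distance
    return ((s1 - s2) ** 2 + (l2 - l1) ** 2) / (2.0 * s1 * s2)
```

For instance, with (l_1,s_1) = (0,1) and (l_2,s_2) = (1,2), both numeric chi square divergences match δ = 1/2.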
Since the Pearson and Neyman chi square divergences are symmetric, let us write D χ 2 [ p : q ] = D χ P 2 [ p : q ] in the remainder. We can rewrite the Fisher-Rao distance between two Cauchy distributions using the D χ 2 divergence as follows:
\rho_{\mathrm{FR}}\left[p_{l_1,s_1}, p_{l_2,s_2}\right] = \frac{1}{\sqrt{2}} \operatorname{arccosh}\left(1 + D_{\chi^2}\left[p_{l_1,s_1} : p_{l_2,s_2}\right]\right).
Figure 3 plots the strictly increasing chi-to-Fisher-Rao conversion function:
t_{\chi\to\mathrm{FR}}(u) := \frac{1}{\sqrt{2}} \operatorname{arccosh}(1+u).
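This conversion composes consistently with the Fisher-Rao-to-KL conversion t_{FR→KL}(u) = log(1/2 + cosh(√2 u)/2) appearing in the proof of Theorem 3, recovering the identity D_KL = log(1 + D_χ²/2). A two-line check (illustrative sketch, not from the paper):

```python
import math

def t_chi_to_fr(u):
    # Fisher-Rao distance as a function of the chi square divergence
    return math.acosh(1.0 + u) / math.sqrt(2.0)

def t_fr_to_kl(v):
    # KL divergence as a function of the Fisher-Rao distance
    return math.log(0.5 + 0.5 * math.cosh(math.sqrt(2.0) * v))
```

Composing the two maps gives log(1 + u/2) exactly, since cosh(arccosh(1 + u)) = 1 + u.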
Since the Cauchy family is a location-scale family, we have the following general invariance property of f-divergences:
Theorem 2.
The f-divergence [53] between two location-scale densities p_{l_1,s_1} and p_{l_2,s_2} can be reduced to the calculation of an f-divergence between the standard density and another location-scale density:
I_f\left[p_{l_1,s_1} : p_{l_2,s_2}\right] = I_f\left[p : p_{\frac{l_2-l_1}{s_1}, \frac{s_2}{s_1}}\right] = I_f\left[p_{\frac{l_1-l_2}{s_2}, \frac{s_1}{s_2}} : p\right].
Proof. 
The proof follows from a change of the variable x in the definite integral of Equation (74): Consider y = \frac{x-l_1}{s_1}, so that dx = s_1\, dy, x = s_1 y + l_1, and \frac{x-l_2}{s_2} = \frac{s_1 y + l_1 - l_2}{s_2} = \frac{y - \frac{l_2-l_1}{s_1}}{\frac{s_2}{s_1}}. We have
I_f\left[p_{l_1,s_1} : p_{l_2,s_2}\right] := \int_{\mathcal{X}} p_{l_1,s_1}(x)\, f\!\left(\frac{p_{l_2,s_2}(x)}{p_{l_1,s_1}(x)}\right) dx,
= \int_{\mathcal{Y}} \frac{1}{s_1} p(y)\, f\!\left(\frac{\frac{1}{s_2} p\!\left(\frac{y - \frac{l_2-l_1}{s_1}}{\frac{s_2}{s_1}}\right)}{\frac{1}{s_1} p(y)}\right) s_1\, dy,
= \int p(y)\, f\!\left(\frac{p_{\frac{l_2-l_1}{s_1}, \frac{s_2}{s_1}}(y)}{p(y)}\right) dy,
= I_f\left[p : p_{\frac{l_2-l_1}{s_1}, \frac{s_2}{s_1}}\right].
The proof for I_f\left[p_{l_1,s_1} : p_{l_2,s_2}\right] = I_f\left[p_{\frac{l_1-l_2}{s_2}, \frac{s_1}{s_2}} : p\right] is similar. One can also use the conjugate generator f^*(u) := u f(1/u), which yields the reverse f-divergence: I_{f^*}[p:q] = I_f[q:p]. □
Since the KL divergence is expressed by D_{\mathrm{KL}}\left[p_{l_1,s_1} : p_{l_2,s_2}\right] = \log\left(1 + \frac{1}{2}\,\delta(l_1,s_1;l_2,s_2)\right), we also check that
\delta(l_1,s_1;l_2,s_2) = \delta\left(0,1; \frac{l_1-l_2}{s_2}, \frac{s_1}{s_2}\right),
= \delta\left(\frac{l_2-l_1}{s_1}, \frac{s_2}{s_1}; 0,1\right),
=: \bar{\delta}(a,b),
where
\bar{\delta}(a,b) := \frac{a^2 + (b-1)^2}{2b},
for the reduced parameters (a,b) of either reduction.
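These parameter reductions can be checked directly; a minimal sketch (not from the paper):

```python
def delta(l1, s1, l2, s2):
    # chi square divergence (delta distance) between two Cauchy densities
    return ((s1 - s2) ** 2 + (l2 - l1) ** 2) / (2.0 * s1 * s2)

# sample parameters for the two reductions of delta to a standard density
l1, s1, l2, s2 = 0.5, 1.5, -2.0, 0.75
reduced_right = delta(0.0, 1.0, (l1 - l2) / s2, s1 / s2)
reduced_left = delta((l2 - l1) / s1, s2 / s1, 0.0, 1.0)
```

Both reduced values coincide with δ(l_1, s_1; l_2, s_2), as the change of variables predicts.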
The following corollary for scale families follows:
Corollary 1.
The f-divergence between two scale densities is scale-invariant, and amounts to a scalar scale-invariant divergence D_f(s_1 : s_2) := I_f\left[p_{s_1} : p_{s_2}\right].
Proof. 
D_f(s_1 : s_2) := I_f\left[p_{s_1} : p_{s_2}\right] = I_f\left[p : p_{\frac{s_2}{s_1}}\right] =: D_f\left(1 : \frac{s_2}{s_1}\right),
= I_f\left[p_{\frac{s_1}{s_2}} : p\right] =: D_f\left(\frac{s_1}{s_2} : 1\right).
 □
Many algorithms and data structures can be designed efficiently when dealing with metric distances: For example, the metric ball tree [59] and the vantage point tree [60,61] are two such data structures for efficiently querying nearest neighbors in metric spaces. Thus, it is of interest to consider statistical dissimilarities which are metric distances. The total variation distance [12] and the square root of the Jensen-Shannon divergence [62] are two common examples of statistical metric distances often met in the literature. In general, the metrization of f-divergences was investigated in [63,64].
We shall prove the following theorem:
Theorem 3.
The square root of the Kullback-Leibler divergence between two Cauchy densities p_{l_1,s_1} and p_{l_2,s_2} is a metric distance:
\rho_{\mathrm{KL}}\left[p_{l_1,s_1}, p_{l_2,s_2}\right] := \sqrt{D_{\mathrm{KL}}\left[p_{l_1,s_1} : p_{l_2,s_2}\right]} = \sqrt{\log\left(1 + \frac{(s_1-s_2)^2 + (l_1-l_2)^2}{4 s_1 s_2}\right)}.
Proof. 
The proof consists in showing that the square root of the conversion function from the Fisher-Rao distance to the KL divergence is a metric transform [65]. A metric transform t(u) : R_{\geq 0} \to R_{\geq 0} is a transform which preserves metricity, i.e., such that (t \circ \rho)(p,q) = t(\rho(p,q)) is a metric distance whenever \rho is. The following are sufficient conditions for a function t(u) to be a metric transform:
  • t is a strictly increasing function,
  • t ( 0 ) = 0 ,
  • t satisfies the subadditivity property: t(a+b) \leq t(a) + t(b) for all a, b \geq 0.
For example, strictly concave functions t(u) with t(0) = 0 are metric transforms. In general, one can check that t(u) is subadditive by verifying that the ratio t(u)/u is non-increasing.
The following transform t FR KL ( u ) converts the Fisher-Rao distance ρ FR to the Kullback-Leibler divergence D KL :
t_{\mathrm{FR}\to\mathrm{KL}}(u) := \log\left(\frac{1}{2} + \frac{1}{2}\cosh(\sqrt{2}\, u)\right),
where
\cosh(x) := \frac{e^x + e^{-x}}{2}.
The square root of that conversion function is subadditive since \sqrt{t_{\mathrm{FR}\to\mathrm{KL}}(u)}/u is non-increasing (see Figure 4) and \sqrt{t_{\mathrm{FR}\to\mathrm{KL}}(0)} = 0.
Since the Fisher-Rao distance is a metric distance and since t FR KL ( u ) is a metric transform, we conclude that
\rho_{\mathrm{KL}}\left[p_{l_1,s_1} : p_{l_2,s_2}\right] := \sqrt{D_{\mathrm{KL}}\left[p_{l_1,s_1} : p_{l_2,s_2}\right]} = \sqrt{t_{\mathrm{FR}\to\mathrm{KL}}\left(\rho_{\mathrm{FR}}\left[p_{l_1,s_1}, p_{l_2,s_2}\right]\right)}
is a metric distance. □
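Neither the ratio criterion nor the triangle inequality need be taken on faith: the sketch below (illustrative, not from the paper) samples the ratio √(t_{FR→KL}(u))/u on a geometric grid and tests the triangle inequality of ρ_KL on a few parameter triples.

```python
import math

def rho_kl(l1, s1, l2, s2):
    # square root of the closed-form KL divergence between Cauchy densities
    return math.sqrt(math.log(1.0 + ((s1 - s2) ** 2 + (l1 - l2) ** 2)
                              / (4.0 * s1 * s2)))

def t_fr_to_kl(u):
    # conversion from the Fisher-Rao distance to the KL divergence
    return math.log(0.5 + 0.5 * math.cosh(math.sqrt(2.0) * u))

def ratio(u):
    # sqrt(t(u)) / u, expected to be non-increasing on (0, infinity)
    return math.sqrt(t_fr_to_kl(u)) / u

# the ratio sampled on a geometric grid is non-increasing
grid = [0.01 * 1.5 ** k for k in range(24)]
rs = [ratio(u) for u in grid]
assert all(b <= a + 1e-12 for a, b in zip(rs, rs[1:]))
```

The assertions below also spot-check the triangle inequality and the symmetry of ρ_KL.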
A metric distance \rho(p,q) is said to be Hilbertian if there exists an embedding \phi(\cdot) into a Hilbert space H such that \rho(p,q) = \|\phi(p) - \phi(q)\|_H, where \|\cdot\|_H is the Hilbert norm. A metric is said to be Euclidean if there exists an embedding with associated norm \ell_2, the Euclidean norm. For example, the square root of the celebrated Jensen-Shannon divergence is a Hilbertian distance [62].
Let us prove the following:
Theorem 4.
The square root of the KL divergence between two Cauchy densities of the same scale family is a Hilbertian distance.
Proof. 
For Cauchy distributions with fixed location parameter l, the KL divergence of Equation (93) simplifies to:
D_{\mathrm{KL}}\left[p_{l,s_1} : p_{l,s_2}\right] = \log\frac{(s_1+s_2)^2}{4 s_1 s_2}.
We can rewrite this KL divergence as
D_{\mathrm{KL}}\left[p_{l,s_1} : p_{l,s_2}\right] = 2 \log\frac{A(s_1,s_2)}{G(s_1,s_2)},
where A(s_1,s_2) = \frac{s_1+s_2}{2} and G(s_1,s_2) = \sqrt{s_1 s_2} are the arithmetic mean and the geometric mean of s_1 and s_2, respectively. Then we use Lemma 3 of [66] to conclude that \sqrt{D_{\mathrm{KL}}\left[p_{l,s_1} : p_{l,s_2}\right]} is a Hilbertian metric distance.
Another proof consists in rewriting the KL divergence as a scaled Jensen-Bregman divergence [66,67]:
D_{\mathrm{KL}}\left[p_{l,s_1} : p_{l,s_2}\right] = 2\, \mathrm{JB}_F(s_1, s_2),
where
\mathrm{JB}_F(\theta_1, \theta_2) := \frac{F(\theta_1) + F(\theta_2)}{2} - F\left(\frac{\theta_1+\theta_2}{2}\right),
for a strictly convex generator F. We use F(\theta) = -\log\theta, i.e., the Burg information, yielding the Jensen-Burg divergence \mathrm{JB}_F. Then we use Corollary 1 of [66] (i.e., F is the cumulant of an infinitely divisible distribution) to conclude that \sqrt{\mathrm{JB}_F(\theta_1,\theta_2)} is a metric distance (and hence, \rho_{\mathrm{KL}}\left[p_{l,s_1}, p_{l,s_2}\right] = \sqrt{D_{\mathrm{KL}}\left[p_{l,s_1} : p_{l,s_2}\right]} = \sqrt{2\,\mathrm{JB}_F(s_1,s_2)} is a Hilbertian metric distance). □
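The identity D_KL = 2 JB_F with the Burg generator F(θ) = −log θ can be confirmed in a few lines. The names below are illustrative, not part of the paper:

```python
import math

def kl_scale_cauchy(s1, s2):
    # KL divergence between two Cauchy densities sharing the same location
    return math.log((s1 + s2) ** 2 / (4.0 * s1 * s2))

def jensen_burg(t1, t2):
    # Jensen-Bregman divergence for the Burg generator F(theta) = -log(theta)
    F = lambda t: -math.log(t)
    return (F(t1) + F(t2)) / 2.0 - F((t1 + t2) / 2.0)
```

The factor of two matches exactly on any pair of positive scales.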
The α-skewed Jensen-Bregman divergence is defined by
\mathrm{JB}_F^{\alpha}(\theta_1 : \theta_2) := \alpha F(\theta_1) + (1-\alpha) F(\theta_2) - F\left(\alpha\theta_1 + (1-\alpha)\theta_2\right),
and the maximal α -skewed Jensen-Bregman divergence is called the Jensen-Chernoff divergence:
\mathrm{JB}_F^{\alpha^*}(\theta_1 : \theta_2) := \max_{\alpha \in (0,1)} \mathrm{JB}_F^{\alpha}(\theta_1 : \theta_2).
The maximal exponent α * corresponds to the error exponent in Bayesian hypothesis testing on exponential family manifolds [57]. In general, the metrization of Jensen-Bregman divergence (and Jensen-Chernoff) was studied in [68].
Furthermore, by combining Corollary 1 of [66] with Theorem 3 of [67], we get the following proposition:
Proposition 2.
The square root of the Bhattacharyya divergence between two densities of an exponential family is a metric distance when the exponential family is infinitely divisible.
This proposition holds because the Bhattacharyya divergence
D_{\mathrm{Bhat}}[p, q] = -\log \int \sqrt{p(x)\, q(x)}\, dx,
between two parametric densities p ( x ) = p θ 1 ( x ) and q ( x ) = p θ 2 ( x ) of an exponential family with cumulant function F amounts to a Jensen-Bregman divergence [67] (Theorem 3 of [67]):
D_{\mathrm{Bhat}}\left[p_{\theta_1}(x), p_{\theta_2}(x)\right] = \mathrm{JB}_F(\theta_1, \theta_2).
Notice that Proposition 2 recovers the fact that the square root of the Bhattacharyya divergence between two zero-centered normal distributions is a metric (proved differently in [69]) since the set of normal distributions forms an infinitely divisible exponential family.
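For a concrete instance of this correspondence, the sketch below compares a numerical Bhattacharyya integral between two zero-centered normals against the Jensen-Bregman divergence of their cumulant function F(θ) = ½ log(π/(−θ)), with natural parameter θ = −1/(2σ²). The helper names are illustrative, not from the paper.

```python
import math

def bhattacharyya_numeric(sig1, sig2, n=200_000):
    # -log int sqrt(p q) dx for zero-mean normals, via x = tan(t)
    h = math.pi / n
    acc = 0.0
    for i in range(n):
        t = -math.pi / 2 + (i + 0.5) * h
        x = math.tan(t)
        p = math.exp(-x * x / (2.0 * sig1 * sig1)) / (sig1 * math.sqrt(2 * math.pi))
        q = math.exp(-x * x / (2.0 * sig2 * sig2)) / (sig2 * math.sqrt(2 * math.pi))
        acc += math.sqrt(p * q) * (1.0 + x * x)
    return -math.log(acc * h)

def jb_cumulant(sig1, sig2):
    # Jensen-Bregman divergence of the cumulant F(theta) = 0.5 log(pi / (-theta)),
    # natural parameter theta = -1 / (2 sigma^2)
    F = lambda th: 0.5 * math.log(math.pi / (-th))
    t1, t2 = -1.0 / (2.0 * sig1 ** 2), -1.0 / (2.0 * sig2 ** 2)
    return (F(t1) + F(t2)) / 2.0 - F((t1 + t2) / 2.0)
```

For σ_1 = 1 and σ_2 = 2 both quantities equal ½ log(5/4).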

4. Cauchy Voronoi Diagrams and Dual Cauchy Delaunay Complexes

Let us consider the Voronoi diagram [1] of a finite set P = \{ p_{\lambda_1}, \ldots, p_{\lambda_n} \} of n Cauchy distributions with location-scale parameters \lambda_i = (l_i, s_i) \in H for i \in \{1, \ldots, n\}. We shall consider the Fisher-Rao distance \rho_{\mathrm{FR}}, the KL divergence D_{\mathrm{KL}} and its square root metrization \rho_{\mathrm{KL}}, the chi square divergence D_{\chi^2}, and the flat divergence D_{\mathrm{flat}}.

4.1. The Hyperbolic Cauchy Voronoi Diagrams

Observe that the Voronoi diagram does not change under any strictly increasing function t of the dissimilarity measure (e.g., the square root function): \mathrm{Vor}_{t \circ D}(P) = \mathrm{Vor}_D(P). Thus, we get the following theorem:
Theorem 5.
The Cauchy Voronoi diagrams under the Fisher-Rao distance, the chi-square divergence and the Kullback-Leibler divergence all coincide, and amount to a hyperbolic Voronoi diagram on the corresponding location-scale parameters.
Proof. 
The KL divergence can be expressed as
D_{\mathrm{KL}}\left[p_{l_1,s_1} : p_{l_2,s_2}\right] = \log\left(1 + \frac{1}{2}\,\delta(l_1,s_1; l_2,s_2)\right).
Thus, both the D_{\mathrm{KL}} and \rho_{\mathrm{FR}} dissimilarities are expressed as strictly increasing functions of \delta (a synonym for the D_{\chi^2} divergence). Therefore the Voronoi bisectors between two Cauchy distributions p_{l_1,s_1} and p_{l_2,s_2} for D \in \{\rho_{\mathrm{FR}}, D_{\mathrm{KL}}, \sqrt{D_{\mathrm{KL}}}, D_{\chi^2}\} all amount to the same expression:
\mathrm{Bi}_D(p_{\lambda_1} : p_{\lambda_2}) = \left\{\lambda \in H : \delta(\lambda, \lambda_1) = \delta(\lambda, \lambda_2)\right\},
\mathrm{Bi}_D(p_{l_1,s_1} : p_{l_2,s_2}) = \left\{(l,s) \in H : \delta(l,s; l_1,s_1) = \delta(l,s; l_2,s_2)\right\}.
 □
It follows that we can calculate the Cauchy Voronoi diagram of n Cauchy distributions in optimal Θ(n log n) time by calculating the 2D hyperbolic Voronoi diagram [25,26] on the location-scale parameters (see Section 6 for details). Figure 5 displays the Voronoi diagram of a set of Cauchy distributions by its equivalent parameter hyperbolic Voronoi diagram in the Poincaré upper plane model, the Poincaré disk model, and the Klein disk model. Figure 6 shows the hyperbolic Voronoi diagram in the upper plane with colored Voronoi cells. A model of hyperbolic geometry is said to be conformal if it preserves angles, i.e., its underlying Riemannian metric tensor is a scalar positive function of the Euclidean metric tensor. The Poincaré disk model and the Poincaré upper plane model are both conformal models [30]. The Klein model is not conformal, except at the disk origin. Let D = \{p : \|p\| < 1\} denote the open unit disk domain for the Poincaré and Klein disk models. Indeed, the Riemannian metric corresponding to the Klein disk model is
ds^2_{\mathrm{Klein}}(p) = \frac{ds^2_{\mathrm{Eucl}}}{1 - \|p\|^2} + \frac{\langle p, \mathrm{d}p \rangle^2}{(1 - \|p\|^2)^2},
where \mathrm{d}p = (\mathrm{d}x, \mathrm{d}y) and ds_{\mathrm{Eucl}} = \sqrt{\mathrm{d}x^2 + \mathrm{d}y^2} denotes the Euclidean line element. Since ds^2_{\mathrm{Klein}}(0) = ds^2_{\mathrm{Eucl}}, we deduce that the Klein model is conformal at the origin (when measuring the angles between two vectors v_1 and v_2 of the tangent plane T_0).
The dual of the Voronoi diagram is called the Delaunay (simplicial) complex [4,5]: We build the Delaunay complex by drawing an edge between generators whose Voronoi cells are adjacent. For the ordinary Euclidean Delaunay complex with points in general position (i.e., no d+2 cospherical points in dimension d), the Delaunay complex triangulates the convex hull of the points [8,70]. Therefore it is called the Delaunay triangulation [1,3,8]. Figure 7 displays a Euclidean Voronoi diagram with its dual Delaunay triangulation.
Similarly, for the hyperbolic Voronoi diagram, we construct the hyperbolic Delaunay complex by drawing a hyperbolic geodesic edge between any two generators whose Voronoi cells are adjacent. However, we do not necessarily obtain anymore a geodesic triangulation of the hyperbolic geodesic convex hull but rather a simplicial complex, hence the name hyperbolic Delaunay complex [5,71,72]. In extreme cases, the hyperbolic Delaunay complex has a tree structure. See Figure 8 for examples of a hyperbolic Delaunay triangulation and a hyperbolic Delaunay complex which is not a triangulation. In fact, hyperbolic geometry is very well-suited for isometrically embedding weighted tree graphs with low distortion [73]. Hyperbolic embeddings of hierarchical structures [74] have become a hot topic in machine learning.
Let us now prove that these Cauchy hyperbolic Voronoi/Delaunay structures are Fisher orthogonal:
Theorem 6.
The Cauchy Voronoi diagram is Fisher orthogonal to the Cauchy Delaunay complex.
Proof. 
It is enough to prove that the corresponding hyperbolic geodesic γ ( p λ 1 , p λ 2 ) is orthogonal to the bisector Bi ( p λ 1 : p λ 2 ) . The distance in the Klein disk model is
\rho_{\mathrm{Klein}}(p,q) = \rho_K(p,q) := \operatorname{arccosh}\left(\frac{1 - \langle p, q \rangle}{\sqrt{(1 - \|p\|^2)(1 - \|q\|^2)}}\right).
The equation of the hyperbolic bisector in the Klein disk model [25] is
\mathrm{Bi}_{\rho_{\mathrm{Klein}}}(\lambda_1 : \lambda_2) = \left\{\lambda \in D : \left\langle \lambda, \frac{\lambda_1}{\sqrt{1-\|\lambda_1\|^2}} - \frac{\lambda_2}{\sqrt{1-\|\lambda_2\|^2}} \right\rangle + \frac{1}{\sqrt{1-\|\lambda_2\|^2}} - \frac{1}{\sqrt{1-\|\lambda_1\|^2}} = 0\right\}.
Using a Möbius transformation [25] (i.e., a hyperbolic "rigid motion"), we may assume without loss of generality that \lambda_2 = -\lambda_1. It follows that the bisector equation writes simply as
\mathrm{Bi}_{\rho_{\mathrm{Klein}}} = \left\{\lambda : \frac{2}{\sqrt{1-\|\lambda_1\|^2}} \langle \lambda, \lambda_1 \rangle = 0\right\}.
Since the Klein disk model is conformal at the origin, we deduce from Equation (130) that we have \gamma(p_{\lambda_1}, p_{\lambda_2}) \perp \mathrm{Bi}(p_{\lambda_1} : p_{\lambda_2}). □
Figure 9 displays two bisectors with their corresponding geodesics in the Klein model. We check that the Euclidean angles are deformed when the intersection point is not at the disk origin. Section 6 provides further details for the efficient construction of the hyperbolic Voronoi diagram in the Klein model.
Remark 4.
The hyperbolic Cauchy Voronoi diagram can be used for classification tasks in statistics, as originally motivated by C.R. Rao in his celebrated paper [17]: Let p_{\lambda_1}, \ldots, p_{\lambda_n} be n Cauchy distributions, and x_1, \ldots, x_s be s independent and identically distributed samples drawn from a Cauchy distribution p_\lambda. We can estimate the location-scale parameter \hat{\lambda} from the s samples [75], and then decide the multiple test hypothesis H_i : p_\lambda = p_{\lambda_i} by choosing the hypothesis H_i such that \rho_{\mathrm{FR}}(p_{\lambda_i}, p_{\hat{\lambda}}) \leq \rho_{\mathrm{FR}}(p_{\lambda_j}, p_{\hat{\lambda}}) for all j \in \{1, \ldots, n\}. This classification task amounts to performing a nearest neighbor query in the Fisher-Rao hyperbolic Cauchy Voronoi diagram. Hypothesis testing for comparing location parameters based on Rao's distance is investigated in [76].
Figure 10 displays the hyperbolic Cauchy Voronoi diagram induced by 300 Cauchy distribution generators.
Notice that it is possible to construct a set of points such that all hyperbolic Voronoi cells for that point set are unbounded. See Figure 11 for such an example.
The ordinary Euclidean Delaunay triangulation satisfies the empty sphere property [4,77]: That is, the circumscribing spheres passing through the vertices of the triangles of the Delaunay complex are empty of any other Voronoi site. This property still holds for the hyperbolic Delaunay complex, which is obtained by a filtration of the ordinary Euclidean Delaunay triangulation in [5]. A hyperbolic ball in the Poincaré conformal disk model or the upper plane model has the shape of a Euclidean ball with displaced center [71]. Figure 12 displays the Delaunay complex with the empty sphere property in the Poincaré and Klein disk models. The centers of these circumscribing spheres are located at the T-junctions of the Voronoi diagrams.

4.2. The Dual Voronoi Diagrams on the Cauchy Dually Flat Manifold

The dual Cauchy Voronoi diagrams with respect to the flat divergence D_{\mathrm{flat}} (and the dual reverse flat divergence D_{\mathrm{flat}}^* which corresponds to a dual Bregman-Tsallis divergence) of Section 2.4 amount to calculating 2D dual Bregman Voronoi diagrams [15,16]. We get the following dual bisectors: The primal bisector with respect to the flat divergence is:
\mathrm{Bi}_{D_{\mathrm{flat}}}(p_{\lambda_1} : p_{\lambda_2}) = \left\{p_\lambda : D_{\mathrm{flat}}[p_{\lambda_1} : p_\lambda] = D_{\mathrm{flat}}[p_{\lambda_2} : p_\lambda]\right\},
= \left\{\lambda : \delta(l_1,s_1; l,s) = \delta(l_2,s_2; l,s)\right\}.
Thus, this primal bisector with respect to the flat divergence corresponds to the hyperbolic bisector of the Fisher-Rao distance/chi square/ KL divergences:
\mathrm{Bi}_{D_{\mathrm{flat}}}(p_{\lambda_1} : p_{\lambda_2}) = \mathrm{Bi}_{\rho_{\mathrm{FR}}}(p_{\lambda_1} : p_{\lambda_2}) = \mathrm{Bi}_{D_{\mathrm{KL}}}(p_{\lambda_1} : p_{\lambda_2}) = \mathrm{Bi}_{D_{\chi^2}}(p_{\lambda_1} : p_{\lambda_2}).
The dual bisector with respect to the dual flat divergence (reverse Bregman-Tsallis divergence) is:
\mathrm{Bi}_{D_{\mathrm{flat}}^*}(p_{\lambda_1} : p_{\lambda_2}) = \left\{p_\lambda : D_{\mathrm{flat}}[p_\lambda : p_{\lambda_1}] = D_{\mathrm{flat}}[p_\lambda : p_{\lambda_2}]\right\},
= \left\{\lambda : \|\lambda - \lambda_1\| = \|\lambda - \lambda_2\|\right\}.
That is, the dual bisector corresponds to an ordinary Euclidean bisector:
\mathrm{Bi}_{D_{\mathrm{flat}}^*}(p_{\lambda_1} : p_{\lambda_2}) = \mathrm{Bi}_{\rho_E}(p_{\lambda_1}, p_{\lambda_2}).
Notice that \mathrm{Bi}_{D_{\mathrm{flat}}^*}(p_{\lambda_1} : p_{\lambda_2}) = \mathrm{Bi}_{D_{\mathrm{flat}}^*}(p_{\lambda_2} : p_{\lambda_1}).
To summarize, one primal bisector coincides with the Fisher-Rao bisector while the dual bisector amounts to the ordinary Euclidean bisector.
Theorem 7.
The dual Cauchy Voronoi diagrams with respect to the flat divergence can be calculated efficiently in Θ ( n log n ) -time.
The construction of 2D Bregman Voronoi diagrams is described in [15].

4.3. The Cauchy Voronoi Diagrams with Respect to α -Divergences

The dual bisectors with respect to the α-divergences between any two parametric probability densities p_{\lambda_1}(x) and p_{\lambda_2}(x) are
\mathrm{Bi}_{I_\alpha}(p_{\lambda_1} : p_{\lambda_2}) = \left\{p_\lambda : I_\alpha[p_{\lambda_1} : p_\lambda] = I_\alpha[p_{\lambda_2} : p_\lambda]\right\},
= \left\{\lambda : C_\alpha(p_{\lambda_1}; p_\lambda) = C_\alpha(p_{\lambda_2}; p_\lambda)\right\},
and
\mathrm{Bi}_{I_\alpha^*}(p_{\lambda_1} : p_{\lambda_2}) = \left\{p_\lambda : I_\alpha[p_\lambda : p_{\lambda_1}] = I_\alpha[p_\lambda : p_{\lambda_2}]\right\},
= \mathrm{Bi}_{I_{1-\alpha}}(p_{\lambda_1} : p_{\lambda_2}).
It is an open problem to prove when the dual α-bisectors coincide for the Cauchy family. We have shown that this is the case for the χ²-divergence and the KL divergence. In theory, the Risch semi-algorithm [78] allows one to answer whether a definite integral has a closed-form formula or not. However, the Risch semi-algorithm is only a semi-algorithm, as it requires an oracle to check whether some mathematical expressions are equivalent to zero or not.

5. Conclusions

In this paper, we have considered the construction of Voronoi diagrams of finite sets of Cauchy distributions with respect to some common statistical distances. Since statistical distances can potentially be asymmetric, we defined the dual Voronoi diagrams with respect to the forward and reverse/dual statistical distances. From the viewpoint of information geometry [7], we have reported the construction of two types of geometry on the Cauchy manifold: (1) The invariant α-geometry equipped with the Fisher metric tensor g_FR and the skewness tensor T from which we can build a family of pairs of torsion-free affine connections coupled with the metric, and (2) a dually flat geometry induced by a Bregman generator defined by the free energy F_q of the q-Gaussians (here, instantiated to q = 2 when dealing with the Cauchy family). The metric tensor of the latter geometry is called the q-Fisher information metric, and is a Riemannian conformal metric of the Fisher information metric. We have shown that the Fisher-Rao distance amounts to a scaled hyperbolic distance in the Poincaré upper plane model (Proposition 1), and that all Amari's α-geometries [7] coincide with the Fisher-Rao geometry since the cubic tensor vanishes, thus yielding a hyperbolic manifold of negative constant scalar curvature κ = −2 for the Cauchy α-geometric manifolds. We noticed that the Fisher-Rao distance and the KL divergence can be expressed as strictly increasing functions of the chi square divergence. Then we explained how to conformally flatten the curved Fisher-Rao geometry to obtain a dually flat space where the flat divergence amounts to a canonical Bregman divergence built from Tsallis' quadratic entropy (Theorem 1). We reported the Hessian metrics of the dual potential functions of the dually flat space, and showed that there are alternative choices for building Hessian structures [52].
Table 1 summarizes the various closed-form formulas of the statistical dissimilarities obtained for the Cauchy family. We proved that the square root of the KL divergence between any two Cauchy distributions is a metric distance (Theorem 3) in general, and more precisely a Hilbertian metric for the scale Cauchy families (Theorem 4). It follows that the Cauchy Voronoi diagram for the Fisher-Rao distance coincides with the Voronoi diagram with respect to the KL divergence or the chi square divergence (Figure 13). We showed how to build this hyperbolic Cauchy diagram from an equivalent hyperbolic Voronoi diagram on the corresponding location-scale parameters (see also Section 6). Then we proved that the dual hyperbolic Cauchy Delaunay complex is Fisher orthogonal to the Fisher-Rao hyperbolic Cauchy Voronoi diagram (Theorem 6). The dual Voronoi diagrams with respect to the dual flat divergences can be built from the corresponding dual Bregman-Tsallis divergences, with the primal Voronoi diagram coinciding with the hyperbolic Voronoi diagram and the dual diagram coinciding with the ordinary Euclidean Voronoi diagram (Figure 13). These results are particular to the special case of the Cauchy location-scale family, and do not hold in general for arbitrary location-scale families since the cubic tensor may not vanish [28] and the KL divergence is usually asymmetric (e.g., the Gaussian location-scale family). However, the Fisher-Rao geometry of any location-scale family amounts, after a potential rescaling, to hyperbolic geometry [27,79].

6. Klein Hyperbolic Voronoi Diagram from a Clipped Power Diagram

We concisely recall the efficient construction of the hyperbolic Voronoi diagram in the Klein disk model [25]. Let P = \{p_1, \ldots, p_n\} be a set of n points in the d-dimensional open unit ball domain D = \{x \in \mathbb{R}^d : \|x\|_2 < 1\}, where \|\cdot\|_2 denotes the Euclidean \ell_2-norm. The hyperbolic distance between two points p and q is expressed in the Klein model as follows:
\rho_K(p,q) := \operatorname{arccosh}\left(\frac{1 - \langle p, q \rangle}{\sqrt{(1 - \|p\|_2^2)(1 - \|q\|_2^2)}}\right).
It follows that the Klein bisector between any two points in the Klein disk is a hyperplane (affine equation) clipped to D:
\mathrm{Bi}_{\rho_K}(\lambda_1 : \lambda_2) = \left\{\lambda \in D : \left\langle \lambda, \frac{\lambda_1}{\sqrt{1-\|\lambda_1\|_2^2}} - \frac{\lambda_2}{\sqrt{1-\|\lambda_2\|_2^2}} \right\rangle + \frac{1}{\sqrt{1-\|\lambda_2\|_2^2}} - \frac{1}{\sqrt{1-\|\lambda_1\|_2^2}} = 0\right\}.
The Klein bisector is a hyperplane (i.e., line in 2D) restricted to the disk domain D . A Voronoi diagram is said to be affine [8] when all bisectors are hyperplanes. It is known that affine Voronoi diagrams can be constructed from equivalent power diagrams [8]. Thus, the Klein hyperbolic Voronoi diagram is equivalent to a clipped power diagram:
\mathrm{Vor}_{\rho_K}(P) = \mathrm{Vor}_{D_{\mathrm{PD}}}(S) \cap D,
where
D_{\mathrm{PD}}(\sigma, x) := \|x - c\|^2 - w,
denotes the power "distance" between a point x (and more generally a weighted point [80] when the weight can be negative) and a sphere \sigma = (c, w), and S = \{\sigma_1 = (c_1, w_1), \ldots, \sigma_n = (c_n, w_n)\} is the equivalent set of weighted points. The power distance is a signed distance since we have the following property: D_{\mathrm{PD}}(\sigma, x) < 0 iff x \in \mathrm{int}(\sigma), i.e., the point x falls inside the sphere \sigma = \{x : \|x - c\|_2^2 = w\}. The power bisector is a hyperplane of equation
\mathrm{Bi}_{\mathrm{PD}}(\sigma_i, \sigma_j) = \left\{x \in \mathbb{R}^d : 2\langle x, c_j - c_i \rangle + \|c_i\|^2 - \|c_j\|^2 + w_j - w_i = 0\right\}.
Notice that by shifting all weights by a predefined constant a, we obtain the same power bisector since (w_j + a) - (w_i + a) = w_j - w_i is kept invariant. Thus, we may consider without loss of generality that all weights are non-negative, and that the weighted points correspond to spheres with non-negative squared radius r_i^2 = w_i.
By identifying Equation (142) with Equation (145), we get the following equivalent spheres σ i = ( c i , w i ) [25] for the points in the Klein disk:
c_i = \frac{p_i}{2\sqrt{1 - \|p_i\|_2^2}},
w_i = \frac{\|p_i\|_2^2}{4(1 - \|p_i\|_2^2)} - \frac{1}{\sqrt{1 - \|p_i\|_2^2}}.
We can then shift all weights by the constant a = -\min_{i \in \{1, \ldots, n\}} w_i so that w_i' = w_i + a \geq 0.
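The Klein-to-power-sphere conversion can be exercised directly: lift two Klein points to weighted spheres, pick a point on their power bisector, and check that it is equidistant in the Klein metric. A sketch with illustrative helper names (not from the paper):

```python
import math

def klein_distance(p, q):
    # arccosh((1 - <p,q>) / sqrt((1 - |p|^2)(1 - |q|^2)))
    dot = p[0] * q[0] + p[1] * q[1]
    a = 1.0 - (p[0] ** 2 + p[1] ** 2)
    b = 1.0 - (q[0] ** 2 + q[1] ** 2)
    return math.acosh((1.0 - dot) / math.sqrt(a * b))

def klein_point_to_power_sphere(p):
    # c = p / (2 sqrt(1 - |p|^2)),  w = |p|^2 / (4 (1 - |p|^2)) - 1 / sqrt(1 - |p|^2)
    n2 = p[0] ** 2 + p[1] ** 2
    g = math.sqrt(1.0 - n2)
    c = (p[0] / (2.0 * g), p[1] / (2.0 * g))
    w = n2 / (4.0 * (1.0 - n2)) - 1.0 / g
    return c, w

def point_on_power_bisector(sphere1, sphere2):
    # power bisector: 2<x, c2 - c1> + |c1|^2 - |c2|^2 + w2 - w1 = 0;
    # return the point of the line closest to the origin
    (c1, w1), (c2, w2) = sphere1, sphere2
    a = (2.0 * (c2[0] - c1[0]), 2.0 * (c2[1] - c1[1]))
    b = (c1[0] ** 2 + c1[1] ** 2) - (c2[0] ** 2 + c2[1] ** 2) + w2 - w1
    n2 = a[0] ** 2 + a[1] ** 2
    return (-b * a[0] / n2, -b * a[1] / n2)
```

For two sample Klein points, the returned bisector point lies inside the unit disk and is Klein-equidistant to both generators, confirming the equivalence of the two bisectors.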
Thus, the Klein hyperbolic Voronoi diagram is a power diagram clipped to the unit ball D [80,81,82]. In computational geometry [4], the power diagram can be calculated from the intersection of n halfspaces by lifting the spheres \sigma_i to corresponding halfspaces H_i^+ of \mathbb{R}^{d+1} as follows: Let \mathcal{F} = \{(x,z) \in \mathbb{R}^{d+1} : z \geq \sum_{i=1}^d x_i^2\} be the epigraph of the paraboloid function, and \partial\mathcal{F} denote its boundary. We lift a point x \in \mathbb{R}^d to \partial\mathcal{F} using the upper arrow operator x^{\uparrow} = (x, z = \sum_{i=1}^d x_i^2), and we project orthogonally a point (x, z) of \mathbb{R}^{d+1} by dropping its last z-coordinate, so that (x^{\uparrow})^{\downarrow} = x. Now, when we lift a sphere \sigma = (c, w) to \partial\mathcal{F}, the set of lifted points \sigma^{\uparrow} all belong to a hyperplane H_\sigma, called the polar hyperplane, of equation:
H_\sigma : z = 2\langle c, x \rangle - \langle c, c \rangle + w.
Let H_\sigma^+ denote the upper halfspace with bounding hyperplane H_\sigma: H_\sigma^+ : z \geq 2\langle c, x \rangle - \langle c, c \rangle + w. Then one can show [4] that \mathrm{Vor}_{D_{\mathrm{PD}}}(S) is obtained as the vertical projection ↓ of the intersection of all these polar halfspaces H_i^+ with \partial\mathcal{F}:
\mathrm{Vor}_{D_{\mathrm{PD}}}(S) = \left(\bigcap_{i=1}^n H_i^+ \cap \partial\mathcal{F}\right)^{\downarrow}.
Transforming back and forth non-vertical ( d + 1 ) -dimensional hyperplanes to corresponding d-dimensional spheres allows one to design various efficient algorithms, e.g., computing the intersection or the union of spheres [4], useful primitives for molecular chemistry [1].
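The sphere-halfspace correspondence rests on a one-line equivalence: a point lies strictly inside σ = (c, w) exactly when its lifted point lies strictly below the polar hyperplane H_σ. A minimal 2D check (illustrative names, not from the paper):

```python
def below_polar(x, c, w):
    # is the lifted point (x, |x|^2) strictly below H_sigma: z = 2<c,x> - <c,c> + w ?
    z = x[0] ** 2 + x[1] ** 2
    return z < 2.0 * (c[0] * x[0] + c[1] * x[1]) - (c[0] ** 2 + c[1] ** 2) + w

def inside_sphere(x, c, w):
    # is x strictly inside the sphere |x - c|^2 = w ?
    return (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2 < w
```

Expanding |x − c|² < w and moving 2⟨c,x⟩ − ⟨c,c⟩ + w to one side shows the two predicates are algebraically identical.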
Let H D denote the lower halfspace (containing the origin ( x = 0 , z = 0 ) ) supported by the polar hyperplane associated with the boundary sphere of the disk domain D . Computing the clipped power diagram Vor D PD ( S ) D can be done equivalently as follows:
\mathrm{Vor}_{D_{\mathrm{PD}}}(S) \cap D = \left(\bigcap_{i=1}^n H_i^+ \cap \partial\mathcal{F} \cap H_D\right)^{\downarrow},
= \left(\left(\bigcap_{i=1}^n H_i^+ \cap H_D\right) \cap \partial\mathcal{F}\right)^{\downarrow},
using the commutative property of the set intersection.
The advantage of the method of Equation (150) is that we begin to clip the power diagram using H_D before explicitly calculating it. Indeed, we first compute the intersection polytope of the n+1 halfspaces: P_K := \bigcap_{i=1}^n H_i^+ \cap H_D. Then we project down orthogonally the intersection of P_K with \partial\mathcal{F} to get the clipped power diagram equivalent to the hyperbolic Klein Voronoi diagram:
\mathrm{Vor}_{\rho_K}(P) = \left(P_K \cap \partial\mathcal{F}\right)^{\downarrow}.
By doing so, we potentially reduce the algorithmic complexity by avoiding the computation of some of the vertices of P_{\mathrm{PD}} := \bigcap_{i=1}^n H_i^+ whose orthogonal projections fall outside the domain D.
More generally, a Bregman Voronoi diagram [15] can be calculated equivalently as a power diagram (an intersection of (d+1)-dimensional halfspaces) using an arbitrary smooth and strictly convex potential function F instead of the paraboloid potential function of Euclidean geometry [25]. The non-empty intersection of halfspaces can in turn be calculated as an equivalent convex hull [4]. Thus, we can compute in practice the hyperbolic Voronoi diagram in the Klein model using the Quickhull algorithm [83].

7. Symbolic Calculations with a Computer Algebra System

We use the open source computer algebra system Maxima (freely available at http://maxima.sourceforge.net/) to calculate the gradient (partial derivatives) and Hessian of the deformed log-normalizer, and some definite integrals based on the Cauchy location-scale densities.
/* Written in Maxima */
assume(s>0);
CauchyStd(x) := (1/(%pi*(x**2+1)));
Cauchy(x,l,s) := (s/(%pi*((x-l)**2+s**2)));
/* check that we get a probability density (=1) */
integrate(Cauchy(x,l,s),x,-inf,inf);
/* calculate the deformed log-normalizer */
logC(u):=1-(1/u);
logC(Cauchy(x,l,s));
ratsimp(%);
/* calculate partial derivatives of the deformed log-normalizer */
theta(l,s):=[2*%pi*l/s,-%pi/s];
F(theta):=(-%pi**2/theta[2])-(theta[1]**2/(4*theta[2]))-1;
derivative(F(theta),theta[1],1);
derivative(F(theta),theta[2],1);
/* calculate definite integrals */
assume(s1>0);
assume(s2>0);
integrate(Cauchy(x,l2,s2)**2,x,-inf,inf);
integrate(Cauchy(x,l2,s2)**2/Cauchy(x,l1,s1),x,-inf,inf);
We calculate the function θ ( η ) by solving the following system of equations:
solve([-t1/(2*t2)=e1, (%pi/t2)**2+ (t1/t2)**2/4=e2],[t1, t2]);
The Hessian metrics of the dual potential functions F and F * (denoted by G in the code) can be calculated as follows:
F(theta):=(-%pi**2/theta[2])-(theta[1]**2/(4*theta[2]))-1;
hessian(F(theta),[theta[1], theta[2]]);
G(eta):=1-2*%pi*sqrt(eta[2]-eta[1]**2);
hessian(G(eta),[eta[1], eta[2]]);
The ratio \sqrt{t_{\mathrm{FR}\to\mathrm{KL}}(u)}/u used in the proof of Theorem 3 can be plotted using the following commands:
t(u):=sqrt(log((1/2)+(1/2)*cosh(sqrt(2)*u)));
plot2d(t(u)/u,[u,0,10]);
Symbolic calculations for the α -Chernoff coefficient between two Cauchy distributions prove that the α -Chernoff coefficient is symmetric for α = 3 and α = 4 as exemplified by the Maxima code below:
assume(s1>0);
assume(s2>0);
assume(s>0);
CauchyStd(x) := (1/(%pi*(x**2+1)));
Cauchy(x,l,s) := (s/(%pi*((x-l)**2+s**2)));
/* closed-form */
a: 3;
integrate((Cauchy(x,l2,s2)**a) * (Cauchy(x,l1,s1)**(1-a)),x,-inf,inf);
term1(l1,s1,l2,s2):=ratsimp(%);
integrate((Cauchy(x,l2,s2)**(1-a)) * (Cauchy(x,l1,s1)**(a)),x,-inf,inf);
term2(l1,s1,l2,s2):=ratsimp(%);
/* Is the a-divergence symmetric? */
term1(l1,s1,l2,s2)-term2(l1,s1,l2,s2);
ratsimp(%);

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Figure 1. Euclidean Voronoi diagram of a set of generators (black squares) in the plane with colored Voronoi cells (left). Euclidean Voronoi diagrams (red) and their dual Delaunay triangulations (blue) for n = 8 points (middle) and n = 256 points (right).
Figure 2. Information-geometric structures on the Cauchy manifold and their relationships.
Figure 3. Plot of the chi-to-Fisher-Rao conversion function: A strictly increasing function.
Figure 4. Plot of the function t_FR^KL(u)/u.
Figure 5. Hyperbolic Voronoi diagram of a set of Cauchy distributions in the Poincaré upper plane (top), the Poincaré disk model (bottom left), and the Klein disk model (bottom right).
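As a concrete illustration of the three hyperbolic models used in Figure 5, the sketch below converts a Cauchy parameter pair (l, s), viewed as the point z = l + i s of the Poincaré upper half-plane, to the Poincaré disk and then to the Klein disk. This is our own minimal sketch using the standard Cayley transform and the standard Poincaré-to-Klein map; the helper names are illustrative, not from the article.

```python
def upper_to_poincare_disk(l, s):
    # Cayley transform: maps the upper half-plane point z = l + i*s
    # (with scale s > 0) into the open unit disk (Poincare disk model).
    z = complex(l, s)
    return (z - 1j) / (z + 1j)

def poincare_to_klein(w):
    # Maps a Poincare-disk point to the Klein-disk point on the same ray
    # from the origin; geodesics become straight chords in the Klein model.
    return 2 * w / (1 + abs(w) ** 2)

# The standard Cauchy distribution (l = 0, s = 1) maps to the disk origin
w = upper_to_poincare_disk(0.0, 1.0)
k = poincare_to_klein(w)
print(w, k)  # both are 0
```

Both maps preserve the disk origin, which is why a bisector through the origin looks Euclidean-orthogonal in the Klein model as well.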
Figure 6. A hyperbolic Cauchy Voronoi diagram of a finite set of Cauchy distributions (black square generators, colored Voronoi cells, and black cell borders).
Figure 7. Duality between the ordinary Euclidean Voronoi diagram and the Delaunay structures: The Voronoi diagram partitions the space into Voronoi proximity cells. The Delaunay complex triangulates the convex hull of the generators. A Delaunay edge is drawn between the generators of adjacent Voronoi cells. Observe that in Euclidean geometry the Delaunay edges cut the corresponding Voronoi bisectors orthogonally.
Figure 8. Examples of hyperbolic Voronoi Delaunay complexes drawn in the Klein model: a Delaunay complex that triangulates the convex hull, yielding the Delaunay triangulation (top left), and a Delaunay complex that does not triangulate the convex hull (top right). Bottom: A hyperbolic Voronoi diagram and its dual Delaunay complex displayed in the Poincaré disk model (left) and in the Klein disk model (right).
Figure 9. In hyperbolic geometry, the Voronoi bisector between two generators is orthogonal to the geodesic linking them. The top figures display a (bisector, geodesic) pair in the Klein model (left) and the same pair in the Poincaré model (right). When viewed in the non-conformal Klein model, the bisector does not intersect the geodesic orthogonally with respect to the Euclidean geometry (left), except when the intersection point is at the disk origin (bottom right).
Figure 10. Equivalent hyperbolic Voronoi diagrams and dual Delaunay complexes of a set of Cauchy distributions in the Poincaré upper plane (left), the Poincaré disk model (middle), and the Klein disk model (right). Top row: n = 24 Cauchy distributions; middle row: n = 1024 distributions; bottom row: a quasi-regular set of n = 25 Cauchy distributions.
Figure 11. A hyperbolic Voronoi diagram with all unbounded Voronoi cells.
Figure 12. Delaunay triangles of the hyperbolic Delaunay complex satisfy the empty circumscribing sphere property, with the empty sphere centers located at the Voronoi T-junction vertices. The hyperbolic spheres are displayed as ordinary Euclidean spheres (with displaced centers) in the Poincaré model (left column) and as ellipsoids (with displaced centers) in the Klein model (right column).
Figure 13. Voronoi diagrams of a set of Cauchy distributions with respect to the Fisher-Rao (FR) distance ρ FR , the Kullback-Leibler (KL) divergence D KL , the χ 2 -divergence D χ 2 , and the asymmetric Bregman-Tsallis flat divergence D flat .
Table 1. Summary of the main closed-form formulas for the statistical distances between Cauchy densities and their induced Voronoi diagrams.
Formula | Induced Voronoi diagram
D_χ²[p_{l1,s1}, p_{l2,s2}] = ((l2 − l1)² + (s2 − s1)²) / (2 s1 s2) | Vor_{D_χ²}: hyperbolic Voronoi
ρ_FR[p_{l1,s1}, p_{l2,s2}] = (1/√2) arccosh(1 + D_χ²[p_{l1,s1}, p_{l2,s2}]) | Vor_{ρ_FR}: hyperbolic Voronoi
D_KL[p_{l1,s1}, p_{l2,s2}] = log(1 + (1/2) D_χ²[p_{l1,s1}, p_{l2,s2}]) | Vor_{D_KL}: hyperbolic Voronoi
ρ_KL[p_{l1,s1}, p_{l2,s2}] = √(D_KL[p_{l1,s1}, p_{l2,s2}]) (metric) | Vor_{ρ_KL}: hyperbolic Voronoi
D_flat[p_{l1,s1}, p_{l2,s2}] = 2π s2 D_χ²[p_{l1,s1}, p_{l2,s2}] | Bregman Voronoi: Vor_{D_flat}: hyperbolic Voronoi, Vor_{D_flat*}: Euclidean Voronoi
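The closed-form formulas of Table 1 can be sanity-checked numerically. The sketch below is an illustrative cross-check only (it assumes NumPy; the function names and grid parameters are ours): the Kullback-Leibler divergence obtained by integrating p1 log(p1/p2) should match log(1 + D_χ²/2).

```python
import numpy as np

def cauchy_pdf(x, l, s):
    # Cauchy density with location l and scale s
    return s / (np.pi * ((x - l) ** 2 + s ** 2))

def chi2_closed(l1, s1, l2, s2):
    # D_chi2 from Table 1 (symmetric in the two Cauchy densities)
    return ((l2 - l1) ** 2 + (s2 - s1) ** 2) / (2.0 * s1 * s2)

def kl_closed(l1, s1, l2, s2):
    # D_KL = log(1 + D_chi2 / 2) from Table 1
    return np.log1p(chi2_closed(l1, s1, l2, s2) / 2.0)

def kl_numerical(l1, s1, l2, s2, n=400001):
    # Integrate p1 * log(p1/p2) with x = tan(theta) to compactify the real line
    theta = np.linspace(-np.pi / 2 + 1e-7, np.pi / 2 - 1e-7, n)
    x = np.tan(theta)
    jac = 1.0 / np.cos(theta) ** 2  # dx/dtheta
    p1, p2 = cauchy_pdf(x, l1, s1), cauchy_pdf(x, l2, s2)
    f = p1 * np.log(p1 / p2) * jac
    # trapezoidal rule on the uniform theta grid
    return float(np.sum((f[1:] + f[:-1]) * np.diff(theta)) / 2.0)

print(kl_closed(0.0, 1.0, 1.0, 2.0), kl_numerical(0.0, 1.0, 1.0, 2.0))
```

For the pair (l1, s1) = (0, 1) and (l2, s2) = (1, 2), D_χ² = 1/2, so both values should agree with log(5/4) ≈ 0.2231.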

Nielsen, F. On Voronoi Diagrams on the Information-Geometric Cauchy Manifolds. Entropy 2020, 22, 713. https://doi.org/10.3390/e22070713
