# On Monotone Embedding in Information Geometry

## Abstract

We demonstrate that the (F, G)-geometry investigated by Harsha and Subrahamanian Moosath (2014) [1] is equivalent to Zhang's (2004) [2] (ρ, τ)-geometry, in which an arbitrary monotone embedding ρ and an auxiliary strictly convex function f induce the Riemannian metric and the family of α-connections. We further show that the deformed logarithm log_ϕ associated with an arbitrary strictly increasing function ϕ, as investigated by Naudts (2004) [3], arises naturally from identifying ϕ with ρ and with a proper choice of the auxiliary function f as a part of Zhang's theory: the ϕ-logarithm (the log_ϕ function) is recovered with the identification ρ = ϕ, τ = log_ϕ, with the ϕ-exponential exp_ϕ given by the associated convex function linking the two representations.

## 1. Equivalence of (F, G)-Geometry to Zhang’s (2004) [2] (ρ, τ)-Geometry

#### 1.1. Amari’s α-Geometry and α-Embedding

The classical approach to the manifold M_Θ = {p(·|θ), θ ∈ Θ ⊆ ℝⁿ} of parametric probability functions p (probability density or probability distributions) is through the Fisher–Rao metric g_ij as its Riemannian metric:

$$g_{ij}(\theta) = E_\mu\left\{ p\, \partial_i \log p\, \partial_j \log p \right\}, \qquad (1)$$

together with a family of affine connections, the α-connections Γ^(α) (α ∈ ℝ):

$$\Gamma_{ij,k}^{(\alpha)}(\theta) = E_\mu\left\{ p \left( \partial_i \partial_j \log p + \frac{1-\alpha}{2}\, \partial_i \log p\, \partial_j \log p \right) \partial_k \log p \right\}. \qquad (2)$$

Here, E_µ denotes the expectation with respect to a background measure µ of the random variable denoted by ζ:

$$E_\mu\{\zeta\} = \int \zeta(x)\, \mathrm{d}\mu(x).$$

Γ^(1) is frequently called the e-connection (α = 1) and Γ* ≡ Γ^(−1) the m-connection (α = −1). A Riemannian manifold M_Θ with its metric g and the family of α-connections Γ^(α) in the form of (1) and (2) has been called α-geometry. Amari's α-geometry can be specified in terms of a symmetric (0, 2)-tensor g_ij (the Fisher–Rao metric) and a totally symmetric (0, 3)-tensor T_ijk (sometimes called the Amari–Chentsov tensor), which is linked to the α-connections via:

$$\Gamma_{ij,k}^{(\alpha)} = \Gamma_{ij,k}^{(0)} - \frac{\alpha}{2}\, T_{ijk}.$$

The α-geometry is closely tied to the α-embedding of probability functions, the representation l^(α): ℝ⁺ → ℝ:

$$l^{(\alpha)}(t) = \begin{cases} \dfrac{2}{1-\alpha}\, t^{(1-\alpha)/2}, & \alpha \neq 1, \\ \log t, & \alpha = 1. \end{cases}$$

#### 1.2. Zhang (2004) [2] Extension: ρ-Embedding and (ρ, τ)-Geometry

Let f: ℝ → ℝ be a smooth and strictly convex function, with its convex conjugate f* given by:

$$f^*(t) = t\, (f')^{-1}(t) - f\!\left( (f')^{-1}(t) \right). \qquad (9)$$

Following [2]:

- we call the ρ-representation of a probability function p the mapping p ↦ ρ(p);
- we say the τ-representation of the probability function, p ↦ τ(p), is conjugate to the ρ-representation with respect to a smooth and strictly convex function f, or simply that τ is f-conjugate to ρ, if:

$$\tau(p) = f'(\rho(p)) = \left( (f^*)' \right)^{-1}(\rho(p)), \qquad (10)$$

$$\rho(p) = (f')^{-1}(\tau(p)) = (f^*)'(\tau(p)). \qquad (11)$$
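The conjugacy relations can be checked numerically. The following minimal sketch (our own illustration, not code from the paper) uses the classical pair ρ(p) = p, τ(p) = log p, for which f′ = log and (f*)′ = exp are mutually inverse:

```python
import math

# Classical conjugate pair: rho(p) = p, tau(p) = log p, with
# f(t) = t*log(t) - t + 1, so f'(t) = log(t) and (f*)'(t) = exp(t),
# a pair of mutually inverse strictly increasing functions.
f_prime = math.log
f_star_prime = math.exp

rho = lambda p: p                  # identity embedding
tau = lambda p: f_prime(rho(p))    # tau(p) = f'(rho(p)) = log p

for p in [0.1, 0.5, 2.0, 7.3]:
    # applying (f*)' to tau(p) recovers rho(p), per the conjugacy relations
    assert abs(f_star_prime(tau(p)) - rho(p)) < 1e-12
```
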

Recall that f′ and (f*)′ are both strictly increasing (due to the strict convexity of f and f*) and that (f*)* = f, (f*)′ = (f′)^(−1). Sometimes, we write f′ = σ, (f*)′ = σ^(−1) for convenience, so σ(ρ) = τ, σ^(−1)(τ) = ρ, for a strictly increasing function σ.

A familiar example is ρ(p) = p and τ(p) = log p, with f*(t) = exp(t) and f(t) = t log t − t + 1. That ρ(p) and τ(p) are just the p and log p representations reflects the conventional dual embeddings that were later extended to the ϕ- and log_ϕ-embedding in [3]. In Section 2.2, it will be shown that Naudts' ϕ-logarithm formulation is recovered as a special case of the (ρ, τ)-embedding.

As another example, take ρ(p) = l^(β)(p) to be the β-representation given by Equation (6); this would have been traditionally called "alpha-embedding", except that we use the symbol β, so that the α-parameter will be reserved for indexing α-connections. In this case, the conjugate representation is the (−β)-representation τ(p) = l^(−β)(p), with the convex function f given (up to affine terms) by:

$$f(t) = \frac{2}{1+\beta}\left( \frac{(1-\beta)\, t}{2} \right)^{2/(1-\beta)}, \qquad f'(t) = \frac{2}{1+\beta}\left( \frac{(1-\beta)\, t}{2} \right)^{(1+\beta)/(1-\beta)}.$$
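A small numeric sketch (our own, not from the paper) checking that the (−β)-representation is f-conjugate to the β-representation; the closed form for f′ below is our own derivation from τ = f′(ρ), stated under the assumption β ≠ ±1:

```python
import math

def l(beta, p):
    """Amari's beta-representation l^(beta)(p) (log p when beta = 1)."""
    if beta == 1:
        return math.log(p)
    return 2.0 / (1.0 - beta) * p ** ((1.0 - beta) / 2.0)

def f_prime(beta, u):
    # Derived closed form (an assumption, not quoted from the paper):
    # f'(u) = (2/(1+beta)) * ((1-beta)*u/2) ** ((1+beta)/(1-beta))
    return 2.0 / (1.0 + beta) * ((1.0 - beta) * u / 2.0) ** ((1.0 + beta) / (1.0 - beta))

beta = 0.5
for p in [0.2, 1.0, 3.7]:
    rho, tau = l(beta, p), l(-beta, p)
    assert abs(f_prime(beta, rho) - tau) < 1e-10   # tau = f'(rho)
```
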

**Proposition 1.** ([2], Proposition 7) Using an arbitrary monotone embedding function ρ and an arbitrary smooth strictly convex function f, a generalization of α-geometry is obtained, with metric and α-connections taking the form:

**Corollary 1.** ([2], Proposition 8) Using two arbitrary monotone embedding functions ρ and τ, the metric and α-connections of (14)–(16) are:

#### 1.3. Harsha and Subrahamanian Moosath’s (2014) Work [1]

where E_p{(·)} = E_µ{(·) p}. Equation (23) is the expression for the e-connection (α = 1), ${\mathrm{\Gamma}}_{ijk}^{F,G}$. To express the conjugate connection (m-connection, α = −1), ${\mathrm{\Gamma}}_{ijk}^{H,G}$, a dual embedding function H is introduced, which is shown ([1], Theorem 3.2) to be related to F and G via (their Equation (36)):

**Statement 1.** Equations (14) and (22) give the same Riemannian metric; Equations (17) and (23) give the same affine connection; and Equations (18) and (25) give the same conjugate connection, as long as:

$$F(p) = \rho(p), \qquad G(p) = p\, f''(\rho(p))\, (\rho'(p))^{2}.$$

**Proof.** Rewriting (14), and keeping in mind:

**Statement 2.** The conjugate embedding function H is the same as τ. The conjugate connection (25), when expressed using H, has the same form as (23) for ${\mathrm{\Gamma}}_{ij,k}^{G,F}$ using F.

**Proof.** Applying Definition (24) immediately yields H′ = τ′. Therefore (apart from an additive constant), H(p) = τ(p). Next, we express (25) explicitly using the conjugate embedding function H (rather than F) and the weighting function G; that is, we simplify the terms in the middle parenthesis of (25):

In summary, the choice of the pair (F, G) in [1] amounts, within the framework of [2], to specifying the convex function f through

$$f''(t) = \frac{G(\rho^{-1}(t))}{\rho^{-1}(t)\, \left( \rho'(\rho^{-1}(t)) \right)^{2}},$$

for a given ρ. The subsequent development in their paper [1], e.g., the definition of the F-affine manifold (their Equation (50)), replicates the definition of the ρ-affine manifold in [2] (Section 3.4).

## 2. Uniqueness of (ρ, τ)-Geometry and Representation Duality

#### 2.1. Monotone Embedding as a Transformation Group

**Lemma 1.** Denote by Ω the set of strictly increasing functions from ℝ to ℝ. Then, (Ω, ○) forms a group, with ○ denoting functional composition.

**Proof.** We easily verify that:

- closure of ○: for any ρ_1, ρ_2 ∈ Ω, ρ_2 ○ ρ_1, defined as ρ_2(ρ_1(·)), is strictly increasing, and hence, ρ_2 ○ ρ_1 ∈ Ω;
- existence of a unique identity element: the identity function ι, which satisfies ρ ○ ι = ι ○ ρ = ρ, is strictly increasing, and hence, ι ∈ Ω and is unique;
- existence of inverses: for any ρ ∈ Ω, its functional inverse ρ^(−1), which satisfies ρ^(−1) ○ ρ = ρ ○ ρ^(−1) = ι, is also strictly increasing, and hence, ρ^(−1) ∈ Ω;
- associativity of ○: for any three ρ_1, ρ_2, ρ_3 ∈ Ω, (ρ_1 ○ ρ_2) ○ ρ_3 = ρ_1 ○ (ρ_2 ○ ρ_3).
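The group axioms above can be illustrated numerically; a small self-contained sketch (our own, with arbitrarily chosen sample functions):

```python
# Numeric illustration of (Omega, o): composition, identity, inverse,
# associativity, using two sample strictly increasing maps on R.

def compose(g, h):
    """Functional composition g o h."""
    return lambda x: g(h(x))

rho1 = lambda x: x ** 3            # strictly increasing on R
rho2 = lambda x: x + 2.0           # strictly increasing on R
rho2_inv = lambda x: x - 2.0       # functional inverse of rho2
iota = lambda x: x                 # identity element

x = 1.7
assert compose(rho2, rho1)(x) == rho2(rho1(x))           # closure
assert compose(rho2, iota)(x) == rho2(x)                 # identity
assert abs(compose(rho2_inv, rho2)(x) - x) < 1e-12       # inverse
# associativity: (rho1 o rho2) o rho2_inv == rho1 o (rho2 o rho2_inv)
lhs = compose(compose(rho1, rho2), rho2_inv)(x)
rhs = compose(rho1, compose(rho2, rho2_inv))(x)
assert abs(lhs - rhs) < 1e-12
```
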

Note that f′ = τ ○ ρ^(−1) = τ(ρ^(−1)(·)) and (f*)′ = ρ ○ τ^(−1) = ρ(τ^(−1)(·)), encountered above, are themselves two mutually inverse strictly increasing functions. This is the rationale behind Zhang's [2] choice of f (and f*) as the auxiliary function to capture conjugate embedding, rather than using G as in [1]. The following identities are useful; they are obtained by differentiating (10) and (11):

$$\tau'(p) = f''(\rho(p))\, \rho'(p), \qquad \rho'(p) = (f^*)''(\tau(p))\, \tau'(p).$$
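The identities obtained by differentiating (10) and (11) are easy to verify numerically; a minimal sketch with the hypothetical (illustrative, not from the paper) choice ρ(p) = p², τ(p) = log p, for which f′(t) = τ(ρ^(−1)(t)) = ½ log t and hence f″(t) = 1/(2t):

```python
# Hypothetical embedding pair for illustration: rho(p) = p^2, tau(p) = log p.
# Then f'(t) = tau(rho^{-1}(t)) = 0.5*log(t), so f''(t) = 0.5/t.
rho_prime = lambda p: 2.0 * p
tau_prime = lambda p: 1.0 / p
f_second = lambda t: 0.5 / t

# Verify tau'(p) = f''(rho(p)) * rho'(p) at a few points.
for p in [0.4, 1.0, 2.5]:
    assert abs(tau_prime(p) - f_second(p ** 2) * rho_prime(p)) < 1e-12
```
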

The function f is linked to ρ and τ via f′(t) = τ(ρ^(−1)(t)); in terms of the conjugate function f*, the relation is (f*)′(t) = ρ(τ^(−1)(t)). The function f (or f*) is important in constructing the general class of divergence functions.

#### 2.2. Naudts’ ϕ-Logarithm as a Special Case

Given a strictly increasing function ϕ: ℝ₊ → ℝ₊, the ϕ-logarithm is defined as [3]:

$$\log_\phi(t) = \int_1^t \frac{1}{\phi(s)}\, \mathrm{d}s.$$

The deformed exponential function, denoted exp_ψ, is defined by:

$$\exp_\psi(t) = 1 + \int_0^t \psi(s)\, \mathrm{d}s.$$

(Note: Naudts [3] denoted the deformed exponential as exp_ϕ, so our current rendition has a subtle difference, shown as (48) and (49) below.) It can be shown that the deformed functions log_ϕ and exp_ψ are in fact inverse functions of each other if:

$$\psi(t) = \phi(\exp_\psi(t)). \qquad (47)$$

In other words, log_ϕ(t) can be viewed as the solution to the following integral equation and its equivalent differential equation:

$$\log_\phi(t) = \int_1^t \frac{\mathrm{d}s}{\phi(s)} \iff \frac{\mathrm{d}\log_\phi(t)}{\mathrm{d}t} = \frac{1}{\phi(t)}, \quad \log_\phi(1) = 0, \qquad (45)$$

while exp_ψ(t) can be viewed as the solution to the following integral equation and its equivalent differential equation:

$$\exp_\psi(t) = 1 + \int_0^t \psi(s)\, \mathrm{d}s \iff \frac{\mathrm{d}\exp_\psi(t)}{\mathrm{d}t} = \psi(t), \quad \exp_\psi(0) = 1. \qquad (46)$$

Now identify these with the (ρ, τ)-embeddings and the f (or f*) function. Set ϕ(t) = ρ(t) and f*(t) = exp_ψ(t), so that (f*)′(t) = ψ(t) from (46). Therefore, we derive:

$$\tau(p) = f'(\rho(p)) = \psi^{-1}(\phi(p)) = \log_\phi(p).$$

So the deformed logarithm log_ϕ turns out to be the τ-representation, while the deformed exponential is nothing but f*. The relationship (47) is identical to (10) and (11).
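As a concrete instance, the q-deformed (Tsallis) case with ϕ(t) = t^q can be checked numerically; the closed forms below follow from the integral relations (our own sketch, not code from the paper):

```python
# Tsallis case: phi(t) = t**q gives log_phi(t) = (t**(1-q) - 1)/(1-q),
# whose inverse is exp_psi(t) = (1 + (1-q)*t)**(1/(1-q)),
# with psi(t) = phi(exp_psi(t)) per condition (47).
q = 0.7
phi = lambda t: t ** q
log_phi = lambda t: (t ** (1 - q) - 1) / (1 - q)
exp_psi = lambda t: (1 + (1 - q) * t) ** (1 / (1 - q))
psi = lambda t: phi(exp_psi(t))

for t in [0.3, 1.0, 4.2]:
    # log_phi and exp_psi are mutual inverses
    assert abs(exp_psi(log_phi(t)) - t) < 1e-10
    # exp_psi solves u'(t) = psi(t): central finite-difference check
    h = 1e-6
    deriv = (exp_psi(t + h) - exp_psi(t - h)) / (2 * h)
    assert abs(deriv - psi(t)) < 1e-4
```
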

In Naudts' formulation, once ϕ (that is, ρ) is given, log_ϕ (that is, τ) is specified through the integral relation (45). Viewing τ(·) = f′(ρ(·)), the relation (45) essentially specifies a strictly convex function f, through its derivative f′, which operates on ρ.

**Proposition 2.** Denote ρ ≡ ϕ. The deformed logarithmic transformation ϕ → log_ϕ given by (45) can be viewed as the function composition f′: ρ → f′(ρ), where f is given by:

$$f'(t) = \log_\phi(\phi^{-1}(t)),$$

with the conjugate function f* given by (9).

**Proof.** From (45), we write:

$$f'(t) = \tau(\rho^{-1}(t)) = \log_\phi(\phi^{-1}(t)),$$

so that f* ○ f′ is the inverse function of ρ, or:

$$f^*(f'(t)) = \exp_\psi\!\left( \log_\phi(\phi^{-1}(t)) \right) = \rho^{-1}(t).$$

More generally, if we denote the convex function linking the two representations by g, then g would be the ρ-exponential, g^(−1) the ρ-logarithm and g′ the linking function. In the case of the ϕ ↦ log_ϕ transformation, g = f*.

#### 2.3. Uniqueness of (ρ, τ)-Geometry

For a statistical manifold $\mathcal{M}_\Theta$, there are several traditional choices for tangent vectors: ∂_i p, ∂_i log p, ${\partial}_{i}\sqrt{p}$, etc. Each of these is linked with a weighting function (expectation operator), so that the tangent vectors are zero-mean random variables:

$$E_\mu\{\partial_i p\} = E_\mu\{p\, \partial_i \log p\} = E_\mu\{2\sqrt{p}\, \partial_i \sqrt{p}\} = 0.$$

Note that ∂_i(ρ(p)) = ρ′(p) ∂_i p, so a tangent vector retains its direction under any choice of monotone embedding function. A suitable weighting function renders ∂_i ρ a zero-mean random function at any point of $\mathcal{M}_\Theta$ (i.e., for any value of θ ∈ Θ).

The Fisher–Rao metric (1) can be written as E_µ{∂_i p ∂_j log p} = E_µ{∂_i log p ∂_j p}, the pairing of a random function with a random functional under the two embeddings p and log p. A natural generalization (see [6]) is to use two (independently chosen) monotone embeddings ρ, τ:

$$g_{ij}(\theta) = E_\mu\{\partial_i \rho(p)\, \partial_j \tau(p)\}.$$

The weighting function is f″(ρ(p)) (ρ′(p))² = τ′(p) ρ′(p) when tangent vectors are expressed as ∂_i p (the identity representation). When the ρ-representation or the τ-representation is adopted, the weighting function is simply f″(ρ(p)) or (f*)″(τ(p)), respectively.
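A tiny numeric sketch (our own construction, not from the paper) of the generalized metric g(θ) = E_µ{∂_θ ρ(p) ∂_θ τ(p)} on a Bernoulli family p = (θ, 1 − θ), with the background measure µ taken as the counting measure so that E_µ is a plain sum; the embedding choices below are illustrative assumptions:

```python
import math

def metric(theta, rho_prime, tau_prime):
    """g(theta) = sum_i rho'(p_i) tau'(p_i) (dp_i/dtheta)^2 on Bernoulli(theta)."""
    ps = [theta, 1.0 - theta]
    dps = [1.0, -1.0]                      # d p_i / d theta
    return sum(rho_prime(p) * tau_prime(p) * dp * dp for p, dp in zip(ps, dps))

theta = 0.3
# Classical pair rho(p) = p, tau(p) = log p: recovers the Fisher
# information of the Bernoulli family, 1/(theta*(1-theta)).
g_classical = metric(theta, lambda p: 1.0, lambda p: 1.0 / p)
assert abs(g_classical - 1.0 / (theta * (1.0 - theta))) < 1e-12

# The self-conjugate choice rho(p) = tau(p) = 2*sqrt(p) (the beta = 0
# embedding) yields the same Fisher metric.
g_sqrt = metric(theta, lambda p: 1.0 / math.sqrt(p), lambda p: 1.0 / math.sqrt(p))
assert abs(g_sqrt - g_classical) < 1e-12
```
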

The α-connections of the (ρ, τ)-geometry are likewise specified through Γ^(0)_{ki,j} and the totally symmetric tensor T given by (62):

**Proposition 3.** T as given by (62) is a totally symmetric (0, 3)-tensor.

**Proof.** First, we prove that T(θ) is totally symmetric. Since obviously T_{ijk} = T_{jik}, we only need to establish T_{ijk} = T_{ikj}, by applying the chain rule of differentiation. Next, we show that T_{ijk} is indeed a (0, 3)-tensor. This is done through examining the behavior of T under a coordinate transform $\theta \mapsto \overline{\theta}$, with the (inverse) Jacobian matrix $\frac{\partial {\theta}^{k}}{\partial {\overline{\theta}}^{l}}$, which affects:

Here, σ^(−1) (the inverse function of σ) serves as the weighting function. Note that σ = f′, σ^(−1) = (f*)′ when ρ and τ are said to be conjugate. Furthermore, note the negative sign in (75) compared with (74); this precisely reflects "representation duality" under the ρ ↔ τ exchange.

#### 2.4. Representation Duality versus Reference Duality

In Amari's α-geometry, the parameter α enters in three distinct ways:

- parameterizing the divergence functions (α-divergences);
- parameterizing monotone embedding of probability functions (α-embedding);
- parameterizing the convex mixture of connections (α-connections).

Zhang's [2] representational duality is expressed through the conjugate (f, f*) pair. Naudts' (2004) [3] ϕ-logarithm is but a special case of the (ρ, τ) duality, in which f′ plays the role of the "integral-of-the-reciprocal" operation, that is, taking the log of a function. This linkage then leads to f* and τ as inverse functions. The phenomenon of biduality emerges when exchanging ρ ↔ τ or (ρ, f) ↔ (τ, f*) leaves the Riemannian metric invariant but switches the two connections (the latter half of the statement is equivalent to changing the sign of the Amari–Chentsov tensor). Therefore, the present paper, while elaborating the theory developed in [2], re-asserts the distinction between two kinds of duality that were originally confounded in Amari's theory of α-geometry: one through the freedom of selecting monotone embedding functions ("representation duality") and the other through the freedom of assigning referential status to points for pairwise comparison ("reference duality").

## 3. Conclusion

## Acknowledgments

## Conflicts of Interest

## References

1. Harsha, K.V.; Subrahamanian Moosath, K.S. F-geometry and Amari's α-geometry on a statistical manifold. Entropy **2014**, 16, 2472–2487.
2. Zhang, J. Divergence function, duality, and convex analysis. Neural Comput. **2004**, 16, 159–195.
3. Naudts, J. Estimators, escort probabilities, and ϕ-exponential families in statistical physics. J. Inequal. Pure Appl. Math. **2004**, 5, 102.
4. Zhang, J. Referential duality and representational duality on statistical manifolds. In Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo, Japan, 12–16 December 2005; pp. 58–67.
5. Zhang, J. Referential duality and representational duality in the scaling of multi-dimensional and infinite-dimensional stimulus space. In Measurement and Representation of Sensations: Recent Progress in Psychological Theory; Dzhafarov, E., Colonius, H., Eds.; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2006.
6. Zhang, J. Nonparametric information geometry: From divergence function to referential-representational biduality on statistical manifolds. Entropy **2013**, 15, 5384–5418.
7. Zhang, J. Divergence functions and geometric structures they induce on a manifold. In Geometric Theory of Information; Nielsen, F., Ed.; Springer: Cham, Switzerland, 2014; pp. 1–30.
8. Zhang, J. Reference duality and representation duality in information geometry. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering (MaxEnt 2014), Proceedings of the 34th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, Amboise, France, 21–26 September 2014; Volume 1641, pp. 130–146.
9. Amari, S. Differential geometry of curved exponential families—curvatures and information loss. Ann. Stat. **1982**, 10, 357–385.
10. Amari, S. Differential Geometric Methods in Statistics; Lecture Notes in Statistics, Volume 28; Springer: New York, NY, USA, 1985.
11. Amari, S.; Nagaoka, H. Methods of Information Geometry; Oxford University Press: Oxford, UK, 2000.
12. Ohara, A. Geometry of distributions associated with Tsallis statistics and properties of relative entropy minimization. Phys. Lett. A **2007**, 370, 184–193.
13. Naudts, J. Generalised exponential families and associated entropy functions. Entropy **2008**, 10, 131–149.
14. Ohara, A.; Matsuzoe, H.; Amari, S. A dually flat structure on the space of escort distributions. J. Phys. Conf. Ser. **2010**, 201, 012012.
15. Amari, S.; Ohara, A. Geometry of q-exponential family of probability distributions. Entropy **2011**, 13, 1170–1185.
16. Amari, S.; Ohara, A.; Matsuzoe, H. Geometry of deformed exponential families: Invariant, dually-flat and conformal geometry. Physica A **2012**, 391, 4308–4319.
17. Eguchi, S. Second order efficiency of minimum contrast estimators in a curved exponential family. Ann. Stat. **1983**, 11, 793–803.
18. Eguchi, S. A differential geometric approach to statistical inference on the basis of contrast functionals. Hiroshima Math. J. **1985**, 15, 341–391.
19. Chentsov, N.N. Statistical Decision Rules and Optimal Inference; American Mathematical Society: Providence, RI, USA, 1982.
20. Ay, N.; Jost, J.; Le, H.V.; Schwachhöfer, L. Information geometry and sufficient statistics. Probab. Theory Relat. Fields **2014**.
21. Zhang, J.; Hasto, P. Statistical manifold as an affine space: A functional equation approach. J. Math. Psychol. **2006**, 50, 60–65.
22. Burbea, J.; Rao, C.R. Entropy differential metric, distance and divergence measures in probability spaces: A unified approach. J. Multivar. Anal. **1982**, 12, 575–596.
23. Burbea, J.; Rao, C.R. Differential metrics in probability spaces. Probab. Math. Stat. **1984**, 3, 241–258.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Zhang, J.
On Monotone Embedding in Information Geometry. *Entropy* **2015**, *17*, 4485-4499.
https://doi.org/10.3390/e17074485
