Article

Multiplicative Decomposition of Heterogeneity in Mixtures of Continuous Distributions

1 Department of Psychiatry, Dalhousie University, Halifax, NS B3H 2E2, Canada
2 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada
* Author to whom correspondence should be addressed.
Current address: 5909 Veterans Memorial Lane (8th Floor), Abbie J. Lane Memorial Building, QE II Health Sciences Centre, Halifax, NS B3H 2E2, Canada.
Entropy 2020, 22(8), 858; https://doi.org/10.3390/e22080858
Submission received: 17 June 2020 / Revised: 28 July 2020 / Accepted: 30 July 2020 / Published: 1 August 2020
(This article belongs to the Section Entropy and Biology)

Abstract:
A system’s heterogeneity (diversity) is the effective size of its event space, and can be quantified using the Rényi family of indices (also known as Hill numbers in ecology or Hannah–Kay indices in economics), which are indexed by an elasticity parameter $q \ge 0$. Under these indices, the heterogeneity of a composite system (the γ-heterogeneity) is decomposable into heterogeneity arising from variation within and between component subsystems (the α- and β-heterogeneity, respectively). Since the average heterogeneity of a component subsystem should not be greater than that of the pooled system, we require that $\gamma \ge \alpha$. There exists a multiplicative decomposition for Rényi heterogeneity of composite systems with discrete event spaces, but less attention has been paid to decomposition in the continuous setting. We therefore describe multiplicative decomposition of the Rényi heterogeneity for continuous mixture distributions under parametric and non-parametric pooling assumptions. Under non-parametric pooling, the γ-heterogeneity must often be estimated numerically, but the multiplicative decomposition holds such that $\gamma \ge \alpha$ for $q > 0$. Conversely, under parametric pooling, the γ-heterogeneity can be computed efficiently in closed form, but the $\gamma \ge \alpha$ condition holds reliably only at $q = 1$. Our findings will further contribute to heterogeneity measurement in continuous systems.

1. Introduction

Measurement of heterogeneity is important across many scientific disciplines. Ecologists are interested in the heterogeneity of ecosystems’ biological composition (biodiversity) [1], economists are interested in the heterogeneity of resource ownership (wealth equality) [2], and medical researchers and physicians are interested in the heterogeneity of diseases and their presentations [3]. Using Rényi heterogeneity [3,4,5], which for categorical random variables corresponds to ecologists’ Hill numbers [6] and economists’ Hannah–Kay indices [7], one can measure a system’s heterogeneity as its effective number of distinct configurations.
The heterogeneity of a mixture or ensemble of systems is often known as γ-heterogeneity, and is generated by variation occurring within and between constituent subsystems. A good heterogeneity measure will facilitate decomposition of γ-heterogeneity into α (within-subsystem) and β (between-subsystem) components. Under this decomposition, we require that $\gamma \ge \alpha$, since it would be counterintuitive for the heterogeneity of the overall ensemble to be less than that of any of its constituents, let alone the “average” subsystem [8,9]. Such a decomposition was introduced by Jost [9] for systems represented on discrete event spaces (such as representations of organisms by species labels). However, many data are better modeled by continuous embeddings, including word semantics [10,11,12], genetic population structure [13], and natural images [14]. Unfortunately, considerably less is understood about how to decompose Rényi heterogeneity in cases where data are represented on non-categorical spaces [4]. Although there are decomposable functional diversity indices expressed in numbers equivalent, they require categorical partitioning of the data (in order to supply species (dis)similarity matrices) [15,16,17,18] and the setting of sensitivity or threshold parameters for (dis)similarities [16,18]. For many research applications, such as those in psychiatry [3,4,19] or involving unsupervised learning [13,14], we may not have categorical partitions of the observable space that are valid, reliable, and of semantic relevance. If we are to apply Rényi heterogeneity to such continuous-space systems, then we must demonstrate that its multiplicative decomposition of γ-heterogeneity into α and β components is retained.
Therefore, our present work extends the Jost [9] multiplicative decomposition of Rényi heterogeneity to the analysis of continuous systems, and provides conditions under which the $\gamma \ge \alpha$ condition is satisfied. In Section 2, we introduce decomposition of the Rényi heterogeneity in categorical and continuous systems. Specifically, we highlight that the most important decision guiding the availability of a decomposition is how one defines the distribution over the mixture of subsystems. We show that, for non-parametrically pooled systems (i.e., finite mixture models, illustrated in Section 3), the $\gamma \ge \alpha$ condition can hold for all values of the Rényi elasticity parameter $q > 0$, but that γ-heterogeneity will generally require numerical estimation. Section 4 introduces decomposition of Rényi heterogeneity under parametric assumptions on the pooled system’s distribution. In this case, which amounts to a Gaussian mixed-effects model (as commonly implemented in biomedical meta-analyses), we show that $\gamma \ge \alpha$ will hold at $q = 1$, though not necessarily at $q \neq 1$. Finally, in Section 5, we discuss the implications of our findings and scenarios in which parametric or non-parametric pooling assumptions might be particularly useful.

2. Background

2.1. Categorical Rényi Heterogeneity Decomposition

In this section, we consider the definition and decomposition of Rényi heterogeneity for a composite random variable (or “system”) that we call a discrete mixture (Definition 1).
Definition 1 (Discrete Mixture). 
A random variable or system X is called a discrete mixture when it is defined on an n-dimensional discrete state space $\mathcal{X} = \{1, 2, \dots, n\}$ with probability distribution $\bar{p} = (\bar{p}_i)_{i=1,2,\dots,n}$, where $\bar{p}_i$ is the probability that X is observed in state $i \in \mathcal{X}$. Furthermore, let X be an aggregation of N component subsystems $X_1, X_2, \dots, X_N$ with corresponding probability distributions $P = (p_{ij})_{i=1,2,\dots,N}^{j=1,2,\dots,n}$. The proportion of X attributable to each component is governed by the weights $w = (w_i)_{i=1,2,\dots,N}$, where $0 \le w_i \le 1$ and $\sum_{i=1}^{N} w_i = 1$.
Let X be a discrete mixture. The Rényi heterogeneity for the ith component is
$$\Pi_q(X_i) = \left( \sum_{j=1}^{n} p_{ij}^{q} \right)^{\frac{1}{1-q}}, \tag{1}$$
which is the effective number of states in $X_i$. Assuming the pooled distribution over the discrete mixture X is a weighted average of the subsystem distributions, $\bar{p} = P^\top w$, the γ-heterogeneity is thus
$$\Pi_q^\gamma(X) = \left( \sum_{i=1}^{n} \bar{p}_i^{\,q} \right)^{\frac{1}{1-q}}, \tag{2}$$
which we interpret as the effective number of states in the pooled system X.
Jost [9] proposed the following decomposition of γ -heterogeneity:
$$\Pi_q^\gamma(X) = \Pi_q^\alpha(X)\, \Pi_q^\beta(X), \tag{3}$$
where $\Pi_q^\alpha(X)$ and $\Pi_q^\beta(X)$ are summary measures of heterogeneity due to variation within and between subsystems, respectively. Since the γ factor has units of effective number of states in the pooled system, and α has units of effective number of states per component, then
$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \tag{4}$$
yields the effective number of components in X.
For discrete mixtures, Jost [9] specified the functional form for α -heterogeneity as
$$\Pi_q^\alpha(X) = \begin{cases} \left( \displaystyle\sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \sum_{j=1}^{n} p_{ij}^{q} \right)^{\frac{1}{1-q}} & q \neq 1 \\[2ex] \exp\left\{ -\displaystyle\sum_{i=1}^{N} w_i \sum_{j=1}^{n} p_{ij} \log p_{ij} \right\} & q = 1, \end{cases} \tag{5}$$
which allows the decomposition in Equation (3) to satisfy the following desiderata:
  • The α and β components are independent [20]
  • The within-group heterogeneity is a lower bound on total heterogeneity [8]: $\Pi_q^\alpha \le \Pi_q^\gamma$
  • The α -heterogeneity is a form of average heterogeneity over groups
  • The α and β components are both expressed in numbers equivalent.
Specifically, Jost [9] proved that $\Pi_q^\gamma(X) \ge \Pi_q^\alpha(X)$ is guaranteed for all $q \ge 0$ when $w_i = w_j$ for all $(i, j) \in \{1, 2, \dots, N\}$, or for unequal weights w if the elasticity is set to the Shannon limit of $q \to 1$.
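The discrete decomposition above can be sketched in a few lines of code. The following minimal Python example, with a hypothetical pair of three-state subsystems, implements the component, pooled, and α-heterogeneity formulas and checks that γ ≥ α at q = 2:

```python
import math

def renyi_heterogeneity(p, q):
    """Effective number of states (Hill number) of a discrete distribution p."""
    if q == 1:
        return math.exp(-sum(pi * math.log(pi) for pi in p if pi > 0))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))

def alpha_heterogeneity(P, w, q):
    """Jost's alpha-heterogeneity for component distributions P and weights w."""
    if q == 1:
        return math.exp(-sum(wi * sum(p * math.log(p) for p in Pi if p > 0)
                             for wi, Pi in zip(w, P)))
    norm = sum(wi ** q for wi in w)
    inner = sum((wi ** q / norm) * sum(p ** q for p in Pi) for wi, Pi in zip(w, P))
    return inner ** (1.0 / (1.0 - q))

# Two equally weighted three-state subsystems (hypothetical distributions)
P = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
w = [0.5, 0.5]
pooled = [sum(wi * Pi[j] for wi, Pi in zip(w, P)) for j in range(3)]

q = 2.0
gamma = renyi_heterogeneity(pooled, q)   # effective states in the pooled system
alpha = alpha_heterogeneity(P, w, q)     # effective states per component
beta = gamma / alpha                     # effective number of components
```

Because the two subsystems here are mirror images of one another, the pooled system is more even than either component, and β lands between 1 and 2.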

2.2. Continuous Rényi Heterogeneity Decomposition

Let X be a non-parametric continuous mixture according to Definition 2. Although the individual mixture components of X may have parametric probability density functions, we call this a “non-parametric” mixture because the distribution over the pooled components does not assume the form of a known parametric family.
Definition 2 (Non-Parametric Continuous Mixture). 
A non-parametric continuous mixture is a random variable X defined on an n-dimensional continuous space $\mathcal{X} \subseteq \mathbb{R}^n$, and composed of subsystems $X_1, X_2, \dots, X_N$ with respective probability density functions $f(x) = (f_i(x))_{i=1,2,\dots,N}$ and weights $w = (w_i)_{i=1,2,\dots,N}$ such that $\sum_{i=1}^{N} w_i = 1$ and $0 \le w_i \le 1$. The pooled probability density over X is defined as
$$\bar{f}(x) = \sum_{i=1}^{N} w_i f_i(x). \tag{6}$$
The continuous Rényi heterogeneity for the ith subsystem of X is
$$\Pi_q(X_i) = \left( \int_{\mathcal{X}} f_i^{q}(x)\, dx \right)^{\frac{1}{1-q}}, \tag{7}$$
whose interpretation is given by Proposition 1 (see Proposition A3 in Nunes et al. [5] for the proof); we henceforth call this quantity the “effective volume” of the event space or domain of $X_i$.
Proposition 1 
(Rényi Heterogeneity of a Continuous Random Variable). The Rényi heterogeneity of a continuous random variable X defined on event space $\mathcal{X} \subseteq \mathbb{R}^n$ with probability density function f is equal to the magnitude of the volume of an n-cube over which there is a uniform probability density with the same Rényi heterogeneity as that of X.
Given the pooled distribution as defined in Equation (6), the Rényi heterogeneity over the mixture, which is the γ -heterogeneity, is
$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \bar{f}^{\,q}(x)\, dx \right)^{\frac{1}{1-q}}. \tag{8}$$
The γ-heterogeneity is thus the total effective volume of X’s domain. The α-heterogeneity represents the effective volume per mixture component in X, and is computed as follows:
$$\Pi_q^\alpha(X) = \left( \sum_{i=1}^{N} \frac{w_i^q}{\sum_{k=1}^{N} w_k^q} \int_{\mathcal{X}} f_i^{q}(x)\, dx \right)^{\frac{1}{1-q}}. \tag{9}$$
Given Equations (8) and (9), the following theorem provides conditions under which $\gamma \ge \alpha$ is satisfied for a non-parametric continuous mixture. The proof is analogous to that given by Jost [9] for discrete mixtures, and is detailed in Appendix A.
Theorem 1. 
If X is a non-parametric continuous mixture (Definition 2), with γ-heterogeneity specified by Equation (8) and α-heterogeneity given by Equation (9), then
$$\Pi_q^\beta(X) = \frac{\Pi_q^\gamma(X)}{\Pi_q^\alpha(X)} \ge 1 \tag{10}$$
under the following conditions:
1. $q = 1$; or
2. $q > 0$ when weights are equal for all mixture components.
If $\int_{\mathcal{X}} f_i^{q}(x)\, dx$ is analytically tractable for all $i \in \{1, 2, \dots, N\}$, then a closed-form expression for $\Pi_q^\alpha(X)$ will be available. If $\int_{\mathcal{X}} \bar{f}^{\,q}(x)\, dx$ is also analytically tractable, then $\Pi_q^\beta(X)$ will be too. However, this will depend entirely on the functional form of $\bar{f}$, and will rarely be the case with real-world data. In the majority of cases, $\int_{\mathcal{X}} \bar{f}^{\,q}(x)\, dx$ will have to be computed numerically.
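The decomposition can be checked numerically for a simple case. The sketch below (a hypothetical two-component univariate Gaussian mixture) approximates the integral in Equation (8) with the trapezoidal rule and compares the result against the closed-form α-heterogeneity from Equation (9):

```python
import math

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gamma_numeric(mus, sigmas, w, q, lo=-30.0, hi=30.0, steps=20000):
    """Trapezoidal integration of the pooled density raised to the power q."""
    h, total = (hi - lo) / steps, 0.0
    for k in range(steps + 1):
        x = lo + k * h
        fbar = sum(wi * gauss_pdf(x, m, s) for wi, m, s in zip(w, mus, sigmas))
        total += (0.5 if k in (0, steps) else 1.0) * fbar ** q
    return (total * h) ** (1.0 / (1.0 - q))

def alpha_gaussians(sigmas, w, q):
    """Closed-form alpha for univariate Gaussian components, using
    int f^q dx = (2 pi sigma^2)^((1 - q) / 2) * q^(-1/2)."""
    norm = sum(wi ** q for wi in w)
    inner = sum((wi ** q / norm) * (2 * math.pi * s ** 2) ** ((1 - q) / 2) * q ** -0.5
                for wi, s in zip(w, sigmas))
    return inner ** (1.0 / (1.0 - q))

mus, sigmas, w, q = [-3.0, 3.0], [0.5, 0.5], [0.5, 0.5], 2.0
gamma = gamma_numeric(mus, sigmas, w, q)
alpha = alpha_gaussians(sigmas, w, q)
beta = gamma / alpha   # approaches 2 as the two components separate
```

With these well-separated components, β is effectively 2; moving the means toward one another drives β toward 1.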

3. Rényi Heterogeneity Decomposition under a Non-Parametric Pooling Distribution

Definition 3 defines a general Gaussian mixture X as a weighted combination of component Gaussian random variables, without identifying the functional form of the composition. The non-parametric Gaussian mixture, in which the distribution over X is a simple model average of its Gaussian components, is specified in Definition 4.
Definition 3 (Gaussian Mixture). 
The n-dimensional Gaussian mixture X is a weighted combination of the set of n-dimensional Gaussian random variables $(X_i)_{i=1,2,\dots,N}$ with component weights $w = (w_i)_{i=1,2,\dots,N}$ such that $0 \le w_i \le 1$ and $\sum_{i=1}^{N} w_i = 1$. The probability density function of component $X_i$ is denoted $\mathcal{N}(x\,|\,\mu_i, \Sigma_i)$, and is parameterized by an $n \times 1$ mean vector $\mu_i$ and $n \times n$ covariance matrix $\Sigma_i$.
Definition 4 (Non-Parametric Gaussian Mixture). 
We define the random variable X as a non-parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as
$$\bar{f}(x\,|\,\mu_{1:N}, \Sigma_{1:N}, w) = \sum_{i=1}^{N} w_i\, \mathcal{N}(x\,|\,\mu_i, \Sigma_i), \tag{11}$$
where $\mu_{1:N}$ and $\Sigma_{1:N}$ denote the sets of component mean vectors $\mu_1, \dots, \mu_N$ and covariance matrices $\Sigma_1, \dots, \Sigma_N$, respectively.
We now introduce the Rényi heterogeneity of a single n-dimensional Gaussian random variable (Proposition 2) and subsequently characterize the γ -, α -, and β -heterogeneity values for a non-parametric Gaussian mixture.
Proposition 2 
(Rényi Heterogeneity of a Multivariate Gaussian). The Rényi heterogeneity of an n-dimensional Gaussian random variable X with mean μ and covariance matrix Σ is
$$\Pi_q(X) = \begin{cases} \text{Undefined} & q = 0 \\ (2\pi e)^{\frac{n}{2}}\, |\Sigma|^{\frac{1}{2}} & q = 1 \\ (2\pi)^{\frac{n}{2}}\, |\Sigma|^{\frac{1}{2}} & q = \infty \\ (2\pi)^{\frac{n}{2}}\, q^{\frac{n}{2(q-1)}}\, |\Sigma|^{\frac{1}{2}} & q \notin \{0, 1, \infty\}. \end{cases} \tag{12}$$
The proof of Proposition 2 is included in Appendix A. Unfortunately, a closed form solution such as Equation (12) cannot be obtained for the γ -heterogeneity of a non-parametric Gaussian mixture,
$$\Pi_q^\gamma(X) = \left( \int_{\mathcal{X}} \left( \sum_{i=1}^{N} w_i\, \mathcal{N}(x\,|\,\mu_i, \Sigma_i) \right)^{q} dx \right)^{\frac{1}{1-q}}, \tag{13}$$
which must be computed numerically to yield the effective size of the mixture’s domain. This process may be computationally expensive, particularly in high dimensions. Conversely, Equation (9), which yields the effective size of the domain per mixture component, can be evaluated in closed form for a Gaussian mixture:
$$\Pi_q^\alpha(X) = \begin{cases} \text{Undefined} & q = 0 \\ \exp\left\{ \frac{1}{2}\left( n + \displaystyle\sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\} & q = 1 \\ 0 & q = \infty \\ (2\pi)^{\frac{n}{2}} \left( \displaystyle\sum_{i=1}^{N} \frac{w_i^q}{\sum_{j=1}^{N} w_j^q}\, |\Sigma_i|^{\frac{1-q}{2}}\, q^{-\frac{n}{2}} \right)^{\frac{1}{1-q}} & q \notin \{0, 1, \infty\}. \end{cases} \tag{14}$$
The β -heterogeneity, which returns the effective number of components in the mixture, can then be computed using Equation (4). Example 1 demonstrates an important property of considering X as a non-parametric Gaussian mixture: that low-probability regions of the domain between well-separated components will have little to no effect on the γ - or β -heterogeneity estimates.
Example 1 
(Decomposition of Rényi heterogeneity in a univariate Gaussian mixture). Consider three non-parametric Gaussian mixtures $X^{(1)}, X^{(2)}, X^{(3)}$ defined on $\mathbb{R}$ whose numbers of components are respectively $N_1 = 2$, $N_2 = 3$, and $N_3 = 4$. Components in each mixture are equally weighted—that is, the components of mixture $X^{(j)}$ have weights $w_i^{(j)} = 1/N_j$ for all $i \in \{1, 2, \dots, N_j\}$—and have equal standard deviation $\sigma = 0.5$. This yields a per-component Rényi heterogeneity of approximately 2.07, which is consequently also the α-heterogeneity of each Gaussian mixture.
Figure 1 demonstrates the multiplicative decomposition of Rényi heterogeneity (at q = 1) in these Gaussian mixtures, where γ-heterogeneity was computed numerically, across varying separations of the respective mixtures’ component means. Note that the β-heterogeneity in this case represents the effective number of distinct components in the mixture distribution, and is bounded between 1 (when all components overlap) and $N_j$ (when all components are well separated). Separating the mixture components further beyond the point at which β-heterogeneity reaches $N_j$ yielded no additional increase in β-heterogeneity.
Assuming sufficiently accurate approximation of the integral in Equation (13), the γ -heterogeneity in Example 1 appears to reach a limit corresponding to the sum of effective domain sizes under all mixture components, and the β -heterogeneity reaches a limit corresponding to the number of individual mixture components.
Unfortunately, computation of β -heterogeneity in a non-parametric Gaussian mixture will yield results whose accuracy will depend on the error of numerical integration, and which may consume significant computational resources when evaluated for large N (many components) and large n (high dimension). Monte Carlo integration may be preferable for high dimensional mixture distributions, but running samplers can still be costly if the γ -heterogeneity must be estimated many times. Although the non-parametric pooling approach may be the only available method for many distribution classes, a computationally efficient parametric pooling approach exists for Gaussian mixtures, to which we now turn our attention.
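One such Monte Carlo scheme follows from the identity $\int_{\mathcal{X}} \bar{f}^{\,q}(x)\, dx = \mathbb{E}_{x \sim \bar{f}}\big[ \bar{f}(x)^{q-1} \big]$, which allows sampling from the mixture instead of gridding the space. A minimal univariate sketch (the sample size and seed are arbitrary choices):

```python
import math, random

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mc_gamma(mus, sigmas, w, q, n_samples=200_000, seed=1):
    """Monte Carlo gamma-heterogeneity via
    int fbar^q dx = E_{x ~ fbar}[ fbar(x)^(q - 1) ]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        # ancestral sampling: pick a component, then draw from it
        u, i = rng.random(), 0
        while i < len(w) - 1 and u > w[i]:
            u -= w[i]
            i += 1
        x = rng.gauss(mus[i], sigmas[i])
        fbar = sum(wj * gauss_pdf(x, m, s) for wj, m, s in zip(w, mus, sigmas))
        acc += fbar ** (q - 1)
    return (acc / n_samples) ** (1.0 / (1.0 - q))

# Two well-separated components: the exact gamma at q = 2 is close to 3.545
gamma_hat = mc_gamma([-3.0, 3.0], [0.5, 0.5], [0.5, 0.5], q=2.0)
```

Unlike grid integration, the cost of this estimator does not grow exponentially with dimension, although its variance must still be monitored in practice.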

4. Rényi Heterogeneity Decomposition under a Parametric Pooling Distribution

This section introduces the parametric Gaussian mixture (Definition 5). This is essentially an ensemble of individual Gaussian distributions whose means and covariance matrices are weighted and pooled to obtain the mean and covariance matrix of the mixture as a whole. We subsequently provide conditions under which decomposition of the parametric Gaussian mixture’s heterogeneity satisfies the requirement that α-heterogeneity be a lower bound on γ-heterogeneity (Theorem 2). Parametric Gaussian mixtures are an important class of models commonly used in mixed-effects meta-analyses [21], where one models the effect sizes of each of $K \in \mathbb{N}^+$ studies as Gaussians whose means are themselves Gaussian distributed with “true” effect size $\mu^*$ and variance $\tau^2$. The variance of the true effect, $\tau^2$, is often taken as an index of between-study heterogeneity, but unfortunately variance does not satisfy the replication principle [4]. A parametric Gaussian mixture can also be used to measure the effective number of natural images embedded in the real-valued latent space of a variational autoencoder (a probabilistic deep learning model used to learn compressed representations of high-dimensional data) [5].
Definition 5 (Parametric Gaussian Mixture). 
We define the random variable X as an n-dimensional parametric Gaussian mixture if it is a Gaussian mixture (Definition 3) whose probability density function is defined as
$$\bar{f}(x\,|\,\mu^*, \Sigma^*) = \mathcal{N}(x\,|\,\mu^*, \Sigma^*), \tag{15}$$
with pooled mean vector
$$\mu^* = \sum_{i=1}^{N} w_i\, \mu_i, \tag{16}$$
and pooled covariance matrix
$$\Sigma^* = \sum_{i=1}^{N} w_i \left( \Sigma_i + \mu_i \mu_i^\top \right) - \mu^* \mu^{*\top}. \tag{17}$$
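In one dimension, the pooled moments of Definition 5 reduce to a weighted mean and, via the law of total variance, a pooled variance combining within- and between-component terms. A minimal sketch with hypothetical weights and moments:

```python
# Pooled moments of a parametric Gaussian mixture (Definition 5), univariate case,
# with hypothetical weights, means, and variances
w   = [0.2, 0.5, 0.3]    # component weights (sum to 1)
mu  = [-1.0, 0.0, 2.0]   # component means
var = [0.25, 1.0, 0.5]   # component variances (Sigma_i in one dimension)

mu_star = sum(wi * mi for wi, mi in zip(w, mu))
var_star = sum(wi * (vi + mi ** 2) for wi, mi, vi in zip(w, mu, var)) - mu_star ** 2

# Equivalent "law of total variance" form: within- plus between-component variance
within = sum(wi * vi for wi, vi in zip(w, var))
between = sum(wi * (mi - mu_star) ** 2 for wi, mi in zip(w, mu))
assert abs(var_star - (within + between)) < 1e-12
```

The between-component term is exactly the part of the pooled covariance that the non-parametric γ-heterogeneity is insensitive to once components are well separated.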
The advantage of assuming a parametric, rather than non-parametric, Gaussian mixture is that its γ-heterogeneity may be computed efficiently in closed form using Equation (12) (it is simply a function of Equation (17)). However, the critical difference between the parametric and non-parametric Gaussian mixture assumptions is that, under parametric pooling, the γ-heterogeneity—and therefore the β-heterogeneity—will depend on the component means $\mu_{1:N}$, according to the following Lemma.
Lemma 1 
(Relationship of γ-Heterogeneity to Component Dispersion). Let X and X′ be N-component parametric Gaussian mixtures on $\mathbb{R}^n$ with component-wise mean vectors $\mu_{1:N} = (\mu_i)_{i=1,2,\dots,N}$ and $\mu'_{1:N} = (c\,\mu_i)_{i=1,2,\dots,N}$, where $c \ge 1$ is a scaling factor. The component-wise weights w and covariance matrices $\Sigma_{1:N} = (\Sigma_i)_{i=1,2,\dots,N}$ are identical between X and X′. Finally, let $\Sigma^*$ and $\Sigma^{*\prime}$ be the pooled covariance matrices of X and X′, respectively. Then, for all $c \ge 1$, we have that
$$\Pi_q^\gamma(X') \ge \Pi_q^\gamma(X), \tag{18}$$
with equality if c = 1 .
Lemma 1, whose proof is detailed in Appendix A, implies that the resulting β -heterogeneity of a parametric Gaussian mixture will increase as the mixture component means are spread further apart. This follows from the fact that Equation (14), which is computed component-wise, remains a valid expression of the α -heterogeneity in a parametric Gaussian mixture.
Before stating the conditions under which α is a lower bound on γ for a parametric Gaussian mixture (Theorem 2), we introduce the following Lemma, whose proof is left to Appendix A.
Lemma 2. 
If $(\Sigma_i)_{i=1,2,\dots,N}$ is a set of $N \ge 2$ positive semidefinite $n \times n$ matrices with corresponding weights $w = (w_i)_{i=1,2,\dots,N}$ such that $0 \le w_i \le 1$ and $\sum_{i=1}^{N} w_i = 1$, then
$$\left| \sum_{i=1}^{N} w_i \Sigma_i \right|^{\frac{1}{2}} \ge \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{19}$$
Theorem 2. 
The Rényi β-heterogeneity of order q = 1 of a parametric Gaussian mixture X (Definition 5) has a lower bound of 1:
$$\Pi_1^\beta(X) = \frac{\Pi_1^\gamma(X)}{\Pi_1^\alpha(X)} \ge 1 \tag{20}$$
Proof. 
Recall that $\Pi_q^\alpha(X)$ is independent of the mean vectors of the components in X (Equation (14)). Furthermore, it follows from Lemma 1 that, if $\mu'_{1:N} = (\mathbf{0})_{i=1,2,\dots,N}$, where $\mathbf{0}$ is an $n \times 1$ zero vector, then for any parametric Gaussian mixture X with means $\mu_{1:N}$ we will have $\Pi_q^\gamma(X) \ge \Pi_q^\gamma(X')$, where equality is obtained if $\mu_{1:N}$ are also zero vectors, or the covariance of mean vectors in X,
$$\mathrm{Cov}[\mu] = \mathbb{E}[\mu \mu^\top] - \mathbb{E}[\mu]\, \mathbb{E}[\mu]^\top,$$
is otherwise singular. Thus, it suffices to prove our theorem under the assumption that $\mu_{1:N} = (\mathbf{0})_{i=1,2,\dots,N}$, in which case the pooled covariance of X reduces to
$$\Sigma^* = \sum_{i=1}^{N} w_i \Sigma_i.$$
The expression for $\Pi_1^\gamma(X) / \Pi_1^\alpha(X)$ is then
$$\frac{(2\pi e)^{\frac{n}{2}}\, |\Sigma^*|^{\frac{1}{2}}}{\exp\left\{ \frac{1}{2}\left( n + \sum_{i=1}^{N} w_i \log |2\pi\Sigma_i| \right) \right\}},$$
which, after simplification, reduces to the condition
$$|\Sigma^*|^{\frac{1}{2}} \ge \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}},$$
which holds by Lemma 2. ☐
Although Theorem 2 highlights the reliability and flexibility of using elasticity $q = 1$, we must emphasize that $q = 1$ may not be the only condition under which $\Pi_q^\gamma(X) \ge \Pi_q^\alpha(X)$. Indeed, Example 2 suggests that the integrity of this bound on β-heterogeneity at elasticity values $q \neq 1$ may depend in various ways on the unique combination of component-wise parameters in a parametric Gaussian mixture.
Example 2 
(Decomposition of Rényi Heterogeneity in a Parametric Gaussian Mixture). Consider a parametric Gaussian mixture X with four components defined on $\mathbb{R}$ (for instance, Figure 2A). The components’ respective standard deviations are $\sigma = (0.5, 0.8, 1.1, 1.6)$. We vary the column vector of mixture component weights $w = (w_i)_{i=1,\dots,4}$ according to the following function:
$$w(a) = \begin{cases} (1, 0, 0, 0)^\top & a = 0 \\ (0.25, 0.25, 0.25, 0.25)^\top & a = 1 \\ (0, 0, 0, 1)^\top & a = \infty \\ \left( \dfrac{1 - a^{\frac{1}{3}}}{1 - a^{\frac{4}{3}}}\; a^{\frac{i-1}{3}} \right)_{i=1,\dots,4} & a \notin \{0, 1, \infty\}, \end{cases}$$
which “skews” the distribution of weights over the components of X according to the value of a skew parameter $a \ge 0$ (shown in Figure 2B). As the parameter a decreases below 1, components $X_1$ and $X_2$ (which have the narrowest distributions) become preferentially weighted. Conversely, as a increases above 1, components $X_3$ and $X_4$ are preferentially weighted. At $a = 1$, all components are equally weighted (depicted as the dashed black lines in Figure 2B–F).
Figure 2C–E plot the γ-, α-, and β-heterogeneity of the parametric Gaussian mixture at $q \neq 1$, respectively, while Figure 2F computes the β-heterogeneity at $q = 1$ for variously skewed weight distributions. Note that, when the weight distributions are skewed, there is a discontinuity in β-heterogeneity around $q = 1$. When the skew parameter results in a distribution of weights whose ranking of components agrees with the rank order of component distribution widths (that is, when the largest components of σ also have the highest weights), then β-heterogeneity appears to exceed 1 for $q > 1$. However, when the component weights and distribution widths are anti-correlated (when the largest components of σ have the smallest weights, and vice versa), then we observe values of β-heterogeneity below 1 at values of $q > 1$, as well as for some values of $q < 1$.
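The q-dependence of this bound can be spot-checked numerically. The sketch below uses a hypothetical configuration with all component means at zero (simpler than the configuration plotted in Figure 2) and the same standard deviations as Example 2; at q = 1 the bound of Theorem 2 holds, while at q = 2 the skewed weighting pushes β below 1:

```python
import math

def gaussian_renyi(var, q):
    """Renyi heterogeneity of a univariate Gaussian (Equation (12) with n = 1)."""
    if q == 1:
        return math.sqrt(2 * math.pi * math.e * var)
    return math.sqrt(2 * math.pi * var) * q ** (1.0 / (2.0 * (q - 1.0)))

def alpha_gaussian(w, var, q):
    """Alpha-heterogeneity of a univariate Gaussian mixture (Equation (14))."""
    if q == 1:
        return math.exp(0.5 * (1.0 + sum(wi * math.log(2 * math.pi * vi)
                                         for wi, vi in zip(w, var))))
    norm = sum(wi ** q for wi in w)
    inner = sum((wi ** q / norm) * vi ** ((1.0 - q) / 2.0) * q ** -0.5
                for wi, vi in zip(w, var))
    return math.sqrt(2 * math.pi) * inner ** (1.0 / (1.0 - q))

# Zero-mean components; larger variances receive larger weights
var = [0.25, 0.64, 1.21, 2.56]   # sigma = 0.5, 0.8, 1.1, 1.6
w   = [0.03, 0.07, 0.20, 0.70]
pooled_var = sum(wi * vi for wi, vi in zip(w, var))  # Definition 5, means all zero

beta_1 = gaussian_renyi(pooled_var, 1) / alpha_gaussian(w, var, 1)  # >= 1 (Theorem 2)
beta_2 = gaussian_renyi(pooled_var, 2) / alpha_gaussian(w, var, 2)  # < 1 here
```

The asymmetry arises because at q ≠ 1 the α-heterogeneity re-weights components by $w_i^q$, which the pooled variance does not, so the ratio can fall on either side of 1.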
Figure 3 illustrates the effect of progressively separating the locations (i.e., means) of mixture components on the resulting β-heterogeneity of parametric and non-parametric univariate Gaussian mixtures. We implemented mixtures with $N \in \{2, 3, 4\}$ equally weighted components, respectively. Each Gaussian component had unit variance, since our comparison is primarily concerned with separation of component means. The mean of component $i \in \{1, \dots, N\}$ was set to $i\mu_o$, with $\mu_o \ge 0$. Thus, as $\mu_o$ is increased, the mixture components become progressively further separated.
The γ -heterogeneity values of parametric Gaussian mixtures were computed by pooling component means and variances according to Definition 5, to which we applied Equation (12). The γ -heterogeneity values of non-parametric Gaussian mixtures (Equation (13)) were computed using numerical integration, as well as in closed form using second-order asymptotic approximation. In all cases, the α -heterogeneity reduced simply to the Rényi heterogeneity of a single univariate Gaussian with unit variance.
Figure 3 further highlights that the β-heterogeneity of uniformly weighted non-parametric Gaussian mixtures tends to approach the number of individual components in the system. Conversely, the β-heterogeneity of parametric Gaussian mixtures continues increasing. In fact, one can show that, as the separation between mixture components becomes large, the β-heterogeneity approaches a linear rate of growth (Appendix B).
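The contrast shown in Figure 3 can be reproduced with a short script. The sketch below (hypothetical three-component configurations, q = 1) computes β-heterogeneity under both pooling assumptions: the non-parametric value saturates near N = 3, while the parametric value continues to grow with separation:

```python
import math

def gauss_pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def beta_nonparametric(mus, lo, hi, steps=40000):
    """q = 1 beta for an equally weighted, unit-variance mixture:
    gamma = exp(differential entropy of the pooled density),
    alpha = sqrt(2*pi*e) for a single unit-variance Gaussian."""
    h, acc = (hi - lo) / steps, 0.0
    for k in range(steps + 1):
        x = lo + k * h
        fbar = sum(gauss_pdf(x, m) for m in mus) / len(mus)
        if fbar > 0.0:
            acc += (0.5 if k in (0, steps) else 1.0) * (-fbar * math.log(fbar))
    return math.exp(acc * h) / math.sqrt(2 * math.pi * math.e)

def beta_parametric(mus):
    """q = 1 beta under parametric pooling (Definition 5):
    sqrt(pooled variance) / sqrt(component variance = 1)."""
    m_star = sum(mus) / len(mus)
    return math.sqrt(1.0 + sum((m - m_star) ** 2 for m in mus) / len(mus))

mus_near = [5.0, 10.0, 15.0]   # component means i * mu_o, with mu_o = 5
mus_far  = [10.0, 20.0, 30.0]  # mu_o = 10

b_np_near = beta_nonparametric(mus_near, -10.0, 30.0)
b_np_far = beta_nonparametric(mus_far, -10.0, 50.0)
b_p_near, b_p_far = beta_parametric(mus_near), beta_parametric(mus_far)
```

Doubling the separation leaves the non-parametric β pinned at 3 effective components, while the parametric β nearly doubles, consistent with the linear growth derived in Appendix B.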

5. Discussion

This paper provided approaches for multiplicative decomposition of heterogeneity in continuous mixture distributions, thereby extending the earlier work on discrete space heterogeneity decomposition presented by Jost [9]. Two approaches were offered, dependent upon whether the distribution over the pooled system is defined either parametrically or non-parametrically. Our results improve the understanding of heterogeneity measurement in non-categorical systems by providing conditions under which decomposition of heterogeneity into α and β components conforms to the intuitive property that γ α .
If one defines the pooled mixture non-parametrically, as in a finite mixture model, heterogeneity is decomposable such that $\gamma \ge \alpha$ for all $q > 0$ if component weights are uniform, or at $q = 1$ otherwise, and β may be interpreted as the effective number of distinct mixture components (Section 2.2 and Section 3). This has the advantage of conforming with the original discrete decomposition by Jost [9], insofar as probability mass in the mixture is recorded only where it is observed in the data, and not elsewhere, as would be assumed under a parametric model of the pooled system. Consequently, one achieves a more precise estimate of the size of the pooled system’s base of support. The primary limitation arises from the need to numerically integrate the γ-heterogeneity, which can become prohibitively expensive in higher dimensions. Future work should investigate the error bounds on numerically integrated γ.
A more computationally efficient approach for decomposition of continuous Rényi heterogeneity is to assume that the pooled mixture has an overall parametric distribution. A common application in which this assumption is made is mixed-effects meta-analysis [21]. An important departure from the non-parametric pooling approach of finite mixture models is that non-trivial probability mass may now be assigned to regions not covered by any of the constituent component distributions. From another perspective, one may appreciate that the non-parametric approach to pooling is insensitive to the distance between component distributions, and rather measures only the effective volume of event space to which the component distributions assign probability. Conversely, assuming a parametric distribution over the mixture (in the case of Section 4, a Gaussian) incorporates the distance between the component distributions into the calculation of γ-heterogeneity. This would be appropriate in scenarios where one assumes that the observed components undersample the true distribution of the pooled system. For example, in the case of mixed-effects meta-analysis, the available research studies for inclusion may differ significantly in terms of their means, but one might assume that there is a significant probability of a new study yielding an effect somewhere in between. Specifying a parametric distribution over the pooled system would capture this assumption.
One limitation of the present study is the use of a Gaussian model for the pooled system distribution. This was chosen on account of (A) its prevalence in the scientific literature and (B) its analytical tractability. Future work should expand these results to other distributions. Notwithstanding this, we have demonstrated the decomposition of γ Rényi heterogeneity into its α and β components for continuous systems. There are (broadly) two approaches, based on whether parametric assumptions are made about the pooled system distribution. Under these assumptions applied to Gaussian mixture distributions, we provided conditions under which the criterion that $\gamma \ge \alpha$ is satisfied. Future studies should evaluate this method as an alternative approach for the measurement of meta-analytic heterogeneity, and expand these results to other parametric distributions over the pooled system.

Author Contributions

Conceptualization, A.N.; methodology, A.N.; validation, A.N.; formal analysis, A.N.; investigation, A.N.; writing—original draft preparation, A.N.; writing—review and editing, M.A. and T.T.; visualization, A.N.; supervision, M.A. and T.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs

Proof of Theorem 1. 
Following Jost [9] (proof 2), in the limit $q \to 1$, one obtains the following inequality:
$$\sum_{i=1}^{N} w_i \int_{\mathcal{X}} f_i(x) \log f_i(x)\, dx \ge \int_{\mathcal{X}} \bar{f}(x) \log \bar{f}(x)\, dx, \tag{A1}$$
whereas, when $w_i = w_j$ for all $(i, j) \in \{1, 2, \dots, N\}$, for $q > 1$ we have
$$\frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^{q}(x)\, dx \ge \int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^{q} dx, \tag{A2}$$
and, for $q < 1$, we have
$$\frac{1}{N} \sum_{i=1}^{N} \int_{\mathcal{X}} f_i^{q}(x)\, dx \le \int_{\mathcal{X}} \left( \frac{1}{N} \sum_{i=1}^{N} f_i(x) \right)^{q} dx, \tag{A3}$$
all of which hold by Jensen’s inequality. ☐
Proof of Proposition 2. 
We must solve the following integral:
$$\Pi_q(X) = \left( (2\pi)^{-\frac{qn}{2}}\, |\Sigma|^{-\frac{q}{2}} \int_{\mathbb{R}^n} e^{-\frac{q}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu)}\, dx \right)^{\frac{1}{1-q}}. \tag{A4}$$
The eigendecomposition of the inverse covariance matrix $\Sigma^{-1}$ into an orthonormal matrix of eigenvectors U and an $n \times n$ diagonal matrix of eigenvalues $\Lambda = (\delta_{ij} \lambda_i)_{i,j=1,2,\dots,n}$, where $\delta_{ij}$ is Kronecker’s delta, facilitates the substitution $y = U^{-1}(x - \mu)$ required for Gaussian integration, by which we obtain the following solution for $q \notin \{0, 1, \infty\}$:
$$\Pi_q(X) = q^{\frac{n}{2(q-1)}}\, (2\pi)^{\frac{n}{2}}\, |\Sigma|^{\frac{1}{2}}. \tag{A5}$$
L’Hôpital’s rule facilitates computation of the limit as $q \to 1$:
$$\lim_{q \to 1} \log \Pi_q(X) = \lim_{q \to 1} \left[ \frac{n}{2(q-1)} \log q \right] + \frac{n}{2} \log(2\pi) + \frac{1}{2} \log |\Sigma| = \frac{n}{2} + \frac{n}{2} \log(2\pi) + \frac{1}{2} \log |\Sigma|, \tag{A6}$$
giving the perplexity,
$$\Pi_1(X) = (2\pi e)^{\frac{n}{2}}\, |\Sigma|^{\frac{1}{2}}. \tag{A7}$$
By the same procedure, we can compute the limit as $q \to \infty$,
$$\Pi_\infty(X) = (2\pi)^{\frac{n}{2}}\, |\Sigma|^{\frac{1}{2}}, \tag{A8}$$
as well as show that $\Pi_0(X)$ is undefined. ☐
Proof of Lemma 1. 
For all $q > 0$, proving $\Pi_q^\gamma(X') \ge \Pi_q^\gamma(X)$ amounts to proving $|\Sigma^{*\prime}|^{\frac{1}{2}} \ge |\Sigma^*|^{\frac{1}{2}}$. To this end, we have
$$\begin{aligned} \Sigma^{*\prime} &= \sum_{i=1}^{N} w_i \Sigma_i + \sum_{i=1}^{N} w_i (c\mu_i)(c\mu_i)^\top - \left( \sum_{i=1}^{N} w_i\, c\mu_i \right) \left( \sum_{i=1}^{N} w_i\, c\mu_i \right)^\top && \text{(A9)} \\ &= \sum_{i=1}^{N} w_i \Sigma_i + c^2 \left( \sum_{i=1}^{N} w_i\, \mu_i \mu_i^\top - \mu^* \mu^{*\top} \right) && \text{(A10)} \\ &= \hat{\Sigma} + c^2\, \mathbb{C}[\mu] && \text{(A11)} \end{aligned}$$
and
$$\Sigma^* = \hat{\Sigma} + \mathbb{C}[\mu], \tag{A12}$$
where we denote $\hat{\Sigma} = \sum_{i=1}^{N} w_i \Sigma_i$ and $\mathbb{C}[\mu] = \sum_{i=1}^{N} w_i\, \mu_i \mu_i^\top - \mu^* \mu^{*\top}$ for notational parsimony. Clearly, when $c = 1$, we have $\Sigma^{*\prime} = \Sigma^*$.
By the Minkowski determinant inequality, we have that
$$|\Sigma^{*\prime}|^{\frac{1}{2}} \ge |\hat{\Sigma}|^{\frac{1}{2}} + c^{n}\, |\mathbb{C}[\mu]|^{\frac{1}{2}}, \tag{A13}$$
$$|\Sigma^*|^{\frac{1}{2}} \ge |\hat{\Sigma}|^{\frac{1}{2}} + |\mathbb{C}[\mu]|^{\frac{1}{2}}, \tag{A14}$$
which, since $c \ge 1$, implies that the right-hand side of the first line is greater than or equal to that of the second. Subtracting the second line from the first and simplifying yields
$$|\Sigma^{*\prime}|^{\frac{1}{2}} - |\Sigma^*|^{\frac{1}{2}} \ge |\mathbb{C}[\mu]|^{\frac{1}{2}} \left( c^{n} - 1 \right). \tag{A15}$$
At $c = 1$, Equation (A15) reduces to an equality, and, since $c \ge 1$ and $n \ge 1$, Equation (A15) establishes that $|\Sigma^{*\prime}|^{\frac{1}{2}} \ge |\Sigma^*|^{\frac{1}{2}}$. ☐
Proof of Lemma 2. 
Since $\Sigma_{1:N}$ are positive definite matrices, for all $x \in \mathbb{R}^n$ we have $\frac{1}{2} x^\top w_i \Sigma_i x \geq 0$, and thus $\frac{1}{2} x^\top \left(\sum_{i=1}^{N} w_i \Sigma_i\right) x \geq 0$. Exponentiating the negated quadratic term, we have

$$e^{-\frac{1}{2} x^\top \left(\sum_{i=1}^{N} w_i \Sigma_i\right) x} = \prod_{i=1}^{N} \left(e^{-\frac{1}{2} x^\top \Sigma_i x}\right)^{w_i}. \tag{A16}$$

We obtain the following expressions by applying Gaussian integration to the left-hand side,

$$\int_{\mathbb{R}^n} e^{-\frac{1}{2} x^\top \left(\sum_{i=1}^{N} w_i \Sigma_i\right) x}\,\mathrm{d}x = (2\pi)^{\frac{n}{2}} \left|\sum_{i=1}^{N} w_i \Sigma_i\right|^{-\frac{1}{2}}, \tag{A17}$$

as well as to a bound on the right-hand side obtained by Hölder's inequality,

$$\int_{\mathbb{R}^n} \prod_{i=1}^{N} \left(e^{-\frac{1}{2} x^\top \Sigma_i x}\right)^{w_i} \mathrm{d}x \leq \prod_{i=1}^{N} \left(\int_{\mathbb{R}^n} e^{-\frac{1}{2} x^\top \Sigma_i x}\,\mathrm{d}x\right)^{w_i} \tag{A18}$$
$$= (2\pi)^{\frac{n}{2}} \prod_{i=1}^{N} |\Sigma_i|^{-\frac{w_i}{2}}. \tag{A19}$$

Substituting Equations (A17) and (A19) into Equation (A16) and simplifying terms yields

$$\left|\sum_{i=1}^{N} w_i \Sigma_i\right|^{\frac{1}{2}} \geq \prod_{i=1}^{N} |\Sigma_i|^{\frac{w_i}{2}}. \tag{A20}$$

☐
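The resulting inequality (A20) — a log-concavity property of the determinant over positive-definite matrices — is easy to verify numerically on random instances. This sketch is ours, with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(1)
N, n = 5, 4
w = rng.dirichlet(np.ones(N))  # weights on the probability simplex

# Random positive-definite matrices standing in for Sigma_1, ..., Sigma_N.
Sigmas = []
for _ in range(N):
    A = rng.normal(size=(n, n))
    Sigmas.append(A @ A.T + 0.1 * np.eye(n))

# Lemma 2: the determinant of the weighted sum dominates the weighted
# geometric mean of the determinants.
lhs = np.sqrt(np.linalg.det(sum(wi * Si for wi, Si in zip(w, Sigmas))))
rhs = np.prod([np.linalg.det(Si) ** (wi / 2) for wi, Si in zip(w, Sigmas)])
assert lhs >= rhs - 1e-9
```

Repeating the check over many random draws never produces a violation, consistent with the proof above.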

Appendix B. Growth of β-Heterogeneity with Respect to Component Separation

Let $X$ be a parametric Gaussian mixture of $N$ equally weighted components with means $\boldsymbol{\mu}, 2\boldsymbol{\mu}, \ldots, N\boldsymbol{\mu}$, where $\boldsymbol{\mu}$ is an $n$-dimensional column vector whose entries all equal a scalar $\mu > 0$. The component-wise $n \times n$ covariance matrices are identical and equal to $\Sigma$. The pooled mean is therefore

$$\mu_* = \frac{N+1}{2}\,\boldsymbol{\mu}$$

and the pooled covariance matrix is

$$\Sigma_* = \Sigma + \frac{N^2 - 1}{12}\,\boldsymbol{\mu}\boldsymbol{\mu}^\top.$$
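These pooled moments follow from the standard mixture-moment identities and can be checked directly; the sketch below assumes equal weights, as in the setup above, and all names are illustrative:

```python
import numpy as np

N, n, mu_scalar = 4, 3, 0.7
mu = mu_scalar * np.ones(n)                 # common direction vector (all entries equal)
means = [i * mu for i in range(1, N + 1)]   # component means: mu, 2*mu, ..., N*mu
Sigma = np.diag([1.0, 2.0, 0.5])            # shared component covariance
w = np.full(N, 1.0 / N)                     # equal mixture weights

# Mixture moments: mu_* = sum_i w_i mu_i;
# Sigma_* = Sigma + sum_i w_i mu_i mu_i^T - mu_* mu_*^T
mu_star = sum(wi * m for wi, m in zip(w, means))
Sigma_star = (Sigma
              + sum(wi * np.outer(m, m) for wi, m in zip(w, means))
              - np.outer(mu_star, mu_star))

assert np.allclose(mu_star, (N + 1) / 2 * mu)
assert np.allclose(Sigma_star, Sigma + (N**2 - 1) / 12 * np.outer(mu, mu))
```

The $(N^2-1)/12$ factor is the variance of a discrete uniform distribution over $\{1, \ldots, N\}$, which is exactly what the equally spaced, equally weighted component means produce.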
Using Equations (9) and (12), the $\beta$-heterogeneity for this parametric Gaussian mixture can be computed as follows for $q = 1$:

$$\Pi_1^\beta(X) = \frac{(2\pi e)^{\frac{n}{2}} \left|\Sigma + \frac{1}{12}(N^2-1)\,\boldsymbol{\mu}\boldsymbol{\mu}^\top\right|^{\frac{1}{2}}}{(2\pi e)^{\frac{n}{2}} \, |\Sigma|^{\frac{1}{2}}} = |\Sigma|^{-\frac{1}{2}} \left|\Sigma + \frac{1}{12}(N^2-1)\,\boldsymbol{\mu}\boldsymbol{\mu}^\top\right|^{\frac{1}{2}},$$

and as follows for $q \notin \{0, 1, \infty\}$:

$$\Pi_q^\beta(X) = \frac{q^{\frac{n}{2(q-1)}} (2\pi)^{\frac{n}{2}} \left|\Sigma + \frac{1}{12}(N^2-1)\,\boldsymbol{\mu}\boldsymbol{\mu}^\top\right|^{\frac{1}{2}}}{q^{\frac{n}{2(q-1)}} (2\pi)^{\frac{n}{2}} \, |\Sigma|^{\frac{1}{2}}} = |\Sigma|^{-\frac{1}{2}} \left|\Sigma + \frac{1}{12}(N^2-1)\,\boldsymbol{\mu}\boldsymbol{\mu}^\top\right|^{\frac{1}{2}}.$$
We now show that $\frac{\mathrm{d}}{\mathrm{d}\mu} \Pi_q^\beta(X)$ tends to a non-negative constant as $\mu \to \infty$, indicating that the growth in $\beta$-heterogeneity approaches a linear rate. Writing $\boldsymbol{\mu} = \mu \mathbf{1}$, where $\mathbf{1}$ is the $n$-dimensional vector of ones, the matrix determinant lemma gives

$$\Pi_q^\beta(X) = \left(1 + \frac{N^2-1}{12}\,\mu^2\,\mathbf{1}^\top \Sigma^{-1} \mathbf{1}\right)^{\frac{1}{2}},$$

and therefore

$$\lim_{\mu \to \infty} \frac{\mathrm{d}}{\mathrm{d}\mu} \Pi_q^\beta(X) = \left(\frac{N^2-1}{12}\,\mathbf{1}^\top \Sigma^{-1} \mathbf{1}\right)^{\frac{1}{2}},$$

which is non-negative and constant with respect to $\mu$ for all $q > 0$.
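The asymptotically linear growth can be illustrated numerically. This sketch assumes equal component weights and unit-variance components (so that, at $q = 1$, $\Pi_1^\beta(X) = |\Sigma_*|^{1/2}/|\Sigma|^{1/2}$); finite-difference slopes of $\Pi_1^\beta$ with respect to the separation scalar $\mu$ should stabilize as $\mu$ grows. All names are illustrative:

```python
import numpy as np

def beta_het(mu, N=3, n=2):
    """Pi_1^beta for N equally weighted Gaussians with means i*mu*1 and covariance I."""
    ones = np.ones(n)
    Sigma = np.eye(n)
    Sigma_star = Sigma + (N**2 - 1) / 12.0 * (mu**2) * np.outer(ones, ones)
    return np.sqrt(np.linalg.det(Sigma_star) / np.linalg.det(Sigma))

# Finite-difference slope of beta-heterogeneity as the separation mu grows.
h = 1e-4
slopes = [(beta_het(mu + h) - beta_het(mu)) / h for mu in (5.0, 50.0, 500.0)]

# Successive slope differences shrink, i.e., growth becomes linear in mu.
assert abs(slopes[-1] - slopes[-2]) < abs(slopes[-2] - slopes[-3])
```

For $N = 3$ and $n = 2$ with $\Sigma = I$, the limiting slope is $\sqrt{(N^2-1)/12 \cdot n} = 2/\sqrt{3}$, which the finite-difference slopes approach from below.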

Figure 1. Demonstration of the multiplicative decomposition of Rényi heterogeneity in Gaussian mixture models, where γ-heterogeneity is computed using numerical integration. Each row represents a different number of mixture components (from top to bottom: 2, 3, and 4 univariate Gaussians with σ = 0.5, respectively). Each column shows a case in which the component locations are progressively further separated ($\max_i \mu_i - \min_i \mu_i$ from left to right: 0, 2, 4, 6). The α-heterogeneity in all scenarios was 2.07. The headings on each panel show the resulting γ- and β-heterogeneity values.
Figure 2. Graphical counterexample showing that α-heterogeneity is not always a lower bound on γ-heterogeneity when q ≠ 1 for a parametric Gaussian mixture. (A) The four univariate Gaussian components used in the mixture distribution evaluated. (B) Mixture component weights. Each colored line (see (C)) represents a different distribution of weights on the mixture components, such that, in some settings, the narrowest components are weighted highest, and vice versa. Weightings W1 to W7 were generated by varying the parameter a (from Equation (25)) across the following values: a ∈ {0.32, 0.5, 0.79, 1.0, 1.26, 2.0, 3.16}. (C) γ-heterogeneity computed by pooling the mixture components from Panel (A) according to Equation (15), for each weighting scheme at q ≠ 1. (D) α-heterogeneity for each weighting scheme at q ≠ 1. (E) β-heterogeneity for each weighting scheme at q ≠ 1. (F) β-heterogeneity across the various weighting schemes (plotted on the x-axis in log scale) at q = 1. The vertical coloured lines correspond to the values of $\Pi_1^\beta(X)$ across the weighting schemes $W_{1:7}$ shown in the legend of (C).
Figure 3. Comparison of β-heterogeneity for univariate Gaussian mixtures with varying numbers of components (blue lines, N = 2; purple lines, N = 3; gold lines, N = 4). Individual Gaussian components have unit variance, and the mean of component i ∈ {1, …, N} is set to i·μ_o, with μ_o ≥ 0. Solid lines show β-heterogeneity computed for the non-parametric Gaussian mixture using a second-order asymptotic approximation to the integral in Equation (13). Dotted markers show β-heterogeneity of the respective non-parametric Gaussian mixtures with γ-heterogeneity estimated numerically. Dashed lines show the respective β-heterogeneities for parametric Gaussian mixtures.

Share and Cite

MDPI and ACS Style

Nunes, A.; Alda, M.; Trappenberg, T. Multiplicative Decomposition of Heterogeneity in Mixtures of Continuous Distributions. Entropy 2020, 22, 858. https://doi.org/10.3390/e22080858

