Article

Mixture of Species Sampling Models

by Federico Bassetti † and Lucia Ladelli *,†
Department of Mathematics, Politecnico di Milano, 20133 Milano, Italy
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Mathematics 2021, 9(23), 3127; https://doi.org/10.3390/math9233127
Submission received: 5 November 2021 / Revised: 30 November 2021 / Accepted: 2 December 2021 / Published: 4 December 2021

Abstract: We introduce mixtures of species sampling sequences (mSSS) and discuss how these sequences are related to various types of Bayesian models. As a particular case, we recover species sampling sequences with general (not necessarily diffuse) base measures. These models include some “spike-and-slab” non-parametric priors recently introduced to provide sparsity. Furthermore, we show how mSSS arise when considering hierarchical species sampling random probabilities (e.g., the hierarchical Dirichlet process). Extending previous results, we prove that mSSS are obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. Using this representation, we give an explicit expression of the Exchangeable Partition Probability Function of the partition generated by an mSSS. Some special cases are discussed in detail; in particular, species sampling sequences with general base measures and mixtures of species sampling sequences with a Gibbs-type latent partition. Finally, we give explicit expressions of the predictive distributions of an mSSS.

1. Introduction

Discrete random measures have been widely used in Bayesian nonparametrics. Noteworthy examples of such random measures are the Dirichlet process [1], the Pitman–Yor process [2,3], (homogeneous) normalized random measures with independent increments (see, e.g., [4,5,6,7]), Poisson–Kingman random measures [8] and stick-breaking priors [9]. All the previous random measures are of the form
$$P = \sum_{j\ge1} p_j\,\delta_{Z_j}, \tag{1}$$
where $(Z_j)_{j\ge1}$ are i.i.d. random variables taking values in a Polish space $(\mathbb X,\mathcal X)$ with common distribution $H$, and $(p_j)_{j\ge1}$ are random positive weights in $[0,1]$, independent of $(Z_j)_{j\ge1}$, such that $p_1 \ge p_2 \ge p_3 \ge \cdots$.
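As a concrete illustration (our own sketch, not part of the paper), a random measure of the form (1) can be simulated by truncation. Below we draw Dirichlet-process weights by stick breaking and rank them; the truncation level `J`, the concentration `theta` and the choice $H = N(0,1)$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_weights(theta, J, rng):
    """Truncated stick-breaking (GEM) weights for a Dirichlet process."""
    v = rng.beta(1.0, theta, size=J)                      # stick proportions
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return np.sort(w)[::-1]                               # ranked: p_1 >= p_2 >= ...

theta, J = 2.0, 1000
p = stick_breaking_weights(theta, J, rng)                 # random weights (p_j)
Z = rng.standard_normal(J)                                # i.i.d. atoms Z_j ~ H = N(0,1)

# sample n observations from P = sum_j p_j delta_{Z_j} (truncated version)
n = 10
xi = Z[rng.choice(J, size=n, p=p / p.sum())]
print(xi)
```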
With a few exceptions—see, e.g., [1,4,10,11,12,13,14]—the base measure H of a random measure P in (1) is usually assumed to be diffuse, since this simplifies the derivation of various analytical results.
The diffuseness of H is also assumed in the definition of the so-called species sampling sequences [15], i.e., exchangeable sequences whose directing measure is a discrete random probability of type (1). In this case, the diffuseness of H is motivated by the interpretation of species sampling sequences as sequences describing a sampling mechanism for discovering species in an unknown population. In this context, the $Z_j$'s are the infinitely many possible distinct species, and the diffuseness of H ensures that there is no redundancy in this description.
On the other hand, from a Bayesian point of view, the diffuseness of H is not always reasonable and there are situations in which a discrete (or mixed) H is indeed natural. For example, recent interest in species sampling models with a spike-and-slab base measure emerged in [16,17,18,19,20,21] in order to induce sparsity and facilitate variable selection. Other models, which are implicitly related to species sampling sequences with non-diffuse base measures, are mixtures of Dirichlet processes [10] and hierarchical random measures; see, e.g., [22,23,24,25].
The combinatorial structure of species sampling sequences derived from the random measure (1) with general H has been recently studied in [14].
In this paper, we discuss some relevant properties of species sampling sequences with general base measures, as well as some further generalizations, namely mixtures of species sampling sequences with general base measures (mSSS).
An mSSS is an exchangeable sequence whose directing random measure is of type (1), where $(Z_n)_{n\ge1}$ is a sequence of exchangeable random variables and $(p_n)_{n\ge1}$ are random positive weights in $[0,1]$ with $p_1 \ge p_2 \ge p_3 \ge \cdots$, independent of $(Z_n)_{n\ge1}$.
The core of the results that we prove in this paper is that all the mSSS can be obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. We summarize the results of Section 3 in the next statement.
The following are equivalent:
  • $\xi = (\xi_n)_{n\ge1}$ is an mSSS;
  • with probability one $(\xi_n)_{n\ge1} = (Z_{I_n})_{n\ge1}$, where $(I_n)_{n\ge1}$ is a sequence of integer-valued random variables, independent of the $Z$'s, such that, conditionally on $\mathbf p := (p_1,p_2,\dots)$, the $I_n$ are independent and $P\{I_n = i \mid \mathbf p\} = p_i$;
  • with probability one $(\xi_n)_{n\ge1} = (Z'_{C_n(\Pi)})_{n\ge1}$, where $(Z'_n)_{n\ge1}$ is an exchangeable sequence with the same law as $(Z_n)_{n\ge1}$, $\Pi$ is an exchangeable partition, independent of $(Z'_n)_{n\ge1}$, obtained by sampling from $(p_n)_{n\ge1}$, and $C_n(\Pi)$ is the index of the block of $\Pi$ containing $n$.
The partition $\Pi$ obtained from $\mathbf p = (p_1,p_2,\dots)$ is the so-called paint-box process associated with $\mathbf p$. In general, this partition, called the latent partition, does not coincide with the partition induced by the $(\xi_n)_{n\ge1}$. Note that the sequence $(Z'_n)_{n\ge1}$ is also latent, in the sense that it cannot be recovered if only $(\xi_n)_{n\ge1}$ is known. On the other hand, combining the information contained in $(Z'_n)_{n\ge1}$ and in $\Pi$, one obtains complete knowledge of $(\xi_n)_{n\ge1}$ and, in particular, of its clustering behavior. This last observation is essential for the development of all the other results presented in our paper.
The rest of the paper is organized as follows. Section 2 reviews some important results on species sampling models and exchangeable random partitions. Section 3 introduces mixtures of species sampling sequences and discusses how these sequences are related to various types of Bayesian models. In the same section, the stochastic representations for mixtures of species sampling sequences sketched above are proven. In Section 4, we provide an explicit expression of the Exchangeable Partition Probability Function (EPPF) of the partition generated by such sequences. This result is achieved considering two EPPFs arising from a suitable latent partition structure. Some special cases are further detailed. Finally, Section 5 deals with the predictive distributions of mixtures of species sampling sequences.

2. Background Materials

In this section, we briefly review some basic notions of exchangeable random partitions and species sampling models.

2.1. Exchangeable Random Partitions

A partition $\pi_n$ of $[n] := \{1,\dots,n\}$ is an unordered collection $\{\pi_{1,n},\dots,\pi_{k,n}\}$ of disjoint non-empty subsets (blocks) of $\{1,\dots,n\}$ such that $\cup_{j=1}^k \pi_{j,n} = [n]$. A partition $\pi_n = \{\pi_{1,n},\pi_{2,n},\dots,\pi_{k,n}\}$ has $|\pi_n| := k$ blocks (with $1 \le |\pi_n| \le n$) and $|\pi_{c,n}|$, with $c = 1,\dots,k$, is the number of elements of block $c$. We denote by $\mathcal P_n$ the collection of all partitions of $[n]$ and, given a partition, we list its blocks in ascending order of their smallest element, i.e., in order of their appearance. For instance, we write $[(1,3),(2,4),(5)]$ and not $[(2,4),(3,1),(5)]$.
A sequence of random partitions, $\Pi = (\Pi_n)_{n\ge1}$, defined on a common probability space, is called a random partition of $\mathbb N$ if, for each $n$, the random variable $\Pi_n$ takes values in $\mathcal P_n$ and, for $m < n$, the restriction of $\Pi_n$ to $[m]$ is $\Pi_m$ (consistency property).
In order to define an exchangeable random partition, given a permutation $\rho$ of $[n]$ and a partition $\pi_n$ in $\mathcal P_n$, we denote by $\rho(\pi_n)$ the partition with blocks $\{\rho(j) : j \in \pi_{i,n}\}$ for $i = 1,\dots,|\pi_n|$. A random partition of $\mathbb N$ is said to be exchangeable if $\Pi_n$ has the same distribution as $\rho(\Pi_n)$ for every $n$ and every permutation $\rho$ of $[n]$; in other words, its law is invariant under the action of all permutations (acting on $\Pi_n$ in the natural way).
The law of any exchangeable random partition of $\mathbb N$ is completely characterized by its Exchangeable Partition Probability Function (EPPF); in other words, there exists a unique symmetric function $q$ on the integers such that, for any partition $\pi_n$ in $\mathcal P_n$,
$$P\{\Pi_n = \pi_n\} = q\big(|\pi_{1,n}|,\dots,|\pi_{k,n}|\big), \tag{2}$$
where $k$ is the number of blocks of $\pi_n$. In the following, we shall write $\Pi \sim q$ to denote an exchangeable partition of $\mathbb N$ with EPPF $q$. Note that an EPPF is indeed a family of symmetric functions $q_k^n(\cdot)$ defined on $C_{n,k} = \{(n_1,\dots,n_k) \in \mathbb N^k : \sum_{i=1}^k n_i = n\}$. To simplify the notation, we write $q$ instead of $q_k^n$. Alternatively, one can regard $q$ as a function on $\cup_{n\in\mathbb N}\cup_{k=1}^n C_{n,k}$. See [26].
Given a sequence of random variables $X = (X_j)_{j\ge1}$ taking values in some measurable space, the random partition $\Pi^*(X)$ induced by $X$ is defined as the random partition obtained from the equivalence classes of the random equivalence relation $i \sim(\omega)\ j$ if and only if $X_i(\omega) = X_j(\omega)$. One can check that a partition induced by an exchangeable random sequence is an exchangeable random partition.
Recall that, by de Finetti's theorem, a sequence $X = (X_n)_{n\ge1}$ taking values in a Polish space $(\mathbb X,\mathcal X)$ is exchangeable if and only if the $X_n$'s, given some random probability measure $Q$ on $\mathbb X$, are conditionally independent with common distribution $Q$. Moreover, the random probability $Q$, known as the directing random measure of $X$, is the almost sure limit (with respect to weak convergence) of the empirical process $\frac1n\sum_{i=1}^n \delta_{X_i}$.
Based on de Finetti's theorem, Kingman's correspondence theorem sets up a one-to-one map between the law of an exchangeable random partition of $\mathbb N$ (i.e., its EPPF) and the law of random ranked weights $\mathbf p = (p_j)_{j\ge1}$ satisfying $1 \ge p_1 \ge p_2 \ge \cdots \ge 0$ and $\sum_j p_j \le 1$ (with probability one). To state the theorem, recall that a partition $\Pi$ is said to be generated by a (possibly random) $\mathbf p$ if it is generated by a sequence $(I_n)_{n\ge1}$ of integer-valued random variables that are conditionally independent given $\mathbf p$ with conditional distribution
$$P\{I_n = i \mid \mathbf p\} := \begin{cases} 1 - \sum_{j\ge1} p_j & \text{if } i = -n, \\ p_i & \text{if } i \ge 1. \end{cases} \tag{3}$$
Note that $1 - \sum_{j\ge1} p_j$ is the magnitude of the so-called “dust” component; indeed, each $I_n$ sampled from this part, i.e., $I_n = -n$, contributes a singleton $\{n\}$ to the partition $\Pi$. A consequence is that if $\sum_{j\ge1} p_j = 1$ a.s., the partition $\Pi$ has no dust-generated singletons. The partition $\Pi^*(I)$ is sometimes referred to as the $\mathbf p$-paintbox process; see [27].
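As an illustration (our own sketch, not code from the paper), the conditional law (3) can be sampled directly: each $I_n$ picks block $i$ with probability $p_i$ and falls in the dust with the residual mass, in which case we record $I_n = -n$ so that $n$ forms a singleton of $\Pi$.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)

def paintbox_partition(p, n, rng):
    """Sample the p-paintbox partition of [n]: I_m = i w.p. p_i, I_m = -m w.p. 1 - sum(p)."""
    p = np.asarray(p)
    dust = 1.0 - p.sum()
    labels = np.arange(1, len(p) + 1)
    blocks = defaultdict(list)
    for m in range(1, n + 1):
        if rng.random() < dust:
            I = -m                                   # dust: m is a singleton
        else:
            I = rng.choice(labels, p=p / p.sum())    # pick block i w.p. p_i
        blocks[I].append(m)
    # list blocks in order of appearance of their smallest element
    return sorted(blocks.values(), key=min)

print(paintbox_partition([0.5, 0.2, 0.1], n=12, rng=rng))   # dust mass 0.2
```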
Let $\nabla := \{(p_j)_{j\ge1} : p_j \in [0,1],\ p_1 \ge p_2 \ge \cdots,\ \sum_{j\ge1} p_j \le 1\}$. We are now ready to state Kingman's theorem.
Theorem 1
([28]). Given any exchangeable random partition $\Pi$ with EPPF $q$, denote by $\Pi_{j,n}$ the blocks of $\Pi_n$ rearranged in decreasing order of the number of their elements. Then,
$$\lim_{n\to\infty} \Big(\frac{|\Pi_{j,n}|}{n}\Big)_{j\ge1} = (p_j)_{j\ge1} \quad a.s. \tag{4}$$
for some random $\mathbf p = (p_j)_{j\ge1}$ taking values in $\nabla$. Moreover, on a possibly enlarged probability space, there is a sequence of integer-valued random variables $I = (I_n)_{n\ge1}$, conditionally independent given $\mathbf p$, such that (3) holds and the partition induced by $I$ is equal to $\Pi$ a.s.
Kingman's theorem is usually stated in a slightly weaker form (see, e.g., Theorem 2.2 in [26]), where the equality between $\Pi^*(I)$ and $\Pi$ holds in law. The previous “almost sure” version can easily be derived by inspecting the proof of Kingman's theorem given in [29].
A consequence of the previous theorem is that the map $q \mapsto \mathrm{Law}(\mathbf p)$, with $\mathbf p$ as in (4), defines a bijection between the set of EPPFs and the set of laws on $\nabla$.
If $\mathbf p$ is proper, i.e., $\sum_{j\ge1} p_j = 1$ a.s., then Kingman's correspondence between $\mathbf p$ and the EPPF $q$ can be made explicit by
$$q(n_1,\dots,n_k) = \sum_{(j_1,\dots,j_k)\in\mathbb N^k_{\ne}} E\Big[\prod_{i=1}^k p_{j_i}^{\,n_i}\Big], \tag{5}$$
where $\mathbb N^k_{\ne}$ is the set of all ordered $k$-tuples of distinct positive integers. See Chapter 2 of [26].
Given an EPPF $q$, one deduces the corresponding sequence of predictive distributions, i.e., the sequence of conditional distributions
$$P\{\Pi_{n+1} = \pi_{n+1} \mid \Pi_n = \pi_n\}$$
when $\Pi \sim q$. Starting with $\Pi_1 = \{1\}$, given $\Pi_n = \pi_n$ (with $|\pi_n| = k$), the conditional probability of adding a new block (containing $n+1$) to $\Pi_n$ is
$$\nu_n(\pi_n) = \nu_n(|\pi_{1,n}|,\dots,|\pi_{k,n}|) := \frac{q(|\pi_{1,n}|,\dots,|\pi_{k,n}|, 1)}{q(|\pi_{1,n}|,\dots,|\pi_{k,n}|)}, \tag{6}$$
while the conditional probability of adding $n+1$ to the $\ell$-th block of $\Pi_n$ (for $\ell = 1,\dots,k$) is
$$\omega_{n,\ell}(\pi_n) = \omega_{n,\ell}(|\pi_{1,n}|,\dots,|\pi_{k,n}|) := \frac{q(|\pi_{1,n}|,\dots,|\pi_{\ell,n}| + 1,\dots,|\pi_{k,n}|)}{q(|\pi_{1,n}|,\dots,|\pi_{k,n}|)}. \tag{7}$$
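To make (6) and (7) concrete, the sketch below (our own illustration, not from the paper) evaluates $\nu_n$ and $\omega_{n,\ell}$ for the Ewens EPPF $q(n_1,\dots,n_k) = \theta^k \prod_c (n_c-1)! / (\theta)_n$, which corresponds to the Dirichlet process; the resulting predictive weights are the familiar Chinese-restaurant ones, $n_\ell/(\theta+n)$ and $\theta/(\theta+n)$.

```python
from math import factorial, prod

def rising(x, n):
    return prod(x + i for i in range(n))

def ewens_eppf(counts, theta=1.0):
    """Ewens EPPF: theta^k * prod (n_c - 1)! / (theta)_n."""
    n, k = sum(counts), len(counts)
    return theta**k * prod(factorial(c - 1) for c in counts) / rising(theta, n)

def nu(counts, q=ewens_eppf):
    """Probability that n+1 starts a new block, Eq. (6)."""
    return q(counts + [1]) / q(counts)

def omega(counts, ell, q=ewens_eppf):
    """Probability that n+1 joins block ell, Eq. (7)."""
    c = counts.copy(); c[ell] += 1
    return q(c) / q(counts)

counts = [3, 2, 1]                              # block sizes, n = 6, theta = 1
print(nu(counts))                               # 1/7 = theta/(theta + n)
print([omega(counts, l) for l in range(3)])     # [3/7, 2/7, 1/7]
```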

2.2. Species Sampling Models

A species sampling random probability (SSrp) is a random probability of the form
$$P = \sum_{j\ge1} p_j\,\delta_{Z_j} + \Big(1 - \sum_{j\ge1} p_j\Big) H, \tag{8}$$
where $(Z_j)_{j\ge1}$ are i.i.d. random variables taking values in a Polish space $(\mathbb X,\mathcal X)$ with common distribution $H$, and $(p_j)_{j\ge1}$ are random positive weights in $[0,1]$, independent of $(Z_j)_{j\ge1}$, such that $\sum_{j\ge1} p_j \le 1$ with probability one. These random probability measures are also known as Type III random probability measures; see [30].
Given the SSrp in (8), let $(p^{\downarrow}_j)_{j\ge1}$ be the ranked sequence obtained from $(p_j)_{j\ge1}$ by rearranging the $p_j$'s in decreasing order. One can always write
$$P = \sum_{j\ge1} p^{\downarrow}_j\,\delta_{\tilde Z_j} + \Big(1 - \sum_{j\ge1} p^{\downarrow}_j\Big) H, \tag{9}$$
where $(\tilde Z_j)_{j\ge1}$ is a suitable random reordering of the original sequence $(Z_j)_{j\ge1}$. It is easy to check that $(\tilde Z_j)_{j\ge1}$ are i.i.d. random variables with law $H$, independent of $(p^{\downarrow}_j)_{j\ge1}$. Hence, $H$ and the EPPF $q$ associated via Kingman's correspondence with $(p^{\downarrow}_j)_{j\ge1}$ completely characterize the law of $P$, from now on denoted by $SSrp(q,H)$.
$SSrp$'s with diffuse $H$ are also characterized as directing random measures of a particular type of exchangeable sequences, known as species sampling sequences. Let $q$ be an EPPF and $H$ a diffuse probability measure on a Polish space $\mathbb X$. An exchangeable sequence $\xi := (\xi_n)_n$ taking values in $\mathbb X$ is a species sampling sequence, $SSS(q,H)$, if the law of $(\xi_n)_n$ is characterized by the predictive system:
  • (PS1) $P\{\xi_1 \in dx\} = H(dx)$;
  • (PS2) the conditional distribution of $\xi_{n+1}$ given $(\xi_1,\dots,\xi_n)$ is
$$P\big(\xi_{n+1}\in dx \mid \xi_1,\dots,\xi_n\big) = \sum_{c=1}^{K} \omega_{n,c}\,\delta_{\xi^*_c}(dx) + \nu_n\, H(dx),$$
    where $(\xi^*_1,\dots,\xi^*_K)$ is the sequence of distinct observations in order of appearance, $\omega_{n,c} = \omega_{n,c}(|\Pi_{1,n}|,\dots,|\Pi_{K,n}|)$, $\nu_n = \nu_n(|\Pi_{1,n}|,\dots,|\Pi_{K,n}|)$, $K = |\Pi_n|$, $\Pi_n$ is the random partition induced by $(\xi_1,\dots,\xi_n)$, and $\omega_{n,c}$ and $\nu_n$ are related to $q$ by (6) and (7).
We summarize here some results proven in [15].
Proposition 1
([15]). Let $H$ be a diffuse probability measure; then, an exchangeable sequence $(\xi_n)_n$ is characterized by (PS1)–(PS2) if and only if its directing random measure is an $SSrp(q,H)$.
As noted in [29], the partition induced by any exchangeable sequence taking values in $\mathbb X$ with directing measure $\tilde\mu$ depends only on the sequence $\tilde\mu(\{\tilde x_j\})$, where the $\tilde x_j$ are the random atoms forming the discrete component of $\tilde\mu$, ordered in such a way that $\tilde\mu(\{\tilde x_1\}) \ge \tilde\mu(\{\tilde x_2\}) \ge \cdots$. Combining this observation with the previous proposition, one sees that, when $H$ is diffuse and $\xi$ is an $SSS(q,H)$, the partition $\Pi^*(\xi)$ is equal (a.s.) to $\Pi^*(I)$ (where $I$ is defined as in Kingman's theorem) and $\Pi^*(\xi)$ has EPPF $q$. Note that [29] defines the $\mathbf p$-paintbox process as any random partition $\Pi^*(\xi)$ where $\xi$ is an exchangeable sequence with directing random measure (9) and $H$ is a diffuse measure.
One can show (see the proof of Proposition 13 in [15]) that an $SSS(q,H)$ can be obtained by assigning the values of an i.i.d. sequence $(Z_n)_n$ with distribution $H$ to the classes of an independent exchangeable random partition with EPPF $q$. More formally, for a random partition $\Pi$, let $C_n(\Pi)$ be the random index denoting the block containing $n$, i.e.,
$$C_n(\Pi) = c \quad\text{if } n \in \Pi_{c,n},$$
or equivalently if $n \in \Pi_{c,j}$ for some (and hence all) $j \ge n$. If $Z = (Z_n)_n$ is an i.i.d. sequence with law $H$ (diffuse), $\Pi$ is an exchangeable partition with $\Pi \sim q$, and $Z$ and $\Pi$ are stochastically independent, then
$$(\xi_n)_{n\ge1} := (Z_{C_n(\Pi)})_{n\ge1} \tag{10}$$
is an $SSS(q,H)$. Note that the $Z_n$'s appearing in (10) are not the same $Z_n$'s of (8), although they have the same law.
It is worth mentioning that the original characterization given in [15] of species sampling sequences is stronger than the one summarized here. Indeed, the original definition of SSS is given using a slightly weaker predictive assumption. For details, see Proposition 13 and the discussion following Proposition 11 in [15].
In summary, when $H$ is diffuse, one can build a species sampling sequence $(\xi_n)_n$ by one of the following equivalent constructions:
  • using the system of predictive distributions (PS1)–(PS2);
  • sampling (conditionally) i.i.d. variables from (8);
  • combining an i.i.d. sequence from $H$ with an independent exchangeable random partition via (10), as in the sketch below.
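The following minimal sketch (ours, not from the paper) illustrates the third construction: grow $\Pi$ sequentially and set $\xi_n = Z_{C_n(\Pi)}$ with $Z_j$ i.i.d. from a diffuse $H$. A Chinese restaurant process with an arbitrary parameter `theta` stands in for a generic EPPF.

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_block_indices(n, theta, rng):
    """C_1,...,C_n (0-based) for a Chinese restaurant process with parameter theta."""
    sizes, C = [], []                 # block sizes of Pi_n in order of appearance
    for m in range(n):
        probs = np.array(sizes + [theta]) / (m + theta)
        c = rng.choice(len(sizes) + 1, p=probs)
        if c == len(sizes):
            sizes.append(1)           # n+1 opens a new block
        else:
            sizes[c] += 1
        C.append(c)
    return C

n, theta = 10, 1.5
C = crp_block_indices(n, theta, rng)
Z = rng.standard_normal(max(C) + 1)   # i.i.d. block values from diffuse H = N(0,1)
xi = Z[C]                             # species sampling sequence via (10)
print(C, xi, sep="\n")
```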

3. Mixture of Species Sampling Models

We now discuss some possible generalizations of the notion of species sampling sequences, and we show that the three constructions presented above are no longer equivalent in this setting.

3.1. Definitions and Relation to Other Models

Exchangeable sequences sampled from an SSrp with a general base measure, also known as generalized species sampling sequences ($gSSS$), have been introduced and studied in [14,25].
Definition 1
($gSSS(q,H)$). $(\xi_n)_{n\ge1}$ is a $gSSS(q,H)$ if it is an exchangeable sequence with directing random measure $P$, where $P \sim SSrp(q,H)$, $H$ being any probability measure on $(\mathbb X,\mathcal X)$ (not necessarily diffuse).
Clearly, a $gSSS(q,H)$ with $H$ diffuse is an $SSS(q,H)$. In contrast, if $\xi$ is a $gSSS(q,H)$ with $H$ non-diffuse, (PS1)–(PS2) are no longer true. Moreover, the EPPF of the random partition induced by $\xi$ with $H$ non-diffuse is not $q$. The relation between the partition induced by $\xi$ and $q$ has been studied in [14].
In [25], the definition of $gSSS(q,H)$ with $H$ not necessarily diffuse was motivated by an interest in defining the class of so-called hierarchical species sampling models. If $\xi_1,\xi_2,\dots$ are exchangeable random variables with a directing random measure of hierarchical type, one has
$$\xi_n \mid P_1, P_0 \overset{\text{i.i.d.}}{\sim} P_1 \quad (n\ge1), \qquad P_1 \mid P_0 \sim SSrp(q, P_0), \qquad P_0 \sim SSrp(q_0, H_0).$$
In order to understand why the general definition of $gSSS(q,H)$ is useful in this context, note that, even if $H_0$ is diffuse and $q_0$ is proper (i.e., the $\mathbf p$ associated with $q_0$ by Kingman's correspondence is proper), the conditional distribution of $(\xi_n)_{n\ge1}$ given $P_0$ is not an $SSS$, since $P_0$ is a.s. a purely atomic probability measure on $\mathbb X$. Moreover, assuming that $q$ is proper, we can write
$$P_1 = \sum_j p^1_j\,\delta_{Z_j},$$
where the $Z_j$ are conditionally i.i.d. with common distribution $P_0$, given $P_0$, and $(p^1_j)_j$ are associated by Kingman's correspondence with the EPPF $q$. In other words, in this case, $(\xi_n)_{n\ge1}$ are exchangeable with directing random measure $P_1 = \sum_j p^1_j \delta_{Z_j}$, where $(p^1_j)_j$ and $(Z_j)_j$ are independent and $(Z_j)_j$ are exchangeable with directing measure $P_0 \sim SSrp(q_0, H_0)$.
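As a simulation sketch (ours; truncation levels and parameters are arbitrary choices), the hierarchy can be realized with two truncated stick-breaking constructions: first draw $P_0$, then draw the atoms of $P_1$ i.i.d. from the atomic $P_0$, so that ties among the $Z_j$ occur with positive probability.

```python
import numpy as np

rng = np.random.default_rng(3)

def gem(theta, J, rng):
    """Truncated stick-breaking weights, renormalized over the truncation."""
    v = rng.beta(1.0, theta, size=J)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return w / w.sum()

J0, J1 = 500, 500
w0 = gem(theta=1.0, J=J0, rng=rng)            # weights of P_0
atoms0 = rng.standard_normal(J0)              # i.i.d. atoms from diffuse H_0

w1 = gem(theta=2.0, J=J1, rng=rng)            # weights (p_j^1) of P_1
Z = atoms0[rng.choice(J0, size=J1, p=w0)]     # Z_j i.i.d. from P_0 (atomic!)

# sample from P_1 = sum_j p_j^1 delta_{Z_j}: repeated values are possible
xi = Z[rng.choice(J1, size=8, p=w1)]
print(xi)
```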
The previous observation suggests a further generalization of species sampling sequences.
Definition 2
($mSSS$). We say that $(\xi_n)_{n\ge1}$ is a mixture of species sampling sequences ($mSSS$) if it is an exchangeable sequence with directing random measure
$$P = \sum_{j\ge1} p_j\,\delta_{Z_j} + \Big(1 - \sum_{j\ge1} p_j\Big)\tilde H, \tag{11}$$
where $Z := (Z_n)_{n\ge1}$ is an exchangeable sequence with directing random measure $\tilde H$, $\mathbf p$ is a sequence of random weights in $\nabla$ with EPPF $q$ such that $P\{\sum_{j\ge1} p_j > 0\} > 0$, and $(Z,\tilde H)$ and $\mathbf p$ are stochastically independent.
First of all, note that a $gSSS(q,H)$ is a particular case of Definition 2, obtained for a deterministic $\tilde H = H$. Moreover, Definition 2 can be seen as a mixture of $gSSS$. Indeed, if $\xi = (\xi_n)_{n\ge1}$ is as in Definition 2 and $\tilde H$ is the directing random measure of $(Z_n)_n$, then the conditional distribution of $\xi$ given $\tilde H$ is a $gSSS(q,\tilde H)$. This motivates the name “mixture of species sampling sequences”.
It is worth noticing that one can also consider more general mixtures of SSS. The most general mixture one can take into consideration leads to a random probability measure of the form (11), where $Z := (Z_n)_{n\ge1}$ are exchangeable random variables with directing random measure $\tilde H$, $\mathbf p$ is a sequence of random weights in $\nabla$ such that $P\{\sum_{j\ge1} p_j > 0\} > 0$, and $[Z,\tilde H]$ and $\mathbf p$ are not necessarily stochastically independent.
As an example of this more general situation, we describe the so-called mixtures of Dirichlet processes, as defined in [10]. Recall that a Dirichlet process $P \sim Dir(\alpha)$ is a random probability measure characterized by the system of finite-dimensional distributions
$$P\{(P(A_1),\dots,P(A_n)) \in \cdot\,\} = Dir\big(\cdot\,;\alpha(A_1),\dots,\alpha(A_n)\big), \qquad n\ge1,\ A_i\in\mathcal X,$$
where $Dir(\cdot\,;a_1,\dots,a_n)$ is the Dirichlet distribution (on the $(n-1)$-dimensional simplex) with parameters $a_1,\dots,a_n$, and $\alpha$ is a finite $\sigma$-additive measure on $\mathcal X$. It is well known that a Dirichlet process is an $SSrp(q,H)$ for $H(\cdot) = \alpha(\cdot)/\alpha(\mathbb X)$ and
$$q(n_1,\dots,n_k) = \frac{\alpha(\mathbb X)^k}{(\alpha(\mathbb X))_n}\prod_{c=1}^k (n_c-1)!,$$
where $(x)_n = x(x+1)\cdots(x+n-1)$ is the rising factorial (or Pochhammer polynomial); see [2,31]. A mixture of Dirichlet processes is defined in [10] as a random probability measure $P$ characterized by the $n$-dimensional distributions
$$P\{(P(A_1),\dots,P(A_n)) \in \cdot\,\} = \int_{\mathbb U} Dir\big(\cdot\,;\alpha_u(A_1),\dots,\alpha_u(A_n)\big)\, Q(du),$$
where now $(u,A)\mapsto\alpha_u(A)$ is a kernel measure on $\mathbb U\times\mathcal X$ (in particular, $A\mapsto\alpha_u(A)$ is a finite $\sigma$-additive measure on $\mathcal X$ for every $u\in\mathbb U$), $(\mathbb U,\mathcal U)$ is a (Borel) regular space (e.g., a Polish space) and $Q$ is a probability measure on $\mathcal U$.
Using the fact that a Dirichlet process is the $SSrp$ described above, one can prove that any mixture of Dirichlet processes has a representation of the form (11), where $((Z_n)_{n\ge1},\tilde H)$ and $\mathbf p$ are stochastically dependent. More precisely, the joint law of $(\tilde H,(Z_n)_{n\ge1},\mathbf p)$ is characterized by the law of the (augmented) random element
$$(\tilde H,(Z_n)_{n\ge1},\mathbf p,\tilde u)$$
given by the following:
  • $\tilde u$ is a random variable taking values in $\mathbb U$ with law $Q$;
  • $\tilde H(\cdot) := \alpha_{\tilde u}(\cdot)/\alpha_{\tilde u}(\mathbb X)$;
  • $(Z_n)_{n\ge1}$ are exchangeable random variables with directing random measure $\tilde H$;
  • $\mathbf p$ is a sequence of random weights in $\nabla$ such that $P\{\sum_{j\ge1} p_j = 1\} = 1$, and the conditional distribution of $\mathbf p$ given $\tilde u$ depends only on $\alpha_{\tilde u}(\mathbb X)$. In particular, the (conditional) EPPF associated with the law of $\mathbf p$ given $\tilde u$ has the form
$$q(n_1,\dots,n_k \mid \tilde u) := \frac{\alpha_{\tilde u}(\mathbb X)^k}{(\alpha_{\tilde u}(\mathbb X))_n}\prod_{c=1}^k (n_c-1)!. \tag{14}$$
Note that the marginal EPPF of $\mathbf p$, obtained by integrating (14) with respect to the law of $\tilde u$, is
$$q(n_1,\dots,n_k) = \prod_{c=1}^k (n_c-1)! \int_{\mathbb U} \frac{\alpha_u(\mathbb X)^k}{(\alpha_u(\mathbb X))_n}\, Q(du). \tag{15}$$
Without further assumptions, a mixture of Dirichlet processes is a mixture of SSrp with $\mathbf p$ and $\tilde H(\cdot)$ possibly dependent. Nevertheless, with this representation at hand, one easily deduces that if $(\xi_n)_{n\ge1}$ is sampled from a mixture of Dirichlet processes under the additional hypothesis that $Q$ is such that $\alpha_{\tilde u}(\mathbb X)$ and $\alpha_{\tilde u}(\cdot)/\alpha_{\tilde u}(\mathbb X)$ are independent, then $(\xi_n)_{n\ge1}$ satisfies Definition 2, with $\tilde H = \alpha_{\tilde u}(\cdot)/\alpha_{\tilde u}(\mathbb X)$ and $q$ given by (15).
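For instance (our own numerical sketch, not from the paper), if under $Q$ the total mass $\alpha_u(\mathbb X)$ has a density on $(0,\infty)$, the marginal EPPF (15) can be evaluated by quadrature; below we take a Gamma-distributed total mass as an arbitrary illustrative choice.

```python
from math import factorial, prod, exp, lgamma

def rising(x, n):
    return prod(x + i for i in range(n))

def gamma_pdf(t, shape=2.0, rate=1.0):
    """Density of the (assumed) law of the total mass alpha_u(X)."""
    return rate**shape * t**(shape - 1) * exp(-rate * t - lgamma(shape))

def mixture_dp_eppf(counts, pdf=gamma_pdf, t_max=60.0, steps=20000):
    """Eq. (15): prod (n_c-1)! * integral of theta^k/(theta)_n pdf(theta) dtheta
    (midpoint rule on [0, t_max])."""
    n, k = sum(counts), len(counts)
    h = t_max / steps
    integral = sum(((i + 0.5) * h)**k / rising((i + 0.5) * h, n) * pdf((i + 0.5) * h)
                   for i in range(steps)) * h
    return prod(factorial(c - 1) for c in counts) * integral

print(mixture_dp_eppf([2, 1]))   # q(2,1) for a Gamma(2,1) mixing law
```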
In the rest of the paper, we focus our attention on mSSS for which $[Z,\tilde H]$ and $\mathbf p$ are independent.

3.2. Representation Theorems for mSSS

In this section, we give two alternative representations for exchangeable sequences as in Definition 2, which generalize Proposition 1 in [14].
Proposition 2.
An exchangeable sequence $\xi = (\xi_n)_{n\ge1}$ is an $mSSS$ as in Definition 2 if and only if
$$\xi_n = Z_{I_n} \quad a.s.,$$
where $Z^+ = (Z_n)_{n\ge1}$, $\tilde H$ and $\mathbf p$ are as in Definition 2, $Z^- = (Z_{-n})_{n\ge1}$ are further conditionally (given $\tilde H$) i.i.d. random variables with conditional distribution $\tilde H$, and $(I_n)_{n\ge1}$ is a sequence of integer-valued random variables, independent of the $Z$'s and $\tilde H$, such that, conditionally on $\mathbf p$, the $I_n$ are independent and (3) holds. All the random elements are defined on a possibly enlarged probability space.
Proof. 
Let $\sigma_2 = [Z^+, \tilde H, \mathbf p]$, where $Z^+$, $\tilde H$, $\mathbf p$ are defined as in Definition 2. Set $\alpha = 1 - \sum_{j\ge1} p_j$. On a possibly enlarged probability space, let $Z^- = (Z_{-n})_{n\ge1}$ be a sequence of random variables conditionally i.i.d. given $\tilde H$ with conditional distribution $\tilde H$, and let $I' = (I'_n)_{n\ge1}$ be a sequence of integer-valued random variables conditionally independent given $\mathbf p$ with conditional distribution (3), with $I'_n$ in place of $I_n$. One can also assume that $I'$ and $Z = [Z^+, Z^-]$ are independent given $[\mathbf p,\tilde H]$; see Lemma A1 in Appendix A. Set $\tau_1 = [I', Z^-]$ and define
$$(\xi'_n)_{n\ge1} = \phi(\tau_1,\sigma_2) := \big(Z_{I'_n}\,\mathbf 1\{I'_n \ge 1\} + Z_{-n}\,\mathbf 1\{I'_n = -n\}\big)_{n\ge1}.$$
Let us show that the law of $\xi' := (\xi'_n)_{n\ge1}$ given $\sigma_2$ is the same as the law of $\xi$ given $\sigma_2$. Take $n$ Borel sets $A_1,\dots,A_n$ and non-zero integers $i_1,\dots,i_n$. One has
$$P\big\{\xi'_1\in A_1,\dots,\xi'_n\in A_n, I'_1 = i_1,\dots,I'_n = i_n \mid \tilde H,\mathbf p, Z\big\} = \prod_{j=1}^n\Big[\delta_{Z_{i_j}}(A_j)\,p_{i_j}\,\mathbf 1\{i_j > 0\} + \alpha\,\delta_{Z_{-j}}(A_j)\,\mathbf 1\{i_j = -j\}\Big].$$
Conditionally on $\tilde H$, the $(Z_{-n})_{n\ge1}$ are i.i.d. with law $\tilde H$, so that
$$\begin{aligned}
P\big\{\xi'_1\in A_1,\dots, I'_n = i_n \mid \tilde H,\mathbf p,Z^+\big\}
&= E\Big[P\big\{\xi'_1\in A_1,\dots, I'_n = i_n \mid \tilde H,\mathbf p,Z\big\} \,\Big|\, \tilde H,\mathbf p,Z^+\Big] \\
&= E\Big[\prod_{j=1}^n\big[\delta_{Z_{i_j}}(A_j)\,p_{i_j}\mathbf 1\{i_j>0\} + \alpha\,\delta_{Z_{-j}}(A_j)\mathbf 1\{i_j=-j\}\big] \,\Big|\, \tilde H,\mathbf p,Z^+\Big] \\
&= \prod_{j=1}^n E\Big[\delta_{Z_{i_j}}(A_j)\,p_{i_j}\mathbf 1\{i_j>0\} + \alpha\,\delta_{Z_{-j}}(A_j)\mathbf 1\{i_j=-j\} \,\Big|\, \tilde H,\mathbf p,Z^+\Big] \\
&= \prod_{j=1}^n\Big[\delta_{Z_{i_j}}(A_j)\,p_{i_j}\mathbf 1\{i_j>0\} + \alpha\,\tilde H(A_j)\,\mathbf 1\{i_j=-j\}\Big].
\end{aligned}$$
Marginalizing with respect to $i_1,\dots,i_n$,
$$P\big\{\xi'_1\in A_1,\dots,\xi'_n\in A_n \mid \tilde H,\mathbf p,Z^+\big\} = \prod_{j=1}^n P(A_j).$$
Recalling that $P = \sum_{j\ge1} p_j\delta_{Z_j} + \alpha\tilde H$,
$$P\big\{\xi'_1\in A_1,\dots,\xi'_n\in A_n \mid P\big\} = \prod_{j=1}^n P(A_j)$$
almost surely. Since $\mathbb X$ is Polish, we have proven that, given $P$, the $(\xi'_n)_{n\ge1}$ are i.i.d. with common distribution $P$. In particular, we have proven that the law of $\xi' := (\xi'_n)_{n\ge1}$ given $\sigma_2$ is the same as the law of $\xi$ given $\sigma_2$. This concludes the proof of the “if” part, since, by the previous argument, any sequence of the form $\xi'$ is of mSSS type. To complete the proof, it remains to prove the “only if” part. Setting $\sigma_1 = \xi$, we have proven that the conditional distribution of $\sigma_1$ given $\sigma_2$ is the same as the conditional distribution of $\phi(\tau_1,\sigma_2)$ given $\sigma_2$. At this stage, Lemma A3 in Appendix A yields that there is $\tau = [(I_n)_{n\ge1},(Z_{-n})_{n\ge1}]$ such that $(\xi_n)_n = \phi(\tau,\sigma_2)$ a.s., i.e., $(\xi_n)_n = (Z_{I_n})_n$ a.s. In addition, $\mathcal L(\tau,\sigma_2) = \mathcal L(\tau_1,\sigma_2)$; hence, the $(Z_{-n})_{n\ge1}$ are conditionally i.i.d. given $\tilde H$ and the $I_n$'s are conditionally independent given $[Z^+,Z^-,\tilde H,\mathbf p]$ with the conditional distribution defined by (3). □
Proposition 3.
An exchangeable sequence $\xi = (\xi_n)_{n\ge1}$ is an $mSSS$ as defined in Definition 2 if and only if
$$(\xi_n)_{n\ge1} = (Z'_{C_n(\Pi)})_{n\ge1} \quad a.s., \tag{16}$$
where $Z' := (Z'_n)_{n\ge1}$ is an exchangeable sequence with the same law as $Z$, $\Pi$ is an exchangeable partition with EPPF $q$, and $\Pi$ and $Z'$ are independent.
Remark 1.
Note that the $Z'_n$'s appearing in (16) are not the same as the $Z_n$'s appearing in Definition 2, although they have the same law.
Proof of Proposition 3.
If $\xi$ is an mSSS then, by Proposition 2, we know that $\xi = (Z_{I_n})_{n\ge1}$. Let $\Pi = \Pi^*(I)$ be the partition induced by $(I_n)_{n\ge1}$; then, $\Pi$ has EPPF $q$ by Kingman's Theorem 1. Denote by $I^*_1 = I_1, I^*_2,\dots,I^*_K$ (with $K \le +\infty$) the distinct values of $(I_n)_{n\ge1}$ in order of appearance, and set
$$Z'_n = Z_{I^*_n}, \qquad n = 1,\dots,K.$$
When $K < +\infty$, set $I^*_{K+j} = D + j$, where $D = \max\{i : i = I^*_n \text{ for some } n \le K\}$, and define the remaining $Z'_m$ for $m > K$ accordingly as $Z'_m = Z_{I^*_m}$. Let $\{i_1,\dots,i_M\}$ be integers in $\mathbb Z\setminus\{0\}$, and denote the distinct values in $(i_1,\dots,i_M)$ in order of appearance by $(i^*_1,\dots,i^*_m)$. Let $A_1,\dots,A_n$ be measurable sets in $\mathcal X$; if $n > m$, then
$$P\{Z'_1\in A_1,\dots,Z'_n\in A_n, I_1 = i_1,\dots,I_M = i_M\} = \sum_{\ell_1,\dots,\ell_{n-m}} P\{Z_{i^*_1}\in A_1,\dots,Z_{i^*_m}\in A_m, Z_{\ell_1}\in A_{m+1},\dots,Z_{\ell_{n-m}}\in A_n, I_1 = i_1,\dots,I_M = i_M, I^*_{m+1} = \ell_1,\dots,I^*_n = \ell_{n-m}\},$$
where the sum runs over all the non-zero distinct integers $\ell_1,\dots,\ell_{n-m}$ different from $i^*_1,\dots,i^*_m$. Since $I^*$ is a function of $I$, and $I$ and $Z$ are independent, it follows that
$$\begin{aligned}
&P\{Z_{i^*_1}\in A_1,\dots,Z_{\ell_{n-m}}\in A_n, I_1 = i_1,\dots,I^*_n = \ell_{n-m}\} \\
&\quad= P\{Z_{i^*_1}\in A_1,\dots,Z_{i^*_m}\in A_m, Z_{\ell_1}\in A_{m+1},\dots,Z_{\ell_{n-m}}\in A_n\}\, P\{I_1 = i_1,\dots,I_M = i_M, I^*_{m+1} = \ell_1,\dots,I^*_n = \ell_{n-m}\} \\
&\quad= P\{Z_1\in A_1,\dots,Z_n\in A_n\}\, P\{I_1 = i_1,\dots,I_M = i_M, I^*_{m+1} = \ell_1,\dots,I^*_n = \ell_{n-m}\},
\end{aligned}$$
where the second equality follows by exchangeability. Summing over $\ell$, one obtains
$$P\{Z'_1\in A_1,\dots,Z'_n\in A_n, I_1 = i_1,\dots,I_M = i_M\} = P\{Z_1\in A_1,\dots,Z_n\in A_n\}\, P\{I_1 = i_1,\dots,I_M = i_M\}.$$
For $m \ge n$, the sum is not needed and the same result follows. This shows that $(Z'_n)_n$ is an exchangeable sequence with the same law as $Z$, and that $(Z'_n)_n$ and $(I_n)_{n\ge1}$ are independent. To conclude, note that, with probability one, $I^*_{C_n(\Pi)} = I_n$, and hence
$$\xi_n = Z_{I_n} = Z_{I^*_{C_n(\Pi)}} = Z'_{C_n(\Pi)}.$$
Conversely, assume that $\xi_n = Z'_{C_n(\Pi)}$ and let $(p_j)_{j\ge1}$ be the weights obtained from $\Pi$ by (4). Let $I_1,I_2,\dots$ be the integer-valued random variables appearing in Theorem 1 such that $\Pi = \Pi^*(I)$ a.s. It follows that $C_n(\Pi) = C_n(\Pi^*(I))$ and $I^*_{C_n(\Pi)} = I_n$, where the $I^*_n$ are defined as above for $n \le K$. Set
$$Z_m := \begin{cases} Z'_k & \text{if } I^*_k = m \text{ for some } k, \\ Z''_m & \text{if } I^*_k \ne m \text{ for all } k, \end{cases}$$
with $Z''_m$, $m \in \mathbb Z\setminus\{0\}$, conditionally i.i.d. given $\tilde H$ with law $\tilde H$, independent of everything else. Arguing as above, one can check that the $(Z_m)_{m\in\mathbb Z,\,m\ne0}$ are exchangeable random variables with the same law as $Z'$, independent of $(I,\mathbf p)$. To conclude, note that, in particular,
$$Z_{I_n} = Z_{I^*_{C_n(\Pi)}} = Z'_{C_n(\Pi)} \quad a.s.$$
The conclusion follows by Proposition 2. □
A simple consequence of the previous proposition is the following.
Corollary 1.
Let $(\xi_n)_{n\ge1}$ be an $mSSS$ as defined in Definition 2. For all Borel sets $A_1,\dots,A_n$ in $\mathcal X$,
$$P\big(\xi_1\in A_1,\dots,\xi_n\in A_n\big) = \sum_{\pi_n\in\mathcal P_n} q\big(|\pi_{1,n}|,\dots,|\pi_{k,n}|\big)\, E\Big[\prod_{c=1}^{|\pi_n|} \tilde H\Big(\bigcap_{j\in\pi_{c,n}} A_j\Big)\Big].$$

4. Random Partitions Induced by mSSS

Let $\tilde\Pi = \Pi^*(\xi)$ be the random partition induced by an exchangeable sequence $\xi$ defined as in Definition 2, and let $\Pi^{(0)} := \Pi^*(Z')$ be the partition induced by the corresponding exchangeable sequence $(Z'_n)_n$ (see Proposition 3). Finally, let $\Pi$ be the partition with EPPF $q$ appearing in Proposition 3. As already observed, if $Z'$ is an i.i.d. sequence from a diffuse $H$, then $\Pi^{(0)} = [(1),(2),(3),\dots]$ a.s. and hence $\Pi^*(\xi) = \Pi$. The same holds if $Z'$ is exchangeable without ties (see Corollary 2). When $\Pi^{(0)}$ is not the trivial partition, it is clear by construction that different blocks of $\Pi$ can merge in the final clustering configuration (i.e., in $\Pi^*(\xi)$). In other words, two observations can share the same value either because they belong to the same block of the latent partition $\Pi$, or because they lie in different blocks which happen to receive the same value (from $Z'$). This simple observation leads us to write the EPPF of the random partition $\Pi^*(\xi)$ in terms of the EPPFs of $\Pi^{(0)}$ and of $\Pi$.

4.1. Explicit Expression of the EPPF

If $\tilde\pi_n = \{\tilde\pi_{1,n},\dots,\tilde\pi_{k,n}\}$ is a partition of $[n]$ with $|\tilde\pi_{i,n}| = n_i$ ($i = 1,\dots,k$) and $\mathbf n = (n_1,\dots,n_k)$, we can easily describe all the partitions $\pi_n$, finer than $\tilde\pi_n$, which are compatible with $\tilde\pi_n$ in the merging process described above. To do this, note first that any block $\tilde\pi_{i,n}$ can arise as the union of $1 \le m_i \le n_i$ blocks of the latent partition. Hence, given $\mathbf n = (n_1,\dots,n_k)$, where $n = \sum_{i=1}^k n_i$, we define the set
$$M(\mathbf n) = \big\{\mathbf m = (m_1,\dots,m_k)\in\mathbb N^k : 1 \le m_i \le n_i\big\}. \tag{17}$$
See Figure 1 for an example. Once a specific configuration $\mathbf m$ in $M(\mathbf n)$ is considered, the $m_i$ blocks of the latent partition contributing to the block $\tilde\pi_{i,n}$ are characterized by the sufficient statistic $\lambda_i = (\lambda_{i1},\dots,\lambda_{in_i})\in\mathbb N^{n_i}$, where $\lambda_{ij}$ is the number of blocks with $j$ elements among the $m_i$ blocks above. This leads, for $\mathbf m$ in $M(\mathbf n)$, to the definition of
$$\Lambda(\mathbf n,\mathbf m) := \Big\{\boldsymbol\lambda = [\lambda_1,\dots,\lambda_k],\ \lambda_i = (\lambda_{i1},\dots,\lambda_{in_i})\in\mathbb N^{n_i} :\ \sum_{j=1}^{n_i} j\,\lambda_{ij} = n_i,\ \sum_{j=1}^{n_i}\lambda_{ij} = m_i \ \text{for } i = 1,\dots,k\Big\}.$$
In summary, the set of partitions $\pi_n$ which are compatible with $\tilde\pi_n$ in the merging process described above can be written as
$$\mathcal P_{\tilde\pi_n} := \bigcup_{\mathbf m\in M(\mathbf n)}\ \bigcup_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)} \mathcal P_{\tilde\pi_n}(\boldsymbol\lambda),$$
where $\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)$ is the set of all the partitions in $\mathcal P_n$ with $m_1 + \dots + m_k =: |\mathbf m|$ blocks such that:
  • it is possible to split these blocks into $k$ subsets containing $m_1,\dots,m_k$ of them;
  • the union of the blocks in the $i$-th subset coincides with the $i$-th block of $\tilde\pi_n$, for $i = 1,\dots,k$;
  • in the $i$-th subset, there are $\lambda_{ij}$ blocks with $j$ elements, for $j = 1,\dots,n_i$.
Given the EPPF $q$, let
$$\bar q(\boldsymbol\lambda) := q(n_{11},\dots,n_{1m_1},n_{21},\dots,n_{km_k}),$$
where $(n_{11},\dots,n_{1m_1},\dots,n_{km_k})$ is any sequence of integers such that $\sum_{c=1}^{m_i} n_{ic} = \sum_j j\,\lambda_{ij}$ for every $i$ and $\#\{c : n_{ic} = j\} = \lambda_{ij}$ for every $i$ and $j$. Since the value of $q(n_{11},\dots,n_{1m_1},n_{21},\dots,n_{km_k})$ depends only on the statistic $\boldsymbol\lambda$, $\bar q(\boldsymbol\lambda)$ is well defined. See, e.g., [26].
Finally, recall that the cardinality of $\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)$ is
$$c(\boldsymbol\lambda) := \prod_{i=1}^k \frac{\big(\sum_j j\,\lambda_{ij}\big)!}{\prod_{j=1}^{n_i}\lambda_{ij}!\,(j!)^{\lambda_{ij}}} = \prod_{i=1}^k \frac{n_i!}{\prod_{j=1}^{n_i}\lambda_{ij}!\,(j!)^{\lambda_{ij}}};$$
see Equation (39) in [15].
Proposition 4.
Let $\xi = (\xi_n)_{n\ge1}$ be an $mSSS$. Denote by $\tilde\Pi = \Pi^*(\xi)$ the random partition induced by $\xi$ and by $q^{(0)}$ the EPPF of the partition induced by $(Z'_n)_{n\ge1}$. If $\tilde\pi_n = \{\tilde\pi_{1,n},\dots,\tilde\pi_{k,n}\}$ is a partition of $[n]$ with $|\tilde\pi_{i,n}| = n_i$ ($i = 1,\dots,k$) and $\mathbf n = (n_1,\dots,n_k)$, then
$$P\{\tilde\Pi_n = \tilde\pi_n\} = \sum_{\mathbf m\in M(\mathbf n)} q^{(0)}(\mathbf m) \sum_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)} c(\boldsymbol\lambda)\,\bar q(\boldsymbol\lambda). \tag{18}$$
Proof. 
Start by writing
$$P\{\tilde\Pi_n = \tilde\pi_n\} = P\Big(\bigcup_{\mathbf m\in M(\mathbf n)}\ \bigcup_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)}\ \bigcup_{\pi_n\in\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)} \{\Pi_n = \pi_n,\ \tilde\Pi_n = \tilde\pi_n\}\Big),$$
which gives
$$P\{\tilde\Pi_n = \tilde\pi_n\} = \sum_{\mathbf m\in M(\mathbf n)}\ \sum_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)}\ \sum_{\pi_n\in\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)} P\{\tilde\Pi_n = \tilde\pi_n \mid \Pi_n = \pi_n\}\, P\{\Pi_n = \pi_n\}. \tag{20}$$
Whenever $\pi_n \in \mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)$,
$$P\{\Pi_n = \pi_n\} = \bar q(\boldsymbol\lambda).$$
Therefore, we can write (20) as
$$P\{\tilde\Pi_n = \tilde\pi_n\} = \sum_{\mathbf m\in M(\mathbf n)}\ \sum_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)}\ \sum_{\pi_n\in\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)} P\{\tilde\Pi_n = \tilde\pi_n \mid \Pi_n = \pi_n\}\,\bar q(\boldsymbol\lambda).$$
Define now the map $M_{\tilde\pi_n,\pi_n} : \{1,\dots,|\mathbf m|\} \to \{1,\dots,|\tilde\pi_n|\}$ by $M_{\tilde\pi_n,\pi_n}(j) = i$ if $\pi_{j,n}$ belongs to the $i$-th subset of blocks, i.e., if $\pi_{j,n} \subset \tilde\pi_{i,n}$. Recalling that $k$ is the number of blocks of $\tilde\pi_n$, define a partition $\pi(M_{\tilde\pi_n,\pi_n})$ of $\{1,\dots,|\mathbf m|\}$ with $k$ blocks, whose $i$-th block is
$$\{j : M_{\tilde\pi_n,\pi_n}(j) = i\}.$$
Recalling that $\Pi^{(0)}$ is the partition induced by the $Z'$'s, one has
$$\{\tilde\Pi_n = \tilde\pi_n,\ \Pi_n = \pi_n\} = \{\Pi^{(0)}_{|\mathbf m|} = \pi(M_{\tilde\pi_n,\pi_n}),\ \Pi_n = \pi_n\},$$
which gives
$$P\{\tilde\Pi_n = \tilde\pi_n \mid \Pi_n = \pi_n\} = P\{\Pi^{(0)}_{|\mathbf m|} = \pi(M_{\tilde\pi_n,\pi_n}) \mid \Pi_n = \pi_n\} = P\{\Pi^{(0)}_{|\mathbf m|} = \pi(M_{\tilde\pi_n,\pi_n})\},$$
since $\Pi^{(0)}$ and $\Pi$ are independent. To conclude, note that the vector of the cardinalities of the blocks of $\pi(M_{\tilde\pi_n,\pi_n})$ is $\mathbf m$; hence, if $q^{(0)}$ is the EPPF of $\Pi^{(0)}$, one has $P\{\Pi^{(0)}_{|\mathbf m|} = \pi(M_{\tilde\pi_n,\pi_n})\} = q^{(0)}(\mathbf m)$. Since the cardinality of $\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)$ is $c(\boldsymbol\lambda)$, one obtains the thesis. □
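Formula (18) groups, by the statistics $(\mathbf m,\boldsymbol\lambda)$, the terms of the elementary sum over latent partitions $\pi_n$ refining $\tilde\pi_n$. The sketch below (ours, not from the paper) computes that elementary sum directly for small $n$, with the Ewens EPPF as $q$ and the EPPF of an i.i.d. sample from a two-atom law as $q^{(0)}$ (both arbitrary choices), and checks that the probabilities of all partitions of $[4]$ sum to one.

```python
from itertools import permutations
from math import factorial, prod

def set_partitions(elems):
    """Enumerate all partitions of a list, each exactly once."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for part in set_partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def ewens(counts, theta=1.0):
    """Ewens (Dirichlet process) EPPF, used for the latent q."""
    n, k = sum(counts), len(counts)
    return theta**k * prod(factorial(c - 1) for c in counts) / prod(theta + i for i in range(n))

def q0_atoms(m, a=(0.7, 0.3)):
    """EPPF q^(0) of an i.i.d. sample from a purely atomic law with masses a."""
    k = len(m)
    if k > len(a):
        return 0.0
    return sum(prod(a[z[j]] ** m[j] for j in range(k)) for z in permutations(range(len(a)), k))

def prob_tilde(tilde, theta=1.0, a=(0.7, 0.3)):
    """P{tilde Pi_n = tilde} as the sum of q(pi) * q0(m) over latent pi refining tilde;
    grouping terms by (m, lambda) yields Eq. (18)."""
    n = sum(len(B) for B in tilde)
    total = 0.0
    for pi in set_partitions(list(range(1, n + 1))):
        if all(any(set(b) <= set(B) for B in tilde) for b in pi):        # pi refines tilde
            m = [sum(1 for b in pi if set(b) <= set(B)) for B in tilde]  # merge counts
            total += ewens([len(b) for b in pi], theta) * q0_atoms(m, a)
    return total

all_parts = list(set_partitions(list(range(1, 5))))   # the 15 partitions of [4]
print(sum(prob_tilde(t) for t in all_parts))          # ~1.0
```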
Corollary 2.
Let $\xi = (\xi_n)_n$ be an mSSS. If $P\{Z'_1 = Z'_2\} = 0$, then $\Pi^*(\xi) = \Pi$ with probability one.
Proof. 
If $P\{Z'_1 = Z'_2\} = 0$ then, by exchangeability, $P\{Z'_{i_1} = Z'_{i_2} = \cdots = Z'_{i_k}\} \le P\{Z'_1 = Z'_2\} = 0$. Hence, the $Z'_i$'s are distinct with probability one. Since $(\xi_1,\dots,\xi_n) = (Z'_{C_1(\Pi)},\dots,Z'_{C_n(\Pi)})$ by Proposition 3, it follows that $\tilde\Pi_n = \Pi_n$. □
Remark 2.
Note that, as a special case, we recover the fact that if $\xi$ is a $gSSS(q,H)$ with $H$ diffuse (i.e., it is an $SSS(q,H)$), then the random partition induced by $\xi$ is a.s. $\Pi$.
Remark 3.
The fact that the EPPF of $\tilde\Pi_n$ is $q$ when $P\{Z'_1 = Z'_2\} = 0$ can also be deduced from (18). Indeed, if $P\{Z'_1 = Z'_2\} = 0$, then the partition induced by $Z'$ is a.s. the trivial partition $[(1),(2),(3),\dots]$, so that $q^{(0)}(\mathbf m) = 0$ for every $\mathbf m \ne (1,1,\dots,1)$. For $\mathbf m = (1,1,\dots,1)$, $\tilde\pi_n = \{\tilde\pi_{1,n},\dots,\tilde\pi_{k,n}\}$ with $|\tilde\pi_{i,n}| = n_i$ ($i = 1,\dots,k$) and $\mathbf n = (n_1,\dots,n_k)$, the set $\Lambda(\mathbf n,\mathbf m)$ reduces to the singleton $\boldsymbol\lambda^{(1)} := [\lambda_1,\dots,\lambda_k]$, where $\lambda_i = (0,0,\dots,1)$ has length $n_i$. Hence, $\bar q(\boldsymbol\lambda^{(1)}) = q(\mathbf n)$ and (18) gives $P\{\tilde\Pi_n = \tilde\pi_n\} = \bar q(\boldsymbol\lambda^{(1)}) = q(\mathbf n)$.

4.2. EPPF When Π Is of Gibbs Type

An important class of exchangeable random partitions is that of Gibbs-type partitions, introduced in [32] and characterized by the EPPF
$$q(n_1,\dots,n_k) := V_{n,k}\prod_{j=1}^k (1-\sigma)_{n_j-1}, \tag{22}$$
where $(x)_n = x(x+1)\cdots(x+n-1)$, $\sigma < 1$, and the $V_{n,k}$ are positive real numbers such that $V_{1,1} = 1$ and
$$(n - \sigma k)\,V_{n+1,k} + V_{n+1,k+1} = V_{n,k}, \qquad n \ge 1,\ 1 \le k \le n.$$
A noteworthy example of a Gibbs-type EPPF is the so-called Pitman–Yor two-parameter family. It is defined by
$$q(n_1,\dots,n_k) := \frac{\prod_{i=1}^{k-1}(\theta + i\sigma)}{(\theta+1)_{n-1}}\prod_{c=1}^k (1-\sigma)_{n_c-1}, \tag{23}$$
where $0 \le \sigma < 1$ and $\theta > -\sigma$, or $\sigma < 0$ and $\theta = |\sigma|m$ for some integer $m$; see [2,31].
In order to state the next result, we recall that
$$\sum_{\substack{(\lambda_1,\dots,\lambda_n):\\ \sum_j j\lambda_j = n,\ \sum_j \lambda_j = k}}\ \prod_{j=1}^n \big[(1-\sigma)_{j-1}\big]^{\lambda_j}\,\frac{n!}{\prod_{j=1}^n \lambda_j!\,(j!)^{\lambda_j}} = S_\sigma(n,k), \tag{24}$$
where $S_\sigma(n,k)$ is the generalized Stirling number of the first kind; see (3.12) in [26], where various equivalent definitions of generalized Stirling numbers are presented.
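The numbers $S_\sigma(n,k)$ can be computed with the standard triangular recursion $S_\sigma(n+1,k) = (n - \sigma k)\,S_\sigma(n,k) + S_\sigma(n,k-1)$, which can be checked directly against the sum in (24) for small $n$. The sketch below is our own illustration, not code from the paper; for $\sigma = 0$ it recovers the unsigned Stirling numbers of the first kind.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def gen_stirling(n, k, sigma):
    """Generalized Stirling number S_sigma(n, k) via
    S_sigma(n, k) = (n - 1 - sigma*k) S_sigma(n-1, k) + S_sigma(n-1, k-1)."""
    if n == 0:
        return 1.0 if k == 0 else 0.0
    if k == 0 or k > n:
        return 0.0
    return (n - 1 - sigma * k) * gen_stirling(n - 1, k, sigma) + gen_stirling(n - 1, k - 1, sigma)

print([gen_stirling(4, k, 0.0) for k in range(1, 5)])   # [6.0, 11.0, 6.0, 1.0]
print(gen_stirling(3, 2, 0.5))                          # 3*(1 - 0.5) = 1.5
```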
Corollary 3.
Let $\tilde\Pi = \Pi^*(\xi)$ be defined as in Proposition 4, with $q$ of Gibbs type as in (22). If $\tilde\pi_n = \{\tilde\pi_{1,n},\dots,\tilde\pi_{k,n}\}$ is a partition of $[n]$ with $|\tilde\pi_{i,n}| = n_i$ ($i = 1,\dots,k$) and $\mathbf n = (n_1,\dots,n_k)$, then
$$P\{\tilde\Pi_n = \tilde\pi_n\} = \sum_{\mathbf m\in M(\mathbf n)} q^{(0)}(\mathbf m)\, V_{n,|\mathbf m|}\prod_{i=1}^k S_\sigma(n_i, m_i).$$
Proof. 
Combining Proposition 4 with (22), one obtains
$$\begin{aligned}
P\{\tilde\Pi_n = \tilde\pi_n\} &= \sum_{\mathbf m\in M(\mathbf n)} q^{(0)}(\mathbf m)\, V_{n,|\mathbf m|} \sum_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)} \prod_{i=1}^k\prod_{j=1}^{n_i}\big[(1-\sigma)_{j-1}\big]^{\lambda_{ij}}\,\frac{n_i!}{\prod_{j=1}^{n_i}\lambda_{ij}!\,(j!)^{\lambda_{ij}}} \\
&= \sum_{\mathbf m\in M(\mathbf n)} q^{(0)}(\mathbf m)\, V_{n,|\mathbf m|} \prod_{i=1}^k\ \sum_{\substack{(\lambda_{i1},\dots,\lambda_{in_i}):\\ \sum_j j\lambda_{ij} = n_i,\ \sum_j\lambda_{ij} = m_i}} \prod_{j=1}^{n_i}\big[(1-\sigma)_{j-1}\big]^{\lambda_{ij}}\,\frac{n_i!}{\prod_{j=1}^{n_i}\lambda_{ij}!\,(j!)^{\lambda_{ij}}} \\
&= \sum_{\mathbf m\in M(\mathbf n)} q^{(0)}(\mathbf m)\, V_{n,|\mathbf m|} \prod_{i=1}^k S_\sigma(n_i, m_i). \qquad\square
\end{aligned}$$

4.3. The EPPF of a gSSS(q, H)

As a special case, we now consider the partition induced by a $gSSS(q,H)$ with a general base measure $H$. For the rest of the section, it is useful to decompose $H$ as
$$H(dx) = \sum_{i\ge1} a_i\,\delta_{\bar x_i}(dx) + (1-a)\,H_c(dx), \tag{25}$$
where $\mathbb X_0 := \{\bar x_1,\bar x_2,\dots\}$ is the collection of points with positive $H$-probability, $a_i = H(\{\bar x_i\})$, $a = H(\mathbb X_0)\in[0,1]$ and $H_c(\cdot) = H(\cdot\cap\mathbb X_0^c)/H(\mathbb X_0^c)$ is a diffuse probability measure on $\mathbb X$. The sum is taken over $i \in \{1,\dots,|\mathbb X_0|\}$.
We now describe $q^{(0)}$, i.e., the EPPF of the partition induced by $(Z'_n)_{n\ge1}$. Let $\mathbf m \in M(\mathbf n)$, where $\mathbf n = (n_1,\dots,n_k)$, and assume that the realization of $\Pi^{(0)}_{|\mathbf m|}$ has $k$ blocks of cardinalities $m_1,\dots,m_k$. Set $z_i = 0$ if the $Z'_n$ corresponding to the $i$-th block of $\Pi^{(0)}_{|\mathbf m|}$ comes from the diffuse component $H_c$, and $z_i = \ell$ if it is equal to $\bar x_\ell$. Since distinct blocks of $\Pi^{(0)}$ must be associated with different values of the $Z'_n$, one has that necessarily $z_i = z_j = 0$ whenever $z_i = z_j$ for $i \ne j$; in this case, both blocks are singletons, i.e., $m_i = m_j = 1$. On the other hand, if $m_i \ge 2$, i.e., a merging occurred, then necessarily $z_i > 0$. Note that it is also possible that $m_i = 1$ with $z_i > 0$. This motivates the definition of the set
$$Z(\mathbf m) := \big\{(z_1,\dots,z_k)\in\{0,1,\dots,|\mathbb X_0|\}^k :\ \text{if } z_i = z_j \text{ for } i\ne j \text{ then } z_i = z_j = 0 \text{ and } m_i = m_j = 1;\ \text{if } m_i \ge 2 \text{ then } z_i > 0\big\}$$
for $\mathbf m$ in $M(\mathbf n)$, where $\mathbf n = (n_1,\dots,n_k)$. The probability of obtaining, in an i.i.d. sample of length $|\mathbf m|$ from $H$, exactly $k$ ordered blocks with cardinalities $m_1,\dots,m_k$, such that observations within each block are equal and observations in distinct blocks are different, is
$$H^\#(\mathbf m) := \sum_{(z_1,\dots,z_k)\in Z(\mathbf m)} (1-a)^{\#\{j\,:\,z_j = 0\}}\prod_{j\,:\,z_j > 0} a_{z_j}^{\,m_j}.$$
By exchangeability, $H^\#(\mathbf m)$ turns out to be $q^{(0)}(\mathbf m)$. Note also that, if $a = 1$, $H^\#(\mathbf m)$ reduces to
$$\sum_{(z_1,\dots,z_k)}\prod_{j=1}^k a_{z_j}^{\,m_j},$$
where $(z_1,\dots,z_k)$ runs over all $k$-tuples of distinct positive integers (less than or equal to $|\mathbb X_0|$ if $\mathbb X_0$ is finite), which is nothing else than (5) for deterministic weights.
To rewrite $H^\#(\mathbf m)$ in a different way, given $\mathbf m = (m_1,\dots,m_k)$ in $M(\mathbf n)$, let $\mathbf m^*$ be the vector containing all the elements $m_i > 1$ and let $r$ be its length (possibly $r = 0$ if $\mathbf m = (1,1,\dots,1)$), and define, for $\ell \ge 0$,
$$A_{\mathbf m,\ell} = \sum_{j_1 \ne \cdots \ne j_{r+\ell}} a_{j_1}^{\,m^*_1}\cdots a_{j_r}^{\,m^*_r}\, a_{j_{r+1}}\cdots a_{j_{r+\ell}},$$
with the convention that $A_{\mathbf m,0} = 1$ when $r = 0$. A simple combinatorial argument shows that
$$H^\#(\mathbf m) = \sum_{\ell=0}^{k-r} (1-a)^{k-r-\ell}\binom{k-r}{\ell} A_{\mathbf m,\ell}.$$
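For a base measure with finitely many atoms, $H^\#(\mathbf m)$ can be evaluated by direct enumeration of the admissible label vectors in $Z(\mathbf m)$; the sketch below is our own illustration (the atom masses are arbitrary).

```python
from itertools import product

def H_sharp(m, atoms):
    """H#(m): probability that an i.i.d. sample of size |m| from H gives ordered
    blocks of sizes m_1,...,m_k, equal within blocks, distinct across blocks.
    `atoms` are the masses a_1, a_2, ...; mass 1 - sum(atoms) is diffuse."""
    a_diff = 1.0 - sum(atoms)
    k, r = len(m), len(atoms)
    total = 0.0
    for z in product(range(r + 1), repeat=k):      # z_j = 0: diffuse; z_j = i: atom i
        ok = all(m[j] == 1 or z[j] > 0 for j in range(k))     # merged blocks need an atom
        ok = ok and all(z[i] != z[j] or z[i] == 0             # atoms distinct across blocks
                        for i in range(k) for j in range(i + 1, k))
        if ok:
            term = 1.0
            for j in range(k):
                term *= a_diff if z[j] == 0 else atoms[z[j] - 1] ** m[j]
            total += term
    return total

print(H_sharp((1, 1), atoms=[0.3]))   # (1-a)^2 + 2a(1-a) = 0.91
print(H_sharp((2, 1), atoms=[0.3]))   # a^2 (1-a) = 0.063
```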
Proposition 4 immediately gives the next proposition.
Proposition 5.
Let $\xi$ be a $gSSS(q,H)$ and let $\tilde\Pi = \Pi^*(\xi)$ be the random partition induced by $\xi$. If $\tilde\pi_n = [\tilde\pi_{1,n},\dots,\tilde\pi_{k,n}]$ is a partition of $[n]$ with $|\tilde\pi_{i,n}| = n_i$ ($i = 1,\dots,k$) and $\mathbf n = (n_1,\dots,n_k)$, then
$$P\{\tilde\Pi_n = \tilde\pi_n\} = \sum_{\mathbf m\in M(\mathbf n)} H^\#(\mathbf m)\sum_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)} c(\boldsymbol\lambda)\,\bar q(\boldsymbol\lambda).$$
Remark 4.
Once again, if $H$ is diffuse, then $H^\#(\mathbf m) = 0$ for every $\mathbf m \ne (1,1,\dots,1)$. Hence, the above formula reduces to the familiar
$$P\{\tilde\Pi_n = \tilde\pi_n\} = q\big(|\tilde\pi_{1,n}|,\dots,|\tilde\pi_{k,n}|\big) = P\{\Pi_n = \tilde\pi_n\}.$$

4.4. EPPF for gSSS with Spike-and-Slab Base Measure

We now consider the special case of a $gSSS$ with a spike-and-slab base measure. A spike-and-slab measure is defined as
$$H(dx) = a\,\delta_{x_0}(dx) + (1-a)\,H_c(dx), \tag{26}$$
where $a \in (0,1)$, $x_0$ is a point of $\mathbb X$ and $H_c$ is a diffuse measure on $\mathbb X$. This type of measure has been used as a base measure for the Dirichlet process by [16,17,18,19,20] and for the Pitman–Yor process by [21].
Here, we deduce from Proposition 5 the explicit form of the EPPF of the random partition induced by a sequence sampled from a species sampling random probability with such a base measure.
Proposition 6.
Let $H$ be as in (26), $\tilde\Pi$ the random partition induced by a $gSSS(q,H)$, and $\Pi$ an exchangeable random partition with EPPF $q$. If $\pi_n = \{\pi_{1,n},\dots,\pi_{k,n}\}$ is a partition of $[n]$ with $|\pi_{i,n}| = n_i$ ($i = 1,\dots,k$), then
$$P\{\tilde\Pi_n = \pi_n\} = (1-a)^k\, q(n_1,\dots,n_k) + (1-a)^{k-1}\sum_{i=1}^k q(n_1,\dots,n_{i-1},n_{i+1},\dots,n_k)\sum_{r=1}^{n_i} a^r\, q_n(r \mid n_1,\dots,n_{i-1},n_{i+1},\dots,n_k), \tag{27}$$
where $q_n(r \mid n_1,\dots,n_{i-1},n_{i+1},\dots,n_k)$ denotes the conditional probability that $\Pi_n$ has $k-1+r$ blocks, given that $\Pi_{n-n_i}$ has $k-1$ blocks with sizes $n_1,\dots,n_{i-1},n_{i+1},\dots,n_k$. If, in addition, $q$ is of Gibbs type (22), then
$$P\{\tilde\Pi_n = \pi_n\} = (1-a)^k\, V_{n,k}\prod_{j=1}^k (1-\sigma)_{n_j-1} + (1-a)^{k-1}\sum_{i=1}^k\ \prod_{j=1,\,j\ne i}^k (1-\sigma)_{n_j-1}\sum_{r=1}^{n_i} a^r\, V_{n,k-1+r}\, S_\sigma(n_i, r).$$
Proof. 
In this case, $H^\#(\mathbf m) = 0$ if $m_i \ge 2$ and $m_j \ge 2$ for some $i \ne j$, because $H$ has only one atom. Moreover, $H^\#(\mathbf m)$ is clearly symmetric and
$$H^\#(1,1,\dots,1) = (1-a)^k + k(1-a)^{k-1}a, \qquad H^\#(m,1,\dots,1) = a^m(1-a)^{k-1} \quad\text{for } m > 1.$$
By Proposition 5,
$$\begin{aligned}
P\{\tilde\Pi_n = \pi_n\} &= \big[(1-a)^k + k(1-a)^{k-1}a\big]\, q(n_1,\dots,n_k) + (1-a)^{k-1}\sum_{i=1}^k\sum_{m_i=2}^{n_i} a^{m_i}\!\!\sum_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)}\!\! c(\boldsymbol\lambda)\,\bar q(\boldsymbol\lambda) \\
&= (1-a)^k\, q(n_1,\dots,n_k) + (1-a)^{k-1}\sum_{i=1}^k\sum_{r=1}^{n_i} a^r\!\!\sum_{\substack{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)\\ \text{for }\mathbf m:\ m_i = r,\ m_j = 1,\ j\ne i}}\!\! c(\boldsymbol\lambda)\, q\big(n_1,\dots,n_{i-1},\mathbf n^{(i)}_r,n_{i+1},\dots,n_k\big),
\end{aligned}$$
where $\mathbf n^{(i)}_r$ is any vector of $r$ positive integers summing to $n_i$ such that $\lambda_{ij}$ of them are equal to $j$ (the $r = 1$ terms absorb the contribution $k(1-a)^{k-1}a\, q(n_1,\dots,n_k)$). In view of the definition of $c(\boldsymbol\lambda)$, formula (27) is immediately obtained.
If $q$ is of Gibbs type, taking into account (24), then
$$q_n(r \mid n_1,\dots,n_{i-1},n_{i+1},\dots,n_k) = \frac{V_{n,k-1+r}}{V_{n-n_i,k-1}}\, S_\sigma(n_i, r),$$
and the second part of the thesis follows by simple algebra. □
Applying Proposition 6 to the Pitman–Yor EPPF defined in (23), one immediately recovers the results stated in Theorem 1 and Corollary 1 of [21].
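Comparing (23) with (22) shows that the Pitman–Yor family is of Gibbs type with $V_{n,k} = \prod_{i=1}^{k-1}(\theta + i\sigma)/(\theta+1)_{n-1}$. The sketch below (ours, not from the paper) evaluates the Gibbs-type formula of Proposition 6 for this choice; setting $a = 0$ recovers the plain Pitman–Yor EPPF, which serves as a sanity check.

```python
from functools import lru_cache
from math import prod

@lru_cache(maxsize=None)
def S(n, k, sigma):
    """Generalized Stirling number via the triangular recursion."""
    if n == 0:
        return 1.0 if k == 0 else 0.0
    if k == 0 or k > n:
        return 0.0
    return (n - 1 - sigma * k) * S(n - 1, k, sigma) + S(n - 1, k - 1, sigma)

def rising(x, n):
    return prod(x + i for i in range(n))

def V_py(n, k, theta, sigma):
    """Pitman-Yor Gibbs weights V_{n,k} = prod_{i=1}^{k-1}(theta + i sigma) / (theta+1)_{n-1}."""
    return prod(theta + i * sigma for i in range(1, k)) / rising(theta + 1, n - 1)

def spike_slab_eppf(counts, a, theta, sigma):
    """P{tilde Pi_n = pi_n} for a Pitman-Yor gSSS with spike-and-slab base (Prop. 6)."""
    n, k = sum(counts), len(counts)
    out = (1 - a)**k * V_py(n, k, theta, sigma) * prod(rising(1 - sigma, c - 1) for c in counts)
    for i, ni in enumerate(counts):
        rest = prod(rising(1 - sigma, c - 1) for j, c in enumerate(counts) if j != i)
        out += (1 - a)**(k - 1) * rest * sum(
            a**r * V_py(n, k - 1 + r, theta, sigma) * S(ni, r, sigma) for r in range(1, ni + 1))
    return out

print(spike_slab_eppf([2, 1], a=0.0, theta=1.0, sigma=0.25))
print(V_py(3, 2, 1.0, 0.25) * rising(0.75, 1))   # same value when a = 0
```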

5. Predictive Distributions

In this section, we provide some expressions for the predictive distributions of mixtures of species sampling sequences.

5.1. Some General Results

Let $\xi$ be as in Definition 2 and let $(Z'_n)_n$ and $\Pi$ be the exchangeable sequence and the exchangeable random partition appearing in Proposition 3. Let
$$\mathcal G_n = \sigma\big(Z'_1,\dots,Z'_{|\Pi_n|},\Pi_n\big)$$
be the $\sigma$-field generated by $(Z'_1,\dots,Z'_{|\Pi_n|},\Pi_n)$. By Proposition 3, one has $\xi_n = Z'_{C_n(\Pi)}$ a.s.; hence, $\xi_n$ is $\mathcal G_n$-measurable. Note that, in general, $\sigma(\xi_1,\dots,\xi_n)$ can be strictly contained in $\mathcal G_n$. Set $\Xi_n := |\Pi_n|$ and, for any $k \ge 1$, let $K_{k+1}(\cdot\mid\cdot)$ be a kernel corresponding to the conditional distribution of $Z'_{k+1}$ given $Z'_1,\dots,Z'_k$ (i.e., the $(k+1)$-th predictive distribution of the exchangeable sequence $Z'$, which coincides with that of $Z$). Finally, recall that $\tilde\Pi = \Pi^*(\xi)$ is the partition induced by $\xi$, and define $\xi^*_{1:\tilde K_n} = (\xi^*_1,\dots,\xi^*_{\tilde K_n})$ as the distinct values, in order of appearance, of $\xi_{1:n} := (\xi_1,\dots,\xi_n)$, with $\tilde K_n = |\tilde\Pi_n|$.
Proposition 7.
Let $\xi$ be as in Definition 2. Then,
$$P\{\xi_{n+1}\in\cdot \mid \mathcal G_n\} = \sum_{\ell=1}^{\Xi_n}\omega_{n,\ell}(\Pi_n)\,\delta_{Z'_\ell}(\cdot) + \nu_n(\Pi_n)\, K_{\Xi_n+1}(\cdot \mid Z'_1,\dots,Z'_{\Xi_n}), \tag{28}$$
where $\nu_n$ and $\omega_{n,\ell}$ are defined by (6) and (7). If $P\{Z'_1 = Z'_2\} = 0$, then
$$P\{\xi_{n+1}\in\cdot \mid \xi_1,\dots,\xi_n\} = P\{\xi_{n+1}\in\cdot \mid \xi^*_1,\dots,\xi^*_{\tilde K_n},\tilde\Pi_n\} = \sum_{\ell=1}^{\tilde K_n}\omega_{n,\ell}(\tilde\Pi_n)\,\delta_{\xi^*_\ell}(\cdot) + \nu_n(\tilde\Pi_n)\, K_{\tilde K_n+1}(\cdot \mid \xi^*_1,\dots,\xi^*_{\tilde K_n}). \tag{29}$$
Proof. 
Set
$$E^*_{new} = \{\Xi_{n+1} = \Xi_n + 1\}.$$
Since $\xi_n = Z'_{C_n(\Pi)}$, one can write
$$\begin{aligned}
P\{\xi_{n+1}\in A \mid \mathcal G_n\} &= \sum_{\ell=1}^{\Xi_n} P\{\xi_{n+1}\in A,\ n+1\in\Pi_{\ell,n+1} \mid \mathcal G_n\} + P\{\xi_{n+1}\in A,\ E^*_{new} \mid \mathcal G_n\} \\
&= \sum_{\ell=1}^{\Xi_n} P\{Z'_\ell\in A,\ n+1\in\Pi_{\ell,n+1} \mid \mathcal G_n\} + P\{Z'_{\Xi_n+1}\in A,\ E^*_{new} \mid \mathcal G_n\} \\
&= \sum_{\ell=1}^{\Xi_n}\delta_{Z'_\ell}(A)\, P\{n+1\in\Pi_{\ell,n+1} \mid \mathcal G_n\} + P\{Z'_{\Xi_n+1}\in A,\ E^*_{new} \mid \mathcal G_n\}.
\end{aligned}$$
Now, since $\Pi$ and $(Z'_n)_n$ are independent, it follows that $P\{n+1\in\Pi_{\ell,n+1}\mid\mathcal G_n\} = P\{n+1\in\Pi_{\ell,n+1}\mid\Pi_n\} = \omega_{n,\ell}(\Pi_n)$, and also
$$P\{Z'_{\Xi_n+1}\in A,\ E^*_{new}\mid\mathcal G_n\} = P\{Z'_{\Xi_n+1}\in A \mid Z'_1,\dots,Z'_{\Xi_n}\}\, P\{E^*_{new}\mid\Pi_n\} = K_{\Xi_n+1}(A\mid Z'_1,\dots,Z'_{\Xi_n})\,\nu_n(\Pi_n).$$
Combining all the claims, one obtains (28). The second part of the proof follows since, if $P\{Z'_1 = Z'_2\} = 0$, the $Z'_i$'s are distinct with probability one. Since $(\xi_1,\dots,\xi_n) = (Z'_{C_1(\Pi)},\dots,Z'_{C_n(\Pi)})$, it follows that $\tilde\Pi_n = \Pi_n$, $\Xi_n = \tilde K_n$ and $(\xi^*_1,\dots,\xi^*_{\tilde K_n}) = (Z'_1,\dots,Z'_{\Xi_n})$ with probability one, and $\mathcal G_n = \sigma(\xi_1,\dots,\xi_n)$. Hence, (29) follows from (28). □
Remark 5.
Note that (29) can also be derived as follows. $P\{Z'_1 = Z'_2\} = 0$ is equivalent to $\tilde H$ being almost surely diffuse. Hence, conditionally on $\tilde H$, we have an $SSS(q,\tilde H)$; then, by (PS2) in Section 2.2, one has
$$P\{\xi_{n+1}\in dx \mid \xi_1,\dots,\xi_n,\tilde H\} = \sum_{\ell=1}^{\tilde K_n}\omega_{n,\ell}(\tilde\Pi_n)\,\delta_{\xi^*_\ell}(dx) + \nu_n(\tilde\Pi_n)\,\tilde H(dx).$$
Taking the conditional expectation of the previous equation, given $\xi_1,\dots,\xi_n$, we obtain
$$P\{\xi_{n+1}\in A \mid \xi_1,\dots,\xi_n\} = \sum_{\ell=1}^{\tilde K_n}\omega_{n,\ell}(\tilde\Pi_n)\,\delta_{\xi^*_\ell}(A) + \nu_n(\tilde\Pi_n)\, E\big[\tilde H(A)\mid\xi_1,\dots,\xi_n\big],$$
and the thesis follows since one can check (arguing as in the proof of the proposition) that
$$E\big[\tilde H(A)\mid\xi_1,\dots,\xi_n\big] = E\big[\tilde H(A)\mid Z'_1,\dots,Z'_{\tilde K_n}\big] = K_{\tilde K_n+1}(A\mid Z'_1,\dots,Z'_{\tilde K_n}).$$
Assume now that the random variables $Z_j$ are defined on $\mathbb X$ by a Bayesian model with likelihood $f(z_j\mid u)$ and prior $Q(u)$, where $f$ is a density with respect to a dominating measure $\lambda$ and $Q$ is a probability measure defined on a Polish space $\mathbb U$ (the space of parameters). In other words,
$$P\{Z_1\in A_1,\dots,Z_k\in A_k\} = \int_{\mathbb U}\int_{A_1\times A_2\times\cdots\times A_k}\prod_{j=1}^k f(z_j\mid u)\,\lambda(dz_1)\cdots\lambda(dz_k)\, Q(du).$$
Note that this means that $\tilde H(A) = \int_A f(z\mid\tilde u)\,\lambda(dz)$, where $\tilde u \sim Q$. Bayes' theorem (see, e.g., Theorem 1.31 in [33]) gives
$$P\{Z_{k+1}\in dz_{k+1}\mid Z_1,\dots,Z_k\} = \int_{\mathbb U} f(z_{k+1}\mid u)\, Q(du\mid Z_1,\dots,Z_k)\,\lambda(dz_{k+1}),$$
where $Q(du\mid Z_1,\dots,Z_k)$ is the usual posterior distribution, i.e.,
$$Q(du\mid Z_1,\dots,Z_k) := \frac{\prod_{j=1}^k f(Z_j\mid u)\, Q(du)}{\int_{\mathbb U}\prod_{j=1}^k f(Z_j\mid v)\, Q(dv)}.$$
If $\lambda$ is a diffuse measure, one obtains $P\{Z_1 = Z_2\} = 0$. Hence, (29) in Proposition 7 applies and one has
$$P\{\xi_{n+1}\in dx \mid \xi_1,\dots,\xi_n\} = \sum_{\ell=1}^{\tilde K_n}\omega_{n,\ell}(\tilde\Pi_n)\,\delta_{\xi^*_\ell}(dx) + \nu_n(\tilde\Pi_n)\int_{\mathbb U} f(x\mid u)\, Q(du\mid\xi^*_1,\dots,\xi^*_{\tilde K_n})\,\lambda(dx). \tag{33}$$
For example, one can apply this result to a mixture of Dirichlet processes in the sense of [10], as briefly described at the end of Section 3.1. Assume that $\alpha_{\tilde u}(\mathbb X)$ and $\tilde H(\cdot) = \alpha_{\tilde u}(\cdot)/\alpha_{\tilde u}(\mathbb X)$ are independent and that $\alpha_u(A)/\alpha_u(\mathbb X) = \int_A f(z\mid u)\,\lambda(dz)$ for a suitable diffuse dominating measure $\lambda$.
Under these hypotheses, a sample $(\xi_n)_{n\ge1}$ from a mixture of Dirichlet processes is an mSSS with $q$ described in (15) and, in addition, $P\{Z_1 = Z_2\} = 0$. Combining (15) with (6) and (7), one obtains
$$\omega_{n,\ell}(\tilde\Pi_n) = |\tilde\Pi_{\ell,n}|\;\frac{\displaystyle\int_{\mathbb U}\frac{\alpha_u(\mathbb X)^{\tilde K_n}}{(\alpha_u(\mathbb X))_{n+1}}\, Q(du)}{\displaystyle\int_{\mathbb U}\frac{\alpha_u(\mathbb X)^{\tilde K_n}}{(\alpha_u(\mathbb X))_n}\, Q(du)}$$
and
$$\nu_n(\tilde\Pi_n) = \frac{\displaystyle\int_{\mathbb U}\frac{\alpha_u(\mathbb X)^{\tilde K_n+1}}{(\alpha_u(\mathbb X))_{n+1}}\, Q(du)}{\displaystyle\int_{\mathbb U}\frac{\alpha_u(\mathbb X)^{\tilde K_n}}{(\alpha_u(\mathbb X))_n}\, Q(du)}.$$
Hence, the predictive distribution of $\xi_{n+1}$ given $(\xi_1,\dots,\xi_n)$ is (33) with $\omega_{n,\ell}(\tilde\Pi_n)$ and $\nu_n(\tilde\Pi_n)$ given above.
Note that the same result can be deduced by combining Lemma 1 and Corollary 3.2' in [10].
Example 1
(Species Sampling NIG). Let the $Z_n$ be defined as a mixture of normal random variables with a Normal-Inverse-Gamma prior. In other words, given $\mu_0 \in \mathbb R$, $k_0 > 0$, $\alpha_0 > 0$, $\beta_0 > 0$,
$$Z_n \mid \tilde\mu,\tilde\sigma^2 \overset{i.i.d.}{\sim} N(\tilde\mu,\tilde\sigma^2), \qquad \tilde\mu\mid\tilde\sigma^2 \sim N(\mu_0,\tilde\sigma^2/k_0), \qquad \tilde\sigma^2 \sim \mathrm{Inv}\text{-}\Gamma(\alpha_0,\beta_0),$$
where $N(\mu,\sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$ and $\mathrm{Inv}\text{-}\Gamma(\alpha,\beta)$ is the inverse gamma distribution with shape $\alpha$ and scale $\beta$. Let $\mathcal T_\nu(\cdot\mid\mu,\sigma^2)$ be the density of a Student-t distribution with $\nu$ degrees of freedom and location/scale parameters $(\mu,\sigma)$, i.e.,
$$\mathcal T_\nu(x\mid\mu,\sigma^2) := \frac{1}{\sqrt{\sigma^2}}\,\frac{\Gamma\big(\frac{\nu+1}{2}\big)}{\sqrt{\nu\pi}\,\Gamma\big(\frac{\nu}{2}\big)}\Big(1 + \frac{1}{\nu\sigma^2}\big(x-\mu\big)^2\Big)^{-\frac{\nu+1}{2}}.$$
It is well known that, under these assumptions, $K_{k+1}(A\mid z_1,\dots,z_k)$ has density $\mathcal T_{2\alpha_k}(z\mid\mu_k,\sigma_k^2)$, where the parameters are updated as
$$\mu_k = \frac{k_0\mu_0 + k\bar z_k}{k_0 + k}, \qquad \bar z_k = \frac1k\sum_{j=1}^k z_j, \qquad \alpha_k = \alpha_0 + k/2,$$
$$\sigma_k^2 = \Big(\beta_0 + \frac12\sum_{j=1}^k (z_j - \bar z_k)^2 + \frac{k\,k_0\,(\bar z_k - \mu_0)^2}{2(k_0 + k)}\Big)\,\frac{k_0 + k + 1}{(\alpha_0 + k/2)(k_0 + k)}.$$
Thus, in this case, if $z_1,\dots,z_k$ are distinct real numbers and $\pi_n = [\pi_{1,n},\dots,\pi_{k,n}]$, one has
$$P\{\xi_{n+1}\in dx \mid \xi^*_1 = z_1,\dots,\xi^*_k = z_k,\ \Pi^*(\xi) = \pi_n\} = \sum_{\ell=1}^k\omega_{n,\ell}(\pi_n)\,\delta_{z_\ell}(dx) + \nu_n(\pi_n)\,\mathcal T_{2\alpha_k}(x\mid\mu_k,\sigma_k^2)\,dx.$$
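A small helper (our own sketch, using the update rules above; the default hyper-parameters match those used below) computes the Student-t parameters of the kernel $K_{k+1}$; the $k = 0$ branch returns the prior predictive parameters $\big(2\alpha_0,\mu_0,\beta_0(k_0+1)/(\alpha_0 k_0)\big)$.

```python
import numpy as np

def nig_predictive_params(z, mu0=0.0, k0=0.1, a0=0.1, b0=0.1):
    """Student-t parameters (df, location, scale^2) of K_{k+1}(.|z_1,...,z_k)
    under the Normal-Inverse-Gamma model of Example 1."""
    z = np.asarray(z, dtype=float)
    k = len(z)
    if k == 0:
        return 2 * a0, mu0, b0 * (k0 + 1) / (a0 * k0)
    zbar = z.mean()
    mu_k = (k0 * mu0 + k * zbar) / (k0 + k)
    a_k = a0 + k / 2
    b_k = b0 + 0.5 * ((z - zbar) ** 2).sum() + k * k0 * (zbar - mu0) ** 2 / (2 * (k0 + k))
    scale2 = b_k * (k0 + k + 1) / (a_k * (k0 + k))
    return 2 * a_k, mu_k, scale2

df, loc, s2 = nig_predictive_params([0.4, -0.1, 0.7])
print(df, loc, s2)
```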
We show an application of (33) to a real dataset by choosing $\omega_{n,\ell}$ and $\nu_n$ according to a Pitman–Yor two-parameter family; see (23). The data are the relative changes in reported larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties, taken from Section 2.1 of [34]. We apply our models to both the raw data and the rounded data (approximated to the second digit), the latter in order to obtain ties in the $\xi$'s. In the evaluation of the predictive CDFs, we fix $\mu_0 = 0$, $\alpha_0 = 0.1$, $\beta_0 = 0.1$ and $k_0 = 0.1$. In Figure 2, we report the empirical CDF of the rounded data (solid line), the predictive CDF obtained from (33) (dotted line) and the predictive CDF of a Pitman–Yor species sampling sequence (see (PS2)) with $H = \mathcal T_{2\alpha_0}(\cdot\mid\mu_0,\sigma_0^2)$, $\sigma_0^2 = \beta_0(k_0+1)/(\alpha_0 k_0)$ (dashed line). Similar plots are reported in Figure 3, with the raw data in place of the rounded data. Note that, in all the various settings, the influence of the hyper-parameters $(\theta,\sigma)$ is stronger in the CDF of the simple Pitman–Yor species sampling model than in the corresponding predictive CDF derived from (33).

5.2. Predictive Distributions for gSSS

We now deduce an explicit form for the predictive distribution of a $gSSS$ with general base measure $H$ given in (25).
Recall that we denote by $\tilde\Pi_n$ the partition induced by $\xi_{1:n}$, with $\tilde K_n = |\tilde\Pi_n|$, and by $\Pi_n$ the latent partition appearing in Proposition 3. We also set
$$\zeta_i = \begin{cases} \ell & \text{if } \xi^*_i = \bar x_\ell, \\ 0 & \text{if } \xi^*_i \in \mathbb X_0^c. \end{cases}$$
The variable $\zeta_i$ is a discrete random variable that takes value $0$ if $\xi^*_i$ comes from the diffuse component of $H$.
Let $\mathcal P_{\tilde\pi_n,z_{1:k}} \subset \mathcal P_{\tilde\pi_n}$ be the set of all the possible configurations of $\Pi_n$ that are compatible with the observed partition $\tilde\Pi_n = \tilde\pi_n$ and with the additional information given by $\zeta_{1:\tilde K_n} = z_{1:k}$, $\tilde K_n = k$. In order to describe this set, observe that, if $z_i > 0$, then the block $\tilde\pi_{i,n}$ may arise as the union of several blocks of $\pi_n$, while, if $z_i = 0$, then $\tilde\pi_{i,n} = \pi_{\phi(i),n}$ for some map $\phi$ (note that it may happen that $\phi(i) \ne i$).
Recalling that the elements of $\mathbf m = (m_1,\dots,m_k)$ in (17) describe the numbers of sub-blocks into which the blocks of $\tilde\pi_n$ are divided to form the latent partition $\pi_n$, it turns out that the set $\mathcal P_{\tilde\pi_n,z_{1:k}}$ carries the additional constraint $m_i = 1$ whenever $z_i = 0$. These considerations yield that, starting from $\tilde\pi_n$ and $z_{1:k}$, the set of admissible $\mathbf m$ can also be described by resorting to the definition of $Z(\mathbf m)$, as follows:
$$M(\mathbf n,z_{1:k}) := \{\mathbf m \in M(\mathbf n) : z_{1:k}\in Z(\mathbf m)\} = \{\mathbf m\in\mathbb N^k : m_i = 1 \text{ if } z_i = 0,\ 1 \le m_i \le n_i \text{ if } z_i > 0\}.$$
With this notation, one has
$$\mathcal P_{\tilde\pi_n,z_{1:k}} = \bigcup_{\mathbf m\in M(\mathbf n):\, z_{1:k}\in Z(\mathbf m)}\ \bigcup_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)}\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda) = \bigcup_{\mathbf m\in M(\mathbf n,z_{1:k})}\ \bigcup_{\boldsymbol\lambda\in\Lambda(\mathbf n,\mathbf m)}\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda), \tag{34}$$
where $\Lambda(\mathbf n,\mathbf m)$ and $\mathcal P_{\tilde\pi_n}(\boldsymbol\lambda)$ have been defined in Section 4.1.
For any $\mathbf m$ in $M(\mathbf n,z_{1:k})$ and any $\boldsymbol\lambda$ in $\Lambda(\mathbf n,\mathbf m)$, we define
$$\boldsymbol\lambda^{new} := [\lambda_1,\dots,\lambda_k,1].$$
In other words, $\boldsymbol\lambda^{new}$ corresponds to the configuration obtained from $\boldsymbol\lambda$ by adding one new element as a new block. In the following, let $\tilde{\mathbf N} = (|\tilde\Pi_{1,n}|,\dots,|\tilde\Pi_{\tilde K_n,n}|)$, and let $\tilde{\mathbf N}^{i+}$ be obtained from $\tilde{\mathbf N}$ by adding 1 to its $i$-th component.
Proposition 8.
Let $(\xi_n)_{n\ge1}$ be a $gSSS(q,H)$. Then, for any $A$ in $\mathcal X$,
$$P\{\xi_{n+1}\in A \mid \xi_{1:n}\} = \frac{1}{\mathcal Z_n}\Big[\sum_{i=1}^{\tilde K_n} w_i\,\delta_{\xi^*_i}(A) + w_0\,\bar H_n(A)\Big] \quad a.s.,$$
where
$$\bar H_n(A) := \Big[\sum_{\ell\,:\,\bar x_\ell\notin\xi^*_{1:\tilde K_n}} a_\ell\,\delta_{\bar x_\ell}(A) + (1-a)\,H_c(A)\Big]\, H\big(\mathbb X\setminus\xi^*_{1:\tilde K_n}\big)^{-1},$$
$$w_i := \sum_{\mathbf m\in M(\tilde{\mathbf N}^{i+},\,\zeta_{1:\tilde K_n})}\ \prod_{j:\zeta_j>0} a_{\zeta_j}^{\,m_j}\sum_{\boldsymbol\lambda\in\Lambda(\tilde{\mathbf N}^{i+},\mathbf m)} c(\boldsymbol\lambda)\,\bar q(\boldsymbol\lambda),$$
$$w_0 := H\big(\mathbb X\setminus\xi^*_{1:\tilde K_n}\big)\sum_{\mathbf m\in M(\tilde{\mathbf N},\,\zeta_{1:\tilde K_n})}\ \prod_{j:\zeta_j>0} a_{\zeta_j}^{\,m_j}\sum_{\boldsymbol\lambda\in\Lambda(\tilde{\mathbf N},\mathbf m)} c(\boldsymbol\lambda)\,\bar q(\boldsymbol\lambda^{new}),$$
$$\mathcal Z_n := \sum_{\mathbf m\in M(\tilde{\mathbf N},\,\zeta_{1:\tilde K_n})}\ \prod_{j:\zeta_j>0} a_{\zeta_j}^{\,m_j}\sum_{\boldsymbol\lambda\in\Lambda(\tilde{\mathbf N},\mathbf m)} c(\boldsymbol\lambda)\,\bar q(\boldsymbol\lambda).$$
Proof. 
We start by defining the following events, for $i = 1,\dots,\tilde K_n$:
$$E_i = \{\xi_{n+1} = \xi^*_i\}, \qquad E_{new} = \{\xi_{n+1}\notin\xi^*_{1:\tilde K_n}\}.$$
Since conditioning on $\xi_{1:n}$ is equivalent to conditioning on $[\xi^*_{1:\tilde K_n},\tilde\Pi_n]$, one can write
$$P\{\xi_{n+1}\in A \mid \xi_{1:n}\} = \sum_{i=1}^{\tilde K_n} P\{\xi_{n+1}\in A,\ E_i \mid \xi^*_{1:\tilde K_n},\tilde\Pi_n\} + P\{\xi_{n+1}\in A,\ E_{new} \mid \xi^*_{1:\tilde K_n},\tilde\Pi_n\}.$$
Now, set
$$E^*_{new} := \{C_{n+1}(\Pi) = |\Pi_n| + 1\}$$
and
$$E^*_i = \big\{C_{n+1}(\Pi) \le |\Pi_n| \ \text{ and }\ \Pi_{C_{n+1}(\Pi),n} \subset \tilde\Pi_{i,n}\big\}.$$
On $\{\zeta_i = 0\}$, one has (up to sets of probability zero)
$$\{\xi_{n+1}\in A\}\cap E_i = \{\xi^*_i\in A\}\cap E^*_i,$$
while, on $\{\zeta_i > 0\}$ (up to sets of probability zero),
$$\{\xi_{n+1}\in A\}\cap E_i = \big(\{\xi^*_i\in A\}\cap E^*_i\big)\ \cup\ \big(\{\xi^*_i\in A\}\cap\{Z'_{|\Pi_n|+1} = \bar x_{\zeta_i}\}\cap E^*_{new}\big).$$
Note that (up to sets of probability zero)
$$\{Z'_{|\Pi_n|+1} = \bar x_{\zeta_i}\}\cap E^*_{new}\cap\{\zeta_i = 0\} = \emptyset.$$
Hence,
$$P\{\xi_{n+1}\in A,\ E_i \mid \xi^*_{1:\tilde K_n},\tilde\Pi_n\} = \delta_{\xi^*_i}(A)\, P\big\{E^*_i\cup\big(\{Z'_{|\Pi_n|+1} = \bar x_{\zeta_i}\}\cap E^*_{new}\big) \,\big|\, \tilde\Pi_n,\xi^*_{1:\tilde K_n}\big\}.$$
Similarly, using that $E_{new}\subset E^*_{new}$, one obtains
$$\begin{aligned}
P\{\xi_{n+1}\in A,\ E_{new} \mid \xi_{1:n}\} &= P\{\xi_{n+1}\in A,\ E_{new},\ E^*_{new} \mid \tilde\Pi_n,\xi^*_{1:\tilde K_n}\} \\
&= P\{\xi_{n+1}\in A,\ E_{new} \mid \xi^*_{1:\tilde K_n},\ E^*_{new}\}\, P\{E^*_{new} \mid \tilde\Pi_n,\xi^*_{1:\tilde K_n}\} \\
&= H_n(A)\, P\{E^*_{new} \mid \tilde\Pi_n,\xi^*_{1:\tilde K_n}\},
\end{aligned}$$
where
$$H_n(A) = \sum_{\ell\,:\,\bar x_\ell\notin\xi^*_{1:\tilde K_n}} a_\ell\,\delta_{\bar x_\ell}(A) + (1-a)\,H_c(A).$$
At this stage, note that, by construction,
$$\mathcal L\big(\xi^*_{1:\tilde K_n} \mid \Pi_n, Z'_{|\Pi_n|+1}, \zeta_{1:\tilde K_n}, \tilde\Pi_n, \Pi_{n+1}\big) = \mathcal L\big(\xi^*_{1:\tilde K_n} \mid \zeta_{1:\tilde K_n}\big),$$
where $\mathcal L(\xi^*_{1:\tilde K_n} \mid \zeta_{1:\tilde K_n})$ is characterized by
$$P\big(\xi^*_1\in A_1,\dots,\xi^*_{\tilde K_n}\in A_{\tilde K_n} \mid \zeta_{1:\tilde K_n}\big) = \prod_{i=1}^{\tilde K_n}\Big[H_c(A_i)\,\mathbf 1\{\zeta_i = 0\} + \delta_{\bar x_{\zeta_i}}(A_i)\,\mathbf 1\{\zeta_i > 0\}\Big],$$
and then
$$\mathcal L\big(Z'_{|\Pi_n|+1},\zeta_{1:\tilde K_n},\tilde\Pi_n,\Pi_{n+1},\xi^*_{1:\tilde K_n} \mid \Pi_n\big) = \mathcal L\big(Z'_{|\Pi_n|+1},\zeta_{1:\tilde K_n},\tilde\Pi_n,\Pi_{n+1} \mid \Pi_n\big)\,\mathcal L\big(\xi^*_{1:\tilde K_n} \mid \zeta_{1:\tilde K_n}\big).$$
Hence,
$$\mathcal L\big(\Pi_n,\Pi_{n+1},Z'_{|\Pi_n|+1},\xi^*_{1:\tilde K_n},\zeta_{1:\tilde K_n},\tilde\Pi_n\big) = \mathcal L(\Pi_n)\,\mathcal L\big(Z'_{|\Pi_n|+1},\zeta_{1:\tilde K_n},\tilde\Pi_n,\Pi_{n+1} \mid \Pi_n\big)\,\mathcal L\big(\xi^*_{1:\tilde K_n} \mid \zeta_{1:\tilde K_n}\big),$$
which shows, in particular, that [ Π n , Π n + 1 , Z | Π n | + 1 , Π ˜ n ] and ξ 1 : K ˜ n * are conditionally independent given ζ 1 : K ˜ n . Since E i * , E n e w * and { Z | Π n | + 1 = x ¯ ζ i } E n e w * depend logically only on Π n + 1 , Z | Π n | + 1 , Π ˜ n , ζ 1 : K ˜ n , one obtains
P { E i * | Π ˜ n , ξ 1 : K ˜ n * } = P { E i * | Π ˜ n , ζ 1 : K ˜ n } , P { ( Z | Π n | + 1 = x ¯ ζ i ) E n e w * | Π ˜ n , ξ 1 : K ˜ n * } = P { ( Z | Π n | + 1 = x ¯ ζ i ) E n e w * | Π ˜ n , ζ 1 : K ˜ n }
and, finally,
P { E n e w * | Π ˜ n , ξ 1 : K ˜ n * } = P { E n e w * | Π ˜ n , ζ 1 : K ˜ n } .
Since $[\tilde{\Pi}_n, \zeta_{1:\tilde{K}_n}, \tilde{K}_n]$ are discrete random variables, we use the elementary definition of the conditional probability of events to evaluate the conditional distributions (38) and (39). Specifically, assume that $\tilde{K}_n = k$, $[\tilde{\Pi}_n, \zeta_{1:\tilde{K}_n}] = [\tilde{\pi}_n, z_{1:k}]$ and $\tilde{N} = n$, and, for a given event $E$, write
$$
P\{E \mid \tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}\} = \frac{P\{E, \tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}\}}{P\{\tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}\}}. \tag{40}
$$
As for the denominator in (40), letting $M_n^* = M(n, z_{1:k})$ and $J = \#\{i : z_i > 0\}$, using (34), one obtains
$$
\begin{aligned}
P\{\tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}\}
&= \sum_{\pi_n \in P_{\tilde{\pi}_n, z_{1:k}}} P\{\tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}, \Pi_n = \pi_n\} \\
&= \sum_{m \in M_n^*} \sum_{\lambda \in \Lambda(n, m)} \sum_{\pi_n \in P_{\tilde{\pi}_n}(\lambda)} P\{\tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}, \Pi_n = \pi_n\} \\
&= (1-a)^{k-J} \sum_{m \in M_n^*} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n, m)} \sum_{\pi_n \in P_{\tilde{\pi}_n}(\lambda)} \bar{q}(\lambda) \\
&= (1-a)^{k-J} \sum_{m \in M_n^*} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n, m)} c(\lambda)\, \bar{q}(\lambda).
\end{aligned}
$$
As for the numerators in (40), when $E = E_{new}^*$, we start from
$$
\begin{aligned}
P\{E_{new}^*, \tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}, \Pi_n = \pi_n\}
&= P\{E_{new}^* \mid \Pi_n = \pi_n\}\, P\{\tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}, \Pi_n = \pi_n\} \\
&= P\{E_{new}^* \mid \Pi_n = \pi_n\}\, (1-a)^{k-J} \prod_{j : z_j > 0} a_{z_j}^{m_j}\, \bar{q}(\lambda) \\
&= (1-a)^{k-J} \prod_{j : z_j > 0} a_{z_j}^{m_j}\, \bar{q}(\lambda^{new}),
\end{aligned}
$$
where, in the last equality, we used that, for $\pi_n \in P_{\tilde{\pi}_n}(\lambda)$, one has
$$
P\{E_{new}^* \mid \Pi_n = \pi_n\} = \frac{\bar{q}(\lambda^{new})}{\bar{q}(\lambda)}.
$$
Taking the sum over $P_{\tilde{\pi}_n, z_{1:k}}$ gives
$$
P\{E_{new}^*, \tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}\} = (1-a)^{k-J} \sum_{m \in M_n^*} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n, m)} c(\lambda)\, \bar{q}(\lambda^{new}).
$$
Combining these with (39) and (40) and recalling that $M_n^* = M(n, z_{1:k})$, one obtains
$$
P\{E_{new}^* \mid \tilde{\Pi}_n, \xi_{1:\tilde{K}_n}^*\} = \frac{\displaystyle\sum_{m \in M(n, z_{1:k})} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n, m)} c(\lambda)\, \bar{q}(\lambda^{new})}{\displaystyle\sum_{m \in M(n, z_{1:k})} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n, m)} c(\lambda)\, \bar{q}(\lambda)}.
$$
Finally, it remains to consider (40) when $E = E_i^* \cup (\{Z_{|\Pi_n|+1} = \bar{x}_{\zeta_i}\} \cap E_{new}^*)$. Now, observe that
$$
\big(E_i^* \cup (\{Z_{|\Pi_n|+1} = \bar{x}_{\zeta_i}\} \cap E_{new}^*)\big) \cap \{\tilde{\Pi}_n = \tilde{\pi}_n, \zeta_{1:\tilde{K}_n} = z_{1:k}\} = \{\tilde{\Pi}_{n+1} = \tilde{\pi}_n^{i+}, \zeta_{1:\tilde{K}_n} = z_{1:k}\} = \{\tilde{\Pi}_{n+1} = \tilde{\pi}_n^{i+}, \zeta_{1:\tilde{K}_{n+1}} = z_{1:k}\},
$$
where $\tilde{\pi}_n^{i+}$ denotes the partition of $\{1, \dots, n+1\}$ obtained from $\tilde{\pi}_n$ by adding $n+1$ to the $i$-th block of $\tilde{\pi}_n$. Note that, for the second equality, we used that, on $\{\tilde{\Pi}_{n+1} = \tilde{\pi}_n^{i+}\}$, one has $\tilde{K}_{n+1} = \tilde{K}_n = k$.
Hence, using (34) with $\tilde{\pi}_n^{i+}$ in place of $\tilde{\pi}_n$, one concludes that
$$
\begin{aligned}
P\{\tilde{\Pi}_{n+1} = \tilde{\pi}_n^{i+}, \zeta_{1:\tilde{K}_{n+1}} = z_{1:k}\}
&= \sum_{m \in M(n^{i+}, z_{1:k})} \sum_{\lambda \in \Lambda(n^{i+}, m)} \sum_{\pi_{n+1} \in P_{\tilde{\pi}_n^{i+}}(\lambda)} P\{\tilde{\Pi}_{n+1} = \tilde{\pi}_n^{i+}, \Pi_{n+1} = \pi_{n+1}, \zeta_{1:\tilde{K}_{n+1}} = z_{1:k}\} \\
&= (1-a)^{k-J} \sum_{m \in M(n^{i+}, z_{1:k})} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n^{i+}, m)} c(\lambda)\, \bar{q}(\lambda),
\end{aligned}
$$
where $n^{i+} = (n_1, \dots, n_i + 1, \dots, n_k)$. Hence, by (38)–(40), one can write
$$
P\big\{E_i^* \cup (\{Z_{|\Pi_n|+1} = \bar{x}_{\zeta_i}\} \cap E_{new}^*) \,\big|\, \tilde{\Pi}_n, \xi_{1:\tilde{K}_n}^*\big\} = \frac{\displaystyle\sum_{m \in M(n^{i+}, z_{1:k})} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n^{i+}, m)} c(\lambda)\, \bar{q}(\lambda)}{\displaystyle\sum_{m \in M(n, z_{1:k})} \prod_{j : z_j > 0} a_{z_j}^{m_j} \sum_{\lambda \in \Lambda(n, m)} c(\lambda)\, \bar{q}(\lambda)}.
$$
Combining these results, one obtains the claim. □
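For readers who wish to experiment with Proposition 8, the sketch below assembles $Z_n$, the weights $w_i$, and $w_0$ up to the factor $H(\mathbb{X} \setminus \xi_{1:\tilde{K}_n}^*)$. It is only a sketch under explicit assumptions: the model-specific ingredients $\Lambda(n, m)$, $c(\lambda)$ and the EPPF $\bar{q}$ of Section 4.1 are passed in as hypothetical callables, and `atom_weight[z]` stands for the base-measure weight $a_z$.

```python
from itertools import product
from math import prod

def predictive_weights(N, zeta, atom_weight, Lambda, c, qbar):
    """Sketch of the unnormalized quantities in Proposition 8.
    N           : tuple of block sizes (|Π̃_{1,n}|, ..., |Π̃_{K̃_n,n}|)
    zeta        : tuple ζ_{1:K̃_n} (0 = diffuse part, z > 0 = atom x̄_z)
    atom_weight : dict mapping z > 0 to the base-measure weight a_z
    Lambda(n,m) : iterable of configurations λ (model specific, Section 4.1)
    c(lam)      : number of latent partitions with configuration λ
    qbar(lam)   : EPPF of the latent exchangeable partition
    """
    def admissible_m(n):
        # M(n, ζ): m_i = 1 if ζ_i = 0, else 1 <= m_i <= n_i
        return product(*(range(1, 2) if z == 0 else range(1, ni + 1)
                         for ni, z in zip(n, zeta)))

    def total(n, new_block=False):
        # Σ_m Π_{j: ζ_j>0} a_{ζ_j}^{m_j} Σ_λ c(λ) q̄(λ)  (or q̄(λ^{new}))
        s = 0.0
        for m in admissible_m(n):
            coeff = prod(atom_weight[z] ** mj
                         for z, mj in zip(zeta, m) if z > 0)
            for lam in Lambda(n, m):
                lam_eval = list(lam) + [1] if new_block else lam  # λ^{new}
                s += coeff * c(lam) * qbar(lam_eval)
        return s

    Z_n = total(N)
    w0_core = total(N, new_block=True)  # multiply by H(X \ ξ*) to obtain w_0
    w = [total(N[:i] + (N[i] + 1,) + N[i+1:]) for i in range(len(N))]
    return w, w0_core, Z_n
```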

6. Conclusions and Discussion

We have defined a new class of exchangeable sequences, called mixtures of species sampling sequences (mSSS). We have shown that these sequences include various well-known Bayesian nonparametric models. In particular, the observations of many nonparametric hierarchical models (e.g., the hierarchical Dirichlet process, the hierarchical Pitman–Yor process and, more generally, hierarchical species sampling models [22,23,24,25]) are mSSS. We have also shown that, under some additional assumptions, observations sampled from a mixture of Dirichlet processes [10] are mSSS. Our general class also includes species sampling sequences with a general (not necessarily diffuse) base measure, which have been used in various applications, e.g., in the case of “spike-and-slab”-type nonparametric priors [16,17,18,19,20,21].
We believe that our general framework sheds light on the common structure of all the above-mentioned models, leading to a possible unified treatment of some of their important features. Our techniques provide unified proofs for various results that, up to now, have been proven with ad hoc methods.
We have proven that all the mSSS are obtained by assigning the values of an exchangeable sequence to the classes of a latent exchangeable random partition. This representation is proven in the strong sense of an almost sure equality (see Section 3) and leads to the simple and clear derivation of an explicit expression for the EPPF of an mSSS. We believe that our general proof simplifies the derivation of the EPPF of many of the above-mentioned particular cases. Moreover, our results show that the clustering and the predictive structure of various well-known models do not depend on the relation between these models and completely random measures, but are essentially a consequence of the simple combinatorial structure of these sequences. Many important differences between well-known models (such as mixtures of Dirichlet and hierarchical Dirichlet) can be explained easily by simple differences in the latent partition and the corresponding latent exchangeable sequence.
We stress that a clear understanding of the clustering structure of mSSS is fundamental for practical purposes, since these models are typically used to cluster observations. Moreover, we hope that the explicit expression for EPPFs in our general framework can lead to the development of new MCMC algorithms for sampling from the posterior distribution.
Finally, we believe that some of the results proven here for mSSS can be extended to the more general setting of partially exchangeable arrays. In this spirit, a possible generalization of mSSS for future work is to consider partially exchangeable arrays whose directing measures are mixtures of species sampling random probability measures.

Author Contributions

F.B.: Methodology, simulation, writing and editing. L.L.: Methodology, writing and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 817257.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

F. Bassetti and L. Ladelli wish to express their gratitude to Professor Eugenio Regazzini, who has been an inspiring teacher and outstanding guide in many fields of probability and statistics.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In what follows, $L(X)$ denotes the law of a random element $X$. For ease of reference, we state here Lemma 5.9 and Corollary 5.11 of [35].
Lemma A1
(Extension 1). Fix a probability kernel $K$ between two measurable spaces $S$ and $T$, and let $\sigma$ be a random element defined on $(\Omega, \mathcal{F}, P)$ taking values in $S$. Then, there exists a random element $\eta$ in $T$, defined on some extension of the original probability space $\Omega$, such that $P[\eta \in \cdot \mid \sigma] = K(\cdot \mid \sigma)$ a.s.; moreover, $\eta$ is conditionally independent, given $\sigma$, of any other random element on $\Omega$.
Lemma A2
(Extension 2). Fix two Borel spaces $S$ and $T$, a measurable mapping $f : T \to S$ and some random elements $\sigma$ in $S$ and $\tilde{\eta}$ in $T$ with $L(\sigma) = L(f(\tilde{\eta}))$. Then, there is a random element $\eta$, defined on some extension of the original probability space, such that $L(\eta) = L(\tilde{\eta})$ and $\sigma = f(\eta)$ a.s.
We need the following variant of the previous result.
Lemma A3
(Extension 3). Fix three Borel spaces $S_1$, $S_2$ and $T_1$, a measurable mapping $\phi : T_1 \times S_2 \to S_1$ and some random elements $\sigma = (\sigma_1, \sigma_2)$ in $S_1 \times S_2$ and $\tau_1$ in $T_1$, all defined on a probability space $(\Omega, \mathcal{F}, P)$. Assume that the conditional law of $\sigma_1$ given $\sigma_2$ is the same as the conditional law of $\phi(\tau_1, \sigma_2)$ given $\sigma_2$ ($P$-almost surely). Then, there is a random element $\tau$, defined on some extension of the original probability space $(\Omega, \mathcal{F}, P)$ and taking values in $T_1$, such that
  • $\sigma_1 = \phi(\tau, \sigma_2)$ a.s.;
  • $L(\tau_1, \sigma_2) = L(\tau, \sigma_2)$.
Proof. 
Define $f : T_1 \times S_2 =: T \to S_1 \times S_2 =: S$ by $f(a, b) = (\phi(a, b), b)$, and set $\tilde{\eta} = (\tau_1, \sigma_2)$ and $\sigma = (\sigma_1, \sigma_2)$. By hypothesis, $L(f(\tilde{\eta})) = L((\phi(\tau_1, \sigma_2), \sigma_2)) = L(\sigma_1, \sigma_2) = L(\sigma)$. Thus, by Lemma A2, on an enlargement of $(\Omega, \mathcal{F}, P)$ there exists $\eta := (\tau, \sigma_2^*)$ such that $L(\eta) = L(\tilde{\eta})$ and $(\phi(\tau, \sigma_2^*), \sigma_2^*) = f(\eta) = \sigma = (\sigma_1, \sigma_2)$ a.s. Hence, $\sigma_2^* = \sigma_2$ a.s., and also $\phi(\tau, \sigma_2) = \phi(\tau, \sigma_2^*) = \sigma_1$ a.s. It remains to show the second assertion. Since $(\tau, \sigma_2) = (\tau, \sigma_2^*) = \eta$ a.s. and $L(\eta) = L(\tilde{\eta})$, where $\tilde{\eta} = (\tau_1, \sigma_2)$, it follows that $L(\tau, \sigma_2) = L(\tau_1, \sigma_2)$. □

References

  1. Ferguson, T.S. A Bayesian analysis of some nonparametric problems. Ann. Stat. 1973, 1, 209–230.
  2. Pitman, J.; Yor, M. The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator. Ann. Probab. 1997, 25, 855–900.
  3. Perman, M.; Pitman, J.; Yor, M. Size-biased sampling of Poisson point processes and excursions. Probab. Theory Relat. Fields 1992, 92, 21–39.
  4. Regazzini, E.; Lijoi, A.; Prünster, I. Distributional results for means of normalized random measures with independent increments. Ann. Stat. 2003, 31, 560–585.
  5. James, L.F.; Lijoi, A.; Prünster, I. Posterior analysis for normalized random measures with independent increments. Scand. J. Stat. 2009, 36, 76–97.
  6. Lijoi, A.; Prünster, I. Models beyond the Dirichlet process. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: New York, NY, USA, 2010.
  7. De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Prünster, I.; Ruggiero, M. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 212–229.
  8. Pitman, J. Poisson–Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Beachwood, OH, USA, 2003; Volume 40, pp. 1–34.
  9. Ishwaran, H.; James, L.F. Gibbs sampling methods for stick-breaking priors. J. Am. Stat. Assoc. 2001, 96, 161–173.
  10. Antoniak, C.E. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Stat. 1974, 2, 1152–1174.
  11. Cifarelli, D.M.; Regazzini, E. Distribution functions of means of a Dirichlet process. Ann. Stat. 1990, 18, 429–442.
  12. Sangalli, L.M. Some developments of the normalized random measures with independent increments. Sankhyā 2006, 68, 461–487.
  13. Broderick, T.; Wilson, A.C.; Jordan, M.I. Posteriors, conjugacy, and exponential families for completely random measures. Bernoulli 2018, 24, 3181–3221.
  14. Bassetti, F.; Ladelli, L. Asymptotic number of clusters for species sampling sequences with non-diffuse base measure. Stat. Probab. Lett. 2020, 162, 108749.
  15. Pitman, J. Some developments of the Blackwell–MacQueen urn scheme. In Statistics, Probability and Game Theory; IMS Lecture Notes Monograph Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1996; Volume 30, pp. 245–267.
  16. Dunson, D.B.; Herring, A.H.; Engel, S.M. Bayesian selection and clustering of polymorphisms in functionally related genes. J. Am. Stat. Assoc. 2008, 103, 534–546.
  17. Kim, S.; Dahl, D.B.; Vannucci, M. Spiked Dirichlet process prior for Bayesian multiple hypothesis testing in random effects models. Bayesian Anal. 2009, 4, 707–732.
  18. Suarez, A.J.; Ghosal, S. Bayesian clustering of functional data using local features. Bayesian Anal. 2016, 11, 71–98.
  19. Cui, K.; Cui, W. Spike-and-slab Dirichlet process mixture models. Open J. Stat. 2012, 2, 512–518.
  20. Barcella, W.; De Iorio, M.; Baio, G.; Malone-Lee, J. Variable selection in covariate dependent random partition models: An application to urinary tract infection. Stat. Med. 2016, 35, 1373–1389.
  21. Canale, A.; Lijoi, A.; Nipoti, B.; Prünster, I. On the Pitman–Yor process with spike and slab base measure. Biometrika 2017, 104, 681–697.
  22. Teh, Y.W.; Jordan, M.I. Hierarchical Bayesian nonparametric models with applications. In Bayesian Nonparametrics; Hjort, N.L., Holmes, C., Müller, P., Walker, S., Eds.; Cambridge University Press: New York, NY, USA, 2010.
  23. Teh, Y.W.; Jordan, M.I.; Beal, M.J.; Blei, D.M. Hierarchical Dirichlet processes. J. Am. Stat. Assoc. 2006, 101, 1566–1581.
  24. Camerlenghi, F.; Lijoi, A.; Orbanz, P.; Prünster, I. Distribution theory for hierarchical processes. Ann. Stat. 2019, 47, 67–92.
  25. Bassetti, F.; Casarin, R.; Rossini, L. Hierarchical species sampling models. Bayesian Anal. 2020, 15, 809–838.
  26. Pitman, J. Combinatorial Stochastic Processes; Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, 7–24 July 2002, with a foreword by Jean Picard; Lecture Notes in Mathematics; Springer: Berlin, Germany, 2006; Volume 1875.
  27. Crane, H. The ubiquitous Ewens sampling formula. Stat. Sci. 2016, 31, 1–19.
  28. Kingman, J.F.C. The representation of partition structures. J. Lond. Math. Soc. 1978, 18, 374–380.
  29. Aldous, D.J. Exchangeability and related topics. In École d'été de Probabilités de Saint-Flour, XIII—1983; Lecture Notes in Mathematics; Springer: Berlin, Germany, 1985; Volume 1117, pp. 1–198.
  30. Kallenberg, O. Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrscheinlichkeitstheorie Verw. Geb. 1973, 27, 23–36.
  31. Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Relat. Fields 1995, 102, 145–158.
  32. Gnedin, A.; Pitman, J. Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 2005, 325, 83–102, 244–245.
  33. Schervish, M.J. Theory of Statistics; Springer Series in Statistics; Springer: New York, NY, USA, 1995.
  34. Marin, J.M.; Robert, C.P. Bayesian Core: A Practical Approach to Computational Bayesian Statistics; Springer Texts in Statistics; Springer: New York, NY, USA, 2007.
  35. Kallenberg, O. Foundations of Modern Probability, 3rd ed.; Probability Theory and Stochastic Modelling; Springer: New York, NY, USA, 2021; Volume 99.
Figure 1. Pictorial representation of the latent partition structure of an mSSS. In the example, the partition induced by $(\xi_1, \dots, \xi_n)$ for $n = 8$ is $\tilde{\Pi}_n = \{[1,3,4,7], [2], [5,6,8]\}$, represented using rounded squares (bottom left). Circles at the top left represent a compatible latent partition, namely $\Pi_n = \{[1,3], [2], [4,7], [5,8], [6]\}$. The partition on $\{1, \dots, 5\}$ induced by the latent $Z_n$, i.e., $\Pi^{(0)}_{|\Pi_n|} = \{[1,3], [2], [4,5]\}$, is represented with squares in the middle of the figure. Combining $\Pi_n$ and $\Pi^{(0)}_{|\Pi_n|}$, one obtains $\tilde{\Pi}_n$. The statistics $n$, $m$ and $\lambda$ corresponding to this particular configuration are shown in the box at the bottom right.
Figure 2. Predictive CDFs for the relative changes in larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties; data taken from Section 2.1 of [34]. Data have been rounded to the second decimal. Here, $n = 90$ and $k = 36$. Solid line: empirical CDF. Dotted line: predictive CDF from (33). Dashed line: predictive CDF from PS2 with $H = T_{2\alpha_0}(\cdot \mid \mu_0, \sigma_0^2)$, $\sigma_0^2 = \beta_0(k_0+1)/(\alpha_0 k_0)$. Different plots correspond to different values of $\theta$ and $\sigma$. In all the plots, the predictive CDFs are evaluated with $\mu_0 = 0$, $\alpha_0 = 0.1$, $\beta_0 = 0.1$ and $k_0 = 0.1$.
Figure 3. Predictive CDFs for the relative changes in larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties; data taken from Section 2.1 of [34]. Raw data, without rounding. Here, $n = 90$ and $k = 36$. Solid line: empirical CDF. Dotted line: predictive CDF from (33). Dashed line: predictive CDF from PS2 with $H = T_{2\alpha_0}(\cdot \mid \mu_0, \sigma_0^2)$, $\sigma_0^2 = \beta_0(k_0+1)/(\alpha_0 k_0)$. Different plots correspond to different values of $\theta$ and $\sigma$. In all the plots, the predictive CDFs are evaluated with $\mu_0 = 0$, $\alpha_0 = 0.1$, $\beta_0 = 0.1$ and $k_0 = 0.1$.
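As a purely illustrative complement to Figures 2 and 3, the sketch below shows how a predictive CDF of this spike-and-slab type can be assembled: point masses at the observed distinct values plus a Student-$t$ continuous component. The data, the concentration parameter `theta` and the atom weights are synthetic placeholders, not the quantities actually used in the figures.

```python
import numpy as np
from scipy import stats

def predictive_cdf(x_grid, atoms, atom_probs, cont_prob, df, loc, scale):
    """Mixture CDF: mass atom_probs[j] at atoms[j] plus weight cont_prob
    on a Student-t(df, loc, scale) continuous component."""
    F = cont_prob * stats.t.cdf(x_grid, df, loc=loc, scale=scale)
    for x, p in zip(atoms, atom_probs):
        F = F + p * (x_grid >= x)   # step of height p at each atom
    return F

# Synthetic illustration (placeholder weights, Dirichlet-process style):
rng = np.random.default_rng(0)
data = np.round(rng.standard_t(df=3, size=90) * 0.2, 2)
atoms, counts = np.unique(data, return_counts=True)
theta = 1.0                                  # hypothetical concentration
atom_probs = counts / (len(data) + theta)    # mass on observed values
cont_prob = theta / (len(data) + theta)      # mass on the continuous part
grid = np.linspace(-2.0, 2.0, 400)
F = predictive_cdf(grid, atoms, atom_probs, cont_prob, df=3, loc=0.0, scale=0.2)
```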