Article

Undirected Structural Markov Property for Bayesian Model Determination

College of Mathematics and System Science, Xinjiang University, Urumqi 830046, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(7), 1590; https://doi.org/10.3390/math11071590
Submission received: 24 January 2023 / Revised: 8 March 2023 / Accepted: 23 March 2023 / Published: 25 March 2023
(This article belongs to the Special Issue Advances in Applied Probability and Statistical Inference)

Abstract: This paper generalizes the structural Markov property from undirected decomposable graphs to arbitrary undirected graphs. This generalization allows us to exploit the conditional independence properties of joint prior laws to analyze and compare multiple graphical structures, while taking advantage of their common conditional independence constraints. It also provides theoretical support for full Bayesian posterior updating of the structure of a graph using data from a given distribution. We further investigate ratios of graph laws in order to simplify the acceptance probability of Metropolis–Hastings sampling algorithms.

1. Introduction

A probabilistic graphical model (PGM), or structured probabilistic model (SPM), is a statistical model consisting of a graph and a family of distributions for which the graph encodes the conditional independence relations among the random variables. Such models are naturally associated with independence models, i.e., the sets of conditional independence constraints encoded by graphs via the global Markov property; they arise naturally in multivariate analysis and offer considerable versatility and convenience for analyzing complex, large-scale data.
Different classes of graphs with different interpretations of independence have been developed over the past decades; the reader is referred to [1,2,3,4] for details. One of the most important classes of graphs in graphical modeling is that of undirected graphs (UGs), whose Markov models are often known as undirected graphical models or Markov networks [1,2]. These models have found applications in a wide range of areas such as econometrics, medical science and artificial intelligence [5,6,7]. Our work in this paper concerns the determination of the structure of these models by Bayesian methods.
Bayesian structure learning aims to learn the structure of a graph from data. This requires specifying a prior distribution over graphical structures, termed a graph law. Several approaches to constructing graph laws have been proposed: the simplest is the uniform distribution of [8]; the Erdös–Rényi random graph model is used as a graph law in [9]; and a characterization of graph laws in exponential family form is proposed in [10]. Simplifying such prior laws is an important task, especially for posterior inference about graphical structures. With this in mind, the structural Markov property was first proposed to characterize conditional independence of the structure of a graph: it requires that the structures of distinct components of a graph be conditionally independent given the existence of a separating component; see [10]. These properties reflect conditional independence at the structural level. It has been proved that a graph law supported on the set of decomposable undirected graphs is structural Markov if and only if it is a member of the clique exponential family; see [10]. Further, an equivalent characterization of graph laws under a weaker support condition is given via a closure operation on graphical structures in [11].
Indeed, the structural Markov property is an extension of the hyper Markov property, proposed in [12], which reflects the global Markov property at the parameter level. Hyper Markov properties describe the conditional independence properties of a distribution over random distributions or statistical quantities in graphical models. Hyper Markov laws arise naturally as sampling distributions of maximum likelihood estimators and as prior or posterior distributions in Bayesian inference.
Recently, a weaker version of the structural Markov property for decomposable graphs was introduced in [13], where the authors provided an analogous clique–separator factorization of the graph law. This weakly structural Markov property only requires the separator to be complete, and it has been shown to provide a more flexible family of prior graph laws for full Bayesian posterior updating.
It should be pointed out that the work in [8,10,13] focuses only on decomposable graphical models. However, based on conditional independence and graphical separation, the structural Markov properties can be extended to non-decomposable undirected graphical models. The aim of this paper is to fill this gap in the field of graphical models. Further, we focus on a full Bayesian method for the posterior updating of graph laws using observed data from a given distribution, and we prove that this full Bayesian posterior of the graph law is feasible and reasonable. Finally, as examples, we illustrate our theory with detailed investigations of two important cases, based on graphical Gaussian models and multinomial models, respectively.
The paper is organized as follows. Section 2 introduces the terminology and concepts used throughout. Section 3 first investigates the structural Markov properties for non-decomposable graphs, and then exploits the joint prior laws of a random sample and its distribution for full Bayesian inference. Section 4 studies the posterior updating of graph laws in detail through two examples, based on the inverse Wishart distribution and the Dirichlet distribution. Section 5 discusses computational aspects of structural Markov graph laws. Finally, Section 6 concludes the paper.

2. Preliminaries

For terms and symbols, we follow [10,12], on which much of the theoretical framework of this paper is built. For clarity and consistency, the main notions and terminology used in this paper are given below.

2.1. Graphical Terminologies and Notation

A graph G = (V, E) consists of a finite set of vertices V(G) = {v_1, v_2, …, v_p} and a set of edges E(G) ⊆ V(G) × V(G) = {(u, v) : u, v ∈ V(G)}. An edge (u, v) of G is said to be undirected if (u, v) is an unordered pair. A graph G is said to be undirected if all its edges are undirected. Unless otherwise specified, G is assumed to be undirected, simple and connected throughout the paper.
For A ⊆ V(G), the induced subgraph of G on A is denoted by G_A = (A, E(G_A)), where E(G_A) = {(u, v) ∈ E(G) : u, v ∈ A}. All subgraphs in this paper are induced subgraphs. A is complete (or a clique) if any two distinct vertices u, v ∈ A are adjacent, i.e., (u, v) ∈ E(G). A graph is a clique if its vertex set is a clique. A clique G_A is a maximal clique if G_B is incomplete for every superset B ⊋ A. Two vertices u and v are neighbors if (u, v) ∈ E(G). For A ⊆ V(G), the boundary bd(A) is the set of vertices in V(G) ∖ A that are neighbors of vertices in A. G is collapsible onto A if every connected component of V(G) ∖ A has a complete boundary in G.
For any subsets A, B and C of V(G), we say that C separates A from B, written A ⊥ B | C [G], if every path in G between some u ∈ A and some v ∈ B contains a vertex in C. We call C a separator of A and B. Separators that are cliques are called clique separators.
For any disjoint subsets A, B and S of V(G), we say (A, B, S) forms a decomposition of G if (i) A ∪ B ∪ S = V(G); (ii) A ⊥ B | S [G]; and (iii) S is a clique separator in G. A decomposition (A, B, S) is proper if A ∪ S and B ∪ S are both proper subsets of V(G).
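As a concrete check of this definition, the following sketch (assuming Python with the networkx package; the function name and the edge list reconstructed from the description of Figure 2 are ours) verifies the three conditions of a decomposition directly.

```python
import networkx as nx
from itertools import combinations

def is_decomposition(G, A, B, S):
    """Check whether (A, B, S) forms a decomposition of the undirected graph G:
    (i) A, B, S are disjoint and cover V(G); (ii) S separates A from B;
    (iii) S is complete (so that it is a clique separator)."""
    A, B, S = set(A), set(B), set(S)
    if A & B or A & S or B & S or (A | B | S) != set(G.nodes):
        return False                                  # condition (i) fails
    if any(not G.has_edge(u, v) for u, v in combinations(S, 2)):
        return False                                  # condition (iii) fails
    H = G.subgraph(set(G.nodes) - S)                  # remove S and test separation
    return not any(nx.has_path(H, a, b) for a in A for b in B)

# Figure 2 graph (edges reconstructed from the text: U1 = {a,b,c} complete,
# U2 = {b,c,d,e} with b-d and c-e not joined):
G = nx.Graph([("a", "b"), ("a", "c"), ("b", "c"),
              ("c", "d"), ("d", "e"), ("e", "b")])
print(is_decomposition(G, {"a"}, {"d", "e"}, {"b", "c"}))   # True
```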
Definition 1 
([14]). Let G = (V, E) be an undirected graph. A graph G is reducible if its vertex set contains a clique separator; otherwise G is said to be prime. For example, G is prime if G is a clique, while G is reducible if G is disconnected. An induced subgraph G_U is a maximal prime subgraph of G if it satisfies
(i) 
G U  is prime, and
(ii) 
for every W ⊆ V(G) with U ⊊ W, the subgraph G_W is reducible.
In Figure 1, it is easy to see that G_1 is prime since it contains no clique separator, whereas G_2 is reducible because S = {a, c} is a clique separator in G_2.
Definition 2 
([14]). A proper decomposition (A, B, S) of an undirected graph G is said to form a prime decomposition if G_{A∪S} and G_{B∪S} are prime, or if they can be recursively decomposed into pairwise distinct maximal prime subgraphs of G.
In particular, G is decomposable if G_{A∪S} and G_{B∪S} are complete, or if they are both decomposable subgraphs of G. Note that the prime decomposition of an arbitrary undirected graph generalizes the decomposition of chordal graphs. For instance, in Figure 2, G is a non-decomposable undirected graph with V(G) = {a, b, c, d, e}; it has two maximal prime subgraphs G_{U_1} and G_{U_2}, with U_1 = {a, b, c} and U_2 = {b, c, d, e}, and a clique separator S = U_1 ∩ U_2 = {b, c}. Clearly, ({a}, {d, e}, {b, c}) forms a prime decomposition of G. Moreover, G_{U_1} is complete since all its pairs of vertices are joined, while G_{U_2} is incomplete because b and d, as well as c and e, are not joined.
It is worth pointing out that the maximal prime subgraphs of an undirected graph can be arranged into a perfect sequence. If there exists a proper decomposition of an undirected graph G, then G admits a perfect sequence (U_1, U_2, …, U_k) of maximal prime subgraphs such that for each j = 2, …, k there exists some h ∈ {1, 2, …, j − 1} with
S_j = U_j ∩ (∪_{i=1}^{j−1} U_i) ⊆ U_h,
where the S_j are in fact clique separators. In particular, G is decomposable if all its maximal prime subgraphs are complete (cliques).
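Given a perfect sequence of maximal prime subgraphs, the clique separators S_j can be read off directly from this formula; a minimal sketch (Python; the function name is ours):

```python
def separators(perfect_sequence):
    """Return S_j = U_j ∩ (U_1 ∪ ... ∪ U_{j-1}) for j = 2, ..., k."""
    seps = []
    history = set(perfect_sequence[0])
    for U in perfect_sequence[1:]:
        seps.append(set(U) & history)   # the clique separator S_j
        history |= set(U)               # the history set H_j
    return seps

# For the graph of Figure 2: perfect sequence ({a,b,c}, {b,c,d,e})
print(separators([{"a", "b", "c"}, {"b", "c", "d", "e"}]))  # [{'b', 'c'}]
```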
In a PGM, a vertex v denotes a random variable X_v taking values in a space 𝒳_v. Let X = X_{V(G)} = (X_v)_{v∈V(G)} be a p-dimensional random vector on the product space ×_{v∈V(G)} 𝒳_v, with P or θ denoting its distribution. All distributions considered in this paper are assumed to be positive and closed under marginalization and conditioning within the relevant family of joint distributions. For simplicity, we use P to denote the set of all positive distributions of X. For A, B ⊆ V(G), θ_A denotes the marginal distribution of X_A and θ_{B|A} the conditional distribution of X_B given X_A = x_A.
Let U be the set of undirected graphs with fixed vertex set V(G). A probability distribution of a random graph G taking values in U is called a (graph) law, denoted by 𝒢. Further, define U(A, B, S) to be the set of undirected graphs for which (A, B, S) is a prime decomposition.

2.2. Independence Model and Collapsibility

Given a finite set N and A, B, C ⊆ N, an independence model, denoted by I, is a set of triples of the form ⟨A, B | C⟩, which are called conditional independence statements. A graphical independence model is an independence model induced by a graph. For a graph G ∈ U, the graphical independence model of G is defined as
I(G) = {⟨A, B | C⟩ : A ⊥ B | C [G] for A, B, C ⊆ V(G)}.
Obviously, I(G) is the set of triples ⟨A, B | C⟩ encoding the global Markov property of G.
It should be pointed out that conditional independence for statistical models in [15,16] shares the same properties as graph separation in [2]; that is, a graphical independence model I(G) has the following properties:
  • for all A, B ⊆ V(G), ⟨A, B | A⟩ ∈ I(G), ⟨A, B | B⟩ ∈ I(G) and ⟨A, B | A ∪ B⟩ ∈ I(G);
  • if ⟨A, B | C⟩ ∈ I(G), then ⟨B, A | C⟩ ∈ I(G);
  • if ⟨A, B | C⟩ ∈ I(G) and U ⊆ A, then ⟨U, B | C⟩ ∈ I(G);
  • if ⟨A, B | C⟩ ∈ I(G) and U ⊆ A, then ⟨A, B | C ∪ U⟩ ∈ I(G);
  • if ⟨A, B | C⟩ ∈ I(G) and ⟨A, W | B ∪ C⟩ ∈ I(G), then ⟨A, B ∪ W | C⟩ ∈ I(G).
In particular, the following property holds when A, B, C are disjoint:
if ⟨A, B | C⟩ ∈ I(G) and ⟨A, C | B⟩ ∈ I(G), then ⟨A, B ∪ C | ∅⟩ ∈ I(G).
Further, a graphical independence model I(G) admits a natural projection onto D ⊆ V(G), defined by
I(G)_D = {⟨A, B | C⟩ ∈ I(G) : A, B, C ⊆ D}.
It is worth pointing out that I(G)_D ⊆ I(G_D), where I(G_D) is the independence model induced by the induced subgraph G_D.
Definition 3 (CI-collapsibility).
Let G be a fixed undirected graph in U. For D ⊆ V(G), I(G) is conditional independence collapsible (CI-collapsible) onto D if I(G)_D = I(G_D).
CI-collapsibility reflects the consistency of the conditional independence relations induced by G_D with those induced by G but restricted to D.
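For small graphs, CI-collapsibility can be verified by brute force: enumerate all triples of subsets of D, test graph separation in G and in G_D, and compare. A sketch (Python with networkx; the helper names are ours; the enumeration is exponential and only meant for illustration):

```python
import networkx as nx
from itertools import chain, combinations

def subsets(nodes):
    nodes = list(nodes)
    return chain.from_iterable(combinations(nodes, r) for r in range(len(nodes) + 1))

def separated(G, A, B, C):
    """Every path in G between A and B contains a vertex of C."""
    H = G.subgraph(set(G.nodes) - set(C))
    A, B = set(A) - set(C), set(B) - set(C)
    return not any(nx.has_path(H, a, b) for a in A for b in B)

def independence_model(G, nodes):
    """All separation triples <A, B | C> with A, B, C drawn from `nodes`."""
    return {(A, B, C) for A in subsets(nodes) for B in subsets(nodes)
            for C in subsets(nodes) if A and B and separated(G, A, B, C)}

def ci_collapsible(G, D):
    """I(G) restricted to D coincides with I(G_D)."""
    D = sorted(D)
    return independence_model(G, D) == independence_model(G.subgraph(D), D)
```

For the Figure 2 graph constructed earlier, ci_collapsible(G, ["a", "b", "c"]) returns True (the first history set H_1), while ci_collapsible(G, ["a", "b", "d"]) returns False, in line with Theorem 1 below.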
We say a distribution P is Markov with respect to G if for all A, B, C ⊆ V(G),
⟨A, B | C⟩ ∈ I(G) ⟹ X_A ⊥⊥ X_B | X_C [P],
where X_A ⊥⊥ X_B | X_C [P] denotes that X_A is independent of X_B given X_C under P.
In order to ensure that various distributions, and the laws of related statistical quantities, are Markov with respect to G, we now review graphical models within the framework of undirected graphs. A graphical model, denoted by P(G), is a statistical model such that
P(G) = {P ∈ P : X_A ⊥⊥ X_B | X_C [P] for all ⟨A, B | C⟩ ∈ I(G)}.
For the Markov distribution family P(G), we say that it is faithful to G if there exists a distribution P ∈ P(G) such that I(P) = I(G), where
I(P) = {⟨A, B | C⟩ : X_A ⊥⊥ X_B | X_C [P] for A, B, C ⊆ V(G)}.
All graphical models considered in this paper are assumed to be faithful to G. This assumption is known as the "Faithfulness Assumption" [17]. It is broad and mild, since Gaussian and multinomial distribution families satisfy it.
Moreover, a statistical model P(G) also admits a natural projection onto D ⊆ V(G), denoted by P(G)_D and defined by
P(G)_D = { P_D = ∫_{𝒳_{V(G)∖D}} dP : P ∈ P(G) }.
In general, P(G)_D is not equal to P(G_D), but clearly P(G)_D ⊆ P(G_D).
Definition 4 (M-collapsibility).
Let G be a fixed undirected graph in U. For D ⊆ V(G), P(G) is model collapsible (M-collapsible) onto D if P(G)_D = P(G_D).
M-collapsibility indicates that the marginal distribution family on D is identical to the distribution family induced by G_D.
Theorem 1. 
Let G be a fixed undirected graph in U and D ⊆ V(G). Then, the following statements are equivalent.
1.
G is collapsible onto D;
2.
I(G) is CI-collapsible onto D;
3.
P(G) is M-collapsible onto D.
Proof. 
See Appendix A.    □
Let H_j = ∪_{i=1}^{j} U_i denote the history sets for j = 1, 2, …, k. By Theorem 1, we obtain the following result.
Proposition 1. 
Let G be a fixed graph in U with a perfect sequence (U_1, U_2, …, U_k) of maximal prime subgraphs. Then, the following statements hold for each j ∈ {1, 2, …, k}.
1.
G is collapsible onto H_j;
2.
I(G_{H_j}) = I(G)_{H_j};
3.
P(G_{H_j}) = P(G)_{H_j}.
Proof. 
This can be easily obtained from the meaning of collapsibility and Theorem 1.    □

3. Structural Markov Graph Laws for Full Bayesian Inference

3.1. Basic Concepts and Properties

We begin with the definition of the structural Markov property of [10].
Definition 5. 
A graph law 𝒢(G) over U is structural Markov if
G_{A∪S} ⊥⊥ G_{B∪S} | {G ∈ U(A, B, S)} [𝒢],
where 𝒢(U(A, B, S)) > 0 and U(A, B, S) is the set of undirected graphs for which (A, B, S) is a prime decomposition.
Specifically, if G is decomposable in U , Definition 5 degenerates to that defined in [10].
The structural Markov property indicates that the structures of different induced subgraphs are conditionally independent when the event {G ∈ U(A, B, S)} occurs; see Figure 3 for an illustration.
Proposition 2. 
Let G be a fixed undirected graph in U. For any subsets A, B and S of V(G) satisfying A ∪ B ∪ S = V(G), if 𝒢(G) is structural Markov, then
G_{A∪S} ⊥⊥ G_{B∪S}
whenever S is complete and separates A from B in G.
Proof. 
By Definition 5, the existence of the remaining edges in G_{A∪S} is independent of those in G_{B∪S}, since S is complete and separates A from B in G. Removing the redundant conditioning on G_S leaves the marginal independence statement G_{A∪S} ⊥⊥ G_{B∪S}, and the result follows.    □
Proposition 2 indicates that different components of an undirected graph are conditionally independent provided that the corresponding separators are complete. To illustrate this, Figure 4 shows a non-decomposable graph G in which A ∩ B separates A from B but A ∩ B is incomplete. The two subgraphs G_B and G_A may then share edges within A ∩ B, which makes the presence of the remaining edges in G_A dependent on those in G_B. These dependencies disappear as soon as A ∩ B is complete.
It also implies that an arbitrary undirected graph can be written as the graph product of its induced subgraphs:
G = G_{A∪S} ⊗ G_{B∪S}, G ∈ U(A, B, S).
The structural Markov property can be conveniently characterized through this operation, as illustrated by the sketch below.
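If the product is read as the graph on A ∪ B ∪ S whose edge set is the union of the edges of the two induced subgraphs (an assumed reading, consistent with the factorizations used below), a minimal networkx sketch is:

```python
import networkx as nx

def graph_product(G1, G2, A, B, S):
    """Combine G1 restricted to A ∪ S with G2 restricted to B ∪ S.

    Assumed reading: the result has vertex set A ∪ B ∪ S and edge set
    E(G1_{A∪S}) ∪ E(G2_{B∪S}). A, B, S are sets of vertices."""
    return nx.compose(G1.subgraph(A | S), G2.subgraph(B | S))
```

Under this reading, for G, G′ ∈ U(A, B, S), both graph_product(G, G′, A, B, S) and graph_product(G′, G, A, B, S) again admit (A, B, S) as a prime decomposition, which is statement 1 of Proposition 3 below.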
Proposition 3. 
Let π be the density of a graph law 𝒢 with respect to the counting measure on U. Suppose that G, G′ ∈ U(A, B, S). Then,
1.
G_{A∪S} ⊗ G′_{B∪S} ∈ U(A, B, S) and G′_{A∪S} ⊗ G_{B∪S} ∈ U(A, B, S);
2.
if 𝒢 is structural Markov on U, then
π(G) π(G′) = π(G_{A∪S} ⊗ G′_{B∪S}) π(G′_{A∪S} ⊗ G_{B∪S}).
Proof. 
See Appendix A.    □
For any subset C ⊆ A, define G_A(C) to be the graph on A that is complete on C and empty otherwise.
Proposition 4. 
Let G be a fixed graph in U with a perfect sequence (U_1, U_2, …, U_k) of maximal prime subgraphs. If G has a structural Markov graph law 𝒢 with density π, then π factorizes as
π(G) = ∏_{j=1}^{k} π(G_{U_j}) / ∏_{j=2}^{k} π(G_{S_j}).    (1)
Proof. 
See Appendix A.    □

3.2. Joint Distribution Law

In this section, we will investigate how the structural Markov laws interact with the hyper Markov laws when they are considered as the joint prior laws.
Hyper Markov laws are motivated by the property that graph decomposition allows one to decompose a prior or posterior distribution into the product of marginal distributions on the corresponding maximal prime subgraphs. For a fixed graph G ∈ U(A, B, S), any prior or posterior distribution of θ ∈ P(G) is uniquely characterized by its marginals θ_{A∪S} and θ_{B∪S}, taking values in P(G_{A∪S}) and P(G_{B∪S}), respectively.
Following [12], a probability distribution of a random distribution θ taking values in P(G) is called a law, denoted by L. For A ⊆ V(G), the marginal law of θ_A is denoted by L_A, and L_{B|A} denotes the conditional law of θ_{B|A}.
Here, we give the definitions of weak and strong hyper Markov properties.
Definition 6 
([12], Weak and strong hyper Markov). Suppose that G is a fixed graph in U(A, B, S) and θ ∈ P(G). Let L(θ) be a law of θ. We say that L(θ) is weak hyper Markov over G if
θ_{A∪S} ⊥⊥ θ_{B∪S} | θ_S [L].
Further, we say that L(θ) is strong hyper Markov over G if
θ_{A∪S} ⊥⊥ θ_{B∪S | S} [L].
Let X be a random sample from θ ∈ P(G). The conditional independence properties of the joint distribution law (P, L) of the pair (X, θ), for G ∈ U, can be characterized as follows.
Proposition 5. 
Let G be a fixed undirected graph in U with a prime decomposition (A, B, S), and let X be a random sample from θ ∈ P(G). Then, the joint distribution law of (X, θ) satisfies:
1.
if L(θ) is weak hyper Markov with respect to G, then
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_S) [P, L];
2.
if L(θ) is strong hyper Markov, then
(X_{A∪S}, θ_{A∪S | S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}) | X_S [P, L].
Proof. 
See Appendix A.    □
It is worth mentioning that the hyper Markov property does not hold when separators are not complete. For instance, the graph G_1 in Figure 1 is incomplete, and neither θ_{a,b,c} ⊥⊥ θ_{a,c,d} | θ_{a,c} nor θ_{a,b,d} ⊥⊥ θ_{b,c,d} | θ_{b,d} holds. However, the corresponding pairwise Markov properties X_b ⊥⊥ X_d | X_{a,c} and X_a ⊥⊥ X_c | X_{b,d} do hold under P whenever P is Markov with respect to G_1.
Let Θ be the family of Markov distributions over U and L the family of hyper Markov laws over U. For our discussion, we recall the notion of hyper compatibility, first proposed in [10], which characterizes families of laws across graphs.
Definition 7 (Hyper compatibility).
Let L, L′ ∈ L be the laws of θ ∈ Θ with respect to G and G′ on U, respectively. For A ⊆ V(G), we say the family L is hyper compatible on U if L_A(θ) = L′_A(θ) whenever G and G′ are both collapsible onto A and G_A = G′_A.
Throughout, L is assumed to be hyper compatible over U. Based on the arguments above, some important conditional independence properties of the joint law (L, 𝒢) can be established, as follows.
Proposition 6. 
Suppose that G has a graph law 𝒢 over U. If θ has a law L from a hyper compatible family L over U, then
θ_{A∪S} ⊥⊥ G_{B∪S} | (G_{A∪S}, {G ∈ U(A, B, S)}) [L, 𝒢].
Proof. 
Suppose that G ∈ U(A, B, S). Since G is collapsible onto both A ∪ S and B ∪ S, hyper compatibility implies that L_{A∪S} can only take values in L(G_{A∪S}) for any L ∈ L.    □
Theorem 2. 
Suppose that 𝒢(G) is structural Markov over U. For any L ∈ L,
1.
if L(θ) is weak hyper Markov, then
(θ_{A∪S}, G_{A∪S}) ⊥⊥ (θ_{B∪S}, G_{B∪S}) | (θ_S, {G ∈ U(A, B, S)}) [L, 𝒢];
2.
if L(θ) is strong hyper Markov, then
(θ_{A∪S}, G_{A∪S}) ⊥⊥ (θ_{B∪S | S}, G_{B∪S}) | {G ∈ U(A, B, S)} [L, 𝒢].
Proof. 
See Appendix A.    □
Theorem 2 reflects conditional independence at both the parameter and the structural level.
Further, for any G ∈ U, let X be a random sample from a distribution θ ∈ Θ. If G is assigned the prior law 𝒢 and θ is assigned the prior law L, then a joint distribution law for (X, θ, G) is thereby induced.
Proposition 7. 
Suppose that 𝒢(G) is structural Markov on U. Let X be a random sample from θ ∈ Θ. For L ∈ L,
1.
if L(θ) is weak hyper Markov, then
(X_{A∪S}, θ_{A∪S}) ⊥⊥ G_{B∪S} | (X_S, θ_S, {G ∈ U(A, B, S)}) [Θ, L, 𝒢];
2.
if L(θ) is strong hyper Markov, then
(X_{A∪S}, θ_{A∪S | S}) ⊥⊥ G_{B∪S} | (X_S, {G ∈ U(A, B, S)}) [Θ, L, 𝒢].
Proof. 
See Appendix A.    □
The conditional independence property of any such joint distribution law of ( X , θ , G ) can be characterized as follows.
Theorem 3. 
Suppose that 𝒢(G) is structural Markov on U. Let X be a random sample from θ ∈ Θ on U. For L ∈ L,
1.
if L(θ) is weak hyper Markov, then
(X_{A∪S}, θ_{A∪S}, G_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}, G_{B∪S}) | (X_S, θ_S, {G ∈ U(A, B, S)}) [Θ, L, 𝒢];
2.
if L(θ) is strong hyper Markov, then
(X_{A∪S}, θ_{A∪S}, G_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S | S}, G_{B∪S}) | (X_S, {G ∈ U(A, B, S)}) [Θ, L, 𝒢].
Proof. 
See Appendix A.    □
Theorem 3 shows that a random sample is governed jointly by the hyper parameters and the structural parameters, which plays a significant role in full Bayesian inference.
Corollary 1. 
Suppose that 𝒢(G) is structural Markov on U. Let X be a random sample from θ ∈ Θ on U. For L ∈ L,
1.
if L(θ) is weak hyper Markov, then
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_S, {G ∈ U(A, B, S)}) [Θ, L, 𝒢];
2.
if L(θ) is strong hyper Markov, then
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S | S}) | (X_S, {G ∈ U(A, B, S)}) [Θ, L, 𝒢].
Proof. 
It can be easily obtained from Theorem 3.    □
Corollary 1 can be considered a generalization of Proposition 5, since G is a random undirected graph on U with a prime decomposition (A, B, S). In particular, on the event {G ∈ U(A, B, S)}, i.e., given a graph G with a prime decomposition (A, B, S), Corollary 1 yields
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_S) [P, L].

3.3. Posterior Updating for Graph Law

Our research in this section aims to identify the structure of a model via the Bayesian approach. Based on the results of Section 3.2, we will use data from a given distribution to learn the structure of a graph.
We assume that G has a structural Markov graph law 𝒢 over U. For θ ∈ Θ, let θ have a law from a hyper compatible family L, and let X^(n) = (X_1, X_2, …, X_n) denote a random sample of n observations from θ. Focusing on the density π(G | x^(n), θ) of the posterior graph law with conjugate prior graph law π(G), the full Bayesian posterior graph law is
π(G | x^(n), ϑ) = (1/Z) π(G) ℓ(θ | G) p(x^(n) | θ; G), θ ∈ Θ, G ∈ U,
where Z is a normalizing constant, ℓ(θ | G) denotes the density of the prior law of θ given G, and ϑ is a hyperparameter characterizing that law. In general, it is hard to estimate the structure of a graph G when the hyperparameter ϑ is unknown.
In the following, we investigate the properties of structural Markov laws when used as priors for models.
Proposition 8. 
If the prior graph law 𝒢(G) is structural Markov on U, then the posterior graph law obtained by conditioning on data X^(n) = x^(n) is also structural Markov on U.
Proof. 
By conditional independence and Theorem 3, we can easily find that
G_{A∪S} ⊥⊥ G_{B∪S} | (X^(n), θ, {G ∈ U(A, B, S)}).
   □
Proposition 9. 
Assume that the prior graph law 𝒢(G) is structural Markov and L(θ) is strong hyper Markov on U. Then, the following properties hold:
1.
The posterior graph law obtained by conditioning on data  X ( n ) = x ( n )  is structural Markov with respect to  U ;
2.
The marginal data distribution of  X ( n )  is Markov with respect to  U ;
3.
The posterior law of θ conditioning on  X ( n ) = x ( n )  is Markov with respect to  U .
Proof. 
By conditional independence and Theorem 3, we have
G_{A∪S} ⊥⊥ G_{B∪S} | (X^(n), {G ∈ U(A, B, S)}).
This implies (i).
To prove (ii), again by conditional independence and Theorem 3, we have
X_{A∪S}^(n) ⊥⊥ X_{B∪S}^(n) | (X_S^(n), {G ∈ U(A, B, S)}).
In particular, if G is given from U(A, B, S), then
X_{A∪S}^(n) ⊥⊥ X_{B∪S}^(n) | X_S^(n).
From Theorem 3, we have
θ_{A∪S} ⊥⊥ θ_{B∪S} | (X^(n), θ_S, {G ∈ U(A, B, S)}),
which implies (iii).    □
Our Bayesian approach calls for a strong hyper Markov prior law on θ with respect to G ∈ U. By Proposition 9, the posterior law of θ given G has a density of the form
ℓ(θ | x^(n), G) = ∏_{U ∈ 𝒰} ℓ(θ_U | x_U^(n)) / ∏_{S ∈ 𝒮} ℓ(θ_S | x_S^(n)), θ ∈ Θ,
where 𝒰 is the set of maximal prime subgraphs of G and 𝒮 is the set of corresponding clique separators.
If 𝒢(G) is structural Markov and L(θ) is strong hyper Markov with respect to G, then the posterior graph law of G is given by
π(G | x^(n), ϑ) ∝ ∏_{U ∈ 𝒰} π(G_U) ℓ(θ_U | x_U^(n)) / ∏_{S ∈ 𝒮} π(G_S) ℓ(θ_S | x_S^(n)), θ ∈ Θ, G ∈ U.    (3)
Note that (3) shows that the posterior graph law of G preserves the structural Markov property under hyper compatible laws, which coincides with Proposition 8. Moreover, the updating in (3) may be performed locally: the posterior graph law on each maximal prime subgraph of G depends only on the posterior of the hyper compatible law on that maximal prime subgraph.

4. Two Special Cases

4.1. Graphical Gaussian Models and the Inverse Wishart Law

A graphical Gaussian model is defined by a p-dimensional multivariate Gaussian distribution with the expected value μ and covariance matrix Σ , i.e.,
P ( X ) = N p ( μ , Σ ) .
For simplicity, we assume in the following that the model has zero mean. Define K = Σ^{-1} to be the precision matrix, constrained by the graph G so that
K ∈ {K ∈ M_p^+ : K_{uv} = 0 for all (u, v) ∉ E(G)},
where M_p^+ denotes the set of p × p positive definite matrices. For any matrix M ∈ M_p^+ and A ⊆ V(G), M_A denotes the |A| × |A| submatrix (M_{uv})_{(u,v) ∈ A×A}. It has been shown that the global, local and pairwise Markov properties are equivalent for graphical Gaussian models; see [2]. We therefore conclude that the graphical Gaussian distribution P is Markov with respect to G if and only if
K_{uv} = 0 ⟺ X_u ⊥⊥ X_v | X_{V(G)∖{u,v}}.
Let x^(n) denote the observed n × p sample matrix X^(n), a random sample of size n from the graphical Gaussian distribution N_p(0, Σ), and let S = (x^(n))^T x^(n) denote the observed sum-of-products matrix. Then, for any U ∈ 𝒰,
p(x_U^(n) | Σ_U) = ((2π)^{|U|} |Σ_U|)^{-n/2} exp{ -(1/2) tr(Σ_U^{-1} S_U) },
where |U| is the cardinality of U and |Σ_U| is the determinant of Σ_U. The expression for p(x_S^(n) | Σ_S), S ∈ 𝒮, is analogous.
The inverse Wishart distribution, also termed the inverse Wishart law and denoted by IW(δ, Φ), is used as the prior for the graphical Gaussian distribution N_p(0, Σ). Conditioning on (4), Σ has a hyper inverse Wishart prior law, denoted by HIW(δ, Φ), whose marginal density ℓ(Σ_U | Φ_U) is of the form
ℓ(Σ_U | Φ_U) ∝ |Σ_U|^{-(δ+|U|-1)/2} exp{ -(1/2) tr(Σ_U^{-1} Φ_U) }.
It was shown in [12] that the hyper inverse Wishart law satisfies the strong hyper Markov property, which allows us to compute the posterior update of Σ from the margins on the maximal prime subgraphs of G. That is, for any U ∈ 𝒰,
ℓ(Σ_U | X^(n) = x^(n)) = ℓ(Σ_U | X_U^(n) = x_U^(n))
with density
ℓ(Σ_U | x_U^(n)) ∝ |Σ_U|^{-(δ+n+|U|-1)/2} exp{ -(1/2) tr(Σ_U^{-1}(Φ_U + S_U)) }.
We conclude that
L(Σ_U | X_U^(n) = x_U^(n)) = IW(δ + n, S_U + Φ_U).
Consequently, if we assign a prior law of form (1) to G, then Proposition 8 implies that the posterior graph law of G, given data X^(n) = x^(n) from the Gaussian distribution N_p(0, Σ), can be obtained through (3), with density of the form
π(G | x^(n), Φ) ∝ ∏_{U ∈ 𝒰} π(G_U) ℓ(Σ_U | Φ_U + S_U) / ∏_{S ∈ 𝒮} π(G_S) ℓ(Σ_S | Φ_S + S_S), G ∈ U.
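A minimal numerical sketch of this per-component update (Python with numpy; the function name is ours): the hyper inverse Wishart parameters on a prime component U are updated simply as (δ, Φ_U) → (δ + n, Φ_U + S_U).

```python
import numpy as np

def component_posterior(x, U, delta, Phi):
    """Posterior hyper inverse Wishart parameters on a prime component U.

    x     : n x p data matrix (rows are observations), assumed zero-mean
    U     : list of column indices forming the prime component
    delta : prior degrees-of-freedom parameter of IW(delta, Phi)
    Phi   : p x p prior scale matrix
    Returns (delta + n, Phi_U + S_U) for the margin on U."""
    n = x.shape[0]
    S = x.T @ x                      # observed sum-of-products matrix
    idx = np.ix_(U, U)
    return delta + n, Phi[idx] + S[idx]

# Example with the prior of Section 4.3: IW(7, I_7), component U = {0, 1, 2}
rng = np.random.default_rng(0)
x = rng.standard_normal((1002, 7))
delta_post, scale_post = component_posterior(x, [0, 1, 2], 7, np.eye(7))
```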

4.2. Multinomial Models and the Dirichlet Law

Suppose that all the variables (X_1, X_2, …, X_p) are discrete. Index the contingency table over V(G) by I = I_1 × I_2 × ⋯ × I_p, where I_h is a finite set for each h ∈ {1, 2, …, p}; an element i ∈ I is referred to as a cell of the table. Thus, (X_1, X_2, …, X_p) takes values in I, and its distribution θ is assumed to be Markov with respect to G. Then,
θ(i) = ∏_{U ∈ 𝒰} θ(i_U) / ∏_{S ∈ 𝒮} θ(i_S), i ∈ I,
where θ(i_U) ∈ (0, 1), θ(i_S) ∈ (0, 1) and ∑_{i ∈ I} θ(i) = 1.
Let x^(n) be the observed value of X^(n), a random sample from θ; X^(n) is an n × p matrix each row of which is an observed cell of I. The distribution of X^(n) is the multinomial distribution with index n and probabilities θ, denoted by M(n, θ). The likelihood function p(x^(n) | θ, G) then has the form
p(x_U^(n) | θ_U) = ∏_{i_U ∈ I_U} θ(i_U)^{n(i_U)}, U ∈ 𝒰,
where I_U = ×_{u ∈ U} I_u, θ_U = (θ(i_U))_{i_U ∈ I_U}, and n(i_U) counts the number of rows of x_U^(n) falling in the marginal cell i_U. The expression for p(x_S^(n) | θ_S), S ∈ 𝒮, is analogous.
The Dirichlet distribution, also termed the Dirichlet law and denoted by D(α), where α = (α(i))_{i ∈ I} are hyperparameters, is used as the prior for the multinomial distribution M(n, θ). The Dirichlet law satisfies the strong hyper Markov property; see [12]. Thus, we have
ℓ(θ_U | α_U) ∝ ∏_{i_U ∈ I_U} θ(i_U)^{α(i_U) - 1}, U ∈ 𝒰,
and the posterior law can then be written as
ℓ(θ_U | x^(n), α_U) ∝ ∏_{i_U ∈ I_U} θ(i_U)^{α(i_U) + n(i_U) - 1}, U ∈ 𝒰.
Based on the above, we conclude that L(θ_U | x_U^(n)) = D(α_U + n_U). Further, if we assign a prior law of form (1) to G, then by Proposition 8 the posterior graph law of G, given data X^(n) = x^(n) from θ, has density
π(G | x^(n), α) ∝ ∏_{U ∈ 𝒰} π(G_U) ℓ(θ_U | α_U + n_U) / ∏_{S ∈ 𝒮} π(G_S) ℓ(θ_S | α_S + n_S), G ∈ U.
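Analogously, a small sketch of the Dirichlet update on a prime component (Python with numpy; the function names and the cell-indexing convention are ours): the parameters of the marginal table I_U are updated by adding the observed marginal counts n(i_U).

```python
import numpy as np

def marginal_counts(x, U, levels):
    """Counts n(i_U) over the marginal table I_U = prod_{u in U} I_u.

    x      : n x p integer matrix of observed cells, x[r, u] in {0, ..., levels[u]-1}
    U      : list of column indices forming the component
    levels : number of levels |I_u| for each variable u"""
    counts = np.zeros([levels[u] for u in U], dtype=int)
    for row in x[:, U]:
        counts[tuple(row)] += 1
    return counts

def dirichlet_posterior(alpha_U, counts_U):
    """Parameters of the posterior law D(alpha_U + n_U) on the component U."""
    return np.asarray(alpha_U) + np.asarray(counts_U)
```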

4.3. An Example on Simulated Data

4.3.1. Dataset Description

In this section, we present the results of an application to a dataset. We analyze the labor force survey data available from [18], which record the multivariate associations among income, education and family background for 1002 males in the American labor force. The variables in this dataset are briefly described below.
  • inc: The income of the respondents.
  • deg: Respondents’ highest educational degree.
  • chi: The number of children of the respondents.
  • pin: The income of the respondents’ parents.
  • pde: The highest educational degree of respondents’ parents.
  • pch: The number of children of respondents’ parents.
  • age: Respondents’ age in years.

4.3.2. Experiments and Results

We consider the posterior graph law of G in Equation (5); a Gibbs sampler can then be formed using the following conditional distributions:
  • X ∼ N_p(0, Σ);
  • Σ | X, Φ ∼ IW(n + δ − 1, S + Φ).
For the prior graph law of G, following Example 3.5 in [10], we consider an Erdös–Rényi random graph prior with independent edges (u, v), so that
π(G) ∝ (φ / (1 − φ))^{|E(G)|},
where the parameter φ ∈ (0, 1) is the prior probability of an edge being present. In this case, we set φ = 0.5. We use the inverse Wishart law IW(δ, Φ) as a prior for the covariance matrix over the graph G, with δ = 7 and Φ = I_7, the 7 × 7 identity matrix.
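In any sampler over graphs, the contribution of this prior to a ratio π(G′)/π(G) reduces to one factor of φ/(1 − φ) per added edge (and its inverse per removed edge). A one-line sketch (Python; the function name is ours):

```python
import math

def log_prior_ratio(n_edges_new, n_edges_old, phi=0.5):
    """log pi(G') - log pi(G) under pi(G) ∝ (phi / (1 - phi))^{|E(G)|}."""
    return (n_edges_new - n_edges_old) * math.log(phi / (1.0 - phi))
```

With φ = 0.5 the ratio is always 1, so this prior is uniform over graphs and drops out of the acceptance probability.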
Using the model above, we simulate n = 1002 observations. The experiments are implemented in R, running the sampler for 5000 iterations with the first 2500 discarded as burn-in.
The experimental results on this dataset are displayed in Figure 5 and Figure 6. The estimated posterior probabilities of the graph sizes are shown in the left panel of Figure 5: the algorithm mainly visits graphs with between nine and twelve edges. The right panel shows the estimated posterior probabilities of all visited graphs of various sizes; more than 15 different graphs are visited. Figure 6 displays the selected graph with the highest posterior probability among the visited graphs.
The results also suggest that the respondents’ income is related to their own education and age, and that the income of the respondents’ parents is related only to their education.

5. Computations

In this section, we design an algorithm to draw samples of interest, such as decomposable undirected graphs, from a structural Markov graph law 𝒢 on U.

5.1. Ratio for Graph Law

Model comparison plays an important role in statistical analysis, in particular when evaluating the ratio of the distribution of a variable in two different states. Treating the graph itself as a random variable, we consider this ratio for two undirected graphs G′ and G, where G′ is obtained from G by removing or adding one edge. The ratio is
Λ(G′ : G) = π(G′) / π(G).
The main objective of this section is to greatly simplify this computation under the assumption that the graph law 𝒢 is structural Markov on U. For convenience, we write η_U = π(G_U) and ζ_S = π(G_S) for U ∈ 𝒰 and S ∈ 𝒮, and similarly η′_U = π(G′_U) for the corresponding subgraphs of G′.
Figure 7 shows a special case in which G′ is obtained from G by removing an edge (u, v) lying in exactly one prime component U ∈ 𝒰 of G.
Proposition 10. 
Let G be a fixed graph in U with a perfect sequence (U_1, U_2, …, U_k) of maximal prime subgraphs, and suppose that G′ is obtained from G by removing the edge (u, v). Then,
1.
if u and v are contained in exactly one maximal prime subgraph U_j of G, then
Λ(G′ : G) = η′_{U_j} / η_{U_j}, j ∈ {1, 2, …, k};
2.
if u and v are both contained in two neighboring maximal prime subgraphs U_j and U_{j+1} of G, then
Λ(G′ : G) = η′_W / η_W,    (7)
where W = U_j ∪ U_{j+1} in G.
Proof. 
See Appendix A.    □
Figure 8 shows a case in which G′ is obtained from G by adding an edge (u, v) between two neighboring prime components U_i and U_j of G, with u ∈ U_i and v ∈ U_j.
Proposition 11. 
Let G be a fixed graph in U with a perfect sequence (U_1, U_2, …, U_k) of maximal prime subgraphs, and suppose that G′ is obtained from G by adding the edge (u, v). Then,
1.
if u and v are contained in exactly one incomplete prime subgraph U_h, then
Λ(G′ : G) = η′_{U_h} / η_{U_h};
2.
if U_i ∋ u and U_j ∋ v are two distinct maximal prime subgraphs of G, then there exist prime components U_i = U_{h_1}, U_{h_2}, …, U_{h_m} = U_j such that
Λ(G′ : G) = η′_T / η_T,    (8)
where T = U_{h_1} ∪ U_{h_2} ∪ ⋯ ∪ U_{h_m}.
Proof. 
See Appendix A.    □
In particular, if G is a decomposable graph in U , then we have the following results.
Lemma 1 
([19]). Let G be a decomposable graph in U with a perfect sequence of cliques (U_1, U_2, …, U_k). Suppose that G′ is decomposable and obtained from G by removing or adding one edge (u, v). Then,
1.
If G′ is obtained from G by removing the edge (u, v), then u and v must belong to a single clique U_j of G;
2.
If G′ is obtained from G by adding the edge (u, v), then there exist two different cliques U_i ∋ u and U_j ∋ v such that S = U_i ∩ U_j is complete and separates U_i from U_j.
Corollary 2. 
Let G be a decomposable graph in U with a perfect sequence of cliques (U_1, U_2, …, U_k). Suppose that G′ is decomposable and obtained from G by removing or adding one edge (u, v). Then,
1.
If G′ is obtained from G by removing the edge (u, v) within U_j, then
Λ(G′ : G) = η_{U_u} η_{U_v} / (η_{U_0} η_{U_j}),
where U_u = U_j ∖ {v}, U_v = U_j ∖ {u} and U_0 = U_j ∖ {u, v};
2.
If G′ is obtained from G by adding the edge (u, v) such that u ∈ U_i and v ∈ U_j, then
Λ(G′ : G) = ζ_S ζ_{S_0} / (ζ_{S_u} ζ_{S_v}),
where S = U_i ∩ U_j, S_u = S ∪ {u}, S_v = S ∪ {v} and S_0 = S ∪ {u, v}.
Proof. 
We first give the proof of 1. If (u, v) ∈ E(G) and (u, v) ∉ E(G′), then by Lemma 1 the deleted edge (u, v) must belong to a single clique U_j. Note that U_u, U_v and U_0 are all cliques in both G and G′. Then,
η′_{U_j} = η_{U_u} η_{U_v} / η_{U_0},
which, combined with (A19), gives the result. The proof of 2 is similar.    □

5.2. Sampling Decomposable Graphs from Structural Markov Graph Laws

We now take a random graph on U as the initial state and design a Markov chain Monte Carlo (MCMC) sampler for sampling from a structural Markov graph law. The moves rely on small perturbations of the edge set of a graph: a single edge is removed or added at each step.
A reversible jump MCMC sampler for posterior sampling of decomposable graphical models, based on single-edge additions and removals, was introduced in [8]. We use this methodology for sampling from a structural Markov law, as detailed below.
Let G denote the current state and G′ the proposed state, where G′ is obtained from G by removing or adding one edge; the chain moves from G to G′ with probability q(G, G′), chosen to ensure detailed balance with respect to the target distribution π(G). The Metropolis–Hastings acceptance probability is
α(G, G′) = min{ 1, π(G′) q(G′, G) / (π(G) q(G, G′)) }.    (9)
In fact, Equation (9) is not the only choice yielding detailed balance. In particular, in order to reduce the error caused by excessive proposal ratios, we can use the adjustment
α(G, G′) = min{ 1, π(G′) / π(G) } × min{ 1, q(G′, G) / q(G, G′) }.
We take the proposal kernel to be symmetric, that is, q(G, G′) = q(G′, G). Consequently, the acceptance probability depends only on the relative densities, and we only need to compute
α(G, G′) = min{ 1, Λ(G′ : G) }.
We randomly select a pair of vertices u, v ∈ V(G). If (u, v) ∈ E(G), the edge is removed; if (u, v) ∉ E(G), it is added. Let G + (u, v) denote the graph obtained from G by adding the edge (u, v), and similarly G − (u, v) for removal. Let G^(t) denote the state of the chain at time t, and let D ⊆ U denote the set of decomposable undirected graphs with vertex set V(G). Starting from an Erdös–Rényi (ER) random graph as the initial state, a Metropolis–Hastings algorithm for sampling decomposable graphs from a structural Markov graph law 𝒢 is given in Algorithm 1:
Algorithm 1 A Metropolis–Hastings algorithm for sampling decomposable graphs from a structural Markov graph law.
Input: An ER random graph G ∈ U.
Output: A set of decomposable graphs from D.
    Set G^(0) = G
    for t = 0, 1, 2, … do
         if (u, v) ∈ E(G^(t)) and G − (u, v) ∈ D then
              set G^(t+1) = G − (u, v) with probability min{ η_{U_u} η_{U_v} / (η_{U_0} η_{U_j}), 1 }
         else if (u, v) ∉ E(G^(t)) and G + (u, v) ∈ D then
              set G^(t+1) = G + (u, v) with probability min{ ζ_S ζ_{S_0} / (ζ_{S_u} ζ_{S_v}), 1 }
         else
               G ( t + 1 ) = G ( t )
         end if
    end for
    return A set of decomposable graphs.
Based on the results of Section 5.1, this algorithm shows that, when sampling from a posterior graph law as in Proposition 8 or Proposition 9, the acceptance probability can be obtained by evaluating the marginal likelihoods of only the corresponding subsets of V(G) at each step.
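A compact sketch of such a sampler (Python with networkx; the score function h and all helper names are ours; for clarity the sketch recomputes the full clique–separator factorization of each candidate graph rather than applying the local ratios of Corollary 2, and it assumes G is connected as in Section 2):

```python
import math
import random
import networkx as nx

def log_graph_score(G, h):
    """log pi(G) up to a constant for a decomposable G: the sum of h over the
    maximal cliques minus the sum of h over the separators of a junction tree,
    where h(frozenset) is a user-supplied log marginal likelihood of the
    variables in that subset (e.g. from the inverse Wishart or Dirichlet
    margins of Section 4)."""
    cliques = [frozenset(c) for c in nx.find_cliques(G)]
    if len(cliques) == 1:
        return h(cliques[0])
    # Junction tree = maximum-weight spanning tree of the clique graph,
    # weighted by intersection sizes; its edges give the separators.
    cg = nx.Graph()
    cg.add_nodes_from(range(len(cliques)))
    for i in range(len(cliques)):
        for j in range(i + 1, len(cliques)):
            w = len(cliques[i] & cliques[j])
            if w > 0:
                cg.add_edge(i, j, weight=w)
    tree = nx.maximum_spanning_tree(cg)
    return (sum(h(c) for c in cliques)
            - sum(h(cliques[i] & cliques[j]) for i, j in tree.edges()))

def mh_step(G, h, phi=0.5):
    """One Metropolis-Hastings step with a symmetric single-edge proposal."""
    u, v = random.sample(list(G.nodes), 2)
    Gp = G.copy()
    if Gp.has_edge(u, v):
        Gp.remove_edge(u, v)
    else:
        Gp.add_edge(u, v)
    if not nx.is_chordal(Gp):        # reject moves that leave the decomposable set
        return G
    log_ratio = log_graph_score(Gp, h) - log_graph_score(G, h)
    log_ratio += (Gp.number_of_edges() - G.number_of_edges()) * math.log(phi / (1 - phi))
    if random.random() < math.exp(min(0.0, log_ratio)):
        return Gp
    return G
```

With φ = 0.5 the prior term vanishes and the acceptance probability is exactly min{1, Λ(G′ : G)} computed from the clique and separator scores, as in Algorithm 1.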

6. Conclusions

The main contribution of this paper is to extend the structural Markov properties of [10] to non-decomposable undirected graphs. It is shown that an arbitrary undirected graph can be decomposed into its maximal prime subgraphs. Based on the prime decomposition of undirected graphs and conditional independence, the structural Markov properties extend naturally to arbitrary undirected graphs.
We then propose a full Bayesian method for estimating the structure of a graph, which requires observed data from a given distribution. Using our results, we have shown that the posterior updating of the graph law is determined by the margins on the prime components, which greatly simplifies the computation of the posterior graph law.
It should be pointed out that our research focuses only on undirected graphs. Other classes of graphs, such as chain graphs or ancestral graphs, may have further interesting and valuable properties reflecting conditional independence of the graph structure in model determination problems. We will investigate them in detail in future work.

Author Contributions

Methodology, Y.S.; Validation, X.K.; Writing—review & editing, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2022D01C406), the National Natural Science Foundation of China (Grant Nos. 11861064, 11726629, 11726630) and the National Key Laboratory for Applied Statistics of MOE, Northeast Normal University (Grant No. 130028906).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proofs of Some Main Theorems and Propositions

Proof of Theorem 1. 
The equivalence of (i) and (ii) follows from Corollary 2.5 of [20], so it suffices to show that (ii) ⇔ (iii). We first prove (ii) ⇒ (iii). Firstly, we know that
P(G) = {P : I(G) ⊆ I(P)}.
By the meaning of P(G), we define
P(G_D) = {Q : I(G_D) ⊆ I(Q)}.
For P ∈ P(G), let R = P_D ∈ P(G)_D. By CI-collapsibility, we have
I(G_D) = I(G)_D ⊆ I(R).
So R ∈ P(G_D) by (A1) and (A2), from which it follows that P(G)_D ⊆ P(G_D). Hence, the result follows since P(G)_D ⊇ P(G_D). Conversely, under the "Faithfulness Assumption", there is a P ∈ P(G) such that I(P) = I(G), implying I(P_D) = I(P)_D = I(G)_D. By M-collapsibility, we know that P_D ∈ P(G_D), which gives I(G_D) ⊆ I(P_D). Hence, we have I(G_D) ⊆ I(G)_D. The result follows since I(G)_D ⊆ I(G_D) is immediate. □
Proof of Proposition 3. 
By the graph product operation, since S is complete and separates A from B, for any G, G′ ∈ U(A, B, S), (A, B, S) is a prime decomposition of the graph G_{A∪S} ⊗ G′_{B∪S} with vertex set V(G), and likewise of G′_{A∪S} ⊗ G_{B∪S}. This proves (i). As for (ii), if 𝒢 is structural Markov on U, then
π(G) = π(G_{A∪S} ⊗ G_{B∪S}) = π(G_{A∪S} | {G ∈ U(A, B, S)}) π(G_{B∪S} | {G ∈ U(A, B, S)}),
and similarly for π(G′). From (i), we have
π(G_{A∪S} ⊗ G′_{B∪S}) = π(G_{A∪S} | {G ∈ U(A, B, S)}) × π(G′_{B∪S} | {G ∈ U(A, B, S)}),
and likewise for π(G′_{A∪S} ⊗ G_{B∪S}). Combining these identities gives the result. □
Proof of Proposition 4. 
Let H_j = ∪_{i=1}^{j} U_i and S_j = U_j ∩ H_{j−1} for j ∈ {2, 3, …, k}. Since (H_{j−1} ∖ S_j, U_j ∖ S_j, S_j) forms a prime decomposition of G_{H_j} for each j ∈ {2, 3, …, k}, we have
G_{H_j} = G_{U_j} ⊗ G_{H_{j−1}}.
For j = k, since S_k = U_k ∩ H_{k−1} is complete,
G_{S_k} = G_{H_k}(S_k) = G_{U_k}(S_k) ⊗ G_{H_{k−1}}(S_k).
Whence we have
π(G_{H_k}) π(G_{S_k}) = π(G_{U_k} ⊗ G_{H_{k−1}}) π(G_{U_k}(S_k) ⊗ G_{H_{k−1}}(S_k)).
By Proposition 3, we can obtain
π(G_{H_k}) π(G_{S_k}) = π(G_{U_k} ⊗ G_{U_k}(S_k)) π(G_{H_{k−1}} ⊗ G_{H_{k−1}}(S_k)) = π(G_{U_k}) π(G_{H_{k−1}}).
Equation (1) then follows recursively. □
Proof of Proposition 5. 
Suppose that (A, B, S) forms a prime decomposition of G. Since G is collapsible onto A ∪ S, by Theorem 1, θ_{A∪S} only takes values in P(G_{A∪S}); this implies that X_{A∪S} can in fact be obtained from θ_{A∪S}. Then, we obtain
X_{B∪S} ⊥⊥ θ_{A∪S} | θ_{B∪S}.    (A3)
From (A3), we deduce
X_{B∪S} ⊥⊥ θ_{A∪S} | (θ_{B∪S}, θ_S).    (A4)
By the definition of the weak hyper Markov property, θ_{A∪S} ⊥⊥ θ_{B∪S} | θ_S. Combining this with (A4) and the axioms of conditional independence gives
θ_{A∪S} ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_S),    (A5)
which implies that
X_{A∪S} ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_{A∪S}).    (A6)
Together, (A5) and (A6) yield the result. The proof for the strong case follows similar steps. □
Proof of Theorem 2. 
The weak hyper Markov property states that
θ_{A∪S} ⊥⊥ θ_{B∪S} | (θ_S, G, {G ∈ U(A, B, S)}).
Since G ∈ U(A, B, S), we have G = G_{A∪S} ⊗ G_{B∪S}. Thus, from (A7) we deduce
θ_{A∪S} ⊥⊥ θ_{B∪S} | (θ_S, G_{A∪S}, G_{B∪S}, {G ∈ U(A, B, S)}).
From Proposition 6, we obtain
θ_{A∪S} ⊥⊥ G_{B∪S} | (θ_S, G_{A∪S}, {G ∈ U(A, B, S)}),
which gives the result that
θ_{A∪S} ⊥⊥ (θ_{B∪S}, G_{B∪S}) | (θ_S, G_{A∪S}, {G ∈ U(A, B, S)})
by combining with (A8).
Again, by Proposition 6 and the structural Markov property, we have
G_{A∪S} ⊥⊥ (θ_{B∪S}, G_{B∪S}) | {G ∈ U(A, B, S)}.
Then, we have
G_{A∪S} ⊥⊥ (θ_{B∪S}, G_{B∪S}) | (θ_S, {G ∈ U(A, B, S)}).
Thus, our result follows from (A9) and (A10). The proof for the strong case follows similar steps. □
Proof of Proposition 7. 
By Theorem 2, we obtain
(θ_{A∪S}, G_{A∪S}) ⊥⊥ (θ_{B∪S}, G_{B∪S}) | (θ_S, {G ∈ U(A, B, S)}).
From (A11), we deduce
θ_{A∪S} ⊥⊥ G_{B∪S} | (θ_S, {G ∈ U(A, B, S)}).
Whence we have
θ_{A∪S} ⊥⊥ G_{B∪S} | (X_S, θ_S, {G ∈ U(A, B, S)}).
By the properties of conditional independence and Theorem 2,
X_{A∪S} ⊥⊥ G_{B∪S} | (X_S, θ_S, {G ∈ U(A, B, S)}).
Thus, we have
X_{A∪S} ⊥⊥ G_{B∪S} | (X_S, θ_S, θ_{A∪S}, {G ∈ U(A, B, S)}),
which combines with (A12) to give the result. The proof of the strong case is similar, so we omit it for simplicity. □
Proof of Theorem 3. 
Since X is a random sample from θ and L(θ) is hyper Markov with respect to G, by Proposition 5,
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_S, G, {G ∈ U(A, B, S)}).
Since G ∈ U(A, B, S), G = G_{A∪S} ⊗ G_{B∪S}. Then, from (A14) we find that
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}) | (X_S, θ_S, G_{A∪S}, G_{B∪S}, {G ∈ U(A, B, S)}).
From Proposition 7, we obtain
(X_{A∪S}, θ_{A∪S}) ⊥⊥ G_{B∪S} | (X_S, θ_S, G_{A∪S}, {G ∈ U(A, B, S)}),
which combines with (A15) to give the result that
(X_{A∪S}, θ_{A∪S}) ⊥⊥ (X_{B∪S}, θ_{B∪S}, G_{B∪S}) | (X_S, θ_S, G_{A∪S}, {G ∈ U(A, B, S)}).
Additionally, from the structural Markov property and Proposition 7, we have
G_{A∪S} ⊥⊥ (X_{B∪S}, θ_{B∪S}, G_{B∪S}) | (X_S, θ_S, {G ∈ U(A, B, S)}).
From (A17), we deduce
G_{A∪S} ⊥⊥ (X_{B∪S}, θ_{B∪S}, G_{B∪S}) | (X_{A∪S}, θ_{A∪S}, X_S, θ_S, {G ∈ U(A, B, S)}).
So, the result follows from (A16) and (A18). A similar proof applies for the strong hyper Markov case. □
Proof of Proposition 10. 
We first give the proof of (i). Suppose that G, G′ ∈ U. If 𝒢 is structural Markov on U, then by Proposition 4 we have
Λ(G′ : G) = π(G′) / π(G) = [∏_{j=1}^{k} π(G′_{U_j}) / ∏_{j=2}^{k} π(G′_{S_j})] × [∏_{j=2}^{k} π(G_{S_j}) / ∏_{j=1}^{k} π(G_{U_j})] = η′_{U_j} / η_{U_j}, j ∈ {1, 2, …, k},    (A19)
where the last equality holds because G and G′ share all maximal prime subgraphs and clique separators except for U_j. The proof of (ii) is as follows. It is clear that W is prime in G′. Consequently, G′ has a perfect sequence (U_1, …, U_{j−1}, W, U_{j+2}, …, U_k) of maximal prime subgraphs, and Equation (7) then follows by using (i). □
Proof of Proposition 11. 
The proof of (i) follows similar steps to that of Proposition 10. To prove (ii), let 𝒯 be the junction tree whose vertices are the maximal prime subgraphs of G; for the construction of 𝒯, see [21]. Since u and v lie in two different maximal prime subgraphs of G, connecting U_i and U_j in 𝒯 creates a unique cycle. Without loss of generality, denote the vertices on this cycle by (U_i = U_{h_1}, U_{h_2}, …, U_{h_m} = U_j), where U_{h_t} and U_{h_{t+1}} are connected by an edge in 𝒯. Then, it is easy to see that 𝒰 ∖ {U_{h_1}, U_{h_2}, …, U_{h_m}} together with T (as defined in the statement of the proposition) is the set of all maximal prime subgraphs of G′. So, by applying (i), Equation (8) follows. □

References

  1. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2009.
  2. Lauritzen, S.L. Graphical Models; Oxford University Press: New York, NY, USA, 1996.
  3. Richardson, T. A factorization criterion for acyclic directed mixed graphs. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009.
  4. Richardson, T.; Spirtes, P. Ancestral graph Markov models. Ann. Stat. 2002, 30, 962–1030.
  5. Iqbal, K.; Buijsse, B.; Wirth, J. Gaussian Graphical Models Identify Networks of Dietary Intake in a German Adult Population. J. Nutr. 2016, 146, 646–652.
  6. Larranaga, P.; Moral, S. Probabilistic graphical models in artificial intelligence. Appl. Soft Comput. 2011, 11, 1511–1528.
  7. Verzilli, C.J.; Stallard, N.; Whittaker, J.C. Bayesian graphical models for genomewide association studies. Am. J. Hum. Genet. 2006, 79, 100–112.
  8. Giudici, P.; Green, P.J. Decomposable graphical Gaussian model determination. Biometrika 1999, 86, 785–801.
  9. Madigan, D.; Raftery, A.E. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Amer. Stat. Assoc. 1994, 89, 1535–1546.
  10. Byrne, S.; Dawid, A.P. Structural Markov graph laws for Bayesian model uncertainty. Ann. Stat. 2015, 43, 1647–1681.
  11. Li, B.C. Support condition for equivalent characterization of graph laws. Sci. Sin. Math. 2022, 52, 467–474.
  12. Dawid, A.P.; Lauritzen, S.L. Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Stat. 1993, 21, 1272–1317.
  13. Green, P.J.; Thomas, A. A structural Markov property for decomposable graph laws that allows control of clique intersections. Biometrika 2018, 105, 19–29.
  14. Leimer, H.G. Optimal decomposition by clique separators. Discret. Math. 1993, 113, 99–123.
  15. Dawid, A.P. Conditional independence in statistical theory. J. R. Stat. Soc. B 1979, 41, 1–15.
  16. Dawid, A.P. Conditional independence for statistical operations. Ann. Stat. 1980, 8, 598–617.
  17. Meek, C. Strong Completeness and Faithfulness in Bayesian Networks; Morgan Kaufmann: San Francisco, CA, USA, 1995.
  18. Hoff, P.D. Extending the rank likelihood for semiparametric copula estimation. Ann. Appl. Stat. 2007, 1, 265–283.
  19. Frydenberg, M.; Lauritzen, S.L. Decomposition of maximum likelihood in mixed graphical interaction models. Biometrika 1989, 76, 539–555.
  20. Asmussen, S.; Edwards, D. Collapsibility and response variables in contingency tables. Biometrika 1983, 70, 567–578.
  21. Wang, X.F.; Guo, J.H. Junction trees of general graphs. Front. Math. China 2008, 3, 399–413.
Figure 1. G_1 is a prime graph and G_2 is a reducible graph.
Figure 2. A prime decomposition for an undirected graph G.
Figure 3. A representation of the structural Markov property for non-decomposable undirected graphs: A ∩ B is complete and separates A from B.
Figure 4. A ∩ B separates A from B while A ∩ B is incomplete.
Figure 5. The left panel shows the estimated posterior probabilities of the size of the visited graphs; the right panel shows the estimated posterior probabilities of all visited graphs.
Figure 6. The inferred graph with the highest posterior probability.
Figure 7. G′ is obtained from G by removing the edge (u, v).
Figure 8. G′ is obtained from G by adding the edge (u, v).