# Calculating the Prior Probability Distribution for a Causal Network Using Maximum Entropy: Alternative Approaches

## Abstract


## 1. Introduction

## 2. Method 1: Sequential Maximum Entropy (SME)

“We thus present a new kind of maximum entropy model, which is computed sequentially. We then show that for all general Bayesian networks, the sequential maximum entropy model coincides with the unique joint distribution.”

#### 2.1. Definition of d-Separation

Suppose that (V, E) is a directed acyclic graph in which E consists of directed edges <${v}_{i}$,${v}_{j}$>, where ${v}_{i}$, ${v}_{j}$ belong to V. Suppose also that W is a subset of V and that u, v are vertices outside W. A path, p, between u and v is blocked by W if one of the following is true:

1. **d-sep1**: There is a vertex w, in W, on p such that the edges which determine that w is on p meet tail-to-tail at w, i.e., there are edges <w,u> and <w,v>, where u and v are adjacent to w on the path p.
2. **d-sep2**: There is a vertex w, in W, on p such that the edges which determine that w is on p meet head-to-tail at w, i.e., there are edges <u,w> and <w,v>, where u and v are adjacent to w on the path p.
3. **d-sep3**: There is a vertex x on p, for which neither x nor any of its descendants are in W, such that the edges which determine that x is on p meet head-to-head at x, i.e., there are edges <u,x> and <v,x>, where u and v are adjacent to x on the path p, but u and v are not members of W. Note also that W can be empty.
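The three blocking conditions translate directly into code. The sketch below assumes a DAG stored as a set of directed `(parent, child)` pairs and a path given as a vertex sequence; the representation and function names are illustrative, not from the paper:

```python
# Sketch of the three blocking conditions for a single path in a DAG.
# Edges are directed (parent, child) pairs; a path is a vertex sequence.

def descendants(edges, x):
    """All vertices reachable from x along directed edges."""
    found, frontier = set(), {x}
    while frontier:
        nxt = {c for (p, c) in edges if p in frontier} - found
        found |= nxt
        frontier = nxt
    return found

def path_blocked(edges, path, W):
    """True if the path between path[0] and path[-1] is blocked by the set W."""
    for i in range(1, len(path) - 1):
        u, w, v = path[i - 1], path[i], path[i + 1]
        tail_tail = (w, u) in edges and (w, v) in edges          # d-sep1 pattern
        chain = ((u, w) in edges and (w, v) in edges) or \
                ((v, w) in edges and (w, u) in edges)            # d-sep2 pattern
        head_head = (u, w) in edges and (v, w) in edges          # d-sep3 pattern
        if (tail_tail or chain) and w in W:
            return True                                          # d-sep1, d-sep2
        if head_head and w not in W and not (descendants(edges, w) & W):
            return True                                          # d-sep3
    return False
```

For instance, the collider path a → c ← b is blocked by the empty set but unblocked once c is conditioned on, matching d-sep3.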

#### 2.2. The SME Technique

## 3. Method 2: Development of the Method of Tribus

#### 3.1. Analysis Using Lagrange Multipliers

Suppose the system has n binary variables and hence ${2}^{n}$ states, ${s}_{0},\dots,{s}_{{2}^{n}-1}$, with probabilities ${p}_{0},\dots,{p}_{{2}^{n}-1}$. It is required that a stationary point for the entropy, H, be discovered, whilst conforming to the system constraints, where:

$H = -{\sum }_{i=0}^{{2}^{n}-1}{p}_{i}\mathrm{ln}\phantom{\rule{0.2em}{0ex}}{p}_{i}$

The normalisation constraint, which involves every state probability including ${p}_{0}$, applies in all the systems to be considered in this paper, thus:

${\sum }_{i=0}^{{2}^{n}-1}{p}_{i}=1$

Each given probability contributes a further linear constraint on the state probabilities, for example one of the form $0.3{p}_{0}=0.7{p}_{1}$.
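For a single linear constraint the stationary point can be found numerically. The sketch below uses the standard exponential form ${p}_{i} = {e}^{-{\lambda }_{0}-{\lambda }_{1}{f}_{i}}$ that the Lagrange analysis yields, with a bisection search for the multiplier; the four-state system and the constraint value 0.3 are invented for illustration:

```python
import math

# Maximise H = -sum p_i ln p_i subject to normalisation and one linear
# constraint sum_i f_i * p_i = q, via p_i = exp(-lam0 - lam1 * f_i).
# The 4-state system and q = 0.3 are assumed, purely for illustration.

f = [1, 1, 0, 0]   # f_i = 1 for the two states in the constraint
q = 0.3            # target value of the constrained sum

def probs(lam1):
    w = [math.exp(-lam1 * fi) for fi in f]
    z = sum(w)                       # exp(lam0) is absorbed by normalisation
    return [wi / z for wi in w]

lo, hi = -50.0, 50.0                 # the constrained sum decreases in lam1,
for _ in range(200):                 # so bisect on the residual
    mid = (lo + hi) / 2
    if sum(p * fi for p, fi in zip(probs(mid), f)) > q:
        lo = mid
    else:
        hi = mid

p = probs((lo + hi) / 2)
# Maximum entropy spreads the probability uniformly within each group:
# the constrained states get 0.15 each, the others 0.35 each.
```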

#### 3.2. Updating the Lagrange Multiplier for a Linear Constraint

#### 3.3. Updating the Lagrange Multiplier for an Independence Constraint

#### 3.4. Algorithm and Data Structures

- Using a standard procedure to normalise the state probabilities.
- Updating the estimate of state probabilities using the constraints.
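The two steps above suggest an iterative scheme of the proportional-fitting kind. A minimal sketch follows, in which the constraint representation (an index set plus a target probability) and the fixed iteration count are assumptions:

```python
# Hedged sketch of the Section 3.4 iteration: rescale the state probabilities
# to satisfy each linear constraint in turn, then renormalise, and repeat.

def fit(n_states, constraints, iters=100):
    p = [1.0 / n_states] * n_states              # start from uniform
    for _ in range(iters):
        for idx, target in constraints:          # update using each constraint
            mass = sum(p[i] for i in idx)
            for i in idx:
                p[i] *= target / mass            # scale constrained states
            rest = [i for i in range(n_states) if i not in idx]
            rmass = sum(p[i] for i in rest)
            for i in rest:
                p[i] *= (1 - target) / rmass     # scale the complement
        s = sum(p)                               # standard normalisation step
        p = [x / s for x in p]
    return p

# Two marginal constraints on 4 states: P(a) = 0.3 (states 0,1), P(b) = 0.6
# (states 0,2); maximum entropy then makes a and b independent.
dist = fit(4, [({0, 1}, 0.3), ({0, 2}, 0.6)])
```

With only marginal constraints the iteration converges to the independent product distribution, which is the maximum entropy solution in that case.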

## 4. Method 3: A Variation on Method 2

## 5. µ-Notation

#### 5.1. Example 1

#### 5.1.1. Linear Constraints

#### 5.1.2. Independence Constraints

#### 5.2. Using Method 3 with µ-Notation

#### 5.2.1. Grg-Elimination

#### 5.2.2. Analysis Phase 1: Determination of the Independence Multipliers

#### 5.2.3. Analysis Phase 2: Proof that c cid d | b

#### 5.2.4. Tokenised Algebra

^{_}{02 + 23}.{10g + 31}

#### 5.2.5. Analysis Phase 3: Proof that b cid c | a

## 6. The Embedded Lozenge

#### 6.1. A More General Example, Emb-LozN

1. It is directly linked to both ancestors and descendants from among the new elements.
2. There are paths which bypass such a vertex.

#### 6.2. Further Testing

## 7. Discussion

#### 7.1. Properties

Method 1 (SME):

- Operates on complete networks,
- Requires d-seps 1 & 2 to propagate probabilities,
- Suffers from rapid exponential complexity.

Method 2:

- Operates on complete and incomplete networks,
- Requires both triangulation and moral independencies,
- Can be used to validate CN methodologies,
- Can be used to detect the stationary points for an incomplete network by varying the initial conditions,
- Potential for application beyond conventional networks.

Method 3:

- Operates on complete networks only,
- Requires moral independencies alone,
- Potential for wider application,
- Further generalisation of proof needed.

#### 7.2. Further Testing

## 8. Conclusions

## Acknowledgements

## References

- Griffeath, D.S. Computer solution of the discrete maximum entropy problem. *Technometrics* **1972**, 14, 891–897.
- Rhodes, P.C.; Garside, G.R. Use of maximum entropy method as a methodology for probabilistic reasoning. *Knowl. Based Syst.* **1995**, 8, 249–258.
- Neapolitan, R.E. *Probabilistic Reasoning in Expert Systems: Theory and Algorithms*; John Wiley & Sons: Hoboken, NJ, USA, 1990.
- Andersen, S.K.; Olesen, K.G.; Jensen, F.V.; Jensen, F. HUGIN-A Shell for Building Bayesian Belief Universes for Expert Systems. *Proc. Int. Jt. Conf. Artif. Intell.* **1989**, 11, 1080–1085.
- Lauritzen, S.L.; Spiegelhalter, D.J. Local computations with probabilities on graphical structures and their applications to expert systems. *J. Roy. Stat. Soc. B* **1988**, 50, 157–224.
- Jaynes, E.T. Where do we stand on maximum entropy? In *The Maximum Entropy Formalism*; The MIT Press: Cambridge, MA, USA, 1979.
- Paris, J.B.; Vencovská, A. A Note on the Inevitability of Maximum Entropy. *Int. J. Approximate Reasoning* **1990**, 4, 183–223.
- Shore, J.E.; Johnson, R.W. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. *IEEE Trans. Inform. Theory* **1980**, IT-26, 26–37.
- Williamson, J. Foundations for Bayesian networks. In *Foundations of Bayesianism* (Applied Logic Series); Corfield, D.; Williamson, J., Eds.; Kluwer: Dordrecht, The Netherlands, 2001; pp. 75–115.
- Williamson, J. Maximising entropy efficiently. *Linköping Electron. Artic. Comput. Inform. Sci.* **2002**, 7, 1–32.
- Markham, M.J. Probabilistic independence and its impact on the use of maximum entropy in causal networks. Ph.D. Thesis, School of Informatics, Department of Computing, University of Bradford, Bradford, UK, 2005.
- Paris, J.B. On filling-in missing information in causal networks. *Int. J. Uncertainty Fuzziness Knowl. Based Syst.* **2005**, 13, 263–280.
- De Campos, L.; Moral, S. Independence concepts for convex sets of probabilities. In Proceedings of the Eleventh Annual Conference on Uncertainty in Artificial Intelligence, Montreal, Canada, 18–20 August 1995; Morgan Kaufmann: San Francisco, CA, USA, 1995; pp. 108–115.
- Walley, P. *Statistical Reasoning with Imprecise Probabilities*; Chapman and Hall/CRC: Boca Raton, FL, USA, 1991; pp. 443–444.
- Lukasiewicz, T. Credal Networks under Maximum Entropy. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, Stanford, CA, USA, July 2000; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2000; pp. 63–70.
- Tribus, M. *Rational Descriptions, Decisions and Designs*; Pergamon Press: Elmsford, NY, USA, 1969; pp. 120–123.
- Markham, M.J. Independence in complete and incomplete causal networks under maximum entropy. *Int. J. Uncertainty Fuzziness Knowl. Based Syst.* **2008**, 16, 699–713.
- Geiger, D.; Verma, T.; Pearl, J. d-Separation: From theorems to algorithms. In Proceedings of the UAI'89 Fifth Annual Conference on Uncertainty in Artificial Intelligence; North-Holland Publishing Co.: Amsterdam, The Netherlands, 1990; pp. 139–148.
- Verma, T.S.; Pearl, J. Causal networks: Semantics and expressiveness. In Proceedings of the 4th AAAI Workshop on Uncertainty in AI; North-Holland Publishing Co.: Amsterdam, The Netherlands, 1988; pp. 352–359.
- Geiger, D.; Pearl, J. On the logic of causal models. In Proceedings of the 4th AAAI Workshop on Uncertainty in AI; North-Holland Publishing Co.: Amsterdam, The Netherlands, 1988; pp. 136–147.
- Holmes, D.E. Independence relationships implied by d-separation in the Bayesian model of a causal tree are preserved by the maximum entropy model. In Proceedings of the 19th International Workshop on Bayesian Inference and Maximum Entropy Methods, Boise, ID, USA, 2–6 August 1999; pp. 296–307.
- Grove, A.J.; Halpern, J.Y.; Koller, D. Random worlds and maximum entropy. *J. Artif. Intell. Res.* **1994**, 2, 33–88.
- Hunter, D. Uncertain reasoning using maximum entropy inference. In Proceedings of Uncertainty in Artificial Intelligence; Elsevier Science: New York, NY, USA, 1986; Volume 4, pp. 203–209.
- Shannon, C.E. A Mathematical Theory of Communication. *Bell Syst. Tech. J.* **1948**, 27, 379–423.
- Garside, G.R.; Rhodes, P.C. Computing marginal probabilities in causal multiway trees given incomplete information. *Knowl. Based Syst.* **1996**, 9, 315–327.

| Configuration | Steps |
|---|---|
| (diagram) | Starting with variable a, P(a) is given. |
| (diagram) | Going on to b and adding b ind a (given by local d-sep2 in SME), P(b) is given, so P(ab) = P(a).P(b). |
| (diagram) | The independence of a and b determines the distribution of ab and this, together with P(c \| ab), allows P(abc) to be calculated, viz.: writing P(c \| ab) = k_{c \| ab} gives P(abc) = k_{c \| ab}.P(ab). |
| (diagram) | Local d-sep2 gives d cid c \| b, which implies that d cid ac \| b; therefore P(abcd).P(b) = P(abc).P(bd). |
| (diagram) | Finally, e cid ab \| cd (d-sep2) plus P(e \| cd) provides P(abcde). The probability distribution for the variables of Vee-Loz5 has now been found in 5 steps, and the moral indeps b ind a and d cid c \| b have been generated. |
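The five steps amount to the factorisation P(abcde) = P(a).P(b).P(c | ab).P(d | b).P(e | cd). A sketch with invented CPT values (the steps above supply none) builds the joint and checks the step-4 identity numerically:

```python
from itertools import product

# Joint distribution for Vee-Loz5 via the factorisation derived above.
# All numeric CPT values here are invented purely for illustration.

Pa = {0: 0.7, 1: 0.3}
Pb = {0: 0.4, 1: 0.6}
Pc = {(a, b): {0: 0.8 - 0.1 * (a + b), 1: 0.2 + 0.1 * (a + b)}
      for a in (0, 1) for b in (0, 1)}                            # P(c | ab)
Pd = {b: {0: 0.5 + 0.2 * b, 1: 0.5 - 0.2 * b} for b in (0, 1)}    # P(d | b)
Pe = {(c, d): {0: 0.9 - 0.3 * c - 0.2 * d, 1: 0.1 + 0.3 * c + 0.2 * d}
      for c in (0, 1) for d in (0, 1)}                            # P(e | cd)

joint = {(a, b, c, d, e):
         Pa[a] * Pb[b] * Pc[(a, b)][c] * Pd[b][d] * Pe[(c, d)][e]
         for a, b, c, d, e in product((0, 1), repeat=5)}

def marg(keep):
    """Marginal over the named variables, e.g. marg('bd')."""
    pos = {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4}
    out = {}
    for state, pr in joint.items():
        key = tuple(state[pos[v]] for v in keep)
        out[key] = out.get(key, 0.0) + pr
    return out

# Step-4 identity P(abcd).P(b) = P(abc).P(bd), checked at one state:
lhs = marg('abcd')[(1, 1, 0, 1)] * marg('b')[(1,)]
rhs = marg('abc')[(1, 1, 0)] * marg('bd')[(1, 1)]
```

Because the factorisation encodes d cid ac | b, the two products agree at every state, whatever CPT values are chosen.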

| Method | Average Number of Explicit Independence Constraints Required |
|---|---|
| Method 1 & CNs | nil |
| Method 2 | 8.8 |
| Method 3 | 5.4 |

© 2011 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

Markham, M.J. Calculating the Prior Probability Distribution for a Causal Network Using Maximum Entropy: Alternative Approaches. *Entropy* **2011**, *13*, 1281-1304. https://doi.org/10.3390/e13071281