Article

Dynamic Programming BN Structure Learning Algorithm Integrating Double Constraints under Small Sample Condition

1 School of Mechatronic Engineering, Xi’an Technological University, Xi’an 710021, China
2 School of Electronic Information Engineering, Xi’an Technological University, Xi’an 710021, China
3 General Office, Northwest Institute of Mechanical and Electrical Engineering, Xianyang 712099, China
* Author to whom correspondence should be addressed.
Entropy 2022, 24(10), 1354; https://doi.org/10.3390/e24101354
Submission received: 10 August 2022 / Revised: 9 September 2022 / Accepted: 21 September 2022 / Published: 24 September 2022

Abstract: The Bayesian Network (BN) structure learning algorithm based on dynamic programming can obtain global optimal solutions. However, when the sample does not fully contain the information of the real structure, especially when the sample size is small, the obtained structure is inaccurate. Therefore, this paper analyzes the planning process of dynamic programming, restricts it with edge and path constraints, and proposes a dynamic programming BN structure learning algorithm with double constraints for small sample conditions. The algorithm uses the double constraints to limit the planning process of dynamic programming and to reduce the planning space. It then uses the double constraints to limit the selection of the optimal parent node, so that the optimal structure conforms to the prior knowledge. Finally, the method integrating prior knowledge and the method without prior knowledge are simulated and compared. The simulation results verify the effectiveness of the proposed method and prove that integrating prior knowledge can significantly improve the efficiency and accuracy of BN structure learning.

1. Introduction

Many real-world problems involve uncertainty, and much of today's artificial intelligence deals with uncertain problems such as image recognition, speech recognition, and intelligent decision-making. The Bayesian Network (BN) [1], as a type of graphical model, has become a powerful tool for treating uncertainty because of its rigorous mathematical foundation, its intuitive and understandable graphical topology, and its natural expression of real-world problems. In recent years, Bayesian Networks have been successfully applied in various fields such as medical diagnosis [2,3], fault diagnosis [4], decision analysis [5], gene analysis [6,7], target identification [8], threat assessment [9,10], and system reliability analysis [11,12].
However, before a BN can be used to solve problems in engineering practice, its structure needs to be constructed first. Compared with approximate BN structure search based on constraints or heuristics, exact BN structure learning has recently become a popular topic in academic research. Exact methods include the branch and bound method [13], integer programming [14,15], and Dynamic Programming (DP) [16,17]. Although the traditional DP-based BN structure learning algorithm can obtain the global optimal solution, the acquired structure is inaccurate when the sample does not completely contain the information of the real structure, especially when the sample size is small; complexity is also the bottleneck of current DP methods. In reality, however, a lot of deterministic prior knowledge is available for BN modeling. In [18], a prior distribution over BN structures is learned and a Bayesian network model averaging method is proposed. In [19], BN structure learning is transformed into a constrained objective function extremum problem using the node order. Campos [20] considers various deterministic constraints, analyzes the interactions between them, and implements them in the hill-climbing method and the PC algorithm. Nicholson [21] incorporates expert-elicited structural information into the CaMML causal discovery program; the results show that CaMML has excellent properties when prior knowledge is available. Castelo [22] conducted BN structure learning by specifying prior knowledge. Borboudakis [23] took the probabilities of edge and path existence as prior knowledge through rigorous mathematical derivation and performed structure learning with the BD score and the hill-climbing method. Node order prior knowledge is integrated into the dynamic programming process in [24]. Path constraints are used to learn the BN structure with integer programming in [25]. Li proposed a constraint-based hill-climbing approach to incorporate all these constraints [26]. Cussens [27] treated integer linear programming (ILP) as constrained optimization and handled all constraints as cutting planes.
As can be seen, prior knowledge can improve not only the learning accuracy but also the learning efficiency. However, the use of edge and path prior knowledge within DP structure learning has not yet been studied. Therefore, this paper proposes a DP-based BN structure learning algorithm that effectively combines expert prior knowledge with sample information. The proposed algorithm incorporates edge constraints and path constraints to limit the search process of DP and to delete parts of the planning space, so that every search step meets the prior-knowledge requirements, thus reducing the complexity of the algorithm.
The rest of the paper is organized as follows. Section 2 introduces the theoretical basis of a Bayesian Network. Section 3 introduces in detail the dynamic programming BN structure learning algorithm integrating prior knowledge. In Section 4, the algorithm proposed in this paper is simulated and analyzed in terms of effectiveness and complexity. Section 5 is the conclusion.

2. Theoretical Basis of Bayesian Network

Prior to the general definition of Bayesian networks, several basic concepts in graph theory need to be introduced.
X and Y are two nodes in the directed graph G. X → Y means there is an edge from X to Y, where X is called the parent node of Y and Y the child node of X. For any node X, Ch(X) denotes the set of all its child nodes and Pa(X) the set of all its parent nodes. If X has no parent node, then X is a root node; the set of all root nodes of G is Root(G). If X has no child node, then X is a leaf node; the set of all leaf nodes of G is Leaf(G). If there are k nodes X1, …, Xk in G and for each i = 1, …, k−1 there is Xi → Xi+1, then there is a directed path from X1 to Xk, written X1 ⇒ Xk. For any X ⇒ Y, X is called an ancestor node of Y and Y a descendant node of X. Likewise, An(X) denotes the set of all ancestor nodes of X and De(X) the set of all descendant nodes of X. If some node of G is its own ancestor, the graph contains a directed cycle. A directed graph without any directed cycle is a Directed Acyclic Graph (DAG).
A Bayesian Network consists of a DAG and a Conditional Probability Table (CPT), and its complete definition is as follows:
Definition 1 [28].
A Bayesian Network is a pair ⟨G, Θ⟩, in which G = (V, E) represents the structure of the Bayesian network, a DAG where V = {X1, X2, …, Xn} is a set of random variables and E is a set of directed edges indicating the causal associations between variables. Θ = {P(Xi | Pa(Xi)) : Xi ∈ V} is the Conditional Probability Table (CPT).
Definition 2 [28].
(node order) A node order o is a linear arrangement of some variables in which Xi ≺ Xj means Xi is in front of Xj. The node order o is a node order of G if and only if, for arbitrary Xi, Xj ∈ vari(o), Xi ≺ Xj in o implies that Xj cannot be an ancestor node of Xi.
Theorem 1 [28].
With the BIC score as the criterion, in an optimal Bayesian network any node has at most log2(N/log N) parent nodes, where N is the number of samples. This article refers to log2(N/log N) as nmp (max number of parents).
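As a quick illustration, the bound of Theorem 1 can be evaluated as follows; this is only a sketch, and taking the floor of the bound and using the natural logarithm for the inner log are assumptions, since Theorem 1 does not state them explicitly.

```python
import math

def max_parents(num_samples: int) -> int:
    """Parent-count bound of Theorem 1 for a BIC-optimal network (sketch)."""
    return math.floor(math.log2(num_samples / math.log(num_samples)))

# e.g. for the 20-sample experiments of Section 4.1:
print(max_parents(20))  # -> 2
```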

3. BN Structure Learning Algorithm for Dynamic Programming Integrating Prior-Knowledge

3.1. Dynamic Programming Algorithm

BN structure learning based on dynamic programming solves a mathematical programming problem exactly; its computational complexity is exponential, so it is limited by the number of nodes. The state transition equations of dynamic programming are:
$$\max Score(V) = \max_{X \in V} \left[ \max Score(V \setminus \{X\}) + \max Score(X, V \setminus \{X\}) \right], \qquad (1)$$
$$bestscore(X, V \setminus \{X\}) = \max Score(X, V \setminus \{X\}) = \max_{Pa(X) \subseteq V \setminus \{X\}} Score(X, Pa(X)), \qquad (2)$$
where V is the set of variables, X is a leaf node in the optimal structure, and Score is a decomposable scoring function [29]. Equations (1) and (2) connect the whole structure with its substructures: the optimal network over the remaining nodes V∖{X} is constructed recursively by the above process until only one variable remains. All the node subsets form a Hasse Diagram, which shows the whole process of dynamic programming. When the DP algorithm proceeds from top to bottom, the root node is determined first and leaf nodes are added step by step until the node set is the universal set of variables; when it proceeds from bottom to top, leaf nodes are determined first and root nodes are determined step by step until the remaining node set is empty. Because a Hasse Diagram contains the node order information of the network, it is also called the Order Graph. There is another similar graph called the parents graph [17]. Figure 1 shows a node order graph with n = 4 nodes and the parent node graph of node X1.
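For readers who prefer code, the following is a minimal, unconstrained sketch of the recursion in Equations (1) and (2), assuming a decomposable local_score(X, parents) function is supplied by the caller; it is an illustration of the state transition, not the memory-efficient implementation of [17,29].

```python
from functools import lru_cache
from itertools import combinations

def learn_optimal_bn(variables, local_score, max_parents):
    """best_subnet(W) returns (score, network) of an optimal DAG over W,
    where `network` is a tuple of (variable, frozenset_of_parents) pairs."""

    def best_parents(x, candidates):
        # Equation (2): best parent set of x chosen from `candidates`.
        best = (local_score(x, frozenset()), frozenset())
        for k in range(1, min(max_parents, len(candidates)) + 1):
            for pa in combinations(candidates, k):
                s = local_score(x, frozenset(pa))
                if s > best[0]:
                    best = (s, frozenset(pa))
        return best

    @lru_cache(maxsize=None)
    def best_subnet(subset):
        # Equation (1): try every variable of `subset` as the leaf node.
        if not subset:
            return 0.0, ()
        best = None
        for x in subset:
            rest = subset - {x}
            sub_score, sub_net = best_subnet(rest)
            pa_score, pa = best_parents(x, tuple(sorted(rest)))
            if best is None or sub_score + pa_score > best[0]:
                best = (sub_score + pa_score, sub_net + ((x, pa),))
        return best

    return best_subnet(frozenset(variables))
```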

3.2. Expression of Constraints

In this paper, deterministic prior knowledge is directly transformed into constraints; prior knowledge and prior constraints are therefore equivalent concepts, and for convenience the term constraints is used hereafter. C denotes a set of edge or path constraints, expressed as follows:
  • X → Y means X is the parent node of Y; X ↛ Y means X cannot be the parent node of Y. edge(X, Y) denotes any edge constraint between X and Y;
  • X ⇒ Y means X is an ancestor node of Y; X ⇏ Y means X cannot be an ancestor node of Y. In both cases, Y is called the tail node and X the head node. path(X, Y) denotes any path constraint between X and Y;
  • A node order o and a constraint set C are consistent if and only if, for arbitrary X1, X2 ∈ vari(o) ∩ vari(C), X1 ≺ X2 in o implies that X2 cannot be an ancestor node of X1 in C.
This paper uses constraints in the following two steps: (1) to limit the construction of the node order graph, deleting some illegal nodes from it, which reduces the complexity, especially the space complexity; (2) to construct a sparse parent node graph and a query algorithm, so that the result of every optimal-parent query satisfies the constraints. Theorem 2 provides the basis for enforcing the constraints.
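As an illustration only, a constraint set C of this kind could be represented in code as follows; the class and field names are assumptions, not the paper's notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EdgeConstraint:
    parent: str
    child: str
    required: bool   # True: the edge parent -> child must exist; False: it is forbidden

@dataclass(frozen=True)
class PathConstraint:
    head: str        # head node (the ancestor)
    tail: str        # tail node (the descendant)
    required: bool   # True: a directed path head => tail must exist; False: it is forbidden

# a small example constraint set
C = {EdgeConstraint("X1", "X2", True), PathConstraint("X1", "X6", True)}
```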
Theorem 2.
Given a set of constraints on edges or paths with edge(X1, Y1) ∈ C and path(X2, Y2) ∈ C, for any optimal substructure G in the dynamic programming process, if X1, Y1 ∈ vari(G), then there must be edge(X1, Y1) in G. Similarly, if X2, Y2 ∈ vari(G), then there must be path(X2, Y2) in G.
Proof of Theorem 2.
If edge(X1, Y1) ∈ C and X1, Y1 ∈ vari(G), and there is no edge(X1, Y1) in G, then, owing to the non-aftereffect property of the dynamic programming method, none of the structures extended from G will satisfy the constraint set C; therefore there must be edge(X1, Y1) in G. The case path(X2, Y2) ∈ C can be proven in the same way. The proof is completed. □

3.3. Integrating Constraints of Edge

3.3.1. Pruning Node Order Graph

Given an edge constraint X1 → X2, the node {X2} in the node order graph needs to be deleted because it violates the constraint: every structure produced from the optimal substructure of this node satisfies the node order X2 ≺ X1, which is obviously inconsistent with the constraint X1 → X2, so it is unnecessary to compute the node {X2} when constructing the node order graph. As this example shows, if a node in the node order graph violates a constraint, it must be deleted. The theorem is given as follows:
Theorem 3.
Given a constraint set C, a node U in the node order graph, and its set of node orders o(U), U needs to be deleted from the node order graph if and only if there exist X1, X2 ∈ vari(C) such that X1 → X2 can be inferred from C, X2 ∈ U, and X1 ∉ U.
Proof of Theorem 3.
In any subsequent node of U, X1 is added as a leaf node, so obviously for any o ∈ o(U) there is X2 ≺ X1, which is inconsistent with X1 → X2. Conversely, suppose that for every relation X1 → X2 in C there is no pair with X2 ∈ U and X1 ∉ U; then there is an o1 with vari(o1) = U that is consistent with C and an o2 with vari(o2) = V∖U that is consistent with C. Moreover, for any X1 → X2, it is not the case that X2 ∈ vari(o1) and X1 ∈ vari(o2). So the order o composed of o1 followed by o2 is consistent with C. The proof is completed. □
Theorem 4.
Given an edge constraint set C and the variable set V of the problem domain, when traversing any node U during the construction of the node order graph, let Gs = sub(GC, vari(C)∖U), i.e., the subgraph of the edge constraint graph GC induced by the constrained variables not yet in U. If every newly generated node U ∪ {X} of U must satisfy X ∈ (V∖vari(C)) ∪ root(Gs), then all the nodes constructed in the final node order graph satisfy the constraint set C and all the deleted nodes violate C.
Proof of Theorem 4.
The node order graph is constructed starting from the empty set, which satisfies the constraint set C; at this point, no node in the node order graph has been deleted. When traversing any node U, suppose all the existing nodes in the node order graph satisfy C and all the deleted nodes violate C. Then it is necessary to prove that the new nodes constructed in the node order graph satisfy the constraints and that the deleted nodes violate them.
The newly constructed nodes satisfy the constraints: any newly constructed node is U ∪ {X}, in which X ∈ (V∖vari(C)) ∪ root(Gs) and Gs = sub(GC, vari(C)∖U). If X ∈ V∖vari(C), then because U satisfies the constraints and the new variable X has nothing to do with the constraint set C, obviously U ∪ {X} also satisfies the constraints. If X ∈ root(sub(GC, vari(C)∖U)), then the newly added variable X is one of the remaining variables of vari(C) and is a root node in the subgraph of the edge constraint graph over those remaining variables, so there is no ancestor node Y ∈ vari(C)∖(U ∪ {X}) of X with Y → X in C. Therefore, according to Theorem 3, any newly added node satisfies the constraints.
No deleted node satisfies the constraints. Suppose H is deleted: H can be re-formed by combining H∖{X} with X, where X is any variable in H. If there is an X for which H∖{X} is a node that satisfies the constraints, then X is a non-root node in the corresponding subgraph, but in this subgraph there must be a corresponding root node Y with Y → X and Y ∉ H, so according to Theorem 3, H does not satisfy the constraints. When there is no X that makes H∖{X} satisfy the constraints, suppose H satisfies the constraints; then, according to Theorem 3, there is a Y in H with X → Y, and a node order of the constraint set C, i.e., X ≺ Y ≺ ⋯ ≺ Z, can be constructed from H ∩ vari(C). Take the last variable Z (if no such Z exists, take Z = Y); then H∖{Z} satisfies the constraints, which contradicts the condition. Therefore, the hypothesis is invalid, which proves that every deleted node violates the constraints. The proof is completed. □
Theorem 3 provides the basis for pruning the node order graph. The most direct way to prune is to test each node U against Theorem 3; however, even simplified judgment algorithms cannot achieve the best efficiency. Therefore, Theorem 4 gives a method for constructing the node order graph so that all nodes in the graph satisfy the constraints. Using the method of Theorem 4, we can make full use of the constraints to prune the node order graph, reduce the space complexity, and obtain the optimal structure once the node order graph has been constructed.
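A small sketch of the extension rule of Theorem 4 is given below, assuming the edge constraint graph GC is kept as a directed graph; networkx is used only for bookkeeping here and is not part of the paper's implementation.

```python
import networkx as nx

def legal_extensions(U, V, edge_constraint_graph: nx.DiGraph):
    """Variables allowed to extend node U of the order graph (Theorem 4):
    either unconstrained variables not yet in U, or root variables of the
    constraint subgraph induced by the constrained variables not yet in U."""
    vari_C = set(edge_constraint_graph.nodes)
    remaining = vari_C - set(U)
    G_s = edge_constraint_graph.subgraph(remaining)
    roots = {x for x in G_s.nodes if G_s.in_degree(x) == 0}
    return (set(V) - vari_C - set(U)) | roots
```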
As the scores of sets in the node order graph need to be queried repeatedly, this paper designs a hash function for sets in which different sets correspond to different hash values, so as to increase the efficiency of set queries. The hash function is designed as follows: suppose the set of all variables in the problem domain is {X1, …, Xn}. Take a binary number b with n digits. For a set U in the node order graph, if Xi ∈ U, set the ith bit of b to 1, otherwise set it to 0. Finally, convert b to a decimal number, which is the corresponding hash value.
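A minimal sketch of this hash function (variable names are illustrative):

```python
def subset_hash(U, variables):
    """Bitmask hash described above: bit i is 1 iff variables[i] is in U,
    so distinct subsets of the problem domain map to distinct integers."""
    b = 0
    for i, x in enumerate(variables):
        if x in U:
            b |= 1 << i
    return b

# e.g. subset_hash({"X1", "X3"}, ["X1", "X2", "X3", "X4"]) == 0b0101 == 5
```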
The specific algorithm flow of the node order graph construction is shown in Algorithm 1.
Algorithm 1. Construction algorithm of node order graph.
Constructing Node Order Graph Based on Edge Constraint
Input: SPG (sparse parent node graph), GC (edge constraint graph)
Output: G (global optimal structure)
1.  PreviousLayer.HashTable ← {∅}, PreviousLayer[∅].score ← 0, VC ← vari(C)
2.  for Layer ← 1 to n do
3.      for each node U in PreviousLayer do
4.          Vr ← VC ∖ U
5.          Gr ← subgraph of GC induced by Vr (remove the other variables of GC and their incident arcs)
6.          R ← root variables of Gr
7.          for each variable X ∈ ((V ∖ U) ∖ Vr) ∪ R do
8.              [bestparents, bestscore] ← GetBestScore(X, U, C, SPG)
9.              curscore ← U.score + bestscore
10.             if NewLayer[U ∪ {X}] is null then
11.                 NewLayer[U ∪ {X}] ← {curscore, parents, HashTable}
12.             else if curscore > NewLayer[U ∪ {X}].score then
13.                 NewLayer[U ∪ {X}] ← {curscore, parents, HashTable}
14.             end if
15.          end for
16.      end for
17.      PreviousLayer ← NewLayer
18.  end for
19.  G ← query PreviousLayer.HashTable

3.3.2. Construction and Query of Sparse Parent Node Graph

The construction algorithm of the sparse parent node graph is as follows. Since the sparse parent node graph stores the information of the first nmp layers of the complete parent node graph, we first construct the complete parent node graph under the constraint according to Theorem 4 and store each constructed node in the sparse parent node graph; when the first nmp layers have been constructed completely, the sparse parent node graph is obtained. An example is given below to illustrate how to construct a complete parent node graph under a constraint.
Figure 2 gives some constraints on the variables. To calculate a node of the parent node graph of X3, such as the node {X1, X2}, without constraints we would only need to compare the scores score(X3, {X1}), score(X3, {X2}), and score(X3, {X1, X2}). Because the constraint requires X1 to be a parent node, only score(X3, {X1}) and score(X3, {X1, X2}) are compared now. In addition, the crossed-out nodes in Figure 2b do not need to be solved, because these nodes contain the variable X4 and X4 can by no means be a parent node of X3.
Based on the above ideas, the specific algorithm flow of the sparse parent node graph construction, defined as PBDP-EDGE, is shown in Algorithm 2.
Algorithm 2. Construction algorithm of sparse parent node graph.
Constructing Sparse Parent Node Graph Based on Edge Constraint
Input: V (set of all variables), C (set of constraints), score(·,·) (decomposable score function)
Output: SPG (sparse parent node graph)
1.  for X ∈ V do
2.      [scoreX, parentsX] ← ∅
3.      for layer ← 0 to n do
4.          for each node U such that U ⊆ V ∖ {X} and |U| == layer do
5.              BestScore(X, U) ← max over Y ∈ U with Y ∉ Pa(X) of BestScore(X, U ∖ {Y})
6.              if U ∩ noPa(X) == ∅ and score(X, U) > BestScore(X, U) then
7.                  BestScore(X, U) ← score(X, U)
8.                  append [scoreX, parentsX] with (BestScore(X, U), binarize(U))
9.              end if
10.         end for
11.     end for
12.     sort [scoreX, parentsX] by scoreX in descending order
13. end for
14. return SPG ← [score(·), parents(·)]
The idea of the query algorithm for the sparse parent node graph is as follows. Suppose δ is a query constraint on X, that is, every admissible Pa(X) within U must satisfy δ; in other words, the first set in parentsX that satisfies δ is the best parent node set within U. A candidate set Y satisfies the query constraint δ if and only if the following two conditions hold: (1) Y ⊆ U; (2) CPa(X) ∩ U ⊆ Y and Y ∩ CNPa(X) = ∅, where CPa(X) is the set of variables constrained to be parent nodes of X and CNPa(X) is the set of variables constrained not to be parent nodes of X.
The specific implementation is as follows. First, set a bit array validX of all 1s with the same length as parentsX. Then, according to the first condition in δ, perform validX ← validX & ~parentsX[Xi] for each Xi ∈ V ∖ U, which removes every candidate set containing a variable outside U. Next, according to the second condition in δ, perform validX ← validX & parentsX[Xi] for each Xi ∈ CPa(X) ∩ U; the purpose of this step is to ensure that all remaining sets include every variable of CPa(X) ∩ U. Finally, perform validX ← validX & ~parentsX[Xi] for each Xi ∈ CNPa(X) ∩ U; this step eliminates the sets that include any variable of CNPa(X) ∩ U. The first set among the remaining ones is the best parent node set. Algorithm 3 shows the specific query algorithm, and a short Python sketch of the same bit operations is given after it.
Algorithm 3. Query algorithm of best parent node set.
The Optimal Parent Node Set Based on Query Constraints
Input: V (set of all variables), C (set of constraints), SPG (sparse parent node graph)
Output: bestparents(·,·) (the best parent node set), bestscore(·,·) (the corresponding score)
1. valid ← allScores(X) (an all-ones bit array with the same length as parentsX)
2. for each Y1 ∈ Pa(X) ∩ U do
3.     valid ← valid & parentsX[Y1]
4. end for
5. for each Y2 ∈ ((V ∖ U) ∪ noPa(X)) ∖ Pa(X) do
6.     valid ← valid & ~parentsX[Y2]
7. end for
8. index ← firstSetBit(valid)
9. return scoreX[index], parentsX[index]
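The bit operations of Algorithm 3 can be sketched in Python as follows, assuming the candidate parent sets are stored in descending score order and columns[v] holds the bit array parentsX[v]; the function name and signature are illustrative, not the paper's code.

```python
def query_best_parents(U, parents_x, scores_x, columns, all_vars, CPa, CNPa):
    """Return the best-scoring candidate parent set that satisfies the
    edge-constraint query of Section 3.3.2, or None if none survives."""
    n_sets = len(parents_x)
    all_ones = (1 << n_sets) - 1
    valid = all_ones

    for v in set(all_vars) - set(U):       # condition (1): candidate set must be a subset of U
        valid &= ~columns[v] & all_ones
    for v in set(CPa) & set(U):            # condition (2a): required parents must be present
        valid &= columns[v]
    for v in set(CNPa) & set(U):           # condition (2b): forbidden parents must be absent
        valid &= ~columns[v] & all_ones

    if valid == 0:
        return None
    index = (valid & -valid).bit_length() - 1   # first surviving (i.e. best-scoring) set
    return parents_x[index], scores_x[index]
```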
Here is an example to illustrate the implementation process. As shown in Table 1, from U = {X1, X2, X4, X5} we look for the best parent node set of X3, with CPa(X3) = {X1, X2} and CNPa(X3) = {X4}. The candidate sets remaining in the table all satisfy the first condition of δ, that is, they are all subsets of {X1, X2, X4, X5}. If there were no constraint, {X1, X5} would be the best parent node set. Next, the second condition is enforced. Because CPa(X3) ∩ U = {X1, X2}, compute parentsX3[X1] & parentsX3[X2]; the result is shown in the seventh row of the table, where a value of 1 means the set contains both X1 and X2. Because CNPa(X3) = {X4}, compute ~parentsX3[X4]; the result is shown in the eighth row, where a value of 1 means the set does not contain X4. Finally, AND the seventh and eighth rows to obtain the final validX3. The first valid set, {X1, X2, X5}, is the best parent node set satisfying the constraint.
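To make these operations concrete, the following self-contained sketch reproduces the Table 1 computation with Python integers; bit i corresponds to the ith candidate column of the table, and the encoding is an illustrative assumption.

```python
# Candidate parent sets of X3, ordered by descending score as in Table 1:
candidates = [{"X1", "X5"}, {"X1", "X2", "X4"}, {"X1", "X2", "X5"}, set(), {"X1"}]

def column(var):
    """Bit array parentsX3[var]: bit i is 1 iff candidate i contains var."""
    return sum(1 << i for i, s in enumerate(candidates) if var in s)

all_ones = (1 << len(candidates)) - 1
valid = all_ones
valid &= column("X1") & column("X2")   # CPa(X3) = {X1, X2} must be contained (row 7)
valid &= ~column("X4") & all_ones      # CNPa(X3) = {X4} must be absent (row 8)

best = next(i for i in range(len(candidates)) if valid >> i & 1)
print(candidates[best])                # -> {'X1', 'X2', 'X5'}
```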

3.4. Integrating Path Constraints

3.4.1. Pruning Node Order Graph

The algorithm for pruning the node order graph by path constraints is the same as that for edge constraints: first construct a constraint graph GC, then use Algorithm 1 to prune the node order graph. Here, the path constraint graph GC of the constraint set C is a directed acyclic graph over the variables vari(C) in which, for arbitrary X1, X2 ∈ vari(C), there is an edge X1 → X2 between X1 and X2 in GC if and only if X1 ⇒ X2 ∈ C.

3.4.2. Construction and Query of Sparse Parent Node Graph

For a required path constraint X ⇒ Y, the parent node set of Y must contain at least X itself or one descendant node of X. Therefore, when constructing the sparse parent node graph for Y, it is necessary to store all parent node sets with at most nmp variables. For the other variables, the sparse parent node graph is constructed as in the unconstrained case. The specific construction algorithm of the sparse parent node graph, defined as PBDP-PATH, is shown in Algorithm 4.
Algorithm 4. Construction algorithm of sparse parent node graph.
Constructing Sparse Parent Node Graph Based on Path Constraints
Input: V (set of all variables), C (set of constraints), score(·,·) (decomposable score function)
Output: SPG (sparse parent node graph)
1.  for X ∈ V do
2.      if X ∈ end(C) then
3.          ConstructFullSparseParentGraph(V, X, score(·,·))
4.      else
5.          ConstructSparseParentGraphWithoutConstraints(V, X, score(·,·))
6.      end if
7.  end for
8.  return SPG ← [score(·), parents(·)]
9.  function ConstructSparseParentGraphWithoutConstraints(V, X, score(·,·))
10.     [scoreX, parentsX] ← ∅
11.     for layer ← 0 to n do
12.         for each node P such that P ⊆ V ∖ {X} and |P| == layer do
13.             BestScore(X, P) ← max over Y ∈ P of BestScore(X, P ∖ {Y})
14.             if score(X, P) > BestScore(X, P) then
15.                 BestScore(X, P) ← score(X, P)
16.                 append [scoreX, parentsX] with (score(X, P), binarize(P))
17.             end if
18.         end for
19.     end for
20.     sort [scoreX, parentsX] by scoreX in descending order
21.     return [scoreX, parentsX]
22. end function
23. function ConstructFullSparseParentGraph(V, X, score(·,·))
24.     [scoreX, parentsX] ← ∅
25.     for each P ⊆ V ∖ {X} do
26.         append [scoreX, parentsX] with (score(X, P), binarize(P))
27.     end for
28.     sort [scoreX, parentsX] by scoreX in descending order
29.     return [scoreX, parentsX]
30. end function
The idea of the query algorithm for path constraints is as follows. For a given path constraint X ⇒ Y, to find the best parent node set S of Y within U: if X ∈ U, there must be at least one Z ∈ {X} ∪ des(X) such that Z ∈ S, where des(X) is the set of descendant nodes of X in the structure already built over U. If the constraint is X ⇏ Y, then Z ∉ S for all Z ∈ {X} ∪ des(X).
The specific query method is as follows. Initialize a bit array validX of all 1s with the same length as parentsX. Perform validX ← validX & ~parentsX[Xk] for each Xk ∈ V ∖ U. For each required ancestor Xi (i.e., each constraint Xi ⇒ Y), set an auxiliary bit array Cvalid of all 0s, find the descendant nodes des(Xi) of Xi, perform the OR operation Cvalid ← Cvalid | parentsX[Z] for each Z ∈ {Xi} ∪ des(Xi), and then perform the AND operation valid ← valid & Cvalid. For each forbidden ancestor Xj (i.e., each constraint Xj ⇏ Y), find des(Xj) and perform the AND operation valid ← valid & ~parentsX[Z] for each Z ∈ {Xj} ∪ des(Xj). Algorithm 5 shows the specific query algorithm, followed by a short Python sketch.
Algorithm 5. Query algorithm of best parent node set.
The Best Parent Node Set Based on the Path Constraint Query
Input: V (set of all variables), C (set of path constraints), SPG (sparse parent node graph)
Output: bestparents(·,·) (the best parent node set), bestscore(·,·) (the corresponding score)
1.  valid ← allScores(X) (an all-ones bit array with the same length as parentsX)
2.  for each Yi ∈ V ∖ U do
3.      valid ← valid & ~parentsX[Yi]
4.  end for
5.  for each Yj such that Yj ⇒ X ∈ C do
6.      Cvalid ← an all-zeros bit array
7.      for each S holding that Yj ⇒ S in G do
8.          Cvalid ← Cvalid | parentsX[S]
9.      end for
10.     Cvalid ← Cvalid | parentsX[Yj]
11.     valid ← valid & Cvalid
12. end for
13. for each Yk such that Yk ⇏ X ∈ C do
14.     for each S holding that Yk ⇒ S in G or S = Yk do
15.         valid ← valid & ~parentsX[S]
16.     end for
17. end for
18. index ← firstSetBit(valid)
19. return scoresX[index], parentsX[index]
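Analogously to the edge-constraint case, the path-constraint query can be sketched as follows; the descendant sets, names, and signature are illustrative assumptions rather than the paper's code.

```python
def query_best_parents_path(U, parents_y, scores_y, columns, all_vars,
                            required_ancestors, forbidden_ancestors, descendants):
    """columns[v] is the bit array parents_y[v]; descendants[v] is des(v) in the
    structure already built over U; candidates are sorted by descending score."""
    n_sets = len(parents_y)
    all_ones = (1 << n_sets) - 1
    valid = all_ones

    for v in set(all_vars) - set(U):                 # keep only subsets of U
        valid &= ~columns.get(v, 0) & all_ones

    for x in required_ancestors:                     # constraints x => Y with x in U
        cvalid = 0
        for z in {x} | set(descendants.get(x, ())):  # at least one of {x} plus des(x) needed
            cvalid |= columns.get(z, 0)
        valid &= cvalid

    for x in forbidden_ancestors:                    # constraints x =/=> Y
        for z in {x} | set(descendants.get(x, ())):
            valid &= ~columns.get(z, 0) & all_ones

    if valid == 0:
        return None
    index = (valid & -valid).bit_length() - 1        # best-scoring surviving candidate
    return parents_y[index], scores_y[index]
```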
An example is given below to illustrate the implementation process.
Figure 3 is an example of path constraints, in which C contains X1 ⇒ Y and X2 ⇒ Y. We need to find the best parent node set S of Y from U = {X1, X2, X3, X4}; Table 2 shows the specific solution process. The candidate parent node sets listed in the table are all subsets of U, so the first condition of δ is already satisfied. If there were no constraint, {X2, X4} would be the best parent node set. When the constraints are given, perform the OR operation over the rows of the elements of des(X1) ∪ {X1} to obtain Cvalid1, as in line 7; perform the OR operation over the rows of the elements of des(X2) ∪ {X2} to obtain Cvalid2, as in line 8. Then perform the AND operation valid ← valid & Cvalid1 & Cvalid2. Wherever valid equals 1, the candidate set contains elements from both des(X1) ∪ {X1} and des(X2) ∪ {X2}; therefore {X3, X4} is the best solution.

4. Algorithm Simulation and Analysis

4.1. Validity Verification

In this section, in order to verify the effectiveness of the proposed method, an 18-node network is first generated using the Matlab constructor. Then, the constructed network, the Asia network, and the Sachs network are simulated and verified with 20 samples each. In order to verify that the method really integrates the constraints, some extreme simulation conditions are set.
1.
The simulation is carried out with the Asia network. All the edge prior knowledge is given, which is verified with the PBDP-EDGE structure. Part of the path prior knowledge is given, specifically 1 ⇒ 6, 2 ⇒ 6, 2 ⇒ 8, 3 ⇒ 7, 3 ⇒ 8, 4 ⇒ 7, and 4 ⇒ 8, which is verified with the PBDP-PATH structure. The results are shown in Figure 4.
The real network structure of the Asia network is shown in Figure 4a. It can be seen from Figure 4b that the training samples contain very little information, so only a few edges can be learned and a complete structure cannot be constructed. It can be seen from Figure 4c that the correct structure can be learned even though the sample size is small, which demonstrates the correctness and effectiveness of the algorithm integrating edge prior knowledge proposed in this paper. It can be seen from Figure 4d that all the learned structures contain the given paths (1 ⇒ 6, 2 ⇒ 6, 2 ⇒ 8, 3 ⇒ 7, 3 ⇒ 8, 4 ⇒ 7, and 4 ⇒ 8).
2.
The simulation is carried out with the Sachs network. Part of the edge prior knowledge is given, which is verified with the PBDP-EDGE structure. Part of the path prior knowledge is given, specifically 1 ⇒ 2, 1 ⇒ 4, 1 ⇒ 5, 1 ⇒ 7, 1 ⇒ 8, 2 ⇒ 5, 2 ⇒ 8, 3 ⇒ 4, 4 ⇒ 6, 5 ⇒ 6, and 9 ⇒ 11, which is verified with the PBDP-PATH structure. The results are shown in Figure 5.
The real network structure of the Sachs network is shown in Figure 5a. As can be seen from Figure 5b, the training samples contain little information, only a few edges can be learned, and a complete structure cannot be constructed. It can be seen from Figure 5c that partially correct structures can be learned even though the sample size is small, indicating the correctness and effectiveness of the algorithm integrating edge prior knowledge proposed in this paper. It can be seen from Figure 5d that all the learned structures contain the given paths (1 ⇒ 2, 1 ⇒ 4, 1 ⇒ 5, 1 ⇒ 7, 1 ⇒ 8, 2 ⇒ 5, 2 ⇒ 8, 3 ⇒ 4, 4 ⇒ 6, 5 ⇒ 6, and 9 ⇒ 11).
3.
The simulation is carried out with the Constructed network. Part of the edge prior knowledge is given, which is verified with the PBDP-EDGE structure. Part of the path prior knowledge is given, specifically 1 ⇒ 4, 1 ⇒ 17, 2 ⇒ 18, 2 ⇒ 13, 2 ⇒ 5, 3 ⇒ 5, 3 ⇒ 9, 6 ⇒ 8, 5 ⇒ 10, 5 ⇒ 12, 7 ⇒ 5, 10 ⇒ 13, 13 ⇒ 9, and 15 ⇒ 10, which is verified with the PBDP-PATH structure. The results are shown in Figure 6.
The real structure of the Constructed network is shown in Figure 6a. As can be seen from Figure 6b, the training samples contain little information, only a few edges can be learned, and a complete structure cannot be constructed. It can be seen from Figure 6c that partially correct structures can be learned even though the sample size is small, indicating the correctness and effectiveness of the algorithm integrating edge constraints proposed in this paper. It can be seen from Figure 6d that all the learned structures contain the given paths (1 ⇒ 4, 1 ⇒ 17, 2 ⇒ 18, 2 ⇒ 13, 2 ⇒ 5, 3 ⇒ 5, 3 ⇒ 9, 6 ⇒ 8, 5 ⇒ 10, 5 ⇒ 12, 7 ⇒ 5, 10 ⇒ 13, 13 ⇒ 9, and 15 ⇒ 10).
Therefore, the above simulation results prove that the method proposed in this paper is correct and reliable and can be applied no matter what kind of prior knowledge is given.

4.2. Complexity Verification

  • The integration of edge constraints is simulated with the Hailfinder network, a large-scale network, and half of the real edges are randomly selected as prior knowledge. The training sample sizes are 200, 500, and 1000, respectively. Table 3 shows the simulation results. PBDP (Priors Based DP) denotes the method integrating prior knowledge; runtimes are measured in seconds. The space cost refers to the size of the array that must be allocated, and the proportion gives the time and space ratios between the PBDP method and the DP method.
The path constraint is simulated in the same way, with the results shown in Table 4.
It can be seen from Table 3 and Table 4 that integrating edge constraints and path constraints not only improves the scores, but also effectively reduces the time and space complexity. To sum up, the method can use edge constraints and path constraints to effectively reduce the time and space complexity of the Dynamic Programming algorithm and to improve its timeliness significantly.

5. Conclusions

In this paper, the specific process of dynamic programming is analyzed, and its restrictive relationship with edge constraints and path constraints is determined. The prior constraints are used to restrict and guide each step of dynamic programming, and deterministic prior knowledge is thereby integrated into dynamic programming for BN structure learning. A dynamic programming BN structure learning algorithm integrating prior knowledge is proposed, and its implementation is described in detail. Simulation results show that the algorithm can use edge prior knowledge and path prior knowledge to effectively reduce the time and space complexity of the dynamic programming algorithm. The results also reveal the complementary relationship between prior knowledge and learning in BN modeling: only by making full use of both prior knowledge and training sample information can an ideal model be obtained. This paper also provides some implications for breaking through the node-number limit of the dynamic programming method.

Author Contributions

Conceptualization, Z.L.; methodology, R.D.; software, Y.C.; validation, H.W. and Z.L.; formal analysis, Y.C.; investigation, H.W. and Z.L.; resources, X.L.; data curation, X.S.; writing—original draft preparation, Y.C. and C.H.; writing—review and editing, R.D.; visualization, Y.C.; supervision, Z.L.; project administration, R.D.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Projects supported by the National Natural Science Foundation of China, grant number 62171360.

Data Availability Statement

The true networks of all data sets are known, and they are publicly available (http://www.bnlearn.com/bnrepository (accessed on 15 June 2021)).

Acknowledgments

The authors are grateful to Wang and Ruohai Di for helpful discussions. The authors also thank the anonymous reviewers for their critical and constructive review of the manuscript. This work was supported by the National Natural Science Foundation of China (62171360).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann Press: San Mateo, CA, USA, 1988; pp. 1–15. [Google Scholar]
  2. Bueno, M.L.; Hommersom, A.; Lucas, P.J.; Lappenschaar, M.; Janzing, J.G.E. Understanding disease processes by partitioned dynamic Bayesian networks. J. Biomed. Inform. 2016, 61, 283–297. [Google Scholar] [CrossRef] [PubMed]
  3. Zhang, X.; Mahadevan, S. Bayesian network modeling of accident investigation reports for aviation safety assessment. Relib. Eng. Syst. Safe 2020, 209, 1–19. [Google Scholar] [CrossRef]
  4. Wang, Z.; Wang, Z.; He, S.; Gu, X.; Yan, Z.F. Fault detection and diagnosis of chillers using Bayesian network merged distance rejection and multi-source non-sensor information. Appl. Energy 2017, 188, 200–214. [Google Scholar] [CrossRef]
  5. Shenton, W.; Hart, B.; Chan, T. Bayesian network models for environmental flow decision-making: 1. Latrobe River Australia. River Res. Appl. 2015, 27, 283–296. [Google Scholar] [CrossRef]
  6. Yu, K.; Liu, L.; Li, J.; Ding, W.; Le, T.D. Multi-Source Causal Feature Selection. IEEE Trans. Pattern Anal. Mach. Learn. 2020, 42, 2240–2256. [Google Scholar] [CrossRef] [PubMed]
  7. McLachlan, S.; Dube, K.; Fenton, N. Bayesian Networks in Healthcare: Distribution by Medical Condition. Artif. Intell. Med. 2020, 107, 101912. [Google Scholar] [CrossRef] [PubMed]
  8. Lin, C.; Liu, Y. Target Recognition and Behavior Prediction based on Bayesian Network. Inter. J. Perform. Eng. 2019, 15, 1014–1022. [Google Scholar] [CrossRef]
  9. Sun, H.; Xie, X.; Sun, T.; Zhang, L. Threat assessment method for air defense targets of DBN fleet in the state of missing small sample data. Syst. Eng. Electron. 2019, 41, 1300–1308. [Google Scholar]
  10. Di, R.; Gao, X.g.; Guo, Z. Small data set BN modeling method and its application in threat assessment. Chin. J. Electron. 2016, 44, 1504–1511. [Google Scholar]
  11. Su, C.; Zhang, H. The introduction of human reliability in aircraft combat effectiveness evaluation. Acta Aeron Sin. 2006, 27, 262–266. [Google Scholar]
  12. Wu, Y.; Ren, Z. Mission reliability analysis of multiple-phased systems based on Bayesian network. In Proceedings of the IEEE 2014 Prognostics and System Health Management Conference, Zhangjiajie, China, 24–27 August 2014. [Google Scholar]
  13. Campos, C.; Ji, Q. Efficient Structure Learning of Bayesian Networks using Constraints. J. Mach. Learn. Res. 2011, 663–689. [Google Scholar]
  14. Cussens, J. Bayesian network learning with cutting planes. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence, Barcelona, Spain, 14–17 July 2011. [Google Scholar]
  15. Jaakkola, T.; Sontag, D.; Globerson, A.; Meila, M. Learning Bayesian Network Structure using LP Relaxations. J. Mach. Learn. Res. 2010, 9, 358–365. [Google Scholar]
  16. Ott, S.; Imoto, S.; Miyano, S. Finding optimal models for small gene networks. Pac. Symp. Biocomput. 2004, 9, 557. [Google Scholar]
  17. Koivisto, M.; Sood, K. Exact Bayesian Structure Discovery in Bayesian Networks. J. Mach. Learn. Res. 2004, 5, 549–573. [Google Scholar]
  18. Angelopoulos, N.; Cussens, J. Bayesian learning of Bayesian networks with informative priors. Ann. Math. Artif. Intel. 2008, 54, 53–98. [Google Scholar] [CrossRef]
  19. Zhu, M.; Liu, S.; Wang, C. An optimization method based on prior node order learning Bayesian network structure. Acta Autom. Sin. 2011, 37, 1514–1519. [Google Scholar]
  20. Campos, L.; Castellanoa, J. Bayesian Network Learning Algorithms using Structural Restrictions. Int. J. Approx. Reason. 2007, 45, 233–254. [Google Scholar] [CrossRef]
  21. Nicholson, D.; Han, B.; Korb, K.B.; Alam, M.J.; Hope, L.R. Incorporating expert elicited structural information in the CaMML Causal Discovery Program. 2008. Available online: https://bridges.monash.edu/articles/report/Incorporating_Expert_Elicited_Structural_Information_in_the_CaMML_Causal_Discovery_Program/20365395 (accessed on 9 August 2022).
  22. Castelo, R.; Siebes, A. Priors on network structures. Biasing the search for Bayesian networks. Int. J. Approx. Reason. 2000, 24, 39–57. [Google Scholar] [CrossRef] [Green Version]
  23. Borboudakis, G.; Tsamardinos, I. Scoring and searching over Bayesian networks with causal and associative priors. In Proceedings of the 29th International Conference on Uncertainty in Artificial Intelligence, Washington, DC, USA, 12–14 July 2013. [Google Scholar]
  24. Parviainen, P.; Koivisto, M. Finding optimal Bayesian networks using precedence constraints. J. Mach. Learn. Res. 2013, 14, 1387–1415. [Google Scholar]
  25. Chen, E.; Shen, Y.; Choi, A.; Darwiche, A. Learning Bayesian networks with ancestral constraints. In Proceedings of the 30th Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016. [Google Scholar]
  26. Li, A.; Beek, P. Bayesian network structure learning with side constraints. In Proceedings of the 9th International Conference on Probabilistic Graphical Models, Prague, Czech, 11–14 September 2018. [Google Scholar]
  27. Bartlett, M.; Cussens, J. Integer linear programming for the Bayesian network structure learning problem. Artif. Intell. 2017, 244, 258–271. [Google Scholar] [CrossRef]
  28. Zhang, L.; Guo, H. Introduction to Bayesian Nets; Science Press: Beijing, China, 2006. [Google Scholar]
  29. Malone, B.; Yuan, C.; Hansen, E. Memory-Efficient Dynamic Programming for Learning Optimal Bayesian Networks. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011. [Google Scholar]
Figure 1. Node order graph and parent node graph of X1: (a) Node order graph of four nodes; (b) Parent node graph of X1.
Figure 2. Construction of the complete parent node graph of X3 with a given edge constraint: (a) edge constraint; (b) construction of the complete parent node graph of X3.
Figure 3. Example of path constraints.
Figure 4. Asia network structure simulation diagram.
Figure 5. Sachs network structure simulation diagram.
Figure 6. Constructed network structure simulation diagram.
Table 1. Example of query process based on constraints.
1  parentsX3:               {X1,X5}  {X1,X2,X4}  {X1,X2,X5}  {}  {X1}
2  scoresX3:                5        4           3           2   1
3  parentsX3[X1]:           1        1           1           0   1
4  parentsX3[X2]:           0        1           1           0   0
5  parentsX3[X4]:           0        1           0           0   0
6  parentsX3[X5]:           1        0           1           0   0
7  Operation of CPa(X3):    0        1           1           0   0
8  Operation of CNPa(X3):   1        0           1           1   1
9  valid:                   0        0           1           0   0
Table 2. Example of query process based on path constraint.
1  parentsY:                    {X2,X4}  {X4}  {X3,X4}  {}  {X3}
2  scoresY:                     5        4     3        2   1
3  parentsY[X1]:                0        0     0        0   0
4  parentsY[X2]:                1        0     0        0   0
5  parentsY[X3]:                0        0     1        0   1
6  parentsY[X4]:                1        1     1        0   0
7  Cvalid of des(X1) ∪ {X1}:    0        0     1        0   1
8  Cvalid of des(X2) ∪ {X2}:    1        1     1        0   1
9  valid:                       0        0     1        0   1
Table 3. Simulation comparison of integrating constraints of edge.
Sample Size  Approach     PBDP-EDGE     DP            Proportion
200          PIC Score    −670,352.217  −696,271.251
             Runtime (s)  3723.053      31,114.242    0.12
             Space        52,223        262,143       0.199
500          PIC Score    −629,520.727  −635,543.295
             Runtime (s)  3141.870      31,925.566    0.098
             Space        52,223        262,143       0.199
1000         PIC Score    −629,520.727  −630,672.929
             Runtime (s)  3218.295      30,909.447    0.104
             Space        52,223        262,143       0.199
Table 4. Simulation comparison of integrating path constraints.
Sample Size  Approach     PBDP-PATH    DP            Proportion
200          PIC Score    −654,151.15  −684,005.61
             Runtime (s)  5486.113     29,359.588    0.187
             Space        44,159       262,143       0.168
500          PIC Score    −633,710.86  −637,161.52
             Runtime (s)  5035.867     29,785.541    0.169
             Space        44,159       262,143       0.168
1000         PIC Score    −629,561.51  −630,698.96
             Runtime (s)  5323.846     27,892.218    0.191
             Space        44,159       262,143       0.168
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
