A Novel Qualitative Maximum a Posteriori Estimation for Bayesian Network Parameters Based on Computing the Center Point of Constrained Parameter Regions

Introducing parameter constraints has become a mainstream approach for learning Bayesian network parameters with small datasets. The QMAP (qualitative maximum a posteriori) estimation has produced the best learning accuracy among existing learning approaches. However, the rejection-acceptance sampling employed in the QMAP algorithm for determining average BN (Bayesian network) parameter values is time-consuming, particularly when the number of parameter constraints is large. This paper proposes a new analytical approach that enhances the learning efficiency of the QMAP algorithm without reducing its learning accuracy by treating the average value of the parameters as the center point of the constrained parameter region, which is far more efficient than the rejection-acceptance sampling employed in the traditional QMAP algorithm. First, a novel objective function is designed and a constrained objective optimization model is constructed based on parameter constraints. Second, the constructed model is employed to obtain the center point of the constrained parameter region from its boundary points, where the average parameter value is computed as the average of all boundary points. The results obtained from a large number of simulation experiments with four benchmark Bayesian networks demonstrate that the parameter learning accuracy of the proposed algorithm is slightly better than that of the original QMAP algorithm under specific conditions, and the computational efficiency is substantially increased under all conditions.


I. INTRODUCTION
A Bayesian network (BN) is a statistical model that represents a group of variables as nodes in a directed acyclic graph, and denotes the conditional dependencies between variables as the graph edges. As such, a BN contains a directed acyclic graph structure and network parameters reflecting the conditional dependency distributions between variables. The BN concept was first proposed by Professor Judea Pearl of UCLA in 1986 [1]. A BN can adopt multi-informational variables, and can infer probable outcomes via data fusion and bi-directional reasoning. Moreover, a BN can combine prior information with current information to increase the accuracy and believability of inferences. As such, BNs have become powerful tools for addressing statistical problems, and they have been widely applied in many fields, such as geological hazard prediction [2], gene analysis [3], medical diagnosis [4]-[6], reliability analysis [7], pattern classification [8], fault diagnosis [9], language recognition [10], [11], and action recognition [12].
The construction of an accurate BN model is the core task in BN-related research. Similar to neural networks [13], [14], learning BN model parameters from existing data is very important. This parameter learning task is generally easy when sufficient data samples are available. However, sufficient data samples can be difficult to obtain for many reasons, such as measurement equipment failure, a natural scarcity of data under rare conditions (e.g., when the statistical problem involves rare geological hazards or rare diseases), or a very high data acquisition cost. This condition can severely limit the feasibility of BNs. Therefore, numerous BN modeling methods based on small datasets have been developed, and the introduction of parameter constraints has become a mainstream approach applied to address this condition. The expert knowledge employed in BN parameter learning can be expressed by the following eight types of parameter constraints [32]: axiomatic constraints, range constraints, intra-distribution constraints, cross-distribution constraints, inter-distribution constraints, approximate-equality constraints, additive synergy constraints, and product synergy constraints. These constraints are defined in detail in Subsection 2.4.
The methods developed for BN parameter learning with small datasets can be divided into two types: parameter learning methods with a single type of constraint and parameter learning methods with multiple types of constraints. Among the single constraint type methods, Wittig and Jameson [15] proposed a constrained BN parameter learning algorithm that employed cross-distribution parameter constraints. This algorithm first employs the cross-distribution parameter constraints to build a penalty function. Second, the penalty function is combined with an entropy function to obtain an optimization model. Finally, the optimization model is solved using the adaptive probability networks method. Feelders and Gaag [16], [17] regarded parameter learning as an isotonic regression problem. Their algorithm first obtains the initial BN parameters via maximum likelihood (ML) estimation. Then, the parameters are regulated using the minimum lower sets algorithm to ensure that the regulated parameters satisfy the cross-distribution constraints. Altendorf et al. [18] also suggested a constrained BN parameter learning algorithm employing cross-distribution constraints. Here, an objective function is defined by integrating the cross-distribution constraints into an entropy function that is solved using the gradient-descent algorithm. Zhou et al. [19] proposed a constrained BN parameter learning algorithm called constrained optimization with a flat prior. The method improves the accuracy of the BN parameters obtained via ML estimation using a flat prior and cross-distribution constraints, and applies sequential quadratic programming to solve the improved ML function. Di et al. [20], [21] proposed a constrained BN parameter learning algorithm that employed intra-distribution constraints. Here, the intra-distribution constraints are translated into the hyper-parameters of a Dirichlet distribution or a beta distribution, and a maximum a posteriori algorithm is applied to obtain the BN parameters. Ren et al. [22] proposed a constrained BN parameter learning algorithm that employed range constraints. This algorithm converts the range constraints into the hyper-parameters of a beta distribution, and applies a maximum a posteriori algorithm to obtain the BN parameters.
While reasonable BN parameter learning results have been obtained using single-type constraints, such constraints can represent only a single type of expert knowledge.
However, many different types of expert knowledge can be used in the BN parameter learning process. As a result, multiple constraint type methods have gained increasing attention in the last few years. For example, Niculescu et al. [23] and Campos and Qiang [24] proposed BN parameter learning algorithms based on convex optimization, in which the ML function is treated as the objective function and multiple types of parameter constraints are regarded as the feasible solution region. Liao and Qiang [25] introduced multiple types of parameter constraints into the BN parameter learning process under the condition of missing data using the expectation-maximization algorithm. Rui et al. [26] proposed the qualitative maximum a posteriori (QMAP) algorithm for BN parameter learning. The QMAP algorithm is an a posteriori estimation approach that incorporates both quantitative data and qualitative constraints. This approach requires that a sufficient number of parameters be sampled from within the constrained parameter region using a rejection-acceptance sampling strategy to obtain the mean values of the parameters. In addition, the hyper-parameters of the Dirichlet distribution are determined as the products of an equivalent sample size and the mean values of the sampled parameters. The optimal parameters are then computed as interpolations of the sample observations and the obtained hyper-parameters. Guo et al. [27] found that the BN parameters learned by the QMAP algorithm could violate the parameter constraints, and accordingly proposed the further constrained QMAP algorithm. The algorithm regulates the QMAP estimation by replacing data estimation with a further constrained estimation conducted via convex optimization. Zhou et al. [28], [29] proposed the multinomial parameter learning with constraints algorithm. First, the frequency of the configuration states of particular child and parent nodes is counted. Second, an auxiliary BN model is built by integrating both the sample data and the parameter constraints. Finally, the optimal parameters are computed as the mean values of the probability distribution. Guo et al. [30] proposed a dual constrained estimation method employing both intra-distribution parameter constraints and cross-distribution constraints.
Compared with existing BN parameter learning algorithms, the above-mentioned QMAP algorithms have been demonstrated to provide the best BN parameter learning accuracy with small datasets [30]. However, the rejection-acceptance sampling strategy used in these QMAP algorithms is time-consuming, particularly when the number of constraints is large, which greatly decreases the computational efficiency of the algorithms. This paper addresses this issue by proposing a new analytical approach that enhances the learning efficiency of the QMAP algorithm without reducing its learning accuracy. The proposed approach treats the average value of the parameters as the center point of the constrained parameter region, which is a much more efficient method than the rejection-acceptance sampling method employed in the QMAP algorithm. First, a novel objective function is designed and a constrained objective optimization model is constructed based on parameter constraints.
Second, the constructed model is employed to obtain the center point of the constrained parameter region based on its boundary points, and the average parameter value is determined as the average of the boundary points. The results obtained from a large number of simulation experiments with four benchmark networks demonstrate that the parameter learning accuracy of the proposed algorithm is slightly better than that of the original QMAP algorithm under specific conditions, and the computational efficiency is substantially increased under all conditions.

II. BACKGROUND
A. BAYESIAN NETWORKS
For each node X_i, where i = 1, 2, . . . , n, of a BN composed of n nodes, we denote the parents of X_i as π(X_i). A parameter represents the conditional probability P(X_i|π(X_i)) reflecting the conditional dependencies between X_i and π(X_i). If the conditional distributions are specified for all n nodes of the BN, the joint probability can be calculated as follows:

P(X_1, X_2, . . . , X_n) = ∏_{i=1}^{n} P(X_i | π(X_i)). (1)

Decomposing the joint probability according to (1) provides not only a compact representation of the joint probability but also an efficient implementation of reasoning. Node X_i could represent a discrete variable or a continuous variable. This paper considers only discrete variables.
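To make the factorization in (1) concrete, the following minimal sketch evaluates the joint probability of a hypothetical two-node network S → C; the variable names and all probability values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical two-node BN: S -> C (e.g., smoking -> cancer).
p_s = np.array([0.7, 0.3])              # P(S): S has no parents
p_c_given_s = np.array([[0.95, 0.05],   # P(C | S = 0)
                        [0.80, 0.20]])  # P(C | S = 1)

def joint(s, c):
    """P(S = s, C = c) = P(S = s) * P(C = c | S = s), as in Eq. (1)."""
    return p_s[s] * p_c_given_s[s, c]

# The factored joint distribution sums to 1 over all configurations.
total = sum(joint(s, c) for s in range(2) for c in range(2))
print(joint(1, 1), total)  # approx. 0.06 and 1.0
```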

B. PARAMETER LEARNING IN A BAYESIAN NETWORK
The discussion in the above subsection indicates that the BN parameter learning process involves estimating conditional probability distributions from available data pertinent to the problem to which the BN is applied. Two classical methods are typically applied for BN parameter learning, namely ML estimation and Bayesian estimation. The parameter estimation process can be presented in terms of the ML estimation algorithm as an example. Here, when the dataset is complete (i.e., it has no missing data), parameter estimation can be expressed as a maximization of the log-likelihood function ℓ(θ|D), as follows:

ℓ(θ|D) = log P(D|θ) = ∑_{i=1}^{n} ∑_{j=1}^{q_i} ∑_{k=1}^{r_i} N_ijk log θ_ijk, (2)

where q_i is the number of parent node configuration states, r_i is the number of child node states, N_ijk is the number of nodal configurations that satisfy X_i = k and π(X_i) = j, and θ_ijk is the ML estimate of the BN parameter for the i-th node when it is in the k-th state and its parent nodes are in the j-th state. θ_ijk is computed as follows:

θ_ijk = N_ijk / N_ij, (3)

where N_ij is the number of nodal configurations of the i-th node that satisfy π(X_i) = j. The estimation of network parameters can be decomposed into the product of independent estimations for individual nodes using the decomposability property of a BN illustrated in (1).
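As a minimal illustration of (2) and (3), the following sketch computes the counting-based ML estimate θ_ijk = N_ijk / N_ij for a single node with one discrete parent; the node layout and the toy data are assumptions for illustration only.

```python
import numpy as np

def ml_estimate(data, n_parent_states, n_child_states):
    """ML estimate per Eq. (3): theta[j, k] = N_ijk / N_ij."""
    counts = np.zeros((n_parent_states, n_child_states))  # N_ijk
    for j, k in data:                                     # observations
        counts[j, k] += 1
    n_ij = counts.sum(axis=1, keepdims=True)              # N_ij
    # Leave unobserved parent configurations as NaN instead of dividing by 0.
    return np.divide(counts, n_ij, out=np.full_like(counts, np.nan),
                     where=n_ij > 0)

# Toy dataset of (parent_state j, child_state k) pairs.
data = [(0, 0), (0, 1), (0, 0), (1, 1), (1, 1)]
print(ml_estimate(data, 2, 2))  # approx. [[0.667 0.333] [0. 1.]]
```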

C. SAMPLE COMPLEXITY OF BN PARAMETER LEARNING
The ML estimation algorithm is computationally efficient, and is therefore ideal for accurately learning BN parameters if the dataset is sufficiently large. Unfortunately, datasets are often small, which substantially decreases the accuracy of the ML estimation algorithm. However, determining the dataset size that can ensure the accurate estimation of BN parameters using the ML estimation algorithm is a difficult problem. In this regard, Dasgupta [31] provided a lower dataset size limit for a known BN structure: for a BN composed of n Boolean nodes, where no node has more than k parents, the lower dataset size limit that holds with confidence (1 − δ) is a function of n, k, δ, and the error rate ε (the explicit bound is given in [31]), where ε is often computed as ε = nσ for a small arbitrary constant σ.

D. COMMON PARAMETER CONSTRAINTS
The eight common types of parameter constraints can be formally defined as follows.
(1) Axiomatic constraints. These constraints require that the sum of the BN parameters obtained from the same distribution is 1, which is a basic law of probability theory: ∑_{k=1}^{r_i} θ_ijk = 1, with 0 ≤ θ_ijk ≤ 1.
(2) Range constraints. These constraints restrict BN parameters to specified intervals: α_ijk ≤ θ_ijk ≤ β_ijk, where α_ijk is a lower bound and β_ijk is an upper bound.
(3) Intra-distribution constraints. These constraints describe relationships between parameters obtained from the same distribution, where the parameters have the same parent node configuration state j and different child node states k and k′, e.g., θ_ijk ≥ θ_ijk′.
(4) Cross-distribution constraints. These constraints describe relationships between two parameters with the same child node state k and different parent node configuration states j and j′, e.g., θ_ijk ≥ θ_ij′k.
(5) Inter-distribution constraints. These constraints describe relationships between the parameters of different nodes, e.g., θ_ijk ≥ θ_i′j′k′.
(6) Approximate-equality constraints. These constraints define close relationships between any two parameters, e.g., |θ_ijk − θ_i′j′k′| ≤ ε for a small tolerance ε.
(7) Additive synergy constraints. These constraints describe comparative relationships between the sums of two parameters with different parent node configuration states, e.g., θ_ij1k + θ_ij2k ≥ θ_ij3k + θ_ij4k.
(8) Product synergy constraints. These constraints describe comparative relationships between the products of two parameters with different parent node configuration states, e.g., θ_ij1k · θ_ij2k ≥ θ_ij3k · θ_ij4k.
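The constraint types above can be encoded programmatically as predicates over a conditional probability table and checked directly. The following sketch shows a few of the types; all concrete bounds and parameter choices are illustrative assumptions.

```python
import numpy as np

# theta[j, k] = P(X_i = k | pi(X_i) = j) for one node (illustrative values).
theta = np.array([[0.6, 0.4],
                  [0.3, 0.7]])

constraints = [
    # (1) axiomatic: each conditional distribution sums to 1
    lambda t: np.allclose(t.sum(axis=1), 1.0),
    # (2) range: 0.2 <= theta[0, 0] <= 0.8
    lambda t: 0.2 <= t[0, 0] <= 0.8,
    # (3) intra-distribution: same j, different k
    lambda t: t[0, 0] > t[0, 1],
    # (4) cross-distribution: same k, different j
    lambda t: t[0, 0] > t[1, 0],
    # (6) approximate equality: |theta_ijk - theta_i'j'k'| <= epsilon
    lambda t: abs(t[0, 1] - t[1, 0]) <= 0.15,
]

print(all(c(theta) for c in constraints))  # True: theta is feasible
```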

E. HOW TO OBTAIN THE PARAMETER CONSTRAINTS
The parameter constraints come from expert knowledge or domain knowledge. For example, smokers are more likely to get cancer than nonsmokers. Let C = 1 denote that a person has cancer and C = 0 that the person does not, and let S = 1 denote that a person smokes and S = 0 that the person does not. This domain knowledge can be expressed as the cross-distribution constraint

P(C = 1 | S = 1) > P(C = 1 | S = 0).

III. CENTER POINT OF A CONSTRAINED PARAMETER REGION
A. ANALYSIS OF LOW-DIMENSIONAL CONSTRAINED PARAMETER REGIONS
We visualize two-dimensional (2D) and three-dimensional (3D) constrained parameter regions to analyze the functionality of constrained parameter regions in the BN parameter learning process. For the 2D constrained parameter condition, it is assumed that the BN parameters θ_i (i = 1, 2) satisfy the constraints in (13), which comprise the axiomatic constraint θ_1 + θ_2 = 1 with 0 ≤ θ_i ≤ 1 together with additional inequality constraints.
The constraints in (13) can be illustrated as shown in Figure 1, where the line segment AC is the constrained region. To determine the center point of line segment AC, the coordinates of points A and C must first be obtained, and the center point is then established as the average value of the two end points. We note from Figure 1 that the actual task of obtaining the coordinates of points A and C is to acquire the boundary points of the constrained region, which is a complex problem, particularly for high-dimensional constrained parameter regions. We address this problem by modeling the acquisition of the boundary points of a constrained region as a constrained optimization problem. Under the 2D constrained parameter condition, the constrained optimization models to obtain the coordinates of points A and C are respectively given as follows:

θ_A = arg min_θ ‖θ − (0, 1)‖², subject to the constraints in (13), (14)
θ_C = arg min_θ ‖θ − (1, 0)‖², subject to the constraints in (13). (15)
The basic process treats the boundary points as the nearest points to the vertices of the constrained parameter region. As shown in (14) and (15), the vertices are the points (0, 1) and (1, 0). Similarly, for the 3D constrained parameter condition, it is assumed that the BN parameters θ_i (i = 1, 2, 3) satisfy the constraints in (16), which comprise the axiomatic constraint θ_1 + θ_2 + θ_3 = 1 with 0 ≤ θ_i ≤ 1 together with additional inequality constraints. The constraints in (16) can be illustrated as shown in Figure 2, where the triangle CDE is the constrained region.
The center point of the triangle CDE is obtained by acquiring the coordinates of points C, D, and E. As in the 2D case, the coordinates of points C, D, and E can be respectively obtained by minimizing the distance to the three vertices of the parameter simplex:

θ_C = arg min_θ ‖θ − (1, 0, 0)‖², subject to the constraints in (16), (17)
θ_D = arg min_θ ‖θ − (0, 1, 0)‖², subject to the constraints in (16), (18)
θ_E = arg min_θ ‖θ − (0, 0, 1)‖², subject to the constraints in (16). (19)
Then, the center point is established as the average value of the three boundary points.
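The 2D and 3D boundary-point models above can be solved with any off-the-shelf constrained optimizer. The sketch below uses scipy.optimize.minimize for the 2D case; the extra inequality θ_1 ≥ θ_2 + 0.2 is an assumed example constraint, not one from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def boundary_point(vertex, extra_ineqs):
    """Feasible point nearest to `vertex`, as in Eqs. (14)-(15):
    minimize ||theta - vertex||^2 on the simplex plus extra constraints."""
    n = len(vertex)
    cons = [{"type": "eq", "fun": lambda t: t.sum() - 1.0}]    # axiomatic
    cons += [{"type": "ineq", "fun": g} for g in extra_ineqs]  # g(t) >= 0
    res = minimize(lambda t: np.sum((t - vertex) ** 2),
                   x0=np.full(n, 1.0 / n), bounds=[(0, 1)] * n,
                   constraints=cons)
    return res.x

ineqs = [lambda t: t[0] - t[1] - 0.2]            # theta_1 >= theta_2 + 0.2
a = boundary_point(np.array([0.0, 1.0]), ineqs)  # nearest to vertex (0, 1)
c = boundary_point(np.array([1.0, 0.0]), ineqs)  # nearest to vertex (1, 0)
print((a + c) / 2)  # center point of segment AC, approx. [0.8, 0.2]
```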

B. APPROXIMATE CENTER POINT CALCULATION FOR HIGH-DIMENSIONAL CONSTRAINED PARAMETER REGIONS
The method proposed in Subsection 3.1 for acquiring the boundary points of a constrained region is now employed to obtain an approximate calculation of the center point of a high-dimensional constrained parameter region. The method is illustrated using the simple BN in Figure 3 with BN parameters θ_i (i = 1, 2, 3, 4). The corresponding parameter constraints D comprise the axiomatic constraints on (θ_1, θ_2) and (θ_3, θ_4) together with the cross-distribution constraint θ_1 > θ_3.
The main task is constructing the objective function to obtain the boundary points of the constrained parameter region for the BN in Figure 3. After obtaining the boundary points of the constrained parameter region using (21)-(24), the center point is defined as the average value of the four boundary points.
Again, the center point is defined as the average value of the four boundary points obtained from (25)-(28). It is noteworthy that, if the cross-distribution constraint θ_1 > θ_3 does not exist, the objective function can be decomposed into low-dimensional objective functions using the local independence of BN parameters. The specific process for the BN in Figure 3 converts the original 4D optimization problem into a 2D optimization problem as follows. The corresponding parameter constraints D_1 are given in (29).
We note from (30)-(33) that the complexity of the original 4D optimization problem is greatly reduced. According to the above processes, the method for approximating the center of a constrained BN parameter region based on basis vectors is given as follows. We define the set of estimators for an n-dimensional BN as θ = (θ_{ij_1k_1}, θ_{ij_1k_2}, . . . , θ_{ij_1k_m}, θ_{ij_2k_1}, . . . , θ_{ij_qk_1}, . . . , θ_n). When the parameter constraints contain types (1)-(4) in Subsection 2.4, the number of child node states is m, the number of parent node configuration states is q, and the standard basis vectors in n-dimensional space are e_1 = (1, 0, 0, . . . , 0), e_2 = (0, 1, 0, . . . , 0), . . . , e_n = (0, 0, 0, . . . , 1). The objective function for obtaining the boundary points of the constrained parameter region then minimizes the distance between θ and each of these vectors subject to the parameter constraints.
We note from Figure 4 that m^q improved vectors are employed to construct the objective function. As such, m^q mq-dimensional objective functions must be solved, which represents a substantial computational burden for obtaining the BN parameters. Therefore, we reduce the computational burden by combining locally decomposed BN parameters, which decomposes (35) into a set of low-dimensional optimization problems. If the cross-distribution constraint does not exist, (35) can be converted to the following form:

θ_range = arg min_θ ‖θ − e_v‖², subject to
0 ≤ θ_{ij_wk_v} ≤ 1, v = 1, 2, . . . , m,
θ_{ij_wk_1} + θ_{ij_wk_2} + · · · + θ_{ij_wk_m} = 1. (38)

Here, w = 1, 2, . . . , q, and mq m-dimensional objective functions must be solved, which reduces the complexity of the proposed algorithm.
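The following sketch illustrates the decomposed form: when no cross-distribution constraint couples the distributions, each of the q parent configurations is handled as an independent m-dimensional problem whose boundary points are the feasible points nearest to the m simplex vertices. The two example constraint sets are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def center_of_region(m, ineqs):
    """Average of the m boundary points (nearest feasible points to the
    simplex vertices e_1, ..., e_m) for one conditional distribution."""
    cons = [{"type": "eq", "fun": lambda t: t.sum() - 1.0}]
    cons += [{"type": "ineq", "fun": g} for g in ineqs]
    points = []
    for v in np.eye(m):                     # vertices e_1, ..., e_m
        res = minimize(lambda t, v=v: np.sum((t - v) ** 2),
                       x0=np.full(m, 1.0 / m), bounds=[(0, 1)] * m,
                       constraints=cons)
        points.append(res.x)
    return np.mean(points, axis=0)

# q = 2 parent configurations, m = 3 child states: two independent
# 3-dimensional problems instead of one joint 6-dimensional problem.
per_config_ineqs = [
    [lambda t: t[0] - t[1]],   # intra-distribution: theta_k1 >= theta_k2
    [lambda t: 0.5 - t[2]],    # range: theta_k3 <= 0.5
]
centers = [center_of_region(3, g) for g in per_config_ineqs]
print(np.round(centers, 3))
```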

IV. IMPROVED QMAP ALGORITHM BASED ON THE CENTER POINT OF THE CONSTRAINED PARAMETER REGION
The present work proposes the constrained maximum a posteriori (CMAP) algorithm for learning BN parameters, which improves the computational performance of the QMAP algorithm by replacing the rejection-acceptance sampling method with the use of the center point of the constrained BN parameter region. The CMAP algorithm has two forms denoted as CMAP-I and CMAP-II, depending upon the method employed for establishing the center point. Here, the CMAP-I algorithm computes the center point using the basis vectors in (33) while the CMAP-II algorithm computes the center point using the improved vectors considering the axiomatic constraint in (35) when cross-distribution constraints are included in the set of constraints or in (36) if no cross-distribution constraints are included.
In general, the parameter learning process of the CMAP algorithm is similar to that of the QMAP algorithm. First, the log-form score function of the QMAP algorithm is decomposed into a prior likelihood P(θ|Ω, G) and a data likelihood P(D|θ, G), where Ω denotes the set of parameter constraints and G denotes the known structure of the BN:

log P(θ|D, Ω, G) = log P(D|θ, G) + log P(θ|Ω, G) − log c,

where c = P(D|Ω, G). The prior likelihood is defined according to the parameter constraints. When sampling prior instances are obtained from the parameter constraints using the rejection-acceptance sampling method, the log prior likelihood is expressed (up to an additive constant) as

log P(θ|Ω, G) = ∑_i ∑_j ∑_k M_ijk log θ_ijk,

where M_ijk is the pseudo prior statistical count for the i-th node when it is in state k and its parent nodes are in state j. The data likelihood is a conventional log-likelihood function given as follows:

log P(D|θ, G) = ∑_i ∑_j ∑_k N_ijk log θ_ijk.
Finally, the maximum estimation of the QMAP score function is obtained as

θ*_ijk = (N_ijk + M_ijk) / (N_ij + M_ij), (43)

where N_ij = ∑_k N_ijk and M_ij = ∑_k M_ijk. Here, M_ijk = A · P(X_i = k, π(X_i) = j|Ω), where the coefficient A can be determined from expert knowledge or via the cross-validation method. P(X_i = k, π(X_i) = j|Ω) is the average value of the parameters sampled from the parameter constraints using the rejection-acceptance sampling strategy, which is computed as follows:

P(X_i = k, π(X_i) = j|Ω) = (1/S) ∑_{s=1}^{S} θ^(s)_ijk,

where S is the number of sampled parameters that satisfy the parameter constraints. P(X_i = k, π(X_i) = j|Ω) can also be treated as the center point of the parameter constraints. Therefore, the CMAP-I and CMAP-II algorithms compute P(X_i = k, π(X_i) = j|Ω) using the proposed methods presented in Subsection 3.2.
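For contrast with the analytical center-point computation, the following sketch shows the rejection-acceptance step that QMAP performs: sample uniformly from the simplex, keep only the constraint-satisfying samples, and average them. The single ordering constraint is an assumed example; the acceptance rate drops quickly as constraints accumulate, which is exactly the inefficiency the CMAP algorithms avoid.

```python
import numpy as np

rng = np.random.default_rng(0)

def qmap_mean(n_states, satisfies, n_samples=10_000):
    """Mean of uniform-simplex samples that satisfy the constraints."""
    samples = rng.dirichlet(np.ones(n_states), size=n_samples)
    accepted = samples[[satisfies(s) for s in samples]]
    return accepted.mean(axis=0), len(accepted)

# Assumed example constraint set: theta_1 > theta_2 > theta_3.
mean, n_accepted = qmap_mean(3, lambda t: t[0] > t[1] > t[2])
print(mean.round(3), n_accepted)  # acceptance count shrinks as constraints grow
```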
The process flow of the CMAP algorithm can be summarized as follows.
Step 1: Count the numbers of observations N ijk and N ij from the dataset.
Step 2: Construct the objective function to obtain the boundary points of the constrained parameter region, and obtain the center point of the region from the boundary points using the method presented in Subsection 3.2.
Step 3: Compute the maximum a posteriori (MAP) estimation using (43) with the best value of A obtained using the cross-validation method, and adopt the center point of the parameter constraints as the average parameter value.
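Putting the three steps together, the sketch below assumes the Dirichlet-interpolation form of (43), θ_ijk = (N_ijk + M_ijk)/(N_ij + M_ij) with M_ijk = A · (center point); the counts, the center point, and A = 5 are illustrative stand-ins for the outputs of Steps 1 and 2 and the cross-validated coefficient.

```python
import numpy as np

def cmap_estimate(counts, center, A):
    """MAP interpolation of data counts and constraint-derived pseudo counts.
    counts: N_ijk with shape (q, m); center: center point of the constrained
    region, same shape; A: equivalent sample size."""
    M = A * center                             # Step 2 output -> pseudo counts
    numer = counts + M                         # N_ijk + M_ijk
    denom = numer.sum(axis=1, keepdims=True)   # N_ij + M_ij
    return numer / denom

counts = np.array([[3.0, 1.0], [0.0, 2.0]])    # Step 1: sparse counts
center = np.array([[0.7, 0.3], [0.4, 0.6]])    # Step 2: illustrative center
print(cmap_estimate(counts, center, A=5).round(3))
```

Note how the zero-count row is pulled toward the center point of the constrained region rather than collapsing to the degenerate ML estimate.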

V. EXPERIMENTS
In the following experiments, we evaluate the performance of eight BN learning algorithms on four typical benchmark BNs varying in size from small to medium, large, and very large. The BN learning algorithms include the ML algorithm, the constrained ML (CML) algorithm, the maximum entropy (ME) algorithm, the constrained ME (CME) algorithm, the MAP algorithm, the QMAP algorithm, and the proposed CMAP-I and CMAP-II algorithms. Note that a flat prior was used in the experiments for the MAP algorithm, and the best value of the coefficient A was initially set between 1 and 20 and determined by the cross-validation method for the QMAP and CMAP algorithms. The BNs used in the experiments are summarized in Table 1. Both the learning accuracy and the learning efficiency of the different algorithms are evaluated. The learning accuracy is evaluated using the Kullback-Leibler (KL) divergence, which expresses the divergence between the true parameters and the learned parameters; therefore, the learning accuracy increases as the KL divergence decreases. The learning efficiency is evaluated according to the required computation time.
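The accuracy measure can be sketched as follows; the row-wise averaging over all conditional distributions is an assumption, as the paper does not spell out its exact weighting.

```python
import numpy as np

def avg_kl(true_cpts, learned_cpts, eps=1e-12):
    """Average KL divergence between true and learned conditional
    distributions; each CPT has shape (q, m) with rows summing to 1."""
    kls = []
    for t, l in zip(true_cpts, learned_cpts):
        kls.append(np.sum(t * np.log((t + eps) / (l + eps)), axis=1))
    return float(np.mean(np.concatenate(kls)))

true_cpt = np.array([[0.7, 0.3], [0.4, 0.6]])
learned  = np.array([[0.722, 0.278], [0.5, 0.5]])
print(round(avg_kl([true_cpt], [learned]), 4))  # lower = more accurate
```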

A. EXPERIMENTAL SETTINGS
The experimental datasets comprised from 20 to 200 entries in increments of 20 to determine the effect of the dataset size on the performance of the eight algorithms. The parameter constraints were generated from the true parameters of the BNs, and the maximum number of constraints for each node was 30. We employed two experimental groups, denoted as constraint groups A and B. Constraint group A conducted BN parameter learning using a set of parameter constraints that included axiomatic constraints, range constraints, intra-distribution constraints, and cross-distribution constraints, while constraint group B employed all the same types of parameter constraints except for the cross-distribution constraints.
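A constraint-generation step of this kind can be sketched as below: range constraints are produced by bracketing the true parameter values, capped at 30 per node. The ±0.1 interval width is an assumption; the paper does not state the interval widths it used.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_range_constraints(true_cpt, width=0.1, max_per_node=30):
    """Return (j, k, lower, upper) range constraints around true values."""
    q, m = true_cpt.shape
    cells = [(j, k) for j in range(q) for k in range(m)]
    order = rng.permutation(len(cells))            # random subset of cells
    return [(j, k,
             max(0.0, true_cpt[j, k] - width),     # clip bounds to [0, 1]
             min(1.0, true_cpt[j, k] + width))
            for j, k in (cells[i] for i in order[:max_per_node])]

true_cpt = np.array([[0.7, 0.3], [0.4, 0.6]])
print(make_range_constraints(true_cpt))
```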
The average KL divergence values and computation times obtained by the eight algorithms for the four BNs under the different dataset sizes with constraint group A are listed in Tables 2 and 3, respectively, where the best results are highlighted in bold.

B. LEARNING ACCURACY ANALYSIS
We note from Table 2 that the learning accuracy of all algorithms increased by different degrees for all BNs as the dataset size increased. In most cases, the QMAP algorithm provided the highest learning accuracy of all algorithms considered. The KL divergence values of the CMAP-I and CMAP-II algorithms were less than twice those of the QMAP algorithm, but were smaller than the KL divergence values of all other algorithms. We also note that the CMAP-II algorithm uniformly obtained higher learning accuracy than the CMAP-I algorithm.

C. LEARNING EFFICIENCY ANALYSIS
We note from Table 3 that the computation time of all algorithms increased as the dataset size increased. The ML and MAP algorithms uniformly required the least amount of computation time. Among the constrained algorithms, however, the proposed CMAP-I and CMAP-II algorithms provided the best learning efficiency, requiring considerably less computation time than the QMAP algorithm. In fact, the QMAP algorithm required more computation time than the CMAP-I algorithm by a factor greater than three.
The KL divergence values and computation times obtained by the eight algorithms for the four BNs under the different dataset sizes with constraint group B are listed in Tables 4 and 5, respectively, where the best results are highlighted in bold. Moreover, the KL divergence values and computation times obtained by the eight algorithms for the Alarm BN are given in Figures 5 and 7, respectively, with respect to dataset size. We note from Figure 5 that the KL divergence values obtained by the QMAP and CMAP-II algorithms are very close. Therefore, these results are reproduced in Figure 6 to more adequately compare the two algorithms.

D. LEARNING ACCURACY ANALYSIS
We note from Table 4 and Figure 5 that the learning accuracy of all algorithms considered increased by different degrees for all BNs as the dataset size increased. In addition, the CMAP-II algorithm generally provided the highest learning accuracy of all algorithms considered. The KL divergence values obtained by the CMAP-II algorithm shown in Figure 5 were very close to those obtained by the QMAP algorithm for the Alarm BN, but closer inspection (Figure 6) indicates that these values were always slightly less than those obtained by the QMAP algorithm. This indicates that the CMAP-II algorithm provided uniformly higher learning accuracy than the QMAP algorithm for the Alarm BN under all dataset sizes. Finally, we note that the learning accuracy of the CMAP-I algorithm was slightly inferior to that of the QMAP algorithm.

E. LEARNING EFFICIENCY ANALYSIS
We note from Table 5 and Figure 7 that the computation time required by all of the algorithms considered increased as the dataset size increased. Again, the ML and ME algorithms uniformly required the least amount of computation time. In addition, the proposed CMAP-II algorithm generally required greater computation time than all other algorithms considered, except for the QMAP algorithm, under the condition of constraint group B; this was also often the case for the proposed CMAP-I algorithm as the dataset size increased. Nevertheless, the CMAP-I and CMAP-II algorithms again required considerably less computation time than the QMAP algorithm. The computation time required by the QMAP algorithm was greater than that of the CMAP-II algorithm by a factor greater than five.
The above results indicate that the proposed CMAP-I and CMAP-II algorithms have higher computational efficiencies than the QMAP algorithm under all conditions considered, and the learning accuracy of the CMAP-II algorithm is only slightly less than that of the QMAP algorithm when cross-distribution constraints are included within the constraint set. However, the CMAP-II algorithm has the highest learning accuracy of all the existing algorithms when cross-distribution constraints are not included within the constraint set.

VI. CONCLUSION
The rejection-acceptance sampling method employed in the QMAP algorithm for determining average BN parameter values is time-consuming, particularly when the number of parameter constraints is large. Therefore, this paper proposed a new analytical method for determining the average value of BN parameters by applying the center point of the constrained parameter region, which is a far more computationally efficient method. Experiments involving four benchmark BNs demonstrated that the proposed CMAP-I and CMAP-II algorithms have higher computational efficiency than the QMAP algorithm under all experimental conditions considered, and the learning accuracy of the CMAP-II algorithm is only slightly less than that of the QMAP algorithm when cross-distribution constraints are included within the constraint set. The CMAP-II algorithm has the highest learning accuracy of all existing algorithms when cross-distribution constraints are not included within the constraint set, and its computational efficiency is more than five times greater than that of the QMAP algorithm under this condition. The future work on this topic will be divided into two parts. One part is to improve the performance of the proposed algorithm.
This includes the following points: (1) develop a more efficient solution engine for solving the objective function of the boundary points; (2) design a more appropriate vector set to obtain the objective function of the boundary points; and (3) introduce more types of parameter constraints into the proposed algorithm. The other part is to apply the proposed algorithm to practical problems that can be modeled with Bayesian networks, such as fault diagnosis, genetic engineering, image processing, intelligent decision-making, and other problems.