Prior-Knowledge-Driven Local Causal Structure Learning and Its Application on Causal Discovery Between Type 2 Diabetes and Bone Mineral Density

Type 2 diabetes (T2DM), one of the most prevalent chronic diseases, affects the glucose metabolism of the human body, which decreases the quality of life and places a heavy burden on social medical care. Patients with T2DM are more likely to suffer bone fragility fractures because diabetes affects bone mineral density (BMD). However, identifying the determinant factors of BMD through medical studies is expensive and time-consuming. In this paper, we propose a novel algorithm, Prior-Knowledge-driven local Causal structure Learning (PKCL), to discover the underlying causal mechanism between BMD and its factors from clinical data. Since medical data are limited while prior knowledge is abundant, PKCL utilizes the prior knowledge to mine the local causal structure for the target relationship. By combining medical prior knowledge with the discovered causal relationships, PKCL can achieve more reliable results without long-standing medical statistical experiments. Extensive experiments are conducted on a newly provided clinical data set. The results of PKCL on these data correspond closely with existing medical knowledge, which demonstrates the superiority and effectiveness of PKCL. To illustrate the importance of prior knowledge, the result of the algorithm without prior knowledge is also investigated.


I. INTRODUCTION
Diabetes mellitus is one of the most common chronic diseases, characterized by high levels of blood glucose, and type 2 diabetes mellitus (T2DM) is its most frequent subtype. T2DM and its complications cause a variety of health problems and bring heavy economic burdens to individuals worldwide [1]. Osteoporosis is a common skeletal system disease characterized by decreased bone density and deterioration of the normal bone microstructure, predisposing to an increased risk of bone fracture [2]. Osteoporosis leads to a decrease in physical function and an impairment of quality of life. Moreover, bone fracture due to osteoporosis causes increased disability and mortality rates and a great economic burden on family and society [3].
The associate editor coordinating the review of this manuscript and approving it for publication was György Eigner.
Measurement of bone mineral density by dual X-ray absorptiometry (DXA) is the most commonly used approach to diagnose osteoporosis [4]. Decreased BMD reflects a reduction in bone strength that is closely linked to increased bone fracture risk. Osteoporosis-related bone fracture frequently occurs in patients with T2DM [2], [5]. Notably, although patients with T2DM have a higher risk of osteoporosis-related bone fracture than non-diabetic individuals, their BMD is not necessarily lower [6], [7]. As suggested in a recent meta-analysis by Vestergaard, BMD is even higher in patients with T2DM than in non-diabetic individuals [8].
Many factors affect BMD in diabetic conditions. Traditional large longitudinal prospective studies are helpful for unraveling the determinant factors of BMD in T2DM. However, such studies are so expensive in terms of cost and time that they can rarely reach a conclusion within a short period. In addition, studies on the determinants of BMD in T2DM require complicated data analysis and data processing due to the complexity and complications of T2DM. Existing methods for finding the relationship between risk factors and BMD mostly rely on experts' knowledge and manual analysis of clinical data, which is time-consuming and not cost-effective. Furthermore, they cannot identify the underlying causal mechanism between risk factors and BMD in T2DM.
To automatically identify the risk factors of BMD and discover the underlying causal mechanism among them, intelligent algorithms should be developed. Traditionally, Bayesian network (BN) structure learning algorithms can learn the causal mechanism from data. However, in the medical field, the number of clinical samples is not sufficient for a BN structure learning algorithm to discover the real underlying causal mechanism. Moreover, as BMD is affected by numerous factors, traditional BN structure learning algorithms cannot be applied to such a large number of factors. Considering that much existing medical knowledge is not exploited, this paper proposes a new BN structure learning algorithm (PKCL), which learns the underlying causal mechanism between BMD and its factors while incorporating rich existing prior knowledge. Because prior knowledge is incorporated when learning the BN structure, some of the parameters of the model are determined by the prior knowledge. Thus, PKCL can deal with a large number of factors at reduced cost. Benefiting from prior knowledge, PKCL provides insight into complicated diseases and offers useful information for clinical experiments. Our contributions are summarized in the following three aspects: 1) Aiming at clinical data with scarce samples but abundant prior knowledge, a new framework is presented to learn a more accurate model. 2) A structure learning algorithm, PKCL, is proposed to utilize the prior knowledge as well as the causal information to detect the causal relationships in clinical data. 3) We summarize the prior knowledge of experts about BMD and its risk factors. Conditioned on that, we discover the underlying causal mechanism between BMD and risk factors.

II. RELATED WORK
It is accepted that patients with T2DM have a higher risk of osteoporosis-related bone fracture than those without diabetes [5], [9], [10]. Measurement of BMD is the gold standard for diagnosing osteoporosis. Nevertheless, whether BMD decreases in T2DM remains controversial according to current clinical studies. A number of factors affect BMD in diabetic conditions, such as sex, body mass index (BMI), insulin, and glucose. The prevalence of higher BMD in T2DM is similar in men and women across racial and ethnic groups including Mexican American, white, and black people [11]-[13]. BMI is strongly associated with BMD in T2DM and might explain, in part, the higher BMD in T2DM compared with non-diabetic individuals [14]. Insulin resistance and hyperinsulinemia, which are characteristics of T2DM, have effects on bone metabolism. High levels of circulating insulin may contribute to high BMD, and there is evidence in preclinical models that altered insulin levels and insulin resistance affect bone remodeling via direct effects on osteoblasts, osteoclasts, and osteocytes, all of which express insulin receptors [15]. Hyperglycemia is associated with the accumulation of advanced glycation end-products (AGEs) in the bone matrix, and AGEs inhibit bone formation, an effect mediated at least in part by increased osteocyte sclerostin production [16], [17]. Given that the determinants of BMD are complicated, deriving causality will help to elucidate the causes of BMD changes in T2DM, which is beneficial for preventing and treating osteoporosis-related bone fracture in T2DM.
However, the selection of the most relevant risk factors has rarely been studied. The existing approaches mainly depend on the analysis and experience of experts, which is neither cost-effective nor time-efficient. In addition, they cannot analyze the risk factors of a complicated disease from a data perspective.
In recent studies, feature selection (FS) has been applied to several tasks including classification, regression, and clustering. A number of FS methods [18]-[21], which exploit different criteria to select the most informative features, have been proposed in the literature. They can roughly be divided into three classes: filter, wrapper, and embedding methods [22]. However, these three classes cannot discover the underlying causal relationship between features and targets. Moreover, their FS criteria lack a theoretical proof of optimality. Markov Blanket (MB) algorithms have been shown to outperform traditional FS algorithms, as the MB is proved to be the optimal feature subset [23]-[25]. In addition, MB algorithms can discover the underlying causal mechanisms of the selected features by combining causal feature selection and causal discovery.
Generally, MB discovery can be grouped into two main types: nontopology-based and topology-based.
Nontopology-based MB algorithms exploit independence tests between feature variables and target variables to discover the MB heuristically. Koller and Sahami (KS) [26] first proposed an approximate algorithm to find the MB, which minimizes the cross-entropy loss by pruning out redundant variables in a backward way. Because KS is unsound, many nontopology-based algorithms have been proposed to improve on it. The Grow-Shrink algorithm (GS) [27] first tests variables, sorted by their mutual information with the target variable, and adds them to the MB set in the growing stage. Then the shrinking stage eliminates false-positive nodes from the MB set. Based on GS, the incremental association MB algorithm (IAMB) [28] improves the performance of GS by re-sorting the variables each time the MB set changes. After that, numerous variants of IAMB have been proposed, including IAMBnPC, inter-IAMB, and KIAMB [29]. However, as the number of variables grows, the number of samples needed grows exponentially. If the sample data are insufficient, the performance of IAMB and its variants degrades.
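The grow and shrink phases shared by GS and IAMB can be sketched as follows. This is a minimal illustration, not the implementation of [27] or [28]: the function name `iamb`, the `assoc` callback (standing in for any conditional association measure, such as conditional mutual information), the `threshold` parameter, and the hand-coded toy association table are all assumptions made for this example.

```python
def iamb(target, variables, assoc, threshold=0.0):
    """IAMB-style Markov blanket search (illustrative sketch).

    assoc(x, target, cond) is a hypothetical callback returning the strength
    of association between x and target given the set cond; a value at or
    below `threshold` is treated as conditional independence.
    """
    mb = []
    candidates = [v for v in variables if v != target]

    # Growing phase: re-rank the remaining variables every time the MB changes.
    while True:
        remaining = [v for v in candidates if v not in mb]
        if not remaining:
            break
        best = max(remaining, key=lambda v: assoc(v, target, frozenset(mb)))
        if assoc(best, target, frozenset(mb)) <= threshold:
            break
        mb.append(best)

    # Shrinking phase: drop variables that are independent of the target
    # given the rest of the candidate blanket (false positives).
    for v in list(mb):
        rest = frozenset(mb) - {v}
        if assoc(v, target, rest) <= threshold:
            mb.remove(v)
    return set(mb)


def toy_assoc(x, target, cond):
    """Hand-coded associations for the toy DAG A -> T -> B <- S, with C unrelated."""
    if x == "C":
        return 0.0
    if x == "S":  # a spouse is only dependent on T once the common child B is given
        return 0.3 if "B" in cond else 0.0
    return {"A": 0.5, "B": 0.8}[x]


print(sorted(iamb("T", ["T", "A", "B", "S", "C"], toy_assoc)))
```

In the toy network the spouse S is only picked up after the child B has entered the candidate set, which is why the association is re-evaluated after every change.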
Because data are limited in real-world applications, topology-based methods have been proposed to improve data efficiency while keeping a reasonable time cost. Min-max MB (MMMB) [30] discovers the MB set by finding the parents-and-children set first and then finding the spouses, so that the required sample size depends on the structure of the Directed Acyclic Graph (DAG) rather than on the number of variables. Although MMMB was later proved to be unsound [31], its two-step discovery of the MB set is the foundation of the methods that followed. HITON-MB [31] inherits the framework of MMMB and interleaves the two steps to exclude false positives from the parents and children (PC) set as early as possible, which decreases the number of independence tests (ITs) needed later. However, both MMMB and HITON-MB are unsound due to incorrect PC discovery. The parents and children-based MB algorithm (PCMB) [29], the first sound topology-based MB algorithm, which uses a double-check strategy to fix the errors in PC discovery, was then introduced by Pena et al.
After that, the iterative parents and children-based MB (IPCMB) algorithm [32] was proposed based on PCMB to discover the PC set more efficiently. Recently, the Simultaneous MB algorithm (STMB) [33] was developed to improve the time efficiency of MB algorithms by exploiting the coexistence property between descendants and spouses.
Although MB algorithms can discover the underlying causal mechanism between variables and targets, they cannot recognize the direction of the dependency. With BN structure learning, a DAG over all nodes can be constructed using the local MB sets. One approach to learning the BN structure is constraint-based, which discovers the arcs between each pair of nodes by conditional independence (CI) tests. However, the number of CI tests needed grows exponentially with the number of nodes. Moreover, as each CI test is computed based on the results of earlier ones, errors inevitably escalate. Another approach to learning the BN structure is score-based.
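To make the exponential growth of CI tests concrete, the following worked count considers the worst case of an exhaustive constraint-based search, in which every unordered pair of nodes is tested against every subset of the remaining nodes. The symbol N_CI and the choice n = 10 are introduced here only for illustration; practical algorithms bound the conditioning-set size and need far fewer tests.

```latex
% Worst-case count for an exhaustive constraint-based search over n nodes.
% Each unordered pair (X, Y) can be conditioned on any subset of the other n-2 nodes.
\[
  N_{\mathrm{CI}}(n) \;=\; \binom{n}{2}\, 2^{\,n-2},
  \qquad
  N_{\mathrm{CI}}(10) \;=\; 45 \times 256 \;=\; 11520 .
\]
```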

III. NOTATION AND DEFINITION
Let capital letters denote variables (such as M , N ), lower-case letters (such as m, n) denote the value of random variables and capital bold italic (such as M, N) denote variable sets.
Definition 1 (Bayesian Network [34]): Formally, a Bayesian Network is a triplet < G, P, U >, which denotes a joint probability distribution P over a random variable set U and can be represented by a DAG where each node corresponds to a random variable.
If there is an arc from M to N, that is, M → N ∈ G, then M is said to be a parent of N and N is a child of M. In addition, if M is a parent or child of N, they are said to be neighbors. Nodes M and N are said to be spouses of each other if they have a common child and there is no arc between M and N. If there is a directed path from M to N in G, then N is a descendant of M. The descendants and the parents of X are denoted as Des(X) and Pa(X). Further, we use H_G(X) and S_G(X) to denote the neighbors and the spouses of node X in G.
Definition 2 (Markov Condition [34]): Every node in the BN is independent of its nondescendant nodes, given its parents. Thus, for a BN < G, P, U > with U = {X_1, ..., X_n}, according to the Markov Condition, the joint probability P can be decomposed into the product of a series of conditional probabilities: $P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$.
Definition 3 (V-Structure [34]): Three nodes M, X, and N are said to form a V-structure if there are two arcs from M and N to X, and M is not adjacent to N.
X is said to be a collider if X has two incoming arcs from M and N, no matter whether M and N are adjacent or not. On the condition that M and N are not adjacent, we say X is an unshielded collider for the path from M to N.
Definition 4 (Blocked Path [34]): A path from a node in M to a node in N is said to be blocked by a variable set Z iff the path: 1) contains a head-to-tail (M → X → N) or tail-to-tail (M ← X → N) chain with X ∈ Z, or 2) contains a head-to-head (M → X ← N) chain, where X ∉ Z and no node in Des(X) is in Z.
Definition 5 (d-Separation [34]): If all paths from M to N are blocked by Z, then Z is said to d-separate M and N, denoted as Dsep(M, N | Z).
Definition 6 (Faithfulness Condition [34]): Given a BN < G, P, U >, P and G are faithful to each other iff all and only the conditional independencies true in P are entailed by G.
Definition 7 (Markov Blanket [34]): Formally, given the MB of a target node T, denoted as MB(T), T is independent of all other nodes in U \ MB(T) conditioned on MB(T).
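Definitions 4 and 5 can be checked mechanically on a small DAG. The sketch below does not enumerate paths; it uses the moralized ancestral graph criterion, a standard test that is equivalent to the path-blocking definition. The function names `ancestors` and `d_separated` and the edge-list representation of the DAG are assumptions made for this illustration.

```python
def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents.get(n, ()))
    return result


def d_separated(edges, x, y, z):
    """True if the set z d-separates x and y in the DAG given as (parent, child) pairs.

    Assumes x and y are distinct nodes that are not members of z.
    """
    z = set(z)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # 1) Keep only x, y, z and their ancestors.
    keep = ancestors(parents, {x, y} | z)

    # 2) Moralize: link every node to its parents and marry co-parents.
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            adj[p].add(child)
            adj[child].add(p)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # 3) Remove z and look for any remaining undirected path from x to y.
    seen, stack = {x}, [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        for m in adj[n] - z - seen:
            seen.add(m)
            stack.append(m)
    return True


chain = [("M", "X"), ("X", "N")]                  # M -> X -> N
collider = [("M", "X"), ("N", "X"), ("X", "D")]   # M -> X <- N, X -> D
print(d_separated(chain, "M", "N", {"X"}), d_separated(collider, "M", "N", {"D"}))
```

Conditioning on the middle node blocks the chain, whereas conditioning on the collider or on one of its descendants opens the head-to-head path, as stated in Definition 4.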
Definition 8 (PCMasking [35]): Let PC(X) denote the PC set of variable X. Let PC_1 and PC_2 denote two subsets of PC(X) with PC_1 ∩ PC_2 = ∅. PC_1 and PC_2 are PCMasking for variable X if PC(X) and PC_1 are called MaskingPCs.
Theorem 9 (MB Uniqueness [34]): Given a BN < G, P, U >, if P and G are faithful to each other, then MB(T), T ∈ U, is unique and is the node set of neighbors H(T) and spouses S(T). In addition, H(T) is also unique.
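When the DAG is known, the decomposition of MB(T) into neighbors and spouses in Theorem 9 can be read directly off the graph. The following sketch assumes the DAG is given as a list of (parent, child) pairs; the function name `markov_blanket` and the toy graph are hypothetical and only serve to illustrate the definition.

```python
def markov_blanket(edges, target):
    """Return (neighbors, spouses, blanket) of `target` in a DAG of (parent, child) pairs."""
    parents = {u for u, v in edges if v == target}
    children = {v for u, v in edges if u == target}
    neighbors = parents | children
    # Spouses: other parents of the target's children that are not already neighbors.
    spouses = {u for u, v in edges if v in children and u != target} - neighbors
    return neighbors, spouses, neighbors | spouses


# Toy DAG: C -> A -> T -> B <- S, B -> D
toy_edges = [("C", "A"), ("A", "T"), ("T", "B"), ("S", "B"), ("B", "D")]
neighbors, spouses, blanket = markov_blanket(toy_edges, "T")
print(sorted(neighbors), sorted(spouses), sorted(blanket))
```

The grandparent C and the grandchild D are excluded: given A, B, and S, they carry no further information about T.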

IV. METHODS
In this section, we propose a BN structure learning algorithm driven by prior knowledge. Section IV-A gives an overview of the structure of PKCL, and Sections IV-B and IV-C describe its two stages.

A. OVERVIEW
In real-world applications, the number of samples is limited while the number of features is large. If a structure learning (SL) algorithm is applied directly to such a limited data set, the output DAG can hardly reflect the real underlying causal mechanism among variables. Meanwhile, the existing SL algorithms ignore the significance of experts' prior knowledge, which leads to poor performance. Motivated to combine SL with the experts' prior knowledge, we propose an SL algorithm that learns the BN structure and adds the prior knowledge simultaneously to build a global structure.
The PKCL algorithm works in two phases: the local stage and the global stage. The pseudo-code of PKCL is given in Algorithm 1. In the local stage (lines 1-4 of Algorithm 1), PKCL first discovers the neighbors of the target variables and then detects the MaskingPCs to eliminate their effect. After that, it finds the spouses of the target variables using the neighbor sets. Thus, the skeleton of the BN is constructed. The details of this stage are discussed in Section IV-B.
In the global stage (lines 6-9 of Algorithm 1), PKCL leverages the MB sets learned in the local stage to learn the global BN structure, with prior knowledge incorporated to guide the global learning phase. Specifically, it learns the causal direction between feature variables and target variables by combining the constraint-based method and the score-based method. Moreover, during learning, it automatically adds causal directions according to the prior knowledge. The details of this stage are discussed in Section IV-C.

B. THE LOCAL STAGE
In this stage, we present the cross-check and complement MB discovery algorithm (CCMB) [35]. CCMB is a topology-based MB discovery algorithm. Different from previous algorithms, it discovers the MB set of a node while repairing incorrect conditional independence (CI) tests by eliminating the PCMasking phenomenon.
The pseudo-code of CCMB is given in Algorithm 2. Specifically, it works in the following three steps.
Step1: Discover the neighbors of node T. For each target node T in T, the algorithm works in three phases: first, find the potential neighbor set H(T) of T; then score and rank the potential neighbors to choose the best one from H(T); and finally prune out the false variables.
Algorithm 2 (fragment, cross-check condition): if T ∈ FindNeighbors(D, X) and X ∉ H(T) then
Step2 (Lines 4-11 of Algorithm 2): Prune out the MaskingPCs of node T. Compared to other MB algorithms, this step is the key point that makes CCMB outperform them. CCMB exploits a cross-check method (lines 4-8) to discover the MaskingPCs and appends them to the PCMasking table PCM in the format [T, X], where T denotes the target variable and X denotes the cross-checked variable. Specifically, if T is a neighbor of X while X is not a neighbor of T, the cross-check method takes X and T as MaskingPCs because of the asymmetry between them.
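The asymmetry test at the heart of the cross-check can be sketched as follows. This is only an illustration of the condition described above, not the CCMB implementation of [35]: the function name `cross_check` and the dictionary of already-computed neighbor sets are assumptions, and the subsequent pruning and complement steps are not shown.

```python
def cross_check(neighbors):
    """Collect [T, X] pairs where T is listed as a neighbor of X but X is not listed for T.

    neighbors: dict mapping each variable to the neighbor set found for it
    by a (hypothetical) local neighbor-discovery routine.
    """
    pcm = []  # PCMasking table, entries in the format (T, X)
    for t in sorted(neighbors):
        for x in sorted(neighbors):
            if x != t and t in neighbors[x] and x not in neighbors[t]:
                pcm.append((t, x))
    return pcm


# Toy result of neighbor discovery: X reports T as a neighbor, but T does not report X.
found = {"T": {"A"}, "A": {"T"}, "X": {"T"}}
print(cross_check(found))
```

Symmetric pairs such as T and A pass the check and are left untouched; only the asymmetric pair is recorded.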
Step3 (Algorithm 4): Discover the spouses of node T. The pseudo-code of this step is given in Algorithm 4. If Y is a neighbor of T and X is a neighbor of Y, then find a node subset Z conditioned on which T is independent of X.

C. THE GLOBAL STAGE
After the MB sets are discovered, the local information can be integrated to obtain the structure of the DAG. Traditionally, the next step is to determine the direction of the edges, and thus the underlying causal mechanism is learned. However, this way of learning the DAG depends entirely on the clinical data, which means that a great deal of knowledge in the medical field is ignored. As a result, some causal relationships learned only from clinical data conflict with medical knowledge, and some causal relationships already established in the medical literature cannot be learned.
PKCL learns the structure of a DAG between nodes by leveraging the MB sets discovered in the local stage. Different from other structure learning methods, the learning process of PKCL is guided by the prior knowledge of experts, which means that PKCL works in a more data-efficient way while maintaining superior performance. Specifically, PKCL first discovers the colliders to construct the overall DAG. If no collider is discovered, we use a heuristic method with the constraints of the MB sets and prior rules to construct the DAG of the underlying BN. Here, the heuristic algorithm is steepest-ascent hill climbing with a TABU [36] list of the last 100 structures and a stopping criterion of 15 steps without improvement in the maximum score.
Table 1. The prior knowledge used in the experiment is that features 3, 5, 6, 7, and 8 are the causes of the six BMDs. Before the experiments, the data should be discretized. Here we use a packaged toolkit, Causal Explorer [37]. A detailed description is given in the appendix.
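The global-stage search described in Section IV-C (steepest-ascent hill climbing with a tabu list of the last 100 structures, stopped after 15 steps without improvement) can be sketched as below. This is an illustrative sketch under stated assumptions, not the PKCL implementation: prior knowledge is modeled as sets of `required` and `forbidden` arcs, the MB constraints are omitted, and the `score` callback is a placeholder for whatever network score is used. In the demonstration, a toy score simply counts disagreements with a hypothetical target graph.

```python
from collections import deque
from itertools import permutations


def is_acyclic(nodes, edges):
    """Kahn's algorithm: True if the directed edge set contains no cycle."""
    indeg = {n: 0 for n in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    visited = 0
    while queue:
        n = queue.pop()
        visited += 1
        for u, v in edges:
            if u == n:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
    return visited == len(nodes)


def neighbors_of(nodes, edges, required, forbidden):
    """Yield every DAG that is one arc addition, deletion, or reversal away."""
    for u, v in permutations(nodes, 2):
        e = (u, v)
        if e in edges:
            if e in required:          # prior knowledge: this arc must stay
                continue
            cand = [edges - {e}]       # deletion
            if (v, u) not in forbidden:
                cand.append((edges - {e}) | {(v, u)})  # reversal
        elif (v, u) not in edges and e not in forbidden:
            cand = [edges | {e}]       # addition
        else:
            cand = []
        for c in cand:
            if is_acyclic(nodes, c):
                yield frozenset(c)


def tabu_search(nodes, score, required=frozenset(), forbidden=frozenset(),
                tabu_size=100, patience=15):
    """Steepest-ascent hill climbing with a tabu list of recent structures."""
    current = frozenset(required)      # assumed to be acyclic
    best, best_score = current, score(current)
    tabu = deque([current], maxlen=tabu_size)
    stale = 0
    while stale < patience:
        moves = [g for g in neighbors_of(nodes, current, required, forbidden)
                 if g not in tabu]
        if not moves:
            break
        current = max(moves, key=score)  # best non-tabu neighbor, even if worse
        tabu.append(current)
        if score(current) > best_score:
            best, best_score, stale = current, score(current), 0
        else:
            stale += 1
    return best, best_score


# Toy example: the score penalizes every arc that differs from a hypothetical target DAG.
nodes = ["age", "glucose", "BMD"]
target = frozenset({("age", "BMD"), ("glucose", "BMD"), ("age", "glucose")})
best, best_score = tabu_search(nodes, lambda g: -len(g ^ target),
                               required=frozenset({("age", "BMD")}))
print(sorted(best), best_score)
```

Accepting the best non-tabu neighbor even when it is worse than the current structure is what lets the search leave local optima, while the `patience` counter bounds how long it may wander without finding a better score.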

B. QUALITY ANALYSIS OF SELECTED FEATURES
The overall experiment comprises two stages. To demonstrate the superiority of PKCL, each stage of PKCL is analyzed. In the local stage, four traditional feature selection algorithms and four MB algorithms are also applied to the clinical data set. To evaluate the quality of the selected features, five classifiers are trained on the features selected by each of the nine algorithms and the prediction accuracy is computed. In the global stage, to demonstrate that the causal relationships learned with prior knowledge are more reasonable than those learned without it, both the DAG with prior knowledge and the DAG without prior knowledge are learned.
At first, we randomly select 400 samples from the dataset to run the CCMB algorithm, four other MB algorithms, and four traditional feature selection algorithms, namely IAMB [28], PCMB [29], MBOR, STMB [33], mRMR, Fisher, FCBF, and RFS. Then, in order to demonstrate the superiority of PKCL, five classifiers, i.e., Support Vector Machine (SVM), k-Nearest Neighbors (kNN), AdaBoost, Random Forest (RF), and Naive Bayes (NB), are trained with the selected features. In addition, the classifiers are also trained with the original features as a baseline, which shows whether the feature selection algorithms improve the prediction accuracy of the classifiers by extracting the informative features. The value of k is set to 10 in the kNN classifier. Lastly, the remaining 100 samples are used as testing data to evaluate the quality of the selected features.
The experimental results on the six BMDs are listed in Table 3. As Table 3 shows, when the label is BMD1, BMD2, or BMD5, the five classifiers using the features selected by PKCL achieve the best prediction accuracy. When the label is BMD3, BMD4, or BMD6, SVM, kNN, AdaBoost, and Random Forest also achieve the best prediction accuracy using the features selected by PKCL. Although Naive Bayes achieves its best prediction accuracy using the features selected by mRMR, its result with the features selected by PKCL is still competitive. Specifically, the five classifiers achieve a 0.9-19.2% improvement in prediction accuracy compared with using all features, which is a significant improvement. In addition, the selected features are the input of the global stage of PKCL; the more informative the selected features are, the more reasonable the learned causal mechanism will be.
Neighbor-discovery pseudo-code (fragment):
11: for all X, Y ∈ PH(T) do
12: if X ⊥ Y and T ⊥ X | Y then
...
25: end if
26: end for
27: end while
28: end for
29: return the neighbor sets H of the nodes in T

C. LEARNING THE DAG WITH PRIOR KNOWLEDGE
To illustrate the significance of prior knowledge, the DAG that does not incorporate prior knowledge is also learned. The overall DAG learned with prior knowledge and the overall DAG learned without prior knowledge are presented in the appendix. Here only the six BMDs concerned are analyzed. To gain insight into the differences between the two DAGs, we first analyze the DAG that incorporates prior knowledge, then analyze the DAG that does not, and finally discuss in detail the superiority of the former and the inaccuracy of the latter.
Global-stage pseudo-code (fragment):
3: for all Y ∈ MB(X) do
4: for all Z that is a common child of X and Y do
5: if possible without introducing cycles and satisfying prior knowledge R then
6: add XY and YZ to G
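The guarded orientation step of the global stage, in which arcs are added only if they introduce no cycle and satisfy the prior knowledge, can be illustrated with the sketch below. It is a simplified reading under explicit assumptions: each collider is given as a triple (x, z, y) proposing the arcs x → z and y → z, and the prior knowledge R is modeled as a set of `forbidden` arcs. The function names and the toy variables are hypothetical.

```python
def creates_cycle(edges, u, v):
    """True if adding u -> v would close a directed cycle, i.e. v already reaches u."""
    stack, seen = [v], set()
    while stack:
        n = stack.pop()
        if n == u:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(b for a, b in edges if a == n)
    return False


def orient_colliders(colliders, forbidden=frozenset()):
    """Add x -> z <- y for each proposed triple unless it breaks acyclicity or prior knowledge."""
    edges = set()
    for x, z, y in colliders:
        proposed = [(x, z), (y, z)]
        if any(e in forbidden for e in proposed):
            continue                      # contradicts prior knowledge
        trial, ok = set(edges), True
        for u, v in proposed:
            if (u, v) not in trial and creates_cycle(trial, u, v):
                ok = False                # would introduce a cycle
                break
            trial.add((u, v))
        if ok:
            edges = trial                 # accept both arcs together
    return edges


# Assumed prior knowledge for the toy example: BMD is not a cause of weight.
prior_forbidden = frozenset({("BMD", "weight")})
proposals = [("age", "BMD", "glucose"), ("BMD", "weight", "sex"), ("BMD", "age", "sex")]
print(sorted(orient_colliders(proposals, prior_forbidden)))
```

In this toy run the second proposal is rejected by the prior knowledge and the third by the acyclicity check, so only the arcs into BMD are kept.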
The DAG incorporating prior knowledge is analyzed as follows. As Figure 1 shows, no BMD has an effect on any feature, which means that the BMDs reflect the combined effects of some risk factors and do not themselves affect any risk factor. In addition, features 1, 2, 3, 5, 6, 7, 8, 11, 12, 15, 28, 29, and 33 are the common causes of all six BMDs, which means these features have an underlying effect on the decrease of bone mineral density.
25, 32. BMD6 has no effect on any feature.

D. DISCUSSION
As analyzed above, PKCL can discover the underlying causal mechanism between BMDs and their related risk factors. Here some causal relationships that have already been discovered in the clinical field are discussed, which demonstrates the superiority of PKCL. In addition, the new causal mechanisms found by PKCL provide insight into the relationship between BMDs and their factors, which may contribute to the prevention and treatment of diabetes-related osteoporosis.
A report shows that 1 out of 3 women and 1 out of 5 men over 50 years old will experience an osteoporotic fracture at some point in their life [38]. Patients with T2DM, one of the most common chronic diseases, suffer from an increased risk of osteoporosis-related bone fracture, which places a heavy burden on individuals. BMD is the gold standard for diagnosing osteoporosis. However, the causal chain involving BMD and T2DM is not clear.
In elderly diabetic individuals, AGEs may inhibit the phenotypic expression of osteoblasts and promote osteoblast apoptosis, thereby contributing to the deficiency in bone formation [39], [40]. AGEs also increase osteoclast-induced bone resorption. The study by Zhou et al. has indicated that increasing age is a more important risk factor for bone mineral loss in patients with T2DM than diabetes duration [41]. The report by Wang et al. has indicated that adverse changes in the collagen network occur with aging and that such changes may lead to decreased toughness of the bone [42]. Moreover, the porosity of the bone significantly increases with aging and correlates with bone strength and stiffness [42]. Therefore, BMD negatively correlates with aging. After bone mass reaches a peak in the third or fourth decade of life, vertebral bone mass and density decrease with aging in both females and males [43]. Moreover, AGEs accumulate in the bone with aging, increasing by 4 to 10 fold by the age of 50 years [44]. As discussed below, increased levels of AGEs in bone tissues have been shown to be associated with diminished bone mechanical function and reduced cortical and trabecular bone strength. Additionally, age-related bone loss is associated with abnormalities in vitamin D status. Reduced serum levels of the vitamin D metabolites 25-hydroxyvitamin D [25(OH)D] and 1α,25-(OH)2-D occur with aging in both sexes [45], [46]. Nutritional vitamin D deficiency may contribute to secondary hyperparathyroidism and bone loss with aging, since decreases in serum 25(OH)D levels correlate inversely with serum parathyroid hormone levels and positively with BMD [47].
As indicated in this study, hyperglycemia is another important factor determining BMD in patients with T2DM. This can be explained as follows. Firstly, diabetes has been shown to cause decreased osteoclastogenesis, reduced bone formation, and enhanced osteoblast apoptosis in a bone-loss mouse model [48]. Secondly, hyperglycemia leads to glycosuria, which results in a loss of calcium. Hypercalciuria, presenting with a raised glomerular filtration rate, reduces calcium reabsorption and impairs bone deposition in diabetic rats [49]. Hypercalciuria decreases the level of calcium in the bone, leading to poor bone quality [50]. Some reports indicate that hypercalciuria in patients with uncontrolled blood glucose could stimulate parathyroid hormone secretion, which may contribute to the development of osteopenia [51]. Thirdly, hyperglycemia is known to generate higher concentrations of AGEs in collagen [52]. AGEs have been shown to be associated with decreased strength in human cadaver femurs [10]. The combination of the accumulation of AGEs in bone collagen and lower bone turnover may contribute to reduced bone strength for a given BMD in diabetes [53]. AGEs and oxidative stress produced by hyperglycemia may reduce beneficial enzymatic cross-linking, inhibit osteoblast differentiation, and induce osteoblast apoptosis [50].
Height, weight, sex, and obesity are also factors affecting BMD in T2DM. As a Korean population-based study reported, sex affects BMD. The difference in BMD distribution at the same skeletal site may be partially explained by distinctive endocrine and paracrine factors between the two sexes [54]. It has been suggested that bone loss in elderly men is mostly a result of decreased bone formation, whereas bone loss in postmenopausal women is a result of excessive bone resorption [55]. Sex hormones may account for this difference. Estrogen rapidly decreases in postmenopausal women. An accelerated phase of predominantly cancellous bone loss initiated by menopause is the result of the loss of the direct restraining effect of estrogen on bone turnover [56]. Estrogen acts on high-affinity estrogen receptors in osteoblasts and osteoclasts to restrain bone turnover [57]. Estrogen also regulates the production, by osteoblastic and marrow stromal cells, of cytokines involved in bone remodeling, such as interleukin (IL)-1, IL-6, tumor necrosis factor-α, prostaglandin E2, transforming growth factor-β, and osteoprotegerin [57], [58]. The net result of the loss of the direct action of estrogen is a marked increase in bone resorption that is not accompanied by an adequate increase in bone formation, resulting in bone loss. The accelerated phase of bone loss in women is due to the direct skeletal consequences of the rapid reduction in serum estrogen following menopause.
High body weight and obesity have been shown to be associated with high BMD in many observational studies [59]. Obesity may lead to increased BMD because it is associated with higher 17β-estradiol levels and higher mechanical load, which may protect bone [60], [61]. Visceral fat accumulation is associated with higher levels of pro-inflammatory cytokines, which may up-regulate receptor activator of nuclear factor-κB ligand, leading to increased bone resorption and therefore decreased BMD [62]-[64].
Some studies have shown that weight loss, both intentional and unintentional, is associated with decreases in BMD. The study by Geoffroy et al. has shown that more than 70% of patients have clinically significant BMD loss at 12 months after bariatric surgery [65]. This loss of bone density was observed at the femoral neck and femur [65]. The significant reduction in BMD was related by bivariate analysis to the extent of the reduction in BMI, to weight loss, and to the loss of fat and lean mass [65]. A recent study in elderly women has identified risk factors for hip BMD loss over four years and concluded that women who gain weight show attenuated BMD loss at the trochanter, femoral neck, and total hip [66].

VI. CONCLUSION
In this paper, we propose a new BN algorithm (PKCL) that can find the underlying causal mechanism between six BMDs and their related factors. PKCL includes two stages: the local stage, which discovers the local MB sets, and the global stage, which learns the direction of the cause-effect relationships.
In addition, to demonstrate the superiority and effectiveness of PKCL, a clinical data set that includes the clinical indexes of patients with T2DM was collected and preprocessed. Experiments on this dataset show that PKCL can discover causal relationships that have already been reported in the clinical literature. Moreover, PKCL can discover new causal relationships to assist clinical researchers in carrying out new experiments, which can save a great deal of time and money. Different from other BN algorithms, PKCL incorporates rich prior knowledge, which means it can achieve good performance even when the dataset is small and the number of features is large. Furthermore, PKCL is not limited to the clinical domain and can be adapted to any domain in which prior knowledge is available. Future work on this subject will employ other probabilistic models [67], [68] and learning in the model space [69], [70] for this kind of problem.

THE PROCESS OF DISCRETIZATION
The discretization method works as follows:
1) Data is normalized so that each variable has mean 0 and standard deviation 1.
2) After normalization, the association of each variable with the response variable is computed using either the Wilcoxon rank sum test (for a binary response variable) or Kruskal-Wallis non-parametric ANOVA (for a multicategory response variable) at the 0.05 alpha level [71].
3) If a variable is not significantly associated with the response variable, it is discretized as follows:
• 0 for values less than -1 standard deviation
• 1 for values between -1 and 1 standard deviation
• 2 for values greater than 1 standard deviation
4) If a variable is significantly associated with the response variable, it is discretized using a sliding threshold (into binary) or a sliding window (into ternary). The discretization threshold(s) is determined by the Chi-squared test to maximize association with the response variable [72].
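Steps 1 and 3 of the procedure above can be sketched in a few lines. This is a minimal illustration and not the Causal Explorer code: the function names are hypothetical, the population standard deviation is assumed for the normalization, and values lying exactly at -1 or 1 standard deviation are assigned to the middle bin, which the description leaves unspecified. The association tests of step 2 and the supervised thresholds of step 4 are not shown.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize to mean 0 and standard deviation 1 (population SD assumed)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:                       # constant variable: every z-score is 0
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def discretize_ternary(values):
    """Step 3: 0 below -1 SD, 2 above +1 SD, 1 otherwise."""
    return [0 if z < -1 else 2 if z > 1 else 1 for z in zscore(values)]


# Hypothetical measurements of one clinical variable.
print(discretize_ternary([0, 5, 5, 5, 10]))
```

To avoid leaking information from the test samples, the mean and standard deviation would be estimated on the training samples only and then reused for the test samples.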
The discretization procedure can be instructed to compute the necessary statistics using only the training samples of the data, to ensure unbiased estimation of error metrics on the testing data.
Figure 2 shows the DAG learned without prior knowledge. Figure 3 shows the DAG learned with prior knowledge. The circles represent the features and the arcs represent the causal relationships. The numbers in the circles denote the features, and Table 1 lists the corresponding feature names.
VOLUME 8, 2020
Since 1990, he has been an Internal Medicine Physician. He is currently a Chief Physician in endocrinology with The First Affiliated Hospital of USTC, Hefei. He is the author of more than 50 articles. His research interests include diabetes and osteoporosis. He is the Chief of the Anhui Osteoporosis and the Bone Mineral Disease Society. He is a Reviewer and the editor of several academic journals.

ABBREVIATIONS AND ITS DESCRIPTIONS
YAYUN CUI was born in Anhui, China, in 1983. She received the master's degree from Anhui Medical University, in 2016. From 2000 to 2005, she was a Resident Doctor with the Anhui Provincial Hospital, where she has been an Attending Doctor for a period of seven years with the Department of Radiation Oncology, since 2012. She was an Associate Chief Physician. She is the author of ten articles and has two medical science foundations. Her research interests include radiotherapy for esophageal cancer, nasopharyngeal carcinoma, lung cancer, and diagnosis and treatment of other tumors. She is a member of the Oncology Radiotherapy Committee of the China Medical Education Association.
XI ZHANG was born in Chizhou, Anhui, China, in 1992. She received the bachelor's degree from the Wannan Medical College, Wuhu, Anhui, China, in 2015. She is currently pursuing the degree in endocrinology. Her main research interests include diabetes, osteoporosis, and other endocrine diseases.