A General Outpatient Triage System Based on Dynamic Uncertain Causality Graph



I. INTRODUCTION
Triage systems can be classified into three types: specific disease triage [1]-[3], emergency triage [4], [5], and outpatient triage [6], [7]. The purpose of the first two is to prioritize patients for treatment according to their fundamental physiological indicators and the urgency of their condition. In contrast, the purpose of outpatient triage is to direct each patient to the proper department according to his or her clinical features. Research on triage systems has focused primarily on emergency triage and specific disease triage, and numerous computer-aided systems have been developed to help triage nurses with these tasks [8]-[14], while the importance of outpatient triage has largely been ignored. Outpatient triage is the first step of clinical diagnosis, and many misdiagnoses are caused by outpatient triage errors, most of which stem from triage nurses' lack of experience and medical knowledge and from insufficient information provided by patients [15], [16]. Some common symptoms, such as fever, headache, and cough, can be caused by various diseases that belong to different departments. When a patient's clinical features are insufficient, an inexperienced nurse can easily make an outpatient triage error. For the patient, the direct consequences of such an error are wasted time, a poor medical experience, and financial loss. In severe cases, it can lead to misdiagnosis, causing the patient to miss the best window for treatment, with irreversible consequences. Therefore, it is of great practical significance to develop an outpatient triage system that assists triage nurses in improving outpatient triage accuracy and reducing the misdiagnoses and missed diagnoses caused by triage errors.

The associate editor coordinating the review of this manuscript and approving it for publication was Gina Tourassi.
The essence of outpatient triage is to classify patients into the proper department according to their clinical features. Various algorithms can be used for this classification, such as K-NN [17], [18], Naive Bayes, Bayesian networks [19], [20], SVM [21]-[23], decision trees [24], [25], artificial neural networks, and deep learning [26]-[29]. These classification algorithms have been thoroughly validated and widely applied in text classification and image recognition [30]-[34]. Unlike those fields, clinical diagnosis requires not only high diagnostic accuracy but also good interpretability of the results. However, these algorithms cannot properly explain their results, which prevents them from performing well in clinical diagnosis. Therefore, an algorithm with both good interpretability and high accuracy is needed. As a graphical causal probability model [35]-[40], the DUCG can intuitively express causal knowledge with various graphical symbols, present results in the form of conditional probabilities, and illustrate them graphically. Initially applied to fault diagnosis of large-scale complex systems [41]-[45], the DUCG was later applied to the clinical diagnosis of conditions such as jaundice, vertigo, and nasal obstruction, with good results [46], [47]. Owing to this performance in clinical diagnosis, this paper uses the DUCG to perform outpatient triage. Section II briefly describes the DUCG model, including M-DUCG and S-DUCG. Section III explains the construction method of the triage knowledge base. Section IV presents the calculation method for the hybrid DUCG and a calculation case. Section V describes a heuristic information collection method and validates the triage system. Section VI concludes the paper and outlines future work.

VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

II. BRIEF INTRODUCTION TO THE DUCG
In the DUCG, different types of variables/events are represented by different shapes [41], as shown in Figure 1. $B_i$ is a root cause event, such as a fault source in fault diagnosis or a disease in clinical diagnosis. A B-type variable has no parent variable but has at least one child variable; during inference it is used as a hypothesis. $X_i$ stands for a consequence or procedure variable, for example, a clinical feature in clinical diagnosis or an abnormal signal in fault diagnosis. It can be caused by any type of variable and can also act as the cause of other variables. $D_i$ represents the default or unspecified cause event of $X_i$; it works only when none of the known causes of $X_i$ has occurred. $BX_i$ represents an integrated cause variable. In fault diagnosis, $BX_i$ is used as a collection of multiple root cause events [48], while in clinical diagnosis, $BX_i$ represents a root cause event affected by risk factors [45], [46]. During inference, if $BX_i$ exists, it replaces $B_i$ as the hypothesis. $G_i$ is a logic gate variable representing the logical relations among its parent variables; these logical combinations are described by its logic gate specification table ($LGS_i$). For example, $LGS_1$ of $G_1$ specifies $G_{1,0}$ = "Remnant", $G_{1,1} = X_{3,1}X_{6,0}$, and $G_{1,2} = X_{3,1}X_{6,1}$. $SG_i$ is also a logic gate, but it is used only in clinical diagnosis to express combinations of risk factors. In the DUCG, uncertain causal relationships are represented by different directed arcs. The uncertain causality between variables in S-DUCG is represented by the linkage event $P_{n;i}$, where the index before ";" ($n$) identifies the child variable and the index after ";" ($i$) identifies the parent variable.
Its causal intensities are coded in the matrix $\mathbf{p}_{n;i}$. For example, $P_{3;1}$ is the linkage event from $B_1$ to $X_3$, with the parameter matrix

$$\mathbf{p}_{3;1} = \Pr\{\mathbf{P}_{3;1}\} = \begin{pmatrix} p_{3,0;1,0} & p_{3,0;1,1} \\ p_{3,1;1,0} & p_{3,1;1,1} \end{pmatrix} = \begin{pmatrix} 0.7 & 0.2 \\ 0.3 & 0.8 \end{pmatrix}$$

Unlike $P_{3;1}$, $P_{6;1}$ is a conditional linkage event. Its condition is denoted as $Z_{6;1} = X_{8,1}$, which means that $P_{6;1}$ exists if $X_{8,1}$ is true and does not exist otherwise. The weighted functional event $F_{n;i}$, e.g., $F_{7;2}$, represents the uncertain causality between variables in M-DUCG and is explained in detail below. The zoom factor variable $SA_{n;i}$ is used specifically between $BX_n$ and $SG_i$ to represent the zoom effect of risk factors on $BX_n$. Unlike those of $F_{n;i}$, the parameters of $SA_{n;i}$ can be greater than 1. A dotted double directed arc represents a conditional zoom factor variable, and a black directed arc links the input variables of a G-type or SG-type variable.
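In $\mathbf{p}_{3;1}$, row $k$ is the child state and column $j$ is the parent state, and each column sums to 1. The minimal Python sketch below, assuming a plain nested-list representation, shows how such a matrix is looked up and how a condition such as $Z_{6;1} = X_{8,1}$ switches a conditional linkage event on or off. The `Linkage` class and the evidence-label format (`"X8,1"`) are hypothetical, not part of any published DUCG software, and the matrix used for $P_{6;1}$ is a placeholder because the text does not give it.

```python
from typing import Callable, FrozenSet, List, Optional

Evidence = FrozenSet[str]  # e.g. frozenset({"X8,1"}) means X_8 was observed in state 1


class Linkage:
    """Hypothetical container for a linkage matrix p[k][j] = p_{n,k;i,j}."""

    def __init__(self, matrix: List[List[float]],
                 condition: Optional[Callable[[Evidence], bool]] = None):
        self.matrix = matrix        # row k = child state, column j = parent state
        self.condition = condition  # None means an unconditional linkage event

    def exists(self, evidence: Evidence) -> bool:
        """A conditional linkage exists only when its condition Z holds."""
        return self.condition is None or self.condition(evidence)

    def intensity(self, k: int, j: int, evidence: Evidence) -> float:
        """Return p_{n,k;i,j}, or 0.0 when the linkage does not exist (arc absent)."""
        if not self.exists(evidence):
            return 0.0
        return self.matrix[k][j]


# P_{3;1}: the unconditional linkage from B_1 to X_3, values from the text.
p_3_1 = Linkage([[0.7, 0.2],
                 [0.3, 0.8]])

# P_{6;1}: conditional linkage with Z_{6;1} = X_{8,1}.
# PLACEHOLDER matrix: the text does not give the parameters of P_{6;1}.
p_6_1 = Linkage([[0.9, 0.4],
                 [0.1, 0.6]], condition=lambda e: "X8,1" in e)
```

For instance, `p_3_1.intensity(1, 1, frozenset())` returns $p_{3,1;1,1} = 0.8$, while `p_6_1.intensity(1, 1, frozenset())` returns 0.0 because $X_{8,1}$ has not been observed.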
S-DUCG is short for the DUCG applied to single-valued variables [36]. In S-DUCG, every variable has two states; only the causes of a variable's true state (state 1) need to be defined, and the false state (state 0) is regarded as the complement of the true state. Figure 2(a) shows an example of S-DUCG in which $X_i$ and $X_j$ each have a certain probability of causing $X_n$. Figure 2(b) reveals the functional mechanism of Figure 2(a): the relationship between $X_i$ and $X_j$ is OR. $P_{n;i}$ and $P_{n;j}$ are two linkage events giving the causal intensities with which $X_i$ and $X_j$, respectively, cause $X_n$. Therefore, $X_n$ can be expressed as $X_n = P_{n;i}X_i \cup P_{n;j}X_j$; this process is called event expansion. Before probabilities are calculated, all events are expanded layer by layer along the causality chain until they take the form of a disjoint sum-of-products composed only of {P-, B-, D-}-type events. The event expansion formula for S-DUCG is

$$X_n = \bigcup_i P_{n;i} V_i, \quad V \in \{B, X, G, D\} \tag{1}$$

To obtain the disjoint sum-of-products, (2) and (3) can be used to perform the disjoint calculation on (1) [35].
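As a small illustration of event expansion, the sketch below computes $\Pr\{X_n\}$ for $X_n = \bigcup_i P_{n;i}V_i$ by building the disjoint sum-of-products $T_1 + \bar{T}_1T_2 + \bar{T}_1\bar{T}_2T_3 + \cdots$ with $T_i = P_{n;i}V_i$, so that the disjoint pieces can simply be added. It assumes that all terms are mutually independent (for example, each $V_i$ is a distinct, independent root event) and takes $p_{n,1;i,1}$ as the linkage probability; shared ancestors, which the full DUCG algorithm handles through layer-by-layer expansion, are outside its scope. The function name and the numbers are illustrative only.

```python
from typing import List, Tuple


def s_ducg_union(terms: List[Tuple[float, float]]) -> float:
    """Pr{X_n} for X_n = U_i P_{n;i} V_i under an independence assumption.

    terms: list of (p_link, pr_parent) = (p_{n,1;i,1}, Pr{V_i}).
    Builds the disjoint sum-of-products T_1 + ~T_1 T_2 + ~T_1 ~T_2 T_3 + ...
    """
    total = 0.0
    none_so_far = 1.0  # Pr{no earlier term T_1..T_{i-1} has occurred}
    for p_link, pr_parent in terms:
        t = p_link * pr_parent    # Pr{T_i}, with P_{n;i} independent of V_i
        total += none_so_far * t  # Pr{~T_1 ... ~T_{i-1} T_i}
        none_so_far *= 1.0 - t
    return total
```

With two parents that are both known to be true and hypothetical intensities 0.6 and 0.5, `s_ducg_union([(0.6, 1.0), (0.5, 1.0)])` gives 0.6 + 0.4 × 0.5 = 0.8, which equals 1 - 0.4 × 0.5; the order of the terms does not change the result.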
M-DUCG is short for the DUCG applied to multivalued variables. A multivalued variable is one for which more than one state can be specified separately. Figure 3 illustrates the internal mechanism of the M-DUCG. Different from S-DUCG, the relationship among parent variables that share the same child variable is a weighted exclusive OR, and the effect of a parent variable on its child variable is represented by a weighted functional event $F_{nk;ij_i}$. The event expansion formula for the M-DUCG is

$$X_{nk} = \sum_i \sum_{j_i} F_{nk;ij_i} V_{ij_i} \tag{4}$$

In (4), $V_{ij}$, $V \in \{B, X, G, D\}$, denotes a parent event (state $j$ of parent variable $i$) of the child event $X_{nk}$ (state $k$ of child variable $n$), and $F_{nk;ij_i} = (r_{n;i}/r_n) A_{nk;ij_i}$, where $r_{n;i}$ represents the degree of uncertainty between $X_n$ and $V_i$, and $r_n = \sum_i r_{n;i}$ is the sum of these degrees over all parent variables of $X_n$. From (4), it can be seen that M-DUCG is a weighted average model.
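Because the terms of (4) are mutually exclusive (weighted exclusive OR across parents, distinct states within a parent), a single layer can be evaluated as a weighted average: $\Pr\{X_{nk}\} = \sum_i (r_{n;i}/r_n)\sum_j a_{nk;ij}\Pr\{V_{ij}\}$, where $a_{nk;ij} = \Pr\{A_{nk;ij}\}$ and $A$ is assumed independent of the parent events. The Python sketch below evaluates one such layer from given parent-state probabilities; the function name, the tuple layout, and the numbers are hypothetical, and multi-layer expansion and unspecified ("-") parameters are not handled.

```python
from typing import List, Tuple

# One parent V_i of X_n as (r_{n;i}, a, pr), with a[k][j] = a_{nk;ij} and pr[j] = Pr{V_ij}
Parent = Tuple[float, List[List[float]], List[float]]


def m_ducg_prob(parents: List[Parent], k: int) -> float:
    """Pr{X_nk} = sum_i (r_{n;i} / r_n) * sum_j a_{nk;ij} * Pr{V_ij}."""
    r_n = sum(r for r, _, _ in parents)  # r_n = sum_i r_{n;i}
    total = 0.0
    for r, a, pr in parents:
        total += (r / r_n) * sum(a[k][j] * pr_j for j, pr_j in enumerate(pr))
    return total


# Hypothetical example: two parents with weights 1 and 3, both known to be in state 1.
parents = [
    (1.0, [[0.9, 0.2], [0.1, 0.8]], [0.0, 1.0]),
    (3.0, [[0.7, 0.6], [0.3, 0.4]], [0.0, 1.0]),
]
```

Here `m_ducg_prob(parents, 1)` is 0.25 × 0.8 + 0.75 × 0.4 = 0.5. Since every column of each matrix sums to 1, the probabilities over the child's states also sum to 1, which is the weighted-average property noted above.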
The parameters in the DUCG can be given by domain experts based on their experience or learned from data. The DUCG allows incomplete representation of parameters, which means that users only need to assign the parameters for the variables they are concerned with. The DUCG can also be built in a modular way: users model local knowledge as sub-DUCGs, and all the sub-DUCGs are then compiled automatically into a complete DUCG by our DUCG compiler. These features reduce the difficulty of building a large and complex knowledge base with the DUCG.
Regarding model selection, if the knowledge representation includes multivalued variables, the M-DUCG is used to construct the knowledge base; if it includes only single-valued variables, the S-DUCG is the better choice. A model that mixes S-DUCG and M-DUCG is called a hybrid DUCG.

III. THE CONSTRUCTION OF A TRIAGE KNOWLEDGE BASE
The triage knowledge base describes the department division of the hospital and models the causal relationships between departments and clinical features. In reality, the granularity of department division differs between hospitals: large general hospitals have detailed department divisions, whereas county-level or community hospitals have coarse ones. To give the triage knowledge base good scalability and let it match the triage needs of different hospitals, we construct it as follows: 1) Construct a basic triage knowledge base. Each department is modeled as a sub-DUCG according to the most detailed department division scheme, and all these sub-DUCGs are compiled into a complete DUCG that serves as the basic triage knowledge base. 2) Configure the basic knowledge base to meet the triage needs of a target hospital. The configuration matches the departments of the target hospital with those of the basic triage knowledge base; it is achieved by adding a new S-DUCG, without any modification to the basic triage knowledge base. The construction and configuration methods of the basic knowledge base are described in the following subsections.

A. USING M-DUCG TO BUILD A BASIC TRIAGE KNOWLEDGE BASE
A basic triage knowledge base is an integrated DUCG composed of sub-DUCGs, each of which stands for one independent department. Since the integration is performed automatically by our DUCG compiler, we only need to concentrate on constructing each department (sub-DUCG). For each department, we classify the related variables into four types: the department itself, risk factors, symptoms, and physical signs. Each sub-DUCG should clearly express the causality between the department and the other three types of clinical features. Because the clinical features include both single-valued and multivalued variables, M-DUCG is used to construct the department. The detailed construction method is as follows. A department is represented by a B-type variable. Risk factors are represented by X-type variables, and they serve as inputs to a double-line logic gate (SG-type variable). The SG-type variable expresses the combination of the risk factors and the department variable: its inputs are the risk factors and the department variable, and its only output is a BX-type variable. The BX-type variable represents the department variable as affected by risk factors. The influence of the risk factors on the probability of occurrence of a given department is represented by the zoom factor variable. The symptoms and physical signs associated with the department are also represented by X-type variables, but unlike risk factors, they are child variables of the BX-type variable. The causal relationships between the department and its symptoms and physical signs are represented by weighted functional events. The M-DUCG of otolaryngology constructed in this way is illustrated in Figure 4.
In Figure 4, $B_5$ represents otolaryngology and has two states: state 0 is false (the patient does not belong to this department) and state 1 is true (the patient belongs to this department). $X_{22}$ (dust environment) and $X_{23}$ (noise environment) are risk factor variables; if they are true, they increase the incidence of otolaryngology-related disease. $BX_5$ represents the otolaryngology variable affected by risk factors, and its child variables are the related symptoms and physical signs. Red directed arcs stand for the intensity of causality between parents and children; the parameter matrices are given in Appendix A. Take the red directed arc from otolaryngology ($BX_5$) to nasal obstruction ($X_1$) as an example. In its parameter matrix, only $a_{1,1;5,1} = 0.95$ is assigned, which means that when otolaryngology-related diseases occur, the probability that they cause nasal obstruction is 95%. Because we are not concerned with the relationships between the other states, those parameters are given as "-", which means that they do not function during inference.
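The incomplete representation used for $BX_5 \rightarrow X_1$ can be mirrored by storing only the entries an expert actually assigns. The Python sketch below is a hypothetical storage scheme, not the authors' data format: a dictionary keyed by (child state, parent state), where a missing key plays the role of "-". It only shows that "-" entries are skipped rather than read as numbers; how the full DUCG inference treats them is not modeled here.

```python
from typing import Dict, List, Optional, Tuple

# Sparse a_{n;i}: key (child state k, parent state j) -> a_{n,k;i,j}.
# A missing key corresponds to a "-" (unspecified) entry.
SparseMatrix = Dict[Tuple[int, int], float]

# a_{1;5}: BX_5 (otolaryngology) -> X_1 (nasal obstruction); only a_{1,1;5,1} is assigned.
A_1_5: SparseMatrix = {(1, 1): 0.95}


def param(matrix: SparseMatrix, k: int, j: int) -> Optional[float]:
    """Return a_{n,k;i,j}, or None for a '-' entry."""
    return matrix.get((k, j))


def functioning_terms(matrix: SparseMatrix, k: int,
                      parent_pr: List[float]) -> List[Tuple[int, float, float]]:
    """List (parent state j, a_{n,k;i,j}, Pr{V_ij}) for specified entries only.

    '-' entries are skipped instead of being treated as known values.
    """
    terms = []
    for j, pr_j in enumerate(parent_pr):
        a = param(matrix, k, j)
        if a is not None:
            terms.append((j, a, pr_j))
    return terms
```

With $BX_5$ known to be true (`parent_pr = [0.0, 1.0]`), only the term for $a_{1,1;5,1} = 0.95$ functions for $X_{1,1}$, and no term functions for $X_{1,0}$.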
This completes the sub-M-DUCG for otolaryngology, which clearly describes the causal relationships between otolaryngology and its related clinical features. The other departments are modeled as sub-M-DUCGs in the same way. After all the sub-M-DUCGs are finished, our DUCG compiler compiles them into one complete M-DUCG. During compilation, verification is performed to remove variables and lines that are redundant across different sub-M-DUCGs. The complete M-DUCG, shown in Figure 5, includes 31 departments, which are listed in Table 1. It uses 436 X-type variables to describe the clinical features (symptoms, physical signs, and risk factors) related to the departments, and 856 red lines to represent the causal relations between departments and clinical features. This M-DUCG is used as the basic triage knowledge base.
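The compilation step can be pictured as a union of graphs in which variables and arcs shared by several sub-DUCGs are kept only once. The Python sketch below is a simplified stand-in for the authors' DUCG compiler, whose internals are not described here: each sub-DUCG is a hypothetical `(variables, arcs)` pair, and raising an error on conflicting parameters for the same arc is an assumed verification rule, not a documented one.

```python
from typing import Dict, Iterable, Set, Tuple

Arc = Tuple[str, str]  # (parent, child), e.g. ("BX5", "X1")
SubDUCG = Tuple[Set[str], Dict[Arc, object]]


def compile_sub_ducgs(subs: Iterable[SubDUCG]) -> Tuple[Set[str], Dict[Arc, object]]:
    """Merge sub-DUCGs into one graph, keeping shared variables and arcs once.

    Hypothetical verification: the same arc with different parameters raises
    ValueError, and every arc endpoint must be a declared variable.
    """
    variables: Set[str] = set()
    arcs: Dict[Arc, object] = {}
    for sub_vars, sub_arcs in subs:
        variables |= sub_vars  # a variable shared by two departments is kept once
        for arc, params in sub_arcs.items():
            if arc in arcs and arcs[arc] != params:
                raise ValueError(f"conflicting parameters for arc {arc}")
            arcs[arc] = params  # an identical duplicate arc is kept once
    for parent, child in arcs:
        if parent not in variables or child not in variables:
            raise ValueError(f"arc {(parent, child)} references an undeclared variable")
    return variables, arcs
```

For example, an otolaryngology sub-graph `({"BX5", "X1"}, {("BX5", "X1"): {(1, 1): 0.95}})` and a head & neck sub-graph that also points to `X1` (with hypothetical parameters) merge into three variables and two arcs, with `X1` appearing once.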

B. USING S-DUCG TO CONFIGURE THE BASIC TRIAGE KNOWLEDGE BASE
In reality, the granularity of department division differs between hospitals: large general hospitals have detailed department divisions, while county-level or community hospitals have coarse ones. To support the different triage needs of various hospitals, the triage system must therefore be universal. We give the system this universality through "integration" and "disintegration" operations on the basic triage knowledge base.

1) INTEGRATION OPERATION
This operation is used when multiple departments of the triage knowledge base must be mapped into one department of the target hospital. Take diseases of the ear, nose, throat, pharynx, head, and neck as an example. A county-level hospital usually assigns these diseases to one department, otolaryngology, whereas our basic triage knowledge base divides them into two departments, otolaryngology (Figure 4) and head & neck surgery (Appendix B). Therefore, department integration must be performed so that the knowledge base matches the department division of the target hospital before the system is used there. Department integration does not construct a new department to replace the original two; instead, a new S-DUCG is constructed to integrate the two departments into one, so no modification of the basic triage knowledge base is required. Departmental integration is a logical OR relationship, similar to the union of multiple sets, and the department variables (BX-type variables) are single-valued, so departmental integration can be modeled as an S-DUCG. As shown in Figure 6, $BX_{101}$ (Integrated Otolaryngology), $BX_5$ (Otolaryngology), and $BX_{34}$ (Head & Neck Surgery) form an S-DUCG, in which $P_{101;5}$ and $P_{101;34}$ represent the causal intensities between $BX_{101}$ and its two parent variables. Since the child department includes each parent department and all of them are single-valued, the parameters of $P_{n;i}$ always satisfy $p_{n,1;i,1} = 1$; that is, when the parent event occurs, the child event must occur. During DUCG compilation, these three sub-DUCGs are compiled into one hybrid DUCG, shown in Appendix C. After department integration, $BX_{101}$ is used as a hypothesis (and is called the hypothetical BX-type variable). $BX_5$

2) DISINTEGRATION OPERATION
When a department in the triage knowledge base must be mapped to multiple departments in the target hospital, a disintegration operation is required. Take the fever clinic as an example: our triage knowledge base has a fever clinic ($BX_{35}$) for patients with high fever, whereas the target community hospital does not. Therefore, before our triage knowledge base is applied to that hospital, the fever clinic must be disintegrated, according to the patient's conditions, into the departments of the target hospital that handle high fever. In the target hospital, these are infectious disease, rheumatology & allergy, and respiratory medicine. This situation can be modeled by the S-DUCG shown in Figure 7, with some of its parameters listed below the figure. In Figure 7, the fever clinic ($BX_{35}$) is disintegrated under different conditions into three departments of the target hospital: infectious disease ($BX_{132}$), rheumatology & allergy ($BX_{116}$), and respiratory medicine ($BX_{107}$). These conditions are represented by conditional linkage events. For example, $P_{107;35}$ is the conditional linkage event between $BX_{35}$ and $BX_{107}$, and its condition is $Z_{107;35} = X_{101,1} \cup X_{102,1}$, where $X_{101,1}$ denotes that the patient came from an infected area and $X_{102,1}$ denotes that the patient came from a pastoral area or had a history of exposure to cattle and sheep. If $Z_{107;35}$ is true, the relationship is established with a causal intensity of 0.8. Similarly, if $X_{121,1}$ is true, the relationship between $BX_{35}$ and $BX_{116}$ is created. Otherwise, these relationships do not exist. Through integration and disintegration operations, our basic knowledge base can be flexibly adjusted to meet the triage requirements of various hospitals.
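The conditional linkages of the disintegration S-DUCG can be sketched as a lookup from each target department to its condition, where a condition of the form $X_{a} \cup X_{b}$ holds if any listed state has been observed. The Python sketch below encodes only what the text states: $Z_{107;35}$ with intensity 0.8, and the $X_{121,1}$ condition for $BX_{116}$, whose intensity is not given and is therefore stored as `None`. The condition for $BX_{132}$ is not given either, so it is omitted; the dictionary, the function name, and the label format are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# Conditional linkages from the fever clinic BX_35 to target departments.
# Each condition Z_{h;35} is a union (logical OR) of evidence states.
CONDITIONAL_LINKS: Dict[str, Tuple[FrozenSet[str], Optional[float]]] = {
    # Z_{107;35} = X_{101,1} ∪ X_{102,1}, causal intensity 0.8 (as stated in the text)
    "BX107": (frozenset({"X101,1", "X102,1"}), 0.8),
    # Condition X_{121,1} for BX_116; intensity not given in the text, so None
    "BX116": (frozenset({"X121,1"}), None),
}


def active_links(evidence: Set[str]) -> Dict[str, Optional[float]]:
    """Return {target department: intensity} for linkages whose condition holds."""
    return {target: p
            for target, (states, p) in CONDITIONAL_LINKS.items()
            if states & evidence}  # the OR is true if any listed state was observed
```

For a patient with exposure to cattle and sheep, `active_links({"X102,1"})` returns `{"BX107": 0.8}`; with no relevant evidence, no linkage from $BX_{35}$ exists.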
This method achieves integration and disintegration by adding S-DUCGs, thereby avoiding modifications to the basic knowledge base and giving our triage system flexibility and scalability.
To verify the feasibility of this method, we used our knowledge base to meet the triage requirements of a county-level hospital (the target hospital). This hospital has 25 departments, all listed in Table 2, whereas our knowledge base has 31. To apply our knowledge base to this hospital, some of its departments must be integrated into the corresponding departments of the target hospital; the integration relations are shown in Table 2. The Target Departments column lists the departments of the county-level hospital, and the Matching Relations column shows how the departments of our basic triage knowledge base match those of the target hospital. h-BX is the symbol of the target hospital's department in the hybrid DUCG and is used as a hypothesis during calculation. For example, Respiratory Medicine in our knowledge base matches Respiratory Medicine in the target hospital exactly, so it is used directly as the target hospital's Respiratory Medicine, with $BX_7$ as its symbol in the hybrid DUCG. In another example, Dermatology & Venereology in the target hospital matches both Dermatology ($BX_{12}$) and Venereology ($BX_{33}$) in our knowledge base, so these two are integrated into one department, Dermatology & Venereology, whose symbol is $BX_{103}$. According to the matching relations in Table 2, we built the S-DUCG shown in Appendix D. Compiling the M-DUCG in Figure 5 together with the S-DUCG in Appendix D yields the final hybrid DUCG shown in Figure 8. This hybrid DUCG is used as the outpatient triage knowledge base for the target hospital; the inference calculation method for the hybrid DUCG model is explained in the next section.
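The matching relations of Table 2 translate mechanically into the configuration S-DUCG: a one-to-one match reuses the knowledge-base variable as the hypothesis, while a many-to-one match receives one linkage $P_{h;i}$ per source department with $p_{h,1;i,1} = 1$. The Python sketch below uses only the two rows described in the text; the dictionary layout and the function name are hypothetical, and the remaining rows of Table 2 are not reproduced.

```python
from typing import Dict, List, Tuple

# Excerpt of the matching relations described in the text:
# target department -> (h-BX symbol, knowledge-base BX variables it covers)
MATCHING: Dict[str, Tuple[str, List[str]]] = {
    "Respiratory Medicine": ("BX7", ["BX7"]),
    "Dermatology & Venereology": ("BX103", ["BX12", "BX33"]),
}


def build_integration_s_ducg(matching: Dict[str, Tuple[str, List[str]]]):
    """Return (hypotheses, linkages) for the configuration S-DUCG.

    One-to-one matches reuse the knowledge-base variable as the hypothesis.
    Many-to-one matches get a linkage P_{h;i} with p_{h,1;i,1} = 1 per source,
    so the integrated department is true whenever any source department is true.
    """
    hypotheses: List[str] = []
    linkages: Dict[Tuple[str, str], Dict[Tuple[int, int], float]] = {}
    for _target, (h_bx, sources) in matching.items():
        hypotheses.append(h_bx)
        if sources == [h_bx]:
            continue  # direct reuse: no additional linkage is needed
        for src in sources:
            linkages[(h_bx, src)] = {(1, 1): 1.0}  # parent true -> child true
    return hypotheses, linkages
```

Applied to the excerpt, the sketch yields the hypotheses $BX_7$ and $BX_{103}$ and two deterministic linkages, from $BX_{12}$ and from $BX_{33}$ to $BX_{103}$.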

V. HEURISTIC INFORMATION COLLECTION METHOD AND CASE STUDY
Incomplete clinical information from patients can also lead to triage errors. To collect patients' clinical information more completely, a heuristic information collection method is designed for the triage system. The method is as follows: 1) Enter the patient's self-reported symptoms into the system for an initial reasoning calculation and obtain the ranked result. 2) For each of the top 3 departments ($BX_n$) in the ranked result, select the 6 unknown evidence variables ($X_i$) with the highest occurrence probability as candidate recommended items, using $\max_k a_{i,k;n,1}$ as the occurrence probability of $X_i$. Following the principle that items from departments with a higher ranked probability should be recommended first, the 18 candidate items are ranked by the product $\max_k a_{i,k;n,1} \times \Pr\{BX_{n,1}\}$. Finally, the top 10 of these 18 items, together with all the risk factors related to the 3 departments, are presented for the patient to answer; items that have already been recommended are not recommended again. 3) Repeat steps 1 and 2 until no new evidence is entered. Recommended items are drawn from the top 3 departments because the final diagnosis usually appears among the top 3 hypotheses of each calculation. Ten clinical features are chosen as the final recommended items for two reasons: too few recommended items are likely to make the recommendations inaccurate, whereas too many make them less concise. In practical applications, these parameters can be configured dynamically. An outpatient case is used below to explain the heuristic information collection method and the diagnosis process.
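Step 2 of the method can be sketched in Python as follows. The function and argument names are hypothetical: `dept_probs` holds $\Pr\{BX_{n,1}\}$ from the latest calculation, and `params[dept][x]` is assumed to list $a_{i,k;n,1}$ over the states $k$ of $X_i$. When the same feature appears under two of the top departments, the sketch keeps its higher score, a tie-handling choice the text does not specify. The defaults follow the text (3 departments, 6 candidates each, top 10) and remain configurable.

```python
from typing import Dict, List, Set


def recommend(dept_probs: Dict[str, float],
              params: Dict[str, Dict[str, List[float]]],
              risk_factors: Dict[str, List[str]],
              known: Set[str], asked: Set[str],
              n_depts: int = 3, per_dept: int = 6, n_final: int = 10) -> List[str]:
    """Rank unknown evidence by max_k a_{i,k;n,1} * Pr{BX_n,1} (hypothetical sketch)."""
    top = sorted(dept_probs, key=dept_probs.get, reverse=True)[:n_depts]
    scores: Dict[str, float] = {}
    for dept in top:
        # unknown, not-yet-recommended symptoms/signs of this department
        cands = [(x, max(a)) for x, a in params.get(dept, {}).items()
                 if x not in known and x not in asked]
        cands.sort(key=lambda c: c[1], reverse=True)
        for x, a_max in cands[:per_dept]:
            score = a_max * dept_probs[dept]
            scores[x] = max(score, scores.get(x, 0.0))  # assumption: keep best score
    items = sorted(scores, key=scores.get, reverse=True)[:n_final]
    # add every unanswered risk factor of the top departments
    for dept in top:
        for rf in risk_factors.get(dept, []):
            if rf not in known and rf not in asked and rf not in items:
                items.append(rf)
    asked.update(items)  # recommended items are never recommended again
    return items
```

With three hypothetical departments and smaller limits (`n_depts=2, per_dept=2, n_final=3`), the first call returns the three best-scoring symptoms plus the risk factors of the top two departments; a second call with the same `asked` set skips everything already shown and offers the next candidates.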
The patient, a male ($X_{73,1}$), reported joint pain ($X_{40,1}$) for two months and fever ($X_{79,1}$) for two weeks. After this evidence, $E = X_{40,1}X_{73,1}X_{79,1}$, was entered into the triage system, the top 3 results were rheumatology & allergy (44.13%), infectious disease (17.79%), and orthopedics (8.31%). At the same time, the system recommended that the patient answer the questions listed in Table 3 for further diagnosis. During the consultation, the patient said that he had started raising sheep ($X_{102,1}$) 3 months earlier and that some of the other breeders showed the same symptoms. In addition, the patient had cold intolerance ($X_{115,1}$). This new evidence, $E = X_{102,1}X_{115,1}$, was entered for another calculation, and the top 3 departments became infectious disease (55.42%), rheumatology & allergy (14.81%), and orthopedics (2.24%). Since the patient provided no further evidence, the triage ended, indicating that the patient should see a doctor in the infectious disease department. The patient was ultimately diagnosed with brucellosis, which confirms that the outpatient triage was accurate. Figure 11 is a graphical explanation for the infectious disease department; it explains all of the evidence except gender, because the onset of infectious diseases is unrelated to gender.
To verify the performance of our triage system, we performed three experiments. The first tested the triage performance of our system, the second compared our system with some of the machine learning methods mentioned above, and the third compared our system with inexperienced triage nurses. All tests were run on a laptop with an Intel Core i7 processor at 2.70 GHz and 16 GB of RAM. The cases used in the experiments were gathered from the hospital information systems (HIS) of three hospitals: 151 cases from Peking Union Medical College Hospital, 100 cases from Xuanwu Hospital Capital Medical University, and 200 cases from Suining Central Hospital, for a total of 451 randomly selected cases. During the experiments, only the patients' self-reported symptoms recorded in the cases were used as input. Table 4 shows the triage performance of our system. Test Cases is the number of cases used to test each department, True is the number of test cases correctly triaged, False is the number incorrectly triaged, and Accuracy is the triage accuracy for each department. As shown in Table 4, the overall triage accuracy of our system was 96.8%, and the accuracy for every department except neurosurgery was higher than 90%. Of the 22 cases used to test neurosurgery, 4 were triaged into neurology. In practice, it is difficult to determine from self-reported symptoms alone whether a patient should be triaged into neurology or neurosurgery: during actual triage, a patient with head trauma is triaged into neurosurgery, and otherwise into neurology. Therefore, the doctors did not regard these 4 cases as triage errors. If they are not counted as errors, the accuracy for every department exceeds 90%, and the overall accuracy rises to 97.8%.
The experimental results for several machine learning algorithms are presented in Table 5. For each algorithm, 66% of the cases were used as the training set and the remaining cases as the test set. Each experiment was … As shown in Table 5, the average triage performance of the machine learning algorithms is mediocre, with accuracies ranging from 43.9% to 75.8%. SVM performed best, with the highest accuracy, but it was still 21 percentage points less accurate than our triage system. For the same algorithm, accuracy varied greatly across departments: some departments were below 50%, some even at 0%, while others reached 90%. By contrast, the accuracy of our triage system was stable across departments.
The same cases were used to test the triage performance of 7 triage nurses with less than two years of experience in outpatient triage. As shown in Table 6, the nurses' triage accuracy ranged from 60.3% to 80.2%, with an average of 68.9%, which means that more than 30% of patients were incorrectly triaged. Using our triage system to assist such inexperienced nurses could therefore improve outpatient triage accuracy substantially.
The experimental results demonstrate that our triage system performs better than the tested machine learning algorithms and inexperienced triage nurses. Although machine learning algorithms can learn a model from data, the quality of the model depends heavily on the quantity and quality of the training data. In essence, outpatient triage is a multi-class classification problem with high-dimensional data. Apart from SVM, which handles high-dimensional data well, the other algorithms have limited ability to cope with this problem, which explains why the machine learning algorithms perform poorly on outpatient triage. In contrast, our models are constructed by experts based on domain knowledge and reviewed by other experts to ensure their accuracy. Removing unrelated variables during simplification reduces the data dimension and thus also the computational complexity. The accurate triage model and the rigorous causal reasoning algorithm lead to the high triage accuracy of our system.
In general, the results show that using the DUCG for outpatient triage is feasible. The system can not only help triage nurses improve their outpatient triage accuracy but also serve as a teaching tool that helps inexperienced triage nurses acquire triage knowledge. Because the knowledge base is readable and understandable to nurses, it can give them an overall view of departments and their related clinical features. The heuristic information collection method can teach nurses how to obtain patients' clinical information step by step, and the graphical interpretation of triage results can help nurses understand the causal relationships between results and evidence. Thus, their knowledge will improve gradually as they use the system.

VI. CONCLUSIONS AND FUTURE WORK
In this study, we developed a general outpatient triage system based on the hybrid DUCG to help triage nurses perform outpatient triage accurately. In this system, the M-DUCG is used to create the basic outpatient triage knowledge base, which contains 31 departments, 436 clinical features related to those departments, and 856 lines representing the causal relationships between departments and clinical features. The S-DUCG is used to adapt the basic knowledge base to the triage requirements of different hospitals, which gives the system good scalability.
Validation experiments were conducted to compare the accuracy of our system with that of several machine learning algorithms and inexperienced triage nurses. Benefiting from the intuitive causal knowledge base and the chaining inference algorithm, the system achieved higher outpatient triage accuracy than both. In addition, the knowledge base is understandable to triage nurses, and the inference process and results are represented intuitively by graphs. This makes triage conclusions and advice more explicable and convincing, further increasing the objectivity of outpatient triage. Inexperienced triage nurses can also acquire triage knowledge through continued use of the system and thus improve their triage accuracy.
In future work, we will further optimize the basic triage knowledge base and conduct actual triage tests in hospitals. Then, the system will be employed in hospitals to assist triage nurses with outpatient triage.