Interpretable Pneumonia Detection by Combining Deep Learning and Explainable Models With Multisource Data

With the rapid development of AI techniques, computer-aided diagnosis has attracted much attention and has been successfully deployed in many health care and medical diagnosis applications. For some specific tasks, learning-based systems match or even outperform human experts. This performance owes much to the expressiveness and scalability of neural networks, although the reasoning behind a model's output usually cannot be represented explicitly. For computer-aided diagnosis, however, interpretability is as important as diagnostic precision. To fill this gap, we propose an intuitive approach to detecting pneumonia interpretably. We first build a large dataset of community-acquired pneumonia (as distinguished from nosocomial pneumonia) consisting of 35389 cases based on actual medical records. Second, we train a prediction model on the chest X-ray images in our dataset that is capable of precisely detecting pneumonia. Third, we propose an intuitive approach to combine neural networks with an explainable model such as a Bayesian Network. The experimental results show that our proposal further improves performance by using multi-source data and provides intuitive explanations for the diagnosis results.


I. INTRODUCTION
Pneumonia is a respiratory infection caused by bacteria, viruses, or fungi, and it has been known as a common and potentially fatal disease for the past two centuries. The incidence of pneumonia is particularly high in the extreme-age groups. Around 450 million people (about 7% of the world's population) are diagnosed with pneumonia each year, and about 4 million deaths are reported [1]. The diagnosis of pneumonia usually starts with the examination of chest X-ray images by well-trained specialists [2]. Preliminary results are then written into examination reports and submitted to clinicians, who give the final conclusions based on the reports and clinical symptoms. This process is usually cumbersome and sometimes leads to disagreements between clinicians [3]. Moreover, the signs and symptoms of pneumonia vary with the cause, the patient, and other factors, and the condition usually changes rapidly, which complicates pneumonia detection. Existing computer-aided diagnosis systems for pneumonia usually take chest X-ray images, computed tomography (CT) scans, or magnetic resonance images (MRI) as input [4]. In a real diagnostic procedure, however, a physician uses not only these images but also observable clinical features as criteria. Symptoms such as fever, cough, and chest pain are also crucial for detecting the disease. Motivated by the diagnostic process of human experts, we combine clinical observations with medical images. In this paper, we propose a model named MulNet, which uses 7 typical symptoms and chest X-ray images as input for pneumonia detection, as shown in Figure 1. The results show that combining deep learning with a Bayesian Network improves both the performance and the interpretability of the system.
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.
Bayesian Network structure construction methods are divided into score-based methods and constraint-based methods [5]. A score-based method selects, from the sampled structures, the structure with the highest score under a scoring criterion (such as K2 [6] or BIC [7]) as the best Bayesian Network structure. However, this method ignores the relationship between the result node and the factor nodes. We therefore propose a constraint-based algorithm that incorporates medical knowledge to build a reasonable Bayesian Network structure.
To be specific, we first build a large dataset of community-acquired pneumonia (as distinguished from nosocomial pneumonia) consisting of 35389 cases based on real medical records. Second, we train a prediction model with chest X-ray images and reports, which is capable of precisely detecting pneumonia. Third, we propose an intuitive approach to combine neural networks with a Bayesian Network, which provides intuitive explanations for the diagnosis results. The experimental results show that our proposal not only further improves the performance by using multi-source data, but also provides intuitive explanations for the diagnosis results.
To summarize, the main contributions of this work are as follows: 1) We establish a large data set for pneumonia detection, which contains 35389 cases (section III). 2) We propose an intuitive method to integrate multisource data such as chest X-ray images and clinical reports in natural language to predict pneumonia (section V). 3) We propose an approach to combine medical knowledge with a Bayesian Network, which constructs a reasonable Bayesian Network structure and improves pneumonia detection's interpretability (section V).
We believe that our proposal is general enough to be applied to other prediction models through fine-tuning, and that it is straightforward to extend with other explainable models such as Situation Calculus, Nonmonotonic Logics, and Latent Trees.

II. RELATED WORK

A. THE DATASETS OF PNEUMONIA
Early works [8]-[10] proposed small labeled datasets (under 2000 samples), with which it is challenging to train a meaningful deep learning model. More recently, [2] built a hospital-scale chest X-ray database called ChestX-ray8, which contains 32717 cases with eight common thoracic diseases. Later in 2017, [11] used a DenseNet image encoder to classify pneumonia with an AUC of 0.713, and [12] developed CheXNet with 121 convolutional layers, which yielded an AUC of 0.7680 in pneumonia prediction. Notably, CheXpert [13] is a large dataset with 224316 chest radiographs from 65240 patients. However, the number of pneumonia cases in CheXpert is insufficient, because it contains fewer than 5000 of them. The Mendeley dataset [14], with 5232 chest X-ray images (3883 pneumonia and 1349 normal), was collected from a children's medical center in Guangzhou, China. Chest X-Ray Images (Pneumonia) [15] is part of the Mendeley and Cohen JP datasets [15]; the raw images were checked and screened to ensure quality. Our dataset contains a total of 44327 chest X-ray images, far more than other datasets. All other pneumonia datasets come from physical examination records, whereas our dataset comes from actual outpatient and inpatient medical records. Consequently, the other datasets contain two types of chest X-ray images, pneumonia and normal, whereas our dataset contains pneumonia and other diseases. Distinguishing pneumonia from other diseases is clinically more practical and more challenging.
VOLUME 9, 2021

B. THE PNEUMONIA DIAGNOSIS MODELS
Influenced by [16], a few studies have used multimodal medical datasets. Most models are based on a CNN-RNN framework that transforms image information into semantic information. For example, [17]-[19] are dedicated to generating medical reports from medical images. The resulting semantic information is combined with image information in a co-attention model [20]. On the other hand, integrating reports with medical images has been used to improve disease diagnosis [21], [22]. Likewise, [23] trained on a small-scale image dataset to diagnose diseases and reported accuracy gains of 4% and 7% over using only reports and using only images. Nevertheless, it is not efficient enough on large-scale datasets. Recently, a method was proposed to screen features using CNN and machine learning, which worked well for feature extraction but did not use the features to diagnose pneumonia [24]. [25] performed automatic binary classification of pneumonia images based on fine-tuned versions of CNNs. [26] and [27] proposed using CNN and transfer learning to diagnose pneumonia, although the pre-trained models are generally trained on ImageNet [28]. However, these pneumonia classification models use only chest X-ray images to diagnose pneumonia and ignore the impact of clinical symptoms.

C. THE EXPLAINABLE MODELS
Deep learning has been applied successfully to image tasks such as face recognition [29], and methods such as 3D face alignment [30], which can run on a CPU in real time, are now used in daily life. CAD uses deep learning to improve diagnostic accuracy. An effective CAD system for identifying all cells in microscopic blood images was recently proposed [31]; it first extracts all categories of cells and then extracts each cell's characteristics. However, current CAD systems lack interpretability.
Gradient-weighted Class Activation Mapping (Grad-CAM) is widely used in current medical interpretation [32]. To obtain an explanatory model for disease diagnosis, [2], [33] applied Grad-CAM to images to display the regions of concern. In addition, latent tree models have obtained excellent results in the interpretability of Chinese medicine; they use data-driven methods and provide a theoretical foundation for disease classification [34]-[36]. These two explainable approaches differ in explanatory nature from our proposal: the former uses only images to diagnose multiple diseases, and the latter only classifies the divisions of diseases.

III. DATA PREPARATION
We propose a systematic method to create the pneumonia dataset. First, pneumonia can be categorized into CAP (community-acquired pneumonia) and HAP (hospital-acquired, or nosocomial, pneumonia). In this work, we mainly focus on CAP, which is acquired in the community; therefore, we selected pneumonia cases only from the respiratory medicine department and the pediatric department. Then, we selected pneumonia cases based on the ICD-10 [37] code in the records. During the collection of electronic medical records, the coding staff code the diseases according to the doctor's reports and diagnosis results. Pneumonia is a broad concept. For example, the ICD-10 code of Haemophilus influenzae pneumonia is J14, and the ICD-10 code of streptococcal pneumonia is J13. Finally, we selected the codes J12 to J18 (inclusive) as the pneumonia codes. The 35389 cases in the dataset cover both inpatient and outpatient visits between October 2017 and January 2020. Some pneumonia cases also have other diseases; similarly, most cases without pneumonia have other diseases. To build a model that can be applied to more patients, the dataset was created from patients in many age groups, including the elderly and children. Because the diagnosis differs for patients with long-term pneumonia, we use only the first record of each patient.
Community-acquired pneumonia (CAP) is a common and potentially life-threatening disease, especially in the elderly and in patients with comorbidities [38]. The clinical diagnosis of CAP includes three phases: 1) community onset; 2) clinical manifestations of pneumonia; 3) chest imaging examination. Clinical diagnosis can be established after meeting only 1), 2) or 3) excluding other diseases, such as tuberculosis and lung tumor. Therefore, clinical manifestations are significant in the process of diagnosing CAP. The potential relationships between various clinical manifestations can provide a more reliable basis for computer-aided diagnosis. The 7 indicators, 1) cough, 2) hemoptysis, 3) chest pain, 4) fever, 5) dyspnea, 6) wet rales, and 7) dry rales, are those described for pneumonia diagnosis in Internal Medicine, as shown in Figure 2.
We extracted the required clinical manifestations from the reports and created a valid tag for each report, resulting in Training-BN, as shown in Figure 2. In pneumonia diagnosis, cough, hemoptysis, chest pain, fever, dyspnea, wet rales, and dry rales are acknowledged as essential criteria. We therefore extract the clinical manifestations related to cough, hemoptysis, chest pain, fever, and dyspnea from the chief complaint, and wet rales and dry rales from the physical examination. We set the corresponding binary bit to ''1'' if the patient developed the symptom. For example, the tag ''1001001'' indicates that the patient developed a cough, fever, and dry rales, and no other symptoms. We designed some basic textual processing to extract clinical manifestations from the chief complaint and the physical examination. Taking ''fever'' as an example: first, we observed that doctors generally use either ''fever'' or ''no fever'' to record whether the patient has a fever. Then, we split the chief complaint at Chinese punctuation marks such as commas and semicolons, and extracted the sentences containing ''heat.'' If the keyword in a sentence is ''heat,'' the bit is marked as ''1'' (the Chinese word for ''heat'' also means ''fever''). Moreover, considering that some doctors might have their own style of expression, we manually reviewed each sentence containing ''heat'' and corrected the tag where necessary.
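The tagging procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the English keywords, the negation cues in `NEGATION_CUES`, and the function names are all hypothetical stand-ins for the Chinese-language rules and the manual review applied in the paper. Only the bit order and the split between chief complaint and physical examination follow the text.

```python
import re

# Bit order follows the paper's 7-bit tag. The English keywords and the
# negation cues are illustrative assumptions, not the authors' actual rules.
CHIEF_COMPLAINT_SYMPTOMS = ["cough", "hemoptysis", "chest pain", "fever", "dyspnea"]
PHYSICAL_EXAM_SYMPTOMS = ["wet rales", "dry rales"]
NEGATION_CUES = ("no ", "without ", "denies ")


def split_clauses(text):
    """Split free text into lower-cased clauses at ASCII or Chinese punctuation."""
    return [c.strip().lower() for c in re.split(r"[,;.，；。]", text) if c.strip()]


def has_symptom(clauses, keyword):
    """Return True if some clause mentions the keyword without a negation cue."""
    for clause in clauses:
        negated = any(cue in clause for cue in NEGATION_CUES)
        if keyword in clause and not negated:
            return True
    return False


def symptom_tag(chief_complaint, physical_exam):
    """Build the 7-character binary tag from the two report fields."""
    chief = split_clauses(chief_complaint)
    exam = split_clauses(physical_exam)
    bits = [has_symptom(chief, s) for s in CHIEF_COMPLAINT_SYMPTOMS]
    bits += [has_symptom(exam, s) for s in PHYSICAL_EXAM_SYMPTOMS]
    return "".join("1" if b else "0" for b in bits)


tag = symptom_tag(
    "Cough for 3 days, fever, no chest pain",
    "Dry rales in both lungs; no wet rales",
)
print(tag)
```

A substring-based negation check like this is deliberately crude, which is one reason a manual review pass of the kind the paper describes remains useful.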

IV. DATA SPLITTING
We divide the dataset of X-ray images and electronic medical records into three parts: 1) the CNN training set (Training-CNN), 2) the Bayesian Network training set (Training-BN), and 3) the test sets. Examples of chest X-ray images are shown in Figure 3: the left side of the figure is a chest X-ray image of a patient suffering from pneumonia, with patchy shadows in the red box, and the right side is a chest X-ray image of a normal person.
Similar to CheXpert, the CNN training set consists of chest X-ray images and their corresponding tags. In CheXpert, an image carries not only the pneumonia tag but also the tags of 13 other lung diseases, whereas the images in the CNN training set carry only the pneumonia tag. The CNN training set is further divided into three parts for multiple training rounds. After training, the CNN model takes X-ray images as input and gives a diagnosis result, which is either positive or negative. The result can then be associated with the data as a label for Bayesian Network training. There are two types of test sets: 1) Test-CNN and 2) Test-BN. Test-CNN and Test-BN have the same chest X-ray images, but Test-BN also contains the report corresponding to each chest X-ray image. Necessarily, Training-CNN and Training-BN do not overlap. See Table 1 for a detailed description of Training-CNN, Training-BN, Test-CNN, and Test-BN.
Our test set was annotated by two respiratory specialists. To label the data accurately within a limited time, 200 cases were randomly selected as the test set, and the proportion of pneumonia in the test set is 50%. We created a website showing the chest X-ray images and the admission record of each case to assist the physicians in annotating. Each case may have frontal radiographs, lateral radiographs, or both. We took the notes from the two physicians and the final diagnosis corresponding to each case as the ground truth. If both physicians annotated a case as positive, it was marked as positive; otherwise, it was marked as negative.

V. THE PROPOSED APPROACH

A. IMAGE MODEL WITH CNN
The training process is divided into two steps. First, we train DenseNet121 on the CheXpert dataset to obtain a pre-trained model. Second, we continue to train the pre-trained model on the Training-CNN dataset and convert the output probability into 0 or 1. The details are as follows. CheXpert is a large chest X-ray image dataset that also contains pneumonia data, and we used it to train a well-performing pre-trained model. We implemented DenseNet121 [39] as our model. DenseNet uses a more radical dense connection mechanism than traditional networks: all layers are connected, and each layer accepts the outputs of all preceding layers as additional input. This connection enhances feature reuse and allows the final classifier to make decisions based on the features of the entire network (see Figure 4). The model input $x_0$ is the chest X-ray image of the case, and the model has a total of $l$ layers. $H_l(\cdot)$ is the non-linear transformation, and $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps produced by the preceding layers. $x_l$ represents the output of the model, and Equation 1 shows how features are reused to calculate $x_l$:
$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}]) \quad (1)$$
Chest X-ray images of pneumonia were fed into the network at a size of 320 × 320 pixels. The β-parameters of the Adam optimizer were left at their defaults, $\beta_1 = 0.9$ and $\beta_2 = 0.999$, and the learning rate was $1 \times 10^{-4}$.
We trained a new pneumonia diagnosis model from the CheXpert data and our dataset. The AUC score on CheXpert's validation set was 0.74. We then trained on our three batches to obtain the best CNN model.
We built the Training-CNN dataset to continue training the pre-trained model. In addition, we adopted data augmentation during training: each example was rotated randomly between −25 and 25 degrees, shifted randomly between −25 and 25 pixels, and flipped horizontally with 50% probability. To use the CNN output as the input of the Bayesian Network, we used Youden's index [40] to determine the threshold that converts the probability output by the CNN model into 0 or 1.
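Youden's index is J = sensitivity + specificity − 1, and the usual way to derive a cut-off from it is to pick the threshold that maximizes J on a labeled set. The following pure-Python sketch illustrates that idea under stated assumptions: the function names `youden_threshold` and `binarize` are hypothetical, and the paper does not specify on which data split the threshold was chosen.

```python
def youden_threshold(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A score s is classified as positive when s >= threshold. Candidate
    thresholds are the distinct observed scores.
    """
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")

    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_score)):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= threshold)
        sensitivity = tp / positives
        specificity = 1 - fp / negatives
        j = sensitivity + specificity - 1
        if j > best_j:  # ties keep the lowest threshold
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


def binarize(probability, threshold):
    """Convert a CNN probability into the 0/1 value passed to the Bayesian Network."""
    return 1 if probability >= threshold else 0


# Toy labels and scores, for illustration only.
threshold, j = youden_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(threshold, j, binarize(0.5, threshold))
```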

B. MULNET
A pure connectionist approach can provide diagnostic results, but it lacks interpretability and transparency. The model therefore needs both to diagnose pneumonia and to be interpretable. We trained a Bayesian Network, called MulNet, to diagnose pneumonia. As shown in Figure 6, the training dataset is Training-BN, in which each chest X-ray image is fed to the trained CNN model, whose output is 0 or 1. The clinical manifestations in the reports of Training-BN were extracted into a 7-dimensional vector through the textual processing described earlier. As the final input of MulNet, the 7-dimensional vector and the CNN model's binary output were concatenated into an 8-dimensional vector.
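As a rough illustration of how the 8-dimensional MulNet input can be assembled, the sketch below concatenates the seven symptom bits with the binarized CNN output. The node names are the ones the paper uses for the factor nodes; the function name `build_mulnet_input`, the ordering of the concatenation (symptoms first, image bit last), and the example threshold are assumptions made for illustration.

```python
# Factor-node names as they appear in the paper; the position of "pictures"
# at the end of the vector is an assumption.
NODE_NAMES = [
    "cough", "hemoptysis", "chest_pain", "fever",
    "dyspnea", "wet_rales", "dry_rales", "pictures",
]


def build_mulnet_input(symptom_tag, cnn_probability, threshold):
    """Concatenate the 7 symptom bits with the binarized CNN output."""
    if len(symptom_tag) != 7 or set(symptom_tag) - {"0", "1"}:
        raise ValueError("symptom_tag must be a 7-character string of 0s and 1s")
    symptoms = [int(bit) for bit in symptom_tag]
    picture = 1 if cnn_probability >= threshold else 0
    return symptoms + [picture]


# Hypothetical case: cough, fever, dry rales, and a CNN probability of 0.8.
vector = build_mulnet_input("1001001", cnn_probability=0.8, threshold=0.35)
evidence = dict(zip(NODE_NAMES, vector))
print(vector)
print(evidence)
```

Keeping the evidence as a name-to-value mapping makes it easy to look up the matching row of a conditional probability table later.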
In constructing the Bayesian Network, we aim to incorporate medical knowledge. Cough, hemoptysis, chest pain, fever, dyspnea, wet rales, and dry rales are essential factors in the diagnosis of pneumonia, and the chest X-ray image is also an essential part of the diagnosis, so the classification model should be able to combine all of these features to diagnose pneumonia.
We propose an algorithm called MGS to construct the Bayesian Network structure by improving the constraint-based GS algorithm [5], as shown in Algorithm 1.

    ...
        for each S ⊆ B do
            if CONDINDEP(X, Y, S) then
                remove link X-Y from G
                break
            end if
        end for
    end for
    /* Orient edges */
    for each X ∈ {V} and Y ∈ Bd(X) do
        for each Z ∈ Bd(X) \ Bd(Y) \ {Y} do
            orient Y → X  /* to be corrected if a test yields independence */
            for each S ⊆ B do
                if CONDINDEP(Y, Z, B) and X != V_0 then
                    remove orientation Y → Z
                    break
                end if
            end for
            if Y → X then break
            end if
        end for
    end for
    return G

In this code, V_0 stands for the result node and Mb(X) stands for the boundary of X. CONDINDEP(X, Y, Z) denotes a subroutine call that performs a conditional independence test: ideally, this function returns true when (X ⊥ Y | Z) holds, and false otherwise. The algorithm first computes the Markov blanket of each factor node from the data and then defines the Markov blanket of the result node as all factor nodes (''cough'', ''hemoptysis'', ''chest_pain'', ''fever'', ''dyspnea'', ''wet_rales'', ''dry_rales'', ''pictures''). This solves the problem that the result node and the factor nodes are not in the same Markov blanket.
Step 2 selects the smallest base search set for each phase and performs further conditional-independence tests around each variable to infer the structure locally.
Step 3 of the algorithm orients the arcs whenever it finds that conditioning on a middle node creates a dependency, except for V_0 (''pneumonia_or_not''): all nodes connected to V_0 are oriented to point to V_0. The Bayesian Network constructed by this algorithm is shown in Figure 5.
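The algorithm relies on a conditional independence subroutine, CONDINDEP, whose concrete statistical test is not specified in the text. One simple stand-in, shown here purely as an assumption, is to estimate the conditional mutual information I(X; Y | S) from counts and declare independence when it falls below a small threshold. The function names, the threshold value of 0.01 nats, and the toy records are all hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(rows, x, y, given=()):
    """Estimate I(X; Y | S) in nats from a list of dict rows with discrete values."""
    n = len(rows)

    def s_value(row):
        return tuple(row[name] for name in given)

    n_xys = Counter((r[x], r[y], s_value(r)) for r in rows)
    n_xs = Counter((r[x], s_value(r)) for r in rows)
    n_ys = Counter((r[y], s_value(r)) for r in rows)
    n_s = Counter(s_value(r) for r in rows)

    cmi = 0.0
    for (xv, yv, sv), count in n_xys.items():
        # p(x,y,s) * log[ p(x,y,s) p(s) / (p(x,s) p(y,s)) ], with n cancelled
        ratio = count * n_s[sv] / (n_xs[(xv, sv)] * n_ys[(yv, sv)])
        cmi += (count / n) * log(ratio)
    return cmi


def cond_indep(rows, x, y, given=(), threshold=0.01):
    """Hypothetical CONDINDEP: True when the estimated CMI is below the threshold."""
    return conditional_mutual_information(rows, x, y, given) < threshold


# Toy records: "fever" copies "pictures", while "cough" varies independently.
records = [
    {"pictures": p, "cough": c, "fever": p}
    for p in (0, 1) for c in (0, 1)
]
print(cond_indep(records, "pictures", "cough"))
print(cond_indep(records, "pictures", "fever"))
```

In practice a chi-square or G-test with a significance level is a more common choice than a fixed threshold; the thresholded estimate is used here only to keep the sketch dependency-free.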

VI. IMPLEMENTATION & EVALUATION

A. METRICS
To test the effectiveness and robustness of the binary pneumonia classification model, we use common medical metrics to measure its performance, namely recall, precision, F1-score, and AUC (Area Under the ROC Curve) [41].
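Let $TP$, $FP$, and $FN$ denote the numbers of true positives, false positives, and false negatives, respectively. The recall is defined as (2):
$$\text{Recall} = \frac{TP}{TP + FN} \quad (2)$$
The precision is defined as (3):
$$\text{Precision} = \frac{TP}{TP + FP} \quad (3)$$
In most cases, the higher the recall, the lower the precision, and vice versa. The F1-score is defined to take both recall and precision into consideration (4):
$$F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \quad (4)$$
AUC is defined as the area under the ROC curve, so its value cannot be greater than 1. Because the ROC curve is usually above the line y = x, the AUC typically ranges from 0.5 to 1. The larger the AUC, the better the classifier.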
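Under the standard confusion-matrix definitions (recall = TP / (TP + FN), precision = TP / (TP + FP), and F1 as their harmonic mean), the four metrics can be computed as in the sketch below. The function names are illustrative, and the AUC is computed with the pairwise-ranking formulation (the probability that a random positive scores higher than a random negative, counting ties as one half) rather than by integrating an explicit ROC curve.

```python
def classification_metrics(y_true, y_pred):
    """Recall, precision and F1 for binary labels (1 = pneumonia)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


def auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions, for illustration only.
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```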
Two types of 95% confidence intervals are generally constructed around proportions: the exact confidence interval and the asymptotic confidence interval. Because the sampling distribution of the sample proportion is well approximated by a normal distribution, we use the asymptotic confidence interval, which is calculated by assuming a normal approximation of the sampling distribution.
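For a proportion p estimated from n samples, the asymptotic (normal-approximation) 95% interval has the familiar form p ± 1.96 · sqrt(p(1 − p)/n). The sketch below shows that calculation; the function name and the clipping of the bounds to [0, 1] are illustrative choices, and the numbers in the example are made up rather than taken from the paper's tables.

```python
from math import sqrt


def asymptotic_ci(successes, n, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a two-sided 95% interval.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    # Clip to the valid range of a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical example: 50 correct out of 100.
low, high = asymptotic_ci(50, 100)
print(round(low, 3), round(high, 3))
```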

B. TRAINING
The experimental environment was an Ubuntu Linux server with a Kaby Lake GT2 GPU. The CNN model was implemented with the PyTorch [42] framework (GPU and Ubuntu versions), and BN and DT were implemented with the Scikit-learn [43] framework. The entire experimental process was divided into six steps.
Step One: we trained a pre-trained model on CheXpert with DenseNet, and the AUC of CheXpert's validation was 0.74.
Step Two: we continued to train the pre-trained model with our X-ray image dataset, i.e., Training-CNN. To improve the reliability of the experiment and reduce accidental error, we trained three times to obtain 3 CNN models, which had an average AUC of 0.90. This is 0.16 higher than the AUC previously obtained on the validation set of CheXpert.
Step Three: we ran the trained CNN models on the chest X-ray images of Training-BN and Test-BN and transformed their outputs from probability values to 0 or 1.
Step Four: a 7-dimensional vector was extracted from each report corresponding to the chest X-ray dataset, i.e., from the clinical manifestations in each report of Training-BN and Test-BN.
Step Five: the label output by the CNN was concatenated with each 7-dimensional vector extracted from the reports, resulting in 8-dimensional vectors.
Step Six: construct a Bayesian Network structure called MulNet, then train and test MulNet. See Figure 6 for the complete training process.
When training the Bayesian Network, 10-fold cross-validation is used to select the best parameters and to avoid over-fitting to a particular partition. First, the training set is divided into ten parts; nine parts are used as the training set, and the remaining part is used as the validation set. The training is then repeated ten times, and the average AUC is used as the evaluation criterion to select the best model.
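The 10-fold procedure can be sketched as follows. This is a generic illustration with hypothetical names (`k_fold_indices`, `cross_validated_score`, and the `fit_and_score` callback, which would train a model on the training indices and return its validation AUC); the shuffling seed and the way folds are formed are assumptions, not details given in the paper.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validated_score(n_samples, fit_and_score, k=10, seed=0):
    """Average the score returned by fit_and_score over the k folds."""
    scores = [
        fit_and_score(training, validation)
        for training, validation in k_fold_indices(n_samples, k, seed)
    ]
    return sum(scores) / len(scores)


# Placeholder scorer that only reports the validation fraction; a real one
# would fit the Bayesian Network and return the AUC on the validation fold.
print(cross_validated_score(20, lambda tr, va: len(va) / 20))
```

Candidate parameter settings would each be passed through `cross_validated_score`, and the setting with the highest average would be kept.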

C. COMPARISON AND DISCUSSION OF STATISTICAL MANIFESTATIONS
Figure 7 shows the calibration curve of MulNet and the estimated probabilities obtained with MulNet under both Isotonic calibration [44] and Sigmoid calibration [45]. Calibration performance is evaluated with the Brier score [46], reported in the legend (the smaller the better). Both Isotonic calibration and Sigmoid calibration improve the Brier score slightly.
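The Brier score is the mean squared difference between the predicted probability and the binary outcome, so a perfectly calibrated and perfectly confident classifier scores 0. A minimal sketch, with an illustrative function name and made-up inputs:

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy outcomes and probabilities, for illustration only.
print(brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3]))
```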
We selected three models for comparison: Support Vector Machine (SVM) with a linear kernel, Random Forest, and Decision Tree (DT). We then used 10-fold cross-validation to select the best parameters and calculated the average AUC over the ten training runs [47]. The average AUC is 0.86 for MulNet, 0.87 for DT, 0.86 for Random Forest, and 0.77 for SVM (for SVM, we calculate the functional distance from the sample point to the separating hyperplane and then convert the distance into a probability value).
As shown in Table 2, MulNet achieves the highest value on every metric except precision, where it is slightly lower than DT. MulNet achieves an AUC of 0.87 (95% CI 0.82, 0.92), a precision of 0.73 (95% CI 0.65, 0.80), a recall of 0.94 (95% CI 0.85, 0.98), and an F1-score of 0.82 (95% CI 0.74, 0.88). Since the features extracted from reports and chest X-ray images are low-dimensional vectors, the AUC of SVM can also reach 0.79 (95% CI 0.74, 0.84). Meanwhile, to evaluate the statistical significance of the clinical information, we performed paired t-tests (at the 95% confidence level) on the performances of our model and the competing models. In terms of classification capability, the performance gap between Random Forest, DT, and MulNet is not obvious, but MulNet has a greater advantage in interpretability.

D. INTERPRETATIVE VARIABLES OF MULNET
In addition to classifying pneumonia accurately, MulNet is, more importantly, explainable. Compared with SVM, Random Forest, and DT, MulNet shows the relationships between different factor nodes. For any diagnosis result, the probability from the root (result node) to the leaves (factor nodes) can be analyzed.
The Bayesian Network assigns a conditional probability table (CPT) to each variable, and the CPT is used to explain the causality between nodes. We display and analyze the conditional probability tables of some nodes. Table 3 shows the conditional probability table of the node ''pictures''. When fever symptoms occur, the possibility of abnormalities in the chest X-ray images is the greatest, with a probability of 0.69. The occurrence of chest pain, on the contrary, reduces the possibility of abnormalities in the chest X-ray images. When chest pain occurs, regardless of whether fever symptoms occur, the possibility of abnormalities in the chest X-ray images is below 0.35, and the probability when neither symptom occurs is similar. Medically speaking, patients with pneumonia may have chest pain, but chest pain is more common in other diseases, whereas fever is a common symptom in patients with pneumonia. The conditional probability table explains this phenomenon. Table 4 shows part of the conditional probability table of the ''pneumonia_or_not'' node. When the patient has a fever, the probability of suffering from pneumonia is 0.15. The probability of pneumonia when the patient has a fever and abnormalities in the chest X-ray images is 0.60. When the patient has not only fever and abnormalities in the chest X-ray images but also chest pain, the probability of pneumonia is 0.92. According to the conditional probability table, it can be inferred that when the patient has fever and abnormalities in the chest X-ray images, the probability of suffering from pneumonia is more than one half. When the patient has a fever, abnormalities in the chest X-ray images, and wet rales, it is almost certain that the patient has pneumonia.

A. PERFORMANCE COMPARISON OF CNN MODELS
In MulNet, the CNN model is an essential part, used to learn and analyze chest X-ray images. The CNN model chosen in this paper is DenseNet121. Before choosing this model, a variety of models were trained and tested on Test-CNN. The comparison models are Inception-V4, ResNet, Xception, and AlexNet. As shown in Figure 9, with AUC and F1-score as the indicators of model performance, DenseNet's AUC and F1-score are 0.829 and 0.759, respectively, which are the best on both indicators. The experimental results demonstrate that DenseNet121, with its feature reuse technique, is more suitable for learning chest X-ray images.

B. THE IMPACT ON THE DIFFERENT TYPES OF DATA INPUT
We trained the DenseNet model to estimate the values of the ''pictures'' node of the Bayesian Network on Training-BN and Test-BN. We compared the AUC values of three different input configurations, as shown in Figure 10: 1) the AUC was 0.865 when MulNet integrated chest X-ray images and the clinical reports; 2) the AUC was 0.829 when we used only chest X-ray images for diagnosis via DenseNet; 3) the AUC was 0.801 when we used only the reports to diagnose pneumonia with MulNet. This confirms that combining the two types of information yields the highest AUC. Furthermore, the AUC obtained from chest X-ray images alone is better than that obtained from reports alone.

C. THE WEIGHTS OF DIFFERENT NODES IN THE DECISION TREE
After training the DT, we calculated the weight distribution of the different nodes in the DT. The results are presented in Figure 11, which indicates that the ''pictures'' node has the highest weight, exceeding 0.6. ''cough,'' ''fever,'' ''wet_rales,'' and ''dry_rales'' also have a large proportion of the weight, whereas the weights of ''hemoptysis,'' ''chest_pain,'' and ''dyspnea'' are low. Clinical manifestations and X-ray images are the most important for clinical diagnosis. Similarly, physicians analyze the condition from the chest X-ray images or CT and then look at essential symptoms such as cough, fever, wet rales, and dry rales. Note that hemoptysis, chest pain, and dyspnea are not typical symptoms in pneumonia patients.

VIII. CONCLUSION
In this paper, we propose a multi-source and interpretable medical-assisted diagnosis model for pneumonia, and we have created a large-scale pneumonia diagnosis dataset annotated by respiratory specialists. Our model consists of a CNN and a Bayesian Network (BN) and combines two types of data: 1) chest X-ray images and 2) medical reports. Moreover, the model provides explanatory diagnostic information, so that physicians can better understand the diagnosis result. The results show that our model performs better than using only images or only reports, and that it works best when compared with a variety of baselines. Next, we are working on classifying pneumonia in more depth, for example by determining whether it is caused by bacteria, viruses, or fungi. In the future, we may add a knowledge graph as an input of the model. We are constructing a large-scale knowledge graph related to pneumonia, so that the classification ability of the model can be further improved.