Pneumothorax Recognition Neural Network Based on Feature Fusion of Frontal And Lateral Chest X-ray Images

Pneumothorax is a potentially life-threatening disease that requires urgent diagnosis and treatment. Clinically, a chest X-ray examination is the first choice for diagnosing pneumothorax. However, pneumothorax is difficult to diagnose from frontal chest X-ray imaging alone when the lesion area contains only a small amount of air. Therefore, we propose a pneumothorax diagnosis neural network based on feature fusion, in which frontal and lateral X-ray information is fused. The network has two inputs and three outputs. The two inputs are the frontal chest X-ray image and the lateral chest X-ray image. The three outputs are the classification result for the frontal chest X-ray image, the classification result for the lateral chest X-ray image, and the classification result obtained from the fused frontal and lateral features. Our algorithm considers the vanishing gradient problem in the pneumothorax recognition model and introduces the residual block to alleviate it. Because the model has a large number of channels, we also use a channel attention mechanism to improve its performance. Our comparative experiments show that fusing frontal and lateral chest image features achieves higher accuracy than the single-task models. Using only image-level annotation, our pneumothorax model achieves high recognition accuracy.


I. INTRODUCTION
Pneumothorax (lung collapse) occurs when excessive air accumulates in the pleural cavity between the lung and the chest wall. This air pressure causes the lung to collapse. The main symptoms of pneumothorax are chest tightness, shortness of breath, and cough. Pneumothorax may be caused by physical factors, such as chest trauma or impact, smoking, and pulmonary diseases [1,2]. The diagnosis of pneumothorax is very complex and is usually determined by radiologists based on a chest X-ray examination. However, using a chest X-ray to diagnose pneumothorax is challenging for radiologists. Even experienced radiologists need to carefully adjust image display settings, such as the window width, window level, and image contrast, to make the correct diagnosis of the disease. This work requires a large amount of clinical experience and patience. Sometimes, fatigued doctors will make incorrect judgments. The diagnostic accuracy of pneumothorax highly depends on the expertise of the attending radiologist [3,4].
In the absence of trained radiologists, the correct diagnosis and treatment of pneumothorax are often delayed, which may cause serious injury or even death in patients. This situation is common in some underdeveloped countries and regions. Therefore, there is an urgent need for a computer-aided diagnostic (CAD) tool [5,6] to help doctors accurately diagnose and detect pneumothorax. Deep learning-based technology is a popular choice for image segmentation and classification. Great success has been achieved using deep learning in different fields, such as natural scene image understanding [7], geographic exploration [8], and medical image recognition [9,10]. The popularization of object detection, semantic segmentation, and disease classification based on deep learning has greatly relieved doctors of tedious work and improved diagnostic efficiency [11][12][13][14]. Multitask learning strategies [15,16], such as Siamese networks [17] and auxiliary tasks [18], have attracted increasing attention in computer-aided diagnostics.
When designing a high-performing pneumothorax auxiliary diagnosis algorithm, the following challenges are usually encountered. The resolution of a chest X-ray is limited, and pneumothorax is difficult to distinguish when the lesion area contains only a small amount of air. Air accumulation areas can be scattered at various locations and appear in multiple shapes. Pixel-level medical image annotation (i.e., strong supervision) is expensive and difficult to obtain. Image-level medical image annotation (i.e., weak supervision) is relatively easy to obtain, but it is difficult to achieve high recognition accuracy with it [19].
To alleviate these problems, we propose a multitask learning model [15] (i.e., a fusion model) for the automatic detection of pneumothorax. The model fuses information from frontal and lateral chest X-ray images to realize high-precision automatic recognition of pneumothorax. We invited experienced radiologists to build a pneumothorax image dataset (Haut-NY). This dataset contains not only frontal chest X-ray images but also the corresponding lateral chest X-ray images. This is valuable because many well-known pneumothorax datasets, such as the National Institutes of Health chest X-ray dataset [19,20] and the Society for Imaging Informatics in Medicine (SIIM) Pneumothorax Challenge Dataset [21], do not contain lateral chest X-ray images. The main contributions of this work are summarized as follows.
• Unlike previous works that use only frontal chest X-ray images to identify pneumothorax, we propose a multi-input multi-output neural network that fuses information from frontal and lateral chest X-ray images. Experiments show that the model's accuracy is higher than that of models using only frontal or only lateral chest X-ray images.
• The fusion model includes a residual block and a channel attention mechanism. The residual block alleviates the vanishing gradient problem, and the channel attention mechanism gives different weights to different feature maps. Experiments show that these strategies improve the pneumothorax recognition accuracy of the fusion model.
• The fusion model proposed in this paper needs only image-level annotation to achieve high pneumothorax recognition accuracy.
The rest of this study is organized as follows. In Section II, the literature on the automatic diagnosis of pneumothorax is introduced. In Section III, the specific structure and classification results of the fusion model, including the details of the dataset, the hyperparameters, and the comparison with the single input model, are described. In Section IV, a comparison of our model to models from excellent articles in recent years is given. Discussions, conclusions, and future work are given in Sections V and VI.

II. RELATED WORKS
Early automatic pneumothorax detection methods relied on traditional feature extraction techniques. The Hough transform [22] was used to model the appearance of pneumothorax in X-ray images and local intensity histograms, image edge detection was used to catch the visceral pleural edge [23], and texture information was used to quantify pulmonary vascular markers [24]. Because predefined appearance features cannot capture the variety of human lungs and pneumothorax presentations, the diagnostic accuracy of such algorithms remains relatively low. The development of deep learning has introduced a new approach for the automatic diagnosis of pneumothorax. Deep learning algorithms can be used to train a model to classify X-ray images of lungs with and without pneumothorax [25,26]. Cicero et al. [27] pioneered pneumothorax diagnosis. They used GoogLeNet to detect five common lung diseases using more than 35,000 adult chest X-ray images. The AUC (area under the curve) for pneumothorax detection was 0.86. Taylor et al. [28] compared the performance of the Inception, VGG, and ResNet neural network architectures, and the AUC obtained for pneumothorax detection was 0.94. Rajpurkar et al. [29] later reported that the performance of ResNet is statistically equivalent to that of radiologists in diagnosing pneumothorax from chest X-ray images. Park et al. [30] applied the YOLO series network to identify traumatic pneumothorax after chest puncture. Xiyue Wang et al. [14] used a multitask training strategy to improve the accuracy of pneumothorax recognition, obtaining an AUC of 0.9786 in pneumothorax detection. The highest AUC value was achieved by Xiaosong Wang et al. [31], who combined medical reports with X-ray images from the same patient to achieve automatic pneumothorax classification, obtaining an AUC of 0.995. To protect the privacy of patients, they did not publish these reports.

A. OBJECTIVES
The methods presented in [14] and [31] outperform most other work on pneumothorax recognition. The former used pixel-level annotation, while the latter used medical reports to detect pneumothorax. However, both types of annotation are expensive. Our main objective is to achieve high-precision pneumothorax recognition using low-cost image-level annotation. The pneumothorax may appear at different sizes in the frontal chest X-ray image and the lateral chest X-ray image of the same patient, so each view has its own advantages. Therefore, we aim to use a neural network to fuse frontal and lateral chest X-ray image information to improve the accuracy of pneumothorax recognition.

1) DATASET
In cooperation with Nanyang Central Hospital in Henan Province, China, we collected 2,530 pairs of chest X-ray images from patients of different ages and genders (each pair consists of a frontal and a lateral chest X-ray image from the same patient). The dataset contains a total of 5,060 Digital Imaging and Communications in Medicine (DICOM) files. Each DICOM file includes the patient's protected health information (PHI), such as name, sex, and age, as well as image-related information. Specifically, there were 1,670 negative pneumothorax cases and 860 positive pneumothorax cases, which were labeled by experienced radiologists. Figure 1 shows X-ray images of one patient taken from different angles. The specific data distribution is shown in Figures 2 and 3.

2) IMAGE AND METHOD ENHANCEMENTS
Data preprocessing: First, Pydicom was used to convert all DICOM files into PNG files. The original image resolution was 3200 × 3200 pixels. We downsampled the images to a resolution of 224 × 224 pixels, using bilinear interpolation [32,33] to preserve the quality of the reduced image as much as possible. The advantage of this resolution is that relatively few model parameters are required and a fast training speed can be achieved. In addition, we also experimented with resolutions of 768 × 768 and 1024 × 1024 pixels.
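The downsampling step can be illustrated with a minimal pure-Python sketch of bilinear interpolation. The function name `bilinear_resize`, the list-of-rows image representation, and the pixel-centre alignment convention are assumptions made for illustration; the paper does not state which resizing routine was actually used.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the centre of output row i back into input coordinates.
        y = (i + 0.5) * in_h / out_h - 0.5
        y = min(max(y, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = (j + 0.5) * in_w / out_w - 0.5
            x = min(max(x, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbouring input pixels.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 4 x 4 image whose intensity is r * 4 + c, reduced to 2 x 2.
toy = [[r * 4 + c for c in range(4)] for r in range(4)]
small = bilinear_resize(toy, 2, 2)
```

Each output pixel blends only its four nearest input pixels, so for a large reduction factor such as 3200 to 224 most input pixels do not contribute to the result; this is one reason the higher resolutions mentioned above may be worth comparing.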
Considering the limited number of chest X-ray images, we used Albumentations, a fast image augmentation library based on OpenCV, to enhance (augment) the training images. It has a simple yet powerful interface that supports various tasks, such as classification, segmentation, and detection. In addition, it is easy to customize, can be combined with other frameworks, and supports pixel-level transformation of the dataset. The specific image enhancement methods we used are shown in Table 2.
Because our model needs to fuse chest X-ray images of two different views of the same patient (i.e., frontal and lateral views), the data enhancement of the two images must be consistent. Otherwise, our deep learning model does not converge.

FIGURE. An example of chest X-ray images of a patient from the Haut-NY dataset. The first row shows the original frontal chest X-ray images, and the second row shows the frontal chest X-ray images after image enhancement. The third row shows the original lateral chest X-ray images, and the fourth row shows the lateral chest X-ray images after image enhancement. A consistent data enhancement method must be maintained for a pair of images (i.e., the frontal and lateral chest X-ray images of the same patient).

TABLE 2. The image enhancement methods used in this experiment. The parameter "p" represents probability. For example, the probability of "blur" is 0.8, and the probability of no "blur" is 0.2. For the training dataset, we used nine methods to increase the model's generalization ability. For the test dataset, only the normalization method was used because, in a real diagnosis, medical images will not be rotated or blurred.
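The requirement that a frontal and lateral pair receive the same enhancement can be sketched as follows: the random parameters are drawn once per pair and then applied to both views. The two transforms shown (a horizontal flip and a brightness shift) and all function names are hypothetical stand-ins chosen for illustration; they are not the nine methods of Table 2.

```python
import random


def sample_params(rng):
    """Draw augmentation parameters once per image pair (hypothetical subset)."""
    return {"hflip": rng.random() < 0.5, "shift": rng.randint(-2, 2)}


def apply_params(img, params):
    """Apply already-sampled parameters to one image (list of rows)."""
    out = [list(row) for row in img]
    if params["hflip"]:
        out = [row[::-1] for row in out]
    shift = params["shift"]  # brightness offset added to every pixel
    return [[v + shift for v in row] for row in out]


def augment_pair(frontal, lateral, rng):
    """Augment both views of one patient with identical parameters."""
    params = sample_params(rng)
    return apply_params(frontal, params), apply_params(lateral, params)


frontal = [[1, 2, 3], [4, 5, 6]]
lateral = [[7, 8, 9], [10, 11, 12]]
aug_frontal, aug_lateral = augment_pair(frontal, lateral, random.Random(0))
```

In Albumentations, the `additional_targets` argument of `Compose` serves a similar purpose by applying one sampled transform to several images; whether the authors used it is not stated.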

D. SINGLE INPUT MODEL
In this experiment, we used only the 2,530 frontal chest X-ray images. Positive pneumothorax images were labeled 1, and negative pneumothorax images were labeled 0. The model used only the frontal branch (i.e., single input, single output), as shown in Figure 6, taking the frontal chest X-ray images as input and outputting the binary classification results. We tried different pretrained networks and found that ResNet-50 always achieved the highest accuracy (Table 3). Therefore, we used ResNet-50 pretrained on ImageNet as the backbone. The input size was 224 × 224 × 3. The extracted features were flattened using a flatten layer and then processed by four fully connected layers with the ReLU activation function. The final fully connected layer had 2 units (representing the binary classification), and the softmax activation function converted its outputs into a probability distribution with values in the range [0,1]. The loss function was binary cross-entropy.
After the first set of experiments, we used only the 2,530 lateral chest X-ray images for the next experiment. The model used only the lateral branch (i.e., single input, single output), as shown in Figure 6, taking the lateral chest X-ray images as input and outputting the binary classification results. The training details were the same as for the single-input single-output frontal chest X-ray image model.
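The two-unit softmax head and its loss can be sketched in a few lines. This is a minimal illustration that assumes one-hot labels, in which case binary cross-entropy over two softmax outputs reduces to the negative log-probability of the true class; the function names are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Loss for one sample; label is 0 (negative) or 1 (positive)."""
    return -math.log(probs[label])


probs = softmax([0.0, 0.0])       # an undecided network
loss = cross_entropy(probs, 1)    # equals log(2) for a 50/50 prediction
```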

E. MULTI-INPUT MODEL
Unlike traditional approaches that learn only from frontal chest X-ray images or only from lateral chest X-ray images, we designed a multi-input network that integrates frontal and lateral chest X-ray image information to diagnose pneumothorax. A model pretrained on ImageNet was used to extract image features. We tried several different pretrained models as the backbone and found that the model's accuracy was highest when ResNet-50 was used (Table 6). We also found that the characteristics of the ResNet-50 network determine the accuracy. In training the model, one of the challenges is the vanishing gradient problem [34], which occurs when the network is deep. The deeper the network is, the more pronounced the vanishing gradient and the poorer the training effect. However, a shallow network cannot significantly improve network performance. The residual block [35], an important module in ResNet-50, resolves this tension by effectively alleviating the vanishing gradient in a deeper network. Figure 5 and Formulas 7-9 show how this is achieved. Even if gradient attenuation occurs in the backward propagation along A-B-C, the gradient at D can still be transmitted directly to A; that is, cross-layer propagation of the gradient is realized. From the perspective of gradient magnitude, no matter how deep the network structure is, the residual network can maintain a large gradient for the weights close to the data layer (input), which alleviates the vanishing gradient.
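The effect of the shortcut connection on gradient magnitude can be illustrated with a scalar toy model. The sketch below assumes the standard residual formulation y = x + F(x), whose derivative is 1 + F'(x); it is not a reproduction of Formulas 7-9, and the local derivative value 0.1 and the depth 20 are arbitrary illustrative choices.

```python
def plain_chain_gradient(local_grads):
    """Gradient through stacked layers y = f(x): product of local derivatives."""
    g = 1.0
    for d in local_grads:
        g *= d
    return g


def residual_chain_gradient(local_grads):
    """Gradient through stacked blocks y = x + f(x): each block contributes 1 + f'(x)."""
    g = 1.0
    for d in local_grads:
        g *= 1.0 + d
    return g


local = [0.1] * 20                        # small local derivatives, 20 layers
plain = plain_chain_gradient(local)       # shrinks towards zero with depth
resid = residual_chain_gradient(local)    # identity path keeps it from vanishing
```

Because every residual block adds the identity term, the product never falls below the contribution of the shortcut path, which is the cross-layer propagation described above.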
Our fusion model has a large number of channels (i.e., feature maps), and different feature maps have different importance for pneumothorax recognition. Therefore, we introduced a channel attention mechanism, SENet [36], to optimize the model. SENet addresses the loss caused by treating feature-map channels of different importance equally during convolution and pooling. The squeeze-and-excitation (SE) block improves the representation ability of the network by modeling the dependence between channels and adjusting the features channel by channel, so that the network learns, through global information, to selectively strengthen features containing useful information and suppress useless features.
The basic structure of SENet is shown in Figure 7. The parameter V_c represents the c-th convolution kernel, and X^s represents the s-th input. Ftr represents the convolution operation before the attention mechanism, and Fsq represents the squeeze operation. After global average pooling, the feature information changes from H × W × C to 1 × 1 × C. Fex represents the excitation operation. After the first fully connected layer and the ReLU layer, the feature information changes from 1 × 1 × C to 1 × 1 × C/r, where r is 16. After the second fully connected layer and the sigmoid function, the feature information changes from 1 × 1 × C/r back to 1 × 1 × C (called the weights s). Fscale represents the channel-wise multiplication of the weights s and the feature maps U obtained from the preceding convolution to produce the output. The relevant equations of the channel attention mechanism follow the SENet formulation [36].

In our work, frontal and lateral chest X-ray images were combined and two channel attention mechanisms were used. In the first step, the squeeze operation took the global spatial characteristics of each channel as the representation of that channel to form a channel descriptor. In the second step, the dependence between channels was learned and the feature map was adjusted accordingly. The modified feature map was the output of the SE block. The benefits of this SE block recalibration could be accumulated throughout the network. The concat function was used to fuse the information of the frontal and lateral chest X-ray images, and finally, a flatten layer was used to flatten the data. Fully connected layers (FCs) played the role of classifiers in the whole convolutional neural network. Finally, the binary classification results were output using the softmax function (Formula 14).

Our model has two inputs and three outputs. The two inputs are the frontal chest X-ray image and the lateral chest X-ray image. The three outputs are the classification result for the frontal chest X-ray image, the classification result for the lateral chest X-ray image, and the classification result obtained from the fused frontal and lateral features. Our fusion model fuses the frontal and lateral feature maps of the chest X-ray images and then learns from them.
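The squeeze, excitation, and scale steps described above can be sketched in pure Python. This is a minimal illustration, not the authors' implementation: the function name `se_block`, the nested-list tensor layout, and the toy sizes (C = 2 channels, reduction ratio r = 2 instead of 16) are assumptions made so that the example stays small.

```python
import math


def se_block(u, w1, b1, w2, b2):
    """Squeeze-and-excitation on feature maps u laid out as [C][H][W].

    w1 has shape [C/r][C] and w2 has shape [C][C/r]; b1 and b2 are biases.
    Returns the rescaled feature maps and the channel weights s.
    """
    # Squeeze (Fsq): global average pooling, H x W x C -> 1 x 1 x C.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]

    # Excitation (Fex), first FC + ReLU: 1 x 1 x C -> 1 x 1 x C/r.
    hidden = []
    for w_row, b in zip(w1, b1):
        pre = sum(w * zc for w, zc in zip(w_row, z)) + b
        hidden.append(max(0.0, pre))

    # Excitation, second FC + sigmoid: 1 x 1 x C/r -> 1 x 1 x C.
    s = []
    for w_row, b in zip(w2, b2):
        pre = sum(w * h for w, h in zip(w_row, hidden)) + b
        s.append(1.0 / (1.0 + math.exp(-pre)))

    # Scale (Fscale): multiply every channel by its weight.
    out = [[[s[c] * v for v in row] for row in u[c]] for c in range(len(u))]
    return out, s


# Toy input: two 2 x 2 feature maps; zero weights give s = 0.5 per channel.
u = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1, b1 = [[0.0, 0.0]], [0.0]
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]
out, s = se_block(u, w1, b1, w2, b2)
```

In a trained network the weights are learned, so informative channels receive values of s close to 1 and uninformative ones values close to 0.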
The loss function was binary cross-entropy (Formula 16). Using cross-entropy as the loss function can alleviate the imbalance between positive and negative samples to a certain extent, and the calculated gradient is more stable [37]. All three branches adopted binary cross-entropy as the loss function, but with different weighting coefficients (Formula 17).

All experiments in this study adopted 5-fold cross-validation, in which the data were divided into five equal parts. The parts were completely independent and did not overlap. In each experiment, one part was used for testing and the rest was used for training. The reported values are averages over the five experiments.
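A minimal sketch of such a 5-fold split is given below. It assumes the split is stratified by class and performed at the level of image pairs, so that the frontal and lateral views of one patient always fall in the same fold. Stratification is an inference rather than something the text states: the per-fold test counts reported later (334 negative and 172 positive) equal 1,670/5 and 860/5.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Return k disjoint lists of pair indices with equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


# One label per image pair: 1,670 negative and 860 positive cases.
labels = [0] * 1670 + [1] * 860
folds = stratified_folds(labels)

splits = []
for test_fold in range(5):
    test_idx = folds[test_fold]
    train_idx = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    splits.append((train_idx, test_idx))
```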

F. EXPERIMENTAL DETAILS AND OPTIMIZATION METHODS
The following hyperparameter settings were used. The optimizer was Adam, the batch size was 32, and the initial learning rate was 0.0001. The model was trained for up to 60 epochs. If the model's performance did not improve for five consecutive epochs, the learning rate was decreased to one-tenth of its original value. If the performance did not improve for 15 epochs, early stopping was triggered to prevent overfitting. All networks were implemented in the TensorFlow framework and trained on NVIDIA GeForce RTX 3080 Ti GPU cards.
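The learning-rate reduction and early-stopping rules can be sketched as a small simulation over a sequence of per-epoch scores. Several points are assumptions: the monitored quantity is taken to be a validation score where higher is better, each reduction is applied to the current learning rate, and the non-improvement counter for the learning rate restarts after every reduction. The function name `run_schedule` is illustrative.

```python
def run_schedule(val_scores, lr=1e-4, lr_patience=5, stop_patience=15,
                 factor=0.1, max_epochs=60):
    """Return a list of (epoch, learning_rate) for the epochs actually trained."""
    best = float("-inf")
    since_best = 0   # epochs without improvement (for early stopping)
    since_lr = 0     # epochs without improvement since the last LR change
    history = []
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        if score > best:
            best = score
            since_best = 0
            since_lr = 0
        else:
            since_best += 1
            since_lr += 1
        if since_lr >= lr_patience:
            lr *= factor          # reduce to one-tenth
            since_lr = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            break                 # early stop
    return history


# Hypothetical scores: one good epoch followed by a long plateau.
history = run_schedule([0.5] + [0.4] * 59)
```

In Keras, comparable behaviour is commonly obtained with the `ReduceLROnPlateau` and `EarlyStopping` callbacks; the text does not say whether these were used.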

1) SINGLE INPUT MODEL CLASSIFICATION RESULTS
The classification results of frontal chest X-ray images are shown in Table 4. For the binary classification of the frontal images, a recognition precision of 0.89 was achieved on the 334 nonpneumothorax images, and a recognition precision of 0.82 was achieved on the 172 pneumothorax images. The recognition accuracy was 0.87.
The classification results of lateral chest X-ray images are shown in Table 5. For the binary classification of the lateral images, a recognition precision of 0.90 was achieved on the 334 nonpneumothorax images, and a recognition precision of 0.77 was achieved on the 172 pneumothorax images. The recognition accuracy was 0.85.

2) MULTI-INPUT MODEL CLASSIFICATION RESULTS
For the case of multiple inputs, the classification results of the frontal branch are shown in Table 7. For the 334 images from patients without pneumothorax, a recognition precision of 0.93 was achieved, and for the 172 images from patients with pneumothorax, a recognition precision of 0.85 was achieved. The recognition accuracy was 0.91. In the case of multiple inputs, the classification results of the lateral branch are shown in Table 8. For the 334 images from patients without pneumothorax, a recognition precision of 0.91 was achieved, and for the 172 images from patients with pneumothorax, a recognition precision of 0.87 was achieved. The recognition accuracy was 0.90. Compared with a single input, all values were improved to varying degrees.
The most obvious improvement in accuracy came from the fusion branch. In the case of multiple inputs without channel attention, the classification results of the fusion branch are shown in Table 9. For the 334 images from patients without pneumothorax, a recognition precision of 0.95 was achieved, and for the 172 images from patients with pneumothorax, a recognition precision of 0.88 was achieved. The recognition accuracy was 0.92. Both the macro avg and weighted avg indices improved to varying degrees. When the multi-input model was combined with channel attention, the classification results of the fusion branch are shown in Table 10. For the 334 images from patients without pneumothorax, a recognition precision of 0.93 was achieved, and for the 172 images from patients with pneumothorax, a recognition precision of 0.95 was achieved. The recognition accuracy was 0.94. Both the macro avg and weighted avg indices again improved to varying degrees.
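For readers unfamiliar with the indices quoted above, the following sketch shows how per-class precision, accuracy, macro avg, and weighted avg are derived from a confusion matrix. The four counts used in the example are hypothetical and are not taken from Tables 4-10; only the class supports (334 and 172) follow the text.

```python
def precision_report(tn, fp, fn, tp):
    """Per-class precision, accuracy, and the two averaged precisions."""
    support_neg = tn + fp          # true nonpneumothorax images
    support_pos = fn + tp          # true pneumothorax images
    total = support_neg + support_pos
    prec_neg = tn / (tn + fn)      # of images predicted negative, share correct
    prec_pos = tp / (tp + fp)      # of images predicted positive, share correct
    return {
        "precision_neg": prec_neg,
        "precision_pos": prec_pos,
        "accuracy": (tn + tp) / total,
        # Macro avg: unweighted mean over the two classes.
        "macro_avg": (prec_neg + prec_pos) / 2,
        # Weighted avg: mean weighted by the number of true images per class.
        "weighted_avg": (prec_neg * support_neg + prec_pos * support_pos) / total,
    }


# Hypothetical confusion matrix with supports 334 (negative) and 172 (positive).
report = precision_report(tn=300, fp=34, fn=22, tp=150)
```

Because the negative class is about twice as large as the positive class, the weighted avg sits closer to the negative-class precision than the macro avg does.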

1) INFLUENCE OF MODEL TRAINING SEQUENCE ON ACCURACY
We also found that training the three branches in different sequences affects the performance of the fusion model. We trained our fusion model with four different sequences. 1) The three branches were trained at the same time (train together). 2) First, the frontal branch was trained, and then the lateral branch and the fused branch were trained (1-2-3). 3) First, the lateral branch was trained, and then the frontal branch and the fused branch were trained (2-1-3). 4) The frontal and lateral branches were trained together, and then the fused branch was trained (1 2-3).

2) INFLUENCE OF WEIGHT RATIO OF LOSS FUNCTION ON MODEL ACCURACY
Because our model contains three outputs, the weight ratio between the three loss functions must be considered. In fact, in our experiment, different weight ratios affect the accuracy of the model to a certain extent. We found that when the weight of the loss function is set to 1:0.6:0.6 (fused:frontal:lateral), the fused branch achieved the highest AUC.
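The role of the weight ratio can be made concrete with a short sketch. It assumes that the three branch losses are combined as a weighted sum, which is the usual way to train multi-output networks; Formula 17 is not reproduced here, so the exact form is an assumption, and the function name is illustrative.

```python
def total_loss(loss_fused, loss_frontal, loss_lateral, weights=(1.0, 0.6, 0.6)):
    """Weighted sum of the three branch losses (fused:frontal:lateral)."""
    w_fused, w_frontal, w_lateral = weights
    return (w_fused * loss_fused
            + w_frontal * loss_frontal
            + w_lateral * loss_lateral)


# Equal branch losses of 1.0 give a total of 1.0 + 0.6 + 0.6.
example = total_loss(1.0, 1.0, 1.0)
```

With the 1:0.6:0.6 ratio, an error in the fused branch influences the shared parameters more strongly than the same error in either single-view branch. In Keras, such weights are typically passed through the `loss_weights` argument of `Model.compile`.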

I. STATISTICAL ANALYSIS OF THE FUSION MODEL
We attempted to analyze the connection between the frontal branch, lateral branch, and fusion branch from a statistical perspective. In binary classification, softmax maps the outputs of multiple neurons into the (0,1) interval, so that they can be interpreted as probabilities. Consider, for example, a chest X-ray image pair input into the model. A softmax output of [0.49,0.51] from the frontal branch means that the frontal branch assigns a probability of 0.49 to nonpneumothorax and a probability of 0.51 to pneumothorax for the corresponding image. A softmax output of [0.01,0.99] from the lateral branch means that the lateral branch assigns a probability of 0.01 to nonpneumothorax and a probability of 0.99 to pneumothorax.
Let the softmax function output of the frontal branch be [XFrontal, YFrontal], the softmax function output of the lateral branch be [XLateral, YLateral], and the softmax function output of the fusion branch be [XFused, YFused]. The connection between the three is shown in Table 14, where P(YFused > 0.5) represents the probability that the fusion branch classifies the corresponding image as positive for pneumothorax.

Relevant articles from recent years on pneumothorax classification fall into two categories. The first category is multiclass classification, which usually predicts 5-14 different chest diseases, with the AUC for pneumothorax recognition ranging from 0.80 to 0.92 [19,27,40,41,42]. The second category is the binary classification of pneumothorax [14,38,39], which is the main comparison for this experiment. The approach of Xiyue Wang et al. [14] was to use pixel-level pneumothorax annotation combined with a multitask learning model to achieve high-precision pneumothorax classification. Their work was very comprehensive. Among the five evaluated indicators in Table 15, AUC is the most valuable because it is robust to imbalance in the number of samples to a certain extent. The accuracy, precision, recall, and F1-score are vulnerable to data imbalance (i.e., the ratio of positive to negative pneumothorax cases).
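The robustness of AUC to class imbalance can be seen from its pairwise definition: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical pneumothorax scores for two positive and two negative cases.
scores = [0.9, 0.3, 0.8, 0.1]
labels = [1, 1, 0, 0]
balanced = auc(scores, labels)

# Duplicating every negative case changes the class ratio but not the AUC.
imbalanced = auc(scores + [0.8, 0.1], labels + [0, 0])
```

Accuracy, by contrast, would shift with the class ratio in this example, because it counts correct predictions over all samples rather than comparing positives against negatives.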

V. DISCUSSION
When accuracy was the evaluation indicator, the fusion network was approximately 7% more accurate than the model using only frontal chest X-ray image information and approximately 9% more accurate than the model using only lateral chest X-ray image information. When AUC was the evaluation indicator, our fusion model scored 4% higher than the frontal-only or lateral-only models. In fact, by comparing the data in Table 11, we found that the fusion network performed best regardless of which performance index was used for evaluation.
Our further experiments showed that the accuracy of pneumothorax recognition was related to the training sequence of the model and the weight ratio of the loss function. When the three branches were trained simultaneously and the weight of the loss function was 1:0.6:0.6 (fused:frontal:lateral), the model's accuracy was the highest.
Referring to Table 15, using only image-level annotation, our model still achieved high pneumothorax recognition accuracy.

VI. CONCLUSION AND FUTURE DIRECTIONS
We proposed a pneumothorax binary classification neural network based on feature fusion. This is meaningful because most of the literature on pneumothorax recognition considers only frontal chest X-ray images. Our model fuses frontal and lateral X-ray information to achieve higher-precision pneumothorax recognition. The model design considered the phenomenon of the vanishing gradient in deeper neural networks, so we introduced the residual block to alleviate it. Because the feature map contained a large number of channels after feature fusion, the channel attention mechanism SENet was used to adjust the feature map. Comparative experiments showed that the accuracy of this method was higher than that of the traditional single-task pneumothorax recognition network. The main value of our work is that high pneumothorax recognition accuracy can be achieved using only image-level annotations. However, we need pairs of image-level annotations (frontal annotation + lateral annotation) rather than annotations of frontal images only. Therefore, our dataset is more expensive than a dataset containing only frontal images. This is a limitation of our method. Nevertheless, even for paired image-level annotations, the cost is still lower than that of pixel-level annotations. Therefore, the proposed method may assist radiologists with the prompt and accurate diagnosis of pneumothorax and precise treatment planning.
Future work is as follows. First, we will improve the dataset, expand the number of images, invite experienced radiologists to add pixel-level annotations to our dataset, and conduct in-depth research on visualization techniques. Second, we will combine our model with the expert system [43] and the fuzzy consensus [44], which play an important role in artificial intelligence-aided diagnosis and the internet of medical things.