A Deep Model for Lung Cancer Type Identification by Densely Connected Convolutional Networks and Adaptive Boosting

Timely diagnosis and determination of the type of lung cancer have important clinical significance. Generally, multiple imaging methods are needed to complement each other and obtain a comprehensive diagnosis. In this work, we propose a deep learning model to identify lung cancer type from CT images of patients in Shandong Provincial Hospital. The task poses a two-fold challenge: artificial intelligence models trained on public datasets cannot meet such practical requirements, and the amount of collected patient data is small. To address both problems, we first use image rotation, translation and transformation methods to expand and balance our training data; then densely connected convolutional networks (DenseNet) are used to classify malignant tumors from the collected images; and finally the adaptive boosting (AdaBoost) algorithm is used to aggregate multiple classification results to improve classification performance. Experimental results show that our method achieves an identification accuracy of 89.85%, which is better than DenseNet without AdaBoost, ResNet, VGG16 and AlexNet. This provides an efficient, non-invasive detection tool for pathological diagnosis of lung cancer type.


I. INTRODUCTION
Lung cancer has become one of the most common causes of death in the world [1]. It is one of the malignant tumors most harmful to human health. Its mortality rate ranks first among malignant tumors, making it the leading cause of cancer death among men and women worldwide [2], [3]. There are about 1.8 million new cases of lung cancer per year (13% of all tumors) and 1.6 million deaths (19.4% of all tumors) in the world [4], and the 5-year survival rate is only 18% [5]. In China, the incidence and mortality of lung cancer rank first among men, which is related to the higher smoking rate of men [6]. Among Chinese women, the incidence of lung cancer is second only to that of breast cancer, and its death rate ranks first [7], [8]. The smoking rate of women is low, but the incidence of lung cancer is still high, which is related to women's exposure to second-hand smoke, indoor soot, outdoor air pollution and other factors. For this reason, more and more experts and scholars have begun to pay attention to the diagnosis and treatment of lung cancer, and the state has also invested more funds to encourage in-depth research on lung cancer, with a view to effective prevention and treatment. Currently, lung cancer is divided into two types according to the degree of differentiation and morphological characteristics: small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). NSCLC includes three subtypes: squamous cell carcinoma, adenocarcinoma and large cell carcinoma [9]-[13]. Timely diagnosis of lung cancer type is of great clinical significance, helping doctors develop effective treatment programs and improve patients' survival time and quality of life.
The associate editor coordinating the review of this manuscript and approving it for publication was Guanjun Liu.
Medical image analysis offers major advantages in health care, especially in non-invasive treatment and clinical examination [14]. CT is an imaging modality that uses X-rays to capture cross-sectional images of the body [15]. In 1998, LeCun et al. proposed the LeNet-5 neural network model to recognize handwritten digits [16]. In 2012, the deep convolutional neural network model AlexNet was proposed and won the ImageNet Large-Scale Visual Recognition Challenge [17]. Subsequently, GoogLeNet, which was inspired by LeNet, introduced a novel inception module [18], [19]. In 2015, ResNet was proposed, making it possible to train a 152-layer network with lower complexity [20]. In 2017, DenseNet was proposed with a completely new connection pattern that requires less computation and lower model complexity. It has made significant improvements over state-of-the-art CNNs on most benchmark tasks and has achieved remarkable success in many applications such as image recognition and the detection of electronic transaction fraud [21]-[23]. Although many methods have been proposed for image classification, few of them focus on classifying lung cancer type from CT images without biopsy.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
In this work, we use deep learning to identify lung cancer type from CT images of patients in Shandong Provincial Hospital. The task poses a two-fold challenge: artificial intelligence models trained on public datasets cannot meet such practical requirements, and the amount of collected patient data is small. To address both problems, we first use rotation, translation and transformation methods to expand and balance our training data; then densely connected convolutional networks (DenseNet) are used to classify malignant tumors into three categories: squamous cell carcinoma, adenocarcinoma and small cell carcinoma; and finally the adaptive boosting (AdaBoost) algorithm is used to aggregate multiple classification results to improve classification performance.
Experimental results show that our method achieves an identification accuracy of 89.85%, which is better than DenseNet without AdaBoost [24], ResNet [20], VGG16 [19] and AlexNet [25]. This provides an efficient, non-invasive detection tool for pathological diagnosis of lung cancer type.

A. DATA PRE-PROCESSING AND AUGMENTATION
In data preprocessing, we first remove noise that is obviously not lung cancer from the CT image. Histogram equalization is then applied to change the gray-level histogram of the original image from a relatively concentrated gray range to an approximately uniform distribution over the whole gray range. The image is stretched non-linearly and its pixel values are redistributed so that the number of pixels in each gray range is roughly the same. This effectively enhances the contrast and makes the image clearer. In addition, since the dataset is small and the numbers of images for the different lung cancer types are unbalanced, we use rotation, translation and transformation methods to expand and balance our training data. By adding invariances to the existing data, this data augmentation strategy helps prevent overfitting, improves the generalization ability of the model and avoids biasing the classification results toward the more frequent classes. After augmentation, the dataset contained close to 4000 lung cancer images.
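The preprocessing and augmentation steps described above can be sketched in plain Python. This is only a minimal illustration, not the authors' pipeline: the image is assumed to be an 8-bit grayscale image stored as a list of rows, and the function names (`equalize_histogram`, `rotate_90`, `translate_right`), the 90-degree rotation angle and the zero fill value are assumptions introduced for the example.

```python
def equalize_histogram(image, levels=256):
    """Spread the gray levels of an image (list of rows of ints) over [0, levels - 1]."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative histogram: number of pixels with gray level <= g.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:          # constant image: nothing to stretch
        return [row[:] for row in image]
    scale = (levels - 1) / (total - cdf_min)
    # Non-linear mapping from old gray level to new gray level.
    lookup = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lookup[p] for p in row] for row in image]


def rotate_90(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def translate_right(image, shift, fill=0):
    """Shift the image `shift` pixels to the right (0 <= shift <= width), padding with `fill`."""
    width = len(image[0])
    return [[fill] * shift + row[:width - shift] for row in image]


low_contrast = [[100, 101], [102, 103]]
print(equalize_histogram(low_contrast))   # gray levels stretched over the full range
print(rotate_90(low_contrast))
print(translate_right(low_contrast, 1))
```

Applying such transforms more often to the under-represented classes is one way to balance the class proportions while enlarging the training set.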

B. DenseNet
We use a DenseNet architecture to classify lung cancer images. DenseNet is a convolutional neural network with dense connectivity, consisting of several dense blocks and transition layers. A dense block of $L$ layers introduces $L(L + 1)/2$ connections, unlike traditional architectures, which have only $L$ connections. There is a direct connection between any two layers. The input to each layer is the concatenation of the outputs of all preceding layers, and the feature-maps learned by this layer are passed directly to all subsequent layers as input. Fig. 1 shows a dense block in which feature-maps are concatenated. The $l$-th layer has $l$ inputs, consisting of the feature-maps of all preceding layers. Its own feature-maps are in turn passed on to all subsequent layers.

1) DENSE CONNECTIVITY
In the training process, the weights are updated by calculating the gradient of the loss function. The gradient of one layer depends on the gradient of the layer that follows it, so when propagating from the last layer back to the first layer in a deep neural network, the gradient gradually vanishes. To solve this notorious problem, several methods have been proposed, such as normalized initialization and batch normalization. One of the most effective methods is to use shortcut connections that allow gradients to pass more quickly and directly. Dense connectivity is one such connection pattern between convolutional layers: a very dense set of shortcut connections that links earlier and later layers to ensure efficient gradient propagation. It successfully mitigates the vanishing gradient problem and helps deep convolutional neural networks capture both high-level and low-level features of objects. Fig. 2 illustrates dense connectivity, with curves connecting each layer to every other layer.
At first sight, dense connectivity appears to greatly increase the number of parameters and the computation of the network. However, DenseNet is more efficient than other networks; the key lies in the reduced computation per layer and the reuse of features. Each layer only needs to learn very few features, which greatly reduces the number of parameters and the computation. In the DenseNet model, the $l$-th layer receives the feature-maps of all preceding layers, $x_0, x_1, \ldots, x_{l-1}$, as input, and its output is

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}]), \tag{1}$$

where $[x_0, x_1, \ldots, x_{l-1}]$ is the concatenation of the preceding feature-maps along the channel dimension and $H_l(\cdot)$ is the transformation applied by the $l$-th layer. This dense connectivity means that each layer is directly connected to the input and to the loss, which promotes gradient backpropagation and mitigates the vanishing gradient problem.
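The concatenation pattern described above can be illustrated with a toy dense block in plain Python. This is a sketch under stated assumptions rather than the network used in the paper: each feature-map is reduced to a flat list of channel values, and `toy_layer` is a hypothetical stand-in for the real layer transformation (batch normalization, ReLU and convolution). The point is only to show that the $l$-th layer reads the concatenation of all earlier outputs, that its input width grows by the growth rate $k$ per layer, and that $L$ layers give $L(L + 1)/2$ connections.

```python
def toy_layer(channels, growth_rate):
    """Hypothetical stand-in for H_l: maps the concatenated input to `growth_rate` channels."""
    mean = sum(channels) / len(channels)
    return [max(0.0, mean)] * growth_rate      # ReLU of the mean, repeated k times


def dense_block(x0, num_layers, growth_rate, layer_fn=toy_layer):
    """Run a dense block; return all feature-maps, per-layer input widths and connection count."""
    features = [x0]                 # x_0, then x_1, x_2, ... as they are produced
    input_widths = []
    connections = 0
    for _ in range(num_layers):
        # [x_0, x_1, ..., x_{l-1}]: concatenate along the channel dimension.
        concatenated = [c for fmap in features for c in fmap]
        input_widths.append(len(concatenated))
        connections += len(features)            # layer l reads l earlier outputs
        features.append(layer_fn(concatenated, growth_rate))
    return features, input_widths, connections


features, widths, connections = dense_block([1.0, 2.0, 3.0], num_layers=4, growth_rate=2)
print(widths)        # k0 + k * (l - 1) for l = 1..4, with k0 = 3 and k = 2
print(connections)   # L * (L + 1) / 2 with L = 4
```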

3) TRANSITION LAYERS
Since each dense block outputs a large number of feature-maps, convolution is used to reduce the number of feature-maps passed to the next dense block, and pooling is used to reduce their spatial size. The concatenation used in (1) is not feasible when the size of the feature-maps changes. However, downsampling, which changes the size of the feature-maps, is an essential operation in convolutional neural networks. To facilitate downsampling, the network is divided into multiple dense blocks separated by transition layers, which perform convolution and pooling. Dense connectivity is therefore applied only within a dense block, and there is no dense connectivity between different dense blocks. In our experiment, each transition layer between two dense blocks is composed of BN + ReLU + Conv (1 × 1) + dropout + Pooling (2 × 2).

4) GROWTH RATE
If each function $H_l(\cdot)$ produces $k$ feature-maps, the $l$-th layer has $k_0 + k \times (l - 1)$ input feature-maps, where $k_0$ is the number of channels in the input layer. That is, the number of input feature-maps increases with the number of layers, growing by $k$ each time. The hyperparameter $k$ is called the growth rate of the network.

5) BOTTLENECK LAYERS
Although each layer outputs only $k$ feature-maps, the number of input feature-maps of later layers becomes large as the dense block deepens. To address this, a bottleneck unit is added to the dense block: a 1 × 1 convolution is introduced as a bottleneck layer before each 3 × 3 convolution. It improves computational efficiency by producing only $4k$ feature-maps, which reduces the number of input feature-maps seen by the 3 × 3 convolution. Our network uses such a bottleneck layer: BN + ReLU + Conv (1 × 1) + BN + ReLU + Conv (3 × 3).
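A quick weight count shows when the bottleneck pays off. The sketch below is an illustration only: it counts convolution weights, ignores biases and batch-normalization parameters, and the helper names and the example channel counts are assumptions, not values taken from the paper.

```python
def conv_weights(in_channels, out_channels, kernel_size):
    """Number of weights in a square convolution, ignoring biases."""
    return in_channels * out_channels * kernel_size * kernel_size


def layer_weights(in_channels, growth_rate, bottleneck):
    """Weights of one dense layer, with or without the 1x1 bottleneck to 4k channels."""
    if not bottleneck:
        return conv_weights(in_channels, growth_rate, 3)
    mid = 4 * growth_rate
    return conv_weights(in_channels, mid, 1) + conv_weights(mid, growth_rate, 3)


k = 12
for in_channels in (24, 240):            # hypothetical early and late layer widths
    plain = layer_weights(in_channels, k, bottleneck=False)
    reduced = layer_weights(in_channels, k, bottleneck=True)
    print(in_channels, plain, reduced)
```

Under this count the bottleneck costs more for narrow inputs and only becomes cheaper once the input exceeds about 7.2k channels (from 9ck = 4ck + 36k²), which is why it matters mainly for the later layers of a block.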

6) COMPRESSION
To further improve the compactness of the model, we reduce the number of feature-maps in the transition layers. If a dense block contains $m$ feature-maps, the following transition layer outputs $\theta m$ feature-maps, where $0 < \theta \leq 1$.
The parameter $\theta$ is the fraction of feature-maps retained by the transition layer and is therefore called the compression factor. The default is 0.5, so that the number of feature-maps is halved when passed to the next dense block. When $\theta = 1$, the number of feature-maps across transition layers remains unchanged.
In our experiments, we configure DenseNet with 4 dense blocks for 50 × 50 input images, with compression factor $\theta = 0.5$ and growth rate $k = 12$. The initial convolution layer comprises $2k$ convolutions of size 7 × 7 with stride 2. The number of feature-maps for all other layers also follows from the setting of $k$.
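To make the effect of the growth rate and the compression factor concrete, the following sketch tracks the channel count through four dense blocks using the stated settings (growth rate 12, compression factor 0.5, and 2k initial feature-maps). The number of layers per dense block is not given in the text, so the value 5 used here is purely hypothetical, as are the function name and the choice to round the compressed count down.

```python
import math


def channel_schedule(num_blocks, layers_per_block, growth_rate, theta):
    """Channel count after each dense block and each transition layer."""
    channels = 2 * growth_rate                       # initial convolution outputs 2k maps
    schedule = []
    for block in range(num_blocks):
        channels += layers_per_block * growth_rate   # each layer adds k feature-maps
        schedule.append(("dense_block", channels))
        if block < num_blocks - 1:                   # no transition after the last block
            channels = math.floor(theta * channels)  # assumption: round theta * m down
            schedule.append(("transition", channels))
    return schedule


# layers_per_block=5 is a hypothetical value chosen only for illustration.
for name, channels in channel_schedule(4, layers_per_block=5, growth_rate=12, theta=0.5):
    print(name, channels)
```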

C. AdaBoost
The AdaBoost classifier is a meta-algorithm that trains a sequence of weak classifiers. It first assigns the same initial weight to each sample. After each round of training, the weight of each sample is adjusted according to the classifier's errors: the weights of samples that were correctly classified in the previous round are reduced, and the weights of misclassified samples are increased so that they receive more attention in subsequent rounds. Through this iterative training, $K$ weak learners are obtained. Finally, they are combined by weighting into a strong learner. The process is shown in Fig. 3.
The specific steps are as follows. Since multiclass classification is a generalization of binary classification, we first assume a binary classification problem. The training set is $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where $y_i \in \{-1, 1\}$ is the output label. The weight coefficients of the sample set for the $k$-th weak classifier are

$$D(k) = (w_{k1}, w_{k2}, \ldots, w_{km}), \quad w_{1i} = \frac{1}{m}, \quad i = 1, 2, \ldots, m. \tag{2}$$

In the $k$-th iteration, the weak classifier $G_k(x)$ is obtained by training on the dataset with weight distribution $D(k)$. Its classification error rate on the training dataset is

$$e_k = P(G_k(x_i) \neq y_i) = \sum_{i=1}^{m} w_{ki}\, I(G_k(x_i) \neq y_i). \tag{3}$$

The weight coefficient of the $k$-th weak classifier $G_k(x)$ is

$$\alpha_k = \frac{1}{2} \log \frac{1 - e_k}{e_k}. \tag{4}$$

In (4), the larger the classification error rate $e_k$, the smaller the corresponding weak classifier weight coefficient $\alpha_k$. In other words, a weak classifier with a small error rate has a larger weight coefficient.
For the AdaBoost multiclass classification algorithm, the principle is similar to the binary case, and the main difference is the weight coefficient of the weak classifier. For example, in the AdaBoost SAMME algorithm, the weight coefficient of a weak classifier is

$$\alpha_k = \frac{1}{2} \log \frac{1 - e_k}{e_k} + \log(R - 1), \tag{5}$$

where $R$ is the number of categories. In (5), when $R = 2$, the problem is a binary classification and the expression reduces to the weight coefficient of the weak classifier in the binary algorithm. For updating the sample weights, if the weight coefficients of the sample set of the $k$-th weak classifier are $D(k) = (w_{k1}, w_{k2}, \ldots, w_{km})$, the weight coefficients of the sample set of the $(k + 1)$-th weak classifier are

$$w_{k+1,i} = \frac{w_{ki}}{Z_k} \exp(-\alpha_k y_i G_k(x_i)), \quad Z_k = \sum_{i=1}^{m} w_{ki} \exp(-\alpha_k y_i G_k(x_i)), \tag{6}$$

where $Z_k$ is the normalization factor, so that $D(k + 1)$ becomes a probability distribution.
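The relation between the binary and the multiclass classifier weight can be checked numerically. The sketch below assumes the half-scaled logarithmic form of the binary weight with an added log(R − 1) term for R classes, which is the form consistent with the statement that the multiclass weight reduces to the binary one when R = 2; the function name and the error rates are illustrative assumptions.

```python
import math


def classifier_weight(error_rate, num_classes=2):
    """Weight of a weak classifier; num_classes=2 gives the binary AdaBoost weight."""
    return 0.5 * math.log((1 - error_rate) / error_rate) + math.log(num_classes - 1)


print(classifier_weight(0.1))                  # low error rate: large weight
print(classifier_weight(0.3))                  # higher error rate: smaller weight
print(classifier_weight(0.3, num_classes=3))   # three classes add log(2) to the weight
```

For the three lung cancer categories considered here, R = 3, so each weak classifier's weight is shifted upward by log 2 relative to the binary formula.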
In (6), if the $i$-th sample is classified incorrectly, then $y_i G_k(x_i) < 0$, so the weight of the sample increases for the $(k + 1)$-th weak classifier. If the classification is correct, the weight is reduced for the $(k + 1)$-th weak classifier. Finally, $K$ weak classifiers are obtained by iterative training and combined into a strong classifier by linear weighting:

$$f(x) = \operatorname{sign}\left(\sum_{k=1}^{K} \alpha_k G_k(x)\right). \tag{7}$$

In our experiments, the convolutional neural network model with dense connectivity is trained on the dataset to classify CT medical images. We then use 6 weak classifiers as ensemble members to form a strong classifier, which significantly improves classification performance.
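The binary procedure described above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the weak learners are one-dimensional decision stumps rather than DenseNet models, the classifier weight is assumed to be half the log-odds of the error rate, the sample weights are assumed to be rescaled by exp(−α·y·G(x)) and renormalized, and the ten-point dataset is a toy example.

```python
import math


def stump_predict(stump, x):
    """Decision stump (threshold, polarity): `polarity` if x < threshold, else -polarity."""
    threshold, polarity = stump
    return polarity if x < threshold else -polarity


def adaboost_train(xs, ys, stumps, rounds):
    m = len(xs)
    weights = [1.0 / m] * m                          # equal initial sample weights
    ensemble = []                                    # list of (alpha_k, G_k)
    for _ in range(rounds):
        def weighted_error(stump):
            return sum(w for w, x, y in zip(weights, xs, ys)
                       if stump_predict(stump, x) != y)

        best = min(stumps, key=weighted_error)       # weak classifier G_k
        # Clamp the error rate to avoid division by zero for a perfect stump.
        e_k = min(max(weighted_error(best), 1e-10), 1 - 1e-10)
        alpha_k = 0.5 * math.log((1 - e_k) / e_k)
        ensemble.append((alpha_k, best))
        # Misclassified samples (y * G_k(x) = -1) gain weight, the others lose it.
        unnormalized = [w * math.exp(-alpha_k * y * stump_predict(best, x))
                        for w, x, y in zip(weights, xs, ys)]
        z_k = sum(unnormalized)                      # normalization factor Z_k
        weights = [u / z_k for u in unnormalized]
    return ensemble


def adaboost_predict(ensemble, x):
    """Sign of the weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
stumps = [(t + 0.5, p) for t in range(9) for p in (1, -1)]
ensemble = adaboost_train(xs, ys, stumps, rounds=3)
predictions = [adaboost_predict(ensemble, x) for x in xs]
print(predictions == ys)
```

No single stump classifies this toy set correctly, but the weighted vote of three stumps does, which is the effect the ensemble step aims for.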
Our lung cancer classification method is shown in Fig. 4. First, the lesion information in the various types of CT images is extracted, the CT images are preprocessed, and the data are augmented to balance the proportion of each type of CT image. Then the convolutional neural network model with dense connectivity is trained on the dataset to classify the lung cancer CT images into three types: adenocarcinoma, squamous cell carcinoma and small cell carcinoma. Finally, the AdaBoost algorithm is used to aggregate multiple classification results to improve the performance of the classification method.

A. DATASETS
The dataset used in this work was collected from Shandong Provincial Hospital. All lesion information in the dataset was first marked by two radiologists. The dataset has 2222 CT images divided into three types: adenocarcinoma, squamous cell carcinoma and small cell carcinoma. Among them, there are 1985 lung adenocarcinoma images, 141 squamous cell carcinoma images, and 96 small cell carcinoma images. Histogram equalization was used to preprocess the image data, enhancing the overall contrast to make the images clearer. Rotation, translation and transformation were used to augment the lung cancer CT images, expanding and balancing our training data to avoid overfitting. After augmentation, the dataset is enlarged to 3940 images, of which 70% are used for training and the remaining 30% for testing. The distribution of CT images of the three types of lung cancer in the dataset is shown in Table 1. In the experiment, we ran 5 tests and averaged their accuracy to reduce the impact of random errors. The average accuracy is 89.85%.

B. PERFORMANCE METRICS
In the experiment, a classifier trained with supervised learning is used to identify lung cancer types from CT images. The confusion matrix obtained after classification is shown in Table 2; 1063 of the 1183 CT images were correctly classified. Table 3 shows the accuracy of lung cancer image classification.
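As a reminder of how overall accuracy and per-class recall are read off a confusion matrix, here is a short sketch. The 3 × 3 matrix in it is a made-up toy example, not the values of Table 2, and the function names are assumptions; only the final line uses numbers from the text (1063 correct out of 1183).

```python
def accuracy(confusion):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Recall per class, with rows as true classes and columns as predictions."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical toy matrix for illustration only (not the paper's Table 2).
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
print(accuracy(toy))
print(per_class_recall(toy))

# Counts stated in the text: 1063 of 1183 test images correctly classified.
print(round(1063 / 1183, 4))
```

Per-class recall is worth reporting alongside accuracy here because the original class distribution is heavily skewed toward adenocarcinoma.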
Fig. 5 shows the accuracy and loss of the proposed model, which has fewer trainable parameters, over 50 epochs; the accuracy reaches 89.85%.

C. COMPARISON WITH OTHER METHODS
In the experiment, our model is compared with other deep architectures. Four networks, namely DenseNet, ResNet, AlexNet and VGG16, were each trained on our data. All the models in the comparison used a similar 46-layer ConvNet architecture. After training, we ran five tests for each model separately and averaged the accuracy of the five tests to reduce the impact of random errors. Table 4 shows the results compared with the other four methods; the figure before ± is the average accuracy, and the figure after ± is the fluctuation over the five tests. The results show that the accuracy of our model on the supervised task is higher than that of the other four models, and that the results improve significantly with a better network architecture. In addition, the proposed approach is simple, non-invasive and convenient to operate.
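The "average ± fluctuation" summary over five runs can be computed as below. The text does not say how the fluctuation is measured, so using the sample standard deviation is an assumption, and the five accuracy values are hypothetical placeholders rather than results from Table 4.

```python
import statistics


def summarize_runs(accuracies):
    """Mean accuracy and sample standard deviation over repeated test runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical accuracies of five test runs (not values from the paper).
runs = [0.90, 0.89, 0.90, 0.91, 0.90]
mean_accuracy, fluctuation = summarize_runs(runs)
print(f"{100 * mean_accuracy:.2f} ± {100 * fluctuation:.2f}")
```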

IV. CONCLUSION
In this paper, we designed an automated classification model for lung cancer CT images using the DenseNet network and the AdaBoost algorithm. First, the dataset is enlarged by rotation, translation and transformation methods, which improves the generalization ability of the model and avoids biasing the classification results. Then DenseNet is used to process the lung cancer dataset and classify the collected data. Finally, the AdaBoost algorithm is used to aggregate multiple classification results to improve classification performance. Experimental results show that our model achieves better classification results in CT image classification of malignant tumors, with a test accuracy of 89.85%.
In future clinical practice, our new method of lung cancer image classification can assist radiologists, simplify the steps of lung cancer diagnosis, improve the accuracy of lung cancer diagnosis, and reduce the rate of misdiagnosis and missed diagnosis. In addition, we will use more high-quality lung cancer CT images to train the classifier, further improving the accuracy of the network.