A Comprehensive Review of Computer-Aided Diagnosis of Pulmonary Nodules Based on Computed Tomography Scans

Lung cancer is one of the malignant tumors with the fastest-growing morbidity and mortality, and it poses a great threat to human health. Low-Dose Computed Tomography (LDCT) screening has proved to be a practical technique for improving the accuracy of pulmonary nodule detection and classification in early cancer diagnosis, which helps to reduce mortality. With the explosive growth of CT data, it is therefore of great clinical significance to develop effective Computer-Aided Diagnosis (CAD) systems that support radiologists in automatic nodule analysis. This article presents a comprehensive review of the application and development of CAD systems. The experimental benchmarks for nodule analysis are first described and summarized, covering public datasets of lung CT scans, commonly used evaluation metrics, and various medical competitions. We then introduce the main structure of a CAD system and present some efficient methodologies. Given the recent extensive use of Convolutional Neural Network (CNN) based methods in pulmonary nodule investigations, we summarize the advantages of CNNs over traditional image processing methods. We then select the CAD systems built on state-of-the-art CNNs with excellent performance and analyze their objectives, algorithms, and results. Finally, research trends, existing challenges, and future directions in this field are discussed.


I. INTRODUCTION
Over the past decades, cancer treatment has been a critical focus of medical research worldwide. According to the 2020 cancer report released by the World Health Organization, cancer is the second leading cause of death globally, with an estimated 9.6 million deaths in 2018, accounting for one in six deaths [1]. The global cancer burden is heavy and growing. Among the various types of cancer, lung cancer is the most frequently diagnosed, with the highest rates of incidence and mortality, as shown in Fig. 1. In the United States, the lung cancer death rate declined continuously from 2008 to 2017, yet lung cancer still caused more deaths in 2017 than breast, prostate, colorectal, and brain cancers combined [2]. Many risk factors, such as outdoor air pollution and the prevalence of tobacco use, contribute to most of the deaths and disease from lung cancer. However, lung cancer treatment is becoming more and more unaffordable, and healthcare systems are struggling to provide new cancer cures. Therefore, lung cancer interventions, including primary prevention, screening, and early diagnosis, remain a top priority and do more to reduce financial and psychological barriers for patients [3]. Pulmonary nodule analysis is one of the effective cancer prevention interventions and consists of a detection step and a classification step.
The associate editor coordinating the review of this manuscript and approving it for publication was Yi Zhang.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Generally, pulmonary nodules are characterized as round opacities or irregular lung lesions with diameters from approximately 3 mm to 30 mm, which can be solitary or multiple. They vary in number (single or multiple), size (diameters ≤ 8 mm or diameters > 8 mm), shape (round, polygonal or irregular), margins (smooth, lobulated or spiculate), location (well-circumscribed, juxta-pleural or juxta-vascular) and density (solid, subsolid or ground glass nodule). Some types of nodules are shown in Fig. 2. These nodules are common, and most of them are benign. In many cases, pulmonary nodule lesions are asymptomatic, which increases the difficulty of diagnosis. However, studies show that nodules that are large (diameters > 8 mm), subsolid, spiculate, or lobulated are more likely to be malignant [16], [17]. According to [10], the 5-year survival rate is only 10-15% for patients diagnosed with lung cancer, while the rate for those whose early-stage cancerous lesion is completely resected rises to 65-80%. It is challenging but essential to determine whether a nodule is malignant at an early stage. Low-dose computed tomography (LDCT) screening dramatically reduces mortality from lung cancer [8]. Specifically, the National Lung Screening Trial (NLST), the largest trial to date, demonstrated a 20.0% decline in the mortality rate among participants at high risk for lung cancer compared with chest radiography screening. More recently, the Multicentric Italian Lung Detection Trial showed a 39% reduction in mortality compared with no early intervention [2], [9]. LDCT screening detects more pulmonary nodules and causes less radiation damage, which helps radiologists to diagnose early-stage lung cancers and make treatment plans. Thus, the open datasets, evaluation metrics, and algorithms that we select in this article are all related to LDCT acquisition techniques.
With the popularization of LDCT screening, the enormous increase in CT scans places a heavy burden on radiologists. Manual analysis of massive numbers of CT scans is becoming a tedious and time-consuming task. Therefore, to lessen the radiologist's workload, an efficient Computer-Aided Diagnosis (CAD) system is necessary to analyze large amounts of CT scans automatically. In recent years, CAD systems have been widely adopted in addressing various diseases [3]. Owing to their high validity and reliability in clinical diagnosis, the global CAD market is estimated to reach 2.7 billion dollars by 2025, expanding at a compound annual growth rate of 11.6% over the forecast period [15]. Specifically, a conventional CAD system can be divided into a detection system (CADe) and a diagnostic system (CADx). CADe aims to locate regions of interest in lung CT scans to detect abnormal lesions. CADx is designed to assist radiologists or clinicians in determining the type and malignancy of the anomalies. In general, a CAD system for lung cancer emphasizes the detection and classification of pulmonary nodules and consists of three stages: (1) preprocessing, (2) nodule detection, including candidate nodule detection and false positive reduction, and (3) nodule classification. Preprocessing is mainly conducted to reduce noise, segment the Regions Of Interest (ROI) in the lung to narrow the search range for pulmonary nodules, and normalize the data. In the nodule detection stage, as many candidate nodules as possible should be detected, which often results in high sensitivity but low accuracy because of the many false positives. The false positive reduction step is then performed to retain only the true nodule marks. Finally, the classification stage aims to predict the probability of nodule malignancy [4]. Practical algorithms for the three stages of CAD development are covered in more detail in Section III.
Numerous published works have aimed to improve the performance of CAD systems for pulmonary nodule analysis. When resources and data are insufficient, researchers usually use traditional methods such as multiple gray-level thresholding, linear discriminant analysis, distance transformation and Support Vector Machine (SVM) for quick investigation of lung nodules [18]-[23]. However, using deep learning approaches in various medical imaging tasks has been a mainstream trend in the past ten years. Deep Neural Networks (DNNs), especially Convolutional Neural Networks (CNNs), have repeatedly shown outstanding performance in many open computer vision competitions, including the ImageNet challenges and the Microsoft Common Objects in Context (MS COCO) challenges. Owing to the high adaptability of CNNs, many CNN-based models, such as U-Net, Faster Region CNN (Faster R-CNN), Mask-RCNN, and Retina-Net [24]-[29], have been widely applied to nodule detection and classification tasks, increasing the accuracy and robustness of CAD systems.
This article aims to provide a detailed overview of CAD systems for pulmonary nodule detection and classification, which can be used as a study guide for researchers. Compared with the previous surveys [6], [7], this review not only illustrates the applications, experimental benchmarks, and components of CAD systems but also emphasizes systems developed with state-of-the-art CNNs. Specifically, we summarize the robust and effective algorithms for pulmonary nodule analysis that have been validated on public or large datasets with excellent performance. After eliminating similar papers, we analyze in detail only those that have shown the best performance in lung cancer diagnosis, to avoid duplicating content from previous reviews. We mainly focus on works published in 2019 and 2020 with the latest advancements, yet a small number of relevant works proposed before 2019 are also included. Please refer to the Appendix section for details of the literature collection procedure.
The remainder of this article is organized as follows. Firstly, the experimental benchmarks on pulmonary nodule detection and classification, including public datasets of lung CT scans, widely used evaluation methods, and related competitions, are introduced. Secondly, the complete structure of CAD systems, as well as some efficient algorithms of each component, are explicitly presented. Thirdly, workflows of CNN-based algorithms and traditional image processing methods are presented, and the advantages of CNN-based algorithms are summarized. Fourthly, the CAD systems, which are developed using state-of-the-art CNNs with excellent performance, are analyzed. Finally, research trends, current challenges, and prospective directions of CAD system development for pulmonary nodule analysis are discussed.

II. EXPERIMENTAL BENCHMARKS
For the development of effective CAD systems, there are three experimental benchmarks for researchers to focus on: datasets, evaluation metrics, and large-scale competitions for lung cancer diagnosis. Training pulmonary nodule detection and classification models requires a large volume of lung CT scans, so the acquisition of public datasets is vital. To fairly validate the performance of various algorithms, reliable evaluation metrics are necessary. In addition, large-scale competitions provide up-to-date CAD models, which are trained on unified datasets and evaluated with unified standards.

Approximately 200,000 image series from over 75,000 CT exams are available, which include data on participant characteristics, screening exam results, diagnostic procedures, lung cancer, and mortality [31].

2) VIA/I-ELCAP
The International Early Lung Cancer Action Program database was created by the ELCAP and Vision and Image Analysis research groups for the performance evaluation of diverse CAD systems. It provides 50 LDCT scans with a slice thickness of 1.25 mm, together with nodule location information and nodule types. Notably, the nodules in this database are relatively small [32].

3) NELSON
The Nederlands-Leuvens Longkanker Screenings Onderzoek (NELSON) trial was designed to investigate the benefits of LDCT screening on lung cancer mortality. Data from 15822 participants have been collected since 2003. The data originate from lung images with a slice thickness of 1 mm, reconstructed at an overlap interval of 0.7 mm [34].

4) LIDC-IDRI
The Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) is the largest publicly available reference database for lung nodules. This database contains 1018 CT scans and associated XML files holding two-phase image annotations from four experienced radiologists. The annotations cover nodule characteristics, types, and positions [36].

5) OTHERS
The Non-Small Cell Lung Cancer (NSCLC)-Radiomics, ACRIN-NSCLC-FDG-PET, LungCT-Diagnosis, and QIN LUNG CT datasets are used relatively rarely in comparative experiments that aim to improve the sensitivity of CAD systems. However, researchers can use these datasets to test model robustness and generalization. They can be downloaded from The Cancer Imaging Archive (TCIA) [42]. The information on all datasets is summarized in Table 1.

B. EVALUATION METRICS
The commonly used evaluation metrics of pulmonary nodule detection and classification are listed below:

The Competition Performance Metric (CPM) was introduced in [50] and became the evaluation criterion of most pulmonary nodule detection competitions. CPM is calculated using Eq. (4). Fig. 3 shows the FROC curves and CPMs of various CAD systems from [30].

\[ \mathrm{CPM} = \frac{1}{7} \sum_{i \in \{1/8,\, 1/4,\, 1/2,\, 1,\, 2,\, 4,\, 8\}} s_i \tag{4} \]

where $i$ represents the number of FPs per scan at the seven predefined FPR levels and $s_i$ is the sensitivity of the CAD system at that level; for the definition of sensitivity, refer to Eq. (1).
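As a rough illustration of how CPM is obtained from a FROC curve, the following sketch averages the sensitivity at seven FP-per-scan levels. It assumes the levels 1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs per scan used in LUNA16-style evaluations; the linear interpolation between measured points, the function names, and the sample FROC values are illustrative assumptions rather than part of any official evaluation script.

```python
from bisect import bisect_right

# Seven FP-per-scan operating points (assumed here, following LUNA16-style evaluation).
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def sensitivity_at(fp_rate, froc_points):
    """Linearly interpolate the sensitivity at a given FP-per-scan rate.

    froc_points is a list of (fps_per_scan, sensitivity) pairs sorted by
    fps_per_scan. Rates outside the measured range are clamped to the
    nearest measured endpoint.
    """
    xs = [p[0] for p in froc_points]
    ys = [p[1] for p in froc_points]
    if fp_rate <= xs[0]:
        return ys[0]
    if fp_rate >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, fp_rate)  # first point strictly to the right
    lo = hi - 1
    t = (fp_rate - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + t * (ys[hi] - ys[lo])


def cpm(froc_points):
    """Average sensitivity over the seven predefined FP-per-scan levels."""
    total = sum(sensitivity_at(rate, froc_points) for rate in CPM_FP_RATES)
    return total / len(CPM_FP_RATES)


# Hypothetical FROC curve of a CAD system (made-up numbers).
froc = [(0.125, 0.70), (0.25, 0.78), (0.5, 0.84), (1.0, 0.88),
        (2.0, 0.92), (4.0, 0.94), (8.0, 0.95)]
score = cpm(froc)
print(round(score, 4))
```

Because every level carries the same weight, a system that is only sensitive at high FP rates scores lower than one that keeps its sensitivity at 1/8 or 1/4 FPs per scan.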

4) AUC, LOGLOSS
The Area Under the Curve (AUC) of the ROC curve and LogLoss are usually applied to evaluate an algorithm's ability to predict malignancy and classify lung cancer. The larger the AUC, the higher the classification accuracy of a CAD system. Conversely, the lower the LogLoss, the better the performance of a CAD system. LogLoss is defined as

\[ \mathrm{LogLoss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right] \]

where $n$ is the number of patients in the test set, $\hat{y}_i$ is the predicted probability of the image belonging to a patient with cancer, and $y_i$ is one if the diagnosis is cancer and zero otherwise.
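Both metrics can be computed directly from labels and predicted probabilities. The sketch below is a minimal pure-Python version, assuming binary labels (1 for cancer, 0 otherwise); the clipping constant `eps` and the pairwise formulation of AUC (probability that a random positive case is ranked above a random negative case, with ties counted as one half) are implementation choices made for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary LogLoss averaged over all patients."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def roc_auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up example: four patients, two of them with cancer.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
print(roc_auc(labels, probs), log_loss(labels, probs))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a sketch; rank-based formulations are preferable for large test sets.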

5) CROSS-VALIDATION
Cross-validation (CV) is a statistical method used to evaluate the performance of machine learning models. It is widely used when training predictive models because it helps to detect overfitting and to estimate model generalization when not enough data are available. The critical part is splitting the dataset into a training set and a test set in different ways. CV methods include K-Fold CV, Leave One Group Out, and the Holdout method.

[66]. Table 2 shows detailed information on each competition, including sample data, constituent parts, evaluation criteria, and best results.
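To make the K-Fold CV scheme named in the cross-validation subsection concrete, the following sketch generates train/test index splits using only the standard library. The function name, the fixed random seed, and the round-robin fold assignment are illustrative assumptions.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 10 scans split into 5 folds of 2 test scans each.
splits = list(k_fold_indices(10, 5))
for train, test in splits:
    print(sorted(test), len(train))
```

For CT data, the indices would normally refer to patients rather than individual slices or nodules, so that images from one patient never appear in both the training and the test set; this is the motivation behind group-wise schemes such as Leave One Group Out.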

III. STRUCTURE OF A CAD SYSTEM
For decades, numerous studies have been conducted to increase the efficiency of lung cancer diagnosis. CAD systems can take advantage of thin cross-section images and serve as a second interpreter for radiologists in pulmonary nodule identification. A variety of CAD systems have been designed with different structures, and the main structure includes three components: (1) Preprocessing, (2) Nodule Detection, including candidate nodule detection and false positive reduction, and (3) Nodule Classification. The whole procedure of a CAD system is shown in Fig. 4. Performance varies significantly among CAD systems because of the CT input, the different characteristics of nodules, and especially the diversity of algorithms. To improve both sensitivity and specificity, most studies focus on false positive reduction and nodule classification while using the same datasets.
In this section, we summarize the components of a CAD system and present practical algorithms whose effectiveness has been demonstrated on public datasets.

A. PREPROCESSING
Preprocessing is a significant first stage of lung CT image analysis, because raw images contain much irrelevant information that reduces the working efficiency and diagnostic accuracy of a CAD system. The main lung volume, which is the ROI, is the core search space during nodule detection. Therefore, removing distracting components such as chest tissues and image artifacts, and recovering or enhancing useful information, are the key goals of this stage. It has been shown that adopting lung segmentation as a preprocessing step in a CAD system can prevent approximately 5%-17% of nodules from being missed [33]. Specifically, the contrast in Hounsfield Unit (HU) values between the lung and the surrounding tissue forms the basis of most segmentation methods, and these methods can be divided into rule-based approaches and data-based approaches [4].
Generally, thresholding, component analysis, region growing, morphological operations, and filtering [19], [22], [26], [51], [52], [62], [67], [68], [96], [109] are often used as rule-based approaches in preprocessing medical images. Thresholding and component analysis are the most effective and quickest ways to approximately separate the lung volume from distracting components. The lung volume can then be found by restricting size and location. Region growing can also separate the lung volume from the trachea and bronchi. After the lung volume has been determined, morphological operations such as erosion and dilation can be performed to obtain a lung mask without holes and with smooth boundaries. Various filters, such as the Gaussian smoothing filter and the arithmetic mean filter, are recommended to reduce noise or enhance image quality. These methods can be combined in various ways to achieve different segmentation effects. Han et al. [19] used thresholding to simply extract the chest volume, then applied a two-class high-level vector quantization algorithm for classifying voxels, followed by a linear Karhunen-Loève transformation of the local intensity vectors. In addition, principal component analysis was performed to optimize the vector space. Liao et al. [62] first adopted a Gaussian filter and intensity and distance thresholding operations to extract the lung mask and rule out other tissues, then performed morphological operations, including convex hull computation and dilation, to optimize mask extraction. Fig. 5 shows a complete preprocessing procedure using rule-based approaches.
Furthermore, data-based approaches with better applicability can be used after rule-based operations by training learnable models to refine lung segmentation. Soliman et al. [63] initially identified the background of all 3D chest scans by region growing and component analysis. They then proposed a joint 3D Markov-Gibbs Random Field (MGRF) framework, which integrated two appearance sub-models and an adaptive shape prior sub-model to segment both normal and pathological lungs. In addition, deep learning techniques, particularly CNNs, can be applied to medical image segmentation when there is sufficient hardware support. Existing advanced CNNs such as U-Net [64], Mask-RCNN [65], and hybrid CNNs [69] can also be used for automatic lung segmentation. Alom et al. [69] proposed a Recurrent CNN (RU-Net) and a Recurrent Residual CNN (R2U-Net) for medical image segmentation, both of which were designed based on U-Net models. The proposed CNNs were evaluated on the LUNA16 dataset for lung region segmentation and achieved an accuracy of 99.18%.
In fact, with manually adjusted parameters, rule-based approaches can reach segmentation performance similar to that of data-based approaches. Data-based approaches, however, need more time to train a learnable model and are more computationally expensive than rule-based approaches when optimizing CAD systems. Rule-based approaches are therefore the more convenient option for researchers to preprocess lung CT images.
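The rule-based steps described in this subsection (thresholding, component analysis, and restricting components by size and location) can be sketched on a toy 2D slice as follows. This is a simplified illustration and not the pipeline of any cited work: the -320 HU threshold, the 4-connectivity, the `min_size` parameter, and the rule "discard components that touch the image border" are all assumptions chosen for the example.

```python
from collections import deque


def lung_mask(hu_slice, threshold=-320, min_size=2):
    """Return a binary lung mask for one 2D slice of HU values.

    1. Thresholding: voxels below `threshold` are air-like (lung or outside air).
    2. Component analysis: group air-like voxels into 4-connected components.
    3. Location and size rules: drop components that touch the image border
       (air outside the body) and components smaller than `min_size`.
    """
    rows, cols = len(hu_slice), len(hu_slice[0])
    low = [[value < threshold for value in row] for row in hu_slice]
    seen = [[False] * cols for _ in range(rows)]
    mask = [[0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not low[r0][c0] or seen[r0][c0]:
                continue
            component, touches_border = [], False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill of one component
                r, c = queue.popleft()
                component.append((r, c))
                if r in (0, rows - 1) or c in (0, cols - 1):
                    touches_border = True
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and low[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if not touches_border and len(component) >= min_size:
                for r, c in component:
                    mask[r][c] = 1
    return mask


A, T, L = -1000, 40, -800  # outside air, soft tissue, lung (illustrative HU values)
toy = [
    [A, A, A, A, A, A, A, A],
    [A, T, T, T, T, T, T, A],
    [A, T, L, L, T, L, T, A],
    [A, T, L, L, T, L, T, A],
    [A, T, T, T, T, T, T, A],
    [A, A, A, A, A, A, A, A],
]
mask = lung_mask(toy)
for row in mask:
    print(row)
```

On real scans the same idea is applied in 3D and is usually followed by morphological closing or hole filling, which this sketch omits.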

B. NODULE DETECTION
Nodule detection involves nodule registration and filtering, called candidate nodule detection and false positive reduction, respectively. For decades, various nodule detection techniques have been proposed because of the complexity of nodule texture, size, shape, and location. We roughly divide these techniques into two categories: traditional methods and DNN-based methods. The traditional methods, which mainly consist of classical image processing methods and machine learning classifiers, identify nodules and eliminate FPs by maximizing the matching rate between manually defined feature profiles and the suspicious area. DNN-based methods, especially CNN-based methods, instead act as black boxes that extract implicit features and tune system performance automatically. Many practical nodule detection algorithms have been proposed based on both traditional and DNN-based methods [77].
In this part, we briefly introduce useful algorithms which are developed for candidate nodule detection and false positive reduction. The effective algorithms proposed in 2019 and 2020 are selected and summarized in Table 3.

1) CANDIDATE NODULE DETECTION
In this stage, the main goal of CAD is to generate as many candidates as possible, prioritizing sensitivity over specificity. The more pulmonary nodules are detected, the better the chances of patient survival. Candidate Nodule Detection (CNDET) is a procedure that identifies suspicious lesions and provides the predicted position and probability of each candidate.
For decades, several traditional methods such as thresholding, region growing, clustering, distance transformations, and morphological operations [19]-[23], [70]-[73], [83], [100] have been widely used with hand-crafted features for roughly recognizing candidate nodules. El-Regaily et al. [73] first applied rule-based approaches, including contrast enhancement, region growing, the rolling-ball algorithm, and morphological operations, to extract the lung parenchyma while preserving nodules attached to the lung wall. They then used 3D region growing, the Euclidean distance transform, and 2D thresholding to capture candidates from depth maps. However, these classical image processing methods rely on pixel intensity and low-level image representations. Additional filtering or geometric feature computation is also needed to optimize candidate nodule generation [22], [74].
With the popularization of deep learning, more and more detection algorithms are based on DNN techniques. A large number of DNN-based methods, specifically CNN-based methods, have been applied to generate candidate nodules because they can capture both low-level features and abstract high-level features, which greatly improves detection sensitivity. The commonly used network structures for nodule detection mainly comprise simple CNNs, U-Net, Feature Pyramid Network (FPN), Region Proposal Network (RPN), Residual Network (ResNet), and Retina-Net. Nearly all of the detection algorithms are variants of these networks [26]-[28], [51], [53]-[57], [61], [75]-[79]. Some of them are hybrid networks, which combine multiple structures in cascade mode [24], [28], [62], [68], [90], [99]. Wang et al. [75] proposed a nodule-size-adaptive model, similar to Faster R-CNN, to locate candidates with bounding boxes. Azad et al. [68] developed a Bi-directional ConvLSTM U-Net with Densely connected convolutions (BCDU-Net), which uses different ways of concatenation to take full advantage of multiple feature maps for lung nodule recognition and segmentation. In addition, some fusion networks use multi-stream structures to integrate the power of different networks [25], [65], [80]-[82]. Liu et al. [80] exploited three identical 3D ResUNets to generate 3D Gaussian blob nodules, then fine-tuned the network by adding 3D RPN heads, resulting in higher sensitivity on large nodules.

2) FALSE POSITIVE REDUCTION
After the CNDET stage, many FPs remain and decrease the efficiency of nodule diagnosis. Excessive FPs lead to over-diagnosis, over-treatment, and waste of medical resources. Therefore, it is essential to increase the accuracy of nodule detection by reducing FPs. False Positive Reduction (FPRED) refers to separating true nodules from the extracted candidates, which is equivalent to a binary classification task. Many works focus on FPRED.
In the FPRED stage, a variety of features based on intensity, morphology, or texture are extracted from candidate nodule images and fed to classifiers to separate nodule from non-nodule candidates. Among traditional methods, several machine learning classifiers are commonly applied to recognize true nodules, for example, SVM, k-Nearest Neighbor classifiers, linear discriminant classifiers, and various boosting classifiers [19]-[22], [83], [84]. Naqi et al. [22] combined geometric texture features and Histogram of Oriented Gradient features reduced by Principal Component Analysis (HOG-PCA) into a hybrid feature vector, then fed the extracted vector to k-Nearest Neighbor, Naive Bayes, SVM, and AdaBoost classifiers to reduce FPs.
In addition, optimization strategies such as data augmentation, balancing of positive samples via a focal loss function, and Non-Maximum Suppression (NMS) operations [80] can be used to improve classification performance. For example, Liu et al. [80] applied Projected Gradient Descent to generate three types of adversarial samples, then trained a 3D Dense U-Net with the extracted candidates and adversarial samples, resulting in a 5.33% improvement in CPM.
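Of the strategies listed above, Non-Maximum Suppression is the easiest to illustrate in a few lines. The sketch below is a generic greedy NMS on 2D bounding boxes, not the implementation of [80]; the `(x1, y1, x2, y2)` box format, the 0.5 IoU threshold, and the sample boxes are assumptions made for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two overlapping detections of the same candidate plus one separate candidate.
boxes = [(10, 10, 20, 20), (11, 11, 21, 21), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)
```

For 3D nodule candidates the same procedure applies with volumes in place of areas, or with a distance criterion between candidate centers.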

C. NODULE CLASSIFICATION
Nodule classification is the final step of CAD systems. Most CAD systems are designed to predict the malignancy of nodules and determine whether a nodule is cancerous, but some are designed for nodule type classification [21], [95]. In this article, we mainly focus on lung cancer diagnosis. The age, sex, pack-years smoked, and smoking status of patients are risk factors associated with nodule lesions. Since cancerous nodules tend to be large (diameters > 8 mm) and to have an uneven surface with spiculate or lobulated characteristics, measurements of nodule size and representations of nodule appearance are the most important research directions for estimating the probability of malignancy.

A. WORKFLOW
Rapid improvements in computing power, as well as an increase in the amount of available data, enable the extensive use of DNN-based methods in medical image processing. In particular, CNNs have had a tremendous influence on CAD development and have significantly improved the accuracy of nodule detection and classification. CNNs are designed to discover the underlying relationships between images and automatically extract the most descriptive features, mainly in an end-to-end manner. CNNs are typically built from three types of layers (convolution layers, pooling layers, and fully connected layers) and activation functions. The convolution and pooling layers perform feature extraction, while the fully connected layers map the extracted features to the final output. Each fully connected layer is followed by an activation function such as sigmoid, softmax, or ReLU, selected according to the data and the classification task. Traditional methods, in contrast, apply several computer vision techniques for image processing. For feature extraction, it is necessary to define hand-crafted features and manually select the important ones in each given image, which depends heavily on the subjective judgment of researchers. In the following step, machine learning classifiers are applied for nodule classification. The workflows using typical CNNs and traditional methods for nodule diagnosis are described in Fig. 6.
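The layer sequence described above (convolution, activation, pooling, fully connected layer with a sigmoid output) can be traced with a minimal forward pass in pure Python. This is only a didactic sketch: it uses a single channel and a single filter, and the kernel, weights, and bias are hand-picked for illustration rather than learned from data.

```python
import math


def conv2d(image, kernel):
    """Valid 2D convolution layer (sliding window without kernel flip,
    as is usual in deep learning libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(fmap):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fmap[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def dense_sigmoid(features, weights, bias):
    """Fully connected layer with one output unit and a sigmoid activation."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy 5 x 5 patch containing a small bright blob.
patch = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
kernel = [[1, 1], [1, 1]]                     # hand-picked 2 x 2 filter
pooled = max_pool(relu(conv2d(patch, kernel)))
features = [v for row in pooled for v in row]  # flatten
prob = dense_sigmoid(features, weights=[0.5, 0.0, 0.0, 0.0], bias=-2.0)
print(pooled, prob)
```

In a trained CNN the kernel and the fully connected weights are the parameters optimized end to end, which is what removes the manual feature selection required by the traditional workflow.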

B. ADVANTAGES OF CNNs

1) BETTER PERFORMANCE
Unlike traditional methods, CNNs utilize the shared convolution kernel to discover potential patterns in different image categories, which are beneficial for processing not only single-dimensional but also high-dimensional data. This characteristic of CNNs enables the exploration of high-level semantic information from substantial medical images, leading to better performance in image detection and classification tasks.

2) STRONG FLEXIBILITY
CNNs provide strong flexibility and adaptivity on various datasets. Since CNNs are mathematical models that act as function approximators, any dataset that can be quantified can be used to retrain CNN-based models for both regression and classification problems. In contrast, traditional methods, that is, classical image processing methods and machine learning classifiers, tend to be domain-specific [107].

3) AUTOMATED AND EFFICIENT
CNNs are designed based on black-box operations, which are able to automatically extract descriptive and salient features corresponding to each target object. However, the difficulty of traditional methods is to manually define and select specific features according to different image tasks. As the number of image categories increases, feature extraction becomes more time-consuming and energy-draining.

V. STATE-OF-THE-ART
Some of the algorithms in this article are small extensions of previously published works or are merely tested on different datasets. Therefore, only the CAD systems with the best performance or developed with state-of-the-art CNNs are introduced in detail in this section.

A. MULTI-STREAM FRAMEWORKS
Multi-stream frameworks refer to those developed using multi-scale, multi-resolution, multi-views input data, or those designed with multiple networks. Applying multi-stream frameworks can take advantage of different types of features for better identifying malignant nodules.

1) MULTI-SCALE GRADUAL INTEGRATION CADE
The Multi-scale Gradual Integration CNN (MGI-CNN) CADe system, designed specifically for false positive reduction, was proposed by Kim et al. [86]. MGI-CNN extracts morphological and contextual features from multi-scale input data. It consists of two main components: Gradual Feature Extraction (GFE) and Multi-Stream Feature Integration (MSFI). The candidate nodule patches were first extracted from thoracic CT scans at three different scales: 40 × 40 × 26, 30 × 30 × 10, and 20 × 20 × 6. The extracted patches were then fed into 'zoom-in' and 'zoom-out' GFE networks. Next, the multi-stream features were fused by MSFI to integrate contextual information hierarchically. The algorithm was evaluated on the LUNA16 dataset and obtained CPM scores of 0.908 (V1) and 0.942 (V2). The source code is available at: https://github.com/ku-milab/MGICNN.
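The multi-scale patch extraction step can be illustrated by computing the crop windows around one candidate. The sketch below is not taken from the MGI-CNN source code: it only uses the three patch sizes quoted above, and the (x, y, z) axis order, the rule of shifting a window back inside the volume at the borders, and the function names are assumptions.

```python
# Patch sizes quoted in the text, assumed here to be ordered as (x, y, z).
SCALES = ((40, 40, 26), (30, 30, 10), (20, 20, 6))


def crop_bounds(center, size, volume_shape):
    """Return ((x0, x1), (y0, y1), (z0, z1)) for a patch centered on `center`.

    Windows that would leave the volume are shifted back inside it, so every
    patch keeps its full size (an assumed border policy).
    """
    bounds = []
    for c, s, dim in zip(center, size, volume_shape):
        if s > dim:
            raise ValueError("patch is larger than the volume")
        start = max(0, min(c - s // 2, dim - s))
        bounds.append((start, start + s))
    return tuple(bounds)


def multi_scale_bounds(center, volume_shape, scales=SCALES):
    """Crop windows of one candidate at every scale."""
    return [crop_bounds(center, size, volume_shape) for size in scales]


# Hypothetical candidate close to the first slices of a 512 x 512 x 120 scan.
bounds = multi_scale_bounds(center=(256, 100, 5), volume_shape=(512, 512, 120))
for size, window in zip(SCALES, bounds):
    print(size, window)
```

Each window would then be used to slice the CT volume, giving one input patch per stream.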

2) MAXIMUM INTENSITY PROJECTION-BASED CADE
This CADe system was developed by Zheng et al. [81] using Maximum Intensity Projection (MIP)-based CNNs. The framework of the proposed CADe included four fusion 2D CNNs based on U-Net and a 3D CNN based on VGG-Net. A MIP image is the superposition of the maximum grey values at each coordinate over a set of consecutive slices. Specifically, rule-based methods, covering thresholding, component analysis, and binary morphology operations, were first used to segment the lung volume. Then MIP images with different slab thicknesses were generated as input for the 2D CNNs to detect candidates from four streams. All the fused candidates were finally fed into a 3D CNN to reduce FPs. In the CNDET stage, the MIP-based CADe system was trained on the LUNA16 dataset and achieved a sensitivity of 95.36% with 20.4 FPs/scan. In the FPRED stage, the system was evaluated on the LIDC-IDRI dataset and obtained a sensitivity of 94.19% with 2 FPs/scan.
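The MIP operation itself is simple: for each in-plane coordinate, take the maximum value over the slices of a slab. The sketch below shows this on a tiny volume stored as nested lists; the function name, the slab parameters, and the toy values are illustrative assumptions, and the slab thicknesses actually used in [81] are not reproduced here.

```python
def mip_slab(volume, start, thickness):
    """Maximum intensity projection over `thickness` consecutive slices.

    `volume` is a list of 2D slices (each a list of rows) along the axial axis.
    """
    slab = volume[start:start + thickness]
    if not slab:
        raise ValueError("empty slab")
    rows, cols = len(slab[0]), len(slab[0][0])
    return [[max(s[r][c] for s in slab) for c in range(cols)]
            for r in range(rows)]


# Toy volume with three 2 x 2 slices.
volume = [
    [[1, 5], [0, 2]],
    [[3, 4], [7, 1]],
    [[2, 9], [6, 0]],
]
mip2 = mip_slab(volume, start=0, thickness=2)
mip3 = mip_slab(volume, start=0, thickness=3)
print(mip2, mip3)
```

Thicker slabs make elongated vessels appear as continuous structures while small nodules remain compact, which is why several slab thicknesses are fed to separate streams.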

3) CLOUD-BASED AUTOMATED CADE
Masood et al. [54] developed a CADe system that integrates cloud computing provided by virtual machines with software based on a 3D CNN model. In the preprocessing stage, Median Intensity Projection (MeIP) was applied to generate MeIP images. Then traditional methods were used to extract multi-scale, multi-angle, and multi-view patches from the MeIP images. In the CNDET stage, a multi-Region Proposal Network (mRPN) architecture was built on a modified VGG-16 backbone to detect candidate nodules from the extracted patches. They chose seven levels of anchor sizes to generate nodules of diverse malignant levels: 4×4, 8×8, 12×12, 16×16, 20×20, 26×26, and 32×32. In the following stage, a 3D CNN using a modified ResNet-10 basic layout was applied to reduce FPs. The cloud-based automated CADe system was trained and validated on the LUNA16, ANODE09, and LIDC-IDRI datasets, achieving sensitivities of 0.988 at 1.97 FPs/scan, 0.976 at 2.3 FPs/scan, and 0.988 at 1.97 FPs/scan, respectively.

4) END-TO-END CADX
Ardila et al. [65] proposed an end-to-end 3D CNN framework trained with patients' current and prior CT volumes for lung cancer diagnosis. The architecture of the CADx system consisted of four parts: (1) Lung segmentation model: a Mask-RCNN was trained on the LUNA16 dataset to produce segmentation masks of lung CT scans. (2) Cancer ROI detection model: a modified 3D Retina-Net was pre-trained on the LIDC dataset and fine-tuned on the NLST dataset to generate nodule-like ROIs. (3) Full-volume model: a 3D inflated Inception V1 was trained on CT volumes with a voxel size of 1.5 mm³ for cancer prediction, fine-tuning from a checkpoint trained on ImageNet. (4) Cancer risk prediction model: a 3D Inception was applied to extract features from the outputs of models (2) and (3) for final malignancy prediction, exploiting both nodule-level local information and global context from the entire CT volume. The CADx system achieved the best performance on NLST data (AUC = 0.944).

5) MULTI-VIEW CADX
Xie et al. [56] adopted a U-Net to segment the lung nodules on a slice-by-slice basis. Next, a 3D multi-view knowledge-based collaborative (MV-KBC) deep model consisting of nine KBC sub-models was trained to learn multiple characteristics from nine specific views (sagittal, coronal, axial, and six diagonal planes) of each 3D nodule image. Each sub-model comprised three pre-trained ResNet-50 networks, which were applied to patches characterizing the overall appearance, the heterogeneity in voxel values, and the heterogeneity in shape of the nodule on each plane. Besides, a penalty loss function was introduced to balance the number of positive and negative samples. The proposed model was tested on the LIDC-IDRI dataset for nodule classification, resulting in an accuracy of 91.6% and an AUC of 95.7%.
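The nine-view decomposition can be illustrated with a short Python sketch that slices a cubic nodule volume through its centre along the three orthogonal planes and along the six planes that cut through the diagonals of opposite cube faces. The indexing convention `volume[z][y][x]`, the view names, and the exact definition of the diagonal planes are assumptions made for illustration; the implementation in [56] may differ.

```python
from typing import Dict, List

Volume = List[List[List[float]]]  # assumed cubic, indexed as volume[z][y][x]
Plane = List[List[float]]


def extract_nine_views(volume: Volume) -> Dict[str, Plane]:
    """Return nine 2D views of a cubic volume (3 orthogonal + 6 diagonal)."""
    n = len(volume)
    c = n // 2  # central slice index
    r = range(n)
    return {
        # Three orthogonal planes through the cube centre.
        "axial": [[volume[c][y][x] for x in r] for y in r],
        "coronal": [[volume[z][c][x] for x in r] for z in r],
        "sagittal": [[volume[z][y][c] for y in r] for z in r],
        # Two diagonal planes per pair of axes (main and anti-diagonal).
        "diag_xy_main": [[volume[z][i][i] for i in r] for z in r],
        "diag_xy_anti": [[volume[z][n - 1 - i][i] for i in r] for z in r],
        "diag_xz_main": [[volume[i][y][i] for i in r] for y in r],
        "diag_xz_anti": [[volume[n - 1 - i][y][i] for i in r] for y in r],
        "diag_yz_main": [[volume[i][i][x] for x in r] for i in r],
        "diag_yz_anti": [[volume[n - 1 - i][i][x] for x in r] for i in r],
    }


# Toy 5x5x5 cube whose voxel value encodes its own (z, y, x) position.
n = 5
cube = [[[z * 100 + y * 10 + x for x in range(n)] for y in range(n)]
        for z in range(n)]
views = extract_nine_views(cube)
print(sorted(views))
```

In a multi-view model, each of the nine planes would be fed to its own sub-model and the sub-model outputs combined for the final decision.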

B. TRANSFER LEARNING
Transfer learning refers to storing the knowledge gained while solving one task and applying it to another related task. Initializing or fine-tuning models from pre-trained CNNs can improve the efficiency and accuracy of CAD systems to some extent.

1) MULTI-PLANAR CADE
Zheng et al. [79] combined a 2D U-Net++ and a 3D Multi-Scale Dense CNN into a CADe system for small nodule identification. This CADe system was trained on images from the axial, coronal, and sagittal planes using transfer learning algorithms. For image preprocessing, the multi-planar lung volumes were segmented in the same way as in [81]. For candidate nodule generation, the multi-planar slices and MIP slices served as the input of the U-Net++ model, which was adapted from an EfficientNet pre-trained on ImageNet. The U-Net++ model extracted features from both small and large receptive fields and detected candidate nodules on each plane using bounding boxes. Next, a 3D Multi-Scale Dense CNN, consisting of thirty-two basic blocks, five transition blocks, and a classifier block, was applied to filter out false candidates. The multi-planar CADe system was trained and validated on the LIDC-IDRI dataset. It reached a sensitivity of 0.981 when identifying candidate nodules and obtained a CPM score of 0.955 after reducing FPs.
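Since CPM scores are quoted throughout this section, a brief sketch of how the metric is commonly computed may help. The CPM is the mean sensitivity at seven operating points of the FROC curve: 1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs per scan. The Python code below assumes the FROC curve is available as sampled points and uses linear interpolation between them; the sample values are invented for illustration and do not correspond to any system reviewed here.

```python
from typing import Sequence

# The seven FPs/scan operating points averaged by the CPM.
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def sensitivity_at(fp_rate: float, fps: Sequence[float],
                   sens: Sequence[float]) -> float:
    """Linearly interpolate sensitivity at `fp_rate`.

    `fps` must be strictly increasing; values outside the sampled
    range are clamped to the first or last sensitivity.
    """
    if fp_rate <= fps[0]:
        return sens[0]
    if fp_rate >= fps[-1]:
        return sens[-1]
    for i in range(len(fps) - 1):
        x0, x1 = fps[i], fps[i + 1]
        if x0 <= fp_rate <= x1:
            t = (fp_rate - x0) / (x1 - x0)
            return sens[i] + t * (sens[i + 1] - sens[i])
    raise ValueError("fps must be sorted in increasing order")


def cpm_score(fps: Sequence[float], sens: Sequence[float]) -> float:
    """Average sensitivity over the seven CPM operating points."""
    values = [sensitivity_at(rate, fps, sens) for rate in CPM_FP_RATES]
    return sum(values) / len(values)


# Hypothetical FROC samples (FPs/scan, sensitivity), for illustration only.
froc_fps = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
froc_sens = [0.80, 0.85, 0.90, 0.92, 0.94, 0.96, 0.98]
cpm = cpm_score(froc_fps, froc_sens)
print(round(cpm, 3))
```

Because the CPM averages over low FP rates as well as high ones, a system must remain sensitive at strict operating points to score well.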

2) MED3D CADX
The Med3D was developed by Chen et al. [98] for 3D medical image segmentation and classification using transfer learning algorithms. The procedure of the Med3D CAD contained three steps: (1) The 3DSeg-8 dataset, covering diverse scan regions, target organs, and pathologies, was collected and normalized. (2) The Med3D, which consisted of a shared encoder adapted from ResNets and eight simple decoder branches, was trained to extract specific features. (3) The pre-trained Med3D was transferred to different medical tasks. In the lung cancer diagnosis task, they changed the encoder part of Med3D to a ResNet classifier for feature extraction, then added an average pooling layer and a fully connected layer with (1,1,1) kernel size for malignant nodule classification. The system was evaluated on the LIDC dataset, yielding an accuracy of 0.919 and faster network convergence. Note that all pre-trained models and source codes are provided at: https://github.com/Tencent/MedicalNet.

C. SEMI/UN-SUPERVISED BASED CADe
Semi-supervised learning methods can be used to automatically extract features from a very limited amount of labeled data and a large amount of unlabeled data. In contrast, unsupervised learning methods are used on unlabeled data only. Semi/un-supervised learning algorithms can increase the accuracy of lung cancer diagnosis in the case of insufficient training data.
Wang et al. [99] proposed the FocalMix method, which took advantage of the latest semi-supervised learning (SSL) algorithms for 3D medical image processing. FocalMix mainly included three optimization strategies to improve the effectiveness of lung nodule analysis: a soft-target focal loss, an anchor-level target prediction model, and MixUp augmentation. Both labeled and unlabeled images were used as input data. First, the training anchors in labeled images were assigned according to the annotated boxes, while those in unlabeled images were produced by the target prediction model. The target prediction model was designed using both traditional methods and CNN-based methods in an SSL manner, covering image transformations, morphological operations, and a 3D variant of FPN. After that, two levels of MixUp augmentation, image-level MixUp and object-level MixUp, were applied to each input batch. Additionally, the soft-target focal loss was used on unlabeled data to train the model. They evaluated the FocalMix method on the LUNA16 dataset and the NLST dataset, resulting in a CPM score of 0.907. The proposed method was shown to outperform the fully supervised baseline and to be easy to transfer to other modern SSL frameworks.
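Two of these ingredients can be sketched in a few lines of Python. The first function is standard image-level MixUp, which blends two samples and their targets with a coefficient drawn from a Beta distribution. The second is one plausible way to extend focal loss to soft targets in [0, 1]; it reduces to the usual focal loss for hard labels, but the exact formulation and hyper-parameters used in [99] are not given here, so the form and the default values below are assumptions.

```python
import math
import random
from typing import List, Optional, Tuple


def mixup(image_a: List[float], image_b: List[float],
          target_a: float, target_b: float, alpha: float = 1.0,
          rng: Optional[random.Random] = None
          ) -> Tuple[List[float], float, float]:
    """Blend two flattened images and their targets (image-level MixUp)."""
    rng = rng if rng is not None else random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed_image = [lam * a + (1.0 - lam) * b
                   for a, b in zip(image_a, image_b)]
    mixed_target = lam * target_a + (1.0 - lam) * target_b
    return mixed_image, mixed_target, lam


def soft_target_focal_loss(p: float, y: float, gamma: float = 2.0,
                           alpha_neg: float = 0.75, alpha_pos: float = 0.25,
                           eps: float = 1e-7) -> float:
    """Focal-style loss for a soft target y in [0, 1] (assumed form).

    For y in {0, 1} this equals the standard alpha-balanced focal loss.
    """
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    cross_entropy = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    alpha = alpha_neg + y * (alpha_pos - alpha_neg)
    return alpha * abs(y - p) ** gamma * cross_entropy


rng = random.Random(0)
mixed, target, lam = mixup([0.0] * 4, [1.0] * 4, 1.0, 0.0, rng=rng)
print(target, soft_target_focal_loss(p=0.6, y=target))
```

The mixed target is no longer a hard 0/1 label, which is why a loss that accepts soft targets is needed once MixUp is applied to anchors.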

D. SELF-SUPERVISED LEARNING
Self-supervised learning is an approach that makes full use of unlabeled data to generate the needed information for supervised feature learning.

1) MODELS GENESIS CADE
The Models Genesis (MG) CADe system was designed by Zhou et al. [105]. The MG consisted of various 2D/3D source models, which were trained on unlabeled images using a unified self-supervised learning method. The proposed system was built on an encoder-decoder architecture and was applied to different imaging tasks. The MG consolidated four novel transformations, (1) non-linear, (2) local-shuffling, (3) out-painting, and (4) in-painting, to recover anatomical patterns. The MG was trained from diverse perspectives (appearance, texture, context, etc.) by unifying all tasks into a single image restoration task via these transformation operations. Models fine-tuned from MG outperformed models learned from scratch and any 2D models in five target tasks, including both image segmentation and classification. For the nodule versus non-nodule classification task, the MG CADe system was tested on the LUNA16 dataset and achieved the best performance (AUC=0.982, CPM=0.971). Note that all pre-trained models are available at: https://github.com/MrGiovanni/ModelsGenesis.
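The restoration idea can be sketched as follows: an unlabeled image is deliberately corrupted, and the (corrupted, original) pair serves as a training example for an encoder-decoder. The Python code below shows simplified versions of two of the four transformations, local shuffling and in-painting, on a 2D image stored as nested lists. The fixed window grid, the constant fill value, and the function names are simplifying assumptions and do not reproduce the released implementation.

```python
import random
from typing import List

Image = List[List[float]]


def local_shuffle(image: Image, window: int, rng: random.Random) -> Image:
    """Shuffle pixels inside each non-overlapping window x window block."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the original untouched
    for top in range(0, h, window):
        for left in range(0, w, window):
            coords = [(y, x)
                      for y in range(top, min(top + window, h))
                      for x in range(left, min(left + window, w))]
            values = [image[y][x] for y, x in coords]
            rng.shuffle(values)
            for (y, x), value in zip(coords, values):
                out[y][x] = value
    return out


def in_paint(image: Image, top: int, left: int, size: int,
             fill: float = 0.0) -> Image:
    """Overwrite a size x size block with a constant (simplified in-painting)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(top, min(top + size, h)):
        for x in range(left, min(left + size, w)):
            out[y][x] = fill
    return out


rng = random.Random(0)
original = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
shuffled = local_shuffle(original, window=2, rng=rng)
corrupted = in_paint(shuffled, top=1, left=1, size=2)
training_pair = (corrupted, original)  # network input, restoration target
print(corrupted)
```

Because the restoration target is the image itself, no manual annotation is needed to build such training pairs.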

2) HIGH SENSITIVITY AND SPECIFICITY CADX
Liu et al. [76] developed a CADx system using self-supervised learning. The whole framework comprised a 3D FPN and a High Sensitivity and Specificity (HS²) network. In the preprocessing stage, traditional methods such as Gaussian filters were applied to segment the lung region. After that, a 3D FPN with a self-supervised pre-trained ResNet-18 was adopted for candidate nodule recognition, using multi-scale features to improve the resolution of nodules and a parallel top-down path to transfer high-level semantic features that supplement low-level features. For false positive reduction, the HS² network, consisting of two convolution layers and three fully connected layers, was used to track the appearance changes of each candidate across continuous CT slices on Location History Images (LHI). The proposed system achieved CPM scores of 0.957, 0.899, 0.889, and 0.871 on the LUNA16, SPIE-AAPM, LungTIME, and HMS Lung Cancer datasets, respectively.

E. MULTI-TASK CADx
Multi-task learning is a method that solves multiple learning tasks at the same time and finds out the commonalities and differences across tasks. Compared with the single-task training model, multi-task learning can improve the learning efficiency and accuracy of the specific task model.
Liu et al. [61] proposed a multi-task 2D CNN with Margin Ranking loss (MTMR-Net) to construct a CADx system for nodule analysis. The MTMR-Net consisted of two 2D CNNs in a Siamese network architecture for the nodule benign-malignant classification task and the attribute score regression task. Besides, a margin ranking loss was employed to further separate ambiguous nodules, which improved the discriminating capability of the network. Specifically, each 2D CNN was built with a feature extraction module, a classification module, and a regression module: (1) The feature extraction module was designed based on residual blocks and was initialized with parameters from a pre-trained ResNet-15. (2) The classification module contained one fully connected layer followed by a cross-entropy loss for the final benign-malignant classification. (3) The regression module comprised two fully connected layers followed by a mean square error loss for the final attribute score prediction (eight attribute scores: internal structure, calcification, sphericity, margin, spiculation, lobulation, and texture). They evaluated the proposed CADx system on the LIDC-IDRI dataset, obtaining an accuracy of 0.935, a sensitivity of 0.93, a specificity of 0.894, and an AUC of 0.9797. Note that source codes are available at: https://github.com/CaptainWilliam/MTMR-NET.
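The way the three loss terms can be combined for a pair of nodules is sketched below in Python. The cross-entropy, mean-square-error, and margin ranking terms follow their standard definitions; the loss weights, the margin, the rule used to set the ranking direction, and all function names are illustrative assumptions rather than the settings of [61].

```python
import math
from typing import Sequence


def binary_cross_entropy(p: float, label: int, eps: float = 1e-7) -> float:
    """Cross-entropy between a predicted malignancy probability and a label."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_squared_error(pred: Sequence[float],
                       target: Sequence[float]) -> float:
    """Mean squared error over the predicted attribute scores."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)


def margin_ranking_loss(score_a: float, score_b: float, y: int,
                        margin: float = 0.5) -> float:
    """max(0, -y * (a - b) + margin); y = +1 means a should rank above b."""
    return max(0.0, -y * (score_a - score_b) + margin)


def siamese_multitask_loss(prob_a: float, prob_b: float,
                           label_a: int, label_b: int,
                           attrs_a: Sequence[float], attrs_b: Sequence[float],
                           attr_targets_a: Sequence[float],
                           attr_targets_b: Sequence[float],
                           w_reg: float = 1.0, w_rank: float = 1.0,
                           margin: float = 0.5) -> float:
    """Sum classification, regression, and ranking terms for a nodule pair."""
    cls = (binary_cross_entropy(prob_a, label_a)
           + binary_cross_entropy(prob_b, label_b))
    reg = (mean_squared_error(attrs_a, attr_targets_a)
           + mean_squared_error(attrs_b, attr_targets_b))
    if label_a == label_b:
        rank = 0.0  # assumed: no ranking constraint within the same class
    else:
        y = 1 if label_a > label_b else -1
        rank = margin_ranking_loss(prob_a, prob_b, y, margin)
    return cls + w_reg * reg + w_rank * rank


loss = siamese_multitask_loss(
    prob_a=0.9, prob_b=0.2, label_a=1, label_b=0,
    attrs_a=[3.0, 4.0], attrs_b=[1.0, 2.0],
    attr_targets_a=[3.0, 5.0], attr_targets_b=[1.0, 2.0])
print(round(loss, 4))
```

The ranking term only contributes when the malignant nodule is not scored sufficiently higher than the benign one, which pushes ambiguous pairs apart.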

VI. DISCUSSION
From the survey of advanced CAD systems above, it is evident that remarkable progress has been achieved in automatic pulmonary nodule analysis. In particular, various advanced CNN-based algorithms have been applied to improve the accuracy and sensitivity of nodule detection and classification, thus significantly increasing the effectiveness of CAD systems for early-stage lung cancer diagnosis. Although more and more intelligent CAD systems are appearing with the popularization of CT scanning techniques and deep learning approaches, problems remain. In this section, we analyze trends in the research works mentioned above and present some unsolved challenges and future directions in pulmonary nodule diagnosis.

A. RESEARCH TRENDS
As shown in Sections III and IV, substantial CNN-based research has been done on candidate nodule detection, false positive reduction, and nodule classification. With the rise of computing power, developers tend to adopt CNN-based algorithms instead of traditional methods when designing CAD systems for lung cancer identification [7].
From the state-of-the-art CAD systems listed in Section V, we categorize the corresponding development strategies into five CNN groups: (1) multi-stream CNNs [54], [56], [65], [79], [81], [86], (2) CNNs with transfer learning algorithms [54], [56], [76], [79], [98], (3) CNNs with semi/un-supervised learning algorithms [99], (4) CNNs with self-supervised algorithms [76], [105], and (5) multi-task based CNNs [61]. Counting the works cited above, the ratio of these five strategies among state-of-the-art CAD systems is 6:5:1:2:1, while the ratio among the publications proposed in 2019 and 2020 is 15:8:2:2:2 (see Tables 3 and 4). Multi-stream CNNs are thus more commonly adopted than the other state-of-the-art strategies, possibly because the multi-stream framework can exploit comprehensive multi-modal features, including both low-level image features and high-level semantic features, leading to higher accuracy in nodule diagnosis. Furthermore, it is worth noting that some multi-stream CNNs with transfer learning algorithms [54], [56], [76], [79] obtain CPM and AUC values over 94%, outperforming most of the other strategies. A likely reason for their stability and strong performance is that common features extracted from different disease patterns, together with multi-modal features, benefit both detection and classification tasks.

B. EXISTING CHALLENGES AND FUTURE DIRECTIONS
1) LACK OF LARGE AND HIGH-QUALITY LABELED DATASETS
As is well known, a large amount of high-quality labeled data is crucial for training an effective deep learning model for medical image analysis. However, the existing public datasets of lung CT scans are not labeled in a consistent manner, which results in inconsistent annotation information across datasets. Thus, collecting large volumes of lung CT data with accurate labels remains a big challenge. On the one hand, privacy issues could be the biggest obstacle to collecting individual lung CT scans, and some hospital regulations and national policies also involve personal information protection. On the other hand, annotating medical images takes radiologists much time, while assigning the work to non-experts would lead to labeling errors.
To alleviate the dataset scarcity problem, data augmentation strategies such as cropping, rotation, flipping, or scaling of image patches and their labels can be applied to increase the number and diversity of available training samples. Besides, Generative Adversarial Networks (GAN) can also be adopted to synthesize images as additional data [110]. When sufficient raw CT scans are available but labels are lacking, advanced off-the-shelf CNNs can be trained on much or all of the unlabeled data using semi/un-supervised as well as self-supervised learning methods, which can reach better performance than purely supervised learning methods [76], [99], [105], [106]. Using transfer learning algorithms to pre-train 3D CNNs on other large-scale datasets, such as ImageNet, can improve the accuracy of nodule detection and classification when datasets are insufficient [60], [79].
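The key point of label-preserving augmentation is that every geometric transformation applied to a patch must also be applied to its annotation. The following Python sketch generates the eight flip and 90-degree rotation variants of a 2D patch together with the transformed nodule-centre coordinate. The patch representation (nested lists), the `(row, column)` centre convention, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple

Patch = List[List[float]]
Point = Tuple[int, int]  # (row, column) of the annotated nodule centre


def flip_horizontal(patch: Patch, center: Point) -> Tuple[Patch, Point]:
    """Mirror the patch left-right and move the centre label accordingly."""
    width = len(patch[0])
    y, x = center
    return [row[::-1] for row in patch], (y, width - 1 - x)


def rotate90(patch: Patch, center: Point) -> Tuple[Patch, Point]:
    """Rotate the patch 90 degrees counter-clockwise with its centre label."""
    height, width = len(patch), len(patch[0])
    y, x = center
    rotated = [[patch[j][width - 1 - i] for j in range(height)]
               for i in range(width)]
    return rotated, (width - 1 - x, y)


def augment(patch: Patch, center: Point) -> List[Tuple[Patch, Point]]:
    """Return the 8 flip/rotation variants of a labeled patch."""
    variants = []
    current = (patch, center)
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(*current))
        current = rotate90(*current)
    return variants


# Toy 2x3 patch with a single bright pixel marking the nodule centre.
patch = [[0.0, 0.0, 1.0],
         [0.0, 0.0, 0.0]]
samples = augment(patch, center=(0, 2))
print(len(samples))  # 4 rotations x (original + flipped) = 8 samples
```

In every variant the bright pixel still sits at the returned centre coordinate, so the augmented samples remain correctly labeled.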

2) POOR INTERPRETABILITY OF DIAGNOSTIC RESULTS
CNN-based models are trained as black boxes: they can automatically identify and classify pulmonary nodules, yet provide no explanation of pathogenesis. The interpretability of models is essential for radiologists to find out the exact cause of the disease. Detection results or diagnosis scores alone do not significantly help radiologists make a final diagnosis and draw up an accurate treatment plan. Thus, CNN-based models that can discover the relationship between input data and diagnostic results, and determine which features of nodules are responsible for the malignancy, deserve attention.
To improve the interpretability of CAD systems, an inference model based on the Bayesian network has been constructed using the Markov Chain Monte Carlo (MCMC) method, which can estimate the conditional probability of each feature [111]. Furthermore, the cause-and-effect inference problem could be divided into a feature prediction task and a benign-malignant classification task [25], so that the causal relationship between predicted feature scores and diagnostic results can be observed. For instance, a multi-task CNN system with margin ranking loss was proposed for nodule attribute score prediction and cancer diagnosis [61].

3) LACK OF CONTINUOUS LEARNING ABILITY
An effective CAD system for pulmonary nodule diagnosis is usually required to assist radiologists in making correct clinical decisions when facing unexpected samples. Therefore, the continuous learning ability of a CAD system to identify new medical image samples is vital. However, present CAD systems are mostly built from models that are trained once and then deployed, which means that they can only perform well in static environments rather than dynamic ones. These systems cannot correctly recognize unique samples of a kind not seen during training, probably resulting in wrong diagnoses [77]. It is of great benefit to construct a CAD system with continuous learning ability to support situations that change in real time.
One possible direction for building continuous learning-based systems is to design a new CNN framework with cloud computing techniques. With cloud computing, diagnosis records can be sent to cloud storage to update the training datasets, so that the proposed CNN can be trained in a cloud back-end and adapt to real-time changes continuously [54].

VII. CONCLUSION
In this article, we present a comprehensive review of pulmonary nodule detection and classification on CT images for the development of CAD systems. The public datasets of lung CT scans, widely used evaluation methods, and related pulmonary nodule challenges are first introduced and summarized. Then we describe in detail how a CAD system works and present some practical algorithms for each processing stage. We next compare traditional methods with CNNs and summarize the advantages of CNNs. Besides, the CAD systems that are developed using state-of-the-art CNNs with excellent performance are selected and analyzed. Finally, we discuss research trends, existing challenges, and future directions of CAD system development.
It can be concluded from this review that CNN-based methods are dominant and outperform the traditional methods in both nodule detection and classification tasks. The exploration of multi-stream, semi/un-supervised, self-supervised, multi-task, and transfer learning methods, especially multi-stream and transfer learning approaches, for improving the performance of CAD systems deserves more attention. Note that we focus on clarifying the development of CAD systems and analyzing the effective CNN-based algorithms. It is believed that this review can provide a comprehensive reference for researchers and radiologists.

APPENDIX
Most of the literature pertinent to pulmonary nodule analysis in CAD development was collected through searches of public online databases and search engines: IEEE Xplore, Science Citation Index Expanded, arXiv, SpringerLink, and ScienceDirect. The search keywords were ''lung cancer'', ''nodule detection'', ''nodule classification'', ''false positive reduction'', ''CAD'', and ''computer-aided diagnosis'', combined using ''OR'' and ''AND'' in different ways. Articles published in 2019 and 2020 with excellent performance were selected and filtered. Contributions from other sources can be identified from citations in the above publications.
RUI WU received the B.Eng. degree in electronic information engineering from Shantou University, Shantou, China, in 2018. She is currently pursuing the master's degree in information and communication engineering with Shenzhen University, Shenzhen, China. Her current research interests include deep learning and medical image analysis.