Classifier Ensemble Based on Computed Tomography Attenuation Patterns for Computer-Aided Detection System

Cancer is one of the leading causes of mortality worldwide, especially lung cancer. Computer-Aided Detection (CADe) systems have been proposed to assist radiologists in the task of pulmonary nodule detection. In this paper, we propose a CADe system that uses Deep Convolutional Neural Networks (DCNN). In the Nodule Candidate Detection (NCD) step, we used the Mask Region-Convolutional Neural Network (Mask R-CNN) to detect bounding boxes in 2D slices of low-dose Computed Tomography (CT) scans. In the False Positive Reduction (FPR) step, we used a classifier ensemble based on CT attenuation patterns to boost 3D pulmonary nodule classification performance. The final confidence index assigned by the CADe system to each pulmonary nodule candidate is the average of the predictions obtained in the NCD and FPR steps. The CADe system was validated on the publicly available LUng Nodule Analysis 2016 (LUNA16) challenge dataset and obtained a sensitivity of 94.90% at an average of 1.0 False Positives per scan (FP/Scan), compared with 96.90% for a proposal that combines several existing CADe systems. To the best of our knowledge, our proposal achieves one of the best results among CADe systems, outperforming other state-of-the-art individual methods.


I. INTRODUCTION
Cancer is one of the biggest public health problems worldwide. In 2018, cancer was responsible for 9.6 million deaths globally, about one-sixth of the fatalities. Lung cancer is one of the most common cancers, with 2.09 million cases, and has caused approximately 1.76 million deaths [1].
According to the Brazilian National Cancer Institute (INCA) [2], lung cancer is the second most common type of this disease in Brazil (not counting non-melanoma skin cancer). It accounted for more than 26,000 deaths in 2015. The five-year survival rate for patients diagnosed with lung cancer is 18.0% (15.0% for men and 21.0% for women). Only 16% of cancer cases are diagnosed at an early stage (localized lesion); in this case, the five-year survival rate is 56.0%. At the end of the 20th century, lung cancer was one of the leading causes of preventable deaths.
The American Cancer Society estimates about 235,760 new cases of lung cancer and approximately 131,880 deaths in the United States for the year 2021. The number of new cases and deaths from lung cancer is expected to decrease as people stop smoking and due to advances in early detection and treatment of this disease. If a pulmonary lesion (nodule) is detected at an earlier stage, when it is small and before it has spread throughout the body, it is more likely to be successfully treated [3].
Several imaging modalities can evaluate pulmonary nodules, such as Computed Tomography (CT), Magnetic Resonance Imaging (MRI), and radiography. CT has relatively low cost, wide availability, and good sensitivity. It is the preferred exam modality for pulmonary nodule evaluation in many clinical situations. Nowadays, low-dose CT is the lung cancer screening exam recommended for patients with a considerable smoking history [4].
CT scan image analysis is challenging and can be exhausting to radiologists. Factors such as distraction, fatigue and limitations of professional experience may result in diagnostic errors [5], [6]. Computer-Aided Detection (CADe) systems are being developed to automate pulmonary nodule detection and measurement to assist the radiologists. These solutions are used as a second opinion in diagnostics [7], [8].
With the development and deployment of image databases, such as the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI), the validation of CADe systems against expert diagnostics became possible [9]. The LIDC-IDRI database consists of low-dose thoracic CT scan images and the diagnosis of the lesions found [10]. To evaluate the performance of CADe systems, the Lung Nodule Analysis 2016 (LUNA16) challenge was created. The metric used is the Competition Performance Metric (CPM) score, obtained from the Free Response Receiver Operating Characteristic (FROC) curve (i.e., the average sensitivity at seven predefined False Positives per Scan (FPs/Scan) rates: 1/8, 1/4, 1/2, 1, 2, 4, and 8) [11].
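For concreteness, the CPM score can be computed from a FROC curve as sketched below; this is an illustrative implementation, and the linear interpolation between measured operating points (`np.interp`) is an assumption of this sketch rather than the official LUNA16 evaluation code.

```python
import numpy as np

def cpm_score(fp_per_scan, sensitivity):
    """Average sensitivity at the seven predefined FPs/Scan rates.

    fp_per_scan and sensitivity describe the FROC curve, sorted by
    increasing FPs/Scan. Linear interpolation between operating
    points is an assumption of this sketch.
    """
    rates = [1 / 8, 1 / 4, 1 / 2, 1, 2, 4, 8]
    return float(np.mean(np.interp(rates, fp_per_scan, sensitivity)))

# Hypothetical FROC operating points:
fp = [0.125, 0.25, 0.5, 1, 2, 4, 8]
sens = [0.70, 0.75, 0.80, 0.85, 0.88, 0.90, 0.92]
print(cpm_score(fp, sens))  # average of the seven sensitivities
```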
In several previous studies, investigators have developed automated CADe systems. Typically, CADe systems consist of five main steps: i) data acquisition; ii) pre-processing; iii) lung parenchyma segmentation; iv) Nodule Candidate Detection (NCD); and v) False Positive Reduction (FPR) [12]. However, some studies summarize these into only two major steps, NCD and FPR [13]-[15].
CADe systems have been proposed with different approaches. Traditional approaches use hand-crafted features that require domain expertise [16]. These features have limited representation capability, which is insufficient to deal with the large variations of pulmonary nodules [17]. Pulmonary nodules vary in size, shape, and location. Furthermore, the contextual environment around the nodules differs for each category, such as solitary nodules, ground-glass opacity nodules, cavity nodules, and pleural nodules [18]. Solutions for pulmonary nodule candidate detection using Deep Convolutional Neural Networks (DCNN) have been proposed. These solutions use learned features and show better results than traditional approaches [19].
This paper proposes an automated CADe system for pulmonary nodule detection in low-dose thoracic CT scan images with two major steps, NCD and FPR. In the first step, we use an end-to-end object detection method based on a two-dimensional (2D) DCNN to identify regions as pulmonary nodules on axial slices of thoracic CT scans. In the second step, a three-dimensional (3D) DCNN classifier ensemble based on CT attenuation patterns removes false positive pulmonary nodule candidates. The main contributions of this paper are the following:
1) We employ an image with three channels and 8 bits per channel composed of slices of thoracic CT scans. In this way, we can use an end-to-end object detection method called Mask Region-Convolutional Neural Network (Mask R-CNN) [20], pre-trained with natural images, for 2D nodule candidate detection.
2) We propose an algorithm to discard pulmonary nodule candidate detections whose centroids are close to each other in an exam, removing redundant candidates.
3) We design a 3D DCNN based on the Visual Geometry Group (VGG) network [21] that supports patches of different sizes to classify nodule candidates.
4) We propose a classifier ensemble based on CT attenuation patterns to boost nodule classification performance.
The rest of this paper is organized as follows: Section II introduces related work on automated CADe systems for pulmonary nodule candidate detection; Section III describes the proposed methodology; Section IV presents the experiments and results, including an ablation study of the classifier ensemble. Finally, we conclude this paper in Section V.

II. RELATED WORK
The available literature offers well-regarded studies using DCNNs, which have achieved excellent results in the field of automated CADe systems for pulmonary nodule detection.
Some of these works combine individual systems. Setio et al. [13] proposed a two-step CADe system for detecting pulmonary nodules using DCNNs. In the first step, they used three existing CADe systems, specifically designed for solid, subsolid, and large nodules, to detect nodule candidates. For each candidate, a set of 2D patches was extracted from differently oriented planes. In the second step, a classifier ensemble based on 2D DCNNs was proposed, whose outputs are combined using a dedicated fusion method to get the final classification. Their CADe system achieved a sensitivity of 85.40% at an average of 1.0 FP/Scan on the LUNA16 dataset. In another work, Setio et al. [11] combined seven pulmonary nodule candidate detection systems and five false positive reduction systems developed by participants of the LUNA16 challenge. Combining these solutions achieved a sensitivity of 96.90% at an average of 1.0 FP/Scan. Some works jointly use traditional approaches and deep learning models for nodule detection. For example, Zhang et al. [22] used multi-scale Laplacian of Gaussian (LoG) filters and prior shape and size constraints for nodule candidate detection. Further, a densely dilated 3D DCNN was applied to reduce the number of FPs and estimate nodule diameters, obtaining a sensitivity of 94.90% at an average of 1.0 FP/Scan on the LUNA16 dataset.
In the last few years, solutions have been proposed exclusively with deep learning models. Dou et al. [23] proposed a solution with two stages. In the first stage, they designed a 3D Fully Convolutional Network (FCN) to screen the nodule candidates. In the second stage, they created a hybrid-loss residual network that harnesses location and size information for pulmonary nodule detection, which achieved a sensitivity of 86.50% at an average of 1.0 FP/Scan on the LUNA16 dataset. Zheng et al. [14] proposed a 2D CNN based on the U-Net [24] that takes Maximum Intensity Projection (MIP) images of axial slices with different slab thicknesses as input. Further, they applied multiple 3D DCNNs to reduce false positive candidates and fused the final predictions. Their CADe system achieved a sensitivity of 92.67% at an average of 1.0 FP/Scan on the LUNA16 dataset. In another approach, Ozdemir et al. [25] proposed a system with two main components: i) a CADe module that detects and segments suspicious lung nodules, and ii) a Computer-Aided Diagnosis (CADx) module that analyzes the suspicious lesions from the CADe module. The CADe module comprises a 3D FCN based on the V-Net architecture [26] and a 3D scoring network, which computes refined nodule probability estimates for false positive reduction. Their solution reached a sensitivity of 94.20% at an average of 1.0 FP/Scan on the LUNA16 dataset. Tang et al. [27] proposed a scheme composed of many stages. Initially, the pulmonary parenchyma area is segmented by various methods. Then, a scheme based on the 3D U-Net [24] and transfer learning is used to detect pulmonary nodules, obtaining a sensitivity of 92.40% at an average of 4.0 FPs/Scan on the LUNA16 dataset. Zhu et al. [28] proposed a Generative Adversarial Network (GAN) architecture called Functional-Realistic GAN (FRGAN) to create realistic high-resolution images.
Thus, they used data augmentation to balance the ratio of positive and negative nodule samples. In the experiments, they used the CUMedVis CADe system [17] with the LUNA16 and Non-Small-Cell Lung Cancer (NSCLC) datasets. The CPM score of their scheme is 91.50% with 1,304 nodules.
Deep learning techniques have achieved state-of-the-art results for object detection and instance segmentation in computer vision challenges. Most notable are region-based DCNN techniques [20], [29]-[31], capable of achieving great results on a range of object detection tasks, among them nodule detection. For example, Ding et al. [32] proposed a novel pulmonary nodule detection approach based on DCNNs. Initially, they used a Faster Region-based CNN (Faster R-CNN) [31] with a deconvolutional structure for candidate detection on axial slices. Then, a 3D DCNN was presented for false positive reduction, which achieved a sensitivity of 92.20% at an average of 1.0 FP/Scan on the LUNA16 dataset. Zhu et al. [19] proposed a fully automated cancer diagnosis system called DeepLung. Two 3D Dual Path Network (DPN) DCNNs were designed for nodule detection and classification, respectively. Specifically, a 3D Faster R-CNN was designed for nodule detection. The solution achieved a CPM score of 84.20% without any FP nodule reduction step on the LUNA16 dataset. Cai et al. [33] proposed a detection and segmentation method for pulmonary nodule 3D visualization diagnosis based on Mask R-CNN [20] and a ray-casting volume rendering algorithm, respectively. A Feature Pyramid Network (FPN) was applied with the CNN to fully exploit multi-scale feature maps. Their CADe system achieved a sensitivity of 88.10% at an average of 1.0 FP/Scan on the LUNA16 dataset. Peng et al. [34] proposed two types of 3D multi-scale DCNN for the NCD and FPR steps, respectively. In the first step, a CNN was used to extract multi-scale features of pulmonary nodules and a Region Proposal Network (RPN) structure was used to determine region candidates. In the second step, another 3D multi-scale DCNN was used to classify the nodule candidates. The CPM score of their scheme is 92.30% on the LUNA16 dataset. In another approach, Mei et al. [35] proposed a CADe system based on DCNN named SANet.
Initially, they used a 3D RPN to generate pulmonary nodule candidates. Then, a false positive reduction module uses multi-scale feature maps. The experiments were carried out on the LUNA16 and PN9 (new dataset proposed) datasets. The sensitivity of their solution is 90.09% and an average of 1.0 FP/Scan on the LUNA16 dataset.

III. METHOD
We propose a CADe system based on CT attenuation patterns for pulmonary nodule detection. In each step of the CADe system, attenuation ranges were used to highlight different visual patterns. We used patterns with higher CT numbers to represent consolidation and nodules. Fig. 1 illustrates an overview of the proposed CADe system. Attenuation ranges have been used for Interstitial Lung Diseases (ILD) to improve visibility or visual separation among disease categories [36].
Ideally, in CADe systems with two steps, the NCD step should be designed to screen nodule candidates with a low false positive rate and high sensitivity, but that is not always possible. Hence, false positive nodule candidates may be corrected by the FPR step, generally through classification techniques. However, false negative cases have no opportunity to be corrected after the NCD step. Moreover, systems with high false positive rates in the NCD step need robust techniques in the FPR step. Therefore, we designed the NCD step of our CADe system to achieve a reasonable trade-off between false positives and sensitivity.
In the NCD step, the 2D Mask R-CNN method was used for object detection and instance segmentation in low-dose thoracic CT scan images (Fig. 2a). This method generates bounding boxes for each object of interest in the image, i.e., 2D nodule candidate detection (Fig. 2b) (Section III-C2). Instances generated outside the lung parenchyma were discarded (Fig. 2c); we employ a scheme to segment the lung parenchyma (Section III-C3). Then, we used the algorithm proposed in Pereira et al. [37] to group 2D bounding boxes into 3D nodule candidates (Fig. 2d), with a Jaccard overlap greater than or equal to a threshold of 0.2. Finally, we used the proposed algorithm to discard pulmonary nodule candidate detections whose centroids are close to each other in an exam (Fig. 2e) (Section III-C4). The NCD step output comprises a set of coordinates and confidence indexes of nodule candidates (Fig. 2f).
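The Jaccard overlap used to group 2D bounding boxes can be sketched as follows; the (x1, y1, x2, y2) box convention is an assumption for illustration, not a detail given by the paper.

```python
def jaccard_2d(box_a, box_b):
    """Jaccard (IoU) overlap between two 2D boxes given as
    (x1, y1, x2, y2); the box convention is an assumption."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Boxes on adjacent slices would be grouped when the overlap >= 0.2.
```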
In the FPR step, we used a classifier ensemble to classify the nodule candidate detections generated in the previous step (Fig. 2g). The ensemble combines 3D DCNN models of the same architecture (Fig. 2i), trained with different samples regarding attenuation ranges, 3D patch sizes, and affine transformations (rotation and shifting) in each epoch (Fig. 2h). The final prediction of the classifier ensemble comprises a set of confidence indexes (Fig. 2j).
The final prediction for the nodule candidates was calculated (Fig. 2k) from the confidence indexes obtained in the NCD (Fig. 2f) and FPR (Fig. 2j) steps.

A. DATASET
Our experiments were performed using the thoracic CT scan images from the publicly available LUNA16 dataset. It contains only CT scans with slice thickness less than or equal to 2.5 millimeters (mm). Scans with inconsistent slice spacing or missing slices were excluded, leaving 888 CT scans out of a total of 1,018 CT scans from the LIDC-IDRI database. A total of 1,186 positive nodules were defined, for which at least three out of four radiologists agreed that the nodule diameter is larger than 3 mm [11]. The LUNA16 dataset is split into 10 subsets; thus, we perform 10-fold cross-validation to evaluate the performance of the CADe system.

B. PRE-PROCESSING
The original thoracic CT scan images were pre-processed with different attenuation ranges in each step. We applied a linear transformation to the CT number values of the original image in Hounsfield Units (HU) inside a defined attenuation range; CT number values below and above the range are mapped to constants. Equation (1) gives the mapping of the CT number values:

α = ζ + (β − γ)(δ − ζ)/(λ − γ), for γ ≤ β ≤ λ. (1)

Fig. 3 illustrates four different attenuation ranges. In Equation (1), α is the output value of the linear transformation and β is the input value of the original CT image. The ζ and δ are the minimum and maximum of the output CT number values, respectively. The γ and λ are the minimum and maximum of the input CT number values, respectively.
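A minimal sketch of this windowing step, assuming (since the text says out-of-range values are "mapped to a constant") that values below γ and above λ are clipped to ζ and δ, respectively:

```python
import numpy as np

def apply_attenuation_range(ct_hu, gamma=-1000.0, lam=400.0,
                            zeta=0.0, delta=255.0):
    """Linearly map HU values in [gamma, lam] to [zeta, delta].
    Values outside the range are clipped to the output bounds,
    which is an assumption about the 'constant' mapping."""
    beta = np.clip(ct_hu, gamma, lam)
    return zeta + (beta - gamma) * (delta - zeta) / (lam - gamma)
```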

C. NODULE CANDIDATE DETECTION
We designed the NCD step to identify and localize pulmonary nodules with the highest possible sensitivity. Thus, we try to minimize false negatives, i.e., cases with no opportunity to be corrected. The FPR step then tries to correct the false positive cases.

1) DATA ACQUISITION
The LUNA16 dataset ignores nodule annotations larger than 30 mm. Thus, machine learning-based CADe systems trained with this dataset could learn to ignore large nodules [25]. Therefore, due to the limited annotations provided by the LUNA16 dataset, we used additional training data in the NCD step from the publicly available set of Reeves and Biancardi [38], i.e., 2D nodule annotations performed by experts. We also used an additional validation set with 123 CT scans from the LIDC-IDRI database.
Due to the heterogeneity of the low-dose thoracic CT scan images and the diversity of tissues present in the exams, we defined an attenuation range with the CT numbers γ = −1000 HU and λ = 400 HU, and rescaled them to ζ = 0 and δ = 255 (Section III-B). Thus, we obtained 8-bit CT slices and kept the isotropic resolution of the slices.
Then, we generated images with three channels composed of slices of CT scans. We define E = {e1, e2, . . .} as a thoracic CT scan and e as a slice of the exam in the axial plane. We also define an image I as a set of slices of the exam, i.e., I ⊆ E. Therefore, an image I comprises three adjacent slices, I = {ei−1, ei, ei+1}. The slices at the ends of an exam are exceptions: in these cases, two image channels correspond to the same slice.
Thus, the employed image captures contextual information between the slices, an essential aspect for nodule detection, since a nodule can span more than one slice. Another reason for using three-channel images was to accommodate the DCNN architecture pre-trained with natural images to 2D inputs.
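The three-channel construction can be sketched as follows; whether the duplicated boundary slice occupies the first or the second channel is an assumption of this sketch.

```python
import numpy as np

def three_channel_image(exam, i):
    """Build a 3-channel image from axial slices (e_{i-1}, e_i, e_{i+1}).
    At the ends of the exam the missing neighbour is replaced by the
    boundary slice itself, so two channels hold the same slice."""
    n = exam.shape[0]
    lo = max(i - 1, 0)
    hi = min(i + 1, n - 1)
    return np.stack([exam[lo], exam[i], exam[hi]], axis=-1)
```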

2) OBJECT DETECTION METHOD
We used the end-to-end Mask R-CNN object detection method [20] to predict bounding boxes for each 2D nodule candidate in the input 2D image. We employ the ResNeXt architecture as the backbone network, configured with 101 convolutional layers, a cardinality of 64, and a bottleneck width of 4d [39]. Motivated by the success of deep transfer learning in CADe systems [40], [41], we used the transfer learning technique with pre-trained weights from the ImageNet database (i.e., 2D natural images) [42]. Next, the weights were fine-tuned with our dataset (Section III-C1). The problem was modeled with a single class, nodule.
At the training stage, the network was optimized by the Stochastic Gradient Descent (SGD) algorithm with back-propagation, where the initial learning rate was set to 10^-3 and gradually reduced to 10^-5, the momentum parameter was set to 0.9, the gamma was set to 0.1, the weight decay was set to 10^-4, and the mini-batch size was set to 4. We regularized our models by the early stopping strategy to avoid overfitting. In the testing stage, we accepted detections with a confidence index (i.e., P_NCD) greater than or equal to a threshold of 0.1.

3) LUNG PARENCHYMA SEGMENTATION
Thoracic CT scan images may contain imaging sensor noise, tissues, cartilage, calcifications, bones, organs, etc. Therefore, we employ a scheme to segment the lung parenchyma, which comprises several stages, as shown in Fig. 4.
Initially, we defined an attenuation range with the CT numbers γ = −1000 HU and λ = 400 HU, and rescaled them to ζ = 0 and δ = 1.0 (Fig. 4b). Next, we used a binary threshold with the mean value of the whole CT scan to divide the chest into outside and inside regions (Fig. 4c). We also used a binary min/max curvature flow filter to denoise the result of the previous processing. Then, we segmented the thorax with the neighborhood connected filter using eight seeds, one at each corner of the CT scan (Fig. 4d). Irrelevant objects were removed with a binary dilation filter using a ball structuring element of radius seven (Fig. 4e). Furthermore, we applied the logical operation OR between the resulting image and the output of the curvature flow filter (Fig. 4f), followed by the logical operation NOT (Fig. 4g). Then, the two largest connected components were selected as the internal lung area. However, there were holes in the mask generated during the segmentation process, caused by noise, graininess, vessels, tissues, nodules attached to the pleura, etc. In sequence, we applied the morphological operation DILATION with a ball structuring element of radius five (Fig. 4h), and we applied the convex hull algorithm per slice (Fig. 4i).
Fig. 5. Examples of iterations of the proposed algorithm to discard repeated nodule candidate detections: a) nodule candidates for an exam; b) directed graph generated from the nodule candidates (Fig. 5a); c) directed subgraphs generated from the directed graph (Fig. 5b); d) nodule candidates for an exam; e) directed graph generated from the nodule candidates (Fig. 5d); f) directed subgraphs generated from the directed graph (Fig. 5e).
Therefore, the generated parenchyma mask served to discard the 2D nodule candidate detections generated by the Mask R-CNN method whose centers lie outside the lung parenchyma. Fig. 4i illustrates the detections on each slice: in yellow, the discarded detections, and in red, the retained detections, which the algorithm [37] uses to group and generate 3D nodule candidate detections.
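The selection of the two largest connected components can be sketched as below. This is a simplified, NumPy-only illustration of that single stage (the thresholding, curvature flow, dilation, and convex-hull stages are omitted), not the paper's implementation; the 4-connectivity and 2D input are assumptions.

```python
import numpy as np
from collections import deque

def two_largest_components(mask):
    """Label a binary 2D mask with 4-connectivity (BFS) and keep the
    two largest connected components, mimicking the selection of the
    internal lung areas."""
    labels = np.zeros(mask.shape, dtype=int)
    sizes = {}
    current = 0
    for x in range(mask.shape[0]):
        for y in range(mask.shape[1]):
            if mask[x, y] and labels[x, y] == 0:
                current += 1
                labels[x, y] = current
                q = deque([(x, y)])
                size = 0
                while q:
                    cx, cy = q.popleft()
                    size += 1
                    for nx, ny in ((cx-1, cy), (cx+1, cy), (cx, cy-1), (cx, cy+1)):
                        if (0 <= nx < mask.shape[0] and 0 <= ny < mask.shape[1]
                                and mask[nx, ny] and labels[nx, ny] == 0):
                            labels[nx, ny] = current
                            q.append((nx, ny))
                sizes[current] = size
    keep = sorted(sizes, key=sizes.get, reverse=True)[:2]
    return np.isin(labels, keep)
```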

4) DISCARD NODULE CANDIDATE DETECTION
In the list of nodule candidates generated by the algorithm [37], we identified some candidates whose centroids were too close to each other, a problem repeatedly identified in CADe systems [11], [13], [14]. Therefore, we designed an algorithm to remove redundant nodule candidates. Details are shown in Algorithm 1.
The proposed algorithm aims to select a set of nodule candidates that identify regions of an exam. The relationships between nodule candidates, modeled as a directed graph, are decomposed into directed subgraphs to distinguish different relationships between candidates. The hypothesis is that each directed subgraph represents a region that possibly identifies a nodule. In summary, given a set of nodule candidates C_in per exam, the algorithm generates a set of directed graphs. Each graph can generate one or more distinct directed subgraphs based on the out-degree criterion (outDegree greater than zero) and the path length criterion (pathLength equal to one). Thus, each resulting subgraph can generate one or two nodule candidates.
However, there are patients with multiple nodules whose centroids are close to each other. In this case, the algorithm can generate one or two nodule candidates for each directed subgraph: it checks whether there is a difference between the vertex with the greatest out-degree and the vertex with the greatest confidence index. If so, the two vertices have characteristics that identify distinct nodules with close centroids. Fig. 5 illustrates a hypothetical example of nodule candidate detections of an exam, represented in 2D, to help understand the algorithm iterations. Fig. 5a shows three nodule candidates, c0 (black), c1 (cyan), and c2 (red), with confidence indexes 0.9, 0.8, and 0.7, respectively. Fig. 5d shows three nodule candidates, c3 (black), c4 (cyan), and c5 (red), with confidence indexes 0.9, 0.8, and 0.7, respectively. Therefore, there are six candidate detections in the exam, C_in = {c0, c1, c2, c3, c4, c5}. Each candidate has five attributes, as illustrated in Algorithm 1, e.g., c0 = [x, y, z, d, p = 0.9].
Initially, the proposed algorithm performs all combinations between the candidates of an exam. When comparing two candidates, the algorithm checks the overlap between them, using the radius as the criterion. Thus, it can determine the relationships between the candidates and create a set of directed graphs (lines 1 up to 11 of Algorithm 1).
To exemplify these instructions, Fig. 5b shows the obtained directed graph G0(V0, A0), composed of the vertices V0 = {v0, v1, v2} (mapped from candidates c0, c1, and c2, respectively) and their edges. Likewise, Fig. 5e shows the obtained directed graph composed of the vertices mapped from candidates c3, c4, and c5, respectively, and their edges. Further, the algorithm creates all possible distinct directed subgraphs that obey the out-degree and path length criteria for each generated directed graph. Therefore, each directed graph can generate one or more distinct directed subgraphs. Among the subgraphs that contain the same vertices, only the subgraph with the greater out-degree and greater confidence index remains. When multiple directed subgraphs present the same values for out-degree and confidence index, an arbitrary selection occurs (line 12 of Algorithm 1).
We illustrate the result of this function through examples. Fig. 5c shows the distinct directed subgraphs H0 and H1; they have the same vertices, and the vertices v0 and v1 have the same out-degree (outDegree = 2). However, the vertex v0 has a greater confidence index. Therefore, the subgraph H1 is discarded. Fig. 5f shows the distinct directed subgraphs H2 and H3, which do not have the same vertices. Therefore, both subgraphs remain.
Finally, the algorithm determines whether each directed subgraph generates one or two nodule candidates. For decision-making, it evaluates whether the vertex with the greatest out-degree also has the greatest confidence index; in this case, it generates only one candidate (e.g., vertex v0 of the subgraph H0 - Fig. 5c, and vertex v3 of the subgraph H2 - Fig. 5f). However, if the vertex with the greatest out-degree is not the vertex with the greatest confidence index, the algorithm generates a nodule candidate for each vertex (e.g., vertices v3 and v4 of the subgraph H3 - Fig. 5f) (lines 13 up to 20 of Algorithm 1). Besides, if a vertex is listed more than once (e.g., vertex v3 is listed in the subgraphs H2 and H3 - Fig. 5f), it results in only one nodule candidate (e.g., v3 - Fig. 5f).
In the example in Fig. 5a, there are three nodule candidates, and after the execution of the algorithm, only c0 remained. In the example in Fig. 5d, there are three nodule candidates, and after the execution of the algorithm, only c3 and c4 remained.

Algorithm 1 Discard Redundant Nodule Candidate Detections
 1: for all c_a ∈ C_in do
 2:   for all c_b ∈ C_in do
 3:     if c_a ≠ c_b then
 4:       db ← getDistanceBetween(c_a, c_b) {Returns the distance between c_a and c_b.}
 5:       if db < the radius of c_a then
 6:         V ← setVertexIfNotExists(V, c_a, c_b) {Inserts the vertices c_a and c_b in V, if not exists.}
 7:         A ← setEdgeIfNotExists(A, c_a, c_b) {Inserts the edge (c_a, c_b) in A, if not exists.}
 8:       end if
 9:     end if
10:   end for
11: end for
12: H ← getAllDirectedSubgraphDistinct(V, A, outDegree, pathLength) {Creates all the distinct directed subgraphs with out-degree greater than outDegree and path length equal to pathLength. Among the subgraphs that contain the same vertices, only the subgraph with the greater out-degree and greater confidence index remains.}
13: for all h ∈ H do
14:   c_od ← getVertexGreaterOutDegree(h) {Returns the candidate with the greatest out-degree in h.}
15:   c_ci ← getVertexGreaterConfidenceIndex(h) {Returns the candidate with the greatest confidence index in h.}
16:   if c_od ≠ c_ci then
17:     C_out ← setCandidateIfNotExists(C_out, c_od) {Inserts the candidate c_od in C_out, if not exists.}
18:   end if
19:   C_out ← setCandidateIfNotExists(C_out, c_ci) {Inserts the candidate c_ci in C_out, if not exists.}
20: end for
21: C_out ← getComplement(C_out, C_in, V) {Inserts candidates that have no relationship with other candidates, i.e., the relative complement of the sets C_in and V, denoted by C_in − V.}
22: return C_out
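A simplified Python sketch of the idea behind Algorithm 1. The cluster-based grouping below approximates the paper's directed-subgraph decomposition, and the radius-based edge criterion and the candidate attribute names are assumptions for illustration.

```python
import math
from collections import defaultdict

def discard_redundant(candidates):
    """Candidates are dicts with keys x, y, z, d (diameter), p
    (confidence index). An edge a -> b is assumed when the centroid
    distance is below a's radius. Each cluster of related candidates
    keeps the vertex with the greatest out-degree and, if different,
    also the vertex with the greatest confidence index. Unrelated
    candidates are kept as-is (line 21 of Algorithm 1)."""
    n = len(candidates)
    out = defaultdict(set)
    for a in range(n):
        for b in range(n):
            if a != b:
                ca, cb = candidates[a], candidates[b]
                dist = math.dist((ca['x'], ca['y'], ca['z']),
                                 (cb['x'], cb['y'], cb['z']))
                if dist < ca['d'] / 2:
                    out[a].add(b)
    # Union of related candidates into clusters (undirected closure).
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a, succs in out.items():
        for b in succs:
            parent[find(a)] = find(b)
    clusters = defaultdict(list)
    for i in range(n):
        clusters[find(i)].append(i)
    kept = []
    for members in clusters.values():
        if len(members) == 1:       # no relationship: keep as-is
            kept.extend(members)
            continue
        c_od = max(members, key=lambda i: len(out[i]))
        c_ci = max(members, key=lambda i: candidates[i]['p'])
        kept.extend({c_od, c_ci})   # one candidate if they coincide
    return [candidates[i] for i in sorted(set(kept))]
```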

D. FALSE POSITIVE REDUCTION
To address the nodule candidate classification problem, we used multiple 3D DCNN classifiers, specifically an ensemble built with the Bootstrap Aggregating (Bagging) [43] technique. Therefore, we trained classifiers with samples of different sizes and attenuation ranges.

1) DATA ACQUISITION
In the FPR step, each training subset was split: 80% was used for training and 20% for validation. We highlight that the validation set was used only to select the best models produced by the deep learning techniques.
We extracted 3D patches with two different sizes for each candidate, the first with 32 × 32 × 32 voxels and the other with 48 × 48 × 48 voxels, i.e., L = {32^3, 48^3}, each with one channel and centered on the candidate location. We used them as inputs to the networks.
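Patch extraction around a candidate can be sketched as follows; zero-padding at the volume border and the centering convention for even sizes are assumptions of this sketch.

```python
import numpy as np

def extract_patch(volume, center, size):
    """Extract a cubic 3D patch of side `size` centered on `center`
    (voxel coordinates). Voxels falling outside the volume are
    zero-padded (padding strategy is an assumption)."""
    half = size // 2
    patch = np.zeros((size, size, size), dtype=volume.dtype)
    src, dst = [], []
    for axis in range(3):
        start = center[axis] - half
        s0 = max(start, 0)
        s1 = min(start + size, volume.shape[axis])
        src.append(slice(s0, s1))
        dst.append(slice(s0 - start, s1 - start))
    patch[tuple(dst)] = volume[tuple(src)]
    return patch
```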

2) 3D DCNN ARCHITECTURE
We proposed a 3D DCNN architecture based on the VGG [21], which has provided strong image classification performance and is commonly used for feature extraction [14]. The architecture was defined empirically from the results obtained on the validation set. The problem was modeled with two classes, W = {nodule, irrelevant finding}. Each model generated by the 3D DCNN architecture is a classifier, denoted by K(a) = {P(w | a) | ∀w ∈ W}, where the classifier K determines a confidence index P(w | a) for an input sample a for each class w of W. Fig. 6 shows the 3D DCNN architecture proposed to classify nodule candidates.
The architecture supports 3D patches of different sizes as inputs due to the global max-pooling layer, which was used to keep only the most essential features. The network consists of six convolution blocks; each block has 3D convolution and 3D batch normalization layers, followed by the Rectified Linear Unit (ReLU) activation function [44]. The 3D convolution layers use a kernel of 3^3, stride 1, and padding 1. There are 3D max-pooling layers between the convolution blocks with a kernel of 2^3 and stride 2^3. Before the fully connected layers, we used a global max-pooling layer. Finally, between the two fully connected layers, there is a ReLU activation function. To determine the confidence index of each class, we used the Softmax activation function. The maximum number of channels is 128. In total, the network consists of 12 layers.
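A sketch of such an architecture in PyTorch. The channel widths and the exact pooling positions are assumptions; the kernel sizes, batch normalization, ReLU, global max-pooling, and two fully connected layers with Softmax follow the description above. The global max-pooling is what lets both 32^3 and 48^3 patches flow through the same weights.

```python
import torch
import torch.nn as nn

class Nodule3DNet(nn.Module):
    """VGG-style 3D classifier sketch; widths capped at 128 and the
    pooling placement are assumptions, not the paper's exact design."""
    def __init__(self, num_classes=2):
        super().__init__()
        widths = [16, 32, 64, 128, 128, 128]  # assumed progression
        layers, in_ch = [], 1
        for i, out_ch in enumerate(widths):
            layers += [nn.Conv3d(in_ch, out_ch, 3, stride=1, padding=1),
                       nn.BatchNorm3d(out_ch),
                       nn.ReLU(inplace=True)]
            if i < len(widths) - 1:           # max-pooling between blocks
                layers.append(nn.MaxPool3d(2, stride=2))
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveMaxPool3d(1)   # global max-pooling
        self.classifier = nn.Sequential(nn.Linear(128, 64),
                                        nn.ReLU(inplace=True),
                                        nn.Linear(64, num_classes))

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)
        return torch.softmax(self.classifier(x), dim=1)
```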

3) TRAINING PROCESS
The number of false positive candidates obtained from the NCD step was many times larger than the number of true nodules in the LUNA16 dataset, which means the training sets were highly imbalanced. Therefore, we used data augmentation methods to balance the set. Thus, we address the data skewness problem and prevent overfitting by adding invariances to the set.
We constructed an online scheme to randomly apply affine transformations to all training samples on-the-fly. In each training epoch, the center of each 3D patch was slightly shifted to a random position of up to ±3 pixels along each direction (i.e., X, Y, and Z). Moreover, the 3D patch was randomly rotated by 0°, 90°, 180°, or 270° around one of three possible directions. Therefore, in each epoch, the composition of the training set was different. This procedure was applied to all 10 subsets.
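The per-epoch augmentation can be sketched as below; `np.roll` wraps voxels around the border, which is a simplification of re-extracting the patch at a shifted center, and the choice of rotation plane stands in for the "three possible directions".

```python
import numpy as np

def augment_patch(patch, rng):
    """Random affine augmentation applied on-the-fly each epoch:
    shift the patch by up to +/-3 voxels along each axis and rotate
    by a multiple of 90 degrees in a randomly chosen plane."""
    shift = rng.integers(-3, 4, size=3)
    patch = np.roll(patch, shift, axis=(0, 1, 2))
    k = rng.integers(0, 4)                 # 0, 90, 180 or 270 degrees
    planes = [(0, 1), (0, 2), (1, 2)]      # three possible directions
    return np.rot90(patch, k, axes=planes[rng.integers(0, 3)])
```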
At the training stage, each network was optimized by the SGD algorithm with back-propagation and cross-entropy as the loss function. The initial learning rate was set to 10^-3, the momentum parameter was set to 0.9, the weight decay was set to 5 × 10^-4, and the mini-batch size was set to 16. We regularized our models by the early stopping strategy to avoid overfitting. We store the model weights (i.e., checkpoints) at the end of every epoch, denoted by m_cp, where m is a model and cp is the epoch.

4) TESTING PROCESS
The proposed classifier ensemble combines 3D DCNN models trained with samples of different attenuation ranges G and 3D patch sizes L. Besides, we employ the checkpoint ensemble method [45], which uses checkpoints stored during the training stage. Specifically, we used the best checkpoint based on the validation set (m_cp) and its two predecessors, M = {m_cp, m_cp−1, m_cp−2}. Also, each classifier performed several classifications of the same candidate with different rotations, i.e., for each candidate, four augmented copies were generated and rotated at angles O = {0°, 90°, 180°, 270°}. Therefore, the final prediction of the classifier ensemble fuses the outputs of the several 3D DCNNs for the nodule class through averaging. The calculation formula is presented in Equation (2).
To obtain the final classification result at the testing stage, we combined the confidence indices obtained in the NCD and FPR steps through averaging, thereby enhancing the discrimination capability. Equation (3) presents the calculation formula.
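Since both fusions are plain averages, they can be sketched directly (the function names are ours, not from the original implementation):

```python
import numpy as np

def ensemble_confidence(predictions):
    """Eq. (2) as described in the text: average the nodule-class confidences
    produced by every (attenuation range G, patch size L, checkpoint M,
    rotation O) combination for a single candidate."""
    return float(np.mean(predictions))

def final_confidence(p_ncd, p_fpr):
    """Eq. (3) as described in the text: average of the NCD and FPR
    confidence indices for a candidate."""
    return 0.5 * (p_ncd + p_fpr)
```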

IV. EXPERIMENT AND RESULT
The proposed CADe system was tested and evaluated on the publicly available LUNA16 dataset. The experiment was repeated 10 times (i.e., 10-fold cross-validation), each time leaving out one different subset for testing and nine for training. The evaluation was performed using the FROC curve, obtained from the sensitivity and the average number of false positives per scan. As analysis and comparison criteria, we used the operating point of 1 FP/Scan and the CPM metric. Experiments were performed using two NVIDIA Titan Xp graphics cards with 12 GB each.
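The CPM metric used by LUNA16 is the average sensitivity at seven FROC operating points (1/8, 1/4, 1/2, 1, 2, 4, and 8 FPs/Scan); a minimal sketch, assuming the FROC curve is given as sorted (FPs/Scan, sensitivity) pairs:

```python
import numpy as np

# the seven FROC operating points used by the CPM score
FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)

def cpm_score(fps_per_scan, sensitivities):
    """Average sensitivity at the seven operating points, linearly
    interpolated from a FROC curve (fps_per_scan must be increasing)."""
    return float(np.mean(np.interp(FP_RATES, fps_per_scan, sensitivities)))
```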
For the NCD step, we evaluated each sub-step to measure its impact on the system. The experimental results are shown in Table 1. When using only the 2D bounding boxes with the algorithm that groups 2D bounding boxes into 3D nodules, the system achieved a sensitivity of 98.57% and an average of 59.47 FPs/Scan (Table 1, Sub-step 1). The 2D Mask R-CNN method generated 159,580 bounding boxes with a confidence index greater than or equal to 0.1. When discarding 2D bounding boxes generated outside the lung parenchyma, the system achieved a sensitivity of 98.57% and an average of 54.31 FPs/Scan, i.e., it reduced the number of candidates by 8.66% without losing any True Positive (TP) (Table 1, Sub-step 2). Lastly, when also using the proposed algorithm to discard nodule candidate detections whose centroids are close to each other in an exam, the system achieved a sensitivity of 98.15% and an average of 49.67 FPs/Scan, reducing the number of candidates by a further 8.54% while losing only 5 TPs (Table 1, Sub-step 3). As mentioned above, this step reached the goal of identifying and localizing pulmonary nodule candidates with a reasonable trade-off between false positives and sensitivity.
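The idea behind the last sub-step, merging detections with nearby centroids, can be illustrated with a greedy sketch (the distance threshold and the confidence-ordered greedy strategy are our assumptions; the paper's exact algorithm may differ):

```python
import numpy as np

def discard_close_centroids(centroids, confidences, min_dist=5.0):
    """Illustrative sketch: among detections in one exam whose centroids lie
    within `min_dist` voxels of each other (hypothetical threshold), keep
    only the highest-confidence one. Returns the kept candidate indices."""
    order = np.argsort(confidences)[::-1]  # process strongest candidates first
    kept = []
    for i in order:
        c = np.asarray(centroids[i], dtype=float)
        # keep the candidate only if no stronger kept candidate is too close
        if all(np.linalg.norm(c - np.asarray(centroids[j], dtype=float)) >= min_dist
               for j in kept):
            kept.append(int(i))
    return sorted(kept)
```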
We also evaluated the performance of the proposed CADe system at seven operating points of the FROC curve and calculated the CPM score. The results are shown in Table 2. Table 3 compares the performance of the proposed CADe system with that of other published CADe systems. The works of Setio et al. [11] and Zhang et al. [22] obtained better sensitivities on nodule candidate detection, 98.30% and 100.00%, respectively. However, in order to localize as many candidates as possible, these CADe systems produced more FPs. The CADe system proposed by Zhang et al. [22] achieved the same result as ours, a sensitivity of 94.90% at an average of 1.0 FP/Scan. The CADe system proposed by Setio et al. [11] achieved a sensitivity of 96.90% at an average of 1.0 FP/Scan; however, it combined the results of seven pulmonary nodule candidate detection systems and five false positive reduction systems. Newer studies have been proposed with relevant results. For example, Zhu et al. [28] addressed the quality of the images, and Peng et al. [34] and Mei et al. [35] proposed new DCNN architectures. However, they do not surpass all our results in the different steps (NCD and FPR) under the evaluation metrics of average sensitivity at FPs/Scan or CPM.
Our method achieved a CPM score of 92.24, a result comparable with current research. We highlight that the proposed CADe system achieved a sensitivity of 94.90% at an average of 1.0 FP/Scan, outperforming other state-of-the-art methods. Fig. 7 shows the detection results, where we visualize only the slice in which the detection center is located. We crop a square area of 48 pixels around the detection center. Fig. 7a shows the detected true positive nodules (cyan rectangles), Fig. 7b shows the detected false positive nodules (yellow rectangles), and Fig. 7c shows the false negative nodules (magenta circles).
In addition, our proposal has a low computational cost. In the NCD step, we used a 2D DCNN, and in the FPR step, we used 3D DCNNs to classify small patches. Despite using a classifier ensemble, the training process is performed offline.
Most CADe systems have their operating point set somewhere between 1 and 4 FPs/Scan [23], [32]. Therefore, our proposal is in line with current usage practices and has the potential to be used in clinical decision support.

A. ABLATION STUDY
To assess the performance gain provided by the classifier ensemble, we compared the individual performance of each classifier. Fig. 8 presents the FROC curve of the proposed CADe system and also shows the individual performance of each classifier. For each classifier, we used only one rotation, O = {0°}, and the best checkpoint, M = {m_cp}. Thus, each classifier exclusively uses one sample size L and one attenuation pattern G.
We evaluated each classifier ensemble based on an attenuation pattern to demonstrate both the individual performance and the performance boost obtained when combining different patterns.
In the experiments, we used L = {32³, 48³} and combined the different parameters of G = {γ = −1000 and λ}. Table 4 shows quantitative results at seven operating points of the FROC curve together with the CPM score.
We also measured the impact of the checkpoint ensemble method against the strategy of classifying the same candidate with different rotations. Table 5 presents these results. Notably, the classifier ensemble performed better than the individual classifiers. Therefore, we understand that the design of the proposed classifier ensemble was effective, helping to boost the performance of the CADe system.
We were also able to identify that each classifier presents a different performance, mainly influenced by the attenuation pattern and the sample size.

V. CONCLUSION
This paper proposes a CADe system that uses a classifier ensemble based on CT attenuation patterns to automatically detect pulmonary nodules in low-dose 3D CT scans. For pulmonary nodule candidate detection, we used an end-to-end object detection method. Next, we used an algorithm to group 2D bounding boxes into 3D nodules. Then, we designed an algorithm to discard nodule candidate detections whose centroids are close to each other in an exam. We also designed a classifier ensemble for false positive reduction, which fuses the individual classifications through averaging to improve nodule classification sensitivity. Our CADe system achieved a sensitivity of 94.90% at an average of 1.0 FP/Scan on the publicly available LUNA16 dataset. We believe the proposed CADe system is in line with current practices and can be a powerful computational tool for clinical decision support in pulmonary nodule detection.