A Fast and Efficient CAD System for Improving the Performance of Malignancy Level Classification on Lung Nodules

Accurate malignancy level classification of lung nodules can effectively reduce the lung cancer mortality rate. In this study, we present a fast and efficient CAD system to improve the performance of nodule malignancy level classification. First, to reduce false positives (FPs), we propose a novel vessel segmentation method that measures vessel likelihood by discriminating tubular-like structures from multiple views. The method recognizes irregular vascular structures robustly and sensitively and achieves fast vessel segmentation. In addition, a mathematical description of 3D pulmonary entities using neighbor-centroids clustering is provided as a fundamental condition for spatial feature extraction. To optimize feature extraction, we formulate a gray values cumulative function and a patches selection function based on the mathematical description to generate the axial spatial outline and the spatial density distribution samples of the entities, respectively. Then, we use the Edge Orientation Histogram (EOH) to extract edge features from the spatial outline and propose a multi-scale path LBP (MSPLBP) to extract texture features from the density distribution samples. Finally, the fused EOH and MSPLBP features are classified into 6 malignancy levels by three state-of-the-art classifiers. The experimental results show that the vessel segmentation method achieves an average $F1\_Score$ of 78.14% and an area under the $PR$ curve (AUC) of 0.8149. Moreover, our system reaches an average $accuracy$ of 95.88% and takes an average of 176.26 seconds to evaluate a CT set for malignancy level classification. These results indicate that the system can segment vessels accurately and classify the malignancy level of nodules efficiently. Our system has the potential to be a powerful tool for the early diagnosis of lung cancer.


I. INTRODUCTION
Cancer is defined as a relentless growth of abnormal cells in a specific tissue that can spread from the primary neoplasm to distant organs [1]. Lung cancer causes 1.3 million deaths annually and has the highest mortality rate among all cancer-related diseases [2], [3]. A statistic from CA: A Cancer Journal for Clinicians reveals that a total of 234,030 new cases of lung cancer occurred in 2018 and 66% of these patients died [3]. In fact, 80% of lung cancer patients are diagnosed at an advanced stage, which causes the 5-year survival rate to fall from 70% to 18% [4]-[6]. To address this problem, classifying the malignancy level of lung nodules is clinically important because nodules are the crucial manifestation of lung cancer at an early stage.
The associate editor coordinating the review of this manuscript and approving it for publication was Hossein Rahmani.
A lung nodule is a rounded opacity whose maximum diameter is less than 3 cm [5]. A variety of radiographic characteristics of a lung nodule, e.g., size, margin, nodular calcification, and nodular cavitation, can be used to differentiate its malignancy level [7], [8]. However, radiologists have to check a large number of CT slices to analyze lung nodules, and this work is burdensome and time-consuming [9], [10]. Therefore, computer-aided diagnosis (CAD) systems have arisen to assist them and potentially improve evaluation efficiency.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
In general, a CAD system for lung cancer has four main stages: image preprocessing, nodule candidate detection, false positive (FP) reduction, and malignancy level classification [11], [12]. In the nodule candidate detection stage, a considerable number of suspicious nodules are marked in the lung parenchyma. It should be noted that most of the nodule candidates are FPs, and they reduce both the accuracy of malignancy level classification and the efficiency of the CAD system [13]. Thus, FP reduction is often regarded as a prerequisite for malignancy level classification. Preliminary results show that a vessel segmentation approach can eliminate approximately 38% of the FPs [14], [15]. Unfortunately, accurate vessel detection in CT images remains a problem because of the geometrical complexity of vascular structures, e.g., vessel branches, bifurcations, and small vessels [16].
In recent years, considerable research effort has been devoted to solving this problem. Among these works, 3D image-based vessel segmentation has been intensively investigated for its ability to generate 2D vessel slices at arbitrary views, which can significantly increase the accuracy of vessel detection. Existing 3D vessel segmentation methods can be divided into two categories: learning-based and model-based. Learning-based vessel segmentation methods can describe vascular features comprehensively and offer opportunities for detecting more general vessel structures [17], [18]. Recent experimental results suggest that such approaches achieve high sensitivity, specificity, and accuracy, especially in 3D neuron [19] and retinal vessel segmentation applications [20]. However, the training stage of such methods requires a large amount of vessel labeling information, which is extremely rare in chest CT datasets. Model-based methods, e.g., the Hessian-based model [19], multi-scale models [20], and the centerline extraction model [21], have been a powerful and efficient tool for vessel segmentation for decades. Among these methods, Hessian-based models such as the Frangi filter [22] are regarded as a key approach and have been widely used in many applications, including lung vessel segmentation [23], cerebral vascular structure segmentation [24], and color fundus retinal vessel segmentation [25]. For example, Forkert et al. [24] use the Hessian matrix to enhance tubular-like structures in the time-of-flight (TOF) image sequence. This step plays a vital role in achieving an exact segmentation of malformed as well as small vessels from TOF magnetic resonance angiography datasets. Similarly, Zhang et al. [26] adopt the Hessian matrix to find a locally adaptive derivative (LAD) from retinal fundus images. The experimental results show that the method using LAD can deal with typically difficult cases such as crossings, central arterial reflex, and closely parallel and tiny vessels. However, when processing vessels in chest CT images, the disadvantages of the Hessian-based model in the 3D vessel segmentation methods mentioned above cannot be overlooked. One challenge is the detection of complex vascular structures, such as junctions, varying diameters, endpoints, and bifurcations, because they usually contain many local blob-like structures that have a low response to the Hessian filter, and thus the morphological characteristics of the vessels are seriously broken. Additionally, 3D Hessian-based models are computationally expensive because they process each voxel and its neighborhood in the image at several scales.
In this work, we propose a novel multi-view vessel segmentation method that detects tubular-like structures to solve the above problems. The tubular-like structure is the decisive characteristic for recognizing various types of vessels. Specifically, the input CT image set is first reconstructed into three slice sequences along the normal directions of three orthogonal planes, i.e., the sagittal, coronal, and transverse planes, so that we obtain cross-sections of each lung vessel from multiple views. This scheme drastically reduces the computational complexity of our method compared with voxel-wise calculation. In addition, it preserves the differentiation between vessels and nodules and increases the sensitivity of vessel detection by utilizing three orthogonal views. Then, in each view, the connected regions are enhanced by the Frangi filter, and the responses are used by our proposed structure discriminating model to identify tubular-like structures. This model accounts for the drawback that the Frangi filter has a low local response on complex vascular structures, and thus uses the overall response ratio to determine vessels. Finally, we measure the vessel likelihood of each pixel based on its structure discriminant results in the three orthogonal views. The experimental results show that our method recognizes more general vascular structures robustly and sensitively and achieves fast vessel segmentation.
After vessel segmentation, classifying the remaining connected regions, including nodules and lung tissues, into their malignancy levels is another concern in our system. For this problem, feature extraction is an effective approach that describes the relevant image information contained in a pattern [1]. To improve generalization performance, data fusion has become an emerging direction in feature extraction because it can learn multiple features from objects [27]. In particular, fusing edge features and texture features for the malignancy level classification of lung nodules has been proposed in a number of studies, because the morphological characteristics of nodules are sophisticated and a single feature set cannot describe them comprehensively. One critical contribution is the work of Li et al. [28], which shows that texture features and edge features help differentiate malignant nodules from benign nodules. Moreover, to improve the description of lung nodules, Zhang et al. [29] design a multiple-feature descriptor that fuses the texture and edge information of image patches. Their experimental results demonstrate the promising classification performance of the method.
It should be pointed out that data fusion has two major problems: optimizing the description and reducing the dimension. Fusing more features generally improves the description. However, most popular single-feature descriptors, such as HOG [30], MS-LBP [31], and SIFT [32], already suffer from high dimensionality, and fusing these features makes this problem more serious. Furthermore, 3D features are often adopted to optimize the lung nodule description, which may further increase the dimension. Therefore, to realize better feature extraction of lung nodules for malignancy level classification, optimizing the description and reducing the dimension should be considered simultaneously.
In response to these challenges, this work presents an efficient malignancy level classification method that extracts spatial edge features and spatial texture features from the result of vessel segmentation. Our method comprises four main steps. First, before feature extraction, we design a mathematical description of the remaining connected regions that uses neighbor-centroids clustering to obtain the spatial association information between these regions. This mathematical description is the fundamental condition for extracting spatial features in the subsequent steps. Second, we present an approach for extracting spatial edge features based on the mathematical description. In this approach, we formulate a gray values cumulative function that combines all patches of an entity into one image. The function both improves the outline completeness of the entities and reduces the computational complexity of spatial edge feature extraction. From the combined image produced by the function, we use Edge Orientation Histograms (EOH) [33] to extract edge features. EOH is a simple and efficient edge feature descriptor, which makes it suitable for our work. Third, we propose a spatial texture feature extraction approach. To simplify the density distribution representation of the entities, we formulate a patches selection function that selects three samples discretely from each patches sequence. To optimize the texture feature extraction from the samples, we propose a multi-scale path LBP (MSPLBP) based on the path integral Local Binary Patterns (pi-LBP) operator [34]. Pi-LBP can effectively encode the cross-scale correlation and reduce the sensitivity to noise. However, it performs poorly in describing texture size and texture structure, and its dimension is very high. To address this problem, MSPLBP expands the lateral scale to improve the description of texture size and texture structure, and merges the paths to reduce the feature dimension. Finally, three state-of-the-art classifiers, i.e., NNCS [35], MWLSTSVH [36], and DLSR [37], are used to classify the fused spatial edge and texture features of the 3D lung entities into 6 malignancy levels.
In general, our contributions can be summarized as follows:
1. We propose a multi-view vessel segmentation method to reduce the FPs among lung nodule candidates. The method recognizes more general vascular structures robustly and sensitively and achieves fast vessel segmentation.
2. We design a mathematical description for 3D lung entities with the results of vessel segmentation to provide the fundamental conditions for spatial feature extraction.
3. We formulate a gray values cumulative function to improve the outline completeness of the entities and simplify the spatial edge feature extraction using EOH.
4. We formulate a patches selection function for the entities to reduce the redundant information in the density distribution representation. In addition, we propose a texture feature descriptor, MSPLBP, to extract texture features from the function results, which improves the texture feature description and lowers the feature dimension.
The rest of this paper is organized as follows. Section 2 presents the detailed design of vessel segmentation, spatial edge and texture feature extraction, and the three state-of-the-art classifiers used in our work. Experimental results and discussion are reported in Section 3. Finally, Section 4 concludes the paper.

II. MATERIALS AND METHODS
The overview of this work is depicted in Fig. 1. The proposed method consists of four modules: (1) Initialization: pixels whose intensities are below a given threshold are set to 0. As the intensities of both lung nodules and vessels are usually high, this effectively eliminates low-intensity noise in the CT image; (2) Vessel segmentation: CT images are reconstructed into 3 slice sequences at 3 orthogonal views. In each sequence, we compute the overall response ratio of each connected region in all the slices to identify tubular-like structures. With this result, we label a voxel in the CT set as vessel if it belongs to a tubular-like structure region in any view; (3) Spatial feature extraction: the spatial edge feature and the spatial texture feature are extracted to improve the feature description of our system; (4) Classification: the fused edge and texture feature vector is classified into 6 malignancy levels. These modules are described in detail in the following subsections.
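The initialization module can be illustrated with a few lines of NumPy. This is only a minimal sketch: the function name `suppress_low_intensity` and the threshold value used in the example are hypothetical, since the text does not state the threshold that is actually applied.

```python
import numpy as np

def suppress_low_intensity(image, threshold):
    """Return a copy of `image` in which pixels below `threshold` are set to 0."""
    cleaned = np.array(image, copy=True)
    cleaned[cleaned < threshold] = 0
    return cleaned

# Illustrative gray values only; the threshold of 50 is an assumption.
ct_slice = np.array([[12, 200], [90, 35]], dtype=np.uint8)
cleaned_slice = suppress_low_intensity(ct_slice, threshold=50)
```

The input array is copied so that the original CT data remain available to later stages.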

A. IMAGE ACQUISITION
The chest CT databases used in this work are provided by the LIDC and our collaborator LNUTCM. LIDC is a publicly available reference database of low-dose helical CT images that consists of 1018 cases [38]. The Affiliated Hospital of LNUTCM is a grade-A hospital and a national TCM model hospital designated by the National Traditional Chinese Medicine Administrative Bureau [39]. It is one of the best models of the Chinese health system and has been appraised as the state standard-setter for reliability in public treatment and service [39]. The CT database supplied by LNUTCM contains 382 cases.
To quantitatively evaluate our vessel segmentation method, CT images with vascular labels are required. However, no gold standard exists for labeling vessels in chest CT images. Hence, manually defined vascular labels for both LIDC and LNUTCM are used in our validation. These labels were provided by 16 radiologists from LNUTCM with 8-10 years of experience in general radiology, and a thoracic surgeon with 30 years of experience was then consulted for further verification. In addition, because manual segmentation is extremely tedious and time-consuming, we select 158 CT sets for vessel labeling and use them for the quantitative evaluation of our vessel segmentation method. The selection standard has two main aspects. First, the CT sets should contain lung nodules with various shape characteristics. This is because the goal of our vessel segmentation method is to optimize the malignancy level classification of lung nodules, so the data must evaluate the performance of our method not only in segmenting vessels but also in excluding lung nodules. The 158 CT sets also include 6 cases without pathology to improve the diversity of the data. Second, the CT slice spacing is an important standard for selecting the CT sets. As the slice spacing influences the imaging quality of the sagittal and coronal planes, and thinner slice spacing presents better morphological characteristics, we select only the CT sets with a slice spacing of less than 2 mm.
Unlike the labeled vessel data, both LIDC and LNUTCM contain detailed annotations of subjective nodule malignancy levels. For LIDC, four specialists assign an integer value from 1 to 5 to rate each nodule's likelihood of malignancy: 1 is highly unlikely for cancer, 2 is moderately unlikely for cancer, 3 is indeterminate likelihood, 4 is moderately suspicious for cancer, and 5 is highly suspicious for cancer [38]. Here, considering that non-nodules may also be involved in the classification stage, we add 0 to denote impossible for cancer. Therefore, a total of 6 levels are used to evaluate the malignancy of a lung nodule candidate. For the data from LNUTCM, an experienced specialist was asked to assign an integer value to each nodule using the same standard as LIDC, and we also add level 0 to this database.
FIGURE 2. 3D nodule and vessel mapping on the sagittal section, coronal section, and transverse section. (a) shows a 3D nodule model whose images on the three mapping planes all present blob-like structures. In contrast, (b) displays a 3D vessel model for which only the mapping plane on the z-axis presents blob-like structures, whereas the other two mapping planes on the x-axis and y-axis present tubular-like structures. As (a) and (b) show, the most significant difference between vessels and nodules is the appearance of tubular-like structures in certain mapping planes. Thus, we can detect tubular-like structures from multiple views to recognize vessels, and especially to distinguish them from nodules.

B. VESSEL SEGMENTATION METHODOLOGY
Vessel segmentation is a necessary prerequisite for accurate malignancy level classification of lung nodules because vessels often produce ambiguity in lung nodule detection. In this subsection, we propose a vessel segmentation method that detects tubular-like structures from multiple views.

1) MULTI-VIEW VESSEL SCHEME
A vessel is a tube that usually exhibits tubular-like structures in images. However, in many cases, vessels may present blob-like structures similar to lung nodules in a single view, especially when they are perpendicular to the image plane. Thus, segmenting vessels from a single view is ambiguous, and this is one of the main reasons for the high FP rate in lung nodule detection. To solve this problem, we adopt a multi-view vessel reconstruction scheme that increases the probability that tubular-like structures appear, which helps to distinguish vessels from nodules.
To illustrate the difference between a vessel and a nodule in multiple views, we show the mapping results of a nodule model and a vessel model in Fig. 2. The figure shows that mapping a vessel onto three orthogonal planes clearly increases the probability that tubular-like structures appear. Thus, we reconstruct the CT image set of lung nodule candidates into three orthogonal plane sequences, i.e., the sagittal, coronal, and transverse planes. Each of these reconstructive planes contains the cross-sections of lung entities in its view. It should be pointed out that a smaller intersection angle between a vessel and a plane makes the cross-section more likely to present a tubular-like structure, as illustrated in Fig. 3(a). In Fig. 3(a), AB, which represents a vascular cross-section, is more similar to a tubular-like structure than CD. In addition, at least one of the angles between a vessel and the reconstructive planes is not greater than 45°, as shown in Fig. 3(b). Because a cross-section cut at an angle θ to the vessel axis has a length of d / sin θ for a vessel of diameter d, this ensures that the length of the cross-section on at least one reconstructive plane is at least about 1.4 times the vascular diameter. Moreover, the intersection angles are even smaller in practice because vessels are not perfectly straight. Therefore, multi-view CT image reconstruction greatly improves the probability that tubular-like structures appear.
FIGURE 3. (a): Line segment AB is the cross-section of a vessel intercepted by the x-axis, and line segment CD is the cross-section of the vessel intercepted by the y-axis. θ1 and θ2 denote the intersection angles of ab with the x-axis and y-axis, respectively. The intersection angle θ1 is smaller than θ2, so AB is longer than CD. (b): α, β, and γ represent the intersection angles between a vessel and the x-y plane, y-z plane, and x-z plane, respectively; they cannot all be greater than 45° simultaneously.
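The 45° argument can be checked numerically. The sketch below is an illustration rather than part of the method: it assumes an ideal straight vessel described by a direction vector, and the function names `plane_angles_deg` and `min_elongation` are hypothetical. Since the squared sines of the three line-to-plane angles sum to 1, the three angles cannot all exceed 45°; the least favourable direction is the space diagonal, where all three angles are about 35.3°.

```python
import math

def plane_angles_deg(direction):
    """Angles (degrees) between a line and the x-y, y-z, and x-z planes."""
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Angle between a line and a plane = asin(|d . n| / |d|), n = plane normal.
    alpha = math.degrees(math.asin(min(1.0, abs(dz) / norm)))  # x-y plane
    beta = math.degrees(math.asin(min(1.0, abs(dx) / norm)))   # y-z plane
    gamma = math.degrees(math.asin(min(1.0, abs(dy) / norm)))  # x-z plane
    return alpha, beta, gamma

def min_elongation(direction):
    """Cross-section length divided by diameter in the most favourable view."""
    smallest = math.radians(min(plane_angles_deg(direction)))
    # A vessel lying inside a plane (angle 0) gives an unbounded cross-section.
    return math.inf if smallest == 0 else 1.0 / math.sin(smallest)

worst_case = min_elongation((1.0, 1.0, 1.0))  # space diagonal
```

Under these assumptions the elongation in the best view never drops below the 1.4 factor quoted in the text.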
Based on the description above, our multi-view reconstruction scheme offers three main advantages for vessel detection. First, because it promotes the appearance of tubular-like structures, the difference between vessels and nodules becomes more obvious, and thus the performance in distinguishing them is improved. Second, the sensitivity of vessel detection is increased because we have a greater probability of finding a tubular-like structure. This effectively reduces the interfering structures for the next stage. Finally, the scheme transforms the 3D CT image set into three groups of 2D images, which drastically reduces the computational complexity compared with other 3D vessel modeling approaches. Therefore, this scheme optimizes our vessel segmentation method comprehensively.
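As a rough illustration of the reconstruction step, the following sketch reslices a CT volume into three orthogonal slice sequences by permuting array axes. It assumes the volume is stored as a NumPy array indexed `[z, y, x]` and ignores any resampling for anisotropic voxel spacing; the function name `orthogonal_views` is hypothetical.

```python
import numpy as np

def orthogonal_views(volume):
    """Return the three slice sequences of a volume indexed as [z, y, x]."""
    return {
        # Slices stacked along z; each slice is a (y, x) image.
        "transverse": volume,
        # Slices stacked along y; each slice is a (z, x) image.
        "coronal": np.transpose(volume, (1, 0, 2)),
        # Slices stacked along x; each slice is a (z, y) image.
        "sagittal": np.transpose(volume, (2, 0, 1)),
    }

demo_volume = np.arange(4 * 5 * 6).reshape(4, 5, 6)
views = orthogonal_views(demo_volume)
```

`np.transpose` returns views of the same data, so the reslicing itself costs no extra memory; each 2D slice can then be processed independently.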
It must also be mentioned that our method may produce incorrect segmentation in some cases. In general, the structures within the lungs in chest CT images can be grouped into lung tissues, nodules, and vessels. Besides vessels, a small number of lung tissues, such as the trachea, can also present tubular-like structures in the reconstructive planes, which may cause them to be segmented incorrectly by our method. However, this does not affect the lung nodule malignancy level classification in our system for the following two reasons. First, removing tubular-like lung tissue does not break the lung nodule area used in feature extraction. Second, although this problem may cause a drop in the accuracy of vessel segmentation, sensitivity is improved, which is more important because removing more interfering structures helps the malignancy level classification of nodules.

2) TUBULAR-LIKE STRUCTURE DETECTION
Through the multi-view vessel reconstruction scheme described above, we have improved the probability that tubular-like structures appear. In this subsection, we aim to detect the tubular-like structures in each view and thus propose a structure discriminating model based on the response of the Frangi filter. The Frangi filter has an advantage in distinguishing tubular-like structures from blob-like structures by using the ratio of the Hessian eigenvalues, which makes it well suited to vessel segmentation in a nodule detection system. However, it faces challenges when detecting junctions, endpoints, bifurcations, and other irregular vessel structures. Considering these problems, our structure discriminating model calculates the overall response ratio of a connected region to measure its likelihood of being a tubular-like structure, which improves the robustness of detecting more general tubular-like structures. The details of this model are described below.
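The Frangi filter uses the second-order derivatives of image intensities to distinguish tubular-like structures from blob-like structures. In this method, the second derivatives that form the Hessian matrix are calculated using the concepts of linear scale-space theory, which effectively reduces the interference from local noise in images. Specifically, the second derivatives at a pixel (x, y) are defined as convolutions with derivatives of a Gaussian, as calculated in (1)-(3):
$$I_{xx}(x, y, s) = I_s(x, y) * G_{xx}(s) \tag{1}$$
$$I_{yy}(x, y, s) = I_s(x, y) * G_{yy}(s) \tag{2}$$
$$I_{xy}(x, y, s) = I_s(x, y) * G_{xy}(s) \tag{3}$$
where $I_s(x, y)$ is a $(2s + 1) \times (2s + 1)$ image block with the center pixel (x, y); $G_{xx}(s)$, $G_{yy}(s)$ and $G_{xy}(s)$ are the second derivatives of the Gaussian filter at scale s, as shown in (4)-(9):
$$G_{xx}(s) = \frac{\partial^2 G(x, y)}{\partial x^2} \tag{4}$$
$$G_{yy}(s) = \frac{\partial^2 G(x, y)}{\partial y^2} \tag{5}$$
$$G_{xy}(s) = \frac{\partial^2 G(x, y)}{\partial x \, \partial y} \tag{6}$$
$$\frac{\partial^2 G(x, y)}{\partial x^2} = \left( \frac{x^2}{s^4} - \frac{1}{s^2} \right) G(x, y) \tag{7}$$
$$\frac{\partial^2 G(x, y)}{\partial y^2} = \left( \frac{y^2}{s^4} - \frac{1}{s^2} \right) G(x, y) \tag{8}$$
$$\frac{\partial^2 G(x, y)}{\partial x \, \partial y} = \frac{x y}{s^4} G(x, y) \tag{9}$$
where $G(x, y) = \frac{1}{2 \pi s^2} \exp\left( -\frac{x^2 + y^2}{2 s^2} \right)$ is a two-dimensional Gaussian function. The Hessian matrix of the pixel (x, y) with the Gaussian filter at scale s is given in (10), and a function of the eigenvalues of the Hessian matrix that measures vesselness is defined in (11):
$$H(x, y, s) = \begin{bmatrix} I_{xx}(x, y, s) & I_{xy}(x, y, s) \\ I_{xy}(x, y, s) & I_{yy}(x, y, s) \end{bmatrix} \tag{10}$$
$$V_0(x, y, s) = \begin{cases} 0, & \lambda_2 > 0 \\ \exp\left( -\frac{R_B^2}{2 \beta^2} \right) \left( 1 - \exp\left( -\frac{R_A^2}{2 c^2} \right) \right), & \text{otherwise} \end{cases} \tag{11}$$
$$R_A = \sqrt{\lambda_1^2 + \lambda_2^2} \tag{12}$$
$$R_B = \frac{\lambda_1}{\lambda_2} \tag{13}$$
where $V_0(x, y, s)$ denotes the response of the filter to pixel (x, y) at scale s; $\lambda_i$ (i = 1, 2) are the eigenvalues of $H(x, y, s)$, ordered so that $|\lambda_1| \le |\lambda_2|$; $R_A$ in (12) is the measure of second-order structures and $R_B$ in (13) is the 2D blobness measure accounting for the eccentricity of the second-order ellipse; β and c in (11) are thresholds that control the sensitivity of the vessel filter to the measures $R_A$ and $R_B$. The method uses the maximum of the filter responses $V_0(x, y, s)$ over all scales as the vesselness measure:
$$V(x, y) = \max_{s_{min} \le s \le s_{max}} V_0(x, y, s) \tag{14}$$
where $s_{min}$ and $s_{max}$ are the minimum and maximum scales at which relevant structures are expected, covering the range of vessel widths. The enhancement results of the Frangi filter for different situations are shown in Fig. 4, where (a), (b), (c), (d) are vessel models and (e) is a nodule model. In each group, the left image is the original and the right image is the enhancement result.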
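For readers who want to experiment with the filter, here is a minimal NumPy sketch of a 2D Frangi-style vesselness computation. It follows the standard formulation from [22] for bright structures on a dark background and is not the authors' implementation. Several choices are assumptions made for the illustration: the Gaussian kernels are truncated at a radius of 3s instead of using a (2s + 1) × (2s + 1) block, c defaults to half of the largest structure measure at each scale, no scale normalization is applied, and the names `gaussian_kernels`, `conv_axis`, and `vesselness_2d` are hypothetical.

```python
import numpy as np

def gaussian_kernels(s):
    """Sampled 1D Gaussian and its first and second derivatives at scale s."""
    radius = int(np.ceil(3 * s))  # assumed truncation radius
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)
    return g, -x / s**2 * g, (x**2 / s**4 - 1 / s**2) * g

def conv_axis(img, kernel, axis):
    """Convolve every 1D line along `axis` with `kernel` (zero padded).

    The image must be at least as long as the kernel along that axis.
    """
    return np.apply_along_axis(np.convolve, axis, img, kernel, mode="same")

def vesselness_2d(img, scales=(1.0, 2.0), beta=0.5, c=None):
    """Maximum Frangi-style response over `scales` for a 2D image."""
    img = np.asarray(img, dtype=float)
    best = np.zeros_like(img)
    for s in scales:
        g, g1, g2 = gaussian_kernels(s)
        # Separable Gaussian-derivative filtering (axis 1 is x, axis 0 is y).
        ixx = conv_axis(conv_axis(img, g2, 1), g, 0)
        iyy = conv_axis(conv_axis(img, g, 1), g2, 0)
        ixy = conv_axis(conv_axis(img, g1, 1), g1, 0)
        # Closed-form eigenvalues of the symmetric 2x2 Hessian.
        root = np.sqrt((ixx - iyy) ** 2 + 4 * ixy**2)
        mu1, mu2 = 0.5 * (ixx + iyy + root), 0.5 * (ixx + iyy - root)
        swap = np.abs(mu1) > np.abs(mu2)
        lam1 = np.where(swap, mu2, mu1)  # |lam1| <= |lam2|
        lam2 = np.where(swap, mu1, mu2)
        r_b = lam1 / np.where(lam2 == 0, 1.0, lam2)  # blobness ratio
        r_a = np.sqrt(lam1**2 + lam2**2)             # second-order structure
        c_s = c if c is not None else max(0.5 * r_a.max(), 1e-12)
        v0 = np.exp(-r_b**2 / (2 * beta**2)) * (1 - np.exp(-r_a**2 / (2 * c_s**2)))
        v0[lam2 > 0] = 0.0  # keep only bright structures
        best = np.maximum(best, v0)
    return best
```

On a synthetic bright line the response at the line center is high, whereas at the center of a bright disk the two eigenvalues are equal and the blobness term suppresses the response, which mirrors the tube-versus-blob behavior discussed in the text.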
As can be seen from Fig. 4, although areas with a low response to the Frangi filter exist in both the vessels and the nodule, the proportion of these areas in the vessels is far smaller than in the nodule. Fig. 4(b), (c), and (d) display complex vascular structures, including endpoints, junction points, and varying diameters, all of which have a low response to the filter. Through analysis, we find that the low-response areas are usually enclosed by smooth edges, which makes them form blob-like structures locally. In these areas, the second derivatives in the vertical or horizontal direction decrease dramatically, which leads to the low filter response. In contrast, Fig. 4(a) is enhanced well because the edge of its endpoint is sharp. Considering that the areas enclosed by smooth edges are far smaller in tubular-like structures than in blob-like structures such as the one in Fig. 4(e), we find that the overall response ratio to the Frangi filter can be used to distinguish them. Therefore, we present the structure discriminating model in (15), which uses the conjunction of the response ratio and the axis length ratio. In (15), CR is a connected region; the threshold $T_{RF}$ is set for RspFgi() in (16), which is the overall response ratio of a connected region to the Frangi filter; and the threshold $T_{RL}$ is set for OtlinLth() in (17), which is the length ratio between the major axis and the minor axis of the ellipse that has the same standard second-order central moments as CR. In (16), I(x, y) is the gray value of pixel (x, y) in CR.
$$OtlinLth(CR) = \frac{MajLength(CR)}{MinLength(CR)} \tag{17}$$
In (17), MajLength() and MinLength() are the lengths of the major axis and the minor axis, respectively. This function ensures that the overall outline of CR presents a tubular-like structure, which is important for small regions. Finally, the discriminant result of (15) is used in (18)-(21), where (18)-(20) determine whether a pixel belongs to a tubular-like structure region in a view, and (21) makes the final decision using the results from all the views. In (18)-(21), $g_{x_i}(y, z)$ is the decision result of a pixel at the position (y, z) in the reconstructive plane $x_i$ whose normal direction is parallel to the x-axis; $g_{y_i}(x, z)$ is the decision result of a pixel at the position (x, z) in the reconstructive plane $y_i$ whose normal direction is parallel to the y-axis; and $g_{z_i}(x, y)$ is the decision result of a pixel at the position (x, y) in the reconstructive plane $z_i$ whose normal direction is parallel to the z-axis. $CR_x$, $CR_y$ and $CR_z$ are the connected regions in the corresponding views that contain the pixels (y, z), (x, z) and (x, y), respectively.
$g_{x_i}(y, z)$, $g_{y_i}(x, z)$ and $g_{z_i}(x, y)$ are marked as 1 if the connected regions containing them are judged to be vessels; otherwise they are marked as 0. $g(x_i, y_i, z_i)$ denotes the final decision result of the pixel at position $(x_i, y_i)$ in the $z_i$th slice of the CT sequence. This method uses the global characteristics of the enhancement result, which avoids the influence of the low response at endpoints, vascular junctions, and irregular cross-sections and thereby improves the robustness of detecting tubular-like structures.
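The decision logic of this subsection can be sketched as follows. This is a hedged illustration under several assumptions, not the authors' formulation: the response ratio is taken here to be the fraction of region pixels whose vesselness exceeds a small cutoff, the axis length ratio is derived from the eigenvalues of the second-order central moments of the pixel coordinates, the thresholds `t_rf`, `t_rl`, and `cutoff` are arbitrary example values, and the per-view results are fused with a logical OR, following the overview statement that a voxel is a vessel if it lies in a tubular-like region in any view. All function names are hypothetical.

```python
import numpy as np

def axis_length_ratio(mask):
    """Major/minor axis length ratio of the ellipse with the same
    second-order central moments as the region in `mask`."""
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        return 1.0
    moments = np.cov(np.vstack([xs, ys]), bias=True)
    # eigvalsh returns eigenvalues in ascending order; axis length ~ sqrt.
    minor, major = np.sqrt(np.clip(np.linalg.eigvalsh(moments), 0, None))
    return float("inf") if minor == 0 else float(major / minor)

def is_tubular(mask, vesselness, t_rf=0.5, t_rl=2.0, cutoff=0.1):
    """Conjunction of an overall response ratio test and an axis ratio test."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return False
    # Assumed definition: share of region pixels with a clear filter response.
    response_ratio = float(np.mean(vesselness[mask] > cutoff))
    return response_ratio >= t_rf and axis_length_ratio(mask) >= t_rl

def fuse_views(g_x, g_y, g_z):
    """Mark a voxel as vessel if any view marked it (arrays share one layout)."""
    return np.logical_or.reduce([g_x, g_y, g_z])
```

Because the test is applied to a whole connected region, a few low-response pixels at an endpoint or junction do not by themselves change the decision for that region.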

C. FEATURE EXTRACTION
After the vessels are segmented, we extract spatial edge features and spatial texture features from the remaining connected regions to classify malignancy levels. This subsection consists of three main parts. First, we design a mathematical description of the connected regions to obtain the spatial association information between them, which provides a fundamental condition for feature extraction. Then, the spatial edge feature and the spatial texture feature are extracted in the following two parts, respectively.

1) REGIONS OF INTEREST (ROIs) CLUSTERING
The spatial association information between the remaining connected regions is essential for spatial feature extraction. To obtain this information, we divide these connected regions into patches sequences corresponding to the entities they belong to. As the vessels have been removed by our vessel segmentation method, the number of connected regions is significantly reduced, which makes these regions easier to cluster. Taking advantage of this, we design a mathematical description that uses neighbor-centroids clustering to divide the connected regions. All feature extraction processes in the subsequent steps are based on this description.
First, all the remaining connected regions in the CT slices are regarded as ROIs in this subsection. We then divide these ROIs by slice into N sets, as defined in (22), where $SliceROI_i$ is the set of ROIs in the ith slice and N is the number of CT slices. We use $P_{ini}$ to denote an ROI that is the first cross-section of an entity, i.e., an ROI that satisfies condition (23) or (24), where $P_s$ is an ROI, $f(X, Y)$ is a function that calculates the distance between the centroid coordinates of X and Y with the $L_2$ norm, and $T_{cen}$ is the threshold used to decide whether ROIs are clustered together. The ROI clustering model is given in (25)-(27), where $P$ and $P'$ are ROIs and $ETI_K^l$ denotes the complete ROI set of an entity that starts from the lth ROI in the Kth CT slice. With this description, all the ROIs in the CT slices are divided into patches sequences corresponding to entities, from which we can extract spatial edge and texture features.
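One plausible reading of the neighbor-centroids clustering is sketched below. It is an assumption-laden illustration, not the authors' exact model: an ROI is attached to the entity of the nearest ROI in the previous slice when their centroid distance (L2 norm) is at most `t_cen`, and otherwise it starts a new entity as an initial cross-section. The function name `cluster_rois` and the input layout (one list of centroid tuples per slice) are hypothetical.

```python
import math

def cluster_rois(slice_centroids, t_cen):
    """Group ROIs from consecutive slices into entities.

    slice_centroids[k] is a list of (x, y) centroids of the ROIs in slice k.
    Returns a list of entities, each a list of (slice_index, roi_index).
    """
    entities = []
    previous = {}  # ROI index in the previous slice -> its entity
    for k, centroids in enumerate(slice_centroids):
        current = {}
        for l, centroid in enumerate(centroids):
            best = None
            for j, entity in previous.items():
                d = math.dist(centroid, slice_centroids[k - 1][j])
                if d <= t_cen and (best is None or d < best[0]):
                    best = (d, entity)
            if best is None:
                # No close neighbor in the previous slice: first cross-section.
                entity = []
                entities.append(entity)
            else:
                entity = best[1]
            entity.append((k, l))
            current[l] = entity
        previous = current
    return entities

demo_slices = [[(10, 10), (50, 50)], [(11, 10)], [(12, 11), (50, 51)]]
demo_entities = cluster_rois(demo_slices, t_cen=3.0)
```

In this sketch an entity ends as soon as a slice contains no ROI near its last cross-section, so a region that reappears after a gap is treated as a new entity.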

2) EDGE FEATURE EXTRACTION
In this subsection, we design an approach for extracting spatial edge features from the patch sequences. Edge features, including shape features such as lobulation, spiculation and roundness [40], are significant indicators of cancer. However, as some lung tissue regions are broken up during vessel segmentation, these characteristics may be altered, which may reduce the precision of the edge feature description. To solve this problem, we formulate a gray values cumulative function that integrates all patches of an entity into one image. This function not only improves the completeness of the outline of each entity, but also reduces the computational complexity of spatial edge feature extraction. Then, since the gradient distribution provides powerful edge information for discriminating the various anatomical structures of nodules in CT images, we use EOH to extract edge features from the output of this function.
To illustrate the characteristics of lung tissue and nodules, we show two patch sequences in Fig. 5. Fig. 5(a) shows that the lung tissue regions in each patch are actually different parts of the same entity, separated by the vessel segmentation in the previous stage. The spatial structure of the lung tissue is therefore broken, and spatial edge features extracted in this situation are inaccurate. In contrast, in Fig. 5(b), the lung nodule regions are preserved completely and their relative positions in each patch are similar. From these characteristics, we find that overlapping all patches of an entity can enhance its edge feature. With this approach, the dispersed tissue regions are combined in one image and thus a complete edge feature is generated. In addition, for the nodule, the most distinctive patches stand out, which benefits classification. On the basis of these findings, we formulate a gray values cumulative function (28) to integrate all patches of an entity into one image, where $Gray_i$ is the gray value of the $i$th CT patch in an entity and $n$ is the number of CT patches.
Fig. 6 shows the results of (28) in (a) and (c). To further enhance the edge feature, we binarize the image before extracting the edge feature; the binary images are shown in Fig. 6 (b) and (d). With this function, the spatial edge feature of the entities, especially the lung tissue, is enhanced. Moreover, the computational complexity of the spatial edge feature extraction is sharply reduced.
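As an illustration of this accumulate-then-binarize step, the sketch below assumes that the cumulative function is a pixel-wise sum of the patches clipped to the gray range, and that binarization is a fixed threshold. Both choices, and the names `accumulate_patches` and `binarize`, are assumptions for illustration rather than the exact form of (28).

```python
def accumulate_patches(patches, max_gray=255):
    """Pixel-wise sum of equally sized gray patches, clipped to max_gray."""
    rows, cols = len(patches[0]), len(patches[0][0])
    out = [[0] * cols for _ in range(rows)]
    for patch in patches:
        for r in range(rows):
            for c in range(cols):
                out[r][c] = min(max_gray, out[r][c] + patch[r][c])
    return out


def binarize(image, threshold=0):
    """Map every pixel above the (assumed) threshold to 1, the rest to 0."""
    return [[1 if v > threshold else 0 for v in row] for row in image]
```

Because all patches of an entity collapse into one image, the edge descriptor runs once per entity instead of once per patch, which is where the reduction in computation comes from.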
After applying the gray values cumulative function, we use EOH to extract the edge feature. EOH is a simple and efficient edge feature descriptor that computes a histogram of gradient magnitude over gradient orientation. The drawbacks of EOH are that it is sensitive to noise and illumination and performs poorly under rotation. As the integrated image in this stage has a simple background, these drawbacks do not arise in our approach, and thus EOH is suitable for it.
Specifically, EOH is computed from the horizontal gradient component $G_x$ and the vertical gradient component $G_y$, generated by convolving an image block with two 3 × 3 matrices. $G_x$ and $G_y$ are calculated in (29) and (30), and the orientation $\theta(x, y)$ of the image gradient is calculated by (31) and (32), where the $\frac{\pi}{2}$ term in (32) maps the orientation $\theta$ into $[0^\circ, 180^\circ]$ to avoid signed directions. We divide $[0^\circ, 180^\circ]$ into $\sigma$ bins, and the magnitude $E_k$ is accumulated by (33) when $\theta(x, y)$ belongs to the $k$th bin, where $bin_k$ denotes the $k$th bin in the direction range and $m(x, y)$ is the gradient magnitude:

$E_k = \sum_{\theta(x, y) \in bin_k} m(x, y) \qquad (33)$

As the original EOH loses the location information of the parts of an object, we divide the ROI into 2 × 2 = 4 blocks and extract EOH from each block. Thus, the EOH in our method comes from the 4 blocks, as shown in Fig. 7. We evenly divide the gradient orientation into $\sigma = 18$ bins over 0° to 180°, so the EOH feature vector has 18 × 4 = 72 dimensions in total.
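The block-wise EOH described above can be sketched as follows. The sketch assumes Sobel kernels for the two 3 × 3 matrices (the text does not name them), computes the orientation as $\arctan(G_y / G_x) + \frac{\pi}{2}$, and skips the one-pixel border of each block; `block_eoh` and `eoh` are hypothetical names.

```python
import math

# Assumed 3 x 3 kernels (Sobel); the text only says "two 3 x 3 matrices".
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def _filter3(img, kernel, r, c):
    """Apply a 3 x 3 kernel centred on pixel (r, c)."""
    return sum(kernel[i][j] * img[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))


def block_eoh(img, bins=18):
    """Magnitude-weighted orientation histogram of one block."""
    hist = [0.0] * bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = _filter3(img, SOBEL_X, r, c)
            gy = _filter3(img, SOBEL_Y, r, c)
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat area, nothing to accumulate
            ang = math.atan(gy / gx) if gx != 0 else math.pi / 2
            theta = math.degrees(ang + math.pi / 2)  # mapped into [0, 180]
            k = min(int(theta / (180.0 / bins)), bins - 1)
            hist[k] += mag
    return hist


def eoh(img, bins=18):
    """Concatenate the histograms of the 2 x 2 blocks (4 * bins values)."""
    h, w = len(img), len(img[0])
    feats = []
    for r0, r1 in ((0, h // 2), (h // 2, h)):
        for c0, c1 in ((0, w // 2), (w // 2, w)):
            block = [row[c0:c1] for row in img[r0:r1]]
            feats.extend(block_eoh(block, bins))
    return feats
```

With the default of 18 bins, `eoh` returns 18 × 4 = 72 values, matching the dimension stated above.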

3) TEXTURE FEATURE EXTRACTION
In addition to the edge feature, the texture feature is also an important factor for the malignancy level classification of lung nodules because it describes density and, in particular, distinguishes ground-glass opacity (GGO) types of nodules. As the density of a 3D nodule is not uniform, obtaining the spatial density distribution is necessary to improve the accuracy of nodule description. To abstract the spatial density distribution of a lung nodule and reduce the redundant information in a patch sequence, we formulate a patch sample selection function based on the mathematical description in Section II-C-1. The function selects three samples discretely from each patch sequence. Moreover, to optimize the texture feature description, we propose MSPLBP to extract texture features from these selected samples.
As can be seen from Fig. 8, patches at different positions of the same lung nodule may show different density distributions, and thus a single patch usually cannot reflect the real density distribution of the lung nodule. For example, in the first row of Fig. 8, the nodule can be judged as a pure GGO on the basis of the density distribution in the patch in (b), whereas the second patch in (c) suggests a mixed GGO. Moreover, the density distributions in neighboring lung nodule patches are similar, so these patches contain a large amount of redundant density distribution information. Considering these problems, we design a patch selection function that selects three samples from a lung nodule patch sequence to represent the density distribution of the nodule. The function is given in (34)-(36), where $n$ is the number of patches in the entity $ETI_k^l$, $Area(P)$ is the function that calculates the area of the ROI in patch $P$, and $Round(x)$ rounds the elements of $x$ to the nearest integers. $Tex_i$ ($i = 1, 2, 3$) are then used together for texture extraction in the subsequent steps. It can be seen from (34)-(36) that $Tex_1$, $Tex_2$, and $Tex_3$ are the indexes of the patches with the largest area in the front, middle and back parts of the sequence, respectively. This ensures that the selected patches contain identifiable texture information from which we can obtain the density distribution. The function ignores the other patches to reduce redundant density distribution information and improve the efficiency of texture feature extraction.
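A minimal sketch of such a selection is given below. It assumes that the front, middle and back parts are delimited at $Round(n/3)$ and $Round(2n/3)$, which is one plausible reading of (34)-(36) rather than their exact form, and it returns 0-based indexes. The name `select_texture_patches` is hypothetical.

```python
def select_texture_patches(areas):
    """Return the indexes of the largest-area patch in each third.

    areas: ROI area of every patch in one entity, in slice order.
    The split points round(n/3) and round(2n/3) are an assumption.
    """
    n = len(areas)
    if n < 3:
        raise ValueError("need at least three patches")
    a = int(n / 3 + 0.5)      # round half up
    b = int(2 * n / 3 + 0.5)
    parts = [(0, a), (a, b), (b, n)]
    return [max(range(lo, hi), key=lambda i: areas[i]) for lo, hi in parts]
```

For example, a sequence with areas `[5, 9, 7, 20, 18, 3, 4, 11, 6]` yields the indexes 1, 3 and 7, one per part of the sequence.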
To optimize the texture feature extraction from the selected patches, we propose MSPLBP based on pi-LBP. LBP is widely used for texture description in various applications, and pi-LBP, proposed by Lin and Qi [34], is an LBP variant that fully utilizes cross-scale correlation. The reported results indicate that pi-LBP clearly improves the robustness of LBP. However, pi-LBP cannot describe texture size and texture structure very well. These texture characteristics help to improve the precision of texture feature description and the malignancy level classification. Moreover, pi-LBP has a high dimension and is designed only for 2D images. This not only affects the efficiency of the method, but also limits its extension to spatial texture feature description. To solve these problems, MSPLBP expands the lateral scale to improve the description of texture size and texture structure, and merges the paths to reduce the feature dimension. The details of MSPLBP are given below.
The main difference between pi-LBP and LBP is shown in Fig. 9, where (a) presents the conventional multi-scale description of MS-LBP with a circle-like structure and (b) shows pi-LBP, which uses paths to combine neighborhood pixels across different scales. The code model of pi-LBP is generated by (37), where $P$ is the number of bits in the LBP code, $KN$ is the number of nodes in a path, $f = (f(1), \ldots, f(KN))$ is a filter satisfying $\sum_{i=1}^{KN} f(i) = 0$, $g_{j,i}$ is the grey value of the $i$th node in the $j$th path, and the function $s(x)$ is defined as 1 if $x \ge 0$ and 0 otherwise. Additionally, pi-LBP has a rotation-invariant form, given by (38), which takes the value $P + 1$ whenever $U(pi\_LBP_{P,f}) > 2$, where $U(pi\_LBP_{P,f})$ denotes the number of bitwise transitions from 0 to 1, or 1 to 0, in the binary form of $pi\_LBP_{P,f}$. A path in pi-LBP is the set of gray values of the nodes from each scale that influence the texture feature of the center pixel. There are 7 paths designed in pi-LBP, as shown in Fig. 10.

As can be seen from Fig. 10, pi-LBP expands the paths into three scales vertically. However, only one scale is expanded in the horizontal direction, i.e., the leaf nodes 2 and 4 in (b), and the leaf nodes 3 and 5 in (c). This structure loses the texture width information, which is important for describing texture structure and texture size. This affects the evaluation of the ground-glass degree and leads to inaccurate results in the malignancy level classification of lung nodules. To address this problem, we design a path structure that expands the horizontal scale of the leaf nodes, as shown in Fig. 11. With this structure, we can describe larger and more complex texture features, and thus the precision of the GGO texture feature is improved in our method.
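The thresholding and rotation-invariant mapping described for pi-LBP can be sketched as follows. Two assumptions are made here: each bit is obtained as $s(\sum_i f(i) g_{j,i})$ over the nodes of path direction $j$, and a pattern with $U \le 2$ is mapped to its number of 1 bits, as in the standard rotation-invariant uniform LBP. The function names are hypothetical.

```python
def step(x):
    """s(x): 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0


def path_bits(paths, filt):
    """One bit per path direction from the filtered node gray values.

    paths: P lists, each holding the KN gray values g_{j,1..KN}.
    filt: KN filter weights that sum to zero.
    """
    return [step(sum(f * g for f, g in zip(filt, path))) for path in paths]


def transitions(bits):
    """U: number of 0/1 changes when the bit string is read circularly."""
    p = len(bits)
    return sum(bits[i] != bits[(i + 1) % p] for i in range(p))


def riu2_code(bits):
    """Uniform patterns -> count of 1 bits (assumed); others -> P + 1."""
    return sum(bits) if transitions(bits) <= 2 else len(bits) + 1
```

Under this mapping a code with $P = 8$ bits can take $P + 2 = 10$ distinct values, so each path contributes a 10-bin histogram.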
Besides, high dimensionality is also a problem in pi-LBP. As the dimension of pi-LBP is 680, the length of our texture feature vector would be 680 × 3 = 2040 if we used pi-LBP for spatial texture extraction, which would seriously affect the efficiency of our method. To solve this problem, we merge the paths at the same scale to reduce the number of paths, because the large number of paths is the main reason for the high dimension. The merged paths in MSPLBP are shown in Fig. 12.
As can be seen from Fig. 12, all of the merged parts overlap in pi-LBP, and thus our approach preserves the correlations between the nodes as pi-LBP does. To further reduce the dimension, we design a set of filter weights, shown in Table 1, where the node index corresponds to the number in Fig. 12. Considering that the central pixel tends to integrate with the surrounding pixels as the scale expands, we regard all non-leaf nodes as a whole and give them the same weight. In contrast, the weights of the leaf nodes are set within the range [2,0]. With this set of filter weights, the dimension of the MSPLBP feature vector drops to (1 + 3 + 3) × 10 = 70, and thus the computational complexity is greatly reduced.

D. CLASSIFICATION
In this work, we select three classification methods, i.e., NNCS [35], MWLSTSVH [36] and DLSR [37], to classify the joint feature vector of EOH and MSPLBP. Their classification results are used to evaluate the performance of our system. As a nodule malignancy classification has 6 possible levels, all the selected classifiers are ones designed for multi-class problems.
NNCS is an improved neural network classifier utilizing the composite stumps to share features, and it adopts an adaptive stage-wise iterative method to generate network. NNCS consists of the input layer, one hidden layer, and the output layer. The parameters between the input layer and the hidden layer are estimated by a weighted linear regression with sparsity constraints, and other parameters are calculated by weighted least squares. NNCS simplifies the structure of the neural network which can improve the efficiency of computation with higher accuracy.
MWLSTSVH aims to solve multi-classification problems in SVM frameworks with the one-versus-rest method. This method introduces local density information into the LS-TSVH to reduce the impact of noisy samples and uses the Newton downhill algorithm to improve the efficiency. The results of computational comparisons with other classical multi-class classification algorithms show that MWLSTSVH achieves a better classification performance than the compared algorithms.
DLSR is proposed for multi-classification by enlarging the distance between different classes under the conceptual framework of LSR. In this approach, the Hadamard product of matrices is introduced to organize the ε-draggings into a compact model form, which translates the one-versus-rest training rule for multi-classification well. As only a group of linear equations needs to be solved in each iteration, DLSR has low time complexity in applications. Experiments show that this algorithm is comparable to classical algorithms.
In our training process, we adopt stratified 5-fold cross validation, which divides the selected dataset into 5 groups stratified by malignancy level [41], and uses 4 groups to train the system and the remaining group to validate it. To further reduce the generalization error of the trained system, we also divide the samples in each malignancy level into 3 categories, i.e., pure GGO, mixed GGO, or solid opacity, and evenly distribute each category across the 5 groups.
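The two-level stratification can be sketched as below, assuming each sample is labeled by a (malignancy level, opacity type) pair and that samples of each stratum are dealt to the folds in round-robin order. The function name `stratified_folds` and the fixed seed are illustrative choices, not part of the described system.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indexes into k folds, balancing every label stratum.

    labels: one hashable, sortable label per sample, for example
    (malignancy_level, opacity_type).
    """
    by_stratum = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_stratum[lab].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so fold sizes stay balanced
    for lab in sorted(by_stratum):
        idxs = by_stratum[lab]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[(offset + pos) % k].append(idx)
        offset += len(idxs)
    return folds
```

In each round, one fold serves as the validation group and the other four are merged into the training set.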

III. EXPERIMENTAL RESULTS AND DISCUSSION
This section presents and analyzes the experimental results of our system to evaluate the performance of nodule malignancy classification. The experiment consists of two parts: (i) the evaluation of the proposed vessel segmentation method, and (ii) the comparative experiment among different malignancy level classification methods. In the first experiment, we compare the accuracy and the efficiency of our vessel segmentation method with those of other related methods. The goal of the second experiment is to investigate the performance of our joint EOH and MSPLBP descriptor for nodule malignancy classification by applying the NNCS, MWLSTSVH and DLSR classifiers. In addition, the improvement in malignancy classification after using our vessel segmentation method is shown in the second experiment. All experiments are conducted on a machine with 8 GB RAM, a 3.60 GHz Intel Core i7 processor, and the Windows 10 operating system. The method is implemented in MATLAB R2018b Win64.

A. LUNG VESSEL SEGMENTATION EXPERIMENT
In this subsection, we present the process and the results of our vessel segmentation method. Fig. 13 shows the whole process of the method in three views. The first column in this figure shows the images generated by the threshold method. This method suppresses the noise and low-intensity areas in the original CT images and, in particular, separates weak connections between the vessels and juxta-vascular nodules (JVN) to restore the real lung nodules. It should also be noted that a low threshold helps to improve the completeness of vessels and nodules, but it may also amplify noise. Therefore, an appropriate gray threshold is important in our method; in this experiment, we set it to 130.
The images in the second column display the response of the Frangi filter. In these images, the overall response ratio of tubular-like structure regions is clearly higher than that of blob-like structure regions, i.e., partial vessels and nodules. Moreover, unlike smooth-margin nodules, lobulated nodules have a higher response in central regions but a lower response in each lobulation subregion. Although the overall response ratio of lobulated nodules may be higher than that of smooth-margin nodules, it is still much lower than that of a typical tubular-like structure. Therefore, a high overall response ratio is a distinguishing characteristic of tubular-like structures compared with blob-like structures.
The third column shows the heat maps of the overall response ratio of the Frangi filter in each connected region. From these images, we can see that the vessels with blob-like structure have the lowest response and the tubular-like structure vessels have the highest response. Notably, the heat of nodules lies between these two kinds of vessels, because nodules have an irregular appearance. In general, $|R_B|$ ($0 < |R_B| \le 1$) in a tubular-like structure is close to 0. The blob-like structure vessels are very similar to typical circular regions and the eigenvalues of their Hessian matrix are approximately equal, which makes their $|R_B|$ close to 1. As nodules are irregular, their $|R_B|$ may be close to 0 in some subregions, which makes the whole nodule region appear more tubular-like than blob-like structure vessels. Therefore, the value of the threshold $T_{RF}$ is critical for distinguishing vessels from nodules.
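The behavior of $|R_B|$ described above can be checked with a small sketch. It assumes the standard 2D Frangi formulation, in which $R_B$ is the ratio of the smaller to the larger Hessian eigenvalue magnitude and bright structures on a dark background require a negative dominant eigenvalue. The defaults $\beta = 0.5$ and $c = 15$ are the values reported for this experiment, and the function names are hypothetical.

```python
import math


def hessian_eigs(hxx, hxy, hyy):
    """Eigenvalues of a symmetric 2 x 2 Hessian, ordered so |l1| <= |l2|."""
    mean = (hxx + hyy) / 2.0
    dev = math.hypot((hxx - hyy) / 2.0, hxy)
    l1, l2 = mean - dev, mean + dev
    return (l1, l2) if abs(l1) <= abs(l2) else (l2, l1)


def blobness(hxx, hxy, hyy):
    """|R_B| = |l1| / |l2|: near 0 for tubes, near 1 for blobs."""
    l1, l2 = hessian_eigs(hxx, hxy, hyy)
    return abs(l1) / abs(l2) if l2 != 0 else 0.0


def vesselness(hxx, hxy, hyy, beta=0.5, c=15.0):
    """2D Frangi response for bright structures on a dark background."""
    l1, l2 = hessian_eigs(hxx, hxy, hyy)
    if l2 >= 0:
        return 0.0  # not a bright ridge or blob
    rb = abs(l1) / abs(l2)
    s = math.hypot(l1, l2)  # second-order structureness
    return math.exp(-rb ** 2 / (2 * beta ** 2)) * (1 - math.exp(-s ** 2 / (2 * c ** 2)))
```

A Hessian with one strongly negative and one near-zero eigenvalue gives $|R_B| \approx 0$ and a response near 1, whereas two equal negative eigenvalues give $|R_B| = 1$ and a response of about $e^{-2} \approx 0.135$ for $\beta = 0.5$.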
The fourth column shows the discriminating results of tubular-like structure with the threshold $T_{RF}$ = 0.92. The detected vessels are labeled red in the images. With this threshold, our method can accurately separate tubular-like structure vessels from nodules and blob-like structure vessels. Finally, the segmentation results are shown in the fifth column. It can be seen that the numbers of detected vessels in each view are approximately equal, which means our multi-view vessel discriminating method can effectively improve the sensitivity of vessel segmentation.

Fig. 14 illustrates 6 examples of vessel segmentation results. These results show that our method has high sensitivity for most vessel structures, including bifurcation, junction suppression, and varying diameter. Recognizing vascular bifurcation is challenging because this structure has diverse appearances, such as different intersection angles, branch quantities, and branch lengths. In our method, a vascular bifurcation can be seen as a combination of many tubular-like structure regions. The Frangi filter responds strongly to each of these regions separately, so the overall response ratio of the bifurcation is very high. Similarly, junction suppression may disconnect the vascular branches from the main vessel at weak connection points, which reduces the sensitivity and accuracy of other vascular structure extraction methods. It does not affect our method, because we calculate the overall response ratio of each vessel segment separately. Additionally, vessels with varying diameter, especially small vessels with weak structural characteristics, are difficult to detect with many methods. As the Frangi filter enhances vessels at multiple scales and obtains the local maximum response from each scale, our method performs well on vessels with varying diameters, including small vessels.

FIGURE 13. Illustrative vessel segmentation results using the proposed method in each view. Each group contains three image sequences that display the whole vessel segmentation results in three orthogonal views, respectively. From top to bottom: transverse plane, sagittal plane, and coronal plane. From left to right: original example images, response result images to Frangi filter, the likelihood heat maps of tubular-like structures, determination results with a given threshold, and segmentation results by the proposed method. In the heat map, the range of the response ratio, i.e., the likelihood of tubular-like structure, from 0 to 100 corresponds to the color bar from the cool color (blue) to the warm color (red).
However, it should be pointed out that our method misses many low-intensity vessels because these vessels are eliminated by the threshold method. Even so, we cannot lower the threshold to improve the sensitivity for these vessels, because doing so would increase the interference of noise. Moreover, some tubular-like lung tissue, such as the trachea, is also segmented by our method. Although this may decrease the accuracy of vessel segmentation, it does not affect the classification of the malignancy level.

Fig. 15 shows the results of applying different vessel segmentation methods to 2 examples with representative patches containing vascular bifurcations, small vessels, and JVN. In Fig. 15, three other state-of-the-art vessel segmentation methods are selected for comparison: Ring Pattern Detector (RPD) [42], Centerline Extraction Method (CE) [21], and Weighted Symmetry Filter (WSF) [20]. Among them, RPD and WSF focus on tubular-like structure enhancement. In RPD, ring-like patterns are sought in the local orientation distribution of the gradient to compute the structuredness, evenness, and uniformness of vesselness, which are used for enhancing tubular structures. To compare the performance of this vesselness filter with our method, the threshold method is used to segment vessels after filtering by RPD. WSF uses a symmetry filter to enhance tubular structures in the given image and applies a graph-cut-based model to the enhanced vessel map to segment the vessels. In addition to RPD and WSF, we also select a circle enhancement filter, i.e., CE, in our experiment. CE is a vessel segmentation method based on centerline extraction that tracks the vessel tree from a user-initiated seed point to the ends of the vessel tree. Comparing with these methods allows us to evaluate the performance of our method more comprehensively.
For a fair comparison, the parameters of these methods are optimized for the best performance as follows: RPD uses a single scale of 1.2 mm, CE uses a scale parameter $\sigma = 5$, and WSF has two orientation parameters, which we set as $\theta_1 \in \{\frac{\pi}{16}, \frac{2\pi}{16}, \frac{3\pi}{16}, \ldots, \frac{15\pi}{16}, \pi\}$ and $\theta_2 \in \{\frac{2\pi}{16}, \frac{4\pi}{16}, \frac{6\pi}{16}, \ldots, \frac{30\pi}{16}, 2\pi\}$. Our proposed method has four parameters. Among them, $\beta$ and $c$ belong to the Frangi filter, and we set $\beta = 0.50$ and $c = 15$. The parameter $T_{RF}$ is the boundary between tubular-like structure regions and blob-like structure regions. A lower $T_{RF}$ loosens the restriction on blob-like deformations and increases the FP ratio, whereas a higher $T_{RF}$ raises the false negative (FN) ratio for tubular-like structure regions. $T_{RL}$ is used to ensure that the overall outline of a region presents a tubular-like structure. In this experiment, we found that the best performance is obtained with $T_{RF} = 0.92$ and $T_{RL} = 2$.

Fig. 15(a) shows the vessel segmentation results of these methods. It can be seen that RPD can detect the disconnected vessel segments and high-intensity regions in vessel confluences. However, this method misses the low-intensity parts of a vascular cross-section, which erodes the vessel tree and produces small fragments that affect nodule evaluation. CE can also detect vessels in confluences and obtains a complete main vessel tree. The drawback of this method is that it is seriously affected by vascular junction suppression, which may cause small vessels and endpoints of the vessel tree to be missed. WSF performs better on vascular junction suppression and bifurcations. However, similar to RPD and CE, it yields relatively low responses to the irregular weak vascular branches, as shown in patch 2.
In sharp contrast, the proposed method not only responds better to bifurcations and vessel confluences with both high and low intensities, but also recognizes small vessels and disconnected vessel segments. The reason is that our method focuses on local tubular-like structures and is thus less affected by global irregular vascular deformation. This reflects the robustness and sensitivity of our method.
Besides, Fig. 15(b) shows the results of these methods on the vessels close to JVN. As can be seen from these images, the nodule in our method is the most complete, and all of the other methods break the morphological characteristic of the nodule. The vessels contained in the nodules are helpful to maintain the completeness of the nodules and can improve the accuracy of malignancy level classification. Thus, they should not be removed together with other vessels and our method is more suitable for vessel segmentation in this CAD system than the other methods in this experiment.
In Fig. 16, we choose three CT sets for building 3D models to illustrate the results of vessel segmentation. This figure displays the whole vessel trees segmented by the proposed method, including junction suppression, small vessels, endpoints and bifurcations, while the nodules are preserved for malignancy level classification. As can be seen from the figure, a large number of endpoints are disconnected from the main vessel trees, which is one of the most challenging problems in lung vessel segmentation. With the proposed method, these regions can be detected sensitively, which significantly reduces FPs for the subsequent stage. Moreover, the nodules are well extracted by the proposed method, and the nodular features are preserved completely.
To quantitatively evaluate the performance of our vessel segmentation method, we use the Precision-Recall (PR) curve in our experiment, because the PR curve performs better than the Receiver Operating Characteristic (ROC) curve when the number of negative samples, such as the background, is uncertain and greatly exceeds the number of positive samples [23]. Moreover, the F1_score is introduced to measure the comprehensive results of precision and recall. These measures are defined in (39)-(41) as

$Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}, \quad F1\_score = \frac{2 \times Precision \times Recall}{Precision + Recall}$

where TP, FN, TN, FP are the true positive, false negative, true negative and false positive numbers of lung nodules, respectively. The computed PR curves of the 4 methods are presented in Fig. 17.
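For reference, the three measures can be computed from the raw counts as in the following sketch, which uses the standard definitions of precision, recall and F1. The function name and the convention of returning 0 for an empty denominator are illustrative choices.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Sweeping the decision threshold and recording the (recall, precision) pair at each setting traces out one PR curve per method.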
As can be seen from the PR curves in Fig. 17, our proposed method achieves the largest AUC among the compared methods, and the gap is most evident with respect to CE. The reason is that our method recognizes a vessel when a tubular-like structure appears in any of the three orthogonal planes. This scheme not only improves the sensitivity, but also increases the robustness in detecting various vessel structures, including ordinary vessels, bifurcations, segmental vessels and vessels with varying diameters. Note that the precision of our method is lower than that of WSF when the recall is greater than 0.93. The reason is that our method uses a single characteristic, i.e., the overall response ratio to the Frangi filter, to identify a vessel. Consequently, if we lower the threshold $T_{RF}$ to improve the sensitivity for vessel regions with weak tubular-like structures, other regions with important pathological features, such as nodules, patch shadows or calcifications, may also be segmented by mistake. Since the purpose of our system is malignancy level classification for lung nodules, a higher $T_{RF}$ is appropriate for our system.
The average recall, average precision and average F1_score of these methods are shown in Table 2. To demonstrate the validity of our scheme, which uses 3 orthogonal views to identify a vessel, we add schemes that use different numbers of views to the experiment. Two new views are introduced, namely the views along the normal directions of the planes x + y + z = 0 and x + y − z = 0, where the x-axis is the normal direction of the transverse plane, the y-axis is the normal direction of the sagittal plane and the z-axis is the normal direction of the coronal plane. Thus, our experiment contains 5 cases in which the number of views ranges from 1 to 5, and 3 views is the scheme of our method. As there are multiple choices for the 2-view and 4-view schemes, we use the maximum average F1_score over all the choices as the final result in the corresponding case.
As can be seen from Table 2, compared with the other three methods, i.e., RPD, CE and WSF, our method has the highest average recall, which shows the effectiveness of our method in detecting more general lung vessel structures. In average precision, the proposed method is 1.75% higher than CE, whereas it is 2.25% and 2.80% lower than RPD and WSF, respectively. The main reason is interference from thick tracheal walls, which usually present a ring-tubular structure and thus have a high overall response ratio like vessels. Moreover, the average recall of our method is 29.51% and 8.06% higher than that of the schemes with 1 view and 2 views, and 1.3% and 3.43% lower than that of the schemes with 4 views and 5 views, respectively. The growth of average recall thus falls sharply when the number of views exceeds 3, which means that 3 views is the turning point of average recall growth in vessel detection. In contrast, the average precision is similar across the schemes with different numbers of views. Therefore, the effect of our multiple-view scheme is mainly reflected in the average recall, while the average precision mostly depends on our structure discriminating model.
To evaluate the computational performance of our method quantitatively, we also present the runtime of each method in Table 2. In our experiment, the runtime is the average elapsed time of each method over 50 CT sets. As can be seen from Table 2, the proposed method achieves the shortest runtime compared with RPD, CE and WSF. It is worth mentioning that the runtimes of RPD and WSF are 2.28 times and 2.55 times that of the proposed method, respectively. Considering all indicators together, the proposed method performs better because it trades a small percentage of precision for a large improvement in runtime. Similarly, 3 views offers the best balance between average recall and runtime among all the alternatives.

B. MALIGNANCY LEVEL CLASSIFICATION OF LUNG NODULE EXPERIMENT
The quantitative evaluation of our method for malignancy level classification is presented in this subsection. To ensure the objectivity of this experiment, every multi-classification problem is solved with each of the NNCS, MWLSTSVH and DLSR classifiers. The experiment is conducted on 580 CT studies with 673 lung nodules, of which 220 CT studies and 360 CT studies are selected randomly from LIDC and LNUTCM, respectively.
To show the performance of our proposed MSPLBP, we select HOG [30], the texture and edge descriptor (TED) [43], pi-LBP [34] and the combination of shape diagrams, proportion measurements and cylinder-based analysis (SPC) [44] for comparison. HOG is an edge feature descriptor, SPC is a shape feature descriptor, pi-LBP is a texture feature descriptor, and TED is a joint edge and texture feature descriptor. Additionally, we use accuracy (ACC), sensitivity (TPR), and specificity (SP), defined in (42)-(44), to evaluate our method:

$ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad TPR = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}$

Because ACC, TPR, and SP are defined for binary classification problems, whereas our work in this subsection is a multi-classification problem, we take the average value of each indicator over all malignancy levels as the final result. In this experiment, the parameter σ of NNCS is set to 0.5, the two penalty parameters of MWLSTSVH are set to c = 0.01 and v = 0.01, the kernel parameter of MWLSTSVH is set to σ = 0.5, and the only parameter λ of the DLSR classifier is set to 0.1. The experimental results are presented in Table 3.
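The averaging over malignancy levels can be sketched as follows, assuming a one-versus-rest reading in which each level in turn is treated as the positive class and the three indicators are then averaged with equal weight. The function name `macro_scores` is hypothetical.

```python
def macro_scores(y_true, y_pred, classes):
    """Average one-versus-rest ACC, TPR and SP over all classes."""
    accs, tprs, sps = [], [], []
    pairs = list(zip(y_true, y_pred))
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = sum(t != cls and p != cls for t, p in pairs)
        accs.append((tp + tn) / len(pairs))
        tprs.append(tp / (tp + fn) if tp + fn else 0.0)
        sps.append(tn / (tn + fp) if tn + fp else 0.0)
    n = len(classes)
    return sum(accs) / n, sum(tprs) / n, sum(sps) / n
```

For the 6 malignancy levels, `classes` would be the list of the 6 level labels, and the returned triple is the averaged (ACC, TPR, SP).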
Table 3 shows that our feature descriptor outperforms all the other related methods. We also notice that, among the five comparison methods, only TED extracts both texture and edge features, and its performance is clearly better than that of the single feature descriptors, i.e., HOG, pi-LBP, EOH, SPC and MSPLBP. This demonstrates the effectiveness of data fusion in the malignancy level classification of lung nodules. Concerning the edge feature descriptors, the average ACC of HOG over the three classifiers is 2.87% higher than that of EOH. The reason is that HOG uses finer blocks and a normalized gradient histogram in each block, and thus it can describe more subtle edge features and is insensitive to lighting changes. However, as the CT images in this stage have uniform illumination and a simple background, the advantage of HOG is not significant. SPC performs better than HOG and pi-LBP because it uses shape diagrams and proportion measurements to represent the external 3D characteristics of nodules, such as lobulation and spiculation, and this description has high accuracy. However, SPC relies on a single feature and therefore generalizes poorly, so its accuracy is lower than that of the fused feature descriptor. Furthermore, compared with pi-LBP, MSPLBP increases the average ACC by 6.13%, which means that our improvement on pi-LBP is effective.
The runtime of these methods in training and testing is shown in Table 4. EOH takes the least time of all the methods, while HOG is the most time-consuming: on average, it takes 144.31 seconds more at the training stage and 0.220 seconds more at the testing stage than EOH. Considering that the advantage of HOG in the average ACC of Table 3 is not obvious, EOH is more appropriate for extracting the edge feature in our work. On the other hand, MSPLBP takes on average 56.23 seconds less at the training stage and 0.234 seconds less at the testing stage than pi-LBP. Thus, MSPLBP achieves not only a higher ACC but also a shorter runtime than pi-LBP. Additionally, TED takes on average 7.82 seconds more at the training stage and 0.0137 seconds more at the testing stage than the joint EOH and MSPLBP, which shows that our descriptor is also efficient.
To demonstrate the superiority of our method for lung nodule malignancy level classification, we select three related methods for comparison, i.e., 3D Texture Features and 3D Margin Sharpness (TF&MSF) [45], Multi-crop Convolutional Neural Network (MC-CNN) [6] and the Hybrid model [46]. Moreover, to show the contribution of our vessel segmentation process, we add to the experiment a method that extracts features using the joint EOH and MSPLBP without the vessel segmentation process. In this case, our connected region clustering method is not applicable because the number of vessel areas is large. To evaluate our method more accurately, we divide all lung nodules into pure GGO, mixed GGO, and solid opacity, and use ROC curves to show the performance of the methods on each kind of nodule. In this experiment, all the methods use MWLSTSVH for classification, and the results are shown in Fig. 18.
It can be observed from the results presented in Fig. 18 that our proposed method achieves the highest AUC for all three kinds of nodules, and the advantage is especially clear for mixed GGO. Note that our method performs worst on pure GGO, where its AUC is only 0.0007 higher than that of the Hybrid model. This may be ascribed to the fact that many pure GGOs have extremely low intensities in CT slices, so they are removed at the thresholding stage and the sensitivity of our method decreases. In contrast, the ground-glass part of a mixed GGO usually has higher intensities than a pure GGO and can be retained more completely. In addition, as the texture feature of mixed GGO is significantly different from that of lung tissue, the false positive ratio of mixed GGO is lower than that of solid opacity. Notably, Fig. 18 also shows that our method without the vessel segmentation process has the lowest AUC on each kind of nodule. The main reason is that the similarity between solid nodules and blob-like vessels in edge and texture features decreases the specificity of our feature descriptor. This means that vessel areas can seriously affect our feature descriptor.
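For readers who wish to reproduce this kind of comparison, the following minimal sketch shows how a ROC curve and its AUC can be computed from classifier scores. It assumes a binary (one-vs-rest) labeling and real-valued decision scores; how the six-level output is reduced to a ROC curve is not specified here, and the function name `roc_auc` and the toy data are hypothetical.

```python
import numpy as np


def roc_auc(labels, scores):
    """Return (fpr, tpr, auc) for binary labels and real-valued scores.

    A higher score means "more likely positive". Assumes both classes occur.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)

    # Sort samples by decreasing score, i.e. sweep the threshold downwards.
    order = np.argsort(-scores, kind="stable")
    labels = labels[order]
    scores = scores[order]

    # Cumulative true/false positives when accepting the top-k samples.
    tps = np.cumsum(labels)
    fps = np.cumsum(~labels)

    # Keep one ROC point per distinct score so that ties share a threshold.
    distinct = np.r_[np.nonzero(np.diff(scores))[0], labels.size - 1]
    tpr = np.r_[0.0, tps[distinct] / tps[-1]]
    fpr = np.r_[0.0, fps[distinct] / fps[-1]]

    # Area under the piecewise-linear ROC curve (trapezoidal rule).
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc


if __name__ == "__main__":
    # Toy data, for illustration only.
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    fpr, tpr, auc = roc_auc(y, s)
    print(f"AUC = {auc:.4f}")
```

Computing one such curve per nodule type (pure GGO, mixed GGO, solid opacity) and per method yields a comparison in the style of Fig. 18.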
To give an overall evaluation of our method, we present the average ACC, TPR, and SP of the related methods in Table 5. To let the comparison methods perform at their best, we use the classifiers from the original articles: a Multilayer Perceptron (MLP) for TF&MSF and an SVM for the Hybrid model. As can be observed, our system obtains an ACC of 95.88%, which is higher than the values reported in other related works. The vessel segmentation process improves the ACC of our method by 7.54%. Table 6 presents the runtime of the training and testing stages for each method. To ensure the objectivity of the experiment, all the methods in this experiment use MWLSTSVH as the classifier. It should be pointed out that the testing time is the average time consumed per CT set. As can be observed, our system takes the least time at both the training stage and the testing stage. One reason is that our feature vector is short, which reduces the time spent in the classifier. The other reason is that our system segments a large number of vessel regions with low computational complexity at the testing stage, so features need to be extracted from only a few ROIs.
The encouraging results of experiments with the same database show the effectiveness of our system on lung nodule malignancy level classification. Furthermore, the performance of the proposed vessel segmentation method demonstrates great potential for our system.

IV. CONCLUSION
This work presents a fast and efficient system to evaluate the malignancy level of lung nodules. As a large number of FP nodule regions decreases the accuracy and efficiency of feature classification in malignancy evaluation, we propose a novel vessel segmentation method to reduce the FPs. The method removes vessels quickly, sensitively and robustly by using a multi-view discriminating scheme. In addition, using the enhancement result of the Frangi filter to detect tubular-like structures helps preserve the completeness of lung nodule regions. Furthermore, we optimize the extraction of spatial edge features and spatial texture features of lung nodules to improve malignancy level classification. This removes much redundant feature information and thus greatly improves the efficiency of the system.
The performance of our method is demonstrated in two aspects: vessel segmentation and malignancy level classification. In the vessel segmentation experiment, the proposed method achieves promising results in average recall (84.78%), average precision (72.46%), average F1_score (78.14%) and AUC value (0.8149). The experimental results demonstrate that using the overall response ratio of the Frangi filter is robust in handling more general vessel structures, including junction suppression, varying diameters, endpoints and bifurcations. Moreover, the results also demonstrate that our multi-view scheme can improve the accuracy of differentiating between vessels and nodules. Besides, the average elapsed time of our vessel segmentation method for a CT set is 171.6 seconds, which is less than that of the other related methods. This indicates that our multi-view scheme with the overall response calculation is less time-consuming and can save time for our CAD system. In the malignancy level classification experiment, we compare the proposed joint EOH and MSPLBP with other feature descriptors, including single-feature methods and multiple-feature methods. The experimental results show that MSPLBP increases the average ACC by 6.13% and reduces the average testing time by 0.234 seconds compared with pi-LBP. Finally, compared with other CAD systems for malignancy level classification of lung nodules, our system obtains the best accuracy of 95.88% and the lowest time consumption. Although some methods based on deep learning models [47] have achieved higher accuracy in the classification of lung nodule malignancy, we still address the problem from the traditional perspective because this approach has its own advantages. Firstly, compared with deep learning, traditional machine learning requires a smaller data volume and a shorter training cycle. In addition, traditional machine learning makes it easier to adjust the model using the prior knowledge of doctors, so the whole training process can be controlled.
In summary, by using the proposed methods, the CAD system achieves fast and efficient classification of the malignancy level of lung nodules. Our system is still under development, and future work should be directed at improving the specificity of vessel segmentation and reducing the influence of binarization.