Research on the Auxiliary Classification and Diagnosis of Lung Cancer Subtypes Based on Histopathological Images

Lung cancer (LC) is one of the most serious cancers threatening human health. Histopathological examination is the gold standard for qualitative and clinical staging of lung tumors. However, the process for doctors to examine thousands of histopathological images is very cumbersome, especially for doctors with less experience. Therefore, objective pathological diagnosis results can effectively help doctors choose the most appropriate treatment mode, thereby improving the survival rate of patients. For the current problem of incomplete experimental subjects in the computer-aided diagnosis of lung cancer subtypes, this study included relatively rare lung adenosquamous carcinoma (ASC) samples for the first time, and proposed a computer-aided diagnosis method based on histopathological images of ASC, lung squamous cell carcinoma (LUSC) and small cell lung carcinoma (SCLC). Firstly, the multidimensional features of 121 LC histopathological images were extracted, and then the relevant features (Relief) algorithm was used for feature selection. The support vector machines (SVMs) classifier was used to classify LC subtypes, and the receiver operating characteristic (ROC) curve and area under the curve (AUC) were used to make it more intuitive evaluate the generalization ability of the classifier. Finally, through a horizontal comparison with a variety of mainstream classification models, experiments show that the classification effect achieved by the Relief-SVM model is the best. The LUSC-ASC classification accuracy was 73.91%, the LUSC-SCLC classification accuracy was 83.91% and the ASC-SCLC classification accuracy was 73.67%. Our experimental results verify the potential of the auxiliary diagnosis model constructed by machine learning (ML) in the diagnosis of LC.


I. INTRODUCTION
Lung cancer (LC) is one of the most common malignant tumors worldwide [1]- [3]. According to the 2018 International Agency for Research on Cancer statistics, there will be 2.1 million new cases of LC and 1.8 million deaths worldwide [4]. Due to its high morbidity and mortality, it has become one of the most serious cancers threatening human health. Clinicians' visual analysis of LC histopathological The associate editor coordinating the review of this manuscript and approving it for publication was Kathiravan Srinivasan .
images is one of the most important methods for evaluating LC subtypes [5]. However, it is complicated and challenging for pathologists to review thousands of histopathological images, and it is even more difficult for doctors with less experience [6]. Therefore, to relieve the pressure on doctors and improve the accuracy and efficiency of diagnosis, it is particularly important to study the computer-aided diagnosis model of LC [7].
From the perspective of pathology and treatment, LC can be divided into non-small cell lung carcinoma (NSCLC) and small cell lung carcinoma (SCLC), of which 80%-85% are NSCLC and the rest are SCLC [8]. The main histological types of NSCLC are lung adenocarcinoma (ADC) and lung squamous cell carcinoma (LUSC). The other histological types of NSCLC are lung adenosquamous carcinoma (ASC), large-cell carcinoma [9]. In particular, ASC is a relatively rare subtype of NSCLC that accounts for 0.3-5% of all NSCLCs [10]. Due to the different histopathological types of LC, the treatment methods adopted are also different. When the lung tissue classification is determined, the appropriate treatment mode can be selected, such as the reasonable application of surgery, chemotherapy, radiotherapy, molecular targeted therapy and immune therapy. In addition, LC screening errors can be avoided, clinicians' multifarious work pressure can be slowed, patients' survival time can be maximized and the patient's quality of life improved.
The LC imaging examination methods mainly include: (1) X-ray photography, which is one of the most basic lung imaging examination methods, but the resolution of the photography is low and there are blind spots in the examination [11]. (2) Computed tomography (CT), specifically, chest CT has advantages in detecting early peripheral LC and identifying the location of the lesion. Currently, it is one of the most commonly used imaging methods for preoperative diagnosis and staging of LC [12]. However, one of the limitations of using CT examinations is that for patients undergoing repeated examinations, it is necessary to consider the impact of the radiation dose produced by the operation [13].
(3) Magnetic resonance imaging (MRI) has high sensitivity and specificity for vertebral and bone metastases, but it is not recommended for routine diagnosis of LC [13], [14]. (4) Ultrasound is a non-invasive tool, which is usually superior to radiography in the examination of postoperative pulmonary complications (PPCs). It has developed into a valuable method [15]. Although the above imaging examination methods play an important role in detecting of LC, the results of each examination are only used as a reference for the diagnosis, staging, re-staging, efficacy monitoring and prognosis evaluation of LC, while histopathological examination is the gold standard for tumor qualitative and clinical staging [16], [17].
However, due to the complex texture features of histopathological images, as far as the authors know, there is no computer-aided diagnosis method for ASC, LUSC and SCLC based on histopathological images. To fill this gap, this paper includes ASC sample data for the first time and introduces a computer-aided model for automatic classification of LC subtypes based on histopathological images of LC. First, seven texture analysis methods are used to extract 265-dimensional features of LC histopathological images, and the relevant features (Relief) algorithm is used for feature selection. Compared with a variety of machine learning (ML) models, the Relief-SVM model obtains optimal performance for LC subtype classification [18]. The main contributions of this study are as follows: • This study was the first to include relatively rare ASCs in lung histopathological images and applies them to the automatic classification of LC subtypes.
• This paper was the first to apply the Relief-SVM algorithm to the classification of LC histopathological images, which demonstrates the tremendous potential of ML algorithms to be used in the diagnosis of LC subtypes.

FIGURE 1.
Overall flowchart of the auxiliary classification and diagnosis of LC subtypes. Explain some abbreviations in the figure: gray-level gradient co-occurrence matrix (GLGCM), gray-level co-occurrence matrix (GLCM), local binary pattern (LBP), gray-level difference statistics (GLDS), the least absolute shrinkage and selection operator (LASSO). Figure 1 shows the overall flow chart of the auxiliary classification and diagnosis of LC subtypes. In the following sections, we will introduce them in detail.  To better display the image details, two pathological tissue images of each category were selected for display in this study, of which A was LUSC, B was ASC, and C was SCLC.

II. MATERIALS AND METHODS
22 had LUSC, 27 had ASC and 45 had SCLC. All subjects had signed informed consent in this retrospective study. For improving the generalization performance of the model, this study took an average of 1-2 pathological tissue sections from each patient as the research object, and finally selected 121 histopathological images [19], [20]. According to the classification standards provided by the World Health Organization, LC patients were collected from the hospital pathology database, and the relevant pathological diagnosis results were confirmed by pathology. The detailed information is shown in Table 1. All tumor tissues included in the study were made into histopathological sections by hematoxylin and eosin (H&E) staining [21], which were collected through the microscope of the hospital pathology department and stored as JPG image files with four resolutions: 744 × 554, 2048 × 1536, 1024 × 768, 640 × 480 [6].

B. HISTOPATHOLOGICAL IMAGE PREPROCESSING
Histopathological examination is the gold standard for qualitative and clinical tumor staging [22]. Histopathological images have been widely used by doctors for diagnosis and treatment, and are an important basis for predicting patient survival [23]. The histopathological images of LUSC, ASC and SCLC are shown in Figure 2. According to reports, the following problems exist in histopathological images: • The histopathological images are faced with a large number of rich geometric structures and complex textures caused by the diversity of structural morphology [24].
• The histopathological images are easily affected by color differentiation and noise due to external reasons such as illumination conditions [25].
• Due to the difference in microscope magnification, equipment parameters and other factors in histopathological images, the image size and resolution are not the same [6].
• Texture features such as local microvessels in histopathological images are the key to disease diagnosis, and the extraction of features is of great significance to assist in the diagnosis and classification of LC [26]. For all these reasons, the histopathological images presented to us are often not perfect. As an efficient low-pass filter, the Gaussian filter has shown good filtering performance in both the spatial domain and frequency domain, and is widely used in image processing to reduce noise [27]. Generally, the pixel value of each point is replaced by the weighted mean of its neighbourhood, so Gaussian filtering is the process of weighted averaging of the whole image. The histopathological images processed by Gaussian filtering tend to be smoother and contain less noise, which is ready for our subsequent studies [28]. According to the Ciompi et al. description, the existence of differences such as color destandardization in histopathological images will limit the interpretation of histopathological images by inexperienced pathologists, and the color differences will affect the performance of the automatic diagnosis model [25]. To avoid the problem of information loss due to excessive brightness of histopathological images [26], we adopt the adaptive histogram equalization algorithm (AHE) to normalize the color of histopathological images in this study [26]. In this mode, the image is divided into several 8×8 pieces, and then the histograms of multiple local regions of the image segmentation are calculated. At this point, the brightness is redistributed in the 8 × 8 area to change the image contrast. Therefore, this algorithm is more beneficial for improving the local contrast of the image and obtain more image details [26]. According to the investigation, the improvement in model classification performance by the color normalization method is limited. The image after the above image preprocessing operation is shown in Figure 2(b). Figure 3 shows the details before and after the preprocessing operation of the ASC histopathological image.

C. FEATURE EXTRACTION OF HISTOPATHOLOGICAL IMAGES OF LUNG CANCER
Because the rich features presented in histopathological images are an important basis for clinicians to carry out diagnosis, the effective extraction of image features is the key to improving the accuracy of computer-aided diagnosis [29]. This paper explores the effect of two feature extraction methods on LC histopathological image classification.

a: LBP
The LBP method for texture feature extraction was proposed in 1994 [30]. However, the greatest defect of the traditional LBP algorithm cannot satisfy the requirement of textures of different sizes and frequencies. It only covers a small area within a fixed radius, which makes it impossible to extract the features of the whole histopathological image perfectly. To compensate make up for this deficiency, and to meet the requirements of a constant grey level and rotation, this paper uses a circular neighbourhood to replace the traditional square neighbourhood LBP algorithm [31]. The LBP algorithm has the advantages of simple calculation, efficient recognition, good texture feature display and low computational complexity [32], [33].
As the traditional LBP algorithm cannot efficiently extract the texture features from the large-scale image structure, the improved LBP operator in the circular domain is adopted in this study to allow the existence of N pixels in the circular domain with a radius of R. Using the above theory, LBP feature processing was performed on the histopathological  GLCM is a regular method for describing the texture features by studying the spatial correlation characteristics of grey images. Calculating the joint probability density of two pixels positions to generate the co-occurrence matrix, the co-occurrence matrix is statistical second-order statistical characteristics of image brightness changes; it not only reflects the characteristics of the distribution of brightness, it also reflects the same or close to the brightness of the pixel brightness between the location of the distribution characteristics, thus constituting a set of texture features to better show the image features [34], [35]. However, due to the existence of a large number of co-occurrence matrices, it is generally not used directly in the field of image processing. Instead, texture feature quantities are extracted again according to the grey co-occurrence matrices. The commonly used local steady characteristic is the energy E, entropy H, moment of inertia of I, and correlation C. Thus, this study will consider these four with the mean and standard deviation for subsequent image recognition and the classification statistic formula (such as (1)-(4)), and three points in the lung cancer model experiment on the image to extract the characteristics of the above four mean (MN) and standard deviation (SD) as shown in Figure 6. Correlation The grey-gradient co-occurrence matrix synthesizes the grey level and gradient information existing in the image; that is, the image gradient information is added to the greylevel co-occurrence matrix so that the gradient information is mixed in the grey-level co-occurrence matrix and the image feature extraction effect is often better [31], [36]. Based on the standardization of the grey-gradient co-occurrence matrix, we can calculate a series of secondary statistical characteristics. Fifteen is adopted in this study for the commonly used digital features: small grads dominance, big grads dominance, gray asymmetry, grads asymmetry, energy, gray mean, grads mean, gray variance, grads variance, correlation, gray entropy, grads entropy, entropy, inertia, differ moment. After processing the GLGCM on lung cancer histopathological images, fifteen features were extracted from each type of sample and are displayed in Figure 7.

d: OTHER HANDCRAFTED TEXTURE EXTRACTION METHODS
The moment feature mainly expresses the geometric features in the image area. It has invariant characteristics such as rotation, translation, and scale, and is often called an invariant moment [37], [38]. The moment invariant function has been widely used in image pattern recognition, classification, target recognition and other tasks. Therefore, this paper uses Hu invariant moments for the feature representation of lung cancer histopathological images. The seven features of torque change describe the shape features in the tissue images [38]. The gray difference statistical method is one of the detection methods based on statistical texture features [39]. It describes the gray level changes between each pixel of the texture image and its neighboring pixels. The method is simple and easy to implement. In addition, texture feature extraction methods such as Markov random field and wavelet transform are becoming more and more active in the field of pattern recognition [40]- [42].

2) CNN AUTOMATICALLY EXTRACTS HISTOPATHOLOGICAL IMAGE FEATURES OF LUNG CANCER
CNNs have gradually developed into one of the mainstream algorithms in the field of computer vision after decades of development [43], [44]. The back-propagation strategy proposed by LeCun et al. in 1990 has always been regarded as the origin of CNN [45]. With the AlexNet network structure proposed by Krizhevsky et al., the AlexNet structure that focuses more on details and complexity and has better performance stands out [46]. It won the 2012 ImageNet competition champion and made a major breakthrough in the image recognition task, which had a strong impact on the field of VOLUME 9, 2021  computer vision at that time. With the gradual improvement of people's matching structure, CNN has been applied to different fields of human life [47]- [50], with remarkable effects in image classification, target detection, face recognition, natural language processing and other related fields [51].
To date, the modern CNN architecture mainly consists of the following five parts: the convolution layer, the pooling layer, the activation function, the dropout rate (optional) and the full connection layer. CNN's feature extraction process and the doctor's diagnosis have a certain similarity, and those closer to the bottom of the layers in the convolution-based coding tend to learn more general features and the extracted features can be partial, with high generality characteristic figures and edges, such as visual features (such as color and texture), and those closer to the top of the layer are more abstract with more specialized characteristics. In Figure 8, this study shows the layered features extracted from lung cancer pathological tissue images using VGG16 pairs to verify the reusability of the underlying features [6].

D. FEATURE SELECTION, FEATURE DIMENSION REDUCTION AND FUSION
The 265-dimensional features were fused together by serial fusion. Problems such as the dimensionality disaster caused by too many sample attributes will adversely affect the performance of the model [52]. Therefore, it is extremely important to retain key features and eliminate irrelevant features. Feature selection often reduces the difficulty of learning tasks and is more conducive to improving operating efficiency [53]. Relief is a well-known filtering feature selection method. This method designs a ''correlation statistic'' to measure the importance of features. The LASSO  algorithm is more conducive to reducing the risk of overfitting and obtaining a ''sparse'' solution [54]- [60]. Therefore, this study compared the no feature selection, LASSO feature selection, the relief feature selection model performance and the linear discriminant analysis (LDA) feature dimension reduction [61], [62]. Through a contrast experiment, we chose the relief feature selection. The relief feature selection diagram was shown in Figure 9. Figure 10 shows the LASSO feature selection algorithm using 10-fold cross validation. The Lasso algorithm formula was shown in equation 5.

E. CLASSIFIER DESIGN AND IMPLEMENTATION
Support vector machines (SVMs) have gradually developed into the mainstream technology of machine learning and have been widely used in every field of life [63]. The basic SVM model is often defined as finding the hyperplane with the largest interval that is most suitable for the classification of samples in the feature space, which makes the constructed linear classifier have stronger generalization performance. Based on the investigation, it was found that the kernel function can better map low-dimensional feature data to high-dimensional data, and the kernel function applied in different fields is not the same. To make features more easily separated or better structured, we carried out a detailed experimental comparison of four kernel functions and the three optimization algorithms in the following experiments to make the SVM learner achieve the optimal classification effect [64], [65].
In addition, we also conducted parallel comparisons with the Back Propagation (BP) Neural Network Classifiers and K-Nearest Neighbor (KNN), Decision tree [29], Naive Bayesian (NB) classifier [13] to find the best classification model.

F. PERFORMANCE EVALUATION
For the purpose of evaluating the reliability of the model, the receiver operating characteristic (ROC) curve was drawn to evaluate the effect of the classification model, and the area under the curve (AUC) value was quantitatively displayed [66]. According to their true category and learners' predicted category, the samples were divided into the following four conditions: true positive (TP), true negative (TN), false negative (FN) and false positive (FP). The vertical axis of the ROC curve represents the true positive rate (TPR) and the horizontal axis of the ROC curve represents the false positive rate (FPR), defined as Equations (6) and (7), respectively.

III. EXPERIMENTS AND RESULTS
The experiment in this study was based on the PyCharm platform and MATLAB R2016a platform [67], [68]. The LIBSVM toolbox was used for statistical analysis of the SVM algorithm [69]. All deep learning experiments were implemented in the KERAS deep learning framework based on the TensorFlow 2.0.0 model by using the Python3.6 programming language [70], [71]. Handcrafted texture extraction methods and SVM support vector machine classification were deployed in MATLAB R2016a platform. The experiments in Tables 3, 4, 5 and 6 were repeated ten times, and the average value was used as the experimental result.

A. SELECTION OF THE FEATURE SELECTION METHOD, KERNEL FUNCTION AND OPTIMIZATION ALGORITHM
In this study, no feature selection, LASSO feature selection, Relief feature selection and LDA feature dimension reduction were compared, and four kernel functions and three optimization algorithms were used to attempt to construct the best auxiliary diagnostic model. The three optimization algorithms include grid search-SVM (GS-SVM), particle swarm optimization-SVM (PSO-SVM), and genetic algorithms based on SVM (GA-SVM). The experimental results are shown in Tables 3, 4, 5 and 6. The experimental parameters are shown in Table 7 [18]. According to the comparison of experimental results, the optimal combination of the Relief feature selection, particle swarm optimization algorithm and polynomial kernel was finally selected in this study for the subsequent classification and diagnosis of LUSC, ASC and SCLC.

B. COMPARISON OF THE CLASSIFICATION EFFECT OF LC HISTOPATHOLOGICAL IMAGES WITH OTHER MODELS 1) COMPARISON WITH OTHER CLASSIFIERS
In this study, several mainstream classification algorithms, such as the NB classifier and KNN classifier, were compared with the Relief-SVM model. As seen in Table 8, the Relief-SVM model used in this paper can achieve an overall accuracy of more than 70%, and the classification accuracy of LUSC-SCLC achieved 83.91%. This is a satisfactory result, which can help the application of cytology in the diagnosis of lung cancer. In addition, the classification accuracy achieved by the Relief-SVM model is 27.39% higher than the NB model in LUSC-SCLC and 24.34% higher than the SVM model. We drew the ROC curves of the Decision Tree, BP, KNN, SVM, LASSO-SVM, RELIEF-SVM,  LDA-SVM and NB as shown in Table 9 and the AUC values are shown in Table 10. As seen in Table 10, the AUC value of Relief-SVM for LUSC-SCLC classification increased by 0.3625 compared with BP and 0.3250 compared with LASSO-SVM, and the AUC value of Relief-SVM for LUSC-ASC classification increased by 0.2208 compared with KNN. The results show that Relief-SVM has good feasibility.

2) COMPARISON WITH CNN
To make our model robust to feature transformation, firstly, the individual data set is randomly divided into 70% for training and 30% for verification. Then, the method of flipping, translation, and multiangle rotation was used to expand the original data for 100 times respectively [72]- [75]. Table 11 shows the number of images in the augmented training set and validation set. Inspired by the researchers [76]- [81], we trained the CNNs model, fine-tuning the pre-trained CNNs model, and custom the CNNs model to conduct experiments on the data set after data augmentation [6], [82].
The custom model architecture were shown in Fig 11. The main experimental parameters of fine-tuning the pre-trained CNNs are shown in Table 12 [6], [78]. Figures 12,13,14, and 15 respectively show the accuracy and loss rate curves under different experimental backgrounds. It can be seen from Figures 12 and 13 that the accuracy and loss curves of the four classic CNNs have large fluctuations. Under the guidance of Liu et al. [6], although there are large differences between the natural images contained in the ImageNet and medical images, according to the survey, fine-tuning pre-trained CNNs models can reduce the problem of the limited number of samples to a certain extent, and obtain good classification performance. We adjust the model by adding dropout, l2 regularization in the experiment, using different optimization algorithms, and adjusting the learning rate, we show in Figure 16, 17 and 18 the change in the accuracy of the verification set under different combinations of dropout rates and learning rates. But the effect of fine-tuning the pre-trained CNN model is still very poor.
In addition, we also customize the CNN model to classify VOLUME 9, 2021 lung cancer subtypes, and the experimental results are still not ideal. Therefore, when the availability of the original medical data set is limited, the input information still comes from a small number of original images, so the input information is still highly relevant. We cannot generate new information, but can only mix existing information, and cannot completely eliminate the impact of overfitting. The trained CNNs model, fine-tuning pre-trained CNNs structure, and custom model show poor classification performance when facing a small number of medical images [83], [84]. On the contrary, the classification effect of Relief-SVM model is better, which is more meets our expected effect.

IV. DISCUSSIONS
Histopathological examination is the gold standard for qualitative and clinical staging of lung tumors. Based on the local microvascular morphology, nucleus and cytoplasm on the histopathological images, doctors can diagnose tumor lesions. But for inexperienced or less experienced doctors, this is still a very difficult task. This study uses traditional feature extraction methods, trained CNNs, finetuning pre-trained CNNs models, and custom models to divide lung histopathological images into LUSC, ASC and SCLC. Firstly, we extracted 265-dimensional features using seven handcrafted texture extraction algorithms, selected LDA feature dimensionality reduction algorithm, Relief and LASSO feature selection algorithms, and used a variety of mainstream classification algorithms for horizontal comparison, which proved that the Releif-SVM model has the best classification performance. The above algorithms all have their own advantages, but the LDA algorithm is more suitable for data with Gaussian distribution, so the performance of LDA-SVM in this research is poor [85]- [87]. We also selected four typical CNN models, and trained VGG16, ResNet50, DenseNet201, and InceptionV3 under the optimization of SGD algorithm and Nadam algorithm. In the experiment of    trained the CNNs model, the performance of ResNet50 and DensNet201 under the Nadam optimization algorithm is better than that of VGG16 and InceptionV3. On this basis, we apply the Nadam optimization algorithm to fine-tune the pre-trained CNNs model and custom models to try to find the best classification model. However, compared with the classification performance of Relief-SVM on the histopathological images of lung cancer subtypes, the DenseNet201 model still has a lower accuracy. In addition, we explored the collocation and combination of different dropout rates and learning rates under different experimental backgrounds. The results of Fig. 16, 17, 18 shown that there is still an  overfitting phenomenon. However, the good news is that trained DenseNet201 shows good performance in these three combinations.
The results show that the Relief-SVM model built by machine learning is more suitable for the classification of lung histopathology images with a small data set. It means VOLUME 9, 2021  that for a limited number of data sets, especially small medical data sets, machine learning methods are often more effective for classification problems [84], [88]- [90]. In addition, the CNN features extracted from the DenseNet201 structure also prove from the side that it also plays a role in the task of classification of lung cancer subtypes.
According to the survey, the patch-based image classification method has been widely used in various histopathological images, which to a certain extent alleviates the problem of difficult and small number of medical image collection. However, due to the complexity of the texture of the tissue image and the time-consuming process of image annotation, there are currently few medical image annotation   data with a large number and comprehensive labeling data. In addition, since only image-level tags exist in the data set, how to distinguish between patch-level tags and image-level tags is a huge challenge for patch-based classification methods.
It is worth noting that in previous work, there have been studies on classification of lung cancer subtypes based on computer-aided diagnosis. However, according to research, there has not been any study that includes histopathological images of ASC for automatic classification. Therefore, this study included ASC samples for the first time. The Relief-SVM model used in this paper can achieve an overall accuracy of more than 70%. This is a satisfactory result, which can help the application of cytology in the diagnosis of lung cancer. In addition, we also summarized the research related to this article as shown in Table 13 [21], [91]- [96].
In summary, the classification capabilities of the Relief-SVM model is superior to the trained CNNs, fine-tuned pre-trained CNNs model, and custom models. However, our work also has some limitations. On the one hand, this study collected a limited number of training set images from the initial stage to establish a more accurate classification accuracy, and did not try other numbers of training set images. If medical image data from multiple centers with different classifications can be used for research, the diversity of the data will be further enhanced and the classification performance of the model will be improved. In addition, due to the general influence of external factors, the data set is susceptible to color and noise. The use of the above image preprocessing methods has certain limitations and may potentially affect the test results. Therefore, when we face more abundant data sources in the follow-up research, we should pay more attention to the preparation and collection of histopathological images. On the other hand, if a larger amount of lung histopathology image data can be included for research, we will obtain more complete disease information, which will further improve the generalization of the classification model. Therefore, the histological subtype classification of lung cancer based on histopathological images is worthy of our further study to provide clinicians with more objective auxiliary diagnosis results.

V. CONCLUSION
This paper was the first to apply the Relief-SVM algorithm to the classification of LC histopathological images. We included 121 histopathological images of LUSC, ASC and SCLC. The experimental results show that Relief-SVM has the best classification performance in distinguishing lung cancer subtypes, which provides a favorable guidance for the classification of lung histopathological images. And it also provides a correct direction for the classification of medical images with complex texture tissue. In addition, this paper analyzes and explores the classification effects under the traditional manual method of extracting features, trained the CNNs model, fine-tuning pre-trained CNNs models, and custom models to find which classification model is more suitable for the classification of different subtypes of lung cancer.
The experimental results show that the Relief-SVM classification model achieves best classification performance regardless of whether it is compared with CNN models which are trained directly by using the histopathology image data set or compared with the fine-tuning pre-trained CNNs models and custom models. In other words, when faced with a small number of medical images, traditional manual methods have shown great potential in the classification of lung histopathological images. However, the CNN models which are trained directly by using the histopathology image data set and the fine-tuning pre-trained CNNs models and custom models CNN model show weak classification performance.
Notably, our study has a considerable guiding function for doctors with less experience, which can provide objective reference results and relieve the heavy work pressure of doctors. It is feasible and practical to a certain extent. We will continue to collect more samples from different central institutions and build a lung histopathological image database for lung diseases in the future.

VI. CONFLICTS OF INTEREST
The authors have no relevant financial interests in this article and no potential conflicts of interest to disclose.
MIN LI received the bachelor's degree in computer science and technology from Shandong Women's University, China, in 2019. She is currently pursuing the master's degree in software engineering with Xinjiang University. Her current research interest includes medical image processing.
XIAOJIAN MA has been working with the Department of Information Management, The Affiliated Cancer Hospital of Xinjiang Medical University, since 2010. For ten years, he has focused on the construction of hospital information management, has been committed to the research on the security and reliability of the core information system of large hospitals, and has rich practical experience in the construction and management of hospital information.