Classification for Dermoscopy Images Using Convolutional Neural Networks Based on the Ensemble of Individual Advantage and Group Decision

Skin cancer is a common and deadly cancer. Dermoscopy is an effective tool for observing abnormal skin pigmentation. However, dermoscopy images are extremely complex and pose great challenges for diagnosis. We therefore propose a classification method for dermoscopy images based on the ensemble of individual advantage and group decision, comprising three ensemble strategies: group decision, maximizing individual advantage, and block-integrated voting. We use generative adversarial networks (GANs) to create a balanced sample space so that convolutional neural networks (CNNs) can be trained more effectively. Through transfer learning, pre-trained CNNs are fine-tuned; the performance of different CNNs on different categories of dermoscopy images is then compared, and the CNNs with the better classification performance are selected for the different ensemble strategies. This study is based on the ISIC 2018 and ISIC 2019 datasets. Compared with individual CNNs and with other frameworks, the proposed ensemble strategies improve the evaluation criteria.


I. INTRODUCTION
Skin cancer is one of the most common cancers in humans and is easily confused with other skin diseases. It is particularly common in the United States [1]-[3]. Nearly 5 million adults are treated for skin cancer annually, with average treatment costs of $8.1 billion each year [1]. In the United States, 95,830 new cases of melanoma were diagnosed in 2019 [2]. Skin cancer is a serious problem worldwide, and its incidence is higher than that of other cancers [4]. In Australia, more than 14,000 new cases of melanoma are reported yearly, leading to nearly 2,000 deaths [5]. In Europe, over 100,000 new melanoma cases and 22,000 melanoma-related deaths are reported annually [6]. Malignant melanoma is the deadliest form of skin cancer, accounting for 79% of skin cancer deaths [7]-[9]. Early diagnosis is of great importance because skin cancer is more readily cured at an early stage [7]-[10]. Dermoscopy has received considerable attention in dermatological research. Cascinelli et al. [11] first applied the dermoscopy technique to the clinical diagnosis of malignant melanoma. Dermoscopy is currently one of the most effective tools for assisting dermatologists and is gradually being introduced into clinical diagnosis [12], [13], but it depends heavily on the clinical experience of dermatologists, and the dermoscopy images themselves are extremely complex, as shown in Fig. 1.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiju Poovvancheri.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Factors such as intra-class variation, inter-class similarity, and blurred lesion boundaries have a great influence on the diagnosis. Therefore, computer-aided diagnosis (CAD) has gradually become a focus of research. Early dermoscopy image classification methods usually focused on the manual extraction of features [14]-[19]. Kusumoputro and Ariyanto [20] extracted shape and color features from dermoscopy images and used an artificial neural network to classify malignant melanomas. Schaefer et al. used an automatic border detection approach [21] and assembled the extracted features for melanoma recognition [22]. The problem is challenging because of inter-class similarities within skin cancer and large intra-class variations in background, color, and illumination. In addition, manual feature extraction is complex and labor-intensive, the extraction process is unstable, and its generalization ability is limited. With the development of deep learning, convolutional neural networks (CNNs) have shown outstanding performance and strong generalization ability in image processing tasks, including but not limited to segmentation, classification, and detection. CNNs are able to learn multilevel features from the original data, and the extracted features are higher-level and more robust. Many researchers have built systems that combine recent developments in deep learning and machine learning for skin lesion segmentation and classification [23]-[28]. Schaefer et al. [23] segmented the lesion area using an approach based on thresholding, region growing, and region merging, and the extracted features were analysed in a pattern classification stage. Demyanov et al. [24] employed a convolutional neural network with 8 layers to solve the problem of dermoscopy pattern classification. 
Kawahara et al. [25] gained consistent additional improvements to accuracy by using per-image normalization, a fully convolutional network to extract multi-scale features, and pooling over an augmented feature space. In addition, ensembles and attention mechanisms have been considered for the classification and segmentation of skin lesion images [29]-[31]. Gessert et al. [29] combined the crops both by simple averaging and by a meta-learning strategy to improve the accuracy of model classification. Wei et al. [31] proposed a novel Attention Based DenseUnet network with adversarial training for skin lesion segmentation. Although some research has been done on computer-aided diagnosis, the problems of insufficient medical samples and uneven distribution still exist. To alleviate these problems, researchers have used transfer learning and generative adversarial networks (GANs), which have achieved excellent results in CAD [32]-[38]. Ghazi et al. [33] used transfer learning to fine-tune three powerful and popular deep learning architectures, namely GoogLeNet, AlexNet, and VGGNet. Through transfer learning, CNNs can converge faster. Burlina et al. [37] used GANs to generate fundus images of retinal diseases and used deep learning for discriminative tasks in ophthalmology. However, few dermoscopy studies have used transfer learning and GANs. In addition, the loss function can help CNNs learn effective information more accurately and pay more attention to special samples. Yang et al. [39] proposed penalizing the loss function with danger samples so that the CNN pays more attention to those samples. Unlike the extensively studied task of lesion classification, dermoscopy feature extraction is a new task in the area. Traditional ensemble strategies limit the advantages of each CNN, even though each CNN has different recognition capabilities for different categories of images. 
Most research focuses on parameter tuning and CNN structure, and there is still much room for improvement on multiple evaluation criteria.
Based on the above analysis, we propose a classification method based on the ensemble of individual advantage and group decision to automatically classify dermoscopy images. The main contributions of our work can be summarized as follows:
1) We propose a classification method based on the ensemble of individual advantage and group decision, including the ensemble strategy of group decision, the ensemble strategy of maximizing individual advantage, and the ensemble strategy of block-integrated voting. The method addresses the limitations of an individual convolutional neural network (CNN) that has to solve multiple problems at once. Unlike the traditional voting method, our method performs only binary classification voting on one dermoscopy image at a time. Our method is more robust and stable than the individual CNNs and the traditional ensemble strategies. Its basic principle is divide and conquer: the complex classification problem is divided into several simple binary classification problems, and the dermoscopy image classification is then completed through different strategies.
2) We train multiple CNNs based on transfer learning. Through transfer learning, pre-trained CNNs are fine-tuned; the effects of different CNNs on the classification of different categories of dermoscopy images are then compared, and the CNNs with the better classification effect are selected for the ensemble of different strategies. The experiments show that transfer learning performs very well on the dermoscopy image dataset.
3) We use GANs to create a balanced sample space and combine rotation, scaling, and cropping for data augmentation to improve the training effect of CNNs.

A. DATA AUGMENTATION
We collected 10,015 dermoscopy images from the International Skin Imaging Collaboration (ISIC) 2018 dataset and used all of them as our experimental dataset. The images cover 7 skin diseases: actinic keratosis (AKIEC), basal cell carcinoma (BCC), benign keratosis (BKL), dermatofibroma (DF), melanoma (MEL), melanocytic nevus (NV) and vascular lesion (VASC). The outstanding feature extraction capabilities of CNNs require a large and balanced dataset. Insufficient medical data and uneven class distribution pose an important challenge: they reduce the training effect of CNNs and easily lead to over-fitting. To address the uneven distribution, we construct a balanced sample space from the training set by using GANs to generate synthetic images that expand the categories with fewer samples. The flow chart of the GANs is shown in Fig. 2. Noise is used as the input of the generator. The generator attempts to generate a fake image that is sufficiently similar to the real image, making it difficult for the discriminator to correctly distinguish the authenticity of the image. If the discriminator can effectively distinguish the authenticity of the image, then the generator needs to be improved and its parameters adjusted. The training of the generator and discriminator can be defined as:
$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]$$
where D denotes the discriminator, G denotes the generator, x denotes real data, and z denotes the noise. The most important thing in GANs is to optimize this objective function. The discriminator aims to make D(G(z)), the probability it assigns to a generated image being real, as small as possible, that is, to make 1 − D(G(z)) as large as possible, while making D(x) as large as possible. The generator aims to make D(G(z)) as large as possible, that is, to make 1 − D(G(z)) as small as possible. 
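The opposing goals of the discriminator and the generator can be made concrete with a small numeric sketch. It assumes the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))], estimated as a batch average; the function name `gan_value` and the example probabilities are illustrative and not taken from the paper.

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G) from discriminator outputs.

    d_real: D(x) for real images, each strictly between 0 and 1.
    d_fake: D(G(z)) for generated images, each strictly between 0 and 1.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well gives a larger value ...
confident = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# ... than one that the generator has pushed to output 0.5 everywhere.
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
print(confident > fooled)
```

The discriminator is trained to push this value up, while the generator is trained to push it down by raising D(G(z)).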
Because the samples generated by GANs are fake, we need to pick out more realistic or useful images for our training, as shown in Fig. 3.
After constructing the balanced sample space, we use GANs, rotation, scaling, and cropping methods to augment the dataset. The processed dataset is larger and more balanced than before. Compared with ISIC 2018 dataset, ISIC 2019 dataset added squamous cell carcinoma (SCC) with a total of 25,331 dermoscopy images. For the ISIC 2019 dataset, we also use the above processing method. GANs may generate deformed or blurred dermoscopy images, as shown in Fig. 4.
We need to select images that are as clear and undistorted as possible, because such images are better suited to training CNNs. We use the datasets from two consecutive years to better verify the generalization of our proposed method and the feasibility of transfer learning on dermoscopy images.

B. IMAGE SIZE
The size and integrity of the image have an important effect on the feature extraction of CNNs, so relatively complete dermoscopy images are of great significance. Most cropping deforms the skin lesions, yet shape contour information is an important basis for extracting features. Training time also varies with the size of the training images. We therefore weigh the time cost against the effect of feature extraction and use 600 × 600 as the image size. Fig. 5 shows the uneven distribution of dermoscopy images. Insufficient medical samples make it difficult to extract key features, and uneven distribution of samples easily leads to overfitting. To solve this problem, we use transfer learning and construct a weighted loss function.

C. WEIGHTED LOSS FUNCTION
We use CNNs based on models pre-trained on ImageNet. With the help of transfer learning, we fine-tune the pre-trained model and modify the last fully connected layer for training. Because we use both the ISIC 2018 and ISIC 2019 datasets, we first use the model pre-trained on ImageNet to train CNNs on the ISIC 2018 dataset, and then use the model trained on the ISIC 2018 dataset to train CNNs on the ISIC 2019 dataset. Although we constructed a relatively balanced sample space, the generated samples are limited in terms of feature extraction. Considering the uneven distribution of the dermoscopy image dataset, we set weight coefficients in the cross-entropy loss function so that classes with more samples are multiplied by smaller weights and classes with fewer samples by larger weights, which alleviates the problem of uneven distribution in the dataset. Hence, the weighted cross-entropy loss function is defined as:
$$L = -\frac{1}{m} \sum_{j=1}^{m} \sum_{i=1}^{n} w_{i} \, p_{ji} \log\left(q_{ji}\right)$$
where w_i = N/N_i denotes the weight of the loss function for class i, N denotes the total number of samples, N_i denotes the number of samples for class i, p_ji denotes the desired output, q_ji denotes the actual output, m denotes the number of samples in a batch, and n denotes the number of categories of dermoscopy images.
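A minimal pure-Python sketch of the class weights w_i = N/N_i and the resulting weighted cross-entropy is given here for illustration. Averaging over the m samples of the batch is an assumption, as are the function names `class_weights` and `weighted_cross_entropy` and the toy class counts; none of these are taken from the authors' implementation.

```python
import math


def class_weights(counts):
    """w_i = N / N_i, where N is the total sample count and N_i the count of class i."""
    total = sum(counts)
    return [total / c for c in counts]


def weighted_cross_entropy(targets, probs, weights):
    """Weighted cross-entropy averaged over the batch (averaging is an assumption).

    targets: desired outputs p_ji (one-hot rows).
    probs:   actual outputs q_ji (predicted probabilities, rows sum to 1).
    """
    m = len(targets)
    loss = 0.0
    for p_row, q_row in zip(targets, probs):
        for w, p, q in zip(weights, p_row, q_row):
            if p > 0:  # terms with p_ji = 0 contribute nothing
                loss -= w * p * math.log(q)
    return loss / m


# Toy example: 90 samples of class 0 and 10 samples of class 1.
weights = class_weights([90, 10])
# The same uncertain prediction costs more when the true class is the rare one.
rare_loss = weighted_cross_entropy([[0, 1]], [[0.5, 0.5]], weights)
common_loss = weighted_cross_entropy([[1, 0]], [[0.5, 0.5]], weights)
print(rare_loss > common_loss)
```

In a PyTorch training loop, the same weights could be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, although its default reduction normalizes by the sum of the weights rather than by m.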

D. IMAGE NORMALIZATION
This study normalizes the dermoscopy images to reduce the interference of uneven lighting. Current image normalization methods often subtract the average pixel value of the entire dataset from each image pixel value. However, the illumination and observation angles differ between the dermoscopy images in the dataset. Subtracting a single dataset-wide average therefore has limitations, and the normalization effect on an individual image is not ideal. To address this, we normalize each dermoscopy image by subtracting from it the average intensity of each channel calculated on that individual image. Given a dermoscopy image X with color channels X_R, X_G, and X_B, the normalized image is defined as:
$$X_{\mathrm{norm}} = \left[\, X_{R} - u(X_{R}),\; X_{G} - u(X_{G}),\; X_{B} - u(X_{B}) \,\right]$$
where u(X_R), u(X_G), and u(X_B) denote the average pixel values of the 3 color channels respectively.
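The per-image normalization can be sketched in a few lines. The example assumes an image stored as nested lists of shape H × W × 3 (rows of RGB pixels); the function name `normalize_image` is hypothetical and the pixel values are made up for illustration.

```python
def normalize_image(image):
    """Subtract each channel's own mean, computed on this image only.

    image: nested lists of shape H x W x 3 (rows of [R, G, B] pixels).
    """
    pixels = [px for row in image for px in row]
    n = len(pixels)
    # u(X_R), u(X_G), u(X_B): one mean per color channel of this image.
    means = [sum(px[c] for px in pixels) / n for c in range(3)]
    return [[[px[c] - means[c] for c in range(3)] for px in row] for row in image]


# A 1 x 2 image: channel means are 20, 30 and 40.
tiny = [[[10, 20, 30], [30, 40, 50]]]
print(normalize_image(tiny))
```

Because the means are recomputed for every image, a uniformly brighter or darker copy of the same lesion maps to the same normalized image, which is the property the dataset-wide mean cannot provide.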

E. THE ENSEMBLE STRATEGY OF GROUP DECISION
To take advantage of group decision, we propose the ensemble strategy of group decision, whose flow chart is presented in Fig. 6. The strategy consists of one large sub-module containing m CNNs and several small sub-modules, each containing n CNNs. Each small sub-module performs only binary classification for its corresponding dermoscopy category, so its voting result ranges from 0 to n. The maximum voting result among the small sub-modules is compared with the threshold (if several small sub-modules share the maximum voting result, the one with the highest probability is selected). If the threshold is satisfied, the dermoscopy category corresponding to the maximum voting result is output as the prediction. Otherwise, voting is performed by the large sub-module of m CNNs, and the dermoscopy category with the maximum voting result is output as the prediction (if several categories share the maximum voting result, the one with the highest probability is selected). The threshold is 2. We select 9 CNNs as a group. Our large sub-module contains 9 CNNs, and each of the small sub-modules contains 3 CNNs. The CNNs in the small sub-modules are selected according to their classification effect: the three CNNs in the group that classify the corresponding dermoscopy category best are put into the corresponding small sub-module. The pseudocode of the ensemble strategy of group decision is shown in Algorithm 1.
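The decision flow described above can be sketched as follows. This is an illustrative reading of the text, not the authors' Algorithm 1. It assumes that a CNN casts a positive binary vote for a category when that category is its top prediction, that "satisfying the threshold" means reaching at least the threshold, and that ties are broken by the mean predicted probability. All names (`group_decision`, `submodule_score`, `plurality_vote`) and the toy probabilities are hypothetical.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def submodule_score(probs, members, category):
    """Binary vote of one small sub-module: (votes for category, mean probability)."""
    votes = sum(1 for name in members if argmax(probs[name]) == category)
    mean_p = sum(probs[name][category] for name in members) / len(members)
    return votes, mean_p


def plurality_vote(probs):
    """Large sub-module: each CNN votes for its top category; ties go to higher mean probability."""
    n_classes = len(next(iter(probs.values())))
    votes = [0] * n_classes
    for p in probs.values():
        votes[argmax(p)] += 1
    mean_p = [sum(p[k] for p in probs.values()) / len(probs) for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: (votes[k], mean_p[k]))


def group_decision(probs, submodules, threshold=2):
    """probs: CNN name -> class probabilities; submodules: category -> CNN names."""
    scores = {k: submodule_score(probs, members, k) for k, members in submodules.items()}
    best = max(scores, key=scores.get)  # most votes first, then highest mean probability
    if scores[best][0] >= threshold:
        return best
    return plurality_vote(probs)  # fall back to the large sub-module


# Toy example with 4 CNNs, 3 categories and 2 CNNs per small sub-module.
probs = {"a": [0.7, 0.2, 0.1], "b": [0.6, 0.3, 0.1],
         "c": [0.1, 0.8, 0.1], "d": [0.2, 0.3, 0.5]}
submodules = {0: ["a", "b"], 1: ["b", "c"], 2: ["c", "d"]}
print(group_decision(probs, submodules))
```

In the paper's setting, `probs` would hold the outputs of the 9 CNNs in a group and each entry of `submodules` would list the 3 CNNs that classify that category best.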

F. THE ENSEMBLE STRATEGY OF MAXIMIZING INDIVIDUAL ADVANTAGE
In order to further reduce the computational cost and better exert the effect of different CNNs on the classification of different categories of dermoscopy images, we used more recent CNNs. Based on these CNNs, the ensemble strategy of maximizing individual advantage is proposed. The flow chart is presented in Fig. 7.
The ensemble strategy of maximizing individual advantage consists only of small sub-modules, each of which performs binary classification for its corresponding dermoscopy category. Each sub-module contains n CNNs, so its voting result ranges from 0 to n. The dermoscopy category corresponding to the maximum voting result among the sub-modules is output as the prediction (if several sub-modules share the maximum voting result, the one with the highest probability is selected). We select 9 CNNs as a group. Each sub-module in our experiment contains 3 CNNs. The CNNs in the sub-modules are selected according to their classification effect: the three CNNs in the group that classify the corresponding dermoscopy category best are put into the corresponding sub-module. The pseudocode is shown in Algorithm 2.
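A compact sketch of this strategy is shown here; it is an interpretation of the description, not a transcription of Algorithm 2. It assumes that a CNN votes for a category when that category is its top prediction and that ties between sub-modules are broken by the mean probability the sub-module assigns to its category. The name `individual_advantage` and the toy data are hypothetical.

```python
def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def individual_advantage(probs, submodules):
    """Pick the category whose specialised sub-module collects the most binary votes.

    probs:      CNN name -> list of class probabilities for one image.
    submodules: category index -> names of the CNNs chosen for that category.
    """
    scores = {}
    for category, members in submodules.items():
        votes = sum(1 for name in members if argmax(probs[name]) == category)
        mean_p = sum(probs[name][category] for name in members) / len(members)
        scores[category] = (votes, mean_p)  # compared by votes, then probability
    return max(scores, key=scores.get)


# Two single-CNN sub-modules tie on votes; the more confident one wins.
probs = {"a": [0.9, 0.05, 0.05], "b": [0.2, 0.6, 0.2]}
submodules = {0: ["a"], 1: ["b"]}
print(individual_advantage(probs, submodules))
```

Compared with the group-decision strategy, there is no threshold test and no large sub-module, which is where the reduction in computational cost comes from.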

G. THE ENSEMBLE STRATEGY OF BLOCK-INTEGRATED VOTING
In order to better play the role of blocks for different categories of dermoscopy image classification, we proposed the ensemble strategy of block-integrated voting based on multiple CNNs. The flow chart is presented in Fig. 8.
The ensemble strategy of block-integrated voting integrates m different CNNs into one block (preferably, the same combination of CNNs is not repeated across blocks). Each block performs binary classification on the different categories of dermoscopy images, and the category with the maximum voting result is passed on as the block's prediction and added to the next block (if several categories share the maximum voting result, the one with the highest probability is selected). After a number of blocks have been accumulated, the category with the maximum cumulative voting result is output as the final prediction (again, if several categories share the maximum, the one with the highest probability is selected). We select 9 CNNs as a group. In our experiment, 3 different CNNs are selected for each block, and the results are accumulated over 84 blocks. The CNNs in each block are selected according to the combination formula:
$$C_{n}^{m} = \frac{n!}{m!\,(n-m)!}$$
where n and m denote the total number of CNNs per group and the number of CNNs in each block, respectively. For n = 9 and m = 3, this gives the 84 blocks used in our experiment. The pseudocode is shown in Algorithm 3.
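The block construction and cumulative vote can be sketched with the standard library. The sketch is one reading of the description, not the authors' Algorithm 3: it enumerates every combination of `block_size` CNNs as a block, lets each block vote by plurality of its members' top predictions, and breaks ties by mean probability. The names `block_integrated_voting` and `vote` and the toy probabilities are hypothetical.

```python
from itertools import combinations
from math import comb


def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def vote(rows, tally=None):
    """Plurality vote; `tally` overrides the per-row votes when accumulating blocks."""
    n_classes = len(rows[0])
    if tally is None:
        tally = [0] * n_classes
        for p in rows:
            tally[argmax(p)] += 1
    mean_p = [sum(p[k] for p in rows) / len(rows) for k in range(n_classes)]
    return max(range(n_classes), key=lambda k: (tally[k], mean_p[k]))


def block_integrated_voting(probs, block_size=3):
    """probs: CNN name -> class probabilities for one image."""
    rows = [probs[name] for name in sorted(probs)]
    cumulative = [0] * len(rows[0])
    for block in combinations(rows, block_size):  # every block is a distinct CNN subset
        cumulative[vote(list(block))] += 1        # each block contributes one vote
    return vote(rows, tally=cumulative)


# Number of blocks for the group sizes reported in the experiments.
print(comb(9, 3), comb(5, 3))

probs = {"a": [0.1, 0.8, 0.1], "b": [0.2, 0.7, 0.1],
         "c": [0.6, 0.3, 0.1], "d": [0.2, 0.3, 0.5]}
print(block_integrated_voting(probs, block_size=3))
```

With 9 CNNs and blocks of 3, the loop visits 84 blocks, which matches the number of blocks reported for the nine-CNN groups.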

III. EXPERIMENT RESULTS AND ANALYSIS
The experimental images come from the ISIC 2018 and ISIC 2019 Skin Lesion Analysis Towards Melanoma Detection Challenge datasets [40]-[42]. Adam was used as the optimizer. The experiments were implemented with the PyTorch framework on a computer equipped with four NVIDIA Tesla P100 GPUs with 16 GB of memory. The CNNs were trained for 100 epochs. The batch size was 64, the learning rate was 0.0001, and the learning rate decay was 0.00001. To ensure the stability and fidelity of the experiment, we used ten-fold cross-validation on the training set to train the model and an independent test set to evaluate its performance. The independent dataset was reused to evaluate accuracy in order to avoid any systematic bias in the cross-validation set. Our proposed method takes a dermoscopy image as input and outputs the category to which the image belongs. Because the ensemble strategy of group decision and the ensemble strategy of maximizing individual advantage contain multiple small sub-modules, each of which performs binary classification for its corresponding dermoscopy category, we need to select CNNs that classify the corresponding category well. For example, the first small sub-module is for AKIEC, so we need to select CNNs that score well on criteria such as accuracy in the binary classification of AKIEC. The ensemble strategy of block-integrated voting accumulates the results of multiple blocks and then makes a comprehensive decision. It therefore depends more on the overall classification effect of the CNNs, and we need to select CNNs that perform well on the average value of each criterion. 
Different CNNs have different classification effects, so we select CNNs that are as diverse as possible based on their classification effects. To evaluate the classification performance of the model comprehensively, accuracy (ACC), sensitivity (SE), specificity (SP), F_1 and area under the ROC curve (AUC) are used as evaluation criteria. In the experiment, the CNNs with better classification of dermoscopy images were selected with ACC and AUC as the main evaluation criteria. SE and SP are also important criteria in medical diagnosis. SE is also called the true positive rate or recall: the higher its value, the lower the probability of a missed diagnosis. SP is also called the true negative rate: the higher its value, the lower the probability of misdiagnosis. The criteria are defined as:
$$ACC = \frac{N_{tp} + N_{tn}}{N_{tp} + N_{tn} + N_{fp} + N_{fn}}, \qquad SE = \frac{N_{tp}}{N_{tp} + N_{fn}}, \qquad SP = \frac{N_{tn}}{N_{tn} + N_{fp}}$$
$$P = \frac{N_{tp}}{N_{tp} + N_{fp}}, \qquad R = \frac{N_{tp}}{N_{tp} + N_{fn}}, \qquad F_{1} = \frac{2PR}{P + R}$$
$$t_{pr} = \frac{N_{tp}}{N_{tp} + N_{fn}}, \qquad f_{pr} = \frac{N_{fp}}{N_{fp} + N_{tn}}, \qquad AUC = \int_{0}^{1} t_{pr} \, \mathrm{d}f_{pr}$$
where N_tp, N_tn, N_fp, N_fn, P, R, t_pr and f_pr denote the number of true positives, true negatives, false positives and false negatives, the precision, the recall, the true positive rate and the false positive rate, respectively, all defined at the image level. A lesion image is considered a true positive if its prediction is lesion; otherwise it is regarded as a false negative. A non-lesion image is considered a true negative if its prediction is non-lesion; otherwise it is regarded as a false positive.
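As a worked example of the count-based criteria, the following sketch computes ACC, SE, SP and F_1 from the four confusion counts using their standard definitions. The function name `binary_metrics` and the counts are illustrative only and are not results from the paper; AUC is omitted because it requires ranked prediction scores rather than counts.

```python
def binary_metrics(tp, tn, fp, fn):
    """ACC, SE, SP and F1 for one category from image-level confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    se = tp / (tp + fn)          # sensitivity = recall = true positive rate
    sp = tn / (tn + fp)          # specificity = true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * se / (precision + se)
    return {"ACC": acc, "SE": se, "SP": sp, "F1": f1}


# Hypothetical counts for a rare category: 10 lesion and 90 non-lesion images.
metrics = binary_metrics(tp=8, tn=85, fp=5, fn=2)
print(metrics)
```

With these counts ACC is 0.93 even though 2 of the 10 lesion images are missed, which illustrates why SE, SP and AUC are reported alongside ACC on unbalanced data.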

B. COMPARISON OF CLASSIFICATION EFFECTS UNDER DIFFERENT IMAGE SIZES
CNNs usually take fixed-size, square images as input, and different image sizes lead to different feature extraction effects. All images are cropped to the desired size before entering the CNN, but most cropping deforms the skin lesions, and shape contour information is an important basis for discriminating between categories of skin cell damage. However, a large input image makes the training of CNNs expensive. We therefore tried a variety of image sizes to find the optimal one, and the experimental results based on the ISIC 2019 dataset are shown in Table 1. Because our proposed method requires multiple CNNs, we need to weigh the classification effect against the training cost. The classification effect improves as the size increases, but the improvement becomes much less pronounced beyond 600 × 600. By comparing the experimental results for multiple input sizes, we find that an image dimension of 600 × 600 suits our method best. The experiments show that relatively complete dermoscopy image information benefits feature extraction and allows the CNN to perform better to some extent. The comparison of the size of 600 × 600 and the Table 2.
By comparing the effects of 43 different CNNs on the classification of 7 different categories of dermoscopy images, we find that the CNN with the best average accuracy is not necessarily the best on every category. InceptionV4 performs best on average accuracy, but DenseNet201 is 0.002 higher than InceptionV4 on DF. InceptionResNetV2, DenseNet121 and SENet154 perform better than InceptionV4 on BKL, and InceptionResNetV2, InceptionV3, DenseNet161, SE_ResNeXt101_32×4d and VGG11BN perform better on AKIEC. According to Table 2, the ACC and AUC for the different dermoscopy image categories are used as the main evaluation criteria, and the CNNs with better classification of dermoscopy images are selected. To limit the computational cost, two groups of CNNs are selected in the experiment. The first group includes DenseNet121, DenseNet161, DenseNet169, DenseNet201, InceptionV3, InceptionV4, InceptionResNetV2, SE_ResNeXt50_32×4d and Xception. The second group includes AlexNet, ResNet18, ResNet50, ResNet101, ResNet152, SENet154, SqueezeNet1_0, VGG13BN and VGG16BN. IA denotes the ensemble strategy of maximizing individual advantage, GD the ensemble strategy of group decision, VWB the ensemble strategy of block-integrated voting, AVR the ensemble strategy of averaging, and VT the ensemble strategy of voting. AVR and VT are common traditional ensemble strategies. According to Table 2, VWB shows the more obvious improvement on each criterion, but GD shows the better improvement on AUC.

D. COMPARISON OF CLASSIFICATION EFFECTS BASED ON ISIC 2019 DATASET
In order to better compare the performance of three different ensemble strategies, we also performed experiments on the ISIC 2019 dataset and compared it with the different individual CNNs and the traditional ensemble strategies. The experimental results based on ISIC 2019 dataset are shown in Table 3. In order to compare the impact of choosing different numbers of CNNs on our proposed method based on ISIC 2019 dataset, we took DenseNet169, DenseNet201, InceptionV3, InceptionV4, InceptionResNetV2 from the first group as the third group, and took ResNet152, SENet154, SqueezeNet1_0, VGG13BN, VGG16BN from the second group as the fourth group. In the two groups, our large sub-module contains 5 CNNs, and each of the small submodules contains 3 CNNs. 3 different CNNs were selected for each block in our experiment, and accumulated through 10 blocks. The method of selecting CNNs is explained in Section II. Because the ISIC 2019 dataset is more comprehensive, we performed more experiments.
The experimental results show that the ensemble performance based on the first group is better than that of the second group, and the third group is better than the fourth group. By comparing these groups, we find that choosing better CNNs makes our proposed method perform well in dermoscopy image classification. Comparing the first group with the third group, and the second group with the fourth group, shows that selecting more and better CNNs makes our proposed method more robust and stable. The ensemble strategy of block-integrated voting is the most prominent in terms of ACC. VWB-1, VWB-2, VWB-3 and VWB-4 all perform well in their respective groups, and VWB-1 in particular reaches 0.924. However, the ensemble strategy of group decision is significantly better than the ensemble strategy of block-integrated voting on the AUC criterion. GD-1 is 0.002 higher than VWB-1, GD-2 is 0.015 higher than VWB-2, GD-3 is 0.016 higher than VWB-3, and GD-4 is 0.029 higher than VWB-4. AUC can more accurately evaluate classifiers on unbalanced data, and the ensemble strategy of group decision performs best on AUC. Our proposed method is highly dependent on the selected CNNs. If the selected CNNs are better, the ensemble strategy of group decision and the ensemble strategy of maximizing individual advantage perform similarly, but the latter reduces the computational cost. In the first group, the two strategies differ by only 0.005 in AUC and 0.001 in F_1. In addition, when the selected CNNs perform well, as in the first group, our proposed method improves each evaluation criterion compared to the traditional ensemble strategies. 
However, if the selected CNNs perform poorly, the performances of the ensemble strategy of group decision and the ensemble strategy of maximizing individual advantage differ greatly and the proposed method may not be as effective as traditional ensemble strategies on some evaluation criteria, especially on SE.
Table 3 shows that the individual CNNs score clearly lower on the evaluation criteria than our proposed method. Different strategies and individual CNNs have different classification effects, but the classification effect of an ensemble strategy is usually better than that of an individual CNN. DenseNet201 performs well among the individual CNNs, but there is still a large gap between it and the ensemble strategies. In general, the ensemble strategy of block-integrated voting shows an advantage on various evaluation criteria, especially on SE, where VWB-1 is 0.071 higher than DenseNet201, 0.040 higher than GD-1 and IA-1, 0.057 higher than AVR-1, and 0.163 higher than VT-1. On the whole, VWB-1 has the better classification effect.

E. COMPARISON OF CLASSIFICATION EFFECTS OF DIFFERENT FRAMEWORKS
Over the years, numerous studies on lesion classification have been conducted. We compare different frameworks based on the ISIC 2019 dataset, as shown in Table 4.
Among the three strategies we propose, VWB-1 has the best overall performance and achieves good results on the various evaluation criteria. An individual classifier has limitations on various evaluation criteria. The performance of VWB-1 is similar to that of [10], but VWB-1 improves on all evaluation criteria; in particular, ACC, AUC and SE increase by 0.011, 0.018 and 0.012 respectively. In contrast to [10], VWB-1 is mainly based on ensemble learning to combine multiple CNNs. Comparing VWB-1 and [44] shows that the performance in ACC is very similar, but our proposed method performs better on the overall classification effect. When the sample distribution is uneven, especially when the proportion of positive and negative samples is seriously unbalanced, it is difficult for ACC to evaluate the classifier effect. Compared with [45], VWB-1 is close on most evaluation criteria but shows an obvious improvement in AUC and SP. On the whole, [46] performs better than VWB-1. However, VWB-1 is 0.002 higher in SP than [46]. In addition, the computational cost of [46] is relatively high and its implementation is complex. Reference [47] is mainly based on the Google InceptionV3 CNN architecture. Overall, VWB-1 is better than [47] on most evaluation criteria; for example, it is 0.012 higher in AUC. By comparing different frameworks, we find that combining multiple deep learning techniques can improve the classification effect to a certain extent.

F. TRANSFER LEARNING
We analyze the application of transfer learning to the ISIC 2019 dataset by comparing the accuracy and loss of ResNet50 in three cases: without pre-training, with ImageNet pre-training, and with ISIC 2018 dataset pre-training. The loss function is a good tool for reflecting the gap between the model and the actual data. Through the loss function, we can better analyze and understand the subsequent optimization of the model. In addition, a different distribution of the dataset will cause our optimized loss function to differ from the loss function of the real data, so we need to choose the dataset carefully. We use GANs to create a relatively balanced sample space and use ten-fold cross-validation to ensure the stability and reliability of training. In general, the larger the loss function value, the smaller the probability the classifier assigns to the real label, and the worse the performance. ResNet50 is selected to compare the loss and accuracy in the three cases on the validation set, as presented in Fig. 9.
Through experiments, we find that within 100 epochs, the pre-trained ResNet50 performs well in both accuracy and loss. In particular, ResNet50 pre-trained on the ISIC 2018 dataset has a higher starting accuracy, faster convergence, and a higher asymptotic accuracy. After ISIC 2018 dataset pre-training, ResNet50 converges after about 15 epochs and reaches a more stable state. InceptionV3 and ResNet50 perform similarly, but InceptionV3 has higher accuracy and lower loss. Through transfer learning, learned model parameters can be shared with a new model, thereby speeding up and improving the learning of that model.
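The relationship between the loss value and the probability assigned to the true label, noted in this section, can be made concrete with a short example. The sketch below assumes a categorical cross-entropy loss, which is a common choice for CNN classifiers but is an assumption here; the function name `cross_entropy` and the probability vectors are hypothetical.

```python
import math


def cross_entropy(probs, true_index):
    """Cross-entropy for one sample: the negative log-probability of the true class."""
    return -math.log(probs[true_index])


# Hypothetical softmax outputs for one image whose true class has index 1.
confident = [0.05, 0.90, 0.05]  # high probability on the true label
unsure = [0.40, 0.30, 0.30]     # low probability on the true label

print(cross_entropy(confident, 1))
print(cross_entropy(unsure, 1))
```

The second value is larger than the first: as the probability on the true label falls, the loss grows, which matches the behavior described for the curves in Fig. 9.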

IV. DISCUSSION
Skin cancer is the most common form of cancer. Although it is amenable to early detection by direct inspection, its visual similarity to benign lesions makes the task difficult. Dermoscopy images were introduced to better visualize key details in skin lesions and thus improve diagnostic accuracy. We tried 43 CNNs, including DenseNet, SE-ResNet and others. Comparing the effects of different CNNs on the classification of different categories of dermoscopy images shows that, in most cases, deeper and more recent CNNs have better feature extraction and classification ability than relatively shallow CNNs. We selected more complete images to preserve the integrity of the feature information, and used data augmentation and transfer learning to alleviate the shortage of samples during training. We trained the CNNs using GANs to create a balanced sample space. Setting different loss weights for different categories of samples alleviates the uneven data distribution to some extent. We conducted extensive experiments that demonstrate the effectiveness of the proposed ensemble strategies, which are based on the ensemble of individual advantage and group decision and which meet the higher clinical demand to some extent. VWB-1 performs similarly to [10], but our method is 0.024 higher in SE. Comparing VWB-1 with [43] shows that the two are very close in ACC, but our method achieves a better overall classification effect. Comparing different frameworks shows that combining multiple deep learning techniques can improve the classification effect to a certain extent. In addition, we also considered the influence of age, gender, and lesion location on the classification, which is presented in Fig. 10.
Middle-aged and elderly people are the main group of skin cancer patients, with BCC especially common among middle-aged people. Skin cancer is more common in men than in women, except for NV. Lesion locations vary across categories; e.g., DF lesions are concentrated on the head or neck. The combination of computer-assisted diagnosis and physician experience may have great potential, but it requires more clinical experience as a guide. When an individual CNN is applied to a problem, it easily reaches a performance bottleneck, and there is no guarantee that it will perform well across multiple problems. Therefore, it is common to use voting ensembles, averaging ensembles and other methods to fuse strong models, which improves on the generalization ability of any individual CNN and gathers the advantages of each model to obtain a better solution. When fusing CNNs, high computational cost is often a problem, because strong CNNs are generally deep, with more layers and more parameters. Our proposed method includes three ensemble strategies, each of which has its advantages. We first performed experiments on the ISIC 2018 dataset and found that CNNs with better classification results make our proposed method more effective. We then continued the experiments on the ISIC 2019 dataset to demonstrate the feasibility of transfer learning on dermoscopy images; these experiments also show that our proposed method can improve the classification of dermoscopy images. The study deviated from a real-life clinical setting and was potentially affected by verification and selection bias. However, one goal of the classification method is to raise awareness among people to check their moles regularly.
We do not assume that higher accuracy in a diagnostic study with digital images can substitute for the clinical expert; rather, the application might be implemented in a screening process that brings people to an expert dermatologist, who can then refine the diagnosis.
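The generic voting and averaging ensembles mentioned in the discussion can be sketched as follows. This is an illustration of plain majority voting and probability averaging only, not the VWB strategies proposed in this study; the function names and the example probabilities are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Hard voting: return the label predicted by the most models.

    Ties are resolved in favor of the label that appears first in the list.
    """
    return Counter(labels).most_common(1)[0][0]


def average_probabilities(prob_vectors):
    """Soft voting: average the class probabilities over models.

    Returns (index of the class with the highest mean, list of mean probabilities).
    """
    n_models = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    mean = [sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: mean[c])
    return best, mean


# Hypothetical outputs of three CNNs for one image over three classes.
model_probs = [
    [0.90, 0.10, 0.00],
    [0.40, 0.60, 0.00],
    [0.45, 0.55, 0.00],
]

# Each model's own decision is the class with its highest probability.
hard_labels = [max(range(len(p)), key=lambda c: p[c]) for p in model_probs]

print(majority_vote(hard_labels))             # 1
print(average_probabilities(model_probs)[0])  # 0
```

In this example the two schemes disagree: two of the three models narrowly prefer class 1, so hard voting selects it, while averaging gives more weight to the one model that is highly confident in class 0. Either way, every member CNN must be evaluated for each image, which is the source of the computational cost noted above.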

V. CONCLUSION
In conclusion, a balanced and large dataset is crucial for CNNs. Creating balanced sample spaces through GANs and using transfer learning allows CNNs to be trained better. Different strategies and individual CNNs have different classification effects, but the classification effect of an ensemble strategy is usually better than that of an individual CNN. Based on the classification method of the ensemble of individual advantage and group decision, we proposed three ensemble strategies to enhance the classification of dermoscopy images. Compared with different individual CNNs and traditional ensemble strategies, the proposed method achieves some improvement on multiple evaluation criteria.