Dermoscopy Image Classification Based on StyleGANs and Decision Fusion

Skin cancer has become one of the most common cancers in humans in recent years, affecting people of all ages. If skin cancer is treated in time, the curative effect and prognosis are favorable. At present, dermoscopy is an effective means for the early diagnosis of skin cancer. However, manual detection depends heavily on the clinical experience of doctors, and the complexity of dermoscopy images poses a great challenge to classification. Therefore, we propose a decision fusion method. Through transfer learning, based on multiple pre-trained convolutional neural networks (CNNs), we use a block structure to combine multiple CNNs and make the final decision through multiple blocks. Decision fusion compensates for the limited generalization capability of an individual convolutional neural network (CNN) model and is more robust and stable than traditional fusion strategies. Based on the ISIC 2019 dataset, we use StyleGANs to generate high-quality images to alleviate the small size and uneven class distribution of the dermoscopy image dataset and improve the classification performance of CNNs. Our proposed method improves the accuracy of dermoscopy image classification and can assist dermatologists.


I. INTRODUCTION
The skin is the largest organ of the human body and its first line of defense, with protective, secretory, excretory, and temperature-regulating functions. With changes in lifestyle and environment, various skin diseases affect people's normal lives. Many people across the globe suffer from skin diseases, and skin cancer cases outnumber those of any other class of cancer [1]. In the United States alone, there are 5.4 million new cases of skin cancer each year [2]. Melanoma accounts for only about 5% of all skin cancer cases, yet it is responsible for roughly 75% of skin cancer deaths [3], [4]. Melanoma mortality rates in Spain increased during the last decades of the 20th century [5]. The prognosis for advanced melanoma remains poor. However, if it is detected during the earliest stages, the curative effect and prognosis are much better. Early detection of melanoma by means of accurate screening is an important step towards reduction in mortality and prolongation of patients' survival [6]. In recent years, dermatologists around the world have devoted considerable
attention to the study of dermoscopy. Esteva et al. showed that a convolutional neural network (CNN) achieves performance on par with all tested experts across both classification tasks, demonstrating an artificial intelligence capable of classifying skin cancer with a level of competence comparable to dermatologists [7]. Sinz et al. also showed that the specificity of dermoscopic diagnosis of malignant melanoma is even higher than that of clinical diagnosis [8]. Dermoscopy images are acquired by dedicated instruments, while histological images are acquired by invasive biopsy and microscopy. Both modalities produce highly standardized images. At present, dermoscopy is an effective means for the early diagnosis of skin cancer. Dermoscopy images provide high image clarity, as shown in Fig. 1.
Dermoscopy was introduced clinically to assist doctors in diagnosis [8]. However, manual detection depends heavily on the clinical experience of doctors, and the complexity of dermoscopy images themselves, such as intra-class variation, inter-class similarity, and blurred lesion boundaries, poses a great challenge to detection. Therefore, computer-vision-aided diagnosis and treatment has gradually become an important research direction in the medical field. Digital dermoscopy can help dermatologists diagnose and treat patients. It also facilitates the comparison of different lesions, medical consultations, presentation of interesting cases, and continuing education [7]. Computer vision systems have classified dermoscopy images with accuracy that exceeded some but not all dermatologists [9].
There has been considerable interest in developing computer vision systems for diagnosis, but few groups have directly compared different ensemble strategies. Prof. Finlay has publicly stated that he is fascinated by computer-aided diagnosis [10]. Research on the automatic classification of skin lesion images has already appeared in the literature [11]. Early research focused on manually extracting shape [12], texture [13], [14], and color [15], [16] features and fusing these low-level features to improve classification accuracy [4], [17]. Schaefer et al. used an automatic border detection approach [18] to segment the lesion area and then assembled the extracted features, i.e., shape, texture, and color, for melanoma recognition [19]. Because of the large intra-class variation and small inter-class differences between melanoma and non-melanoma, manually extracted features often yield unsatisfactory results. In addition, most methods based on manual feature extraction are complicated, resulting in low applicability and insufficient generalization ability in clinical practice. Previous work in dermatological computer-aided classification [20]-[22] lacked the generalization ability of medical practitioners due to insufficient data and focused on standardized tasks such as dermoscopy examination [23], [24] and histological image classification [25]-[27].
A sufficiently rich dataset is very valuable for research, especially in the field of deep learning. However, medical image datasets are often small and unevenly distributed, which largely limits image classification research based on deep learning. In 2014, Goodfellow proposed a new adversarial process for training and evaluating generative models [28]. Generative adversarial networks (GANs) consist of two parts: a generator and a discriminator. The generator's role is to model the distribution of real data; the discriminator's role is to judge whether a sample is real or generated. The goal of GANs is to train a generator that fits the real data distribution so well that the discriminator cannot distinguish generated samples from real ones. By using GANs, the problem of small and unevenly distributed medical image datasets can be alleviated to some extent. Since the appearance of GANs, they have attracted increasing attention. Radford et al. proposed DCGANs, which impose clear structural constraints and show strong credibility for unsupervised learning with good generalization in most cases [29]. Liu proposed CoGANs [30], which learn a joint distribution in an unsupervised way by sharing weights across networks. NVIDIA described a new GAN training method whose core idea is to grow the capacity of the generator and discriminator progressively: training starts at low resolution, and new layers that model increasingly fine detail are added as training proceeds [31]. NVIDIA then proposed another perspective on the GAN framework, creating a generator architecture that learns to separate high-level attributes and stochastic variation in images [32]. With the development of research, deep learning techniques represented by convolutional neural networks (CNNs) have achieved good results in both natural image and medical image analysis.
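The adversarial objective described above can be illustrated with a toy numerical sketch. Everything here is hypothetical and one-dimensional, chosen only to show that moving the generated distribution toward the real one lowers the value V(D, G) that the generator seeks to minimize:

```python
import numpy as np

# Toy sketch of the GAN minimax value (all names and the 1-D "data"
# are illustrative assumptions, not the paper's actual setup).
rng = np.random.default_rng(0)

def discriminator(x):
    # A fixed logistic scorer standing in for the discriminator D
    return 1.0 / (1.0 + np.exp(-x))

def gan_value(real, fake):
    # V(D, G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]
    return np.mean(np.log(discriminator(real))) + \
           np.mean(np.log(1.0 - discriminator(fake)))

real = rng.normal(2.0, 1.0, 1000)       # samples from the "real" data
fake_far = rng.normal(-2.0, 1.0, 1000)  # a poor generator
fake_near = rng.normal(1.8, 1.0, 1000)  # a generator close to the data

# Moving the generated distribution toward the real one lowers V,
# which is exactly the direction the generator is trained in.
print(gan_value(real, fake_near) < gan_value(real, fake_far))  # True
```

In a real GAN, of course, both networks are learned jointly and the discriminator is retrained as the generator improves; the sketch freezes D only to make the objective's behavior visible.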
Unlike methods that rely on manual extraction of features, deep learning techniques such as CNNs have significant advantages [33], [34]. The main advantage of the CNN is that it automatically learns the feature representations [35] required for the corresponding detection or classification task from the dataset, and it performs well in many applications [36]. Codella et al. established a system combining recent developments in deep learning and machine learning approaches for skin lesion segmentation and classification [37]. Kawahara et al. presented a fully convolutional network based on AlexNet [34] to extract representative features of melanoma [38]. Yu et al. proposed a novel method based on very deep CNNs to meet the challenges of automated melanoma recognition in dermoscopy images, consisting of two steps: segmentation and classification [39]. Yang et al. used the segmented lesion region to guide classification through region average pooling [40]. Matsunaga et al. observed only slight effects from utilizing age and sex information for classification [41]. Gessert et al. combined image crops both by simple averaging and by a meta-learning strategy to improve classification accuracy [42]. Wei et al. proposed a novel attention-based DenseUnet network with adversarial training for skin lesion segmentation [43]. Gupta et al. investigated the application of a Gaussian mixture model to the analysis and classification of skin diseases from their visual images using a Mahalanobis distance measure [44]. Burlina et al. used GANs to generate fundus images of retinal diseases and used deep learning for discriminative tasks in ophthalmology [45]. Although some research has been done on computer-aided diagnosis of skin diseases, there is still much room for improvement in classification evaluation metrics.
Based on the above analysis, we use StyleGANs to enhance the dataset and propose a decision fusion method to automatically classify dermoscopy images.
The main contributions of our work can be summarized as follows:
1) We propose a decision fusion method. Based on multiple pre-trained CNNs, we use a block structure to combine multiple CNNs and make the final decision through multiple blocks. Decision fusion compensates for the limited generalization capability of an individual CNN model and is more robust and stable than traditional fusion strategies.
2) We train StyleGANs on the ISIC 2019 dataset and generate high-quality images with the trained StyleGANs. Generated images that are favorable for CNN training are selected and added to the dataset, which improves the classification performance of CNNs.

II. METHOD

A. DATA AUGMENTATION BY STYLEGANS
The International Skin Imaging Collaboration (ISIC) 2019 dataset contains 25331 dermoscopy images [46]-[48] in 8 classes (i.e., actinic keratosis (AKIEC), basal cell carcinoma (BCC), benign keratosis (BKL), dermatofibroma (DF), melanoma (MEL), melanocytic nevus (NV), vascular lesion (VASC), and squamous cell carcinoma (SCC)). In this experiment, sixty percent of the dataset is used as the training set and twenty percent as the validation set; the remaining twenty percent is used as the test set. The training set is used to train a CNN model. We then use the validation set to verify the validity of the model and pick the best-performing candidate until a satisfactory model is obtained. Finally, we use the test set to evaluate the final performance of the model; the test set is used only during model evaluation. It is worth mentioning that the split should keep the data distribution as consistent as possible across subsets, to avoid biasing the final result through the division process.
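The 60/20/20 split with a consistent class distribution can be sketched as a simple per-class (stratified) shuffle. The function name and the toy file names are hypothetical; the paper does not describe its exact splitting code:

```python
import random
from collections import defaultdict

# Hypothetical stratified 60/20/20 split of (image_path, label) pairs,
# keeping each class's proportion similar across the three subsets.
def stratified_split(samples, seed=0):
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        n = len(items)
        n_train, n_val = int(0.6 * n), int(0.2 * n)
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test

# Toy example: 20 MEL images and 80 NV images
samples = [(f"img_{i}.jpg", "MEL" if i % 5 == 0 else "NV") for i in range(100)]
train, val, test = stratified_split(samples)
print(len(train), len(val), len(test))  # 60 20 20
```

Splitting class by class, rather than shuffling the whole pool, is what prevents a rare class such as DF from landing almost entirely in one subset.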
Although CNNs have strong feature expression ability, they require large datasets for training. Most medical image datasets are small and unevenly distributed, which reduces the training effect of CNNs and easily leads to over-fitting. To alleviate these problems, data augmentation based on GANs is adopted. Since the advent of GANs, a wide variety of new GANs have emerged. We generate dermoscopy images with StyleGANs, as shown in Fig. 2. Based on the eight dermoscopy image classes, we divide the training set into eight subsets. We train a StyleGAN on each class's subset, obtaining eight different StyleGANs, each of which generates dermoscopy images of the corresponding class. 200 generated images of each class are added to the training set. As can be seen from Fig. 2, StyleGANs perform well in generating dermoscopy images: they achieve automatic, unsupervised separation of high-level attributes and stochastic variation in the generated image, with intuitive, scale-specific control of synthesis. However, not all dermoscopy images generated by StyleGANs are useful for training CNNs; StyleGANs may generate blurry or distorted images, as shown in Fig. 3. Such poorly generated images are not conducive to CNN training. Therefore, we manually select clear and undistorted dermoscopy images for the training set, as shown in Fig. 2. We generated a large number of dermoscopy images with StyleGANs and added the selected high-quality images to the original dataset for CNN training, as shown in Fig. 4. The selection is made manually by visual assessment.

B. IMAGE SIZE
CNNs usually take fixed-size, square images as input, and different input sizes yield different feature extraction effects. All images are cropped to the desired size before entering the CNN, but aggressive cropping deforms skin lesions, and shape contour information is an important basis for discriminating skin lesion classes. Relatively complete dermoscopy images help CNNs extract features. However, large images increase the time and computational cost of training. We therefore preprocess and crop the image without affecting the recognition target, resizing with the image center as the origin. By comparing three sizes of dermoscopy images, we choose 600 × 600 as the input size.
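A center-based square crop followed by a resize, as described above, can be sketched as follows. This is an illustrative assumption of the preprocessing (the paper does not give its exact code), and nearest-neighbor indexing stands in for a real resampling filter:

```python
import numpy as np

# Sketch: crop the largest centered square, then resize to size x size.
# Nearest-neighbor indexing is used here for self-containment; a real
# pipeline would use bilinear or bicubic resampling.
def center_crop_resize(img, size=600):
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    square = img[top:top + s, left:left + s]
    idx = np.arange(size) * s // size   # nearest source row/column
    return square[idx][:, idx]

img = np.zeros((768, 1024, 3), dtype=np.uint8)  # a hypothetical input
out = center_crop_resize(img, 600)
print(out.shape)  # (600, 600, 3)
```

Cropping the centered square first keeps the lesion's aspect ratio intact, which is the point of the paper's argument about preserving shape contours.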

C. LOSS FUNCTION
The CNN transforms the input image through multiple hidden layers to achieve high-level feature extraction. This end-to-end automatic feature learning performs well in feature extraction. CNNs have better generalization ability and lower computational complexity than fully connected neural networks. In this experiment, transfer learning was used to fine-tune CNN models pre-trained on ImageNet to extract the characteristics of skin lesions. The features of the final convolutional layer of the CNN model are extracted, given the differences between the pre-training dataset and the dermoscopy image dataset and that layer's ability to express deep features. Considering the uneven distribution of medical image datasets, this paper sets a weight coefficient w in the softmax loss function, multiplying large classes by smaller weights and small classes by larger weights, so as to alleviate the uneven distribution of the dataset. Hence, the weighted softmax loss function is defined as:

L(θ) = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{k} w_j 1{y_i = j} log( e^{θ_j^T x_i} / Σ_{l=1}^{k} e^{θ_l^T x_i} )

where w_j = N/N_j denotes the weight in the loss function, N denotes the total number of samples, N_j denotes the number of samples of class j, x_i denotes the test image, y_i denotes the label of each image, θ denotes the parameters of the model, k denotes the total number of image classes, and 1{y_i = j} takes the value 1 only if the expression in braces is true, and 0 otherwise.
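The weighted loss can be sketched in NumPy as follows, assuming the logits are already computed by the network (the function name and toy data are illustrative, not the paper's code). The key property is that a mistake on a rare class costs more than the same mistake on a common class:

```python
import numpy as np

# Sketch of the weighted softmax loss with w_j = N / N_j.
def weighted_softmax_loss(logits, labels, num_classes):
    n = len(labels)
    counts = np.bincount(labels, minlength=num_classes)
    w = n / np.maximum(counts, 1)                   # w_j = N / N_j
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # -(1/N) * sum_i w_{y_i} * log p_{i, y_i}
    return -np.mean(w[labels] * log_p[np.arange(n), labels])

labels = np.array([0, 0, 0, 0, 0, 1])   # class 1 is rare: w_0 = 1.2, w_1 = 6
hit_all = np.eye(2)[labels] * 4.0       # confident, correct logits
miss_rare = hit_all.copy()
miss_rare[5] = [4.0, 0.0]               # misclassify the one rare sample
miss_common = hit_all.copy()
miss_common[0] = [0.0, 4.0]             # misclassify one common sample

# The same mistake is penalized ~5x more on the rare class.
print(weighted_softmax_loss(miss_rare, labels, 2) >
      weighted_softmax_loss(miss_common, labels, 2))  # True
```

This is the same re-weighting that, for instance, PyTorch's cross-entropy loss exposes through a per-class `weight` argument.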

D. IMAGE NORMALIZATION
In order to reduce the interference caused by uneven illumination in medical images, this study normalizes the dermoscopy images. The common normalization method subtracts the average pixel value, computed over the whole dataset, from each image. However, the illumination, skin color, and viewing angle of dermoscopy images vary widely across the dataset, and subtracting a uniform average does not standardize the illumination of individual images well. In response, this experiment normalizes each dermoscopy image by subtracting the per-channel average intensity computed on that individual image. Given a dermoscopy image X, the normalized image X_norm is defined as:

X_norm = X - (u(X_R), u(X_G), u(X_B))

where u(X_R), u(X_G), and u(X_B) denote the average pixel values of the three color channels, respectively, and the subtraction is applied channel-wise.
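The per-image channel-mean subtraction above amounts to one line of NumPy (a minimal sketch; the function name is ours):

```python
import numpy as np

# Subtract each channel's own mean, computed on this image only,
# so every image ends up with zero-mean R, G, and B channels.
def normalize_image(img):
    # img: H x W x 3 float array
    return img - img.mean(axis=(0, 1), keepdims=True)

img = np.random.default_rng(0).uniform(0, 255, (4, 4, 3))
norm = normalize_image(img)
print(np.allclose(norm.mean(axis=(0, 1)), 0))  # True
```

Because the mean is taken per image rather than over the dataset, a dark image and a bright image are both centered on zero, which is the stated goal of the normalization.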

E. CLASSIFICATION FRAMEWORK BASED ON DECISION FUSION
Ensemble learning builds a set of base classifiers from the training dataset, obtains a prediction from each base classifier, and finally classifies by voting. Ensemble learning is not a single classifier but a combination of classifiers. Based on traditional ensemble strategies, we propose a CNN image classification framework, as shown in Fig. 5. There are many evaluation metrics for medical image classification, and it is difficult for an individual CNN to perform well on all of them. Therefore, we integrate multiple CNNs. Each fusion unit is called a block, and each block contains n CNNs, each of which classifies the input dermoscopy image. Each CNN produces a softmax output, and the softmax outputs of the n CNNs are summed; call this sum m. There are k classes of dermoscopy images in the experiment, so each softmax output contains k values, representing the probability of the input image belonging to each class. Dividing m by n gives the mean softmax of the block. There are 8 classes of dermoscopy images in this experiment, so k is 8. Considering computational cost, each block contains 5 CNNs, so n is 5. The flow chart is presented in Fig. 6.
In CNNs, the output of the last fully connected layer is usually used as the input of the softmax. The softmax maps real numbers to values between 0 and 1. The softmax function is defined as follows:

y_j = e^{z_j} / Σ_{l=1}^{k} e^{z_l}

where y_j, z_j, and k denote the probability of class j after softmax mapping, the output of class j from the last fully connected layer, and the total number of dermoscopy image classes, respectively. After obtaining the mean softmax of a block, the maximum of its k values is found, and the dermoscopy image class corresponding to that maximum is the block's predicted result. The results of the other blocks are obtained in the same way. Finally, voting is performed, and the dermoscopy image class with the most votes is the prediction of the entire framework. If several classes tie for the most votes, one of them is selected at random as the prediction. For k classes, we obtain k voting tallies, denoted S_k. Considering computational cost, 6 blocks are used here. The proposed decision fusion differs from traditional fusion in how the fusion is performed: traditional fusion simply averages the prediction probabilities of multiple CNNs or simply accumulates their prediction results, whereas the proposed method fuses through blocks.
Based on the analysis above, we designed the framework in four steps:
1) Step 1. Select the dermoscopy images: We choose clear and non-deformed dermoscopy images as input for the CNNs.
2) Step 2. Calculate softmax values: We use the softmax function to map the output of each CNN to values between 0 and 1, obtaining the probability that each dermoscopy image belongs to each class.
3) Step 3. Calculate the result of each block: We average the softmax outputs of the CNNs in each block and select the dermoscopy image class corresponding to the maximum value as the block's output.
4) Step 4. Perform image classification: By accumulating the outputs of multiple blocks, the class with the maximum final tally is taken as the final prediction of the entire framework.
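The block-then-vote scheme above can be sketched as follows, with stored softmax outputs standing in for live CNN inference (the array shapes and function names are illustrative assumptions):

```python
import numpy as np

# Sketch of decision fusion: each block averages the softmax outputs
# of its n CNNs and votes for one class; the class with the most
# votes across blocks wins, with random tie-breaking.
rng = np.random.default_rng(0)

def block_prediction(softmax_outputs):
    # softmax_outputs: n x k array, one row per CNN in the block
    return int(np.argmax(softmax_outputs.mean(axis=0)))

def decision_fusion(blocks, k=8):
    votes = np.zeros(k, dtype=int)
    for b in blocks:
        votes[block_prediction(b)] += 1
    winners = np.flatnonzero(votes == votes.max())
    return int(rng.choice(winners))   # random tie-break, as in the text

# 6 blocks of 5 CNNs over k = 8 classes, all favoring class 2
blocks = [np.tile(np.eye(8)[2], (5, 1)) * 0.9 + 0.0125 for _ in range(6)]
print(decision_fusion(blocks))  # 2
```

Averaging inside a block before voting is what distinguishes this from plain averaging (one big mean over all CNNs) and from plain voting (one vote per CNN).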

III. EXPERIMENT RESULTS AND ANALYSIS
Experimental images are from the ISIC 2019 Skin Lesion Analysis Towards Melanoma Detection Challenge dataset. The dataset consists of 25331 dermoscopy images, and the ground truth, based on manual delineation by clinical experts, is also available from the challenge. A classification threshold of 0.5 is used, following ISIC 2019. For the optimizer, Adam, SGD, and RMSprop were tried. Different optimizers were found to have little effect on the classification of dermoscopy images, and the more stable Adam was finally used. Experiments were implemented with the PyTorch library on a computer equipped with four NVIDIA Tesla P100 GPUs with 16 GB of memory. We obtain weights from existing pre-trained models. The CNNs were trained for 200 epochs with a batch size of 32 and a learning rate of 0.0001.

A. EVALUATION METRICS
In order to comprehensively evaluate the classification performance of the model, accuracy (ACC), sensitivity (SE), specificity (SP), average precision (AP), and area under the ROC curve (AUC) are used as evaluation metrics. In the experiment, the CNN models with better classification of dermoscopy images were selected with ACC and AUC as the main evaluation metrics. ACC, the ratio of correctly classified samples to all samples, evaluates the classifier's judgment of the overall sample. SE and SP are important metrics in medical diagnosis. SE is also called the true positive rate or recall; the higher its value, the lower the probability of missed diagnosis. SP is also called the true negative rate; the higher its value, the lower the probability of false alarms. The AP metric is computed from precision and recall: as a summary of the precision-recall relationship, its value lies between 0 and 1, and the higher the value, the better the classifier. To evaluate the classifier more accurately on unbalanced data, AUC was introduced. AUC is defined as the area under the ROC (receiver operating characteristic) curve, which is a plot of the true positive rate (TPR) against the false positive rate (FPR). The metrics are defined as:

ACC = (N_tp + N_tn) / (N_tp + N_tn + N_fp + N_fn)
SE = N_tp / (N_tp + N_fn)
SP = N_tn / (N_tn + N_fp)
P = N_tp / (N_tp + N_fp), R = N_tp / (N_tp + N_fn)
AUC = ∫_0^1 t_pr d f_pr (8)

where N_tp, N_tn, N_fp, N_fn, P, R, t_pr, and f_pr denote the number of true positives, true negatives, false positives, and false negatives, the precision, the recall, the true positive rate, and the false positive rate, respectively, all defined at the image level. A lesion image is considered a true positive if it is predicted as a lesion; otherwise it is regarded as a false negative. A non-lesion image is considered a true negative if it is predicted as non-lesion; otherwise it is regarded as a false positive.
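These definitions can be checked with a small sketch: ACC, SE, and SP come directly from the four counts, and AUC is the trapezoidal area under the ROC curve built by sweeping the score threshold (the function names are ours, and the AUC sketch assumes no tied scores):

```python
import numpy as np

# Image-level counts -> ACC, SE, SP as defined in the text.
def basic_metrics(n_tp, n_tn, n_fp, n_fn):
    acc = (n_tp + n_tn) / (n_tp + n_tn + n_fp + n_fn)
    se = n_tp / (n_tp + n_fn)   # sensitivity / recall / TPR
    sp = n_tn / (n_tn + n_fp)   # specificity / TNR
    return acc, se, sp

# AUC = integral of TPR over FPR, approximated by trapezoids.
def auc_score(labels, scores):
    order = np.argsort(-np.asarray(scores))       # descending score
    labels = np.asarray(labels)[order]
    tpr = np.cumsum(labels) / labels.sum()
    fpr = np.cumsum(1 - labels) / (1 - labels).sum()
    return np.trapz(np.r_[0.0, tpr], np.r_[0.0, fpr])

print(basic_metrics(80, 90, 10, 20))                   # (0.85, 0.8, 0.9)
print(auc_score([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))   # 1.0
```

A perfectly separating scorer, as in the second example, yields AUC = 1.0; a random scorer tends toward 0.5. Library implementations such as scikit-learn's `roc_auc_score` additionally handle tied scores.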

B. COMPARISON OF CLASSIFICATION EFFECTS UNDER DIFFERENT DATASETS
In general, medical image datasets are small and unevenly distributed, which poses a huge challenge for CNN training. Therefore, in this experiment, dermoscopy images are generated with StyleGANs, thereby enhancing the dataset and alleviating its small size and uneven distribution. The comparison of CNN classification performance before and after dataset enhancement is shown in Table 1. We selected three CNNs, InceptionV3, ResNet50, and VGG16BN, to compare dermoscopy image classification on the original dataset and on the dataset enhanced by StyleGANs. The experiments show that all three CNNs trained on the enhanced dataset improve to some degree on all evaluation metrics. In particular, on the two important metrics AUC and SE, InceptionV3 increased by 3.5% and 7.9%, ResNet50 by 2.4% and 6.3%, and VGG16BN by 1.6% and 3.3%, respectively.

C. COMPARISON OF CLASSIFICATION EFFECTS UNDER DIFFERENT IMAGE SIZES
Most cropping causes skin lesions to deform, and shape contour information is an important basis for discriminating skin lesion classes. Therefore, we tried a variety of image sizes to find the optimal one; the experimental results are shown in Table 2.
Comparing the experimental results across input sizes, we find that with 600 × 600 input, the CNN models achieve better evaluation metrics than with 400 × 400 or 224 × 224 input. The experiments show that relatively complete dermoscopy image information benefits feature extraction and lets the CNN models perform better to some extent. The gap between 600 × 600 and 224 × 224 is the most obvious: on the three metrics AUC, AP, and SE, InceptionV3 increased by 22.2%, 38.9%, and 37.1%, ResNet50 by 14.9%, 32.8%, and 28.1%, and VGG16BN by 17.3%, 26.2%, and 24.5%, respectively.

D. COMPARISON EFFECTS OF DIFFERENT CLASSIFICATION METHODS
We compared decision fusion with individual CNNs and with traditional fusion methods. The comparison covers 43 CNNs, and the traditional fusion methods include averaging and voting. The experimental results are shown in Table 3.
According to Table 3, different CNNs perform differently on different evaluation metrics, and it is difficult for an individual CNN to perform particularly well on all of them. For example, in ACC, Xception reaches 97.7%. In AUC, DenseNet201 performs well, reaching 94.9%. In AP, InceptionV3 reaches 88.7%. In SE, the standout is SE_ResNeXt50_32 × 4d at 83.5%. In SP, InceptionV3 performs well at 98.3%. When we use an individual model, we often encounter generalization problems: the generalization ability of a model is limited by objective factors, and a model that performs well on certain problems may be unsatisfactory on other datasets. Therefore, we fuse strong models to overcome the limited generalization ability of an individual model on unknown datasets, combining the advantages of multiple models to obtain a better solution on the same datasets. To address the generalization capability of an individual CNN model, we propose decision fusion and compare it with traditional fusion strategies. In general, classification through fusion outperforms an individual CNN. However, the CNNs selected for fusion matter: if poorly performing CNNs are selected, the fused model may perform worse. Accordingly, the CNNs selected for each block should perform as well as possible on the various evaluation metrics, especially ACC and AUC, and the CNNs within each block should be as diverse as possible, i.e., differ from one another. In this experiment, three groups of CNNs were selected for averaging, voting, and decision fusion, including models such as ResNet50, SE_ResNeXt50_32 × 4d, SE_ResNeXt101_32 × 4d, BN-Inception, and SE_ResNet152.
Because our proposed method requires multiple different CNNs, and different CNNs affect the method differently, we mainly compare the overall impact of each group of CNNs on our method, so not all 43 CNNs need to be used. The three selected groups of CNNs therefore differ in their overall classification performance. Our design uses 6 blocks with 5 CNNs in each block. We first divided the selected CNNs into three groups of 15 CNNs each and numbered the 15 CNNs in each group sequentially. The first three CNNs of each group are placed into all 6 blocks, and the remaining two slots in each block are filled from the other 12 CNNs without repetition. Averaging1, Averaging2, Averaging3, Vote1, Vote2, Vote3, Decision Fusion1, Decision Fusion2, and Decision Fusion3 denote the averaging, voting, and decision fusion strategies based on the first, second, and third groups of CNNs, respectively. According to Table 3, based on the first group of CNNs, the voting method classifies poorly, with an SE of only 44%; because most CNNs in the first group are not very effective, they degrade the fusion to some extent. The third group contains CNNs with better classification performance than the first and second groups. In ACC, Averaging3 and Decision Fusion3 perform very well, reaching 99.5%. In SP, Vote3 and Decision Fusion3 reach 99.6%. On the other evaluation metrics, Decision Fusion3 outperforms the other CNNs and fusion models; for example, in AUC, Decision Fusion3 is 4% better than DenseNet201, and in SE, it is 54.3% higher than Vote1. Therefore, adopting a fusion strategy is worthwhile, but attention must be paid to the classification performance of the CNNs being merged. Within the third group, Decision Fusion3 and Averaging3 perform equally well in ACC, but Decision Fusion3 improves on the other evaluation metrics.
For example, in terms of AUC and SE, Decision Fusion3 increased by 0.9% and 1.1%, respectively. Decision Fusion3 and Vote3 behave similarly on SP, but in other respects Decision Fusion3 shows significant improvement, with SE in particular increasing by 21.8%. The proposed decision fusion thus improves on the traditional fusion strategies.

E. COMPARISON OF LOSS
In general, the larger the loss function value, the smaller the probability the classifier assigns to the true label, and the worse the performance. Through ten-fold cross validation, the loss is used to measure model quality. Several typical CNN models were selected for loss comparison, as presented in Fig. 7.
It can be seen from the loss curves that DenseNet201 has the fastest convergence rate, with the loss stabilizing at 0.23 after about 25 epochs, while the other CNN models converge after about 45 epochs. According to Table 3 and Fig. 7, DenseNet performs well, and we found that deeper and more recent CNN models usually perform better in dermoscopy image classification, though not always.

F. DERMOSCOPY IMAGES CORRESPONDING TO PATIENT INFORMATION
Some of the dermoscopy images in the dataset include patient information such as age, gender, and lesion location. We counted the various classes of information and the number of patients with each class of skin cancer, as shown in Table 4.
According to Table 4, BCC and SCC were about 1.5 and 2 times as common in men as in women, respectively. Lesions with NV are mainly concentrated on the anterior torso and lower extremities. BCC is mainly found on the head, neck, and anterior torso. DF is more likely to appear on the lower extremities. Skin cancer patients are mainly middle-aged, and infants have a higher probability of having NV than other classes of skin cancer.

IV. DISCUSSION
Deep learning provides an end-to-end learning paradigm: the whole learning process is not divided into artificial sub-problems but is passed entirely to the deep learning model, which directly learns the mapping from raw data to expected output. Traditional image recognition often uses divide and conquer, splitting the task into steps such as preprocessing, feature extraction and selection, and classifier design. The motivation is to divide the original image recognition problem into small sub-problems that are simple, controllable, and well defined. However, although optimal solutions can be obtained for the sub-problems, optimality on sub-problems does not guarantee an optimal solution to the global problem. In contrast, end-to-end learning eliminates the need for data annotation before each independent sub-task; labeling samples is expensive and error-prone. The quality of medical image datasets is important for deep learning, but such datasets are usually unevenly distributed and scarce, so models cannot be trained adequately, which easily leads to over-fitting and reduced performance. Although researchers now share data to alleviate this problem, medical image data involves issues such as personal privacy, which poses a huge challenge to obtaining large, diverse, balanced medical image datasets. In addition, some medical cases are rare and hard to access. In the medical field, our proposed method can help doctors in diagnosis and treatment and enhance medical image datasets.
Considering the characteristics of deep learning and medical image datasets, GANs can alleviate this problem to some extent. The images generated by GANs do not involve sensitive issues such as personal privacy, and at the same time provide high-quality datasets for deep learning. As a relatively mature family of deep learning models, CNNs show outstanding performance in image processing. StyleGANs can learn high-level attributes and stochastic variation in images to generate high-quality images, and CNNs can extract key features from these generated images to improve classification. In addition, we use complete dermoscopy images to ensure that no image information is lost, and CNNs can be trained more effectively by weighting the loss function and preprocessing the images. Although CNNs have strong feature extraction capabilities, an individual CNN has certain limitations and cannot be guaranteed to perform well across different problems. Therefore, we propose a decision fusion method: through transfer learning, based on multiple pre-trained CNNs, we combine several CNNs into blocks and make the final decision across multiple blocks. Decision fusion overcomes the limited generalization capability of an individual CNN model and is more robust and stable than the traditional fusion strategy. However, decision fusion places certain requirements on the choice of CNNs: the CNNs with the best individual performance should be selected so as to maximize the performance of the fusion. Medical images contain a great deal of information; shape and contour information is an important basis for classifying skin lesion classes, so complete medical images are very valuable for research. Furthermore, the patient metadata associated with each medical image also provides useful guidance for the research.
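The block-based decision fusion described above can be sketched as follows. This is a minimal numpy illustration, not our exact implementation: it assumes each block fuses its member CNNs by averaging their softmax outputs and that the final decision is a majority vote over the block-level predictions; other intra-block and inter-block rules are possible.

```python
import numpy as np

def block_predict(prob_list):
    """Fuse the softmax outputs of the CNNs inside one block by averaging."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

def decision_fusion(blocks):
    """Final decision across blocks by majority vote over block-level predictions.

    blocks: list of blocks; each block is a list of (n_samples, n_classes)
            softmax probability arrays, one per CNN in that block.
    """
    # Each block votes for one class per sample.
    block_labels = [np.argmax(block_predict(b), axis=1) for b in blocks]
    votes = np.stack(block_labels, axis=0)            # (n_blocks, n_samples)
    n_classes = blocks[0][0].shape[1]
    # Count the votes each class received, per sample.
    counts = np.apply_along_axis(np.bincount, 0, votes, minlength=n_classes)
    return np.argmax(counts, axis=0)                  # (n_samples,)
```

For example, with two blocks built from hypothetical CNN outputs `a` and `b`, `decision_fusion([[a, b], [a]])` returns one class label per sample. Averaging inside a block smooths out the idiosyncrasies of any single CNN, while voting across blocks keeps one poorly performing block from dominating the final decision.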
The computer plays an auxiliary role in diagnosis; for important decisions, its output must be combined with human clinical experience. Computer-aided diagnosis can not only alleviate the image-reading burden on primary-care clinicians but also enhance people's health awareness. With the Internet and smart devices now ubiquitous, porting computer-aided diagnostics to mobile devices allows patients to be detected and treated earlier. For skin cancer, if treatment is timely, the curative effect and prognosis are favorable.

V. CONCLUSION
The dermoscopy image presents a huge challenge to automatic classification due to its inherent complexity. In this paper, we train StyleGANs on the ISIC 2019 dataset; the trained StyleGANs generate high-quality images in batches, and the more realistic images are selected and added to the original dataset. Based on the augmented dataset, we trained 43 pre-trained CNNs and selected three groups of CNNs from them to compare different fusion strategies. The experiments show that selecting CNNs with better individual classification results benefits the fusion. Decision fusion yields a significant improvement over an individual CNN on multiple evaluation metrics and is more stable than the traditional fusion strategy. The proposed method not only classifies dermoscopy images automatically but also augments the dataset and alleviates its uneven class distribution. In general, skin cancer has a better prognosis if it is found and treated earlier.
Computer-aided diagnosis can not only help doctors, but also enhance people's health awareness.