Skin Cancer Detection Using Combined Decision of Deep Learners

Cancer is a deadly disease that arises from the uncontrolled growth of body cells. Every year, a large number of people succumb to cancer, and it has been labeled one of the most serious public health problems. Cancer can develop in any part of the human body, which consists of trillions of cells. One of the most frequent types of cancer is skin cancer, which develops in the upper layer of the skin. Previously, machine learning techniques have been used for skin cancer detection using protein sequences and different kinds of imaging modalities. The drawback of these machine learning approaches is that they require human-engineered features, which are laborious and time-consuming to produce. Deep learning addresses this issue to some extent by providing automatic feature extraction. In this study, convolution-based deep neural networks are used for skin cancer detection on the public ISIC dataset. Cancer detection is a sensitive task: errors are costly if the disease is not detected accurately and in time. The performance of individual machine learning models in detecting cancer is limited, and the combined decision of individual learners is expected to be more accurate than that of any single learner. Ensemble learning exploits the diversity of learners to yield a better decision. Thus, prediction accuracy can be enhanced by combining the decisions of individual learners for sensitive tasks such as cancer detection. In this paper, an ensemble of deep learners has been developed using VGG, CapsNet, and ResNet for skin cancer detection. The results show that the combined decision of the deep learners is superior to that of the individual learners in terms of sensitivity, accuracy, specificity, F-score, and precision. The experimental results of this study provide a compelling reason to apply the approach to the detection of other diseases.


I. INTRODUCTION
The life cycle of cells in the human body normally follows a systematic order: the creation, functional lifetime, and death of each cell must follow this order for the body to function properly. When the order is disturbed, various kinds of diseases result, one of which is cancer.
The associate editor coordinating the review of this manuscript and approving it for publication was Humaira Nisar.
Cancer can arise anywhere in the human body, which consists of trillions of cells. In cancer, some of the body's cells begin to divide without stopping and spread into surrounding tissues. Normally, human cells grow and divide to produce new cells as the body requires them. When cells grow old or become damaged, they die and are replaced by new cells. This orderly process breaks down once cancer develops, and cells become increasingly abnormal: old or damaged cells survive when they should die, and new cells form when they are not needed. These unnecessary cells can divide without stopping and may cause the growth of a tumor [1].
Tumors are of two types, benign and malignant. Tumors that consist of cancerous cells are malignant. Malignancy means that the cells can spread into or invade tissues near the tumor. As malignant tumors grow, some cancerous cells can break off and travel to distant parts of the body through the lymph system or the blood, forming new tumors far from the original tumor. In contrast, benign tumors do not spread into or invade nearby tissues, although they can be quite large. Once removed, benign tumors usually do not grow back, whereas malignant tumors sometimes do. Benign tumors elsewhere in the body are mostly not dangerous, but a benign tumor in the brain can be life-threatening [2]. Dermoscopy image samples of benign and malignant lesions are shown in Fig. 1 and Fig. 2, respectively.
The skin protects the human body against sunlight, heat, injury, and viral or bacterial infections. It also helps regulate body temperature and stores fat and water. Skin cancer is a very common kind of cancer and is regarded as a major public health issue. Skin cancer can start anywhere on the skin, but it most commonly begins in skin exposed to sunlight. The skin has several layers, and skin cancer usually begins in the outermost layer, known as the epidermis.
There are many categories of skin cancer. Among them, basal cell and squamous cell skin cancers are known as non-melanoma skin cancers. Non-melanoma skin cancer responds well to treatment and rarely spreads to other parts of the body. Malignant skin lesions are of two types: melanocytic lesions, such as melanoma, and non-melanocytic lesions, such as basal cell carcinoma. Melanoma is the most aggressive and severe type of skin cancer, although it is less frequent [2]. If melanoma is not diagnosed in time, it is likely to invade nearby tissues and spread to other parts of the body. The number of recorded melanoma cases increases every year. The Melanoma Foundation [3], a well-known cancer institute, predicted 9730 deaths due to melanoma in the United States. Additionally, it reported 87,110 recorded cases, an estimated 200% increase since 1973.
Machine learning and deep learning techniques for cancer detection using image data and protein sequences have been developed in the literature. Machine learning techniques require human-engineered features. Manual feature extraction depends on the expert's subjective judgment and is tedious and time-consuming. Deep learning approaches resolve this problem to some extent by performing automatic feature engineering. Owing to this property, deep convolutional neural networks (DCNNs) have gained popularity in recent years, and researchers have used them to solve problems in many domains, including medical image classification. In addition, ensemble learning methods have recently been presented as a way to improve classification performance [14], [15], [16].
Unfortunately, because large amounts of medical data are unavailable, the performance of deep learning models is limited. As there are not enough labeled medical images, researchers commonly use transfer learning, which stores knowledge gained while solving one problem and applies it to a different but related problem. Further, the performance of individual learners is limited when making decisions on sensitive tasks such as cancer detection. This problem can be addressed by combining the decisions of individual learners: the combined decision is expected to be more accurate than that of any individual learner, so the detection accuracy of skin cancer can be increased. In the presented work, an ensemble of deep learners using VGGNet, CapsNet, and ResNet has been developed. The results reveal that the proposed ensemble model performs better than the individual deep learners.
The rest of the paper is arranged as follows: related work is presented in Section II, the proposed methodology is described in Section III, results and discussion are presented in Section IV, and the conclusion along with future work is given in Section V.

II. RELATED WORK
A large amount of work has been done on skin cancer detection using machine learning (ML) and deep learning (DL) approaches. Machine learning approaches perform skin lesion detection by extracting hand-crafted features from dermoscopy images. Waheed et al. devised a machine learning technique for detecting melanoma in dermoscopic images [4]. Mohsin et al. performed cancer detection using discriminating information from mutated genes in protein amino acid patterns [5]. Abdul M. et al. developed a cancer prediction technique using nearest neighbor and support vector machine classifiers [6]. Melanoma detection was performed in [7] using a support vector machine applied to segmented images. These ML approaches require handcrafted features and are limited by the expertise of dermatologists.
On the other hand, DL models provide automatic feature extraction and have shown significant performance on various medical image classification tasks. Alizadeh et al. [34] proposed a deep learning system for the automatic detection of skin cancer. They used an ensemble approach that combines two CNN models with other classifiers and image texture feature extraction. The system was evaluated with different metrics on the ISIC 2016, ISIC 2019, and PH2 datasets. Jaing et al. [35] proposed a deep-learning-based model called the residual attention network to differentiate 11 types of skin diseases; it was trained on a dataset of 1167 histopathological images that they gathered over a span of 10 years. They used reinforced feature learning to obtain the area of interest in an image and a class activation map to obtain a visual explanation from the proposed network. Zhang [36] proposed a model to detect melanoma using the CNN model EfficientNet-B6 trained on the ISIC 2020 dataset. The author claimed to be the first to use this model for skin cancer detection with transfer learning and evaluated the system using the area under the receiver operating characteristic curve (AUC). Yuan et al. performed segmentation of skin lesions using a deep CNN [8]. Yu et al. used very deep residual networks to perform automatic melanoma identification in dermoscopy images [9]. Bi et al. employed multi-stage fully convolutional neural networks for dermoscopic image segmentation [10]. Ulzii-Orshikh Dorj et al. performed skin cancer classification with a convolutional neural network [11]. Esteva et al. performed dermatologist-level classification of skin cancer using deep learning [12]. Mahbood et al. classified skin lesions using a combination of deep neural networks [13].
VOLUME 10, 2022
Ensemble networks are now commonly used by researchers to improve classification performance. Generally, the models are trained individually, and the final result is obtained by combining their predictions using techniques such as majority voting and stacking. Aboulmira et al. [33] proposed an ensemble network for skin lesion classification. Features are extracted by the individual models, and the models are then combined to achieve a better classification rate. The proposed ensemble of seven predictors was evaluated on the publicly available ISIC-2018 dataset and yielded better performance than existing methods. Early detection of skin cancer was performed in [32] using an improved capsule network (CapsNet) named FixCaps. With a large kernel size of 31 × 31, the method obtains a larger receptive field than the baseline CapsNet, which not only improves detection performance but also reduces the computational overhead. Cao et al. [31] proposed an inter-pixel correlation learning (ICL) network for early-stage skin lesion detection that preserves both short-term and long-term correlations. The model is based on an encoder-decoder architecture in which local semantic correlations are strengthened using local neighborhood metric learning (LNML) and global information is captured using pyramid transformer inter-pixel correlations (PTIC). This two-stage framework increases inter-class variance and intra-class consistency, yielding better segmentation performance on public challenge datasets. Javaid et al. [45] proposed machine-learning-based skin cancer segmentation and classification using dermoscopic images. Image segmentation was performed using the Otsu thresholding method, and various features were extracted for the ML classifier, i.e. the gray level co-occurrence matrix (GLCM), histogram of oriented gradients (HOG), and color features. In another ML-based study, skin cancer diagnosis was performed on images of the skin [46]: a median filter was first applied to the skin image, and mean shift segmentation was then used for segmentation.
Araújo et al. [37] proposed a deep-learning-based system that uses the U-net model for the classification of skin cancer. They developed automatic segmentation of skin cancer images with U-net, combined with post-processing techniques that aim to improve and restore the images. They used the PH2 and DermIS datasets to train the network and validated it in terms of sensitivity, specificity, accuracy, AUC, Dice, and Jaccard. Subramanian et al. [38] employed a CNN to detect and classify different types of skin cancer. They used HAM10000 ("Human Against Machine with 10000 training images"), which contains 10015 skin cancer images, and validated their model by comparing it with other state-of-the-art models in terms of accuracy, precision, recall, and F-score. Ali et al. [43] proposed a deep convolutional neural network model for the classification of skin cancer. Before training, they preprocessed the dataset to remove noise. After training on the HAM10000 dataset, they compared their model with other deep learning models such as AlexNet, ResNet, VGG-16, DenseNet, and MobileNet. Hemsi et al. [47] performed skin cancer detection using deep neural networks. Their framework, evaluated on the HAM10000 dataset, creates both a plain and a hierarchical classifier to classify seven types of moles. An improved VGG model for skin cancer diagnosis was presented in [48]. The baseline VGG model was improved by adding batch normalization and a fully connected network to improve diagnostic performance.
Bajwa et al. [17] proposed an ensemble model that combines optimized models, namely ResNet-152 [18], SE-ResNeXt-101 [19], DenseNet-161 [18], and NASNet [19], to classify seven different types of skin cancer using the ISIC dataset. An ensemble is a machine learning strategy that increases classification accuracy by combining the decisions of numerous independent learners [20]. It takes advantage of the diversity of the separate models to generate a combined judgment; as a result, it is expected to improve classification accuracy [21], [22]. Pacheco et al. [39] performed skin cancer classification using different deep learning models and combined metadata with the images through an information fusion technique. They trained EfficientNet, DenseNet-121, MobileNet-v2, ResNet-50, and VGG-13 on two datasets, ISIC 2019 and PAD-UFES-20, together with patients' clinical features such as age, gender, anatomical region, cancer history, and skin prototype. Jusman et al. [40] trained two deep learning models, VGG-16 and a multilayer perceptron, to classify skin cancer using the HAM10000 dataset, which contains 10015 dermoscopic skin cancer images. They compared the two models in terms of accuracy and computation time and concluded that VGG-16 performs better than the multilayer perceptron. Togacar et al. [41] proposed a model to detect skin cancer using the convolutional model MobileNetV2 together with a spiking network, and restructured the training dataset with an autoencoder model. They used an ISIC skin cancer dataset consisting of 1497 malignant tumor images and 1800 benign images. Nawaz et al. [42] proposed a deep-learning-based strategy that combines a faster region-based convolutional neural network (Faster RCNN) with fuzzy k-means clustering to provide a fully automated method for segmenting cutaneous melanoma at its earliest stage. Before applying the model, they preprocessed the dataset to remove noise and illumination problems. The ISBI-2016, ISIC-2017, and PH2 datasets were used to train and evaluate the model. Two-class (benign, malignant) skin cancer classification has been performed in [9], [23], [24], and [25].
The decision-making performance of individual learners is limited, a limitation that may be overcome by merging their decisions. The combined decision is expected to be more accurate than that of any individual learner, so the detection accuracy of skin cancer can be improved. In this research, an ensemble of deep learners was constructed using VGGNet, CapsNet, and ResNet, and the results demonstrate that the proposed ensemble outperforms the individual learners.

III. PROPOSED METHODOLOGY
The proposed deep-learning-based ensemble approach is developed in two stages. In the first stage, three deep learning models, VGG, CapsNet, and ResNet, are developed using malignant and benign images obtained from the International Skin Imaging Collaboration (ISIC) skin cancer image repository. In the second stage, the decisions of the deep learners are combined using majority voting. The block diagram of the proposed approach for skin lesion detection is shown in Fig. 3.

A. DATASET
The dataset, consisting of cancerous and non-cancerous images, was downloaded from the International Skin Imaging Collaboration (ISIC) image repository [26]. ISIC is a collaboration between academia and industry that aids the development of digital skin imaging applications to help reduce melanoma mortality. ISIC is developing standards to address the terminologies, technologies, and techniques used in skin imaging, with special attention to major issues such as interoperability and privacy. The ISIC 2019 repository contains a dataset of 25000 images of different cancerous and non-cancerous types. In the proposed ensemble approach, binary classification is performed using 3000 malignant and 2800 benign images. Since the dataset contains only 2800 benign images, the malignant images are taken in a number comparable to that of the benign images to avoid biasing the algorithm. The dataset is divided into a training set consisting of 80% of the images and a test set consisting of the remaining 20%. The images are of different sizes and are resized to 224 × 224 × 3 in the proposed approach.
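The class balancing and 80/20 split described above can be sketched as follows. This is only an illustration under stated assumptions: the file names are hypothetical placeholders, the function name `balanced_split` is not from the paper, and the larger class is undersampled to exactly the size of the smaller one, which is one possible reading of the balancing step.

```python
import random

def balanced_split(malignant, benign, train_fraction=0.8, seed=0):
    """Undersample both classes to the smaller class size, then split 80/20.

    Labels: 1 = malignant, 0 = benign. Returns (train, test) lists of
    (item, label) pairs.
    """
    rng = random.Random(seed)
    n = min(len(malignant), len(benign))        # size of the smaller class
    cut = int(round(train_fraction * n))        # training items per class
    train, test = [], []
    for label, items in ((1, malignant), (0, benign)):
        chosen = rng.sample(list(items), n)     # random subset of size n
        train += [(x, label) for x in chosen[:cut]]
        test += [(x, label) for x in chosen[cut:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Hypothetical file names standing in for the ISIC images.
malignant = [f"malignant_{i}.jpg" for i in range(3000)]
benign = [f"benign_{i}.jpg" for i in range(2800)]
train, test = balanced_split(malignant, benign)
print(len(train), len(test))
```

Splitting each class separately keeps the class ratio identical in the training and test sets, which is a common choice when the goal is to avoid bias toward one class.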

B. DEEP NEURAL NETWORK
To develop the proposed ensemble, three convolution-based deep neural network models, VGG, CapsNet, and ResNet, have been developed. The architectural details of the models are described below.

1) VGGNet
VGGNet is one of the most widely used CNN models. Its popularity stems from its simplicity and its use of small convolutional kernels.
For feature extraction and classification, the VGGNet architecture employs 3 × 3 convolution kernels with max-pooling and ReLU layers, as well as three fully connected layers. The use of smaller kernels results in fewer parameters and, as a result, more efficient training and testing. Furthermore, the effective receptive field can be expanded by stacking a sequence of 3 × 3 kernels (e.g. 5 × 5 with two layers, 7 × 7 with three layers, and so on). Most crucially, smaller filters allow more layers to be stacked, resulting in a deeper network and higher performance on vision tasks. This is the architecture's central idea: deeper networks enable better feature learning.
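The receptive-field and parameter argument above can be checked with a few lines of arithmetic. The sketch below assumes stride-1 convolutions, equal input and output channel counts, and ignores biases; the function names and the channel count of 64 are illustrative choices.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel - 1)

def conv_weights(kernel, channels, num_layers=1):
    """Weight count (biases ignored) of num_layers conv layers, each with
    `channels` input channels and `channels` output channels."""
    return num_layers * kernel * kernel * channels * channels

C = 64  # illustrative channel count
print(stacked_receptive_field(2), stacked_receptive_field(3))
print(conv_weights(3, C, num_layers=3), conv_weights(7, C))
```

Three stacked 3 × 3 layers cover the same 7 × 7 region as a single 7 × 7 layer while using 27C² instead of 49C² weights, and they add two extra non-linearities along the way.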
The VGG model consists of five blocks after the input layer. In the proposed model, the input layer of VGG reads pre-processed images of size 224 × 224. The first block contains 2 convolutional layers, each with 64 filters, followed by a pooling layer. After pooling, the feature map is reduced to a size of 112 × 112. Each block has almost the same pattern of layers. In the second block, the first and second convolutional layers have 128 filters and are followed by pooling, which reduces the features to a size of 56 × 56. The third block consists of three convolutional layers with 256 filters, followed by max-pooling that reduces the feature map to 28 × 28. The fourth block also has 3 convolutional layers, with 512 filters, followed by a pooling layer that shrinks the feature map to 14 × 14. The fifth and last block has three convolutional layers with 512 filters, followed by a pooling layer that further reduces the feature map to 7 × 7. The detailed VGGNet architecture is shown in Fig. 4.

2) CAPSULE NETWORK (CapsNet)
Convolutional neural networks have been very successful in deep learning. However, several properties of convolutional neural networks differ from the human brain and limit their effectiveness. First, complex engineered systems may have different levels of structure, whereas neural networks have a very low level of structure. The layers of neurons in a neural network are organized differently from the layers of the cortex. Another drawback of neural networks is that they have no clear notion of an entity. The capsule network proposed by Geoffrey Hinton addresses these issues. It is an attempt to mimic more closely the organization of neurons in the human brain. The schematic of the CapsNet architecture is given in Fig. 5.
The capsule network uses dynamic routing between capsules, which takes full advantage of the inherent spatial and dimensional relationships in the image. CapsNet is better able to recognize an object when the image changes, and thus has greater generalization ability. The drawback of spatial invariance in CNNs is removed in the capsule network by using routing-by-agreement instead of max-pooling for subsampling. The capsule network consists of one convolutional layer, followed by the primary capsules and then the digit capsules.
The output of the convolutional layer of CapsNet can be calculated as:

$$x^{k}_{i,j} = x_{o,k} + \sum_{s=1}^{K} \sum_{t=1}^{K} x_{k,s,t}\, x^{\mathrm{in}}_{i+s-1,\,j+t-1}$$

where $x^{k}_{i,j}$ is the input of the $(i,j)$-th neuron in the $k$-th feature map, $x_{o,k}$ denotes the displacement (bias) of the neurons in the $k$-th feature map, $K$ represents the size of the kernel, $x_{k,s,t}$ is the weight coefficient of the $(s,t)$-th synaptic connection of the neurons in the $k$-th feature map, and $x^{\mathrm{in}}_{i,j}$ represents the value of the input signal of the $(i,j)$-th neuron of the input zone.

In the digit caps layer, the output of the $j$-th capsule is calculated using the squashing function, which shrinks short vectors to almost zero length and long vectors to a length slightly less than 1:

$$v_j = \frac{\|S_j\|^2}{1 + \|S_j\|^2}\,\frac{S_j}{\|S_j\|}$$

where $S_j$ is the total input of the $j$-th capsule in the digit caps layer:

$$S_j = \sum_i C_{i,j}\,\hat{u}_{j|i}$$

where $C_{i,j}$ is the weighting factor of the degree of coherence between the $i$-th capsule of the primary capsule layer and the $j$-th capsule of the digit caps layer, and $\hat{u}_{j|i}$ is the predicted outcome of the $i$-th capsule in the primary capsule layer:

$$\hat{u}_{j|i} = W_{i,j}\,U_i$$

where $W_{i,j}$ is the weight coefficient matrix between the $i$-th capsule of the primary layer and the $j$-th capsule of the digit caps layer, and $U_i$ is the output vector of the $i$-th capsule of the primary capsule layer. The weighting factors are obtained with the softmax function:

$$C_{i,j} = \frac{\exp(b_{i,j})}{\sum_k \exp(b_{i,k})}$$

where $b_{i,j}$ is the log probability between the $i$-th capsule in the primary capsule layer and the $j$-th capsule in the digit caps layer. $b_{i,j}$ is calculated iteratively using the following equation:

$$b_{i,j} \leftarrow b_{i,j} + \hat{u}_{j|i} \cdot v_j$$

where the scalar product $\hat{u}_{j|i} \cdot v_j$ acts as the corrective term for $b_{i,j}$. In the learning process of the proposed model, the $k$-th capsule should have a long output vector only for images of the corresponding class. To achieve this, a loss of the following form is minimized:

$$L = \sum_k \left[ T_k \max\left(0,\, m^{+} - \|v_k\|\right)^2 + \lambda\,(1 - T_k) \max\left(0,\, \|v_k\| - m^{-}\right)^2 \right]$$

In this equation, the sum runs over the capsules $k$, $\lambda = 0.5$, $m^{+} = 0.9$, and $m^{-} = 0.1$. For the capsule that corresponds to the cancerous class, $T_k = 1$ only when the input image has cancerous features; otherwise $T_k = 0$.
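The squashing function, the routing updates, and the loss with λ = 0.5, m⁺ = 0.9, and m⁻ = 0.1 can be illustrated with a small pure-Python sketch. It follows the standard dynamic-routing formulation rather than the authors' implementation; the function names, the three routing iterations, and the toy prediction vectors are assumptions made for illustration.

```python
import math

def length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def squash(s):
    """Keep the direction of s and map its length into [0, 1)."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]

def softmax(b):
    """Numerically stable softmax over a list of logits."""
    m = max(b)
    e = [math.exp(x - m) for x in b]
    total = sum(e)
    return [x / total for x in e]

def route(u_hat, iterations=3):
    """Routing by agreement.

    u_hat[i][j] is the prediction vector of primary capsule i for digit
    capsule j. Returns the list of digit capsule outputs v_j.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]   # routing logits b_ij
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]        # coupling coefficients C_ij
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        for i in range(n_in):                  # agreement update of b_ij
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Sum of per-capsule margin losses; targets[k] is T_k (0 or 1)."""
    return sum(
        t * max(0.0, m_pos - l) ** 2 + lam * (1 - t) * max(0.0, l - m_neg) ** 2
        for l, t in zip(lengths, targets)
    )

# Toy example: 2 primary capsules, 2 digit capsules, 2-dimensional vectors.
u_hat = [[[1.0, 0.0], [0.0, 0.1]],
         [[0.9, 0.1], [0.1, 0.0]]]
v = route(u_hat)
print([round(length(vj), 3) for vj in v])
print(margin_loss([length(vj) for vj in v], targets=[1, 0]))
```

In the toy input both primary capsules agree on the first digit capsule, so its output vector ends up much longer than that of the second capsule.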

3) ResNet
The deep residual network (ResNet) incorporates residual learning. The model's most essential components are the convolution, pooling, and fully connected layers, which are stacked one upon the other. The residual network is distinguished from a standard network by the identity connections between its layers. The residual block of ResNet is depicted in Fig. 6: to bypass one or more layers, it introduces a "skip connection", also called an "identity shortcut connection". The detailed architecture of the deep residual network (ResNet) is presented in Fig. 7. In ResNet models, the residual block can be expressed mathematically as:

$$Y = F(X, \{W_i\}) + X$$

In this equation, $Y$ is the output, $X$ is the input, and $F$ is the function applied to any input passed to the residual block. Its weight layers are represented by $W_i$, where $i$ ranges from 1 to the number of layers in the residual block. When the number of layers equals 2, the term $F(X, \{W_i\})$ simplifies to:

$$F = W_2\,\sigma(W_1 X)$$

where $\sigma$ represents the ReLU activation function used by the model. A second non-linearity, $\sigma(Y)$, is applied after the addition with the identity mapping; the shortcut itself uses no extra parameters. If the dimensions of $X$ and $F$ are not equal, a linear projection $W_s$ can be applied by the shortcut connection so that the dimensions match, and the building block of the residual network becomes:

$$Y = F(X, \{W_i\}) + W_s X$$

where $Y$ and $X$ are the output and input vectors and $F$ is the residual mapping to be learned.
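A two-layer residual block with an identity or projection shortcut can be sketched with plain lists standing in for tensors. This is a minimal illustration, assuming fully connected weight matrices in place of convolutions; the names `residual_block`, `matvec`, and the toy weights are hypothetical.

```python
def relu(v):
    """Element-wise ReLU (the sigma in the text)."""
    return [max(0.0, x) for x in v]

def matvec(W, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def residual_block(x, W1, W2, Ws=None):
    """Compute sigma(F(x) + shortcut) with F(x) = W2 * sigma(W1 * x).

    The shortcut is the identity when Ws is None, otherwise the linear
    projection Ws * x used to match dimensions.
    """
    f = matvec(W2, relu(matvec(W1, x)))
    shortcut = x if Ws is None else matvec(Ws, x)
    return relu([a + b for a, b in zip(f, shortcut)])

identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

# With zero weights the block reduces to the (rectified) identity mapping.
print(residual_block([1.0, -2.0], zeros, zeros))
# Identity shortcut: F(x) = x here, so the output is 2x.
print(residual_block([1.0, 2.0], identity, identity))
# Projection shortcut mapping a 2-dimensional input to 3 dimensions.
W2_wide = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
print(residual_block([1.0, 2.0], identity, W2_wide, Ws))
```

The zero-weight case shows why residual blocks are easy to optimize: when the residual mapping contributes nothing, the block simply passes its input through.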
The layers of ResNet are grouped into 4 parts, as shown in Fig. 6. The initial convolution layers have filters of size 7 × 7 and 3 × 3, followed by max-pooling. The first group has three residual blocks, each consisting of 3 convolutional layers with kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively. The numbers of kernels used in the layers of each block are 64, 64, and 256. The second group has four residual blocks, each consisting of 3 convolutional layers with kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively. The numbers of kernels used in the layers of each block are 128, 128, and 512. The third group consists of 18 layers in six residual blocks, each consisting of 3 convolutional layers with the same kernel sizes as in groups 1 and 2. The numbers of kernels used in the layers of each block are 256, 256, and 1024. The layers are stacked over one another with different numbers of kernels. The fourth and final group consists of 9 layers in three residual blocks, each consisting of 3 convolutional layers with kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively. The numbers of kernels used in the layers of each block are 512, 512, and 2048. The model uses average pooling followed by a softmax layer for classification.
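The layer counts quoted for the groups can be cross-checked with a short calculation. The sketch assumes, as in the standard 50-layer residual network, a single initial convolution layer and a single final classification layer in addition to the four groups; these two counts are assumptions, not statements from the text.

```python
blocks_per_group = [3, 4, 6, 3]   # residual blocks in groups 1 to 4
convs_per_block = 3               # 1 x 1, 3 x 3, and 1 x 1 convolutions

# Convolutional layers contributed by each group.
layers_per_group = [blocks * convs_per_block for blocks in blocks_per_group]

# Assumed: one initial convolution and one final classification layer.
total_weight_layers = 1 + sum(layers_per_group) + 1

print(layers_per_group)
print(total_weight_layers)
```

The third and fourth entries reproduce the 18 and 9 layers stated for those groups, and the total matches a 50-layer network under the stated assumptions.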

4) PROPOSED ENSEMBLE MODEL
In this stage, as shown in Fig. 7, the dataset $S = \{\langle Z^{(i)}, t^{(i)} \rangle\}_{i=1}^{n_b}$ is separated into two datasets. The training dataset consists of 80% randomly selected images of the dataset. The remaining 20% of the images form the test dataset $S_{ts} = \{\langle Z^{(i)}, t^{(i)} \rangle\}_{i=1}^{n_{ts}}$. The training dataset is used to develop the VGG, CapsNet, and ResNet classifiers, and the test data is then used to evaluate the learners. Once the individual models have been developed, the decision of each model is obtained for every image $X$, i.e.

$$\hat{d}^{(i)}_{m} = M_m\left(X^{(i)}\right), \quad m \in \{\mathrm{VGG}, \mathrm{CapsNet}, \mathrm{ResNet}\}$$

The predicted labels form the individual decision maps of $M_{VGG}$, $M_{CapsNet}$, and $M_{ResNet}$. For the proposed binary classification, each decision belongs to the cancerous or non-cancerous class, i.e. $\hat{d}^{(i)}_{m} \in \{c_1, c_2\}$. Using the predicted labels of the individual learners, $\hat{t}^{(i)}_{individual}$, the combined decision of the deep learners using majority voting is obtained by:

$$\hat{t}^{(i)}_{ens} = \arg\max_{j} \sum_{m} \Delta\left(\hat{d}^{(i)}_{m}, c_j\right)$$

where

$$\Delta(k, c_j) = \begin{cases} 1 & k \in c_j \\ 0 & \text{otherwise} \end{cases} \tag{13}$$

The image is then assigned to the class that gains the most votes.

FIGURE 7. The block diagram of deep residual network [44].
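The majority-voting step of the ensemble can be sketched as follows. The label lists are made-up toy predictions (1 = cancerous, 0 = non-cancerous), and the function names are illustrative rather than taken from the paper.

```python
from collections import Counter

def majority_vote(decisions):
    """Return the class label that receives the most votes."""
    return Counter(decisions).most_common(1)[0][0]

def ensemble_predict(per_model_labels):
    """Combine per-model label lists (aligned by image) by majority voting."""
    return [majority_vote(votes) for votes in zip(*per_model_labels)]

# Toy predictions of the three learners for four images.
vgg = [1, 0, 1, 0]
capsnet = [1, 1, 0, 0]
resnet = [0, 1, 1, 0]

combined = ensemble_predict([vgg, capsnet, resnet])
print(combined)  # [1, 1, 1, 0]
```

With three learners and two classes a tie cannot occur, so every image receives a well-defined combined label.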

C. DIFFERENT MEASURES USED TO EVALUATE PERFORMANCE
The following quality measures were used to assess the suggested technique's performance:

1) ACCURACY
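The classifier's ability to correctly predict the class labels is measured by accuracy. It is calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.

2) SENSITIVITY
Sensitivity and specificity are among the most extensively used parameters in epidemiological and medical research, but they are less familiar to many statisticians in mathematical fields. Sensitivity assesses the classifier's ability to correctly predict the positive class. The value of sensitivity is calculated as follows:

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$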

3) SPECIFICITY
The classifier's ability to correctly predict the negative class is measured by specificity. The term specificity is defined as follows:

$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$

4) F-SCORE
The F-score is a measure of a test's accuracy that combines recall and precision.
It is the harmonic mean of recall and precision. Recall is the number of correctly predicted positive samples divided by the total number of actual positive samples. Precision is the number of correctly predicted positive samples divided by the total number of samples predicted as positive. The value of the F-score is calculated by:

$$F\text{-score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}$$

5) CONFUSION MATRIX
A confusion matrix summarizes the correct and incorrect predictions of a machine learning approach. The size of the confusion matrix is determined by the number of classes to be predicted. The rows of the confusion matrix show the predictions of the machine learning algorithm, while the columns represent the actual (ground-truth) values. As shown in Fig. 8, the top left corner contains the true positives and the bottom right corner the true negatives, while the bottom left corner contains the false negatives and the top right corner the false positives.
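The measures defined in this section can all be derived from the four cells of a binary confusion matrix. The sketch below uses made-up labels (1 = cancerous, 0 = non-cancerous); the function names and the convention of returning 0.0 for an empty denominator are illustrative choices.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, tn, fp, fn

def ratio(num, den):
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, and F-score."""
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": recall,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f_score": ratio(2 * precision * recall, precision + recall),
    }

# Toy ground truth and predictions for eight images.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
scores = evaluate(*counts)
print(counts)
print(scores)
```

Sensitivity and recall are the same quantity, which is why the sketch computes the value once and reports it under the name used in the medical literature.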

IV. RESULTS AND DISCUSSION
Table 1 presents the performance analysis of the proposed model, including the individual deep learners and the deep-learner-based ensemble system developed in [27]. In addition, the outcome of the proposed approach is compared with the individual machine learning approaches developed in [7]. Table 1 shows that VGG, CapsNet, and ResNet provide accuracy values of 79%, 75%, and 69%, respectively, whereas the proposed ensemble model has a prediction accuracy of 93.5%. Table 1 also shows that the sensitivity values of VGG, CapsNet, and ResNet are 66%, 72%, and 75%, respectively, while the proposed ensemble model has a sensitivity of 87.25%. These values indicate that the proposed ensemble classifies cancerous images more accurately than the individual learners. The specificity values of VGG, CapsNet, and ResNet are 93%, 77%, and 63%, respectively, and the proposed ensemble has a specificity of 84%, which is lower than that of VGG. Overall, the proposed model detects cancer more accurately by combining the decisions of the individual learners and exploiting their diversity. The table further shows that the proposed ensemble has a higher F-score (92%), a lower false-positive value (16%), and a higher precision (94%) than the individual learners. Table 1 also compares the proposed approach with the deep-learning-based ensemble system developed in [15]. The table shows that the proposed ensemble performs better than the ensemble approach developed in [27] in terms of accuracy, sensitivity, and specificity. Table 2 compares the performance of the proposed deep learning ensemble with the individual machine learning approaches developed using the ISIC dataset in [7].
The authors in [7] perform cancer detection on segmented images. Table 2 shows that the proposed ensemble generally performs better than the individual machine learning models of support vector machine (SVM), K-nearest neighbor (KNN), and Naïve Bayes (NB), and than the random forest-based ensemble approach, in terms of accuracy, precision, recall, and F-score. The exception is the accuracy of the SVM, which is higher than that of the proposed ensemble. A likely reason is that the SVM was developed using segmented images. Table 3 compares the performance of various deep learning models with the proposed model. It can be observed from Fig. 9 that the proposed model outperforms state-of-the-art methods in terms of various performance metrics such as accuracy, sensitivity, and specificity. Fig. 10 to Fig. 12 show the training and test accuracies of the individual learners. The accuracy of the VGG model is shown in Fig. 10: the training accuracy of VGG reaches 80% and the test accuracy is 79%. The accuracy of the CapsNet model is shown in Fig. 11: the training accuracy is 75% and the test accuracy is 73%. Fig. 12 shows that the training accuracy of the ResNet model is 80%. Fig. 13 to Fig. 15 show the training and test loss of the individual learners. It is observed from the figures that VGG, CapsNet, and ResNet have training losses of 0.40, 0.41, and 0.35, respectively.
A confusion matrix summarizes the correct and incorrect predictions of a machine learning algorithm. The confusion matrix of VGG is shown in Fig. 16. The figure shows that the VGG model has more classification errors in the non-cancerous class. The confusion matrix of CapsNet is shown in Fig. 17. The CapsNet model also has more classification errors in the non-cancerous class. Fig. 18 shows the confusion matrix of ResNet, which has more misclassifications in the cancerous class. Fig. 19 shows the confusion matrix of the proposed ensemble model. The proposed ensemble has more classification errors in the non-cancerous class than in the cancerous class. Overall, the confusion matrices show that the proposed model makes fewer classification errors in both the cancerous and non-cancerous classes than the individual models.
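The quality measures discussed in this section follow directly from the four cells of a confusion matrix. The sketch below uses the standard definitions; the `Counts` class, the `metrics` function, and the example counts are hypothetical and were chosen only to illustrate the arithmetic. They are not the values behind Table 1 or Fig. 16 to Fig. 19.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Counts:
    """Cells of a binary confusion matrix (positive = cancerous)."""
    tp: int
    fp: int
    fn: int
    tn: int


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def metrics(c: Counts) -> Dict[str, float]:
    """Compute standard binary classification measures from the counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_score": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
    }


# Hypothetical counts for 100 test images (illustration only).
m = metrics(Counts(tp=40, fp=8, fn=10, tn=42))
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Because the false-positive rate and the specificity share the same denominator, they always sum to one, which is why a specificity of 84% corresponds to a false-positive rate of 16%.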

A. ABLATION STUDY
This section presents the performance analysis of the proposed ensemble network of VGG16, ResNet, and CapsNet. The model performance is evaluated on the ISIC public dataset for skin cancer classification from dermoscopic images. Training was evaluated using different batch sizes, and the following hyperparameters were selected for the proposed model: 40 epochs, a learning rate of 0.001, a batch size of 32, and the stochastic gradient descent (SGD) optimizer. Fig. 9 shows the accuracy of the proposed ensemble network compared with the individual learners. Fig. 9 indicates that the proposed model achieved the best accuracy of 93.5%. Fig. 10 to Fig. 12 show the training and testing accuracy curves of the individual learners. Fig. 13 to Fig. 15 show the validation loss of VGG16, CapsNet, and ResNet, respectively. The lowest loss value of 0.06 was achieved by the proposed ensemble. The classification training times of VGGNet, CapsNet, ResNet, and the proposed method are 106 s, 136 s, 188 s, and 109 s, respectively. A graphical illustration of the various models is also provided. There is a slight trade-off between VGGNet and the proposed model in terms of accuracy and training time: the proposed model takes marginally longer to train (109 s versus 106 s), but it is preferred because of its substantially higher accuracy.
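The idea of combining the decisions of three diverse learners can be sketched with simple majority voting. This is an assumption made for illustration: this section does not specify the exact combination rule used by the proposed ensemble, and the function `majority_vote`, the helper `accuracy`, and the toy predictions below are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(predictions: Dict[str, List[int]]) -> List[int]:
    """Combine per-model class labels by majority vote, sample by sample.

    Assumed rule for illustration; with three binary voters no tie can occur.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    per_model = list(predictions.values())
    n_samples = len(per_model[0])
    if any(len(p) != n_samples for p in per_model):
        raise ValueError("all models must predict the same number of samples")
    combined = []
    for votes in zip(*per_model):  # one tuple of votes per sample
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of samples whose predicted label equals the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions for five images (1 = cancerous, 0 = non-cancerous).
toy_predictions = {
    "VGG": [1, 0, 1, 0, 1],
    "CapsNet": [1, 1, 0, 0, 1],
    "ResNet": [0, 1, 1, 0, 0],
}
toy_truth = [1, 1, 1, 0, 1]

ensemble = majority_vote(toy_predictions)
print(ensemble, accuracy(ensemble, toy_truth))
```

In this toy case each learner errs on a different image, so the vote corrects every individual mistake. This is the kind of diversity that an ensemble exploits, although real learners rarely disagree this conveniently.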

V. CONCLUSION
Malignant lesions are the leading cause of death from skin cancer. If the disease is diagnosed at an early stage, treatment may be possible. In the literature, deep learning approaches have been used to detect cancer, but the performance of individual learners is limited. The performance can be enhanced by combining the decisions of diverse individual learners for decision-making on sensitive issues such as cancer. This paper developed an ensemble model to detect skin cancer by combining the decisions of three deep learning models: VGG, CapsNet, and ResNet. The results show that the proposed ensemble achieved an average accuracy of 93.5% with a classification training time of 106 s. The proposed model performs better than the individual learners with respect to different quality measures, i.e., sensitivity, accuracy, F-score, specificity, false-positive rate, and precision. In the future, we intend to study the performance of reinforcement learning-based techniques for skin cancer detection.
AZHAR IMRAN received the master's degree in computer science from the University of Sargodha, Pakistan, and the Ph.D. degree in software engineering from the Beijing University of Technology, China.
He worked as a Senior Lecturer at the Department of Computer Science, University of Sargodha, from 2012 to 2017. He is currently an Assistant Professor at the Department of Creative Technologies, Faculty of Computing & Artificial Intelligence, Air University, Islamabad, Pakistan. He is a renowned expert in image processing, healthcare informatics, and social media analysis. He has over nine years of national and international academic experience as a full-time faculty member, teaching courses in software engineering and core computing. He has delivered guest talks and conducted seminars and trainings at numerous national and international forums. He has contributed to multiple international conferences in diverse roles (keynote speaker, technical/committee member, registration, and speaker). His research interests include image processing, social media analysis, medical image diagnosis, machine learning, and data mining. He aims to contribute to interdisciplinary research of computer science and human-related disciplines. He is a Regular Member of IEEE and has contributed more than 40 research papers to well-reputed international journals and conferences. He is an Editorial Member and a Reviewer of various journals, including IEEE ACCESS, Cancers (MDPI), Applied Sciences, Mathematics, The Visual Computer (Springer), Biomedical Imaging and Visualization (Taylor & Francis), Multimedia Tools and Applications, IGI Global, and Journal of Imaging.
ARSLAN NASIR received the master's degree in computer science from Iqra National University, Islamabad, and the master's degree (Hons.) in information technology from the University of Sargodha, Pakistan. He is currently an IT Instructor at the Punjab Public Colleges, Sargodha, Pakistan. He has published various journals and conference papers. His research interests include image processing, bioinformatics, disease diagnosis, and deep learning.
MUHAMMAD BILAL joined the FAST National University of Computer and Emerging Sciences (NUCES), Chiniot-Faisalabad (CFD) Campus, Pakistan, as an Assistant Professor, in August 2021. Before joining NUCES, he worked as a Tutor for three years at the School of Computer Science and Engineering, Taylor's University, Subang Jaya, Malaysia. He worked more than four years as a Lecturer with the Department of CS and IT, University of Sargodha, Sargodha, Pakistan. He also worked as a Software Developer with Sofizar, Lahore. He has published several research papers in journals and conferences. His research interests include data mining, social computing, machine learning, information processing, social media data analytics, and software engineering.
VOLUME 10, 2022
GUANGMIN SUN was born in China, in 1960. He received the B.Sc. degree in electronic engineering from the Beijing Institute of Technology, Beijing, China, in 1982, the M.Sc. degree in communication and information systems from the Nanjing University of Science and Technology, Nanjing, China, in 1991, and the Ph.D. degree in communication and information systems from Xidian University, Xi'an, China, in 1997. He is currently a Professor with the Beijing University of Technology, Beijing. His current research interests include neural networks and applications, image processing, and pattern recognition.
ABDULKAREEM ALZAHRANI (Member, IEEE) received the M.Sc. degree in advanced web engineering and the Ph.D. degree in computer science from the University of Essex, U.K., in 2011 and 2017, respectively. He is currently an Assistant Professor of computer science (AI) at Al Baha University. His research interests include artificial intelligence, computational intelligence, intelligent environments, autonomous agent, and multi-agent systems.
ABDULLAH ALMUHAIMEED received the bachelor's degree in computer science from Imam Muhammad Ibn Saud Islamic University, in 2007, and the M.Sc. and Ph.D. degrees in computer science from the University of Essex, U.K., in 2011 and 2016, respectively. He is currently an Assistant Research Professor of computer science at The National Centre for Genomics Technologies and Bioinformatics, King Abdulaziz City for Science and Technology (KACST). His research interests include semantic web, ontologies, artificial intelligence, machine learning, data science, sentiment analysis, recommendation systems, search engines, big data, natural language processing, deep learning, and fuzzy logic.