Dermatologist-Level Classification of Skin Cancer Using Cascaded Ensembling of Convolutional Neural Network and Handcrafted Features Based Deep Neural Network

Skin cancer is caused by the abnormal growth of skin cells and is a deadly type of cancer. Early diagnosis is critical and can help avert some categories of skin cancer, such as melanoma and focal cell carcinoma. Recognising and classifying skin malignancies at an early stage is expensive and challenging. Deep learning architectures such as recurrent networks and convolutional neural networks (ConvNets) have been developed in the past and have proven suitable for the non-handcrafted extraction of complex features. To further improve the efficiency of ConvNet models, this work proposes a cascaded ensembled network that integrates a ConvNet with a multi-layer perceptron based on handcrafted features. The proposed model uses the ConvNet to extract non-handcrafted image features, and uses colour moments and texture features as handcrafted features. The accuracy of the ensembled deep learning model is shown to improve to 98.3%, from 85.3% for the convolutional neural network model alone.


I. INTRODUCTION
The skin is the largest organ of the human body and covers an area of about twenty square feet. It covers the complete body, and its thickness differs significantly across body sites and also varies between men and women and between the old and the young [1], [2]. For example, the average thickness of the skin on the forearm is 1.26 mm in females and 1.3 mm in males. The skin shields against thermal, mechanical, and physical harm. It also defends us against bacteria and the elements, and the presence of intercellular lipids prevents moisture loss [3], [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Wenming Cao.
Over the last few decades, there has been an upsurge in the number of patients diagnosed with skin cancer. Early detection and regular diagnosis are essential for the survival of skin cancer patients. However, a significant number of cases remain undetected until they reach advanced stages, which reduces the chances of survival. An appealing method for early recognition is the automated classification of dermoscopic images by a Computer Based Diagnosis (CBD) system [5], [6]. A CBD system is essentially a clinical decision support system that assists clinicians in interpreting medical images. It is used as an instrument that delivers additional information to the dermatologist, who takes the final decision. Its primary goal is to increase the diagnostic accuracy and consistency of dermatologists by decreasing the false negative rate caused by observational oversight and by intra-observer and inter-observer variation. CBD systems generally work in two broad stages. The first stage is to locate the lesions. The second stage is to quantify the image features of abnormal and/or normal patterns. A computer based diagnosis system usually includes three basic components. The first is the image processing and analysis component, which enhances and extracts the lesions by selecting the primary lesion candidates and suspicious patterns [7], [8]. The second is the quantification of image features, for example the size, colour, texture, shape and contrast of the pigments selected in the first step. It is essential to identify distinctive features that can discriminate consistently between a lesion and other normal anatomical structures. The last is feature processing, which distinguishes abnormal from normal patterns or identifies the skin lesion class based on the features acquired in the second step [9], [10].
The purpose of this study is to design a computer based melanoma lesion detection scheme that serves dermatologists as a decision support tool for melanoma classification. This paper provides a mechanism for feature fusion and proposes a classification framework that integrates a ConvNet model with handcrafted features as a cascaded ensembled model. In brief, the significant contributions of this work are:
• To design a cascaded ensemble of a ConvNet and a handcrafted features based deep learning model,
• To propose a hybrid handcrafted feature set comprising texture features and colour moments, and
• To design a model that integrates the strengths of models relying on handcrafted feature extraction methods with those of a deep learning model, demonstrating improved accuracy for dermatologist-level diagnosis of skin cancer in comparison to a standalone ConvNet model.
The paper is organised as follows. Section 2 discusses the general structure of the skin and existing clinical melanoma diagnostic methods. Section 3 discusses the materials and methods, including the handcrafted features, the ConvNet model, the proposed architecture and the image dataset. Section 4 presents the simulations and performance metrics. Section 5 gives the conclusion and further research possibilities.

II. RELATED WORK
Several automated detection systems have been developed in the literature by researchers to moderate subjectivity and complications of clinical diagnosis of skin disease. Some of the approaches that have been followed by researchers are presented below.
In an early work, Friedman et al. devised the ABCD acronym to offer non-specialist primary and public healthcare practitioners a valuable and memorable mnemonic to support the timely identification of possibly treatable malignant melanoma [11]. The acronym stands for asymmetry, border irregularity, colour variegation, and diameter. The ABCD rule is most appropriate for discriminating early, thin tumours from benign pigmented lesions. The ABCD rule was later expanded to ABCDE by Abbasi et al., who introduced one more term, evolving [12]. Here, evolving signifies a new or changing lesion. ABCDE highlights the significance of evolving pigmented lesions as possible melanoma. In a patient with nevi, the dermatologist should pay attention to changes in size and outline, to symptoms such as itching and a bleeding surface, and to shades of colour. An attempt to increase self-screening by joining the ABCDE rule and the ugly duckling sign was proposed by Jensen and Elewski as the ABCDEF rule [13]. The added letter ''F'' stands for funny looking. A funny-looking mole is similar to the ugly duckling sign.
Most of the recent computer aided diagnosis methods have evolved around these ABCD, ABCDE and ABCDEF rules. In the skin cancer recognition system proposed by Ercal et al. [14], the border of the skin lesion was extracted by doctors and the features were constructed using the ABCDE rule. An Artificial Neural Network (ANN) classifier was used to categorise the skin lesions as benign or malignant. This system achieved 80% accuracy. A procedure to distinguish benign lesions from malignant melanoma using macroscopic skin images was suggested by Alcón et al. [15]. In this work, the low-frequency components of the skin image are first removed for background correction. Thresholding with Otsu's method is then performed to segment the lesion area. Fifty-five features are formulated and extracted from the segmented lesion area according to the ABCD criteria. A correlation-based feature selection method and an AdaBoost classifier are applied for feature selection and classification. A decision support component is further added that uses individual information, including age, gender, skin type, and part of the body, along with the output of the classifier. This work attained 86% accuracy.
Ramezani et al. have proposed a system in which morphological operators are applied for the removal of thin and thick hairs, pre-processing and post-processing [16]. Otsu's thresholding method is applied locally to the blue channel of RGB images to determine the lesion area, and nine features are then extracted. These features describe the boundary, colour distribution and shape. A classifier model is created based on statistical analyses of the algorithm outputs. Finally, 77% accuracy is attained by applying an optimal threshold to the output index score. Garnavi et al. have used wavelet based texture features and contours for computer-aided melanoma analysis [17]. The texture, border and geometry features are based on wavelet decomposition, a boundary-series model and shape indexes respectively. Four separate classifiers, namely Support Vector Machine (SVM), Random Forest (RF), Logistic Model Tree (LMT), and Hidden Naïve Bayes (HNB), are applied for classification. Ramezane et al. [18] have suggested a melanoma recognition system that uses an SVM classifier with features based on the texture, boundary irregularity, asymmetry, colour distribution, and diameter of the skin lesion. Lopez et al. have addressed melanoma skin lesion detection using transfer learning. In transfer learning, the features and weights of a previously trained model are used to train a new model. Transfer learning is valuable when dealing with comparatively small datasets, for example medical images. The Visual Geometry Group (VGG) ConvNet architecture is used for transfer learning in this work [19].
Xie et al. [20] have proposed a classification framework for diagnosing melanocytic tumours as benign or malignant from dermatoscopy images. Three steps are considered in this work. In the first step, skin lesions are segmented by a Self-Generating Neural Network (SGNN). The texture, colour and contour features are extracted in the second step. Finally, in the third step, skin lesions are categorised using an ensemble classifier. This ensemble classifier combines a Back Propagation Neural Network (BPNN) with a Fuzzy Neural Network (FNN) to gain high accuracy. ConvNets are now widely used in this field, and these models are broadly recognised for automatic feature extraction and classification. Yu et al. have offered a hybrid classification framework for dermoscopy images that combines a linear SVM, a ConvNet and the Fisher vector (FV) [21]. Codella et al. [22] have reported high accuracy using a ConvNet to extract image descriptors from skin lesion images with a pre-trained model. They also examine a more recent network structure, the Deep Residual Network (DRN).
A ConvNet toolbox for skin cancer classification is developed by Nunnari et al. [23]. This software architecture allows researchers to quickly design new ConvNet models and hyper-parameter arrangements. This work recommends that interactive methodologies be used to train efficient models in an explorative style. Mukherjee et al. have offered a Deep Convolution Neural Network (DCNN) based method [24]. This method is verified on the Dermofit and MEDNODE image datasets in two stages. With both datasets combined, it achieves an accuracy of 83.07%. A completely automatic scheme for skin lesion identification utilizing optimized deep features and dissimilar saturation levels is designed by Mahbod et al. [25]. Three pretrained ConvNet models, namely AlexNet, VGG, and ResNet18, are employed for feature extraction. An SVM classifier trained on these features attained 83.33% melanoma identification accuracy. A scheme for pigmented skin lesion identification based on a system termed DermoDeep is offered by Abbas and Celebi [26]. The DermoDeep model fuses DNN and visual features. The DNN model is a five-layer architecture, and 2800 ROI images are used for training. The DermoDeep model is evaluated with the sensitivity (true positive rate) and specificity (true negative rate) performance metrics and attained 93% and 95% respectively.
Khan et al. [27] have suggested a hybrid method that uses Faster Region Based CNN (FRCNN), Iteration-Controlled Newton-Raphson (ICNR) and transfer learning. A bee colony enabled contrast stretching process is used in the localization step. The contrast stretched images are fed into the FRCNN to obtain segmented images. DenseNet201 is used as the pre-trained model to extract suitable features via transfer learning. These features are then selected through the ICNR approach. Finally, the most prominent features are used for classification by a Multi Layer Perceptron (MLP). The method is trained and tested on the ISBI2016 and ISBI2017 image datasets and attained accuracies of 94.5% and 93.4% respectively. Polat et al. have offered two different methods. The first method uses a single CNN model, and the second combines the one-versus-all approach with CNNs [28]. In the first method, the ConvNet model was trained and tested on raw dermatological photos; in the second, seven different models, each with two classes, were created and then integrated in the one-versus-all style. The single CNN attained 77% accuracy in recognising skin ailments across seven lesion classes, and the combination of ConvNets using the one-versus-all method attained 92.90% accuracy.
Polap et al. [29] have presented a smart home system that diagnoses the skin health of the house's residents by in-built sensors and offered AI algorithms.
To extract and assess the SM fragment from dermoscopy images of dimensions 224 × 224 × 3 pixels, Kardy et al. [30] have used a pre-trained VGG-SegNet technique. The proposed scheme is made up of an encoder-decoder section and a SoftMax classifier to perform binary classification.
Khan et al. [31] have suggested an approach in which dermoscopic scans are first treated by the decorrelation formulation method and then sent to a masked region based ConvNet for skin lesion segmentation, with the model trained on segmented coloured images. Finally, the DenseNet deep architecture is used to extract features from the segmented images.
Using a deep pre-trained CNN model and an improved moth flame optimization method for dimensional reduction, a completely automated system for multi-class skin lesion separation and classification was developed by Khan et al. [32]. The Kernel Extreme Learning Machine classifier is utilized to fuse the generated features using a multiset maximum correlation analysis.
Transfer learning is a prominent method in deep learning that adapts a pre-trained ConvNet model for skin lesion diagnosis. Mahbod et al. [33] have suggested a three-level fusion method with various fine-tuned deep models and segmented skin lesion images. This work achieved high classification accuracy on the ISIC2018 image dataset. Akram et al. have discussed a method that uses a multi-level architecture of feature selection and a dimensionality reduction method [34]. The authors claim that fusing features from a set of pre-trained models increases the overall accuracy, and that applying feature selection followed by a dimensionality reduction stage considerably increases the performance. Amin et al. have offered a scheme for the segmentation and categorisation of skin lesions [35]. The deep features are fused into a feature vector for skin cancer recognition. The model shows high accuracy owing to the combination of three strategies. The first strategy is the grouping of multiple datasets. The second is the extraction of deep features from two pre-trained models, namely VGG-16 and AlexNet. The third consists of serial fusion and optimization by Principal Component Analysis (PCA). The authors merged the ISBI 2016, ISBI 2017, and PH2 image datasets into one dataset for classification and achieved 99.0% accuracy.
Deep learning frameworks have recently found many applications in remote sensing and medical imaging. Zhang et al. [36] have presented an interleaving perception CNN for integrating heterogeneous information and increasing the combined recognition rate of light detection and ranging data and hyperspectral images, which are input into a two-branch ConvNet for final prediction. For the classification of cell nuclei in colon cancer, a phased detection-identification paradigm is presented by Li et al. [37]. A location of attention network first detects nucleus positions, encoding a context-aware description of the input image and decoding features on the proximity map. Furthermore, throughout the decoding phase, a cascade residual fusion block is employed to enhance prediction accuracy. To tackle the challenge of training supervised CNN models with small datasets, a two-channel CNN is designed for medical hyperspectral object detection applications. To provide local comprehensive information, a reliable CNN is used by Wei et al. [38] for medical hyperspectral image classification. The characteristics taken from the underlying layers of the two channels are concatenated into a vector that is expected to preserve both global and local information at the same time.
What most of the methods briefly discussed above have in common is that skin lesion detection is implemented either with hand-crafted features or by analysing the skin colour images with features learned by a deep model. ConvNet based classification models have achieved outstanding performance in recognising cancerous skin lesions, but many hand-crafted features, such as texture and colour distribution, also play significant roles. Hand-crafted features are designed to describe image content from particular aspects, and can therefore deliver additional information to a deep learning model.

III. METHODOLOGY
This section summarises the colour moment features, the texture features, the convolution neural network, the proposed methodology and the image dataset.

A. COLOUR MOMENTS
Colour is one of the most significant features to be extracted in any object recognition system, and it is one of the most important characteristics for differentiating between benign and malignant melanocytic tumours. The appearance of the six alarming colours suggested by the ABCD dermoscopic principle identifies the majority of malignant melanomas. Histopathologically, the presence of these alarming colours implies the existence of melanin in the deeper layers of the epidermis and dermis [39]. Many researchers have used the colour histogram as a feature. The colour histogram provides a satisfactory level of precision. However, it does not capture the spatial distribution of colour well and is sensitive to noise. Colour moments are used to overcome these limitations of the colour histogram. Here, RGB channels are used to represent the colour images. Four moments, namely mean, standard deviation, skewness and kurtosis, are computed for each of these channels. An image is therefore characterised by 12 moments, i.e. 4 moments for each channel. Let $P_{ij}$ denote the probability distribution of the $i$th colour channel at the $j$th pixel. The four colour moments are calculated as [40]:

1) MOMENT 1-MEAN
Mean is the average colour value in the image:
$$E_i = \frac{1}{N}\sum_{j=1}^{N} P_{ij}$$
where $N$ is the number of pixels in the image.

2) MOMENT 2-STANDARD DEVIATION
It is the square root of the variance, which describes the spread of colour in an image, and is calculated by
$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(P_{ij} - E_i\right)^{2}}$$

3) MOMENT 3-SKEWNESS
Skewness is the degree of asymmetry in the colour distribution. It characterises the shape of the distribution and, in the root-normalised form commonly used for colour moments, is calculated by
$$s_i = \sqrt[3]{\frac{1}{N}\sum_{j=1}^{N}\left(P_{ij} - E_i\right)^{3}}$$

4) MOMENT 4-KURTOSIS
Like skewness, kurtosis describes the shape of the colour distribution; specifically, it measures how peaked or flat the distribution is relative to the normal distribution. In the same root-normalised form it is computed by
$$k_i = \sqrt[4]{\frac{1}{N}\sum_{j=1}^{N}\left(P_{ij} - E_i\right)^{4}}$$
A feature set consisting of the statistical moments up to the 4th order is formed separately for each colour channel:
$$m_c = \{E_c, \sigma_c, s_c, k_c\}, \quad c \in \{R, G, B\}$$
The feature set of moments is then the union $m_R \cup m_G \cup m_B$.

B. TEXTURE FEATURES
The grey-level co-occurrence matrix (GLCM) features are second-order statistical texture features. The following features are calculated from the GLCM [41].

1) CONTRAST
Contrast measures the local variations in the gray-level co-occurrence matrix.

2) CORRELATION
Correlation measures the joint probability of occurrence of the considered pixel pairs.

3) ENERGY
Energy is the sum of squared elements in the GLCM. It is also termed as the angular second moment or uniformity.

4) HOMOGENEITY
Homogeneity estimates the closeness of the distribution of GLCM entries to the GLCM diagonal.

5) DISSIMILARITY
Dissimilarity is the distance between pairs of objects (pixels) in the region of interest.
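For reference, the five texture features can be written in their commonly used textbook forms. These are standard definitions rather than equations recovered from this paper; in particular, the squared difference in the homogeneity denominator is an assumption, since some toolboxes use the absolute difference instead. Here $p(i,j)$ is the normalised GLCM entry for grey levels $i$ and $j$, and $\mu_i$, $\mu_j$, $\sigma_i$, $\sigma_j$ are the means and standard deviations of its row and column marginals.

$$\text{Contrast} = \sum_{i,j} (i-j)^{2}\, p(i,j)$$
$$\text{Correlation} = \sum_{i,j} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i \sigma_j}$$
$$\text{Energy} = \sum_{i,j} p(i,j)^{2}$$
$$\text{Homogeneity} = \sum_{i,j} \frac{p(i,j)}{1+(i-j)^{2}}$$
$$\text{Dissimilarity} = \sum_{i,j} |i-j|\, p(i,j)$$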
A feature set consisting of these five texture features (contrast, correlation, energy, homogeneity and dissimilarity) is then formed.
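The handcrafted features of this section can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the function names, the single horizontal GLCM offset, the symmetric GLCM, the root-normalised skewness and kurtosis, and the squared-difference homogeneity are all assumptions of the sketch.

```python
import math


def colour_moments(channel):
    """Mean, standard deviation, skewness and kurtosis of one colour channel.

    `channel` is a flat list of pixel values. Skewness and kurtosis use the
    root-normalised form, which is an assumption of this sketch.
    """
    n = len(channel)
    mean = sum(channel) / n

    def central(order):
        # Central moment of the given order around the channel mean.
        return sum((p - mean) ** order for p in channel) / n

    m3 = central(3)
    std = math.sqrt(central(2))
    skew = math.copysign(abs(m3) ** (1.0 / 3.0), m3)  # signed cube root
    kurt = central(4) ** 0.25
    return [mean, std, skew, kurt]


def glcm(gray, levels, dx=1, dy=0):
    """Normalised, symmetric co-occurrence matrix for pixel offset (dx, dy).

    `gray` is a 2-D list of integer grey levels in the range [0, levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(gray), len(gray[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = gray[r][c], gray[r2][c2]
                counts[a][b] += 1.0
                counts[b][a] += 1.0  # count the pair in both directions
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy, homogeneity and dissimilarity of a GLCM."""
    levels = len(p)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mu_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mu_j) ** 2 * v for _, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    # Correlation is undefined for a constant image; 1.0 is this sketch's choice.
    correlation = cov / denom if denom > 0 else 1.0
    energy = sum(v * v for _, _, v in cells)
    homogeneity = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    dissimilarity = sum(abs(i - j) * v for i, j, v in cells)
    return [contrast, correlation, energy, homogeneity, dissimilarity]


def hybrid_features(red, green, blue, gray, levels):
    """12 colour moments (4 per RGB channel) followed by 5 GLCM features."""
    moments = colour_moments(red) + colour_moments(green) + colour_moments(blue)
    return moments + glcm_features(glcm(gray, levels))
```

Concatenating four moments for each of the three channels with the five GLCM features gives the 12 + 5 = 17 values per image.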

C. CONVOLUTION NEURAL NETWORK (CONVNET)
A ConvNet is a type of deep learning architecture that is regularly applied to computer vision tasks such as object analysis and identification. A ConvNet principally contains convolutional layers, activation layers, pooling layers, and dense layers [42], [43]. ConvNets preserve the spatial integrity of input images. In functional analysis, convolution is a mathematical operation on two functions that yields a third function describing how the shape of one is modified by the other. Both the resulting function and the process of computing it are referred to as convolution. The basic purpose of the convolution layer is to extract features from an image. Convolutional kernels are the sets of weights that are applied to pixel values. These weights are refined by back-propagation throughout the learning stage. The convolution operation is applied by the convolution kernels. The continuous domain convolution of two functions $f$ and $h$ is defined as follows [48]:
$$(f * h)(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau$$
The analogous convolution operation for discrete signals is defined by:
$$(f * h)[n] = \sum_{m=-\infty}^{\infty} f[m]\, h[n - m]$$
Extending this 1-D convolution to the 2-D case gives:
$$(f * h)[m, n] = \sum_{j}\sum_{k} f[j, k]\, h[m - j, n - k]$$
The function $h$ is referred to as a filter (kernel) in this case, and it is convolved over the image $f$. The convolution between the kernel and the image is computed at each pixel position, and the output is a 2-D array termed a feature map. A nonlinear activation layer, for example softmax, Rectified Linear Unit (ReLU), Randomized Leaky Rectified Linear Unit (RL-ReLU), Leaky Rectified Linear Unit (L-ReLU), Parameterized Rectified Linear Unit (P-ReLU), or Exponential Linear Unit (ELU), is used to activate the output of the convolution layer. Activation functions are an essential element of deep learning models [48]. These functions determine the output of a model and affect its accuracy and efficiency. They also affect whether and how fast training converges. A pooling layer is frequently placed after the convolutional layer. Spatial pooling is used for down-sampling while preserving the most significant features. It reduces the number of parameters, which helps to avoid overfitting. Examples of pooling operations are max pooling, average pooling and sum pooling. The stride and the kernel size can be specified in addition to the choice of pooling filter. The final layer is a dense (fully connected) layer. This layer provides the prediction of the ConvNet model.
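The convolution, activation and pooling steps described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, only the "valid" region of the convolution is computed, and the kernel is flipped to follow the mathematical definition, whereas deep learning frameworks typically compute the unflipped cross-correlation.

```python
def conv2d(image, kernel):
    """Valid-mode 2-D convolution of a 2-D list `image` with a 2-D list `kernel`."""
    kh, kw = len(kernel), len(kernel[0])
    # Flip the kernel in both directions, as required by true convolution.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * flipped[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Rectified Linear Unit applied element-wise to a feature map."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Max pooling with a square window; incomplete border windows are dropped."""
    rows = range(0, len(feature_map) - size + 1, stride)
    cols = range(0, len(feature_map[0]) - size + 1, stride)
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in cols
        ]
        for r in rows
    ]
```

Chaining the three calls, `max_pool(relu(conv2d(image, kernel)))`, mirrors the convolution, activation and pooling order described in the text.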

D. PROPOSED CASCADED ENSEMBLED DEEP LEARNING MODEL
This work proposes a cascaded ensembled deep learning model that integrates handcrafted feature extraction with the learning ability of a ConvNet to categorise skin lesions. A graphical description of the network architecture is presented in Figure 1. The cascaded ensembled deep learning model comprises a dense model and a ConvNet architecture. It has two inputs: the feature set consisting of colour moments and GLCM features, and the RGB colour images. A four-layer fully connected model is applied to the handcrafted features; it provides a nonlinear mapping function $f(\cdot)$ that transforms $x$ to $y$. This mapping is defined as [42]
$$y = f(wx + b)$$
Here, $w$ and $b$ denote the weight matrix and bias vector respectively. The ConvNet is used to categorise skin disease images, with deep features extracted by the convolution layers. The convolutional layers perform convolution in place of the multiplication used in the dense layers, as given in Eq. (19), where $N$, $i$ and $j$ denote the size of the kernel, the feature map and the convolution filter respectively, and $c_{ij}$ is the convolution filter for the $i$-th input and the $j$-th output. The outputs of the ConvNet model and the fully connected model are consolidated at the end. A softmax function is used here to transform the real values into estimated probabilities. It is defined as
$$p_i = \frac{e^{x_i}}{\sum_{k} e^{x_k}}$$
where $x_i$ and $p_i$ represent the $i$-th output and the output of the softmax nonlinear activation function respectively.
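The final fusion and softmax step can be illustrated with a short plain-Python sketch. How the two branch outputs are consolidated is not specified in detail in the text; concatenation followed by one dense layer is an assumption here, and `dense` and `cascade_predict` are hypothetical helper names.

```python
import math


def softmax(scores):
    """Convert real-valued scores into probabilities that sum to one."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, bias):
    """Affine map w x + b for one fully connected layer (no activation)."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def cascade_predict(convnet_out, mlp_out, weights, bias):
    """Concatenate both branch outputs (assumed fusion), then dense + softmax."""
    fused = list(convnet_out) + list(mlp_out)
    return softmax(dense(fused, weights, bias))
```

For example, `cascade_predict([1.0], [2.0], [[1, 0], [0, 1]], [0, 0])` applies the softmax to the scores `[1.0, 2.0]`, so the second class receives the larger probability.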

E. IMAGE DATASET
Dermoscopic lesion images were acquired from the HAM10000 dataset, which was collected by multiple institutions [45], [46]. It contains 10015 images of pigmented skin lesions divided amongst seven classes. The disease label for each image is determined diagnostically or histopathologically. This dataset consists of 10,015, 193, and 1,512 labelled images in the training, validation and test sets respectively. A brief description of each class is given below.

1) ACTINIC KERATOSES (AKIEC)
In this category, irregular, scaly patches on the skin are considered precancerous. They mostly appear owing to sun exposure. 327 images are available in this class.

2) BASAL CELL CARCINOMA (BCC)
It is the most frequent class of skin cancer and generally appears as a sore that seems to heal and then reappears and may start to bleed. These tumours are locally invasive and tend to burrow in but not spread to distant locations. 514 images are available in this class.

3) BENIGN KERATOSIS (BKL)
Benign keratosis is also referred to as lichen planus-like keratosis (LPLK). These lesions normally appear as a single macule, papule, or marker that changes over time as it heals. This class has a total of 1099 scans.

4) DERMATOFIBROMA (DF)
It is a common benign fibrous nodule that frequently appears on the lower legs and is also referred to as cutaneous fibrous histiocytoma. This class consists of 115 images.

5) MELANOMA (MEL)
It is a severe type of skin cancer that starts in cells known as melanocytes. It is less frequent than basal cell carcinoma and squamous cell carcinoma but more dangerous because it spreads more rapidly. 1113 images are available in this class.

6) MELANOCYTIC NEVI (NV)
Moles appear from an early age. These moles grow gradually, change colour, and become raised. This class consists of 6705 images.

7) VASCULAR SKIN LESIONS (VASC)
These are comparatively common irregularities of the skin and underlying tissues and are generally termed birthmarks. Vascular tumours may be benign or malignant and can appear anywhere on the body. This class consists of 114 images.
This dataset poses the challenge of learning from imbalanced data. This problem is addressed by oversampling the minority classes with a data augmentation method known as the Synthetic Minority Oversampling Technique (SMOTE). SMOTE is designed to learn the topological properties of the neighbourhood of the points in the minority class, which makes overfitting less likely. The method first chooses a minority class example $I_1$ at random and finds its $k$ nearest minority class neighbours. A synthetic example is then created by selecting one of the $k$ nearest neighbours, $I_2$, at random and connecting $I_1$ and $I_2$ to form a line segment in the feature space. The synthetic examples are generated as convex combinations of the two selected examples $I_1$ and $I_2$. Figure 2 displays the number of images before and after augmentation, whereas Figure 3 displays representative images from all classes. The SMOTE parameters sampling strategy, random state and shrinkage are set to their default values, i.e. auto, none and none respectively.
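The interpolation step of SMOTE described above can be sketched in plain Python. This is a minimal illustration of generating a single synthetic example, not the library routine used in the experiments; `smote_sample` is a hypothetical name, and Euclidean distance is assumed for the neighbour search.

```python
import random


def smote_sample(minority, k=5, rng=None):
    """Create one synthetic example from a list of minority-class vectors."""
    rng = rng or random.Random()
    base = rng.choice(minority)  # I1, chosen at random
    # Rank the other minority examples by squared Euclidean distance to I1.
    others = [p for p in minority if p is not base]
    others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
    neighbour = rng.choice(others[:k])  # I2, one of the k nearest neighbours
    gap = rng.random()  # position on the segment from I1 to I2
    # Convex combination of I1 and I2.
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]
```

Because every synthetic point lies on a segment between two existing minority examples, repeated calls fill in the neighbourhood of the minority class rather than duplicating existing samples.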

IV. RESULTS
The goal of this study is to evaluate the generalisation ability of both a ConvNet and a cascaded ensembled deep learning model. After the classifiers are trained, the performance of each model is assessed using precision, sensitivity, f-score and accuracy [41]. Receiver Operating Characteristic (ROC) curves are also computed and displayed to compare and verify the performance of each model. An ROC graph is a technique for visualising, organising and selecting classifiers based on their performance [42]. It is a 2-D representation of classifier performance. However, to compare multiple classifiers, the ROC curve has to be reduced to a single scalar quantity. For this purpose, a common metric, the area under the ROC curve (AUC), is computed. The AUC is a portion of the area of the unit square, so it ranges between 0 and 1.0. A random classifier yields the diagonal line between the bottom left corner and the top right corner, which has an area of 0.5, so a trained classifier should not have an AUC below 0.5. An AUC value above 0.95 is considered good in medical applications. Furthermore, micro averages and macro averages are computed for this multiclass classification problem. A macro-average metric assesses performance for each skin disease class independently and then takes the mean, weighting all classes equally, whereas a micro-average metric pools the contributions of all classes before computing the average value.
Two batch normalization (BN) layers are also added after the last two convolution layers. BN is effective at reducing the number of epochs needed to train a model. It stabilises training, permitting a larger range of learning rates and regularization strengths. A summary of the ConvNet model, including its layers, output shapes, parameters and connection details, is given in Figure 4.
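The evaluation metrics described at the start of this section can be sketched in plain Python. This is an illustrative sketch rather than the evaluation code used in the experiments: the function names are hypothetical, the confusion matrix is assumed to be indexed as `cm[true][predicted]`, and the AUC is computed for a single binary (one-versus-rest) problem through its rank interpretation.

```python
def macro_micro(cm):
    """Macro and micro averages of precision and sensitivity.

    `cm[t][p]` counts the samples of true class t predicted as class p.
    Returns ((macro_precision, micro_precision),
             (macro_sensitivity, micro_sensitivity)).
    """
    n = len(cm)
    tp = [cm[k][k] for k in range(n)]
    fp = [sum(cm[t][k] for t in range(n)) - cm[k][k] for k in range(n)]
    fn = [sum(cm[k]) - cm[k][k] for k in range(n)]
    precision = [t / (t + f) if t + f else 0.0 for t, f in zip(tp, fp)]
    sensitivity = [t / (t + f) if t + f else 0.0 for t, f in zip(tp, fn)]
    # Macro: average the per-class values, each class weighted equally.
    macro_p = sum(precision) / n
    macro_s = sum(sensitivity) / n
    # Micro: pool the counts of all classes first, then compute the ratio.
    micro_p = sum(tp) / (sum(tp) + sum(fp))
    micro_s = sum(tp) / (sum(tp) + sum(fn))
    return (macro_p, micro_p), (macro_s, micro_s)


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos_scores for q in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))
```

With strongly imbalanced classes the two averages can differ noticeably, because the macro average gives a rare class the same weight as a frequent one while the micro average is dominated by the frequent classes.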
With an initial learning rate of 0.001 and momentum of 0.9, the ReduceLROnPlateau schedule is employed with a patience of 3 epochs and a decay factor of 0.7. This callback monitors the accuracy at each epoch and reduces the learning rate if no improvement is found for a patience number of epochs. Categorical cross entropy is used as the loss function, and the Adam optimizer is used to train the model. Additionally, performance is monitored throughout training by passing accuracy as the metric. Training of the model continued for 50 epochs with a batch size of 32. After 50 epochs, the training and validation accuracies have stabilised. The learning curves are provided in Figure 5 and Figure 6. The confusion matrix for the results is plotted in Figure 9 (a). The findings of this experiment are shown in Table 1. This model achieves an average accuracy of 0.853.
In the second experiment, a hybrid feature set $f$ consisting of colour moments and GLCM features is used. This hybrid feature set, consisting of seventeen features, is computed for each colour image of the image dataset. The classification model consists of a fully connected model in the form of a multi-layer perceptron (MLP) and a ConvNet model. Two inputs are passed to this model simultaneously, i.e. the RGB colour image and the hybrid feature set $f$. The ConvNet model has the same architecture as discussed in Experiment 1. The MLP model is built from a four-layer fully connected model with 40, 80, 100, and 200 neurons in the first, second, third and fourth layers respectively. Each of these fully connected layers is followed by a dropout layer, with probabilities 0.2, 0.3, 0.3 and 0.4 in that order. The input layers and output layers of both models are integrated as discussed in the section above.
With an initial learning rate of 0.0012 and momentum of 0.9, a ReduceLROnPlateau schedule with a patience of 2 epochs and a decay factor of 0.75 is used. Categorical cross entropy is employed as the loss function, and the Adam optimizer is used to train the model. Furthermore, performance is monitored throughout training by passing accuracy as the metric. Training of the model continued for 20 epochs with a batch size of 16. After 20 epochs, the training and validation accuracies have stabilised. The learning curves are provided in Figure 7 and Figure 8. Figure 9 (b) shows the confusion matrix for the results. Table 1 displays the additional performance metrics of this experiment. This model achieves an average accuracy of 0.983.
[Figure: Enhanced changes in f-score for the performance for the used models.]
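The learning rate schedule used in both experiments can be illustrated with a simplified plain-Python sketch of the reduce-on-plateau logic. It is not the Keras callback itself: `reduce_lr_on_plateau` is a hypothetical function, the monitored metric is assumed to be an accuracy that should increase, and options such as a minimum improvement threshold, cooldown and minimum learning rate are omitted.

```python
def reduce_lr_on_plateau(metric_history, initial_lr, factor, patience):
    """Return the learning rate in effect during each epoch.

    `metric_history` holds the monitored metric measured at the end of each
    epoch. The rate is multiplied by `factor` once the metric has failed to
    improve for `patience` consecutive epochs.
    """
    lr = initial_lr
    best = float("-inf")
    wait = 0
    rates = []
    for value in metric_history:
        rates.append(lr)  # rate used while this epoch was trained
        if value > best:
            best, wait = value, 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
    return rates
```

With the first experiment's settings (initial rate 0.001, factor 0.7, patience 3), three consecutive epochs without improvement lower the rate to 0.0007 for the following epoch.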
According to a comparative evaluation of both models, the cascaded ensembled deep learning model achieves higher accuracy than the ConvNet model. All three performance metrics, i.e. precision, sensitivity and f-score, are higher for the second model than for the ConvNet model on the individual classes. The test loss is reduced to 0.052 from 0.477, and the test accuracy improves to 0.983 from 0.853, as shown in Table 3. A visual representation of Table 1 is shown in Figures 10, 11 and 12. Figure 13 and Figure 14 also confirm the dominance of the cascaded ensembled deep learning model over the ConvNet model. The AUC for each individual class is better in the second model than in the ConvNet model. Furthermore, the micro-average and macro-average areas are higher for the cascaded ensembled model. This work has shown that the cascaded ensembled model outperforms a regular ConvNet model at the cost of computational complexity. The proposed cascaded model needs more computation than the ConvNet model because it requires an additional feature extraction step. However, in the area of medical diagnosis, precision is more significant than computational cost.

V. CONCLUSION
Considering the recent achievements of deep learning architectures, an efficient method is presented for skin lesion classification. In this work, a cascaded model is created that combines the strengths of models based on hand-crafted feature extraction approaches with those of a deep learning model. To attain high accuracy in skin disease image classification, the powerful feature learning ability of deep ConvNets is integrated with handcrafted features including colour moments and texture features. This deep learning architecture is termed the cascaded ensembled deep learning model in this paper. The simulation results show that the proposed model outperforms the ConvNet model. Further research is under way to create a more robust model by combining clinical features such as sex, age, itching, burns, medical history, and location with the handcrafted features.

CODE AVAILABILITY
To assist dermatologists and researchers, the code is provided in a GitHub repository at: https://github.com/shamiktiwari/Skin-cancer-classification-using-Cascaded-Ensemble-of-ConvNet-and-Handcrafted-Features.
AKHILESH KUMAR SHARMA (Senior Member, IEEE) received the B.E., M.E., and Ph.D. degrees in computer science and engineering. He is currently working as a Professor at Manipal University Jaipur (MUJ), India. He has chaired many sessions and acted as an expert for keynotes in IITs and NITs in India and also in Vietnam, Thailand, Malaysia, Australia, China, and Singapore. He has established the CIDCR Laboratory in MUJ. He has published over 70 articles in journals and conferences and written books and book chapters. He has six patents and four copyrights to his credit. He is a Senior Member of ACM, CSI, IUCEE, and the MIR Laboratory, USA. He is also a member of the Institution of Engineers of India. He has received many awards. He is the Acting Secretary of the ACM Professional Chapter.
SHAMIK TIWARI is currently working as a Senior Associate Professor with the School of Computer Science, University of Petroleum and Energy Studies, Dehradun. He has rich experience of around 18 years as an academician. His research interests include digital image processing, computer vision, bio-metrics, machine learning especially deep learning, and health informatics. He has written many national and international publications, including books in these fields. He is an active member of the Universal Association of Computer and Electronics Engineers (UACEE) and the International Association of Innovation Professionals (IAOIP). VOLUME 10, 2022