Deep Semantic Segmentation and Multi-Class Skin Lesion Classification Based on Convolutional Neural Network

Skin cancer develops from abnormal cell growth. These cells grow rapidly and destroy normal skin cells; however, the disease is curable when detected at an initial stage, which reduces the patient's mortality rate. In this article, a method is proposed for localization, segmentation, and classification of skin lesions at an early stage. The proposed method contains three phases. In Phase I, different types of skin lesion are localized using a tinyYOLOv2 model in which an Open Neural Network Exchange (ONNX) SqueezeNet model is used as the backbone. Features are extracted from the depthconcat7 layer of SqueezeNet and passed as input to tinyYOLOv2. The proposed model accurately localizes the affected part of the skin. In Phase II, a 13-layer 3D semantic segmentation model (1 input, 4 convolutional, 3 batch-normalization, 3 ReLU, softmax, and pixel classification layers) is used for segmentation. In the proposed segmentation model, the pixel classification layer computes the overlap region between the segmented and ground-truth images. In Phase III, deep features are extracted using the ResNet-18 model and optimized features are selected using the ant colony optimization (ACO) method. The optimized feature vector is passed to classifiers such as optimized SVM (O-SVM) and optimized naive Bayes (O-NB). The proposed method is evaluated on the challenging MICCAI ISIC 2017, 2018, and 2019 datasets, where it accurately localized, segmented, and classified skin lesions at an early stage.


I. INTRODUCTION
Skin cancer is among the most aggressive and common cancers in human beings. It is caused by abnormal cell growth: cells develop through mitosis and replicate themselves. Melanoma is caused by anomalous cell growth; these cells replicate themselves, migrate through the bloodstream to other body organs, and also infect the adjacent skin tissues. The basement membrane protects the epidermis; cancerous cells grow, bypass the basement membrane, and spread into the inner skin layers. Melanocytes create a brown pigment known as melanin, a protective pigment that shields the skin from ultraviolet rays. Skin cancer commonly affects people who play or work outdoors and is typical among sunbathers. Fair-skinned people are most often affected by skin cancer due to lower melanin production; however, skin cancer may also develop in dark-skinned people with little exposure to sunlight [1]. News reports from 2008-2018 indicate a 53% rise in the number of new melanoma cases diagnosed yearly [2], [3], and the death rate from this disease is estimated to increase over the next 10 years. If treated at a late stage, the survival rate is less than 14% [4], [5]. Therefore, diagnosing skin cancer at an early stage is necessary, and it remains a challenging task. Qualified dermatologists mostly pursue a sequence of steps for skin cancer diagnosis: they first observe suspected lesions with the naked eye, then magnify the lesions microscopically, and follow up with a biopsy. This is a time-consuming process, and patients are often diagnosed too late. Correct diagnosis and treatment are subjective, depending on the ability of the clinician, and dermatologists diagnose skin lesions with an accuracy of less than 80% [6]. Moreover, in many health care centers around the world, few professional doctors are available.
(The associate editor coordinating the review of this manuscript and approving it for publication was Shuihua Wang.)
Therefore, computerized methods have been implemented for detection of skin cancer [7]. Machine learning algorithms such as decision trees [8], Bayesian classifiers [9], and SVM [10] are used to classify different grades of skin cancer. However, accurate skin cancer detection is still an intricate task due to several factors, such as variability in the texture, shape, color, and size of the lesion, poor contrast/brightness, light/dark hairs, and irregular or unclear lesion boundaries. Optimized feature extraction and selection is also a challenge for accurate classification [11]. To overcome these challenges, this article investigates a new methodology for the detection of eight types of skin cancer: MN, BCC, AK, NV, BKL, DF, VL, and SCC. The main contributions are as follows: a YOLOv2-SqueezeNet model is used to localize the infected region with its location, class label, and prediction score; the localized region is segmented using a modified 13-layer semantic segmentation model; deep features are extracted and selected using ResNet-18 and ACO, respectively; and the resultant feature vector is passed to O-SVM and O-NB for skin lesion classification.

II. RELATED WORK
Recently, much work has been carried out on discriminating different kinds of skin lesions, some of which is discussed in this section [12]. Skin lesions are detected in four major steps: preprocessing, segmentation, feature extraction, and classification. During image acquisition, dermoscopic images acquire artifacts such as thin/thick hair, low contrast and resolution, dark spots or bubbles around the infected skin region, and irregular lesion boundaries, which ultimately reduce the accuracy of skin lesion detection. Preprocessing handles these challenges and helps toward accurate detection of the skin lesion [13]. A high-pass filter is used to highlight edges, and illumination is further removed by a homomorphic filter [14]. Segmentation is a crucial step that provides significant information about the lesion, such as its border, shape, asymmetry, and irregularity [15]. Morphological filtering with a weight-based feature selection approach is used to detect the lesion boundary [16]. A star-shape semantic segmentation method is used for skin lesion segmentation [17]. An ABCD rule-based approach is used for skin lesion detection, in which total dermoscopic scores are measured on the basis of asymmetry, lesion diameter, and color [18].
Feature extraction is the third major step, extracting meaningful information from the input images based on characteristics such as shape, color, and texture. However, selecting the best features is also a challenge for improved classification [18]. Hence, after GLCM feature extraction, a GA is applied to select the optimum features [19]. PCA and PSO are also used to select active feature vectors [20]. After feature extraction, classification discriminates the affected skin region into benign or malignant. KNN, decision trees [21], and SVM [22] are used for skin lesion classification. Deep learning methods [23]-[25] are widely utilized for skin lesion detection [2]. Esteva et al. developed GoogLeNet and Inception V3 CNN models for skin cancer classification. An AlexNet [26] model is applied to dermoscopic images to learn the pattern of the skin lesion; the extracted feature vector is passed to a multiclass SVM to discriminate between the healthy and infected skin regions. A deep full-resolution convolutional network (DFRCN) with a softmax layer [27] is used for classification of skin lesions.

III. PROPOSED METHODOLOGY
The proposed deep learning approach for skin lesion detection is shown in Fig. 1.

A. LOCALIZATION OF SKIN LESIONS
The YOLO loss is computed in three major categories: localization, confidence, and classification. The localization loss computes the error between the actual and predicted bounding boxes. The confidence loss is measured by adding the confidence scores when a skin lesion is detected and when it is not detected in a bounding box of a grid cell. The classification loss computes the squared error among the conditional class probabilities for each class in the grid cell. The YOLO loss function is computed as:

\mathcal{L} = A_1 \sum_{i=1}^{G^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} \Big[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 + (\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2 \Big] + A_2 \sum_{i=1}^{G^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{obj} (s_i - \hat{s}_i)^2 + A_3 \sum_{i=1}^{G^2} \sum_{j=1}^{B} \mathbb{1}_{ij}^{noobj} (s_i - \hat{s}_i)^2 + A_4 \sum_{i=1}^{G^2} \mathbb{1}_{i}^{obj} \sum_{c} (p_i(c) - \hat{p}_i(c))^2

where G, B, h, w, p, and s denote the number of grid cells, the number of bounding boxes, height, width, probability, and confidence score, respectively. The localization and classification losses are controlled by the weight parameters A_1 and A_4, respectively; similarly, A_2 and A_3 control the confidence loss.
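As a concrete illustration of the three loss components above, the following is a minimal NumPy sketch. The array layout, the object mask, and the weight values in `A` are illustrative assumptions, not the paper's configuration; ground-truth confidences are passed explicitly rather than derived from IoU as in a full YOLO implementation.

```python
import numpy as np

def yolo_loss(pred, gt, obj_mask, A=(5.0, 1.0, 0.5, 1.0)):
    """Sketch of a YOLO-style loss over G*G grid cells and B boxes.

    pred, gt: dicts of arrays (shapes are illustrative):
      'box':  (G*G, B, 4) -> x, y, sqrt(w), sqrt(h)
      'conf': (G*G, B)    -> confidence score s
      'prob': (G*G, C)    -> conditional class probabilities p
    obj_mask: (G*G, B) boolean, True where a lesion is assigned to a box.
    A: (A1, A2, A3, A4) weights for localization, object confidence,
       no-object confidence, and classification losses (hypothetical values).
    """
    A1, A2, A3, A4 = A
    noobj = ~obj_mask
    # Localization loss: squared error on box centre and sqrt width/height
    loc = np.sum(obj_mask[..., None] * (pred['box'] - gt['box']) ** 2)
    # Confidence loss: boxes with and without an object, weighted separately
    conf_obj = np.sum(obj_mask * (pred['conf'] - gt['conf']) ** 2)
    conf_noobj = np.sum(noobj * (pred['conf'] - gt['conf']) ** 2)
    # Classification loss: squared error over class probabilities per cell
    cell_has_obj = obj_mask.any(axis=1)
    cls = np.sum(cell_has_obj[:, None] * (pred['prob'] - gt['prob']) ** 2)
    return A1 * loc + A2 * conf_obj + A3 * conf_noobj + A4 * cls
```

The loss is zero when predictions match the ground truth exactly and grows with the weighted squared errors of each component.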
A variable with a hat denotes the ground-truth value in the i-th grid cell, whereas a variable without a hat represents the value of the j-th bounding box in the i-th grid cell. (x_i, y_i) and (x̂_i, ŷ_i) represent the center points of the j-th bounding box and the ground truth with respect to the i-th grid cell, respectively. Skin lesions are localized using the proposed YOLOv2-SqueezeNet model, as shown in the corresponding figure.

B. SEGMENTATION OF SKIN LESIONS
In this work, a 13-layer semantic segmentation model is proposed. The segmentation model consists of four blocks, which are illustrated in Figure 3.
A dilated convolution layer increases the receptive field of a layer without increasing the number of parameters or computations. Therefore, in this model, two dilated convolution layers with dilation factors 1 and 2 are used. Each uses a 3 × 3 filter and pads the input so that the output size equals the input size, via the [1 1 1 1] padding option.
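The effect of dilation can be seen in a minimal NumPy sketch of a valid-mode 2D convolution: the kernel keeps its 9 parameters, but its effective footprint grows from 3 × 3 (dilation 1) to 5 × 5 (dilation 2).

```python
import numpy as np

def dilated_conv2d(x, k, dilation=1):
    """Valid-mode 2D convolution with a dilated kernel (minimal sketch)."""
    kh, kw = k.shape
    # Effective kernel size grows with dilation; parameter count does not
    eh = kh + (kh - 1) * (dilation - 1)
    ew = kw + (kw - 1) * (dilation - 1)
    H, W = x.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input with stride = dilation inside the window
            patch = x[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = np.sum(patch * k)
    return out
```

With a 7 × 7 input and a 3 × 3 kernel, dilation 1 yields a 5 × 5 output while dilation 2 yields a 3 × 3 output, reflecting the larger receptive field per output pixel.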
Two batch-normalization layers are used between the convolution and ReLU layers to normalize the input x_i by measuring μ_B and σ²_B over the mini-batch, which speeds up CNN training and also reduces sensitivity to the network initialization.
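The normalization over the mini-batch can be sketched in a few lines of NumPy (the learnable scale `gamma` and shift `beta` are shown with default values for simplicity):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize activations over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)    # mu_B: mini-batch mean per feature
    var = x.var(axis=0)    # sigma^2_B: mini-batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After normalization, each feature has approximately zero mean and unit variance across the batch, which is what stabilizes and accelerates training.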
The normalized activations are defined as:

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}

In the decoder section, a dilated convolution layer with dilation factor 4 is used. The last convolutional layer uses a 1 × 1 filter to squeeze the number of channels down to the number of class labels. The mini-batch size of the proposed model is 16, and the model is trained for a maximum of 300 epochs with a learning rate of 1e-3. The layered architecture of the segmentation model used for training is given in Table 2, and the segmented lesion region is shown in the corresponding figure.
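The layer counts described above (1 input, 4 convolutional, 3 batch-normalization, 3 ReLU, softmax, pixel classification) can be sketched in PyTorch. The channel width (32) is an illustrative assumption, not the paper's configuration, and pixel classification is realized as a per-pixel argmax over the softmax output:

```python
import torch
import torch.nn as nn

class LesionSegNet(nn.Module):
    """Sketch of the 13-layer semantic segmentation model described above."""
    def __init__(self, num_classes=2, width=32):
        super().__init__()
        self.net = nn.Sequential(
            # Encoder: two dilated 3x3 convolutions (dilation factors 1, 2),
            # padded so the output size equals the input size
            nn.Conv2d(3, width, 3, padding=1, dilation=1),
            nn.BatchNorm2d(width), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=2, dilation=2),
            nn.BatchNorm2d(width), nn.ReLU(),
            # Decoder: dilated convolution with dilation factor 4
            nn.Conv2d(width, width, 3, padding=4, dilation=4),
            nn.BatchNorm2d(width), nn.ReLU(),
            # 1x1 convolution squeezes channels down to the class labels
            nn.Conv2d(width, num_classes, 1),
            # Softmax over channels; pixel classification = argmax per pixel
            nn.Softmax(dim=1),
        )

    def forward(self, x):
        return self.net(x)
```

A forward pass on a (1, 3, H, W) image returns per-pixel class probabilities of shape (1, num_classes, H, W); `out.argmax(dim=1)` then gives the segmentation mask.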

C. CLASSIFICATION OF DIFFERENT KINDS OF SKIN LESIONS
In the medical domain, classification of different types of disease using machine learning approaches is helpful for medical specialists. Computerized approaches become computationally expensive as the number of patient slices increases. Deep convolutional neural networks perform better on large amounts of input data than classical methodologies [33]. In deep learning methodologies, features are extracted from the input and integrated into a single matrix to improve performance. In this article, the ResNet-18 model is used to extract deep features.

D. FEATURES ENGINEERING AND CLASSIFICATION
The feature vectors obtained in the previous section form a pool from which selecting the prominent features is a challenging task. Therefore, in this article, feature engineering is performed based on ant colony optimization (ACO) [34]. ACO is a computational approach for problem optimization in which the problem is optimized by finding the shortest path based on pheromone trails and heuristic exponential weights. The feature vector of length 1000 is passed to the ACO to find the active deep features based on an optimized cost function. The features are optimized using the selected parameters mentioned in Table 3, and the best cost function is shown graphically in Fig. 6.
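The pheromone-driven search can be sketched as follows. This is a simplified illustration of ACO-based feature subset selection, not the paper's exact algorithm: the ant count, evaporation rate `rho`, and reinforcement rule are illustrative assumptions, and the heuristic weights are omitted.

```python
import numpy as np

def aco_select(cost_fn, n_features, n_select, n_ants=10, n_iters=30,
               rho=0.1, seed=0):
    """Minimal ant-colony feature selection sketch (illustrative parameters).

    Each ant samples a feature subset with probability proportional to the
    pheromone on each feature; pheromone evaporates (rho) each iteration
    and is reinforced on that iteration's best subset.
    """
    rng = np.random.default_rng(seed)
    tau = np.ones(n_features)                 # pheromone trail per feature
    best_subset, best_cost = None, np.inf
    for _ in range(n_iters):
        iter_best, iter_cost = None, np.inf
        for _ in range(n_ants):
            p = tau / tau.sum()
            subset = rng.choice(n_features, size=n_select, replace=False, p=p)
            c = cost_fn(subset)
            if c < iter_cost:
                iter_best, iter_cost = subset, c
        tau *= (1.0 - rho)                    # evaporation
        tau[iter_best] += 1.0 / (1.0 + iter_cost)  # reinforce best subset
        if iter_cost < best_cost:
            best_subset, best_cost = iter_best, iter_cost
    return np.sort(best_subset), best_cost
```

For the paper's setting, `cost_fn` would score a candidate subset of the 1000 ResNet-18 features (for example by validation error of a classifier trained on that subset), and the returned subset is the optimized feature vector passed on to the classifiers.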

IV. MATERIAL FOR PERFORMANCE EVALUATION
The performance of the proposed method is evaluated on three recent, challenging ISIC datasets: 2017 [35], 2018 [36], and 2019 [37], [38]. The datasets are described in Table 4, whose second column shows the total number of skin lesion slices available in each dataset. To increase the size and complexity of the dataset, rotation is applied at different angles: 30°, 60°, 90°, 120°, 180°, and 270°. We observed that the numbers of MEL and NV images are already sufficient compared to the other lesion types; therefore, the MEL and NV images are used without augmentation in the experiments. 25331 slices are available, and after rotation at different angles the number of slices increases to 35971. Three experiments are implemented to evaluate the proposed method, using MATLAB 2020a with a 740K NVIDIA graphics card.
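The rotation-based augmentation step can be sketched in Python with Pillow (the paper uses MATLAB; this is an equivalent illustration with the same angle list):

```python
import numpy as np
from PIL import Image

ANGLES = [30, 60, 90, 120, 180, 270]  # rotation angles used in the text

def augment_rotations(img_array):
    """Return the rotated copies of one slice used to enlarge the dataset."""
    img = Image.fromarray(img_array)
    # rotate() keeps the original image size by default
    return [np.asarray(img.rotate(a)) for a in ANGLES]
```

Each non-MEL/NV slice thus yields six additional rotated slices, which is how the under-represented lesion classes are balanced against MEL and NV.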

A. EXPERIMENT #1 LOCALIZATION OF THE SKIN LESIONS
In this experiment, the YOLOv2-SqueezeNet model is applied to localize the skin lesions with class labels and prediction scores. The model is evaluated with different performance measures, such as average precision, recall, IoU, and average log miss rate (am), as listed in Table 5. The localization results in Table 5 show that the method achieves an mAP of 0.95 on ISBI 2017, 0.96 on ISBI 2018, 1.00 on ISBI 2019, and 0.94 on the ISIC 2020 dataset. The mAP with respect to am and IoU is plotted in Fig. 8, and the localization results with class labels and prediction scores are shown visually in Figure 9.
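The IoU measure used here compares a predicted bounding box against the ground-truth box; a minimal implementation for axis-aligned boxes is:

```python
def box_iou(a, b):
    """IoU between two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Intersection rectangle (may be empty)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = sum of areas minus intersection
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

Identical boxes give IoU 1.0, disjoint boxes give 0.0; a localization is typically counted as correct when its IoU with the ground truth exceeds a threshold such as 0.5.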

B. EXPERIMENT #2 PIXEL BASED CLASSIFICATION
In this experiment, the localized infected region is segmented using the proposed segmentation model. The segmentation results are evaluated pixel by pixel against the ground-truth annotations in terms of performance measures such as IoU and accuracy, as listed in Table 6. The results in Table 6 show that the proposed segmentation method achieves global accuracies of 0.93 and 0.95 on ISBI 2017 and ISBI 2018, respectively; qualitative segmentation results are shown in Figure 11.
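The pixel-by-pixel evaluation against the ground truth reduces to counting overlapping and agreeing pixels between two binary masks, as in this NumPy sketch:

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """Pixel-wise IoU and global accuracy for binary lesion masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()   # overlap region
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    global_acc = (pred == gt).mean()         # fraction of correct pixels
    return iou, global_acc
```

This is the same overlap computation the pixel classification layer performs between the segmented output and the ground-truth image.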
The discrimination outcomes are computed using different metrics, as reported in Tables 7-15. The classification results in Tables 7-15 show that the proposed method classifies the input images into two classes. The quantitative comparison with existing works in terms of different performance metrics is given in Table 15.
VOLUME 8, 2020
The classification results are compared with recent work [39]-[43]. The FocusNet model is utilized for segmentation, in which the encoding layers encode the data properly, helping to predict the lesion segmentation; however, FocusNet is less sensitive for lesion detection, with a sensitivity of 0.76 [39]. A U-Net with FCN8s method is used for lesion segmentation with 0.87 sensitivity and 0.95 specificity on the ISIC 2017 dataset [40]. Similarly, an ensemble approach, in which a combination of transfer learning models is used for segmentation of skin lesions, achieves a validation score of 0.76 [41]. Another existing method uses the pre-trained DenseNet-201, ResNet-50, Inception-v3, and Inception-ResNet-v2 models together with FrCN for detection of skin lesions with 0.88 accuracy [42]. Transfer learning models such as VGG16, DenseNet-201, Inception-ResNet-V2, and GoogLeNet are used for skin lesion detection on the ISBI 2019 dataset with 94.92% accuracy [43].
In this work, two proposed end-to-end deep models, YOLOv2-SqueezeNet and the 3D semantic segmentation model, are fine-tuned with selected configuration parameters that provide accurate localization and segmentation of the lesion region. Furthermore, data augmentation is implemented to balance the slices of the different kinds of lesions. After data augmentation, deep features are extracted using the cross-entropy function and the optimum features are selected using ACO. The data augmentation approach with the optimized feature vector provides higher classification accuracy: the proposed method achieves up to 98% accuracy on the ISBI 2018 and 2019 datasets and 99% accuracy on the ISBI 2017 dataset.
The results comparison shows that the proposed work performs better than the latest published work.

V. CONCLUSION
In this research, ensemble CNN models are proposed for skin lesion detection. In the localization method, an ONNX SqueezeNet model is used as the backbone of the YOLOv2 model, and the depthconcat7 layer is passed as input to the YOLO model. The method localizes the infected skin lesion accurately, achieving mAPs of 0.95, 0.96, 1.00, and 0.94 on the ISBI 2017, ISBI 2018, ISBI 2019, and ISIC 2020 datasets, respectively. A CNN-based 3D segmentation method is also proposed, whose configuration parameters are selected after extensive experiments for accurate lesion segmentation; it achieves global accuracies of 0.93 and 0.95 on ISBI 2017 and ISBI 2018, respectively. Skin lesion classification is performed by applying the ResNet-18 model, with deep features extracted by the cross-entropy activation function; the extracted feature vectors are then enhanced using the ACO method. The hybrid classification approach provides good classification results compared to recent existing work. In the future, this work will be further enhanced by applying reinforcement learning to classify skin lesions more accurately.
MUHAMMAD ALMAS ANJUM is currently a Professor at the College of Electrical and Mechanical Engineering, National University of Sciences and Technology (NUST), Pakistan. He has also contributed as a Team Lead in establishing the Centre of Excellence Information Technology, where he served as its first Pioneer Head. He designed and established the Centre of Innovation and Entrepreneurship, College of EME. He has also served as the Dean of the Faculty of Computer Sciences, University of Wah, and the Director of Research and Development, College of EME, NUST. His areas of specialization are pattern recognition, security systems (biometrics), and computer vision. Apart from this, he has more than 45 international publications in his area of specialization and has also published a book titled ''Face Recognition a Challenge in Biometrics: Image Resolution Issues in Face Recognition.'' He has evaluated more than ten master's and Ph.D. theses. He is a reviewer/member of more than a dozen international technical committees as well as an Executive Editor of the UW Journal of Computer Sciences. He has undertaken different technical projects around the globe while contributing to uplifting the local communities.
JAVARIA AMIN is actively involved in research and produces high-quality work on medical image processing, pattern recognition, and computer vision. She has published more than 22 research articles in reputed and prestigious international journals with an accumulated impact factor of around 55. Her focus area of research is the detection of anomalies in human body parts using machine learning and powerful deep learning algorithms.
MUHAMMAD SHARIF (Senior Member, IEEE) is currently an Associate Professor at COMSATS University Islamabad, Wah Campus, Pakistan. He worked for one year, in 1995, at Alpha Soft, a U.K.-based software house. He is an OCP in the Developer Track. He has been in the teaching profession since 1996. His research interests include medical imaging, biometrics, computer vision, machine learning, and agriculture plants imaging. He also headed the department from 2008 to 2011 and achieved the targeted outputs. He has more than 210 research publications in IF, SCI, and ISI journals as well as in national and international conferences, and has gained more than 245 impact factor. He has supervised three Ph.D. (CS) and more than 50 M.S. (CS) theses students to date. He received the COMSATS Research Productivity Award from 2011 to 2018. He served on the TPC for IEEE FIT 2014-2019, and is currently serving as an Associate Editor of IEEE ACCESS, a Guest Editor of special issues, and a reviewer of well-reputed journals.
HABIB ULLAH KHAN (Member, IEEE) received the Ph.D. degree in management information systems from Leeds Beckett University, U.K. He is currently an Associate Professor of MIS at the Department of Accounting & Information Systems, College of Business and Economics, Qatar University, Qatar. He has more than 19 years of industry, teaching, and research experience. He is also an active researcher and his research works are published in leading journals of the MIS field. His research interests include the areas of IT security, online behavior, IT adoption in supply chain management, Internet addiction, mobile commerce, computer mediated communication, IT outsourcing, big data, cloud computing, and e-learning. He is a member of leading professional organizations, like the IEEE, DSI, SWDSI, ABIS, FBD, and EFMD. He is a reviewer of leading journals of his field and also working as an editor for some journals.

MUHAMMAD SHERAZ ARSHAD MALIK received the Ph.D. degree from Universiti Teknologi PETRONAS, Malaysia. He has more than ten years of industrial, research, and academia experience in various research and senior administration roles at various countries. He is currently working as an Assistant Professor at the Department of Information Technology, Government College University Faisalabad, Pakistan, where he is also the Chairman of the Department of Estate Care. His research interests include machine learning and human interaction, data visualization, big data, digital image processing, and artificial intelligence.
SEIFEDINE KADRY (Senior Member, IEEE) received the bachelor's degree in applied mathematics from Lebanese University, in 1999, the M.S. degree in computation from Reims University, France, and EPFL, Lausanne, in 2002, the Ph.D. degree from Blaise Pascal University, France, in 2007, and the H.D.R. degree in engineering science from Rouen University, in 2017. He is currently working as an Associate Professor at Beirut Arab University, Lebanon. His current research interests include education using technology, smart cities, system prognostics, stochastic systems, and probability and reliability analysis. He is a Fellow of IET and ACSIT and a Program Evaluator of ABET. He is an Associate Editor of IEEE ACCESS journal.