Classifying Melanoma Skin Lesions Using Convolutional Spiking Neural Networks With Unsupervised STDP Learning Rule

Deep learning methods have achieved some success in automatic skin lesion recognition, but problems remain, such as limited training samples, overly complicated network structures, and high computational costs. Considering the inherent power efficiency, biological plausibility, and good image recognition performance of spiking neural networks (SNNs), in this paper we classify malignant melanoma and benign melanocytic nevi skin lesions using convolutional SNNs with the unsupervised spike-timing-dependent plasticity (STDP) learning rule. Efficient temporal coding, an event-driven learning rule, and a winner-take-all (WTA) mechanism together ensure sparse spike coding and efficient learning in our networks, which achieve an average accuracy of 83.8%. We further propose using feature selection to select more diagnostic features and improve the classification performance of our networks. Our SNNs with feature selection reach an average accuracy of 87.7%. Experimental results show that, compared with CNNs that need to be trained from scratch, our SNNs (with and without feature selection) not only achieve much better classification accuracy but also have much better runtime efficiency. Moreover, although pretrained CNN models can achieve similar running times, our proposed SNNs are more stable and easier to use because there is no need to try many pretrained models, and our SNNs also achieve much better classification accuracy than the pretrained CNNs. In addition, our networks have only three convolutional layers, so the complexity of the model and the number of parameters to be trained are greatly reduced. Our work shows that STDP-based SNNs are very beneficial for implementing automated skin lesion classifiers on small portable devices.


I. INTRODUCTION
Skin cancer is one of the most common malignancies worldwide [1]. Over the past three decades, more people have been diagnosed with skin cancer than with all other cancers combined [2]. Malignant melanoma is a high-risk, deadly skin cancer. Early detection of malignant lesions is of great significance in helping clinicians improve patients' chances of survival [3]. Because of the visual similarities between some lesion types, correct diagnosis is a challenging task for clinicians and depends largely on their experience.
The associate editor coordinating the review of this manuscript and approving it for publication was Ganesh Naik.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Machine learning (ML) methods have shown their advantages in detecting key features and patterns in complex datasets, and are thus suitable for classification, prediction, and estimation tasks [4]. In recent years there has been a growing trend toward applying ML methods as an aid to accurate and automated cancer diagnosis and detection [4], [5]. The application of ML techniques has improved the accuracy of cancer prediction by 15%-20% over the past decades [5]. Deep learning, in particular CNNs supported by advanced computing technology and large datasets, has become one of the most popular and powerful machine learning methods in image recognition and classification [6], and has been applied to the classification of skin lesions [7]-[9]. CNNs can learn from a training image set and automatically extract important features for classification. The prior knowledge and complicated image preprocessing that are necessary for image classification with traditional ML algorithms are no longer needed to the same extent. Some classifiers based on deep learning have been shown to classify images of skin cancer with performance comparable to that of skilled dermatologists [8]. Thus CNNs have the potential to help develop fast, dermatologist-level, computer-aided skin lesion classifiers. However, there is still a lack of high-quality medical image datasets that can be used for training. This mainly refers to the lack of annotated/labeled data or of images for the abnormal classes [10]. CNNs with simple architectures are prone to overfitting on small training datasets. Some researchers apply very deep CNN architectures (for example, Resnet152, which has 152 layers [11]).
Although this can improve the classification performance of the networks, it also increases computing costs, which is a major barrier in clinical applications [12]-[14]. More researchers tend to use pretrained CNNs to classify skin lesions [15]-[26], which mitigates overfitting, but the network architectures remain very complex. Moreover, pretrained CNNs use features learned from natural image datasets (such as ImageNet) without sufficiently considering the characteristics of medical images [17], which limits the application of transfer learning in medical image analysis. Unsupervised learning methods are therefore needed in medical image analysis when labeled training data are scarce. Furthermore, to facilitate the deployment of automated skin lesion classifiers and make access convenient and cheap, for example through apps installed on mobile devices, the computational costs of CNN-based skin lesion classifiers must be reduced.
SNNs have emerged as an ideal biologically inspired neuromorphic computing paradigm for realizing energy-efficient on-chip intelligent hardware [27], [28]. As in the brain, information in SNNs is encoded not only by spike rates but also by the precise spike times and spike latencies of neurons. Furthermore, SNNs usually apply bio-inspired STDP as an unsupervised local learning (synaptic weight modification) rule, which is crucial for learning in the brain. STDP has been observed in different brain areas [29]-[32], in particular in the visual cortex [33]-[35]. Under this form of synaptic plasticity, the weight modification depends on the temporal order of, and the time difference between, presynaptic and postsynaptic spikes. Since individual spike events in the networks can be made sparse in time, learning in SNNs is in principle sparse and event-driven, leading to low computational cost. Moreover, SNNs equipped with the unsupervised STDP learning rule can learn the spatio-temporal patterns of input signals, especially in an online mode [36]-[38]. SNNs with multiple hidden layers can extract more complex features from the input to obtain high classification performance [39], [40]. There are deep SNNs with unsupervised learning rules whose performance is comparable to that of traditional CNNs on small-scale image recognition tasks [41]-[43]. Recently SNNs have shown very good performance in pattern recognition tasks such as visual processing [41], [44], [45] and speech recognition [46], [47]. They have also been applied to predict strokes and seizures in medical diagnosis based on electroencephalogram (EEG) classification [48], [49]. Kasabov et al. [48] proposed a new spiking neural network reservoir system for early prediction of stroke occurrence on an individual basis. Their results showed that the method had clear advantages in the accuracy and timing of stroke event prediction compared with standard ML algorithms.
Ghosh-Dastidar et al. [49] proposed an efficient SNN model for epilepsy and epileptic seizure detection using EEG. Their model reached a classification accuracy of 92.5%.
Although SNNs have shown good performance in natural image classification, to our knowledge this is the first work to apply SNNs to medical image classification tasks. It is worth noting that medical image classification is more difficult than traditional natural image classification [50]. This is mainly because, unlike in natural images, the boundaries between different tissues, or between normal tissues and lesion areas, are usually not clear, and their texture differences are not obvious. Besides, some imaging principles used for medical images differ from those used for natural images. In addition, compared with the most similar existing SNNs (i.e., Kheradpisheh et al. [41]), we further propose a feature selection operation that helps pick out diagnostic features which may not be so prominent, in order to improve the classification performance of the SNNs. In our model, input skin lesion images are encoded into spike trains by a difference of Gaussians (DOG) filter. Then, we use the unsupervised STDP rule in combination with WTA and lateral inhibition mechanisms to extract prominent hierarchical features from the spike trains of skin images in a CNN-like architecture (with only a few convolutional and pooling layers). We use a feature selection method to select diagnostic features from the extracted features. Finally, an SVM classifier is used to classify the input skin lesion images based on the selected features.
In our networks, the STDP learning rule and the lateral inhibition mechanism work together to enable the networks to process input images with sparse but informative spikes. Among SNNs with the unsupervised STDP learning rule, this kind of convolutional SNN architecture has so far provided the best classification performance on large-scale natural images [41]. Our STDP-based SNNs can learn from small training datasets quickly and effectively in an unsupervised manner and do not suffer from the overfitting caused by small training set sizes [41], [51], [52]. Therefore, our networks are well suited to the problem of scarce labeled data in medical image analysis. Moreover, our networks have only a few convolutional and pooling layers, so only a very small number of parameters need to be trained. Besides, efficient temporal coding, WTA, and the lateral inhibition mechanism ensure the sparseness of the spike coding. Together these enable the networks to run with very little computation and memory, which facilitates the deployment of automatic skin lesion diagnosis systems on smartphones and other portable devices, as well as in the cloud.
The classification results of our networks are compared with those of state-of-the-art CNNs. The results show that our networks, which have only a few hidden layers, can learn features well from the small skin lesion training datasets, and that after feature selection they achieve better classification performance than both CNNs trained from scratch and pretrained CNNs. Our networks incur much lower computational costs than traditional CNNs trained from scratch, and their running times are comparable to those of pretrained CNN models. Our work evaluates the validity of SNNs with the unsupervised STDP learning rule in classifying medical images and discusses the advantages of SNNs over traditional deep neural networks.

II. RELATED WORKS
Since high-quality published skin cancer datasets are limited, training CNNs from scratch makes the models prone to overfitting. Therefore, more researchers tend to use pretrained CNNs to classify skin lesions [15]-[26]. Most such works pretrain CNNs on the ImageNet [53] dataset and then fine-tune the weights for the specific classification problem. The most commonly used CNN architectures are VGG16 [18], [25], Resnet [22], [54]-[56], GoogleNet [16], [20], [57], and Alexnet [58]. Haenssle et al. [16] classified melanoma and melanocytic nevi using a pretrained Inception v4 model and achieved an area under the receiver operating characteristic curve (AUC) of 0.95. Compared with a large international group of 58 dermatologists, their model outperformed most of the dermatologists. Yu et al. [17] confirmed that very deep CNNs with effective training can be used for skin lesion diagnosis, even with limited training data. They achieved an accuracy of 85.5% on the 2016 ISIC dataset and won first place in the ISIC 2016 challenge. Menegola et al. [18] used VGG16 networks pretrained on a diabetic retinopathy dataset and on ImageNet, respectively; their results showed that the networks pretrained on ImageNet performed better, reaching an AUC of 80.9% on the 2016 ISIC dataset. Codella et al. [19] combined deep learning, sparse coding, and SVM for melanoma recognition, achieving an accuracy of 93.1% on the melanoma vs. all non-melanoma lesions task and 73.9% on the melanoma vs. melanocytic nevi task. Mahbod et al. [21] proposed a method that combines the classification results of three pretrained CNNs (Alexnet, VGG16, and Resnet18) to discriminate between three lesion classes (malignant melanoma, seborrheic keratosis, and benign nevi). This method proved more effective than a single network and obtained an AUC of 0.838 for melanoma classification.
In addition, because hand-crafted features cannot cope with the large intra-class variation of melanoma and the high visual similarity between melanoma and non-melanoma lesions, some researchers have proposed performing segmentation first and then recognizing melanoma from the segmentation results [17], [59]-[62].
Although very deep CNNs or combined deep network models can improve classification accuracy, they consume even more hardware resources, which makes it more difficult to deploy diagnostic systems on small portable devices. Besides, recent studies have shown that the success of transfer learning depends on the distance between the data used to pretrain the CNNs and the data in the actual classification task [63]. Therefore, CNNs pretrained on natural images may not cope well with the challenge of melanoma recognition [17], [64]. In recent years, SNNs have shown good performance in many image recognition tasks [40], [51], [65]-[69]. Owing to their sparse, spike-based communication framework, they also have large advantages in computational cost and hardware friendliness [27], [70]-[74]. Kheradpisheh et al. [41] proposed a deep SNN that uses the STDP learning rule to train multiple convolutional/pooling layers to extract visual features from images, which are then sent to an SVM classifier. Their model reached an accuracy of 98.4% on the MNIST dataset, and their results showed that the network performs well on small training datasets. Qiao et al. [74] proposed a biologically plausible SNN model for hardware implementation, which used a hardware-friendly STDP mechanism to achieve unsupervised learning. Their model achieved a recognition accuracy of 94.5% on the MNIST dataset and significantly reduced hardware costs and power consumption.
Therefore, we use convolutional SNNs for melanoma classification. Our convolutional SNNs use an STDP-based unsupervised learning method that can learn features from small training datasets. Moreover, the networks use a sparse coding method and only three convolutional layers, which makes them run faster and consume less memory.

A. MATERIALS
To evaluate the performance of our STDP-based convolutional SNNs on skin lesion classification, we use data from the International Skin Imaging Collaboration (ISIC) 2018 Challenge, an international effort on automatic skin lesion analysis towards melanoma detection that includes lesion segmentation, dermoscopic feature extraction, and lesion classification. The ISIC datasets are currently the largest public collection of dermoscopic images of skin lesions in the world. The dataset we use is from ISIC 2018 Challenge Task 3: Disease Classification. Melanoma is the deadliest type of skin cancer and is responsible for an overwhelming majority of skin cancer deaths. One of the most practical tasks for an AI skin cancer diagnostic model is therefore to distinguish melanoma from non-melanoma lesions. Since melanocytic nevi (one type of benign skin lesion) closely resemble melanoma in morphological appearance and are easily confused with melanoma in clinical diagnosis, we adopt the melanocytic nevi images as the non-melanoma images in this work and use our proposed model to distinguish melanoma from melanocytic nevi.

B. OVERVIEW OF THE PROPOSED METHODS
Our method consists of three main parts: skin lesion image preprocessing, feature extraction based on spiking neural networks, and skin lesion classification using an SVM classifier, as shown in Fig. 1. Our SNNs are similar to the networks of Kheradpisheh et al. [41], which include a DOG encoding layer, three convolutional layers (Conv1, Conv2 and Conv3), and three pooling layers (Pool1, Pool2 and Pool3). The DOG filter converts the preprocessed skin images into spikes using the intensity-to-latency coding scheme. The convolutional and pooling layers all consist of integrate-and-fire (IF) neurons. Each convolutional layer learns features from its input by the STDP learning rule, in combination with a WTA weight-updating strategy and a lateral inhibition mechanism. Unlike [41], in this work we replace the final global maximum pooling layer (Pool3) with a feature selection layer in order to improve the classification performance of the SNNs. The outputs of the feature selection layer are then used by the SVM classifier to classify the skin lesions.

C. DATA PREPROCESSING
Data preprocessing is used to remove hairs and noise from the raw images. We use three preprocessing methods: hair removal, median filtering, and global contrast normalization. Many images in the ISIC dataset contain a lot of hair, which affects the classification accuracy. In our work hair removal is implemented with the DullRazor algorithm, which was proposed by Lee et al. [75] and has been used to preprocess skin images [76], [77]. DullRazor removes hairs effectively, and also removes additional noise by using fast median filtering. Example results of the hair removal algorithm are shown in Fig. 2. Then, a median filter with a window size of 7*7 is used to reduce small skin pores and light reflections or shine in the dermoscopic images. Finally, we use global contrast normalization to eliminate the effects of

Efficient input spike coding can lead to fast and accurate responses of SNNs. A DOG filter is applied to encode each input skin image into discrete spikes. The output of the filter consists of the spikes of the DOG cells, which detect contrasts in the input image within their receptive fields. The higher the contrast, the earlier the cell fires, provided its activation is above a certain threshold. That is, the firing time of a DOG cell is inversely proportional to its activation value. This intensity-to-latency temporal coding has been shown to be effective for detecting V1-like oriented edge features as well as complex visual features in higher cortical areas [78]. The output spikes of the DOG filter are grouped into sequential time steps to be processed in the following convolutional layer. These sequential time steps can be efficiently propagated in parallel on the GPU. Each two-dimensional input image is thus encoded into a three-dimensional spike train with the time step as one dimension.
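The intensity-to-latency encoding described above can be sketched as follows. This is a minimal NumPy illustration and not the SpykeTorch implementation used in the experiments: the Gaussian widths `sigma1` and `sigma2`, the zero activation threshold, the use of a single on-center DOG channel, and the equal-count binning of ranked cells into time steps are all assumptions made for the example; only the 5*5 kernel size and the 15 time steps come from the text.

```python
import numpy as np

def dog_kernel(size=5, sigma1=1.0, sigma2=2.0):
    """On-center difference-of-Gaussians kernel (sigma values are assumptions)."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    g1 = np.exp(-r2 / (2 * sigma1 ** 2)) / (2 * np.pi * sigma1 ** 2)
    g2 = np.exp(-r2 / (2 * sigma2 ** 2)) / (2 * np.pi * sigma2 ** 2)
    k = g1 - g2
    return k - k.mean()  # zero mean, so uniform regions give no response

def filter2d_valid(image, kernel):
    """Plain 2-D cross-correlation without padding."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def intensity_to_latency(activation, n_steps=15, threshold=0.0):
    """Return a (n_steps, H, W) binary spike train with at most one spike per cell.

    Cells above threshold are ranked by activation; higher activation
    means an earlier time step.
    """
    spikes = np.zeros((n_steps,) + activation.shape, dtype=np.uint8)
    flat = spikes.reshape(n_steps, -1)          # view onto the same memory
    values = activation.ravel()
    active = np.flatnonzero(values > threshold)
    order = active[np.argsort(-values[active], kind="stable")]
    for t, idx in enumerate(np.array_split(order, n_steps)):
        flat[t, idx] = 1
    return spikes

# Toy input: a bright square on a dark background.
image = np.zeros((12, 12))
image[4:8, 4:8] = 1.0
activation = filter2d_valid(image, dog_kernel())
spike_train = intensity_to_latency(activation)
```

Each cell fires at most once, and the cells with the strongest contrast response land in the earliest time step, which is the property the later layers rely on.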

2) CONVOLUTIONAL LAYER
Each convolutional layer in the networks has several feature maps that learn different features, as determined by their input synaptic weights. As in CNNs, each visual feature obtained in one convolutional layer is a combination of several simpler features extracted from the previous layer. Each neuron receives input spikes from the neurons located in the same convolutional window of all feature maps of the previous layer. Input synaptic weights are shared among neurons belonging to the same feature map. At each time step, the membrane potential of an IF neuron is updated as follows:

$$V(t) = V(t-1) + \sum_{j} W_{j}\, S_{j}(t-1)$$

where $V(t)$ is the membrane potential of this neuron at time step $t$, and $W$ is the shared input synaptic weight matrix of the corresponding feature map. $W_{j}$ is the synaptic weight between the $j$th presynaptic neuron in the previous layer and this neuron, and $S_{j}$ is the spike train of the $j$th presynaptic neuron. When its membrane potential exceeds its threshold $V_{thr}$, the neuron fires a spike and its potential is reset: $V(t) = 0$ and $S(t) = 1$. A lateral inhibition mechanism is applied to the neurons of all convolutional layers. When a neuron belonging to one feature map fires, it prevents the neurons at the same location in the other feature maps from firing. In addition, each neuron is allowed to fire only once. Each spike of a feature map indicates the detection of a particular feature at that location, and the earlier the spike, the more prominent the detected feature.
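The behavior described above (integrate weighted input spikes, fire and reset once the threshold is exceeded, fire at most once, and inhibit the other feature maps at the same location) can be sketched for one time step as follows. This is an illustrative NumPy sketch, not the authors' code: the function name `if_layer_step` is hypothetical, the weighted input is assumed to have been computed already by the convolution, and breaking ties between feature maps by the highest potential is an assumption.

```python
import numpy as np

def if_layer_step(v, weighted_input, silenced, v_thr):
    """Advance a layer of IF neurons by one time step.

    v, weighted_input: float arrays of shape (maps, H, W)
    silenced: bool array, True for neurons that have fired or been inhibited
    Returns a bool spike array; v and silenced are updated in place.
    """
    # Integrate input only for neurons that are still allowed to fire.
    v += np.where(silenced, 0.0, weighted_input)
    candidates = (v > v_thr) & ~silenced

    # Lateral inhibition: at each location keep one firing feature map,
    # chosen here as the one with the highest potential (assumption).
    winner = np.argmax(np.where(candidates, v, -np.inf), axis=0)
    any_spike = candidates.any(axis=0)
    rows, cols = np.nonzero(any_spike)
    spikes = np.zeros_like(candidates)
    spikes[winner[rows, cols], rows, cols] = True

    v[spikes] = 0.0                      # reset the neurons that fired
    silenced |= any_spike[None, :, :]    # winner fired once, others are inhibited
    return spikes

# Two feature maps, one row, two locations.
v = np.zeros((2, 1, 2))
silenced = np.zeros((2, 1, 2), dtype=bool)
inp = np.array([[[3.0, 1.0]], [[5.0, 1.0]]])
first = if_layer_step(v, inp, silenced, v_thr=4.0)
second = if_layer_step(v, inp, silenced, v_thr=4.0)
```

In the toy example only the second feature map crosses the threshold at the first location, so it fires and silences the first map there, while the second location keeps integrating.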

3) STDP LEARNING RULE
In the networks, the synaptic learning of each convolutional layer is unsupervised and done layer by layer. A simplified STDP rule in combination with a winner-take-all (WTA) mechanism is used to update the input synaptic weights of neurons in the convolutional layers. The modification of one synaptic weight is calculated as follows:

$$\Delta\omega_{ij} = \begin{cases} a^{+}\,\omega_{ij}\,(1-\omega_{ij}), & \text{if } t_{j} - t_{i} \le 0 \\ a^{-}\,\omega_{ij}\,(1-\omega_{ij}), & \text{if } t_{j} - t_{i} > 0 \end{cases}$$

where $\omega_{ij}$ is the synaptic weight from the $j$th neuron in the input layer (pre-synaptic neuron) to the $i$th neuron in one convolutional layer (post-synaptic neuron), $t_{j}$ and $t_{i}$ are the firing times of the pre-synaptic and post-synaptic neurons, respectively, and $a^{+}$ and $a^{-}$ are the two learning rate parameters of STDP. In a convolutional layer, neurons belonging to the same feature map detect the same feature at different locations. They compete with each other to update their shared input synaptic weights. The neuron that fires earliest is the winner and modifies the shared weights according to the STDP rule, while the other neurons of the same feature map are prevented from updating the weights. Moreover, to encourage different feature maps to learn different and prominent features, there is local lateral inhibition between the feature maps of one convolutional layer. That is, if one neuron is allowed to modify its input synaptic weights, it prevents the neurons at the same location in the other feature maps from updating their input synaptic weights. These biological mechanisms make learning event-driven and information processing sparse and effective in the networks.
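As an illustration of the weight update for a single winner neuron, the sketch below assumes the multiplicative form of the simplified STDP rule used in [41], in which the change is $a^{+}\omega(1-\omega)$ when the pre-synaptic spike does not come after the post-synaptic spike and $a^{-}\omega(1-\omega)$ otherwise. The function name `stdp_update` and the use of `np.inf` to mark silent pre-synaptic neurons are choices made for this example; the default learning rates are the values reported in the experimental settings.

```python
import numpy as np

def stdp_update(w, pre_spike_times, post_spike_time, a_plus=0.07, a_minus=-0.01):
    """Return updated weights for the winner neuron of one feature map.

    w: weights in (0, 1), one per input synapse
    pre_spike_times: firing time of each pre-synaptic neuron,
        np.inf for neurons that never fired (treated as "too late")
    post_spike_time: firing time of the winner neuron
    """
    pre_not_later = pre_spike_times <= post_spike_time   # t_j - t_i <= 0
    rate = np.where(pre_not_later, a_plus, a_minus)
    # The w * (1 - w) factor slows learning near 0 and 1 and keeps
    # the weights inside that range for small learning rates.
    return w + rate * w * (1.0 - w)

weights = np.array([0.5, 0.5, 0.5])
pre_times = np.array([1.0, 5.0, np.inf])   # early, late, silent
updated = stdp_update(weights, pre_times, post_spike_time=3.0)
```

Synapses whose inputs contributed to the winner's spike are strengthened and the rest are weakened, so repeated presentation drives each shared kernel toward a frequently occurring input pattern.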

4) POOLING LAYER
The first two pooling layers in the networks are local pooling layers. Each local pooling layer performs a nonlinear max pooling operation over its preceding convolutional layer. This operation helps the networks gain invariance and reduces the dimensionality of the input. A pooling neuron receives input spikes from the neurons in a pooling window located in the corresponding feature map of the preceding convolutional layer. Each pooling neuron is allowed to fire at most once, and its input synaptic weights and threshold are all set to one. Owing to the rank-order coding used in the DOG encoding layer, this max pooling operation can simply propagate the first spike in each local pooling window, which corresponds to the most prominent feature. After the last convolutional layer, there can be a global pooling layer whose outputs are used to classify the input prototypes. This layer performs global maximum pooling over the corresponding feature maps of the last convolutional layer. That is, a global pooling neuron receives input spikes from all the neurons in the corresponding feature map of the preceding convolutional layer. The thresholds of the global pooling neurons are set to infinity. Therefore, the output of each global pooling neuron is the maximum membrane potential of its corresponding feature map over all locations and all time steps. So there is only one output value for each feature map, which indicates the presence of that feature in the input image. These output values are used by the SVM classifier to classify the input prototypes.
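Both pooling variants can be sketched compactly. In this NumPy illustration (function names and array layouts are assumptions for the example), local pooling is expressed on first-spike times, so propagating the earliest spike in a window is a minimum over times, and global pooling is a maximum of the membrane potentials over time and space.

```python
import numpy as np

def local_first_spike_pool(spike_times, window, stride):
    """Earliest spike time in each pooling window of one feature map.

    spike_times: (H, W) array, np.inf where the neuron never fired.
    """
    h, w = spike_times.shape
    out_h = (h - window) // stride + 1
    out_w = (w - window) // stride + 1
    out = np.full((out_h, out_w), np.inf)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            out[i, j] = spike_times[r:r + window, c:c + window].min()
    return out

def global_max_pool(potentials):
    """One value per feature map: max potential over time and space.

    potentials: (time_steps, maps, H, W) array of membrane potentials
    recorded with an infinite threshold, so the neurons never reset.
    """
    return potentials.max(axis=(0, 2, 3))

times = np.full((4, 4), np.inf)
times[0, 1], times[1, 0], times[3, 3] = 3.0, 2.0, 7.0
pooled = local_first_spike_pool(times, window=2, stride=2)

potentials = np.zeros((3, 2, 2, 2))
potentials[1, 0, 1, 1] = 4.5
potentials[2, 1, 0, 0] = 1.5
features = global_max_pool(potentials)
```

A window with no spikes stays at `np.inf`, meaning the pooling neuron never fires.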

E. FEATURE SELECTION
The features extracted by the convolutional layers of the convolutional SNNs are to some extent redundant (and include irrelevant features). In [41], Kheradpisheh et al. used global maximum pooling to compress the input information and remove the redundancy. After global pooling, there is only one output value for each feature map, which represents the most prominent feature of that feature map. These output values are then used by the SVM classifier to classify the input prototypes. However, when classifying images with very high similarity, global maximum pooling is very likely to filter out some features that are diagnostic but not the most prominent, which affects the classification result. Therefore, to improve classification accuracy, we replace global maximum pooling with univariate feature selection based on the chi-square test in order to select more diagnostic features while still reducing redundancy.
After all convolutional layers have finished learning, we feed each training sample into the SNNs; the output of Conv3 (the extracted features) is then flattened and passed to the feature selection layer. The chi-square test is used to measure the relationship coefficient between each extracted feature and the input category, and the features are ranked by this coefficient. An optimal feature percentage (counting from the top-ranked feature) is then determined according to the best classification performance of the SVM classifier. This optimal feature percentage is later used for feature selection in the test process. The univariate feature selection is implemented with the SelectPercentile method in the scikit-learn toolbox [79].
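The selection procedure can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the synthetic non-negative matrix `X` stands in for the flattened Conv3 outputs, the candidate percentages are arbitrary, and scoring each candidate by cross-validated accuracy on the training data is one plausible reading of "determined according to the best classification performance of the SVM classifier". The linear kernel and C=2 follow the experimental settings.

```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for flattened Conv3 features (chi2 needs values >= 0).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.random((200, 100))
X[:, :5] += y[:, None]          # make the first five features informative

def best_percentile(X, y, candidates=(2, 4, 8, 16, 32, 64, 100)):
    """Pick the feature percentage with the best cross-validated accuracy."""
    scores = {}
    for p in candidates:
        # Selection sits inside the pipeline so it is refit on each fold.
        model = make_pipeline(
            SelectPercentile(chi2, percentile=p),
            SVC(kernel="linear", C=2),
        )
        scores[p] = cross_val_score(model, X, y, cv=4).mean()
    return max(scores, key=scores.get), scores

percentile, scores = best_percentile(X, y)
selector = SelectPercentile(chi2, percentile=8).fit(X, y)
X_selected = selector.transform(X)
```

Keeping the selector inside the pipeline avoids ranking features on data that is later used to score them; the chosen percentage can then be fixed and applied unchanged to test samples.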

IV. EXPERIMENTAL STUDY AND RESULTS
In this section, we evaluate the performance of our STDP-based convolutional SNNs with and without feature selection on the melanoma (MEL) vs. melanocytic nevi (NV) dataset. We compare the melanoma classification results of our networks with those of state-of-the-art CNNs that have previously been used for melanoma skin lesion classification. We also analyze the effectiveness of the feature selection method in improving the melanoma classification performance of the STDP-based convolutional SNNs.

A. DATASET
In our experiments, the task is to discriminate between melanoma and melanocytic nevi. Melanocytic nevi are non-malignant lesions characterized by atypical melanocytic hyperplasia in a lentiginous epidermal pattern; they are visually similar to melanoma and are very difficult for experts to distinguish. The ISIC 2018 dataset includes 1113 images of MEL and 6705 images of NV. After removing the images with serious noise, we use 1081 images of MEL and 6638 images of NV in the task. Since the two categories are imbalanced, we randomly select 1081 samples from the 6638 NV images and then use four-fold cross validation on the 1081 images from each of the two categories. The sampling of NV images is performed 5 times (20 experiments in total).
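The evaluation protocol (five random NV subsamples, each followed by four-fold cross validation, for 20 experiments) can be sketched with the standard library alone. The generator below is a hypothetical helper, not the authors' code; it reads the protocol as using 1081 images per class, uses integer IDs in place of image files, and does not stratify the folds, which is a simplification.

```python
import random

def balanced_cv_runs(mel_ids, nv_ids, n_resamples=5, n_folds=4, seed=0):
    """Yield (resample, fold, train, test), where train and test are
    lists of (image_id, label) pairs with label 1 for MEL and 0 for NV."""
    rng = random.Random(seed)
    for r in range(n_resamples):
        # Undersample NV so that both classes have the same size.
        nv_sample = rng.sample(nv_ids, len(mel_ids))
        data = [(i, 1) for i in mel_ids] + [(i, 0) for i in nv_sample]
        rng.shuffle(data)
        for f in range(n_folds):
            test = data[f::n_folds]
            train = [d for k, d in enumerate(data) if k % n_folds != f]
            yield r, f, train, test

mel_ids = list(range(1081))            # placeholder IDs for MEL images
nv_ids = list(range(10000, 16638))     # placeholder IDs for NV images
runs = list(balanced_cv_runs(mel_ids, nv_ids))
```

Reported metrics would then be averaged over the 20 resulting train/test splits.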

B. EXPERIMENTAL SETTINGS
Our networks are implemented with SpykeTorch [80], an open-source high-speed simulation framework based on PyTorch. The framework's computations are tensor-based and carried out entirely by PyTorch functions, so the networks can run efficiently on a GPU platform. The CNNs used for the comparison study are also implemented in PyTorch. All these models use the same data preprocessing method as described in this paper and run on a computer server with an NVIDIA GeForce RTX 2080 Ti GPU (11 GB of GPU memory) and an Intel Core i7-8700 3.20 GHz CPU.
In our experiments, we use a DOG filter with a size of 5*5 in the encoding layer of our networks. Each skin image is encoded into 15 spiking time packets (time steps) for processing. The first, second and third convolutional layers consist of 6, 20 and 20 feature maps with kernel sizes of 5*5, 17*17 and 5*5, respectively, and their IF neurons have thresholds of 4, 28 and 6, respectively. The pooling window sizes of the first and second pooling layers are 7 and 5, and the strides are 6 and 5, respectively. The learning rates of all three convolutional layers are set to $a^{+} = 0.07$ and $a^{-} = -0.01$. The thresholds of the IF neurons, the sizes of the feature maps, and the learning rates $a^{+}$ and $a^{-}$ are determined according to the visualization of the learned features and the classification results of the SVM classifier. For the STDP-based convolutional SNNs with feature selection, the optimal feature percentage used in the test is 8%, as determined in the feature selection phase. The SVM classifier uses a linear kernel and the penalty parameter C=2 (optimized by grid search).
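For reference, the settings above can be gathered into a small configuration, together with a helper that counts the shared convolutional weights they imply. This is a sketch with hypothetical names; the number of DOG output channels feeding Conv1 is not stated in the text, so `dog_channels` is an explicit assumption, and the resulting counts are not claimed to match the weight numbers reported in Table 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConvSpec:
    maps: int          # number of feature maps
    kernel: int        # side length of the square kernel
    threshold: float   # firing threshold of the IF neurons

# Values taken from the experimental settings in the text.
CONV_LAYERS = (ConvSpec(6, 5, 4), ConvSpec(20, 17, 28), ConvSpec(20, 5, 6))
POOL_LAYERS = ((7, 6), (5, 5))      # (window, stride) for Pool1 and Pool2
N_TIME_STEPS = 15
A_PLUS, A_MINUS = 0.07, -0.01
SVM_C = 2
FEATURE_PERCENTILE = 8

def count_conv_weights(dog_channels=1):
    """Shared weights per conv layer: maps * input_maps * kernel**2.

    dog_channels is an ASSUMPTION (not given in the text); each neuron
    is connected to all feature maps of the previous layer.
    """
    counts, in_maps = [], dog_channels
    for spec in CONV_LAYERS:
        counts.append(spec.maps * in_maps * spec.kernel ** 2)
        in_maps = spec.maps
    return counts

weights_per_layer = count_conv_weights(dog_channels=1)
```

With a single input channel the three layers would hold 150, 34,680 and 10,000 shared weights respectively, which illustrates how weight sharing keeps the number of trainable parameters small.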

C. EVALUATION METRICS
The output values of the global maximum pooling layer or of feature selection are used to train the linear SVM classifier to classify the input prototypes. In skin lesion diagnosis, it is more important to correctly identify melanoma lesions, which have high mortality, than to avoid misclassifying benign melanocytic nevi [55]. Therefore, in our experiments the performance of the SVM classifier is evaluated in terms of accuracy (AC), precision (PREC), sensitivity (SE), specificity (SP) and AUC to measure the classification performance of the networks. The criteria are defined as:

$$AC = \frac{N_{tp} + N_{tn}}{N_{tp} + N_{tn} + N_{fp} + N_{fn}}, \quad PREC = \frac{N_{tp}}{N_{tp} + N_{fp}}, \quad SE = \frac{N_{tp}}{N_{tp} + N_{fn}}, \quad SP = \frac{N_{tn}}{N_{tn} + N_{fp}}$$

where $N_{tp}$ is the number of true positives, $N_{tn}$ is the number of true negatives, $N_{fn}$ is the number of false negatives, and $N_{fp}$ is the number of false positives.
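The four count-based criteria can be computed directly from predicted and true labels, as in the short sketch below. It uses the standard definitions of accuracy, precision, sensitivity and specificity, treats melanoma as the positive class (label 1), and returns `nan` when a denominator is zero; the function name is hypothetical, and AUC is omitted because it needs continuous classifier scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    n_tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    n_tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    n_fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "AC": ratio(n_tp + n_tn, len(pairs)),
        "PREC": ratio(n_tp, n_tp + n_fp),
        "SE": ratio(n_tp, n_tp + n_fn),
        "SP": ratio(n_tn, n_tn + n_fp),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

In this toy example there are three true positives, three true negatives, one false positive and one false negative, so all four metrics equal 0.75.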

1) QUALITATIVE EVALUATION OF FEATURES EXTRACTED BY STDP-BASED CONVOLUTIONAL SNNS
The features of the input skin lesion images are learned in the first, second and third convolutional layers of the STDP-based spiking neural networks. As shown in Fig. 3, the neuronal maps of the first convolutional layer work as filters that detect edge features with different orientations across the images. The second convolutional layer learns visual features that are combinations of oriented edges. The third convolutional layer learns more complex features that are related to category-specific prototypes. Throughout the learning of the three convolutional layers, STDP tends to learn frequent and salient features, and the lateral inhibition mechanism helps the spiking neurons learn different features.

2) QUANTITATIVE EVALUATION AND COMPARISON
The features extracted by the convolutional layers are then filtered by the global maximum pooling layer or by feature selection, and the final filtered outputs are used to train SVM classifiers to distinguish melanoma from melanocytic nevi. We compare the melanoma classification results of our proposed networks with those of traditional CNNs on the MEL/NV dataset. These traditional CNNs include VGG16, Alexnet, Inception v3, Resnet50 and Resnet152, all pretrained on ImageNet, and Resnet152 trained from scratch. All these CNNs are taken from recent studies in the melanoma detection literature. The pretrained CNNs are implemented by transferring the weights trained on ImageNet and then fine-tuning the last fully connected layers using our MEL/NV dataset. In addition, we train from scratch a CNN with the same structure as our SNNs, with the same number of convolutional and pooling layers (CNNs-three). All these networks use the same data preprocessing method as described in this paper. The classification results of our SNNs and the traditional CNNs are listed in Table 1. Each value in the table is the mean over five sets of experiments using four-fold cross validation. It can be seen from Table 1 that on the melanoma vs. melanocytic nevi classification task, our SNNs without feature selection reach an average AUC of 0.907, an average AC of 0.838, an average PREC of 0.8, an average SE of 0.875, and an average SP of 0.801, and their AUC is higher than that of all 6 traditional CNNs. With feature selection, our SNNs achieve the best AUC (0.936), AC (0.877), PREC (0.846), SE (0.903) and SP (0.847) among the presented networks. Moreover, our networks show the highest sensitivity (SE), which means that melanoma lesions are much more likely to be correctly identified by our SNNs; this is very important in the actual diagnosis of skin lesions.
Therefore, compared with these listed traditional CNNs, our STDP-based convolutional SNNs provide better classification performance on the melanoma vs melanocytic nevi classification task.
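For reference, the five measures reported in Table 1 can be computed from predicted labels and classifier scores as in the following minimal sketch. It assumes that melanoma is the positive class (label 1) and melanocytic nevus the negative class (label 0); the function names `binary_metrics` and `auc_score` are illustrative and are not taken from the authors' implementation.

```python
def binary_metrics(y_true, y_pred):
    """Return AC, PREC, SE and SP for binary labels (1 = melanoma, 0 = nevus)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # melanoma found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nevus found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # nevus called melanoma
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # melanoma missed
    return {
        "AC": (tp + tn) / len(pairs),                    # accuracy
        "PREC": tp / (tp + fp) if tp + fp else 0.0,      # precision
        "SE": tp / (tp + fn) if tp + fn else 0.0,        # sensitivity (recall)
        "SP": tn / (tn + fp) if tn + fp else 0.0,        # specificity
    }


def auc_score(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Requires at least one sample of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form of the AUC is quadratic in the number of samples, which is acceptable for a sketch; a rank-based computation would be preferable for large test sets.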
Moreover, to compare the hardware resource consumption of these networks on the melanoma vs melanocytic nevi classification task, the running times and the numbers of weights of these networks are measured and listed in Table 1. Running time and the number of weights reflect the computational and memory resource requirements of a network model during training [81]. Table 1 shows that CNNs-three have the shortest running time, but their classification performance is very poor, so shallow CNNs cannot be used for this classification task. With the same network structure, CNNs-three run faster than our SNNs because our SNNs have an additional time dimension and therefore more data needs to be processed. Comparing the running times of Resnet152 and pretrained Resnet152 shows that existing deep CNNs, if not pretrained, consume dozens of times more training time and more hardware resources. Therefore, pretrained CNNs are often used to reduce the consumption of computing resources. However, as shown in Table 1, the five state-of-the-art pretrained CNNs have very different classification performance: some work well in melanoma classification and some do not. This suggests that improper pretraining is likely to weaken the model's learning capability, so it is usually necessary to try many pretrained models to find the most appropriate one, which is also time-consuming. By contrast, the proposed STDP-based convolutional SNNs are generally a good choice.
On the one hand, as shown in Table 1, compared with non-pretrained CNN models (i.e., CNNs that need to be trained from scratch), our SNNs (with and without feature selection) not only achieve much better classification accuracies but also have much better runtime efficiency (tens of times better). On the other hand, although the pretrained CNN models (i.e., the fine-tuned CNNs) can achieve similar running times, our proposed SNNs are more stable and easier to use because there is no need to try many pretrained models, and our SNNs also achieve much better classification accuracies than the pretrained CNNs. In addition, our networks have only three convolutional layers and no fully connected layers. The complexity of the model and the number of parameters that need to be trained are therefore greatly reduced, which is very beneficial for building a neuromorphic computing platform or deploying to small portable devices.
To further demonstrate the performance of our proposed SNNs, we use the remaining 5557 NV images after random sampling and 281 MEL images from the four-fold cross validation as the test sets, and perform MEL/NV classification using the models listed in Table 1. It should be noted that on unbalanced datasets the AC and PREC values become unreliable, but the AUC value can still measure the performance of the classifier well. The classification results are listed in Table 2. On the remaining 5557 NV images, our SNNs using feature selection still show the best AUC, SE and SP values, which shows the superiority of our model over the other CNNs in MEL/NV classification. For each listed network model, the AUC and SP values decrease slightly, but there is not much difference between Table 1 and Table 2 in any of these three classification values (AUC, SE and SP), which indicates that the sampling method used in our experiments is feasible.
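The sensitivity of AC and PREC to class balance can be illustrated with a short calculation. The sketch below takes a hypothetical classifier with fixed SE and SP and computes the expected PREC and AC for two test-set compositions. The rates (SE = 0.9, SP = 0.8) and the balanced composition (281 vs 281) are assumptions chosen for illustration and are not results from Table 1 or Table 2; only the counts 281 MEL and 5557 NV come from the text.

```python
def precision_and_accuracy(se, sp, n_pos, n_neg):
    """Expected PREC and AC of a classifier with given SE and SP.

    se, sp : sensitivity and specificity, assumed independent of class balance
    n_pos  : number of melanoma (positive) test images
    n_neg  : number of nevus (negative) test images
    """
    tp = se * n_pos        # expected true positives
    tn = sp * n_neg        # expected true negatives
    fp = n_neg - tn        # expected false positives
    prec = tp / (tp + fp)
    acc = (tp + tn) / (n_pos + n_neg)
    return prec, acc


# Hypothetical rates, for illustration only (not values from the paper).
SE, SP = 0.9, 0.8

balanced = precision_and_accuracy(SE, SP, n_pos=281, n_neg=281)
unbalanced = precision_and_accuracy(SE, SP, n_pos=281, n_neg=5557)

print("balanced   PREC=%.3f AC=%.3f" % balanced)
print("unbalanced PREC=%.3f AC=%.3f" % unbalanced)
```

With these assumed rates, PREC falls from about 0.82 on the balanced set to about 0.19 on the unbalanced set, and AC drifts from 0.85 toward SP, even though SE and SP are unchanged. SE, SP and the AUC do not depend on the class proportions in this way, which is why they remain comparable between the two settings.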

E. ABLATION STUDY: EFFECT OF USING FEATURE SELECTION
In this section we analyze why our SNNs perform better when using feature selection. In our SNNs without feature selection, the neurons in the last pooling layer perform a global maximum pooling over their corresponding neuronal maps of the last convolutional layer. Only one output value is taken for each feature to represent the presence of this feature in the input image. However, when the networks recognize images with high similarity, rare but diagnostic information is easily filtered out by the global maximum pooling operation, which degrades the final classification performance of the SVM classifier. Therefore, we replace the global maximum pooling with a univariate feature selection algorithm. Feature selection can retain enough optimal features while removing redundant ones. Figure 4 shows the melanoma classification accuracies of our SNNs when different percentages of optimal features are used during the testing phase. When 8% of the optimal features are selected and used in the classification, we obtain the best accuracy of 0.883 (the highest point in the figure) for the melanoma vs melanocytic nevi classification. Figure 5 compares the performance of our SNNs with and without feature selection in a single experiment. Figure 5 (a) and (b) are the confusion matrices of our SNNs without and with feature selection, and Figure 5 (c) shows the corresponding ROC curves. On the MEL/NV dataset, the recognition rate of MEL increases from 0.875 (without feature selection) to 0.907 (with feature selection), the recognition rate of NV increases from 0.822 to 0.861, and the AUC increases from 0.917 to 0.945. The use of feature selection therefore clearly improves the classification performance of our SNNs on the MEL/NV dataset.
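The difference between the two filtering strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: each neuronal map is represented as a flat list of responses, the univariate score is assumed to be the one-way ANOVA F statistic (the text does not say which univariate score is used), and all function names are hypothetical.

```python
def global_max_pool(feature_maps):
    """Keep one value per neuronal map: its maximum response."""
    return [max(m) for m in feature_maps]


def f_score(values, labels):
    """One-way ANOVA F statistic of a single feature (assumed score)."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # score undefined without >= 2 classes and spare samples
    grand = sum(values) / n
    means = {y: sum(g) / len(g) for y, g in groups.items()}
    between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (k - 1)) / (within / (n - k))


def select_top_percent(X, y, percent):
    """Return sorted indices of the top `percent`% features by F score.

    X is a list of flattened feature vectors (one per training image),
    y the class labels. Scores are computed on training data only.
    """
    n_features = len(X[0])
    scores = [f_score([row[j] for row in X], y) for j in range(n_features)]
    n_keep = max(1, round(n_features * percent / 100))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


def apply_selection(X, keep):
    """Project feature vectors onto the selected feature indices."""
    return [[row[j] for j in keep] for row in X]
```

In this sketch, global maximum pooling discards every response except the strongest one in each map, whereas the selection step ranks every individual response by how well it separates the two classes and keeps the chosen percentage (8% gave the best accuracy in Figure 4). The selected indices are fitted on the training fold and then applied unchanged to the test fold before the SVM is trained.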

V. CONCLUSION
Although deep CNNs have achieved remarkable success in medical image analysis tasks in recent years, their expensive computational costs limit their application in real-time clinical practice. Besides, since labeled skin cancer images are limited, CNNs trained from scratch are prone to overfitting, and CNNs pretrained on natural images are not very suitable for skin lesion image analysis. In this paper, we use STDP-based convolutional SNNs to distinguish melanoma from melanocytic nevi, which is the first work to apply SNNs to medical image classification tasks. The classification results show that efficient temporal coding, an event-driven learning rule and the winner-take-all (WTA) mechanism together ensure sparse spike coding and efficient learning of our networks, which achieve an average accuracy of 83.8%. We also find that, after the learning of all convolutional layers has finished, feature selection can effectively help pick out diagnostic information from the features extracted by the SNNs, and can improve their classification performance to an average accuracy of 87.7%. In addition, compared with the listed traditional CNNs, our SNNs using feature selection show the best classification accuracy and AUC value, and, more importantly, our networks have significantly lower computational costs than CNNs that need to be trained from scratch. Our SNNs are therefore well suited to smartphones and portable devices with limited computational power (CPUs or low-end GPUs) [82]. Our SNNs are also more stable and easier to use than the pretrained CNNs. This facilitates the implementation of automated skin lesion classifiers and makes access to them convenient and cheap. Moreover, our networks can be easily extended to other medical image analysis fields.
In the future, we will try to improve the spike coding in the input layer of the SNNs in order to better encode visual information such as texture and color, which the DOG filter may not handle well [41]. We will also try to further improve our network architectures by adding feedback connections, which have been shown to play an important role in the visual pathway [83]. In addition, our future work will include high-performance multi-class classification of skin cancer lesions, which is still challenging for STDP-based SNNs.
QIAN ZHOU received the Ph.D. degree in control theory and control engineering from The University of Nankai, Tianjin, China, in 2010. She is currently an Associate Professor with the Hebei University of Technology, Tianjin, China. Her current research interests include biological neural networks, spiking neural networks, STDP plasticity, and deep learning.
From 2017 to 2018, he worked as a Research Associate with the Department of Computer Science, University of Oxford. He is currently a Professor with the Hebei University of Technology, China, and an awardee of the ''100 Talents Plan'' of Hebei Province. He has published dozens of articles in top AI or database conferences, such as AAAI, IJCAI, ICDE, EDBT, and CIKM. His research focuses on topics within artificial intelligence and data mining, especially deep learning, medical artificial intelligence, health data mining, and computer vision.
RUOWEI QU received the M.S. degree from the School of Science, University of Science and Technology Beijing, China, in 2012. She is currently pursuing the Ph.D. degree with the School of Electrical Engineering, Hebei University of Technology, Tianjin, China. Her research interests include medical image processing, pattern recognition, and machine learning.
GUIZHI XU received the Ph.D. degree from the School of Electrical Engineering, Hebei University of Technology, Tianjin, China, in 2002. She is currently a Professor and a Ph.D. supervisor with the Hebei University of Technology. She has acted as the Principal Investigator (PI) of tens of national and provincial projects, including projects of the Natural Science Foundation of China. She has published more than 90 SCI or EI indexed academic papers and three books. She was awarded the Outstanding Contribution Award of Science and Technology in Hebei Province, the Second Prize of the Natural Science of Hebei Province, the Hebei Science and Technology Progress, the Hebei Province Excellent Teaching Achievement, and the Third Prize of the Natural Science of Hebei Province. VOLUME 8, 2020