A Supervised Learning Algorithm for Spiking Neurons Using Spike Train Kernel Based on a Unit of Pair-Spike

In recent years, neuroscientists have discovered that neural information is encoded by spike trains with precise firing times. Supervised learning algorithms for spiking neurons based on these precise times have therefore become an important research field. Although many existing algorithms have excellent learning ability, most of their mechanisms still involve complex computations and certain limitations. Moreover, the discontinuity of the spiking process makes it very difficult to build an efficient algorithm. This paper proposes a supervised learning algorithm for spiking neurons that uses a kernel function of spike trains based on a unit of pair-spike. Firstly, we comprehensively divide the intervals of spike trains. Then, we construct an optimal selection and computation method for spikes based on the unit of pair-spike. This method avoids some wrong computations and reduces the computational cost by using each effective input spike only once in every epoch. Finally, we use a kernel function defined by an inner product operator to solve the computing problems of the discontinuous spiking process and multiple output spikes. The proposed algorithm is successfully applied to many spike train learning tasks, where the effect of our optimal selection and computation method is verified and the influence of learning factors such as the learning kernel, learning rate, and learning epoch is analyzed. Moreover, compared with other algorithms, all experimental results show that our proposed algorithm achieves higher learning accuracy and good learning efficiency.


I. INTRODUCTION
Neurocomputing science explores the mathematical basis of neurocomputing by studying the mechanism and process of neural intelligent systems [1], [2]. As the research object of neurocomputing science, artificial neural networks (ANNs) abstract and simulate the structure and function of the biological nervous system, and play an important role in information processing and pattern recognition [3]-[5]. An artificial neural network consists of many artificial neuron models [6]. Every neuron model represents an activation function, and every connection between two neuron models represents a synaptic weight. Every neuron in the biological nervous system can transmit information to other neurons by emitting spike trains [7]. Information in traditional neuron models is encoded and processed based on the firing rate of spike trains, because there is a large amount of information in the rate [8]. Related encoding methods include spike count, spike density, and population activity [2]. Related activation functions include the Sigmoid function, Rectified Linear Unit (ReLU), and piecewise linear function [3]. However, the firing rate is too simple to describe neuronal activities, and neuroscientists have suggested that the precise firing time of spikes is also essential for information encoding [9], [10]. Many researchers have therefore proposed a variety of precise temporal encoding methods such as time to first spike, latency phase, and rank order [2]. Based on these encoding methods, spiking neuron models can simulate biological neurons in the temporal dimension and have more biological plausibility than traditional neuron models [11], [12]. Composed of spiking neurons, spiking neural networks (SNNs), as a new generation of neural networks, provide a stronger ability to approximate arbitrary continuous functions [12]-[14], and can process many complex spatio-temporal patterns well [15]-[19].

(The associate editor coordinating the review of this manuscript and approving it for publication was Geng-Ming Jiang.)

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Supervised learning in ANNs has a learning mechanism that furnishes the desired output data with given input data, produces the actual output data, and compares the two output data sets to regulate the synaptic weights [2]. When the sample changes, supervised learning can make the network adapt to a new circumstance by regulating the synaptic weights. The data sets used in the training are sets of discrete spikes, which are called spike trains. With precise temporal encoding, the aim of supervised learning for spiking neurons or SNNs is to output the corresponding desired spike trains [20]. Researchers have verified that the mechanism of supervised learning naturally exists in the biological nervous system, even though several details are still unknown [2]. The referent activity template, which is mimicked by the circuits of interest, is one of the current explanations for supervised learning [21]. Moreover, the desired output signal of supervised learning can represent an instantiation of reinforcement learning by operating only on a much shorter duration [22]-[24].
Supervised learning algorithms for spiking neurons or SNNs mainly include three types: gradient descent, synaptic plasticity, and spike train convolution [25]. Algorithms based on gradient descent first use the gradient value and backpropagation (BP) to obtain a degree of association between spike trains. This degree is applied to adjust the synaptic weight so that the deviation between the actual output spike train and the desired output spike train tends to zero. SpikeProp [26], the first gradient descent algorithm for SNNs, can learn with a single spike. Then, Xu et al. [27] proposed a gradient-descent-based (GDB) algorithm which can learn multiple spikes in every layer. Since then, algorithms based on gradient descent for SNNs have been able to process complex spatio-temporal patterns of spike trains. Algorithms based on synaptic plasticity are more biologically plausible. Based on the mechanism of synaptic plasticity proposed by Knott [28], the spike-timing-dependent plasticity (STDP) mechanism is useful for spike train learning [29], [30]. Combining the Bienenstock-Cooper-Munro (BCM) rule and STDP, Wade et al. constructed an algorithm named synaptic weight association training (SWAT) [31]. Ponulak and Kasiński [32] proposed a remote supervised method (ReSuMe), which adjusts all synaptic weights according to STDP and anti-STDP. Because the learning mechanism of ReSuMe depends only on the spike trains and STDP, ReSuMe can be adapted to many types of spiking neuron models [33]. Based on ReSuMe, Lin et al. [34] proposed T-ReSuMe by incorporating triplets of spikes. Taherkhani et al. [35] constructed DL-ReSuMe by merging a delay shift approach and ReSuMe. The above two types of algorithms operate directly on the discontinuous spike times [25]. However, algorithms based on spike train convolution can transform all spike trains into a continuous form by using a specific spike train convolution kernel. Hence, spike trains can be calculated with common mathematical operators [36].
Carnell and Richardson [37] mapped all spike trains into a vector space and made use of linear algebra to construct a supervised algorithm. Mohammed et al. [38] came up with the spike pattern association neuron (SPAN) by combining a Hebbian interpretation of the convolution and the Widrow-Hoff (WH) rule. Based on the SPAN algorithm, Yu et al. [39] proposed a precise-spike-driven (PSD) rule, which associates all input spike trains with the given desired spike train and only uses one convolution of the input spike trains. Then, Lin et al. [40] proposed a spike train kernel learning rule (STKLR) based on kernel functions defined for the inner product of spike trains. Gardner and Grüning [41] proposed an optimized algorithm, the filtered-error synaptic plasticity rule (FILT), where a convolution with an exponential filter is considered. Recently, Wang et al. [42] constructed a variable delay learning algorithm based on the inner product of spike trains.
We focus on the supervised learning algorithm of spike trains for spiking neurons in this paper. Firstly, multiple spikes from the input, actual output, and desired output spike trains make up a complete data set of spikes for regulating the synaptic weights, so spike selection in this set is an important research topic. In addition, the discontinuous internal state variables in neuron models also bring many difficulties to the construction of an efficient algorithm. In this paper, we analyze some relationships between all spikes in the above set, and discover that existing algorithms have some redundancies in spike selection and computation. To eliminate the repetitive spike computation, we construct a direct regulation method for synaptic weights based on a unit of pair-spike, which can be applied to all supervised learning algorithms of spike trains. Then, using the representation of the spike train kernel, we transform all discrete spike trains into continuous functions. Thus, the unified representation of spike trains and the formal definition of a spike train similarity measure are realized by mapping all spike trains into a high-dimensional space corresponding to the kernel function. Finally, we use the spike train kernel based on a unit of pair-spike to construct a supervised learning algorithm for spiking neurons. We denote the algorithm as unit of pair-spike (UPS).
The rest of this paper is organized as follows. In section II, the spiking neuron model and learning architecture are introduced. Section III presents the division of intervals for spike trains, a spike selection and computation method called pair-spike, and UPS. In section IV, many experiments are carried out to demonstrate the learning capability of UPS and to compare UPS with other algorithms. Section V presents the discussion, and section VI describes the conclusion and future work.
II. SPIKING NEURON MODEL AND LEARNING ARCHITECTURE
Spiking neuron models such as the leaky integrate-and-fire (LIF) model and the spike response model (SRM) are all constructed to explain the spiking mechanism of biological neurons. SRM is applied more and more widely, because it can intuitively express the internal state of neurons and easily describe learning algorithms [27]. Furthermore, SRM has more biological plausibility than other spiking neuron models such as LIF and integrate-and-fire (IF) [43]. Thus, we select SRM as the basic spiking neuron model.
Based on the cortical interneuron, the SRM proposed by Jolivet et al. [43] is a spiking neuron model with a varying neuronal threshold, similar to a nonlinear IF model. However, SRM uses an integral over the past to express the membrane potential, which differs from the nonlinear IF model. This standard SRM was later simplified by setting a fixed neuronal threshold for ease of use [44]. Furthermore, by only using the firing time of the most recent output spike to express the refractory period, SRM is convenient for many studies [34], [36], [45]. Thus, we use this convenient SRM. In SRM, the internal state, as an integral over the past, is represented by a variable membrane potential u at time t. If there is no input spike, u is at a resting state. When input spikes arrive from presynaptic neurons via synapses, if u, defined as a sum of postsynaptic potentials (PSPs), is below the threshold V_thresh, u will be driven by these input spikes. Suppose that some input spiking neurons connect with the current neuron through N synapses, and G_i spikes are transmitted by the ith synapse. We denote the arrival times of these spikes at the current neuron as t_i^1, t_i^2, ..., t_i^{G_i}, the firing time of the gth spike as t_i^g, and the firing time of the most recent output spike prior to the current time as \hat{t}^f. The internal state u(t) at time t is

u(t) = \sum_{i=1}^{N} w_i \sum_{g=1}^{G_i} \varepsilon(t - t_i^g) + \rho(t - \hat{t}^f),   (1)

where w_i is the synaptic weight of the ith synapse, i ∈ Z^+. The PSP is determined by a spike response function ε(t) expressed as

\varepsilon(t) = (t/\tau) \exp(1 - t/\tau) for t > 0, and \varepsilon(t) = 0 otherwise,   (2)

where τ is a time decay constant that determines the shape of this function. After an accumulation of input spikes, SRM fires an output spike at time t^f when u(t) exceeds V_thresh. The membrane potential u(t) immediately falls back to the resting potential u_rest = 0 at t^f. Immediately after firing, the current neuron cannot fire a spike even if it receives many input spikes. There are then two processes, the absolute refractory period (ARP) and the relative refractory period (RRP). The phase named ARP is a repolarization.
At the end of the ARP, the current neuron comes into the RRP, in which u(t) is lower than u_rest and cannot reach V_thresh. Therefore, firing a spike is difficult for the current neuron. The RRP is a hyperpolarization. In the expression of the neuronal internal state, a refractoriness function \rho(t - \hat{t}^f) depicts this RRP. The function is given by

\rho(t - \hat{t}^f) = -2 V_{thresh} \exp\big(-(t - \hat{t}^f)/\tau_R\big) for t > \hat{t}^f, and 0 otherwise,   (3)

where \tau_R is a time decay constant. The refractoriness is mainly induced by \hat{t}^f, which ignores all earlier output spikes. Fig. 1 presents the learning architecture, which is made up of two components, an input layer and an output layer. The input layer includes many input units and the output layer includes one output unit. Input units are independent of each other. Every input unit connects with the output unit. The synaptic weight represents the connection strength. w_1, w_2, and w_i are the 1st, 2nd, and ith weights between the input units and the output unit respectively, where i ∈ Z^+. Every input unit generates an input train of discrete spikes. The output unit generates an actual output spike train, and the desired output spike train is set in advance. The supervised learning is implemented by adjusting all synaptic weights. Descriptions of the learning architecture are as follows: the output unit should generate the desired output spike train when every input unit generates its input spike train. All input spike trains generated by the input layer are transferred to the output layer. After a transformation through the synapses, all input spikes accumulate in the output unit to produce an actual output spike train. Once the actual output spike train is produced, it is compared with the desired output spike train. If they are different, the error between them is fed back to control the regulation of all synaptic weights. Because there is usually still a difference after the regulation in an epoch, the process is iterated until the two output spike trains are exactly equal or the number of iterations reaches a preset value.
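To make the neuron dynamics concrete, the SRM described above can be sketched in a few lines of Python. This is a minimal illustration: the alpha-shaped ε(t) is the standard form, while the refractoriness amplitude (−2 V_thresh) is a common choice and should be read as an assumption rather than the paper's exact parameterization.

```python
import math

# Illustrative SRM parameters (values taken from the experiments in section IV).
TAU = 7.0        # PSP time decay constant (ms)
TAU_R = 80.0     # refractoriness time decay (ms)
V_THRESH = 1.0   # firing threshold (mV)

def epsilon(s, tau=TAU):
    """Spike response function: alpha-shaped PSP kernel, zero for s <= 0."""
    if s <= 0.0:
        return 0.0
    return (s / tau) * math.exp(1.0 - s / tau)

def rho(s, tau_r=TAU_R, v=V_THRESH):
    """Refractoriness kernel driven by the most recent output spike (assumed amplitude)."""
    if s <= 0.0:
        return 0.0
    return -2.0 * v * math.exp(-s / tau_r)

def simulate(input_spikes, weights, T=100.0, dt=0.1):
    """Run one SRM neuron over [0, T]; input_spikes[i] lists spike times on synapse i."""
    out, last_out = [], -1e9
    t = 0.0
    while t <= T:
        u = sum(w * sum(epsilon(t - tg) for tg in spikes)
                for w, spikes in zip(weights, input_spikes))
        u += rho(t - last_out)  # only the most recent output spike matters
        if u >= V_THRESH:
            out.append(round(t, 1))
            last_out = t        # potential resets; refractoriness begins
        t += dt
    return out
```

For example, a single strong synapse (w = 2.0) carrying one spike at 5 ms is enough to drive the membrane potential over threshold shortly afterwards.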
Many experiments have proved the learning capability of this architecture [31]- [40].

III. THE SUPERVISED LEARNING ALGORITHM: UPS

A. THE DIVISION OF INTERVALS FOR SPIKE TRAINS
A spike train s = {t^f ∈ Γ : f = 1, 2, ..., F} represents a sequence of spiking times in an interval [0, T]. It is shown formally as

s(t) = \sum_{f=1}^{F} \delta(t - t^f),   (4)

where F is the number of spikes and t^f is the firing time of the fth spike. δ(t) is the Dirac delta function: δ(t) = 1 if t = 0, and δ(t) = 0 otherwise. s_i, s_a, and s_d denote the ith input spike train, the actual output spike train, and the desired output spike train respectively. t_d^g and t_a^h denote the firing times of the gth spike in s_d and the hth spike in s_a respectively, where g ∈ Z^+ and h ∈ Z^+. G is the number of spikes in s_d and H is the number of spikes in s_a, where G ∈ Z^+ and H ∈ Z^+. We denote every time interval between two neighbouring desired output spikes as a desired time interval (DTI). The division of intervals for s_d is shown in Fig. 2. The first DTI is [0, t_d^1], and the gth DTI is [t_d^{g-1}, t_d^g]. We denote every time interval between two neighbouring spikes in s_a as an actual time interval (ATI). The first ATI is [0, t_a^1] and the hth ATI is [t_a^{h-1}, t_a^h]. By combining s_a with s_d, we divide s_i into intervals that include the input spikes. Because the DTIs are fixed during the learning process, we select the DTI as the basic interval. There are three types of sub-intervals in a DTI. The first type, named first input time interval (FITI), runs from the desired output spike that opens the DTI to the first actual output spike; the second type, named median input time interval (MITI), lies between two neighbouring actual output spikes; and the third type, named last input time interval (LITI), runs from the last actual output spike to the desired output spike that closes the DTI. There is no MITI when only one actual output spike exists in a DTI. In particular, if there is no actual output spike in a DTI, the DTI equals a single LITI. Some intervals in Fig. 3 include this condition.
In Fig. 3, 0 millisecond (ms) as the initial time means that the firing times of the actual output spikes and desired output spikes in the interval [0, 0] or [−∞, 0] are equal. We set 0 ms as the firing time of the last desired output spike in [0, 0]. The interval from 0 ms, treated as the firing time of a desired output spike, to t_d^1 is a DTI [0, t_d^1]. The interval from 0 ms, treated as the firing time of an actual output spike, to t_d^1 is a LITI [0, t_d^1]. Thus, a DTI without an actual output spike equals a LITI.
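The interval division above can be sketched as a small classification routine that labels each input spike inside one DTI as belonging to the FITI, a MITI, or the LITI. The boundary conventions (half-open intervals) are our assumption for illustration.

```python
def classify_inputs(d_start, d_end, actual, inputs):
    """Label each input spike inside the DTI (d_start, d_end] as 'FITI',
    'MITI', or 'LITI' relative to the actual output spikes in that DTI.
    With no actual output spike in the DTI, the whole DTI is one LITI."""
    acts = sorted(t for t in actual if d_start < t <= d_end)
    labels = {}
    for t in sorted(ti for ti in inputs if d_start < ti <= d_end):
        if not acts or t > acts[-1]:
            labels[t] = 'LITI'   # after the last actual spike, before d_end
        elif t <= acts[0]:
            labels[t] = 'FITI'   # before the first actual spike
        else:
            labels[t] = 'MITI'   # between two neighbouring actual spikes
    return labels
```

For a DTI (0, 20] with actual output spikes at 8 ms and 14 ms, input spikes at 3, 10, and 17 ms fall in the FITI, a MITI, and the LITI respectively; with no actual spike at all, every input spike is labeled LITI.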

B. THE SELECTION AND COMPUTATION OF PAIR-SPIKE
For the supervised learning of spiking neuron models, the adjustment of all synaptic weights depends on the relative temporal relationship among the input spike trains, actual output spike train, and desired output spike train [25], [40]. With the WH rule, the change of w_i at time t is expressed as

\Delta w_i(t) = \eta\, s_i(t)\, [s_d(t) - s_a(t)],   (5)

where η is the learning rate (LR), s_i(t) is an input spike train, s_a(t) is the actual output spike train, and s_d(t) is the desired output spike train. Generally, w_i increases when the running time equals the firing time of a desired output spike t_d^g and decreases when it equals the firing time of an actual output spike t_a^h. We set t_i^f as the firing time of the fth spike in s_i and F_i as the number of spikes in s_i; the update rule of w_i for every t_i^f is then given as

\Delta w_i(t_i^f) = \eta \Big[ \sum_{g:\, t_d^g > t_i^f} \varphi(t_i^f, t_d^g) - \sum_{h:\, t_a^h > t_i^f} \varphi(t_i^f, t_a^h) \Big],   (6)

where φ(·) is the update function of w_i. We can see from (6) that t_i^f is computed with every later desired output spike and every later actual output spike. Except for the use of the same s_i, the two parts of (6) have no direct connection with each other. That is to say, if t_i^f is earlier than both t_d^g and t_a^h, t_i^f will first be computed with t_d^g to increase w_i and then computed with t_a^h to decrease w_i. This spike selection and computation method is widely used in existing algorithms [25], [32], [34] and is called multi-spike in this paper. The total adjustment of w_i equals the sum of (6) over all input spikes, so the corresponding update value is expressed as

\Delta w_i = \sum_{f=1}^{F_i} \Delta w_i(t_i^f).   (7)

Fig. 4 shows an example of multi-spike, where t_a^1 is earlier than t_d^1. However, there is a problem of wrong adjustment. In Fig. 4, with the original spike selection and computation method called multi-spike, t_i^1 and t_i^2 before t_a^1 are selected to compute with t_a^1 to decrease w_i, and t_i^1 to t_i^4 before t_d^1 are selected to compute with t_d^1 to increase w_i. The final update value of w_i in the first learning epoch is obtained as

\Delta w_i = \eta \big[ \varphi(t_i^1, t_d^1) + \varphi(t_i^2, t_d^1) + \varphi(t_i^3, t_d^1) + \varphi(t_i^4, t_d^1) - \varphi(t_i^1, t_a^1) - \varphi(t_i^2, t_a^1) \big].   (8)

We can see from Fig. 4 that the accumulation of the input spikes fired at t_i^1 and t_i^2 is enough to produce an output spike at t_a^1.
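The multi-spike selection described above can be sketched as follows: each input spike is combined with every later desired spike (increase) and every later actual spike (decrease). Here `phi` is a placeholder for the update function φ(·), passed in by the caller.

```python
def multi_spike_delta(inputs, desired, actual, phi, eta=0.01):
    """Multi-spike rule: each input spike time tf is combined with every
    later desired output spike (weight increase) and every later actual
    output spike (weight decrease); contributions sum over the epoch."""
    dw = 0.0
    for tf in inputs:
        dw += eta * sum(phi(tf, td) for td in desired if td > tf)  # increase terms
        dw -= eta * sum(phi(tf, ta) for ta in actual if ta > tf)   # decrease terms
    return dw
```

With a constant placeholder φ ≡ 1 and the Fig. 4-like layout (inputs at 1, 2, 6, 7 ms; an actual spike at 5 ms; a desired spike at 10 ms), all four inputs contribute increase terms but only the first two contribute decrease terms, so the net change is positive even though the kernel treats all pairs equally.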
After that, the accumulation of only the input spikes fired at t_i^3 and t_i^4 is insufficient to produce an output spike at t_d^1. Suppose the accumulation of the input spikes fired at t_i^1 and t_i^2 fails to produce an output spike at t_a^1; then the accumulation of the input spikes fired at t_i^1 to t_i^4 could be sufficient to produce an output spike at t_d^1, so w_i should decrease. However, because (8) contains four increase terms and only two decrease terms, w_i may finally increase, which is opposite to the desired adjustment.
Based on the same principle, a similar problem arises when we exchange t_d^1 and t_a^1. Then, t_i^1 and t_i^2 are selected twice: once to compute with t_d^1 to increase w_i, and once to compute with t_a^1 to decrease w_i. t_i^3 and t_i^4 are computed only once, with t_a^1, to decrease w_i. The final update value of w_i in the first learning epoch is obtained as

\Delta w_i = \eta \big[ \varphi(t_i^1, t_d^1) + \varphi(t_i^2, t_d^1) - \varphi(t_i^1, t_a^1) - \varphi(t_i^2, t_a^1) - \varphi(t_i^3, t_a^1) - \varphi(t_i^4, t_a^1) \big].   (9)

The accumulation of only the input spikes fired at t_i^1 and t_i^2 is insufficient to produce an output spike at t_d^1, whereas the accumulation of the input spikes fired at t_i^1 to t_i^4 is sufficient to produce an output spike at t_a^1. Evidently, w_i should increase because t_a^1 is later than t_d^1. However, w_i finally decreases under (9), which is opposite to the desired adjustment.
Here, we analyze the causes of the above problems. Back to the example shown in Fig. 4, when t_a^1 is earlier than t_d^1, the immediate cause of the problem is that t_d^1 is computed with the unrelated spikes t_i^1 and t_i^2 in the FITI to increase w_i. The primary cause is that the accumulation of only the input spikes fired at t_i^3 and t_i^4 in the LITI is insufficient to produce an output spike at t_d^1. If there were a MITI, the input spikes in the MITI would be in the same state as the input spikes in the FITI. When t_a^1 is later than t_d^1, the immediate cause is that t_a^1 is computed with the unrelated spikes t_i^1 and t_i^2 in the FITI to decrease w_i. The primary cause is that, on the basis of the accumulation of the input spikes fired at t_i^1 and t_i^2 in the FITI, the additional accumulation of the input spikes fired at t_i^3 and t_i^4 in the LITI leads to an output spike at t_a^1. Therefore, removing these unrelated computations is a feasible method.
We now let t_d^g and t_a^h denote the firing times of the first desired output spike and the first actual output spike later than t_i^f respectively. According to the above analyses, we design a direct spike selection and computation method based on the input spikes: t_i^f and its first later output spike (t_d^g or t_a^h) make up a unit of pair-spike that is computed only once to adjust w_i in every epoch. We denote this method as pair-spike. The update value of w_i for t_i^f becomes

\Delta w_i(t_i^f) = \begin{cases} \eta\, \varphi(t_i^f, t_d^g), & \text{if } \min(t_d^g, t_a^h) = t_d^g \\ -\eta\, \varphi(t_i^f, t_a^h), & \text{if } \min(t_d^g, t_a^h) = t_a^h \end{cases}   (10)

where min(·, ·) in (10) is a function that obtains the minimum of t_d^g and t_a^h, and t_d^g and t_a^h are initially set to positive infinity or a number much larger than the known duration T. The final update value of w_i in one learning epoch is obtained as

\Delta w_i = \sum_{f=1}^{F_i} \Delta w_i(t_i^f).   (11)

For a long s_d with many firing times, t_i^f in a FITI or MITI is selected by pair-spike to compute only with t_a^h to decrease w_i, and t_i^f in a LITI is selected by pair-spike to compute only with t_d^g to increase w_i. Like multi-spike, pair-spike is a general method that is useful for all supervised learning algorithms of spike trains.
Here, we use the example of Fig. 4 to explain pair-spike. With pair-spike, t_i^1 and t_i^2 in the FITI are no longer selected to compute with t_d^1 to increase w_i. Thus, t_i^1 and t_i^2, which are earlier than t_a^1 in the FITI, are selected to compute only with t_a^1 to decrease w_i. t_i^3 and t_i^4, which are later than t_a^1 but earlier than t_d^1 in the LITI, are selected to compute only with t_d^1 to increase w_i. The final update value of w_i in the first learning epoch is obtained as

\Delta w_i = \eta \big[ \varphi(t_i^3, t_d^1) + \varphi(t_i^4, t_d^1) - \varphi(t_i^1, t_a^1) - \varphi(t_i^2, t_a^1) \big].   (12)

The accumulation of the input spikes fired at t_i^1 and t_i^2 is enough to produce an output spike at t_a^1, whereas the accumulation of only the input spikes fired at t_i^3 and t_i^4 is insufficient to produce an output spike at t_d^1. Thus, the increase terms φ(t_i^3, t_d^1) and φ(t_i^4, t_d^1) dominate (12), and w_i finally increases, which is our desired adjustment.
After we exchange t_d^1 and t_a^1 in Fig. 4, the computation of t_i^1 and t_i^2 with t_a^1 is removed by pair-spike. t_i^1 and t_i^2, which are earlier than t_d^1, are selected to compute only with t_d^1 to increase w_i, whereas t_i^3 and t_i^4, which are later than t_d^1 but earlier than t_a^1, are selected to compute only with t_a^1 to decrease w_i. The final update value of w_i in the first learning epoch is obtained as

\Delta w_i = \eta \big[ \varphi(t_i^1, t_d^1) + \varphi(t_i^2, t_d^1) - \varphi(t_i^3, t_a^1) - \varphi(t_i^4, t_a^1) \big].   (13)

The accumulation of only the input spikes fired at t_i^1 and t_i^2 is insufficient to produce an output spike at t_d^1, and the accumulation of the input spikes fired at t_i^3 and t_i^4 is also insufficient to produce an output spike at t_a^1. The total decrease in (13) is much smaller than that in (9), which directly avoids the wrong adjustment in (9).

C. AN ALGORITHM USING SPIKE TRAIN KERNEL BASED ON THE UNIT OF PAIR-SPIKE
The kernel method (KM) [46] is one of the effective ways to solve non-linear pattern problems. The core idea of KM is to convert the original data into an appropriate high-dimensional kernel feature space through a non-linear mapping, which can reveal the relationships among the data [47], [48]. Suppose x_1 and x_2 are two points in the initial space, and the non-linear function λ(·) realizes the mapping from an input space X to a feature space P. The kernel function can then be expressed in the following inner product form:

\kappa(x_1, x_2) = \langle \lambda(x_1), \lambda(x_2) \rangle,   (14)

where κ(x_1, x_2) is a kernel function. For the firing times t_m and t_n of two spikes, the kernel function takes the form

\kappa(t_m, t_n) = \langle \lambda(t_m), \lambda(t_n) \rangle,   (15)

where κ must be symmetric, shift-invariant, and positive definite, such as the Gaussian or Laplacian kernel [2], [47]. By combining (15), φ(·) in (11) is set to the kernel function, so (11) can be expressed as

\Delta w_i = \sum_{f=1}^{F_i} \pm\, \eta\, \kappa\big(t_i^f, \min(t_d^g, t_a^h)\big),   (16)

where κ(·) is a kernel function and the sign is positive when the paired output spike is a desired spike and negative when it is an actual spike. Since the use of the spike train kernel is based on the unit of pair-spike, we denote the algorithm of (16) as unit of pair-spike (UPS).
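A minimal sketch of the UPS update with a Laplacian spike-time kernel follows (the kernel and τ = 7 ms anticipate the experimental setup later in the paper; the pairing logic mirrors the pair-spike rule and is kept self-contained here):

```python
import math

def laplacian(tm, tn, tau=7.0):
    """Laplacian spike-time kernel: symmetric, shift-invariant, positive
    definite; equals 1 when the two spike times coincide."""
    return math.exp(-abs(tm - tn) / tau)

def ups_delta(inputs, desired, actual, kernel=laplacian, eta=0.01):
    """UPS weight update: the pair-spike rule with the placeholder update
    function replaced by a spike-time kernel. Each input spike pairs
    with its earliest later output spike; '+' for desired, '-' for actual."""
    INF = float('inf')
    dw = 0.0
    for tf in inputs:
        td = min((t for t in desired if t > tf), default=INF)
        ta = min((t for t in actual if t > tf), default=INF)
        if td == INF and ta == INF:
            continue
        dw += eta * kernel(tf, td) if td <= ta else -eta * kernel(tf, ta)
    return dw
```

Because the kernel decays with the time difference, an input spike close to its paired output spike contributes a large update, while a distant one contributes little, which is the intended graded credit assignment.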

IV. SIMULATION RESULTS
In this section, we carry out several experiments. Firstly, we implement a simple example to show the learning process. Then, we make a comparison to demonstrate the effect of pair-spike. Moreover, we analyze some learning factors that influence the learning performance of UPS, including the learning kernel κ(·), the LR, and the learning epoch. Finally, UPS is compared with other algorithms.
The parameters of SRM are the time constant of the postsynaptic potential τ = 7 ms, the time constant of the refractory period τ_R = 80 ms, and the neuronal threshold V_thresh = 1 mV. Because all experiments are implemented by computer programs with discrete time, SRM has to run in a discrete mode. In this paper, the discrete precision is 0.1 ms.
To evaluate the learning performance, a correlation-based measure C is used to measure the distance between the desired and actual output spike trains [49], [50]. Like the Victor-Purpura distance [51], C is popular in many papers [31]-[34], [38]-[40]. In all following experiments, C = 1 means SRM precisely fires the desired output spike train, and the recorded maximum C is updated only when the current C is greater than it.
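A common construction for such a correlation-based measure filters both spike trains with a Gaussian and takes the normalized inner product of the filtered signals; the filter shape and width below are illustrative assumptions, not necessarily the exact definition of [49], [50].

```python
import math

def correlation_C(desired, actual, sigma=2.0, T=100.0, dt=0.1):
    """Correlation-based measure: convolve both spike trains with a
    Gaussian filter on a discrete time grid and return the normalized
    inner product of the two filtered signals (1 for identical trains)."""
    def filtered(spikes):
        n = int(T / dt) + 1
        return [sum(math.exp(-((i * dt - s) ** 2) / (2 * sigma ** 2))
                    for s in spikes) for i in range(n)]
    fd, fa = filtered(desired), filtered(actual)
    dot = sum(a * b for a, b in zip(fd, fa))
    nd = math.sqrt(sum(a * a for a in fd))
    na = math.sqrt(sum(b * b for b in fa))
    if nd == 0.0 or na == 0.0:
        return 0.0   # at least one train is empty: no correlation defined
    return dot / (nd * na)
```

Identical trains give C = 1, while trains whose spikes are far apart relative to the filter width give a value near 0.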

A. A SIMPLE EXAMPLE OF UPS
In the simple example, κ(·) is set to a Laplacian kernel [48]. Then, the mathematical representation of UPS is given by (16) with

\kappa(t_m, t_n) = \exp\big(-|t_m - t_n| / \tau_+\big),   (17)

where τ_+ = 7 ms. The basic parameters are the number of input spike trains N_I = 200 and the number of output spike trains N_O = 1. Initial weights are generated from a uniform distribution on the interval (0, 0.2). All input spike trains and the desired output spike train are Poisson spike trains generated randomly with a Poisson rate of 20 Hz over the time interval (0, T), where the length T of all spike trains is set to 100 ms. Because this is a simple experiment, the upper limit of iterative epochs U is set to a small number, 100.
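The Poisson spike trains used in this setup can be generated with a simple per-step Bernoulli draw, matching the paper's 0.1 ms discrete precision; this discrete-time approximation of a Poisson process is standard.

```python
import random

def poisson_spike_train(rate_hz, T_ms, dt=0.1, rng=random):
    """Generate a Poisson spike train on (0, T_ms): in each time step of
    dt ms, a spike occurs independently with probability rate_hz * dt / 1000."""
    p = rate_hz * dt / 1000.0
    return [round(i * dt, 1) for i in range(1, int(T_ms / dt))
            if rng.random() < p]

# Example setup mirroring the experiment: 200 input trains at 20 Hz over 100 ms.
rng = random.Random(0)
input_trains = [poisson_spike_train(20, 100, rng=rng) for _ in range(200)]
```

At 20 Hz over 100 ms, each train contains about two spikes on average, so some trains are empty, which is expected for Poisson inputs.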
The complete learning process is presented in Fig. 5(a), including the initial output spikes (*), desired output spikes (pentagrams), actual output spikes after the last learning epoch (×), and actual output spikes at all learning epochs during the learning process (·). As shown in Fig. 5(a), there are many incorrect spikes in the early period of the learning process. As the number of incorrect spikes decreases, all actual firing times gradually tend to the desired times. Finally, the learning successfully fires the desired spike train, which proves the spike train learning ability of UPS. Fig. 5(b) shows the change of C in this learning process. With three obvious oscillations, C has an upward trend. Finally, UPS learns successfully at 63 epochs. The oscillation from 20 epochs to 30 epochs is sharper than the others. By combining the learning accuracy with the learning process, we can see that the incorrect spikes from 20 epochs to 30 epochs are generated later than 60 ms. At the same time, the firing times between 40 ms and 50 ms fluctuate obviously. From the appearance to the disappearance of these incorrect spikes, this is exactly the performance of an effective regulation. The rhythmic generation of spike doublet discharge is one of the main reasons for the above oscillatory burst pattern [52]. The spike doublet also appears in learning algorithms based on spike count and spike time. In fact, a subset of spikes plays a predominant role in the regulation process of synaptic weights [53]. That is to say, UPS with pair-spike is more oscillatory than other algorithms without pair-spike, which can also be seen by comparing Fig. 5(b) with the learning accuracy figures in [25], [27], [32]-[34]. However, the spike doublet is unrelated to the initial condition of the synaptic weights. The convergence of these weights is related to the input set and parameters.
The coupling temporal correlation can lead to the spike doublet just in certain parameter regimes [53]. Thus, by setting some suitable parameters, such as LR, we can avoid the oscillation or limit it in a small range.

B. EFFECT OF PAIR-SPIKE FOR UPS
We carry out a group of experiments to compare UPS with multi-spike and UPS with pair-spike. In these experiments, N_I is set to 400 and T is set to 400 ms. Due to the increased experimental complexity, U is set to a slightly larger number, 500, and the LR is set to a smaller number, 0.01. Other settings remain the same as in section IV.A. Unless otherwise stated, 30 trials are performed for each case in this paper. The mean and standard deviation (SD) of the maximum C (M_C and SD_C respectively) are presented in Fig. 6(a), where M_C and SD_C are used to analyze the learning accuracy and its stability respectively. We can see from Fig. 6(a) that UPS with pair-spike has a larger M_C and a smaller SD_C than UPS with multi-spike, which means the learning accuracy of UPS with pair-spike is higher and more stable. In particular, the SD_C of UPS with multi-spike is so much larger than that of UPS with pair-spike, which further reveals that multi-spike can sometimes lead to an abnormal regulation. Fig. 6(b) shows the effect of pair-spike on the efficiency, including the mean and SD of the epoch at which the maximum C is reached (M_E and SD_E respectively), where M_E and SD_E are used to analyze the learning efficiency and its stability respectively. Fig. 6(b) shows that UPS with pair-spike has smaller M_E and SD_E than UPS with multi-spike, which means the learning efficiency of UPS with pair-spike is higher and more stable. Thus, it is proved that pair-spike has a positive effect on the supervised learning of spike trains.

C. INFLUENCES OF LEARNING FACTORS 1) INFLUENCE OF LEARNING KERNEL
In the previous experiments, the learning kernel was always set to a Laplacian kernel. However, different kernels may have different effects on the learning performance. Since there are many kernels, we select the frequently used Gaussian, Laplacian, and Inverse Multi-quadratic (IM) kernels to examine the influence of κ(·) [36], [47]. When κ(·) is a Gaussian kernel, the mathematical representation of UPS is given by (16) with

\kappa(t_m, t_n) = \exp\big(-(t_m - t_n)^2 / (2\tau_g^2)\big),   (18)

where τ_g = 2 ms. When κ(·) is an IM kernel, the representation is obtained with

\kappa(t_m, t_n) = 1 / \sqrt{(t_m - t_n)^2 + c^2},   (19)

where c = 1 ms. In these experiments, the LRs for the Gaussian, Laplacian, and IM kernels are set to 0.025, 0.01, and 0.015 respectively. Other settings remain the same as in IV.B. Fig. 7 shows the experimental results. UPS can learn well with each kernel. With the largest M_C and smallest M_E, the Gaussian kernel has the best performance. The IM kernel achieves the same accuracy as the Laplacian kernel and has a better efficiency. With the smallest M_C and largest M_E, the Laplacian kernel is relatively the worst. Although the performance of the Gaussian kernel is best, the best kernel cannot represent the general standard of UPS. On the contrary, if UPS with the Laplacian kernel shows an excellent performance, we can further prove the general advantage of UPS, and the potential disadvantages become more obvious for comprehensive analyses. To keep the generality of UPS, we select the Laplacian kernel as the representative in all following experiments. The Laplacian kernel can also better prove the universal advantage of UPS when UPS is compared with other algorithms.

2) INFLUENCE OF LEARNING RATE
In the previous experiments, the LR was set directly by the researchers. However, different values of the LR may bring different results for the sensitive learning of spike trains. In this section, we test the LR over a range of benchmark values to select the best LR. Except that the LR is set to 0.0001, 0.0005, 0.001, 0.005, and 0.025, the other experimental settings are unchanged. Fig. 8 presents the learning results. Fig. 8(a) shows that the learning accuracy with LR = 0.0001 is very low. As the LR increases, M_C increases, SD_C decreases, and all M_C can reach 0.9. This means a larger learning rate can lead to a higher and more stable learning accuracy and that UPS has little dependence on the LR within a range. To validate this increment, we set a larger LR = 0.015. The learning results with LR = 0.015 are also presented in Fig. 8. However, M_C decreases and SD_C increases when the LR increases to 0.015. Fig. 8(b) presents the influence of the LR on the learning efficiency. As the LR increases, M_E first decreases, then increases, and has a minimum at LR = 0.01, which shows that M_E has a negative correlation with M_C. Thus, an appropriate LR within a range can produce a better learning performance. In the following experiments, the LR is set by a self-adaptive method used in [36], [40]. This method is based on the firing rate (FR), and needs a minimum of the benchmark interval of FR, v_min, a maximum of the benchmark interval of FR, v_max, and a benchmark LR η*. Based on the above experimental parameters, results, and analyses, v_min, v_max, and η* are set to 20 Hz, 40 Hz, and 0.01 respectively. In addition, this self-adaptive method can also keep the oscillations mentioned in IV.A within a very small range to obtain a better learning result [2].
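The exact self-adaptive rule of [36], [40] is not reproduced here; the sketch below is a hypothetical rate-based scaling around the benchmark interval [v_min, v_max] with benchmark LR η*, intended only to illustrate the idea of tying the step size to the firing rate.

```python
def adaptive_lr(firing_rate_hz, v_min=20.0, v_max=40.0, eta_star=0.01):
    """Hypothetical rate-based LR adaptation (the actual rule in [36],
    [40] may differ): use the benchmark LR inside [v_min, v_max] and
    scale it by the ratio of the nearest benchmark rate to the observed
    rate outside that interval."""
    if firing_rate_hz <= 0.0:
        return eta_star
    if firing_rate_hz < v_min:
        return eta_star * v_min / firing_rate_hz   # slow firing: larger steps
    if firing_rate_hz > v_max:
        return eta_star * v_max / firing_rate_hz   # fast firing: smaller steps
    return eta_star
```

Under this assumed scheme, a rate inside the benchmark interval keeps η* unchanged, while rates outside it are damped or boosted proportionally, which limits the oscillations discussed in IV.A.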

3) INFLUENCE OF LEARNING EPOCH
In the previous experiments, U is always set to a value no more than 500. However, a larger U may influence the learning result. In this section, U increases from 100 to 600 with an interval of 100. To show the influence clearly, the Poisson rate of all spike trains is raised to 40 Hz. The other basic parameters are the same as those in section IV.B. Fig. 9 shows the learning accuracy and efficiency.
We can see from Fig. 9(a) that M_C increases and SD_C decreases gradually as U increases. This means that more learning epochs can indeed improve the learning accuracy. However, the same increase of U has less and less influence on this improvement, which reduces the average effect of each learning epoch. For example, M_C increases by 0.133 from U = 100 to U = 300, whereas it increases by only 0.029 from U = 300 to U = 500. Thus, once the learning epoch is large enough, the learning accuracy of UPS is no longer sensitive to it, which is also observed in other algorithms such as ReSuMe [27], [32]. Fig. 9(b) presents the influence of U on the efficiency. As U increases, M_E and SD_E both become larger, showing a positive correlation with M_C and a negative correlation with SD_C, which means an increased computational cost and a less stable learning efficiency. Thus, U should be set to a suitable value that satisfies the demand for a valuable M_C while also limiting the computational cost.
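The accuracy measure C behind M_C and SD_C is defined earlier in the paper. As a reference point only, a Schreiber-style correlation measure, which is commonly used for this purpose, can be sketched as follows; the Gaussian filter width σ, the grid step, and the train length here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def smoothed(train, t_grid, sigma=2.0):
    """Gaussian-filtered version of a spike train, sampled on t_grid (ms)."""
    train = np.asarray(train, dtype=float)
    if train.size == 0:
        return np.zeros_like(t_grid)
    d = t_grid[:, None] - train[None, :]
    return np.exp(-d**2 / (2.0 * sigma**2)).sum(axis=1)

def correlation_c(actual, desired, T=500.0, dt=0.1, sigma=2.0):
    """Correlation-based accuracy C = <v_a, v_d> / (|v_a| |v_d|);
    C reaches 1 only when the two filtered trains coincide."""
    t_grid = np.arange(0.0, T, dt)
    v_a = smoothed(actual, t_grid, sigma)
    v_d = smoothed(desired, t_grid, sigma)
    denom = np.linalg.norm(v_a) * np.linalg.norm(v_d)
    return float(np.dot(v_a, v_d) / denom) if denom > 0 else 0.0

print(correlation_c([10.0, 50.0, 90.0], [10.0, 50.0, 90.0]))  # identical trains give C close to 1
```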

D. COMPARISON OF UPS AND OTHER ALGORITHMS
In section IV.C, we analyze the influence of the learning factors on UPS. On this basis, we appropriately select the parameters such as the kernel, LR, and U. To better demonstrate the learning performance, UPS is compared with other algorithms through more complex spike train learning experiments in this section. The compared algorithms include ReSuMe, SPAN, and PSD using pair-spike. We select these algorithms because they belong to different types or are independent of specific spiking neuron models: ReSuMe is based on synaptic plasticity, while SPAN and PSD use different convolutions of spike trains. Algorithms based on gradient descent are not used, because they usually require an analytical expression of the state variables and many more learning epochs to obtain a higher learning accuracy [25]. The parameters of ReSuMe in [32] are the time constant a = 0.002, the amplitude A+ = 1, and the time constant τ+ = 7 ms. The convolution kernel of SPAN is the α kernel with a time constant τ = 7 ms [38]. The parameter of the PSD algorithm is τ_s = 10 ms [39]. The η* of ReSuMe, SPAN, and PSD is set to 0.01, 0.0005, and 0.01 respectively. We carry out many experiments with transitional attributes of spike trains, such as N_I, T, and FR, which can comprehensively verify the spike train learning ability [25], [27].

1) COMPARISON OF UPS AND OTHER ALGORITHMS WITH TRANSITIONAL NUMBER OF INPUT SPIKE TRAINS
This section includes two groups of experiments, where every initial weight is drawn randomly from the interval (0, 0.2), and the output spike train and all input spike trains are Poisson spike trains with T = 500 ms. In the first group, FR = 50 Hz and N_I increases from 100 to 500 with an interval of 50. Unless otherwise stated, U is set to 1000 for all complex experiments in section IV.D. Fig. 10(a) shows the learning accuracy. When N_I is small, the input spike trains fail to provide enough spikes to make the spiking neuron model emit all desired output spikes, so the learning ability of all four algorithms is poor. With N_I increasing, the learning accuracy of all algorithms increases; in particular, UPS and ReSuMe have relatively larger increases. However, when N_I is less than 250, the learning accuracy of the four algorithms is still low. When N_I is more than 200, all algorithms except PSD have a good learning ability, but the growth rate of the learning accuracy of all algorithms becomes lower. In general, all algorithms can make M_C tend to 1 when N_I is close to 500. UPS has higher and more stable learning accuracy than the other three algorithms in most cases (150, 250, and 350-500 input trains), which reveals the stronger ability of UPS to control spiking times. Fig. 10(b) presents the learning efficiency, where M_E of all four algorithms fluctuates within a certain range. With the increase of N_I, the learning efficiency of UPS first increases and then decreases, whereas the other algorithms show no obvious pattern. However, the learning accuracy and learning efficiency of all algorithms show no special correlation. The learning efficiency of UPS is higher than that of the other three algorithms in most cases (150 and 300-500 input trains), which also reveals the stronger ability of UPS to control spiking times.
In the second group of experiments, all input spike trains are Poisson spike trains with FR = 10 Hz and the desired spike train is a Poisson spike train with FR = 100 Hz. Since the FR of the input spike trains is much smaller than that of the desired spike train, N_I and its interval should be larger to show the trend: N_I increases from 200 to 1000 with an interval of 100. Fig. 11(a) presents the learning accuracy. The learning accuracy of all algorithms in Fig. 11(a) follows the same trend as that in Fig. 10(a) as N_I increases. All algorithms have relatively poor learning ability when N_I is small. With the increase of N_I, the learning accuracy of all four algorithms increases, and UPS and ReSuMe have relatively higher growth rates. However, the learning accuracy of all algorithms is still low when N_I is less than 500. When N_I reaches 500, all algorithms have similar learning accuracy again. All algorithms can make the learning result tend to success when N_I is close to 1000. In general, UPS has the stronger ability to control the firing time of spikes because it has higher learning accuracy than the other three algorithms in most cases (300-400 and 600-1000 input trains).
The learning efficiency is shown in Fig. 11(b), where the four algorithms present different trends. With the increase of N_I, the learning efficiency of SPAN and PSD fluctuates within a certain range, whereas that of UPS and ReSuMe tends to decrease, especially when N_I is no less than 500. Combining Fig. 11(b) with Fig. 11(a), we can see that it is from N_I = 500 that the learning accuracy of UPS and ReSuMe starts to grow faster than that of SPAN and PSD again. UPS has higher learning efficiency than the other three algorithms in four cases (300-600 input trains), which shows that UPS also has a good efficiency in controlling the firing times of spikes.

2) COMPARISON OF UPS AND OTHER ALGORITHMS WITH TRANSITIONAL LENGTH OF SPIKE TRAINS
Two groups of experiments are carried out in this section, where T changes from 100 ms to 1000 ms with an interval of 100 ms. In the first group, N_I is set to 400 and the other settings remain the same as those in the first group of experiments in section IV.D1). Fig. 12(a) reveals that all four algorithms can learn to fire. When T is 100 ms, all algorithms learn well. However, with the gradual increase of T, their learning accuracies decrease. When T is less than 400 ms, the learning accuracy of UPS, ReSuMe, and SPAN remains similar, while that of PSD decreases rapidly. Once T reaches 400 ms, the difference in learning accuracy between SPAN and UPS becomes obvious. The gap in learning accuracy between UPS and ReSuMe also increases gradually when T is more than 600 ms. In general, ReSuMe and UPS have higher learning accuracy than SPAN and PSD. In particular, UPS has higher learning accuracy than the other three algorithms in all cases. ReSuMe has more stable learning accuracy than the other three algorithms in four cases (300, 500, 700, and 900 ms), while UPS is the most stable in three cases (100, 400, and 600 ms). Therefore, UPS has a stronger ability than the other algorithms to control the firing time of spikes. Fig. 12(b) shows the learning efficiency. With T increasing, the learning efficiency of all algorithms decreases and gradually becomes more stable. PSD has the lowest but most stable learning efficiency in most cases. Generally, UPS has a good learning efficiency in controlling the firing time of spikes, because its learning efficiency is higher than that of the other three algorithms in three cases (200, 500, and 700 ms) and approximately the highest in three other cases (400, 600, and 800 ms).
In the second group of experiments, N_I is set to 600. The other settings remain the same as those in the second group of experiments in section IV.D1). Fig. 13 shows the learning results.
We can see from Fig. 13(a) that all four algorithms have good learning accuracy when T = 100 ms. However, the learning accuracy of all algorithms gradually decreases as T increases, especially from T = 500 ms to T = 700 ms. The difference between SPAN and UPS in learning accuracy becomes obvious from T = 200 ms, and the gap between ReSuMe and UPS gradually increases from T = 300 ms. Although the learning accuracy of all algorithms in Fig. 13(a) follows the same trend as that in Fig. 12(a), the differences among the four algorithms remain within a certain range in Fig. 13(a). Notably, SPAN even has higher learning accuracy than ReSuMe when T = 900 ms and T = 1000 ms. In general, UPS has higher learning accuracy in all cases and more stable learning accuracy in three cases (100, 200, and 1000 ms) than the other three algorithms, so UPS has a stronger ability to control the firing time of spikes. Fig. 13(b) presents the learning efficiency. With an increase of T, the learning efficiency of UPS and ReSuMe fluctuates within a certain range, while that of PSD tends to decrease. When T = 1000 ms, the difference in learning accuracy between SPAN and UPS is smaller than at T = 100 ms, but the difference in learning efficiency between SPAN and UPS reaches its maximum at the same time. Thus, even if SPAN achieved the same learning accuracy as UPS, UPS would still have the higher learning efficiency. In general, UPS has higher learning efficiency than the other three algorithms in most cases (100-300, 500-600, and 800 ms), which reveals the good efficiency of UPS in controlling the firing time of spikes.

3) COMPARISON OF UPS AND OTHER ALGORITHMS WITH TRANSITIONAL FIRING RATE OF SPIKE TRAINS
There are two groups of experiments in this section, where the FR of all spike trains changes from 10 Hz to 100 Hz with an interval of 10 Hz. In the first group, N_I is set to 400 and the other settings remain the same as those in the first group of experiments in section IV.D1). Fig. 14 shows the experimental results. Fig. 14(a) shows the learning accuracy. When FR is low, all algorithms learn well. However, with the increase of FR, the increasingly complex spike trains make the learning accuracy of all algorithms gradually decrease, and the stability of the learning accuracy also becomes worse. Comparing the algorithms, the gap in learning accuracy among ReSuMe, PSD, and UPS gradually increases from FR = 30 Hz, while the difference between SPAN and UPS stays within a certain range. On the whole, UPS has higher learning accuracy than the other three algorithms in all cases except FR = 20 Hz, and more stable learning accuracy in four cases (10, 30, 40, and 60 Hz). Besides, when FR = 10 Hz, UPS learns to output the desired spike train precisely in 18 experiments, and M_C is very close to 1 in the other 12 experiments. Under the same condition, SPAN and ReSuMe each learn successfully in 15 experiments, and PSD in only 5. Therefore, UPS has a stronger ability to control the firing time of spikes than the other three algorithms. Fig. 14(b) presents the learning efficiency. With the increase of FR, the stability of the learning efficiency of each algorithm improves. For example, SD_E of ReSuMe decreases from 295 at FR = 10 Hz to 101 at FR = 100 Hz. However, the learning efficiency of all algorithms decreases at the same time. For example, M_E of UPS increases from 544 at FR = 10 Hz to 839 at FR = 100 Hz.
In general, UPS has higher learning efficiency than the other three algorithms in most cases (70 and 90-100 Hz), so UPS has the stronger ability to control the firing time of spikes.
In the second group, s_d is a linear spike train [25]. The other settings remain the same as those in the first group of experiments in section IV.D3). The Poisson input and linear output can reveal the ability of directional and practical learning [2]. Although the learning accuracy in Fig. 15(a) follows the same trend as that in Fig. 14(a), it is improved in every case. For instance, M_C of PSD is 0.752 in Fig. 14(a) whereas it is 0.909 in Fig. 15(a) when FR = 100 Hz. Because M_C of all algorithms exceeds 0.9 in every case, all algorithms have a good learning accuracy. However, no algorithm learns to output the desired spike train precisely, whatever the FR is. In general, UPS has higher learning accuracy than the other three algorithms in all cases and better stability of learning accuracy in all cases except FR = 10 Hz, which reveals that UPS has the stronger ability for directional and practical learning. Fig. 15(b) presents the learning efficiency. The learning efficiency of all algorithms in Fig. 15(b) follows the same trend as that in Fig. 14(b). In general, there is a negative correlation between FR and the learning efficiency as a whole. The learning efficiency of UPS is higher than that of the other three algorithms in most cases (10-20 and 60-100 Hz), which also reveals the stronger ability of UPS for directional and practical learning.
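The two stimulus types used in these comparisons, Poisson trains and the linear desired train s_d, can be generated as in the following sketch. The 1 ms binning and the centring of the linear train within each interval are implementation assumptions.

```python
import numpy as np

def poisson_spike_train(rate_hz, duration_ms, rng):
    """Homogeneous Poisson train: each 1 ms bin fires with p = rate * 1e-3."""
    p = rate_hz * 1e-3
    fired = rng.random(int(duration_ms)) < p
    return np.nonzero(fired)[0].astype(float)    # spike times in ms

def linear_spike_train(rate_hz, duration_ms):
    """Evenly spaced ('linear') train with the same mean firing rate."""
    n = int(round(rate_hz * duration_ms * 1e-3))
    spacing = duration_ms / n
    return spacing * (np.arange(n) + 0.5)        # centred in each interval
```

For example, a 100 Hz linear train over 500 ms contains 50 spikes separated by exactly 10 ms, whereas a Poisson train of the same rate is irregular.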

V. DISCUSSION
In this paper, using the spike train kernel based on a unit of pair-spike, we propose a supervised learning algorithm. The proposed algorithm, named UPS, uses the kernel method of pair-spike to adjust the weights directly. All experimental results indicate that UPS is an effective algorithm, and two conclusions can be drawn from them. Firstly, UPS has relatively higher learning accuracy. When N_I is large enough, T is short, or FR is low, UPS and the other algorithms can all learn with high accuracy. As T and FR increase or N_I decreases gradually, UPS has higher learning accuracy than the other algorithms in most cases. Secondly, UPS has a good learning efficiency. In section IV.D, the learning efficiency of UPS is higher than that of the other three algorithms in 31 of all 58 cases, which reveals the good efficiency of UPS in spike train learning. Another advantage of UPS is its independence: the derivation of UPS is based only on the firing times of spikes and is independent of the spiking neuron model.
Generally, many algorithms spend computational resources on a useful input spike many times in every learning epoch. Following the idea that spikes in different intervals have different functions in regulating the weights, we analyze the problem of the original spike selection and computation method, named multi-spike, and its causes. Then, we use every useful input spike only once in every learning epoch to construct a direct spike selection and computation method named pair-spike. The derivation of pair-spike is based only on the spike trains, so pair-spike is applicable to all supervised learning algorithms of spike trains. The experimental results in section IV.B prove that pair-spike can improve the learning accuracy and efficiency. Moreover, compared with multi-spike, pair-spike also reduces the computational cost. The computational cost of multi-spike in every learning epoch depends on the number of input spikes and the number of all output spikes; for this step alone, the time complexity of multi-spike is at least O(n²), and if we regard the time complexity of one pass over the two output spike trains s_a and s_d as O(n), the time complexity of multi-spike is O(n³). However, the computational cost of pair-spike in every learning epoch depends only on the number of input spikes. The number of computations in an epoch equals the number of input spikes earlier than the last output spike, and is no more than the total number of input spikes; for this step alone, the time complexity of pair-spike is only O(n). Therefore, with the same software engineering techniques, pair-spike has a computational advantage over multi-spike, even if pair-spike may need a few more learning epochs than multi-spike.
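The difference in per-epoch cost can be illustrated by counting update operations under the two selection strategies. The sketch below is schematic (the helper names and the toy counting are ours, not the paper's code); it only shows why one count scales with the product of input and output spikes while the other scales with the input spikes alone.

```python
def multi_spike_updates(inputs, actual, desired):
    """Multi-spike style: every input spike is revisited for every output
    spike of both trains, so the count scales with n_in * n_out."""
    count = 0
    for outputs in (actual, desired):
        for t_out in outputs:
            for t_in in inputs:
                if t_in <= t_out:      # only earlier input spikes contribute
                    count += 1
    return count

def pair_spike_updates(inputs, actual, desired):
    """Pair-spike style: each effective input spike (earlier than the last
    output spike) is used exactly once per epoch, so the count is O(n_in).
    Spike times are assumed sorted."""
    last_out = max(actual[-1] if actual else 0.0,
                   desired[-1] if desired else 0.0)
    return sum(1 for t_in in inputs if t_in <= last_out)
```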
The kernel method plays an important role in UPS. In section IV.C1), we test the Gaussian, Laplacian, and IM kernels, all of which produce an effective learning result. Some other kernels, such as the linear kernel and the Sigmoid kernel, are unsuitable for UPS, because their learning accuracy is lower or the learning fails.
UPS also has some disadvantages, such as convergence to a local optimum. For example, it is sometimes difficult for UPS to reach C = 1 when C is close to 1, which means UPS fails to achieve a theoretically perfect learning result. As shown in Fig. 9, because the average effect of each epoch on the learning accuracy decreases as the learning epoch increases, M_C never reaches 1 after 600 epochs based on the statistics of 30 experiments. In fact, this problem is not unique to UPS; it also exists in other learning algorithms [31]-[41]. For instance, the learning accuracy of all algorithms suffers from this problem in the second group of experiments in section IV.D3). Besides, UPS sometimes indeed needs a few more epochs than other algorithms, such as ReSuMe with pair-spike in the second group of experiments in section IV.D2), to obtain a larger maximum C close to 1. In our view, methods such as multiple kernels or multiple convolutions may mitigate these problems of UPS [2].

VI. CONCLUSION AND FUTURE WORK
A supervised learning algorithm called UPS is constructed in this paper using the spike train kernel based on a unit of pair-spike. Combining the advantages of the kernel method and the direct computation of our proposed pair-spike method, UPS regulates the weights directly during the running process. As an effective and efficient supervised learning algorithm, UPS shows better learning performance than several existing algorithms. In future work, the problems above are important aspects to solve, where higher accuracy with fewer epochs is the desired result. Meanwhile, the relationship between pair-spike and different types of algorithms will be studied deeply and concretely in aspects such as sensitivity and robustness. Moreover, UPS will be extended to SNNs and applied to real databases and practical applications such as image classification, urban computing, and biomedical signal processing.