Neuromorphic Decoding of Spinal Motor Neuron Behaviour During Natural Hand Movements for a New Generation of Wearable Neural Interfaces

We propose a neuromorphic framework to process the activity of human spinal motor neurons for movement intention recognition. This framework is integrated into a non-invasive interface that decodes the activity of motor neurons innervating intrinsic and extrinsic hand muscles. One of the main limitations of current neural interfaces is that machine learning models cannot exploit the efficiency of the spike encoding operated by the nervous system. Spiking-based pattern recognition would detect the spatio-temporal sparse activity of a neuronal pool and lead to adaptive and compact implementations, eventually running locally in embedded systems. Emergent Spiking Neural Networks (SNN) have not yet been used for processing the activity of in-vivo human neurons. Here we developed a convolutional SNN to process a total of 467 spinal motor neurons whose activity was identified in 5 participants while executing 10 hand movements. The classification accuracy approached 0.95 ± 0.14 for both isometric and non-isometric contractions. These results show for the first time the potential of highly accurate motion intent detection by combining non-invasive neural interfaces and SNN.


I. INTRODUCTION
The next generation of Human Machine Interfaces (HMIs) aims at fast, safe, touchless, and intuitive control of digital devices, based on the prediction of human intention obtained by decoding neural activity through neural interfaces. Applications range from the control of smart devices (smartphones, home, virtual and augmented reality) to the control of assistive devices and robots.
Neural interfaces extract information from different regions of the nervous system and differ in their degree of invasiveness. Implanted electrodes directly measure the activity of neurons (e.g., electroneurography (ENG), electrocorticography (ECoG), microelectrode arrays (MEAs)), while non-invasive measures (electroencephalography (EEG), surface electromyography (sEMG), etc.) provide global information on neural activity [20], [39]. The source of these neural recordings, i.e. the activity of many neurons, is spiking by nature, since information in neural systems is encoded at the population level (spatial) in the precise temporal pattern of spikes, as demonstrated in the somatosensory [29], [42] and the visual and auditory [4] cortex for sensory stimuli and decision making [46].
When establishing a neural interface by decoding spiking biopotentials, the extracted spiking activity of neurons is usually transformed into non-spiking features, moving from a binary discrete domain to a continuous one. In particular, using a set of kernels makes it possible to extract richer information from spike trains in terms of neural recruitment strategies, such as identification of neurons encoding for a specific function, spike train similarity, and probability distribution of spiking neuronal behaviour [43]. However, during all these types of transformations, some of the original information of the spiking neural activity could be lost, and in general this feature extraction from a discrete to a continuous domain always has to be interposed before the features can be processed with state-of-the-art machine learning [9], [44]. Also, some works focus on the waveforms of the action potentials in the neural signals instead of on their firing pattern [33].
Therefore, only Spiking Neural Network (SNN)-based machine learning can unleash the full information potential of the intrinsic spiking nature of a pool of neurons, exploiting the complexity of their spatio-temporal sparse activity. Moreover, the study of neuromorphic algorithms, implemented in this work with traditional computing, would lead to adaptive, extremely efficient, and compact implementations on neuromorphic hardware. Thus, we here hypothesize that SNN-based architectures have the potential to efficiently decode the sparse activity of a large number of spinal motor neurons involved in the generation of complex hand motion patterns across different tasks.
Fig. 1. Experimental protocol and High-Density sEMG (HD-sEMG) electrode placement. a) The protocol included single-finger flexion (5 gestures), 3 different grips with 5, 3 and 2 fingers, and thumb abduction and opposition (10 gestures in total). b) Six HD-sEMG grids, each with 64 channels, recorded the activity of 14 muscles. Two large grids (8 mm inter-electrode distance (IED)) were placed over the forearm and four small grids (4 mm IED) were placed over intrinsic muscles.
Despite their intrinsic spiking nature, only a few attempts have been made to process natural spiking information from biological neurons, either from in-vitro neuronal cultures obtained from animal cortical tissues [6], [7], [30] or in-vivo from anesthetized animals [5]. We emphasize here the term natural spiking information, since several works performed an artificial spike-encoding of neural biopotentials from human in-vivo recordings, as in [8], [16], and [31], and used SNNs to extract movement intentions. This artificial spike-encoding is performed by thresholding the signal with an asynchronous delta modulator approach [13], [35]. However, in those cases, the original biological spiking information of each recorded neuron was not preserved, because there was no previous identification of the activity of single neurons in the recordings. This is why we address here the direct processing, with SNNs, of natural neural spiking information received by motor neurons from the Central Nervous System (CNS).
In this study, we present the case of a non-invasive neural interface based on spinal motor neuron activity. We show for the first time the processing of human in-vivo natural spiking activity of individual spinal motor neurons with an SNN during the execution of daily-life gestures. With this approach, we show the detection of natural finger movements from pools of motor neurons innervating 14 hand muscles. We identify the activity of single motor neurons from Electromyography (EMG) by decomposition [24], [34], [40], [41], using muscles as a peripheral gate to extract central neural information generated at the spinal cord level. In this way, we decouple the information sent by the CNS to muscles from the muscle fiber action potentials, de-facto accessing information about the CNS through the muscular system [10], [19], [21], [38], [47], [53]. This type of wearable neural interface has already been implemented and extensively evaluated [3], [15], [25], [52]. The proposed SNN-based architecture provides the basis for the development of non-invasive, wearable neural interfaces for the next generation of intuitive HMIs for a broad range of daily-life conditions, such as touchless control of devices, gaming and controlling virtual reality, as well as for control of prostheses and rehabilitation.

II. METHODS
A. Subjects
Five healthy male individuals (age: 27.2 ± 3.3 yrs; weight: 74.6 ± 7.1 kg; height: 179 ± 6.7 cm) participated in the experiments after having signed an informed consent form approved by the Imperial College London Research Ethics Committee (approval no. 18IC4685), in conformity with the Declaration of Helsinki.

B. Experimental Protocol
To record the HD-sEMG dataset, we asked the subjects to perform different types of gestures by flexing each individual finger in three different wrist postures (neutral, flexed, and extended), and also thumb abduction and opposition, since these degrees of freedom of the thumb are involved in grips and object manipulation. For the same reason, we asked the subjects to simulate three different grips, i.e. two-finger, three-finger, and five-finger grips. Grips were performed without contact with objects to maintain consistency in terms of interaction with the environment, i.e. action of external forces, with respect to the other 7 gestures. Among these 10 natural gestures, represented in Fig. 1.a, the ones involving a single finger were considered across the three wrist postures, i.e. neutral, extended, and flexed (15 recordings overall).
Each gesture lasted 6 seconds and was repeated 4 times. A pause of 1 minute was interposed between each recording of the 4 repetitions per gesture, to avoid fatigue effects. These 6 s were divided into 2 s for reaching the required finger posture (with an excursion of the joint angle of interest resulting in an increase in the contraction), 2 s for holding the finger posture (isometric part, with a plateau in the contraction), and 2 s for returning to the neutral position. In the rest of the text, the first 2 of these 3 phases, which are the ones used in the analysis, are referred to as increasing (I) and plateau (P), and their joint selection is denoted I+P.
To indicate to the participants the movements to be executed and the timing of the phases, visual feedback in terms of the type of gesture and an animated cue were provided during the execution of the tasks.

C. Experimental Setup
Six 64-channel grids with equidistant electrodes covered the forearm and the hand: two grids (8 mm inter-electrode distance (IED)) over the extrinsic (in the forearm) extensor and flexor muscles, and four grids (4 mm IED) over the intrinsic muscles (in the hand). The grids were made of plastic with electrodes printed in gold. HD-sEMG signals were recorded in a monopolar configuration by a 400-channel amplifier (Quattrocento, OT Bioelettronica, Torino, Italy). Signals were amplified with a gain of 150, band-pass filtered between 10 and 900 Hz, sampled at 2048 Hz, and A/D converted to 16 bits. A laptop received the digitized data to store and visualise it in real time. A monitor was placed in front of each participant to display a picture indicating the gesture to execute and the timings of the execution. This visual feedback was provided by a custom-made application developed in Matlab (The Mathworks, Natick, US), which also visualized and saved the HD-sEMG signals. In Discussions, we speculate about the adaptation of this instrumentation to the wearable case.

D. Signal Processing
The spiking activity of individual spinal motor neurons innervating muscles in the hand (intrinsic) and in the forearm (extrinsic) actuating fingers and wrist was directly interfaced with a convolutional SNN composed of two layers of leaky integrate-and-fire (LIF) neurons (Fig. 2) to identify the performed hand gesture. Decomposition extracts spiking information of single motor neurons non-invasively from HD-sEMG signals. It consists of separating the firing occurrences from the motor unit action potential waveforms (Fig. 2.b). These waveforms do not correspond to neural information and depend only on the volume conduction, i.e. properties of the recording system, interposed tissues, and the relative distance between active motor units and electrodes [18]. The output of the decomposition is a collection of time-varying innervation pulse trains (IPTs), i.e. the sequence of firing occurrences for each identified motor unit (Fig. 2.b). Each identified waveform for a motor unit corresponds to a spike in the IPTs, and the relative firing occurrences are extracted by thresholding [26]. It is worth remembering that the spiking information of single motor neurons is the net product of the integration, performed at the spinal level by modules of interneurons, between the central supraspinal commands and the peripheral afferent commands. Thus, the input of the SNN is the natural neural binary information of a pool of spinal motor neurons, decoupled from their action potential waveforms, and it provides information about the exertion commands from spinal modules to each recorded muscle.
To identify the motor neuron firing patterns of the investigated muscles and track the same motor neurons across multiple tasks, we concatenated the HD-sEMG signals of the recordings relative to each gesture and then decomposed the concatenated HD-sEMG as in [51]. Each group of 64 concatenated HD-sEMG signals corresponding to a recording grid of 64 electrodes was decomposed separately with the Convolution Kernel Compensation (CKC) algorithm [27] (Fig. 3.a). Since the amount of HD-sEMG data to decompose for all the gestures exceeded the computational capacity of the decomposition algorithm (approximately 100 MB of data for the 100 decomposition runs executed), HD-sEMG data were divided into 4 different concatenations. The order of concatenation of the HD-sEMG recordings was:
• Grips: Five-finger grip, Three-finger grip, Two-finger grip
The concatenated HD-sEMG signals were digitally filtered between 20 and 500 Hz with a 4th-order Butterworth filter and then decomposed by the CKC algorithm [27]. The accuracy of this motor unit identification from HD-sEMG was assessed by the pulse-to-noise ratio (PNR) [23]. The final output of the decomposition was manually inspected by expert operators according to the consensus study published in [14] and [36]. We discarded a spike train when it presented an average firing rate lower than 2 Hz, or when the corresponding spike-triggered averaged motor unit action potential (MUAP) waveform did not present a physiological shape but only noise. The latter condition was assessed by identifying shapes with several phases, which are not typical of physiological MUAPs.
To complete the motor neuron tracking across all the observed recordings, after the four HD-sEMG concatenations had been decomposed separately, the spiking activity of the spinal motor neurons identified in each concatenation was matched across all the other concatenations. To do so, we paired similar action potential waveforms of the respective motor units across the four concatenations, ordering candidate motor unit pairings from the most similar to the least similar until no further pairings were possible. Waveform similarity was assessed with a 2D cross-correlation between the matched motor unit action potential templates. The average action potential waveforms for each electrode were obtained by spike-triggered average (STA) [37]. Motor units that remained unpaired in this waveform-based process were also included in the analysis, since they had already been assessed as accurate when decomposed from their concatenated EMG signals, as described above.
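The ordering of pairings from most to least similar can be illustrated with a short greedy matching sketch. It assumes that a similarity matrix between the motor unit templates of two concatenations has already been computed (for instance from the peak of a normalized 2D cross-correlation). The function name `greedy_match` and the toy values are hypothetical and are not taken from the study.

```python
def greedy_match(similarity):
    """Pair rows with columns, taking the most similar pairs first.

    similarity[i][j] is a similarity score between motor unit i of one
    concatenation and motor unit j of another (higher means more alike).
    Returns (pairs, unpaired_rows, unpaired_cols).
    """
    n_rows = len(similarity)
    n_cols = len(similarity[0]) if n_rows else 0
    # Rank every candidate pairing from most to least similar.
    candidates = sorted(
        ((similarity[i][j], i, j) for i in range(n_rows) for j in range(n_cols)),
        reverse=True,
    )
    used_rows, used_cols, pairs = set(), set(), []
    for score, i, j in candidates:
        if i in used_rows or j in used_cols:
            continue  # each motor unit can be paired only once
        pairs.append((i, j, score))
        used_rows.add(i)
        used_cols.add(j)
    unpaired_rows = [i for i in range(n_rows) if i not in used_rows]
    unpaired_cols = [j for j in range(n_cols) if j not in used_cols]
    return pairs, unpaired_rows, unpaired_cols


# Toy example (hypothetical scores): 3 units in concatenation A, 2 in B.
toy_similarity = [
    [0.91, 0.20],
    [0.35, 0.88],
    [0.40, 0.30],
]
pairs, only_a, only_b = greedy_match(toy_similarity)
```

In this toy case the third unit of concatenation A has no counterpart and stays unpaired. The text above states that such units were still kept in the analysis.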
As shown in [51], for electrode grids covering more than one muscle, motor unit location was assessed by computing the root mean square (RMS) of the spike-triggered averaged motor unit action potentials for each channel, yielding a 64-electrode matrix per grid. Then, the grids were divided arbitrarily into three bands as shown in [51], and motor units were assigned according to the most active band and the expected position of each muscle with respect to the band. The methodological aspects of this method are discussed in the cited paper.
The spike trains of all the considered motor neurons of all muscles for each subject, segmented with windows of 200 ms [45] (Fig. 3.b), were sent to the 1-D input array of the network. This segmentation did not imply any smoothing operations, as our SNN works directly with spikes without any imposed continuous transformation. In this way, the minimum firing rate detectable with such a window is 5 Hz, and lower firing rates are de-facto clipped to 0.
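The segmentation into non-overlapping 200-ms windows can be sketched as below. This is a minimal illustration rather than the authors' code. It assumes spike times given as integer milliseconds and a 1-ms binary resolution, neither of which is specified in the text, and the function name is hypothetical.

```python
def segment_spike_train(spike_times_ms, duration_ms, window_ms=200):
    """Split one spike train into non-overlapping binary windows.

    spike_times_ms: integer firing times in milliseconds.
    Returns a list of windows; each window is a list of 0/1 samples
    at an assumed 1-ms resolution. No smoothing is applied.
    """
    binary = [0] * duration_ms
    for t in spike_times_ms:
        if 0 <= t < duration_ms:
            binary[t] = 1
    n_windows = duration_ms // window_ms
    return [binary[w * window_ms:(w + 1) * window_ms] for w in range(n_windows)]


# Hypothetical neuron firing regularly at 10 Hz for 1 s.
spikes = list(range(50, 1000, 100))
windows = segment_spike_train(spikes, duration_ms=1000)
spikes_per_window = [sum(w) for w in windows]

# A single spike in a 200-ms window corresponds to 1 / 0.2 s = 5 Hz,
# which is the lowest non-zero rate a window of this length can express.
min_rate_hz = 1000 / 200
```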

E. Structure of the Network, Hyperparameter Optimization and Network Calibration
The proposed convolutional SNN is based on the Deep Continuous Local Learning (DECOLLE) framework [28] (Fig. 4.b). The hyperparameters which have the greatest impact on the performance are the dimension of the input window, the number and size of the layers and the time constants of neurons and synapses.
Fig. 4 (caption, partially recovered). [...] Table I. b) This network is designed to be used online by calibrating the decomposition and the network in a calibration phase on a remote server and then using the mixing matrix for online decomposition and the trained optimized network for online classification. This remains a future perspective for this paper since we would need online decomposition, and the part implemented here is the one for the calibration part.
We chose the minimum number of layers (two) in the convolutional architecture (Fig. 4.a), to optimize the trade-off between accuracy and power consumption. The first convolutional layer has an input size equal to the number of the processed motor neurons, which is different for each subject (Table I), and an output of 64 (set empirically after preliminary tests); the second has an input of 64 and an output of 128. Each layer is trained locally, using the classification of the gestures in a fully connected readout layer as the objective function. Each of these readout layers produced a number of outputs equal to the number of classes to discriminate.
The spiking activity of individual spinal motor neurons innervating the targeted muscles was directly interfaced with a convolutional SNN composed of two layers of LIF neurons (Fig. 2). Each of the two convolutional layers was trained autonomously, by surrogate gradient and local learning. The network is mainly ruled by the time constant of the neurons' membrane, and of the synapses, represented by the α and β hyperparameters, respectively. α and β were tuned specifically for each subject during network training.
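To make the role of the two decay factors concrete, the sketch below simulates a single generic discrete-time LIF neuron in which β decays the synaptic trace and α decays the membrane potential. It is a simplified illustration and not the DECOLLE neuron model. The hard reset, the threshold, the weight, and the function name are all assumptions made for this example.

```python
def simulate_lif(input_spikes, weight=1.0, alpha=0.9, beta=0.8, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron on a binary input.

    alpha: per-step decay of the membrane potential.
    beta:  per-step decay of the synaptic current trace.
    Returns the binary output spike train.
    """
    syn = 0.0  # synaptic current trace
    mem = 0.0  # membrane potential
    output = []
    for s in input_spikes:
        syn = beta * syn + weight * s  # leaky accumulation of input spikes
        mem = alpha * mem + syn        # leaky integration of the current
        if mem >= threshold:
            output.append(1)
            mem = 0.0                  # hard reset (an assumption)
        else:
            output.append(0)
    return output


# Hypothetical input: two presynaptic spikes, three steps apart.
out = simulate_lif([1, 0, 0, 1, 0, 0, 0, 0], weight=0.6)
```

Values of α and β closer to 1 make the traces decay more slowly, so that spikes arriving further apart in time can still sum up to threshold.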
The time constants of the synapses, $\tau_{syn}$, and of the membrane potential, $\tau_{mem}$, of the LIF neurons are encoded in the hyperparameters $\alpha = e^{-\delta t/\tau_{mem}}$ and $\beta = e^{-\delta t/\tau_{syn}}$. The two convolutional layers were tuned for each subject separately by testing the network on all the 10 classes, by considering the contraction phase I+P and a window width of 200 ms with no overlap. This hyperparameter tuning was run on 40% of the available dataset. The remaining 60% was used for network training and testing. To divide these two parts of the dataset, 200-ms windows were randomly reshuffled after segmentation making sure that the windows used in the first and second parts were distinct, ensuring the independence of the two subsets. This division simulates two different sessions in a subject-centered network training scenario, first for hyperparameter optimization and then for training the optimal network, as represented in Fig. 4.b. For both parameters, the tested values were 0.75, 0.8, 0.85, 0.9, 0.95, and 0.97, corresponding to time constants of 8.0, 10.3, 14.2, 21.9, 44.9, 75.6 ms.
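The mapping between a decay factor and its time constant can be checked numerically by inverting the exponential relation, τ = −δt / ln(α). The simulation step δt is not stated in the text. The value used below, ln(10) ms (about 2.3 ms), is an inference chosen only because it reproduces the listed time constants, and it should be treated as an assumption.

```python
import math


def time_constant_ms(decay, dt_ms):
    """Invert decay = exp(-dt / tau) to obtain tau = -dt / ln(decay)."""
    return -dt_ms / math.log(decay)


# Assumed step: NOT given in the source, inferred from the listed values.
DT_MS = math.log(10)  # approximately 2.303 ms

tested_values = [0.75, 0.8, 0.85, 0.9, 0.95, 0.97]
taus_ms = [round(time_constant_ms(v, DT_MS), 1) for v in tested_values]
```

With a different δt the time constants scale proportionally, so the ordering of the tested values is unaffected by this assumption.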

F. Intra-User Hand Gesture Classification
With the optimal hyperparameters fixed, the optimized network was trained and tested for each subject by selecting different portions of the remaining 60% of the dataset not used for the hyperparameter optimization. Two different sets of classes were discriminated: 5 classes (only the single-finger flexions), and 10 classes, adding the 3 grips and the 2 thumb gestures (abduction and opposition). Different parts of the dataset were separately used for classification: increasing contractions (I), plateau contractions (P), and both together (I+P), obtained by segmenting each repetition of each task into the corresponding 2-s-long phases. Also, different recording conditions were simulated by considering different selections of muscles (only intrinsic, only extrinsic or both groups), classes (only single-finger flexion or also thumb abduction, opposition and 3 different grips) and phases of contraction. The SNN training was run for 100 epochs for each combination of classes, muscles and contraction phases.

G. Inter-User Classification
In different recording sessions and for different subjects, the number of identified neurons by HD-sEMG decomposition is highly variable. This is because motor neuron identification is based on identifying the action potential waveforms of the most superficial motor units discriminable in the measured HD-sEMG signals. These waveforms depend on the conduction volume, which varies due to electrode displacement across different subjects and sessions. A further explanation is formulated in Discussions. To have the same number of motor neurons per muscle for all the subjects, we selected a subset of the most active motor neurons (ordered by the number of spikes in each muscle) in a number equal to the minimum number of motor neurons identified for each muscle across the 5 subjects, reported in Table I (10, 3, 5, 4, 7, 0, 0, 3, 4, 5, 0, 6, 10, 0).
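The selection of an equal number of the most active motor neurons per muscle can be sketched as follows. This is an illustrative example under simple assumptions: spike counts are stored in nested dictionaries, and the muscle labels, neuron identifiers, and function names are hypothetical.

```python
def min_count_per_muscle(subjects):
    """Smallest number of identified motor neurons per muscle across subjects."""
    muscles = set().union(*[s.keys() for s in subjects])
    return {m: min(len(s.get(m, {})) for s in subjects) for m in muscles}


def select_most_active(spike_counts_by_muscle, n_keep_by_muscle):
    """Keep, for each muscle, the n most active neurons (by spike count).

    spike_counts_by_muscle: {muscle: {neuron_id: number_of_spikes}}
    """
    selected = {}
    for muscle, counts in spike_counts_by_muscle.items():
        n_keep = n_keep_by_muscle.get(muscle, 0)
        ranked = sorted(counts, key=lambda nid: counts[nid], reverse=True)
        selected[muscle] = ranked[:n_keep]
    return selected


# Two hypothetical subjects with placeholder muscle labels.
subject_1 = {"muscle_a": {"n1": 40, "n2": 75, "n3": 12}, "muscle_b": {"n4": 30}}
subject_2 = {"muscle_a": {"m1": 55, "m2": 20}, "muscle_b": {}}

n_keep = min_count_per_muscle([subject_1, subject_2])
picked_1 = select_most_active(subject_1, n_keep)
picked_2 = select_most_active(subject_2, n_keep)
```

A muscle with no identified motor neurons in at least one subject gets a quota of zero, which matches the zeros in the per-muscle list reported above.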
The firing rate of each motor unit (MU) was averaged across all the tasks for each subject. The motor neurons were then sorted by overall firing rate and the first ones were selected, so as to have the same number of motor neurons per muscle across subjects. This procedure of inter-subject concatenation was implemented for 4 subjects for the training phase, and the trained network was then tested on the remaining subject. We repeated this procedure for each subject, thereby testing the network on the data of each subject after having trained it on the data of the other 4.
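The train-on-four, test-on-one scheme is a leave-one-subject-out split, which can be written compactly as a generator. The subject labels below are placeholders and the function name is hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training_subjects, held_out_subject) for every subject."""
    for held_out in subject_ids:
        training = [s for s in subject_ids if s != held_out]
        yield training, held_out


# Placeholder labels for the 5 participants.
folds = list(leave_one_subject_out(["S1", "S2", "S3", "S4", "S5"]))
```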
From the dataset different portions were considered, i.e. the conditions of 5 classes and 10 classes, and all phases I+P, I, and P, with a window width of 200 ms with no overlap. We chose a value of α and β as those corresponding to the best average across all subjects of the results obtained for each subject in the phase of hyperparameter optimization, corresponding to 0.97 for α and 0.75 for β.

H. Support vector machines (SVM) as Classification Benchmark
To compare our network with a traditional, widely used machine learning model, an SVM was evaluated on the classification of the per-subject data described above. The spike firing times of all motor neurons were segmented into 200-ms windows, and the spikes in each window were summed for each motor neuron. Each data point was thus constituted by the number of spikes in the window for each motor neuron. The same subdivision between training and test sets used for the SNN was adopted. SVM classification was cross-validated 5 times, by reshuffling the windows selected for the training and test datasets.
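The spike-count features described above can be sketched as follows, with one row per 200-ms window and one column per motor neuron. The SVM library and kernel are not named in the text, so the classifier itself is left out. The integer-millisecond spike times and the function name are assumptions made for this example.

```python
def spike_count_features(spike_times_ms_per_neuron, duration_ms, window_ms=200):
    """Count spikes per neuron in consecutive non-overlapping windows.

    spike_times_ms_per_neuron: one list of integer firing times (ms) per neuron.
    Returns features[w][n] = number of spikes of neuron n in window w.
    """
    n_windows = duration_ms // window_ms
    n_neurons = len(spike_times_ms_per_neuron)
    features = [[0] * n_neurons for _ in range(n_windows)]
    for n, times in enumerate(spike_times_ms_per_neuron):
        for t in times:
            w = t // window_ms
            if 0 <= w < n_windows:
                features[w][n] += 1
    return features


# Two hypothetical motor neurons observed for 400 ms (two windows).
toy_trains = [[10, 120, 250], [190, 210, 390, 395]]
features = spike_count_features(toy_trains, duration_ms=400)
```

Each row of `features` would be one sample passed to the classifier, labelled with the gesture performed during that window.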

I. Optimisation of the Network Structure for Minimizing Computational Resources
The network architecture and hyperparameters used in the analysis above were selected empirically after testing different numbers of layers and of neurons per layer. To test the trade-off between network size and accuracy, we implemented a single-layer convolutional network and a single-layer fully connected network. In both cases, the input size was equal to the number of motor neurons and the output size was set to 128, 64, 32, or 16. The readout layer had an output equal to the number of classes to discriminate. We trained and tested these networks for all the subjects separately (intra-user), for all the 10 classes, only for the plateau phase, and for all muscles.

J. Power Consumption
To compute the energy consumption and inference time of the convolutional SNN, we deployed the network onto the NVIDIA Jetson Nano, an embedded system with a 128-Core Maxwell GPU with 4GB 64-bit LPDDR4 memory 25.6 GB/s (https://developer.nvidia.com/embedded/jetsonnano-developer-kit). The energy consumption performance is presented as Energy-Delay Product (EDP), a metric suitable for most modern processor platforms, defined as the average energy consumption multiplied by the average inference time. The inference time is defined as the time elapsed between the end of the presented sample and the classification. The EDP was calculated using the dynamic power consumption, measured as the difference of total power consumed by the network and the static power, when the GPU is idle, which corresponds to 50 mW.

III. RESULTS
A. Human Neural Spiking Activity Processed With Spiking Neural Networks
The number of identified motor neurons analyzed in this study is reported in Table I. Their activity was tracked across all the 10 tasks (as shown in Fig. 3.a) for each of the 5 subjects. On average, 93.4 ± 13.8 motor neurons per subject were identified (467 in total) and the accuracy of this identification was quantified by a PNR [23] equal to 32.2 ± 5.0 dB across all motor neurons. As explained in [23], this value of the PNR corresponds to an average decomposition accuracy of > 90% (see Section II). The identified motor neurons for all subjects, across all their activities, presented a mean firing rate of 13.2 ± 7.9 Hz. In the table, motor neurons are grouped by the 14 targeted muscles represented in Fig. 1.b and covered by 64-channel HD-sEMG electrode grids placed as explained in detail in [50]. More than ten of the most important muscles actuating the fingers and the wrist were targeted, both to map the biomechanics of hand movements more completely and to simulate the usage of a high-density myoelectric glove for intrinsic muscles or a more traditional high-density myoelectric band around the forearm, over extrinsic muscles.

B. Optimal Hyperparameters
The results of the user-specific hyperparameter optimization, aimed at finding the combination of α and β that maximizes the classification accuracy in the test phase, are shown in Fig. 5. The results are provided for one representative subject, for the two layers of the network. Based on this grid search, we chose for each subject the best hyperparameter values, reported in Table II. On average, across all layers, subjects, and α-β combinations, the accuracy was 0.95 ± 0.07. Fig. 6 shows the median (yellow line) and interquartile range (box) of test accuracy across subjects for each layer for three muscle groupings: all muscles (dark violet), intrinsic muscles (magenta), and extrinsic muscles (cyan), and for the three contraction phases. These values are reported for the classification of both 5 and 10 classes. We obtained an overall test accuracy of 0.92 ± 0.10 for all muscles, 0.83 ± 0.23 for only intrinsic and 0.86 ± 0.19 for only extrinsic, across all subjects, the two class selections (5 and 10), and the two layers. The accuracy was 0.86 ± 0.20 for plateau (steady and isometric) contractions, 0.81 ± 0.20 for increasing (non-isometric) contractions, and 0.95 ± 0.14 for both contractions considered together.

C. Intra-User Hand Gesture Classification
The test accuracy of the second layer, across all subjects and the two class selections, was overall higher than that of the first: 0.94 ± 0.1 (versus 0.91 ± 0.11) for all muscles, 0.91 ± 0.13 (versus 0.75 ± 0.28) for intrinsic, and 0.91 ± 0.14 (versus 0.81 ± 0.22) for extrinsic, with an overall lower standard deviation in all cases. This also applies when analyzing the different phases (I, P, I+P) of the movement: the second versus the first layer presented an accuracy of 1.0 ± 0 versus 0.9 ± 0.18 for I+P, 0.86 ± 0.15 versus 0.75 ± 0.22 for I, and 0.9 ± 0.12 versus 0.82 ± 0.25 for P. Adding further layers did not improve these figures of merit. We therefore optimized the accuracy/resources trade-off using only two layers, in order to apply this structure for on-chip implementation. Sec. II-I presents the optimization of computational resources, by analyzing the test accuracy when progressively reducing the number of neurons.

D. Inter-Subject Hand Gesture Classification
When the network was trained on a multi-subject dataset (4 subjects) and then used to classify the data of a new subject, the test accuracy was on average lower than in the case of intra-subject classification. Fig. 7 shows the median and interquartile range of test accuracy across subjects for each layer for three muscle groupings, represented as in Fig. 6. We obtained an overall test accuracy of 0.54 ± 0.21 for all muscles, 0.50 ± 0.18 for only intrinsic and 0.42 ± 0.16 for only extrinsic, across all subjects, types of contractions, the two class selections (5 and 10), and the two layers. However, by considering only 5 classes, we obtained slightly higher average values of test accuracy, equal to 0.63 ± 0.20 for all muscles, 0.61 ± 0.16 for only intrinsic and 0.49 ± 0.16 for only extrinsic. Remarkably, for 3 out of 5 subjects, the average test accuracy for 5 classes was 0.78 ± 0.05, 0.73 ± 0.04, and 0.61 ± 0.05 for the three muscle groupings, respectively. On average, higher values of test accuracy were found in this analysis for intrinsic than for extrinsic muscles.

E. Classification of Motor Neurons With SVM
When SVM was applied to the same motor neuron data by training the model for each individual subject, the accuracy values for the different muscle groupings, contraction phases, and class selections were on average lower than those of the network. Fig. 8 shows the median and interquartile range of test accuracy across subjects for each layer for three muscle groupings, represented as in Fig. 6. With SVM we obtained an overall test accuracy, across all subjects, all contraction phases and classes, of 0.83 ± 0.10 for all muscles, 0.72 ± 0.14 for only intrinsic and 0.61 ± 0.19 for only extrinsic. Across all muscle groupings and numbers of classes, the accuracy was 0.70 ± 0.19 for P contractions, 0.70 ± 0.17 for I contractions, and 0.75 ± 0.13 for I+P contractions. Comparing these values with the ones obtained with the SNN, the mean values for the SNN were greater than the respective ones for SVM.

F. Power Consumption
The dynamic power of the network was 100 mW, for a total consumed energy of 0.97 mJ and an inference time of 9.7 ms, resulting in an EDP of 9.4 µJ·s.
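These figures are mutually consistent, as a short worked calculation shows: energy is dynamic power multiplied by inference time, and the EDP is energy multiplied by inference time again. The function name is hypothetical and the inputs are the values reported above.

```python
def energy_delay_product(dynamic_power_w, inference_time_s):
    """Return (energy in joules, energy-delay product in joule-seconds)."""
    energy_j = dynamic_power_w * inference_time_s
    edp_js = energy_j * inference_time_s
    return energy_j, edp_js


# Reported values: 100 mW dynamic power and 9.7 ms inference time.
energy_j, edp_js = energy_delay_product(0.100, 9.7e-3)
energy_mj = energy_j * 1e3   # about 0.97 mJ
edp_ujs = edp_js * 1e6       # about 9.4 uJ*s
```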

G. Optimisation of the Network Structure for Minimizing Computational Resources
Fig. 9 shows the mean and standard deviation of the test accuracy across subjects for varying sizes of the single-layer networks. While for the fully connected network we found a progressive decrease in accuracy with decreasing size, for the convolutional network we found the same test accuracy for a size greater than or equal to 32 neurons, on average over 0.8 and similar to the value obtained with the fully connected layer with 64 neurons.

IV. DISCUSSIONS
We propose a neuromorphic framework for processing the spiking activity of human motor neurons, toward the design of wearable neural interfaces. Motor neuron activity was recorded in-vivo, during the execution of natural hand gestures. Movement intention was inferred from the spike trains of almost one hundred motor neurons for each subject (thus almost 500 motor neuron spike trains were processed in total). This framework unleashes the full potential of interfacing biological neurons with artificial spiking neurons, solely using spike-based encoding. The use of LIF neurons and local spike-driven plasticity rules opens up the possibility of implementing such architecture on neuromorphic chips, leading to an even more efficient online and wearable implementation. We considered muscles in the forearm (extrinsic), to target the use of myoelectric armbands and bracelets for monitoring users' activity [22], [48], and in the hand (intrinsic), so far considered mainly in neurophysiology [50], [51], [54].
We here processed an unprecedentedly complex dataset, with around 500 motor neurons innervating 14 hand muscles, and including gestures like single-finger flexion (5 classes), thumb movements, and three different grip types (for a total of 10 classes), to include more natural gestures. To identify this great number of motor neurons, we used the largest HD-sEMG montage currently attempted, presented in [50] and [51] and involving 384 channels, to decompose spinal motor neuron spike trains from motor unit action potential waveforms. The dataset presented in this paper also contains non-isometric contractions, implying an increase of the contraction to perform one of the 10 gestures, followed by a plateau isometric phase for each gesture. We processed these contraction phases both separately and together, to understand during which phase of the motion the network can extract more information about the hand gesture. We studied the network performance when trained separately for each subject and when trained for all subjects, to validate its use in a user-centered approach whereby a pre-trained network can be adapted to a single user for best performance.
When using optimized hyperparameters to train the network with the data of each individual user, we obtained an accuracy of 0.95 ± 0.14 (across subjects and layers), considering all the muscles and both isometric and non-isometric contractions in the same training. This is an important result, since it enables high classification accuracy for daily-life gestures, which can be non-isometric or pseudo-isometric to a variable degree. At the second layer of the network, we found a smaller difference in classification accuracy among the three muscle groupings and a higher average classification performance (Fig. 6). This suggests that adding a second layer can improve and stabilize the recognition of the neural spiking patterns encoding different gestures. Finally, we showed how to optimize the network with different combinations of the hyperparameters representing the neuron and synapse time constants (α and β, respectively). After an initial tuning, which could be performed by the user as a calibration phase to be repeated periodically, the network can be fine-tuned with data from the user through local learning at each layer. The best hyperparameters are found empirically, and the search must be repeated for each subject; it may yield a different result each time, since the training data change and a different stable point may be found for the hyperparameters.
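The role of the two time-constant hyperparameters can be illustrated with a minimal discrete-time sketch of a single current-based LIF neuron. This is not the network used in this work: the update rule, the reset to zero, the function name `lif_neuron` and the parameters `weight` and `threshold` are assumptions made for illustration, with α acting as the per-step decay of the membrane potential and β as the per-step decay of the synaptic current.

```python
def lif_neuron(input_spikes, weight=0.6, alpha=0.9, beta=0.8, threshold=1.0):
    """Simulate one current-based LIF neuron driven by a binary spike train.

    alpha: per-step decay of the membrane potential (assumed role of the
           neuron time constant).
    beta:  per-step decay of the synaptic current (assumed role of the
           synapse time constant).
    Returns the output spike train as a list of 0/1 values.
    """
    syn = 0.0  # synaptic current
    mem = 0.0  # membrane potential
    output = []
    for s in input_spikes:
        syn = beta * syn + weight * s   # leaky integration of input spikes
        mem = alpha * mem + syn         # leaky integration of the current
        if mem >= threshold:
            output.append(1)
            mem = 0.0                   # hard reset after a spike (assumption)
        else:
            output.append(0)
    return output


# Hypothetical calibration sweep: count output spikes for a few (alpha, beta) pairs.
train = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
for alpha in (0.5, 0.9):
    for beta in (0.5, 0.9):
        n_spikes = sum(lif_neuron(train, alpha=alpha, beta=beta))
        print(f"alpha={alpha}, beta={beta}: {n_spikes} output spikes")
```

Larger values of either decay factor make the neuron integrate over a longer window, which is why the best pair may differ between subjects whose motor neurons discharge at different rates.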
We also investigated the recognition of neural spiking patterns from a dataset pooling information from many users, to create a general model of the human neural patterns associated with gestures. This raises the problem of variability in the number of identified motor neurons across subjects, as shown in Table I. In fact, the identification of the motor unit action potentials associated with a single motor unit (and thus a motor neuron) depends on the specific volume conduction of a given session (position of the electrodes with respect to the muscles) and, even more, on the anatomy of the subject (tissues interposed between electrodes and muscles, shape of the muscles, and body metrics). This could hinder the successful deployment of the system in myoelectric control if the network needed to be trained on multiple subjects. Although for many applications it is reasonable to train the network specifically for one subject, leading to high accuracy as shown, universal models of human neural patterns could be useful, for instance, to avoid training the model for each specific user, saving time during use. The solution that we proposed consists of extracting subsets with an equal number of motor neurons for each muscle, for all subjects and all sessions, by selecting the most active motor neurons from each subject. As shown in [51], selecting a representative subset of MUs still preserves the majority of the neural commands sent from the spinal cord to the muscles. Clearly, concatenating motor neurons of different subjects generates a dataset that does not correspond to a physiological grouping of motor units, since it groups together motor units of different subjects. However, this is a first attempt to reach a universal multi-subject motor neuron dataset by keeping the number of channels, i.e. motor neurons, consistent across subjects.
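The subset extraction described above can be sketched as follows. This is a minimal illustration, not the procedure used in this work: it assumes that "most active" means the highest spike count over the recording, and the function name `select_most_active`, the data layout and the muscle labels are hypothetical.

```python
def select_most_active(trains_by_muscle, n_per_muscle):
    """Keep the n most active motor neurons of each muscle for one subject.

    trains_by_muscle: dict mapping a muscle label to a list of binary spike
                      trains (one list of 0/1 values per motor neuron).
    n_per_muscle:     number of motor neurons to keep per muscle, identical
                      for all subjects so that the channel count is consistent.
    Activity is measured as the total spike count (an assumption).
    """
    selected = {}
    for muscle, trains in trains_by_muscle.items():
        if len(trains) < n_per_muscle:
            raise ValueError(
                f"{muscle}: only {len(trains)} motor neurons, "
                f"{n_per_muscle} requested"
            )
        ranked = sorted(trains, key=sum, reverse=True)  # most active first
        selected[muscle] = ranked[:n_per_muscle]
    return selected


# Hypothetical subject with two muscles and a different number of neurons each.
subject = {
    "muscle_a": [[0, 1, 0, 0], [1, 1, 1, 0], [1, 0, 1, 0]],
    "muscle_b": [[1, 1, 1, 1], [0, 0, 0, 0]],
}
subset = select_most_active(subject, n_per_muscle=1)
```

In a multi-subject setting, `n_per_muscle` could be set to the smallest number of motor neurons identified for that muscle in any subject or session, so that no subject triggers the error.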
Nevertheless, we observed lower classification accuracy in this case than when classifying patterns separately for each user. A further option in this direction could be a customized input layer that adapts to the number of identified motor neurons and has an arbitrary number of outputs, concatenated with the subsequent layers trained on many subjects. Training on a larger population of users and mapping more conditions and gestures would also be necessary for an inter-user classification model of human motor intentions from motor neuron spiking activity.
Few examples of SNNs processing biological neural information can be found in the literature. A first common approach is an artificial spike encoding of in-vivo neural biopotentials, such as EEG and EMG. This spike encoding of the recorded signal is performed by thresholding the signal values exceeding a baseline, with an asynchronous delta modulator approach [13], [16], [35]. In contrast, in this study we did not use an artificial spike encoding to extract spiking information from biopotentials, i.e. EMG, but identified the natural spike encoding received by each single motor neuron from spinal neuronal circuitries or supraspinal structures [32]. This spiking neural information is inherent in the recorded EMG signals and coupled with volume conduction information [17]. Through decomposition, we decoupled this neural spiking information from the volume conduction information (motor unit action potential waveforms). Thus, we did not feed an SNN with an artificial spike encoding of biopotentials, but with the true natural neural activity of human in-vivo neurons, i.e. spinal motor neurons. A second approach to SNN processing of biological neural information is to process in-vitro spiking information from a population of neurons plated onto a substrate-integrated multi-electrode array [6], [7], [30], with the neurons obtained from rat neocortex. Finally, a third approach is to record neuronal spikes from anesthetized animals while stimulating the nervous tissues, and to process them with neuromorphic devices [5]. However, so far nobody has attempted to process the activity of individual in-vivo human neurons receiving their spiking information from the CNS, i.e. spinal and supraspinal structures, during daily-life gesture execution [55].
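For contrast with the decomposition-based approach of this study, the artificial spike encoding used in the first family of works can be sketched as a simple delta modulator. This is a generic illustration and not the implementation of [13], [16], or [35]; the function name `delta_modulate` and the parameter `delta` are hypothetical.

```python
def delta_modulate(signal, delta):
    """Encode a sampled signal into UP and DOWN spike events.

    A reference level tracks the signal. Each time the signal rises above the
    reference by at least `delta`, an UP event is emitted and the reference is
    raised by `delta`; the symmetric rule produces DOWN events.
    Returns two lists with the sample indices of UP and DOWN events.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    up_events, down_events = [], []
    reference = signal[0]
    for i, x in enumerate(signal):
        while x - reference >= delta:
            reference += delta
            up_events.append(i)
        while reference - x >= delta:
            reference -= delta
            down_events.append(i)
    return up_events, down_events


# Toy signal in arbitrary integer units: a rise followed by a fall.
up, down = delta_modulate([0, 1, 2, 1, 0], delta=1)
```

The resulting events depend on the signal amplitude and on the chosen `delta`, whereas the spike trains obtained by decomposition correspond to the discharge times of individual motor neurons.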
We are aware of the small pool of subjects involved in this experiment. However, we focused on showing the novel application of an SNN to in-vivo motor neuron spike trains, and such a dataset was sufficient to train and test the networks. As stated above, this dataset is already unprecedentedly rich, with around 500 motor neurons innervating 14 hand muscles processed concurrently. After this first proof of concept, we aim to extend this framework in future studies by validating more complex SNN structures with more subjects, both males and females. This framework has been developed with the goal of implementation in wearable neural interfaces. A wearable adaptation of the instrumentation used in this work would require a) a more flexible and compliant material for the electrode grids, b) a miniaturized chip implementing the signal conditioning for each 64-channel grid, and c) a wireless solution, as discussed in [49]. Sensing and decoding the neural drive using a sleeve array has already been demonstrated successfully in a person with tetraplegia, targeting paralyzed muscles during attempted movements [52]. The high number of EMG electrodes required, in the order of tens to cover one muscle and hundreds to cover muscle groups, is at the moment a fundamental requirement for motor neuron identification from myoelectric activity. In fact, state-of-the-art blind source separation methods for EMG decomposition still rely on montages guaranteeing high spatial resolution and redundant information shared across the channels [12], [23], [27]. Devices such as the 8-channel armband used in [16] would not be suitable for extracting spinal motor neuron information.
This framework is also designed to be used in daily-life contexts to control external devices, for example in gaming or in the control of VR or mechatronic devices. The required online adaptation of the framework has been mentioned in Methods II and described in Fig. 4.b. An online implementation of the decomposition would provide a very convenient way to access the spinal neural drive non-invasively [3], and the hyperparameter optimization and the training of the network would be added to this calibration phase.
In the particular case of rehabilitation, we recommend our framework for the control of hybrid orthoses based on FES and mechatronic exoskeletons, which are usually triggered by residual myoelectric activity [1], [2]. In fact, hemiplegic stroke survivors can still generate residual myoelectric activity even if they are severely impaired in coordinating movements and controlling muscle exertion with proficiency [11]. With amputees in mind, in this work we specifically recorded the activity of motor neurons innervating the proximal part of the forearm, as in the case of trans-radial amputees, and compared this information with that from motor neurons innervating intrinsic muscles (inside the hand).
Finally, we aim to implement this framework on neuromorphic chips in the near future. This is a fundamental step for wearable implementation and online control. In fact, the power consumption of the overall system is currently very high, due to the inherent limits of the state-of-the-art technologies used in this framework. Besides the need for online decomposition, translating our networks to in-silico chips will significantly speed up the processing and greatly improve power efficiency. To minimize the computation time and the energy consumption of edge computing on neuromorphic chips, the first requirement is to minimize the network size and the number of operations (number of connections, neurons, layers, etc.) while maintaining reasonable levels of accuracy. For the results shown, we used 64 and 128 neurons for the two convolutional layers, respectively, which leads to tens of thousands of synaptic connections (for a convolutional kernel size of 3). We observed that, for both a convolutional and a fully connected layer with at least 32 neurons, the test accuracy can be kept above 0.8 on average, although the convolutional layer shows slightly more stable performance as the number of neurons decreases from 128 to 32. To this end, the technological aspects of mapping this architecture onto neuromorphic chips still need to be solved.
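The order of magnitude quoted above can be checked with a short calculation. The sketch below assumes a one-dimensional convolution and counts only the unique weights between the first layer (64 neurons, read here as 64 channels) and the second layer (128 channels); this reading of the layer sizes and the function name `conv1d_weight_count` are assumptions, and the weights of the first layer are not counted because they depend on the number of input motor neurons.

```python
def conv1d_weight_count(in_channels, out_channels, kernel_size):
    """Number of unique weights of a 1D convolution, biases excluded."""
    return in_channels * out_channels * kernel_size


# Second convolutional layer: 64 input channels, 128 output channels, kernel 3.
second_layer = conv1d_weight_count(64, 128, 3)
print(second_layer)  # 24576

# Effect of shrinking the second layer, as discussed for chip implementation.
for n_out in (128, 64, 32):
    print(n_out, conv1d_weight_count(64, n_out, 3))
```

Under these assumptions the second layer alone accounts for 24576 weights, consistent with the tens of thousands of synaptic connections reported, and halving the number of neurons in a layer halves this count.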
Although here we accessed motor neuron spiking information via a non-invasive neural interface using blind source separation, motor neuron activity could also be extracted using implantable devices targeting nerves or the cortex. This would open up the possibility of extending biological neuronal circuitry with artificial silicon neurons.

V. CONCLUSION
We propose a neuromorphic framework for processing the spiking activity of human motor neurons to be used in the next generation of neural interfaces. This framework could be used for a broad range of purposes and adapted to the case of implanted neural devices.
ACKNOWLEDGMENT
Dario Farina and Chiara Bartolozzi share the senior authorship.