Improved Spike-Based Brain-Machine Interface Using Bayesian Adaptive Kernel Smoother and Deep Learning

Multiunit activity (MUA) has been proposed to mitigate the robustness issue faced by single-unit activity (SUA)-based brain-machine interfaces (BMIs). Most MUA-based BMIs still employ a binning method for estimating firing rates and linear decoders for decoding behavioural parameters. The limitations of binning and linear decoders lead to suboptimal performance of MUA-based BMIs. To address this issue, we propose a method which combines a Bayesian adaptive kernel smoother (BAKS) as the firing rate estimation algorithm with deep learning, specifically a quasi-recurrent neural network (QRNN), as the decoding algorithm. We evaluated the proposed method on offline reconstruction of hand kinematics from intracortical neural data chronically recorded from the primary motor cortex of two non-human primates. Extensive empirical results across recording sessions and subjects showed that the proposed method consistently outperforms other combinations of firing rate estimation algorithm and decoding algorithm. Overall, the results suggest the effectiveness of the proposed method for improving the decoding performance of MUA-based BMIs.


I. INTRODUCTION
Brain-machine interfaces (BMIs) seek to restore lost motor function in severely paralysed patients by translating brain activity into control signals for guiding assistive devices, such as a computer cursor [1]-[4], robotic arm [5]-[7], or functional electrical stimulation (FES) system [8]-[10]. Numerous BMI studies utilising neuronal action potentials or spikes, also known as single-unit activity (SUA), have shown compelling results in animals [11]-[14] and humans [1], [2], [6], [7]. Nevertheless, spike recordings are not chronically stable, and the number of observable units progressively declines over time [15]-[17]. Several factors thought to cause this instability are glial scar formation (induced by neural tissue responses) encapsulating the electrodes, micromotion of the electrodes, insulation degradation, and mechanical breakage [18], [19]. The instability of SUA hinders the clinical translation of SUA-based BMIs.
(The associate editor coordinating the review of this manuscript and approving it for publication was Mohammad Zia Ur Rahman.)
To overcome the above problem, multiunit activity (MUA) has been proposed as an alternative input signal to single-unit activity (SUA) [17], [20]-[22]. MUA refers to all spikes detected via a threshold-crossing technique without further classifying (sorting) them into individual units. Thus, MUA represents the aggregate spikes from an ensemble of neurons in the vicinity of the recording electrode tip. Compared to SUA, MUA is more stable and requires simpler signal processing. Most MUA-based BMIs employ a binning method for estimating firing rates and linear decoders for decoding movement parameters [23]-[29]. The binning method estimates firing rates by counting the number of spikes within a predefined bin/window width. Despite being simple and fast, binning results in a coarse/noisy estimate of firing rates. As movements are usually smooth/continuous over time [26], a method that can yield a smooth estimate of firing rates could potentially improve the decoding performance of MUA-based BMIs. Additionally, the use of linear decoders with their inherent assumptions leads to suboptimal decoding performance, since neural signals often exhibit non-linear, non-stationary, and non-Gaussian characteristics [30]. Therefore, it is highly desirable to develop a decoding algorithm that is robust against these signal characteristics. The rise of deep learning in recent years has presented an opportunity to improve the decoding performance of MUA-based BMIs [31]-[33]. So far, however, there have been very few studies utilising deep learning for MUA-based BMIs.
(VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/)
The present work aims to improve the decoding performance of MUA-based BMIs by proposing a method which comprises (1) a Bayesian adaptive kernel smoother (BAKS) as the firing rate estimation algorithm and (2) deep learning, specifically a quasi-recurrent neural network (QRNN), as the decoding algorithm. We evaluate the proposed method on offline decoding of hand kinematics from neural signals chronically recorded from the primary motor cortex (M1) of two non-human primates performing self-paced reaching tasks. We benchmark the proposed method against all possible combinations of two firing rate estimation algorithms (binning and fixed kernel smoother) and four decoding algorithms (Kalman filter, Wiener filter, multilayer perceptron, and long short-term memory) as reported in previous studies (e.g. [5], [22], [23], [31], [33], [34]). Lastly, we compare the decoding performance between MUA and SUA using the proposed method.
The remainder of this paper is organised as follows: Section II describes the methods, including experimental setup, signal processing, and decoding algorithms; Section III presents the empirical results, followed by analyses and discussion in Section IV; and finally, the conclusion is drawn in Section V.

II. METHODS
Fig. 1 illustrates the schematic overview of the spike-based BMI system. Firing rate estimation was performed in an overlapping fashion (with an overlap of half the window width). The estimated firing rates were then standardised (i.e. z-transformed) before being fed to the decoding algorithm. Linear decoding algorithms were implemented using the Scikit-learn (v0.24.1) library [35], whereas the deep learning based decoding algorithms were implemented using the TensorFlow (v2.3.0) framework [36]. To tune the deep learning decoders, we used a hyperparameter optimisation framework called Optuna (v2.10.0) [37]. All experiments were conducted in the Python programming language (v3.8.8) running on a Windows-based machine with an Intel(R) Core(TM) i7-4790 CPU @ 3.6 GHz. The source code used for conducting these experiments is available at https://github.com/nurahmadi/spike_bmi.
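As a concrete illustration of the standardisation step mentioned above, the sketch below z-transforms firing rates using training-set statistics only, so no test-set information leaks into the decoder. The function name and data layout (windows × channels) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def zscore_train_test(train_rates, test_rates):
    """Standardise firing rates with statistics from the training set only,
    so that no test-set information leaks into the decoder."""
    mu = train_rates.mean(axis=0)
    sd = train_rates.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)   # guard against silent channels
    return (train_rates - mu) / sd, (test_rates - mu) / sd

rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 3))   # 100 windows, 3 channels
test = rng.normal(5.0, 2.0, size=(40, 3))
z_train, z_test = zscore_train_test(train, test)
```

Note that the test set is scaled with the training-set mean and standard deviation, mirroring how an online decoder would have to normalise unseen data.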

A. NEURAL RECORDINGS
Neural data were obtained from a publicly available dataset deposited by the Sabes lab [38]. The neural data were recorded from the primary motor cortex (M1) area of two adult male Rhesus macaque monkeys (Macaca mulatta), referred to as monkey I (Indy) and monkey L (Loco). The recordings were made using a 96-channel Utah microelectrode array (Blackrock Microsystems, US) with platinum-coated contacts (400 kΩ impedance, 400 µm interelectrode spacing, 1 mm electrode length). The recordings were referenced to a silver wire placed under the dura (several cm away from the electrodes). They were preamplified and filtered by a 4th-order low-pass filter at 7.5 kHz and were then digitised with 16-bit resolution at a 24.4 kHz sampling rate. These digitised recordings are referred to as raw neural signals. Details of the experimental setup are described elsewhere [39]. For the Monkey I data, we used a total of 34 recording sessions spanning around 10 months between the first (I20160407_02) and last (I20170131_02) sessions, with durations varying from 6.00 to 56.05 minutes (average of 13.47 ± 10.61 minutes). The first and last sessions correspond to 29 and 328 days after the electrode implantation date, respectively. For the Monkey L data, a total of 10 recording sessions were used, spanning 20 days between the first (L20170210_03) and last (L20170302_02) sessions and ranging from 18.67 to 53.32 minutes (average of 33.78 ± 10.22 minutes). The first and last sessions correspond to 338 and 358 days after the electrode implantation date, respectively. More detailed information about the recording statistics can be seen in Supplementary Tables 1 and 2 for Monkeys I and L, respectively.

B. BEHAVIORAL TASKS AND KINEMATIC DATA
The monkeys were trained to reach randomly drawn circular targets which were uniformly distributed across an 8 × 8 or 8 × 17 grid for monkey I and a 6 × 6 grid for monkey L. Monkey I reached the targets with the left arm, whereas monkey L reached the targets with the right arm. A target was acquired when the monkey reached it with the fingertip and held it for 450 ms. Upon every target acquisition, a new random target was presented immediately without an inter-trial interval. The fingertip position of the reaching hand and the target position (in x-y Cartesian coordinates, in mm) were both sampled at 250 Hz. The position data were then low-pass filtered with a non-causal, 4th-order Butterworth filter at 10 Hz to reject sensor noise. A more detailed description of the behavioural task is given in [39]. Velocity and acceleration data were computed as the first and second derivatives of the position data. The position, velocity, and acceleration data, herein referred to as kinematic data, were downsampled to match the timescales of the firing rate (feature) data. These pairs of firing rate (input) and kinematic (output) data were used to train or fit the decoding algorithms (i.e. decoders). Except for the Kalman filter, which used all the kinematic state variables (position, velocity, and acceleration), all other decoders used only the velocity data. Velocity decoding was selected following several high-performing BMIs in the literature [33], [39], [40].
C. SPIKE DETECTION AND SORTING
The raw neural signals were band-pass filtered with a causal, 4th-order Butterworth IIR filter between 500 and 5000 Hz. Spikes were then detected whenever the absolute value of the band-passed signal crossed a threshold (typically set to between 3.5 and 4.0 times the standard deviation). Here, MUA refers to all the detected spikes aggregated per channel. This includes unsorted spikes (also called 'noise' or 'hash' units), which represent threshold-crossing spikes that did not match any single-unit template. Only MUA with spike rates exceeding 0.5 Hz were included. SUA was obtained by sorting (i.e. classifying) the detected spikes into distinct putative single units via principal component analysis and template matching. Each channel could contain more than one unit (up to five units, including the 'hash' units). More detailed information on the spike detection and sorting processes can be found in [39]. We only included SUA with spike rates above 0.5 Hz.

D. FIRING RATE ESTIMATION
1) BINNING
Binning estimates firing rate by counting the number of spikes within a predefined window width. The window width was set to a value that maximises the decoding performance.
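The binning procedure can be sketched as follows; this is a minimal illustration (function name and toy spike train are assumptions), counting spikes in half-overlapping windows as described in the text.

```python
import numpy as np

def binned_rate(spike_times, t_start, t_end, width):
    """Estimate firing rate by counting spikes in fixed windows.

    Windows overlap by half the window width, as in the paper.
    Returns window centres and rates in spikes/s (Hz).
    """
    step = width / 2.0                        # half-window overlap
    starts = np.arange(t_start, t_end - width + 1e-9, step)
    centres = starts + width / 2.0
    spike_times = np.asarray(spike_times)
    counts = np.array([np.sum((spike_times >= s) & (spike_times < s + width))
                       for s in starts])
    return centres, counts / width            # spikes per second

# toy spike train: one spike every 50 ms, i.e. a steady 20 Hz rate
spikes = np.arange(0.0, 10.0, 0.05)
t, rate = binned_rate(spikes, 0.0, 10.0, width=0.240)
```

Because the window boundary can cut through the regular spike pattern, the estimate jitters between 4 and 5 spikes per window, illustrating the coarse/noisy character of binning noted in the introduction.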

2) FIXED KERNEL SMOOTHER (FKS)
FKS estimates firing rate by convolving a spike train with a Gaussian kernel function with a predefined window width and a fixed bandwidth (i.e. standard deviation) parameter. The bandwidth parameter was set to one-quarter of the window width, which covers 95% of the observations. The window width was tuned to maximise the decoding performance.
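A minimal sketch of FKS, assuming the bandwidth convention above (sigma = window width / 4); the function name and the steady toy spike train are illustrative.

```python
import numpy as np

def fks_rate(spike_times, eval_times, width):
    """Fixed kernel smoother: sum Gaussian kernels centred on each spike,
    with bandwidth fixed at one quarter of the window width."""
    sigma = width / 4.0
    d = np.subtract.outer(np.asarray(eval_times), np.asarray(spike_times))
    k = np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return k.sum(axis=1)   # spikes/s at each evaluation time

spikes = np.arange(0.0, 10.0, 0.05)     # steady 20 Hz spike train
t = np.arange(0.5, 9.5, 0.12)           # evaluate away from the edges
rate = fks_rate(spikes, t, width=0.240)
```

For this regular 20 Hz train the smoothed estimate sits very close to 20 spikes/s, in contrast with the jittery binned estimate.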

3) BAYESIAN ADAPTIVE KERNEL SMOOTHER (BAKS)
BAKS estimates firing rate by convolving a spike train with a Gaussian kernel function with a predefined window width and an adaptive bandwidth (instead of the fixed bandwidth used in FKS). BAKS has two parameters: a shape parameter (α) and a scale parameter (β). As in the binning and FKS methods, the window width for BAKS was tuned to maximise the decoding performance. The shape parameter (α) was set to 4, while the scale parameter (β) was set to n^(4/5), where n represents the number of spikes. The detailed derivation of the BAKS formula and its parameter tuning are described in our previous study [41]. Briefly, the value of α was tuned by minimising the mean integrated squared error (MISE) of firing rate estimation on synthetic spike train data. These synthetic data were generated using biologically plausible models, inhomogeneous Gamma (IG) and inhomogeneous inverse Gaussian (IIG), with known underlying rate functions. The rate functions were selected such that they could represent non-stationary processes usually encountered in empirical data, including continuous processes with homogeneous or heterogeneous frequency and discontinuous processes with sudden rate changes.
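The sketch below illustrates an adaptive-bandwidth kernel smoother following the BAKS formulation in [41]: a time-varying bandwidth h(t) is computed from a ratio of power sums over the spike times (scaled by a Gamma-function ratio), then used for Gaussian kernel smoothing. This is a naive, unoptimised illustration under the paper's stated settings (α = 4, β = n^(4/5)); the exact published implementation may differ in detail.

```python
import numpy as np
from math import lgamma, exp

def baks_rate(spike_times, eval_times, alpha=4.0):
    """Adaptive-bandwidth kernel smoother sketch following BAKS [41]."""
    t_i = np.asarray(spike_times, dtype=float)
    t = np.asarray(eval_times, dtype=float)[:, None]
    n = len(t_i)
    beta = n ** (4.0 / 5.0)                   # scale parameter, as in the text
    d2 = (t - t_i) ** 2 / 2.0 + 1.0 / beta
    # adaptive bandwidth h(t): ratio of two power sums times a Gamma ratio
    num = (d2 ** (-alpha)).sum(axis=1)
    den = (d2 ** (-alpha - 0.5)).sum(axis=1)
    h = exp(lgamma(alpha) - lgamma(alpha + 0.5)) * num / den
    # Gaussian kernel evaluation with the time-varying bandwidth h(t)
    k = np.exp(-(t - t_i) ** 2 / (2 * h[:, None] ** 2)) / (h[:, None] * np.sqrt(2 * np.pi))
    return k.sum(axis=1)

spikes = np.arange(0.0, 10.0, 0.05)           # steady 20 Hz spike train
t_eval = np.arange(1.0, 9.0, 0.2)
rate = baks_rate(spikes, t_eval, alpha=4.0)
```

On a steady train the estimate stays near the true 20 Hz; the benefit of the adaptive bandwidth appears when the spike dynamics change rapidly, where a fixed bandwidth would under- or over-smooth.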

E. DECODING ALGORITHM
1) KALMAN FILTER (KF)
KFs have been employed in numerous BMI studies to predict hand kinematics or kinetics from neural signals [5], [42]-[45]. The KF combines a process model and a measurement model under the assumption that both are linear and Gaussian. The process model defines the evolution of the state (assumed to be Markovian) from the previous timestep to the current timestep. The filter is composed of two steps, prediction and update, which are performed recursively. In the prediction step, the a priori state estimate and a priori error covariance at the current timestep are predicted from the previous state estimate and error covariance. In the update step, the a posteriori state estimate and a posteriori error covariance at the current timestep are obtained by combining the predicted state estimate and error covariance with the information from the current measurement, weighted by the Kalman gain. The Kalman gain determines how much weight is given to the measurement when refining the current state estimate. Following Wu et al.'s work [42], the KF parameters were assumed to be time-invariant and were estimated using least-squares regression on the training data. The state variables used in the KF were hand position, velocity, and acceleration in the x and y directions.
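The prediction/update recursion can be sketched as below. This is an illustrative implementation, not the exact one used in the paper; the matrix names and the 1-D toy problem are assumptions.

```python
import numpy as np

def kf_decode(A, W, H, Q, z_seq, x0, P0):
    """Kalman filter decoding loop (sketch).

    A, W : state transition matrix and process noise covariance
    H, Q : observation matrix (firing rates = H x + noise) and its covariance
    z_seq: sequence of observations, one row per timestep
    """
    x = np.array(x0, dtype=float)
    P = np.array(P0, dtype=float)
    estimates = []
    for z in z_seq:
        # prediction step: a priori state estimate and error covariance
        x = A @ x
        P = A @ P @ A.T + W
        # update step: fold in the measurement, weighted by the Kalman gain K
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Q)
        x = x + K @ (z - H @ x)
        P = (np.eye(len(x)) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# 1-D toy problem: constant observations of 1.0 pull the state from 0 to 1
A = np.eye(1)
W = np.eye(1) * 1e-4
H = np.eye(1)
Q = np.eye(1) * 0.01
est = kf_decode(A, W, H, Q, z_seq=np.ones((20, 1)), x0=np.zeros(1), P0=np.eye(1))
```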

2) WIENER FILTER (WF)
WF linearly estimates kinematic data from the neural signals under the assumption of known stationary signal and additive noise. The WF produces an optimal estimate in the minimum mean squared error sense and has been employed in a number of prior BMI studies [1], [2], [13], [46], [47]. An estimate of hand kinematics, ŷ(t), is expressed as:

ŷ(t) = Σ_{i=1}^{C} Σ_{τ=0}^{L−1} w_i(τ) x_i(t − τ)

where w_i(τ) denotes a filter kernel and x_i represents the measurements (i.e. firing rates). C is the number of channels, whereas L is the number of taps. The filter weights w_i(τ) were estimated using linear least squares on the training data.
The parameter L was tuned over the range 1 to 10 to maximise the decoding performance. The state variables used in the WF were hand velocity in the x and y directions.
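A minimal sketch of the least-squares fit: stack the current and past firing-rate vectors into a lagged design matrix (plus a bias term, an assumption here) and solve for the tap weights. Function names and the toy data are illustrative.

```python
import numpy as np

def lagged_design(X, L):
    """Stack the current and L-1 past firing-rate vectors plus a bias term."""
    T = X.shape[0]
    rows = [np.concatenate([X[t - tau] for tau in range(L)] + [[1.0]])
            for t in range(L - 1, T)]
    return np.array(rows)

def fit_wiener(X, y, L):
    """Estimate the filter weights w_i(tau) by linear least squares."""
    Phi = lagged_design(X, L)
    W, *_ = np.linalg.lstsq(Phi, y[L - 1:], rcond=None)
    return W

def predict_wiener(X, W, L):
    return lagged_design(X, L) @ W

# toy check: y(t) = 2 x(t) - x(t-1) is recovered exactly with L = 2 taps
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1))
y = np.zeros((200, 1))
y[1:] = 2 * X[1:] - X[:-1]
W = fit_wiener(X, y, L=2)
pred = predict_wiener(X, W, L=2)
```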

3) MULTILAYER PERCEPTRON (MLP)
MLP is a type of feedforward artificial neural network (ANN) comprising at least three layers of nodes or neurons (an input layer, one or more hidden layers, and an output layer). Each node produces an output by computing a weighted sum of its inputs and passing it through an activation function:

y = φ(Wx + b)

where W denotes the learnable parameters (weights), x represents the input vector, b is the bias vector, and φ is the activation function. We used a non-linear activation function called the rectified linear unit (ReLU; φ(z) = max(0, z)) for all layers except the output layer, which uses a linear activation function (φ(z) = z). The number of layers, along with other hyperparameter values, was determined through the hyperparameter optimisation procedure described in the following section.
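The layer computation above can be sketched in a few lines of NumPy; the layer sizes are illustrative, not the optimised architecture from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mlp_forward(x, weights, biases):
    """MLP forward pass: each layer computes phi(W x + b), with ReLU on
    hidden layers and a linear activation on the output layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)
    return weights[-1] @ a + biases[-1]   # linear output layer

# tiny example: 3 inputs -> 4 hidden units -> 2 outputs (velocity x, y)
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
bs = [np.zeros(4), np.zeros(2)]
v = mlp_forward(np.ones(3), Ws, bs)
```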

4) LONG SHORT-TERM MEMORY (LSTM)
LSTM, proposed by Hochreiter and Schmidhuber in 1997 [48], is one of the most popular deep learning methods and has achieved state-of-the-art performance in various tasks, particularly those involving time-series data [49], including BMI applications [31], [33], [50]. LSTM can effectively learn long-term temporal dependencies via a memory cell that maintains its state over time and a gating mechanism that controls the flow of information into and out of the memory cell. The states of the LSTM components at timestep t are expressed as follows:

f_t = σ(W_f x_t + U_f h_{t−1} + b_f)
i_t = σ(W_i x_t + U_i h_{t−1} + b_i)
o_t = σ(W_o x_t + U_o h_{t−1} + b_o)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ tanh(W_c x_t + U_c h_{t−1} + b_c)
h_t = o_t ⊙ tanh(c_t)

where x, h, f, i, o, and c represent the input, output, forget gate, input gate, output gate, and memory cell, respectively. The operators ⊙, σ, and tanh denote element-wise multiplication, the logistic sigmoid function, and the hyperbolic tangent function, respectively. The matrices W and U represent the input and recurrent weights, respectively, and the vectors b are the biases. We empirically selected one LSTM layer, with the last timestep of the LSTM output connected to a fully connected layer to obtain the final output. The other hyperparameter values were determined through hyperparameter optimisation.
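One timestep of the standard LSTM gating can be sketched in NumPy as below; this is a didactic illustration, not the TensorFlow implementation used in the paper. The zero-weight example simply verifies that every gate evaluates to sigmoid(0) = 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM timestep. W, U, b are dicts keyed by 'f', 'i', 'o', 'c'."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])   # forget gate
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])   # input gate
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])   # output gate
    c = f * c_prev + i * np.tanh(W['c'] @ x + U['c'] @ h_prev + b['c'])
    h = o * np.tanh(c)                                   # output (hidden state)
    return h, c

# zero-weight sanity check: every gate is sigmoid(0) = 0.5
n = 2
Z = np.zeros((n, n))
W = {k: Z for k in 'fioc'}
U = {k: Z for k in 'fioc'}
b = {k: np.zeros(n) for k in 'fioc'}
h, c = lstm_step(np.zeros(n), np.zeros(n), np.ones(n), W, U, b)
```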

5) QUASI-RECURRENT NEURAL NETWORK (QRNN)
QRNN, developed by Bradbury et al. in 2016 [51], is another type of RNN that can learn long-term temporal dependencies in sequential data while also offering increased parallelism, as in convolutional neural networks (CNNs). Recent studies showed that the QRNN performed well for hand kinematic decoding [52] and gait decoding [53]. The QRNN consists of two main components: (1) a convolutional component, which performs convolutions in parallel across timesteps, and (2) a pooling component, which handles temporal dependencies in parallel across feature dimensions. The components of the QRNN are formulated as follows:

Z = tanh(W_z ∗ X)
F = σ(W_f ∗ X)
O = σ(W_o ∗ X)
c_t = f_t ⊙ c_{t−1} + (1 − f_t) ⊙ z_t
h_t = o_t ⊙ c_t

where z, f, o, c, and h represent the candidate vectors, forget gate, output gate, memory cell, and hidden state, respectively. The operator ∗ represents a masked convolution, a type of convolution that depends only on past and present inputs; that is, the convolution cannot use input data from the future. We empirically selected one layer and a window size of 2, with the last timestep of the QRNN output connected to a fully connected layer to obtain the final output. As in the MLP and LSTM, the other hyperparameter values of the QRNN were automatically selected through hyperparameter optimisation.
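The pooling component (fo-pooling, as in Bradbury et al. [51]) can be sketched as below: only the time dimension is sequential, while every feature dimension is updated independently (element-wise), which is what permits the parallelism described above. The gate inputs here are assumed to be precomputed outputs of the masked convolutions.

```python
import numpy as np

def qrnn_fo_pool(Z, F, O):
    """QRNN fo-pooling sketch.

    Z, F, O: (T, D) candidate vectors, forget gates, and output gates,
    i.e. the (already computed) outputs of the masked convolutions.
    """
    T, D = Z.shape
    c = np.zeros(D)
    H = np.empty((T, D))
    for t in range(T):                         # sequential only along time
        c = F[t] * c + (1.0 - F[t]) * Z[t]     # memory cell update
        H[t] = O[t] * c                        # hidden state
    return H

# with forget gates at 0 the cell simply copies the candidates through
H = qrnn_fo_pool(Z=np.ones((3, 2)), F=np.zeros((3, 2)), O=np.ones((3, 2)))
```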

F. DEEP LEARNING OPTIMISATION AND TRAINING
Hyperparameters of the deep learning (DL) based decoders were optimised using validation sets split from the training sets; the validation set size was 10% of the training set size. The list of hyperparameters to be optimised, along with their search ranges, is shown in Table 1. Hyperparameter optimisation was conducted using a Bayesian optimisation framework called Optuna [37], independently for each combination of input signal (SUA or MUA), firing rate estimation algorithm (binning, FKS, or BAKS), and decoding algorithm (MLP, LSTM, or QRNN). It was run for 200 iterations with a pruning (also known as automated early-stopping) mechanism that automatically stopped unpromising trials at the early stages of training, which sped up the optimisation process. To save computational time, the hyperparameter optimisation was performed only once, using the first recording session of Monkey I (I20160407_02); for the subsequent sessions of Monkey I and for Monkey L, we used the same hyperparameter configuration. The resulting optimised hyperparameter values for the MUA-driven DL decoders are listed in Table 2. As for the SUA-driven DL decoders, the optimised hyperparameters can be seen in Supplementary Table 3.
The DL decoders were trained on each recording session with the optimised hyperparameter configuration and a mean squared error (MSE) loss function. We trained the DL decoders using all MUA or SUA whose spike rates exceeded the minimum threshold of 0.5 Hz (see Supplementary Tables 1-2 for details). The training data consisted of firing rates as the input (features) and velocities in the x- and y-directions as the output (ground truth). The firing rate was obtained from a rolling segment (window) of 240 ms with an overlap of 120 ms. All the DL decoders followed the same training procedure.

G. PERFORMANCE EVALUATION AND METRICS
To evaluate the performance of the proposed method, we used a k-fold growing-window forward validation scheme [54] with k=5, as illustrated in Fig. 2. Compared to k-fold cross-validation, this scheme is more appropriate for evaluating BMI decoding because it takes into account the sequential nature of the neural time-series data and gives a more accurate view of the decoding performance that would have been achieved with the data available at that time.
To avoid losing too many training samples relative to k-fold cross-validation, the minimum size of the training data was set to 50% of the whole data within each session.
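The splitting logic can be sketched as below, assuming contiguous, equally sized test blocks; the function name and exact block arithmetic are illustrative, not the paper's implementation.

```python
def forward_validation_splits(n_samples, k=5, min_train_frac=0.5):
    """k-fold growing-window forward validation (sketch).

    The first fold trains on the first 50% of samples; each subsequent
    fold grows the training window by one test block, so test data always
    lie strictly after the training data in time.
    """
    min_train = int(n_samples * min_train_frac)
    test_size = (n_samples - min_train) // k
    splits = []
    for fold in range(k):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        splits.append((range(0, train_end), range(train_end, test_end)))
    return splits

splits = forward_validation_splits(100, k=5)
```

Unlike shuffled cross-validation, no test index ever precedes a training index, which is what makes this scheme appropriate for time-series decoding.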
Decoding performance was evaluated using two commonly used metrics [23], [55], [56]: (1) root mean square error (RMSE) and (2) Pearson's correlation coefficient (CC), which are formulated as follows:

RMSE = √( (1/N) Σ_{t=1}^{N} (y_t − ŷ_t)² )
CC = Σ_{t=1}^{N} (y_t − μ_y)(ŷ_t − μ_ŷ) / √( Σ_{t=1}^{N} (y_t − μ_y)² · Σ_{t=1}^{N} (ŷ_t − μ_ŷ)² )

where y_t and ŷ_t denote the true and decoded velocities in the x- and y-directions at timestep t, μ_y and μ_ŷ denote their respective means, and N represents the total number of samples. We employed velocity decoding as in several previous studies [33], [39], [40].

H. STATISTICAL ANALYSIS
For each recording session, the mean and standard error of the mean (SEM) of the decoding performance were evaluated on k=5 different folds within the testing sets. Unless otherwise noted, decoding performance reported with the ± symbol denotes mean ± SEM. To test statistical significance between a pair of decoders, a two-tailed paired t-test was used if the differences between the pairs followed a normal distribution; otherwise, a two-tailed paired Wilcoxon signed-rank test was used. The significance level (α) was set to 0.05. Boxplots were used to visualise the decoding performance comparison across 34 recording sessions for monkey I and 10 recording sessions for monkey L. The horizontal line and circle inside each boxplot represent the median and mean, respectively. The coloured solid box represents the interquartile range (IQR) from the 25th to the 75th percentile. The whiskers extend 1.5 times the IQR.
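As a concrete reference, the two evaluation metrics (RMSE and Pearson's CC) can be sketched as below; the function names are illustrative.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and decoded velocities."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pearson_cc(y_true, y_pred):
    """Pearson's correlation coefficient between true and decoded velocities."""
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    return float((yt * yp).sum() / np.sqrt((yt ** 2).sum() * (yp ** 2).sum()))

y = np.array([0.0, 1.0, 2.0, 3.0])
```

Note that CC is invariant to linear rescaling of the prediction while RMSE is not, which is why the two metrics can rank decoders differently.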

III. RESULTS
A. SELECTION OF WINDOW WIDTH
We assessed the impact of the firing rate estimation algorithms on decoding performance under different window widths. For each firing rate estimation algorithm, we varied the window width from 16 ms to 400 ms in increments of 16 ms. The firing rate estimation was conducted in an overlapping fashion, with the overlap set to half the window width. We used the linear decoders (WF and KF) for this comparison because they have significantly fewer hyperparameters than the DL decoders, so the decoding performance is not confounded by the choice of hyperparameters. Figs. 3(a)-(b) illustrate the impact of varying window widths on the average decoding performance of the WF decoder, measured in RMSE and CC, respectively. For the KF decoder, the performance comparison can be seen in Supplementary Fig. 1. Empirical results from both the WF and KF decoders yielded a similar finding: increasing the window width up to a certain value improved the decoding performance; above this value, however, the performance plateaued or tended to decrease. Based on these results, we used a window width of 240 ms for the subsequent experiments and the final performance evaluation on the testing sets, since larger window widths would only increase the computational complexity and execution time while offering negligible performance improvement.

B. SELECTION OF NUMBER OF TAPS
We investigated the optimal number of taps for the WF decoder by varying the number of taps (L) from 1 to 10 in increments of 1. As shown in Fig. 4, increasing L improves the decoding performance up to its best value at L=4; beyond this point, the performance decreases. This finding was consistently observed across firing rate estimation algorithms and with both the RMSE and CC metrics. We therefore used L=4 for the final performance evaluation of the WF decoder on the testing sets.

C. DECODING PERFORMANCE UNDER VARYING SHAPE PARAMETER VALUES
To study the sensitivity of BAKS to the value of its main controlling parameter, we varied α from 1 to 10 in increments of 0.5 and compared the associated decoding performance. The comparison of BAKS decoding performance using the WF decoder across different α values is shown in Fig. 5. The best decoding performance was achieved at α=6.5 and α=4.5 when measured in RMSE and CC, respectively. However, compared to α=4.0, the difference in decoding performance was very small (0.0111% for RMSE, 0.0002% for CC). As illustrated in Fig. 5, the BAKS curve (green line) appears flat, indicating negligible differences in decoding performance across α values; zooming in on the BAKS plot (see the insets of Fig. 5) reveals the performance differences. We observed an average performance difference of 0.01% in both the RMSE and CC metrics. Despite these differences, the decoding performance of BAKS across all the varied α values was consistently better than that of binning and FKS. Since binning and FKS do not use the α parameter, their decoding performance is constant; for ease of comparison with BAKS, they are represented by flat lines (in red and blue, respectively) across α values. We observed the same results when using the KF decoder, as shown in Supplementary Fig. 2.

D. PERFORMANCE COMPARISON ACROSS FIRING RATE ESTIMATION ALGORITHMS
Next, we evaluated the decoding performance of each firing rate estimation algorithm using the MUA-driven WF decoder on the testing sets of all recording sessions of monkeys I and L. Figs. 6(a)-(b) present the long-term decoding performance comparison over the 34 recording sessions of monkey I across firing rate estimation methods, measured in RMSE and CC, respectively. We found that BAKS outperformed binning and FKS in both metrics across all (100%) recording sessions. The average decoding performance of each firing rate estimation algorithm was as follows (sorted from best to worst): BAKS (RMSE = 47.664 ± 1.063, CC = 0.758 ± 0.007), FKS (RMSE = 49.110 ± 1.073, CC = 0.739 ± 0.008), and binning (RMSE = 49.165 ± 1.098, CC = 0.739 ± 0.007). The RMSE and CC values are given as mean ± standard error of the mean (SEM). Compared to binning, BAKS yielded an average performance improvement of 3.05% (RMSE) and 2.47% (CC). On the other hand, FKS exhibited an average performance improvement of only 0.07% in RMSE and a performance degradation of 0.06% in CC. The boxplot comparison among binning, FKS, and BAKS is shown in Figs. 6(c)-(d). Statistical tests showed that the performance of BAKS differed significantly (*** p<0.001) from that of binning in both the RMSE and CC metrics. However, there was no statistically significant difference in decoding performance between FKS and binning (see Figs. 6(c)-(d)).
To determine whether the above findings held when using different decoding algorithms, we performed the decoding comparison across firing rate estimation algorithms using the KF, LSTM, and QRNN decoders. Consistent with the previous findings, BAKS was found to outperform the other firing rate estimation algorithms across decoders in both monkeys I and L, as shown in Supplementary Figs. 3-5 for the KF, LSTM, and QRNN decoders, respectively. We found that the performance improvement of BAKS relative to binning was higher when using the linear decoders (KF and WF) than when using the DL decoders (LSTM and QRNN). The performance improvements of BAKS when using the linear decoders were 1.64%-3.26% in RMSE and 2.41%-3.91% in CC; for the DL decoders, the improvements were 0.42%-1.64% (RMSE) and 0.11%-1.68% (CC). There were statistically significant differences (*** p<0.001) in decoding performance between BAKS and binning when using the KF, WF, and LSTM decoders (see Supplementary Figs. 3-5). Overall, results across sessions, subjects, and decoders showed the superior performance of BAKS compared to binning and FKS.

E. PERFORMANCE COMPARISON ACROSS DECODING ALGORITHMS
Using BAKS as the firing rate estimation algorithm, we then evaluated and compared the decoding performance of the proposed decoder (QRNN) against the other decoders (KF, WF, MLP, and LSTM). Results from the monkey I dataset are shown in Fig. 7. Figs. 7(a)-(b) present the decoding performance comparison over the 34 recording sessions in terms of RMSE and CC, respectively. We found that QRNN consistently outperformed all the other decoders. Snippets of the true and decoded velocities (in the x- and y-directions), taken from the last monkey I recording session, are shown for the cases of the WF and QRNN decoders, respectively. A detailed comparison of decoding performance across firing rate estimation algorithms, decoding algorithms, subjects, and performance metrics is given in Table 3.

F. DECODING PERFORMANCE COMPARISON BETWEEN SUA AND MUA
Further, we sought to determine whether MUA offers better decoding performance than SUA. We therefore compared the decoding performance of MUA against SUA using BAKS coupled with different decoders. Fig. 9 shows the decoding performance comparison using the WF decoder on the monkey I and L datasets. MUA was found to be superior to SUA as measured with the RMSE and CC metrics across recording sessions from both monkeys (Figs. 9(a)-(b) and 9(e)-(f)), and the differences in decoding performance were statistically significant. This corresponded to an average improvement of RMSE = 8.08% and CC = 6.7% for the monkey I dataset and an average improvement of RMSE = 7.88% and CC = 18.46% for the monkey L dataset. Results using the QRNN decoder showed the same trend: MUA significantly and consistently outperformed SUA across recording sessions, subjects, and performance metrics, as illustrated in Fig. 10. In this case, MUA yielded an average improvement of RMSE = 8.84% and CC = 4.11% for the monkey I dataset (RMSE = 8.73% and CC = 13.12% for monkey L). Results using the other decoders, KF and LSTM, can be seen in Supplementary Figs. 8 and 9, respectively. A more detailed numerical comparison of decoding performance between SUA and MUA across decoders is given in Table 4.

G. COMPARISON OF COMPUTATIONAL COMPLEXITY
Lastly, we compared the computational complexity of BAKS against the other firing rate estimation algorithms. The computation of BAKS is composed of two steps: adaptive bandwidth estimation and kernel evaluation. To estimate the firing rate at one evaluation point (e.g. at the middle of the observation interval), each step of BAKS requires O(n) operations, where n denotes the number of spikes within the observation interval (i.e. window width); thus, the computational complexity of BAKS is O(2n). In the case of FKS, there is no bandwidth estimation step because the bandwidth is predefined and fixed throughout the experiment. Using this fixed bandwidth, FKS performs the kernel evaluation, which has a computational complexity of O(n). In the case of binning, the computation simply counts the number of spikes within the observation interval; therefore, the computational complexity of binning is O(1). We also compared the average runtime, that is, the average time needed by each algorithm to produce one sample of firing rate. To make a fair comparison, we used data from the first recording session (I20160407_02) of the monkey I dataset with the same window width (240 ms) for all the algorithms. The runtime was computed using the time() function within Python's built-in time module. We report the average and standard deviation of the runtime over 90 iterations (MUA channels) in Table 5. BAKS took an average runtime of 132.30 ± 66.03 µs, which is 5.93 times slower than binning and 1.50 times slower than FKS. The average runtime of FKS (88.33 ± 21.36 µs) was 3.96 times slower than that of binning (22.31 ± 7.83 µs). The summary of the computational complexity and runtime comparison across all methods can be seen in Table 5.
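A runtime measurement of this kind can be sketched as below. This is illustrative: the paper used time.time(), whereas here time.perf_counter() is used for finer resolution, and the timed operation (a single binning evaluation via np.histogram) and the toy spike train are assumptions.

```python
import time
import numpy as np

def mean_runtime(fn, n_iter=90):
    """Average wall-clock time per call, mirroring the paper's averaging
    over 90 MUA channels."""
    durations = []
    for _ in range(n_iter):
        t0 = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - t0)
    return float(np.mean(durations)), float(np.std(durations))

spikes = np.sort(np.random.default_rng(0).uniform(0.0, 0.240, size=10))
# time a single binning evaluation over one 240 ms window
mu, sd = mean_runtime(lambda: np.histogram(spikes, bins=1, range=(0.0, 0.240)))
```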

IV. DISCUSSION
According to rate coding theory, the firing rate (the rate at which a neuron 'fires' spikes) carries a significant amount of information about the behavioural task or stimuli. Thus, one common preprocessing step in spike-based BMIs is to estimate the firing rate from the spike train. The widely used binning method results in a noisy estimate of the firing rate, leading to suboptimal decoding performance. Previous studies [23], [57], [58] showed that decoding performance could be improved by utilising a Gaussian kernel smoother to obtain a smooth estimate of the firing rate. However, these studies employed a fixed smoothing parameter (bandwidth), which may simultaneously under- and over-smooth, depending on the spike dynamics within the experiment. We hypothesised that employing a kernel smoother with an adaptive (i.e. variable, rather than fixed) bandwidth can improve the quality of the firing rate estimates, which in turn could lead to better decoding performance.
To test our hypothesis, we proposed BAKS for estimating firing rate and applied it to MUA-based BMI using linear decoders (KF and WF). These linear decoders were selected because they have significantly fewer hyperparameters than DL decoders, which makes the decoding performance comparison more reliable and less confounded by the choice of hyperparameters. We then compared the decoding performance of BAKS against the binning and fixed kernel smoother (FKS) algorithms. Comparison results from chronic intracortical neural data demonstrated that BAKS consistently and significantly outperformed the other algorithms across different recording sessions, subjects, decoders, and performance metrics. BAKS incorporates a data-driven, adaptive bandwidth parameter that allows for a smoother and more accurate estimation of firing rate, especially in the presence of rapidly changing spike dynamics [41]. This smoothing may act as input denoising, which could provide better regularisation. By contrast, both binning and FKS employ a fixed, predefined bandwidth parameter; thus, they cannot accurately estimate the firing rate from a spike train with rapidly changing spike dynamics. Our results are in good agreement with previous studies [23], [57] showing that a smooth estimate of the spike rate can improve decoding performance over the simple binning method.
The good performance of BAKS comes at the expense of increased computational complexity and slower computational (run) time compared to binning and FKS. Although BAKS scales linearly as O(2n), there is an upper bound on the number of spikes: neurons possess a refractory period during which they cannot fire again. In this study, the largest maximum and mean ± standard deviation of the number of spikes within a 240 ms window were 50 and 3.67 ± 4.05, respectively. A statistical summary of the number of spikes for each monkey is given in Table 6. This study focuses on decoding accuracy and uses a naive, straightforward implementation of the BAKS formula without applying any optimisation technique. A potential avenue for future work is to address the computational complexity and runtime of BAKS.
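One straightforward optimisation of a naive per-point kernel loop, sketched here under our own assumptions (hypothetical function names, not the paper's code), is to evaluate all points in a single vectorised pass with NumPy broadcasting:

```python
import numpy as np

def kernel_rates(spike_times, eval_times, bandwidths):
    # Evaluate Gaussian kernel rates at many points in one broadcasted pass.
    # `bandwidths` may be a scalar (fixed kernel) or one value per evaluation
    # point (adaptive kernel); NumPy broadcasting covers both cases.
    d = eval_times[:, None] - spike_times[None, :]       # (m, n) time differences
    h = np.broadcast_to(np.atleast_1d(bandwidths), eval_times.shape)[:, None]
    k = np.exp(-d**2 / (2 * h**2)) / (np.sqrt(2 * np.pi) * h)
    return k.sum(axis=1)                                 # one rate per eval point

spikes = np.array([0.05, 0.10, 0.12, 0.20])
evals = np.linspace(0.0, 0.24, 7)
fixed = kernel_rates(spikes, evals, 0.05)                # fixed bandwidth
adaptive = kernel_rates(spikes, evals, np.full(7, 0.05)) # per-point bandwidths
```

This replaces m independent O(n) loops with one (m, n) array operation, trading memory for speed; whether it helps in practice depends on the window width and spike counts involved.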
We found that the average performance improvement of BAKS relative to the other firing rate estimation algorithms was larger when using linear decoders (KF and WF) than when using DL decoders (MLP, LSTM, and QRNN). In other words, DL decoders are less sensitive to the smoothness and accuracy of the estimated firing rates than linear decoders. We argue that this is because DL decoders can compensate for differences in estimated firing rates through differently optimised hyperparameter configurations. DL decoders have multiple hyperparameters that can confound the analysis and performance benchmark. To make a fair comparison, we applied the same hyperparameter optimisation procedure to DL decoders across the different firing rate estimation algorithms. In the case of linear decoders, we can evaluate the impact of the firing rate estimation algorithms using the same simple hyperparameter setting, which eliminates bias in the performance benchmark.
To further improve the decoding performance, we proposed BAKS as the firing rate estimation algorithm combined with QRNN as the decoding algorithm. We compared the proposed method (BAKS-QRNN) against all other possible combinations of firing rate estimation algorithm (binning, FKS, or BAKS) and decoding algorithm (KF, WF, MLP, or LSTM). Extensive experimental results showed that BAKS-QRNN consistently and significantly outperformed the other methods across different recording sessions, subjects, and performance metrics. We also found that DL decoders were superior to linear decoders, which demonstrates the effectiveness of DL decoders (especially QRNN) in capturing the complex, non-linear relationship between neural signals and hand kinematic data.
Next, we compared the decoding performance between MUA and SUA using BAKS coupled with different decoders. Empirical results revealed that MUA achieved significantly higher decoding performance than SUA. The same finding was observed across different recording sessions, subjects, decoding algorithms, and performance metrics. These results contradict several prior studies where SUA was shown to yield better decoding performance than MUA [17], [20], [22], [59]. It is difficult to pinpoint the exact reason for this discrepancy because of differences in many aspects, such as the subject, recording setup, behavioural task, signal processing, and decoding algorithm. One possible explanation is that in our study, to obtain SUA, we used only well-isolated (sorted) spikes and discarded unsorted spikes (also known as 'noise' or 'hash' units). Hash units contained all spikes that did not match any of the operator-defined templates used for spike sorting. Todorova et al. have recently shown that hash units carry some information about movement and that discarding this information can degrade decoding performance [34]. By contrast, when computing MUA, we used all the detected spikes, including the hash units, which potentially contributed to the improved decoding performance.
The present study expands our previous study [60] by adding new technical content and contributions as follows: (1) proposing MUA as an alternative input signal and comparing its decoding performance to that of SUA, (2) adding the fixed kernel smoother (FKS) to the performance benchmark, (3) proposing a QRNN-based DL decoder and comparing its decoding performance to other DL decoders and linear decoders, (4) using chronic neural data from two monkeys spanning more than 11 months of recording sessions, and (5) making the source code publicly available, which enables reproducibility and performance benchmarking against new methods.

V. CONCLUSION
This study proposes BAKS as a firing rate estimation algorithm and QRNN as a decoding algorithm for MUA-based BMI. Based on extensive performance evaluation on chronic neural recordings, we have shown that BAKS coupled with QRNN significantly outperforms other combinations of firing rate estimation algorithm and decoding algorithm. This suggests the feasibility and potential of BAKS and QRNN for improving the decoding performance of MUA-based BMIs.

DATA AND CODE AVAILABILITY
The neural data are available from Zenodo at https://zenodo.org/record/583331 and the source code is available from GitHub at https://github.com/nurahmadi/spike_bmi.