Deep Metric Learning With Locality Sensitive Mining for Self-Correcting Source Separation of Neural Spiking Signals

Alexander Kenneth Clarke, Student Member, IEEE, and Dario Farina, Fellow, IEEE

Abstract-Automated source separation algorithms have become a central tool in neuroengineering and neuroscience, where they are used to decompose neurophysiological signal into its constituent spiking sources. However, in noisy or highly multivariate recordings these decomposition techniques often make a large number of errors. Such mistakes degrade online human-machine interfacing methods and require costly post-hoc manual cleaning in the offline setting. In this article we propose an automated error correction methodology using a deep metric learning (DML) framework, generating embedding spaces in which spiking events can be both identified and assigned to their respective sources. Furthermore, we investigate the relative ability of different DML techniques to preserve the intraclass semantic structure needed to identify incorrect class labels in neurophysiological time series. Motivated by this analysis, we propose locality sensitive mining, an easily implemented sampling-based augmentation to typical DML losses which substantially improves the local semantic structure of the embedding space. We demonstrate the utility of this method to generate embedding spaces which can be used to automatically identify incorrectly labeled spiking events with high accuracy.


I. INTRODUCTION
Neurophysiological signals can generally be characterized as an additive mixture of repeating events from different sources, such as the motor unit activation potential (MUAP) in electromyography (EMG) or the spike potential in microelectrode cortical recordings [1], [2]. The ensemble of these activation events constitutes neural codes that can provide direct insights into a neurological system of interest [3], [4], [5]. Consequently, the extraction of individual sources from noisy signal by blind source separation (BSS) algorithms has long been a major focus in computational neuroscience [4], [6], [7], [8]. Modern BSS algorithms leverage highly multivariate data, such as the output from high-density electrode arrays [9], [10], to identify the contributions of many individual spiking sources [4], [11], [12]. These sources also provide an extremely clean control signal for human-machine interfacing applications when compared to the original bulk signal [5], [13], [14].
Despite these successes, BSS algorithms frequently make mistakes when labeling source activations, which can be difficult to identify and correct automatically [15], [16]. As a result, applications that use neurophysiological data still broadly rely on bulk signal [17], [18], [19]. In offline decomposition, a degree of manual or semi-automated post-hoc cleaning is commonly employed, often using additional knowledge about the system of interest, for example the temporal statistics of the sources [20]. The nature of this manual error correction generally relates to the mixing system of interest; for example, intracortical and intramuscular EMG decompositions generally require post-hoc examination of source classes [8], [21], whilst surface EMG (sEMG) decompositions also require further inspection of individual neural activations [22]. However, whilst accurate, manual cleaning is constrained to offline methods and is an extremely time-consuming process [23]. In response, modern source separation pipelines are increasingly using additional automated post-processing steps [16], [20], [24]. A contemporary direction is to use the noisy BSS-derived labels to train a neural network classifier, which can then outperform the noisy labels it was trained on [25], [26], [27]. However, such supervised deep learning approaches generally suffer from a critical threshold where the initial label noise overwhelms the ability of the model to self-correct [28].
A different approach to leveraging deep learning for neurophysiological source separation is through deep metric learning (DML), in which a model is trained to learn the relative similarity or dissimilarity between data samples rather than to directly classify them [29], [30]. In this formulation, the neural network acts as an efficient featurizer, taking high-dimensional neurophysiological input and extracting a low-dimensional embedding in which individual spiking events are easily assigned to their respective sources. In recent years DML has seen particular success in person reidentification tasks [31], [32], medical image classification [33], [34] and digital pathology [35], [36]. In the theoretical domain, problems with optimization stability and computational inefficiency have been largely solved through a combination of sampling strategies [37], [38], [39] and better distance metrics [29], [40], [41], [42]. DML is attractive in the context of spiking events, which are stable over time given fixed conditions. This means that for a well-trained model, each spiking event from the same source should cluster in the same location in the embedding space, with the DML model theoretically only needing to remove the distortions caused by noise and superposition events. DML also has lower data demands and a better tolerance for class imbalance when compared to typical neural network-based classifiers [43], both of which are common issues in neurophysiological time series processing.
In this article we have two main objectives for a DML-trained featurizer. First, we want neural events from different sources to map to different regions of the embedding space so they can be discriminated; that is, we want good interclass variance. Second, we want neural events from the same source to preserve some intraclass structure for the purposes of identifying label noise. This second objective is complicated by the fact that maintaining a degree of intraclass variance is not usually the focus of DML methods, which are generally more interested in interclass separation [44]. Current losses tend to incentivize the model to ignore intraclass semantic differences and collapse the embedding down to a tight cluster [39]. To mitigate this, we propose a simple method of augmenting DML training to prevent this happening, allowing the embedding to be used for more than just class separation.
In summary, the main contributions of this article are as follows.
1) We robustly demonstrate that DML is an effective method of building a neural network featurizer for source separation of neurophysiological signals. Using sEMG signal, we show that the activations from different sources clearly cluster, making class discrimination by simple clustering methods trivial.
2) We implement a number of popular DML methods, such as N-pair loss and angular loss [40], [42], demonstrating that they generate embedding spaces with excellent class discriminability, but poor local semantic structure. We show that more contemporary methods which preserve such structure can be used to identify outliers within an artificially corrupted neurophysiological signal.
3) We propose locality sensitive mining (LSM), an easily implemented sampling-based method of maintaining intraclass structure that can be used within a variety of different DML paradigms. We go on to show that for neurophysiological signal LSM outperforms other contemporary methods of preserving local semantic structure, such as multilevel distance regularization (MDR) [44], whilst having a very simple implementation that can be added to many different DML losses.
4) For the first time we are aware of in the literature, we show that DML methods which maintain intraclass semantic structure have clear practical utility for identifying incorrectly labeled outliers in the class clusters. As an example, we leverage this to build a self-correcting source separation pipeline for neural spiking signals, which we call DeepDecomp.

II. RELATED WORK
DML has classically been employed in discrete high-dimensional data types, such as images, rather than in the time series domain. This has not been due to a lack of theoretical grounding [45], [46], [47], but relates more to the difficulty of building discrete class pairings for signals when components have variable temporal dynamics, which may explain the lack of DML methods for neurophysiological signal processing in the literature. However, when the independent factors in the generative process have relatively short and stereotyped responses, such as spiking neurons, it becomes straightforward to break the signal into windows centered on each activation. In this case, the problem becomes similar to face reidentification, where the neural network needs to learn an invariance to confounding factors when bringing images of the same face closer together in the embedding space [31], [32], [48]. There has also been some work in face reidentification to make training robust to label noise, using external models to build metrics of label quality [49], [50], [51], [52]. Unfortunately, these methods are generally reliant on an extremely large amount of data to be effective, which precludes their use in most neurophysiological datasets.
An alternative solution to the problem of label noise is to preserve a richer intraclass embedding, such that semantic similarity or dissimilarity between samples from the same class is better preserved, allowing identification of outliers. Early work on DML losses generally either ignored intraclass variance or actively sought to reduce it [41], [53]. Some proxy-based losses, which pull samples towards shared class proxies rather than other in-class samples [54], [55], have been designed in part with the intention of reducing overfitting by preventing intraclass collapse [56], [57]. There has also been some work in preserving intraclass characteristics using generative models, such as variational autoencoders [58], [59], with the main objective of improving generalization performance by excluding features that are not shared across a class when making embedding decisions.
More recently there have been some explicit attempts to preserve local semantic structure in the embedding space, as illustrated in Fig. 1. DML with self-supervised ranking uses an auxiliary term based on self-supervised learning [44], employing a set of transform functions to augment samples to different degrees, such that more perturbed samples are embedded further away from the original image. However, this requires domain knowledge when sculpting the transform functions; for example, the authors perturb image scale, viewpoint and color in their task of bird identification. Selecting similar transformations for neural signals would not be a simple task. An alternative approach is MDR, which also adds an auxiliary loss term, albeit one which aims to prevent class collapse by using proxies set at multiple distances from each sample [60].

III. METHODS

A. Deep Metric Learning
The basic aim of DML is to train a neural network to map a sample taken from one of $C$ classes to an embedding vector $x$, such that for an arbitrarily selected anchor embedding $x_a$, samples from the same class $x_p$ are embedded closer than samples from a different class $x_n$, as measured by some distance metric $D$. $D$ can be a number of different metrics, such as the Euclidean distance, cosine similarity or Kullback-Leibler divergence [61]. Commonly the loss function is formulated in terms of a relative distance between positive pairs (samples from the same class) and negative pairs (samples from different classes), such that

$$\ell(x_a, x_p, x_n) = \max\bigl(0,\; D(x_a, x_p) - D(x_a, x_n) + m\bigr) \tag{1}$$

where $m$ is a margin term that specifies the objective interclass separation.
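As a concrete illustration, the margin formulation in (1) reduces to a few lines. The following is a minimal sketch in TensorFlow, assuming Euclidean distance for $D$ and batched embeddings; the function and variable names are our own:

```python
import tensorflow as tf

def margin_dml_loss(x_a, x_p, x_n, m=0.2):
    # Sketch of the margin loss in (1), assuming Euclidean distance for D.
    # x_a, x_p, x_n: (batch, dim) anchor/positive/negative embeddings.
    d_pos = tf.norm(x_a - x_p, axis=-1)  # D(x_a, x_p)
    d_neg = tf.norm(x_a - x_n, axis=-1)  # D(x_a, x_n)
    # Hinge at the margin m: only pairs that violate the margin contribute.
    return tf.reduce_mean(tf.maximum(0.0, d_pos - d_neg + m))
```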
After a small amount of optimization, the bulk of negative pairs will be further away than positive pairs, making most training examples in a batch uninformative [37]. This can be partially mitigated by mining strategies: calculating the DML loss using only pairings from each batch selected using some heuristic based on the embedding space [62], for example selecting only pairings where $x_a$ is further away from $x_p$ than $x_n$. A related approach is to use multiple negative pairings $N$ for each anchor term, as proposed originally in the N-pair loss [40]

$$\mathcal{L} = \frac{1}{M} \sum_{(x_a, x_p) \in B} \log\Bigl(1 + \sum_{n=1}^{N} \exp\bigl(f_{a,p,n}\bigr)\Bigr) \tag{2}$$

where $f_{a,p,n}$ is the DML loss and $B$ is a batch of size $M$.
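A sketch of (2), taking $f_{a,p,n}$ in the inner-product form of the original N-pair paper [40] ($x_a^{\mathsf{T}} x_n - x_a^{\mathsf{T}} x_p$); the tensor shapes and names are our assumptions:

```python
import tensorflow as tf

def n_pair_loss(x_a, x_p, x_n):
    # x_a, x_p: (M, dim) anchors/positives; x_n: (M, N, dim) negatives per anchor.
    s_pos = tf.reduce_sum(x_a * x_p, axis=-1)        # x_a . x_p, shape (M,)
    s_neg = tf.einsum('md,mnd->mn', x_a, x_n)        # x_a . x_n, shape (M, N)
    f = s_neg - s_pos[:, None]                       # f_{a,p,n}
    # Log-sum-exp over the N negatives, averaged over the batch, as in (2).
    return tf.reduce_mean(tf.math.log(1.0 + tf.reduce_sum(tf.exp(f), axis=-1)))
```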

B. Locality Sensitive Mining
Whilst current methods of selecting pairings from $B$ are suitable if the objective is to maximize interclass distance within the embedding space, they have the side effect of penalizing local semantic structure within a class [54]. When positive pairings are selected randomly, the network will generally learn to embed all samples from a particular class into a dense point [57].
To preserve local semantic structure, we instead propose a new mining procedure, LSM, using a top-k algorithm to select the $k$ closest $x_n$ to $x_a$ that belong to a different class, and using only the closest $x_p$. GPU implementations of top-k algorithms have become extremely efficient in recent years, due to their increasing use within machine learning applications [63]. In the N-pair formulation this mining approach can be written as

$$\mathcal{L} = \frac{1}{M} \sum_{x_a \in B} \log\Bigl(1 + \sum_{x_n \in B} \mathbb{1}\bigl[x_n \in \Lambda\bigr] \exp\bigl(f_{a,\Upsilon,n}\bigr)\Bigr) \tag{3}$$

where $\Upsilon = \operatorname{argmax}(\angle(x_a, x_p))$ is the argmax set of the pairwise cosine similarity $\angle$ between $x_a$ and its associated set $x_p$ in $B$, $\Lambda = \operatorname{topk}(\angle(x_a, x_n))$ is the top-k values of the ordered pairwise cosine similarity $\angle$ between $x_a$ and its associated set $x_n$ in $B$, and $\mathbb{1}$ is the indicator function.
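The mining step itself only requires cosine similarities to the anchor and a top-k call. The following is a minimal single-anchor sketch of the selection in (3) as we describe it above; the batch bookkeeping is simplified and the function and variable names are ours:

```python
import tensorflow as tf

def lsm_select(emb, labels, a, k=5):
    # emb: (M, dim) L2-normalized batch embeddings; labels: (M,); a: anchor index.
    sim = tf.linalg.matvec(emb, emb[a])          # cosine similarity to the anchor
    same = tf.equal(labels, labels[a])
    same = tf.tensor_scatter_nd_update(same, [[a]], [False])  # drop the anchor itself
    big = 1e9 * tf.ones_like(sim)
    pos_sim = tf.where(same, sim, -big)          # in-class candidates only
    neg_sim = tf.where(same, -big, sim)          # out-of-class candidates only
    neg_sim = tf.tensor_scatter_nd_update(neg_sim, [[a]], [-1e9])
    closest_pos = tf.argmax(pos_sim)             # Upsilon in (3)
    _, topk_negs = tf.math.top_k(neg_sim, k=k)   # Lambda in (3)
    return closest_pos, topk_negs
```

These indices then stand in for the random positive and the full negative set when evaluating the N-pair sum.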
Like other mining procedures, LSM can be combined with many DML losses. As an example, we selected the N-pair loss combined with the popular angular loss as the distance metric [42].

C. Angular Loss
Angular loss is a stable geometric reformulation of the angular distance metric, which constructs a right-angled triangle with $x_n$ and the midpoint between $x_a$ and $x_p$, with the final vertex being the point on the semicircle joining $x_a$ and $x_p$ [42]. By dropping constant terms, this geometric relationship can be used as the DML loss $f_{a,p,n}$ in (2), expressed as

$$f_{a,p,n} = 4\tan^2\!\alpha\,(x_a + x_p)^{\mathsf{T}} x_n - 2\bigl(1 + \tan^2\!\alpha\bigr)\, x_a^{\mathsf{T}} x_p \tag{4}$$

where $\alpha$ is an angle in radians which sets the upper accepted bounds of the loss, analogous to $m$ in (1).
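A sketch of (4), with $\alpha$ in radians and the negatives batched per anchor; the shapes are our assumptions:

```python
import math
import tensorflow as tf

def angular_f(x_a, x_p, x_n, alpha=0.25):
    # x_a, x_p: (M, dim); x_n: (M, N, dim); alpha in radians.
    tan2 = math.tan(alpha) ** 2
    # 4 tan^2(alpha) (x_a + x_p)^T x_n
    term_n = 4.0 * tan2 * tf.einsum('md,mnd->mn', x_a + x_p, x_n)
    # 2 (1 + tan^2(alpha)) x_a^T x_p
    term_p = 2.0 * (1.0 + tan2) * tf.reduce_sum(x_a * x_p, axis=-1)
    return term_n - term_p[:, None]   # f_{a,p,n}, fed into (2) or (3)
```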

D. Cross Entropy Loss
A major difficulty in optimizing a model to detect spiking events is that spikes are relatively rare, meaning the data has a strong class imbalance toward windows with no activity [25]. This is particularly problematic in DML as the losses generally have local optima of mapping all samples to a single location in the embedding space. We found that the addition of a small auxiliary cross-entropy (CE) term with temperature was useful in avoiding this trivial solution

$$\ell_C = -\frac{1}{M} \sum_{i=1}^{M} \sum_{c=1}^{C} y_{i,c} \log\frac{\exp(z_{i,c}/\tau)}{\sum_{j=1}^{C} \exp(z_{i,j}/\tau)}$$

where $z = (W/\lVert W\rVert_2)\, x$, $W$ is a trainable matrix that compresses the embedding vector down to a dimension $C$ vector for comparison with the one-hot encoded class labels $y$, and $\tau$ is the temperature. The impact of the CE term was weighted by a coefficient $\gamma$ such that

$$\mathcal{L} = \mathcal{L}_{\mathrm{DML}} + \gamma\, \ell_C. \tag{5}$$
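A sketch of the auxiliary term and the combination in (5); whether $W$ is normalized globally or per column is not stated, so the global L2 norm used here is an assumption, as are the names:

```python
import tensorflow as tf

def auxiliary_ce(emb, y_onehot, W, tau=0.1):
    # emb: (M, dim) embeddings; y_onehot: (M, C) labels; W: (dim, C) trainable.
    z = tf.matmul(emb, W / tf.norm(W))   # compress embedding to C logits
    ce = tf.nn.softmax_cross_entropy_with_logits(labels=y_onehot,
                                                 logits=z / tau)
    return tf.reduce_mean(ce)

# Combined objective of (5): total = dml_term + gamma * auxiliary_ce(...)
```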

E. Neurophysiological Signal Decomposition
The timestamps used by DeepDecomp can be generated using a wide variety of manual and automated processes; however, for this study the gradient convolution kernel compensation (gCKC) algorithm was selected due to its strong performance in HD-sEMG signal decomposition [64], [65]. In the gCKC framework for BSS, the vector of spiking sources $s$ at time $t$ is first extended with $L$ delayed versions of itself, allowing the mixing problem, which is convolutive in most neurophysiological settings, to be written in instantaneous form

$$x(t) = H\,\bar{s}(t) + \omega(t)$$

where the signal observation vector $x$ at time $t$ is a linear mixture parameterized by the operation of the mixing matrix $H$ on the extended source vector $\bar{s}$ plus noise $\omega$. In practice both the observation and source vectors are additionally extended with a further $R$ values for reasons of numerical stability during the source separation procedure. Unlike independent component analysis methods which seek to directly estimate a separation vector for each source, gCKC seeks to include the additional statistical information that the spiking sources generate repetitive events within the signal. Sources are instead estimated indirectly using a linear minimum mean square error estimator, with the estimated $j$th source $\hat{s}_j$ at time point $t$ given by

$$\hat{s}_j(t) = \hat{c}^{\mathsf{T}}_{\hat{s}_j x}\, C^{-1}_{xx}\, x(t)$$

where $\hat{c}^{\mathsf{T}}_{\hat{s}_j x}$ is the transposed cross-correlation vector between an activation of the $j$th source and the extended HD-sEMG matrix and $C^{-1}_{xx}$ is the inverted autocorrelation matrix of the extended HD-sEMG matrix $x$.
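The LMMSE estimator above is simple to express in NumPy. A sketch assuming a precomputed extended observation matrix; the names are ours, and a pseudo-inverse is used for numerical safety:

```python
import numpy as np

def lmmse_source(x_ext, spike_times):
    # x_ext: (channels * (L + R), T) extended HD-sEMG matrix.
    # spike_times: sample indices believed to contain source-j activations.
    C_xx = x_ext @ x_ext.T / x_ext.shape[1]        # autocorrelation matrix
    c_sx = x_ext[:, spike_times].mean(axis=1)      # cross-correlation vector
    return c_sx @ np.linalg.pinv(C_xx) @ x_ext     # s_hat_j(t) for all t
```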
The vector $\hat{c}^{\mathsf{T}}_{\hat{s}_j x}$ is usually initialized with a time point likely to contain a source activation, which can be estimated by, for example, the Mahalanobis distance calculated on the signal [65]. Once selected, $\hat{c}^{\mathsf{T}}_{\hat{s}_j x}$ is then optimized to find the rest of the source's signal contributions. This can be done with either a fixed-point algorithm as in [16] or in the gCKC formulation by gradient descent

$$\hat{c}^{\,\bullet}_{\hat{s}_j x} = \hat{c}_{\hat{s}_j x} + \alpha\, \frac{\partial f\bigl(\hat{s}_j\bigr)}{\partial \hat{c}_{\hat{s}_j x}}$$

where $\hat{c}^{\,\bullet}_{\hat{s}_j x}$ is the updated cross-correlation vector, $\alpha$ is the learning rate, and $f(\cdot)$ a contrast function designed to estimate the non-Gaussianity of the output source in a similar fashion to independent component analysis. Optimized sources can then be converted to timestamps using a linear threshold or a two-class k-means clustering algorithm.

IV. EXPERIMENTAL DESIGN

A. Evaluated Methods
We compared a number of different DML methods, examining the utility of their respective embedding spaces for identifying two types of label noise and classifying unseen neural activations. The methods studied were as follows.
N-Pair Loss (NL) [40]: The original N-pair loss using Euclidean distance, where each sample in a batch is paired with one randomly selected in-batch positive and all in-batch negatives. A small auxiliary CE term, $\ell_C$ in (5), was added to stabilize training, with both $\gamma$ and $\tau$ set to 0.1.
Angular Loss With NL (AL) [42]: The N-pair loss (with the same auxiliary CE term), but using angular loss instead of Euclidean distance as the distance metric. $\alpha$ was set to 0.25 radians in both the cleaning and refitting stages. It should be noted that this method also acts as an ablation of the LSM component of training.
AL With LSM: The same loss as AL (with the same hyperparameters), but using the proposed mining approach instead. k was set to 5 throughout, which was selected after a hyperparameter study also detailed in this article.
MDR [44]: We also compare our approach with the recent MDR, which likewise aims to preserve intraclass variance, albeit mainly from the perspective of improving generalization performance.
CE: As a baseline we also included a classifier trained using CE, that is, purely $\ell_C$ in (5) with both $\gamma$ and $\tau$ set to 1. The embedding is simply the output of the layer before $W$.

B. Experiments
In experiment 1 we evaluated the ability of the models to clean a label set corrupted by feature-dependent noise, where label flipping probability is related to its associated features [66]. In the context of source-separated HD-sEMG, this most commonly occurs as a false positive, where a separation vector incorrectly assigns a high probability of an in-class MUAP being present when it is not; that is, a noise class or other MU class label is flipped to the MU class of interest. To simulate this effect, we corrupted the label set by generating an artificially noisy separation vector for each MU class by randomly selecting 15 MUAP labels from that class and using the average of the associated extended HD-sEMG vectors to generate a linear minimum mean square error prediction on the extended HD-sEMG matrix. A two-class k-means clustering algorithm was then used to parameterize a linear threshold to find activations, creating a label set with a high degree of feature-dependent noise. Five levels of increasing difficulty were generated by taking a number of false positives corresponding to 10%/20%/30%/40%/50% of the number of true labels, selected at random from the set of false positives.
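For concreteness, this corruption procedure can be sketched as below, under our reading of the text; the two-class k-means threshold is taken from scikit-learn, and this is not the authors' code:

```python
import numpy as np
from sklearn.cluster import KMeans

def noisy_false_positives(x_ext, class_times, n_avg=15, seed=0):
    # Build a deliberately poor separation vector from only n_avg in-class
    # windows, then threshold its output with two-class k-means.
    rng = np.random.default_rng(seed)
    picked = rng.choice(class_times, size=n_avg, replace=False)
    c_sx = x_ext[:, picked].mean(axis=1)                 # crude cross-correlation
    C_xx = x_ext @ x_ext.T / x_ext.shape[1]
    s_hat = c_sx @ np.linalg.pinv(C_xx) @ x_ext          # noisy source estimate
    km = KMeans(n_clusters=2, n_init=10).fit(s_hat.reshape(-1, 1))
    spikes = np.where(km.labels_ == np.argmax(km.cluster_centers_))[0]
    # False positives are detections absent from the true label set.
    return np.setdiff1d(spikes, class_times)
```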
In experiment 2 the models were instead evaluated on class-dependent label noise, when the probability of a label flipping to another class is stable across all labels in the class [66]. As for most neural spiking signals, in HD-sEMG source separation this error generally occurs when the separation vectors are very similar, usually due to similar MUAP waveform shapes between two MU classes. This can be simulated by transferring a percentage of labels to a similar MU class. This was done by first averaging the MUAPs of each MU class and then cross-correlating these averages with the average MUAP of every other class in the recording, with 10%/20%/30%/40% of the class labels transferred to the class with the highest value. If labels had already been transferred to the closest class then the next closest class was selected until all classes had label transfers. A maximum label corruption of 40% was used to preserve the concept of a majority true and minority false class.
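A sketch of the transfer step under our reading, using normalized correlation between class-average MUAPs as the similarity; the names and the exact similarity measure are our assumptions:

```python
import numpy as np

def flip_to_closest_class(labels, muap_avgs, src, frac=0.2, seed=0):
    # labels: (n,) class label per activation; muap_avgs: class -> flattened
    # average MUAP window; src: the class whose labels are corrupted.
    rng = np.random.default_rng(seed)
    a = muap_avgs[src]
    sims = {c: float(np.dot(a, m) / (np.linalg.norm(a) * np.linalg.norm(m)))
            for c, m in muap_avgs.items() if c != src}
    closest = max(sims, key=sims.get)          # most similar average MUAP
    idx = np.where(labels == src)[0]
    flip = rng.choice(idx, size=int(frac * idx.size), replace=False)
    labels = labels.copy()
    labels[flip] = closest                     # class-dependent label noise
    return labels
```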
After training with the artificially corrupted label sets, methods which successfully preserved intraclass variance should have a dense central cluster due to the methods of label corruption. To avoid bias from manual selection and to demonstrate a fully automated pipeline, we elected to automatically assign samples to this cluster using a simple density estimator. As the cleaning process could potentially bias the label set by incorrectly removing in-class outliers, we also retrained the models with the cleaned sets and used the models to find unlabeled spiking events. This also allowed demonstration of the generalization performance of the models trained with the different loss functions. Finally we explored the impact of the k hyperparameter within LSM by rerunning experiment 1 with multiple values of k.
As a demonstration of the practical utility of the proposed method for building a self-correcting source separation pipeline for neural spiking signals, we give an example pipeline, DeepDecomp, which automates the cleaning of the output of a noisy gCKC decomposition of sEMG signal.

C. Model and Training
To convert the source-separated HD-sEMG signal into labeled windows, first each channel of the HD-sEMG signal was standardized by z-scoring and then cut into overlapping 80-sample wide windows at a stride of 1. Each window was then labeled by reference to the predicted source activity at the final sample of the window. This meant the bulk of windows were labeled as part of the inactive class due to the sparse nature of motor neuron spiking. Due to this serious class imbalance, each minibatch was created from the entirety of the windows labeled as containing a motor neuron spike, with an additional 256 samples randomly selected from the inactive class. Each class assignment was then converted to a one-hot representation, the bulk of which had only one class active at any one time, although rarely two activations would occur simultaneously on the same time-point. As the richness of the intraclass embedding of the inactive class windows was not of any great concern, the embeddings of these windows were not used as anchor samples when calculating the DML component of the tested methods, although they were used as negative samples and in the calculation of $\ell_C$.
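A minimal sketch of this windowing, assuming the HD-sEMG is already arranged as (samples, channels); the names are ours:

```python
import numpy as np

def make_windows(emg, spike_labels, width=80):
    # emg: (T, channels); spike_labels: (T,) class per sample, 0 = inactive.
    emg = (emg - emg.mean(axis=0)) / emg.std(axis=0)   # z-score each channel
    # Overlapping windows at stride 1, each labeled by its final sample.
    windows = np.stack([emg[t - width + 1:t + 1]
                        for t in range(width - 1, emg.shape[0])])
    labels = spike_labels[width - 1:]
    return windows, labels
```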
A convolutional neural network implemented using the TensorFlow machine learning library in Python was used as the embedding model [Fig. 2(c)]. Grid search optimization was used to select specific model architecture hyperparameters. Convolution steps used a 1-D 3-sample wide kernel, with 32 filters and a drop-out of 0.2. 1-D max-pooling was completed with 2-sample wide kernels. Each densely connected layer had 64 neurons and a drop-out percentage of 0.5 during training. Both the convolution and densely connected layers used ReLU activation functions. Finally the output of the last densely connected layer was densely connected to a bias and activation-free embedding layer 8 neurons wide, which was then divided by its L2 norm. This was an intentionally low-dimensional embedding compared to standard DML due to the desire to avoid dimensionality issues during the clustering steps in the refitting phase. The additional matrix W used in the categorical CE was initialized with truncated normal noise, whilst the weights of the neural network layers were initialized by Glorot uniform. The Adam optimization algorithm at a learning rate of 0.001 was then used to train the model over 500 epochs for both cleaning and refitting stages.
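The described architecture maps onto a short Keras definition. A sketch in which the number of conv/pool blocks is our guess, since it is not stated; Glorot uniform is the Keras default initializer, matching the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_embedder(window_len=80, channels=60, emb_dim=8, n_blocks=2):
    # n_blocks (conv/pool repetitions) and channels are assumptions.
    inp = tf.keras.Input(shape=(window_len, channels))
    x = inp
    for _ in range(n_blocks):
        x = layers.Conv1D(32, 3, activation='relu')(x)  # 3-wide kernel, 32 filters
        x = layers.Dropout(0.2)(x)
        x = layers.MaxPooling1D(2)(x)                   # 2-wide max-pooling
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    # Bias- and activation-free 8-wide embedding, divided by its L2 norm.
    emb = layers.Dense(emb_dim, use_bias=False, activation=None)(x)
    emb = layers.Lambda(lambda v: tf.math.l2_normalize(v, axis=-1))(emb)
    return tf.keras.Model(inp, emb)
```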
When the model was used to find neural activations, the cosine similarity of each sample to the average embedding vectors of classes was calculated, with the spiking activations assigned to MUs by way of a two-class k-means clustering algorithm. These activation labels could then be compared to the precorrupted data using the rate of agreement (RoA) metric [1], a percentage defined as the number of true positive matches divided by the total number of true positives, false positives and false negatives. When the model was instead used to clean the corrupted label set, a simple density estimator was used. First a local scale value v was estimated by finding the mean cosine similarity of each embedding vector to its 20 nearest neighbors and taking a median of this value across all vectors. For each label the number of other labels with a cosine similarity higher than v was found, and the label with the highest number of neighbors was selected as the center of the cluster. All labels with a cosine similarity to this center higher than v were then added to the refitting training set. This simple approach was generally adequate for quickly finding the densest region of the embedding space, which was usually the cluster of true labels.
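The density estimator is a few lines of NumPy. A sketch of the procedure as described; the handling of the center point itself is our choice:

```python
import numpy as np

def density_clean(emb, n_neighbors=20):
    # emb: (n, dim) L2-normalized embeddings of one class's labels.
    sim = emb @ emb.T                              # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # ignore self-similarity
    knn = np.sort(sim, axis=1)[:, -n_neighbors:]   # 20 nearest neighbors each
    v = np.median(knn.mean(axis=1))                # local scale value
    counts = (sim > v).sum(axis=1)                 # neighbors above the scale
    center = np.argmax(counts)                     # densest label in the class
    keep = np.where(sim[center] > v)[0]            # cluster around the center
    return np.append(keep, center)                 # retained for refitting
```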

D. Experimental Dataset
The HD-sEMG dataset consisted of 20-s recordings taken from the dominant tibialis anterior muscle of 10 men performing an isometric contraction at 15% of maximal force, previously used to validate source separation techniques [67]. Maximal contraction was defined as the mean force of three 5-s maximal contractions separated by 3 min of rest, with force sampled at 2048 Hz by load cells mounted on an isometric brace. Force feedback was provided to the participants by an oscilloscope. The signal from a monopolar 12 × 5 electrode array placed over the main muscle innervation zone was sampled at 2048 Hz, having been band-pass filtered at 10-500 Hz. gCKC with an additional k-means source refinement step was implemented using the TensorFlow machine learning package [16], [65]. As the label set was to be artificially corrupted it was important that the original be as noise free as possible, so additional post-hoc steps were taken to maximize the likelihood that the timestamps were correct. Sources were manually cleaned by examining interspike intervals and the source-to-noise ratio of each activation. An additional step of validating decomposition accuracy was implemented by comparing the sources to those found using the DEMUSE source-separation software package [64], [65], with source cleaning completed by a different trained operator.

V. RESULTS

A. Feature-Dependent Label Noise
In experiment 1, which tested the effect of feature-dependent label noise by simulating noisy separation vectors, the LSM-trained network generated an embedding space with dense clusters for each class corresponding to the true labels. Surrounding each cluster was a large sparse periphery of false labels with no apparent structure, the expected result as these false samples shared fewer features. In contrast, the embedding spaces generated by training with CE, NL and AL gave a more uniform single cluster for each class (with some distant outliers for CE). The utility of these different embedding spaces for identifying false labels is particularly clear when LSM is compared to AL using a 2-D principal component space (Fig. 3).
Table I shows the cleaning results when selecting only samples from the highest-density region of a class embedding. The LSM-trained network generated an embedding space with utility for removing false labels even at the maximum tested value of 50% of total correct values, with a median post-cleaning false label retention of 2.3% of the total correct labels in the class. The number of true labels lost during the cleaning process fell as the pre-cleaning percentage of false labels increased, but even at the highest false label percentage tested, a median of 74.1% of the true values were still retained.
MDR also performed well at preserving some local semantic structure and so the embedding space had some utility for identifying noisy labels, although it was slightly outperformed by LSM. In contrast, the networks trained with CE, AL and NL did not generate embedding spaces suitable for label cleaning. AL and MDR generally produced much more distributed clusters, meaning fewer samples in total, true or false, were selected by the density estimator.

Fig. 3. Effect of two different DML losses on the embedding space for two units as shown by the first and second principal components for 40% corruption with feature-dependent label noise. (a) Effect of using the original random sampling method of angular loss, leading to all intraclass embeddings contracting down to a point. In (b) the same optimization was run again using a DML loss that preserves local structure, creating an embedding space in which the true labels cluster away from the false labels.

B. Class-Dependent Label Noise
In experiment 2, when labels were randomly flipped to the MU class with the closest average MUAP shape, LSM again generated embedding spaces with clear separation between true and false labels. However, unlike in the first experiment, the false labels formed a second distinct cluster within the embedding space (Fig. 4). As the true label cluster always had more values, it was still clearly identified by the density estimator.
Table II gives the cleaning results for class-dependent label noise across the different tested methods. As in experiment 1, the LSM-trained model generated an embedding space that allowed the density estimator to identify almost all false labels. Even at a 40% transfer the median post-cleaning false label retention was only 1.0% of the total correct labels in the class. MDR also performed quite well, albeit with a slightly higher false label burden than LSM.
As true labels were lost both to the initial transfer to other classes and to the cleaning phase, far fewer were retained in the post-cleaning dataset than in experiment 1 and would need to be recovered in the refitting stage. Once again, the CE, NL and AL trained models generated embedding spaces that were not useful for cleaning, with AL and MDR again generating a looser embedding with the consequence of fewer samples selected.

C. Rediscovering Unlabeled Activations
An important requirement if the cleaned label set is to be useful is that the cleaning process does not overly bias against true labels that are lost at this stage, making them difficult to recover. Lost labels tend to be more peripheral in the cluster, meaning their MUAP shapes are likely to be less similar to the MU class average, potentially due to superposition with a MUAP from a different class or due to a noise artefact. False negatives are also still used for training, but are labeled inappropriately, with a possibly detrimental effect on the ability of the model to generalize. To demonstrate that neither of these potential problems actually impacted training, after the cleaning stage of both experiments 1 and 2, the model was refitted with the LSM-cleaned label set. As an additional comparison, we also refitted using the same label set, but with the other methods compared in this study.
For both experiments the predicted activity after running the DeepDecomp pipeline was generally both sparse and clean, with MUAPs easily identifiable using all methods tested. These labels were compared to the original data using the RoA metric, with results given in Table III. LSM and MDR tended to outperform the other methods tested, most likely due to the better ability to generalize that has been noted in DML losses that preserve semantic information [44]. Most striking was the ability of the DeepDecomp pipeline to recover the original label sets at extreme degrees of label noise, as demonstrated visually in Fig. 5.

D. Impact of the k Hyperparameter
Two main trends emerged when the cleaning component of experiment 1 was repeated using different values of the k hyperparameter in the LSM method, which controls the number of top-k negatives used to build the loss (Fig. 6). As k increases the intraclass embedding becomes increasingly dense, resulting in the density estimator selecting a greater number of samples. At the same time, the embedding begins to lose its intraclass variance, resulting in increasing numbers of false negatives being incorrectly assigned as true. This leads to an optimal value of k of 5 if the main objective is to maximize the likelihood of identifying false negatives. It's worth noting that a k of 1 is functionally similar to an easy-negative loss [39], although this is quite inefficient with respect to losing true positives to the cleaning process.

VI. CONCLUSION
In this study we demonstrate that a DML pipeline which preserves local semantic structure can embed high-dimensional neurophysiological signal into a low-dimensional space that allows accurate identification of incorrectly labeled activation events. Furthermore, we present LSM, a sampling-based augmentation to DML losses which preserves such semantic structure. By using artificially corrupted sEMG data, we show that this simple change can outperform other contemporary methods of preserving intraclass variance, making possible a practical pipeline for cleaning noisy source-separated time series data. As an example, we created DeepDecomp, a pipeline which utilizes LSM-augmented DML to clean heavily corrupted sEMG decomposition data over two passes. Importantly, the model was still able to perform even when the source of label corruption is class or even feature dependent, an important consideration in neurophysiological signal where mistakes often occur due to correlated effects such as source superposition.
Although this study focused on source-separated HD-sEMG signal, it is important to emphasize the broader applicability of this approach to any imperfectly labeled neurophysiological time series data characterized by repeating events. Whilst the study focused specifically on action potentials, the proposed methodology could also be used for pattern recognition in bulk neurophysiological signal, for example by supplementing contemporary prosthetics and exoskeletons [68], [69]. Additionally, the labeling process need not be by a BSS algorithm. For example, a DeepDecomp-like pipeline could be applied to a dataset for which only a small component of the data has been manually labeled by an expert operator, recovering the rest of the labels accurately. In this way, the proposed approach can be viewed as a minimally supervised method for decomposing neurophysiological time series into individual cell activities.

Fig. 1. Effect of traditional DML losses versus those designed to preserve local semantic structure. Traditional methods tend to obliterate intraclass variance in the embedding of samples from the same class (in blue), whilst recent directions instead preserve this, while maintaining good separation from samples belonging to different classes (in red).

Fig. 2. (a) DeepDecomp, an example pipeline by which the noisy activation labels found from source-separating the high-density sEMG signal are cleaned. The model is trained twice: a cleaning phase to find the false positive labels and a refitting phase to find the false negatives. After the refitting phase the predicted class activity is much cleaner than that of the original source separation algorithm, as seen in (b). (c) Convolutional neural network trained by a DML loss to find neural activations. Windows of neurophysiological time series are embedded into a low-dimensional space which can be used to source separate and, if a method that preserves local semantic structure is used, to clean a noisy label set.

Fig. 4. Principal component plot of the embedded samples from four classes with added class-dependent label noise. The samples selected automatically for the refitting phase have been circled. The false labels have shared features as they come from the same class. This results in two tight clusters for both true and false labels; however, they are still clearly separable.

Fig. 5. Single channel of unprocessed HD-sEMG and the post-decomposition predicted activity of a single class before and after cleaning and refitting, with true and false labels. (a) demonstrates the degree of complex superposition inherent to sEMG signal as opposed to cleaner recordings such as those from intracortical sources. A linear separation filter based on an average of only 15 labels is applied to the signal to generate (b), which is consequently extremely noisy, simulating a poorly optimized filter. A number of false positives corresponding to 50% of the number of true class labels has been selected. After the cleaning and refitting phases of DeepDecomp the spiking motor neuron activity in (c) is clearly identifiable, whilst incorrect labels have been suppressed.

Fig. 6. Impact of the number of top-k negative samples used in LSM during the cleaning phase, after the data labels were corrupted with feature-dependent label noise. As k rises, more true positives are densely clustered in the embedding space, but the network begins to lose its intraclass variance, causing the density estimator to fail. A k of 5 was found to be optimal for maximizing the number of true positives whilst preventing class collapse.

TABLE I
CLEANING RESULTS FOR DIFFERENT LEVELS OF FEATURE-DEPENDENT LABEL NOISE

TABLE III
REFITTING RESULTS FOR DIFFERENT LEVELS OF LABEL NOISE