Deep Learning Inspired Feature Engineering for Classifying Tremor Severity

Bio-signal pattern recognition systems can be affected by several factors that limit their performance and clinical translation. Among these factors, the most prominent is the selection of an optimal feature extraction method that can effectively exploit the interaction between temporal and spatial information. Despite the potential of deep learning (DL) models for extracting temporal, spatial, or temporal-spatial information, they are typically restricted by their need for a large amount of training data. The deep wavelet scattering transform (WST) is a relatively recent advancement within the DL literature that replaces expensive convolutional neural network models with computationally less demanding methods. However, while some studies have used WST to extract features from biological signals, it has not been investigated before for feature extraction from electromyogram (EMG) and electroencephalogram (EEG) signals. To test the hypothesis that WST is useful for processing EMG and EEG signals, this study used a tremor dataset collected by the authors from people with tremor disorders. Specifically, the proposed work achieved three goals: (a) study the performance of extracting features from low-density EMG signals (8 channels) using the WST approach, (b) study the effect of extracting features from high-density EEG signals (33 channels) using WST, and study the robustness of the classification accuracy against changes in the spatial and temporal aspects, and (c) classify tremor severity using the WST method and compare the results with other well-known feature extraction approaches. The classification error rates were significantly reduced (by a maximum of nearly 12%) compared with other feature sets.


I. INTRODUCTION
Tremor is one of the most common neurological disorders. It causes involuntary sinusoidal movements or shaking in one or more body parts, such as the head, hands, voice, trunk, or even eyes [1]. Tremors affect the parts of the brain controlling muscle contractions that cause movement. There are more than 20 types of tremors, depending on their appearance, cause, or origin. Tremor can be treated through different options, such as medication, lifestyle changes, surgery, or application of biomedical loading to the upper limb, which leads to damping or suppression of the muscle's motions. Meanwhile, tremor can be diagnosed through medical, physical, or even neurological examinations to evaluate nerve and motor functionality and sensory skills [2]. A doctor may also ask for an electromyogram (EMG) test to measure involuntary muscle activities, diagnose tremor severity, and evaluate the reactions of different muscles to nerve stimulation [3]. Recent studies have also attempted to utilize EMG and electroencephalogram (EEG) signals to measure tremor severity to control the damping load used in biomedical loading treatments. However, the previously reported accuracies in the literature are deemed unsatisfactory and unsuitable for real-time clinical applications [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Yongming Li.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
There are three key stages in EMG/EEG pattern recognition (PR)-based control systems: signal segmentation, feature extraction, and classification [5]. Although the classification method is the key factor in the processing stages of any PR system, the accuracy of the classifier is mainly affected by the quality of the feature extraction methods [6]. A useful feature should have high discrimination power, have low computational cost, and provide complementary information to the other features within the extracted subset of features. Several types of feature extraction methods have been utilized for EMG/EEG signal analysis in the time domain (TD), frequency domain (FD), and time-frequency domain (TFD), and it has been previously shown that the latter may provide superior information representations [7].
In this regard, the wavelet scattering transform (WST) was recently developed as an enhanced time-frequency analysis technique based on the continuous wavelet transform (WT). Mallat introduced the deep WST as a technique for feature extraction from high-dimensional datasets [8]. WST and deep convolutional neural networks (CNNs) are structurally similar, as both are formed as a succession of layered nodes, and both apply signal convolution, nonlinear operations, and pooling. Training a CNN requires a large amount of labeled training data and significant computational time and resources. In comparison, WST uses preconfigured wavelets with fixed weights as convolutional layers, requiring only a few layers and no iterations to optimize the weights of the convolution filters [9]. As a result, WST significantly reduces the calculation time and can be considered a compact CNN that needs no training for weight optimization. Fig. 1 shows a block diagram of the difference between WST and CNN networks. It is also important to mention here that while CNN and other deep learning approaches perform well across a variety of tasks, they fall short when it comes to capturing long- and short-term signal dynamics. Balancing the effects of temporal and spatial resolutions is important for capturing the temporal-spatial dynamics of the EMG/EEG patterns, which requires selecting a proper window size through which one can observe such dynamics. The literature reports that, when using an overlapping window scheme for feature extraction, the optimal window size to avoid any delay perceived by users in real-time applications should be between 150 and 250 ms [10], [11], [12], or between 100 ms and 300 ms in other studies [12], [13].
Generally, the literature is against using an analysis window of less than 100 ms when using sparse EMG datasets collected with a few sensors, as reducing the window size makes the classification process more difficult. However, short window lengths can be processed more quickly, thereby reducing controller delays [14]. This study tests WST as a deep learning model to capture the spatial and temporal information and determine the effectiveness of the resolution for different window sizes, especially at 32 ms. In addition, it is compared with recent state-of-the-art feature extraction methods.
On the other hand, the number of electrode channels also affects spatial information, since retrieving more channels yields more spatial features. Although additional channels offer advantages, there are practical restrictions on the maximum number of electrodes that may be employed owing to crosstalk from nearby electrodes [15], [16]. Furthermore, using more electrodes leads to increased processing times, resulting in longer controller delays. In addition, more channels require more surface area with an associated complexity to fix these electrodes, making them uncomfortable for the users. In this study, we employed high-density (HD) EEG recording to enhance the ability to obtain spatial information by increasing the number of electrodes. HD recording allows for precise temporal and spatial understanding of the underlying muscle and brain activities.
This study offers a unique perspective through which we analyze the EMG and EEG signals for classifying tremor patterns while also looking at the low-density EMG and high-density EEG datasets to study the robustness of WST with changing spatial and temporal resolutions.

II. BACKGROUND FOR TREMOR SEVERITY CLASSIFICATION METHODS
Several studies have attempted to classify tremor types using machine learning and PR. Palmes et al. [17] utilized a support vector machine (SVM) classifier to recognize tremor types (essential, Parkinson, and normal) from features extracted from EMG electrodes. Their research considered several methods for EMG feature extraction, including zero crossings (ZC), mean absolute value (MAV), variance (VAR), and mean spectral density (MSD), with a reported accuracy of 90%. Rissanen et al. [18] analyzed EMG signals using the histogram and crossing rate (CR) methods to distinguish Parkinson's patients from healthy subjects, reaching an accuracy of 86%. Cheraghizanjani et al. [19] developed a tremor test rig that measured tremor levels in patients with neurological disorders, using laser displacement sensors to investigate two types of tremors: horizontal and vertical postural and resting hand tremors. Woods et al. [20] developed a smartphone-based application to identify Parkinson's disease and essential postural tremors using a discrete wavelet transform and SVM. The results showed that attention and distraction influence the identification of Parkinson's and essential tremors, with a reported accuracy of 96.4%. Camara et al. [21] collected local field potentials (LFP) from seven Parkinsonian patients using deep brain stimulation (DBS) electrodes, and used artificial intelligence techniques to select the appropriate features and detect tremor types. The overall system accuracy was 89.5%. Pan et al. [22] also performed a comparative analysis between SVM and neural networks to classify tremors using microelectrodes embedded in the brains of patients with Parkinson's disease (PD). The classifier evaluation dataset was generated based on EMG power spectrum density activities.
The SVM's performance was compared to that of a multilayer perceptron (MLP) and a radial basis function network (RBN). The results showed that SVM provided more relevant results for tremor classification, especially for PD data, for which the accuracy reached 81.14%. Jeon et al. [23] used a wristwatch-type wearable device with a gyroscope and accelerometer to record the data. In their study, nineteen time and frequency domain features were extracted from the collected datasets. The system configuration used the wrapper feature selection algorithm, principal component analysis, decision tree, support vector machine, discriminant analysis, and k-nearest neighbor algorithms to develop an automatic scoring system. The highest accuracy achieved in that study was 92.3%.

III. DATA ACQUISITION
Fourteen patients (12 males and two females, ranging in age from 45 to 85 years) with pre-identified varying tremor conditions were recruited for the evaluation of the proposed PR system. Before every test, the patients were informed about the data collection technique and the research objective; accordingly, the patients agreed and signed the consent forms. The university research ethics committee authorized the consent forms and testing techniques. Laboratory tests were conducted at the University of Sydney's Brain and Mind Center's Parkinson's Disease Research Clinic. The proposed method was successful in recognizing tremor severity in three different classes (weak, moderate, and severe) from different patients. Table 1 shows the participants' information.
The EMG signals were collected from the patients' forearms and hands. EEG data were also collected from the patients' heads to analyze their brain activity. Participants were instructed to perform six distinct actions/tasks:
• First task: Participants were instructed to rest their forearm and hand on a desk for 10 s to demonstrate a resting tremor, in which the affected body part was supported against gravity but not engaged (Fig. 2(a)). They repeated the exercise twice, with a 5 s break between each activity.
• Second task: Subjects with postural hand tremors stretched and maintained a specific forearm position (Fig. 2(b)). Each trial lasted 10 s, with a 5 s pause, for a total of two trials.
• Third task: The participant was instructed to flex and extend the forearm three times, with a 5 s pause between trials (Fig. 2(c)).
• Fourth task: The patient was instructed to take a cup from the desk, drink from it, and return it to the desk. This motion was repeated three times at 5 s intervals (Fig. 2(d)).
• Fifth task: The participant was asked to shift chickpeas from one plate to another three times, with a 5 s delay (Fig. 2(e)).
• Sixth task: For 10 s, the individual was instructed to draw a spiral on paper and then draw a line on the spiral. This exercise was repeated three times at 5 s intervals (Fig. 2(f)).
SynAmps RT was used to record EMG and EEG signals during the experiments. This is a high-speed digital acquisition system equipped with a broadband amplifier. SynAmps RT has an analogue component that amplifies low-level signals, and digital components that handle digitization, external logging, and data transmission to a host computer. CURRY 8 was used as the software package for SynAmps RT. This is the most advanced and complete data recording and analysis software available. CURRY 8 also includes a MATLAB interface in several areas that allows data to be exported to MATLAB and the findings to be imported back into CURRY 8. As shown in Fig. 3, the SynAmps RT and CURRY 8 systems used EMG electrodes (8 channels) and EEG electrodes (33 channels) to collect tremor data. The positions of the 8 EMG electrodes are shown in Fig. 4. The sampling frequency was 1000 Hz for EMG and 256 Hz for EEG data collection.

IV. WAVELET SCATTERING TRANSFORM (WST)
The time-series input of biomedical signals is routed through several layers in WST, with the output of one layer serving as an input for the next. As shown in Fig. 5, each layer consists of three steps to perform the wavelet scattering transform: convolution wavelet transforms, nonlinearity, and averaging.
Translation invariance, local deformation stability, and a rich feature information representation are all provided by the WST. Furthermore, it is resistant to time-warping deformations and preserves class discriminability, making it well suited for classification in real-time applications. Finally, it is considered an informative representation of nonlinear and nonstationary signals, such as EMG and EEG signals [24], [25]. In general, the attributes of the WST can be summarized as follows:
• Competence: All three stages of the WST contribute to its competence. By using wavelets, the accuracy is optimized in the temporal and spatial domains. As a result, the scattering transform can collect localized information from various scales with only a few coefficients at each layer.
• Robustness: The scattering representation has low computational cost. Using WST, the distance between two scattering vectors never exceeds the distance between the original signals, thereby reducing process variability [26]. As a result, the scattering coefficients vary little and are unaffected by outliers. Furthermore, the localization of wavelets, associated with their logarithmic design and widths in the frequency domain, also provides deformation stability, a desirable property of robust descriptors.
• Interpretability: The scattering coefficients have an easy-to-understand architecture and process. The extracted features are stable to local deformations, a property inherited from the WT. Scattering decomposition can detect small variations in the peak amplitude and duration of the EMG signals.
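The three steps of each layer (wavelet convolution, modulus nonlinearity, and low-pass averaging) can be illustrated with a short Python/NumPy sketch. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function names, the Gaussian-shaped band-pass filters, the constant ratio between bandwidth and center frequency (`q_ratio`), and the example center frequencies are all hypothetical choices, and subsampling of the output is omitted.

```python
import numpy as np

def gabor_hat(n, xi, sigma):
    """Frequency response of a Gabor-like band-pass filter centred at xi (cycles/sample)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * ((freqs - xi) / sigma) ** 2)

def lowpass_hat(n, sigma):
    """Frequency response of a Gaussian low-pass (averaging) filter."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-0.5 * (freqs / sigma) ** 2)

def conv_fft(x, h_hat):
    """Circular convolution of x with a filter specified in the frequency domain."""
    return np.fft.ifft(np.fft.fft(x) * h_hat)

def scattering(x, centers, lowpass_sigma=0.01, q_ratio=0.25):
    """Zeroth-, first-, and second-order scattering coefficients of a 1-D signal.

    centers: wavelet center frequencies in decreasing order (assumed values).
    """
    n = len(x)
    phi_hat = lowpass_hat(n, lowpass_sigma)
    s0 = conv_fft(x, phi_hat).real                      # averaging only
    s1, s2 = [], []
    for i, xi1 in enumerate(centers):
        # Step 1 and 2: wavelet convolution followed by the modulus.
        u1 = np.abs(conv_fft(x, gabor_hat(n, xi1, q_ratio * xi1)))
        # Step 3: low-pass averaging gives the first-order coefficients.
        s1.append(conv_fft(u1, phi_hat).real)
        # Second layer: repeat on the envelope u1 with lower-frequency wavelets.
        for xi2 in centers[i + 1:]:
            u2 = np.abs(conv_fft(u1, gabor_hat(n, xi2, q_ratio * xi2)))
            s2.append(conv_fft(u2, phi_hat).real)
    return s0, np.array(s1), np.array(s2)
```

Because every step is either a convolution or a pointwise modulus, shifting the input only shifts the coefficients, so their time averages are unchanged; this is the translation-invariance property described above.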

V. WAVELET SCATTERING TRANSFORM FORMULA
The deep WST equations are explained in this section and illustrated in Fig. 6 [25]. Let X(t) be the time-series input signal fed into the WST. The wavelet family ψ_j(t) is obtained by scaling a mother wavelet ψ(t), and φ_J(t) denotes the low-pass filter function. A low-pass filter can achieve a local translation-invariant representation of the signal, X * φ_J(t), over a temporal window of size T. The wavelet modulus transform is used to recover the high frequencies removed by the low-pass filter. The local translation-invariant descriptor of X can be obtained by averaging the input signal using a low-pass filter [8].
This averaging eliminates all high frequencies. A wavelet modulus transform can be used to extract the high frequencies as |X * ψ_j|. The first layer of the scattering coefficients can be determined by averaging the output using low-pass filtering.
By obtaining the complementary high-frequency coefficients and averaging them again, the information lost by averaging in the first layer can be retrieved, which yields the second-layer scattering coefficients.
The wavelet modulus convolutions are obtained by repeating this technique for additional layers.
The m-th order scattering coefficients, S_m, are obtained by averaging the m-th order wavelet modulus convolutions with the low-pass filter.
The features of the input signal X for 0 < m < k can be determined from the scattering coefficients of all the orders.
Here, k is the maximal decomposition order. Because the energy diminishes with each repetition, up to three layers are sufficient. Since most practical applications only require three layers, only three layers were used in this study as well [25].
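For reference, the chain of operations described in this section can be written in the standard scattering notation of Mallat [8]. The block below is a reconstruction from the surrounding definitions and may differ in notation from the original equations; the path indices j_1, ..., j_m and the symbol U_m for the un-averaged wavelet modulus cascade are assumptions introduced here. The final set collects the orders retained, with k the maximal decomposition order.

```latex
\begin{align}
S_0 X(t) &= X * \phi_J(t) \\
U_1 X(t, j_1) &= \left| X * \psi_{j_1}(t) \right| \\
S_1 X(t, j_1) &= \left| X * \psi_{j_1} \right| * \phi_J(t) \\
S_2 X(t, j_1, j_2) &= \left| \left| X * \psi_{j_1} \right| * \psi_{j_2} \right| * \phi_J(t) \\
U_m X(t, j_1, \dots, j_m) &= \left| \cdots \left| \left| X * \psi_{j_1} \right| * \psi_{j_2} \right| \cdots * \psi_{j_m} \right| \\
S_m X(t, j_1, \dots, j_m) &= U_m X(\cdot, j_1, \dots, j_m) * \phi_J(t) \\
S X &= \left\{ S_m X \right\}_{m \le k}
\end{align}
```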

VI. METHODOLOGIES
Two methodologies were applied here to prove the robustness of using the deep WST feature extraction method to diagnose tremor severity using EMG LD signals (four channels) and EEG HD signals (32 channels).

A. METHODOLOGY 1
This methodology was designed to demonstrate the robustness of extracting features from LD EMG signals using WST. The proposed WST was compared with well-known feature sets tested by different research groups, including:
• STFE: A spatial-temporal feature set consisting of an integral square descriptor, normalized root-square coefficients of the first and second differential derivatives, a mean log kernel, an estimate of the mean derivative of the higher order moments per sliding window, and a measure of spatial muscle information (6 features) [31].
• PSDTD: Consists of six TD-based power spectrum features: the first three even power spectrum moments, a sparsity measure, and an irregularity factor (6 features) [32].
• AR-RMS: Combines the sixth-order autoregressive (AR) model parameters and the root mean square (RMS) value (7 features).
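As an illustration of the simplest of these baselines, the AR-RMS feature vector for one analysis window can be sketched as follows. The source does not state how the AR parameters were estimated, so the Yule-Walker (autocorrelation) method used here is an assumption, and the function name `ar_rms_features` is hypothetical.

```python
import numpy as np

def ar_rms_features(window, order=6):
    """Return [a_1, ..., a_order, RMS] for a single-channel analysis window.

    AR coefficients are estimated with the Yule-Walker equations
    (an assumed estimator; other methods such as Burg are also common).
    """
    raw = np.asarray(window, dtype=float)
    x = raw - raw.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order].
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = r[1:], with R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    rms = np.sqrt(np.mean(raw ** 2))
    return np.concatenate([a, [rms]])
```

With a sixth-order model this yields 6 + 1 = 7 values per channel, matching the feature count listed above.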
In this methodology, we used wavelet time scattering as the feature extraction method and a support vector machine (SVM) for classification. Based on training accuracy, the SVM parameters were set to C = 30 and γ = 0.0736. SVM was chosen because it has been extensively used in the published literature and is well known. The SVM classifier also has a low computing cost, allowing us to compare the efficacy of WST in identifying the EMG data. Five-fold cross-validation was used to calculate the classification error rate (loss) for the EMG data. The Wilcoxon signed-rank test was used to verify the statistical significance of the classification findings, with a p-value < 0.05 considered significant. Fig. 7 shows the block diagram of this methodology.
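The classification stage can be sketched with scikit-learn as follows. This is an illustrative Python equivalent, not the MATLAB code used in this study. The values C = 30 and γ = 0.0736 come from the text; the RBF kernel, the feature standardization step, and the function name `cv_error_rate` are assumptions.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cv_error_rate(features, labels, C=30.0, gamma=0.0736, folds=5):
    """Mean k-fold cross-validated classification error of an SVM.

    features: array of shape (n_windows, n_features)
    labels:   array of shape (n_windows,), e.g. weak/moderate/severe
    """
    # Standardization and the RBF kernel are assumed, not stated in the text.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=gamma))
    accuracy = cross_val_score(model, features, labels, cv=folds)
    return 1.0 - accuracy.mean()
```

Calling `cv_error_rate` once per feature set (WST, STFE, PSDTD, AR-RMS) on the same windows gives paired error rates that can then be compared with a signed-rank test.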

B. METHODOLOGY 2
In this methodology, we study the parameters associated with the temporal and spatial information contained in the input data stream and the impact of their interactions on classification accuracy. The proposed technique was validated using the tremor datasets recorded with high-density (HD) EEG channels.
To investigate the impact of temporal resolution, the analysis window duration was varied between 32, 64, 128, 192, and 256 ms. All the tests used a fixed overlapping increment window of 32 ms because the classification error rate is rarely affected by the window increment [14].
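The overlapping segmentation scheme can be made concrete with a short sketch that converts the window duration and the 32 ms increment into sample counts and lists the window start indices. The function name `window_starts` and the rounding of non-integer sample counts are assumptions made for illustration; at 256 Hz, 32 ms corresponds to 8.192 samples, so some rounding rule is needed.

```python
def window_starts(n_samples, fs_hz, window_ms, increment_ms=32):
    """Return (window_length_in_samples, list_of_start_indices).

    n_samples:    length of the recording in samples
    fs_hz:        sampling frequency (1000 Hz for EMG, 256 Hz for EEG here)
    window_ms:    analysis window duration in milliseconds
    increment_ms: shift between consecutive windows in milliseconds
    """
    win = round(window_ms * fs_hz / 1000)      # rounding is an assumed choice
    step = round(increment_ms * fs_hz / 1000)
    if win < 1 or step < 1 or n_samples < win:
        return win, []
    return win, list(range(0, n_samples - win + 1, step))

# Example: a 10 s EEG trial at 256 Hz, for each window duration used in this study.
for duration in (32, 64, 128, 192, 256):
    win, starts = window_starts(2560, 256, duration)
    print(duration, "ms ->", win, "samples per window,", len(starts), "windows")
```

Shorter windows yield slightly more segments per trial but far fewer samples per segment, which is the trade-off examined in this methodology.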
To study the effect of spatial resolution, the number of channels used was also changed to 8, 16, and 32 channels, with each window length and overlapping increment. The deep WST is a deep learning model used to extract spatial and temporal information. This study compared the proposed technique with other well-known feature sets. These feature sets were chosen because they are now standard feature extraction methods for comparison in many research papers. Because of the large number of channels used in this work, the spectral regression (SR) feature projection method was applied after feature extraction to reduce dimensionality. SR maps the original feature set from one domain into a new domain of (c-1) features, where c denotes the number of classes. Finally, a support vector machine (SVM) was used to classify the EMG data. The Wilcoxon signed-rank test was used to check the statistical significance of the classification results, with a p-value of < 0.05 being significant. Fig. 8 shows a block diagram of the proposed methodology.

VII. RESULTS

A. RESULT 1: WST CLASSIFICATION ACCURACY
These experiments investigated the success of WST for EMG signal classification. In the WST network, only three parameters are defined: time invariance, number of wavelet filter banks, and number of wavelets per octave. A Gabor wavelet approach was employed to generate the wavelet decomposition, and a Gaussian function was used to produce the low-pass filter. Two wavelet filter banks were used in this study. There were eight wavelets per octave in the first wavelet filter bank, and one wavelet per octave in the second filter bank. Fig. 9 shows the wavelet filters for the two filter banks. Fig. 10 shows the low-pass filter scale with a 100 s invariance scale.
The performance of WST was compared with some well-known and frequently used feature sets, described in this paper, using the data collected from the EMG channels. The adjusted Wilcoxon signed-rank test revealed significant differences (p-values < 0.01) between the WST and all other approaches. As shown in Fig. 12, WST outperformed the other methods with less computational cost. Although the CNN method (an architecture with two layers, each consisting of a 2D convolutional layer, a batch normalization layer, and a 2D max-pooling layer) can provide almost the same accuracy, WST has a much lower computational cost because it uses only three layers.

B. RESULT 2: COMPARE WST WITH WT AND WPT
This experiment compared the WST with other wavelet families to demonstrate its effectiveness using LD EMG channels. The Gabor wavelet approach was used to generate wavelets for three layers of decomposition for each of the wavelet families used (the energy of the wavelet coefficients was used as the feature for WT and the wavelet packet transform, WPT). Fig. 13 shows the average classification error results across all 14 subjects and tasks. Again, the WST features outperformed the other methods, with a p-value of < 0.01.

Fig. 14 shows how the proposed deep learning WST method outperformed other approaches in terms of average classification error rate for LD EMG channels when window sizes were reduced down to 32 ms. Moreover, this result shows that varying the window size has a minimal effect on the classification error rate with the proposed method. Even though these results indicate that lowering the analysis window's increment parameter had a minimal effect on the achieved classification errors, statistical analysis revealed that these classification errors were significantly different between the various window increments, with p-values < 0.001 for all tests across all the datasets.

D. RESULT 4: THE EFFECT OF CHANGING THE TEMPORAL AND SPATIAL RESOLUTION
The number of electrodes and the window size significantly affected the classification error rate for HD EEG signals. This experiment compared the effects of increasing the number of electrodes with varied window sizes for HD EEG channels. The suggested approach (WST) achieves considerable classification accuracy with window sizes as short as 32 ms, allowing it to be used in real-time applications. Table 2 lists the average classification error rates, with standard deviations, as the number of channels is increased from 8 to 32 and the window duration is increased from 32 ms to 256 ms.

E. RESULT 5: PROCESSING TIME
This section compares the computational time of the proposed WST with that of the conventional feature extraction methods. Since iterative methods like CNN are known to be computationally demanding and require a predetermined number of training iterations, they were not included in this part of the analysis. For this test, we used a randomly generated dataset with 150 samples distributed over 10 dimensions. The time needed to extract each feature was estimated using MATLAB R2019b on an Intel(R) i5-6300U CPU @ 2.40 GHz (2496 MHz, 2 cores) running the Windows 10 operating system. The analysis was repeated 1000 times for each feature type, and the average values are reported in Table 3.
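The timing protocol described above (a random 150 x 10 dataset, 1000 repetitions, averaged) can be sketched in Python as follows. This is not the MATLAB benchmark used in this study: the RMS feature is only a stand-in for the feature extractors being timed, and the names `rms_per_dimension` and `mean_extraction_time` are hypothetical.

```python
import math
import random
import time

def rms_per_dimension(data):
    """Stand-in feature extractor: RMS of each column of a list-of-rows dataset."""
    n = len(data)
    dims = len(data[0])
    return [math.sqrt(sum(row[d] ** 2 for row in data) / n) for d in range(dims)]

def mean_extraction_time(extract, data, repeats=1000):
    """Average wall-clock time (seconds) of one call to extract(data)."""
    start = time.perf_counter()
    for _ in range(repeats):
        extract(data)
    return (time.perf_counter() - start) / repeats

# Randomly generated dataset: 150 samples distributed over 10 dimensions.
random.seed(0)
data = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(150)]
average_seconds = mean_extraction_time(rms_per_dimension, data)
print(f"average extraction time: {average_seconds * 1e6:.1f} microseconds")
```

Averaging over many repetitions reduces the influence of scheduler noise on very short measurements; absolute values depend on the machine and the language runtime, so only relative comparisons between feature sets measured on the same setup are meaningful.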
The proposed WST method outperformed all other approaches, whether deep learning-based or traditional feature engineering methods, by a significant margin in all of the comparisons between features, classifiers, and datasets. However, all of the conventional feature engineering-based methods, such as HuTD, HATD, and HATD, took less processing time to extract features, although their classification accuracy was lower than that of the proposed method.

VIII. DISCUSSION
There are some characteristics of tremor as a neurological disorder that aid researchers in determining the most effective therapy to eliminate it from the body. Each patient experiences tremors at a different frequency and severity. Some patients have a severe form of hand tremor, but others have fewer issues because their tremor is weak. This paper's main goal is to categorize a group of Parkinson's patients' tremors using a powerful feature extraction method that can be used for real-time assessment.
The findings of this paper demonstrate that effective feature extraction methods can be designed by utilizing WST concepts. This method was verified using an EMG and EEG database collected from patients with different severity levels of tremor (weak, moderate, and severe). WST had not been used before for feature extraction from EMG and EEG biomedical signals, except in an early investigation by our research group proving the effectiveness of WST for EMG studies. In the first part of this study, WST performance was tested in classifying all the tasks/activities performed by the patients.
During that test, WST clearly showed its power by achieving a low classification error rate. In addition, when the performance of the proposed WST method was tested against various state-of-the-art feature extraction methods, it outperformed all other approaches by a considerable margin.
Because WST is a time-frequency analysis technique based on the WT, its performance was also tested against WT and WPT. The results show that this method can easily challenge other methods. Interestingly, the temporal-spatial aspect was also considered in this work to test the method's robustness. Reducing the window size reduces the number of extracted samples, which theoretically should provide less data to train the corresponding models. Reducing the possibility of extracting spatial information, by reducing the electrode density for high-density EEG signals, was also investigated. In both cases, the classification error results clearly indicated that the proposed method is not much affected by reducing the window size and the number of channels (reducing the temporal-spatial information) compared to other methods.
As the classification accuracy alone may not be the best way to assess the performance of real-time systems, we aim to investigate the usefulness of this approach for developing our WST method for real-time control in our ongoing research. Several offline metrics, such as feature efficiency, have recently demonstrated promise as better indicators of the usability metric throughput in this direction. As part of our ongoing research in this area, we are focusing on how to choose the appropriate metrics to assess real-time performance while utilizing our suggested WST.

IX. CONCLUSION
This work explored the classification of tremor severity using the deep wavelet scattering transform, which provides translation-invariant and nonlinear feature representations for the EMG signals collected during different hand gestures and the EEG signals of the brain activity of patients experiencing tremor disorders. This study reveals that, when combined with appropriate classifiers, wavelet scattering coefficients may be effectively used for classification and can produce highly accurate classification results for tremor severity, with an average accuracy of 96.07% across all subjects. Additionally, the WST was shown to outperform existing traditional and wavelet analysis tools from the literature, with a maximum reduction in the classification errors of nearly 12% (see Fig. 12). While the performance of the proposed WST approach was close to that of CNN, it is important to mention here that WST may also be classified as a deep scattering network because it uses convolutions, modulus operations, and low-pass filtering, similar to CNN models. However, unlike CNN, the WST avoids the need for multiple model parameters, high computational costs, and hyperparameter adjustment, as well as the difficulties in comprehending and interpreting the extracted features that arise with neural networks. It also addresses the problem that WT coefficients change under time shifts by providing translation invariance, local deformation stability, and a rich feature representation.
On the other hand, in terms of the temporal and spatial aspects of the classification, although a large window size can reduce the classification error rate, this study shows that using WST with a short window size of 32 ms can attain a low classification error rate of < 10% when using 32 EEG channels. The relationship between HD-EEG's temporal and spatial elements is significant. The results of this study also suggest a great potential in terms of accelerating the process in HD EEG recognition systems, allowing for the possibility of using it in real-time applications.

RAMI N. KHUSHABA (Senior Member, IEEE) received the Ph.D. degree from the University of Technology Sydney (UTS), in 2010, with a focus on myoelectric signal processing and pattern recognition. His work experience included several positions focused on the applications of machine/deep learning, signal processing, analytics, data science and modeling in different fields ranging from powered prostheses control, driver drowsiness detection, neuromarketing, sleep-disordered breathing detection in heart failure and COPD patients, HVAC energy optimization and fault detection, mining automation, radars-based materials detection, and modeling for transport. His research interests include myoelectric control, machine and deep learning theory and applications, and vision, LiDAR, and radars signal processing. He has more than 20 years' solid experience in cross-disciplinary applied research and has established an international track record. He has authored over 240 peer-reviewed papers and is a leader and researcher in Computational Intelligence, Humanized Computational Intelligence based technology, Health Technology, Machine Learning, and Bio-Mechatronics Systems. Adel is an IEEE Senior Member and Co-Vice Chair of the IEEE CI chapter in the NSW section.