Static-Dynamic Temporal Networks for Parkinson’s Disease Detection and Severity Prediction

Most patients with Parkinson’s disease (PD) have movement disorders of varying degrees, and effective gait analysis has great potential for uncovering hidden gait patterns that support the diagnosis of PD. In this paper, Static-Dynamic temporal networks are proposed for gait analysis. The model involves a Static temporal pathway and a Dynamic temporal pathway. In the Static temporal pathway, the time series of each sensor is processed independently by a parallel one-dimensional convolutional neural network (1D-Convnet) to extract its own deep features. In the Dynamic temporal pathway, the stitched surface of the two feet is treated as an irregular “image”, and the transfer of the force points at all levels on the sole is regarded as the “optical flow.” The motion information of the force points at all levels is then extracted independently by 16 parallel two-dimensional convolutional neural networks (2D-Convnet). The results show that the Static-Dynamic temporal networks outperform previous methods in gait-based detection of PD: the accuracy of PD diagnosis reached 96.7%, and the accuracy of severity prediction reached 92.3%.

PD, and it is usually diagnosed by a process of elimination. Doctors have to go through multiple consultations and examinations to diagnose the initial symptoms, which wastes considerable time and money [4], [5]. Hence, it is essential to find an accurate and efficient evaluation approach [6].
Typical symptoms of PD fall into five groups: static tremor, rigidity, movement retardation, postural instability, and non-motor symptoms [7]. These symptoms are mild at the initial stage and gradually worsen as the disease progresses. The onset of PD is caused by the degeneration of nerve cells that produce the neurotransmitter dopamine. The decrease of dopamine directly affects muscle activity, which reduces the patient's exercise capacity [8], [9]. Previous research has demonstrated the great potential of gait analysis in PD detection [10]. Gait analysis can extract effective information concerning the functions of the primary motor cortex, basal ganglia, and cerebellum, which is advantageous for the detection and monitoring of neurodegenerative diseases [11]. Hence, it would be helpful to build a powerful gait classification model for the initial diagnosis of PD.
With the rapid development of sensor and computer technologies, numerous gait analysis systems have been developed, including video analysis systems [12], multiple acceleration sensor systems [13], [14], and multiple pressure sensor systems [15], [16]. Among them, multi-camera methods capture highly recognizable information but are susceptible to environmental factors such as sunlight. Inertial sensors must be placed on different parts of the subject's body, which increases the experimental workload [17]. Pressure sensors are more convenient, cost less, and interfere less with the patient, so they are often used to collect gait data [18], [19].
Recently, many studies have used machine learning methods to diagnose PD [20], [21], [22]. However, these methods have limitations such as a narrow range of application and poor interpretability of features. Therefore, it is of great significance to propose an interpretable and widely applicable diagnostic model [23], [24], [25]. The optical flow algorithm has wide applications in computer vision: it calculates the motion speed and direction of an object by detecting the change in intensity of image pixels over time [26]. However, the idea of motion transfer has not been used in time series. From the kinematics point of view, walking is a rhythmic alternating movement of the feet, and the force transfer pattern on the sole influences the gait quality. For example, the movement range and muscle strength of a PD patient's knees, hips, and ankles are limited [27], [28]. Impaired muscles and unstable posture can both reduce forward limb propulsion and thereby affect how the load transfers across the sole. Inspired by this kinematics research, the main contributions of this paper are as follows: (1) A force flow algorithm is proposed to extract the force transfer information of gait. The algorithm regards the transfer of the force points at all levels on the sole as a multiple dynamic time series signal and uses a parallel 2D-Convnet to extract its dynamic temporal features. This captures the motion law of the gait and provides a new idea for the evaluation of healthy gait. (2) Static-Dynamic temporal networks are proposed, with two pathways that separately extract temporal features and force transfer features; lateral connections are used to fuse the features of the two pathways. The results show that the Static and Dynamic pathways are complementary, as their fusion performs significantly better than either pathway alone.
The rest of the paper is organized as follows. Section II first shows the related work on gait recognition of PD, then the background concepts of the optical flow algorithm are introduced. Section III gives the description of the dataset and presents the proposed Static-Dynamic temporal networks. Section IV reports the results of our model. Finally, Section V gives the conclusion.

II. RELATED WORK

A. Existing Gait Detection Algorithms
In the study of computer-assisted medical diagnosis of PD patients, researchers have investigated many machine learning and feature extraction methods based on gait data. For example, Ertugrul et al. [29] developed a shifted one-dimensional local binary pattern (1D-LBP) method. The shifted 1D-LBP was applied to each of the 18 vertical ground reaction force (VGRF) signals to construct 18 1D-LBP mode histograms and achieved an accuracy of 88.88% in PD diagnosis. Özel et al. [30] first applied the Weighted Common Average Reference (WCAR) to reduce noise in the output signals. Statistical features were then extracted from the multi-sensor signals using the Local Binary Pattern (LBP) conversion, and a PD diagnosis accuracy of 92.96% was achieved using the K Nearest Neighbors (KNN) method. Ren et al. [31] studied the Wiener-Akaike-Granger-Schweder influences between VGRF signals in different parts of the sole, using a statistical test to determine whether the pressure values of each part of the sole differed significantly between the healthy controls and the PD patients. Though the above methods have achieved promising performance for PD diagnosis, they did not assess the severity of the disease. To address this, Veeraragavan et al. [32] extracted 34 features such as gait cycle, standing time, swing time, and stride length of the left and right feet, and used an Artificial Neural Network (ANN) for classification. In their experiments, PD diagnosis reached 97.4% accuracy and severity assessment reached 87.1% accuracy. Even though the above algorithms extract some effective characteristics, they are easily affected by physiological parameters such as the age, height, and weight of the subjects, as well as by environmental factors.
In addition, with the rapid development of neural networks, numerous deep learning methods have been proposed for PD diagnosis and severity prediction. For example, Nguyen et al. [33] proposed a PD diagnosis model based on the Transformer algorithm, which first applied temporal attention to individual VGRF signals and then applied spatial attention to build multi-sensor spatio-temporal gait features, achieving excellent performance. Zhao et al. [34] presented a dual-stream network model to diagnose PD patients. The first network used a 2D neural network to extract the spatial features of forces. The second network used a recurrent neural network to extract temporal features, and the average of the two networks' outputs determined the final classification result. Nancy Jane et al. [35] proposed a Q-Back-propagation time-delay neural network (Q-BTDNN) classifier, which established a temporal classification model for PD severity assessment and obtained an accuracy of 92.19%.
In summary, PD detection and severity prediction based on machine learning still has limitations such as a narrow application scope and insufficient feature extraction. Neural networks are good at learning the mapping between input and output but cannot effectively discover the gait motion law [36]. It is therefore essential to find an effective gait evaluation method applicable to sensor-based gait data.

B. Background of Optical Flow Method
Optical flow is a significant means of kinematics analysis in computer vision, widely used in dynamic detection tasks, including tracking [37], gesture recognition [38], etc. By finding the correlation between two frames of continuous signals, the algorithm can find the corresponding relationship of each pixel to estimate the movement of objects [20], [39]. At present, the optical flow is generally considered to represent the motion or temporal information of a video. In the task of action recognition, the results of many experiments have proved that although many actions can be identified by using a single video frame, there are still some actions that depend on motion information, and better recognition results can be obtained by combining motion information [40].

III. STATIC-DYNAMIC TEMPORAL NETWORKS
The aim of this paper is to propose an automated system for PD detection and severity prediction using the VGRF data. The overall framework of the proposed model includes data preprocessing and Static-Dynamic temporal networks construction.

A. Dataset Description
The dataset was collected from three independent research groups (Ga [41], Ju [42], Si [43]), and contained 93 PD patients (mean age: 66.3) and 73 healthy volunteers. Table I gives the statistical details of the subjects. Every subject was asked to walk for two minutes on level ground without any assistance. Eight pressure sensors were mounted under each foot to measure forces as functions of time. The relative coordinates of the pressure sensors on the sole, with the subject's feet parallel, are shown in Fig. 1. All pressure sensor outputs were sampled at 100 Hz. The records also include two signals that reflect the sum of the eight sensor outputs for the left and right foot. The severity level of PD was graded by the Hoehn and Yahr (H&Y) stage. The H&Y scale is a gross evaluation of PD and ranges from 0 to 5 [44]. A larger H&Y stage indicates more advanced PD progression. The severity of PD patients is also quantified using the Unified Parkinson's Disease Rating Scale (UPDRS). The UPDRS includes 17 evaluation items, and each item is rated on five levels: 0, 1, 2, 3, and 4 [45]. Table II presents the division of subjects based on the UPDRS for each subdataset. In addition, the dataset with the added force transfer signals is available for download at https://github.com/WoDeTianK/gaitin-parkinsons-disease.

B. Signal Preprocessing
Because deep learning requires a large amount of data, each walk was divided into short segments of 100 sampling points with an overlap rate of 50%. This yields a total of 64,468 segments, each with its own category label. In addition, the segments of a given subject never appear in the training set and the test set at the same time.
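The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `segment_walk`, the array layout (time along the first axis, 18 signals along the second), and the random toy recording are all assumptions made for the example.

```python
import numpy as np

def segment_walk(signal, window=100, overlap=0.5):
    """Split a (T, C) recording into windows of `window` samples.

    Consecutive windows share a fraction `overlap` of their samples.
    Hypothetical helper for illustration only.
    """
    signal = np.asarray(signal)
    step = int(window * (1.0 - overlap))  # 50 samples for a 50% overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than 1")
    n_segments = (signal.shape[0] - window) // step + 1
    if n_segments <= 0:
        # Recording shorter than one window: no segment can be formed
        return np.empty((0, window) + signal.shape[1:], dtype=signal.dtype)
    starts = np.arange(n_segments) * step
    return np.stack([signal[s:s + window] for s in starts])

# Toy recording: two minutes at 100 Hz with 18 signals (random values, not real data)
rng = np.random.default_rng(0)
walk = rng.random((12000, 18))
segments = segment_walk(walk)
print(segments.shape)  # (239, 100, 18)
```

Each segment would then inherit the label of the walk it came from, and the train/test split would be made per subject rather than per segment.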

C. Static-Dynamic Temporal Networks
Our generic architecture has a Static temporal pathway and a Dynamic temporal pathway. The Static temporal pathway is used to extract the temporal features of the gait signal, and the Dynamic temporal pathway is used to extract the force transfer features of the gait signal. The two pathways are fused by lateral connections. In addition, an attention mechanism is introduced to adaptively control the weight of each sensor signal for enhancing the utilization of effective information. The overall architecture is shown in Fig. 2.

1) Static Temporal Pathway:
In the field of time-series signal recognition, a one-dimensional convolutional neural network (1D-Convnet) can effectively extract temporal information. For the VGRF signal containing 18 inputs, a parallel 1D-Convnet is proposed in this paper, which can effectively extract temporal features of multiple time-series signals. The pathway consists of 18 parallel 1D-Convnets. Each convolutional network has four one-dimensional convolutional layers, with a max-pooling layer after two convolutional layers. The 18 parallel 1D-Convnets process the 18 VGRF signals separately to extract the features of each signal. Since each sensor collects pressure signals from a different part of the sole, each pressure signal has its own deep features. A 1D-Convnet convolves the time series data with a one-dimensional convolution kernel to extract temporal features. The process of 1D convolution is shown in Eq. (1).
$$v_{ij}^{x} = \mathrm{selu}\left(b_{ij} + \sum_{m} \sum_{p=0}^{P_i - 1} \omega_{ijm}^{p}\, v_{(i-1)m}^{x+p}\right) \qquad (1)$$
where $P_i$ refers to the size of the 1D convolution kernel along the time dimension, $p$ refers to the position in the current time dimension, $\omega_{ijm}^{p}$ refers to the $p$-th value of the convolution kernel connected to the $m$-th feature map in the previous layer, and $v_{ij}^{x}$ denotes the value of the unit at the $x$-th position on the $j$-th feature map in the $i$-th layer. $b_{ij}$ refers to the bias for this feature map, and selu(·) refers to the activation function.
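To make the 1D convolution concrete, the sketch below evaluates one convolutional layer with a SELU activation directly from the definitions above. It is an unoptimized illustration under stated assumptions: no padding and stride 1 are assumed (the text does not specify them), the function names and the kernel size of 3 are hypothetical, and the SELU constants are the standard ones from the self-normalizing networks literature.

```python
import numpy as np

# Standard SELU constants
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # expm1 is only evaluated on the non-positive part to avoid overflow
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * np.expm1(np.minimum(x, 0.0)))

def conv1d_layer(v_prev, weights, bias):
    """One 1D convolutional layer (assumed: no padding, stride 1).

    v_prev:  (M, X) feature maps of the previous layer
    weights: (J, M, P) kernels, one per output map j and input map m
    bias:    (J,) one bias per output feature map
    returns: (J, X - P + 1) activated feature maps
    """
    n_in, length = v_prev.shape
    n_out, n_in_w, size = weights.shape
    assert n_in == n_in_w, "kernel and input channel counts must match"
    out_len = length - size + 1
    out = np.empty((n_out, out_len))
    for j in range(n_out):
        for x in range(out_len):
            # Sum over input maps m and kernel positions p
            out[j, x] = bias[j] + np.sum(weights[j] * v_prev[:, x:x + size])
    return selu(out)

# One sensor segment of 100 samples passed through 4 hypothetical kernels of size 3
rng = np.random.default_rng(0)
segment = rng.standard_normal((1, 100))
kernels = 0.1 * rng.standard_normal((4, 1, 3))
features = conv1d_layer(segment, kernels, np.zeros(4))
print(features.shape)  # (4, 98)
```

In the Static pathway, one such stack of layers would be applied to each of the 18 VGRF signals independently.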
2) Dynamic Temporal Pathway:
In recent years, many gait classification models have been applied to PD detection, but they have limitations, such as being unable to extract multi-sensor fusion features and motion features. To address this problem, the transfer features of the force points on the sole are extracted to further mine the gait information of PD patients.
The plane composed of the two insteps is regarded as an irregular "image", and a gait segment is regarded as a "video". The position of each pixel is determined by where the corresponding sensor is placed. Table III shows the concrete parameters of the gait segment. In machine vision, the variation of pixel intensity is used to obtain the movement information of objects between adjacent frames. In this paper, a force flow method is proposed to extract the force transfer information of gait.
First, the force state of different positions can be obtained by sorting the pressure value of different positions on the sole from small to large. The calculation formula is shown in Eq. (2).
$$k_n = \mathrm{sort}(s_n), \quad n = 1, 2, \dots, 16 \qquad (2)$$
where sort(·) refers to the descending sort operator, $s_n$ refers to the VGRF of the $n$-th pressure sensor at a given moment, and $n$ ranges from 1 to 16. $k_n$ is the resulting rank of that sensor among the pressure values of the 16 pressure sensors at that moment. Suppose there is a pixel with coordinate $(x, y)$ at time $t$, and its rank is $k(x, y)$. After a time interval $\Delta t$, there is $k(x + \Delta x, y + \Delta y) = k(x, y)$. This means that the force state flows from $(x, y)$ to $(x + \Delta x, y + \Delta y)$, giving the motion vector $(\Delta x, \Delta y, \Delta t)$. Projecting it onto the $xy$ plane gives the two-dimensional motion vector $(\Delta x, \Delta y)$. Combining the motion vectors at different positions forms the motion field.
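The ranking and displacement steps of the force flow idea can be sketched as below. This is one plausible reading of the description, not the authors' implementation: the function name `force_flow` is hypothetical, the descending rank order follows the stated sort operator and is an assumption, and the three-sensor layout is a toy placeholder (the real 16 sensor coordinates are those of Fig. 1).

```python
import numpy as np

def force_flow(frames, coords):
    """Track where each force level sits on the sole from frame to frame.

    frames: (T, N) pressure values of N sensors over T time steps
    coords: (N, 2) sensor positions (x, y)
    returns: (T - 1, N, 2) displacement (dx, dy) of every force level
             between consecutive frames
    """
    frames = np.asarray(frames, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # order[t, k] = index of the sensor holding the k-th largest pressure at time t
    # (descending order is an assumption; ties keep the original sensor order)
    order = np.argsort(-frames, axis=1, kind="stable")
    positions = coords[order]          # (T, N, 2): location of each force level
    return np.diff(positions, axis=0)  # motion vectors between adjacent frames

# Toy example with 3 sensors at placeholder coordinates
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
frames = np.array([[3.0, 2.0, 1.0],
                   [1.0, 3.0, 2.0]])
flow = force_flow(frames, coords)
print(flow.shape)  # (1, 3, 2)
```

For a 100-sample segment with 16 sensors, the result has shape (99, 16, 2): sixteen displacement signals along x and sixteen along y, which is the kind of two-channel input a 2D convolution can process.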
The motion field can be written as
$$\vec{A}(M) = P(x, y)\,\vec{i} + Q(x, y)\,\vec{j}$$
where $\vec{A}(M)$ refers to the motion field, and $P(x, y)$ and $Q(x, y)$ represent the movement functions of the force points along the $x$ direction and the $y$ direction, with $\vec{i}$ and $\vec{j}$ the corresponding unit vectors. The force transfer of each position on the sole between two frames is represented in the form of a motion vector. The motion vector is decomposed into two components along the $x$-axis and $y$-axis directions, respectively. As the sixteen levels of force points flow through the multiple-frame data, sixteen dynamic temporal signals along the $x$ direction and sixteen dynamic temporal signals along the $y$ direction are formed. These are then spliced into a two-dimensional signal. In this paper, parallel 2D-Convnets are used to extract the dynamic temporal features from these signals.
3) Lateral Connections:
In this paper, lateral connections are used to fuse the information of the two pathways. Lateral connections are a common technique in target detection tasks for merging different levels of spatial resolution and semantics. Similar to [46], [47], we add a lateral connection between the two pathways at every stage; these connections are placed right after SELU1, Pool1, and SELU2, respectively. The two pathways have different information dimensions but the same number of channels at the same position, so the lateral connection matches the feature sizes of the corresponding channels to fuse the data of the two pathways. Denoting the feature shape of the Static pathway as {H, C}, the feature shape of the Dynamic pathway is {αW, 2, C}. We reshape and transpose {αW, 2, C} into {2αW, C}, and the output of the lateral connections is then fused into the Dynamic pathway by concatenation.
4) Sensor Attention:
In this paper, an attention mechanism is introduced to assign larger weights to the sensor signals that contribute most to classification [48]. The global spatial information is squeezed into a channel descriptor, and the channel-wise statistics are generated by global average pooling.
Each sensor channel produces a statistic $z \in \mathbb{R}^{c}$ by shrinking $U$, such that the $c$-th element of $z$ is computed by:
$$z_c = F_{sq}(u_c) = \frac{1}{L} \sum_{i=1}^{L} u_c(i)$$
where $F_{sq}(\cdot)$ refers to the Squeeze operation, $u_c$ refers to the $c$-th feature, and $L$ denotes the length of $u_c$. To generate weights for each sensor channel, the nonlinear relationship between the channels is captured. Considering the complexity and generalization of the model, two fully connected (FC) layers and two activation functions are used to quantify the importance of each channel, expressed as:
$$s = F_{ex}(z, W) = \sigma\left(W_2\, \delta(W_1 z)\right)$$
where $F_{ex}(\cdot)$ refers to the excitation operation, $\sigma(x)$ is the Sigmoid activation function, and $\delta(x)$ is the ReLU activation function.
$W_1$ refers to the parameters of the dimension-reducing layer, and $W_2$ refers to the parameters of the dimension-increasing layer. The input channel is multiplied by the corresponding weight to obtain the final output.
$$\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$$
where $\tilde{X} = [\tilde{x}_1, \tilde{x}_2, \cdots, \tilde{x}_c]$ and $F_{scale}(u_c, s_c)$ is the channel-wise multiplication between the scalar $s_c$ and the feature map $u_c \in \mathbb{R}^{50}$.
5) Fully Connected Layer:
The fully connected layer is used to learn the spatial features of different channels. In this paper, the outputs of the 34 parallel networks are flattened and concatenated into a one-dimensional vector, which is connected to the output layer through two fully connected layers. For the task of PD detection, the output layer consists of 1 neuron that predicts the classification probability. For the task of severity prediction, the output layer consists of 5 neurons that predict the category. The final hyper-parameters of the network are given in Table IV.
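Returning to the sensor attention step, its squeeze, excitation, and scaling operations can be sketched in a few lines. This is a minimal NumPy illustration in the style of squeeze-and-excitation attention, not the trained module: the function name, the reduction ratio of 2, and the random weights are assumptions, while the 18 channels and the feature length of 50 follow the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sensor_attention(U, W1, W2):
    """Squeeze-and-excitation style channel weighting.

    U:  (C, L) one feature map of length L per sensor channel
    W1: (C // r, C) weights of the dimension-reducing FC layer
    W2: (C, C // r) weights of the dimension-increasing FC layer
    returns: (re-weighted features of shape (C, L), channel weights of shape (C,))
    """
    z = U.mean(axis=1)                         # squeeze: global average pooling
    s = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))  # excitation: FC, ReLU, FC, sigmoid
    return s[:, None] * U, s                   # scale each channel by its weight

# 18 sensor channels with feature maps of length 50; reduction ratio r = 2 is hypothetical
rng = np.random.default_rng(0)
C, L, r = 18, 50, 2
U = rng.standard_normal((C, L))
W1 = 0.1 * rng.standard_normal((C // r, C))
W2 = 0.1 * rng.standard_normal((C, C // r))
X_tilde, weights = sensor_attention(U, W1, W2)
print(X_tilde.shape, weights.shape)  # (18, 50) (18,)
```

Because the weights pass through a sigmoid, every channel is scaled by a factor between 0 and 1, so less informative sensors are attenuated rather than removed.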

IV. EXPERIMENT ANALYSIS

A. Performance Analysis for PD Diagnosis
To test the proposed deep neural network (DNN) model, we used ten-fold cross validation on 300 walks, with the dataset divided into ten parts. Each part in turn is reserved for testing while the remaining nine are used to train the diagnostic model, and the average of the ten rounds is used as the final result. Fig. 5b shows that the average loss decreases and tends toward zero as the number of iterations increases. The loss largely stabilizes after about 20 iterations and is stable once the number of iterations reaches 30, indicating that the model converges fairly quickly. Fig. 5a shows that when the number of iterations was 22, the classification accuracy at the segment level was 91.6%, indicating that the parameters of the network reached the optimal state. The subject-level diagnosis was obtained by a majority vote over the classifications of that subject's gait segments, and our proposed model achieved an accuracy of 96.7% at the subject level, which shows that the model can effectively detect patients with PD. Moreover, both the sensitivity and the specificity were high.
Finally, Table V presents the comparison of the proposed model with other studies that used the same dataset. As shown in this table, the present model achieved better accuracy and sensitivity than the other models. For instance, compared to the detection algorithms proposed by Nguyen et al. [33] and Özel et al. [30], both the classification accuracy and the sensitivity are improved. The main reason can be attributed to the extraction of the force transfer features, which offers a new idea for the detection of abnormal gait. In comparison to the diagnostic models reported by Zhao et al. [34] and Khoury et al. [49], our algorithm processes the input 1D signals independently, which makes it easier to adapt to other clinical gait studies. Compared to the classification algorithm reported by Ertugrul et al. [29] in 2016, our model is more suitable for the gait classification problem because it uses multiple nonlinear activation functions. Thus, our model achieves better performance on the validation set than hand-crafted methods. In addition, we conducted experiments on the Static-Dynamic networks, the Static-only network, and the Dynamic-only network. The results indicate that the Static and Dynamic pathways are complementary, in that their fusion clearly improves on both.
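The evaluation protocol above, with subject-disjoint folds and a subject-level majority vote, can be sketched as follows. This is an illustrative sketch only: the helper names, the round-robin fold assignment by subject identifier, and the tie rule (a subject is labeled PD only if more than half of its segments are) are assumptions not specified in the text.

```python
import numpy as np

def subject_folds(subject_ids, n_folds=10, seed=0):
    """Give every segment the fold index of its subject, so that no subject's
    segments are split between training and test data (hypothetical helper)."""
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids).tolist()
    rng = np.random.default_rng(seed)
    shuffled = [subjects[i] for i in rng.permutation(len(subjects))]
    fold_of = {s: i % n_folds for i, s in enumerate(shuffled)}
    return np.array([fold_of[s] for s in subject_ids.tolist()])

def majority_vote(segment_preds, subject_ids):
    """Return {subject: label}; label is 1 if more than half of the subject's
    segments were classified as 1 (tie rule is an assumption)."""
    segment_preds = np.asarray(segment_preds)
    subject_ids = np.asarray(subject_ids)
    result = {}
    for s in np.unique(subject_ids).tolist():
        votes = segment_preds[subject_ids == s]
        result[s] = int(votes.mean() > 0.5)
    return result

# Toy example: 7 segment-level predictions from 2 subjects
ids = np.array([0, 0, 0, 1, 1, 1, 1])
preds = np.array([1, 1, 0, 0, 1, 0, 0])
print(majority_vote(preds, ids))  # {0: 1, 1: 0}
```

Voting over segments explains why subject-level accuracy can exceed segment-level accuracy: isolated segment errors are outvoted by the remaining segments of the same subject.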

B. Performance Analysis for the Severity Assessment of PD
The Static-Dynamic temporal model has shown good performance in the evaluation of PD, but to provide targeted treatment for different PD patients and to monitor recovery, the severity of PD is also assessed in this paper based on the UPDRS score. Because the UPDRS score is directly proportional to the severity of PD, we subdivided the severity of PD into the following five categories:
• Scale 1: UPDRS < 5
• Scale 2: 5 ≤ UPDRS < 15
• Scale 3: 15 ≤ UPDRS < 25
• Scale 4: 25 ≤ UPDRS < 35
• Scale 5: 35 ≤ UPDRS
The severity assessment model retained the network structure of the diagnostic model and only modified the last layer of the network, which is composed of a fully connected layer with five neurons and a softmax activation layer. To show the performance of the severity assessment model, we present the plots of accuracy and loss during model training in Fig. 5. Confusion matrices are commonly employed to analyze the performance of multi-class classification models. The rows of the confusion matrix indicate the model's predictions and the columns indicate the actual labels. A cell whose row and column refer to the same class counts correct predictions, whereas a cell whose row and column refer to different classes counts wrong predictions. Moreover, the final column of the confusion matrix represents the precision, the final row represents the recall, and the lower right cell represents the overall accuracy of the prediction. The confusion matrix obtained by our model is shown in Fig. 6, from which several observations can be made. First, the overall accuracy of the severity assessment of PD reached 92.3%. Second, the subjects in Class 2 obtained 100% in precision, recall, and F1, which indicates that these subjects were more distinguishable than those in all other classes. Conversely, the subjects in Class 5 achieved a precision of 92.5% and a recall of 86.0%, suggesting that they were not as easily distinguishable. To the best of our knowledge, this is the second study to assess severity by gait analysis based on UPDRS scores. Table VI presents the comparison between the proposed model and the method reported by Maachi et al. [50] in 2020. In addition, Fig. 7 illustrates the boxplots of the precision, recall, and F1 score distributions for the two methods. The results show that our proposed method performs better. The main reason is that our method extracts not only the time series information of the signal but also its force transfer information. The combination of "static" and "dynamic" time series features made the model more competitive.
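The severity binning and the confusion matrix reading described in this subsection can be reproduced with a short sketch. The bin edges come from the five scales listed above, and the row and column convention (rows are predictions, columns are actual labels) follows the text; the function names and the 2-class toy matrix are assumptions for illustration, and the matrix values are not results from the paper.

```python
import numpy as np

UPDRS_EDGES = [5, 15, 25, 35]  # lower bounds of Scales 2 to 5

def updrs_to_scale(score):
    """Map a UPDRS score to the severity scale 1..5 defined in the text."""
    # np.digitize returns how many edges are <= score (intervals closed on the left)
    return int(np.digitize(score, UPDRS_EDGES)) + 1

def per_class_metrics(cm):
    """cm[i, j] = number of samples predicted as class i whose actual class is j.

    Assumes every class is predicted and present at least once,
    otherwise the divisions below are undefined.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    precision = correct / cm.sum(axis=1)  # over everything predicted as the class
    recall = correct / cm.sum(axis=0)     # over everything actually in the class
    accuracy = correct.sum() / cm.sum()
    return precision, recall, accuracy

print([updrs_to_scale(s) for s in (4, 5, 14.5, 25, 40)])  # [1, 2, 2, 4, 5]

# Toy 2-class confusion matrix (made-up counts)
precision, recall, accuracy = per_class_metrics([[8, 2], [1, 9]])
print(precision, recall, accuracy)
```

With rows as predictions, row sums give the denominators for precision and column sums those for recall, which matches the placement of precision in the final column and recall in the final row of Fig. 6.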

V. CONCLUSION
This paper presents Static-Dynamic temporal networks based on VGRF time series signals for PD detection and severity prediction. First, since each sensor collects pressure signals from a different part of the sole and each pressure signal has its own deep features, we use parallel one-dimensional convolutional networks to convolve the time series data and extract temporal features. Second, we regard the transfer of the force points at all levels on the sole as a multiple dynamic time series signal, which is processed by parallel two-dimensional convolutional networks to extract the motion features. Finally, we introduce an attention mechanism to weight the individual sensor signals. In validation, the model obtains state-of-the-art accuracy for PD detection and severity prediction. We hope that this Static-Dynamic time series concept will foster further research in gait recognition.