Gait Intention Prediction Using a Lower-Limb Musculoskeletal Model and Long Short-Term Memory Neural Networks

The prediction of gait motion intention is essential for achieving intuitive control of assistive devices and diagnosing gait disorders. To reduce the cost associated with using multimodal signals and signal processing, we proposed a novel method that integrates machine learning with musculoskeletal modelling techniques for the prediction of time-series joint angles, using only kinematic signals. Additionally, we hypothesised that a stacked long short-term memory (LSTM) neural network architecture can perform the task without relying on any ahead-of-motion features typically provided by electromyography signals. Optical cameras and inertial measurement unit (IMU) sensors were used to track level gait kinematics. Joint angles were modelled using the musculoskeletal model. The optimal LSTM architecture in fulfilling the prediction task was determined. Joint angle predictions were performed for joints on the sagittal plane, benefiting from joint angle modelling using signals from optical cameras and IMU sensors. Our proposed method predicted the upcoming joint angles in the prediction time of 10 ms, with an averaged root mean square error of 5.3° and a coefficient of determination of 0.81. Moreover, in support of our hypothesis, the recurrent stacked LSTM network demonstrated its ability to predict intended motion accurately and efficiently in gait, outperforming two other neural network architectures: a feedforward MLP and a hybrid LSTM-MLP. The method paves the way for the development of a cost-effective, single-modal control system for assistive devices in gait rehabilitation.

conditions lead to a remarkable reduction in independence and participation, exacerbating the loss of quality of life [1], [2]. For the ageing population, gait impairments are also a potential cause of falls and other metabolic and cardiovascular diseases [3]. Assistive devices have emerged as a promising solution to preserve, regain and enhance gait performance. In the design of such a device (e.g., neuromuscular electrical stimulation [4], [5], robotic exoskeleton [6], [7] or powered prostheses [8]), prediction of motion intention is a fundamental task to achieve intuitive control. Moreover, motion intention prediction during gait can have implications for medical diagnosis [9], providing insights into disease progression, prevention and rehabilitation.
There is evidence that ambulatory motions in gait are periodic and can be described in terms of a single gait cycle [10]. Knowing that gait is periodic is a strong cue for motion recognition. In addition, the periodic motion could be estimated by tracking [11]. Therefore, the question is whether motion intention can be predicted through the measurement and extraction of cyclic gait signals.
In the current literature, a variety of gait analysis and machine learning techniques have been proposed to predict motion intention using multimodal gait signals, including combinations of kinematic/kinetic and electromyography (EMG) signals. For example, researchers have proposed the use of EMG signals and kinetic signals (from the force plate [12] or built-in force sensors [13], [14]) to recognise a variety of locomotion modes, while others integrated the use of EMG signals and kinematic signals (from motion sensors [15] or inertial measurement unit (IMU) sensors [12], [16]) to predict joint angles. The use of advanced artificial neural network architectures as surrogate models has developed rapidly, due to their demonstrated high accuracy made possible by novel network designs and training methods. Among these architectures, long short-term memory (LSTM) [17] has been reported to obtain increasingly impressive performance in various challenging prediction problems. LSTM architectures are customarily trained via backpropagation through time. Compared to standard feedforward neural networks, such as the multilayer perceptron (MLP) [18], the LSTM enables the learning of long-term temporal dependencies by using a specialised structure of neural units, which selectively remember or forget information from previous time steps. These characteristics make it particularly effective and robust when applied to time series of biological signals [16].
Despite successful applications of multimodal signals in gait intention prediction, they require costly sensing modalities (such as EMG sensors, IMU sensors or force sensors) and signal processing for feature extraction and recognition, leading to a technically complex and computationally expensive modelling solution. Other studies using kinematic signals alone have shown promising results in predicting discrete information in gait. For example, the use of only IMU data has been proposed to predict the upcoming terrain transition [19] and locomotor transition for the amputee population [20], while Apapicco et al. [21] proposed a generalised method to predict the upcoming stride for the normal population. However, attempts at using only kinematic signals to provide continuous time-series information in gait (e.g., joint angles) are still lacking.
Fig. 1. A workflow to test our hypothesis that a stacked long short-term memory (LSTM) neural network architecture enables gait intention prediction. (I) Gait motion was tracked using optical cameras and IMU sensors; (II) lower limb joint angles were modelled using musculoskeletal modelling; (III) the joint angles modelled using IMU sensors were assessed, with the joint angles modelled using optical cameras considered as the gold standard; and (IV) the performance of artificial neural networks in the gait motion prediction was tested.
Given the need to predict motion intention and reduce the cost due to sensors and data processing, our study developed a novel method that integrates machine learning with musculoskeletal modelling techniques to predict the upcoming time series of joint angles, using only kinematic signals. Our hypothesis was that a stacked long short-term memory (LSTM) neural network architecture can predict time-dependent gait motion accurately and efficiently. Two motion tracking techniques, namely optical cameras and IMU sensors, were applied. In addition, the lower limb joint angles were modelled using musculoskeletal modelling techniques, which have been shown to enhance the accuracy and reliability of joint kinematics tracking [22]. The purposes of the study, therefore, were to: (1) track kinematic signals using optical cameras and IMU sensors; (2) model the lower limb joint angles based on musculoskeletal modelling; (3) assess the performance of IMU-based tracking, with the optical camera-based tracking as the gold standard; and (4) assess the performance of a stacked LSTM and other neural network structures in predicting intended joint angles during gait. The workflow for testing our hypothesis is shown in Fig. 1.
Preliminary results of this work were presented at a conference [23]. This paper presents an expanded study that builds upon our preliminary research in the following ways: (1) the training dataset has been increased in size; (2) the architecture of the LSTM network has been optimised using the enlarged dataset; (3) the study has investigated the use of optical cameras in addition to IMU sensors; and (4) the correlation between the performance of intention prediction and gait periodicity has been assessed. The rest of the paper is organised as follows: detailed methods of motion tracking, modelling and prediction are given in Section II; experimental results are presented in Section III; and the discussion and conclusion of the research work are given in Sections IV and V, respectively.

II. METHODS AND MATERIALS

A. Gait Data
Six healthy subjects (4M/2F, mean ± SD; age of 22.8 ± 0.4 years; height of 168.7 ± 5.6 cm; body mass of 55.5 ± 7.7 kg) without any musculoskeletal disorders or recent lower-limb injuries were recruited. Ethical approval and informed consent were obtained from the University of Birmingham.
Gait experiments were conducted in the University of Birmingham Biomechanics Laboratory, equipped with eight Vicon Vantage Cameras (V5, Vicon, UK, 100 Hz) and wireless IMU sensors (Trigno Avanti, Delsys, USA, 2000 Hz). Reflective markers (14 mm diameter) were placed bilaterally on the bony landmarks of the second and fifth metatarsal heads, posterior calcaneus, medial and lateral malleoli, medial and lateral femoral condyles, anterior superior iliac spine and posterior superior iliac spine. Clusters of four markers were also placed bilaterally on the shanks and thighs. Seven IMU sensors were placed on the pelvis and lower limb segments. For the thighs, IMU sensors were placed on the marker clusters (Fig. 2).
Subjects were asked to stand still in an anatomical position, followed by level-ground walking at a self-selected, comfortable walking speed (1.18 ± 0.06 m/s) along an eight-metre straight walkway. At least five walking trials were repeated from the same starting point. Marker trajectories and nine-axis IMU data were acquired simultaneously to enable a comparison between IMU-based tracking and conventional, camera-based tracking.
Raw marker data pre-processing was performed within Vicon Nexus (2.12.1, Vicon, UK), which included steps of labelling, gap-filling, and smoothing using a zero-phase-lag, fourth-order Butterworth filter with a cut-off frequency of 6 Hz. Gait cycles (the time interval between successive heel strikes) were detected using the reflective marker at the posterior calcaneus [24]. A built-in filter within the Trigno Avanti sensor was applied to calculate body segment orientations using nine-axis IMU data. The marker trajectories and body segment orientations were then converted to .trc and .sto files, respectively, which are compatible with OpenSim, an open-source musculoskeletal modelling platform for joint kinematics modelling [25].
Fig. 2. Reflective markers were placed on the bony landmarks of the second and fifth metatarsal heads, posterior calcaneus, medial and lateral malleoli, medial and lateral femoral condyles, anterior superior iliac spine and posterior superior iliac spine. Clusters of four markers each were also placed bilaterally on the shanks and thighs. Seven IMU sensors were placed on the pelvis and the segments of feet, shanks and thighs. For the thighs, IMU sensors were placed on the marker clusters. (C) Each IMU has a global reference system defined as x pointing towards the global East; y pointing towards the global North Pole; and z pointing upwards, perpendicular to x and y. These axes should be aligned as closely as possible with the axes of the body segment coordinate system as defined in [49], [50]. The initial IMU orientation with respect to the global reference frame is determined at the anatomical position using a built-in filter within the Trigno Avanti sensor.

B. Musculoskeletal Modelling to Model Joint Kinematics
A generic, full-body musculoskeletal model in OpenSim (Version 4.3, USA) was applied [26]. Some modifications were implemented, including the removal of musculotendon actuators and the locking of the subtalar and metatarsophalangeal joints. This resulted in a five-degree-of-freedom lower limb joint kinematics model, comprising hip rotation, hip flexion, hip adduction, knee flexion and ankle dorsiflexion.
Camera-based joint kinematics modelling began with scaling the generic model to match the anthropometry of each subject using the OpenSim Scaling tool. The scaling process utilised the marker trajectories from the standing calibration trials. The anatomical bony landmarks were assigned a tracking weight of 1000, prioritising their trajectories for scaling. Other markers, such as those on the thigh and shank clusters, were assigned a weight of 1. Subsequently, the lower limb joint angles were modelled using the Inverse Kinematics (IK) tool, aiming to minimise the squared distances between the virtual marker trajectories from the scaled model and the measured marker trajectories from the subject.
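The marker-tracking objective described above can be written compactly as a weighted least-squares problem. The following is a sketch of that standard formulation, not a statement of the exact tool settings used in this study: the symbols $q$ (generalised coordinates), $w_m$ (marker weights), $x_m^{\mathrm{exp}}$ (measured marker position) and $x_m(q)$ (virtual marker position on the scaled model) are introduced here for illustration, and the text only specifies the weights of 1000 and 1 for the scaling step.

```latex
\min_{q} \; \sum_{m \in \text{markers}} w_m \,
  \bigl\lVert x_m^{\mathrm{exp}} - x_m(q) \bigr\rVert^{2}
```

The problem is solved independently at each time frame, so a heavily weighted marker pulls the solution towards its measured trajectory more strongly than a lightly weighted one.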
The OpenSense toolkit, integrated into the OpenSim platform, was used to generate the IMU-based joint angles [27]. This process involved the initial step of calibration, followed by the computation of IK. During calibration, the IMU orientations obtained from the calibration trial were first used to register each IMU sensor to its corresponding body segment in the musculoskeletal model. In addition, the heading direction of the IMU sensor on the pelvis was set as the target heading direction, resulting in the alignment of all IMU sensors to point anteriorly along the anteroposterior axis of the musculoskeletal model. Following calibration, the joint angles were computed iteratively until the angular errors between the virtual IMU orientations and the measured IMU orientations were minimised. Since OpenSense currently interfaces with only two IMU suppliers (i.e., Xsens and APDM), a custom-built data adapter was developed. This adapter exports essential information, including orientations, linear accelerations, angular velocities, magnetic headings and sampling frequency, from Trigno Avanti sensors to the OpenSense workflow. Moreover, it facilitates an automatic registration of IMU sensors to their corresponding body segments in the musculoskeletal model. The open-source code of the IMU data adapter is available at https://simtk.org/projects/imu2opensense/.

C. Assessment of Modelled Joint Kinematics
To mitigate the interference of the ferromagnetic disturbances present in the laboratory environment on the IMU orientation estimation, a pre-screening process was implemented. Following an approach similar to that recommended by the OpenSense toolkit, we conducted the pre-screening as follows: first, if the difference in the heading direction between the calibration trial and the walking trial exceeded a threshold of 45°, poor estimation of IMU orientations was indicated, leading to the exclusion of the walking trial; second, if the difference exceeded a threshold of 30° within a 60 ms duration of a gait cycle, unrealistic variability was indicated, leading to the exclusion of the corresponding gait cycle. As a result of the pre-screening, four gait cycles remained for each subject.
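The two pre-screening rules can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the per-sample comparison against the calibration heading, and the assumed 100 Hz heading sample rate are all hypothetical choices made for illustration, since the text does not specify how the heading differences were aggregated.

```python
def heading_difference_deg(a, b):
    """Smallest absolute difference between two headings, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def trial_is_excluded(calibration_heading, trial_headings, threshold_deg=45.0):
    """Rule 1 (assumed per-sample check): exclude the walking trial if any
    heading differs from the calibration heading by more than the threshold."""
    return any(
        heading_difference_deg(calibration_heading, h) > threshold_deg
        for h in trial_headings
    )


def cycle_is_excluded(cycle_headings, sample_rate_hz=100.0,
                      window_ms=60.0, threshold_deg=30.0):
    """Rule 2: exclude a gait cycle if the heading changes by more than the
    threshold between any two samples that are at most `window_ms` apart."""
    max_lag = int(round(window_ms * sample_rate_hz / 1000.0))
    n = len(cycle_headings)
    for start in range(n):
        for lag in range(1, max_lag + 1):
            if start + lag >= n:
                break
            change = heading_difference_deg(cycle_headings[start],
                                            cycle_headings[start + lag])
            if change > threshold_deg:
                return True
    return False
```

Headings are wrapped to the range 0 to 180° before comparison so that, for example, 350° and 10° are treated as 20° apart rather than 340°.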

D. Intention Prediction Task
The intention prediction task was performed for each subject. For each subject k, joint angles were modelled using musculoskeletal modelling for each single time step in a gait cycle and then concatenated to form a long vector $Z_{ki} = [z_{ki}(1), z_{ki}(2), \ldots, z_{ki}(t)]^{T}$. Here, t represents all the time steps in four gait cycles of the subject k, and i is the lower limb joint. Next, the vector $Z_{ki}$ was normalised based on its mean ($\mu_{ki}$) and standard deviation ($\sigma_{ki}$) as: $\hat{Z}_{ki} = (Z_{ki} - \mu_{ki}) / \sigma_{ki}$. Gait intention prediction was attempted at different time lengths τ, from a minimum of τ = 10 ms to a maximum of τ = 100 ms, in discrete steps of α = 10 ms. In detail, τ = α × P, where P ∈ {1, 2, 3, ..., 10}. This selection was made to meet the 10 ms real-time assistive device control limit, ensuring precise synchronisation with the intended movement [28]. In addition, the time range τ enabled a comparison between our method and the one using EMG signals, the latter typically predicting upcoming motion within a window τ ∈ [10, 100] ms. One predictor, that is one neural network, was used for each joint; this predictor took the current joint angle at time step j and was trained to output the joint angle at time j + P. In detail, for each subject k, the input to the ith (i ∈ {1, 2, 3, ..., 5}) neural network for the ith joint angle prediction at time step j was $X_{i} = [\hat{z}_{ki}(j)]^{T}$, and the expected output was $Y_{i} = [\hat{z}_{ki}(j + P)]^{T}$.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
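The normalisation and the pairing of each current sample with the sample P steps ahead can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in the study: the function names are hypothetical, and the use of the population standard deviation is an assumption, as the text does not state which estimator was applied.

```python
from statistics import mean, pstdev


def normalise(angles):
    """Z-score a joint-angle series; returns the normalised series, mean and SD.
    Assumption: population standard deviation (the source does not specify)."""
    mu = mean(angles)
    sigma = pstdev(angles)
    if sigma == 0.0:
        raise ValueError("constant series cannot be normalised")
    return [(a - mu) / sigma for a in angles], mu, sigma


def horizon_in_steps(tau_ms, alpha_ms=10.0):
    """Convert a prediction time tau into a whole number of steps P (tau = alpha * P)."""
    return int(round(tau_ms / alpha_ms))


def make_pairs(series, horizon_steps):
    """Pair the sample at step j (input) with the sample at step j + P (target)."""
    return list(zip(series[:-horizon_steps], series[horizon_steps:]))
```

For a series of t samples and a horizon of P steps this yields t − P training pairs, so longer prediction times leave slightly fewer pairs per gait cycle.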

E. Neural Network Models
A standard neural network architecture usually consists of one layer of input units (neurons), one or more layers of hidden units, and one layer of output units [18]. The input layer consists of one unit for each element of the input vector, which in this case is $X_i$. The input layer commonly acts as a buffer, fanning the input data out to the first layer of hidden units. The hidden and output layers of units sequentially process the data. The output units provide the desired system response, in this case, the elements of the $Y_i$ vector of the predicted joint angle. In a 'feedforward' architecture, the information flows unidirectionally from the input to the output layer. Feedforward neural networks are essentially nonlinear models that perform a static mapping f between the input vector and the output vector, that is Y = f(X).
In this study, two types of artificial neural network architectures are considered, the feedforward MLP and the recurrent LSTM. Whilst the MLP is known to be an extremely versatile neural network model thanks to its universal approximator capability [29], the LSTM has gained increasing popularity for overcoming the vanishing gradient problem which affects the training of standard recurrent structures [17], [30]. The two architectures differ by the type of hidden units used, respectively perceptrons and LSTM units, and both may be composed of several layers of hidden neurons.
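For reference, the widely used formulation of a single LSTM unit is given below. It is included only to make the "remember or forget" mechanism concrete; the symbols ($x_t$ input, $h_t$ hidden state, $c_t$ cell state, $W$, $U$, $b$ learned parameters, $\sigma$ the logistic function, $\odot$ the element-wise product) are introduced here, and the exact variant implemented by the software used in this study may differ in detail.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Because the cell state $c_t$ is updated additively, gradients can propagate over many time steps without vanishing as quickly as in a plain recurrent unit.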
K-fold non-nested (flat) cross-validation (K = 4) was used to assess their performance. Specifically, modelled joint angles for each subject were divided into a training set, consisting of three gait cycles, and a validation/test set containing the remaining cycle. The training and validation/test sets were mutually exclusive. The validation/test set was used to evaluate the accuracy of candidate LSTM and MLP neural network structures, and for a final test of the expected accuracy once the structure was optimised. The optimisation process was repeated four times, with a different gait cycle used for validation/test in each repetition. Flat cross-validation might lead to an optimistic bias in the evaluation of the expected performance. Unfortunately, the available data set was not large enough to permit one further split of the training data to perform nested cross-validation. The accuracy of prediction was assessed using the coefficient of determination ($R^2$), a dimensionless metric ranging between 0 and 1, indicating the strength of the relationship between the modelled and predicted joint angles. The $R^2$ values averaged over the 4-fold validation were compared among six subjects using a paired-sample t-test. All training and test trials were run on MATLAB (2018b; The MathWorks Inc., USA) using the BlueBEAR high-performance computing system.
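The leave-one-cycle-out split and the accuracy metric can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and $R^2$ is computed here as one minus the ratio of residual to total sum of squares, which is one common definition; the text does not state which definition was used, and this form can fall below 0 for poor predictions.

```python
def leave_one_cycle_out(cycles):
    """Yield (training cycles, held-out cycle) for each fold, one fold per cycle."""
    for k in range(len(cycles)):
        train = [c for i, c in enumerate(cycles) if i != k]
        yield train, cycles[k]


def r_squared(reference, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot
```

With four gait cycles per subject, the generator produces four folds, each training on three cycles and testing on the remaining one.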
The MLP network consisted of two hidden layers, each comprising 8 neural units and utilising the rectified linear unit (ReLU) activation function [29], [30]. The output layer also used the ReLU function and was responsible for generating the predicted joint angle. The MLP network architecture was optimised using grid search, varying the number of hidden layers from one to five, increasing the number of units per hidden layer from 8 ($2^3$) to 1024 ($2^{10}$), and keeping the smallest structure obtaining the highest performance (i.e., no larger structure yielded a statistically significantly higher $R^2$).
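The size of the MLP search space can be made explicit with a short enumeration. The sketch assumes that the unit counts were powers of two between $2^3$ and $2^{10}$ and that all hidden layers in a candidate shared the same width; both are readings of the text rather than confirmed details, and the variable names are hypothetical.

```python
from itertools import product

# Candidate depths: one to five hidden layers.
layer_counts = range(1, 6)
# Candidate widths: assumed powers of two from 2**3 = 8 to 2**10 = 1024.
unit_counts = [2 ** exponent for exponent in range(3, 11)]

# Each grid point is a (number of hidden layers, units per hidden layer) pair.
grid = list(product(layer_counts, unit_counts))
```

Under these assumptions the grid contains 5 × 8 = 40 candidate structures, ordered from the smallest (one layer of 8 units) to the largest (five layers of 1024 units).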
A similar experimental procedure was used to determine the optimal architecture for the stacked LSTM network. Namely, 15 initial structures composed of one hidden layer were created, starting with a small architecture of 10 hidden units and progressively increasing the size of the hidden layer in steps of 10 units, until reaching the largest tested configuration of 150 hidden units. Any two architectures giving statistically indistinguishable results were deemed equivalent, and the smallest architecture (60 hidden units) achieving top prediction accuracy ($R^2$) was kept. Subsequently, a second hidden layer was added on top of the optimised first layer. In this case too, the size of the second hidden layer was progressively increased from 10 to 150 units, and the smallest configuration (100 units) obtaining top accuracy was kept. The procedure was then repeated adding a third layer of LSTM units, but no configuration yielded a statistically significant performance improvement. Finally, three architectures of heterogeneous hidden layers were tested: a) adding one extra layer of perceptron units after the two optimised layers of LSTM units; b) adding two extra layers of perceptron units after the two optimised layers of LSTM units; and c) substituting the second layer of LSTM units with one layer of perceptron units. For all three heterogeneous architectures, the experimental procedure described above was followed to optimise the size of the perceptron layer(s). No statistically significant gain in performance was found from adding extra perceptron layers after one or two LSTM layers. Overall, the best-performing architecture was found to consist of two layers of 60 and 100 hidden LSTM units, respectively. Adding further layers or units typically caused a drop in accuracy, which suggested that the enlarged neural network model became over-parameterised and tended to overfit the data, i.e., it was able to learn the data noise in addition to the model structure. The optimal MLP and LSTM network architectures are shown in Fig. 3. The results of the structure optimisation trial, including the training and validation/test errors for the LSTM model, are provided in the supplementary file.
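The rule "keep the smallest architecture that is statistically indistinguishable from the best" can be sketched as below. This is one possible reading of the procedure, not the authors' code: the function names, the use of a two-sided paired t-test at the 5% level, and the hard-coded critical value for five degrees of freedom (six subjects) are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 5 degrees of freedom,
# i.e. paired comparisons over six subjects (assumed test settings).
T_CRIT_DF5 = 2.571


def paired_t_statistic(a, b):
    """Paired-sample t statistic for two equally long lists of per-subject scores."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0.0:
        return 0.0 if mean(diffs) == 0.0 else float("inf")
    return mean(diffs) / (sd / sqrt(len(diffs)))


def smallest_equivalent(scores_by_size, t_crit=T_CRIT_DF5):
    """Return the smallest size whose scores do not differ significantly
    from those of the size with the highest mean score."""
    best = max(scores_by_size, key=lambda size: mean(scores_by_size[size]))
    for size in sorted(scores_by_size):
        t = paired_t_statistic(scores_by_size[best], scores_by_size[size])
        if abs(t) < t_crit:
            return size
    return best
```

In use, `scores_by_size` would map each candidate layer size (10, 20, ..., 150) to the six per-subject $R^2$ values obtained for that size.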
Both the feedforward MLP and recurrent LSTM neural networks were trained using the Adam optimiser [31]. The hyperparameters of the Adam procedure were experimentally optimised: the initial learning rate was set to 0.008 and 0.005 for the MLP and LSTM networks, respectively, the learning rate drop factor was set to 0.2, the learning rate drop occurred every 125 epochs, and a batch size of 2048 was used. The running time of the Adam training procedure was modest: 941 s (about 15.7 minutes) for the optimised LSTM architecture and 28.9 s (about 0.5 minutes) for the optimised MLP architecture.
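The learning rate schedule described above amounts to a piecewise-constant decay. The sketch below shows the resulting rate per epoch; the function name and the zero-based epoch convention are assumptions, and the code illustrates the schedule only, not the Adam update itself.

```python
def learning_rate(epoch, initial_rate=0.005, drop_factor=0.2, drop_period=125):
    """Piecewise-constant schedule: multiply the rate by `drop_factor`
    once every `drop_period` epochs (epochs counted from zero)."""
    return initial_rate * drop_factor ** (epoch // drop_period)
```

With the LSTM settings, the rate stays at 0.005 for the first 125 epochs, then falls to 0.001, then to 0.0002, and so on; passing `initial_rate=0.008` gives the corresponding MLP schedule.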

F. Data Analysis and Statistics
The accuracy and reliability of the IMU-based motion tracking were first assessed by comparing the joint angles modelled using IMU sensors with the joint angles modelled using optical cameras. Poor tracking results were identified if the root mean square error (RMSE) was greater than 8° or the coefficient of determination ($R^2$) was less than 0.60. These threshold values were determined based on the best IMU-based gait analysis [22].
In addition to the coefficient of determination ($R^2$), the performance of the MLP and stacked LSTM networks in the prediction tasks was evaluated using additional metrics, namely the RMSE, the absolute error (the absolute difference between the modelled and the predicted joint angle at each time step) and the normalised RMSE (%, the percentage difference between the modelled and predicted joint angles over the whole gait cycle, normalised by the standard deviation). Again, their values, averaged over the 4-fold validation, were compared using the paired-sample t-test among six subjects. The similarity between gait cycles for each subject was assessed using the coefficient of multiple correlations (CMC, [32]), which quantifies waveform similarity in gait analysis.
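The three additional error metrics can be sketched as follows. The function names are hypothetical, and the normalised RMSE is assumed to be the RMSE divided by the population standard deviation of the modelled (reference) joint angles; the text states only that the normalisation used the standard deviation.

```python
from math import sqrt
from statistics import pstdev


def absolute_errors(reference, predicted):
    """Absolute difference between modelled and predicted angle at each time step."""
    return [abs(r - p) for r, p in zip(reference, predicted)]


def rmse(reference, predicted):
    """Root mean square error over the whole series."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return sqrt(sum(squared) / len(squared))


def normalised_rmse_percent(reference, predicted):
    """RMSE as a percentage of the reference standard deviation (assumed form)."""
    return 100.0 * rmse(reference, predicted) / pstdev(reference)
```

The absolute error is reported per time step, which is what allows it to be plotted against the gait cycle, whereas the RMSE and normalised RMSE summarise a whole cycle in one number.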
Pearson product-moment correlation coefficient (r ) was used to assess the relationship between gait kinematics variables and the prediction performance, where Pearson's r was

III. RESULTS
IMU-based motion tracking was found to be accurate and reliable on the sagittal plane when compared to camera-based motion tracking (Fig. 4). The average RMSE was 6.3° and the coefficient of determination was above 0.63 for all lower limb joints. The largest error was found for hip rotation (RMSE = 11.7°, $R^2$ = 0.17, TABLE I), followed by hip adduction (RMSE = 11.0°, $R^2$ = 0.24); therefore, modelled joint angles on the non-sagittal planes were excluded from the prediction of motion intention.
The LSTM network significantly outperformed the MLP (p < 0.05, Fig. 5) in predicting the intended motion with a prediction time of 10 ms. The average RMSE was 5.3°, and the average coefficient of determination ($R^2$) was 0.81 across all lower limb joints.
When using the stacked LSTM network with a prediction time of 10 ms, the largest absolute errors occurred during pre-swing (60-75% of the gait cycle, as shown in Fig. 6) at the hip and ankle joints, with errors of 4.8 ... The performance of the stacked LSTM network, as assessed by the normalised RMSE (%) and $R^2$, was found to have a strong correlation (0.67 ≤ |r| ≤ 0.88) with the gait similarity for our healthy cohort (N = 6), as assessed by the coefficient of multiple correlations (CMC, Fig. 7). The stacked LSTM performance decreased with prediction time: at the prediction time of 100 ms, the RMSE was up to 16.1° ($R^2$ < 0.40) from the IMUs and up to 16.6° ($R^2$ < 0.42) from the optical cameras (Fig. 8).

IV. DISCUSSION
This study is the first to demonstrate the prediction of motion intention throughout a gait cycle using only kinematic signals. Previous works only focused on discrete intent recognition or intent classification techniques to provide motion prediction at a discrete level. The use of musculoskeletal modelling techniques played a crucial role in obtaining accurate and reliable joint kinematics, which was key for intention prediction. Moreover, in support of the initial hypothesis, the stacked LSTM network demonstrated precise prediction of intended joint angles during gait, surpassing the performance of both the feedforward MLP and hybrid LSTM-MLP network architectures.
The most accurate IMU-based motion tracking studies showed an average error of less than 5° for lower limb joint angles in gait analysis [11], [27], [30] when compared to the camera-based motion tracking. Our study achieved similar accuracy on the sagittal plane, but lower accuracy on the non-sagittal planes (i.e., the hip adduction and rotation). Investigating the sources of errors, such as the IMU sensor as well as the filters used in the sensor, might help reduce the errors. For example, among these studies, IMU data were acquired using Xsens IMU sensors [11], [27], [30]. Customised filters using advanced sensor fusion algorithms, such as the Mahony filter, can also mitigate errors [22]. Our study, in addition, enabled subjects to walk overground at a self-selected pace, which resulted in a larger variation in velocity when compared to treadmill walking [33].
Fig. 6. The absolute error with the prediction time of 10 ms correlates with absolute angular velocity over time during gait. The joint angular velocity is derived by taking the time derivative of the joint angles modelled using IMU sensors (above) or optical cameras (below). The green solid line is the mean absolute error and the shaded area shows ±1 SD; the purple solid line is the mean absolute angular velocity across all subjects (N = 6). The vertical dashed line at 60% of the gait cycle divides the stance and swing phases. |r| is the Pearson product-moment correlation coefficient between the absolute angular velocity and the absolute error.
EMG signals are commonly used in motion intention prediction due to their ability to provide an ahead-of-motion feature. Specifically, EMG signals generated by lower limb muscles are detectable 10 to 100 ms prior to muscle tensions and the resulting motion [34]. Previous studies have shown that using EMG signals could result in prediction errors of less than 4° when forecasting upcoming knee flexion/extension angles during gait. The prediction time ranged from 27 ms to 50 ms [8], [16], with up to nine muscles' EMG signals across the knee being measured. Our study produced a comparable error (RMSE = 5.3°; $R^2$ = 0.81) with a prediction time of 10 ms, which is an acceptable RMSE in many gait rehabilitation applications [35]. More importantly, our method was effective in all lower limb joints without the need for additional surface EMG sensors. This will be greatly valuable in controlling neuromuscular electrical stimulation, an assistive device widely used for gait rehabilitation for people with neurological conditions. The presence of stimulation artefacts makes direct and continuous control via the use of EMG signals infeasible or difficult [36]. Utilising IMU sensors alone in motion tracking and prediction could also provide extra benefits, such as cost-effectiveness, wearability and ease of use, all of which meet additional criteria to achieve the desired outcome across a wide range of real-world scenarios [37]. However, since our performance was achieved in an experimental environment, further validation is necessary to enable functional implementation in real-world settings.
Stacked LSTM networks have been utilised in previous studies to extract features from EMG data to predict locomotion intention [38]. However, evidence to date that recurrent LSTM structures are superior to feedforward structures in terms of gait prediction accuracy is controversial. The results obtained in this study indicate a clear superiority of the recurrent LSTM models over the feedforward MLP counterparts. A possible explanation of this result is that the time dependencies of gait need to be explicitly encoded in the structure (input layer) of feedforward neural network models, whilst they are learned automatically by the recurrent structures. The difficulty of fully identifying these time dependencies might explain the poor results obtained by the MLP predictors.
Without relying on EMG signal acquisition and processing, our study used time-series joint kinematics calculated from the musculoskeletal model to predict the intended movement and reported the benefit of using a stacked LSTM network (Fig. 5). The capability of the stacked LSTM to provide prior information from past events has the potential to replace the need for acquiring EMG signals, making it a cost-effective solution to predict intended movement in many biomechanical applications.
Our study revealed that the angular velocity of each joint significantly affected the prediction performance of the proposed method (p < 0.001, Fig. 6). For natural walking speeds between 1.00 and 1.40 m/s, the peak angular velocity of each joint occurred at different gait phases. Our findings aligned with those of previous studies [39]. Furthermore, we observed that the peak of the instantaneous absolute error occurs concurrently with the peak of the angular velocity of the corresponding joint. This phenomenon was likely due to the fact that larger joint angular velocities corresponded to more intensive movements and required greater adaptability from the proposed method.
A periodic gait pattern has been identified in healthy adults, according to previous studies [40]. Our study quantified a similar gait periodicity, as assessed by the similarity of lower limb joint kinematics between gait cycles, in our young, healthy subjects. In addition, our study found a strong, positive correlation (0.67 ≤ |r| ≤ 0.88) between the network prediction performance and the gait periodicity. In gait rehabilitation, regaining a more periodic gait pattern is important [41]. By improving the repeatability and regularity of gait cycles, gait rehabilitation can help patients move more efficiently and reduce their risk of falls and other injuries. Our findings indicate that gait periodicity could be utilised to control assistive devices. Specifically, by leveraging machine learning techniques to learn the periodicity of the gait kinematic signals, neural networks can provide real-time biofeedback in such devices. It is also worth noting that our workflow could be further applied to other rehabilitation programs that involve lower limb or upper limb motions with periodic patterns, such as cycling or reaching tasks [42], [43]. By incorporating the motion intention prediction into the assistive device controller, patients are likely to regain these periodic motions, leading to better recovery from neurological or musculoskeletal conditions.
Our study found that as the prediction time increased, the performance of the neural network in making accurate predictions decreased (Fig. 8). This was because a longer prediction time led to a lower correlation between input and reference data [44], [45]. This finding was consistent with previous research [46]. We recommend a prediction time of 10 ms as it meets the needs of intuitive assistive device control and is feasible for real-time implementation. For the camera-based or IMU-based joint angle predictions, the neural network needs to acquire 10 ms of new kinematics data (marker or IMU data) at the sample rate of 100 Hz. In addition, there is a delay in the joint angle calculation based on musculoskeletal modelling, approximately 30 ms for the camera-based and 50 ms for the IMU-based motion tracking [22], [47]. Finally, the best-performing neural network (i.e., the stacked, two-layer LSTM) requires only 3.76 ± 0.41 ms for the intention prediction. This overall computation time, as calculated on a laptop with an AMD R7 (5000 series) CPU, is well below the suggested 300 ms threshold in the literature for predicting the user's intention and converting it into proper control input for the assistive device [48].
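The timing figures above can be combined into a simple latency budget. The Python sketch below assumes, for illustration only, that data acquisition, joint angle modelling and network inference run serially so that their delays add; the function and variable names are hypothetical, and the mean inference time (3.76 ms) is used without its spread.

```python
THRESHOLD_MS = 300.0  # suggested upper bound for intention prediction and control [48]


def total_latency_ms(acquisition_ms, modelling_ms, inference_ms):
    """Sum the three stage delays, assuming they run one after another."""
    return acquisition_ms + modelling_ms + inference_ms


# Stage delays taken from the text; serial execution is an assumption.
budget_ms = {
    "camera": total_latency_ms(acquisition_ms=10.0, modelling_ms=30.0, inference_ms=3.76),
    "imu": total_latency_ms(acquisition_ms=10.0, modelling_ms=50.0, inference_ms=3.76),
}

for modality, latency in budget_ms.items():
    margin = THRESHOLD_MS - latency
    print(f"{modality}: {latency:.2f} ms total, {margin:.2f} ms below the threshold")
```

Under this serial assumption the camera-based pipeline totals about 43.76 ms and the IMU-based pipeline about 63.76 ms, both well inside the 300 ms threshold.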
A number of limitations should be considered when interpreting the findings. The first limitation is the small sample size, as well as the homogeneity of our cohort, consisting exclusively of healthy participants. This limited the presence of a significant degree of inter-subject variation. The limited sample size also mandated the use of flat cross-validation, which might have led to an optimistic estimation of the expected performance. The second limitation arises from the simplicity of the predicted tasks, which only involved level walking at a comfortable speed. This simplicity may have contributed to the high performance observed in intra-subject prediction tasks, as it may not fully capture the complexity of walking tasks outside the experimental laboratory. Third, while our study demonstrated promising results in predicting continuous lower limb joint angles based solely on kinematic signals, additional efforts are required for clinical application. These efforts include the calibration of both the joint kinematics models and the LSTM model, utilising pathological gait kinematics specific to individual patients. Finally, our study only focuses on kinematics, encompassing measurement, modelling and prediction. Future work should also focus on kinetics, such as the ground reaction forces, joint moments, internal muscle forces and joint contact forces, to expand the applicability of the proposed method.

V. CONCLUSION
Our study is the first to achieve practical prediction of upcoming joint angles using only kinematic signals. This was achieved by integrating the musculoskeletal model with the LSTM model. The musculoskeletal model provided accurate and reliable joint kinematics tracking for both the optical camera and IMU sensor measurement techniques. Additionally, we proposed an optimal stacked LSTM architecture, surpassing the performance of both the feedforward MLP and hybrid LSTM-MLP network architectures. This architecture proved to be accurate and efficient in the intra-subject motion prediction task. Our proposed method provides a promising solution for designing a cost-effective assistive device controller and has implications for the diagnosis of gait disorders.

ACKNOWLEDGMENT
Anonymised supporting data are available on request. The data used in this research were collected subject to the informed consent of the participants. Access to the data will be granted in line with that consent, subject to approval by the project ethics board and under a formal Data Sharing Agreement. Computations were performed using the University of Birmingham's BlueBEAR HPC service, which provides a High-Performance Computing service to the University's research community.

Fig. 2. (A) and (B) Marker and IMU sensor placement. Reflective markers were placed on the bony landmarks of the second and fifth metatarsal heads, posterior calcaneus, medial and lateral malleoli, medial and lateral femoral condyles, anterior superior iliac spine and posterior superior iliac spine. Clusters of four markers each were also placed bilaterally on the shanks and thighs. Seven IMU sensors were placed on the pelvis and the segments of feet, shanks and thighs. For the thighs, IMU sensors were placed on the marker clusters. (C) Each IMU has a global reference system defined as x pointing towards the global East, y pointing towards the global North Pole and z pointing perpendicular to x and y, into the air. These axes should be aligned as closely as possible with the axes of the body segment coordinate system as defined in [49], [50]. The initial IMU orientation with respect to the global reference frame is determined at the anatomical position using a built-in filter within the Trigno Avanti sensor.

Fig. 3. Architecture of MLP (a) and LSTM (b) networks. The inputs X_i were the modelled joint angles using optical cameras and IMU sensors; the outputs were the predicted joint angles Y_i.
classified as: a weak correlation for |r| < 0.39; a moderate correlation for 0.40 ≤ |r| < 0.69; a strong correlation for 0.70 ≤ |r| < 0.89; a very strong correlation for 0.90 ≤ |r| < 1.00. Preliminary analyses were performed to ensure no violation of the assumptions of normality. Unless otherwise stated, an alpha level of 0.05 was used throughout to identify statistical significance. All analyses were conducted in MATLAB (2018b; The MathWorks Inc., USA).
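The correlation bands listed above can be expressed as a small lookup function. The sketch below is in Python rather than the MATLAB used for the analyses, and the function name `correlation_strength` is hypothetical. Because the stated bands leave small gaps (for example between 0.39 and 0.40) and stop short of |r| = 1.00, the sketch assumes that each lower bound acts as the cut point and that |r| = 1.00 counts as very strong.

```python
def correlation_strength(r):
    """Label a Pearson coefficient using the bands given in the text.

    Assumption: the lower bounds 0.40, 0.70 and 0.90 are treated as cut
    points, which closes the small gaps between the stated bands.
    """
    magnitude = abs(r)
    if not 0.0 <= magnitude <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.70:
        return "moderate"
    if magnitude < 0.90:
        return "strong"
    return "very strong"


# Example values; only the magnitude of r matters for the label.
labels = [correlation_strength(value) for value in (0.20, 0.67, 0.88, -0.95)]
```

With these cut points the example values map to weak, moderate, strong and very strong, respectively.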
° and 9.4° from IMUs and cameras at the hip, and 8.0° and 7.5° from IMUs and cameras at the ankle. The largest absolute error occurred during terminal swing (80-100% of the gait cycle) at the knee joint, with errors of 10.2° and 13.2° from IMUs and cameras, respectively. The absolute error was found to have a moderate to strong correlation (0.69 ≤ |r| ≤ 0.99) with the absolute angular velocity over time during walking (p ≤ 0.001).

Fig. 4. Comparison of optical cameras and IMU sensors in the modelled lower-limb joint angles. The solid line indicates the mean and the shaded area ±1 SD across all subjects (N = 6) during level walking at 1.18 ± 0.06 m/s.

Fig. 5. The performance of stacked LSTM and MLP networks in predicting lower limb joint angles (hip flexion, knee flexion and ankle dorsiflexion) with a prediction time of 10 ms. Error bars are 1 SD across all subjects (N = 6). The p values are from paired-samples t-tests of the differences between the results of the LSTM and MLP networks.

Fig. 6. The absolute error of the stacked LSTM network in predicting the lower limb joint angles (hip flexion, knee flexion and ankle dorsiflexion) with the prediction time of 10 ms correlates with absolute angular velocity over time during gait. The joint angular velocity is derived by taking the time derivative of the modelled joint angles using IMU sensors (above) or optical cameras (below). The green solid line is the mean absolute error and the shaded area ±1 SD; the purple solid line is the mean absolute angular velocity across all subjects (N = 6). The vertical dashed line at 60% of the gait cycle divides the stance and swing phases. |r| is the Pearson product-moment correlation coefficient between the absolute angular velocity and the absolute error.

Fig. 7. The performance of the stacked LSTM network in predicting the lower limb joint angles ('+' hip flexion; 'x' knee flexion and 'o' ankle dorsiflexion; red represents the modelled joint angles using optical cameras and blue the modelled joint angles using IMU sensors) with the prediction time of 10 ms correlates with the gait similarity per subject. The prediction performance was assessed by the normalised RMSE (%) and R²; gait similarity was assessed by using the coefficient of multiple correlations (CMC).

Fig. 8. The change of performance in terms of RMSE and R² with the prediction time using the stacked LSTM network. The values of RMSE and R² are the mean across predictions of the three lower limb joint angles (hip flexion, knee flexion and ankle dorsiflexion).

TABLE I
ERROR AND CORRELATION BETWEEN OPTICAL CAMERAS AND IMU SENSORS IN MODELLING LOWER LIMB JOINT KINEMATICS ACROSS ALL SUBJECTS (N = 6) DURING WALKING AT 1.18 ± 0.06 m/s