Deep Learning Models for Stable Gait Prediction Applied to Exoskeleton Reference Trajectories for Children With Cerebral Palsy

Gait trajectory prediction models have several applications in exoskeleton control; they can be used as feed-forward input to low-level controllers and to generate reference/target trajectories for position-controlled exoskeletons. In our study, we implement four deep learning models (LSTM, FCN, CNN and Transformer) that perform one-step-ahead gait trajectory prediction after training on gait patterns of typically developing children. We propose a methodology that optimises for stability in long-term forecasts, and evaluate the performance of the models on typically developing (TD) and Cerebral Palsy (CP) gait during recursive prediction of 200 time-steps in the future (which may lead to propagation of errors) and in the presence of varying levels of Gaussian noise (1%-5%). Results on TD gait show that the FCN and Transformer, with mean absolute errors (MAEs) for one-step-ahead predictions between 1.17° and 1.63°, are the most suitable for the intended application. We also propose an approach for generating adaptive trajectories that can be used as reference trajectories for position-controlled exoskeletons. Gait patterns from children with Cerebral Palsy were fed into gait trajectory prediction models trained on typically developing gait only, to generate corrective patterns. Preliminary results show that the gait patterns of typically developing children were introduced onto the generated trajectories.


I. INTRODUCTION
Cerebral Palsy (CP) is the most prevalent motor disability in children [1], affecting approximately 2.11 per 1,000 live births [2]. CP is a lifelong non-progressive condition as a result of a lesion to the brain [3]. The static lesion can be due to a brain injury that occurs before, during, or after birth, or due to an abnormality throughout fetal development [4]. CP is characterised by motor disability [3], yet the specific form of CP depends on the level of motor impairments, their type, and location [4]. CP can affect one side of the body (hemiplegia), lower extremities (diplegia), or both sides of the body including upper and lower extremities (quadriplegia) [3].
The associate editor coordinating the review of this manuscript and approving it for publication was Frederico Guimarães.
CP is non-curable, with 50% of children losing their ambulatory capacity by adulthood [5]. However, available treatments target motor disabilities to minimise their impact on an individual's life [3]. Interventions include botulinum toxin injections, orthopaedic devices, orthopaedic surgery, casting, and occupational therapy [6].
Technological innovation in the field of rehabilitation has led to the development of devices such as robot-assisted gait trainers and powered exoskeletons, that can benefit children with CP. These devices have aided in 'massed practice', a type of training whereby a patient performs exercises with reduced number and duration of breaks in-between, in one rehabilitation session [1], [7]. Massed practice can optimise motor learning while reducing the strain on the therapist [1].
Several exoskeletons have been developed specifically for children with CP [8]. Sarajchi et al. present a comprehensive literature review on this topic [8]. Promising results have been reported on the effectiveness of exoskeletons in improving CP gait when used in rehabilitation. Benefits include: a reduction in metabolic cost during ambulation, improvements in knee extension and a reduction in crouch gait during stance, increased mean velocity, and increased cadence [1], [5], [9]. There are about fifteen single-joint and multi-joint lower limb exoskeletons primarily designed for children with Cerebral Palsy [8], including HAL [10], P-LEGS [11], Trexo [12], CPWalker [13], EExRoLEG [14] and WAKE-up [15].
Exoskeletons move and interact with the user and the environment based on a control strategy, often consisting of a 3-level hierarchy: high, mid, and low level of control [16]. Having knowledge of future gait trajectories can enhance the performance of the exoskeleton, by being used as feed-forward input to the low-level controllers rather than utilising feed-back input only [17]. This can lead to better tracking of the movement of the exoskeleton, and compensate for the control time-delays [17]. Several probabilistic and machine learning based methods have been used to predict future gait trajectories [17], [18], [19], [20], [21], but the stability of their predictions has yet to be evaluated. Stability can be affected by measurement or controller noise, as well as by noise introduced during signal acquisition and transmission.
Furthermore, many exoskeletons follow a fixed gait trajectory, which is often the mean trajectory of a healthy population [22]. However, this may not be the most suited trajectory for the user, since it may not take into account their individual parameters, such as height and limb length, which have all been shown to influence gait [23]. Several studies worked on generating normalised gait cycles based on body parameters [24], [25]; while this approach provides more individualised gait trajectories to follow, it does not take into consideration the stride-to-stride variability during gait nor the asymmetry between the left and right joints.
Motivated by the current limitations, in this paper we present several novel contributions. Firstly, we develop stable deep learning models that predict one-step-ahead kinematic trajectories (flexion-extension angles) of the hip, knee, and ankle joints of both legs. We present a methodology for optimising the long-term stability of those models, using the dynamic time warping (DTW) distance as a metric for early stopping during training. The stability was evaluated by: (1) recursive forecasting (where predictions are used as input to the models, leading to propagation of errors), and (2) the addition of varying levels of Gaussian noise to the input of the model (1-5%). We evaluate the performance of the models in predicting TD and CP gaits. Finally, we propose an approach for generating continuous, individualised, and corrective reference trajectories for children with CP, that take the stride-to-stride variability and asymmetry of gait into consideration. This approach involves training the deep learning models on gait from typically developing children only, feeding the models with CP gait as input, and then using the predictions from the trained models as potential reference trajectories for exoskeletons. We hypothesise that these models can learn features of 'healthy' gait patterns. When CP gait patterns are used as input, these models can 'correct' CP patterns by introducing TD gait patterns, while still considering the individual features of the child and the asymmetry of their gait. We specifically focus on predicting pediatric gait patterns, and implement the above using a variety of deep learning models including long short-term memory (LSTM), fully connected network (FCN), convolutional neural network (CNN), as well as Transformers, which have not previously been investigated for gait trajectory prediction.

II. BACKGROUND
Forecasting gait trajectories has several uses in exoskeleton control. Future trajectories can be used as feed-forward input to the controllers, allowing for better tracking of the exoskeleton's movement [17], and compensating for delays in controller response times [17], [26]. Future trajectories can also be used as target trajectories, as a guide for users to follow [22].
Several approaches have been used for the gait trajectory forecasting task, including probabilistic models [17] and deep learning models such as LSTMs and CNNs [19], [20]. These approaches vary in the number of time-steps predicted, either single or several time-steps in the future. The predicted trajectories were in the form of joint angles, linear accelerations or angular velocities. Another difference amongst the approaches is the input parameter or sensors used to collect the data to develop the models which included motion capture systems [19], [27], [28], [29], IMUs [26], [30], encoders [31], and surface electromyography (sEMG) [20], [32]. A summary of some approaches in literature is presented in Table 1. Predictive models operating in real-time are prone to receive noisy inputs, during signal acquisition and transmission. Therefore, models predicting gait trajectories for exoskeleton control need to be robust to noise and stable in their predictions. While the accuracy of these models has already been evaluated, the stability of these models is yet to be investigated.
In addition to using predictive model outputs as feed-forward to controllers, predicted gait trajectories can be used as reference trajectories for exoskeletons [22]. Exoskeletons that operate based on position control have a reference trajectory, often based on healthy individuals, which dictates the position the joints need to be in during a gait cycle. This reference trajectory is used to correct pathological gait [33], but it does not take into consideration several parameters that influence gait, including speed, gender, and anthropometrics [23]. Gaussian process regression and recurrent neural networks (RNNs) have been used to address this issue, by generating healthy gait trajectories based on parameters such as speed, gender and anthropometrics [24]. These models learn the mapping between body parameters and healthy gait cycles, allowing the generation of the most appropriate reference trajectory for each individual. Individualised gait trajectories used for gait rehabilitation have resulted in improvements in energy efficiency, measured by an increase in heart rate and reduction in peripheral capillary oxygen saturation (SpO2), compared to generalised trajectories [34].
While these approaches provide more individualised trajectories, they are fixed for all gait cycles and do not consider the inherent cycle-to-cycle variability during gait. Children with spastic CP have been shown to have higher within-day and between-day variability in comparison to typically developing children, which can be due to the limited range of motion caused by their spasticity [35], [36]. Online adaptive trajectory generation is needed to accommodate the cycle-to-cycle variability. Vallery et al. [33] use complementary limb motion estimation (CLME) for hemiparetic individuals, which relies on the trajectories of the healthy leg for the online estimation of the reference trajectory for the pathological leg. Using CLME was more efficient, led to EMG patterns that were closer to unperturbed gait than when using a fixed reference trajectory, and avoided out-of-phase walking, which can be generated by using a fixed reference trajectory [33]. Nevertheless, this approach is restricted to hemiparetic individuals and not those who have both limbs affected. Meanwhile, Zhou et al. [25] use RNNs to generate normalised gait trajectories based on anthropometrics, as well as gait speed, yet their approach does not accommodate the kinematic asymmetry of the left and right joints.
These limitations were the motivation behind our current study, which presents one-step-ahead kinematic trajectory prediction models that are optimised for stability in their long-term predictions, and are evaluated for their robustness in the presence of added noise. This study also presents an approach to generate adaptive target/reference trajectories for children with CP, that vary from cycle to cycle, and take into account the asymmetry of the left and right joints, since a separate trajectory will be generated for each joint of the left and right sides. A similar approach was taken by Endo et al. [37], who train the GaitForMer network on healthy gait patterns for human motion forecasting, and then retrain the model that learned gait mappings to predict the severity of gait impairment of patients with Parkinson's disease, based on the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS).

A. OVERVIEW
Our approach involves developing end-to-end kinematic trajectory prediction models that perform one-step-ahead prediction of joint angles of the hip, knee and ankle for both legs, based on 100 time-points of past joint angles (equivalent to 1000ms for a sampling frequency of 100Hz). These models are trained to optimise for long-term stability in their predictions, by using dynamic time warping (DTW) distances, in addition to validation loss, as metrics to end training of the models. We implement four deep learning models (LSTM, FCN, CNN and Transformer) that, importantly, are trained on the gaits of typically developing children; we then evaluate their stability in long-term forecasting of 200 time-steps in the future, which is twice the length of the input window. The stability of the four models is evaluated by: (1) performing recursive prediction, which can lead to the propagation of errors, and (2) adding varying levels of Gaussian noise to the input (1-5%). Finally, we use the four models to predict future trajectories (200 time-steps ahead) when using 100 time-points of CP gait as input. We hypothesise that the predictions of the models that learned the patterns of the gait of typically developing children could be used as an aid to correct the gait of CP children.

B. DATA
A dataset containing recordings of gaits of typically developing children and children with Cerebral Palsy was used to train and evaluate the models of this study. The dataset consists of flexion-extension angles of the hip, knee and ankle measured simultaneously in the sagittal plane, for the right and left legs. The data has been collected and provided by Canterbury Christ Church University, and Chailey Clinical Services.

1) METHODOLOGY OF DATA COLLECTION
The gait of typically developing (TD) children and children with Cerebral Palsy (CP) was recorded while they walked at self-selected speeds. Several trials were conducted for each child, who were asked to walk a distance of 8 meters per trial. Data were collected using the ISEN inertial motion capture system (STT Systems, Spain) which uses six inertial measurement units (IMUs) to capture the gait. Raw inertial measurements collected by IMUs were exported to and processed by the accompanying ISEN software, which derives the flexion-extension angles for the hip, knee, and ankle, for the left and right legs. The data were collected at a sampling frequency of 100Hz.

2) DEMOGRAPHICS
The participants included 10 typically developing (TD) children and 11 children with Cerebral Palsy (CP) (see Table 2). TD children were between 4 and 13 years old, while children with CP were between 8 and 12 years old, with Gross Motor Function Classification System (GMFCS) levels I-II. The anthropometrics and demographic details of the participants are included in Figure 1 and Figure 2 respectively.

C. PRE-PROCESSING
The data of typically developing children were divided into 3 subsets: training, validation and testing sets. We split those data at a subject level (i.e. data from 8 children were used for training, data from the other 2 children for testing and data from 1 child for validation). The children for each set were selected at random. This was done to avoid testing the models on samples from a child used for training and therefore ensure the generalisability of the models.
Each one of these sets was pre-processed by firstly segmenting the trials into samples; each trial is a recording of the flexion-extension angles of the hip, knee, and ankle joints while walking for 8 meters. Each sample consists of an input matrix $x_{in}$, which is made of the joint angle values for 100 time-steps, and a target vector $y_{out}$, which is made of the joint angle values for the one following time-step. The input window size was specifically chosen to be 100 time-steps because this corresponds to 1000ms (for a 100Hz sampling frequency), which is equivalent to the length of one full gait cycle since the average length of a gait cycle for TD school-aged children is 980-990ms [39]. This means that a model can make a prediction based on one full previous cycle of an exoskeleton user. In a previous study we conducted, we investigated the effect of varying the length of the input window on the accuracy of predicting trajectories in the form of Euler angles [21]. The input window sizes used in that study were 50, 100, 200, 400, 600, 800, and 1000 ms. Results showed that for short-term predictions, the size of the input window does not have a significant influence on accuracy, while for long-term predictions, larger input window sizes result in better performance. This further supports our choice to set the input window size to 100 time-steps (equivalent to 1000ms).
For $n$ samples in a set, $X_{in} \in \mathbb{R}^{n \times l_{in} \times f}$, where $l_{in}$ (set to 100) is the number of input time-steps and $f$ (set to 6) is the number of features that we input to the models (hip, knee and ankle angles for the left and right leg). Similarly, $Y_{out} \in \mathbb{R}^{n \times l_{out} \times f}$, where $n$ is the number of samples, $l_{out}$ (set to 1) is the number of target time-steps, while $f$ (set to 6) is the number of features. The samples were generated using the sliding window method [21]; the stride value was set to 1 to maximise the number of training samples that can be generated from each trial. For the typically developing gait data, the training, testing, and validation sets had 41120, 7316, and 4832 samples, respectively.
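The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: the function name `sliding_windows` and the placeholder trial are assumptions, and plain Python lists stand in for the arrays a real pipeline would use.

```python
def sliding_windows(trial, l_in=100, l_out=1, stride=1):
    """Split one trial into (input, target) pairs with a sliding window.

    trial  : list of frames, each frame a list of f joint angles
    l_in   : number of input time-steps per sample
    l_out  : number of target time-steps per sample
    stride : shift (in frames) between consecutive samples
    """
    inputs, targets = [], []
    last_start = len(trial) - (l_in + l_out)  # last valid window start index
    for start in range(0, last_start + 1, stride):
        inputs.append(trial[start:start + l_in])
        targets.append(trial[start + l_in:start + l_in + l_out])
    return inputs, targets


# Hypothetical trial: 250 frames of 6 joint angles (placeholder values only).
trial = [[float(t)] * 6 for t in range(250)]
X_in, Y_out = sliding_windows(trial)

# Shapes follow n x l_in x f and n x l_out x f.
print(len(X_in), len(X_in[0]), len(X_in[0][0]))
print(len(Y_out), len(Y_out[0]), len(Y_out[0][0]))
```

With a stride of 1, a trial of T frames yields T - (l_in + l_out) + 1 samples, which is why a small stride maximises the number of samples obtained from each short walking trial.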
In addition to these 3 main sets, we generated two additional long-term prediction validation and testing sets. These sets have the same input size as in $X_{in}$ (i.e. 100 time-steps). However, they have an output size ($l_{out}$) of 200 time-steps. These additional sets were generated to evaluate the feasibility and stability of recursive long-term forecasts of the trained models (see section III-E2 and section III-F for further details).
All the data from children with CP were used only for testing the models, for one-step-ahead prediction and long-term predictions. The data were processed in the same manner as the data for typically developing children.
After their generation, the sets were normalised using min-max normalisation, such that $X_{in} \in [0, 1]$ and $Y_{out} \in [0, 1]$. The testing and validation sets are normalised according to the normalisation factors (i.e. min and max values) used during training. The min-max values were chosen to accommodate the min-max values of both the TD and CP distributions with an additional safety boundary. The reason for this is to ensure that the models are capable of handling test data from subjects that have slightly different joint angle ranges, so the input of the models remains bounded between 0 and 1. This also accommodates the differences between the CP and TD gait data distributions, which are shown in Figure 3.
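As a rough sketch of this normalisation step, the bounds can be fixed once from the pooled data and then reused for every set. The helper names and the 10% `margin` below are assumptions made for illustration; the text does not state the size of the safety boundary that was actually used.

```python
def fit_bounds(values, margin=0.1):
    """Return (lo, hi) bounds widened by a safety margin.

    margin is a fraction of the observed range; 0.1 is an assumed value,
    not one reported in the text.
    """
    lo, hi = min(values), max(values)
    pad = margin * (hi - lo)
    return lo - pad, hi + pad


def normalise(x, lo, hi):
    """Map an angle in degrees to [0, 1] using fixed bounds."""
    return (x - lo) / (hi - lo)


def denormalise(x_norm, lo, hi):
    """Map a normalised value back to degrees (needed before computing errors)."""
    return x_norm * (hi - lo) + lo


# Hypothetical pooled knee angles (TD and CP together), in degrees.
pooled = [-5.0, 12.0, 40.0, 70.0]
lo, hi = fit_bounds(pooled)

# The same bounds are reused for the training, validation and testing sets.
print(lo, hi)
print(normalise(32.5, lo, hi))
```

Because the bounds are wider than the observed range, a test subject whose angles fall slightly outside the training range still maps inside [0, 1].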

D. ONE-STEP-AHEAD TRAJECTORY PREDICTION MODELS
We implemented four deep learning models that perform one-step-ahead prediction of gait trajectories, namely a Fully Connected Neural Network (FCN), a Long Short-Term Memory (LSTM), a Convolutional Neural Network (CNN), and a Transformer. In this study, the four sequence-to-sequence models are trained to make one-step-ahead predictions based on a 100 time-step input window of joint angles, specifically the hip, knee and ankle angles of the left and right legs (see Figure 4). Each model $g(X)$ learns the mapping between the input $X$ (made of 100 time-steps) and the output $\hat{Y}$ (one-step-ahead prediction), to minimise the difference between the estimated output $\hat{Y}$ and the true output $Y$. For $n$ samples, input window length $l_{in}$, output window length $l_{out}$, and $f$ features, the input of the model is the matrix $X \in \mathbb{R}^{n \times l_{in} \times f}$, and the output of the model is the matrix $\hat{Y} \in \mathbb{R}^{n \times l_{out} \times f}$. In the following subsections, the architecture of each of the models will be described.
FIGURE 3. Probability density distribution of the hip, knee, and ankle angles (in degrees) for TD and CP gait data before processing. The blue and yellow lines correspond to CP and TD probability density distributions, respectively.

1) FULLY CONNECTED NETWORK (FCN)
The Fully Connected Network (FCN) consisted of a series of fully connected linear layers with ReLU activation functions in between, and a final sigmoid layer as an output. The 2-dimensional input in $\mathbb{R}^{100 \times 6}$ (given that we have 6 joint angles and a 100 time-step input window) has been flattened to a 1-dimensional vector in $\mathbb{R}^{600}$ before passing it through the fully connected layers. We use a total of five linear layers, with the architecture shown in Figure 5.

2) LONG-SHORT-TERM-MEMORY NETWORK (LSTM)
The LSTM is a type of gated recurrent neural network that has been frequently used with time-series data, since it processes the data sequentially. Each LSTM unit contains an input, an output, and a forget gate, which control how information is passed through the network; the parameters of these gates are set during the training process [40]. For our implementation, we use a neural network that contains 2 layers, and 100 hidden units per layer. The last hidden state of the final layer is then passed onto a fully connected layer before reshaping the output into the desired shape. The architecture of the LSTM network is shown in Figure 6.

3) CONVOLUTIONAL NEURAL NETWORK (CNN)
While CNNs are most commonly used with 2-dimensional inputs such as images, several studies have used them with 1-dimensional series, whereby the 2D convolution operation is replaced with the 1D convolution operation [41]. The CNN architecture we implemented contained two pooling and four convolution layers, followed by a fully connected linear layer at the end. A ReLU activation function was used after each convolution layer. The architecture of the CNN is illustrated in Figure 7.
VOLUME 11, 2023

4) TRANSFORMER
Transformers have become increasingly popular, outperforming CNNs and LSTMs in several applications as shown in [42]. Transformers rely on attention mechanisms rather than on recurrence or convolutions. The Transformer architecture we implement is based on the one proposed by Vaswani et al. [43]. The Transformer contains one encoder layer and one decoder layer. The input, which consists of 100 time-steps of six joint angles (hip, knee, and ankle flexion-extension angles for both legs), is fed into a linear layer that expands the dimension from $\mathbb{R}^{100 \times 6}$ to $\mathbb{R}^{100 \times 80}$. Expanding the input dimension is necessary to be able to set the number of attention heads of the encoder to 8, since multi-head attention splits the model dimension evenly across the heads (80 is divisible by 8, whereas 6 is not). The output of the linear layer is concatenated with positional encodings, which are used to inform the model of the order of the sequence [43]. The result of the concatenation is fed into the encoder, which is a single layer consisting of 8 attention heads and a 100-unit feedforward network. Meanwhile, the last time-step of the input is fed to a linear layer that expands the dimension from $\mathbb{R}^{1 \times 6}$ to $\mathbb{R}^{1 \times 80}$. The decoder receives two inputs: the output of the decoder's linear layer, which is concatenated with positional encodings, and the output of the encoder (which is the output of the feedforward network added and normalised with a residual connection) [43]. The output of the decoder goes through a fully connected layer and then a sigmoid activation function. The positional encodings had a dropout rate of 0.2, while the encoder and decoder had a dropout rate of 0.1. The architecture of the Transformer implemented is illustrated in Figure 8.

E. MODEL OPTIMISATION
The following subsections describe how we trained and optimised the LSTM, FCN, CNN and Transformer.

1) HYPER-PARAMETERS
All models were trained using the Adam optimiser, with the mean squared error (MSE) between one-step-ahead predictions and true values used as the loss function to update the weights of the models. The models were trained for up to 40-50 epochs. We stored the models at the epoch where the DTW distance between the recursive predictions of the 200 time-steps and the true joint angles of the validation set was the lowest. To select the hyperparameters for the models in this study, we started with the hyperparameters selected in one of our previous studies [21]. Those hyperparameters were selected based on a hyperparameter search that uses the tree-structured Parzen estimator algorithm, a type of Bayesian hyperparameter sampler, and were optimised for the prediction of trajectories in the form of Euler angles for children with CP. Details on the search space are given in [21]. We then fine-tuned those hyperparameters to optimise the performance of the models on this dataset. The batch sizes for the FCN, LSTM, CNN, and Transformer were 32, 256, 256, and 512 respectively. The learning rate was set at 0.0001 for the FCN, LSTM and CNN, and at 0.001 for the Transformer.
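For reference, the reported settings can be collected in a small configuration structure. The dictionary names below are hypothetical, and `max_epochs` takes the upper end of the 40-50 epoch range stated above; only the batch sizes, learning rates, optimiser and loss come from the text.

```python
# Hypothetical container for the per-model settings reported in the text.
TRAINING_CONFIG = {
    "FCN":         {"batch_size": 32,  "learning_rate": 1e-4},
    "LSTM":        {"batch_size": 256, "learning_rate": 1e-4},
    "CNN":         {"batch_size": 256, "learning_rate": 1e-4},
    "Transformer": {"batch_size": 512, "learning_rate": 1e-3},
}

# Settings shared by all four models.
COMMON_CONFIG = {
    "optimiser": "Adam",
    "loss": "MSE",        # one-step-ahead mean squared error
    "max_epochs": 50,     # assumed cap: upper end of the reported 40-50 range
}

for name, cfg in TRAINING_CONFIG.items():
    print(name, cfg["batch_size"], cfg["learning_rate"])
```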

2) DYNAMIC TIME WARPING DISTANCES AS EARLY STOPPING CRITERIA
In our study, we use the mean squared error (MSE) between the one-step-ahead predictions and the true values as the loss function, with the models being trained to minimise the loss. While this ensures low errors on one-step-ahead predictions, it does not guarantee that the models are not over-fitted to short-term forecasting, nor that they are capable of making long-term recursive forecasts. Therefore, after each epoch of training, we calculate the dynamic time warping (DTW) distance between 200 recursively predicted time-steps and the true gait values of the validation set. The validation loss was monitored to ensure the model is learning and performing well on short-term prediction, and the DTW distance on the validation set was used to determine when to end the training of the model to avoid overfitting and ensure the stability of the models in long-term forecasting. In Figure 9 we plot an example of the training and validation MSE loss as well as the DTW distance measured for each epoch during the training of one of the models. Figure 9 shows that at the beginning of training, both validation loss and DTW distance decrease, but after a certain number of epochs, the DTW distance increases, indicating a worsening ability to make long-term recursive forecasts. Therefore, during the training of our models, we optimise for low one-step-ahead MSE validation loss as well as low DTW distance in long-term recursive forecasting. This approach is illustrated in Figure 10.
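To make the stopping criterion concrete, the sketch below computes a textbook DTW distance for a single joint angle and picks the epoch with the lowest validation DTW. It is an illustration only: the study uses a DTW Python package [44], whereas `dtw_distance` and `best_epoch` here are hypothetical helpers, the cost is a plain absolute difference, and the per-epoch values are invented.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two 1-D sequences.

    Uses the absolute difference as local cost and allows the three
    standard steps (match, insertion, deletion).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # deletion
                                     cost[i][j - 1],      # insertion
                                     cost[i - 1][j - 1])  # match
    return cost[n][m]


def best_epoch(dtw_per_epoch):
    """Index of the epoch whose validation DTW distance is lowest."""
    return min(range(len(dtw_per_epoch)), key=dtw_per_epoch.__getitem__)


# A time-shifted copy of a signal has zero DTW distance to the original.
print(dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0]))

# Invented validation DTW values: falling at first, then rising again.
val_dtw = [5.0, 3.0, 2.5, 4.0]
print(best_epoch(val_dtw))
```

Unlike a point-wise error, DTW tolerates small phase shifts between the predicted and true gait cycles, which makes it a reasonable signal for whether a long recursive forecast still has the right shape.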

3) FRAMEWORK
The PyTorch machine learning framework has been used to implement our deep learning models. We utilised several additional libraries including NumPy, Matplotlib, SciPy, Seaborn, and Scikit-learn. The DTW Python package was used for calculating dynamic time warping distances [44]. Computation was run on an Nvidia GeForce RTX 2070 GPU.

F. LONG-TERM RECURSIVE TRAJECTORY FORECASTING
Recursive forecasting is an approach that reuses one-step-ahead predictions made by the model as input to the model. In this study, this method has been used to evaluate the feasibility of long-term recursive forecasts (see section III-G), but is also used during training as a metric for early stopping to optimise for long-term stability (see section III-E2). We used the one-step-ahead prediction models developed in section III-D for recursive forecasting.
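The recursion can be sketched as a simple loop in which the oldest frame of the input window is dropped and the newest prediction is appended. The names `recursive_forecast` and `toy_model` are hypothetical; `toy_model` is a linear extrapolator standing in for a trained network so that the example runs on its own.

```python
def recursive_forecast(model, window, n_steps=200):
    """Roll a one-step-ahead model forward n_steps by feeding back its outputs.

    model  : callable mapping a window (list of frames) to the next frame
    window : initial input window of true frames (e.g. 100 frames of 6 angles)
    """
    window = list(window)  # copy so the caller's data is not modified
    predictions = []
    for _ in range(n_steps):
        next_frame = model(window)
        predictions.append(next_frame)
        # Slide the window: drop the oldest frame, append the prediction.
        window = window[1:] + [next_frame]
    return predictions


def toy_model(window):
    """Stand-in for a trained network: linear extrapolation of the last two frames."""
    last, prev = window[-1], window[-2]
    return [2.0 * l - p for l, p in zip(last, prev)]


# Placeholder input window: 100 frames of 6 features on a straight line.
start_window = [[float(t)] * 6 for t in range(100)]
forecast = recursive_forecast(toy_model, start_window, n_steps=200)
print(len(forecast), forecast[0][0], forecast[-1][0])
```

Once as many steps have been predicted as the window is long, the window contains predictions only, so any one-step error can compound from that point on.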

G. EVALUATING STABILITY
We evaluate the stability of the networks using two methods. The first method involves long-term recursive prediction. A stable network would be able to make long-term predictions using recursive input, without being significantly affected by noise resulting from the propagation of error. We evaluate the stability of the networks by recursively predicting 200 time-steps in the future, which is equivalent to approximately two gait cycles and is twice the length of the input window used by the model. For the first 100 recursive predictions, the input to the model will be a combination of true and predicted values, while the next 100 recursive predictions will all be based on predicted values only. We compare the long-term predictions to the true values by calculating the errors between them.
The second method we use to assess the stability of the networks is the addition of Gaussian noise to the predictions. Used in conjunction with the long-term recursive prediction described above, this method involves the addition of varying levels of Gaussian noise (1-5%) to each prediction before using it as input to the model. We recursively predict 200 time-steps in the future with Gaussian noise and calculate the errors compared to the true gait values.
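A possible reading of this noise test is sketched below. Two points are assumptions rather than details given in the text: the noise is drawn with a standard deviation equal to the noise level times the magnitude of the predicted value, and the stored forecast is the model output before noise is added. The function names and the averaging `toy_model` are hypothetical stand-ins.

```python
import random


def add_gaussian_noise(frame, level, rng):
    """Perturb each value with zero-mean Gaussian noise.

    Assumption: standard deviation = level * |value| (e.g. level=0.05 for 5%).
    """
    return [v + rng.gauss(0.0, level * abs(v)) for v in frame]


def noisy_recursive_forecast(model, window, level, n_steps=200, seed=0):
    """Recursive forecast in which every fed-back prediction is corrupted by noise."""
    rng = random.Random(seed)  # seeded for reproducibility
    window = list(window)
    predictions = []
    for _ in range(n_steps):
        next_frame = model(window)
        predictions.append(next_frame)  # assumed: the model output itself is scored
        noisy_frame = add_gaussian_noise(next_frame, level, rng)
        window = window[1:] + [noisy_frame]
    return predictions


def toy_model(window):
    """Stand-in for a trained network: mean of the last two frames."""
    return [(a + b) / 2.0 for a, b in zip(window[-1], window[-2])]


start_window = [[10.0] * 6 for _ in range(100)]
clean = noisy_recursive_forecast(toy_model, start_window, level=0.0)
noisy = noisy_recursive_forecast(toy_model, start_window, level=0.05)

# Largest deviation of the noisy forecast from the clean one, first feature only.
print(max(abs(n[0] - c[0]) for n, c in zip(noisy, clean)))
```

Sweeping `level` from 0.01 to 0.05 and comparing the resulting errors against the noise-free forecast gives a simple measure of how quickly a model degrades as its inputs are perturbed.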

H. CEREBRAL PALSY GAIT CORRECTION
Position-controlled exoskeletons often guide users to follow a reference/target trajectory, based on the mean trajectories of a healthy population, which results in corrections to their pathological gait patterns [22], [33]. Our study proposes an approach for generating an adaptive target/reference trajectory. Our approach involves training one-step-ahead trajectory prediction models on the gait of typically developing children only. We then feed these models with CP gait, and the models' predictions are used as proposed reference/target trajectories for that CP child (see Figure 11). Our models make predictions based on input from both the right and left limbs, and produce separate output predictions for the right and left limbs, instead of the same trajectory for both limbs. This allows accommodating for the asymmetry and slight differences in right and left limb trajectories, especially for children with unilateral CP, where only one side is affected. We hypothesise that these models will learn the mappings of 'healthy' trajectories and the inter-joint couplings, and will introduce the TD patterns onto CP gait when CP gait is used as input.
Specifically, once the models have been trained on TD gait, we feed them with 100 time-steps of CP gait and recursively predict 200 time-steps in the future. We then compare the predictions to the natural evolution of CP gait to see whether the models introduced TD patterns to the predictions.

I. PERFORMANCE METRICS
In this study, we evaluate the predictive performance of the models in short-term (one-step-ahead) and long-term predictions using mean squared error (MSE) and mean absolute error (MAE) between the predictions and true values.
These metrics were calculated after the de-normalisation of the predictions. Given $n$ testing samples, $f$ features, and prediction length $l_{out}$ (set to 1 for one-step-ahead predictions and set to 200 for long-term predictions), the equations of the MSE and MAE are shown below.
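Assuming the errors are averaged over all samples, time-steps and features, the standard definitions of the two metrics can be written with the symbols above, where $\hat{y}_{i,t,j}$ and $y_{i,t,j}$ denote the predicted and true (de-normalised) values of feature $j$ at time-step $t$ of sample $i$. The index names are introduced here for illustration:

```latex
\mathrm{MSE} = \frac{1}{n \, l_{out} \, f}
  \sum_{i=1}^{n} \sum_{t=1}^{l_{out}} \sum_{j=1}^{f}
  \left( \hat{y}_{i,t,j} - y_{i,t,j} \right)^{2}

\mathrm{MAE} = \frac{1}{n \, l_{out} \, f}
  \sum_{i=1}^{n} \sum_{t=1}^{l_{out}} \sum_{j=1}^{f}
  \left| \hat{y}_{i,t,j} - y_{i,t,j} \right|
```

Because both are computed on de-normalised values, the MAE is expressed in degrees and the MSE in squared degrees.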

IV. RESULTS
In this section, we present the results of the predictive performance of the four models, in the short-term (one-step-ahead, section IV-A) and the long-term (200 time-steps, section IV-B). We also report the effect of noise on gait predictions (section IV-C), and show illustrative examples of CP gait trajectory corrections (section IV-D).

A. PERFORMANCE ON SHORT-TERM (ONE-STEP-AHEAD) PREDICTIONS
Four deep learning networks (LSTM, FCN, CNN and Transformer) were trained for the task of one-step-ahead gait trajectory prediction based on a 100 time-step input (see Figure 12 and Figure 13). The trajectories include 6 features, which are the hip, knee, and ankle angles in the sagittal plane for both legs. The deep learning models were trained on the gait patterns of typically developing (TD) children. We assess the predictive performance of the models on the test set, which comprises data from two TD children withheld from training, and calculate the mean squared errors (MSE) and mean absolute errors (MAE) between the predictions and true values. We also test these models (trained on TD gait only) on data from 11 children with Cerebral Palsy (CP). The results are reported in Table 3.
Results in Table 3 show that the MAE of the LSTM for one-step-ahead prediction of TD gait is the lowest (0.87°), followed by the Transformer (1.17°) and then the FCN (1.63°). The CNN has the worst performance (4.05°). The MAEs between the predicted values and true values for children with CP are higher than those for TD children.

B. PERFORMANCE ON LONG-TERM RECURSIVE PREDICTIONS
We use the models trained on typically developing children for one-step-ahead prediction to perform long-term forecasting, by recursively feeding the one-step-ahead predictions back as input (see Figure 14). We predict a total of 200 time-steps into the future. The results are reported in Table 4. For long-term predictions of TD gait, the LSTM has the lowest MAE (9.36° [...]). In long-term predictions, the models have similar performance: the differences in errors between the models are smaller than in short-term predictions.
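The recursive procedure can be sketched as follows, under the assumption that the input window slides forward by one frame at every step. `predict_one_step` stands in for any trained one-step-ahead model, and `increment_model` is a hypothetical placeholder used only to make the sketch runnable.

```python
def recursive_forecast(predict_one_step, window, n_steps=200):
    """Forecast n_steps frames by feeding each prediction back as input."""
    window = [list(frame) for frame in window]  # copy, keep caller's data intact
    forecast = []
    for _ in range(n_steps):
        next_frame = list(predict_one_step(window))
        forecast.append(next_frame)
        # Slide the window: drop the oldest frame, append the prediction.
        window = window[1:] + [next_frame]
    return forecast


def increment_model(window):
    """Hypothetical stand-in for a trained network: last frame plus one."""
    return [value + 1.0 for value in window[-1]]


# Toy 100 time-step input with 6 features (synthetic, not study data).
initial_window = [[float(t)] * 6 for t in range(100)]
forecast = recursive_forecast(increment_model, initial_window)
```

Because every predicted frame becomes part of the next input, any one-step error is carried forward, which is why this setting is a useful probe of stability.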

C. EFFECT OF GAUSSIAN NOISE ON THE STABILITY OF THE MODELS
Recursive predictions, generated by using one-step-ahead predictions as inputs to the models, are a way to assess the stability of the networks, since the predictions contain a level of error that is continuously propagated. We have evaluated the performance of recursively predicting 200 time-steps into the future based on a 100 time-step input (see section IV-B for details). To further investigate the effect of noise, we add Gaussian noise to each prediction (between 1% and 5% of the predicted value) before using it as a recursive input, and then measure the errors for long-term (200 time-step) predictions (see Figure 15, which illustrates the impact of noise on the prediction of hip flexion-extension angles). Note that noise has been added to all joint angles, and its effect is evaluated in TD and CP gait predictions. Figure 16, which reports the effect of Gaussian noise (levels 1%-5%) on the MAEs for long-term prediction of TD gait, shows that errors increase linearly with increasing noise levels. The LSTM is the most affected by noise. The noise affected the performance of the FCN and Transformer slightly more than that of the CNN, but much less than that of the LSTM. Overall, the results show that the CNN, FCN and Transformer are more stable in the presence of noise than the LSTM. A similar trend is noted for CP gait predictions with noise (see Figure 17).
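A minimal sketch of this noise experiment is given below. It assumes that a noise level of, say, 3% means zero-mean Gaussian noise whose standard deviation is 3% of the magnitude of each predicted value, and that the clean prediction is recorded while the noisy copy is fed back. Both are interpretations for illustration. The names `add_relative_noise`, `noisy_recursive_forecast` and `persistence_model` are hypothetical.

```python
import random


def add_relative_noise(frame, level, rng):
    """Add zero-mean Gaussian noise with std = level * |value| (assumption)."""
    return [value + rng.gauss(0.0, level * abs(value)) for value in frame]


def noisy_recursive_forecast(predict_one_step, window, level, n_steps=200, seed=0):
    """Recursive forecast in which each prediction is corrupted before re-use."""
    rng = random.Random(seed)
    window = [list(frame) for frame in window]
    forecast = []
    for _ in range(n_steps):
        clean = list(predict_one_step(window))
        forecast.append(clean)  # record the model output itself
        # Only the copy that is fed back into the model is perturbed.
        window = window[1:] + [add_relative_noise(clean, level, rng)]
    return forecast


def persistence_model(window):
    """Hypothetical stand-in for a trained network: repeats the last frame."""
    return list(window[-1])


initial_window = [[float(t)] * 6 for t in range(100)]
clean_run = noisy_recursive_forecast(persistence_model, initial_window, level=0.0)
noisy_run = noisy_recursive_forecast(persistence_model, initial_window, level=0.05)
```

Sweeping `level` from 0.01 to 0.05 and computing the MAE of each run against the true trajectory would give an error curve of the kind reported for the four models.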

D. GENERATING ADAPTIVE REFERENCE TRAJECTORIES FOR CEREBRAL PALSY GAIT
We present an approach that suggests corrections to CP gait trajectories, which we propose can be used as target/reference trajectories for position-controlled rehabilitative exoskeletons. After training the deep learning models on TD gait only, we observe the long-term predictions (200 time-steps) when CP gait is used as input. Results reported in Figure 18 show that the models seem to be introducing TD patterns onto the predicted CP gaits. Preliminary observations show that the predicted trajectories are ahead of the actual CP trajectories, which indicates that the models may be imposing a higher gait speed. This is illustrated in Figure 18(a) and Figure 18(b), where there is a decrease in the stride time in the predicted corrections, measured by a shorter peak-to-peak distance. The stride time in Figure 18 [...] in Figure 18(b) was also reduced by 57 time-steps in the predicted correction.
Furthermore, the predicted corrections seem to show an increased range of motion, such as increased knee flexion, making the gait more similar to TD gait. This is illustrated in Figures 18(c), 18(d), and 18(e), where the range of motion of the joint angles increased by 28.4°, 13.97°, and 19.32° respectively in the predicted corrections compared to the CP gait without intervention.
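The two quantities used in these comparisons, peak-to-peak distance and range of motion, can be computed along the lines of the following sketch. The peak detector is deliberately simple and assumes a smooth signal; the function names and the sinusoidal test signals are hypothetical and do not correspond to the data in Figure 18.

```python
import math


def range_of_motion(angles):
    """Range of motion of one joint: maximum minus minimum angle (degrees)."""
    return max(angles) - min(angles)


def peak_indices(angles):
    """Indices of local maxima; suitable for smooth signals only."""
    return [
        i for i in range(1, len(angles) - 1)
        if angles[i - 1] < angles[i] >= angles[i + 1]
    ]


def mean_peak_to_peak(angles):
    """Mean distance between consecutive peaks, in time-steps."""
    peaks = peak_indices(angles)
    gaps = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    return sum(gaps) / len(gaps) if gaps else None


# Synthetic joint-angle traces: a slower, smaller-amplitude signal and a
# faster, larger-amplitude one (illustrative values only).
slow_small = [20.0 * math.sin(2 * math.pi * t / 100) for t in range(300)]
fast_large = [30.0 * math.sin(2 * math.pi * t / 80) for t in range(300)]

stride_change = mean_peak_to_peak(slow_small) - mean_peak_to_peak(fast_large)
rom_change = range_of_motion(fast_large) - range_of_motion(slow_small)
```

On real recordings, sensor noise produces spurious local maxima, so a smoothing step or a minimum peak spacing would be needed before measuring stride time this way.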
These observations follow desired CP rehabilitation outcomes which include increased mean velocity and improvement in knee extension [1], [5], [9]. While these results are encouraging, they are preliminary observations and the effectiveness of the generated trajectories in enhancing the rehabilitation outcomes (such as reducing metabolic cost, and increasing gait speed), and the comfort of users should be evaluated in a clinical setting.

V. DISCUSSION
This study focused on developing end-to-end deep learning models for the task of gait trajectory prediction (flexion-extension angles of the hip, knee, and ankles of both right and left legs). The intended application of these models is exoskeleton control, specifically, rehabilitative exoskeletons for children with Cerebral Palsy (CP). We trained four deep learning models (LSTM, FCN, CNN and Transformer) for the task of one-step-ahead prediction based on a 100 time-step input window. To the best of our knowledge, this is the first time Transformers have been evaluated for gait trajectory forecasting. These models have been trained on the gait patterns of typically developing (TD) children. We proposed a methodology that optimises for long-term stability during training. This methodology involves using dynamic time warping (DTW) distances between long-term recursive predictions and true values as an early stopping metric (described in section III-E2). This prevented the models from over-fitting on one-step-ahead predictions at the expense of long-term stability.
FIGURE 18. The blue line represents the CP gait input, the red line represents the actual CP values, and the green line represents the predicted corrections to CP gait. (a) and (b) show a decrease in the peak-to-peak distance in the predicted correction compared to CP gait without intervention, suggesting that the models are imposing higher speeds. (c), (d), and (e) show an increase in the range of angles in the predicted corrections compared to CP gait without intervention, indicating that the models are imposing a larger range of motion.
We first assessed the performance for one-step-ahead predictions using four deep learning models. The MAEs for predictions of typically developing gait patterns ranged from 0.87° to 4.96° across all models. The LSTM had the lowest errors, followed by the Transformer, the FCN and then the CNN. The performance gap between the CNN and the other three models is quite large. As for long-term recursive predictions (200 future time-steps), the MAEs on the TD gait test set ranged from 13.41° to 14.93°. The differences in performance across the models are narrower in long-term predictions, yet the LSTM still had the lowest errors and the CNN the largest.
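To make the DTW-based early stopping criterion mentioned in this section concrete, the sketch below pairs a textbook dynamic time warping distance for one joint-angle sequence with a patience-based stopping rule. The absolute-difference cost, the `patience` value, and the function names are assumptions for illustration; the exact configuration is the one described in section III-E2.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences,
    using absolute difference as the local cost (assumption)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row for the empty prefix of a
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of: insertion, deletion, match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[-1]


def early_stopping_epoch(dtw_per_epoch, patience=3):
    """Return the epoch whose checkpoint would be kept, given the validation
    DTW distance of the long-term recursive forecast after each epoch."""
    best_epoch, best_value = 0, float("inf")
    for epoch, value in enumerate(dtw_per_epoch):
        if value < best_value:
            best_epoch, best_value = epoch, value
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Toy values (synthetic): a time-shifted copy has zero DTW distance.
same_shape = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])
offset = dtw_distance([0.0, 0.0], [1.0, 1.0])
kept = early_stopping_epoch([5.0, 3.0, 4.0, 4.5, 6.0, 1.0], patience=3)
```

Monitoring DTW on the recursive forecast rather than the one-step loss means training stops when long-horizon shape agreement stops improving, even if the one-step error is still decreasing.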
It is difficult to directly compare the findings of our study with what has been reported in the literature, since no prior studies have investigated the use of AI trajectory prediction models on pediatric gait or on the gait of children with CP. Furthermore, previous studies investigated different kinematic or kinetic parameters, such as linear acceleration, angular velocity, or joint moment instead of joint angles, or captured their data using different modalities, such as EMG or a motion capture system. Despite these differences, we can still make some broad comparisons between our models' results and the literature. Our results show that the CNN has the worst performance, as we have shown previously [21], but in contrast to Moreira et al. [45], who performed ankle joint torque estimation based on kinematics, speed, and anthropometry and found the CNN to be more robust. They did not, however, compare its performance to that of the FCN or Transformers. On the other hand, a study by Molinaro et al. [46] found that the LSTM outperforms the FCN in joint moment prediction, which is similar to the result of our study, where the LSTM outperforms the FCN in one-step-ahead predictions.
We have also investigated the stability of the models by adding Gaussian noise. When comparing how varying levels of Gaussian noise (between 1%-5% of the predicted value) impacted performance, we saw a linear increase in MAEs. The LSTM, which had the lowest short-term and long-term errors, was affected the most by Gaussian noise, as shown by the largest increase in MAE. On the contrary, the CNN, which had the largest short-term and long-term errors, appeared to be the most stable in the presence of Gaussian noise showing the smallest increases in MAEs with increasing noise levels. The response of the Transformer and FCN to added noise was similar; they were impacted slightly more than the CNN network, but less significantly compared to the LSTM network. The results of this study stress the importance of reducing noise from the system, due to its influence on the predictions. This should be considered during the design of the exoskeletons.
Based on the results obtained, the Transformer and FCN seem to be the most appropriate deep learning models for trajectory prediction for exoskeleton control, since they combine low errors in short-term (one-step-ahead) and long-term prediction tasks with greater stability in the presence of added noise.
In this study, we have also proposed an approach that generates adaptive target/reference trajectories for children with Cerebral Palsy. We hypothesised that a trajectory forecasting model trained on the gait of typically developing children only, will learn their representations, and introduce corrections to CP gait patterns when used as input. We have shown with our preliminary results that the models introduced TD patterns to CP gait. While these results are encouraging, the effectiveness of using these gait patterns as reference trajectories on the rehabilitation outcomes and comfort of users will need to be assessed.
Our study has a few limitations. First, the dataset we used consists of 21 children (10 TD and 11 CP). We believe that a larger sample of TD children, with a wider anthropometric distribution, is needed to train the models; ideally, the anthropometrics of the children with CP and of the TD children used in this study would have had a more similar distribution, with a larger sample and more gait variability. Additionally, the models have been trained on flexion-extension angles collected from IMU sensors rather than encoders, which are typically used in exoskeletons. As future work, we plan to incorporate the trajectory-generating models into the control strategies of exoskeletons that rehabilitate children with Cerebral Palsy, as a high-level controller. We will test how well exoskeletons perform using the proposed trajectories as reference trajectories, and how effective they are in the rehabilitation of children with Cerebral Palsy. The influence of using the generated adaptive trajectories as reference gait trajectories in exoskeletons on the rehabilitation outcomes, metabolic cost, speed, and comfort of children with Cerebral Palsy, and how they compare to fixed trajectories, needs to be investigated. Furthermore, future work could use reinforcement learning to generate reference trajectories based on the patient's capacity, considering parameters such as the range of motion of their limbs and the spasticity of their muscles.

VI. CONCLUSION
In this study, we implement four deep learning algorithms trained on the gait of typically developing children, for the task of one-step-ahead prediction of gait trajectories. The intended application of this implementation is to aid in the control of exoskeletons that are used for the rehabilitation of children with Cerebral Palsy. We proposed a methodology that optimises for the stability of long-term predictions. We evaluated the performance of the models under the presence of noise, with results suggesting that the Transformer and Fully Connected Network (FCN) may be the most suited for the intended application due to their stability and low errors. We also proposed an approach that generates adaptive reference/target trajectories for position-controlled exoskeletons, with models using learned representations of typically developing (TD) gait to 'correct' Cerebral Palsy (CP) gait. Preliminary results show that these models introduce TD patterns to CP gaits, and we need to test the effectiveness of using these adaptive patterns as reference trajectories on the outcomes of rehabilitation and user comfort in future studies.