Predicting Three-Dimensional Ground Reaction Forces in Running by Using Artificial Neural Networks and Lower Body Kinematics

This study explored the use of artificial neural networks in the estimation of runners’ kinetics from lower body kinematics. Three supervised feed-forward artificial neural networks with one hidden layer each were modelled and assigned individually with the mapping of a single force component. Number of training epochs, batch size and dropout rate were treated as modelling hyper-parameters and their values were optimised with a grid search. A public data set of twenty-eight professional athletes containing running trails of different speeds (2.5 m/sec, 3.5 m/sec and 4.5 m/sec) was employed to train and validate the networks. Movements of the lower limbs were captured with twelve motion capture cameras and an instrumented dual-belt treadmill. The acceleration of the shanks was fed to the artificial neural networks and the estimated forces were compared to the kinetic recordings of the instrumented treadmill. Root mean square error was used to evaluate the performance of the models. Predictions were accompanied with low errors: 0.134 BW for the vertical, 0.041 BW for the anteroposterior and 0.042 BW for the mediolateral component of the force. Vertical and anteroposterior estimates were independent of running speed (p=0.233 and p=.058, respectively), while mediolateral results were significantly more accurate for low running speeds (p=0.010). The maximum force mean error between measured and estimated values was found during the vertical active peak (0.114 ± 0.088 BW). Findings indicate that artificial neural networks in conjunction with accelerometry may be used to compute three-dimensional ground reaction forces in running.


I. INTRODUCTION
Three-dimensional ground reaction forces (GRFs) are fundamental to our understanding of human locomotion, and the preventions of injuries from high impacts and over-usage [1].However, their direct measurement in running is constrained to the use of instrumented treadmills in laboratory grounds.Such systems permit assessments under controlled conditions with comparable findings to over-ground running [2].Alternatives, such as force plates, are restricted by the collection of a finite number of consecutive steps per running trial.
The associate editor coordinating the review of this manuscript and approving it for publication was Sotirios Goudos .
Wearable solutions, such as triaxial force sensors, also enable three-dimensional recordings with high precision [e.g.3], but with limited utility due to their size and altered interface between the foot and the ground.
Indirect measurement of biomechanical loads with insole pressure sensors [e.g.4,5,6] has also been proposed paving the way for the monitoring of GRFs in the open field.Nonetheless, such commercial sensors show a number of performance limitations [7]- [9], while they still require advanced algorithms for the deduction of three-dimensional GRFs from pressure data [e.g.10,11,12].
With the development of low-cost, accurate and lightweight inertial motion capture units (IMUs), it has become feasible to record kinematics in any environment for prolonged time periods.Numerous studies have examined the validity of estimating GRFs from accelerometry in conjunction with biomechanical models [e.g.13,14,15], artificial neural networks (ANNs) [e.g.16,17,18], or mass-springdamper systems [e.g.19].
In view of the abovementioned methodological approaches, ANNs may return the most accurate approximations [e.g . 18].These computational models are based on the structure and function of the human brain [20] and they have being extensively studied for the past 30 years in a number of contexts; yet, their application in biomechanics have attracted researchers only in the last years [e.g.21,22,23], with only a few studies investigating ANNs for kinetic analyses [e.g.24,25].The effectiveness of such models, derives from the interconnection of simple processing elements (i.e.artificial neurons or units) capable of carrying out parallel computations and transferring information between each other [26].Yet, the architecture of such models contributes greatly to their deployment for engineering applications.
The models used in the present study were feedforward supervised ANNs with backpropagation.Feedforward implies that the input signal is fed to the network's neurons, which in their turn modify and forward the information through the model, until eventually a prediction is generated [20].Neurons in an ANNs are organised into a series of interconnected layers: an input, one or more hidden, and an output layer [20].For the ANN to generate an initial prediction, every connection between two neurons is given a random weight factor, usually not bigger than 1.Additionally, every neuron that does not belong to the input layer is assigned with a random bias term and an activation function; these parameters determine the output of each neuron.Activation functions are commonly employed to transform a linear input signal, enabling the network to model complex non-linear patterns [27].In more detail, every neuron in the input layer receives the signal and transfers it to all units of the first hidden layer; subsequently, receiving units of this layer sum the weighted information, add a bias term, and apply the activation function [28].Eventually, their output in transmitted in a similar fashion to the next layer(s), enabling the model to deliver a prediction about the data or use this information to take a decision.In supervised learning, to obtain the desired result, the known true output is also presented to the system.Whenever the ANN generates a prediction, the algorithm uses a loss function to compare it with the actual recordings, and calls upon a backpropagation algorithm [29] to improve the results by tuning the weight factors and bias terms associated with each neuron.This repetitive process of generating predictions and minimizing the loss function is usually terminated when a pre-determined set of training iterations or a predefined error is reached [29].Apart from the weights and biases, the network's performance may be additionally optimised by adjusting values of hyper-parameters: such variables include the type of activation function (e.g.sigmoid), number of epochs (i.e. one full cycle of iterations of the training set), batch size (i.e. the number of inputs to work through before updating the model's parameters), and dropout rate (i.e. to periodically exclude a random number of neurons to improve the performance of the remaining units) [30].Building a model able to generalise to unpresented input samples requires the dataset in hand to be split into training, validation and test sets.The objective of the training set is to allow the model to train by adjusting its parameters.Then, the validation set is called to provide an unbiased evaluation of the training process.Finally, the test set verifies the working accuracy and generalizability of the ANN.
To the extent of the authors' knowledge, the capacity of an ANN to predict all three GRF components in running conditions from kinematic input is not yet explored.Thereby, the objective of this research was to extend the work of previous authors who estimated vertical biomechanical loads in running [16], [18], to a three-dimensional analysis.Analyses were carried out on a public dataset of running trials as captured by motion cameras and an instrumented treadmill.Successful predictions of the employed ANNs may allow the presented methodological approach to be extended to open field applications with wearable IMUs.

II. METHODOLOGY A. PARTICIPANTS AND DATA COLLECTION
The analyses in this study were carried out on a public dataset of running biomechanics from twenty-eight regular professional runners with a training running volume greater than 20 km per week (age: 34.8 ± 6.6 years; height: 176 ± 6.7 cm; mass: 69.6 ± 7.6 kg; gender: 27 males), as recorded and presented by Fukuchi, et al. [31].Recruits did not report any neurological or musculoskeletal disorders that may affect their performance.
A combination of reflective markers and clusters were attached on the lower limbs and pelvis of each subject [as described in detail in 31].Three 30 sec running trials per participant, at increasing speeds (2.5 m/sec, 3.5 m/sec and 4.5 m/sec) were recorded.Kinematics and kinetics were logged with twelve motion capture cameras (Raptor-4, Motion Analysis, Santa Rosa, CA, USA) and an instrumented dual-belt treadmill (FIT, Bertec, Columbus, OH, USA), operating at 150 Hz and 300 Hz, respectively.

B. DATA PROCESSING
For the demands of this study, only the kinematics of the two shank clusters, each consisting of four markers placed on a rigid shell, were considered.Gaps in the trajectories of the markers were handled with rigid body fills in Vicon Nexus (Oxford, UK).Marker trajectories and force data were then filtered with a low-pass, second order, zero-phase shift Butterworth filter with cut-off frequencies of 20 Hz and 50 Hz, respectively [similarly to 32].Next, the markers' position of each cluster were averaged separately, and double differentiated with respect to time, resulting in the computation of the shanks' acceleration in three dimensions with respect to the laboratory-coordinate system.A threshold of 20 N [similar to 32] in the vertical component of the recorded GRFs was employed for the identification of gait events (foot-strikes and toe-offs).Accelerations and GRFs were both scaled to 100 samples from heel-strike to toe-off (100% of stance phase); GRFs were additionally normalised to body weight (BW).All data processing was done in MATLAB R2017b (Mathworks, Inc., Natick, MA, USA).

C. ARTIFICIAL NEURAL NETWORK
Three supervised feed-forward ANNs were developed in Python 3 (Python Software Foundation, Delaware, US) using the Tensorflow source-platform.Each one of these three networks was fed with a single component of the threedimensional acceleration signal and was dedicated to the prediction of the corresponding vertical, anteroposterior and mediolateral GRF components.
The hidden layer consisted of 10 neurons (n H 1−10 ), and utilised dropout as a regularization method and TanH as an activation function.Finally, the output layer had 100 linear neurons (n O 1−100 ) that generate GRF predictions scaled to 100 data points ( f1 , f2 . . .fn . . .f100 ).Root mean square error (RMSE) was used as loss function to compare predicted and measured GRF force-time waveforms (f 1 , f 2 . . .f n . . .f 100 ).Mean error of the estimation of the peak force was also computed.
To update the weights and biases of the neural networks, the ANNs made use of a backpropagation algorithm and the Adam (Adaptive Moment Estimation) optimizer [33] with an initial learning rate of 10 −3 , and β 1 and β 2 (i.e. the exponential decay rate for the first and second moments) equal to 0.9 and 0.999, respectively.
In view of the neural network modeling, the dataset was randomly split into training (16 subjects; approximately 4,300 stances), validation (6 subjects; approximately 1,430 stances) and test sets (6 subjects; approximately 1,430 stances), accounting for roughly 60%, 20% and 20% of the total sample size, respectively.To guarantee that this partitioning does not affect the predictive capacity of the ANNs, the dataset was randomly split and fed to the network thrice in total, and all the generated predictions are presented in this study (Supplementary Material).In order to ensure that the predictive ability of the model is subject-and speedindependent, all recorded stances of each training set were shuffled before being introduced to the ANNs.
A grid search was employed on the training set (16 subjects) to attain optimal values (from preset ranges) for the following hyper-parameters: number of training epochs (500 or 1000), batch size (64, 128 or 256) and dropout rate (0.2 or 0.5).For each combination of hyper-parameters' values, a leave-one-subject-out cross-validation (LOSO-CV) was carried out [34]: with a 16 subject training dataset, models were constructed on the stances of 15 subjects, and evaluated on the one subject that was left out of the sample.Subsequently, the mean and standard deviation of the RMSE was calculated from the 16 folds.Each feature was standardized with the population mean and standard deviation before the LOSO-CV process.The combination of hyper-parameters that returned estimates with the lower mean RMSEs was considered as the optimum.To demonstrate that the networks were able to generalize their predictions with the selected set of hyper-parameters' values, the generated models were evaluated using the validation set (6 subjects).
Consecutively, training and validation sets were merged into a single new training set (22 subjects; approximately 5,730 stances), and the acceleration inputs were again standardised and shuffled.A LOSO-CV on this data set was then performed to re-train the ANN model, and the validation errors were calculated.
The RMSE was used as a performance function to evaluate the predictions of the test set (6 subjects); in order to standardise the acceleration input signal (α 1 , α 2 . . .α n . . .α 100 ) of the test set stances, the mean values and standard deviations from the newly created training set (22 subjects) were used.Lastly, RMSEs were additionally grouped based on running speed, and one-way ANOVAs and a Welch test were contacted to examine the effect of running speed to the accuracy of the predictions; Shapiro-Wilk and Levene's tests were also carried out to test if the assumptions of the ANOVAs were met.Prediction errors (± S.D.) were finally computed at different peak locations (graphically displayed on Fig. 2) for TABLE 1. Validation error (± S.D.) and the test set RMSE (± S.D.) between measured and predicted GRFs.

TABLE 2.
Mean error (BW) the force peaks between measured and estimated GRF components.Force peak locations are graphically displayed on Fig. 2.
the different components the estimated GRFs.Statistical significance was set at p < 0.05 for all calculations.

III. RESULTS
The optimal hyper-parameters' values were different for each one of the three ANNs that were built to model the GRF components, as well as for the three distinct random dataset splits that were carried out (data not shown).The validation errors that were computed on the dataset of 22 subjects to quantify the models' performance, indicated that the networks' structural parameters were fine-tuned: the RMSE for the vertical component was on average equal to 0.146 (± 0.030) BW, while the calculations for the remaining two components returned even lower mean error values (TABLE 1, Validation error: 0.048 ± 0.009 BW and 0.047 ± 0.009 BW).
The average RMSEs of the GRF estimates of the test set (6 subjects) were comparable when compared to the validation error, demonstrating the networks' good generalizability (Table 1, Test Set: 0.134 ± 0.027 BW for the vertical, 0.041 ± 0.007 BW for the anteroposterior and 0.042 ± 0.006 BW for the mediolateral component).Grouping the predictions by running speed showed that mean RMSE and speed were positively correlated (TABLE 1).To determine if the differences in the predictions between the three speed conditions are statistically significant, three one-way ANOVAs were conducted (one for each force component); there were no significant outliers in the data set, whereas Shapiro-Wilk test of normality was used to affirm that the dependent variable (RMSEs) was approximately normally distributed for each speed condition.Levene's test confirmed homogeneity of variances for the vertical and mediolateral components of the force.There was no statistically significant difference among running speeds for the predictions of the vertical force component (p = 0.233).
Yet, findings for the mediolateral aspect of the GRFs were statistically significant between running speeds (p = 0.010); a Tukey HSD test showed that the difference exists between the 2.5 m/sec and 3.5m/sec (p = 0.043), and the 2.5 m/sec and 4.5 m/sec (p = 0.011) conditions (TABLE 1, in bold).Additionally, a Welch test demonstrated that there was no significant effect of running speed to the estimation of the anteroposterior element of the force (p = .058).
Predicted and measured GRF BW waveforms were averaged and plotted for all the stances of the test set (Fig. 2: all trials), and then again for each separate running condition (Fig. 2: 2.5 m/sec, 3.5 m/sec and 4.5 m/sec).As confirmed from RMSE (TABLE 1) and visual representation (Fig. 2) alike, predictions (dashed orange line) on the vertical and anteroposterior aspect of the reaction forces were highly precise for all running speeds.The mean difference of the force peaks between measured and predicted GRF components are presented in TABLE 2; generally, the ANNs were overestimating the peak of the vertical component (0.114 ± 0.088 BW), while underestimating the propulsive peak (Fig. 2, positive force) of the anteroposterior element (0.028 ± 0.022 BW).In regards to the mediolateral force, the estimates also bear a low mean RMSE error (TABLE 2: 0.042 ± 0.006 BW), while their graphical display shows a general overestimation of the first medial peak (Fig. 2, positive force; TABLE 2: 0.047 ± 0.034 BW).

IV. DISCUSSION
The present study aimed to compute the three-dimensional GRFs in running with the use of accelerometry and ANNs.An openly available motion capture running dataset of professional athletes performing on either competitive or elite level was used [31].The kinematics of the calves were used VOLUME 7, 2019 as an input in our models, while the generated predictions were compared to the recordings of an instrumented treadmill.Estimated waveforms (Fig. 2) and force peak errors (TABLE 2) were accompanied with low errors for all GRF components.
To date, several authors reported methods to approximate biomechanical loads in running [reviewd in 8,9]; for example, Wouda et al. [16] used inertial sensors at the lower legs and pelvis along with an ANN to estimate vertical GRFs with an RMSE less than 0.27 of BW.Billing et al. [10] used insoles with mounted hydrocell sensors and ANNs to predict all three components of GRFs in running with excellent results: mean absolute percentage errors (MAPEs) equal to 1.4%, 6.6% and 39.8% for the vertical, anteroposterior and mediolateral components, respectively.On the other hand, Clark et al. [14] presented a two-mass spring model that predicted vertical ground reaction forces in steady-speed level running with very low RMS errors (0.17 ± 0.07 BW) for different speed conditions (3 to 6 m/sec).Komaris et al. [15] considered the kinematics of the pelvis and thighs and estimated the vertical GRF in slow speeds, spanning from 2.5 to 4.5m/sec, with an overall RMSE equal to 0.14 ± 0.04 BW.Finally, Jie-Han et al. [18] used a single IMU attached on the shoe to estimate the vertical GRF at 2.2 m/sec, 2.5 m/sec and 2.8 m/sec with reported mean RMSEs between 0.015 and 0.017 BW; however, such low errors may be due to the adopted splitting procedure, which led to incorporating from the same subject in both the training and evaluation sets, and thus biasing the performance of the model.Besides, it is well-reported in the literature [e.g.36] that the LOSO is the most fitting cross-validation method to assess the efficacy and generalizability of a model to users not included in the dataset, especially for datasets composed by a limited number of subjects.
In the present study, the predictions of each force component (TABLE 1) were generally accompanied with lower or similar RMSEs than those presently reported in the literature.Furthermore, comparable differences between measured and estimated peak forces were in good agreement with values reported by other authors: mean error for the peak vertical force at 2.5 m/sec was equal to 0.096 ± 0.076 BW (TABLE 2), compared to 0.10 ± 0.155 BW reported by Jie-Han et al. [18].Yet, it should be stressed that these metrics are very dependent on the dataset being examined and no direct comparison should be made between dissimilar techniques on different Even though the collective use of accelerometry and machine learning in the prediction of vertical GRFs in running has been previously demonstrated [16], [18], ANNs in the present study were successfully used to approximate all three loading components at different running speeds.Additionally, the dataset employed for this analysis is considerably larger (28 subjects) and arguably more informative, when compared to the number of recruits from recent similar studies in the literature (e.g.eight subjects by Ngoh et al. [18]; seven subjects by Wouda et al. [16]).
The excellent predictive capacity of the learning models to measure all GRF components is demonstrated by both the low mean RMSEs (TABLE 1, 0.134 BW, 0.041 BW and 0.042 BW) and the graphical representation of the predictions (Fig. 2).Even though the approximation of the anteroposterior and mediolateral aspects of the force returned rather low average errors (TABLE 1, 0.041 BW and 0.042 BW) when compared to the vertical one (0.134 BW), it should be noted that this is also due to the low magnitude of those components when compared to the body weight of the running subject.Yet, both estimates of shear forces exhibit typical patterns in running conditions: for example, the anteroposterior force showed a characteristic biphasic behaviour with a double peaked braking phase [37], [38] and a transition from braking to propulsion occurring at approximately 50% of the total support time (Fig. 2: anteroposterior force, zero crossing).Even though the mediolateral force is characterised by a large intrinsic variability which hinders its standardised graphical display, the estimated waveforms (Fig. 2) exhibit typical double medial and double lateral peaks [37], [38] and a peakto-peak amplitude of approximately 0.13 BW.
Finally, predictions made by the ANNs were generally independent of running speed, with the sole exception of the estimation of the mediolateral force component at 2.5 m/sec (TABLE 1, 0.030 ± 0.003 BW) being significantly more accurate than the corresponding estimations at 3.5 m/sec (0.043 ± 0.009 BW,p = 0.043) and 4.5 m/sec (0.047 ± 0.010 BW,p = .011).

V. CONCLUSION
This is the first study that reports on the potential application of ANN modeling along with lower body kinematics for the estimation of all three GRF components in running.The predictions generated by the ANNs were accompanied with notably low errors, and were generally unrelated to running speed.With the development and application of wearable IMUs being on the rise, the developed model, in conjunction with body-worn sensors, may potentially lead to the widespread measurement of biomechanical loads in open field conditions and sport activities.

FIGURE 2 .
FIGURE 2. Predicted (dashed orange line) and measured (blue solid line) body-weight normalised GRFs (Vertical, Anteroposterior and Mediolateral) of the running trials of 6 subjects (approximately 1,430 stances) at three different speeds (all trials, 2.5 m/sec, 3.5 m/sec and 4.5 m/sec), and the mean of all running speeds.