Multi-Action Knee Contact Force Prediction by Domain Adaptation

Most recent musculoskeletal dynamics estimation methods are designed for predefined actions, such as gait, and do not generalize to various tasks. In this work, we address the problem of estimating internal biomechanical forces during more than one action by introducing unsupervised domain adaptation into a deep learning model. More specifically, we developed a Bidirectional Long Short-Term Memory network for knee contact force prediction, enhanced with correlation alignment layers, in order to minimize the domain shift between kinematic data from different actions. Furthermore, we used the novel Neural State Machine (NSM) as a simulation platform to test and visualize our model predictions in real time over a wide range of trajectories adapted to different 3D scene geometries. We conducted multiple experiments, including comparison with previous models, model alignment across action classes, and real-to-synthetic data alignment. The results showed that the proposed deep learning architecture with domain adaptation performs better than the benchmark in terms of NRMSE and t-test. Overall, our method is capable of predicting knee contact forces for more than one action class using a single architecture and thereby opens the path to estimating internal forces for intermediate actions, while knowledge of the hidden state of motion may be used to support personalized rehabilitation. Moreover, our model can be easily integrated into any human motion simulation environment, which shows its potential to enable biomechanical analysis in an automated and computationally efficient way.


I. INTRODUCTION
COMPUTATIONAL methods that simulate human locomotion along with the internal state of the musculoskeletal system can be essential for creating personalized rehabilitation plans [1], developing clinical assessment tools [2], monitoring the movement of a patient post-operatively [3], etc. Performing experiments that simulate realistic conditions requires solving for a variety of environmental constraints that account for the interactions of the digital human with the 3D scene. Significant progress has been made in computer graphics research to create visually realistic animations of digital human-scene interactions in real time with the help of machine learning (ML) and/or deep learning (DL) [4]. However, such state-of-the-art simulation platforms have not been sufficiently exploited in human biomechanics research to improve the performance and robustness of simulators and to allow for a plurality of experiments under realistic conditions. Some recent efforts in this direction utilize simulation outcomes from musculoskeletal modeling techniques [5], [6] to train ML/DL surrogate models that can help improve the performance of physics-based simulators, especially in the estimation of human biomechanical function. In particular, many recent data-driven approaches focus on ML-based force prediction, including the estimation of internal forces (e.g., knee contact forces) during a specific action, usually gait [5], [7]. Some works produce force estimations for various movements [8], [9] by re-training their models with kinematic and/or dynamic data from different actions. However, they do not offer any potential to generalize to more than one action within a single framework. More importantly, the utilized kinematic data are usually obtained through measurements or computations in simplified environments with limited human-scene interactions.
Iliana Loi and Konstantinos Moustakas are with the Department of Electrical and Computer Engineering, University of Patras, 26504 Patras, Greece (e-mail: loi@ceid.upatras.gr; moustakas@ece.upatras.gr).
Evangelia I. Zacharaki was with the Department of Electrical and Computer Engineering, University of Patras, 26504 Patras, Greece. She is now with the Miller School of Medicine, University of Miami, Miami, FL 33136 USA (e-mail: exz187@med.miami.edu).
Digital Object Identifier 10.1109/TNSRE.2023.3345006
To address these limitations, we exploit a novel deep auto-regressive framework, the NSM [4], as a generator of multi-action synthetic motion. NSM can simulate goal-driven human locomotion with periodic and non-periodic motions and precise 3D scene interactions. Our aim was to integrate a new methodology that estimates internal biomechanical forces, i.e., knee contact forces (KCF), during different tasks into such an advanced digital human motion simulation environment.
In particular, we developed a Bidirectional Long Short-Term Memory (BiLSTM) network that minimizes the domain shift between experimental data from different movements. It aims to provide insight into the internal body state while generalizing to more than one action simultaneously. For that purpose, our BiLSTM model is enriched with an unsupervised domain adaptation layer, called CORAL (CORrelation ALignment) [10], which can also be used for online estimation of forces during action transitions. Moreover, knowledge of the hidden states of a personalized digital human model may be useful for rehabilitation purposes, such as post-surgery locomotion simulation.
To evaluate the ability of our method to produce real-time predictions, we trained the proposed BiLSTM architecture on experimental data of two representative action classes (multi-speed gait and sit-to-stand). We then integrated the trained models into the NSM environment for testing. Since NSM provides a variety of kinematic data with various scene interaction tasks, our approach is easily extendable to other action classes beyond gait and sit-to-stand (STS), given the availability of training data. Overall, our main contributions are summarized as follows:
• We implemented a BiLSTM model for simulating internal biomechanical forces during two tasks by applying unsupervised domain adaptation, i.e., the CORAL method. We show that by minimizing the domain shift between experimental data from different movements, the model may also generalize to similar new actions on which it was not originally trained.
• By estimating knee contact forces for more than one action within a single modeling framework, we provide a solution to the prediction of forces for intermediate actions (movement transitions), which usually require more complex architectures.
• We integrate our model into the NSM environment and assess its capability to produce real-time predictions. Augmenting NSM with joint force prediction enhances the 3D virtual character with realistic physics-based functionalities.

II. RELATED WORK
A. Force Estimation by Machine Learning
Recently, researchers have turned to data-driven (machine learning) approaches for estimating human biomechanics, since they are more automated, require less parameterization and manual effort, and also offer real-time solutions. Most works develop surrogate models for force estimation or prediction. These models focus on medial and lateral knee contact (KC) forces [5], [7], [8], [9] or on muscle forces in the lower extremities [5], [9], [11]. They use a plethora of DL techniques, such as ANNs [7], [8], RNNs, fully connected neural networks [9], and CNNs [5], [9], or ML algorithms, such as principal component regression (i.e., a regression analysis based on PCA) [9].
With respect to the application field, ML-based solutions focus mainly on estimating tibiofemoral load data during gait [5], [7], [9], [11], sit-to-stand [9] or, more rarely, sport movements [8]. Models were trained using raw data (marker motion data, ground reaction forces (GRFs), muscle electromyography (EMG), IMU signals) as well as data derived from musculoskeletal analyses (e.g., KCFs). The models were validated by comparing the networks' estimations with musculoskeletal modeling calculations [5] and/or data from publicly available databases (e.g., the Grand Challenge Competition [12]). Some works, such as [7], examined whether GRFs affect the estimation capability of the models. They concluded that, in the performed experiments, the knee loads estimated without GRFs were similar to those produced when training with GRFs.
All the aforementioned data-driven approaches focus on estimating joint or muscle forces during a specific movement (e.g., gait, squatting) or train different models to predict contact forces during different actions. Thus, they lack a mechanism that allows the same model to adapt during the transition from one action class to another. Addressing the domain shift is the main differentiation of our current work, which, in the long term, aims to adapt predictions in real time while different actions are performed.

B. Domain Adaptation
Domain adaptation is a subcategory of transfer learning that addresses the problem of knowledge transfer between two or more related domains with different distributions by learning domain-invariant models from data [13]. There are several domain adaptation techniques. Domain masks [14], [15] are utilized to enhance the performance of DL models by distinguishing domain-specific features from features that can be shared across domains. Another domain adaptation method is to apply a linear transformation to each domain to project the features to a common space [15], [16]. Such domain adaptation methods were used in [17] and [18] to minimize the domain shift between the distributions of the kinematic data obtained from different subjects, and consequently to enhance the inter-subject accuracy of the proposed neural networks. Specifically, in [17] a regression-supervised domain adaptation framework was developed to estimate EMG-based wrist kinematics from different subjects. In [18] a supervised domain adaptation technique was utilized as the input layer of a recurrent neural network (RNN). It linearly transforms the input features and thus resolves the domain shift during inter-subject gesture recognition based on EMG signals.
However, supervised domain adaptation techniques such as the aforementioned ones cannot be applied when one of the two related domains lacks labeled samples [10]. The latter case is quite common and necessitates the development of unsupervised domain adaptation methods. One approach is to learn a domain-invariant subspace using both labeled source data and pseudo-labeled target data [19]. To reduce error accumulation during learning due to inaccurate pseudo-labeling, [19] proposed a selective pseudo-labeling strategy. It is based on unsupervised clustering analysis that explores the structural information underlying the target domain. Other unsupervised domain adaptation techniques include domain distribution alignment [10] and matrix rank embedding to promote both feature discriminability and transferability [20]. Many state-of-the-art works fall into this category, such as [21], [22], [23], [24], and [25]. In [21] an adversarial domain adaptation network was created for electroencephalogram (EEG) classification; it both aligns the marginal distributions of different domains and aims to decrease the sub-domain shift. Unsupervised domain alignment was also used in [22] for deep sleep staging. An adversarial domain-adaptive technique was developed in [23] to detect fall events of elderly patients using sensors under different device placement and configuration scenarios. Finally, in [24] an unsupervised domain adaptation method combined with a self-guided adaptive sampling scheme was used to account for instantaneous domain shift during classifier updates. The latter enhances EMG feature learning across subjects during gesture recognition.
Fig. 1. A general overview of our BiLSTM architecture. Initially, we train our model with the source dataset $x_t^S$ and without the CORAL layers (Model A). Subsequently, we re-train our model (Model B) using a pair of vectors $x_t^S$, $x_t^T$ measured at the current time frame $t$, using the same source dataset ($x_t^S$) as during initial training. During re-training, we freeze the first two pre-trained BiLSTM layers in order to use them as feature extractors. We re-train the last two as well as a fully connected layer that acts as our predictor. Each domain adaptation layer (CORAL layer) produces a $D_S^{*}$ vector that consists of the transformed source features.
In this work, we performed unsupervised domain adaptation based on correlation alignment [10] in order to minimize the domain shift between experimental data stemming from different actions.Our model may be easily integrated into any motion synthesis framework, since CORAL is unsupervised, avoiding the need for labeled data in the target domain.More details on our implementation of domain adaptation are provided in section III-A.

A. Unsupervised Domain Adaptation
Considering that our target domain data are unlabeled (no KCFs are provided), we introduce an unsupervised domain adaptation technique into our modeling framework. This enables the estimation of contact forces concurrently with motion synthesis in any virtual environment, such as NSM. Let us assume that the dataset in the source domain contains $n_S$ labeled samples and is denoted by $D_S = \{(x_i^S, y_i^S)\}_{i=1}^{n_S}$, where $x_i^S \in \mathbb{R}^d$ is the feature vector of sample $i$ ($d$ is the number of input variables) with corresponding label $y_i^S$. The dataset in the target domain, $D_T = \{x_j^T\}_{j=1}^{n_T}$, consists of $n_T$ unlabeled samples with the same number of input variables, $x_j^T \in \mathbb{R}^d$ [13], [19]. We used CORAL, an unsupervised domain adaptation method [10], to align the distributions of the source and target domains. This is performed by computing the covariances of the source and target features and linearly transforming the source data [10], which can be mathematically formulated as:
$$C_S = \mathrm{cov}(D_S) + H_S, \qquad C_T = \mathrm{cov}(D_T) + H_T, \qquad D_S^{*} = D_S A, \quad A = C_S^{-1/2}\, C_T^{1/2} \qquad (1)$$
where $\mathrm{cov}(D_S)$ and $\mathrm{cov}(D_T)$ are the covariances of the source and target features, respectively. $H_S$ is a diagonal matrix with a regularization parameter $\lambda$ on its diagonal elements, which is set to 1. It is added to the covariance matrix of the source to make it explicitly full rank. Matrix $H_T$ is the same matrix as $H_S$ but with its second dimension corresponding to the second dimension of the target matrix, $D_T$. By applying the linear transformation $A$ to the original source features, the distance between the covariances of the source and target domains is minimized. We used the following residual distance (Eq. 2) as an evaluation measure of the effectiveness of the introduced CORAL layer:
$$E = \left\| C_{\hat{S}} - C_T \right\|_F^2 \qquad (2)$$
where $C_{\hat{S}}$ is the covariance of the transformed source features, $D_S^{*}$, and $\|\cdot\|_F$ denotes the Frobenius norm [10].
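The covariance alignment described in this subsection can be sketched in a few lines of NumPy. This is an illustrative, offline version under our own assumptions, not the custom Keras layer used in this work. Features are the rows of a matrix, and $H_S$ and $H_T$ are $\lambda I$ with $\lambda = 1$. Matrix square roots are taken through an eigendecomposition. The residual distance compares plain (unregularized) covariances with a squared Frobenius norm. The helper names `sym_matrix_power`, `coral_transform` and `residual_distance` are hypothetical.

```python
import numpy as np


def sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T  # V diag(w**p) V^T


def coral_transform(d_s, d_t, lam=1.0):
    """Re-colour source features d_s (n_S x d) with the covariance of target features d_t (n_T x d).

    Returns the transformed source features D*_S and the linear map A.
    """
    h_s = lam * np.eye(d_s.shape[1])  # H_S: regulariser that makes cov(D_S) full rank
    h_t = lam * np.eye(d_t.shape[1])  # H_T: same, sized by the target feature dimension
    c_s = np.cov(d_s, rowvar=False) + h_s
    c_t = np.cov(d_t, rowvar=False) + h_t
    a = sym_matrix_power(c_s, -0.5) @ sym_matrix_power(c_t, 0.5)  # whiten, then re-colour
    return d_s @ a, a


def residual_distance(features, d_t):
    """Squared Frobenius distance between the feature covariance and the target covariance."""
    diff = np.cov(features, rowvar=False) - np.cov(d_t, rowvar=False)
    return float(np.sum(diff ** 2))


# Synthetic demo: the two "domains" differ only in their per-feature variances.
rng = np.random.default_rng(0)
src = rng.normal(size=(2000, 3)) * np.array([10.0, 20.0, 5.0])
tgt = rng.normal(size=(2000, 3)) * np.array([50.0, 3.0, 30.0])
aligned, A = coral_transform(src, tgt)
e_before = residual_distance(src, tgt)
e_after = residual_distance(aligned, tgt)
print(f"E_before = {e_before:.2f}, E_after = {e_after:.2f}")
```

Because $\lambda$ is added to both covariances, the residual after alignment is small but not exactly zero. Setting `lam` close to 0 drives it toward zero, provided the source covariance is full rank.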
We implemented CORAL as a custom layer using the Keras functional API [26]. Similarly to other works [10], [15], our BiLSTM network is used as a representation learner, while the CORAL layers are introduced to align the extracted features, as illustrated in Fig. 1 (Model B). We initially train our model using the available motion capture data without CORAL (Model A in Fig. 1). Then, we introduce domain adaptation as the last two layers of our architecture. We re-train our model using the same source data as for Model A and a different dataset as the target. The domain shift can be due to differences in the movement pattern (action) or in the data generation process (real or synthetic). The different evaluation scenarios are described in Section V.

B. Deep Learning Models
1) BiLSTM for Knee Contact Force Prediction: BiLSTM was introduced in [27] as an extension of the LSTM architecture. The LSTM is a recurrent neural network (RNN) designed to remember previously parsed data and to prevent this information from gradually vanishing during training. A BiLSTM runs two such recurrent sequences in parallel, one with forward and one with backward feedback connections. In this way, the model is exposed to both past and future information with respect to a specific time step.
In this work, we developed a deep BiLSTM network to predict medial and lateral knee contact forces based on human kinematics data. By the term "deep" we refer to a stacked BiLSTM architecture, where the output of each BiLSTM hidden layer is the input to the next BiLSTM hidden layer (Fig. 1). Recent publications [29], [30] suggest that stacked architectures perform better than single-layer ones. Multiple hidden layers build higher-level representations of time-sequence data [29] and are hence more useful for time-series prediction problems such as the one discussed in this publication.
Our baseline BiLSTM network (Model A) consists of 4 BiLSTM layers, each containing 128 units. A fully connected dense layer forms the final layer and produces 5 outputs, as many as the KCF components we wish to predict. This architecture was designed based on experiments. These showed that 2 or 3 BiLSTM layers are too few and produce high error values, while 5 or more yield almost identical results to our current 4-layer architecture. Since fewer layers require fewer computational resources, and hence less training time, we kept 4 layers. Regarding the hyperparameters, each layer uses a Rectified Linear Unit (ReLU) activation function. The adaptive moment estimation (Adam) optimizer [31] was used with a learning rate of 0.001. It ameliorates the problem of gradient descent getting stuck at local minima during training and accelerates the convergence of the learning process [30]. The model was trained for 30 epochs with a batch size of 32 and converged after approximately 15 epochs. Moreover, early stopping was used to prevent the model from over-fitting. Table I compares different hyperparameter values for our baseline BiLSTM (Model A) in terms of Normalised Root Mean Square Error (NRMSE) for the gait action. A smaller batch size lets the model process the training dataset in smaller portions at a time. More epochs expose the model to the same data for more iterations, leading to better convergence and accuracy. The learning rate determines the step size of each weight update, and Table I shows that the default value of 0.001 offers a slight boost in performance. The BiLSTM model was developed and trained in Python Keras.
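The following NumPy sketch makes the stacked, bidirectional structure concrete. It runs a forward pass through 4 BiLSTM layers of 128 units (the defaults) and a dense layer with 5 outputs. It is an illustration with random, untrained weights, not the trained Keras model, and the helper names are hypothetical. ReLU is applied to the candidate and the cell output, which mirrors how Keras uses an LSTM's `activation` argument, while the gates keep a sigmoid. Forward states at frame t depend only on frames up to t, whereas backward states depend on frames from t onward. This is what exposes the model to both past and future context.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def relu(x):
    return np.maximum(x, 0.0)


def lstm_pass(x, w, u, b, act=relu):
    """Run one LSTM over x of shape (T, f) and return the hidden states, shape (T, H).

    w: (f, 4H) input weights, u: (H, 4H) recurrent weights, b: (4H,) bias.
    Gates are laid out as [input, forget, candidate, output].
    """
    units = u.shape[0]
    h = np.zeros(units)
    c = np.zeros(units)
    states = []
    for x_t in x:
        z = x_t @ w + h @ u + b
        i = sigmoid(z[:units])
        f = sigmoid(z[units:2 * units])
        g = act(z[2 * units:3 * units])
        o = sigmoid(z[3 * units:])
        c = f * c + i * g  # cell state carries information across time steps
        h = o * act(c)
        states.append(h)
    return np.stack(states)


def bilstm_layer(x, fwd, bwd):
    """Concatenate a forward pass with a backward pass re-aligned to the original time order."""
    h_fwd = lstm_pass(x, *fwd)
    h_bwd = lstm_pass(x[::-1], *bwd)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=-1)


def init_stack(n_features, units=128, n_layers=4, n_outputs=5, scale=0.1, seed=0):
    """Random weights for n_layers stacked BiLSTM layers plus a dense output layer."""
    rng = np.random.default_rng(seed)
    layers, in_dim = [], n_features
    for _ in range(n_layers):
        fwd, bwd = [(rng.normal(0, scale, (in_dim, 4 * units)),
                     rng.normal(0, scale, (units, 4 * units)),
                     np.zeros(4 * units)) for _ in range(2)]
        layers.append((fwd, bwd))
        in_dim = 2 * units  # the next layer sees forward and backward states
    dense = (rng.normal(0, scale, (in_dim, n_outputs)), np.zeros(n_outputs))
    return layers, dense


def predict_kcf(x, layers, dense):
    """Map a (T, k) joint-angle sequence to a (T, n_outputs) force sequence."""
    h = x
    for fwd, bwd in layers:
        h = bilstm_layer(h, fwd, bwd)  # output of each layer feeds the next
    w_out, b_out = dense
    return h @ w_out + b_out


rng = np.random.default_rng(1)
angles = rng.uniform(size=(50, 8))  # 50 frames, 8 joint angles
layers, dense = init_stack(n_features=8, units=16)
forces = predict_kcf(angles, layers, dense)  # shape (50, 5)
```

In Keras, the `Bidirectional` wrapper performs the same forward/backward concatenation by default (`merge_mode="concat"`).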
The input to our final model (Model B) is a pair $(x_t^S, x_t^T)$, where $x_t^S$ is the source input and $x_t^T$ is the target input. Each source and target input instance that our model parses is a two-dimensional array of dimensions $[t, k]$, where $k$ is the number of joint angles measured at the current time frame $t$. As for the output, each source output instance $\hat{y}_t^S$ is also a 2D array, of dimensions $[t, l]$, containing the knee joint forces that our model predicts. In our case, each input instance contains 8 joint angles, as many as the virtual avatar used for visualizing NSM's results has, and each output instance contains 5 KCFs. However, every LSTM, including a BiLSTM, expects a 3D input of dimensions $[N, n, f]$. Here $N$ is the number of samples (subjects) parsed by the model at each iteration: $N$ equals the batch size during training, whereas $N = 1$ during inference. $n$ is the number of time steps, and $f$ is the number of features.
During training, the model was evaluated using 3-fold cross-validation across all subjects (also known as the leave-subject-out evaluation method). The subjects are split into 3 folds, so that two-thirds of the data are used for training and the remaining one-third for testing. This process is repeated until every fold has appeared in the test data. An average prediction accuracy was then computed from the results of the three folds.
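The following sketch shows the reshaping from the stored 2D table to the 3D $[N, n, f]$ input, together with a subject-wise 3-fold split. It assumes that rows are ordered trial by trial, that every trial has the same resampled length, and that a trial-to-subject lookup is available. The helper names and the random fold assignment are hypothetical choices, not the exact splitting code used in this work.

```python
import numpy as np


def to_sequences(flat, n_trials, trial_len):
    """Reshape a [M*W, k] frame table into an [M, W, k] tensor (trials, time steps, features).

    Assumes rows are ordered trial by trial, frame by frame within each trial.
    """
    if flat.shape[0] != n_trials * trial_len:
        raise ValueError("row count must equal n_trials * trial_len")
    return flat.reshape(n_trials, trial_len, flat.shape[1])


def leave_subjects_out_folds(subject_of_trial, n_folds=3, seed=0):
    """Yield (train, test) trial indices; every subject lands in exactly one test fold."""
    subjects = np.unique(subject_of_trial)
    np.random.default_rng(seed).shuffle(subjects)
    for held_out in np.array_split(subjects, n_folds):
        in_test = np.isin(subject_of_trial, held_out)
        yield np.flatnonzero(~in_test), np.flatnonzero(in_test)


# Toy example: 6 subjects x 2 trials, 100 resampled frames, 8 joint angles.
rng = np.random.default_rng(0)
flat = rng.normal(size=(12 * 100, 8))
X = to_sequences(flat, n_trials=12, trial_len=100)  # (12, 100, 8)
subject_of_trial = np.repeat(np.arange(6), 2)  # trial index -> subject id
folds = list(leave_subjects_out_folds(subject_of_trial))
for train_idx, test_idx in folds:
    X_train, X_test = X[train_idx], X[test_idx]  # batches of shape [N, n, f]
```

Splitting on subject identifiers rather than on individual trials keeps every trial of a held-out subject out of the training set.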
2) Artificial Neural Network Model (ANN_2): For comparison with the proposed BiLSTM architecture, we implemented a feed-forward ANN model (as an extension of previous work [7]) and trained it to predict KCFs beyond gait motion. This new ANN, which we refer to as ANN_2 [32], has 3 hidden layers instead of 2. It also has fewer neurons in each hidden layer than the original ANN [7], namely 121 instead of 400. Some network hyperparameters were also modified: the batch size was set to 256, the learning rate to 0.0001, and the use of biases was enabled. The network was trained and tested in Python Keras [26]. During training, only joint angle measurements were used, without GRFs. Similarly to BiLSTM, ANN_2 was evaluated with 3-fold cross-validation across all subjects (n = 54 for the gait and n = 19 for the STS human motion capture dataset). More details regarding the utilized datasets are provided in Section IV.

C. A Deep Auto-Regressive Model -The Neural State Machine
NSM [4] is a deep auto-regressive algorithm for goal-driven prediction of a virtual character's motion and interaction with scene objects in real time. As illustrated in Fig. 2, NSM consists of two machine learning networks:
i) the motion prediction network, consisting of a prediction and an encoder module. It synthesizes the virtual character's pose in the current and future time frames based on the user's control signals and the character's state (input $X_t$ at time frame $t$, i.e., the character's pose, trajectory data, goal position and orientation, action at the goal, the interaction scene geometry, etc.);
ii) the gating network, a fully connected network that is responsible for the transition between different action classes based on a subset of parameters of the current input, $A_t \subseteq X_t$.
Overall, NSM takes as input the character's pose, trajectory, and goal data from the current frame $t$ and predicts those parameters for the next frame $t+1$. Furthermore, this model enables the generation of various action classes (walk, run, sit, open, carry, climb, and idle) while supporting automatic transitions between them, such as sit-to-idle [4].
In this work, we integrated our BiLSTM architecture (more specifically, Model B) into NSM in order to perform real-time knee contact force prediction simultaneously with motion prediction, as illustrated in Fig. 2. In particular, a subset of the parameters output by NSM, namely the character's pose (predicted joint angles), is fed to our BiLSTM (Model B) to predict KCFs. We integrated our enhanced model (Model B) into NSM, rather than our baseline model (Model A). The reason is that the synthetic data produced by NSM follow a different distribution than the experimental data available for model training. A further issue is that NSM does not provide force predictions, so we could not train a network solely on synthetic data from NSM. Consequently, before incorporating our model into NSM, we re-trained our network using real (human experimental) data as the source and synthetic (virtual data from NSM) data as the target, thereby minimizing the domain shift. The estimated KCFs are displayed in real time in the NSM environment for visual assessment.
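The text does not specify how per-frame NSM output is buffered before it reaches Model B. One possible arrangement, shown here purely as a hypothetical sketch, keeps a fixed-length rolling window of the most recent joint-angle frames and reports the forces for the newest frame. The `StreamingKcfPredictor` class, the window length, and the stand-in model are all assumptions, not part of NSM or of the described implementation.

```python
from collections import deque

import numpy as np


class StreamingKcfPredictor:
    """Hypothetical per-frame wrapper: buffers NSM joint angles and queries a sequence model."""

    def __init__(self, model_fn, window, n_angles=8):
        self.model_fn = model_fn  # assumed to map (window, n_angles) -> (window, n_forces)
        self.n_angles = n_angles
        self.buffer = deque(maxlen=window)  # the oldest frame is dropped automatically

    def push(self, joint_angles):
        """Add the newest frame; return its KCFs, or None until the window is full."""
        frame = np.asarray(joint_angles, dtype=float)
        if frame.shape != (self.n_angles,):
            raise ValueError(f"expected {self.n_angles} joint angles, got shape {frame.shape}")
        self.buffer.append(frame)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        forces = self.model_fn(np.stack(list(self.buffer)))
        return forces[-1]


# Stand-in model for illustration: every force channel equals the sum of the angles.
def dummy_model(seq):
    return np.repeat(seq.sum(axis=1, keepdims=True), 5, axis=1)


predictor = StreamingKcfPredictor(dummy_model, window=3)
outputs = [predictor.push(np.full(8, float(t))) for t in range(5)]
```

One design caveat applies. With a bidirectional model, the newest frame in the window has no future context. An implementation might therefore report the forces for a frame a few steps behind the newest one, trading a small latency for accuracy.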

IV. DATASETS
A. Gait of Variable Speed
The experimental gait dataset was obtained from previous work [7]. It contains gait data of 54 healthy subjects, both young (mean age 22 years) and older (mean age 69.6 years). The subjects were asked to perform gait trials of variable speed, with speed increasing from 3 to 7 km/h in increments of 1 km/h, while being monitored with a 10-camera VICON system. The overall dataset consists of 4874 gait trials. The trajectories of 42 markers, placed on specific anatomical landmarks of the subjects' bodies, were recorded in order to capture the motion of the lumbar, hip, knee, ankle, and pelvis.
Since this experimental dataset contains physics-based simulations of body forces, it can be used as ground truth for training our BiLSTM model.However, the trained model cannot be directly applied in a virtual environment such as the one of NSM due to potential differences in the input data distributions.In order to mitigate the domain shift through domain adaptation, the target data space had to be determined.For this purpose, we created a synthetic gait dataset in the virtual environment by monitoring the movement of the virtual avatar in the NSM, while performing gait trials of various speeds in real-time.More specifically, the joint angles of the 3D character were recorded using C++ scripts, and the data were stored in Excel files.During this process, we provided user signals (goals) that would lead the 3D character only to the gait state avoiding transitions to other states.
Regarding pre-processing, the KCFs were calculated through musculoskeletal modeling. Specifically, the markers' spatial trajectories and GRFs were used to extract joint kinematics (angles) and KCFs through musculoskeletal modeling processes carried out in OpenSim [33]. This process computes joint angles for lumbar extension/bending/rotation, hip flexion/adduction/rotation, pelvis tilt/list/rotation, knee angle, patella-knee angle, ankle angle, and subtalar angle, as well as six components of the medial and lateral KCFs. Subsequently, these data are converted to dataframe format using the Python Data Analysis Library (Pandas) [34].
In other words, our input data are stored in a 2D array of dimensions $[M_g \cdot W_g, k_g]$. Here $M_g$ is the number of trials for all subjects, $W_g$ is the duration of each input gait instance, and $k_g = 13$ is the number of joint angles. Likewise, our output is a 2D array of dimensions $[M_g \cdot W_g, l_g]$, where $l_g = 5$ is the number of KCFs. As mentioned earlier, we computed three medial (along the x, y, and z axes) and three lateral (along the x, y, and z axes) knee contact force components. We consider only two lateral forces, since the lateral force along the x-axis is practically negligible, resulting in 5 KCFs in total. All time series are synchronized by resampling so that they have the same duration, which is essential for temporal pattern comparison in time-series forecasting problems.
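Synchronization by resampling can be sketched with linear interpolation over a normalized time axis. The text does not state the resampling method or $W_g$. The target length of 100 frames, the choice of linear interpolation, and the function names `resample_trial` and `stack_trials` are assumptions for illustration.

```python
import numpy as np


def resample_trial(trial, n_frames=100):
    """Linearly resample a (T, c) trial onto n_frames points of a normalized time axis [0, 1]."""
    trial = np.asarray(trial, dtype=float)
    src_time = np.linspace(0.0, 1.0, trial.shape[0])
    dst_time = np.linspace(0.0, 1.0, n_frames)
    return np.column_stack(
        [np.interp(dst_time, src_time, trial[:, col]) for col in range(trial.shape[1])]
    )


def stack_trials(trials, n_frames=100):
    """Resample every trial to n_frames and stack them into an [M * n_frames, c] table."""
    return np.concatenate([resample_trial(t, n_frames) for t in trials], axis=0)


# Two toy trials with different durations but the same 2 channels.
short_trial = np.column_stack([np.arange(11.0), 2 * np.arange(11.0)])
long_trial = np.column_stack([np.linspace(0.0, 10.0, 37), np.zeros(37)])
table = stack_trials([short_trial, long_trial], n_frames=21)  # shape (42, 2)
```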
Before introducing the data to our network, we omit variables whose standard deviation is less than $10^{-6}$. We also omit the patella-knee angle, subtalar angle, lumbar extension, lumbar bending, and lumbar rotation, since these joints have no correspondence in the skeletal avatar of NSM. Because these variables are not included in the contact force prediction problem for the skeletal virtual character in the NSM application, the size of the input data is reduced to $k_g = 8$. All input and output data are normalized to the range [0, 1]. For the output data in particular, the scaling parameters were stored as part of the modeling framework and used to scale the predicted values back to their original range. Furthermore, we performed outlier detection and trimming by rejecting values higher than the 99% quantile or lower than the 1% quantile of the data distribution. The same procedure was followed to pre-process the corresponding synthetic gait dataset.
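The pre-processing steps in this paragraph can be sketched in NumPy as follows, with two assumptions. First, the scaler class is a hypothetical stand-in; scikit-learn's `MinMaxScaler` offers the same `fit`/`transform`/`inverse_transform` pattern. Second, outliers are handled here by clipping to the 1% and 99% quantiles (winsorizing) instead of rejecting them, which keeps every resampled trial at its full length. The joint names are illustrative.

```python
import numpy as np


def drop_near_constant(x, names, tol=1e-6):
    """Drop input columns whose standard deviation is below tol."""
    keep = x.std(axis=0) >= tol
    return x[:, keep], [name for name, k in zip(names, keep) if k]


def clip_to_quantiles(data, low=0.01, high=0.99):
    """Winsorize each column to its [low, high] quantile band (clipping, not row removal)."""
    lo, hi = np.quantile(data, [low, high], axis=0)
    return np.clip(data, lo, hi)


class UnitRangeScaler:
    """Min-max scaling to [0, 1] that keeps its parameters to map predictions back."""

    def fit(self, data):
        self.min_ = data.min(axis=0)
        span = data.max(axis=0) - self.min_
        self.span_ = np.where(span > 0, span, 1.0)  # guard against zero-width columns
        return self

    def transform(self, data):
        return (data - self.min_) / self.span_

    def inverse_transform(self, scaled):
        return scaled * self.span_ + self.min_


rng = np.random.default_rng(0)
names = ["hip_flexion", "knee_angle", "ankle_angle", "unused_constant"]
x = np.column_stack([rng.normal(size=(1000, 3)), np.full(1000, 5.0)])
y = rng.normal(loc=500.0, scale=100.0, size=(1000, 5))  # stand-in KCF values

x, names = drop_near_constant(x, names)
x = clip_to_quantiles(x)
x_scaled = UnitRangeScaler().fit(x).transform(x)
y_scaler = UnitRangeScaler().fit(clip_to_quantiles(y))
y_scaled = y_scaler.transform(clip_to_quantiles(y))
y_back = y_scaler.inverse_transform(y_scaled)  # e.g. applied to model predictions
```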

B. Sit-To-Stand
To evaluate whether our method generalizes to actions other than gait, we tested our model using an STS human motion capture dataset. This dataset is part of the publicly available KIT Whole-Body Human Motion Database [35], [36]. The KIT database contains motion capture, auxiliary (e.g., external and internal forces), and anthropometric data, as well as video recordings, from 53 different subjects (16 female and 37 male) aged 15 to 55 years. The subjects perform a wide range of actions, including environment and human-object interactions. Moreover, this database was recently enriched with data obtained from 2 subjects performing 12 bimanual daily household activities [37]. These include cooking chores such as peeling fruits and vegetables, mixing ingredients, and pouring, and cleaning chores such as sweeping.
The STS dataset extracted from KIT contains 266 trials of both left and right leg joint angle measurements from 19 subjects. These were recorded using an optical Vicon MX motion capture system with 56 markers covering specific anatomical landmarks of the whole body. As in the gait case, this STS dataset is used as a source dataset in our experiments; thus, knee joint forces were added through OpenSim analyses as described in Section IV-A. Furthermore, an STS synthetic dataset was also created using the NSM framework by recording the virtual avatar's joint angles while it sat on chairs with various geometries. This dataset was used as the target dataset in one of our experiments.
The experimental STS dataset was pre-processed following the pipeline described in [38], and the same parameters were therefore extracted. These are the joint angles for pelvis tilt/list/rotation, hip flexion/adduction/rotation, lumbar extension/bending/rotation, knee angle, patella-knee angle, ankle angle, and subtalar angle, together with the internal knee forces. They were extracted from the raw marker positions and GRFs via OpenSim's Inverse Kinematics (IK) and Joint Reaction Analysis (JRA) tools, respectively. During this movement, the lumbar extension, lumbar bending, lumbar rotation, and subtalar angles have very small (close to 0) values and were therefore omitted. In addition to the subtalar angle, the patella-knee angle was also omitted from the dataset, since the skeleton of NSM's 3D character does not have these two joints.
The input data $x$ of the STS dataset are stored in a 2D dataframe of dimensions $[M_s \cdot W_s, k_s]$. Here $M_s$ is the total number of trials for all 19 subjects, $W_s$ is the duration of each trial, and $k_s = 8$ is the number of joint angles after omitting the aforementioned parameters. The output, $y$, is also a 2D array of dimensions $[M_s \cdot W_s, l_s]$, where initially $l_s = 6$ is the number of KCFs. In contrast to the gait movement, here the medial force along the y-axis is negligible and is thus omitted, reducing the number of KCFs to $l_s = 5$. Synchronization by resampling was subsequently performed, as for the gait dataset. Both input and output data were normalized by subtracting the mean and dividing by the standard deviation. Outlier detection and trimming were also performed in this case. Moreover, we followed the same procedure to pre-process the synthetic STS dataset.

V. RESULTS
We conducted three sets of experiments, which are summarized in Table II. First, we evaluate our baseline model without the CORAL layers (Model A) by training it with either the gait or the STS experimental dataset (columns Exp1_Gait and Exp1_STS). The results are compared with methods from previous works [7], [32] that use the same datasets and cross-validation setting. In all experiments, both the BiLSTM and the ANN models were trained with the Mean Square Error (MSE) loss. It is computed between the biomechanical simulations (serving as ground truth) and the model predictions. To assess the obtained results, we used the Normalised Root Mean Squared Error (NRMSE) [7]. Because it is scale-invariant, it allows errors to be compared across action classes and force components (medial, lateral).
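The exact NRMSE normalization of [7] is not restated here. A common definition, assumed in the following sketch, divides the root mean square error of each output by the range of its ground-truth values. The MSE training loss is included for comparison. Both function names are hypothetical.

```python
import numpy as np


def nrmse(y_true, y_pred):
    """Per-output RMSE divided by the range (max - min) of the ground truth."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return rmse / (y_true.max(axis=0) - y_true.min(axis=0))


def mse_loss(y_true, y_pred):
    """Mean squared error over all samples and outputs (the training loss)."""
    return float(np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2))


truth = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
pred = truth + np.array([1.0, 5.0])  # constant offsets per output column
print(nrmse(truth, pred))
print(nrmse(100 * truth, 100 * pred))
```

The range scales together with the errors, so multiplying both predictions and ground truth by a constant leaves the NRMSE unchanged. This property is what allows comparisons across force components.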
Subsequently, we present the results of our baseline model when enriched with the unsupervised domain adaptation technique, i.e., CORAL layers [10] (column Exp2 in Table II). In this experiment, the first layers of Model A, pre-trained on the experimental gait dataset, act as a representation learner. CORAL is implemented within the last two layers of our architecture (Model B) to align source and target features. We then re-trained our model using the gait experimental dataset as the source and the STS experimental dataset as the target. Finally, we tested our enhanced model (Model B) on the STS (target) data. The aim is to assess whether a model originally trained on gait data can be used for KCF estimation during a different motion type (action) using a simple domain adaptation method such as CORAL.
Lastly, we assessed whether our model can be integrated into a virtual environment for real-time motion synthesis, such as NSM. We conducted a similar experiment as before, but using the synthetic gait and STS datasets (described in Section IV) as the target domain (columns Exp3_Gait and Exp3_STS in Table II). More specifically, we trained Model B in two configurations. (i) The gait experimental dataset is the source and the gait synthetic dataset is the target. (ii) The STS experimental dataset is the source and the STS synthetic dataset is the target. Similarly to [32], we visualize the medial and lateral KCFs on the virtual character to provide real-time force estimation feedback. This experiment addresses a completely different problem than Experiment 2. Here we align two datasets that have different distributions but share the same action class, whereas in Experiment 2 we align data of different distributions and classes.

A. Experiment 1: Evaluation of Proposed BiLSTM
To assess our framework, we compared the performance of Model A (without any domain adaptation) with previous machine learning models in a knee contact force estimation scenario. In Table III, the NRMSE of the ANN and SVR models [7], averaged across folds, is reported. The compared models were trained with the same dataset, i.e., without taking into consideration motion-dependent variables (e.g., GRFs), and were also evaluated using the same leave-subject-out validation scheme. Subsequently, we conducted the same experiment by training our BiLSTM model (Model A) with the experimental STS dataset and compared our results against the ANN_2 model [7], [32], which was also trained and tested using the same STS dataset. These results are also reported in Table III (rows 6 and 7).

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE II SUMMARY OF EXPERIMENTS PERFORMED IN THIS STUDY
TABLE III OF OUR BILSTM (MODEL A) MODEL AGAINST PREVIOUS MODELS IN TERMS OF NRMSE DURING GAIT AND SIT-TO-STAND
We observe that in both cases (gait and sit-to-stand), Model A performs better than the previous models presented in [7] and [32]. This result is most likely attributable to the ability of RNNs to generate predictions by exploiting temporal dependencies, which fully connected feed-forward networks such as the ANN lack. RNNs have a recurrent architecture that acts as a memory mechanism. Thus, unlike feed-forward networks, RNNs can process time sequences and provide either estimations (for the same time frame t) or predictions (for the next time frame t + 1). In particular, a BiLSTM can exploit both past and future dependencies, since it processes the sequence in both a forward and a backward direction, and it is therefore often preferred over conventional models and unidirectional LSTMs.
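To illustrate why a bidirectional network can use future context, the following toy sketch runs a scalar tanh recurrence forward and backward over a sequence and pairs the two hidden states per frame. It is a deliberate simplification, not an LSTM cell (no gates or cell state), and the weights are arbitrary illustrative values.

```python
import math

def run_rnn(xs, w_x, w_h):
    """Scalar Elman recurrence h_t = tanh(w_x * x_t + w_h * h_{t-1}) with h_0 = 0."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional(xs, fwd_w=(0.8, 0.5), bwd_w=(0.6, 0.4)):
    """Pair a forward pass (past context) with a backward pass (future context)."""
    fwd = run_rnn(xs, *fwd_w)              # state at t depends on x_1..x_t
    bwd = run_rnn(xs[::-1], *bwd_w)[::-1]  # state at t depends on x_t..x_T
    return list(zip(fwd, bwd))             # per-frame (forward, backward) pair
```

Changing only the last input frame alters the backward state at the first frame but not the forward state there, which is the additional (future) context a BiLSTM can exploit.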

B. Experiment 2: Model Alignment Across Action Classes
The average NRMSE over all folds is reported in Table IV and illustrates how well our model performs on the STS dataset with and without the CORAL layers. Lower NRMSE values indicate better generalization to new (unseen) data. As mentioned above, we apply the CORAL layers to our pre-trained model and re-train the last layers using the gait experimental dataset as the source and the STS experimental dataset as the target. Then, we test our model on the STS dataset. To evaluate the effectiveness of the domain adaptation (i.e., CORAL) layer of our architecture, we computed the residual distance between source and target features (Eq. 2) before and after domain adaptation. The domain shift before applying the linear transformation A (see Eq. 1) to the source features is $E_{before} = 73739.36$, whereas after linearly transforming the data with A it drops to $E_{after} = 0.62$, indicating that our implementation works as expected.
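The residual distance can be illustrated with a small NumPy function. We assume here that Eq. 1 defines $A = C_s^{-1/2} C_t^{1/2}$ and that Eq. 2 is the CORAL objective of [10], i.e., the squared Frobenius norm of $A^\top C_s A - C_t$; setting $A = I$ gives the distance before adaptation.

```python
import numpy as np

def coral_residual(a, c_s, c_t):
    """Squared Frobenius distance ||A^T C_s A - C_t||_F^2 (assumed form of Eq. 2).

    a:   (k, k) linear transformation applied to the source features
    c_s: (k, k) source covariance, c_t: (k, k) target covariance
    """
    diff = a.T @ c_s @ a - c_t
    return float(np.sum(diff ** 2))
```

For example, with $C_s = \mathrm{diag}(4, 9)$ and $C_t = \mathrm{diag}(1, 4)$, the distance is 34 for $A = I$ and 0 for $A = C_s^{-1/2} C_t^{1/2} = \mathrm{diag}(1/2, 2/3)$.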
As shown in Table IV, our model produces more accurate predictions with the CORAL-based domain adaptation technique than without it. Both the Pearson's correlation coefficient and the NRMSE values in Table IV show that the predictions with CORAL (Model B) are more correlated with (higher $R^2$) and closer to (smaller NRMSE) the ground-truth (experimental) measurements for all KCF components.
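Pearson's correlation coefficient can be computed directly from paired ground-truth and predicted samples, as in the sketch below. We assume the reported $R^2$ is the square of Pearson's r; the text does not state this explicitly, and the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Square of Pearson's r (assumed meaning of R^2 in Table IV)."""
    return pearson_r(x, y) ** 2
```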
To explore whether the predictions obtained with and without domain adaptation, i.e., $\hat{y}_{woDA}$ produced by Model A and $\hat{y}_{wDA}$ produced by Model B, are significantly different, we performed a paired-samples t-test at significance level α = 0.05. The null hypothesis was that the mean difference between the two sets of paired results is zero; hence, we test whether there is a statistically significant difference between the means of our models' predictions. A "pair" consists of the two KCF predictions during STS or gait movement at the same time frame t. Moreover, we conducted the same t-test between the predictions of the proposed Model B ($\hat{y}_{wDA}$) and the ground truth, i.e., the experimentally measured KCFs during STS or gait ($y$). In summary, we performed four paired-samples t-tests: i) Model A vs. Model B during STS movement, to test whether the improvement in KCF prediction achieved by our approach is statistically significant (here, smaller p-values favor the new approach); ii) Model B vs. ground-truth STS data, to determine whether the residual errors are significant (here, larger p-values favor our method); and iii) and iv) the same two tests for the gait data. The p-values were computed automatically using built-in Python functions. If the p-value is less than the significance level α, the null hypothesis is rejected, indicating that the means of the compared results are statistically different.
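The paired-samples test described above can be sketched in pure Python. The text only states that built-in Python functions were used, so the exact routine is unknown. The t statistic below is the standard one, but the p-value uses a normal approximation to the t-distribution, which is close only when the number of paired frames is large; an exact two-sided p-value is available from, e.g., `scipy.stats.ttest_rel`.

```python
import math
import statistics

def paired_t_test(a, b):
    """Paired-samples t-test on per-frame pairs (a[i], b[i]); returns (t, p).

    The two-sided p-value is a normal approximation, reasonable only for large n.
    """
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    sd = statistics.stdev(d)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(d) / (sd / math.sqrt(n))
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p

def reject_null(p, alpha=0.05):
    """Reject H0 (zero mean difference) when p falls below the significance level."""
    return p < alpha
```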
The p-values of the paired-samples t-tests are reported in Table V. All p-values in the first row are less than α (i.e., the null hypothesis is rejected), indicating that the KCF components predicted during STS with and without domain adaptation are significantly different, which highlights the contribution of the proposed CORAL-based architecture (Model B). This result was expected, since in this experiment both Models A and B are trained on the gait experimental dataset (source). Therefore, a model trained solely on gait (Model A) would produce predictions that differ from those of a model with domain adaptation (Model B), which has the advantage of incorporating knowledge of the target STS dataset. Furthermore, the second row shows that the KCF predictions of Model B do not differ significantly from the STS ground-truth values (real human KCF data), further supporting the validity of our approach. As for the gait dataset, the predictions of the proposed model are not significantly different from the ground truth (Table V, fourth row) or from the predictions without domain adaptation (Table V, third row). To further support our findings, the p-values between the predictions of Model B and those of ANN_2 during STS movement are less than α, meaning that there is a statistically significant difference between the mean predictions of the two models, which supports the better performance of our proposed Model B. Finally, the last three rows of Table V provide the p-values for Experiment 1, where we test whether the predictions of Model A are significantly different from those of i) the ANN and ii) the SVR architectures proposed in [7] during gait movement, and iii) ANN_2 [7], [32] during the STS action. In all cases the p-values are greater than α = 0.05, meaning that, although the predictions of Model A are better (smaller NRMSE) than those of previous frameworks, the difference is not statistically significant.

C. Experiment 3: Real-to-Synthetic Model Alignment
Our domain-adapted model (Model B in Fig. 1) is integrated into the real-time auto-regressive motion prediction/synthesis framework to test whether it can produce real-time predictions and exploit the synthesized motion trajectories to forecast the contact forces required to achieve a desired goal. To do so, we retrained our model using either the experimental gait dataset as the source and the synthetic gait dataset as the target, or the experimental STS dataset as the source and the synthetic STS dataset as the target, as summarized in the third column of Table II. Since NSM does not generate joint forces, we cannot train a network entirely on synthetic NSM data; thus, unsupervised domain adaptation (i.e., the integration of CORAL layers) is necessary to minimize the domain shift between the distributions of real and virtual data of the same action class. Although we cannot quantitatively assess the predictions in the virtual environment due to the lack of ground truth, we use the evidence obtained from the gait-to-STS experiments (Section V-B) to infer that our CORAL-based implementation improves predictions in the presence of domain shift.
In order to qualitatively assess the performance of our framework in a human motion simulation, we developed a visualization mechanism in NSM. Specifically, we visualized the medial and lateral KCFs on the digital human model while the character moves and interacts with virtual scene objects in real time. As shown in Fig. 3, we embedded a sphere on the left knee of the human model, using a cyan-to-red color scale to indicate the magnitude of the force (cyan/red corresponding to the minimum/maximum values of the predicted KCF, respectively). In Fig. 3, the sphere is cyan during the idle state, when the KCFs are at their lowest, and turns red when the walking speed of the 3D character increases. These observations match our expectations and indicate that our framework is applicable in real-time scenarios. Moreover, we added two lines (green and red in Fig. 3) at the knee contact points (force application points), which represent both the direction and the magnitude of each force, medial or lateral (i.e., the length of each line changes according to the KCF magnitude). Note that the arrows depicted on the floor of the application in Fig. 3 indicate the trajectory of the character's movement and are part of the NSM virtual environment. Finally, our model runs at approximately 60 frames per second (fps), demonstrating the capability of our framework to produce real-time predictions.
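The color scale and force lines can be sketched as two small functions. We assume a linear cyan-to-red ramp between the minimum and maximum predicted KCF and a line length proportional to the force magnitude; the scale factor (meters per newton) and all names are hypothetical and are not taken from the NSM code base.

```python
import math

CYAN = (0.0, 1.0, 1.0)  # RGB for the minimum predicted KCF
RED = (1.0, 0.0, 0.0)   # RGB for the maximum predicted KCF

def kcf_color(force, f_min, f_max):
    """Linear cyan-to-red color for the knee sphere (assumed linear ramp)."""
    s = (force - f_min) / (f_max - f_min) if f_max > f_min else 0.0
    s = min(max(s, 0.0), 1.0)  # clamp forces outside the observed range
    return tuple(c + s * (r - c) for c, r in zip(CYAN, RED))

def force_line_end(contact_point, direction, magnitude, scale=0.001):
    """End point of a line from the contact point, oriented along the force
    direction, with length proportional to the KCF magnitude.
    scale (meters per newton) is a hypothetical display factor."""
    norm = math.sqrt(sum(d * d for d in direction))
    return tuple(p + scale * magnitude * d / norm for p, d in zip(contact_point, direction))
```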

VI. DISCUSSION AND FUTURE WORK
In this work, we developed a joint contact force prediction framework based on deep learning and unsupervised domain adaptation techniques. DL approaches such as the one proposed in this work are surrogate models for conventional musculoskeletal and computational techniques. These surrogate models have been trained on the output of widely used musculoskeletal tools such as OpenSim and thus inherit their limitations. Therefore, the benefit of machine learning models is not higher accuracy but reproducibility, robustness, and efficiency. Domain adaptation, in turn, has been widely used to enable machine learning models and deep neural networks to learn shared features from multiple data sources. Our network architecture is inspired by works such as [10], [13], and [15], which use the first layers of their models as representation learners and feed the extracted features to domain adaptation (CORAL) layers, whose output feeds the last layer or layers of the network that provide the classification or regression predictions. We conducted three sets of experiments to evaluate different aspects of the proposed model. In the first experiment, we trained and tested our baseline model (Model A in Fig. 1) using human motion capture data from two different action classes, gait and STS, and evaluated its predictions using the NRMSE measure. The results in Table III indicate that our framework performs well for both activities and achieves lower NRMSE than previous similar methods, although these differences were not statistically significant (Table V). In this experiment, our goal was to assess our BiLSTM model (Model A) in order to use it as the base of our domain-adapted Model B. Nevertheless, further validation is required, especially for the STS activity, for which the training dataset was relatively small (266 trials vs. ∼4900 trials in the gait dataset).

Fig. 3. Visualization of the magnitude of medial and lateral KCFs during idle, walking, and walking with increased speed states (the term "running" is misused here). Green and red lines (indicated with a yellow box) represent the direction of the corresponding force, medial or lateral.
The main aim of this work is to provide a model that can simulate the internal biomechanical forces during more than one task. To do so, we enriched our model with an unsupervised domain adaptation technique, the CORAL method [10], to minimize the domain shift between data from different action classes. As a result, our model could eventually be trained offline on a specific class (e.g., gait as the source) and then applied to other classes (e.g., STS or stair ascent/descent as the target) by adapting features during test time. In particular, the developed Model B (depicted in Fig. 1), whose frozen layers are trained on gait data, is able to produce valid predictions for other actions when retrained as described in Subsection III-B.1. Overall, we benchmark our method on gait and sit-to-stand because such data are available; this choice is indicative rather than restrictive.
Our results in Table IV indicate that our model (Model B in Fig. 1) produces more accurate predictions on the target dataset with the domain adaptation layers, i.e., the CORAL layers, than without them (Model A in Fig. 1). Note that both Models A and B were trained on the source (gait) dataset and tested on the target (STS) dataset; thus, the comparison is between a model enhanced with a transfer learning technique (Model B) and a model (Model A) trained on a different dataset (gait) from the one it was applied to (STS). However, if we compare Model B's results (Table IV) with the predictions of Model A when trained on the STS dataset (Table III, rows 6 and 7), Model A performs better, as expected, since in that case it is trained directly on the STS dataset, whereas Model B is fit to that dataset via a transfer learning method. Overall, using this simple yet effective unsupervised domain adaptation approach (CORAL), we create a network that can predict KCFs for more than one action class simultaneously (e.g., gait and STS); since it produces predictions for both actions, our framework opens the path to predictions for intermediate actions, which usually require more complex and computationally expensive architectures.
In our third experiment, we integrated our model into a real-time motion synthesis framework, the NSM, to enhance it with joint force prediction alongside motion prediction and synthesis, and tested it in a real-time force estimation scenario. Moreover, since this motion synthesis framework can predict motion trajectories, our model gains the ability to perform long-term forecasting of the KCFs required to perform goal-driven actions. Our model runs at approximately 60 fps, and the knee contact forces of the virtual avatar were visualized at the same rate while the character moved and interacted in the virtual scene. Because NSM does not generate joint forces, we had no way to quantitatively assess the performance of our model in this setting. Furthermore, our model can be easily integrated into any motion synthesis framework, since the implemented domain adaptation is unsupervised and thus applicable when target output data are not available.
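The auto-regressive coupling described above (and depicted in Fig. 2) can be sketched as a loop in which the motion synthesizer proposes the next pose and the KCF model reads a sliding window of the most recent poses. Both callables are hypothetical stand-ins for NSM and Model B, and padding the initial window with the first pose is our assumption. To sustain 60 fps, each iteration must finish within about 1/60 s ≈ 16.7 ms.

```python
from collections import deque

def run_realtime(synthesize_next_pose, predict_kcf, initial_pose, window, n_frames):
    """Auto-regressive loop: pose x_{t+1} from the synthesizer, KCFs y_{t+1} from
    the most recent `window` poses.

    synthesize_next_pose and predict_kcf are hypothetical callables standing in
    for NSM and Model B, respectively.
    """
    history = deque([initial_pose] * window, maxlen=window)  # pad with the first pose
    pose, forces = initial_pose, []
    for _ in range(n_frames):
        pose = synthesize_next_pose(pose)          # next pose from the motion model
        history.append(pose)                       # oldest pose drops out automatically
        forces.append(predict_kcf(list(history)))  # e.g., (medial, lateral) KCFs
    return forces
```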
We acknowledge that, while our framework offers advantages over biomechanical modeling approaches with respect to computational speed, robustness, and potential for extrapolation to out-of-sample distributions, it lacks precision in cases with limited samples, such as patients with disease (e.g., osteoarthritis). Moreover, our modeling framework is flexible enough to address different clinical scenarios if enough data are available to properly estimate the domain shift, but adaptation is limited to specific impairments, e.g., in the lower extremities due to total knee replacement. Overall, we mainly envision our framework being used to create models of the healthy state, which can later be adapted to the motions of impaired individuals through healthy-to-impaired state adaptation. We plan to investigate this application scenario in future work.
Furthermore, as future work, we intend to further exploit the benefits of deep domain adaptation, such as deep CORAL [39], where CORAL is embedded into a deep neural network not as a layer but as a differentiable loss function, so that the network learns a nonlinear transformation that aligns the correlations of the source and target domains. Our model could also be applied to other joints of the lower or upper body without altering its architecture. For hip or ankle contact force estimation, we could re-train our models (both Model A and B) using the same gait dataset [7], i.e., the same joint angles as input, and compute the corresponding hip and ankle contact forces as output by repeating only the JRA process during musculoskeletal modeling in OpenSim to create the training data (used as ground truth). Finally, we would also like to incorporate physics-based constraints into our framework to generate more accurate results.
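For reference, the Deep CORAL loss of [39], as we understand its formulation, is given below for a d-dimensional feature layer, where $D_S$ and $D_T$ are the source and target feature matrices with $n_S$ and $n_T$ rows and $\mathbf{1}$ is a column vector of ones. In a regression setting like ours, one plausible (untested) choice would add $\lambda\,\ell_{\mathrm{CORAL}}$ to the MSE loss, with $\lambda$ a hypothetical weighting hyperparameter.

```latex
\ell_{\mathrm{CORAL}} = \frac{1}{4d^{2}}\,\lVert C_{S} - C_{T} \rVert_{F}^{2},
\qquad
C_{S} = \frac{1}{n_{S}-1}\left(D_{S}^{\top}D_{S} - \frac{1}{n_{S}}\left(\mathbf{1}^{\top}D_{S}\right)^{\top}\left(\mathbf{1}^{\top}D_{S}\right)\right),
\qquad
C_{T} = \frac{1}{n_{T}-1}\left(D_{T}^{\top}D_{T} - \frac{1}{n_{T}}\left(\mathbf{1}^{\top}D_{T}\right)^{\top}\left(\mathbf{1}^{\top}D_{T}\right)\right),
\qquad
\ell = \ell_{\mathrm{MSE}} + \lambda\,\ell_{\mathrm{CORAL}}
```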

VII. CONCLUSION
In this work, we developed and trained a BiLSTM for predicting knee contact forces without using motion-dependent variables (GRFs, EMGs, etc.), relying only on kinematic data. By integrating a simple yet effective unsupervised domain adaptation technique, i.e., CORAL, our model becomes able to predict KCFs for more than one action class simultaneously and opens the path to estimating internal loads for intermediate actions, a task that usually requires rather complex solutions. Our results indicate that the model enhanced with CORAL (Model B) performs better than the model without it, in terms of both NRMSE and paired t-test analysis. Moreover, we integrated our framework into NSM, a deep auto-regressive algorithm for goal-driven motion synthesis, to i) test the real-time capability of our network and the ease with which it can be incorporated into any motion synthesis system, and ii) augment NSM with a model that simulates internal biomechanical forces. Since there is no robust method to evaluate the real-time predictions of our network during NSM execution, we provide a visualization mechanism to qualitatively assess the validity of our model predictions. Our work offers a simple solution that opens the path for ergonomically adjusted motion estimation and physiology-aware simulation, which can be used as a tool for rehabilitation planning, as well as for predicting human motion in rich environments with realistic scenes and character-scene interactions.

Fig. 2. Schematic diagram of our KCF prediction framework within the NSM [4]. From the output of NSM, the virtual character's predicted pose $x_{t+1}$ (i.e., joint angles) is given as input to our BiLSTM (Model B) to estimate the KCFs $\hat{y}_{t+1}$ at the subsequent frame $t+1$.
Manuscript received 11 August 2023; revised 13 December 2023; accepted 16 December 2023. Date of publication 19 December 2023; date of current version 16 January 2024. This work was supported by the EACEA (European Union's European Education and Culture Executive Agency)-Erasmus+ through the Project: EMMBIOME (Erasmus Mundus Master in Biomedical Engineering) under Grant 101082688. (Corresponding author: Iliana Loi.)

TABLE I HYPERPARAMETERS COMPARISON FOR OUR BILSTM (MODEL A) MODEL IN TERMS OF NRMSE DURING GAIT MOVEMENT

steps that the model predicts and f is the number of input features; thus, our input is reshaped accordingly. Moreover, the source features produced by the first two BiLSTM layers of our model are fed to the CORAL layers, which produce the transformed features $D_s^*$, an array of dimensions [t, k]. For the BiLSTM layers of Model B, the same hyperparameters as for Model A were used, i.e., 128-unit layers with the ReLU activation function and the Adam optimizer with a learning rate of 0.001. Re-training was set to 20 epochs with a batch size of 32.
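The input reshaping and re-training schedule can be sketched as follows. We assume t denotes the window length in frames and that windows are extracted with stride 1; neither the stride nor the function names come from the text. The batch size (32) and epoch count (20) match the values reported above; the 128-unit layers, ReLU activation, and Adam learning rate of 0.001 would be set in the network definition, which is omitted here.

```python
import math

EPOCHS, BATCH_SIZE = 20, 32  # re-training settings reported for Model B

def make_windows(frames, t):
    """Slice a recording of shape [n_frames, f] into overlapping [t, f] windows
    (stride 1 is an assumption), giving an input of shape [samples, t, f]."""
    return [frames[i:i + t] for i in range(len(frames) - t + 1)]

def batches_per_epoch(n_samples, batch_size=BATCH_SIZE):
    """Number of mini-batches per epoch; the last batch may be partial."""
    return math.ceil(n_samples / batch_size)

def total_steps(n_samples):
    """Optimizer updates over the whole re-training run."""
    return EPOCHS * batches_per_epoch(n_samples)
```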