3D Knee and Hip Angle Estimation With Reduced Wearable IMUs via Transfer Learning During Yoga, Golf, Swimming, Badminton, and Dance

Wearable lower-limb joint angle estimation using a reduced inertial measurement unit (IMU) sensor set could enable quick, economical sports injury risk assessment and motion capture; however the vast majority of existing research requires a full IMU set attached to every related body segment and is implemented in only a single movement, typically walking. We thus implemented 3-dimensional knee and hip angle estimation with a reduced IMU sensor set during yoga, golf, swimming (simulated lower body swimming in a seated posture), badminton, and dance movements. Additionally, current deep-learning models undergo an accuracy drop when tested with new and unseen activities, which necessitates collecting large amounts of data for the new activity. However, collecting large datasets for every new activity is time-consuming and expensive. Thus, a transfer learning (TL) approach with long short-term memory neural networks was proposed to enhance the model’s generalization ability towards new activities while minimizing the need for a large new-activity dataset. This approach could transfer the generic knowledge acquired from training the model in the source-activity domain to the target-activity domain. The maximum improvement in estimation accuracy (RMSE) achieved by TL is 23.6 degrees for knee flexion/extension and 22.2 degrees for hip flexion/extension compared to without TL. These results extend the application of motion capture with reduced sensor configurations to a broader range of activities relevant to injury prevention and sports training. Moreover, they enhance the capacity of data-driven models in scenarios where acquiring a substantial amount of training data is challenging.


I. INTRODUCTION
WEARABLE lower-limb joint angle estimation using a reduced inertial measurement unit (IMU) sensor set facilitates the rapid and cost-effective assessment of sports injury risk [1], [2], [3], [4], [5], motion capture [6], [7], [8], [9], and improvement of motion techniques [1], [10], [11]. The IMU is a commonly used, lightweight, and affordable sensor that is often integrated with machine learning models to enable kinematic estimation [6], [10]. Traditional wearable motion capture systems often necessitate one IMU sensor per lower-limb segment to capture the 3D angles of each joint, resulting in the use of numerous sensors [12], [13]. Karatsidis et al. [12] employed a wearable system consisting of 17 IMUs. Utilizing the acceleration signals obtained from IMUs, they reconstructed the kinematics of the anatomical segments. This full IMU configuration can be intrusive, time-consuming, expensive, and susceptible to errors such as sensor swapping during mounting [6], [7]. In contrast, the reduced IMU sensor configuration offers advantages by placing IMU sensors on a subset of the total lower-limb segments [3], [14]. This configuration enhances user comfort, reduces setup time, mitigates wear-related difficulties, and decreases system costs compared to the full IMU configuration [1], [3], [6], [12].
One significant application of the reduced sensor configuration is sports performance monitoring. Athletes and fitness enthusiasts prefer using fewer IMUs, which provides greater freedom of movement and facilitates long-term daily monitoring [15]. This setup allows individuals to receive real-time information about their joint angles and movement techniques, giving them the ability to make adjustments to prevent injury risk and optimize performance [2], [15]. Furthermore, emerging human-computer interaction also benefits from a reduced sensor configuration [6]. This configuration enhances the convenience of interaction and enables the capture of diverse user motion poses, thereby improving the overall user experience [7], [8], [9].
The most widely tested and practical sensor placement for the reduced IMU sensor configuration entails affixing three IMUs to the pelvis and shanks, utilizing one IMU per segment [1], [3], [6], [7], [8], [14], [16]. The prevalent methodology for kinematic estimation modeling involves a deep-learning approach that correlates IMU sensor data with joint angles [10]. Compared to physics-based methods, deep learning methods are favored for their superior accuracy and ability to extract sufficient information from fewer sensors [1], [2], [6], [7], [8], [9], [10], [11], [13], [16], [17].
The vast majority of existing research was implemented in only a single movement activity, typically walking [1], [2], [18], [19], [20], [21], [22], [23], [24]. Mundt et al. [1] trained a long short-term memory (LSTM) neural network to estimate 3D joint angles of the hip, knee, and ankle joints during self-selected speed-level walking. Hossain et al. [2] proposed a modular deep-learning model to estimate sagittal plane hip, knee, and ankle joint angles during various walking conditions (overground, treadmill, slope, and stair). Semwal et al. [19] approximated human gait trajectories using an LSTM model trained on human gait joint angle data generated in OpenSim simulations. Given the inherent reliance of data-driven models on the training set, this model was not suitable for dynamic activities beyond gait. Semwal et al. [20] estimated joint trajectories during six walking-type activities: brisk walking, normal walking, very slow walking, moderate walking, jogging, and brisk walking. The differences in data distribution among these walking-type activities are minimal, so they cannot demonstrate the performance of the algorithm in different non-walking activities. Semwal et al. [22] proposed a personalized LSTM-CNN model trained on a feature set consisting of anthropometric parameters and walking speed to estimate the sagittal plane angles of hip, knee, and ankle joints. However, this model can only predict one-dimensional angles for each joint, and its accuracy significantly benefits from the highly periodic nature of walking motion.
Although walking is a crucial movement in gait analysis and rehabilitation of impaired gait, various non-gait dynamic movements associated with sports, exercise, and rehabilitation also have a significant impact on injury risk and outcomes of therapeutic interventions [25], [26], [27], [28]. However, the feasibility of capturing joint angles using a reduced IMU set during complex and highly dynamic non-gait movements remains relatively unexplored. In contrast to the strong periodicity observed in walking, non-gait dynamic activities are characterized by strongly non-periodic patterns, which may lead to reduced observability of joint angles under a reduced sensor configuration [6], [7], [17], thus posing challenges to accurate estimation. Huang et al. [6] proposed a bi-directional recurrent neural network that reconstructed the quaternion of all body segments using six body-worn IMUs. The model was validated on five user-defined uncertain motion categories, including controlled motion of the arms or legs, locomotion, full-body activities, and interaction tasks with objects. It achieved an angular error of 17.54° across these various movements [6].
However, it has not been tested in various common activities related to sports injury risk and exercise. Therefore, we proposed achieving three-dimensional knee and hip joint angle estimation with a reduced IMU sensor set in five non-gait activities, including yoga, golf, swimming, badminton, and dance. The five non-gait activity data we collected are highly relevant to global physical health and injury prevention and cover a range of activity levels from stationary to light, moderate, and vigorous activity [28], [29], [30], [31]. Additionally, we introduced an activity-aware hierarchical model to implement the simultaneous recognition of motion patterns and joint angle estimation across these five activities.
Another factor that limits the extension of joint angle estimation with reduced sensor configurations to a wider range of activity types is the accuracy decline when deep-learning models are tested on unseen activities [3], [32], [33], [34], [35]. The process of collecting a substantial marker-based motion-tracking database for every unseen activity is time-consuming and costly [6], [36], particularly for dynamic activities characterized by higher variability. To our knowledge, current research on reduced sensor configuration has not explored improving the model's generalization ability to new activities. To tackle this challenge, we proposed a transfer learning (TL) approach that transfers the knowledge obtained from joint angle estimation of the source domain (known activity) to the target domain (unseen activity) [35], [37], [38]. TL effectively enhances the model's generalization to the unseen domain by leveraging the common knowledge of the source domain model and joint angle estimation [37], [39], [40]. In various research fields, TL has shown efficacy in reducing the required size of datasets from unseen domains. Zhang et al. [34] employed TL with existing datasets covering various movements to estimate joint torque for a new movement. They pre-trained the LSTM network to learn structural similarities between movements. Ameri et al. [38] utilized existing training data before an electrode shift and a TL method with pre-trained convolutional neural networks to address the issue of insufficient training data post-electrode shift. Although TL has been extensively explored in cross-domain activity recognition [35], [37], [40], [41], its performance in cross-activity scenarios for joint angle estimation using a reduced sensor configuration remains unclear.
In addition to considering the model's generalization ability across diverse activities, we also emphasized a practical scenario where the model extends its generalization from semi-static activities to dynamic activities. Semi-static activities, exemplified by yoga poses, exhibit lower movement intensity than dynamic activities [17]. This characteristic makes them easy for subjects to execute and yields relatively high-quality data, making semi-static activities suitable as source domain datasets. We hypothesized that the semi-static yoga pose dataset contains some similar features to the dynamic activity dataset. Therefore, the model's generalization ability from semi-static datasets to unseen dynamic activities can be enhanced through the TL method.
The primary contributions in this work are as follows:
• This study is the first to implement 3D knee and hip joint angle estimation using a reduced IMU sensor configuration during five non-gait activities, including yoga, golf, swimming, badminton, and dance activities.
• This is the first study to extend the generalization ability of joint angle estimation models to unseen activities. The proposed TL method utilizes a dataset of the known activity to transfer the general knowledge of mapping correlations between IMU sensing data and joint angles into a model of unseen activities. In particular, this method could reconstruct three-dimensional joint angles during dynamic activities using datasets derived from semi-static yoga poses.
In Section II, this study conducted the estimation of 3D knee and hip angles during five non-gait activities using a reduced IMU sensor configuration. An activity-aware-based hierarchical model with artificial neural networks was proposed to implement the estimation, enabling simultaneous recognition of motion patterns and joint angle estimation. This method is effective for estimating angles in known activities. In addition, we also proposed a novel TL method to enhance the model's generalization ability to unseen activities, especially unseen dynamic activities (Section II). We tested the TL method on twenty dual-activity transfer pairs to thoroughly assess its performance (Section II). The data collection process is elaborated in Section II. Section III presents the estimation results of the two proposed methods. In Section IV, we provided a comprehensive discussion of various aspects of the results, clarifying the strengths and limitations of the methods.

II. METHODS
To expand the application of motion capture with reduced sensor configurations to various activities, our study proposed two methods to estimate joint angles with a reduced sensor configuration in known and unseen activities, respectively (Fig. 1 (a)).

A. Activity-Aware-Based Hierarchical Model for Known Activities
An activity-aware-based hierarchical model (AAHM) was proposed for joint angle estimation of known activities in the training set. This model can also realize simultaneous motion pattern recognition and joint angle estimation, where motion pattern recognition provides additional activity-type information. Simultaneous motion pattern recognition and joint angle estimation may be more critical in multi-sport performance and risk monitoring [20], [42], [43], [44], [45]. Activity recognition can offer additional insights into the type of activity, facilitating the computation of motion-related metrics or precise control of risk thresholds for different activities [13], [34], [42], [46], [47]. AAHM can customize the joint angle estimation process by identifying specific activity types, especially in activities with multiple movement patterns. The AAHM consists of two stages (Fig. 1 (b)). In the first stage, an activity classification model is employed to recognize the specific activity types of the input samples. Then, samples are directed to the corresponding estimation model in the second stage based on the recognized activities. The second stage consists of five separate estimation models, each dedicated to a specific activity type and responsible for joint angle estimation (Fig. 1 (b)). By decoupling motion pattern recognition from the joint angle estimation process, AAHM incorporates motion pattern information as an additional constraint to improve joint angle observability. This is particularly beneficial for reduced sensor configurations where the information required for accurate joint angle estimation is limited [6], [9], [13], [47].
To reduce the complexity of the AAHM, we used a support vector machine (SVM) instead of a deep neural network as the first stage of the model [47], [48]. Shallow machine learning techniques, including SVM, have a proven track record of effectively training accurate activity classification models [43]. We also tried other classification models in the first stage, such as k-nearest neighbor and random forest. Moreover, we noticed that the performance of the first stage has a minimal impact on the joint angle estimation of the second stage, which allows us to focus on selecting a simpler classification model without affecting the overall joint angle estimation accuracy of AAHM. We trained separate estimation models corresponding to each activity for the second stage of AAHM. Each separate estimation model consists of two LSTM layers with 128 units, two fully-connected layers with 64 neurons each, and a dropout layer.
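The two-stage routing described above can be sketched roughly as follows. This is only an illustrative skeleton, not the authors' implementation: `make_aahm`, `toy_classifier`, and `constant_estimator` are hypothetical names, the classifier is a trivial stand-in for the trained SVM, the estimators are stand-ins for the trained LSTM models, and the interpretation of the 12 outputs as 3 axes for each knee and hip on both legs is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Window = Sequence[Sequence[float]]           # one time slice: time steps x features
Classifier = Callable[[Window], str]         # stage 1: window -> activity label
Estimator = Callable[[Window], List[float]]  # stage 2: window -> joint angles


def make_aahm(classifier: Classifier,
              estimators: Dict[str, Estimator]
              ) -> Callable[[Window], Tuple[str, List[float]]]:
    """Build a two-stage predictor: classify the activity, then route the
    window to the estimator dedicated to that activity."""
    def predict(window: Window) -> Tuple[str, List[float]]:
        activity = classifier(window)
        angles = estimators[activity](window)
        return activity, angles
    return predict


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_classifier(window: Window) -> str:
    # Illustrative rule only: label a window by its mean absolute signal level.
    n_values = len(window) * len(window[0])
    level = sum(abs(v) for row in window for v in row) / n_values
    return "yoga" if level < 0.5 else "dance"


def constant_estimator(value: float) -> Estimator:
    # Assumed output size: 12 angle values per window.
    return lambda window: [value] * 12


aahm = make_aahm(toy_classifier, {"yoga": constant_estimator(0.0),
                                  "dance": constant_estimator(1.0)})
quiet_window = [[0.1] * 18 for _ in range(300)]   # 300 time steps x 18 features
activity, angles = aahm(quiet_window)
```

Keeping the classifier and the estimators behind plain callables makes it easy to swap the first stage (for example SVM, k-nearest neighbor, or random forest) without touching the second stage.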
We compared the estimation accuracy between the AAHM and the approach of directly applying the five separate estimation models corresponding to each activity. The latter approach assumes 100% accuracy in activity classification for the first stage, effectively eliminating the influence of misclassification on the joint angle estimation. In this way, we can evaluate the benefits of incorporating activity awareness in the AAHM and determine the extent of improvement in joint angle estimation achieved through this hierarchical model. The activity classification model and separate estimation models in AAHM were trained and validated using the leave-one-subject-out (LOSO) cross-validation method. In this method, the activity data from one subject were used for testing, the activity data from another subject were used for parameter validation, and the activity data from the remaining subjects were used for training. This method ensures that data from the same subject never appear in both the training and test datasets, which is preferable to standard n-fold cross-validation [37].
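As a minimal sketch of the LOSO procedure described above, the following generator produces one fold per subject with one test subject, one validation subject, and the rest for training. The function name `loso_splits` and the subject labels are hypothetical, and the rule for picking the validation subject (the next subject in the list) is an assumption, since the text does not state how it was chosen.

```python
from typing import Iterator, List, Tuple


def loso_splits(subjects: List[str]) -> Iterator[Tuple[List[str], str, str]]:
    """Yield (train_subjects, val_subject, test_subject) for every subject.

    Assumption: the validation subject is the next one in the list,
    wrapping around at the end.
    """
    n = len(subjects)
    for i, test_subject in enumerate(subjects):
        val_subject = subjects[(i + 1) % n]
        train = [s for s in subjects if s not in (test_subject, val_subject)]
        yield train, val_subject, test_subject


subjects = [f"S{k:02d}" for k in range(1, 22)]   # 21 subjects, as in the study
folds = list(loso_splits(subjects))
```

With 21 subjects, each fold leaves 19 subjects for training, which matches the training-set size reported in the validation strategy.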

B. Transfer Learning for Unseen Activities
To improve the model's generalization ability to unseen activities, we proposed a novel TL method that utilizes a dataset of known activities to transfer the common knowledge of the correlation mapping between IMU sensing data and joint angles to a model of unseen activities.

1) Pre-Trained Source Model on Semi-Static Yoga Poses:
Initially, we required a pre-trained source model, which is exclusively trained on the inertial data of the semi-static yoga poses (Fig. 1 (c)). The pre-trained source model consists of two LSTM layers with 128 units, two fully-connected layers with 64 neurons each, and a dropout layer. The output layer is responsible for generating the joint angle estimations. The source model was trained using the LOSO cross-validation method. This method ensures rigorous evaluation of the model's performance across different subjects.

2) Knowledge Transfer Towards the Target Dynamic Activity Model: A TL technique was proposed to enhance joint angle estimation in dynamic activities, such as golf, swimming, badminton, or dance, even with a small dataset specific to each dynamic activity (Fig. 1 (c)). Note that the source model can transfer knowledge to only one dynamic activity model at a time.
According to the conclusions of Yosinski et al. [39], the initial layers of a neural network primarily learn generic features that apply to a wide range of tasks. As the network becomes deeper, subsequent layers specialize in learning task-specific features. The pre-training of the source domain model was performed using an abundant dataset of semi-static yoga poses. This allowed the model to learn to estimate joint angles from IMU data. However, yoga movements primarily involve calibration-like actions of the lower limbs: although they directly incorporate the fundamental mapping relationship between IMU data and joint angles, they lack the specific motion patterns characteristic of dynamic activities (Fig. 1 (c)). To address this issue, replacing the source model's layers close to the output layer is generally recommended [34]. Then, the pre-trained model can be integrated into entirely new models for each dynamic activity (Fig. 1 (c)).
In our TL method, we obtained all layers preceding the dropout layer from the pre-trained source model and transferred them to the target model. These layers contain generic features relevant to the joint angle estimation task (Fig. 1 (c)). The target model was assembled by integrating the transferred layers with a newly introduced fully-connected layer containing 64 neurons, a dropout layer, and an output layer. During training, the weights of both the transferred layers and the newly introduced layer were updated using a minimal dataset of dynamic activity. This fine-tuning technique plays a crucial role in the TL framework. It reduces the domain discrepancy between the datasets, enabling the model to adapt to the new target activity with limited data.
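The layer-transfer step can be illustrated structurally as below. This is a framework-free sketch under stated assumptions, not the authors' code: layers are plain dictionaries with placeholder weight lists rather than real tensors, `new_layer` and `build_target_model` are hypothetical helpers, and the output size of 12 is assumed from the twelve angle dimensions reported in the results. In a deep-learning framework, the same idea would be expressed by copying the pre-trained layers and leaving them trainable.

```python
import copy
import random
from typing import Dict, List

Layer = Dict[str, object]


def new_layer(kind: str, units: int = 0, seed: int = 0) -> Layer:
    """Create a toy layer record; 'weights' is a placeholder list, not a tensor."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(units)]
    return {"kind": kind, "units": units, "weights": weights, "trainable": True}


# Source model as described in the text: two LSTM layers (128 units), two
# fully-connected layers (64 neurons), a dropout layer, and an output layer.
source_model: List[Layer] = [
    new_layer("lstm", 128, seed=1),
    new_layer("lstm", 128, seed=2),
    new_layer("dense", 64, seed=3),
    new_layer("dense", 64, seed=4),
    new_layer("dropout"),
    new_layer("output", 12, seed=5),   # assumed: 12 angle outputs
]


def build_target_model(source: List[Layer]) -> List[Layer]:
    """Copy every layer before the dropout layer, then append a fresh head."""
    cut = next(i for i, layer in enumerate(source) if layer["kind"] == "dropout")
    transferred = copy.deepcopy(source[:cut])   # pre-trained weights are kept
    head = [
        new_layer("dense", 64, seed=10),        # newly introduced FC layer
        new_layer("dropout"),
        new_layer("output", 12, seed=11),
    ]
    # Every layer stays trainable: the text fine-tunes both the transferred
    # layers and the new layer on the small target-activity dataset.
    return transferred + head


target_model = build_target_model(source_model)
```

Deep-copying the transferred layers keeps the source model intact, so the same pre-trained model can seed a separate target model for each dynamic activity.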
3) Subject-Independent Training: The target model for the dynamic activity was also trained using the LOSO method. As a result, the TL-based model became subject-independent since its pre-training, knowledge transfer, and fine-tuning processes did not rely on any dynamic-activity data from the tested subjects. To achieve optimal training performance, the hyper-parameters of the models were tuned. The learning rate during the training process was adaptively adjusted based on the model's accuracy variation on the validation set. In the later stages of model training, a lower learning rate was utilized to avoid getting stuck in local optima and to ensure more stable and incremental changes in the trainable parameters.
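One common way to adjust the learning rate from validation performance is a reduce-on-plateau rule, sketched below. The text does not specify the exact schedule, so the rule itself, the class name `PlateauScheduler`, and the `factor`, `patience`, and `min_lr` values are all assumptions chosen for illustration.

```python
class PlateauScheduler:
    """Lower the learning rate when the validation loss stops improving."""

    def __init__(self, lr: float, factor: float = 0.5, patience: int = 3,
                 min_lr: float = 1e-6):
        self.lr = lr
        self.factor = factor        # multiplicative decay applied on a plateau
        self.patience = patience    # epochs without improvement before decaying
        self.min_lr = min_lr
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the learning rate to use next."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
        return self.lr


scheduler = PlateauScheduler(lr=1e-3, factor=0.5, patience=2)
# Made-up validation losses, used only to exercise the scheduler.
history = [scheduler.step(loss) for loss in [1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7]]
```

In this toy run the rate is halved after each pair of non-improving epochs, so later epochs train with smaller, more stable updates.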
4) Validation Strategies: We implemented two methods for joint angle estimation in dynamic activities: TL and a method denoted as NoTL, which directly applies the source model trained on a large amount of semi-static yoga pose data to the target dynamic activity. We then compared the performance of these two methods. Each dynamic activity served as a target domain in our study, resulting in four transfer tasks: transfer from yoga to golf, transfer from yoga to swimming, transfer from yoga to badminton, and transfer from yoga to dance.
To evaluate the methods, we used the LOSO cross-validation method. The training set consisted of yoga activity data from nineteen subjects for the NoTL and TL methods. In addition, 20% of the dynamic activity data from the nineteen subjects were set aside for the transfer and fine-tuning process in the TL method [49]. This process was repeated for all twenty-one subjects. The average performance of the models across the twenty-one iterations was reported as the final result [37]. In addition to transferring knowledge from the semi-static yoga poses to the four dynamic activities, we further explored transfer pairs involving different activities as the source and target domains. As a result, we obtained twenty transfer pairs for validation, considering the five activities in total (5 × 4 = 20). These transfer pairs allowed us to thoroughly investigate the effectiveness of knowledge transfer between various activity domains and assess the model's generalization across different types of movements.
Fig. 2. Subject instrumentation layout with each orange box representing one IMU. Three IMUs were attached to each subject on the pelvis and shanks, with one IMU per segment. Four 3-marker clusters, one on each thigh and shank, were used for motion tracking to calculate the ground truth for joint angles. Each shank IMU was fixed together with a 3-marker cluster.
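The count of twenty transfer pairs follows from taking every ordered (source, target) combination of two distinct activities. A short sketch, using only the activity names from the text:

```python
from itertools import permutations

activities = ["yoga", "golf", "swimming", "badminton", "dance"]

# Ordered pairs of distinct activities: (source domain, target domain).
transfer_pairs = list(permutations(activities, 2))

# The four yoga-to-dynamic-activity tasks are the pairs whose source is yoga.
yoga_source_pairs = [(src, tgt) for src, tgt in transfer_pairs if src == "yoga"]
```

Order matters here because transferring from golf to badminton is a different experiment from transferring from badminton to golf.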
In addition to the LSTM network as the basic network for the proposed AAHM and TL methods, we also explored the application of transformer networks. The transformer consists of an encoder and a decoder, both of which are composed of stacked modules based on self-attention [50]. The bidirectional encoder representations from transformers (BERT) model contains only the encoder and has been widely accepted for tasks such as text classification or named entity recognition [33]. Our transformer network is based on the BERT architecture and consists of an embedding layer, a positional encoding layer, and a stack of encoder layers. Each encoder layer comprises multi-head self-attention, position-wise feed-forward networks, residual connections, and layer normalization. Following this is a flattening layer and two fully-connected layers with 128 and 12 units, respectively. To prevent overfitting, a dropout layer was introduced between the two fully-connected layers. We set the number of attention heads to 8, the number of encoder layers to 2, and the number of units in the hidden layer of the feed-forward network to 128. The configuration of the transformer network in this study is very similar to that of a recently published study [33].

C. Data Collection
1) Subjects: Twenty-one subjects (12 males and 9 females; age: 22.8 ± 0.8 years; height: 1.71 ± 0.06 m; weight: 60.2 ± 8.2 kg) with no history of musculoskeletal disorders were recruited to participate in this study. All the subjects were healthy and gave informed consent following the Declaration of Helsinki [51].
2) Markers and IMUs: Twelve reflective markers were placed on 12 anatomical landmarks to define the segments: left and right ilium anterior superior, left and right ilium posterior superior, left and right femur lateral epicondyle, left and right femur medial epicondyle, left and right fibula apex of lateral malleolus, left and right tibia apex of medial malleolus. Four 3-marker clusters were placed on each thigh and shank, respectively, for motion tracking (Fig. 2). The marker trajectories were captured using a ten-camera optical motion capture system (Vicon, Oxford Metrics Group, Oxford, U.K.). Three IMUs (MTw, Xsens, Netherlands) were securely strapped to the subject, with one IMU per segment on the pelvis and shanks. To reduce the influence of soft tissue and alleviate the occlusion of optical marker points, the shank IMUs were placed on the lateral side near the ankle. Although there was no rigid orientation requirement for the shank IMU in each subject, the general orientation of the shank IMU for all subjects was toward the lateral shank. The pelvis IMU was placed at the midpoint between the left and right Ilium Posterior Superior (LIPS and RIPS) on the back of the pelvis. This stable positioning indicated the subject's facing direction during the initial static phase of data collection. Each IMU outputs three-axis acceleration values and three-axis angular velocity values. Each shank IMU was affixed together with the 3-marker cluster above (Fig. 2). The IMUs and the optical motion capture system operated at a sampling rate of 100 Hz. The IMUs and optical motion capture system were electronically synchronized through a cable before starting data acquisition.
3) Experimental Procedure: Before the experiment, the researchers instructed the subjects to familiarize themselves with the activities they would be performing. Researchers also provided training on various dance and yoga movements to ensure the subjects were comfortable with the required actions. The data collection began with the subject performing a 5-second static neutral posture calibration, maintaining an upright stance with feet aligned to the marker line on the ground [52]. This ensures that all subjects are facing the same direction at the beginning of the experiment, thereby increasing the invariance of IMU data to the subject's facing direction for improved model training and generalization [6]. Unlike traditional physics-based motion capture methods, this study omitted the sensor-to-segment calibration. The reason for this decision is that deep learning models are then exposed during training to the random errors introduced by variations in sensor placement orientation, ultimately enhancing the model's robustness for subject-independent applications [36]. Furthermore, it aligns with the common practice in deep-learning-based joint angle estimation studies, which typically avoid sensor-to-segment calibration [1], [2], [6], [7], [8], [9], [10], [11], [13], [16], [17].
The subjects performed various yoga movements for the yoga activity, mainly including Phantom Chair Pose, Dragon Pose, Warrior I Pose, and Warrior II Pose (Fig. 3). Each standard yoga pose was held for approximately 8 seconds to simulate the actual yoga exercise. The subjects were asked to swing an actual golf club at different amplitudes and speeds for the golf activity. During the swimming activity, subjects were required to simulate freestyle or breaststroke strokes while seated. The chair was diamond-shaped and the subject was seated on one corner of the diamond, allowing for a wide range of leg motion without contact with the chair. It was ensured that the subject's feet and legs remained in the air and did not have any contact with the ground or external supports. The pelvis (hips) functioned as a pivot point, facilitating leg movement. Additionally, subjects were permitted to support themselves with their hands placed behind them on the two corners of the chair, because completing the breaststroke kick with only pelvis support is challenging for an ordinary subject. The subjects were asked to imitate their regular movements for the badminton activity, including swinging the racket in various directions and with different intensities. As a result, the magnitude and intensity of the activity varied among the subjects. The dance activity involved performing five distinct fitness dance movements, some selected from Zumba routines. The duration of each activity was two minutes. The reference knee and hip joint angles were calculated using the optical motion capture system and Visual3D software (C-Motion, MD, USA).
4) Pre-Processing: The Vicon-captured optical data was processed in Visual3D (C-Motion, MD, USA) following the CAST procedure [53] to calculate the ground truth joint angles. This optical motion capture system determines ground truth joint angles through the following steps: a) calibration of the constant transformation between marker cluster coordinate systems and segments' anatomical coordinate systems during the initial N-pose (upright stance); b) tracking segment movement using three-marker cluster coordinate systems during activities; c) calculating relative Euler angles from the proximal segment to the distal segment in a flexion-abduction-rotation sequence. Additionally, joint angles during the initial static N-pose period are regarded as zero, so the offsets during this period are removed [54]. Specifically, the anatomical coordinate systems of segments are established using markers affixed to bony landmarks during static calibration. Concurrently, three-marker clusters are utilized to monitor segment movement, assuming a rigid connection with the clusters [53], [55], [56].
The raw time-series data from each IMU, including three-axis acceleration and three-axis angular velocity, were used as inputs for the data-driven models. The data were segmented into time slices, with each time slice consisting of three hundred time steps and eighteen features (three IMUs with six channels each). Min-max normalization was applied to each axis of acceleration and angular velocity and to each dimension of the output joint angles.
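The pre-processing of the IMU inputs can be sketched as per-channel min-max scaling followed by slicing into fixed-length windows. This is a minimal pure-Python illustration under stated assumptions: the helper names are hypothetical, the stride (here non-overlapping windows) is not given in the text, and the synthetic recording exists only to exercise the functions. In practice the minimum and maximum would normally be taken from the training data only, which the text does not specify.

```python
from typing import List

Matrix = List[List[float]]   # rows = time steps, columns = features


def min_max_normalize(data: Matrix) -> Matrix:
    """Scale each column to [0, 1]; a constant column maps to 0.0."""
    n_cols = len(data[0])
    lows = [min(row[c] for row in data) for c in range(n_cols)]
    highs = [max(row[c] for row in data) for c in range(n_cols)]
    return [[(row[c] - lows[c]) / (highs[c] - lows[c]) if highs[c] > lows[c] else 0.0
             for c in range(n_cols)]
            for row in data]


def slice_windows(data: Matrix, length: int = 300, stride: int = 300) -> List[Matrix]:
    """Cut a recording into fixed-length time slices; a trailing partial slice is dropped."""
    return [data[start:start + length]
            for start in range(0, len(data) - length + 1, stride)]


# Synthetic recording: 1000 time steps x 18 features (3 IMUs x 6 channels).
recording = [[float(t + c) for c in range(18)] for t in range(1000)]
normalized = min_max_normalize(recording)
windows = slice_windows(normalized)   # each window: 300 time steps x 18 features
```

At the 100 Hz sampling rate reported above, a 300-step slice corresponds to 3 seconds of motion.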

D. Data Analysis
The performance of the TL method and AAHM was evaluated using the root-mean-square error (RMSE), computed as the square root of the mean squared difference between the ground truth and the estimated joint angles. We calculated the RMSE for each subject and determined the average RMSE across all subjects. A paired t-test was performed on the RMSE values of the NoTL and TL methods with a significance level of 0.05. For the classification stage of AAHM, supplementary metrics such as accuracy, sensitivity, specificity, and F1-score were employed for performance evaluation [43].
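The two evaluation quantities can be sketched as follows: the RMSE of one angle trace, and the paired t statistic computed from per-subject RMSE values of two methods. The function names are hypothetical and the numbers are made up purely to exercise the code; they are not results from the study. Converting the t statistic into a p-value would additionally require the t distribution with n − 1 degrees of freedom, which is left out of this sketch.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def rmse(truth: Sequence[float], estimate: Sequence[float]) -> float:
    """Root-mean-square error between two equally long angle traces."""
    return math.sqrt(mean((t - e) ** 2 for t, e in zip(truth, estimate)))


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """t statistic of the paired differences a[i] - b[i] (n - 1 degrees of freedom)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Made-up example: a constant 3-degree error gives an RMSE of 3 degrees.
error_example = rmse([0.0, 0.0], [3.0, 3.0])

# Made-up per-subject RMSE values for two methods (four hypothetical subjects).
rmse_no_tl = [12.0, 15.0, 11.0, 14.0]
rmse_tl = [10.0, 12.0, 10.0, 11.0]
t_value = paired_t_statistic(rmse_no_tl, rmse_tl)
```

A paired test is the natural choice here because both methods are evaluated on the same held-out subjects, so each subject contributes one difference.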

III. RESULTS
The proposed TL method improved estimation accuracy for swimming, badminton, dance, and golf activities in inter-activity generalization scenarios with limited target training data (Fig. 4 (a)(c)(e)(g)). When the basic network was LSTM, the RMSE accuracy improvements brought by TL ranged from 3.7° to 23.6° in knee flexion/extension and from 3.2° to 22.2° in hip flexion/extension (all p < 0.0001). Specifically, when the model generalized from yoga to swimming, the TL method significantly outperformed the NoTL method (Fig. 4 (a)). For knee angle estimation, TL achieved RMSE accuracy improvements of 22.2°, 2.4°, and 1.4° in flexion/extension, adduction/abduction, and internal/external rotation, respectively. For hip angle estimation, TL achieved RMSE accuracy improvements of 23.6°, 3.6°, and 10.4° in flexion/extension, adduction/abduction, and internal/external rotation, respectively (Fig. 4 (a)). When the model generalized from yoga to badminton, the TL method significantly outperformed the NoTL method, improving RMSE accuracy by 4.3° in knee flexion/extension and 4.9° in hip internal/external rotation (Fig. 4 (c)). When the model generalized from yoga to dance, the TL method significantly outperformed the NoTL method, with an accuracy improvement of 4.8° in knee flexion/extension. For hip angle estimation, TL brought accuracy improvements of 5.3° in flexion/extension and 2.6° in adduction/abduction (Fig. 4 (e)). When the model generalized from yoga to golf, the TL method significantly outperformed the NoTL method, with an accuracy improvement of 4.5° in knee flexion/extension. For hip angle estimation, TL brought accuracy improvements of 4.3° in flexion/extension and 1.7° in adduction/abduction (Fig. 4 (g)).
When the base network was a transformer, the RMSE accuracy improvements brought by TL ranged from 6.37° to 15.07° in knee flexion/extension and from 5.98° to 22.05° in hip flexion/extension when the model generalized from yoga to the other four activities (all p < 0.0001) (TABLE III). Compared to the TL method using LSTM as the base network, the minimum accuracy of joint flexion/extension of the TL method using the transformer network was improved when tested on unseen activities, which suggests that the transformer network can generalize better across activities once the fine-tuning mechanism of the TL method is added. This may be because the transformer model itself (without TL) is more sensitive to test samples from unseen activities than the LSTM model. Therefore, TL is more beneficial for improving the transformer's accuracy on unseen activities, which also reflects the necessity of TL for building generalized and robust transformer models in the future. We compared the accuracy of the transformer and the LSTM model after applying the TL method, and there was no significant difference between the two models in any of the 12 angle dimensions (TABLE I, TABLE III).
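The accuracy improvements reported above are differences between the NoTL and TL RMSEs for a given degree of freedom. A minimal sketch of that arithmetic is given below; the function names and the sample angle values are hypothetical and serve only to illustrate the calculation, not to reproduce the reported results.

```python
import math

def rmse(estimated, ground_truth):
    """Root-mean-square error between two equal-length angle sequences (degrees)."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - g) ** 2 for e, g in zip(estimated, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))

def tl_improvement(ground_truth, notl_estimate, tl_estimate):
    """RMSE reduction (degrees) of TL relative to NoTL; positive means TL is better."""
    return rmse(notl_estimate, ground_truth) - rmse(tl_estimate, ground_truth)

# Hypothetical knee flexion/extension samples (degrees), not measured data.
truth = [10.0, 20.0, 30.0, 40.0]
notl = [16.0, 26.0, 36.0, 46.0]  # constant 6-degree offset from the truth
tl = [12.0, 22.0, 32.0, 42.0]    # constant 2-degree offset from the truth
improvement = tl_improvement(truth, notl, tl)
```

With these illustrative values, the NoTL RMSE is 6° and the TL RMSE is 2°, so the improvement is 4°.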
The subject-independent TL achieved higher accuracy than NoTL when transferring knowledge from each subject's yoga-domain data to the swimming domain (Fig. 4 (b)). In addition, subject-independent TL generally achieved higher accuracy than NoTL for badminton, dance, and golf activities, except for a few subjects (Fig. 4 (d)(f)(h)). This indicates that the subject-independent TL-based model was robust to individual differences in the dynamic activity domain (Fig. 4).
TL consistently outperformed NoTL when the source or target domains included swimming and when the source domain was yoga, especially in the flexion/extension dimension (Fig. 4, TABLE I). It is also noteworthy that in transfer pairs such as golf-badminton, badminton-golf, badminton-dance, dance-yoga, dance-golf, and dance-badminton, TL and NoTL exhibited similar performance due to the similarity in the data distribution of these activities (TABLE I). Golf, badminton, and dance activities could easily be substituted for each other in terms of joint angle estimation (Fig. 5 (b)).
In the task of simultaneous motion pattern recognition and joint angle estimation using a reduced IMU configuration, the average RMSEs of knee joint angle estimation by the AAHM model (the base network is LSTM) were 10.20 [...]. The first activity classification stage of AAHM achieved an accuracy of 81.8%, an F1-score of 82%, a sensitivity of 81.7%, and a specificity of 95.4% (Fig. 5 (b)). Note that directly applying the five separate estimation models corresponding to each activity would be equivalent to a virtual AAHM with 100% accuracy of activity classification in the first stage. Surprisingly, there was no significant difference between the real AAHM and the virtual AAHM (with perfect activity classification) in any of the twelve knee and hip angle axes (Fig. 5 (a)). This observation indicates that the first classification stage has a negligible effect on the subsequent estimation stage, despite the AAHM achieving a classification accuracy of only 81.8%. Badminton and golf were the pair of activities most easily confused with each other, followed by confusion among badminton, golf, and dancing, making these three activities prone to confusion (Fig. 5 (b)). Additionally, there was confusion between yoga and badminton/golf/dancing, indicating that certain activities exhibited similar motion patterns (Fig. 5 (b)). In contrast, swimming had the highest classification accuracy among all the activities (Fig. 5 (b)). The degree of similarity between activities reflects the performance of transfer learning (TABLE I).
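Accuracy, sensitivity, specificity, and F1-score such as those quoted above can all be derived from a multi-class confusion matrix. The sketch below shows one common way to do so, assuming one-vs-rest counts per class and macro-averaging across classes; the averaging scheme used in the study is not stated here, and the 3-class matrix is a hypothetical example rather than the data of Fig. 5 (b).

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k samples predicted as other classes
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as class k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        metrics.append({"sensitivity": sens, "specificity": spec, "f1": f1})
    return metrics

def summarize(cm):
    """Return overall accuracy and macro-averaged per-class metrics."""
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(len(cm))) / total
    per_class = per_class_metrics(cm)
    macro = {
        name: sum(m[name] for m in per_class) / len(per_class)
        for name in ("sensitivity", "specificity", "f1")
    }
    return accuracy, macro

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
example_cm = [[8, 2, 0], [1, 8, 1], [0, 0, 10]]
accuracy, macro = summarize(example_cm)
```

Under these assumptions, and if the five classes are roughly balanced, the macro-averaged specificity is approximately 1 - (1 - accuracy)/4, which for an accuracy of 81.8% gives about 95.4%.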
Based on the comparison between the ground truth and the estimated knee and hip angles for each activity, yoga, swimming, badminton, and dance showed a stronger correlation in knee joint flexion/extension than in the other axes (Fig. 6). On the other hand, hip adduction/abduction and hip internal/external rotation were the two dimensions that exhibited lower estimation accuracy across all the activities (Fig. 6), consistent with the results obtained in a previous study [3]. We present the allowable ranges of ground truth values and estimated values for the five activities (TABLE IV), along with the RMSE of the joint angle estimates normalized by the allowable range of the ground truth (TABLE V).
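A range-normalized RMSE of the kind reported in TABLE V can be sketched as follows. The sketch assumes that the "allowable range" is the difference between the maximum and minimum ground-truth angle; that definition, the function name, and the sample values are assumptions made for illustration.

```python
import math

def normalized_rmse_percent(estimated, ground_truth):
    """RMSE expressed as a percentage of the ground-truth angle range."""
    if len(estimated) != len(ground_truth) or not ground_truth:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(ground_truth)
    rmse = math.sqrt(sum((e - g) ** 2 for e, g in zip(estimated, ground_truth)) / n)
    angle_range = max(ground_truth) - min(ground_truth)  # assumed range definition
    if angle_range == 0:
        raise ValueError("ground-truth range is zero; normalization is undefined")
    return 100.0 * rmse / angle_range

# Hypothetical angles (degrees): a constant 3-degree error over a 60-degree range.
truth_angles = [0.0, 20.0, 40.0, 60.0]
estimated_angles = [3.0, 23.0, 43.0, 63.0]
nrmse = normalized_rmse_percent(estimated_angles, truth_angles)
```

Normalizing in this way makes errors comparable between axes with very different ranges of motion, such as knee flexion/extension and knee adduction/abduction.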

IV. DISCUSSION
For the known activities in the training set, the proposed AAHM with an LSTM neural network achieved satisfactory three-dimensional joint angle estimation accuracy during yoga, golf, swimming, badminton, and dance, using a reduced IMU configuration (Fig. 5 (a)). To further enhance the model's generalization ability to unseen activities, the proposed TL method improved the estimation accuracy of the model trained on the yoga activity when tested on unseen dynamic activities. The maximum improvement in estimation accuracy (RMSE) achieved by TL was 23.6° for knee flexion/extension and 22.2° for hip flexion/extension compared to the NoTL baseline (Fig. 4 (a)(c)(e)(g), TABLE I). Notably, these improvements were achieved using only 20% of the original size of the dynamic activity dataset (Section II-B.4).

A. Joint Angle Estimation in Various Activities
We compared the joint angle estimation accuracy with the most related motion capture studies that also evaluated various semi-static and dynamic movements using a reduced sensor configuration [6], [8], [17], [57]. However, note that those studies did not specifically focus on validating the accuracy of a particular activity, and the subjects only performed a few repetitions of the specific movements. In our study, the joint angle estimation accuracy for golf, badminton, and dance ranged from 3.83° to 13.07° in all dimensions (Fig. 4 (g)(c)(e), TABLE I). This level of accuracy was higher than the RMSEs reported in previous studies [6] (17.54°), [17] (10-15°), and [8] (15.02°). Additionally, our results were consistent with the RMSE reported in [57]. The joint angle estimation accuracy for the highly dynamic swimming activity was approximately 7.15-20.71° (Fig. 4 (g)(c)(e), TABLE I), which aligns with the results reported in [6]. Semwal et al. [20] proposed an algorithm for calculating joint angles of the robot based on the 3-link [...]. However, a human knee joint cannot be simplified into a one-dimensional hinge or a three-dimensional ball hinge as in robots, because the knee joint can move in all three dimensions, and the range of motion in the coronal plane is strongly affected by the motion in the sagittal plane. During walking, the flexion of the knee joint changes greatly, while the angle changes in the other two dimensions are small. However, during non-walking dynamic activities, the angle changes in all three dimensions may be large. Therefore, even if the error of this algorithm is small during walking-type activities, it may not be effective during dynamic activities. We also compared our results with a study that estimated lower-limb 3D joint angles during walking using the same reduced sensor set [3]. Sy et al. [3] reported sagittal knee and hip joint angle RMSEs of 10.0±2.9° and 9.9±3.2°, respectively, similar to our results. However, they observed that during turning movements, the RMSEs for sagittal knee and hip joint angles could reach 15-20°, and the RMSE for the hip joint angle of internal/external rotation could reach 30-45°. The reason for these large errors is that the constraints designed into the algorithm are mainly based on walking movement patterns, which may not be applicable, or may require additional constraints, for movements other than walking, such as turning. Our activities include many dynamic movements, such as golf swings that involve movements similar to turning, so joint angle estimation in dynamic activities may achieve lower accuracy than in walking. Their study also reported an error of 15° in hip adduction/abduction at the beginning of the movement, which could be attributed to the model considering previous actions and performing event detection. There is more movement variability in non-periodic dynamic activities such as badminton, which could be one reason for the larger errors observed in dynamic activities. In addition, estimating hip adduction/abduction and hip internal/external rotation angles posed greater challenges than other axes (Fig. 6). These insights can inform future improvements to joint angle estimation models and help focus on addressing the challenges associated with specific joint angle dimensions and activity types.

TABLE I RMSE COMPARISON BETWEEN TL AND NOTL IN ALL THE SOURCE-TARGET TRANSFER PAIRS (BASIC NETWORK WAS LSTM)

Fig. 5. Simultaneous motion pattern recognition and joint angle estimation using a reduced IMU configuration (basic network was LSTM). (a) Overall accuracy comparison between the activity-aware-based hierarchical model (AAHM) and directly applying the five separate estimation models corresponding to each activity. Note that directly applying the five separate estimation models would be equivalent to a virtual AAHM with 100% accuracy of activity classification in the first stage. There were no significant differences between the AAHM and the five separate estimation models (all less than 0.1°). (b) Confusion matrix for the first classification stage of AAHM.

Fig. 6. Comparison of the ground truth and the estimated knee and hip angles corresponding to each activity (basic network was LSTM). X-axis of each subfigure: time (s), y-axis: angle (degrees). R: Right, L: Left, flex: flexion/extension, abd: adduction/abduction, int: internal/external rotation.

Swimming exhibited significant differences in movement patterns compared to the other three dynamic activities (golf, badminton, and dance), so swimming was less likely to be confused with the other activities (Fig. 5 (b)). In swimming, the subject's feet could not touch the ground, and their legs could not rely on external support. Therefore, the pelvis served as the pivot point to support the legs, allowing for lifting off and completing movements (Section II-C.3). In the other activities, by contrast, the pivot point is the foot in contact with the ground, and the other body segments move around that pivot point. Therefore, the relationship between the movement of the shank IMU and the pelvis IMU in swimming is different from that in the other dynamic activities.
The transformer excels in handling sequence-to-sequence mapping problems through attention mechanisms, such as translation tasks in natural language processing [50]. However, joint angle estimation with a reduced sensor configuration is not a simple mapping problem; it is mainly a time series estimation challenge [19], [20], [21], [22], [58]. Furthermore, due to the weak periodicity of the five non-gait dynamic activities investigated in this study, finding the motion patterns of human body segments is more challenging than in typical time series modeling problems. The reduction in the number of sensors further reduces the observability of the joint angle estimation problem [6]. Some recent studies reported that RNNs (or LSTMs) and MLP-type models may outperform transformers in certain time series tasks [59], [60]. One reason for this is that transformers are less effective in handling time series data than vision-related information [61]. Moreover, RNN-type models might still be a viable option if the temporal nature of the data is crucial for the task. Furthermore, we did not explore the sensitivity of the transformer model to longer time windows of data. Using longer historical sequence information tends to gradually enhance the performance of the transformer [33]. However, training the transformer model with a longer sequence reduces its feasibility in potential real-time applications.
In a recent study, Geissinger and Asbeck [17] estimated the orientation of all body segments using a reduced set of sensors (five or six for the full body). The study compared two RNN models and two transformer models. They collected a special test set that included physical exercises with significantly higher accelerations, such as Frankensteins, burpees, pushups, high jumps, and jogging, as well as stationary periods and some low-acceleration movements. This special test set exhibits non-periodic and unpredictable behavioral patterns that are more similar to the dataset collected in our study. The overall joint angle estimation results showed that the RNN-type model performs best, but the accuracy difference among the four different models is within 2°. No single model outperformed the others in all four reduced-sensor configurations, and the transformer (mainly encoder) outperformed the transformer under some configurations. These findings suggest that in more dynamic and non-periodic activities, models of the RNN class may be slightly superior. However, the differences between transformers and RNNs are small, so it is difficult to conclude which of these models is superior across different datasets and reduced-sensor configurations. Sharifi-Renani et al. [33] employed a transformer that utilizes four IMUs for sagittal angle estimation at the hip, knee, and ankle joints. The transformer architecture comprised encoder layers and fully connected layers. The study reported comparable or slightly superior performance of the transformer and LSTM models. However, it is noteworthy that the problem addressed in that study relates to a normal sensor configuration, not our reduced sensor configuration. Furthermore, the study emphasized caution in applying transformer models to new datasets, as variations in sensor placement and sensor accuracy could adversely affect model estimations [33]. Future improvements in transformer architecture may demonstrate better results in the domain of reduced sensor configurations. Our study did not include all types of activities, and for other activities, the transformer may still be a favorable candidate model.

B. Transfer Learning
When transferring knowledge from the semi-static yoga poses to the swimming activity, TL substantially improved accuracy compared to NoTL (Fig. 4 (a)). However, when the target domain was badminton, dancing, or golf, TL showed a relatively smaller improvement over NoTL (Fig. 4 (c)(e)(g)). This could be attributed to the differences between yoga and the dynamic activities of golf, badminton, and dance being smaller than the differences between yoga and swimming (Fig. 5 (b)). Transferring knowledge between highly similar activity domains, such as badminton, golf, and dance, may result in no significant effect or even slightly negative transfer (TABLE I). This suggests that the information contained in the source domain data was already sufficient for estimating samples from the target domain [32].
The NoTL method displayed notable variability in RMSEs across subjects. Specifically, when applying the model trained on the yoga activity to the swimming activity, the RMSE values for some subjects showed a difference of more than 20° (Fig. 4 (b)). The error of the NoTL method includes not only individual differences but also large differences in data distribution between activities, such as swimming and yoga, which are significantly different (Fig. 5 (b)). Consequently, individual differences in the test set may be amplified by the activity differences, resulting in considerable RMSE variability among subjects. This implies that the model may be less robust across subjects if not adapted to new activities. In contrast, our proposed TL method effectively minimizes the data distribution differences between activities. Therefore, its RMSE variability between subjects appears reasonable and significantly smaller than that of the NoTL method (Fig. 4 (b)).
Using approximately 20% of the target domain data for transfer is a commonly employed practice. Kang et al. [49] used 20% of the target domain data for transfer and observed that the accuracy reached a stable level when the amount of target domain data involved in the transfer exceeded this threshold. Similarly, Ameri et al. [38] utilized 25% of the target domain data to minimize the length of retraining sessions while maintaining estimation accuracy.
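Reserving a fixed fraction of the target-domain data for transfer can be sketched as below. The function name, the random shuffling with a fixed seed, and the use of integer trial identifiers are assumptions made for illustration; the text does not specify here how the 20% subset was drawn (for example, by trial, by subject, or by time window).

```python
import random

def split_target_domain(samples, transfer_fraction=0.2, seed=0):
    """Split target-domain items into a small transfer set and a held-out test set."""
    if not 0.0 < transfer_fraction < 1.0:
        raise ValueError("transfer_fraction must lie strictly between 0 and 1")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_transfer = max(1, round(transfer_fraction * len(samples)))
    transfer = [samples[i] for i in indices[:n_transfer]]
    held_out = [samples[i] for i in indices[n_transfer:]]
    return transfer, held_out

# Hypothetical target-domain trials, represented only by their identifiers.
target_trials = list(range(50))
transfer_set, test_set = split_target_domain(target_trials, transfer_fraction=0.2)
```

For windowed time series, splitting at the level of whole trials rather than overlapping windows is one way to limit leakage between the transfer and test sets.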

C. Simultaneous Motion Pattern Recognition and Joint Angle Estimation
Despite the lower classification accuracy of the first stage of AAHM, the final estimation accuracy of AAHM remained comparable to directly applying the five separate estimation models corresponding to each activity (Fig. 5 (a)). In cases where the classification model misclassified badminton samples as golf activity, which occurred with a probability of 22.2% (Fig. 5 (b)), the estimation model for golf was used to test the badminton samples in the second stage of AAHM. This scenario corresponds to the NoTL method, where the source domain is golf and the target domain is badminton (TABLE I). The accuracy of NoTL did not significantly decrease, with the RMSE remaining below 10° (TABLE I). Therefore, even when there are misclassifications in the first stage, they have a relatively small impact on the final estimation accuracy of the AAHM. On the other hand, the first stage of AAHM demonstrated a high classification accuracy of 92.8% for swimming (Fig. 5 (b)). This indicates that AAHM rarely confused swimming with other activities due to the significant differences between swimming and other activities. Therefore, samples were effectively prevented from entering the wrong estimation model in the second stage, which helped avoid a severe decrease in accuracy (TABLE I).
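The two-stage routing discussed above can be sketched in a few lines. The classifier and estimators below are toy stand-ins with hypothetical names, features, and coefficients; in the study these stages are trained networks, so this only illustrates the control flow, in which a misclassified sample is simply passed to another activity's estimation model.

```python
def aahm_predict(sample, classify, estimators):
    """Two-stage prediction: classify the activity, then route to its estimator."""
    activity = classify(sample)
    if activity not in estimators:
        raise KeyError(f"no estimation model registered for activity {activity!r}")
    return activity, estimators[activity](sample)

# Toy first stage (hypothetical rule): a large pelvis pitch is treated as
# the seated swimming posture, anything else as golf.
def toy_classifier(sample):
    return "swimming" if sample["pelvis_pitch_deg"] > 45.0 else "golf"

# Toy second stage (hypothetical linear maps standing in for trained models).
toy_estimators = {
    "swimming": lambda s: {"knee_flex_deg": 0.5 * s["shank_pitch_deg"]},
    "golf": lambda s: {"knee_flex_deg": 0.2 * s["shank_pitch_deg"]},
}

sample = {"pelvis_pitch_deg": 60.0, "shank_pitch_deg": 40.0}
activity, angles = aahm_predict(sample, toy_classifier, toy_estimators)
```

Because the second stage only sees the label chosen by the first stage, the cost of a misclassification equals the cross-activity (NoTL) error of the estimator that receives the sample.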

D. Future Possibilities
In future work, the proposed TL method can be extended to knowledge transfer between any domains with similarities in data distribution or feature space (Section II-B.2). Therefore, this approach can enhance the model's generalization ability in various dimensions, including different populations, and tolerate IMU placement errors, not limited to different activities alone. The TL method enables fine-tuning of estimation models originally trained on healthy populations for application to osteoarthritis and total arthroplasty populations. This adaptation uses only a small subset of observations from these specific groups, reducing the need for extensive data collection and retraining of models.
Our approach can be further extended to include more dynamic activities such as football and skiing in the future. This expansion holds the potential to make wearable injury prevention and sports training systems more widespread and economically viable. Moreover, the TL method enables the model to acquire the ability to estimate joint angles in dynamic activities using existing semi-static yoga data sets. This will contribute to the development of joint angle estimation models for more dynamic activities. Importantly, collecting data from semi-static activities is easier than collecting data sets from dynamic activities, thus significantly saving data collection efforts.
Future work should seek to extend these findings by investigating the effectiveness of our proposed method in estimating upper body joint angles with reduced sensor configurations. This method has the potential to integrate dependencies between the upper and lower bodies, allowing for the joint angle estimation of the upper body during activities involving the entire body, particularly in situations such as golf and swimming. The currently proposed activity-aware-based hierarchical model can only recognize known activities, so future work should also explore additional techniques such as anomaly detection to achieve activity recognition for unknown activities.

E. Limitations
One limitation of this study is that the swimming data was collected in a simulated seated position rather than in actual water. In real swimming situations, the forces on the joints of the whole body are significantly different compared to common activities on land. Therefore, although we instructed subjects to simulate swimming movements by driving their legs as if lifting off in water without attaching any object, the absence of water resistance and other aquatic factors might have somewhat affected the dynamics and joint angles.

V. CONCLUSION
This study performed accurate 3D knee and hip angle estimations using a reduced IMU sensor set across five activities, including yoga, golf, swimming, badminton, and dance. This validates the feasibility of the reduced sensor configuration beyond gait-type activities. Simultaneous motion pattern recognition and joint angle estimation revealed remarkable similarities in models for activities prone to confusion. This suggests a practical strategy: when developing estimation models for new activities with data distributions akin to known activities, the datasets of known activities can be directly employed, thus substantially reducing the need for extensive data collection. Additionally, we proposed a TL-based approach to estimate joint angles in unseen activities with limited training data, thereby enhancing the model's generalization ability across diverse activities. This is particularly beneficial for activities where the data distribution differs significantly from known data sets. Leveraging the known semi-static yoga poses dataset, our proposed approach significantly improved joint angle estimation in dynamic activities compared to without TL, utilizing only 20% of the original dataset size. These findings not only extend the applicability of motion capture with reduced sensor configurations to a wider spectrum of activities relevant to injury prevention and sports training but also offer a concrete TL-based solution for augmenting the model's adaptability to unseen activities using minimal training data.

Fig. 1. (a) General flowchart for joint angle estimation of various movements with reduced sensor configurations. (b) Proposed activity-aware-based hierarchical model structure for simultaneous motion pattern recognition and joint angle estimation. The first stage is a motion pattern classification model that identifies the activity type of the input sample. Based on the identified activities, the sample is directed to the corresponding estimation model in the second stage. (c) Transfer learning model structure for generalizing between different activities, using the example of extending the yoga training model to dynamic activities. We obtained all layers preceding the dropout layer from the pre-trained source model and transferred them to the target model. The target model was integrated by the transfer layer and a newly introduced fully-connected layer, a dropout layer, and an output layer. Subsequently, we trained the target model using a minimal dynamic activity dataset.

Fig. 3. Semi-static yoga poses used for reconstructing joint angles in dynamic activities. Subfigures 1 to 6 correspond to the following yoga poses: Standing Pose, Phantom Chair Pose, Dragon Pose, Initial stage of Warrior I Pose, Warrior I Pose, and Warrior II Pose. The data collection also included transitional movements for these poses.

Fig. 4. Accuracy comparison between NoTL and the proposed TL when transferring from the semi-static yoga poses to the four dynamic activities (swimming, badminton, dance, and golf) (basic network was LSTM). (a)(c)(e)(g) The mean RMSEs in each DoF for the methods. (b)(d)(f)(h) The RMSEs in the left knee and hip flexion/extension for individual subjects. The subject-independent TL-based model was robust to individual differences in the dynamic activity domain. NoTL represents directly applying the source model to the target activity (without TL). * denotes a significant difference between the RMSEs of NoTL and TL (p < 0.001 for flexion/extension; p < 0.05 for other dimensions). R: Right, L: Left, flex: flexion/extension, abd: adduction/abduction, int: internal/external rotation.

TABLE II RMSES OF SIMULTANEOUS MOTION PATTERN RECOGNITION AND JOINT ANGLE ESTIMATION USING A REDUCED IMU CONFIGURATION WITH DIFFERENT MODELS

TABLE III RMSE COMPARISON BETWEEN TL AND NOTL IN ALL THE YOGA-DYNAMIC ACTIVITY TRANSFER PAIRS (BASIC NETWORK WAS TRANSFORMER)

TABLE IV GROUND TRUTH AND ESTIMATED ALLOWABLE RANGE (DEGREES) FOR FIVE ACTIVITIES (BASIC NETWORK WAS LSTM)

TABLE V NORMALIZED RMSE (%) OF JOINT ANGLE ESTIMATES CORRESPONDING TO EACH ACTIVITY (BASIC NETWORK WAS LSTM)