Cross-Subject Lifelong Learning for Continuous Estimation From Surface Electromyographic Signal

The employment of surface electromyographic (sEMG) signals in the estimation of hand kinematics represents a promising non-invasive methodology for the advancement of human-machine interfaces. However, the limitations of existing subject-specific methods are obvious as they confine the application to individual models that are custom-tailored for specific subjects, thereby reducing the potential for broader applicability. In addition, current cross-subject methods are challenged in their ability to simultaneously cater to the needs of both new and existing users effectively. To overcome these challenges, we propose the Cross-Subject Lifelong Network (CSLN). CSLN incorporates a novel lifelong learning approach, maintaining the patterns of sEMG signals across a varied user population and across different temporal scales. Our method enhances the generalization of acquired patterns, making it applicable to various individuals and temporal contexts. Our experimental investigations, encompassing both joint and sequential training approaches, demonstrate that the CSLN model not only attains enhanced performance in cross-subject scenarios but also effectively addresses the issue of catastrophic forgetting, thereby augmenting training efficacy.

I. INTRODUCTION
There is demand for efficient Human-Machine Interfaces (HMI), especially in areas like smart homes, industrial automation, and healthcare. User-friendly and high-quality interaction signals are essential in these contexts. Surface electromyography (sEMG) is a promising technology in this regard, playing a critical role in interpreting human intentions from biological signals. Accurate extraction of motion intentions from sEMG signals is crucial in high-precision HMI tasks, ensuring reliable and responsive control. The effectiveness of sEMG, as demonstrated in various studies [1], [2], [3], [4], [5], also shows considerable potential in the development of HMI.
Deep learning methods are widely applied in sEMG-based motion estimation tasks. Techniques employing artificial neural networks (ANNs) [6] have been proposed for joint motion estimation, although they are typically limited to specific degrees of freedom. Subsequent advancements introduced methods based on Recurrent Neural Networks (RNNs) for hand pose estimation [7], despite facing computational challenges. Recent developments include the integration of Convolutional Neural Networks (CNNs) with Long Short-Term Memory (LSTM) networks [4], [8], enhancing both performance and efficiency. However, existing methods primarily focus on individual subjects, underscoring the necessity for more universal approaches capable of handling multiple subjects and supporting knowledge transfer among them.
sEMG signals are markedly varied among individuals, presenting substantial challenges in developing versatile models for wide-ranging applications. Subject-specific models excel in certain gesture recognition tasks, yet they are impractical for broader applications, such as those in public facilities and industrial settings. The use of a unique model for each individual, coupled with frequent personnel changes, can lead to substantial costs. Thus, there is a clear need for a singular, adaptable model that ensures seamless integration across various users, devices, and scenarios. This model must be robust and flexible, capable of accommodating the unique characteristics of surface electromyography (sEMG) signals from each user without requiring extensive training.
Recent advancements in cross-subject studies tackle sEMG signal variability through two primary strategies. Transfer learning techniques, such as those described in [9], adapt subject-specific models for a broader user base. For example, Fan et al. [10] developed a hand gesture recognition method benefiting both intact-handed individuals and amputees. Yet, these methods encounter practical limitations, including increased memory demands and scalability challenges, necessitating distinct models for each user. As an alternative, cross-subject methods have emerged. Our previous research [5] introduced a BERT-based approach with µ-law normalization, enabling effective cross-subject training. Similarly, Long et al. [11] utilized adversarial transfer in a multi-scale model for sEMG signal interpretation across subjects. However, these prevalent transfer learning methods, including those of Long et al. [11], often prioritize new users and struggle to maintain robust predictions for previous users, leading to catastrophic forgetting and an inability to retain information from earlier users. In contrast, self-contained models like our BERT-based approach [5] focus on current training data but face difficulties in preserving knowledge from past users.
Lifelong learning is crucial for addressing catastrophic forgetting in sEMG-based tasks, combining adaptability and knowledge preservation. This method overcomes the limitations of subject-specific models and boosts efficient learning and memory usage. Progressive neural networks [12], [13] have notably tackled catastrophic forgetting in gesture classification. Classic lifelong learning methods like Learning without Forgetting (LwF), Elastic Weight Consolidation (EWC), and Gradient Episodic Memory (GEM) [14] have been applied in sequential time series. Nonetheless, applying lifelong learning to continuous sEMG estimation presents unique challenges, such as complex feature extraction and balancing between preserving existing knowledge and adapting to new users. Research exploring specific lifelong learning strategies for continuous sEMG estimation remains limited.
In this study, we introduce an innovative cross-subject model that incorporates lifelong learning, effectively addressing variability in sEMG-based tasks and demonstrating superior performance. Notably, this research marks the first adaptation of lifelong learning to sEMG-based regression tasks.
Our contributions are summarized as follows:
• We propose a cross-subject model that excels in continuous sEMG prediction.
• Our model incorporates a novel lifelong learning strategy within the CSLN, which rapidly adapts to new users while retaining knowledge from previous users.
• Our method achieves state-of-the-art performance in hand kinematics estimation, surpassing existing methods not only in cross-subject settings but also in lifelong learning scenarios.

II. RELATED WORK
A. Transfer Learning
Transfer learning [9] is a paradigm in artificial intelligence where knowledge gained from solving one problem is applied to a different but related problem. This approach has garnered significant attention due to its ability to leverage pre-existing knowledge and adapt it to new tasks, often leading to improved performance or efficiency, especially when labeled data for the target task is limited. By transferring knowledge learned from a source domain to a target domain, transfer learning enables models to generalize better and requires less training data, making it particularly useful in scenarios where data collection is expensive or time-consuming. Fine-tuning and domain adaptation are two widely used transfer learning methods.

B. Lifelong Learning
Lifelong learning, alternatively known as continual or incremental learning, has risen as a key approach to counter catastrophic forgetting, a phenomenon causing models to underperform or fail with changing data distributions. This leads to a rapid decline in performance on previously learned tasks. Recent research in this field, exemplified by works such as [15], [16], and [17], can be categorized into four main strategies [15]: gradient-based, modularity, memory-based, and meta-learning methods.
Modularity methods aim to balance between a single network that is prone to forgetting and multiple networks that impede task transfer. These methods adapt by adding new parameters for new tasks [23], [24] or implementing sparsity strategies [25], [26].
Finally, meta-learning approaches [34], [35] focus on automatically learning inductive biases related to architecture, data, and learning parameters, which traditionally required manual design.

III. CROSS-SUBJECT MODEL WITH LIFELONG LEARNING
A. Data Preprocessing
In this study, the Root Mean Square (RMS) feature is extracted utilizing a sliding window of 100 ms with a 0.5 ms step, as per the long exposure method [4]. This window size and step are chosen to balance temporal resolution and computational efficiency, effectively reducing noise and revealing more sequential features in the data. Following feature extraction, a logarithmic normalization, specifically µ-law normalization [36], [37], is applied to standardize the extracted features. The normalization can be formulated as:
$$F(x_t) = \mathrm{sign}(x_t)\,\frac{\ln(1 + \mu |x_t|)}{\ln(1 + \mu)}$$
Here, $x_t$ denotes the input raw data, and $\mu$ is a hyperparameter, set to $\mu = 2^{20}$ in our study. The choice of $\mu$ is based on an exploration range from $2^{5}$ to $2^{25}$, optimizing for the best performance in data normalization.
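The preprocessing described above can be illustrated with a minimal sketch in plain Python. It assumes the standard µ-law companding formula and the 2 kHz sampling rate reported for the dataset; the function names `rms_features` and `mu_law`, and the synthetic sine input, are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import math

def rms_features(signal, window, step):
    """Root mean square over sliding windows of one sEMG channel."""
    feats = []
    for start in range(0, len(signal) - window + 1, step):
        chunk = signal[start:start + window]
        feats.append(math.sqrt(sum(v * v for v in chunk) / window))
    return feats

def mu_law(x, mu=2 ** 20):
    """Standard mu-law companding: sign(x) * ln(1 + mu*|x|) / ln(1 + mu)."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

# Assumption: 2 kHz sampling, so 100 ms is 200 samples and 0.5 ms is 1 sample.
fs = 2000
window = round(0.100 * fs)
step = round(0.0005 * fs)

# Synthetic stand-in for one raw sEMG channel (hypothetical data).
signal = [1e-4 * math.sin(2 * math.pi * 50 * n / fs) for n in range(400)]
features = [mu_law(v) for v in rms_features(signal, window, step)]
```

Because RMS values are non-negative, the normalized features fall in [0, 1] whenever the RMS input does not exceed 1; the logarithm expands the small amplitudes typical of sEMG.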

B. Cross-Subject Lifelong Network
The Cross-Subject Lifelong Network (CSLN) features an encoder-decoder architecture with two main branches: a regression branch and a classification branch. In the regression branch, a Temporal Convolutional Network (TCN) based encoder extracts shared knowledge from sEMG signals, while individual features are generated through adapters. The classification branch, also employing a TCN, predicts subject labels, aiding in selecting specific knowledge for the regression branch. Two decoders then reconstruct the input signal using these features, enhancing cross-subject task performance. Figure 1 shows the CSLN structure.
The final phase of CSLN involves regression and classification tasks to predict hand motion and classify subjects. Adapters and linear layers with distinct parameters are used for adapting to new tasks, maximizing the utility of extracted features.
Adapters, as defined by Houlsby et al. [38], are compact neural network modules placed between pre-trained base layers and task-specific layers. They efficiently adapt models to new tasks or domains by learning limited task-specific parameters. In CSLN, adapters play a crucial role in acquiring subject-specific parameters, supporting the adaptation to various subjects and the lifelong learning process. Figure 2 displays the adapter structure used in CSLN. In the adapter function $Ada(x)$, GELU represents the Gaussian Error Linear Unit activation function and $FC_{(c_{in}, c_{out})}$ denotes a linear layer with $c_{in}$ input channels and $c_{out}$ output channels.
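As an illustration of the adapter idea, the following is a minimal sketch of a bottleneck adapter in plain Python. The down-projection, up-projection, and residual connection follow the general design of Houlsby et al. and are assumptions here; the exact layer sizes and wiring of the CSLN adapter are not reproduced, and all names are hypothetical.

```python
import math

def gelu(v):
    """Exact GELU: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def linear(x, weight, bias):
    """Fully connected layer; `weight` holds one row per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def adapter(x, w_down, b_down, w_up, b_up):
    """Down-project, apply GELU, up-project, then add the input back (assumed residual)."""
    hidden = [gelu(h) for h in linear(x, w_down, b_down)]
    update = linear(hidden, w_up, b_up)
    return [xi + ui for xi, ui in zip(x, update)]

# Hypothetical sizes: 4 feature channels squeezed through a 2-channel bottleneck.
x = [1.0, -2.0, 0.5, 3.0]
w_down = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]
b_down = [0.0, 0.0]
w_up = [[0.0, 0.0] for _ in range(4)]   # zero up-projection: adapter starts as identity
b_up = [0.0] * 4
y = adapter(x, w_down, b_down, w_up, b_up)
```

With a zero-initialized up-projection the adapter initially passes its input through unchanged, which is one common way to add subject-specific capacity without disturbing shared features at the start of training.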
Several key components are essential to outline the model's procedural steps. The input sEMG signal is denoted as $x \in \mathbb{R}^{c \times t}$, with $c$ representing the number of channels and $t$ the window size. Regression and classification tasks use encoders $E_r$ and $E_c$, and their corresponding decoders are $D_r$ and $D_c$. The outputs include the classification soft label $\hat{y}_c$, the motion prediction $\hat{y}_r$, and the reconstruction $\hat{x}$. In the overall procedure, $h_r$ and $h_c$ represent the features extracted for regression and classification, respectively. To describe the process, we define the operation $O = TCN_{(c_i, c_o)}(I)$. Here, $I$ and $O$ represent the input and output, $c_i$ and $c_o$ are the input and output channels of the whole $TCN_{(c_i, c_o)}$ process, and $F^{d}_{(c_{in}, c_{out})}$ refers to the causal and dilated convolutions used in TCN [39], where $d$ is the dilation rate, and $c_{in}$ and $c_{out}$ are the input and output channels of $F$, respectively.
The operations $h_c = E_c(x)$, $h_r = E_r(x, \hat{y}_c)$, and $\hat{x} = D(h)$ use the following quantities: $c$ signifies the channel number for the input RMS feature of sEMG, and $c_h$ represents the hidden feature channel in our proposed model. $N$ represents the total number of subjects among all trained tasks. In the vector $A = \{a_1, a_2, \ldots, a_n\}$, $n$ represents the number of subjects in all the previous tasks.
$\hat{x}$ is the reconstructed sEMG signal. The decoder is divided into two parts: $D_r$ for regression and $D_c$ for classification, operating on the respective hidden features $h_r$ and $h_c$. Linear layers, in conjunction with adapters, function as classifiers and regressors in our model, enabling inference and output generation from the hidden features. In these final processing steps, $C$ and $R$ denote the classifier and regressor, respectively, $N$ is the number of subjects, and $c_o$ signifies the output channels, with $c_o = 10$ in this study. The loss function combines the regression loss $L_r$, the reconstruction loss $L_{rec}$, and the classification loss $L_c$, with the hyperparameter $\lambda$ employed to balance these components. Specifically, the loss functions are defined as:
$$L_r = \mathrm{MSE}(\hat{y}_r, y_r), \qquad L_{rec} = \mathrm{MSE}(\hat{x}, x), \qquad L_c = \mathrm{CE}(\hat{y}_c, y_c)$$
Here, MSE refers to the mean square error loss and CE to the cross-entropy loss. The terms $y_r$ and $y_c$ are the ground truth labels for hand motion and subject classification, respectively.
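The causal, dilated convolution at the core of a TCN can be illustrated with a short single-channel sketch in plain Python. This shows only the general operation (each output depends on the current and past samples, spaced by the dilation rate); the function name and the example kernel are hypothetical, and the multi-channel layers used in CSLN are not reproduced.

```python
def causal_dilated_conv1d(x, kernel, dilation):
    """Single-channel causal convolution.

    y[t] = sum_k kernel[k] * x[t - k * dilation], with x treated as zero
    before t = 0, so y[t] never depends on future samples.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = causal_dilated_conv1d(x, kernel=[1.0, 1.0], dilation=1)  # x[t] + x[t-1]
y2 = causal_dilated_conv1d(x, kernel=[1.0, 1.0], dilation=2)  # x[t] + x[t-2]
```

Stacking such layers with growing dilation rates widens the receptive field over the RMS sequence without looking ahead in time.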

C. Lifelong Learning Strategy
While the Cross-Subject Lifelong Network (CSLN) demonstrates proficiency in cross-subject scenarios, a challenge arises in generalizing to new users without compromising the knowledge acquired from previous users. To tackle this issue, we integrate a lifelong learning strategy into CSLN.
Inspired by existing research in transfer and lifelong learning [33], [40], our approach employs a hybrid strategy that combines modularity with rehearsal. This method utilizes the modularity concept to transform the lifelong learning regression challenge into a more manageable classification problem, which is then addressed using a rehearsal technique.
For the regression branch, we employ a modularity method [15], adopting the adapter structure commonly used in transfer learning for fields such as natural language processing [38] and computer vision [41]. This structure facilitates efficient parameter expansion and retraining for new tasks.
The classification branch aids in selecting knowledge via the outputs of multi-head adapters. These adapters produce feature outputs that are aggregated to form a comprehensive feature vector. The vector, when multiplied by individual classification labels, produces the regression feature, enabling the network to learn continually from new users while preventing catastrophic forgetting. For transferring knowledge in the classification branch, we implement a replay strategy, incorporating a memory module.
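One way to read the knowledge-selection step is as a weighted sum of per-subject adapter outputs, with the classifier's soft label supplying the weights. The sketch below illustrates that reading in plain Python; the weighted-sum aggregation and the names `softmax` and `select_features` are assumptions made for illustration, not a confirmed description of the CSLN implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax producing a soft subject label."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_features(adapter_outputs, soft_label):
    """Weight each subject's adapter features by its soft-label probability and sum."""
    dim = len(adapter_outputs[0])
    return [sum(p * feats[d] for p, feats in zip(soft_label, adapter_outputs))
            for d in range(dim)]

# Hypothetical outputs of three subject-specific adapters (feature size 2).
adapter_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
soft_label = softmax([0.1, 4.0, 0.2])       # classifier is fairly sure it is subject 1
regression_feature = select_features(adapter_outputs, soft_label)
```

When the soft label is close to one-hot, the regression feature is close to the matching subject's adapter output, so adding adapters for new subjects leaves the features selected for earlier subjects largely unchanged.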
The memory module comprises a storage buffer and a sampler. The buffer stores observed RMS features with associated subject labels, assigned sequentially as array indices for efficient memory usage. It ensures equal capacity for the data from each subject. The sampler uniformly selects inputs from all subjects to maintain consistent batch sizes. After training, it randomly adds new observations to the buffer, supporting continuous learning and effective memory management.
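A memory module of this kind can be sketched as a small class with one equally sized slot per subject and a sampler that draws the same number of items from every slot. The class below is a hedged illustration in plain Python; `ReplayMemory` and its methods are hypothetical names, and the default capacity of 100 mirrors the per-subject buffer size reported in this paper.

```python
import random

class ReplayMemory:
    """Per-subject buffer with equal capacity and uniform sampling over subjects."""

    def __init__(self, capacity_per_subject=100, seed=0):
        self.capacity = capacity_per_subject
        self.slots = []                      # slots[label] -> stored observations
        self.rng = random.Random(seed)

    def store(self, observations):
        """Add a new subject, keeping a random subset of at most `capacity` items."""
        keep = min(self.capacity, len(observations))
        self.slots.append(self.rng.sample(observations, keep))
        return len(self.slots) - 1           # subject label doubles as the array index

    def sample(self, per_subject):
        """Draw the same number of (observation, label) pairs from every subject."""
        batch = []
        for label, stored in enumerate(self.slots):
            for obs in self.rng.choices(stored, k=per_subject):
                batch.append((obs, label))
        return batch

memory = ReplayMemory()
first = memory.store(list(range(500)))           # placeholder features for one subject
second = memory.store(list(range(1000, 1050)))   # a subject with fewer observations
batch = memory.sample(per_subject=4)
```

Only subject labels are kept alongside the features, which matches the need to replay classification data when motion ground truth for earlier subjects is unavailable.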
The lifelong learning process in our model is delineated into three critical phases:
(1) Preparing Phase: Before embarking on new tasks, the model undergoes an expansion process. This involves integrating new adapters and inheriting parameters from previously used adapters linked to user classification. To accommodate an increasing number of subjects across all tasks, we expand the linear layer in the classifier C accordingly.
(2) Training Phase: In this phase, the model trains on new tasks. A sampler merges input data from new subjects with replay data proportionately, based on the number of subjects. Due to the absence of ground truth for hand motion in old data, we modify the loss function to combine two terms: $L'$, the loss derived from uniformly sampled data, and $L$, the loss from solely new data. This combination aims to balance the input from different data sources in the lifelong learning framework. The parameter $\lambda$, which is pivotal for the role classification plays in maintaining knowledge of previously trained subjects, is set to $1 \times 10^{4}$ in our study.
(3) Storing Phase: Post-training, the sampler randomly selects an equal number of observations for each new subject, allocating them to the memory buffer. In our approach, 100 observations per subject are stored in the buffer.
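Two mechanical steps in the phases above lend themselves to a short sketch: growing the classifier's output layer when new subjects arrive, and splitting a batch between new and replayed data in proportion to subject counts. The Python below is illustrative only; the zero initialization of new rows, the rounding rule, and the function names are assumptions rather than details confirmed by the text.

```python
def expand_classifier(weight, bias, n_new_subjects):
    """Append output rows for new subject classes while keeping the old rows intact.

    Assumption: new rows start at zero; the source does not state the initialization.
    """
    in_features = len(weight[0])
    new_weight = [row[:] for row in weight]
    new_weight += [[0.0] * in_features for _ in range(n_new_subjects)]
    new_bias = bias[:] + [0.0] * n_new_subjects
    return new_weight, new_bias

def split_batch(batch_size, n_new_subjects, n_old_subjects):
    """Return (new_count, replay_count), proportional to the subject counts."""
    total = n_new_subjects + n_old_subjects
    new_count = round(batch_size * n_new_subjects / total)
    return new_count, batch_size - new_count

# Hypothetical case: 8 subjects already learned, 8 new subjects arriving.
weight = [[0.1, 0.2, 0.3] for _ in range(8)]
bias = [0.0] * 8
weight, bias = expand_classifier(weight, bias, n_new_subjects=8)
new_count, replay_count = split_batch(batch_size=64, n_new_subjects=8, n_old_subjects=8)
```

Keeping the old rows untouched means predictions for earlier subjects are preserved at the moment of expansion; any drift afterwards comes from training, which the replayed samples are meant to limit.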

IV. EXPERIMENTS
A. Dataset
The Ninapro dataset [42], a widely recognized resource in the field of sEMG, provides comprehensive data covering both intact and amputated hands. Encompassing over 300 acquisitions from 10 datasets, it offers a diverse range of electromyography and kinematics data. This dataset is notable for its extensive collection of sEMG signals captured using the Delsys Trigno Wireless System with 12 electrodes. Additionally, it includes hand kinematics data measured by 22 joint angles via CyberGlove II data gloves. The sEMG data is sampled at a frequency of 2 kHz, while the hand kinematics, initially recorded at 20 Hz, are resampled to match the 2 kHz sEMG data. As such, the Ninapro dataset is an invaluable benchmark for evaluating sEMG-based human-machine interfaces.
In this study, we focus on six hand movements related to grasping, targeting 10 finger joints to ensure data quality. All 40 subjects from DB2 are included in cross-subject settings, showcasing our structure's effectiveness. In all settings, 70% of each subject's data is used for training and 30% for testing. In lifelong learning experiments, these subjects are divided into 5 tasks ($t_i$, where $i \in \{1, \ldots, 5\}$), with sequential training on each task. To facilitate a thorough evaluation and address potential biases, 5 varied task sequences are generated, detailed in Table I.

B. Evaluation Criteria
To rigorously assess and benchmark our method against other models, we utilize several key evaluation criteria:
1) Pearson Correlation Coefficient (CC): The Pearson Correlation Coefficient (CC) is a commonly employed metric that quantifies the linear relationship between two variables. Ranging from −1 to 1, a higher CC value indicates a stronger correlation between the predicted and actual movements, denoting enhanced performance in motion estimation.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

TABLE I THE DETAIL OF ALL THE TASK SEQUENCES. THE NUMBERS IN THE TABLE REPRESENT THE NUMBER OF SUBJECTS
2) Normalized Root Mean Square Error (NRMSE): NRMSE, a derivative of Root Mean Square Error (RMSE), quantifies the deviation between predicted and observed values. It is normalized for joint angles as follows:
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\theta_{max} - \theta_{min}}$$
where $\theta_{max}$ and $\theta_{min}$ represent the maximum and minimum true joint angle values, facilitating comparison across different joints.
3) Unbiased Standard Deviation ($\sigma$): The unbiased standard deviation ($\sigma$) measures the dispersion of data, assessing the stability of our method. The $\sigma$ for 10 joints per subject is calculated, with smaller values indicating reduced dispersion and thus higher estimation consistency.
4) Lifelong Learning Evaluation: For evaluating lifelong learning performance, we adopt criteria from gradient episodic memory [18]:
Average Accuracy on Tasks (ACCT): This metric reflects the overall proficiency of the model across all tasks, indicating its ability to preserve prior knowledge while adapting to new tasks.
$$\mathrm{ACCT} = \frac{1}{T}\sum_{j=1}^{T} R_{T,j}$$
where $R_{i,j}$ means the performance on the $j$-th task of the model trained after the previous $i$ tasks. $T$ means the total number of tasks, and the same applies throughout.
Average Accuracy on Subjects (ACCS): Focused on the final model, ACCS evaluates the average accuracy across all subjects, indicating the model's effectiveness in retaining knowledge and adapting across subjects.
$$\mathrm{ACCS} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{CC}(Y_i, \hat{Y}_i)$$
where CC represents the calculation of PCC, $N$ is the number of subjects, $Y_i$ denotes the true movement of the $i$-th subject, and $\hat{Y}_i$ denotes the predicted movement of the $i$-th subject.
Backward Transfer (BWT): This measures the effect of learning new tasks on the performance of previous tasks, revealing the capability to maintain performance on earlier tasks.
Forward Transfer (FWT): FWT assesses the impact of prior learning on new tasks, indicating the ability to utilize previous knowledge for new tasks.
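The criteria listed above can be made concrete with a short sketch in plain Python. The CC and NRMSE functions follow their textbook definitions, and the three lifelong-learning functions follow the standard definitions from the gradient episodic memory literature, which is an assumption about the exact forms used here. The matrix `R` and the `baseline` vector (the score of an untrained model on each task) hold hypothetical numbers, not results from this study.

```python
import math

def pearson_cc(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((a - mt) * (b - mp) for a, b in zip(y_true, y_pred))
    var_t = sum((a - mt) ** 2 for a in y_true)
    var_p = sum((b - mp) ** 2 for b in y_pred)
    return cov / math.sqrt(var_t * var_p)

def nrmse(y_true, y_pred):
    """RMSE divided by the range of the true joint angles."""
    n = len(y_true)
    rmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))

def average_accuracy(R):
    """Mean score of the final model over all tasks (last row of R)."""
    T = len(R)
    return sum(R[T - 1]) / T

def backward_transfer(R):
    """Mean change on each earlier task between learning it and the end of training."""
    T = len(R)
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)

def forward_transfer(R, baseline):
    """Mean gain on each task before training on it, relative to an untrained model."""
    T = len(R)
    return sum(R[i - 1][i] - baseline[i] for i in range(1, T)) / (T - 1)

# R[i][j]: score on task j after training on tasks 0..i (hypothetical values).
R = [[0.8, 0.1, 0.0],
     [0.7, 0.8, 0.2],
     [0.6, 0.7, 0.9]]
baseline = [0.0, 0.0, 0.0]
acc = average_accuracy(R)
bwt = backward_transfer(R)
fwt = forward_transfer(R, baseline)
```

A BWT near zero indicates little forgetting, while a strongly negative value signals that later tasks have overwritten earlier knowledge.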
The Wilcoxon signed-rank test, adjusted by the Bonferroni correction, is used to evaluate the statistical significance of our method.

C. Baseline Methods
In this study, our method is compared against two transfer learning approaches (fine-tuning and domain adaptation) and two lifelong learning techniques (EWC and LwF).
Fine-tuning adapts pre-trained models to specific tasks, refining parameters with new data. Domain adaptation transfers knowledge between different domains, assessing the ability of our model to mitigate catastrophic forgetting.
EWC and LwF are lifelong learning methods focusing on knowledge preservation from previous tasks. EWC regularizes model parameters based on their task significance, while LwF utilizes knowledge distillation for transitioning knowledge between tasks. These methods evaluate the efficacy of our strategy in a lifelong learning context.

D. Experimental Results
The effectiveness of our method in continuous hand movement estimation tasks is rigorously evaluated against previous cross-subject approaches. All models are developed using the PyTorch framework [43] and trained on an NVIDIA GeForce RTX 3090 GPU with the Adam optimizer. The training process spans 400 epochs, with 200 epochs dedicated to each transfer phase. Our experiments are conducted under two distinct scenarios: cross-subject training and cross-subject lifelong learning.
In the cross-subject training scenario, models are simultaneously trained and validated using data from all 40 subjects. For cross-subject lifelong learning, models undergo sequential training across tasks segmented by subjects. Parameter selection was conducted after extensive trials with various configurations. We standardized all models to train with a learning rate of 0.0001, which is halved after 200 epochs. The $\mu$ parameter in µ-law normalization is fixed at $2^{20}$, as determined from a parameter range study spanning $2^{5}$ to $2^{25}$. The results of this study are detailed in Table II. Additionally, the $\lambda$ parameter is set at $1 \times 10^{4}$, optimizing the balance between different loss components.
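The learning-rate rule stated above can be written as a small function. This sketch assumes a single halving at epoch 200 with zero-indexed epochs; the function name and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-4, halve_after=200):
    """Constant rate for the first `halve_after` epochs, then half of it."""
    return base_lr if epoch < halve_after else base_lr / 2

# Rate used at the start, just before, and just after the change point.
schedule = [learning_rate(e) for e in (0, 199, 200, 399)]
```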
1) Cross-Subject Settings: To demonstrate the efficacy of our method in a cross-subject context, we conducted comparative training with state-of-the-art cross-subject models [5] and our CSLN. Training encompassed all 40 subjects, and the results are detailed in Table III.
An interesting observation is that the training convergence time increased with the number of subjects included. However, it is noteworthy that in the lifelong learning setup, the convergence time per subject was, on average, approximately 120 seconds shorter compared to direct task training. This indicates enhanced efficiency in the lifelong learning scenario.
2) Lifelong Learning Settings: In the lifelong learning experiments, a random shuffle is performed on the 40 subjects from the Ninapro DB2 dataset. These subjects are then randomly divided into 5 tasks, forming various task sequences. To mitigate potential biases, this process is repeated, creating 5 task sequences. These sequences are referred to as $S_i$ (where $i \in \{1, \ldots, 5\}$), with their detailed division provided in Table I. The evaluation criteria, based on the correlation coefficient (CC), are presented in Table IV.
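The construction of task sequences can be sketched as a seeded shuffle followed by a split into consecutive groups. The Python below assumes near-equal group sizes, which may differ from the actual subject counts in Table I; the function name, the subject identifiers, and the seeds are hypothetical.

```python
import random

def make_task_sequence(subjects, n_tasks, seed):
    """Shuffle subjects and cut them into `n_tasks` consecutive groups."""
    rng = random.Random(seed)
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    size, extra = divmod(len(shuffled), n_tasks)
    tasks, start = [], 0
    for t in range(n_tasks):
        end = start + size + (1 if t < extra else 0)
        tasks.append(shuffled[start:end])
        start = end
    return tasks

subjects = [f"s{i}" for i in range(1, 41)]          # 40 subjects
sequences = [make_task_sequence(subjects, n_tasks=5, seed=s) for s in range(5)]
```

Each sequence assigns every subject to exactly one task, so a model trained sequentially on the five tasks sees all 40 subjects once.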
Furthermore, significant improvements are observed in the average performance in terms of ACCT and ACCS, compared to other approaches. The consistent results in FWT across various methods underscore the comparable efficacy of the proposed strategy.
A detailed analysis reveals that lifelong learning methods excel in BWT, surpassing transfer learning methods and affirming their superiority in retaining knowledge. This demonstrates their effectiveness in preserving previously acquired knowledge while adapting to new tasks. Specifically, our lifelong learning approach shows exemplary performance in BWT, ACCT, and ACCS, underscoring its robustness in knowledge retention. However, a lower performance in the FWT criterion indicates a potential avenue for further enhancement.
The experimental results reveal that the arrangement and order of subjects minimally impact the performance of our method. Similar levels of performance are observed across all generated task sequences in various evaluation criteria, indicating that the specific arrangement of subjects within task sequences does not significantly influence the overall efficacy. This finding implies that flexibility in selecting subject division and order can be tailored to practical requirements without adversely affecting the effectiveness of the lifelong learning approach. To demonstrate the impact of knowledge retention on overall performance, Fig. 5 illustrates the variation in ACCS following each training session using different strategies.
As depicted in Figure 6, the knowledge retention performance of our method is showcased on two channels of subject s8, corresponding to task $T_1$ in task sequence $S_1$ from the Ninapro DB2 dataset. The figure highlights three distinct periods for each strategy: after training on $T_1$, after training on $T_3$, and after completing the final task, $T_5$.
While our method effectively mitigates catastrophic forgetting, it cannot completely eliminate it. Each new task introduces a heightened risk of forgetting the old tasks, inevitably leading to a decline in average performance across all tasks, especially as the number of tasks in a sequence increases. Here, we present an extreme case involving 40 subjects divided into 40 tasks, resulting in an accuracy shrinkage to 0.70. However, even in this scenario, our method still outperforms other methods on 5 tasks, underscoring its superiority.
Furthermore, the lifelong learning strategy exhibits accelerated convergence compared to training models from scratch for each task. Utilizing knowledge from previous tasks facilitates more efficient learning on subsequent tasks, thereby reducing the convergence time. This efficiency gain enhances training speed, potentially expediting model deployment in practical scenarios.
Regarding memory usage, the lifelong learning model consistently outperforms models that are directly trained or fine-tuned. The CSLN model maintains a fixed memory footprint based on the number of subjects, whereas the memory requirements of directly trained models escalate with the number of tasks. In scenarios with 40 subjects divided into 40 tasks, the memory consumption of direct training or fine-tuning models can be nearly 20 times greater than that of the lifelong learning approach. However, achieving an optimal scenario of one task for all 40 subjects remains a challenge in real-world contexts. As visualized in Figure 4, the memory ratio, defined as the ratio of memory usage between models after direct training or transfer and models after lifelong learning, consistently favors lifelong learning models, with a ratio greater than or equal to 1. This consistent efficiency advantage highlights the value of lifelong learning in optimizing memory usage.

V. DISCUSSION
The CSLN model, as proposed in this study, demonstrates remarkable efficacy in estimating finger joint movements from sEMG signals for environments that need a singular, adaptable model. It exhibits superior performance in both cross-subject and lifelong learning scenarios. Moreover, the adaptability to various temporal scales is implied by the utilization of Root Mean Square (RMS) features extracted with a sliding window, allowing it to accommodate sEMG data at different temporal resolutions. Additionally, its cross-subject training capability suggests the ability to generalize across individuals, including variations in the temporal patterns of their sEMG signals, further highlighting its practical relevance in the field. Utilizing evaluation metrics such as CC and NRMSE, the model achieves a best CC of 0.8552 and an NRMSE of 0.0782, underscoring its advancement over previous cross-subject models, particularly for subjects with intact hands. Such results solidify the position of our method as state-of-the-art in cross-subject settings.

Fig. 6. Performance Variation of Model on Subject s8 from $T_1$ of $S_1$. The figure displays the estimation results of subject s8, comparing method performance across three periods: after training on $T_1$, $T_3$, and the final task $T_5$. "FT" and "DA" denote fine-tuning and domain adaptation techniques, respectively. Red curves represent ground truth, while blue curves depict the estimation. All the results are smoothed by the AvgSmooth method mentioned in [5].
In addition, the novel lifelong learning method integrated into CSLN demonstrates remarkable efficiency in training. This approach surpasses traditional model training from random initialization, showcasing enhanced knowledge retention and faster adaptation to new tasks. With the highest BWT of about -0.10, the model leverages adapters and a memory module for more rapid convergence and reduced training duration. The model's expansion, facilitated by lightweight adapters, not only augments its practicality and scalability but also achieves optimal memory utilization, requiring approximately only one-twentieth of the parameters compared to other methods. However, the underutilization of previously learned knowledge may account for the lower performance in the FWT.
While the CSLN model presents a versatile solution for cross-subject sEMG applications, it is also crucial to acknowledge certain limitations. The model's assumption of distinct subjects within each task sequence may affect its generalization to scenarios with repeated subjects, potentially impacting real-world applicability. There is also room for improvement in inference efficiency, which is a vital aspect for practical deployment.

VI. CONCLUSION
This paper introduces the CSLN, a novel method for accurate continuous hand gesture estimation. Employing a lifelong learning strategy, CSLN adeptly navigates the complexities of hand movement dynamics, effectively addressing cross-subject task challenges and significantly reducing catastrophic forgetting. Our method outperforms existing approaches in terms of accuracy and knowledge retention, making it a promising solution for practical applications such as in factory settings and smart home environments. These scenarios demand versatile methods that adapt seamlessly across various subjects and diverse spatial-temporal contexts. The application of CSLN to wider domains and an array of hand motion tasks promises to enhance its practical utility further. Future research directions include expanding the capabilities to encompass a broader spectrum of hand gesture dynamics and exploring its potential in other relevant fields. The aim is to contribute substantially to the advancement of hand motion estimation systems, fostering the development of robust and adaptable solutions.

Fig. 1. CSLN Model Architecture: A Dual-Branch Cross-Subject Model for sEMG-Based Hand Motion Prediction with Enhanced Knowledge Retention. The regression branch is responsible for predicting motion based on learned knowledge, while the classification branch functions to select relevant knowledge. Common knowledge is stored by the TCN and regressor in the regression branch, whereas individual knowledge is stored in the adapters.

Fig. 2. The Lightweight and Modular Adapter Architecture in CSLN for Knowledge Retention.

Fig. 3. Data description for this work. CyberGlove channels are shown in (a), where the red dots represent the ten joints to be estimated while the yellow dots represent the unchosen joints. Six hand movements are introduced and shown in (b).

Fig. 4. The memory ratio, which is defined as the ratio of memory usage between the model after direct training or transferring and the model after the lifelong learning procedure.

Fig. 5. The variation of ACCS after each training procedure using different strategies on different task sequences.

TABLE II PARAMETER STUDY OF µ W.R.T. CC AND NRMSE ON ALL 40 SUBJECTS

TABLE III AVERAGE PERFORMANCE OF DIFFERENT MODELS ON 10 JOINTS AND 6 MOVEMENTS OF 40 DIFFERENT SUBJECTS IN CROSS-SUBJECT SETTING

training across tasks segmented by subjects. To assess performance, we calculate metrics such as NRMSE and PCC, along with monitoring the average training time per epoch and the total convergence time for each training session.

TABLE IV PERFORMANCE OF EVERY TASK SEQUENCE ON 10 JOINTS AND 6 MOVEMENTS OF 40 DIFFERENT SUBJECTS IN LIFELONG LEARNING SETTING WITH DIFFERENT LEARNING STRATEGIES

TABLE V DETAILED CC RESULT OF EACH TASK OF TASK SEQUENCE $S_1$ ON 10 JOINTS AND 6 MOVEMENTS OF 40 DIFFERENT SUBJECTS IN LIFELONG LEARNING SETTING WITH DIFFERENT LEARNING STRATEGIES