Classification of Older Adults Undergoing Two Dual-Task Training Protocols Based on Artificial Intelligent Methods

Activities of daily living require efficient management between motor and cognitive tasks, known as dual tasks. The ability to properly perform such activities progressively decreases with aging. Hence, there is much effort in developing methods to identify relevant features and attenuate this functional loss. This study aimed at testing the ability of some intelligent algorithms to identify and differentiate functional features in community-dwelling older adults undergoing two 24-week dual-task training protocols. Additionally, we intended to provide a clear breakdown of the functional performance of the participants after undergoing two types of dual-task training protocols. We utilized the database from the EQUIDOSO-I clinical trial as input to four different types of intelligent classifiers. The algorithms' performances were analysed considering accuracy, precision, sensitivity, and specificity. Individually, SVM achieved a higher sensitivity for one population group, but the remaining metrics stayed low, indicating poor generalization ability of the model. Nonetheless, AdaBoost presented the most consistent results at the end of the intervention period (T2). No between-group difference affected the models' accomplishments, as corroborated by the similar performances at the baseline (T1) and the 6-month follow-up (T3). Therefore, future works should focus on establishing distinct protocols to enhance functional aspects of older adults.


I. INTRODUCTION
Several activities of daily living require efficient management between motor and cognitive tasks [1]. The interaction between these two demands is known as dual task and can be performed under a variable or fixed attention focus, when motor and cognitive activities occur non-concurrently or are simultaneously performed, respectively [2]-[4]. The ability to properly perform such activities progressively decreases with aging [5]. In this context, the gait pattern associated with another task is often observed because of its importance for activities of daily living and because it is strongly influenced by cognitive demands [6], [7]. Older adults show decreased gait speed [8] and cadence [9], and increased variability of the gait cycle when performing dual-task activities [10].
The associate editor coordinating the review of this manuscript and approving it for publication was Alba Amato.
On the other hand, dual-task training is effective in improving the performance of such functional activities [4], [11], [12]. Trombini-Souza et al. [13] developed a 24-week dual-task training protocol for community-dwelling older adults. On this matter, the World Health Organization (WHO) has constantly drawn attention to the rapid expansion of the population aged 60 and over and to the fact that roughly 80% of older people will be living in low- and middle-income countries [14], such as Brazil. Also, traditionally, the United Nations and most researchers have defined older persons as those aged 60 or 65 years and over [15]. Thus, we considered subjects aged 60 and over as older adults since both Brazilian public policies and the research community have commonly adopted this cut-off age to classify older adults.
To verify the therapeutic effect of two dual-task training protocols, the authors analyzed 13 gait variables acquired under single task, dual-task under variable priority (DTVP), and dual-task under fixed-priority (DTFP). Additionally, we observed 35 outcomes regarding cognitive function, activities-specific balance confidence, among others. During the training protocol focusing on the DTVP tasks, participants performed a motor and a cognitive activity, shifting attention between them according to the instructor's command. For the DTFP protocol, participants should focus their attention simultaneously on both motor and cognitive tasks [16]. We observed the effects of single or dual-task intervention during simple gait (motor component only) or walking associated with some cognitive task, respectively.
Although assessment of subjects under dual-task has been used to identify impaired gait patterns or deficits in cognitive functions [10], [17], [18], dealing with a large number of clinical outcomes can make it complex to identify and cluster the results of training protocols by using only classical statistical methods [19]. Moreover, identifying and differentiating such specific functional features can be time-consuming in clinical settings and complex for healthcare professionals, given the considerable amount of data.
Given that, Caldas et al. [20] highlighted the complexity of conventional methods to assess human gait to be regularly applied in clinical practice. Then, artificial intelligence (AI) methods arose as a viable alternative to simplify and optimize motion analysis in an accurate, sensitive, and specific way. Some AI algorithms have been used to identify redundancies in data sets and to determine the most representative aspects of the input variables [21]. Self-organizing maps (SOM) is one of these intelligent algorithms used to organize dimensionally complex data into clusters according to their similarities [22].
A recent study applied this method to cluster healthy subjects according to gait-pattern data, aiming at decreasing the complexity of motion analysis employing inertial sensors [23]. To optimize such tasks, a further study utilized hybrid adaptive approaches to determine the most relevant features, splitting subjects according to their age using self-organizing maps (SOM) with k-means clustering (KM) or fuzzy c-means (FCM) [24]. Although these intelligent algorithms are promising, our literature search did not identify the usage of such intelligent methods to select and differentiate the main functional characteristics of diverse groups of older adults submitted to different dual-task protocols.
Therefore, this study aimed at testing the ability of some intelligent algorithms to identify and differentiate functional features in community-dwelling older adults undergoing two 24-week dual-task training protocols. We hypothesized that the intelligent algorithms used in our study would be able to (i) extract hidden relationships from the database of a 24-week randomized controlled trial of dual-task training to improve the functional performance in community-dwelling older adults and (ii) provide a clear breakdown of the functional performance of the participants after undergoing two types of dual-task training protocols.

A. STUDY DESIGN
For this study, we utilized the database from the EQUIDOSO-I clinical trial, which focused on studying the functional performance of community-dwelling older adults regarding gait biomechanics, cognitive function, activities-specific balance confidence, concern about falling, sensory integration of static and dynamic body balance, mobility, strength and muscle power of lower limbs, symptoms of depression, and quality of life after 24 weeks of dual-task training. EQUIDOSO-I followed the recommendations of the World Health Organization, the World Medical Association Declaration of Helsinki, the International Committee of Medical Journal Editors, and the Consolidated Standards of Reporting Trials (CONSORT). The study was approved by the Research Ethics Committee of the University of Pernambuco (CAAE: 71192017.0.0000.5207; # 2.415.658) and prospectively registered in ClinicalTrials.gov (NCT03886805).

B. SAMPLE
Sixty participants of both genders (52 female), aged between 60 and 80 years (i.e., older people, as recommended by the United Nations and WHO), took part in the study. Our sample also reflects a multifaceted phenomenon known as the feminization of aging, associated with more women than men in the older population. In addition, it is related to the fact that women seek healthcare services more often [25].
The subjects were randomly allocated at a ratio of 1:1 into the experimental group (EG) or control group (CG). The EG participants were trained with DTVP activities during the first three months and with DTFP activities in the subsequent three months. The CG underwent only DTVP throughout the six months. For the current study, we considered the evaluations at the baseline (T1), after 24 weeks of dual-task training (T2), and after a 24-week follow-up period (T3).
During the trials, the participant was asked to walk a distance of 60 meters (round trip) in a straight corridor for acquiring the gait kinematics under the single-task, DTVP, and DTFP protocols. For the single-task protocol, the subjects were asked only to walk throughout the 60-meter distance. For the DTVP protocol, in turn, the participant was asked to perform sequential subtractions of three from the number 100 every five meters. In the subsequent five meters, the participant should continue walking at the same pace, but without performing the cognitive task, until the instructor asked to resume such activity from the next mark on the floor. For the DTFP protocol, on the other hand, the subject performed sequential subtractions of three from the number 100 until the end of the 60-meter distance. The protocol of this clinical trial was published in detail by Trombini-Souza et al. [13], including its evolution regarding intensity and training volume, and its inclusion and exclusion criteria.

C. PREPROCESSING
Imputation methods were used to fill missing data in the preprocessing step. To assure efficient training and optimal evaluation, all non-numeric attributes were either excluded from the dataset or replaced, and the data were later scaled to avoid model overfitting and other training issues. Min-max normalization and standardization (normalization with respect to the standard deviation) were used as regular preprocessing strategies.
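The two scaling strategies named above can be illustrated with a minimal sketch. It assumes scikit-learn and mean imputation; the text does not state which imputation method or library calls were used, and the toy matrix below is hypothetical.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy feature matrix (hypothetical values) with one missing entry.
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 400.0]])

# Assumption: mean imputation, followed by min-max scaling to [0, 1].
minmax = make_pipeline(SimpleImputer(strategy="mean"), MinMaxScaler())
X_minmax = minmax.fit_transform(X)

# Same imputation, followed by standardization (zero mean, unit variance).
standard = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler())
X_standard = standard.fit_transform(X)
```

Wrapping imputation and scaling in one pipeline means both are fitted on training data only when the pipeline is later used inside cross-validation.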

D. FUNCTIONAL CLASSIFICATION STRATEGY
In order to identify and cluster the functional performance of the participants of each intervention group, we used Stratified K-Fold cross-validation, which consists of splitting the dataset into K smaller sets while preserving the class proportions in each set. Herein, K − 1 of the folds are used as training data, while the remaining fold is used for testing. After the evaluation of all folds, the final performance is the average of the K values computed in the loop (Figure 1). This method properly evaluates the classification pipeline, which consists of the following steps: preprocessing, feature extraction, and classification. The strategy was validated offline by running all tests on a personal computer using the Python programming language and its libraries.
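The cross-validation loop described above can be sketched as follows. This is an illustration under stated assumptions: scikit-learn is assumed as the library, K = 5 is a hypothetical choice, and the data are synthetic stand-ins rather than the EQUIDOSO-I database.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: 60 samples, two roughly balanced classes.
X, y = make_classification(n_samples=60, n_features=20, random_state=0)

# K = 5 is an assumption; the value of K is not stated in the text.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_idx, test_idx in cv.split(X, y):
    # Refit the whole pipeline on the K - 1 training folds only.
    model = make_pipeline(StandardScaler(), DecisionTreeClassifier(random_state=0))
    model.fit(X[train_idx], y[train_idx])
    # Evaluate on the held-out fold.
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

# Final performance: average of the K per-fold values.
mean_accuracy = float(np.mean(fold_scores))
```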

E. DATASET AND VALIDATION
As inputs, we used a set of 222 variables, of which 13 variables refer to gait (evaluated under single task, DTVP, and DTFP) and 35 variables refer to cognitive function, activities-specific balance confidence, concern about falling, sensory integration of static and dynamic body balance, mobility, strength and muscle power of lower limbs, symptoms of depression, and quality of life. All these variables were evaluated at T1, T2, and T3. Gait variables were: speed, variability, asymmetry, cycle duration, cadence, stride length, stride velocity, stance phase, swing phase, double support, peak angular velocity, swing speed, and minimum toe clearance. Furthermore, we utilized features regarding postural balance, functional mobility, and cognitive function, detailed in the protocol published by Trombini-Souza et al. [13].
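One reading that reproduces the reported total is sketched below. It is an assumption, not a breakdown confirmed in the text: each of the 13 gait variables is counted once per walking condition (single task, DTVP, DTFP), and the resulting set of gait and non-gait variables is counted once per evaluation time (T1, T2, T3).

```latex
\[
(13 \times 3 + 35) \times 3 = 74 \times 3 = 222
\]
```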
We report the results of four model configurations, built with a diverse set of classifiers and preprocessing techniques. A fair comparison between the multiple algorithms was achieved by selecting optimal hyperparameters for each individual model. This process was made through an exhaustive search over a list of potential parameters.
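An exhaustive search over a parameter list, as described above, might look like the following sketch. It assumes scikit-learn's `GridSearchCV`; the candidate values in `param_grid` are hypothetical because the searched grid is not reported, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the trial database).
X, y = make_classification(n_samples=60, n_features=20, random_state=0)

pipeline = Pipeline([("scale", StandardScaler()), ("svm", SVC())])

# Hypothetical candidate values; every combination is evaluated.
param_grid = {
    "svm__kernel": ["poly", "rbf"],
    "svm__C": [0.1, 1.0, 10.0],
    "svm__gamma": ["scale", 0.01],
}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="accuracy",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_params = search.best_params_
```

Tuning each model with its own grid under the same cross-validation splits is one way to keep the comparison between classifiers fair.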
The comparison of machine learning models by means of statistical significance remains a controversial topic [26], [27]. Statistical methods rely on assumptions that often cannot be fully met, depending on the sampling techniques used to train and test the algorithms. In the case of k-fold cross-validation, the Student t-test, the Wilcoxon test, or any other test that requires the assumption of independence between the samples should not be used [28]. The k-fold cross-validation reuses the same data to train different classifiers; therefore, the observations in each fold are not independent and, as a consequence, the metrics depend on the folds, making the use of the Student t-test inappropriate. One of the proposed solutions is to still use the Student t-test but change the training process of the model to a 5 × 2 cross-validation, based on five iterations of two-fold cross-validation [28].
Four widely used evaluation metrics were employed to assess the classifiers' performance: Accuracy (A), Precision (P), Sensitivity (Sen), and Specificity (Spe). Classifiers with a good generalization ability are the ones with consistent results considering all metrics.
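The four metrics follow from the counts of a binary confusion matrix. The sketch below uses their standard definitions; the function name `binary_metrics`, the choice of which group is the positive class, and the toy labels are illustrative assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy labels: 1 = positive class, 0 = negative class (hypothetical).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In this toy case sensitivity (0.75) is clearly higher than specificity (0.5), which shows why a single favorable metric does not by itself indicate good generalization.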

F. PROCESSING
In this phase, we employed diverse machine learning methods to accomplish the final differentiation between the set of functional parameters of the experimental and control groups. To compare the performance on binary classification, we implemented and trained models such as decision trees, driven by Gini's diversity index, and non-linear support vector machines (SVM), a robust classifier with polynomial and Gaussian kernels. Ensemble techniques, such as AdaBoost, were also tested, alongside a clustering method, K-means.
Decision trees are non-parametric supervised learning methods used for classification that apply inductive learning. A decision tree algorithm takes as input a set of attributes and returns a decision by learning simple decision rules inferred from the data features. The tree structure consists of nodes and edges that lead through attribute values to the final decision [29]. These algorithms are usually top-down and split variables at each step (searching for a local best); the differences between algorithms lie in alternative ways of measuring the homogeneity of the target variable within the subsets, such as the Gini impurity measure (Equation 1):
$$G_m = \sum_{k} p_{km} \, (1 - p_{km}) \qquad (1)$$
where $p_{km}$ is the proportion of class $k$ in a terminal node $m$. Decision trees are among the most commonly used predictive algorithms in practice because they are simple, understandable, and interpretable. Moreover, they require little data preparation, have a logarithmic cost, and can handle multi-output problems.
Support Vector Machines (SVM) are kernel machines able to represent complex and non-linear functions, with successful applications in different fields [30]. For binary classification, SVM replaces the non-convex and possibly intractable classification loss with a convex (tractable) surrogate loss, such as the hinge loss, and only considers specific sets of functions, namely kernels from Hilbert spaces [31]. The algorithm builds a hyperplane or set of hyperplanes and selects the one with the largest separation between the two classes, which is also called large margin classification; it can be used for classification and regression problems.
Ensemble learning relies upon a collection, or ensemble, of hypotheses from the hypothesis space and averages their results. This procedure should reduce the probability of misclassification and enlarge the hypothesis space [29].
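The Gini impurity used by the decision trees above can be computed directly from the class labels that reach a node. This is a minimal sketch using the standard definition; the function name and the toy group labels are illustrative.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of the class labels that reach one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    # Proportion of each class among the samples in the node.
    proportions = [count / n for count in Counter(labels).values()]
    return sum(p * (1.0 - p) for p in proportions)


# A node holding a single class is pure; an even two-class mix is maximally impure.
pure_node = gini_impurity(["EG"] * 10)
mixed_node = gini_impurity(["EG"] * 5 + ["CG"] * 5)
```

A split is preferred when it lowers the weighted impurity of the child nodes relative to the parent node.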
Boosting techniques, such as AdaBoost [32], combine several weak learners into a strong one, creating the final model by using a weighted training set, in which each sample has an associated weight that indicates its importance during the learning process.
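A boosted ensemble of weak learners can be sketched with scikit-learn's `AdaBoostClassifier`. This is an assumption about tooling, since the text only names the algorithm; the number of estimators and the synthetic data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (not the trial database).
X, y = make_classification(n_samples=60, n_features=20, random_state=0)

# By default scikit-learn boosts depth-1 decision trees ("stumps").
# n_estimators=50 is a hypothetical setting.
booster = AdaBoostClassifier(n_estimators=50, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(booster, X, y, cv=cv, scoring="accuracy")

# Fit on all data to inspect how many weak learners were actually built.
booster.fit(X, y)
n_fitted = len(booster.estimators_)
```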
K-means is a clustering algorithm proposed by Stuart Lloyd [33]. It is a data-driven technique that separates samples into K groups of equal variance by minimizing a criterion known as the inertia, or within-cluster sum-of-squares, which maximizes the internal coherence of the clusters (Equation 2). The algorithm requires the number of clusters to be specified and has been used across a large range of application areas.
$$\sum_{i=1}^{n} \min_{\mu_j \in C} \lVert x_i - \mu_j \rVert^2 \qquad (2)$$
where $n$ is the number of samples $x_i$ in $X$, $C$ is the set of disjoint clusters, and $\mu_j$ is the centroid of cluster $j$.
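The inertia criterion and the iterations that reduce it can be sketched with NumPy alone. This is a simplified illustration of Lloyd's algorithm with a fixed number of iterations and caller-supplied starting centroids; the function names and the toy points are assumptions, not the study's implementation.

```python
import numpy as np


def inertia(X, centroids):
    """Within-cluster sum of squares: each sample is paired with its nearest centroid."""
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(sq_dist.min(axis=1).sum())


def lloyd_kmeans(X, init_centroids, n_iter=10):
    """Plain Lloyd iterations starting from the given centroids."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every sample.
        sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids, labels


# Two well-separated toy groups (hypothetical points).
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centroids, labels = lloyd_kmeans(X, init_centroids=X[[0, 2]])
total_inertia = inertia(X, centroids)
```

No class labels enter the procedure, so any agreement between clusters and intervention groups has to be assessed after the fact.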

III. RESULTS
In our experiments, we used diverse preprocessing techniques and types of classifiers to differentiate functional features in older adults undergoing the aforementioned protocols.
The results are presented in Figure 2 as population maps of the classifiers' effectiveness. Such graphic representations are customizable grids of icons that visually depict equal parts of the performed experiments and allow raw data values to be compared. The squares are colored according to the metric values in terms of accuracy, precision, sensitivity, and specificity, determined after the experiments. Herein, we executed five trials for each model and each type of preprocessing strategy, corresponding to 40 executions per evaluation time (T1 to T3), totaling 120 experiments. Then, we divided the graphics into four parts for a simpler visualization. Table 1 presents the mean outcomes (±standard deviation) for each model. The graphic representations point out that none of the algorithms was able to achieve metrics higher than 80%. The applied algorithms also struggled to differentiate between the two groups, even when the difference between them was maximum after the intervention period at T2 (Fig. 2).
Individually, SVM achieved a higher sensitivity for one population group, but all other metrics remained low, indicating poor generalization ability of the model. In general, AdaBoost presented the most consistent results across all experiments, achieving 51% accuracy, 51% precision, 69% sensitivity, and 38% specificity. Other competing models presented similar discrimination power, including SVM and decision trees, but none of them was able to provide a considerable combination of specificity and sensitivity, as detailed in Table 1.

IV. DISCUSSION
This study aimed to test the ability of some intelligent algorithms to identify and differentiate functional features in community-dwelling older adults undergoing two 24-week dual-task training protocols. Although we hypothesized that the intelligent algorithms used in our study would be able to identify and differentiate functional features of the participants undergoing these two dual-task training protocols, none of the applied intelligent algorithms provided a satisfactory combination of specificity and sensitivity to differentiate both training protocols.
Given the methods applied in our study, the advantages of support vector machines include effectiveness in high dimensional spaces, versatility by accepting different kernel functions, and the ability to learn with a small number of parameters. Even though SVM has been used to detect older adults with lower cognitive status based on gait features [34], this method performed below expectations despite the sensitivity rate above 80% at T2 (Table 1), indicating the proportion of true positives. Moreover, the modest performances of the algorithms corroborate our preliminary analyses by using univariate generalized linear mixed models, which evidenced neither interaction nor group effect. These univariate analyses revealed only a significant time effect for both groups. These results can be explained by the limited ability of univariate methods to recognize more complex interactions among a considerable number of outcomes.
On the other hand, the intelligent algorithms allow a sort of multivariate analysis, considering the interaction between multiple input features. Among the applied techniques, AdaBoost outperformed the decision tree, being the most consistent overall. Herein, the procedure to reduce the probability of misclassification and enlarge the hypothesis space, provided by AdaBoost, produced steady outcomes. Thus, the accomplishment of AdaBoost in correctly identifying around 70% of the subjects at T2, right after the intervention period, on three out of four population groups should indicate an effective tool for similar applications on distinct groups. At the second evaluation (T2), the clustering method K-means also presented a solid performance, yet slightly worse than AdaBoost. However, such an algorithm requires less computational cost and no supervision during the learning process, posing as a reasonably good tool to solve the proposed task.
Comparing the population maps at T1 and T3, one can notice that the performances of the applied methods reduced after improving at T2, suggesting a regression to the initial state. This result indicates that the type of dual-task exercise adopted in both experimental and control groups impacted the EQUIDOSO-I study outcomes. Our results imply that, with the dual-task progression used in the EQUIDOSO-I study, the groups did not present enduring diverse functional development six months after the end of the training protocols, evaluated at T3.
In the first three months of intervention, both groups received DTVP training and, only after this period, the EG started to be trained with DTFP. Considering the greater complexity in performing activities under DTFP compared to DTVP [35], we hypothesized that 12 weeks of DTFP would be enough to differentiate the functional performance of the EG. Although the proposed algorithms have not presented high rates, this premise was confirmed by the robust outcomes at T2, especially the consistent sensitivity values attained by the AdaBoost algorithm.
The principle of training specificity supports our outcomes, given that adaptations resulting from exercise require specific tasks according to the expected result [36], [37]. These adaptations are strongly linked to the mode, frequency, and duration of the proposed exercise protocols [37]. To get functional improvement in some specific aspects we need to incorporate an exercise routine similar to the desired functional response [38]. However, we suppose that adopted protocols included a wide range of exercises rather than focusing on a smaller exercise set with a higher training intensity and frequency by training session. Hence, the algorithms used in this study did not provide a substantial specificity to differentiate both groups.
Additionally, we emphasize that the EQUIDOSO-I study recruited only older participants who lived independently in the community, with great balance performance (scores ≥ 52 points on the Berg Balance Scale) and cognitive performance (score ≥ 24 points on the Mini-Mental State Examination), and who were able to walk uninterruptedly for at least a distance of 60 m, at a self-selected speed of at least 1 m/s, without the assistance of another person, cane, or walker. The preserved functional capacity of both groups at the study baseline might not have allowed further improvement, regardless of the chosen dual-task training protocol (i.e., experimental or control). This phenomenon is known as the ceiling effect, when there is no room for improvement, at least with the therapeutic protocols adopted. Thus, the identification and classification of these two training protocols based on the functional performance of the participants at the end of the study was not entirely possible.
The high proportion of older women can be considered a limitation of this study. However, a likely explanation for this would be the multifaceted phenomenon known as feminization of aging, which is generally associated with the fact that there are more women than men in the elderly population, especially in Brazil [25]. Several data sources suggest that women make higher use, on average, of primary care than men. For instance, this also happens in the United Kingdom (UK), where most health care is free of charge and women consult their general practitioner (GP) more often than men [39]. A seminal study on women's health showed that treatment-seeking and higher levels of primary healthcare use among women are related to the fact that women are more likely to survive with disabling conditions following hospitalization [40]. Therefore, the feminization of aging affected healthcare-service seeking in the city where this study took place. Thus, during the recruiting process of participants from the local community, most of the interested subjects were female. Given that, the conclusions of this study should be considered primarily regarding community-dwelling older women.

V. CONCLUSION
Generally, the intelligent algorithms used in this study showed reasonable measurement properties in identifying and differentiating the functional features in community-dwelling older adults undergoing two specific 24-week dual-task training protocols, especially AdaBoost, which evidenced a sensitivity of roughly 70% in three quarters of the population maps at the end of the intervention.
Regarding the effectiveness of the adopted protocols and their lasting effects after six months, the algorithms' performances denoted a return of the observed features to the baseline, indicated by a similar accomplishment at T1 and T3. Hence, a more robust exercise period could provide longer-lasting effects on the sample. On the other hand, as a positive issue, these results allow healthcare professionals to diversify the dual-task training exercises by adopting both training protocols during their interventions.
Given the caveats of the present study, future works should focus on establishing distinct protocols to enhance functional aspects of older adults. Therefore, we plan to develop further studies considering gait features along with cognitive and functional data. Moreover, we also intend to enlarge the database, adding diversity to the sample and balancing the gender ratio to improve the algorithms' performances.

RAFAEL CALDAS received the bachelor's degree in physiotherapy from the Federal University of Pernambuco, in 2011, the master's degree in systems engineering from the University of Pernambuco, in 2015, and the Ph.D. degree in natural sciences from RWTH Aachen University, in 2020. He currently works as a Postdoctoral Researcher with the Department of Computing Engineering, University of Pernambuco. His research interests include computational intelligence, biomechanics, and intelligent decision support systems.
FERNANDO BUARQUE (Senior Member, IEEE) received the Diploma degree in artificial neural networks from the Imperial College London, in 2002, and the Ph.D. degree in artificial intelligence from the University of London. He is currently an Associate Professor with the University of Pernambuco, Brazil, where he leads the Computational Intelligence Research Group. He is a member of the Computational Engineering Post-Graduate Program. His current research interests include computational intelligence (metaheuristics evolutionary, social and hybrid), modeling/simulation of stochastic and complex problems, and intelligent decision support systems (including aspects of computational semiotics). He is a Humboldt Fellow.
VOLUME 10, 2022