A Machine Learning Approach to Perform Physical Activity Classification Using a Sensorized Crutch Tip

In recent years, interest in monitoring Physical Activity (PA) has increased due to its positive effect on health. New technological devices have been proposed for this purpose, mainly focused on sports, which include Machine Learning algorithms to identify the type of PA being performed. However, PA monitoring can also provide data useful for assessing the recovery process of people with lower-limb impairment. In this work, a Machine Learning-based Physical Activity classifier design procedure is proposed, which makes use of the data provided by a Sensorized Tip that can be adapted to different Assistive Devices for Walking (ADW) such as canes or crutches. The procedure is based on three main stages: 1) defining a wide set of potential features to perform the classification; 2) optimizing the number of features by a Random-Forest approach, detecting the most relevant ones to classify five relevant activities (walking at a normal pace, walking fast, standing still, going up stairs and going down stairs); 3) training the ML-based classifiers considering the optimized feature set. A comparative analysis is carried out to evaluate the proposed procedure, using three ML-based classifiers (Support Vector Machines, K-Nearest Neighbour and Artificial Neural Networks), demonstrating that the proposed approach can provide very high success rates if proper feature selection is carried out. This work presents four relevant contributions to the PA monitoring area: 1) the approach is focused on people that require ADW, who are not considered in other approaches; 2) an analysis of the features to characterize gait in people that require ADW is carried out; 3) a design procedure to optimize the number of features using a Random-Forest approach is used, avoiding a typical “brute force” procedure; and 4) a comparative analysis is carried out to demonstrate the validity of the approach.


I. INTRODUCTION
Lower-limb mobility plays an important role in autonomy and quality of life. Neurological diseases or trauma injuries that affect the mobility of the lower limb have a great impact on the lives of people suffering from them. Hence, trying to fully or partly recover this function is one of the main goals when designing a rehabilitation strategy for these patients [1].
The associate editor coordinating the review of this manuscript and approving it for publication was Tyson Brooks.
In order to be effective, rehabilitation interventions must be adapted to the status of the patient during the whole rehabilitation process [2]. This also includes the selection of the assistive device that best fits the patient's needs according to her/his functionality. If the patient has lost the ability to walk autonomously, the use of wheelchairs or scooters is the best option, while crutches or canes are typically used when the gait function is maintained. Hence, therapists are required to assess the patient periodically to monitor the evolution of her/his status.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Patient assessment is typically performed using the data collected through tests carried out in clinical settings. However, monitoring the types of Physical Activity (PA) the patient performs throughout the day is becoming increasingly important in the functional assessment of patients, due to the well-known benefits it has on their health and its contribution to the prevention of non-communicable diseases [3], [4]. It also allows interpreting the results of the periodic clinical tests and giving individualized recommendations and feedback on how much and how to perform these activities in order to aid in the recovery process.
In order to perform PA identification, three main steps are typically carried out: 1) data related to the patient is captured by a monitoring device; 2) a set of features that allow characterizing PA is extracted from the raw sensor data; and 3) the set of features is processed by a classifier, which detects the particular PA being executed. The main works related to each step are summarized next.
Regarding the data capture and monitoring, different technological solutions have been proposed [5]. The most popular ones are wearable devices, which have to be attached to specific places of the lower limb of the patient, and typically capture motion data using Inertial Measurement Units (IMUs) [6]-[10] or biomedical signals such as EMG [11], [12]. A number of commercial devices exist on the market, such as XSens [13], BioStampRC [14], Tracmor [15], FlexiForce [16] and BioCapture [12]. These solutions must be properly placed and attached to the limbs, and may generate rejection in patients. In order to reduce the impact of monitoring devices, the use of the integrated sensors of smartwatches and mobile phones has been proposed [17]-[20]. These latter devices do not have a specific placement on the body but, on the other hand, this positioning flexibility and the variability introduced by parasitic motions are issues to be considered when processing the data.
Once the raw data has been captured by the monitoring device, the second step is to extract a set of features that allow characterizing the different PA. For this purpose, time-series segmentation using time windows is a common approach, as it reduces the amount of data to be processed [21], [22]. In the case of gait monitoring, the selected window typically matches a step. Hence, features of different nature that characterize each step can be extracted from these windows. Statistical (mean, standard deviation, ...) [10], [15], [23]-[25], frequency [19], [26] or phase [9] operations are typically applied to the captured variables for this purpose. In addition, in the particular case of gait, features such as the average speed, time between steps or the number of steps [27] have also been proposed. It is to be noted that there is no standardized approach to define these features, and that in general, a brute force approach is used in which a wide set of features is defined so that the classifier to be designed has enough input data to perform its job.
Finally, in the third step, using the set of selected features, the PA identification or classification is performed. Machine Learning (ML) techniques such as K-Nearest Neighbour (K-NN) [10], [24], [28], Support Vector Machine (SVM) [29]-[31] and Artificial Neural Networks (ANN) [15], [32]-[34] are the preferred solution for gait-related PA classification due to their flexibility and capability of generalization, which provide acceptable results with a success rate up to 91% [15]. Note that all these approaches are of supervised nature, and require a set of properly designed training data in which the selected features are the inputs, and the type of PA to be identified is the output. In the case of gait-related PA, the proposed classifiers typically identify if the patient is walking (at a habitual or normal speed or faster, i.e. running), going up and down stairs or standing still [9], [15], [28], [32], [35].
The three-step procedure detailed previously provides a general methodology for PA classification. However, it is to be noted that there is no standardized approach to be followed in each step, and that different open research areas still exist. In particular, the feature selection procedure is typically carried out using a brute force approach, in which a wide set of possible features is proposed as inputs to the ML-based Physical Activity classifier, so that it can have enough data to perform the classification. This approach, however, leads to non-optimal classifiers, which typically use more features than required and result in oversized solutions, as the relative importance of each feature is not usually analyzed.
Moreover, all the aforementioned works are designed for people that do not require Assistive Devices for Walking (ADW) such as crutches or canes. However, several parameters change significantly in the case of people that use ADW, as they present non-symmetrical gait, and parameters such as the load applied to the ADW might be relevant. These differences have to be considered in the three-step procedure. Recent works have demonstrated that patients that require ADW in their rehabilitation process need specific monitoring approaches [36], with sensorized ADW being the best option for this population [37]-[41]. Hence, the set of features to be defined also has to consider ADW data.
Based on the previous analysis, in this work, a novel approach for the development of Physical Activity classifiers for patients that require ADW is proposed. The proposed approach aims to give some insight into the previously cited issues, with four relevant contributions: 1) the approach is focused on people that require ADW, who are not considered in other works; 2) a comprehensive set of features to classify five relevant types of Physical Activity is proposed and analyzed; 3) a Feature Selection methodology based on a Random-Forest approach is proposed; and 4) a thorough comparative analysis using three ML approaches (K-NN, SVM and ANN) is carried out to validate the proposed approach.
The rest of the work is structured as follows. Section II presents the Sensorized Tip and its sensorization capabilities. Section III details the set of tests carried out to generate the database used to develop the ML-based PA classifiers. In Section IV, a thorough analysis of the potential features proposed in the literature for gait monitoring is carried out, and the proposed methodology to select the most relevant features and train three different ML-based PA classifiers using K-NN, SVM and ANN approaches is detailed. In Section V, a comparative analysis is carried out to evaluate the approach. Finally, the most important ideas are summarized in Section VI.

II. SENSORIZED TIP FOR GAIT MONITORING
In order to monitor the performance of people that require ADW, different approaches can be used, as analyzed in Section I. Wearable devices, although widely used, present some drawbacks for this population, as they may generate rejection due to the need of attaching the sensors to the limbs, and do not consider the interaction force between the ADW and the patient, which provides relevant monitoring data. Smartphones and watches, on the other hand, present parasitic motions that have to be considered.
Hence, several works have proposed to sensorize ADW, offering a noninvasive approach that provides accurate measurements of both ADW motion and interaction force [37]-[41]. In particular, in this work the Sensorized Tip proposed in [41] (Figure 1) is used to capture gait data. Unlike the other cited approaches, in which a sensorized crutch or cane is designed, the proposed Sensorized Tip can be attached to the personal crutch or cane used by the patient, which is typically adapted to his/her needs. The Sensorized Tip integrates three sensors in its aluminum enclosure. A 9 degrees-of-freedom Inertial Measurement Unit MTi-3 by XSens provides linear acceleration, angular speed and magnetic field data in the local (x, y, z) axes. In addition, this device integrates a proprietary algorithm based on a Kalman filter that estimates the roll-pitch-yaw Euler angles in the global reference frame (Roll and Pitch dynamic error of 0.5°, and Yaw dynamic error of 1°). The aforementioned data can also be used to estimate the anteroposterior and lateromedial crutch angles (see Figure 2). A BMP280 barometer provides information on atmospheric pressure, which allows estimating the relative height of the device (relative precision of 0.12 hPa). Finally, a C9C piezoelectric force sensor by HBM, with 1 kN range, provides information on the axial load exerted by the patient. The overall weight of the Tip is 160 g.
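To give a sense of what the barometer precision means in terms of height, the following minimal sketch converts a small pressure difference into a height difference using the isothermal hypsometric approximation. The standard-atmosphere constants, the reference pressure and temperature, and the function name are assumptions made for illustration; the height estimation actually implemented in the Sensorized Tip is the one characterized in [41].

```python
# Illustrative sketch only: not the algorithm implemented in the Sensorized Tip.
R_GAS = 8.314462618         # J/(mol K), universal gas constant
MOLAR_MASS_AIR = 0.0289644  # kg/mol, dry air
GRAVITY = 9.80665           # m/s^2, standard gravity


def relative_height_m(delta_p_hpa, p_ref_hpa=1013.25, temp_k=288.15):
    """Height change (m) corresponding to a small pressure drop (hPa).

    Assumes an isothermal atmosphere around the reference pressure and
    temperature (sea-level standard values by default).
    """
    scale_height = R_GAS * temp_k / (GRAVITY * MOLAR_MASS_AIR)
    return scale_height * delta_p_hpa / p_ref_hpa


# Height resolution associated with the 0.12 hPa relative precision.
resolution_m = relative_height_m(0.12)
print(f"{resolution_m:.2f} m")
```

Under these assumptions, a pressure difference of 0.12 hPa corresponds to a height difference of roughly one metre near sea level, which is the order of magnitude relevant for distinguishing stair climbing from level walking.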
The 16 sources of data provided by the aforementioned sensors are captured by an nRF52832 microprocessor, which adds a timestamp and sends the processed data with a 20 ms period to a mobile phone using the Bluetooth Low Energy (BLE) protocol. The data is stored in the phone using a self-developed app, so that it can be processed later. The capturing system is powered by a standard 5 V powerbank, which is placed externally to the Tip in order to minimize the weight of the device (Figure 1).
The full characterization of the measurement errors and the integrated algorithms for the Sensorized Tip can be found in [41].

III. DATA BASE FOR CLASSIFIER DESIGN
In order to develop a Physical Activity classifier using Machine Learning approaches, a proper database is required, in which the selection of the types of PA to be identified is a key issue.
As analyzed in Section I, when considering gait-related PA classifiers, five types of PA are typically considered [9], [15], [28], [32], [35]: walking at a normal pace; walking at a fast pace (approximately 30% faster than normal pace); going up stairs; going down stairs; and standing still. The identification of these types of PA allows monitoring the activity of a patient throughout his/her daily life, defining patterns of activity, sedentariness, etc. This data can be used by the therapist to provide individualized recommendations or to detect possible modifications in the patient's functional status [26], [42].
In order to capture relevant data for the classifier design, a total of five tests have been carried out using a crutch in which the Sensorized Tip detailed in Section II was attached:
• Walking 30m in a straight line at the normal speed.
• Walking 30m in a straight line at a speed higher than normal (approximately 30% faster).
• Standing still for approximately 10 seconds.
• Going up an 11-step flight of stairs.
• Going down an 11-step flight of stairs.
The tests were carried out by 11 healthy volunteers from the research group of the authors (4 women and 7 men, aged between 24 and 48 years), at the facilities of the Faculty of Engineering of Bilbao UPV/EHU. Each test was repeated three times by each volunteer.
In order to generate the database, a segmentation procedure was followed [28]. This procedure is carried out by considering each cycle of use of the crutch, which is composed of a stance phase (in which the crutch is in contact with the ground) and a swing phase (in which the crutch is lifted through the air and no contact exists). This way, the raw data provided by each sensor is divided into sequential windows, each associated with a crutch cycle. The initial point of each window is defined at the start of the stance phase, when the crutch tip contacts the ground. This can be easily detected by considering the force sensor signal, as seen in Figure 3, as no force exists in the swing phase. The total number of segmented windows generated in the aforementioned tests is summarized in Table 1. Note that in the case of Standing Still, the aforementioned approach is no longer valid, as no crutch cycles exist. In this scenario, a virtual step is considered as a fixed segmentation window of 1.8 s, which is slightly longer than the average cycle time in the walking at normal pace scenario.
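The segmentation rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the force threshold, the sample values and the function names are hypothetical, and the only figures taken from the text are the 20 ms sampling period and the 1.8 s virtual step used for Standing Still.

```python
def stance_starts(force, threshold):
    """Indices where the axial force rises from below to at/above threshold."""
    starts = []
    for i in range(1, len(force)):
        if force[i - 1] < threshold <= force[i]:
            starts.append(i)
    return starts


def segment_cycles(force, threshold):
    """(start, end) index pairs, one per complete crutch cycle.

    A cycle runs from one stance-phase start to the next one, so the last
    (incomplete) cycle in the recording is discarded.
    """
    starts = stance_starts(force, threshold)
    return list(zip(starts[:-1], starts[1:]))


def fixed_windows(n_samples, window_s=1.8, period_s=0.02):
    """Non-overlapping fixed windows used as virtual steps when standing still."""
    size = round(window_s / period_s)
    return [(s, s + size) for s in range(0, n_samples - size + 1, size)]


# Hypothetical axial force samples (N): about zero in swing, positive in stance.
force = [0, 0, 40, 80, 60, 0, 0, 0, 50, 90, 30, 0, 0, 45, 70, 0]
cycles = segment_cycles(force, threshold=20.0)
still_windows = fixed_windows(500)  # about 10 s of data at a 20 ms period
print(cycles)
print(len(still_windows))
```

A threshold slightly above the sensor noise level is used here instead of an exact zero-force test, which is a design assumption to avoid spurious contacts being detected during the swing phase.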
Once the database is defined, it is divided into two balanced sets (Training and Test), as required by the design procedure of supervised ML-based approaches [43]. The Training set is used to train the proposed ML-based PA classifiers. For that purpose, a balanced set has been defined, with approximately the same number of windows for each of the identified types of PA. This allows training the classifier with the same relative importance for each type of PA. The Test set, on the other hand, is used to test the designed classifiers. Hence, Test data is not used in the PA classifier design procedure, but only in the validation analysis carried out in Section V. Note that in this latter case, a balanced window selection has also been carried out, so that the tested classification success rates are comparable across the types of PA to be identified [44], [45].
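One possible way to obtain such balanced sets is sketched below. The text does not state how the balancing was performed or which Training/Test proportion was used, so the undersampling strategy, the 70/30 split, the labels and the function name are all assumptions for illustration.

```python
import random
from collections import defaultdict


def balanced_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) with equal windows per class in each set.

    Every class is undersampled to the size of the smallest class
    (an assumed strategy), then split with the same proportion.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    per_class = min(len(indices) for indices in by_class.values())
    n_train = int(per_class * train_fraction)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        picked = rng.sample(by_class[label], per_class)
        train.extend(picked[:n_train])
        test.extend(picked[n_train:])
    return train, test


# Hypothetical window labels with unequal class sizes.
labels = ["walk"] * 10 + ["stairs"] * 6 + ["still"] * 8
train_idx, test_idx = balanced_split(labels)
print(len(train_idx), len(test_idx))
```

Keeping the Test indices disjoint from the Training indices is what allows the success rates of Section V to be read as a measure of generalization.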

IV. MACHINE LEARNING-BASED PA CLASSIFIER DESIGN METHODOLOGY
The use of segmentation allows defining discrete units of data, one for each crutch cycle, from which a series of features can be extracted. These features, which may be diverse in nature (statistical, frequency-based, ...), can be used to characterize each cycle and serve as inputs for the PA classification system to be developed. In this section, a methodology is detailed to select the most appropriate features and design the ML-based PA classifier.
The proposed methodology is summarized in Figure 4. First, a set of potential features is defined based on the ones proposed in the literature (Section IV-A). This set is defined with a high number and variety of features, so that the maximum amount of information can be considered. Then, in a second step, a Random-Forest approach is used to determine the relative importance of each feature, which allows ordering the potential feature set according to the relevance of each feature (Section IV-B). This ordered set is used to design the ML-based classifier. In a third step, the set of optimal hyperparameters is calculated for each set of n features to be considered as inputs. Finally, using the selected hyperparameters and the set of n features selected (based on their relevance), the ML approach is trained (Section IV-C). An analysis and evaluation of the procedure is carried out in Section V.

A. POTENTIAL FEATURES SET GENERATION
Features are related to the data sources available, as they are used to extract, using a simpler metric, a particular characteristic of the signal contained in the segmented window. For the particular case detailed in this work, 17 sources of data are considered based on the data provided by the Sensorized Tip (Section II): 9 associated to the raw IMU data (x, y, z components of acceleration, angular speed and magnetic field in the local axes); 5 related to the processed IMU data (RPY Euler Angles, and crutch anteroposterior and lateromedial angles); 1 related to the force sensor value, which is filtered; and, 2 associated to the barometer signal (filtered and unfiltered).
For each segmented window, the time evolution of these 17 data sources can be processed to extract a feature. This is carried out by applying an operator, which may be of different nature (statistical, time-based, ...). Although the particular case of people that require ADW has not been analyzed in the literature, based on the operators proposed in the related works and the clinical experience of the authors, the following set of operators is proposed:
• Statistic-based operators: They are widely used in gait characterization works, as they are easily applied to any data source. Mean value, standard deviation, variance, kurtosis, correlation coefficients XY (i.e., between X and Y signals), percentiles, area under each curve and interquartile ranges [15], [23]-[25] have been selected to be applied to the data provided by all sensors. In the particular case of correlation coefficients, the correlation between the different angle/axis values provided by a sensor is considered, i.e. correlation between the accelerometer x and y signals, correlation between roll and pitch Euler Angles, etc.
• Time-based operators: Measuring the time between specific events allows obtaining spatio-temporal features. In the case of ADW, cycle time, that is, the time between consecutive starts of the stance phase, allows defining speed-related features [27]. The use of the ADW can also be characterized by the relative percentage of the cycle time during which the patient uses the device for support, that is, the duration of the stance phase with respect to the cycle time (Stance Phase %) [11].
By combining the set of data sources and the defined operators, a full set of 176 features can be defined. All are summarized in Table 2, where an X defines a feature (or features) that has been obtained by applying a particular operator (row) to a data source (column). Note that this set of 176 features is extracted for each ADW cycle, following the segmentation procedure detailed in Section III.
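As an illustration of how such operators act on one segmented window, the sketch below computes a subset of the statistic-based operators and the two time-based ones. It is a simplified example under stated assumptions: the sample standard deviation and variance, the trapezoidal rule for the area under the curve, the quartile method of Python's `statistics` module, the force threshold and all function names are choices made here, not details given in the text. Only the 20 ms sampling period comes from Section II.

```python
import statistics

SAMPLE_PERIOD_S = 0.02  # 20 ms sampling period reported for the Sensorized Tip


def area_under_curve(signal, dt=SAMPLE_PERIOD_S):
    """Trapezoidal area under a uniformly sampled signal."""
    return sum((a + b) / 2.0 * dt for a, b in zip(signal[:-1], signal[1:]))


def statistical_features(signal):
    """Subset of the statistic-based operators for one data source."""
    q1, _median, q3 = statistics.quantiles(signal, n=4)
    return {
        "mean": statistics.fmean(signal),
        "std": statistics.stdev(signal),
        "variance": statistics.variance(signal),
        "iqr": q3 - q1,
        "auc": area_under_curve(signal),
    }


def time_features(force, threshold, dt=SAMPLE_PERIOD_S):
    """Cycle time (s) and Stance Phase (%) for one crutch-cycle window."""
    stance_samples = sum(1 for value in force if value >= threshold)
    return {
        "cycle_time": len(force) * dt,
        "stance_pct": 100.0 * stance_samples / len(force),
    }


# Hypothetical window: three stance samples followed by three swing samples.
window_force = [40.0, 80.0, 60.0, 0.0, 0.0, 0.0]
print(statistical_features(window_force))
print(time_features(window_force, threshold=20.0))
```

Applying the same statistic-based function to each data source and collecting the resulting values in a fixed order would produce one feature vector per ADW cycle.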

B. FEATURE SELECTION USING RANDOM-FOREST APPROACH
The aforementioned set of 176 features can be used to develop ML-based PA classifiers. This way, the set of features will be considered as the input to the classifier, which will identify a type of PA for each ADW cycle as seen in Figure 4.
However, this brute force approach, which is typical in the works cited in the introduction, is not efficient. First, a high number of features increases the computational cost of the classifier. Second, the feature selection impacts the performance of the classifier, as some features may not be related to the types of PA considered, and some may be correlated with one another. Hence, in order to optimize the PA classifier design, proper feature selection approaches must be used.
Detecting the best feature set to design a PA classifier is not a trivial task. In recent years, Machine Learning approaches have demonstrated their ability to analyze the relative importance of different features in a classification or regression problem. One of the most interesting approaches in this field is the Random Forest (RF) [46], [47], which consists of the generation of a wide set (forest) of different decision trees for classification purposes. The trees are generated using a set of random samples and features, so that different features can be tested in the training procedure. This technique has been used in different application fields such as diagnosis [48], mineral process industries [49] or DNA analysis [50] to estimate the relative importance of each feature. This way, the most relevant ones can be identified, and the ones that are redundant or unimportant can be eliminated.
Hence, in this work, a Random-Forest approach is proposed to analyze the relative significance of each feature for the PA classification. For that purpose, only the samples contained in the Training Set defined in Section III have been used. The proposed RF has been implemented using Matlab's Statistics and Machine Learning Toolbox [51] and experimentally tuned considering the following set of hyperparameters: the number of trees in the forest has been set to 5000; a sampling with replacement strategy has been selected; a node size of 1 has been defined; the number of variables randomly chosen at each split (mtry) has been set to √M, where M is the total number of variables; and the interaction-curvature predictor selection has been used to avoid the disturbances caused by correlated features.
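As a quick check of the scale of the mtry setting, and assuming that M corresponds to the full set of 176 potential features defined in Section IV-A:

```latex
m_{\mathrm{try}} = \sqrt{M} = \sqrt{176} \approx 13.3
```

Under that assumption, roughly 13 candidate features would be examined at each split, so each individual tree only sees a small fraction of the feature set at any node.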
The results obtained from this procedure are summarized in Table 3, in which all the potential features have been sorted in decreasing order of relative significance according to the RF approach. Note that the RF approach orders the features by considering their relative weight or contribution to the desired classification process, with the Area Under the Curve of the Yaw angle and the Cycle Time among the most relevant features for the proposed case study.
It is to be noted that if the weights of the 176 features are analyzed, all present a positive weight with the exception of the last two, related to the Barometer Interquartile Range. This means that, following the RF analysis, the features with positive weight contribute (or add information) to the PA classification. However, the relative importance of the most significant one (Area Under the Curve of the Yaw angle) is more than 50 times higher than that of the least significant ones. Hence, designing a PA classifier using only some of the most relevant features should provide better results than using the least relevant ones. In the next section, a comparative analysis is carried out to analyze the effect of the proposed feature selection.

C. CLASSIFIER HYPERPARAMETER SELECTION AND TRAINING
Once the potential set of features has been ordered according to its relevance, a subset of n features can be selected to design a PA classifier. The goal of the classifiers is to be able to detect five relevant PA: Walking normal, Walking fast, Going up and down stairs and standing still. Hence, all classifiers will be implemented with 5 outputs/classes, one associated to each PA type.
In this work, the three most commonly used approaches in related works have been selected, so that a comparative analysis can be carried out in the next section: Support Vector Machine (SVM), K-Nearest Neighbor (K-NN) and Artificial Neural Network (ANN).
As detailed in Figure 4, for a given set of n relevance-ordered input features, the optimal set of hyperparameters for each ML-based classifier is calculated first. For that purpose, a K-fold cross-validation approach is proposed with K = 5 [52]. This approach allows different ML-based models to be evaluated effectively. Note that for this purpose only the data from the Training Set defined in Section III is used. Once the best hyperparameters have been chosen, they are used to train the ML approach using supervised methods and the Training Set data.
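The hyperparameter selection step can be sketched with a small, self-contained example. The classifiers in this work were built with Matlab's toolbox, so the plain-Python K-NN below is only a stand-in chosen because it is the simplest of the three approaches; the candidate numbers of neighbours, the toy data and all function names are hypothetical.

```python
import random


def kfold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, n_neighbours):
    """Majority vote among the n_neighbours closest training samples."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = [train_y[i] for i in order[:n_neighbours]]
    return max(sorted(set(votes)), key=votes.count)


def cv_accuracy(xs, ys, n_neighbours, k=5):
    """Fraction of samples classified correctly when held out in turn."""
    correct = 0
    for fold in kfold_indices(len(xs), k):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct += sum(
            knn_predict(train_x, train_y, xs[i], n_neighbours) == ys[i]
            for i in fold
        )
    return correct / len(xs)


def select_n_neighbours(xs, ys, candidates=(1, 3, 5)):
    """Candidate with the best 5-fold cross-validated accuracy."""
    return max(candidates, key=lambda c: cv_accuracy(xs, ys, c))


# Toy Training Set: two well-separated classes in a 2-feature space.
xs = [(0.1 * i, 0.0) for i in range(10)] + [(5.0 + 0.1 * i, 5.0) for i in range(10)]
ys = [0] * 10 + [1] * 10
best = select_n_neighbours(xs, ys)
print(best, cv_accuracy(xs, ys, best))
```

Only Training Set data enters this loop; the Test Set stays untouched until the final evaluation, which mirrors the separation described above.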
It is to be noted that in the case of the SVM and K-NN, Matlab's Statistics and Machine Learning Toolbox integrates the aforementioned steps, optimizing the related hyperparameters (kernel functions, number of neighbours, ...) [51]. For the case of the ANN, the authors have programmed the hyperparameter selection ad hoc. In this latter case, a single-hidden-layer Multi-Layer Perceptron (MLP) ANN has been selected, with 5 output neurons (one for each PA), a number of inputs equal to the number n of features in the set to be processed, and m hidden layer neurons with hyperbolic tangent sigmoid activation function. The number of hidden layer neurons m has been considered as the hyperparameter to be tuned using the aforementioned procedure, with m ranging from 1 to 10 neurons, since experimental tests have determined that ANNs with 10 or fewer neurons provide good results. Once the best (highest success rate) value for m has been selected, a Bayesian regularization-based training algorithm is used to train the ANN.
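To relate the network dimensions n and m described above to model size, the following sketch counts the trainable parameters of a fully connected single-hidden-layer MLP with 5 outputs. It assumes one bias per hidden and output neuron, which is a common convention but is not stated in the text; the function name is illustrative.

```python
def mlp_parameter_count(n_inputs, m_hidden, n_outputs=5):
    """Weights plus biases of a fully connected single-hidden-layer MLP."""
    hidden_layer = n_inputs * m_hidden + m_hidden      # weights + biases
    output_layer = m_hidden * n_outputs + n_outputs    # weights + biases
    return hidden_layer + output_layer


# Parameter count over part of the tuned range of m, for a small and the full feature set.
for n in (7, 176):
    print(n, [mlp_parameter_count(n, m) for m in (1, 5, 10)])
```

Since the count grows as m(n + 6) + 5 under these assumptions, reducing the number of input features shrinks the network almost proportionally for a fixed m.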

V. COMPARATIVE ANALYSIS
In this section, a comparative analysis is carried out considering the features selected in Section IV-B. The aim is to: 1) analyze the best approach for the proposed PA classifier application; and 2) analyze the validity of the feature selection approach in different ML-based classification approaches.
Note that all the ML-based classifiers analyzed in this section have been trained following the methodology proposed in the previous section.

A. ANALYSIS OF THE EFFECT OF THE NUMBER OF FEATURES CONSIDERED FOR CLASSIFICATION
In order to analyze the effect of the number of features considered, a comparative analysis is carried out considering the features defined in Table 3. This way, each ML-based classification approach proposed previously is trained with 176 different feature sets following the procedure detailed in Section IV-C. These feature sets are defined incrementally considering the n most relevant features. That is: in the first set, only the most relevant feature is considered; in the second one, the two most relevant features are considered; and so on, until in the last one all 176 potential features are considered. Results are summarized in Figure 5, which shows the success rate obtained for each approach. This success rate is defined as the percentage of PA samples of the Test Set whose type the classifier identifies correctly with respect to the total number of PA samples in the set. Note that the samples in this latter set have not been considered in the training procedure, so that the results can also be used to analyze the generalization capability of the approaches.
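The construction of the incremental feature sets and the success rate metric can be written compactly as follows. The importance weights and the predictions are hypothetical values for illustration only; they are not taken from Table 3 or from the experiments, and the function names are assumptions.

```python
def ranked_feature_names(importance):
    """Feature names sorted from most to least relevant."""
    return sorted(importance, key=importance.get, reverse=True)


def incremental_sets(ranked):
    """Nested feature sets: top-1, top-2, ..., all features."""
    return [ranked[:n] for n in range(1, len(ranked) + 1)]


def success_rate(predicted, actual):
    """Percentage of Test Set samples whose PA type is identified correctly."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * hits / len(actual)


# Hypothetical importance weights (not the values of Table 3).
importance = {"auc_yaw": 0.9, "cycle_time": 0.7, "mean_force": 0.4}
feature_sets = incremental_sets(ranked_feature_names(importance))
print(feature_sets)

# Hypothetical predictions for four Test Set windows.
predicted = ["walk", "still", "up", "walk"]
actual = ["walk", "still", "down", "walk"]
print(success_rate(predicted, actual))
```

With 176 ranked features, the same construction yields the 176 nested sets evaluated in this subsection, each of which is tuned and trained independently.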
As can be seen, if the seven most significant features are considered, a success rate of over 90% can be achieved in all cases (92.8% for the K-NN, 97% for the SVM and 96.8% for the ANN). This value increases up to 97% for all approaches if the nine most relevant features are selected.
The general tendency is that a higher number of features considered allows better classification. A maximum success rate of 98.4% (66 features) for the K-NN, 99.1% (87 features) for the SVM and 99.6% (174 features) for the ANN is obtained. Note that the small oscillations are due to the randomized nature of the ML approaches training, with a success rate variation in the range from 7 to 176 most relevant features of 2.8% in the case of the ANN and 4.6% for the SVM.
There is an exception in the case of the K-NN approach, as the percentage of success decreases slightly when the number of features is higher than 119, reaching a value lower than 96% (92.8% with 147 and 160 most relevant features).
The results confirm that if a proper feature selection is carried out, a small set of features can be used to design the ML-based PA classifier, as the effect of increasing the number of features on the total success rate of the classifier is small. Moreover, this has an impact on computational cost. As previously stated, a K-fold cross-validation procedure has been used to calculate the best configuration of hyperparameters for each feature set. For instance, in the particular case of the ANN, the optimal number of hidden layer neurons obtained for each feature set is summarized in Figure 6. It can be seen that although a lower number of neurons (5-6) is required for small values of n, the number of neurons stabilizes with a mean of 9 neurons. Hence, selecting a moderate number of features (for example the 7 most relevant ones) also leads to a smaller ANN and lower computational cost.
Finally, in order to illustrate the classification capabilities of the ML-based PA classifiers, a particular example of the classifiers' performance is shown in Table 4, which presents the Confusion Matrices for all classifiers when all features are considered. In this particular case, the overall performance of the K-NN is 96.1%, that of the SVM is 96.8% and that of the ANN is 99.6%. However, it can be seen that the K-NN has a problem classifying the Walking Normal case, as up to 20 samples are erroneously identified as Walking Fast and Going Up Stairs. The same effect is seen in the SVM's Confusion Matrix. The ANN outperforms the previous approaches, obtaining better results.
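The overall performance and the per-class behaviour discussed above can both be read from a confusion matrix, as the sketch below illustrates. The 3-class matrix is a hypothetical example and does not reproduce Table 4; the convention that rows are true classes and columns are predicted classes is also an assumption.

```python
def overall_accuracy(cm):
    """Percentage of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(cm[i][i] for i in range(len(cm))) / total


def per_class_recall(cm):
    """Percentage of each true class that is identified correctly."""
    return [100.0 * cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Hypothetical matrix (rows: true class, columns: predicted class).
cm = [
    [45, 5, 0],
    [2, 48, 0],
    [0, 0, 50],
]
print(round(overall_accuracy(cm), 1))
print(per_class_recall(cm))
```

A high overall figure can hide a weak class, which is why the per-class view is useful for spotting confusions such as the one between Walking Normal and Walking Fast.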

B. ANALYSIS OF THE EFFECT OF FEATURE SELECTION
In order to emphasize the importance of the feature selection procedure, the procedure defined in the previous section has been repeated from the least significant feature to the most significant one. That is, 176 sets of features have been analyzed: the first set considers only the least significant feature; the second, the two least significant features; and so on. As previously detailed, for each set of features, the optimum hyperparameters have been tuned by means of a K-fold procedure. For the particular case of the ANN, an average of 7.99 neurons with a standard deviation of 1.75 neurons has been obtained.
Results are summarized in Figure 7 for all proposed approaches. As can be seen, the evolution of the success rates presents an increasing tendency. That is, as more significant features are added, the classifier quality increases. Hence, the success rate increases as more features are added, from approximately 22% to 99%.
Note that this is a very different evolution compared with the one analyzed in the previous section (Fig. 5). In the previous case, with a few of the most significant features, success rates up to 95% could be achieved, while in this latter case, a greater number of features is required to achieve the same performance: 86 for SVM, 68 for ANN, and almost all features for K-NN. This emphasizes the need to correctly select the features when designing PA classifiers.
The relevance of correctly selecting the features is also demonstrated in Figure 8. As analyzed in the previous subsection, the seven most significant features provide acceptable success rates for the classifier (over 92%). Hence, all proposed approaches have been evaluated by considering sets of 7 features. That is, the first 7 features have been evaluated first, then the next 7 and so on, ordered from the most significant ones to the least significant ones. As previously detailed, for each set of features, the optimum hyperparameters for the three ML approaches have been obtained in a first step (for the particular case of the ANN, an average of 9.28 neurons with a standard deviation of 1.24 neurons has been obtained). After training, the resulting classifiers show that a variation of more than 50% in the performance of the classifier can exist depending on the set of features considered.
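The grouping into consecutive sets of 7 relevance-ordered features can be sketched as follows. The function name and the use of rank positions as stand-ins for feature names are illustrative; since 176 is not a multiple of 7, one feature is left over, and how that remainder was handled in the experiments is not stated in the text, so keeping it as a final shorter group is an assumption.

```python
def consecutive_groups(ranked, size=7):
    """Split a relevance-ordered feature list into consecutive groups."""
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


# Rank positions 1..176 stand in for the feature names of Table 3.
groups = consecutive_groups(list(range(1, 177)))
print(len(groups), groups[0], groups[-1])
```

Each group would then be tuned and trained with the same procedure of Section IV-C, so that differences in success rate can be attributed to the features in the group rather than to the number of inputs.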
In summary, the aforementioned results demonstrate that: 1) proper feature selection is mandatory when designing PA classifiers; 2) the proposed RF-based feature selection is an appropriate approach to optimize the number of features; 3) the ANN- and SVM-based approaches are the most stable classifiers, although the K-NN approach can provide good results for the same number of features, and the best results are obtained with the ANN-based classifier; and 4) the inclusion of more and more features does not always imply an increase in the success rate, as proper hyperparameter selection is needed to handle all the input information.

VI. CONCLUSION
The individualization of rehabilitation therapies for people suffering from lower-limb impairment is essential during the whole rehabilitation process, especially for those who require Assistive Devices for Walking (ADW). Recently, monitoring the types of Physical Activity (PA) carried out by patients in their daily life has become an important source of information for this purpose.
To develop PA identification and monitoring, proper sensorized devices and processing algorithms are required. Typically, wearable sensors have been proposed for this purpose, although they present limitations for people who require ADW. Moreover, to process PA data related to gait, a set of gait-based features is traditionally proposed, and brute-force approaches are followed to design the monitoring algorithms.
Unlike other works, this paper proposes a novel approach for the development of PA classifiers. The approach makes use of a Sensorized Tip that can be fitted to the patient's own ADW. The 17 available sources of data are processed to define a set of 176 features that can be used to classify five relevant PAs (Standing Still, Walking Fast, Walking at a Normal Pace, Going Up Stairs and Going Down Stairs).
In order to optimize the PA classifier, a Machine Learning technique, Random Forest, is used to perform feature selection. This allows the features to be ranked according to their relative significance for the PA classification.
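As a minimal sketch of the ranking step, assume that a per-feature importance score has already been obtained from a trained Random Forest. Ordering the features then reduces to a sort, and the most relevant ones are the head of the list. The feature names and scores below are hypothetical placeholders, not values from this work, and the helper names `rank_features` and `top_k` are introduced only for illustration.

```python
from typing import Dict, List


def rank_features(importances: Dict[str, float]) -> List[str]:
    """Return feature names sorted from most to least significant.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    return sorted(importances, key=lambda name: (-importances[name], name))


def top_k(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k most significant feature names."""
    return rank_features(importances)[:k]


# Hypothetical importance scores (illustrative only, not results from the paper).
toy_importances = {
    "force_mean": 0.31,
    "pitch_std": 0.12,
    "step_time": 0.27,
    "roll_range": 0.05,
}

ranking = rank_features(toy_importances)
print(ranking)                    # most significant feature first
print(top_k(toy_importances, 2))  # the two most significant features
```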
The approach is validated by implementing three different Machine Learning-based classifiers for PA: SVM, K-NN and ANN. Results demonstrate: 1) the validity of the proposed approach for gait monitoring; 2) the importance of feature selection when designing PA classifiers; and 3) the validity of the feature selection approach, since the seven most relevant features identified yield a success rate of 92-97%, which is higher than the results reported by other related works (91%).
However, it should be noted that the development presented has been carried out with healthy people and in a laboratory setting. For this reason, future work will focus on studying the classifier's performance in specific populations of people who require ADW, in order to analyze possible drawbacks of the proposed methodology in these cases. Moreover, more types of PA, such as going up and down slopes, walking at different speeds, etc., will be analyzed.
SERGIO LUCAS is currently pursuing the master's degree in industrial engineering specialized in automation and control. Since September 2018, he has been an Industrial Engineer with the University of the Basque Country (UPV/EHU).