A Novel Sensor-Based Human Activity Recognition Method Based on Hybrid Feature Selection and Combinational Optimization

In recent years, sensor-based human activity recognition (HAR) has become a hot topic owing to advances in sensing technologies, wireless communication technologies and nanotechnologies. Since sensor signals are usually non-stationary and quite noisy, both selecting discriminative feature representations and finding the optimal parameters for the recognition algorithm play an important role in the performance and robustness of an HAR system. However, most previous research has focused on only one of these two aspects and ignored their interaction; very few studies have addressed both simultaneously. Considering the two factors separately may lead to inferior HAR performance. This paper presents a novel HAR framework that optimizes the feature set and the parameters of the recognition algorithm synchronously for robust and optimal system performance. A new hybrid feature selection methodology, called GTFS-BFA, is proposed. It combines game-theory based feature selection (GTFS) with the binary firefly algorithm (BFA) and thereby draws on both filter and wrapper feature selection methods. It consists of two phases, namely a pre-selection phase and a re-selection phase. The pre-selection phase relies on the game-theory-based filter method, while the re-selection phase uses the BFA as a wrapper method. The popular and efficient kernel extreme learning machine (KELM) is used as the classifier. Experimental results on a daily activity dataset collected from five body positions indicate that the proposed method achieves better overall performance than other existing methods in terms of four performance measures.


The associate editor coordinating the review of this manuscript and approving it for publication was Xi Peng. VOLUME 9, 2021. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
In recent years, with the development of inertial measurement unit (IMU) sensors and wireless transmission technology, human activity recognition (HAR) has become a promising research area in both academic and application fields. HAR technology is an effective way to achieve better information interaction between humans and the external environment. It can determine the type of activity and show it by displaying text or animation, which can help us to make efficient decisions for future human activities. Vision-based HAR systems often suffer from privacy concerns and insufficient illumination [1]. Besides, vision-based HAR systems can only monitor users in a specific area, which greatly limits their practical use. The wearable sensor-based system is another choice for HAR, with the advantages of being light and compact. The gap between low-level sensor data and high-level meaningful applications can be reduced by a sensor-based HAR system. For example, sensor-based HAR systems have been widely adopted in elderly care [2], such as fall detection and assisted living. Besides, for healthcare services, a sensor-based HAR system can provide useful information on health conditions, since some diseases, such as Parkinson's disease and stroke [3], are related to the mobility of the human body. Doctors can monitor and assess rehabilitation from such diseases by using the information from HAR. In addition, sensor-based HAR plays an important role in physical training, such as swimming [4] and badminton training [5]. Previous works on sensor-based HAR can be roughly divided into two categories according to the sensor hardware systems.
The first one is the body sensor network-based approach [6], which combines sensor information from different positions of the body for HAR. However, wearing many sensors causes inconvenience in daily life, especially during long-term monitoring. Moreover, this approach incurs extra equipment cost. The other HAR approach is based on a single sensor [7], [8], typically an accelerometer or other inertial sensor mounted at the waist or another position to recognize activities. Compared with the body sensor network-based approach, this approach is low-cost and less intrusive. Therefore, a growing body of research has focused on single sensor-based HAR systems.
Various features have been utilized in sensor-based HAR, including traditional time and frequency domain features such as the mean, median, standard deviation, fast Fourier transform coefficients and spectral energy. These features are effective for linear signals but are sometimes unhelpful because of the non-stationarity of the activity data. Therefore, time-frequency domain features such as Hilbert-Huang features [9] and EEMD-based features [10] have been proposed as important feature vectors in HAR. However, if all the extracted features are employed for HAR, some of them will be redundant or irrelevant, which could increase the computational cost and reduce the performance of the HAR system.
Selecting representative features which characterize different activities rather than adopting all features is very important in enhancing the HAR system performance. The current feature selection methods can be broadly categorized into filter and wrapper based methods [11]. The filter based methods use the statistical characteristics of the training data to evaluate the importance of features. Then several most important features are selected as a feature subset. This type of methods has the advantage of efficiency, but they are less effective because they ignore the influence of a classifier [12]. The wrapper based methods select the feature subset by utilizing the performance of a classifier. So, the wrapper based methods can achieve high recognition performance for the particular classifier. However, they have high computational cost and are time-consuming [13]. As mentioned above, both filter and wrapper based methods have their advantages and disadvantages. Hence, a hybrid method combining the advantages of filter and wrapper based methods could improve the performance of feature selection.
Recently, various efficient filter-based methods have been proposed and widely used for high-dimensional feature selection problems. However, most of these methods tend to ignore the relationships between features: they focus on features that have strong discriminatory power individually and neglect those that have strong discriminatory power as a group. A novel filter-based method inspired by cooperative game theory, named game-theory based feature selection (GTFS) [14], has recently emerged. Its powerful feature selection ability has been demonstrated in many real-world applications [10], [15]. Accordingly, we can first apply GTFS to select the features with the highest weights as candidates for the wrapper procedure. In this way, the raw high-dimensional feature data can be pre-optimized into a low-dimensional feature dataset, which lowers the computational burden of the wrapper process. However, the classifier in wrapper-based methods has an important influence on the selected features and on the performance of HAR. Therefore, constructing an effective classifier and reselecting features in the wrapper process is crucial for HAR system performance.
Kernel extreme learning machine (KELM) is a popular machine learning method, which has comparable performance to ELM [16]. It has many unique advantages such as extremely fast learning speed and good generalization performance. Hence KELM is very suitable for HAR. However, the establishment of a KELM classifier requires proper selection of the kernel function, the kernel parameters, and the soft margin constant. Appropriate parameter selection is important for improving KELM performance. There have been many studies on the optimization of classifiers using optimization methods such as genetic algorithm and particle swarm optimization [17]- [19]. However, the optimization of features and setting the classifier parameter are usually done separately in the wrapper phase. To obtain the optimal recognition performance of HAR, both feature subset and KELM parameters must be optimized simultaneously. Performing these two aspects separately in the wrapper phase may not lead to optimal recognition performance.
Increasingly, evolutionary algorithms and swarm intelligence algorithms have been employed to solve the feature selection problem. The firefly algorithm (FA) was initially proposed by Yang [20] in 2009 and shows superior optimization ability in various applications. Specifically, some studies have demonstrated the superiority of the FA over the genetic algorithm (GA) and particle swarm optimization (PSO) [21], [22]. Therefore, a novel hybrid feature selection method named GTFS-BFA is proposed in this paper to optimize HAR system performance. The proposed method includes combinational optimization of the classifier in the wrapper-based feature selection phase. Firstly, GTFS is applied to select important features that are beneficial to recognition. However, it is less effective because it does not consider the influence of a classifier. Hence, in the subsequent wrapper phase, these features are used as candidates, which are further optimized by the BFA. Moreover, in the wrapper phase, the classifier is optimized to fit the selected feature set. The novelty and main contributions of this paper are highlighted as follows:
(1) Activity recognition framework: considering that feature subset selection influences the appropriate classifier parameters and vice versa, this paper presents a novel HAR framework that can optimize the feature set and the parameters of the recognition algorithm synchronously. This helps optimize the feature selection and the parameters of the recognition algorithm effectively and achieve robust and optimal system performance. In addition, it is a general framework that can be applied to various classification tasks.
(2) Feature selection method: a novel hybrid feature selection method based on GTFS and BFA is proposed to improve the recognition accuracy and efficiency of recognition. While existing studies have applied the filter or wrapper method for feature selection in HAR, limited work has investigated the hybrid of both methods in the literature of HAR. Moreover, our proposed GTFS-BFA method is different from the previous studies in terms of its filter selection criteria as well as the search implementation in the wrapper process. The hybrid approach GTFS-BFA proposed in this work has a novel contribution to the literature of HAR.
(3) Experimental evaluation: we utilize the acceleration data from five body positions to comprehensively analyze the effectiveness of the proposed scheme and compare it with six well-established counterparts in the literature. The proposed scheme is shown to have a better performance against six well-established counterparts in terms of feature optimization and combinational optimization. This provides a clear indication that the proposed scheme can be a promising alternative for HAR.
The paper is organized as follows. Section II presents some previous HAR works on feature selection and classification algorithm. Section III presents the related algorithms including GTFS, BFA and KELM. Section IV gives the proposed HAR approach, details the feature extraction and hybrid feature selection, as well as illustrates the combinational optimization phase. In Section V, the experimental setup and performance measures are introduced. Section VI presents the experimental results on data from different positions. Some concluding remarks are drawn in Section VII.

II. RELATED WORKS
Several works have improved HAR by optimizing the original high-dimensional feature set. Feature transformation methods such as LDA [23] and KFDA [24] have been proposed to reduce the feature dimension while also enhancing the discriminative ability of the feature vectors. A game theory-based feature selection method was applied to HAR by Wang et al. [10]. Game theory is a mathematical framework that describes conflict and cooperation, and the method applies it to feature selection using entropy and mutual information. Experimental comparisons with ReliefF and minimum-redundancy maximum-relevance verified the effectiveness of that approach. Ghasemzadeh et al. [25] proposed a power-aware feature selection method for mobile-based HAR. To reduce computational complexity, integer programming and greedy approximation approaches are used in the method to optimize the feature set. Experimental results on data collected from real subjects demonstrated the effectiveness of the method. Considering the influence of orientation, placement, and subject variations on HAR performance, a method based on coordinate transformation and principal component analysis (CT-PCA) was proposed in [26] to realize location-adaptive activity recognition. Wang et al. [27] proposed a hybrid feature selection method to reduce feature dimensions for HAR, combining traditional filter-based and wrapper-based feature selection methods. However, only data from the waist was used to verify the method. Moreover, un-optimized classifiers may affect the performance of the HAR system.
Recently, a large number of classification algorithms have been utilized for HAR. Algorithms such as SVM [28], artificial neural networks [29], k-means clustering [30] and decision trees (DT) [31] are widely applied. Besides, newer branches of machine learning, such as deep learning [32]-[34] and ensemble learning [35]-[37], have also shown their merits in HAR. However, deep learning-based approaches require a huge dataset for model training, which may not be available in actual scenarios. Besides, the high computational load of deep learning makes it unsuitable for real-time human activity detection. Ensemble learning-based approaches can increase the robustness and accuracy of the recognition system. However, establishing an ensemble learning-based recognition system has some weaknesses, such as the difficulty of generating fully independent base classifiers and of choosing a suitable base classifier.

III. PRELIMINARIES

A. GAME-THEORY BASED FEATURE SELECTION
Mutual information (MI) indicates the degree of correlation between two random variables and can measure both linear and nonlinear relationships between them. Let $p(x, y)$ denote the joint probability distribution of two random variables $X$ and $Y$. Their mutual information $I(X; Y)$ is defined as follows:

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \tag{1}$$

The conditional mutual information is the MI of the two variables $\{X, Y\}$ given a discrete random variable $Z$, which can be expressed as follows:

$$I(X;Y \mid Z) = \sum_{z \in Z} \sum_{x \in X} \sum_{y \in Y} p(x,y,z) \log \frac{p(x,y \mid z)}{p(x \mid z)\,p(y \mid z)} \tag{2}$$

In game theory, the Shapley value is used to measure the power of the players in a game. In our method, we regard each feature as a player and each feature subset as a coalition. By introducing the Shapley value to evaluate the weight of each feature, we can select each coalition as a candidate subset for the final best feature subset. The Shapley value provides a fair and efficient way to estimate the importance of a feature according to its contribution, while considering possible intrinsic and intricate interactions among features. It can therefore account for the relevance, redundancy and interdependence of features. It is formulated as follows:

$$\Phi_i(v) = \sum_{L \subseteq M \setminus \{i\}} \frac{|L|!\,(m-|L|-1)!}{m!}\, \Delta_i(L) \tag{3}$$

and

$$\Delta_i(L) = v(L \cup \{i\}) - v(L)$$

where $m$ represents the number of players, $v$ is the characteristic function of the game, and the sum extends over all subsets $L$ of $M$ not including player $i$. In this paper, the weight of the feature $f_i$ is expressed by the Shapley value, and the function $\Delta_i(L)$ is redefined in terms of feature information:

$$\Delta_i(L) = \begin{cases} 1, & I(L; C \mid f_i) \ge 0 \ \text{and} \ \sum_{f_j \in L} \psi(i,j) \ge |L|/2 \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

This equation means that the contribution of the feature $f_i$ to a coalition is positive only when the feature is related to the coalition and interdependent with at least half of the features in $L$.
In formula (4), $\psi(i, j)$ indicates the interdependence between features. Interdependence means that the features in the relationship cannot function when separated from one another; that is, the influence of each feature on the recognition performance can be neither ignored nor replaced. Suppose both features $f_i$ and $f_j$ are in the feature set. If the correlation between the feature $f_j$ and the target class $C$ increases when conditioned on $f_i$, the two features are interdependent; that is, $f_i$ and $f_j$ are interdependent if the following formula is satisfied:

$$I(f_j; C \mid f_i) > I(f_j; C) \tag{5}$$

where $C$ denotes the instance class. Therefore, $\psi(i, j)$ can be defined as follows:

$$\psi(i,j) = \begin{cases} 1, & I(f_j; C \mid f_i) > I(f_j; C) \\ 0, & \text{otherwise} \end{cases} \tag{6}$$

Formula (6) guarantees that the selected features are relevant to the target class $C$ and interdependent with each other, which ensures that redundant features are not selected.
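The interdependence test described above can be illustrated on discrete data. The following is a minimal sketch, assuming features and class labels have already been discretized and that probabilities are estimated by simple counting; the function names are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X; Y) in bits, estimated from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(xs, ys, zs):
    """I(X; Y | Z) in bits: the MI within each z-group, weighted by p(z)."""
    n = len(zs)
    total = 0.0
    for z, count in Counter(zs).items():
        idx = [k for k in range(n) if zs[k] == z]
        total += (count / n) * mutual_information(
            [xs[k] for k in idx], [ys[k] for k in idx]
        )
    return total


def interdependent(fi, fj, c, tol=1e-12):
    """Return 1 if conditioning on f_i raises the MI between f_j and C."""
    gain = conditional_mutual_information(fj, c, fi) - mutual_information(fj, c)
    return int(gain > tol)


# Toy check: C = f_i XOR f_j, so neither feature alone is informative.
f_i = [0, 0, 1, 1]
f_j = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(f_i, f_j)]
print(mutual_information(f_j, c))                   # 0.0
print(conditional_mutual_information(f_j, c, f_i))  # 1.0
print(interdependent(f_i, f_j, c))                  # 1
```

The XOR example shows why a filter that scores features one at a time would discard both features, whereas the interdependence indicator marks them as useful together.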

B. BINARY FIREFLY ALGORITHM
As a population-based stochastic global search algorithm, FA simulates the behavior of fireflies that move toward brighter light. For any two fireflies, the less bright firefly is attracted to and moves toward the brighter one. The FA was originally proposed to solve continuous optimization problems, and recent studies have shown its competitiveness in various applications. Specifically, the superiority of the FA compared with other optimization methods has been demonstrated by various studies [21], [22], [38], [39], which motivated us to use it in the wrapper-based feature selection phase.
The FA search process is closely related to two important aspects: the variation of brightness and the formulation of attractiveness. The attractiveness of a firefly is proportional to its brightness, and the attractiveness decreases as the distance between two fireflies increases. Another important factor is the absorption coefficient, which affects the attractiveness. The brightness of a firefly can be expressed as:

$$B(r) = B_0\, e^{-\gamma r^2} \tag{7}$$

where $B_0$ is the original brightness, $\gamma$ is the light absorption coefficient, which is always a fixed value, and $r$ is the distance between two fireflies. The distance $r_{ij}$ between two fireflies at the positions $x_i$ and $x_j$ is the Euclidean distance:

$$r_{ij} = \lVert x_i - x_j \rVert = \sqrt{\sum_{k=1}^{n} \left(x_{i,k} - x_{j,k}\right)^2} \tag{8}$$

where $x_{i,k}$ is the $k$th component of the spatial coordinate $x_i$ of the $i$th firefly and $n$ is the number of dimensions. As a firefly's attractiveness is proportional to the brightness seen by another firefly, the attractiveness $A$ of a firefly can be expressed as:

$$A(r) = A_0\, e^{-\gamma r^2} \tag{9}$$

where $A_0$ is the attractiveness at $r = 0$. The firefly $i$ with a lower brightness is attracted by the firefly $j$ with a higher brightness. The movement of a firefly is formulated by:

$$x_i^{d+1} = x_i^{d} + A_0\, e^{-\gamma r_{ij}^2} \left(x_j^{d} - x_i^{d}\right) + \alpha \left(rand - \tfrac{1}{2}\right) \tag{10}$$

where $\alpha$ is a randomization parameter, $rand$ is a random number uniformly distributed in [0, 1], $r_{ij}$ represents the Euclidean distance between the $i$th firefly and the $j$th firefly, and $d$ is the iteration index. As recommended by previous works [20], [22], in this paper we set $\gamma = 1$, $A_0 = 1$ and $\alpha \in [0, 1]$. When formula (10) is used to calculate the movement of the $i$th firefly toward the $j$th firefly, the position of the firefly changes from a binary vector to a real-valued vector. In order to obtain the binary positions of fireflies, a probabilistic rule based on a hyperbolic tangent sigmoid transfer function is applied, which is shown in (11):

$$x_{i,k}^{d+1} = \begin{cases} 1, & rand < S\!\left(x_{i,k}^{d+1}\right) \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

where $rand$ is a random number uniformly distributed in [0, 1] and $S(\cdot)$ is the hyperbolic tangent sigmoid transfer function.
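One update of a binary firefly can be sketched as follows. This is a minimal illustration under stated assumptions: the random term takes the common FA form alpha * (rand - 0.5), and the transfer function is taken to be |tanh(x)| so that it yields a valid probability; the exact variant used in the paper may differ. The function name `bfa_move` is hypothetical.

```python
import numpy as np


def bfa_move(x_i, x_j, rng, gamma=1.0, a0=1.0, alpha=0.5):
    """Move binary firefly i toward the brighter firefly j and re-binarize."""
    x_i = np.asarray(x_i, dtype=float)
    x_j = np.asarray(x_j, dtype=float)
    r2 = np.sum((x_i - x_j) ** 2)                 # squared Euclidean distance
    attract = a0 * np.exp(-gamma * r2)            # attractiveness at distance r
    step = alpha * (rng.random(x_i.shape) - 0.5)  # random perturbation
    real_pos = x_i + attract * (x_j - x_i) + step
    # Assumed transfer rule: a bit is set with probability |tanh(position)|.
    prob = np.abs(np.tanh(real_pos))
    return (rng.random(x_i.shape) < prob).astype(int)


rng = np.random.default_rng(0)
dim = 10
new_bits = bfa_move(rng.integers(0, 2, dim), rng.integers(0, 2, dim), rng)
print(new_bits)
```

Because attractiveness decays with the squared distance, fireflies whose bit strings differ in many positions pull on each other only weakly, so the random term dominates for distant pairs.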

C. KERNEL EXTREME LEARNING MACHINE
The extreme learning machine (ELM) is a single hidden layer feed-forward neural network (SLFN), which has many advantages such as fast training speed and excellent generalization ability. These advantages have led to its successful application in HAR research. KELM extends ELM from explicit activation functions to implicit mapping functions. Some studies have demonstrated that it has better generalization performance than the traditional ELM algorithm. KELM is described as follows. Given a training set of $N$ samples $\{(x_j, t_j)\}_{j=1}^{N}$, $x_j$ is the input feature vector and $t_j$ is the corresponding class label. All samples belong to $m$ different classes, and the ELM mathematical model with $L$ hidden neurons can be expressed as:

$$\sum_{i=1}^{L} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \tag{12}$$

where $g(x)$ is the activation function, and $w_i$, $b_i$, and $\beta_i$ are the vector of input weights, the hidden layer bias and the vector of output weights of the $i$th hidden neuron node, respectively. Equation (12) can be written in matrix form:

$$H \beta = T \tag{13}$$

where $\beta$ represents the output weights, $T$ is the corresponding coded class label matrix, and $H$ is the hidden layer output matrix:

$$H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L) \end{bmatrix}_{N \times L} \tag{14}$$

Since the activation function in the output layer of ELM is linear, the vector of output layer weights, $\beta$, is obtained by the following equation:

$$\beta = H^{\dagger} T \tag{15}$$

where $H^{\dagger}$ is the generalized inverse matrix of $H$. In order to further improve the generalization ability of ELM, Huang et al. [40] introduced a kernel function to avoid the problems caused by the randomly generated input weights and bias values of ELM. The output weights of KELM are calculated as follows:

$$\beta = H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T \tag{16}$$

where $C$ is the regularization coefficient. The output function for the SLFN is:

$$f(x_j) = h(x_j)\, \beta = h(x_j)\, H^{T} \left( \frac{I}{C} + H H^{T} \right)^{-1} T \tag{17}$$

where $h(x_j)$ is the output of the hidden nodes, which maps the data from the input space to the hidden layer feature space $H$. When the hidden layer function $h(x_j)$ is unknown, the kernel function matrix is calculated as follows:

$$\Omega_{ELM} = H H^{T}, \quad \Omega_{ELM\,i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j) \tag{18}$$

where $K(x_i, x_j)$ represents the kernel function. In this paper, the most commonly used kernel, the Gaussian (RBF) kernel, is applied. The RBF kernel has the following form:

$$K(x_i, x_j) = \exp\left( -\gamma \lVert x_i - x_j \rVert^2 \right) \tag{19}$$

When the RBF kernel is used, the two major parameters of KELM are $C$ and $\gamma$.
In order to achieve a higher classification accuracy, it is necessary to search for the optimal $C$ and $\gamma$. The output function of KELM can then be written as:

$$f(x) = \begin{bmatrix} K(x, x_1) \\ \vdots \\ K(x, x_N) \end{bmatrix}^{T} \left( \frac{I}{C} + \Omega_{ELM} \right)^{-1} T \tag{20}$$
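The kernel form of the output weights and the output function can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' MATLAB implementation: it assumes one-hot coded class labels, the kernel convention exp(-gamma * ||x_i - x_j||^2), and a toy two-cluster dataset in place of real acceleration features. The function names are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))


def kelm_fit(x_train, labels, n_classes, c, gamma):
    """Solve (I / C + Omega) W = T for the kernel-space output weights W."""
    targets = np.eye(n_classes)[labels]            # one-hot coded class labels
    omega = rbf_kernel(x_train, x_train, gamma)
    return np.linalg.solve(np.eye(len(x_train)) / c + omega, targets)


def kelm_predict(x_new, x_train, weights, gamma):
    """Predict the class with the largest KELM output for each new sample."""
    return np.argmax(rbf_kernel(x_new, x_train, gamma) @ weights, axis=1)


# Toy data: two well-separated 2-D clusters (not the paper's dataset).
rng = np.random.default_rng(0)
x0 = rng.normal(0.0, 0.3, size=(20, 2))
x1 = rng.normal(3.0, 0.3, size=(20, 2))
x_train = np.vstack([x0, x1])
y_train = np.array([0] * 20 + [1] * 20)
w = kelm_fit(x_train, y_train, n_classes=2, c=10.0, gamma=1.0)
pred = kelm_predict(np.array([[0.1, -0.1], [2.9, 3.1]]), x_train, w, gamma=1.0)
print(pred)
```

Training reduces to a single linear solve, which is why the pair (C, gamma) can be re-evaluated many times inside a wrapper search at moderate cost.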

IV. THE PROPOSED GTFS-BFA BASED HAR FRAMEWORK

A. OVERVIEW OF THE PROPOSED FRAMEWORK
The flowchart of the proposed HAR framework is shown in Figure 1. Using the original high-dimensional feature set to train the classifier would consume a lot of computational resources. Therefore, in the proposed approach, the importance of each feature is first evaluated with GTFS to effectively reduce the large feature space. However, the obtained feature ranking does not consider the interaction between features. The optimal combination of feature set and KELM parameters is then obtained by using the BFA to maximize the recognition accuracy. The feature re-selection and classifier optimization are conducted synchronously in the wrapper-based phase. Lastly, the optimal KELM and feature set are used for HAR. The proposed HAR approach includes original feature extraction, GTFS-based feature pre-selection, BFA-based feature re-selection with KELM parameter optimization, and activity recognition. The details of the proposed HAR approach are described below.
(1) Original feature extraction: since traditional classification algorithms are not suitable for time-series sensor data, this type of data needs to be divided into segments before features are extracted. Sliding window techniques, with or without overlap between two consecutive windows, are widely adopted and have been proved effective. Features are then extracted from these sliding windows to construct the training and testing sample datasets. The raw feature vector has high dimensionality and includes not only features relevant to HAR but also irrelevant and redundant ones. A single round of feature selection generally cannot obtain representative features that characterize the activity type, which leads to poor classification accuracy. Therefore, both a feature pre-selection phase and a re-selection phase are considered in this paper.
(2) Feature pre-selection phase: using the original high-dimensional feature set as the input of the classifier would not only consume a lot of computation time but also decrease the recognition performance. As a filter-based method, GTFS has the merit of requiring fewer computational resources, so in this phase GTFS is used to calculate the feature weights and output an optimized feature set, which reduces the dimensionality of the feature vector and preselects top-ranked features that are advantageous to classification. However, it is less effective because it does not consider the influence of a classifier. Therefore, the preselected features are provided to the subsequent wrapper phase.
(3) Feature re-selection phase and KELM parameter optimization: compared with the filter-based method, the wrapper-based method is more effective because it selects the optimal feature subset with the evaluation of a classifier. Therefore, in this phase, BFA is used to optimize the feature set and the classifier parameters synchronously. The optimal feature set and the optimal KELM model with the highest training accuracy are then obtained.
(4) Activity recognition: in the testing phase, the features corresponding to the optimal feature set are selected from the high-dimensional testing feature vector as the input of the optimized KELM. Finally, the testing dataset is used to verify the proposed HAR approach.

B. ORIGINAL FEATURE EXTRACTION FROM ACCELERATION SIGNAL
Since the raw acceleration data are noisy and not representative of different activities, features from the raw data are more discriminative representations compared with raw acceleration data. Many features including time domain and frequency domain have been proven to be effective for HAR. For example, the signal magnitude area of acceleration can be utilized to recognize walking and fall. Frequency domain features show the distribution of signal energy, which help recognize dynamic activities from static ones. In this work, 24 features including time and frequency domain features from three-axis acceleration data are extracted as listed in Table 1.

C. FEATURE PRE-SELECTION PHASE USING GTFS
The original training dataset contains uncorrelated or redundant features that do not contribute to recognition accuracy. Furthermore, if the original high-dimensional features are used directly as inputs of the classifier, a high computational cost is incurred. Therefore, it is necessary to find a good feature subset to achieve good recognition performance and reduce computational cost. GTFS can evaluate the weight of each feature and select features that have strong discriminatory power as a group. Features with higher discriminatory power are preselected by GTFS to form a feature subset. The pre-selected feature set can then be re-selected in the subsequent wrapper phase to obtain the optimal feature set. Before the pre-selection phase, feature values are normalized to the range [0, 1] to eliminate the influence of different units and orders of magnitude. The normalization equation is as follows:

$$w_i' = \frac{w_i - \min(w_i)}{\max(w_i) - \min(w_i)} \tag{21}$$

where $w_i'$ is the normalized value, $w_i$ is the original feature value, and $\min(w_i)$ and $\max(w_i)$ are the minimum and maximum values of feature $w_i$, respectively.
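The normalization step can be sketched as a column-wise min-max scaling. The sketch below returns the training minima and maxima so that they can be reused on test data; that reuse, and the handling of constant columns, are assumptions that the text does not specify.

```python
import numpy as np


def min_max_normalize(features, f_min=None, f_max=None):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    features = np.asarray(features, dtype=float)
    f_min = features.min(axis=0) if f_min is None else f_min
    f_max = features.max(axis=0) if f_max is None else f_max
    span = np.where(f_max > f_min, f_max - f_min, 1.0)  # avoid divide-by-zero
    return (features - f_min) / span, f_min, f_max


train = np.array([[1.0, 200.0], [3.0, 400.0], [5.0, 300.0]])
scaled, lo, hi = min_max_normalize(train)
print(scaled)
```

When training statistics are applied to unseen data, some scaled values may fall slightly outside [0, 1].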

D. BFA ENCODING FOR COMBINATIONAL OPTIMIZATION
The filter-based GTFS method is used to reduce the search dimension and improve computing efficiency; however, redundant features may still remain in the pre-selected feature set. In order to obtain the optimal feature subset for improving the performance of HAR, the wrapper method is introduced as a re-selection phase. The features pre-selected by GTFS are used as candidates, which are further optimized by the BFA. Besides, considering the robustness of the HAR system, BFA is also used to simultaneously search for the optimal KELM parameters. In this phase, the two KELM parameters, the regularization coefficient $C$ and the RBF kernel parameter $\gamma$, are optimized together with the feature subset. When the BFA is applied to optimize the feature set and the recognition algorithm simultaneously, a population of fireflies searches the solution space, and the individuals are updated from one iteration to the next. To initialize the BFA, each individual is coded to represent the feature selection state and the values of the parameters $C$ and $\gamma$. The code of each firefly is thus divided into three parts: the feature selection states, the value of parameter $C$ and the value of parameter $\gamma$. The firefly encoding in our method is illustrated in Figure 2.
The selection state of the features pre-selected by GTFS is represented by bits of the coded sequence in each firefly. Consequently, $N_p$ bits are used to represent the feature selection state in a candidate. The bit value "1" indicates that the feature is selected, while the value "0" indicates that the feature is abandoned. For the parameters $C$ and $\gamma$, $N_c$ and $N_g$ bits are used to represent their values, respectively. The following formula is used to calculate the decimal value of a parameter from the value of each bit:

$$x_d = x_{dmin} + \frac{x_{dmax} - x_{dmin}}{2^{N} - 1} \sum_{i=1}^{N} bit(i) \cdot 2^{\,i-1} \tag{22}$$

where $N$ is the number of bits, $x_d$ is the decimal value of $C$ or $\gamma$, $i$ is the bit index, $bit(i)$ is the value of the $i$th bit, and $x_{dmin}$ and $x_{dmax}$ are the lower and upper bounds of the search interval of $C$ or $\gamma$, respectively.
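The three-part encoding can be decoded as sketched below. The sketch assumes a linear mapping of the bit string onto the search interval with the most significant bit first; the bit order and the helper names are assumptions, and the default intervals are those reported in the experimental setup (C in [0.001, 100], gamma in [0.1, 500]).

```python
def decode_bits(bits, lower, upper):
    """Map a bit sequence (most significant bit first) onto [lower, upper]."""
    n = len(bits)
    value = int("".join(str(b) for b in bits), 2)
    return lower + (upper - lower) * value / (2 ** n - 1)


def decode_firefly(bits, n_p, n_c, n_g,
                   c_range=(0.001, 100.0), g_range=(0.1, 500.0)):
    """Split one firefly into a feature mask, C and gamma."""
    mask = list(bits[:n_p])
    c = decode_bits(bits[n_p:n_p + n_c], *c_range)
    gamma = decode_bits(bits[n_p + n_c:n_p + n_c + n_g], *g_range)
    return mask, c, gamma


# 4 feature bits, then 8 bits for C (all ones) and 16 bits for gamma (all zeros).
firefly = [1, 0, 1, 1] + [1] * 8 + [0] * 16
mask, c, gamma = decode_firefly(firefly, n_p=4, n_c=8, n_g=16)
print(mask, c, gamma)
```

With all bits set, a parameter decodes to its upper bound, and with all bits cleared to its lower bound, so the resolution of the search is fixed by the number of bits assigned to each parameter.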

E. FEATURE RE-SELECTION AND COMBINATIONAL OPTIMIZATION FOR KELM PARAMETERS
After initializing the BFA optimization algorithm, it is used for searching the optimal feature subset and the KELM parameters. In the BFA iteration, the bits of each individual represents a reselected feature subset and the value of the KELM parameter. The fitness value of each individual is calculated by formula (23) to obtain the optimal result in the iteration process.
where fit is the fitness value, $A_{tr}$ is the training accuracy, $N_s$ is the number of selected features and $\alpha$ is a weighting factor whose value is very small, such as 0.01. As the objective is to obtain higher recognition accuracy with fewer features, the fitness function of the firefly considers both the training accuracy and the number of features. This selection criterion finds the combination of features and parameter values that achieves the highest classification accuracy with as few features as possible. Figure 3 illustrates the proposed synchronous feature re-selection and KELM parameter optimization with BFA. As the iterations continue, the positions of all fireflies are updated until the required number of iterations is reached. Finally, the re-optimized feature set and the optimal KELM parameters with the highest training recognition accuracy are obtained. In the testing phase, the corresponding features are selected from the testing data, and the activity type is obtained by using these features and the optimized KELM.
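The trade-off between accuracy and subset size can be illustrated with a simple fitness function. The form below, training accuracy minus a small penalty proportional to the fraction of selected features, is an assumed stand-in chosen only for illustration; it is not necessarily the paper's formula (23).

```python
def fitness(train_accuracy, mask, alpha=0.01):
    """Assumed fitness: training accuracy minus a small penalty on subset size."""
    n_selected = sum(mask)
    if n_selected == 0:
        return 0.0                      # an empty subset cannot be classified
    return train_accuracy - alpha * n_selected / len(mask)


# Equal accuracy: the subset with fewer features scores higher.
print(fitness(0.95, [1, 1, 0, 0]) > fitness(0.95, [1, 1, 1, 1]))   # True
# The small weight keeps accuracy dominant over subset size.
print(fitness(0.97, [1, 1, 1, 1]) > fitness(0.95, [1, 0, 0, 0]))   # True
```

Keeping the weighting factor small means that the number of features mainly acts as a tie-breaker between candidates of similar accuracy.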

V. EXPERIMENTAL SETUP AND ACCELERATION DATA ACQUISITION

A. DATASET
In order to evaluate the performance of the proposed approach for HAR, we acquired the dataset by using the TRIGNO™ wireless system from Delsys, which contains a base station and collection nodes. Each collection node integrates a triaxial accelerometer, which has an acceleration range of ±6 G with a resolution of 0.016 (G denotes gravitational acceleration). The TRIGNO™ wireless system transmits the acceleration signal wirelessly from the collection nodes to the base station. Once received by the base station, the data can be transmitted to and stored on the computer. The datasets were collected while each of ten volunteers, aged between twenty and forty-five, performed activities with five collection nodes attached to the chest, waist, left wrist, left ankle and right arm, respectively. Accordingly, a dataset from five different positions of the body was obtained. Figure 4 presents the fixed positions of the collection nodes and the data collection process. Before the start of data collection, we used straps to fix the sensors on the body and checked that the sensors were in the same positions as for the previous subject. The triaxial accelerometer worked at a sampling frequency of 150 Hz. As we mainly focus on the recognition of daily activities, the task on this dataset is to distinguish six basic daily activities: five dynamic activities, namely walking (W), going upstairs (GU), going downstairs (GD), running (R) and jumping (J), and one static activity, standing (S). Figure 5 shows the triaxial accelerometer data of different activities from the left ankle. After data acquisition, sliding windows are used to divide the acceleration signal into segments. The window length is 300 data points (2 s at 150 Hz) and adjacent windows overlap by 50%.
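The segmentation described above (300-point windows with 50% overlap) can be sketched as follows. The synthetic signal is only a placeholder for real recordings, and the function name is hypothetical.

```python
import numpy as np


def sliding_windows(signal, length=300, overlap=0.5):
    """Split an (n_samples, n_axes) signal into fixed-length overlapping windows."""
    signal = np.asarray(signal)
    step = int(length * (1.0 - overlap))
    starts = range(0, len(signal) - length + 1, step)
    return np.stack([signal[s:s + length] for s in starts])


# 10 s of synthetic triaxial data at 150 Hz (placeholder for real recordings).
acc = np.random.default_rng(0).normal(size=(1500, 3))
windows = sliding_windows(acc)
print(windows.shape)   # (9, 300, 3)
```

With a step of 150 samples, each window shares its first half with the second half of the previous one, so a new segment starts every second of recording.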

B. PERFORMANCE MEASURES
To evaluate the effectiveness of the proposed method and show its superiority over the comparative methods, we compare them in terms of the number of selected features and the recognition performance, which is measured by the following four measures. Accuracy is expressed as:

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{24}$$

where TP, TN, FP, and FN represent the numbers of true positive, true negative, false positive, and false negative outcomes in a given experiment, respectively.
Precision is the proportion of samples recognized as a given class that truly belong to that class, and recall is the proportion of samples truly belonging to a given class that are correctly recognized:

$$\text{precision} = \frac{TP}{TP + FP} \tag{25}$$

$$\text{recall} = \frac{TP}{TP + FN} \tag{26}$$

In addition, the F1 evaluation criterion is considered. F1 combines precision and recall as their harmonic mean:

$$F1 = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{27}$$
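As a rough illustration of how these four measures can be computed for a multi-class problem, the sketch below derives per-class TP, FP, FN and TN from a confusion matrix in a one-vs-rest fashion. The function name `per_class_metrics`, the row/column convention (rows are true classes, columns are predicted classes) and the toy matrix are assumptions made for the example. The text does not state how per-class values are averaged, so no averaging is applied here.

```python
import numpy as np


def _safe_div(num, den):
    """Element-wise division that returns 0 where the denominator is 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)


def per_class_metrics(confusion):
    """Return per-class (accuracy, precision, recall, f1) arrays.

    Assumption: confusion[i, j] counts samples of true class i
    that were recognized as class j.
    """
    cm = np.asarray(confusion, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # recognized as the class, but belong elsewhere
    fn = cm.sum(axis=1) - tp  # belong to the class, but recognized as another
    tn = total - tp - fp - fn

    accuracy = _safe_div(tp + tn, tp + tn + fp + fn)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    return accuracy, precision, recall, f1


if __name__ == "__main__":
    # Toy two-class matrix with made-up counts, for illustration only.
    toy = [[8, 2],
           [1, 9]]
    for name, values in zip(("accuracy", "precision", "recall", "f1"),
                            per_class_metrics(toy)):
        print(name, np.round(values, 4))
```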

VI. EXPERIMENTAL SETUP AND RESULTS
The leave-one-out (LOO) strategy is utilized to train the classifier and recognize the activities. The validation is repeated ten times, with the data from each person used exactly once for testing, and the final results are the averages over the ten runs. In the combinatorial optimization phase, we set the maximum number of iterations to 100, the population size of fireflies to 30, the absorption coefficient $\gamma = 1$, and the attractiveness $\beta_0 = 1$. The search stops when the number of iterations reaches 100 or when the fitness has not improved for 10 consecutive iterations. As discussed in the previous section, the RBF kernel function is used in the KELM classifier. The parameter C is limited to the interval [0.001, 100] and encoded by 8 bits for each individual, while the RBF kernel parameter γ is limited to the interval [0.1, 500] and encoded by 16 bits for each individual. All experiments were carried out in MATLAB 2014a on a desktop computer with a 3.2 GHz processor and 8 GB of memory.
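The subject-wise validation protocol can be sketched as follows. This is a minimal Python illustration rather than the authors' MATLAB code: the generator name `leave_one_subject_out` and the `subject_ids` array (one subject label per window) are hypothetical names introduced for the example.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    Each subject's windows are used for testing exactly once, and the
    windows of all remaining subjects are used for training.
    """
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


if __name__ == "__main__":
    # Ten hypothetical subjects with five windows each.
    ids = np.repeat(np.arange(10), 5)
    for subject, train_idx, test_idx in leave_one_subject_out(ids):
        print(subject, len(train_idx), len(test_idx))
```

With ten volunteers this produces ten folds, and averaging the per-fold scores corresponds to the averaged results reported in this section.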

A. THE PERFORMANCE OF THE PROPOSED METHOD ON DATA FROM DIFFERENT POSITIONS
To validate the effectiveness of the proposed HAR approach based on GTFS-BFA feature selection and combinatorial optimization, it is compared with several other HAR approaches in this study. Since the proposed feature selection method contains two phases, the number of features selected in the pre-selection phase affects the performance of GTFS-BFA, so the results of the pre-selection phase are also worth examining. Figure 6 shows the performance with sequential feature subsets in the GTFS-based filter phase. Note that these results are based only on the feature set selected by GTFS. As shown in Figure 6, too few features do not benefit the recognition accuracy, which is expected. The accuracy then increases with the number of features. However, once the number of features reaches a certain value, the recognition accuracy stops rising or even decreases. This indicates that more features are not necessarily better, and that redundant features in the set can degrade the recognition results. Therefore, the number of pre-selected features should be kept within a reasonable range. According to the experimental data, in the feature pre-selection phase the highest accuracy always appears when the number of features is between 25 and 35. Therefore, we select the 30 highest-ranked features as the candidates for the feature re-selection phase. Accordingly, we set $N_p = 30$ bits in the firefly encoding to represent the state of each feature in the re-selection phase.
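To make the binary encoding concrete, the following Python sketch decodes one firefly position into a feature mask and the two KELM parameters, using the bit widths and intervals given in the experimental setup (30 feature bits, 8 bits for C in [0.001, 100], 16 bits for γ in [0.1, 500]). Two points are assumptions, not statements from the text: the ordering of the three fields within the bit string, and the linear mapping from the unsigned integer to the interval. The names `decode_firefly` and `bits_to_value` are hypothetical.

```python
import numpy as np

N_FEATURE_BITS = 30                          # N_p candidate features
C_BITS, C_RANGE = 8, (0.001, 100.0)          # KELM parameter C
GAMMA_BITS, GAMMA_RANGE = 16, (0.1, 500.0)   # RBF kernel parameter


def bits_to_value(bits, lo, hi):
    """Map a bit sequence linearly onto [lo, hi] (assumed mapping)."""
    as_int = int("".join(str(int(b)) for b in bits), 2)
    return lo + (hi - lo) * as_int / (2 ** len(bits) - 1)


def decode_firefly(position):
    """Split a 54-bit position into (feature_mask, C, gamma).

    Assumed layout: [feature bits | C bits | gamma bits].
    """
    position = np.asarray(position).astype(int)
    expected = N_FEATURE_BITS + C_BITS + GAMMA_BITS
    if position.shape != (expected,):
        raise ValueError(f"expected {expected} bits, got shape {position.shape}")

    feature_mask = position[:N_FEATURE_BITS].astype(bool)
    c_bits = position[N_FEATURE_BITS:N_FEATURE_BITS + C_BITS]
    gamma_bits = position[N_FEATURE_BITS + C_BITS:]
    return (feature_mask,
            bits_to_value(c_bits, *C_RANGE),
            bits_to_value(gamma_bits, *GAMMA_RANGE))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mask, c, gamma = decode_firefly(rng.integers(0, 2, size=54))
    print(int(mask.sum()), "features selected; C =", c, "gamma =", gamma)
```

Under this assumed layout, a bit set to 1 in the first 30 positions keeps the corresponding pre-selected feature, so the number of selected features is simply the number of ones in the mask.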
In this subsection, we show the performance of the proposed feature selection approach for HAR. To give an intuitive impression of how the proposed combinational optimization method compares with the original features, the GTFS method and feature optimization, Figure 7 shows the performance of these four methods in terms of accuracy, precision, recall and F1 on the data from the five positions. In Figure 7, the codes ''W'', ... Additionally, to gain better insight into the activity recognition problem and the proposed feature selection method, the corresponding confusion matrices are constructed and shown in Tables 2 to 6, respectively. According to the results, the proposed method distinguishes dynamic activities (walking, running, jumping, going upstairs and going downstairs) from the static activity (standing) with high accuracy. For example, for the waist and left ankle positions, only 3 and 4 dynamic activity samples, respectively, are misrecognized as the static activity (standing). Furthermore, for the waist and chest positions, the proposed method makes 4 and 5 recognition errors, respectively, out of the 591 test samples when recognizing the static activity (standing).

B. COMPARISON WITH OTHER STATE-OF-THE-ART APPROACHES

1) PERFORMANCE COMPARISON OF FEATURE OPTIMIZATION
To further demonstrate the effectiveness of the proposed GTFS-BFA approach for HAR, it is compared with five other existing state-of-the-art approaches: MBACO [41], ReliefF-BPSO [42], ReliefF-GA [43], CMIM-GA [44], and MPI-FA [38]. All of these heuristic methods are binary search algorithms. To compare the algorithms empirically on equal terms, the same population size and the same number of iterations are set for all of them, while the other parameters are set as described in their respective references. For ReliefF-BPSO, ReliefF-GA, CMIM-GA and MPI-FA, the filter phase selects the same number of features as in the proposed GTFS-BFA. The MBACO method selects features from the original feature set. The four performance measures and the number of selected features (d) are used to show the performance of the six methods. In this section, we only verify the performance of these methods in optimizing features, without considering the combinational optimization of features and parameters. Tables 7 to 11 present the performance comparison of the different methods for the data from the five body positions, respectively.
As can be seen from the results, the proposed GTFS-BFA method has the best feature optimization ability for HAR. Its average performance is the best for the data from all five positions, and its number of selected features (d) ranges from 10 to 16, which is clearly fewer than for the other methods. For the data from the waist, the proposed GTFS-BFA achieves 95.32% accuracy with an average of 10 features, which is better than the other five algorithms. In particular, the MBACO method selects the most features of all the methods, yet its performance is very poor. This demonstrates that combining a filter phase and a wrapper phase helps to improve the performance of feature selection. In addition, the results show that MPI-FA and the proposed GTFS-BFA select fewer features than the BPSO-based and GA-based methods when the number of features in the filter phase is the same. This suggests that FA has a better optimization capability for features in HAR than BPSO and GA.

2) PERFORMANCE COMPARISON OF COMBINATIONAL OPTIMIZATION
In this subsection, we analyze the performance of combinational optimization with the proposed and comparative methods for HAR. This helps us to understand the advantage of combinational optimization over feature optimization alone and to verify the effectiveness of the proposed method for HAR. For a fair empirical comparison, the parameters of all the methods are set as in the previous section. Tables 12 to 16 show the performance comparison of the different methods for the data from the five body positions, respectively, when combinational optimization of the features and the classifier is considered. As can be seen from Tables 12 to 16, the proposed GTFS-BFA also outperforms the other methods when combinational optimization is considered. In addition, all the methods perform better than when only feature optimization is performed, which demonstrates that optimizing the parameters and the feature subset at the same time can improve the performance and robustness of the HAR system. For example, the average recognition accuracy of the proposed GTFS-BFA reaches 98.69%, 98.38%, 95.49%, 92.25% and 90.81% for the data from the five positions, respectively, all of which are higher than when GTFS-BFA is used for feature optimization only. Moreover, the number of features used by the proposed GTFS-BFA does not increase significantly under combinational optimization. For example, with the same number of features, the proposed GTFS-BFA achieves 98.38% and 95.49% accuracy on the data from the chest and right arm, which is clearly better than the 93.62% and 91.63% accuracy obtained when only feature optimization is performed. In brief, the proposed GTFS-BFA has superior optimization ability for the features and parameters of HAR.

VII. CONCLUSION
An effective and robust HAR framework, which conducts feature selection and classifier optimization synchronously, is proposed to achieve better recognition performance. The proposed framework is composed of a GTFS-based feature pre-selection phase and an FA-based combinational optimization phase. The filter-based GTFS is first employed to eliminate irrelevant and redundant features and form a reduced input subset. The wrapper-based combinational optimization phase is then applied to re-select the feature subset and find the optimal classifier parameters synchronously. Several experiments with data from five body positions have been conducted to analyze the effectiveness of the proposed method. The experimental results show that the HAR performance of the proposed GTFS-BFA based approach is superior to that of other well-established counterparts. Therefore, the proposed GTFS-BFA method can be a good alternative for feature selection in HAR. The solution proposed in this paper has the following applications. First, the proposed HAR framework, with its accurate activity recognition, benefits the design and development of human-centric applications such as assisted living, rehabilitation and fall detection systems. Second, although we tested the proposed feature selection method GTFS-BFA on identifying six daily activities, it is a general method that can be applied to other situations, such as other classification and regression problems.
In future work, since inertial sensor fusion provides a mechanism to estimate the orientation and rotation of movement, we plan to combine various sensors to identify more kinds of activities. For example, the combination of a gyroscope, a magnetometer and an accelerometer can be used to distinguish activities with similar patterns, such as going upstairs, going downstairs and cycling. Another direction for future work is to apply the proposed GTFS-BFA approach in other related areas and compare it with other effective feature selection methods.
YIMING TIAN received the B.S. degree in automation from Hebei University of Technology, Tianjin, China, in 2012, the M.S. degree in control science and control engineering from the University of Science and Technology Liaoning, Anshan, China, in 2015, and the Ph.D. degree in control science and control engineering from Hebei University of Technology, in 2020.
He is currently a Lecturer with the College of Information Engineering, Tianjin University of Commerce. His research interests include neural networks, wearable sensors, pattern recognition, and bioinformatics.
JIE ZHANG (Senior Member, IEEE) received the B.Sc. degree in control engineering from Hebei University of Technology, Tianjin, China, in 1986, and the Ph.D. degree in control engineering from the City, University of London, London, in 1991.
He is currently a Reader with the School of Engineering, Newcastle University, Newcastle upon Tyne, U.K. His research interests include neural networks, neuro-fuzzy systems, intelligent control systems, genetic algorithms, optimal control of batch processes, and multivariate statistical process control. He has published over 180 articles in international journals, books, and conferences.
Dr. Zhang is a member of the IEEE Control Systems Society and the IEEE Computational Intelligence Society. He served as a Reviewer for many prestigious international journals, including IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, IEEE TRANSACTIONS ON FUZZY SYSTEMS, Neural Networks, Automatica, Chemical Engineering Science, and AIChE Journal.
LIPENG LI received the master's degree from Hebei University of Technology, in 2015.
He is currently an Experimenter with the College of Information Engineering, Tianjin University of Commerce. His research interests include the application of Internet of Things, pattern recognition, intelligent mobile robot, and artificial intelligence control.
ZUOJUN LIU received the M.S. degree in control science and control engineering from Hebei University of Technology, Tianjin, China, and the Ph.D. degree in control science and control engineering from Nankai University, Tianjin.
He is currently a Professor and a Ph.D. Supervisor with the School of Artificial Intelligence and Data Science, Hebei University of Technology. His research interests include rehabilitation aids, intelligent robots, and intelligent control theory and application.