Efficient Wi-Fi-Based Human Activity Recognition Using Adaptive Antenna Elimination

Recently, Wi-Fi-based human activity recognition using channel state information (CSI) signals has gained popularity due to its attractive features, such as passive sensing and adequate privacy. The movement of various body parts within the Wi-Fi signal propagation path changes the signal reflections and refractions, which is evident from the CSI variations. In this paper, we analyze the relationship between human activities and the properties (amplitude and phase) of Wi-Fi CSI signals on multiple receiving antennas and identify the signal properties that vary remarkably in response to human movement. The signals received by different antennas show different sensitivity to human activities, which directly affects recognition performance. Therefore, to recognize human activities more efficiently, we propose an adaptive antenna elimination algorithm that automatically eliminates the non-sensitive antenna and keeps the sensitive antennas according to the human activity performed. Furthermore, the correlation of the statistical features extracted from the amplitude and phase of the selected antennas' CSI signals was analyzed, and sequential forward selection was used to find the best feature subset. Using this subset, three machine learning algorithms were applied to two publicly available datasets to classify various human activities. The experimental results revealed that even with easy-to-implement, non-deep machine learning such as random forest, the recognition system based on the proposed adaptive antenna elimination algorithm achieved superior classification accuracies of 99.84% (line-of-sight) on the StanWiFi dataset and 97.65% (line-of-sight) / 93.33% (non-line-of-sight) on another widely applied multi-environment dataset, at a fraction of the time cost, demonstrating the robustness of the proposed algorithm.

Human Activity Recognition (HAR) is a key function of a smart home, in which daily activities are monitored using a set of Internet of Things (IoT) devices. Thanks to such monitoring, a smart home can provide occupants, especially disabled and elderly people, with individualized home care services that enhance their quality of life, independence, and health. HAR has gradually become one of the most notable research topics in diverse fields because of its numerous applications, including fall detection [37], [54], elderly monitoring [36], context awareness [21], [34], driver behavior analysis [25], sports assistance [5], gesture recognition [39], edutainment [18], and information retrieval [12], [40], among others. The careful design and appropriate use of sensors are essential to obtain high-quality sensory observations that can be used to identify users' activities and behaviors [30].
Conventional HAR methods use different sensing technologies: vision-based [1], wearable-based [47], and radar-based [27]. Vision-based sensing, one of the first and most prominent of these technologies, uses a camera to capture human movement in the surrounding environment; it achieves good performance but is susceptible to various factors, such as measuring volume, lighting and environmental conditions, and user privacy. These limitations significantly affect the outcome of vision-based HAR in many applications. In wearable-based HAR, signals are obtained from wearable devices and sensors, an approach that has been widespread and fruitful. However, users sometimes need to carry several devices, for example, smartphones, smartwatches, and smart bracelets, to gather human activity data, which is cumbersome and inconvenient, particularly for elderly and disabled people. Moreover, activity recognition becomes impracticable if the person does not carry the required devices. Radar-based HAR collects data using specialized equipment, such as universal software radio peripherals (USRPs), but its coverage area is limited.
Wi-Fi-based methods, which combine the following features, are an emerging way to overcome the issues associated with vision, wearable sensor, and radar sensing technologies [51]. First, requiring no camera or wearable device, Wi-Fi sensing is non-invasive and enables simple data acquisition while preserving adequate privacy. Second, Wi-Fi signals are ubiquitous in indoor environments and can propagate through walls, furniture, and doors, making them a reliable and vital source of information. Third, Wi-Fi signals are reflected by the human body [24], [48], and the changing reflection patterns can be used to detect different activities. Last but not least, the ease and efficiency of Wi-Fi signal acquisition make it potentially applicable to real-time or online HAR [18], a task that often requires complex solutions when using vision or wearable technologies, particularly regarding the capacity, speed, and accuracy of data transmission [33].
Currently, Wi-Fi-based HAR uses two metrics to represent a Wi-Fi signal: the received signal strength indicator (RSSI) and channel state information (CSI). RSSI provides coarse-grained information and has been used primarily for indoor localization [20] and HAR [13]. RSSI captures changes in the received signal strength but fails to detect the signal changes generated by a moving person. As the distance between the moving person and the receiver antenna increases, RSSI measurements become erroneous, degrading the overall performance of the system [35]. In contrast, CSI provides fine-grained measurements and has been applied to a variety of tasks, including handwriting recognition [15], position estimation [49], fall detection [53], and micromovement identification [50]. CSI conveniently captures the signal propagation information between transmitter and receiver antenna pairs at particular carrier frequencies. As summarized in [37], generic human activities can be classified from CSI signals using time-frequency features, such as torso and limb velocities, obtained by characterizing how fast the reflected path length changes in the subcarrier amplitudes. Such velocities are similar to those obtained from Doppler radars, but without movement direction. The critical insight of Wi-Fi CSI-based HAR is that the amplitude and phase of a CSI signal affected by a moving person or object differ from regular measurements without movement; thus, human activities can be recognized by analyzing the received signals, in particular their amplitude and phase information. Furthermore, existing research demonstrates that CSI performs better than RSSI in complicated situations (see Section II), because in indoor environments the amplitude and phase information provided by CSI can clearly distinguish moving from steady signal patterns. This article applies Wi-Fi CSI signals to HAR.
Employing CSI signals to recognize human activities poses several challenges. The relationship between various human movements, the surrounding environment, and antenna placement results in different antenna sensitivities to different activities, and antennas are easily affected by external factors as the environment and the activities change. Hence, in this article, an adaptive antenna elimination (AAE) algorithm is proposed to eliminate non-sensitive antennas and keep the antenna signals most sensitive to human activities. Moreover, several statistical features are computed from the amplitude and phase of each selected antenna. The correlation of the extracted features is analyzed, and, to reduce overfitting, a wrapper feature selection technique is used to find the most informative features.
Three machine learning (ML) models are trained with the optimal feature set on two open datasets and evaluated by ten-fold cross-validation. The experimental results demonstrate that the proposed HAR system performs better than or on par with state-of-the-art methods in terms of accuracy, precision, recall, and F1-score, with a significant gain in efficiency and simplicity. The main contributions of this article are:
• We propose an adaptive antenna elimination algorithm that automatically eliminates non-sensitive antennas and keeps critical ones, which reduces unnecessary data and enables efficient HAR;
• Unlike previous studies that employ either amplitude or phase information alone, we jointly utilize the amplitude and phase information in the CSI sequence to enhance recognition accuracy;
• Our proposed system, built on a set of experimentally selected superior features, outperforms existing state-of-the-art works on two widely applied datasets in terms of activity recognition capability, at a low time cost.

II. BACKGROUND AND RELATED WORK
Because of the availability and numerous advantages of Wi-Fi communication, Wi-Fi signal-based HAR methods have recently become popular. In this section, we review existing Wi-Fi-based HAR methods, divided into RSSI-based and CSI-based approaches.

A. HUMAN ACTIVITY RECOGNITION BASED ON RECEIVED SIGNAL STRENGTH INDICATOR (RSSI)
RSSI signal-based HAR approaches rely on the changes in received signal strength introduced by different human movements. Sigg et al. [43] designed a HAR approach in which a mobile phone collected RSSI signals to recognize human activities. They acquired data in three cases: the mobile phone lying on a table in an empty room, lying on a table while a subject moved around the room, and being held and operated by a subject. They extracted and selected features, with which the system achieved recognition accuracies of 52% for 11 gestures and 72% for four selected gestures. An online HAR system by Gu et al. [14] analyzed RSSI fingerprints of various human activities. Gu et al. [13] utilized RSSI signals and developed a fusion approach with k-nearest neighbor to identify human activities with an average accuracy of 92.58%. Sigg et al. [44] used a hardware device to obtain RSSI signals from the surrounding environment for the recognition of four activities (walking, standing, crawling, and lying down), reporting an accuracy above 80%. Youssef et al. [57] presented a localization system using RSSI signals that detects environmental changes and tracks passive entities.
So far, RSSI has proven less effective in complex scenarios, since it only provides coarse information about channel variations and is easily influenced by multipath effects and noise [7], [51].

B. CHANNEL STATE INFORMATION (CSI)
CSI describes the channel characteristics of a communication link: it reflects how the signal propagating between the transmitter and receiver is affected by distance, scattering, power attenuation, fading, and human movement, among others. Owing to its high sensitivity to the environment, CSI can be applied in cognitive scenarios such as HAR, gesture recognition, and location tracking. Modern wireless technology adopts multiple-input multiple-output (MIMO) systems consisting of multiple transmitting and receiving antennas. Each transmitter-receiver antenna pair forms a communication channel over which information is sent using various modulation techniques, the most widespread of which is orthogonal frequency-division multiplexing (OFDM); OFDM divides the channel bandwidth into several concurrent orthogonal subcarrier frequencies, and CSI characterizes the channel on each of these subcarriers. In MIMO-OFDM systems, the received signal on each subcarrier can be modeled as
$$ \mathbf{y}_i = \mathbf{H}_i \mathbf{x}_i + \mathbf{v}, \quad i = 1, 2, \dots, N, \tag{1} $$
where $\mathbf{H}_i \in \mathbb{C}^{N_R \times N_T}$ represents the CSI matrix of the $i$-th subcarrier, $\mathbf{v}$ denotes the noise term, $N$ denotes the number of OFDM subcarriers, $N_R$ and $N_T$ are the numbers of receiving and transmitting antennas, and $\mathbf{y}_i \in \mathbb{C}^{N_R}$ and $\mathbf{x}_i \in \mathbb{C}^{N_T}$ are the $i$-th received and transmitted signals, respectively. The CSI matrix is
$$ \mathbf{H}_i = \begin{bmatrix} h_i^{11} & h_i^{21} & \cdots & h_i^{N_T 1} \\ h_i^{12} & h_i^{22} & \cdots & h_i^{N_T 2} \\ \vdots & \vdots & \ddots & \vdots \\ h_i^{1 N_R} & h_i^{2 N_R} & \cdots & h_i^{N_T N_R} \end{bmatrix}, \tag{2} $$
where $h_i^{jk}$ is the CSI of the $i$-th subcarrier for the link between the $j$-th transmitting antenna and the $k$-th receiving antenna.
Through the CSI matrix $\mathbf{H}_i$, CSI describes the signal attenuation along each transmission link, including signal scattering, power decay with distance, multipath fading, and other effects. The channel frequency response characterizes the multipath propagation of the signal through its amplitude-frequency and phase-frequency properties; each entry can be expressed as
$$ h_i^{jk} = \left| h_i^{jk} \right| e^{\jmath \angle h_i^{jk}}, \tag{3} $$
where $|h_i^{jk}|$ and $\angle h_i^{jk}$ denote the amplitude and phase, respectively, and $\jmath$ is the imaginary unit.
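To make the MIMO-OFDM model and the amplitude/phase decomposition above concrete, the NumPy sketch below simulates the per-subcarrier model for one transmitting and three receiving antennas (the configuration of both datasets in Section III) and splits each CSI entry into amplitude and phase. The random channel, the all-ones transmitted symbols, the noise level, and the array layout H[i, k, j] = h_i^{jk} are illustrative assumptions, not measured CSI.

```python
import numpy as np

rng = np.random.default_rng(42)
N, N_T, N_R = 30, 1, 3   # subcarriers, transmitting antennas, receiving antennas

# H[i] is the N_R x N_T CSI matrix of subcarrier i; entry H[i, k, j] = h_i^{jk}
H = rng.standard_normal((N, N_R, N_T)) + 1j * rng.standard_normal((N, N_R, N_T))
x = np.ones((N, N_T), dtype=complex)   # known transmitted symbols (assumed all ones)
v = 0.01 * (rng.standard_normal((N, N_R)) + 1j * rng.standard_normal((N, N_R)))

# y_i = H_i x_i + v, evaluated for every subcarrier i at once
y = np.einsum("irt,it->ir", H, x) + v   # shape (N, N_R)

# Amplitude |h| and phase angle(h) of every CSI entry
amplitude = np.abs(H)
phase = np.angle(H)                     # radians in [-pi, pi]
assert np.allclose(amplitude * np.exp(1j * phase), H)   # h = |h| e^{j angle(h)}

# With a single transmitter and known symbols, y / x is a noisy estimate of H
H_est = y / x                           # shape (N, N_R)
print(amplitude.shape, H_est.shape)     # (30, 3, 1) (30, 3)
```

In practice, a receiver such as the Intel 5300 NIC reports the CSI values directly, so only the last two steps (taking the modulus and the angle) are needed before further processing.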

C. HUMAN ACTIVITY RECOGNITION BASED ON CSI
Compared to RSSI, which measures only the power over the entire channel bandwidth, CSI provides a set of channel estimates for each subcarrier of each transmission link. The adaptive antenna elimination (AAE) algorithm proposed in this article aims to reduce complexity and improve training and recognition efficiency by eliminating the least sensitive antenna. In the same spirit of reducing complexity, pursuing model simplicity is also a reasonable consideration. We found that most Wi-Fi-based HAR studies are based on deep learning, which is in principle more complex and time-consuming than conventional ML. Therefore, three non-deep ML models are applied in this article, aiming to:
• Confirm whether AAE is effective with plain ML models rather than deep learning, which might naturally lead to better results;
• Provide a lightweight Wi-Fi-based HAR scheme that not only minimizes redundancy in data utilization but also keeps the model minimal, thus ensuring simplicity, low resource consumption, high efficiency, and partial interpretability.

III. DATASETS
Data corpora are an essential resource in all aspects of ML and deep learning. We employed two publicly available Wi-Fi signal-based datasets for our HAR study.

STANWIFI
Yousefi et al. collected a dataset in a line-of-sight (LOS) indoor environment [56]. We refer to this dataset as StanWiFi, following the name given in [41] and [55]. Six participants performed each of six activities (Fall, Run, Lie down, Walk, Sit down, and Stand up) twenty times. The experimenters used a Wi-Fi router with one antenna as the transmitter and a laptop equipped with an Intel 5300 NIC with three antennas as the receiver. The receiver was located three meters from the transmitter, and the sampling rate was 1000 Hz. Each subject completed each session within 20 seconds, starting and ending in a stationary state.

MULTIENVIRONMENT
The dataset [3] was collected in three indoor environments. The provider did not name this dataset; for convenience, we call it ''MultiEnvironment'' in this article. In one of the LOS scenarios, the office, the distance between the receiver and the transmitter was 3.7 meters. The non-line-of-sight (NLOS) scenario had a barrier (an eight-centimeter wall) between the transmitter and the receiver. Thirty persons were asked to perform five sessions, each repeated 20 times: falling from a standing or sitting position, walking, sitting down on and standing up from a chair, and picking up a pen from the ground. Each session was subsequently divided into several activities. For example, falling from a sitting position was divided into three activities: sitting, falling, and lying. The authors of [2] and [4], who belong to the group of dataset owners, identified 12 classes from these five sessions and reorganized them into six labels for HAR, listed in Table 1. Following the data provider's work, we applied the six-category activity labeling to both the LOS (office) and the NLOS scenarios. The total number of samples is 3,000 (30 subjects × 5 experiments × 20 trials). Two computers were used to collect data, one as a one-antenna transmitter and the other as a three-antenna receiver.

IV. METHODS
Figure 1 portrays the overall workflow of the proposed HAR system in this article, which consists of five primary aspects: data preparation, antenna analysis, signal processing, feature research, and modeling.

A. ANTENNA ANALYSIS FOR ADAPTIVE ELIMINATION
A MIMO system consists of multiple transmitting and receiving antennas. Both the StanWiFi and ''MultiEnvironment'' datasets have one transmitting and three receiving antennas. The Wi-Fi signals from the transmitting antenna can be reflected by the human body along the propagation path to the receiving antennas. Given the variety of human movements and surroundings, the antennas are sensitive to external factors such as the direction of human movement and the antenna's vertical placement. As a result, receiving antennas have different susceptibilities to different activities, which previous works addressed with subcarrier selection and fusion techniques [46]. However, subcarrier selection on non-sensitive antennas shows no significant effect, which indicates that different antennas exhibit different sensitivity to the same human activity. Figure 2 (Running) shows that, for running, the first and second antennas are more sensitive, while the third shows no significant response. Similarly, the first and third antennas are more sensitive for walking, as Figure 2 (Walking) shows. The amplitude signal of a sensitive antenna changes substantially, while that of a non-sensitive antenna remains relatively stable. Hence, eliminating non-sensitive antennas is a potential way to enhance HAR.
The signal received by a non-sensitive antenna is severely corrupted by ambient noise and only vaguely reflects human activities. Possible reasons include a combination of factors such as the environment, antenna positioning, and human body movement. Our goal is to distinguish sensitive from non-sensitive antennas. Therefore, we propose an adaptive antenna elimination (AAE) algorithm that adaptively eliminates antennas based on their sensitivity to different human activities:
1) Let $\mathrm{CSI}_{a,s,p}(i)$ denote the CSI value of the $i$-th sample in packet $p$ on subcarrier $s$ of antenna $a$. The mean value sequence of antenna $a$ over all its subcarriers is computed as
$$ \mu_a(p) = \frac{1}{S} \sum_{s=1}^{S} \mathrm{CSI}_{a,s,p}(i), \tag{4} $$
where, in our datasets, $a = 1, 2, 3$ and $s = 1, 2, \dots, S$ with $S = 30$.
2) To distinguish the antennas' sensitivity to human activities, a sliding-window standard deviation sequence of $\mu_a(p)$ is calculated with window size $l$. The $n$-th value in the sequence is the standard deviation of the window $w(n) = [\mu_a(n), \dots, \mu_a(n+l-1)]$:
$$ \sigma_a(n) = \sqrt{\frac{1}{l} \sum_{p=n}^{n+l-1} \left( \mu_a(p) - \bar{\mu}_a(n) \right)^2}, \qquad \bar{\mu}_a(n) = \frac{1}{l} \sum_{p=n}^{n+l-1} \mu_a(p), \tag{5} $$
where, in this work, $l = 3$ and the sliding step is 1. Let $\boldsymbol{\sigma}_a = [\sigma_a(1), \sigma_a(2), \dots]$ denote the standard deviation sequence.
3) The sensitivity of antenna $a$ is the maximum minus the minimum of $\boldsymbol{\sigma}_a$:
$$ \mathrm{Sensitivity}_a = \max \boldsymbol{\sigma}_a - \min \boldsymbol{\sigma}_a, \tag{6} $$
and the antenna with the lowest sensitivity is eliminated.
The procedure above provides a reasonable basis for distinguishing sensitive from non-sensitive antennas, as displayed in Figure 3 (b), which preliminarily validates from a visualization perspective that non-sensitive antennas can be eliminated. The proposed AAE algorithm eliminates the antenna with the lowest sensitivity and keeps the other two antennas, as shown in Figure 3 (c). The pseudocode of AAE is given in Algorithm 1.
105444 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
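The three AAE steps can be sketched in Python with NumPy. This is a minimal illustration rather than the authors' implementation: it assumes that one sample's CSI has already been converted to amplitudes and stored as an array of shape (antennas, subcarriers, packets), and the function names antenna_sensitivity and adaptive_antenna_elimination are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def antenna_sensitivity(csi_amp, window=3, step=1):
    """Sensitivity of each antenna following AAE steps 1)-3).

    csi_amp: array (n_antennas, n_subcarriers, n_packets) of CSI amplitudes
    for one sample (assumption: amplitudes rather than complex CSI).
    """
    # Step 1: average over subcarriers -> one sequence mu_a(p) per antenna
    mu = csi_amp.mean(axis=1)                                     # (n_antennas, n_packets)
    # Step 2: standard deviation of every sliding window of length `window`
    windows = sliding_window_view(mu, window, axis=1)[:, ::step]  # (n_antennas, n_windows, window)
    sigma = windows.std(axis=-1)                                  # (n_antennas, n_windows)
    # Step 3: sensitivity = max(sigma_a) - min(sigma_a)
    return sigma.max(axis=1) - sigma.min(axis=1)

def adaptive_antenna_elimination(csi_amp, window=3, step=1):
    """Drop the least sensitive antenna; return kept indices and sensitivities."""
    sensitivity = antenna_sensitivity(csi_amp, window, step)
    dropped = int(np.argmin(sensitivity))
    kept = [a for a in range(csi_amp.shape[0]) if a != dropped]
    return kept, sensitivity

# Synthetic example: antennas 0 and 1 see an activity burst, antenna 2 stays quiet
rng = np.random.default_rng(0)
csi = 1.0 + 0.01 * rng.standard_normal((3, 30, 300))   # quiet baseline
csi[0, :, 100:200] += 5.0 * rng.standard_normal(100)   # strong response on antenna 0
csi[1, :, 100:200] += 3.0 * rng.standard_normal(100)   # weaker response on antenna 1
kept, sens = adaptive_antenna_elimination(csi)
print(kept)   # [0, 1]
```

NumPy's default population standard deviation (ddof=0) is used here; switching to the sample standard deviation would rescale every window's value by the same constant and would not change which antenna is eliminated.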
Algorithm 1 (AAE), excerpt: calculate the standard deviation $\sigma(w(n))$ (see Equation 5); $\mathrm{Sensitivity}_a = \max \boldsymbol{\sigma}_a - \min \boldsymbol{\sigma}_a$ (see Equation 6).
In the current version of AAE, two of the three antennas are retained and one is eliminated. This raises the question of what happens if all three antennas are sensitive, only one is sensitive, or none is sensitive. More sophisticated selection rules are possible, but they are not necessary at the current stage, because:
• If a selection algorithm judges all three antennas to be non-sensitive, no data would be left to use. In this case, it makes sense to eliminate only the least sensitive antenna and keep two to provide some helpful information;
• If all three antennas are sensitive, numerous signal visualizations show that the three antennas are highly correlated with each other, so eliminating one should have minimal effect on the results. In our work, it even improves the recognition performance (see Table 7);
• If a selection algorithm judges only one antenna to be sensitive, keeping one of the other two insensitive antennas may provide additional information, as evidenced in Table 7, while the negative impact of the least sensitive antenna is reduced;
• Most importantly, excellent recognition rates were achieved in the experiments (see Sections V-D and V-E) using the current AAE algorithm, which eliminates one non-sensitive antenna.
Therefore, we do not propose a more sophisticated elimination scheme at this stage. This work aims to validate the novel idea of antenna elimination for HAR by confirming whether the classifier performs well when the most insensitive antenna is eliminated. After successful validation, future work includes designing reasonable threshold metrics to eliminate or retain more antennas, especially for datasets that contain a large number of antennas.

B. DENOISING, SMOOTHING, AND SEGMENTATION
Using the AAE approach, the antenna identified as less sensitive to human activities is eliminated. Since Wi-Fi CSI signals are affected by signal attenuation, multipath propagation, and other environmental factors, it is necessary to remove noise, outliers, and other irrelevant information from the raw CSI data to improve the reliability of activity-related signal patterns. The widely used Butterworth band-pass filter denoises the raw CSI signal, as expressed by the following equations: After denoising, some burrs may remain on the envelopes of the human activity signals because of environmental and equipment effects; a Gaussian smoothing function further reduces this interference:
$$ G(x) = \frac{1}{\sqrt{2\pi}\,\sigma_g} \exp\left( -\frac{x^2}{2\sigma_g^2} \right), $$
where $\sigma_g$ is the standard deviation of the Gaussian kernel and $x$ is the distance from the kernel center. The signal becomes cleaner after denoising and smoothing, which supports reliable feature extraction for training and classification. Furthermore, segmentation divides signals into smaller pieces, commonly referred to as windows or frames, which resolves two constraints of data preprocessing for training and recognition [23], [32]. First, recorded trials from different participants may have different lengths. Second, processing a long time series consumes significant resources. Our experiments applied a window size of 512 samples with a stride of 64 samples (i.e., consecutive windows are shifted by 12.5% of the window length) to segment the processed CSI signal.
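A possible realization of this preprocessing chain with SciPy is sketched below: Butterworth band-pass filtering, Gaussian smoothing, and segmentation into 512-sample windows with a stride of 64 samples. The filter order, pass band, and Gaussian width are hypothetical placeholders, since their values are not given in this section, and zero-phase filtering with sosfiltfilt is our choice rather than something the text specifies; the 1000 Hz sampling rate is that of StanWiFi.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import butter, sosfiltfilt

FS = 1000.0          # sampling rate in Hz (StanWiFi, Section III)
BAND = (0.3, 60.0)   # hypothetical pass band in Hz
ORDER = 4            # hypothetical Butterworth order
SIGMA = 5.0          # hypothetical Gaussian kernel width, in samples

def denoise(x, fs=FS, band=BAND, order=ORDER):
    """Zero-phase Butterworth band-pass filtering of a 1-D CSI stream."""
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def smooth(x, sigma=SIGMA):
    """Gaussian smoothing to suppress residual burrs on the signal envelope."""
    return gaussian_filter1d(x, sigma=sigma)

def segment(x, window=512, stride=64):
    """Split a 1-D signal into fixed-length windows; a trailing remainder is dropped."""
    starts = range(0, len(x) - window + 1, stride)
    return np.stack([x[s:s + window] for s in starts])

# Synthetic 20 s recording at 1 kHz: slow motion component plus noise
rng = np.random.default_rng(0)
t = np.arange(20_000) / FS
raw = np.sin(2 * np.pi * 2.0 * t) + 0.5 * rng.standard_normal(t.size)
processed = smooth(denoise(raw))
windows = segment(processed)
print(windows.shape)   # (305, 512)
```

Second-order sections (output="sos") are used because they are numerically more robust than transfer-function coefficients for band-pass designs with a low cut-off relative to the sampling rate.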

C. FEATURE EXTRACTION, CORRELATION, AND SELECTION
Feature extraction plays a crucial role in ML. Publicly available code libraries, such as TSFEL [6], assist in the extraction of time-series features in the temporal, statistical, and spectral domains. For each window of 512 samples (see Section IV-B), 10 handcrafted feature types (see Table 2) were extracted from the amplitude and phase of each of the two selected antennas, yielding 40-dimensional feature vectors (10 features × 2 signal properties × 2 antennas).
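The sketch below shows how a 40-dimensional vector can arise from one window. Since Table 2 is not reproduced here, the ten statistics in FEATURE_NAMES are hypothetical stand-ins; the sketch also assumes that each kept antenna contributes a single complex CSI stream per window (for example, averaged over subcarriers) and that the phase is unwrapped before feature extraction.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Hypothetical stand-in for the ten feature types of Table 2
FEATURE_NAMES = ["mean", "std", "var", "min", "max", "median",
                 "skewness", "kurtosis", "rms", "iqr"]

def stat_features(x):
    """Ten statistical features of one 1-D window."""
    q75, q25 = np.percentile(x, [75, 25])
    return np.array([x.mean(), x.std(), x.var(), x.min(), x.max(), np.median(x),
                     skew(x), kurtosis(x), np.sqrt(np.mean(x ** 2)), q75 - q25])

def window_feature_vector(csi_window):
    """csi_window: complex array (n_kept_antennas, window_len), e.g. (2, 512).

    Concatenates amplitude and phase features of each kept antenna:
    2 antennas x 2 signal properties x 10 features = 40 values.
    """
    feats = []
    for antenna in csi_window:
        amplitude = np.abs(antenna)
        phase = np.unwrap(np.angle(antenna))   # remove 2*pi jumps (assumption)
        feats.extend([stat_features(amplitude), stat_features(phase)])
    return np.concatenate(feats)

rng = np.random.default_rng(1)
window = rng.standard_normal((2, 512)) + 1j * rng.standard_normal((2, 512))
vec = window_feature_vector(window)
print(vec.shape)   # (40,)
```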
The features extracted for the various human activities are not all correlated with each other. Correlation measures the linear similarity between two features, and we used Pearson's correlation coefficient (PCC) [8] to analyze the extracted features. The PCC always lies between -1 and +1: values close to +1 or -1 indicate a strong positive or negative linear dependence, respectively, while zero indicates no linear correlation.
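As a quick illustration of the PCC and of screening a feature matrix for strongly correlated pairs, the NumPy sketch below computes r = cov(x, y) / (σ_x σ_y) directly and via np.corrcoef. The 0.9 threshold and the helper names are illustrative assumptions.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient r = cov(x, y) / (std(x) * std(y))."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

def correlated_pairs(F, threshold=0.9):
    """Feature pairs (i, j, r) of F (samples x features) with |r| > threshold."""
    R = np.corrcoef(F, rowvar=False)   # features x features PCC matrix
    n = R.shape[0]
    return [(i, j, float(R[i, j])) for i in range(n) for j in range(i + 1, n)
            if abs(R[i, j]) > threshold]

rng = np.random.default_rng(0)
a = rng.standard_normal(500)
# Columns 1 and 2 are exact linear functions of column 0; column 3 is independent noise
F = np.column_stack([a, 2 * a + 1, -a, rng.standard_normal(500)])
print(correlated_pairs(F))   # only pairs among columns 0-2, with r = +1 or -1
```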
Figure 4 details the feature correlation matrix for both the amplitude and phase of the CSI signal; each entry displays the PCC between a pair of features. Some features are highly correlated with each other, suggesting redundancy among them. Some earlier works reduced the feature space to lower the dimensionality for HAR training and recognition [17]. Feature selection is another way to reduce the number of features for HAR, which cuts computational cost and can improve recognition performance. It also tends to reduce overfitting and facilitates data visualization. In this study, sequential forward selection (SFS) [9], a wrapper approach based on a greedy bottom-up search, generated the optimal feature set. The algorithm begins with an empty subset, adds in each iteration the feature that yields the largest improvement, and halts when the performance on the validation set can no longer be improved.
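The greedy search described above can be written as a short loop around scikit-learn's cross_val_score. This is a hand-rolled sketch under stated assumptions (KNN as the wrapped estimator, five-fold scoring, strict improvement as the stopping rule), not the authors' code, and it does not reproduce the feature weights of Figure 5. scikit-learn also provides a SequentialFeatureSelector class with direction="forward" that could be used instead.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sequential_forward_selection(X, y, estimator, cv=5, max_features=None):
    """Greedy SFS: start empty, add the feature that most improves the mean
    cross-validated accuracy, and stop when no candidate improves it."""
    n_features = X.shape[1]
    max_features = n_features if max_features is None else max_features
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scores = {f: cross_val_score(estimator, X[:, selected + [f]], y, cv=cv).mean()
                  for f in candidates}
        best_f = max(scores, key=scores.get)
        if scores[best_f] <= best_score:   # no further improvement: halt
            break
        selected.append(best_f)
        best_score = scores[best_f]
    return selected, best_score

# Synthetic check: only feature 2 carries class information
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = (X[:, 2] > 0).astype(int)
selected, score = sequential_forward_selection(X, y, KNeighborsClassifier())
print(selected, round(score, 3))
```

Each iteration refits the wrapped model once per remaining candidate and fold, so the cost grows roughly quadratically with the number of features; with 40 candidate features this remains manageable for the non-deep models used here.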
The importance of the features was computed as weights using SFS, as illustrated in Figure 5. Seventeen features do not significantly affect classification (weight < 0.2), and the remaining features are considered beneficial. Empirically, features with a weight greater than 0.4 perform well. We selected fourteen features from the two antennas, namely ten amplitude features and four phase features, which reduces the number of handcrafted features used by 65% (from 40 to 14) according to the feature selection experiments.

D. MACHINE LEARNING MODELS
Random forest (RF) is an ensemble learning method that builds a collection of decision trees, each constructed from a different portion of the data [28]; the predictions of the individual trees are combined by majority vote. Generally, RF mitigates overfitting, and as the number of trees increases, the generalization error decreases.
Support vector machine (SVM) is a family of supervised learning methods characterized by kernel functions [45], which separate data classes with a hyperplane in an n-dimensional space. The optimal hyperplane is the one with the maximal margin, where the margin is the distance between the hyperplane and the nearest data points of each class. These nearest data points are the support vectors, also referred to as critical points.
K-nearest neighbor (KNN) stores all available training data and classifies a new sample according to its K nearest samples, using a distance measure such as the Euclidean distance [10]. As a non-parametric method, KNN handles both linear and non-linear problems and makes no assumptions about the underlying data distribution.
This work used the default settings of all three ML models in the Python package scikit-learn [38].
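The three classifiers with default settings and the ten-fold protocol can be reproduced with scikit-learn roughly as follows. This is a sketch, not the authors' code: the synthetic data from make_classification stands in for the 14 selected features and six activity classes, macro averaging of precision, recall, and F1 over classes is our assumption, and shuffling with a fixed random_state is an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Default hyperparameters, as stated in the text
MODELS = {"RF": RandomForestClassifier, "SVM": SVC, "KNN": KNeighborsClassifier}
SCORING = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]

def evaluate(X, y, n_splits=10, seed=0):
    """Ten-fold cross-validation on randomly shuffled, non-overlapping folds.

    Returns {model: {metric: (mean, std)}} over the folds.
    """
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for name, make_model in MODELS.items():
        scores = cross_validate(make_model(), X, y, cv=cv, scoring=SCORING)
        results[name] = {m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std())
                         for m in SCORING}
    return results

# Synthetic stand-in for 14 selected features and six activity classes
X, y = make_classification(n_samples=600, n_features=14, n_informative=8,
                           n_classes=6, random_state=0)
results = evaluate(X, y)
for name, res in results.items():
    print(name, "accuracy = %.3f ± %.3f" % res["accuracy"])
```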

E. EVALUATION
We used ten-fold cross-validation, which randomly splits the whole dataset into ten non-overlapping parts. In each round, nine folds were used for training and the remaining fold for recognition and evaluation. We evaluated the overall recognition performance of the proposed system using well-known metrics, namely accuracy, precision, recall, and F1-score, computed from the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):
$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$
$$ \mathrm{Precision} = \frac{TP}{TP + FP} $$
$$ \mathrm{Recall} = \frac{TP}{TP + FN} $$
$$ \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} $$
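For the multi-class case, TP, FP, TN, and FN are counted per class in a one-vs-rest fashion from the confusion matrix. The helper below is a minimal sketch of that bookkeeping; its name and the macro-averaged F1 it reports are assumptions for illustration, not necessarily how Tables 3-5 were computed.

```python
import numpy as np

def metrics_from_confusion(cm):
    """One-vs-rest metrics from a confusion matrix cm[true, predicted]."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp        # predicted as class c but belonging elsewhere
    fn = cm.sum(axis=1) - tp        # belonging to class c but predicted elsewhere
    tn = total - tp - fp - fn       # all remaining samples
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / total,             # overall accuracy = trace / total
        "per_class_accuracy": (tp + tn) / total,  # (TP + TN) / (TP + TN + FP + FN)
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "macro_f1": f1.mean(),
    }

m = metrics_from_confusion([[8, 2], [1, 9]])
print(m["accuracy"], m["recall"])   # 0.85 [0.8 0.9]
```

If a class is never predicted, its precision is 0/0 and NumPy returns nan with a runtime warning; scikit-learn's precision_recall_fscore_support exposes a zero_division parameter for this case.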

V. RESULTS AND DISCUSSION
The results of the AAE-based HAR experiments on the two datasets are summarized below and compared with state-of-the-art results.

A. EXPERIMENTS ON THE STANWIFI DATASET
Table 3 reports the experimental results of our proposed HAR system on the StanWiFi dataset, where the RF classifier achieves the highest performance, with average accuracy, F1-score, precision, and recall all above 99.80%. Among the ten folds, the first fold shows the lowest accuracy (99.76%), and the 7th fold reaches 100%. The SVM and KNN classifiers reach average accuracies of 97.20% ± 0.15% and 96.08% ± 0.49%, respectively. Figure 6 shows the confusion matrix of the RF classifier on the StanWiFi dataset. RF classifies Run, Sit down, Stand up, and Walk without error, and its recognition rate exceeds 99% for Fall and Lie down.

B. EXPERIMENTS ON THE ''MULTIENVIRONMENT'' DATASET
Tables 4 and 5 report the experimental results of the RF classifier in the LOS and NLOS scenarios, respectively, while Figures 7 and 8 show the corresponding confusion matrices. In the LOS scenario, RF exceeds 97.61% in overall accuracy, precision, recall, and F1-score, while SVM and KNN achieve recognition rates of 95.10% ± 0.57% and 93.88% ± 0.55%, respectively.
Figures 7 and 8 show that Sitting down/Standing up is the main factor lowering the global recognition rate, followed by Falling and Turning. Recognition errors for these three activities were mainly concentrated on No movement, possibly because the body is in a stationary state (''No movement'') before performing Sitting down, Standing up, Falling, or Turning. Sequential modeling such as hidden Markov models (HMMs) may improve the overall recognition accuracy [29], [31], which is a potential direction for follow-up studies. Furthermore, the class Sitting down/Standing up combines two activities in opposite directions, which follows the data providers' labeling (see Section III) but undoubtedly increases recognition difficulty. In contrast, the StanWiFi dataset does not merge these two activities, and both are recognized entirely correctly (see Figure 6). Direction-related high-level features may help with this problem [16], [19].

C. PERFORMANCE COMPARISON AND DISCUSSION
Comparing different models on the same dataset provides a fair basis for evaluation. We compare the performance of our proposed system with state-of-the-art peer works on the same two datasets; the statistics are given in Table 6.
First, note that [22], a recent publication, used more sophisticated deep learning and reported ten-fold cross-validated recognition accuracies higher than ours by 0.04, 0.55, and 1.33 percentage points on StanWiFi, ''MultiEnvironment''-LOS, and ''MultiEnvironment''-NLOS, respectively. However, these results do not undermine the validation of the proposed AAE algorithm. At the cost of a slight loss of accuracy, AAE-based non-deep learning brings lower complexity, greater interpretability, and a considerable reduction in training and recognition time. As described in Section II-C, non-deep learning models are applied in this work to validate AAE's usefulness; hence, the nearly on-par accuracies and significantly lower time cost compared with deep learning confirm AAE's effectiveness and its contribution to efficient, low-cost, lightweight Wi-Fi-based HAR.
Yousefi et al. [56], the providers of the StanWiFi dataset, applied RF, HMM, and LSTM based on feature extraction to validate the dataset, among which LSTM achieved the best accuracy of ∼90% according to [41] and [55]. Since then, several HAR studies have used the StanWiFi dataset. In [7], the authors proposed the ABLSTM model, which helps focus on essential features and reached 97.30% accuracy under randomly selected ten-fold cross-validation. The advanced InceptionTime-based model proposed in [55], named CSITime, attained an accuracy of 98.00%. A lightweight HAR model (LiteHAR) [41] achieved 93.00% accuracy under ten-fold cross-validation. In comparison, our AAE-based RF classifier achieved a promising recognition accuracy of 99.84% on the StanWiFi dataset, better than or on par with most state-of-the-art works, with significantly lower time consumption.
On the ''MultiEnvironment'' dataset, in addition to [22], there are two published reference results for the LOS scenario. In [4], features were extracted from the time and frequency domains, and an optimal feature set was selected. With SVM, an accuracy of 94.00% was reported in a 100-fold, purely person-dependent experiment (ten-fold cross-validation for each of ten subjects). In [2], after outlier removal and signal smoothing, an SVM classifier based on feature extraction and selection achieved 91.27% accuracy for LOS under ten-fold leave-one-out cross-validation. Our work reached 97.65% accuracy under randomly selected ten-fold cross-validation in the LOS scenario.
To our knowledge, the only publicly available HAR work on the NLOS scenario of the ''MultiEnvironment'' dataset is the recent publication [22]. The AAE-based recognition results in this work can serve as the first benchmark for non-deep learning, with comparable accuracy.
Time complexity is a critical factor for any model or classifier intended to solve real-world problems. We therefore also compare training and testing time costs. Our work took 45.12 seconds for training and 0.29 seconds for testing on the StanWiFi dataset, whereas [41] took 157.80 and 5.46 seconds, respectively, and ABLSTM [7] took even longer. Again, our 49.20/0.35 seconds for training/testing can serve as a time-cost benchmark for ''MultiEnvironment''. Admittedly, the time statistics depend on hardware, but the efficiency of our method is evident.
The following factors contributed to the outstanding recognition performance of our work:
• The proposed AAE algorithm eliminated the antenna least sensitive to human activities, significantly reducing irrelevant information.
• Statistical features were extracted from both the amplitude and the phase of the selected antennas.
• The correlation between the extracted features was analyzed using PCC, and feature selection with the SFS technique further provided the ML models with the most informative input.
• Using non-deep ML approaches on two publicly available datasets, AAE-based HAR achieved performance comparable to that of deep learning methods with significantly reduced time complexity.
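Sequential forward selection (SFS) is a greedy wrapper method: starting from an empty set, it repeatedly adds the candidate feature whose inclusion most improves a score. The sketch below assumes a user-supplied scoring function and a fixed target subset size; the toy `score`, the feature names, and the redundancy penalty are hypothetical stand-ins for, e.g., cross-validated RF accuracy on real CSI features.

```python
from typing import Callable, FrozenSet, List, Sequence

def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    k: int,
) -> List[str]:
    """Greedily grow a feature subset of size k that maximizes `score`."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # Try each remaining feature together with the current subset.
        best = max(remaining, key=lambda f: score(frozenset(selected + [f])))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy score: each feature has a fixed usefulness, and a pair
# of redundant (highly correlated) features adds nothing beyond the better one.
usefulness = {"amp_std": 0.5, "amp_var": 0.45, "phase_std": 0.3, "amp_mean": 0.1}
redundant = {frozenset({"amp_std", "amp_var"})}

def score(subset):
    total = sum(usefulness[f] for f in subset)
    for pair in redundant:
        if pair <= subset:
            total -= min(usefulness[f] for f in pair)
    return total

print(sequential_forward_selection(list(usefulness), score, k=2))
# ['amp_std', 'phase_std']
```

Because the second pick is evaluated jointly with the first, a feature that is individually strong but redundant (`amp_var`) is skipped, which is the same motivation behind checking PCC before or alongside SFS.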

D. IMPACT OF THE NUMBER OF ANTENNAS RETAINED
All datasets used in this paper were collected with three receiving antennas. Therefore, the antenna elimination used so far to verify whether eliminating an antenna affects recognition retains two antennas. To further confirm the optimality of these results, we also conducted experiments on all datasets using all three antennas and using only the most sensitive antenna, with the RF classifier, which performs best among the three applied ML models; the results are summarized in Table 7.
Table 7 clearly shows the advantages of AAE. Eliminating one antenna, which reduces the data volume by one-third, improves the recognition rates and F1-scores by about 2-3 percentage points. When one more antenna is eliminated, that is, only the most sensitive antenna is kept, performance drops by 5-6 percentage points. As expected, the time cost is close to proportional to the number of retained antennas, while the time cost per sample per antenna remains roughly constant. Since the time cost of our method is already far lower than that of state-of-the-art models (see Table 7), AAE is clearly preferable to keeping only the most sensitive antenna: doubling a time cost at the millisecond level brings a significant increase in recognition rate.
The current work verifies that, with a suitable algorithm such as the proposed AAE, eliminating an antenna can even improve recognition performance on the applied datasets rather than impair it. When more antennas are involved, how many antennas should be retained or eliminated to achieve the best accuracy-efficiency balance depends on the dataset and purpose. Future work includes designing a generalizable threshold scheme to determine the number of antennas to eliminate or retain automatically or semi-automatically.

E. IMPACTS OF ANTENNA ELIMINATION ON EACH ACTIVITY
To explore the impact of antenna elimination on the recognition rate of each activity, Figure 9 summarizes the per-activity accuracy statistics from the nine experiments in Table 7. The red bars indicate the accuracy gains of AAE over using all three antennas, while the yellow bars represent its gains over using only the most sensitive antenna.
The noteworthy points found in Figure 9 are listed below.
• Eliminating the least sensitive antenna, i.e., applying the proposed AAE, had no negative impact on any activity.
• AAE achieved a 100% recognition rate for several activities (see Figures 6-8); the other two settings never reached 100% on any individual activity.
• On the ''MultiEnvironment'' dataset's NLOS scenario, AAE neither improved nor worsened the accuracy for Sitting down/Standing up and Turning. In the LOS scenario, AAE brought only small gains on Sitting down/Standing up. As analyzed in Section V-B, these two activities are inherently more challenging to recognize than the others. However, when this label is split into two separate classes, as in StanWiFi, AAE performs significantly better.
• AAE yields clear gains for Walking and Falling, which are included in all datasets.
• The activity most positively affected by AAE is Picking up a pen in the ''MultiEnvironment'' dataset's NLOS scenario. For such an activity involving localized details, eliminating an insensitive antenna greatly helped recognition. Conversely, eliminating one more antenna on top of AAE caused the largest decrease in recognition rate, again for Picking up a pen (in the LOS scenario), which suggests that one antenna is far from enough to characterize this activity of localized details.
• When only the most sensitive antenna was retained, the results were much worse than when all antennas or AAE were applied, with one exception: for Falling in the ''MultiEnvironment'' dataset's LOS scenario, using only one antenna outperformed using all antennas. Even so, AAE still outperformed both.

VI. CONCLUSION
This article presents a Wi-Fi CSI signal-based HAR system with superior performance in both LOS and NLOS scenarios. An adaptive antenna elimination (AAE) algorithm was proposed to retain the antennas most sensitive to human activities, and various features were extracted from the amplitude and phase of the selected antennas. Feature correlation analysis and feature selection were applied to obtain the best subset of features. Three non-deep ML classifiers, rather than deep learning models, were applied and compared to confirm the general effectiveness of AAE. With the RF classifier, the proposed HAR system achieved classification accuracies of 99.84% on the StanWiFi dataset and 97.65% (LOS) / 93.33% (NLOS) on the ''MultiEnvironment'' dataset, outperforming or matching state-of-the-art studies at a significantly lower time cost. Currently, our research involves two datasets with different types of daily activities as a proof of concept for AAE. Evaluating the proposed system on other available datasets with more types of human activities is a future direction. Additionally, recognizing multi-user activities based on Wi-Fi signals is realistic but more challenging.
105452 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
Wang et al. [52] designed a CSI-based human activity recognition and monitoring method (CARM) using time-frequency features such as torso and limb velocities. CARM consists of two models: a CSI-speed model describing the relationship between CSI kinematics and human motions, and a CSI-activity model describing the relationship between human activities and movement speed. CARM achieved a recognition rate of 96% and is robust to environmental changes. Yousefi et al. [56] created the StanWiFi dataset from Wi-Fi-based CSI signals and extracted statistical features from it to model activities with long short-term memory (LSTM), hidden Markov model (HMM), and random forest. Chen et al. [7] proposed an attention-based bidirectional LSTM (ABLSTM) model to identify passive human activities from CSI signals, outperforming other benchmarks. A Wi-Motion system was proposed in [26] to recognize five activities in line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios. Moshiri et al. [11] presented a deep learning-based HAR approach on a Raspberry Pi by collecting CSI signals of seven human activities. They transformed CSI signals into 2-D images using pseudocolor plots and evaluated four convolutional neural network (CNN) and LSTM models, namely 1D-CNN, 2D-CNN, LSTM, and bidirectional LSTM, among which 2D-CNN reached 95% accuracy. Alsaify et al. [2] constructed a multi-environmental HAR system using CSI signals that achieved an overall recognition rate of 91.27% through denoising, activity segmentation, statistical feature extraction and selection, and support vector machine training. Salehinejad and Valaee [41] introduced a LiteHAR model that uses random convolution kernels for feature extraction, achieving 93% accuracy on a public dataset. Yadav et al. [55] presented an advanced version of the InceptionTime network, named CSITime, and evaluated it on three public datasets. Shalaby et al. [42] studied four deep learning models on the StanWiFi dataset, among which the CNN gated recurrent unit (CNN-GRU) model achieved an accuracy of 99.31% with a time cost of 0.0033 seconds per sample. The attention model built on CNN-GRU reached 99.16% accuracy while reducing the per-sample time to 0.0019 seconds. Very recently, Islam et al. [22] proposed a deep learning model called spatio-temporal convolution with nested LSTM (STC-NLSTMNet), which reaches 99.88% and 98.20% accuracy on two public datasets.
The literature above shows that several researchers have worked on recognizing human activity using diverse Wi-Fi signal-based approaches in signal processing, ML, and deep learning. One issue that has not yet received enough attention is that the mapping between human activities and Wi-Fi CSI signals differs in sensitivity across multiple antennas, because the Wi-Fi signals received by each antenna can be reflected during transmission. Studying how antenna sensitivity changes in response to different activities in a multi-antenna scenario, including amplitude and phase variations, should help enhance HAR. Motivated by this, we analyzed each antenna's sensitivity to different activities and proposed an adaptive antenna elimination (AAE) algorithm that eliminates non-sensitive antenna data during signal processing.

FIGURE 2. Example of signal visualization for three antennas during the activities running (top) and walking (bottom). Each antenna has 30 subcarriers, shown in different colors.

FIGURE 3. Example of an adaptive antenna elimination (AAE) procedure on a CSI data stream: (a) average values of each antenna's 30 subcarriers; (b) standard deviation values of the signals in (a), computed using a sliding window; (c) elimination of the non-sensitive antenna based on the values in (b).
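The three steps in Figure 3 can be sketched in NumPy as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the CSI amplitude stream is assumed to be an array of shape (time, antennas, subcarriers), the window length and step are hypothetical placeholders, and an antenna's sensitivity is assumed to be the mean of its sliding-window standard deviations (the exact aggregation in the original algorithm may differ).

```python
import numpy as np

def adaptive_antenna_elimination(csi_amp, window=100, step=50):
    """Sketch of AAE: return (kept_antennas, sensitivity_per_antenna).

    csi_amp: real array of shape (T, A, S) holding CSI amplitudes for
             T time samples, A receiving antennas and S subcarriers.
    window, step: hypothetical sliding-window length and hop, in samples.
    """
    T, A, _ = csi_amp.shape
    if T < window:
        raise ValueError("stream shorter than one window")
    # (a) Average the subcarriers of each antenna -> shape (T, A).
    mean_stream = csi_amp.mean(axis=2)
    sensitivity = np.empty(A)
    for a in range(A):
        sigmas = []
        n = 0
        # (b) Standard deviation sigma(w(n)) inside each window w(n).
        while n + window <= T:
            sigmas.append(mean_stream[n:n + window, a].std())
            n += step
        # Assumption: an antenna's sensitivity is its mean windowed std.
        sensitivity[a] = np.mean(sigmas)
    # (c) Eliminate the least sensitive antenna and keep the rest.
    eliminated = int(np.argmin(sensitivity))
    kept = [a for a in range(A) if a != eliminated]
    return kept, sensitivity

# Synthetic example: antenna 0 carries only noise, antennas 1 and 2
# carry a strong / weak motion-like oscillation on all subcarriers.
rng = np.random.default_rng(0)
T, A, S = 1000, 3, 30
csi = 1.0 + 0.01 * rng.standard_normal((T, A, S))
t = np.arange(T)
csi[:, 1, :] += np.sin(t / 20.0)[:, None]
csi[:, 2, :] += 0.3 * np.sin(t / 20.0)[:, None]
kept, sens = adaptive_antenna_elimination(csi)
print(kept)  # [1, 2]
```

Averaging over subcarriers before the windowed standard deviation suppresses independent per-subcarrier noise, so an antenna whose subcarriers fluctuate together in response to movement stands out clearly from one that sees only noise.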

13:     append σ(w(n)) to σ_a
14:     n ← n + step
15: min_a Sensitivity_a; return the rest
19: end function
• If a selection algorithm determines that all three

FIGURE 4. Matrix of the Pearson correlation coefficients (PCC) between each pair of features.
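A PCC matrix like the one in Figure 4 can be computed with `numpy.corrcoef`; passing `rowvar=False` treats each column as one feature. The sketch below also flags feature pairs whose absolute PCC exceeds a threshold; the 0.9 threshold, the helper name `correlated_pairs`, and the synthetic features `f1`-`f3` are illustrative assumptions, not values from this study.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.9):
    """Return the PCC matrix of the columns of X and highly correlated pairs.

    X: array of shape (n_samples, n_features); names: one label per column.
    """
    pcc = np.corrcoef(X, rowvar=False)  # shape (n_features, n_features)
    pairs = []
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):          # upper triangle only
            if abs(pcc[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(pcc[i, j]), 3)))
    return pcc, pairs

# Synthetic data: f2 is almost a linear copy of f1, f3 is independent.
rng = np.random.default_rng(1)
a = rng.standard_normal(200)
X = np.column_stack([a, 2 * a + 0.01 * rng.standard_normal(200),
                     rng.standard_normal(200)])
pcc, pairs = correlated_pairs(X, ["f1", "f2", "f3"])
print(pairs)  # only the (f1, f2) pair is flagged
```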

FIGURE 5. Importance of the extracted features for HAR.

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (13)$$
TABLE 3. AAE-based HAR results on the StanWiFi dataset. To retain more significant digits, the precision, recall, and F1-score values are expressed as percentages; the percent sign is omitted from all statistics.

FIGURE 6. RF classifier's confusion matrix of the ten-fold AAE-based HAR results on the StanWiFi dataset.
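Per-class precision, recall, and the F1-score of Eq. (13) follow directly from a confusion matrix such as the one in Figure 6. A minimal pure-Python sketch, assuming rows are true classes and columns are predicted classes (the convention should be checked against the figure); the 3-class matrix is made up for illustration, and values are printed as percentages to match the tables.

```python
def per_class_metrics(cm):
    """Return (precision, recall, F1) per class from a square confusion matrix.

    cm[i][j] counts samples of true class i predicted as class j.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        actual = sum(cm[c])                           # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results

cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
for c, (p, r, f) in enumerate(per_class_metrics(cm)):
    print(f"class {c}: P={100*p:.2f} R={100*r:.2f} F1={100*f:.2f}")
# class 0: P=88.89 R=80.00 F1=84.21
# class 1: P=81.82 R=90.00 F1=85.71
# class 2: P=100.00 R=100.00 F1=100.00
```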

TABLE 4. AAE-based HAR results on the ''MultiEnvironment'' dataset's LOS (office) scenario. To retain more significant digits, the precision, recall, and F1-score values are expressed as percentages; the percent sign is omitted from all statistics.
TABLE 5. AAE-based HAR results on the ''MultiEnvironment'' dataset's NLOS scenario. To retain more significant digits, the precision, recall, and F1-score values are expressed as percentages; the percent sign is omitted from all statistics.

FIGURE 9. Increase in recognition accuracy of AAE for each activity compared with using all three antennas (red) and using only the most sensitive one (yellow) on all datasets.

TABLE 2. Features applied in this study. x_1, x_2, ..., x_N is a sequence of N samples for feature extraction; a_1 and a_2 denote the two selected antennas.
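The exact feature definitions of Table 2 are not reproduced here. As a hedged sketch, assuming complex CSI samples of shape (N, antennas, subcarriers), amplitude and phase can be obtained with `numpy.abs` and `numpy.angle` (phase unwrapped along time with `numpy.unwrap`), averaged over subcarriers, and summarized by a few statistics over x_1, ..., x_N. The chosen statistics (mean, standard deviation, range, mean absolute deviation), the helper names, and the subcarrier averaging are illustrative assumptions and may not match Table 2; raw CSI phase from commodity NICs usually also needs sanitization, which is omitted.

```python
import numpy as np

def stats(x):
    """A few illustrative statistics of a 1-D sequence x_1..x_N."""
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "range": float(np.max(x) - np.min(x)),
        "mad": float(np.mean(np.abs(x - np.mean(x)))),
    }

def extract_features(csi, kept):
    """csi: complex array (N, A, S); kept: antennas retained by AAE (e.g. a_1, a_2)."""
    feats = {}
    for a in kept:
        # Amplitude and unwrapped phase, averaged over subcarriers -> shape (N,).
        amp = np.abs(csi[:, a, :]).mean(axis=1)
        phase = np.unwrap(np.angle(csi[:, a, :]), axis=0).mean(axis=1)
        for kind, signal in (("amp", amp), ("phase", phase)):
            for name, value in stats(signal).items():
                feats[f"a{a}_{kind}_{name}"] = value
    return feats

rng = np.random.default_rng(0)
csi = rng.standard_normal((500, 3, 30)) + 1j * rng.standard_normal((500, 3, 30))
features = extract_features(csi, kept=[0, 2])
print(len(features))  # 2 antennas x 2 signals x 4 statistics = 16
```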

TABLE 6. Summary of our proposed method's results compared with other state-of-the-art published results.

TABLE 7. Experimental results of using 1-3 receiving antennas with the RF classifier.