Wi-Motion: A Robust Human Activity Recognition Using WiFi Signals

Recent research has shown that human motions and positions can be recognized through WiFi signals. The key intuition is that different motions and positions introduce different multipath distortions in WiFi signals and generate different patterns in the time series of channel state information (CSI). In this paper, we propose Wi-Motion, a WiFi-based human activity recognition system. Unlike existing systems, Wi-Motion jointly leverages the amplitude and phase information extracted from the CSI sequence. We first construct classifiers using amplitude and phase, respectively. The outputs of the classifiers are then combined by a posterior probability-based combination strategy. As the simulation results show, Wi-Motion can recognize 5 predefined typical human activities with a mean accuracy of 96.6% in a line-of-sight (LOS) environment and 92% in a non-line-of-sight (NLOS) environment. Furthermore, we evaluate the effect of the age of the experimental subjects and of relatively complex environments.


I. Introduction
As experiments show, the machine-centric computing model is shifting toward a people-centric computing model [1], [2], where it is critical to precisely sense and recognize human activities. Conventional methods for recognizing human activities can be categorized into three groups: vision-based approaches, low-cost radar-based approaches and wearable sensor-based approaches. However, all of these conventional approaches have limitations. Vision-based approaches are susceptible to lighting conditions and obstacles. In addition, cameras have blind spots, so perception is limited to a certain line-of-sight range, and they may breach privacy. Low-cost radar-based systems have limited operation ranges of just tens of centimeters. Wearable sensor-based solutions can achieve fine-grained behavioral awareness, but their high cost and real-time restrictions make them impractical in some applications (e.g., rescue applications).
In recent years, with the wide deployment of WiFi hotspots and the rapid development of WiFi-based indoor sensing technology, several WiFi-based approaches have been proposed to recognize human activity. We can distinguish different human activities by detecting and analyzing different multipath distortions in WiFi signals.

This research is supported in part by the National Natural Science Foundation of China (NSFC) under Grant 61702011, and in part by the National Science Foundation of USA under Grant CNS-1526638. Corresponding author: Xin He.
H. Du, J. Qian and P-J. Wan are with the Department of Computer Science, Illinois Institute of Technology, Chicago, IL 60616 USA (email: {hdu4, jqian15}@hawk.iit.edu, wan@cs.iit.edu).

In this paper, we propose a WiFi-based human activity recognition system, namely Wi-Motion, using the open source dataset built by Guo et al. [4] with an IWL5300 wireless network card, and choose the six most common human activities in daily home life, as shown in Fig. 1. To summarize, the contributions of this paper are as follows:
• Unlike most human body recognition systems, which only process amplitude information, we additionally extract the phase information in the CSI sequence and mathematically eliminate its random offset. We then leverage several signal processing methods to obtain a high-quality dataset.
• We use different methods to extract the features of the phase and amplitude information separately, and design a different classifier for each. For the phase feature, we choose an appropriate kernel function to build the SVM classifier after many experiments. For the amplitude feature, we design a classification algorithm that combines the DTW algorithm with the SVM model: DTW relates amplitude feature vectors of different dimensions, which lets us construct a new SVM kernel function.
• After obtaining the recognition results of each classifier, we combine the prediction results based on the output posterior probabilities of the two classifiers. To verify the effectiveness of the combination algorithm, we design a large number of comparative experiments. According to the experimental results, the proposed solution clearly increases recognition accuracy.
In the rest of this paper, we present the related work in Section 2. We then elaborate the design details and classification algorithm of Wi-Motion in Sections 3 and 4. We present the implementation and evaluation in Section 5 and conclude our work in Section 6.

A. Hardware-based Methods
WiSee [5] uses USRP as the wireless device and communicates on a 10 MHz channel at 5 GHz. This implementation can recognize nine actions by extracting the Doppler shift of human motion from the WiFi signal as a feature, with an accuracy of 94%. Adib et al. designed WiTrack and WiTrack2.0, which apply a specially designed carrier wave radio to track human movements behind a wall [6].

B. RSS-based Methods
As early as 2000, Bahl et al. proposed Radar [7], a system for indoor localization based on received signal strength (RSS). This was the first time WiFi signals were used for perception.

C. CSI-based Methods
Compared with RSS, CSI provides not only fine-grained channel status information but also information about small-scale fading and multipath effects caused by micro-movements. WiHear [11] further interpreted the transmitted signal by directing it to the human mouth and analyzing the changes in the mouth shape from the reflected signal. The pronunciation was represented by the mouth shape, thereby implementing WiFi-based recognition system called WiSign [15]. Different from other systems, WiSign uses 3 WiFi devices to improve the recognition performance. These works specialize in specific application scenarios, which do not include continuous text input using CSI characteristics. Inspired by the above-mentioned works, we propose a WiFi-based framework to improve human activity recognition.

A. System Structure
In this section, we elaborate the design of Wi-Motion. Wi-Motion is a wireless system that enables commercial WiFi devices to identify people's activity using Orthogonal Frequency Division Multiplexing (OFDM) technology. The system flow of Wi-Motion is illustrated in Fig. 2. First, a signal containing human activity information is acquired from a specific receiving device. Second, the collected signal is separated into amplitude and phase information, each of which is preprocessed (e.g., by filtering and linear transformation) to reduce noise and obtain useful information. Since each CSI measurement contains 30 subcarriers, the data has many dimensions, which increases the complexity of the system. Additionally, some subcarriers are more sensitive to human motion than others. Utilizing all the subcarriers is therefore unwise: the intrinsic noise on some subcarriers can be severe enough to conceal the meaningful information about motions when those subcarriers are sensitive to noise but insensitive to human motion. Therefore, it is essential to reduce the dimensionality of the filtered data.
After dimensionality reduction, we extract useful features from the processed amplitude and phase information, respectively. Since the CSI waveforms of different activities differ in some features, we can extract suitable features from both the amplitude and the phase information that represent the relationship between the CSI time series and different human activities, and use them as the basis for classification. In the classification stage, we randomly select part of the feature vectors and use the SVM algorithm to build two classifiers, leveraging amplitude and phase information, respectively. When an unknown activity arrives, Wi-Motion merges the prediction results of the two classifiers based on posterior probability to produce the final recognition.

B. Phase Information Preprocessing
1) Phase Analysis: As discussed in Section 1, CSI measurements provide the phase information of each subcarrier. The measured phase $\hat{\phi}_i$ of the $i$-th subcarrier can be expressed as:

$$\hat{\phi}_i = \phi_i - 2\pi \frac{k_i}{N}\delta + \beta + Z \tag{4}$$

where $\phi_i$ denotes the true phase, $\delta$ is the timing offset at the receiver, which causes the phase error expressed as the term $2\pi \frac{k_i}{N}\delta$, $\beta$ is an unknown phase offset, and $Z$ is measurement noise. $k_i$ denotes the subcarrier index (ranging from −28 to 28 in IEEE 802.11n) of the $i$-th subcarrier and $N$ is the FFT size (64 in IEEE 802.11 a/g/n). Due to the unknowns listed above, it is impracticable to obtain the true phase shifts with commercial WiFi devices alone.
October 24, 2018 DRAFT
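2) Phase Calibration: To mitigate the effects of random noise, we apply a linear transformation to the raw phases, as recommended in [16]. The key idea is to remove $\delta$ and $\beta$ by considering the phase across the entire frequency band. First, we define two intermediate terms $a$ and $b$ over the $n$ subcarriers as follows:

$$a = \frac{\hat{\phi}_n - \hat{\phi}_1}{k_n - k_1} = \frac{\phi_n - \phi_1}{k_n - k_1} - \frac{2\pi}{N}\delta, \qquad b = \frac{1}{n}\sum_{j=1}^{n}\hat{\phi}_j = \frac{1}{n}\sum_{j=1}^{n}\phi_j - \frac{2\pi\delta}{nN}\sum_{j=1}^{n}k_j + \beta \tag{5}$$

Subtracting the linear term $a k_i + b$ from the raw phase $\hat{\phi}_i$ in Equation 4, and taking the subcarrier indices to be symmetric so that $\sum_{j=1}^{n} k_j = 0$, we get a linear combination of true phases, denoted as $\tilde{\phi}_i$, from which the random phase offsets have been eliminated (omitting the small measurement noise $Z$):

$$\tilde{\phi}_i = \hat{\phi}_i - a k_i - b = \phi_i - \frac{\phi_n - \phi_1}{k_n - k_1}k_i - \frac{1}{n}\sum_{j=1}^{n}\phi_j \tag{6}$$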
Although Equation 6 can be used to calibrate the phase information, the raw phase is folded because phase is periodic, which requires us to map the raw phase back to its true value. Fig. 3 shows the raw phase values of CSI for the three antennas at the receiver. We can clearly see that the raw phase of each of the three antennas folds as the subcarrier index increases, and that the range of the phase is [−π, π]. To obtain the true phase, the folded phase can be recovered by subtracting multiples of 2π.

Algorithm 1 Phase Calibration
Input: raw phase MP = $\hat{\phi}_i$ of 30 subcarriers;
Output: transformed phase CP = $\tilde{\phi}_i$ of 30 subcarriers;
1: Set TP as a vector of the same size as MP;
2: Set k as a vector from −28 to 28;
...
8: end if
9: TP_i = MP_i − diff·2π;
10: end for
11: Compute a = (TP_30 − TP_1) / (k_30 − k_1);
12: Compute b = (1/30) Σ_{j=1}^{30} TP_j;
13: for i = 1 to 30 do
14: CP_i = TP_i − a·k_i − b;
15: end for

Thus, we apply the phase calibration algorithm proposed by Wang et al. [17], shown in Algorithm 1. Fig. 4 shows the transformed phase values for three different antennas. Note that the range of the transformed phase becomes much smaller than that of the raw phase for all three antennas. Fig. 5 compares the unprocessed raw phase and the transformed phase of the first subcarrier of a squat sample. As can be seen, the phase without calibration is distributed almost randomly, but after calibration it behaves relatively stably, as expected.
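The calibration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `calibrate_phase` is hypothetical, and the fold-detection rule (a jump larger than π between neighbouring subcarriers counts as one fold) is an assumption. The linear-term removal follows the definitions of a and b given in the text.

```python
import math


def calibrate_phase(raw_phase, subcarrier_index):
    """Unwrap folded phases, then subtract the linear term a*k + b."""
    n = len(raw_phase)

    # Step 1: unwrap. Assumed rule: a jump of more than pi between
    # neighbouring subcarriers is a fold and is undone with a 2*pi shift.
    unwrapped = [raw_phase[0]]
    offset = 0.0
    for i in range(1, n):
        jump = raw_phase[i] - raw_phase[i - 1]
        if jump > math.pi:
            offset -= 2 * math.pi
        elif jump < -math.pi:
            offset += 2 * math.pi
        unwrapped.append(raw_phase[i] + offset)

    # Step 2: slope a across the whole band and mean offset b.
    a = (unwrapped[-1] - unwrapped[0]) / (subcarrier_index[-1] - subcarrier_index[0])
    b = sum(unwrapped) / n

    # Step 3: remove the linear term from every subcarrier.
    return [p - a * k - b for p, k in zip(unwrapped, subcarrier_index)]
```

With a flat true phase and symmetric subcarrier indices, the timing offset δ and the constant offset β cancel completely, so the calibrated phase is zero on every subcarrier. This makes a convenient sanity check for the sketch.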
C. Amplitude Information Preprocessing
1) Noise Removal Algorithm: The raw amplitude waveform that we separate from raw CSI measurements is usually not reliable enough for feature extraction because of noise, which can come from environmental changes, radio signal interference, etc. In our system, we apply the weighted moving average (WMA) method to the raw amplitude waveform. Let {AMP_{1,1}, ..., AMP_{t,1}} denote the amplitude sequence of the first subcarrier over the time period t. The filtered amplitude series is expressed as follows:

$$\mathrm{AMP\_NEW}_{t,1} = \frac{m \cdot \mathrm{AMP}_{t,1} + (m-1) \cdot \mathrm{AMP}_{t-1,1} + \cdots + 1 \cdot \mathrm{AMP}_{t-m+1,1}}{m + (m-1) + \cdots + 1}$$

where AMP_NEW indicates the averaged new amplitude, and the value of m determines how strongly the current value depends on historical records. In this paper, we set m = 10. Fig. 6 shows the original waveform of the first subcarrier of a squat sample and the waveform after WMA filtering. The comparison shows that WMA filtering removes most of the noise, which makes the waveform smoother.
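A minimal sketch of the WMA filter is given below. It assumes the common form of the weighted moving average, in which the current sample has weight m, the previous one m − 1, and so on down to 1. The function name `weighted_moving_average` and the handling of the first m − 1 samples (left unfiltered) are assumptions made for illustration.

```python
def weighted_moving_average(amplitude, m=10):
    """Smooth one subcarrier's amplitude series with linearly decaying weights."""
    weight_sum = m * (m + 1) / 2  # m + (m - 1) + ... + 1
    smoothed = []
    for t in range(len(amplitude)):
        if t < m - 1:
            # Assumption: not enough history yet, keep the raw sample.
            smoothed.append(amplitude[t])
            continue
        window = amplitude[t - m + 1 : t + 1]  # oldest ... current
        # Weight 1 for the oldest sample, m for the current one.
        total = sum(w * x for w, x in zip(range(1, m + 1), window))
        smoothed.append(total / weight_sum)
    return smoothed
```

A larger m makes the output smoother, at the cost of a longer lag behind fast changes in the waveform.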
2) Dimensionality Reduction: The IWL5300 provides 802.11n channel state information in a format that reports the channel matrices for 30 subcarrier groups. At each subcarrier, the fine-grained CSI describes how a signal propagates from the transmitter to the receiver under the combined influence of, for example, scattering, fading, and power decay with distance. After noise removal, we obtain a relatively accurate amplitude matrix for each activity sample. However, using all subcarriers in the following operations would increase the complexity of the system. Moreover, as shown in Fig. 7, different subcarriers clearly have different sensitivities to the same activity. When the data is affected by noise, subcarriers that are very sensitive to noise but show very low sensitivity to human activity will unpredictably hinder the subsequent processing. Therefore, it is important to reduce the data dimension and eliminate these non-significant subcarriers. In this paper, we leverage the PCA algorithm to reduce the dimensions of the CSI sequence and eliminate redundant information remaining in the data sequence. Based on our experimental results, we choose the first principal component waveform for subsequent operations, as shown in Fig. 8.
Based on classification performance, the Daubechies D1 wavelet family is selected. Then, the approximation coefficients of the last layer are taken, and the normalized coefficient sequence is used as the feature vector. After feature extraction, the contour information of the amplitude waveform is preserved in the feature vector, and the noise is suppressed because the detail coefficients are discarded. The complete binary tree of the DWT process is shown in Fig. 9.
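The PCA step can be illustrated with NumPy as follows. This is a sketch rather than the authors' code: the function name `first_principal_component` and the input layout (one row per packet, one column per subcarrier) are assumptions. The principal direction is taken from the SVD of the mean-centered matrix, which is one standard way to compute PCA.

```python
import numpy as np


def first_principal_component(csi_amplitude):
    """Project a (packets x subcarriers) amplitude matrix onto its first PC."""
    x = np.asarray(csi_amplitude, dtype=float)
    centered = x - x.mean(axis=0)  # remove the per-subcarrier mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]  # one value per packet
```

The sign of a principal component is arbitrary, so the returned waveform may be flipped relative to the underlying motion pattern.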

IV. Classification
2) Phase Feature Extraction: The phase information matrix also contains the phase information of 30 subcarriers. In this paper, we use singular value decomposition (SVD) to simplify the CSI phase difference matrix between the first and the second receive antenna. SVD has a clear physical meaning: it represents a complex matrix as the product of smaller and simpler sub-matrices, which describe the important characteristics of the raw matrix. Based on our experimental results, the top 5 singular values of the matrix are the most useful for classification.
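As an illustration, the phase feature could be computed as below. The function name `phase_svd_features` and the matrix layout (packets by subcarriers, one matrix per antenna) are assumptions. `np.linalg.svd` with `compute_uv=False` returns the singular values in descending order.

```python
import numpy as np


def phase_svd_features(phase_ant1, phase_ant2, top_k=5):
    """Top-k singular values of the phase difference between two antennas."""
    diff = np.asarray(phase_ant1, dtype=float) - np.asarray(phase_ant2, dtype=float)
    singular_values = np.linalg.svd(diff, compute_uv=False)  # descending order
    return singular_values[:top_k]
```

Because every sample yields exactly `top_k` values, the phase feature vectors all have the same length, regardless of how many packets a sample contains.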

B. Classifier Training
We select a highly effective SVM classifier to recognize six activities, based on its performance in existing works. The choice of kernel function plays a key role in the performance of a classical SVM. One example is the Gaussian kernel function, which is simple in form and widely used:

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i \rVert^2}{2\sigma^2}\right)$$

where the vector $x_i$ represents the center of the kernel function, $\lVert x - x_i \rVert^2$ represents the squared Euclidean distance from any vector $x$ to the center of the kernel function, and $\sigma$ is the width parameter of the kernel.
For our extracted amplitude features, we find that the feature vectors of human activities may not share the same length, so the traditional SVM algorithm, which requires feature vectors of consistent dimension, cannot be applied to classify them. In this situation, we use dynamic time warping (DTW) to calculate the distances among feature vectors. In contrast to the Euclidean distance, DTW offers an intuitive distance between two waveforms and is resilient to signal distortion or shift. The DTW distance is the Euclidean distance along the optimal warping path between two waveforms, calculated under boundary conditions and local path constraints [18].
Finally, we classify our amplitude feature vectors using the support vector machine with the kernel function defined in Equation 9. For our extracted phase features, the problem of inconsistent dimensions does not arise, because we keep the same number of singular values for every sample. We tested various SVM kernel functions, such as linear, Gaussian, and polynomial kernels. Based on classification performance, we choose the Gaussian kernel, which performs much better than the other kernels, as the final kernel function of the phase SVM model.

C. Prediction
In our experiments, we collect the CSI waveforms of six different activities ("bend", "hand clap", "walk", "phone call", "squat" and "sit down") that are defined in WiAR to test our two classifiers. The results are shown in Fig. 10. We find that some activities, such as "hand clap", can be classified perfectly by the kernel-based SVM model with the phase feature. However, for other activities such as "bend", the classification performance of the phase classifier is not satisfactory. The same situation occurs with the amplitude classifier, where "walk" and "squat" are classified very well, but other activities such as "hand clap" are not well recognized. In response, we propose to properly combine the prediction results of the two classifiers. Traditional result combination algorithms, such as boosting and multiple decision methods, face higher complexity and limitations, and target cases with more than three classifiers. Our experiment has only two classifiers, so the traditional combination algorithms are not suitable for the challenge we face. In WiSign [15], Shang

In the sigmoid mapping $P(y = 1 \mid f(x)) = 1 / (1 + \exp(A f(x) + B))$, $P(y = 1 \mid f(x))$ indicates the probability that a sample with standard SVM output value $f(x)$ belongs to the target class. $A$ and $B$ are parameters to be optimized, which can be obtained by maximum likelihood estimation on the training set. That is, the target model can be expressed by the following formula:

$$\min_{A,B} \; -\sum_{i}\left[t_i \log p_i + (1 - t_i)\log(1 - p_i)\right], \qquad p_i = \frac{1}{1 + \exp(A f(x_i) + B)}, \qquad t_i = \begin{cases}\dfrac{n_+ + 1}{n_+ + 2}, & y_i = +1\\[4pt] \dfrac{1}{n_- + 2}, & y_i = -1\end{cases}$$

where $(x_i, y_i)$ represents a training sample, and $n_+$ and $n_-$ indicate the numbers of training samples whose category is +1 and −1, respectively. Wi-Motion is a six-class task. In our experiments, we extend the two-class probabilistic SVM to the multi-class case in a one-versus-one manner, where we need to synthesize 6×(6−1)/2 = 15 results for each classifier (amplitude and phase).
After a test sample arrives, the two classifiers each predict it and generate a posterior probability vector.
Wi-Motion then adds the two vectors with equal weight and gives the final prediction. For example, assume the prediction vectors reported by the two classifiers are (0.1, 0.2, 0.13, 0.78, 0.9, 0.27) and (0.12, 0.34, 0.2, 0.87, 0.14, 0.24). We can see that the first classifier cannot distinguish the fourth and the fifth activity. If we always chose the result with the highest confidence on a single classifier, the prediction would be wrong. But if we combine these two prediction vectors, we get (0.22, 0.54, 0.33, 1.65, 1.04, 0.51). Based on the final combined prediction vector, we obtain the correct prediction (the fourth activity).
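The worked example above can be reproduced with a short Python sketch. The function name `combine_predictions` is hypothetical; the equal-weight sum and the two vectors are taken directly from the example in the text.

```python
def combine_predictions(amplitude_probs, phase_probs):
    """Add two posterior vectors with equal weight and pick the best class."""
    combined = [a + p for a, p in zip(amplitude_probs, phase_probs)]
    best = max(range(len(combined)), key=combined.__getitem__)  # 0-based index
    return combined, best


first = [0.1, 0.2, 0.13, 0.78, 0.9, 0.27]
second = [0.12, 0.34, 0.2, 0.87, 0.14, 0.24]
combined, best = combine_predictions(first, second)
print([round(c, 2) for c in combined], "-> activity", best + 1)
```

Taken alone, the first vector would select the fifth activity (0.9). The combined vector instead selects the fourth, which is the outcome the example describes.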

A. Activity Dataset
We select the six most common human activities in the home environment from the dataset constructed by Guo et al. in WiAR [4], as shown in Fig. 1. In WiAR, they use a commercial TP-Link wireless router as the transmitter, operating in IEEE 802.11n AP mode at 2.4 GHz. A ThinkPad 400 laptop with three antennas running Ubuntu 10.04 is used as the receiver, equipped with an off-the-shelf Intel 5300 card and a modified firmware. While receiving WiFi signals, the receiver pings the router at 30 packets/s and records the CSI from each packet.

B. Activity Recognition Accuracy
Fig. 11 shows the mean prediction accuracies of our prediction combination model and of the classification model of each classifier for volunteer 1. We can see that our system improves recognition performance for all supported activities. "Bend" is classified unsatisfactorily by both the amplitude and the phase classifier. However, after merging the results of the two classifiers, our system achieves a prediction accuracy of 97%. For "Hand Clap", both the phase classifier and our system achieve a prediction accuracy of 100%, while the amplitude classifier only reaches 92%. For "Sit Down", the prediction accuracies are 95% and 88% for the amplitude and phase classifiers, while our system achieves a better result of 98%. The experimental results show that we can obtain more accurate activity estimation by combining the output posterior probabilities of the two classifiers.

C. Different Number of Training Samples
Wi-Motion does not require a large number of samples, but our experiments show that the number of training samples has a certain impact on the recognition accuracy. Fig. 12 shows our experimental results. As the number of training samples increases, the classification accuracy of the amplitude and phase classifiers increases to a certain degree: more training samples cover richer scenes, so the hyperplane position of the support vector machine becomes more accurate. The improvement of our combination system is small, because its average recognition rate is already at a very high level.
However, the more training samples are used, the longer it takes to train the SVM, which inevitably increases system complexity. Therefore, we set the number of training samples to 110 in our experiments, which minimizes the training time while ensuring accuracy.

D. Different Volunteers
In our experiments, we found that even for the same activity, the range and speed of motion may differ, since different people tend to have different habits. Thus, to make sure our system works for different users, we add a new experiment to evaluate the influence of different volunteers. We use the trained classification model to evaluate the data collected from the other two volunteers, and the results are shown in Fig. 13. The prediction performance is still good even though we do not retrain the model parameters for these two new volunteers. Although the mean prediction accuracies decrease by about 3% and 5%, respectively, the recognition accuracy remains at a very high level. We expect that retraining the model parameters for these two new volunteers would noticeably increase the average recognition accuracy of our system.

E. False Positive and True Positive
To test the recognition performance of Wi-Motion, we further explore the false positive and true positive rates of each supported activity. We use the same dataset as in Section 5.1, and the evaluation results are illustrated in Fig. 14 and Fig. 15. The false positive rate of prediction is reduced to about 0.16% by combining the two classifiers in most cases. Moreover, the true positive rate of the combined model reaches 98.5%, which is significantly higher than that of either separate classifier.
Therefore, the proposed combination reduces recognition mistakes and facilitates activity recognition.

VI. Conclusion
In this paper, we propose a WiFi-based indoor activity recognition system called Wi-Motion. Compared to existing related systems, we use both amplitude and phase information to construct classifiers in our system to improve the recognition performance. Moreover, to enhance the robustness of Wi-Motion, the final recognition result of our system is determined by combining the prediction results of all classifiers based on output posterior probability, rather than simply taking the result of a single classifier. Experimental results show that our system achieves a mean false positive rate of 0.16% and a mean true positive rate of 98.5%, and improves the recognition accuracy to 98.4% compared with the original implementation that uses only one classifier constructed with amplitude or phase information.
Here, $a$ and $b$ indicate the slope of the phase and the offset across the entire frequency band, respectively. If the subcarrier frequency is symmetric, which means $\sum_{j=1}^{n} k_j = 0$, $b$ can be expressed as $b = \frac{1}{n}\sum_{j=1}^{n}\phi_j + \beta$.

A. Feature Extraction
1) Amplitude Feature Extraction: The discrete wavelet transform (DWT) can analyze signals on multiple frequency scales and has a better extraction ability for local features. Considering that different parts of the body move at different speeds, direct extraction would lose a lot of detail. Through the wavelet transform, the wavelet coefficients of each frequency band are obtained. The first principal component obtained from the amplitude waveform after PCA processing maintains most features of the raw amplitude waveform, so the features corresponding to each frequency band can be extracted from it. First, we perform DWT on the extracted amplitude waveform based on the first-order Daubechies wavelet, with 3 decomposition layers. In Wi-Motion, several wavelet families have been tested, such as Daubechies, Coiflets, and Symlets.
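The amplitude feature extraction can be sketched in plain Python, using the fact that the first-order Daubechies wavelet (db1) is the Haar wavelet. The function names, the padding of odd-length signals by repeating the last sample, and the min-max normalization are assumptions for illustration; the text does not specify the boundary handling or the normalization used.

```python
import math


def haar_approximation(signal, levels=3):
    """Approximation coefficients after `levels` rounds of Haar (db1) DWT."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # assumption: pad odd lengths
        # Low-pass branch: scaled pairwise sums. Detail coefficients are dropped.
        approx = [
            (approx[i] + approx[i + 1]) / math.sqrt(2)
            for i in range(0, len(approx), 2)
        ]
    return approx


def dwt_feature(signal, levels=3):
    """Min-max normalized last-layer approximation, used as the feature vector."""
    approx = haar_approximation(signal, levels)
    lo, hi = min(approx), max(approx)
    if hi == lo:
        return [0.0] * len(approx)
    return [(c - lo) / (hi - lo) for c in approx]
```

Each level halves the length of the sequence, so waveforms of different durations produce feature vectors of different lengths.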
The aim of DTW is to compare two time-dependent series X = (x1, x2, ..., xn) of length n ∈ N+ and Y = (y1, y2, ..., ym) of length m ∈ N+. These sequences can be discrete signals (time series) or, more generally, feature sequences sampled at equidistant points in time. Therefore, we use the DTW distance to replace the Euclidean distance in the Gaussian kernel function to construct a new kernel function.
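A minimal sketch of a DTW-based Gaussian kernel follows. The function names, the absolute-difference local cost, the unconstrained warping path, and the parameter `sigma` are assumptions; the paper's exact local path constraints and kernel parameterization are not reproduced here.

```python
import math


def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW between two 1-D sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # boundary condition: both paths start together
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Step pattern: insertion, deletion, or match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def dtw_gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel with the Euclidean distance replaced by DTW."""
    return math.exp(-dtw_distance(x, y) ** 2 / (2 * sigma ** 2))
```

A kernel built this way accepts sequences of different lengths. It is not guaranteed to be positive semi-definite in general, so in practice it would typically be supplied to the SVM as a precomputed Gram matrix and validated empirically.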

et al. propose a weighted voting on two laptops to get the final prediction result, where they combine the two prediction vectors of the classifiers on two laptops instead of choosing the result with the highest confidence on one laptop. In the context of Wi-Motion, we want to combine the results of two classifiers. Given the similarity of the two problems, we propose a combination strategy based on the output posterior probabilities of the two classifiers, where the classification result of each classifier is given in the form of a posterior probability that represents the membership of the sample in each category. According to the method proposed by Platt [19] for a simple two-class problem, the SVM standard output value is mapped to [0, 1] using the sigmoid function to obtain the SVM posterior probability:

$$P(y = 1 \mid f(x)) = \frac{1}{1 + \exp(A f(x) + B)}$$
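The sigmoid mapping and its fitting objective can be sketched as follows. The function names are hypothetical, and the smoothed targets (n+ + 1)/(n+ + 2) and 1/(n− + 2) come from Platt's original formulation, which is assumed here to be the variant used. The sketch only evaluates the objective; an optimizer for A and B is left out.

```python
import math


def platt_probability(f, a, b):
    """Map an SVM decision value f to P(y = 1 | f) with the sigmoid."""
    return 1.0 / (1.0 + math.exp(a * f + b))


def platt_targets(labels):
    """Smoothed targets from Platt's formulation (labels are +1 or -1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    t_pos = (n_pos + 1) / (n_pos + 2)
    t_neg = 1 / (n_neg + 2)
    return [t_pos if y == 1 else t_neg for y in labels]


def platt_loss(scores, labels, a, b):
    """Negative log-likelihood that is minimized over A and B."""
    loss = 0.0
    for f, t in zip(scores, platt_targets(labels)):
        p = platt_probability(f, a, b)
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return loss
```

Because the mapping uses exp(A f(x) + B), a fitted A is normally negative, so that larger decision values give probabilities closer to 1.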