A Survey on Human Behavior Recognition Using Channel State Information

Recently, device-free human behavior recognition has become a hot research topic and has achieved significant progress in the field of ubiquitous computing. Among the various implementations, behavior recognition based on WiFi channel state information (CSI) has drawn wide attention due to its major advantages. This paper investigates more than 100 recent CSI-based behavior recognition applications from the last 6 years and presents a comprehensive survey covering the main aspects of human behavior recognition. First, this paper reviews general behavior recognition applications using the WiFi signal and presents the basic concept of CSI and the fundamental principle of CSI-based behavior recognition. It then analyzes the key components and core characteristics of the system architecture of human behavior recognition using CSI. Afterward, we divide the sensing procedure into several steps and summarize the typical studies for each step, including base signal selection, signal preprocessing, and identification approaches. Next, based on the recognition technique, we classify the applications into three groups: pattern-based, model-based, and deep learning-based approaches. Within every group, we categorize the state-of-the-art applications into three subgroups: coarse-grained specific behavior recognition, fine-grained specific behavior recognition, and activity inference. We elaborate on the typical behavior recognition applications from five aspects: experimental equipment, experimental scenario, behavior, classifier, and system performance. Then, this paper presents comprehensive discussions about representative applications from the implementation view and outlines the major considerations when developing a recognition system. Finally, this article concludes by analyzing the open issues of CSI-based behavior recognition applications and pointing out future research directions.


I. INTRODUCTION
Recent years have witnessed the rapid development of the Internet of Things (IoT). The demand for pervasive computing is increasing dramatically because IoT provides useful information about monitored targets and facilitates the development of ubiquitous computing applications. IoT technology can be employed in environment monitoring [1], smart cities [2], and industrial fields. Besides, this technology has also spread into our daily lives and can be used to sense human behavior [3], such as health monitoring [4], behavior understanding [5], user profile construction [6], activity inference [7], smart home control [8], human localization [9], human position tracking [10], [11], and occupancy detection [12]. These applications of human behavior sensing greatly enrich the research content and provide novel human-computer interaction methods. Traditional human sensing techniques usually require the user to wear or carry sensors, which restricts their application environments and inconveniences users [13]. With the widespread deployment of WiFi devices in indoor environments, the device-free WiFi-based behavior recognition technique has drawn more attention because it overcomes some shortcomings of common behavior recognition systems, including deployment cost, privacy violation, and limitations on applicable environments. In other words, this approach can be widely deployed in various environments and can work over Non-Line of Sight (NLOS) paths, even through walls.
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wang.
Currently, WiFi-based activity recognition techniques usually rely on two types of wireless signal: RSS (received signal strength) and CSI (channel state information). RSS provides coarse-grained information about communication links, while CSI describes fine-grained information about the communication channel state [14]. RSS can be measured easily with most wireless devices because RSS collection is supported by almost all wireless chips. As for CSI, we have to modify the NIC driver to measure CSI using commercial off-the-shelf (COTS) devices, including the Intel 5300 NIC [15], Atheros AR9382 [16], Atheros AR9462 and AR9480 [17], Atheros 9580 [18], [19], and Atheros 9390 [20], [21]. Alternatively, we can measure CSI using a special device such as SDR [22] or WARP [23], [24]. At present, there are a great many state-of-the-art WiFi-based recognition applications and studies. This part reviews some typical recognition applications based on RSS or CSI.
RSS is a measurement of the power of the radio signal at the receiving end. In other words, it describes the average power of the received signal. Currently, RSS-based approaches have been extensively used in the field of identification and have achieved great recognition performance. In this section, we introduce some typical RSS-based recognition applications, such as object tracking and localization [25]-[28], activity monitoring or recognition [29], hand gesture recognition [30], driving behavior identification [31], and crowd counting [32].
Youssef et al. [25] first propose the concept of device-free passive (DfP) localization with RSS and present a passive radio map construction and tracking application. In addition, reference [25] paints a promising picture of passive device-free localization and tracking. Wilson and Patwari [26] propose an approach for tracking and localizing a target behind a wall. It adopts a statistical model to describe how the signal changes, using the variance of RSS caused by user motion. Xiang et al. [27] propose an approach to enhance the precision of traditional RSS-based indoor localization by using light, temperature, and humidity features. Wilson and Patwari [28] propose radio tomographic imaging (RTI) technology for imaging the attenuation. It utilizes a linear model of the RSS to acquire images of moving objects and locates a person in the area of RF coverage by imaging the attenuation. Wang et al. [29] propose a device-free localization and activity recognition (DFLAR) system using deep learning. This system adopts 3-layer networks to extract features. Compared with traditional handcrafted feature extraction approaches, the deep learning-based method improves identification precision by 10%. Abudulaziz et al. propose Wisture [30], an RSS-based hand gesture recognition system on the smartphone. This system can recognize contactless dynamic hand gestures (e.g., swipe, push, and pull) using Long Short-Term Memory (LSTM) and a Recurrent Neural Network (RNN), and it achieves 94% accuracy in gesture detection and classification. Depatla et al. propose an RSS-based crowd counting system for through-the-wall scenarios [32]. This system tests 20 people in 5 different through-the-wall scenarios (e.g., concrete, plaster, and wood) and achieves an error of two people.
CSI can be expressed as a complex matrix, each entry of which records the amplitude and phase response of the signal transmission channel for one subcarrier. We can compute the amplitude and phase of the received signal of each channel from the complex number and evaluate the channel quality using each entry of the matrix. Therefore, the amplitude of the CSI signal quantifies the signal power attenuation after the multi-path effect, similar to the received signal strength of the wireless signal. Currently, there are many CSI-based recognition applications, such as indoor human behavior recognition, indoor localization [33]-[41], indoor object state detection [42], fire detection [43], traffic monitoring [44], wheat moisture detection [45], object distinction [46], in-baggage suspicious object detection [47], school violence monitoring [48], and other identification applications in driving scenarios (e.g., risky driving behavior detection [49], [50], driver fatigue detection [51], driver's distracted behavior detection [52], and in-vehicle behavior and hand gesture recognition [53], [54]).
This article concentrates on human behavior recognition with CSI. We present the typical human behaviors considered in this paper to clarify the research area. Based on the purpose of behavior recognition, we divide CSI-based indoor human behavior recognition applications into two groups: specific behavior recognition and activity inference. Specific behavior recognition refers to identifying simple human behaviors that we regard as distinct actions, such as daily behaviors (e.g., walking, running, sitting, cooking, eating, lying, standing, brushing, opening and closing a door), abnormal behaviors (e.g., slipping, falling), hand gestures (e.g., wave, push, pull, sweep, clap, slide, draw a circle, draw a zigzag, sign languages, digit finger gestures), and exercises (e.g., table tennis actions, bodyweight exercises). In contrast, activity inference usually refers to determining the meaning of actions. It aims to reveal the real purpose of the activities rather than the specific behaviors, as in smoking activity detection, crowd counting, and user authentication.
In general, according to identification techniques, CSI-based recognition applications can be divided into two types: pattern-based applications and model-based applications. The former recognizes human behavior by leveraging pattern recognition methods, which usually include machine learning algorithms. In addition, the pattern-based approach usually requires more data to fit or train its parameters. In contrast, the latter recognizes behavior by utilizing a mathematical or physical model that describes the unique relationship between CSI signal variation and human behaviors. Thereby, the key to model-based methods is how to extract effective signal features to express human behavior and how to build a model that measures the signal changes caused by human activity. The model-based approach determines human activities from the model without the need for a lot of data, which is its striking advantage over the pattern-based method. Recently, deep learning technology has been widely applied in various fields due to characteristics that set it apart from traditional pattern-based methods. Specifically, it requires a great amount of experimental data to train neural network parameters because a network usually contains tens of thousands to several million unknown parameters. Besides, the deep learning-based approach has no explicit feature extraction procedure because this step is usually integrated into the network structure. In other words, the deep learning-based method can automatically extract or build complex features in its middle layers, which is an apparent difference from pattern-based methods. Although deep learning-based methods are usually considered a kind of pattern-based method, based on the above analysis, we deem that deep learning applications should be classified as a separate group because of their distinct differences from pattern-based applications. Consequently, this paper categorizes state-of-the-art applications into three groups: model-based, pattern-based, and deep learning-based approaches.

A. PATTERN-BASED BEHAVIOR RECOGNITION
In CSI-based behavior recognition applications, we find that pattern-based applications are the most common. The pattern-based approach aims to find distinct patterns to identify human behavior. This depends on constructing effective features that can be leveraged to recognize different activities [3]. These features can be either complicated or simple, depending on behavior granularity and recognition purposes. Thereby, the pattern-based approach usually needs some samples to construct patterns for CSI-based behavior recognition. After collecting effective samples, we need to preprocess the CSI data to remove erroneous data and noise introduced by hardware devices and the ambient environment. The typical signal preprocessing techniques include the low-pass filter, Hampel filter, Principal Component Analysis (PCA), Discrete Wavelet Transformation (DWT), and Butterworth filter. For phase information in particular, we must implement phase calibration because phase has a good resolution for human action but is easily distorted by device imperfections and other disturbances. Afterward, we obtain clean data and can feed them into a classifier to implement behavior recognition.
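As an illustration of one preprocessing step named above, the following is a minimal sketch of a Hampel filter applied to a short CSI amplitude stream. The function name `hampel_filter`, the window size, the threshold, and the sample values are assumptions chosen for illustration; they are not taken from any surveyed system.

```python
from statistics import median

def hampel_filter(samples, half_window=3, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier)."""
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    cleaned = list(samples)
    for i in range(len(samples)):
        lo = max(0, i - half_window)
        hi = min(len(samples), i + half_window + 1)
        window = samples[lo:hi]
        med = median(window)
        mad = median(abs(x - med) for x in window)  # median absolute deviation
        if abs(samples[i] - med) > n_sigmas * k * mad:
            cleaned[i] = med  # the sample is an outlier: use the window median
    return cleaned

# Hypothetical amplitude stream of one subcarrier with a single spike.
amplitudes = [10.1, 10.3, 9.9, 10.2, 25.0, 10.0, 10.4, 9.8, 10.1]
filtered = hampel_filter(amplitudes)
print(filtered)
```

In this sketch only the spike is replaced, while the ordinary fluctuations are left unchanged. This is the usual reason for preferring a median-based outlier filter over plain averaging when the goal is to remove burst errors without smoothing away motion-induced variation.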

B. MODEL-BASED BEHAVIOR RECOGNITION
In the typical CSI-based behavior recognition applications, we find that many different behavior models have been employed. The model-based approach aims to design a physical model that relates the signal space to the physical space and establishes the physical law governing the relationship between the received signal and the sensing target [3]. In other words, the model-based recognition approach correlates the user's motion with the received signal variation. Therefore, it can recognize human behavior by exploring the physical law based on physical-mathematical models. Instead of profile matching or pattern recognition, we can utilize the model to identify human behavior with low computation cost. Compared with the pattern-based method, the model-based approach can achieve satisfactory performance with a small amount of measurement data because it relies on the model. Despite these advantages, it is challenging to apply model-based approaches in some scenarios because test environments usually contain various objects, making it hard to develop a general mathematical model that describes the CSI features in the scenario.
Although many references involve model-based methods, they do not give an explicit description. In this article, we consider two types of models used in CSI-based behavior recognition applications. The first exploits the physical law of signal propagation to describe the relationship between CSI signal changes and the user's behavior, and can be used to recognize human activities; examples are the Fresnel zone model, angle of arrival (AoA), the CSI-ratio model, and the interacting model. From these models we can obtain the human action state, such as motion direction, speed, and time, so that human behavior is interpreted using physical law alone. The second usually comprises two parts, a physical description of signal changes and a recognition algorithm; examples are the CSI-speed model and the CSI-activity model. Specifically, this method first obtains signal fluctuations based on the propagation law and a physical model and then leverages classification or fitting methods to identify human behavior. Therefore, this method combines models and classification algorithms. Notably, we do not consider PSD (power spectral density) and CFR (channel frequency response) as models because they solely describe CSI signal changes based on the propagation path and cannot be utilized to describe behaviors directly. This paper presents some representative model-based recognition applications, including daily behavior recognition [124], [125], human presence detection [126], [127], user authentication [128], walking direction estimation [129], handwriting recognition [130], and respiration detection [24], [131]-[137].

C. DEEP LEARNING-BASED BEHAVIOR RECOGNITION
Recently, deep learning technology has been studied comprehensively and has achieved considerable progress. Therefore, many CSI-based recognition applications using deep learning algorithms have been developed. Although the number of traditional pattern-based approaches using CSI exceeds that of deep learning-based approaches, the number of deep learning applications is growing rapidly. Compared with the pattern-based and model-based approaches, the deep learning-based approach has many distinct benefits. We can benefit from the automatic feature discovery and behavior recognition of deep learning algorithms. Specifically, it can automatically identify and extract effective features and feed them into the classifier. The procedures of feature extraction and classification are implemented by the middle layers of the deep neural network model. To accurately describe the features, deep learning-based approaches require a large number of samples to continuously adjust network parameters. The training procedure is labor-intensive and time-consuming, which is its main disadvantage.
For CSI-based behavior recognition, we can obtain satisfactory identification accuracy if we build an appropriate model and have massive samples. That is to say, there is no need to preprocess CSI data with complex features to obtain feature descriptions for deep neural networks, which is the advantage of the deep learning-based method. On the other hand, we have to collect plenty of CSI samples to train network parameters and obtain good recognition accuracy. This article presents some typical deep neural network models used in CSI-based behavior recognition applications, including the Autoencoder, Convolutional Neural Network (CNN), LSTM, RNN, Residual Neural Network (ResNet), and Restricted Boltzmann Machine (RBM). In addition, we introduce the corresponding applications in detail, such as daily behavior recognition [138]-[142], falling detection [143], [144], syncope detection [145], hand gesture recognition [8], [146]-[149], sign language recognition [150], gait and walking direction recognition [151], [152], human presence detection [153]-[155], crowd counting [156]-[159], user authentication [160]-[163], and respiration monitoring [22].
The main contributions of this paper are as follows. Firstly, we review the latest research progress of WiFi CSI-based human behavior recognition and emphasize the size of the samples. To the best of our knowledge, the present paper is the first survey that contains a complete sample description of CSI-based behavior identification. We report the exact sample sizes, classifiers, and human behaviors for the pattern-based, model-based, and deep learning-based approaches according to identification techniques. Secondly, we present a general framework that contains the main structure of a typical recognition system. Based on the structure, we report comprehensive statistics on state-of-the-art applications in terms of base signal selection, signal preprocessing, and behavior classification methods. Thirdly, we investigate the state-of-the-art studies and present them according to the granularity of behavior. We select some typical applications and outline their characteristics in terms of test equipment, preprocessing methods, test environments, the number of participants, recognized behaviors, classifiers, and system performance. Finally, we provide a comprehensive discussion of a few typical activities from the implementation view and offer some helpful insights into the development of these applications.
The remainder of this article is arranged as follows. In Section II, we present some surveys of human behavior recognition. Moreover, we introduce the background, which contains the basic concept of CSI and the principle of CSI-based behavior recognition. In Section III, we give some general approaches and the general framework of CSI-based behavior recognition, including base signal selection, signal preprocessing, and three behavior recognition techniques (i.e., pattern-based recognition, model-based recognition, and deep learning-based recognition). In Section IV, we divide state-of-the-art CSI-based activity identification applications into three groups and many subgroups based on behavior and review them in detail. Section V presents a discussion about identification techniques and system performance. Issues and future directions are given in Section VI. Finally, the conclusion is provided in Section VII.

II. RELATED WORK, CHANNEL STATE INFORMATION, AND THE PRINCIPLE OF CSI-BASED BEHAVIOR RECOGNITION
A. RELATED WORK
With the wide popularity of wireless networks, WiFi devices have been extensively deployed in various indoor environments. Currently, WiFi-based behavior recognition approaches have been widely studied, and related applications are increasingly emerging due to their major strengths. Meanwhile, some surveys of WiFi-based recognition report the progress of behavior recognition using CSI and review the state-of-the-art research results (see Table 1). However, they have weaknesses in some aspects. Next, we summarize their major features. Although references [164], [169] present surveys on WiFi-based contactless activity recognition and highlight behavior recognition frameworks, they do not provide enough description of the typical applications. Liu et al. [173] review wireless sensing applications using 4 different techniques: RSSI, CSI, FMCW (Frequency Modulated Continuous Wave), and Doppler shift. The authors select some typical applications and analyze their features. However, that paper lacks a description of signal preprocessing, experimental performance evaluation, and recognition approach analysis. The articles [3], [165]-[167] come from special issues on behavior recognition based on WiFi CSI and present surveys on CSI-based behavior recognition from different views, including pattern-based and model-based approaches, deep learning techniques, and behavior granularity. Although these articles present the key content of interest to them, they lack a detailed analysis of typical behavior recognition applications. Wang et al. [168] present a survey on WiFi-based recognition in terms of challenges, opportunities, and applications, and emphasize the models used in WiFi-based recognition. Although it describes some physical models and discusses further research opportunities, it does not provide a comprehensive analysis of existing applications. Two papers [170], [171] are the newest surveys; they cite more references and present a detailed survey on WiFi-based recognition. However, Khalili et al. [170] do not introduce the framework of activity recognition, and reference [171] lacks a detailed description of every application. Although reference [172] investigates many references, it focuses on through-the-wall applications and emphasizes that special application scenario, so its aim differs from that of this paper.
Based on the above analysis, we find that these surveys do not present a comprehensive review of WiFi CSI-based sensing, especially for human behavior recognition. Therefore, there is a pressing need for a detailed survey on CSI-based behavior recognition that contains a general activity recognition framework, a broader application overview, more detailed signal processing statistics, an accurate description of classification approaches, and a detailed account of typical applications.
This article addresses the weaknesses of the above surveys and contains some essential content, including the exact number of samples, detailed statistics on preprocessing methods, a more comprehensive application analysis, and more thorough references. Specifically, this paper investigates more than 100 recent behavior recognition applications using CSI and categorizes them by behavior recognition method. Besides, we compile statistics on the number of applications using each recognition method by publication year, as shown in Fig. 1. The number of behavior recognition applications based on CSI rises rapidly, which indicates that this is a hot research topic. We notice that the numbers of pattern-based and deep learning-based behavior recognition applications are gradually increasing. Deep learning-based applications grow faster than pattern-based applications. The number of model-based applications increased quickly in 2018, largely because the Fresnel zone model was proposed and applied and was shown to monitor the breathing rate effectively. From Fig. 1, we deem that the number of behavior recognition applications based on CSI will rise steadily because of the wide adoption of pattern-based methods, further study of deep learning-based methods, and the proposal of appropriate models.
To clarify the major content of this paper, we use tables to summarize the analysis results of human behavior recognition using CSI. Specifically, the paper presents the behavior recognition framework using CSI and its main components (see Table 2), including base signal selection, signal preprocessing, and three types of classification methods. This paper presents the state-of-the-art studies (see Table 3) and analyzes these CSI-based behavior recognition applications by classification technique: pattern-based, model-based, and deep learning-based. For a more detailed analysis of these applications, this article presents Tables 4 to 8, which contain the system name, equipment, preprocessing methods, experimental environments, the number of participants, classifiers, behaviors, and system performance. In addition, this article discusses which recognition approach (pattern-based, model-based, or deep learning-based) is more suitable and performs best for some typical behaviors, such as daily behavior, falling detection, hand gesture, crowd counting, user authentication, and respiration monitoring. In sum, this paper presents a complete framework, precise sample statistics, a detailed analysis of applications, and a comprehensive discussion about CSI-based behavior recognition applications.

B. CHANNEL STATE INFORMATION
CSI is a metric that describes the channel properties of wireless communication links and accounts for several factors affecting signal propagation, such as signal scattering, environmental attenuation, and distance attenuation. CSI was introduced to ensure effective and reliable data transmission by quantifying the channel fading effect and adjusting the signal transmission rate. Specifically, when the wireless signal propagates in a multi-path manner, it is obstructed by objects in the line of sight (LOS) path, which leads to signal changes, including amplitude attenuation and phase shift. Besides, reflections from the test environment also change the signal waveform. CSI is therefore used to evaluate the communication link state. That is to say, the quality of the wireless channel can be estimated from the CSI matrix, and the communication rate can be adjusted based on the CSI. In the IEEE 802.11n/ac standards, CSI is measured and parsed from the PHY layer using orthogonal frequency division multiplexing (OFDM) technology. In the frequency domain, the wireless channel can be defined as:
$$Y = HX + N \tag{1}$$
where $H$ is the channel matrix representing CSI information; the received and transmitted signal vectors are $Y$ and $X$, respectively; $N$ refers to an additive white Gaussian noise vector. According to formula (1), $H$ can be expressed as:
$$H(i) = |H(i)|\, e^{j \angle H(i)} \tag{2}$$
where $H(i)$ represents the value of CSI for the $i$-th subcarrier, which includes the amplitude and phase of CSI; the amplitude and phase of the $i$-th subcarrier are $|H(i)|$ and $\angle H(i)$, respectively.
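The relationship between the received signal, the transmitted signal, and the CSI can be sketched in a few lines. The example below assumes a noise-free channel and known pilot symbols, so that the per-subcarrier estimate reduces to a simple division; the function name `estimate_csi` and all numeric values are hypothetical and serve only to show how amplitude and phase are read from each complex CSI entry.

```python
import cmath

def estimate_csi(received, transmitted):
    """Per-subcarrier estimate H(i) = Y(i) / X(i), ignoring the noise term N."""
    return [y / x for y, x in zip(received, transmitted)]

# Hypothetical pilot symbols and channel for three subcarriers.
pilots = [1 + 0j, 1 + 0j, -1 + 0j]
true_channel = [cmath.rect(0.8, 0.5), cmath.rect(0.5, -1.2), cmath.rect(1.1, 2.0)]
received = [h * x for h, x in zip(true_channel, pilots)]  # noise-free Y = HX

csi = estimate_csi(received, pilots)
amplitude = [abs(h) for h in csi]       # |H(i)|
phase = [cmath.phase(h) for h in csi]   # angle of H(i), in (-pi, pi]
print(amplitude, phase)
```

With real hardware the noise term and device imperfections are not negligible, which is why the preprocessing and phase calibration steps discussed in this survey are applied before recognition.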

C. THE PRINCIPLE OF CSI-BASED HUMAN BEHAVIOR RECOGNITION
The principle of WiFi CSI-based behavior recognition is based on the observation that different human motions within the coverage of the WiFi signal cause distinct wireless channel disturbances and that this correspondence is unique. Therefore, we can utilize the correlation between CSI dynamics and user activities to recognize human behaviors.
To effectively describe wireless signal changes caused by human behaviors, we explain how the radio signal propagates and why the users affect wireless channels as follows.
A typical indoor human behavior recognition system consists of a WiFi access point (AP), a PC (i.e., the WiFi receiver), and users. The wireless signal propagates in a multi-path manner, and the wireless channel stays relatively stable when there are no people in the range of the WiFi signal. However, once a person stands or moves in the coverage area, the signal propagation paths change due to signal reflection, which results in channel disturbances, including amplitude attenuation and phase distortion. In other words, the users change the received channel state information through multipath effects, which can be leveraged to recognize human behaviors. Specifically, the unique correspondence between CSI dynamics and human behaviors can be obtained by constructing behavior profiles, which can be utilized to recognize different behaviors by correlating the signal changes with the corresponding channel distortion patterns. As shown in Fig. 2, according to the Friis [174] free space propagation equation, the received power can be represented as follows.
$$P_r = \frac{P_t G_t G_r \lambda^2}{(4\pi)^2 (d + 4h + \Delta)^2} \tag{3}$$
where $\lambda$ is the signal wavelength; $d$ is the length of the LOS path between the AP and the PC; $P_t$ and $P_r$ refer to the transmitting and receiving power, respectively; $G_t$ and $G_r$ are the transmitting and receiving gains, respectively; $h$ is the vertical distance from the reflection point to the LOS path; $\Delta$ is the length of the body's reflection path. $h$ and $\Delta$ are crucial factors that affect the received power, which indicates that obstacles and reflections significantly affect signal propagation. From formula (3), the received power is inversely proportional to the square of the propagation path length. That is to say, the greater the distance, the smaller the received power and the more difficult the sensing.
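The inverse-square dependence can be checked numerically. The sketch below assumes that formula (3) takes the form commonly used in the CSI literature, with received power P_t G_t G_r λ² / ((4π)² (d + 4h + Δ)²); the function name `received_power`, the 5 GHz carrier, and the transmit power, gains, and distances are all illustrative assumptions.

```python
import math

def received_power(p_t, g_t, g_r, wavelength, d, h=0.0, delta=0.0):
    """Received power in watts; all lengths in metres (assumed form of formula (3))."""
    path = d + 4 * h + delta
    return p_t * g_t * g_r * wavelength ** 2 / ((4 * math.pi) ** 2 * path ** 2)

wavelength = 3e8 / 5.0e9  # about 0.06 m for an assumed 5 GHz carrier

los_only = received_power(0.1, 1.0, 1.0, wavelength, d=4.0)
with_body = received_power(0.1, 1.0, 1.0, wavelength, d=4.0, h=0.5, delta=0.3)
doubled = received_power(0.1, 1.0, 1.0, wavelength, d=8.0)

print(los_only / doubled)    # doubling the path length quarters the power
print(with_body < los_only)  # a longer reflected path lowers the received power
```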
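In the frequency domain, the channel frequency response (CFR) $H(f, t)$ can be defined as follows.
$$H(f, t) = \frac{Y(f, t)}{X(f, t)} \tag{4}$$
$$H(f, t) = e^{-j 2\pi \Delta f t} \sum_{k} a_k(f, t)\, e^{-j 2\pi f \tau_k(t)} \tag{5}$$
where $Y(f, t)$ and $X(f, t)$ are the received and transmitted signals, respectively; the sum runs over the propagation paths; $a_k(f, t)$ refers to the signal change (attenuation and initial phase offset) of the $k$-th path; $e^{-j 2\pi f \tau_k(t)}$ is the phase shift caused by the propagation delay $\tau_k(t)$; $e^{-j 2\pi \Delta f t}$ refers to the phase shift caused by the frequency offset $\Delta f$ between the transmitter and the receiver.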
As shown in Fig. 3, human movement changes the WiFi signal propagation; thus, the propagation paths can be divided into two categories: static paths and dynamic paths. In general, the signal on a static path is not affected by the human body, and the frequency response of the static paths is $H_s(f, t)$. In contrast, the signal on a dynamic path is affected by the human body and other objects (e.g., floor, wall), and the dynamic path frequency response $H_d(f, t)$ can be expressed as follows.
$$H_d(f, t) = \sum_{k \in P_d} a_k(f, t)\, e^{-j 2\pi d_k(t)/\lambda} \tag{6}$$
where $P_d$ is the set of dynamic paths; $d_k(t)$ is the length of the $k$-th path at time $t$; $\lambda$ is the wavelength. Because the phase shift caused by the frequency offset between the transmitter and the receiver has unit magnitude, it does not affect the power, and the total CFR power is expressed as follows.
$$|H(f, t)|^2 = \left| H_s(f, t) + \sum_{k \in P_d} a_k(f, t)\, e^{-j 2\pi d_k(t)/\lambda} \right|^2 \tag{7}$$
The energy of the entire CFR changes along with the body's motion. We therefore assume that the human body moves at a constant speed, so that the length of a given path $k$ changes at a constant speed $v_k$ over a short time period. Let $d_k(t)$ denote the length of the $k$-th path, with $d_k(t) = d_k(0) + v_k t$. The instantaneous CFR power at time $t$ can then be written as:
$$|H(f, t)|^2 = \sum_{k \in P_d} 2\,|H_s(f, t)\, a_k(f, t)| \cos\!\left(\frac{2\pi v_k t}{\lambda} + \frac{2\pi d_k(0)}{\lambda} + \phi_{sk}\right) + \sum_{\substack{k, l \in P_d \\ k < l}} 2\,|a_k(f, t)\, a_l(f, t)| \cos\!\left(\frac{2\pi (v_k - v_l) t}{\lambda} + \frac{2\pi (d_k(0) - d_l(0))}{\lambda} + \phi_{kl}\right) + \sum_{k \in P_d} |a_k(f, t)|^2 + |H_s(f, t)|^2 \tag{8}$$
where the initial phase offsets of the sinusoids are $\frac{2\pi d_k(0)}{\lambda} + \phi_{sk}$ and $\frac{2\pi (d_k(0) - d_l(0))}{\lambda} + \phi_{kl}$. Formulas (7) and (8) provide an important insight: the total CFR power is the sum of a constant offset and a set of sinusoids, whose frequencies are determined by the rates of change of the path lengths. By measuring these sinusoid frequencies and multiplying them by the carrier wavelength, the rate of change of each path length can be obtained. The movement speed of the human body can then be calculated from the observed CSI changes, and different human behaviors can be identified.
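The link between sinusoid frequency and path-length speed can be illustrated with a small simulation. The sketch below assumes one static path and a single dynamic path whose length grows at a constant rate, computes the CFR power, finds the dominant frequency with a naive discrete Fourier transform, and multiplies it by the wavelength. The wavelength, speed, sampling rate, path amplitudes, and the helper name `dominant_frequency` are illustrative assumptions, not values from any surveyed system.

```python
import cmath
import math

def dominant_frequency(signal, sample_rate):
    """Return the non-DC frequency (Hz) with the largest DFT magnitude."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant offset
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

wavelength = 0.06    # metres, roughly a 5 GHz carrier (assumed)
speed = 0.6          # m/s, rate of change of the reflected path length (assumed)
sample_rate = 200    # CSI samples per second (assumed)
num_samples = 400    # two seconds of data

static = 1.0 + 0j    # response of the static paths (assumed constant)
power = []
for i in range(num_samples):
    t = i / sample_rate
    path_length = 3.0 + speed * t  # d(t) = d(0) + v t
    dynamic = 0.3 * cmath.exp(-2j * math.pi * path_length / wavelength)
    power.append(abs(static + dynamic) ** 2)

freq = dominant_frequency(power, sample_rate)
estimated_speed = freq * wavelength
print(freq, estimated_speed)
```

The recovered value is the rate of change of the path length, not the body speed itself; converting one into the other depends on the geometry between the transmitter, the body, and the receiver.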

III. GENERAL METHODS AND FRAMEWORK OF CSI-BASED HUMAN BEHAVIOR RECOGNITION
In this section, we present a general framework for CSI-based behavior recognition using COTS (commercial off-the-shelf) WiFi devices, introduce its crucial components, explain their roles, and analyze their key characteristics. Based on the signal processing procedure, the behavior recognition system comprises four parts: data collection, base signal selection, signal preprocessing, and behavior classification, as shown in Fig. 4. Based on the behavior recognition technique, we first categorize the recognition approaches into three groups: pattern-based, model-based, and deep learning-based methods. Then, we introduce the key parts of the three types of classifier and analyze their main characteristics.
In general, we collect CSI data using a PC equipped with a network interface card (NIC) when the PC communicates with the access point (AP).After collecting CSI data, we must choose a suitable base signal, such as amplitude, phase, the combination of amplitude and phase, or phase difference.In order to wipe out the noise of the CSI stream and obtain more accurate CSI data, signal processing approaches become essential.As shown in Table 2, this paper presents some representative techniques to acquire precise CSI data, including low-pass filter, Hampel filter, PCA, DWT, data interpolation, phase sanitization, Butterworth filter, Savitzky-Golay filter, and Band-pass filter, etc.After signal processing, we recognize human behaviors using three types of recognition techniques, such as pattern-based, model-based, and deep learning-based approaches.The pattern-based method usually needs to extract features and classify different activities using machine learning algorithms (e.g., DTW, SVM, KNN, HMM, etc.) and neural networks (e.g., BPNN, SOM).The model-based method usually utilizes some typical models, such as Fresnel zone model, AoA, human respiration model, interacting model, CSI-speed model, and CSI-activity model, to identify human behaviors.Compared with the pattern-based methods, the model-based approach has fewer applications due to the difficulty of the building model.At present, the deep learning-based method is gradually applied in activity identification.Although this approach requires a large number of samples to train a deep neural network (e.g., Autoencoder, CNN, LSTM, RBM, etc.), it not only can automatically extract behavior features and classify samples but also can achieve satisfactory performance.In summary, this architecture gives us a general description of CSI-based behavior recognition.The more detailed activity recognition processes can be illustrated as follows.

A. BASE SIGNAL SELECTION
Accurate and complete data collection depicting user movements is essential to effective behavior recognition. Without adequate data, it is hard to mine behavior profiles or build a mathematical model that characterizes the correlation between human behavior and the corresponding signals. Furthermore, the selection of the base signal plays a crucial role in behavior recognition, as it determines the resolution of the data and the accuracy of behavior identification. For example, a complex number carries more information than a real number, since both amplitude and direction (phase) can be calculated from its real and imaginary parts. Currently, the signal selection of existing applications falls into four groups: amplitude, phase, the combination of amplitude and phase, and phase difference.

1) AMPLITUDE
The physical meaning of CSI amplitude is the quantification of signal power attenuation after multipath fading [186]. User motion in the WiFi-covered area affects wireless signal propagation and changes the amplitude of the signal arriving at the receiver, leading to amplitude variation. There is a unique relationship between the frequency of amplitude changes and human walking speed, which means that the movement of a person can be detected by measuring the amplitude. For instance, E-Eyes [55] utilizes CSI amplitude measurements to identify nine kinds of daily activities (e.g., walking, cooking, washing). FallDeFi [70] achieves 93% precision in fall detection by analyzing the amplitude variation of CSI subcarriers. WiFi-ID [112] realizes user identity recognition with 93% accuracy for two users by analyzing the amplitude of the CSI stream.

2) PHASE
In theory, the phase contains more information than the amplitude and can be used to depict signal changes and the corresponding user motion. However, because phase is periodic and its measured value is affected by the device clock and the carrier frequency, it must be calibrated to yield usable phase features. A linear transformation is a simple and effective calibration method. For example, Wu et al. [106] employ a linear transformation to correct the CSI phase information and implement user authentication. However, phase calibration has some adverse effects. A key limitation is the lack of a clear explanation and accurate description of the calibrated phase, which makes it difficult to build a precise phase-based model of user motion. Meanwhile, the filtering process might remove some phase information describing user movement, resulting in the loss of some detection capability.

3) COMBINATION OF AMPLITUDE AND PHASE
Combining amplitude and phase can be useful because it exploits the advantages of both features and improves recognition accuracy. For instance, WiseFi [176] achieves better activity recognition performance (average accuracy: LOS 89.1%, NLOS 82.5%, and through-one-wall 73.4%) by using the combined approach. NotiFi [66] automatically detects abnormal activities by combining the phase and amplitude of CSI. SignFi [150] identifies nine digit finger gestures with an average precision of 86.66% by utilizing both phase and amplitude information.

4) PHASE DIFFERENCE
Compared to phase, phase difference has several benefits, such as spatial diversity and less noise. It is sensitive to human movements and can be leveraged to identify behaviors. Several studies use the phase difference to detect vital signs and complex activities. For example, Anti-Fall [68] distinguishes falls from fall-like activities by using the CSI phase difference over two antennas. RT-Fall [69] applies the phase difference of CSI to realize fall detection with satisfactory performance. FreeSense [115] automatically detects humans by analyzing the phase difference of two signals. PhaseBeat [85] monitors the user's respiration rate and heartbeat by analyzing the CSI phase difference. However, this approach usually relies on an antenna array to measure the phase difference, which makes it unavailable on devices without one.
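The four base signals discussed in this subsection can all be derived from the complex CSI values. The following is a minimal sketch, assuming the CSI of one packet is available as a list of complex numbers per receive antenna (one value per subcarrier); the function name `base_signals` and this data layout are illustrative assumptions, not the interface of any particular CSI tool.

```python
import cmath


def base_signals(csi_ant1, csi_ant2):
    """Derive amplitude, phase, and inter-antenna phase difference.

    csi_ant1, csi_ant2: lists of complex CSI values, one per subcarrier,
    captured for the same packet on two receive antennas (assumed layout).
    """
    # Amplitude: magnitude of each complex CSI value.
    amplitude = [abs(h) for h in csi_ant1]
    # Raw phase: argument of each complex value, wrapped to (-pi, pi].
    phase = [cmath.phase(h) for h in csi_ant1]
    # Phase difference: the argument of h1 * conj(h2) equals
    # phase(h1) - phase(h2), already wrapped to (-pi, pi].
    phase_diff = [cmath.phase(h1 * h2.conjugate())
                  for h1, h2 in zip(csi_ant1, csi_ant2)]
    return amplitude, phase, phase_diff
```

The "combination of amplitude and phase" group simply uses the first two outputs together as one feature vector.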

B. SIGNAL PREPROCESSING
CSI measurements comprise useful signals, random noise, and a few outliers. The bad data mainly come from the ambient environment, signal interference, and other moving persons. Data preprocessing is therefore important, since it can remove outliers, filter noise, calibrate the phase, and retain valuable data. Currently, various signal preprocessing techniques can be used to implement noise removal, data interpolation, and phase sanitization. The premise of correct behavior recognition is to obtain precise data representing human behaviors, so data preprocessing is necessary before feature extraction. Specifically, to reduce invalid data and enhance the accuracy of behavior recognition, we must remove the noise caused by multipath effects and hardware devices. There are several preprocessing methods, such as the low-pass filter, Hampel filter, PCA, Discrete Wavelet Transform (DWT), data interpolation, and phase sanitization.

1) LOW-PASS FILTER
The low-pass filter is a common method that passes only signal components below the cutoff frequency. Since human activities cause low-frequency variations in CSI signals, the low-pass filter can effectively remove the high-frequency noise caused by the multipath effect, and it has been widely used in many applications, such as RT-Fall [69], Anti-Fall [68], WiGeR [74], and R-TTWD [108]. Although the low-pass filter performs well, some noise cannot be removed effectively. This noise usually comes from the internal state of the WiFi network interface cards (NICs) of the sender and receiver or from environmental variation [169].
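As an illustration of the idea, the sketch below applies a first-order IIR low-pass filter (a discretized RC filter) to a CSI amplitude stream. This is only one simple choice made for illustration; the cited systems may use other designs (e.g., Butterworth filters), and the function name and parameters are assumptions.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """First-order IIR low-pass filter (discretized RC filter).

    Components well below cutoff_hz pass almost unchanged, while
    faster fluctuations are attenuated.
    """
    if not samples:
        return []
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    filtered = []
    previous = samples[0]  # start from the first sample to avoid a jump
    for x in samples:
        previous = previous + alpha * (x - previous)
        filtered.append(previous)
    return filtered
```

For example, with a 1 kHz packet rate and a 10 Hz cutoff, slow amplitude changes caused by body motion are kept while packet-to-packet jitter is strongly smoothed.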

2) HAMPEL FILTER
The collected CSI data often contain many abnormal values due to sudden changes in equipment and environment. The Hampel filter is an efficient technique for removing outliers that lie far from their neighboring data. Specifically, it identifies outliers within a sliding window and replaces them with the window median, which eliminates the negative impact of invalid data. For CSI signals, the outliers caused by intrinsic hardware characteristics and the deployment environment can be removed by the Hampel filter. For instance, Wi-Sleep [84] applies the Hampel filter to remove noise and monitors respiration rates. WiHACS [57] utilizes the Hampel filter for denoising and identifies seven kinds of activities, e.g., sitting, walking, and falling. Although the Hampel filter removes noise efficiently, it is usually suited to signals that contain only Gaussian noise [164].
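A minimal sketch of a Hampel filter in its commonly used form is given below: a sample is treated as an outlier when it deviates from the median of its sliding window by more than a multiple of the scaled median absolute deviation (MAD), and it is then replaced by that median. The window size and threshold are illustrative defaults, not values taken from the cited systems.

```python
import statistics


def hampel_filter(samples, half_window=3, n_sigmas=3.0):
    """Replace outliers with the median of their sliding window."""
    scale = 1.4826  # makes the MAD comparable to a standard deviation
    cleaned = list(samples)
    for i in range(len(samples)):
        lo = max(0, i - half_window)
        hi = min(len(samples), i + half_window + 1)
        window = samples[lo:hi]
        med = statistics.median(window)
        mad = statistics.median([abs(x - med) for x in window])
        # Flag the sample if it is too far from the local median.
        if abs(samples[i] - med) > n_sigmas * scale * mad:
            cleaned[i] = med
    return cleaned
```

Because the decision uses the median rather than the mean, a single large spike does not distort the reference value it is compared against.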

3) PRINCIPAL COMPONENT ANALYSIS
Principal Component Analysis (PCA) is a data processing method that represents data using a subset of the features of a matrix, which reduces data dimensionality and improves processing efficiency. Specifically, the eigenvalues of the matrix are computed, and some eigenvectors are selected to construct a new matrix that represents the original one. In CSI-based behavior recognition, PCA is mainly used to remove noise and data redundancy in the signal processing step (e.g., Wi-Wri [93], WiHACS [57], FreeSense [126], and CareFi [96]). PCA works well if the selected principal components can describe the original matrix [189]. For CSI data processing, we can apply PCA to the measurement data to remove noise and redundant data introduced by off-the-shelf devices and the environment.

4) DISCRETE WAVELET TRANSFORM
Discrete Wavelet Transform (DWT) is widely used in image processing because it can remove signal noise while extracting and preserving useful image edge information. It overcomes shortcomings of traditional Fourier-transform-based image processing, such as the difficulty of detecting local abrupt signals and the loss of image edge information when removing signal noise. As a result, DWT performs better than the Fourier transform in removing signal noise. Many applications utilize DWT for denoising, such as WiFinger [76] and WiStep [105].

5) DATA INTERPOLATION
In realistic applications, despite a constant data delivery rate at the transmitter, the receiver usually cannot receive data at a stable, constant rate due to data loss or transmission delay. Effective measures must therefore be taken to address this problem. Data interpolation is an effective solution due to its simplicity and practicality. Specifically, to obtain data at a fixed rate, we can fill each time slot that is missing data with a sample computed from its neighbors, which yields evenly spaced CSI sequences, eliminates data clustering and fuzzy measurements, and enhances data accuracy. For example, Smokey [97] employs interpolation to preprocess the received data and recognizes smoking behavior.
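The interpolation step can be sketched as follows, assuming linear interpolation between the two nearest received samples; the text above does not fix a particular interpolation scheme, so this choice and the function name `resample_uniform` are assumptions made for illustration.

```python
def resample_uniform(timestamps, values, rate_hz):
    """Linearly interpolate irregular samples onto a uniform time grid.

    timestamps must be strictly increasing (seconds); values holds the
    CSI measurement (e.g., one subcarrier amplitude) at each timestamp.
    """
    if len(timestamps) < 2 or len(timestamps) != len(values):
        raise ValueError("need at least two matching samples")
    step = 1.0 / rate_hz
    count = int((timestamps[-1] - timestamps[0]) / step + 1e-9) + 1
    grid, resampled = [], []
    j = 0  # index of the received sample at or before the grid point
    for i in range(count):
        t = timestamps[0] + i * step
        while j < len(timestamps) - 2 and timestamps[j + 1] < t:
            j += 1
        t0, t1 = timestamps[j], timestamps[j + 1]
        weight = (t - t0) / (t1 - t0)
        grid.append(t)
        resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return grid, resampled
```

Grid points that fall into a gap left by lost packets receive a value computed from the neighboring samples on either side of the gap.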

6) PHASE SANITIZATION
Although we can acquire the raw phase information, it cannot be used directly due to carrier frequency offset (CFO) and sampling frequency offset (SFO) [165]. Specifically, CFO comes from the lack of synchronization of the central frequency between the transmitter and receiver clocks, and the analog-to-digital converter (ADC) is likely to produce SFO. Moreover, the raw phase $P_M$ can be expressed as [157]: where $P$ is the genuine phase; $t$ is the time lag due to SFO; $\beta$ is the phase offset due to CFO; $N$ is the noise. From formula (9), because the values of $t$ and $\beta$ are unknown, the genuine phase cannot be calculated directly. However, phase sanitization (e.g., the linear fitting method) has been proposed to remove the impact of CFO and SFO, which makes the phase information usable for detection [110].
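A minimal sketch of linear phase sanitization is shown below. It assumes the commonly used linear transformation: the SFO-induced term is treated as a slope across subcarrier indices and the CFO-induced term as a constant offset, and both are subtracted from the unwrapped phase. The helper names and the assumption of subcarrier indices symmetric around zero are illustrative.

```python
import math


def unwrap(phases):
    """Remove 2*pi jumps between neighboring subcarrier phases."""
    unwrapped = [phases[0]]
    for p in phases[1:]:
        delta = p - unwrapped[-1]
        delta -= 2.0 * math.pi * round(delta / (2.0 * math.pi))
        unwrapped.append(unwrapped[-1] + delta)
    return unwrapped


def sanitize_phase(raw_phase, subcarrier_idx):
    """Subtract the linear trend (slope and mean) across subcarriers."""
    phi = unwrap(raw_phase)
    # Slope across the band approximates the SFO-induced time lag term.
    slope = (phi[-1] - phi[0]) / (subcarrier_idx[-1] - subcarrier_idx[0])
    # Mean phase approximates the constant CFO-induced offset
    # (exact only when the subcarrier indices are symmetric around 0).
    offset = sum(phi) / len(phi)
    return [p - slope * k - offset for p, k in zip(phi, subcarrier_idx)]
```

Note that this transformation also removes any genuinely linear part of the true phase, so the result is a calibrated feature rather than the genuine phase itself.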

C. PATTERN-BASED HUMAN BEHAVIOR RECOGNITION
The purpose of pattern-based recognition is to find patterns and regularities in data. In most cases, the recognition procedure assigns a label to a given input. Therefore, many recognition problems can be treated as classification problems. In pattern-based behavior recognition, we attempt to identify human behaviors by leveraging CSI variation patterns. Specifically, we first collect data that contain the CSI changes caused by human behaviors. Then, we determine the regularity of CSI variation and establish a mapping between CSI variation and human behaviors. Based on the mapping, we apply recognition methods to identify specific behaviors. Therefore, the key to this approach is to relate the CSI signal variation profile to specific actions. Identifying and describing distinct patterns that distinguish different behaviors is challenging because CSI changes are complicated. In general, the pattern-based approach to behavior recognition consists of two processes: feature extraction and activity classification. Both steps are crucial to accurately identifying human behaviors. We introduce their basic principles and main implementation methods as follows.

1) FEATURE EXTRACTION
Feature extraction is a process that transforms the original information into new data types that can be easily used by the subsequent classification. After feature extraction, we get a more effective data description. The measurement data usually contain much redundant information and cannot be utilized directly without feature extraction. Therefore, feature extraction plays a crucial role in CSI recognition approaches. Currently, we can extract many features from preprocessed data, such as statistical characteristics, Doppler shift, wavelet features, and time-frequency diagrams.

a: STATISTICAL CHARACTERISTICS
Statistical characteristics are statistical results computed from the original waveform data. They effectively describe the general features, simplify data representation, and reduce data complexity. We can calculate the statistical features from the original time-domain data. Similarly, we can obtain statistical results from the frequency-domain data after applying the FFT to the time-domain data. Some behavior recognition applications employ statistical features, such as TW-See [61], Anti-Fall [68], and RT-Fall [69].

b: DOPPLER SHIFT
Doppler shift refers to the change in frequency of a signal observed at an observer when there is relative motion between the transmitter and the observer. It can be used to extract frequency shifts and deduce the speed of an object. Specifically, human movement changes the path length of the signal reflected by the body, resulting in changes in the signal at the receiving end. The Doppler frequency shift of the received signal can be expressed as follows.
where $c$ is the speed of light; $v_{path}$ is the rate of change of the path length; $f$ is the carrier frequency of the signal. Several activity recognition applications are based on Doppler shift. For instance, WiSee [182] implements behavior recognition by exploiting the Doppler shift. WiFit [64] counts bodyweight exercise repetitions by applying the Doppler shift.
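As a worked example of the relation between the symbols defined above, the sketch below assumes the magnitude of the Doppler shift is $f \cdot v_{path} / c$; the sign convention (shortening versus lengthening paths) is left out, and the function names are illustrative.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def doppler_shift_hz(v_path, carrier_hz):
    """Doppler shift magnitude for a path whose length changes at v_path (m/s)."""
    return carrier_hz * v_path / SPEED_OF_LIGHT


def path_speed_mps(doppler_hz, carrier_hz):
    """Invert the relation: path-length change rate from a measured shift."""
    return doppler_hz * SPEED_OF_LIGHT / carrier_hz
```

For instance, a reflected path that changes length at 1 m/s on a 5 GHz carrier produces a shift of only about 16.7 Hz, which illustrates why human motion appears as a low-frequency variation in the received signal.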

c: WAVELET FEATURES
Wavelet transform is a feature transformation method similar to the short-time Fourier transform (STFT). It overcomes the shortcoming of STFT that the window size does not change with frequency. After the wavelet transform, the wavelet coefficients of each frequency band are the features to be processed; we call these coefficients wavelet features. Wavelet transform achieves better local feature extraction because it can analyze signals at a finer frequency scale. For example, CARM [124] uses the wavelet transform to distinguish the speeds and frequencies of different behaviors and recognizes eight kinds of behaviors (e.g., running, walking, sitting, and falling) with an average accuracy of 96%.

d: TIME-FREQUENCY DIAGRAM
The time-frequency diagram describes how the signal amplitude at each frequency varies over time. For a given time signal, we can transform it into a two-dimensional color image using a time window. The image contains the time, the different frequencies, and the corresponding amplitudes. For CSI signals, the image illustrates the time index, subcarrier index, and signal amplitude values reflected by human motions. Based on the energy changes of different frequencies over time, we can segment the time axis, obtain each action segment, and identify specific behaviors, as in E-Eyes [55], WiTT [63], Smokey [97], and WiStep [105].

2) BEHAVIOR CLASSIFICATION
After feature extraction, we identify human behavior by using different classifiers based on the characteristics of the specific application. For simple classification problems that can be represented by a statistical characteristic, we can use the simplest comparison method, designing a threshold to distinguish the behavior types. In other words, we compute a value and compare it with the assigned threshold to determine whether the behavior belongs to a certain type. For example, Kun et al. [190] use this method to recognize whether there are human activities in the environment, and Zhang et al. [68] compare the standard deviation of the signal with a threshold to determine whether a ''fall-like activity'' is a fall. In contrast, for complex problems with a high-dimensional feature space, such simple methods cannot easily yield satisfactory recognition results. In these cases, we can employ machine learning techniques to identify the patterns of CSI changes and relate them to human behaviors. Machine learning includes numerous algorithms, such as DTW, HMM, conditional restricted Boltzmann machines (CRBMs), SVM, KNN, artificial neural networks (ANN), BPNN, and SOM. Although these algorithms might generate similar classification results for small data sets or data with few features, they may produce significantly different recognition results for large data sets or data with a large number of features. Here we describe the characteristics of the machine learning algorithms used by representative studies and applications.

a: DYNAMIC TIME WARPING
Dynamic Time Warping (DTW) is an important method for comparing the similarity of two sequences, and it is also a template matching algorithm [191]. To judge similarity, DTW calculates the distance between the two sequences using dynamic programming. There are some DTW-based classification applications of CSI, such as WiGeR [74] and Mudra [75], which use DTW to identify hand gestures. Although DTW does not require training samples, it has some drawbacks, such as a large amount of computation and the dependence of recognition performance on breakpoint detection and templates [169].
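The dynamic programming recursion behind DTW, and its use as a template matcher, can be sketched as follows. The absolute difference is used as the local cost, and the template dictionary is a hypothetical example; neither is taken from the cited systems.

```python
def dtw_distance(a, b):
    """DTW distance between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances
                                     cost[i][j - 1],      # b advances
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify_by_template(sample, templates):
    """Return the label of the template closest to the sample under DTW."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

The quadratic table makes the cost of each comparison proportional to the product of the two sequence lengths, which is the computational burden noted above.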

b: SUPPORT VECTOR MACHINE
Support Vector Machine (SVM) is a common classifier in the pattern recognition field. It uses a nonlinear mapping to transform samples that are not linearly separable in the low-dimensional input space into a high-dimensional feature space, where the samples can be separated. There are some SVM-based classification applications of CSI, such as DeNum [146], which recognizes digit gestures (0 to 9) using SVM. Although SVM yields good classification results, it is difficult to apply to multi-class problems when the number of samples is too large.

c: K-NEAREST NEIGHBOR
K-nearest neighbor (KNN) is a common classification method that classifies data according to its K nearest neighbors. Specifically, KNN first finds the K nearest neighbors of the target and then counts how many of these neighbors belong to each type. The target is assigned to the type that holds the most of its K neighbors. Several human behavior recognition applications are based on KNN. For example, FreeSense [115] applies KNN for user authentication, and Wi-Wri [93] recognizes the 26 English letters by using DTW and KNN. Despite its theoretical simplicity and ease of deployment, KNN has some shortcomings because it considers only the K nearest neighbors of a point. For data with imbalanced types or large-scale measurement values, the performance of KNN declines severely.
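The KNN voting rule described above can be sketched in a few lines. Euclidean distance and the feature vectors in the usage note are illustrative assumptions; the cited systems may use other distances (e.g., DTW) and features.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    # Sort training indices by Euclidean distance to the query.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    # Count the labels of the k closest training points.
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

For example, with two-dimensional features such as (amplitude variance, dominant frequency), a query is assigned to whichever activity label is most common among its three closest training samples.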

d: HIDDEN MARKOV MODEL
Hidden Markov model (HMM) is a statistical model that is usually used to solve time sequence problems. Specifically, HMM addresses problems with the following two features. First, the problem describes sequential state changes, such as time sequences and state sequences. Second, the data of the problem consist of two types: observable data (observation sequences) and unobservable data (hidden sequences or state sequences). We infer the hidden sequence from an observation sequence. Several CSI-based recognition applications have applied HMM to identify human movements. For example, Wang et al. [124] use HMMs to build a CSI behavior model for distinct motion states and recognize eight kinds of behaviors. Despite its robustness for fine motion, HMM incurs a high computation cost for data with many hidden features.
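Inferring the hidden sequence from an observation sequence is typically done with the Viterbi algorithm, sketched below. The two-state model in the usage note (still/moving states observed through low/high CSI variance) is a hypothetical toy example, not the model used in [124].

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM."""
    # prob[t][s]: probability of the best path ending in state s at time t.
    prob = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: best predecessor of state s at time t
    for obs in observations[1:]:
        prob.append({})
        back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: prob[-2][p] * trans_p[p][s])
            prob[-1][s] = prob[-2][best_prev] * trans_p[best_prev][s] * emit_p[s][obs]
            back[-1][s] = best_prev
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: prob[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

With hidden states "still" and "moving" and observations "low" and "high", a run of low-variance windows followed by high-variance windows is decoded as a transition from still to moving. For long sequences, log-probabilities would be preferable to avoid numerical underflow.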

D. MODEL-BASED HUMAN BEHAVIOR RECOGNITION
The purpose of model-based recognition is to leverage a mathematical or physical model to depict the signal variation caused by human behavior. To identify human behavior, we therefore have to build an appropriate model that interprets the CSI changes generated by user actions. However, many activities are conducted in indoor environments, and the environment affects the CSI in many ways, which increases the difficulty of developing the physical model. Besides, CSI is designed to evaluate the wireless link state and improve communication quality, which makes it difficult to correlate a specific signal change with a specific human behavior. In other words, the key to the model-based behavior recognition approach is to first establish a model that relates the signal space to the physical space and then use the model to determine human behaviors based on the relationship between the received signal and the sensing target. Based on the physical or mathematical model, human behavior can be accurately identified by exploiting physical laws. In state-of-the-art CSI-based behavior recognition applications and studies, the typical models include the Fresnel zone model, AoA, the human respiration model, the interacting model, the CSI-speed model, and the CSI-activity model.

1) FRESNEL ZONE MODEL
The concept of the Fresnel zone originated from research on the interference and diffraction of light in the early 19th century. In radio propagation, Fresnel zones refer to the series of concentric ellipsoids whose two foci correspond to the transmitter and receiver antennas [3]. As shown in Fig. 5, assuming $P_1$ and $P_2$ are two transceivers at a certain height, for a given radio signal with wavelength $\lambda$, the Fresnel zones containing $n$ ellipses can be constructed by ensuring: where $Q_n$ is a point on the $n$-th ellipse. Each ellipse is the boundary of a Fresnel zone. The innermost ellipse is defined as the 1st Fresnel zone, the elliptical annulus between the first and second ellipses is defined as the 2nd Fresnel zone, and the $n$-th Fresnel zone corresponds to the elliptical annulus between the $(n-1)$-th and the $n$-th ellipses [192]. The width of the Fresnel zone keeps decreasing as $n$ increases, approaching $\lambda/2$.
From the mathematical view, the Fresnel zone model describes the relationship between the geometric position of the target and the CSI power or amplitude variation caused by the target's motion. In other words, when a target crosses different Fresnel zones, the signal path formed by reflection off the body varies with the zone. Different behaviors can thus be recognized by analyzing the signal path changes, as in walking direction estimation [129], respiration detection [131]-[135], [193], human detection [126], and behavior recognition [125], [194].
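As a small worked example, the sketch below determines which Fresnel zone a reflecting point lies in. It assumes the standard boundary condition $|P_1 Q_n| + |Q_n P_2| - |P_1 P_2| = n\lambda/2$ for the $n$-th ellipse; the coordinates and the function name are illustrative.

```python
import math


def fresnel_zone_index(tx, rx, point, wavelength_m):
    """Index n of the Fresnel zone that contains the given point.

    tx, rx, point: coordinate tuples in metres (2-D or 3-D).
    """
    # Extra length of the reflected path relative to the direct path.
    excess = math.dist(tx, point) + math.dist(point, rx) - math.dist(tx, rx)
    # The n-th boundary has an excess of n * wavelength / 2, so the zone
    # index is the smallest n whose boundary encloses the point.
    return max(1, math.ceil(2.0 * excess / wavelength_m))
```

With transceivers 4 m apart and a wavelength of about 6 cm, a point 0.2 m off the midpoint of the direct path is still in the 1st zone, while a point 0.4 m off is already in the 3rd, which shows how quickly a moving body crosses zone boundaries.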

2) ANGLE OF ARRIVAL
The angle of arrival (AoA) is a measurement that determines the propagation direction of RF waves incident on an antenna array. The basic idea is that whenever a part of the user's body blocks the signal arriving from a certain direction, the signal intensity at the AoA corresponding to that direction drops distinctly [194]. Specifically, as shown in Fig. 6, the human body blocks the signal propagating along the AoA $\theta_i$ but does not affect signal transmission in other directions, i.e., $\theta_1$, $\theta_2$, $\theta_3$, etc., which weakens the signal intensity at the AoA $\theta_i$. In general, AoA can be estimated from the phase difference between the antennas of the array [3]. As shown in Fig. 7, the phase difference between adjacent antennas can be expressed as follows.
where $f$ is the frequency of the incident signal; $c$ is the speed of light; $d$ is the spacing between adjacent antennas; $d \sin\theta$ is the path difference; $d \sin\theta / c$ is the time delay. Moreover, the more antennas, the better the AoA estimation performance. To acquire accurate angle estimates, the MUSIC (multiple signal classification) algorithm has been applied; with two or more AoA measurements from known points, triangulation can then be used to calculate the position of the signal source. All in all, different behaviors can be tracked by analyzing the signal phase difference, as in handwriting recognition [130], human detection [126], [107], respiration monitoring [136], and activity recognition [176].
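The relation between AoA and phase difference can be illustrated with a two-antenna sketch. It assumes the phase difference is $2\pi f d \sin\theta / c$, which follows from the time delay $d \sin\theta / c$ defined above; the sign convention and the function names are assumptions, and this simple inversion is not the MUSIC algorithm.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def phase_difference(theta_rad, freq_hz, spacing_m):
    """Phase difference between adjacent antennas for incidence angle theta."""
    delay = spacing_m * math.sin(theta_rad) / SPEED_OF_LIGHT
    return 2.0 * math.pi * freq_hz * delay


def estimate_aoa(phase_diff_rad, freq_hz, spacing_m):
    """Invert the relation above to recover the angle of arrival."""
    sin_theta = phase_diff_rad * SPEED_OF_LIGHT / (2.0 * math.pi * freq_hz * spacing_m)
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against noise overshoot
    return math.asin(sin_theta)
```

The inversion is unambiguous only when the antenna spacing does not exceed half a wavelength; with larger spacing, several angles map to the same wrapped phase difference.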

3) INTERACTING MODEL
Wang et al. [128] present an interacting model for person identification. The model treats skin, fat, and bone as three concentric layers, and the medium of signal propagation within each layer is uniform. Moreover, the user is asked to stand on the vertical line in the middle of the TX-RX antenna pair.
As shown in Fig. 8 (a), the received signal $S$ from the $i$-th path can be expressed as follows.
where $A$ refers to the medium absorption during signal propagation; $\phi_0$ is the initial phase of the signal; $d_i$ is the length of the $i$-th path; the permeability and permittivity are $\mu_i$ and $\varepsilon_i$, respectively. Wang et al. consider only the paths through the three layers and the signals reflected by the human body, as shown in Fig. 8 (b). The received signal $S$ can be extended as: where $h_i$ is a function that calculates the propagation of the $i$-th path; the permeabilities, permittivities, and radii of the layers of the human body are $M$, $E$, and $R$, respectively; $L$ denotes location parameters.

4) CSI-SPEED MODEL AND CSI-ACTIVITY MODEL
Wang et al. [124] proposed CARM in 2015, a human activity recognition system using the CSI-speed model and the CSI-activity model [124], [186]. The principle of these two models is as follows. As shown in Fig. 9, when a person moves from $P_1$ to $P_2$, the WiFi signal is reflected by the wall and the human body, and the propagation paths change, which produces multipath components. Specifically, the multipath components consist of the LOS component and the NLOS components caused by the wall, the human body, and other objects. The authors divide these multipath components into two types: static components (e.g., the LOS component and the components caused by the wall and other objects) and dynamic components (e.g., the component caused by the human). Furthermore, the authors build the CSI-speed model and the CSI-activity model to estimate the speed of human activities and multiple movement states by using the complex-valued channel frequency response (CFR), and they analyze the amplitude and phase of the CSI from the static and dynamic components. Due to the randomness of the CSI phase, it is difficult to develop a speed model using phase information. Instead, the CSI-speed model builds a relationship between CSI amplitude and walking speed and estimates the velocity of activities [186]. Furthermore, the authors apply HMM to establish the CSI-activity model and identify human behaviors based on CSI energy changes in low-frequency components (e.g., caused by slow movement) and high-frequency components (e.g., caused by fast movement). The CSI-activity model performs well in quantifying the relationship between human velocity and activities.

E. DEEP LEARNING-BASED HUMAN BEHAVIOR RECOGNITION
Deep learning is a branch of machine learning that classifies data by exploiting deep neural networks (e.g., autoencoder, CNN, LSTM, and RBM). Conventional machine learning algorithms usually require accurate features as input because these features characterize the input data and determine the output. Therefore, well-designed features are the premise of correct behavior recognition and directly affect classification accuracy. However, feature extraction may rely on empirical experience, which can decrease classification accuracy. Unlike conventional machine learning, deep learning usually does not need a feature extraction step, since it can automatically discover and extract features from the input data with a neural network model. Deep learning enables a classification approach that can handle large-scale data with complex features. In other words, with deep neural networks, there is no need to pre-process data to acquire feature descriptions, which is a significant advantage of deep learning. Meanwhile, the large number of unknown parameters in the neural network can be calculated automatically by the training process. Although the training process is usually extremely time-consuming, it can achieve satisfactory performance. Deep learning approaches have been widely adopted in many scenarios, from classical image recognition to challenging natural language processing and visual art applications [195]. In this section, we analyze the specific neural network models applied in CSI-based behavior recognition applications, including autoencoder, CNN, LSTM, and RBM.

1) AUTOENCODER
An autoencoder is an unsupervised neural network model that attempts to rebuild the input data at the output through a narrow hidden layer. Specifically, it learns the implicit features of the input data, which is called encoding. Afterward, the original input data can be reconstructed from the learned features, which is called decoding. Intuitively, it can be used for dimensionality reduction because the neural network finds hidden features in its internal representation layers that can reproduce the input data at the output. In addition, it can be used as a feature extractor, since the features learned by the encoder can be fed into a supervised learning model. For example, Doong [158] uses an autoencoder to extract hierarchical features from the original input and estimate the number of people.

3) LONG SHORT-TERM MEMORY NETWORK
The traditional neural network assumes that all inputs are independent of each other. This assumption may not hold for some applications, such as natural language processing. Therefore, the recurrent neural network (RNN) was proposed to address time sequence problems. Specifically, RNN is a simple cyclic neural network that can process sequential information. It performs the same task for each element of the sequence, and each output depends on the previous computation results. However, it cannot effectively tackle the long-term dependency problem, in which the output of the system depends on information that occurred a long time ago. To solve this problem, researchers proposed the LSTM (see Fig. 11). It is an improved RNN that can effectively process and predict important events with relatively long dependencies and intervals in time series. It has been used in various scenarios, such as image analysis, document summarization, speech recognition, image recognition, handwriting recognition, and music synthesis. Compared with RNN, LSTM adds a ''processor'' called a cell to judge whether information is useful. Three gates are placed in a cell: the input gate, the forget gate, and the output gate. Information entering the LSTM is judged according to specific rules. Only information that passes this check is retained, while other information is discarded through the forget gate. Based on this principle, LSTM is an effective technique for solving the long-term dependency problem and achieves satisfactory performance. For example, Yousefi et al. [165] use LSTM to distinguish between lying down and falling. Feng et al. [178] recognize walking, running, and hand moving by applying LSTM, SVM, DTW with KNN.
However, the learning process of the Boltzmann machine (BM) is complicated and time-consuming due to the connections among all nodes. To reduce computational complexity and enhance learning efficiency, the restricted Boltzmann machine (RBM) divides all nodes into two groups (input units and hidden units) and eliminates the links between units within the same group, which enables a more efficient training algorithm. RBM has been extensively applied in numerous scenarios, such as collaborative filtering, weight initialization, dimensionality reduction, classification, feature learning, topic modeling, and deep belief networks.
In CSI-based human behavior recognition, RBM can be used as a feature extractor thanks to its powerful capability of representing complicated and hidden features. For example, Zhou et al. [146] built a stacked RBM for automatic feature extraction with higher accuracy and effectiveness to distinguish ten kinds of number gestures. Specifically, the authors first constructed the multi-layer RBM and then trained the link weights layer by layer with the probabilistic generative model to obtain near-optimal initial values. Afterward, they unrolled the RBM and fine-tuned the weights with the back-propagation algorithm. Finally, the outputs of the last hidden units formed the effective features fed into the classification algorithm.

IV. CSI-BASED HUMAN BEHAVIOR RECOGNITION APPLICATIONS
Behavior recognition based on WiFi has received much research attention in recent years due to the popularity of indoor WiFi devices, the simplicity of the approach, and the accuracy of the recognition results. In addition, since CSI is sensitive to variations of the propagation path, it can be leveraged to depict the changes caused by human actions. As a result, CSI-based behavior identification techniques are widely used in the behavior recognition field. In general, human behavior recognition applications are categorized into two types, pattern-based and model-based, according to whether they use a model to interpret human behavior. Although recognition based on deep learning can be regarded as a type of pattern-based method, it is better to treat deep learning-based applications as a distinct category due to their characteristics (e.g., no explicit feature extraction process and the need for a mass of data to train network parameters). Consequently, we categorize the state-of-the-art applications (as shown in Table 3) into three groups: pattern-based, model-based, and deep learning-based. For better understanding and analysis of CSI-based human behavior recognition applications, we further divide the applications of each recognition technique into two types, specific behavior recognition and activity inference, according to the purpose of behavior recognition.
Specific behavior recognition refers to simple human behaviors that we consider as distinct actions. Specifically, we only need to concentrate on the characterization of specific actions and can obtain their labels without further inference. According to behavior granularity, we classify the specific behaviors into two types: coarse-grained specific behavior and fine-grained specific behavior. Coarse-grained specific behaviors refer to activities conducted with a large range of motion. They consist of simple daily activities and physical exercises, such as jogging, lying, sitting, cooking, washing, standing, walking, playing table tennis, and bodyweight exercise. These activities usually last for some time with a certain periodicity and happen regularly. Besides, we regard some abnormal activities, such as falling, as coarse-grained behaviors, although they usually occur suddenly and last a very short time. Fine-grained behaviors refer to specific human activities with a small motion distance, such as hand motion, lip movement, heart rate, and respiration.
In contrast, activity inference usually refers to determining the meaning of actions. Specifically, it aims to reveal the hidden purpose of the activities rather than to label specific behaviors. For example, when we want to count the number of persons in a room, we do not focus on anyone's specific behaviors. Instead, we can count the crowd by identifying the differences between individuals. Activity inference usually includes smoking detection, crowd counting, step counting, human detection, and user authentication. Based on this analysis, we investigate the state-of-the-art applications and present comprehensive statistics based on the purpose of the applications, as shown in Table 3.

A. PATTERN-BASED BEHAVIOR RECOGNITION APPLICATIONS
Currently, there are plenty of pattern-based human behavior recognition applications using CSI because the pattern-based approach provides many advantages. According to identification purposes, the pattern-based applications are divided into three groups: coarse-grained specific behavior recognition applications, fine-grained specific behavior recognition applications, and activity inference applications. Tables 4 to 6 elaborate on the characteristics of typical applications from many aspects, including the device used, preprocessing methods, test scenarios, number of users, recognized behaviors, classifier, and recognition performance. The key features of each application can be read from the columns of these tables. Besides, we categorize the applications based on their specific function, compare them from the above aspects, and obtain the key differences among these identification systems. Table 4 has one extra column compared with Tables 5 and 6, titled TTW (Through the Wall), which marks whether an application can work in through-the-wall scenarios. The column exists solely in this table because coarse-grained behavior has a large activity range and can therefore be identified in through-the-wall scenarios. On the contrary, fine-grained behavior usually cannot be identified because the variation of CSI through the wall is too weak to be detected. Activity inference, in turn, normally involves continuous movements; therefore, we usually focus on the activities rather than the application scenarios. We analyze the key components of activity recognition applications as follows.

1) COARSE-GRAINED SPECIFIC BEHAVIOR RECOGNITION a: DAILY BEHAVIOR RECOGNITION
Wu et al. propose a novel device-free through-wall human behavior identification system in 2018, called TW-See [61]. This system identifies seven kinds of behaviors (e.g., walking, hand swing, and boxing) by applying opposite robust PCA (Or-PCA), a low-pass filter, and a BP neural network. Specifically, the authors utilize a low-pass filter to eliminate the noise of raw CSI data. Next, Or-PCA is applied to extract the correlation between human motion and changes of CSI values. Furthermore, the authors adopt a normalized variance sliding window algorithm to estimate the beginning and end of the behaviors and segment activity samples. Besides, eight kinds of features, including STD, MAD, IR, DA, MEA, and so on, are fed into a three-layer BP neural network. The authors collect samples in three different environments (environments 1 and 2: a 12-inch concrete wall; environment 3: a wall whose top is glass and whose bottom is concrete). The experiments demonstrate that TW-See can identify activities in a through-the-concrete-wall scenario with an average accuracy of 94.46%.
The system in [64] can accurately count the repetitions of bodyweight exercises by analyzing the Doppler effect caused by human activity. It first applies an impulse-based method to automatically segment each exercise, and then SVM is utilized to count the number of activities according to the Doppler features of three exercises (i.e., sit-up, push-up, and squat). The authors invite 20 persons to participate in experiments and collect a total of 4350 samples in an office and a meeting room. The results show that this system counts the repetitions of bodyweight exercises with an accuracy of 99% and classifies the exercise type with an accuracy of 95.8%.
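The normalized variance sliding-window segmentation mentioned for TW-See can be illustrated roughly as follows. This is a sketch under assumptions: the window length, the threshold, the use of window centers as boundaries, and the synthetic signal are hypothetical choices, not parameters reported in [61].

```python
import numpy as np

def segment_by_variance(signal, window=50, threshold=0.2):
    """Return (start, end) as the centers of the first and last windows
    whose normalized variance exceeds the threshold, or None if none does."""
    n_windows = len(signal) - window + 1
    var = np.array([np.var(signal[i:i + window]) for i in range(n_windows)])
    peak = var.max()
    if peak == 0:
        return None
    var_norm = var / peak  # normalize so the threshold is independent of signal scale
    active = np.flatnonzero(var_norm > threshold)
    if active.size == 0:
        return None
    half = window // 2
    return int(active[0] + half), int(active[-1] + half)

# Synthetic CSI amplitude: quiet background with an activity between samples 400 and 600
rng = np.random.default_rng(0)
csi = 0.01 * rng.normal(size=1000)
csi[400:600] += np.sin(2 * np.pi * 0.05 * np.arange(200))

start, end = segment_by_variance(csi)
print(start, end)
```

The estimated boundaries are only accurate to within about half a window, which is the usual trade-off between a smooth variance estimate and temporal resolution.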

d: DANGER-POSE DETECTION
Zhang et al. [65] propose a device-free danger-pose detection system in 2018. This system can automatically identify three kinds of dangerous situations when taking a shower in the bathroom: a steady lying position, the whole body sinking below the water surface, and the face sinking below the water surface. The central components of this system are preprocessing, activity classification, and danger detection. Specifically, the authors select amplitude and phase as base signals and then calibrate the phase. Furthermore, the authors extract static and dynamic features from low-frequency CSI signals and feed these features into a one-class SVM. The authors evaluate the performance of this system using three criteria: precision, recall, and F1-score. The experimental results demonstrate that this system recognizes danger poses with 83.61% precision, 96.23% recall, and 89.47% F1-score.
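Since the F1-score is the harmonic mean of precision and recall, the three reported figures can be cross-checked with a few lines of arithmetic. The helper name below is ours; only the precision and recall values come from the text.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Precision and recall reported for the danger-pose detection system [65]
f1 = f1_score(0.8361, 0.9623)
print(f"F1 = {100 * f1:.2f}%")
```

The result agrees with the reported 89.47% to within rounding of the inputs.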

e: ABNORMAL ACTIVITY DETECTION
Zhu et al. propose a device-free non-invasive abnormal activity detection system in 2017, called NotiFi [66]. This system leverages PCA, a non-parametric Bayesian model, and the Hierarchical Dirichlet Process (HDP) to detect five kinds of abnormal actions (e.g., slipping on the ground, falling, running, and breath pausing). Specifically, the authors first exploit PCA to remove noise and then design a model that describes an activity as a CSI state trajectory, including amplitude and phase information. The abnormal activities are identified by exploiting the non-parametric Bayesian model and the Dynamic Hierarchical Dirichlet Process. NotiFi is evaluated in three scenarios (LOS, NLOS, and through-one-wall) in three rooms: an office, a laboratory, and an apartment. The experimental results show that NotiFi achieves an accuracy of 89.2% in LOS, 85.6% in NLOS, and 75.3% in the through-one-wall scenario.

f: FALLING DETECTION
Wang et al. propose a real-time, contactless, and cost-saving indoor fall identification system, called RT-Fall [69], in 2017. The authors utilize the phase, amplitude, and phase difference of CSI to detect falling motion. They discover that the phase difference of CSI is sensitive to fall and fall-like activities. Therefore, they exploit the phase difference for activity segmentation based on the correlation between human activities and the variance of the phase change. In addition, the end of a fall or fall-like activity leads to a sharp power profile decline pattern in the time-frequency domain, which can be utilized to validate the activity segmentation. After signal preprocessing (e.g., interpolation and band-pass filtering), fall-like activity segmentation is performed to separate fall-related activities. Next, eight features are obtained in real time and fed into v-SVM to identify fall activity. Experiments are conducted in four indoor scenarios, and the results are compared with WiFall to validate the algorithm performance. The results demonstrate that RT-Fall can separate falls with 100% precision and achieves 91% sensitivity and 92% specificity. Moreover, the average sensitivity and specificity of RT-Fall are 14% and 10% higher than those of WiFall, respectively.

2) FINE-GRAINED SPECIFIC BEHAVIOR RECOGNITION a: HAND GESTURE RECOGNITION
Tian et al. propose a device-free hand gesture identification system, called WiCatch [80], in 2018. This system identifies nine kinds of hand gestures by applying the phase information of CSI, including one-hand gestures (e.g., pull, push, slide, leftward, rightward, and wave hand) and two-hand gestures (e.g., boxing, open the fridge, and open the window). The system consists of four main processes: interference elimination, virtual array construction, motion trajectory reconstruction, and gesture classification. Specifically, the authors eliminate phase errors caused by SFO, CFO, and STO (a random time shift that can cause a phase offset of CSI) and then utilize the Gerschgorin Disk Criterion (GDC) algorithm to estimate the number of angles of hand gestures. Moreover, the authors apply SVM with a radial basis function (RBF) kernel to classify the spectrums of hand gestures and achieve single-hand and two-hand gesture identification with average accuracies of 0.97 and 0.96, respectively.

b: PHYSIOLOGICAL INDICATOR DETECTION
Liu et al. [88] propose a breathing and heart rate detection system for sleeping users in 2018, which can identify sleep postures (e.g., curl up, supine, prone, and recumbent) and track the breathing rate. The main components of the system are CSI collection, coarse sleep event detection, heart rate estimation, breath rate estimation, and sleep event and posture identification. Specifically, the authors apply a Hampel filter to remove outliers from each CSI subcarrier, and then a moving average filter is adopted to remove high-frequency noise that does not come from breathing or heartbeats. After that, the authors apply a PSD-based approach to estimate the human breathing rate. To evaluate the system performance, the authors utilize another four techniques to identify respiration rates: discriminant analysis (DA), KNN, SVM, and RF. For heart rate estimation, 57% of the errors are less than 2 b/min and over 90% are less than 4 b/min. Moreover, this system achieves over 90% accuracy for recognizing four sleep postures using KNN (K = 5), SVM, and RF.
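A PSD-based breathing rate estimate of the kind described above can be sketched as follows. The sampling rate, the 0.1 to 0.5 Hz search band, the filter width, and the synthetic signal are assumptions made for this illustration; outlier removal with a Hampel filter is omitted for brevity, and the code is not the implementation of [88].

```python
import numpy as np

def moving_average(x, width):
    """Smooth a 1-D signal to suppress high-frequency noise."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def breathing_rate_bpm(amplitude, fs, band=(0.1, 0.5)):
    """Estimate breaths per minute from the strongest PSD peak inside `band` (Hz)."""
    x = amplitude - np.mean(amplitude)       # remove the static (DC) component
    spectrum = np.abs(np.fft.rfft(x)) ** 2   # periodogram as a simple PSD estimate
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    peak_freq = freqs[mask][np.argmax(spectrum[mask])]
    return 60.0 * peak_freq

# Synthetic CSI amplitude: 0.25 Hz breathing (15 breaths/min) plus noise, 60 s at 20 Hz
fs = 20.0
t = np.arange(1200) / fs
rng = np.random.default_rng(0)
csi_amp = 1.0 + 0.1 * np.sin(2 * np.pi * 0.25 * t) + 0.02 * rng.normal(size=t.size)

rate = breathing_rate_bpm(moving_average(csi_amp, width=5), fs)
print(rate)
```

The frequency resolution is the inverse of the observation length (here 1/60 Hz, or one breath per minute), so longer windows give finer rate estimates at the cost of slower updates.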

c: SIGN LANGUAGE RECOGNITION
Shang et al. propose WiSign [83] and improve it [82] in 2017, two sign language recognition systems that recognize eight sign language gestures such as Hello, Thanks, Yes, and No. The main characteristic of WiSign is that it recognizes gestures from training data with only sparse labels by using transfer learning and semi-supervised learning. Specifically, the authors choose two features as input vectors from eight waveform statistic features calculated from the received raw signal. Next, for the semi-supervised learning-based solution, the authors classify unlabeled data with a small amount of labeled data based on SVM and KNN. In addition, the authors discretize the feature values of any two samples and judge whether they are similar according to the absolute value given by the discretized feature calculation formula. Based on this calculation and SVM, the authors implement the transfer learning-based solution. The experiments show that WiSign achieves mean prediction accuracies of 87.01% with the transfer learning-based approach and 87.38% with the semi-supervised learning-based approach for all participants, which are better than those of traditional SVM. To improve recognition accuracy, another laptop is introduced and different SVM kernel functions are applied [82]. Meanwhile, for some complicated gestures whose signals often overlap, a weighted voting system is employed. The experiments show that the system obtains a better mean false positive rate of 1.55% and raises the recognition accuracy to 93.8% compared with having only one laptop in the same environment.
WiHear [89], proposed in 2016, is a novel system to hear talks with WiFi signals. Its two core components are the mouth motion profile for feature extraction and the learning-based signal processing approach for lip reading. Mouth motion profile construction consists of the following steps: localizing the mouth, filtering out interference and reflection using a third-order Butterworth IIR band-pass filter, partial multipath removal of paths with delays over 500 ns in the time domain using FFT, and building the mouth motion profile by calculating a coefficient C. The coefficient C depicts the peak-to-peak value on every subcarrier in a sliding window and is the representative value of every time slot. Afterward, the authors apply discrete wavelet packet decomposition with a fourth-order Symlet wavelet filter to the filtered signal. Lip reading includes the following steps: segmentation by detecting silent intervals, feature extraction with the Multi-Cluster/Class Feature Selection (MCFS) scheme, which extracts the main features from wavelet profiles, and classification with dynamic time warping (DTW). The authors conduct experiments on the recognition of 14 different syllables under six scenarios, including LOS, NLOS, and through-the-wall environments. Experiment results show that WiHear achieves a recognition accuracy of 91% for a single person speaking fewer than seven words and up to 74% for hearing fewer than four users simultaneously.

e: KEYSTROKE DETECTION
Ali et al. propose a keystroke recognition system in 2015, called WiKey [91]. This system detects keystrokes according to the movement of hands and fingers. Firstly, the authors apply a Butterworth filter and PCA to eliminate high-frequency noise and extract the signals that only contain variations caused by hand movements. The authors determine the start and end of activities by detecting an increase and a decrease in the rate of change of the CSI time series. Based on experimental observation, means, variances, or frequency components calculated from the waveform cannot be used as features because different keystrokes yield similar values. Therefore, the authors utilize the extracted keystroke waveforms themselves as the feature representation, since the waveform shapes contain both time- and frequency-domain information and are suitable for classification. The authors choose the Daubechies D4 (four coefficients per filter) wavelet and scaling filters because they preserve enough information at a manageable computational cost. WiKey utilizes an ensemble of KNN classifiers with the DTW distance as the comparison metric between keystroke shape features. Extensive experiments demonstrate that WiKey achieves a keystroke detection rate of more than 97.5% and a single-key recognition accuracy of 96.4%. In real-world experiments, WiKey achieves a recognition accuracy of 93.5% for keystrokes in a continuously typed sentence.
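The combination of KNN with a DTW distance can be sketched as below. This is a simplified, single-classifier illustration rather than WiKey's ensemble; the template waveforms and the labels `key_a` and `key_b` are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def knn_dtw_predict(query, templates, labels, k=1):
    """Label a waveform by majority vote among its k nearest templates under DTW."""
    dists = [dtw_distance(query, t) for t in templates]
    nearest = np.argsort(dists)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Two hypothetical keystroke templates and a time-stretched query of the first one
t40 = np.linspace(0, 1, 40)
templates = [np.sin(2 * np.pi * t40), t40.copy()]
labels = ["key_a", "key_b"]
query = np.sin(2 * np.pi * np.linspace(0, 1, 55))

prediction = knn_dtw_predict(query, templates, labels)
print(prediction)
```

DTW is used here because two executions of the same keystroke rarely have the same duration; the warping path absorbs such speed differences before the shapes are compared.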

f: WRITING RECOGNITION
Zhang et al. propose a novel device-free letter recognition system, called LetFi [94], in 2018. This system detects the hand motion of writing the 26 capital letters in the air and estimates which letter has been written. The main processes are robust CSI measurement extraction, detection of the start and end points of letter-writing activities, multi-domain feature extraction based on the coherence histogram, and action classification using an SOM network. Specifically, the authors apply PCA to extract the first principal component and utilize it as the metric to detect the start or end of the letter-writing action. Furthermore, the authors utilize the Fast Fourier Transform (FFT) to acquire multi-domain features that contain the amplitude and phase information of CSI matrices. Next, the authors use SOM neural networks to train and test 832 (16 × 26 × 2) CSI samples and realize accurate classification of the multi-domain features. The results show that LetFi achieves 95% identification precision for recognizing the 26 capital letters.
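Using the first principal component as an activity indicator, as described for LetFi, can be sketched with a plain SVD-based PCA. The matrix shape (time samples by subcarriers), noise level, and synthetic motion are assumptions for illustration and are not taken from [94].

```python
import numpy as np

def first_principal_component(csi):
    """Project a (time x subcarrier) matrix onto its first principal component."""
    centered = csi - csi.mean(axis=0)
    # Right singular vectors of the centered data are the principal directions
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

# Synthetic CSI: 30 noisy subcarriers with a correlated motion between samples 200 and 300
rng = np.random.default_rng(0)
csi = 0.05 * rng.normal(size=(500, 30))
gains = rng.uniform(0.5, 1.5, size=30)           # each subcarrier reacts with its own gain
motion = np.sin(2 * np.pi * np.arange(100) / 25)
csi[200:300] += np.outer(motion, gains)

pc1 = first_principal_component(csi)
print(np.var(pc1[:200]), np.var(pc1[200:300]))
```

Because motion affects many subcarriers in a correlated way while noise does not, the first component concentrates the motion energy, and a threshold on its short-term variance can mark the start and end of an action.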

g: SEDENTARY BEHAVIOR MONITORING
Yang et al. propose CareFi [96] in 2018, a device-free Sedentary Behavior (SB) monitoring system. In CareFi, the authors categorize sedentary behaviors into dynamic and static activities, which have different properties. The authors find that different subcarriers are sensitive to different parts of the human body. They employ PCA to obtain the trends of CSI changes and distinguish dynamic activities based on main features such as variance or kurtosis. Different from other applications, the authors utilize frequency information to capture the critical features representing static postures and occupancy position. In addition, they propose a foreground detection method based on kernel density estimation (KDE) to extract coarse motion. In the preprocessing phase, linear interpolation, subcarrier refinement with IFFT and FFT, and DWT coefficients for temporal de-noising are utilized to acquire the sanitized CSI data. For dynamic activities, the authors choose the first four PCA components and eight features to describe activities: entropy, mean, variance, standard deviation, kurtosis, skewness, interquartile range, and mean crossing rate. The authors apply the Pearson product-moment correlation coefficient (PPMCC) to measure the similarity between two signal vectors and distinguish static postures. According to the experiment results, CareFi achieves 94.9% average accuracy for recognizing six common activities: reading, writing, typing, drinking, sitting up or down, and using the phone. The classification accuracy of human activities under environmental interference reaches 97.5%, and the accuracy for other conditions varies from 90.5% to 100.0%.
Smokey [97], proposed in 2016, is a ubiquitous passive smoking detection system. The smoking procedure is decomposed into six steps in a certain order. The authors find that different smoking phases affect different subcarriers because the subcarriers have distinct sensitivity to the motions of different parts of the human body. Based on this principle, the authors utilize the rhythm/order information to detect smoking activities from time-varying and subcarrier-dependent CSI information. The system consists of data processing, motion acquisition, and activity analysis. Specifically, due to the inherent noise of the CSI stream, linear interpolation is adopted on the irregular data to obtain a CSI sequence with samples evenly spaced in time. Afterward, the variations of CSI caused by motions are separated from dynamic noise using an image processing algorithm, and the composite motions are acquired based on temporal correlation and frequency correlation. Then, the authors utilize autocorrelation to analyze the periodicity of the composite motions in each detection window and identify smoking motions according to the range of the smoking period and a threshold on the standard deviation of the periods. This system achieves 92.8% detection accuracy for smoking activities and misjudges normal activities as smoking with a 2.3% error. In a static environment, the true positive rate (TPR) of Smokey can be as high as 0.976, and the average false positive rate (FPR) is as low as 0.008, while in a dynamic environment, the TPR drops to 0.919 and the FPR increases to 0.097.
Zou et al. propose FreeCount [99] in 2017, a device-free crowd counting system based on CSI data. Firstly, the authors apply a wavelet-based de-noising scheme to remove the ambient noise of the raw CSI data. Then, some features (e.g., statistics features, transformation-based features, and shape-based features) are calculated. Afterward, the authors choose the most representative features, which are sensitive to human motion while resilient to environmental variation. Finally, rather than using the raw CSI feature space, the authors develop a robust crowd counting classifier based on transfer kernel learning (TKL) in the reproducing kernel Hilbert space to construct a domain-invariant kernel for SVM training. With TKL, this system achieves 91.97% estimation accuracy, outperforming SVM with an RBF kernel by 38.78%. In all, FreeCount estimates the number of people with 96% accuracy for up to seven users, and this accuracy remains consistent under temporal and environmental variation.
Zou et al. propose WiFree [101] in 2018, a novel application for occupancy detection and crowd counting. This system applies OpenWrt and the Atheros CSI tool, which collects more CSI data related to human behavior. The authors apply a low-pass filter and a wavelet-based de-noising scheme to filter the noise and sanitize the raw CSI stream. In addition, the authors apply DTW to extract features and count the crowd by using transfer kernel learning (TKL) and SVM. The system is tested with seven volunteers in three rooms of different sizes (a discussion room, a conference room, and a seminar room). The results demonstrate that WiFree achieves 99.1% accuracy for occupancy detection and 92.8% accuracy for crowd counting.

c: STEP COUNTING
Zhang et al. propose a device-free multi-runner step counting system in 2018, called Wi-Run [104]. This system automatically estimates the number of steps when users run on a treadmill by three novel approaches: Canonical Polyadic (CP) decomposition, stable signal matching, and a peak detection method. Specifically, the authors apply a Hampel filter and a Savitzky-Golay filter to eliminate noise that contains both low- and high-frequency components. Next, CP decomposition is applied to separate the signals related to running. Finally, the authors apply a peak detection method to estimate the steps of each runner. The experiment results show that Wi-Run achieves 88.25% average accuracy for overall step estimation. Moreover, this system achieves average accuracies of 91.30%, 90.21%, 86.97%, and 84.53% as the number of runners increases from two to five, respectively.
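The final peak detection step can be illustrated with a simple local-maximum counter. The height and spacing thresholds, the sampling rate, and the noise-free synthetic signal are assumptions for this sketch; Wi-Run's actual detector operates on the decomposed per-runner signals and is not specified in this detail here.

```python
import numpy as np

def count_peaks(signal, min_height, min_distance):
    """Count local maxima above `min_height` that are at least `min_distance` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= min_height:
            if not peaks or i - peaks[-1] >= min_distance:
                peaks.append(i)
    return len(peaks)

# Idealized running signal: 2.5 steps per second, sampled at 50 Hz for 10 s
fs = 50
t = np.arange(500) / fs
step_signal = np.sin(2 * np.pi * 2.5 * t)

steps = count_peaks(step_signal, min_height=0.5, min_distance=fs // 5)
print(steps)
```

The minimum spacing encodes a physical prior (a runner cannot take two steps within a fraction of a second) and suppresses double counting when residual noise creates several maxima around one step.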
Xu et al. propose a novel device-free step counting system based on torso frequency analysis, called WiStep [105], in 2018. This system can count in-place walking steps even when the user's torso speed is almost zero. Firstly, the authors apply a Butterworth band-pass filter, PCA, and DWT to filter high-frequency noise caused by external interference, and then a time-frequency analysis method is utilized to segment and identify walking by analyzing the amplitude variances of CSI streams. Furthermore, wavelet decomposition is applied to extract the useful coefficients related to the frequencies induced by feet or legs. Finally, the authors apply the short-time energy of the coefficients to count steps. According to extensive experiments, WiStep achieves 90.2% and 87.59% accuracy for overall step counting in a laboratory and a classroom, respectively.
The system in [108], proposed in 2017, is a novel device-free moving human detection system for through-the-wall environments. This system employs a Hampel identifier, PCA, a low-pass filter, and SVM to detect human behavior. The authors adopt the Hampel filter to remove outliers, i.e., points falling outside a closed interval, and then apply a 1-D linear interpolation algorithm to obtain continuous samples located in consecutive and even time slots. Afterward, they utilize a low-pass filter and PCA to remove noise in the high-frequency part of the spectrum. After human detection and feature extraction, the authors apply SVM to classify the samples. The system achieves detection rates of over 99% in many general experimental scenes with different wall materials, moving speeds, and so on.
Wi-Sign [119], proposed in 2018, is a unique two-factor authentication (2FA) system that does not depend on an auxiliary device to determine the second factor. This system recognizes user identity by exploiting the signing motion. In other words, the authors identify the hand geometry and the way the hand moves to estimate a person's identity. Firstly, the authors utilize the Inverse Fast Fourier Transform (IFFT) to transform the frequency-domain CSI into the time-domain Channel Impulse Response (CIR) and remove multipath components with delays of more than 500 ns. Next, the authors apply a low-pass Butterworth filter, DWT, and PCA to filter high-frequency noise and acquire the detail coefficients. A threshold method based on the first-order variance of the selected principal component is applied to determine the start and end of the hand motion. Besides, the authors use a one-class SVM (LIBSVM) with an RBF kernel to classify eight features, including mean, skewness, kurtosis, standard deviation, etc. The experiment results show that Wi-Sign achieves an average TPR of 79% for identity recognition among 14 users and an average TNR of 86% for attack detection (the common attack is that an intruder mimics the signer).
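Removing long-delay multipath by moving between CSI and CIR can be sketched as follows. The sketch assumes that the CSI samples cover the channel bandwidth uniformly, so that CIR tap k corresponds to a delay of k divided by the bandwidth; the 20 MHz bandwidth, the 64-point length, and the function name are illustrative assumptions, not details from [119].

```python
import numpy as np

def remove_long_delay_paths(csi, bandwidth_hz, max_delay_s=500e-9):
    """Zero the CIR taps whose delay exceeds `max_delay_s` and return the cleaned CSI."""
    n = len(csi)
    cir = np.fft.ifft(csi)                     # frequency domain -> time domain (CIR)
    tap_delays = np.arange(n) / bandwidth_hz   # assumed delay of tap k: k / bandwidth
    cir_clean = np.where(tap_delays <= max_delay_s, cir, 0.0)
    return np.fft.fft(cir_clean)               # back to the frequency domain

# Synthetic channel: a strong path at 100 ns and a weaker reflection at 1000 ns
n_points, bandwidth = 64, 20e6
cir_true = np.zeros(n_points, dtype=complex)
cir_true[2] = 1.0    # tap 2  -> 2 / 20 MHz  = 100 ns
cir_true[20] = 0.5   # tap 20 -> 20 / 20 MHz = 1000 ns
csi = np.fft.fft(cir_true)

cleaned = remove_long_delay_paths(csi, bandwidth)
recovered = np.fft.ifft(cleaned)
print(abs(recovered[2]), abs(recovered[20]))
```

With a 20 MHz channel the delay resolution is 50 ns, so the 500 ns cutoff keeps only roughly the first ten taps; wider channels give a finer separation of paths.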

B. MODEL-BASED HUMAN BEHAVIOR RECOGNITION APPLICATIONS
Currently, there are some model-based human behavior recognition applications using CSI. These applications utilize physical laws and mathematical models to depict the CSI variation and recognize human behavior. Table 7 presents the main components of each application, including the device used, preprocessing method, test environment, number of users, classifier, and recognition performance. From this table, we find many evident differences between the model-based approach and the pattern-based approach. Specifically, there are more daily activity recognition and user authentication applications using a pattern-based method, while there are only one or two similar applications using the model-based method. Conversely, the number of model-based respiration applications far exceeds that of pattern-based ones. The leading cause is the difference in the characteristics of these two methods: pattern-based methods are suitable for the recognition or inference of large-range activities, while model-based methods are fit to recognize periodic activities with fine granularity. Besides, we notice that the numbers of recognized behavior types are very different. There are six types of model-based applications but about eighteen types of pattern-based applications. The reason may be that building an appropriate model is challenging because many factors affect the precision of the model. The typical model-based applications include daily behavior recognition [124], [125], human detection [129], walking direction estimation [126], and respiration detection [24], [131]-[136]. We interpret the key elements of these applications as follows.

1) COARSE-GRAINED SPECIFIC BEHAVIOR RECOGNITION DAILY ACTIVITY RECOGNITION
Wang et al. propose a CSI-based human activity recognition and monitoring system, called CARM [124], in 2015. Two theoretical models are proposed: a CSI-speed model, which describes the relationship between CSI dynamics and human movement speeds, and a CSI-activity model, which depicts the relationship between the movement speeds of human body parts and a specific individual action. Furthermore, the authors propose a series of signal processing approaches, including PCA for de-noising, DWT to extract features that represent the movement speeds, and HMM to handle different activity speeds of individuals. The system uses the Baum-Welch algorithm to calculate the mean vector and covariance matrix associated with each state and the transition probabilities of the HMM. In addition, the authors apply the Exponential Moving Average (EMA) algorithm to adjust the detection threshold that determines the start or end of an activity. Based on these models and algorithms, the authors deeply analyze the correlation between CSI values and human speed. Experiments with eight different actions in a laboratory and an apartment are used to validate the system. The results show that CARM can effectively detect small movements and large movements with a true positive rate (TPR) larger than 98% at distances of 5 meters and 12 meters, respectively. Besides, it achieves an impressive mean activity identification accuracy of 96% across all activities in the trained environment and an identification accuracy of more than 80% in a new environment and with a new individual.

2) FINE-GRAINED SPECIFIC BEHAVIOR RECOGNITION a: HANDWRITING RECOGNITION
Li et al. propose the first WiFi-based hand motion tracking system, called WiDraw [130], in 2015. This system automatically tracks the hand's trajectory and estimates the letters written. The key structure of WiDraw consists of initial phase calibration and the MUSIC algorithm. Specifically, the authors use a laptop equipped with an Atheros 9590 to collect CSI data and then calibrate the phase of the CSI stream. After that, the authors utilize a low-pass filter to eliminate noise caused by environmental interference. In addition, MUSIC's 1-D AoA model is adopted to track the hand's trajectory and realize the identification of drawn letters, words, and sentences. According to experimental results, WiDraw identifies in-air handwriting (e.g., letters, words, and sentences) with an average precision of 91%.
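An EMA-adjusted detection threshold of the kind mentioned for CARM can be sketched as follows. The survey does not give CARM's exact update rule, so the smoothing factor, the multiplicative margin, the choice to freeze the baseline during activity, and the synthetic energy trace are all assumptions made for illustration.

```python
import numpy as np

def ema_activity_mask(energy, alpha=0.05, factor=3.0):
    """Flag samples whose energy exceeds `factor` times an EMA of the background level."""
    baseline = float(energy[0])
    mask = np.zeros(len(energy), dtype=bool)
    for i, e in enumerate(energy):
        mask[i] = e > factor * baseline
        if not mask[i]:
            # Update the baseline only in quiet periods so that activity does not inflate it
            baseline = (1 - alpha) * baseline + alpha * e
    return mask

# Synthetic signal energy: background near 1.0 with an activity between samples 300 and 400
rng = np.random.default_rng(0)
energy = 1.0 + 0.1 * rng.random(500)
energy[300:400] = 10.0

mask = ema_activity_mask(energy)
active = np.flatnonzero(mask)
print(active[0], active[-1])
```

Tracking the background with an EMA lets the threshold follow slow environmental drift, which a fixed threshold calibrated once would not do.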
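A 1-D MUSIC angle-of-arrival estimate can be sketched as follows for a uniform linear array. The three antennas, half-wavelength spacing, single source, noise level, and scan grid are assumptions for this example and do not reproduce WiDraw's configuration.

```python
import numpy as np

def steering_vector(theta_rad, n_antennas, spacing_wavelengths=0.5):
    """Array response of a uniform linear array for a plane wave from angle theta."""
    k = np.arange(n_antennas)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta_rad))

def music_aoa(snapshots, n_sources=1, grid_deg=None):
    """Return the angle (degrees) that maximizes the MUSIC pseudo-spectrum.

    snapshots: complex array of shape (n_antennas, n_snapshots).
    """
    if grid_deg is None:
        grid_deg = np.arange(-90.0, 90.5, 0.5)
    n_antennas = snapshots.shape[0]
    cov = snapshots @ snapshots.conj().T / snapshots.shape[1]
    _, eigvecs = np.linalg.eigh(cov)                    # eigenvalues in ascending order
    noise_space = eigvecs[:, : n_antennas - n_sources]  # eigenvectors of the smallest eigenvalues
    spectrum = []
    for deg in grid_deg:
        a = steering_vector(np.deg2rad(deg), n_antennas)
        proj = noise_space.conj().T @ a
        # Steering vectors of true sources are nearly orthogonal to the noise subspace
        spectrum.append(1.0 / (np.real(proj.conj() @ proj) + 1e-12))
    return float(grid_deg[int(np.argmax(spectrum))])

# Synthetic measurement: one source at 30 degrees observed by 3 antennas over 200 snapshots
rng = np.random.default_rng(0)
a_true = steering_vector(np.deg2rad(30.0), 3)
source = rng.normal(size=200) + 1j * rng.normal(size=200)
awgn = 0.05 * (rng.normal(size=(3, 200)) + 1j * rng.normal(size=(3, 200)))
x = np.outer(a_true, source) + awgn

estimate = music_aoa(x)
print(estimate)
```

Tracking a hand then amounts to repeating such an estimate over successive short windows and following how the peak angle changes over time.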

b: RESPIRATION DETECTION
Zhang et al. propose a novel contact-free breath tracking system in 2019, called BreathTrack [136]. This system can automatically track the status of respiration by exploiting a Hampel filter, an FIR high-pass filter, and a joint AoA-TOF sparse recovery method. Specifically, the authors apply the Hampel filter and the FIR high-pass filter (with a cutoff frequency of 0.05 Hz) to eliminate low-frequency noise. To avoid phase distortions (e.g., Carrier Frequency Offset (CFO), Sampling Frequency Offset (SFO), Packet Detection Delay (PDD), and PLL Phase Offset (PPO)), the authors apply a combination of hardware and software methods. Furthermore, the joint AoA-TOF sparse recovery method is adopted to obtain the phase variation of the attenuation coefficient and to acquire the respiratory state and breathing rate. The results show that BreathTrack achieves more than 99% breath detection precision in most scenarios.
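The Hampel filter mentioned above is a common outlier-removal step in CSI preprocessing. A minimal sketch is given below; the window size and the threshold of three scaled median absolute deviations are assumed defaults, not values taken from BreathTrack.

```python
from statistics import median

def hampel_filter(x, half_window=3, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier).

    half_window and n_sigmas are assumed defaults for illustration.
    """
    # Scale factor that makes the median absolute deviation (MAD) a
    # consistent estimate of the standard deviation for Gaussian data.
    scale = 1.4826
    y = list(x)
    for i in range(len(x)):
        lo = max(0, i - half_window)
        hi = min(len(x), i + half_window + 1)
        window = x[lo:hi]
        med = median(window)
        mad = scale * median(abs(v - med) for v in window)
        if abs(x[i] - med) > n_sigmas * mad:
            y[i] = med
    return y
```

Unlike a plain moving average, this filter leaves samples untouched unless they deviate strongly from their neighbourhood, which preserves the slow periodic component caused by breathing.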

3) ACTIVITY INFERENCE a: WALKING DIRECTION ESTIMATION
Wu et al. propose the first CSI-based device-free human walking direction estimation system, called WiDir [129], in 2016. This system estimates the walking direction angle in real time by analyzing the phase between two subcarriers. WiDir mainly consists of data acquisition, pre-processing, feature extraction, and direction calculation. Specifically, the authors apply the Savitzky-Golay filter to remove noise and smooth the CSI signals. After phase delay estimation and Fresnel direction estimation, a temporal-spatial model is applied to infer the walking direction. Extensive results demonstrate that WiDir can successfully detect the human walking direction with a median error of less than 10 degrees.
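As an illustration of the Savitzky-Golay smoothing step, the sketch below applies the standard 5-point quadratic Savitzky-Golay kernel, whose coefficients are (-3, 12, 17, 12, -3)/35. The window length and polynomial order are assumptions chosen for brevity and are not the settings reported for WiDir; the two samples at each edge are simply left unchanged.

```python
# Convolution coefficients of the 5-point, order-2 Savitzky-Golay
# smoothing filter (to be divided by 35).
SG_5_2 = (-3, 12, 17, 12, -3)

def savgol_5(x):
    """Smooth a sequence with a 5-point quadratic Savitzky-Golay kernel."""
    y = list(x)  # edge samples are kept as they are
    for i in range(2, len(x) - 2):
        y[i] = sum(c * x[i + k - 2] for k, c in enumerate(SG_5_2)) / 35
    return y
```

The filter fits a local polynomial by least squares, so a signal that is already quadratic passes through unchanged while isolated spikes are attenuated.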

b: USER AUTHENTICATION
Wang et al. propose a novel device-free person identification system, called WiPIN [128], in 2018. This system can recognize user walking and infer the user's identity by utilizing an SVM and an interacting model. The authors apply several techniques to process the CSI data, including a Butterworth filter for de-noising and multipath effect removal based on an IFFT-FFT frequency-domain method. After acquiring robust features that can represent walking, the authors utilize the interacting model and the SVM to analyze the correlation between walking characteristics and the user's identity. According to extensive experimental results, WiPIN achieves 92% precision in identity recognition over a group of 30 users.

c: HUMAN PRESENCE DETECTION
Xin et al. propose a novel device-free indoor human presence detection system, called FreeSense [126], in 2018. It not only implements human presence recognition but also models the sensing coverage for movements. This system can automatically detect indoor users by using the MUSIC algorithm, the Fresnel zone model, AoA, and the Wi-HD model, and it is highly resistant to interference such as the multipath effect and device differences. Firstly, the authors apply the MUSIC algorithm to evaluate the phase difference between the waveforms at the receiving antennas. Furthermore, the Fresnel zone model, AoA, and the Wi-HD model are utilized to estimate the coverage size and the sensing granularity. The experiments demonstrate that FreeSense achieves an average false positive (FP) rate of 0.53% and an average false negative (FN) rate of 1.40%; moreover, the coverage estimation approach of FreeSense has an average accuracy of 1.36 m.

C. DEEP LEARNING-BASED BEHAVIOR RECOGNITION APPLICATIONS
Currently, there are some deep learning-based human behavior recognition applications using CSI. These applications leverage the striking advantages of deep learning to recognize human behaviors. Table 8 analyzes the key features of recognition applications using deep learning from the following aspects: the device used, preprocessing method, test environment, number of users, classifier, and recognition performance. From this table, we notice that the number of recognized behavior types using deep learning lies between that of the model-based and the pattern-based approaches. We deem that deep learning is becoming an important research method in behavior recognition using CSI. Regarding recognition approaches, we find that most applications apply common deep models (e.g., Autoencoder, CNN, LSTM, RNN, and ResNet), which indicates that general deep learning algorithms can be used for CSI-based behavior recognition and that their capability of automatic feature extraction and classification can be exploited when developing these systems. We believe that more and more CSI-based behavior recognition applications will employ deep learning algorithms because they can build and extract high-level behavior features that usually cannot be extracted by pattern-based or model-based methods. The typical deep learning-based applications include daily behavior recognition [138], [139], [141], [142], falling detection [143], [144], syncope detection [145], hand gesture recognition [8], [146], [147], [149], sign language recognition [150], gait and walking direction recognition [151], [152], human detection [153], [154], crowd counting [156]-[158], user authentication [161], and respiration monitoring [22]. We explain the key components of these applications as follows.

1) COARSE-GRAINED SPECIFIC BEHAVIOR RECOGNITION a: DAILY BEHAVIOR RECOGNITION
Wang et al. propose a novel device-free spatial diversity-aware activity recognition system, called WiSDAR [138], in 2018. This system identifies eight kinds of activities (walking, falling, running, sitting, picking, pushing, waving, and boxing) in four environments (laboratory, hall, apartment, and office) by using a low-pass filter, PCA, STFT, CNN, and LSTM. Specifically, because daily activities mainly cause low-frequency components in CSI streams, the authors apply the low-pass filter and PCA to remove high-frequency noise. Moreover, the STFT is adopted for feature extraction. Finally, the authors utilize the CNN and LSTM to train on 5760 CSI samples collected from six students. The experimental results show that WiSDAR achieves a stable identification precision of 96% for the eight kinds of activities.
Zou et al. propose a novel device-free human behavior recognition system, called DeepHare [139], in 2018. This system can automatically identify 5 daily behaviors (sitting, standing, walking, running, and lying down) in three different rooms (conference room, office, and apartment) by utilizing an Autoencoder Long-term Recurrent Convolutional Network (AE-LRCN). Specifically, the authors use the Atheros CSI Tool and OpenWrt to acquire CSI measurements, and then a low-pass filter and a median filter are applied to eliminate inherent noise. Besides, the authors utilize AE-LRCN to train on 400 thousand CSI samples and realize activity recognition. According to the experimental results, DeepHare can identify the 5 types of daily actions with 97.6% accuracy.

b: FALLING DETECTION
DeepFalls [143], proposed in 2017, is a fall detection system based on Wi-Fi spectrograms and deep convolutional nets. This system first linearly interpolates the signal and then uses the Hampel identifier to remove outliers from the CSI stream. Afterward, de-trending of subcarriers, zero-padding, and tapering of CSI waveforms are applied as signal segmentation techniques for the detection of fall and fall-like activities. Next, this system applies Singular Spectral Analysis (SSA) rather than DWT to remove the noise of the CSI amplitudes. Due to the shortcomings of STFT and Continuous Wavelet Transform (CWT) spectrograms, the authors decompose the input signal into Intrinsic Mode Functions (IMFs) with the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN). Then, the Hilbert-Huang Transform (HHT) of the decomposed IMFs is applied to create spectrogram images, which are fed into 10-layer Deep Convolutional Neural Networks (DCNNs) for classification. Extensive experiments demonstrate that DeepFalls achieves satisfactory recognition performance compared with RT-Fall. Across all environments, DeepFalls achieves on average 7.7% higher sensitivity and 11.9% higher specificity than RT-Fall. When the furniture position is changed in the same room, DeepFalls achieves 4.07% higher sensitivity and 14.66% higher specificity than RT-Fall. When training in one scenario and testing in different scenarios, the sensitivity and specificity of DeepFalls are much lower than before. Even in this setting, DeepFalls improves sensitivity and specificity by 9.67% and 12.3%, respectively, over RT-Fall.

2) FINE-GRAINED SPECIFIC BEHAVIOR RECOGNITION a: HAND GESTURE RECOGNITION
Zhou et al. propose a novel device-free real-time finger gesture recognition system in 2018, called DeepNum [147]. This system can automatically detect ten kinds of number finger gestures in three environments (meeting room, corridor, and student studio) by using Higher-Order Singular Value Decomposition (HOSVD) and a CNN. Specifically, the authors apply HOSVD to remove the noise of the CSI stream and acquire useful high-dimensional principal components. In addition, the authors utilize a 7-layer CNN to train on 3000 (300 × 20 × 50%) CSI samples and realize finger gesture recognition. According to the experimental results, DeepNum achieves 98% accuracy for the identification of 10 finger gestures.

b: SIGN LANGUAGE RECOGNITION
Ma et al. propose a sign language recognition system, called SignFi [150], in 2018. This system can identify more than 25 hand gestures by using a 9-layer CNN. Specifically, because the CSI phase varies within the range [−π, π], phase calibration becomes necessary. Besides, a 9-layer CNN and KNN with DTW are utilized to identify 276 sign gestures. According to the experimental results, when SignFi is validated with one user in three settings (laboratory, home, and laboratory and home combined), the system achieves accuracies of 98.01%, 98.91%, and 94.81%, respectively. In addition, SignFi achieves 86.66% mean precision for the recognition of 150 sign gestures performed by five different volunteers.

c: RESPIRATION DETECTION
Khan et al. [22] propose a novel device-free end-to-end deep learning-based approach to detect the human respiration rate using CSI in 2017. The main components of this approach are adaptive cancellation, deep activity classification, and breathing rate estimation. Specifically, the authors use a USRP B200 SDR to collect the CSI data stream and then utilize an adaptive filter to remove the redundant echoes of the signals. Furthermore, the authors utilize Random Forest (RF) and CNN to monitor the human breathing rate in three different environments: a breathing motion environment, a static environment, and a random motion environment. In the static environment, RF detects the breathing rate better than CNN: RF reaches 100% accuracy whereas CNN achieves 94.85%. In the other environments, the CNN model is more robust than RF and outperforms it. Specifically, CNN achieves recognition accuracies of 81.03% and 85.17%, while RF obtains 71.07% and 81.97%, in the random motion and breathing motion environments, respectively.
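The phase calibration mentioned for SignFi can be illustrated with a widely used linear sanitization: unwrap the phase across subcarriers, then subtract the straight line through the end points and the mean offset. This sketch is a generic illustration and not necessarily SignFi's exact procedure; the function names and the subcarrier index list are assumptions.

```python
import math

def unwrap(phases):
    """Remove 2*pi jumps between neighbouring subcarriers."""
    out = [phases[0]]
    for p in phases[1:]:
        d = p - out[-1]
        # Bring the step back into [-pi, pi] by removing whole turns.
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out

def calibrate_phase(phases, indices):
    """Unwrap the phase and remove its linear trend and mean offset.

    phases:  measured (wrapped) phase per subcarrier, in radians.
    indices: subcarrier indices (hypothetical input, e.g. -28..28).
    """
    u = unwrap(phases)
    slope = (u[-1] - u[0]) / (indices[-1] - indices[0])
    offset = sum(u) / len(u)
    return [p - slope * k - offset for p, k in zip(u, indices)]
```

The linear term absorbs timing-related offsets that grow with the subcarrier index, so what remains is the phase variation that the classifier can actually use.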

3) ACTIVITY INFERENCE a: GAIT SENSING AND WALKING DIRECTION ESTIMATION
Xu et al. propose a novel device-free cycle-independent human gait and walking direction detection system in 2018 [151]. This system can recognize human gait and walking direction by using the Inverse Fast Fourier Transform (IFFT), a Butterworth bandpass filter, PCA, STFT, and an attention-based RNN encoder-decoder neural network. Specifically, it requires three steps to handle raw CSI: long delay removal using IFFT, CSI de-noising using the Butterworth bandpass filter, and CSI refining using PCA. In addition, the authors apply the attention-based RNN encoder-decoder neural network to determine human gait and walking direction. The results show that this system can recognize human gait from a group of eight volunteers with an average F1 score of 89.69% and identify the direction from eight walking directions with 95.06% accuracy. Moreover, the mean recognition accuracy considering both direction and human gait exceeds 97%.

b: HUMAN PRESENCE DETECTION
Wang et al. propose an innovative deep learning-based activity recognition system in 2019, called CSI-Net [154]. This system covers four tasks: biometrics estimation (including body fat, muscle, water, and bone rates), human identification, hand sign recognition from 0 to 9, and falling detection. Firstly, the authors adopt a mini-PC equipped with an Intel 5300 NIC to collect a large number of CSI samples (biometrics estimation: 43077, human identification: 43077, hand sign recognition: 23896, and falling detection: 24398). Next, a median filter, a mean filter, a Butterworth filter, and DWT are applied to remove the noise, and then the Local Temporal Average is employed to smooth the CSI streams. Finally, the authors apply DNNs, LibSVM, and Naïve Bayes to classify human activities. The results show that DNNs have better identification performance (human identification: 93%, sign recognition: 100%, and fall detection: 96.67%) than LibSVM (human identification: 85.28%, sign recognition: 90.24%, and fall detection: 81.46%) and Naïve Bayes (human identification: 72.97%, sign recognition: 81%, and fall detection: 73.01%).

c: CROWD COUNTING
Liu et al. propose WiCount [157] in 2017, a robust deep learning-based crowd counting system that identifies walking and other specific activities (e.g., eating, waving, typing, talking, and sitting down). This system uses a Butterworth filter, a weighted moving average filter, and PCA to remove high-frequency noise from the CSI stream. Next, SVM and a Back Propagation Neural Network (BPNN) are applied to realize crowd estimation. According to the experimental results, WiCount can automatically estimate crowds of up to five and ten people with 82.3% and 75% precision, respectively. In addition, the experiments show that combining amplitude and phase information as behavior features can improve crowd counting precision and that the deep learning algorithm performs better than SVM for crowd counting.

d: USER AUTHENTICATION
Lin et al. propose a device-free user authentication system, called WiAU [161], in 2018. This system can automatically recognize 16 kinds of activities (e.g., walk, one arm wave, high arm wave, sit down, drink water, squat, phone call, etc.) and infer a user's identity according to walking gait. Specifically, the authors apply a Butterworth low-pass filter to remove the high-frequency noise of the raw CSI stream. Furthermore, an Automatic Segmentation Algorithm (ASA) is used to segment continuous behaviors. Finally, the authors apply a convolution module that contains one CNN layer and 15 ResNet layers to perform user authentication (i.e., to distinguish legal users from illegal ones). In this experiment, the authors invite 12 volunteers (9 males and 3 females) to participate in the data collection. The data are split into training, validation, and testing sets with a ratio of 60%, 20%, and 20%, respectively. Experimental results confirm that WiAU achieves over 98% precision for identity identification.
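A 60%/20%/20% split of the kind reported for WiAU can be sketched as follows. This is a generic illustration: the function name, the shuffling with a fixed seed, and the per-sample (rather than per-user) split are assumptions, since the splitting procedure is not detailed here.

```python
import random

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle samples and split them into train, validation, and test sets."""
    items = list(samples)
    # A fixed seed (assumed) keeps the split reproducible.
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

Whether samples from the same volunteer may appear in both the training and the test set is a design choice that strongly affects the reported accuracy, so it is worth stating explicitly when comparing systems.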

V. DISCUSSION
In this section, we discuss in detail certain kinds of behaviors from different aspects, including test environment, size of samples, and system performance. This discussion focuses on analyzing the characteristics of these behaviors and comparing the differences between system implementations. We hope this part provides some insights into the development of human behavior recognition using CSI. Based on the representative applications, we analyze some main behaviors (see Fig. 12), such as daily behavior recognition, falling detection, hand gesture identification, crowd counting, and user authentication. Since almost all systems employ the Intel 5300 NIC as the CSI collection device, we do not emphasize the hardware device in the following discussion.

A. DISCUSSION ON DAILY BEHAVIOR RECOGNITION
Daily behavior recognition is an important research field because it can be utilized to identify a person's living state and provides helpful information for evaluating lifestyle. Daily behaviors have been widely studied and can be recognized using different recognition methods. The main reasons may be explained as follows. Firstly, the range of motion of daily behaviors is larger than that of other behaviors. Therefore, the variation of CSI is easily measured and the different behaviors are easily discriminated. Secondly, daily behavior recognition can be treated as a classification problem that can be effectively tackled. Therefore, we find that many applications of daily behavior recognition employ pattern-based methods rather than model-based methods.
We analyze the main characteristics of daily behavior recognition and make some comparisons between recognition methods. Firstly, regarding the size of samples, the number of samples used by pattern-based methods is less than 5000 and the median value is 2000. Model-based methods usually use about 1500 samples, which is fewer than pattern-based methods. Furthermore, deep learning-based methods use more than 5500 samples, except Khan et al. [141], and the maximum value reaches one hundred thousand. Secondly, we observe that the activities are usually conducted in 2 to 4 scenarios to assess system performance and that the number of activity types is usually 5 to 9, which validates the robustness of the system. The overall performance of daily behavior recognition is close to 95% for all three recognition techniques, which indicates that the choice of recognition method is not a decisive factor. Although the performance of deep learning-based methods is slightly higher than that of other methods, they usually require more computation and more data to train the neural network parameters, which is their main disadvantage. The best recognition results are as follows. DeepHare [139] achieves 97.6% precision using a deep learning-based approach, WiHACS [57] achieves 97% precision using pattern-based methods, and CARM [124] applies model-based methods and reaches 96% accuracy. Thirdly, in through-the-wall scenarios, the identification accuracy may decrease, as in WiHACS [57], DFS [60], and TW-See [61]. For a special test environment, Wei et al. [56] consider the effect of radio-frequency interference, which seriously decreases recognition precision. Besides, WiSPPN [142] studies human behaviors by analyzing human pose and calculates key point coordinates instead of behavior types; therefore, it applies a different evaluation metric to assess the system performance.

B. DISCUSSION ON FALLING DETECTION
Falling is a serious threat to life, especially for the elderly. Therefore, many researchers pay attention to falling detection using CSI due to its many advantages. Currently, there are some studies that can automatically detect falling behaviors, such as WiFall [67], Anti-Fall [68], RT-Fall [69], FallDeFi [70], Dong et al. [71], DeepFalls [143], and WmFall [144]. Based on the analysis of these applications, we obtain the following results. Firstly, for falling detection, the pattern-based method is more popular than the deep learning-based method. To validate system performance, many applications conduct the behaviors in 2 to 3 test environments. In addition, we notice that some applications consider falling as one of several recognized behaviors while others, such as DeepFalls and FallDeFi, solely identify the falling activity. Secondly, we discover that the number of samples for falling detection is similar to that for daily behavior when using pattern-based methods because some falling detection applications treat falling as a common behavior. With deep learning, falling detection uses fewer samples than daily behavior recognition because these systems usually only decide whether the behavior is a fall or not, which is a simple binary classification problem. Thirdly, the recognition accuracy of falling is lower than that of daily behavior for both the pattern-based and the deep learning-based approaches because the duration of a fall is very short, which makes detection and identification difficult. Therefore, many applications add another evaluation parameter, the false alarm rate, to provide a more precise assessment. The pattern-based and deep learning-based approaches yield similar results: a detection accuracy of about 90% and a false alarm rate of about 12%. Some typical applications make comparisons with other applications to confirm system improvements. For instance, WiFall achieves 87% detection precision with a false alarm rate of 18%. RT-Fall compares its performance with WiFall and has 14% higher sensitivity and 10% higher specificity than WiFall. Furthermore, FallDeFi and DeepFalls compare their performance with RT-Fall, and both are more sensitive in detecting falls than RT-Fall. FallDeFi achieves 93% accuracy of falling detection, an improvement of 12% and 15% over RT-Fall and CARM, respectively. DeepFalls has on average 7.7% higher sensitivity and 11.9% higher specificity than RT-Fall. Dong et al. [71] realize falling detection in a staircase with 94% precision. The results indicate that the test environments affect the recognition accuracy of falling detection.
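The metrics used to compare the fall detectors above can be computed from a binary confusion matrix, as in the following sketch. The function name and label encoding are assumptions; the false alarm rate is taken here as FP / (FP + TN), i.e., one minus specificity, although individual papers may define it differently.

```python
def fall_metrics(y_true, y_pred):
    """Compute fall detection metrics from binary labels.

    y_true, y_pred: sequences of 1 (fall) or 0 (non-fall).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),       # share of real falls detected
        "specificity": tn / (tn + fp),       # share of non-falls rejected
        "false_alarm_rate": fp / (fp + tn),  # assumed definition
        "accuracy": (tp + tn) / len(pairs),
    }
```

Because falls are rare compared with other activities, accuracy alone can look high even when many falls are missed, which is why sensitivity and specificity are reported separately.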

C. DISCUSSION ON HAND GESTURE RECOGNITION
Hand gesture recognition is an important field of human behavior recognition because it can provide effective information input for HCI (Human-Computer Interaction) and assist communication for the deaf. Currently, there are many systems that identify various hand gestures, such as WiG [73], WIGeR [74], Mudra [75], WiFinger [76], WiFinger [77], PWiG [78], Chen et al. [79], WiCatch [80], iGest [81], DeNum [146], FreeGesture [8], DeepNum [147], Widar3.0 [148], and Temporal Unet [149]. Regarding the recognition technique, we find that the pattern-based method is more popular than the deep learning-based method for hand gesture recognition. The experimental scenarios of these applications are simple compared with those of daily behavior recognition. Regarding the size of samples, pattern-based approaches use fewer samples than deep learning-based approaches. The number of samples used in pattern-based approaches is less than 500 (except WiG [73]) while the number used in deep learning-based approaches is over 1000 for common hand gestures (e.g., DeepNum [147]: 6000, Widar3.0 [148]: 1700). Regarding system performance, the overall recognition accuracy is about 93% for LOS scenarios and less than 90% for NLOS scenarios, as in WiG [73] and PWiG [78]. DeepNum has an overall accuracy of 98% for American sign language, which is more accurate than other sign language applications. WiCatch [80] considers two-hand gesture recognition and achieves 95% recognition accuracy. We notice that the number of hand gestures varies from 4 (e.g., WiG [73]) to 9 (e.g., Mudra [75]) and that the number of ASL digit gestures is up to 10.

D. DISCUSSION ON CROWD COUNTING
Crowd counting is a significant research topic because it can provide helpful information on population mobility and human dynamics. Based on this information, we can conduct intelligent crowd management when the person density reaches certain thresholds. Currently, there are many applications that estimate the number of users according to different walking states, including FCC [98], FreeCount [99], Guo et al. [100], WiFree [101], Wi-Count [102], HFD [103], Chen et al. [156], WiCount [157], Doong et al. [158], and DeepCount [159]. There are more pattern-based systems than deep learning-based systems. We observe that the number of samples used by pattern-based systems is less than 1000 while that used by deep learning-based systems is above 16000. The experimental scenarios are very different, varying from 4 different environments (FCC [98]) to a single environment (Doong et al. [158]). The maximum recognized crowd size varies from 5 users (WiCount [157]) to 15 users (FCC [98]). All these applications recognize walking users, and the recognition precision decreases as the number of users increases. From the perspective of recognition performance, the pattern-based methods have better results: their counting accuracy is higher than 90% while that of all deep learning methods is less than 90%, which indicates that pattern-based methods are suitable for crowd counting applications.

F. DISCUSSION ON RESPIRATION MONITORING
Normal respiration plays a crucial part in daily life because an abnormal respiration state may endanger a person's life. Currently, plenty of systems based on CSI can detect the human breathing rate, including PhaseBeat [85], TR-BREATH [87], Liu et al. [88], Wang et al. [131], TinySense [132], Yang et al. [134], Zhang et al. [135], FullBreathe [24], BreathTrack [136], FarSense [137], and Khan et al. [22]. Different from other behavior recognition tasks, which usually apply pattern-based or deep learning-based methods, most respiration monitoring applications leverage model-based methods. The number of users varies from 2 (e.g., TinySense [132]) to 12 (e.g., TR-BREATH [87]). The pattern-based method achieves the best results in terms of the number of users (12 users) and recognition accuracy (above 98% for a dozen users under LOS scenarios and 9 users under NLOS scenarios). BreathTrack [136] monitors 8 users and achieves over 99% accuracy in most scenarios. For deep learning-based methods, Khan et al. [22] leverage CNN and obtain 98.85% recognition accuracy for 3 users, which is an expected result because more measurement data are collected using the USRP B200.
All in all, the pattern-based method is widely employed in most behavior recognition applications since it has many important advantages, such as fewer sample requirements compared with the deep learning-based approach, a broad and deep body of prior study, simple recognition steps, and no need for a precise mathematical model. Model-based approaches are extensively exploited in respiration monitoring due to the small range of body motion. In other words, the physical model is more appropriate for the detection of minor body motion. Deep learning is gradually being applied in CSI-based human behavior recognition because it has achieved extraordinary success in various fields, including image processing, speech recognition, and natural language processing. Thereby, we hope that these CSI applications can leverage its powerful feature extraction and recognition capability to identify human behaviors. Generally, deep learning approaches need a large number of samples to extract features and implement classification, although our analysis suggests that this is not an absolute requirement. Thereby, if we have more samples, we can leverage the strengths of deep learning. If we have only a small number of samples, we can select appropriate recognition algorithms based on the characteristics of the system. Therefore, we deem that the size of samples affects the selection of recognition algorithms. In summary, we should select appropriate recognition methods to identify human behavior based on application requirements, experimental environments, and characteristics of algorithms.

VI. ISSUES AND FUTURE RESEARCH DIRECTIONS
Recently, human behavior recognition based on CSI has achieved remarkable success in many fields due to the popularity of WiFi devices and the improvement of recognition algorithms. However, we have to face many challenges when developing specific behavior recognition applications. Besides, with the gradual changes in requirements and working environments, we have to tackle more complex problems. In this section, we discuss some crucial issues and present some promising research directions. We hope that these contents provide some insights into the analysis of activity identification systems and facilitate the development of novel applications. We consider some representative problems and research trends, including electromagnetic interference, multiple users, through-the-wall detection, multiple APs, standard datasets, robustness, and security issues. These issues and future directions are discussed as follows.

A. ELECTROMAGNETIC INTERFERENCE
Although the dense deployment of WiFi brings us more convenient network links, it causes serious electromagnetic interference, which reduces the accuracy of behavior recognition. As shown in Fig. 13, Wei et al. [56] present a device-free activity recognition system operating under radio frequency interference (RFI), which can identify 4 kinds of behaviors (lying, sitting, standing, and walking). Based on this system, the authors find that the CSI signal is seriously affected and that the measurement data are noticeably changed by RFI. Consequently, the behavior recognition accuracy decreases due to electromagnetic interference. Huang et al. consider the co-channel interference caused by the channel overlap of WiFi devices and propose WiAnti [196], a robust anti-interference activity recognition system using CSI. WiAnti analyzes the co-channel interference and proposes a subcarrier selection algorithm to choose subcarriers with weak correlation. The system achieves 95.865% recognition accuracy, an 8% improvement compared with WiFall. With the increase of WiFi devices, how to reduce electromagnetic interference and improve the recognition accuracy is a non-negligible issue. The effect of RFI on system performance should be considered when developing and evaluating a CSI-based recognition system.
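One possible reading of selecting subcarriers with weak correlation is sketched below. It is only an illustration and not WiAnti's algorithm, whose selection criterion is not detailed here: the greedy strategy, the Pearson correlation between subcarrier amplitude series, and the threshold `max_corr` are all assumptions.

```python
from statistics import mean

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def select_weakly_correlated(series, max_corr=0.5):
    """Greedily keep subcarriers that are weakly correlated with those kept so far.

    series:   list of amplitude time series, one per subcarrier.
    max_corr: largest tolerated absolute correlation (assumed value).
    Returns the indices of the selected subcarriers.
    """
    selected = []
    for i, s in enumerate(series):
        if all(abs(pearson(s, series[j])) <= max_corr for j in selected):
            selected.append(i)
    return selected
```

The intuition behind such a selection is that strongly correlated subcarriers carry largely redundant information, so a disturbance that affects one of them tends to affect the others in the same way.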

B. MULTI-PERSON BEHAVIOR RECOGNITION
Multi-person behavior recognition means that the system can recognize the gestures conducted by more than one person simultaneously, as shown in Fig. 14. Specifically, when multiple persons perform actions within the coverage of the CSI signal, besides the effect of the gesture conducted by each user, the relative positions of the users also have a more complicated influence on the CSI signal, which greatly increases the difficulty of recognition because the users may walk to different positions. However, multi-person behavior identification is indispensable for some applications. Therefore, some applications evaluate system performance under such multi-person test scenarios. For instance, WiMU [59] is a multi-user gesture recognition system using WiFi signals. It utilizes the WiFi signal propagation law to build a theoretical model that describes multi-user movement. It achieves average accuracies of 95%, 94.6%, 93.6%, 92.6%, and 90.9% for 2, 3, 4, 5, and 6 simultaneously performed gestures, respectively. Tan et al. propose MultiTrack [62], a multi-user tracking and activity recognition system. It extracts the signal reflections that describe each user using multiple WiFi links and achieves over 92% precision for activity recognition in multi-user scenarios. WiHear [89] identifies 9 vowel pronunciations according to the mouth motion profile, and it has 91% precision for one person and 74% precision for 3 persons. TinySense [132] detects multi-person breathing rates by analyzing the stream with the largest peak-valley difference. Moreover, this system obtains over 95% and 88% precision for respiration rate detection with 1 and 2 volunteers at the same time, respectively. Yang et al. [134] propose a multi-person sleeping respiration monitoring system. This system estimates two people's breathing rates with a Mean Absolute Error (MAE) of 0.5 bpm to 1 bpm. TR-BREATH [197] detects the respiratory rates of 1 to 7 people. In this system, the accuracy of breathing rate estimation for a single person is 98.5%, and the average accuracy for 1 to 7 people is 96.9%. We discover that the increase in the number of users usually decreases the recognition accuracy. How to effectively address the effect of multiple persons and recognize different actions conducted by different users is a challenging problem.
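The error measures quoted for breathing rate estimation can be made concrete with a short sketch. The MAE follows its standard definition; the accuracy function uses one common definition (one minus the mean relative error), which is an assumption and may differ from the definitions used in the cited systems.

```python
def mean_absolute_error(estimated, reference):
    """Mean absolute error between estimated and reference rates (in bpm)."""
    if len(estimated) != len(reference):
        raise ValueError("sequences must have the same length")
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(estimated)

def rate_accuracy(estimated, reference):
    """One minus the mean relative error (assumed definition of accuracy)."""
    rel = [abs(e - r) / r for e, r in zip(estimated, reference)]
    return 1.0 - sum(rel) / len(rel)
```

For example, estimates of 15.5, 16.0, and 14.0 bpm against references of 15, 17, and 14 bpm give an MAE of 0.5 bpm.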

C. THROUGH THE WALL DETECTION
A remarkable benefit of CSI-based sensing is that the signal can propagate through walls. The communication between a sender and a receiver located in different rooms is not interrupted if the distance between them stays within a reasonable range. Consequently, human behavior recognition in through-the-wall scenarios using CSI extends the research field and provides us with promising applications. With this capability of communicating through walls, many systems have implemented attractive functions, such as E-Eyes [55], TensorBeat [86], WiHACS [57], WiseFi [176], Smokey [97], NotiFi [66], R-TTWD [108], WIGeR [74], FallDeFi [70], DFS [60], and TW-See [61]. However, among these through-the-wall activity recognition applications, we find that only DFS [60] and TW-See [61] consider the material of the walls, such as concrete walls and glass walls. For instance, DFS [60] identifies 8 kinds of actions in two experiments with a precision exceeding 85%. As shown in Fig. 15(a) and Fig. 15(b), two-sided walls (a concrete wall in one case, and a glass wall together with a concrete wall in the other) isolate the receiving end from the transmitting end. TW-See [61] recognizes seven daily behaviors (e.g., walking, hand swing, boxing, etc.) with an average recognition precision of 94.46% in two through-the-wall scenarios (through a glass wall and through a concrete wall), as shown in Fig. 16. These systems show that the material of the walls can affect signal propagation and lead to different recognition accuracy. Although a few applications discuss the recognition performance in through-the-wall scenarios, they do not provide a comprehensive analysis of the variation of the through-the-wall CSI signal. The applications and analysis of human behavior recognition in through-the-wall scenarios will be a hot research topic.

D. MULTIPLE ACCESS POINTS (APS)
Due to the short WiFi communication distance and low transmission power, traditional CSI-based behavior recognition applications often use a single AP to validate algorithm accuracy [198]. However, various factors (e.g., walls and noise) severely distort signal propagation and attenuate signal energy, which may decrease recognition accuracy. With the popularity of WiFi devices, multiple access points are available in our daily environments. Therefore, utilizing multiple access points is a potential solution to improve recognition accuracy. As shown in Fig. 17, data collection with multiple APs provides more information from more communication links, which enhances behavior recognition accuracy in complex experimental environments. For instance, NotiFi [66] adopts five APs to evaluate the impact of the number of APs on behavior recognition accuracy. The experiments in NotiFi confirm that increasing the number of WiFi APs can improve system performance. However, the relationship between recognition accuracy and the number of APs remains unclear. Li et al. [198] propose a learning method that can analyze the CSI of multiple APs. This approach adopts 9 APs to collect CSI data and then utilizes a CNN to identify human activities. It shows that using multiple APs for human activity recognition can effectively increase identification precision. However, how to decrease the RF interference among multiple APs, how to place the APs to obtain the optimal recognition accuracy, and how to coordinate the communication among these APs are crucial problems to be solved.
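One simple way to combine information from several APs is late fusion, where each AP link produces its own activity label and the labels are merged by majority vote. The sketch below illustrates this idea only. It is an assumption for illustration and is not the method of NotiFi [66] or of Li et al. [198], who feed multi-AP CSI into a CNN. The function name and the labels are hypothetical.

```python
from collections import Counter


def fuse_by_majority_vote(per_ap_labels):
    """Return the activity label predicted by the most APs.

    Ties are broken in favor of the label that appears first in the input.
    """
    if not per_ap_labels:
        raise ValueError("need at least one AP prediction")
    counts = Counter(per_ap_labels)
    best = max(counts.values())
    # Scan in input order so that tie-breaking is deterministic.
    for label in per_ap_labels:
        if counts[label] == best:
            return label


# Hypothetical per-AP predictions for the same time window (five APs).
predictions = ["walk", "sit", "walk", "fall", "walk"]
fused = fuse_by_majority_vote(predictions)
print(fused)
```

A vote of this kind treats every AP equally. Weighting each AP by its link quality or position would be a natural refinement, which relates to the open question of AP placement raised above.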

E. STANDARD DATASET
Currently, almost all behavior recognition applications evaluate system performance using their own samples. Usually, authors recruit participants to perform some typical actions and collect the CSI data. The experimental environments are set up according to the specific requirements of the application. Consequently, system performance is usually validated only under the authors' own system arrangement. Comprehensive evaluation and comparison across systems are difficult because the experimental conditions differ greatly. Although some applications provide comparisons with other applications, more analysis and discussion are needed. A standard dataset can significantly improve system performance evaluation and comparison. With an open and accurate dataset, we can accurately assess system performance from many different aspects. As for how to build the dataset, we can draw on the experience of other open and successful datasets, such as CIFAR10 [199], ImageNet [200], and MNIST [201]. When building the dataset, many factors should be considered, such as action types, the number of participants, differences among users, test environments, and sample size. We believe that the successful development of a standard dataset will boost the studies and applications of behavior recognition based on CSI.
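The factors listed above could be captured as per-sample metadata so that a shared dataset can be summarized and compared across studies. The following sketch shows one possible record layout and a coverage summary. All field names, the class name, and the example values are hypothetical and do not describe any existing dataset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CsiSampleRecord:
    """Metadata for one labeled CSI recording (hypothetical schema)."""
    action: str          # action type, e.g. "walk"
    participant_id: str  # anonymized user identifier
    environment: str     # test environment, e.g. "lab" or "office"
    num_packets: int     # number of CSI packets in the recording


def summarize(records):
    """Report how well a collection covers actions, users, and environments."""
    return {
        "samples_per_action": dict(Counter(r.action for r in records)),
        "participants": len({r.participant_id for r in records}),
        "environments": sorted({r.environment for r in records}),
        "total_samples": len(records),
    }


# Hypothetical records used only to exercise the summary.
records = [
    CsiSampleRecord("walk", "p01", "lab", 3000),
    CsiSampleRecord("walk", "p02", "office", 3000),
    CsiSampleRecord("sit", "p01", "lab", 2500),
]
summary = summarize(records)
print(summary)
```

A summary of this kind makes imbalances visible, for example an action that was recorded by only one participant or in only one environment.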

F. ROBUSTNESS
In most current applications, behaviors are performed in indoor environments. As a result, the environment has an important effect on recognition accuracy [183]. Besides, the number of participants, the age of the users, the position of the devices, and the types of gestures also affect recognition accuracy. How to make an algorithm work under different indoor scenarios and for different users is an essential issue because it determines whether the approach is robust. Some applications seek a solution using a model-based or deep learning-based approach. These approaches can alleviate the problem and improve recognition accuracy under different environments. However, with these two approaches, we have to face the difficulty of modeling or of gathering large numbers of samples. Therefore, how to tackle this question has no clear answer. Currently, we can take some measures to mitigate the effect of different test environments. Developing a universal framework to tackle this problem seems to be a potential approach [8], [22], [138], [141], [147], [150], [161], [187], [188]. Remaining robust and usable across many scenarios is a fundamental problem for behavior recognition using CSI and can be addressed in future research.

G. SECURITY ISSUES
Human behavior recognition based on CSI works in a device-free and non-intrusive manner, which provides long-term and accurate monitoring of a user. Behavior recognition therefore brings both advantages and disadvantages. On the one hand, it provides more control over smart devices, timely health care for the elderly, and more help for impaired persons. On the other hand, it may bring many disadvantages [202]. For instance, it can be used to steal private information by recognizing keystrokes [91] or inferring a text-based password [92]. Besides, continuous behavior recognition effectively places the user under strict surveillance, which may pose a serious threat to people's privacy if this information is used illegally. Furthermore, it can be leveraged to silently control a device remotely without permission. Therefore, how to ensure that devices are controlled only with authorization and how to prevent private information from leaking through CSI are essential problems.

VII. CONCLUSION
Human behavior recognition technology is an important research direction in the field of ubiquitous computing. Currently, human behavior recognition based on WiFi CSI has drawn increasing attention because it can overcome the shortcomings of traditional methods, such as the requirement of wearing physical sensors, privacy violations, and deployment costs. As a result, significant research progress has been achieved in many application fields. This paper investigates state-of-the-art behavior recognition applications based on CSI and presents a comprehensive review of the key characteristics of these applications.
Firstly, this paper introduces the current general identification methods of behavior recognition, overviews related surveys, introduces the concept of channel state information, and illustrates the principle of CSI-based behavior recognition. Secondly, the article presents the general framework of behavior recognition in detail, including base signal selection, signal preprocessing, and the behavior recognition techniques, namely pattern-based, model-based, and deep learning-based approaches. Thirdly, based on the above recognition techniques, the article categorizes the existing studies and applications into three groups and elaborates on each typical application in terms of the test device, experimental scenarios, number of users, behaviors conducted, classifier, and system performance. Fourthly, this paper analyzes some specific applications and presents extensive discussions on the selection of recognition techniques and performance evaluation. These discussions provide helpful insights into developing an identification system. Finally, this article concludes by presenting open issues and future research directions.

FIGURE 1. Recognition technique statistics over time, from Table 3.

FIGURE 7. Antenna array with one incident signal.

FIGURE 10. The structure of CNN.

FIGURE 11. RNN structure with the LSTM block. (a) Overall structure. (b) The detailed structure of the LSTM block at the p-th time step.

FIGURE 12. Recognition technique statistics for different human behaviors, from Table 3.
ZHENGJIE WANG was born in Liaoyang, Liaoning, China, in 1972. He received the B.S. degree in power engineering from the North China University of Water Resources and Electric Power, Henan, China, in 1995, the M.S. degree in computer software and theory from Northeast University, Liaoning, China, in 2003, and the Ph.D. degree in computer application technology from the China University of Mining and Technology at Beijing, Beijing, China, in 2013. Since 2003, he has been a Lecturer with the College of Electronic and Information Engineering, Shandong University of Science and Technology. He is the author of two books and more than ten articles. His research interests include human behavior recognition, people activity inference, person counting, identity authentication, and people tracking using WiFi devices and smartphones.

KANGKANG JIANG was born in Jining, Shandong, China, in 1993. She received the B.S. degree from Jining University, in 2017. She is currently pursuing the M.S. degree in communication and information system with the Shandong University of Science and Technology. Her research interests include image processing, machine learning, and deep learning.

YUSHAN HOU was born in Jinan, Shandong, China, in 1995. She received the B.S. degree from the Shandong University of Science and Technology, in 2017, where she is currently pursuing the M.S. degree in communication and information system. Her research interests include deep learning, machine learning, and signal processing.

WENWEN DOU was born in Taian, Shandong, China, in 1996. She received the B.S. degree from Weifang University, in 2018. She is currently pursuing the M.S. degree in communication and information system with the Shandong University of Science and Technology. Her research interests include deep learning, machine learning, and signal processing.

CHENGMING ZHANG was born in Zaozhuang, Shandong, China, in 1995. He received the B.S. degree from the Qilu University of Technology, in 2018. He is currently pursuing the M.S. degree in electronic and communication engineering with the Shandong University of Science and Technology. His research interests include machine learning, deep learning, and signal processing.

ZEHUA HUANG was born in Binzhou, Shandong, China, in 1996. He received the B.S. degree from the Shandong University of Science and Technology, in 2018, where he is currently pursuing the M.S. degree in communication and information system. His research interests include machine learning, deep learning, and signal processing.

YINJING GUO was born in Jining, Shandong, China, in 1966. He received the B.S. degree in radar engineering from the Ordnance Engineering College, in 1989, and the M.S. degree in communication and electronic system and the Ph.D. degree in weapon system and application engineering from the Beijing Institute of Technology, in 1992 and 2004, respectively. Since 1996, he has been a Professor with the College of Electronic and Information Engineering, Shandong University of Science and Technology. He has published more than 90 articles, among which more than 40 articles have been retrieved by EI and SCI. His research interests include wireless communications, electromagnetic compatibility theory and applications, special radars, and unmanned aerial vehicle. Dr. Guo is a Reviewer of the National Natural Science Foundation of China, a Reviewer of the National Science and Technology Award, a member of the Qingdao Senior Experts Association, and a Reviewer for many international journals. He has served as the local Chair for the first, second, and third International Conference on Intelligent Information Technology Applications and the 3rd IEEE International Conference on Communication and Mobile Computing.

TABLE 1. Related surveys on CSI-based behavior recognition.

TABLE 2. CSI-based human behavior recognition process.

TABLE 3. CSI-based human behavior recognition applications.

TABLE 5. Pattern-based fine-grained specific behavior recognition applications.

TABLE 6. Pattern-based activity inference applications.

TABLE TENNIS ACTION RECOGNITION
Chen et al. propose WiTT [63] in 2018, a system that detects and classifies table tennis activities by analyzing changes in CSI values. Firstly, the authors use DWT, PCA, and a Butterworth low-pass filter to remove noise and extract useful features. Next, the authors utilize SVM to train on the samples and classify 9 activities of playing table tennis: forehand attack, backhand stroke, forehand pull, forehand pick, backhand pull, backhand close shot, step, step by step, and squat. The extensive experimental results show that WiTT achieves an average accuracy of only 79.78% for the identification of 9 kinds of behaviors, owing to some similar actions, 90.33% accuracy for the recognition of six types of activities, and a detection rate of more than 96.34% for the presence detection of table tennis actions.

TABLE 7. Model-based human behavior recognition applications.

TABLE 8. Deep learning-based human behavior recognition applications.