A Survey of Handy See-Through Wall Technology

The through-wall system, which applies radio technologies to detect objects behind a wall, has many appealing applications such as public security, life detection, and medical health monitoring. Although such systems have been studied for years, recent advances in high-performance handheld computing devices and artificial intelligence have made them more practical. In this article, we present a tutorial-like study of the fundamental radio technologies used in the through-wall system, as well as its recent advances. Unlike work on traditional through-wall radars, this paper mainly focuses on handy through-wall techniques that are low-power, narrow-band, lightweight, contact-free, and suitable for civil use. Advanced through-wall systems and open research issues are also presented.

People have always had a great interest in knowing what is behind walls, which puts through-wall technology in demand. Through-wall technology refers to technology that can detect objects of interest inside an enclosed area without requiring users to wear any additional devices.
Through-wall technology can find the following typical applications:

A. PUBLIC SECURITY
Law enforcement personnel can use this technology to detect how many individuals are behind a wall and to locate them. For example, in a hostage situation, police can use through-wall technology to identify the distribution of terrorists hiding in rooms and make a more effective plan to rescue the hostages.

B. LIFE DETECTION
Through-wall technology provides an efficient way for life-search and detection in disaster rescue. For instance, after an earthquake, the through-wall system, installed on unmanned aerial vehicles, can be used to locate survivors trapped under the rubble.

The associate editor coordinating the review of this manuscript and approving it for publication was Prakasam Periasamy.

C. SMART HOME MONITORING
A house equipped with through-wall devices is able to monitor the movement of people, e.g., the elderly and children [1], inside the house and analyze their behaviors. For instance, it would be very dangerous if the elderly fall down, or children get to dangerous positions such as windows or balconies. With the real-time continuous motion detection achieved by the through-wall technology, many dangerous events at home can be alerted and prevented.

D. MEDICAL APPLICATIONS
Through-wall technology can also be applied to measure human vital signs, such as breathing and heart rates, in medical health care. Typical vital sign measurement requires body contact; users need to wear extra devices, which can be inconvenient and uncomfortable. Through-wall devices can track vital signs without skin contact and even through obstacles. The devices detect the slight body movement caused by breathing or heartbeat, and measure the breathing and heart rates accurately by extracting the periodicity of the received signal [2].

VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

Through-wall technology mainly uses radio frequency (RF) signals, since RF can penetrate non-metallic obstacles such as wooden walls and reflect off human bodies. Through-wall systems transmit electromagnetic signals and then process the received reflections to extract information about what is behind the wall. For better penetrability, most proposed through-wall systems operate at frequencies below 10 GHz, because wall attenuation of RF signals is reasonable below this frequency and penetrability worsens as frequency increases. In particular, the Industrial Scientific Medical (ISM) band is widely adopted in through-wall technology, since most everyday wireless devices, e.g., WiFi and Bluetooth, operate in the ISM bands at 2.4 GHz and 5.8 GHz.
The performance of through-wall systems is typically evaluated by the following key performance indicators (KPI):
• Detection Ability refers to the accuracy of the through-wall detection. According to specific applications, the detection ability is measured in different criteria. For through-wall imaging systems, the detection ability is evaluated by the resolution of output images; for through-wall localization systems, it is measured by the accuracy of locations of the detected targets; for through-wall human body detection systems, it is measured by the number of people they can detect.
• Economical Efficiency refers to the cost of a whole through-wall system.
• Energy Efficiency refers to the power consumption of a through-wall system.
The objective of this paper is to survey the most commonly used techniques, existing systems, and challenges in the through-wall field. Rather than through-wall technology in a general and broad sense, this paper mainly focuses on handy through-wall systems that are suitable for civil use (i.e., they satisfy the Federal Communications Commission (FCC) regulations and operate in the ISM band), with low transmit power (less than 1 W), narrow bandwidth (less than 2 GHz), light weight, and easy deployment. The techniques for identifying, locating, or imaging the human body behind a wall are the focus of our work.
The structure of this paper is illustrated in Table 1. Section II lists several critical challenges encountered in the system design. Section III presents the four different categories of related systems or works. Section IV describes the basic principles of through-wall technology, and Section V introduces commonly used techniques to achieve through-wall detection. In Section VI, a number of existing through-wall systems are discussed with their functions, techniques, advantages, and limitations. Section VII closes the paper with concluding remarks and open research issues.

II. CHALLENGES OF THROUGH-WALL TECHNOLOGY
Although RF signals have natural advantages in the through-wall scenario, there are still fundamental engineering challenges as below.

A. SIGNAL ATTENUATION
The transmitted signal typically suffers great attenuation, especially through dense obstacles. For example, a one-way traversal of a concrete wall can reduce WiFi signal power by 18 dB [3]. In practical use of a through-wall system, the signal needs to traverse the wall several times (at least once toward the target and once on the way back), making the received signal very weak. As a result, the useful information in the received signal is easily buried under both the signal reflected off the wall and the direct signal from the transmit antennas. The basic propagation principle is that the higher the frequency, the worse the penetrability; nevertheless, a higher frequency can bring better detection resolution. Choosing an appropriate frequency for a through-wall system is therefore a trade-off. A survey of through-wall radar points out that researchers tend to build through-wall radars below 3 GHz [4]. In addition, signal strength decreases as the propagation distance increases, which has a notable effect on detecting multiple targets at different distances [5]. In multi-target scenarios, a target close to the device often produces a stronger signal than those far from the device, so the signals from far targets are easily buried by the stronger signals from near targets. Therefore, a single simple threshold is insufficient to determine the existence of targets, and through-wall systems need to process signals from different targets differently to account for distance attenuation. Detecting tiny movements (e.g., breathing and heartbeat) through a wall is also challenging, because the fluctuation in the reflected signal caused by tiny movements is even weaker than the reflection from the whole human body.
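To make the attenuation figure concrete, the following minimal sketch converts the 18 dB one-way concrete-wall loss quoted above into a two-traversal loss and the corresponding fraction of remaining power. It is only an illustration: the function names are hypothetical, and free-space path loss and the reflection loss at the target are deliberately ignored.

```python
def wall_loss_db(one_way_loss_db: float, traversals: int) -> float:
    """Total wall attenuation in dB after the given number of traversals."""
    return one_way_loss_db * traversals


def db_loss_to_power_fraction(loss_db: float) -> float:
    """Fraction of the original power that remains after a loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)


# 18 dB one-way loss through a concrete wall [3]; the signal crosses the
# wall twice (toward the target and back). Other losses are ignored here.
total_loss_db = wall_loss_db(18.0, traversals=2)
remaining_fraction = db_loss_to_power_fraction(total_loss_db)

print(f"wall loss: {total_loss_db:.0f} dB")
print(f"remaining power fraction: {remaining_fraction:.2e}")
```

Under these assumptions, the wall alone leaves roughly 0.025% of the transmitted power, before any distance or reflection losses are counted.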
To address this challenge, Vital-Radio, a wireless breathing and heart rate measurement system, separates the space into multiple buckets based on the distance from the device and extracts the periodic signal components in each bucket [2]. By separating the space into multiple buckets, the signal carrying useful breathing and heart rate information is isolated from the signals reflected off furniture and walls. Then, if users stay static or make only small movements (e.g., typing, watching TV), the dominant periodic components in the received signal are mainly caused by breathing and heartbeat, which makes it easier to measure breathing and heart rates.
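The idea of extracting the dominant periodic component from one distance bucket can be sketched with a plain FFT peak search. This is a simplified illustration rather than Vital-Radio's actual pipeline: the signal is synthetic, and the sampling rate, amplitudes, and frequency bands below are assumptions chosen only for the example.

```python
import numpy as np


def dominant_rate_per_minute(signal, sample_rate_hz, min_hz, max_hz):
    """Return the strongest periodic component within [min_hz, max_hz], in cycles/min."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()                # remove the static (DC) reflection
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sample_rate_hz)
    band = (freqs >= min_hz) & (freqs <= max_hz)   # restrict the search to the band
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz


# Synthetic signal for one distance bucket (all values are assumptions):
# breathing at 0.25 Hz, a much weaker heartbeat at 1.2 Hz, plus noise.
fs = 20.0
t = np.arange(1200) / fs                           # 60 s of samples
rng = np.random.default_rng(0)
bucket = (1.0 * np.sin(2 * np.pi * 0.25 * t)
          + 0.1 * np.sin(2 * np.pi * 1.2 * t)
          + 0.05 * rng.standard_normal(t.size))

breaths_per_min = dominant_rate_per_minute(bucket, fs, 0.1, 0.5)
beats_per_min = dominant_rate_per_minute(bucket, fs, 0.8, 2.0)
print(breaths_per_min, beats_per_min)
```

Searching separate bands for breathing and heartbeat keeps the much stronger breathing component from masking the heartbeat peak.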

B. SIGNAL MEASUREMENT
An important parameter in through-wall systems is the time-of-flight (TOF), which is the time the signal takes to travel from the transmit antenna to the receive antenna. Because RF signals propagate at nearly the speed of light, it is very hard to measure TOF accurately. For example, a signal takes only around 0.1 µs to travel 30 m, i.e., about 3.3 ns per meter. Traditional radars leverage high-powered systems with high-speed analog-to-digital converters (ADCs) to tackle this issue, and they are mostly used in outdoor environments to detect objects at distances of hundreds or thousands of meters. However, high-speed ADCs are expensive, power-consuming, and have low bit resolution [1], which makes traditional through-wall radars bulky and costly. Moreover, the short distances of indoor applications impose even more challenges on accurate TOF measurement.
To address this challenge, through-wall systems mostly resort to three techniques: (1) leveraging frequency modulated continuous wave (FMCW) to measure the TOF indirectly, (2) measuring the direction of arrival (DOA) to obtain angle information instead of the distance information obtained by TOF, and (3) leveraging the pattern of received signal strength (RSS) or channel state information (CSI) to achieve localization. These techniques will be discussed in detail in Sections III and V.
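As an illustration of the first technique, the sketch below shows how a linear FMCW sweep turns a nanosecond-scale TOF into a frequency difference (beat frequency) in the kHz range, which a low-speed ADC can measure. It uses the standard relation TOF = beat frequency / sweep slope; the 1.5 GHz bandwidth, 2.5 ms sweep time, and 10 m round trip are illustrative assumptions, not parameters of any surveyed system.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def sweep_slope(bandwidth_hz: float, sweep_time_s: float) -> float:
    """Slope of a linear FMCW sweep in Hz per second."""
    return bandwidth_hz / sweep_time_s


def tof_from_beat(beat_hz: float, slope_hz_per_s: float) -> float:
    """TOF implied by the frequency gap between transmitted and received sweeps."""
    return beat_hz / slope_hz_per_s


def round_trip_distance(tof_s: float) -> float:
    """Round-trip propagation distance corresponding to a TOF."""
    return SPEED_OF_LIGHT * tof_s


# Assumed sweep: 1.5 GHz of bandwidth covered in 2.5 ms.
slope = sweep_slope(1.5e9, 2.5e-3)

# A reflection with a 10 m round trip arrives after about 33 ns ...
true_tof = 10.0 / SPEED_OF_LIGHT
# ... which appears as a beat frequency of roughly 20 kHz.
beat_hz = slope * true_tof

estimated_tof = tof_from_beat(beat_hz, slope)
estimated_distance = round_trip_distance(estimated_tof)
print(f"beat: {beat_hz / 1e3:.1f} kHz, distance: {estimated_distance:.2f} m")
```

The design point is that the hard quantity (tens of nanoseconds) is never sampled directly; only the much slower beat signal is digitized.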

C. MULTIPATH EFFECT
The complicated environment of through-wall detection causes a multipath effect, which gets worse in multi-target scenarios since more targets bring more signal reflections. For example, in an indoor environment, through-wall systems need to deal with multipath signals such as those reflected off walls and furniture. Such interference can bury the signal from the intended targets. Moreover, the signal from a target may not only reach the antennas directly but may also be reflected off other objects in the environment, or even off other surrounding targets, before being received. These indirect signals result in inaccurate location estimates or even create virtual targets that do not really exist. A virtual target caused by multipath is similar to a person standing by a mirror: an observer may see two persons, the real one and the virtual one in the mirror. Such virtual targets, often called ghosts, confuse the through-wall system, and some efforts have been made in the radar field to alleviate the impact of multipath. Chen et al. propose a method of multipath ghost elimination [6]. Setlur et al. derive a model for the multipath in an enclosed room and propose a way to associate the ghosts back to the true location [7]. These multipath elimination methods rely heavily on a precise model of multipath signals under certain radar techniques (e.g., Synthetic Aperture Radar (SAR)). Thus, it is hard to apply them directly to handy through-wall systems that do not use the same radar techniques.
To address the multipath effect, different through-wall systems leverage different methods depending on the techniques they use and their scenarios. For instance, Wi-Vi exploits the Multiple Signal Classification (MUSIC) algorithm to extract useful information from highly correlated signals (e.g., multipath signals) [3]. WiTrack assumes that multipath signals always travel a longer distance than the directly received signal; therefore, by selecting the signal with the minimum propagation distance, it can alleviate the multipath effect and achieve higher accuracy [1]. However, this assumption only holds in the one-target scenario: in multi-target scenarios there are multiple direct signals reflected off multiple targets, and the minimum-propagation rule can determine only one of them. To address the multipath effect with multiple targets, WiTrack2.0 adds together the RF heatmaps from multiple pairs of transmit and receive antennas to strengthen the direct signals [5]. Specifically, since different antenna pairs are placed at different locations, they experience different patterns of multipath signals but similar patterns of direct signals. By adding the results from multiple antenna pairs, direct signals with similar patterns reinforce each other and become more salient.
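The heatmap-summation idea can be illustrated with a toy example. The sketch below is not WiTrack2.0's implementation: the grid size, blob shapes, and ghost positions are made-up assumptions. Each synthetic heatmap contains the direct reflection at the same cell and a stronger multipath ghost at a different cell per antenna pair, so any single heatmap peaks at a ghost while the sum peaks at the true location.

```python
import numpy as np


def gaussian_blob(shape, center, amplitude, sigma=1.5):
    """Synthetic reflection: a 2D Gaussian bump centered at the given grid cell."""
    rows, cols = np.indices(shape)
    dist_sq = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return amplitude * np.exp(-dist_sq / (2.0 * sigma ** 2))


def combine_heatmaps(heatmaps):
    """Add the per-antenna-pair heatmaps cell by cell."""
    return np.sum(np.stack(heatmaps), axis=0)


def peak_cell(heatmap):
    """Grid cell (row, col) holding the maximum value."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(row), int(col)


shape = (40, 40)
direct = (25, 12)                      # true target cell, seen by every pair
ghosts = [(5, 30), (33, 33), (10, 8)]  # a different ghost cell for each pair

# In each heatmap the ghost (amplitude 1.2) is stronger than the direct path (1.0).
heatmaps = [gaussian_blob(shape, direct, 1.0) + gaussian_blob(shape, ghost, 1.2)
            for ghost in ghosts]

single_peaks = [peak_cell(h) for h in heatmaps]
combined_peak = peak_cell(combine_heatmaps(heatmaps))
print(single_peaks, combined_peak)
```

The direct contributions add coherently across pairs, whereas each ghost appears only once, which is why the combined map favors the true target.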

D. MIRROR EFFECT
The wavelength of RF signals is much longer than that of visible light; at RF, body parts reflect the signals from the transmit antenna rather than scatter them. However, scattering is more useful for sensing because scattered signals are visible in all directions, whereas reflected signals are highly directional. Such directivity can result in only part of the reflected signal being received by the antennas. For instance, the signal reflected off the chest may travel toward the receive antenna while the signal reflected off the legs travels toward the floor, so that only the chest is visible at that moment. In other words, different body parts act like mirrors facing different directions, so some body parts cannot reflect the signal to the receive antennas; this is called the mirror effect [8]. To plot the whole human body via RF signals, one snapshot is insufficient, and through-wall systems need extra operations. Fortunately, as the human body moves, the directions of reflected signals vary over time. Therefore, different body parts are visible in different time periods, which makes capturing the whole human body possible. RF-Capture, a through-wall imaging system, leverages consecutive reflection snapshots to reconstruct the whole human figure based on the fact that different snapshots contain different parts of the human body [9]. Likewise, RF-Pose, a through-wall imaging system based on machine learning, feeds a sequence of consecutive frames of RF heatmaps into a neural network to address the mirror effect [10].

E. PRACTICAL ISSUES
One of the most practical issues is designing real-time through-wall systems that can process the received signals efficiently and present the result to users with low delay. Handy through-wall systems are compact and have limited computing power, so controlling the computational complexity is a critical problem in system design. Most through-wall systems cannot extract every kind of information from the signals reflected off targets; instead, each is designed to extract certain information well. For instance, Wi-Vi leverages off-the-shelf WiFi devices to estimate the coarse angle information of targets behind a wall [3]. It is designed to obtain the number of people behind a wall rather than to estimate their accurate locations. WiTrack can accurately locate only one person behind a wall, because multiple people give rise to a more complicated environment (e.g., stronger multipath effect and distance attenuation) in which more computation resources are needed to process signals [1]. RF-Capture, a through-wall imaging system, resorts to a strategy named coarse-to-fine scan to reduce the computational complexity [9]. Specifically, RF-Capture first scans a wide area with low scan resolution to find the coarse location of the target, and then scans only this target area with high resolution to achieve fine imaging. Other through-wall systems, such as through-wall breathing and heart rate measurement systems [2], do not determine the accurate location of targets and only extract useful vital sign information from signals. Therefore, to achieve real-time through-wall detection, designers have to make a trade-off between capability and complexity.
Another practical issue is the relatively low detection resolution of RF signals compared with visible light. This issue limits the performance of today's through-wall systems in many respects. For gesture classification, due to the low resolution of RF signals, most through-wall systems can only recognize large movements such as walking forward or backward, standing up or sitting down, and raising an arm in a certain direction. Moreover, tracking the trajectory of specific body parts is still challenging; researchers point out that, due to the mirror effect, some body parts cannot reflect the signal to the receive antenna efficiently in certain directions or places [9]. Though some remarkable work has achieved fine-grained gesture and motion recognition via CSI, these systems do not work well in through-wall scenarios and have almost no ability to locate targets behind a wall [11]-[13].
For through-wall systems leveraging off-the-shelf devices such as WiFi and Radio Frequency Identification (RFID), band interference is also a crucial factor affecting system performance. Most WiFi-based systems extract useful information from measured CSI or RSS. Taking CSI as an example, in most experiments CSI is obtained with an open-source CSI Tool, which measures channel matrices for 30 subcarriers in the orthogonal frequency division multiplexing (OFDM) WiFi system [14]. Fine-grained gesture and motion recognition is achieved by analyzing the CSI patterns of different gestures and motions. However, once other WiFi devices operate within the same WiFi channel, their signals contaminate the measured CSI and degrade the performance of these systems. WiFi-based through-wall systems face the same problem, since ubiquitous WiFi devices may interfere with them. Therefore, the robustness of such through-wall systems is affected by band interference.
Apart from the above challenges, some researchers also focus on the properties of walls that can impact the performance of through-wall systems. When penetrating walls, RF signals behave differently depending on the electromagnetic parameters of the walls. By estimating the wall properties, the resulting distortion can be corrected. Solimene et al. propose a MIMO-based method to estimate the wall transmission coefficient that saves both time and resources [15]. Vishwakarma et al. propose a machine-learning-based method to mitigate wall interference effects [16].
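The coarse-to-fine strategy can be sketched in one dimension. The example below is a simplified stand-in for RF-Capture's scan, which operates over 3D space: the reflected-power function, scan range, and step sizes are all hypothetical. It scans a wide range at low resolution, then rescans only the neighborhood of the coarse peak at high resolution, and counts how many power evaluations this costs compared with a full fine scan.

```python
import numpy as np


def coarse_to_fine_peak(power_at, lo, hi, coarse_step, fine_step):
    """Locate the maximum of power_at on [lo, hi] with a two-stage scan."""
    # Stage 1: low-resolution scan over the whole range.
    coarse = np.arange(lo, hi + coarse_step / 2, coarse_step)
    rough = float(coarse[int(np.argmax([power_at(x) for x in coarse]))])

    # Stage 2: high-resolution scan only around the coarse peak.
    fine_lo = max(lo, rough - coarse_step)
    fine_hi = min(hi, rough + coarse_step)
    fine = np.arange(fine_lo, fine_hi + fine_step / 2, fine_step)
    best = float(fine[int(np.argmax([power_at(x) for x in fine]))])

    return best, len(coarse) + len(fine)


def toy_power(x):
    """Hypothetical reflected power with a single peak at x = 3.37 m."""
    return float(np.exp(-((x - 3.37) ** 2) / 0.5))


estimate, evaluations = coarse_to_fine_peak(toy_power, 0.0, 10.0, 1.0, 0.01)
full_scan_evaluations = len(np.arange(0.0, 10.0 + 0.005, 0.01))
print(estimate, evaluations, full_scan_evaluations)
```

The sketch assumes the coarse step is small enough that the true peak is not missed in the first stage; with narrower peaks, the coarse resolution would need to be finer.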

III. CATEGORIES OF THROUGH-WALL SYSTEMS OR WORKS
Through-wall systems, or systems with some through-wall detection ability, can be divided into four categories: (1) WiFi-based systems, (2) radio tomographic imaging (RTI) systems, (3) traditional through-wall radars, and (4) software-defined radio (SDR) systems. The general setup of the four categories is shown in Fig. 1.

A. WiFi-BASED SYSTEM
Nowadays, nearly every indoor space is covered by WiFi signals and WiFi devices. WiFi-based systems are defined as systems that leverage WiFi devices as a platform and detect things of interest by measuring the received WiFi signal. There are mainly three approaches for WiFi-based systems: (1) analyzing the Received Signal Strength (RSS) from the MAC layer [17]-[21], (2) analyzing the Channel State Information (CSI) from the physical layer [22]-[25], and (3) analyzing WiFi signals through software radio technology. RSS represents the received signal power or the signal-to-noise ratio (SNR), which can be measured within one or more packets transmitted in WiFi. RSS-based systems analyze the pattern of varying RSS at the receiver to extract information about obstacles (e.g., furniture and walls); the presence or movement of a human body also influences the RSS. Nuzzer, proposed by Seifeldin et al., utilizes changes in RSS measured by wireless networks to locate users directly [17]. EZ, proposed by Chintalapudi et al., determines the location of users' mobile devices by measuring the RSS at different access points (APs) [18]. It is worth noting that EZ locates users by locating their mobile devices and is therefore not device-free. WiGest, proposed by Abdelnasser et al., utilizes changes in RSS to sense hand gestures which can be used to control a video player [19]. Mostofi et al. achieve through-wall imaging based on mobile platforms such as unmanned aerial vehicles (UAVs) [20] and robots [21].
Different from RSS which records the collective signal strength at the receiver, CSI between one pair of transmit-receive antennas contains the signal strength and the phase information of each OFDM sub-channel. As the signal strength and the phase information of one sub-channel reflect the properties of this channel, such as the information of signal propagation, and the effect of path loss, scattering, fading, etc., CSI describes more detailed information about the channel compared to RSS, and accordingly the processing of CSI is more complicated but also more reliable and accurate. E-eyes, proposed by Wang et al., leverages CSI measured from a WiFi network to track activities of one user [22]. Wang et al. show that CSI is more fine-grained than RSS by detecting the same activity with CSI and RSS, respectively, and comparing their results. FarSense, proposed by Zeng et al., can monitor human respiration by CSI even in through-wall scenarios with commodity WiFi devices [23]. FarSense employs the ratio of CSI readings from two antennas to reduce noise and the ratio enables the use of phase information to extract respiration information. WiSpy, proposed by Hanif et al., can sense the movement of persons behind a wall by analyzing CSI and predict their number as well by applying machine learning algorithms to CSI data [24]. WiGeR, proposed by Al-qaness et al., can achieve gesture recognition based on CSI extracted from any common WiFi router [25].
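Why a ratio of CSI readings makes phase usable can be shown with a small simulation. The sketch below is not FarSense's implementation; it assumes, for illustration, that both receive antennas share the same random per-packet phase offset (for example, because they share one oscillator), and all channel values and the packet rate are made up. Under that assumption the offset cancels in the ratio, leaving a quantity that depends only on the propagation paths.

```python
import numpy as np


def csi_ratio(csi_a, csi_b):
    """Element-wise ratio of complex CSI readings from two receive antennas."""
    return np.asarray(csi_a) / np.asarray(csi_b)


rng = np.random.default_rng(1)
packets = 200
t = np.arange(packets) / 50.0          # assumed rate: 50 packets per second

# Hypothetical channels: a strong static path plus a weak path whose phase
# is modulated by chest movement at 0.25 Hz.
breathing_phase = 0.5 * np.sin(2 * np.pi * 0.25 * t)
h_a = 1.0 + 0.2 * np.exp(1j * breathing_phase)
h_b = 0.8 * np.exp(1j * 0.7) + 0.1 * np.exp(1j * (breathing_phase + 1.0))

# Assumption: every packet carries a random phase offset common to both antennas.
offset = np.exp(1j * rng.uniform(-np.pi, np.pi, packets))
measured_a = h_a * offset
measured_b = h_b * offset

ratio = csi_ratio(measured_a, measured_b)
print(np.allclose(ratio, h_a / h_b))   # the common offset cancels out
```

The raw phase of either antenna is scrambled from packet to packet, whereas the ratio varies smoothly with the breathing-induced path change.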
The software radio scheme applies the universal software radio peripheral (USRP) [3], [11] to process WiFi signals, or the wireless open access research platform (WARP) to design customized WiFi systems [26]. Note that this type of system also meets the definition of a Software-Defined Radio (SDR) system, as discussed in Section III-D. Specifically, USRP is an integrated hardware platform for software radio that allows users to configure and process the radio through software, including modulation, coding, and signal processing. In other words, USRP provides more freedom in adapting WiFi radios than standard WiFi protocols, and thus enables the deployment of radar techniques and complicated frequency analysis for more accurate measurement, at the expense of higher system cost and complexity. Wi-Vi, proposed by Adib et al., can detect moving people behind a wall [3]. It leverages the MIMO capability of the WiFi device to address the flash effect, which will be discussed in detail in Section VI. USRP is then employed to process WiFi signals with radar methods. WiSee, proposed by Pu et al., is a gesture recognition system leveraging USRP and can work in through-wall scenarios [11]. WARP has been developed for algorithm and system design and is compatible with the commercial WiFi standard. WiDeo, proposed by Joshi et al., is a through-wall device-free motion tracing system with customized algorithms to process WiFi signals on WARP [26].
In conclusion, RSS is the most accessible information, since the received signal power or SNR can be obtained easily on nearly every wireless system. Compared to RSS, CSI provides details about each sub-channel (i.e., the strength and phase information) instead of the rough power provided by RSS. Though CSI is supported in 802.11n protocols, not every network interface controller (NIC) provides access to CSI data from the physical layer. The method most commonly used by researchers is the CSI Tool with the Intel 5300 NIC [14]. Therefore, CSI data is harder to obtain and more expensive to process. USRP is a peripheral device independent of WiFi, whereas RSS and CSI can be obtained merely through WiFi devices. The processing flexibility and capability of USRP exceed those of WiFi devices, but the cost of the USRP hardware platform is much higher than that of off-the-shelf WiFi devices.
The most significant advantage of WiFi-based systems is that they can exploit off-the-shelf WiFi devices, and therefore they are relatively cheaper and easier to implement. Besides that, the transmit power of WiFi devices is usually less than 50 mW which is lower than that of traditional through-wall radars. But the disadvantage is that the performance of such systems would be limited by WiFi devices (e.g., bandwidth, sampling rate, etc.) since they are designed for low-cost wireless communications.

B. RADIO TOMOGRAPHIC IMAGING (RTI) SYSTEM
RTI is an emerging technique [27]-[31] that reconstructs the spatial loss field of the environment using a dense wireless sensor network [30]. Specifically, by placing a large number of sensors around the target area, RTI systems analyze the spatially accumulated shadowing loss of the RSS to image the area of interest [28]. RTI is an effective technique for through-wall detection, and its sensor nodes are low-cost and low-power [29]. To achieve better accuracy, most RTI systems need to place a large number of sensors to enclose the whole area of interest. This, however, makes RTI impractical in some scenarios such as hostage rescue.

C. TRADITIONAL THROUGH-WALL RADAR
Traditional through-wall radar is a well-studied area and has achieved great success, especially in military applications. Radar-based methods can be roughly divided into three categories: ultra-wideband (UWB) radar [32]-[41], Doppler radar [42], [43], and Frequency Modulated Continuous Wave (FMCW) radar [44], [45]. With radar devices, some critical challenges in through-wall scenarios (e.g., the flash effect [3]) can be easily solved. The most widely used type is UWB radar because of its high range resolution and good penetrability [32]. However, most traditional through-wall radars are costly and bulky since they need high-performance hardware, higher transmit power, and large antenna arrays, which are the key features that distinguish them from the other three categories. For instance, though some SDR through-wall systems (discussed in Section III-D) also adopt techniques from the radar field, such as FMCW, their transmit power is less than 1 mW, far less than that of the FMCW radars mentioned above (e.g., the transmit power is 1 W in [45]). The high transmit power makes it hard for traditional through-wall radars to meet the power limits of the ISM band, and the costly and bulky hardware also prevents them from civil use.

D. SOFTWARE-DEFINED RADIO (SDR) SYSTEM
The SDR system refers to an RF system in which most hardware components are replaced by software methods. It requires professional RF devices (e.g., special antennas, radio frequency identification (RFID) devices) and special programmable software platforms (e.g., USRP). The flexibility of software provides SDR systems with more algorithm options (from radar to machine learning). Besides, SDR systems controlled by software can achieve reasonable resolution with less transmit power than WiFi (even less than 1 mW), since they are free of the limits of WiFi devices (e.g., bandwidth and sampling rate) and are dedicated to the through-wall purpose. Table 5 summarizes the power consumption of some systems. SDR-based systems have achieved through-wall localization [1], [5], [46], through-wall imaging [9], [10], [47], [48], and human vital signals detection [2], [49]. The advantage of such systems is that signal processing methods are relatively flexible, but the disadvantage is the high cost of special devices like USRP.
WiTrack [1] and WiTrack2.0 [5], proposed by Adib et al., are designed for through-wall localization. They leverage USRP to generate FMCW signals to measure the distance from antennas to targets. RF-Capture, proposed by Adib et al., leverages FMCW and USRP to achieve through-wall imaging [9]. Tadar, proposed by Yang et al., leverages an RFID reader and multiple RFID tags to locate people behind a wall [46]. RF-Pose [10], RF-Pose3D [47], and RF-Avatar [48], proposed by Zhao et al., can reconstruct a 2D or 3D model of a human body from RF signals. These three systems combine RF devices and camera devices, and leverage machine learning techniques to train RF systems under the supervision of vision systems. Besides, Vital-Radio, proposed by Adib et al., can achieve breathing and heart rate detection with FMCW signals even in through-wall scenarios, which shows the great potential of SDR systems [2]. Table 2 summarizes the systems mentioned above. A reference number with a superscript indicates that the corresponding paper meets the definition of handy through-wall technology in Section I; these papers are the focus of this survey. The basic principles and methods discussed in Sections IV and V are mainly based on these systems.

IV. BASIC PRINCIPLE OF THROUGH-WALL TECHNOLOGY
Through-wall systems transmit electromagnetic waves to detect the space and then receive the reflected signals to obtain the location of targets. Some basic principles of the electromagnetic wave and localization are as follows.

A. PRINCIPLE OF ELECTROMAGNETIC WAVE
Since electromagnetic waves were discovered, they have been applied in numerous fields. Because electromagnetic waves can both penetrate and reflect off objects, they are widely used in through-wall technology, with basic principles as follows.

1) TRANSMISSION
Electromagnetic signals can penetrate non-metallic obstacles, which allows us to obtain valuable information about things behind walls. The transmission performance varies with frequency. Specifically, the higher the frequency, the worse the transmission performance, which results in a low-pass filtering effect when penetrating an obstacle [53]. For example, visible light is in the band of $3.9 \times 10^{5}$ to $8.6 \times 10^{5}$ GHz, whereas RF signals occupy roughly 3 kHz to 300 GHz. Consequently, visible light has extremely poor penetrability, but RF signals can penetrate obstacles well or simply bypass them, like the FM (frequency modulation) signals used in radio systems. In addition, transmission performance is influenced significantly by the material of the obstacle. Materials like wood, drywall, and styrofoam cause small losses, whereas brick walls and blocks cause significant losses [53]. Table 3 shows the attenuation of signals with different frequencies. Table 4 shows the attenuation of signals through different materials.

2) REFLECTION
Transmitted signals are reflected once they encounter obstacles such as human bodies and furniture. By analyzing these reflected signals, through-wall systems can calculate the distance or angle of objects. However, not all reflected signals are valuable; those reflected off walls, for example, carry no information about the target. Therefore, through-wall systems need an extra filtering process to remove unwanted reflections. In addition, there is a great difference between the reflection of RF signals and that of visible light. Since the wavelength of RF signals is much longer than that of visible light, the human body acts as a reflector rather than a scatterer.

B. PRINCIPLE OF LOCALIZATION
Localization is a core function of through-wall technology. For ease of understanding, we assume that the target lies in a 2D plane. Through-wall systems leverage the following principles to locate targets.
First, a widely used localization method is based on the measurement of distance. Specifically, the round-trip distance, from the transmit antenna to the target and then back to the receive antenna, is used to calculate the location of the target. Suppose we know the round-trip distance for a pair of antennas. We can then plot an ellipse on which the sum of the distances to the two antennas equals the known round-trip distance. In other words, the two antennas are the foci of an ellipse, and the target must lie on the periphery of this ellipse. One pair of antennas, however, is insufficient to locate the target, and we need another pair. Similarly, a second pair of antennas with a known round-trip distance enables us to plot another ellipse on the plane, whose foci are the two antennas of the second pair. Since the correct location is on both ellipses, one of the intersection points is the location of the target.
Expressed mathematically, assume that the round-trip distance is $d_1$ and that the pair of antennas is located at $(x_1, 0)$ and $(x_2, 0)$ in the $(x, y)$ plane. The standard parameters of this ellipse are
$$a = \frac{d_1}{2}, \tag{1}$$
$$c = \frac{|x_2 - x_1|}{2}, \tag{2}$$
where $a$ is the length of the semi-major axis and $c$ is the distance between a focus and the center of the ellipse. Then we can derive the equation of this ellipse as
$$\frac{\left(x - \frac{x_1 + x_2}{2}\right)^2}{\left(\frac{d_1}{2}\right)^2} + \frac{y^2}{\left(\frac{d_1}{2}\right)^2 - \left(\frac{x_2 - x_1}{2}\right)^2} = 1. \tag{3}$$
Similarly, assume that the round-trip distance for another pair of antennas is $d_2$ and that this pair is located at $(x_3, 0)$ and $(x_4, 0)$. Then we can derive the equation of the second ellipse as
$$\frac{\left(x - \frac{x_3 + x_4}{2}\right)^2}{\left(\frac{d_2}{2}\right)^2} + \frac{y^2}{\left(\frac{d_2}{2}\right)^2 - \left(\frac{x_4 - x_3}{2}\right)^2} = 1. \tag{4}$$
Now we have the equations of two ellipses on which the target lies. Therefore, the solutions of (3) and (4) are the possible locations of the target.
If the antennas are directional, it is easy to rule out the invalid intersection, as shown in Fig. 2. If there are more than two pairs of antennas, the location can be uniquely determined. The argument generalizes easily to 3D space, where the ellipses established by pairs of antennas become ellipsoids, and 3 ellipsoids determine one intersection (with directional antennas).
Another way to locate a target is to leverage both the distance and the angle, similar to localization in a polar coordinate system. In the 2D case, each point in the polar coordinate system is determined by a distance from a reference point and an angle from a reference direction. Once we know the distance and the angle of the target, we can locate it accurately. In antenna systems, we can only obtain the round-trip distance rather than the direct distance to the target; hence, as stated above, we use the round-trip distance to plot an ellipse rather than a circle. Then, if we know the angle of the object, we can plot a ray, and there will be only one intersection point, which indicates the location of the object, as shown in Fig. 3. It is straightforward to generalize the argument to 3D space, where, like a point determined by (r, θ, ϕ) in a spherical coordinate system, the location of an object can be determined by the round-trip distance and two spatial angles.
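The two-ellipse construction above can be sketched numerically. The following is a minimal illustration, not code from any of the surveyed systems: the function names and the antenna coordinates are hypothetical, all antennas are assumed to lie on the x-axis, and the directional-antenna assumption is modeled by keeping only intersections with y ≥ 0. Because both ellipses have their major axes on the x-axis, eliminating y² leaves a quadratic in x.

```python
import math

def ellipse_params(x_a, x_b, d):
    """Centre x, a^2 and b^2 of the ellipse with foci (x_a, 0), (x_b, 0)
    whose points have a summed distance d to the two foci."""
    a = d / 2.0                   # semi-major axis
    c = abs(x_b - x_a) / 2.0      # focus-to-centre distance
    if a <= c:
        raise ValueError("round-trip distance must exceed the antenna spacing")
    return (x_a + x_b) / 2.0, a * a, a * a - c * c

def locate_2d(pair1, d1, pair2, d2):
    """Candidate target positions (x, y) with y >= 0 (in front of the antennas)."""
    h1, a1_sq, b1_sq = ellipse_params(pair1[0], pair1[1], d1)
    h2, a2_sq, b2_sq = ellipse_params(pair2[0], pair2[1], d2)
    p, q = b1_sq / a1_sq, b2_sq / a2_sq
    # Equate y^2 from both ellipses: b1^2 - p (x - h1)^2 = b2^2 - q (x - h2)^2
    qa = q - p
    qb = 2.0 * (p * h1 - q * h2)
    qc = b1_sq - b2_sq - p * h1 * h1 + q * h2 * h2
    if abs(qa) < 1e-12:                       # degenerate case: linear equation
        roots = [] if abs(qb) < 1e-12 else [-qc / qb]
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-qb + s) / (2.0 * qa), (-qb - s) / (2.0 * qa)]
    points = []
    for x in roots:
        y_sq = b1_sq - p * (x - h1) ** 2
        if y_sq >= 0:                          # keep real intersections only
            points.append((x, math.sqrt(y_sq)))
    return points

def round_trip(target, pair):
    """Summed distance from the target to both antennas of a pair."""
    tx, ty = target
    return math.hypot(tx - pair[0], ty) + math.hypot(tx - pair[1], ty)

# Hypothetical layout: two antenna pairs on the x-axis, target at (1, 2).
target = (1.0, 2.0)
pair1, pair2 = (-1.0, 0.0), (0.5, 2.0)
candidates = locate_2d(pair1, round_trip(target, pair1),
                       pair2, round_trip(target, pair2))
print(candidates)
```

In this layout only one intersection survives the y ≥ 0 filter; in general up to two candidates can remain, which is why a third antenna pair removes the ambiguity.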

V. BASIC METHOD OF THROUGH-WALL TECHNOLOGY
Most through-wall systems leverage RF techniques to obtain distances and angles (namely direction of arrival (DOA)) of targets. Nowadays, in radar systems, there are many approaches to measure the distance and DOA, and most of them are based on the following ideas. Since through-wall technology has a strong relationship with radar technology, we can find that some through-wall techniques are similar to radar techniques.

A. OBTAINING DISTANCE
We usually leverage RF signals to measure the distance between two objects. Specifically, we know that RF signals travel at the speed of light so once we obtain the time of flight (TOF), which the signal takes to travel from its transmitter to receiver, we can calculate the propagation distance. The direct and indirect measurements of TOF are as follows.

1) PULSE METHOD
The pulse method is a direct way to measure the TOF. First, the transmit antenna transmits a very short pulse. Then, the receive antenna receives the echo of the pulse, and the system calculates the time delay, which is exactly the TOF we want. Finally, the distance from the transmit antenna to the target and back to the receive antenna is
$$d_1 + d_2 = c \cdot t_{TOF}, \tag{5}$$
where $d_1$ is the distance between the transmit antenna and the target, $d_2$ is the distance between the target and the receive antenna, $c$ is the propagation speed, and $t_{TOF}$ is the time delay. Fig. 4 and Fig. 5 illustrate this situation. The advantage of this method is that the measurement is simple and does not need complex algorithms. However, the propagation speed is so high that the TOF at short range is extremely hard to measure. Specifically, in order to resolve such a short TOF, the through-wall system must sample the signals at sub-nanosecond intervals. Therefore, high-speed ADCs that operate at multi-GS/s are necessary [1]. Such ADCs are power-hungry, expensive, and impractical in small-size systems.
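To make the sampling requirement concrete, the short sketch below evaluates the round-trip distance from a TOF and the round-trip distance a pulse covers during one ADC sample period. It is only an illustration: the function names are hypothetical, and the free-space speed of light is assumed for the propagation speed (propagation inside a wall is slower).

```python
C = 299_792_458.0  # assumed propagation speed: speed of light in free space, m/s

def round_trip_distance(t_tof):
    """Round-trip distance d1 + d2 for a measured time of flight (seconds)."""
    return C * t_tof

def range_step_per_sample(sample_rate_hz):
    """Round-trip distance travelled during one ADC sample period."""
    return C / sample_rate_hz

print(round_trip_distance(20e-9))      # a 20 ns delay, about 6 m round trip
print(range_step_per_sample(100e6))    # 100 MS/s: about 3 m per sample
print(range_step_per_sample(5e9))      # 5 GS/s: about 6 cm per sample
```

A 100 MS/s ADC quantizes the round trip in steps of about 3 m, which is why centimeter-level ranging with the pulse method pushes the sampling rate into the multi-GS/s regime.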

2) FMCW
The frequency modulated continuous wave (FMCW) technique is an indirect method to measure the TOF. FMCW transmits a periodic signal (a chirp) whose carrier frequency changes linearly with time. The chirp reflected off objects and received by the antennas has both a time delay and a frequency shift. Instead of measuring the delay directly, FMCW uses the frequency shift to calculate the delay. Since the carrier frequency changes linearly in time, the delay and the frequency shift are linearly related. In Fig. 6 (a), the blue line is the transmitted signal, whose frequency changes linearly with time, and the green line is the received signal reflected off an object. Because the received signal is delayed in time, there is a frequency shift $\Delta f$ between the transmitted and received signals at any given instant. The TOF and the round-trip distance can then be calculated as
$$t_{TOF} = \frac{\Delta f}{k}, \tag{6}$$
$$d_1 + d_2 = c \cdot t_{TOF} = \frac{c\,\Delta f}{k}, \tag{7}$$
where $k$ is the slope of the frequency sweep.
FIGURE 6. (a) shows the transmitted signal in blue and the received signal in green. The received signal has both a time delay ($t_{TOF}$) and a frequency shift ($\Delta f$), and these two differences are linearly related. (b) shows that with a higher rate of change in frequency, $\Delta f$ is more notable.
To show the advantage of this method, assume that the object is static, i.e., the distance between the object and the antennas is constant. Hence, the time delay of the received signal is constant. According to (6), $\Delta f$ then depends only on $k$, the rate of the frequency change. Therefore, the higher the rate of change, the larger $\Delta f$ is, as shown in Fig. 6 (b). Because the TOF is always very short in the short-range situation, we can obtain the TOF much more easily by measuring $\Delta f$, which can be numerically billions of times larger than the TOF. For example, if the carrier frequency sweeps from 5 GHz to 7 GHz in one second ($k = 2 \times 10^{9}$ Hz/s), the frequency shift in Hz is theoretically $2 \times 10^{9}$ times the TOF in seconds, which makes the TOF much easier to measure.
When applying FMCW to a real system, system designers need to make sure that the carrier frequency of the transmitted signal changes linearly with time. To achieve this goal, the linearity of the transmitter hardware within the bandwidth is very important, and any nonlinear component in the transmitter affects the estimation of the TOF. Take a voltage-controlled oscillator (VCO) as an example: if the control signal is outside the linear tuning range, the output frequency of the VCO no longer follows a linear relationship with the control signal. Moreover, the choice of the FMCW bandwidth affects not only the linearity of the output frequency but also the detection resolution. Typically, the resolution of FMCW, i.e., the minimum measurable change in location, is
$$\text{Resolution} = \frac{c}{2B}, \tag{8}$$
where $B$ is the bandwidth of the FMCW signal [1]. For example, $B = 1.7$ GHz gives a resolution of about 8.8 cm. Therefore, in order to achieve sub-meter accuracy, FMCW-based systems typically use a bandwidth on the order of GHz, as shown in Table 5. Note that the bandwidths of such through-wall systems are selected around 1.7 GHz to satisfy the FCC regulations for civil use.
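The relations between frequency shift, TOF, distance, and bandwidth can be checked with a small simulation. The sketch below is illustrative only: the sweep time of 2.5 ms, the 1 MS/s sampling rate of the mixed-down signal, and all function names are assumptions, and the received chirp is idealized as a single noise-free tone at the frequency shift.

```python
import numpy as np

C = 299_792_458.0  # assumed propagation speed (free space), m/s

def fmcw_round_trip(beat_hz, slope_hz_per_s):
    """TOF and round-trip distance from a measured frequency shift."""
    t_tof = beat_hz / slope_hz_per_s
    return t_tof, C * t_tof

def range_resolution(bandwidth_hz):
    """One-way resolution c / (2B); the round-trip resolution is c / B."""
    return C / (2.0 * bandwidth_hz)

def estimate_beat(t_tof, slope, sweep_time, fs):
    """Simulate the mixed-down tone of one sweep and locate its FFT peak."""
    n = int(round(sweep_time * fs))
    t = np.arange(n) / fs
    tone = np.exp(2j * np.pi * slope * t_tof * t)   # frequency = k * TOF
    spectrum = np.abs(np.fft.fft(tone))
    peak_bin = int(np.argmax(spectrum[: n // 2]))
    return peak_bin * fs / n

bandwidth = 1.7e9               # Hz, the order of magnitude quoted in the text
sweep_time = 2.5e-3             # s, assumed for illustration
slope = bandwidth / sweep_time  # k in Hz/s
true_round_trip = 10.0          # m

beat = estimate_beat(true_round_trip / C, slope, sweep_time, fs=1e6)
t_tof, est_round_trip = fmcw_round_trip(beat, slope)
print(beat, t_tof, est_round_trip)
print(range_resolution(bandwidth))   # about 0.088 m
```

A TOF of roughly 33 ns turns into a frequency shift of tens of kHz, which a 1 MS/s ADC can capture easily. The FFT bin width of 1/sweep_time corresponds to a round-trip step of c/B, matching the resolution formula.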

B. OBTAINING DIRECTION OF ARRIVAL
For localization, the distance information alone is sometimes insufficient. Another technique allows us to obtain the direction of arrival (DOA) of signals and can be used to detect the angles of objects. This technique relies on antenna arrays. First of all, assume we have an antenna array containing m identical antennas placed uniformly along a straight line, with a distance d between two adjacent antennas, as shown in Fig. 7. After demodulation, the output of antenna k (k = 1, 2, . . . , m) can be written as the following equation [55] (taking antenna 1 as the phase reference and measuring θ from the array normal):
$$y_k(t) = H_k(\omega_c)\, e^{-i \omega_c (k-1) d \sin\theta / c}\, s(t) + e_k(t), \tag{9}$$
where $\theta$ is the DOA, $c$ is the propagation speed of signals, $\omega_c$ is the carrier frequency, $s(t)$ is the baseband signal, $H_k(\omega)$ is the frequency response of the kth antenna within the band of $s(t)$, and $e_k(t)$ is the noise received at the kth antenna. Then, with m sensors, we obtain
$$\mathbf{y}(t) = \mathbf{a}(\theta)\, s(t) + \mathbf{e}(t), \tag{10}$$
where $\mathbf{y}(t) = [y_1(t), \ldots, y_m(t)]^T$, $\mathbf{e}(t) = [e_1(t), \ldots, e_m(t)]^T$, and $\mathbf{a}(\theta) = [H_1(\omega_c),\ H_2(\omega_c) e^{-i \omega_c d \sin\theta / c},\ \ldots,\ H_m(\omega_c) e^{-i \omega_c (m-1) d \sin\theta / c}]^T$. By (10), with known $y_k(t)$, $s(t)$, and $H_k(\omega_c)$ (k = 1, . . . , m), we can obtain $\theta$ (the DOA) approximately. The details of the derivation are given in the appendix.
In real-world scenarios that contain more than one source, however, the environment becomes more complicated, since more sources bring more noise and interference. Several advanced algorithms have been proposed to estimate multiple DOAs. The best known is the Multiple Signal Classification (MUSIC) algorithm, which performs well in multi-DOA estimation [56]-[58].
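As an illustration of multi-DOA estimation with a uniform linear array, the following sketch implements a textbook form of MUSIC on simulated data. It is not taken from any surveyed system. The assumptions are: ideal antennas (frequency response equal to 1), half-wavelength spacing, angles measured from the array normal, two uncorrelated sources, and a known number of sources; all names and parameter values are hypothetical.

```python
import numpy as np

def steering_vector(theta_rad, m, spacing_wavelengths=0.5):
    """Array response of an m-element uniform linear array (ideal antennas)."""
    k = np.arange(m)
    return np.exp(-2j * np.pi * spacing_wavelengths * k * np.sin(theta_rad))

def music_spectrum(snapshots, n_sources, grid_rad, spacing_wavelengths=0.5):
    """MUSIC pseudo-spectrum from an (m, n) matrix of array snapshots."""
    m, n = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n       # sample covariance matrix
    _, vecs = np.linalg.eigh(cov)                  # eigenvalues in ascending order
    noise_space = vecs[:, : m - n_sources]         # eigenvectors of smallest eigenvalues
    out = np.empty(len(grid_rad))
    for i, theta in enumerate(grid_rad):
        proj = noise_space.conj().T @ steering_vector(theta, m, spacing_wavelengths)
        out[i] = 1.0 / max(np.real(np.vdot(proj, proj)), 1e-12)
    return out

def strongest_peaks(values, count):
    """Indices of the `count` highest local maxima, in ascending index order."""
    idx = [i for i in range(1, len(values) - 1)
           if values[i] > values[i - 1] and values[i] >= values[i + 1]]
    idx.sort(key=lambda i: values[i], reverse=True)
    return sorted(idx[:count])

# Simulated scene: two sources at -20 and 30 degrees, 8 antennas, 200 snapshots.
rng = np.random.default_rng(0)
m, n = 8, 200
true_deg = np.array([-20.0, 30.0])
A = np.stack([steering_vector(np.deg2rad(t), m) for t in true_deg], axis=1)
s = (rng.standard_normal((2, n)) + 1j * rng.standard_normal((2, n))) / np.sqrt(2)
noise = 0.1 * (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = A @ s + noise

grid_deg = np.arange(-90.0, 90.5, 0.5)
spectrum = music_spectrum(y, 2, np.deg2rad(grid_deg))
estimated_deg = grid_deg[strongest_peaks(spectrum, 2)]
print(estimated_deg)
```

The pseudo-spectrum peaks where the steering vector is nearly orthogonal to the noise subspace. Correlated reflections, as in real through-wall scenes, violate the uncorrelated-source assumption and call for variants such as smoothed MUSIC.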

C. MACHINE LEARNING
To achieve through-wall localization with higher precision and more fine-grained through-wall motion classification, system designers need to formulate more sophisticated models that fit the complicated environment caused by minor movements of the human body and fine-grained motions. Therefore, researchers resort to machine learning, a powerful technology for finding hidden relationships in a complicated environment. The most widely used machine learning method is the convolutional neural network (CNN), which performs well in image processing. CNNs are widely adopted in through-wall systems because the formats of RF signals (e.g., RF heatmaps, which indicate the signal strength at different positions) are quite similar to images. In the through-wall imaging scenario, each snapshot consists of many pixels representing the spatial information of RF signals. However, feeding the whole snapshot to a fully connected neural network is intractable, since such a dense network would contain a huge number of parameters, which are hard to train and prone to overfitting. Therefore, a CNN is used to reduce the complexity by extracting only the important features from the input RF signals and then using these features to predict the positions of targets [10], [47], [48].
The traditional structure of CNN is shown in Fig. 8. There are three kinds of layers in CNN: Convolutional Layer, Pooling Layer, and Fully Connected Layer. Convolutional layers are used to extract features from the input data. Each neuron in a convolutional layer is connected with a group of neurons in the previous layer, and the relationship between the neurons in two layers is determined by a matrix which contains the weights of connection between neurons in two layers. This matrix is called Kernel or Filter. Specifically, a neuron in one layer is calculated by the convolution operation between the previous layer and a Kernel. Different Kernels can capture different features of data from the previous layers. Then, the pooling layer is used to reduce the spatial size of the convolved feature from convolutional layers. The main purpose of this layer is to identify dominant features and ignore the trivial ones in order to reduce the computation complexity. After multiple layers of convolution and pooling, the extracted features are put into the following fully connected layers, which are used for classification. In short, convolutional and pooling layers are responsible for feature extraction, and fully connected layers are responsible for mapping different combinations of features with different outputs.
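The three layer types described above can be sketched in a few lines of NumPy. This is a toy forward pass only, under several assumptions not stated in the text: a single kernel, a ReLU nonlinearity after the convolution, 2×2 max pooling, and hand-set weights instead of trained ones. All names and sizes are hypothetical.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (cross-correlation, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Each output neuron sees only a kh x kw patch of the previous layer.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Keep the dominant value of every size x size block."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[: h * size, : w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

def forward(image, kernel, fc_weights, fc_bias):
    features = np.maximum(conv2d(image, kernel), 0.0)   # convolution + ReLU
    pooled = max_pool2d(features)                        # pooling
    return fc_weights @ pooled.ravel() + fc_bias         # fully connected layer

# Toy input standing in for a 6 x 6 RF heatmap.
heatmap = np.ones((6, 6))
kernel = np.ones((3, 3))
fc_weights = np.ones((2, 4))      # 4 pooled features mapped to 2 output scores
fc_bias = np.zeros(2)
scores = forward(heatmap, kernel, fc_weights, fc_bias)
print(scores)
```

The 3×3 kernel has 9 weights regardless of the input size, whereas a fully connected layer on the same 6×6 input would need 36 weights per output neuron, which illustrates the parameter saving mentioned above.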
FIGURE 8. Structure of CNN. The convolutional layers are used for extracting features, and the more convolutional layers, the more high-level features (e.g., faces and eyes) can be extracted. The pooling layers are designed to extract dominant features after the convolutional layers. The last layer is the fully connected layer, which works as a classifier to match different features with corresponding objects.

The structure of the neural network in CNN-based through-wall systems varies from system to system, but the ideas for extracting features from RF signals are quite similar. Specifically, a common way to process RF signals is spatio-temporal convolution, which applies convolution operations to both the spatial dimension and the temporal dimension. The motivation is that, in order to locate different body parts, the neural network should not only focus on the information in one snapshot (spatial dimension) but also take a series of consecutive snapshots (temporal dimension) into consideration, since the motions of the human body are continuous and highly correlated over time. Based on this motivation, the common input of the CNN is a series of consecutive RF snapshots, and the convolution operation is applied to both spatial and temporal dimensions, as shown in Fig. 9. For instance, in the 2-dimensional through-wall imaging scenario, we need 3-dimensional (2 spatial dimensions and 1 temporal dimension) convolution operations to process RF data; in the 3-dimensional through-wall imaging scenario, we need 4-dimensional (3 spatial dimensions and 1 temporal dimension) convolution operations. Note that the higher the dimension of the convolution operations, the higher the computational complexity. Applying 4-dimensional convolution operations to extract features from RF signals directly is time-consuming, and even mainstream open-source machine learning libraries (e.g., TensorFlow) only support up to 3-dimensional convolution operations. Therefore, Zhao et al. proposed a tensor decomposition technique to reduce the convolution dimension and leveraged auxiliary neural networks to help the CNN focus on the regions with targets, reducing the computational complexity [47], [48].

FIGURE 9. Spatio-temporal convolution. The convolution operation is applied to a series of consecutive RF snapshots to extract features from both spatial and temporal dimensions [59].
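A minimal NumPy sketch of a 3-dimensional (1 temporal plus 2 spatial) convolution over a stack of snapshots is given below. The kernel is hand-crafted for illustration rather than learned: it subtracts a 3×3 neighborhood in one snapshot from the same neighborhood in the next, so it responds only when something changes over time. The snapshot sizes, the kernel, and the function name are assumptions.

```python
import numpy as np

def conv3d(volume, kernel):
    """'Valid' cross-correlation over (time, height, width), stride 1."""
    kt, kh, kw = kernel.shape
    ot = volume.shape[0] - kt + 1
    oh = volume.shape[1] - kh + 1
    ow = volume.shape[2] - kw + 1
    out = np.empty((ot, oh, ow))
    for t in range(ot):
        for i in range(oh):
            for j in range(ow):
                out[t, i, j] = np.sum(volume[t:t + kt, i:i + kh, j:j + kw] * kernel)
    return out

# Hand-set kernel of shape (2, 3, 3): next snapshot minus previous snapshot,
# averaged over a 3 x 3 spatial neighbourhood.
kernel = np.stack([-np.ones((3, 3)), np.ones((3, 3))]) / 9.0

# Four consecutive 6 x 6 snapshots with a static reflector ...
static = np.zeros((4, 6, 6))
static[:, 2, 2] = 1.0

# ... and with a reflector that moves one pixel per snapshot.
moving = np.zeros((4, 6, 6))
for t in range(4):
    moving[t, 2, 1 + t] = 1.0

static_response = conv3d(static, kernel)
moving_response = conv3d(moving, kernel)
print(static_response.shape, np.abs(static_response).max(), np.abs(moving_response).max())
```

The triple loop makes the cost visible: every added convolution dimension multiplies the work per output element by the kernel extent along that dimension, which is the complexity concern raised above for 4-dimensional convolutions.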
In through-wall systems, machine learning is often used with cross-modal supervision. This is because training sets, i.e., RF data labeled with true human body positions, are scarce in the through-wall field. Thus, systems based on machine learning leverage vision-based human body tracking systems to supervise the training. Specifically, in the training stage, the RF-based system and the vision-based system each extract human bodies from the same scene. The results of the vision-based system are then used to correct the neural network in the RF-based system. Most often, the vision-based system is itself based on another well-trained neural network. After training, the RF-based system can work alone. The cross-modal supervision structure is shown in Fig. 10.

VI. EXAMPLE OF HANDY THROUGH-WALL SYSTEMS
In recent years, there has been a rapid rise of handy through-wall systems with identification, localization, or imaging functions. These systems are low-power, narrow-band, and lightweight. Besides that, unlike some traditional monitoring devices, these systems do not require extra devices worn on the human body. Some radar techniques (e.g., FMCW and ISAR) and machine learning techniques (e.g., CNN) have been used in the existing through-wall systems. Some of these systems are explained in detail as follows; their crucial implementation characteristics are summarized in Table 5, and their performance is summarized in Table 6.

A. Wi-Vi
Wi-Vi is a wireless device proposed by Adib et al. in 2013 [3]. It belongs to both the WiFi-based systems defined in Section III-A and the SDR systems defined in Section III-D. The main function of Wi-Vi is to detect the movement of people walking behind a wall. Wi-Vi leverages the MIMO technique, ISAR, and the MUSIC algorithm to work in the through-wall scenario. Another function of Wi-Vi is gesture encoding by tracking the motion of the human body: Wi-Vi allows a person to send the bits "1" and "0" by moving forward and backward, without any communication device.
Wi-Vi is a MIMO device with 2 transmit antennas and 1 receive antenna and leverages WiFi OFDM signals in the ISM band (at 2.4 GHz). Besides, Wi-Vi is based on the typical WiFi hardware which makes it relatively low-power, low-cost, and low-bandwidth. Since it has the standard WiFi structure, Wi-Vi might be implemented on WiFi devices and operated by the public in the future.
Wi-Vi has to address several challenges, and the toughest one is called the flash effect. In order to detect things behind walls, signals need to traverse the wall at least twice, which leads to significant attenuation. Therefore, the reflections from the objects of interest are buried in the reflections off the wall as well as in the signal received directly from the transmit antenna. These two kinds of signals are so strong that they overwhelm the ADC of a standard WiFi device (since the ADC overflows, digital background subtraction alone is not a suitable remedy). Wi-Vi utilizes the MIMO technique to eliminate the flash effect. First, Wi-Vi estimates the channel between the first transmit antenna and the receive antenna and the channel between the second transmit antenna and the receive antenna. Then Wi-Vi lets the signals from the two transmit antennas interfere destructively with each other at the receive antenna. In this way, Wi-Vi eliminates the direct signals and all the reflections off static objects such as walls and furniture. After the flash is removed, Wi-Vi receives only the signals reflected from moving objects.
The next step is localization. Because there is only one receive antenna, Wi-Vi employs a technique called Inverse Synthetic Aperture Radar (ISAR). As mentioned above, an antenna array is necessary to obtain the DOA of signals. ISAR leverages the movement of the target to emulate an antenna array, and hence provides Wi-Vi with a virtual antenna array for obtaining the DOA. Fig. 11 illustrates ISAR used in localization. Compared with a real antenna array, ISAR with only one antenna is cheaper and simpler. When tracking multiple people, the received signal is a superposition of the reflections from multiple moving humans, which are correlated in time and corrupted by significant noise; Wi-Vi therefore leverages the smoothed MUSIC algorithm to process the received signals.
Finally, the DOA estimates are the only direct output of Wi-Vi, and all the other functions (e.g., counting people and gesture-based communication) are derived from the DOA information.
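The idea behind the MIMO nulling step can be illustrated with scalar complex channels. The sketch below is a strong simplification: it assumes a single narrowband subcarrier, perfectly estimated channels, and no noise, and the channel values and function names are hypothetical. The second transmit antenna sends a scaled copy of the first antenna's signal, chosen so that the two copies cancel at the receiver as long as the environment is static.

```python
import numpy as np

def nulling_precoder(h1, h2):
    """Scaling applied at transmit antenna 2 so that both signals cancel at the receiver."""
    return -h1 / h2

def received(h1, h2, precoder, x):
    """Receiver sees antenna 1 sending x and antenna 2 sending precoder * x."""
    return h1 * x + h2 * precoder * x

# Channels of the static environment (direct path plus wall and furniture reflections).
h1 = 0.8 * np.exp(1j * 0.3)
h2 = 0.5 * np.exp(-1j * 1.1)
p = nulling_precoder(h1, h2)
x = 1.0 + 0.0j

static_residual = received(h1, h2, p, x)

# A moving person adds a small time-varying component to the first channel.
delta = 1e-3 * np.exp(1j * 2.0)
moving_residual = received(h1 + delta, h2, p, x)

print(abs(static_residual), abs(moving_residual))
```

Everything captured by the channel estimates, however strong, cancels, while the weak contribution of the moving person survives. The receiver gain can then be raised without saturating the ADC.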

1) EXPERIMENTAL EVALUATION
Wi-Vi can distinguish between 0, 1, 2, and 3 moving humans with an accuracy of 100%, 100%, 85%, and 90%, respectively, in a room with 6-inch hollow walls supported by steel frames. Wi-Vi correctly decodes all messages within 5 meters, and the decoding accuracy decreases to 75% at 8 meters. Wi-Vi cannot detect gestures beyond 9 meters.

2) LIMITATIONS
First, Wi-Vi can only track moving objects, i.e., a person who remains stationary will not be detected. Second, the detectable number of individuals is limited because more individuals lead to more complicated situations. Third, the virtual antenna array is affected by the velocity of the moving objects. Because both the magnitude and the direction of the velocity are unknown, the DOA calculated with this antenna model is relatively rough.

B. WiTrack
WiTrack is a wireless system which was proposed by Adib et al. in 2014 [1]. It belongs to the software-defined radio system defined in Section III-D. The main function of WiTrack is 3D motion tracking in both line-of-sight and through-wall scenarios. WiTrack leverages FMCW technique and its hardware is shown in Fig. 12(b). Besides, WiTrack can also detect the movement of body parts. Based on the above two functions, WiTrack can be applied to elderly fall detection and gesture-based appliance control.
WiTrack has 4 antennas: 1 transmit antenna and 3 receive antennas, arranged in a "T" shape. Fig. 12(a) illustrates the antenna setup. By analyzing the signals reflected from a human body, WiTrack calculates the round-trip distances from the transmit antenna to the body and back to each receive antenna. As described in Section IV-B, the 3D location of the body can be uniquely determined by 3 round-trip distances. The detailed steps are as follows. The first step is to obtain the TOF. WiTrack exploits FMCW, as described in Section V-A.2, to obtain the TOF and then calculates each round-trip distance. Specifically, WiTrack transmits a periodic signal whose carrier frequency changes linearly with time. Then, by measuring the frequency shift in the received signal, WiTrack can obtain the TOF easily, without high-speed ADCs, which are expensive and power-hungry. Finally, the round-trip distances are calculated directly from the TOFs.
The second step is to extract the information about the human body. WiTrack also needs to deal with the flash effect and the multi-path effect. This is because the objects in the room reflect a large amount of signal, and reflected signals may bounce off other objects before reaching the receive antenna. Therefore, useful information is buried in these interfering signals, and WiTrack needs extra operations to extract the useful parts. To remove the reflections from all static objects, WiTrack leverages the fact that the frequency shift of the signal reflected off static objects is constant over time. Therefore, by extracting the changing component, WiTrack can distinguish body (moving) reflections from static reflections; this process is called background subtraction. Besides that, to address the multi-path effect, WiTrack extracts the valid path based on the assumption that the signal reflected directly from the human has the smallest frequency shift (its path is the shortest). Noise, however, affects the judgment of the smallest frequency shift. WiTrack therefore averages the data over five consecutive sweeps (a sweep being one period of the FMCW signal) to reduce the effect of noise. Because the noise is random and hence adds up incoherently, the useful information becomes more prominent after averaging. Then WiTrack takes the local maximum with the shortest distance as the valid path. Finally, WiTrack applies outlier rejection and a Kalman filter to obtain a better result. Fig. 13 illustrates this step. A noteworthy point is that, due to the background subtraction, WiTrack cannot track a person who stops moving. Therefore, WiTrack interpolates from the latest location to fill in the missing data when a person stops moving.
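The second step can be illustrated on synthetic data. The sketch below is a simplified model rather than WiTrack's implementation: each sweep is represented as a complex profile over distance bins, static reflectors keep a constant complex value, the person's reflection only changes phase between frames, and outlier rejection and Kalman filtering are omitted. The bin positions, amplitudes, noise level, threshold, and function names are assumptions.

```python
import numpy as np

def frame_from_sweeps(sweeps):
    """Average consecutive sweeps; incoherent noise is reduced by the mean."""
    return sweeps.mean(axis=0)

def moving_profile(prev_frame, frame):
    """Background subtraction: static reflections cancel, moving ones remain."""
    return np.abs(frame - prev_frame)

def nearest_reflector(profile, rel_threshold=0.3):
    """First distance bin whose power exceeds a fraction of the maximum."""
    above = profile >= rel_threshold * profile.max()
    return int(np.argmax(above))

rng = np.random.default_rng(1)
n_bins, n_sweeps = 64, 5

def simulate_frame(person_phase):
    clean = np.zeros(n_bins, dtype=complex)
    clean[10] = 50.0                               # wall (static, very strong)
    clean[30] = 20.0                               # furniture (static)
    clean[40] = 5.0 * np.exp(1j * person_phase)    # direct reflection off the person
    clean[52] = 8.0 * np.exp(1j * person_phase)    # stronger multipath copy, longer path
    noise = 0.3 * (rng.standard_normal((n_sweeps, n_bins))
                   + 1j * rng.standard_normal((n_sweeps, n_bins)))
    return frame_from_sweeps(clean + noise)

profile = moving_profile(simulate_frame(0.0), simulate_frame(np.pi / 2))
strongest_bin = int(np.argmax(profile))
nearest_bin = nearest_reflector(profile)
print(strongest_bin, nearest_bin)
```

In this synthetic scene the strongest residual is the multipath copy, so picking the maximum would place the person too far away, whereas picking the nearest strong bin recovers the direct path. The wall and furniture, although much stronger, vanish after subtraction.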
The third step is the 3D localization. After getting the 3 round-trip distances between antennas and the body, WiTrack can localize the position in 3D space. Based on this 3D position, WiTrack has the ability to detect falls and the direction of a pointing hand.
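For one particular antenna layout, the intersection of the three ellipsoids has a closed form. The sketch below assumes a hypothetical geometry that is not confirmed by the text: the transmit antenna at the origin, receive antennas at (L, 0, 0), (-L, 0, 0), and (0, 0, -L), and directional antennas so that the target has y > 0. Writing ρ for the distance from the origin to the target, each round-trip distance satisfies |p - r_i| = d_i - ρ, and squaring these relations yields linear equations in x and z.

```python
import math

def locate_3d(d1, d2, d3, arm):
    """Target position from three round-trip distances.

    Assumed layout: Tx at the origin, Rx1 at (arm, 0, 0), Rx2 at (-arm, 0, 0),
    Rx3 at (0, 0, -arm); the target is assumed to lie in the half-space y > 0.
    """
    # Adding the squared relations for Rx1 and Rx2 eliminates x and gives rho = |p|.
    rho = (d1 * d1 + d2 * d2 - 2.0 * arm * arm) / (2.0 * (d1 + d2))
    x = (arm * arm - d1 * d1 + 2.0 * d1 * rho) / (2.0 * arm)
    z = (d3 * d3 - 2.0 * d3 * rho - arm * arm) / (2.0 * arm)
    y_sq = rho * rho - x * x - z * z
    if y_sq < 0:
        raise ValueError("inconsistent round-trip distances")
    return x, math.sqrt(y_sq), z

def round_trip_3d(p, rx):
    """Distance Tx (origin) -> target -> receive antenna."""
    return math.dist((0.0, 0.0, 0.0), p) + math.dist(p, rx)

arm = 1.0
target = (1.0, 4.0, 0.5)
d1 = round_trip_3d(target, (arm, 0.0, 0.0))
d2 = round_trip_3d(target, (-arm, 0.0, 0.0))
d3 = round_trip_3d(target, (0.0, 0.0, -arm))
estimate = locate_3d(d1, d2, d3, arm)
print(estimate)
```

With noisy distances the three ellipsoids no longer meet in a single point, so a practical system would rather fit the position in a least-squares sense and smooth it over time.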

1) EXPERIMENTAL EVALUATION
WiTrack's median location error is 9.9 cm, 8.6 cm, and 17.7 cm along the x, y, and z dimensions in line-of-sight scenarios respectively. The median location error is 13.1 cm, 10.25 cm, and 21.0 cm along the x, y, and z dimensions in through-wall scenarios respectively.

2) LIMITATIONS
First, WiTrack can track only one person. Second, WiTrack cannot identify still people, i.e., in order to be detected, the user needs to move. Third, the detection of body parts is relatively coarse. Only the motions of large parts such as legs and arms can be tracked. Besides that, WiTrack cannot identify which part of the body has moved.

C. RF-CAPTURE
RF-Capture is an advanced wireless device proposed by Adib et al. in 2015 [9]. It belongs to the software-defined radio systems defined in Section III-D. Its main function is capturing the human figure through a wall. Specifically, RF-Capture can capture a coarse human skeleton with low-power RF signals (1/1000 the power of WiFi), classify different subjects, and identify and track body parts through a wall.

FIGURE 14. Coarse-to-fine angular scan (figure from [9]). (a) shows the coarse angle scan with a small number of antennas. (b) shows the finer scan with more antennas but within the limited region.
The antenna array of RF-Capture is ''T'' shaped and contains 4 transmit antennas and 16 receive antennas. The scan of the environment is realized by a special coarse-to-fine algorithm based on the FMCW technique and antenna array technique. Besides, the figure capture is achieved by several identification and classification algorithms. Two key components are as follows.

1) COARSE-TO-FINE 3D SCAN
To capture the figure of a human, RF-Capture needs to scan every voxel in the surrounding space. RF-Capture uses spherical coordinates (r, θ, φ) to identify each voxel. The depth r is obtained through FMCW, and the two angles θ and φ are obtained through the "T"-shaped antenna array. Calculating the reflection for every voxel, however, would take a long time. Therefore, RF-Capture leverages a coarse-to-fine algorithm to reduce the algorithmic complexity. The coarse-to-fine algorithm contains two components: a coarse-to-fine angular scan and a coarse-to-fine depth scan. The coarse-to-fine angular scan exploits the fact that the larger an array, the finer its spatial resolution. Hence, as shown in Fig. 14, RF-Capture starts with a small array to scan the whole space at a relatively low resolution and then gradually adds more antennas to the array to scan the regions with high reflection power. In this way, the regions containing objects are scanned at a higher resolution, and a lot of computing resources are saved in empty regions. Similar to the angular scan, the coarse-to-fine depth scan changes the depth resolution by changing the bandwidth of FMCW. First, RF-Capture uses a small chunk of bandwidth to obtain a coarse region of the human body. Then, by adding more bandwidth, RF-Capture rescans the region of interest at a higher resolution, as shown in Fig. 15.

FIGURE 15. Coarse-to-fine depth scan (figure from [9]). (a) shows the coarse depth scan with a small chunk of bandwidth. (b) shows the finer scan with more bandwidth but within the limited region.

2) MOTION-BASED FIGURE CAPTURE
One challenge RF-Capture has to face is the mirror effect, as discussed in Section II. To address this challenge, RF-Capture leverages consecutive reflection snapshots to reconstruct the whole human figure. The process contains four steps: Compensation for Depth, Compensation for Swaying, Body Part Segmentation, and Skeletal Stitching.

3) EXPERIMENTAL EVALUATION
With 5 users, the classification accuracy is 95.7%, and with 15 users, it is 88.2%. Besides, the through-wall body part identification accuracy is 99.13% when the user is 3 m away and 76.4% when the user is 8 m away. Finally, RF-Capture can track the palm of a user to within a couple of centimeters.

4) LIMITATIONS
First, the motion-based figure capture algorithm is based on the assumption that the user starts by walking towards the device. Hence, it does not work with other motions. Second, though the resolution of RF-Capture is a great leap compared with other through-wall devices, it can only capture a relatively coarse human skeleton. Besides that, some functions such as body part tracking can only be applied to certain motions at certain positions, since not all the reflected signals can be detected by the device. Finally, because of the use of background subtraction, RF-Capture cannot detect static objects.

D. RF-POSE
RF-Pose is a wireless device which was proposed by Zhao et al. in 2018 [10]. It belongs to the software-defined radio system defined in Section III-D. RF-Pose is a neural network system that reconstructs an accurate 2D human pose even in through-wall scenarios.
RF-Pose is relatively special because it follows a teacher-student design based on deep neural networks. RF-Pose has two parts: visual supervision and RF-based pose estimation, as shown in Fig. 16. The visual part extracts 2D poses from camera images and provides cross-modal supervision for the RF-based pose estimation. The RF-based part detects reflections of RF signals and performs pose estimation; it is trained by the visual part. Specifically, the RF-based pose estimation leverages FMCW to obtain depth information with two antenna arrays, one vertical and one horizontal. Therefore, as shown in Fig. 17, RF-Pose takes two RF heatmaps and an RGB image (recorded at the same time) as input to the deep neural network. The teacher network predicts keypoint confidence maps (keypoints being parts of the human body) from the image and uses these maps to supervise the student network. Once the student network is trained, RF-Pose can work with RF signals alone.
One challenge RF-Pose has to face is the mirror effect, as discussed in Section II. Since the human body reflects RF signals like a mirror rather than a scatterer, one snapshot cannot capture all parts of the human body. To address this challenge, RF-Pose packages a sequence of RF snapshots as the input to the neural network, since multiple snapshots contain more information about the human body and its motion.

1) EXPERIMENTAL EVALUATION
In line-of-sight scenarios, the average precision (AP) of RF-Pose is 62.4 whereas the AP of the vision-based system is 68.8. In through-wall scenarios, the AP of RF-Pose is 58.1 whereas the vision-based system doesn't work.

2) LIMITATIONS
The RF signal of RF-Pose can traverse walls, but it cannot traverse the human body. Therefore, inter-person occlusion is a limitation of RF-Pose.

E. OTHER SYSTEMS
There are some other handy through-wall systems with advanced or novel functions. All of these systems feature low power, low bandwidth, and compactness. Besides, these systems do not require users to carry extra devices and can be applied to non-military use as well.
WiTrack2.0, proposed by Adib et al. in 2015, is an updated version of WiTrack [5]. WiTrack2.0 belongs to the software-defined radio systems defined in Section III-D. WiTrack2.0 can locate multiple people (up to five), even static ones. There are two challenges that WiTrack2.0 has to face. The first is the significant multipath effect, since the movement of multiple people results in more reflections, which strongly affect the signals. The second is the near-far problem: since the power of near reflections is much higher than that of distant reflections, it is difficult to detect people who are farther away. WiTrack2.0 leverages multi-shift FMCW to address the multipath effect and a successive silhouette cancellation algorithm to address the near-far problem. Besides that, the detection of static people is based on their breathing, i.e., when a person breathes, his/her chest moves by less than a centimeter. Finally, WiTrack2.0 has a median accuracy of 11.7 cm in each of the x and y dimensions.
Vital-Radio, proposed by Adib et al. in 2015, is a wireless monitoring device [2]. It belongs to the software-defined radio systems defined in Section III-D. Vital-Radio can track breathing and heartbeats without physical contact, even through a wall. Besides that, it can monitor the vital signs of multiple people simultaneously. The main idea is to identify the minor movements of the human body caused by breathing and heartbeats. The operation has three steps. First, Vital-Radio leverages FMCW to separate the space into different buckets depending on the distance from the device, as shown in Fig. 18. In this way, Vital-Radio can isolate reflections from different users and eliminate reflections off furniture and walls, since they fall into different buckets. Then, Vital-Radio identifies the reflections that involve breathing and heartbeats by analyzing the periodicity in each bucket. Since breathing and heartbeats are periodic, the reflections in the buckets where the dominant motion is breathing or heartbeat are periodic as well. Finally, leveraging the FFT and linear regression on the phase, Vital-Radio extracts the breathing and heart rates from the received signals. In the experimental evaluation, Vital-Radio tracks users' breathing and heart rates with a median accuracy of 99% when a person is 8 m from the device.
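The periodicity analysis can be illustrated with a synthetic phase signal from one distance bucket. The sketch below covers only the FFT peak search; the linear regression refinement mentioned above is omitted. The sampling rate, observation window, modulation amplitudes, noise level, frequency bands, and function name are assumptions, and the phase is assumed to be already unwrapped.

```python
import numpy as np

def dominant_rate_bpm(phase, fs, band_hz):
    """Rate (per minute) of the strongest spectral peak inside a frequency band."""
    x = phase - phase.mean()                       # remove the constant offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    peak_freq = freqs[in_band][np.argmax(spectrum[in_band])]
    return 60.0 * peak_freq

fs, duration = 20.0, 60.0                          # 20 Hz phase samples for 60 s
t = np.arange(int(fs * duration)) / fs
rng = np.random.default_rng(2)

# Chest motion modulates the phase: breathing at 0.25 Hz (15 per minute),
# a much weaker heartbeat at 1.2 Hz (72 per minute), plus measurement noise.
phase = (1.0 * np.sin(2 * np.pi * 0.25 * t)
         + 0.05 * np.sin(2 * np.pi * 1.2 * t)
         + 0.01 * rng.standard_normal(t.size))

breathing_bpm = dominant_rate_bpm(phase, fs, (0.1, 0.6))
heart_bpm = dominant_rate_bpm(phase, fs, (0.8, 3.0))
print(breathing_bpm, heart_bpm)
```

The frequency resolution of the FFT is the inverse of the window length (here 1/60 Hz, i.e., one breath or beat per minute), which is one reason a finer estimate from the phase slope is useful on short windows.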
RF-Pose3D, proposed by Zhao et al. in 2018, is an updated version of RF-Pose [47]. RF-Pose3D belongs to the software-defined radio systems defined in Section III-D and is the first system that extracts full dynamic 3D skeletons of people (including the head, arms, shoulders, hip, legs, etc.) from RF signals. Moreover, RF-Pose3D works with multiple people, even in through-wall and occluded scenarios. RF-Pose3D is based on a CNN that takes the RF signals captured by a multi-antenna FMCW radio and 3D skeletons obtained from video as inputs, as shown in Fig. 19. The architecture has three components. First, sensing the 3D skeleton: RF-Pose3D obtains the reflected signal and then leverages the CNN to extract skeletons. Second, scaling to multiple people: RF-Pose3D exploits a deep neural network that learns to detect individuals and focus on each of them. Third, training: the training data are obtained from a set of 2D visual-based skeleton identification systems. Since the training data must be 3D skeleton information, the RF-Pose3D team developed a system containing 12 such 2D systems to obtain 3D training data. Once the network is trained, RF-Pose3D can extract 3D skeletons from RF signals alone. In the experimental evaluation, the average errors of tracking each keypoint on the human body are 5.2 cm, 3.7 cm, and 4.7 cm along the x, y, and z dimensions, respectively (in through-wall scenarios).
Fig. 19 (cf. [47]): The upper half shows the visual-based skeleton identification system, which provides supervision for the RF-based system. The lower half is the RF-based system.
Tadar, proposed by Yang et al. in 2015, is a through-wall system built on the radio frequency identification (RFID) technique [46]. It belongs to the software-defined radio systems defined in Section III-D. RFID is a popular technique in the field of wireless identification. Typically, an RFID system consists of a reader and several tags. To obtain information from a tag, the reader first transmits a continuous-wave signal to activate it. The widely used tags are passive, which means that they contain no battery and cannot transmit signals actively. Therefore, tags need to be activated by a reader: a tag leverages the energy of the signal received from the reader to respond, and the identity information is carried by the responding signal, called the backscatter signal. The design of Tadar is novel. Tadar contains one reader and 45 tags attached to the outer wall, constituting a virtual antenna array, as shown in Fig. 20. The signal transmitted by the reader reflects off walls, furniture, human bodies, and other objects inside the room. The tags are thus activated both by the reflected signal and by the signal coming directly from the reader, and then generate the backscatter signal. By extracting the reflection of the human body from the backscatter signal, Tadar can obtain the location of the human body. One advantage of leveraging RFID is the simplicity of passive tags: they need neither batteries nor other complicated structures, so each tag is cheap and easy to deploy. However, the disadvantage of Tadar is that its transmit power is higher than that of the systems discussed above by 3 orders of magnitude, as shown in Table 5. This is because human body reflections are relatively weaker than other reflections, so the backscatter signals from passive RFID tags activated by these weak reflections are much weaker still. Therefore, higher transmit power is necessary to capture such backscatter signals. In the experimental evaluation, the median tracking error of Tadar is 7.8 cm and 20 cm along the x and y dimensions, respectively.

VII. CONCLUSION AND OPEN ISSUE
This paper surveys handy through-wall techniques, which feature low power, narrow bandwidth, and light weight, require no extra devices on the human body, and can work in non-military scenarios. The main functions of systems based on these techniques are identifying, locating, or imaging the human body behind a wall. By applying radar technologies and artificial intelligence (AI) to off-the-shelf mobile wireless systems, through-wall systems are becoming more practical, accurate, and efficient for civil applications.
Although through-wall systems have made great progress in the research community, several open issues remain to be solved:

A. FEASIBILITY
Most handy through-wall systems have only been tested in experimental environments and still face many problems when applied to real-world scenarios, which are much more complicated than the lab environment. For example, for through-wall systems used for human body localization, the presence of pets in the house may degrade performance: the fast movement of pets distorts the signals and interferes with the detection of human bodies. Besides that, through-wall systems also need to be tested in outdoor environments. Specifically, dense urban areas typically have complicated electromagnetic interference. For example, the intensive coverage of WiFi signals in city centers is a huge challenge to WiFi-based systems, since any WiFi device carried by pedestrians or installed in shops is a source of interference. The ability of through-wall localization is also limited by the number of people. If a meeting with dozens of people takes place in a room, most through-wall systems cannot cope with such a high density of people. How to improve the performance in such challenging scenarios deserves further study.

B. MOBILITY
While this paper investigates handy through-wall systems, building a truly portable through-wall system for the real world is still challenging. Most through-wall devices need to be fixed at a certain position to simplify the signal processing, and the bulky hardware (e.g., USRP) and real-time processing requirements make portability difficult. It is also challenging yet appealing to apply the through-wall technology in highly mobile environments. For example, by equipping autonomous cars with through-wall devices, the cars can identify potential risks around a street corner, where camera-based visual systems cannot work.

C. INTELLIGENT SIGNAL PROCESSING METHOD
Nowadays, AI has become a very powerful method for processing signals, and the combination of AI and through-wall technology could become the mainstream of future development. As demonstrated by several systems mentioned above [10], [47], [48], [52], AI is able to extract fine-grained information about human bodies from coarse RF signals. How to integrate AI and through-wall technology more closely is a challenge for future through-wall systems. Supervised machine learning is now widely used in through-wall scenarios to interpret complicated RF information. Supervised learning, however, requires a huge amount of training data, which is scarce in the through-wall detection field. For example, in the absence of labeled RF data of walking people, today's systems turn to visual-based systems for help. However, teaching a through-wall system to recognize dancing people instead of walking ones requires a new training set. Given these challenges of supervised learning, unsupervised learning may be an alternative. Combining AI and through-wall technology effectively is therefore a key direction for future research.

D. PRIVACY
The through-wall technology, if inappropriately used, would severely violate people's privacy. Therefore, privacy-preserving technology is necessary before the wide civil adoption of through-wall systems.

APPENDIX A DERIVATION OF DOA
To simplify the derivation and reduce the difficulty of calculation, the following assumptions are made [55]:
• The propagation medium is homogeneous and the sources are in the far field of the array, so that the waves arriving at the array can be considered planar.
• The location of every sensor in the antenna array is known, and the sensors are uniformly spaced.
• Sensors can be modeled as linear time-invariant (LTI) systems.
• The received signals are narrow-band and the sensor frequency response is flat over the pass-band.
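• Sensors are identical and omnidirectional over the DOA range of interest, which means the sensor frequency responses are independent of the DOA.

First, we use the signal $x(t)$ at a reference point to describe the signals received by the sensors (antenna array). The reference point can be one of the sensors or any other point placed near enough to the sensors. Assume that the number of sensors is $m$. Then the output of sensor $k$ ($k = 1, 2, \ldots, m$) can be written as

$$\bar{y}_k(t) = \bar{h}_k(t) * x(t - \tau_k) + \bar{e}_k(t), \tag{11}$$

where $*$ denotes convolution, $\bar{h}_k(t)$ is the impulse response of the $k$th sensor, $\tau_k$ is the time that the signal takes to travel from the reference point to the $k$th sensor, and $\bar{e}_k(t)$ is additive noise. Similarly, the output of sensor $k$ in the frequency domain is

$$\bar{Y}_k(\omega) = \bar{H}_k(\omega) X(\omega) e^{-j\omega\tau_k} + \bar{E}_k(\omega), \tag{12}$$

where $\bar{Y}_k(\omega)$, $\bar{H}_k(\omega)$, $X(\omega)$, and $\bar{E}_k(\omega)$ denote the Fourier transforms of $\bar{y}_k(t)$, $\bar{h}_k(t)$, $x(t)$, and $\bar{e}_k(t)$, respectively. Let $s(t)$ denote the baseband signal associated with $x(t)$. Then $X(\omega)$ can be expressed in terms of $S(\omega)$ (the Fourier transform of $s(t)$) through the real modulation process:

$$X(\omega) = S(\omega - \omega_c) + S^*(-\omega - \omega_c), \tag{13}$$

where $\omega_c$ is the carrier frequency of the modulation. According to (13), (12) can be written as

$$\bar{Y}_k(\omega) = \bar{H}_k(\omega)\left[S(\omega - \omega_c) + S^*(-\omega - \omega_c)\right] e^{-j\omega\tau_k} + \bar{E}_k(\omega). \tag{14}$$

After the modulated signal ($\bar{Y}_k(\omega)$) has been received by a sensor, the next procedure is demodulation. Let $\tilde{y}_k(t)$ and $\tilde{Y}_k(\omega)$ denote the demodulated signal and its Fourier transform. The demodulated signal is obtained by translating (14) in frequency to the left by $\omega_c$:

$$\tilde{Y}_k(\omega) = \bar{H}_k(\omega + \omega_c)\left[S(\omega) + S^*(-\omega - 2\omega_c)\right] e^{-j(\omega + \omega_c)\tau_k} + \bar{E}_k(\omega + \omega_c). \tag{15}$$

The next step is to extract the baseband signal from the demodulated signal ($\tilde{y}_k(t)$) by passing it through a lowpass filter whose bandwidth matches that of $s(t)$, which removes the term centered at $-2\omega_c$. Let $y_k(t)$ denote the output of the lowpass filter and $Y_k(\omega)$ its Fourier transform. Then

$$Y_k(\omega) = H_k(\omega + \omega_c) S(\omega) e^{-j(\omega + \omega_c)\tau_k} + E_k(\omega + \omega_c) \tag{16}$$

$$\approx H_k(\omega_c) S(\omega) e^{-j\omega_c\tau_k} + E_k(\omega + \omega_c), \tag{17}$$

where $H_k(\omega + \omega_c)$ and $E_k(\omega + \omega_c)$ denote the parts of $\bar{H}_k(\omega + \omega_c)$ and $\bar{E}_k(\omega + \omega_c)$ that fall within the pass-band of the lowpass filter. The approximation in (17) is based on the above assumption that the received signals are narrowband. Therefore, in the time domain, (17) can be transformed into

$$y_k(t) = H_k(\omega_c) e^{-j\omega_c\tau_k} s(t) + e_k(t), \tag{18}$$

where $y_k(t)$ and $e_k(t)$ are the inverse Fourier transforms of $Y_k(\omega)$ and $E_k(\omega + \omega_c)$. In order to obtain the DOA from the received signals, we need the relationship between $y_k(t)$ and the DOA ($\theta$). Assume that the reference point is sensor 1 (i.e., $\tau_1 = 0$) and let $d$ denote the distance between adjacent sensors, as shown in Fig. 7. With $\theta$ measured from the normal of the array and $c$ denoting the propagation speed of the wave, $\tau_k$ and $\theta$ satisfy

$$\tau_k = (k - 1)\frac{d \sin\theta}{c}. \tag{19}$$

Then (18) can be written as

$$y_k(t) = H_k(\omega_c) e^{-j\omega_c (k - 1) d \sin\theta / c} s(t) + e_k(t). \tag{20}$$

Therefore, with $m$ sensors, we obtain a matrix equation: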