Multipath Interference Suppression in Time-of-Flight Sensors by Exploiting the Amplitude Envelope of the Transmission Signal

Time-of-flight (ToF) cameras provide depth information and have been applied in many areas. However, multipath interference (MpI) corrupts their depth measurements. This article proposes an algorithm that suppresses MpI during ToF depth measurement by exploiting the amplitude envelope of the camera's transmission wave. More precisely, it uses the fact that the amplitude envelope of the ToF camera's transmission wave is constant. In the presence of MpI, the envelope of the reflected signal deviates from the MpI-free envelope. By minimizing this deviation with an adaptive filter, the depth measurement error can be corrected. The method is applied at the sensor side of the ToF camera and lays a theoretical foundation for MpI suppression in ToF cameras. Simulation and experimental results demonstrate that the proposed algorithm significantly improves the depth measurement accuracy.


I. INTRODUCTION
In recent years, 3D imaging has grown tremendously, and imaging applications have been introduced in fields such as the automotive, gaming, smartphone, and robotics domains. Time-of-flight (ToF) is a 3D imaging technique that determines the depth of an object in a scene by illuminating the scene with modulated light and measuring the phase shift of the reflected light. ToF imaging has been applied to material sensing, vehicle parking assistance, underwater situational awareness, robot navigation, obstacle detection, and collision avoidance [1]-[6]. Moreover, with the emergence of low- and medium-cost cameras with very high resolutions and frame rates, the ToF camera market is experiencing unprecedented growth. ToF imaging has finally made the leap into mass production for consumer markets, and increasingly advanced and sophisticated cameras are being manufactured. The major ToF camera manufacturers include PMD, MESA, Optrima, and Microsoft [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Yongqiang Zhao.
A ToF camera operates by illuminating the scene with a modulated pulsed or continuous-wave (CW) light signal and then measuring the phase displacement between the transmitted and received signals [8], [9]. The camera emits amplitude-modulated light in the near-infrared (NIR) range. The objects in the scene reflect the light, and the camera optics project the reflected light onto a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) pixel matrix. The camera electronics then compute the depth from the phase shift between the transmitted and reflected signals [10]. In reality, the light reaching the camera sensor is reflected not only directly by the target object but also by other objects and surfaces in its vicinity. In this case, the phase shift is computed from the superposition of the direct reflection and the multiple indirect reflections, so the measured depth can be larger or smaller than the actual depth. Multipath interference (MpI) is one of the primary sources of error in ToF data acquisition and can cause significant inaccuracies in depth estimation.
VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Several studies have addressed the impact of MpI on ToF depth measurements, and several others have proposed MpI compensation and depth correction methods. For a comprehensive review, see Whyte et al. [11]. MpI suppression methods can be broadly categorized according to whether they remove the MpI from the final ToF depth image or during signal capture at the camera sensor. The first category of methods aims to suppress the MpI in the final depth image, either by developing an MpI model or by projecting structured light patterns onto the scene. Jimenez et al. incorporated a radiometric model to predict the ToF measurements and removed MpI contamination with an iterative optimization algorithm [12]. Bhandari et al. estimated the backscattering vector using a custom-coded waveform instead of a standard sinusoidal waveform. They then used sparse deconvolution to recover the sequence of signals corresponding to MpI [13]. Gupta et al. proposed illuminating the scene with light at a high modulation frequency [14]. Under high-frequency illumination, the global component of the radiance behaves as a DC term and is canceled from the total illumination. Naik et al. used a similar approach in which the scene was illuminated with a spatial checkerboard pattern. The direct-reflected light and the echoes were then separated by decomposing the measured radiance at the sensor [15]. The technique developed by Baek et al. eliminates the depth error by combining three ToF cameras set to different integration times [16]. Although the results were successful, the approach requires additional hardware and precise synchronization of multiple ToF cameras.
The second category of methods attempts to eliminate the MpI at the sensor side of the ToF camera during depth acquisition. Freedman et al. developed a method based on multiple modulation frequencies and a sparse reflection analysis that removes the MpI using L1 optimization [17]. In [18], Agresti et al. presented an MpI correction method that separates the direct and global components of the received signal using high-frequency projection patterns. Patil et al. proposed a technique called microToF imaging, which mitigates the effect of MpI in ToF 3D sensors using direct and global separation [19]. Dashpute et al. proposed eliminating non-systematic errors (including MpI) by mounting a polarizer on a ToF camera and measuring the state of the reflected light at particular polarization angles [20]. The depth error is minimal at the polarization angle that maximizes the received light intensity. Lickfold et al. suggested simultaneously stepping the phase and the frequency of the camera's CW modulation to remove the error and offset in the depth measurement [21]. Although these techniques suppress MpI efficiently, they are not easily implementable across the complete range of ToF devices. They either rely on assumptions about the light characteristics or require custom coding and structured light projection.
Many researchers have previously exploited the signal envelope for ToF measurements. For example, detecting the zero crossings of the envelopes of the transmitted and received signals has been used for ultrasonic ToF measurements [22]-[24]. Unlike zero-crossing detection or envelope extraction, however, the method proposed in this article exploits the fact that the amplitude envelope of a sinusoidally modulated wave is constant, and that the envelope of a signal affected by MpI deviates from this MpI-free constant. The amplitude envelope of a signal, also referred to as the analytic envelope, is the magnitude of its analytic signal. In other words, it is the square root of the sum of the squared in-phase and quadrature components [25], [26]. In this article, this envelope is exploited to suppress MpI during ToF depth measurement at the sensor side of the ToF camera. The algorithm suppresses the MpI in the reflected light signal irrespective of factors such as the directivity and reflection properties of the surrounding objects, and it does not require a specifically designed multipath model. The performance of the proposed method was validated through simulations as well as experiments with real measurements, which were designed to resemble what a ToF camera measures in the real world. The results confirmed the effectiveness of the proposed method.
The remainder of this article is organized as follows. Section II briefly explains the imaging process at the ToF camera sensor. Section III presents the proposed echo suppression algorithm. Sections IV and V discuss the simulations and experiments, respectively, and Section VI concludes the article.

II. ToF CAMERA PRINCIPLE
A CW ToF camera emits a temporally modulated light signal, usually in the NIR range, using LEDs as the light source to illuminate the scene. The light reflected by the scene reaches the camera sensor as a temporally varying signal with an attenuated amplitude and a phase shift. The depth of the object is estimated from the phase displacement between the transmitted and received signals. The modulated transmitted signal is expressed as

$$s_t(t) = a_t \cos(\omega t + \varphi_i), \tag{1}$$

where $a_t$ is the amplitude of the transmitted light signal, $\omega = 2\pi f$ is the modulation frequency, and $\varphi_i$ is the initial phase of the signal. Without MpI, the reflected light reaching the receiver is given by

$$s_r(t) = b_0 + a_r \cos(\omega t + \varphi), \tag{2}$$

where $b_0$ represents the ambient light, $a_r$ is the amplitude of the reflected light, and $\varphi$ is the phase of the reflected light. The depth $d$ is calculated from the phase shift $(\varphi_i - \varphi)$ of the reflected light by the relation

$$d = \frac{c\,(\varphi_i - \varphi)}{4\pi f}, \tag{3}$$

where $c$ is the speed of light and $f$ is the modulation frequency of the transmitted light signal. To retrieve the phase shift, the received light signal is synchronously demodulated. In this method, commonly known as the "four-bucket" sampling technique, the received signal is sampled at four specifically chosen instants in every period, usually at phase angles of 0, π/2, π, and 3π/2. Denoting these samples by $A_0$, $A_1$, $A_2$, and $A_3$, the phase is calculated as [27]

$$\varphi = \arctan\frac{A_3 - A_1}{A_0 - A_2}. \tag{4}$$

The depth of the object is then estimated with (3). ToF-based depth estimation assumes that the light reaching each pixel is reflected from a single point in the scene without inter-reflections. The light, however, may undergo multiple reflections from other objects, from mirror-like and semi-transparent materials, or from complex geometries of walls and corners in the vicinity. Several reflected signals then combine at each pixel, which distorts the depth calculation and yields incorrect depth values.
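The four-bucket demodulation and the phase-to-depth relation can be checked numerically. The following minimal Python sketch makes four assumptions:

- The received signal is b0 + a_r cos(ωt + φ), sampled at ωt = 0, π/2, π, and 3π/2.
- The modulation frequency is 10 MHz.
- The ambient offset and amplitude take illustrative values.
- The helper names are hypothetical.

The sketch uses atan2 instead of a plain arctangent so that the quadrant of the phase is preserved.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def four_bucket_phase(a0, a1, a2, a3):
    """Phase of b0 + a_r*cos(wt + phi) from samples at wt = 0, pi/2, pi, 3pi/2.

    The ambient offset b0 cancels in both differences; atan2 keeps the quadrant.
    """
    return math.atan2(a3 - a1, a0 - a2)


def depth_from_phase(phi_i, phi, f):
    """Depth from the phase shift (phi_i - phi), wrapped into [0, 2*pi)."""
    shift = (phi_i - phi) % (2 * math.pi)
    return C * shift / (4 * math.pi * f)


# Hypothetical example: 10 MHz modulation, target at 1 m, ambient offset 0.2.
f = 10e6
phi_i = 0.0
true_depth = 1.0
phi = phi_i - 4 * math.pi * f * true_depth / C  # phase of the reflected light
b0, a_r = 0.2, 0.4
buckets = [b0 + a_r * math.cos(theta + phi)
           for theta in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]

est_phi = four_bucket_phase(*buckets)
est_depth = depth_from_phase(phi_i, est_phi, f)
print(f"estimated depth: {est_depth:.6f} m")
```

Because b0 appears in every bucket, it drops out of both differences. This is why the four-bucket scheme tolerates a constant ambient offset.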
Usually, the received light is considered a superposition of two components: the direct reflection from the target object, and the indirect component, which includes all other reflections from the surrounding objects and surfaces. The received signal $s_r(t)$ can therefore be represented as the sum of the direct-reflected signal and $N$ superimposed multipath reflections:

$$s_r(t) = b_0 + a_0 \cos(\omega t + \varphi_0) + \sum_{k=1}^{N} a_k \cos(\omega t + \varphi_k), \tag{5}$$

where $a_0$ and $\varphi_0$ are the amplitude and phase of the direct reflection, and $k$ indexes the superimposed multipath reflections with amplitudes $a_k$ and phases $\varphi_k$. Most of the MpI consists of light signals that undergo up to three reflections in the scene. Higher-order reflections suffer from large attenuation and decay exponentially due to scattering [13]. ToF depth measurement also faces many other practical issues, which fall into two broad categories: systematic and non-systematic errors. The systematic errors include integration time (IT) error, amplitude ambiguity, temperature error, depth distortion, and built-in pixel error. The non-systematic errors include MpI, background illumination, scattering noise, and motion blur [28], [29]. Modern ToF cameras suppress the effect of ambient light either by background illumination suppression [30] or by using special background-light-immune sensors [31], [32]. Therefore, the term $b_0$ in (5) is neglected in the calculations in this article. Depth measurements in ToF imaging are also affected by motion blur, i.e., the error that occurs when objects move. In [33], the ToF raw data are treated as a noisy time series, and the error between raw frames caused by transverse motion is reduced by statistical processing with a Kalman filter. This work, however, is concerned only with MpI and its suppression, and the other errors are beyond the scope of this article.
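The effect of the multipath sum on the measured depth can be made concrete. All components share the modulation frequency, so their sum is again a sinusoid, and the pixel reports the phase of the phasor sum. The following Python sketch rests on three assumptions:

- The modulation frequency is 10 MHz.
- The scene has one direct return and one weaker echo.
- Each path is described by a hypothetical "equivalent depth".

None of these values come from the article.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s
F_MOD = 10e6       # assumed modulation frequency in Hz


def path_phase(depth_m):
    """Phase delay 4*pi*f*d/c for light returning from an equivalent depth d."""
    return 4 * math.pi * F_MOD * depth_m / C


def measured_depth(components):
    """Depth reported by a phase-based ToF pixel for same-frequency reflections.

    `components` is a list of (amplitude, equivalent_depth_m) pairs. The summed
    signal is a single sinusoid whose phase is the argument of the phasor sum,
    which is what four-bucket demodulation recovers.
    """
    phasor = sum(a * cmath.exp(-1j * path_phase(d)) for a, d in components)
    shift = (-cmath.phase(phasor)) % (2 * math.pi)
    return C * shift / (4 * math.pi * F_MOD)


# Hypothetical scene: direct return from 1 m plus one echo whose path
# corresponds to 1.5 m (amplitudes and path length are illustrative).
direct_only = measured_depth([(0.5, 1.0)])
with_echo = measured_depth([(0.5, 1.0), (0.05, 1.5)])
print(f"direct only: {direct_only:.4f} m, with echo: {with_echo:.4f} m")
```

Even an echo with one tenth of the direct amplitude pulls the estimate several centimetres away from 1 m in this hypothetical case. The echo's phase drags the phasor sum toward the longer path.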

III. MpI SUPPRESSION EXPLOITING THE CW CHARACTERISTICS OF LIGHT
The method proposed in this article exploits the amplitude envelope of the sinusoidal waveform to mitigate the effect of MpI on the incident light signal. As mentioned in Section I, the amplitude envelope of a signal $s(t)$ is given by

$$\mathrm{env}(s(t)) = \sqrt{s^2(t) + \mathcal{H}^2(s(t))}, \tag{6}$$

where $\mathcal{H}(s(t))$ is the Hilbert transform of $s(t)$, i.e., its 90° phase-shifted (quadrature) version. The method relies on the fact that the amplitude envelope of a sinusoidal signal is constant, whereas the envelope of a signal corrupted by MpI deviates from this MpI-free constant.
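The envelope definition can be illustrated with a discrete Hilbert transform. The following Python sketch builds the analytic signal with the standard frequency-domain construction: keep DC (and Nyquist for even lengths), double the positive frequencies, and zero the negative ones. It then takes the magnitude.

It rests on two assumptions:

- A plain O(N²) DFT is used instead of an FFT library, to keep the example dependency-free.
- The test tone contains an integer number of cycles, so the record has no edge effects.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal s + j*H(s) via the DFT (O(N^2), fine for short records)."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    weights = [0.0] * n
    weights[0] = 1.0                      # keep DC
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0                  # double positive frequencies
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist for even n
    shaped = [s * w for s, w in zip(spectrum, weights)]
    return [sum(shaped[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def amplitude_envelope(x):
    """sqrt(s^2 + H(s)^2), i.e. the magnitude of the analytic signal."""
    return [abs(z) for z in analytic_signal(x)]


# Sinusoid with an integer number of cycles in the record (assumed).
N, CYCLES, AMP = 64, 4, 0.5
tone = [AMP * math.cos(2 * math.pi * CYCLES * t / N + 0.3) for t in range(N)]
env = amplitude_envelope(tone)
print(min(env), max(env))  # both approximately 0.5
```

For a record that does not contain whole cycles, the envelope ripples near the edges. Practical implementations typically window or discard those samples.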
Consider an example in which the received signal comprises one direct-reflected signal and two echoes:

$$s_r(t) = a_0 \cos(\omega t + \varphi_0) + a_1 \cos(\omega t + \varphi_1) + a_2 \cos(\omega t + \varphi_2), \tag{7}$$

where $a_0$, $a_1$, and $a_2$ are the amplitudes of the direct-reflected signal and the two echoes, respectively, and $\varphi_0$, $\varphi_1$, and $\varphi_2$ are the respective phases. The light signal incident on the camera sensor without any MpI is

$$s_0(t) = a_0 \cos(\omega t + \varphi_0), \tag{8}$$

and its envelope is

$$\mathrm{env}(s_0(t)) = \sqrt{a_0^2\cos^2(\omega t + \varphi_0) + a_0^2\sin^2(\omega t + \varphi_0)} = a_0. \tag{9}$$

Similarly, the envelope of the light signal with two echoes is

$$\mathrm{env}(s_r(t)) = \sqrt{\Big(\sum_{m=0}^{2} a_m \cos(\omega t + \varphi_m)\Big)^2 + \Big(\sum_{m=0}^{2} a_m \sin(\omega t + \varphi_m)\Big)^2}, \tag{10}$$

which simplifies to

$$\mathrm{env}(s_r(t)) = \sqrt{a_0^2 + a_1^2 + a_2^2 + 2a_0a_1\cos(\varphi_0 - \varphi_1) + 2a_0a_2\cos(\varphi_0 - \varphi_2) + 2a_1a_2\cos(\varphi_1 - \varphi_2)}. \tag{11}$$

Comparing (9) and (11) shows that the envelope of the received light with echoes deviates from that of the direct-reflected signal. By minimizing this deviation, the MpI in the received signal can be suppressed. The proposed method does so with the least mean squares (LMS) algorithm. This algorithm updates the coefficients of an adaptive filter so that the filter output, i.e., the restored received signal, has an envelope matching that of the MpI-free signal. The workflow of the proposed MpI suppression algorithm is shown in Fig. 2. The light received at the sensor, distorted by MpI, is passed through an adaptive filter. The filter output is split into two signals that are 90° out of phase, which are then squared and added. The 90° phase shift is achieved with the Hilbert transform. The output of the filter with weight vector $\mathbf{W} = [w_0, w_1, \dots, w_{L-1}]^T$ is

$$y(k) = \mathbf{W}^T \mathbf{X}(k), \tag{12}$$

where $(\cdot)^T$ denotes the transpose of a vector, $\mathbf{X}(k)$ is the vector of the $L$ most recent input samples at time instance $k$, $L$ is the number of filter coefficients, and $M$ is the size of the input signal. Next, the resulting signal is compared with a constant envelope. The instantaneous error at time instance $k$ is

$$e(k) = d(k) - y_{env}(k), \tag{13}$$

where $d(k)$ is the constant envelope and $y_{env}$ is the envelope of the filter output signal. Following the workflow in Fig. 2, $y_{env}$ is formed by squaring and adding the in-phase and quadrature components:

$$y_{env}(k) = y^2(k) + \mathcal{H}^2(y(k)). \tag{14}$$

The LMS algorithm is a stochastic gradient descent method, i.e., each update is based on the error at the current time instance.
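The closed-form envelope in (11) can be checked against a direct evaluation of (10). The following Python sketch generalizes (11) to any number of same-frequency components.

It assumes the following values:

- Echo amplitudes follow the a1 = a0/10 and a2 = a0/100 pattern used later in Section IV.
- The phases are arbitrary illustrative choices.

The sketch confirms that the envelope stays constant over time but no longer equals a0.

```python
import math
from itertools import combinations


def envelope_closed_form(amps, phases):
    """Eq. (11) generalized: envelope of a sum of same-frequency cosines."""
    total = sum(a * a for a in amps)
    for (a_m, p_m), (a_n, p_n) in combinations(zip(amps, phases), 2):
        total += 2 * a_m * a_n * math.cos(p_m - p_n)
    return math.sqrt(total)


def envelope_sampled(amps, phases, omega, t):
    """Eq. (10) at one instant: in-phase and quadrature sums, squared and added."""
    i_part = sum(a * math.cos(omega * t + p) for a, p in zip(amps, phases))
    q_part = sum(a * math.sin(omega * t + p) for a, p in zip(amps, phases))
    return math.hypot(i_part, q_part)


# Hypothetical direct path plus two echoes (phases assumed).
amps = [0.5, 0.05, 0.005]
phases = [-0.42, -0.63, -0.84]
omega = 2 * math.pi * 10e6
closed = envelope_closed_form(amps, phases)
samples = [envelope_sampled(amps, phases, omega, n * 1e-9) for n in range(100)]
print(closed, min(samples), max(samples))  # constant over time, but not equal to a0
```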
The cost function is formulated as the instantaneous squared error between the envelope of the filter output and the constant envelope:

$$J(k) = e^2(k) = \big(d(k) - y_{env}(k)\big)^2. \tag{15}$$

The gradient of the cost function with respect to the weights at each iteration is

$$\nabla_{\mathbf{W}} J(k) = -2\, e(k)\, \frac{\partial y_{env}(k)}{\partial \mathbf{W}}. \tag{16}$$

After further simplification, the gradient is expressed in terms of $y_{filt}(k) = \mathbf{W}^T \mathbf{X}_{env}(k)$, which represents the filter output of the signal envelope. For a chosen step size $\mu$, the weight update of the steepest-descent adaptive algorithm is

$$\mathbf{W}(k+1) = \mathbf{W}(k) - \mu\, \nabla_{\mathbf{W}} J(k). \tag{18}$$

The filter coefficients are updated at each iteration until the cost function reaches a minimum. The filter output then converges such that its envelope is as close as possible to the constant envelope. This implies that the echo terms in (11), which cause the envelope to deviate from the constant, are suppressed. By suppressing the echoes in the received signal, the influence of MpI on the ToF depth measurement is minimized.
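The update loop can be illustrated with a small constant-envelope adaptation. The following Python sketch is one possible reading of (12)-(18), written in the style of a constant-modulus (Godard/CMA) update. It is not the authors' MATLAB code.

It assumes the following, all of which are hypothetical:

- Filter structure: a 5-tap transversal (FIR) filter with real weights, whereas Section IV reports an IIR structure.
- Quadrature branch: generated analytically instead of by a Hilbert filter.
- Initialization: a unit first tap, because an all-zero start gives a zero gradient for this particular cost.
- Signal: the direct and echo parameters listed in the constants.

The sketch only checks that the envelope error vanishes. With a single-tone input, matching the envelope constrains the filter gain at the modulation frequency, so nothing is asserted about the recovered phase.

```python
import math

A0, A1, DELAY = 0.5, 0.125, 3        # direct amplitude, echo amplitude, echo delay (samples)
OMEGA = 2 * math.pi * 0.05           # normalized modulation frequency (20 samples/period)
L, MU, N = 5, 0.01, 3000             # taps, step size, number of samples
D_ENV = A0 ** 2                      # desired constant (squared) envelope


def received(n):
    """In-phase and quadrature samples of the direct signal plus one delayed echo."""
    i = A0 * math.cos(OMEGA * n) + A1 * math.cos(OMEGA * (n - DELAY))
    q = A0 * math.sin(OMEGA * n) + A1 * math.sin(OMEGA * (n - DELAY))
    return i, q


w = [1.0] + [0.0] * (L - 1)          # unit first tap (assumption, see lead-in)
errors = []
for k in range(L, N):
    x_i, x_q = zip(*(received(k - m) for m in range(L)))   # X(k) and its quadrature
    y_i = sum(wm * xm for wm, xm in zip(w, x_i))            # filter output
    y_q = sum(wm * xm for wm, xm in zip(w, x_q))            # quadrature of the output
    y_env = y_i ** 2 + y_q ** 2                             # squared envelope
    e = D_ENV - y_env                                       # instantaneous error
    errors.append(e)
    # Gradient of e^2 is -4 e (y_i X + y_q H(X)); the factor 4 is absorbed into MU.
    w = [wm + MU * e * (y_i * xi + y_q * xq) for wm, xi, xq in zip(w, x_i, x_q)]

print(f"initial error {errors[0]:+.4f}, final error {errors[-1]:+.2e}")
print("weights:", [round(wm, 4) for wm in w])
```

The step size of 0.01 matches the value reported in Section IV. For these hypothetical amplitudes, it brings the envelope error down smoothly within a few thousand samples.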
The filter coefficients adapt such that the echoes are effectively subtracted from the main signal during equalization. Consider an example in which the reflected light consists of one direct-reflected signal and one echo with an amplitude of 0.25 mV. The echo arrives at the sensor with a delay of three samples, as shown by the received signal in Fig. 3.
During equalization, the filter coefficients adapt to their optimum values so that the echo is effectively subtracted from the signal, with each coefficient adjusted to compensate for any remaining deficiency. Because the echo may be received with a certain delay after the main signal, the first tap of the filter need not be the main tap. In the simulations, the filter weight $\vec{W}[3]$ (corresponding to the echo at n = 3) converged to a final value of approximately −0.25 (Fig. 4) and thus subtracted the echo from the main signal.
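Why a weight of about −0.25 at n = 3 removes the echo can be seen by convolving the channel with such a filter. The following Python sketch treats both as fixed, hypothetical impulse responses: the channel is a direct path plus a 0.25 echo at a delay of three samples, and the equalizer has a unit main tap plus −0.25 at the same delay. It shows that the echo cancels to first order.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


# Channel: direct path plus an echo of relative amplitude 0.25 delayed by 3 samples.
channel = [1.0, 0.0, 0.0, 0.25]
# Hypothetical equalizer: unit main tap and -0.25 at n = 3.
equalizer = [1.0, 0.0, 0.0, -0.25]

overall = convolve(channel, equalizer)
print(overall)  # [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, -0.0625]
```

The product (1 + 0.25z⁻³)(1 − 0.25z⁻³) = 1 − 0.0625z⁻⁶ leaves only a second-order residual at delay 6. That residual is much smaller than the original echo.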

IV. SIMULATIONS AND RESULTS
The efficiency of the proposed method was investigated by simulating ToF operation under different scenarios in MATLAB. The transmitted signal was modeled with an amplitude $a_0$ = 0.5 mV and a modulation frequency of 10 MHz, which is common in ToF cameras. An object at a distance of 1 m from the ToF camera was considered. The received light without any echo yields the actual distance of the object, and the phase shift of the reflected light for a distance of 1 m follows from (3). To determine the effectiveness of the algorithm, scenarios in which the received light contains one, two, and three echoes were considered separately. It is known from [13] that the amplitude of an echo signal is attenuated after each bounce. Thus, for the case with two echoes, the amplitudes were set to $a_1 = a_0/10$ and $a_2 = a_0/100$, assuming that one echo reaches the sensor after more bounces than the other. The simulated received light was then cross-correlated with the transmitted signal, and the phase shift, and consequently the depth, was computed with "four-bucket" sampling. The depth in this case was estimated to be 1.09 m. This does not match the ground truth and corresponds to a depth error of 0.09 m. The received signal was then equalized with the proposed adaptive filter.
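The simulation pipeline (cross-correlate with the transmitted signal, then apply four-bucket sampling to the correlation function) can be sketched as follows.

The sketch assumes the following:

- The transmitted signal is cos(ωt), and each received component is a·cos(ωt − θ).
- There are 64 samples per period.
- The echo amplitudes follow the a0/10 and a0/100 rule from the text.
- The echo path lengths are hypothetical.

It does not attempt to reproduce the 1.09 m figure, because the article does not state the echo paths.

```python
import math

C = 299_792_458.0   # speed of light in m/s
F = 10e6            # modulation frequency from the simulation setup
N = 64              # samples per modulation period (assumed)


def correlation_bucket(components, tau_phase):
    """Correlation of received and transmitted signals at phase offset tau_phase.

    Received: sum of a*cos(wt - theta); transmitted: cos(wt). Averaging over one
    full period gives sum of (a/2)*cos(tau_phase + theta).
    """
    total = 0.0
    for n in range(N):
        wt = 2 * math.pi * n / N
        received = sum(a * math.cos(wt - theta) for a, theta in components)
        total += received * math.cos(wt + tau_phase)
    return total / N


def depth_from_correlation(components):
    """Four-bucket phase estimate on the correlation function, converted to depth."""
    c0, c1, c2, c3 = (correlation_bucket(components, k * math.pi / 2) for k in range(4))
    theta = math.atan2(c3 - c1, c0 - c2) % (2 * math.pi)
    return C * theta / (4 * math.pi * F)


def delay(depth_m):
    """Phase delay 4*pi*f*d/c for an equivalent depth d."""
    return 4 * math.pi * F * depth_m / C


a0 = 0.5
direct = [(a0, delay(1.0))]
# Echo amplitudes follow a0/10 and a0/100; their path lengths are hypothetical.
two_echoes = direct + [(a0 / 10, delay(1.5)), (a0 / 100, delay(2.0))]
d_direct = depth_from_correlation(direct)
d_echo = depth_from_correlation(two_echoes)
print(d_direct, d_echo)
```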
A major challenge in designing an adaptive filter is the trade-off between computational complexity and convergence accuracy. Compared with a finite impulse response (FIR) structure, an infinite impulse response (IIR) filter structure requires fewer memory locations and arithmetic operations for a given response, which reduces the computational complexity [34]. The convergence rate of the MSE depends on several factors, such as the filter initialization, the filter length, and the step size. The step size governs the convergence speed and stability of the adaptation. Many methods dynamically update the step size during adaptation to improve the speed and accuracy of convergence. A detailed compilation of these methods and their formulas is given by Hwang and Li [35]. Although no analytical result fully characterizes the convergence and the factors influencing it, qualitative results can guide the choice of parameters for the desired convergence. The convergence speed increases with the step size. However, stable convergence requires the step size to remain below an upper bound,

$$0 < \mu < \frac{2}{L\,P_x}, \tag{19}$$

where $L$ is the filter length and $P_x$ is the input signal power [36]. A longer filter slows convergence: it affects the condition number (eigenvalue spread) of the input autocorrelation matrix and lowers the upper bound on the step size for the MSE to converge [36]. Several algorithms, such as fractional tap-length LMS (FT-LMS), variable-tap variable-step LMS (VTVS LMS), and variable-length LMS (VL-LMS), help choose an optimum filter length L for a given step size µ [37]. Taking these factors into consideration, the simulated received signal was equalized with an IIR filter with five coefficients (initialized to zero) and a step size of 0.01.
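The step-size bound is easy to evaluate for a given input. The following Python sketch computes the conventional LMS bound 2/(L·P_x), with P_x estimated as the mean-square input.

It assumes the following:

- The input is a hypothetical 0.5-amplitude sinusoid sampled over whole periods.
- The filter has five taps, as in the simulations.

Because the envelope-based cost scales the gradient by the signal level, the bound should be read only as a rough guide for the proposed method.

```python
import math


def lms_step_bound(x, taps):
    """Conservative LMS stability bound mu < 2 / (L * P_x), P_x = mean-square input."""
    power = sum(v * v for v in x) / len(x)
    return 2.0 / (taps * power)


# Hypothetical input: 0.5-amplitude sinusoid over whole periods (P_x = 0.125).
x = [0.5 * math.cos(2 * math.pi * n / 20) for n in range(200)]
mu_max = lms_step_bound(x, taps=5)
print(round(mu_max, 6))  # 3.2
```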
The error in the depth estimation was minimized. After equalization, the depth was computed to be 1.001 m, which is very close to the actual depth, i.e., the error was reduced to 0.001 m. Table 1 lists the simulation results for the extended reflection model, in which the reflected signal is corrupted by different numbers of echoes. The table shows the measured and corrected depths for each case and the corresponding errors before and after echo cancelation. In each case, the error is minimized, and the corrected depth is much closer to the actual depth.
In further simulations, the reflected signal was assumed to reach the sensor with additive white Gaussian noise (AWGN) at a signal-to-noise ratio (SNR) of 30 dB. Table 2 lists the depth measurements before and after echo correction for different numbers of echo signals. The data in Table 2 show that the MpI was suppressed in every case, and the depth computed after equalization is very close to the ground truth. Additional simulations placed the object at different distances from the camera. The MpI suppression and subsequent depth correction performed equally well, even when the object was far from the camera. Table 3 compares the depth of the object before and after correction for different distances from the camera. In each case, the corrected distance closely matches the ground truth.
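Adding AWGN at a prescribed SNR amounts to scaling the noise variance to the signal power: σ² = P_s / 10^(SNR/10). The following Python sketch makes three assumptions:

- The clean signal is a hypothetical sinusoid.
- A fixed random seed is used for reproducibility.
- The helper name add_awgn is illustrative.

The sketch adds 30 dB noise and then re-measures the SNR.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power = snr_db."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_std = math.sqrt(signal_power / 10 ** (snr_db / 10))
    return [s + rng.gauss(0.0, noise_std) for s in signal]


rng = random.Random(0)  # fixed seed so the run is reproducible
clean = [0.5 * math.cos(2 * math.pi * n / 20) for n in range(20000)]
noisy = add_awgn(clean, 30.0, rng)

noise_power = sum((y - s) ** 2 for y, s in zip(noisy, clean)) / len(clean)
signal_power = sum(s * s for s in clean) / len(clean)
measured_snr_db = 10 * math.log10(signal_power / noise_power)
print(f"measured SNR: {measured_snr_db:.2f} dB")
```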

V. EXPERIMENTS AND RESULTS
In addition to the simulations, experiments with real ToF measurements were performed to validate the efficiency of the proposed method in real-world scenarios. In the different experimental scenarios, objects of different sizes and materials were placed strategically at different distances from the ToF camera. The experimental setup consisted of a laser diode as the light source, emitting sinusoidally modulated light at a frequency of 100 kHz with 30 mW of power and a peak amplitude of 0.6 mV. The light is reflected from a target object placed at a certain distance from the light source. A photodetector placed adjacent to the emitter receives the reflected light. The reflected light signal is passed through a low-noise amplifier and recorded by an oscilloscope. The acquired data are processed by a MATLAB program that computes the depth of the object and applies the correction algorithm. The experimental setup is shown in Fig. 5.
The experiment was carried out in a completely dark room with a non-reflective screen behind the target object to minimize background illumination. The experimental setup uses a fixed modulation and demodulation frequency, which cannot be adjusted to change the measurable depth range. The algorithm is expected to scale to the modulation frequencies used by commercial NIR ToF cameras. However, testing this scalability requires raw measurement data from a camera sensor, and access to raw data from commercial ToF cameras is currently not possible. This experimental setup was therefore used only to validate the method proposed in this article.
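The role of the modulation frequency can be quantified from the depth relation. The unambiguous range is c/(2f), and the phase shift for a depth d is 4πfd/c. The following Python sketch compares the 100 kHz bench setup with the 10 MHz frequency used in the simulations. The 0.4 m test depth is chosen only because it is close to the coffee-mug distance used later. At 100 kHz, the phase shifts at bench-top distances are about a hundred times smaller than at 10 MHz.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(f_mod):
    """Largest depth before the phase shift 4*pi*f*d/c wraps past 2*pi: c / (2f)."""
    return C / (2 * f_mod)


def phase_shift(depth_m, f_mod):
    """Phase shift 4*pi*f*d/c for a target at depth_m."""
    return 4 * math.pi * f_mod * depth_m / C


for f_mod in (100e3, 10e6):  # experimental setup vs. simulated camera
    print(f"{f_mod / 1e6:g} MHz: range {unambiguous_range(f_mod):.2f} m, "
          f"phase at 0.4 m {phase_shift(0.4, f_mod):.3e} rad")
```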
Fig. 6 shows a qualitative analysis of the performance of the MpI suppression algorithm. Plot (a) shows the received light signal reflected by a coffee mug placed 400 mm from the ToF camera. The reflected light is affected by MpI and noise. Plot (b) shows the corrected output of the filter, i.e., the signal after MpI suppression, and plot (c) shows the convergence of the error during adaptation. Later in this section, the algorithm is analyzed quantitatively on different objects, namely a metal wire, a pencil lead, and an eraser, placed 150 mm, 250 mm, and 350 mm from the camera. The objects were selected to cover different materials and reflective properties. The pencil lead is highly diffuse with low reflectivity, the metal wire is highly reflective and exhibits specular reflections, and the eraser exhibits Lambertian reflection. Because the experimental setup is very susceptible to noise, multiple measurements were taken and averaged to provide a more representative analysis. For the depth measurement, four samples of the correlation function were captured at 0, π/2, π, and 3π/2 in every period over multiple periods, so that the measured depth was averaged. The measured depths before and after correction for the different objects and scenarios are compiled in tables. Table 4 shows the depth estimates before and after MpI correction for a pencil lead placed at different distances from the camera. Tables 5 and 6 show similar comparisons for the metal wire and the eraser.
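Averaging the four correlation samples over many periods reduces the noise in the phase estimate roughly by the square root of the number of periods.

The following Python sketch assumes the following:

- Independent Gaussian noise is added to each bucket.
- The modulation frequency is 10 MHz rather than the 100 kHz bench value.
- The noise level and seed are illustrative.

The sketch accumulates the buckets before applying the four-bucket formula.

```python
import math
import random

C = 299_792_458.0
F = 10e6  # assumed modulation frequency (the bench setup used 100 kHz)


def averaged_depth(true_depth, periods, noise_std, rng):
    """Accumulate four correlation samples over many periods, then estimate depth.

    Each bucket receives independent Gaussian noise (an assumption); summing over
    `periods` reduces the relative noise by roughly sqrt(periods).
    """
    theta = 4 * math.pi * F * true_depth / C
    sums = [0.0] * 4
    for _ in range(periods):
        for k in range(4):
            ideal = math.cos(k * math.pi / 2 + theta)
            sums[k] += ideal + rng.gauss(0.0, noise_std)
    phase = math.atan2(sums[3] - sums[1], sums[0] - sums[2]) % (2 * math.pi)
    return C * phase / (4 * math.pi * F)


rng = random.Random(1)
single = averaged_depth(1.0, periods=1, noise_std=0.05, rng=rng)
averaged = averaged_depth(1.0, periods=2000, noise_std=0.05, rng=rng)
print(single, averaged)
```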
During the experiments, objects made of different materials tended to be estimated at different depths when placed at the same distance from the camera. In addition, different lighting conditions and the presence of a non-reflective background also affect the measurements. He et al. [29] provide a thorough analysis of the factors affecting the depth measurement of an object. Other typical systematic and non-systematic errors were not taken into account in the experiments.
Because ToF cameras are commonly used to capture depth images of moving objects, the proposed method was also tested with objects in motion. In this scenario, an object initially placed 500 mm from the camera moved continuously away from it, and the camera captured the depth of the object in each frame. A typical ToF camera runs at 30-40 frames per second [7]. As the object changes its position relative to the camera and the surroundings, the amplitudes and phases of the direct-reflected light and of the echoes also change. The algorithm must therefore minimize the error before the next frame, which puts the acceptable convergence time at around 25 ms. The algorithm was observed to bring the error to a minimum within this time frame. Table 7 lists three time instances at which the object was captured at different depths. A metallic rail with micrometer precision was used to position the object precisely at different distances from the camera, providing the ground truth. The depth of the object was successfully corrected in each frame. After convergence, the filter coefficients reach their optimum values, which are then used as the initial coefficients for the next optimization. Fig. 7 shows the convergence of the error for consecutive time instances. The algorithm takes around 150 iterations to bring the error to a minimum.
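The per-iteration time budget follows from the frame rate and the iteration count. Assuming the faster end of the quoted frame-rate range (40 frames per second) and the roughly 150 iterations reported for convergence, each iteration must complete within about 167 µs. At 30 frames per second, the budget is about 222 µs. These per-iteration figures are derived here from the quoted numbers, not measured in the article.

```latex
T_{\text{frame}} = \frac{1}{40\ \text{s}^{-1}} = 25\ \text{ms}, \qquad
t_{\text{iter}} \le \frac{T_{\text{frame}}}{150} = \frac{25\ \text{ms}}{150} \approx 167\ \mu\text{s}
```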
In another scenario, the target object remained in the same position while another object suddenly appeared near it. Although the actual distance of the target remained the same (500 mm), the echo component reaching the sensor changed abruptly, which affected the measured depth. The algorithm completed the echo cancelation well within the frame period. Table 8 shows the depths measured before and after correction for both frames.

Other Adaptive Algorithms: Selecting the step size such that the filter coefficients do not diverge is a major challenge in filter design. A small µ gives accurate but very slow convergence, whereas with a larger µ the weights might diverge. Several methods have been developed over the years to overcome this difficulty. One example is the normalized LMS (NLMS) algorithm, which computes a dynamic step size as

$$\mu(n) = \frac{\mu}{\delta + L\,P_x}, \tag{20}$$

where the adaptive step size $\mu(n)$ is normalized by the filter length $L$ and the signal power $P_x$, and $\delta$ is a small constant that prevents division by zero. The NLMS algorithm speeds up the convergence of the error and ensures convergence for $0 < \mu < 2$ regardless of the statistics of the input signal [38]. The block LMS (BLMS) algorithm divides the input signal into blocks that are applied to the adaptive filter one block at a time. Unlike in the conventional LMS, the filter weights are therefore adapted on a block-by-block basis. The BLMS algorithm reduces the computational complexity and hence the runtime of the algorithm. However, this improvement generally does not imply faster convergence. The convergence rate of the BLMS can be improved by combining it with the NLMS algorithm and computing the adaptive step size at each iteration [38].
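The normalized step size of NLMS can be illustrated with a generic NLMS filter. The following Python sketch is ordinary supervised NLMS, not the envelope-based cost of Section III.

It assumes the following:

- The task is to identify a hypothetical echo path [1, 0, 0, 0.25, 0] from a known desired signal.
- The input is white Gaussian noise.
- The product L·P_x is estimated as the energy of the current tap-input vector.

```python
import random

TRUE_PATH = [1.0, 0.0, 0.0, 0.25, 0.0]   # hypothetical direct tap plus echo at delay 3
L, MU, DELTA = len(TRUE_PATH), 0.5, 1e-8

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
w = [0.0] * L

for n in range(L - 1, len(x)):
    taps = [x[n - m] for m in range(L)]                      # tap-input vector X(n)
    d = sum(h * t for h, t in zip(TRUE_PATH, taps))          # desired signal
    e = d - sum(wm * t for wm, t in zip(w, taps))            # a-priori error
    power_times_l = sum(t * t for t in taps)                 # L * P_x estimated from X(n)
    mu_n = MU / (DELTA + power_times_l)                      # normalized step size
    w = [wm + mu_n * e * t for wm, t in zip(w, taps)]

print([round(wm, 6) for wm in w])
```

Normalizing by the tap-input energy makes the effective step independent of the input scale. This is the property that lets NLMS converge for any 0 < µ < 2.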
The updated equation for the tap-weight vector of the block-normalized LMS (BN-LMS) algorithm becomes

$$\mathbf{W}(k+1) = \mathbf{W}(k) + \mu(k)\,\boldsymbol{\phi}(k), \tag{21}$$

where $\mu(k)$ is the adaptive step size calculated from the NLMS formula and $\boldsymbol{\phi}(k)$ is the cross-correlation given as

$$\boldsymbol{\phi}(k) = \sum_{i=0}^{B-1} \mathbf{X}(kB + i)\, e(kB + i), \tag{22}$$

with $B$ denoting the block length. One alternative to the LMS algorithm is the recursive least squares (RLS) algorithm, which provides an excellent convergence rate at the price of increased complexity and computational cost. The LMS filter adapts its coefficients to minimize the mean squared difference between the desired and actual signals. The RLS algorithm, by contrast, recursively finds the filter coefficients that minimize a weighted linear least-squares cost. The signal paths and the adaptation structure are the same for the LMS and RLS filters [39]. The computational complexity of an algorithm is determined by the total number of floating-point operations (additions and multiplications) in its implementation on a processor. For L filter coefficients, the complexity of the LMS algorithm (as well as of the NLMS and BN-LMS algorithms) is of order O(L), whereas that of the RLS algorithm is of order O(L²) [34]. Table 9 compares these algorithms with the LMS algorithm in terms of the root mean square error (RMSE) and the computation time. To compare the runtimes, the MATLAB implementations were run on a Windows 10 computer with a four-core Intel Core i5 (7th Gen) processor clocked at 2.80 GHz and 8 GB of RAM. The table shows that the BN-LMS algorithm is much faster than the conventional LMS algorithm. The RLS algorithm, although it improves the convergence, is much more time-consuming than the LMS algorithm.

Comparison: The method proposed in this article corrects the depth measurement error without adding any hardware to the ToF camera and without using multiple frequencies.
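The O(L²) cost of RLS comes from propagating the L × L inverse-correlation matrix P at every sample. The following Python sketch is the standard exponentially weighted RLS recursion.

It assumes the following:

- It is applied to the same hypothetical echo-path identification problem as the NLMS sketch.
- The forgetting factor is 0.99.
- P is initialized to 1000·I.

It is a generic textbook form, not the implementation behind Table 9.

```python
import random

TRUE_PATH = [1.0, 0.0, 0.0, 0.25, 0.0]      # hypothetical echo path
L, LAM, P0 = len(TRUE_PATH), 0.99, 1000.0    # taps, forgetting factor, initial P scale

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
w = [0.0] * L
P = [[P0 if i == j else 0.0 for j in range(L)] for i in range(L)]

for n in range(L - 1, len(x)):
    u = [x[n - m] for m in range(L)]                                  # tap-input vector
    d = sum(h * ui for h, ui in zip(TRUE_PATH, u))                    # desired signal
    pu = [sum(P[i][j] * u[j] for j in range(L)) for i in range(L)]    # P u  (O(L^2))
    denom = LAM + sum(ui * pui for ui, pui in zip(u, pu))             # lambda + u^T P u
    k = [pui / denom for pui in pu]                                   # gain vector
    e = d - sum(wi * ui for wi, ui in zip(w, u))                      # a-priori error
    w = [wi + ki * e for wi, ki in zip(w, k)]
    # P <- (P - k u^T P) / lambda; P is symmetric, so u^T P = (P u)^T.
    P = [[(P[i][j] - k[i] * pu[j]) / LAM for j in range(L)] for i in range(L)]

print([round(wi, 6) for wi in w])
```

The matrix-vector product and the rank-one update of P each cost on the order of L² operations per sample. The LMS-family updates above touch only the L weights.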
The depth correction performance of the proposed method was compared with that of recent depth correction methods. Table 10 lists the comparison, based on the depth correction for an object at a distance of about 500 mm from the camera.

VI. SUMMARY AND CONCLUSION
In this article, a new approach was presented to suppress MpI in the light signal received by a ToF camera. The proposed method operates directly at the sensor side of the camera and can, in principle, be employed with any ToF camera. Simulations demonstrated that the algorithm corrects the MpI-induced depth error in different scenarios. Furthermore, the algorithm was applied to a ToF measurement setup, and experiments with real objects under real conditions were performed. Replicating the results with a commercial ToF camera requires raw data from the camera sensor, but commercially available ToF cameras unfortunately do not provide access to these data. Future work will involve a more comprehensive implementation of the method suitable for commercially available real-time ToF cameras, together with the adaptations required in the physical structure of the cameras. The simulations and experiments validate the proposed method, although the experimental validation still leaves considerable room for improvement, which future work will address.