Pseudo-Passive Time-of-Flight Imaging: Simultaneous Illumination, Communication, and 3D Sensing

The 3D time-of-flight (ToF) cameras have recently received a lot of attention due to their wide range of applications. Despite remarkable advancements in ToF imaging, state-of-the-art ToF cameras are still afflicted by the power hungriness of their illumination sources. To tackle this problem, we exploited existing lighting infrastructure, which ensures the ubiquitous presence of modulated light sources in indoor spaces that serve as opportunity illuminators. We explored the bistatic geometry for passive imaging using the pulse-based ToF approach. Our work is inspired by the recently introduced visible light communication (VLC) or light-fidelity (Li-Fi) infrastructure. VLC allows the infrastructure to provide indoor simultaneous illumination, communication, and sensing (SICS). To this end, we designed a bistatic geometry for the purpose of attaining passive 3D imaging. Such capabilities are achieved by exploiting the pulse shape of the autocorrelation function of real optical signals generated by VLC/Li-Fi modules (e.g., OpenVLC and LiFiMAX). We demonstrated passive imaging by means of matched filtering. In this work, we also studied different sampling strategies in the time-shift domain, including uniform, random, and sparse rulers, which is another step forward toward preserving high depth accuracy with a minimal number of measurements. The proposed methodology achieved successful depth reconstruction with negligible root-mean-square error (RMSE) for the low signal-to-noise ratio (SNR) of the measurements. Parametric models, such as Gaussian and sum of sines, are used to characterize the cross correlation functions and allow for robust parametric depth retrieval from a few measurements. Moreover, we attained a 20-mm worst case error for a target at 25 cm. The experiment proved that the bistatic passive depth reconstruction is feasible.

[...] gaming, and human-machine interaction, previously reserved to LiDARs and stereo vision systems [1], [2], [3]. ToF is an active imaging technique that produces both intensity and depth maps. A ToF camera computes the distance between the camera and the scene objects on a per-pixel basis by exploiting the time delay that photons experience when modulated light is projected onto the scene and bounces back to the camera; this delay is proportional to the distance between the camera and each corresponding scattering point.
In recent years, rapid advances in solid-state technology have profoundly transformed the lighting infrastructure from conventional lamps (e.g., incandescent and halogen) to light-emitting diodes (LEDs). LEDs have become popular for displays and light sources because of their long lifetime, small size, low cost, energy efficiency, and short switching transients [4]. The switching capability of LEDs enables the visible light channel to be modulated at frequencies high enough to be imperceptible to the human eye [5]. LED lighting has gained global recognition as a green lighting technology in recent years [6], [7]. LEDs are predicted to eventually replace conventional lights and become the ultimate light source for many applications [8]. This transition has turned the lighting infrastructure into a novel communication paradigm and has accelerated the development of visible light communication (VLC). VLC is a promising and accessible technology, driven by the ubiquitous proliferation of low-cost LED sources in indoor environments. LED-based VLC systems offer numerous advantages, including low-cost front ends, immunity to electromagnetic interference, high physical-layer security, and unregulated spectrum resources (400-900 THz) [9]. It is believed that VLC will play a significant role in fifth-generation (5G) and sixth-generation (6G) communication [10], [11]. VLC brings communication and lighting together within a single module. Therefore, the VLC infrastructure can provide multiple services simultaneously, such as communication, illumination, and now also ToF 3D imaging in indoor environments. In this context, the VLC infrastructure lays a solid foundation for pseudo-passive ToF sensing.
In addition, VLC-enabled front ends from companies such as pureLiFi, Philips, and Oledcomm are accessible to the general public and have rapidly penetrated the commercial market, including homes, offices, and industrial buildings.¹ The commercialized VLC technology (IEEE 802.15.7) offers a data rate of 150 Mb/s, and research prototypes reach data rates of 1 Gb/s [12]. Light-based wireless communication (VLC) and ToF sensing systems have been developed independently for two decades since the emergence of optical wireless communication (OWC). The passive imaging method we present goes beyond the communication-only or sensing-only philosophy and instead uses a VLC-enabled sensing approach based on the cooperation between VLC and ToF imaging. Thus, this study broadens the reach of VLC systems to synergistically support communication and sensing services.
Active 3D imaging has been a vibrant research topic for many years [13], [14], [15]. Despite significant progress in ToF imaging, state-of-the-art ToF cameras still suffer from the high power consumption of their dedicated illumination units. Unfortunately, this problem remains unaddressed, and most of the recent progress is on the signal processing side. For example, in [16], the illumination system features 91 W of optical power for wide-area ToF imaging. This precludes the applicability of ToF imaging in specific scenarios. To overcome this problem, an alternative approach that uses an opportunity illuminator for sensing is proposed. This makes the built-in dedicated illumination unit unnecessary, thus enabling passive ToF 3D sensing. VLC has recently been used for passive ToF sensing without synchronization between the source and the ToF camera, which leads to an unknown depth offset in passive ToF imaging [17]. This passive modality therefore still needs refining to attain accurate depth recovery.
Through the development of a bistatic sensing geometry and the refinement of the ToF sensing pipeline for the passive imaging modality, we solved the synchronization problem by introducing a direct link between the emitter and a reference photodiode (RPD). In general, the bistatic configuration uses two parallel channels. One is the reference channel, which transmits signals between the VLC source and the reference PD to acquire an external reference signal for the photonic mixer device (PMD) camera. The other captures scene-related reflections. Informative measurements are obtained by cross-correlating the reference and reflected signals. This solution exploits the existing VLC infrastructure [18] to illuminate the scene with modulated light and synchronize the camera with an externally provided signal, allowing accurate depth reconstruction.
¹ http://purelifi.com/case-studies/
To evaluate this alternative, we used an OpenVLC1.3 module with a white LED and a LiFiMAX module with an infrared (IR) LED for our simulations [19], [20]. OpenVLC is a low-cost, open-source platform with a bandwidth of 1 MHz that supports a throughput of 400 kb/s. The LiFiMAX module can provide data rates of 40 and 100 Mb/s in uplink (UL) and downlink (DL), respectively.
Moreover, in this work, the depth is reconstructed using different sampling methods in the time-shift domain. In an attempt to reduce the number of measurements, we compared different sampling approaches, both uniform and nonuniform. The signal processing community extensively relies on uniform sampling (US) in the communication and sensing domains, while nonuniform alternatives often remain unexplored, arguably except random sampling (RS). RS is used as a basis for compressive sensing to recover sparse signals from a few measurements [21]. In addition, for depth reconstruction, we used a matched filtering method.
1) Matched Filtering: A matched filter (MF) is a well-known signal processing technique for improving signal quality and estimating delays. The temporal shift is found by correlating delayed copies of a known signal (the template) with an unknown signal [22], [23], [24]. Matched filtering is often performed either in an analog circuit that carries out the correlation and then finds the peaks, or in a digital circuit or computer that samples the signal and calculates the discrete correlation function. The MF offers accurate results at a low computational cost, thus allowing for high frame rates, provided that the sampling rate is high enough.
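The delay estimation idea behind the MF can be sketched in a few lines. The following is a minimal illustration, not the pipeline used in this work: it assumes periodic (circular) signals on a common sampling grid, and the function name `matched_filter_delay` as well as the random test signal are hypothetical.

```python
import numpy as np

def matched_filter_delay(template, received):
    """Return the circular shift (in samples) of `received` relative to `template`."""
    template = np.asarray(template, dtype=float)
    received = np.asarray(received, dtype=float)
    # Circular cross-correlation via the FFT:
    # corr[k] = sum_n received[n + k] * template[n]
    spectrum = np.fft.fft(received) * np.conj(np.fft.fft(template))
    corr = np.fft.ifft(spectrum).real
    # The peak of the correlation marks the most likely delay.
    return int(np.argmax(corr))

# Hypothetical test signal: a random +/-1 template, delayed, attenuated, and noisy.
rng = np.random.default_rng(0)
template = rng.choice([-1.0, 1.0], size=256)
received = 0.3 * np.roll(template, 37) + 0.1 * rng.standard_normal(256)
print(matched_filter_delay(template, received))
```

The FFT is used here only for convenience; a direct sum over shifts gives the same correlation values.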
The 6G technology promotes communication and sensing simultaneously. In this context, VLC infrastructure can be reused for multiple services. To the best of the authors' knowledge, this is the first work exploring VLC infrastructure as a drop-in replacement for the illumination unit in ToF depth cameras.
The rest of this article is structured as follows. Section II provides a summary of related work in the area of pulse-based (PB) ToF and existing passive sensing methods. Section III is devoted to the system model. In Section IV, the depth recovery method and different sampling schemes are briefly discussed. The experimental setup is demonstrated and elaborated in Section V. Section VI provides the simulation performance of the system based on several sampling schemes and the first-ever passive-ToF 3D reconstruction. Finally, Section VII draws conclusions and proposes future lines of work.
II. RELATED WORK
OWC and ToF sensing have achieved unprecedented results independently in the recent past, but without mutual intersection. To date, no prior works have studied the intersection of both technologies for simultaneous communication and 3D sensing. Recently, the emergence of OWC variants, such as VLC, light-fidelity (Li-Fi), and free-space optical communication (FSOC), has brought interesting avenues for passive ToF imaging. These variants are frequently used for communications and illumination in indoor and outdoor settings.
In parallel, ToF cameras have made great strides in depth reconstruction. In general, ToF cameras are divided into two operational modes, i.e., the PB-ToF mode and the continuous-wave (CW) ToF mode. In the PB-ToF mode, the source emits a short pulse that illuminates the scene, and the bounced-back signals are received by the ToF camera. ToF pixels integrate the scene-reflected light mixed with a demodulation control signal (DCS). The measurements are obtained by shifting the DCS with respect to the illumination control signal. In the CW mode, measurements are obtained for different phase shifts between the DCS and the illumination signal.
The phased ToF camera was primarily pioneered by Schwarte [25]. His work paved the way for the success of ToF cameras in the computer vision community. Over the years, ToF imaging technology has become ubiquitous in a wide range of 3D imaging applications. The PMD camera is a leading-edge technology for CW-ToF cameras. These devices are able to extract depth from raw data following the phase stepping algorithm [26]. Such devices are frequently endorsed due to their mature processing pipeline and publicly accessible designs [27]. A related work outlined the fundamental operation of lock-in ToF cameras, their merits and shortcomings, the layout of ToF pixels, and a remedy to practical difficulties that appear when a PMD camera is being used in the presence of background light [26], [28], [29], [30].
Li-Fi was initially demonstrated by the German physicist Harald Haas. VLC is a subset of temporally structured lighting [31]. The challenging task is to encode the information in a lighting framework (one or multiple VLC sources). Zhang et al. [32] reported recent advancements in VLC hardware technology. They demonstrated that blue LEDs and color converters attained bandwidths of 1485 and 470 MHz, respectively. These advances not only boost the capacity of VLC channels but also facilitate passive ToF sensing. In 2019, VLC was used for ranging and vehicular communication. This work is not directly related, but the ToF technique is analogous [33]. ToF sensing addresses a well-known problem: determining the distance from the reflection of a known signal.
Recent research efforts have been devoted to the development of PB-ToF imaging systems. Typically, rectangular pulse shapes are used in PB-ToF sensors. It is challenging to generate perfectly square pulses since vertical rising or falling edges would require unlimited bandwidth.
Sarbolandi et al. [34] used a PB-ToF camera (Hamamatsu area sensor S11963-01CR) featuring two electronic shutters, known as gates, which are used for the accumulation of photogenerated carriers from bounced-back photons. Both gates are triggered sequentially for a few nanoseconds. The distance to an object leads to a shift in the reflected signal compared to the emitted signal. Thereby, the time shift determines depth. Lang et al. [35] developed a PB-ToF technique for classifying materials based on their unique signatures. Wagner et al. [36] developed an interferometric technique for depth imaging that uses the inherent photon bunching signature of thermal light, which was initially demonstrated by Brown and Twiss [37], [38]. This work adds complexity to the system since the illumination signals should be conditioned before use. This method requires high-bandwidth detectors.
In contrast to our work, other authors employed photometric stereo (PS) and aperture masking interferometry to perform passive imaging [39], [40], [41], [42]. Furthermore, the need for multiple sources and an appropriate footprint are fundamental problems in such techniques for scene reconstruction using low-coherence interferometry. The aforementioned passive ToF alternatives [36], [39] have significant drawbacks and are far from being practical.
Differently, a VLC-enabled passive ToF system can use a single broadcast signal for multiple purposes, such as communication, illumination, and 3D sensing. This exploits the same spectrum and hardware resources, allowing it to play a significant role in the future wireless network industry.
Unlike previous studies, we make use of the fortunate fact that VLC infrastructure is often found in homes, office settings, industrial zones, and vehicles, i.e., interesting application fields for ToF cameras [43]. We propose using the existing VLC infrastructure for illumination, turning the background light from a disturbance into a useful optical signal for the ToF camera. In light of existing literature, no prior works have shown passive ToF imaging by leveraging opportunity illuminators. This article demonstrates the VLC-enabled passive ToF system that can provide communication and 3D sensing for free in terms of additional power consumption.

III. SYSTEM MODEL
We consider a single-input-single-output (SISO) system that emits signals aimed at both probing scene objects and enabling communication with DL users. Fig. 1 shows the detailed block diagram of our proposed VLC-enabled passive ToF sensing scheme. In this section, we first model the free-space optical channel and then define the mathematical model from the communication and passive ToF sensing perspectives. Consider a point-to-point VLC-based optical wireless link that broadcasts a DL signal toward the scene and the user. A bistatic setup is designed to synchronize the ToF camera. In this context, the reference signal for the PMD module can be obtained via a direct line-of-sight (LoS) link between the VLC source and the reference PD. In contrast to conventional ToF cameras, the VLC source and the ToF camera are not co-located.

A. VLC Channel
The underlying characteristics of a VLC channel are mainly dictated by the optical link configuration [44]. The most influential aspect of VLC communication performance is the quality of the communication channel. Kahn and Barry [45] demonstrated six different configuration cases for indoor VLC links based on the presence or absence of an LoS link between an optical transmitter (TX) and a receiver (RX). The degree of directionality and orientation between TX and RX determines the link configuration of indoor VLC systems. The direct LoS link is a dominant communication link configuration and is widely used in the literature [46]. The emitter beam and the field-of-view (FoV) of the RX define the optical transmission channel. In this case, the LoS channel facilitates obtaining a higher received light intensity, i.e., a higher signal-to-noise ratio (SNR), which can be used to achieve suitable data rates and a long communication range. Propagation of optical signals in an optical wireless channel [47] can be written as in (1), where $P_R$ denotes the received optical power, $P_T$ is the power transmitted by the source, $G_{OTX}$ is the gain of the optical TX, $G_{ORX}$ is the gain of the optical RX, and $G_D$ denotes the attenuation of light intensity over the propagation distance.
$A_{SL}$ accounts for the other system-dependent losses due to the system design and link misalignment. Optical TXs (i.e., LEDs) are the core components of VLC communication. A Lambertian pattern models the intensity profile of LEDs in the spatial domain. The radiation pattern of a Lambertian source is radially symmetric and is controlled by its order of emission, $m = -\ln(2)/\ln(\cos(\Phi_{1/2}))$. Here, $\Phi_{1/2}$ represents the half-power beamwidth of the TX. The Lambertian radiation pattern is widely accepted by the VLC research community (see, e.g., [44], [48], and references therein). Considering a Lambertian-beam TX aperture and the light received at the PD, the gain of the optical transceiver in the presence of free-space propagation losses can be expressed as in (2) [49], where the parameters are the full-width transmit beam divergence angle, the optical RX aperture diameter $D_R$, the distance $d_{TRX}$ between the TX and RX systems, and the wavelength $\lambda$.

Equation (1) can be rewritten as (3) by substituting the gain values.
From (3), we can encapsulate the free-space propagation loss $\eta$, which is given by (4). According to (4), the attenuation factor is also affected by the emitter's beamwidth, the RX's aperture diameter, and the propagation distance between TX and RX. A wider beamwidth leads to higher attenuation, whereas a larger aperture reduces the attenuation because the RX collects more light. In practice, an LED generates a far-field pattern with a region of maximum intensity. For an LED with a Lambertian emission pattern, the angular intensity distribution is maximum at $0^\circ$, and the theoretical half-power beamwidth is $\Phi_{1/2} = 120^\circ$. The optical signal propagation loss (optical path loss) can be computed from the LED beam divergence angle and the diameter of the PD aperture.
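The channel quantities discussed in this subsection can be explored numerically. The sketch below rests on two assumptions that are not taken from this article: the Lambertian order is evaluated with the half-power semi-angle (60° for the 120° full beamwidth, giving $m = 1$), and the link budget uses gain expressions commonly found in the free-space optics literature, which may differ from (2) to (4). All function and parameter names are hypothetical.

```python
import math

def lambertian_order(half_power_semi_angle_deg):
    """Lambertian emission order m = -ln(2) / ln(cos(phi_half))."""
    phi = math.radians(half_power_semi_angle_deg)
    return -math.log(2.0) / math.log(math.cos(phi))

def received_power(p_t, divergence_rad, rx_diameter, distance, wavelength, a_sl=1.0):
    """Linear-scale link budget P_R = P_T * G_OTX * G_ORX * G_D * A_SL (assumed form)."""
    g_otx = 32.0 / divergence_rad**2                       # assumed TX gain
    g_orx = (math.pi * rx_diameter / wavelength) ** 2      # assumed RX gain
    g_d = (wavelength / (4.0 * math.pi * distance)) ** 2   # assumed free-space loss
    return p_t * g_otx * g_orx * g_d * a_sl

print(lambertian_order(60.0))
print(received_power(1.0, 1.0, 0.01, 1.0, 850e-9))
```

Under these assumed gains, the wavelength cancels and the received power scales as $D_R^2 / (\theta^2 d_{TRX}^2)$, with $\theta$ denoting the divergence angle; this matches the qualitative behavior described above, where a wider beam increases the attenuation and a larger aperture reduces it.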

B. VLC Communication Model
VLC is a novel wireless communication technology supported by existing lighting infrastructure. Fig. 2 presents an indoor VLC link. The VLC infrastructure now serves as an opportunity illuminator in passive ToF sensing. We used two types of light-based wireless communication modules: OpenVLC1.3, which is a research platform, and LiFiMAX module, which is a marketable product. The functionality of the modules is discussed in the sequel.
1) OpenVLC: This is one of the most common open-source and adaptable platforms available to the VLC research community. It emits a Manchester-coded on-off-keying (MC-OOK) modulated signal. Each bit is translated into two levels: a 0 bit is "HIGH" during the first half of the symbol period and a 1 bit is "HIGH" during the second half; otherwise, the signal is "LOW", as stated in (6). The transmitted data signal for $j \in \{0, 1\}$ can be formulated as in (5) [17], where $v_j(t)$ denotes the pulse waveform and $T$ is the period of each bit. Let $x_{\text{MC-OOK}}(t)$ be the transmitted data stream. In reality, the emitted optical signal does not exactly coincide with (5) due to the low-pass behavior of the LEDs. Therefore, our realistic theoretical model convolves the emitted signal (5) with the LED impulse response function, $h_{\text{LED}}(t) = e^{-t/\tau}/\tau$, which is modeled as a first-order low-pass filter. This operation results in a probing signal for sensing and a transmitted signal for communication, $p^{\text{VLC}}_{\text{TX}}(t)$, as expressed in (7) [46]. The optical signal propagates via the LoS channel and is received by the DL RX (i.e., the PD). The PD translates the light intensity into an electrical output signal $y^{\text{PD}}_{\text{DL}}(t)$. The received DL signal is obtained by convolving the probing signal (7) with the LoS channel response given in (9). This can be mathematically represented as in (8) [46], where $*$ denotes the convolution operator and $n(t)$ is independent additive white Gaussian noise (AWGN) with zero mean and unit variance. Moreover, the LoS channel response is a time-shifted and scaled delta function that represents the amplitude drop and delay of the transmitted data signal. As a result, the optical path loss becomes a significant parameter for characterizing the LED illumination potential.
Indoor VLC channels may include both LoS and non-LoS components at the RX end. The LoS configuration is the one most frequently used for indoor OWC and illumination settings. In line with [50] and [51], the LoS channel lies between the optical TX and RX. In this work, we ignored the non-LoS components of the optical link without loss of generality and considered a single LoS channel between TX and RX. The LoS channel impulse response (CIR) can be written as in (9), where $\eta$ is the attenuation coefficient introduced by the optical channel propagation losses (path loss), the delay term is the propagation time (delay offset of the LoS path) that the optical signal undergoes in free space, and $\delta(t)$ is the Dirac delta function.
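The signal chain described for OpenVLC (Manchester-coded OOK, first-order LED low-pass response, and a scaled and delayed LoS channel with AWGN) can be emulated on a discrete grid. The following is a rough sketch under stated assumptions: the time axis is sampled uniformly with step `dt`, the LoS delay is an integer number of samples, and the impulse response is truncated at ten time constants. The function names are hypothetical and do not belong to the OpenVLC software.

```python
import numpy as np

def manchester_ook(bits, samples_per_bit):
    """MC-OOK waveform: 0 -> HIGH then LOW, 1 -> LOW then HIGH (samples_per_bit must be even)."""
    half = samples_per_bit // 2
    high, low = np.ones(half), np.zeros(half)
    symbols = {0: np.concatenate([high, low]), 1: np.concatenate([low, high])}
    return np.concatenate([symbols[int(b)] for b in bits])

def led_lowpass(x, tau, dt):
    """Convolve x with the first-order LED response h(t) = exp(-t/tau)/tau."""
    t = np.arange(0.0, 10.0 * tau, dt)   # truncated support (assumption)
    h = np.exp(-t / tau) / tau
    return np.convolve(x, h)[: len(x)] * dt

def los_channel(p, eta, delay_samples, noise_std=0.0, rng=None):
    """Scale by eta, delay by an integer number of samples, and add AWGN."""
    rng = np.random.default_rng() if rng is None else rng
    delayed = np.concatenate([np.zeros(delay_samples), p])[: len(p)]
    return eta * delayed + noise_std * rng.standard_normal(len(p))

x = manchester_ook([0, 1, 1, 0], samples_per_bit=100)
p_tx = led_lowpass(x, tau=10.0, dt=1.0)
y_dl = los_channel(p_tx, eta=0.5, delay_samples=7, noise_std=0.01,
                   rng=np.random.default_rng(0))
```

Plotting `x`, `p_tx`, and `y_dl` on the same axis shows the rounded edges introduced by the LED and the amplitude drop and delay introduced by the channel.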
2) LiFiMAX: Li-Fi is a wireless networking technology that has recently been developed for commercial applications as a fully networked device. We used the LiFiMAX system in this work. It can be easily installed on the ceiling of a conference room or workplace, and it provides network access to any device equipped with a plug-and-play LiFiMAX dongle. This enables Internet connectivity for 16 users within a 28-m² cell, with throughputs of 60 and 40 Mb/s in the DL and UL, respectively. LiFiMAX uses carrierless amplitude and phase (CAP) modulation, a spectrally efficient method for overcoming modulation bandwidth challenges in VLC. CAP modulation has attracted extensive research interest in VLC applications as a result of its excellent spectral efficiency and its simplicity [52]. The transmitted signal can be expressed as in (10), where $a_n$ and $b_n$ are the real and imaginary parts of the $n$th symbol, respectively, $T$ is the time period, and $n$ is the symbol index. The pulse-generating orthogonal filters are given in (11), where $g(t)$ is the root raised cosine (RRC) filter, $\omega_c$ is the angular frequency of the filter, and $T$ is the symbol time period. CAP is a variant of quadrature amplitude modulation (QAM) for Li-Fi communication [53]. The received signal is then demodulated and decoded to retrieve the original data stream. Note that the UL VLC signals are not considered in this case. For sensing purposes, the received signal at the reference PD is an analog signal that needs to be converted to a binary signal for the synchronization of the ToF camera.
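For orientation, a common textbook form of a CAP signal and its orthogonal shaping filters is given below. This is a sketch based on the symbol definitions above; the filter names $f_I$ and $f_Q$ and the sign convention are assumptions and may differ from the expressions in (10) and (11).

```latex
\begin{align}
s(t) &= \sum_{n} \left[ a_n\, f_I(t - nT) - b_n\, f_Q(t - nT) \right], \\
f_I(t) &= g(t)\cos(\omega_c t), \qquad f_Q(t) = g(t)\sin(\omega_c t).
\end{align}
```

The two filters share the RRC envelope $g(t)$ and differ only by a 90° phase shift, which is what allows the in-phase and quadrature symbol streams to be separated at the receiver without an explicit carrier.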

C. Passive ToF Sensing Model
In the bistatic passive ToF paradigm, opportunity illumination sources (i.e., VLC sources) illuminate the scene, and the ToF camera acquires the reflected signal. We studied a PB-ToF method based on VLC-modulated light signals. The VLC signal interacts with the scene, and the observed ToF signal is affected by the scene response function (SRF), defined in (12) and denoted by $h_s(t)$, where $s$ represents the scene. A single reflection ($K = 1$) is often considered in the general setting of the ToF camera. In the most general case ($K \geq 1$), a single ToF pixel may receive multiple reflections from the scene. In this context, the SRF is given by (12) [54], where $\delta(t)$ denotes the Dirac delta function and the remaining parameters are the reflective components of the targets and the corresponding delays introduced by the $K$ backscattered light paths [14]. The reflected signal is obtained by convolving the probing signal with the SRF, i.e., $r(t) = (p^{\text{VLC}}_{\text{TX}} * h_s)(t)$, which follows from (7). The reflected signal is given by (13) [55], where the SRF is a shift-invariant response function. The conventional CW-ToF method uses the internal signal generated by the ToF system as the DCS [14], [56]. In contrast, in this work, we used the thresholded version of the RPD signal, $y_{\text{RPD}}(t)$, as the DCS. This is a binary signal used to synchronize the ToF camera. The received signal is sampled at every bit duration $T_b$. We employed a thresholding scheme that compares the received signal amplitude with the threshold voltage $V_{th}$; this allows us to make a decision and generate the logical levels low and high. The signal prior to thresholding is denoted as the RPD signal, $y_{\text{RPD}}(t)$, and the thresholding operation can be written as in (14), where $A$ is the required amplitude level of the ToF reference signal. Let us first explain the measurement gathering process to gain insight into how our proposed method processes the acquired data.
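Two building blocks of this model lend themselves to a short numerical sketch: the SRF as a sum of $K$ scaled and delayed impulses, and the thresholding of the RPD signal into a binary DCS. The code below is illustrative only. It assumes integer sample delays, a low logic level of 0, and a strict "greater than" comparison against the threshold; the function names are hypothetical.

```python
import numpy as np

def apply_srf(probe, reflectivities, delays):
    """Reflected signal as a sum of K scaled, delayed copies of the probing signal."""
    probe = np.asarray(probe, dtype=float)
    r = np.zeros_like(probe)
    for gamma, k in zip(reflectivities, delays):
        # Each backscattered path contributes gamma * probe(t - k).
        r[k:] += gamma * probe[: len(probe) - k]
    return r

def threshold_reference(y_rpd, v_th, amplitude):
    """Binary DCS: `amplitude` where y_rpd exceeds v_th, 0 elsewhere (assumed low level)."""
    return np.where(np.asarray(y_rpd, dtype=float) > v_th, amplitude, 0.0)

probe = np.array([1.0, 0.0, 0.0, 0.0])
print(apply_srf(probe, reflectivities=[0.5, 0.25], delays=[1, 3]))
print(threshold_reference([0.1, 0.6, 0.4, 0.9], v_th=0.5, amplitude=3.3))
```

With $K = 1$, `apply_srf` reduces to a single attenuated and delayed copy of the probing signal, which is the single-bounce case.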
During the acquisition time interval, which is long enough to capture the reflected light signals, the VLC emitter transmits signals continuously. We can work with a fixed time window of size $\tau$, which is facilitated by the main lobe of the autocorrelation of VLC signals. This allows us to use a given range of samples in simulation and hardware experiments and accounts for the finite exposure time of the ToF camera. The cross correlation between (13) and (14) provides continuous-time measurements $m(t)$, as given in (15). The measurements are obtained by shifting the control signal within a given range of delays, related to the depth range to be covered. This yields samples of the cross correlation function in (15), where $\tau$ is the shift component. The discrete measurements can be written as $m[i] = m(t)|_{t = iT,\, i \in \mathbb{Z}}$, where $0 \leq t < T$ is the sampling range.
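The measurement process, correlating the reflected signal with shifted copies of the control signal, can be emulated as follows. This is a simplified sketch: it assumes discrete, equal-length signals and treats the shift as circular, which is a convenience rather than a statement about the PMD hardware. The function name and the random binary control signal are hypothetical.

```python
import numpy as np

def correlation_measurements(reflected, dcs, shifts):
    """m[i] = sum_n reflected[n] * dcs[n - shifts[i]], with circular shifts."""
    reflected = np.asarray(reflected, dtype=float)
    dcs = np.asarray(dcs, dtype=float)
    return np.array([np.dot(reflected, np.roll(dcs, s)) for s in shifts])

# Hypothetical example: the reflection is the control signal delayed by 5 samples.
rng = np.random.default_rng(0)
dcs = rng.integers(0, 2, size=200).astype(float)
reflected = 0.4 * np.roll(dcs, 5)
m = correlation_measurements(reflected, dcs, shifts=range(20))
print(int(np.argmax(m)))
```

Restricting `shifts` to a subset of delays corresponds to acquiring only a few samples of the cross correlation function within the depth range of interest.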
IV. DEPTH RECOVERY APPROACH
This section focuses on depth reconstruction using matched filtering. The discussion is tied to our numerical experiments: how we built our simulations from real optical signals and attempted to invert the model. Later, we will use real measurements from PMD pixels. The starting point is the generation of measurements for our numerical experiments. We briefly discuss the measurement generation process used to perform matched filtering for depth recovery. The ground-truth (GT) cross correlation function is shifted and downsampled by a shift operator, $S_q : \mathbb{R}^N \to \mathbb{R}^N$, and a downsampling operator, $D_q : \mathbb{R}^N \to \mathbb{R}^Q$, respectively. This yields a measurement vector that can be written as in (16), where $u_\tau = N\tau/T$ and the spacing between the samples is defined as $R = N/Q$. For the number of measurements, we adhered to the standard values, namely, $Q \in \{4, 8, 16\}$, with $N \gg Q$; $Q$ plays the same role as the number of measurements $M$. The MF correlates the test vector obtained from the reference cross correlation vector with the measurement vector [22], [57]. To demonstrate the correct operation of the MF, AWGN, which is a common noise model, is added to the measurements in our simulations. The SNR of the measurement vectors ranges from −30 to 100 dB. The reference cross correlation is generated in simulations and calibrated in experiments. We obtained the test vector by applying the same sequence of shifting and downsampling operations, but for a candidate delay to test, $\tau$, so that a vector of the same dimension as the measurement vector is obtained as a function of $\tau$. The test vector can be generated as in (17). The estimated time delay (18) is obtained by finding the argument that maximizes the MF function, and from this, the depth can be estimated as given in (19), where $\langle \cdot, \cdot \rangle$ denotes the standard inner product of vectors.
The proposed methodology allows depth reconstruction over a range of measurement SNR values. The translation of the time shift into distance is a simple linear computation, as given in (19). Different sampling schemes are used in the time-shift domain. The depth range is given by the width of the main lobe of the cross correlation function. We sampled the cross correlation function according to uniform, random, and sparse-ruler [58], [59] sampling schemes.
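The shift, downsample, and match procedure can be sketched as follows. This is an illustration under several assumptions, not the implementation used in this work: the shift is circular, candidate delays are restricted to the fine grid, the test vector is normalized before taking the inner product, and the distance is taken as the total path length $c\tau$ without the factor 1/2 of monostatic cameras. The Gaussian reference and all names are hypothetical.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def shift_and_downsample(reference, shift, positions):
    """Circularly shift the fine-grid reference, then keep the samples at `positions`."""
    return np.roll(reference, shift)[np.asarray(positions)]

def mf_estimate_shift(measurements, reference, positions):
    """Return the candidate shift whose test vector best matches the measurements."""
    scores = np.empty(len(reference))
    for u in range(len(reference)):
        test = shift_and_downsample(reference, u, positions)
        norm = np.linalg.norm(test)
        # Normalizing the test vector is an assumption of this sketch.
        scores[u] = np.dot(measurements, test) / norm if norm > 0 else -np.inf
    return int(np.argmax(scores))

def shift_to_path_length(u, n, period):
    """Convert a shift of u fine-grid samples (u = N * tau / T) to a path length c * tau."""
    return C * u * period / n

# Hypothetical Gaussian-shaped reference on N = 256 grid points, Q = 8 uniform samples.
n = 256
reference = np.exp(-0.5 * ((np.arange(n) - n / 2) / 40.0) ** 2)
positions = np.arange(8) * (n // 8)
rng = np.random.default_rng(0)
measurements = shift_and_downsample(reference, 37, positions) + 0.01 * rng.standard_normal(8)
u_hat = mf_estimate_shift(measurements, reference, positions)
print(u_hat, shift_to_path_length(u_hat, n, period=1e-6))
```

Replacing `positions` with random or sparse-ruler indices changes only the sampling pattern; the matching step stays the same.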

A. Bistatic Geometry
Bistatic ToF imaging is a new line of research; bistatic configurations of emitters and RXs have not yet been explored in 3D ToF imaging. In the passive modality setting, the estimated depth defines a 3D ellipsoid on which the target point may lie due to the bistatic geometry. The foci of the ellipsoid are the VLC emitter and the ToF RX. Provided that we know the observation direction of each pixel of the ToF array from lens calibration, we can compute the intersection between the ellipsoid and the direction vector. This defines the 3D location of the target point, thus retrieving the accurate depth between the camera and the target. The 3D location of the target can be written as in (20), where $\mathbf{T}$ is the target position, $\mathbf{R}$ is the position of the RX, $\mathbf{u}_R$ is the unit observation vector, and $d_{RT}$ is the distance between the RX and the target. The 3D ellipsoid is defined by (21), where $d_{ET}$ is the Euclidean distance between the emitter and the target and $d_{RT}$ is the Euclidean distance between the RX and the target. The depth offset, $d_o$, is associated with the cables and electronics used; it is not related to the scene geometry. The complete proof relies on the lens calibration and uses the observation vector $\mathbf{u}_R$ to attain the true depth. The proof of the final depth for the bistatic geometry is provided in Appendix I, and the result can be expressed as in (22), where $d_{ER} = \|\mathbf{E} - \mathbf{R}\|_2^2$, and $C_x$, $C_y$, and $C_z$ are defined in Appendix I. Here, $d$ is the total distance obtained by (19), and $u_{Rx}$, $u_{Ry}$, and $u_{Rz}$ are the components of the observation vector in the $x$-, $y$-, and $z$-directions. The results in Section VI-E are obtained by first applying (19); the resulting estimate is then used in (22) to obtain the true depth between the RX and the target.
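The intersection between the ellipsoid and the observation ray admits a simple closed form, sketched below. It assumes that the ellipsoid constraint reads $d_{ET} + d_{RT} = d - d_o$ and that the target lies at $\mathbf{R} + d_{RT}\mathbf{u}_R$. Squaring $\|\mathbf{R} - \mathbf{E} + d_{RT}\mathbf{u}_R\| = (d - d_o) - d_{RT}$ and solving for $d_{RT}$ gives the expression in the code. This derivation is our own illustration and may be written differently from (22); the function names are hypothetical.

```python
import numpy as np

def receiver_target_distance(total_path, emitter, receiver, u_r, offset=0.0):
    """Solve ||R + d_RT * u - E|| + d_RT = total_path - offset for d_RT."""
    e, r, u = (np.asarray(v, dtype=float) for v in (emitter, receiver, u_r))
    u = u / np.linalg.norm(u)          # ensure a unit observation vector
    d = total_path - offset            # path length attributable to the scene
    v = r - e                          # baseline from emitter to receiver
    return (d**2 - v @ v) / (2.0 * (d + u @ v))

def target_position(total_path, emitter, receiver, u_r, offset=0.0):
    """3D target location along the observation ray of the pixel."""
    u = np.asarray(u_r, dtype=float)
    u = u / np.linalg.norm(u)
    d_rt = receiver_target_distance(total_path, emitter, receiver, u, offset)
    return np.asarray(receiver, dtype=float) + d_rt * u

# Hypothetical geometry in meters: emitter 30 cm to the side of the receiver.
E, R = np.array([0.3, 0.0, 0.0]), np.zeros(3)
target = np.array([0.1, 0.05, 0.25])
d_total = np.linalg.norm(target - E) + np.linalg.norm(target - R)
print(target_position(d_total, E, R, target - R))
```

The denominator is strictly positive whenever the total path length exceeds the emitter-to-receiver baseline, which holds for any physically valid measurement.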

B. Generalized Sampling in Time-Shift Domain
Classical sampling theory, based on the Shannon-Nyquist theorem, assumes US, but related work has shown that nonuniform sampling schemes may reduce the required number of samples without compromising the signal reconstruction quality [60]. We analyze three different sampling schemes in the time-shift domain: US, pseudorandom sampling, and sparse-ruler-based sampling.
1) Uniform Sampling: US refers to the reconstruction of a signal from a discrete set of sampling points taken at fixed time intervals $kT$, where $T$ is the time interval; it is probably the most widespread sampling technique. In an arbitrarily fine discrete domain, the samples are located at the positions $k \in \{0, s, 2s, 3s, \ldots\}$, where $s$ is a discrete version of $T$. Fig. 3(a) shows US with fixed time intervals. Given that our signals of interest are ultimately $K$-sparse, with $K = 1$ in the single-bounce case, Shannon sampling yields redundant samples, raising the computational complexity of subsequent processing. To overcome this limitation, one can make use of nonuniform sampling [61].
2) Nonuniform Sampling: It deals with sets of sampling points that are not uniformly distributed over the sampling domain. Different methods can be used to define the location of the sampling points, which may be random or deterministic. In this work, we analyze the depth reconstruction and performance obtained by employing random [58], [62] and sparse ruler (see [59] and references therein) sampling on the time-shift domain. The current upsurge of interest in sampling signals at rates lower than their Nyquist rate can be credited to compressive sampling (also known as compressed sensing) [63], [64], which has fueled a substantial amount of research over the last several years, including in the ToF 3D imaging community [1].
Conclusively, we consider sparse measurements in our simulation settings. Theoretically, a sparse ruler is one that misses some marks, but it can still measure all integer distances between 0 and the ruler's length L. For example, a sparse ruler with the length L = 23 and M = 8 marks would be {0, 1, 2, 11, 15, 18, 21, 23}. The cardinality of the set M determines the number of measurements, and the value of the marks denotes the distance between each measurement sample and the reference sample. We have exploited an optimal sparse ruler of type Wichmann W (2,5). Note that, since all markers are set on a discrete grid (like a sequence), the distances between them are always expressed in terms of integers rather than time-shift units.
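The defining property of a sparse ruler can be checked directly. The short sketch below (function names are ours, chosen for illustration) collects all pairwise distances between marks and tests whether every integer distance up to the ruler length is covered, using the example ruler given above.

```python
from itertools import combinations

def measurable_distances(marks):
    """Set of all distances that can be measured between pairs of marks."""
    return {abs(a - b) for a, b in combinations(marks, 2)}

def is_complete_sparse_ruler(marks):
    """True if every integer distance from 1 to the ruler length is measurable."""
    length = max(marks) - min(marks)
    return measurable_distances(marks) >= set(range(1, length + 1))

ruler = [0, 1, 2, 11, 15, 18, 21, 23]   # L = 23, M = 8 marks
complete = is_complete_sparse_ruler(ruler)
```

The marks can then be mapped to time shifts by multiplying each mark by the grid step, which is how the integer distances relate to the time-shift domain.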
3) On-Grid and Off-Grid Sampling: The sampling schemes considered in this work rely on an underlying regular grid. The regular or fine gird sampling is governed by the hardware (i.e., oscilloscope). It is assumed that the sample location fits within the fine grid's resolution. For US and RS methods, on-grid samples can always be generated, regardless of the step size of the fine grid. In the context of the sparse-ruler sampling (SRS) method, samples may not coincide with the on-grid locations. In this case, the signal is further downsampled to match the fine grid resolution required by the sparse ruler. This results in a grid mismatch or locations that are off-grid [65]. One has to deal with the grid mismatch between the fine grid and the sparse-ruler-based sampling grid (off-grid). To this end, the interpolation method can be used to predict the samples located between the fine-grid GT samples and form a smooth curve between on-grid and off-grid sample locations.
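As a rough sketch of how off-grid sample values could be predicted from fine-grid samples, the snippet below uses linear interpolation. The interpolation method is not specified in the text, so the linear choice, the function name, and the grid values are assumptions made purely for illustration.

```python
import numpy as np

def sample_off_grid(fine_shifts, fine_values, query_shifts):
    """Predict values at off-grid time shifts by linear interpolation
    between neighboring fine-grid samples (fine_shifts must be increasing)."""
    return np.interp(query_shifts, fine_shifts, fine_values)

# Hypothetical fine grid (ns) carrying a Gaussian-shaped correlation pulse.
fine_shifts = np.arange(0.0, 45.5, 0.5)
fine_values = np.exp(-0.5 * ((fine_shifts - 20.0) / 10.0) ** 2)

# Off-grid locations, e.g., sparse-ruler marks scaled by a non-matching step.
query_shifts = np.array([0.0, 1.3, 2.6, 14.3, 19.5])
off_grid_values = sample_off_grid(fine_shifts, fine_values, query_shifts)
```

A smoother interpolant (for example, a spline) could be substituted when the fine grid is coarse relative to the pulse width.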

V. EXPERIMENTAL SETUP
In this section, the focus is on designing a bistatic setup for passive ToF imaging. We provide the details of the hardware components that are used in the setup. The bistatic geometry of the VLC-enabled passive ToF setup is shown in Fig. 4(a). Our proposed experimental setup is shown in Fig. 4(b). The simulations and experiments are developed based on our proposed pipeline (cf. Fig. 1). We used an evaluation board containing a PMD camera module endowed with built-in external reference capabilities to control the PMD pixels. This module needs binary signals with 0 V as low and 1.8 V as high voltage levels of the external reference signal.
In this case, we employed a thresholding operation that emulates the thresholding circuit needed to digitize the analog signal for external referencing. This circuit should come before the evaluation board. In our experimental settings, the thresholding operation is carried out by a pulse generator (HP 8082A). The key parameters of the ToF setup and the evaluation board are given in Tables I and II. VLC communication experiments were carried out in dark and bright room conditions. ToF sensing tests were conducted in darkroom settings. In the experimental setup, we used an optical rail to move the emitter and RX easily. The rail width is our x, the height of the emitter/RX is y, and the depth is denoted by z. The length of the rail is 1 m.
The procedure of the proposed method and the data acquisition process are as follows.
2) The RPD is placed at the top of the ToF sensor. It is responsible for obtaining the optical signal for synchronizing the ToF camera. This establishes a direct link between the emitter and the RPD.
3) The RPD signal is fed into the thresholding circuit, which converts the analog signal to a digital signal. The thresholded signals are then fed to the picosecond delayer (PSD), enabling custom delays. In our case, we selected our range of scenarios based on the range offered by the PSD.
4) The output signal of the PSD is fed into the evaluation board after the necessary voltage-level adaptation.
5) On the other side, the reflections are acquired by the PMD camera. The ToF camera is mounted on the optical rail at the position R = (7, 21.5, 60) cm. The evaluation board has a reference mixer, which takes the reflected signal and mixes it with the external reference signal. Assume that the signal fed into the sensor board is a thresholded representation of the VLC signal. This yields a cross correlation operation, and by delaying the signal, we can obtain measurements.
6) The target is placed at a distance of 25 cm from the ToF evaluation board. In our case, we used a MiraVera PMD version that features a high resolution of 640 × 240 pixels. The Boehler star is used as a 3D target. This is useful for evaluating the camera's real lateral resolution.

VI. RESULTS AND DISCUSSION

A. Evaluation of OpenVLC and LiFiMAX Modules
For evaluating the performance of both exploited modules, we obtained optical signals using a Thorlabs PD (PDA 10A-EC) via an Agilent Oscilloscope (MDO4104-6). The optical signals can be seen in Fig. 5(a) and (b). The random data signals are generated from both modules (OpenVLC and LiFiMAX). These signals are used to emulate the passive modality. The acquired data signals are thresholded by following (14) with respect to their local average.
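The thresholding with respect to a local average can be sketched as follows. Equation (14) is not reproduced here, so the moving-average window, the function name, and the comparison rule are assumptions; only the 0 V and 1.8 V logic levels are taken from the description of the evaluation board.

```python
import numpy as np

def threshold_local_average(signal, window, v_low=0.0, v_high=1.8):
    """Binarize a signal by comparing each sample with its local average.

    The local average is a moving mean over `window` samples. Near the
    edges, the zero padding of the convolution lowers the local average.
    """
    signal = np.asarray(signal, dtype=float)
    kernel = np.ones(window) / window
    local_avg = np.convolve(signal, kernel, mode="same")
    return np.where(signal > local_avg, v_high, v_low)

# Hypothetical on-off keyed waveform: blocks of ten low and ten high samples.
waveform = np.tile(np.r_[np.zeros(10), np.ones(10)], 5)
binary = threshold_local_average(waveform, window=20)
```

The window length would normally be chosen to span several bit periods, so that the local average tracks slow drifts in the received optical power rather than the data itself.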

B. Depth Reconstruction Error in the Presence of Noise
A number of simulations are carried out using real optical signals of both modules. We exploited the two different ranges of each of the modules. For the OpenVLC module, which has the largest time period, the range can be covered up to 150 m, but we restricted simulations to a range of 6.72 m because of our PSD, which can provide shifts only up to the considered range.
In this section, we group our results based on the different sampling approaches, according to Section IV-B, and the number of measurements. Synthetic ToF measurements were generated and modeled as pulse-shaped cross correlation samples, as reported in Section VI-A. The shape is approximately Gaussian with a standard deviation of 44.8 ns for OpenVLC and 10 ns for LiFiMAX. The measurements are generated via (15) by following the considered sampling methods in the time-shift domain. We took into account 40 realizations for each approach. Four samples of the correlation function are often used for phase computation in consumer devices, such as PMD cameras. Without loss of generality, we study a single-echo case in which the cross correlation function range is partitioned into several measurement samples. The depth reconstruction results of this study are shown as box plots in Fig. 6 for the OpenVLC module. We use the root-mean-square error (RMSE) as the performance metric to evaluate the depth reconstruction, i.e.,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{d}_i - d_{\mathrm{GT},i}\right)^2}$$

where N is the number of independent depth scenarios, $\hat{d}$ is the estimated depth, and $d_{\mathrm{GT}}$ is the ground-truth depth. The RMSE is considered in logarithmic scale, $\mathrm{RMSE[dB]} = 10\log_{10}(\mathrm{RMSE})$. The measurement SNR is varied from −30 to 100 dB. The matched filtering results are restricted to SNRs up to 50 dB since a plateau at a negligible error was attained, as presented in Fig. 6. Several depth tests are carried out for a number of considered ranges. When the SNR is greater than 0 dB, the matched filtering method attains a negligible error. A more comprehensive analysis showed few outliers in all sampling schemes. One may observe a large error for SRS due to the combined effect of noise and the off-grid phenomenon caused by the mismatch between the fine grid and the grid required by the sparse ruler. The matched filtering approach performs well with US and RS.
The MF performs better since the transition from failure to successful reconstruction happens at a much lower measurement SNR. Fig. 7 presents the depth reconstruction error for matched filtering using the LiFiMAX module, for which the same procedure was followed. We attained an RMSE of −90 dB for uniform, −95 dB for random, and almost −80 dB for sparse-ruler sampling. The sparse-ruler depth reconstruction error thus differs only minimally from that of the other two techniques.
Our results demonstrate how depth retrieval from samples acquired according to different sampling schemes performs in noisy environments. We assume noisy measurements, where the noise may originate from the ToF sensor or the surroundings. We report the relation between the measurement SNR and the estimated depth SNR, which was studied for both the OpenVLC and LiFiMAX modules. We provide the OpenVLC SNR results here. In matched filtering, the estimated depth SNR becomes constant above a measurement SNR of 0 dB, as shown in Fig. 8.
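The RMSE metric used in this section and its logarithmic form, 10 log10(RMSE), can be computed as in the short sketch below. The function names and the numerical values are illustrative assumptions; they are not data from the experiments.

```python
import numpy as np

def rmse(d_est, d_gt):
    """Root-mean-square error between estimated and ground-truth depths."""
    d_est = np.asarray(d_est, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    return float(np.sqrt(np.mean((d_est - d_gt) ** 2)))

def rmse_db(d_est, d_gt):
    """RMSE on the logarithmic scale used in the text: 10 * log10(RMSE)."""
    return 10.0 * np.log10(rmse(d_est, d_gt))

# Hypothetical depths in meters for two scenarios.
d_gt = [1.0, 2.0]
d_est = [1.1, 1.9]
error_db = rmse_db(d_est, d_gt)
```

Note that an exactly zero error maps to minus infinity on this scale, so a reported plateau such as −90 dB corresponds to an RMSE of 1e-9 in the depth unit in use.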

C. Parametric Modeling
The measurements are samples of the cross correlation model in (15). The data points were constrained to the range of our PSD. In the time-shift domain, a window size of about 45 ns was considered. The cross correlation was obtained from the full range. For the OpenVLC module, the cross correlation is approximated as a sum of Gaussian functions in (24) and a sum of sines model in (25). The sum of sines and Fourier models in (26) were found to be best in describing the LiFiMAX cross correlation function, where t is the time scale of the cross correlation function, $a_i$, $b_i$, $c_i$, and $\omega$ are the model parameters, and n = 2. The fitted models for the considered range cases are shown in Fig. 9 against the points used for fitting. The plots in Fig. 9 demonstrate an excellent match with the cross correlation functions, with $R^2 = 0.9999$ in almost all cases considered. Fig. 10 shows the depth reconstruction using the Gaussian model for the OpenVLC module and the sum of sines model for the LiFiMAX cross correlation function. We attained successful depth reconstruction up to a constant for both modules.
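To illustrate parametric retrieval from a few samples, the sketch below fits a single Gaussian term of the assumed form a·exp(−((t − b)/c)²) and reads the peak location b. It is not the fitting procedure of the paper: a log-domain quadratic least-squares fit is swapped in for nonlinear fitting, it handles only one Gaussian term, and it requires strictly positive samples. The function name and all numerical values are hypothetical.

```python
import numpy as np

def gaussian_peak_from_samples(shifts, values):
    """Fit a * exp(-((t - b) / c)**2) to positive samples and return (a, b, c).

    The logarithm of a Gaussian is a quadratic in t, so an ordinary
    polynomial least-squares fit of log(values) recovers the parameters.
    """
    shifts = np.asarray(shifts, dtype=float)
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0.0):
        raise ValueError("log-domain fit needs strictly positive samples")
    p2, p1, p0 = np.polyfit(shifts, np.log(values), 2)
    if p2 >= 0.0:
        raise ValueError("samples do not describe a peaked Gaussian")
    b = -p1 / (2.0 * p2)                      # peak location (time shift)
    c = np.sqrt(-1.0 / p2)                    # width parameter
    a = np.exp(p0 - p1 ** 2 / (4.0 * p2))     # peak amplitude
    return a, b, c

# Four hypothetical samples (time shifts in ns) of a noise-free pulse.
shifts = np.array([0.0, 15.0, 30.0, 45.0])
values = 2.0 * np.exp(-((shifts - 20.0) / 60.0) ** 2)
a_hat, b_hat, c_hat = gaussian_peak_from_samples(shifts, values)
```

With noisy measurements, the logarithm amplifies noise on small samples, so a weighted fit or a nonlinear least-squares refinement would be a natural next step.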

D. Performance Evaluation of VLC Communication
This section focuses on the communication-oriented system performance, considering both modules. Here, we concentrate our analysis on the OpenVLC module because it is a research-based module, while the LiFiMAX module is a commercial product. We evaluated the OpenVLC module performance in terms of SNR and theoretical bit error rate (BER) as a function of distance. These metrics are used to examine the MC-OOK VLC link for indoor communication. The BER is given by $\mathrm{BER} = Q_f(\sqrt{\mathrm{SNR}})$, where $Q_f$ is a quality factor that can be defined as $Q_f = \operatorname{erfc}(\mathrm{SNR}/\sqrt{2})$. Fig. 11(a) represents the throughput of the OpenVLC link as a function of distance. It can be seen that the link can provide illumination without affecting the communication data rate. Fig. 11(b) shows the measured SNR and theoretical BER of an indoor VLC channel using the Thorlabs PD and the Agilent oscilloscope. The eye diagrams are measured after downsampling to one sample per bit. MC-OOK modulation is used, and an eye diagram is generated from the OpenVLC random data signals. The eye-diagram settings are enabled in the oscilloscope to see the eye patterns. Eye diagrams make it possible to assess the quality of a transmission link. The SNR can be computed as follows: First insights can be gained by analysis of the eye diagram's vertical and horizontal characteristics. We choose to measure the vertical amplitude and horizontal eye openings, as shown in Fig. 12.
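A theoretical BER curve of the kind described above can be evaluated numerically as in the following sketch. It assumes that the tail function is the standard Gaussian Q-function, Q(x) = 0.5·erfc(x/√2), whose normalization may differ from the expression used in the text, and that the SNR is given in dB as a power ratio; both assumptions and the function names are ours.

```python
import math

def q_function(x):
    """Standard Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ook_ber(snr_db):
    """Theoretical BER = Q(sqrt(SNR)) for an SNR given in dB (power ratio)."""
    snr_linear = 10.0 ** (snr_db / 10.0)
    return q_function(math.sqrt(snr_linear))

# BER over a hypothetical range of measured SNR values.
ber_curve = {snr_db: ook_ber(snr_db) for snr_db in range(0, 21, 5)}
```

Evaluating this function on the SNR measured at each distance would give a theoretical BER-versus-distance curve in the style of Fig. 11(b).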

E. Preliminary 3D Imaging Results
This experiment aims to demonstrate that the concept of passive depth recovery works in practice. To this end, we used only four uniformly distributed samples for depth reconstruction. The results are shown in Fig. 13. Fig. 13(a) shows the so-called Boehler star [66] used as a target, which is a 3D representation of the Siemens star used to determine the spatial resolution of depth sensors. The VLC module is used to acquire raw images with custom delays, and the exposure time is maintained at 5 ms. The depth can be obtained via (22) using the lens normals. In the experiments, we recorded the measurements for the reference function, $\vec{Y}_{GT}$, in the range provided by our PSD. One can cover the complete range of the main lobe of the cross correlation function. A plane is used to acquire the reference function data. Then, MF is used to retrieve the depth from the measurements by leveraging the recorded reference function. The 3D reconstruction in Fig. 13 is the first one ever obtained from a ToF camera without a dedicated illumination system and validates the proposed passive ToF imaging concept.
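The matched filtering step can be sketched as a search over candidate delays for the shifted reference that best matches the measurements of one pixel. In the sketch below, a Gaussian stands in for the recorded reference function, and the normalized-correlation score, the candidate grid, and all names are assumptions for illustration rather than the implementation used in the experiments.

```python
import numpy as np

def gaussian_reference(shift_ns, sigma_ns=10.0):
    """Stand-in for the recorded reference function (assumed Gaussian)."""
    return np.exp(-0.5 * (np.asarray(shift_ns, dtype=float) / sigma_ns) ** 2)

def matched_filter_delay(measurements, shifts_ns, reference, candidates_ns):
    """Return the candidate delay whose shifted reference best matches
    the measurements in the normalized-correlation sense."""
    y = np.asarray(measurements, dtype=float)
    shifts_ns = np.asarray(shifts_ns, dtype=float)
    best_delay, best_score = None, -np.inf
    for tau in candidates_ns:
        template = reference(shifts_ns - tau)   # expected samples for delay tau
        norm = np.linalg.norm(template)
        if norm == 0.0:
            continue
        score = float(y @ template) / norm
        if score > best_score:
            best_delay, best_score = float(tau), score
    return best_delay

# Four uniformly distributed time shifts (ns) and a noise-free measurement.
shifts = np.array([0.0, 15.0, 30.0, 45.0])
true_delay = 12.3
y = gaussian_reference(shifts - true_delay)
candidates = np.linspace(0.0, 45.0, 451)        # 0.1-ns candidate grid
estimate = matched_filter_delay(y, shifts, gaussian_reference, candidates)
```

The estimated delay would then be converted to a total distance and passed through the bistatic correction in (22) to obtain the per-pixel depth.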

VII. CONCLUSION AND FUTURE OUTLOOK
In this work, we proposed a novel VLC-enabled passive imaging pipeline that allows depth estimation up to machine precision in simulations. This study opens up new possibilities for simultaneous communication and 3D sensing systems by exploiting different sampling schemes in the time-shift domain. The validity of the proposed method has been demonstrated using matched filtering, resulting in a depth reconstruction accuracy of 95% at suitable SNR levels for both modules. The key advantage of our work is a drop-in replacement of the classical ToF illumination units by existing, uncontrolled VLC sources. This method drastically reduces the power consumption of ToF cameras and eliminates from the measurements the temperature drift effects that are generated by the light source [67]. We have also shown that this method performs well with noisy measurement data in simulations. Low-complexity parametric models, including Gaussian and sum of sines, have been proposed to characterize the cross correlation functions of both modules. Extremely low fitting errors confirmed the validity of the proposed models. Hardware experiments validate our proposed methodology by attaining a worst case depth error of 20 mm at a 25-cm target distance. In addition, we acknowledge that the depth accuracy depends on the VLC source bandwidth. The communication signals are not optimized for sensing purposes. The design of modulation waveforms that are jointly optimized for both purposes is a subject of future work. On the theoretical front, we have provided a complete formulation of the direct sensing model and the method we use for solving the inverse problem. Our method raises interesting considerations regarding the hardware. We intend to explore our hardware implementation further in the future.

APPENDIX I
The bistatic depth is given by

$$d_{RT} = \frac{d^2 - d_{ER}^2}{2\left(d - C_x u_{Rx} - C_y u_{Ry} - C_z u_{Rz}\right)} \tag{22}$$

Recall that (22) is derived from (20) and (21) since the emitter and the RX locations are known.
In this regard, (21) is further rearranged and expressed in terms of the Euclidean distance between the emitter and the target. By omitting $d_o$, this can be expressed as follows:

$$d_{ET} = d - d_{RT} = \sqrt{(E_x - T_x)^2 + (E_y - T_y)^2 + (E_z - T_z)^2} \tag{29}$$

where $\mathbf{E} = (E_x, E_y, E_z)$ and $\mathbf{T} = (T_x, T_y, T_z)$ are the 3D locations of the emitter and the target, respectively. Equation (20) is broken down into its 3D coordinate components as

$$T_x = R_x + u_{Rx}\, d_{RT}, \qquad T_y = R_y + u_{Ry}\, d_{RT}, \qquad T_z = R_z + u_{Rz}\, d_{RT} \tag{30}$$

where $\mathbf{R} = (R_x, R_y, R_z)$ are the 3D coordinates of the RX and $\mathbf{u}_R = (u_{Rx}, u_{Ry}, u_{Rz})$ are the components of the observation vector. We substitute (30) in (29) and, after some rearrangement, we get

$$d^2 - 2 d\, d_{RT} + d_{RT}^2 = C_x^2 + C_y^2 + C_z^2 + d_{RT}^2\left(u_{Rx}^2 + u_{Ry}^2 + u_{Rz}^2\right) - 2 d_{RT}\left(C_x u_{Rx} + C_y u_{Ry} + C_z u_{Rz}\right) \tag{31}$$

where $d_{ER}^2 = C_x^2 + C_y^2 + C_z^2$ is the squared Euclidean distance between the emitter and the RX, and $C_x = E_x - R_x$, $C_y = E_y - R_y$, and $C_z = E_z - R_z$ are the differences of the x-, y-, and z-coordinates of the emitter and the RX, respectively. At this point, note that the sum of the squared components of the unit observation vector is equal to unity ($u_{Rx}^2 + u_{Ry}^2 + u_{Rz}^2 = 1$). By substituting this in (31), the $d_{RT}^2$ terms cancel, and solving the remaining linear equation for $d_{RT}$ yields the bistatic depth recovery formulation given in (22).