Robust and Guided Bayesian Reconstruction of Single-Photon 3D Lidar Data: Application to Multispectral and Underwater Imaging

3D Lidar imaging can be a challenging modality when using multiple wavelengths, or when imaging in high noise environments (e.g., imaging through obscurants). This paper presents a hierarchical Bayesian algorithm for the robust reconstruction of multispectral single-photon Lidar data in such environments. The algorithm exploits multi-scale information to provide robust depth and reflectivity estimates together with their uncertainties to help with decision making. The proposed weight-based strategy allows the use of available guide information that can be obtained by using state-of-the-art learning based algorithms. The proposed Bayesian model and its estimation algorithm are validated on both synthetic and real images showing competitive results regarding the quality of the inferences and the computational complexity when compared to the state-of-the-art algorithms.


I. INTRODUCTION
Three-dimensional (3D) imaging has generated significant interest from the scientific community due to its increasing use in applications such as self-driving autonomous vehicles.Single-photon light detection and ranging (Lidar) is a technology for high resolution 3D imaging, where its high sensitivity and excellent surface-to-surface resolution can provide rich information on the depth profile and reflectivity of observed targets in challenging imaging scenarios.Single-photon LI-DAR operates by emitting picosecond duration laser pulses and collecting the reflected photons using a single-photon sensitive detector which measures the arrival time of each return photon using a time correlated single-photon counting (TCSPC) system [1].This results in a collection of X-Y pixels, where a timing histogram of photon counts with respect to their time of flight is constructed for each pixel.In the presence of a target with partially reflective or scattering surfaces, the histogram will contain a peak whose amplitude and location are related to the object reflectivity and distance from the sensor.This process can be repeated using different laser wavelengths to obtain a multispectral 3D image of the scene.
Several practical challenges currently limit the use of Lidar in real world conditions.This paper focuses on some of them and provides a principled statistical-based solution to improve performance.Such challenges include the photon sparse regime often observed for long-range imaging [2] or rapid imaging based on short acquisition times [3], [4].Lidar is also sensitive to the observation environment when imaging in bright conditions [5], and through obscurants or turbid media, such as underwater [4], [6], or through fog, rain [7].The latter causes photon scattering which results in the immersion of the useful signal within a high and possibly nonuniform background level [8]- [10].To obtain more detailed information about the observed target, one approach is to use multiple laser wavelengths which inevitably lead to larger data volumes which may necessitate the requirement for advanced algorithms to only select useful pixels [11], [12] or to account for shared data structures and correlations [13]- [16].
Several solutions have been proposed in the literature to tackle these challenges.We distinguish three broad families: statistical, learning-based and hybrid methods.The former builds on a statistical model and solves the resulting inference using stochastic simulation methods [6], [17]- [19], or optimization algorithms [10], [16], [20].These principled methods benefit from a good interpretability but are subject to the definition of good features to represent the data.The second family learns important features from training data with an available ground-truth, and then uses the learned features to process new measured data [3], [21].These approaches are dependent on the training data, and might require expensive network retraining if the imaging conditions change (e.g., different noise level).The third family uses a plug-andplay (PnP) approach [22] by combining methods of different families to improve performance [23], [24].Beside providing good results, these methods can lack interpretability (e.g., in terms of convergence) and increasing interest is now devoted to providing principled PnP formulations as in [25], [26].
This paper proposes a statistical-based algorithm for robust processing of multispectral 3D Lidar data acquired through obscurants.An approximate likelihood distribution is considered and a hierarchical Bayesian model is proposed to exploit the data Poisson statistics, the multi-scale information (known to improve noise and photon-sparsity robustness [10], [16], [23], [27]), and prior knowledge on the depth and reflectivity maps.This hierarchical model ensures the robustness of the proposed strategy to the mismatch between the simplified observation model and the actual one.The statistical model introduces latent variables that are connected to the parameters of interest using Markov random fields to account for spatial arXiv:2103.10122v1[eess.IV] 18 Mar 2021 correlations between pixels.This formulation allows for independent parameter updates, which allows for fast parallel implementations.Inspired by the PnP approaches, we propose a weight-based model which exploits the results of state-ofthe-art algorithms as a guide to improve performance.The parameter's posterior distribution is obtained by combining the likelihood and proposed prior distributions.This distribution provides parameter estimates together with their uncertainties which are essential for result analysis and decision making.More precisely, we used a coordinate descent algorithm [28]- [30] to approximate the maximum a-posteriori estimator of all parameters, leading to simple iterative updates based on analytical or well known operators (e.g., weighted median filter [31], [32]).The new algorithm is tested on simulated and real underwater data showing promising results in terms of robustness to noise, interpretability and computational cost when compared to state-of-the-art algorithms.
The paper is structured as follows.Section II introduces the observation model and formulates the considered approximated likelihood.The proposed hierarchical Bayesian model is presented in Section III, and the choice of the guidance weights is described in details in Section IV.Section V then introduces the estimation algorithm used to approximate the maximum a-posteriori estimate of the parameters.Section VI analyses the proposed algorithm's performance when considering synthetic data with known ground-truth.Results on real data are presented in Section VII. Conclusions and future work are finally reported in Section VIII.

II. PROBLEM FORMULATION
This section introduces the observation model for multispectral Lidar, followed by the likelihood approximation used in this paper.The last part presents the multi-scale information which is a key ingredient to restore the parameters of interest.

A. Observation model
In addition to object reflectivity, the TSCPC Lidar system measures the depth profiles by illuminating the scene and measuring the time-of-flight of the returned photons.These photons are then collected in a histogram of counts, denoted y n,t , and representing the received photon counts at pixel location n ∈ {1, • • • , N }, and time-of-flight (ToF) bin t ∈ {1, • • • , T }.In the case of multi-spectral imaging, the system illuminates the scene using K wavelengths leading to K histograms, denoted by y n,t,k , where k ∈ {1, • • • , K}.It is often assumed that the resulting histograms of counts follow a Poisson distribution P (.) as follows [10], [16], [17]: ) where s n,t,k represents the average photon counts in the nth pixel, tth time bin and kth wavelength.In presence of at-most one target per-pixel, the signal can be approximated as follows where f k represents the system impulse response (SIR) of the kth wavelength, which can be measured during system calibration, r n,k ≥ 0 represents the reflectivity of the observed object assumed different for different wavelengths, d n ≥ 0 represents the object distance assumed the same for all wavelengths (related to the object depth profile), and b n,t,k ≥ 0 represents the background which gathers all photon events that do not originate from reflections at the target surface, i.e., the dark counts of the detector and the environment background due to the ambient illumination or photon scattering when imaging through obscurants.When imaging through turbid media, the background will have a non-uniform shape with respect to the depth observation timing window [8], [33], hence the dependence of b on t.Our goal is to estimate the depth and reflectivity parameters when imaging in extreme conditions due to imaging though obscurants (high and nonuniform background) or sparse photon imaging (e.g., rapid or long-range imaging).

B. Approximated Poisson likelihood
Assuming independence between the observed pixels y n,t,k leads to the joint likelihood where d is an N × 1 vector gathering depth values, R, B are N × K matrices gathering reflectivity and background values, respectively.In absence of background counts and assuming that , ∀k for all realistic d n , the likelihood reduces to where ∝ stands for proportional to, G(x; ., .)denotes the gamma distribution with shape and scale parameters, s y n,k represents the histogram of target reflected (or signal) counts, Q s y n,k is a function of signal counts and represents the maximum likelihood (ML) estimate of the reflectivity at the kth wavelength obtained by summing the signal counts.Note that the depth maximum likelihood estimate is obtained using a simple log-matched filtering of the histogram with the SIR, as follows It is common to approximate the SIR at each wavelength with the Gaussian function , [34].In this case the likelihood in (4) becomes where N (x; µ.σ 2 k ) represents the Gaussian distribution with average µ and variance is given analytically when considering Gaussian approximation for the SIR.Considering these approximations, Eq. (7) indicates that the depth and reflectivity parameters are independent and that they appear within conventional Gaussian and gamma distributions, which is crucial for the design of the proposed Bayesian strategy.Indeed, the quality of the ML depth and reflectivity estimators is known to be poor in challenging scenarios, hence the need to account for known parameter properties to improve reconstruction.This can be done within the Bayesian framework adopted in this paper.

C. Multiscale information
A common approach to improve the performance of maximum likelihood estimation for Lidar data is to consider multiscale information, as already exploited in several state-ofthe-art 3D Lidar denoising algorithms [10], [16], [23], [27].
The key observation is that spatially downsampled histograms, which are still Poisson distributed, lead to depth and reflectivity estimates with lower noise at a price of a reduced spatial resolution.In this paper, we adopt a similar strategy by considering L downsampled version of the histogram of counts.For each wavelength k, spatially downsampled version of the histograms Y are first computed based on predefined L graphs of neighbours φ 1,••• ,L leading to Y k (for example, q (2) = 3 × 3 neighbours for φ (2) , a q (3) = 5 × 5 neighbours for φ (3) , • • • ).The latter can be efficiently computed using convolutions in the case of a regular grid but our algorithm can be equally applied to a non-uniform sampling grid of the pixels.Assuming independence between these histograms lead to L likelihood distributions as follows , = 1 is the original cube, and for example, = 2 corresponds to a 3 × 3 downsampling, = 3 to a 5 × 5 downsampling, etc.

III. HIERARCHICAL BAYESIAN MODEL
Estimating depth and reflectivity parameters in extreme conditions is an ill-posed problem which requires the use of prior information to alleviate its indeterminacy.A Bayesian strategy is considered to combine the approximate likelihood described above, with parameter prior distributions accounting for known parameter properties.The resulting posterior distribution will be exploited by deriving Bayesian point estimators and additional measures of uncertainty about the estimates.The following sub-sections introduce the proposed Bayesian model.

A. Prior distribution for depth
Our model assumes the observation of L depth maps d ( ) obtained from multi-scale downsampled histograms, and having different noise levels as highlighted by the Gaussian variances in (8).Object depth profiles exhibit homogeneous surfaces (i.e., spatial correlation) separated by a discontinuous jump between different surfaces.This requires enforcing spatial correlation between the pixels of a surface, while preserving edges of isolated objects or between separated surfaces.To mitigate this information, we introduce an N × 1 latent variable x that is connected to all multi-scale depth maps, to provide a robust reconstruction of the true depth map by considering correlations between pixels.To preserve edges separating different surfaces, we propose the following mixture of Laplace conditional prior distributions for x as follows where L(x; µ. ) represents the Laplace distribution with average µ and variance or diversity parameter , ν n represents the spatial neighbourhood of the nth pixel, d ( ) n denotes the mean, n > 0 is the variance of x n and w ( ) n ,n ≥ 0 are constant weights to be defined.Note that ( 9) preserves edges as it considers the sparsity promoting 1 -norm of the differences between x and d.The weights w ( ) n ,n ≥ 0 are essential as they allow guiding the connections between x and D using any available side-information (e.g., obtained from other sensors in the case of multi-modal imaging, or by using state-of-the-art denoising algorithms in the case of plug-and-play approaches).It is also worth noting that prior ( 9) is connected to the Bayesian lasso model [35], [36].Indeed, ( 9) could be obtained by marginalizing the exponentially-distributed variance hyperparameter of a Gaussian mixture prior.Finally, (9) does not enforce positivity on the depth parameter x, however, this will be ensured as indicated in Section V-B.

B. Prior distribution for reflectivity
In a similar fashion to depth, spatial smoothness can be enforced on the reflectivity by considering latent variables as in the gamma Markov random field prior [37].However, this prior will lead to underestimated reflectivity values as already highlighted in [19].In this work, we introduce an N ×K latent variable M assigned a Gaussian prior distribution as follows where v ( ) n ,n;k ≥ 0 are constant weights to be defined, and ψ 2 n,k represents the variance of the latent variable and contains reflectivity uncertainty information for the kth wavelength.The variable m k contains multi-scale reflectivity information through r ( ) n,k and will serve as the reflectivity estimate for the kth wavelength.Although this is not a conjugate prior, it will lead to non-negative analytical estimates for M , R as indicated in Section V.

C. Priors of the variance hyperparameters
The variance parameters n , ∀n (resp.ψ 2 n,k , ∀n, k) should be positive.Assuming prior independence between the parameters n , ∀n (resp.ψ 2 n,k , ∀n, k) and accounting for their positivity, we assign a conjugate inverse gamma distribution for these parameters as follows where α r , β r , α d , β d are positive user fixed hyperparameters.
In absence of additional knowledge, these hyperparameters are fixed to obtain a non-informative prior.

D. Posterior distribution
The joint posterior distribution of this Bayesian model can be computed from the following hierarchical structure (after dropping indices for clarity) This distribution contains complete information regarding the parameters of interest x, D, R, M and their uncertainties , ψ.A common approach is to extract Bayesian point estimators such as the maximum a-posteriori (MAP) estimator or the minimum mean square estimator (MMSE).In this paper, we consider the MAP estimator of all parameters.It should be noted that the depth related parameters D, x, and the reflectivity ones R, M , ψ are independent allowing parallel optimization with respect to both set of parameters.Finally, Fig. 1 presents a directed acyclic graph (DAG) which summarizes the main parameters of the proposed hierarchical Bayesian model.

SELECTION
The choice of the weights is very important and will have a direct impact on the algorithm performance.Several strategies have been considered in the literature where the choice can be based on the spatial distance between points, similarity of their values, etc [38]- [40].In this paper, we assume the presence of guiding information (e.g., by using other algorithms, or sensors) and define these weights while considering multiscale and multi-wavelength information.

A. Depth weights W
Assuming the presence of an outlier free multi-scale guiding depth d ( ) , = 1, • • • , L, our selection of the multi-scale weights W encourages the depth map at a given scale d ( ) to be close to d ( ) (with a graph of neighbours φ 0 , for example 3 × 3 neighbours in a uniform grid).More precisely, we assign low weights for pixels that differ significantly from their corresponding pixels in d ( ) as follows for ∈ {1, • • • , L}, where w norm is a normalization constant ensuring ,n w ( ) n,n = 1, the coefficient ζ is easily fixed based on physical considerations related to the impulse response width and it is weighted by q ( ) to account for the multi-resolution effect.In (14), the product over promotes lower scales data if their weights are high.
We are now left with the task of finding a reliable multiscale depth guide which is robust to outliers.This information can be obtained by considering other sensing modalities such as Radar, Sonar, when available.It can also be obtained by applying an off-the-shelf depth reconstruction algorithm to the Lidar data (e.g., [10]).The latter strategy is adopted in this paper.We consider two methods, the first, denoted GD1 for guide depth 1, is inspired by [41] which adopted the rank order mean approach to unmix signal from background counts.Here, we first detects background corrupted pixels (those without q (2) neighbours having close depth values) and then replace them with the median of surrounding valid points.The second strategy, denoted GD2, represents d ML( =1:L) as a point cloud and applies an outlier rejection algorithm to remove corrupted values (i.e., using pcdnoise in Matlab [42]).We note finally that the weights could be updated with iterations leading to a pseudo-Bayesian approach [43], but this is out of the scope of this paper and will be left for future work.

B. Reflectivity weights V
The reflectivity weights are obtained from the multi-scale images r ( ) k , ∀k, , but we note that they can also be learned using additional reflectivity maps acquired by other sensors when available.Assuming the presence of r ( ) k , ∀k, reflectivity guides and depth weights W , we consider a multiscale bilateral filtering approach [24], [39], [44] and define the reflectivity weights as follows where v norm is a normalization constant ensuring n,n ;k = 1, and η n,k is a constant.As indicated in (15), correlation between depth and reflectivity images is introduced through the use of W to define V .The reflectivity variables r ( ) k , ∀k, , follow a gamma distribution and hence show data dependent noise levels.To account for this effect, we assume a signal dependent variance η n,k , which is fixed as follows Several reflectivity restoration algorithms can be used to obtain the guides r ( ) k , ∀k, , based on the considered imaging scenarios.Algorithms based on Poisson statistics can be used in the sparse photon regime [20], [45], [46], while other stateof-the-art denoising algorithms [47], [48] can be considered in dense photon regimes.In this paper, we consider three guidance methods.The first guidance intensity (denoted GI1) considers r , ∀k, which leads to a multi-scale generalization of the bilateral filter.Indeed, these multi-scale maps already contain filtering properties which will provide good performance in practice.The second guidance (GI2) considers the Poisson based reconstruction method [45] (used with authors defaults parameters) which is applied to each scale and wavelength of r

ML( ) k
, ∀k, to obtain r ( ) k , ∀k, .As a third guidance (GI3), we considered the state-of-theart learning based DnCNN denoiser [48], also applied to each scale and wavelength r ML( ) k , ∀k, .We finally note that reflectivity multi-spectral correlations are introduced through the depth weights, which are shared between all wavelengths.Additional correlations can be easily included through the weights V when building the reflectivity guides.

V. ESTIMATION ALGORITHM
We propose to use the MAP estimators for all parameters and hyperparameters x, D, R, M , , ψ.More precisely, the maximum of the posterior distribution in ( 13) is approximated using a coordinate descent algorithm [28], [29].This algorithm sequentially maximizes the conditional distributions associated with each parameter until convergence to a local minimum of the negative log-posterior.The algorithm's main steps are presented in Algo. 1 and described with more details in the following sections.Note that the resulting depth updates alternates between robust to outliers non-linear parameter estimation (line 11) and a filtering step (line 12), which are commonly observed steps in several state-of-the-art algorithms [22], [24], [25] and optimization algorithms [49].Note also that reflectivity and depth iterates are independent and can be run in parallel.Note finally that reflectivity updates are analytically obtained ensuring fast estimation., ∀k as in ( 6), (5) 7: Compute guiding weights W , V as in ( 14), (15) 8: Coordinate descent algorithm 9: while conv= 0 do 10: Update x n , ∀n using WMF in (17) 11: Update d ( ) , ∀ using threshold operator in (18) 12: Update using analytical mode in (20) 13: Update m k , ∀k using analytical mode in (21) 14: Update r ( ) k , ∀ using analytical mode in (23) 15: Update ψ using analytical mode in (25) 16: Set conv= 1 if the convergence criteria are satisfied 17: end while 18: Output: 19: x, M , , ψ

A. Updating x
The parameters of x are independent allowing parallel updating of x n , ∀n.Minimizing the negative-log of the conditional distribution reduces to This is a weighted median filter (WMF) which has several efficient implementations (e,g, [31]).Note that the solution of (17) will be non-negative provided that d ( ) n ≥ 0, which is ensured during initialization.

B. Updating D
The variables d N are independent and spatial correlation is introduced through the latent variable x.This is interesting as it allows the parallel implementation of d ( ) n , ∀n, with respect to n and .Straightforward computations show that the update d n is given by This is a generalization of the well known soft-threshold operator which can be efficiently solved.Note that the solution of ( 18) will be non-negative provided that x n ≥ 0 and d

ML( ) n
≥ 0 which is ensured during initialization.

C. Updating depth variance:
The conditional distribution of n is an inverse-gamma distribution given by whose mode is given by where N is the number of spatial neighbours.

D. Updating M
The conditional distribution of M is a normal distribution whose mean is analytically given by mn,k = ,n ∈νn v This equation highlights a weighted sum of the multi-scale reflectivity maps r, as for the bilateral filter.

E. Updating R
The parameters of R are independent allowing parallel updating of r where . The minimum is analytically provided by [50] r( )

F. Updating ψ
The conditional distribution of the reflectivity variance ψ n,k is an inverse-gamma distribution given by . The mode is analytically given by G. Background estimation Our algorithm assumes known signal counts, which can be obtained after removing background counts from observed histograms.In the presence of obscurants, the background can be non-uniform b n,t,k , i.e., in addition to pixels and wavelengths it also depends on time bins related to the depth dimension.Assuming a spatially homogeneous distribution of the obscurant, the background level can be assumed smooth.This means that after downsampling, y (L) n,t,k can be represented by the sum of a smooth function bn,t,k and a sparse signal due to target reflections.Unmixing these two signals is a common signal processing problem and can be solved using several tools, e.g., Robust PCA [51].In this paper, we only require an approximative estimate of bn,t,k and are more interested in efficient solutions.More precisely, we assume the background has the same temporal shape for all pixels and estimate this shape as follows bt,k = median y (L) n ,t,k (26) where n represent the indices of the lowest 10% values of y (L) n,t,k to only consider background and reject signal returns.Akin to [52], the noise level of each pixel is estimated using the median as follows The smooth background is then obtained by with bk = t bt,k /T .Knowing the background level, the approximate signal counts can be extracted as follows

H. Stopping criteria
Two criteria are considered to stop the iterative coordinate decent algorithm for depth and reflectivity.The first is maximum number of iterations.The second evaluates the estimated parameter values and stops the algorithm if the following condition is satisfied where i denotes the algorithm iterations and ξ = 0.001 is a threshold.
VI. RESULTS ON SIMULATED DATA This section evaluates the proposed algorithm on simulated data.The section first introduces comparisons algorithms and evaluation criteria.Then we analyse the robustness of the proposed algorithm with respect to sparse and highbackground regimes and compare it on a single-wavelength 3D Lidar data.Finally, we generalize the analysis to multiple wavelengths scenarios.All simulations have been performed on a Matlab R2020a on a computer with Intel(R) Core(TM) i7-4790 CPU@3.60GHz and 32GB RAM.

A. Comparison algorithms and evaluation criteria
To highlight the robustness and benefit of the proposed algorithm, it is compared to several state-of-the-art algorithms including: • The unmixing algorithm (UA) [10]: considers multiscale information for robust reconstruction of depth and reflectivity images.It assumes the presence of one surface on all pixels, and is used when analysing robustness to noise and photon-sparse regime imaging.• The RT3D algorithm [24]: assumes the presence of multiple surfaces per-pixel and is used when analysing robustness to noise and photon-sparse regime imaging.• The MUSAPOP algorithm [19]: assumes the presence of multiple surfaces per-pixel and is used when analysing multi-spectral Lidar data.• The Class.algorithm: estimates d ML , r ML k , ∀k as in ( 6), (5) from the observed histograms (without removing background) • The Xcorr.algorithm: estimates background level as in (28), then estimates d ML , r ML k , ∀k as in ( 6), ( 5), respectively (see lines 5-6 in Algo.1).Comparison results will be analysed qualitatively (by showing reconstruction scenes) and quantitatively using several criteria.The depth performance is measured based on the depth absolute error (DAE) measure DAE= 1 where N represents the number of pixels having a target, and d ref and d est are the reference and estimated depth maps with a target, respectively.Similarly, intensity is evaluated using the intensity normalized absolute error IAE= . In addition, we consider the metrics used in [19] to evaluate point clouds.More precisely, we consider the percentage of true detections as a function of the distance τ , where a true detection occurs if an estimated point of a given nth pixel has a reference point in its surrounding such that | dn − d ref n | ≤ τ .The sum of the estimated points that can not be assigned to any true point at a distance of τ are considered as false detections.Average normalized IAE is considered for intensity, where pixels with no or false detections are assumed to introduce an error of

B. Robustness to sparsity or background counts
This section evaluates the algorithm performance under different cases, including the photon sparse regime (low average photon-per-pixels) and low signal-to-background level (SBR), where average SBR= N n=1 rn N n=1 T bn .Simulations are performed on the realistic Art scene extracted from the Middlebury dataset 1 , as it is a cluttered scene used to evaluate many algorithms [10], [53] (see Fig. 2 (a)).An intensity image is first constructed using the luminance of the RGB image.The 283 × 183 depth and intensity images are then used to generate a 20ps time bin histograms of counts as in (1), while considering a real system impulse response (leading-edge of 3 bins and trailing-edge of 26 bins).The resulting cube of histograms is of size 283 × 183 pixels and T = 300 time bins.To investigate several scenarios, we generate multiple histogram cubes by varying the SBR level logarithmically in [0.01, 100] and the average photons-per-pixel (PPP) in [0.1, 1000] (this PPP combines signal and background counts, useful signal counts can be deduced from the PPP and SBR level).In addition, we consider two background shapes, a conventional uniform shape (i.e., b n,t,k = b n,k ) where the background level is the same for all time bins, and a gamma shaped background (i.e., b n,t,k = G(α, β) where G denotes a gamma distribution with parameters α = 2 and β = 30) 1 Available in: http://vision.middlebury.edu/stereo/data/often encountered when imaging through obscurants (such as underwater or through fog [8]).The proposed algorithm is considered with the following parameters L = 3 with q (2) = 3 × 3 and q (3) = 9 × 9, ζ = 2.7cm (i.e., 9 time bins), while considering the first depth and intensity guides (GD1 and GI1).It is compared with the Classical and Xcorr algorithms (matched filter before and after removing nonuniform background), and the UA algorithm whose depth and intensity regularization parameters were tuned to provide best DAE performance.Fig. 3 shows the log scale DAE performance of the considered algorithms when considering uniform (left column) and gamma shaped backgrounds (right column).All algorithms show good results for high SBR and PPP and the performance degrades when decreasing SBR or PPP or when considering a non-uniform background.The proposed algorithm is more robust as it shows the lowest DAE even for extreme cases (DAE≈ 0.01 for SBR=1 and PPP=1 photons).The UA algorithm presents second best results, and shows robust results.However, performance is slightly reduced for high SBR and PPP levels due to considering a Gaussian IRF instead of the asymmetric one used to simulate the data.The Xcorr algorithm is more robust than Class.which highlights the importance of the background removal step.Fig. 4 shows similar behaviours when considering the recovered intensity images, i.e., best robustness by the proposed algorithm followed by the UA algorithm.While all algorithms perform well for high SBR and PPP levels, it is worth noting that UA presented best IAEs in this case although its results tend to be over-smoothed (see Fig. 5).
In addition to depth and intensity maps, the proposed algorithm also provides their corresponding uncertainty maps (variance of the estimates), which help with decision making.Fig. 5 shows the depth and reflectivity maps together with their uncertainty maps for SBR=1 and PPP=10 photons for uniform background (i.e., 5 signal photons on average).It is observed that the proposed algorithm provides sharp depth maps due to to the use of 1 based sparsity inducing prior.Note that higher uncertainty is observed on low reflectivity areas, near object edges and on corrupted regions due to high-background levels.
An advantage of the proposed algorithm is that it can benefit from state-of-the-art algorithms and use their results as a guide.We investigate here the performance of the proposed algorithm when considering two depth guides (GD1 and GD2) and three intensity guides (GI1, GI2, GI3).We repeat the same experiment as above while fixing SBR=1 and varying PPP.Fig. 6 shows the DAE and IAE performance of the four variants, indicating an overall similar performance with a slight advantage for GD1 when compared to GD2.GI3 provides similar results as GI1, GI2 and is not represented for clarity.In what follows, we consider the GD1 and GI1 guides for all experiments.
Table I finally reports the computational time of the algorithms considered indicating fast performance when compared to the UA algorithm.Note that most computations of the proposed algorithm are wasted when building downsampled cubes necessary for background estimation, which are indicated by Xcorr times.It should be also noted that most operations of  the proposed algorithm are pixel-wise or bin-wise independent, and a much reduced computational time is expected by using parallel computing tools.

C. Evaluation on multispectral 3D Lidar data
This section analyses the performance of the proposed algorithm for multi-spectral Lidar imaging.We consider 3 wavelengths of the Art scene to generate three histograms of counts with the same realistic IRF as in Section VI-B while varying SBR and PPP levels.The proposed algorithm is considered with the same hyperparameters (as in Section VI-B), and is compared with the MUSAPOP algorithm (used  with the authors' parameters).The latter is designed to process multi-spectral data and delivers point clouds hence the use of probability of detection, and number of false detections to evaluate performance.Fig. 7 represents these two criteria for four algorithms when considering a uniform (cross marker) and gamma shaped backgrounds (circle marker) for two SBR levels and several PPPs at τ = 10 bins.The proposed algorithm presents best performance highlighting its robustness due to the efficient use of the multi-scale information.Fig. 2 shows an example of the obtained point clouds with different algorithms for uniform background.MUSAPOP is better than  Xcorr, but fails to process the noisy case where only one signal photon is present against 9 background counts on average (SBR=0.1,PPP=10 photons).The proposed algorithm presents best performance and is robust to noise.In addition, it does not join disconnected surfaces (due to the use of sparsity inducing Laplace prior for the depth) and shows sharp intensity values (due to the weighted and depth guided reflectivity reconstruction).Table I finally highlights the fast computational time of the proposed algorithm when compared to the MUSAPOP algorithm.

VII. RESULTS ON REAL 3D UNDERWATER DATA
The proposed algorithm is validated on real underwater Lidar data of a moving target (painted metal flange 13 mm thick, diameter of 70 mm, with 7 mm diameter holes) put at a stand-off distance of 1.7m from the end of the water tank nearest the sensor (see the target in Fig. 8 (a)).The data were acquired in lab settings using a CMOS Si-SPAD detector array based system acquiring binary frames at a rate of 500fps, with 1ms acquisition time per frame, 700 time bins  and 34ps per bin [4].The 128×192 pixels binary frames were pre-processed by building histograms of counts every 10ms (max of 10 counts per histogram).Different concentrations of a commercially available antacid medicine, called Maalox, were mixed with water to obtain varying scattering levels of the imaging environment.With high Maalox concentrations, the turbid water is highly scattering leading to a non-uniform background as shown in Fig. 9.
The proposed algorithm is considered using the hyperparameters of Section VI-B and the guides GD1 and GI1.
Results are compared with the Xcorr algorithm and the RT3D algorithm.The UA algorithm is not considered as it assumes the presence of a target in all pixels which is not satisfied in this case.Fig. 8 shows the 3D point clouds obtained with the different algorithms for clear water (b-c-d) and turbid water (e-f-g) 2 .All algorithms performed well in clear water.However, both RT3D and Xcorr performed poorly in turbid water due to non-uniform background affecting the data, and leading to the detection of a fake object in front of the true target.The proposed algorithm successfully eliminates the background counts and retrieve a good reconstruction of the target even under these extreme imaging conditions.In addition, the proposed algorithm also provides uncertainty maps for the estimated parameters, as indicated in Fig. 10.These maps show higher uncertainties when imaging through scattering water, and near object edges.More results when considering other frames at AL=1.2 are provided in video 1, video 2, and at AL=4.8 in video 3, video 4. The supplementary document [54] shows more results of the proposed algorithm when processing a real multi-spectral Lidar scene.

VIII. CONCLUSIONS
This paper presented a new robust Bayesian algorithm for the reconstruction of multispectral single-photon Lidar data.The algorithm exploited multi-scale information to improve depth and reflectivity estimates under extreme conditions due to low light level illumination or imaging through turbid media.The approach also provided uncertainty measures regarding the estimates which is crucial for decision making.Results on both simulated and real data highlighted the benefit of the proposed strategy when compared to state-of-the-art algorithms.Future work will generalize the proposed strategy to process multiple detections per-pixel as observed in object's edges or when imaging through semi-transparent surfaces.Current implementation was done in Matlab, and a computational improvement is expected by using parallel computing tools (such as GPUs) which is being investigated.

Fig. 1 .
Fig. 1.DAG for the observations, parameter and hyperparameters of the proposed model.The guiding weights appear in circles, the observations in a dashed box and the rectangular box gathers the joint multi-scale or multiwavelengths parameters.

Fig. 3 .
Fig. 3. Depth absolute errors (in log scale) obtained for the art scene with different algorithms w.r.t.SBR and PPP levels.(top-row) Class.algorithm, (second row) xcorr.algorithm, (third row) UA algorithm, (fourth row) proposed algorithm.(Left-column) data with uniform background, (right-column) data with gamma background.The lower DAE the better.

Fig. 4 .
Fig. 4. Normalized intensity absolute errors (in log scale) obtained for the art scene with different algorithms w.r.t.SBR and PPP levels.(top-row) Class.algorithm, (second row) xcorr.algorithm, (third row) UA algorithm, (fourth row) proposed algorithm.(Left-column) data with uniform background, (rightcolumn) data with gamma background.The lower IAE the better.

Fig. 5 .
Fig. 5.Estimated depth and reflectivity maps with the UA and proposed algorithms for SBR=1 and PPP=10 photons and uniform background.The proposed algorithm provides additional uncertainty maps.

Fig. 6 .
Fig. 6.Proposed algorithm using different depth and intensity guides on the art scene at SBR=1 and different ppp levels.(top) depth errors, (bottom) normalized intensity errors.GI3 provides similar results as GI1, GI2 and is not represented for clarity.

Fig. 7 .
Fig. 7. PD and false detections of the Class, Xcorr, Musapop and proposed algorithms for different SBR, PPP levels and background shapes, with an error distance of τ = 10bins.

Fig. 8 .
Fig. 8. (a) Target used for underwater imaging experiments.3D representations of underwater scenes with (b-e) Xcorr, (c-f) RT3D and (d-g) the proposed algorithm.(b-c-d) clear water with AL=1.2, (e-f-g) turbid water with AL=4.8 (these AL values are for transceiver to target, and not round-trip).

Fig. 9 .
Fig.9.Examples of the obtained histogram summed over all pixels for (top) clear water with laser power 1.2 mW, (bottom) turbid water with laser power 8 mW, when imaging the flange target located around the 500 time bin.The bottom curve highlights a non-uniform background (see the region 100 -400 bins).The shape of the peak is different in the two histograms because the target return is visible in different pixels during the two measurements, as the target moves across the field of view of the camera.

Fig. 10 .
Fig. 10.Estimated Depth values and uncertainty maps using the proposed method for (top) clear and (bottom) turbid water.

TABLE I AVERAGE
COMPUTATIONAL TIME (IN SECONDS) OF THE COMPARED ALGORITHMS ON 283 × 183 × 300 × K DATA CUBE GENERATED WITH A UNIFORM AND GAMMA BACKGROUND, WHERE K REPRESENTS THE