Variational Bayesian Learning for Decentralized Blind Deconvolution of Seismic Signals Over Sensor Networks

This work presents a variational Bayesian learning approach to decentralized blind deconvolution of seismic signals within a sensor network. Blind seismic deconvolution is cast into a probabilistic framework based on sparse Bayesian learning originally developed for blind image deconvolution. The posterior distribution of the signals of interest is then approximated using a variational Bayesian method. Depending on the particular form of the selected variational factors, the scheme is shown to generalize the state-of-the-art distributed seismic blind deconvolution algorithm. The algorithm operates by repeatedly alternating between two stages: (i) estimation of the seismic source wavelet and (ii) reflectivity estimation. The wavelet is estimated in a distributed fashion using the alternating direction method of multipliers. Based on this estimate, each sensor then locally obtains a sparse reflectivity estimate. Numerical evaluation with synthetic seismic data shows that the proposed method outperforms existing deconvolution algorithms in the high signal-to-noise ratio (SNR) region. In the low SNR regime, a higher sensitivity of sparse Bayesian learning to model mismatches in the estimated source wavelet is observed. Moreover, processing of seismic data generated with an acoustic wave equation shows that the proposed method recovers the original reflectivities more accurately, with a more distinct reflectivity support, than existing methods despite the present model mismatches. The algorithm is also applied to real seismic data, where it shows improved performance compared to a state-of-the-art estimation method.


A. MOTIVATION
Seismic surveys such as reflection or refraction methods use multiple seismic sensors or geophones to record the seismic waves reflected at underground layers. Based on the received waves, the subsurface of a given area can be reconstructed using signal processing and imaging techniques [1]. However, most of these algorithms assume a centralized operation, where all measurement data are available at one central entity. In this work we instead focus on a distributed approach to subsurface reconstruction realized over a network of "intelligent" geophones that are equipped with computing and communication capabilities, which allow for cooperative processing of the measured data. Such an architecture is particularly relevant for autonomous exploration of a planet's subsurface, such as that of Mars or the Moon [2], where autonomous operation is required and a central processing entity might not be available.
A common survey methodology in seismic exploration is a reflectivity survey. In such a survey a seismic source such as a sledgehammer or an air gun is activated. The generated waves are reflected at underground layers and then recorded at multiple geophones that are placed over an area of interest. This measurement process can be modeled by a single-input multiple-output system where a source signal is convolved with the impulse responses of multiple reflectivity paths. Each such reflectivity path describes the structure profile of a path a seismic wave travels from the source to the receiver. The goal is to recover the hidden impulse response of the reflectivity profile by deconvolving the source signal and the reflectivity profile from the geophone measurements. However, the seismic source (also called wavelet) is usually unknown, which poses challenges for the deconvolution. Hence, blind deconvolution algorithms are needed that recover both the source signal and the reflectivity profile.
In the past, significant research has been dedicated to blind deconvolution of signals for image processing (see e.g., [3]) as well as specifically for seismic signals (see e.g., [4]-[6] and references therein). The methods generally follow an iterative approach, where first a model of an excitation signal is estimated, followed by the inference of the signal of interest - the seismic reflectivity profile. In particular, sparse multichannel blind deconvolution (SMBD) methods have shown promising performance for the reflectivity estimation. The recent work of [5] uses an iterative algorithm that alternates between wavelet and reflectivity estimation. The wavelet estimation is conducted in the frequency domain, whereas the reflectivity estimation is done in the time domain to exploit sparsity in the reflectivity's impulse response. The sparsity is achieved through an ℓ1 penalty, which leads to a LASSO-type optimization [7]. Although optimization of LASSO-type functionals does lead to a sparse estimate, such methods were found to be inferior to sparse Bayesian learning (SBL) [8], which results in a sparser support [9] and provides uncertainty estimates of the parameters.
Bayesian learning has been applied to blind deconvolution problems in the past. However, most works do not consider seismic signals and focus on the problem of image deconvolution, as in [10], [11]; the transfer of results from image processing into the seismic domain is not straightforward and can pose significant challenges, despite formal similarities between the models. In [12], sparse multichannel deconvolution of seismic traces is conducted using block sparse Bayesian learning techniques and the expectation-maximization algorithm. However, this work assumes knowledge of the source wavelet and therefore is not a blind approach. Moreover, typical blind deconvolution methods operate in a centralized manner. This contribution, instead, builds upon and extends these ideas for distributed operation in a network of geophones.

B. MAIN CONTRIBUTIONS
The main contribution of this work consists of deriving a decentralized blind deconvolution algorithm within a variational Bayesian framework and analyzing its performance for the deconvolution of seismic signals. In particular, the estimation of the source signal -the wavelet -is conducted in a distributed fashion such that this algorithm is applicable to sensor networks where no central entity is handling the estimation procedure.
The proposed methodology follows the blind deconvolution approach of [5] and [10] and extends our previous work [13], where the wavelet estimation is implemented in a distributed fashion, followed by a local reflectivity estimation. In [13] the wavelet estimation is implemented using the alternating direction method of multipliers (ADMM) algorithm [14]. Based on the estimated wavelet, each sensor then solves an ℓ1-regularized problem locally to obtain a sparse estimate of the impulse response of its reflectivity. In this work, we cast the proposed blind iterative estimation scheme into a variational Bayesian framework, which, on the one hand, provides a formal basis for the algorithm in [13], while on the other hand, allows for the application of empirical Bayesian techniques to fix (or rather learn) the regularization parameters both for the wavelet and for the reflectivity estimation. Based on the variational Bayesian formulation, we derive a distributed wavelet estimation that enables each sensor in the network to obtain a wavelet estimate by exchanging data with neighboring sensors. For the reflectivity estimation, we employ sparse Bayesian learning locally at each sensor. SBL techniques also facilitate analytical inference of the model parameters and account for the posterior variance of the reflectivity estimate, which is difficult for ℓ1-type constraints due to the nonsmoothness of the posterior function. Note that non-variational approaches, such as EM-based estimation of SBL parameters with approximate message passing (AMP), as in e.g., [15], also provide analytical inference expressions for some of the parameters.¹ Yet due to the dimensionality of the electrical impedance tomography problem addressed in [15], not all inference expressions (in particular, the E-step of the EM algorithm) are computationally feasible, which requires the use of numerical approximations.
The numerical complexity of the considered problem remains, however, tractable; moreover, the computation of the E-step (or of the corresponding variational approximation factor) can be avoided as in e.g., fast marginal likelihood SBL realisations [17], [18]. These aspects of the algorithm implementation are, however, left outside the scope of the paper.

C. NOTATION
Column vectors are represented as boldface lowercase letters, e.g., x, and matrices as boldface uppercase letters, e.g., X. Their transpose is denoted by (·)^T. For some positive-definite matrix B, ||x||_B ≜ (x^T B x)^(1/2) denotes a weighted ℓ2 norm. The expectation operator is denoted by E{·}, or E_q{·} when the context requires explicitly stating the probability density function (pdf) q over which the expectation is taken. We denote the pdf of a Gaussian random vector with expectation a ≜ E{x} and covariance matrix B ≜ E{(x − a)(x − a)^T} as N(x|a, B). Also, we denote the pdf of a Gamma random variable x as Ga(x; a, b) = (b^a / Γ(a)) x^(a−1) e^(−bx), where a is a shape parameter, b is a rate parameter, and Γ(·) is the Gamma function. The notation δ_a(x) is used to denote a Dirac distribution over the domain of x with support a. We will also use the notation log x ∝ a to imply that x ∝ e^a.

II. SYSTEM MODEL
Consider a network of J interconnected sensors/geophones, each measuring a seismic trace at its corresponding position. We describe the network topology by a graph G = {J, E} with a set of nodes J = {1, 2, . . . , J} and a set of edges E = {(j, i) | j, i ∈ J, j ≠ i}. With N_j we denote the neighborhood of sensor j - the set of all sensors directly connected to sensor j, including sensor j itself. In addition, we assume that the network graph is undirected and strongly connected [19].

¹ Variational Bayesian methods generalize the EM algorithm for a specific choice of approximating factors [16].
It is common to model the acquisition of seismic traces via a convolution of a seismic source wavelet w and a set of impulse responses h_j, j ∈ J, that model the subsurface reflections between the source and the receiver j [1]. Note that the wavelet w is a common source signal for all sensors, while the responses h_j, j ∈ J, differ between sensors. The impulse responses h_j, j ∈ J, are typically termed reflectivities. They naturally characterize the reflections occurring as a seismic wave passes from the source to the receiver via subsurface layers, and are thus key to quantifying the subsurface under investigation.
Unfortunately, due to the non-ideal form of the seismic source, the reflectivity profile of the subsurface is blurred, which makes a reliable subsurface reconstruction difficult. Therefore, the goal of seismic deconvolution is to remove the influence of the seismic source from the seismic traces in order to infer "a clean" reflectivity. However, neither the source wavelet w nor the reflectivities h_j, j ∈ J, are typically known. Thus, blind approaches are required. In [5] it was suggested to implement a blind deconvolution approach as a sequence of two consecutive steps: wavelet estimation and reflectivity estimation. In [13] we cast this approach in a distributed setting, where the wavelet estimation was solved using the ADMM algorithm [14], followed by a "local" estimation of the reflectivities h_j, j ∈ J. Both steps are then iteratively repeated until a certain convergence criterion is met.
In the following we will show that the algorithm in [13] can be seen as a solution to a variational Bayesian optimization of a posterior distribution of wavelet and reflectivity signals. Moreover, the variational setting allows one to extend the method in [13] and estimate the regularization parameters within the empirical Bayes approach. This is detailed in the following.

A. BAYESIAN FORMULATION OF THE SPARSE BLIND DECONVOLUTION
Each sensor j acquires a noisy seismic trace d_j[l], which can be described by a noisy convolution between a length-L_W discrete-time source wavelet w[l] and a length-L_R reflectivity h_j[l], j ∈ J:

d_j[l] = Σ_{k=0}^{L_W−1} w[k] h_j[l − k] + n_j[l],   (1)

where l = 0, . . . , L_W + L_R − 2. By stacking the samples of d_j[l], w[l], h_j[l], and n_j[l] into vectors, (1) can be reformulated in several equivalent matrix-vector forms as

d_j = W h_j + n_j = H_j w + n_j = h_j ∗ w + n_j,   (2)

where H_j ∈ R^{(L_W+L_R−1)×L_W} is the convolution matrix of the reflectivity h_j at sensor j, W ∈ R^{(L_W+L_R−1)×L_R} is the convolution matrix consisting of the wavelet w, and n_j ∈ R^{L_W+L_R−1} is a vector of additive noise samples. Note that the products W h_j and H_j w are exactly equivalent. The third form, h_j ∗ w, with a slight abuse of vector notation, also indicates a convolution operation. Depending on the context, these forms will be used interchangeably to simplify notation. We will assume that the noise samples n_j, j ∈ J, observed by the seismic sensors are statistically independent across sensors. Furthermore, we assume that each n_j, j ∈ J, follows a normal distribution with zero mean and scaled covariance matrix τ_j^(−1) I, where τ_j, j ∈ J, is a noise precision parameter. To simplify further notation, let us define D ≜ {d_1, . . . , d_J}, T ≜ {τ_1, . . . , τ_J}, and H ≜ {h_1, . . . , h_J} as collections of variables associated with the J sensors. Now, we can begin formulating the probabilistic structure of the estimation model. Let us point out that the following modeling is similar to that proposed in [10], [11] for blind image deconvolution. In the following it is adapted to blind seismic deconvolution in a distributed setting.
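As a quick numerical illustration of these equivalent forms, the following sketch (with hypothetical helper names, not part of the algorithm itself) builds both convolution matrices and checks that W h_j, H_j w, and h_j ∗ w coincide:

```python
import numpy as np

def conv_matrix(x, n_cols):
    """Build the (len(x)+n_cols-1) x n_cols convolution (Toeplitz) matrix of x."""
    n_rows = len(x) + n_cols - 1
    M = np.zeros((n_rows, n_cols))
    for c in range(n_cols):
        M[c:c + len(x), c] = x   # shifted copy of x in each column
    return M

rng = np.random.default_rng(0)
L_W, L_R = 8, 32
w = rng.standard_normal(L_W)                         # source wavelet
h = np.zeros(L_R); h[[5, 17, 28]] = [1.0, -0.6, 0.4] # sparse reflectivity
tau = 1e4                                            # noise precision tau_j

W = conv_matrix(w, L_R)                              # acts on h_j
H = conv_matrix(h, L_W)                              # acts on w
n = rng.standard_normal(L_W + L_R - 1) / np.sqrt(tau)
d = W @ h + n                                        # noisy seismic trace d_j

# the two matrix-vector forms realize the same convolution
assert np.allclose(W @ h, H @ w)
```

The same helper can also be used later to form the convolution matrices of current wavelet and reflectivity estimates.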

1) Probabilistic formulation of the model
Due to the assumption made for the additive noise terms n_j, j ∈ J, we can write the likelihood function of the seismic traces given the wavelet w, reflectivities H, and noise precisions T as

p(D | w, H, T) = ∏_{j∈J} p(d_j | w, h_j, τ_j),   (3)

where p(d_j | w, h_j, τ_j) = N(d_j | H_j w, τ_j^(−1) I). Let us now assume that the wavelet w likewise follows a Gaussian distribution, with zero mean and a circular covariance matrix controlled by an unknown precision λ_w. Specifically, we assume that

p(w | λ_w) = N(w | 0, λ_w^(−1) I).   (4)

In general, we can also treat the precision parameter λ_w as an unknown random variable. As such, we also select a prior distribution p(λ_w) = Ga(λ_w; c, d) for it. The choice of a Gamma prior is motivated by the conjugacy [20] of the latter to p(w | λ_w), which will eventually simplify the inference of λ_w, as we will see later. Now, we define a prior distribution for the reflectivities h_j, j ∈ J. As seismic reflectivities are known to be sparse [21], [22] - the nonzero entries reflect the propagation velocity jumps in the subsurface structure - the statistical model for h_j, j ∈ J, should be selected to reflect this. To this end we model the prior p(h_j), j ∈ J, with a so-called scale mixture distribution [9]:

p(h_j) = ∫ p(h_j | α_j) p(α_j) dα_j,   (5)

where p(α_j) is called a mixing density and p(h_j | α_j) belongs to a power-exponential family of distributions [23]. Note that the parameters α_j, j ∈ J, are also treated as unknown random variables. Finally, we endow the probabilistic model with a prior distribution p(T) = ∏_{j∈J} p(τ_j) for the noise precision parameters T. Similarly to the parameter λ_w, we select p(τ_j) = Ga(τ_j; s_j, t_j), j ∈ J. Now, the probabilistic model underlying the estimation problem can be summarized with the following joint pdf

p(D, w, H, λ_w, T, A) = p(D | w, H, T) p(w | λ_w) p(λ_w) ∏_{j∈J} p(h_j | α_j) p(α_j) p(τ_j),   (6)

where we defined A ≜ {α_1, . . . , α_J} to simplify notation. The blind deconvolution under sparsity constraints for the reflectivities can then be posed as a maximization of the following posterior:

p(w, H, λ_w, T, A | D) = p(D, w, H, λ_w, T, A) / p(D).   (7)

In general the inference of the posterior in (7) cannot be done in closed form and numerical approaches are needed.
Here we make use of variational Bayesian inference to approximate the posterior of interest with a proxy distribution q(w, H, λ w , T , A) which makes analysis tractable.

2) Variational Bayesian inference
Variational Bayesian inference (VBI) techniques are a class of global approximative inference approaches. They attempt to approximate an intractable posterior p(w, H, λ_w, T, A | D) with a simpler pdf q(w, H, λ_w, T, A). The latter is in general a "free" pdf that is chosen to minimize the Kullback-Leibler divergence [16] KL(q(w, H, λ_w, T, A) || p(w, H, λ_w, T, A | D)). Although the latter cannot be computed, since the posterior of interest p(w, H, λ_w, T, A | D) is unknown, the divergence can be minimized indirectly, by maximizing the lower bound on the log-evidence [16]:

log p(D) ≥ E_q{log p(D, w, H, λ_w, T, A) − log q(w, H, λ_w, T, A)} ≜ L(q).   (8)

The approximating density q(w, H, λ_w, T, A) can be further constrained to reduce the complexity. Specifically, we will adopt a mean field approximation [16], which implies that

q(w, H, λ_w, T, A) = q(w) q(λ_w) ∏_{j∈J} q(h_j) q(τ_j) q(α_j).   (9)

In other words, we assume independence of all the factors in the approximating distribution. Furthermore, we will constrain the individual factors in (9) to parametric pdfs. This will allow us to cast the variational optimization into a parametric one. Specific choices of the individual factors will be discussed in the following.

B. DISTRIBUTED SPARSE BLIND DECONVOLUTION AS A VARIATIONAL INFERENCE PROBLEM
The maximization of the lower bound in (8) requires specifying (i) the form of the factors in (9) and (ii) the sparsifying prior p(h_j) in (5). Depending on the particular choice of the factors in (9), different inference schemes can be implemented. In particular, the distributed sparse blind deconvolution (D-SBD) algorithm proposed in [13] can be shown to be an instance of the proposed variational optimization of (8), as we show in the following.
Let us assume that p(h_j) in (5) is selected such that

p(h_{i,j} | α_{i,j}) = N(h_{i,j} | 0, α_{i,j}),  p(α_{i,j}) = (µ²/2) exp(−µ² α_{i,j}/2)   (10)

for some fixed parameter µ and i ∈ {1, . . . , L_R}, j ∈ J. Then a direct integration of (5) shows that p(h_j) is given by

p(h_j) = (µ/2)^{L_R} exp(−µ ||h_j||_1),

which is a multivariate Laplace distribution for the (independent) entries of h_j. By marginalizing the posterior (7) over A using this result, we obtain a variational lower bound in the form

L(q) = E_q{log p(D | w, H, T) + log p(w | λ_w) + log p(λ_w) + Σ_{j∈J} (log p(h_j) + log p(τ_j)) − log q(w, H, λ_w, T)}.   (11)

Note that this "direct" approach to the specification of the prior is referred to as Type-I sparse Bayesian signal reconstruction [9]. Now, an instance of the D-SBD algorithm from [13] is obtained by selecting all variational factors as Dirac measures defined over the corresponding domains:

q(w) = δ_w̃(w),  q(h_j) = δ_h̃_j(h_j),   (12a)
q(λ_w) = δ_λ̃_w(λ_w),  q(τ_j) = δ_τ̃_j(τ_j).   (12b)

In [13], however, the factors q(λ_w) and q(τ_j), j ∈ J, are assumed to be fixed, with the supports of these factors being free parameters of the D-SBD algorithm. With this choice of variational factors, it then becomes easy to evaluate the bound (11) and maximize it with respect to q(w) and q(h_j), j ∈ J. Specifically, it can be shown that with respect to q(w) the bound (11) can be expressed as

L(q) = −KL(q(w) || p̃(w)) + const,   (13)

where

log p̃(w) ∝ E_{q(H) q(λ_w) q(T)}{log p(D | w, H, T) + log p(w | λ_w)}.   (14)

As we see, the variational bound is maximized when the divergence KL(q(w) || p̃(w)) is minimized. Since q(w) is constrained to a space of Dirac measures, the minimum of KL(q(w) || p̃(w)) is obtained when the support w̃ of q(w) coincides with the location of the maximum of p̃(w), or equivalently, with the maximum of log p̃(w). Evaluating log p̃(w), we find w̃ as the solution to the following problem:

w̃ = argmin_w Σ_{j∈J} (τ̃_j/2) ||d_j − H̃_j w||²_2 + (λ̃_w/2) ||w||²_2,   (15)

where H̃_j is a reflectivity convolution matrix composed of the elements of h̃_j, the support of q(h_j).
Similarly, the bound in (11) with respect to q(h_j) can be expressed as

L(q) = −KL(q(h_j) || p̃(h_j)) + const,  with  log p̃(h_j) ∝ E_{q(w) q(T)}{log p(d_j | w, h_j, τ_j) + log p(h_j)}.

Computing the latter, we find that the support h̃_j of q(h_j), j ∈ J, that maximizes the bound is found from

h̃_j = argmin_{h_j} (τ̃_j/2) ||d_j − W̃ h_j||²_2 + µ ||h_j||_1,   (16)

where W̃ is the wavelet convolution matrix constructed from w̃. The estimation of the factors q(h_j), j ∈ J, and q(w) is then repeated iteratively until convergence. This leads to the maximization of the marginalized lower bound in (11).
Expressions (15) and (16) coincide with those obtained in [13]. A distributed solution to optimization (15) can be found using the ADMM algorithm, while (16) is a least absolute shrinkage and selection operator (LASSO) problem that can be solved individually by each agent, as was proposed in [13].
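For concreteness, a minimal sketch of how each agent could solve the LASSO subproblem locally; ISTA is used here as a generic proximal-gradient solver for illustration, and is an assumption of this sketch rather than necessarily the solver employed in [13]:

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(W, d, mu, tau=1.0, n_iter=2000):
    """Minimize (tau/2)||d - W h||_2^2 + mu ||h||_1 by ISTA (proximal gradient)."""
    step = 1.0 / (tau * np.linalg.norm(W, 2) ** 2)  # inverse Lipschitz constant
    h = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = tau * W.T @ (W @ h - d)              # gradient of the quadratic term
        h = soft_threshold(h - step * grad, step * mu)
    return h
```

Each agent would call this with its local trace d_j and the current wavelet convolution matrix; smaller µ yields denser estimates, as discussed for the fixed regularization parameter of D-SBD.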
Constraining the variational factors in (12) to Dirac distributions does simplify the numerical evaluation of the variational bound, yet it overly restricts the variational approximation. In the following, we propose a more complete treatment of the inference problem with non-degenerate variational factors. Moreover, we include the estimation of q(α_j), q(τ_j), j ∈ J, and q(λ_w) in the algorithm.

III. BLIND SEISMIC DECONVOLUTION USING SPARSE BAYESIAN LEARNING
We will begin by specifying the model for the sparsity-inducing prior p(h_j), j ∈ J. Here we will follow a sparse Bayesian learning (SBL) approach, where we select

p(h_j | α_j) = ∏_{i=1}^{L_R} N(h_{i,j} | 0, α_{i,j}^(−1))   (17)

with α_j = [α_{1,j}, . . . , α_{L_R,j}]^T. Note that in contrast to (10), here the elements of α_j are precisions rather than variance parameters; within SBL they are referred to as sparsity parameters. As a mixing density p(α_j), j ∈ J, SBL selects a Gamma prior [8], [9]:

p(α_j) = ∏_{i=1}^{L_R} Ga(α_{i,j}; a_j, b_j).   (18)

Note that for each agent j ∈ J there is a single shape parameter a_j and a single rate parameter b_j that define the prior p(α_j). Such a formulation leads to p(h_j), j ∈ J, being Student's t-distributed [8], [9]. Although in general we could again follow a Type-I approach and marginalize the posterior over A, as we did before, this would lead to intractable variational inference. Instead, we follow a Type-II approach, where the parameters A are treated as unknown random variables and estimated alongside the other parameters; this is computationally simpler, as the marginalization is done implicitly. The same approach was also proposed for images in [10].
We will now constrain the individual factors in (9) to parametric pdfs to cast the variational optimization into a parametric one. To this end we select

q(h_j) = N(h_j | h̃_j, S_hj),   (19a)
q(τ_j) = Ga(τ_j; s̃_j, t̃_j),   (19b)
q(α_j) = ∏_{i=1}^{L_R} Ga(α_{i,j}; ã_j, b̃_{i,j}),   (19c)
q(λ_w) = Ga(λ_w; c̃, d̃),   (19d)
q(w) = δ_w̃(w),   (19e)

where δ_w̃(w) is a Dirac measure with support w̃ on the R^{L_W}-dimensional domain of wavelets. Note that in contrast to all other factors, q(w) is selected as a degenerate density, despite the fact that it is not difficult to show that the posterior for the wavelet is Gaussian and thus an optimal variational factor q(w) should be Gaussian as well. Yet since the computation of this factor requires decentralized processing, it would imply communication of both the mean and the covariance matrix of q(w), which in general grows as O(L_W²). By constraining this factor to a set of Dirac measures, we thus ignore the impact of wavelet estimation uncertainty on the inference of the other parameters and resort to maximum a posteriori (MAP) estimation of its value. This reduces the communication load on the network, as only the support w̃ needs to be communicated, leading to an O(L_W) communication load. Additionally, this simplifies the evaluation of the bound in (8).
Now, the inference of factors (9) is done in sequential fashion, one factor at a time, with each factor selected so as to maximize the bound in (8) until convergence.

A. ESTIMATION OF q(w)
We begin with estimating the wavelet factor q(w). Evaluating (14) with the variational factors (19a), (19b) and (19d) (see also Appendix A-A), we find that the divergence KL(q(w) || p̃(w)) in (13) is minimized when the support w̃ of q(w) is found as the solution to the optimization problem

w̃ = argmin_w Σ_{j∈J} (τ̃_j/2) (||d_j − H̃_j w||²_2 + w^T Ω_j w) + (λ̃_w/2) ||w||²_2.   (20)

Observe now that the matrix Ω_j (see (47) in Appendix A-A) appears due to the uncertainty in the reflectivity estimate q(h_j), specifically, due to its covariance. This uncertainty has an additional "smoothing" effect on the cost function in (20) in terms of a weighted ℓ2 norm of the wavelet estimate. Due to the quadratic form of (20), its solution is readily obtained as

w̃ = (Σ_{j∈J} τ̃_j (H̃_j^T H̃_j + Ω_j) + λ̃_w I)^(−1) Σ_{j∈J} τ̃_j H̃_j^T d_j,   (21)

from which the regularization effect of the reflectivity covariance S_hj becomes apparent.
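As a centralized sanity check of this quadratic update, the sketch below (illustrative naming; Ω_j is passed in as a given positive semidefinite matrix) solves the corresponding normal equations:

```python
import numpy as np

def wavelet_map(d_list, H_list, Omega_list, tau_list, lam_w):
    """Closed-form minimizer of
    sum_j (tau_j/2)(||d_j - H_j w||^2 + w^T Omega_j w) + (lam_w/2)||w||^2."""
    L_W = H_list[0].shape[1]
    A = lam_w * np.eye(L_W)
    b = np.zeros(L_W)
    for d, H, Om, tau in zip(d_list, H_list, Omega_list, tau_list):
        A += tau * (H.T @ H + Om)   # accumulate the regularized Gram matrices
        b += tau * (H.T @ d)        # accumulate the data terms
    return np.linalg.solve(A, b)
```

The distributed ADMM variant discussed next computes the same minimizer without gathering all H_j and d_j at one entity.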

1) Distributed estimation of q(w)
As we see, the solution (21) requires cooperation between J sensors. To find this solution in a distributed setting we propose to use the ADMM algorithm [14], similar to that used in [24].
Let us introduce a latent variable z_j ∈ R^{L_W} at each sensor j and reformulate optimization (20) as

min_{{w_j},{z_j}} Σ_{j∈J} (τ̃_j/2) (||d_j − H̃_j w_j||²_2 + w_j^T Ω_j w_j) + (λ̃_w/(2J)) ||w_j||²_2
s.t. w_j = z_i, ∀i ∈ N_j, j ∈ J,   (22)

where the wavelet regularizer is split evenly across the J agents. Note that in (22) we added the constraint w_j = z_i, ∀i ∈ N_j, which enforces a consensus solution for {w_j}_{j∈J} over the whole network. Hence, all estimates {w_j}_{j∈J} will converge to the same solution, i.e., the solution of (20). Applying the ADMM algorithm to problem (22), we obtain the following iterative update equations:

w_j^[m+1] = argmin_{w_j} (τ̃_j/2)(||d_j − H̃_j w_j||²_2 + w_j^T Ω_j w_j) + (λ̃_w/(2J))||w_j||²_2 + Σ_{i∈N_j} ((u_{ij}^[m])^T (w_j − z_i^[m]) + (ρ/2)||w_j − z_i^[m]||²_2),   (23)

z_j^[m+1] = argmin_{z_j} Σ_{i∈N_j} (−(u_{ji}^[m])^T z_j + (ρ/2)||w_i^[m+1] − z_j||²_2).   (24)

In the above, the variable u_{ij} is a Lagrange multiplier originating from the employed consensus constraint and ρ is the ADMM penalty parameter. Note that for the latent variable z_j, j ∈ J, in (24) we swapped the order of the indices i and j in the last term. This is valid since in the augmented Lagrangian cost function we have a double summation over all receivers j ∈ J; therefore, swapping these indices only changes the order of the summation, not the result. We do this in order to allow for a minimization with respect to z_j in closed form. Solving (23)-(24), we finally obtain

w_j^[m+1] = (τ̃_j (H̃_j^T H̃_j + Ω_j) + (λ̃_w/J) I + ρ|N_j| I)^(−1) (τ̃_j H̃_j^T d_j + Σ_{i∈N_j} (ρ z_i^[m] − u_{ij}^[m])),   (25)

z_j^[m+1] = (1/|N_j|) Σ_{i∈N_j} (w_i^[m+1] + u_{ji}^[m]/ρ),   (26)

followed by the dual update u_{ij}^[m+1] = u_{ij}^[m] + ρ(w_j^[m+1] − z_i^[m+1]). These equations enable a distributed estimation of the wavelet w at each sensor j in the network. To this end, each sensor needs to exchange its local variables with its neighboring sensors in N_j \ {j}. All other quantities are local information and therefore available to each sensor j.
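The consensus iteration can be prototyped on a toy problem. In the sketch below (hypothetical names; simple quadratic local costs (1/2)||A_j w − b_j||² stand in for each sensor's local term, and scaled dual variables are used), the neighborhoods include the node itself, as in the system model:

```python
import numpy as np

def decentralized_admm(A_list, b_list, neighbors, rho=1.0, n_iter=2000):
    """Consensus ADMM sketch: node j minimizes (1/2)||A_j w - b_j||^2 subject to
    w_j = z_i for all i in neighbors[j] (scaled dual form)."""
    J, n = len(A_list), A_list[0].shape[1]
    w = [np.zeros(n) for _ in range(J)]
    z = [np.zeros(n) for _ in range(J)]
    u = {(j, i): np.zeros(n) for j in range(J) for i in neighbors[j]}
    for _ in range(n_iter):
        for j in range(J):   # local primal update (closed-form quadratic solve)
            lhs = A_list[j].T @ A_list[j] + rho * len(neighbors[j]) * np.eye(n)
            rhs = A_list[j].T @ b_list[j] + rho * sum(z[i] - u[(j, i)] for i in neighbors[j])
            w[j] = np.linalg.solve(lhs, rhs)
        for i in range(J):   # latent-variable update: neighbor averaging
            z[i] = np.mean([w[j] + u[(j, i)] for j in neighbors[i]], axis=0)
        for j in range(J):   # dual update
            for i in neighbors[j]:
                u[(j, i)] += w[j] - z[i]
    return w
```

On a connected graph all local iterates converge to the minimizer of the summed cost, mirroring how the wavelet consensus is reached without a central entity.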

B. ESTIMATION OF q(λw)
With the factor q(λ_w) we proceed in a similar fashion and reformulate the bound (8) as

L(q) = −KL(q(λ_w) || p̃(λ_w)) + const,  with  log p̃(λ_w) ∝ E_{q(w)}{log p(w | λ_w)} + log p(λ_w).   (27)

Computing the expectation in (27), we find that

log p̃(λ_w) ∝ (L_W/2 + c − 1) log λ_w − (||w̃||²_2/2 + d) λ_w.   (28)

Now, q(λ_w) is found so as to minimize the Kullback-Leibler divergence KL(q(λ_w) || p̃(λ_w)).
Inspecting (28), we notice that its right-hand side coincides with the logarithm of a Gamma pdf. As such, by selecting the parameters c̃ and d̃ of q(λ_w) as

c̃ = c + L_W/2,  d̃ = d + ||w̃||²_2/2,   (29)

we ensure KL(q(λ_w) || p̃(λ_w)) = 0. From the properties of the Gamma distribution we can also compute the mean λ̃_w of q(λ_w) as

λ̃_w = c̃/d̃.   (30)

Note that once the support w̃ of q(w) becomes known to the sensors, (29) can be computed "locally", i.e., no cooperation is required for this computation.

C. ESTIMATION OF q(τj)
With q(τ_j) we proceed in a similar fashion and simplify (8) as

L(q) = −KL(q(τ_j) || p̃(τ_j)) + const,  with  log p̃(τ_j) ∝ E_{q(h_j)}{log p(d_j | w̃, h_j, τ_j)} + log p(τ_j).   (31)

Using the fact that the likelihood p(D | w̃, H, T) can be factored as p(D | w̃, H, T) = ∏_{j=1}^J p(d_j | w̃, h_j, τ_j), we can evaluate (31) using the same steps as for the factor q(λ_w) (see Appendix A-B). As a result,

log p̃(τ_j) ∝ ((L_W + L_R − 1)/2 + s_j − 1) log τ_j − (t_j + (1/2) E_{q(h_j)}{||d_j − W̃ h_j||²_2}) τ_j.   (32)

Expression (32) is similar in its structure to p̃(λ_w) in (28): it is the logarithm of a Gamma pdf. Therefore, by selecting the parameters s̃_j and t̃_j of q(τ_j) as

s̃_j = s_j + (L_W + L_R − 1)/2,  t̃_j = t_j + (1/2)(||d_j − W̃ h̃_j||²_2 + tr(W̃ S_hj W̃^T)),   (33)

we will ensure that KL(q(τ_j) || p̃(τ_j)) = 0. The mean τ̃_j of q(τ_j), used in e.g., (21), can then be computed as

τ̃_j = s̃_j/t̃_j.   (34)

Again, similarly to the estimation of q(λ_w), we see that once the support w̃ of q(w) becomes known to each sensor j, computing (33) does not require cooperation.
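Both conjugate Gamma updates reduce to a few lines of shape/rate bookkeeping. The sketch below (function names are illustrative; it assumes the zero-mean isotropic wavelet prior and uses E{||d_j − W̃h_j||²} = ||d_j − W̃h̃_j||² + tr(W̃ S_hj W̃^T)):

```python
import numpy as np

def update_q_lambda_w(w_hat, c, d):
    """Conjugate Gamma update for the wavelet precision lambda_w (sketch)."""
    c_t = c + len(w_hat) / 2.0
    d_t = d + 0.5 * np.dot(w_hat, w_hat)
    return c_t, d_t, c_t / d_t                 # posterior shape, rate, mean

def update_q_tau(d_j, W_hat, h_mean, S_h, s_j, t_j):
    """Conjugate Gamma update for the noise precision tau_j (sketch)."""
    r = d_j - W_hat @ h_mean                   # residual at the posterior mean
    e2 = r @ r + np.trace(W_hat @ S_h @ W_hat.T)  # plus covariance correction
    s_t = s_j + len(d_j) / 2.0
    t_t = t_j + 0.5 * e2
    return s_t, t_t, s_t / t_t                 # posterior shape, rate, mean
```

Both updates use only quantities available locally at each sensor, which is why no cooperation is required once w̃ is known.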

D. ESTIMATION OF q(hj) AND q(αj)
Following similar steps, we can now estimate the factors q(h_j) and q(α_j) by optimizing the bounds

L(q) = −KL(q(h_j) || p̃(h_j)) + const,  with  log p̃(h_j) ∝ E_{q(τ_j) q(α_j)}{log p(d_j | w̃, h_j, τ_j) + log p(h_j | α_j)},   (35)

L(q) = −KL(q(α_j) || p̃(α_j)) + const,  with  log p̃(α_j) ∝ E_{q(h_j)}{log p(h_j | α_j)} + log p(α_j),   (36)

where α̃_j = [α̃_{1,j}, . . . , α̃_{L_R,j}]^T is the mean of q(α_j). From (35) we can see that since both p(d_j | w̃, h_j, τ_j) and p(h_j | α_j) are Gaussian, log p̃(h_j) must be quadratic in h_j. As such, p̃(h_j) must be Gaussian. The optimal pdf q(h_j) that minimizes the Kullback-Leibler divergence KL(q(h_j) || p̃(h_j)) should thus be selected such that its moments coincide with those of p̃(h_j). These moments can be found by computing the first and second derivatives of log p̃(h_j) in (35) with respect to h_j, which leads to

S_hj = (τ̃_j W̃^T W̃ + Ã_j)^(−1),  h̃_j = τ̃_j S_hj W̃^T d_j,   (37)

where Ã_j = diag[α̃_{1,j}, . . . , α̃_{L_R,j}]. Expression (37) can be recognized as a regularized least squares estimate of the reflectivity, with α̃_j playing the role of the regularization parameters.
To compute q(α_j), we evaluate the expectation in (36), which results in

log p̃(α_j) ∝ Σ_{i=1}^{L_R} ((a_j + 1/2 − 1) log α_{i,j} − (b_j + (h̃²_{i,j} + [S_hj]_{i,i})/2) α_{i,j}),   (38)

which can be recognized as a sum of logarithms of Gamma pdfs with a shape parameter (a_j + 1/2) and L_R rate parameters b_j + (h̃²_{i,j} + [S_hj]_{i,i})/2. Hence, q(α_j) should be selected as a product of Gamma distributions, with the shape and rate parameters of each factor q(α_{i,j}) set to

ã_j = a_j + 1/2,  b̃_{i,j} = b_j + (h̃²_{i,j} + [S_hj]_{i,i})/2,   (39)

respectively. Note that the mean of q(α_{i,j}), i = 1, . . . , L_R, used in (37), is then computed as

α̃_{i,j} = ã_j/b̃_{i,j}.   (40)
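Taken together, the q(h_j) and q(α_j) updates form the classical SBL iteration. A minimal local sketch (illustrative naming; noninformative a_j = b_j = 0 by default):

```python
import numpy as np

def sbl_reflectivity_update(d_j, W_hat, tau_j, alpha_mean, a_j=0.0, b_j=0.0, n_iter=50):
    """Alternate the Gaussian q(h_j) and Gamma q(alpha_j) updates (sketch)."""
    for _ in range(n_iter):
        # q(h_j): Gaussian moments of the quadratic log-posterior
        A_j = np.diag(alpha_mean)
        S_h = np.linalg.inv(tau_j * W_hat.T @ W_hat + A_j)
        h_mean = tau_j * S_h @ W_hat.T @ d_j
        # q(alpha_j): product of Gammas; mean = shape / rate
        alpha_mean = (a_j + 0.5) / (b_j + 0.5 * (h_mean**2 + np.diag(S_h)))
    return h_mean, S_h, alpha_mean
```

As the iteration proceeds, the sparsity parameters of irrelevant coefficients grow and shrink the corresponding entries of h̃_j toward zero, which is the pruning mechanism behind the sparser support of SBL.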

E. INITIALIZATION AND SUMMARY OF THE ALGORITHM
Let us now discuss initialization of the proposed inference scheme and summarize the overall algorithm structure. In the following we will refer to the proposed blind deconvolution algorithm with sparse Bayesian learning as VBI+SBL.
We begin with specifying the prior distributions p(λ_w), p(α_j), and p(τ_j), j ∈ J. For both p(α_j), j ∈ J, and p(λ_w) we use non-informative priors, i.e., we set a_j = b_j = 0, ∀j ∈ J, and c = d = 0. However, in the case of the noise prior p(τ_j) we set non-zero values for the parameters s_j, t_j, ∀j ∈ J. This is done to better account for the model mismatch in the estimated wavelet w̃ in addition to the additive measurement noise n_j, j ∈ J. We observed that this improves the empirical performance of the VBI+SBL deconvolution scheme, as compared to a non-informative choice of p(τ_j), j ∈ J. Figure 1 exemplarily shows the Gamma pdf Ga(τ_j | s_j, t_j) for different prior parameter values. In particular, the ratio s_j/t_j determines the mean of the respective Gamma pdf, and this was used as a guiding rule to select appropriate parameter values.
Besides the prior parameters, we also need to initialize the variational factors in (19). We begin the iterative estimation with the factor q(w), which implies that the factors (19a)-(19d), or rather their parameters, have to be specified. For the noise precision and sparsity parameters we set, respectively, τ̃_j = 1 and α̃_j = 1, ∀j ∈ J, where 1 is an all-one vector of appropriate dimension.
A bit more care is needed for the initialization of the factors q(h_j), j ∈ J. To obtain initial estimates of the reflectivities, the authors in [5] proposed to use a peak locator for each seismic trace d_j. Here we adopt the same approach and set the means of q(h_j), j ∈ J, to the estimated peak locations of the corresponding seismic traces. Note, however, that to obtain an appropriate initial estimate of the reflectivities we need to take into account the peak position of the wavelet w. Since the source wavelet w is modeled as a causal FIR system, its peak does not lie at the 0-th sample, but is delayed by l_peak samples. Therefore, the peaks in the seismic trace d_j are not aligned with the original peak positions, but shifted by l_peak samples as well. Hence, if we apply a peak locator to the seismic trace d_j, the original peaks in h_j will be shifted by l_peak in the initial reflectivity estimate h̃_j. This offset would eventually lead to a corrupted wavelet estimate. To enable an appropriate wavelet estimation, after applying the peak locator to the trace d_j, the located peaks are shifted back in time by l_peak samples. In reality this information needs to be appropriately estimated, or assumed based on preliminary calibration studies. For validation purposes, however, we assume that l_peak is known perfectly at this stage. The initial estimate of each reflectivity is then modified as follows:

h̃_j[i] ← h̃_j[i + l_peak],  i = 0, . . . , L_R − 1 − l_peak,

where the last l_peak entries in h̃_j are set to 0. Finally, we set the initial covariance matrix of the reflectivities to S_hj = I and Ω_j = I.
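A possible realization of this initialization is sketched below; the simple local-maximum peak locator is an assumption of the sketch, since [5] does not prescribe a specific detector:

```python
import numpy as np

def init_reflectivity(d_j, L_R, l_peak, thresh=0.5):
    """Peak-locator initialization (sketch): pick local maxima of |d_j| above
    thresh * max|d_j|, then undo the wavelet delay of l_peak samples."""
    a = np.abs(d_j)
    peaks = [l for l in range(1, len(a) - 1)
             if a[l] >= a[l - 1] and a[l] >= a[l + 1] and a[l] > thresh * a.max()]
    h0 = np.zeros(L_R)
    for l in peaks:
        i = l - l_peak                  # shift back by the wavelet peak delay
        if 0 <= i < L_R - l_peak:
            h0[i] = d_j[l]
    h0[L_R - l_peak:] = 0.0             # last l_peak entries forced to zero
    return h0
```

Without the back-shift by l_peak, the spikes land at the delayed positions and the subsequent wavelet estimate is corrupted, as discussed above.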
We now summarize the key steps of the proposed distributed VBI+SBL blind deconvolution scheme in Algorithm 1. As we see, it consists of two estimation stages: (i) a wavelet estimation stage that is conducted in a cooperative fashion by all sensors, and (ii) a reflectivity estimation stage that is done locally at each sensor without data exchange. Furthermore, note that the algorithm has inner and outer iterations, which we index by superscripts [m],[n] and (k), respectively. The outer iterations correspond to a cycle over variational factors. The inner ones, on the other hand, are iterative updates of some of these factors.

A. SYNTHETIC REFLECTIVITIES
In the following, we evaluate the performance of the proposed algorithm in numerical experiments using synthetic reflectivity data. To this end, we consider a network of J = 10 fixed sensors in a line topology, where the outer sensors have two neighbors and the other sensors have three to four neighbors, depending on their position in the line array. We use synthetically constructed reflectivities to generate seismic traces at the sensors at a sampling frequency of f_s = 500 Hz. As a source signal w we use a Ricker wavelet [25] with dominant frequency f_R = 40 Hz, phase shift of 30°, and duration T_W = 0.1 s.

[Algorithm 1 (fragment): compute c̃, d̃ ← (29) and λ̃_w ← (30); compute the noise precision τ̃_j ← (34); update the convolution matrix W̃_j, ∀j ∈ J, using w̃; compute the matrix Ω_j ← (47); return the estimated wavelet w̃_j^(K) and reflectivity h̃_j^(K).]

Furthermore, we perturb the generated measurements for each sensor with additive white Gaussian noise with zero mean and variance σ²_j, j ∈ J. To assess the impact of noise we also define the signal-to-noise ratio (SNR) as SNR = 10 log10(||s_j||²_2 / (L_s σ²_j)), where s_j ∈ R^{L_s} is the noise-free seismic trace at sensor j. For each sensor j we assume the same SNR conditions, i.e., SNR_j = SNR, ∀j ∈ J. Also, we average the performance of the algorithms over 100 independent Monte Carlo runs to collect statistics.
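The synthetic setup can be reproduced along the following lines (a sketch: the standard zero-phase Ricker formula is used and the 30° phase rotation is omitted; names are illustrative):

```python
import numpy as np

def ricker(f_dom, T, fs):
    """Zero-phase Ricker wavelet with dominant frequency f_dom (Hz), duration T (s)."""
    t = np.arange(-T / 2, T / 2, 1.0 / fs)
    a = (np.pi * f_dom * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def add_noise_at_snr(s, snr_db, rng):
    """Add white Gaussian noise so that 10 log10(||s||^2 / (L sigma^2)) = snr_db."""
    sigma2 = np.sum(s**2) / len(s) / 10.0 ** (snr_db / 10.0)
    return s + rng.standard_normal(len(s)) * np.sqrt(sigma2)

fs = 500.0
w = ricker(40.0, 0.1, fs)   # 40 Hz Ricker, 0.1 s duration, sampled at 500 Hz
```

Convolving this wavelet with the synthetic reflectivities and calling add_noise_at_snr per sensor yields traces at a common prescribed SNR.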
The generated true reflectivities are exemplified in Fig. As performance metrics for the deconvolution algorithms we consider two measures: the Pearson correlation coefficient (PCC) and the earth mover's distance (EMD) [26], [27]. For vectors x and x̂ from the same signal space, the PCC is defined as PCC = |xᵀx̂| / (‖x‖₂ ‖x̂‖₂). A PCC value close to one indicates a high similarity between the two vectors; a value of zero implies the highest dissimilarity (i.e., orthogonality) of the vectors. However, since the PCC does not explicitly take into account the distance between the correct and estimated peak locations in the signal vectors, we additionally employ the EMD metric, which is a discrete version of the Kantorovich-Rubinstein metric for probability distributions on metric spaces [28]. Since sparse vectors can be regarded as distributions, the EMD quantifies similarity between sparse vectors more accurately.
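Both metrics are straightforward to compute for 1-D signals. Below is a minimal sketch: the PCC follows the definition above, and the 1-D EMD is computed in the common way as the L1 distance between the cumulative sums of the magnitude-normalized vectors; function names are our own.

```python
import numpy as np

def pcc(x, x_hat):
    """Pearson-type similarity |x^T x_hat| / (||x||_2 ||x_hat||_2)."""
    return abs(np.dot(x, x_hat)) / (np.linalg.norm(x) * np.linalg.norm(x_hat))

def emd_1d(x, x_hat):
    """1-D earth mover's distance between vectors treated as distributions:
    the L1 distance between the CDFs of their normalized magnitudes,
    measured in number of bins."""
    p = np.abs(x) / np.abs(x).sum()
    q = np.abs(x_hat) / np.abs(x_hat).sum()
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()
```

For example, two unit spikes a few bins apart have PCC zero (orthogonal vectors) but a small EMD proportional to their offset, which is exactly why the EMD is the more informative measure for peak-location errors in sparse reflectivities.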
In the implementation of the VBI+SBL algorithm we set the ADMM penalty parameter for wavelet estimation to ρ = 15 and set s_j = 10, t_j = 10, ∀j ∈ J, for the estimate of the noise precision p(τ_j). We use K = 10 outer iterations and M = N = 10 inner iterations for the wavelet and reflectivity estimation stages, respectively.
We compare the proposed VBI+SBL to (i) the distributed sparse blind deconvolution (D-SBD) algorithm proposed in [13], which uses the LASSO for recovery of a sparse signal, and (ii) sparse multichannel blind deconvolution using the spectral projected gradient (SMBD-SPG) in [5]. Additionally, we include two further variants of the proposed SBL-based deconvolution in the analysis. For both variants, we substitute the wavelet estimation with a fixed function. In the first case, we use a wavelet that has been pre-estimated with D-SBD at the corresponding SNRs, termed SBL SNR-adapted w in the following comparisons. In the second case, we use a wavelet estimate with a high PCC = 0.99, termed SBL fixed w. This is done to test how sensitive VBI+SBL is to mismatches in the wavelet estimation. Figure 3 shows the PCC performance of the reflectivity estimation as a function of SNR, and Figure 4 shows the corresponding PCC of the wavelet estimates. For D-SBD, the ℓ1-regularization parameter has been chosen such that an overall good performance is achieved over the considered SNR range. As we see, VBI+SBL outperforms both D-SBD and SMBD-SPG in the middle-to-high SNR regime, but degrades as the SNR drops. Note that SBL learns the optimal regularization with a factor q(α_j), j ∈ J; as such, its reflectivity estimate adapts over SNRs. In contrast, D-SBD employs a LASSO estimator with a fixed regularization parameter for the ℓ1-norm and thus lacks the ability to adapt to changing SNR values. SMBD-SPG employs the spectral projected gradient, which adapts the sparsity threshold based on the noise power. Nevertheless, we observe that it performs worse than the other two methods, despite the fact that it conducts the deconvolution based on all the available network data at once in a centralized manner.
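For reference, the LASSO step underlying D-SBD's sparse recovery can be illustrated with a minimal ISTA (proximal-gradient) solver. This is our own sketch of a generic ℓ1 solver, not the implementation of [13].

```python
import numpy as np

def lasso_ista(W, d, lam, iters=500):
    """Solve min_h 0.5*||d - W h||_2^2 + lam*||h||_1 by ISTA."""
    step = 1.0 / np.linalg.norm(W, 2) ** 2       # 1 / Lipschitz constant of grad
    h = np.zeros(W.shape[1])
    for _ in range(iters):
        z = h - step * (W.T @ (W @ h - d))       # gradient step on the LS term
        h = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return h
```

With a fixed lam, the estimator cannot adapt to the noise level, which is exactly the limitation of D-SBD discussed above; SBL instead learns a per-coefficient regularization via the factors q(α_j).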
By fixing the wavelet estimate in VBI+SBL, we see that the performance of our proposed method improves: for SBL SNR-adapted w we observe a performance improvement in the middle-to-high SNR regime (see also Figure 4). For the case of a nearly "perfect" wavelet estimate, VBI+SBL clearly outperforms D-SBD and SMBD-SPG over the whole SNR region. This implies that VBI+SBL is sensitive to a good choice of the wavelet model w, yet provides a more accurate sparsity estimate than ℓ1-based optimization when the wavelet is accurately estimated.
Nonetheless, let us mention that the ℓ1-regularization in the D-SBD algorithm has been optimized so that an overall good performance is achieved. In Figure 5 we additionally show the impact of this parameter on D-SBD performance. Specifically, we show the bounds of achievable PCC performance with respect to variation of the regularization parameter. We can observe that, especially in the low SNR regime, the PCC performance of D-SBD varies significantly depending on the choice of λ_1. If λ_1 is selected without care, VBI+SBL performs similarly to D-SBD also in the low SNR regime.
In terms of the EMD performance shown in Figure 6, we observe a similar behavior of the algorithms as in Figure 3. However, in both cases with a pre-estimated wavelet used in VBI+SBL we see a performance increase compared to D-SBD over the considered SNR range. Furthermore, for higher SNRs all SBL-based schemes outperform D-SBD and SMBD-SPG quite significantly, which again indicates more accurate sparsity estimation. Also, note that both D-SBD and SMBD-SPG show a very flat EMD performance, especially in the high SNR regime, where SMBD-SPG outperforms D-SBD; here again the learning ability of SBL becomes particularly apparent. If we include the maximum and minimum achievable EMD performance of D-SBD depending on the ℓ1-regularization parameter, we observe that the EMD varies significantly in the low SNR regime (see Figure 7), and around SNR = 10 dB VBI+SBL falls in the same performance region as D-SBD.

B. SYNTHETIC DATA GENERATED WITH AN ACOUSTIC WAVE EQUATION
Let us now consider the algorithm performance using data generated from a physical model of an acoustic wave propagating in a medium. Specifically, we use an acoustic wave equation to generate seismic measurements at the sensors. The acoustic wave equation is given by

∂²u(x,t)/∂t² = v²(x) Δu(x,t) + f(x,t),   (42)

where u(x,t) is the wave field, v(x) is the propagation velocity of the medium, and f(x,t) is the source term. We solve equation (42) numerically using a finite-difference method with absorbing boundary conditions to avoid reflections at the borders of the computational domain [29]. The solution, i.e., the computed wave field u(x,t), is then sampled at the corresponding sensor positions x_j, j ∈ J, to obtain the seismic measurements at the sensors. Note that in this case the model (2) is only approximately valid, in contrast to the previous simulation case; this implies a higher model mismatch with the generated data. As the subsurface we use a two-layer model in 2D with layer depths d_1 = 100 m, d_2 = 200 m and wave velocities v_1 = 1000 m/s, v_2 = 1800 m/s, respectively. We assume J = 20 sensors uniformly spaced on a line of length 200 m. As the source term f(x,t) we choose a Ricker wavelet with f_R = 20 Hz and a phase shift of 30°. We place the source on the surface at the center of the sensor line array. The subsurface model together with the source-receiver setup is depicted in Figure 8. Figure 9a shows the measured seismic traces at SNR = 20 dB as a function of the trace number, where we can clearly see the arrival of the direct wave and the reflections from the interface between the first and second layer. We apply both the VBI+SBL and D-SBD algorithms with K = 10 outer iterations and M = N = 10 inner iterations for wavelet and reflectivity estimation, respectively. Furthermore, for VBI+SBL we set the ADMM penalty parameter for wavelet estimation to ρ = 15, the parameters for the estimation of p(τ_j) to s_j = 10, t_j = 50 for all sensors, and those for the estimation of p(λ_w) to c = 1, d = 50. The deconvolution results are shown in Figures 9b and 9c.
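To make the generation procedure concrete, the following 1-D sketch shows a second-order finite-difference scheme for the wave equation above. It is a simplified, hypothetical stand-in for the 2-D simulation in the text, with simple one-way (first-order) absorbing ends rather than the boundary treatment of [29].

```python
import numpy as np

def acoustic_fd_1d(v, f_src, src_idx, nt, dt, dx):
    """Second-order finite differences for u_tt = v^2 u_xx + f in 1-D.
    Crude one-way absorbing conditions at both ends; stability requires
    the CFL condition v*dt/dx <= 1."""
    nx = len(v)
    u_prev, u = np.zeros(nx), np.zeros(nx)
    field = np.zeros((nt, nx))
    c = (v * dt / dx) ** 2                       # squared CFL factors
    for it in range(nt):
        lap = np.zeros(nx)
        lap[1:-1] = u[2:] - 2 * u[1:-1] + u[:-2]  # discrete Laplacian
        u_next = 2 * u - u_prev + c * lap
        u_next[src_idx] += dt ** 2 * f_src(it * dt)  # inject source term f
        # one-way wave equation at the boundaries (first-order absorbing)
        r0, r1 = v[0] * dt / dx, v[-1] * dt / dx
        u_next[0] = u[1] + (r0 - 1) / (r0 + 1) * (u_next[1] - u[0])
        u_next[-1] = u[-2] + (r1 - 1) / (r1 + 1) * (u_next[-2] - u[-1])
        u_prev, u = u, u_next
        field[it] = u
    return field
```

Sampling the returned field at the sensor positions then yields the synthetic traces; the 2-D case replaces the 1-D Laplacian with its two-dimensional stencil and absorbs on all four domain edges.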
One can observe that both VBI+SBL and D-SBD successfully recover the reflectivity peaks at the corresponding arrival times. However, VBI+SBL tends to obtain more distinct reflectivity peaks than D-SBD. This is consistent with the better EMD performance of the proposed VBI+SBL algorithm. Especially in the range of 0.3 s to 0.35 s, VBI+SBL clearly recovers the significant reflections from the first layer while suppressing unwanted reflections in the measurement data. Figure 10 depicts the same results with wiggle plots instead of images. Here, it can be observed more clearly that VBI+SBL recovers a cleaner structure of the seismic signal, as seen in the zoomed portion of the reconstructed signal.

C. REAL SEISMIC DATA
For the last evaluation example, we use real seismic data from a reflection survey provided by the National Petroleum Reserve, Alaska (NPRA) Legacy Data Archive by USGS (1976), Line ID 31-81, in SEG-Y format, available at http://certmapper.cr.usgs.gov/data/apps/npra/. The data has been recorded with a sampling frequency of 250 Hz. For our purposes we use measurement data from 100 traces between 1 s and 1.6 s. For the geophone constellation we assume a line array where each geophone has up to six connected neighbors on each side. Again, we compare VBI+SBL to D-SBD.
For both VBI+SBL and D-SBD we employ K = 5 outer iterations with M = N = 10 inner iterations for wavelet and reflectivity estimation, respectively. Furthermore, for VBI+SBL we set ρ = 1 and s_j = 100, t_j = 0, c = 1000, d = 0; these parameter settings were found to give a satisfactory deconvolution result. The original real seismic data and the deconvolution results of D-SBD and VBI+SBL are shown in Figure 11. For both D-SBD and VBI+SBL one can observe clear deblurring and removal of ringing artifacts from the recorded seismic data. In particular, VBI+SBL obtains a much cleaner image with sharper features compared to D-SBD. This observation is consistent with the spikier deconvolution results of VBI+SBL seen in Section IV-B. These results show that VBI+SBL provides satisfactory results also for data from real seismic surveys.

V. CONCLUSION AND FUTURE WORK
This paper presents a distributed probabilistic framework for blind deconvolution of seismic signals under sparsity constraints. The probabilistic formulation of the inference problem allows applying variational Bayesian inference to efficiently approximate the posterior distribution of the parameters of interest, thus accounting for the uncertainty of parameter estimates. Depending on the particular choice of variational factors, an instance of the distributed sparse blind deconvolution (D-SBD) algorithm proposed in [13] can be obtained. Furthermore, the variational formulation also allows extending the algorithm with an automatic choice of regularization factors that are learned using an empirical Bayesian method along with the other parameters within the estimation framework. This represents an extension of the previous method: (i) instead of the ℓ1-penalization used in [13], we apply sparse Bayesian learning for a sparse reconstruction of the reflectivity profile, (ii) we learn the unknown additive noise variances at the sensors, and (iii) we adaptively regularize the wavelet estimation using a probabilistic version of ridge regression.
The estimation algorithm includes two key steps, repeated iteratively. The first step is a distributed wavelet estimation, which is done cooperatively over the network of sensors/agents using the ADMM algorithm. In the second step, the reflectivity profiles of the subsurface are estimated locally based on the estimate of the wavelet; this computation step does not require any cooperation.
For the performance evaluation we compared the proposed VBI+SBL algorithm with D-SBD and SMBD-SPG using the Pearson correlation coefficient (PCC) and earth mover's distance (EMD) metrics for the estimated wavelet and reflectivity signals. The analysis of the methods in simulated environments has shown that the VBI+SBL learning scheme, which includes estimation of both wavelet and reflectivity signals as well as the regularization parameters, performs well in the middle-to-high SNR regime. In the low SNR regime, the coupled estimation of wavelet and reflectivities apparently leads to a propagation of estimation errors between these two stages, which degrades the deconvolution performance. Although in our evaluations the D-SBD algorithm was found to perform better in the low SNR regime, it requires manual tuning of regularization parameters, which is not necessary for VBI+SBL; when these parameters are not selected properly, D-SBD and VBI+SBL can perform similarly in the low SNR regime. Furthermore, for D-SBD the EMD metric flattens out in the high SNR regime, which indicates that further, more precise tuning of the regularization parameters is needed. In the case of VBI+SBL this is not observed, since the parameters are tuned automatically.
Furthermore, we also used synthetic seismic data generated by solving the acoustic wave equation. This produces more realistic test data, yet at the same time the data generated in this way does not exactly follow the signal model used for deconvolution. In general, we see that both methods produce acceptable performance, with VBI+SBL resulting in more accurate signal sparsity. However, VBI+SBL appears to be more sensitive to model mismatches. This can be explained by the higher number of available degrees of freedom and the higher nonlinearity/non-convexity of the optimization problem, which in turn leads to a higher sensitivity to algorithm initialization. Additionally, we tested VBI+SBL with real seismic data obtained in a reflection survey. Compared to D-SBD, our proposed method achieves a sharper reflectivity image with higher amplitudes, showing the effectiveness of SBL for the reflectivity estimation. Furthermore, ringing effects from the original measurement data are successfully removed. Hence, also for the decentralized blind deconvolution of real seismic data, VBI+SBL provides a satisfactory solution.
Finally, we conclude that in practice it seems reasonable to combine D-SBD and VBI+SBL, where the former provides an initialization, while our proposed algorithm is used to fine-tune the signal estimates. For future work, we will investigate such a combination in more detail. Applying our proposed method to further real seismic field data is another objective to be addressed in the future.

APPENDIX A COMPUTATION OF THE VARIATION LOWER BOUNDS
To evaluate the bound (8) we compute log p(D, w, H, λ w , T , A) and ignore constant terms.
log p(D, w, H, λ_w, T, A) = const + (L_W/2) log λ_w − (λ_w/2) ‖w‖² + (c − 1) log λ_w − d λ_w + …   (43)

To evaluate the variational lower bound, the expectation of (43) is taken with respect to the Markov blanket [16] of the variable being estimated, as will be detailed in the following.
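For context, the Markov-blanket reduction follows from the standard mean-field update (a textbook variational Bayes result, not specific to this paper): the optimal factor for each variable is obtained by taking the expectation of the log joint under all remaining factors,

```latex
% Generic mean-field update: for each variable \theta_i, the optimal
% variational factor is the exponentiated expectation of the log joint
% under the product of all other factors.
\log q^{\ast}(\theta_i)
  = \mathbb{E}_{q(\boldsymbol{\theta} \setminus \theta_i)}
    \bigl[\log p(\mathcal{D}, \boldsymbol{\theta})\bigr] + \mathrm{const}.
```

Only the terms of (43) lying in the Markov blanket of θ_i survive this expectation, which is why each factor update depends on just a small subset of the log joint.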

A. VARIATION LOWER BOUND WITH RESPECT TO q(w)
For the estimation of q(w) we need to evaluate (14). By inserting (6) into (43) and retaining the terms that depend on w, we obtain p̃(w) ∝ exp(const − (1/2) E_{q(λ_w)}[…]).