Correlated Sparse Bayesian Learning for Recovery of Block Sparse Signals With Unknown Borders

We consider the problem of recovering complex-valued block sparse signals with unknown borders. Such signals arise naturally in numerous applications. Several algorithms have been developed to handle unknown block partitions. In pattern-coupled sparse Bayesian learning (PCSBL), the prior of each coefficient involves its own hyperparameter and those of its immediate neighbors to exploit the block sparsity. Extended block sparse Bayesian learning (EBSBL) assumes that the block sparse signal consists of correlated and overlapping blocks to enforce block correlations. We propose a simpler alternative to EBSBL and reveal the underlying relationship between the proposed method and a particular case of EBSBL. The proposed algorithm exploits the fact that immediately neighboring sparse coefficients are correlated. The proposed model is similar to classical sparse Bayesian learning (SBL). However, unlike the diagonal correlation matrix in conventional SBL, the unknown correlation matrix has a tridiagonal structure to capture the correlation with neighbors. Because the elements of the inverse of a tridiagonal matrix are entangled, no direct closed-form update is available, and an approximate solution is proposed instead. The alternative algorithm avoids the high dictionary coherence of EBSBL, reduces the number of unknowns of EBSBL, and is computationally more efficient. The sparse reconstruction performance of the algorithm is evaluated with both correlated and uncorrelated block sparse coefficients. Simulation results demonstrate that the proposed algorithm outperforms PCSBL and correlation-based methods such as EBSBL in terms of reconstruction quality. The numerical results also show that the proposed correlated SBL algorithm can handle isolated zeros and nonzeros as well as block sparse patterns.


I. INTRODUCTION
Block sparsity has been observed in signals from a wide range of applications, such as the cluster structure of scatterers in radar images [1], [2], [3], fetal ECG [4], and ultrasound signals [5]. Such a structured sparse model can be exploited by additionally modeling the dependencies among sparse coefficients, such as the correlations between coefficients or the dependence among their sparsity patterns. In noisy environments or with highly compressive measurements, algorithms that properly leverage this underlying structure can achieve more robust recovery than counterparts that exploit sparsity alone.
A number of algorithms have been proposed for block sparse signal recovery when the block partition is known a priori, including greedy pursuit algorithms such as model-based Compressive Sampling Matching Pursuit (CoSaMP) [6] and Block Orthogonal Matching Pursuit (Block OMP) [7], regularized convex optimization methods such as group Lasso [8], group basis pursuit [9], and mixed ℓ1/ℓ2 programming [10], and block sparse Bayesian learning (BSBL) [11], [12]. These algorithms require knowledge of the cluster pattern (block partition) a priori. In practice, however, the block partition of the sparse coefficients is usually unavailable. To address this problem, a first category of algorithms imposes a particular structure on the support of the sparse coefficients [13], [14]. For instance, the Clustered Sparse Solver (CluSS) algorithm in [13] employs a hierarchical Bayesian "spike-and-slab" prior to encourage sparsity and promote cluster patterns simultaneously. However, since the posterior distribution of this Bayesian cluster sparse model cannot be derived analytically, Markov chain Monte Carlo (MCMC) sampling [15] has to be employed for Bayesian inference. Similarly, in [14] a Boltzmann machine is placed on the support of the sparse coefficients to model their dependencies, and an approximation of the maximum a posteriori (MAP) estimator is used to estimate the hidden variables by exhaustive search.
In a second category of algorithms, block sparsity priors are imposed directly on the sparse coefficients. Algorithms such as extended block sparse Bayesian learning (EBSBL) [12], pattern-coupled sparse Bayesian learning (PCSBL) [16], cluster structured sparse Bayesian learning (CSBL) [17], and total variation regularized sparse Bayesian learning (TVSBL) [18] fall into this category. EBSBL is an extension of the block sparse Bayesian learning algorithm, which was designed for known block partitions. In EBSBL, the nonzero blocks are assumed to be arbitrarily located and of unknown size. The signal is then partitioned into a number of overlapping and fully correlated blocks of user-defined size. By expanding the overlapping blocks into a non-overlapping block structure, an extended set of fully correlated blocks is introduced for the unknown sparse coefficients. Based on this block structure, an expanded sensing matrix is constructed by adding redundant columns to the original sensing matrix. Similarly, the unknown coefficient vectors introduced for the blocks are concatenated into an augmented vector. The measurements are then expressed as the product of the expanded measurement matrix and this augmented vector, so the problem can be solved by the traditional BSBL algorithm to find the augmented block vector. Finally, the unknown sparse coefficients are computed from the relation between the original sparse coefficients and the blocks. PCSBL, on the other hand, introduces a pattern-coupled hierarchical Gaussian prior in which each coefficient involves its own hyperparameter and those of its immediate neighbors, to exploit interactions between neighboring coefficients. A suboptimal solution is obtained for the hyperparameters; however, the performance of PCSBL depends heavily on a proper selection of the hyperparameters. Extensions of PCSBL to two-dimensional cases are addressed in [19], [20]. CSBL adopts a prior of similar form to the pattern-coupled prior of PCSBL, but without a hierarchical distribution over the hyperparameters. As a result, unlike PCSBL, it requires no hyperparameter selection. For both PCSBL and CSBL, expectation-maximization (EM) is used to learn the hidden variables and the unknown parameters. Lastly, TVSBL is a block SBL method inspired by total variation (TV) denoising [18].
In this category, algorithms such as EBSBL, PCSBL, CSBL, and TVSBL derive their update rules from the EM algorithm. Variational Bayesian inference can be used instead of EM. As an alternative to PCSBL with EM, [21] develops an algorithm based on variational Bayesian inference that performs well for the MIMO channel estimation problem. Ref. [22] introduces a method that uses variational Bayesian inference instead of EM and can be considered an alternative version of EBSBL (BSBL). Although it performs similarly to BSBL, it is faster because it is covariance-free. Ref. [23] presents both EM and variational Bayesian inference methods for Kalman smoothing and reports similar performance, with the variational Bayesian method being slower due to the larger number of iterations required for convergence. Since there is no clear consensus on which family of methods is faster or performs better, we adopt EM-based update rules for a fair comparison with the existing methods and leave variational Bayesian inference for future work.
In the second category, where group-structured priors are imposed on the sparse coefficients, only a few existing algorithms consider intra-block correlation, i.e., the correlation among the elements within each block. Intra-block correlation is common in practical signals, such as physiological signals [4] and images [5]. In this work, we review several algorithms that explore and exploit intra-block correlation to improve performance. These algorithms are based on block sparse Bayesian learning (BSBL) and extended block sparse Bayesian learning (EBSBL) [12]. However, BSBL requires knowledge of the block partition, and EBSBL suffers from several key drawbacks: a larger dictionary matrix with high coherence, more unknowns, and consequently a high computational complexity. Note that the high number of unknowns also deteriorates the performance of EBSBL.
In this work, a new algorithm for recovering correlated signals with an unknown block partition is proposed to alleviate the shortcomings of recently reported methods. This work is motivated by the disadvantages of EBSBL, where the interactions among neighboring coefficients are modeled only implicitly, through a linear transformation of an artificially constructed augmented vector. A new structured sparse prior can be derived from the underlying relationship between the correlation matrices of the augmented EBSBL model and of the original signal model. The proposed algorithm exploits the fact that immediately neighboring sparse coefficients are correlated. It is also inspired by PCSBL [16] and CSBL [17] in the sense that it considers the relation between neighboring sparse coefficients. However, it differs from PCSBL and CSBL, which do not leverage the correlations among the coefficients; we exploit these correlations directly instead of coupling the hyperparameters. In this work, we focus only on the correlations between immediate neighbors. Hence, the unknown correlation matrix has a tridiagonal structure, unlike the diagonal correlation matrix used in conventional SBL algorithms [24], [25]. The proposed algorithm with a tridiagonal correlation matrix is thus a simple extension of the classical sparse Bayesian learning algorithm, which uses a diagonal correlation matrix. At the same time, our algorithm improves the group sparsity performance because it does not ignore the correlation with the neighbors.
While classical SBL assumes the sources are uncorrelated, EBSBL assumes that the different blocks share a common correlation structure [12]. In this work, we claim that if there is an intra-block correlation in the data, modeling an inter-element correlation can already trigger a grouping effect. In other words, without building a new block-based data model, assuming inter-element correlation in the classical SBL model already promotes group sparsity. We also show that this grouping effect is present even when the true data have no intra-block correlation but only a block structure.
The three contributions of this paper can be summarized as follows. First, a new structured sparse recovery algorithm is proposed, which can be considered a better alternative to EBSBL. The algorithm avoids the high dictionary coherence of EBSBL and reduces the number of unknowns, which makes it computationally more efficient. Second, we analyze the relation between our algorithm and a particular case of EBSBL. The resulting insights can be used to enhance the approximate update rule of the proposed algorithm with a tridiagonal structure. Third, the proposed algorithm can tackle irregular sparsity patterns in which the sparse vector contains both block sparse and isolated coefficients. When the data contain isolated zeros and nonzeros, group sparsity algorithms might not perform well, as their assumptions enforce only the grouping effect. The proposed algorithm, in contrast, can handle isolated zeros and isolated nonzeros as well as block sparse patterns.
The rest of the paper is organized as follows.Section II provides a review of classical sparse Bayesian learning and extended block SBL algorithms.A tridiagonal correlation based prior on the sparse coefficients is derived from the classical SBL algorithm in Section III.Section IV discusses the relationship of the proposed method to EBSBL.Comparisons of the proposed method with the state-of-the-art are shown in Section V. Conclusions are drawn in Section VI.

A. NOTATION
Throughout the paper, bold lowercase and bold uppercase symbols denote vectors and matrices, respectively. $\|x\|_2^2$ denotes the squared $\ell_2$-norm of a vector $x$. $x_i$ denotes the $i$-th block of $x$, and the $i$-th element of $x$ is denoted by $x(i)$, $(x)_i$, or $x_i$. Furthermore, $(A)_{ij}$ and $A_{ij}$ represent the element in the $i$-th row and $j$-th column of a matrix $A$. For a matrix $A$, $A^H$ and $A^{-1}$ denote its Hermitian transpose and its inverse, respectively. $\mathrm{tr}(A)$ is the trace of $A$. $\mathrm{diag}(A)$ denotes the column vector composed of the diagonal elements of $A$; for a vector $\gamma$, $\mathrm{diag}(\gamma)$ denotes the diagonal matrix with $\gamma$ on its diagonal. $\mathrm{rank}(A)$ denotes the rank of $A$, and $|A|$ its determinant. $\mathcal{CN}(\cdot)$ denotes the multivariate complex Gaussian distribution.

II. REVIEW OF SBL ALGORITHMS
Sparse signal recovery problems aim to recover an unknown sparse coefficient vector $s \in \mathbb{C}^n$ from noisy and distorted measurements $z \in \mathbb{C}^m$. More specifically, we consider the model
$$z = A s + n, \tag{1}$$
where $n \in \mathbb{C}^m$ is additive white noise and $A \in \mathbb{C}^{m \times n}$ is the measurement matrix with $m \ll n$. A block structure in $s$ is commonly observed in practice, where the nonzero elements of $s$ appear in multiple groups of unknown sizes at arbitrary locations. We first give a detailed review of classical SBL to show the relation between our method and classical SBL, and then briefly review EBSBL [12].

A. CLASSICAL SBL
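Using Bayesian inference to solve the linear problem in (1) involves determining the posterior distribution of the complex amplitudes $s$ from the likelihood and the prior distribution. The conditional probability density function (PDF) of $z$ given the sources $s$ is complex Gaussian with noise variance $\sigma^2$:
$$p(z \mid s; \sigma^2) = \mathcal{CN}(z; As, \sigma^2 I_m) = \frac{1}{(\pi\sigma^2)^m} \exp\!\left(-\frac{\|z - As\|_2^2}{\sigma^2}\right). \tag{2}$$
The unknown coefficients $s_l$ are assumed to be independent across $l$ and to follow a zero-mean complex Gaussian distribution:
$$p(s_l; \gamma_l) = \mathcal{CN}(s_l; 0, \gamma_l). \tag{3}$$
Stacking the variances of the elements of $s$ into $\gamma = [\gamma_1, \ldots, \gamma_n]^T$, we obtain
$$p(s; \gamma) = \prod_{l=1}^{n} p(s_l; \gamma_l) = \mathcal{CN}(s; 0, \Gamma), \tag{4}$$
with $\Gamma = \mathrm{diag}(\gamma)$. When $\gamma_l = 0$, $s_l = 0$ with probability 1. Hence, the sparsity of the model is controlled by the hyperparameters $\gamma$, and such a model has been shown to enforce sparsity. Using the likelihood in (2) and the prior in (4), the posterior PDF of $s$ conditioned on $\gamma$ and $\sigma^2$ follows from Bayes' rule, neglecting the normalizing denominator:
$$p(s \mid z; \gamma, \sigma^2) \propto p(z \mid s; \sigma^2)\, p(s; \gamma). \tag{5}$$
Since both $p(z \mid s; \sigma^2)$ in (2) and $p(s; \gamma)$ in (4) are Gaussian, their product (5) is Gaussian with posterior mean $\mu_s$ and covariance $\Sigma_s$ given by
$$\mu_s = \Gamma A^H \Sigma_z^{-1} z \tag{6}$$
and
$$\Sigma_s = \left(\Gamma^{-1} + \sigma^{-2} A^H A\right)^{-1} = \Gamma - \Gamma A^H \Sigma_z^{-1} A \Gamma, \tag{7}$$
where the covariance matrix of the measurements $z$ is
$$\Sigma_z = \sigma^2 I_m + A \Gamma A^H. \tag{8}$$
To estimate $\Gamma$ and $\sigma^2$, we use expectation-maximization (EM) to maximize $p(z; \Gamma, \sigma^2)$. The EM formulation treats $s$ as a hidden variable and maximizes
$$\mathbb{E}_{s \mid z; \Gamma, \sigma^2}\left[\log p(z, s; \Gamma, \sigma^2)\right] \tag{9}$$
with respect to the hyperparameters $\Gamma$ and $\sigma^2$, where $\mathbb{E}_{s \mid z; \Gamma, \sigma^2}[\cdot]$ denotes an expectation with respect to the posterior distribution of $s$. Substituting the joint distribution $p(z, s; \Gamma, \sigma^2) = p(z \mid s; \sigma^2)\, p(s; \Gamma)$ into (9), we obtain
$$\mathbb{E}_{s \mid z; \Gamma, \sigma^2}\left[\log p(z \mid s; \sigma^2)\right] + \mathbb{E}_{s \mid z; \Gamma, \sigma^2}\left[\log p(s; \Gamma)\right]. \tag{10}$$
Ignoring the terms independent of $\Gamma$, we estimate $\Gamma$ by maximizing
$$\mathbb{E}_{s \mid z; \Gamma, \sigma^2}\left[-\log|\Gamma| - s^H \Gamma^{-1} s\right], \tag{11}$$
and, using $s^H \Gamma^{-1} s = \mathrm{tr}(\Gamma^{-1} s s^H)$ and $\mathbb{E}[s s^H] = \Sigma_s + \mu_s \mu_s^H$, we obtain
$$Q(\Gamma) = -\log|\Gamma| - \mathrm{tr}\!\left(\Gamma^{-1}\left(\Sigma_s + \mu_s \mu_s^H\right)\right). \tag{12}$$
This function is also called the Q function. Since $\Gamma$ is diagonal, we maximize this function only with respect to its diagonal elements. Taking the derivative,
$$\frac{\partial Q}{\partial \gamma_l} = -\frac{1}{\gamma_l} + \frac{\left(\Sigma_s\right)_{ll} + |(\mu_s)_l|^2}{\gamma_l^2} = 0. \tag{13}$$
The closed-form solution for $\gamma_l$ is then
$$\gamma_l = \left(\Sigma_s\right)_{ll} + |(\mu_s)_l|^2. \tag{14}$$
Note that in classical SBL, the sources are assumed to be uncorrelated.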
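To make the EM iteration (6)-(8) and (14) concrete, the following NumPy sketch runs classical SBL with a fixed, known noise variance. The function name `sbl_em`, the initialization $\gamma = 1$, the iteration count, the tolerance, and the test problem are illustrative assumptions; the noise-variance update is deliberately left out of this sketch.

```python
import numpy as np


def sbl_em(A, z, sigma2, n_iter=500, tol=1e-8):
    """Classical SBL with EM updates of the diagonal prior Gamma = diag(gamma).

    Sketch only: sigma2 is assumed known and is not re-estimated.
    """
    m, n = A.shape
    gamma = np.ones(n)
    for _ in range(n_iter):
        Gamma = np.diag(gamma)
        # Covariance of the measurements, eq. (8)
        Sigma_z = sigma2 * np.eye(m) + A @ Gamma @ A.conj().T
        # Gain Gamma A^H Sigma_z^{-1}, shared by eqs. (6) and (7)
        K = Gamma @ A.conj().T @ np.linalg.inv(Sigma_z)
        mu = K @ z                       # posterior mean, eq. (6)
        Sigma_s = Gamma - K @ A @ Gamma  # posterior covariance, eq. (7)
        # M-step, eq. (14): gamma_l = (Sigma_s)_ll + |(mu_s)_l|^2
        gamma_new = np.real(np.diag(Sigma_s)) + np.abs(mu) ** 2
        converged = np.max(np.abs(gamma_new - gamma)) < tol
        gamma = gamma_new
        if converged:
            break
    return mu, gamma


# Usage on a synthetic problem with 4 unit-magnitude nonzero coefficients
rng = np.random.default_rng(0)
m, n, k = 50, 80, 4
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)
support = np.array([5, 23, 41, 67])
s = np.zeros(n, dtype=complex)
s[support] = np.exp(1j * rng.uniform(0, 2 * np.pi, k))
noise = 1e-2 * (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
z = A @ s + noise
mu, gamma = sbl_em(A, z, sigma2=1e-4)
```

The diagonal structure of $\Gamma$ is what makes the M-step a one-line, element-wise update; the tridiagonal prior introduced later loses exactly this property.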

B. EXTENDED BLOCK SBL (EBSBL)
The BSBL method was proposed to solve the group sparsity problem with a known block partition [12], in which both the block sizes and the border locations of the blocks are known. Group sparsity is enforced by considering a separate covariance matrix per block, with a common structure over the different blocks and potentially a specific structure within every block (e.g., Toeplitz). The extension of this method, extended BSBL (EBSBL), deals with unknown block partitions. EBSBL considers an extended set of hidden blocks $x_i \in \mathbb{C}^h$ with hidden block size $h$ for $i = 1, \ldots, g$, where $g = n - h + 1$ is the number of blocks. More specifically, the unknown signal $s$ is represented as
$$s = \sum_{i=1}^{g} E_i x_i, \tag{15}$$
where $E_i \in \mathbb{R}^{n \times h}$ contains an identity matrix from the $i$th row to the $(i + h - 1)$th row and zeros elsewhere. This construction allows for blocks of size $h$ with unknown border locations, but it can also handle blocks whose size differs from $h$, as discussed in [12]. Under (15), the model in (1) can be written as
$$z = \sum_{i=1}^{g} A E_i x_i + n = \Phi x + n, \tag{16}$$
where $\Phi = [A E_1, \ldots, A E_g] \in \mathbb{C}^{m \times hg}$ and $x = [x_1^T, \ldots, x_g^T]^T$. Then (16) becomes a block sparsity problem with a known block partition, which is solvable by BSBL. Specifically, $x$ is assumed to follow the distribution
$$p(x; \{\beta_i\}, B) = \mathcal{CN}(x; 0, \Gamma_0), \tag{17}$$
where $\Gamma_0 = \mathrm{diag}(\beta_1 B, \ldots, \beta_g B)$ and each block follows the parameterized multivariate Gaussian distribution $p(x_i; \beta_i, B) = \mathcal{CN}(0, \beta_i B)$, with $\beta_i$ determining the degree of block sparsity. The MAP estimate of $x$ is then obtained from the formulas in (6) and (7) as
$$\mu_x = \Gamma_0 \Phi^H \left(\sigma^2 I_m + \Phi \Gamma_0 \Phi^H\right)^{-1} z, \tag{18}$$
$$\Sigma_x = \left(\Gamma_0^{-1} + \sigma^{-2} \Phi^H \Phi\right)^{-1}. \tag{19}$$
After the hyperparameters are found iteratively, the estimate of the unknown signal $s$ is given by
$$\hat{s} = \sum_{i=1}^{g} E_i \mu_{x_i}, \tag{20}$$
where $\mu_{x_i}$ is the $i$th block of $\mu_x$. EBSBL is designed to cope with block sparse recovery under an unknown block partition. However, it suffers from several disadvantages. First, it has a higher computational complexity, as the augmented vector $x$ has length $h(n - h + 1)$, which is almost $h$ times the length of the original signal $s$. More importantly, since the expanded measurement matrix $\Phi$ is constructed by adding redundant columns to the original measurement matrix $A$, this dictionary has high coherence, which degrades the efficiency of sparse coefficient estimation in SBL [17].
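The coherence problem of the expanded dictionary can be seen directly. The sketch below builds the selection matrices $E_i$ of (15) and the expanded dictionary $[AE_1, \ldots, AE_g]$ (called `Phi` here, a label of our choosing), checks that $As = \Phi x$ when $s = \sum_i E_i x_i$, and computes the mutual coherence. The helper names and problem sizes are hypothetical. Because neighboring blocks share columns of $A$, $\Phi$ contains exact duplicate columns, so its mutual coherence is 1.

```python
import numpy as np


def selection_matrix(n, h, i):
    """E_i from (15) with 0-based i: an n x h matrix with an identity in rows i..i+h-1."""
    E = np.zeros((n, h))
    E[i:i + h, :] = np.eye(h)
    return E


def expanded_dictionary(A, h):
    """[A E_1, ..., A E_g] from (16), with g = n - h + 1 overlapping blocks."""
    n = A.shape[1]
    g = n - h + 1
    return np.hstack([A @ selection_matrix(n, h, i) for i in range(g)])


def mutual_coherence(D):
    """Largest absolute normalized inner product between two distinct columns of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.conj().T @ Dn)
    np.fill_diagonal(G, 0.0)
    return G.max()


rng = np.random.default_rng(1)
m, n, h = 20, 30, 3
A = rng.standard_normal((m, n))
Phi = expanded_dictionary(A, h)
g = n - h + 1
x = rng.standard_normal(h * g)
# s = sum_i E_i x_i, eq. (15)
s = sum(selection_matrix(n, h, i) @ x[i * h:(i + 1) * h] for i in range(g))
```

With $h = 3$ and $n = 30$, `Phi` already has 84 columns versus 30 in `A`, which illustrates the size increase mentioned above.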

III. CORRELATED SPARSE BAYESIAN LEARNING ALGORITHM
As mentioned before, classical SBL assumes that the sources are uncorrelated, whereas EBSBL assumes that the different blocks share a common correlation structure, which may or may not model intra-block correlation. Regarding the latter, it has been shown that if the data contain intra-block correlation, the performance of EBSBL improves when this correlation is taken into account. On the other hand, the performance of EBSBL when it ignores intra-block correlation does not change with the amount of intra-block correlation in the data. The claim we make in this work is that if there is intra-block correlation in the data, just modeling an inter-element correlation can already trigger a grouping effect. In other words, assuming inter-element correlation in the classical SBL model already promotes group sparsity without first building a new block-based data model. The experimental results in Section V show that this grouping effect is present even when the true data have no intra-block correlation.
In this section, we therefore discuss how classical SBL can be extended to correlated sources, assuming for simplicity that only neighboring elements are correlated.

A. PRIORS ON THE SOURCES
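In this section, the complex coefficients $s_l$, which were assumed to be independent and uncorrelated in classical SBL, are assumed to be correlated with their immediate neighbors. In other words, $s$ has the distribution
$$p(s; \Gamma) = \mathcal{CN}(s; 0, \Gamma), \tag{21}$$
with the following tridiagonal structure for $\Gamma$:
$$\Gamma = \begin{bmatrix} \Gamma_{11} & \Gamma_{12} & & & 0 \\ \Gamma_{21} & \Gamma_{22} & \Gamma_{23} & & \\ & \ddots & \ddots & \ddots & \\ & & \Gamma_{(n-1)(n-2)} & \Gamma_{(n-1)(n-1)} & \Gamma_{(n-1)n} \\ 0 & & & \Gamma_{n(n-1)} & \Gamma_{nn} \end{bmatrix}. \tag{22}$$
Hence, only the tridiagonal elements $\Gamma_{ll'}$ with $l' \in \{l-1, l, l+1\}$ are allowed to be nonzero, and all other elements of $\Gamma$ are set to zero. Note that the diagonal elements of $\Gamma$ represent the power of the coefficients, so $\Gamma_{ll} \ge 0$. When the variance $\Gamma_{ll} = 0$, then $s_l = 0$. Hence, the sparsity of the model is controlled by the diagonal elements of $\Gamma$. The likelihood is
$$p(z \mid s; \sigma^2) = \mathcal{CN}(z; As, \sigma^2 I_m), \tag{23}$$
which is identical to (2). Based on the likelihood in (23) and the prior of $s$ in (21), the posterior of $s$ is Gaussian with mean and covariance
$$\mu_s = \Gamma A^H \left(\sigma^2 I_m + A \Gamma A^H\right)^{-1} z, \tag{24}$$
$$\Sigma_s = \left(\Gamma^{-1} + \sigma^{-2} A^H A\right)^{-1}, \tag{25}$$
where $\Gamma$ is a tridiagonal matrix, but its inverse $\Gamma^{-1}$ does not have a simple structure as in the diagonal case. However, the inverse of this tridiagonal matrix can be computed quickly with recursive methods [26].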
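As an illustration of a recursive method for tridiagonal matrices, the sketch below implements the Thomas algorithm, which solves $\Gamma x = b$ for a tridiagonal $\Gamma$ in $O(n)$ operations. This is an assumption for illustration: [26] may use a different recursion, and the function name `thomas_solve` and the test matrix are hypothetical. The elimination uses no pivoting, which is adequate for the Hermitian positive definite matrices considered here.

```python
import numpy as np


def thomas_solve(sub, diag, sup, rhs):
    """Solve T x = rhs for tridiagonal T in O(n) operations.

    sub[i] = T[i+1, i], diag[i] = T[i, i], sup[i] = T[i, i+1].
    No pivoting is used, which is adequate for Hermitian positive definite T.
    """
    n = len(diag)
    cp = np.zeros(max(n - 1, 0), dtype=complex)  # modified super-diagonal
    dp = np.zeros(n, dtype=complex)              # modified right-hand side
    for i in range(n):
        if i == 0:
            denom = diag[0]
            dp[0] = rhs[0] / denom
        else:
            denom = diag[i] - sub[i - 1] * cp[i - 1]
            dp[i] = (rhs[i] - sub[i - 1] * dp[i - 1]) / denom
        if i < n - 1:
            cp[i] = sup[i] / denom
    # Back substitution
    x = np.zeros(n, dtype=complex)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Hermitian, strictly diagonally dominant (hence positive definite) tridiagonal Gamma
rng = np.random.default_rng(2)
n = 8
d = 2.0 + rng.uniform(size=n)                      # diagonal entries >= 2
off = 0.9 * np.exp(1j * rng.uniform(0, 2 * np.pi, n - 1))  # |off| = 0.9
Gamma = np.diag(d).astype(complex) + np.diag(off, 1) + np.diag(off.conj(), -1)
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
x = thomas_solve(off.conj(), d, off, b)
```

In the posterior (25) the full inverse $\Gamma^{-1}$ is dense, so a linear-time solver of this kind is useful mainly when $\Gamma^{-1}$ only appears multiplied by a vector.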

B. DISCUSSION
Inserting (22) into (24) shows how the structure of $\Gamma$ affects the relation between neighboring elements of $\mu_s$ (i.e., of $\hat{s}$). Defining $w = A^H \left(\sigma^2 I_m + A \Gamma A^H\right)^{-1} z$, each row of $\mu_s = \Gamma w$ receives contributions from the sub-diagonals of $\Gamma$, i.e., from the correlations with the neighbors:
$$(\mu_s)_l = \Gamma_{l(l-1)}\, w_{l-1} + \Gamma_{ll}\, w_l + \Gamma_{l(l+1)}\, w_{l+1}.$$
Here, in addition to $\Gamma_{ll}$, both $\Gamma_{l(l-1)}$ and $\Gamma_{l(l+1)}$ contribute to $(\mu_s)_l$.
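The row-wise structure can be checked numerically. The sketch below (the helper name `mu_rowwise`, the group position, and the correlation value 0.4 are hypothetical choices) evaluates $(\mu_s)_l = \Gamma_{l(l-1)} w_{l-1} + \Gamma_{ll} w_l + \Gamma_{l(l+1)} w_{l+1}$ with $w = A^H(\sigma^2 I_m + A\Gamma A^H)^{-1} z$ and compares it with $\Gamma w$. With one correlated group, rows of $\Gamma$ outside the group are zero, so those entries of $\mu_s$ vanish, and the corner element of the group receives nothing from its zero neighbor.

```python
import numpy as np


def mu_rowwise(Gamma, w):
    """(mu_s)_l = Gamma[l,l-1] w[l-1] + Gamma[l,l] w[l] + Gamma[l,l+1] w[l+1] for tridiagonal Gamma."""
    n = len(w)
    mu = np.zeros(n, dtype=complex)
    for l in range(n):
        mu[l] = Gamma[l, l] * w[l]
        if l > 0:
            mu[l] += Gamma[l, l - 1] * w[l - 1]
        if l < n - 1:
            mu[l] += Gamma[l, l + 1] * w[l + 1]
    return mu


rng = np.random.default_rng(3)
m, n, sigma2 = 6, 10, 0.1
A = rng.standard_normal((m, n))
z = rng.standard_normal(m) + 1j * rng.standard_normal(m)
# Tridiagonal Gamma with a single correlated group on indices 2..5 and zeros elsewhere
Gamma = np.zeros((n, n), dtype=complex)
Gamma[2:6, 2:6] = np.eye(4) + 0.4 * (np.eye(4, k=1) + np.eye(4, k=-1))
w = A.conj().T @ np.linalg.solve(sigma2 * np.eye(m) + A @ Gamma @ A.conj().T, z)
mu = mu_rowwise(Gamma, w)
```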
For correlated block sparse data, the correlations with the neighboring elements (the sub-diagonal entries) are nonzero inside a group and zero outside it, including between a corner element of the group and its zero neighbor outside the group. The nonzero sub-diagonal entries encourage neighboring elements inside the group to be nonzero together. Likewise, the corners of the groups can be clearly separated: the diagonal entries of the corner elements are nonzero, but their sub-diagonal correlations with the neighboring zero elements are zero. For a single nonzero element surrounded by zeros (an isolated nonzero element), the diagonal entry (its power) is nonzero, but the corresponding sub-diagonal entries are zero. Similarly, for a zero element inside a nonzero group (an isolated zero element), the sub-diagonal correlations are zero, so it is not affected by the neighboring nonzeros.
When these elements of $\Gamma$ are used in (24), contributions occur only between consecutive nonzero elements. A zero neighbor contributes nothing, and a corner element or an isolated nonzero element contributes nothing to its zero neighbors. Only the nonzero elements within a group contribute to each other through (24), which provides the grouping effect. As a result, the proposed algorithm does not exhibit any exponential decay around the corners of the groups or around isolated zero and nonzero coefficients. Hence, unlike pattern-coupling approaches [16], [17], the proposed algorithm handles both isolated nonzero and zero elements in addition to block sparse patterns.

C. ESTIMATION OF Γ
Following the derivations in (9)-(12) to obtain an EM-based update rule for $\Gamma$, we arrive at the same objective (12), now to be maximized over a tridiagonal $\Gamma$. Since we impose a structure on $\Gamma$, we maximize (12) only with respect to the tridiagonal elements in (22). Note that the data are complex valued and $\Gamma$ is Hermitian but not symmetric, so each entry of the matrix is treated as a distinct variable. Hence, we take the derivative with respect to all tridiagonal entries and set it to zero, which leads to
$$\left[\Gamma^{-1} - \Gamma^{-1}\left(\Sigma_s + \mu_s \mu_s^H\right)\Gamma^{-1}\right]_{l'l} = 0, \quad l' \in \{l-1, l, l+1\}. \tag{26}$$
While $\Gamma$ has a tridiagonal structure, $\Gamma^{-1}$ does not have a simple structure, unlike in the diagonal case. Hence, finding a closed-form solution to (26) is difficult because the $\Gamma_{ll'}$ terms are entangled. Alternatively, iterative algorithms such as gradient ascent [27] or fixed-point iterations [28] can be used to maximize (12). However, they can be time consuming, since $\Gamma$ must be inverted several times during the update steps. Instead of solving the problem with iterative methods, we propose an approximate solution for $\Gamma$.
To obtain an approximation for the tridiagonal $\Gamma$, we first consider the update rule for $\Gamma$ without any structure and then relate it to the tridiagonal case. If all the correlations in $\Gamma$ are considered without any structure, the derivative of (12) with respect to the full matrix $\bar{\Gamma}$ is
$$\frac{\partial Q}{\partial \bar{\Gamma}} = \left[-\bar{\Gamma}^{-1} + \bar{\Gamma}^{-1}\left(\Sigma_s + \mu_s \mu_s^H\right)\bar{\Gamma}^{-1}\right]^T. \tag{27}$$
Setting this derivative to zero yields the closed-form solution
$$\bar{\Gamma} = \Sigma_s + \mu_s \mu_s^H. \tag{28}$$
Here, the notation $\bar{\Gamma}$ is used for the full $\Gamma$. However, using all the correlations does not lead to a sparse solution. In classical SBL, the sparse coefficients are assumed to be uncorrelated, which results in a diagonal $\Gamma$. Note that $\gamma$ estimated in (14) corresponds to the diagonal of $\bar{\Gamma}$ and is also the closed-form solution of (13). Likewise, in the tridiagonal case, although it is not a closed-form solution of (26), we use the following approximation for the tridiagonal elements:
$$\Gamma_{ll'} = \begin{cases} \left(\Sigma_s + \mu_s \mu_s^H\right)_{ll'}, & l' \in \{l-1, l, l+1\}, \\ 0, & \text{otherwise}. \end{cases} \tag{29}$$
This solution contains the elements on the main diagonal and on the first sub- and super-diagonals of $\bar{\Gamma}$. The intuition behind this update rule is to use the neighboring correlations that come from the full correlation matrix $\bar{\Gamma}$; since extracting the tridiagonal part of $\bar{\Gamma}$ preserves the relation between neighboring elements, we adopt this approach. However, the convergence of this update rule cannot be guaranteed, because the resulting matrix is no longer guaranteed to be positive definite. Then $\log|\Gamma|$ may be undefined or complex valued at certain points, and the increase of the Q function in (12) during the EM steps cannot be guaranteed.
To guarantee the positive definiteness of $\Gamma$ and obtain a generalized update rule, we multiply the sub-diagonals by a parameter $\beta$ to reduce their values:
$$\Gamma_{ll'} = \begin{cases} \left(\Sigma_s + \mu_s \mu_s^H\right)_{ll}, & l' = l, \\ \beta \left(\Sigma_s + \mu_s \mu_s^H\right)_{ll'}, & l' \in \{l-1, l+1\}, \\ 0, & \text{otherwise}. \end{cases} \tag{30}$$
The correlation between neighboring elements is still preserved, which produces group sparsity through the relation between neighboring zeros and nonzeros. For $\beta \in [0, 0.5]$, we empirically observe that $\Gamma$ stays positive definite and the Q function (12) does not take complex values.
In our experiments, the most effective value is $\beta = 0.5$. Although the proposed update rule for the sub-diagonal elements is heuristic, it provides a significant performance improvement for both block sparse and isolated elements.
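The update (30) reduces to a simple matrix operator. The sketch below (the helper names `tridiag_beta` and `update_gamma` are hypothetical) implements it and reproduces the problem that motivates $\beta$: for $c = [1, 1, 1]^T$, the tridiagonal part of the positive semidefinite matrix $cc^H$ with $\beta = 1$ has eigenvalues $1 - \sqrt{2}$, $1$, $1 + \sqrt{2}$ and is therefore indefinite, whereas $\beta = 0.5$ gives $1 - \sqrt{2}/2$, $1$, $1 + \sqrt{2}/2$.

```python
import numpy as np


def tridiag_beta(M, beta):
    """Keep the diagonal of M, scale its first sub/super-diagonals by beta, zero the rest (operator in eq. (30))."""
    return (np.diag(np.diag(M))
            + beta * np.diag(np.diag(M, 1), 1)
            + beta * np.diag(np.diag(M, -1), -1))


def update_gamma(Sigma_s, mu_s, beta=0.5):
    """Approximate M-step: Gamma = tridiag_beta(Sigma_s + mu_s mu_s^H)."""
    return tridiag_beta(Sigma_s + np.outer(mu_s, mu_s.conj()), beta)


# Rank-one example c c^H with c = [1, 1, 1]
c = np.ones(3)
M = np.outer(c, c.conj())
eig_beta1 = np.linalg.eigvalsh(tridiag_beta(M, 1.0))   # ascending: 1 - sqrt(2), 1, 1 + sqrt(2)
eig_beta05 = np.linalg.eigvalsh(tridiag_beta(M, 0.5))  # ascending: 1 - sqrt(2)/2, 1, 1 + sqrt(2)/2
```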
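To provide theoretical bounds on $\beta$ that guarantee the positive definiteness of $\Gamma$, we use the following proposition from [29].

Proposition 1: Let $T \in \mathbb{C}^{n \times n}$ be a Hermitian tridiagonal matrix with positive real diagonal entries $T_{ll}$. If
$$|T_{l(l+1)}|^2 < \frac{T_{ll}\, T_{(l+1)(l+1)}}{4\cos^2\!\left(\frac{\pi}{n+1}\right)}, \quad l = 1, \ldots, n-1, \tag{31}$$
then $T$ is positive definite.

Using this proposition, we can state the following theorem.

Theorem 1: For $0 \le \beta < 1 / \left(2\cos\left(\frac{\pi}{n+1}\right)\right)$, $\Gamma$ in (30) is positive definite.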
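Proof: First, we prove the statement for the second term of (30), which involves $\mu_s \mu_s^H$. Letting $\mu_s = c$, we have
$$\mathrm{tridiag}(c c^H) = \begin{bmatrix} |c_1|^2 & \beta c_1 c_2^* & & 0 \\ \beta c_2 c_1^* & |c_2|^2 & \ddots & \\ & \ddots & \ddots & \beta c_{n-1} c_n^* \\ 0 & & \beta c_n c_{n-1}^* & |c_n|^2 \end{bmatrix}, \tag{32}$$
where $\mathrm{tridiag}(\cdot)$ extracts the tridiagonal part of a given matrix and multiplies its sub-diagonals by $\beta$. For $\beta^2 < 1 / \left(4\cos^2\left(\frac{\pi}{n+1}\right)\right)$, and assuming all entries of $c$ are nonzero so that the diagonal entries $|c_l|^2$ are positive,
$$|\beta c_l c_{l+1}^*|^2 = \beta^2 |c_l|^2 |c_{l+1}|^2 < \frac{|c_l|^2 |c_{l+1}|^2}{4\cos^2\left(\frac{\pi}{n+1}\right)}, \tag{33}$$
so $\mathrm{tridiag}(c c^H)$ is positive definite by the proposition from [29].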
Now, we generalize this result to any positive definite matrix. Every positive definite matrix can be written as $CC^H = \sum_{i} c_i c_i^H$, i.e., as a sum of rank-one matrices, where $c_i$ is the $i$th column of $C$. Since $\mathrm{tridiag}(\cdot)$ is a linear operator,
$$\mathrm{tridiag}(CC^H) = \sum_{i} \mathrm{tridiag}(c_i c_i^H). \tag{34}$$
Since $\mathrm{tridiag}(\cdot)$ yields a positive definite matrix for each $c_i c_i^H$, and a sum of positive definite matrices is positive definite, $\mathrm{tridiag}(CC^H)$ is positive definite.
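Note that $\Sigma_s$ is positive definite for a positive definite $\Gamma$. Therefore, $\mathrm{tridiag}(\Sigma_s)$ is positive definite. Finally, $\Gamma = \mathrm{tridiag}(\Sigma_s) + \mathrm{tridiag}(\mu_s \mu_s^H)$ in (30) is positive definite, since it is the sum of a positive definite matrix and a positive semidefinite matrix (positive definite when all entries of $\mu_s$ are nonzero).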
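A quick numerical check of Theorem 1, assuming the bound $\beta < 1/(2\cos(\pi/(n+1)))$, is sketched below. For the all-ones vector $c$, $\mathrm{tridiag}(cc^H)$ is a tridiagonal Toeplitz matrix with eigenvalues $1 + 2\beta\cos(k\pi/(n+1))$, $k = 1, \ldots, n$, so positive definiteness is lost exactly at the bound; this shows the bound cannot be loosened. The check also confirms that the bound exceeds 0.5 for every $n$ and that $\beta = 0.5$ keeps a random Hermitian positive definite matrix positive definite. Helper names and test sizes are hypothetical.

```python
import numpy as np


def tridiag_beta(M, beta):
    """Diagonal of M plus beta-scaled first sub/super-diagonals; all other entries zero."""
    return (np.diag(np.diag(M))
            + beta * np.diag(np.diag(M, 1), 1)
            + beta * np.diag(np.diag(M, -1), -1))


def beta_max(n):
    """Bound of Theorem 1: beta < 1 / (2 cos(pi / (n + 1)))."""
    return 1.0 / (2.0 * np.cos(np.pi / (n + 1)))


def min_eig(M):
    """Smallest eigenvalue of a Hermitian matrix."""
    return np.linalg.eigvalsh(M).min()


n = 10
ones = np.ones((n, n))  # c c^H with c the all-ones vector
# Smallest eigenvalue is 1 - 2 beta cos(pi / (n + 1)), which vanishes at beta = beta_max(n)
below = min_eig(tridiag_beta(ones, 0.99 * beta_max(n)))
above = min_eig(tridiag_beta(ones, 1.01 * beta_max(n)))

# A random Hermitian positive definite matrix with beta = 0.5
rng = np.random.default_rng(4)
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
random_pd = min_eig(tridiag_beta(C @ C.conj().T, 0.5))
```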
Remark 1: By Theorem 1, $\beta < 1 / \left(2\cos\left(\frac{\pi}{n+1}\right)\right)$ guarantees the positive definiteness of $\Gamma$. To choose a safe range for $\beta$, we keep it in $[0, 0.5]$ for values of $n > 50$; note that the bound exceeds 0.5 for every $n$ and approaches 0.5 as $n$ grows. We keep $\beta$ nonnegative so as not to change the sign of the correlation between neighboring elements.

Remark 2: It is important to show that the proposed update rule increases the Q function (12) in every iteration, because the generalized EM theory [30] then applies: any hyperparameter update rule that ensures that the Q function is non-decreasing in each EM iteration ensures convergence of the EM iterations to a local maximum or saddle point. For $\beta \in [0, 0.5]$ in (30), we empirically observe that the Q function in (12) increases in each iteration, and that the increase per EM iteration is larger than with the update in (14). Although a theoretical convergence proof is unavailable, we have not encountered a case in which the algorithm failed to converge over a large number of simulation trials with various problem models.
Remark 3: As our simulation results demonstrate, the choice of $\beta$ has a modest effect on the recovery performance. Although our simulations suggest that a nonzero $\beta$ mostly improves the performance compared with $\beta = 0$, the best choice of $\beta$ appears to be around 0.5. However, when the signal structure is not known in advance, $\beta$ may be adjusted to the structure of the data by choosing another value in the range $\beta \in [0, 0.5]$.
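D. ESTIMATION OF σ²
The noise variance is estimated within the same EM framework by maximizing the first term of (10):
$$\mathbb{E}_{s \mid z; \Gamma, \sigma^2}\left[\log p(z \mid s; \sigma^2)\right] = -m \log \sigma^2 - \frac{\mathbb{E}\left[\|z - As\|_2^2\right]}{\sigma^2} + c = -m \log \sigma^2 - \frac{\|z - A\mu_s\|_2^2 + \mathrm{tr}\left(A \Sigma_s A^H\right)}{\sigma^2} + c, \tag{35}$$
where $c$ collects the terms independent of $\sigma^2$. The second equality can be derived as follows:
$$\mathbb{E}\left[\|z - As\|_2^2\right] = \|z - A\mu_s\|_2^2 + \mathrm{tr}\left(A\left(\mathbb{E}[ss^H] - \mu_s\mu_s^H\right)A^H\right) = \|z - A\mu_s\|_2^2 + \mathrm{tr}\left(A \Sigma_s A^H\right). \tag{36}$$
Then, setting the derivative of (35) with respect to $\sigma^2$ to zero, we obtain the update
$$\sigma^2 = \frac{\|z - A\mu_s\|_2^2 + \mathrm{tr}\left(A \Sigma_s A^H\right)}{m}. \tag{37}$$
The estimate of $s$ is finally given by $\mu_s$ in (24), obtained by iteratively computing the $\Gamma_{ll'}$ in (30) and $\sigma^2$ in (37) until convergence.

Algorithm 1: Correlated SBL
Input: $A$, $z$, $\beta \in [0, 0.5]$
Initialize: $\sigma^2 = 1$, $\mathrm{diag}(\Gamma) = 1$, $\epsilon_{\min} = 0.001$, $J_{\max} = 100$, $j = 0$, $\epsilon = \infty$
while $j < J_{\max}$ and $\epsilon_{\min} < \epsilon$ do
    compute $\mu_s$ and $\Sigma_s$ from (24) and (25)
    update $\Gamma$ with (30) and $\sigma^2$ with (37)
    $\epsilon = \max_l |(\mu_s)_l - (\mu_s^{\mathrm{prev}})_l|$, $j = j + 1$
end while
Output: $\hat{s} = \mu_s$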
The iterative steps of the proposed algorithm are summarized in Algorithm 1. As the stopping criterion, we require that the maximum absolute difference between two successive estimates of $s$ be smaller than a threshold, or that the number of iterations reach the maximum number of iterations.
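Putting the pieces together, the NumPy sketch below mirrors the iteration described above: posterior moments from (24) and (25), the $\beta$-scaled tridiagonal update of $\Gamma$, the EM noise-variance update $\sigma^2 = (\|z - A\mu_s\|_2^2 + \mathrm{tr}(A\Sigma_s A^H))/m$, and the stopping rule on the maximum absolute change of $\mu_s$. The function name `csbl`, the use of a dense matrix inverse instead of a tridiagonal solver, and the test problem (two blocks of length 5 at arbitrary positions) are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def tridiag_beta(M, beta):
    """Diagonal of M plus beta-scaled first sub/super-diagonals; all other entries zero."""
    return (np.diag(np.diag(M))
            + beta * np.diag(np.diag(M, 1), 1)
            + beta * np.diag(np.diag(M, -1), -1))


def csbl(A, z, beta=0.5, eps_min=1e-3, j_max=100):
    """Correlated SBL sketch: tridiagonal prior covariance Gamma learned by approximate EM."""
    m, n = A.shape
    Gamma = np.eye(n, dtype=complex)  # diag(Gamma) = 1
    sigma2 = 1.0
    mu = np.zeros(n, dtype=complex)
    for _ in range(j_max):
        Sigma_z = sigma2 * np.eye(m) + A @ Gamma @ A.conj().T
        K = Gamma @ A.conj().T @ np.linalg.inv(Sigma_z)
        mu_new = K @ z                   # posterior mean, eq. (24)
        Sigma_s = Gamma - K @ A @ Gamma  # eq. (25) via the matrix inversion lemma
        # Approximate M-step for Gamma with beta-scaled sub-diagonals
        Gamma = tridiag_beta(Sigma_s + np.outer(mu_new, mu_new.conj()), beta)
        # EM update of the noise variance
        r = z - A @ mu_new
        sigma2 = (np.vdot(r, r).real + np.trace(A @ Sigma_s @ A.conj().T).real) / m
        # Stop once the largest change in the estimate of s falls below eps_min
        done = np.max(np.abs(mu_new - mu)) < eps_min
        mu = mu_new
        if done:
            break
    return mu, Gamma, sigma2


rng = np.random.default_rng(5)
m, n = 50, 80
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)
s = np.zeros(n, dtype=complex)
for start in (10, 50):  # two blocks of length 5 at illustrative positions
    s[start:start + 5] = (rng.standard_normal(5) + 1j * rng.standard_normal(5)) / np.sqrt(2)
z = A @ s + 0.01 * (rng.standard_normal(m) + 1j * rng.standard_normal(m)) / np.sqrt(2)
mu, Gamma, sigma2 = csbl(A, z)
```

Apart from the tridiagonal extraction, each iteration is the same as in classical SBL, which is consistent with the complexity discussion that follows.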

E. COMPUTATIONAL COMPLEXITY
The computational complexity of the proposed algorithm is similar to that of classical SBL, PCSBL, and CSBL. The main computational task in each iteration is the calculation of the covariance matrix $\Sigma_s$, which requires inverting an $n \times n$ matrix. Using the matrix inversion lemma [31], this inversion can be converted into the inversion of an $m \times m$ matrix, so the computational complexity is of order $O(m^3)$. The computation of the tridiagonal $\Gamma$ slightly increases the cost, but its effect on the overall computational complexity is negligible.
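The effect of the matrix inversion lemma on (25) can be illustrated numerically. The sketch below, with hypothetical sizes and a hypothetical diagonally dominant tridiagonal $\Gamma$, computes $\Sigma_s = (\Gamma^{-1} + \sigma^{-2}A^H A)^{-1}$ directly with $n \times n$ inversions and again as $\Gamma - \Gamma A^H(\sigma^2 I_m + A\Gamma A^H)^{-1}A\Gamma$, which only requires solving with an $m \times m$ matrix.

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, sigma2 = 8, 30, 0.05
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
# Hermitian, strictly diagonally dominant tridiagonal Gamma (hence positive definite)
off = 0.45 * np.exp(1j * rng.uniform(0, 2 * np.pi, n - 1))
Gamma = np.eye(n) + np.diag(off, 1) + np.diag(off.conj(), -1)

# Direct form of (25): two n x n inversions
Sigma_direct = np.linalg.inv(np.linalg.inv(Gamma) + A.conj().T @ A / sigma2)

# Matrix inversion lemma: only an m x m system is solved
Sigma_z = sigma2 * np.eye(m) + A @ Gamma @ A.conj().T
Sigma_lemma = Gamma - Gamma @ A.conj().T @ np.linalg.solve(Sigma_z, A @ Gamma)
```

The second form never forms $\Gamma^{-1}$, which is why the lack of structure in $\Gamma^{-1}$ does not affect the per-iteration cost.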

IV. RELATION TO EBSBL
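To show the relation between our method and EBSBL, we consider EBSBL with $h = 2$, and hence $g = n - 1$. In this case, $x_i = [x_i^{(1)}, x_i^{(2)}]^T$, where the superscript $(\cdot)$ denotes the entry index in $x_i$, and (15) gives $s_1 = x_1^{(1)}$, $s_i = x_{i-1}^{(2)} + x_i^{(1)}$ for $2 \le i \le n-1$, and $s_n = x_{n-1}^{(2)}$. Unlike EBSBL, where a single $B$ is shared by all blocks, we use a different $B_i$ for each block. We then have $\mathbb{E}[x_i x_j^H] = \delta_{i,j} \beta_i B_i$, where $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ otherwise. To avoid ambiguities, we also take $\beta_i = 1$. In our approach, we assume that each $s_i$ is correlated with its neighboring elements $s_{i-1}$ and $s_{i+1}$ and ignore the other correlations. Interpreting $\mathbb{E}[ss^H]$ in (22) in terms of the $B_i$ matrices, the entries of our tridiagonal matrix are
$$\Gamma_{ii} = B_i^{(11)} + B_{i-1}^{(22)}, \qquad \Gamma_{i(i+1)} = B_i^{(12)}, \qquad \Gamma_{(i+1)i} = B_i^{(21)}, \tag{38}$$
where $B_i^{(\cdot)}$ denotes the $(\cdot)$th entry of the matrix $B_i$, with the convention $B_0 = B_n = 0$. Here, the cross terms between different blocks vanish, because $\mathbb{E}[x_i x_j^H] = 0$ for $i \neq j$. The equations in (38) can also be written as
$$\mathbb{E}\left[s_{i:i+1} s_{i:i+1}^H\right] = \begin{bmatrix} B_i^{(11)} + B_{i-1}^{(22)} & B_i^{(12)} \\ B_i^{(21)} & B_{i+1}^{(11)} + B_i^{(22)} \end{bmatrix}, \tag{39}$$
where $s_{i:j}$ represents the elements of $s$ from $i$ to $j$. Therefore, our model can be interpreted as an alternative to EBSBL with $h = 2$ when there is a separate correlation matrix $B_i$ for each group.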
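The correspondence (38) can be checked by forming $\mathbb{E}[ss^H] = \sum_i E_i B_i E_i^T$ for randomly drawn $2 \times 2$ Hermitian positive semidefinite $B_i$ (the $B_i$ values are arbitrary test data, and the code uses 0-based indices). The result is tridiagonal, each diagonal entry collects $B_i^{(11)} + B_{i-1}^{(22)}$ from the two overlapping blocks, and the off-diagonal entries are $B_i^{(12)}$ and $B_i^{(21)}$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 6
# Hypothetical per-block 2 x 2 Hermitian PSD covariances B_i (0-based i = 0..n-2), beta_i = 1
Bs = []
for _ in range(n - 1):
    C = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    Bs.append(C @ C.conj().T)

# E[s s^H] = sum_i E_i B_i E_i^T, since blocks are uncorrelated and s = sum_i E_i x_i
Gamma = np.zeros((n, n), dtype=complex)
for i, B in enumerate(Bs):
    Gamma[i:i + 2, i:i + 2] += B  # same as adding E_i B_i E_i^T
```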
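To show the equivalence between the MAP estimates $\hat{s}$ of EBSBL and of the proposed method, we examine the update of $\hat{s}$ in (20), which follows from (18), and the update of $\hat{s}$ in (24). With $\Phi = [AE_1, \ldots, AE_{n-1}] = [a_1\ a_2, a_2\ a_3, \ldots, a_{n-1}\ a_n]$, the first parts of (24) and (18) are
$$\Gamma A^H = \begin{bmatrix} B_1^{(11)} a_1^H + B_1^{(12)} a_2^H \\ B_1^{(21)} a_1^H + \left(B_1^{(22)} + B_2^{(11)}\right) a_2^H + B_2^{(12)} a_3^H \\ \vdots \\ B_{n-2}^{(21)} a_{n-2}^H + \left(B_{n-2}^{(22)} + B_{n-1}^{(11)}\right) a_{n-1}^H + B_{n-1}^{(12)} a_n^H \\ B_{n-1}^{(21)} a_{n-1}^H + B_{n-1}^{(22)} a_n^H \end{bmatrix}, \tag{40}$$
$$\Gamma_0 \Phi^H = \begin{bmatrix} B_1^{(11)} a_1^H + B_1^{(12)} a_2^H \\ B_1^{(21)} a_1^H + B_1^{(22)} a_2^H \\ \vdots \\ B_{n-1}^{(11)} a_{n-1}^H + B_{n-1}^{(12)} a_n^H \\ B_{n-1}^{(21)} a_{n-1}^H + B_{n-1}^{(22)} a_n^H \end{bmatrix}, \tag{41}$$
where $a_i$ denotes the $i$th column of $A$. It is straightforward to see that $\Gamma A^H = \sum_{i=1}^{n-1} E_i \left(\Gamma_0 \Phi^H\right)_i$, where $(\Gamma_0 \Phi^H)_i$ denotes the $i$th block of two rows of $\Gamma_0 \Phi^H$. Next, we show the equivalence of the second parts of (18) and (24), i.e., of $(\sigma^2 I_m + \Phi \Gamma_0 \Phi^H)^{-1} z$ and $(\sigma^2 I_m + A \Gamma A^H)^{-1} z$. Using the block diagonal structure of $\Gamma_0$, we obtain
$$\Phi \Gamma_0 \Phi^H = \sum_{i=1}^{n-1}\left(a_i B_i^{(11)} a_i^H + a_{i+1} B_i^{(21)} a_i^H + a_i B_i^{(12)} a_{i+1}^H + a_{i+1} B_i^{(22)} a_{i+1}^H\right), \tag{42}$$
and using the tridiagonal structure of $\Gamma$ in (38), we obtain
$$A \Gamma A^H = \sum_{i=1}^{n} a_i \left(B_{i-1}^{(22)} + B_i^{(11)}\right) a_i^H + \sum_{i=1}^{n-1}\left(a_i B_i^{(12)} a_{i+1}^H + a_{i+1} B_i^{(21)} a_i^H\right). \tag{43}$$
The expressions in (42) and (43) are equal. Therefore, the MAP estimates $\hat{s}$ are the same for both algorithms.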
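The equivalence of the two MAP estimates can be confirmed numerically for $h = 2$. The sketch below draws hypothetical $2 \times 2$ block covariances $B_i$, computes $\hat{s}$ the EBSBL way through $\Phi = [AE_1, \ldots, AE_{n-1}]$ and the block-diagonal $\Gamma_0$, followed by $\hat{s} = \sum_i E_i \mu_{x_i}$, and compares it with $\Gamma A^H(\sigma^2 I_m + A\Gamma A^H)^{-1}z$ for the tridiagonal $\Gamma$ assembled from the same $B_i$. All sizes and random data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, sigma2 = 7, 9, 0.1
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
z = rng.standard_normal(m) + 1j * rng.standard_normal(m)

# Hypothetical 2 x 2 block covariances B_i (h = 2, g = n - 1 blocks, beta_i = 1)
Bs = []
for _ in range(n - 1):
    C = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
    Bs.append(C @ C.conj().T)

# EBSBL side: Phi = [A E_1, ..., A E_g] and block-diagonal Gamma_0
Phi = np.hstack([A[:, i:i + 2] for i in range(n - 1)])
Gamma0 = np.zeros((2 * (n - 1), 2 * (n - 1)), dtype=complex)
for i, B in enumerate(Bs):
    Gamma0[2 * i:2 * i + 2, 2 * i:2 * i + 2] = B
mu_x = Gamma0 @ Phi.conj().T @ np.linalg.solve(sigma2 * np.eye(m) + Phi @ Gamma0 @ Phi.conj().T, z)
s_ebsbl = np.zeros(n, dtype=complex)
for i in range(n - 1):  # s_hat = sum_i E_i mu_{x_i}, eq. (20)
    s_ebsbl[i:i + 2] += mu_x[2 * i:2 * i + 2]

# Proposed side: tridiagonal Gamma assembled from the same B_i, eq. (38)
Gamma = np.zeros((n, n), dtype=complex)
for i, B in enumerate(Bs):
    Gamma[i:i + 2, i:i + 2] += B
s_tri = Gamma @ A.conj().T @ np.linalg.solve(sigma2 * np.eye(m) + A @ Gamma @ A.conj().T, z)
```

Algebraically, with $E = [E_1, \ldots, E_{n-1}]$ one has $\Phi = AE$ and $\Gamma = E\Gamma_0 E^T$, so the two computations coincide; the code only confirms this in floating point.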
The difference between the algorithms lies in the update rules for $B_i$ in EBSBL and in the proposed algorithm. The update rules for EBSBL are given in [12]. Normally, in EBSBL, $\beta_i$ is also learned during the iterations, but once we assume $\beta_i = 1$, we skip that step and only iterate over $B_i$ and $\sigma^2$. Our update rule can be considered a counterpart of EBSBL without averaging the $B_i$s:
$$ B_i = \Sigma_{x_i} + \mu_{x_i} \mu_{x_i}^H, $$
where $\Sigma_{x_i} \in \mathbb{C}^{2 \times 2}$ corresponds to the $i$th diagonal block of $\Sigma_x$ in (19) and $\mu_{x_i} \in \mathbb{C}^2$ is the $i$th block of $\mu_x$ in (18). We already showed that $\mu_s = \sum_{i=1}^{n-1} E_i \mu_{x_i}$. However, we cannot obtain the tridiagonal part of $\mu_s \mu_s^H$, which is included in our update rule in (29), from the overlapping block-diagonal sum of the blocks of $\mu_x \mu_x^H$ in the way we obtained (39), because $\mu_s \mu_s^H$ also contains the cross terms $E_i \mu_{x_i} \mu_{x_j}^H E_j^T$ with $i \neq j$, several of which fall inside the tridiagonal band. Another difference between our algorithm and EBSBL is in the updates of $\Sigma_x$ in (19) and $\Sigma_s$ in (25), owing to the inversion terms. Besides, the outermost inverse term entangles the relationship between $\Sigma_x$ and $\Sigma_s$.
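Why the tridiagonal part of $\mu_s \mu_s^H$ cannot be assembled from the blocks $\mu_{x_i}\mu_{x_i}^H$ alone can be seen in a small example. The sketch below uses random block means (hypothetical values, 0-based indexing) and compares the tridiagonal part of $\mu_s\mu_s^H$ with the overlapping block-diagonal sum; it also confirms that the difference is exactly the in-band part of the cross terms with $i \neq j$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
# Hypothetical posterior means of the n - 1 overlapping blocks x_i (each in C^2).
mu_blocks = [rng.standard_normal(2) + 1j * rng.standard_normal(2) for _ in range(n - 1)]

def place(i, v, n):
    """E_i v (0-based): put a length-2 vector at entries i and i+1 of a length-n vector."""
    out = np.zeros(n, dtype=complex)
    out[i:i + 2] = v
    return out

mu_s = sum(place(i, v, n) for i, v in enumerate(mu_blocks))  # mu_s = sum_i E_i mu_{x_i}

band = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1
target = np.where(band, np.outer(mu_s, mu_s.conj()), 0)  # tridiagonal part of mu_s mu_s^H

# Overlapping block-diagonal sum of mu_{x_i} mu_{x_i}^H (the construction used for (39)).
overlap = np.zeros((n, n), dtype=complex)
for i, v in enumerate(mu_blocks):
    overlap[i:i + 2, i:i + 2] += np.outer(v, v.conj())

# Cross terms E_i mu_{x_i} mu_{x_j}^H E_j^T with i != j.
cross = np.zeros((n, n), dtype=complex)
for i, u in enumerate(mu_blocks):
    for j, v in enumerate(mu_blocks):
        if i != j:
            cross += np.outer(place(i, u, n), place(j, v, n).conj())

print(np.allclose(target, overlap))                               # differs for generic means
print(np.allclose(target, overlap + np.where(band, cross, 0)))    # the in-band cross terms close the gap
```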
Note that the number of unknown variables in EBSBL with h = 2 is almost twice that of the proposed algorithm, and this reduction in the number of unknowns contributes to the improved performance of the proposed algorithm. Lastly, the computational complexity of the proposed method is around $O(h^3) = O(2^3)$ times lower than that of EBSBL.

V. NUMERICAL RESULTS
In this section, we conduct numerical experiments to evaluate the performance of the proposed algorithm in comparison with existing methods. The performance of the algorithms is examined on both synthetic and real data. The benchmark algorithms compared with the proposed method are SBL [24], BSBL [12], EBSBL [12], CSBL [17], and PCSBL [16]. For EBSBL, we use EBSBL-BO, the bound-optimization variant presented in [12], since it is used there as the reference method for comparison and is reported to perform similarly to EBSBL-EM, the expectation-maximization variant.

A. SYNTHETIC DATA AND SYNTHETIC SYSTEM MATRIX
The measurement matrix A is randomly generated with each entry independently drawn from a normal distribution, and its columns are normalized to unit norm. Likewise, the nonzero coefficients of s are drawn from a complex normal distribution. Complex Gaussian white noise is added with a signal-to-noise ratio of $\mathrm{SNR(dB)} = 20 \log_{10}(\|As\|_2 / \|n\|_2)$.
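The data model above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' generator: the entries of A are taken as real Gaussian (the text only says "normal distribution"), the noise is rescaled so that each realization hits the target SNR exactly, and the function names are hypothetical. The support here is unstructured purely for brevity; the block-structured supports are described later in this subsection.

```python
import numpy as np

def make_measurement_matrix(m, n, rng):
    """Gaussian m x n matrix with unit-norm columns (real entries assumed)."""
    A = rng.standard_normal((m, n))
    return A / np.linalg.norm(A, axis=0, keepdims=True)

def add_noise_at_snr(clean, snr_db, rng):
    """Add complex white Gaussian noise scaled so that 20*log10(||clean||_2 / ||noise||_2) = snr_db."""
    noise = rng.standard_normal(clean.shape) + 1j * rng.standard_normal(clean.shape)
    noise *= np.linalg.norm(clean) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
    return clean + noise, noise

rng = np.random.default_rng(0)
m, n, K = 40, 100, 25
A = make_measurement_matrix(m, n, rng)

# Complex normal nonzero coefficients on a random (unstructured, for brevity) support.
s = np.zeros(n, dtype=complex)
support = rng.choice(n, K, replace=False)
s[support] = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)

y, noise = add_noise_at_snr(A @ s, 20.0, rng)
snr = 20 * np.log10(np.linalg.norm(A @ s) / np.linalg.norm(noise))
print(round(snr, 6))
```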
We use the success rate and the support recovery rate in the noiseless case for performance evaluation, and the normalized mean squared error (NMSE) in the noisy cases. The NMSE is calculated by averaging the normalized squared errors $\|s - \hat{s}\|_2^2 / \|s\|_2^2$. The success rate is defined as the percentage of successful trials in a total of T independent runs, where a successful trial is one with NMSE less than $10^{-3}$. A total of T = 100 independent trials are conducted. In addition, to assess the identification of the true support of the sparse signals, we consider the "pattern recovery success rate". Similar to the regular success rate, it is the ratio of the number of successful trials to the total number of independent runs; however, a trial is considered successful if the support of the block-sparse signal is recovered. A coefficient whose magnitude is less than $10^{-2}$ is treated as zero when calculating the pattern recovery success rate (but not the regular success rate).
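The three metrics can be computed as follows. This is a minimal sketch with hypothetical helper names; it assumes the true support is the set of exactly nonzero entries of s, while the estimated support uses the $10^{-2}$ magnitude threshold. The toy example shows that a trial can count as a success under the NMSE criterion while failing the pattern criterion.

```python
import numpy as np

def nmse(s_true, s_hat):
    """Normalized squared error ||s - s_hat||_2^2 / ||s||_2^2 for one trial."""
    return np.linalg.norm(s_true - s_hat) ** 2 / np.linalg.norm(s_true) ** 2

def is_success(s_true, s_hat, tol=1e-3):
    """Regular success: NMSE below 1e-3 (no thresholding of small coefficients)."""
    return nmse(s_true, s_hat) < tol

def is_pattern_success(s_true, s_hat, zero_thresh=1e-2):
    """Pattern success: estimated support (|coef| >= 1e-2) equals the true support (s != 0)."""
    return np.array_equal(s_true != 0, np.abs(s_hat) >= zero_thresh)

def summarize(trials):
    """trials: list of (s_true, s_hat) pairs from T independent runs."""
    errs = np.array([nmse(s, sh) for s, sh in trials])
    return {
        "nmse": errs.mean(),
        "success_rate": 100.0 * np.mean([is_success(s, sh) for s, sh in trials]),
        "pattern_success_rate": 100.0 * np.mean([is_pattern_success(s, sh) for s, sh in trials]),
    }

# Toy example: the second estimate has a small spurious coefficient (0.05).
s = np.array([1.0, 0.0, 2.0, 0.0])
s_hat1 = np.array([1.0001, 0.005, 1.9999, 0.0])
s_hat2 = np.array([1.0, 0.05, 2.0, 0.0])
trials = [(s, s_hat1), (s, s_hat2)]
print(summarize(trials))
```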
For the proposed method, ε = 0.001 and $J_{\max} = 100$ are used in Algorithm 1 in our experiments, and the same values are used for PCSBL and CSBL. On the other hand, PCSBL relies on the hierarchical parameters a and b of the prior p(α | a, b). Note that the choice of the PCSBL hyperparameter a dramatically affects the algorithm's performance [17]. We therefore set a = 0.5 in all the following experiments, which yields its best performance, and $b = 10^{-10}$. For a fair comparison, the block size h is first set to 2 for both BSBL and EBSBL. We also consider h = 4 for BSBL and EBSBL, as it is used for comparison in the literature [12], [16]. Note that all algorithms are modified to handle complex-valued data.
For the proposed algorithm, we use the update rule in (30) with β = 0.5. To choose β, we observe the behavior of the Q function in (12) during the EM iterations, both for SBL and for the proposed method with different β values. For a sample realization, the increase of the objective function during the EM iterations is shown in Fig. 1.
In the numerical simulations, sparse signals of dimension n = 100 with K = 25 nonzero coefficients are partitioned into L = 5 blocks with random sizes and arbitrary locations. These groups are generated in the same way as in [16]. Here, the group sizes are likely to be larger than two, and the nonzero groups are separated from each other. Hence, in this setting, the chance of isolated zeros and nonzeros occurring among the groups is very small. We consider both correlated sources, which match our tridiagonal correlation assumption, and uncorrelated sources. The correlated sources are created as $s = R^{1/2} w$, where w is complex random noise with unit variance, and
$$ R_{ij} = \begin{cases} 1, & i = j, \\ c, & |i - j| = 1, \\ 0, & \text{otherwise}, \end{cases} $$
with c = 0.3. That is, R is a tridiagonal correlation matrix, so only the correlations between neighbors are considered. In the second setting, the sources are uncorrelated (c = 0), to assess the robustness of the algorithms in that setting.
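A block-sparse source with tridiagonal correlation can be sketched as follows. The block generator is a hypothetical stand-in, not the procedure of [16]: it splits K into L random positive block sizes and places the blocks at random locations with at least one zero between neighboring blocks. It is also an assumption that $R^{1/2} w$ is drawn over all n entries and then masked by the support; with that choice, adjacent active entries have correlation c and entries two apart are uncorrelated. Setting c = 0 gives R = I, i.e., the uncorrelated setting.

```python
import numpy as np

def random_block_support(n, K, L, rng):
    """Hypothetical generator: L non-adjacent blocks with random sizes summing to K."""
    # Random composition of K into L positive block sizes.
    cuts = np.sort(rng.choice(np.arange(1, K), L - 1, replace=False))
    sizes = np.diff(np.concatenate(([0], cuts, [K])))
    # Distribute the n - K zeros into L + 1 gaps, keeping at least one zero between blocks.
    free = n - K - (L - 1)
    bars = np.sort(rng.choice(free + L, L, replace=False))
    gaps = np.diff(np.concatenate(([-1], bars, [free + L]))) - 1
    gaps[1:-1] += 1
    support = np.zeros(n, dtype=bool)
    pos = gaps[0]
    for size, gap in zip(sizes, gaps[1:]):
        support[pos:pos + size] = True
        pos += size + gap
    return support

def tridiagonal_correlation(n, c):
    """R with ones on the diagonal, c on the first off-diagonals, and zeros elsewhere."""
    return np.eye(n) + c * (np.eye(n, k=1) + np.eye(n, k=-1))

def correlated_block_sparse(n, K, L, c, rng):
    """s = R^{1/2} w masked to a random block support (masking is an assumption)."""
    support = random_block_support(n, K, L, rng)
    lam, V = np.linalg.eigh(tridiagonal_correlation(n, c))
    R_half = V @ np.diag(np.sqrt(np.clip(lam, 0.0, None))) @ V.T  # symmetric square root
    w = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # unit variance
    s = np.where(support, R_half @ w, 0)
    return s, support

rng = np.random.default_rng(0)
s, support = correlated_block_sparse(n=100, K=25, L=5, c=0.3, rng=rng)
print(int(support.sum()), np.flatnonzero(np.diff(support.astype(int)) == 1) + 1)
```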

1) PERFORMANCE IN NOISELESS ENVIRONMENTS
The success rates of exact recovery of the different algorithms in the noiseless case (SNR = 100 dB) are compared as functions of the number of measurements m. Fig. 2 gives the success rates of the different algorithms against the number of measurements m and the sparsity level K for both correlated and uncorrelated data. Simulation results with correlated complex-valued data in Figs. 2(a) and (b) show that the proposed method outperforms all other methods in terms of success rate. Note that the main counterpart of our method is EBSBL (h = 2), and there is a significant gap in success rate between EBSBL and the proposed method. Furthermore, our method has lower computational complexity. Surprisingly, the success rate of CSBL is lower than those of the other algorithms, although CSBL has been reported to perform similarly to PCSBL [17]. It still performs well in terms of NMSE, but since the success threshold is preset to $10^{-3}$, its success rate is low. If we set the threshold to $10^{-1}$, the success rate of CSBL would be higher. However, for the noiseless case, $10^{-3}$ seems to be a good choice for evaluating the performance of the algorithms. On the other hand, BSBL might not be a suitable algorithm for data with varying block sizes and varying block partition locations: while BSBL performs well on data that fit its block partition assumption, it does not perform well on other data. With uncorrelated data (Figs. 2(c) and (d)), the proposed method again has a higher success rate than the existing methods.
Compared to the correlated case in Fig. 2, EBSBL performs worse with uncorrelated data. This is expected, as it directly uses the correlations. However, surprisingly, the performances of our method and PCSBL do not change significantly. While we exploit the correlations of the neighboring elements, PCSBL assumes them to be uncorrelated and enforces statistical dependence between the sparsity patterns. Although these two methods have different motivations for handling correlated neighboring coefficients, neither appears to be sensitive to the level of correlation.
In the support recovery rate graphs in Fig. 3, the algorithms behave similarly to their success rates in Fig. 2. Although we observe small variations across the methods, the proposed method again outperforms the existing methods.

2) PERFORMANCE IN NOISY ENVIRONMENTS
The NMSE performances of the different algorithms against m and the sparsity level K are given in Fig. 4. Under noisy conditions, the NMSEs of all algorithms consistently decrease as the number of measurements m increases and as the sparsity level K decreases. PCSBL with a = 0.5 achieves the lowest NMSE, and the proposed method does not reach the best performance of PCSBL. Note that the tested data consist of sparse groups, and the chance of isolated zeros and nonzeros in the data is very small. Unlike in the noiseless case, CSBL handles noise better than the other correlation-based methods. However, in the noiseless case, it does not provide a perfect estimate of s, whereas the other methods can produce a very low reconstruction error and a high success rate.
Illustrative examples of sparse coefficient recovery by the different algorithms with m = 40 measurements are given in Fig. 5. PCSBL and our proposed method provide the most accurate estimates of the original sparse coefficients with fewer measurements, especially for the significant elements inside blocks. Looking closely at CSBL and PCSBL, we observe a smooth decay around the corners of the groups. This effect is stronger in CSBL owing to the modeling difference between CSBL and PCSBL. On the other hand, although the proposed method is able to reconstruct sharp edges, it and the other correlation-based methods reconstruct some spurious off-group elements. This possibly arises from the effect of the noise correlation. In CSBL and PCSBL, the edges are smooth and the group boundaries are indistinct, but we do not observe reconstructed elements far from the groups.
In addition to testing with group sparse data, we test our algorithm in a new setting where each group has an isolated zero element inside it. In this scenario, the proposed method achieves the best NMSE performance at 20 dB SNR, as shown in Figs. 6(a) and (b) for varying m and K, respectively.
A random design of the system matrix A is not realistic for evaluating the performance of these algorithms. Hence, we also tested and compared their performance on a simple DOA and amplitude estimation problem. Here, we consider an array with various numbers of elements. The DOAs lie on the angular grid [−90° : 0.5° : 90°], so n = 361. The noise is modeled as i.i.d. complex Gaussian. We examine a scenario with K = 25 random sources in L = 5 random groups. Each DOA group collects sources $s_l$ with random complex amplitudes. The sources are correlated, with c = 0.5.
The NMSE performances of the different algorithms against the number of measurements m and the sparsity level K are given in Figs. 9 and 10 for 20 dB and 10 dB SNR, respectively. In this realistic setting, where A is a DOA matrix rather than a random one, the best performance is attained by the proposed algorithm, especially in the 20 dB SNR case. In the 10 dB SNR case, the performance of the proposed algorithm is still comparable to or better than that of the state-of-the-art methods. Note that the DOA matrix A has a higher coherence than a randomly generated A. Illustrative examples of sparse coefficient recovery by the different algorithms with m = 40 measurements are given in Fig. 8. Here, the proposed method provides the most accurate estimates of the original sparse coefficients. Looking closely at CSBL and PCSBL, we again observe a smooth decay around the corners of the groups, whereas the proposed method is able to reconstruct sharp edges.
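The claim that the DOA matrix is more coherent than a random one can be illustrated numerically. The array geometry is not specified here, so the sketch assumes a uniform linear array with half-wavelength spacing and m = 40 elements (assumptions), builds the steering matrix on the grid [−90° : 0.5° : 90°], and compares its mutual coherence with that of a Gaussian matrix of the same size. Under this assumed geometry, neighboring grid columns are already highly correlated, and the two endfire columns (±90°) coincide.

```python
import numpy as np

def ula_steering_matrix(m, angles_deg, spacing=0.5):
    """Hypothetical ULA: m elements, spacing in wavelengths, unit-norm columns."""
    theta = np.deg2rad(angles_deg)
    idx = np.arange(m)[:, None]
    return np.exp(1j * 2 * np.pi * spacing * idx * np.sin(theta)[None, :]) / np.sqrt(m)

def mutual_coherence(A):
    """Largest absolute normalized inner product between two distinct columns."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.conj().T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

grid = np.arange(-90, 90.5, 0.5)  # [-90 : 0.5 : 90] degrees, 361 grid points
m = 40
A_doa = ula_steering_matrix(m, grid)
rng = np.random.default_rng(0)
A_rand = rng.standard_normal((m, grid.size))

print(grid.size, mutual_coherence(A_doa), mutual_coherence(A_rand))
```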

VI. CONCLUSION AND DISCUSSION
In this paper, we have proposed a correlated sparse Bayesian learning algorithm for block sparse signals with arbitrary block sizes and locations under the Bayesian framework. It is a simpler alternative to EBSBL, and we have explained the underlying relationship between the proposed method and a particular case of EBSBL (h = 2). The proposed algorithm uses the fact that immediate neighboring sparse coefficients are correlated. Unlike the diagonal correlation matrix in conventional SBL, the unknown correlation matrix has a tridiagonal structure to capture the correlation with neighbors. Due to the entanglement of the elements in the inverse tridiagonal matrix, an approximate solution is proposed instead of a direct closed-form solution. The sparse reconstruction performance of the algorithm is evaluated with both correlated and uncorrelated block sparse coefficients. Comprehensive simulations demonstrate that the proposed algorithm outperforms CSBL, PCSBL, and other correlation-based methods such as EBSBL in terms of reconstruction quality. The numerical results also show that the proposed correlated SBL algorithm is capable of recovering signals with both block patterns and isolated coefficients.
Fig. 1(a) shows the reconstructed data for SBL and the proposed method in the noiseless case, and Fig. 1(b) shows the corresponding values of the Q function during the EM iterations. Note that the proposed method further increases the value of the objective function compared to classical SBL for β ∈ [0, 0.5]. Since the largest increase is observed with β = 0.5 across several realizations, we select β = 0.5 to test our algorithm.

FIGURE 1.(a) Magnitudes of reconstructed group sparse data with K = 25, m = 35 and n = 100 for SBL and the proposed method with noiseless data and (b) the value of the Q function for SBL and the proposed one.

FIGURE 2. Success rate performance comparison of the sparse Bayesian learning algorithms with correlated (a) and uncorrelated (c) noiseless data for different sizes of the measurements m; with correlated (b) and uncorrelated (d) noiseless data for different sparsity levels K.

FIGURE 3. Support recovery rate performance comparison of the sparse Bayesian learning algorithms with correlated (a) and uncorrelated (c) noiseless data for different sizes of the measurements m; with correlated (b) and uncorrelated (d) noiseless data for different sparsity levels K.

FIGURE 4. NMSE performance comparison of the sparse Bayesian learning algorithms with correlated (a) and uncorrelated (c) noisy data (20 dB SNR) for different sizes of the measurements m; with correlated (b) and uncorrelated (d) noisy data (20 dB SNR) for different sparsity levels K.

FIGURE 5. Magnitudes of the reconstructions of SBL algorithms with group sparse noisy data (20 dB SNR).

FIGURE 6. NMSE performance comparison of the SBL algorithms for data with isolated zeros for different (a) sizes of the measurements m and (b) sparsity levels K, and for data with mixed groups, isolated zeros, and isolated nonzeros for different (c) sizes of the measurements m and (d) sparsity levels K, under 20 dB SNR.