A High-Resolution and Low-Frequency Acoustic Beamforming Based on Bayesian Inference and Non-Synchronous Measurements

Beamforming is a powerful technique to achieve acoustic imaging in far-field. However, its spatial resolution is strongly blurred by the point spread function (PSF) of phased microphone array. Due to the limitation of array aperture and microphone density, the PSF is far from Dirac delta function, so that it is difficult to obtain a high-resolution beamforming image at low-frequencies (e.g.500-1500Hz). This paper proposes a Bayesian inference method based on Non-synchronous Array Measurements (Bi-NAM) so as to refine the PSF and break through the beamforming limitation for low-frequency source imaging. Firstly, by sequentially moving prototype array at different positions, the non-synchronous measurements can get a sizeable synthetic aperture and high density of microphones. The synthetic cross-spectrum matrix (CSM) can significantly improve the beamforming performance. To confine the approximation error of synthetic CSM and the uncertainty of forward model, as well as the noise interference, a Bayesian inference based on joint maximum a posterior (JMAP) is proposed to solve an ill-posed inverse problem. A Student-t prior is employed to enforce the sparse property of acoustic strength distribution. The background noise can be adaptively modeled by the Student-t distribution, which is related to some of the typical symmetric distributions. Then the hyper-parameters in JMAP inference are efficiently estimated by the Bayesian hierarchical framework. Through experimental data, the proposed Bi-NAM approach is confirmed to achieve high-resolution acoustic imaging at 1000Hz and 800Hz, respectively, even under the Laplace noise interference.


I. INTRODUCTION
Acoustic source imaging has widespread applications such as city-noise mapping, industry noise monitoring and mechanical fault diagnosis, etc. The spatial position and strength distribution can be visualized by back-projecting the acoustic pressure measurements of phased microphone array. In the far-field, the Conventional BeamForming (CBF) is a fast and robust method to get an intensity map of acoustic distribu-The associate editor coordinating the review of this manuscript and approving it for publication was Stavros Ntalampiras .
tion [1]- [4]. However, the spatial resolution of the CBF map is strongly blurred by the spatial impulse response or point spread function (PSF) of phased microphone array [5], [6]. Due to the limitation of array aperture and microphone density, the PSF is far from an ideal impulse response, which should be as sharp and abrupt as the Dirac delta function. In this sense, the CBF map can be regarded as a convolution result of the PSF (modeled by a convolution kernel) and a ground-truth image of source intensity. In fact, the PSF spectrum can be expressed as a spatial low-frequency filter.
As for the small aperture-array and low-frequency sources, the cut-off frequency of PSF spectrum becomes so narrow that it sacrifices both high and low frequency information of the acoustic source. Meanwhile, the spatial shape of PSF distribution becomes so fat and smooth that it not only makes the CBF map quite ambiguous due to the convolution effect, but also causes the PSF spatially-shifted, which cannot be easily modeled by a single convolution kernel for different source positions.
To improve the spatial resolution of the CBF at low frequencies, the fundamental approach is to clean up the PSF influence or make the PSF close to Dirac delta function. There are at least two ways to follow.
One way is the soft-calibration-based method, which can attenuate the blurred effect of the PSF directly from the CBF map without changing the topology of the microphone array. In this way, soft-calibration methods can be generally classified into three categories: (1) Spatial filtering methods [2]- [7], such as adaptive beamforming based on the Minimum Variance Distortionless Response (MVDR), and the eigenvalue decomposition of Cross-Spectral Matrix (CSM) based on the MUltiple SIgnal Classification (MUSIC). (2) Deconvolution-based methods [8]- [10] such as CLEAN-SC based on the sparsity constraint, the Deconvolution Approach for the Mapping of Acoustic Sources (DAMAS) and its extensions. (3) Regularization-based methods [11]- [23] such as Tikhonov regularization for denoising, Total Variation (TV) regularization for artifact flaw removal, compressive sensing (CS) beamforming for high-resolution, and Bayesian inference approaches for super-resolution. Though the filtering methods can efficiently improve the CBF resolution, the selections of filter parameters need the knowledge of the acoustic source and noise interference. Deconvolution methods can get a high spatial resolution without knowing the source priors. Still, the deconvolution is prone to cause artifact flaws if the PSF is not known, and sometimes it is sensitive to strong background noise and not fast to converge. By adding the sparsity constraints, the sparse regularization methods are capable of obtaining much higher resolution and fewer artifacts than the deconvolution in the case of strong noise interference. But regularization parameters should be tuned carefully. Sometimes they are not adaptive to scenario changes or non-Gaussian background noise.
The other way is the non-synchronous measurement beamforming method, which can get a large aperture of the synthetic array and high density of microphones, and make the PSF more ideal than that of the prototype array. Nonsynchronous measurements can be implemented in two steps. Firstly, a series of multi-angle observations can be obtained by sequentially moving the prototype array at the different positions and scanning the same source plane. But due to the non-synchronous measurements, the phase relationships between consecutive snapshots from different array positions are missing, and it results in the missing items at the synthetic cross-spectral matrix (CSM), as Yu et al. [24] pointed out. In other words, in this large dimension of synthetic CSM, its diagonal blocks consist of small-size CSMs of prototype array measurements at different array positions, but other non-diagonal blocks are the missing items. Therefore, in the second step, the synthetic CSM can be completed to improve the beamforming performance. This synthetic CSM completion problem can be solved by using the property of the low-rank matrix and the continuity of the acoustic fields. However, even if the acoustic sources are ideal and stationary monopoles, it is inevitably introducing interpolation errors during the filling completion of synthetic CSM.
Most of these drawbacks can be overcome by Bayesian inference methods [15], [20], [25]. It can adaptively estimate both unknown random variables and unknown model parameters by applying the Bayes' rule in updating the probability law, in which, a posterior probability can be obtained from the likelihood and prior models. And the likelihood can be derived from the forward model using measured data. The prior models can be assigned according to prior information on the unknowns. The priors serve to promote useful regularizations on ill-posed inverse problems caused by non-synchronous measurements. For example, the priors on acoustic sources can introduce valuable information such as sparsity property of monopole source distribution, so that the optimal estimation can be derived from many possible solutions due to ill-posed inversion. Moreover, to describe the forward model uncertainty, both the model approximation error and array noise interference can be modeled by a proper probability density function, so that the likelihood probability function can be derived to build up a mathematical bridge from priors to posteriors. Finally, hyper-parameters (including source strength, prior model parameters, likelihood model parameters, and related hidden variables) can be estimated alternatively under the Bayesian optimization criteria such as Maximum A Posterior (MAP), minimum Kullback-Leibler (KL) divergence, etc. Though Bayesian inference methods can breakthrough the CBF limitation, it still requires tremendous calculation for optimization and convergence.
The contributions in this paper are (1) to propose Bayesian inference method based on Non-synchronous microphone Array Measurements (Bi-NAM) in order to refine the PSF and achieve high-resolution imaging at low-frequencies; (2) to develop Student-t distribution as sparsity-enforcing prior so as to promote the spatial resolution. Meanwhile, the background noise is also modeled by Student-t distribution which can be adaptively fit for Gaussian and non-Gaussian noise interference such as Cauchy or Laplace noise. (3) to implement the Joint Maximum A Posterior (JMAP) based on alternate iteration optimization in the Bayesian hierarchical framework so as to efficiently solve an ill-posed inverse problem.
This paper is organized as follows. The forward model based on non-synchronous measurements is presented in section II. The proposed Bayesian inference is described in section III. Section IV provides experimental results at low-frequency localization to validate the proposed method. Finally, section V summarizes this paper and gives further prospects.

II. FORWARD MODEL OF NON-SYNCHRONOUS MEASUREMENTS A. NON-SYNCHRONOUS MEASUREMENTS
This paper mainly considers the 2D imaging of acoustic sources. For a static and stationary source in the far-field in Fig. 1, the acoustic pressure distribution at the observed position vector x and time t can be given as: where p ( x, t) denotes distribution function of acoustic pressure in the far-field and s ( x, t) denotes source distribution function in the planar plane. c is acoustic propagation speed in the uniform media, normally c = 340m/s in air. ∂ 2 (·) ∂(·) 2 denotes the second partial derivative. According to the superposition principle of the acoustic field, the measured acoustic pressure at an arbitrary position can be expressed as follows: where dV y denotes the volume infinitesimal at the source position y. And G x, t| y, t is the Green function defined in (3), which reflects acoustic field response from the source position y and time t to observed position x and time t.
where f denotes acoustic frequency and k = 2πf c denotes the number of waves. exp(·) denotes exponential function based on Euler's number. |·| denotes the module of a vector. j is the imaginary unit. | r| = | x − y| denotes the distance between source and observer, as shown in Fig. 1. And |t − t| = | x− y| c is also considered by G ( x| y) for a static and stationary source. So that the propagation model can be expressed in the frequency domain as: A continuous propagation model is expressed in (4). When the microphone array is used to observe the acoustic signals, a discrete propagation model is derived from the continuous model. Assume that the source plane is divided into N grids, There are K incoherent monopole sources in the source plane, satisfying K N . An array composed of M (K < M N ) microphones is moved onto Z positions on the array plane, which is parallel to the source plane. The time-domain sampling is transformed to the frequency-domain analysis, and snapshots are used to make wide-band signal separated into many narrow-bands under the assumption of Gaussian widesense stationary process, then measured pressures of the z-th array position (z = 1, · · · , Z ) at a frequency f corresponding to the l-th (l = 1, · · · , L) snapshot can be expressed as: where S * l = (s * 1 , · · · , s * k , · · · , s * K ) T ∈ C K ×1 denotes acoustic pressures of K uncorrelated monopole sources in the frequency domain. The symbol (·) T denotes matrix transpose.
where r m,k is the distance between k-th source and the m-th microphone. However, it is a non-linear system of equations in (5), since source signals and source positions are both unknown. To transform (5) into a linear system, the classical approach is to discretize the acoustic source plane into a large number of identical grids, and each grid is regarded as a possible position for the acoustic source, as shown in Fig. 1.
To obtain the CBF result at the z-th position, a discrete forward model can be derived from (5) as: where S l = (s 1 , · · · , s n , · · · , s N ) T ∈ C N ×1 denotes the acoustic pressures of each grid. Here discrete source S l is supposed to contain physical source S * l . Correspondingly, is discrete Green function matrix, whose element g {z} m,n is discrete propagation vector, derived from (6) as: where r mn denotes the distance between the n-th grid and the m-th microphone. Then beamforming output at the n-th grid of the z-th array position is given as: where (7). · l denotes l-norm, and (·) H denotes the complex conjugate transpose. It is noted that R {z} CSM (f ) is defined as the expectation of the inner product of acoustic pressure vectors in (10). Assume that the measured signals are stochastically stationary, thus R {z} CSM (f ) can be estimated by averaging the snapshot blocks over time as: where E [·] denotes mathematical expectation. In the following, we omit f for the simplicity of math symbols. For non-synchronous measurements, the final goal is to achieve the same results as the simultaneous measurements of Z sets of arrays at Z positions. Therefore, the critical problem boils down to a matrix completion from the item-missing CSM R The matrix completion problem can be solved by the Fast Iterative Shrinkage Thresholding Algorithm (FISTA), as introduced in [24]. Based on the completed CSM, the improved beamforming is derived from (9) as:

B. FORWARD MODEL OF POWER PROPAGATION
Based on the assumption of incoherent monopole sources, the full CSM R CSM can be expressed as: where T ∈ R MZ ×1 denotes the model uncertainty of sequential measurements and noise interference at Z array positions, whose item E {z} l is defined in (5). x = diag(E S l (S l ) H ) denotes the powers of acoustic sources, x = (x 1 , · · · , x n , · · · , x N ) T ∈ R N ×1 with x n being the power of a discrete source (grid) at the n-th grid. And diag(·) means diagonal items of a matrix.
Inserting (12) into (11), the beamforming output of nonsynchronous measurements can be expressed as: A linear equation group can summarize the discrete forward model of acoustic power propagation in (13) as: where e = (e 1 , · · · , e n , · · · , e N ) T ∈ R N ×1 with e n related to the n-th grid. e denotes the model uncertainty, including the noise interference at each microphone and the approximation error of the forward model. h n 1 ,n 2 is the element of power propagation matrix H ∈ R N ×N .
In (13) and (4), the beamforming output y n 1 at the n 1 -th grid mainly depends on the weighted combination of all discrete power vector x and its weight h n 1 ,n 2 , which comes from each row of the power propagation matrix H. In (14), the CBF result y of non-synchronous measurements depends on the weighted combinations (Hx) of all discrete grids. Then the source power x can be estimated from an inverse problem in (14) when given y under model uncertainty e.
Compared with the discrete forward model of acoustic signals in (7), the advantages of (14) are 1) it is a system of determined equations, whereas (7) is an underdetermined system of equations, because the number of microphones is much less than the number of scanning grids (M N ); 2) it directly infers the acoustic source power x instead of acoustic signals S l (including phases), and source power is directly used to visualize the acoustic strength distribution (acoustic imaging); 3) it gives more useful constraints on the ill-posed inverse problem such as positivity (x 0) and sparsity (K N ) of source powers than those in (7).

C. PROPOSED CONVOLUTION APPROXIMATION
To make an insight on the PSF influence at the nonsynchronous measurement beamforming, in this section, we propose a convolution model to approximate the forward model of power propagation in (14). We find out that power propagation matrix H seems to be a quasi-Symmetric Block Toeplitz (SBT) matrix in the far-field measurement, so that VOLUME 8, 2020 the (in)variant convolution kernels (sizes and values) can be derived from this SBT matrix [5], [6]. Thus a 2D-convolution forward model can be used to approximate Hx as: where x 0 denotes the ground-truth image of acoustic powers. Here x 0 is the matrix form of x. And [·] denotes the vectorization operator transforming a matrix to a vector in column order. h * ∈ R A×B is a 2D-convolution kernel (PSF). * denotes the valid convolution, whose output is made up of the overlap parts without zero-padded edges, so that the size of the output is the same as the input. A × B denotes the size of the PSF. Normally, the PSF with square form (A = B) is mostly used, and its size is no larger than that of x 0 . It is noted that in (15), e denotes the model uncertainty, including noise interference, the approximation errors of non-synchronous measurements, as well as the 2D-convolution.
In the forward convolution model of (15), the beamforming result y can be seen as the convolution result of acoustic source power image x 0 and PSF h * . But sometimes, the 2Dconvolution kernel is spatially variant for the large-surface scanning, that is, the same acoustic source is blurred by different PSFs in the non-synchronous measurements. To simplify this problem, suppose that the convolution kernel is spatially invariant. For the n-th source, an average distancē r n is defined to satisfy the conditionr n /r mn ≈ 1 as: Based on the above assumption, each element h n 1 ,n 2 ∈ H in (14) can be approximated by: Then the power propagation matrix can be separated into an approximated SBT matrixH and two diagonal matrices D 1 and D 2 as: where Diag [·] denotes the operator transforming a vector to a diagonal matrix. For the approximated SBTH, its middle row (n 1 = (N + 1)/2/2 ) contains more diverse items than the other rows. Thus the PSF of microphone array can be derived from the elements in the middle row of SBTH. In (15),  (18) as: where · denotes the integer part. A real and non-negative symmetric h * matrix is obtained to approximate the spatially invariant PSF of the non-synchronous measurements. In Fig. 3, the distance between planar acoustic sources and the planar array is 1.5 m. These two planes are parallel to get a 2D acoustic imaging. The sub-figures in the left, middle, and right are respectively the prototype of microphone array with 56 channels, acoustic source plane, and non-synchronous measurements of the prototype array at Setting that the analysis frequencies are 800 Hz and 1000 Hz respectively. According to the definition of the spatially invariant 2D-convolution kernel in (18), the PSF comparisons between the prototype array and non-synchronous measurements are demonstrated in Fig. 4. In far-field, the PSF would theoretically be a Dirac delta function under ideal conditions such as large enough array aperture, high enough microphone density, etc. Although the PSF is not ideal in experiments, it can still be measurable by the array response. According to the results in Fig. 4(a) and 4(b), when a prototype array is used to measure the acoustic pressures, the response region of PSF has been flatted over the scanning plane, and the form of PSF is fat and smooth, which results in significant blur-effect on beamforming result. In contrast, the non-synchronous measurements successfully refine the PSF performance by reducing significantly the PSF volume and sharpening the PSF shape. Moreover, the PSF of the prototype array at 800 Hz in Fig. 4(c) has apparent deterioration from the one at 1000 Hz in Fig. 4(b). On the contrary, the PSF of non-synchronous measurements at 800 Hz is as fine as that inFig. 4(b).

III. REGULARIZATION AND BAYESIAN INFERENCE A. REGULARIZATION IN BAYESIAN FRAMEWORK
In this section, two classical regularization methods are developed in the Bayesian framework. For a stochastically stationary process, the Gaussian distribution is regarded as the most common prior to the unknown variables. The Gaussian priors are assigned to model uncertainty in (14) or (15): each component of e is approximated as a zero-mean Gaussian distribution with unknown variance v e and all the elements of e are independent of each other. The rationality of this assumption is detailed in the reference [3], [14], [19], [29]. Therefore the prior distribution of e and likelihood function can be expressed as: (20) where N (0, v e ) stands for Gaussian independent distribution. To obtain the posterior distribution of the estimated object x, proper priors should be assigned to x. Here two specific prior distributions are discussed later (seen in Fig. 5).
Firstly, a Gaussian prior is attributed to x: each component of x is approximated by a zero-mean Gaussian independent distribution with unknown squared variance v x . Due to the Bayesian posterior law, the posterior probability can be obtained as: where ln(·) denotes the natural logarithmic operator. According to (21), the problem of maximizing a posterior can be transformed into minimizing the cost function of x as: Then x is obtained from the following optimization problem: where λ = v e /v x , λ > 0 denotes the regularization parameter, and it controls the trade-off between the smoothed solution and data-fitting error. The equation (23) is exactly the Tikhonov regularization formula, or called L 2 regularization. Without the Bayesian framework, λ has to be carefully tuned by conventional L 2 regularization for optimal performance. Owing to the Bayesian inference, λ is inherently determined by the Signal-to-Noise Ratio (SNR). In general, the L 2 regularization is solved by ridge regression or time-consuming convex optimization [26], [27], and it is robust to noise interference, but L 2 norm prefers to smooth estimation of x and its spatial resolution still needs improving. This fact can be explained by Bayesian inference that the Gaussian independent prior related to L 2 regularization is not a sparse distribution, since there is a long tail and slim body for sparse distribution (seen in Fig. 5).
Another commonly used sparse prior is the Laplace prior assigned to x: each component of x is approximated as a zeromean Laplace distribution with unknown scale parameter b x and all the elements of x are independent of each other. Then the posterior probability can be expressed as: The cost function under Laplace prior is defined as: Then x is obtained from the following optimization problem: where η = 2v e /b x , η > 0 denotes the regularization parameter corresponding to L 1 regularization and its function is similar to λ in L 2 regularization in (23). The L 1 regularization is commonly solved by the Least Absolute Shrinkage and Select Operator (LASSO) [28], [29]. Owing to the sparse prior, the L 1 regularization can obtain better spatial resolution than L 2 regularization does. But due to its non-differentiable at x = 0, L 1 regularization inevitably causes the deconvolution artifacts and cannot guarantee a fast convergence in strong noise interference.
In the inverse problem of acoustic localization in (23) and (26), the physical meaning of x is the power distribution on the scanning plane. Generally, the number of acoustic sources is much less than the number of scanning grids (K N ). So that the power distribution of acoustic sources is sparsely distributing along the scanning plane. This sparsity is an essential constraint for solving the inverse problem of (14). From the perspective of the Bayesian framework, Probability Density Function (PDF) is the most intuitive representation of the sparsity of prior information. Different prior distributions will bring varying degrees of sparsity constraints. Moreover, the Bayesian framework reveals that the regularization parameters λ and η can be determined by the SNR. In this paper, a Student-t distribution is introduced to balance the sparsity and computation complexity compared with the Gaussian distribution and Laplace distribution. As expressed in (27), x is one of the elements within x. ℘ 1 (x), ℘ 2 (x), and ℘ 3 (x) are the PDFs of Gaussian distribution, Laplace distribution, and Student-t distribution, respectively. Assume the mean values of these distributions are 0, and the shapes of these distributions are controlled by scale parameter v, as shown in Fig. 5. Student-t distribution has the longest tail, and more sparsity than the Laplace distribution. In fact, the Gaussian distribution has less sparsity due to its short tail. Moreover, the background noise can also be modeled by Student-t distribution, which can be adaptively fit for Gaussian and non-Gaussian noise interference such as Cauchy or Laplace noise.
where (·) denotes the Gamma function. Combining the (27) and corresponding PDFs in Fig. 5, most of the high probability values concentrate on zero-mean, and very few nonzero values are distributing over the tail of PDF. In general, a sparse distribution should have a slim body and a long tail such as Student-t and Laplace distributions. According to Fig. 5, the dynamic range of Laplace distribution is wider than the Gaussian distribution and narrower than the Student-t. Meanwhile, the tail of Laplace distribution is longer than the tail of Gaussian distribution and shorter than the Student-t. Hence, the solution of L 1 regularization is more sparse than the L 2 regularization. But L 1 regularization is more complex to calculate than the L 2 regularization. Moreover, the Student-t distribution has a great potential to coordinate the solution sparsity and calculation complexity. Therefore, the following part will explore the advantages of Student-t distribution.

B. PROPOSED SPARSITY-ENFORCING PRIOR
To enforce the solution sparsity, there are several kinds [3], [5], [15], [25] of priors such as Double Exponential, Generalized Gaussian, and some other mixture models with heavy tails. In this paper, the Student-t distribution is investigated to promote sparse solutions for two reasons. Firstly, as mentioned above, the tail of Student-t distribution is longer than Gaussian distribution and Laplace distribution; secondly, thanks to its Infinite Gaussian Scaled Mixture (IGSM) property, the Student-t can be expressed as a hierarchical Gaussian-Gamma framework as: (28) where St (·) and G (·) denote Student-t distribution and Gamma distribution, respectively. 1/u is the variance of Gaussian distribution. 1/α is the shape parameter and 1/β is the scale parameter of Gamma distribution.
Owing to the IGSM property in (28), the following hierarchical prior model is developed as: where V x and V e denote unknown covariance matrices of x and e respectively, and V x j,j is one of the diagonal items within V x , so does V e i,i to V e . And IG (v|α, β) denotes inverse Gamma distribution with shape parameter α and scale parameter β. Compared with (20), the model uncertainty e has been modeled by Student-t distribution, which is related to non-Gaussian noise interference such as Cauchy and Laplace noise. According to (29), the Bayesian hierarchical diagram is depicted in Fig.6 as: Based on the proposed Bayesian hierarchical framework, the joint posterior of all the variables x, V x and V e is derived as: where the noise interference is supposed to be independent of acoustic sources, thus V e is independent of x and V x . And the JMAP inference is implemented to solve (30) in the following.

C. JMAP INFERENCE
Inserting the (29) and (14) into (30), the JMAP can be transformed into minimization of the cost function as: VOLUME 8, 2020 The estimation of all the unknown variables x, V x , V e can be achieved by solving the alternate optimization as: For this optimization problem, there are the theoretical solution for some unknown variables under the assumption that other variables are fixed. Firstly, when V x and V e are fixed, the cost function for x is: The cost function (33) can be solved by a gradient descent approach. Note that the initialization of x {0} can be the analytical solution by putting the derivative of (33) equal to zero. This initialization makes alternate iteration to converge much faster than that initialized by random process. Thus the initialization and iteration can be expressed as: where ρ denotes the step size of gradient decent, c denotes the number of iterations. Then assume x being fixed, the cost function for V e and V x can be alternatively expressed as: Similarly, putting the derivative of V x and V e equal to zero, the JMAP estimation of V x and V e can be expressed as: where H i denotes the i-th row of matrix H. Here α x 0 , β x 0 , α e 0 and β e 0 are initialized by a random number between 0 and 1. These alternate iterations are summarized by the Bayesian JMAP Inference Algorithm in Table 1: where ε denotes a small value to stop the alternate iteration. Bayesian JMAP inference finally achieves reasonable estimations of object x and hidden variables V x , V e . The whole framework is shown in Fig. 7. The proposed Bi-NAM via the JMAP algorithm is implemented to obtain high-resolution imaging at low frequencies. Here the clean map from our proposed method can be feedback to optimize the non-synchronous measurements. But this point will not be discussed due to limited pages.

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS A. EXPERIMENT SETTING
In this section, experiments are carried out to validate the proposed Bi-NAM method. In order to remove the effects of uncontrollable factors, the experiments are performed in an anechoic chamber, as shown in Fig. 8(a). The distance between planar acoustic sources and the planar array is 1.5 m, and the distance between diagonal loud-speakers is about 0.75 m. The prototype array is moved sequentially 9 times during non-synchronous measurements, and array centre positions at each move are shown in Fig. 8 . It is noted that the movement order of array does not substantially affect the imaging results. However, it is a convenient choice to move in sequence according to the convenience in the experiment.
In Fig. 9, narrow-band signals with center frequencies at 1000 Hz and 800 Hz are played respectively by 4 loudspeakers to simulate acoustic monopole sources. At each   position of prototype array, the signals are sampled for 10 seconds with a sampling frequency of 50000 Hz. Then the measured data are processed by the CBF, LASSO, and proposed Bi-MAN based on JMAP at 1000 Hz and 800 Hz, respectively, in Fig. 10-12. The hybrid data with the Gaussian and Laplace noise interferences are also tested in Fig. 13. In reality, the difficulties of source setup are that the used loud-speakers are not identical sources, and they cannot synchronously emit signals. Moreover, the emitted sounds are not strictly stationary and stable.

B. RESULTS AND DISCUSSIONS
Firstly, the CSM completion is shown in Fig. 10. Colorbar represents the relative sound power level. Without noise interference, the differences between full CSMs from 1000 Hz to 800Hz cannot be neglectable. This is because that the tested signals are not strictly stationary and synchronous. And these differences might lead to the localization result of the proposed Bi-NAM at 800 Hz in Fig. 11(c) is not better than that at 1000 Hz in Fig. 12(c).  In Fig. 11, it shows the acoustic imaging results at 800 Hz without noise interference. The colormap dynamic range is 5 dB. The blue bold dotted line stands for the positions of four acoustic sources. The CBF and LASSO can hardly demonstrate any valid information about acoustic sources. The proposed Bi-NAM method can localize each source accurately and break the limitation caused by the PSF from the blurred CBF result. In Fig. 11(c), the intensity of the left acoustic source is much lower than those of the other 3 sources, so that it is invisible in the 5 dB dynamic range. The reason is that the acoustic intensity is unstable and nonstationary due to loudspeaker quality.
In Fig. 12, it shows the acoustic imaging results at 1000Hz without noise interference. The CBF is not able to distinguish clearly 4 monopoles due to the fat and flat PSF at 1000Hz in Fig. 4. The LASSO can provide a high-resolution result, but the reconstructed distributions are too sparse that the oversparse results will lose the continuous distribution. When the dynamic range is increased, the massive artifacts will inevitably cause frequent false-alarm detection. It is difficult to tell whether these points are sources or sidelobes, and not easy to confirm the number of the monopole sources. The proposed Bi-NAM method can achieve high-resolution imaging, and adjacent sources can be separated clearly. The  distribution of acoustic sources can also be vividly displayed. In addition to the acoustic imaging results, the calculation efficiency is also a significant indicator to evaluate the performance of the algorithm. In this paper, all programs are run in MATLAB R2017b software on Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz system with 16 GB-RAM. The iteration termination parameter is set asε = 0.1, and the calculation time of the LASSO and the proposed Bi-NAM based on JMAP is given in Table 2, in which we can conclude that the proposed Bi-NAM method doesn't bring extra computation burden.
In Fig.13(a) and (b), the hybrid data are generated by adding the Gaussian and Laplace noise interferences to the real data in Fig.9. The SNR are set 0 dB and -5 dB, respectively. Then the proposed Bi-NAM method can achieve reasonable and consistent localization results at 1000 Hz in Fig.13(c), (d), (e) and (f), as well as at 800 Hz in Fig.13(g), (h), (i) and (j). But due to the non-stationarity of four loudspeakers, it would worsen the phase continuity and CSM completion of the non-synchronous measurements, and bring the apparent sidelobes and artifacts around corners to the localization results.
In brief, the proposed Bi-NAM approach has been confirmed to achieve high-resolution acoustic imaging at 1000Hz and 800Hz at low SNR, respectively, even under the Laplace noise interference. The comparisons of the PSF and aforementioned methods are summarized in Table 3.

V. CONCLUSION
This paper presents that the PSF of microphone array is a critical factor that causes quite blurred imaging for acoustic localization and visualization at low-frequencies. In fact, the PSF has become an insightful way from array topology to localization resolution. In order to obtain high-resolution imaging at low-frequency, what is worth breaking is not only to improve the rationality of sparse constraints in the inverse algorithm, but also to reduce the PSF influence by optimizing the prototype array, or making the synthetic array of nonsynchronous measurements.
The proposed Bi-NAM method has taken the above two aspects into account comprehensively. First, the nonsynchronous measurements are implemented to extend the array synthetic aperture and enhance the microphone density. In this way, the PSF of the prototype array, which is flat and fat, has been successfully refined as the synthetic PSF of the non-synchronous measurements, which is sharp and slim. Next, the Bayesian JMAP inference is developed to solve the very ill-posed inverse problem caused by non-synchronous measures, in which, the item-missing CSM needs elementcompletion and necessary approximation. Student-t prior is employed to promote sparsity for high resolution and low computation. All the variables can be alternatively estimated via a Bayesian hierarchical framework. Finally, localization results of experiment data at 800 Hz and 1000 Hz validate the proposed Bi-NAM method. Our proposal is capable of building up a refined PSF and inferring high-resolution acoustic imaging at both low-frequency and low SNR, even under Laplace noise interference.
In the future, the proposed Bi-NAM method can be extended to 3D acoustic source localization with high resolution. The optimal topological design of non-synchronous measurements is worthy of studying for easier implementation and better performance. In the fast calculation, it is an interesting topic for the Bayesian JMAP algorithm combined with the shift-invariant convolution model. DAZHUAN WU received the bachelor's degree in machinery and equipment of chemical engineering and the Ph.D. degree in chemical process equipment from Zhejiang University, Hangzhou, China, in 1999 and 2004, respectively. He was a Visiting Scholar from Kyoto University, Japan. He is currently a Professor with Zhejiang University. His research interests are flow-induced vibration and noise control, Pumps, and fans and propulsion technology. VOLUME 8, 2020