
Free-Field TDOA-AOA Sound Source Localization Using Three Soundfield Microphones


3D Sound Source localization in the anechoic chamber.

Abstract:

Accurate source localization is an important problem in many research areas as well as in practical applications in wireless communications and acoustic signal processing. This paper presents a passive three-dimensional sound source localization (SSL) method that employs a geometric configuration of three soundfield microphones. Two methods for estimating the angle of arrival (AOA) and time difference of arrival (TDOA) are proposed based on Ambisonics A- and B-format signals. A closed-form solution for sound source location estimation based on two TDOAs and three AOAs is derived. The proposed method is evaluated by simulations and physical experiments in our anechoic chamber. Simulations demonstrate that the estimation method can theoretically attain the Cramér-Rao lower bound when small Gaussian noise is present in the AOA and TDOA observations. The dependence of the TDOA and AOA measurement uncertainty on the length of the measurement interval is also investigated. Experimental results in terms of RMSE indicate that the proposed solution can accurately find the 3D position of a sound source in a free-field environment. Performance evaluation regarding the number of estimation steps shows that higher accuracy can be achieved by longer observations of a stationary sound source.
Published in: IEEE Access ( Volume: 8)
Page(s): 87749 - 87761
Date of Publication: 07 May 2020
Electronic ISSN: 2169-3536

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.
SECTION I.

Introduction

Sound source localization (SSL) is an essential step in a wide range of audio and acoustic applications. Current research topics in SSL range from detecting the speaker position in human-computer interaction [1] or smart video conferencing [2], robot movement in an unknown environment [3], [4], and search and rescue [5] to advanced military applications such as sniper localization [6] and medium-range aircraft localization [7]. In addition, SSL is commonly used as a preprocessing step before enhancing an acoustic signal from a particular location [8].

In general, source localization can be viewed either as an active localization scenario, where transmitters emit signals to illuminate the target of interest and the target location is inferred from the collected reflections, or as a passive localization scenario, where receiving sensors can only collect signals emitted by the source. In many areas, only passive source localization is considered, where the signals usually do not carry information about their transmission time. Sound localization, as a type of passive localization, refers to the problem of estimating the position from which a sound signal originates relative to the microphone array geometry. In this case, a localization system cannot directly measure the time of arrival (TOA) between the source and a receiving sensor; only the difference between the times at which different sensors receive the signal can be measured.

Various methods for SSL have been proposed, and they can be grouped by whether they locate the sound source in 2D space [9], [10] or in 3D space [3], [11]. Fundamentally, there are two main approaches to finding a source from recorded audio signals. Both are mainly based on estimating the time difference of arrival (TDOA) using various microphone array configurations, such as a linear array [12], circular array [13], or distributed array [14], and different cross-correlation algorithms to estimate the time lag between microphones. The first approach aims to maximize the steered response power (SRP) of the output of a delay-and-sum beamformer [15]. This direct approach performs an exhaustive search over the whole SRP space to find the sound location, which is computationally expensive. In contrast, the indirect approach uses estimated TDOA measurements, and the sound source is found by addressing criteria such as the hypercone fitting problem [16]. Although the indirect method is more prone to error than the direct method, because it is more sensitive to background noise and reflections, its main advantage is that it can be used effectively in a distributed microphone network, since only TDOA values need to be transmitted rather than raw sound signal data [17].

A. Related Work

In many research papers, TDOA is a standard measurement used for passive localization [18], [19]. Source localization based solely on TDOA measurements requires a minimum number of measuring devices: at least three sensors are needed to locate an unknown target in 2D space and at least four in 3D space. The principle of the TDOA-based approach is to estimate the source position from the intersection of hyperbolic arcs in the 2D case and hyperbolic surfaces in the 3D case. In near-field applications, where the ratio of the source range to the sensor array baseline is not large, the intersection can be obtained by solving a set of nonlinear equations. In contrast, in far-field applications the intersection yields a low-accuracy location estimate, since the hyperbolic arcs or surfaces become almost parallel to one another. Because the accuracy of the source position estimate degrades when the source moves sufficiently far from the sensors, near-field and far-field localization have been treated as two different problems. Recently, unified near/far-field TDOA-based localization was proposed in [20]. The proposed approach consists of two formulations of the unified localization problem in two different coordinate systems. The first is a nonlinear, non-convex weighted least-squares optimization based on the modified polar representation of the source position, and the other is a non-convex fractional programming formulation that uses the conventional Cartesian coordinates of the source position as the optimization variable. Besides the TDOA-based approach, a source location can also be calculated from AOA measurements and their derivatives. Obtaining the AOA often involves a sensor equipped with an array of receivers; this alleviates the requirement of synchronization between different sensors, since each sensor produces an angle by itself. In [21], the authors proposed a solution that can attain the Cramér-Rao lower bound under mild conditions.
Another 3D bearing-only localization method is proposed in [22], where the authors achieved a significant reduction in bias and root-mean-square error using a pseudo-linear estimator. In general, the passive source localization problem is not trivial, since the direct relationships between the position of a source and the measurements are complex, and solving the equations for the TDOA and AOA methods is hard because of their nonlinearity.

One of the more recent research directions is combining TDOA and AOA measurements, where algebraic manipulations allow the relationships to be transformed into linear form and energy consumption to be lowered [23]. The hybrid TDOA-AOA approach has several advantages, such as improved localization performance [24], [25] and a reduced number of required sensors [26], and it can minimize the occurrence of ghost targets, which is typical for localization with TDOA measurements alone [27]. Many studies have proposed different solutions to source localization [26], [28]–[31]. The most straightforward method is an exhaustive search in a feasible solution region, which is time-consuming and inefficient for real-time applications. In general, the maximum likelihood (ML) estimator is introduced to estimate the location since it is asymptotically efficient.

However, the aforementioned approaches are computationally expensive, and a closed-form solution is hard to find or does not exist at all. One option is to linearize the equations with a recursive approach such as Taylor-series expansion. Nevertheless, these numerical search techniques converge to an optimal solution only if the ML function is convex. Numerical methods are prone to error since they depend on a good initial position guess, and thus it is difficult to guarantee global convergence and computation time. Their iterative nature also makes them poorly suited to real-time applications. To improve robustness and reduce complexity, a closed-form solution is required. A linear least-squares estimator with a closed-form solution, called the pseudo-linear estimator (PLE), was proposed in [22]. Although this approach is less computationally demanding, the estimated source position is biased because of the correlation between system and measurement. Recently, the authors of [32] proposed a new localization method that represents a simple algebraic solution and does not suffer from the local convergence problem. However, this method also has a larger bias since localization accuracy is affected by the deviation of the noise correlation matrix. Another localization solution, based on the generalized trust-region subproblem technique, is proposed in [33], where the authors analyzed the necessary optimality conditions of the squared range-difference least-squares cost function. To reduce the bias of the estimator, a few different methods were proposed. In [34], the BR-PLE method is proposed for reducing bias. The authors of [35] proposed finding a rotation angle that reduces the bias of the estimator, since they demonstrated that estimator performance is sensitive to rotation of the origin. Another solution for reducing estimator bias in the presence of sensor errors was proposed in [36].
The proposed method introduces a quadratic constraint so that the expectation of the estimator cost function attains its minimum value at the true position and thus achieves the Cramér-Rao lower bound (CRLB). From the analysis of different approaches, one could conclude that for most methods the estimation bias arises from the least-squares techniques. One solution is to construct a new pseudo-linear system such as in [27], where a new weighted least-squares method is proposed. The authors claimed that the novel structured total least-squares method could reduce estimation bias when the target is outside the convex hull formed by the measurement sensors. The total least-squares estimator was also considered in [37], where it demonstrated improved accuracy over least-squares solutions. Another practical solution to position estimation was proposed in [38]. The proposed closed-form method was based on converting time-delay measurements to angular information, but it could not achieve CRLB performance. Recently, an efficient closed-form solution has been proposed for passive source localization using only two stations [26]. In that work, a new relationship between hybrid TDOA and AOA measurements and the unknown source position was constructed. The theoretical simulation shows that the proposed solution can achieve the CRLB for Gaussian noise, where the bias can be ignored compared to the variance.

By analyzing the aforementioned source localization approaches, some general remarks regarding localization performance can be made. Considering the different types of input data for source estimation, one can observe that using hybrid TDOA and AOA measurements reduces the number of sensors required, which is especially important in real-life scenarios where the basic assumption of line-of-sight (LOS) between the unknown source position and the sensor may not hold. In many non-line-of-sight (NLOS) environments, signals emitted from the source are often not accessible to all measurement sensors, and one solution is to use wireless sensor networks (WSNs) in which the signals received by each sensor are transmitted to a fusion center for source localization. The authors of [39] proposed a distributed NLOS cooperative localization algorithm that employs a multiplicative convex model based on the physical mechanism of NLOS propagation to achieve robustness in changing environments. A TDOA-based cooperative localization approach for mixed LOS/NLOS conditions is proposed in [40]. To locate multiple stationary target nodes, the authors formulated a non-convex robust weighted least-squares (RWLS) problem. To solve the RWLS problem efficiently, the semidefinite relaxation technique is used to transform it into a convex mixed semidefinite and second-order cone programming problem. The authors of [41] presented an energy-based localization solution in WSNs using received signal strength (RSS) and received signal strength difference (RSSD). In the proposed solution, RSSD is based on transforming the nonlinear and non-convex objective functions into a convex optimization problem via relaxation and semidefinite programming. Another mixed semidefinite and second-order cone relaxation for source localization in 3D WSNs was proposed in [42].
The proposed target node localization is based on hybrid RSS-AOA measurements in both noncooperative and cooperative WSNs, where the authors proposed a new LS estimator to reduce implementation costs. From the conducted analysis, the authors concluded that the proposed RSS-AOA-based estimator is more suitable for large-scale cooperative WSNs than AOA-TDOA-based estimators. In general, centralized algorithms suffer from high computational complexity and network traffic bottlenecks and, as such, are not recommended in scenarios where each sensor node cannot get raw measurements directly. To address this problem, the authors of [43] proposed a completely decentralized localization approach based on augmented Lagrangian methods and the alternating direction method of multipliers (ADMM). In terms of computational efficiency, the unknown source position can be directly calculated from different sets of geometric relationships. Although the computational complexity of this kind of approach is low, it frequently does not perform sufficiently well when measurement error exists. Since TDOA and AOA observations contain measurement error, the location of a source is often estimated by propagating these errors through the computation, where iterative nonlinear minimization is required for an optimal solution. The most straightforward approaches for handling measurement noise are iterative algorithms that start from an initial position estimate, for example the Gauss-Newton method, which has high computational requirements. The maximum likelihood estimator is asymptotically efficient, but it requires a good initial guess. To avoid the need for a good initial position guess, which is characteristic of the iterative ML approach, different closed-form source estimation methods have been proposed. A linear LS approach is an alternative, which can achieve the CRLB but exhibits large estimation bias.
In general, various closed-form solutions have been proposed, each designed to reduce bias or to work with a different number of sensors. To assess localization performance, most state-of-the-art studies benchmark against the CRLB. From the literature, one can see that the weighted least-squares (WLS) estimator is superior to LS solutions in terms of lower bias. It can also be concluded that, although both the WLS and ML approaches can achieve the CRLB, the WLS approach is more computationally attractive, more suitable for real-time applications, and does not rely on an initial guess. From the performance comparison between different methods, one can conclude that while most state-of-the-art closed-form solutions achieve the CRLB, each approach is designed to work under restricted measurement error.

In the context of room acoustics, it is difficult to establish the measurement error, as the measurement microphone captures not only the direct-path component of the source signal but also the multipath component caused by reflections. The multipath component, together with background noise, can distort the time-delay information in the received signals and thus degrade localization performance. To address key challenges of realistic environments, such as room reverberation, background noise, and sound interference, different methods for computing the TDOAs across various pairs of spatially separated microphones have been proposed [44]–[47]. Recent work also suggests that deep learning can be successfully applied to modeling room acoustics [48]. For instance, in [49] the authors employed a deep model for SSL and showed that the deep learning-based system achieved higher accuracy under low-SNR conditions than the generalized cross-correlation with phase transform (GCC-PHAT) method. The authors of [17] proposed a novel learning approach for SSL based on TDOA estimation, where the coordinates of a sound source were defined as functions of TDOA. In their work, pre-recorded sound measurements and their corresponding source locations were used to train a multilevel B-splines-based learning model. A new dataset for learning-based SSL was proposed in [50]; it contains different acoustic events recorded in an anechoic chamber, and the anechoic environment was used to verify the feasibility of the proposed baseline model. The authors of [51] addressed SSL for indoor environments with high reverberation and low signal-to-noise ratio, proposing a novel sound source localization method that uses a probabilistic neural network for the classification of 3D space clusters.
Another learning-based approach was demonstrated in [52], where the authors evaluated the Structural Sparse Bayesian Learning model with signals recorded in an anechoic chamber with one reflective plate. Moreover, many researchers validated their SSL methods using recorded audio signals simulating free-field conditions, or carried out experimental analysis inside a semi-anechoic or anechoic chamber. The anechoic chamber provides a good simulation of outdoor conditions due to the low level of reflections. This environment makes it possible to precisely control the conditions and to measure the levels of sound events and noise, which is important for many experiments. Kotus et al. [53] performed localization of multiple sound sources in an anechoic chamber, where different methods for obtaining the direction of arrival were tested. In [54], the authors proposed a modified cross-correlation algorithm to obtain a more reliable measurement of the time difference of arrival in a reverberant environment. They performed a triangulation procedure using the calculated TDOA values to obtain the sound source position in 3D space. Another three-dimensional method for SSL was proposed by Ding et al. [55]. The authors performed a theoretical simulation accompanied by experimental results in an anechoic chamber. They proposed the use of a planar microphone array combined with a beamforming technique to obtain the location of a point sound source.

B. Outline

This paper presents a passive sound source localization method using three soundfield microphone stations. The geometric configuration of three soundfield microphones is employed to obtain two TDOA and three AOA measurements with respect to the unknown source position. A closed-form mathematical solution for SSL estimation is presented. Results are given in terms of RMSE, and simulations show that the sound source estimation algorithm can reach the Cramér-Rao lower bound for small Gaussian noise present in the measurements. The theoretical simulation is supported by experimental analysis conducted in our anechoic chamber [56]. In this work, TDOA and AOA measurements are obtained directly by exploiting the A and B formats of the soundfield microphones, which can achieve small measurement deviations. This paper also demonstrates the value of using a soundfield microphone for the SSL task, because the AOA can be easily obtained due to the configuration of the soundfield microphone capsules.

The rest of the paper is organized as follows. Section II describes the mathematical formulation of the SSL algorithm. Section III presents our TDOA and AOA estimation methods using soundfield microphones. The simulation and experimental results in the anechoic chamber are given in Section IV, and Section V concludes the paper.

SECTION II.

3D Source Location Estimation

This section explains a novel sound source localization method based on three soundfield microphone stations. First, we establish a geometrical relationship between the unknown position of the sound source and the known positions of the sensors through the obtained AOA and TDOA observations. Second, we define a weighted least-squares estimator, where the sum of the squared residuals is minimized with respect to the measurement error vector. Measurement error is modeled by a covariance matrix containing the measurement uncertainties. In the experimental phase, these measurement error parameters are obtained through testing in the anechoic chamber [56].

Here we present the theoretical formulation of the localization scenario, in which a geometric configuration of three stations \boldsymbol {m}_{i} = [m_{x,i}, m_{y,i}, m_{z,i}]^{T}\in \mathbb {R}^{3} is used to estimate the position of a single sound source \boldsymbol {s} = [s_{x}, s_{y}, s_{z}]^{T}\in \mathbb {R}^{3} in 3D space. Assuming that each station at position \boldsymbol {m}_{i} can determine the bearing angles of the sound wave received from the unknown source position \boldsymbol {s} , the geometric relationship between \boldsymbol {s} and \boldsymbol {m}_{i} can be expressed through the observed AOAs by the nonlinear equation:\begin{equation*} \begin{bmatrix} \theta _{i} \\ \phi _{i} \end{bmatrix} = \begin{bmatrix} \arctan \left ({\dfrac {s_{y}-m_{y,i}}{s_{x}-m_{x,i}}}\right) \\ \arctan \left ({\dfrac {s_{z}-m_{z,i}}{\sqrt {(s_{x}-m_{x,i})^{2} + (s_{y}-m_{y,i})^{2}}} }\right) \end{bmatrix}, \quad i=1,2,3. \tag{1}\end{equation*}

where \theta _{i}\in (-\pi,\pi) and \phi _{i}\in (0, \pi /2 ) form an AOA pair corresponding to azimuth and elevation angles in the right-hand coordinate system.

Let r_{i} be the Euclidean distance between the source and station \boldsymbol {m}_{i} :\begin{equation*} r_{i} = \|\boldsymbol {s}-\boldsymbol {m}_{i}\| = \sqrt {(\boldsymbol {s}-\boldsymbol {m}_{i})^{T}(\boldsymbol {s}-\boldsymbol {m}_{i})},\quad i=1,2,3.\tag{2}\end{equation*}

With r_{1} denoting the true distance from \boldsymbol {m}_{1} to the unknown source, the TDOA observations with respect to \boldsymbol {m}_{1} are given as:\begin{equation*} \tau _{i1} = r_{i} - r_{1} = \Delta t_{i1} \cdot v, \quad i=2,3.\tag{3}\end{equation*}
where \tau _{i1} is the range difference, \Delta t_{i1} is the actual TDOA measurement obtained as the propagation lag between \boldsymbol {m}_{i} and \boldsymbol {m}_{1} , and v is the speed of signal propagation.
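
As a minimal sketch of the measurement model in (1)-(3), the following Python snippet computes the AOA pair and the range differences for a known source. The station layout, the source position, and the speed of sound are assumptions chosen only for illustration, and `arctan2` is used in place of the plain arctangent so that the azimuth covers the full range (-\pi, \pi).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C


def aoa(s, m):
    """Azimuth and elevation of source s seen from station m, as in (1)."""
    d = np.asarray(s, dtype=float) - np.asarray(m, dtype=float)
    theta = np.arctan2(d[1], d[0])                # four-quadrant azimuth
    phi = np.arctan2(d[2], np.hypot(d[0], d[1]))  # elevation
    return theta, phi


def range_differences(s, stations):
    """Range differences tau_i1 = r_i - r_1 for i = 2, 3, as in (2)-(3)."""
    diff = np.asarray(stations, dtype=float) - np.asarray(s, dtype=float)
    r = np.linalg.norm(diff, axis=1)
    return r[1:] - r[0]


# Hypothetical geometry in metres (not the layout used in the paper).
stations = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
source = np.array([1.0, 1.0, 1.0])

theta1, phi1 = aoa(source, stations[0])
tau = range_differences(source, stations)
delay = tau / SPEED_OF_SOUND  # corresponding TDOAs in seconds
```

Dividing the range difference by the propagation speed recovers the time lag \Delta t_{i1} that a cross-correlation of the recorded signals would estimate.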

To estimate the unknown source position \boldsymbol {s} using true TDOA and AOA values, we define the vector \boldsymbol {\mu }= [\tau _{21}, \tau _{31}, \boldsymbol {\kappa }_{1}^{T}, \boldsymbol {\kappa }_{2}^{T}, \boldsymbol {\kappa }_{3}^{T}]^{T} \in \mathbb {R}^{8} containing two TDOA observations with respect to \boldsymbol {m}_{1} and three AOA observations \boldsymbol {\kappa }_{i} = [\theta _{i}, \phi _{i}]^{T} produced from positions \boldsymbol {m}_{1} , \boldsymbol {m}_{2} and \boldsymbol {m}_{3} . In practice, the AOA and TDOA are noisy observations, and we assume that the true values are corrupted by additive Gaussian noise as:\begin{align*} \hat {\boldsymbol {\kappa }}_{i}=&\boldsymbol {\kappa }_{i} + \boldsymbol {\epsilon }_{i} = \begin{bmatrix} \theta _{i} + n_{i}\\ \phi _{i} + m_{i} \end{bmatrix}, \\ \hat {\tau }_{i1}=&\tau _{i1} + \varepsilon _{i1}\tag{4}\end{align*}

Expressing the noisy TDOA and AOA observations in vector form gives:\begin{equation*} \hat {\boldsymbol {\mu }} = \boldsymbol {\mu } + \boldsymbol {\varepsilon }\tag{5}\end{equation*}
where \hat {\boldsymbol {\mu }} is the noisy observation of the measurement vector \boldsymbol {\mu } and \boldsymbol {\varepsilon } =[\varepsilon _{21}, \varepsilon _{31}, \boldsymbol {\epsilon }_{1}^{T}, \boldsymbol {\epsilon }_{2}^{T}, \boldsymbol {\epsilon }_{3}^{T}]^{T} \in \mathbb {R}^{8} is zero-mean Gaussian with covariance matrix \boldsymbol {Q}\in \mathbb {R}^{8\times 8} .
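
The noise model in (4)-(5) can be sketched as a single draw from a zero-mean Gaussian. The standard deviations, the diagonal form of \boldsymbol {Q} , and the entries of the true measurement vector below are placeholder assumptions for illustration, not values reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed standard deviations (illustrative only).
sigma_rd = 0.01               # range-difference noise, metres
sigma_aoa = np.deg2rad(1.0)   # angular noise, radians

# Assumed diagonal Q: two range-difference variances, then six angle variances.
Q = np.diag([sigma_rd**2] * 2 + [sigma_aoa**2] * 6)

# Placeholder true vector mu = [tau21, tau31, theta1, phi1, theta2, phi2, theta3, phi3].
mu = np.array([-0.32, -0.32, 0.79, 0.62, 1.57, 0.79, 0.0, 0.79])

eps = rng.multivariate_normal(np.zeros(8), Q)  # one noise realization
mu_hat = mu + eps                              # noisy observation, equation (5)
```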

For the localization scenario shown in Fig. 1, we construct a unit-norm vector \boldsymbol {b}_{i} = [\cos \phi _{i}\cos \theta _{i}, \cos \phi _{i}\sin \theta _{i}, \sin \phi _{i}]^{T}\in \mathbb {R}^{3} so that \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{i}=1 and the following geometric relation is satisfied:\begin{equation*} \boldsymbol {s}- \boldsymbol {m}_{i} = r_{i} \boldsymbol {b}_{i}, \quad i =1,2,3.\tag{6}\end{equation*}


FIGURE 1. - Localization geometry.

By constructing the matrix \boldsymbol {G}_{i} so that its columns form an orthonormal basis of the plane orthogonal to \boldsymbol {b}_{i} , it follows that \boldsymbol {G}_{i}^{T} \boldsymbol {G}_{i} =\boldsymbol {I}_{2\times 2} and \boldsymbol {G}_{i}^{T} \boldsymbol {b}_{i} = \boldsymbol {0}_{2} , where \boldsymbol {I}_{2\times 2} and \boldsymbol {0}_{2} are the identity matrix and the zero vector. Let \boldsymbol {G}_{i} \in \mathbb {R}^{3\times 2} be:\begin{align*} \boldsymbol {G}_{i} = \begin{bmatrix} \sin \theta _{i} &\quad \sin \phi _{i}\cos \theta _{i} \\ -\cos \theta _{i} &\quad \sin \phi _{i}\sin \theta _{i} \\ 0 &\quad -\cos \phi _{i} \end{bmatrix}, \quad i =1,2,3.\tag{7}\end{align*}
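
The two properties stated for \boldsymbol {G}_{i} can be checked numerically. The snippet below is a small sketch that builds \boldsymbol {b}_{i} and \boldsymbol {G}_{i} from (6) and (7) for an arbitrary angle pair; the function names and test angles are illustrative choices.

```python
import numpy as np


def bearing_vector(theta, phi):
    """Unit vector b_i pointing from the station towards the source."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])


def orthogonal_basis(theta, phi):
    """3x2 matrix G_i from (7); its columns span the plane orthogonal to b_i."""
    return np.array([[np.sin(theta),  np.sin(phi) * np.cos(theta)],
                     [-np.cos(theta), np.sin(phi) * np.sin(theta)],
                     [0.0,            -np.cos(phi)]])


theta, phi = 0.7, 0.4  # arbitrary test angles in radians
b = bearing_vector(theta, phi)
G = orthogonal_basis(theta, phi)

print(np.allclose(G.T @ G, np.eye(2)))  # checks that the columns are orthonormal
print(np.allclose(G.T @ b, 0.0))        # checks that the columns are orthogonal to b
```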


Left-multiplying (6) by \boldsymbol {G}_{i}^{T} and using \boldsymbol {G}_{i}^{T} \boldsymbol {b}_{i} = \boldsymbol {0}_{2} , we get the following expression:\begin{equation*} \boldsymbol {G}_{i}^{T} \boldsymbol {s} = \boldsymbol {G}_{i}^{T} \boldsymbol {m}_{i}, \quad i =1,2,3.\tag{8}\end{equation*}


By realizing that \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{j} = \boldsymbol {b}_{j}^{T} \boldsymbol {b}_{i} and \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{i} = 1 we construct the identity:\begin{equation*} \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{i} + \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{j} - \boldsymbol {b}_{j}^{T} \boldsymbol {b}_{i} - \boldsymbol {b}_{j}^{T} \boldsymbol {b}_{j} = 0, \quad i\neq j =1,2,3.\tag{9}\end{equation*}

Equation (9) can be rewritten with respect to \boldsymbol {m}_{1} as \begin{equation*} r_{i}(\boldsymbol {b}_{i}- \boldsymbol {b}_{1})^{T}(\boldsymbol {b}_{i}+ \boldsymbol {b}_{1})= 0, \quad i =2,3.\tag{10}\end{equation*}
where r_{i} is the geometric distance from \boldsymbol {m}_{i} to \boldsymbol {s} . Substituting r_{i} = r_{1} + \tau _{i1} from (3) into (10), we have r_{i}(\boldsymbol {b}_{i}+ \boldsymbol {b}_{1}) = r_{i} \boldsymbol {b}_{i} + (r_{1} + \tau _{i1}) \boldsymbol {b}_{1} , and after using (6) as r_{i}= \boldsymbol {b}_{i}^{T}(\boldsymbol {s}- \boldsymbol {m}_{i}) , so that r_{i} \boldsymbol {b}_{i} = \boldsymbol {s}- \boldsymbol {m}_{i} and r_{1} \boldsymbol {b}_{1} = \boldsymbol {s}- \boldsymbol {m}_{1} , equation (10) becomes:\begin{align*} 2(\boldsymbol {b}_{i}- \boldsymbol {b}_{1})^{T} \boldsymbol {s} = (\boldsymbol {b}_{i}- \boldsymbol {b}_{1})^{T}(\boldsymbol {m}_{1} + \boldsymbol {m}_{i} - \tau _{i1} \boldsymbol {b}_{1}), \quad i =2,3. \tag{11}\end{align*}
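
A quick numerical check can confirm that relation (11) holds for noise-free geometry. The sketch below uses a hypothetical pair of stations and a hypothetical source position, computes the bearing vectors directly from the geometry, and compares both sides of (11).

```python
import numpy as np


def unit(v):
    """Normalize a vector to unit length."""
    return v / np.linalg.norm(v)


# Hypothetical positions in metres, chosen only for illustration.
m1 = np.array([0.0, 0.0, 0.0])
mi = np.array([1.0, 0.0, 0.0])
s = np.array([1.0, 1.0, 1.0])

b1, bi = unit(s - m1), unit(s - mi)                        # bearing vectors from (6)
tau_i1 = np.linalg.norm(s - mi) - np.linalg.norm(s - m1)   # range difference from (3)

lhs = 2.0 * (bi - b1) @ s
rhs = (bi - b1) @ (m1 + mi - tau_i1 * b1)
print(np.isclose(lhs, rhs))  # both sides of (11) should agree for exact data
```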

A closed-form solution was obtained in [26] for two stations and one source; in this work, we extend the solution to three stations. Using (8) and (11) together in matrix form gives:\begin{align*} \boldsymbol {h}=&\boldsymbol {G}^{T} \boldsymbol {s} \tag{12}\\ \boldsymbol {h}=&[\beta _{21}, \beta _{31}, \boldsymbol {m}_{1}^{T} \boldsymbol {G}_{1}, \boldsymbol {m}_{2}^{T} \boldsymbol {G}_{2}, \boldsymbol {m}_{3}^{T} \boldsymbol {G}_{3}]^{T} \in \mathbb {R}^{8} \tag{13}\\ \beta _{i1}=&(\boldsymbol {b}_{i}-\boldsymbol {b}_{1})^{T}(\boldsymbol {m}_{1} + \boldsymbol {m}_{i} - \tau _{i1} \boldsymbol {b}_{1}), \quad i= 2,3 \tag{14}\\ \boldsymbol {G}=&[2(\boldsymbol {b}_{2}-\boldsymbol {b}_{1}),2(\boldsymbol {b}_{3}-\boldsymbol {b}_{1}), \boldsymbol {G}_{1}, \boldsymbol {G}_{2}, \boldsymbol {G}_{3}] \in \mathbb {R}^{3 \times 8}\tag{15}\end{align*}
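
To illustrate how the linear system (12)-(15) is assembled, the following sketch builds \boldsymbol {h} and \boldsymbol {G} from noise-free measurements and recovers the source with an ordinary least-squares solve. The helper names, station layout, and source position are hypothetical, and the unweighted solve is used here only to check the construction.

```python
import numpy as np


def bearing(theta, phi):
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])


def basis(theta, phi):
    return np.array([[np.sin(theta),  np.sin(phi) * np.cos(theta)],
                     [-np.cos(theta), np.sin(phi) * np.sin(theta)],
                     [0.0,            -np.cos(phi)]])


def build_system(stations, angles, tau):
    """Return h (8,) and G (3x8) from (13)-(15).

    stations: (3, 3) positions m_i; angles: (3, 2) rows (theta_i, phi_i);
    tau: (2,) range differences tau_21 and tau_31.
    """
    b = [bearing(t, p) for t, p in angles]
    Gi = [basis(t, p) for t, p in angles]
    m1 = stations[0]
    beta = [(b[i] - b[0]) @ (m1 + stations[i] - tau[i - 1] * b[0]) for i in (1, 2)]
    h = np.concatenate([beta] + [Gi[i].T @ stations[i] for i in range(3)])
    G = np.hstack([2.0 * (b[1] - b[0])[:, None],
                   2.0 * (b[2] - b[0])[:, None]] + Gi)
    return h, G


# Hypothetical noise-free scenario (positions in metres).
stations = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
source = np.array([1.0, 1.0, 1.0])
d = source - stations
r = np.linalg.norm(d, axis=1)
angles = np.column_stack([np.arctan2(d[:, 1], d[:, 0]),
                          np.arctan2(d[:, 2], np.hypot(d[:, 0], d[:, 1]))])

h, G = build_system(stations, angles, r[1:] - r[0])
s_ls, *_ = np.linalg.lstsq(G.T, h, rcond=None)  # solves h = G^T s
```

With exact measurements the system is consistent, so `s_ls` should coincide with `source` up to rounding.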


Equation (12) represents an ideal condition that does not hold in practice, since the measurement error \boldsymbol {\varepsilon } enters both \boldsymbol {h} and \boldsymbol {G} . To analyze the influence of the error \boldsymbol {\varepsilon } for a given measurement vector \hat {\boldsymbol {\mu }} in (5) on localization accuracy, \boldsymbol {\varepsilon } is introduced in (1) and (3), and the true geometric values of TDOA and AOA are expressed in terms of their noisy observations (4). We can approximate (12) up to first-order noise terms, which gives:\begin{equation*} \hat {\boldsymbol {h}} \approx \hat {\boldsymbol {G}}^{T} \boldsymbol {s} + \boldsymbol {T}\boldsymbol {\varepsilon }\tag{16}\end{equation*}

where the vector \hat {\boldsymbol {h}} and the matrix \hat {\boldsymbol {G}} contain noisy measurements instead of the true TDOA and AOA values used in their counterparts \boldsymbol {h} and \boldsymbol {G} . The formulation of the \boldsymbol {T} matrix is given in (18)–(20). From (5) it follows that \boldsymbol {T}\boldsymbol {\varepsilon } in (16) is zero-mean Gaussian with covariance matrix \boldsymbol {T} \boldsymbol {Q} \boldsymbol {T}^{T} . To calculate a weighted least-squares estimate of \boldsymbol {s} from (16), the weighted sum of squared residuals is minimized:\begin{align*} \ddot {\boldsymbol {s}}=&\arg \min _{\boldsymbol {s}} \|\hat {\boldsymbol {h}}-\hat {\boldsymbol {G}}^{T} \boldsymbol {s}\|^{2}_{\boldsymbol {W}^{-1}} \\=&\arg \min _{\boldsymbol {s}} (\hat {\boldsymbol {h}}-\hat {\boldsymbol {G}}^{T} \boldsymbol {s})^{T} \boldsymbol {W}^{-1}(\hat {\boldsymbol {h}}-\hat {\boldsymbol {G}}^{T} \boldsymbol {s}) \\=&(\hat {\boldsymbol {G}}\boldsymbol {W}^{-1}\hat {\boldsymbol {G}}^{T})^{-1}\hat {\boldsymbol {G}}\boldsymbol {W}^{-1}\hat {\boldsymbol {h}}\tag{17}\end{align*}

where \boldsymbol {W} = \boldsymbol {T} \boldsymbol {Q} \boldsymbol {T}^{T} is the weight matrix. The expression (17) is based on the implicit assumption that the measurement errors are uncorrelated with each other and that the TDOA and AOA observations have corresponding uncertainties \sigma _{RD} and \sigma _{AOA} . This is ensured by modeling \boldsymbol {Q} as a diagonal matrix whose diagonal elements are the variances of the two TDOA and three AOA observations, with all off-diagonal entries set to zero.
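
The weighted least-squares step can be sketched as a generic solver that takes the weight matrix as an input. Because the matrix \boldsymbol {T} from (18)–(20) is not reproduced here, the example below passes an arbitrary symmetric positive-definite matrix as a stand-in for \boldsymbol {W} ; the function name and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def wls_estimate(h_hat, G_hat, W):
    """Weighted least-squares solution of h = G^T s with weight matrix W.

    Evaluates (G W^-1 G^T)^-1 G W^-1 h using linear solves
    instead of explicit matrix inverses.
    """
    Winv_Gt = np.linalg.solve(W, G_hat.T)  # W^-1 G^T, shape (8, 3)
    Winv_h = np.linalg.solve(W, h_hat)     # W^-1 h,   shape (8,)
    return np.linalg.solve(G_hat @ Winv_Gt, G_hat @ Winv_h)


# Synthetic check with placeholder inputs (not data from the paper).
rng = np.random.default_rng(1)
G = rng.standard_normal((3, 8))
s_true = np.array([1.0, 1.0, 1.0])
h = G.T @ s_true                  # noise-free right-hand side
A = rng.standard_normal((8, 8))
W = A @ A.T + 8.0 * np.eye(8)     # arbitrary symmetric positive-definite stand-in

s_est = wls_estimate(h, G, W)
```

For noise-free input the estimate does not depend on the choice of the weight matrix, which makes this a convenient sanity check before substituting the actual \boldsymbol {W} .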

SECTION III.

TDOA-AOA Estimation Method

In this section, we explain the proposed approach for estimating TDOA and AOA values, computed from real sound measurements collected by three soundfield microphone stations. Given that the solution (17) weights the squared residuals by the measurement error covariance, it was essential to determine the observation uncertainties \sigma _{RD} and \sigma _{AOA} of the proposed TDOA-AOA estimation method in free-field conditions. To provide a universal reference for the method's uncertainties, all measurements in this work are made in a small anechoic chamber [56], which provides free-field conditions with no reverberation of the sound source. Fig. 2 displays our measurement setup, with three soundfield microphones and a loudspeaker representing a single sound source inside the anechoic chamber.

FIGURE 2. Anechoic chamber with a triangular stand holding three soundfield microphones and a loudspeaker representing the sound source.

A soundfield microphone uses four subcardioid capsules mounted as close together as possible to form a tetrahedron. Each soundfield microphone can be viewed as four symmetrical receivers positioned on the surface of a sphere, and it can produce two distinct sets of audio signals, called A-format and B-format. A-format consists of the four signals from the individual microphone capsules, arranged as shown in Fig. 3.

FIGURE 3. Soundfield microphone.

The B-format signals comprise a truncated spherical harmonic decomposition of the sound field. They correspond to the sound pressure and the three components of the pressure gradient at a point in space. The transformation from A-format to B-format follows directly from the signals of the individual capsules in A-format. The following linear system of equations can be used for the format conversion:\begin{align*} p_{w}=&p_{LF} + p_{RB} + p_{RF} + p_{LB}, \\ p_{x}=&p_{LF} - p_{RB} + p_{RF} - p_{LB}, \\ p_{y}=&p_{LF} - p_{RB} - p_{RF} + p_{LB}, \\ p_{z}=&p_{LF} + p_{RB} - p_{RF} - p_{LB},\tag{21}\end{align*}

where p_{w} is the sound pressure signal at the microphone position, p_{x} carries the sound velocity information in the front-back direction, p_{y} in the left-right direction, and p_{z} in the up-down direction. Additional signal filtering can be used to compensate for differences between the individual capsules.
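The A-format to B-format conversion is a fixed 4×4 linear map, so it can be sketched as a single matrix product. In the sketch below the capsule order (LF, RB, RF, LB) and the helper name `a_to_b` are assumptions made for illustration, and the p_{z} row uses the sign pattern p_{LF} + p_{RB} - p_{RF} - p_{LB}, which is the one that makes the four rows mutually orthogonal. Capsule equalization and calibration filtering are deliberately left out.

```python
import numpy as np

# Rows produce p_w, p_x, p_y, p_z; columns follow the capsule order
# (LF, RB, RF, LB). The ordering is an assumption of this sketch.
A_TO_B = np.array([
    [1,  1,  1,  1],   # p_w: omnidirectional pressure
    [1, -1,  1, -1],   # p_x: front-back
    [1, -1, -1,  1],   # p_y: left-right
    [1,  1, -1, -1],   # p_z: up-down
], dtype=float)

def a_to_b(a_format):
    """Convert a (4, n_samples) A-format array to B-format (W, X, Y, Z)."""
    return A_TO_B @ np.asarray(a_format, dtype=float)

# A unit impulse on the LF capsule only appears with weight +1 in all
# four B-format channels.
b_format = a_to_b(np.array([[1.0], [0.0], [0.0], [0.0]]))
```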

To validate estimation algorithm (17) using real measurements, this paper proposes two methods for calculating TDOA and AOA values from observed sound signal emitted from unknown source location \boldsymbol {s} .

A. TDOA Estimation

The system is composed of an unknown sound source and three soundfield microphones. If the sound source transmits a signal at time T = 0 , the microphones will sense the signal at the unknown times T_{1} , T_{2} , and T_{3} . Following (3), the differences between the TOAs can be measured as follows:\begin{align*} \Delta T_{21}=&T_{2}-T_{1}, \\ \Delta T_{31}=&T_{3}-T_{1},\tag{22}\end{align*}

where the microphone at position \boldsymbol {m}_{1} is taken as the reference. Note that for three microphones, \Delta T_{32} could additionally be calculated to introduce redundancy into the system. To calculate the time lag between two microphones, we used the generalized cross-correlation phase transform (GCC-PHAT) algorithm:\begin{align*} \hat {G}_{PHAT}(f)=&\dfrac {X_{1}(f)[X_{i}(f)]^{*}}{|X_{1}(f)[X_{i}(f)]^{*}|}, \\ \hat {d}_{PHAT}(1,i)=&\arg \max _{d}(\hat {R}_{PHAT}(d))\tag{23}\end{align*}
where x_{i} is the sound signal received by the microphone at position \boldsymbol {m}_{i} , X_{i} is its Fourier transform, \hat {R}_{PHAT}(d) is the inverse Fourier transform of \hat {G}_{PHAT}(f) , and []^{*} denotes the complex conjugate. The term \hat {d}_{PHAT} corresponds to the estimated time difference between \boldsymbol {m}_{1} and \boldsymbol {m}_{i} . To estimate TDOA using soundfield microphones, this work uses the A-format, which makes it possible to calculate the TDOA as the average time lag between the four corresponding capsules of the observed microphones, i.e., \hat {d}_{PHAT}(1_{LF},i_{LF}) , \hat {d}_{PHAT}(1_{LB},i_{LB}) , \hat {d}_{PHAT}(1_{RB},i_{RB}) and \hat {d}_{PHAT}(1_{RF},i_{RF}) . Note that for this pairwise TDOA calculation, each soundfield microphone should be aligned with the global coordinate axes. Calculating the range difference from (3) is then straightforward, where v = 331.57+0.607\lambda [m/s] is the speed of sound in air and \lambda is the air temperature in ^{\circ }C . As already mentioned, the algebraic solution (17) assumes that the measurement vector is corrupted by small Gaussian noise defined by the covariance matrix \boldsymbol {Q} , which calls for an investigation of the uncertainty of our TDOA estimation method. To analyze the TDOA estimation performance for a given measurement duration t , we examined how the uncertainty decreases as the length of the sound signal used for estimation increases. The results in Fig. 4 show that a measurement duration of t > 300 ms provides stable TDOA observations ( \sigma _{RD} < 5 cm). Fig. 4 illustrates the deviation of the time difference between \boldsymbol {m}_{1} and \boldsymbol {m}_{2} for a given sound source location \boldsymbol {s} =[{160, -30, -20}]^{T} .

FIGURE 4. Dependency between standard deviation of TDOA estimation and measurement length.
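The TDOA procedure of subsection III-A can be sketched as follows: a GCC-PHAT lag estimate per capsule pair, averaged over the four corresponding A-format capsules and scaled by the temperature-dependent speed of sound. This is an illustrative sketch, not the authors' implementation. The function names, the 48 kHz sampling rate and the synthetic white-noise test signal are assumptions. The conjugation order is also chosen so that a positive lag means the signal reaches microphone i later than the reference, matching \Delta T_{i1} = T_{i} - T_{1} ; swapping the conjugate flips the sign of the lag.

```python
import numpy as np

def gcc_phat_delay(x_ref, x_i, fs):
    """Delay of x_i relative to x_ref in seconds, estimated with GCC-PHAT."""
    n = len(x_ref) + len(x_i)              # zero-pad to avoid wrap-around
    X_ref = np.fft.rfft(x_ref, n=n)
    X_i = np.fft.rfft(x_i, n=n)
    cross = X_i * np.conj(X_ref)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # PHAT weighting: keep phase only
    r = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so that lags run from -max_shift to +max_shift.
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))
    return (np.argmax(r) - max_shift) / fs

def speed_of_sound(temp_c):
    """Speed of sound in air [m/s] for an air temperature in degrees Celsius."""
    return 331.57 + 0.607 * temp_c

def range_difference(a_ref, a_i, fs, temp_c):
    """Average the lag over the four corresponding A-format capsules.

    a_ref, a_i: (4, n_samples) arrays with capsules in the same order.
    """
    lags = [gcc_phat_delay(c_ref, c_i, fs) for c_ref, c_i in zip(a_ref, a_i)]
    return speed_of_sound(temp_c) * np.mean(lags)

# Synthetic check: 500 ms of white noise, delayed by 25 samples.
rng = np.random.default_rng(1)
fs = 48_000                                # assumed sampling rate
sig = rng.normal(size=(4, fs // 2))
delay = 25
delayed = np.concatenate((np.zeros((4, delay)), sig[:, :-delay]), axis=1)
tau = gcc_phat_delay(sig[0], delayed[0], fs)
rd = range_difference(sig, delayed, fs, temp_c=20.0)
```

Zero-padding to the combined length keeps the correlation linear rather than circular, so positive and negative lags are not confused near the edges.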

B. AOA Estimation

In this work, we propose a method for AOA estimation that exploits the B-format signals of the soundfield microphone. The AOA estimation technique is based on obtaining the directivity vector \boldsymbol {d}= [d_{x}, d_{y}, d_{z}]^{T}\in \mathbb {R}^{3} . The procedure calculates the power of each velocity signal over a given integration interval t , where the velocity signals correspond to the X, Y, and Z components of the B-format. Next, normalization is applied by dividing each p component by R = \sqrt {p_{x}^{2} + p_{y}^{2} + p_{z}^{2}} , where p_{x} , p_{y} and p_{z} represent the total power over t for each axis. To produce a pair of AOA observations from the calculated directional vector \boldsymbol {d} , an expression similar to (1) is used:\begin{align*} \begin{bmatrix} \theta _{i} \\ \phi _{i} \end{bmatrix} = \begin{bmatrix} \arctan \left ({\dfrac {d_{y,i}}{d_{x,i}}}\right) \\ \arctan \left ({\dfrac {d_{z,i}}{\sqrt {d_{x,i}^{2} + d_{y,i}^{2}}} }\right) \end{bmatrix},\quad i=1,2,3\tag{24}\end{align*}

where \theta _{i} and \phi _{i} are the azimuth and elevation angles relative to the internal origin of microphone i . It should be noted that (24) is prone to error if the axes of each soundfield microphone used for AOA estimation are not perfectly aligned with the axes of the global geometry.
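The normalization by R and the conversion in (24) can be illustrated with a short sketch. The helper names are hypothetical, and `atan2` is used in place of the plain arctangent as an assumption of this sketch: it agrees with (24) whenever d_{x} > 0 and additionally keeps the correct quadrant otherwise. How the sign of each component is recovered from the B-format signals is not covered here.

```python
import math

def normalize(p):
    """Scale a 3-component vector to unit length (division by R)."""
    R = math.sqrt(sum(c * c for c in p))
    return [c / R for c in p]

def direction_to_angles(d):
    """Azimuth and elevation (radians) of a direction vector [dx, dy, dz]."""
    dx, dy, dz = d
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    return azimuth, elevation

# Example: a direction with equal x and y components, raised by 45 degrees.
d = normalize([1.0, 1.0, math.sqrt(2.0)])
theta, phi = direction_to_angles(d)
```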

Following the same logic as in subsection III-A, we investigated the dependency between the directivity error and the integration time, i.e., the duration of the sound signal used for calculating AOA observations. Fig. 5 shows that for t > 400~ms our method starts to stabilize, providing estimation results with a standard deviation of \sigma _{AOA} < 0.5^{\circ } . Fig. 5 illustrates the standard deviation of the estimated angle of the directional vectors, measured by \boldsymbol {m}_{1} for the sound source position \boldsymbol {s} =[{160, -30, -20}]^{T} .

FIGURE 5. Dependency between standard deviation of AOA estimation and measurement length.

Based on the observed dependency between \sigma _{AOA} and t , and on the conclusion regarding \sigma _{RD} and t from subsection III-A, we chose a signal measurement duration of t=500 ms for all subsequent experiments.

SECTION IV.

Localization Results and Performance Evaluation

This section presents a performance study of the localization accuracy of the proposed SSL approach in free-field conditions. First, a simulation is made to demonstrate the margin of localization error with respect to the uncertainty in the TDOA-AOA observations. To alleviate the dependency on a particular geometry, TDOA-AOA values are calculated by sampling the angles between the three microphone stations from a uniform distribution. In total, three scenarios are simulated to analyze how measurement error affects the deviation of the localization results from the theoretical CRLB. Second, the simulation results are validated experimentally in the anechoic chamber, containing one sound source and three soundfield microphones. An anechoic environment, although far from realistic, makes it possible to control the environmental conditions precisely and to determine accurately the measurement deviations of the proposed TDOA-AOA estimation method, which is essential in this experiment. If the experiment were carried out in a reverberant room, the room acoustics would influence the estimation of the TDOA-AOA values, and the measurement uncertainties obtained would not serve as a universal reference. Once the measurement uncertainties \sigma _{RD} and \sigma _{AOA} are identified, the proposed solution (17) can be tested using TDOA-AOA values estimated from a real sound source.

A. SSL Simulation Under Gaussian Noise

This subsection presents performance analysis of the proposed estimation method where it will be shown that for the given localization scenario, the proposed method can reach CRLB for Gaussian noise present in TDOA and AOA measurements. For the known source position \boldsymbol {s} , CRLB matrix is defined by [57]:\begin{align*} \boldsymbol {CRLB}(\boldsymbol {s})=&\boldsymbol {FIM}^{-1}(\boldsymbol {s}) \tag{25}\\ \boldsymbol {FIM}(\boldsymbol {s})=&\left({\dfrac {\partial {\boldsymbol {\mu }}}{\partial {\boldsymbol {s}^{T}}}}\right)^{T}\boldsymbol {Q}^{-1}\dfrac {\partial {\boldsymbol {\mu }}}{\partial {\boldsymbol {s}^{T}}}\tag{26}\end{align*}

where FIM is the Fisher information matrix, given by (26) for zero-mean Gaussian noise with covariance matrix \boldsymbol {Q}\in \mathbb {R}^{8 \times 8} . Taking the partial derivative of \boldsymbol {\mu } with respect to s_{x} , s_{y} and s_{z} yields:\begin{equation*} \dfrac {\partial {\boldsymbol {\mu }}}{\partial {\boldsymbol {s}^{T}}} = [\boldsymbol {c}_{21}, \boldsymbol {c}_{31}, \boldsymbol {D}_{1}^{T},\boldsymbol {D}_{2}^{T}, \boldsymbol {D}_{3}^{T}]^{T}\in \mathbb {R}^{8 \times 3}\tag{27}\end{equation*}
where:\begin{align*} \boldsymbol {c}_{i1}=&\dfrac {\boldsymbol {s}-\boldsymbol {m}_{i}}{r_{i}}-\dfrac {\boldsymbol {s}-\boldsymbol {m}_{1}}{r_{1}}, \quad i =2,3. \tag{28}\\ \boldsymbol {D}_{i}=&\begin{bmatrix} \dfrac {-d_{y,i}}{l_{i}^{2}} &\quad \dfrac {d_{x,i}}{l_{i}^{2}} &\quad 0 \\ \dfrac {-d_{x,i}d_{z,i}}{r_{i}^{2}l_{i}} &\quad \dfrac {-d_{y,i}d_{z,i}}{r_{i}^{2}l_{i}} &\quad \dfrac {l_{i}}{r_{i}^{2}} \\ \end{bmatrix}, \\ l_{i}=&\sqrt {d_{x,i}^{2} + d_{y,i}^{2}}, \quad i=1,2,3\tag{29}\end{align*}
Here d_{x,i} = s_{x} - m_{x,i} and, correspondingly, d_{y,i} and d_{z,i} are the coordinate differences between the source position \boldsymbol {s} and \boldsymbol {m}_{i} along each axis.

The theoretical value of CRLB presents the best possible accuracy that estimator can achieve given small measurement error. To adequately analyze the performance of the proposed method, evaluation is made in terms of the difference between the theoretical value of square root CRLB and root mean square error (RMSE) of the estimation algorithm for a given measurement error variance \sigma _{AOA}^{2} , \sigma _{RD}^{2} . In this work, measurement error covariance matrix \boldsymbol {Q} is modeled as:\begin{align*} \boldsymbol {Q} = \left [{\begin{matrix} \sigma _{RD}^{2} &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, \sigma _{RD}^{2} &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} \\ \end{matrix}}\right] \\\tag{30}\end{align*}
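To make (25)–(30) concrete, the following sketch assembles the 8×3 Jacobian row by row, builds the diagonal \boldsymbol {Q} and inverts the Fisher information matrix. Several choices are assumptions of this sketch rather than statements from the paper: r_{i} is taken as the Euclidean distance \|\boldsymbol {s}-\boldsymbol {m}_{i}\| , lengths are in metres and angles in radians, the square-root CRLB is summarized as the square root of the trace, and the function names and example geometry are illustrative only.

```python
import numpy as np

def jacobian(s, mics):
    """8x3 Jacobian: two range-difference rows, then an
    azimuth/elevation pair of rows for each of the three microphones."""
    d = np.asarray(s, float) - np.asarray(mics, float)  # row i is s - m_i
    r = np.linalg.norm(d, axis=1)                       # ranges r_i
    l = np.hypot(d[:, 0], d[:, 1])                      # horizontal ranges l_i
    u = d / r[:, None]                                  # unit vectors
    rows = [u[1] - u[0], u[2] - u[0]]                   # c_21 and c_31
    for (dx, dy, dz), ri, li in zip(d, r, l):
        rows.append([-dy / li**2, dx / li**2, 0.0])
        rows.append([-dx * dz / (ri**2 * li),
                     -dy * dz / (ri**2 * li),
                     li / ri**2])
    return np.array(rows)

def crlb(s, mics, sigma_rd, sigma_aoa):
    """CRLB matrix for uncorrelated TDOA (range) and AOA errors."""
    J = jacobian(s, mics)
    Q = np.diag([sigma_rd**2] * 2 + [sigma_aoa**2] * 6)
    fim = J.T @ np.linalg.solve(Q, J)                   # J^T Q^{-1} J
    return np.linalg.inv(fim)

# Example geometry: equilateral triangle with 1 m sides in the XY plane.
radius = 1.0 / np.sqrt(3.0)
ang = np.deg2rad([90.0, 210.0, 330.0])
mics = np.column_stack((radius * np.cos(ang), radius * np.sin(ang), np.zeros(3)))
s = np.array([1.6, 0.4, 0.4])
C = crlb(s, mics, sigma_rd=0.10, sigma_aoa=np.deg2rad(0.5))
root_crlb = np.sqrt(np.trace(C))   # scalar summary, in metres
```

Because \boldsymbol {Q} enters through its inverse, doubling both standard deviations scales the CRLB matrix by a factor of four.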


In total, three different simulations of the sound source localization scenario are run. First, we inspect the performance of the proposed estimation method as the uncertainty of the TDOA measurement increases. Second, we perform a similar test by increasing the standard deviation \sigma _{AOA} . Third, the difference between the RMSE and the theoretical root CRLB is analyzed as the distance of the source \boldsymbol {s} from the origin increases.

As input simulation data, the TDOA and AOA observations are modeled as in (4), where \boldsymbol {\tau _{i1}} and \boldsymbol {\kappa _{i}} correspond to the true geometrical values calculated from (1) and (2), while \varepsilon _{i1} and \boldsymbol {\epsilon }_{i} are the corresponding zero-mean Gaussian noise terms defined by the error variances \sigma _{RD}^{2} and \sigma _{AOA}^{2} , respectively. Note also that the measurement noise is uncorrelated across the TDOA and AOA observations.

The results of each simulation are reported in terms of RMSE, defined as RMSE(s) = \sqrt {\sum _{l=1}^{L}||\ddot {\boldsymbol {s}}_{l}- \boldsymbol {s}||^{2}/L} , where \ddot {\boldsymbol {s}}_{l} is the estimated source position after the {l} -th iteration and \boldsymbol {s} is the true position of the source. L = 500 is the number of estimation runs for a given \sigma _{RD} , \sigma _{AOA} and \boldsymbol {s} . Note that the weight matrix \boldsymbol {W} in (17) depends on the actual source position via (18), which is not known in advance. We can set \boldsymbol {W} to the identity matrix \boldsymbol {I}\in \mathbb {R}^{8 \times 8} to obtain an initial position estimate \ddot {\boldsymbol {s}}_{1} from (17), which in this case reduces to ordinary LS estimation. For the rest of the simulation, the matrix \boldsymbol {W} is computed using the previously estimated source position \ddot {\boldsymbol {s}}_{l-1} together with the current AOA observation. The simulation procedure follows the geometric configuration shown in Fig. 1, where three stations are placed 1 m apart, forming the vertices of an equilateral triangle centered at the origin and lying in the XY plane. For each simulation, the reported RMSE value is obtained by averaging the results for ten different rotations of the equilateral triangle around the Z-axis.
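The simulation geometry and the RMSE metric can be sketched as below. The helper names and the rotation parameter are hypothetical, the vertex angles are one arbitrary orientation of the triangle, and the constant-offset "estimates" are placeholder values used only to exercise the RMSE formula; they are not simulation results.

```python
import numpy as np

def triangle_stations(side=1.0, rotation_deg=0.0):
    """Vertices of an equilateral triangle centered at the origin in the XY plane."""
    radius = side / np.sqrt(3.0)                       # circumradius
    angles = np.deg2rad(rotation_deg + np.array([90.0, 210.0, 330.0]))
    return np.column_stack((radius * np.cos(angles),
                            radius * np.sin(angles),
                            np.zeros(3)))

def rmse(estimates, s_true):
    """Root of the mean squared Euclidean error over L estimates."""
    err = np.asarray(estimates, float) - np.asarray(s_true, float)
    return np.sqrt(np.mean(np.sum(err**2, axis=1)))

mics = triangle_stations(side=1.0, rotation_deg=36.0)
side_lengths = [np.linalg.norm(mics[i] - mics[j])
                for i, j in ((0, 1), (1, 2), (2, 0))]

# Placeholder estimates: every run is off by the same 3-4-5 offset (0.05 m).
s_true = np.array([1.6, 0.4, 0.4])
estimates = np.tile(s_true + np.array([0.03, 0.0, 0.04]), (500, 1))
error = rmse(estimates, s_true)
```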

The results of the first simulation are shown in Fig. 6. For fixed parameters \sigma _{AOA}= 0.5^{\circ } and \boldsymbol {s} = [{160,40,40}]^{T} cm, the reported RMSE closely follows the theoretical CRLB as \sigma _{RD} increases. Fig. 7 illustrates the results of the second simulation, where \sigma _{AOA} varies from 0.5^{\circ } to 2.5^{\circ } while \sigma _{RD} is fixed at 10 cm and the source remains at position \boldsymbol {s} = [{160,40,40}]^{T} cm. The third simulation evaluates the performance as a function of the distance of the source, for fixed \sigma _{RD} = 10 cm and \sigma _{AOA} = 0.5^{\circ } . In this case the source moves away from the origin according to \boldsymbol {s} = a[{50,50,50}]^{T} , where a ranges from 1 to 5. Fig. 8 shows that the proposed method provides near-CRLB performance.

FIGURE 6. Simulation 1: RMSE dependency on \sigma _{RD} .

FIGURE 7. Simulation 2: RMSE dependency on \sigma _{AOA} .

FIGURE 8. Simulation 3: RMSE dependency on the distance of the source.

B. Experimental Evaluation

In this subsection, we present experimental results on sound source localization in 3D space. Following the simulation setup of subsection IV-A, three TetraMic soundfield microphones were placed 1 m apart, forming the vertices of an equilateral triangle. Arta measurement software was used as the white noise generator; its output was converted to an analog audio signal by a MOTU STAGE B-16 audio interface, amplified with a Pioneer SA-8500 II amplifier and played through a Magnat 145 801 12 loudspeaker, which represented the source. Each TetraMic is connected to the audio interface via four XLR-M connectors coming from the PPA receiver. For each soundfield microphone, the four-channel signal is digitized with the B-16 audio interface and recorded in A-format with Reaper software. To obtain B-format from the four-channel A-format, we used the official VVMic software, which also applies corrections using the calibration file provided with the specific microphone.

In the experiment we used 48 different sound source locations. Due to the space limitations of our anechoic chamber, all measurements were conducted with s_{x} >0 , following the measurement plan shown in Fig. 9: after the measurements for each set of 12 sound source locations were completed, the equilateral triangle holding the TetraMic microphones was rotated around the Z-axis by 90°.

FIGURE 9. Measurement plan.

For each source location \boldsymbol {s}_{i} , a white noise signal of 18 s duration was recorded. As concluded in the previous section, an interval of t=500\,\,ms was chosen as the measurement duration, which resulted in 36 observations for each source position \boldsymbol {s}_{i} . Following the procedure explained in Section III, for each segment we calculated two TDOAs from A-format and three AOAs from B-format. The measured observations were first used in (17) to obtain the initial position estimate \ddot {\boldsymbol {s}}_{1} of the source, which is needed to calculate the weight matrix \boldsymbol {W} . After the initial position estimate was obtained, (17) was applied 35 more times; for each position update, the source estimate from the previous step was used to calculate \boldsymbol {W} , which was then used together with the new TDOA and AOA observations to estimate a new sound source position in 3D space. Since the computational step (17) is performed for each position update, it is of interest to analyze the efficiency of (17) in terms of computation time for its corresponding TDOA-AOA values. All processing was carried out in Matlab 2019 on a 64-bit PC with an Intel(R) Core(TM) i5 CPU @ 3.80 GHz and 32.00 GB of 400 MHz DDR4 memory. The average running time of (17) is 25.4 ms for the initial position estimate and 3.9 ms for each additional position update.
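The segmentation of each recording into 500 ms observation windows can be sketched as follows. The 48 kHz sampling rate and the helper name `split_segments` are assumptions of this sketch (the sampling rate is not stated above), and an all-zero array stands in for a real four-channel A-format recording.

```python
import numpy as np

def split_segments(recording, fs, seg_seconds=0.5):
    """Split a (channels, n_samples) recording into non-overlapping segments."""
    seg_len = int(round(fs * seg_seconds))
    n_seg = recording.shape[-1] // seg_len
    trimmed = recording[..., : n_seg * seg_len]   # drop any incomplete tail
    return np.split(trimmed, n_seg, axis=-1)

fs = 48_000                                       # assumed sampling rate
recording = np.zeros((4, 18 * fs))                # placeholder for 18 s of A-format audio
segments = split_segments(recording, fs)
```

Each segment then yields one set of TDOA and AOA observations, so an 18 s recording provides 36 position updates.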

To show the effectiveness of the proposed approach, Fig. 10 illustrates the estimation results for 4 different sound source locations, for which the reported RMSE values were 11.17, 15.56, 18.90, and 10.47 cm, respectively. Note that the estimate shown in cyan, which lies apart from the rest of the estimation group, corresponds to the initial position guess, while the true source location is indicated by the coordinates label. The narrow spread of the estimates is clearly visible.

FIGURE 10. Sound source position estimation for 4 different locations in anechoic chamber.

We also analyzed the dependency between the reported RMSE and the number of iterations. Fig. 11 shows that (17) provides better results as more iterations are used, so even higher accuracy can be achieved with longer measurement intervals. The obtained result indicates that the proposed procedure can be used to locate a stationary sound source with excellent precision in the free field. We plot the experimental cumulative distribution function (CDF) of the positioning error RMSE in Fig. 12 to show the localization performance of the proposed weighted least-squares estimator (17). The CDF in Fig. 12 is calculated for 48 source positions, with a median value of 21.58 cm. Given a measurement interval of 18 s, i.e., 36 estimations per point, the worst reported RMSE value reaches 38.90 cm, while the best point has an RMSE of 10.47 cm. The CDF graph shows that for 90% of the source positions, the localization error falls within a margin of 28.38 cm.

FIGURE 11. Dependency between reported RMSE and number of estimation steps.

FIGURE 12. The CDF of the RMSE for 48 measurement points.
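The summary statistics read from the CDF (median and 90th percentile) can be computed as in the sketch below. Only the four per-location RMSE values reported for Fig. 10 are used as input, since the full set of 48 values is not listed; the output therefore illustrates the computation and does not reproduce Fig. 12. The function name and the use of NumPy's default linear percentile interpolation are assumptions.

```python
import numpy as np

def empirical_cdf(values):
    """Sorted values and the fraction of samples at or below each of them."""
    x = np.sort(np.asarray(values, float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

# RMSE values in cm for the four locations shown in Fig. 10.
rmse_cm = [11.17, 15.56, 18.90, 10.47]
x, p = empirical_cdf(rmse_cm)
median_cm = np.median(rmse_cm)
p90_cm = np.percentile(rmse_cm, 90)   # default linear interpolation
```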

From the presented results, it is evident that the proposed measurement method provides accurate estimates of the azimuth and elevation angles of an unknown sound source in the anechoic chamber. Free-field conditions play a major role in acoustic measurements and sound perception experiments, as the results are influenced only by the direct-path component of the sound source and not by multipath components caused by room reflections. Given that the experimental evaluation of the SSL task was performed using real sound measurements, the above results indicate potential for further research, where it can be expected that, by employing a more robust method for estimating TDOA, such as the method in [54], the results can translate to real-world cases. Although room acoustics would undeniably influence the SSL results, conducting the experimental evaluation in a controlled environment provides a universal reference for localization performance. Since most state-of-the-art TDOA-based SSL methods either lack experimental analysis (results are given through simulation only) or are evaluated in non-reproducible environmental conditions, a direct comparison is often not possible. Despite the differing conditions under which the results were obtained, a comparison with the state of the art is made to demonstrate the performance of the proposed SSL approach. For instance, the state-of-the-art method in [17] reported a localization performance of 0.3 m in terms of RMSE in a typical room experiment. The authors also compared their approach with the commonly used SRP-PHAT method and showed that it outperformed SRP-PHAT, which had a reported localization error of 0.76 m.
It is to be expected that evaluation results obtained in the free field will outperform results obtained in a real-life environment [17]. The comparison shows that the proposed TDOA-AOA approach, based on three soundfield microphones, achieves a localization error that is 0.09 m lower than that of the Multilevel B-Splines-Based Learning Approach and 0.55 m lower than that of the SRP-PHAT method [17]. The fact that the presented SSL results are 30% and 72% lower in terms of RMSE indicates the effectiveness of the proposed approach in the free field, with great potential to operate under realistic conditions.

SECTION V.

Conclusion

This paper proposes a sound source localization method in 3D space using a geometric configuration of three microphone stations. A closed-form solution for estimating the sound source location based on two TDOAs and three AOAs is presented. In this work, a soundfield microphone is used as the measurement station. By exploiting the A-format of the soundfield microphone, a pairwise TDOA estimate is obtained with generalized cross-correlation as the time lag between the signals from two microphones. A method for obtaining AOA measurements is proposed, based on calculating a directional vector derived from the B-format. The impact of the signal measurement interval length on TDOA and AOA estimation performance is investigated. The proposed method was evaluated by simulations and by physical experiments in our anechoic chamber. The simulation results show that (17) reaches the theoretical CRLB performance for small Gaussian noise present in the measurements. Since (17) can be viewed as an iterative algorithm, the relationship between accuracy and the number of iteration steps was investigated, leading to the conclusion that a longer recording of the signals from the unknown source position yields higher accuracy. The obtained experimental results indicate that the proposed procedure can be used to locate a stationary sound source with good precision in a free-field environment. In future work, we plan to extend the proposed method to multiple source localization and tracking, first in free-field anechoic chamber conditions and later under different SNR and reverberation conditions. Special effort will be devoted to real-life scenarios, such as detecting speech in noisy environments when distant recording is required, in order to increase speech recognition performance.
