Introduction
Sound source localization (SSL) is an essential step in a wide range of audio and acoustic applications. Current research topics in SSL range from detecting the speaker position in human-computer interaction [1] or smart video conferencing [2], through robot movement in an unknown environment [3], [4] and search and rescue [5], to advanced military applications such as sniper localization [6] and medium-range aircraft localization [7]. In addition, SSL is commonly used as a preprocessing step before enhancing an acoustic signal from a particular location [8].
In general, source localization can be viewed either as an active localization scenario, where transmitters emit signals to illuminate the target of interest and the target location is inferred from the collected reflections, or as a passive localization scenario, where the receiving sensors can only collect signals emitted by the source. In many areas, only passive source localization is considered, where the signals usually carry no information about their time of transmission. Sound localization, as a type of passive localization, refers to the problem of estimating the position from which a sound signal originates with respect to the microphone array geometry. In this case, a localization system is unable to directly measure the time of arrival (TOA) from the source to a receiving sensor; instead, only the differences between the times at which different sensors receive the signal can be measured.
Various methods for SSL have been proposed, and they can be grouped by whether they locate the sound source in 2D space [9], [10] or in 3D space [3], [11]. Fundamentally, there are two main approaches to finding a source from recorded audio signals. Both are mainly based on estimating the time difference of arrival (TDOA), obtained using various microphone array configurations, such as a linear array [12], circular array [13], or distributed array [14], and different cross-correlation algorithms to estimate the time lag between microphones. The first approach aims to maximize the steered response power (SRP) of the output of a delay-and-sum beamformer [15]. This direct approach performs an exhaustive search over the whole SRP space to find the sound location, which is computationally expensive. In contrast, the indirect approach uses estimated TDOA measurements, where the sound source is located by addressing criteria such as the hypercone fitting problem [16]. Although results obtained by the indirect method are more prone to error, since they are more sensitive to background noise and reflections than direct methods, the main advantage of the indirect TDOA-based approach is that it can be used effectively in a distributed microphone network, because it only requires TDOA values to be transmitted and not raw sound signal data [17].
A. Related Work
In many research papers, TDOA is a standard measurement used for passive localization [18], [19]. Source localization based solely on TDOA measurements requires a minimum number of measuring devices: at least three and four sensors are required to locate an unknown target in 2D and 3D space, respectively. The principle of the TDOA-based approach is to estimate the source position from the intersection of hyperbolic arcs in the 2D case and hyperbolic surfaces in the 3D case. In near-field applications, where the ratio of the source range to the sensor array baseline is not large, the resulting intersection can be obtained by solving a set of nonlinear equations. In contrast, in far-field applications the resulting intersection produces a poor location estimate, since the hyperbolic arcs/surfaces become almost parallel to one another. The observation that the accuracy of the source position estimate degrades when the source moves sufficiently far away from the sensors has resulted in two different localization problems. Recently, a unified near/far-field TDOA-based localization was proposed in [20]. The proposed approach consists of two formulations of the unified localization problem in two different coordinate systems. The first is a nonlinear, non-convex weighted least squares optimization based on the modified polar representation of the source position, and the second is a non-convex fractional programming formulation using the conventional Cartesian coordinates of the source position as the optimization variable. Besides the TDOA-based approach, a source location can also be calculated from angle of arrival (AOA) measurements and their derivatives. Obtaining the AOA often involves a sensor equipped with an array of receivers; this alleviates the requirement for synchronization between different sensors, since each sensor produces an angle by itself. In [21], the authors proposed a solution that can attain the Cramér-Rao lower bound under mild conditions.
Another 3D bearing-only localization method was proposed in [22], where the authors achieved a significant reduction in bias and root-mean-square error using a pseudo-linear estimator. In general, the passive source localization problem is not trivial, since the direct relationships between the position of a source and the measurements are complex, and solving the equations of the TDOA and AOA methods is hard because of their nonlinearity.
One of the more recent research directions is combining TDOA and AOA measurements, where algebraic manipulations allow the relationships to be transformed into a linear form and energy consumption to be lowered [23]. The hybrid TDOA-AOA approach has several advantages, such as improved localization performance [24], [25], a reduced number of required sensors [26], and fewer ghost targets, which are typical for localization with TDOA measurements alone [27]. Many studies have proposed different solutions to source localization [26], [28]–[31]. The most straightforward approach is an exhaustive search in a feasible solution region, which is time-consuming and inefficient for real-time applications. In general, the maximum likelihood (ML) estimator is introduced to estimate the location since it is asymptotically efficient.
However, the aforementioned approaches are computationally expensive, and a closed-form solution is hard to find or does not exist at all. One solution is to linearize the equations with a recursive approach such as the Taylor series. Nevertheless, these numerical search techniques converge to an optimal solution only if the ML function is convex. The numerical methods are prone to error since they depend on a good initial position guess, and thus it is difficult to guarantee their global convergence and computation time. Their iterative nature makes them poorly suited to real-time applications. To improve robustness and reduce complexity, a closed-form solution is required. A linear least-squares estimator with a closed-form solution, called the pseudo-linear estimator (PLE), was proposed in [22]. Although this approach is less computationally demanding, the estimated source position is biased because of the correlation between system and measurement. Recently, the authors of [32] proposed a new localization method: a simple algebraic solution that does not suffer from the local convergence problem. However, this method also has a larger bias, since the localization accuracy is affected by the deviation of the noise correlation matrix. Another fine localization solution, based on the generalized trust-region subproblem technique, was proposed in [33], where the authors analyzed the necessary optimality conditions of the squared range-difference least-squares cost function. To reduce the bias of the estimator, several methods were proposed. In [34], the BR-PLE method was proposed for reducing bias. The authors of [35] proposed finding a rotation angle that reduces the bias of the estimator, since they demonstrated that estimator performance is sensitive to the rotation of the origin. In [36], another solution was proposed for reducing estimator bias in the presence of sensor errors.
The proposed method introduces a quadratic constraint so that the expectation of the estimator cost function attains its minimum value at the true position, thereby achieving the Cramér-Rao lower bound (CRLB). From the analysis of these approaches, one can conclude that for most methods the estimation bias arises from the least-squares technique. One solution is to construct a new pseudo-linear system, as in [27], where a new weighted least-squares method is proposed. The authors claimed that the novel structured total least squares method can reduce estimation bias when the target is outside the convex hull formed by the measurement sensors. The total least squares estimator was also considered in [37], where it demonstrated improved accuracy over least-squares solutions. Another practical solution to position estimation was proposed in [38]. The proposed closed-form method was based on converting time-delay measurements to angular information, but it could not achieve CRLB performance. Recently, an efficient closed-form solution was proposed for passive source localization using only two stations [26]. In that work, a new relationship between hybrid TDOA and AOA measurements and the unknown source position was constructed. The theoretical simulation shows that the proposed solution can achieve the CRLB for Gaussian noise, where the bias can be ignored compared to the variance.
By analyzing the aforementioned source localization approaches, some general remarks regarding localization performance can be made. Considering the different types of input data for source estimation, one can observe that using hybrid TDOA and AOA measurements reduces the number of sensors required, which is especially important in real-life scenarios where the basic assumption of line-of-sight (LOS) between the unknown source position and the sensor may not hold. In many non-line-of-sight (NLOS) environments, signals emitted from the source are often not accessible to all measurement sensors, and one solution is to use wireless sensor networks (WSN) in which the signals received by each sensor are transmitted to a fusion center for source localization. The authors of [39] proposed a distributed NLOS cooperative localization algorithm. The proposed algorithm employs a multiplicative convex model based on the physical mechanism of NLOS propagation to achieve robustness in changing environments. A TDOA-based cooperative localization approach for mixed LOS/NLOS conditions was proposed in [40]. To locate multiple stationary target nodes, the authors formulated a non-convex robust weighted least squares (RWLS) problem. To solve the RWLS problem efficiently, the semidefinite relaxation technique is used to transform it into a convex mixed semidefinite and second-order cone programming problem. The authors of [41] presented an energy-based localization solution in WSNs using received signal strength (RSS) and received signal strength difference (RSSD). In the proposed solution, RSSD is based on transforming the nonlinear and non-convex objective functions into a convex optimization problem via relaxation and semidefinite programming. Another mixed semidefinite and second-order cone relaxation for source localization in a 3D WSN was proposed in [42].
The proposed target node localization is based on hybrid RSS-AOA measurements in both noncooperative and cooperative WSNs, where the authors proposed a new LS estimator to reduce the implementation costs. From the conducted analysis, the authors concluded that the proposed RSS-AOA-based estimator is more suitable for large-scale cooperative WSNs than AOA-TDOA-based estimators. In general, centralized algorithms suffer from high computational complexity and network traffic bottlenecks and, as such, are not recommended in scenarios where each sensor node cannot get raw measurements directly. To address this problem, the authors of [43] proposed a completely decentralized localization approach based on augmented Lagrangian methods and the alternating direction method of multipliers (ADMM). In terms of computational efficiency, the unknown source position can be calculated directly from different sets of geometric relationships. Although the computational complexity of this kind of approach is low, it frequently performs poorly when measurement error exists. Since TDOA and AOA observations contain measurement error, the location of a source is often estimated by propagating these errors through the computation, where iterative nonlinear minimization is required for an optimal solution. The most straightforward approaches for handling measurement noise are iterative algorithms that start from an initial position estimate, for example the Gauss-Newton method, which have high computational requirements. The maximum likelihood estimator is asymptotically efficient, but it requires a good initial guess. To avoid the need for the good initial position guess characteristic of the iterative ML approach, different closed-form source estimation methods have been proposed. A linear LS approach is an alternative, which can achieve the CRLB but reports large estimation bias.
In general, various closed-form solutions have been proposed, each designed to reduce bias or to work with a different number of sensors. To assess localization performance, most state-of-the-art studies benchmark against the CRLB. The literature shows that the weighted least squares (WLS) estimator is superior to LS solutions in terms of lower bias. It can also be concluded that, even though both the WLS and ML approaches can achieve the CRLB, the WLS approach is more computationally attractive, better suited to real-time applications, and does not rely on an initial guess. From the performance comparison between different methods, one can conclude that while most state-of-the-art closed-form solutions achieve the CRLB, each approach is designed to work under restricted measurement error.
In the context of room acoustics, it is difficult to establish the measurement error, as the measurement microphone captures not only the direct-path component of the source signal but also the multipath component caused by reflections. The multipath component, together with the background noise, can distort the time delay information in the received signals and thus degrade the localization performance. To address key challenges of realistic environments, such as room reverberation, background noise, and sound interference, different methods to compute the TDOAs across various combinations of pairs of spatially separated microphones were proposed [44]–[47]. Recent work also suggested that deep learning can be applied successfully to modeling room acoustics [48]. For instance, in [49] the authors employed a deep model for SSL and showed that the deep learning-based system achieved higher accuracy under low SNR conditions than the generalized cross-correlation with phase transform (GCC-PHAT) method. The authors of [17] proposed a novel learning approach for SSL based on TDOA estimation, where the coordinates of a sound source were defined as functions of TDOA. In their work, pre-recorded sound measurements and their corresponding source locations were used to train a multilevel B-spline based learning model. A new dataset for learning-based SSL was proposed in [50]; it contains different acoustic events recorded in an anechoic chamber, and the anechoic environment was used to verify the feasibility of the proposed baseline model. The authors of [51] addressed SSL for indoor environments with high reverberation and low signal-to-noise ratio. They proposed a novel sound source localization method using a probabilistic neural network for the classification of 3D space clusters.
Another learning-based approach was demonstrated in [52], where the authors evaluated the Structural Sparse Bayesian Learning model with signals recorded in an anechoic chamber with one reflective plate. Moreover, many researchers validated their SSL methods either by using recorded audio signals simulating free-field conditions or by carrying out the experimental analysis inside a semi-anechoic or anechoic chamber. The anechoic chamber provides a good simulation of outdoor conditions due to the low level of reflections. This environment makes it possible to precisely control the conditions and to measure the levels of sound events and noise, which is essential for many experiments. Kotus et al. [53] performed localization of multiple sound sources in an anechoic chamber, where different methods for obtaining the direction of arrival were tested. In [54], the authors proposed a modified cross-correlation algorithm to obtain a more reliable measurement of the time difference of arrival in a reverberant environment. They performed a triangulation procedure using the calculated TDOA values to obtain the sound source position in 3D space. Another three-dimensional method for SSL was proposed by Ding et al. [55]. The authors performed a theoretical simulation accompanied by experimental results in an anechoic chamber. They proposed the use of a planar microphone array combined with a beamforming technique to obtain the location of a point sound source.
B. Outline
This paper presents a passive sound source localization method using three soundfield microphone stations. The geometric configuration of three soundfield microphones can be employed to obtain two TDOA and three AOA measurements with respect to the unknown source position. A closed-form mathematical solution for SSL estimation is presented. Results are given in terms of RMSE, and it is shown that, in simulation, the sound source estimation algorithm can reach the Cramér-Rao lower bound for small Gaussian noise present in the measurements. The theoretical simulation is supported by experimental analysis conducted in our anechoic chamber [56]. In this work, TDOA and AOA measurements are obtained directly by exploiting the A-format and B-format of the soundfield microphones, which can achieve small measurement deviations. This paper also demonstrates the value of using a soundfield microphone for the SSL task, because the AOA can be obtained easily due to the configuration of the soundfield microphone capsules.
The rest of the paper is organized as follows. Section II describes the mathematical formulation of the SSL algorithm. Section III presents our TDOA and AOA estimation methods using soundfield microphones. The simulation and experimental results in the anechoic chamber are given in Section IV, and Section V concludes the paper.
3D Source Location Estimation
This section explains a novel sound source localization method based on three soundfield microphone stations. First, we establish a geometrical relationship between the unknown position of the sound source and the known positions of the sensors through the obtained AOA and TDOA observations. Second, we define a weighted least squares estimator, where the sum of the squared residuals is minimized with respect to the measurement error vector. Measurement error is modeled by a covariance matrix containing the measurement uncertainties. In the experimental phase, these measurement error parameters are obtained through testing in the anechoic chamber [56].
Here we present the theoretical formulation of the localization scenario, in which a geometric configuration of three stations at known positions $\boldsymbol {m}_{i} = [m_{x,i}, m_{y,i}, m_{z,i}]^{T}$ observes a source at the unknown position $\boldsymbol {s} = [s_{x}, s_{y}, s_{z}]^{T}$. The azimuth $\theta _{i}$ and elevation $\phi _{i}$ of the source seen from station $i$ are \begin{align*}\begin{bmatrix} \theta _{i} \\ \phi _{i} \end{bmatrix} = \begin{bmatrix} \arctan \left ({\dfrac {s_{y}-m_{y,i}}{s_{x}-m_{x,i}}}\right) \\ \arctan \left ({\dfrac {s_{z}-m_{z,i}}{\sqrt {(s_{x}-m_{x,i})^{2} + (s_{y}-m_{y,i})^{2}}} }\right) \end{bmatrix}, \quad i=1,2,3 \tag{1}\end{align*}
Let the distance between the source and station $i$ be \begin{equation*} r_{i} = ||\boldsymbol {s}-\boldsymbol {m}_{i}|| = \sqrt {(\boldsymbol {s}-\boldsymbol {m}_{i})^{T}(\boldsymbol {s}-\boldsymbol {m}_{i})},\quad i=1,2,3.\tag{2}\end{equation*}
\begin{equation*} \tau _{i1} = r_{i} - r_{1} = \Delta t_{i1} * v, \quad i=2,3.\tag{3}\end{equation*}
To estimate the unknown source position \begin{align*} \hat {\boldsymbol {\kappa }}_{i}=&\boldsymbol {\kappa _{i}} + \boldsymbol {\epsilon }_{i} = \begin{bmatrix} \theta _{i} + n_{i}\\ \phi _{i} + m_{i} \end{bmatrix}, \\ \hat {\tau }_{i1}=&\tau _{i1} + \varepsilon _{i1}\tag{4}\end{align*}
\begin{equation*} \hat {\boldsymbol {\mu }} = \boldsymbol {\mu } + \boldsymbol {\varepsilon }\tag{5}\end{equation*}
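The measurement model in (1)–(5) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the ordering of the stacked vector (two range differences followed by the azimuth/elevation pair of each station), the source position, and the noise levels are assumptions chosen for the example, and `arctan2` is used in place of `arctan` so that the azimuth quadrant is resolved.

```python
import numpy as np

def measurement_vector(s, mics):
    """Noise-free mu = [tau_21, tau_31, theta_1, phi_1, ..., theta_3, phi_3] (assumed ordering)."""
    d = np.asarray(s, dtype=float) - np.asarray(mics, dtype=float)  # rows: s - m_i
    r = np.linalg.norm(d, axis=1)                                   # ranges, eq. (2)
    theta = np.arctan2(d[:, 1], d[:, 0])                            # azimuth, eq. (1)
    phi = np.arctan2(d[:, 2], np.hypot(d[:, 0], d[:, 1]))           # elevation, eq. (1)
    tau = r[1:] - r[0]                                              # range differences, eq. (3)
    return np.concatenate((tau, np.column_stack((theta, phi)).ravel()))

def noisy_measurements(s, mics, sigma_rd, sigma_aoa, rng):
    """Add zero-mean Gaussian noise to the noise-free vector, as in eqs. (4)-(5)."""
    mu = measurement_vector(s, mics)
    sigma = np.array([sigma_rd] * 2 + [sigma_aoa] * 6)
    return mu + sigma * rng.standard_normal(mu.size)

# Hypothetical geometry: three stations 1 m apart and an arbitrary source position.
mics = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, np.sqrt(3) / 2, 0.0]])
s_true = np.array([2.0, 1.5, 0.8])
mu = measurement_vector(s_true, mics)
mu_hat = noisy_measurements(s_true, mics, sigma_rd=0.01,
                            sigma_aoa=np.deg2rad(1.0), rng=np.random.default_rng(0))
```

Setting both standard deviations to zero returns the noise-free vector, which is convenient for checking an estimator before noise is introduced.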
For localization scenario shown in Fig 1. we construct a unit norm vector \begin{equation*} \boldsymbol {s}- \boldsymbol {m}_{i} = r_{i} \boldsymbol {b}_{i}, \quad i =1,2,3.\tag{6}\end{equation*}
By constructing the matrix \begin{align*} \boldsymbol {G}_{i} = \begin{bmatrix} \sin \theta _{i} &\quad \sin \phi _{i}\cos \theta _{i} \\ -\cos \theta _{i} &\quad \sin \phi _{i}\sin \theta _{i} \\ 0 &\quad -\cos \phi _{i} \end{bmatrix}, \quad i =1,2,3.\tag{7}\end{align*}
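By performing left multiplication on (6) with $\boldsymbol {G}_{i}^{T}$ and noting that $\boldsymbol {G}_{i}^{T} \boldsymbol {b}_{i} = \boldsymbol {0}$, we obtain \begin{equation*} \boldsymbol {G}_{i}^{T} \boldsymbol {s} = \boldsymbol {G}_{i}^{T} \boldsymbol {m}_{i}, \quad i =1,2,3.\tag{8}\end{equation*}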
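The role of the matrix in (7) can be checked numerically. The sketch below assumes that the unit vector in (6) has the usual spherical form $[\cos \phi \cos \theta, \cos \phi \sin \theta, \sin \phi]^{T}$, which is inferred from (1) and (7) rather than stated explicitly; the station and source positions are arbitrary example values, with the source written as `s`.

```python
import numpy as np

def unit_vector(theta, phi):
    """Assumed form of b_i: unit vector pointing from the station toward the source."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def g_matrix(theta, phi):
    """Matrix G_i from eq. (7); its two columns are orthogonal to b_i."""
    return np.array([[np.sin(theta), np.sin(phi) * np.cos(theta)],
                     [-np.cos(theta), np.sin(phi) * np.sin(theta)],
                     [0.0, -np.cos(phi)]])

# Hypothetical source and station positions, for illustration only.
s = np.array([2.0, 1.5, 0.8])
m = np.array([0.5, -0.3, 0.1])
d = s - m
theta = np.arctan2(d[1], d[0])
phi = np.arctan2(d[2], np.hypot(d[0], d[1]))
r = np.linalg.norm(d)
b = unit_vector(theta, phi)
G = g_matrix(theta, phi)

lhs = G.T @ s   # left-hand side of eq. (8)
rhs = G.T @ m   # right-hand side of eq. (8)
```

Because the columns of `G` are orthogonal to `b`, the unknown range drops out and each station contributes two equations that are linear in the source position.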
By realizing that \begin{equation*} \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{i} + \boldsymbol {b}_{i}^{T} \boldsymbol {b}_{j} - \boldsymbol {b}_{j}^{T} \boldsymbol {b}_{i} - \boldsymbol {b}_{j}^{T} \boldsymbol {b}_{j} = 0, \quad i\neq j =1,2,3.\tag{9}\end{equation*}
\begin{equation*} r_{i}(\boldsymbol {b}_{i}- \boldsymbol {b}_{1})^{T}(\boldsymbol {b}_{i}+ \boldsymbol {b}_{1})= 0, \quad i =2,3.\tag{10}\end{equation*}
\begin{align*} 2(\boldsymbol {b}_{i}- \boldsymbol {b}_{1})^{T} \boldsymbol {s} = (\boldsymbol {b}_{i}- \boldsymbol {b}_{1})^{T}(\boldsymbol {m}_{1} + \boldsymbol {m}_{i} - \tau _{i1} \boldsymbol {b}_{1}), \quad i =2,3. \tag{11}\end{align*}
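The step from (10) to (11) can be written out as follows; this is a reconstruction from (3) and (6) under the document's notation, given as a sketch of the reasoning rather than as the authors' own derivation. From (6), $r_{i}\boldsymbol {b}_{i} = \boldsymbol {s}-\boldsymbol {m}_{i}$ and $r_{1}\boldsymbol {b}_{1} = \boldsymbol {s}-\boldsymbol {m}_{1}$, and from (3), $r_{i} = r_{1} + \tau _{i1}$, so that \begin{align*} r_{i}\boldsymbol {b}_{1} =&\; r_{1}\boldsymbol {b}_{1} + \tau _{i1}\boldsymbol {b}_{1} = \boldsymbol {s}-\boldsymbol {m}_{1} + \tau _{i1}\boldsymbol {b}_{1}, \\ 0 =&\; (\boldsymbol {b}_{i}-\boldsymbol {b}_{1})^{T}(r_{i}\boldsymbol {b}_{i} + r_{i}\boldsymbol {b}_{1}) = (\boldsymbol {b}_{i}-\boldsymbol {b}_{1})^{T}(2\boldsymbol {s} - \boldsymbol {m}_{i} - \boldsymbol {m}_{1} + \tau _{i1}\boldsymbol {b}_{1}), \quad i=2,3. \end{align*} Moving the terms that do not contain $\boldsymbol {s}$ to the right-hand side gives (11).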
A closed-form solution was obtained in [26] for two stations and one source; in this work, we extend the solution to three stations. Using (8) and (11) together in matrix form gives:\begin{align*} \boldsymbol {h}=&\boldsymbol {G}^{T} \boldsymbol {s} \tag{12}\\ \boldsymbol {h}=&[\beta _{21}, \beta _{31}, \boldsymbol {m}_{1}^{T} \boldsymbol {G}_{1}, \boldsymbol {m}_{2}^{T} \boldsymbol {G}_{2}, \boldsymbol {m}_{3}^{T} \boldsymbol {G}_{3}]^{T} \in \mathbb {R}^{8} \tag{13}\\ \beta _{i1}=&(\boldsymbol {b}_{i}-\boldsymbol {b}_{1})^{T}(\boldsymbol {m}_{1} + \boldsymbol {m}_{i} - \tau _{i1} \boldsymbol {b}_{1}), \quad i= 2,3 \tag{14}\\ \boldsymbol {G}=&[2(\boldsymbol {b}_{2}-\boldsymbol {b}_{1}),2(\boldsymbol {b}_{3}-\boldsymbol {b}_{1}), \boldsymbol {G}_{1}, \boldsymbol {G}_{2}, \boldsymbol {G}_{3}] \in \mathbb {R}^{3 \times 8}\tag{15}\end{align*}
Equation (12) represents an ideal condition that does not hold in practice, since there is a measurement error \begin{equation*} \hat {\boldsymbol {h}} \approx \hat {\boldsymbol {G}}^{T} \boldsymbol {s} + \boldsymbol {T}\boldsymbol {\varepsilon }\tag{16}\end{equation*}
\begin{align*} \ddot {\boldsymbol {s}}=&\arg \min _{\boldsymbol {s}} ||\hat {\boldsymbol {h}}-\hat {\boldsymbol {G}}^{T} \boldsymbol {s}||^{2}_{\boldsymbol {W}^{-1}} \\=&\arg \min _{\boldsymbol {s}} (\hat {\boldsymbol {h}}-\hat {\boldsymbol {G}}^{T} \boldsymbol {s})^{T} \boldsymbol {W}^{-1}(\hat {\boldsymbol {h}}-\hat {\boldsymbol {G}}^{T} \boldsymbol {s}) \\=&(\hat {\boldsymbol {G}}\boldsymbol {W}^{-1}\hat {\boldsymbol {G}}^{T})^{-1}\hat {\boldsymbol {G}}\boldsymbol {W}^{-1}\hat {\boldsymbol {h}}\tag{17}\end{align*}
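The closed-form estimator can be sketched in a few lines. The code below builds the stacked quantities of (13)–(15) from the measured angles and range differences and solves the weighted normal equations, using the standard weighted least squares form $(\boldsymbol {G}\boldsymbol {W}^{-1}\boldsymbol {G}^{T})^{-1}\boldsymbol {G}\boldsymbol {W}^{-1}\boldsymbol {h}$. It is a minimal sketch under stated assumptions: the unit vector is taken in spherical form, the weighting matrix defaults to the identity because its construction is not reproduced here, and the geometry is an arbitrary example.

```python
import numpy as np

def unit_vector(theta, phi):
    """Assumed spherical form of the unit vector b_i."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def g_matrix(theta, phi):
    """Matrix G_i from eq. (7)."""
    return np.array([[np.sin(theta), np.sin(phi) * np.cos(theta)],
                     [-np.cos(theta), np.sin(phi) * np.sin(theta)],
                     [0.0, -np.cos(phi)]])

def wls_localize(mics, theta, phi, tau, W=None):
    """Closed-form source estimate from three AOA pairs and two range differences."""
    b = [unit_vector(t, p) for t, p in zip(theta, phi)]
    cols, h = [], []
    for i, tau_i1 in zip((1, 2), tau):            # TDOA rows: eqs. (11) and (14)
        diff = b[i] - b[0]
        cols.append(2.0 * diff[:, None])
        h.append(diff @ (mics[0] + mics[i] - tau_i1 * b[0]))
    for m_i, t, p in zip(mics, theta, phi):       # AOA rows: eq. (8)
        G_i = g_matrix(t, p)
        cols.append(G_i)
        h.extend(m_i @ G_i)
    G = np.hstack(cols)                           # 3 x 8, eq. (15)
    h = np.array(h)                               # length 8, eq. (13)
    W_inv = np.eye(8) if W is None else np.linalg.inv(W)  # identity weight is an assumption
    return np.linalg.solve(G @ W_inv @ G.T, G @ W_inv @ h)

# Noise-free check with a hypothetical geometry.
mics = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, np.sqrt(3) / 2, 0.0]])
s_true = np.array([2.0, 1.5, 0.8])
d = s_true - mics
r = np.linalg.norm(d, axis=1)
theta = np.arctan2(d[:, 1], d[:, 0])
phi = np.arctan2(d[:, 2], np.hypot(d[:, 0], d[:, 1]))
s_est = wls_localize(mics, theta, phi, tau=r[1:] - r[0])
```

With noise-free inputs the eight equations are consistent and the estimate coincides with the true position; with noisy inputs the choice of the weighting matrix determines how the TDOA and AOA rows are balanced.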
TDOA-AOA Estimation Method
In this section, we explain the proposed approach for estimating TDOA and AOA values, computed from the real sound measurements collected by three soundfield microphone stations. Given that the solution (17) performs minimization of the sum of squared residuals with respect to measurement error, it was essential to determine observation uncertainties
Anechoic chamber with a triangular stand carrying three soundfield microphones and a loudspeaker representing the sound source.
A soundfield microphone uses four subcardioid capsules mounted as close together as possible to form a tetrahedron. Each soundfield microphone can be viewed as four symmetrical receivers positioned on the surface of a sphere, and it can produce two distinct sets of audio signals called A-format and B-format. A-format consists of four signals, one from each microphone capsule, arranged as shown in Fig. 3.
The B-format signals comprise a truncated spherical harmonic decomposition of the sound field. They correspond to the sound pressure and the three components of the pressure gradient at a point in space. The transformation from A-format to B-format can be performed easily from the measured values of the individual capsules in A-format. The following linear system of equations can be used for the format conversion:\begin{align*} p_{w}=&p_{LF} + p_{RB} + p_{RF} + p_{LB}, \\ p_{x}=&p_{LF} - p_{RB} + p_{RF} - p_{LB}, \\ p_{y}=&p_{LF} - p_{RB} - p_{RF} + p_{LB}, \\ p_{z}=&p_{LF} + p_{RB} - p_{RF} - p_{LB}.\tag{21}\end{align*}
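The format conversion amounts to sums and differences of the four capsule signals, as the following sketch shows. It assumes the usual tetrahedral capsule layout (left-front and right-back capsules pointing up, right-front and left-back pointing down), so that the vertical component is $p_{LF} + p_{RB} - p_{RF} - p_{LB}$; capsule calibration and frequency-dependent correction filters are not modeled.

```python
import numpy as np

def a_to_b_format(p_lf, p_rf, p_lb, p_rb):
    """Convert A-format capsule signals to unscaled B-format (W, X, Y, Z)."""
    p_lf, p_rf, p_lb, p_rb = (np.asarray(x, dtype=float)
                              for x in (p_lf, p_rf, p_lb, p_rb))
    p_w = p_lf + p_rb + p_rf + p_lb   # omnidirectional pressure
    p_x = p_lf - p_rb + p_rf - p_lb   # front-back gradient
    p_y = p_lf - p_rb - p_rf + p_lb   # left-right gradient
    p_z = p_lf + p_rb - p_rf - p_lb   # up-down gradient (assumed capsule layout)
    return p_w, p_x, p_y, p_z

# A signal that reaches all capsules equally carries no directional information.
ones = np.ones(4)
p_w, p_x, p_y, p_z = a_to_b_format(ones, ones, ones, ones)
```

In this check only the omnidirectional channel is non-zero, which is the expected behavior for three gradient components.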
To validate the estimation algorithm (17) using real measurements, this paper proposes two methods for calculating TDOA and AOA values from the observed sound signal emitted from an unknown source location.
A. TDOA Estimation
The system is composed of an unknown sound source and three soundfield microphones. If the sound source transmits a signal at time T = 0, the microphones will sense the signal at the unknown times T1, T2, and T3. Following (3), the differences between the TOAs can be measured as follows:\begin{align*} \Delta T_{21}=&T_{2}-T_{1}, \\ \Delta T_{31}=&T_{3}-T_{1},\tag{22}\end{align*}
\begin{align*} \hat {G}_{PHAT}(f)=&\dfrac {X_{1}(f)[X_{i}(f)]^{*}}{|X_{1}(f)[X_{i}(f)]^{*}|}, \\ \hat {d}_{PHAT}(1,i)=&\arg \max (\hat {R}_{PHAT}(d))\tag{23}\end{align*}
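A GCC-PHAT delay estimate of the kind described by (23) can be sketched with the FFT. The code below is a generic illustration and not the authors' processing chain: the zero-padding length, the small regularization constant, and the test signal are assumptions. With the cross-spectrum defined as $X_{1}(f)[X_{i}(f)]^{*}$, a signal that arrives later at microphone $i$ produces a correlation peak at a negative lag, so the sign is flipped to return $T_{i}-T_{1}$.

```python
import numpy as np

def gcc_phat(x1, xi, fs, max_tau=None):
    """Estimate the delay of xi relative to x1 (seconds) with GCC-PHAT."""
    n = len(x1) + len(xi)                      # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    Xi = np.fft.rfft(xi, n=n)
    cross = X1 * np.conj(Xi)                   # cross-spectrum, as in eq. (23)
    cross /= np.abs(cross) + 1e-12             # PHAT weighting (small constant avoids 0/0)
    r = np.fft.irfft(cross, n=n)               # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    r = np.concatenate((r[-max_shift:], r[:max_shift + 1]))  # lags -max_shift..max_shift
    lag = np.argmax(r) - max_shift
    return -lag / fs                           # positive when xi lags x1

# Synthetic check: white noise delayed by 25 samples (hypothetical values).
fs = 48000
rng = np.random.default_rng(0)
x1 = rng.standard_normal(4096)
delay = 25
xi = np.concatenate((np.zeros(delay), x1[:-delay]))
tdoa = gcc_phat(x1, xi, fs)
```

Multiplying the returned delay by the speed of sound gives the range difference used in (3); restricting `max_tau` to the largest physically possible delay for the array geometry reduces the chance of picking a spurious peak.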
B. AOA Estimation
In this work, we propose a method for calculating AOA estimation by exploiting B-format signals of the soundfield microphone. The AOA estimation technique is based on obtaining directivity vector \begin{align*} \begin{bmatrix} \theta _{i} \\ \phi _{i} \end{bmatrix} = \begin{bmatrix} \arctan \left ({\dfrac {d_{y,i}}{d_{x,i}}}\right) \\ \arctan \left ({\dfrac {d_{z,i}}{\sqrt {d_{x,i}^{2} + d_{y,i}^{2}}} }\right) \end{bmatrix},\quad i=1,2,3\tag{24}\end{align*}
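The angle computation in (24) can be sketched as below. How the directivity vector is obtained is not reproduced above, so this example assumes one common choice: the time-averaged product of the omnidirectional channel with each gradient channel (an intensity-like vector). The function name and the synthetic plane-wave signal are hypothetical, and `arctan2` replaces `arctan` to resolve the azimuth quadrant.

```python
import numpy as np

def aoa_from_b_format(p_w, p_x, p_y, p_z):
    """Azimuth and elevation (radians) from B-format signals via an averaged direction vector."""
    p_w = np.asarray(p_w, dtype=float)
    # Assumed directivity vector: mean of W times each gradient channel.
    d = np.array([np.mean(p_w * np.asarray(c, dtype=float)) for c in (p_x, p_y, p_z)])
    theta = np.arctan2(d[1], d[0])                   # azimuth, eq. (24)
    phi = np.arctan2(d[2], np.hypot(d[0], d[1]))     # elevation, eq. (24)
    return theta, phi

# Synthetic plane wave arriving from a known direction (illustrative values).
theta0, phi0 = np.deg2rad(40.0), np.deg2rad(20.0)
direction = np.array([np.cos(phi0) * np.cos(theta0),
                      np.cos(phi0) * np.sin(theta0),
                      np.sin(phi0)])
sig = np.random.default_rng(0).standard_normal(2048)
theta_est, phi_est = aoa_from_b_format(sig, direction[0] * sig,
                                       direction[1] * sig, direction[2] * sig)
```

Averaging over a longer signal segment reduces the variance of the direction vector, which is consistent with studying the directivity error as a function of integration time.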
Following similar logic as in previous subsection III-A, we investigated the dependency between the directivity error and integration time i.e. duration of the sound signal used for calculating AOA observations. From the Fig. 5 it is visible that for
Based on our findings in terms of dependency between
Localization Results and Performance Evaluation
This section presents the performance study regarding the localization accuracy of the proposed SSL approach in the free field conditions. First, a simulation is made to demonstrate a margin of localization error regarding uncertainty in TDOA-AOA observations. To alleviate the dependency on the particular geometry, TDOA-AOA values are calculated by sampling the angles between three microphone stations from a uniform distribution. In total, three scenarios are conducted to analyze how measurement error affects the deviation of localization results compared to theoretical CRLB. Second, the experimental validation of the simulation results is made in the anechoic chamber containing one sound source and three soundfield microphones. Anechoic environment, however far from being realistic, gives the possibility to precisely control the environmental conditions and to accurately determine the measurement deviations of the proposed TDOA-AOA estimation method, which is substantial in this experiment. If the experiment was carried out in a reverberant room, the room acoustics would influence the estimation of TDOA-AOA values and thus obtained measurement uncertainties would not make a universal reference. Identifying measurement uncertainties
A. SSL Simulation Under Gaussian Noise
This subsection presents performance analysis of the proposed estimation method where it will be shown that for the given localization scenario, the proposed method can reach CRLB for Gaussian noise present in TDOA and AOA measurements. For the known source position \begin{align*} \boldsymbol {CRLB}(\boldsymbol {s})=&\boldsymbol {FIM}^{-1}(\boldsymbol {s}) \tag{25}\\ \boldsymbol {FIM}(\boldsymbol {s})=&\left({\dfrac {\partial {\boldsymbol {\mu }}}{\partial {\boldsymbol {s}^{T}}}}\right)^{T}\boldsymbol {Q}^{-1}\dfrac {\partial {\boldsymbol {\mu }}}{\partial {\boldsymbol {s}^{T}}}\tag{26}\end{align*}
\begin{equation*} \dfrac {\partial {\boldsymbol {\mu }}}{\partial {\boldsymbol {s}^{T}}} = [\boldsymbol {c}_{21}, \boldsymbol {c}_{31}, \boldsymbol {D}_{1}^{T},\boldsymbol {D}_{2}^{T}, \boldsymbol {D}_{3}^{T}]^{T}\in \mathbb {R}^{8 \times 3}\tag{27}\end{equation*}
\begin{align*} \boldsymbol {c}_{i1}=&\dfrac {\boldsymbol {s}-\boldsymbol {m}_{i}}{r_{i}}-\dfrac {\boldsymbol {s}-\boldsymbol {m}_{1}}{r_{1}}, \quad i =2,3. \tag{28}\\ \boldsymbol {D}_{i}=&\begin{bmatrix} \dfrac {-d_{y,i}}{l_{i}^{2}} &\quad \dfrac {d_{x,i}}{l_{i}^{2}} &\quad 0 \\ \dfrac {-d_{x,i}d_{z,i}}{r_{i}^{2}l_{i}} &\quad \dfrac {-d_{y,i}d_{z,i}}{r_{i}^{2}l_{i}} &\quad \dfrac {l_{i}}{r_{i}^{2}} \\ \end{bmatrix}, \\ l_{i}=&\sqrt {d_{x,i}^{2} + d_{y,i}^{2}}, \quad i=1,2,3\tag{29}\end{align*}
The theoretical value of CRLB presents the best possible accuracy that estimator can achieve given small measurement error. To adequately analyze the performance of the proposed method, evaluation is made in terms of the difference between the theoretical value of square root CRLB and root mean square error (RMSE) of the estimation algorithm for a given measurement error variance \begin{align*} \boldsymbol {Q} = \left [{\begin{matrix} \sigma _{RD}^{2} &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, \sigma _{RD}^{2} &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} &\, 0 \\ 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, 0 &\, \sigma _{AOA}^{2} \\ \end{matrix}}\right] \\\tag{30}\end{align*}
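The bound in (25)–(30) can be evaluated numerically as sketched below. The code assumes that $d_{x,i}, d_{y,i}, d_{z,i}$ in (29) are the components of $\boldsymbol {s}-\boldsymbol {m}_{i}$ and that the measurements are ordered as two range differences followed by the azimuth/elevation pair of each station; the geometry and noise levels are arbitrary example values.

```python
import numpy as np

def jacobian(s, mics):
    """8 x 3 Jacobian of the measurement vector with respect to s, eqs. (27)-(29)."""
    d = np.asarray(s, dtype=float) - np.asarray(mics, dtype=float)  # rows: s - m_i
    r = np.linalg.norm(d, axis=1)
    l = np.hypot(d[:, 0], d[:, 1])
    rows = [d[i] / r[i] - d[0] / r[0] for i in (1, 2)]              # c_21, c_31
    for i in range(3):                                              # D_1, D_2, D_3
        dx, dy, dz = d[i]
        rows.append([-dy / l[i]**2, dx / l[i]**2, 0.0])
        rows.append([-dx * dz / (r[i]**2 * l[i]),
                     -dy * dz / (r[i]**2 * l[i]),
                     l[i] / r[i]**2])
    return np.array(rows)

def crlb(s, mics, sigma_rd, sigma_aoa):
    """CRLB matrix as the inverse Fisher information, eqs. (25)-(26), with Q from eq. (30)."""
    J = jacobian(s, mics)
    Q = np.diag([sigma_rd**2] * 2 + [sigma_aoa**2] * 6)
    fim = J.T @ np.linalg.inv(Q) @ J
    return np.linalg.inv(fim)

# Hypothetical geometry and noise levels.
mics = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, np.sqrt(3) / 2, 0.0]])
s_true = np.array([2.0, 1.5, 0.8])
J = jacobian(s_true, mics)
C = crlb(s_true, mics, sigma_rd=0.01, sigma_aoa=np.deg2rad(1.0))
rmse_bound = np.sqrt(np.trace(C))   # lower bound on the position RMSE
```

The square root of the trace of the CRLB matrix is the quantity that is usually compared against the RMSE of an estimator.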
In total, three different simulations of a sound source localization scenario are made. Firstly, we inspect the performance of the proposed estimation method when the uncertainty of the TDOA measurement increases. Secondly, we perform a similar test by increasing the standard deviation
For an input simulation data, TDOA and AOA observations are modeled as (4), where
Each simulation results are reported in terms of RMSE which is defined as
Results of the first simulation are shown in Fig. 6 For a fixed parameters
B. Experimental Evaluation
In this section, we present experimental results on sound source localization in 3D space. Following the theoretical simulation given in subsection IV-A, three TetraMic soundfield microphones were placed 1 m apart, forming the vertices of an equilateral triangle. The Arta measurement software was used as the white noise generator; its output was converted to an analog audio signal by the MOTU STAGE B-16 audio interface, amplified with a Pioneer SA-8500 II amplifier, and played through a Magnat 145 801 12 loudspeaker, which represented the source. The TetraMic is connected to the audio interface via four XLR-M connectors coming from the PPA receiver. For each soundfield microphone, the four-channel signal is digitized with the B-16 audio interface and recorded with the Reaper software in A-format. To obtain B-format from the four-channel A-format, we used the official VVMic software, which also applies corrections using the calibration file provided with the specific microphone.
In the experiment, we used 48 different sound source locations. Due to the space limitations of our anechoic chamber, all measurements were conducted regarding
For each source location
To show the effectiveness of the proposed approach, Fig. 10 illustrates the estimation results for four different sound source locations, for which the reported RMSE values were 11.17, 15.56, 18.90, and 10.47 cm, respectively. Note that the estimate (shown in cyan) lying apart from the rest of the estimation group corresponds to the initial position guess, while the true source location is indicated by the coordinates label. A narrow estimation spread is clearly visible.
It was also interesting to analyze the dependency between the reported RMSE and the number of iterations. Fig. 11 shows that (17) provides better results as more iterations are used. Consequently, with longer measurement intervals we can achieve even higher accuracy. The obtained result indicates that the proposed procedure can be used to locate a stationary sound source with excellent precision in the free field. To show the localization performance of the proposed weighted least squares estimator (17), we plot the experimental cumulative distribution function (CDF) of the positioning RMSE in Fig. 12. The CDF in Fig. 12 is calculated for 48 source positions and has a median value of 21.58 cm. Given a measurement interval of 18 s, i.e., 36 estimations per point, the worst reported RMSE value reaches 38.90 cm, while the best point has an RMSE of 10.47 cm. From the CDF graph, it is visible that for 90% of the source positions the localization error falls within 28.38 cm.
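The statistics quoted above (best, median, 90th percentile, and worst RMSE) follow directly from the per-position RMSE values. A minimal sketch of how such an experimental CDF and its summary can be computed is given below; the function names are hypothetical, and any input values used with it are placeholders rather than the measured data.

```python
import numpy as np

def empirical_cdf(values):
    """Return sorted values and the fraction of samples at or below each one."""
    x = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, x.size + 1) / x.size
    return x, p

def summarize(values):
    """Best, median, 90th-percentile, and worst value of a set of per-position RMSEs."""
    v = np.asarray(values, dtype=float)
    return {
        "best": float(v.min()),
        "median": float(np.median(v)),
        "p90": float(np.percentile(v, 90)),  # linear interpolation between samples
        "worst": float(v.max()),
    }
```

Plotting `x` against `p` from `empirical_cdf` gives a staircase curve of the kind shown in Fig. 12.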
From the presented results, it is evident that the proposed measurement method provides accurate estimates of the azimuth and elevation angles of the unknown sound source in the anechoic chamber. In particular, free-field conditions play a major role in acoustic measurements and sound perception experiments, as the results are influenced only by the direct-path component of the sound source and not by the multipath component caused by room reflections. Given that the experimental evaluation of the SSL task was performed using real sound measurements, the above results indicate the potential for further research, where it can be expected that, by employing a more robust method for estimating TDOA, such as the method in [54], the results can translate to real-world cases. Although it is undeniable that room acoustics would influence the SSL results, conducting the experimental evaluation in a controlled environment provides a universal reference for localization performance. Since most state-of-the-art TDOA-based SSL methods either lack experimental analysis (results are given only through simulation) or are evaluated in non-reproducible environmental conditions, a direct comparison is often not possible. Regardless of the conditions under which the results were obtained, our comparison with the state of the art is made to demonstrate the performance of the proposed SSL approach. For instance, the state-of-the-art method in [17] reported a localization performance of 0.3 m in terms of RMSE in a typical room experiment. The authors also compared their approach with the commonly used SRP-PHAT method and showed that it outperformed SRP-PHAT, for which a localization error of 0.76 m was reported.
It is to be expected that evaluation results obtained in the free field will outperform results obtained in a real-life environment [17]. From the comparison, it can be seen that the proposed TDOA-AOA approach, based on three soundfield microphones, achieves a 0.09 m lower localization error than the Multilevel B-Splines-Based Learning Approach and a 0.55 m lower error than the SRP-PHAT method [17]. The fact that the presented SSL results are 30% and 72% lower in terms of RMSE, respectively, indicates the effectiveness of the proposed approach in the free field, with great potential to operate under realistic conditions.
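As a check on the figures above, the stated differences and percentages are mutually consistent with a localization error of about 0.21 m for the proposed approach. This value is inferred here from the reported differences and agrees with the median of 21.58 cm; the values for the two reference methods are those quoted from [17]: \begin{align*} 0.30 - 0.09 &= 0.21~\text {m}, & 0.09 / 0.30 &= 30\%, \\ 0.76 - 0.55 &= 0.21~\text {m}, & 0.55 / 0.76 &\approx 72\%. \end{align*}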
Conclusion
This paper proposes a sound source localization method in 3D space using a geometric configuration of three microphone stations. A closed-form solution for estimating the sound source location based on two TDOAs and three AOAs is presented. In this work, a soundfield microphone is used as a measurement station. By exploiting the A-format of the soundfield microphone, a pair-wise TDOA estimate is obtained using generalized cross-correlation as the time lag between the signals from two microphones. A method for obtaining AOA measurements is proposed, based on calculating a directional vector derived from the B-format. The impact of the signal measurement interval length on TDOA and AOA estimation performance is investigated. The proposed method was evaluated by simulations and by physical experiments in our anechoic chamber. The simulation results show that (17) reaches the theoretical CRLB performance for small Gaussian noise present in the measurements. Since (17) can be viewed as an iterative algorithm, the relationship between accuracy and the number of iteration steps was investigated, leading to the conclusion that a longer recording of the analyzed signals from the unknown source position yields even higher accuracy. The obtained experimental results indicate that the proposed procedure can be used to locate a stationary sound source with good precision in a free-field environment. In future work, we plan to extend the proposed method to multiple source localization and tracking, first in free-field anechoic chamber conditions and later under different SNR and reverberation conditions. Special effort will be devoted to problems arising in real-life scenarios, such as detecting speech in noisy environments when distant recording is required, in order to increase speech recognition performance.
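The pair-wise TDOA estimation step summarized above can be sketched as a cross-correlation computed in the frequency domain. The code below is a generic illustration under stated assumptions, not the implementation used in this work: the function name, the optional PHAT weighting, the `max_tau` search limit, and the sign convention (a positive result means `sig` arrives later than `ref`) are all choices made for the example.

```python
import numpy as np

def gcc_tdoa(sig, ref, fs, phat=True, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` in seconds.

    fs is the sampling rate in Hz. PHAT weighting is an assumed,
    commonly used choice; set phat=False for plain cross-correlation.
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)
    n = sig.size + ref.size                 # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    if phat:
        cross /= np.abs(cross) + 1e-12      # keep phase only, guard against division by zero
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs
```

Multiplying the returned delay by the speed of sound gives the corresponding range difference between the two microphones.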