Minimum-Spanning-Tree-Based Time Delay Estimation Robust to Outliers

In this paper, we present a novel approach to estimating multiple time delays (TDs) in sensor arrays that is robust to outliers of TD measurements. These measurements are typically obtained from the peak of the cross correlation of two sensor signals but may contain outliers due to noise, significantly degrading the performance of downstream applications. To address this issue, we propose an approach to leverage only the best minimum-necessary TD measurements. First, we describe the general properties of TDs and show that the degree of freedom of TDs is the number of sensors minus one for a signal source, which indicates that the full set of TDs is redundant in this case. We then consider selecting nonredundant TDs from all measurements given the necessary and sufficient condition to reconstruct all TDs uniquely. We represent this condition by utilizing the graph theory, and then, formulate an optimization problem to select the optimal nonredundant TDs while satisfying the condition above. We reduce this problem to the problem of finding a minimum spanning tree and propose an efficient algorithm for TD estimation. Experimental evaluation shows that our method successfully eliminates outliers while ensuring that all TDs can be restored.

Recently, devices equipped with more than three sensors have become common.In the field of acoustic signal The associate editor coordinating the review of this manuscript and approving it for publication was Xuebo Zhang .
processing, spatially distributed microphone arrays and wireless acoustic sensor networks (WASNs) have been widely and actively studied [17], [18], [19].TDs are necessary information in resampling and synchronization [20], [21], [22], which is a general topic in WASNs.For these applications, a TDE method that can estimate multiple TDs with high accuracy is expected.
Let us consider TDE on an M channel sensor array.Fig. 1(a) shows an example of a 5-channel sensor array, where the circles and lines indicate the sensors and TDs, respectively.As shown in this figure, there are M C 2 = M (M − 1)/2 TDs corresponding to every possible pair of sensors (2-combinations of M sensors).A set of these TDs is referred to as the full TD set [23], [24].In practice, individual TDs can be measured by, e.g., the GCC method [11], [12].Therefore, the full TD set must be contaminated by errors in the GCC method and may contain outliers (huge estimation errors).This problem becomes more severe in, e.g., noisy environments and rooms with long reverberations.Since these outliers have a tremendous negative impact on downstream applications, in this paper, we study how to avoid using them but estimate the full TD set.
The key idea to achieve the above is redundancy of the full TD set (overdetermined) [23], [24].Indeed, M (M − 1)/2 TDs can be represented by the M − 1 TDs among them.For example, a TD between sensors i and j, τ ij say, can be obtained indirectly as τ ik +τ kj for any k.That is, the degree of freedom of the TDs is M −1, whereas M (M −1)/2 TDs exist.This redundancy can be utilized to improve the accuracy and robustness of the TDE algorithms.
One approach to leveraging the full TD set is computing M − 1 nonredundant TDs from every member of the set.A classical solution in this sense is using the average of τ ik + τ kj for all k (k = 1, . . ., M ) to estimate τ ij .This is the least square estimator under the assumption that TD measurements are contaminated by additive Gaussian noises [25], [26].Moreover, Velasco et al. have proposed novel TDE methods based on a TD (or TDOA) matrix (TDM) [24].The TDM itself has been considered in [27], and its properties have been further studied in [24].The TDM is one representation of a full TD set; thus, its measurement might include errors and outliers.Therefore, the denoising methods have also been proposed to deal with this problem [24], [28], [29].
Another approach is to choose a reference sensor and only use nonredundant M − 1 TDs between the reference sensor and the others [3], [23], [30].Fig. 1(b) shows an example where sensor 1 is chosen as the reference sensor.In this case, any TDs can be computed from the TDs between the reference sensor and the remaining ones, e.g., τ 34 = τ 31 + τ 14 .Utilizing an appropriate reference sensor may help avoid the use of TD measurements with significant errors.Choosing a reference sensor is equivalent to determining the absolute time origin for TDs, which are relative values.This approach is thus meaningful.Indeed, choosing a reference sensor is also a common practice in downstream applications [3], [31], [32].
However, is this approach optimal for avoiding outliers?There might be a better way to choose nonredundant TDs that can still reconstruct the full TD set; this is the focus of this study.In fact, there is an alternative approach to selecting M − 1 TDs that can restore the full TD set, even without using the reference sensor, as shown in Fig. 1(c) Note that we cannot always compute the full TD set from the chosen M − 1 TDs, as will be shown later.
In this paper, we propose a new TDE method based on the selection approach, which is robust against the outliers in the full set of TD measurements.We discuss the methodology and criterion for selecting the best TDs among the TDs in the full set.We introduce fundamental graph theory to the above problem, which is essential in this paper.On the basis of the graph theory, we first show a necessary and sufficient condition to uniquely reconstruct the full TD set from the chosen TDs.We then consider the optimization problem for choosing the best minimum-necessary TDs under that condition.This optimization problem can be reduced to the problem of finding the minimum spanning tree (MST), a common topic in the graph theory.The cost function for the optimization is designed on the basis of the GCC method.
Note that disambiguation [33], [34] is another topic when considering a graph structure on a sensor array.In contrast to our study, the key constraint in the disambiguation methods is that the sum of cyclic TDs becomes 0, e.g., τ 12 + τ 23 + τ 31 = 0. Disambiguation aims to estimate the nonredundant TDs under the above constraint.Since multiple sets of TDs that satisfy the constraint may exist, however, a certain type of postprocessing is necessary to choose the best TD set [31].One advantage of our approach is that the solution is uniquely obtained by finding the MST.
Robust localization [35], [36], [37] is a field that also addresses outliers removal, where sensor positions or distances are necessary to compute DOAs from TDs.In contrast, our approach focuses solely on TD measurements, requiring no prior information, such as microphone positions.Furthermore, the proposed method eliminates outliers and estimates the full set of TDs, which are the fundamental spatial cues for various array signal processing methods.The proposed method can thus serve as preprocessing for a wide range of applications.
The rest of this paper is organized as follows.In Section II, we introduce the background knowledge regarding TDE.In Section III, we explain the motivation and then state the problem addressed in this paper.Section IV is devoted to solving the problem, where we present the necessary and sufficient condition to restore the full set of TD estimates from the chosen TD measurements, and in Section V, we propose the method of estimating optimal M − 1 TDs.In Section VI, we discuss the efficacy of the proposed method via numerical experiments.Finally, we conclude this paper in Section VII.

A. NOTATION
We write scalars with regular letters (e.g., x), vectors and matrices with bold lowercase and uppercase letters (e.g., x and X), respectively, and sets with calligraphic fonts (e.g., X ).The superscripts T, H, and * denote transposition, conjugate transposition, and complex conjugate, respectively.

II. PRELIMINARIES A. SIGNAL MODEL
Let us consider estimating TDs observed by an M channel sensor array.In this paper, we assume that only one target signal propagates to the array.Let x mkn be the shorttime Fourier transform (STFT) representation of the signal observed by the mth sensor (m = 1, . . ., M ), where k = −K /2 + 1, . . ., K /2 and n = 1, . . ., N denote the frequency bin and time frame indices, respectively.We model the discrete observed signals as follows: The variable s kn denotes the source image observed at the reference sensor indexed by r ∈ {1, . . ., M }.The vector T denotes noise signals at each sensor, ω k = 2πk/K denotes the normalized angular frequency, and i denotes the imaginary unit.The vector g k (τ r ) denotes the relative transfer function (RTF) [38], which is defined as the ratio of the transfer function at the reference sensor to those at the other sensors.The relative amplitude is denoted by a mk ∈ R + .Fig. 2 shows an example of a sensor array.The continuous TD parameter between sensors i and j is denoted by τ ij .Given three sensors with the indices i, j, and k, the TDs between two of these three sensors satisfy the following three equations: These equations are naturally used to evaluate the so-called consistency of TDs [23], [24].Clearly, all TDs on an M channel sensor array have the M − 1 degree of freedom at most, whereas M C 2 = M (M − 1)/2 TDs can be considered.Note that the use of a reference sensor is not mandatory in this paper.Nonetheless, we use the reference sensor merely for notation ease and denote the TD vector τ r as in (3).

B. TD MEASUREMENT AND TD MATRIX
Hereafter, we will proceed with our discussion by distinguishing between the TD τ ij and the TD measurement θ ij .The TDs are the parameters we seek to estimate.Even though there exist M (M − 1)/2 TDs, the degree of freedom of TD parameters is M − 1 and they satisfy (4a), (4b), and (4c) as described in the previous section.On the other hand, the TD measurements obtained from pairs of two sensor signals are a type of observation including errors.We assume that the TD measurements also satisfy θ ij = −θ ij and θ ii = 0 as in (4b) and (4c), but θ ij = θ ik + θ kj does not always hold owing to errors.
A popular method of obtaining TD measurements is the GCC method [11], [12].Given two observations x ikn and x jkn , the GCC function is written as where t denotes the discrete lag and W k ∈ R + denotes an arbitrary weight function.Suitable weights have been proposed, e.g., GCC-phase transform (PHAT) and GCCsmoothed coherence transform (SCOT); Then, we can obtain the TD measurement between sensors i and j as the peak of the GCC function: Now, we consider the following TDM [24], [27] to represent the full set of TD measurements: where T denotes the TD measurements between sensors r and m = 1, . . ., M .The TDM is a skew-symmetric matrix since θ ij = −θ ji and θ ii = 0.
When the measurements contain no errors, i.e., all of them satisfy (4), can be rewritten as where In this case, the degree of freedom of the TDM is essentially M − 1 because is represented by only θ r as in (10), where θ rr = 0.The other properties, such as the rank of , which is two in theory, have been shown in [24].However, when the measured TDM contains errors, it has not M − 1 but M (M − 1)/2 degrees of freedom, which is redundant.
121286 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

III. PROBLEM STATEMENT A. MOTIVATION
The purpose of this study is to develop a TDE algorithm that is robust against errors in TD measurements.Let us consider errors by dividing them into two types: small ones around the true TD value and outliers far from it.In the context of the GCC method, the former corresponds to the errors around the global maxima.The GCC function, being a nonconvex function, has one main lobe and numerous side lobes.The peak of the main lobe is basically expected to be close to the true TD, whereas the peak and the true TD do not match because they are affected by noise and other factors.Some methods have utilized the redundancy of multichannel observation to further improve the accuracy of GCC-based TDE [24], [39], [40].
The peaks of side lobes are far from that of the main lobe.The value of the GCC function corresponding to these peaks may be reversed in severe environments.The TD measurement obtained in such circumstances may become an outlier that is far from the true TD.Since these outliers significantly degrade the quality of downstream applications, such as DOA estimation and signal enhancement, they should be removed immediately after observation.In this paper, we thus discuss the methodology for avoiding the outliers.

B. APPROACH
To develop a TDE method that is robust to outliers, we adopt a selection approach, as shown in Fig. 3. Specifically, we start with the measurement of TDM, as shown in Fig. 3(a), and identify a minimally necessary set of sensor pairs {i, j} denoted by E ′ , as shown in Fig. 3(b).By using only the TD measurements θ ij such that {i, j} ∈ E ′ , we restore all TDs, as shown in Fig. 3(c).We aim to mitigate the impact of outliers by selecting the smallest number of reliable measurements for TDE, as the number of outliers is unknown in practical scenarios.
Let τij be an estimate of τ ij .We first define the direct and indirect TDEs as follows.
• We refer to the TDE τij ← θ ij as direct estimation, where the sensor pair {i, j} is the member of E ′ ; • We similarly refer to τij ← L−1 ℓ=1 θ z ℓ z ℓ+1 as the indirect estimation, where z ℓ denotes the sensor index, especially The sensor pair {z ℓ , z ℓ+1 } for all ℓ is the member of E ′ .Given E ′ , we then restore the full set of TD estimates T = T by direct or indirect estimation, where τ r is the estimate of the TD vector (3).In this approach, the choice of sensor pairs is very important.Depending on how the sensor pairs are selected, all TD estimates may not be obtained or multiple inconsistent estimates may be obtained.Then, the specific problems addressed in this paper are as follows.
• What is the necessary and sufficient condition of E ′ for obtaining all elements of T uniquely?
• How is the optimal one determined among E ′ satisfying the condition above?We devote Section IV to solving the first problem and Section V to the second one.

IV. RECONSTRUCTION OF FULL TD SET A. SENSOR ARRAY AS GRAPH
In this paper, we consider the problem of sensor pair selection on the basis of graph theory.The graph theory has sometimes been used for representing the pairwise relations in the TDE problem as in [33] and [34].
Let us consider the M channel sensor array.Let V = {i | 1 ≤ i ≤ M } and E = {{i, j} | i, j ∈ V and i ̸ = j} be a set of vertices and all distinct pairs of vertices (edges), respectively.We denote the undirected graph including all edges as G = (V, E).On the other hand, the set of selected sensor pairs E ′ , which appeared in the previous section, is a subset of E, that is, E ′ ⊆ E, and we use only the TD measurements for sensor pairs contained in E ′ .Then, let G ′ = (V, E ′ ) be a sensor graph.Now, Fig. 1 can be regarded as an example of the sensor graphs with the 5-channel sensor array.

B. RECONSTRUCTION CONDITION
Our objective here is to choose E ′ appropriately since a certain choice of E ′ would yield a problem such that not all TDs are estimated or some TDs are not uniquely determined.We can visualize this problem with the sensor graph as shown in Fig. 4(a) where the TD measurements associated with the sensor pairs {1, 4}, {1, 5}, {2, 3}, or {4, 5} are selected.In this case, we cannot compute the TD estimate τ12 since the subgraphs with {1, 4, 5} and {2, 3} are not connected; then, the relationship between them has been lost.Moreover, the selected edges form a cycle in Fig. 4(a).Then, there exist both direct and indirect estimations, namely, τ (1) 14 ← θ 14 and τ (2) 14 ← θ 15 + θ 54 , where τ (1) 14 ̸ = τ (2) 14 owing to the error in measurements.This indicates that the TDs corresponding to the chosen sensor pairs remain redundant.To avoid this problem, the edges must be chosen appropriately, as shown in Fig. 4(b).We thus consider the condition for reconstructing the full TD set.
First, the following proposition regarding a single TD holds for any sensor graph G ′ = (V, E ′ ).Proposition 1: A TD between arbitrary two sensors i and j can be uniquely determined by either direct or indirect estimation in general1 if and only if there exists a unique path between the vertices i and j on the sensor graph G ′ = (V, E ′ ).Proof: Let us denote the path from a vertex z 1 to another vertex z L on the graph E ′ as (z 1 , . . ., z L ).Suppose there exists a unique path (z 1 , . . ., z L ) from z 1 = i to z L = j where L ≥ 2. Since the edge {z ℓ , z ℓ+1 } corresponds to the TD measurement θ z ℓ z ℓ+1 , the TD estimate is uniquely obtained as τij ← L−1 ℓ=1 θ z ℓ z ℓ+1 .Conversely, suppose there is no unique path from the vertices i to j, which is divided into two cases, that is, no path or multiple paths exist.If there is no path from the vertices i to j, τij cannot be computed by the direct or indirect estimation.Next, suppose there exist multiple distinct paths, namely, (z 1 , . . ., z L 1 ) and (z ′ 1 , . . ., z ′ L 2 ), where Then, the TD estimate can be computed in two ways by tracing the two paths: τij ← However, these two TD estimates do not match in general since they contain different TD measurements; thus, τij cannot be uniquely determined.This holds for any i and j.Now, we present the following proposition, which we refer to as the reconstruction condition.
Proposition 2: TDs between all distinct sensor pairs can be uniquely determined by either direct or indirect estimation if and only if the sensor graph G ′ = (V, E ′ ) is a spanning tree.
Proof: From Proposition 1, the following two conditions are identical: TDs between all distinct sensor pairs can be uniquely determined and there exists a unique path between any distinct pair of vertices.Now, a graph with such a property is referred to as a spanning tree [41], [42].
In accordance with Proposition 2, the full set of TD estimates can be computed from the TD measurements corresponding to an arbitrary spanning tree.The number of minimum-necessary sensor pairs is thus M − 1, which corresponds to the same number of nonredundant TD measurements.As an example of the reconstruction process, let us consider computing τ34 in the spanning tree shown in Fig. 4(b).The edges are {{1, 5}, {2, 3}, {2, 5}, {4, 5}} and the corresponding TDs are {θ 15 , θ 23 , θ 25 , θ 45 } with their sign reversals.With such a spanning tree, there exists a unique path from sensors 3 to 4, and thus, the TD estimate can be computed indirectly as All the elements in T = [ τij ] 1≤i,j≤M can be obtained in the same manner, where τii = 0.

V. ESTIMATION OF NONREDUNDANT TDS A. OPTIMIZATION PROBLEM
As mentioned in the previous section, estimates of TDs between all distinct sensor pairs can be determined uniquely if and only if the set of selected sensor pairs forms a spanning tree.Let S be a set of spanning trees on V and S be a member of S. The problem here is how to find the optimal spanning tree S † among S.
To evaluate the optimality of a spanning tree, we introduce a cost function for each TD measurement, which corresponds to each edge of the graph, e ∈ E. The lower the cost, the better the TD measurement.(or: We intend to represent that the lower the cost, the better the estimate.)Let C ij denote the cost for the TD measurement θ ij and C = [C ij ] 1≤i,j≤M denote the cost matrix.We define that all diagonal elements of C are zero.The matrix C is symmetric as a result, providing the weight for the edge {i, j} on G.The cost matrix then represents the adjacency matrix of a weighted graph.With such a cost matrix, we consider the optimal spanning tree to be the one that minimizes the sum of the costs of its edges.
We can then formulate the following optimization problem pertinent to the sensor graph with selected edges, G ′ = (V, E ′ ), for choosing the optimal set of sensor pairs.
where the abbreviation ''arg min'' stands for the argument of the minimum.The optimal spanning tree can be obtained as 121288 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.

Algorithm 1 Kruskal's MST Solver
As mentioned in Section I, the reference sensor is commonly used to employ nonredundant TDs [3], [23], [30], [31], [32], with the first sensor often serving as the reference.Then, before solving (12), let us consider the problem of choosing the optimal reference sensor as follows.
Obviously, this solution satisfies the reconstruction condition [see (10)].We will show that it is a special case of (12) as follows.
From the graph theory perspective, this solution forms a star [41], as shown in Fig. 1(b).We can also confirm that the reconstruction condition holds since the star is one of the spanning trees.Hereafter, we will refer to this solution as SST (star-spanning tree).
This approach is very intuitive and the computational complexity is low.However, this solution is sub-optimal since the feasible region is limited to only a set of stars among all spanning trees.For instance, let us consider a TDM that has outliers on superdiagonal entries, namely, (i, i + 1) entries (and consequently, (i + 1, i) entries as well).Then, no matter which reference sensor is chosen, an outlier will inevitably be included.If the selection of sensor pairs is not constrained to the pair of the reference sensor and others, we can still choose M − 1 TD measurements without outliers even in this case.

C. MST-BASED SOLUTION
Now, we consider solving the optimization problem (12) directly, which is referred to as the MST, one of the well-studied subjects in the graph theory [43], [44], [45], [46], [47], [48], [49].There exist efficient algorithms that can achieve a unique solution as long as the edge weights are defined appropriately.
Kruskal's algorithm [43], [44] is one of the famous MST solvers.We briefly show the pseudo-code in Algorithm 1, where C = {C ij | i, j ∈ V and i < j} denotes the set of weights corresponding to each edge in E. The inequality for i and j is Algorithm 2 Prim's MST Solver Find the edge e = {i, j} such that whose weight merely for choosing C ij from the elements above the main diagonal of the cost matrix, which is essentially meaningless.Another famous method is Prim's algorithm [45], [46], [47], which is shown in Algorithm 2. The MST is obtained by iteratively expanding a subset of edges.The subset is grown by adding the edge with the lowest weight between the vertices in the current subset and the remaining ones.The computational complexity2 is O(|V| 2 ).
To compare these methods, Kruskal's algorithm is preferable when the edges are sparse since it first sorts all the edges.On the other hand, Prim's algorithm evaluates a limited number of edges per loop and is thereby faster for a complete graph.Since the graph associated with the full TD set always has full edges, the latter is a better choice for our problem.
The output of whichever MST solver is the MST with a set of M − 1 edges.We can then reconstruct the TDM by estimating every TD directly or indirectly, as discussed in Section IV-B.

D. COST MATRIX
The remaining problem is how to define an appropriate cost matrix C = [C ij ] 1≤i,j≤M .In this paper, we propose the GCC-function-based cost matrix for the following reason.Typically, a TD is obtained by measuring the peak of the GCC function.It is expected that the GCC function will take a larger value when the given TD measurement is closer to the unknown true TD.Conversely, outliers are expected to have a relatively small GCC function value.We thus employ the value obtained by substituting the measurement θ ij into the GCC function (5) as the cost for itself.Moreover, we introduce the weighting factor α ∈ {0, 1} for normalization by defining the following cost matrix: where i, j ∈ e and e ∈ E. The (i, j) element of the cost matrix corresponds to the ordinary cross correlation when α = 0, and it corresponds to the GCC-PHAT ( 7) when α = 1.Hence, the properties of the cost matrix with α = 1 follow those of the GCC-PHAT.

E. PROPOSED ALGORITHM
Finally, we summarize the algorithm of the MST-based TDE in Algorithm 3. The function TRIU(•) returns a matrix with the upper triangle elements of the given matrix, where elements below the main diagonal are zero.MST(•) denotes the MST solver.Although any MST solver can be used, Prim's algorithm is employed in this paper.One implementation of Prim's algorithm is in ''NetworkX,'' a Python package [50]. 3 In Algorithm 3, we assume that the measurements are only the multichannel observation x kn .The normalization factor p is the user-defined parameter.The measurement of the TDM is obtained by performing the GCC method, and the cost matrix defined in ( 15) is employed.The output is the estimate of TDM T or equivalently the full set of TDs.
There are some options regarding the above conditions.For example, given a measurement of the TDM, we can use it directly instead of performing the GCC method.Any existing TDE method can also be utilized as an alternative to the GCC method.It is also possible to define another cost matrix, where the cost matrix must be symmetrical since the TDM is skew-symmetric.

VI. EXPERIMENTS
To investigate the efficacy of the proposed method, we conduct two numerical experiments and compare its performance with that of existing methods.In Section VI-A, we evaluate the robustness of the proposed method against the outliers.In Section VI-B, we evaluate the TDE performance in a simulated reverberant environment.

A. ROBUSTNESS AGAINST OUTLIERS 1) EXPERIMENTAL CONDITIONS
We simulate an M = 8 channel microphone array.The target signal s kn is a white noise of 4096 samples.The TD between two adjacent microphones is randomly generated from the uniform distribution whose interval is (−25, 25] samples.TDs are simulated by rotating the phase of the target signal, and the TDM is computed by using the ground truth.We then contaminate the TDM with outliers, which are uniformly generated from ± [50,100] samples.The number of outliers is in the range of 0 to 8 C 2 = 28.We perform STFT with 3 Available as networkx.minimum_spanning_tree().

Algorithm 3 MST-Based TDE
4096 samples in a rectangle window, which is equivalent to the one-shot discrete Fourier transform (DFT).

2) COMPARISON METHODS
We evaluate three methods for comparison: RST, SST, and TDM.RST is a crude method that randomly generates a spanning tree and uses it as a solution, implemented merely for comparison.The algorithm employed in this paper to obtain a random spanning tree 4 has a computational complexity of O(|V|).SST is the method described in Section V-B.TDM is a robust TDE method based on the TDM, which has been proposed in [24, Sec.VI].To the best of the authors' knowledge, TDM is one of the state-ofthe-art algorithms that does not require prior information, such as microphone positions.Although TDM is a method of computing the TDM by using all TD measurements rather than selection, we employ it as a comparison method because the purpose of removing outliers is the same.Note that TDM requires the maximum number of outliers supposed to be present, and in this experiment, we use the true value.
We also perform the proposed method denoted by MST, where α = 0 in this experiment.Since the noisy measurement of the TDM is given, we use it directly instead of performing the GCC method in this experiment.

3) EVALUATION CRITERIA
In this paper, we will evaluate whether all outliers have been removed.We thus define the rejection failure (RF) and RF rate (RFR) as the evaluation criteria as follows: 4 Available as networkx.random_tree().
121290 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
where T ⋆ (p) denotes the true TDM and the subscript p = 1, . . ., P denotes the simulation index.THR denotes the threshold for distinguishing whether the estimate is an outlier or not, and we set THR = 5 in this paper.If a TDE method successfully estimates T and eliminates all outliers from , RF p becomes 0.

4) RESULTS AND DISCUSSION
Fig. 5 shows the results of the experiment, where we perform P = 10000 trials.The horizontal axis shows the number of outliers in the noisy TDM and the vertical axis shows RFR.In the spanning-tree-based approach, at least M − 1 = 7 TD measurements must be clean.Hence, these methods theoretically fail if the number of outliers equals or exceeds 22.As shown in Fig. 5, RST shows the worst result.The probability of not choosing one outlier in 28 elements is onefourth, almost matching the result of RST when the number of outliers is one.The result of SST indicates that estimating the optimal reference sensor helps avoid a few outliers, while it may fail to remove outliers when their count exceeds five.This tendency is almost the same in TDM even if the true number of outliers is given.
In contrast, the proposed method, MST, works well despite the numerous outliers.Even if 10 out of 28 TDs are strongly contaminated by noise, MST successfully removes them almost 100 % of the time and restores the full set of accurate TDs.Additionally, the curvature of the MST result is far from the others.The RFR should be as close to zero as possible, and MST achieves a significantly low RFR.Note that the ideal RFR depends on which elements of the TDM contain outliers and may not always be zero.Let us consider a specific scenario where there are 20 outliers.In some cases, it is possible that, regardless of which seven of the eight clean TDs are selected, a spanning tree cannot be constructed.Then, outlier removal cannot be achieved in principle.

B. PERFORMANCE EVALUATION OF TDE 1) EXPERIMENTAL CONDITIONS
Next, we evaluate the performance of TDE in a reverberant enclosure.We use pyroomacoustics [51], a Python package, in this experiment.We synthesize M = 4, 8, and 16 observed signals with simulated room impulse responses (RIRs) with a reverberation time of approximately 400 ms.The sampling frequency is 16 kHz.The target signal is approximately 4 s long and is randomly generated following the normal Gaussian distribution.Noise signals are generated under the same conditions, and the average signal-to-noise ratio (SNR) is set to 20 dB.We perform a one-shot DFT with a rectangle window for the observed signals.The target source and microphones are randomly located in a room of 18 m × 24 m × 6 m size.They are placed at least 1 m away from the surfaces of the room, and the distance between the adjacent microphones is forced to be greater than 0.01 m.Thus, the minimum and maximum distances between the microphones are 0.01 m and approximately 27.50 m, respectively, with the average distance of about 10.77 m.The source-microphone distance is around 10.78 m on average.In general, the larger the microphone spacing and/or the longer the reverberation time, the more difficult it is to estimate the TDs.We adopt this room size as a setting where outliers are likely to occur.

2) COMPARISON METHODS
The comparison methods are generally the same as those in Section VI-A.Since the measurements are the microphone observations in this experiment, we employ the GCC method to compute the TDM, as shown in Algorithm 3. The weight function here is W k = 1 for all k.We then perform SST with the computed TDM, referred to as SST-INT.
The TD measurements obtained by the GCC method alone are limited to an integer multiple of the reciprocal of the sampling frequency.We thus employ parabolic interpolation [52] as a postprocessing to obtain the TDM with subsample TD measurements.Every method except for SST-INT is based on the subsample TDM.
As in the previous section, we perform three comparison methods: RST, SST, and TDM.TDM requires the number of outliers in advance, whereas it is unknown in practical situations.We thus perform TDM several times while changing the hyperparameter, which is denoted by κ.

3) EVALUATION CRITERIA
In addition to the RFR, we use the root mean square error (RMSE) as the evaluation metric in this section defined as 5

RMSE
where ∥ • ∥ F denotes the Frobenius norm.The number of simulations, P, is 10000 in this paper.Since we generate the RIRs regarding the microphone and source positions, the ground truth T ⋆ (p) is unknown.Therefore, we compute it from the distance between them.The speed of sound is set to 343 m/s, which is equal to that used for RIR generation via pyroomacoustics.
If there is even one outlier in the TD estimates, the RMSE becomes extremely large owing to the effect of the outlier.The criteria excluding such a gross error(s) will help in the interpretation of the results.Therefore, we also evaluate the RMSE using only the results of trials in which outliers were successfully discarded.From now on, we denote the RMSE computed in this manner as ''RMSE (W/O OLs)''.

4) RESULTS AND DISCUSSION
Table 1 show the results of the experiments.In the case of M = 4, the baseline method SST-INT shows the worst RMSEs.SST-INT fails to remove outliers 190 times out of P = 10000 trials.SST improves the RMSE (W/O OLs) by applying quadratic interpolation to the TD measurements.The RFR is also slightly improved accordingly.RST is worse than the baseline method, which is similar to the results discussed in the previous section.We show the results of TDM for the different numbers of outliers κ = 0, 2, . . ., 10. Since the number of TD measurements is 4 C 2 = 6, no results are available with κ = 6, 8, 10.TDM is a method that simultaneously realizes the refinement of TD measurements and the removal of outliers by utilizing all TD measurements.Thus, it shows much better results in RMSE (W/O OLs).However, κ = 0 assumes that there are no outliers (in other words, all measurements are assumed to be reliable), and the RFR increases as a result.It can be seen that the RFR is less than that of the baseline when considering outliers with κ = 2 and 4.
The proposed method MST shows the best results when α = 1 (with GCC-PHAT-based cost matrix).The number of RFs is the lowest at 46, and the RMSE is accordingly improved.Additionally, we can see that the cost matrix based on the GCC-PHAT significantly improves the performance of MST.Unlike TDM, the spanning-tree-based approach has no refinement mechanism since it uses the minimum-necessary TD measurements.The RMSE (W/O OLs) is thus limited by the accuracy of the given TD measurements.In practice, it is desirable to apply some accurate TDE algorithms, such as those in [40], as the postprocessing of MST.
The trend observed with M = 8 microphones is similar to the aforementioned results; however, the benefits of the proposed method become more pronounced.As the number of microphones increases from four to eight, the number of TDs to be estimated also increases, leading to a higher occurrence of outliers.This is the reason for the overall increase in RFR compared with the case of M = 4. Additionally, parabolic interpolation is applied regardless of the accuracy of TD measurements.The interpolated TDs are expected to approach the ground truth more closely, whereas interpolated outliers may further stray away from it.Then, the subsample TDM might contain worse outliers, resulting in the decrease in the performance of SST compared with SST-INT.Despite such circumstances, MST (α = 1) markedly reduces the number of RFs.SST and MST search for the optimal spanning tree in the same TDM on the basis of their respective criteria.Therefore, we can confirm the effectiveness of the proposed approach that estimates the MST rather than using the best reference sensor.TDM can also reduce the RFR by setting κ appropriately, such as κ = 4, 6.Basically, TDM is guaranteed to converge when κ is small [24].However, it sometimes diverges when the number of outliers, κ, is set to 10 (its maximum value here is 8 C 2 = 28), which leads the RMSE to become infinity.In contrast, the proposed MST (α = 1) shows the best results in terms of the RMSE and RFR.
In the case of M = 16, similar results as in the case of M = 8 are obtained.As shown in the rightmost column of Table 1, the RFR becomes worse than that with M = 8 in most cases.On the other hand, MST (α = 1) can reduce the RFR to the same level as in the case of M = 8 by utilizing the additional redundancy owing to the increase in the number of observations.Since the RMSE is still limited, the application of postprocessing is preferable, as mentioned earlier.Finally, from the above results, we can confirm the efficacy of MST regardless of the number of microphones.

VII. CONCLUSION
In this paper, we proposed a novel method of estimating TDs that is robust against outliers in the measurements.
121292 VOLUME 11, 2023 Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
Specifically, we addressed the problem of selecting non redundant TDs from a full set of TD measurements, while ensuring that all TDs can still be restored.We showed that this problem is reduced to the problem of finding the MST and we developed an efficient algorithm based on graph theory.
In numerical experiments, we demonstrated that the proposed method successfully selects relatively clean TD measurements and reconstructs the full set while discarding outliers.Moreover, we verified the efficacy of the proposed method in a reverberant room environment through computer simulations.
Compared with conventional methods, our approach achieved superior performance in removing outliers, although its estimation accuracy is still limited.The future work thus includes integrating our method with TD refinement methods to realize highly accurate and robust TDE.

FIGURE 1 .
FIGURE 1. Sensor array and TDs for a single source.The circles and lines between them represent the sensors and the TDs to be measured from the sensor pairs, respectively.(a) Set of full redundant M(M − 1)/2 TDs.(b) Set of nonredundant (M − 1) TDs.Only the TDs between the reference sensor and the others are used, where sensor 1 is the reference.(c) Another set of nonredundant TDs.No reference sensor is used in this case.

FIGURE 2 .
FIGURE 2.TDs defined in a sensor array.TD τ 13 is equal to the sum of τ 12 and τ 23 .This relationship holds for a single source in any sensor array arrangement.

FIGURE 3 .
FIGURE 3. Processing flow of the proposed approach with a 5-channel sensor array.(a) TDM measurement , where θ ij = −θ ji and θ ii = 0 for all i and j .(b) Perforated TDM with the TD measurements corresponding to the set of selected sensor pairs E ′ .(c) Estimate of the TDM T computed from (b), where τij = τik + τkj , τij = − τji , and τii = 0 for all i , j and k.The TD estimates in red and blue represent direct and indirect estimates, respectively.

FIGURE 4 .
FIGURE 4. Examples of undirected graphs for the 5-channel sensor array with 4 edges.(a) Graph with a cycle where the sensor nodes {1, 4, 5} and {2, 3} are disconnected, indicating that this is not a tree.(b) Graph without any cycles where all sensor nodes are connected, thus indicating that this is a tree.

FIGURE 5 .
FIGURE 5. RFR as a function of the number of outliers in the TD measurements, where the number of sensors is eight.Each method attempts to eliminate outliers, and its failure rate is shown on the vertical axis.Spanning-tree-based methods require at least M − 1 clean TDs; hence, they always fail to remove outliers in the range of 22 or more on the horizontal axis.
Kruskal's algorithm is a kind of greedy algorithm.The MST is found by iteratively selecting/removing the edge with the lowest weight from a list of sorted edges while ensuring that the selected ones do not form a cycle.The computational complexity is O(|E| log |V|) or equivalently O(|V| 2 log |V|) because |E| = |V|(|V| − 1)/2 in our problem.