On Interference-Rejection Using Riemannian Geometry for Direction of Arrival Estimation

We consider the problem of estimating the direction of arrival of desired acoustic sources in the presence of multiple acoustic interference sources. All the sources are located in noisy and reverberant environments and are received by a microphone array. We propose a new approach for designing beamformers and DoA estimation methods based on the Riemannian geometry of the manifold of Hermitian positive definite matrices. Specifically, we show theoretically that incorporating the Riemannian mean of the spatial correlation matrices into frequently-used beamformers gives rise to spatial spectra that reject the directions of interference sources and result in a higher signal-to-interference ratio. We experimentally demonstrate the advantages of our approach in designing several beamformers and a recent DoA estimation method in the presence of simultaneously active multiple interference sources.


I. INTRODUCTION
E STIMATION of the direction of arrival (DoA) of an acoustic source is prevalent in signal processing; it is an important step in many tasks, such as source localization, beamforming, source separation, spectrum sensing, and speech enhancement [1], to name but a few.Despite the large research attention it has drawn in the past decades, acoustic DoA estimation is still considered a challenging open problem.Especially in noisy and reverberant environments and in the presence of interference sources, it continues to be an active research field.
Acoustic source localization, and particularly DoA estimation, are often addressed using beamforming [2].Many beamformers have been proposed over the years for these tasks.One class of beamformers is based on the steered response power (SRP) of a beamformer output.For example, considering the maximum likelihood criterion for a single source, the output power of the beamformer from all the directions is computed, and the DoA is identified as the direction with the maximal power [3]- [6].Another example is the Minimum Variance Distortionless Response (MVDR) beamformer [7]- [9], which was first introduced by Capon [10].The MVDR beamformer extracts the DoA of each of the existing sources, maintaining a unit gain at their direction while minimizing the response from other directions.An important generalization of the MVDR beamformer is the Linearly Constrained Minimum Variance A. Bar and R. Talmon are with the Viterbi Faculty of Electrical and Computer Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel (e-mail: amitayb@campus.technion.ac.il; ronen@ef.technion.ac.il).
This work was supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No. 802735-ERC-DIFFOP.
(LCMV) beamformer [11], obtained by minimizing the output power under multiple linear constraints, and can be used for DoA estimation as well [12].Another line of beamformers is derived based on a subspace approach, i.e., by identifying the subspace of the desired sources, which is assumed to contain only a small portion of the noise and the interference sources.A prominent subspace method, which is also used for DoA estimation, is MUltiple Signal Classification (MUSIC) [13]- [16].
A notable algorithm for acoustic source localization is the Steered-Response Power Phase Transform (SRP-PHAT) algorithm proposed in [17].In SRP-PHAT, the phase transform is used to normalize the different frequencies, such that only their phase information is considered.This allows for the fusion of the different frequencies when considering a broadband signal.A popular time-domain implementation of the SRP-PHAT is the generalized cross-correlation with phase transform (GCC-PHAT) proposed in [18], which normalizes each cross-correlation using the phase transform.In recent years, sparse signal recovery methods have been proposed for DoA estimation [19]- [21].In particular, one approach for solving the sparse recovery problem is the sparse Bayesian learning approach proposed in [22], which was adopted for the DoA estimation problem as well, for example in [23], [24].
In this paper, we consider DoA estimation in a reverberant enclosure consisting of desired sources along with interference sources.We assume that the desired sources are constantly active, whereas the interference sources are only intermittently active.The number of sources, their locations, and their times of activity are all unknown.Consequently, their identification as desired or interference is unknown as well.The power of the different sources is also unknown, and the interference sources could, in fact, be stronger than the desired sources with overlapping activity periods.Our goal is to estimate the DoA of the desired sources in the presence of possibly simultaneously active, multiple interference sources.This setting poses a major challenge to the common practice in existing methods that rely on maximal power because estimating the DoA of the strongest sources might result in distinct beams in the direction of the interference sources rather than the desired sources.Furthermore, these beams could mask the beams pointing at the directions of desired sources.
We address this challenge from a geometric standpoint.Our approach relies on the observation that the frequently-used beamformers implicitly consider Euclidean geometry when processing sample correlation matrices.Therefore, since the sample correlation matrices are Hermitian Positive Definite (HPD) matrices, important geometric information is not fully utilized.Instead, we propose a new approach for beamforming design that is based on the Riemannian geometry of the manifold of HPD matrices [25]- [27].Concretely, we analyze the received signal in short time windows and consider the Riemannian mean [25] of the sample correlation matrices in these windows.Then, we leverage particular spectral properties of the Riemannian mean.In [28], it was shown that the Riemannian mean of HPD matrices preserves shared spectral components and attenuates unshared spectral components.Consequently, the continual activity of the desired sources and the intermittent activity of the interference sources enable us to associate desired sources with shared spectral components and interference sources with unshared spectral components.By combining the above, we show that the incorporation of the Riemannian mean into the beamformer design leads to interference rejection, i.e., gives rise to a spatial spectrum that implicitly rejects the beams pointing at the interference sources and preserves the beams pointing at the desired sources.The resulting spectrum is, in turn, used for the estimation of the DoA of the desired sources.Importantly, our approach is applicable to a large number of beamformers used for DoA estimation.
By incorporating our Riemannian approach, we present new implementations of several beamformers: the Delay and Sum (DS) beamformer, subspace-based beamformers, and the MVDR beamformer, as well as the Bayesian learning method for DoA estimation proposed in [23].
The main contributions of this paper are as follows.First, we observe that despite being commonly-used in array processing in general, and for DoA estimation in particular, the HPD structure of the signal correlation matrices is not fully exploited.Here, we identify that by dividing the computation of the sample correlation matrix into short-time segments, it can be viewed as the Euclidean mean of the sample correlation matrices of the segments.This observation enables the introduction of the Riemannian geometry of HPD matrices, and in particular, it allows us to introduce a rather simple approach that promotes the use of the Riemannian mean instead of the Euclidean mean and to show its multiple merits.Second, we demonstrate that the proposed approach can naturally be applied to a broad range of DoA estimation methods that use the signal sample correlation matrix, and we show that it is computationally efficient in the sense that it typically does not change the order of the computational complexity of the DoA estimation methods.Third, we theoretically analyze the proposed approach and show that in the case of the DS beamformer, it results in higher SIR values that lead to better DoA estimation accuracy.In addition, we present a noise sensitivity analysis.Fourth, we present empirical results in adverse conditions that include simultaneously active multiple interference sources.We showcase the applicability of our approach to both classical and recent DoA estimation methods and demonstrate that the obtained performance improvement is by a large margin.
We conclude the introduction with three remarks.First, a similar setting to ours, consisting of desired sources accompanied by interference sources, was considered in [29] and [30] but in the context of signal enhancement.In [29], a single desired source and a single interference source were considered, and in [30], multiple desired sources and multiple interference sources were considered.However, in both works, it was assumed that there is at least one segment for each source, desired or interference, in which it is the only active source.Furthermore, in [30], the number of the desired sources and their activity patterns were assumed to be known.Second, in the context of radar, the Riemannian geometry of the Toeplitz HPD matrices was used in [31] and [32] for target detection by comparing Riemannian distances to a threshold.In the radar settings, [33] estimated the correlation matrix as a linear combination of correlation matrices, with weights that are based on the Riemannian distance.Third, in this paper, we demonstrate the Riemannian approach for designing beamformers that reject interference sources for DoA estimation.However, other applications, e.g., signal enhancement, could also benefit from spectra that reject interference sources.
This paper is organized as follows.In section II, we present a brief background on the HPD manifold.In Section III, we formulate the problem and the setting.In Section IV we describe the proposed approach and present the algorithm for DoA estimation.In Section V, we provide a theoretical analysis of the proposed approach for the DS beamformer.In Section VI, extensions of the approach to other beamformers are presented.Section VII shows simulation results demonstrating our Riemannian approach.Lastly, we conclude the work in Section VIII.

II. BACKGROUND ON THE HPD MANIFOLD
An HPD matrix, Γ ∈ C n×n , is a Hermitian matrix, i.e.Γ = Γ H , where Γ H is the conjugate transpose of Γ, whose real eigenvalues are strictly positive.Associating the space of HPD matrices with the Affine Invariant metric [34] constitutes a Riemannian manifold, M. The distance between two matrices Γ 1 and Γ 2 , induced by the Affine Invariant metric, is given by where • F is the Frobenius norm.
The tangent space to a point on a manifold Γ ∈ M is a vector space that can be viewed as a local approximation of the manifold around Γ.For the HPD manifold, the tangent space around each Γ ∈ M is the vector space of Hermitian matrices of dimensions n × n.
The Riemannian mean, Γ R , of a set of point {Γ i |Γ i ∈ M} is defined by the Fréchet mean as the point minimizing the sum of the distances from all the points in the set as follows In general, there is no closed-form expression for the Riemannian mean of more than two matrices, and a solution can be found using an iterative procedure [35], described in Algorithm 1.The computation of the Riemannian mean requires two maps.The Logarithm map, which maps an HPD Algorithm 1 Riemannian mean for the HPD manifold [36] Input: a set of K HPD matrices Compute the Euclidean mean in the tangent plane: The Exponential map, which maps a vector T from the tangent space at Γ, is given by We note that there exists an efficient iterative estimator for the Riemannian mean [37].It is described in Appendix C. A discussion about the choice of the Affine Invariant metric appears in Appendix D.

III. PROBLEM FORMULATION
We consider the problem of localizing N D desired sources in the presence of N I interference sources.All the sources are static and located in a reverberant environment.The signals are received at a noisy microphone array of M microphones, which are positioned in known, but possibly arbitrary, positions.The acoustic environment between each source and each microphone is modeled by the Acoustic Impulse Response (AIR).The signal at the mth microphone, considering a farfield setting, is given by: where s d j (n) is the jth desired source, s i j (n) is the jth interference source, and v m (n) is the mth microphone noise.We denote by h d jm (n) the AIR between the mth microphone and the jth desired source.Similarly, we denote by h i jm (n) the AIR between the mth microphone and the jth interference source.
The sources are characterized by their activation times.The desired sources are active during the entire interval.In contrast, the interference sources are only partially active, namely active during segments of the interval.We emphasize that albeit we consider intermittent interference sources, longer signals might consist of a larger number of interference sources in addition to more occurrences of active interference sources.We do not assume there is a segment with only the desired source.Longer intervals do not necessarily increase the probability of such a segment since new interference sources could emerge in such a scenario.Consequently, even a long signal still poses a significant challenge to the DoA estimation.Additionally, we assume that the desired sources, the interference sources, and the noise are all uncorrelated.The noise is assumed to be spatially white.We note that these assumptions are made for simplicity.Empirically, the proposed approach leads to superior results for correlated signals and for noise with arbitrary covariance matrix as well.
The received signal is processed using the Short-Time Fourier Transform (STFT).We denote by s d j (l, k) the STFT at the lth window and the kth frequency of s d j (n).The notation for s i j (l, k), h d jm (l, k), h i jm (l, k), and v m (l, k) follows similarly.Then, the STFT at the lth window and the kth frequency of the received signal is given by where we assume the length of the window is much larger than the AIR length.We stack the received signals, {z m (l, k)} m , from all the microphones to obtain a column vector z(l, k) Its explicit expression is where s d (l, k) and s i (l, k) denote the stacked STFT representations of the desired sources and the interference sources, respectively, and are given by and the noise term is The Acoustic Transfer Functions (ATFs) from the jth desired source and the jth interference source to the microphone array are and in a matrix form Henceforth, we focus on a single frequency bin and omit the frequency index.Throughout the paper, we refer to z(l) as the received signal.Since all the sources are static the ATFs do not change over time, so in the following, we omit their STFT window index l.
In this work, we focus on a single frequency to demonstrate the Riemannian approach, which is suitable for narrowband signals.For broadband signals, this is the formulation of only a single frequency.We note that the fusion of the different frequencies is of great importance, and normalization techniques for the different beamformers have been proposed [38] and can be applied after our method.
Our goal is to estimate the direction to the desired sources, given z(l), l = 1, . . ., L STFT , where L STFT is the number of STFT windows.The main challenge is the existence of interference sources, positioned at unknown locations with possibly high signal power.

IV. PROPOSED APPROACH
Typically, the DoA estimation of a desired source is based on the output of a beamformer.In this section, we present the proposed approach applied to the Delay-and-Sum (DS) beamformer.In Section VI, we extend the proposed approach to other DoA estimation methods.
We consider arbitrary indexing of the microphones in the array and designate the first microphone as the reference microphone.Let d(θ) denote the steering vector of the array to direction θ relative to the first (reference) microphone, which is given by where φ m (θ) is the phase of the received signal at the mth microphone with respect to the first microphone.For example, for a uniform linear array and the typical microphone indexing, we have φ m (θ) = 2π • m δ λ sin θ, where λ is the wavelength of the received signal, and δ is the distance between the microphones.
The SRP of the DS beamformer is given by where is the population covariance matrix.Since the desired source is constantly active and assumed to be at a fixed location during the entire interval, we estimate the population correlation matrix Γ using the sample correlation matrix Γ by averaging z(l)z H (l) over multiple STFT windows.We divide the STFT of the signal into L s disjoint segments, each consisting of L w consecutive STFT windows, i.e., L STFT = L s • L w (for more details regarding the partitioning see Section V-D).Then, the sample correlation matrix over each segment is computed as follows where i is the segment index.To obtain a full rank correlation matrix, the number of STFT windows is set to be larger than the dimension of the matrix (the number of microphones), i.e., L w ≥ M .The incorporation of the Riemannian geometry is realized by viewing each matrix Γ i as a point on the HPD manifold [34] and considering their Riemannian mean, denoted as Γ R and given by: In general, there is no closed-form solution to (17) on the HPD manifold for more than two points [35].Therefore, Algorithm 1 proposed in [36] is used to compute the Riemannian mean of the L s correlation matrices.Once Γ R is at hand, the SRP of the DS beamformer, P DS (θ; Γ R ), is computed by In the case of a single desired source and assuming the direct path is dominant in the AIR, the direction to it is set as the direction achieving the maximum value of the SRP of the DS beamformer, i.e., θ = arg max We note that (19)  Output: The estimated direction of the desired source θ 1: Divide {z(l)} LSTFT l=1 into L s consecutive segments 2: For each segment i, compute the sample correlation matrix Γ i using (16) 3: Compute Γ R of the set { Γ i } LS i=1 using Algorithm 1 4: Compute P DS (θ; Γ R ) according to (18) 5: Return θ = arg max θ P DS (θ; Γ R ) As a baseline, we consider the common practice of the DS SRP computation, which is typically based on the sample correlation matrix over the entire interval, i.e., P DS (θ; Γ E ), where We observe that computing the Euclidean mean of the sample correlation matrices per segment, { Γ i } Ls i=1 , results in Γ E in (20).So, Γ E in (20) is the Euclidean counterpart of Γ R , the sample correlation matrix resulting from the Riemannian approach.
We will show that our Riemannian approach exploits the assumption that the desired source is constantly active and at a fixed location, whereas the interference sources are intermittent.More specifically, we will show both theoretically in Section V and empirically in Section VII that the Riemannian mean attenuates the intermittent interferences while preserving the constantly active sources.In contrast, the standard Euclidean mean accumulates all the sources, and as a result, the main lobe could deviate from the direction of a desired source, and even focus on an interference source.
We show in Section V and Section VII that our proposed approach results in a SRP that rejects the interference sources, allowing the beamformer to extract the DoA of the desired sources.
We remark that the proposed approach only requires that the desired sources are the only sources active during the entire interval.The rank of the signal matrix is not known nor needs to be estimated.The number of interference sources is unknown as well.This is by virtue of the Riemannian mean.Unlike other works (e.g.[30]), we do not need to know the activation times of each interference, nor the number of interference sources.Furthermore, we do not assume that there exists a segment, at which a desired source is the only active source, namely, it could always be accompanied by interference sources.
In terms of complexity, the Riemannian approach requires the computation of the Riemannian mean of the correlation matrices, which is more complex than the Euclidean mean.However, the excess complexity depends on the number of microphones, which is typically not high compared to the complexity of computing the correlation matrix which depends on the number of signal samples.Consequently, the excess complexity is negligible.In particular, following the iterative computation of the Riemannian mean in Algorithm 3 in Appendix C, at each iteration, the computation involves the eigenvalue decomposition of two complex matrices of dimension M × M , which is O(M 3 ), and the computation of 6 matrix products with the complexity of O(M 2 ).So, at each iteration, the overall excess complexity is O(M 3 ).We recall that for N samples, the complexity of computing the sample correlation matrix is O(M 2 N ) which is much larger than O(M 3 ) since typically N >> M .Empirically, we found that the Algorithm 3 converges very quickly after several iterations.To summarize, the proposed approach leads to improved performance with only negligible additional computational cost.In Appendix C, we present the estimator along with an implementation for the streaming data setting.
To evaluate the performance of the proposed approach, we define the output Signal to Interference Ratio (SIR) as follows: where P (θ; Γ) is the SRP computed using the sample correlation matrix Γ, θ d is the direction of a desired source, and θ i j is the direction of the jth interference.When using the DS beamformer, the output SIR becomes This measure of performance is used because the main challenge in this setting is the presence of interference sources rather than the microphone noise.

V. ANALYSIS
In this section, we analyze the proposed approach which is based on Riemannian geometry and compare it to its Euclidean counterpart.The proofs of the statements appear in the Supplementary Material (SM).In the analysis, we consider the population correlation matrix of the received signal, neglecting the estimation errors stemming from the finite sample in a segment.We note that sections IV, VI, VII consider the sample correlation matrices, and only Section V considers the population correlation matrix.
We begin with a short derivation, demonstrating that Riemannian geometry preserves better the desired source subspace in comparison to Euclidean geometry.Consider a single desired source and assume its ATF, denoted by h 0 , is a common eigenvector of all the correlation matrices per segment associated with the same eigenvalue (this assumption is made formal in Assumption 1 in the sequel).In this case, according to Lemma 2 (see Appendix D), h 0 is an eigenvector of both means and it is associated with the same eigenvalue, namely λ 0 (Γ R ) = λ 0 (Γ E ), where λ i (Γ) is the ith eigenvalue of Γ.In addition, all other eigenvectors span the interference and noise subspaces.Under the aforementioned assumption, and by the following known property of the Riemannian mean [39] Γ R Γ E . ( we get that due to the equality of the numerators and ( 23).The inequality in (24) implies that the desired source subspace is more dominant relative to the subspace of the interference and noise in the Riemannian mean compared to the Euclidean mean.
In the remainder of this section, we extend this analysis and present additional results.

A. Assumptions
To make the analysis tractable, we consider a single desired source and multiple interference sources.Therefore, we simplify the notations by omitting the superscripts (•) i and (•) d associated with the interference sources and the desired sources and setting the index of the desired source to 0.
For the purpose of analysis, we make the following assumptions: It follows from Assumption 1 and Assumption 2 that the ATFs, associated with the desired source and the interference sources are all uncorrelated.These are common assumptions, e.g., see [30].The assumptions are made only for analysis purposes, whereas in the experimental results, we consider acoustic signals in a reverberant environment without any assumptions.We note that we do not assume there exists a segment at which only one of the sources is active (e.g., as in [30]).In case an interference source is only partially active during a segment, we consider it active during the entire segment.
The population correlation matrix of the ith segment is given by where σ 2 0 and σ 2 v are the power of the desired source and the power of the noise, respectively.The diagonal matrix Λ i captures the signal power of the interference sources and is given by: where H = H i (l, k) due to the omission of the indices and is the expected signal power of the jth interference source at the ith segment, L j is the set of segments at which the jth interference source is active, and I i∈Lj is an indicator function, attaining the value of 1 when the jth interference is active during the ith segment and 0 otherwise.We denote by Ls the relative number of segments during which the jth interference source is active.We assume the same expected power at all the segments in the interval, i.e., σ 2 j (i) = σ 2 j for all j = 1, . . ., N I and i = 1, . . ., L s .
We continue with defining the Signal to Noise Ratio (SNR) at the mth microphone as where |h 0 [m]| 2 is the attenuation of the signal due to the acoustic channel between the desired source and the mth microphone.Since we focus on a single frequency bin, (27) is the narrowband SNR.
To capture the correlation between the steering vectors and the ATFs, we define where r, s = 0, 1, 2, ..., N I , indicating the desired source or an interference source.
We conclude the preliminaries of the analysis with two additional assumptions.Assumption 3. ρ rr is fixed ∀r, and ρ rs is fixed ∀r = s.Assumption 3 implies that the correlation between the ATFs and the steering vectors depends only on whether they are associated with the same source or not.Following Assumption 3, henceforth we denote κ = ρ rr and ρ = ρ rs for r = s.
Assumption 4 is typically made in the context of source localization.It implies that the correlation between a steering vector to a source and the ATF associated with that source is higher than the correlation between a steering vector to a source and the ATF associated with a different source.

B. Main Results
Our first result states that the output SIR (22) of the Riemannian-based DS beamformer is higher than the output SIR of the Euclidean-based DS beamformer.Proposition 1.For every interference source j, the following holds for any number of microphones in the array.
Examining the dependency of the output SIR on the noise power σ 2 v leads to the following result.Proposition 2. If Namely, the lower the noise power is the higher the SIR is, and the improvement in SIR j (Γ R ) is greater than the improvement in SIR j (Γ E ).Since we established that the Riemannian approach is better than the Euclidean one in terms of the SIR in Proposition 1, Proposition 2 implies that increasing the SNR further increases the gap between the two approaches.Nevertheless, it also indicates that the performance of the Riemannian approach in terms of the SIR is more sensitive to noise compared to the Euclidean counterpart.Note that this statement holds under condition (30), which implies that the received power of the desired source is stronger than the received power of each interference source, considering the attenuation stemming from the activity duration.See more details in Section I-C in the SM.
The proofs of Proposition 1 and Proposition 2 rely on the following lemma, which is important in its own right.(25) over the entire interval can be written in the same parametric form as:

Lemma 1. The Riemannian or the Euclidean mean of the population correlation matrices of the segments
The Riemannian mean Γ R is obtained by setting the parameters µ j to and the Euclidean mean Γ E is obtained by setting We note that only assumptions 1 and 2 are necessary for this lemma to hold.In addition, we note that if the interference sources are always active, i.e., Lemma 1 shows that both Γ R and Γ E , i.e., the population correlation matrix of the segments in (25), can be decomposed into three terms associated with the desired source, the interference sources, and the noise.By (32), the desired source term and the noise term (the first and third terms) are the same in both the Riemannian and the Euclidean means.In contrast, the coefficients in (33) and (34) imply that the amplitude of the interference sources term (the second term) depends on the used geometry.By further inspecting the expressions of {µ 2 j }, we see that the interference attenuation using the Riemannian geometry in (33) is more involved than its Euclidean counterpart in (34), depending not only on the interference power and the duration of activity but also on the noise power and the corresponding ATF.
Furthermore, considering µ j in (34), the condition (30) could be viewed as the dominance of the desired source after the attenuation of the Euclidean mean.Discussion about the condition (30) for µ j in the Riemannian case in (33) appears in Section I-C in the SM.
Next, we examine a family of correlation matrices that pertain to the same parametric form as in (32) in Lemma 1, i.e., for some coefficients a = [a 1 , a 2 , ..., a NI ].Without loss of generality, we set the coefficient of h 0 h H 0 to 1.We note that Γ a is in accordance with (25).
For any j, we have that where a = 0. Consequently, Considering vanishing noise, i.e., when the noise power approaches zero, the following result stems from Lemma 1 by considering the limit lim σ 2 v →0 µ 2 j = 0 using (33).Corollary 1.
According to Corollary 1, the Riemannian mean approaches the optimal correlation matrix as the noise becomes negligible.By adding a condition on the presence of the interference sources, from Lemma 1 and (33) we also have the following.
Corollary 2. For any interference source j, if τ j < 1 2 , then lim Additionally, if τ j < 1 2 for all j, then lim Corollary 2 implies that for vanishing noise, even when all the interference sources have infinite power, the desired source is still the dominant source in the SRP of the DS beamformer using the Riemannian mean.Following (37) it holds that lim σ 2 j →∞,σ 2 v →0 SIR j (Γ opt ) = κ ρ > 1 for all j.For the Euclidean mean it holds that lim σ ) for all j.Note that we consider noise power approaching zero rather than strictly zero, because when σ 2 v = 0, the correlation matrix is singular, and therefore, lies outside the HPD manifold.Additionally, in practice, noise is always present.
To illustrate the obtained expressions for the Riemannian and the Euclidean SIR, we present the following simple example.
Example 1.Consider an anechoic environment without attenuation, for which κ = 1 and ρ = 0, and two interference sources.Each interference source is active at a different segment, i.e., L s = 2, L 1 = {1}, and L 2 = {2}.All the sources have the same power.In this setting, for the Riemannian geometry, we have and for Euclidean geometry, we have: Therefore, in the limit of We conclude this analysis with a few remarks.First, we note that Γ E leads to the ML estimator by taking θ0 = arg max θ P DS (θ; Γ E ) for the interference-free setting.In this case, the Riemannian mean coincides with the Euclidean mean, i.e., the proposed method coincides with the ML estimator.The main advantage of the proposed method lies in attenuating the interference sources while preserving the desired source.Second, under assumptions 1 and 2, the number of sources, both desired and interference, are limited by the number of microphones in the array, i.e., N I + N D < M .In Section V-C we alleviate Assumption 2, which removes this restriction.Third, following the same techniques in the proof of Proposition 1 and Proposition 2, similar results are derived for an alternative definition of the SIR: SIR tot (Γ) ≡ , which captures the ratio between the desired source and the sum of all interference sources.See Section II in the SM for more details.

C. Relation to Signal Enhancement
For signal enhancement in reverberant environments, the estimation of the ATF of the desired source is typically required.In our setting, there is no segment at which the desired source is the only active source, and therefore, the ATF estimation is done in the presence of the interference sources.In such a case, the following quantity could be of interest: which is different than (22) in the use of the ATFs instead of the steering vectors.
Similarly to Proposition 1, the following Proposition 3 examines the performance in terms of the SIR defined in (43).Here, assumptions 2-4 are not required, and therefore, the ATFs of the interference sources could be correlated, and the number of sources is not limited by the number of microphones in the array.Proposition 3.Under Assumption 1, for all j we have: Another interesting component in signal enhancement is the Relative Transfer Function (RTF) between different microphones [40]- [42].We compute the RTFs with respect to the first microphone, i.e., hj hj (1) .Since the RTFs are proportional to the ATFs, assumption 1 holds for the RTFs, and therefore, all the derived results apply to the RTFs as well.

D. The Segments and the Interference Sources Activity
In this section, we investigate the effect of misalignment between the segments and the activity of the interference sources.We consider two interference sources and two segments.We denote by α the offset between the segments and the activity of the interference sources.For simplicity, we consider alternately active interference sources.Suppose the first interference source is active during α ∈ [0, 1] of the first segment and during 1 − α of the second segment, and suppose the second interference source is active during 1 − α and α of the first and second segments, respectively.In this case, the correlation matrices of the two segments are given by: The correlation matrices in (45) depend on α, and as a result, their Riemannian mean Γ R (α) and their Euclidean mean Γ E (α) depend on α as well.
Examining the dependency of the SIR on α leads to the following result.
Proposition 4 states that for every misalignment between the segments and the activity of the interference sources, the Riemannian mean leads to higher SIR in comparison to its Euclidean counterpart.Equality in (46) is obtained for α = 12 , which means 50% offset.In this case, it holds that Γ 1 = Γ 2 , and both means are the same.
Empirically, we found that the advantage of the Riemannian mean over the Euclidean mean decreases as the offset between the segments and the activity of the interference sources increases.We leave the question of optimal partitioning of the STFT windows into segments to future work.

VI. EXTENSION TO OTHER DOA ESTIMATION METHODS
In this section, to broaden its applicability, we demonstrate the incorporation of the Riemannian approach in other beamformers.Each beamformer generates a spatial spectrum from which the directions to the desired sources are estimated according to the highest peaks in the spectrum.
As a subspace (SbSp) approach, we implement MUSIC [13] in the following way.given N D desired sources, we take the leading N D eigenvectors of the sample correlation matrix, Γ, and construct the signal subspace matrix, U ( Γ) ∈ C M×ND , whose columns are the N D eigenvectors.Then, the SbSp spectrum is defined by We note that the appropriate number of eigenvectors N D (the dimension of the subspace) needs to be estimated.Similarly to the DS beamformer based on the SRP in ( 18) and ( 19), the Riemannian and the Euclidean SbSp methods are given by P SbSp (θ; Γ R ) and P SbSp (θ; Γ E ), respectively, according to (47).The SbSp method could also benefit from our Riemannian approach.Recalling assumptions 1 and 2 and the structure of the mean correlation matrices Γ R and Γ E in (32), we see that h 0 is an eigenvector of both Γ R and Γ E , spanning the signal subspace (we assume a single desired source for simplicity).Moreover, the vectors {h j } are also eigenvectors of Γ R and Γ E , spanning the subspace of the interference sources.SbSp methods typically focus on the principal eigenvector.Following (32), the principal eigenvector is determined according to the largest coefficient among σ 0 and {µ j } NI j=1 .The parameter µ j in (33) for the Riemannian mean Γ R is smaller than µ j in (34) for the Euclidean mean Γ E , as proven in Proposition 1.As a result, the signal subspace is more dominant relative to the interference subspace when considering the Riemannian mean in comparison to the Euclidean mean, implying better results for the Riemannian SbSp approach.For an interference source with sufficiently high power, the leading eigenvector of Γ E could span the interference subspace rather than the signal subspace, whereas the leading eigenvector of Γ R spans the signal subspace.In Section VII, we demonstrate these SbSp methods empirically and show the advantage of the Riemannian approach.
Our approach is also applicable to the MVDR beamformer [10], whose spectrum is given by The typical spectrum of the MVDR beamformer is obtained by using Γ E in (48), namely P MVDR (θ; Γ E ).We propose to use the Riemannian mean by setting Γ to be Γ R in (48) to obtain P MVDR (θ; Γ R ).
In principle, many DoA estimation methods that employ the sample correlation matrix could potentially benefit from the proposed approach, even if it is not based on a beamformer.For example, the Bayesian learning method for signal recovery for DoA estimation proposed in [23] employs the sample correlation matrix.Its Riemannian alternative is implemented by following their algorithm only with the Riemannian mean instead of the Euclidean mean.The empirical results appear in Section VII.

VII. SIMULATION RESULTS
In this section, we demonstrate the performance of the proposed approach based on Riemannian geometry, and compare it to Euclidean geometry, implicitly considered by the common practice 1 .Additionally, we compare our approach to a heuristic method, based on the intersection of subspaces.The intersection leads to the rejection of non-common components, such as the interference sources subspace, and preserves common components, such as the desired source subspace.We refer to it as the intersection beamformer (see Appendix A for more details).
We consider a reverberant enclosure of dimensions 5m × 4m × 3.5m consisting of a microphone array with M = 12 microphones.The AIRs between the different sources and the array are generated based on the image method [43], as implemented by the simulator in [44].The sampling frequency is 16KHz, and the length of the AIRs is set to 2048 samples.The received signal is transformed to the time-frequency domain using STFT with a window size of 1024 samples with 50% overlap.We test all methods using a single frequency bin, of index 250, chosen according to the microphone spacing, which is 4.36cm.Here, the correlation matrix estimation is based on 16 STFT windows, which results in a segment duration of 1.024s.The emitted signals are generated as white Gaussian noise.The desired source is constantly active, whereas the interference sources are active only intermittently.
All the sources are positioned on a 140 • arc of radius 2m from the center of the array on the XY plane.The heights of the interference sources vary randomly, uniformly distributed between 0.5m and 3m.The height of the desired source is set to 1.8m.Figure 1 presents the room layout, where the (two) interferences are marked by green squares, the desired source by a red star, and the microphone array by blue circles.The leftmost microphone is positioned at (2.0436m, 1m, 2m), and the rest are positioned 4.36cm apart along the x-axis.While we focus on this specific configuration, we note that we tested other configurations that yielded similar results.
We examine the performance of both the DS and the SbSp methods.Algorithm 2 is used for the proposed Riemannian DS, and the common practice is implemented by replacing step 3 in Algorithm 2 with (20).The SbSp methods require knowing the dimension of the signal space of the mean correlation matrix.For the intersection method only, the dimension of the signal space of each segment is also required.Since the desired source may not be the strongest source received at the array, we need to consider all the active sources, and not merely the strongest one when estimating the dimension of the signal subspace.To estimate the dimension, we implement a heuristic algorithm, based on the spectral gap.We consider the dimension of the signal space to be the number of eigenvalues higher than a threshold.For all methods, the threshold for the mean correlation matrix is the mean value plus the standard deviation of the eigenvalues (normalized to a unit sum).For the intersection method, the threshold for the sample correlation matrix of each segment is 1.5 times the mean of the eigenvalues of the sample correlation matrix.Apart from this practical implementation, we also present results for an oracle implementation, assuming the dimension of the signal space is perfectly known.
For quantitative evaluation, we use the root mean square error (RMSE) and the accuracy, for which DoA estimation error that is smaller than 3 • (arbitrarily chosen) is considered accurate.In addition, we use two other metrics.The first is the mean of the empirical output SIR with respect to all the interference sources, which is given by: where P is the spectrum computed using the evaluated method.The second metric is the directivity [45, ch.2], which is given by The proposed approach results in inherent interference rejection, which is its greatest merit.The attenuation of the interference sources by our approach allows for accurate DoA estimation even in the presence of strong interference sources.The measure of SIR is indicative of the amount of interference rejection.
In the first experiment, we consider two interference sources, each active at a single, but disjoint, segment, resulting in a signal of 2.048s.The reverberation time is set to 150ms, and the SNR is 20dB.We start with an example of the SRP of the DS beamformer (see (18)), computed using Γ R and Γ E , which is presented at the top of Figure 2. The SRP of the Riemannian DS beamformer is shown in solid blue and the SRP of the Euclidean DS beamformer is shown in dashed red.Both SRPs are in a dB (log) scale.The directions to the desired source and the interference sources are represented by a black solid line and a dashed black line, respectively.We see that by using Γ R the main lobe is directed towards the desired source.In contrast, the SRP using Γ E is peaked at 2 different directions, none of which is the direction to the desired source.The bottom of Figure 2 is the same as the top, only with the SRP computed using the sample correlation matrix of each of the two segments, presented in different shades of orange.Even though the main lobes of the two SRPs are not pointing toward the desired source, the Riemannian mean leads to a SRP with the main lobe directed at the desired source, whereas the lobes to other directions are highly attenuated.We emphasize that viewing the SRP of the Euclidean DS beamformer in addition to the SRP of each segment separately does not allow correct identification of the desired source.We note that since the population correlation matrix is approximately low rank and the number of microphones is typically small, the estimation of the correlation matrix based on a small sample is feasible.
Next, we randomly generate 200 different pairs of positions for the interference sources.For each pair, the desired source is located at 20 different equally spaced directions along the arc (with the height of 1.8m).Thus, in total, 4000 different scenarios are examined.Figure 3 presents the mean output SIR (top) and the directivity (bottom) for the DS method using the correlation matrix estimates: Γ R (based on Riemannian geometry) in blue and Γ E (based on Euclidean geometry) in red.The box indicates the 25th and 75th percentiles, and the line marks the median.We test the different methods, Riemannian or Euclidean, in varying input SIR values, i.e. the SIR with respect to the source's power (excluding the AIRs and the beamformer processing).
We see that the Riemannian DS method attains high output SIR values, even for strong interference sources (high input SIR).In contrast, the Euclidean DS method results in relatively low output SIR values.The gap in the output SIR values between the Riemannian DS, and the Euclidean DS is up to 10dB.These results coincide with Proposition 1, stating that SIR j (Γ R ) > SIR j (Γ E ), for every interference source j.We emphasize that the Euclidean mean, Γ E , is equivalent to the common practice of using the entire signal for a single correlation matrix estimation.Since both the mean output SIR and the directivity present similar trends, and due to space considerations, in the following, we only present the mean output SIR.
Figure 4 is the same as Figure 3, but presenting the SbSp method with the addition of the intersection method, which appears in orange.The top subfigure presents the results for the practical implementation that includes estimating the dimension, whereas the bottom subfigure presents the results for the oracle.We see that the Riemannian approach outperforms its Euclidean counterpart, by approximately 20dB.In addition, the oracle SbSp method is better than the practical SbSp.In comparison to Figure 3(top), it can be seen that the Riemannian SbSp method results in higher output SIRs than the Riemannian DS method.In contrast, for the Euclidean approach, the SbSp method yields slightly lower SIRs than the DS method.The reason is that the Riemannian mean better attenuates the interference sources, allowing for a better estimation of the signal subspace than the Euclidean mean.
We continue with examining the direction estimation to the desired source.The estimated direction is defined as the Figure 5 shows the estimated direction to the desired source for the Riemannian DS method (blue square), the Euclidean DS method (red circle), and the intersection method (orange star).The solid black line marks the true location of the target source (at 20 different positions).The dashed line marks the fixed location of the interference sources.The results for input SIR −6dB and −10dB appear on the left and right, respectively.We see that using the Riemannian mean, the direction estimation follows the desired source.In contrast, using the Euclidean mean (the entire signal) results in estimating the direction of one of the interference sources.The intersection method is also inferior to the proposed approach, resulting in direction estimation to the desired source or an interference source depending on the input SIR.We note that since the DoA is estimated using (51), only the main lobe of the SRP is considered for the DoA estimation (for simplicity, we assume a single desired source).Typically-used beamformers, based on Euclidean geometry, result in a main lobe that is pointing toward the strongest source which is an interference source.Since the number of sources is unknown, and their relative positioning is unknown as well, considering even lower lobes does not allow for estimating the DoA of the desired source using the SRP.Even if the interference source does not mask the desired source (which is often the case), we cannot differentiate between the interference source and the desired source, and cannot assign the correct direction to the desired source.In contrast, the Riemannian approach rejects the interference sources, resulting in a main lobe that points toward the desired source only.We report that similar results as in Figure 5 are obtained also for an SNR value of 0dB and for interference sources with similar DoA.Results for speech signals from the TIMIT dataset appear in Appendix B where similar trends are demonstrated.We repeat the experiment for 200 iterations, where at each iteration the desired source is positioned at 20 different positions as before.Figure 6 presents the RMSE and the accuracy for a different number of microphones for the Riemannian approach (blue circles) and its Euclidean counterpart (red squares).We recall that DoA estimation error that is smaller than 3 • (arbitrarily chosen) is considered accurate.We see that the proposed approach leads to significant improvements for all the tested number of microphones.
Next, we examine the sensitivity of the proposed approach to the SNR and the reverberation time.We repeat the setting of the two interferences, as described in the first experiment.The results are presented in Figure 7.At the top, the mean output SIR for the DS method is presented as a function of the reverberation time for a fixed SNR of 20dB.Several input SIR values are shown: 0dB (asterisks), −6dB (circle), and −10dB (triangle).The results for the Riemannian and Euclidean DS appear in blue and red, respectively.The bottom figure is the same as the top, only for different SNR values for a fixed reverberation time of β = 150ms.We see that the smaller the reverberation time is, the higher the output SIR is for the Riemannian DS method.In contrast, the Euclidean DS is less affected by the reverberation time, resulting in relatively low output SIRs.For all values of reverberation times, the Riemannian DS results in higher output SIR than its Euclidean counterpart.From the bottom figure, we see that the SNR has a large impact on the performance of the Riemannian DS; the higher the SNR is, the higher the output SIR becomes.Conversely, the Euclidean DS is moderately affected by the SNR, resulting in much lower output SIR values.A possible explanation is that the main phenomenon limiting the performance of the Euclidean approach is the existence of interference sources.In addition, as Proposition 2 predicts, the sensitivity of the Riemannian DS to the SNR is higher than the Euclidean DS, and the higher the SNR is, the larger the gap in the performance between the Riemannian and the Euclidean approaches.
We examine the performance of the MVDR beamformer, given by (48).The MVDR beamformer is popular when interference sources are present, thanks to its distortionless response in the direction of the desired source, and its typical narrow beams.However, in our setting, since the desired source is accompanied by interference sources, and the directions to all the sources are unknown, the DoA estimation of the desired source using the MVDR beamformer is outperformed by the DS and the SbSp methods.We also examine the case of 2 desired sources.The results show similar trends and appear in Appendix B, due to space considerations.
In the second experiment, we examine a multiple interference setting, by considering N I = 14 interference sources.We note that the number of interference sources is larger than the number of microphones, N I > M = 12, which typically limits the number of interference sources that can be accommodated (e.g.see [30]).Furthermore, assumptions 1 and 2 implicitly restrict the number of interference sources to be bounded by M − N D = 11.This limitation is only for the analysis, and, in practice, improved results are obtained even for a larger number of interference sources.The signal contains 10 segments, demonstrating the number of segments could be smaller than the number of interference sources.The duration of the emitted signal is 10.24s.The input SIR for each interference is −6dB.Each interference has a 30% probability of being active at each segment.The position of all the sources is set at random on the arc, as described in the first experiment.The activation map of the interference sources at the first Monte Carlo iteration appears in Figure 8(a), where light blue indicates 'active'.On average, 4.2 interference sources are active at the same time at each segment.An interference source that is partially active during a segment is considered active during the entire segment (a worst case).Note that there are interference sources active continuously during more than one segment, so their activation is not necessarily related to the division of the received signal into segments.Additionally, there exists no segment in which only one source is active.The output SIRs are presented in Figure 8(b).The Riemannian DS and SbSp methods appear in blue, the Euclidean DS and SbSp methods are in red, and the intersection is in orange.It can be seen that the Riemannian approach is superior to the Euclidean one, resulting in higher output SIRs.We note that  We remark that the empirical results of the proposed approach applied to the typically-used beamformers are not sensitive to the number of samples used for the computation of the correlation matrix.Since, in the experiments, additional samples also include samples from at least one interference source, adding samples increases the effect of the interference sources, resulting in weak dependency on the number of samples.
Next, we evaluate the performance of the proposed approach applied to the Bayesian learning method proposed in [23] for sparse signal recovery for DoA estimation.This is done by replacing the typically-used (Euclidean) sample correlation matrix with the Riemannian mean of sample correlation matrices computed in short-time segments.We repeat the same experiment as in [23] with 6 desired sources, only we consider two disjointly active interference sources with SIR −6dB and −10dB.The SNR is 20dB.The number of desired sources is unknown.We evaluated the performance using 3000 experiments, where at each experiment the direction of the interference sources is generated uniformly at random between [−50 • , 50 • ]. Figure 9 presents the RMSE for a different number of STFT windows, which are equivalent to the number of samples for the sample correlation matrix computation of each segment (there are two segments in the experiment).We see that applying the proposed Riemannian approach leads to improved performance for all the tested number of samples.

VIII. CONCLUSION
We present a Riemannian approach for the design of beamformers and DoA estimation methods for interference rejection in reverberant environments.Specifically, the Riemannian mean is incorporated instead of the sample correlation matrix for DoA estimation of the desired source, as it inherently rejects the interference sources.We analytically show that the DS beamformer, based on the Riemannian geometry of the HPD manifold, results in a higher output SIR than the typical DS beamformer, which implicitly considers the Euclidean geometry.We extend our approach to other beamformers, such as subspace-based beamformers and the MVDR, as well as a Bayesian learning method, experimentally demonstrating superior output SIR and better DoA estimations in comparison to their Euclidean counterparts.

APPENDIX A THE INTERSECTION BEAMFORMER
Another beamformer we examine is based on the observation that the desired signal subspace is the intersection of subspaces spanned by eigenvectors of the correlation matrix of the different segments.From each segment, i, we extract the desired signal subspace from the sample correlation matrix, Γ i , to obtain V ( Γ i ) whose columns are the N D leading eigenvectors.The projection matrix onto the signal subspace of Γ i is computed as follows: Since each desired source is active during all the segments, its ATF, h d j , is an eigenvector of V ( Γ i ).Consequently it is also an eigenvector of P sig ( Γ i ) for all i, with an eigenvalue 1, i.e.P sig ( Γ i )h d j = h d j .As a result, it is an eigenvector of P sig = Ls i=1 P sig ( Γ i ) with eigenvalue 1 as well since We consider the leading N D eigenvectors of P sig and form the matrix P sig,ND ∈ C M×ND , whose columns are the eigenvectors.The intersection spectrum is defined as follows: We note that in addition to the dimension of the signal space of the matrix P sig , the intersection method requires the estimation of the dimension of the signal space of the correlation matrix for each segment (in order to compute V ( Γ i )).

APPENDIX B ADDITIONAL EXPERIMENTAL RESULTS
We repeat the setting of the first experiment, only with an additional desired source, located at (2m, 3.5m, 2.5m).Figure 10 is the same as Figure 3, but the mean output SIR is taken over the two interferences and the two desired sources.The left figure presents results for input SIR of −10dB, and the right presents results for input SIR of −6dB.We see that the Riemannian approach is superior to the Euclidean one, resulting in higher SIRs.Also, the intersection method is more sensitive to the input SIR, resulting in higher output SIR for −6dB.Similar to the setting of one desired source, the Riemannian SbSp method is superior to the Riemannian DS method, as opposed to the Euclidean approach, for which they are on par.This is a consequence of the higher interference attenuation of the Riemannian approach, resulting in a better estimate of the signal space dimension.Next, we examine the performance of the Riemannian MVDR, and the Euclidean MVDR beamformers, given by (48), for Γ R and Γ E , respectively.Figure 11 presents the results.It is the same as Figure 3, only for the MVDR beamformer.We see that the Riemannian approach is superior to the Euclidean one, however, with a slight advantage.
We repeat the experiment with speech signals from the TIMIT dataset.For each source, the speaker and the time interval are chosen uniformly at random. Figure 12 presents the results.We see that the proposed approach leads to improved results in comparison to the typically-used beamformers also for speech signals.

APPENDIX C EXTENSION TO A STREAMING DATA SETTING
In a streaming data setting, we do not have access in advance to the entire signal, and the direction estimation is updated as more samples become available.Therefore, in this setting, we cannot compute the Riemannian mean using (17).To circumvent this, we turn to the estimator of the Riemannian mean proposed in [37], [46], which is updated after every received segment.Each segment, i, is processed separately using L w STFT windows, and an estimation of the correlation matrix, Γ i , is computed using (16).Next, the estimation of the Riemannian mean, denoted as Ri , which is the adaptive counterpart of Γ R , is updated using the following update step: The SRP of the DS beamformer is computed using (18) with Ri as an estimate of Γ R , and the direction to the desired source is set using (19).The algorithm is described in Algorithm 3.
We note that the current correlation matrix estimate has a full rank if L w > M .This implies a latency of at least M times the duration of the STFT window.In case the STFT is performed with overlap this latency is reduced accordingly.
4) Compute P DS (θ; Ri ) using (18) 5) Return θ = arg max θ P DS (θ; Ri ) As for the Euclidean alternative, its estimator is updated in finer granularity at every STFT window, l.The estimator at the lth update is denoted by E l and is computed as follows: Setting L w = 1 in step 2.1 in Algorithm 3, and substituting (55) in step 2.3 in Algorithm 3, results in the streaming version of the Euclidean counterpart.

APPENDIX D ON THE PARTICULAR CHOICE OF THE RIEMANNIAN METRIC
In this work, we consider the Affine Invariant metric (also called Fisher Information metric) [47].Another commonlyused metric in the space of HPD matrices is the Log-Euclidean metric [48], which could be viewed as a computationally efficient local approximation of the Affine Invariant metric.The induced Log-Euclidean distance is given by: In the context of this work, the Affine Invariant metric is advantageous over the Log Euclidean because it better enhances the desired source subspace relative to the interference and noise subspace, as we show next.We remark that the derivation is similar to the derivation in Section V, where we demonstrate the advantage of the Affine Invariant metric over Euclidean geometry.First, we note that the following holds [49]: where Γ LE denotes the Riemannian mean based on the Log-Euclidean metric.Second, the ATF to the desired source, h 0 , is a common eigenvector of all the correlation matrices per segment.Consequently, according to Lemma 2 below, the mean correlation matrices based on both metrics have the same eigenvalue associated with the common eigenvector h 0 .More specifically, denoting λ 0 (Γ) Lemma 2. Let {Γ j } j be the set of HPD matrices, having a common eigenvalue λ 0 , associated with the common eigenvector h 0 .Then where Γ R , Γ LE , Γ E are the Riemannian means based on the Affine Invariant and the Log-Euclidean metrics, and the Euclidean mean.
The rest of the eigenvectors of the correlation matrices span the interference and noise subspace.We recall that λ i (Γ) is the ith eigenvalue, so we get which follows from Lemma 2, and (57).We see that the Riemannian mean induced by the Affine Invariant metric captures better the desired signal subspace in comparison to the Riemannian mean induced by the Log Euclidean metric and the Euclidean mean (according to (24)), entailing an advantage in SbSp methods, for example.

SUPPLEMENTARY MATERIAL: ON INTERFERENCE-REJECTION USING RIEMANNIAN GEOMETRY FOR DIRECTION OF ARRIVAL ESTIMATION PROOFS OF THEORETICAL RESULTS
A key observation in our analysis is that although there is no explicit expression for the Riemannian mean, in general, under assumptions 1 and 2, the correlation matrices {Γ l } commute, and the Riemannian mean has an explicit form, according to Proposition 5. We note that in practice we use Algorithm 1 to compute the Riemannian mean because the assumptions do not strictly hold.
For completeness, we prove Proposition 5, which is a property of Karcher's mean [39].
Proposition 5.The Riemannian mean of K commuting HPD matrices, {Γ l } K l=1 is given by Proof.The Karcher mean, Γ R , is the solution of the Karcher equation [35], [39], [50], [51] K l=1 log(Γ l , and use the fact the matrices commute: A. Proof of Proposition 1 Proof.The proof relies on Lemma 1, which we prove first. Lemma 1.The Riemannian or the Euclidean mean of the population correlation matrices of the segments ( 25) over the entire interval can be written in the same parametric form as: The Riemannian mean Γ R is obtained by setting the parameters µ j to and the Euclidean mean Γ E is obtained by setting µ 2 j = σ 2 j τ j .
Proof.We start by expressing the correlation matrix of the ith segment: where J i is the set of indices of the active interference sources during the ith segment.The vectors h 0 , and {h j } NI j=1 are all orthogonal, thus they are eigenvectors of Γ i , ∀i = 1, ..., L s .
We denote by upper tilde the normalized version of a vector, i.e. h = h h .So, The correlation matrices for each segment, {Γ i } Ls i=1 , commute with each other because they share their eigenvectors: h 0 , {h j } LI j=1 , and {v l } M−NI−1 l=1 , where {v j } M−NI−1 j=1 are the eigenvectors spanning the common noise subspace of all the matrices.
Following Proposition 5, it holds true that We compose Γ R using a transformation for the eigenvalues for each Γ i : and the rest of the eigenvectors have an eigenvalue of σ 2 v , with multiplicity of M − N I − 1.The matrix Γ R meets all the requirements.
The proof for the Euclidean mean follows from its definition and (64).
We compute the output SIR for a general matrix with a similar structure as in Lemma 1.The correlation matrix is: and the SIR becomes: We note that the SIR depends on the number of microphones implicitly through the norm of the ATFs h j , and h 0 .Since κ, ρ ≥ 0, and κ > ρ, the smaller µ 2 j , µ 2 l ∀l are, the higher the SIR is.Using Lemma 1, we identify the coefficients for the Riemannian mean as: and for the Euclidean mean as µ 2 l = σ 2 l τ l .It is left to show that the coefficients for the Riemannian SIR are smaller than for the Euclidean SIR.So, we examine the conditions under which We see that the Riemannian mean of the correlation matrices leads to the geometric mean of the noise power and the interference power plus the noise power.In contrast, the common practice that is the Euclidean mean of the correlation matrices leads to the arithmetic mean of the powers.

B. Proof of Lemma 2
Proof.For the Euclidean geometry, the proof is straightforward.
For the Riemannian geometry, since, in general, there is no close form expression for the Riemannian mean, we use its definition as the minimizer of Γ R = arg min For the Affine Invariant metric, the following holds Since the matrix Γ is a function of the matrices Γ j , we have that h 0 is also its eigenvector.We notice that min log 2 λ 0 (Γ for λ 0 (Γ R ) = λ 0 (Γ j ).This is the minimum possible value for every j, so it must hold that λ 0 (Γ R ) = λ 0 (Γ j ).
For the Log-Euclidean distance, we use eigenvalue decomposition and denote by {u i } and {v k } the eigenvectors of Γ LE and Γ j , respectively.We have which is minimal for every j, for λ 0 (Γ LE ) = λ 0 (Γ j ).The last equality is due to the orthogonality between h 0 and the rest of the eigenvectors.

C. Proof of Proposition 2
We start by proving Proposition 2, i.e. showing that and continue with examining the conditions under which Proof.The proof employs the following technical Lemma Lemma 3.For µ 2 j given by (33), the following holds: We focus on the expression in the bracket, rewrite it as a fraction, and show that it is larger than 1.
We recall the expression for the SIR from (69): We denote the following functions, which are the numerator and the denominator of the expression of the Riemannian SIR: where µ 2 j is given by (33).Similarly, we define f E , and g E with µ 2 j given by (34).The derivative is where f and g represent f R or f E and g R or g E , respectively, and (•) ′ denotes the derivative with respect to σ 2 v .In the proof of Proposition 1, we show that µ 2 j for the Riemannian mean in (33) is smaller than its Euclidean counterpart in (34).In combination with Lemma 3, we have that , 0 < g R (σ 2 v ) < g E (σ 2 v ), so we focus on the numerator.We show that ∂ ∂σ 2 v SIR j (Γ R ) < ∂ ∂σ 2 v SIR j (Γ E ) < 0, by proving the following claims: j for the Euclidean mean does not depend on σ 2 v , the same holds for f E (σ 2 v ) and f ′ E (σ 2 v ) = 0.For the Riemannian case, using Lemma 3 we have that Proof of Claim 2 Since µ 2 j for the Riemannian mean in (33) is smaller than its Euclidean counterpart in (34), we have f R (σ 2 v ) > f E (σ 2 v ).Additionally, if (30) holds for µ 2 j in (34) then f E (σ 2 v ) > 0. It is left to show that g ′ R (σ 2 v ) ≥ g ′ E (σ 2 v ) > 0. This holds due to Lemma 3: Following the proof of Proposition 2, since f ′ E (σ 2 v ) = 0, it follows that iff g From the expression of f E (σ 2 v ) it follows that f E (σ 2 v ) > 0 iff condition (30) holds.So, it is a sufficient and a necessary condition for ∂ ∂σ 2 v SIR j (Γ E ) < 0 to hold.For the Riemannian mean, the condition under which ∂ ∂σ 2 v SIR j (Γ R ) < 0 is more easily met.Additionally, it is a sufficient but not a necessary condition, as we show next.
First, we note that condition (30) can be recast as where µ 2 j given by (34).For the Riemannian mean, under condition (86) with µ 2 j given by ( 33) it holds that ) < 0, condition (86) with µ 2 j given by ( 33) is a sufficient but not a necessary condition for ∂ ∂σ 2 v SIR j (Γ R ) < 0 to hold, as opposed to the Euclidean case.
In the proof of Proposition 1 it is shown that the parameter µ j for the Riemannian mean in (33) is smaller than its Euclidean counterpart in (34).Therefore, condition (86) is more easily met when µ j is given by (33) than when µ j is given by (34).As a consequence, it is possible for ∂ ∂σ 2 v SIR j (Γ E ) to be positive, while ∂ ∂σ 2 v SIR j (Γ R ) is negative.

D. Proof of Proposition 3
Proof.Under Assumption 1 the ATF of the desired source, h 0 , is an eigenvector of Γ j for all j with the same eigenvalue λ 0 .Then, according to Lemma 2 the following holds For the Riemannian and the Euclidean mean, the following holds [39]: Thus, for all j: and therefore which completes the proof.

Fig. 1 .
Fig. 1.The reverberant room with the microphone array (blue circles), the desired source (red star), and the interference sources (green squares).(a) A 3D view.(b) A 2D view.

Fig. 2 .
Fig. 2. The SRP of the DS beamformer using (a) Γ R in solid blue and Γ E in dashed red, and (b) Γ 1 and Γ 2 in different shades of orange.The black solid line indicates the direction of the desired source, and the dashed black lines indicate the directions to the interference sources.The Input SIR is −6dB.

Fig. 3 .Fig. 4 .
Fig. 3. (a) The mean output SIR and (b) the directivity for two interference sources, for the Riemannian and the Euclidean DS method.The x-axis indicates the input SIR, and the y-axis indicates the output SIR.The box indicates the 25th and 75th percentiles, and the central line marks the median.Several input SIR values are presented.

Fig. 5 .
Fig. 5. Estimation of the DoA to the desired source for (a) input SIR of −6dB and (b) input SIR of −10dB.

Fig. 6 .
Fig. 6.(a) RMSE and (b) accuracy versus the number of microphones for two interference sources with input SIR of −10dB.

Fig. 7 .
Fig. 7.The mean output SIR as a function of (a) the reverberation times and (b) the SNR, for 2 interference sources.The Riemannian DS appears in blue, whereas the Euclidean DS appears in red.Several input SIR values are presented: 0dB (asterisks), −6dB (circle), and −10dB (triangle).

Fig. 8 .
Fig. 8. (a) Activation map for the 14 interference sources during the observed 10 time segments (blue indicates 'active').(b) Output SIR of the different beamformers.The input SIR of each interference is −6dB.

Figure 8 (
Figure 8(b) serves as an example of an increased number of segments in comparison to Figure 3. Since there are on average 4.2 active interference sources at every segment, the results are not improved as the number of segments is increased.We remark that the empirical results of the proposed approach applied to the typically-used beamformers are not sensitive to the number of samples used for the computation of the correlation matrix.Since, in the experiments, additional samples also include samples from at least one interference source, adding samples increases the effect of the interference sources, resulting in weak dependency on the number of samples.Next, we evaluate the performance of the proposed approach applied to the Bayesian learning method proposed in[23] for sparse signal recovery for DoA estimation.This is done by replacing the typically-used (Euclidean) sample correlation matrix with the Riemannian mean of sample correlation matrices computed in short-time segments.We repeat the same experiment as in[23] with 6 desired sources, only we consider two disjointly active interference sources with SIR −6dB and −10dB.The SNR is 20dB.The number of desired sources is unknown.We evaluated the performance using 3000 experiments, where at each experiment the direction of the interference sources is generated uniformly at random between [−50 • , 50 • ].Figure9presents the RMSE for a different number of STFT windows, which are equivalent to the number of samples for the sample correlation matrix computation of each segment (there are two segments in the experiment).We see that applying the proposed Riemannian approach leads to improved performance for all the tested number of samples.

Fig. 9 .
Fig. 9.The RMSE of the Bayesian learning method for 6 desired sources with and without the proposed approach in blue and red, respectively, for (a) SIR −6dB and (b) SIR −10dB.

Fig. 10 .Fig. 11 .
Fig. 10.Output SIR of the different beamformers for two desired sources and two interference sources.(a) The input SIR is −10dB.(b) The input SIR is −6dB.

Fig. 12 .
Fig. 12. Estimation of the DoA to the desired source for speech signals where (a) the input SIR is -6dB, and (b) the input SIR is −10dB.

Algorithm 3 2 )
Streaming DoA estimation in the presence of multiple interferences Input: The current result of the STFT window of the received signal Output: The estimated direction of the desired source θ 1: Set R0 = I 2: Repeat 1) Accumulate L w STFT windows (that form a segment) Compute its sample correlation matrix, Γ i , using ) Equation (71) holds due to the weighted arithmetic mean and geometric mean inequality [52, pp.111-112, Theorem 10.5] [53].

d 2 LE 2 Fi 2 F 2 F
(Γ LE , Γ j ) = ln(Γ LE ) − ln(Γ j ) = ln λ i (Γ LE )u i u H i + ln λ 0 (Γ LE )h 0 h H 0 − k ln λ k (Γ j )v k v H k − ln λ 0 (Γ j )h 0 h H 0 = C + (ln λ 0 (Γ LE ) − ln λ 0 (Γ j ))h 0 h H 0 = tr CC H + (ln λ 0 (Γ LE ) − ln λ 0 is used since the DoA estimation is based on a single STFT frequency bin.In the case of N D desired sources, the N D directions are set according to the N D strongest lobes in the SRP.The algorithm for a single desired source is described in Algorithm 2.