Low-Rank Room Impulse Response Estimation

In this paper we consider low-rank estimation of room impulse responses (RIRs). Inspired by a physics-driven room-acoustical model, we propose an estimator of RIRs that promotes a low-rank structure for a matricization, or reshaping, of the estimated RIR. This low-rank prior acts as a regularizer for the inverse problem of estimating an RIR from input-output observations, preventing overfitting and improving estimation accuracy. As directly enforcing a low rank of the estimate results in an NP-hard problem, we consider two different relaxations, one using the nuclear norm, and one using the recently introduced concept of quadratic envelopes. Both relaxations allow for implementing the proposed estimator using a first-order algorithm with convergence guarantees. When evaluated on both synthetic and recorded RIRs, it is shown that under noisy output conditions, or when the spectral excitation of the input signal is poor, the proposed estimator outperforms comparable existing methods. The performance of the two low-rank relaxation methods is similar, but the quadratic envelope has the benefit of superior robustness to the choice of regularization hyperparameter in the case when the signal-to-noise ratio is unknown. The performance of the proposed method is compared to that of ordinary least squares, Tikhonov least squares, as well as the Cramér-Rao lower bound (CRLB).


I. INTRODUCTION
Considering the acoustics of a room as a linear time-invariant (LTI) system, room impulse responses (RIRs) describe the impact of the room on a sound signal, between a point source and a point receiver. They play a vital part in algorithms within a multitude of acoustic signal processing tasks, such as source localization [1], speech dereverberation [2], auralization [3], source separation [4], and echo cancellation [5].
There are several ways of modeling the RIR. Among the most popular ones are the infinite impulse response (IIR) (see e.g., [6], [7]) and finite impulse response (FIR) models (see e.g., [6], [8]). The IIR model offers the possibility of a more compact representation, however with the downside of possible difficulties in estimating the filter parameters [9], and potential issues with instability [10]. The FIR model is simple and straightforward, but with the disadvantage that comparatively many coefficients are needed to accurately represent the RIR [9]. For a regular office-sized room, the FIR model can be several thousands of taps long [2], which can be prohibitive from a computational and memory-requirement point of view [11], [12]. If the RIR is estimated from an output signal originating from an input signal with poor spectral excitation, i.e., lacking power in certain frequency bands, the estimation problem can become ill-posed, and the resulting estimate could suffer from large variance. To counteract this, it is common to use regularization, see e.g., [13]. The standard approach in RIR estimation is to use Tikhonov regularization [14], a special case of the types of regularization presented in [13]. However, we will demonstrate in this work that by exploiting an (approximate) low-rank structure of RIRs, estimation performance may be improved as compared to state-of-the-art estimators with and without regularization. As of late, low-rank modeling and low-rank regularization have become increasingly popular in signal processing, data science, and related fields, such as machine learning [15], video background subtraction [16], and matrix completion [17].
In these applications, low-rank structures of matrices or tensors parametrizing the signals under consideration are exploited to, e.g., obtain compact representations or to regularize otherwise ill-posed estimation problems. The class of signals amenable to this type of treatment includes polynomials, rational functions, smooth and periodic functions, and, what will be used in this paper as a model for motivating the low-rank structure of RIRs, sums of decaying sinusoids [18].
It should here be noted that in the system identification literature, low-rank structures are most commonly utilized for describing the Toeplitz matrix representing the discrete convolution operation, i.e., the LTI system acting on an input. This has been discussed by, e.g., Marconato et al. in [19] and [20]. In these works, empirical Bayes methods are primarily used, and it is shown that low-rank regularization acts as a powerful method for improving the accuracy of the estimation. In particular, promoting low-rank Toeplitz matrices is shown to add robustness in the face of input signals with poor spectral excitation.
In contrast, the low-rank structure considered herein does not refer to the aforementioned Toeplitz matrix, but rather to a matricization of the impulse response itself. Specifically, we here aim to exploit that RIR vectors, when reshaped into a matrix, allow for low-rank approximations, something we have taken advantage of in our recent work [21] (see also [18], [22] for relevant work outside the domain of acoustic signal processing). System identification with this type of low-rank structure has earlier been explored by Paleologu et al. in works like [23], [24], [25], using an iterative Wiener filter, recursive least squares, and a Kalman filter, respectively, but there under the name nearest Kronecker product. The main focus there is shorter impulse responses, particularly for network echo cancellation. In contrast, the scenario considered in this paper concerns longer impulse responses for applications in acoustic signal processing.
Furthermore, the estimation algorithms in the above mentioned papers require the user to exactly specify the number of low-rank components to be estimated, thereby relying on oracle knowledge of the system complexity. In the work presented herein, we instead propose an estimator where the rank of the estimate is implicitly controlled by means of hyperparameters. In particular, the proposed estimator poses RIR estimation as an inverse problem, with low-rank estimates being promoted by the use of a regularization term.
It may here be noted that directly penalizing the rank of an RIR matrix yields an NP-hard problem [26]. A popular remedy to this in general low-rank estimation is nuclear norm regularization, i.e., penalizing the sum of the singular values, since it yields a convex optimization problem that can be solved by standard methods. This approach has however been shown to introduce a shrinkage, or bias towards zero, of the singular values (see e.g., [15]).
In this work, we propose to counter this bias by utilizing the so-called quadratic envelope of the rank penalty. Quadratic envelopes, introduced in [27], constitute a class of minorizers constructed from point-wise best function approximations from below by quadratic functions, and have recently been used for addressing bias issues in sparsity problems within the realm of image processing and computer vision by Carlsson et al. in works like [27], [28], [29], [30]. A drawback of this relaxation is that the resulting optimization problem is not necessarily convex, but in many cases this is outweighed by the very attractive property that, for appropriate hyperparameter values, the relaxation does not move any local minima of the problem being approximated. Its merit is, however, to the best of the authors' knowledge, yet to be explored in the context of acoustic signal processing. In addition, the problem considered herein falls into a class of problems for which convergence of the proximal gradient method, or forward-backward splitting (FBS), to a local minimum can be guaranteed.
The contribution of this paper is therefore to provide a framework for low-rank estimation of room impulse responses. We present an easy-to-use algorithm and show that low-rank regularization outperforms benchmark methods, that is, ordinary least squares and Tikhonov-regularized least squares. This is demonstrated using synthetic as well as real-life RIRs, driven by synthetic as well as real-life input signals. The outperformance is consistent, but particularly noticeable when the SNR is low, or when the spectral excitation is poor. Further, the quadratic envelope performs exceptionally well when prior knowledge about the SNR cannot be assumed. Finally, it should be noted that the use of the proposed algorithm is not restricted to acoustics and RIRs; rather, it is applicable to any physical system whose impulse response is well modeled by a sum of decaying sinusoids.
This paper is organized as follows: first, Section I is concluded with an introduction of the notation used throughout the paper. In Section II, the signal model is introduced. In Section III, different possible relaxations of the low-rank penalty term are discussed, and in Section IV, the proposed algorithm is introduced. Numerical results are presented in Section V. Finally, in Section VI, conclusions of the work presented here are discussed, and possible avenues for future research are pointed out.

A. Notation
Vectors are denoted by lower-case, bold letters, such as h, and matrices are denoted by upper-case, bold letters, such as H. A subscript on an upper-case bold letter indicates a matrix column, i.e., H_j denotes the jth column of H, and on a lower-case bold letter it indicates a vector element, i.e., h_j denotes the jth element of h. A bar, (̄·), denotes complex conjugation of a complex number, and the hat symbol, (^·), denotes an estimated quantity. The symbol ⊙ denotes the Hadamard product, i.e., element-wise multiplication of vectors or matrices, the nabla operator, ∇, refers to the gradient of a function, and E[·] denotes statistical expectation. Linear operators are denoted by upper-case calligraphic letters, such as A, and an asterisk, (·)*, denotes the adjoint. Sets are also denoted by upper-case calligraphic letters, but it will be obvious from context which is considered. The norm of a matrix or an operator is denoted ‖·‖ and refers to ‖·‖₂, ⟨u, v⟩ = Σ_{j=1}^n u_j v_j denotes the inner product between two vectors u, v ∈ R^n, and i = √−1 denotes the imaginary unit.

II. SIGNAL MODEL
We assume an observed acoustic pressure signal in discrete time, y(n), n = 1, 2, ..., n_y, with corresponding signal vector y ∈ R^{n_y}. This observed signal is the result of a known signal x(n), n = 1, 2, ..., n_x, with corresponding vector x ∈ R^{n_x}, being generated by a point source in a room, and can therefore be modeled as the convolution of x with the room impulse response h(n), n = 1, 2, ..., n_h, with corresponding vector h ∈ R^{n_h},

y(n) = (x * h)(n) = Σ_{k=1}^{n_h} h(k) x(n + n_h − k), (1)

for n = 1, 2, ..., n_y, i.e., we consider only the part of the RIR where the system is fully excited. We will assume that n_x ≥ n_h, so as not to have to consider initial conditions. We further assume measurement noise e ∈ R^{n_y}, which we assume to be white Gaussian with variance σ², i.e., e ∼ N(0, σ² I_{n_y}), where I_{n_y} ∈ R^{n_y × n_y} denotes the identity matrix. In the more general case of non-white, non-stationary noise, a combination of prefiltering and regularization can be used, as elaborated on in [14]. Taken together, we have that

y(n) = Σ_{k=1}^{n_h} h(k) x(n + n_h − k) + e(n), n = 1, 2, ..., n_y, (2)

which can also be written as

y = Xh + e, (3)

where X ∈ R^{n_y × n_h} is the Toeplitz matrix corresponding to the convolution operation in (2). We will have that n_y = n_x − n_h + 1. We define the linear operator A : R^{n_h} → R^{n_y}, such that A(h) = x * h. Here we also define the adjoint operator A* : R^{n_y} → R^{n_h}, as the unique operator for which

⟨A(h), y⟩ = ⟨h, A*(y)⟩, for all h ∈ R^{n_h}, y ∈ R^{n_y}. (4)
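As a concrete illustration of the model above, the following minimal NumPy sketch builds the Toeplitz matrix X for the "fully excited" part of the convolution and checks it against a direct valid-mode convolution (indices are 0-based here; variable names are ours, not the paper's):

```python
import numpy as np

def convolution_matrix(x, n_h):
    """Toeplitz matrix X such that X @ h is the 'fully excited' (valid)
    part of x * h, of length n_x - n_h + 1."""
    n_x = len(x)
    # row n contains x[n + n_h - 1], x[n + n_h - 2], ..., x[n]
    return np.array([x[n:n + n_h][::-1] for n in range(n_x - n_h + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = rng.standard_normal(16)
X = convolution_matrix(x, 16)
y = X @ h + 0.01 * rng.standard_normal(X.shape[0])  # observed signal with noise e
assert np.allclose(X @ h, np.convolve(x, h, mode="valid"))
```

Note that X has n_y = n_x − n_h + 1 = 49 rows here, matching the "fully excited" convention of the signal model.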

III. ESTIMATION WITH LOW-RANK HEURISTICS
The room impulse response h can be estimated from the input and observed signal vectors with the least squares method,

minimize_h 1/2 ‖y − A(h)‖₂², (5)

and under the assumption that the noise e is white Gaussian, this yields the maximum likelihood estimator. Inverse problems like these are, however, often ill-posed [31], [32], particularly in acoustic signal processing, where poor excitation is a common occurrence [14]. The inverse mapping from y to h might therefore be unstable due to ill-conditioning. The fact that the output is corrupted by noise could yield an over-fitted solution, and finally, the solution could be non-unique [32]. In order to obtain a well-posed problem, a possible approach is to use regularization [13]. This is done by appending a penalty term to the optimization problem (5). With that, the problem can be written as

minimize_h f(h) + g(h). (6)

We consider the function f : R^{n_h} → R to be the data-fit term 1/2 ‖y − A(h)‖₂², i.e., a function penalizing the distance between the model output and the observation. The function g : R^{n_h} → R is the regularization term. Note that this seemingly simple form includes constrained optimization problems, since we can let g be an indicator function for some non-empty set S,

g(h) = I_S(h) = 0 if h ∈ S, and ∞ otherwise. (7)

This form also includes the aforementioned Tikhonov regularization,

g(h) = λ_T ‖h‖₂², (8)

where λ_T ≥ 0 is a parameter controlling the degree of regularization. The regularization should, however, preferably be designed using prior knowledge about the problem [14]. We will therefore demonstrate the low-rank properties of RIRs, and argue how these can be exploited in regularization. An RIR can be well modelled as an infinite sum of decaying sinusoids, see e.g., [33]. The frequencies of these decaying sinusoids correspond to the modes of the room, i.e., frequencies at which standing waves would occur in the absence of wall absorption. The number of room modes, N_f, below a certain frequency f grows approximately as N_f ≈ (4π/3) V f³/c³, where V is the volume of the room and c is the speed of sound. One might therefore expect to have to use a large number of terms to approximate this sum.
However, in [10] it is concluded that the large overlap between the modal curves, which occurs particularly at higher frequencies [9], makes the number of distinguishable peaks in the frequency magnitude response much smaller, and it is justified to approximate the infinite sum with a finite one. For a longer exposition, see e.g., [10] and [21]. With this, the room impulse response can be modeled as

h(r_r, r_s, n) = Σ_{m=1}^{m_s} μ_m e^{−β_m n} sin(ω_m n + φ_m), (9)

for n = 1, 2, ..., n_h. Here, μ_m denotes the initial amplitude, r_r, r_s ∈ R³ the positions of the receiver and the source, respectively, β_m ∈ R₊ the exponential decay constant, ω_m ∈ [0, π] the angular frequency, φ_m ∈ [0, 2π) the phase, and m_s ∈ N is the number of decaying sinusoids used in the model. For ease of notation we will drop the dependence on r_r and r_s and refer to h(r_r, r_s, n) simply as h(n), the same h(n) as presented in Section II.
When a vector consisting of the sum of m_s discrete-time decaying sinusoids is reshaped into a matrix, that matrix will have rank 2m_s, see e.g., [35]. Further, as previously mentioned, there is a large spectral overlap of some of the modal curves of the decaying sinusoids that make up the RIR, particularly at higher frequencies. These two facts taken together make for good conditions for enforcing a low-rank structure on the solution, i.e., that when the solution vector is reshaped into a matrix, that matrix should have low rank. In previous work we have shown the connection between the physics-driven modeling of room acoustics and the use of the sum-of-decaying-sinusoids model, along with the connection to low-rank modeling [21]. For more on the physical motivation we refer to [21], but for the convenience of the reader, we here reiterate the low-rank motivation. It should be noted that a measured RIR will, when matricized, not be low-rank in the strict sense, because of measurement noise and model errors. Low rank is here meant in a less strict sense, i.e., that a low-rank approximation of the RIR will render a small approximation error.
Let us define the reshaping operator R_pq : R^{pq} → R^{p×q}, which stacks a vector column by column into a matrix,

(R_pq(h))_{jk} = h_{j+(k−1)p}, j = 1, ..., p, k = 1, ..., q. (10)

Letting H = R_qq(h), with q² = n_h, we then have H = H_1 + H_2 + ... + H_{m_s}, where

H_m = R_qq(h_m), h_m(n) = μ_m e^{−β_m n} sin(ω_m n + φ_m), (11)

corresponds to the mth decaying sinusoid of (9). Writing h_m(n) as a linear combination of z_m^n and z̄_m^n, where z_m = e^{iω_m − β_m}, and noting that z_m^{j+(k−1)q} = z_m^j (z_m^q)^{k−1}, we see that each of the m_s terms that make up H can be written as a sum of two rank-1 matrices, i.e., as a rank-2 matrix. As long as the z_m are distinct, the matrix H will have rank 2m_s [36]. In general, H could be a non-square matrix, but here we consider the square case, so as to simplify the exposition.
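The rank-2m_s property is easy to verify numerically. The sketch below reshapes a sum of m_s decaying sinusoids (with arbitrarily chosen decay constants, frequencies, and phases) and checks the numerical rank of the resulting matrix; either row- or column-major reshaping gives the same rank:

```python
import numpy as np

# A sum of m_s real decaying sinusoids, reshaped into a q-by-q matrix,
# has numerical rank 2*m_s, as long as the poles z_m are distinct.
q, m_s = 40, 3
n = np.arange(q * q)
rng = np.random.default_rng(1)
h = np.zeros(q * q)
for _ in range(m_s):
    beta = rng.uniform(1e-4, 1e-3)          # decay constant (arbitrary)
    omega = rng.uniform(0.1, np.pi - 0.1)   # angular frequency (arbitrary)
    phi = rng.uniform(0.0, 2.0 * np.pi)     # phase (arbitrary)
    h += np.exp(-beta * n) * np.sin(omega * n + phi)

H = h.reshape(q, q)
s = np.linalg.svd(H, compute_uv=False)
numerical_rank = int(np.sum(s > 1e-8 * s[0]))
assert numerical_rank == 2 * m_s
```

A measured RIR would of course only be approximately low-rank, as noted above, but the trailing singular values here drop to floating-point noise.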
In light of this discussion, the problem to be solved would ideally be

minimize_h 1/2 ‖y − A(h)‖₂² subject to rank(R_qq(h)) ≤ 2m_s, (12)

or, equivalently, using the form of (6),

minimize_h 1/2 ‖y − A(h)‖₂² + I_{2m_s}(R_qq(h)), (13)

where the penalty function, I_{2m_s}, is the indicator function for the set of matrices of rank at most 2m_s. This problem, however, is non-convex and NP-hard [26]. Furthermore, the exact rank, 2m_s, is typically not known. Instead of the constrained problem of (12) and (13), one could consider the penalized version,

minimize_h 1/2 ‖y − A(h)‖₂² + λ rank(R_qq(h)). (14)

Here, λ ≥ 0 is a parameter controlling the degree of penalization of the rank of the solution, in analogy to λ_T in (8). This transforms the problem of not knowing the rank 2m_s into the problem of choosing λ, but the problem remains NP-hard [15]. Considering that the rank of a matrix is equal to the number of non-zero singular values, (14) is equivalent to

minimize_h 1/2 ‖y − A(h)‖₂² + λ ‖σ(R_qq(h))‖₀, (15)

where ‖·‖₀ is the ℓ₀-pseudo-norm and σ : R^{q×q} → R^q is the function that takes an R^{q×q}-matrix and returns the vector of its singular values in non-increasing order, given by its singular value decomposition. The NP-hardness persists, however, so in order to be able to find an approximate solution to (15), the problem needs to be relaxed. This can be done in different ways, which will now be further explored.

A. Nuclear Norm Regularization
In order to achieve low-rank solutions, throughout this paper, two different kinds of penalty functions will be used and compared. The first is the popular approach of using the ℓ₁-norm instead of the ℓ₀-norm in the penalization of σ(H) [37],

minimize_h 1/2 ‖y − A(h)‖₂² + λ_NN ‖R_qq(h)‖_*, (16)

where ‖H‖_* denotes the nuclear norm of H, defined as ‖H‖_* = ‖σ(H)‖₁, and λ_NN ≥ 0 is analogous to λ in (14). This is the first of the two optimization problems we will aim to solve in Section IV. The advantage of problem (16) is that it is convex and can be solved efficiently [38]. The disadvantage is that this solution will be biased, in that the singular values will be shrunk [15].
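For illustration, the proximal operator associated with the nuclear norm is soft-thresholding of the singular values, which makes the shrinkage (bias) explicit; a minimal sketch:

```python
import numpy as np

def prox_nuclear(V, t):
    """Proximal operator of t*||.||_*: soft-thresholds the singular values,
    shrinking every one of them towards zero by t."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

rng = np.random.default_rng(2)
V = rng.standard_normal((5, 5))
Z = prox_nuclear(V, 1.0)
s_in = np.linalg.svd(V, compute_uv=False)
s_out = np.linalg.svd(Z, compute_uv=False)
# every singular value is reduced by exactly min(s_i, 1.0): this uniform
# shrinkage of the large singular values is the bias discussed above
assert np.allclose(s_out, np.maximum(s_in - 1.0, 0.0))
```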
Several different methods have been proposed in an attempt to remedy the issue of the shrinking of large singular values. Among them is the weighted nuclear norm [39], which has been generalized to so-called generalized singular value thresholding in [40].

B. Quadratic Envelope
The second approach to achieving a low-rank solution that will be used in this paper is a function constructed in an attempt to more closely emulate the ℓ₀-pseudo-norm: the so-called quadratic envelope of the ℓ₀-norm. The quadratic envelope of a function consists of pointwise quadratic approximations that nowhere overestimate the original function. For a function g : R^n → R, the quadratic envelope, at the point u ∈ R^n, is defined as

Q_γ(g)(u) = sup { α − γ/2 ‖u − v‖² : α ∈ R, v ∈ R^n, such that α − γ/2 ‖w − v‖² ≤ g(w) for all w ∈ R^n }, (17)

i.e., the supremum of all functions of the form α − γ/2 ‖· − v‖² that nowhere overestimate the original function g(·). The parameter γ > 0 controls how close this envelope will be to the original function. A smaller value of γ means a larger degree of smoothing of the function, whereas Q_γ(g) converges point-wise to g as γ → ∞ [27].
Firstly, it is readily verified that

Q_γ(λ_QE ‖u‖₀)(u) = Σ_{j=1}^n Q_γ(λ_QE |u_j|₀)(u_j). (18)

In other words, the problem decouples in all dimensions, and we can, without loss of generality, look at a one-dimensional version of the aforementioned function, i.e., u ≡ u. This has been considered before, in works like [41] and [42], with support of the theory in [43] and [44], but we restate the results here. Further, since this is to be applied to the singular values, we will, to simplify the exposition, only consider u ≥ 0.

Proposition: The quadratic envelope of λ_QE |u|₀ is given by

Q_γ(λ_QE |u|₀)(u) = γ(μu − u²/2) for 0 ≤ u ≤ μ, and λ_QE for u ≥ μ, (19)

or, equivalently,

Q_γ(λ_QE |u|₀)(u) = λ_QE − γ/2 (max(μ − u, 0))², (20)

where μ = √(2λ_QE/γ), and λ_QE ≥ 0 is analogous to λ_NN in (16).

Proof: The proof is based on [44], but for the convenience of the reader we restate it here. See Appendix A.
The major benefit of the quadratic envelope, as compared to the nuclear norm, manifests itself in (19). For values of u larger than μ, the quadratic envelope is constant. The differences between the nuclear norm and the quadratic envelope in approximating |u|₀ are further illustrated in Fig. 1. The effect of the parameter γ is noticeable, in that a larger γ yields an envelope closer to |u|₀. Further, it can be seen that the main effect of increasing λ_QE is raising the level at which the function remains constant.
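The scalar penalty is straightforward to evaluate; the sketch below (assuming the envelope form with μ = √(2λ_QE/γ)) numerically checks its two defining properties: it never overestimates λ_QE|u|₀, and, unlike an ℓ₁-type penalty which keeps growing, it is constant beyond μ:

```python
import numpy as np

def q_envelope(u, lam, gamma):
    """Scalar quadratic envelope of lam*|u|_0 for u >= 0 (elementwise),
    assuming mu = sqrt(2*lam/gamma): quadratic below mu, constant above."""
    mu = np.sqrt(2.0 * lam / gamma)
    return np.where(u >= mu, lam, gamma * (mu * u - 0.5 * u ** 2))

lam, gamma = 1.0, 2.0
u = np.linspace(0.0, 3.0, 301)
qe = q_envelope(u, lam, gamma)
l0 = lam * (u > 0)                 # lam * |u|_0
# the envelope never overestimates lam*|u|_0 ...
assert np.all(qe <= l0 + 1e-12)
# ... and is constant (= lam) for large u, hence no shrinkage of large values
assert np.isclose(qe[-1], lam)
```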
A particular case of the penalty (20), when γ = 2, has been considered in [29]. Adding (20), applied to the singular values of the matricized RIR, to (5) yields the final form of the second optimization problem we are aiming to solve in Section IV,

minimize_h 1/2 ‖y − A(h)‖₂² + Q(R_qq(h)), (21)

where we, for ease of notation, for a matrix X, define

Q(X) = Σ_j Q_γ(λ_QE |·|₀)(σ_j(X)). (22)

As previously mentioned, γ is a parameter that controls the fidelity of Q_γ(g) to the original function g. If γ ≤ σ_min(A*A), the smallest eigenvalue of A*A, then the objective function in (21) is convex. This is of course a very desirable property, as such problems are generally considered comparatively easy to solve. However, for γ > ‖A‖² we have the following upside: if ĥ is a (strict) local minimizer of (21), then ĥ is also a (strict) local minimizer of (15). In addition to this, their global minima coincide [27].

A. Preliminaries
As previously indicated, the optimization problems will be solved using the proximal gradient method, also known as forward-backward splitting (FBS), which can be seen in Algorithm 1. It has recently become very popular due to its ability to handle non-smooth functions. For more on the proximal gradient method, see e.g., [32], [45], [46]. The general idea is to alternate between taking a step in the negative direction of the gradient of the smooth data-fit term f, and minimizing the (possibly non-smooth) penalty function g, using the so-called proximal operator. For a function g : R^{q×p} → R, with step length parameter ρ > 0, it is defined as

prox_{ρg}(V) = argmin_Z ( g(Z) + 1/(2ρ) ‖Z − V‖² ). (23)

In Algorithm 1, the gradient step, u_k − ρ∇f(u_k), is to be interpreted as the forward step, and the proximal mapping, prox_{ρg}(·), as the backward step.
Here, L denotes the Lipschitz constant of ∇f. In order to guarantee that Algorithm 1 converges, a few additional technical assumptions have to be made about f and g, respectively. These are all met for the problems considered in this paper; more details can be found in [47].
In order to facilitate the exposition we will introduce a couple of linear operators. The operator H : R^n → R^n denotes the reversal of the order of the elements in a vector, i.e., (H(v))_j = v_{n−j+1}, for j = 1, ..., n. The operator F : R^n → C^n denotes the discrete Fourier transform (DFT), and F^{−1} : C^n → R^n the inverse DFT. The operator P_k : R^n → R^{n+k} denotes padding of a vector with k zeros at the end. Finally, the operator C^n_k : R^m → R^{n−k+1}, where n ≥ k and m ≥ n − k + 1, is the linear operator that picks out the kth through nth elements of a vector, i.e., C^n_k(y) = (y_k, y_{k+1}, ..., y_n)^T.

B. Proposed Algorithm
The previously introduced operators A and A* can now be expressed in terms of these operators, as

A(h) = C^{n_x}_{n_h} ( F^{−1} ( F(x) ⊙ F(P_{n_x−n_h}(h)) ) )

and

A*(y) = C^{n_x}_{n_x−n_h+1} ( F^{−1} ( F(H(x)) ⊙ F(P_{n_h−1}(y)) ) ),

respectively.
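A fast-convolution implementation of A and its adjoint can be sketched and checked against the adjoint identity ⟨A(h), y⟩ = ⟨h, A*(y)⟩; the construction below is ours (0-based Python indexing, so the crops shift by one relative to the 1-based operator notation):

```python
import numpy as np

def A(h, x):
    """Forward operator: 'valid' part of x * h via a length-n_x circular
    convolution; the first n_h - 1 samples, affected by wrap-around, are
    dropped by the final crop."""
    n_x, n_h = len(x), len(h)
    h_pad = np.concatenate([h, np.zeros(n_x - n_h)])        # zero-padding
    z = np.fft.ifft(np.fft.fft(x) * np.fft.fft(h_pad)).real
    return z[n_h - 1:]                                      # crop to valid part

def A_adj(y, x, n_h):
    """Adjoint operator: correlation with x, computed as a circular
    convolution with the reversed input signal."""
    n_x = len(x)
    y_pad = np.concatenate([y, np.zeros(n_x - len(y))])     # zero-padding
    z = np.fft.ifft(np.fft.fft(x[::-1]) * np.fft.fft(y_pad)).real
    return z[n_x - n_h:]                                    # crop to length n_h

rng = np.random.default_rng(3)
n_x, n_h = 128, 32
x = rng.standard_normal(n_x)
h = rng.standard_normal(n_h)
y = rng.standard_normal(n_x - n_h + 1)
# adjoint identity <A(h), y> = <h, A*(y)>
assert np.isclose(A(h, x) @ y, h @ A_adj(y, x, n_h))
# forward operator matches the direct 'valid' convolution
assert np.allclose(A(h, x), np.convolve(x, h, mode="valid"))
```

This is also consistent with the operation count discussed later: one application of A or A* costs two FFTs (including the inverse transform), since F(x) and F(H(x)) can be precomputed.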
The first step in iteratively solving (16) and (21), respectively, is the gradient step with respect to the data-fit term f in the cost function. The gradient of (5) is given by

∇f(h) = A*(A(h) − y).

For the backward step, there is a seemingly complicating fact in that it is to be taken with respect to a matrix and its singular values. This, however, turns out not to be a problem, as the following proposition shows.

Algorithm 2: Low-Rank RIR Estimation Algorithm.
Proposition: The proximal operator of Q(H) is given by

prox_{ρQ(·)}(H) = U D( prox_{ρ Q_γ(λ_QE ‖·‖₀)}(σ(H)) ) V^T,

and the proximal operator of λ_NN ‖σ(H)‖₁ is given by

prox_{ρ λ_NN ‖·‖_*}(H) = U D( prox_{ρ λ_NN ‖·‖₁}(σ(H)) ) V^T,

where H = U D(σ(H)) V^T is a singular value decomposition of H, and D : R^q → R^{q×q} is the operation of creating a diagonal q × q matrix, with the argument of the operator as the diagonal. The explicit expressions for prox_{ρQ(·)}(H) and prox_{ρλ_NN‖·‖_*}(H) are given in Appendix B.
Proof: See Appendix B. It may be noted that this holds due to the orthogonal invariance of Q(H) and λ_NN ‖σ(H)‖₁, respectively [46].
After either of the proximal operators has acted on the vector of singular values, all that remains is to reassemble the matrix with the adjusted singular values, and then vectorize it, in order to get the current iterate of the estimated RIR. Finally, a check is made to see if the algorithm is making enough progress, i.e., if the new iterate is sufficiently different from the previous one, or if the iterative procedure should be terminated. These steps are summarized in Algorithm 2. Solving (16) or (21) differs only in the proximal operator used. This is reflected in step 3 of Algorithm 2, in that 3a) corresponds to solving (16) and 3b) corresponds to solving (21).
The parameter ρ in (23) controls the balance between minimizing the penalty function and staying close to the current point V, and serves as the step size for the proximal gradient algorithm. Small values of ρ will yield a Z close to V, whereas larger values of ρ will result in Z being closer to the minimum of g(Z). In order to ensure that the sequence generated by Algorithm 1 is bounded, we must choose ρ ∈ (0, 2/L) [48].
We have that

‖∇f(w) − ∇f(z)‖ ≤ ‖A*A‖ ‖w − z‖,

i.e., L ≤ ‖A*A‖. For a linear operator, the norm of the operator is the same as that of the corresponding matrix, i.e., ‖A*A‖ = ‖X^T X‖, where X ∈ R^{n_y×n_h} is the Toeplitz matrix such that A(h) = Xh. Computing ‖X^T X‖ is, however, computationally demanding and something that should be avoided if possible. In this instance, it is possible to use a bound on ‖X‖. It is shown in [49] that ‖X‖ ≤ 2M, where M denotes the essential supremum of the absolute value of the Fourier series of x. Taken together with the well-established fact that, for any two linear operators A and B, ‖AB‖ ≤ ‖A‖ ‖B‖, we have that L ≤ 4M². In order to ensure that the strict inequality is upheld, we choose ρ strictly smaller than 2/(4M²). Finally, three more parameters need to be set by the user: a maximum number of iterations, maxIter, that the user is willing to run; a tolerance level, tol, to determine when the algorithm is no longer making sufficient progress and should be terminated; and a small number δ to avoid division by zero in the normalization of the norm of the difference between the new and the old iterate. Preliminary simulations showed that maxIter = 10⁴, tol = 10⁻⁴, and δ = 10⁻⁶ were suitable choices, and these values will be used throughout this paper.
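Putting the pieces together, a deliberately simplified sketch of the resulting iteration is given below: a dense Toeplitz matrix instead of the FFT-based operators, the nuclear-norm prox only, and a tiny hypothetical test signal. It is a stand-in for the full algorithm, for clarity rather than speed:

```python
import numpy as np

def estimate_rir_lowrank(x, y, q, lam, max_iter=500, tol=1e-6):
    """Simplified proximal-gradient (FBS) sketch of the low-rank estimator
    with the nuclear-norm prox (a dense-matrix stand-in, not the paper's
    FFT-based implementation)."""
    n_h, n_y = q * q, len(y)
    # dense Toeplitz matrix of the 'valid' convolution: X @ h = x * h
    X = np.array([x[n:n + n_h][::-1] for n in range(n_y)])
    rho = 1.0 / np.linalg.norm(X, 2) ** 2        # step size in (0, 2/L)
    h = np.zeros(n_h)
    for _ in range(max_iter):
        V = (h - rho * (X.T @ (X @ h - y))).reshape(q, q)   # forward step
        U, s, Vt = np.linalg.svd(V, full_matrices=False)    # backward step
        h_new = (U @ np.diag(np.maximum(s - rho * lam, 0.0)) @ Vt).ravel()
        if np.linalg.norm(h_new - h) <= tol * (np.linalg.norm(h) + 1e-6):
            h = h_new
            break
        h = h_new
    return h

rng = np.random.default_rng(4)
q = 6                                    # hypothetical tiny example, n_h = 36
n = np.arange(q * q)
h_true = np.exp(-0.05 * n) * np.sin(0.9 * n + 0.3)  # one decaying sinusoid
x = rng.standard_normal(300)
y = np.convolve(x, h_true, mode="valid") + 0.05 * rng.standard_normal(300 - q * q + 1)
h_hat = estimate_rir_lowrank(x, y, q, lam=0.5)
```

The step size here is computed from the exact spectral norm for simplicity; in practice the bound L ≤ 4M² discussed above avoids forming X at all.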

C. Computational Complexity
In this Section we study the computational complexity of Algorithm 2, as a function of the length of the RIR, n_h. The difference between using the nuclear norm or the quadratic envelope penalty is negligible, in terms of complexity per iteration. Asymptotically, as n_h → ∞, the most demanding step of the algorithm is step 3, in which we perform a singular value decomposition of a q × q matrix, with q = √n_h, at a cost of O(n_h^{3/2}) operations. In terms of wall time, for problems of the size studied in this paper, step 1 of the algorithm is the most demanding. Although the fast Fourier transform (FFT) requires only O(n_h log(n_h)) multiplications, the fact that we in step 1 have to carry out 4 of these transforms (including the IFFTs) in each iteration makes this the most time-consuming part of the algorithm.

V. NUMERICAL RESULTS
In order to visualize the dependence of the proposed and benchmark algorithms' performance on room-related parameters, we will first provide numerical results using synthetically generated RIRs. Then, to demonstrate the practical applicability of the proposed algorithm, we proceed by presenting results of simulations using real-life RIRs. The accuracy of an RIR estimate, ĥ, is measured by the normalized misalignment, defined as

20 log₁₀ ( ‖ĥ − h‖₂ / ‖h‖₂ ),

where h is the RIR we are aiming to estimate. In Sections V-I and V-J we will use slightly modified evaluation measures, which will be further explained there. Throughout this Section we will compare the performance of the two proposed methods to that of two benchmark methods. Firstly, ordinary least squares, i.e., without regularization, corresponding to solving (5). Secondly, least squares with Tikhonov regularization, i.e., solving

minimize_h 1/2 ‖y − A(h)‖₂² + λ_T ‖h‖₂²,

where λ_T controls the degree of regularization.
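The evaluation measure and the Tikhonov benchmark can be sketched as follows (the misalignment is written in dB, matching the figures; the closed form assumes the penalty λ_T ‖h‖₂²):

```python
import numpy as np

def normalized_misalignment_db(h_hat, h):
    """Normalized misalignment in dB: 20*log10(||h_hat - h||_2 / ||h||_2)."""
    return 20.0 * np.log10(np.linalg.norm(h_hat - h) / np.linalg.norm(h))

def tikhonov_ls(X, y, lam_t):
    """Closed-form benchmark: argmin_h 0.5*||y - X h||^2 + lam_t*||h||^2."""
    return np.linalg.solve(X.T @ X + 2.0 * lam_t * np.eye(X.shape[1]), X.T @ y)

# sanity check: a 10% relative error corresponds to -20 dB misalignment
h = np.ones(10)
h_hat = 1.1 * np.ones(10)
assert np.isclose(normalized_misalignment_db(h_hat, h), -20.0)
# with lam_t = 0, Tikhonov reduces to ordinary least squares
X = np.array([[2.0, 0.0], [0.0, 4.0]])
y = np.array([2.0, 8.0])
assert np.allclose(tikhonov_ls(X, y, 0.0), [1.0, 2.0])
```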

A. Impact of T 60
First we present simulations performed on synthetically generated RIRs, in order to show how the algorithms under comparison perform as a function of the reverberation time, T_60, of the room. The T_60 is defined as the time required for the sound level to drop 60 dB after switching off a stationary source. The RIRs are generated by letting each tap be drawn independently from a normal distribution, with an exponentially decaying envelope corresponding to the given T_60. This is a stochastic reverberation model [50], and can be attributed to [51] and [52]. In this simulation example, for each value of T_60, taken from a grid of 9 values in the range [33, 300] ms, 10 synthetic RIRs were generated. The system was driven by white Gaussian noise. Finally, independent white Gaussian measurement noise was added, with signal-to-noise ratio SNR_dB = 10, defined as

SNR_dB = 10 log₁₀ (P_S / P_N),

where P_S and P_N denote the power of the output signal without the noise, and the power of the noise, respectively. The RIR length, n_h, was set to correspond to the reverberation time, and n_y was set to 1.2 n_h, in order to have comparable overdetermined systems for the various values of T_60. The hyperparameters λ_T, λ_NN, λ_QE, and γ were tuned for each individual setting, by finding the optimal values using another set of randomly generated RIRs. The results are shown in Fig. 2. There it can be seen that the performance is close to constant as a function of T_60 for all the analyzed methods, and that the low-rank models outperform ordinary least squares, as well as Tikhonov-regularized least squares, with a slight preference for the quadratic envelope.
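A minimal sketch of this stochastic reverberation model (the sampling rate fs is an assumption, as it is not specified in this passage):

```python
import numpy as np

def synthetic_rir(t60, fs=8000, rng=None):
    """Stochastic reverberation model sketch: i.i.d. Gaussian taps under an
    exponential envelope that drops 60 dB over t60 seconds. The sampling
    rate fs is an assumed value."""
    rng = rng or np.random.default_rng()
    n_h = int(t60 * fs)                       # RIR length matches T60
    n = np.arange(n_h)
    # 20*log10(exp(-beta * n_h)) = -60 dB  =>  beta = 3*ln(10) / (t60*fs)
    beta = 3.0 * np.log(10.0) / (t60 * fs)
    return np.exp(-beta * n) * rng.standard_normal(n_h)

h = synthetic_rir(0.1, fs=8000, rng=np.random.default_rng(5))
# envelope at the final tap is approximately -60 dB relative to the first
env_final_db = 20.0 * np.log10(np.exp(-3.0 * np.log(10.0) / 800 * 799))
```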

B. SMARD
In order to demonstrate the proposed methods' applicability to actual measured RIRs, we apply them to the single- and multichannel audio recordings database (SMARD) [53]. These recordings are sampled at 48 kHz, for 12 seconds, yielding impulse responses of 576·10³ taps. These are recorded at various source and receiver positions, and with varying equipment, in a room of size 7.34 m × 8.09 m × 2.87 m, with a reverberation time of approximately 0.15 s. In total, the dataset contains 1008 RIRs. Two things should be noted regarding the numerical simulations performed here. Firstly, the RIRs are truncated at some discrete-time index N ≪ 576·10³. Secondly, the truncated RIRs are set to start at the, in absolute value, largest value of the original RIR recording, i.e., at the point where the direct component hits the microphone.

C. Hyperparameter Tuning
As with any algorithm requiring hyperparameters, the parameters of the proposed algorithm need to be rigorously set in order for the algorithm to be effective. In this Section we expand on the tuning of the hyperparameters for the compared methods. The dependence on the variables of the problem largely decouples between the hyperparameters, with the optimal choice of γ being impacted by the input signal x, and λ_QE, λ_NN, and λ_T depending on the SNR, much like in sparse estimation. Using a random subset of 20 of the RIRs of SMARD, we here vary the SNR_dB for white Gaussian measurement noise from 0 to 20 in steps of 2, and find the optimal choice of λ_QE, λ_NN, and λ_T, respectively, for each value of the SNR. We then perform linear regression to find models for λ_QE, λ_NN, and λ_T, as a function of the SNR. This is done with n_h = 3600, n_y = 4320, γ = 10², and with white Gaussian noise of unit power as input. In accordance with the results of this, we let λ_QE = 10^{0.4 − 0.14·SNR_dB}, λ_NN = 10^{−3.7 − 0.08·SNR_dB}, and λ_T = 10^{3.6 − 0.1·SNR_dB}. For λ_QE and λ_NN, this is illustrated in Fig. 3. Different choices of n_h and n_y would yield different values for the model parameters; hence, simulations from hereon throughout this paper will be done with n_h = 3600 and n_y = 4320, respectively. The performance of the proposed algorithms, as a function of SNR, will be discussed in Section V-D.
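The fitted log-linear models above can be written directly as functions of the SNR:

```python
import numpy as np

# Log-linear hyperparameter models from the regression above: each
# regularization parameter is 10**(a + b * SNR_dB).
def lam_qe(snr_db):
    return 10.0 ** (0.4 - 0.14 * snr_db)

def lam_nn(snr_db):
    return 10.0 ** (-3.7 - 0.08 * snr_db)

def lam_t(snr_db):
    return 10.0 ** (3.6 - 0.1 * snr_db)

# e.g., at 10 dB SNR the three parameters differ by orders of magnitude
assert np.isclose(lam_qe(10.0), 10.0 ** -1.0)
assert np.isclose(lam_nn(10.0), 10.0 ** -4.5)
assert np.isclose(lam_t(10.0), 10.0 ** 2.6)
```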
As for γ, the main dependency is on the input x, but the dependency is more involved. The optimal choice is impacted by the power of the input signal, as well as the length of the RIR, n_h (and thereby the length of the input signal, n_x), and finally also the frequency content of the input signal. It is therefore very difficult to find a model that covers all possible choices of these variables. For a vast majority of the experiments performed, preliminary simulations showed that γ = 10² was a good enough choice. In Sections V-G and V-I, where the frequency content of the input signal is varied, each scenario requires its own optimized γ, which can be found in Table I. In Section V-H, where snippets of speech are used as input, γ = 10⁻² proved to be a good choice.

D. SNR
The benefit of low-rank modeling of the RIR becomes particularly evident when the signal-to-noise ratio is low. With white Gaussian noise as input and values of λ_QE, λ_NN, λ_T, and γ as described in Section V-C, and using all of the 1008 RIRs from SMARD, we showcase the adequacy of the proposed modeling framework. The results can be seen in Fig. 4. The low-rank enforcing methods are clearly preferable to ordinary least squares, as well as to Tikhonov regularized least squares. As expected, the difference compared to ordinary least squares is particularly apparent when the SNR is low.
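The evaluation metric used here can be sketched as below, assuming the customary ratio-of-norms definition of normalized misalignment in dB; the metric is defined earlier in the paper, so this is a plausible reading rather than a quotation:

```python
import numpy as np

def normalized_misalignment_db(h_hat, h):
    """Normalized misalignment in dB between an estimate h_hat and the
    true RIR h, assuming the standard form 20*log10(||h_hat - h|| / ||h||)."""
    h_hat = np.asarray(h_hat, dtype=float)
    h = np.asarray(h, dtype=float)
    return 20.0 * np.log10(np.linalg.norm(h_hat - h) / np.linalg.norm(h))
```

A perfect estimate drives this quantity toward minus infinity; a 10% relative error corresponds to −20 dB.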

E. Convergence Analysis
For the least squares methods, ordinary and Tikhonov regularized, we have closed-form expressions, and no iterative procedure is needed. For the nuclear norm and the quadratic envelope, however, we here showcase the difference in speed of convergence, using part of the simulations done in Section V-D. In Fig. 5, we see the averaged normalized misalignment for the two iterative methods, as a function of the iteration index, for the case where SNR_dB = 10. The difference in the average number of iterations needed for convergence between the quadratic envelope and the nuclear norm was, for this particular value of the SNR, negligible. We see in Fig. 5 that the normalized misalignment, for a given number of iterations, is smaller for the quadratic envelope than for the nuclear norm. Preliminary simulations showed similar behaviour for other SNR values, both in terms of the relation between the number of iterations needed for convergence of the respective methods, and in terms of the decay of the normalized misalignment as a function of the iteration index.

F. Hyperparameter Sensitivity
In Section V-D we saw that the quadratic envelope and nuclear norm regularizations performed similarly as a function of the SNR, when oracle knowledge of the SNR was assumed. Using all 1008 RIRs of SMARD, with SNR_dB = 10 and white Gaussian noise as input, we here consider what happens if the algorithms are provided over- or underestimated values of their respective regularization parameters. According to the formulas provided in Section V-C, we set λ_QE = 10^(0.4 − 0.14·10), λ_NN = 10^(−3.7 − 0.08·10), and λ_T = 10^(3.6 − 0.1·10), respectively. We then look at what happens if these parameters are multiplied by a factor of 10^(−1) or 10^(1), so as to simulate an underestimation and an overestimation, respectively. The averaged results are found in Fig. 6. There it can be seen that the quadratic envelope clearly loses the least accuracy for the perturbed parameter values. This suggests that the quadratic envelope would be particularly useful in scenarios where the SNR is not known exactly. For reference, the average misalignment using ordinary least squares is on the order of −5 dB, indicating that when knowledge of the SNR is poor and the regularization parameter is overestimated, some types of regularization do more harm than good.

G. Frequency Content of Input Signal
In this Section, we investigate how the proposed method performs as a function of the inverse frequency power of the input, i.e., colored noise with power spectral density 1/|f|^α. As previously mentioned, the hyperparameter values' dependency on the variables of the problem is complicated. Therefore, when displaying the aptitude of the proposed method as a function of α, the hyperparameters were tuned on 20 randomly selected RIRs. These parameter values are shown in Table I. Then, with these hyperparameter values, simulations were performed on all of the RIRs of SMARD, with SNR_dB = 10. The results are displayed in Fig. 7. There it can be seen that the low-rank estimation methods, nuclear norm regularization as well as the quadratic envelope of the ℓ_0-norm, are particularly successful, in comparison to ordinary least squares, when the noise is not white. As for the performance of Tikhonov least squares, we note that it is fairly constant as a function of α. Thus, the difference in performance between the low-rank methods and Tikhonov least squares is largest for α close to zero.
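A minimal way to generate such an input is spectral shaping of white noise in the frequency domain; the paper does not specify its generator, so the sketch below is one plausible implementation:

```python
import numpy as np

def colored_noise(n, alpha, rng=None):
    """White Gaussian noise shaped so its PSD is proportional to 1/|f|^alpha.

    alpha = 0 gives white noise, alpha > 0 emphasizes low frequencies,
    alpha < 0 emphasizes high frequencies. One possible generator; the
    paper does not specify its own.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = rng.standard_normal(n)
    W = np.fft.rfft(w)
    f = np.fft.rfftfreq(n)
    f[0] = f[1]                    # avoid division by zero at DC
    W *= f ** (-alpha / 2.0)       # amplitude shaping -> PSD ~ 1/f^alpha
    x = np.fft.irfft(W, n)
    return x / np.std(x)           # normalize to unit power
```

The output is normalized to unit power so that a target SNR can be set independently of α.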

H. Speech Signal Input
As indicated in Section V-G, the proposed method provides the most improvement to the RIR estimation when the spectral excitation by the input signal is poor. We study this further by letting the driving signal be a speech signal. Yet again, we set all the parameter values in accordance with the discussion in Section V-C (note that here γ = 10^(−2)), and use 100 randomly chosen RIRs from SMARD, each driven by 10 different snippets of anechoic male and female speech, chosen at random from the TSP Speech Database [54]. In order to make sure that the speech snippets contained enough power, they were all chosen with the criterion that x ≥ 2. The results can be seen in Fig. 8. Here, we have opted not to plot the normalized misalignment of the ordinary least squares estimate; it was consistently at least two orders of magnitude larger than that of the other methods, and its inclusion in Fig. 8 would obscure the differences between the other methods. It can be seen that the low-rank methods outperform Tikhonov regularized least squares. This further strengthens the hypothesis that low-rank modeling is most useful when the input spectral excitation is poor.

I. Input-Output Relations
In order to further investigate the merit of the low-rank estimation, we will in this Section, as previously indicated, use a slightly modified evaluation measure. Instead of looking at the distance between the estimated RIR and the measured RIR, we here consider the metric in (33), where x is a previously unseen signal with the same properties as the input signal generating the output y from which ĥ is estimated, and the expectation is taken with respect to x. We will refer to this as the normalized output misalignment. Under these conditions, we replicate the experiment in Section V-G, using the same parameter values, i.e., the ones found in Table I. Here, a subset of 100 randomly chosen RIRs from SMARD was used, and (33) was estimated using 50 different instances of x, drawn from the same distribution as the input signal. The results can be seen in Fig. 9. It can be noticed that, for all but ordinary least squares, the normalized output misalignment decreases as a function of α. One possible explanation for this is the discrepancy in spectral excitation: for α = −1, poor estimation at a certain frequency is more likely to be masked by poor spectral excitation of the input signal at that particular frequency. The benefit of the low-rank regularization is, however, evident across the board.
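The metric in (33) can be estimated by Monte Carlo over unseen inputs. Since (33) itself is not reproduced here, the normalization below (an energy ratio of output errors, in dB) is our assumption of its form:

```python
import numpy as np

def output_misalignment_db(h_hat, h, draws):
    """Monte Carlo estimate of a normalized output misalignment,
    10*log10( E_x ||x*h_hat - x*h||^2 / E_x ||x*h||^2 ), where * denotes
    convolution. The exact normalization of (33) is assumed, not quoted.
    h_hat and h are assumed to have equal length."""
    num = den = 0.0
    for x in draws:                          # unseen inputs, same distribution
        e = np.convolve(x, h_hat) - np.convolve(x, h)
        y = np.convolve(x, h)
        num += np.sum(e ** 2)
        den += np.sum(y ** 2)
    return 10.0 * np.log10(num / den)
```

Averaging over many draws of x approximates the expectation with respect to the input distribution.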

J. Cramér-Rao Lower Bound
In this Section we study the mean squared error (MSE) of the suggested estimators in relation to the theoretical bound given by the Cramér-Rao lower bound (CRLB), the smallest variance that can be achieved by an unbiased estimator. Let Σ_h denote the error covariance matrix of an unbiased estimator ĥ. We then have that MSE(ĥ) = tr(Σ_h), and we define MSE_dB(ĥ) = 20 log_10 MSE(ĥ).
The CRLB of the variance of low-rank estimation algorithms has been explored by Tang et al. in [55]. For the convenience of the reader, we here reiterate the principal content of their results. Let Σ denote the measurement noise covariance matrix, let U_j and V_j denote the jth columns of U and V, respectively, where U and V come from the SVD H = USV^T, let r be the rank, and recall that X is the Toeplitz matrix from (3). Further, A ⪰ B means that A − B is positive semidefinite. Taking the trace of both sides of the inequality in (38), and using the cyclic property of the trace, one arrives at (40); i.e., the right-hand side of (40) is the CRLB. As previously touched upon, an RIR from SMARD will, when matricized, not be low-rank in a strict sense. In order to avoid complicated error bounds depending on the misspecification of the model, we will therefore, in this Section, make low-rank approximations of the matricized RIRs, so as to be able to study the properties of the estimation in relation to the CRLB. Hence, h will in this Section be a rank-20 approximation of the recorded and truncated RIR; this is a meritable approximation, as per the discussion in [21]. A rigorous analysis of performance bounds for a not strictly low-rank RIR that has (purposefully) been misspecified as low-rank is beyond the scope of this paper and left for future work. Comparison is made with two different theoretical bounds: for the first, the low-rank property is assumed unknown; for the second, it is known. Yet again, the parameter values from Section V-C are used. Experiments were performed using 50 randomly chosen RIRs from SMARD, running 20 Monte Carlo simulations for each RIR. The results, averaged over the Monte Carlo simulations and the different RIRs, can be seen in Fig. 10. As expected, the first bound is greater than the second.
The low-rank methods consistently outperform the first bound, which is to be expected, since they use information not available to that theoretical bound. In relation to the second bound, the low-rank methods outperform it for lower values of the SNR. This is possible since the estimators are biased, and it also appeals to intuition: at low SNR, the improved conditioning brought by the regularization can reduce the variance enough to outweigh the added bias, something that is generally not possible as the SNR increases.

VI. CONCLUSION
In this paper we have presented a method for exploiting low-rank, or close to low-rank, properties of RIRs. The low-rank penalty serves as a regularizer and prevents an overfitted solution, similar to the idea of sparse estimation. The low-rank penalty is enforced on a matricization of the coefficient vector, and consists of either the nuclear norm or the quadratic envelope of the ℓ_0-pseudo-norm of the vector of singular values of the matrix. The results of the proposed methods are compared to those of ordinary least squares, as well as Tikhonov regularized least squares. It is shown that the low-rank methods work very well in general, but excel particularly when either the SNR is low or the input spectral excitation is poor, which can stem from the input signal being either colored noise or a speech signal. The difference between the two investigated low-rank methods, i.e., nuclear norm and quadratic envelope, is comparatively small. The main benefits of the nuclear norm are its convexity and that only one hyperparameter has to be tuned. The quadratic envelope, on the other hand, displays superior robustness to a suboptimally chosen regularization parameter, in the case where oracle knowledge of the SNR cannot be assumed.
Future work will focus on investigating the properties of the error introduced by the low-rank estimation. A structure to this error could indicate possible improvements, as well as potential applications, for the methods proposed here. The interplay between the bias introduced by a misspecification, and the possible reduced estimation variance, is another research area we aim to explore.
APPENDIX A DERIVATION OF THE QUADRATIC ENVELOPE
When trying to find Q_γ(λ_QE‖·‖_0)(u), we notice two things. Firstly, Q_γ(λ_QE‖·‖_0)(0) = 0, and secondly, Q_γ(λ_QE‖·‖_0)(u) = λ_QE for all u such that |u| ≥ C, for some constant C > 0. The question is then what the value of C is, and what values Q_γ(λ_QE‖·‖_0)(u) takes on for u ∈ (−C, C). The limiting factor for the envelope is that λ_QE‖0‖_0 = 0, i.e., the negative quadratic function must pass through, or below, the origin, which yields a constraint on α. Looking for the supremum, we want to choose the largest possible α, hence α = (γ/2)v². Inserting this into (17) and simplifying, and noting that the result is valid at z = 0, we obtain (42). If the peak of the parabola is at or below λ_QE then, naturally, all other points on the parabola are as well. Therefore, we insert z = v and simplify, which yields the constraint |v| ≤ μ, where μ = √(2λ_QE/γ). The expression we aim to maximize, γuv − (γ/2)u², is increasing in v if u > 0, and in −v if u < 0. In other words, we want to choose |v| as large as possible, without ever overestimating g(z), yielding v = sign(u)μ. Inserting this value of v into Q_γ(λ_QE‖·‖_0)(u) and setting the resulting expression equal to λ_QE shows that the constant value is attained at u = ±μ; thus, the sought-after point is C = μ. Taken together, we have

Q_γ(λ_QE‖·‖_0)(u) = √(2λ_QE γ)|u| − (γ/2)u², for |u| ≤ μ, and Q_γ(λ_QE‖·‖_0)(u) = λ_QE, for |u| > μ,

or, equivalently, Q_γ(λ_QE‖·‖_0)(u) = λ_QE − (γ/2) max(μ − |u|, 0)².
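Numerically, this closed form, with μ = √(2λ_QE/γ), can be sketched as follows (elementwise over u; function name ours):

```python
import numpy as np

def q_env_l0(u, lam, gamma):
    """Quadratic envelope of lam*||.||_0, elementwise:
    sqrt(2*lam*gamma)*|u| - (gamma/2)*u^2 for |u| <= mu, and lam otherwise,
    with mu = sqrt(2*lam/gamma)."""
    u = np.abs(np.asarray(u, dtype=float))
    mu = np.sqrt(2.0 * lam / gamma)
    inner = np.sqrt(2.0 * lam * gamma) * u - 0.5 * gamma * u ** 2
    return np.where(u <= mu, inner, lam)
```

At |u| = μ the two branches agree, both equaling λ_QE, so the envelope is continuous, as required.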

APPENDIX B PROXIMAL OPERATORS
The regularization terms of (16) and (21) are both orthogonally invariant [46]. A function g : R^(q×q) → R is said to be orthogonally invariant if g(USV^T) = g(S) for all S ∈ R^(q×q) and all orthogonal matrices U, V ∈ R^(q×q). More specifically, this means that g(H) = g(D(σ(H))), where D(·) forms a diagonal matrix from a vector and σ(H) is the vector of singular values of H.
A function g is orthogonally invariant if and only if g = g̃ ∘ σ, where g̃ : R^q → R is absolutely symmetric; more details can be found in [46]. This implies that, for the proximal operator of an orthogonally invariant function g, it holds that prox_ρg(H) = U D(prox_ρg̃(σ(H))) V^T, i.e., the proximal step is performed on the vector of singular values [46]. For the nuclear norm regularizer, the corresponding proximal operator prox_(λ_NN‖·‖_1)(σ(H)) in (28) is given by element-wise application of the so-called soft-thresholding operator S_λ_NN : R^q → R^q [56], i.e., [S_λ_NN(σ)]_i = max(σ_i − λ_NN, 0). For the penalty term in (21), we make use of the fact that the function decouples, and we can therefore, without loss of generality, consider the one-dimensional case. Further, since this is to be used for singular values, we consider only u ≥ 0 as argument and z ≥ 0 as function value. Finally, in order for the proximal operator to be single-valued, we must have γρ < 1, since this guarantees that the curvature is strictly positive everywhere.
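For the nuclear norm, the resulting SVD soft-thresholding proximal step can be sketched as:

```python
import numpy as np

def prox_nuclear(H, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the
    singular values and rebuild the matrix (cf. [56])."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt
```

Singular values below the threshold τ are set to zero, which is what promotes the low-rank structure of the iterate.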
The proximal operator (23) returns the argument that minimizes the sum of two terms: the original function Q_γ(λ_QE‖·‖_0)(z) and the proximity term (1/(2ρ))(u − z)². What is sought is the argument z that minimizes this sum, for a given u. First, it can be concluded that for u = 0, where the objective is not differentiable, the optimal choice is z = 0; this renders a cost of 0, so no choice can be better. Next, the question is whether there is a u > 0 for which choosing z > 0 yields a lower cost than z = 0 and, if so, for which u. For z > 0 the objective is differentiable, and setting its derivative equal to zero yields the optimal choice z_opt in (50), where μ = √(2λ_QE/γ). Comparing the costs and determining when it is better to choose z > 0, as opposed to z = 0, and inserting z_opt from (50), yields that this holds for u > μγρ. Taken together,

prox_(ρQ_γ(λ_QE‖·‖_0))(u) = 0 for 0 ≤ u ≤ μγρ, (u − μγρ)/(1 − γρ) for μγρ < u < μ, and u for u ≥ μ.

To the best of the authors' knowledge, this explicit expression has not been published before.
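The piecewise expression above, with μ = √(2λ_QE/γ) and the requirement γρ < 1, can be sketched as follows (applied elementwise to the nonnegative singular values):

```python
import numpy as np

def prox_q_env_l0(u, lam, gamma, rho):
    """Prox of rho * Q_gamma(lam*||.||_0), elementwise for u >= 0.
    Requires gamma*rho < 1 so the objective is strictly convex and the
    operator is single-valued."""
    assert gamma * rho < 1.0
    u = np.asarray(u, dtype=float)
    mu = np.sqrt(2.0 * lam / gamma)
    shrunk = (u - rho * gamma * mu) / (1.0 - rho * gamma)
    return np.where(u <= rho * gamma * mu, 0.0,
                    np.where(u < mu, shrunk, u))
```

Unlike soft-thresholding, values above μ pass through unchanged, so large singular values are left unbiased; only the small ones are shrunk or zeroed.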