A Flexible Framework for Expectation Maximization-Based MIMO System Identification for Time-Variant Linear Acoustic Systems

Quasi-continuous system identification of time-variant linear acoustic systems can be applied in various audio signal processing applications when numerous acoustic transfer functions must be measured. A prominent application is measuring head-related transfer functions. We treat the underlying multiple-input-multiple-output (MIMO) system identification problem in a state-space model as a joint estimation problem for states, representing impulse responses, and state-space model parameters using the expectation maximization (EM) algorithm. We address limitations of prior work by imposing different model structures, especially for dependencies within a (transformed) state vector. This results in block diagonal matrix structures, for which we derive M-step update rules. Making assumptions about this model structure and choosing a block size for a given application define the computational complexity. In examples, we found that applying this framework yields improvements of up to 10 dB in relative system distance in comparison to a conventional method.


I. INTRODUCTION
Identifying acoustic systems is required in many audio signal processing applications.Characterizing linear acoustic systems or transfer paths using measurements is an essential step before designing filters for applications, such as feed-forward active noise control (ANC) [1], personal sound zones [2], crosstalk cancellation [3], or when measuring head-related transfer functions (HRTFs) for binaural synthesis [4], [5].In these measurements, control over the playback signal, which is fed into the system, can be assumed.In some applications, such as acoustic echo control (AEC), it is required to identify acoustic systems at runtime, and the playback signal cannot be controlled; these applications are not the primary focus of this contribution.
Two main approaches for measurements of linear acoustic systems exist.On the one hand, in static measurements acoustic impulse responses (IRs) are measured for one specific configuration between transmitter(s) and receiver(s).This type of measurement is often conducted using exponential sweeps [6], [7], [8], [9].On the other hand, quasi-continuous measurements aim at identifying time-variant acoustic systems that slowly change over time.This offers the possibility to measure numerous spatial configurations between transmitter(s) and receiver(s) in a short time, and to simulate time-variant systems, as in [10], [11].
The quasi-continuous measurements, especially of HRTFs, have become popular due to a reduced measurement duration [5], [12], [13].In this application, one or more loudspeakers reproduce a predefined signal while microphones in the ear canals of a subject capture signals that are filtered with the desired acoustic system's responses.To obtain a spatially dense grid of head-related impulse responses (HRIRs) the subject is either rotated on a turntable, as in [14], [15], [16], [17], [18], [19], or can move freely [20], [21], [22], [23], [24].Adaptive filtering method, such as the normalized least-mean-square (NLMS) algorithm or variants thereof are often applied to estimate the IRs.The NLMS algorithm can also be related to deconvolution used in static measurements [25].Some other adaptive filtering methods can be interpreted as state estimation techniques [26] to obtain IRs as state estimates, as, for example, applied in the time domain for ANC in [27] or in the discrete Fourier transform (DFT) domain for AEC in [26].For the latter application, a variant of the expectation maximization (EM) algorithm has been adopted to learn the measurement noise covariance and process noise covariance for a DFT-domain adaptive filter [28].While [28] focuses on online processing, which is required in AEC, in measurement applications, however, online filtering poses an unnecessarily strict requirement.In constrast to adaptive filtering, offline processing or algorithms that require a large lookahead can be applied without restrictions once recording the signals is completed, as in [29].In [30], the EM algorithm has been applied to identify time-invariant multiple-input-multiple-output (MIMO) systems in the frequency domain.We proposed to apply the EM algorithm offline to estimate time-variant IRs in HRTF measurements, and to learn the parameters of the corresponding state-space model independently on overlapping sequences [31].While the results indicate a large potential, the applicability of this approach to real-world measurements with higher sampling rates, longer IRs and more loudspeakers in parallel is limited due to the excessive computational complexity and memory demand.
In this contribution, we present a flexible framework which applies EM-based joint learning of state-space model parameters and IRs for application in quasi-continuous system identification of time-variant linear acoustic systems so that the limitations of [31] can be overcome.This flexible framework allows to estimate time-domain finite impulse response (FIR) coefficients, a DFT-domain state representation, or any other transformed state representation obtained from a linear invertible transform.Moreover, the proposed framework employs blockwise processing and allows to treat multiple microphones jointly in a MIMO system [32].Imposing different coupling models for dependencies within the state vector can yield block diagonal matrix structures with various levels of computational complexity.As a result, a wider range of applications can be tackled, e.g., measuring time-variant IRs of ANC headphones, as conducted in [33] for a single reference microphone, could be improved and extended to multi-microphone setups, as discussed in [1].
This contribution is structured as follows: Section II presents the state-space system model while Section III describes the EM algorithm for the joint estimation of states and parameters for this model.In Sections IV and V, we present specialized model structures and derive variants of M-step update rules for various models.Examples are provided in Section VI, and Section VII concludes the paper.

II. STATE-SPACE MODEL
Quasi-continuous MIMO system identification aims at estimating IRs of an acoustic system with very high temporal resolution, resulting in either one estimate per sample or per every few milliseconds.We assume that T transmitters (loudspeakers) simultaneously play back signals x t (k), t = 1, . . ., T , such that R receivers (microphones) receive signals y r (k), r = 1, . . ., R. The IR between receiver r and transmitter t at time k is modeled as an FIR filter h rt,k ( ).
To completely represent this linear acoustic MIMO system at time k, we define the state vector at time k, comprising the IRs valid at this time instance, as where ] T is a length-L coefficient vector for each IR h rt,k ( ).The order of the state-space system is given by the number of states N z = RT L. To describe the state-space model, we define the state equation, now using a frame index n, similarly to [31], as (2) describes the evolution of the IRs over time.Here, A is the state transition matrix, and q n is the process noise, which is assumed to be zero-mean Gaussian with covariance .
The observation equation describes how the current state, the IRs, relate to the observations, i.e., the recorded signal samples.We extend the observation model from the block time-domain Kalman filter in [34] such that multiple receivers are considered jointly in a multiple-output system.This leads to the observation equation where N o is the number of samples that form an observation vector y

III. EM-BASED STATE AND PARAMETER ESTIMATION
Given a sequence of observations Y = {y 1 , . . ., y N }, we want to jointly estimate the sequence of hidden states, i.e., the IRs, Z = {z 1 , . . ., z N } and the set of state-space model parameters θ = {A, , , μ 0 , P 0 }.Here μ 0 is the initial mean state vector and P 0 is the associated initial a priori state covariance matrix.
The EM algorithm for state-space models [35] conducts this joint optimization by maximizing the expected log-likelihood, which is given by for the above state-space model.Here det(•) denotes the determinant of a matrix and E{•} is the expectation operator.Note that, in contrast to [35], the observation matrix is timedependent, as in [36], but defined by the playback signals.
The EM algorithm iterates between E-step and M-step to find a locally optimal solution for both state and parameter estimates.The E-step calculates the maximum-likelihood state estimates for a fixed set of parameters θ, and is given by the recursive Kalman filtering and Kalman smoothing equations, which involve the following quantities: the a priori state covariance matrix P n , the Kalman gain K n , the filtered state estimate μ n , the a posteriori state covariance matrix V n , the Kalman smoother gain J n , the smoothed state estimate μn , and the smoothed a posteriori state covariance matrix Vn .The Kalman filtering equations [35] are with the N z × N z identity matrix I N z , and they are evaluated recursively for n = 1, . . ., N. The Kalman smoother equations [35] J are evaluated for n = N, . . ., 1.
Then, the M-step updates the parameters θ for the fixed set of estimates (from the E-step).Assuming the above statespace model and that all matrices are fully populated, the M-step update equations yield the updated quantities, which are highlighted by , as [35] with the auxiliary definitions Evaluating ( 7) to (9) requires the following equations [35]: For each sequence Y, I EM iterations are conducted.These sequences are overlapping signal segments of the entire recorded signal, and each sequence consists of a lookback part, a central part and lookahead part, as in [31].To finally provide IR estimates for each N o -th sample of the entire recorded signal, the estimates, obtained independently on each sequence, are combined [31].
As the state and parameter estimates resulting from the iterative EM algorithm depend on the choice of initial parameters [35], we provide a rule of thumb on how to choose an initial set of parameters θ (0) .As a result of offline processing conducted in measurements, we can assume that preliminary estimates of the IRs (states) can be obtained for a given sequence, e.g., using the NLMS algorithm.These states shall be denoted as zn compute the initial mean state vector and the initial a priori state covariance matrix as the maximum-likelihood estimates of mean and covariance of these preliminary IRs as μ (0) 0 = N −1 N n=1 zn and T , respectively.Similarly, with the initial assumption A (0) = I, we can obtain a maximum-likelihood estimate of the process noise covariance from realizations of the process noise qn = zn − zn−1 corresponding to the preliminary state estimates, similar to (2), as (0) = (N − 1) −1 N n=2 ( qn − q)( qn − q) T , with the mean process noise corresponding to the preliminary state estimates given by q = (N − 1) −1 N n=2 zn .An initial measurement noise covariance matrix (0) can similarly be obtained from a background noise recording.

IV. SPECIALIZED MODEL STRUCTURES
The state and parameter estimation described above is similar to the one in [31], except that we consider a MIMO system instead of a multiple-input-single-output (MISO) system and that we use blockwise processing.If the IRs to be estimated are relatively long and/or the number of transmitters and/or receivers is high, the number of states N z becomes large.As a result, the computational complexity and memory demand of the E-step and the M-step can become very large.To reduce the number of model parameters and the computational complexity, we propose to impose specific structures for the matrix-valued parameters A, , , and P 0 .For example, imposing a (block) diagonal structure on the covariance matrices implies that the cross-covariances between particular states are zero, i.e., the states are assumed to be statistically independent.

A. MOTIVATIONAL EXAMPLES
The first example corresponds to combining two decoupled MISO systems: The state covariance matrices P n , V n , Vn , and the process noise covariance matrix are then set up as block diagonal matrices with R blocks of size T L × T L such that the IRs to each receiver are modeled as independent, but the crosscovariances within the T L samples of the T IRs of length L are considered (cf.Table 1, ID 2).The state-transition matrix should also reflect this block diagonal structure.
Second example: Correlation between IRs to the different receivers can be assumed if they are physically close to each other, especially in freefield-like conditions.The direct path to a linear array with closely-spaced microphones for low frequencies can be expected to change in a similar fashion for adjacent microphones.In case of HRTF measurements, the two microphones in the subject's ear are physically connected through the head and jointly move.Then, dependencies between receivers for a specific frequency could be modeled by applying a frequency-domain transform and a permutation such that the coefficients corresponding to a given frequency appear as groups in the state vector.This requires to reorder the state vector.In (1) the coefficient index changes fastest and the receiver index changes slowest.To form the groups in this example that model within-receiver coupling for a fixed frequency, the receiver index should change fastest.This would result in T L/2 blocks of size 2R × 2R (cf.Table 1, ID 10) for two coefficients (real and imaginary part) per DFT bin.By considering the cross-covariance between real and imaginary parts of a DFT bin, each DFT coefficient is modeled as an improper complex Gaussian random variable [37].The DC and the Nyquist bin (for even L) are considered jointly in one 2 × 2 block.
The third example considers distance changes between receiver(s) and transmitter(s).Cyclically shifting and scaling an IR can be represented as a complex-valued pointwise multiplication in the DFT domain.This can be achieved by applying the DFT to the state vector and by using a state transition matrix with 2 × 2 blocks to implement the complex-valued multiplications.Depending on the application, a meaningful coupling model can be chosen.See Table 1 for a list of suggested coupling models.

B. FURTHER APPROXIMATIONS AND STATE TRANSFORM
When independent blocks in the state are assumed, the block diagonal structure is preserved through the recursive Kalman filtering and Kalman smoothing equations, except in (5d), ( 14) and (15).Therefore, we suggest the following approximations that maintain the block diagonal structure: Here, blkdiag{•} extracts a block diagonal matrix from a matrix.The block size should be clear from context.( 16) is structurally similar to the covariance update in the subdiagonal DFT-domain MISO Kalman filter [38] and in the time-domain broadband Kalman filter [39].It is worth noting that using these approximations it is no longer guaranteed that the expected log-likelihood does not decrease-in contrast to the regular EM algorithm, which provides this guarantee [36].Yet, we found that these approximations yield useful results.
Instead of using a time-domain FIR coefficient representation of the IRs in the state vector, we can apply any linear transform to represent the IR coefficients h rt,k in a different basis, such as real and imaginary parts of an L-point DFT.This transform, applied to each IR separately, shall be given by an invertible matrix T of size L × L. To implement the reordering of the states according to the indexing order in Table 1, a permutation matrix P is introduced.The permuted and transformed state can then be described as Here, ⊗ denotes the Kronecker product.As the observation vectors remain in the time domain, the observation matrix that would be multiplied with the transformed state vector from the right is given by Cn = C n W −1 .When the state dimension N z becomes large, it can be advantageous to exploit that W −1 = (I RT ⊗ T −1 )P T can be calculated more efficiently than simply inverting the large matrix W.
If a transform-domain state representation is considered in the EM algorithm, the initial values can be transformed using the relations Ã = WAW −1 and ˜ = W W −1 and the IR estimates can be reconstructed using (19).

C. COMPUTATIONAL COMPLEXITY AND MEMORY REQUIREMENTS
The dominant term of the computational complexity of one EM iteration in Section III stems from the matrix-matrix multiplications of N z × N z matrices.For a signal sequence of length N x = N • N o , the number of operations is in the order of O( N x N o N 3 z ).Assuming that the matrices A, , , and P 0 are block diagonal with N B z blocks of size B z × B z , the number of operations for the above matrix-matrix multiplications reduces to O( N x N o N B z B 3 z ). 1 Further, it is required to store O( N x N o N 2 z ) elements of the a posteriori state covariance matrices in Section III.This number reduces to O( N x N o N B z B 2 z ) for block diagonal matrices.
Note that both the computational complexity terms above and the memory requirements are inversely proportional to the block size N o .Therefore, assuming constant IRs for several time instances in the observation model in (3) can result in large savings compared to [31] where N o = 1.

V. DERIVED M-STEP UPDATE RULES
To decrease the number of model parameters and/or to impose a block diagonal structure, we now derive the M-step update rules for numerous assumptions about the matrix-valued parameters' structure.Therefore, the derivative of the expected log-likelihood Q(θ) in (4) w.r.t. the parameter is calculated and set to zero, analogously to the derivations in [35], [36].Irrespective of the assumption about the structures, the update rules for the initial a priori state (10) and for the initial state covariance matrix (11) hold.

A. STATE TRANSITION UPDATE RULES
Assuming a scaled identity state transition matrix A = aI yields Here, tr{•} denotes the trace operator.This model corresponds to the scalar fading factor Markov model as found in many Kalman filtering approaches, e.g., in [26], [40], [41], [42].

Assuming a diagonal state transition matrix
where diag{•} converts a vector into a diagonal matrix or extracts the main diagonal from a matrix, and represents elementwise multiplication.In combination with a DFTtransformed state vector, this model allows for frequencydependent fading factors.This could be assumed when the physical distance between transmitter(s) and receiver(s) is expected to change.This would result in slower changes 1 For small block sizes, a different term could dominate the overall complexity.A comprehensive analysis of the exact computational complexity seems impractical here due to the variety of relationships between the parameters R, T, L, N o , N Bz , N By and N that, together with the choice of M-step update equations, determine the block matrix dimensions and the number of computations per iteration.
in lower frequencies and faster changes in higher frequencies, i.e., different fading-factor time constants per frequency.
Next, a block diagonal state transition matrix with N B z blocks of size B z = N z /N B z is assumed, and the auxiliary matrix allows to use the mkblkdiag{•} operator, which "makes" a block diagonal matrix and which can be understood as with a block matrix that has an identity matrix of size B z × B z in block row i and block column i, and a unit vector-like matrix that has an identity matrix in block column i.To derive the update rule for , which contains all the parameters describing A, we set ∂Q ∂ != 0, which yields the condition blkdiag A system of the form blkdiag {A} = blkdiag {B mkblkdiag {X } C} (28) for matrices A, B, C ∈ R N Bz B z ×N Bz B z and X ∈ R N Bz B z ×B z can be rewritten using a vectorized representation of the block matrices using vec{•}, i.e., stacking all matrix elements into a column vector, as where S (B, C) is a matrix that implements (28).Eventually, to find a representation of , we solve and undo the vectorize operation.The update rule (30) requires solving a linear system of equations with N B z B 2 z variables, which can become impractically large.Note that ( 20) and ( 21) also can involve large matrix multiplications.
To avoid them, we can, instead of full coupling between all states, assume independent blocks of B z states in the state vector.Then, the condition in ( 27) simplifies, and we obtain N B z separate matrix-valued equations with B 2 z variables each and obtain the update rule for the block with index b as where [•] i j denotes the block in the i-th block row and jth block column.Instead of dealing with matrices of size (30), the matrices in ( 31) are only of size B z × B z .
This block structure can be applied to model that all IRs change independently, for instance, when the transmitters are spatially far apart from each other, or to model that complexvalued frequencies bins change independently (for B z = 2).

B. PROCESS NOISE UPDATE RULES
Assuming a scaled identity matrix = γ I yields This model resembles the so-called broadband Kalman filter in [39].In combination with T as the real-valued DFT, the model could also be applied when assuming that all frequencies change by similar amounts.
With T as the real-valued DFT, this model could be applied when assuming that all frequencies change by different amounts.This can be understood as similar to the diagonal process noise covariance matrix in the DFT-domain Kalman filter [26].However, there the state vector results from transforming a zero-padded IR estimate into the DFT domain.In the time domain, this structure corresponds to the structure found in the process noise estimation in [27].Assuming a block diagonal matrix with N B z blocks of size B z × B z , i.e., = mkblkdiag{G} with containing all the parameters describing , yields For a large number of states N z , the update rules ( 32), ( 33) and ( 34) still require computing products of large dense matrices in (12) before extracting the relevant matrix entries.If it is instead assumed that independent blocks occur in the state vector and that the state transition matrix A is modeled as a block diagonal matrix as well, we can simplify to obtain the update rule for block b as follows: Note that the matrices in (35) are only of size B z × B z .This model is conceptually similar to the so-called submatrixdiagonal form for MISO systems in [38], where dependencies between transmitters for a fixed frequency bin are considered.

C. MEASUREMENT NOISE UPDATE RULES
Assuming a scaled identity measurement noise covariance matrix = sI N o yields This corresponds to modeling additive white noise with identical variance for all receivers.Assuming a diagonal measurement noise covariance matrix With samplewise processing, i.e., N o = 1, a different amount of additive white noise in each receiver could be modeled.
For N o > 1, this allows to model additive noise with correlation between noise samples, e.g., differently colored noise at each receiver.The larger N o , the longer temporal correlations in the measurement noise can be modeled.Assuming independent blocks in the observations y n , (38) can further be simplified as follows: ) where [C n ] b: denotes the b-th block row and all (block) columns of C n .In contrast to the fully populated measurement noise covariance matrix in (9), which would also try to jointly model measurement noise at different receivers-a reasonable assumption if there is an external noise source that affects all receivers-(39) allows to model colored measurement noise independently at each receivers, for instance, microphone self-noise.

VI. EXAMPLES
To demonstrate the potential of the proposed framework for the identification of time-variant linear acoustic systems, two examples are presented.
To judge the IR estimate quality, we evaluate the relative system distance (also called normalized misalignment) at frame n between the IR estimates and those of a reference measurement for one specific position that is also contained in the continuous measurement, represented by a corresponding state vector z (ref) , as SD n = 20 log 10 μn − z (ref) / z (ref)   n dB.(40) Here, μn could also represent the IR estimates of a baseline algorithm, such as the NLMS algorithm.

A. HRTF MEASUREMENT WITH 37 CHANNELS
In [43], HRTFs were measured using a continuous system identification approach with T = 37 loudspeakers at different elevations in an anechoic chamber.Linear sweeps were played back while a turntable rotated the subject (or dummy head).The rotation speed was 1.5 • s −1 .Each impulse response was L = 1024 samples long, resulting in a sweep period length of T L = 37 888.A reference measurement of the frontal HRTF for all 37 channels is available as the dummy head in the example measurement remained motionless for multiple sweep periods before the rotation started.Due to the optimal convergence properties of perfect periodic sequences [44] these IRs can be considered a valid reference set of HRIRs for this position.
We estimated the HRIRs using the NLMS algorithm with a step size of 0.5, as used in the original HUTUBS database [43], and with a step size of 1.0.Then, a comparison to the results using the proposed framework with the following settings was conducted: The segment length was chosen to be 6 • 37 888 (5.15 s at a sampling rate of 44.1 kHz) with equal lengths of lookback part, central part and lookahead part, as in [31].We assumed that the IRs are constant for about 5.8 ms, corresponding to very small spatial angles, and hence chose N o = 256.Independent time-domain coefficients were assumed (cf.Table 1, ID 8) due to the high state dimension of N z = 37 888, and ( 31), ( 35), (10), and (11) were used in the M-step.θ (0) was chosen based on preliminary estimates obtained with step size 1.0 in the NLMS algorithm, and I = 2 iterations were conducted.
Following Section IV-C, the dominant complexity term for the formulation in [31] with N o = 1 would require about 1.2 • 10 19 operations, which is reduced to roughly 3.4 • 10 7 here due to B z = 1.The memory requirement decreases from storing 3.3 • 10 14 to 3.4 • 10 7 elements.This highlights that the blockwise processing and the assumption of independent time-domain coefficients make it feasible to compute a solution with a reasonable amount of resources.
Fig. 1 shows the relative system distance SD n for the time when the dummy head rotates through the initial position again after having rotated 360 • .When the dummy head approaches the initial position, SD n is expected to decrease until the initial orientation is reached.There SD n is expected to reach a minimum-the continuous measurement and the reference measurement match closest.When the dummy head moves away from this orientation, SD n is expected to increase as the HRIRs begin to deviate again.
The time-dependent relative system distance SD n , comparing all 37 IRs, is shown with thin dashed lines and exhibits the lowest minimum for the proposed method, corresponding to an improvement of about 4 dB compared to the NLMS algorithm's results.Additionally, the relative system distance for the single IR between the 0 • -elevation loudspeaker and the left ear, as shown by the thick solid line, improves by about 10 dB.The staircase-like shapes for the relative system distances for the NLMS algorithm are a result of the linear sweep excitation signal.The distance between the jumps matches one period length of 37 888 samples (0.86 s).The curve for step size 0.5 appears to be slightly delayed as a result of the implicit temporal smoothing of the NLMS algorithm for step sizes less than one.Both of these observations match the analyses in [45].Overall, this result demonstrates that a significant improvement can be achieved with the proposed framework despite the simplifying assumptions about the matrix-valued parameters.

B. TWO-LOUDSPEAKER HRTF MEASUREMENT
We conducted a continuous measurement of HRTFs from T = 2 loudspeakers spanning a 60 • stereo setup at a distance of 1.75 m in a semi-anechoic chamber with a reflective floor.A noise-like perfect periodic sequence of period length 2L = 9600 was used as excitation signal in accordance with [44] to identify two IRs of length L = 4800 (100 ms at a sampling rate of 48 kHz) in parallel.We consider the system as a MIMO setup with R = 2 coupled receivers, represented by the microphones of the HEAD acoustics HMS II.3 dummy head.It was rotated on a turntable with a rotational velocity of 1 • s −1 .Only coupling between receivers in the DFT domain (cf.Table 1, ID 10) was assumed as the two microphones of the dummy head rotate jointly.The M-step updates (31), (35), (10), and (11) were applied with B z = 4.We assumed white measurement noise and hence also chose (36) in the M-step.It was assumed that the IRs are constant for durations of 1ms and thus N o = 48 was set.A segment length of 57 600 samples was chosen, and I = 2 iterations were conducted.The reference IRs at azimuth angle near 0 • , 75 • and 90 • were measured using exponential sweeps.
Similarly to above, the dominant complexity term for the MIMO formulation in Section III with N o = 1 would require about 4.1 • 10 17 operations, which is reduced to roughly 3.7 • 10 8 here due to block size B z = 4.The memory requirement decreases from storing 2.1 • 10 13 to 9.2 • 10 7 elements.Fig. 2 shows the time-dependent relative system distances SD n for the times when the azimuth angle corresponding to the continuous rotation passed through the reference azimuth angles, as indicated by the lower plot that displays the azimuth angle recorded at the reference position and during the measurement.The minimum relative system distances, comparing all four IRs, are improved by about 3 dB to 5 dB compared to the result of applying the NLMS algorithm with step size 1.0.The minima occur temporally close to but slightly before the expected position, which is suspected to be a consequence of the head-tracking system's latency and/or mechanical imperfections.The delay in attaining the minimum SD n with the NLMS algorithm is a consequence of the systematic delay of half a period length, i.e., 4800 samples (0.1 s), as also analyzed in [45].

VII. CONCLUSION
We have presented a framework for EM-based identification of time-variant linear acoustic systems.This framework combines the time-domain block observation model for a MIMO system with a state vector transform, as well as a variety of coupling models that can yield a block diagonal matrix structure, for which we have derived M-step update rules.The choice of model structure and block sizes determines the computational complexity.This way, tasks that were previously considered computationally infeasible, such as HRTF measurements involving numerous channels, can be successfully addressed with the joint state and parameter estimation.Our examples illustrate that this framework can improve the quality of the quasi-continuous IR estimates by up to 10 dB in relative system distance when comparing to a reference measurement.The proposed framework hence enables improved quasi-continuous MIMO system identification.

1 .
Assuming a block diagonal measurement noise covariance with N B y blocks of size B y × B y with B y = RN o /N B y , i.e., = mkblkdiag{S} with S = [S T . .S T N By ] T ∈ R N By B y ×B y yields S = 1 N N n=1 blkdiag {M n } .(

FIGURE 1 .
FIGURE 1.Comparison of time-dependent relative system distances for rotation through initial position for all channels (dashed) and 0 • -elevation channel (solid).

FIGURE 2 .
FIGURE 2. Comparisons of relative system distances when rotating through three different example reference measurement positions.