Common-slope modeling of late reverberation in coupled rooms

Coupled rooms have a distinct sound energy decay behavior, which exhibits more than one decay time under certain conditions. The sound energy decay analysis in such scenarios requires decay models consisting of multiple exponentials with distinct decay rates and amplitudes. While multi-exponential decay analysis is commonly used in room acoustics, the spatial and directional sound energy decay variations in coupled rooms have received little attention. In this work, we introduce the common-slope model of late reverberation for coupled rooms. Common slopes are spatially and directionally invariant decay functions over time, whose amplitudes model all decay variations with respect to the source-receiver configuration. For example, in a scene consisting of two coupled rooms, it is possible to determine two common decay times that approximate the decay for all source-receiver configurations in the scene. Consequently, all spatial and directional decay variations are expressed via decay amplitudes only. We apply the common-slope analysis to measurements of room transitions between coupled rooms. Our analysis shows that the common-slope model approximates the measured sound energy decay with little error. The proposed common-slope model can be used for room acoustic analysis and the efficient synthesis of artificial late reverberation tails.


INTRODUCTION
The sound energy decay of coupled rooms has received much attention in room acoustic literature [1-5]. It is well documented in those studies that, under certain conditions, the sound energy in coupled rooms decays with more than one decay rate. Multi-exponential models are commonly used to analyze such multi-rate energy decays in terms of decay times and amplitudes [2]. For this purpose, different approaches to determine the model parameters have been proposed [6,7].
The diffuse field assumption is the foundation for many room acoustic studies and theories. In a diffuse sound field, the energy is assumed to be spatially and directionally uniformly distributed. In more complicated room geometries, such as coupled rooms or rooms with non-uniform absorption distribution, this assumption [...]
The remainder of this paper is structured as follows. Section 2 provides an overview of the acoustic fundamentals of this work. Section 3 introduces the common-slope model of reverberation and Section 4 demonstrates its usage on a large number of acoustic measurements conducted in coupled rooms. Section 5 concludes this work.
More details, background, and a thorough derivation of the common-slope model can be found in a recently submitted journal article by the authors [18]. This paper extends the journal paper by demonstrating the common-slope analysis on further transitions between coupled rooms, including various sound source positions.

BACKGROUND
The time-domain transfer function for sound traveling between a sound source at position $\mathbf{x}_s = (x_s, y_s, z_s)$ and a receiver at position $\mathbf{x}_r = (x_r, y_r, z_r)$ can be given in terms of a room impulse response (RIR). When dealing with directional sound sources or receivers, the propagation path's direction of departure (DOD) from the source and direction of arrival (DOA) at the receiver must be considered. They are denoted by $\boldsymbol{\Omega}_s = (\varphi_s, \theta_s)$ and $\boldsymbol{\Omega}_r = (\varphi_r, \theta_r)$, respectively, with the azimuth angle $\varphi$ and the elevation angle $\theta$. In this paper, we define the source-receiver configuration $\mathbf{x} = (\mathbf{x}_s, \mathbf{x}_r, \boldsymbol{\Omega}_s, \boldsymbol{\Omega}_r)$ as the combination of source and receiver position and the propagation directions.
The sound energy decay of rooms is commonly described in terms of energy decay functions (EDFs). To this end, the Schroeder backwards integration procedure [19] can be used to calculate an unnormalized EDF $d(\mathbf{x},t)$ from an RIR $h(\mathbf{x},t)$ as
$$d(\mathbf{x},t) = \sum_{\tau=t}^{L} h^2(\mathbf{x},\tau), \tag{1}$$
where $L$ is the number of samples in the EDF. We assume discrete time throughout this paper, i.e., $t$ is the discrete-time sample index.
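The backwards integration amounts to a reversed cumulative sum of the squared RIR. The following is a minimal sketch in Python with NumPy; the function name `schroeder_edf` and the synthetic decaying-noise RIR are illustrative assumptions and are not taken from the measurements discussed in this paper.

```python
import numpy as np

def schroeder_edf(rir):
    """Unnormalized EDF: d(t) = sum of h^2(tau) for tau = t, ..., L."""
    energy = np.asarray(rir, dtype=float) ** 2
    # Reverse, accumulate, and reverse again to integrate from the end
    return np.cumsum(energy[::-1])[::-1]

# Synthetic RIR (assumption): Gaussian noise with a 0.5 s decay time.
# The amplitude envelope uses 6.9 = 13.8 / 2 because energy is amplitude squared.
rng = np.random.default_rng(0)
fs = 48000
t = np.arange(fs // 2)
rir = rng.standard_normal(t.size) * np.exp(-6.9 * t / (fs * 0.5))

edf = schroeder_edf(rir)
edf_db = 10 * np.log10(edf / edf[0])  # normalized EDF in dB, for plotting
```

The EDF is monotonically non-increasing by construction, which is what makes it easier to fit than the raw squared RIR.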
Coupled rooms have a distinct sound energy decay behavior, which exhibits more than one decay rate under certain conditions [1,20]. In such scenarios, the sound energy decay is typically modeled as a superposition of multiple exponentials with individual decay rates and amplitudes [2]. The multi-exponential model with noise is given by [2]
$$d_\kappa(\mathbf{x},t) = N_{0,\mathbf{x}}\,\Psi_0(t) + \sum_{k=1}^{\kappa} A_{k,\mathbf{x}} \left[ \Psi^{(\mathrm{tr.})}_{k,\mathbf{x}}(t) - \Psi^{(\mathrm{tr.})}_{k,\mathbf{x}}(L) \right], \tag{2}$$
with the decay kernel
$$\Psi^{(\mathrm{tr.})}_{k,\mathbf{x}}(t) = \exp\!\left( -\frac{13.8\, t}{f_s\, T_{k,\mathbf{x}}} \right), \qquad \Psi_0(t) = L - t. \tag{3}$$
In the above model, $T_{k,\mathbf{x}}$ and $A_{k,\mathbf{x}}$ are the decay times and amplitudes of the $k$th exponential, respectively, $N_{0,\mathbf{x}}$ is the amplitude of the noise term, $f_s$ is the sampling frequency, and $13.8 \approx \ln(10^6)$ is a constant ensuring that the sound energy has decayed by 60 dB after $T_{k,\mathbf{x}}$ seconds, where $\ln(\cdot)$ denotes the natural logarithm. The square brackets in Eq. (2) include the constant term $\Psi^{(\mathrm{tr.})}_{k,\mathbf{x}}(L)$, which accounts for the finite upper limit of integration in the Schroeder backwards integration and can be dropped for large $L$ [21].
We will refer to this model throughout the paper as the traditional multi-exponential model. It is evident from Eqs. (2) and (3) that the $\kappa$th-order traditional multi-exponential model features $(2\kappa + 1)$ free parameters that have to be determined for every source-receiver configuration $\mathbf{x}$, namely, $\kappa$ decay times, $\kappa$ decay amplitudes, and one noise amplitude. Standard decay analysis methods such as the DecayFitNet [6] or Bayesian analysis [7] can be used for this purpose.
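To make the roles of the parameters concrete, the model can be evaluated numerically. The sketch below assumes the usual form of this model, i.e., one exponential kernel exp(-13.8 t / (f_s T_k)) per slope, minus its value at t = L, plus a noise term that is linear in (L - t). The function name `multi_exp_edf` and all numeric values are hypothetical.

```python
import numpy as np

def multi_exp_edf(decay_times, amplitudes, noise_amp, n_samples, fs):
    """Evaluate a multi-exponential EDF model with a linear noise term."""
    t = np.arange(n_samples)
    T = np.asarray(decay_times, dtype=float)[:, None]   # shape (K, 1)
    A = np.asarray(amplitudes, dtype=float)[:, None]    # shape (K, 1)

    def kernel(sample_idx):
        # Decays by 60 dB after T seconds (13.8 is approximately ln(10^6))
        return np.exp(-13.8 * sample_idx / (fs * T))

    # Subtract the kernel at t = L (finite upper integration limit)
    slopes = A * (kernel(t[None, :]) - kernel(n_samples))
    noise = noise_amp * (n_samples - t)
    return noise + slopes.sum(axis=0)

# Single slope, no noise: the EDF should be about 60 dB down after T = 0.5 s
fs = 1000
edf = multi_exp_edf([0.5], [1.0], 0.0, n_samples=2000, fs=fs)
drop_db = 10 * np.log10(edf[int(0.5 * fs)] / edf[0])
```

Because the decay times enter through the exponent, fitting them is a non-linear problem, whereas the amplitudes and the noise term enter linearly.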

COMMON-SLOPE MODEL OF REVERBERATION
This section introduces the common-slope model of reverberation and outlines how it can be used to describe sound energy decay variations in rooms.
As elaborated before, EDFs can be modeled as a linear combination of one or more exponentials and a noise term. The analysis is usually carried out in frequency bands under the assumption that a specific frequency band features only a limited number of decay rates. Earlier work by Kuttruff investigates the distribution of room modes over the frequency range and establishes their relationship to EDFs [20,24]. More precisely, when the room modes within a frequency band have similar decay times, the resulting EDF follows an exponential, whose decay time is the average of the individual mode decay times [20]. In coupled rooms, multiple mode groups with decay times scattered around distinct means may occur. This phenomenon motivates multi-exponential decay models, in which the number of slopes corresponds to the number of mode groups.
The relationship between room modes and energy decay slopes is a central part of this work. It is particularly useful to recognize that the mode decay times only depend on the room geometry and wall properties [20]. In other words, for varying source-receiver configurations, the mode decay times stay constant, whereas all acoustic changes can be modeled with mode decay amplitudes [20,17]. In the following, we will refer to this property as the common-decay property (CDP). A similar line of thought can be found in previous work by Haneda et al. [17], who use the CDP in their common-acoustical-pole and residue (CAPR) model to interpolate between RIRs. Another study relying on the CDP was presented by Das et al. [25], who extend the CAPR model toward frequency-band-wise processing. In this paper, we will utilize the CDP and combine it with the previously elaborated insight that room modes and EDFs are closely connected.
We propose the common-slope model of reverberation, which is given by
$$d_\kappa(\mathbf{x},t) = N_{0,\mathbf{x}}\,\Psi_0(t) + \sum_{k=1}^{\kappa} A_{k,\mathbf{x}} \left[ \Psi_k(t) - \Psi_k(L) \right], \tag{4}$$
with the decay kernel
$$\Psi_k(t) = \exp\!\left( -\frac{13.8\, t}{f_s\, T_k} \right), \qquad \Psi_0(t) = L - t. \tag{5}$$
At first sight, this model considerably resembles the traditional multi-exponential decay model in Eqs. (2) and (3). However, it is important to note that the decay kernel $\Psi_k$ is now independent of the source-receiver configuration $\mathbf{x}$, as suggested by the CDP. More precisely, the common-slope model assumes that EDFs of multiple source-receiver configurations can be modeled with a common set of exponential decay times $T_k$. We therefore refer to the common $T_k$ values as common decay times and to the corresponding EDF slopes as common slopes. Consequently, all variations with respect to the source-receiver configuration are described in terms of the amplitude values $A_{k,\mathbf{x}}$ and noise values $N_{0,\mathbf{x}}$ only. Please note that the common-slope model can only be applied if the different source-receiver configurations are part of the same scene. For example, a coupled room geometry consisting of multiple connected rooms counts as one scene, but multiple non-connected rooms in different buildings do not.
The common decay times can be obtained in three steps. Firstly, a standard decay analysis approach such as the DecayFitNet [6] or Bayesian analysis [7] is used to determine the decay times $T_{k,\mathbf{x}}$ of the traditional multi-exponential model [cf. Eqs. (2) and (3)]. Secondly, the k-means clustering algorithm [26,27] is applied to the decay times $T_{k,\mathbf{x}}$ to obtain $\kappa$ decay time clusters corresponding to the $\kappa$ assumed mode groups. Each $T_{k,\mathbf{x}}$ value is assigned to the cluster whose mean it is closest to in terms of absolute distance. The number of clusters can be visually determined from a histogram of the $T_{k,\mathbf{x}}$ values. Lastly, one common decay time is determined for each cluster. The common decay time is defined as the center of the histogram bin containing the largest number of $T_{k,\mathbf{x}}$ values. Figure 1 demonstrates this procedure for all three transitions analyzed in this paper (cf. Section 4.1). After the common decay times $T_k$ have been determined, the remaining model parameters $A_{k,\mathbf{x}}$ and $N_{0,\mathbf{x}}$ need to be estimated for all source-receiver configurations $\mathbf{x}$. Because the common decay times are fixed, the model is linear in the remaining parameters, and all non-linearities of the traditional multi-exponential model have been eliminated. Consequently, the remaining estimation problem becomes solvable as a constrained linear least-squares problem with $A_{k,\mathbf{x}} \geq 0$ and $N_{0,\mathbf{x}} \geq 0$.
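The clustering and histogram steps can be sketched as follows. This is a minimal illustration, assuming a plain one-dimensional Lloyd-type k-means with quantile initialization and a fixed histogram bin width of 0.02 s; neither choice is specified in the text. The helper name `common_decay_times` is hypothetical, the first step (estimating the decay times per configuration) is not covered, and the synthetic decay times are not measured data.

```python
import numpy as np

def common_decay_times(t_values, n_clusters, bin_width=0.02, n_iter=100):
    """Cluster decay times (1-D k-means) and take each cluster's histogram mode."""
    t_values = np.asarray(t_values, dtype=float)
    # Assumption: initialize the cluster means at evenly spaced quantiles
    means = np.quantile(t_values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        # Assign each value to the cluster mean with the smallest absolute distance
        labels = np.argmin(np.abs(t_values[:, None] - means[None, :]), axis=1)
        new_means = np.array([
            t_values[labels == c].mean() if np.any(labels == c) else means[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_means, means):
            break
        means = new_means

    common = []
    for c in range(n_clusters):
        members = t_values[labels == c]  # assumed non-empty for this sketch
        span = members.max() - members.min()
        n_bins = max(1, int(np.ceil(span / bin_width)))
        counts, edges = np.histogram(members, bins=n_bins)
        fullest = np.argmax(counts)
        # Common decay time = center of the fullest histogram bin
        common.append(0.5 * (edges[fullest] + edges[fullest + 1]))
    return np.array(common), labels

# Synthetic decay times scattered around two means (illustrative only)
rng = np.random.default_rng(1)
t_values = np.concatenate([
    0.43 + 0.03 * rng.standard_normal(500),
    1.53 + 0.03 * rng.standard_normal(500),
])
common_T, labels = common_decay_times(t_values, n_clusters=2)
```

Using the histogram mode rather than the cluster mean makes the common decay time less sensitive to outliers within a cluster.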

RESULTS
In the following section, we will demonstrate the common-slope model and show how it can be used to obtain interpretable room acoustic analysis results. We will focus specifically on coupled room geometries.

Dataset under investigation
The analyses in this section will be based on the Room Transition dataset by McKenzie et al. [28,29]. The Room Transition dataset contains higher-order Ambisonics RIRs, which were measured with an Eigenmike microphone array. During the measurements, the microphone array was placed in 5 cm intervals on 5 m long, straight transition lines centered around the aperture between the coupled rooms. Each transition was measured with four different source positions: two in each room, one of which has clear line-of-sight to all receiver positions (CLOS), while the other one has no clear line-of-sight (NLOS). Consequently, there are 101 × 4 = 404 RIRs per transition, corresponding to 404 different source-receiver configurations $\mathbf{x}$.
In this paper, we will demonstrate the common-slope model on the transitions "Meeting room to hallway", "Office to stairwell", and "Office to kitchen". Table 1 briefly summarizes the dimensions and properties of the individual rooms. For more information, please refer to the dataset and its accompanying publication, which also include the floor plans of all scenes [28,29].

Room transition along a straight line: spatial and directional analysis
In this section, we extend the preceding analysis with directional information. To this end, we use the directional information captured by the higher-order Ambisonic RIRs and perform beamforming into different directions.
In the present analysis, we beamform with a 15° azimuth resolution into directions on the horizontal plane. After beamforming into a certain direction, EDFs can be calculated from the directional RIRs via the Schroeder backwards integration procedure [19] to yield directional EDFs (DEDFs). This procedure is analogous to the methodology in prior work [16,30].
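We obtain beamformer output signals $\mathbf{S} \in \mathbb{R}^{J \times L}$ for $J$ analysis directions as
$$\mathbf{S} = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_J]^{\mathrm{T}} = \mathbf{A}\,\mathbf{h}, \tag{6}$$
where $\mathbf{s}_\xi$ is the $\xi$th beamformer output, $\mathbf{A} \in \mathbb{R}^{J \times (N+1)^2}$ and $\mathbf{h} \in \mathbb{R}^{(N+1)^2 \times L}$ denote the analysis matrix and the $N$th-order Ambisonic RIR¹, respectively, and $(\cdot)^{\mathrm{T}}$ denotes the matrix transpose. We assume axisymmetric beamformers, i.e., the analysis matrix $\mathbf{A}$ can be obtained from the beamformer weights $\mathbf{c}_N \in \mathbb{R}^{N \times 1}$ as
$$\mathbf{A} = [\mathbf{y}(\boldsymbol{\Omega}_1), \ldots, \mathbf{y}(\boldsymbol{\Omega}_J)]^{\mathrm{T}} \operatorname{diag}_N(\mathbf{c}_N), \tag{7}$$
where $\mathbf{y}(\boldsymbol{\Omega}_\xi) \in \mathbb{R}^{(N+1)^2 \times 1}$ denote spherical harmonics (SHs) evaluated at the beamformer steering directions $\boldsymbol{\Omega}_\xi$.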
Due to the axisymmetry of the beamformers, we repeat the $n$th beamformer weight $\mu$ times, with $\mu$ and $n$ being the SH degree and order, respectively. This operation is formalized by $\operatorname{diag}_N(\cdot)$.
Previous studies on room acoustic analysis and Ambisonic RIR processing showed that a strong front-to-back separation is important for resolving energy differences along room axes [30,31]. Therefore, we employ a spatial Butterworth beamformer [32, Table 3.1] in this work, whose axisymmetric weights are given by
$$c_n = \frac{1}{\sqrt{1 + (n / n_c)^{2\gamma}}}, \tag{8}$$
where we set the Butterworth beamformer order $\gamma = 5$, and the cut-on SH order $n_c = 3$.
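The order weights and their expansion to one weight per SH channel can be sketched as follows. The sketch assumes the common spatial Butterworth form c_n = 1 / sqrt(1 + (n / n_c)^(2γ)) evaluated for the orders n = 0, ..., N, and it assumes that each order-n weight is repeated once for each of the 2n + 1 SH channels of that order. The function names and the maximum order N = 4 are illustrative choices, not values stated in the text.

```python
import numpy as np

def butterworth_weights(max_order, cuton_order=3, butter_order=5):
    """Axisymmetric spatial Butterworth weights for SH orders 0..max_order."""
    n = np.arange(max_order + 1)
    return 1.0 / np.sqrt(1.0 + (n / cuton_order) ** (2 * butter_order))

def expand_per_channel(order_weights):
    """Repeat the order-n weight for all 2n + 1 SH channels of that order."""
    n = np.arange(len(order_weights))
    return np.repeat(order_weights, 2 * n + 1)

max_order = 4  # assumption for illustration
c = butterworth_weights(max_order)
diag_entries = expand_per_channel(c)       # length (N + 1)^2
weight_matrix = np.diag(diag_entries)      # diagonal weighting applied to the SH channels
```

With these settings the weight is 1 at order 0, drops to 1/sqrt(2) at the cut-on order, and falls off steeply above it, which is the low-pass behavior over SH order that a Butterworth design implies.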
In the actual common-slope analysis, we apply the DecayFitNet [6] on all 9696 DEDFs of a specific transition (101 receiver positions × 4 source positions × 24 beamformer directions) to obtain the $T_{k,\mathbf{x}}$ values of the traditional multi-exponential model [cf. Eqs. (2) and (3)]. Afterwards, we determine the common decay times $T_k$ based on all 9696 $T_{k,\mathbf{x}}$ values as outlined in Section 3. Finally, the decay amplitudes $A_{k,\mathbf{x}}$ and noise amplitudes $N_{0,\mathbf{x}}$ are calculated via a linear least-squares fit of the common-slope model [cf. Eqs. (4) and (5)] to the measured DEDFs.
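Once the common decay times are fixed, the amplitude estimation is a non-negative linear least-squares problem. The sketch below uses `scipy.optimize.nnls` as one possible solver; the text does not name a solver, so this choice, the column scaling, and the helper name `fit_common_slope_amplitudes` are assumptions. The design matrix assumes exponential kernels exp(-13.8 t / (f_s T_k)) minus their value at t = L, plus a noise column (L - t). The test EDF is synthetic.

```python
import numpy as np
from scipy.optimize import nnls

def fit_common_slope_amplitudes(edf, common_decay_times, fs):
    """Non-negative least-squares fit of slope and noise amplitudes for fixed decay times."""
    edf = np.asarray(edf, dtype=float)
    L = edf.size
    t = np.arange(L)
    T = np.asarray(common_decay_times, dtype=float)

    # One column per common slope, plus one column for the noise term
    kernels = np.exp(-13.8 * t[:, None] / (fs * T[None, :]))
    kernels = kernels - np.exp(-13.8 * L / (fs * T[None, :]))
    design = np.column_stack([kernels, L - t])

    # Scale the columns to comparable magnitude before solving (numerical convenience)
    scale = design.max(axis=0)
    coeffs, _ = nnls(design / scale, edf)
    coeffs = coeffs / scale
    return coeffs[:-1], coeffs[-1]

# Synthetic EDF built from known amplitudes (illustrative values only)
fs, L = 1000, 3000
t = np.arange(L)
true_T = np.array([0.43, 1.53])
true_A = np.array([1.0, 0.1])
true_N0 = 1e-6
edf = true_N0 * (L - t)
for A_k, T_k in zip(true_A, true_T):
    edf = edf + A_k * (np.exp(-13.8 * t / (fs * T_k)) - np.exp(-13.8 * L / (fs * T_k)))

A_hat, N0_hat = fit_common_slope_amplitudes(edf, true_T, fs)
```

In an analysis of many source-receiver configurations, the design matrix depends only on the common decay times and could be built once and reused for every EDF.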
Figure 2 shows the common-slope analysis results for the transition "Meeting room to hallway, source in meeting room, clear line-of-sight". Two common decay times were determined (cf. Figure 1a) and they amount to $T_1 = 0.43$ s and $T_2 = 1.53$ s. Figure 2a illustrates how the $A_{1,\mathbf{x}}$ values change for various positions on the transition and beamforming directions. The values are generally higher in the meeting room, and gradually fade out when transitioning toward the hallway. Slightly increased amplitudes can be observed near the hallway wall (i.e., $x_r = 500$ cm), which can be attributed to reflections from the wall. Distinct peaks can be observed for $\varphi_r = \pm 90°$. The amplitude variations for different look directions indicate that the reverberation is considerably anisotropic, which could be explained by an uneven distribution of absorption material in the room and the energy transfer through the door.
The $A_{2,\mathbf{x}}$ values corresponding to the common decay time $T_2$ are depicted in Figure 2b. They gradually fade in while transitioning into the hallway. For positions closer to the door, one can observe how the energy leaks into the meeting room, and that this effect is considerably directional. Just like the $A_{1,\mathbf{x}}$ values, the $A_{2,\mathbf{x}}$ values exhibit anisotropy, where clear peaks can be observed for $\varphi_r = \pm 90°$.
Figure 2c depicts the dB-MSE between the common-slope fit $d_\kappa^{(\mathrm{dB})}(\mathbf{x},t)$ and the true DEDFs $d^{(\mathrm{dB})}(\mathbf{x},t)$, which is defined as
$$\text{dB-MSE}(\mathbf{x}) = \frac{1}{L} \sum_{t=1}^{L} \left[ d_\kappa^{(\mathrm{dB})}(\mathbf{x},t) - d^{(\mathrm{dB})}(\mathbf{x},t) \right]^2, \tag{9}$$
where both DEDFs are represented on a logarithmic scale in dB. We exclude some portions of the EDF because they are not representative of the actual late reverberation decay:
1. The first 2 m / (343 m/s) ≈ 5.8 ms, because this part of the EDF only includes the direct sound and possibly one or two reflections, which show up as a very steep energy drop. This part will inevitably introduce an error, because the multi-exponential model cannot properly model it.
2. The last 5 % of the EDF (i.e., $t > 0.95\,L$), because this part mostly features noise, which exhibits statistical uncertainty. Although the Schroeder backwards integration procedure converges to a steady noise floor after a while, a larger number of samples must still be integrated to account for this uncertainty.
¹ The channel ordering is not explicitly defined here because it is not relevant for this work as long as it is used consistently throughout the analysis pipeline. Please note that the maximum spherical harmonic order $N$ should not be confused with the noise term $N_0$ of the decay model.
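The error measure with both exclusions can be sketched as follows. The sketch assumes that the squared dB differences are averaged over the retained samples only and that both EDFs share the same scaling; the exact normalization is not recoverable from the text, and the function name `db_mse` is hypothetical.

```python
import numpy as np

def db_mse(edf_true, edf_fit, fs, skip_seconds=2.0 / 343.0, tail_fraction=0.05):
    """Mean squared error between two EDFs on a dB scale, excluding start and tail."""
    edf_true = np.asarray(edf_true, dtype=float)
    edf_fit = np.asarray(edf_fit, dtype=float)
    L = edf_true.size
    start = int(np.ceil(skip_seconds * fs))         # skip the first ~5.8 ms
    stop = int(np.floor((1.0 - tail_fraction) * L)) # drop the last 5 %
    true_db = 10.0 * np.log10(edf_true[start:stop])
    fit_db = 10.0 * np.log10(edf_fit[start:stop])
    return np.mean((fit_db - true_db) ** 2)

# Synthetic check: a fit that is off by a constant factor of 2 (about 3.01 dB)
fs = 48000
t = np.arange(fs)
edf = np.exp(-13.8 * t / (fs * 0.5)) + 1e-8
error_same = db_mse(edf, edf, fs)
error_offset = db_mse(edf, 2.0 * edf, fs)
```

A constant 3.01 dB offset yields a mean squared error of roughly 9.06, which illustrates that the measure penalizes deviations quadratically on the dB scale.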
Figure 5. Common-slope analysis results (1 kHz octave band, $\varphi_r = 0°$, $\theta_r = 0°$) of the "Office to stairwell, source in office, no line-of-sight" transition. The plots show the $A_{1,\mathbf{x}}$, $A_{2,\mathbf{x}}$, $A_{3,\mathbf{x}}$, $N_{0,\mathbf{x}}$, and dB-MSE [cf. Eq. (9)] values for all positions along the transition line. The common-slope analysis is based on the common decay times $T_1 = 0.28$ s, $T_2 = 0.42$ s, and $T_3 = 1.02$ s.
The dB-MSE is well below 3 dB for all source-receiver configurations, with median and 99 % quantile values of 0.18 dB and 0.51 dB, respectively. The low dB-MSE values indicate that the common-slope model is suitable for describing the energy decay behavior along the entire transition, despite having fewer degrees of freedom than the traditional multi-exponential model.
In Figure 3, we focus only on the beamformer direction $\varphi_r = 0°$, $\theta_r = 0°$ to highlight the cross-fade between the two different decay processes. The figure shows how the $A_{1,\mathbf{x}}$ values gradually fade out, while the $A_{2,\mathbf{x}}$ values grow stronger during the transition into the hallway. Interestingly, the $A_{1,\mathbf{x}}$ values increase slightly again toward the 500 cm transition position. This observation can be attributed to stronger reflections from the hallway wall. The $N_{0,\mathbf{x}}$ values remain approximately constant throughout the transition. Finally, the low dB-MSE values [cf. Eq. (9)] indicate that the common-slope model is a suitable description for the analyzed transition.
Figure 4 shows the common-slope analysis results for the transition "Office to stairwell, source in office, no line-of-sight". This transition required three common decay times to describe all source-receiver configurations (cf. Figure 1b), and they amount to $T_1 = 0.28$ s, $T_2 = 0.42$ s, and $T_3 = 1.02$ s, respectively. Just as in the previously described transition, the $A_{1,\mathbf{x}}$ values gradually fade out while transitioning into the second room, whereas the $A_{2,\mathbf{x}}$ and $A_{3,\mathbf{x}}$ values grow stronger. Distinct peaks at lateral directions highlight the anisotropy of the sound energy decay. Furthermore, the low dB-MSE median and 99 % quantile values of 0.13 dB and 1.08 dB, respectively, demonstrate that there is only little error between the true DEDFs and the common-slope model. This result indicates that the common-slope model is suitable for describing the analyzed room transition.
Figure 5 focuses on the analysis results for the beamformer direction $\varphi_r = 0°$, $\theta_r = 0°$. It becomes more evident from this figure how the $A_{1,\mathbf{x}}$ values fade during the transition into the stairwell, while the $A_{2,\mathbf{x}}$ and $A_{3,\mathbf{x}}$ values grow stronger. For positions in the stairwell, the slope of the shorter decay time $T_1$ is sometimes masked by the slower decaying slopes with decay times $T_2$ and $T_3$. In these cases, the $T_1$ slope may be hard to detect, thus resulting in near-zero amplitude values $A_{1,\mathbf{x}}$. In Figure 5, we therefore excluded all $A_{1,\mathbf{x}}$ values below −40 dB to make the plot more understandable. The noise level $N_{0,\mathbf{x}}$ is approximately constant along the transition. Lastly, the dB-MSE values [cf. Eq. (9)] are low for all positions, thus indicating that the common-slope model is a suitable model for describing this transition.
Figure 6 depicts the common-slope analysis results for the transition "Office to kitchen, source in office, no line-of-sight". For this transition, only one common decay time could be determined with the k-means method (cf. Figure 1c) and it amounts to $T_1 = 0.43$ s. The acoustic properties of both rooms are very similar, and consequently only one decay rate could be determined for all source-receiver configurations. The $A_{1,\mathbf{x}}$ values once again show considerable anisotropy, and they gradually fade out while transitioning into the room without the sound source. Slightly increased errors can be observed for this transition, with median and 99 % quantile values of 0.39 dB and 1.45 dB, respectively. The histogram in Figure 1c shows that the decay times $T_{k,\mathbf{x}}$ vary between 0.3 s and 0.6 s. Such a small decay time variability could be easily accommodated in the previous two transitions, because the multi-exponential model is very versatile and can compensate for slightly wrong decay times by adapting the amplitudes accordingly. This was already noted by Lanczos, who states that the decomposition of a decay function into a linear combination of exponentials is a highly ill-conditioned problem [33]. However, with only one active decay time or slope, the margin for compensation is considerably reduced. Consequently, slightly higher errors are observed for this transition.
Figure 7. Common-slope analysis results (1 kHz octave band, $\varphi_r = 0°$, $\theta_r = 0°$) of the "Office to kitchen, source in office, no line-of-sight" transition. The plots show the $A_{1,\mathbf{x}}$, $N_{0,\mathbf{x}}$, and dB-MSE [cf. Eq. (9)] values for all positions along the transition line. The common-slope analysis is based on the common decay time $T_1 = 0.43$ s.
Table 2. dB-MSE [cf. Eq. (9)] between directional energy decay functions of the room transition dataset [28,29], the common-slope model [cf. Eqs. (4) and (5)] and the traditional multi-exponential model [cf. Eqs. (2) and (3)]. The table is based on the spatial and directional analysis, where directional information is obtained via beamforming. A perfect fit would yield a dB-MSE value of 0 dB. We refer to the different analyzed source positions as NLOS = no line-of-sight, and CLOS = clear line-of-sight.
Figure 7 shows only the analysis results for the beamformer direction $\varphi_r = 0°$, $\theta_r = 0°$. It demonstrates how the $A_{1,\mathbf{x}}$ values gradually decrease while transitioning into the kitchen. In contrast, the noise amplitudes $N_{0,\mathbf{x}}$ remain approximately constant throughout the transition. The dB-MSE [cf. Eq. (9)] remains low for the entire transition, thus indicating that the common-slope model can accurately fit the sound energy decay of this scene.
Lastly, Table 2 compares the fitting performance of the common-slope model [cf. Eqs. (4) and (5)] and the traditional multi-exponential model [cf. Eqs. (2) and (3)]. It features the mean, median, and 99 % quantile dB-MSE values [cf. Eq. (9)] of all transitions and source positions. The mean and median values of the common-slope model lie between 0.13 dB and 0.46 dB, and the 99 % quantile values are below 1.5 dB for all evaluated cases, thus indicating that the common-slope model is suitable for describing all evaluated transitions. Furthermore, the fitting performance is comparable to, though sometimes less robust than, that of the traditional multi-exponential model, despite requiring only approximately half of its parameters.
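The parameter saving can be made explicit with a short count. The following worked example assumes that both models use the same order $\kappa$ and that $M$ denotes the number of source-receiver configurations; $M = 9696$ is the number of DEDFs per transition reported above, and the symbols $P_{\mathrm{trad}}$ and $P_{\mathrm{cs}}$ are introduced here for illustration only.

```latex
% Traditional model: kappa decay times, kappa amplitudes, 1 noise amplitude per configuration
P_{\mathrm{trad}} = (2\kappa + 1)\, M
% Common-slope model: kappa shared decay times, plus kappa amplitudes and 1 noise amplitude per configuration
P_{\mathrm{cs}} = \kappa + (\kappa + 1)\, M
% Ratio for large M
\frac{P_{\mathrm{cs}}}{P_{\mathrm{trad}}} \approx \frac{\kappa + 1}{2\kappa + 1}
% Example: kappa = 2, M = 9696
P_{\mathrm{trad}} = 5 \cdot 9696 = 48480, \qquad P_{\mathrm{cs}} = 2 + 3 \cdot 9696 = 29090
% Example: kappa = 3, M = 9696
P_{\mathrm{trad}} = 7 \cdot 9696 = 67872, \qquad P_{\mathrm{cs}} = 3 + 4 \cdot 9696 = 38787
```

Under these assumptions the ratio is 0.60 for $\kappa = 2$ and about 0.57 for $\kappa = 3$, and it approaches one half as $\kappa$ grows.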

CONCLUSIONS
This paper introduced the common-slope model of reverberation, which uses a common set of decay times to model sound energy decay functions (EDFs) of multiple source-receiver configurations within the same environment. Consequently, all position- and direction-dependent EDF variations are described in terms of decay and noise amplitudes only. In the present study, we used the common-slope model to analyze sound energy decays in coupled rooms. We showed that the common-slope model is suitable for describing multi-exponential EDFs with varying source-receiver configurations, while requiring considerably fewer parameters than the traditional multi-exponential model. By using the same set of decay times to model all source-receiver configurations, the common-slope model yields easily interpretable room acoustic analysis results. For example, we demonstrated that the common-slope model can be used to analyze how the decay behavior gradually changes when transitioning from one room to another through a connecting door. Furthermore, we showed that the common-slope model introduces only little error between the true and modeled EDF.
The common-slope model will benefit future research efforts in room acoustic analysis and modeling. Additionally, it can be used in all acoustic applications relying on sound energy decay or late reverberation models, such as echo cancellation, source separation, dereverberation, sound field equalization, and parametric spatial audio rendering.
Figure 1. K-means clustering of decay times $T_{k,\mathbf{x}}$ to obtain the common decay times $T_k$. The subplots demonstrate the clustering on all three transitions that are analyzed in this paper (cf. Section 4.1). Each subplot is based on 9696 source-receiver configurations (101 receiver positions × 4 source positions × 24 beamformer directions, cf. Section 4.2 for more details).

Figure 2. Common-slope analysis results (1 kHz octave band) of the "Meeting room to hallway, source in meeting room, clear line-of-sight" transition. The plots show the a) $A_{1,\mathbf{x}}$, b) $A_{2,\mathbf{x}}$, and c) dB-MSE [cf. Eq. (9)] values for a common-slope analysis with the common decay times $T_1 = 0.43$ s and $T_2 = 1.53$ s.

Figure 3. Common-slope analysis results (1 kHz octave band, $\varphi_r = 0°$, $\theta_r = 0°$) of the "Meeting room to hallway, source in meeting room, clear line-of-sight" transition. The plots show the $A_{1,\mathbf{x}}$, $A_{2,\mathbf{x}}$, $N_{0,\mathbf{x}}$, and dB-MSE [cf. Eq. (9)] values for all positions along the transition line. The common-slope analysis is based on the common decay times $T_1 = 0.43$ s and $T_2 = 1.53$ s.

Figure 4. Common-slope analysis results (1 kHz octave band) of the "Office to stairwell, source in office, no line-of-sight" transition. The plots show the a) $A_{1,\mathbf{x}}$, b) $A_{2,\mathbf{x}}$, c) $A_{3,\mathbf{x}}$, and d) dB-MSE [cf. Eq. (9)] values for a common-slope analysis with the common decay times $T_1 = 0.28$ s, $T_2 = 0.42$ s, and $T_3 = 1.02$ s.

Figure 6. Common-slope analysis results (1 kHz octave band) of the "Office to kitchen, source in office, no line-of-sight" transition. The plots show the a) $A_{1,\mathbf{x}}$ and b) dB-MSE [cf. Eq. (9)] values for a common-slope analysis with the common decay time $T_1 = 0.43$ s.

Table 1. Summary of all analyzed room transitions. The measurements are part of the Room Transition dataset by McKenzie et al. [28,29].