Common-Slope Modeling of Late Reverberation

The decaying sound field in rooms is typically described by energy decay functions (EDFs). Late reverberation can deviate considerably from the ideal diffuse field, for example, in multiple connected rooms or non-uniform absorption material distributions. This paper proposes the common-slope model of late reverberation. The model describes spatial and directional late reverberation as linear combinations of exponential decays called common slopes. Its fundamental idea is that common slopes have decay times that are invariant across space and direction, while their amplitudes vary across both. We explore different approaches for determining the common slopes for large EDF sets describing different source-receiver configurations of the same environment. Among the presented approaches, the k-means clustering of decay times is the most general. Our evaluation shows that the common-slope model introduces only a small error between the modeled and the true EDF, while being considerably more compact than the traditional multi-exponential model. The amplitude variations of the common slopes yield interpretable room acoustic analyses. The common-slope model has potential applications in all fields relying on late reverberation models, such as source separation, dereverberation, echo cancellation, and parametric spatial audio rendering.


I. INTRODUCTION
The sound field in a room can be described with room impulse responses (RIRs), which represent the time-domain transfer functions for specific source-receiver configurations. Due to reflections from the room boundaries, the sound field is a superposition of multiple sound waves with distinct amplitudes, phases, and propagation directions [1]. Consequently, the sound pressure and energy decay over time vary within the room, and their spatial variations (inhomogeneity) can be described if RIRs are available for every possible source and receiver position. Additionally, directional variations (anisotropy) must be taken into account when considering directional sources and receivers.
RIRs are often parameterized to obtain analytical and potentially compact representations of the source-receiver transfer functions. To this end, various parametric sound field models have been established, such as the description as a stochastic random process [2], [3], [4] or the decomposition into plane waves [5], [6], [7], spherical waves [7], [8], spherical harmonics [9], or room modes [10], [11], [12]. Such models are valuable for estimating the RIR of spaces using computational tools, or for describing the source-receiver transfer function in audio signal processing applications like source separation, dereverberation, echo cancellation, or spatial audio rendering.
RIRs are usually separated into direct sound, early reflections, and late reverberation [13]. Late reverberation is characterized by its large echo density [1] and can consequently be modeled as exponentially decaying noise with one or more decay rates [3], [14], [15]. Furthermore, late reverberation is usually assumed to be diffuse and isotropic, thus requiring the energy density to be uniform throughout the room and over all directions [1], [2]. Other stochastic late reverberation properties in diffuse and isotropic sound fields are derived by Badeau in the space, time, and frequency domain [2].
Coupled rooms were the subject of various studies, some of which have established models for predicting their energy decay. For instance, Cremer and Müller [30, pp. 261 ff.] derived a model via the analysis of power balances in a diffuse sound field. Their model can predict the energy decay in each room, but does not account for position-dependent effects. A recent room acoustic simulator uses Cremer and Müller's model to render late reverberation in coupled rooms [31]. Luizard et al. [32] extended Cremer and Müller's work and proposed a parametric solution of the diffusion equation for modeling the inhomogeneous reverberation in coupled rooms. Their model features two exponential decays with spatially invariant decay rates, whose amplitudes are adapted according to the source-receiver and aperture-receiver distance.
Energy decay functions (EDFs) obtained from the Schroeder backwards integration procedure [33] are suitable descriptors of the decaying sound field when investigating late reverberation. Usually, modern sound decay analyses utilize a well-established model consisting of multiple exponentials and a noise term [15]. This model can analyze energy decays from various environments, including coupled rooms or geometries with considerably non-uniform absorption material distributions [15]. Due to its generality, the model has many degrees of freedom, namely two for each exponential (one amplitude and one decay time) and one for the noise amplitude. Consequently, the model may easily overfit when it is carelessly used to analyze large EDF sets consisting of spatially or directionally distributed EDFs.
Therefore, this article introduces the common-slope model of late reverberation. Its fundamental idea is that large EDF sets, whose EDFs correspond to different source-receiver configurations within the same environment, can be described with one common set of decay times. Consequently, all spatial and directional EDF variations are described solely in terms of exponential and noise amplitudes. The common-slope model of late reverberation significantly reduces the degrees of freedom in spatial and directional energy decay analysis, while introducing only a small error between the modeled and the true EDF.
The common-slope model is conceptually similar to the previously described parametric model by Luizard et al. [32] because it uses multiple decaying exponentials with fixed decay times but variable amplitudes. In contrast to the model by Luizard et al., our model is descriptive and not predictive. It constitutes a concise parametric framework, allowing us to model inhomogeneous and anisotropic late reverberation. Our proposed model is also inspired by the common-acoustical-pole and residue (CAPR) model proposed by Haneda et al. [10], which can be used to interpolate and extrapolate room transfer functions by exploiting that room mode decay times are independent of the source-receiver configuration. More recently, the CAPR model has been extended to leverage frequency-band-wise processing [34].
The common-slope model extends and generalizes the previous two models in two respects. Firstly, it requires fewer parameters than the CAPR model, because it is based on common EDF slopes as opposed to common modes. This property implicitly assumes a dense modal overlap, thus limiting our model's advantage over the CAPR model to frequencies above the Schroeder frequency. Secondly, our model extends Luizard et al.'s model, because it is applicable to general room geometries with inhomogeneous and anisotropic late reverberation.
The remainder of this article is structured as follows. Section II summarizes the background and some acoustic fundamentals. Section III derives the common-slope decay model and outlines different approaches for determining the common decay times. Section IV describes the two datasets that are used throughout this article. In Section V, the common-slope model is evaluated on these two datasets. Finally, Section VI discusses the results, and Section VII concludes the article.

II. BACKGROUND
A room impulse response (RIR) describes the combined effects on sound waves traveling from a sound source at position $\mathbf{x}_s = (x_s, y_s, z_s)$ to a receiver at position $\mathbf{x}_r = (x_r, y_r, z_r)$. For directional sound sources and receivers, it is also important to consider the propagation path's direction of departure (DOD) from the source and direction of arrival (DOA) at the receiver [35]. They are denoted by $\Omega_s = (\varphi_s, \theta_s)$ and $\Omega_r = (\varphi_r, \theta_r)$, respectively, where $\varphi$ is the azimuth angle and $\theta$ is the elevation angle. For conciseness of notation, we combine source and receiver position and the propagation directions into the source-receiver configuration $\mathbf{x} = (\mathbf{x}_s, \mathbf{x}_r, \Omega_s, \Omega_r)$ throughout the remainder of this article.
RIRs can be described as a superposition of modes [1, pp. 82-88]. For a large number of modes, $M$, such a modal decomposition of the RIR is given as [1]
$$h(\mathbf{x}, t) = \sum_{m=1}^{M} \chi_m(\mathbf{x})\, \tau_m(t), \tag{1}$$
where $\chi_m(\mathbf{x})$ describes how the mode amplitudes and phases vary for different source-receiver configurations, and $\tau_m(t)$ models the temporal mode shape as decaying sinusoids with the mode frequency $\omega_m$ and decay rate $\delta_m$. Throughout this article, we assume discrete time, i.e., $t$ is the discrete-time sample index and $f_s$ is the sampling frequency. Equation (1) exhibits an important property that is central to this work. The temporal terms $\tau_m(t)$ do not depend on the source-receiver configuration $\mathbf{x}$. More precisely, they only depend on the room geometry and wall properties [1]. It is well understood that inhomogeneity can be fully described in terms of mode amplitude variations, thus not affecting the mode decay times [10]. Furthermore, the modal formulation can take anisotropy into account as well, which becomes important when considering directional sound sources or receivers.
For example, the source directivity affects how strongly individual room modes are excited. This phenomenon becomes evident when decomposing a source with arbitrary directivity into monopoles with different amplitudes and phases [36], [37]. Each monopole excites the same room modes, but with different amplitudes. As the sound field evoked by the directional source is a superposition of all monopole sound fields, the corresponding mode amplitudes are the sum of the individual monopole mode amplitudes. Consequently, the mode amplitudes will vary when the source directivity changes or a directional source rotates. However, the mode decay times are not affected by the source directivity, thus making it possible to assign a set of common mode decay times for variable source directivities and orientations.
Similarly, directional receivers affect how strongly certain modes are measured. For example, a directive microphone focuses the measurement on modes that coincide with the microphone's sensitivity, while suppressing modes from other directions. Consequently, the mode amplitudes change with varying receiver directivity and orientation. The mode decay times are unaffected by the amplification or suppression of modes due to the receiver directivity. Therefore, it is possible to choose a set of common mode decay times for describing sound energy decays with variable receiver directivities and orientations.
To summarize, the temporal terms $\tau_m(t)$ are fixed for a given room geometry and wall properties, whereas the mode amplitudes $\chi_m(\mathbf{x})$ describe how the RIR varies for different source-receiver configurations. We will refer to this property from now on as the common-decay property (CDP). The CDP was previously exploited by Haneda et al. [10], who used it in their common-acoustical-pole and residue (CAPR) model to interpolate between RIRs.

III. COMMON-SLOPE DECAY ANALYSIS
The first part of this section introduces the common-slope model for sound energy decay analysis. It is based on modeling sound energy decays of multiple source-receiver configurations with a set of common decay times. In other words, the common-slope model applies the CDP to sound energy decay analysis. The second part of this section describes different approaches for determining the common decay times.

A. The Common-Slope Model
The Schroeder backwards integration procedure [33] is commonly used to describe the sound energy decay in rooms and obtain smooth energy decay functions (EDFs), which are also called energy decay curves (EDCs) when presented graphically. EDFs are usually calculated in frequency bands, e.g., octave or fractional octave bands [16], [17]. Throughout this article, we work with EDFs in octave bands. Following the Schroeder backwards integration procedure [33], an EDF can also be calculated directly from a (bandlimited) RIR as
$$d(\mathbf{x}, t) = \sum_{l=t}^{L} h^2(\mathbf{x}, l), \tag{2}$$
where $L$ is the number of samples in the EDF. EDFs are often normalized with the total EDF energy $E(\mathbf{x}) = \sum_{l=1}^{L} h^2(\mathbf{x}, l)$. However, as this article deals with inhomogeneity and anisotropy, the normalization will not be applied here.
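As a minimal sketch of the backward integration described above, the following pure-Python function accumulates the squared RIR samples from the end of the response toward its start. The function name `schroeder_edf` and the toy RIR values are illustrative assumptions and not part of the original work; the RIR is assumed to be band-limited already, and the code uses zero-based sample indices.

```python
def schroeder_edf(rir):
    """Return the unnormalized EDF: energy remaining from sample t onward."""
    edf = [0.0] * len(rir)
    running_energy = 0.0
    # Walk backwards so that each entry holds the sum of h^2 from t to the end.
    for t in range(len(rir) - 1, -1, -1):
        running_energy += rir[t] ** 2
        edf[t] = running_energy
    return edf


# Toy band-limited RIR (hypothetical values, for illustration only).
toy_rir = [1.0, -0.5, 0.25, 0.0]
toy_edf = schroeder_edf(toy_rir)
```

The first EDF sample equals the total energy $E(\mathbf{x})$, so dividing by `toy_edf[0]` would give the normalized EDF that this article deliberately avoids.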
By squaring (1) and time-averaging over the cosine terms, one obtains a modal expression for the sound energy decay [1], [38]: If the individual mode decay rates $\delta_m$ do not differ considerably in the investigated frequency band, they may be approximated by their average value $\delta$ [1]. The single-exponential decay model $d_1(\mathbf{x}, t)$ of an RIR measurement at position $\mathbf{x}$ is then given as where The reverberation time $RT_{60}$, which describes how much time elapses until the sound energy in an enclosure has decreased by 60 dB [16], [17], can be obtained via where $\ln(\cdot)$ denotes the natural logarithm. It can be seen from (5) that the exponential amplitudes $A_{1,\mathbf{x}}$ are the only part of the decay model that depends on the source-receiver configuration $\mathbf{x}$.
There are, however, many scenarios in which the $\delta_m$ within a frequency band are dissimilar. For example, previous studies showed that coupled rooms or geometries with non-uniform absorption material distributions exhibit sound decays with multiple decay rates, and how the decay rates and amplitudes can be predicted from diffuse field assumptions and power balance equations using knowledge about the room geometry, absorption distribution, coupling factor, and the decay rates of the uncoupled rooms [30], [32], [39], [40], [41]. From a modal perspective, the different decay rates in coupled rooms can be explained by the localization of modes, which means that some modes appear primarily in either one of the coupled rooms, while contributing less to the sound field of the other [42]. Similarly, in scenarios with non-uniform absorption distributions, modes can be grouped into non-grazing and grazing incidence, i.e., modes which involve the surfaces with differing absorption properties and modes which do not [40], [41]. Generally, the modes in (1) can be grouped into multiple distinct mode groups, yielding
$$h(\mathbf{x}, t) = \sum_{k=1}^{\kappa} \sum_{m \in \mathcal{M}_k} \chi_m(\mathbf{x})\, \tau_m(t), \tag{8}$$
where $\mathcal{M}_k$ denotes a set containing all mode indices of the $k$th mode group, and $\kappa$ is the number of mode groups. By using (4) and (5) on every mode group and summing all contributions, a multi-exponential decay model can be established. Decay models consisting of multiple exponentials and a noise term are frequently used to model sound decays in coupled rooms and geometries with non-uniform absorption material distributions [15], [27], [43]. In this article, we use the previously elaborated CDP to extend a well-established multi-exponential decay model [15] to account for different source-receiver configurations. We propose the common-slope model, which is given by
$$d_\kappa(\mathbf{x}, t) = N_{0,\mathbf{x}}\, \Psi_0(t) + \sum_{k=1}^{\kappa} A_{k,\mathbf{x}} \left[ \Psi_k(t) - \Psi_k(L) \right], \tag{9}$$
with the decay kernel
$$\Psi_k(t) = \begin{cases} \exp\!\left( \dfrac{-13.8\, t}{f_s T_k} \right), & k \geq 1, \\ L - t, & k = 0. \end{cases} \tag{10}$$
In the model, $T_k$ and $A_{k,\mathbf{x}}$ are the decay times and amplitudes of the $k$th mode group, respectively, $N_{0,\mathbf{x}}$ is the amplitude of the noise term, and the constant $-13.8 \approx \ln(10^{-6})$ ensures that the sound energy has decayed by 60 dB after $T_k$ seconds. The decay times $T_k$ are obtained from the decay rates analogously to (7). The number of mode groups $\kappa$ is also called the model order.
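The constant second term in the square brackets of (9) accounts for the finite upper limit of integration during the Schroeder backwards integration and can be neglected for large $L$ [44]. Equation (9) can be written using matrix notation
$$\mathbf{d}_{\kappa,\mathbf{x}} = \boldsymbol{\Psi}\, \mathbf{a}_{\mathbf{x}}, \tag{11}$$
where the modeled decay function $\mathbf{d}_{\kappa,\mathbf{x}}$, decay kernels $\boldsymbol{\Psi}$, and amplitude values $\mathbf{a}_{\mathbf{x}}$ are given as
$$\mathbf{d}_{\kappa,\mathbf{x}} = \left[ d_\kappa(\mathbf{x}, 1), \dots, d_\kappa(\mathbf{x}, L) \right]^{\mathsf{T}}, \tag{12a}$$
$$\boldsymbol{\Psi} = \left[ \boldsymbol{\psi}_0, \boldsymbol{\psi}_1, \dots, \boldsymbol{\psi}_\kappa \right], \tag{12b}$$
$$\mathbf{a}_{\mathbf{x}} = \left[ N_{0,\mathbf{x}}, A_{1,\mathbf{x}}, \dots, A_{\kappa,\mathbf{x}} \right]^{\mathsf{T}}, \tag{12c}$$
with $(\cdot)^{\mathsf{T}}$ denoting the matrix transpose, $\boldsymbol{\psi}_k$ denoting the $k$th decay kernel (including the constant term for $k \geq 1$) in vector notation, $\mathbf{d}_{\kappa,\mathbf{x}} \in \mathbb{R}^{L \times 1}$, $\mathbf{a}_{\mathbf{x}} \in \mathbb{R}^{(\kappa+1) \times 1}$, and $\boldsymbol{\Psi} \in \mathbb{R}^{L \times (\kappa+1)}$. Equations (9) and (11) use the previously elaborated CDP because the mode group decay times $T_k$, which are equivalent to the decay times of the individual EDF slopes, do not vary with the source-receiver configuration $\mathbf{x}$. Therefore, we will call them common decay times and common slopes throughout the remainder of this article. In contrast, the decay and noise amplitudes $A_{k,\mathbf{x}}$ and $N_{0,\mathbf{x}}$ vary with the source-receiver configuration. More precisely, the $A_{k,\mathbf{x}}$ values model the mode amplitude variations, which were previously described by $\chi_m(\mathbf{x})$ in (1) and (8). The spatial dependence of the $N_{0,\mathbf{x}}$ values accounts for localized noise sources.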
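The kernel matrix can be illustrated with a short sketch. It assumes exponential kernels of the form $\exp(-13.8\,t/(f_s T_k))$, corrected by their value at $t = L$ for the finite integration limit, and a linearly decreasing noise kernel $L - t$; the noise kernel form, the column ordering (noise first), the function names, and all numeric values are assumptions made for illustration.

```python
import math


def decay_kernels(decay_times, num_samples, fs):
    """Build the L x (kappa + 1) kernel matrix as a list of rows.

    Column 0 is the noise kernel (assumed here to be L - t); columns
    1..kappa are exponentials exp(-13.8 * t / (fs * T_k)) minus their value
    at t = L, which corrects for the finite integration limit.
    """
    rows = []
    for t in range(num_samples):
        row = [float(num_samples - t)]
        for T_k in decay_times:
            kernel_t = math.exp(-13.8 * t / (fs * T_k))
            kernel_end = math.exp(-13.8 * num_samples / (fs * T_k))
            row.append(kernel_t - kernel_end)
        rows.append(row)
    return rows


def model_edf(kernels, amplitudes):
    """Evaluate d = Psi a for amplitudes a = [N_0, A_1, ..., A_kappa]."""
    return [sum(p * a for p, a in zip(row, amplitudes)) for row in kernels]


# Hypothetical two-slope example: T_1 = 0.4 s, T_2 = 1.5 s at fs = 100 Hz.
psi = decay_kernels([0.4, 1.5], num_samples=200, fs=100.0)
edf = model_edf(psi, [1e-6, 1.0, 0.1])
```

Because the decay times enter only through `psi`, the same matrix can be reused for every source-receiver configuration, and only the amplitude vector changes.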
After determining the common decay times $T_k$, the remaining model parameters $A_{k,\mathbf{x}}$ and $N_{0,\mathbf{x}}$ need to be estimated for all source-receiver configurations. With fixed decay times, this endeavor simplifies to a constrained linear least-squares problem
$$\mathbf{a}_{\mathbf{x}} = \boldsymbol{\Psi}^{\dagger}\, \mathbf{d}_{\mathbf{x}}, \tag{13}$$
where $(\cdot)^{\dagger}$ denotes the pseudo-inverse and $\mathbf{d}_{\mathbf{x}}$ the measured EDF in vector notation analogous to (12a). To obtain meaningful solutions, the problem has to be constrained, such that $A_{k,\mathbf{x}} \geq 0$ and $N_{0,\mathbf{x}} \geq 0$.
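The non-negativity constraint can be handled in several ways, and the text does not prescribe a solver. The sketch below swaps in cyclic coordinate descent, a simple dependency-free method for non-negative least squares, purely for illustration; the function name, the sweep count, and the tiny test system are assumptions. In practice, a dedicated non-negative least-squares routine would likely be preferable.

```python
def nnls_coordinate_descent(kernels, edf, num_sweeps=200):
    """Minimize ||Psi a - d||^2 subject to a >= 0 by cyclic coordinate descent.

    `kernels` is Psi as a list of rows; `edf` is the measured EDF d.
    """
    num_cols = len(kernels[0])
    columns = [[row[j] for row in kernels] for j in range(num_cols)]
    col_norms = [sum(v * v for v in col) for col in columns]
    amplitudes = [0.0] * num_cols
    residual = list(edf)  # equals d - Psi a, with a = 0 initially
    for _ in range(num_sweeps):
        for j, col in enumerate(columns):
            if col_norms[j] == 0.0:
                continue
            # Unconstrained optimum for coordinate j, then clip at zero.
            step = sum(c * r for c, r in zip(col, residual)) / col_norms[j]
            new_value = max(0.0, amplitudes[j] + step)
            delta = new_value - amplitudes[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                amplitudes[j] = new_value
    return amplitudes


# Tiny hypothetical system whose unconstrained solution is [1, 2].
psi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fit = nnls_coordinate_descent(psi, [1.0, 2.0, 3.0])
# Here the unconstrained solution would be [1, -1]; the constraint clips it.
clipped = nnls_coordinate_descent(psi, [1.0, -1.0, 0.0])
```

The second call shows why the constraint matters: an unconstrained fit would return a negative amplitude, which has no physical meaning for an energy decay.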

B. Determination of Common Decay Times
The mode decay rates and amplitudes follow certain distributions and their relationship to EDFs was established by Kuttruff [1], [38]. While mode amplitude distributions have been analytically described for certain well-established scenarios like shoebox rooms [1], less is known about the mode decay times. Given an arbitrary RIR set, the entire modal decomposition of the sound field is usually not available. Consequently, the determination of common decay times is in practice not as straightforward as described in the previous section.
The following section therefore investigates four approaches for determining common decay times in general settings. All approaches use an RIR set as their input and determine a common-slope set as their output. In the most general case, an RIR set may consist of RIRs with spatial variations (i.e., different source and receiver positions) and directional variations (i.e., different source directivities, source orientations, receiver directivities, and receiver orientations). In this article, the analyses are based on octave band processing, resulting in one set of common decay times per octave band, but the analyses can also be carried out in frequency bands with other bandwidths. However, for very low frequencies, where room modes are sparsely distributed in the frequency domain, it may be necessary to model individual modes analogously to the CAPR model by Haneda et al. [10].
Fig. 1 summarizes the common-slope analysis. Please note that the figure only features the last of the four approaches for determining common decay times because it is the most general and robust, and therefore the preferable, approach.
1) Average EDFs: The first approach for obtaining common decay times is based on averaging EDFs of different source-receiver configurations
$$\bar{d}(t) = \frac{1}{N} \sum_{n=1}^{N} d(\mathbf{x}_n, t), \tag{14}$$
where $N$ is the number of source-receiver configurations. This averaging step is analogous to the spatial averaging for reverberation time measurements outlined by ISO 3382-2 [17]. The resulting average EDF $\bar{d}(t)$ can be analyzed with standard decay analysis approaches [45], [46], directly yielding the common decay times $T_k$. However, the standard only covers averaging for single-exponential sound energy decays, and extending it to multi-exponential decays is not straightforward due to the need for manual selection of EDFs to average. In a coupled room scenario, the faster decay is often masked by the slower decay when measured in the more reverberant room, making it harder to observe both slopes. Fig. 2 illustrates this phenomenon on coupled room measurements from the Room Transition dataset [47], [48].
Averaging problems may occur if there are more single-slope measurements from the more reverberant room than double-slope measurements from the less reverberant room. This results in the faster decay being lost during averaging. While this issue can be resolved by only averaging EDFs from the less reverberant room, it is unclear how well this approach generalizes to more complex coupled geometries.
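The averaging step of this first approach can be sketched as follows. The sketch assumes that the EDFs are given on a linear energy scale, have equal length, and are weighted equally; the function name and the toy values are hypothetical.

```python
def average_edf(edfs):
    """Average equally long EDFs (one list per configuration) per sample."""
    if not edfs:
        raise ValueError("need at least one EDF")
    num_configs = len(edfs)
    return [sum(samples) / num_configs for samples in zip(*edfs)]


# Hypothetical linear-scale EDFs of two source-receiver configurations.
edf_meeting_room = [4.0, 2.0, 1.0]
edf_hallway = [2.0, 1.0, 0.0]
mean_edf = average_edf([edf_meeting_room, edf_hallway])
```

Since every configuration enters with the same weight, a set dominated by single-slope EDFs pulls the average toward the slower decay, which is the masking problem described above.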
2) Average $T_{k,\mathbf{x}}$ Values: The second approach for obtaining common decay times is based on spatial or directional averaging of decay time values. In a first step, the sound energy decays of all RIRs are analyzed with a standard decay analysis method [45], [46]. This step yields an EDF parameterization in terms of $T_{k,\mathbf{x}}$, $A_{k,\mathbf{x}}$, and $N_{0,\mathbf{x}}$, where the decay times $T_{k,\mathbf{x}}$ still exhibit inhomogeneity and anisotropy, as indicated by the subscript. In a second step, the decay time values $T_{k,\mathbf{x}}$ are averaged over the source-receiver configuration, resulting in the common decay times $T_k$. This approach is analogous to the reverberation time measurement procedure outlined by ISO 3382-1 and ISO 3382-2, in which reverberation time values are averaged over multiple source and receiver positions [16], [17]. For the coupled room example above, this approach features the same drawback as the previous one if single-slope decays from the more reverberant room are blindly analyzed with the same double-slope decay model that is used for the double-slope EDFs from the less reverberant room. The fitting algorithms will either return the same decay time twice or return an incorrect second decay time with near-zero decay amplitude. While this issue can be resolved by manually selecting and grouping decay time values, it is not feasible for large-scale automatic analyses.
Fig. 1. Although not explicitly depicted in the figure, it is important to understand that source-receiver configurations describe the entire source-receiver configuration, i.e., the position and orientation of both. The common-slope analysis of an acoustic environment involves three steps. Firstly, the EDFs $d(\mathbf{x}, t)$ of all available source-receiver configurations are analyzed with a standard decay analysis algorithm [45], [46] to obtain configuration-dependent $T_{k,\mathbf{x}}$ values. Secondly, the $T_{k,\mathbf{x}}$ values are clustered into $\kappa$ mode groups to obtain the common decay times $T_k$. Lastly, the common-slope amplitudes $A_{k,\mathbf{x}}$ of the common-slope model [cf. (9) and (10)] are determined via a least-squares fit to the EDFs [cf. (13)].
Fig. 2. Energy decay curves (EDCs) measured in a coupled room geometry consisting of a less reverberant meeting room (source-receiver configuration $\mathbf{x}^{(m)}$) and a more reverberant hallway (source-receiver configuration $\mathbf{x}^{(h)}$). In the hallway, only the slower decay time is observable. For both measurements, the sound source was located in the meeting room. The measurements are part of the Room Transition dataset [47], [48], with receiver positions at $x_r^{(m)} = 0$ cm and $x_r^{(h)} = 500$ cm, respectively [cf. Fig. 4]. The EDCs were calculated in the 1 kHz octave band and the decay times were obtained with the DecayFitNet [45]. Both EDCs are based on measured RIRs with background noise, thus featuring the characteristic noise floor bump at their ends.
3) Choose $T_k$ Values With Minimum Fitting Error on all EDFs: Another approach for determining common decay times is to estimate $T_{k,\mathbf{x}}$ values for all source-receiver configurations and choose the combination that best fits all EDFs in the set. The $T_{k,\mathbf{x}}$ values can be estimated using a standard decay analysis method [45], [46], and candidate fits can be obtained with a constrained least-squares fit [cf. (13)]. Finally, the goodness of fit can be evaluated for all $T_{k,\mathbf{x}}$-EDF combinations using an error metric like the mean squared error (MSE). However, this approach becomes impractical for large RIR sets due to the high number of combinations that need to be analyzed.
Fig. 3. K-means clustering of decay times (1 kHz octave band) in a coupled room geometry consisting of an acoustically treated meeting room and a reverberant hallway. The decay times were obtained from simulations contained in the Extended Room Transition dataset (variant 1, cf. Section IV). The DecayFitNet [45] was used for the decay analysis.

4) K-Means Clustering of $T_{k,\mathbf{x}}$ Values: To overcome the issues with the previous approaches, decay times can be automatically clustered based on their mode group using the k-means clustering algorithm. This algorithm divides a set of data points into multiple clusters (determined by the number of mode groups, $\kappa$, in this case) based on their distance (e.g., squared Euclidean distance) to the cluster mean [49], [50].
To determine the common decay times using the k-means clustering approach, the EDFs of the RIR set are first analyzed using an established decay analysis method [45], [46]. The resulting $T_{k,\mathbf{x}}$ values are then pooled and clustered into $\kappa$ clusters, corresponding to the mode groups expected for the room geometry. The number of mode groups can be visually determined from a histogram of the $T_{k,\mathbf{x}}$ values, where distinct clusters appear as separate groups. The largest histogram bin for each cluster is identified, and the centers of these bins correspond to the common decay times $T_k$. Fig. 3 illustrates this approach.
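The clustering step can be sketched with a small one-dimensional k-means (Lloyd's algorithm). This is a simplified illustration under stated assumptions: the initialization is a deterministic spread over the sorted data, the cluster means are returned directly instead of the centers of the largest histogram bins, and the pooled decay times are hypothetical values.

```python
def kmeans_1d(values, num_clusters, num_iterations=100):
    """Cluster scalar decay times with Lloyd's algorithm (squared distance)."""
    ordered = sorted(values)
    # Deterministic initialization: spread the centers over the sorted data.
    centers = [
        ordered[(2 * k + 1) * len(ordered) // (2 * num_clusters)]
        for k in range(num_clusters)
    ]
    for _ in range(num_iterations):
        clusters = [[] for _ in centers]
        for value in values:
            nearest = min(
                range(num_clusters),
                key=lambda k: (value - centers[k]) ** 2,
            )
            clusters[nearest].append(value)
        # Keep the old center if a cluster happens to be empty.
        updated = [
            sum(members) / len(members) if members else centers[k]
            for k, members in enumerate(clusters)
        ]
        if updated == centers:
            break
        centers = updated
    return sorted(centers)


# Hypothetical pooled decay times (seconds) from many configurations.
pooled_times = [0.41, 0.43, 0.45, 0.44, 1.50, 1.55, 1.48, 1.53, 0.42]
common_times = kmeans_1d(pooled_times, num_clusters=2)
```

For this toy input, the two centers settle near 0.43 s and 1.515 s. Replacing each cluster mean by the center of its largest histogram bin, as described above, would make the estimate less sensitive to outlying decay times within a cluster.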

IV. DATASETS UNDER INVESTIGATION
We use two datasets throughout this article to derive the common-slope decay analysis, illustrate some of its concepts, and evaluate the proposed approach. Both datasets are based on the room geometry illustrated in Fig. 4.
The first dataset is part of the Room Transition dataset (RTD) by McKenzie et al. [47], [48], which contains higher-order Ambisonic room impulse responses of transitions between coupled rooms. The responses were measured with a coaxial loudspeaker (Genelec 8331A) and a higher-order spherical microphone array (mh acoustics em32 Eigenmike). We limit our analyses to measurements from the "meeting room to hallway" transition, where the first room is an acoustically treated meeting room (4.6 m × 6.6 m × 2.8 m; volume: approx. 85 m$^3$) and the second room is a more reverberant hallway (4.5 m × 18 m × 2.8 m; volume: approx. 227 m$^3$). The measured transition is 5 m long, centered around the door connecting the rooms, and features receiver positions every 0.05 m. Consequently, the transition consists of $N^{(\mathrm{RTD})} = 101$ RIRs in total. We only consider the "source in meeting room, no line-of-sight" (NLOS) configuration, i.e., the sound source remains at the position $\mathbf{x}_s^{(\mathrm{NLOS})}$ for all measurements. The energy decay functions of this transition feature a distinct double-slope characteristic, which is typical for coupled room geometries [27], [39].
The second dataset is the Extended Room Transition dataset (ERTD). It was compiled specifically for this article and contains finite-difference time-domain (FDTD) simulations. It extends the "meeting room to hallway" transition of the RTD by additional receiver positions in both rooms. The ERTD contains $N^{(\mathrm{ERTD})} = 2833$ RIRs in total. For all RIRs, the sound source position was chosen according to the "source in meeting room, no line-of-sight" (NLOS) configuration of the RTD. While the RTD only features receiver positions on a straight line between the rooms, the Extended Room Transition dataset samples the entire room geometry using a uniform grid with 0.2 m resolution. All simulations are performed on the horizontal plane at 1.55 m height. An open-source FDTD solver [51] (3D standard rectilinear scheme, sampling frequency 80 kHz, omnidirectional soft source) was used for the simulations. The room geometry was modeled according to the geometry specifications described by McKenzie et al. [47]. We simulated two variants of the dataset. In variant ERTD1, the wall absorption properties were assigned to approximately match the corresponding measurements. The absorption coefficient was assigned uniformly to all walls in the respective rooms, where the meeting room walls exhibited an absorption coefficient $\alpha_{R1} = 0.12$, while the hallway was modeled as less absorbent with $\alpha_{R2} = 0.042$. In variant ERTD2, a significantly higher absorption coefficient $\alpha_{R3} = 0.48$ is assigned to the right wall of the hallway, while all other surfaces are modeled analogously to ERTD1. We use ERTD2 in our evaluation to demonstrate that the common-slope model can also be used in environments with very non-uniform absorption distributions.
Fig. 5. Common-slope analysis results (1 kHz octave band) of a transition between coupled rooms. The investigated geometry consists of a meeting room and a more reverberant hallway (cf. Section IV; the transition is indicated by the blue line in Fig. 4). The analyses are based on (a) measurements from the Room Transition dataset [47], [48] and (b) FDTD simulations from the Extended Room Transition dataset. The common decay times were obtained with the k-means clustering approach (cf. Section III-B4). In both plots, the $A_{2,\mathbf{x}}$ values indicate that the slope with the longer decay time $T_2$ is getting stronger while approaching and entering the more reverberant hallway. The common-slope model introduces only a small decibel-based mean squared error (dB-MSE) between modeled and true energy decay functions [cf. (16)]. A perfect fit would yield a dB-MSE of 0 dB.

V. EVALUATION

A. Room Transition Along a Straight Line: Spatial Analysis
We first apply the common-slope analysis to RIRs from the room transition along a straight line, as indicated by the blue line in Fig. 4. Our analysis is based on the 1 kHz octave band of the RTD (omnidirectional channel) and ERTD1.
The following steps were carried out analogously for both RIR sets. The DecayFitNet [45] was used to analyze the EDFs $d(\mathbf{x}, t)$. The resulting $T_{k,\mathbf{x}}$ were clustered using the k-means approach with $\kappa = 2$, as coupled room geometries usually have two mode groups and the histogram of the $T_{k,\mathbf{x}}$ values showed two clear clusters. The common decay times for the RTD and the ERTD1 amounted to {T = 1.48 s}, respectively, indicating good agreement between measured and simulated sound energy decays. $T_1$ and $T_2$ correspond to the decay times of the acoustically treated meeting room and the reverberant hallway, respectively. Finally, the decay and noise amplitudes $A_{k,\mathbf{x}}$ and $N_{0,\mathbf{x}}$ were determined via a linear least-squares fit [cf. (13)].
Fig. 5(a) and (b) summarize all results of the common-slope analysis for the RTD and ERTD1, respectively. The common decay times are indicated by straight horizontal lines because they are by definition modeled as constant along the entire transition.
The decay amplitudes $A_{1,\mathbf{x}}$ and $A_{2,\mathbf{x}}$ vary considerably with the transition position. Generally, $A_{2,\mathbf{x}}$, which corresponds to the more reverberant hallway, is smaller for receiver positions in the meeting room and gradually increases during the transition through the door. The decay amplitudes $A_{1,\mathbf{x}}$ exhibit an inverse effect, i.e., they decrease while transitioning from the meeting room to the hallway. However, the faster-decaying slope is quickly masked by the slower-decaying slope for receiver positions further in the hallway, thus making $A_{1,\mathbf{x}}$ difficult to detect. Therefore, we omitted $A_{1,\mathbf{x}}$ values below −40 dB from Fig. 5.
Interestingly, the $N_{0,\mathbf{x}}$ values in Fig. 5(a) also vary considerably over the entire transition, likely due to a localized noise source during the measurements. After consulting with the authors of the RTD, the noise source was identified as a printer located in the hallway, which explains why the noise values there are consistently higher than in the meeting room. Although no explanation was found for the intermediate peaks, manual analysis of EDFs confirmed that the increased noise floor is part of the measurements and not due to fitting errors. To demonstrate the robustness of the common-slope analysis, a constant noise floor was added to the ERTD1 simulations. Fig. 5(b) shows that the $N_{0,\mathbf{x}}$ values are estimated correctly as approximately constant over the entire transition.
Fig. 6. Common-slope analysis results (1 kHz octave band) of a transition between coupled rooms. The analysis is based on measurements from the Room Transition dataset [47], [48] (cf. Section IV; the transition is indicated by the blue line in Fig. 4). The investigated geometry consists of a meeting room ($T_1 = 0.43$ s) and a more reverberant hallway ($T_2 = 1.53$ s). Directional analysis results are obtained from the higher-order Ambisonic RIRs via beamforming. Plots (a) and (b) show the $A_{1,\mathbf{x}}$ and $A_{2,\mathbf{x}}$ values, respectively, for various azimuth angles and fixed elevation angle $\theta = 0$. Plots (d) and (e) show the $A_{1,\mathbf{x}}$ and $A_{2,\mathbf{x}}$ values, respectively, for various elevation angles and fixed azimuth angle $\varphi = 90^\circ$. The directional analysis highlights the anisotropy of the late reverberant sound field and demonstrates how the two decay processes cross-fade during the transition between the rooms. The common-slope model introduces only a small decibel-based mean squared error (dB-MSE) between modeled and true energy decay functions [cf. (16)], as depicted in Plots (c) and (f). A perfect fit would yield a dB-MSE of 0 dB.
The common-slope analysis does not introduce significant errors, as evidenced by the decibel-based mean squared error (dB-MSE) curves in Fig. 5. We define the dB-MSE in (16), where the true EDF $d^{(\mathrm{dB})}(x, t)$ and its common-slope model $d^{(\mathrm{dB})}_{\kappa}(x, t)$ are represented on a logarithmic scale in dB. The plots show that the dB-MSE is close to 0 dB with no considerable peaks along the transition. The average dB-MSE values for the RTD and the ERTD1 amount to 0.37 dB and 0.31 dB, respectively, indicating that the common-slope decay model is a suitable description of the inhomogeneous energy decays featured in both datasets. The common-slope model reduces the number of source-receiver-configuration-dependent parameters from 5 (2 decay times, 2 decay amplitudes, 1 noise amplitude) to 3 (2 decay amplitudes, 1 noise amplitude), and increases the interpretability of results compared to a representation where both decay times and amplitudes vary.
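As a rough illustration of such an error measure, the following sketch computes a mean squared error between two EDFs after converting both to dB. It takes the name of the measure literally and averages over all time samples; the exact normalization of (16), the function name `db_mse`, and the small `eps` guard are assumptions made here for illustration.

```python
import numpy as np

def db_mse(edf_true, edf_model, eps=1e-12):
    """Mean squared error between two EDFs on a dB scale.

    Assumption: the squared dB difference is averaged over all time samples.
    """
    true_db = 10.0 * np.log10(np.maximum(edf_true, eps))
    model_db = 10.0 * np.log10(np.maximum(edf_model, eps))
    return float(np.mean((true_db - model_db) ** 2))

# Synthetic single-slope EDF that decays by 60 dB within 1 s.
t = np.linspace(0.0, 1.0, 1000)
edf = np.exp(-np.log(1e6) * t)

print(db_mse(edf, edf))        # identical EDFs
print(db_mse(edf, 2.0 * edf))  # model with a constant level offset
```

A constant level offset between model and measurement contributes the square of that offset in dB, so the measure penalizes amplitude errors uniformly along the decay.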

B. Room Transition Along a Straight Line: Spatial and Directional Analysis
In this section, we extend the preceding analysis with directional information, i.e., the analyzed RIR set now exhibits both inhomogeneity and anisotropy. Directional sound energy decay variations can be described with directional EDFs (DEDFs), as proposed by Berzborn and Vorländer [28]. We obtain the $J$ directionally constrained beamformer output signals $S \in \mathbb{R}^{J \times L}$ by steering axisymmetric beamformers

$$S = [s_1, s_2, \dots, s_J]^{\mathsf{T}} = A h,$$

where $s_\xi$ is the $\xi$th beamformer output, and $A \in \mathbb{R}^{J \times (N+1)^2}$ and $h \in \mathbb{R}^{(N+1)^2 \times L}$ denote the analysis matrix and the $N$th-order Ambisonic RIR,⁴ respectively. The analysis matrix $A$ can be calculated from the axisymmetric beamformer weights $c_N \in \mathbb{R}^{N \times 1}$ and the spherical harmonics $y(\Omega_\xi) \in \mathbb{R}^{(N+1)^2 \times 1}$ evaluated at the beamformer steering directions $\Omega_\xi$ as

$$A = [y(\Omega_1), y(\Omega_2), \dots, y(\Omega_J)]^{\mathsf{T}} \, \mathrm{diag}_N(c_N).$$

The term $\mathrm{diag}_N(\cdot)$ formalizes $\mu$ times repeating the $\nu$th axisymmetric beamformer weight, with $\mu$ and $\nu$ being the spherical harmonic (SH) degree and order, respectively. We employ a spherical Butterworth beamformer [52, Table 3.1], which exhibits a high front-to-back separation and has been successfully used in room acoustic analysis and Ambisonic RIR processing applications [53], [54]. The axisymmetric weights of the spherical Butterworth beamformer are given in [52, Table 3.1]; we set the Butterworth beamformer order $\gamma = 5$ and the cut-on SH order $\nu_c = 3$. Finally, the DEDFs $d(x, t)$ are obtained by applying the Schroeder backwards integration procedure [c.f. (3)] to the beamformer outputs. The source-receiver configuration $x$ now represents source and receiver position as well as DOD and DOA.

⁴ We do not explicitly define a channel ordering here because it is not relevant for this work as long as it is used consistently along the entire analysis pipeline. The maximum spherical harmonic order $N$ shall not be confused with the noise term $N_0$ of the decay model.
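The processing chain from an Ambisonic RIR to directional EDFs can be sketched as follows. The sketch assumes that the beamformer outputs are the product of the analysis matrix and the Ambisonic RIR, as the stated matrix dimensions suggest, and it uses an unnormalized backward integration. The analysis matrix is a random placeholder rather than a Butterworth design, and all names and sizes are hypothetical.

```python
import numpy as np

def beamformer_outputs(A, h):
    """Apply a J x (N+1)^2 analysis matrix to an (N+1)^2 x L Ambisonic RIR."""
    return A @ h

def schroeder_edf(signals):
    """Backward-integrate the squared signals along the time axis."""
    energy = signals ** 2
    return np.cumsum(energy[..., ::-1], axis=-1)[..., ::-1]

rng = np.random.default_rng(0)
N, J, L = 1, 4, 2000                                # hypothetical order, directions, samples
A = rng.standard_normal((J, (N + 1) ** 2))          # placeholder analysis matrix
decay = np.exp(-np.arange(L) / 300.0)               # synthetic exponential envelope
h = rng.standard_normal(((N + 1) ** 2, L)) * decay  # synthetic decaying noise "RIR"

dedf = schroeder_edf(beamformer_outputs(A, h))
print(dedf.shape)
```

Each row of `dedf` is one directional EDF. It starts at the total energy of the corresponding beamformer output and never increases over time.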
The common-slope analysis follows the same procedure as in the previous section. Fig. 6 summarizes the analysis results, revealing how the two decay processes of the sub-rooms cross-fade during the transition. The reverberation of the more reverberant hallway ($T_2 = 1.53$ s) gradually fades in while approaching the door and eventually passing through it. In the hallway, the longer decay predominates and masks the reverberation of the meeting room ($T_1 = 0.43$ s).
Fig. 6(a), (b), (d), and (e) show anisotropic late reverberation in the investigated environment for various azimuth and elevation angles. In the meeting room, the reverberation of the hallway is most prominent when steering the beamformer toward the door. Higher decay amplitudes can be observed when the beamformer is steered toward $\varphi = \pm 90^\circ$. This effect might be caused by the non-uniform absorption distribution and uneven excitation of the room due to the directional loudspeaker. Fig. 6(d) and (e) highlight the anisotropy of the late reverberation with respect to different elevation angles, which is caused by the non-uniform absorption distribution in the rooms (e.g., carpet in the meeting room).

C. Two-Dimensional Receiver Grid
In this section we extend the analysis of Section V-A to the entire room geometry depicted in Fig. 4. We carry out the common-slope analysis analogously to the previous section.
For the ERTD1 variant, we obtain the common decay times $T_1 = 0.46$ s and $T_2 = 1.51$ s. Fig. 7(a) shows the spatial variations of the $A_{1,x}$ values. They are generally higher in the meeting room, where they slightly decrease with increasing distance to the sound source. The $A_{1,x}$ values are somewhat larger in the lower half of the hallway, potentially due to shadowing caused by the wall. While the wall shadows the sound for the upper half of the hallway, some sound can directly pass through the door to reach the lower half of the meeting room. Fig. 7(b) shows the spatial variations of the $A_{2,x}$ values. In the meeting room, they are on average 6 dB to 8 dB lower than in the hallway. The plot also shows the previously described fade-in behavior when transitioning from the meeting room to the hallway, with higher amplitudes in front of the door that gradually decrease towards the sides.
Lastly, Fig. 7(c) illustrates the dB-MSE [c.f. (16)] introduced by the common-slope analysis. The dB-MSE is well below 3 dB for almost all receiver positions, with an average dB-MSE of 0.36 dB. The highest errors occur near the sound source, caused by steep sound energy drops after the direct sound that cannot be described by the multi-exponential model. Slightly increased errors can be observed at receiver positions far from the door, where the obstructed direct sound leads to longer horizontal sections in the early part of the EDFs.
Fig. 8 summarizes the common-slope analysis results for the ERTD2 variant. A new mode group emerges due to the …

Table I shows the dB-MSE between EDFs of the ERTD and various models. It compares the traditional multi-exponential [15] and single-exponential model [55] with the proposed common-slope model [c.f. (9) and (10)]. While the traditional multi-exponential model yields the lowest mean and median dB-MSEs for all octave bands, the fitting performance of the proposed common-slope model is only slightly inferior, despite requiring only approximately half of the source-receiver-configuration-dependent parameters. In contrast, the traditional single-exponential model is clearly not general enough for some source-receiver configurations in the coupled room geometry, as indicated by the considerably increased mean and 95% quantile values. However, the median error of the traditional single-exponential model is comparable to the other two models due to the large number of receiver positions in the more reverberant hallway ($N^{(\mathrm{ERTD})}_{\mathrm{hallway}} = 2070$, $N^{(\mathrm{ERTD})}_{\mathrm{meeting}} = 759$). Despite the room coupling, most EDFs with receiver positions in the hallway can be described with a single-exponential model because the faster energy decay of the meeting room is masked by the slower energy decay of the hallway (c.f. Fig. 2). This effect is less pronounced for the ERTD2.
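The phrase "approximately half" can be checked with a short count of parameters per source-receiver configuration. The symbols $P_{\text{multi-exp}}$, $P_{\text{common-slope}}$, and $M$ are introduced here for illustration only. The totals assume one EDF per receiver position in a single octave band, i.e., $M = 2070 + 759 = 2829$, and model order $\kappa = 2$.

```latex
% Parameters per source-receiver configuration for model order \kappa
\[
P_{\text{multi-exp}} = \underbrace{\kappa}_{T_{k,x}} + \underbrace{\kappa}_{A_{k,x}} + \underbrace{1}_{N_{0,x}} = 2\kappa + 1,
\qquad
P_{\text{common-slope}} = \underbrace{\kappa}_{A_{k,x}} + \underbrace{1}_{N_{0,x}} = \kappa + 1 .
\]

% Ratio of configuration-dependent parameters
\[
\frac{\kappa + 1}{2\kappa + 1} = \frac{3}{5} = 0.60 \quad (\kappa = 2),
\qquad
\frac{\kappa + 1}{2\kappa + 1} = \frac{4}{7} \approx 0.57 \quad (\kappa = 3).
\]

% Totals for M = 2829 EDFs and \kappa = 2, including the \kappa shared decay times
\[
(2\kappa + 1)\, M = 5 \cdot 2829 = 14145
\quad \text{versus} \quad
(\kappa + 1)\, M + \kappa = 3 \cdot 2829 + 2 = 8489 .
\]
```

The shared decay times add only $\kappa$ values to the total, so the saving per configuration approaches one half as the model order grows.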
The common-slope model yields good fitting results for all octave bands with mean dB-MSEs well below 1 dB. The increased model order in the 250 Hz and 500 Hz bands can be explained by additional mode groups. At lower frequencies, axial and tangential modes are predominant, whereas the higher frequencies are dominated by oblique modes [56]. The different mode types are characterized by the number of walls that are involved during their formation, and consequently, their decay times may vary considerably [56].
Firstly, the multi-exponential decay model is well-established for describing sound energy decays in rooms [15], [44]. Therefore, EDFs of different source-receiver configurations can be easily expressed with distinct model parameter combinations, i.e., by varying the decay times, decay amplitudes, and noise amplitudes. In fact, modifying all parameters at the same time may be an overparameterization. According to Lanczos [57], the separation of exponentials, i.e., the decomposition of a decay function into a linear combination of exponentials with initially unknown decay times [c.f. (9) and (10)], is a highly ill-conditioned problem.⁵ In other words, due to an inherent lack of measurement accuracy, two decay models with considerably different decay times, decay amplitudes, and model orders can result in numerically equivalent fits to a given measurement. It is therefore not surprising that previous studies found a combination of decay time and amplitude variations to yield suitable EDF fits. However, the common-slope analysis provides a more compact representation of the variations in terms of only decay amplitudes, thus making the results easier to interpret. For example, the evaluation in Section V demonstrated that phenomena like the fade-in of slopes in coupled room geometries become evident by using the common-slope analysis.
Secondly, for certain EDF variations, variable decay times would actually yield a numerically better fit than the common-slope model. For example, let us assume a room that features modes with decay times uniformly distributed between 1.2 s and 1.8 s. Furthermore, we consider two different source-receiver configurations in this room. For the first configuration, $x^{(1)}$, the mode amplitudes are zero for all modes with decay times above 1.5 s, whereas, for the other configuration, $x^{(2)}$, they are zero for all modes below 1.5 s. In such an admittedly quite extreme scenario, we would expect the two configurations to exhibit numerically best fits with different decay times, such as $T_{1,x^{(1)}} = 1.35$ s and $T_{1,x^{(2)}} = 1.65$ s, respectively. This observation is also the reason why the unconstrained decay analysis yields decay time distributions as displayed in the histogram of Fig. 3. Moreover, it explains why the common-slope model introduces a small error at certain positions.
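The scenario can be mimicked numerically with a minimal sketch. It represents the two configurations by noiseless single-slope EDFs with decay times of 1.35 s and 1.65 s and fits both with one common slope at 1.5 s. The exponent $\ln(10^6) \approx 13.8$, the omission of the noise term, the linear-scale least-squares fit, the 1 s analysis window, and all function names are assumptions made for illustration, not the fitting procedure of the paper.

```python
import numpy as np

LN_1E6 = np.log(1e6)  # about 13.8: 60 dB of decay within one decay time

def decay_basis(t, decay_times):
    """Columns exp(-ln(1e6) * t / T_k), one column per decay time."""
    return np.exp(-LN_1E6 * t[:, None] / np.asarray(decay_times)[None, :])

def fit_amplitudes(edfs, t, common_times):
    """Least-squares amplitudes (one row per EDF) for fixed decay times."""
    basis = decay_basis(t, common_times)               # shape (L, kappa)
    amps, *_ = np.linalg.lstsq(basis, edfs.T, rcond=None)
    return amps.T                                      # shape (M, kappa)

def db_mse(a, b):
    """Mean squared difference of two positive EDFs on a dB scale."""
    return float(np.mean((10 * np.log10(a) - 10 * np.log10(b)) ** 2))

t = np.linspace(0.0, 1.0, 500)
edfs = decay_basis(t, [1.35, 1.65]).T    # two noiseless EDFs, shape (2, L)
amps = fit_amplitudes(edfs, t, [1.5])    # one common slope for both
model = amps @ decay_basis(t, [1.5]).T   # modeled EDFs, shape (2, L)

for k in range(2):
    print(k, amps[k, 0], db_mse(edfs[k], model[k]))
```

With a fixed common decay time, only the amplitude can adapt: it falls below one for the faster-decaying EDF and above one for the slower-decaying EDF, and a nonzero dB error remains in both cases. Fitting each EDF with its own decay time would reproduce it exactly.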
In summary, the goal of the common-slope analysis is not to find a numerically perfect fit to the measurement, because such a fit may not even exist due to the ill-conditioned nature of the fitting problem [57]. In contrast, the goal is to find a set of common decay times, which are representative of the underlying room geometry and introduce a low error for all considered source-receiver configurations. This objective can be seen as analogous to the reverberation time measurement according to ISO [16], [17], which aims to find a reverberation time value that is representative of the measured room. The common-slope analysis extends this concept by introducing multiple slopes and directional analysis.
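One way to obtain representative decay times from many per-configuration estimates is to cluster them. The sketch below runs a plain one-dimensional Lloyd's algorithm (k-means) on synthetic decay time estimates. The quantile-based initialization, the function name `kmeans_1d`, and the synthetic data are assumptions made for illustration; any preprocessing or weighting used in the actual clustering approach is not reproduced here.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100):
    """Lloyd's algorithm on scalar values; returns sorted cluster centers."""
    values = np.asarray(values, dtype=float)
    # Deterministic initialization at evenly spaced quantiles.
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign every value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)

rng = np.random.default_rng(1)
# Synthetic per-configuration decay time estimates scattered around two values.
estimates = np.concatenate([
    0.45 + 0.03 * rng.standard_normal(300),
    1.50 + 0.10 * rng.standard_normal(700),
])
common_times = kmeans_1d(estimates, k=2)
print(common_times)
```

The resulting cluster centers play the role of the common decay times, after which only the amplitudes need to be fitted for each configuration.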
The common-slope analysis is heavily inspired by the common-acoustical-pole and residue (CAPR) model by Haneda et al. [10]. However, the common-slope analysis abstracts away the underlying modal nature of the sound field, and describes only the energy decay variations in terms of exponential slopes. In the common-slope model, the degrees of freedom and consequently the number of model parameters are fairly small. More precisely, the common-slope model has $(2\kappa + 1)$ parameters for each frequency band, thus enabling a compact representation of inhomogeneous and anisotropic energy decay. In contrast, the CAPR model requires a large number of parameters, because all room modes must be modeled individually. Due to the compactness of the common-slope model, it can be used to drive computationally efficient room acoustic simulators, such as [31].
For the special case of model order $\kappa = 2$, the common-slope model can be transformed into the parametric diffusion equation solution proposed by Luizard et al. [32]. Their parametric model was intended to describe the inhomogeneous energy decay in coupled rooms by adjusting the decay amplitudes of two exponentials with fixed decay times according to the source-receiver distance and two heuristically determined parameters. Our model extends the model of Luizard et al. to general geometries and reverberation that is both inhomogeneous and anisotropic.
Lastly, the applicability of the common-slope model in various frequency bands should be discussed. The analyses presented in the preceding section showed that the common-slope model works well in the octave bands from 250 Hz to 2000 Hz. In these frequency bands, it yields fitting results that are comparable to the traditional multi-exponential model regarding the dB-MSE between the modeled and true EDF [c.f. (16)]. However, at very low frequencies, room modes are usually sparser. In such cases, it may be necessary to model the individual modes with distinct decay times, analogously to the CAPR model by Haneda et al. [10]. Hence, for very low frequencies, it may not be possible to leverage the full potential of the common-slope model regarding its compactness.

VII. CONCLUSION
This article introduced the common-slope model for late reverberation. Its main idea is to use one common set of decay times to model large EDF sets, whose EDFs describe different source-receiver configurations within the same environment. Consequently, all directional and spatial energy decay variations are described as a weighted sum of exponential functions with fixed decay times and a noise term. Different approaches for determining the common decay times were explored, finding that the k-means clustering of decay times is the most general of them. It was shown that the common-slope model reduces the degrees of freedom in energy decay analysis considerably, while introducing only a small error between the modeled and the true EDF. Furthermore, the common-slope model enables a compact representation of inhomogeneous and anisotropic energy decay, thus making its analysis results easy to interpret. For example, in our evaluation, the common-slope analysis revealed acoustic phenomena like the fade-in of reverberation during the transition between coupled rooms, or the inhomogeneous energy decay caused by a highly non-uniform absorption distribution. The proposed model leverages its full potential at frequencies above the Schroeder frequency with a dense modal overlap, whereas traditional mode-wise processing might be advantageous at lower frequencies.
The common-slope model may benefit future research on room acoustic analysis and modeling. Furthermore, it will be valuable for all research fields relying on late reverberation models, such as dereverberation, echo cancellation, source separation, sound field equalization, and parametric spatial audio rendering.
Companion Page: A companion page with additional information, animations, and source code related to this article can be found at: http://research.spa.aalto.fi/publications/papers/ieeetaslp-common-slope/

Fig. 1. Common-slope modeling of energy decay functions (EDFs) with spatial (i.e., source or receiver position varies, upper left part of the figure) and/or directional (i.e., source or receiver orientation varies, lower left part of the figure) variations. The green dots indicate different source-receiver configurations $x$. Although not explicitly depicted in the figure, it is important to understand that a source-receiver configuration describes the complete setup, i.e., the position and orientation of both source and receiver. The common-slope analysis of an acoustic environment involves three steps. Firstly, the EDFs $d(x, t)$ of all available source-receiver configurations are analyzed with a standard decay analysis algorithm [45], [46] to obtain configuration-dependent $T_{k,x}$ values. Secondly, the $T_{k,x}$ values are clustered into $\kappa$ mode groups to obtain the common decay times $T_k$. Lastly, the common-slope amplitudes $A_{k,x}$ of the common-slope model [c.f. (9) and (10)] are determined via a least-squares fit to the EDFs [c.f. (13)].

Fig. 4. Room geometry of the investigated Room Transition dataset (RTD) [47], [48] and Extended RTD (ERTD). The orange × indicates the source position (no line-of-sight), the orange arrow indicates the main axis of the RTD's loudspeaker, the blue dots indicate the start and end points of the transition measured in the RTD (0.05 m resolution), the blue line indicates the transition, and the gray grid indicates the receiver grid in the ERTD (0.2 m resolution on the horizontal plane at 1.55 m height). The irregularities on the right hallway wall indicate the more absorbent wall material in the ERTD2 variant of the dataset.

Fig. 7. Common-slope analysis results (1 kHz octave band) of a scene consisting of two coupled rooms. The analysis is based on the Extended Room Transition dataset variant 1 (c.f. Section IV) and the common decay times were obtained with the k-means clustering approach (c.f. Section III-B4). The orange × indicates the sound source position. (a) and (b) show how the decay amplitudes of the common-slope model [c.f. (9)] vary for different receiver positions, with (a) depicting the $A_{1,x}$ and (b) the $A_{2,x}$ variations, respectively. The common decay times amount to $T_1 = 0.46$ s and $T_2 = 1.51$ s. The variations of the $A_{2,x}$ values highlight how the reverberation of the more reverberant hallway spreads into the less reverberant meeting room. (c) indicates that the common-slope model introduces only a small decibel-based mean squared error (dB-MSE) between the modeled and true energy decay [c.f. (16)]. A perfect fit would yield a dB-MSE of 0 dB.

Fig. 6(c) and (f) show that the common-slope model introduces only a small dB-MSE between the modeled and true EDFs, with average and 95% quantile values of 0.25 dB and 0.63 dB, respectively.

Fig. 8. Common-slope analysis results (1 kHz octave band) of a scene consisting of two coupled rooms, with the rightmost wall being highly absorbent. The analysis is based on the Extended Room Transition dataset variant 2 (c.f. Section IV) and the common decay times were obtained with the k-means clustering approach (c.f. Section III-B4). The orange × indicates the sound source position. (a), (b), and (c) show how the decay amplitudes of the common-slope model [c.f. (9)] vary for different receiver positions, with (a) depicting the $A_{1,x}$, (b) the $A_{2,x}$, and (c) the $A_{3,x}$ variations, respectively. The common decay times amount to $T_1 = 0.44$ s, $T_2 = 0.84$ s, and $T_3 = 1.89$ s. In addition to the reverberation cross-fade that was already observed in Fig. 7, the amplitude variations in (b) and (c) highlight the effect of the highly absorbent right hallway wall. (d) indicates that the common-slope model introduces only a small decibel-based mean squared error (dB-MSE) between the modeled and true energy decay [c.f. (16)]. A perfect fit would yield a dB-MSE of 0 dB.