Efficient Spatial Audio Rendering Via Differentiable FIR To IIR Estimation | IEEE Conference Publication | IEEE Xplore

Abstract:

The MPEG-H standard for spatial audio proposes the rendering of multiple auditory objects (up to 16) and ambisonics to create a spatial audio scene. The gold standard for creating such a scene is to convolve these objects (and their early environmental reflections) with user-specific Head Related Impulse Responses (HRIRs) and to treat the late tail of the room reverberation separately. However, this is expensive both in the computational time and battery power needed for finite impulse response (FIR) convolution and in the device memory required to store the HRIRs. If quality could be maintained, an implementation with equivalent infinite impulse response (IIR) filters would mitigate these costs. We propose a novel differentiable optimization approach for determining an IIR filter cascade from a given FIR filter. This is done via an application-specific formulation that yields a convex and differentiable cost function for the conversion. We describe our results for spatial audio rendering of HRIR convolution. We compare our work against a recent neural-network-based HRIR estimation method in terms of accuracy and speed. Finally, we implemented our approach in a real-time setting, suitable for deployment on DSP hardware, and conducted a small user study. Results from human participants were positive.
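
To make the FIR-to-IIR idea concrete, the following is a minimal sketch of how one might score a candidate IIR biquad cascade against a target FIR filter and compare their per-sample arithmetic cost. It is not the cost function from the paper: the abstract does not state the formulation, so the log-magnitude error used here is an assumed stand-in and is not claimed to be convex. All function names and the example filter are hypothetical.

```python
import cmath
import math


def fir_response(h, w):
    """Frequency response of an FIR filter with taps h at angular frequency w."""
    return sum(tap * cmath.exp(-1j * w * n) for n, tap in enumerate(h))


def cascade_response(sections, w):
    """Frequency response of a biquad cascade.

    Each section is (b0, b1, b2, a1, a2) with transfer function
    (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2).
    """
    z1 = cmath.exp(-1j * w)
    z2 = z1 * z1
    resp = 1.0 + 0j
    for b0, b1, b2, a1, a2 in sections:
        resp *= (b0 + b1 * z1 + b2 * z2) / (1.0 + a1 * z1 + a2 * z2)
    return resp


def log_magnitude_error(h, sections, n_freqs=64, eps=1e-12):
    """Assumed cost: mean squared log-magnitude mismatch on a grid over [0, pi]."""
    total = 0.0
    for k in range(n_freqs):
        w = math.pi * k / (n_freqs - 1)
        target = math.log(abs(fir_response(h, w)) + eps)
        approx = math.log(abs(cascade_response(sections, w)) + eps)
        total += (target - approx) ** 2
    return total / n_freqs


def fir_macs_per_sample(n_taps):
    """Direct-form FIR convolution: one multiply-accumulate per tap."""
    return n_taps


def cascade_macs_per_sample(n_biquads):
    """Direct-form biquad: five multiplies (b0, b1, b2, a1, a2) per section."""
    return 5 * n_biquads


# Illustrative target: a 40-tap truncation of the impulse response 0.5**n,
# which a single one-pole section 1 / (1 - 0.5 z^-1) reproduces almost exactly.
h = [0.5 ** n for n in range(40)]
good_fit = [(1.0, 0.0, 0.0, -0.5, 0.0)]
poor_fit = [(1.0, 0.0, 0.0, -0.9, 0.0)]

print(log_magnitude_error(h, good_fit))  # close to zero
print(log_magnitude_error(h, poor_fit))  # clearly larger
print(fir_macs_per_sample(len(h)), cascade_macs_per_sample(len(good_fit)))
```

Because the error is a smooth function of the section coefficients wherever the cascade has no poles or zeros on the unit circle, a gradient-based optimizer could in principle minimize it; the filter length and section count above are chosen only for illustration and do not reflect the paper's results.
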
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
