Abstract:
Audio signal processing systems often behave differently depending on the recording device, leading to performance discrepancies. It is therefore important to know the characteristics of the recording device; however, the device's behavior is difficult to obtain in most cases. In this study, we propose a diffusion-model-based device characteristic transfer that estimates the device's frequency response using only the recorded signals. By jointly training conditional and unconditional diffusion models, we find that non-linear distortions and certain filtered signals are reflected better than when training only the conditional model. We show that the proposed method transfers the device style close to the ground truth, not only visually on the spectrogram but also in the t-distributed stochastic neighbor embedding distribution and in the accuracy of a device classifier. We also show that the proposed method improves performance when used as a data augmentation method for acoustic scene classification.
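The joint training of conditional and unconditional diffusion models described above resembles classifier-free-guidance-style training, where the conditioning input is randomly dropped so a single network learns both models. Below is a minimal PyTorch sketch of such a training step under that assumption; the network, tensor shapes, and the cond_drop_prob parameter are illustrative and not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenoiserNet(nn.Module):
    """Toy noise predictor conditioned on a source-device spectrogram (hypothetical architecture)."""
    def __init__(self):
        super().__init__()
        # Input: noisy target spectrogram concatenated with the condition (2 channels).
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_noisy, cond, t):
        # The timestep t is ignored in this toy network; a real model would embed it.
        return self.net(torch.cat([x_noisy, cond], dim=1))

def train_step(model, optimizer, x_target, x_source, alphas_cumprod, cond_drop_prob=0.1):
    """One joint training step: with probability cond_drop_prob the condition
    (source-device spectrogram) is zeroed, so the same network is trained
    both conditionally and unconditionally."""
    b = x_target.size(0)
    t = torch.randint(0, alphas_cumprod.size(0), (b,), device=x_target.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)

    # Standard forward diffusion: mix the clean target spectrogram with noise.
    noise = torch.randn_like(x_target)
    x_noisy = a_bar.sqrt() * x_target + (1 - a_bar).sqrt() * noise

    # Randomly drop the condition for part of the batch (unconditional branch).
    drop = (torch.rand(b, device=x_target.device) < cond_drop_prob)
    cond = torch.where(drop.view(b, 1, 1, 1), torch.zeros_like(x_source), x_source)

    pred = model(x_noisy, cond, t)
    loss = F.mse_loss(pred, noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, the conditional and unconditional predictions can be blended with a guidance weight, which is one plausible reading of why the jointly trained model reflects non-linear distortions and filtering more strongly than a purely conditional model.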
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025