Abstract:
Automatic music transcription (AMT) takes a music recording and outputs a transcription of the underlying music. Deep learning models trained for AMT rely on large amounts of annotated training data, which are available only for some domains such as Western classical piano music. Using pre-trained models on out-of-domain inputs can lead to significantly lower performance. Fine-tuning or retraining on new target domains is expensive and relies on the presence of labeled data. In this work, we propose a method for taking a pre-trained transcription model and improving its performance on out-of-domain data without the need for any training data, requiring no fine-tuning or retraining of the original model. Our method uses the model to transcribe pitch-shifted versions of an input, aggregating the output across these versions where the original model is unsure. We take a model originally trained for piano transcription and present experiments under two domain shift scenarios: recording condition mismatch (piano with different recording setups) and instrument mismatch (guitar and choral data). We show that our method consistently improves note- and frame-based performance.
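To make the aggregation idea concrete, the following is a minimal Python sketch of transcribing pitch-shifted copies of an input and merging the results where the base model is uncertain. The `transcribe` callable, the set of shifts, and the uncertainty thresholds are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import librosa


def aggregate_transcription(y, sr, transcribe, shifts=(-1, 1), low=0.3, high=0.7):
    """Sketch: transcribe pitch-shifted copies of `y` and average the shifted-back
    activations into the original prediction only where the base model is unsure.
    `transcribe(y, sr)` is assumed to return a (n_pitches, n_frames) activation matrix."""
    base = transcribe(y, sr)
    stack = [base]
    for k in shifts:
        # Shift the audio by k semitones and transcribe the shifted version.
        y_shift = librosa.effects.pitch_shift(y, sr=sr, n_steps=k)
        act = transcribe(y_shift, sr)
        # Undo the shift: a +k semitone input moves notes up k pitch rows,
        # so roll the activation matrix back down by k to realign it.
        # (np.roll wraps at the edges; acceptable for a sketch.)
        stack.append(np.roll(act, -k, axis=0))
    mean_act = np.mean(stack, axis=0)
    # Keep the original prediction where it is confident; fall back to the
    # aggregate only in the uncertain band between `low` and `high`.
    uncertain = (base > low) & (base < high)
    return np.where(uncertain, mean_act, base)
```

Because no weights are updated, this kind of test-time aggregation leaves the pre-trained model untouched and needs no labeled target-domain data.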
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025