Robust data retrieval in the presence of uncertainty is a challenging problem in multimedia information retrieval. In query-by-humming (QBH) systems, uncertainty can arise in query formulation due to user-dependent variability, such as incorrectly hummed notes, and in query transcription due to machine-based errors, such as insertions and deletions. We propose a fingerprinting (FP) algorithm that represents salient melodic information so that potentially noisy voice queries can be compared more reliably with target melodies in a database. The FP technique is employed in the back end of the QBH system; a hidden Markov model (HMM) front end segments and transcribes the hummed audio input into a symbolic representation. The performance of the FP search algorithm is compared with that of the conventional edit distance (ED) technique. Our retrieval database is built from 1500 MIDI files and evaluated using 400 hummed samples from 80 people with different musical backgrounds. With the FP algorithm, a melody retrieval accuracy of 88% is achieved for humming samples from musically trained subjects and 70% for samples from untrained subjects. In contrast, the widely used ED method achieves 86% and 62%, respectively, on the same samples, suggesting that the proposed FP technique is more robust under uncertainty, particularly for queries from musically untrained users.
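For readers unfamiliar with the edit distance (ED) baseline mentioned above, the following is a minimal sketch of ED-based melody ranking. It is not the authors' FP algorithm, and it is not necessarily their exact ED configuration. The sketch assumes that both the transcribed query and each database melody are sequences of MIDI pitch numbers, that melodies are compared as pitch-interval sequences (which makes matching transposition-invariant), and that insertions, deletions, and substitutions each cost 1. The function names `edit_distance`, `to_intervals`, and `rank_melodies` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit insertion, deletion, and substitution costs."""
    prev = list(range(len(b) + 1))  # distance from an empty prefix of a to each prefix of b
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,        # delete a[i-1]
                          curr[j - 1] + 1,    # insert b[j-1]
                          prev[j - 1] + sub)  # match or substitute
        prev = curr
    return prev[-1]


def to_intervals(pitches):
    """Convert MIDI pitch numbers to successive semitone intervals (assumed representation)."""
    return [q - p for p, q in zip(pitches, pitches[1:])]


def rank_melodies(query_pitches, database):
    """Return database melody names sorted by ED to the query (smallest distance first)."""
    q = to_intervals(query_pitches)
    scores = [(edit_distance(q, to_intervals(m)), name) for name, m in database.items()]
    return [name for _, name in sorted(scores)]


# Hypothetical toy database: the query is melody "A" hummed two semitones higher.
db = {"A": [60, 62, 64, 65, 67], "B": [60, 60, 67, 67, 69]}
print(rank_melodies([62, 64, 66, 67, 69], db))  # ['A', 'B']
```

This whole-sequence variant penalizes any length mismatch, so a hummed fragment of a long melody scores poorly. QBH systems that use ED often allow the query to align with any segment of the target instead, and the sketch leaves that out for brevity.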
IEEE Transactions on Audio, Speech, and Language Processing, Volume 16, Issue 2
Date of Publication: Feb. 2008