Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach

4 Author(s)
Unal, E. (Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA); Chew, E.; Georgiou, P.G.; Narayanan, S.S.

Robust data retrieval in the presence of uncertainty is a challenging problem in multimedia information retrieval. In query-by-humming (QBH) systems, uncertainty can arise in query formulation due to user-dependent variability, such as incorrectly hummed notes, and in query transcription due to machine-based errors, such as insertions and deletions. We propose a fingerprinting (FP) algorithm for representing salient melodic information so as to better compare potentially noisy voice queries with target melodies in a database. The FP technique is employed in the QBH system back end; a hidden Markov model (HMM) front end segments and transcribes the hummed audio input into a symbolic representation. The performance of the FP search algorithm is compared to the conventional edit distance (ED) technique. Our retrieval database is built on 1500 MIDI files and evaluated using 400 hummed samples from 80 people with different musical backgrounds. The FP algorithm achieves a melody retrieval accuracy of 88% for humming samples from musically trained subjects and 70% for samples from untrained subjects. In contrast, the widely used ED method achieves 86% and 62% accuracy, respectively, on the same samples, suggesting that the proposed FP technique is more robust under uncertainty, particularly for queries by musically untrained users.
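For readers unfamiliar with the edit distance (ED) baseline mentioned in the abstract, the following is a minimal sketch of how ED-based melody retrieval might look. It is not the authors' implementation: the unit edit costs, the pitch-interval representation, the function names (`edit_distance`, `to_intervals`, `rank_melodies`), and the toy two-melody database are all assumptions made for illustration. The fingerprinting algorithm itself is not described in the abstract and is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance with unit costs (an assumed cost model)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def to_intervals(pitches: Sequence[int]) -> List[int]:
    """Successive pitch differences in semitones, so matching ignores key."""
    return [q - p for p, q in zip(pitches, pitches[1:])]


def rank_melodies(
    query: Sequence[int], database: Dict[str, Sequence[int]]
) -> List[Tuple[int, str]]:
    """Return (distance, name) pairs sorted from best to worst match."""
    q = to_intervals(query)
    return sorted(
        (edit_distance(q, to_intervals(melody)), name)
        for name, melody in database.items()
    )


# Hypothetical toy database of MIDI note numbers (illustrative only).
database = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "twinkle": [60, 60, 67, 67, 69, 69, 67],
}

# A "hummed" query: the first melody, transposed up two semitones,
# with one note dropped to mimic a transcription deletion.
query = [66, 66, 67, 69, 67, 66, 64]

ranked = rank_melodies(query, database)
print(ranked[0])  # best match: (1, 'ode_to_joy')
```

Working on intervals rather than absolute pitches makes the comparison insensitive to the key a user hums in, while a single dropped note still costs one edit. This sensitivity to insertions and deletions is the kind of uncertainty the abstract says the FP approach aims to handle more robustly.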

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 16, Issue: 2)