Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio

2 Author(s)
Yun Wang ; Dept. of Electron. Eng., Tsinghua Univ., Beijing, China ; Zhijian Ou

Modern monaural voice and accompaniment separation systems usually consist of two main modules: melody extraction and time-frequency masking. The main distinction between separation systems lies in which approaches are used for the two modules. Popular techniques for melody extraction include hidden Markov models (HMMs) and non-negative matrix factorization (NMF); masking may be either hard or soft. This paper examines the flaw in NMF-based melody extraction and proposes combining HMM-based melody extraction (equipped with a newly defined feature) with NMF-based soft masking. Evaluations on two publicly available databases show that the proposed system reaches state-of-the-art performance and outperforms several other combinations.
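
To illustrate the difference between hard and soft masking mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that some upstream model (for example, an NMF decomposition guided by the extracted melody) has already produced non-negative magnitude estimates for the voice and the accompaniment in each time-frequency bin. The function names, the toy arrays, and the ratio-style soft mask are all assumptions made for illustration.

```python
import numpy as np

def soft_mask(voice_mag, accomp_mag, eps=1e-12):
    """Ratio mask in [0, 1]: share of each time-frequency bin assigned to the voice.

    eps only guards against division by zero in silent bins (an assumption of
    this sketch, not something specified in the abstract).
    """
    return voice_mag / (voice_mag + accomp_mag + eps)

def hard_mask(voice_mag, accomp_mag):
    """Binary mask: a bin goes entirely to the voice if its estimate is larger."""
    return (voice_mag > accomp_mag).astype(float)

# Toy magnitude estimates (rows: frequency bins, columns: time frames).
# These values are made up purely for illustration.
voice_est = np.array([[3.0, 0.0, 1.0],
                      [1.0, 2.0, 0.0]])
accomp_est = np.array([[1.0, 2.0, 1.0],
                       [3.0, 2.0, 0.0]])

# In this toy setup the mixture magnitude is taken to be the sum of the two
# estimates; real mixtures only approximately satisfy this.
mixture = voice_est + accomp_est

soft = soft_mask(voice_est, accomp_est)
hard = hard_mask(voice_est, accomp_est)

# Applying a mask to the mixture yields the separated voice magnitudes.
voice_soft = soft * mixture
voice_hard = hard * mixture
```

In this toy case the soft mask splits each bin in proportion to the two estimates, while the hard mask assigns each bin wholly to one source, so bins where the sources overlap are either kept with the accompaniment still in them or dropped entirely.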

Published in:

Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on

Date of Conference:

22-27 May 2011