Scheduled System Maintenance:
On Monday, April 27th, IEEE Xplore will undergo scheduled maintenance from 1:00 PM - 3:00 PM ET (17:00 - 19:00 UTC). No interruption in service is anticipated.
By Topic

Musical Instrument Recognition in Polyphonic Audio Using Missing Feature Approach

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Giannoulis, D. ; Sch. of Electron. Eng. & Comput. Sci., Queen Mary Univ. of London, London, UK ; Klapuri, A.

A method is described for musical instrument recognition in polyphonic audio signals where several sound sources are active at the same time. The proposed method is based on local spectral features and missing-feature techniques. A novel mask estimation algorithm is described that identifies spectral regions that contain reliable information for each sound source, and bounded marginalization is then used to treat the feature vector elements that are determined to be unreliable. The mask estimation technique is based on the assumption that the spectral envelopes of musical sounds tend to be slowly-varying as a function of log-frequency and unreliable spectral components can therefore be detected as positive deviations from an estimated smooth spectral envelope. A computationally efficient algorithm is proposed for marginalizing the mask in the classification process. In simulations, the proposed method clearly outperforms reference methods for mixture signals. The proposed mask estimation technique leads to a recognition accuracy that is approximately half-way between a trivial all-one mask (all features are assumed reliable) and an ideal “oracle” mask.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:21 ,  Issue: 9 )