A Probabilistic Interaction Model for Multipitch Tracking With Factorial Hidden Markov Models


Authors: M. Wohlmayr (Signal Processing and Speech Communication Laboratory (SPSC), Graz University of Technology, Graz, Austria); M. Stark; F. Pernkopf

We present a simple and efficient feature modeling approach for tracking the pitch of two simultaneously active speakers. We model the spectrogram features of single speakers using Gaussian mixture models in combination with the minimum description length (MDL) model selection criterion. To obtain a probabilistic representation of the speech mixture spectrogram features of both speakers, we employ the mixture maximization (MIXMAX) model and, as an alternative, a linear interaction model. A factorial hidden Markov model (FHMM) is applied for tracking pitch over time. This statistical model can be used for applications beyond speech, whenever the interaction between individual sources can be represented as a MIXMAX or a linear model. For tracking, we use the loopy max-sum algorithm and provide empirical comparisons to exact methods. Furthermore, we discuss a scheduling mechanism of loopy belief propagation for online tracking. We demonstrate experimental results using Mocha-TIMIT as well as data from the speech separation challenge provided by Cooke et al. We show the excellent performance of the proposed method in comparison to a well-known multipitch tracking algorithm based on correlogram features. Using speaker-dependent models, the proposed method improves the accuracy of correct speaker assignment, which is important for single-channel speech separation. In particular, we are able to reduce the overall tracking error by 51% (relative) in the speaker-dependent case. Moreover, we use the estimated pitch trajectories to perform single-channel source separation and demonstrate the beneficial effect of correct speaker assignment on speech separation performance.
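The two interaction models and the exact tracking baseline mentioned above can be illustrated with a minimal sketch. It assumes each (speaker, pitch state) pair is a single diagonal Gaussian over D log-spectral features (the paper uses MDL-selected GMMs, which would turn each likelihood into a sum over component pairs). Under MIXMAX, each mixture feature is the elementwise maximum of the two sources; under a linear model, the features add, so two independent Gaussians sum to a Gaussian with added means and variances. `fhmm_viterbi` is a hypothetical helper doing exact max-sum over the joint K1×K2 state space of a two-chain FHMM, i.e. an exact method of the kind the abstract compares against, not the loopy max-sum algorithm itself. All function names here are illustrative.

```python
import math

# Assumption: one diagonal Gaussian per (speaker, pitch state); the paper uses GMMs.

def norm_logpdf(x, mu, var):
    """Log density of the univariate Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def norm_cdf(x, mu, var):
    """CDF of N(mu, var) at x; erfc keeps precision in the lower tail."""
    return 0.5 * math.erfc(-(x - mu) / math.sqrt(2.0 * var))

def mixmax_loglik(y, mu1, var1, mu2, var2):
    """log p(y | s1, s2) under MIXMAX: y_d = max(x1_d, x2_d) per dimension, so
    p(y_d) = p1(y_d) * F2(y_d) + F1(y_d) * p2(y_d)."""
    total = 0.0
    for yd, m1, v1, m2, v2 in zip(y, mu1, var1, mu2, var2):
        p1 = math.exp(norm_logpdf(yd, m1, v1))
        p2 = math.exp(norm_logpdf(yd, m2, v2))
        dens = p1 * norm_cdf(yd, m2, v2) + norm_cdf(yd, m1, v1) * p2
        total += math.log(max(dens, 1e-300))  # floor avoids log(0) on underflow
    return total

def linear_loglik(y, mu1, var1, mu2, var2):
    """log p(y | s1, s2) if y = x1 + x2 with independent Gaussian sources:
    the sum is Gaussian with mean mu1 + mu2 and variance var1 + var2."""
    return sum(norm_logpdf(yd, m1 + m2, v1 + v2)
               for yd, m1, v1, m2, v2 in zip(y, mu1, var1, mu2, var2))

def fhmm_viterbi(log_pi1, log_A1, log_pi2, log_A2, loglik):
    """Exact max-sum over the joint (s1, s2) states of a two-chain FHMM.
    loglik[t][i][j] = log p(y_t | s1=i, s2=j); returns [(i, j), ...] per frame."""
    K1, K2, T = len(log_pi1), len(log_pi2), len(loglik)
    delta = [[log_pi1[i] + log_pi2[j] + loglik[0][i][j] for j in range(K2)]
             for i in range(K1)]
    back = []
    for t in range(1, T):
        new_delta = [[0.0] * K2 for _ in range(K1)]
        ptr = [[(0, 0)] * K2 for _ in range(K1)]
        for i in range(K1):
            for j in range(K2):
                # The chains evolve independently, so the joint transition
                # log-probability is the sum of the per-chain ones.
                best, arg = max(
                    (delta[a][b] + log_A1[a][i] + log_A2[b][j], (a, b))
                    for a in range(K1) for b in range(K2))
                new_delta[i][j] = best + loglik[t][i][j]
                ptr[i][j] = arg
        delta = new_delta
        back.append(ptr)
    # Backtrack from the best final joint state.
    _, state = max((delta[i][j], (i, j)) for i in range(K1) for j in range(K2))
    path = [state]
    for ptr in reversed(back):
        state = ptr[state[0]][state[1]]
        path.append(state)
    return path[::-1]
```

In a tracker, `loglik[t][i][j]` would be filled by `mixmax_loglik` or `linear_loglik` using the frame-t features and the Gaussians of pitch states i and j. The exact joint search costs O(T·K1²·K2²), which grows quickly with the number of pitch states and is one reason approximate inference such as loopy max-sum becomes attractive.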

Published in: IEEE Transactions on Audio, Speech, and Language Processing, Volume 19, Issue 4