Automatic Beamforming for Blind Extraction of Speech From Music Environment Using Variance of Spectral Flux-Inspired Criterion

2 Author(s)
Tao Yu; Center for Robust Speech Syst. (CRSS), Univ. of Texas, Dallas, TX, USA; Hansen, J.H.L.

This paper addresses the problem of automatic beamforming for blind extraction of speech in a music environment, using multiple microphones. A new criterion is proposed based on the variance of the spectral flux (VSF), which is shown to be a compound measure of the kurtosis and the across-time correlation of time-frequency domain signals. Spectral flux (SF) has been adopted as a feature that distinguishes speech from other acoustic noises, and the VSF of speech tends to be larger than that of other acoustic sounds. Hence, maximization of VSF can be employed as one potential criterion to identify the speech direction-of-arrival (DOA), in order to extract speech from the noisy observations. We construct a VSF-inspired cost function and develop a complex-valued fixed-point algorithm for the optimization. The stability of the proposed algorithm is then analyzed based on a second-order Taylor series expansion. In contrast to subspace decomposition-based methods and approaches based on maximizing non-Gaussianity, which suffer from DOA identification ambiguity, evaluations on both real and simulated data indicate that the VSF-inspired criterion can effectively extract speech from a diffuse music noise field or a musical interference noise field. A key feature of the proposed approach is that it can operate blindly, i.e., it does not require a priori knowledge of the array geometry, the noise covariance matrix, or the location of the desired speech source. Therefore, this study offers a potential perspective for blindly extracting speech from a music environment.
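
As a rough illustration of the quantity the abstract refers to, the sketch below computes a spectral flux sequence and its variance for a single-channel signal. It is not the paper's cost function or fixed-point algorithm, neither of which is given in the abstract. It assumes one common definition of spectral flux (the Euclidean norm of the difference between successive short-time magnitude spectra), and the function names, frame length, hop size, and synthetic test signals are all hypothetical choices made for the example.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude spectrogram from Hann-windowed frames (frames x bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_flux(mag):
    """Assumed definition: L2 norm of the frame-to-frame magnitude change."""
    diff = np.diff(mag, axis=0)
    return np.sqrt(np.sum(diff ** 2, axis=1))


def variance_of_spectral_flux(x, frame_len=512, hop=256):
    """Variance of the spectral flux sequence over all frame pairs."""
    return float(np.var(spectral_flux(stft_magnitude(x, frame_len, hop))))


# Synthetic stand-ins (not real speech or music): a steady tone versus the
# same tone switched on and off by a 3 Hz square-wave envelope.
fs = 16000
t = np.arange(2 * fs) / fs
steady = np.sin(2 * np.pi * 440 * t)
bursty = steady * (np.sin(2 * np.pi * 3 * t) > 0)

vsf_steady = variance_of_spectral_flux(steady)
vsf_bursty = variance_of_spectral_flux(bursty)
print(f"VSF steady: {vsf_steady:.6f}")
print(f"VSF bursty: {vsf_bursty:.6f}")
```

The on/off signal produces isolated flux spikes at each transition and near-zero flux elsewhere, so its flux variance is much larger than that of the stationary tone. This only mirrors, in a toy setting, the abstract's observation that signals with strong temporal dynamics such as speech tend to have a larger VSF.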

Published in:

IEEE Journal of Selected Topics in Signal Processing (Volume: 4, Issue: 5)