Ensemble learning for speech enhancement | IEEE Conference Publication | IEEE Xplore

Ensemble learning for speech enhancement


Abstract:

Over the years, countless algorithms have been proposed to solve the problem of speech enhancement from a noisy mixture. Many have succeeded in improving at least parts of the signal, while often deteriorating others. Based on the assumption that different algorithms are likely to enjoy different qualities and suffer from different flaws, we investigate the possibility of combining the strengths of multiple speech enhancement algorithms, formulating the problem in an ensemble learning framework. As a first example of such a system, we consider the prediction of a time-frequency mask obtained from the clean speech, based on the outputs of various algorithms applied on the noisy mixture. We consider several approaches involving various notions of context and various machine learning algorithms for classification, in the case of binary masks, and regression, in the case of continuous masks. We show that combining several algorithms in this way can lead to an improvement in enhancement performance, while simple averaging or voting techniques fail to do so.
Date of Conference: 20-23 October 2013
Date Added to IEEE Xplore: 09 January 2014
Electronic ISBN: 978-1-4799-0972-8

Conference Location: New Paltz, NY, USA

1. INTRODUCTION

Speech enhancement methods attempt to improve the quality and intelligibility of speech that has been degraded by interfering noise or other processes. One thing that makes this problem difficult is that the interference can come in many different varieties. To further complicate matters, often the operational constraints on computation and latency preclude the use of complex models that can represent and adapt to many different noise types. As it is difficult for a simple algorithm to accommodate the variety of conditions, some assumptions about the statistical properties of the target and interference signals have to be made. Over the years, many different algorithms have been proposed, each having different explicit or implicit assumptions about the nature of the speech and interference [1]. Assuming that the strengths and weaknesses of a set of algorithms differ, it would be desirable to combine them in a way that takes advantage of all their strengths.
