Publication Year: 2013, Page(s):1121 - 1122
• ### Multi-Microphone Noise Reduction Based on Orthogonal Noise Signal Decompositions

Publication Year: 2013, Page(s):1123 - 1133
Cited by:  Papers (4)

Multi-microphone noise reduction plays an increasingly important role in acoustic communication systems. Existing multichannel noise reduction filters are commonly computed based on a single noise covariance matrix. Recently, an orthogonal noise signal decomposition was proposed that uses a single noise signal as a reference. Using this decomposition, it was possible to reformulate the noise red...

• ### Cross-Lingual Language Modeling for Low-Resource Speech Recognition

Publication Year: 2013, Page(s):1134 - 1144
Cited by:  Papers (1)

This paper proposes using cross-lingual language modeling with syntactic information for low-resource speech recognition. We propose phrase-level transduction and syntactic reordering for transcribing a resource-poor language and translating it into a resource-rich language, if necessary. The phrase-level transduction is capable of performing n-m cross-lingual transduction. The synt...

• ### Noise Estimation Using a Constrained Sequential Hidden Markov Model in the Log-Spectral Domain

Publication Year: 2013, Page(s):1145 - 1157
Cited by:  Papers (2)

The temporal correlation of speech presence/absence is widely used in noise estimation. The most popular technique for exploiting temporal correlation is the smoothing of noisy spectra using a time-recursive filter, in which the forgetting factor is controlled by speech presence probability. However, this technique is not unified into a theoretical framework that enables optimal noise estimation. ...

• ### Large Scale Distributed Acoustic Modeling With Back-Off N-Grams

Publication Year: 2013, Page(s):1158 - 1169
Cited by:  Papers (3)

The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly b...

• ### Wavelet Maxima Dispersion for Breathy to Tense Voice Discrimination

Publication Year: 2013, Page(s):1170 - 1179
Cited by:  Papers (7)

This paper proposes a new parameter, the Maxima Dispersion Quotient (MDQ), for differentiating breathy to tense voice. Maxima derived following wavelet decomposition are often used for detecting edges in image processing, where locations of these maxima organize in the vicinity of the edge location. Similarly for tense voice, which typically displays sharp glottal closing characteristics, maxima f...

• ### Learning Phrase Patterns for Text Classification

Publication Year: 2013, Page(s):1180 - 1189
Cited by:  Papers (2)

This paper introduces methods to discriminatively learn phrase patterns for use as features in text classification. An efficient solution is described using a recursive algorithm with a mutual information selection criterion. The algorithm automatically determines when word classes are useful in specific locations of a phrase pattern, allowing for variable specificity depending on the amount of la...

• ### A Simple Prior for Audio Signals

Publication Year: 2013, Page(s):1190 - 1200
Cited by:  Papers (1)

We propose a simple prior for restoration problems involving oscillatory signals. The prior makes use of an underlying analytic frame decomposition with narrow subbands. Beyond this, the prior has no other parameters, which makes it simple to use and apply. We demonstrate the utility of the proposed prior through real audio restoration experiments.

• ### Compensation of Loudspeaker–Room Responses in a Robust MIMO Control Framework

Publication Year: 2013, Page(s):1201 - 1216
Cited by:  Papers (8)

A new multichannel approach to robust broadband loudspeaker-room equalization is presented. Traditionally, the equalization (or room correction) problem has been treated primarily by single-channel methods, where loudspeaker input signals are prefiltered individually by separate scalar filters. Single-channel methods are generally able to improve the average spectral flatness of the acoustic trans...

• ### Pairwise Discriminative Speaker Verification in the I-Vector Space

Publication Year: 2013, Page(s):1217 - 1227
Cited by:  Papers (16)

This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belong to the same speaker or to different speakers. This approach is an alternative to the usual discriminative setup that disc...

• ### Position-Dependent Crosstalk Cancellation Using Space Partitioning

Publication Year: 2013, Page(s):1228 - 1239

The present study tested a new stereo playback system that effectively cancels crosstalk signals at an arbitrary listening position. Such a playback system was implemented by integrating listener position tracking techniques and crosstalk cancellation techniques. The entire listening space was partitioned into a number of non-overlapping cells and a crosstalk cancellation filter was assigned to ea...

• ### Underdetermined Sound Source Separation Using Power Spectrum Density Estimated by Combination of Directivity Gain

Publication Year: 2013, Page(s):1240 - 1250
Cited by:  Papers (28)

A method for separating underdetermined sound sources based on a novel power spectral density (PSD) estimation is proposed. The method enables up to M(M-1)+1 sources to be separated when we use a microphone array of M sensors and a Wiener post-filter calculated from the estimated PSDs. The PSD of a beamformer's output is modelled by a mixture of source PSDs multiplied by the bea...

• ### Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Publication Year: 2013, Page(s):1251 - 1260
Cited by:  Papers (3)  |  Patents (4)

Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach where outputs of secondary systems are integrated in the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be eval...

• ### Voice Activity Detection in Presence of Transient Noise Using Spectral Clustering

Publication Year: 2013, Page(s):1261 - 1271
Cited by:  Papers (23)

Voice activity detection has attracted significant research efforts in the last two decades. Despite much progress in designing voice activity detectors, voice activity detection (VAD) in the presence of transient noise is a challenging problem. In this paper, we develop a novel VAD algorithm based on spectral clustering methods. We propose a VAD technique which is a supervised learning algorithm. Thi...

• ### Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples

Publication Year: 2013, Page(s):1272 - 1284
Cited by:  Papers (7)  |  Patents (1)

Spoken term detection (STD) is a key technology for retrieval of spoken content, which will be very important for retrieving and browsing multimedia content over the Internet. The discriminative capability of machine learning methods has recently been used to facilitate STD. This paper presents a new approach to improve STD using support vector machines (SVM) based on acoustic information. The concept ...

• ### Eigentriphones for Context-Dependent Acoustic Modeling

Publication Year: 2013, Page(s):1285 - 1294
Cited by:  Papers (4)

Most automatic speech recognizers employ tied-state triphone hidden Markov models (HMM), in which the corresponding triphone states of the same base phone are tied. State tying is commonly performed with the use of a phonetic regression class tree which renders robust context-dependent modeling possible by carefully balancing the amount of training data with the degree of tying. However, tying ine...

• ### Lexical Prefix Tree and WFST: A Comparison of Two Dynamic Search Concepts for LVCSR

Publication Year: 2013, Page(s):1295 - 1307
Cited by:  Papers (4)  |  Patents (2)

Dynamic network decoders have the advantage of significantly lower memory consumption compared to static network decoders, especially when huge vocabularies and complex language models are required. This paper compares the properties of two well-known search strategies for dynamic network decoding, namely history conditioned lexical tree search and weighted finite-state transducer-based search usi...

• ### Correction to "Speaker Diarization: A Review of Recent Research" [Feb 12 356-370]

Publication Year: 2013, Page(s): 1308

In the following two articles, the author name "Xavier Anguera Miro" was published mistakenly. It should have been "Xavier Anguera." Please use "Xavier Anguera" when referencing these articles. [1] X. Anguera, S. Bozonnet, N. Evans, C. Fredouille, G. Friedland, and O. Vinyals, "Speaker diarization: A review of recent research," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 2, pp. 356-370...

