# IEEE/ACM Transactions on Audio, Speech, and Language Processing

• ### 3D Room Geometry Inference Based on Room Impulse Response Stacks

Publication Year: 2018, Page(s):857 - 872
Room geometry inference is concerned with the localization of reflective boundaries in an enclosed space. This paper outlines a method for inferring room geometry based on the positions of loudspeakers and real or image microphones, which are computed using sets of times of arrival (TOAs) obtained from room impulse responses (RIRs). These RIRs describe the acoustic propagation between the loudspea...

• ### Language/Dialect Recognition Based on Unsupervised Deep Learning

Publication Year: 2018, Page(s):873 - 882
Over the past decade, bottleneck features within an i-Vector framework have been used for state-of-the-art language/dialect identification (LID/DID). However, traditional bottleneck feature extraction requires additional transcribed speech information. Alternatively, two types of unsupervised deep learning methods are introduced in this study. To address this limitation, an unsupervised bottleneck...

• ### Waveform Modeling and Generation Using Hierarchical Recurrent Neural Networks for Speech Bandwidth Extension

Publication Year: 2018, Page(s):883 - 894
This paper presents a waveform modeling and generation method using hierarchical recurrent neural networks (HRNN) for speech bandwidth extension (BWE). Different from conventional BWE methods that predict spectral parameters for reconstructing wideband speech waveforms, this BWE method models and predicts waveform samples directly without using vocoders. Inspired by SampleRNN, which is an uncondit...

• ### Context Adaptive Neural Network Based Acoustic Models for Rapid Adaptation

Publication Year: 2018, Page(s):895 - 908
The adaptation of automatic speech recognition systems to a speaker or an environment is important if we are to achieve high speech recognition performance ubiquitously. Recently, deep neural network (DNN) based acoustic models have been made adaptive to speakers or environments by the addition of an auxiliary feature representing the acoustic context information such as speaker or noise character...

• ### Two-Microphone Hearing Aids Using Prediction Error Method for Adaptive Feedback Control

Publication Year: 2018, Page(s):909 - 923
A challenge in hearing aids is adaptive feedback control which often uses an adaptive filter to estimate the feedback path. This estimate of the feedback path usually results in a bias due to the correlation between the loudspeaker signal and the incoming signal. The prediction error method (PEM) is a popular method for reducing this bias for adaptive feedback control (AFC) in hearing aids, provid...

• ### Periphony-Lattice Mixed-Order Ambisonic Scheme for Spherical Microphone Arrays

Publication Year: 2018, Page(s):924 - 936
Most methods for sound field reconstruction and spherical beamforming with spherical microphone arrays are mathematically based on the spherical harmonics expansion. In many cases, this expansion is truncated at a certain order as in higher order ambisonics (HOA). This truncation leads to performance that is independent of the incident direction of the sound waves. On the other hand, mixed-order a...

• ### Phase-Aware Single-Channel Speech Enhancement With Modulation-Domain Kalman Filtering

Publication Year: 2018, Page(s):937 - 950
We present a speech enhancement algorithm that performs modulation-domain Kalman filtering to track the speech phase using circular statistics, along with the spectral log-amplitudes of speech and noise. In the proposed algorithm, the speech phase posterior is used to create an enhanced speech phase spectrum for the signal reconstruction of speech. The Kalman filter prediction step separately mode...

• ### Statistical Analysis of the Multichannel Wiener Filter Using a Bivariate Normal Distribution for Sample Covariance Matrices

Publication Year: 2018, Page(s):951 - 966
This paper studies the statistical performance of the multichannel Wiener filter (MWF) when the weights are computed using estimates of the sample covariance matrices of the noisy and the noise signals. It is well known that the optimal weights of the minimum variance distortionless response beamformer are only determined by the noisy sample covariance matrix or the noise sample covariance matrix,...

• ### Acoustic Denoising Using Dictionary Learning With Spectral and Temporal Regularization

Publication Year: 2018, Page(s):967 - 980
We present a method for speech enhancement of data collected in extremely noisy environments, such as those obtained during magnetic resonance imaging scans. We propose an algorithm based on dictionary learning to perform this enhancement. We use complex nonnegative matrix factorization with intrasource additivity (CMF-WISA) to learn dictionaries of the noise and speech+noise portions of the data ...

• ### Pseudo-Determined Blind Source Separation for Ad-hoc Microphone Networks

Publication Year: 2018, Page(s):981 - 994
We propose a pseudo-determined blind source separation framework that exploits the information from a large number of microphones in an ad-hoc network to extract and enhance sound sources in a reverberant scenario. After compensating for the time offsets and sampling rate mismatch between (asynchronous) signals, we interpret as a determined $M \times M$...

• ### Scoring Heterogeneous Speaker Vectors Using Nonlinear Transformations and Tied PLDA Models

Publication Year: 2018, Page(s):995 - 1009
Most current state-of-the-art text-independent speaker recognition systems are based on i-vectors, and on probabilistic linear discriminant analysis (PLDA). PLDA assumes that the i-vectors of a trial are homogeneous, i.e., that they have been extracted by the same system. In other words, the enrollment and test i-vectors belong to the same class. However, it is sometimes important to score trials ...

• ### Subjective and Objective Sound-Quality Evaluation of Adaptive Feedback Cancellation Algorithms

Publication Year: 2018, Page(s):1010 - 1024
Objective measures are widely used for the perceptual sound-quality evaluation of audio signal processing algorithms. Nevertheless, the use of subjective-evaluation measures remains relevant, in particular when application-specific objective measures are lacking. In this paper, we present a perceptual sound-quality evaluation of different algorithms for adaptive feedback cancellation (AFC), with b...

## Aims & Scope

The IEEE/ACM Transactions on Audio, Speech, and Language Processing is dedicated to innovative theory and methods for processing signals representing audio, speech and language, and their applications. This includes analysis, synthesis, enhancement, transformation, classification and interpretation of such signals as well as the design, development, and evaluation of associated signal processing systems.

