Issue 2 • Date Mar 2003
Presented is a new coding paradigm, multimode transform predictive coding (MTPC), which combines speech and audio coding principles in a single coding structure. The paradigm is adaptive: it automatically adjusts how different coding modules are used based on the input signal. This allows MTPC coders to robustly handle a wider range of signals than single-configuration (single-mode) transform predictive coding (TPC) designs. A wideband MTPC coder design targeting two-way communication applications and bitrates from 13 to 40 kbit/s is also presented. Subjective absolute category rating test results on speech, speech in noise and music demonstrate that the performance at 16, 24 and 32 kbit/s meets or exceeds that of ITU-T Rec. G.722 at 48, 56 and 64 kbit/s respectively for many coding conditions. Subjective Reference-ABx (R-ABx) tests are also included to show the potential advantages of the multimode coder over a single-mode TPC coder. Finally, possible improvements in the MTPC coder design for applications such as broadcasting, which are less sensitive to delay and encoder complexity, are discussed.
The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech
Page(s): 109 - 116
Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not beneficial to be constrained by the condition T>P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS, like an adaptive beamformer, can reduce reverberant sounds to some extent, it mainly removes the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.
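As a rough illustration of the trade-off described in the abstract above: for a fixed amount of observed data, a longer DFT frame T covers more of the reverberation but leaves fewer frames, and hence fewer samples per frequency bin, for estimating statistics. The sketch below only counts frames. The sampling rate, signal length and 50% overlap are assumed values, not taken from the paper, and `frames_per_bin` is a hypothetical helper.

```python
def frames_per_bin(num_samples: int, frame_len: int, hop: int) -> int:
    """Count the short-time DFT frames that fit in the signal.

    Each frame contributes one complex sample to every frequency bin,
    so this is also the number of samples per bin available for
    estimating statistics.
    """
    if num_samples < frame_len:
        return 0
    return (num_samples - frame_len) // hop + 1


fs = 8000                      # assumed sampling rate in Hz
duration_s = 3                 # assumed length of the observed mixture
num_samples = fs * duration_s  # 24000 samples

counts = {}
for frame_len in (256, 512, 1024, 2048, 4096):
    hop = frame_len // 2       # assumed 50% frame overlap
    counts[frame_len] = frames_per_bin(num_samples, frame_len, hop)
    print(f"T = {frame_len:4d} samples -> {counts[frame_len]:3d} frames per bin")
# Frame counts for T = 256 ... 4096: 186, 92, 45, 22, 10
```

Under these assumptions, going from T = 256 to T = 4096 cuts the data per bin from 186 frames to 10, which is the pressure against simply choosing T larger than the impulse response length P.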
A computationally efficient, high quality vector quantization scheme based on a parametric probability density function (PDF) is proposed. In this scheme, the observations are modeled as i.i.d. realizations of a multivariate Gaussian mixture density. The mixture model parameters are efficiently estimated using the expectation maximization (EM) algorithm. A low complexity quantization scheme using transform coding and bit allocation techniques, which allows for easy mapping from observation to quantized value, is developed for both fixed rate and variable rate systems. An attractive feature of this method is that source encoding using the resultant codebook involves very few searches, and its computational complexity is minimal and independent of the rate of the system. Furthermore, the proposed scheme is bit scalable and can switch seamlessly between a memoryless quantizer and a quantizer with memory. The usefulness of the approach is demonstrated for speech coding, where Gaussian mixture models are used to model speech line spectral frequencies. The performance of the memoryless quantizer is 1-3 bits better than conventional quantization schemes.
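The abstract above mentions transform coding with bit allocation but does not spell out the rule used. As a hedged illustration only, the sketch below applies the classical high-resolution bit allocation formula for independent Gaussian transform coefficients, b_i = B/n + (1/2) log2(var_i / geometric mean of the variances). This is a textbook rule swapped in for illustration, not necessarily the paper's scheme; `allocate_bits` and the example variances are hypothetical.

```python
import math


def allocate_bits(variances, total_bits):
    """Classical high-resolution bit allocation across transform coefficients.

    Returns real-valued bit counts that sum to total_bits. Components with
    larger variance receive more bits. Values can be fractional or even
    negative, so a practical coder would still have to round and clip them.
    """
    n = len(variances)
    # log2 of the geometric mean of the variances
    log_gm = sum(math.log2(v) for v in variances) / n
    return [total_bits / n + 0.5 * (math.log2(v) - log_gm) for v in variances]


# Assumed per-coefficient variances, e.g. eigenvalues of one mixture
# component's covariance after a decorrelating transform.
bits = allocate_bits([16.0, 4.0, 1.0, 0.25], total_bits=12)
print(bits)  # [4.5, 3.5, 2.5, 1.5]
```

Because the allocation is a closed-form function of the model variances, no codebook search is needed to decide how finely each coefficient is quantized, which is in the spirit of the low search complexity the abstract describes.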
For high quality acoustic echo cancellation, long echoes have to be suppressed. Classical LMS-based adaptive filters are not attractive as they are suboptimal from a computational point of view. Multirate adaptive filters such as the partitioned block frequency-domain adaptive filter (PBFDAF) are good alternatives and are widely used in commercial echo cancellers nowadays. In this paper the PBFDRAP is analyzed, which combines frequency-domain adaptive filtering with so-called "row action projection." Fast versions of the algorithm are derived and it is shown that the PBFDRAP outperforms the PBFDAF in a realistic echo cancellation setup.
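For context on the baseline the abstract above contrasts against, here is a minimal time-domain normalized LMS (NLMS) echo canceller. It is not the PBFDAF or PBFDRAP, whose details the abstract does not give. The echo path, step size and signal are assumed toy values, and `nlms_echo_canceller` is a hypothetical name.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps, mu=0.5, eps=1e-8):
    """Time-domain NLMS: adapt an FIR estimate of the echo path.

    Every input sample costs O(num_taps) operations for filtering plus
    O(num_taps) for the update, which becomes expensive when the echo
    path spans thousands of taps.
    """
    w = [0.0] * num_taps    # current echo path estimate
    buf = [0.0] * num_taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        e = d - y                                   # echo-cancelled output
        norm = sum(b * b for b in buf) + eps        # input power normalization
        gain = mu * e / norm
        w = [wi + gain * bi for wi, bi in zip(w, buf)]
        residual.append(e)
    return w, residual


random.seed(0)
echo_path = [0.6, -0.3, 0.1]  # assumed toy echo path (3 taps)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Microphone signal = far-end signal convolved with the echo path (no noise)
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

w, residual = nlms_echo_canceller(far_end, mic, num_taps=len(echo_path))
print([round(wi, 4) for wi in w])
```

In this noise-free toy setup the estimate converges to the assumed echo path. The per-sample cost that grows linearly with filter length is the computational drawback for long echoes that motivates block frequency-domain approaches.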
Aims & Scope
Covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.
This Transactions ceased publication in 2005. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.