IEEE Transactions on Audio, Speech, and Language Processing

Issue 2 • February 2008

Displaying Results 1 - 25 of 25
  • Table of contents

    Publication Year: 2008 , Page(s): C1 - C4
    PDF (45 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing publication information

    Publication Year: 2008 , Page(s): C2
    PDF (36 KB)
    Freely Available from IEEE
  • Introduction to the Special Issue on Music Information Retrieval

    Publication Year: 2008 , Page(s): 253 - 254
    Cited by:  Papers (1)
    PDF (450 KB) | HTML
    Freely Available from IEEE
  • Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model

    Publication Year: 2008 , Page(s): 255 - 266
    Cited by:  Papers (35)  |  Patents (2)
    PDF (474 KB) | HTML

    A method is described for estimating the fundamental frequencies of several concurrent sounds in polyphonic music and multiple-speaker speech signals. The method consists of a computational model of the human auditory periphery, followed by a periodicity analysis mechanism where fundamental frequencies are iteratively detected and canceled from the mixture signal. The auditory model needs to be co... (A much-simplified sketch of the detect-and-cancel idea appears after the contents list.)

  • Discriminating Between Pitched Sources in Music Audio

    Publication Year: 2008 , Page(s): 267 - 277
    Cited by:  Papers (4)
    PDF (645 KB) | HTML

    Though humans find it relatively easy to identify and/or isolate different sources within polyphonic music, the emulation of this ability by a computer is a challenging task, and one that has direct relevance to music content description and information retrieval applications. For an automated system without any prior knowledge of a recording, a possible solution is to perform an initial segmentat...

  • Normalized Cuts for Predominant Melodic Source Separation

    Publication Year: 2008 , Page(s): 278 - 290
    Cited by:  Papers (11)
    PDF (940 KB) | HTML

    The predominant melodic source, frequently the singing voice, is an important component of musical signals. In this paper, we describe a method for extracting the predominant source and corresponding melody from "real-world" polyphonic music. The proposed method is inspired by ideas from computational auditory scene analysis. We formulate predominant melodic source tracking and formation a... (A sketch of the generic normalized-cut step appears after the contents list.)

  • Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized Audio

    Publication Year: 2008 , Page(s): 291 - 301
    Cited by:  Papers (32)
    PDF (828 KB) | HTML

    We describe an acoustic chord transcription system that uses symbolic data to train hidden Markov models and gives best-of-class frame-level recognition results. We avoid the extremely laborious task of human annotation of chord names and boundaries (which must be done to provide machine learning models with ground truth) by performing automatic harmony analysis on symbolic music files. In parallel,... (A generic chroma-template Viterbi sketch appears after the contents list.)

  • Distortion Estimation in Compressed Music Using Only Audio Fingerprints

    Publication Year: 2008 , Page(s): 302 - 317
    Cited by:  Papers (7)  |  Patents (1)
    PDF (1169 KB) | HTML

    An audio fingerprint is a compact yet very robust representation of the perceptually relevant parts of an audio signal. It can be used for content-based audio identification, even when the audio is severely distorted. Audio compression changes the fingerprint slightly. We show that these small fingerprint differences due to compression can be used to estimate the signal-to-noise ratio (SNR) of the... (A sketch of a simplified fingerprint bit-error comparison appears after the contents list.)

  • Structural Segmentation of Musical Audio by Constrained Clustering

    Publication Year: 2008 , Page(s): 318 - 326
    Cited by:  Papers (29)  |  Patents (1)
    PDF (1286 KB) | HTML

    We describe a method of segmenting musical audio into structural sections based on a hierarchical labeling of spectral features. Frames of audio are first labeled as belonging to one of a number of discrete states using a hidden Markov model trained on the features. Histograms of neighboring frames are then clustered into segment-types representing distinct distributions of states, using a cluster...

  • Unified View of Prediction and Repetition Structure in Audio Signals With Application to Interest Point Detection

    Publication Year: 2008 , Page(s): 327 - 337
    Cited by:  Papers (8)
    PDF (1368 KB) | HTML

    In this paper, we present a new method for analysis of musical structure that captures local prediction and global repetition properties of audio signals in one information processing framework. The method is motivated by a recent work in music perception where machine features were shown to correspond to human judgments of familiarity and emotional force when listening to music. Using a notion of...

  • LyricAlly: Automatic Synchronization of Textual Lyrics to Acoustic Music Signals

    Publication Year: 2008 , Page(s): 338 - 349
    Cited by:  Papers (6)  |  Patents (4)
    PDF (1641 KB) | HTML

    We present LyricAlly, a prototype that automatically aligns acoustic musical signals with their corresponding textual lyrics, in a manner similar to manually-aligned karaoke. We tackle this problem based on a multimodal approach, using an appropriate pairing of audio and text processing to create the resulting prototype. LyricAlly's acoustic signal processing uses standard audio features but const...

  • A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming

    Publication Year: 2008 , Page(s): 350 - 358
    Cited by:  Papers (15)
    PDF (501 KB) | HTML

    This paper presents the mathematical formulation and design methodology of progressive filtering (PF) for multimedia information retrieval, and discusses its application to the so-called query by singing/humming (QBSH), or more formally, melody recognition. The concept of PF and the corresponding dynamic programming-based design method are applicable to large multimedia retrieval systems for strik...

  • Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach

    Publication Year: 2008 , Page(s): 359 - 371
    Cited by:  Papers (9)  |  Patents (1)
    PDF (1181 KB) | HTML

    Robust data retrieval in the presence of uncertainty is a challenging problem in multimedia information retrieval. In query-by-humming (QBH) systems, uncertainty can arise in query formulation due to user-dependent variability, such as incorrectly hummed notes, and in query transcription due to machine-based errors, such as insertions and deletions. We propose a fingerprinting (FP) algorithm for r...

  • Searching Musical Audio Using Symbolic Queries

    Publication Year: 2008 , Page(s): 372 - 381
    Cited by:  Papers (1)
    PDF (1153 KB) | HTML

    Finding a piece of music based on its content is a key problem in music information retrieval. For example, a user may be interested in finding music based on knowledge of only a small fragment of the overall tune. In this paper, we consider the searching of musical audio using symbolic queries. We first propose a relative pitch approach for representing queries and pieces. Experimen...

  • Efficient Index-Based Audio Matching

    Publication Year: 2008 , Page(s): 382 - 395
    Cited by:  Papers (20)  |  Patents (1)
    PDF (1072 KB) | HTML

    Given a large audio database of music recordings, the goal of classical audio identification is to identify a particular audio recording by means of a short audio fragment. Even though recent identification algorithms show a significant degree of robustness towards noise, MP3 compression artifacts, and uniform temporal distortions, the notion of similarity is rather close to the identity. In this ...

  • A Quick Search Method for Audio Signals Based on a Piecewise Linear Representation of Feature Trajectories

    Publication Year: 2008 , Page(s): 396 - 407
    Cited by:  Papers (6)  |  Patents (1)
    PDF (1463 KB) | HTML

    This paper presents a new method for a quick similarity-based search through long unlabeled audio streams to detect and locate audio clips provided by users. The method involves feature-dimension reduction based on a piecewise linear representation of a sequential feature trajectory extracted from a long audio stream. Two techniques enable us to obtain a piecewise linear representation: the dynami...

  • Computational Models of Similarity for Drum Samples

    Publication Year: 2008 , Page(s): 408 - 423
    PDF (1937 KB) | HTML

    In this paper, we optimize and evaluate computational models of similarity for sounds from the same instrument class. We investigate four instrument classes: bass drums, snare drums, high-pitched toms, and low-pitched toms. We evaluate two similarity models: one is defined in the ISO/IEC MPEG-7 standard, and the other is based on auditory images. For the second model, we study the impact of variou...

  • Musical Genre Classification Using Nonnegative Matrix Factorization-Based Features

    Publication Year: 2008 , Page(s): 424 - 434
    Cited by:  Papers (24)
    PDF (503 KB) | HTML

    Nonnegative matrix factorization (NMF) is used to derive a novel description for the timbre of musical sounds. Using NMF, a spectrogram is factorized providing a characteristic spectral basis. Assuming a set of spectrograms given a musical genre, the space spanned by the vectors of the obtained spectral bases is modeled statistically using mixtures of Gaussians, resulting in a description of the s... (A sketch of the basic NMF update appears after the contents list.)

  • An Efficient Hybrid Music Recommender System Using an Incrementally Trainable Probabilistic Generative Model

    Publication Year: 2008 , Page(s): 435 - 447
    Cited by:  Papers (26)  |  Patents (1)
    PDF (975 KB) | HTML

    This paper presents a hybrid music recommender system that ranks musical pieces while efficiently maintaining collaborative and content-based data, i.e., rating scores given by users and acoustic features of audio signals. This hybrid approach overcomes the conventional tradeoff between recommendation accuracy and variety of recommended artists. Collaborative filtering, which is used on e-commerce...

  • A Regression Approach to Music Emotion Recognition

    Publication Year: 2008 , Page(s): 448 - 457
    Cited by:  Papers (46)
    PDF (845 KB) | HTML

    Content-based retrieval has emerged in the face of content explosion as a promising approach to information access. In this paper, we focus on the challenging issue of recognizing the emotion content of music signals, or music emotion recognition (MER). Specifically, we formulate MER as a regression problem to predict the arousal and valence values (AV values) of each music sample directly. Associ... (A minimal regression sketch appears after the contents list.)

  • Score-Independent Audio Features for Description of Music Expression

    Publication Year: 2008 , Page(s): 458 - 466
    Cited by:  Papers (7)
    PDF (758 KB) | HTML

    During a music performance, the musician adds expressiveness to the musical message by changing timing, dynamics, and timbre of the musical events to communicate an expressive intention. Traditionally, the analysis of music expression is based on measurements of the deviations of the acoustic parameters with respect to the written score. In this paper, we employ machine learning techniques to unde...

  • Semantic Annotation and Retrieval of Music and Sound Effects

    Publication Year: 2008 , Page(s): 467 - 476
    Cited by:  Papers (64)  |  Patents (4)
    PDF (798 KB) | HTML

    We present a computer audition system that can both annotate novel audio tracks with semantically meaningful words and retrieve relevant tracks from a database of unlabeled audio content given a text-based query. We consider the related tasks of content-based audio annotation and retrieval as one supervised multiclass, multilabel problem in which we model the joint probability of acoustic features...

  • IEEE Transactions on Audio, Speech, and Language Processing Edics

    Publication Year: 2008 , Page(s): 477 - 478
    PDF (30 KB)
    Freely Available from IEEE
  • IEEE Transactions on Audio, Speech, and Language Processing Information for authors

    Publication Year: 2008 , Page(s): 479 - 480
    PDF (45 KB)
    Freely Available from IEEE
  • IEEE Signal Processing Society Information

    Publication Year: 2008 , Page(s): C3
    PDF (31 KB)
    Freely Available from IEEE
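
Illustrative Sketches

The sketches below are rough illustrations of ideas described in the abstracts above. None of them reproduces the respective authors' methods, and all parameter choices, helper names, and thresholds are assumptions made purely for illustration.

For "Multipitch Analysis of Polyphonic Music and Speech Signals Using an Auditory Model" (pp. 255-266): a much-simplified detect-and-cancel loop on a plain magnitude spectrum. The paper's auditory-periphery front end is omitted entirely; the candidate grid, harmonic count, and cancellation factor are illustrative.

```python
import numpy as np

def estimate_f0s(magnitude_spectrum, sample_rate, n_fft,
                 n_sources=3, f0_range=(60.0, 600.0),
                 n_harmonics=10, cancel_factor=0.8):
    """Greedy, iterative multipitch estimate from one magnitude spectrum."""
    spec = magnitude_spectrum.astype(float).copy()
    freqs = np.arange(len(spec)) * sample_rate / n_fft
    candidates = np.arange(f0_range[0], f0_range[1], 1.0)  # 1-Hz grid
    f0s = []
    for _ in range(n_sources):
        best_f0, best_score = candidates[0], -np.inf
        for f0 in candidates:
            bins = [int(np.argmin(np.abs(freqs - h * f0)))
                    for h in range(1, n_harmonics + 1)]
            score = spec[bins].sum()            # harmonic summation
            if score > best_score:
                best_f0, best_score = f0, score
        f0s.append(best_f0)
        # Cancel the detected source's harmonics before the next pass.
        for h in range(1, n_harmonics + 1):
            b = int(np.argmin(np.abs(freqs - h * best_f0)))
            spec[max(b - 1, 0):b + 2] *= (1.0 - cancel_factor)
    return f0s
```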
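For "Normalized Cuts for Predominant Melodic Source Separation" (pp. 278-290): only the generic two-way normalized-cut step on a precomputed affinity matrix between time-frequency components. The paper's auditory-scene-analysis-inspired affinities and melody tracking are not modeled here.

```python
import numpy as np

def two_way_normalized_cut(affinity):
    """Split components into two groups via the normalized graph Laplacian."""
    d = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d + 1e-12)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(affinity)) - (d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :])
    _, eigvecs = np.linalg.eigh(laplacian)
    fiedler = eigvecs[:, 1]                    # second-smallest eigenvector
    return fiedler > np.median(fiedler)        # boolean partition of components
```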
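For "Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized Audio" (pp. 291-301): a generic chroma-template HMM decoded with Viterbi. The paper trains key-dependent HMMs on synthesized audio; the binary triad templates and single self-transition probability below are assumptions.

```python
import numpy as np

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
CHORDS = [f"{n}:{q}" for q in ("maj", "min") for n in NOTE_NAMES]

def chord_templates():
    """Binary major/minor triad templates over the 12 chroma bins."""
    temps = np.zeros((len(CHORDS), 12))
    for i, name in enumerate(CHORDS):
        note, quality = name.split(":")
        root = NOTE_NAMES.index(note)
        third = 4 if quality == "maj" else 3
        temps[i, [root, (root + third) % 12, (root + 7) % 12]] = 1.0
    return temps / np.linalg.norm(temps, axis=1, keepdims=True)

def viterbi_chords(chroma, self_transition=0.9):
    """chroma: (n_frames, 12) nonnegative features -> list of chord labels."""
    templates = chord_templates()
    n_states = len(CHORDS)
    chroma = chroma / (np.linalg.norm(chroma, axis=1, keepdims=True) + 1e-9)
    log_emit = np.log(chroma @ templates.T + 1e-9)   # cosine-similarity "emissions"
    log_trans = np.full((n_states, n_states),
                        np.log((1.0 - self_transition) / (n_states - 1)))
    np.fill_diagonal(log_trans, np.log(self_transition))
    delta = log_emit[0].copy()
    backptr = np.zeros((len(chroma), n_states), dtype=int)
    for t in range(1, len(chroma)):
        scores = delta[:, None] + log_trans          # scores[i, j]: from state i to j
        backptr[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(len(chroma) - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return [CHORDS[s] for s in reversed(path)]
```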
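For "Distortion Estimation in Compressed Music Using Only Audio Fingerprints" (pp. 302-317): a simplified Haitsma/Kalker-style binary fingerprint and the bit error rate between the fingerprints of an original and a compressed signal. The paper goes further and maps such differences to an SNR estimate; the frame, hop, and band settings are assumptions.

```python
import numpy as np

def binary_fingerprint(x, frame=2048, hop=1024, n_bands=33):
    """Sign of the time/frequency difference of coarse band energies (32 bits/frame)."""
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame, hop)]          # x must be longer than frame
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = np.linspace(0, spec.shape[1], n_bands + 1, dtype=int)
    bands = np.stack([spec[:, a:b].sum(axis=1)
                      for a, b in zip(edges[:-1], edges[1:])], axis=1)
    d_freq = np.diff(bands, axis=1)        # difference across bands
    return np.diff(d_freq, axis=0) > 0     # difference across frames, thresholded

def bit_error_rate(x_orig, x_coded):
    f1, f2 = binary_fingerprint(x_orig), binary_fingerprint(x_coded)
    n = min(len(f1), len(f2))
    return np.mean(f1[:n] != f2[:n])       # a higher BER suggests more distortion
```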
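For "Musical Genre Classification Using Nonnegative Matrix Factorization-Based Features" (pp. 424-434): the standard multiplicative-update NMF of a magnitude spectrogram, V ≈ W H, whose columns of W play the role of the spectral basis mentioned in the abstract. The per-genre Gaussian-mixture modeling of those bases is not shown; rank and iteration count are arbitrary.

```python
import numpy as np

def nmf(V, rank=20, n_iter=200, eps=1e-9):
    """Factorize a nonnegative spectrogram V (freq x frames) as W @ H."""
    rng = np.random.default_rng(0)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps
    H = rng.random((rank, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Euclidean (Frobenius) cost.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```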
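For "A Regression Approach to Music Emotion Recognition" (pp. 448-457): treating MER as regression, with one output column per dimension (arousal, valence). Closed-form ridge regression is used here only as a stand-in for the regressors evaluated in the paper, and the feature matrix X is assumed to come from a separate extraction step; in practice alpha would be chosen by cross-validation.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """X: (n_clips, n_features); y: (n_clips, 2) with columns [arousal, valence]."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)              # weights: (n_features + 1, 2)

def predict_av(weights, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ weights                              # predicted AV values per clip
```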

Aims & Scope

IEEE Transactions on Audio, Speech, and Language Processing covers the sciences, technologies and applications relating to the analysis, coding, enhancement, recognition and synthesis of audio, music, speech and language.

 

This Transactions ceased publication in 2013. The current retitled publication is IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Meet Our Editors

Editor-in-Chief
Li Deng
Microsoft Research