2010 IEEE Spoken Language Technology Workshop (SLT)

Date: 12-15 Dec. 2010

Displaying Results 1 - 25 of 93
  • [Front cover]

    Publication Year: 2010, Page(s): c1
    PDF (441 KB) | Freely Available from IEEE
  • [Title page]

    Publication Year: 2010, Page(s): i
    PDF (427 KB) | Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2010, Page(s): ii
    PDF (424 KB) | Freely Available from IEEE
  • Organizing Committee

    Publication Year: 2010, Page(s): iii - iv
    PDF (445 KB) | Freely Available from IEEE
  • Table of contents

    Publication Year: 2010, Page(s): v - xii
    PDF (449 KB) | Freely Available from IEEE
  • Learning from images and speech with Non-negative Matrix Factorization enhanced by input space scaling

    Publication Year: 2010, Page(s): 1 - 6
    Cited by: Papers (1)
    PDF (254 KB) | HTML

    Computational learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). The different modalities of the input are to this end converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representatio...
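
    A minimal sketch of the idea named in the title, joint NMF over stacked image and speech features with a per-modality input scaling step; this is not the authors' implementation, and the matrices, dimensions and scaling factors below are invented for illustration.

        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(0)

        # Illustrative non-negative feature matrices: rows = features, columns = samples.
        V_image = rng.random((50, 200))   # e.g. histogram-of-visual-words counts
        V_speech = rng.random((30, 200))  # e.g. acoustic co-occurrence counts

        # "Input space scaling": rescale each modality so that neither block
        # dominates the shared reconstruction error once the blocks are stacked.
        V = np.vstack([V_image / V_image.sum(), V_speech / V_speech.sum()])

        # Factorize the stacked matrix V ~ W @ H; the columns of H act as a
        # shared cross-modal latent representation, one per sample.
        model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
        W = model.fit_transform(V)   # (80, 10) basis vectors spanning both modalities
        H = model.components_        # (10, 200) activations per sample
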
  • Automatically assessing acoustic manifestations of personality in speech

    Publication Year: 2010, Page(s): 7 - 12
    PDF (140 KB) | HTML

    In this paper, we present first results on applying a personality assessment paradigm to speech input, and comparing human and automatic performance on this task. We cue a professional speaker to produce speech using different personality profiles and encode the resulting vocal personality impressions in terms of the Big Five NEO-FFI personality traits. We then have human raters, who do not know t...
  • Significance of anchor speaker segments for constructing extractive audio summaries of broadcast news

    Publication Year: 2010, Page(s): 13 - 18
    Cited by: Papers (1)
    PDF (221 KB) | HTML

    Analysis of human reference summaries of broadcast news showed that humans give preference to anchor speaker segments while constructing a summary. Therefore, we exploit the role of the anchor speaker in a news show by tracking his/her speech to construct indicative/informative extractive audio summaries. Speaker tracking is done using the Bayesian information criterion (BIC). The proposed techniqu...
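
    As a hedged illustration of the BIC criterion mentioned in the abstract (not the paper's tracking system), the sketch below computes the standard delta-BIC between two feature segments, each modeled as a single full-covariance Gaussian; the penalty weight lam and the synthetic MFCC-like frames are assumptions made for the example.

        import numpy as np

        def delta_bic(X, Y, lam=1.0):
            """Delta-BIC between segments X and Y (frames x dims). Positive values
            favor modeling them separately, e.g. as two different speakers."""
            Z = np.vstack([X, Y])
            n_x, n_y, n_z = len(X), len(Y), len(Z)
            d = X.shape[1]

            def logdet_cov(S):
                # log-determinant of the sample covariance (slogdet is numerically safer)
                return np.linalg.slogdet(np.cov(S, rowvar=False))[1]

            # Data term: how much better two Gaussians fit than a single one.
            data_term = 0.5 * (n_z * logdet_cov(Z)
                               - n_x * logdet_cov(X)
                               - n_y * logdet_cov(Y))
            # Complexity penalty for the extra mean vector and covariance matrix.
            penalty = 0.5 * lam * (d + 0.5 * d * (d + 1)) * np.log(n_z)
            return data_term - penalty

        # Toy 13-dimensional, MFCC-like frames (purely synthetic values).
        rng = np.random.default_rng(0)
        anchor = rng.normal(0.0, 1.0, size=(300, 13))
        other = rng.normal(0.5, 1.2, size=(300, 13))
        print(delta_bic(anchor, other))   # > 0 suggests two different speakers
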
  • What is left to be understood in ATIS?

    Publication Year: 2010, Page(s): 19 - 24
    Cited by: Papers (11)
    PDF (109 KB) | HTML

    One of the main data resources used in many studies over the past two decades for spoken language understanding (SLU) research in spoken dialog systems is the airline travel information system (ATIS) corpus. Two primary tasks in SLU are intent determination (ID) and slot filling (SF). Recent studies reported error rates below 5% for both of these tasks employing discriminative machine learning tec...
  • Robust representations for out-of-domain emotions using Emotion Profiles

    Publication Year: 2010, Page(s): 25 - 30
    Cited by: Papers (2)
    PDF (128 KB) | HTML

    The proper representation of emotion is of vital importance for human-machine interaction. A correct understanding of emotion would allow interactive technology to appropriately respond and adapt to users. In human-machine interaction scenarios it is likely that over the course of an interaction, the human interaction partner will express an emotion not seen during the training of the machine's em...
  • Investigating modality selection strategies

    Publication Year: 2010, Page(s): 31 - 36
    PDF (311 KB) | HTML

    This paper describes a user study about the influence of efficiency on modality selection (speech vs. virtual keyboard / speech vs. physical keyboard) and perceived mental effort. Efficiency was varied in terms of interaction steps. Based on previous research it was hypothesized that the number of necessary interaction steps determines the preference for a specific modality. Moreover the relationsh...
  • Using spoken utterance compression for meeting summarization: A pilot study

    Publication Year: 2010, Page(s): 37 - 42
    Cited by: Papers (3)
    PDF (137 KB) | HTML

    Most previous work on meeting summarization focused on extractive approaches; however, directly concatenating the extracted spoken utterances may not form a good summary. In this paper, we investigate if it is feasible to compress the transcribed spoken utterances and if using the compressed utterances benefits meeting summarization. We model the utterance compression task as a sequence labeling p...
  • Unbiased discourse segmentation evaluation

    Publication Year: 2010, Page(s): 43 - 48
    PDF (134 KB) | HTML

    In this paper, we show that the performance measures Pk and WindowDiff, commonly used for discourse, topic, and story segmentation evaluation, are biased in favor of segmentations with fewer or adjacent segment boundaries. By analytical and empirical means, we show how this results in a failure to penalize substantially defective segmentations. Our novel unbiased measure k-κ corrects this,...
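
    For readers unfamiliar with the two measures discussed above, here is a minimal sketch of Pk and WindowDiff over boundary strings; it follows one common formulation (windowing conventions differ slightly across papers) and is not the corrected measure this paper proposes. The example segmentations are invented.

        def _window_size(ref):
            # Conventional choice: half the average reference segment length.
            return max(1, int(round(len(ref) / (ref.count('1') * 2.0))))

        def pk(ref, hyp, k=None):
            """Pk over boundary strings such as '0010010' ('1' = boundary after that
            position): probe windows of width k and count disagreements about
            whether each window contains a boundary at all."""
            k = k or _window_size(ref)
            spans = [(ref[i:i + k], hyp[i:i + k]) for i in range(len(ref) - k + 1)]
            return sum(('1' in r) != ('1' in h) for r, h in spans) / float(len(spans))

        def windowdiff(ref, hyp, k=None):
            """WindowDiff: penalize windows where the two segmentations place a
            different number of boundaries, not just a different presence."""
            k = k or _window_size(ref)
            spans = [(ref[i:i + k], hyp[i:i + k]) for i in range(len(ref) - k + 1)]
            return sum(r.count('1') != h.count('1') for r, h in spans) / float(len(spans))

        ref = '0010001000100'
        hyp = '0000001000000'   # misses two of the three reference boundaries
        print(pk(ref, hyp), windowdiff(ref, hyp))
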
  • Detecting authority bids in online discussions

    Publication Year: 2010, Page(s): 49 - 54
    Cited by: Papers (2)
    PDF (91 KB) | HTML

    This paper looks at the problem of detecting a particular type of social behavior in discussions: attempts to establish credibility as an authority on a particular topic. Using maximum entropy modeling, we explore questions related to feature extraction and turn vs. discussion-level modeling in experiments with online discussion text given only a small amount of labeled training data. We also intr...
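
    A small, hedged sketch of the kind of maximum entropy (multinomial logistic regression) turn classifier the abstract refers to, built on plain bag-of-words features; the example turns, labels and feature choice are invented simplifications, not the paper's feature set.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Invented toy turns, labeled 1 when the writer bids for authority on the topic.
        turns = [
            "As someone who has published on this, the standard result says otherwise.",
            "I have run these experiments myself and the numbers do not hold up.",
            "Interesting point, I had not thought about it that way.",
            "Could you share a link to the paper you mean?",
        ]
        labels = [1, 1, 0, 0]

        # Maximum entropy modeling over sparse text features == multinomial
        # logistic regression on word and bigram counts.
        model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                              LogisticRegression(max_iter=1000))
        model.fit(turns, labels)
        print(model.predict(["Trust me, I wrote the original implementation of this."]))
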
  • Utilizing relationships between named entities to improve speech recognition in dialog systems

    Publication Year: 2010, Page(s): 55 - 60
    PDF (166 KB) | HTML

    In this paper, we address the problem of improving recognition accuracy of spoken named entities in the context of dialog systems for transactional applications. We propose utilizing the knowledge of relationships, that typically exist in many applications, between named entities spoken across different dialog states. For example, in a bank customer database each customer name is associated with o...
  • Improving HMM-based extractive summarization for multi-domain contact center dialogues

    Publication Year: 2010, Page(s): 61 - 66
    Cited by: Papers (1)
    PDF (153 KB) | HTML

    This paper reports the improvements we made to our previously proposed hidden Markov model (HMM) based summarization method for multi-domain contact center dialogues. Since the method relied on Viterbi decoding for selecting utterances to include in a summary, it could not control compression rates. We enhance our method by using the forward-backward algorithm together with integer line...
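
    As background for the Viterbi vs. forward-backward distinction drawn in the abstract, the sketch below is a self-contained scaled forward-backward pass that turns an HMM into per-utterance state posteriors, the kind of soft score a budget-constrained selection step could then consume; the 2-state model and all probabilities are toy values, not the paper's model.

        import numpy as np

        def forward_backward(pi, A, B, obs):
            """Scaled forward-backward. pi: initial probs (S,), A: transitions (S, S),
            B: emission probs (S, V), obs: list of observation indices.
            Returns per-step state posteriors gamma with shape (T, S)."""
            T, S = len(obs), len(pi)
            alpha, beta, scale = np.zeros((T, S)), np.zeros((T, S)), np.zeros(T)

            # Forward pass with per-step scaling to avoid numerical underflow.
            alpha[0] = pi * B[:, obs[0]]
            scale[0] = alpha[0].sum()
            alpha[0] /= scale[0]
            for t in range(1, T):
                alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
                scale[t] = alpha[t].sum()
                alpha[t] /= scale[t]

            # Backward pass reusing the forward scaling factors.
            beta[T - 1] = 1.0
            for t in range(T - 2, -1, -1):
                beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / scale[t + 1]

            gamma = alpha * beta
            return gamma / gamma.sum(axis=1, keepdims=True)

        # Toy 2-state model ("summary-worthy" vs "background") over 3 symbols.
        pi = np.array([0.6, 0.4])
        A = np.array([[0.7, 0.3],
                      [0.4, 0.6]])
        B = np.array([[0.5, 0.4, 0.1],
                      [0.1, 0.3, 0.6]])
        print(forward_backward(pi, A, B, obs=[0, 1, 2, 2, 0]))
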
  • Semantic understanding by combining extended CFG parser with HMM model

    Publication Year: 2010, Page(s): 67 - 72
    PDF (125 KB) | HTML

    This paper presents a method for extracting both syntactic and semantic tags. An extended CFG parser works in conjunction with an HMM model, which handles unknown words and partially known words, to yield a complete syntactic and semantic interpretation of the utterance. Four experiments and applications were performed using the paradigm to show the usefulness of the approach in processing spoken ...
  • Haptic Voice Recognition: Augmenting speech modality with touch events for efficient speech recognition

    Publication Year: 2010, Page(s): 73 - 78
    Cited by: Papers (1)
    PDF (392 KB) | HTML

    This paper proposes Haptic Voice Recognition (HVR), a multi-modal interface that combines speech and touch sensory inputs to perform voice recognition. These touch inputs form a series of haptic events that provide cues or 'landmarks' for word boundaries. These word boundary cues greatly reduce the search space for speech recognition, thereby making the decoding process more efficient and suit...
  • Probabilistic model-based sentiment analysis of Twitter messages

    Publication Year: 2010, Page(s): 79 - 84
    Cited by: Papers (5)
    PDF (104 KB) | HTML

    We present a machine learning approach to sentiment classification on Twitter messages (tweets). We classify each tweet into two categories: polar and non-polar. Tweets with positive or negative sentiment are considered polar; they are considered non-polar otherwise. Sentiment analysis of tweets can potentially benefit different parties, such as consumers and marketing researchers, for obtaining o...
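
    A generic, hedged sketch of a probabilistic polar vs. non-polar tweet classifier; this is a plain multinomial Naive Bayes baseline, not necessarily the model the paper uses, and the example tweets and labels are invented.

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Invented tweets: "polar" = carries positive or negative sentiment.
        tweets = [
            "absolutely love the new phone, battery life is amazing",
            "worst customer service I have ever dealt with",
            "heading to the airport now, flight at 9",
            "new blog post is up, link in bio",
        ]
        labels = ["polar", "polar", "non-polar", "non-polar"]

        # Word and bigram counts feed a multinomial Naive Bayes model that
        # scores P(polar | tweet) against P(non-polar | tweet).
        model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
        model.fit(tweets, labels)
        print(model.predict(["this update completely broke my app"]))
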
  • Let's Buy Books: Finding eBooks using voice search

    Publication Year: 2010, Page(s): 85 - 90
    Cited by: Papers (2)
    PDF (312 KB) | HTML

    We describe Let's Buy Books, a dialog system that helps users search for eBook titles. In this paper we compare different vector space approaches to voice search and find that a hybrid approach using a weighted sub-space model smoothed with a general model provides the best performance across different conditions, evaluated using both synthetic queries and queries collected from users through que...
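
    To make the "weighted sub-space model smoothed with a general model" phrase concrete, here is one hedged reading as a TF-IDF sketch: a similarity score from a model fit only on the catalog is interpolated with a score from a model fit on broader text. The catalog entries, background corpus, query and interpolation weight are all invented, and this illustrates the general idea rather than the paper's system.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        # Invented eBook titles (the "sub-space") plus extra general-domain text.
        catalog = [
            "the adventures of sherlock holmes",
            "pride and prejudice",
            "a brief history of time",
        ]
        general_corpus = catalog + [
            "cheap flights and hotel deals",
            "recipes for quick weeknight dinners",
        ]

        sub_vec = TfidfVectorizer().fit(catalog)         # catalog-specific model
        gen_vec = TfidfVectorizer().fit(general_corpus)  # general background model

        def search(query, weight=0.7):
            """Rank catalog titles by an interpolated cosine score (weight is a guess)."""
            sub = cosine_similarity(sub_vec.transform([query]),
                                    sub_vec.transform(catalog))[0]
            gen = cosine_similarity(gen_vec.transform([query]),
                                    gen_vec.transform(catalog))[0]
            combined = weight * sub + (1 - weight) * gen
            return sorted(zip(catalog, combined), key=lambda pair: -pair[1])

        print(search("history of time"))
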
  • Good grief, I can speak it! Preliminary experiments in audio restaurant reviews

    Publication Year: 2010, Page(s): 91 - 96
    Cited by: Papers (1)
    PDF (113 KB) | HTML

    In this paper, we introduce a new envisioned application for speech which allows users to enter restaurant reviews orally via their mobile device, and, at a later time, update a shared and growing database of consumer-provided information about restaurants. During the intervening period, a speech recognition and NLP based system has analyzed their audio recording both to extract key descriptive ph...
  • The IBM Attila speech recognition toolkit

    Publication Year: 2010, Page(s): 97 - 102
    Cited by: Papers (38) | Patents (1)
    PDF (156 KB) | HTML

    We describe the design of IBM's Attila speech recognition toolkit. We show how the combination of a highly modular and efficient library of low-level C++ classes with simple interfaces, an interconnection layer implemented in a modern scripting language (Python), and a standardized collection of scripts for system-building produces a flexible and scalable toolkit that is useful both for basic resea...
  • Towards accurate recognition for children's oral reading fluency

    Publication Year: 2010, Page(s): 103 - 108
    PDF (616 KB) | HTML

    Systems for assessing and tutoring reading skills place unique requirements on underlying ASR technologies. This paper presents VersaReader, a system automatically measuring children's oral reading fluency skills. Critical techniques that improve the recognition accuracy and make the system practical are discussed in detail. We show that using a set of linguistic rules learned from a collection of...
  • Unsupervised cross-lingual speaker adaptation for accented speech recognition

    Publication Year: 2010, Page(s): 109 - 114
    Cited by: Papers (1)
    PDF (158 KB) | HTML

    In this paper we present investigations on how the acoustic models in automatic speech recognition can be adapted across languages in unsupervised fashion to improve recognition of speech with a foreign accent. Recognition systems were trained on large Finnish and English corpora, and tested both on monolingual and bilingual material. Adaptation with bilingual and monolingual recognisers was compa...
  • Muse: An open source speech technology research platform

    Publication Year: 2010, Page(s): 115 - 120
    PDF (121 KB) | HTML

    This paper introduces the open source muster speech engine (Muse) for speech technology research. The Muse platform abstracts common data types and software as used by speech technology researchers. It is designed to assist researchers in making repeatable experiments that are not hard coded to a specific platform, language, algorithm, or corpus. It contains a script language and a shell where use...