Proceedings. Fourth IEEE International Conference on Multimodal Interfaces

14-16 Oct. 2002

Displaying Results 1 - 25 of 89
  • Proceedings Fourth IEEE International Conference on Multimodal Interfaces

    Publication Year: 2002
    PDF (327 KB)
    Freely Available from IEEE
  • Layered representations for human activity recognition

    Publication Year: 2002, Page(s):3 - 8
    Cited by:  Papers (112)  |  Patents (4)
    PDF (464 KB)

    We present the use of layered probabilistic representations using hidden Markov models for performing sensing, learning, and inference at multiple levels of temporal granularity. We describe the use of the representation in a system that diagnoses states of a user's activity based on real-time streams of evidence from video, acoustic, and computer interactions. We review the representation, present an ...
  • Evaluating integrated speech- and image understanding

    Publication Year: 2002, Page(s):9 - 14
    Cited by:  Papers (2)
    PDF (551 KB)

    The capability to coordinate and interrelate speech and vision is a virtual prerequisite for adaptive, cooperative, and flexible interaction among people. It is therefore fair to assume that human-machine interaction, too, would benefit from intelligent interfaces for integrated speech and image processing. We first sketch an interactive system that integrates automatic speech processing with imag...
  • Techniques for interactive audience participation

    Publication Year: 2002, Page(s):15 - 20
    Cited by:  Papers (19)  |  Patents (5)
    PDF (449 KB)

    At SIGGRAPH in 1991, Loren and Rachel Carpenter unveiled an interactive entertainment system that allowed members of a large audience to control an onscreen game using red and green reflective paddles. In the spirit of this approach, we present a new set of techniques that enable members of an audience to participate, either cooperatively or competitively, in shared entertainment experiences. Our ...
  • Perceptual collaboration in Neem

    Publication Year: 2002, Page(s):21 - 26
    PDF (417 KB)

    The Neem Platform is a research test bed for Project Neem, concerned with the development of socially and culturally aware collaborative systems in a wide range of domains. In this paper we discuss a novel use of perceptual interfaces, applied to group collaboration support. In Neem, the multimodal content of human-to-human interaction is analyzed and reasoned upon. Applications react to this impl...
  • A tracking framework for collaborative human computer interaction

    Publication Year: 2002, Page(s):27 - 32
    Cited by:  Papers (2)
    PDF (471 KB)

    The ability to track many people and their body parts (i.e., face and hands) in a complex environment is crucial for designing collaborative natural human computer interaction (HCI). A challenging issue in tracking body parts is the data association uncertainty while assigning measurements to the proper tracks in the case of occlusion and close interaction of body parts of different people. This p...
  • A structural approach to distance rendering in personal auditory displays

    Publication Year: 2002, Page(s):33 - 38
    Cited by:  Papers (1)
    PDF (301 KB)

    A virtual resonating environment aiming at enhancing our perception of distance is proposed. This environment reproduces the acoustics inside a tube, thus conveying peculiar distance cues to the listener. The corresponding resonator has been prototyped using a wave-based numerical scheme called the waveguide mesh, which gave the model the necessary versatility during the design and parameterization ...
  • A multimodal electronic travel aid device

    Publication Year: 2002, Page(s):39 - 44
    Cited by:  Papers (1)
    PDF (315 KB)

    This paper describes an electronic travel aid device that may enable blind individuals to "see the world with their ears". A wearable prototype will be assembled using low-cost hardware: earphones, sunglasses fitted with two micro cameras, and a palmtop computer. The system, which currently runs on a desktop computer, is able to detect the light spot produced by a laser pointer, compute its angul...
  • Lecture and presentation tracking in an intelligent meeting room

    Publication Year: 2002, Page(s):47 - 52
    Cited by:  Papers (4)
    PDF (3978 KB)

    Archiving, indexing, and later browsing through stored presentations and lectures are becoming increasingly common. We have investigated the special problems and advantages of lectures and propose the design and adaptation of a speech recognizer to a lecture such that the recognition accuracy can be significantly improved by prior analysis of the presented documents using a special class-based language ...
  • Parallel computing-based architecture for mixed-initiative spoken dialogue

    Publication Year: 2002, Page(s):53 - 58
    Cited by:  Papers (2)
    PDF (489 KB)

    This paper describes a new method of implementing mixed-initiative spoken dialogue systems based on parallel computing architecture. In a mixed-initiative dialogue, the user as well as the system needs to be capable of controlling the dialogue sequence. In our implementation, various language models corresponding to different dialogue contents, such as requests for information or replies to the sy...
  • 3D N-best search for simultaneous recognition of distant-talking speech of multiple talkers

    Publication Year: 2002, Page(s):59 - 63
    Cited by:  Papers (2)
    PDF (374 KB)

    A microphone array is a promising solution for realizing hands-free speech recognition in real environments. Accurate talker localization is very important for speech recognition using the microphone array. However, localization of a moving talker is difficult in noisy reverberant environments. Talker localization errors degrade the performance of speech recognition. To solve the problem, we propo...
  • Integration of tone related feature for Chinese speech recognition

    Publication Year: 2002, Page(s):64 - 68
    Cited by:  Papers (1)
    PDF (297 KB)

    Chinese is a tonal language that uses fundamental frequency, in addition to phones, for word differentiation. Commonly used front-end features, such as mel-frequency cepstral coefficients (MFCC), however, are optimized for non-tonal languages such as English and do not focus on the pitch information that is important for tone identification. In this paper, we examine the integration of tone-r...
  • Talking heads: which matching between faces and synthetic voices?

    Publication Year: 2002, Page(s):69 - 74
    PDF (713 KB)

    The integration of synthetic faces and text-to-speech voice synthesis (what we call "talking heads") allows new applications in the area of man-machine interfaces. In the near future, talking heads might be useful communicative interface agents. But before making an extensive use of talking heads, several issues have to be checked according to their acceptability by users. An important issue is to...
  • Robust noisy speech recognition with adaptive frequency bank selection

    Publication Year: 2002, Page(s):75 - 80
    Cited by:  Papers (3)
    PDF (344 KB)

    With the development of automatic speech recognition technology, the robustness problem of speech recognition systems is becoming more and more important. This paper addresses the problem of speech recognition in an additive background noise environment. Since the frequency energy of different types of noise focuses on different frequency banks, the effects of additive noise on each frequency bank...
  • Covariance-tied clustering method in speaker identification

    Publication Year: 2002, Page(s):81 - 84
    Cited by:  Papers (1)
    PDF (350 KB)

    Gaussian mixture models (GMMs) have been successfully applied to the classifier for speaker modeling in speaker identification. However, there are still problems to solve, such as the clustering methods. The conditional k-means algorithm utilizes Euclidean distance, which treats the data distribution as spherical and does not match the distribution of the actual data. In this paper we present a new method ma...
  • Context-based multimodal input understanding in conversational systems

    Publication Year: 2002, Page(s):87 - 92
    Cited by:  Papers (1)  |  Patents (4)
    PDF (333 KB)

    In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise. Sometimes, merely fusing multimodal inputs together cannot derive a complete understanding. To address these inadequacies, we are building a semantics-based multimodal interpretation framework called MIND (Multimodal Interpretation for Natural Dialog). The unique feature of MIND is the use of a variety of c...
  • Context-sensitive help for multimodal dialogue

    Publication Year: 2002, Page(s):93 - 98
    Cited by:  Papers (1)  |  Patents (2)
    PDF (392 KB)

    Multimodal interfaces offer users unprecedented flexibility in choosing a style of interaction. However, users are frequently unaware of or forget shorter or more effective multimodal or pen-based commands. This paper describes a working help system that leverages the capabilities of a multimodal interface in order to provide targeted, unobtrusive, context-sensitive help. This multimodal help syst...
  • Referring to objects with spoken and haptic modalities

    Publication Year: 2002, Page(s):99 - 104
    Cited by:  Papers (2)
    PDF (689 KB)

    The gesture input modality considered in multimodal dialogue systems is mainly reduced to pointing or manipulating actions. With an approach based on the spontaneous character of communication, the treatment of such actions involves many processes. Without constraints, the user may use gesture in association with speech, and may exploit visual context peculiarities, guiding her/his articulation of...
  • Towards visually-grounded spoken language acquisition

    Publication Year: 2002, Page(s):105 - 110
    Cited by:  Patents (1)
    PDF (329 KB)

    A characteristic shared by most approaches to natural language understanding and generation is the use of symbolic representations of word and sentence meanings. Frames and semantic nets are examples of symbolic representations. Symbolic methods are inappropriate for applications which require natural language semantics to be linked to perception, as is the case in tasks such as scene description ...
  • Modeling output in the EMBASSI multimodal dialog system

    Publication Year: 2002, Page(s):111 - 116
    PDF (270 KB)

    In this paper we present a concept for the abstract modeling of output render components. We illustrate how this categorization serves to seamlessly integrate previously unknown output modalities coherently into multimodal presentations of the EMBASSI dialog system. We present a case study and conclude with an overview of related work.
  • Multimodal dialogue systems for interactive TV applications

    Publication Year: 2002, Page(s):117 - 122
    Cited by:  Papers (7)
    PDF (236 KB)

    Many studies have shown the advantages of building multimodal systems, but not in the interactive TV application context. This paper reports on a qualitative study of a multimodal program guide for interactive TV. The system was designed by adding speech interaction to an existing TV program guide. Results indicate that spoken natural language input combined with visual output is preferable for TV...
  • Human-robot interaction: engagement between humans and robots for hosting activities

    Publication Year: 2002, Page(s):123 - 128
    Cited by:  Papers (4)
    PDF (471 KB)

    To participate in conversations with people, robots must not only see and talk to people, but must also make use of the conventions of conversation and connection to their human counterparts. This paper reports on research on engagement in human-human interaction and applications to (non-autonomous) robots interacting with humans in hosting activities.
  • Viewing and analyzing multimodal human-computer tutorial dialogue: a database approach

    Publication Year: 2002, Page(s):129 - 134
    Cited by:  Patents (3)
    PDF (275 KB)

    It is easier to record logs of multimodal human-computer tutorial dialogue than to make sense of them. In the 2000-2001 school year, we logged the interactions of approximately 400 students who used Project LISTEN's Reading Tutor and who read aloud over 2.4 million words. We discuss some difficulties we encountered converting the logs into a more easily understandable database. It is faster to wri...
  • Adaptive dialog based upon multimodal language acquisition

    Publication Year: 2002, Page(s):135 - 140
    Cited by:  Papers (3)  |  Patents (5)
    PDF (479 KB)

    Communicating by voice with speech-enabled computer applications based on preprogrammed rule grammars suffers from constrained vocabulary and sentence structures. Deviations from the allowed language result in an unrecognized utterance that will not be understood and processed by the system. One way to alleviate this restriction consists in allowing the user to expand the computer's recognized and...
  • Integrating emotional cues into a framework for dialogue management

    Publication Year: 2002, Page(s):141 - 146
    Cited by:  Papers (9)  |  Patents (3)
    PDF (262 KB)

    Emotions are very important in human-human communication but are usually ignored in human-computer interaction. Recent work focuses on recognition and generation of emotions as well as emotion-driven behavior. Our work focuses on the use of emotions in dialogue systems that can be used with speech input as well as in multi-modal environments. We describe a framework for using emotional cues in a d...