1. Introduction
Activity detection and description is a key functionality of perceptually aware interfaces working in collaborative human communication environments such as meeting rooms or classrooms. Indeed, in the context of person-machine communication, computers involved in human communication activities have to meet certain requirements and should be designed so that users need to be as little aware of them as possible. Consequently, there is a need for perceptual user interfaces that are multimodal and robust, and that use unobtrusive sensors to capture the ongoing human activity. As human activity is reflected in a rich variety of acoustic events, produced either by the human body or by objects handled by humans, acoustic event detection (AED) may help to describe human and social activity. Ringing telephones, clapping or laughter during speech, a loud yawn in the middle of a lecture, knocks on doors, doors opening and closing, footsteps, or even the difference between one person speaking and several people speaking at the same time are auditory cues that can be used to detect relevant events and state changes in meetings.