This work addresses the problem of identifying speaker-addressee links in face-to-face multiparty conversation. Systems that archive meetings and systems that support teleconferences are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on cues such as the participants' utterances. Unfortunately, they fail to adequately convey who is talking to whom. To solve this problem, we must automatically detect the addressee or addressees and develop video editing rules that clearly convey who is talking to whom. In this paper, to detect the addressee, we statistically analyze speakers' gaze behavior for (a) one-addressee utterances and (b) multi-addressee utterances. Experiments verify that the discrimination function obtained by discriminant analysis classifies addressee type from speakers' gaze behavior with 89% accuracy. Finally, we present three new video editing rules based on utterance type and indicate that they may convey who is talking to whom more successfully.
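To make the classification step concrete, the following is a minimal sketch of a two-class linear discriminant (Fisher's formulation) applied to gaze features. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify which gaze features or which variant of discriminant analysis were used. The two features (fraction of utterance time spent gazing at the most-looked-at listener, and number of gaze shifts between listeners), the function names, and all data values are hypothetical and synthetic.

```python
from statistics import mean


def fit_discriminant(class_a, class_b):
    """Fit Fisher's two-class linear discriminant on 2-D feature vectors.

    Returns (w, threshold). A sample x is assigned to class A when
    w[0]*x[0] + w[1]*x[1] > threshold, otherwise to class B.
    """
    mean_a = [mean(col) for col in zip(*class_a)]
    mean_b = [mean(col) for col in zip(*class_b)]

    # Pooled within-class scatter matrix (2x2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for samples, mu in ((class_a, mean_a), (class_b, mean_b)):
        for x in samples:
            d = (x[0] - mu[0], x[1] - mu[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]

    # Closed-form inverse of the 2x2 scatter matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]

    # Discriminant direction: w = S^-1 (mean_a - mean_b).
    diff = (mean_a[0] - mean_b[0], mean_a[1] - mean_b[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]

    # Decision threshold: projection of the midpoint between class means.
    mid = ((mean_a[0] + mean_b[0]) / 2, (mean_a[1] + mean_b[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def classify(x, w, threshold):
    """Label one utterance from its (hypothetical) gaze features."""
    score = w[0] * x[0] + w[1] * x[1]
    return "one-addressee" if score > threshold else "multi-addressee"


# Synthetic, hypothetical samples: (gaze fraction on main target, gaze shifts).
one_addressee = [(0.90, 1), (0.85, 0), (0.80, 2), (0.95, 1)]
multi_addressee = [(0.45, 4), (0.50, 3), (0.40, 5), (0.55, 4)]

w, threshold = fit_discriminant(one_addressee, multi_addressee)
```

Calling `classify((0.88, 1), w, threshold)` labels a new utterance from its two feature values. With real annotated data, accuracy would be estimated on held-out utterances rather than on the samples used to fit the function.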
Published in:
13th IEEE International Workshop on Robot and Human Interactive Communication (ROMAN 2004)
Date of Conference: 20-22 Sept. 2004