In this paper we explore a way to let a user interactively organize a multimedia database through a dynamic interface, freely creating their own "audiovisual concepts". The user defines distances on a small subset of documents, for which low-level audio and video descriptors have been extracted automatically off-line. A semi-supervised learning process, relying on support vector regression in an early fusion context, then builds a behavioral model of those descriptors from the human interaction, yielding a personal audiovisual similarity measure.
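As a rough illustration of the idea summarized above, the following minimal sketch fuses audio and video descriptors early (by concatenation) and fits a regressor to a handful of user-defined pairwise distances. Everything concrete here is an assumption made for illustration, not the paper's implementation: the descriptor values, the pair representation (per-dimension absolute differences), the simulated user who judges similarity on audio alone, and the use of a linear epsilon-insensitive regressor trained by subgradient descent as a dependency-free stand-in for a full support vector regression solver.

```python
from itertools import combinations

def fuse(audio, video):
    """Early fusion: concatenate audio and video descriptors into one vector."""
    return list(audio) + list(video)

def pair_features(doc_a, doc_b):
    """Assumed pair representation: per-dimension absolute differences."""
    return [abs(x - y) for x, y in zip(doc_a, doc_b)]

def predict(w, b, x):
    """Linear model output for one pair-feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svr(X, y, epsilon=0.02, lam=0.001, lr=0.5, epochs=3000):
    """Linear epsilon-insensitive regression by subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, |w.x + b - y| - epsilon)).
    This is a simplified stand-in for a kernel SVR solver.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for t in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            if abs(err) > epsilon:  # only points outside the tube contribute
                sign = 1.0 if err > 0 else -1.0
                for j in range(d):
                    grad_w[j] += sign * xi[j] / n
                grad_b += sign / n
        step = lr / (1 + t) ** 0.5  # diminishing step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b

# Hypothetical low-level descriptors for a small annotated subset (6 documents).
audio = [(0.1, 0.9), (0.8, 0.2), (0.5, 0.5), (0.9, 0.9), (0.2, 0.1), (0.6, 0.3)]
video = [(0.7, 0.1), (0.2, 0.8), (0.9, 0.4), (0.1, 0.3), (0.5, 0.9), (0.3, 0.6)]
docs = [fuse(a, v) for a, v in zip(audio, video)]

def user_distance(i, j):
    """Simulated user input: this user's 'concept' depends on audio only."""
    return sum(abs(x - y) for x, y in zip(audio[i], audio[j])) / 2

# Training set: one fused pair vector per annotated pair of documents.
pairs = list(combinations(range(len(docs)), 2))
X = [pair_features(docs[i], docs[j]) for i, j in pairs]
y = [user_distance(i, j) for i, j in pairs]
w, b = train_linear_svr(X, y)

# Unseen documents: q differs from p in audio only, r differs in video only.
p = fuse((0.2, 0.2), (0.5, 0.5))
q = fuse((0.8, 0.8), (0.5, 0.5))
r = fuse((0.2, 0.2), (0.9, 0.1))
d_audio_change = predict(w, b, pair_features(p, q))
d_video_change = predict(w, b, pair_features(p, r))
print("audio-only change:", round(d_audio_change, 3))
print("video-only change:", round(d_video_change, 3))
```

Under these assumptions, the learned measure should rate the audio-only change as a larger distance than the video-only change, mirroring the simulated user's preference. A real system would likely replace the linear model with a kernel SVR and obtain the distances from the interactive interface.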
Published in: 15th IEEE International Conference on Image Processing, 2008 (ICIP 2008)
Date of Conference: 12-15 Oct. 2008