Compact representations of videos through dominant and multiple motion estimation
Sawhney, H.S.; Ayer, S.
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Volume 18, Issue 8, Aug 1996 Page(s):814 - 830
Digital Object Identifier 10.1109/34.531801
Summary: An explosion of on-line image and video data in digital form is
already well underway. With the exponential rise in interactive
information exploration and dissemination through the World-Wide Web
(WWW), the major inhibitors of rapid access to on-line video data are
costs and management of capture and storage, lack of real-time delivery,
and nonavailability of content-based intelligent search and indexing
techniques. The solutions for capture, storage, and delivery may be on
the horizon or a little beyond. However, even with rapid delivery, the
lack of efficient authoring and querying tools for visual content-based
indexing may still inhibit as widespread a use of video information as
that of text and traditional tabular data is currently. In order to be
able to nonlinearly browse and index into videos through visual content,
it is necessary to develop authoring tools that can automatically
separate moving objects and significant components of the scene, and
represent these in a compact form. Given that video data comes in
torrents (almost a megabyte every 30th of a second), it will be highly
inefficient to search for objects and scenes in every frame of a video.
In this paper, we present techniques to automatically derive compact
representations of scenes and objects from the motion information. Image
motion is a significant cue in videos for the separation of scenes into
their significant components and for the separation of moving objects.
Motion analysis is useful in capturing the visual content of videos for
indexing and browsing in two different ways. First, separation of the
static scene from moving objects can be accomplished by employing
dominant 2D/3D motion estimation methods. Alternatively, if the goal is
to be able to represent the fixed scene too as a composition of
significant structures and objects, then simultaneous multiple motion
methods might be more appropriate. In either case, view-based summarized
representations of the scene can be created by video
compositing/mosaicing based on the estimated motions. We present robust
algorithms for both kinds of representations: 1) dominant motion
estimation based techniques which exploit a fairly common occurrence in
videos that a mostly fixed background (scene) is imaged with or without
independently moving objects, and 2) simultaneous multiple motion
estimation and representation of motion video using layered
representations. Ample examples of the representations achieved by each
method are included in the paper.
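
Illustrative sketch (not from the paper): the abstract does not spell out the authors' algorithms, so the following is only a minimal example of the general idea behind robust dominant motion estimation. It assumes a translation-only motion model and per-feature displacement vectors as input, and uses iteratively reweighted averaging with Tukey biweight weights. The function name `dominant_translation` and the `scale` parameter are hypothetical. Vectors that disagree with the dominant (background) motion receive zero weight and are flagged as candidate independently moving objects.

```python
from statistics import median


def dominant_translation(flows, scale=3.0, iters=20):
    """Estimate a dominant 2D translation from (dx, dy) displacement vectors.

    Assumption: the background moves by a single translation between two
    frames; real dominant-motion methods typically use richer models.
    `scale` is the residual (in pixels) beyond which a vector gets zero weight.
    """
    # Robust starting point: component-wise median of the displacements.
    tx = median(dx for dx, _ in flows)
    ty = median(dy for _, dy in flows)

    for _ in range(iters):
        wsum = sx = sy = 0.0
        for dx, dy in flows:
            # Squared residual, normalized by the cutoff scale.
            u = ((dx - tx) ** 2 + (dy - ty) ** 2) / (scale * scale)
            # Tukey biweight: smooth down-weighting, zero beyond the cutoff.
            w = (1.0 - u) ** 2 if u < 1.0 else 0.0
            wsum += w
            sx += w * dx
            sy += w * dy
        if wsum == 0.0:
            break  # no vector supports the current estimate; keep it as is
        tx, ty = sx / wsum, sy / wsum

    # Vectors within the cutoff are taken to follow the dominant motion.
    inliers = [
        (dx - tx) ** 2 + (dy - ty) ** 2 < scale * scale for dx, dy in flows
    ]
    return (tx, ty), inliers


if __name__ == "__main__":
    # Made-up data: eight background vectors plus three from a moving object.
    background = [(2.0, -1.0), (2.1, -0.9), (1.9, -1.1), (2.0, -1.1),
                  (2.1, -1.0), (1.9, -0.9), (2.0, -0.9), (2.0, -1.0)]
    moving_object = [(10.0, 5.0), (10.2, 5.1), (9.9, 4.8)]
    motion, inliers = dominant_translation(background + moving_object)
    print(motion, inliers)
```

In this toy setting the vectors flagged as outliers correspond to the separation of moving objects from the fixed scene; a multiple-motion method would instead fit several motion models at once and assign each vector to one of them.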