
IEEE Transactions on Circuits and Systems for Video Technology

Issue 5 • May 2004

  • Table of contents

    Publication Year: 2004, Page(s): c1 - c4
    PDF (110 KB)
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Publication Year: 2004, Page(s): c2
    PDF (37 KB)
    Freely Available from IEEE
  • Introduction to the Special Issue on Audio and Video Analysis for Multimedia Interactive Services

    Publication Year: 2004, Page(s): 569 - 571
    PDF (112 KB) | HTML
  • Optimization-based automated home video editing system

    Publication Year: 2004, Page(s): 572 - 583
    Cited by: Papers (29) | Patents (5)
    PDF (424 KB) | HTML

    In this paper, we present an optimization-based system that automates home video editing. The system automatically selects suitable or desirable highlight segments from a set of raw home videos and aligns them with a given piece of incidental music to create an edited video of a desired length, based on the content of both the video and the music. We developed an approach for extracting temporal structure and determining the importance of a video segment in order to facilitate the selection of highlight segments. Additionally, we extract the temporal structure, beats, and tempo from the incidental music. To create more professional-looking results, the selected highlight segments must satisfy a set of editing rules and match the content of the incidental music. This task is formulated as a nonlinear 0-1 programming problem, with the rules, which can be adjusted and extended, embedded as constraints. The output video is rendered by connecting the selected highlight segments with transition effects and the incidental music. Under this framework, we can choose the best-matched music for a given video and support different output styles.
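
    A minimal Python sketch of the 0-1 selection idea described above: each candidate segment gets a binary variable, the music length becomes a constraint, and total importance is maximized. The durations, scores, tolerance, and brute-force search are invented stand-ins for the paper's formulation, which embeds a much richer, adjustable rule set as constraints.

        # Toy 0-1 selection: maximize total importance subject to a length constraint.
        from itertools import product

        durations  = [4.0, 7.5, 3.2, 6.1, 5.0]   # candidate segment lengths (s), made up
        importance = [0.9, 0.4, 0.7, 0.8, 0.5]   # content-based importance scores, made up
        music_len  = 15.0                        # target length dictated by the music
        tolerance  = 1.0                         # allowed length mismatch (s)

        best, best_score = None, float("-inf")
        for x in product((0, 1), repeat=len(durations)):       # all 0-1 assignments
            total = sum(xi * d for xi, d in zip(x, durations))
            if abs(total - music_len) > tolerance:             # length constraint
                continue
            score = sum(xi * w for xi, w in zip(x, importance))
            if score > best_score:
                best, best_score = x, score

        if best is not None:
            print("selected segments:", [i for i, xi in enumerate(best) if xi])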

  • 3-D physical motion-based bandwidth prediction for video conferencing

    Publication Year: 2004, Page(s): 584 - 594
    Cited by: Papers (1)
    PDF (600 KB) | HTML

    Interactive video services such as video conferencing, distance learning, and online video games over the Internet and wireless networks are becoming increasingly prevalent. Because network bandwidth is limited and video is bandwidth-hungry, interactive video requires highly efficient resource management to achieve real-time performance. Unlike pregenerated video, whose traffic profile can be computed in advance, the efficiency and accuracy of dynamic resource allocation for interactive video depend critically on traffic prediction. Existing traffic prediction schemes, based on either traffic data or image features, can only provide short-term predictions. This paper presents a new bandwidth prediction approach for video conferencing based on the motion of three-dimensional objects. We show that there is a strong correlation between video conferencing traffic and the real motion of objects. The real motion can be predicted with a Kalman filter, and the estimated motion is used to make a long-term bandwidth prediction. The new traffic prediction model is tested and compared with a frame-based adaptive normalized least-mean-square-error linear predictor and an optical-flow-based method with Kalman filtering. Experimental results show that our proposed model achieves much higher accuracy in long-term traffic prediction, making it possible for networks to allocate resources efficiently for video conferencing services.
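
    A minimal sketch of the prediction idea, assuming a constant-velocity state model: a Kalman filter tracks one component of object position, and the predicted motion magnitude is mapped to a bandwidth estimate. The motion-to-bits mapping below is a made-up placeholder for the paper's traffic model.

        # Constant-velocity Kalman filter; predicted motion drives a toy bit-rate estimate.
        import numpy as np

        F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition for [position, velocity]
        H = np.array([[1.0, 0.0]])               # we observe position only
        Q = 1e-3 * np.eye(2)                     # process noise covariance
        R = np.array([[1e-2]])                   # measurement noise covariance
        x, P = np.zeros((2, 1)), np.eye(2)       # state estimate and its covariance

        def kalman_step(z):
            """One predict/update cycle; returns the position predicted one step ahead."""
            global x, P
            x_pred, P_pred = F @ x, F @ P @ F.T + Q
            K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
            x = x_pred + K @ (np.array([[z]]) - H @ x_pred)
            P = (np.eye(2) - K @ H) @ P_pred
            return float((F @ x)[0, 0])

        for t, z in enumerate([0.0, 0.4, 0.9, 1.5, 2.2]):      # noisy observed positions
            motion = abs(kalman_step(z) - z)
            bits = 2000 + 8000 * motion                        # hypothetical motion-to-bits map
            print(f"t={t}: predicted motion={motion:.2f}, predicted frame bits={bits:.0f}")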

  • Video extraction for fast content access to MPEG compressed videos

    Publication Year: 2004, Page(s): 595 - 605
    Cited by: Papers (5)
    PDF (544 KB) | HTML

    Because existing video processing technology is primarily developed for the pixel domain while digital video is stored in compressed format, applying those techniques to compressed videos requires decompression. For discrete cosine transform (DCT)-based MPEG compressed videos, the standard row-by-row and column-by-column inverse DCT (IDCT) of an 8×8 block costs 4096 multiplications and 4032 additions, although practical implementations require only 1024 multiplications and 896 additions. In this paper, we propose a new algorithm to extract videos directly from the MPEG compressed (DCT) domain without a full IDCT, described by three extraction schemes: 1) video extraction in 2×2 blocks with four coefficients; 2) video extraction in 4×4 blocks with four DCT coefficients; and 3) video extraction in 4×4 blocks with nine DCT coefficients. The computing cost is only 8 additions and no multiplications for the first scheme, 2 multiplications and 28 additions for the second, and 47 additions (no multiplications) for the third. Extensive experiments were carried out, and the results reveal that: 1) the extracted video maintains competitive quality in terms of visual perception and inspection and 2) the extracted videos preserve content well, in terms of histogram measurement, compared with fully decompressed versions. The proposed algorithm thus provides useful tools for bridging the gap between the pixel domain and the compressed domain, facilitating low-latency, high-efficiency content analysis in applications such as surveillance video, interactive multimedia, and image processing.
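
    A minimal sketch of the first scheme, which the abstract prices at 8 additions: a 2×2 approximation of an 8×8 block computed with a butterfly over the four lowest-frequency DCT coefficients. Constant scale factors are folded away here, which only approximates the paper's exact derivation.

        # 2x2 extraction from the top-left 2x2 DCT coefficients: 8 additions, no multiplications.
        import numpy as np

        def extract_2x2(F):
            """F: 8x8 DCT coefficient block; returns an unscaled 2x2 approximation."""
            a = F[0, 0] + F[1, 0]              # four intermediate sums ...
            b = F[0, 0] - F[1, 0]
            c = F[0, 1] + F[1, 1]
            d = F[0, 1] - F[1, 1]
            return np.array([[a + c, a - c],   # ... combined with four more additions
                             [b + d, b - d]])

        block = np.zeros((8, 8))
        block[:2, :2] = [[800, 30], [-25, 10]]  # made-up low-frequency coefficients
        print(extract_2x2(block))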

  • Real-time compressed-domain spatiotemporal segmentation and ontologies for video indexing and retrieval

    Publication Year: 2004, Page(s): 606 - 621
    Cited by: Papers (61) | Patents (4)
    PDF (1072 KB) | HTML

    In this paper, a novel algorithm is presented for the real-time, compressed-domain, unsupervised segmentation of image sequences and is applied to video indexing and retrieval. The segmentation algorithm uses motion and color information extracted directly from the MPEG-2 compressed stream. An iterative rejection scheme based on the bilinear motion model is used to effect foreground/background segmentation. Following that, meaningful foreground spatiotemporal objects are formed by first examining the temporal consistency of the output of iterative rejection, then clustering the resulting foreground macroblocks into connected regions, and finally performing region tracking. Background segmentation into spatiotemporal objects is performed as well. MPEG-7 compliant low-level descriptors describing the color, shape, position, and motion of the resulting spatiotemporal objects are extracted and automatically mapped to appropriate intermediate-level descriptors forming a simple vocabulary termed the object ontology. This, combined with a relevance feedback mechanism, allows the qualitative definition of the high-level concepts the user queries for (semantic objects, each represented by a keyword) and the retrieval of relevant video segments. Desired spatial and temporal relationships between the objects in multiple-keyword queries can also be expressed using the shot ontology. Experimental results of applying the segmentation algorithm to known sequences demonstrate the efficiency of the proposed approach. Sample queries reveal the potential of employing this segmentation algorithm as part of an object-based video indexing and retrieval scheme.
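
    A minimal sketch of the iterative rejection step, assuming a bilinear model for one motion-vector component: fit the global (background) model by least squares, reject macroblocks with large residuals as foreground, and refit until the labeling stabilizes. The synthetic motion field and the threshold are invented for illustration.

        # Iterative rejection against a bilinear global-motion model.
        import numpy as np

        xs, ys = np.meshgrid(np.arange(10.0), np.arange(10.0))
        x, y = xs.ravel(), ys.ravel()
        u = 0.5 + 0.02 * x + 0.01 * y            # background motion field (one component)
        u = u + ((x > 6) & (y > 6)) * 3.0        # a moving foreground object

        A = np.column_stack([np.ones_like(x), x, y, x * y])    # bilinear basis
        keep = np.ones(x.size, dtype=bool)
        for _ in range(5):
            coef, *_ = np.linalg.lstsq(A[keep], u[keep], rcond=None)
            resid = np.abs(A @ coef - u)
            new_keep = resid < 3.0 * resid[keep].mean()        # rejection threshold
            if np.array_equal(new_keep, keep):
                break
            keep = new_keep

        print("macroblocks rejected as foreground:", int((~keep).sum()))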

  • A rule-based video annotation system

    Publication Year: 2004, Page(s): 622 - 633
    Cited by: Papers (17) | Patents (1)
    PDF (696 KB) | HTML

    A generic system for the automatic annotation of videos is introduced. The proposed approach is based on the premise that the rules needed to infer a set of high-level concepts from low-level descriptors cannot be defined a priori. Rather, knowledge embedded in the database and interaction with an expert user are exploited to enable system learning. Underpinning the system at the implementation level is preannotated data that dynamically creates signification links between a set of low-level features extracted directly from the video dataset and high-level semantic concepts defined in the lexicon. The lexicon may consist of words, icons, or any set of symbols that convey meaning to the user. Thus, the lexicon is contingent on the user, the application, time, and the entire context of the annotation process. The main system modules use fuzzy logic and rule mining techniques to approximate human-like reasoning. A rule-knowledge base is created from a small sample selected by the expert user during the learning phase. Using this rule-knowledge base, the system automatically assigns keywords from the lexicon to nonannotated video clips in the database. Using common low-level video representations, the system's performance was assessed on a database containing hundreds of broadcast videos. The experimental evaluation showed robust and highly accurate annotation. The system architecture offers straightforward expansion to relevance feedback and autonomous learning capabilities.

  • Semantic indexing of soccer audio-visual sequences: a multimodal approach based on controlled Markov chains

    Publication Year: 2004, Page(s): 634 - 643
    Cited by: Papers (44) | Patents (3)
    PDF (304 KB) | HTML

    Content characterization of sports videos is a subject of great interest to researchers working on the analysis of multimedia documents. In this paper, we propose a semantic indexing algorithm that uses both audio and visual information for salient event detection in soccer. The video signal is processed first by extracting low-level visual descriptors directly from an MPEG-2 bit stream. It is assumed that any instance of an event of interest typically affects two consecutive shots and is characterized by a different temporal evolution of the visual descriptors in the two shots. This motivates the introduction of a controlled Markov chain to describe this evolution during an event of interest, with the control input modeling the occurrence of a shot transition. After adequately training different controlled Markov chain models, a list of video segments can be extracted to represent a specific event of interest using the maximum-likelihood criterion. To reduce false alarms, low-level audio descriptors are processed to order the candidate video segments in the list so that those associated with the event of interest are likely to be found in the very first positions. We focus in particular on goal detection, which represents a key event in a soccer game, using camera motion information as the visual cue and loudness as the audio descriptor. The experimental results show the effectiveness of the proposed multimodal approach.
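
    A minimal sketch of the detection step, with made-up probabilities: a controlled Markov chain keeps one transition matrix per control value (the control flags a shot transition), and a candidate segment is scored by its log-likelihood under each trained event model; the maximum-likelihood model wins.

        # Log-likelihood of a quantized descriptor sequence under a controlled Markov chain.
        import numpy as np

        def log_likelihood(obs, ctrl, model):
            """obs: quantized descriptor states; ctrl[t] = 1 if a shot cut precedes step t."""
            ll = np.log(model["init"][obs[0]])
            for t in range(1, len(obs)):
                T = model["trans"][ctrl[t]]            # control input selects the matrix
                ll += np.log(T[obs[t - 1], obs[t]])
            return ll

        goal_model = {                                  # toy 3-state model
            "init": np.array([0.7, 0.2, 0.1]),
            "trans": [np.array([[0.80, 0.15, 0.05],     # within-shot dynamics
                                [0.20, 0.60, 0.20],
                                [0.10, 0.30, 0.60]]),
                      np.array([[0.10, 0.30, 0.60],     # dynamics across a shot cut
                                [0.10, 0.20, 0.70],
                                [0.05, 0.15, 0.80]])],
        }
        obs  = [0, 0, 1, 2, 2, 2]    # e.g. quantized camera-motion descriptors
        ctrl = [0, 0, 0, 1, 0, 0]    # a shot transition before the fourth sample
        print("log-likelihood under the goal model:", log_likelihood(obs, ctrl, goal_model))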

  • Query feedback for interactive image retrieval

    Publication Year: 2004, Page(s): 644 - 655
    Cited by: Papers (16) | Patents (1)
    PDF (416 KB) | HTML

    From a perceptual standpoint, the subjectivity inherent in understanding and interpreting visual content motivates the need for online interactive learning in multimedia indexing and retrieval. Since efficiency and speed are important factors in interactive visual content retrieval, most current approaches impose restrictive assumptions on similarity calculation and learning algorithms. Specifically, content-based image retrieval techniques generally assume that perceptually similar images are situated close to each other within a connected region of a given space of visual features. This paper proposes a novel method for interactive image retrieval using query feedback. Query feedback learns the user query, as well as the correspondence between high-level user concepts and their low-level machine representation, by performing retrievals according to multiple queries supplied by the user during the course of a retrieval session. The results presented in this paper demonstrate that this algorithm provides accurate retrieval results with acceptable interaction speed compared to existing methods.

  • Generalized nonlinear relevance feedback for interactive content-based retrieval and organization

    Publication Year: 2004, Page(s): 656 - 671
    Cited by: Papers (5)
    PDF (736 KB)

    In this paper, a novel relevance feedback algorithm is proposed for improving the performance of interactive content-based retrieval systems. The algorithm recursively estimates the similarity measure used for data ranking in environments where similarity-based queries are applied, using a set of relevant/irrelevant samples fed back by the user, so that the adjusted response better approximates the current user's information needs and preferences. In particular, using concepts of functional analysis, the similarity measure is expressed as a parametric combination of known monotone increasing functional components. The contribution of each functional component to the similarity measure is then estimated through a recursive and efficient online learning algorithm so that: 1) the current user's needs and preferences, as indicated by the selected relevant/irrelevant samples, are satisfied as much as possible, while simultaneously 2) only a minimal modification of the already estimated similarity measure is made. Experimental results on a large real-life database using objective evaluation criteria, such as the precision-recall curve and the average normalized modified retrieval rank (ANMRR), indicate that the proposed scheme outperforms the compared ones. In addition, the proposed algorithm has low computational complexity and can be implemented recursively.
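
    A minimal sketch of the flavor of such an update, not the paper's derivation: the similarity measure is a nonnegative weighted sum of monotone component scores, and each feedback round nudges the weights toward the relevant samples with a small step, so the previous estimate is only minimally modified.

        # Recursive reweighting of similarity-measure components from user feedback.
        import numpy as np

        def update_weights(w, relevant, irrelevant, lr=0.05):
            """w: component weights; relevant/irrelevant: rows of per-component scores."""
            grad = relevant.mean(axis=0) - irrelevant.mean(axis=0)
            w = w + lr * grad              # small step toward the relevant samples
            w = np.clip(w, 0.0, None)      # keep the combination monotone increasing
            return w / w.sum()             # normalize

        w = np.full(3, 1.0 / 3)                                      # three components
        relevant   = np.array([[0.9, 0.2, 0.5], [0.8, 0.3, 0.6]])    # made-up feedback
        irrelevant = np.array([[0.3, 0.8, 0.4]])
        for _ in range(3):                                           # three feedback rounds
            w = update_weights(w, relevant, irrelevant)
        print("learned component weights:", w)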

  • Relevance feedback in region-based image retrieval

    Publication Year: 2004, Page(s): 672 - 681
    Cited by: Papers (40)
    PDF (416 KB) | HTML

    Relevance feedback and region-based representations are two effective ways to improve the accuracy of content-based image retrieval systems. Although these two techniques have been successfully investigated and developed in the last few years, little attention has been paid to combining them. We argue that integrating the two approaches and allowing them to benefit from each other yields better performance than using either alone. To that end, on the one hand, two relevance feedback algorithms are proposed based on region representations. One is inspired by the query point movement method: by assembling all of the segmented regions of positive examples together and reweighting the regions to emphasize the latest ones, a pseudo image is formed as the new query. An incremental clustering technique is also employed to improve retrieval efficiency. The other introduces existing support vector machine-based algorithms; a new kernel is proposed to make these algorithms applicable to region-based representations. On the other hand, a rational region weighting scheme based on users' feedback information is proposed. The region weights, which broadly coincide with human perception, not only can be used within a query session but can also be memorized and accumulated for future queries. Experimental results on a database of 10 000 general-purpose images demonstrate the effectiveness of the proposed framework.
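
    A minimal sketch of the query-point-movement variant described above: regions from all positive examples are pooled into one pseudo image, with weights that decay with feedback-round age so the latest examples dominate. The region features, areas, and decay factor are all invented.

        # Build a weighted pseudo image from positive-example regions.
        import numpy as np

        def pseudo_image(rounds, decay=0.5):
            """rounds: list, oldest first, of (region_features, region_areas)."""
            feats, weights = [], []
            for age, (f, areas) in enumerate(reversed(rounds)):    # newest round first
                feats.append(f)
                weights.append(areas * decay ** age)   # emphasize the latest regions
            return np.vstack(feats), np.concatenate(weights)

        round1 = (np.array([[0.2, 0.7], [0.6, 0.1]]), np.array([0.8, 0.2]))
        round2 = (np.array([[0.3, 0.6]]),             np.array([1.0]))
        regions, region_weights = pseudo_image([round1, round2])
        print(regions, region_weights, sep="\n")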

  • Speech-to-video synthesis using MPEG-4 compliant visual features

    Publication Year: 2004, Page(s): 682 - 692
    Cited by: Papers (7)
    PDF (600 KB) | HTML

    There is a strong correlation between the building blocks of speech (phonemes) and the building blocks of visual speech (visemes). In this paper, this correlation is exploited and an approach is proposed for synthesizing the visual representation of speech from a narrow-band acoustic speech signal. The visual speech is represented in terms of the facial animation parameters (FAPs) supported by the MPEG-4 standard. The main contribution of this paper is the development of a correlation hidden Markov model (CHMM) system, which integrates independently trained acoustic HMM (AHMM) and visual HMM (VHMM) systems in order to realize speech-to-video synthesis. The proposed CHMM system allows different model topologies for the acoustic and visual HMMs. It performs late integration and reduces the amount of required training data compared to early integration modeling techniques. Temporal accuracy experiments, comparison of the synthesized FAPs with the original FAPs, and audio-visual automatic speech recognition (AV-ASR) experiments utilizing the synthesized visual speech were performed to objectively measure the performance of the system. These experiments demonstrated that the proposed approach reduces time alignment errors by 40.5% compared to the conventional temporal scaling method, that the synthesized FAP sequences are very similar to the original ones, and that the synthesized FAP sequences contain visual speechreading information that can improve AV-ASR performance.

  • Recognition of visual speech elements using adaptively boosted hidden Markov models

    Publication Year: 2004, Page(s): 693 - 705
    Cited by: Papers (11)
    PDF (536 KB) | HTML

    The performance of an automatic speech recognition (ASR) system can be significantly enhanced with additional information from visual speech elements such as the movement of the lips, tongue, and teeth, especially in noisy environments. In this paper, a novel approach for the recognition of visual speech elements is presented. The approach makes use of adaptive boosting (AdaBoost) and hidden Markov models (HMMs) to build an AdaBoost-HMM classifier. The composite HMMs of the AdaBoost-HMM classifier are trained to cover different groups of training samples using the AdaBoost technique and the biased Baum-Welch training method. By combining the decisions of the component classifiers of the composite HMMs according to a novel probability synthesis rule, a more complex decision boundary is formulated than with a single HMM classifier. The method is applied to the recognition of the basic visual speech elements. Experimental results show that the AdaBoost-HMM classifier outperforms the traditional HMM classifier in accuracy, especially for visemes extracted from contexts.
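
    A minimal sketch of the boosting loop itself, with one-dimensional decision stumps standing in for the biased Baum-Welch-trained HMM components used in the paper: each round trains a component classifier on reweighted samples, and the final decision combines the components by their weights.

        # Generic AdaBoost loop; stumps are stand-ins for the paper's component HMMs.
        import numpy as np

        rng = np.random.default_rng(1)
        X = rng.normal(size=100)                     # toy 1-D features
        y = np.where(X + 0.3 * rng.normal(size=100) > 0, 1, -1)

        D = np.full(X.size, 1.0 / X.size)            # sample weights
        stumps, alphas = [], []
        for _ in range(5):
            # weak learner: threshold/sign pair with the lowest weighted error
            t, s = min(((t, s) for t in np.linspace(-2, 2, 41) for s in (1, -1)),
                       key=lambda ts: D[np.sign((X - ts[0]) * ts[1]) != y].sum())
            pred = np.sign((X - t) * s)
            err = max(D[pred != y].sum(), 1e-10)
            alpha = 0.5 * np.log((1 - err) / err)    # component classifier weight
            D = D * np.exp(-alpha * y * pred)        # refocus on misclassified samples
            D /= D.sum()
            stumps.append((t, s))
            alphas.append(alpha)

        agg = sum(a * np.sign((X - t) * s) for a, (t, s) in zip(alphas, stumps))
        print("training accuracy:", (np.sign(agg) == y).mean())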

  • Accurate and quasi-automatic lip tracking

    Publication Year: 2004, Page(s): 706 - 715
    Cited by: Papers (28) | Patents (1)
    PDF (672 KB) | HTML

    Lip segmentation is an essential stage in many multimedia systems such as videoconferencing, lip reading, and low-bit-rate coding communication systems. In this paper, we propose an accurate and robust quasi-automatic lip segmentation algorithm. First, the upper mouth boundary and several characteristic points are detected in the first frame using a new kind of active contour, the "jumping snake." Unlike classic snakes, it can be initialized far from the final edge, and the adjustment of its parameters is easy and intuitive. Then, to achieve the segmentation, we propose a parametric model composed of several cubic curves. Its high flexibility enables accurate lip contour extraction even in the challenging case of a very asymmetric mouth. Compared to existing models, it brings a significant improvement in accuracy and realism. The segmentation in the following frames is achieved using interframe tracking of the keypoints and the model parameters. However, we show that, with a usual tracking algorithm, the keypoint positions become unreliable after a few frames. We therefore propose an adjustment process that enables accurate tracking even after hundreds of frames. Finally, we show that the mean keypoint tracking errors of our algorithm are comparable to the errors of manual point selection.

  • Audio classification based on MPEG-7 spectral basis representations

    Publication Year: 2004, Page(s): 716 - 725
    Cited by: Papers (26)
    PDF (440 KB) | HTML

    In this paper, we present an MPEG-7-based audio classification and retrieval technique targeted at the analysis of film material. The technique consists of low-level descriptors and high-level description schemes. For the low-level descriptors, low-dimensional features such as the audio spectrum projection, based on audio spectrum basis descriptors, are produced in order to find a balanced tradeoff between reducing dimensionality and retaining maximum information content. High-level description schemes are used to describe the modeling of the reduced-dimension features and the procedures for audio classification and retrieval. A classifier based on continuous hidden Markov models is applied. The sound model state path, selected according to the maximum-likelihood model, is stored in an MPEG-7 sound database and used as an index for query applications. Various experiments are presented in which the speaker- and sound-recognition rates are compared for different feature extraction methods. Using independent component analysis, we achieved better results than with the normalized audio spectrum envelope and principal component analysis in a speaker recognition system. In audio classification experiments, audio sounds are classified into the selected sound classes in real time with an accuracy of 96%.
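
    A minimal sketch of the dimension-reduction step, using a PCA basis obtained by SVD (the paper compares the normalized audio spectrum envelope with PCA and ICA bases): log-spectrum frames are centered and projected onto a few retained basis functions. The random "spectrogram" stands in for real MPEG-7 features.

        # Audio spectrum projection onto a PCA basis (SVD of centered frames).
        import numpy as np

        rng = np.random.default_rng(0)
        frames = rng.random((200, 32))               # 200 frames x 32 log-bands, made up

        mean = frames.mean(axis=0)
        centered = frames - mean
        _, _, Vt = np.linalg.svd(centered, full_matrices=False)
        basis = Vt[:8]                               # keep 8 spectrum basis functions

        projection = centered @ basis.T              # 200 x 8 reduced features
        print("reduced feature shape:", projection.shape)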

  • Combining shape prior and statistical features for active contour segmentation

    Publication Year: 2004, Page(s): 726 - 734
    Cited by: Papers (26)
    PDF (408 KB) | HTML

    This paper deals with image and video segmentation using active contours. The proposed variational approach is based on a criterion featuring a shape prior that allows free-form deformation. The shape prior is defined as a functional of the distance between the active contour and a contour of reference. We develop the complete differentiation of this criterion. First, we propose two applications using only the shape prior term: the first concerns shape warping and the second video interpolation. The shape prior is then combined with region-based features. This general framework is applied to interactive segmentation and to face tracking on a real sequence.
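
    One plausible reading of "a functional of the distance between the active contour and a contour of reference", sketched with a distance transform: contour points are penalized by their squared distance from the reference. This is an illustration only, not the paper's exact functional.

        # Shape-prior energy via a distance transform of the reference contour.
        import numpy as np
        from scipy.ndimage import distance_transform_edt

        mask = np.ones((64, 64))
        mask[20, 20:41] = 0          # reference contour: a square outline
        mask[40, 20:41] = 0
        mask[20:41, 20] = 0
        mask[20:41, 40] = 0
        dist = distance_transform_edt(mask)          # distance to the nearest contour pixel

        contour = np.array([[22, 18], [30, 19], [38, 22], [41, 30]])  # active contour points
        prior_energy = float((dist[contour[:, 0], contour[:, 1]] ** 2).sum())
        print("shape prior energy:", prior_energy)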

  • Classification of video segmentation application scenarios

    Publication Year: 2004, Page(s): 735 - 741
    Cited by: Papers (8) | Patents (13)
    PDF (312 KB) | HTML

    Video analysis can be used in the context of a wide variety of applications, and therefore a multiplicity of techniques has been proposed in the literature. Each of those techniques is usually devoted to solving a specific part of the complete analysis problem, unless the problem is rather simple. Typically, to be able to propose meaningful analysis solutions, the analysis problem must first be appropriately constrained, taking into account the relevant application environment. Then, complementary types of analysis techniques may have to be combined to achieve the desired results. This paper proposes a classification of segmentation applications into a set of scenarios, according to the different application constraints and goals. This allows easier selection of the appropriate video segmentation solution for each specific application. Examples of segmentation solutions for the most relevant scenarios identified are presented.

  • A multiscale representation method for nonrigid shapes with a single closed contour

    Publication Year: 2004, Page(s): 742 - 753
    Cited by: Papers (53)
    PDF (728 KB) | HTML

    In this paper, we discuss the criteria that should be satisfied by a descriptor for nonrigid shapes with a single closed contour. We then propose a shape representation method that fulfills these criteria. In the proposed approach, contour convexities and concavities at different scale levels are represented using a two-dimensional (2-D) matrix. The representation can be visualized as a 2-D surface, where "hills" and "valleys" represent contour convexities and concavities, respectively. The optimal matching of two shape representations is achieved using dynamic programming, and a dissimilarity measure is defined based on this matching. The proposed algorithm is very efficient and invariant to several kinds of transformations, including some articulations and modest occlusions. The retrieval performance of the approach is illustrated using the MPEG-7 shape database, one of the most complete shape databases currently available. Our experiments indicate that the proposed representation is well suited for object indexing and retrieval in large databases. Furthermore, the representation can be used as a starting point to obtain more compact descriptors.
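
    A minimal sketch of the matching step: two multiscale descriptors (rows = scale levels, columns = contour positions) are aligned with dynamic programming, and the optimal alignment cost serves as the dissimilarity. The toy matrices are invented, and the circular-shift handling needed for closed contours is omitted.

        # Dynamic-programming alignment of two multiscale contour descriptors.
        import numpy as np

        def dp_dissimilarity(A, B):
            """A, B: scales x positions matrices; returns the optimal alignment cost."""
            n, m = A.shape[1], B.shape[1]
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = np.abs(A[:, i - 1] - B[:, j - 1]).sum()   # column distance
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        A = np.array([[0.1, 0.8, 0.2, -0.5], [0.0, 0.6, 0.1, -0.3]])
        B = np.array([[0.2, 0.7, -0.4, -0.5], [0.1, 0.5, -0.2, -0.3]])
        print("dissimilarity:", dp_dissimilarity(A, B))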

  • 2005 IEEE International Symposium on Circuits and Systems (ISCAS 2005)

    Publication Year: 2004, Page(s): 754
    PDF (520 KB)
    Freely Available from IEEE
  • 11th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2004)

    Publication Year: 2004, Page(s): 755
    PDF (494 KB)
    Freely Available from IEEE
  • 2004 IEEE Asia-Pacific Conference on Circuits and Systems

    Publication Year: 2004, Page(s): 756
    PDF (156 KB)
    Freely Available from IEEE
  • IEEE Circuits and Systems Society Information

    Publication Year: 2004, Page(s): c3
    PDF (33 KB)
    Freely Available from IEEE

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High-Speed Real-Time Circuits
5. Multiprocessor Systems: Hardware and Software
6. VLSI Architecture and Implementation for Video Technology

 


Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it