
Circuits and Systems for Video Technology, IEEE Transactions on

Issue 8 • Date Aug. 2012

  • Table of contents

    Publication Year: 2012 , Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Publication Year: 2012 , Page(s): C2
    Freely Available from IEEE
  • 3-D Head Tracking via Invariant Keypoint Learning

    Publication Year: 2012 , Page(s): 1113 - 1126
    Cited by:  Papers (2)

    Keypoint matching is a standard tool to solve the correspondence problem in vision applications. However, in 3-D face tracking, this approach is often deficient because the human face complexities, together with its rich viewpoint, nonrigid expression, and lighting variations in typical applications, can cause many variations impossible to handle by existing keypoint detectors and descriptors. In this paper, we propose a new approach to tailor keypoint matching to track the 3-D pose of the user head in a video stream. The core idea is to learn keypoints that are explicitly invariant to these challenging transformations. First, we select keypoints that are stable under randomly drawn small viewpoints, nonrigid deformations, and illumination changes. Then, we treat keypoint descriptor learning at different large angles as an incremental scheme to learn discriminative descriptors. At matching time, to reduce the ratio of outlier correspondences, we use second-order color information to prune keypoints unlikely to lie on the face. Moreover, we integrate optical flow correspondences in an adaptive way to remove motion jitter efficiently. Extensive experiments show that the proposed approach can lead to fast, robust, and accurate 3-D head tracking results even under very challenging scenarios.
  • Object Detection From Videos Captured by Moving Camera by Fuzzy Edge Incorporated Markov Random Field and Local Histogram Matching

    Publication Year: 2012 , Page(s): 1127 - 1135
    Cited by:  Papers (1)

    In this paper, we put forward a novel region matching-based motion estimation scheme to detect objects with accurate boundaries from videos captured by a moving camera. Here, a fuzzy edge incorporated Markov random field (MRF) model is considered for spatial segmentation. The algorithm is able to identify even the blurred boundaries of objects in a scene. The expectation-maximization algorithm is used to estimate the MRF model parameters. To reduce the complexity of searching, a new scheme is proposed to get a rough idea of the maximum possible shift of objects from one frame to another by finding the amount of shift in the positions of the centroid. We propose a χ²-test-based local histogram matching scheme for detecting moving objects in complex scenes, in low-illumination environments, and when objects change size from one frame to another. The proposed scheme is successfully applied for detecting moving objects from video sequences captured in both real-life and controlled environments. It is also noticed that the proposed scheme provides better results with less object background misclassification as compared to existing techniques.
  • Inter Frame Video Compression With Large Dictionaries of Tilings: Algorithms for Tiling Selection and Entropy Coding

    Publication Year: 2012 , Page(s): 1136 - 1149

    We propose the use of large tree-structured dictionaries of tilings for video compression. Our first contribution is the construction of a rate-distortion cost function that admits fast search algorithms to select the optimal tiling for the motion compensation stage of a video coder. The computation of the cost is enabled through novel algorithms to approximate the bit rate and the distortion. Our second contribution is an efficient arithmetic coding algorithm to encode the selected tree-structured tiling. We illustrate the effectiveness of our approach by showing that an H.264/AVC-like video coder utilizing one of the proposed tiling selection methods results in up to 16% savings in bit rate for several standard video sequences as compared to H.264/AVC. This is accomplished with only a modest increase in the computation time at the encoder.
  • Implementation and Applications of Tri-State Self-Organizing Maps on FPGA

    Publication Year: 2012 , Page(s): 1150 - 1160
    Cited by:  Papers (2)

    This paper introduces a tri-state logic self-organizing map (bSOM) designed and implemented on a field programmable gate array (FPGA) chip. The bSOM takes binary inputs and maintains tri-state weights. A novel training rule is presented. The bSOM is well suited to FPGA implementation, trains quicker than the original self-organizing map (SOM), and can be used in clustering and classification problems with binary input data. Two practical applications, character recognition and appearance-based object identification, are used to illustrate the performance of the implementation. The appearance-based object identification forms part of an end-to-end surveillance system implemented wholly on FPGA. In both applications, binary signatures extracted from the objects are processed by the bSOM. The system performance is compared with a traditional SOM with real-valued weights and a strictly binary weighted SOM.
  • Efficient Recurrent Pattern Matching Video Coding

    Publication Year: 2012 , Page(s): 1161 - 1173
    Cited by:  Papers (1)

    In this paper, we propose a pattern-matching-based algorithm for video compression. This algorithm, named multidimensional multiscale parser (MMP)-Video, is based on the H.264/AVC video encoder, but uses a pattern-matching paradigm instead of the state-of-the-art transform-quantization-entropy encoding approach. The proposed method uses multiscale recurrent patterns to compress both spatial and temporal prediction residues, entirely replacing transforms and quantization. Experimental results show that the coding performance of MMP-Video is better than that of the H.264/AVC high profile, especially for medium to high bit rates. The gains range up to 0.7 dB, showing that, in spite of its larger computational complexity, the multiscale recurrent pattern matching paradigm deserves investigation as an alternative for video compression.
  • Near-Duplicate Video Clip Detection Using Model-Free Semantic Concept Detection and Adaptive Semantic Distance Measurement

    Publication Year: 2012 , Page(s): 1174 - 1187

    Motivated by the observation that content transformations tend to preserve the semantic information conveyed by video clips, this paper introduces a novel technique for near-duplicate video clip (NDVC) detection, leveraging model-free semantic concept detection and adaptive semantic distance measurement. In particular, model-free semantic concept detection is realized by taking advantage of the collective knowledge in an image folksonomy (which is an unstructured collection of user-contributed images and tags), facilitating the use of an unrestricted concept vocabulary. Adaptive semantic distance measurement is realized by means of the signature quadratic form distance (SQFD), making it possible to flexibly measure the similarity between video shots that contain a varying number of semantic concepts, and where these semantic concepts may also differ in terms of relevance and nature. Experimental results obtained for the MIRFLICKR-25000 image set (used as a source of collective knowledge) and the TRECVID 2009 video set (used to create query and reference video clips) demonstrate that model-free semantic concept detection and SQFD can be successfully used for the purpose of identifying NDVCs.
  • Multiple Hypotheses Bayesian Frame Rate Up-Conversion by Adaptive Fusion of Motion-Compensated Interpolations

    Publication Year: 2012 , Page(s): 1188 - 1198
    Cited by:  Papers (8)

    Frame rate up-conversion (FRUC) improves the viewing experience of a video because the motion in a FRUC-constructed high frame-rate video looks smoother and more continuous. This paper proposes a multiple hypotheses Bayesian FRUC scheme for estimating the intermediate frame with maximum a posteriori probability, in which both a temporal motion model and a spatial image model are incorporated into the optimization criterion. The image model describes the spatial structure of neighboring pixels while the motion model describes the temporal correlation of pixels along motion trajectories. Instead of employing a single uniquely optimal motion, multiple “optimal” motion trajectories are utilized to form a group of motion hypotheses. To obtain accurate estimation for the pixels in missing intermediate frames, the motion-compensated interpolations generated by all these motion hypotheses are adaptively fused according to the reliability of each hypothesis. We revealed by numerical analysis that this reliability (i.e., the variance of interpolation errors along the hypothesized motion trajectory) can be measured by the variation of reference pixels along the motion trajectory. To obtain the multiple motion fields, a set of block-matching sizes is used and the motion fields are estimated by progressively reducing the size of the matching block. Experimental results show that the proposed method can significantly improve both the objective and the subjective quality of the constructed high frame rate video.
  • In-Layer Multibuffer Framework for Rate-Controlled Scalable Video Coding

    Publication Year: 2012 , Page(s): 1199 - 1212
    Cited by:  Papers (2)

    Temporal scalability is supported in scalable video coding (SVC) by means of hierarchical prediction structures, where the higher layers can be ignored for frame rate reduction. Nevertheless, this kind of scalability is not totally exploited by the rate control (RC) algorithms since the hypothetical reference decoder (HRD) requirement is only satisfied for the highest frame rate substream of every dependence (spatial or coarse grain scalability) layer. In this paper, we propose a novel RC approach that aims to deliver several HRD-compliant temporal resolutions within a particular dependence layer. Instead of using the common SVC encoder configuration consisting of one dependence layer per temporal resolution, a compact configuration that does not require additional dependence layers for providing different HRD-compliant temporal resolutions is proposed. Specifically, the proposed framework for rate-controlled SVC uses a set of virtual buffers within a dependence layer so that their levels can be simultaneously controlled for overflow and underflow prevention while minimizing the reconstructed video distortion of the corresponding substreams. This in-layer multibuffer approach has been built on top of a baseline H.264/SVC RC algorithm for variable bit rate applications. The experimental results show that our proposal achieves a good performance in terms of mean quality, quality consistency, and buffer control using a reduced number of layers.
  • Face Sketch–Photo Synthesis and Retrieval Using Sparse Representation

    Publication Year: 2012 , Page(s): 1213 - 1226
    Cited by:  Papers (5)

    Sketch-photo synthesis plays an important role in sketch-based face photo retrieval and photo-based face sketch retrieval systems. In this paper, we propose an automatic sketch-photo synthesis and retrieval algorithm based on sparse representation. The proposed sketch-photo synthesis method works at patch level and is composed of two steps: sparse neighbor selection (SNS) for an initial estimate of the pseudoimage (pseudosketch or pseudophoto) and sparse-representation-based enhancement (SRE) for further improving the quality of the synthesized image. SNS can find closely related neighbors adaptively and then generate an initial estimate for the pseudoimage. In SRE, a coupled sparse representation model is first constructed to learn the mapping between sketch patches and photo patches, and a patch-derivative-based sparse representation method is subsequently applied to enhance the quality of the synthesized photos and sketches. Finally, four retrieval modes, namely, sketch-based, photo-based, pseudosketch-based, and pseudophoto-based retrieval are proposed, and a retrieval algorithm is developed by using sparse representation. Extensive experimental results illustrate the effectiveness of the proposed face sketch-photo synthesis and retrieval algorithms.
  • Multioriented Video Scene Text Detection Through Bayesian Classification and Boundary Growing

    Publication Year: 2012 , Page(s): 1227 - 1235
    Cited by:  Papers (9)

    Multioriented text detection in video frames is not as easy as detection of captions, graphics, or overlaid texts, which usually appear in the horizontal direction and have high contrast compared to their background. Multioriented text generally refers to scene text, whose unfavorable characteristics make text detection more challenging and interesting. Therefore, conventional text detection methods may not give good results for multioriented scene text detection. Hence, in this paper, we present a new enhancement method that includes the product of Laplacian and Sobel operations to enhance text pixels in videos. To classify true text pixels, we propose a Bayesian classifier without assuming a priori probability about the input frame but estimating it based on three probable matrices. Three different ways of clustering are performed on the output of the enhancement method to obtain the three probable matrices. Text candidates are obtained by intersecting the output of the Bayesian classifier with the Canny edge map of the input frame. A boundary growing method is introduced to traverse the multioriented scene text lines using text candidates. The boundary growing method works based on the concept of nearest neighbors. The robustness of the method has been tested on a variety of datasets that include our own created data (nonhorizontal and horizontal text data) and two publicly available datasets, namely, video frames of Hua and complex scene text data of the ICDAR 2003 competition (camera images). Experimental results show that the performance of the proposed method is encouraging compared with results of existing methods in terms of recall, precision, F-measures, and computational times.
  • Single-Pass Rate Control With Texture and Non-Texture Rate-Distortion Models

    Publication Year: 2012 , Page(s): 1236 - 1245
    Cited by:  Papers (1)

    One of the challenges in video rate control lies in determining a quantization parameter (Qp) that will be used for both the rate-distortion (R-D) optimization process and the quantization of transform coefficients. In this paper, we attempt to achieve effective rate control with a different approach. By modeling the relationships of distortion, texture bits, non-texture bits, and Qp, we derive the Qp required for both R-D optimization and quantization through Lagrangian optimization. From experiments with several video sequences, we found that our rate control scheme is capable of effective rate control with only a few model updates during encoding. The proposed rate control scheme adapts quickly to the characteristics of the source data and is particularly effective at controlling the rate of videos with high and unpredictable motion content.
  • IEEE Xplore Digital Library [advertisement]

    Publication Year: 2012 , Page(s): 1246
    Freely Available from IEEE
  • IEEE Foundation [advertisement]

    Publication Year: 2012 , Page(s): 1247
    Freely Available from IEEE
  • Quality without compromise [advertisement]

    Publication Year: 2012 , Page(s): 1248
    Freely Available from IEEE
  • IEEE Circuits and Systems Society Information

    Publication Year: 2012 , Page(s): C3
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology information for authors

    Publication Year: 2012 , Page(s): C4
    Freely Available from IEEE

Aims & Scope

The emphasis is focused on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High-Speed Real-Time Circuits
5. Multi-Processor Systems: Hardware and Software
6. VLSI Architecture and Implementation for Video Technology 

 


Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it