Circuits and Systems for Video Technology, IEEE Transactions on

Issue 6 • Date June 2012

  • Table of contents

    Publication Year: 2012 , Page(s): C1
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Publication Year: 2012 , Page(s): C2
    Freely Available from IEEE
  • Design and Implementation of Efficient Video Stabilization Engine Using Maximum a Posteriori Estimation and Motion Energy Smoothing Approach

    Publication Year: 2012 , Page(s): 817 - 830

    To smooth shaky video captured with handheld devices, this paper designs a hardware-oriented engine for efficient video stabilization. The engine is built on motion energy computation: maximum a posteriori estimation derives the global motion, and motion energy smoothing performs the stabilization. The global motion is smoothed by calculating the continuous and curve energy of successive frames. In addition, to achieve real-time video stabilization, an efficient hardware architecture is proposed. A novel data reuse scheme speeds up corner point detection. An estimation skip technique lowers the computation of local motion estimation. Double buffering and pipelining are used to derive the global motion efficiently. With these approaches, the hardware architecture achieves high efficiency and high throughput. The experimental results show that the proposed video stabilization engine produces smooth videos with high precision. The proposed hardware architecture supports real-time processing at large resolutions. An objective comparison also demonstrates good stabilization performance.

  • Bayesian Structure-Preserving Image Contrast Enhancement and its Simplification

    Publication Year: 2012 , Page(s): 831 - 843
    Cited by:  Papers (2)

    In this paper, an efficient Bayesian framework is proposed for image contrast enhancement. Based on the image acquisition pipeline, we model the image enhancement problem as a maximum a posteriori (MAP) estimation problem, where the posterior probability is formulated based on the local information of the given image. In our framework, we express the likelihood model as a local image structure preserving constraint, where the overall effect of the shutter speed and camera response function is approximated as a linear transformation. We design the prior model based on the observed image and some statistical properties of natural images. With the proposed framework, we can effectively enhance the contrast of the image in a natural-looking way and with fewer artifacts. Moreover, in order to apply the proposed MAP formulation to typical enhancement problems, such as image editing, we further convert the estimation process into an intensity mapping process, which achieves comparable enhancement performance with much lower computational complexity. Simulation results demonstrate the feasibility of the proposed framework in providing flexible and effective contrast enhancement.
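For readers unfamiliar with the term, the generic form of a MAP estimate can be written as below. This is the textbook definition, not the paper's specific likelihood or prior; the symbols x (the enhanced image) and y (the observed image) are assumptions chosen for illustration.

```latex
\hat{x}_{\mathrm{MAP}}
  = \arg\max_{x} \; p(x \mid y)
  = \arg\max_{x} \; p(y \mid x)\, p(x)
```

Here p(y | x) plays the role of the likelihood model and p(x) the prior model mentioned in the abstract.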

  • LLM Integer Cosine Transform and its Fast Algorithm

    Publication Year: 2012 , Page(s): 844 - 854

    Existing video coding standards use only 4 × 4 and 8 × 8 transforms for energy compaction. Recent research has found that using larger transforms, such as 16 × 16, together with the existing transforms can improve coding performance, especially for high-definition (HD) video, which is becoming more and more common. This has raised interest in high-performance higher-order transforms with low computational requirements. In this paper, a method to derive orthogonal integer cosine transforms is proposed. The order-2N transform is defined using the order-N transform. A family of these integer transforms, the Loeffler, Ligtenberg, and Moschytz (LLM) integer cosine transforms, is derived using this method. Its fast algorithm structure is the same as that of the LLM fast discrete cosine transform (DCT) algorithm but requires integer operations only. This new family of transforms is not only very close to the DCT but also has excellent coding performance.
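As a small illustration of what an orthogonal integer transform looks like, the sketch below uses the well-known 4 × 4 forward core transform matrix from H.264/AVC. It is not the LLM integer cosine transform derived in the paper, and the helper names (`gram`, `rows_are_orthogonal`, `forward_1d`) are hypothetical.

```python
# Rows of the 4x4 forward core transform used in H.264/AVC, a well-known
# integer approximation of the DCT. This is NOT the paper's LLM transform.
H264_CORE_4X4 = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def gram(matrix):
    """Return matrix * matrix^T using integer arithmetic only."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def rows_are_orthogonal(matrix):
    """True if every pair of distinct rows has a zero dot product."""
    g = gram(matrix)
    n = len(g)
    return all(g[i][j] == 0 for i in range(n) for j in range(n) if i != j)


def forward_1d(matrix, samples):
    """Apply the transform to one row of samples (integer operations only)."""
    return [sum(c * s for c, s in zip(row, samples)) for row in matrix]


if __name__ == "__main__":
    print(rows_are_orthogonal(H264_CORE_4X4))
    print(forward_1d(H264_CORE_4X4, [1, 1, 1, 1]))
```

The rows are mutually orthogonal but not of equal norm, so a practical codec folds the per-row scaling into quantization.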

  • Error-Resilient and Error Concealment 3-D SPIHT for Multiple Description Video Coding With Added Redundancy

    Publication Year: 2012 , Page(s): 855 - 868

    In this paper, we present a multiple description video coding algorithm based on error-resilient and error concealment set partitioning in hierarchical trees (ERC-SPIHT). In the proposed approach, additional redundancy is generated by wavelet decomposing the spatial root subband, and this redundancy is then intentionally inserted into the substreams. The novelty of the approach is that root subband coefficients lost during transmission in any substream can be reconstructed by exploiting both inherent redundancy and inserted redundancy. This reconstruction procedure is implemented in two steps, first by using existing 2-D error concealment techniques, and second with the proposed root subband recovery approach. The former step estimates the missing coefficients in the spatial root and high frequency subbands by exploiting the inherent redundancy, while the latter uses the inserted redundancy to further improve the precision of the estimated spatial root subband coefficients. The proposed root subband recovery method can be applied iteratively, and the accuracy of the reconstruction increases gradually with each iteration. Experimental results on different video sequences show that the proposed method maintains error resilience with high coding efficiency. In particular, our results demonstrate that the proposed algorithm improves video quality by up to 2.5753 dB over ERC-SPIHT in the presence of a substream loss.

  • Lips Contour Detection and Tracking Using Watershed Region-Based Active Contour Model and Modified H∞

    Publication Year: 2012 , Page(s): 869 - 874
    Cited by:  Papers (1)

    In this paper, a region-based active contour model (ACM) with local information using watershed segmentation is proposed for lips contour detection. Compared to the ACM with global energy terms, the proposed system provides more precise lips contour convergence in circumstances where the lips are difficult to distinguish using global statistics. Furthermore, since the ACM is sensitive to the initial contour position, a modified H∞ based on Lyapunov stability theory is proposed to provide better tracking of the subsequent lips feature points as the ACM initialization. Integrating the proposed ACM with the modified H∞ improves overall lips contour detection.

  • Free Viewpoint Video Coding With Rate-Distortion Analysis

    Publication Year: 2012 , Page(s): 875 - 889
    Cited by:  Papers (10)

    To improve free viewpoint video (FVV) coding efficiency and optimize the quality of the synthesized virtual view video, this paper proposes a depth-assisted FVV coding framework and analyzes the rate-distortion (R-D) property of the synthesized virtual view video in FVV coding. In the depth-assisted FVV coding framework, the depth assigned disparity compensated prediction is introduced to exploit the correlation between multiview video (MVV) and depth. To model the R-D property of the synthesized virtual view video, a region-based view synthesis distortion estimation approach is investigated with respect to the distortion of MVV and depth. Subsequently, the general R-D property estimation models of MVV and depth are analyzed. Finally, a rate-allocation scheme is designed to optimize the quantization parameter pair of MVV and depth in FVV coding. The simulation results demonstrate that the proposed depth-assisted FVV coding framework can improve the FVV coding efficiency. The region-based view synthesis distortion estimation approach and the general R-D model are able to precisely approximate the R-D property of synthesized virtual view video in the multiview video plus depth based FVV coding frameworks. The proposed rate-allocation scheme can optimize the overall FVV coding efficiency to achieve a high-quality reconstructed video at the desired viewpoint with a given rate constraint.

  • Addressing Visual Consistency in Video Retargeting: A Refined Homogeneous Approach

    Publication Year: 2012 , Page(s): 890 - 903
    Cited by:  Papers (6)

    In video retargeting, which adapts video content to a smaller display device, it is not clear how to balance three conflicting design objectives: 1) visual interestingness preservation; 2) temporal retargeting consistency; and 3) nondeformation. To understand their perceptual importance, we first identify that the latter two play a dominating role in making the retargeting results appealing. A statistical study on human response to the targeting scale is then carried out, suggesting that the global preservation of contents pursued by most existing approaches is not necessary. Based on the newly prioritized objectives and the statistical findings, we design a video retargeting system which, as a refined homogeneous approach, addresses the temporal consistency issue holistically and is still capable of preserving a high degree of visual interestingness. In particular, we propose a volume retargeting cost metric to jointly consider the retargeting objectives and formulate video retargeting as an optimization problem in graph representation. A dynamic programming solution is then given. In addition, we introduce a nonlinear fusion based attention model to measure the visual interestingness distribution. The experimental results from both image rendering and subjective tests indicate that the proposed attention model and video retargeting system each outperform their conventional counterparts.

  • Differential Edit Distance: A Metric for Scene Segmentation Evaluation

    Publication Year: 2012 , Page(s): 904 - 914
    Cited by:  Papers (3)

    In this paper, a novel approach to evaluating video temporal decomposition algorithms is presented. The evaluation measures typically used to this end are nonlinear combinations of precision-recall or coverage-overflow, which are not metrics and additionally possess undesirable properties, such as asymmetry. To alleviate these drawbacks, we introduce a novel unidimensional measure that is proven to be a metric and satisfies a number of qualitative prerequisites that previous measures do not. This measure is named differential edit distance (DED), since it can be seen as a variation of the well-known edit distance. After defining DED, we further introduce an algorithm that computes it in less than cubic time. Finally, DED is extensively compared with state-of-the-art measures, namely, the harmonic means (F-score) of precision-recall and coverage-overflow. The experiments include comparisons of qualitative properties, the time required for optimizing the parameters of scene segmentation algorithms with the help of these measures, and a user study gauging the agreement of these measures with the users' assessment of the segmentation results. The results confirm that the proposed measure is a unidimensional metric that is effective in evaluating scene segmentation techniques and in helping to optimize their parameters.
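For context, the classic edit distance that the abstract refers to can be sketched as follows. This is the standard unit-cost Levenshtein distance, not the DED measure defined in the paper; the function name and the example label sequences are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance between two sequences.

    Classic dynamic programming in O(len(a) * len(b)) time, keeping only
    the previous row of the table. This is NOT the paper's DED measure.
    """
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical per-shot scene labels from two segmentations.
    reference = [0, 0, 1, 1, 2]
    candidate = [0, 1, 1, 1, 2]
    print(edit_distance(reference, candidate))
```

With unit costs this distance is symmetric and satisfies the triangle inequality, which are the kinds of metric properties the abstract says precision-recall combinations lack.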

  • A Row-Parallel 8 × 8 2-D DCT Architecture Using Algebraic Integer-Based Exact Computation

    Publication Year: 2012 , Page(s): 915 - 929
    Cited by:  Papers (5)

    An algebraic integer (AI)-based time-multiplexed row-parallel architecture and two final reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8 × 8 2-D DCT that is entirely free of quantization errors in AI basis. As a result, the user-selectable accuracy in the FRS allows each of the 64 coefficients to have its precision set independently of the others, avoiding the leakage of quantization noise between channels that occurs in published DCT designs. The proposed FRS uses two approaches based on: 1) optimized Dempster-Macleod multipliers, and 2) expansion factor scaling. This architecture enables low-noise high-dynamic range applications in digital video processing that require full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized and verified on a field-programmable gate array (FPGA) chip. Six designs, for 4-bit and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented, and measured. The maximum clock rate and block rate achieved among 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8 × 307.787 ≈ 2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for an image size of 1920 × 1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device.
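The throughput figures quoted in the abstract can be reproduced with a few lines of arithmetic. The sketch below assumes that the row-parallel design accepts one 8-pixel row per clock cycle, which is inferred from the abstract's "8 × 307.787" pixel rate rather than stated outright; the function names are hypothetical.

```python
CLOCK_MHZ = 307.787   # maximum clock rate quoted for the 8-bit designs
ROWS_PER_BLOCK = 8    # an 8 x 8 block has 8 rows
PIXELS_PER_ROW = 8    # assumption: one full 8-pixel row enters per clock


def block_rate_mhz(clock_mhz):
    """Blocks per microsecond: one block is finished every 8 clocks."""
    return clock_mhz / ROWS_PER_BLOCK


def pixel_rate_ghz(clock_mhz):
    """Pixels per nanosecond: 8 pixels are consumed per clock."""
    return clock_mhz * PIXELS_PER_ROW / 1000.0


def frame_rate_hz(clock_mhz, width, height):
    """Frames per second for a width x height image at the given pixel rate."""
    pixels_per_second = clock_mhz * 1e6 * PIXELS_PER_ROW
    return pixels_per_second / (width * height)


if __name__ == "__main__":
    print(round(block_rate_mhz(CLOCK_MHZ), 2))
    print(round(pixel_rate_ghz(CLOCK_MHZ), 3))
    print(frame_rate_hz(CLOCK_MHZ, 1920, 1080))
```

Under this assumption the block rate and pixel rate match the abstract's 38.47 MHz and 2.462 GHz, and the 1920 × 1080 frame rate comes out near 1187 Hz, in line with the abstract's approximate figure.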

  • Tangent Bundles on Special Manifolds for Action Recognition

    Publication Year: 2012 , Page(s): 930 - 942
    Cited by:  Papers (6)

    Increasingly, machines are interacting with people through human action recognition from video streams. Video data can naturally be represented as a third-order data tensor. Although many tensor-based approaches have been proposed for action recognition, the geometry of the tensor space is seldom regarded as an important aspect. In this paper, we stress that a data tensor is related to a tangent bundle on a special manifold. Using a manifold charting, we can extract discriminating information between actions. Data tensors are first factorized using high-order singular value decomposition, where each factor is projected onto a tangent space and the intrinsic distance is computed from a tangent bundle for action classification. We examine a standard manifold charting and some alternative chartings on special manifolds, particularly, the special orthogonal group, Stiefel manifolds, and Grassmann manifolds. Because the proposed paradigm frames the classification scheme as a nearest neighbor based on the intrinsic distance, prior training is unnecessary. We evaluate our method on three public action databases including the Cambridge gesture, the UMD Keck body gesture, and the UCF sport datasets. The empirical results reveal that our method is highly competitive with the current state-of-the-art methods, robust to small alignment errors, and yet simpler.

  • Distributed Robust Optimization for Scalable Video Multirate Multicast Over Wireless Networks

    Publication Year: 2012 , Page(s): 943 - 957
    Cited by:  Papers (4)

    This paper proposes a distributed robust optimization scheme to jointly optimize overall video quality and traffic performance for scalable video multirate multicast over practical wireless networks. In order to guarantee layered utility maximization, the initial nominal joint source and network optimization is defined, where each scalable layer is tailored in an incremental order and finds jointly optimal multicast paths and associated rates with network coding. To enhance the robustness of the nominal convex optimization formulation with nonlinear constraints, we reserve partial bandwidth for backup paths disjoint from the primary paths. The scheme considers the path-overlapping allocation of backup paths for different receivers to take advantage of network coding, and takes into account the robust multipath rate-control and bandwidth reservation problem for scalable video multicast streaming when link failures of primary paths are possible. Specifically, an uncertainty set of the wireless medium capacity is introduced to represent the uncertain and time-varying property of parameters related to the wireless channel. The targeted uncertainty in the robust optimization problem is studied in the form of protection functions with nonlinear constraints, to analyze the tradeoff between robustness and distributedness. Using the dual decomposition and primal-dual update approach, we develop a fully decentralized algorithm with regard to communication overhead. Extensive experiments under critical performance factors show that the proposed algorithm converges to the optimal steady state more quickly than existing optimization schemes and adapts to dynamic network changes with a better tradeoff between optimization performance and robustness.

  • A 10-bit CMOS DAC With Current Interpolated Gamma Correction for LCD Source Drivers

    Publication Year: 2012 , Page(s): 958 - 965

    This paper presents a compact 10-bit digital-to-analog converter (DAC) for LCD source drivers. A cyclic DAC architecture is used to reduce the area of LCD column drivers compared with conventional resistor-string DACs (R-DACs). A current interpolation technique is proposed to perform gamma correction after D/A conversion. The gamma correction circuit is shared by four DAC channels using the interleave technique. A prototype 10-bit DAC with gamma correction function is implemented in 0.35 μm CMOS technology, and its average die size per channel is 0.053 mm², which is smaller than those of R-DACs with gamma correction function. The settling time of the 10-bit DAC is 1 μs, and the maximum INL and DNL are 2.13 least significant bit (LSB) and 1.30 LSB, respectively.

  • Gait Recognition Under Various Viewing Angles Based on Correlated Motion Regression

    Publication Year: 2012 , Page(s): 966 - 980
    Cited by:  Papers (14)

    It is well recognized that gait is an important biometric feature for identifying a person at a distance, e.g., in video surveillance applications. In practice, however, changes in viewing angle pose a significant challenge for gait recognition. A novel approach using a regression-based view transformation model (VTM) is proposed to address this challenge. Gait features from across views can be normalized into a common view using learned VTM(s). In principle, a VTM is used to transform a gait feature from one viewing angle (source) into another viewing angle (target). It consists of multiple regression processes to explore correlated walking motions, which are encoded in gait features, between source and target views. In the learning processes, sparse regression based on the elastic net is adopted as the regression function, which is free from the problem of overfitting and results in more stable regression models for VTM construction. On a widely adopted gait database, experimental results show that the proposed method significantly improves upon existing VTM-based methods and outperforms most other baseline methods reported in the literature. Several practical scenarios for applying the proposed method to gait recognition under various views are also discussed in this paper.
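For reference, the standard elastic net regression objective combines an L1 and a squared L2 penalty on the coefficients. The form below is the textbook definition, not the paper's exact formulation; the symbols X (source-view features), y (a target-view feature), β (regression coefficients), and λ₁, λ₂ (penalty weights) are assumptions chosen for illustration.

```latex
\hat{\beta}
  = \arg\min_{\beta} \;
    \lVert y - X\beta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \lambda_2 \lVert \beta \rVert_2^2
```

The L1 term encourages sparse coefficients, while the L2 term stabilizes the solution when the predictors are strongly correlated.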

  • IEEE Circuits and Systems Society Information

    Publication Year: 2012 , Page(s): C3
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology information for authors

    Publication Year: 2012 , Page(s): C4
    Freely Available from IEEE

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High Speed Real-Time Circuits
5. Multi-Processor Systems: Hardware and Software
6. VLSI Architecture and Implementation for Video Technology

 


Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it