IEEE Transactions on Circuits and Systems for Video Technology

Issue 4 • April 2011

  • Table of contents

    Publication Year: 2011, Page(s): C1
    PDF (65 KB)
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Publication Year: 2011, Page(s): C2
    PDF (41 KB)
    Freely Available from IEEE
  • Contextual Bag-of-Words for Visual Categorization

    Publication Year: 2011, Page(s): 381 - 392
    Cited by: Papers (17)
    PDF (825 KB) | HTML

    Bag-of-words (BOW), which represents an image by the histogram of local patches on the basis of a visual vocabulary, has attracted intensive attention in visual categorization due to its good performance and flexibility. Conventional BOW neglects the contextual relations between local patches due to its Naïve Bayesian assumption. However, it is well known that contextual relations play an important role when human beings recognize visual categories from their local appearance. This paper proposes a novel contextual bag-of-words (CBOW) representation to model two kinds of typical contextual relations between local patches, i.e., a semantic conceptual relation and a spatial neighboring relation. To model the semantic conceptual relation, visual words are grouped on multiple semantic levels according to the similarity of the class distributions they induce; local patches are then encoded and images represented accordingly. To explore the spatial neighboring relation, an automatic term extraction technique is adopted to measure the confidence that neighboring visual words are relevant. Word groups with high relevance are used and their statistics are incorporated into the BOW representation. Classification is performed using a support vector machine with an efficient kernel that incorporates the relational information. The proposed approach is extensively evaluated on two kinds of visual categorization tasks, i.e., video event and scene categorization. Experimental results demonstrate the importance of contextual relations between local patches, with CBOW showing superior performance to conventional BOW.
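
    As a concrete illustration of the spatial neighboring relation, the sketch below augments a plain BOW histogram with counts of word pairs whose patches lie close together. It is a minimal toy under assumed parameters (nearest-centroid quantization, a fixed pixel radius), not the paper's term-extraction-based grouping:

        # Toy contextual BOW feature: word histogram plus counts of
        # spatially neighboring word pairs. Parameters are illustrative.
        import numpy as np

        def quantize(descriptors, vocabulary):
            # Assign each local descriptor to its nearest visual word.
            d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
            return d2.argmin(axis=1)

        def cbow_histogram(descriptors, positions, vocabulary, radius=20.0):
            k = len(vocabulary)
            words = quantize(descriptors, vocabulary)
            hist = np.bincount(words, minlength=k).astype(float)
            # Count unordered pairs of words whose patches lie within
            # `radius` pixels of each other.
            pair_hist = np.zeros((k, k))
            d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
            ii, jj = np.where((d < radius) & (d > 0))
            for i, j in zip(ii, jj):
                a, b = sorted((words[i], words[j]))
                pair_hist[a, b] += 0.5  # each pair occurs twice in (ii, jj)
            feat = np.concatenate([hist, pair_hist[np.triu_indices(k)]])
            return feat / max(feat.sum(), 1e-9)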

  • Integrating Spatio-Temporal Context With Multiview Representation for Object Recognition in Visual Surveillance

    Publication Year: 2011, Page(s): 393 - 407
    Cited by: Papers (7)
    PDF (1259 KB) | HTML

    We present in this paper an integrated solution for rapidly recognizing dynamic objects in surveillance videos by exploiting various kinds of contextual information. This solution consists of three components. The first is a multi-view object representation. It contains a set of deformable object templates, each of which comprises an ensemble of active features for an object category in a specific view/pose. The template can be learned efficiently from a small set of roughly aligned positive samples, without negative samples. The second component is a unified spatio-temporal context model, which integrates two types of contextual information in a Bayesian way. One is the spatial context, including the main surface property (constraints on object type and density) and camera geometric parameters (constraints on object size at a specific location). The other is the temporal context, containing pixel-level and instance-level consistency models used to generate the foreground probability map and local object trajectory prediction. We also combine the above spatial and temporal contextual information to estimate the object pose in the scene and use it as a strong prior for inference. The third component is a robust sampling-based inference procedure. Taking the spatio-temporal contextual knowledge as the prior model and deformable template matching as the likelihood model, we formulate object category recognition as a maximum-a-posteriori problem. The probabilistic inference can be achieved by a simple Markov chain Monte Carlo sampler, owing to the informative spatio-temporal context model, which greatly reduces the computational complexity and the category ambiguities. The system's performance and the benefit gained from the spatio-temporal contextual information are quantitatively evaluated on several challenging datasets, and the comparison results clearly demonstrate that the proposed algorithm outperforms other state-of-the-art algorithms.
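
    The MAP formulation can be pictured in a few lines: a contextual prior and a template-matching likelihood combined in a Metropolis-style sampler. The prior and likelihood stubs, category list, and proposal scale below are illustrative assumptions, not the paper's models:

        # Toy MAP inference: context prior x matching likelihood,
        # explored with a Metropolis sampler. All numbers are assumed.
        import math, random

        CATEGORIES = ["pedestrian", "bicycle", "car"]
        EXPECTED_SIZE = {"pedestrian": 40.0, "bicycle": 60.0, "car": 90.0}

        def log_prior(cat, size):
            # Stand-in for the spatio-temporal context prior, e.g. the
            # size expected for a category at this image location.
            return -0.5 * ((size - EXPECTED_SIZE[cat]) / 15.0) ** 2

        def log_lik(size, observed_size):
            # Stand-in for the deformable-template matching score.
            return -0.5 * ((size - observed_size) / 5.0) ** 2

        def map_inference(observed_size, n_iter=2000):
            cat, size = random.choice(CATEGORIES), observed_size
            cur_lp = log_prior(cat, size) + log_lik(size, observed_size)
            best, best_lp = (cat, size), cur_lp
            for _ in range(n_iter):
                c2 = random.choice(CATEGORIES)      # propose a category switch
                s2 = size + random.gauss(0.0, 5.0)  # jitter the size
                lp2 = log_prior(c2, s2) + log_lik(s2, observed_size)
                if lp2 > cur_lp or random.random() < math.exp(lp2 - cur_lp):
                    cat, size, cur_lp = c2, s2, lp2  # Metropolis accept
                    if cur_lp > best_lp:
                        best, best_lp = (cat, size), cur_lp
            return best

        print(map_inference(observed_size=55.0))  # the informative prior favors "bicycle"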

  • Visual Object Tracking Based on Combination of Local Description and Global Representation

    Publication Year: 2011, Page(s): 408 - 420
    Cited by: Papers (3)
    PDF (1336 KB) | HTML

    This paper presents a novel method for visual object tracking based on the combination of local scale-invariant feature transform (SIFT) description and global incremental principal component analysis (PCA) representation in loosely constrained conditions. The state of the object is defined by the position and shape of a parallelogram, so tracking results are given by locating the object in every frame with a parallelogram. The whole method is constructed in the particle filter framework, which includes two models: the dynamic model and the observation model. In the dynamic model, particle states are predicted with the help of local SIFT descriptors. Local key point matching between successive frames based on SIFT descriptors provides an important cue for predicting particle states; thus, particles can be spread efficiently in the neighborhood of the predicted position. In the observation model, every particle is evaluated by a local key point-weighted incremental PCA representation, which describes the object more accurately by giving large weights to pixels in the influence area of key points. Moreover, by incorporating a dynamic forgetting factor, the PCA eigenvectors are updated online according to the object states, which makes the method more adaptable to different situations. Experimental results show that, compared to other state-of-the-art methods, the proposed method is robust especially under difficult conditions such as strong motion of both object and background, large pose change, and illumination change.
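
    A minimal sketch of the two models described above: particles are spread around a displacement suggested by matched local features (assumed given), then weighted by PCA reconstruction error. The noise scales and the patch_at helper are hypothetical:

        # Toy particle-filter step: SIFT-guided prediction + PCA-based weighting.
        import numpy as np

        def pca_log_weight(patch, mean, eigvecs, sigma2=25.0):
            # Appearance score: reconstruction error in the PCA subspace.
            v = patch.ravel() - mean
            recon = eigvecs @ (eigvecs.T @ v)
            return -np.sum((v - recon) ** 2) / (2.0 * sigma2)

        def track_step(particles, sift_displacement, frame, mean, eigvecs, patch_at):
            # Predict: spread particles around the SIFT-predicted position.
            pred = particles + sift_displacement + np.random.normal(0, 2.0, particles.shape)
            # Update: weight each particle with the PCA appearance model.
            logw = np.array([pca_log_weight(patch_at(frame, p), mean, eigvecs)
                             for p in pred])
            w = np.exp(logw - logw.max())
            w /= w.sum()
            # Resample and report the highest-weight particle as the estimate.
            idx = np.random.choice(len(pred), size=len(pred), p=w)
            return pred[idx], pred[w.argmax()]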

  • A Passive-Blind Forgery Detection Scheme Based on Content-Adaptive Quantization Table Estimation

    Publication Year: 2011, Page(s): 421 - 434
    Cited by: Papers (3)
    PDF (1213 KB) | HTML

    In this paper, we propose a passive-blind scheme for detecting forged images. The scheme leverages quantization table estimation to measure the inconsistency among images. To improve the accuracy of the estimation process, each AC DCT coefficient is first classified into a specific type; then the corresponding quantization step size is measured adaptively from its energy density spectrum (EDS) and the EDS's Fourier transform. The proposed content-adaptive quantization table estimation scheme comprises three phases: pre-screening, candidate region selection, and tampered region identification. In the pre-screening phase, we determine whether an input image has been JPEG compressed, and count the number of quantization steps whose size is equal to one. To select candidate regions for estimating the quantization table, we devise a candidate region selection algorithm based on seed region generation and region growing. First, the seed region generation operation finds a suitable region by removing suspect regions, after which the selected seed region is merged with other suitable regions to form a candidate region. To avoid merging suspect regions, a candidate region refinement operation is performed in the region growing step. After estimating the quantization table from the candidate region, a maximum-likelihood-ratio classifier exploits the inconsistency of the quantization table to identify tampered regions block by block. To evaluate the scheme's performance in terms of tampering detection, three common forgery techniques, copy-paste tampering, inpainting, and composite tampering, are used. Experimental results demonstrate that the proposed scheme can estimate quantization tables and identify tampered regions effectively.
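
    The core intuition behind quantization step estimation is that coefficients quantized with step q concentrate at integer multiples of q. The toy below scores candidate steps by that comb structure; the paper's EDS/Fourier-based estimator is more elaborate, and the value ranges here are assumed:

        # Toy estimator of a single quantization step size from the
        # comb structure of a DCT-coefficient histogram.
        import numpy as np

        def estimate_q_step(dct_coeffs, max_q=64):
            c = np.round(dct_coeffs).astype(int)
            c = c[c != 0]                        # the zero bin fits every step
            values = np.arange(-512, 512)
            hist, _ = np.histogram(c, bins=np.arange(-512, 513))
            total = max(hist.sum(), 1)
            scores = []
            for q in range(1, max_q + 1):
                on_comb = hist[values % q == 0].sum() / total  # mass on multiples of q
                scores.append(on_comb - 1.0 / q)  # penalty breaks ties toward larger q
            return int(np.argmax(scores)) + 1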

  • Robust Low Complexity Corner Detector

    Publication Year: 2011, Page(s): 435 - 445
    Cited by: Papers (5)
    PDF (1668 KB) | HTML

    Corner feature point detection that is both fast and of high quality is in great demand for many real-time computer vision applications. The Harris and Kanade-Lucas-Tomasi (KLT) detectors are widely adopted for their good-quality feature points and their invariance to rotation, noise, illumination, and limited viewpoint change; however, their high complexity prevents real-time performance and thus limits their applications. In this paper, we redesign the Harris and KLT algorithms to reduce the complexity of each stage: Gaussian derivative computation, cornerness response, and non-maximum suppression (NMS). The complexity of the Gaussian derivative and cornerness stages is reduced by using an integral image. In the NMS stage, we replace the highly complex sorting-plus-NMS procedure with an efficient NMS followed by sorting of the surviving points. The detected feature points are further interpolated for sub-pixel accuracy of the feature point location. Experimental results on publicly available evaluation datasets show that our low complexity corner detector is very fast while remaining similar in feature point detection quality to the original algorithms. We achieve a complexity reduction by a factor of 9.8 and attain a processing speed of 50 f/s for 640×480 images on a commodity 2.53 GHz central processing unit with 3 GB of random access memory.
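
    The two cost-saving ideas, box filtering via an integral image and NMS before sorting, can be sketched as follows. The gradient operator, window sizes, and threshold are assumptions for illustration rather than the paper's exact design:

        # Harris-like cornerness with integral-image box windows,
        # followed by NMS-then-sort.
        import numpy as np

        def integral_image(img):
            return img.cumsum(0).cumsum(1)

        def box_sum(ii, y0, x0, y1, x1):
            # Rectangle sum over [y0..y1] x [x0..x1] from an integral image.
            s = ii[y1, x1]
            if y0 > 0: s -= ii[y0 - 1, x1]
            if x0 > 0: s -= ii[y1, x0 - 1]
            if y0 > 0 and x0 > 0: s += ii[y0 - 1, x0 - 1]
            return s

        def harris_like(img, k=0.04, r=2):
            gy, gx = np.gradient(img.astype(float))
            # Box windows over the second-moment entries approximate the
            # Gaussian window at O(1) cost per pixel.
            iis = [integral_image(g) for g in (gx * gx, gy * gy, gx * gy)]
            h, w = img.shape
            resp = np.zeros((h, w))
            for y in range(r, h - r):
                for x in range(r, w - r):
                    a, b, c = (box_sum(ii, y - r, x - r, y + r, x + r) for ii in iis)
                    resp[y, x] = a * b - c * c - k * (a + b) ** 2
            return resp

        def nms_then_sort(resp, thresh):
            # Suppress non-maxima in 3x3 neighborhoods first; sort survivors only.
            peaks = []
            h, w = resp.shape
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    v = resp[y, x]
                    if v > thresh and v == resp[y - 1:y + 2, x - 1:x + 2].max():
                        peaks.append((v, y, x))
            return sorted(peaks, reverse=True)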

  • Spatiotemporal Saliency Detection and Its Applications in Static and Dynamic Scenes

    Publication Year: 2011, Page(s): 446 - 456
    Cited by: Papers (14)
    PDF (2044 KB) | HTML

    This paper presents a novel method for detecting salient regions in both images and videos based on a discriminant center-surround hypothesis: the salient region stands out from its surroundings. To this end, our spatiotemporal approach combines spatial saliency, computed from distances between ordinal signatures of edge and color orientations obtained from the center and surrounding regions, with temporal saliency, computed simply as the sum of absolute differences between temporal gradients of the center and surrounding regions. The proposed method is computationally efficient, reliable, and simple to implement, so it can easily be extended to various applications such as image retargeting and moving object extraction. The method has been extensively tested, and the results show that it is effective in detecting saliency compared to various state-of-the-art methods.
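
    The temporal term lends itself to a very small sketch: compare temporal-gradient statistics of a center window against its surround. The window sizes are assumed, and the paper's ordinal-signature spatial term is not reproduced here:

        # Toy center-surround temporal saliency from frame differences.
        import numpy as np

        def temporal_saliency(prev, curr, y, x, c=8, s=24):
            grad = np.abs(curr.astype(float) - prev.astype(float))  # temporal gradient
            center = grad[y - c:y + c, x - c:x + c]
            surround = grad[y - s:y + s, x - s:x + s]
            # Compare mean gradient magnitudes so the window sizes are comparable.
            ring_mean = (surround.sum() - center.sum()) / (surround.size - center.size)
            return abs(center.mean() - ring_mean)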

  • Mixtures of von Mises Distributions for People Trajectory Shape Analysis

    Publication Year: 2011, Page(s): 457 - 471
    Cited by: Papers (4)
    PDF (1055 KB) | HTML

    People trajectory analysis is a recurrent task in many pattern recognition applications, such as surveillance, behavior analysis, and video annotation. In this paper, we propose a new framework for analyzing trajectory shape that is invariant to spatial shifts of the people's motion in the scene. To cope with the noise and uncertainty of the trajectory samples, we describe trajectories as sequences of angles modeled by distributions from circular statistics, i.e., a mixture of von Mises (MovM) distributions. To deal with the MovM, we define a new specific expectation-maximization (EM) algorithm for estimating its parameters and derive a closed form of the Bhattacharyya distance between single von Mises pdfs. Trajectories are then modeled as sequences of symbols, each corresponding to the most suitable distribution in the mixture, and compared with each other after a global alignment procedure to cope with trajectories of different lengths. The trajectories in the training set are clustered according to their shape similarity in an off-line phase, and testing trajectories are then classified with a specific on-line EM based on sufficient statistics. The approach is particularly suitable for classifying people trajectories in video surveillance and for searching for abnormal (i.e., infrequent) paths. Tests on synthetic and real data are provided, along with a complete comparison with other circular statistics and alignment methods.
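
    For readers unfamiliar with circular statistics, a bare-bones EM for a mixture of von Mises distributions is sketched below. The concentration update uses a standard moment-style approximation; the paper's own EM derivation and closed-form Bhattacharyya distance are not reproduced:

        # Minimal EM for a mixture of von Mises distributions over angles.
        import numpy as np
        from scipy.special import i0  # modified Bessel function of order 0

        def vm_pdf(theta, mu, kappa):
            return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

        def movm_em(theta, k=3, n_iter=50):
            rng = np.random.default_rng(0)
            mu = rng.uniform(-np.pi, np.pi, k)
            kappa, pi_k = np.ones(k), np.full(k, 1.0 / k)
            for _ in range(n_iter):
                # E-step: responsibility of each component for each angle.
                r = np.stack([p * vm_pdf(theta, m, c)
                              for p, m, c in zip(pi_k, mu, kappa)], axis=1)
                r /= r.sum(axis=1, keepdims=True)
                # M-step: the weighted resultant vector gives the mean
                # direction and (approximately) the concentration.
                for j in range(k):
                    w = r[:, j]
                    sx, sy = (w * np.cos(theta)).sum(), (w * np.sin(theta)).sum()
                    mu[j] = np.arctan2(sy, sx)
                    rbar = min(np.hypot(sx, sy) / w.sum(), 0.999)
                    kappa[j] = rbar * (2 - rbar ** 2) / (1 - rbar ** 2)
                    pi_k[j] = w.mean()
            return pi_k, mu, kappa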

  • From Xetal-II to Xetal-Pro: On the Road Toward an Ultralow-Energy and High-Throughput SIMD Processor

    Publication Year: 2011, Page(s): 472 - 484
    Cited by: Papers (3)
    PDF (2257 KB) | HTML

    Looking forward to the next generation of mobile streaming computing, the energy efficiency demanded of end-user terminals will become ever more stringent. This paper presents the Xetal-Pro processor, the latest member of the Xetal family of low-power single-instruction multiple-data (SIMD) processors from Philips. Its predecessor, Xetal-II, already ranks as one of the most computationally efficient processors available today [in terms of giga operations per second (GOPS) per Watt], but it cannot yet achieve the demanded energy efficiency of less than 1 pJ per operation. Unlike Xetal-II, Xetal-Pro supports ultrawide supply voltage (Vdd) scaling from the nominal supply down to the subthreshold region. Although aggressive Vdd scaling causes severe throughput degradation, this can be partly compensated for by the massive parallelism of the Xetal family. Xetal-II includes a large on-chip frame memory (FM) that does not scale well to ultralow Vdd, creating a major obstacle to increasing energy efficiency. We therefore investigate different FM realizations and memory organization alternatives, and propose a hybrid memory system (HMS) that reduces non-local memory traffic and enables further Vdd scaling. A data locality analysis is also provided to explore the design space for the number of scratchpad memory (SM) entries. Moreover, some unique circuit implementation issues of Xetal-Pro, such as the customized level-shifter, are discussed. Compared to Xetal-II operating at the nominal voltage, Xetal-Pro provides up to two times better energy efficiency even without Vdd scaling (essentially a consequence of data localization in the SM) when delivering the same ultrahigh throughput. With Vdd scaled into the sub/near-threshold region, Xetal-Pro can achieve more than ten times energy reduction while still delivering a high throughput of 0.69 GOPS (counting multiply and add operations only). These insights from Xetal-Pro shed light on the direction of future ultralow-energy SIMD processors.
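
    As a quick unit check of the efficiency figures quoted above: 1 pJ per operation is the same as 1000 GOPS per Watt. The power number in the example is hypothetical, chosen only to exercise the conversion:

        # Convert a (throughput, power) pair into energy per operation.
        def pj_per_op(gops, watts):
            return watts / (gops * 1e9) * 1e12  # J/op scaled to pJ

        # A hypothetical 0.69 GOPS drawn at 0.5 mW would be ~0.72 pJ/op.
        print(pj_per_op(0.69, 0.5e-3))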

  • Model-Based Joint Bit Allocation Between Texture Videos and Depth Maps for 3-D Video Coding

    Publication Year: 2011, Page(s): 485 - 497
    Cited by: Papers (22)
    PDF (1478 KB) | HTML

    In 3-D video coding, texture videos and depth maps need to be jointly coded. The distortion of texture videos and depth maps propagates to the synthesized virtual views. Hence, besides the coding efficiency of texture videos and depth maps, the joint bit allocation between them is also an important research issue in 3-D video coding. We first present comprehensive analyses of the impact of the compression distortion of texture videos and depth maps on the quality of the virtual views, and then derive a concise distortion model for the synthesized virtual views. Based on this model, the joint bit allocation problem is formulated as a constrained optimization problem and solved using the Lagrangian multiplier method. Experimental results demonstrate the high accuracy of the derived distortion model. The rate-distortion (R-D) performance of the proposed algorithm is close to that of search-based algorithms, which give the best R-D performance, while its complexity is lower. Moreover, compared with a bit allocation method using a fixed texture-to-depth bit ratio (5:1), the proposed algorithm achieves a gain of up to 1.2 dB.
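
    The Lagrangian machinery is easy to illustrate on an assumed inverse-rate distortion model D = a/Rt + b/Rd (not the paper's derived model): stationarity of the Lagrangian fixes the texture-to-depth split in closed form:

        # Toy joint bit allocation between texture (Rt) and depth (Rd).
        import math

        def allocate(a, b, budget):
            # Minimize a/Rt + b/Rd subject to Rt + Rd = budget.
            # Setting d/dRt = d/dRd (= -lambda) gives a/Rt**2 = b/Rd**2,
            # hence Rt : Rd = sqrt(a) : sqrt(b).
            rt = budget * math.sqrt(a) / (math.sqrt(a) + math.sqrt(b))
            rd = budget - rt
            return rt, rd, a / rt + b / rd

        # With synthesis distortion more sensitive to texture (a=5, b=1),
        # a 6 Mb/s budget splits about 4.15 : 1.85 rather than a fixed 5:1.
        print(allocate(5.0, 1.0, 6.0))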

  • An Error Resilient Video Coding Scheme Using Embedded Wyner–Ziv Description With Decoder Side Non-Stationary Distortion Modeling

    Publication Year: 2011, Page(s): 498 - 512
    Cited by: Papers (2)
    PDF (2281 KB) | HTML

    In this paper, we propose a generic error resilient video coding (ERVC) scheme using an embedded Wyner-Ziv (WZ) description. At the encoder, a joint source-channel R-D optimized mode selection (JSC-RDO-MS) algorithm with WZ-coded anchor frames is statistically studied and developed. For a stationary first-order Markov Gaussian source, the proposed mode optimization is justified by an analysis of the R-D impact on the WZ bit-rate. JSC-RDO-MS involves estimating the expected rate and distortion of WZ coding when the side information is unavailable, and the WZ bit-rate of each coding mode is determined by the error correction capability of the specific WZ codec. At the decoder, an online correlation noise model between the source and the side information is proposed as a mixture of Laplacians whose parameters are estimated to reflect the coherence of the motion field across successive frames and the energy of the prediction residual. Each mixture component represents the statistical distribution of prediction residuals, and the mixing coefficients represent the amount of error in motion compensation. The proposed scheme achieves a so-called classification gain by exploiting the spatially non-stationary characteristics of the motion field and texture. Extensive experimental results show that the proposed WZ-ERVC scheme achieves better overall R-D performance than existing ERVC schemes, and the proposed modeling algorithm also significantly outperforms the conventional Laplacian model, by up to 2 dB.
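
    A mixture of Laplacians as a correlation noise model can be sketched in a few lines; the weights and scales below are placeholders standing in for the parameters the decoder estimates from motion-field coherence and residual energy:

        # Toy mixture-of-Laplacians pdf for decoder-side residual modeling.
        import numpy as np

        def laplacian_pdf(x, b):
            return np.exp(-np.abs(x) / b) / (2.0 * b)

        def mixture_pdf(x, weights, scales):
            return sum(w * laplacian_pdf(x, b) for w, b in zip(weights, scales))

        # e.g., well-predicted regions (small scale) mixed with occluded
        # or badly predicted ones (large scale):
        x = np.linspace(-20, 20, 5)
        print(mixture_pdf(x, weights=[0.8, 0.2], scales=[1.5, 8.0]))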

  • Jointly Optimized Mode Decisions in Redundant Video Streaming

    Publication Year: 2011, Page(s): 513 - 518
    Cited by: Papers (1)
    PDF (327 KB) | HTML

    This letter investigates source-channel coding for error-resilient video streaming using redundant encoding. We estimate the end-to-end distortion per redundantly encoded macroblock (MB) by extending the recursive optimal per-pixel estimate to encompass redundant transmissions. Redundant encoding is formulated as a joint optimization of the MB parameters in the primary and redundant transmissions. We present three encoding strategies with different gain-complexity tradeoffs. The proposed methods are general in nature and could be implemented on top of any (hybrid) video codec. Simulation results employing H.264's redundant slice mechanism show significant performance gains over conventional error-resilient encoding methods and naive redundant encoding schemes.
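
    The joint decision can be viewed as an exhaustive R-D search over primary/redundant mode pairs. The mode sets, rate table, loss probability, and the expected-distortion stub (standing in for a ROPE-style per-pixel estimate) are all illustrative assumptions:

        # Toy joint mode decision for primary + redundant MB encoding.
        import itertools

        PRIMARY = ["intra", "inter"]
        REDUNDANT = ["none", "coarse_intra", "coarse_inter"]

        def expected_distortion(p_mode, r_mode, p_loss):
            # If the primary MB arrives, use its distortion; otherwise fall
            # back to the redundant description (or concealment if none).
            d_ok = {"intra": 4.0, "inter": 3.0}[p_mode]
            d_lost = {"none": 60.0, "coarse_intra": 12.0, "coarse_inter": 20.0}[r_mode]
            return (1 - p_loss) * d_ok + p_loss * d_lost

        def rate(p_mode, r_mode):
            return ({"intra": 120, "inter": 60}[p_mode]
                    + {"none": 0, "coarse_intra": 40, "coarse_inter": 20}[r_mode])

        def joint_decision(p_loss=0.1, lam=0.1):
            # Minimize expected distortion + lambda * rate over mode pairs.
            return min(itertools.product(PRIMARY, REDUNDANT),
                       key=lambda m: expected_distortion(*m, p_loss) + lam * rate(*m))

        print(joint_decision())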

  • Statistical Modeling of Inter-Frame Prediction Error and Its Adaptive Transform

    Publication Year: 2011, Page(s): 519 - 523
    PDF (415 KB) | HTML

    Most video coding standards use the discrete cosine transform, known to be near optimal for original images, to transform prediction errors. Since the statistical characteristics of prediction errors are quite different from those of original images, a more suitable transform for prediction errors has to be devised. In this letter, we introduce a novel statistical model for inter-frame prediction error and propose an adaptive transform based on the model. In addition, to reduce computation time, a fast and efficient algorithm is developed. Experiments on well-known image sequences confirm that the proposed transform can improve the performance of transform coding significantly.
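
    One common way to adapt a transform to residual statistics, shown here purely to illustrate the general idea rather than the letter's specific model, is to use the eigenvectors (a KLT) of the estimated residual covariance in place of the fixed DCT:

        # Derive a decorrelating basis from observed prediction residuals.
        import numpy as np

        def adaptive_basis(residual_blocks):
            # residual_blocks: (n, 16) array of flattened 4x4 residuals.
            x = residual_blocks - residual_blocks.mean(axis=0)
            cov = x.T @ x / len(x)
            _, vecs = np.linalg.eigh(cov)
            return vecs[:, ::-1]  # columns ordered by decreasing variance

        def transform(block, basis):
            return basis.T @ block.ravel()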

  • Have you visited lately? www.ieee.org [advertisement]

    Publication Year: 2011, Page(s): 524
    PDF (210 KB)
    Freely Available from IEEE
  • IEEE Circuits and Systems Society Information

    Publication Year: 2011, Page(s): C3
    PDF (33 KB)
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology Information for authors

    Publication Year: 2011, Page(s): C4
    PDF (33 KB)
    Freely Available from IEEE

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High Speed Real-Time Circuits
5. Multi-Processor Systems—Hardware and Software
6. VLSI Architecture and Implementation for Video Technology

 

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it