IEEE Transactions on Circuits and Systems for Video Technology

Issue 10 • October 2013

  • Table of contents

    Page(s): C1
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Page(s): C2
  • Illumination-Robust Foreground Detection in a Video Surveillance System

    Page(s): 1637 - 1650

    This paper presents a foreground detection algorithm that is robust against illumination changes and noise, providing a novel and practical choice for intelligent video surveillance systems using static cameras. The paper first introduces an online expectation-maximization (EM) algorithm, developed from a basic batch version, to update Gaussian mixture models in real time. A spherical K-means clustering method is then combined with it to provide a more accurate update direction when illumination is unstable. The combination is supported by the linearity of RGB color reflected from object surfaces, which is both theoretically proved by spectral reflection theory and experimentally validated in several observations. Foreground detection is carried out using a statistical framework with regional judgment, and noise in the detection stage is further reduced by a Bayesian iterative decision-making step. Experiments show that the proposed algorithm outperforms several classical methods on several datasets, both in detection performance and in robustness to perturbations from illumination changes.

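The per-pixel Gaussian mixture update that online background models of this kind build on can be sketched in a few lines. This is a rough, hypothetical illustration of the standard online mixture update that the paper's online EM algorithm extends; the function name, fixed learning rate, and matching threshold are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

def update_gmm(pixel, means, variances, weights, lr=0.05, match_thresh=2.5):
    """One online update step of a per-pixel Gaussian mixture background model.

    pixel: scalar intensity; means/variances/weights: 1-D arrays with one
    entry per mixture component. Returns (matched, means, variances, weights),
    where matched=True means the pixel fit an existing component (background).
    """
    d = np.abs(pixel - means) / np.sqrt(variances)   # normalized distances
    matched = d < match_thresh
    if matched.any():
        k = int(np.argmax(matched))                  # first matching component
        delta = pixel - means[k]
        means[k] += lr * delta                       # pull mean toward sample
        variances[k] += lr * (delta ** 2 - variances[k])
        weights[:] = (1 - lr) * weights              # decay all weights ...
        weights[k] += lr                             # ... reinforce the match
    else:
        k = int(np.argmin(weights))                  # replace weakest component
        means[k], variances[k], weights[k] = pixel, 30.0, lr
    weights /= weights.sum()                         # renormalize
    return bool(matched.any()), means, variances, weights
```

A matched pixel nudges its component's mean and variance toward the new sample, while an unmatched pixel (likely foreground) evicts the least-supported component.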
  • Control-Point Representation and Differential Coding for Affine-Motion Compensation

    Page(s): 1651 - 1660

    The affine-motion model is able to capture rotation, zooming, and the deformation of moving objects, thereby providing better motion-compensated prediction. However, it is not widely used due to the difficulty of both estimating and efficiently coding its motion parameters. To alleviate this problem, a new control-point representation that favors differential coding is proposed for efficient compression of affine parameters. By exploiting the spatial correlation between adjacent coding blocks, motion vectors at control points can be predicted and thus efficiently coded, leading to overall improved performance. To evaluate the proposed method, four new affine prediction modes are designed and embedded into the high-efficiency video coding (HEVC) test model HM1.0. The encoder adaptively chooses whether to use the new affine modes in an operational rate-distortion optimization. Bitrate savings of up to 33.82% in low-delay and 23.90% in random-access test conditions are obtained for low-complexity encoder settings. For high-efficiency settings, bitrate savings of up to 14.26% and 4.89% for these two conditions are observed.

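As background on the control-point idea: a 6-parameter affine motion model is fully determined by the motion of three non-collinear control points, so a codec can transmit three (predictively coded) control-point motion vectors instead of the raw parameters. The sketch below is a generic illustration of that equivalence, independent of HM1.0; the helper names are hypothetical.

```python
import numpy as np

def affine_from_control_points(corners, moved):
    """Solve the 6-parameter affine matrix [[a, b, c], [d, e, f]] that maps
    three non-collinear control points `corners` (3x2) to their displaced
    positions `moved` (3x2)."""
    A = np.hstack([corners, np.ones((3, 1))])  # one 3x3 system per coordinate
    params = np.linalg.solve(A, moved)         # columns: mapped x and mapped y
    return params.T                            # 2x3 affine matrix

def warp_point(M, p):
    """Apply the 2x3 affine matrix M to a 2-D point p."""
    return M @ np.array([p[0], p[1], 1.0])
```

For a pure translation of the control points, the recovered matrix reduces to identity plus a translation column, which is the degenerate case a conventional block motion vector already handles.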
  • Video-Based Tracking, Learning, and Recognition Method for Multiple Moving Objects

    Page(s): 1661 - 1674

    This paper presents an extended Markov chain Monte Carlo (MCMC) method for tracking and an extended hidden Markov model (HMM) method for learning and recognizing multiple moving objects in videos with jittering backgrounds. A graphical user interface (GUI) with enhanced usability is also proposed. Previous MCMC- and HMM-based methods are known to suffer degraded tracking and recognition accuracy and higher computation costs when challenged with appearance and trajectory changes such as occlusion, interaction, and varying numbers of moving objects. This paper proposes a cost reduction method for the MCMC approach by taking moves, i.e., birth and death, out of the iteration loop of the Markov chain when different moving objects interact. For stable and robust tracking, an ellipse model with stochastic model parameters is used. Moreover, the HMM method integrates several different modules in order to cope with multiple discontinuous trajectories. The proposed GUI offers an auto-allocation module of symbols from images and a hand-drawing module for efficient trajectory learning and for adding trajectories of interest. Experiments demonstrate the advantages of the method and GUI in tracking, learning, and recognizing spatiotemporally smooth and discontinuous trajectories.

  • Person Re-Identification by Regularized Smoothing KISS Metric Learning

    Page(s): 1675 - 1685

    With the rapid development of intelligent video surveillance (IVS), person re-identification, a difficult yet unavoidable problem in video surveillance, has received increasing attention in recent years, driven by advances in computing capacity and by the critical role the task plays in surveillance systems. In short, person re-identification aims to recognize an individual who has been observed across different cameras. KISS metric learning has been reported to achieve state-of-the-art performance for person re-identification on the VIPeR dataset. However, given a small training set, the estimate of the inverse covariance matrix is unstable, and the resulting performance can be poor. In this paper, we present regularized smoothing KISS metric learning (RS-KISS), which seamlessly integrates smoothing and regularization techniques for robustly estimating covariance matrices. RS-KISS is superior to KISS because it can effectively enlarge the underestimated small eigenvalues and reduce the overestimated large eigenvalues of the estimated covariance matrix. Given additional data, a more robust model can be obtained by RS-KISS; however, retraining RS-KISS on all available examples in a straightforward way is time consuming, so we introduce incremental learning to RS-KISS. We conduct thorough experiments on the VIPeR dataset and verify that 1) RS-KISS outperforms all previously reported results for person re-identification and 2) incremental RS-KISS performs as well as RS-KISS while significantly reducing the computational cost.

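The covariance-regularization idea can be illustrated with a simple shrinkage estimator: blending the sample covariance with a scaled identity raises underestimated small eigenvalues and lowers overestimated large ones, which is the effect the abstract attributes to RS-KISS. The sketch below is a generic shrinkage step plus a KISS-style metric, not the authors' exact RS-KISS formulation; the names and the blending weight are assumptions.

```python
import numpy as np

def shrink_covariance(S, alpha=0.1):
    """Shrink a sample covariance S toward a scaled identity. This enlarges
    small eigenvalues and reduces large ones, stabilising the inverse when
    the training set is small."""
    d = S.shape[0]
    target = (np.trace(S) / d) * np.eye(d)
    return (1 - alpha) * S + alpha * target

def kiss_metric(S_sim, S_dis, alpha=0.1):
    """KISS-style metric matrix M = inv(S_sim) - inv(S_dis), computed from
    the covariances of similar and dissimilar pair differences, with both
    covariances shrinkage-regularized first."""
    inv_sim = np.linalg.inv(shrink_covariance(S_sim, alpha))
    inv_dis = np.linalg.inv(shrink_covariance(S_dis, alpha))
    return inv_sim - inv_dis

def mahalanobis_sq(M, x, y):
    """Squared distance under the learned metric."""
    d = x - y
    return float(d @ M @ d)
```

Setting alpha to zero recovers the unregularized KISS estimate, so the shrinkage weight directly controls the small-sample robustness the abstract highlights.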
  • Novel Adaptive Algorithm for Intra Prediction With Compromised Modes Skipping and Signaling Processes in HEVC

    Page(s): 1686 - 1694

    Up to 35 intra prediction modes are available for each luma prediction unit in the upcoming HEVC standard. This allows more accurate prediction and thereby improves the compression efficiency of intra coding. However, encoding complexity increases dramatically because of the large number of modes involved in the intra mode decision process, and more overhead bits must be spent to signal the mode index. Intuitively, it is not necessary for all modes to be checked and signaled all the time. Therefore, a novel adaptive mode-skipping algorithm for mode decision and signaling is presented in this paper. More specifically, three optimized candidate sets with 1, 19, and 35 intra prediction modes are set up for each prediction unit. Based on the statistical properties of the neighboring reference samples used for intra prediction, the proposed algorithm adaptively selects the optimal set from the three candidates for each prediction unit before mode decision and signaling. As a result, the mode decision process is sped up because some modes are skipped in the first two sets, and, importantly, fewer bits are required to signal the mode index. Experimental results show that, compared to the HEVC test model HM7.0, average BD-rate savings of 0.18% are achieved for both the AI-Main and AI-HE10 cases in low-bitrate ranges, while the encoding time is reduced by 8%-38% and 8%-34% for the AI-Main and AI-HE10 cases, respectively.

  • Robust Background Subtraction for Network Surveillance in H.264 Streaming Video

    Page(s): 1695 - 1703

    H.264/Advanced Video Coding (AVC) is the industry standard in network surveillance, offering the lowest bitrate for a given perceptual quality among MPEG and proprietary codecs. This paper presents a novel approach for background subtraction in bitstreams encoded in the Baseline profile of H.264/AVC. Temporal statistics of the proposed feature vectors, which describe macroblock units in each frame, are used to select candidate macroblocks that may contain moving objects. From the candidate macroblocks, foreground pixels are determined by comparing the colors of corresponding pixels pair-wise with a background model. The basic contribution of this work over related approaches is that it allows each macroblock to have a different quantization parameter, in view of the requirements of variable as well as constant bitrate applications. Additionally, a low-complexity color-comparison technique is proposed that yields pixel-resolution segmentation at negligible computational cost compared with classical pixel-based approaches. Results comparing favorably against proven state-of-the-art pixel-domain algorithms are presented over a diverse set of standard surveillance sequences.

  • Image Denoising Games

    Page(s): 1704 - 1716

    Based on the observation that every small window in a natural image has many similar windows in the same image, nonlocal denoising methods perform denoising by a weighted average of all the pixels in a nonlocal window and have achieved very promising results. However, the use of fixed parameters greatly limits the denoising performance, so an important issue in pixel-domain image denoising algorithms is how to adaptively choose optimal parameters. While Stein's principle can estimate the true mean square error (MSE) for determining the optimal parameters, there is a tradeoff between the accuracy of the estimate and the minimum of the true MSE. In this paper, we study the impact of this tradeoff and formulate the image denoising problem as a coalition formation game. In this game, every pixel/block is treated as a player who seeks partners to form a coalition to achieve better denoising results. By forming a coalition, every player in the coalition can obtain certain gains by improving the accuracy of the Stein estimate, while incurring some costs by increasing the minimum of the true MSE. Moreover, we show that traditional approaches using the same parameters for the whole image are special cases of the proposed game-theoretic framework, obtained by choosing a utility function without a cost term. Finally, experimental results demonstrate the efficiency and effectiveness of the proposed game-theoretic method.

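For context, a fixed-parameter nonlocal means filter, the kind of baseline whose parameter choice the game-theoretic framework adapts, can be sketched as follows (1-D for brevity). The patch size, search radius, and smoothing parameter h are exactly the fixed parameters the abstract says limit performance; the values here are illustrative assumptions.

```python
import numpy as np

def nonlocal_means_1d(signal, patch=3, search=10, h=0.5):
    """Fixed-parameter nonlocal means on a 1-D signal: each sample becomes a
    weighted average of samples whose surrounding patches look similar."""
    n = len(signal)
    pad = patch // 2
    padded = np.pad(signal, pad, mode="reflect")   # padded[i + pad] == signal[i]
    out = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - search), min(n, i + search + 1)
        patch_i = padded[i:i + patch]              # patch centered at sample i
        weights = np.array([
            np.exp(-np.sum((patch_i - padded[j:j + patch]) ** 2) / h ** 2)
            for j in range(lo, hi)                 # similarity to each neighbor patch
        ])
        out[i] = np.dot(weights, signal[lo:hi]) / weights.sum()
    return out
```

Making patch, search, and h per-pixel quantities instead of constants is the degree of freedom the coalition game exploits.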
  • Toward Optimal Deployment of Cloud-Assisted Video Distribution Services

    Page(s): 1717 - 1728

    For Internet video services, the high fluctuation of user demands across geographically distributed regions results in low resource utilization of traditional content distribution network systems. With its capability for rapid and elastic resource provisioning, cloud computing emerges as a new paradigm to reshape the model of video distribution over the Internet, in which resources such as bandwidth and storage can be rented on demand from cloud data centers to meet volatile user demands. However, it is challenging for a video service provider (VSP) to optimally deploy its distribution infrastructure over multiple geo-distributed cloud data centers: a VSP needs to minimize the operational cost induced by the rental of cloud resources without sacrificing user experience in any region, and the geographical diversity of cloud resource prices further complicates the problem. In this paper, we investigate the optimal deployment of cloud-assisted video distribution services and explore the best tradeoff between operational cost and user experience, aiming to pave the way for building the next-generation video cloud. Toward this objective, we first formulate the deployment problem as a min-cost network flow problem that takes both operational cost and user experience into account. Then, we apply the Nash bargaining solution to solve the joint optimization problem efficiently and derive the optimal bandwidth provisioning and video placement strategies. In addition, we extend the algorithms to the online case and consider the scenario in which peers participate in video distribution. Finally, we conduct extensive simulations to evaluate our algorithms in realistic settings. The results show that the proposed algorithms achieve a good balance among multiple objectives and effectively optimize both operational cost and user experience.

  • Gradient Vector Flow and Grouping-Based Method for Arbitrarily Oriented Scene Text Detection in Video Images

    Page(s): 1729 - 1739

    Text detection in videos is challenging due to the low resolution and complex backgrounds of video, and arbitrary orientations of scene text lines make the problem more complex still. This paper presents a new method that extracts text lines of any orientation based on gradient vector flow (GVF) and neighbor component grouping. The GVF of edge pixels in the Sobel edge map of the input frame is explored to identify the dominant edge pixels that represent text components. The method extracts edge components corresponding to dominant pixels in the Sobel edge map, which we call text candidates (TC) of the text lines. We propose two grouping schemes. The first finds nearest neighbors based on geometrical properties of the TC to group broken segments and neighboring characters, which results in word patches. The end and junction points of the skeleton of the word patches are considered to eliminate false positives, yielding the candidate text components (CTC). The second scheme uses the direction and size of the CTC to extract neighboring CTC and to restore missing CTC, enabling arbitrarily oriented text line detection in video frames. Experimental results on different datasets, including arbitrarily oriented text data, nonhorizontal and horizontal text data, Hua's data, and the ICDAR-03 data (camera images), show that the proposed method outperforms existing methods in terms of recall, precision, and f-measure.

  • Single Image Super-resolution With Detail Enhancement Based on Local Fractal Analysis of Gradient

    Page(s): 1740 - 1754

    In this paper, we propose a single-image super-resolution and enhancement algorithm using local fractal analysis. If we treat the pixels of a natural image as a fractal set, the image gradient can be regarded as a measure of that set. According to the scale-invariance property of fractal dimension (a special case of bi-Lipschitz invariance), we can estimate the gradient of a high-resolution image from that of a low-resolution one. Moreover, the high-resolution image can be further enhanced by preserving the local fractal length of the gradient during the up-sampling process. We show that a regularization term based on the scale invariance of fractal dimension and length is effective in recovering details of the high-resolution image. An analysis is provided of the relations and differences between the proposed approach and other state-of-the-art interpolation methods. Experimental results show that the proposed method produces superior super-resolution and enhancement results compared with its competitors.

  • Integrating Orientation Cue With EOH-OLBP-Based Multilevel Features for Human Detection

    Page(s): 1755 - 1766

    Detecting pedestrians efficiently and accurately is a fundamental step for many computer vision applications, such as smart cars and robotics. In this paper, we introduce a pedestrian detection system that extracts human objects using an on-board monocular camera. First, we use an experiment to demonstrate that orientation information is critical in human detection. Second, a local binary pattern-based feature, oriented LBP (OLBP), is presented; the OLBP feature integrates pixel intensity differences with texture orientation information to capture salient object features. Third, a set of edge orientation histogram (EOH) and OLBP-based intrablock and interblock features is presented to describe cell-level and block-level structure information. These multilevel features capture larger-scale structure information that is more informative for pedestrian localization. Experiments on the Institut national de recherche en informatique et en automatique (INRIA) dataset and the Caltech pedestrian detection benchmark demonstrate that the new pedestrian detection system is not only comparable to existing pedestrian detectors but also runs at a faster speed.

  • MESIP: A Configurable and Data Reusable Motion Estimation Specific Instruction-Set Processor

    Page(s): 1767 - 1780

    This paper proposes a motion estimation (ME)-specific instruction-set processor (MESIP) with a novel, highly data-reusable search scan order for efficiently implementing various advanced ME algorithms. The proposed ME-specific instructions can be used selectively by ME algorithms. The novel data-reusing search scan order, called center-biased search scan (CBSS), exploits the symmetry of the search pattern to reduce redundant data loading on MESIP by about 26.9% and 16.1% compared with raster scan and snake scan, respectively. MESIP has been implemented in IBM's 90-nm CMOS technology and occupies 203K gates excluding memory. Simulation results show that the proposed MESIP reduces the number of required instructions by up to 18.9% compared with existing ME processors. Moreover, MESIP achieves performance comparable even to ME ASICs and is hence well suited to low-power, high-performance ME implementations.

  • Optimizing Distributed Source Coding for Interactive Multiview Video Streaming Over Lossy Networks

    Page(s): 1781 - 1794

    In interactive multiview video streaming (IMVS), a user observes one view at a time but can periodically switch to a neighboring captured view as the video is played back in time. Previous IMVS works focus on efficient compression techniques that facilitate interactive view switching. In this paper, we additionally address loss resilience during network streaming: we design efficient coding tools and optimize the frame structure for transmission, so as to facilitate view switching and contain the error propagation caused in differentially coded video by packet losses. We first design a new unified distributed source coding (uDSC) frame, a coding tool that simultaneously offers view-switching and loss-resilience capabilities, for periodic insertion into the multiview frame structure. After inserting uDSC frames into the coding structure, we schedule packets for network transmission in a rate-distortion optimal manner for both wireless multicast and wired unicast streaming scenarios. For wireless multicast over a Gilbert-Elliott loss model, frames in a group of pictures are packetized and reordered so that uDSC frames are correctly decoded with high probability, mitigating error propagation. For wired unicast, we use a Markov decision process to optimize packet transmission so as to minimize expected distortion under a bandwidth constraint. Experimental results show that systems that insert uDSC frames and optimize packet transmission can outperform competing coding schemes by up to 2.8 and 11.6 dB in the wireless multicast and wired unicast streaming scenarios, respectively.

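The Gilbert-Elliott channel used in the wireless multicast evaluation is a two-state Markov packet-loss model: a "good" state with low loss probability and a "bad" state with high loss probability, with random switching between the two producing bursty losses. A minimal simulator is sketched below; all rates are illustrative assumptions, not the paper's test settings.

```python
import random

def gilbert_elliott(n, p_gb=0.05, p_bg=0.4, loss_good=0.01, loss_bad=0.5, seed=1):
    """Simulate n packet transmissions over a Gilbert-Elliott channel.

    p_gb / p_bg: probability of switching good->bad and bad->good per packet.
    loss_good / loss_bad: per-packet loss probability in each state.
    Returns a list of booleans (True = packet lost).
    """
    rng = random.Random(seed)
    state = "good"
    losses = []
    for _ in range(n):
        loss_p = loss_good if state == "good" else loss_bad
        losses.append(rng.random() < loss_p)          # draw loss in current state
        flip = p_gb if state == "good" else p_bg      # then maybe switch state
        if rng.random() < flip:
            state = "bad" if state == "good" else "good"
    return losses
```

Because losses cluster while the channel sits in the bad state, packet reordering within a group of pictures (as the abstract describes) can keep the critical uDSC packets away from a single loss burst.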
  • Interactive Stereoscopic Video Conversion

    Page(s): 1795 - 1808

    This paper presents a system for converting conventional monocular videos to stereoscopic ones. In the system, an input monocular video is first segmented into shots so as to reduce operations on similar frames. An automatic depth estimation method is proposed to compute the depth maps of the video frames using three monocular depth cues: depth-from-defocus, aerial perspective, and motion. Foreground/background objects can be interactively segmented on selected key frames, and their depth values can be adjusted by users. These results are propagated from key frames to nonkey frames within each video shot. Equipped with a depth-to-disparity conversion module, the system synthesizes the counterpart (either left or right) view for stereoscopic display by warping the original frames according to their disparity maps. The quality of the converted videos is evaluated by human mean opinion scores, and experimental results demonstrate that the proposed conversion method achieves encouraging performance.

  • Fast Background Subtraction Based on a Multilayer Codebook Model for Moving Object Detection

    Page(s): 1809 - 1821

    Moving object detection is an important and fundamental step for intelligent video surveillance systems because it provides a focus of attention for post-processing. A multilayer codebook-based background subtraction (MCBS) model is proposed for detecting moving objects in video sequences. By combining a multilayer block-based strategy with adaptive feature extraction from blocks of various sizes, the proposed method can remove most of the nonstationary (dynamic) background and significantly increase processing efficiency. Moreover, pixel-based classification is adopted to refine the results of the block-based background subtraction, further classifying pixels as foreground, shadow, or highlight. As a result, the proposed scheme provides high precision and an efficient processing speed, meeting the requirements of real-time moving object detection.

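As a primer on codebook background subtraction in general (not the paper's multilayer block-based variant): each pixel keeps a list of codewords describing the intensity ranges it has exhibited; a new value that matches a codeword is background and widens that codeword, while an unmatched value is foreground and seeds a new codeword. A toy single-layer, grayscale sketch, with hypothetical names and tolerance:

```python
def classify_pixel(codebook, value, eps=10):
    """Match an intensity against a per-pixel codebook.

    codebook: mutable list of [low, high] intensity ranges (codewords).
    A value within eps of a codeword's range is background and widens the
    codeword; otherwise it is foreground and starts a new codeword.
    """
    for cw in codebook:                           # cw = [low, high]
        if cw[0] - eps <= value <= cw[1] + eps:
            cw[0] = min(cw[0], value)             # absorb the sample
            cw[1] = max(cw[1], value)
            return "background"
    codebook.append([value, value])               # unseen intensity: foreground
    return "foreground"
```

Block-level variants like the paper's MCBS apply the same match-or-create logic to features of whole blocks first, which is what makes the approach fast enough for real-time use.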
  • Open Access

    Page(s): 1822
  • Technology insight on demand on IEEE.tv

    Page(s): 1823
  • IEEE Global History Network

    Page(s): 1824
  • IEEE Circuits and Systems Society Information

    Page(s): C3
  • IEEE Transactions on Circuits and Systems for Video Technology information for authors

    Page(s): C4

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High-Speed Real-Time Circuits
5. Multiprocessor Systems—Hardware and Software
6. VLSI Architecture and Implementation for Video Technology

 


Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it