IEEE Transactions on Circuits and Systems for Video Technology

Volume 21, Issue 12 • December 2011

Displaying Results 1 - 22 of 22
  • Table of contents

    Page(s): C1
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Page(s): C2
  • Improved SIMD Architecture for High Performance Video Processors

    Page(s): 1769 - 1783

    Single instruction multiple data (SIMD) execution is without doubt an efficient way to exploit the data-level parallelism in image and video applications. However, SIMD execution bottlenecks must be tackled in order to achieve high execution efficiency. In this paper, we first analyze the implementation of two major kernel functions of H.264/AVC, namely SATD and subpel interpolation, on conventional SIMD architectures to identify the bottlenecks in traditional approaches. Based on the analysis results, we propose a new SIMD architecture with two novel features: 1) a parallel memory structure with variable block size and word length support, and 2) a configurable SIMD structure. The proposed parallel memory structure gives programmers great flexibility to perform data accesses of different block sizes and word lengths. The configurable SIMD structure allows almost “random” register file access and slightly different operations in the ALUs inside the SIMD unit. These new features greatly benefit the realization of H.264/AVC kernel functions. For instance, fractional motion estimation, particularly half- to quarter-pixel interpolation, can now be executed with minimal or no additional memory access. Compared with conventional SIMD systems, the proposed SIMD architecture achieves a further speedup of 2.1X to 4.6X when implementing H.264/AVC kernel functions. Based on Amdahl's law, the overall speedup of an H.264/AVC encoding application is projected to be 2.46X. We expect that significant improvement can also be achieved when applying the proposed architecture to other image and video processing applications. (A worked Amdahl's-law example follows this entry.)

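    The 2.46X projection above follows from Amdahl's law. A minimal sketch, assuming a hypothetical kernel-runtime fraction of 0.76 (the abstract does not state this fraction); with the reported 4.6X kernel speedup, the projection lands near the quoted 2.46X overall figure:

        def amdahl_overall(p: float, s: float) -> float:
            """Amdahl's law: overall speedup when a fraction p of the runtime
            is accelerated by a factor s and the rest runs unchanged."""
            return 1.0 / ((1.0 - p) + p / s)

        # p = 0.76 is our illustrative assumption, not a figure from the paper.
        for s in (2.1, 4.6):
            print(f"kernel speedup {s}x -> overall {amdahl_overall(0.76, s):.2f}x")
        # kernel speedup 2.1x -> overall 1.66x
        # kernel speedup 4.6x -> overall 2.47x
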
  • Robust Contour Tracking by Combining Region and Boundary Information

    Page(s): 1784 - 1794

    This paper presents a new object tracking model that systematically combines region and boundary features. Besides traditional region features (intensity/color and texture), we design a new boundary-based object detector for accurate and robust tracking in the low-contrast and complex scenes that commonly arise in monochrome surveillance systems. In our model, region-feature energy terms are characterized by probability models, and boundary-feature terms include edge and frame difference. With a new weighting term, a novel energy functional is proposed to systematically combine the region- and boundary-based components; it is minimized by a level set evolution equation. To keep the computational cost low, motion information is used to initialize the level set in each new frame. Experimental results show that, compared with region-feature-based models, the proposed model significantly improves performance under different circumstances, especially for objects in low-contrast and complex environments.

  • Communication Mechanisms and Middleware for Distributed Video Surveillance

    Page(s): 1795 - 1809

    A new generation of advanced surveillance systems is being conceived as a collection of multisensor components, such as video, audio, and mobile robots, that interact cooperatively to enhance situation awareness and assist surveillance personnel. The prominent issues these systems face are the improvement of existing intelligent video surveillance systems, the inclusion of wireless networks, the use of low-power sensors, the design of the architecture, the communication between different components, the fusion of data emerging from different types of sensors, the location of personnel (providers and consumers), and the scalability of the system. This paper focuses on the aspects pertaining to real-time distributed architecture and scalability. For example, to meet real-time requirements, these systems need to process data streams in concurrent environments designed with scheduling and synchronization in mind. This paper proposes a framework for the design of visual surveillance systems based on components derived from the principles of the real-time networks/data-oriented requirements implementation scheme. It also proposes the implementation of these components using the well-known middleware technology, the Common Object Request Broker Architecture (CORBA). Results using this architecture for video surveillance are presented through an implemented prototype.

  • Online Distance Metric Learning for Object Tracking

    Page(s): 1810 - 1821

    Tracking an object without any prior information regarding its appearance is a challenging problem. Modern tracking algorithms treat tracking as a binary classification problem between the object class and the background class. The binary classifier can be learned offline, if a specific object model is available, or online, if there is no prior information about the object's appearance. In this paper, we propose the use of online distance metric learning in combination with nearest-neighbor classification for object tracking. We assume that the previous appearances of the object and the background are clustered so that a nearest-neighbor classifier can distinguish the new appearance of the object from the appearance of the background. To support the classification, we employ a distance metric learning (DML) algorithm that learns to separate the object from the background. We use the first few frames to build an initial model of the object and the background, and we update the model at every frame during tracking so that changes in the appearance of both are incorporated. Furthermore, instead of using only the previous frame as the object's model, we utilize a collection of previous appearances encoded in a template library to estimate the similarity under appearance variations. In addition to the online DML algorithm for learning the object/background model, we propose a novel feature representation of image patches, based on the extraction of scale-invariant features over a regular grid coupled with dimensionality reduction using random projections. This representation is both robust, capitalizing on the reproducibility of the scale-invariant features, and fast, performing the tracking in a reduced-dimensional space. The proposed tracking algorithm was tested under challenging conditions and achieved state-of-the-art performance. (A small sketch of the projection-plus-nearest-neighbor step follows this entry.)

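    A minimal sketch of the dimensionality-reduction and nearest-neighbor stages described above, assuming raw patch vectors stand in for the grid-sampled scale-invariant features and plain Euclidean distance stands in for the learned metric (both substitutions are ours):

        import numpy as np

        rng = np.random.default_rng(0)
        D, d = 1024, 64                        # original / reduced feature dimensions

        # Gaussian random projection for dimensionality reduction.
        R = rng.normal(scale=1.0 / np.sqrt(d), size=(d, D))

        # Hypothetical template libraries of previously seen appearances.
        obj_templates = [R @ rng.random(D) for _ in range(10)]
        bg_templates = [R @ rng.random(D) for _ in range(10)]

        def classify(patch):
            """Nearest-neighbor object/background decision in the reduced space."""
            z = R @ patch
            d_obj = min(np.linalg.norm(z - t) for t in obj_templates)
            d_bg = min(np.linalg.norm(z - t) for t in bg_templates)
            return "object" if d_obj < d_bg else "background"

        print(classify(rng.random(D)))
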
  • Saliency Density Maximization for Efficient Visual Objects Discovery

    Page(s): 1822 - 1834

    Detection of salient objects in an image remains a challenging problem despite extensive studies in visual saliency, as the generated saliency map is usually noisy and incomplete. In this paper, we propose a new method to discover the salient object without prior knowledge of its shape and size. By searching for the sub-image, i.e., the bounding box of maximum saliency density, the new formulation can automatically crop salient objects of various sizes despite cluttered backgrounds, and it is capable of handling different types of saliency maps. A globally optimal solution is obtained by the proposed density-based branch-and-bound search. The proposed method can be applied to both images and videos. Experimental results on a public dataset of 5000 images show that our unsupervised detection approach is comparable to state-of-the-art learning-based methods. Promising results are also observed in salient object detection for videos, with good potential for video retargeting. (A simplified search sketch follows this entry.)

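    A hedged sketch of the objective only: the code below finds a maximum-density box by exhaustive search over an integral image, with a minimum box size to keep the density well behaved. The paper's branch-and-bound search and its exact density formulation are not reproduced here:

        import numpy as np

        def max_density_box(sal, min_size=8, step=4):
            """Exhaustive stand-in for the paper's branch-and-bound search:
            return the box maximizing saliency mass / area, subject to a
            minimum box size (threshold values are illustrative)."""
            H, W = sal.shape
            ii = np.zeros((H + 1, W + 1))
            ii[1:, 1:] = sal.cumsum(0).cumsum(1)     # integral image
            best, best_box = -1.0, None
            for t in range(0, H - min_size + 1, step):
                for l in range(0, W - min_size + 1, step):
                    for b in range(t + min_size, H + 1, step):
                        for r in range(l + min_size, W + 1, step):
                            mass = ii[b, r] - ii[t, r] - ii[b, l] + ii[t, l]
                            density = mass / ((b - t) * (r - l))
                            if density > best:
                                best, best_box = density, (t, l, b, r)
            return best_box, best

        sal = np.random.rand(64, 64) * 0.1           # stand-in noisy saliency map
        sal[20:40, 25:45] += 1.0                     # synthetic salient object
        print(max_density_box(sal))
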
  • Tracking Web Video Topics: Discovery, Visualization, and Monitoring

    Page(s): 1835 - 1846

    Despite the massive growth of web-shared videos on the Internet, efficient organization and monitoring of these videos remains a practical challenge. While broadcasters are nowadays keen to monitor online events, identifying topics of interest from the huge volume of user-uploaded videos and recommending emerging topics is by no means easy. Specifically, such a process involves the discovery of new topics, visualization of topic content, and incremental monitoring of topic evolution. This paper studies the problem from these three aspects. First, given a large set of videos collected over months, an efficient algorithm based on salient trajectory extraction on a topic evolution link graph is proposed for topic discovery. Second, each topic trajectory is visualized as a temporal graph in 2-D space, with one dimension as time and the other as degree of hotness, depicting the birth, growth, and decay of a topic. Finally, given the previously discovered topics, an incremental monitoring algorithm is proposed to track newly uploaded videos, while discovering new topics and recommending potentially hot topics. We demonstrate the application on three months of videos crawled from YouTube between December 2008 and February 2009. Both objective and user studies are conducted to verify the performance. (A tiny hotness-timeline sketch follows this entry.)

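    A toy illustration of the time-versus-hotness view described above; the hotness measure (smoothed daily upload counts) and the timestamps are our assumptions, not the paper's definition:

        import numpy as np

        # Hypothetical upload-day indices for one topic's videos.
        upload_day = np.array([0, 0, 1, 2, 2, 2, 3, 5, 5, 9, 9, 9, 9, 10, 15])

        counts = np.bincount(upload_day).astype(float)
        hotness = np.convolve(counts, np.ones(3) / 3.0, mode="same")

        # Crude text rendering of the birth/growth/decay trajectory.
        for day, h in enumerate(hotness):
            print(f"day {day:2d}: " + "#" * int(round(3 * h)))
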
  • Block-Based Depth Maps Interpolation for Efficient Multiview Content Generation

    Page(s): 1847 - 1858

    For multiview video generation from 2-D video sequences, the most important stage is efficient synthesis of the corresponding depth maps for all video frames. Depth map generation involving precise object segmentation and depth assignment requires intensive computation and human assistance, which can be costly and inconsistent. To achieve reliable depth map generation, we present an efficient depth map interpolation method that starts from existing pairs of keyframes and their corresponding depth maps at both ends of a video scene. The proposed method comprises forward/backward motion alignment, block-based updating, object-based updating, a frame selection mechanism, and depth adjustment. Experimental results from both objective and subjective evaluations show that the proposed method generates depth maps successfully. We believe it can greatly help to achieve effective multiview video generation from 2-D video sequences. (A stripped-down interpolation sketch follows this entry.)

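    A deliberately stripped-down sketch of the keyframe-to-keyframe idea: linearly blending the two keyframe depth maps across the scene. The paper's motion alignment and block/object-based updating steps are omitted, so this only shows the interpolation skeleton:

        import numpy as np

        def interpolate_depth(d0, d1, n_frames):
            """Yield per-frame depth maps blended between two keyframe maps."""
            for k in range(n_frames):
                w = k / (n_frames - 1)
                yield (1.0 - w) * d0 + w * d1

        d_start = np.full((4, 4), 10.0)   # depth map at the first keyframe
        d_end = np.full((4, 4), 30.0)     # depth map at the last keyframe
        for i, d in enumerate(interpolate_depth(d_start, d_end, 5)):
            print(f"frame {i}: mean depth {d.mean():.1f}")
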
  • A Fast and Efficient Multi-View Depth Image Coding Method Based on Temporal and Inter-View Correlations of Texture Images

    Page(s): 1859 - 1868

    The multi-view video plus depth format, which includes texture and depth images, has recently been introduced as a video representation to support depth perception of scenes and efficient view generation at arbitrary positions. In particular, depth image coding has become critically important for successful 3-D video services. In this paper, we introduce a fast and efficient multi-view depth image coding method that exploits the texture images. The proposed method decides at an early stage to skip certain blocks of the depth image, bypassing the normal encoding process including rate-distortion optimization, based on temporal and inter-view correlations between the previously encoded texture images. The skipped blocks are predicted from the neighboring depth images. Experimental results demonstrate that the proposed method not only achieves drastically higher coding performance but also reduces encoder complexity. (A toy early-skip sketch follows this entry.)

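    A toy version of the early-skip decision described above, assuming a simple mean-absolute-difference test over co-located texture blocks; the test statistic and threshold are our stand-ins, not the paper's criterion:

        import numpy as np

        def early_skip(tex_cur, tex_ref, thresh=2.0):
            """Skip the co-located depth block (predict it from a neighboring
            depth image) when the texture blocks are nearly identical."""
            sad = np.abs(tex_cur.astype(int) - tex_ref.astype(int)).mean()
            return sad < thresh

        rng = np.random.default_rng(1)
        block = rng.integers(0, 255, (16, 16))
        print(early_skip(block, block + rng.integers(-1, 2, (16, 16))))  # True
        print(early_skip(block, rng.integers(0, 255, (16, 16))))         # False
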
  • DLIG: Direct Local Indirect Global Alignment for Video Mosaicing

    Page(s): 1869 - 1878

    In this paper, we present a framework for real-time mosaicing from video sequences recorded by an uncalibrated pan-tilt-zoom camera, based on multiframe registration. To this end, a new frame alignment algorithm, direct local indirect global (DLIG) alignment, is presented. The key idea of DLIG alignment is to divide the frame alignment problem into the registration of a set of spatially related image patches. The registration is computed iteratively by sequentially imposing a good local match and global spatial coherence. The patch registration is performed with a tracking algorithm, so very efficient local matching can be achieved. We use the patch-based registration to obtain multiframe registration, using the mosaic coordinates to relate the current frame to patches from different frames that partially share the current field of view. Multiframe registration prevents error accumulation, one of the most important problems in mosaicing. We also show how to embed a kernel tracking algorithm to obtain a precise and extremely efficient mosaicing algorithm. Finally, we quantitatively evaluate our algorithm, comparing it with other alignment approaches and studying its performance on interlaced video and under illumination changes. (A small global-fit sketch follows this entry.)

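    A minimal sketch of the "indirect global" step under our own simplification: patch centers matched locally by a tracker are fed into a single least-squares affine fit for the whole frame. The paper's iterative local/global scheme and kernel tracker are not reproduced:

        import numpy as np

        def fit_global_affine(src, dst):
            """Least-squares affine transform mapping patch centers src -> dst."""
            A = np.hstack([src, np.ones((len(src), 1))])   # rows of [x, y, 1]
            M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # 3x2 parameter matrix
            return M

        rng = np.random.default_rng(2)
        src = rng.uniform(0, 100, (20, 2))                 # patch centers, frame t
        dst = src + [3.0, -1.5] + rng.normal(0, 0.1, src.shape)  # tracked matches
        M = fit_global_affine(src, dst)
        print("estimated translation:", M[2])              # close to [3.0, -1.5]
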
  • Video Stabilization and Completion Using Two Cameras

    Page(s): 1879 - 1889

    Video stabilization is important in many application fields, such as visual surveillance. Video stabilization and completion based on a single camera have been well studied in recent years, but they remain very challenging problems. In this paper, we propose a novel framework that produces a stable high-resolution video for visual surveillance using two cameras: one static camera captures low-resolution wide-angle images, while a pan-tilt-zoom camera captures high-resolution images. Unlike with a single camera, the target of interest can be detected and tracked more effectively, and much more high-resolution information can be utilized for stabilization and completion, by using the two videos together. A three-step stabilization approach is designed to deal with the resolution discrepancy between the two synchronized videos, and a four-stage completion strategy is adopted to exploit more high-resolution information. Experimental results show that the proposed algorithm performs satisfactorily.

  • Video Quality Assessment Based on Measuring Perceptual Noise From Spatial and Temporal Perspectives

    Page(s): 1890 - 1902

    Video quality assessment (VQA) exploits important properties of the sophisticated human visual system (HVS). In this paper, we study a series of fundamental HVS characteristics for subjective video quality assessment and incorporate them into a systematic framework that simulates subjective evaluation of impaired videos. Based on this framework, we develop a novel full-reference metric, the perceptual quality index (PQI). Specifically, the proposed PQI metric comprises four major modules: 1) a visual performance equation for foveal and extra-foveal vision based on cortical magnification theory; 2) perceptible noise detection using a spatial-temporal just-noticeable-difference model, and its quantification in both spatial and temporal channels, considering the varying error sensitivity due to contrast and motion masking effects; 3) instantaneous error summation with inhibition of weak local distortions, and quality degradation accumulation over time that models visual persistence and the recency effect; and 4) fusion of the spatial and temporal noise intensities into a perceptual quality index. Compared with state-of-the-art VQA models, the PQI metric, which exploits multiple visual properties, measures video quality more accurately and reliably on two VQA databases.

  • Modeling and Formalization of Fuzzy Finite Automata for Detection of Irregular Fire Flames

    Page(s): 1903 - 1912

    Fire-flame detection using a video camera is difficult because a flame has irregular characteristics, i.e., vague shapes and color patterns. In this paper, we therefore propose a novel fire-flame detection method using fuzzy finite automata (FFA) with probability density functions based on visual features, providing a systematic approach to handling irregularity in computational systems and the ability to handle continuous spaces by combining the capabilities of automata with fuzzy logic. First, moving regions are detected via background subtraction, and candidate flame regions are then identified by applying flame color models. Because flame regions generally exhibit a continuous irregular pattern, probability density functions are generated for the variation in intensity, wavelet energy, and motion orientation, and applied to the FFA. The proposed algorithm was successfully applied to various fire/non-fire videos, and its detection performance is better than that of other methods. (A toy fuzzy-automaton sketch follows this entry.)

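    A toy illustration of a fuzzy finite automaton driven by a PDF-shaped membership value; the states, transition weights, and Gaussian parameters below are all invented for illustration, not taken from the paper:

        import numpy as np

        def membership(x, mu=0.9, sigma=0.2):
            """Gaussian, PDF-shaped fuzzy membership for a feature value."""
            return float(np.exp(-0.5 * ((x - mu) / sigma) ** 2))

        def step(state, m):
            """One FFA step via max-min composition; state = (non-fire, fire)."""
            T = np.array([[1.0 - m, m],        # illustrative transition weights
                          [0.2, 0.8]])
            new = np.maximum.reduce(np.minimum(state[:, None], T))
            return new / new.max()

        state = np.array([1.0, 0.0])           # start fully in the non-fire state
        for flicker in (0.1, 0.6, 0.9, 0.95):  # rising intensity variation
            state = step(state, membership(flicker))
            print(np.round(state, 2))
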
  • Susceptibility to Visual Discomfort of 3-D Displays by Visual Performance Measures

    Page(s): 1913 - 1923

    People with some signs of binocular dysfunction can be susceptible to visual complaints when viewing stereoscopic content at large viewing distances. In previous research, two performance measurements made it possible to distinguish people by their binocular status (BS): the ratio of performance on the Wilkins rate of reading test (WRRT) between 2-D and 3-D viewing, and vergence facility. In our experiment, an extensive optometric screening was first carried out to differentiate visually asymptomatic young adults with a good BS (GBS) (N = 27) from those with a moderate BS (MBS) (N = 6). Second, participants performed the WRRT at a short viewing distance, followed by a questionnaire, under different screen disparity settings. The results reveal that the ratio of WRRT performance between 0 and -1.5 screen disparity is an appropriate indicator for separating participants with MBS from those with GBS. In addition, the results show that 0.75° of screen disparity is already problematic for people with MBS. We conclude that the WRRT ratio has potential as a BS test in consumer applications, providing individual settings for comfortable screen disparities based on the viewer's BS. (A two-line ratio example follows this entry.)

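    The WRRT ratio itself is simple arithmetic: reading performance in the 2-D (zero-disparity) condition divided by performance under the 3-D disparity condition. The numbers below are invented purely to show the direction of the effect:

        def wrrt_ratio(words_2d, words_3d):
            """Ratio of WRRT reading performance, 2-D over 3-D condition."""
            return words_2d / words_3d

        print(f"{wrrt_ratio(120, 118):.2f}")  # ~1.02: little 3-D penalty (GBS-like)
        print(f"{wrrt_ratio(120, 95):.2f}")   # ~1.26: marked 3-D penalty (MBS-like)
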
  • Improving Lossless Intra Coding of H.264/AVC by Pixel-Wise Spatial Interleave Prediction

    Page(s): 1924 - 1928

    H.264/AVC adopts multiple directional spatial prediction modes in a block-based manner, in which neighboring pixels on the left and top sides yield the prediction for the pixels in the block to be encoded. However, such modes may adapt poorly to rich textures inside the blocks of a video signal. In this letter, a new lossless intra coding method based on pixel-wise interleave prediction is presented to enhance the compression performance of H.264/AVC. In our scheme, pixels are coded alternately with interleave prediction, which makes full use of reconstructed pixels to predict later ones in a bidirectional or multidirectional manner. Extensive experiments demonstrate that, compared to the H.264/AVC standard, our scheme achieves a higher compression ratio, especially for high-resolution sequences. In addition, the scheme can be regarded as a frame-level coding mode and can be easily integrated into the H.264/AVC framework. (A one-dimensional toy version follows this entry.)

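    A one-dimensional toy version of interleave prediction under our own simplifications: even-indexed pixels are coded first with left-neighbor prediction, then each odd pixel is predicted bidirectionally from its two reconstructed neighbors (which, in lossless coding, equal the originals). The paper's scheme is 2-D and multidirectional:

        import numpy as np

        def interleave_residuals(row):
            """Two-pass residuals: pass 1 codes even pixels, pass 2 predicts
            each odd pixel from its reconstructed left/right neighbors."""
            res_even = np.diff(row[0::2].astype(int), prepend=0)
            res_odd = []
            for i in range(1, len(row), 2):
                left = int(row[i - 1])
                right = int(row[i + 1]) if i + 1 < len(row) else left
                res_odd.append(int(row[i]) - (left + right) // 2)
            return res_even, np.array(res_odd)

        row = np.array([50, 52, 55, 54, 53, 60, 66, 64], dtype=np.uint8)
        print(interleave_residuals(row))  # small residuals entropy-code cheaply
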
  • Generalized Hybrid Intra and Wyner-Ziv Video Coding

    Page(s): 1929 - 1934

    The hybrid Wyner-Ziv (WZ) and intra video coding system has many interesting features, such as the possibility of performing true motion estimation at the decoder and low-latency coding. In this letter, we present an improved system and generalize it to include the other types of WZ coding. We propose a new quantization technique and a new cyclic redundancy check (CRC) verification technique and, most importantly, many new coding modes, producing a more efficient system that exploits the strong features of each coding type. The resulting system can thus produce many types of hybrid frames and allows mode-adaptive coding. Simulation results show an improvement in rate-distortion efficiency that largely outperforms the state of the art.

  • Special Issue on Circuits, Systems and Algorithms for Compressive Sensing

    Page(s): 1935
  • IEEE Foundation [advertisement]

    Page(s): 1936
  • 2011 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 21

    Page(s): 1937 - 1960
  • IEEE Circuits and Systems Society Information

    Page(s): C3
  • IEEE Transactions on Circuits and Systems for Video Technology information for authors

    Page(s): C4

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High-Speed Real-Time Circuits
5. Multi-Processor Systems: Hardware and Software
6. VLSI Architecture and Implementation for Video Technology

 

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it