
IEEE Transactions on Circuits and Systems for Video Technology

Issue 2 • February 2013

  • Table of contents

    Publication Year: 2013, Page(s): C1
    PDF (235 KB) | Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Publication Year: 2013, Page(s): C2
    PDF (140 KB) | Freely Available from IEEE
  • Efficient Techniques for Depth Video Compression Using Weighted Mode Filtering

    Publication Year: 2013, Page(s): 189 - 202
    Cited by: Papers (6)
    PDF (9608 KB) | HTML

    This paper proposes efficient techniques to compress a depth video by taking into account coding artifacts, spatial resolution, and dynamic range of the depth data. Due to abrupt signal changes at object boundaries, a depth video compressed by conventional video coding standards often suffers serious coding artifacts along those boundaries, which severely affect the quality of a synthesized view. We suppress the coding artifacts with an efficient postprocessing method based on weighted mode filtering, utilized as an in-loop filter. In addition, the proposed filter is tailored to efficiently reconstruct the depth video from a reduced spatial resolution and a low dynamic range. The down/upsampling coding approaches for the spatial resolution and the dynamic range are used together with the proposed filter to further reduce the bit rate. We verify the proposed techniques by applying them to the compression of multiview-plus-depth data, which has emerged as an efficient data representation for 3-D video. Experimental results show that the proposed techniques significantly reduce the bit rate while achieving better quality of the synthesized view in terms of both objective and subjective measures.
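
    A minimal NumPy sketch of the weighted-mode idea the in-loop filter builds on (our illustration, not the authors' implementation; the guide image, window radius, and kernel widths are assumptions):

        import numpy as np

        def weighted_mode_filter(depth, guide, radius=3, sigma_s=2.0,
                                 sigma_r=10.0, levels=256):
            """Weighted mode filtering of an integer-valued depth map,
            guided by a gray-level image. Each neighbor votes for its
            depth value with a spatial x guide-similarity weight, and
            the histogram mode becomes the output depth."""
            H, W = depth.shape
            out = np.zeros_like(depth)
            ys = np.arange(-radius, radius + 1)
            spatial = np.exp(-(ys[:, None]**2 + ys[None, :]**2) / (2 * sigma_s**2))
            pad_d = np.pad(depth, radius, mode='edge')
            pad_g = np.pad(guide.astype(np.float64), radius, mode='edge')
            for i in range(H):
                for j in range(W):
                    win_d = pad_d[i:i + 2*radius + 1, j:j + 2*radius + 1]
                    win_g = pad_g[i:i + 2*radius + 1, j:j + 2*radius + 1]
                    rng = np.exp(-(win_g - guide[i, j])**2 / (2 * sigma_r**2))
                    w = spatial * rng
                    hist = np.bincount(win_d.ravel(), weights=w.ravel(),
                                       minlength=levels)
                    out[i, j] = np.argmax(hist)  # weighted histogram mode
            return out

    Unlike an average-based filter, the mode never invents intermediate depth values, which is why this family of filters preserves sharp depth discontinuities.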

  • On Temporal Order Invariance for View-Invariant Action Recognition

    Publication Year: 2013, Page(s): 203 - 211
    PDF (9740 KB) | HTML

    View-invariant action recognition is one of the most challenging problems in computer vision. Various representations have been devised for matching actions across different viewpoints to achieve view invariance. In this paper, we explore the invariance of the temporal order of action instances during action execution and use it to devise a new view-invariant action recognition approach. To ensure temporal order during matching, we utilize spatiotemporal features, feature fusion, and a temporal order consistency constraint. We start by extracting spatiotemporal cuboid features from video sequences and applying feature fusion to encapsulate within-class similarity for the same viewpoints. For each action class, we construct a feature fusion table to facilitate feature matching across different views. An action matching score is then calculated based on a global temporal order constraint and the number of matching features. Finally, the query action is assigned the label of the class with the maximum matching score. Experiments on the multiview INRIA Xmas Motion Acquisition Sequences (IXMAS) and West Virginia University action datasets give encouraging results that are comparable to existing view-invariant action recognition techniques.
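
    One way to read the global temporal-order constraint: among candidate feature matches between a query and a reference action, keep only the largest subset whose timestamps increase together. A small sketch of that scoring step (our reading of the constraint; function and variable names are illustrative):

        from bisect import bisect_left

        def temporal_order_score(matches):
            """Largest subset of matches whose query and reference
            timestamps increase together (a longest-increasing-
            subsequence computation).

            matches: list of (t_query, t_ref) pairs for matched
            spatiotemporal features."""
            # Sort by query time; break ties by descending ref time so two
            # features from the same query instant cannot both be kept.
            matches = sorted(matches, key=lambda m: (m[0], -m[1]))
            tails = []  # tails[k]: smallest ref-time ending a chain of length k+1
            for _, t_ref in matches:
                k = bisect_left(tails, t_ref)
                if k == len(tails):
                    tails.append(t_ref)
                else:
                    tails[k] = t_ref
            return len(tails)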

  • Scalable Video Broadcast Over Downlink MIMO–OFDM Systems

    Publication Year: 2013, Page(s): 212 - 223
    Cited by: Papers (3)
    PDF (6472 KB) | HTML

    We propose a cross-layer design framework for efficiently broadcasting scalable H.264 video over downlink multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) systems. The objective is to maximize the average peak signal-to-noise ratio of the received video streams by jointly optimizing video layer extraction, subcarrier allocation, modulation and coding, and transmit precoding, considering the heterogeneity of video sources and channel conditions. Specifically, to exploit the MIMO channel, we employ a codebook-based linear transmit precoding strategy with limited feedback. Because different quality layers of the video differ in importance, we propose an adaptive modulation and coding scheme in which a fixed coding rate is used for each quality layer and unequal error protection is applied across layers. We further propose a subcarrier allocation strategy that assigns transmission channels to the layers of different users' videos so as to satisfy the decoding dependence constraints among the video layers and maximize the reconstructed video quality. The proposed scalable video broadcast solution has low complexity and low signaling overhead, which makes it suitable for practical implementation. We provide experimental results demonstrating the effectiveness of the proposed solution, using both model-based simulations and an end-to-end software simulation testbed.
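
    Not the paper's joint optimizer, but a toy greedy sketch of how unequal error protection and the decoding-dependence constraint interact in layered broadcast (all names, units, and the MCS table are assumptions):

        def uep_broadcast_allocation(layers, subcarrier_snr, mcs_table):
            """layers: bit demands per layer, base layer first (most important).
            subcarrier_snr: per-subcarrier lists of per-user SNRs.
            mcs_table: (snr_threshold, bits_per_symbol) pairs.
            Returns {layer_index: [subcarrier indices]}."""
            # Broadcast: a subcarrier's usable rate is set by its worst user.
            worst = [min(u) for u in subcarrier_snr]
            # Best subcarriers go to the most important (base) layers.
            order = sorted(range(len(worst)), key=lambda s: -worst[s])
            alloc, nxt = {}, 0
            for li, bits_needed in enumerate(layers):
                alloc[li], got = [], 0.0
                while got < bits_needed and nxt < len(order):
                    s = order[nxt]; nxt += 1
                    # Highest-rate MCS the worst user can still decode.
                    rate = max((r for thr, r in mcs_table if worst[s] >= thr),
                               default=0)
                    if rate == 0:
                        continue
                    alloc[li].append(s); got += rate
                if got < bits_needed:
                    del alloc[li]  # decoding dependence: an enhancement layer
                    break          # is useless if it (or a lower layer) fails
            return alloc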

  • Pedestrian Detection Based on Blob Motion Statistics

    Publication Year: 2013, Page(s): 224 - 235
    Cited by: Papers (1)
    PDF (8977 KB) | HTML

    Pedestrian detection based on video analysis is a key functionality in automated surveillance systems. In this paper, we present efficient detection metrics that exploit the fact that human movement exhibits distinctive motion patterns. Contrary to several methods that perform an intrablob analysis based on motion masks, we approach the problem without necessarily considering the periodic pixel motion inside the blob. As such, we analyze periodicity not in the pixel luminances but in the motion statistics of the tracked blob as a whole. To this end, we propose the use of the following cues: 1) a cyclic behavior in the blob trajectory, and 2) an in-phase relationship between the change in blob size and position. In addition, we exploit the relationship between blob size and vertical position, assuming that the camera is positioned sufficiently high. If the homography between the camera and the ground is known, the features are normalized by transforming the blob size to the real person size. For improved performance, we combine these features using a Bayes classifier. We also present a theoretical statistical analysis to evaluate the system performance in the presence of noise. We perform online experiments in a real industrial scenario and also with videos from well-known databases. The results illustrate the applicability of the proposed features.
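
    Two of the cues reduce to simple statistics over the tracked blob's signals; a sketch of how they could be computed (our simplification; the paper's exact features and normalizations differ):

        import numpy as np

        def gait_cues(blob_sizes, blob_y):
            """Pedestrian-like cues from a tracked blob: (1) a dominant
            periodic component in the blob-size signal, and (2) in-phase
            oscillation of blob size and vertical position."""
            s = np.asarray(blob_sizes, float); s -= s.mean()
            y = np.asarray(blob_y, float);     y -= y.mean()
            spec = np.abs(np.fft.rfft(s))**2
            spec[0] = 0.0
            # Fraction of signal energy at the dominant frequency.
            periodicity = spec.max() / (spec.sum() + 1e-12)
            # Normalized correlation as an in-phase measure in [-1, 1].
            in_phase = float(np.dot(s, y) /
                             (np.linalg.norm(s) * np.linalg.norm(y) + 1e-12))
            return periodicity, in_phase

    Scores like these are what a Bayes classifier would then combine into a pedestrian/non-pedestrian decision.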

  • Silhouette Analysis-Based Action Recognition Via Exploiting Human Poses

    Publication Year: 2013, Page(s): 236 - 243
    Cited by: Papers (7)
    PDF (2798 KB) | HTML

    In this paper, we propose a novel scheme for human action recognition that combines the advantages of both local and global representations. We explore human silhouettes for human action representation by taking into account the correlation between sequential poses in an action. A modified bag-of-words model, named bag of correlated poses, is introduced to encode temporally local features of actions. To account for visual word ambiguity, we adopt a soft assignment strategy to reduce the dimensionality of our model and circumvent the penalties of computational complexity and quantization error. To compensate for the loss of structural information, we propose an extended motion template, i.e., an extension of the motion history image, to capture holistic structural features. The proposed scheme takes advantage of both local and global features and therefore provides a discriminative representation for human actions. Experimental results confirm the complementary properties of the two descriptors, and the proposed approach outperforms state-of-the-art methods on the IXMAS action recognition dataset.
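
    A minimal sketch of soft-assignment bag-of-words encoding, the generic technique the abstract refers to (parameter names and the Gaussian kernel choice are illustrative):

        import numpy as np

        def soft_assign_histogram(descriptors, codebook, sigma=1.0):
            """Each local descriptor votes for every visual word with a
            Gaussian weight instead of only its nearest word, reducing
            quantization error relative to hard assignment.
            descriptors: (n, d) local features; codebook: (k, d) words."""
            d2 = ((descriptors[:, None, :] - codebook[None, :, :])**2).sum(-1)
            w = np.exp(-d2 / (2 * sigma**2))
            w /= w.sum(axis=1, keepdims=True) + 1e-12  # one vote per descriptor
            hist = w.sum(axis=0)
            return hist / (hist.sum() + 1e-12)         # L1-normalized histogram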

  • Color Mismatch Compensation Method Based on a Physical Model

    Publication Year: 2013, Page(s): 244 - 257
    PDF (18919 KB) | HTML

    A new method for detecting and correcting color-timing mismatch in digital film is proposed. The method is based on a physical model that accounts for the absorption of light by the different layers of the film. The whole process consists of five sequential stages: shot-change detection, registration, degraded-zone detection, unreliable-motion detection, and correction. A central issue addressed in this paper is how to align the degraded and reference frames under varying illumination conditions. A new transformation, based on the proposed physical model, is derived to make the data term of the optical flow framework robust to color change. The degraded regions are then detected and restored using the computed flow field. The performance of the proposed method is evaluated through extensive tests on both simulated and real high-definition films. The obtained results are very promising and confirm the efficiency of the proposed method.
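
    The paper derives its color-robust data term from the film model itself. As a generic illustration of making an optical-flow data term insensitive to color change, here is the standard gradient-constancy residual (a stand-in technique, not the authors' transformation):

        import numpy as np

        def gradient_constancy_residual(I1, I2, u, v):
            """Penalize differences of spatial *gradients* rather than raw
            intensities, so a global color/brightness shift between frames
            costs nothing. u, v: per-pixel flow (nearest-neighbor warp)."""
            gy1, gx1 = np.gradient(I1.astype(float))
            gy2, gx2 = np.gradient(I2.astype(float))
            H, W = I1.shape
            yy, xx = np.mgrid[0:H, 0:W]
            ys = np.clip((yy + v).round().astype(int), 0, H - 1)
            xs = np.clip((xx + u).round().astype(int), 0, W - 1)
            return np.hypot(gx2[ys, xs] - gx1, gy2[ys, xs] - gy1)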

  • Enhancement of Image and Depth Map Using Adaptive Joint Trilateral Filter

    Publication Year: 2013, Page(s): 258 - 269
    Cited by: Papers (3)
    PDF (23347 KB) | HTML

    In this paper, we present an adaptive joint trilateral filter (AJTF), which consists of domain, range, and depth filters. The AJTF jointly enhances images and depth maps by suppressing noise and sharpening edges simultaneously. To improve the sharpness of the image and depth map, the AJTF parameters, namely the offsets and the standard deviations of the range and depth filters, are chosen so that image edges that match well with depth edges are emphasized. To this end, pattern matching between local patches in the image and depth map is performed, and the matching result is used to adjust the AJTF parameters. Experimental results show that the AJTF produces sharpness-enhanced images and depth maps without overshoot and undershoot artifacts, while also successfully reducing noise. A comparison with conventional image and depth enhancement algorithms shows that the proposed algorithm is effective.
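
    A minimal sketch of the non-adaptive trilateral core (domain x range x depth weights); the adaptive offset and standard-deviation selection that gives the AJTF its sharpening behavior is omitted, and all parameters are illustrative:

        import numpy as np

        def joint_trilateral_filter(img, depth, radius=3, sigma_s=2.0,
                                    sigma_r=12.0, sigma_d=8.0):
            """Each output pixel is a weighted mean whose weights combine a
            spatial (domain) kernel, an intensity (range) kernel, and a
            depth kernel, so edges present in either signal are kept."""
            H, W = img.shape
            out = np.zeros((H, W))
            r = radius
            ys = np.arange(-r, r + 1)
            dom = np.exp(-(ys[:, None]**2 + ys[None, :]**2) / (2 * sigma_s**2))
            pi = np.pad(img.astype(float), r, mode='edge')
            pd = np.pad(depth.astype(float), r, mode='edge')
            for i in range(H):
                for j in range(W):
                    wi = pi[i:i + 2*r + 1, j:j + 2*r + 1]
                    wd = pd[i:i + 2*r + 1, j:j + 2*r + 1]
                    rng = np.exp(-(wi - img[i, j])**2 / (2 * sigma_r**2))
                    dep = np.exp(-(wd - depth[i, j])**2 / (2 * sigma_d**2))
                    w = dom * rng * dep
                    out[i, j] = (w * wi).sum() / w.sum()
            return out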

  • Intra Coding With Adaptive Partial Reconstruction

    Publication Year: 2013, Page(s): 270 - 279
    PDF (7892 KB) | HTML

    Intra prediction improves coding performance by reducing inter-pixel redundancy. However, to accommodate the use of block transforms, not all pixels can be predicted from reconstructed pixels located close to them. This hurts prediction performance, as pixel values farther apart are less correlated. This paper presents additional intra coding modes designed to improve prediction performance. Experimental results show an average gain of about 2% in the Key Technical Area (KTA) software when the new modes are incorporated alongside the current 8×8 prediction modes. Since the new (8×8) coding modes are designed with a transform size smaller than the coding block size, they can also be useful when the source block is larger than the maximum transform size.
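
    For readers unfamiliar with intra prediction, a toy example of predicting an 8×8 block from already-reconstructed neighbors and picking a mode by SAD; these are the textbook vertical/horizontal/DC modes, not the paper's new partial-reconstruction modes:

        import numpy as np

        def intra_predict_8x8(top, left, mode):
            """top: 8 reconstructed pixels above the block;
            left: 8 reconstructed pixels to its left."""
            if mode == 'V':   # vertical: copy the row above downwards
                return np.tile(top, (8, 1))
            if mode == 'H':   # horizontal: copy the left column rightwards
                return np.tile(left[:, None], (1, 8))
            return np.full((8, 8), (top.mean() + left.mean()) / 2)  # DC

        def best_mode(block, top, left):
            """Encoder-style mode decision by smallest SAD."""
            costs = {m: np.abs(block - intra_predict_8x8(top, left, m)).sum()
                     for m in ('V', 'H', 'DC')}
            return min(costs, key=costs.get)

    Note how the bottom rows of the 'V' prediction are copied from pixels eight rows away, which is exactly the distance-correlation problem the paper's new modes target.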

  • Compact Representation for Dynamic Texture Video Coding Using Tensor Method

    Publication Year: 2013, Page(s): 280 - 288
    Cited by: Papers (1)
    PDF (5816 KB) | HTML

    Dynamic textures are important parts of natural video signals that usually generate enormous amounts of high-dimensional data; effective representation methods are therefore needed for relevant applications. This paper presents a new method for compact representation of high-dimensional data based on tensor decomposition, which preserves the native form of the data. By treating the high-dimensional data as higher-order tensors, we propose a multiple tensor rank-R decomposition (MTRD) algorithm, which uses low-rank tensors to iteratively approximate the original tensor. Through the MTRD algorithm, the dimension of the data can be greatly reduced, and the decomposition coefficients give a compact representation of the data. As our compact representation characterizes regular textures in video well, we apply it to dynamic texture video coding and achieve better video quality than H.264/AVC at a very low bit rate simply by quantizing and coding the decomposition coefficients. Experimental results show that the peak signal-to-noise ratio values of the reconstructed test sequences improve by approximately 0.28 dB to 8.96 dB, while bit-rate reductions range from approximately 1.34% to 64.92%.
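
    To make the low-rank idea concrete, a greedy rank-1 deflation sketch for a 3-way tensor (a simplification of iterative low-rank fitting, not the MTRD algorithm itself; the greedy terms are not jointly optimal):

        import numpy as np

        def rank1_tensor_approx(T, iters=50):
            """Best rank-1 fit T ~ lam * a (x) b (x) c by higher-order
            power iteration."""
            a = np.random.rand(T.shape[0])
            b = np.random.rand(T.shape[1])
            c = np.random.rand(T.shape[2])
            for _ in range(iters):
                a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
                b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
                c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
            lam = np.einsum('ijk,i,j,k->', T, a, b, c)
            return lam, a, b, c

        def greedy_rank_R(T, R):
            """Fit and subtract R rank-1 terms; the (lam, a, b, c) tuples
            are the compact coefficients one would quantize and code."""
            factors, resid = [], T.astype(float).copy()
            for _ in range(R):
                lam, a, b, c = rank1_tensor_approx(resid)
                factors.append((lam, a, b, c))
                resid -= lam * np.einsum('i,j,k->ijk', a, b, c)
            return factors

    For a video treated as an X x Y x T tensor, R such terms cost O(R(X+Y+T)) values instead of O(XYT), which is where the compression comes from.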

  • Hardware Implementation of a Digital Watermarking System for Video Authentication

    Publication Year: 2013, Page(s): 289 - 301
    Cited by: Papers (5)
    PDF (7442 KB) | HTML

    This paper presents a hardware implementation of a digital watermarking system that can insert invisible, semifragile watermark information into compressed video streams in real time. The watermark embedding is performed in the discrete cosine transform domain. To achieve high performance, the proposed system architecture employs a pipelined structure and exploits parallelism. The design was implemented on a field-programmable gate array (FPGA), and an experiment was carried out on a custom versatile breadboard for overall performance evaluation. Experimental results show that a hardware-based video authentication system using this watermarking technique introduces minimal video-quality degradation and can withstand certain potential attacks, i.e., cover-up attacks, cropping, and segment removal on video sequences. Furthermore, the proposed hardware-based watermarking system features low power consumption, low-cost implementation, high processing speed, and reliability.
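
    A software sketch of one common DCT-domain embedding scheme, quantization index modulation on a mid-band coefficient (an assumption for illustration; the paper's embedding rule and hardware pipeline are not reproduced here):

        import numpy as np
        from scipy.fft import dctn, idctn

        def embed_bit(block, bit, step=12.0, coeff=(3, 2)):
            """Embed one bit into an 8x8 pixel block by forcing the parity
            of a quantized mid-frequency DCT coefficient. Semifragile:
            mild processing keeps the bit; strong tampering breaks it."""
            C = dctn(block.astype(float), norm='ortho')
            q = np.round(C[coeff] / step)
            if int(q) % 2 != bit:
                q += 1                       # flip quantizer index parity
            C[coeff] = q * step
            return idctn(C, norm='ortho')

        def extract_bit(block, step=12.0, coeff=(3, 2)):
            C = dctn(block.astype(float), norm='ortho')
            return int(np.round(C[coeff] / step)) % 2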

  • Matching-Area-Based Seam Carving for Video Retargeting

    Publication Year: 2013, Page(s): 302 - 310
    Cited by: Papers (5)
    PDF (15946 KB) | HTML

    This paper presents a video retargeting method that considers both spatial and temporal coherence when resizing videos. Our algorithm is based on a novel matching-area-based temporal energy adjustment that allows per-frame seam carving to remove the optimal pixels and produce spatially and temporally continuous resized videos. The temporal energy adjustment lets each seam track the object it previously carved, avoiding seams that fall on different objects in two consecutive frames, and thereby achieves both spatial and temporal coherence. Our method outperforms other state-of-the-art retargeting systems, as demonstrated by our results and supported by a user study.
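
    The per-frame core the paper builds on is classic seam carving; a compact dynamic-programming sketch for removing one vertical seam from a grayscale frame (the matching-area temporal energy adjustment itself is not shown):

        import numpy as np

        def remove_vertical_seam(img, energy):
            """Remove the minimum-energy 8-connected vertical seam (one
            pixel per row). img and energy are 2-D arrays of equal shape."""
            H, W = energy.shape
            M = energy.astype(float).copy()   # M[i,j]: cheapest seam to (i,j)
            for i in range(1, H):
                left  = np.r_[np.inf, M[i-1, :-1]]
                right = np.r_[M[i-1, 1:], np.inf]
                M[i] += np.minimum(np.minimum(left, M[i-1]), right)
            seam = np.empty(H, dtype=int)
            seam[-1] = int(np.argmin(M[-1]))
            for i in range(H - 2, -1, -1):    # backtrack through predecessors
                j = seam[i + 1]
                lo, hi = max(j - 1, 0), min(j + 2, W)
                seam[i] = lo + int(np.argmin(M[i, lo:hi]))
            keep = np.ones((H, W), bool)
            keep[np.arange(H), seam] = False
            return img[keep].reshape(H, W - 1)

    The paper's contribution amounts to modifying the `energy` map across frames so consecutive seams stay on the same content.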

  • Automatic License Plate Recognition (ALPR): A State-of-the-Art Review

    Publication Year: 2013, Page(s): 311 - 325
    Cited by: Papers (12)
    PDF (869 KB) | HTML

    Automatic license plate recognition (ALPR) is the extraction of vehicle license plate information from an image or a sequence of images. The extracted information can be used, with or without a database, in many applications, such as electronic payment systems (toll payment, parking fee payment) and freeway and arterial monitoring systems for traffic surveillance. ALPR systems use a color, black-and-white, or infrared camera to take images, and the quality of the acquired images is a major factor in the success of ALPR. As a real-life application, ALPR has to process license plates quickly and reliably under different environmental conditions, such as indoors, outdoors, day, or night. It should also generalize to license plates from different nations, provinces, or states. These plates usually differ in color, language, and font; some have a single-color background while others have background images. License plates can also be partially occluded by dirt, lighting, and towing accessories on the car. In this paper, we present a comprehensive review of state-of-the-art ALPR techniques. We categorize them according to the features they use at each stage and compare them in terms of pros, cons, recognition accuracy, and processing speed. Forecasts for the future of ALPR are given at the end.
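
    ALPR surveys conventionally split the task into plate extraction, character segmentation, and character recognition; a skeleton of that pipeline with a deliberately crude edge/aspect-ratio plate detector (an OpenCV-based sketch of the generic pipeline, not any surveyed system; thresholds are arbitrary):

        import cv2

        def alpr_candidates(frame):
            """Stage 1 of the canonical pipeline: find plate-like regions.
            Stages 2-3 (segmentation and OCR) would consume each region."""
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            edges = cv2.Canny(gray, 100, 200)
            contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            candidates = []
            for c in contours:
                x, y, w, h = cv2.boundingRect(c)
                # Plates are wide and not too small; real detectors add
                # texture, color, and character-density checks.
                if h > 0 and 2.0 < w / h < 6.0 and w > 60:
                    candidates.append(gray[y:y + h, x:x + w])
            return candidates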

  • Propagating Certainty in Petri Nets for Activity Recognition

    Publication Year: 2013, Page(s): 326 - 337
    PDF (6587 KB) | HTML

    This paper considers the problem of recognizing activities as they occur in surveillance video. Activities are high-level, nonatomic semantic concepts that may have complex temporal structure. They are not easily identifiable from image features, but rather through recognition of their composing events. Unfortunately, these composing events may only be observed up to a particular certainty. This paper describes the particle filter Petri net (PFPN), an activity recognition process that combines uncertain event observations to determine the likelihood that a particular activity is taking place in a video sequence. Our work builds on previous studies in which activities are specified as Petri nets. The stochastic PFPN framework proposed in this paper improves over existing deterministic approaches to activity recognition by enabling the certainty reasoning required for coping with the inherent ambiguity in both low-level video processing and activity definition. Furthermore, the PFPN approach reduces the dependence on a duration model and enables the creation of holistic activity models. Proposed activity recognition frameworks are often tightly coupled to a particular methodology for low-level video processing and event recognition, and each is then applied to a nonstandard dataset. In our experiments, we provide an empirical comparison of our approach with leading activity recognition approaches across several datasets, using identical event-recognition input. Our results illustrate the tradeoff between deterministic and stochastic activity recognition approaches, and suggest that the holistic PFPN approach is more robust for activity recognition in the surveillance video domain than competing approaches.
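
    A toy sketch of the general idea, particles carrying Petri-net markings that are advanced and reweighted by uncertain event observations (our illustration of the concept only; the authors' PFPN update rules are not reproduced):

        import random

        def pfpn_step(particles, transitions, event, confidence):
            """particles: list of (marking, weight); marking maps place
            names to token counts. transitions: {event_label: (inputs,
            outputs)}. One uncertain observation updates all particles."""
            new = []
            ins, outs = transitions.get(event, ((), ()))
            for marking, w in particles:
                enabled = bool(ins) and all(marking.get(p, 0) > 0 for p in ins)
                if enabled and random.random() < confidence:
                    m = dict(marking)
                    for p in ins:
                        m[p] -= 1                      # consume tokens
                    for p in outs:
                        m[p] = m.get(p, 0) + 1         # produce tokens
                    new.append((m, w * confidence))
                else:
                    new.append((marking,
                                w * ((1.0 - confidence) if enabled else 1.0)))
            total = sum(w for _, w in new) or 1.0
            return [(m, w / total) for m, w in new]    # renormalize weights

    The activity likelihood is then the total weight of particles whose marking has reached the net's accepting place.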

  • Complexity Reduction and Performance Improvement for Geometry Partitioning in Video Coding

    Publication Year: 2013, Page(s): 338 - 352
    PDF (9809 KB) | HTML

    Geometry partitioning for video coding establishes a partition line within each block-shaped region and applies motion-compensated prediction separately to the two subregions created by the line. This paper presents techniques for enhancing the effectiveness and reducing the complexity of geometry partitioning schemes. A texture-difference-based approach is described to simplify the selection of partition lines. Applying this approach together with a skipping strategy for blocks with uniform texture achieves a 94% reduction in encoding time while retaining rate-distortion (R-D) performance similar to the full-search partitioning approach, when implemented for wedge-based geometry partitioning (WGP) in the context of H.264/MPEG-4 AVC JM 16.2. A bit-rate improvement of approximately 6% is shown relative to not using geometry partitioning. For further R-D improvement, we describe a background-compensated prediction scheme that reduces the number of overhead bits used for motion vectors. Additionally, for systems in which high-quality depth maps are available, we incorporate depth-map usage into the described approaches to generate a more accurate partitioning. Using these approaches with object-boundary-based geometry partitioning achieves about 9% bit-rate savings relative to WGP, while keeping computational complexity similar to the described complexity-reduced WGP.
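
    A sketch of the texture-difference shortcut to line selection, choosing the wedge that best separates two regions of differing mean intensity instead of R-D searching every candidate (our simplified reading; the paper's measure and search grid differ):

        import numpy as np

        def best_wedge(block, n_angles=16, n_offsets=8):
            """Scan candidate lines (theta, rho) through an NxN block and
            return the one maximizing the mean-intensity difference
            between the two regions it creates."""
            N = block.shape[0]
            yy, xx = np.mgrid[0:N, 0:N] - (N - 1) / 2.0
            best, best_line = -1.0, None
            for theta in np.linspace(0, np.pi, n_angles, endpoint=False):
                proj = xx * np.cos(theta) + yy * np.sin(theta)
                for rho in np.linspace(proj.min(), proj.max(),
                                       n_offsets + 2)[1:-1]:
                    mask = proj < rho          # side of the partition line
                    if mask.all() or not mask.any():
                        continue
                    diff = abs(block[mask].mean() - block[~mask].mean())
                    if diff > best:
                        best, best_line = diff, (theta, rho)
            return best_line, best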

  • Memory-Efficient High-Speed Convolution-Based Generic Structure for Multilevel 2-D DWT

    Publication Year: 2013, Page(s): 353 - 363
    Cited by: Papers (3)
    PDF (6249 KB) | HTML

    In this paper, we propose a design strategy for deriving a memory-efficient architecture for the multilevel 2-D DWT. Using the proposed design scheme, we derive a convolution-based generic architecture for the computation of the three-level 2-D DWT based on Daubechies (Daub) as well as biorthogonal filters. The proposed structure does not involve a frame buffer. It involves line buffers of size 3(K-2)M/4, which is independent of the throughput rate, where K is the order of the Daubechies/biorthogonal wavelet filter and M is the image height. This is a major advantage when the structure is implemented for higher throughput. The structure has regular data flow, a small cycle period T_M, and 100% hardware utilization efficiency. As per the theoretical estimate, for an image of size 512 × 512, the proposed structure for the Daub-4 filter requires 152 more multipliers and 114 more adders, but involves 82 412 fewer memory words and takes 10.5 times less time to compute the three-level 2-D DWT than the best of the existing convolution-based folded structures. Similarly, compared with the best of the existing lifting-based folded structures, the proposed structure for the 9/7 filter involves 93 more multipliers and 166 more adders, but uses 85 317 fewer memory words and requires 2.625 times less computation time for the same image size. It involves 90 (nearly 47.6%) more multipliers and 118 (nearly 40.1%) more adders, but requires 2723 fewer memory words than the recently proposed parallel structure and performs the computation in nearly half the time. Despite having more arithmetic components than the lifting-based structures, the proposed structure offers significant savings in area and power due to the substantial reduction in memory size and the smaller clock period. ASIC synthesis results show that the proposed structure for Daub-4 involves 1.7 times less area-delay product (ADP) and consumes 1.21 times less energy per image (EPI) than the corresponding best available convolution-based structure, and 2.6 times less ADP and 1.48 times less EPI than the parallel lifting-based structure.
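
    For reference, the separable convolution filter bank such architectures map to hardware, sketched in software for one level with Daub-4 analysis filters (boundary handling is simplified; a hardware design would use symmetric or periodic extension):

        import numpy as np

        # Daubechies-4 analysis filters (K = 4 taps).
        _S3 = np.sqrt(3.0)
        LO = np.array([1 + _S3, 3 + _S3, 3 - _S3, 1 - _S3]) / (4 * np.sqrt(2.0))
        HI = np.array([LO[3], -LO[2], LO[1], -LO[0]])  # quadrature mirror

        def _analyze_rows(x):
            """Filter every row with LO and HI, then downsample by 2."""
            def f(r, h):
                return np.convolve(r, h, 'full')[1::2][:len(r) // 2]
            lo = np.apply_along_axis(lambda r: f(r, LO), 1, x)
            hi = np.apply_along_axis(lambda r: f(r, HI), 1, x)
            return lo, hi

        def dwt2_level(img):
            """One level of the separable 2-D DWT: rows first, then
            columns, producing the LL, LH, HL, HH subbands."""
            L, H = _analyze_rows(img.astype(float))
            LL, LH = (s.T for s in _analyze_rows(L.T))
            HL, HH = (s.T for s in _analyze_rows(H.T))
            return LL, LH, HL, HH

    Multilevel decomposition recurses on LL, which is why the architecture's line-buffer budget, rather than arithmetic, dominates its cost.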

  • A Low-Power Fractional-Order Synchronizer for Syncless Time-Sequential Synchronization of 3-D TV Active Shutter Glasses

    Publication Year: 2013, Page(s): 364 - 369
    Cited by: Papers (1)
    PDF (7176 KB) | HTML

    Active shutter glasses (SG) technology for 3-D TV requires communication of the sync timing for time-sequential frame synchronization, in contrast to the syncless film-type patterned retarder approach. Advancing our previous work based on the fractional-order timer (FRT), this paper proposes a fractional-order synchronizer that includes an adaptive sync reconstructor (ASR) based on the FRT for syncless frame synchronization. The FRT enables accurate synchronization regardless of sync-clock speed. The hybrid cooperation of the FRT and ASR reduces the required frequency of sync-packet communication and allows the emitter on the TV side to be turned off for fully syncless operation. The implemented one-chip solution draws less than roughly 11% of the operating current of major commercial SGs. The syncless SG technique also minimizes synchronization failures and 3-D vision crosstalk caused by interruptions in the sync packets.
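
    The abstract does not detail the FRT arithmetic, but a fractional (Bresenham-style) accumulator illustrates how a non-integer clock-to-frame ratio can be tracked without cumulative drift; this is purely our illustrative assumption about the principle, not the chip's design:

        def fractional_tick_schedule(clock_hz, frame_hz, n_frames):
            """Return the clock cycle of each shutter toggle. Keeping the
            fractional remainder in the accumulator means the error never
            grows, even when clock_hz / frame_hz is not an integer."""
            period = clock_hz / frame_hz   # ideal, generally fractional
            acc, edges = 0.0, []
            for _ in range(n_frames):
                acc += period
                edges.append(int(acc))     # integer cycle; fraction carries over
            return edges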

  • Open Access

    Publication Year: 2013, Page(s): 370
    PDF (1156 KB) | Freely Available from IEEE
  • IEEE Xplore Digital Library

    Publication Year: 2013, Page(s): 371
    PDF (1372 KB) | Freely Available from IEEE
  • IEEE Foundation

    Publication Year: 2013, Page(s): 372
    PDF (320 KB) | Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology information for authors

    Publication Year: 2013, Page(s): C4
    PDF (107 KB) | Freely Available from IEEE

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High-Speed Real-Time Circuits
5. Multi-Processor Systems—Hardware and Software
6. VLSI Architecture and Implementation for Video Technology

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it