
IEEE Transactions on Circuits and Systems for Video Technology

Issue 12 • December 2014


Displaying Results 1 - 17 of 17
  • Table of contents

    Page(s): C1
    PDF (165 KB)
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology publication information

    Page(s): C2
    PDF (140 KB)
    Freely Available from IEEE
  • Nonlocal Pixel Selection for Multisurface Fitting-Based Super-Resolution

    Page(s): 2013 - 2017
    PDF (1026 KB) | HTML

    In this paper, we address a super-resolution (SR) problem that constructs a high-resolution (HR) frame/image from a short sequence of low-resolution (LR) frames/images. It is well known that SR is a difficult problem, especially when the number of LR inputs is small. In particular, our previous work on multisurface fitting-based SR performs relatively poorly in this case. To cope with this problem, we take advantage of nonlocal pixels to fit local surfaces. Pixels from nonlocal spatial–temporal positions are selected and weighted based on patch similarity and outlier removal. With this method, the fitted surfaces become more elaborate, so more details can be recovered in the SR results. Experiments demonstrate that, compared with several state-of-the-art methods, the proposed method is very effective at producing HR frames from a small number of LR inputs.

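    As a rough illustration of the patch-similarity weighting described in this abstract (not the authors' implementation), the following Python sketch weights candidate nonlocal pixels by the similarity of their surrounding patches and suppresses apparent outliers; the patch size, Gaussian bandwidth, and outlier quantile are assumed values.

        # Hedged sketch: patch-similarity weights for nonlocal pixel selection.
        # Parameters (patch, h, outlier_q) are illustrative assumptions.
        import numpy as np

        def nonlocal_weights(frame, ref_yx, cand_yx, patch=3, h=10.0, outlier_q=0.8):
            """Return similarity weights for candidate pixels around a reference pixel."""
            r = patch // 2
            pad = np.pad(frame, r, mode="reflect")
            def patch_at(y, x):
                return pad[y:y + patch, x:x + patch].astype(np.float64)
            ref = patch_at(*ref_yx)
            d2 = np.array([np.mean((patch_at(y, x) - ref) ** 2) for y, x in cand_yx])
            w = np.exp(-d2 / (h * h))                     # patch-similarity weighting
            w[d2 > np.quantile(d2, outlier_q)] = 0.0      # crude outlier removal
            s = w.sum()
            return w / s if s > 0 else w

        # Usage: weight candidates from a 5x5 nonlocal neighbourhood of pixel (10, 10).
        img = np.random.rand(32, 32) * 255
        cands = [(y, x) for y in range(8, 13) for x in range(8, 13)]
        print(nonlocal_weights(img, (10, 10), cands))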
  • Novel DCT-Based Image Up-Sampling Using Learning-Based Adaptive $k$-NN MMSE Estimation

    Page(s): 2018 - 2033
    PDF (6141 KB) | HTML

    Image up-sampling in the discrete cosine transform (DCT) domain is a challenging problem because DCT coefficients are de-correlated, so it is nontrivial to directly estimate high-frequency DCT coefficients from observed low-frequency DCT coefficients. In the literature, DCT-based up-sampling algorithms usually pad zeros as high-frequency DCT coefficients or estimate such coefficients with limited success, mainly due to nonadaptive estimators and the restricted information available from a single observed image. In this paper, we tackle the problem of estimating high-frequency DCT coefficients in the spatial domain by proposing a learning-based scheme that uses an adaptive $k$-nearest neighbor weighted minimum mean squares error (MMSE) estimation framework. The proposed scheme uses information from precomputed dictionaries to formulate an adaptive linear MMSE estimator for each DCT block, and it estimates high-frequency DCT coefficients very successfully. Experimental results show that the proposed up-sampling scheme produces minimal ringing and blocking effects and significantly better results than state-of-the-art algorithms in terms of peak signal-to-noise ratio (more than 1 dB), structural similarity, and subjective quality.

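    A minimal sketch of the k-NN weighted estimation idea (a simplified stand-in for the paper's adaptive linear MMSE estimator): given a precomputed dictionary of low-frequency/high-frequency training pairs, the k nearest training blocks are combined with distance-based weights. The dictionary contents, k, and the kernel bandwidth are assumptions made for illustration.

        # Hedged sketch: k-NN weighted estimate of high-frequency DCT coefficients
        # from a precomputed dictionary; not the paper's exact estimator.
        import numpy as np

        rng = np.random.default_rng(0)
        dict_low = rng.normal(size=(1000, 16))    # low-frequency DCT features (assumed)
        dict_high = rng.normal(size=(1000, 48))   # corresponding high-frequency coeffs

        def knn_estimate(obs_low, k=20, h=1.0):
            d2 = np.sum((dict_low - obs_low) ** 2, axis=1)
            idx = np.argsort(d2)[:k]                  # k nearest training blocks
            w = np.exp(-d2[idx] / (2 * h * h))        # distance-based weights
            w /= w.sum()
            return w @ dict_high[idx]                 # weighted (MMSE-style) average

        print(knn_estimate(rng.normal(size=16)).shape)    # -> (48,)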
  • Joint Video Frame Set Division and Low-Rank Decomposition for Background Subtraction

    Page(s): 2034 - 2048
    PDF (3661 KB) | HTML

    The recently proposed robust principal component analysis (RPCA) has been successfully applied to background subtraction. However, low-rank decomposition is justified only when the foreground pixels (sparsity patterns) are uniformly distributed over the scene, which is unrealistic in real-world applications. To overcome this limitation, we reconstruct the input video frames so that the foreground pixels are sparse not only in space but also in time. We therefore propose a joint video frame set division and RPCA-based method for background subtraction. In addition, we use motion as a priori knowledge, which has not been considered in current subspace-based methods. The proposed method consists of two phases. In the first phase, we propose a lower bound-based within-class maximum division method to divide the video frame set into several subsets, so that successive frames are assigned to different subsets in which the foregrounds appear at random locations in the scene. In the second phase, we augment each subset with frames containing a small amount of motion. To evaluate the proposed method, experiments are conducted on real-world and public datasets. Comparisons with state-of-the-art background subtraction methods validate the superiority of our method.

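    The following is a simplified sketch of RPCA-style background subtraction (not the authors' joint division method): the frame matrix is split into a low-rank background and a sparse foreground by alternating singular-value and soft thresholding. The thresholds and iteration count are illustrative assumptions.

        # Hedged sketch: low-rank + sparse decomposition for background subtraction.
        import numpy as np

        def rpca_background(D, lam=None, iters=50):
            m, n = D.shape
            lam = lam or 1.0 / np.sqrt(max(m, n))
            mu = 0.25 * np.abs(D).mean()          # crude threshold scale (assumption)
            S = np.zeros_like(D)
            for _ in range(iters):
                U, s, Vt = np.linalg.svd(D - S, full_matrices=False)
                L = (U * np.maximum(s - mu, 0)) @ Vt                       # singular-value thresholding
                S = np.sign(D - L) * np.maximum(np.abs(D - L) - lam * mu, 0)  # soft thresholding
            return L, S

        # Each column is one vectorized frame; S highlights moving (foreground) pixels.
        frames = np.random.rand(64, 30)
        L, S = rpca_background(frames)
        print(np.count_nonzero(S))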
  • Local Density Encoding for Robust Stereo Matching

    Page(s): 2049 - 2062
    PDF (7926 KB) | HTML

    Stereo correspondence is challenging under realistic conditions due to uncontrolled factors that affect the input images, including illumination inconsistencies and radiometric variations. Many local and global models have been proposed to address these problems; however, their performance often degrades because they assume color consistency between the left and right images. We therefore present a new local pattern, local density encoding, as a stereo matching measure that improves the performance of existing stereo methods. Our experimental results indicate that the proposed method is less sensitive to illumination changes and radiometric variations. Moreover, under both normal and severe illumination changes, the proposed method is more robust than state-of-the-art data costs.

  • Shape-From-Focus Depth Reconstruction With a Spatial Consistency Model

    Page(s): 2063 - 2076
    PDF (3948 KB) | HTML

    This paper presents a maximum a posteriori (MAP) framework that incorporates a spatial consistency prior model for depth reconstruction in the shape-from-focus (SFF) process. Existing SFF techniques, which reconstruct a dense 3-D depth map from multifocus image frames, usually perform poorly over low-contrast regions and usually need a large number of frames to achieve satisfactory results. To overcome these problems, a new depth reconstruction process is proposed that estimates depth values by solving an MAP estimation problem that includes a spatial consistency model. This consistency model assumes that, within a local region, the depth value of each pixel can be roughly predicted by an affine transformation of the image features at that pixel. A local learning process is proposed to construct the consistency model directly from the multifocus image sequence. By adopting this model, depth values can be inferred more robustly, especially over low-contrast regions. In addition, to improve computational efficiency, a cell-based version of the MAP framework is proposed. Experimental results demonstrate the improvement in accuracy and robustness over existing approaches on real and synthesized image data, and show that the proposed method achieves impressive performance even when only a few image frames are used.

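    To make the spatial consistency assumption concrete, here is a hedged sketch (not the paper's MAP formulation): within one local window, depth is fitted as an affine function of per-pixel image features via least squares, and the fitted values act as spatially consistent predictions. The features and window size are assumptions for illustration.

        # Hedged sketch: local affine prediction of depth from image features.
        import numpy as np

        def local_affine_depth(features, depth):
            """Fit depth ~ A @ f + b over one window and return the smoothed prediction."""
            n, d = features.shape
            X = np.hstack([features, np.ones((n, 1))])        # affine design matrix
            w, *_ = np.linalg.lstsq(X, depth, rcond=None)     # least-squares affine fit
            return X @ w                                      # spatially consistent depth

        # One 5x5 window: 25 pixels, 3 assumed features per pixel (e.g., focus measures).
        feats = np.random.rand(25, 3)
        noisy_depth = feats @ np.array([2.0, -1.0, 0.5]) + 4.0 + 0.05 * np.random.randn(25)
        print(local_affine_depth(feats, noisy_depth)[:5])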
  • Efficient Parallel Framework for HEVC Motion Estimation on Many-Core Processors

    Page(s): 2077 - 2089
    PDF (4143 KB) | HTML

    High Efficiency Video Coding (HEVC) provides higher coding efficiency than previous video coding standards at the cost of increased encoding complexity, and the complexity increase of the motion estimation (ME) procedure is particularly significant given the complicated partitioning structure of HEVC. Fully exploiting the coding efficiency offered by HEVC therefore requires a huge amount of computation. In this paper, we analyze the ME structure in HEVC and propose a parallel framework that decouples ME for different partitions on many-core processors. Building on the local parallel method (LPM), we first use a directed acyclic graph (DAG)-based order to parallelize coding tree units (CTUs) and adopt an improved LPM (ILPM) within each CTU (DAGILPM), which exploits CTU-level and prediction unit (PU)-level parallelism. We then observe that there exist completely independent PUs (CIPUs) and partially independent PUs (PIPUs); when the degree of parallelism (DP) is smaller than the maximum DP of DAGILPM, we process the CIPUs and PIPUs, which further increases the DP. The data dependencies and coding efficiency remain the same as for LPM. Experiments show that, on a 64-core system, the proposed scheme achieves speedups of more than 30 and 40 times over serial execution for $1920 \times 1080$ and $2560 \times 1600$ video sequences, respectively.

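    As a hedged illustration of DAG-based ordering (not the paper's exact LPM/ILPM scheme), the sketch below groups CTUs into waves under a common wavefront assumption: each CTU depends on its left and upper neighbours, so all CTUs in a wave are independent and can be processed in parallel; the wave size is the available degree of parallelism.

        # Hedged sketch: wavefront-style DAG ordering of a CTU grid.
        def ctu_waves(rows, cols):
            waves = {}
            for r in range(rows):
                for c in range(cols):
                    waves.setdefault(r + c, []).append((r, c))   # wave index = r + c
            return [waves[k] for k in sorted(waves)]

        # A 3x4 CTU grid: wave k holds CTUs whose left/top dependencies are finished.
        for k, wave in enumerate(ctu_waves(3, 4)):
            print(f"wave {k}: {wave}")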
  • Motion Hooks for the Multiview Extension of HEVC

    Page(s): 2090 - 2098
    PDF (1699 KB) | HTML

    MV-HEVC is the multiview extension of High Efficiency Video Coding (HEVC). At the time of writing, MV-HEVC was being developed by the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) of the International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) Moving Picture Experts Group and ITU-T VCEG. Development of MV-HEVC started before HEVC itself was technically finalized in January 2013, and it was decided that MV-HEVC would contain only high-level syntax changes relative to HEVC, i.e., no changes to block-level processes, so that first-generation HEVC decoder hardware could be reused as is to build an MV-HEVC decoder with only firmware changes corresponding to the high-level syntax part of the codec. Consequently, any block-level process that is not necessary for HEVC itself but is useful for MV-HEVC can only be enabled through so-called hooks. Motion hooks are techniques that have no significant impact on the single-view HEVC version 1 codec but mainly improve MV-HEVC. This paper presents techniques for efficient MV-HEVC coding by introducing hooks into the HEVC design to accommodate inter-view prediction in MV-HEVC; since these hooks relate to motion prediction, they are named motion hooks. Some of the motion hooks developed by the authors were adopted into HEVC during its finalization. Simulation results show that the proposed motion hooks provide an average bitrate reduction of 4% for views coded with inter-view prediction.

  • Toward Real-Time and Efficient Compression of Human Time-Varying Meshes

    Page(s): 2099 - 2116
    PDF (6952 KB) | HTML

    In this paper, a novel skeleton-based approach to human time-varying mesh (H-TVM) compression is presented. TVM compression is a new topic with many challenges, the most important of which are handling the lack of an obvious vertex mapping across frames and the variable connectivity across frames while maintaining efficiency. Very few works exist in the literature, and not all of these challenges have been addressed; developing an efficient, real-time solution that handles them is a difficult task. We address the H-TVM compression problem in a manner inspired by video coding, using different frame types and removing inter-frame geometric redundancy with the help of recent advances in human skeleton tracking. The overall approach focuses on compression efficiency, low distortion, and low computation time, enabling real-time transmission of H-TVMs, and it efficiently compresses the geometry and vertex attributes of TVMs. In addition, this paper is the first to provide an efficient method for connectivity coding of TVMs, by introducing a modification to the state-of-the-art MPEG-4 TFAN algorithm. Experiments are conducted on the MPEG-3DGC TVM database. The method outperforms the state-of-the-art standardized static mesh coder MPEG-4 TFAN at low bit rates, while remaining competitive at high bit rates. It provides a practical proof of concept that, in the combined problem of geometry, connectivity, and vertex attribute coding of TVMs, efficient inter-frame redundancy removal is possible, establishing ground for further improvements. Finally, this paper proposes a method for motion-based coding of H-TVMs that can further enhance the overall experience when H-TVM compression is used in a tele-immersion scenario.

  • Energy Consumption of Visual Sensor Networks: Impact of Spatio-Temporal Coverage

    Page(s): 2117 - 2131
    PDF (2443 KB) | HTML

    Wireless visual sensor networks (VSNs) are expected to play a major role in future IEEE 802.15.4 personal area networks (PANs) under recently established collision-free medium access control (MAC) protocols, such as the IEEE 802.15.4e-2012 MAC. In such environments, VSN energy consumption is affected by the number of camera sensors deployed (spatial coverage) as well as the number of captured video frames that each node processes and transmits (temporal coverage). In this paper, we explore this aspect for uniformly formed VSNs, that is, networks comprising identical wireless visual sensor nodes connected to a collection node via a balanced cluster-tree topology, with each node producing independent, identically distributed bitstream sizes after processing the video frames captured within each network activation interval. We derive analytic results for the energy-optimal spatio-temporal coverage parameters of such VSNs under a priori known bounds on the number of frames to process per sensor and the number of nodes to deploy within each tier of the VSN. Our results are parametric in the probability density function characterizing the bitstream size produced by each node and in the energy consumption rates of the system of interest. Experimental results obtained from a deployment of TelosB motes show that our analytic results are always within 7% of the measured energy consumption for a wide range of settings. In addition, results obtained via motion JPEG encoding and feature extraction on a multimedia subsystem (BeagleBone Linux computer) show that the optimal spatio-temporal settings derived by our framework allow a substantial reduction of energy consumption in comparison with ad hoc settings.

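    As a toy illustration of the spatio-temporal coverage trade-off (not the paper's analytic derivation), the sketch below brute-forces the number of nodes and frames per node that cover a fixed number of frames per activation interval at minimum total energy; all energy constants and the coverage constraint are assumptions.

        # Hedged toy model: brute-force spatio-temporal coverage selection.
        def optimal_coverage(F=60, max_nodes=16, e_idle=0.5, e_frame=0.2, e_tx=0.05):
            best = None
            for n in range(1, max_nodes + 1):
                f = -(-F // n)                                  # frames per node (ceiling)
                energy = n * (e_idle + f * (e_frame + e_tx))    # total energy per interval
                if best is None or energy < best[0]:
                    best = (energy, n, f)
            return best

        print(optimal_coverage())   # -> (energy, number of nodes, frames per node)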
  • Energy-Efficient SRAM FPGA-Based Wireless Vision Sensor Node: SENTIOF-CAM

    Page(s): 2132 - 2143
    PDF (3220 KB) | HTML

    Many wireless vision sensor network (WVSN) applications are characterized by low duty cycling, and an individual wireless vision sensor node (VSN) in a WVSN must complete its tasks as quickly as possible. Task execution can be sped up by exploiting the inherent parallelism of the tasks on a hardware platform such as a field-programmable gate array (FPGA). Traditionally, SRAM FPGAs are considered inefficient for duty-cycled applications. This paper presents a low-complexity, energy-efficient, and reconfigurable VSN architecture based on an SRAM FPGA, using a design matrix that includes task partitioning, a low-complexity background subtraction, bilevel coding, and duty cycling. The proposed VSN, referred to as SENTIOF-CAM, has been implemented on a prototype board, and the energy values of its different states have been measured for three real applications. Comparisons with existing solutions show that the proposed SRAM FPGA architecture can reduce energy by up to a factor of 69 compared with software VSN solutions and achieves approximately the same energy values as FLASH FPGA-based VSN solutions. The lifetime derived from the measured energy values shows that, for a sample period of 5 min, a 3.2-year lifetime can be achieved with a battery holding 37.44 kJ of energy. In addition, the proposed solution offers a generic architecture with low design complexity on a hardware-reconfigurable platform and is easily adapted to a range of applications.

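    A quick back-of-the-envelope check of the lifetime figure quoted in the abstract: a 37.44 kJ battery lasting 3.2 years implies a sub-milliwatt average draw.

        # Average power implied by the quoted battery capacity and lifetime.
        battery_j = 37.44e3                          # 37.44 kJ
        lifetime_s = 3.2 * 365.25 * 24 * 3600        # 3.2 years in seconds
        print(battery_j / lifetime_s * 1e3, "mW average")   # ~0.37 mW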
  • Exposing Fake Bit Rate Videos and Estimating Original Bit Rates

    Page(s): 2144 - 2154
    PDF (3776 KB) | HTML

    Bit rate is one of the important criteria for digital video quality. With some video tools, however, the video bit rate can easily be increased without improving video quality at all; a claimed high bit rate video may actually have poor visual quality if it has been up-converted from an original lower bit rate version. Exposing fake bit rate videos is therefore an important issue in digital video forensics. To the best of our knowledge, although some methods have been proposed for exposing fake bit rate MPEG-2 videos, no related work has been reported on further estimating their original bit rates. In this paper, we first analyze the statistical artifacts of such fake bit rate videos, including requantization artifacts based on the first-digit law in the DCT frequency domain (12-D) and the changes in structural similarity indexes between the query video and its sequential bit rate down-converted versions in the spatial domain (4-D). We then propose a compact yet very effective 16-D feature vector for exposing fake bit rate videos and further estimating their original bit rates. Extensive experiments on hundreds of video sequences with four different resolutions and two typical compression schemes (MPEG-2 and H.264/AVC) show the effectiveness of the proposed method compared with existing related methods.

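    As a simplified illustration of the first-digit law features mentioned in the abstract (a 9-bin histogram rather than the paper's exact 12-D variant), the sketch below computes the first-digit distribution of the non-zero AC coefficients of an 8x8 DCT block; requantized "fake bit rate" video tends to deviate from the expected Benford-like distribution. The block size and orthonormal DCT-II are assumptions for illustration.

        # Hedged sketch: first-digit (Benford-style) histogram of DCT AC coefficients.
        import numpy as np

        def dct2(block):
            """Orthonormal 2-D DCT-II of a square block."""
            N = block.shape[0]
            k = np.arange(N)
            C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * N))
            C[0] *= np.sqrt(0.5)
            return C @ block @ C.T

        def first_digit_hist(block):
            coeffs = dct2(block.astype(np.float64))
            ac = np.abs(coeffs.ravel()[1:])                    # drop the DC term
            ac = ac[ac >= 1]                                   # keep coefficients with a leading digit
            digits = (ac / 10 ** np.floor(np.log10(ac))).astype(int)
            hist = np.bincount(digits, minlength=10)[1:10].astype(float)
            return hist / hist.sum() if hist.sum() else hist   # 9-bin first-digit histogram

        print(first_digit_hist(np.random.rand(8, 8) * 255))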
  • Open Access

    Page(s): 2155
    PDF (1156 KB)
    Freely Available from IEEE
  • IEEE Xplore Digital Library

    Page(s): 2156
    PDF (1586 KB)
    Freely Available from IEEE
  • IEEE Circuits and Systems Society Information

    Page(s): C3
    PDF (119 KB)
    Freely Available from IEEE
  • IEEE Transactions on Circuits and Systems for Video Technology information for authors

    Page(s): C4
    PDF (119 KB)
    Freely Available from IEEE

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High Speed Real-Time Circuits
5. Multi-Processor Systems (Hardware and Software)
6. VLSI Architecture and Implementation for Video Technology


Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it