
IEEE Transactions on Circuits and Systems for Video Technology

Popular Articles (April 2015)

Includes the top 50 most frequently downloaded documents for this publication according to the most recent monthly usage statistics.
  • 1. Overview of the High Efficiency Video Coding (HEVC) Standard

    Publication Year: 2012 , Page(s): 1649 - 1668
    Cited by:  Papers (327)

    High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards, in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.

    Open Access
  • 2. An introduction to biometric recognition

    Publication Year: 2004 , Page(s): 4 - 20
    Cited by:  Papers (665)  |  Patents (50)

    A wide variety of systems requires reliable personal recognition schemes to either confirm or determine the identity of an individual requesting their services. The purpose of such schemes is to ensure that the rendered services are accessed only by a legitimate user and no one else. Examples of such applications include secure access to buildings, computer systems, laptops, cellular phones, and ATMs. In the absence of robust personal recognition schemes, these systems are vulnerable to the wiles of an impostor. Biometric recognition, or, simply, biometrics, refers to the automatic recognition of individuals based on their physiological and/or behavioral characteristics. By using biometrics, it is possible to confirm or establish an individual's identity based on "who she is", rather than by "what she possesses" (e.g., an ID card) or "what she remembers" (e.g., a password). We give a brief overview of the field of biometrics and summarize some of its advantages, disadvantages, strengths, limitations, and related privacy concerns.

  • 3. An Efficient SVD-Based Method for Image Denoising

    Publication Year: 2015 , Page(s): 1

    Nonlocal self-similarity of images has attracted considerable interest in the field of image processing and has led to several state-of-the-art image denoising algorithms, such as BM3D, LPG-PCA, PLOW, and SAIST. In this paper, we propose a computationally simple denoising algorithm that uses nonlocal self-similarity and low-rank approximation. The proposed method consists of three basic steps. First, similar image patches are classified by block matching to form groups, so that each group of similar patches is approximately low-rank. Next, each group is factorized by singular value decomposition (SVD) and estimated by retaining only the few largest singular values and their corresponding singular vectors. Last, an initial denoised image is generated by aggregating all processed patches. For low-rank matrices, SVD provides the optimal energy compaction in the least-squares sense, and the proposed method exploits this property to obtain a low-rank approximation of each group of similar patches. Unlike other SVD-based methods, the low-rank approximation in the SVD domain avoids learning a local basis for representing image patches, which is usually computationally expensive. Experimental results demonstrate that the proposed method effectively reduces noise and is competitive with current state-of-the-art denoising algorithms in terms of both quantitative metrics and subjective visual quality. A minimal sketch of the low-rank approximation step follows this entry.

    Open Access
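
    The low-rank step described above can be sketched in a few lines. This is a minimal illustration, assuming the similar patches have already been found by block matching and vectorized into the rows of a matrix; the truncation rank is a placeholder rather than the paper's noise-adaptive choice.

    ```python
    import numpy as np

    def lowrank_denoise_group(group, rank):
        # group: (num_patches, patch_pixels) matrix whose rows are vectorized
        # similar patches collected by block matching.
        u, s, vt = np.linalg.svd(group, full_matrices=False)
        s[rank:] = 0.0                     # keep only the largest singular values
        return (u * s) @ vt                # low-rank estimate of the patch group

    # The denoised patches would then be aggregated (e.g., averaged) back into
    # the image at their original positions.
    ```
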
  • 4. A Real-Time Motion-Feature-Extraction VLSI Employing Digital-Pixel-Sensor-Based Parallel Architecture

    Publication Year: 2014 , Page(s): 1787 - 1799

    A very-large-scale integration (VLSI) circuit capable of extracting motion features from moving images in real time has been developed, employing row-parallel and pixel-parallel architectures based on digital pixel sensor technology. Directional edge filtering of input images is carried out in row-parallel processing to minimize the chip real estate. To achieve a real-time response, a fully pixel-parallel architecture is employed for the adaptive binarization of filtered images for essential feature extraction, as well as for their temporal integration and derivative operations. As a result, self-speed-adaptive motion feature extraction has been established. The chip was designed and fabricated in a 65-nm CMOS technology and used to build an object detection system. Motion-sensitive target image localization was demonstrated as an illustrative example.

    Open Access
  • 5. Overview of HEVC High-Level Syntax and Reference Picture Management

    Publication Year: 2012 , Page(s): 1858 - 1870
    Cited by:  Papers (7)

    The increasing proportion of video traffic in telecommunication networks puts an emphasis on efficient video compression technology. High Efficiency Video Coding (HEVC) is the forthcoming video coding standard that provides substantial bit rate reductions compared to its predecessors. In the HEVC standardization process, technologies such as picture partitioning, reference picture management, and parameter sets are categorized as "high-level syntax." The design of the high-level syntax impacts the interface to systems and error resilience, and provides new functionalities. This paper presents an overview of the HEVC high-level syntax, including network abstraction layer unit headers, parameter sets, picture partitioning schemes, reference picture management, and supplemental enhancement information messages. A minimal sketch of parsing the network abstraction layer unit header follows this entry.

    Open Access
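
    As a concrete illustration of the "high-level syntax" entry point, the sketch below decodes the two-byte HEVC NAL unit header (forbidden_zero_bit, nal_unit_type, nuh_layer_id, nuh_temporal_id_plus1). It follows the field layout described in the standardization documents, but it is only an illustrative parser, not a conformant one.

    ```python
    def parse_hevc_nal_unit_header(header: bytes) -> dict:
        """Decode the 16-bit HEVC NAL unit header from its first two bytes."""
        value = int.from_bytes(header[:2], "big")
        return {
            "forbidden_zero_bit":     (value >> 15) & 0x1,
            "nal_unit_type":          (value >> 9) & 0x3F,
            "nuh_layer_id":           (value >> 3) & 0x3F,
            "nuh_temporal_id_plus1":  value & 0x7,
        }

    # 0x40 0x01 is a typical video parameter set header:
    # nal_unit_type = 32, nuh_layer_id = 0, temporal id 0 (plus1 = 1).
    print(parse_hevc_nal_unit_header(bytes([0x40, 0x01])))
    ```
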
  • 6. Intra Coding of the HEVC Standard

    Publication Year: 2012 , Page(s): 1792 - 1801
    Cited by:  Papers (35)

    This paper provides an overview of the intra coding techniques in the High Efficiency Video Coding (HEVC) standard being developed by the Joint Collaborative Team on Video Coding (JCT-VC). The intra coding framework of HEVC follows that of traditional hybrid codecs and is built on spatial sample prediction followed by transform coding and postprocessing steps. Novel features contributing to the increased compression efficiency include a quadtree-based variable block size coding structure, block-size-agnostic angular and planar prediction, adaptive pre- and postfiltering, and prediction-direction-based transform coefficient scanning. This paper discusses the design principles applied during the development of the new intra coding methods and analyzes the compression performance of the individual tools. The computational complexity of the introduced intra prediction algorithms is analyzed both by deriving operational cycle counts and by benchmarking an optimized implementation. Using objective metrics, the bit-rate reduction provided by HEVC intra coding over the H.264/Advanced Video Coding (AVC) reference is reported to be 22% on average and up to 36%. Significant subjective picture quality improvements are also reported when comparing the resulting pictures at a fixed bit rate. A sketch of the planar prediction mode follows this entry.

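    The planar prediction mode mentioned above averages a horizontal and a vertical linear interpolation of the reference samples. The sketch below follows the commonly cited form of the HEVC planar formula; treat the exact indexing and rounding as approximate rather than normative.

    ```python
    import numpy as np

    def planar_intra_predict(top, left, top_right, bottom_left):
        """top/left: N reconstructed reference samples above / to the left of
        the block; top_right and bottom_left are single reference samples."""
        n = len(top)
        shift = n.bit_length()            # equals log2(n) + 1 for power-of-two n
        pred = np.empty((n, n), dtype=np.int32)
        for y in range(n):
            for x in range(n):
                hor = (n - 1 - x) * left[y] + (x + 1) * top_right
                ver = (n - 1 - y) * top[x] + (y + 1) * bottom_left
                pred[y, x] = (hor + ver + n) >> shift
        return pred
    ```
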
  • 7. Overview of the H.264/AVC video coding standard

    Publication Year: 2003 , Page(s): 560 - 576
    Cited by:  Papers (760)  |  Patents (308)

    H.264/AVC is the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.

  • 8. Fast HEVC Encoding Decisions Using Data Mining

    Publication Year: 2015 , Page(s): 660 - 673

    The High Efficiency Video Coding standard provides an improved compression ratio in comparison with its predecessors at the cost of a large increase in encoding computational complexity. An important share of this increase is due to the new flexible partitioning structures, namely the coding trees, the prediction units, and the residual quadtrees, with the best configurations decided through an exhaustive rate-distortion optimization (RDO) process. In this paper, we propose a set of procedures for deciding whether the partition structure optimization algorithm should be terminated early or run to the end of an exhaustive search for the best configuration. The proposed schemes are based on decision trees obtained through data mining techniques. By extracting intermediate data, such as encoding variables, from a training set of video sequences, three sets of decision trees are built and implemented to avoid running the RDO algorithm to its full extent. When separately implemented, these schemes achieve average computational complexity reductions (CCRs) of up to 50% at a negligible cost of 0.56% in terms of Bjontegaard Delta (BD) rate increase. When the schemes are jointly implemented, an average CCR of up to 65% is achieved, with a small BD-rate increase of 1.36%. Extensive experiments and comparisons with similar works demonstrate that the proposed early termination schemes achieve the best rate-distortion-complexity tradeoffs among all the compared works. A sketch of the decision-tree query follows this entry.

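    The early-termination idea lends itself to a compact sketch: train an offline decision tree on intermediate encoder variables and query it inside the RDO loop. The feature names and files below are hypothetical placeholders, not the authors' actual feature set or thresholds.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Hypothetical offline training data: one row of encoder variables per CU
    # (e.g., current R-D cost, SKIP flag, residual energy), label 1 = "terminate
    # the partition search early", 0 = "continue the exhaustive RDO search".
    X_train = np.load("cu_features.npy")      # placeholder file names
    y_train = np.load("cu_labels.npy")

    tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=50)
    tree.fit(X_train, y_train)

    def terminate_early(feature_row) -> bool:
        # Called once per partitioning decision inside the encoder's RDO loop.
        return bool(tree.predict([feature_row])[0])
    ```
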
  • 9. Crowded Scene Analysis: A Survey

    Publication Year: 2015 , Page(s): 367 - 386

    Automated scene analysis has been a topic of great interest in computer vision and cognitive science. Recently, with the growth of crowd phenomena in the real world, crowded scene analysis has attracted much attention. However, the visual occlusions and ambiguities in crowded scenes, as well as the complex behaviors and scene semantics, make the analysis a challenging task. In the past few years, an increasing number of works on crowded scene analysis have been reported, covering different aspects including crowd motion pattern learning, crowd behavior and activity analysis, and anomaly detection in crowds. This paper surveys the state-of-the-art techniques on this topic. We first provide the background knowledge and the available features related to crowded scenes. Then, existing models, popular algorithms, evaluation protocols, and system performance are presented for the different aspects of crowded scene analysis. We also outline the available datasets for performance evaluation. Finally, some research problems and promising future directions are presented with discussions.

  • 10. Automatic License Plate Recognition (ALPR): A State-of-the-Art Review

    Publication Year: 2013 , Page(s): 311 - 325
    Cited by:  Papers (12)

    Automatic license plate recognition (ALPR) is the extraction of vehicle license plate information from an image or a sequence of images. The extracted information can be used, with or without a database, in many applications, such as electronic payment systems (toll payment, parking fee payment) and freeway and arterial monitoring systems for traffic surveillance. ALPR uses a color, black-and-white, or infrared camera to take images. The quality of the acquired images is a major factor in the success of ALPR. As a real-life application, ALPR has to process license plates quickly and reliably under different environmental conditions, indoors or outdoors, day or night. It should also generalize to license plates from different nations, provinces, or states. These plates usually contain different colors, are written in different languages, and use different fonts; some plates have a single-color background while others carry background images. License plates can also be partially occluded by dirt, lighting, and towing accessories on the car. In this paper, we present a comprehensive review of the state-of-the-art techniques for ALPR. We categorize different ALPR techniques according to the features they use at each stage, and compare them in terms of pros, cons, recognition accuracy, and processing speed. An outlook on the future of ALPR is given at the end.

  • 11. Reversible data hiding

    Publication Year: 2006 , Page(s): 354 - 362
    Cited by:  Papers (315)  |  Patents (2)

    A novel reversible data hiding algorithm, which can recover the original image without any distortion from the marked image after the hidden data have been extracted, is presented in this paper. This algorithm utilizes the zero or the minimum points of the histogram of an image and slightly modifies the pixel grayscale values to embed data into the image. It can embed more data than many of the existing reversible data hiding algorithms. It is proved analytically and shown experimentally that the peak signal-to-noise ratio (PSNR) of the marked image generated by this method versus the original image is guaranteed to be above 48 dB. This lower bound of the PSNR is much higher than that of all reversible data hiding techniques reported in the literature. The computational complexity of the proposed technique is low and its execution time is short. The algorithm has been successfully applied to a wide range of images, including commonly used images, medical images, texture images, aerial images, and all of the 1096 images in the CorelDraw database. Experimental results and performance comparisons with other reversible data hiding schemes are presented to demonstrate the validity of the proposed algorithm. A sketch of the histogram-shifting embedding step follows this entry.

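    The histogram-shifting idea is compact enough to sketch directly. The code below embeds a bit string at the histogram peak of a grayscale image, assuming the peak lies below the zero (or minimum) point; the bookkeeping needed for a non-empty minimum bin, overflow handling, and the extraction/recovery pass are omitted.

    ```python
    import numpy as np

    def embed_histogram_shift(img, bits):
        hist = np.bincount(img.ravel(), minlength=256)
        peak = int(hist.argmax())          # most frequent gray level carries the payload
        zero = int(hist.argmin())          # zero (or minimum) point of the histogram
        assert peak < zero, "sketch covers only the peak < zero case"

        out = img.astype(np.int32)
        out[(out > peak) & (out < zero)] += 1      # shift bins to free peak + 1

        flat = out.ravel()
        peak_positions = np.flatnonzero(img == peak)
        for pos, bit in zip(peak_positions, bits): # bit 1 -> peak + 1, bit 0 -> peak
            flat[pos] += bit
        return out.astype(np.uint8), peak, zero
    ```
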
  • 12. HEVC Deblocking Filter

    Publication Year: 2012 , Page(s): 1746 - 1754
    Cited by:  Papers (17)

    This paper describes the in-loop deblocking filter used in the upcoming High Efficiency Video Coding (HEVC) standard to reduce visible artifacts at block boundaries. The deblocking filter performs detection of the artifacts at the coded block boundaries and attenuates them by applying a selected filter. Compared to the H.264/AVC deblocking filter, the HEVC deblocking filter has lower computational complexity and better parallel processing capabilities while still achieving significant reduction of the visual artifacts.

    Open Access
  • 13. A Novel Fast CU Encoding Scheme Based on Spatiotemporal Encoding Parameters for HEVC Inter Coding

    Publication Year: 2015 , Page(s): 422 - 435

    Recently, a new video coding standard, High Efficiency Video Coding (HEVC), has shown greatly improved coding efficiency by adopting hierarchical structures of coding units (CUs), prediction units (PUs), and transform units (TUs). To achieve this coding efficiency, the best combination of CU, PU, and TU must be found in the sense of minimum rate-distortion (R-D) cost, which incurs a large computational complexity. Among CU, PU, and TU, the determination of CU sizes most significantly affects the R-D performance of HEVC encoders and causes large computational costs in combination with the PU and TU size determinations. Despite recent work on the complexity reduction of HEVC encoders, most research has focused on fast CU splitting in intra slice coding and on early TU splitting in both intra and inter slices. In this paper, we propose a fast and efficient CU encoding scheme based on the spatiotemporal encoding parameters of HEVC encoders, which consists of an improved early CU SKIP detection method and a fast CU split decision method. For the current CU block under encoding, the proposed scheme uses sample-adaptive-offset parameters as the spatial encoding parameter to estimate the texture complexity that affects the CU partition. In addition, motion vectors, TU size, and coded block flag information are used as the temporal encoding parameters to estimate the temporal complexity that also affects the CU partition. These spatiotemporal encoding parameters are byproducts of the HEVC encoding process and require no additional computation. The proposed fast CU encoding scheme significantly reduces the total encoding time with negligible R-D performance loss. Experimental results show that the proposed scheme achieves average total encoding time savings of 49.6% and 42.7% with only 1.4% and 1.0% average bit-rate losses for various test sequences under random access and low-delay B conditions, respectively. The proposed scheme is also well suited to parallel processing in pipelined HEVC encoder architectures because of its independence from neighboring CU blocks.

  • 14. Comparison of the Coding Efficiency of Video Coding Standards—Including High Efficiency Video Coding (HEVC)

    Publication Year: 2012 , Page(s): 1669 - 1684
    Cited by:  Papers (71)

    The compression capability of several generations of video coding standards is compared by means of peak signal-to-noise ratio (PSNR) and subjective testing results. A unified approach is applied to the analysis of designs, including H.262/MPEG-2 Video, H.263, MPEG-4 Visual, H.264/MPEG-4 Advanced Video Coding (AVC), and High Efficiency Video Coding (HEVC). The results of subjective tests for WVGA and HD sequences indicate that HEVC encoders can achieve subjective reproduction quality equivalent to that of encoders conforming to H.264/MPEG-4 AVC while using approximately 50% less bit rate on average. The HEVC design is shown to be especially effective for low bit rates, high-resolution video content, and low-delay communication applications. The measured subjective improvement somewhat exceeds the improvement measured by the PSNR metric.

  • 15. Manifold Regularized Local Sparse Representation for Face Recognition

    Publication Year: 2015 , Page(s): 651 - 659
    Cited by:  Papers (1)

    Sparse representation (or sparse coding) based classification has been successfully applied to face recognition. However, it can become problematic in the presence of illumination variations or occlusions. In this paper, we propose a Manifold Regularized Local Sparse Representation (MRLSR) model to address such difficulties. The key idea behind the MRLSR method is that all coding vectors in the sparse representation should be group sparse, that is, they should hold the two properties of individual sparsity and local similarity. As a consequence, the face recognition rate can be considerably improved. The MRLSR model is optimized by a modified homotopy algorithm, which remains stable under different choices of the weighting parameter. Extensive experiments are performed on various face databases containing illumination variations and occlusions. We show that the proposed method outperforms state-of-the-art approaches and provides the highest recognition rate.

  • 16. HEVC Complexity and Implementation Analysis

    Publication Year: 2012 , Page(s): 1685 - 1696
    Cited by:  Papers (63)

    Advances in video compression technology have been driven by ever-increasing processing power available in software and hardware. The emerging High Efficiency Video Coding (HEVC) standard aims to provide a doubling in coding efficiency with respect to the H.264/AVC high profile, delivering the same video quality at half the bit rate. In this paper, complexity-related aspects that were considered in the standardization process are described. Furthermore, profiling of reference software and optimized software gives an indication of where HEVC may be more complex than its predecessors and where it may be simpler. Overall, the complexity of HEVC decoders does not appear to be significantly different from that of H.264/AVC decoders; this makes HEVC decoding in software very practical on current hardware. HEVC encoders are expected to be several times more complex than H.264/AVC encoders and will be a subject of research in years to come.

  • 17. Transform Coefficient Coding in HEVC

    Publication Year: 2012 , Page(s): 1765 - 1777
    Cited by:  Papers (10)

    This paper describes transform coefficient coding in the draft international standard of the High Efficiency Video Coding (HEVC) specification and the driving motivations behind its design. Transform coefficient coding in HEVC encompasses the scanning patterns and coding methods for the last significant coefficient, significance map, coefficient levels, and sign data. Special attention is paid to the new methods of last significant coefficient coding, multilevel significance maps, high-throughput binarization, and sign data hiding. Experimental results are provided to evaluate the performance of transform coefficient coding in HEVC. A sketch of the sign data hiding rule follows this entry.

    Open Access
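
    Of the techniques listed above, sign data hiding is the easiest to illustrate: one coefficient sign per coefficient group is not transmitted but inferred from the parity of the sum of absolute levels, which the encoder adjusts if necessary. The parity convention below is the commonly described one (even sum means positive); this is a conceptual sketch, not the normative procedure.

    ```python
    def infer_hidden_sign(abs_levels):
        """abs_levels: absolute transform coefficient levels of one coefficient
        group; returns the inferred sign (+1 or -1) of the designated
        coefficient whose sign bit was not coded."""
        return -1 if sum(abs_levels) & 1 else +1

    print(infer_hidden_sign([3, 0, 1, 2]))   # sum 6 is even -> +1
    ```
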
  • 18. No-Reference Video Quality Assessment Based on Artifact Measurement and Statistical Analysis

    Publication Year: 2015 , Page(s): 533 - 546

    A discrete cosine transform (DCT)-based no-reference video quality prediction model is proposed that measures artifacts and analyzes the statistics of compressed natural videos. The model has two stages: 1) distortion measurement and 2) nonlinear mapping. In the first stage, an unsigned AC band, three frequency bands, and two orientation bands are generated from the DCT coefficients of each decoded frame in a video sequence. Six efficient frame-level features are then extracted to quantify the distortion of natural scenes. In the second stage, each frame-level feature of all frames is transformed into a corresponding video-level feature via temporal pooling; a trained multilayer neural network then takes all video-level features as inputs and outputs a score as the predicted quality of the video sequence. The proposed method was tested on videos with various compression types, content, and resolutions in four databases. We compared our model with a linear model, a support-vector-regression-based model, a state-of-the-art training-based model, and four popular full-reference metrics. Detailed experimental results demonstrate that the results of the proposed method are highly correlated with the subjective assessments.

  • 19. Multiframe Super-Resolution Employing a Spatially Weighted Total Variation Model

    Publication Year: 2012 , Page(s): 379 - 392
    Cited by:  Papers (11)

    Total variation (TV) has been used as a popular and effective image prior model in regularization-based image processing fields, such as denoising, deblurring, and super-resolution (SR), because of its ability to preserve edges. However, because the TV model favors a piecewise constant solution, its results in the flat regions of the image are poor, and it cannot automatically balance the processing strength between image regions with different spatial properties. In this paper, we propose a spatially weighted TV image SR algorithm, in which the spatial information distributed over different image regions is added to constrain the SR process. A newly proposed and effective spatial information indicator called difference curvature is used to identify the spatial property of each pixel, and a weighting parameter determined by the difference curvature information constrains the regularization strength of the TV term at each pixel. A majorization-minimization algorithm is then used to optimize the proposed spatially weighted TV SR model. Finally, extensive simulated and real-data experiments show that the proposed algorithm not only efficiently reduces the artifacts produced by the TV model in flat regions of the image, but also preserves edge information, and that the reconstruction results are less sensitive to the regularization parameters than those of the TV model, because of the spatial information constraint. A sketch of the weighted TV term follows this entry.

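    A minimal sketch of the spatially weighted TV term: each pixel's gradient magnitude is scaled by a weight that is small on edges and close to one in flat regions. The paper derives the weight from its difference-curvature indicator; the gradient-based weight below is only a stand-in to show where such a weight enters the energy.

    ```python
    import numpy as np

    def weighted_tv(u, w):
        """Spatially weighted total variation: sum_ij w_ij * |grad u|_ij."""
        gx = np.diff(u, axis=1, append=u[:, -1:])   # forward differences
        gy = np.diff(u, axis=0, append=u[-1:, :])
        return float(np.sum(w * np.sqrt(gx**2 + gy**2)))

    def placeholder_weight(u, k=10.0):
        """Stand-in for the difference-curvature weight: small on edges
        (regularize less, preserve them), near 1 in flat regions."""
        gx = np.diff(u, axis=1, append=u[:, -1:])
        gy = np.diff(u, axis=0, append=u[-1:, :])
        return 1.0 / (1.0 + k * (gx**2 + gy**2))
    ```
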
  • 20. A Low-Complexity Embedded Compression Codec Design With Rate Control for High-Definition Video

    Publication Year: 2015 , Page(s): 674 - 687

    A hardwired embedded compression engine targeting the reduction of full high-definition (HD) video transmission bandwidth over wireless networks is developed. It adopts an intra-coding framework and supports both lossless and rate-controlled near-lossless compression options. The lossless compression algorithm is based on a simplified Context-Based, Adaptive, Lossless Image Coding (CALIC) scheme featuring pixelwise gradient-adjusted prediction and an error-feedback mechanism. To reduce the implementation complexity, an adaptive Golomb-Rice coding scheme in conjunction with a context modeling technique is used in lieu of an adaptive arithmetic coder. With prediction adjustment, the near-lossless compression option can be implemented on top of the lossless compression engine with minimal overhead. An efficient bit-rate control scheme is also developed and supports rate- or distortion-constrained control. For full HD (previously encoded) and non-full-HD test sequences, the lossless compression ratio of the proposed scheme is, on average, 21% and 46% better, respectively, than those of the Joint Photographic Experts Group Lossless standard (JPEG-LS) and the Fast, Efficient Lossless Image Compression System (FELICS) schemes. The near-lossless compression option can offer an additional 6%-20% bit-rate reduction while keeping the peak signal-to-noise ratio at 50 dB or higher. The codec is further optimized complexity-wise to facilitate a high-throughput chip implementation, featuring a five-stage pipelined architecture and two parallel computing kernels to enhance throughput. Fabricated using a Taiwan Semiconductor Manufacturing Company 90-nm complementary metal-oxide-semiconductor technology, the design operates at 200 MHz and supports a 64 frames/s processing rate for full HD videos. A sketch of Golomb-Rice coding follows this entry.

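    Golomb-Rice coding, used here in place of an arithmetic coder, is simple enough to show directly: a value is split into a unary-coded quotient and a k-bit remainder. The adaptive choice of k and the context modeling described above are omitted from this sketch.

    ```python
    def golomb_rice_encode(value: int, k: int) -> str:
        """Bit string for a non-negative integer with Rice parameter k >= 1:
        unary quotient (q ones and a terminating zero) followed by a k-bit
        binary remainder."""
        q = value >> k
        r = value & ((1 << k) - 1)
        return "1" * q + "0" + format(r, f"0{k}b")

    # Example: value 9, k = 2  ->  quotient 2 ('110'), remainder 1 ('01')
    print(golomb_rice_encode(9, 2))   # '11001'
    ```
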
  • 21. Region-Based Saliency Detection and Its Application in Object Recognition

    Publication Year: 2014 , Page(s): 769 - 779
    Cited by:  Papers (1)

    The objective of this paper is twofold. First, we introduce an effective region-based solution for saliency detection. Then, we apply the achieved saliency map to better encode image features for solving the object recognition task. To find perceptually and semantically meaningful salient regions, we extract superpixels based on an adaptive mean shift algorithm as the basic elements for saliency detection. The saliency of each superpixel is measured by its spatial compactness, which is calculated according to the results of Gaussian mixture model (GMM) clustering. To propagate saliency between similar clusters, we adopt a modified PageRank algorithm to refine the saliency map. Our method not only improves saliency detection through large salient region detection and noise tolerance in messy backgrounds, but also generates saliency maps with a well-defined object shape. Experimental results demonstrate the effectiveness of our method. Since objects usually correspond to salient regions, and these regions usually play more important roles in object recognition than the background, we apply the achieved saliency map to object recognition by incorporating it into the sparse coding-based spatial pyramid matching (ScSPM) image representation. To learn a more discriminative codebook and better encode the features corresponding to object patches, we propose a weighted sparse coding for feature coding. Moreover, we also propose a saliency-weighted max pooling to further emphasize the importance of salient regions in the feature pooling module. Experimental results on several datasets illustrate that our weighted ScSPM framework greatly outperforms the ScSPM framework and achieves excellent performance for object recognition.

  • 22. Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard

    Publication Year: 2003 , Page(s): 620 - 636
    Cited by:  Papers (353)  |  Patents (128)

    Context-based adaptive binary arithmetic coding (CABAC) as a normative part of the new ITU-T/ISO/IEC standard H.264/AVC for video compression is presented. By combining an adaptive binary arithmetic coding technique with context modeling, a high degree of adaptation and redundancy reduction is achieved. The CABAC framework also includes a novel low-complexity method for binary arithmetic coding and probability estimation that is well suited for efficient hardware and software implementations. CABAC significantly outperforms the baseline entropy coding method of H.264/AVC over the typical range of envisaged target applications. For a set of test sequences representing typical material used in broadcast applications and for a range of acceptable video quality of about 30 to 38 dB, average bit-rate savings of 9%-14% are achieved.

  • 23. Overview of the Scalable Video Coding Extension of the H.264/AVC Standard

    Publication Year: 2007 , Page(s): 1103 - 1120
    Cited by:  Papers (1218)  |  Patents (54)

    With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.

  • 24. Novel FPGA Implementation of Hand Sign Recognition System With SOM–Hebb Classifier

    Publication Year: 2015 , Page(s): 153 - 166

    This paper proposes a hardware posture recognition system with a hybrid network. The hybrid network consists of a self-organizing map (SOM) and a Hebbian network. Feature vectors are extracted from input posture images and mapped to a lower dimensional map of neurons in the SOM. The Hebbian network is a single-layer feedforward neural network trained with a Hebbian learning algorithm to identify categories. The recognition algorithm is robust to changes in the location of hand signs, but it is not immune to rotation or scaling. Its robustness to rotation and scaling is improved by adding perturbation to the training data for the SOM-Hebb classifier. In addition, neuron culling is proposed to improve performance. The whole system is implemented on a field-programmable gate array employing a novel video processing architecture. The system was designed to recognize 24 American Sign Language hand signs, and its feasibility was verified through both simulations and experiments. The experimental results revealed that the system could accomplish recognition at a speed of 60 frames/s while achieving an accuracy of 97.1%. Owing to the novel hardware implementation, the circuit size of the proposed system is very small, making it highly suitable for embedded applications.

  • 25. A Fast CU Size Decision Algorithm for HEVC

    Publication Year: 2015 , Page(s): 411 - 421
    Cited by:  Papers (1)

    High Efficiency Video Coding (HEVC) employs a coding unit (CU), prediction unit (PU), and transform unit (TU) based on the quadtree coding tree unit (CTU) structure to improve coding efficiency. However, the computational complexity increases greatly because the rate-distortion (RD) optimization process must be performed for all CUs, PUs, and TUs to obtain the optimal CTU partition. In this paper, a fast CU size decision algorithm is proposed to reduce the encoder complexity of HEVC. Based on statistical analysis, three approaches are considered: SKIP mode decision (SMD), CU skip estimation (CUSE), and early CU termination (ECUT). SMD determines whether the modes other than SKIP need to be evaluated. CUSE and ECUT determine whether larger and smaller CU sizes, respectively, are coded. Thresholds for SMD, CUSE, and ECUT are designed based on Bayes' rule with a complexity factor. An update process is performed to estimate the statistical parameters for SMD, CUSE, and ECUT, considering the characteristics of the RD cost. The experimental results demonstrate that the proposed CU size decision algorithm significantly reduces computational complexity by 69% on average with a 2.99% Bjøntegaard difference bitrate (BDBR) increase for random access. The complexity reduction and BDBR increase for low delay are 68% and 2.46%, respectively. The experimental results also show that our proposed scheme performs well for sequences with various characteristics and outperforms two previous state-of-the-art works.

  • 26. How iris recognition works

    Publication Year: 2004 , Page(s): 21 - 30
    Cited by:  Papers (500)  |  Patents (31)

    Algorithms developed by the author for recognizing persons by their iris patterns have now been tested in many field and laboratory trials, producing no false matches in several million comparison tests. The recognition principle is the failure of a test of statistical independence on iris phase structure encoded by multi-scale quadrature wavelets. The combinatorial complexity of this phase information across different persons spans about 249 degrees of freedom and generates a discrimination entropy of about 3.2 bits/mm2 over the iris, enabling real-time decisions about personal identity with extremely high confidence. The high confidence levels are important because they allow very large databases to be searched exhaustively (one-to-many "identification mode") without making false matches, despite so many chances. Biometrics that lack this property can only survive one-to-one ("verification") or a few comparisons. The paper explains the iris recognition algorithms and presents results of 9.1 million comparisons among eye images from trials in Britain, the USA, Japan, and Korea. A sketch of the masked Hamming distance comparison follows this entry.

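    The test of statistical independence reduces to a masked, fractional Hamming distance between two binary iris codes. The sketch below shows that comparison; rotation compensation and the score normalization used in practice are omitted.

    ```python
    import numpy as np

    def iris_hamming_distance(code_a, code_b, mask_a, mask_b):
        """Fraction of disagreeing bits, counted only where both masks mark
        the bits as valid (unoccluded by eyelids, lashes, reflections)."""
        valid = mask_a & mask_b
        disagree = (code_a ^ code_b) & valid
        return np.count_nonzero(disagree) / np.count_nonzero(valid)

    # Toy example with 2048-bit boolean codes and full masks.
    rng = np.random.default_rng(0)
    a = rng.integers(0, 2, 2048, dtype=np.uint8).astype(bool)
    b = a.copy()
    b[:100] ^= True                                  # flip 100 bits
    m = np.ones(2048, dtype=bool)
    print(iris_hamming_distance(a, b, m, m))         # 100 / 2048 ~= 0.049
    ```
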
  • 27. Reconfigurable Processor for Binary Image Processing

    Publication Year: 2013 , Page(s): 823 - 831
    Cited by:  Papers (3)

    Binary image processing is a powerful tool in many image and video applications. A reconfigurable processor for binary image processing is presented in this paper. The processor's architecture is a combination of a reconfigurable binary processing module, input and output image control units, and peripheral circuits. The reconfigurable binary processing module, which consists of mixed-grained reconfigurable binary compute units and output control logic, performs binary image processing operations, especially mathematical morphology operations, and implements related algorithms at more than 200 frames/s for a 1024 × 1024 image. The peripheral circuits control the whole image processing and dynamic reconfiguration process. The processor is implemented on an EP2S180 field-programmable gate array. Synthesis results show that the presented processor can deliver 60.72 GOPS and 23.72 GOPS/mm2 at a 220-MHz system clock in the SMIC 0.18-μm CMOS process. The simulation and experimental results demonstrate that the processor is suitable for real-time binary image processing applications. A software sketch of the underlying morphology operations follows this entry.

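    For reference, the mathematical-morphology operations that dominate the processor's workload look as follows in software (SciPy, 3 x 3 structuring element); this is purely illustrative of the operations, not of the reconfigurable hardware.

    ```python
    import numpy as np
    from scipy import ndimage

    img = np.zeros((8, 8), dtype=bool)
    img[2:6, 2:6] = True                     # a small square object

    se = np.ones((3, 3), dtype=bool)         # 3 x 3 structuring element
    eroded  = ndimage.binary_erosion(img, structure=se)
    dilated = ndimage.binary_dilation(img, structure=se)
    opened  = ndimage.binary_dilation(eroded, structure=se)   # opening = erode, then dilate
    ```
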
  • 28. Sample Adaptive Offset in the HEVC Standard

    Publication Year: 2012 , Page(s): 1755 - 1764
    Cited by:  Papers (26)

    This paper provides a technical overview of a newly added in-loop filtering technique, sample adaptive offset (SAO), in High Efficiency Video Coding (HEVC). The key idea of SAO is to reduce sample distortion by first classifying reconstructed samples into different categories, obtaining an offset for each category, and then adding the offset to each sample of the category. The offset of each category is properly calculated at the encoder and explicitly signaled to the decoder for reducing sample distortion effectively, while the classification of each sample is performed at both the encoder and the decoder for saving side information significantly. To achieve low latency of only one coding tree unit (CTU), a CTU-based syntax design is specified to adapt SAO parameters for each CTU. A CTU-based optimization algorithm can be used to derive the SAO parameters of each CTU, and the SAO parameters of the CTU are interleaved into the slice data. It is reported that SAO achieves on average a 3.5% BD-rate reduction, and up to a 23.5% BD-rate reduction, with less than 1% encoding time increase and about 2.5% decoding time increase under the common test conditions of HEVC reference software version 8.0. A sketch of the edge-offset classification follows this entry.

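    The edge-offset classification at the heart of SAO compares each reconstructed sample with its two neighbors along the signaled direction and sorts it into valley, corner, peak, or "no offset" categories. The sketch below shows that classification and the offset addition for one row of samples; CTU-level parameter signaling, band offset, and clipping to the real bit depth are simplified.

    ```python
    import numpy as np

    def sao_edge_category(c, a, b):
        s = int(np.sign(int(c) - int(a)) + np.sign(int(c) - int(b)))
        # -2: local valley, -1/+1: concave/convex corner, +2: local peak, 0: none
        return {-2: 1, -1: 2, 0: 0, 1: 3, 2: 4}[s]

    def apply_edge_offsets(row, offsets):
        """offsets: dict mapping categories 1..4 to the offsets derived at the
        encoder and signaled to the decoder (horizontal direction, borders kept)."""
        out = row.astype(np.int32)
        for x in range(1, len(row) - 1):
            cat = sao_edge_category(row[x], row[x - 1], row[x + 1])
            if cat:
                out[x] += offsets[cat]
        return np.clip(out, 0, 255).astype(row.dtype)
    ```
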
  • 29. Fast Intra Mode Decision for High Efficiency Video Coding (HEVC)

    Publication Year: 2014 , Page(s): 660 - 668
    Cited by:  Papers (2)

    The latest High Efficiency Video Coding (HEVC) standard requires only 50% of the H.264/AVC bit-rate at the same perceptual quality, but with a significant increase in encoder complexity. Hence, it is necessary and inevitable to develop fast HEVC encoding algorithms for its potential market adoption. In this paper, we propose a fast intra mode decision for the HEVC encoder. The overall fast intra mode decision algorithm consists of both micro- and macro-level schemes. At the micro-level, we propose the Hadamard cost-based progressive rough mode search (pRMS) to selectively check the potential modes instead of traversing all candidates (i.e., up to 35 in HEVC). Fewer effective candidates are chosen by the pRMS for the subsequent rate-distortion optimized quantization (RDOQ) to derive the rate-distortion (R-D) optimal mode. An early RDOQ skip method is also introduced to further reduce complexity. At the macro-level, we introduce early coding unit (CU) split termination if the estimated R-D cost [through aggregated R-D costs of (partial) sub-CUs] is already larger than the R-D cost of the current CU. On average, the proposed fast intra mode decision provides about 2.5× speedup (without any platform or source code level optimization) with just a 1.0% Bjontegaard delta rate (BD-rate) increase under the HEVC common test conditions. Moreover, our proposed solution also demonstrates state-of-the-art performance in comparison with other works.

  • 30. Machine Recognition of Human Activities: A Survey

    Publication Year: 2008 , Page(s): 1473 - 1488
    Cited by:  Papers (258)  |  Patents (7)

    The past decade has witnessed a rapid proliferation of video cameras in all walks of life and has resulted in a tremendous explosion of video content. Several applications, such as content-based video annotation and retrieval, highlight extraction, and video summarization, require recognition of the activities occurring in the video. The analysis of human activities in videos is an area with increasingly important consequences, from security and surveillance to entertainment and personal archiving. Several challenges at various levels of processing make this problem hard to solve: robustness against errors in low-level processing, view- and rate-invariant representations at mid-level processing, and semantic representation of human activities at higher-level processing. In this review paper, we present a comprehensive survey of efforts in the past couple of decades to address the problems of representation, recognition, and learning of human activities from video and related applications. We discuss the problem at two major levels of complexity: 1) "actions" and 2) "activities." "Actions" are characterized by simple motion patterns typically executed by a single human. "Activities" are more complex and involve coordinated actions among a small number of humans. We discuss several approaches and classify them according to their ability to handle varying degrees of complexity as interpreted above. We begin with a discussion of approaches that model the simplest action classes, known as atomic or primitive actions, which do not require sophisticated dynamical modeling. Then, methods to model actions with more complex dynamics are discussed. The discussion then leads naturally to methods for higher-level representation of complex activities.

  • 31. Contour Model-Based Hand-Gesture Recognition Using the Kinect Sensor

    Publication Year: 2014 , Page(s): 1935 - 1944
    Cited by:  Papers (1)

    In RGB-D sensor-based pose estimation, training data collection is often a challenging task. In this paper, we propose a new hand motion capture procedure for establishing a real gesture data set. A 14-patch hand partition scheme is designed for color-based semiautomatic labeling. This method is integrated into a vision-based hand gesture recognition framework for developing desktop applications. We use the Kinect sensor to achieve more reliable and accurate tracking under unconstrained conditions. Moreover, a hand contour model is proposed to simplify the gesture matching process and reduce its computational complexity. The framework allows tracking hand gestures in 3-D space and matching gestures with a simple contour model, and thus supports complex real-time interactions. The experimental evaluations and a real-world demo of hand gesture interaction demonstrate the effectiveness of this framework.

  • 32. Robust Background Subtraction for Network Surveillance in H.264 Streaming Video

    Publication Year: 2013 , Page(s): 1695 - 1703

    H.264/Advanced Video Coding (AVC) is the industry standard in network surveillance, offering the lowest bitrate for a given perceptual quality among MPEG and proprietary codecs. This paper presents a novel approach for background subtraction in bitstreams encoded in the Baseline profile of H.264/AVC. Temporal statistics of the proposed feature vectors, describing macroblock units in each frame, are used to select potential candidates containing moving objects. From the candidate macroblocks, foreground pixels are determined by comparing the colors of corresponding pixels pair-wise with a background model. The basic contribution of the current work, compared with related approaches, is that it allows each macroblock to have a different quantization parameter, in view of the requirements of variable- as well as constant-bit-rate applications. Additionally, a low-complexity technique for color comparison is proposed that yields pixel-resolution segmentation at a negligible computational cost compared with classical pixel-based approaches. Results compare favorably with those of proven state-of-the-art pixel-domain algorithms over a diverse set of standardized surveillance sequences.

  • 33. Efficient Mode Decision Schemes for HEVC Inter Prediction

    Publication Year: 2014 , Page(s): 1579 - 1593
    Cited by:  Papers (2)

    The emerging High Efficiency Video Coding (HEVC) standard reduces the bit rate by almost 40% over the preceding state-of-the-art Advanced Video Coding (AVC) standard at the same objective quality, but at about 40% encoding complexity overhead. The main source of HEVC complexity is inter prediction, which accounts for 60%-70% of the whole encoding time. This paper analyzes the rate-distortion-complexity characteristics of HEVC inter prediction as a function of different block partition structures and puts the analysis results into practice by developing optimized mode decision schemes for the HEVC encoder. HEVC inter prediction involves three different partition modes: square motion partitions, symmetric motion partitions (SMPs), and asymmetric motion partitions (AMPs), of which the SMP and AMP decisions are optimized in this paper. The key optimization techniques behind the proposed schemes are: 1) a conditional evaluation of the SMP modes; 2) range limitations primarily in the SMP sizes and secondarily in the AMP sizes; and 3) a selection of the SMP and AMP ranges as a function of the quantization parameter. These three techniques can be seamlessly incorporated into the existing control structures of the HEVC reference encoder without limiting its potential parallelization, hardware acceleration, or speed-up with other existing encoder optimizations. Our experiments show that the proposed schemes cut the average complexity of the HEVC reference encoder by 31%-51% at a cost of a 0.2%-1.3% bit rate increase under the random access coding configuration. The respective values under the low-delay B coding configuration are 32%-50% and 0.3%-1.3%.

  • 34. Ultra-High-Throughput VLSI Architecture of H.265/HEVC CABAC Encoder for UHDTV Applications

    Publication Year: 2015 , Page(s): 497 - 507

    Ultra-high-definition television (UHDTV) imposes extremely high throughput requirements on video encoders based on the High Efficiency Video Coding (H.265/HEVC) and Advanced Video Coding (H.264/AVC) standards. Context-adaptive binary arithmetic coding (CABAC) is the entropy coding component of these standards. In very-large-scale integration implementations, CABAC is known to be difficult to pipeline and parallelize effectively because of the critical bin-to-bin data dependencies in its algorithm. This paper addresses the throughput requirement of CABAC encoding for UHDTV applications. The proposed optimizations, including prenormalization, hybrid path coverage, and lookahead rLPS, reduce the critical path delay of binary arithmetic encoding (BAE) by exploiting the incompleteness of data dependencies in rLPS updating. Meanwhile, the number of bins BAE delivers per clock cycle is increased by the proposed bypass bin splitting technique. The context modeling and binarization components are also optimized. As a result, our CABAC encoder delivers an average of 4.37 bins per clock cycle. Its maximum clock frequency reaches 420 MHz when synthesized in 90 nm. The corresponding overall throughput is 1836 Mbin/s, which is 62.5% higher than the state-of-the-art architecture.

  • 35. Modularity-Based Image Segmentation

    Publication Year: 2015 , Page(s): 570 - 581

    To address the problem of segmenting an image into sizeable homogeneous regions, this paper proposes an efficient agglomerative algorithm based on modularity optimization. Given an oversegmented image that consists of many small regions, our algorithm automatically merges those neighboring regions that produce the largest increase in modularity index. When the modularity of the segmented image is maximized, the algorithm stops merging and produces the final segmented image. To preserve the repetitive patterns in a homogeneous region, we propose a feature based on the histogram of states of image gradients and use it together with the color feature to characterize the similarity of two regions. By constructing the similarity matrix in an adaptive manner, the oversegmentation problem can be effectively avoided. Our algorithm is tested on the publicly available Berkeley Segmentation Data Set as well as a semantic segmentation data set, and compared with other popular algorithms. Experimental results demonstrate that our algorithm produces sizeable segments, preserves repetitive patterns with appealing time complexity, and achieves object-level segmentation to some extent. A sketch of the greedy modularity merging loop follows this entry.

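    The greedy merging loop can be sketched with Newman's modularity bookkeeping: with e the symmetric matrix of inter-region edge-weight fractions and a_i its row sums, Q = sum_k (e_kk - a_k^2), and merging regions i and j changes Q by 2(e_ij - a_i a_j). How e is built from the paper's color and gradient-histogram features is left abstract here.

    ```python
    import numpy as np

    def greedy_modularity_merge(e):
        """e: symmetric matrix of edge-weight fractions between the regions of
        an over-segmented image (entries sum to 1). Returns groups of region
        indices obtained by repeatedly merging the neighboring pair with the
        largest positive modularity gain."""
        groups = [[k] for k in range(len(e))]
        while len(e) > 1:
            a = e.sum(axis=1)
            best_gain, bi, bj = 0.0, -1, -1
            for i in range(len(e)):
                for j in range(i + 1, len(e)):
                    gain = 2.0 * (e[i, j] - a[i] * a[j])
                    if e[i, j] > 0 and gain > best_gain:
                        best_gain, bi, bj = gain, i, j
            if bi < 0:                       # no merge increases modularity: stop
                break
            e[bi] += e[bj]                   # absorb region bj into region bi
            e[:, bi] += e[:, bj]
            e = np.delete(np.delete(e, bj, axis=0), bj, axis=1)
            groups[bi] += groups.pop(bj)
        return groups
    ```
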
  • 36. Efficient Feature Selection and Classification for Vehicle Detection

    Publication Year: 2015 , Page(s): 508 - 517
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1920 KB) |  | HTML iconHTML  

    The focus of this paper is on the problem of Haar-like feature selection and classification for vehicle detection. Haar-like features are particularly attractive for vehicle detection because they form a compact representation, encode edge and structural information, capture information from multiple scales, and especially can be computed efficiently. Due to the large-scale nature of the Haar-like feature pool, we present a rapid and effective feature selection method via AdaBoost by combining a sample's feature value with its class label. Our approach is analyzed theoretically and empirically to show its efficiency. Then, an improved normalization algorithm for the selected feature values is designed to reduce the intra-class difference, while increasing the inter-class variability. Experimental results demonstrate that the proposed approaches not only speed up the feature selection process with AdaBoost, but also yield better detection performance than the state-of-the-art methods. View full abstract»

  • 37. High Throughput CABAC Entropy Coding in HEVC

    Publication Year: 2012 , Page(s): 1778 - 1791
    Cited by:  Papers (12)

    Context-adaptive binary arithmetic coding (CABAC) is a method of entropy coding first introduced in H.264/AVC and now used in the newest standard, High Efficiency Video Coding (HEVC). While it provides high coding efficiency, the data dependencies in H.264/AVC CABAC make it challenging to parallelize and thus limit its throughput. Accordingly, during the standardization of entropy coding for HEVC, both coding efficiency and throughput were considered. This paper highlights the key techniques that were used to enable HEVC to potentially achieve higher throughput while delivering coding gains relative to H.264/AVC. These techniques include reducing context coded bins, grouping bypass bins, grouping bins with the same context, reducing context selection dependencies, reducing total bins, and reducing parsing dependencies. It also describes reductions to memory requirements that benefit both throughput and implementation costs. Proposed and adopted techniques up to the draft international standard (test model HM-8.0) are discussed. In addition, analysis and simulation results are provided to quantify the throughput improvements and memory reduction compared with H.264/AVC. In HEVC, the maximum number of context-coded bins is reduced by 8×, and the context memory and line buffer are reduced by 3× and 20×, respectively. This paper illustrates that accounting for implementation cost when designing video coding algorithms can result in a design that enables higher processing speed and lowers hardware costs, while still delivering high coding efficiency. View full abstract»
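
    One of the listed techniques, grouping bypass bins, is easy to illustrate: bypass-coded bins need no context lookup or range subdivision, so a group of them can be folded into the coder state at once. The toy sketch below shows this property; renormalization and carry handling are omitted, and the code is not the HEVC reference implementation.

        # Toy bypass-bin grouping (illustrative; no renormalization or carries).
        # For bypass coding, low = 2*low + bin*range with the range unchanged,
        # so k grouped bins collapse to one shift and one multiply.
        def encode_bypass_group(low, rng, bins):
            value = 0
            for b in bins:                    # pack the grouped bins into an integer
                value = (value << 1) | b
            return (low << len(bins)) + value * rng

        print(encode_bypass_group(low=0, rng=256, bins=[1, 0, 1, 1]))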

  • 38. Video-Based Human Behavior Understanding: A Survey

    Publication Year: 2013 , Page(s): 1993 - 2008
    Cited by:  Papers (3)

    Understanding human behaviors is a challenging problem in computer vision that has recently seen important advances. Human behavior understanding combines image and signal processing, feature extraction, machine learning, and 3-D geometry. Application scenarios range from surveillance to indexing and retrieval, from patient care to industrial safety and sports analysis. Given the broad set of techniques used in video-based behavior understanding and the fast progress in this area, in this paper we organize and survey the corresponding literature, define unambiguous key terms, and discuss links among fundamental building blocks ranging from human detection to action and interaction recognition. The advantages and the drawbacks of the methods are critically discussed, providing a comprehensive coverage of key aspects of video-based human behavior understanding, available datasets for experimentation and comparisons, and important open research issues. View full abstract»

  • 39. Content-Based Image Retrieval Using Error Diffusion Block Truncation Coding Features

    Publication Year: 2015 , Page(s): 466 - 481

    This paper presents a new approach to indexing color images using features extracted from error diffusion block truncation coding (EDBTC). The EDBTC produces two color quantizers and a bitmap image, which are further processed using vector quantization (VQ) to generate the image feature descriptor. Herein, two features are introduced, namely, the color histogram feature (CHF) and the bit pattern histogram feature (BHF), to measure the similarity between a query image and the target image in the database. The CHF and BHF are computed from the VQ-indexed color quantizer and the VQ-indexed bitmap image, respectively. The distance computed from the CHF and BHF can be utilized to measure the similarity between two images. As documented in the experimental results, the proposed indexing method outperforms former block-truncation-coding-based image indexing methods and other existing image retrieval schemes on natural and textural data sets. Thus, the proposed EDBTC not only shows good capability for image compression but also offers an effective way to index images for content-based image retrieval systems. View full abstract»
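
    The retrieval step can be pictured as comparing normalized histograms of VQ indices. The sketch below builds such histograms and a weighted L1 distance; the index arrays are random stand-ins for the EDBTC color-quantizer and bitmap indices, and the weighting parameter is an assumption rather than the paper's similarity measure.

        import numpy as np

        # Histogram descriptors over VQ indices and a simple weighted distance.
        def index_histogram(indices, codebook_size):
            h = np.bincount(indices.ravel(), minlength=codebook_size).astype(float)
            return h / max(h.sum(), 1.0)                  # normalized histogram

        def descriptor_distance(chf_q, bhf_q, chf_t, bhf_t, alpha=0.5):
            # alpha balances the color (CHF) and bit-pattern (BHF) terms (assumed value)
            return alpha * np.abs(chf_q - chf_t).sum() + (1 - alpha) * np.abs(bhf_q - bhf_t).sum()

        q_color = index_histogram(np.random.randint(0, 64, (32, 32)), 64)
        t_color = index_histogram(np.random.randint(0, 64, (32, 32)), 64)
        q_bits = index_histogram(np.random.randint(0, 32, (32, 32)), 32)
        t_bits = index_histogram(np.random.randint(0, 32, (32, 32)), 32)
        print(descriptor_distance(q_color, q_bits, t_color, t_bits))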

  • 40. An Overview of Information Hiding in H.264/AVC Compressed Video

    Publication Year: 2014 , Page(s): 305 - 319

    Information hiding refers to the process of inserting information into a host to serve specific purpose(s). In this paper, information hiding methods in the H.264/AVC compressed video domain are surveyed. First, the general framework of information hiding is conceptualized by relating the state of an entity to a meaning (i.e., sequences of bits). This concept is illustrated by using various data representation schemes such as bit plane replacement, spread spectrum, histogram manipulation, divisibility, mapping rules, and matrix encoding. Venues at which information hiding takes place are then identified, including prediction process, transformation, quantization, and entropy coding. Related information hiding methods at each venue are briefly reviewed, along with the presentation of the targeted applications, appropriate diagrams, and references. A timeline diagram is constructed to chronologically summarize the invention of information hiding methods in the compressed still image and video domains since 1992. A comparison among the considered information hiding methods is also conducted in terms of venue, payload, bitstream size overhead, video quality, computational complexity, and video criteria. Further perspectives and recommendations are presented to provide a better understanding of the current trend of information hiding and to identify new opportunities for information hiding in compressed video. View full abstract»
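
    Of the data representation schemes listed above, bit plane replacement is the simplest to demonstrate. The sketch below hides bits in the least significant bit plane of generic 8-bit samples; it is a textbook example, not a method tied to any particular H.264/AVC syntax element.

        import numpy as np

        # Generic LSB bit-plane replacement (illustrative).
        def embed_lsb(samples, bits):
            out = samples.copy()
            flat = out.ravel()                            # view into 'out'
            flat[:len(bits)] = (flat[:len(bits)] & 0xFE) | np.asarray(bits, dtype=flat.dtype)
            return out

        def extract_lsb(samples, n_bits):
            return (samples.ravel()[:n_bits] & 1).tolist()

        host = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
        msg = [1, 0, 1, 1, 0, 0, 1, 0]
        stego = embed_lsb(host, msg)
        assert extract_lsb(stego, len(msg)) == msg
        print(stego ^ host)                               # only least significant bits differ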

  • 41. Person Re-Identification with Reference Descriptor

    Publication Year: 2015 , Page(s): 1

    Person identification across non-overlapping cameras, also known as person re-identification, aims to match people at different times and locations. Re-identifying people is of great importance in crucial applications such as wide-area surveillance and visual tracking. Due to the appearance variations in pose, illumination, and occlusion across camera views, person re-identification is inherently difficult. To address these challenges, a reference-based method is proposed for person re-identification across different cameras. Instead of directly matching people by their appearance, the matching is conducted in a reference space where the descriptor for a person is translated from the original color or texture descriptors into similarity measures between this person and the exemplars in the reference set. A subspace is learned in which the correlations of the reference data from different cameras are maximized using Regularized Canonical Correlation Analysis (RCCA). For re-identification, the gallery data and the probe data are projected into this RCCA subspace, and the reference descriptors (RDs) of the gallery and probe are generated by computing their similarity to the reference data. The identity of a probe is determined by comparing the RD of the probe with the RDs of the gallery. A re-ranking step is added to further improve the results using a saliency-based matching scheme. Experiments on publicly available datasets show that the proposed method outperforms most of the state-of-the-art approaches. View full abstract»
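
    A minimal sketch of the reference-descriptor idea is given below: both camera views are projected into a correlation-maximizing subspace and each person is then described by similarities to the projected reference exemplars. Plain CCA from scikit-learn stands in for the paper's regularized CCA, cosine similarity is assumed, and the data are synthetic.

        import numpy as np
        from sklearn.cross_decomposition import CCA   # stand-in for regularized CCA

        rng = np.random.default_rng(0)
        ref_a = rng.normal(size=(60, 20))             # reference persons, camera A features
        ref_b = ref_a @ rng.normal(size=(20, 20)) + 0.1 * rng.normal(size=(60, 20))

        cca = CCA(n_components=5)
        cca.fit(ref_a, ref_b)
        ref_a_c, ref_b_c = cca.transform(ref_a, ref_b)

        def reference_descriptor(x, projected_refs):
            sims = projected_refs @ x                 # similarity to each reference exemplar
            return sims / (np.linalg.norm(projected_refs, axis=1) * np.linalg.norm(x) + 1e-12)

        probe = cca.transform(rng.normal(size=(1, 20)))[0]                              # camera A probe
        gallery = cca.transform(rng.normal(size=(5, 20)), rng.normal(size=(5, 20)))[1]  # camera B gallery
        rd_p = reference_descriptor(probe, ref_a_c)
        rd_g = [reference_descriptor(g, ref_b_c) for g in gallery]
        scores = [rd_p @ r / (np.linalg.norm(rd_p) * np.linalg.norm(r) + 1e-12) for r in rd_g]
        print("best gallery match:", int(np.argmax(scores)))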

  • 42. Binary Descriptor Based Nonparametric Background Modeling for Foreground Extraction by Using Detection Theory

    Publication Year: 2015 , Page(s): 595 - 608

    Most recent background modeling approaches represent distributions of background changes using parametric models such as Gaussian mixture models. Because of significant illumination changes and dynamic moving backgrounds over time, variations of background changes are hard to model with parametric background models. Moreover, how to efficiently and effectively update the parameters of parametric models to reflect background changes remains a problem. In this paper, we propose a novel coarse-to-fine algorithm based on detection theory to extract foreground objects on the basis of nonparametric background and foreground models represented by binary descriptors. We update the background and foreground models with a first-in-first-out strategy to maintain the most recently observed background and foreground instances. As shown in the experiments, our method achieves better foreground extraction results and fewer false alarms on surveillance videos with lighting changes and dynamic backgrounds in both our collected data set and the CDnet 2012 benchmark data set. View full abstract»
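
    The sketch below shows the nonparametric flavor of such a model for a single pixel: the last K binary descriptors are kept in a first-in-first-out buffer and a new observation is declared foreground when it is far, in Hamming distance, from almost all stored samples. The 8-bit LBP-style descriptor and all thresholds are assumptions, not the paper's descriptor.

        import numpy as np
        from collections import deque

        K, HAMMING_T, MIN_MATCHES = 20, 2, 2          # assumed model parameters

        def lbp8(patch):
            """3x3 patch -> 8-bit binary descriptor (sign of neighbor minus center)."""
            c = patch[1, 1]
            neigh = np.delete(patch.ravel(), 4)
            return int(np.packbits((neigh >= c).astype(np.uint8))[0])

        def is_foreground(model, desc):
            matches = sum(bin(desc ^ m).count("1") <= HAMMING_T for m in model)
            return matches < MIN_MATCHES

        model = deque(maxlen=K)                       # first-in-first-out background model
        patch = np.random.randint(0, 256, (3, 3))
        for _ in range(K):                            # bootstrap with noisy background samples
            model.append(lbp8(patch + np.random.randint(-3, 4, (3, 3))))
        print(is_foreground(model, lbp8(patch)))          # likely False (background)
        print(is_foreground(model, lbp8(255 - patch)))    # likely True (changed appearance)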

  • 43. Secure Reversible Image Data Hiding over Encrypted Domain via Key Modulation

    Publication Year: 2015 , Page(s): 1

    This work proposes a novel reversible image data hiding (RIDH) scheme over the encrypted domain. The data embedding is achieved through a public key modulation mechanism, in which access to the secret encryption key is not needed. At the decoder side, a powerful two-class SVM classifier is designed to distinguish encrypted and non-encrypted image patches, allowing us to jointly decode the embedded message and the original image signal. Compared with the state-of-the-art methods, the proposed approach provides higher embedding capacity and is able to perfectly reconstruct the original image as well as the embedded message. Extensive experimental results are provided to validate the superior performance of our scheme. View full abstract»
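
    The public key modulation idea can be pictured as follows: the data hider XORs each encrypted block with one of 2^n publicly known keys, so the chosen key index carries n message bits without any access to the encryption key. The toy sketch below shows only the embedding side; the block size, n, and the keys are assumptions, and the paper's SVM-based joint decoder is not reproduced.

        import numpy as np

        rng = np.random.default_rng(1)
        BLOCK, N_BITS = 16, 2                                  # assumed sizes
        PUBLIC_KEYS = rng.integers(0, 2, size=(2 ** N_BITS, BLOCK), dtype=np.uint8)
        PUBLIC_KEYS[0] = 0                                     # key 0 leaves a block unchanged

        def embed(encrypted_blocks, message_bits):
            out = encrypted_blocks.copy()
            for i in range(len(message_bits) // N_BITS):
                chunk = message_bits[i * N_BITS:(i + 1) * N_BITS]
                key_idx = int("".join(map(str, chunk)), 2)     # n bits select one public key
                out[i] ^= PUBLIC_KEYS[key_idx]                 # no secret key needed here
            return out

        cipher = rng.integers(0, 2, size=(4, BLOCK), dtype=np.uint8)   # stand-in ciphertext bits
        marked = embed(cipher, [1, 0, 1, 1])
        print((marked ^ cipher).sum(axis=1))                   # only the first two blocks change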

  • 44. Edge-Directed Single-Image Super-Resolution Via Adaptive Gradient Magnitude Self-Interpolation

    Publication Year: 2013 , Page(s): 1289 - 1299
    Cited by:  Papers (5)

    Super-resolution from a single image plays an important role in many computer vision systems. However, it is still a challenging task, especially in preserving local edge structures. To construct high-resolution images while preserving the sharp edges, an effective edge-directed super-resolution method is presented in this paper. An adaptive self-interpolation algorithm is first proposed to estimate a sharp high-resolution gradient field directly from the input low-resolution image. The obtained high-resolution gradient is then regarded as a gradient constraint or an edge-preserving constraint to reconstruct the high-resolution image. Extensive results have shown both qualitatively and quantitatively that the proposed method can produce convincing super-resolution images containing complex and sharp features, as compared with the other state-of-the-art super-resolution algorithms. View full abstract»
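
    The reconstruction step can be sketched as gradient descent on a two-term energy: the downsampled estimate should match the low-resolution input, and the estimate's gradients should match a target high-resolution gradient field. In the rough sketch below the target gradients are just replicated low-resolution gradients standing in for the paper's adaptively self-interpolated field, the divergence is approximated crudely with another finite difference, and the scale factor, weight, and step size are assumptions.

        import numpy as np

        def reconstruct(L, s=2, beta=0.2, steps=200, lr=0.2):
            H = np.kron(L, np.ones((s, s)))                         # initial upsampling
            Gx = np.kron(np.gradient(L, axis=1), np.ones((s, s)))   # stand-in target gradients
            Gy = np.kron(np.gradient(L, axis=0), np.ones((s, s)))
            for _ in range(steps):
                grad = np.zeros_like(H)
                grad[::s, ::s] += H[::s, ::s] - L                   # data fidelity term
                rx = np.gradient(H, axis=1) - Gx                    # gradient-constraint residuals
                ry = np.gradient(H, axis=0) - Gy
                grad -= beta * (np.gradient(rx, axis=1) + np.gradient(ry, axis=0))
                H -= lr * grad                                      # descend on the energy
            return H

        L = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
        print(reconstruct(L).shape)                                 # (16, 16)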

  • 45. Random-Grid-Based Visual Cryptography Schemes

    Publication Year: 2014 , Page(s): 733 - 744

    This paper discusses a random-grid-based nonexpanded visual cryptography scheme for generating both meaningful and noise-like shares. First, the distribution of black pixels on the share images and the stack image is analyzed. A probability allocation method is then proposed that is capable of producing the best contrast in both the share images and the stack image. With our method, not only can different cover images be used to hide the secret image, but the contrast can be adjusted as needed. The most important result is the improvement of the visual quality of both the share images and the stack image to their theoretical maximum. Our meaningful visual secret sharing method is shown in experiments to be superior to past methods. View full abstract»
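
    For context, the sketch below implements the basic noise-like (2, 2) random-grid scheme that probability-allocation methods build on: each share pixel is random, and the secret is revealed only when the shares are stacked (OR-ed). The paper's meaningful-share and contrast-tuning construction is not reproduced.

        import numpy as np

        rng = np.random.default_rng(7)

        def encrypt(secret):                       # secret: binary array, 1 = black pixel
            share1 = rng.integers(0, 2, size=secret.shape)
            share2 = np.where(secret == 0, share1, 1 - share1)
            return share1, share2

        def stack(share1, share2):                 # stacking printed shares acts like OR
            return share1 | share2

        secret = np.zeros((4, 8), dtype=int)
        secret[:, 4:] = 1                          # right half of the secret is black
        s1, s2 = encrypt(secret)
        stacked = stack(s1, s2)
        print(stacked[:, :4].mean(), stacked[:, 4:].mean())   # about 0.5 vs exactly 1.0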

  • 46. A New Data Transfer Method via Signal-Rich-Art Code Images Captured by Mobile Devices

    Publication Year: 2015 , Page(s): 688 - 700

    A new type of signal-rich-art image for data transfer applications, called the signal-rich-art code image, is proposed. The created code image is visually similar to a preselected target image and, with a given message embedded, achieves the effect of so-called signal-rich art. With a function similar to that of a QR code, such an image is produced by encoding the message into a binary bit stream, representing the bits by binary code patterns of 2 × 2 blocks, and injecting the patterns into the target image by a novel image-block luminance modulation scheme. Each signal-rich-art code image may be printed or displayed and then recaptured using a mobile-device camera. Techniques for counting the number of pattern blocks and recognizing code patterns are also proposed for message extraction from the recaptured version of the signal-rich-art code image. Experimental results and a comparison with an existing alternative method show the feasibility and superiority of the proposed data transfer method. View full abstract»
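
    The block luminance modulation step can be pictured with a toy example: each message bit selects a 2 x 2 plus/minus pattern that is added to a block of the target image, and the decoder recovers the bit by correlating the mean-removed block with both patterns. The concrete patterns, the strength, and the decoder below are assumptions, not the paper's scheme.

        import numpy as np

        P0 = np.array([[1, -1], [-1, 1]], dtype=float)     # assumed pattern for bit 0
        P1 = -P0                                           # assumed pattern for bit 1
        DELTA = 6.0                                        # assumed modulation strength

        def embed(block, bit):
            return np.clip(block + DELTA * (P1 if bit else P0), 0, 255)

        def extract(block):
            centred = block - block.mean()                 # remove the block's own luminance
            return int((centred * P1).sum() > (centred * P0).sum())

        target = np.full((2, 2), 120.0)                    # one luminance block of the target image
        for b in (0, 1):
            assert extract(embed(target, b)) == b
        print("round trip ok")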

  • 47. Parallel H.264/AVC Fast Rate-Distortion Optimized Motion Estimation by Using a Graphics Processing Unit and Dedicated Hardware

    Publication Year: 2015 , Page(s): 701 - 715

    Heterogeneous systems on a single chip composed of a central processing unit (CPU), graphics processing unit (GPU), and field-programmable gate array (FPGA) are expected to emerge in the near future. In this context, the system on chip can be dynamically adapted to employ different architectures for the execution of data-intensive applications. Motion estimation (ME) is one such task that can be accelerated using an FPGA and a GPU for high-performance H.264/Advanced Video Coding encoder implementation. This paper presents an inherently parallel, low-complexity, rate-distortion (RD) optimized fast ME algorithm well suited to parallel implementation, which eliminates various data dependencies caused by a reliance on spatial predictions. In addition, this paper provides details of the GPU and FPGA implementations of the parallel algorithm using OpenCL and the Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL), respectively, and presents a practical performance comparison between the two implementations. The experimental results show that the proposed scheme achieves significant speedup on both the GPU and the FPGA and has RD performance comparable to that of the sequential fast ME algorithm. View full abstract»
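
    At the core of such accelerators is a per-block candidate search. The sketch below is a plain SAD-based full search in Python, the kind of regular, data-parallel kernel that maps well to a GPU or FPGA; the paper's RD cost and its removal of spatial-prediction dependencies are not reproduced, and the search range here is arbitrary.

        import numpy as np

        def full_search(cur_block, ref_frame, top, left, search_range=4):
            n = cur_block.shape[0]
            best = (None, float("inf"))
            for dy in range(-search_range, search_range + 1):
                for dx in range(-search_range, search_range + 1):
                    y, x = top + dy, left + dx
                    if 0 <= y and 0 <= x and y + n <= ref_frame.shape[0] and x + n <= ref_frame.shape[1]:
                        cand = ref_frame[y:y + n, x:x + n]
                        sad = np.abs(cur_block.astype(int) - cand.astype(int)).sum()
                        if sad < best[1]:
                            best = ((dy, dx), sad)
            return best

        ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
        cur = ref[10:26, 20:36]                            # block whose true offset is (2, 4)
        print(full_search(cur, ref, top=8, left=16))       # expect ((2, 4), 0)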

  • 48. HEVC Encoding Optimization Using Multi-core CPUs and GPUs

    Publication Year: 2015 , Page(s): 1

    Although the High Efficiency Video Coding (HEVC) standard significantly improves the coding efficiency of video compression, it is unacceptable, even in offline applications, to spend several hours compressing 10 seconds of High Definition (HD) video. In this paper, we propose using a multi-core Central Processing Unit (CPU) and an off-the-shelf Graphics Processing Unit (GPU) with 3,072 streaming processors (SPs) for fast HEVC encoding, so that the speed optimization does not result in a loss of coding efficiency. There are two key technical contributions in this paper. First, we propose a parallel and fast algorithm for the GPU, which can utilize the 3,072 SPs in parallel to estimate the motion vector of every Prediction Unit (PU) in every combination of Coding Unit (CU) and PU partitions. Furthermore, the proposed GPU algorithm can avoid the coding efficiency loss caused by the lack of a motion vector predictor (MVP). Second, we propose a fast algorithm for the CPU, which can fully utilize the results from the GPU to significantly reduce the number of possible CU and PU partitions without any coding efficiency loss. Our experimental results show that, compared with the reference software, we can encode high-resolution video using only 1.9% of the CPU time and 1.0% of the GPU time, with only a 1.4% rate increase. View full abstract»
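
    A common way to let one GPU pass serve every CU and PU size is SAD aggregation: per-candidate SADs are computed once for the smallest blocks and then summed to form the SADs of larger partitions. The sketch below shows that aggregation for square blocks only; the fixed candidate list and the missing MVP and rate terms are simplifications, not the paper's algorithm.

        import numpy as np

        def aggregate_sads(sad8, level):
            """sad8: (H/8, W/8, n_candidates) SADs of 8x8 blocks; level 1 -> 16x16, 2 -> 32x32."""
            s = sad8
            for _ in range(level):                         # sum each 2x2 group of blocks
                s = s[0::2, 0::2] + s[1::2, 0::2] + s[0::2, 1::2] + s[1::2, 1::2]
            return s

        sad8 = np.random.randint(0, 1000, size=(8, 8, 5))  # 5 candidate MVs per 8x8 block
        sad16 = aggregate_sads(sad8, 1)                    # (4, 4, 5)
        sad32 = aggregate_sads(sad8, 2)                    # (2, 2, 5)
        print(sad16.shape, sad32.shape, sad16.argmin(axis=2).shape)   # best candidate per 16x16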

  • 49. Reducing Motion Blur Artifact of Foveal Projection for a Dynamic Focus-Plus-Context Display

    Publication Year: 2015 , Page(s): 547 - 556

    This paper presents a novel technique to reduce the motion blur artifacts of foveal projection in a dynamic focus-plus-context (DF+C) display. The DF+C display is generally configured with multiple projectors and provides a nonuniform spatial resolution that consists of high-resolution (hi-res) regions (foveal projection) and low-resolution regions (peripheral projection). A serious problem of the DF+C display is motion blur, which inevitably occurs when a foveal projection is moved by a pan-tilt mirror or gantry. We propose a solution that reduces the motion blur artifacts and evaluate how it improves the image quality using both qualitative and quantitative experiments. Our proposed method defines an error function that assesses the displayed image quality as the difference between the original hi-res image and the displayed image, taking into account the nonuniform spatial property of human visual acuity. It then decides the set of positions and moving techniques of the foveal projections so that the sum of errors over a video sequence is minimized. Through experiments, we confirmed that the proposed method provides better image quality and significantly reduces motion blur artifacts compared with a conventional DF+C display. View full abstract»
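
    The error function can be pictured as an acuity-weighted difference between the original hi-res image and what the display actually shows, with the weight falling off with eccentricity from the foveal-projection center. The sketch below uses an exponential falloff and synthetic images as assumptions; it is not the paper's visual-acuity model.

        import numpy as np

        def weighted_error(original, displayed, fovea_yx, scale=40.0):
            h, w = original.shape
            yy, xx = np.mgrid[0:h, 0:w]
            ecc = np.hypot(yy - fovea_yx[0], xx - fovea_yx[1])   # distance from the fovea
            weight = np.exp(-ecc / scale)                        # assumed acuity falloff
            return float((weight * np.abs(original - displayed)).sum())

        orig = np.random.rand(120, 160)
        shown = 0.5 * (orig + orig.mean())                       # stand-in for the displayed image
        print(weighted_error(orig, shown, fovea_yx=(60, 80)))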

  • 50. Fast CU Splitting and Pruning for Suboptimal CU Partitioning in HEVC Intra Coding

    Publication Year: 2013 , Page(s): 1555 - 1564
    Cited by:  Papers (14)

    High Efficiency Video Coding (HEVC), a new video coding standard currently being established, adopts a quadtree-based Coding Unit (CU) block partitioning structure that is flexible in adapting to the various texture characteristics of images. However, this causes a dramatic increase in computational complexity compared with previous video coding standards because of the need to find the best CU partitions. In this paper, a fast CU splitting and pruning method is presented for HEVC intra coding, which allows for a significant reduction in computational complexity with small degradations in rate-distortion (RD) performance. The proposed fast splitting and pruning method is performed in two complementary steps: 1) early CU split decision and 2) early CU pruning decision. For CU blocks, the early CU split and pruning tests are performed at each CU depth level according to a Bayes decision rule based on low-complexity RD costs and full RD costs, respectively. The statistical parameters for the early CU split and pruning tests are periodically updated on the fly at each CU depth level to cope with varying signal characteristics. Experimental results show that our proposed fast CU splitting and pruning method reduces the encoding time of the current HM reference software by about 50% with only a 0.6% increase in BD rate. View full abstract»
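
    The early split test can be sketched as a two-class Bayes decision on a low-complexity RD cost: model the cost under "split" and "not split" as Gaussians whose statistics are updated on the fly, and split early when the split hypothesis has the higher posterior. The single scalar feature, the priors, and the synthetic statistics below are assumptions, not the paper's exact test.

        import math
        import random

        class GaussianClass:                      # online Gaussian via Welford's algorithm
            def __init__(self):
                self.n, self.mean, self.m2 = 0, 0.0, 0.0
            def update(self, x):
                self.n += 1
                d = x - self.mean
                self.mean += d / self.n
                self.m2 += d * (x - self.mean)
            def loglik(self, x):
                var = self.m2 / max(self.n - 1, 1) + 1e-6
                return -0.5 * math.log(2 * math.pi * var) - (x - self.mean) ** 2 / (2 * var)

        split_model, keep_model = GaussianClass(), GaussianClass()
        for _ in range(200):                      # pretend statistics gathered from coded CUs
            split_model.update(random.gauss(1200, 150))
            keep_model.update(random.gauss(700, 150))

        def early_split(rd_cost, prior_split=0.5):
            post_split = split_model.loglik(rd_cost) + math.log(prior_split)
            post_keep = keep_model.loglik(rd_cost) + math.log(1 - prior_split)
            return post_split > post_keep

        print(early_split(1300), early_split(600))   # expected: True False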


Aims & Scope

The emphasis is focused on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High Speed Real-Time Circuits
5. Multi-Processor Systems - Hardware and Software
6. VLSI Architecture and Implementation for Video Technology 

 

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it