IEEE Transactions on Circuits and Systems for Video Technology

Early Access Articles

Early Access articles are new content made available in advance of the final electronic or print versions and result from IEEE's Preprint or Rapid Post processes. Preprint articles are peer-reviewed but not fully edited. Rapid Post articles are peer-reviewed and edited but not paginated. Both these types of Early Access articles are fully citable from the moment they appear in IEEE Xplore.

Displaying Results 1–25 of 127
  • Noise Estimation of Natural Images via Statistical Analysis and Noise Injection

    We develop a framework for estimating the noise level of natural images using two important statistics: high kurtosis and scale invariance in the transform domain. By exploring these priors of natural image statistics in the 2D discrete cosine transform (DCT) domain, we reveal the limitations of these statistics for images with highly directional edges or large smooth areas. We then derive a novel two-step estimation scheme for the noise variance: 1) in the preliminary estimation, an integration of the wavelet and non-directional DCT transforms is used to alleviate the influence of image structure; 2) a noise-injection rectification is then devised to deal with noise-free image content. Simulation and comparative study demonstrate that this algorithm reliably infers the noise variance and is robust over wide ranges of visual content and noise levels, while outperforming relevant methods. This work can significantly improve the performance of existing denoising techniques that require the noise variance as a critical parameter.

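    A minimal Python sketch of the classical wavelet-domain noise estimate (Donoho's median-absolute-deviation rule) that preliminary estimation steps of this kind typically build on; it is not the authors' full two-step DCT/noise-injection scheme, and the synthetic input is a placeholder.

        import numpy as np
        import pywt  # PyWavelets

        def mad_noise_estimate(image):
            # Single-level 2D wavelet decomposition; the diagonal detail
            # band HH is dominated by noise in natural images.
            _, (_, _, hh) = pywt.dwt2(image.astype(np.float64), 'db1')
            # Donoho's rule: sigma = median(|HH|) / 0.6745
            return np.median(np.abs(hh)) / 0.6745

        noisy = np.random.randn(256, 256) * 5.0   # synthetic pure-noise image
        print(mad_noise_estimate(noisy))          # roughly 5.0
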
  • Adaptive Nonparametric Image Parsing

    In this paper, we present an adaptive nonparametric solution to the image parsing task, namely annotating each image pixel with its corresponding category label. For a given test image, a locality-aware retrieval set is first extracted from the training data based on super-pixel matching similarities, which are augmented with feature extraction for better differentiation of local super-pixels. Then, the category of each super-pixel is initialized by the majority vote of the k-nearest-neighbor super-pixels in the retrieval set. Instead of fixing k as in traditional nonparametric approaches, we propose a novel adaptive nonparametric approach that determines a sample-specific k for each test image. In particular, k is adaptively set to the smallest number of nearest super-pixels with which the images in the retrieval set obtain the best category prediction. Finally, the initial super-pixel labels are refined by contextual smoothing. Extensive experiments on challenging datasets demonstrate the superiority of the new solution over other state-of-the-art nonparametric solutions.

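    To make the adaptive-k idea concrete, here is a hedged sketch: k is chosen as the smallest neighborhood size that maximizes majority-vote accuracy on the retrieval set. The scoring loop and data layout below are hypothetical simplifications of the paper's scheme.

        import numpy as np

        def majority_vote(labels):
            # Most frequent label among the given neighbors.
            vals, counts = np.unique(labels, return_counts=True)
            return vals[np.argmax(counts)]

        def choose_adaptive_k(neighbor_labels, true_labels, k_max=15):
            # neighbor_labels[i]: labels of similarity-sorted neighbors of
            # validation super-pixel i; true_labels[i]: its ground truth.
            best_k, best_acc = 1, -1.0
            for k in range(1, k_max + 1):
                preds = [majority_vote(nl[:k]) for nl in neighbor_labels]
                acc = np.mean([p == t for p, t in zip(preds, true_labels)])
                if acc > best_acc:           # strict '>' keeps the smallest k at ties
                    best_k, best_acc = k, acc
            return best_k
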
  • Background Prior Based Salient Object Detection via Deep Reconstruction Residual

    Detection of salient objects from images has gained increasing research interest in recent years, as it can substantially facilitate a wide range of content-based multimedia applications. Based on the assumption that foreground salient regions are distinctive within a certain context, most conventional approaches rely on a number of hand-designed features, with their distinctiveness measured using local or global contrast. Although these approaches have proven effective on simple images, their limited capability may cause difficulties with more complicated images. This paper proposes a novel framework for saliency detection that first models the background and then separates salient objects from it. We develop stacked denoising autoencoders with deep learning architectures to model the background, whereby latent patterns are explored and more powerful representations of the data are learned in an unsupervised, bottom-up manner. We then formulate the separation of salient objects from the background as a problem of measuring the reconstruction residuals of the deep autoencoders. Comprehensive evaluations on three benchmark datasets and comparisons with nine state-of-the-art algorithms demonstrate the superiority of the proposed work.

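    A hedged illustration of the reconstruction-residual idea, with a linear PCA model standing in for the stacked denoising autoencoder: patches that the background model reconstructs poorly receive high saliency. All dimensions and data are placeholders.

        import numpy as np

        def fit_background_model(bg_patches, n_components=16):
            # bg_patches: (num_patches, dim) rows of background samples.
            mean = bg_patches.mean(axis=0)
            _, _, vt = np.linalg.svd(bg_patches - mean, full_matrices=False)
            return mean, vt[:n_components]       # linear "encoder/decoder"

        def saliency(patches, mean, basis):
            # Reconstruction residual: distance to the background subspace.
            centered = patches - mean
            recon = centered @ basis.T @ basis
            return np.linalg.norm(centered - recon, axis=1)

        bg = np.random.randn(500, 64)            # synthetic background patches
        mean, basis = fit_background_model(bg)
        print(saliency(np.random.randn(10, 64), mean, basis))
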
  • Second-Order Configuration of Local Features for Geometrically Stable Image Matching and Retrieval

    Local features offer high repeatability, which supports efficient matching between images, but they do not provide sufficient discriminative power. Imposing a geometric coherence constraint on local features improves the discriminative power but makes the matching sensitive to anisotropic transformations. We propose a novel feature representation approach to solve the latter problem. Each image is abstracted by a set of tuples of local features. We revisit affine shape adaptation and extend its conclusion to characterize the geometrically stable feature of each tuple. The representation thus provides higher repeatability under anisotropic scaling and shearing than previous work. We develop a simple matching model by voting in the geometrically stable feature space, where votes arise from tuple correspondences. To make the required index space linear in the number of features, we propose a second approach, called the Centrality-Sensitive Pyramid, which selects potentially meaningful tuples of local features on the basis of their spatial neighborhood information. It achieves faster neighborhood association and greater robustness to errors in interest point detection and description. We comprehensively evaluated our approach on the Flickr Logos 32, Holiday, Oxford Buildings, and Flickr 100K benchmarks. Extensive experiments and comparisons with advanced approaches demonstrate the superiority of our approach in image retrieval tasks.

  • Cross-View Action Recognition Based on a Statistical Translation Framework

    Actions captured under view changes pose serious challenges to modern action recognition methods. In this paper, we propose an effective approach for cross-view action recognition based on a statistical translation framework, which boils down to estimating visual word transfer probabilities across views. Specifically, local features are extracted from action video frames and grouped into bags of words by k-means clustering. Though the appearance of an action may vary under view changes, the underlying transfer tendency between visual words across views can be exploited. We propose two methods to measure this visual-word transfer relationship, both ultimately based on frequency counts of word pairs. In the first method, word transfer probabilities are estimated by maximizing the likelihood of a shared action set with the EM algorithm. In the second, they are estimated using likelihood-ratio tests. The two methods achieve comparable results and perform better when combined. For cross-view action classification, we compute action transfer probabilities from the estimated word transfer probabilities and then perform a K-NN-like classification based on action video transfer probabilities. We verified our method on the public multi-view IXMAS and WVU datasets.

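    A minimal sketch of the frequency-count estimate underlying both methods: transfer probabilities between visual words in two views are derived from co-occurrence counts over a shared action set. The EM and likelihood-ratio refinements in the paper are omitted, and the toy counts are hypothetical.

        import numpy as np

        def transfer_probabilities(cooccurrence):
            # cooccurrence[i, j]: number of times word i (view A) and
            # word j (view B) co-occur in the same shared action clip.
            row_sums = cooccurrence.sum(axis=1, keepdims=True)
            return cooccurrence / np.maximum(row_sums, 1)  # P(w_B = j | w_A = i)

        counts = np.array([[8, 1, 1],
                           [0, 5, 5],
                           [2, 2, 6]], dtype=float)
        print(transfer_probabilities(counts))
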
  • Fast Soft Decision Quantization with Adaptive Preselection and Dynamic Trellis Graph

    Soft decision quantization (SDQ) is an efficient tool in video coding for achieving coefficient-level rate-distortion optimized quantization (RDOQ), with 6%-8% bit-rate savings. However, software and hardware implementations of SDQ suffer from either high complexity or low throughput due to the complex Viterbi trellis search and the sequential processing of CABAC. In this paper, a fast SDQ algorithm is proposed that reduces the number of trellis stages, thereby lowering complexity, and breaks the data dependency of optimal SDQ. First, preselection is performed on the hard decision quantization (HDQ) results through intelligent coding-cost estimation and comparison, during which some coefficients are judged safe to exclude from the trellis search, yielding considerable complexity reduction. Second, a dynamic trellis graph with a flexible structure is constructed from the "unsafe" nonzero coefficients to accelerate the remaining partial Viterbi search. Third, a dynamic threshold selection model is proposed for adaptive thresholding, increasing the probability of correct preselection under a constraint on a predefined maximal probability of wrong preselection. Experimental results show that, compared with optimal SDQ, the proposed algorithm reduces computational complexity by 50%-80%, memory accesses by 75%-82%, and the sequential processing latency in hardware implementations by 87.25%, with less than a 0.4% BD-rate increase when at most three "unsafe" coefficients are kept for trellis search in one block. This work is suitable for high-throughput hardware and computation-sensitive software implementations of SDQ and RDOQ for the H.264/AVC and HEVC standards.

  • Segmentation over Detection via Optimal Sparse Reconstructions

    This paper addresses the problem of semantic segmentation, where the possible class labels are from a predefined set. We exploit top-down guidance, i.e., the coarse localization of objects and their class labels, provided by object detectors. For each detected bounding box, figure-ground segmentation is performed, and the final result is obtained by merging the figure-ground segmentations. The main idea of the proposed approach, presented in our preliminary work [1], is to reformulate the figure-ground segmentation problem as sparse reconstruction, pursuing the object mask in a nonparametric manner. The latent segmentation mask should be coherent up to sparse error caused by intra-category diversity; thus the object mask is inferred using sparse representations over the training set. To handle local spatial deformations, local patch-level masks are also considered and inferred by sparse representations over spatially nearby patches. The sparse reconstruction coefficients and the latent mask are alternately optimized by applying the Lasso algorithm and the Accelerated Proximal Gradient method. The proposed formulation results in a convex optimization problem, so the globally optimal solution is achieved. In this paper, we provide theoretical analysis of the convergence and optimality, an extended numerical analysis of the proposed algorithm, and a comprehensive comparison with related semantic segmentation methods on the challenging PASCAL VOC object segmentation datasets and the Weizmann Horses dataset. The experimental results demonstrate that the proposed algorithm achieves performance competitive with the state of the art.

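    A hedged sketch of the sparse-reconstruction step: a test object's features are approximated as a sparse combination of training exemplars via the Lasso, and the same coefficients transfer the training masks to the latent test mask. The alternating optimization with the Accelerated Proximal Gradient method is omitted, and all shapes are placeholders.

        import numpy as np
        from sklearn.linear_model import Lasso

        def reconstruct_mask(test_feat, train_feats, train_masks, alpha=0.01):
            # Sparse codes: test_feat ~= train_feats.T @ coef
            model = Lasso(alpha=alpha, positive=True, max_iter=5000)
            model.fit(train_feats.T, test_feat)
            coef = model.coef_                    # sparse reconstruction weights
            # Transfer: latent mask as the same combination of training masks.
            mask = train_masks.T @ coef
            return np.clip(mask, 0.0, 1.0)

        feats = np.random.rand(50, 128)           # 50 exemplars, 128-D features
        masks = np.random.rand(50, 64 * 64)       # flattened exemplar masks
        print(reconstruct_mask(np.random.rand(128), feats, masks).shape)
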
  • Fast PU Skip and Split Termination Algorithm for HEVC Intra Prediction

    High Efficiency Video Coding (HEVC), developed for next-generation video coding, achieves significant improvements in coding efficiency over H.264/AVC by adopting various tools, including a quadtree-based block partitioning structure. However, this causes high encoding complexity owing to the exhaustive rate-distortion (RD) cost computation of the extended Prediction Unit (PU) search. In this paper, a fast PU skip and split termination algorithm is proposed. The proposed method consists of three parts: early skip, PU skip, and PU split termination. The early skip algorithm allows immediate skipping of the RD cost computation for large PUs according to the neighboring PUs. Based on Bayes' rule, the PU skip algorithm allows the full RD cost computation to be skipped, and the split termination algorithm terminates further PU splitting using the RD cost of the rough mode decision (RMD). The decision parameter for PU skip and split termination is the ratio of the RMD RD costs between the current PU and the spatially adjacent or upper-depth PU. Simulation results show that the proposed algorithm achieves a 53.52% encoding time saving while maintaining almost the same RD performance as the HEVC reference software.

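    A hedged sketch of the decision rule described above: the ratio of RMD RD costs between the current PU and its reference (spatially adjacent or upper-depth) PU drives the skip and split-termination decisions. The threshold values here are hypothetical placeholders, not the parameters derived in the paper.

        def pu_decision(rmd_cost_current, rmd_cost_reference,
                        skip_threshold=0.6, split_threshold=1.4):
            # Ratio of rough mode decision (RMD) RD costs.
            ratio = rmd_cost_current / rmd_cost_reference
            if ratio < skip_threshold:
                return 'skip_full_rd'        # skip full RD cost computation
            if ratio < split_threshold:
                return 'terminate_split'     # do not split this PU further
            return 'full_search'             # fall back to exhaustive search

        print(pu_decision(420.0, 800.0))     # -> 'skip_full_rd'
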
  • Efficient Software H.264/AVC to HEVC Transcoding on Distributed Multi-Core Processors

    The latest High Efficiency Video Coding (HEVC) standard achieves significant compression efficiency improvement over the H.264/AVC standard, but at a much higher computational complexity. In this paper, we propose a novel framework for software-based H.264/AVC-to-HEVC transcoding, integrated with tools such as Wavefront Parallel Processing (WPP) that help achieve higher levels of parallelism on multicore processors and distributed systems. By utilizing information extracted from the input H.264/AVC bitstream, the transcoding process can be greatly accelerated, with a visual quality loss that is modest for many applications. Based on the HEVC HM 14.0 reference software and using standard HEVC test bitstreams, the proposed transcoder achieves up to a 60x speed-up on a quad-core, 8-thread server over decode-and-re-encode based on ffmpeg and the HM software, with a BD-rate loss of 15%-20%. By implementing GOP-level task distribution on a distributed system with 9 processing units, the proposed software transcoder can transcode 720p30 video in real time.

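    A minimal sketch of GOP-level task distribution: independent GOPs are transcoded in parallel across workers and reassembled in order. The transcode_gop stub is a placeholder for the actual H.264/AVC-to-HEVC transcoding step, and the worker count of 9 mirrors the setup quoted above.

        from concurrent.futures import ProcessPoolExecutor

        def transcode_gop(gop):
            # Placeholder: decode the H.264/AVC GOP and re-encode it as HEVC,
            # reusing information extracted from the input bitstream.
            return ('hevc', gop['index'])

        def transcode(gops, workers=9):
            # Closed GOPs have no inter-GOP prediction, so they can be
            # processed independently; map() preserves display order.
            with ProcessPoolExecutor(max_workers=workers) as pool:
                return list(pool.map(transcode_gop, gops))

        if __name__ == '__main__':
            gops = [{'index': i, 'frames': None} for i in range(30)]
            print(transcode(gops)[:3])
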
  • Image Tag Refinement with View-Dependent Concept Representations

    Image tag refinement is the task of refining the initial tags of an image so that the refined tags better reflect its content and therefore help users better access the image. The quality of tag refinement depends on the quality of the concept representations, which map concepts to visual images. While good progress has been made on tag refinement in the past decade, previous approaches achieved only limited success owing to their limited concept representations. In this paper, we show that the visual appearances of a concept comprise both a generic view and a specific view, and therefore a concept can be comprehensively represented by these two components. To ensure a clean concept representation, the representation is learned on clean click-through data, in which noise is greatly reduced. Within this framework, a coarse-to-fine image tag refinement is proposed, which 1) generates an efficient star graph to find candidate tags missing from the initial tag list of an input image, and 2) guided by the view-dependent concept representation, formulates a probabilistic objective function to eliminate irrelevant tags. Extensive experiments on two widely used standard datasets (MIRFlickr-25K and NUS-WIDE-270K) demonstrate the effectiveness of our approach.

  • Depth-Based Texture Coding in AVC-Compatible 3D Video Coding

    The target of 3D video coding is to compress multi-view video plus depth (MVD) format data, which consist of texture images and their corresponding depth maps. In the MVD format, the depth map plays an important role in successful 3D video services, because it enables users to experience 3D through the generation of arbitrary intermediate views. The depth map has a strong correlation with its associated texture data, so it can be utilized to improve texture coding efficiency. This paper introduces a novel and efficient depth-based texture coding scheme. It includes depth-based motion vector prediction (DMVP), block-based view synthesis prediction (BVSP), and adaptive luminance compensation (ALC), which were adopted in the AVC-compatible 3D video coding (3D-AVC) standard. Simulation results demonstrate that, in a P-I-P view prediction structure, the proposed scheme reduces the total coding bitrate of texture and depth by 19.06% in terms of coded PSNR and by 17.01% in terms of synthesized PSNR.

  • Efficient Background Modeling Based on Sparse Representation and Outlier Iterative Removal

    Background modeling is a critical component of various vision-based applications. Most traditional methods tend to be inefficient when solving large-scale problems. In this paper, we introduce sparse representation into the task of large-scale stable-background modeling and reduce the video size by exploiting its "discriminative" frames. A cyclic iteration process is then proposed to extract the background from the discriminative frame set. The two parts combine to form our Sparse Outlier Iterative Removal (SOIR) algorithm. The algorithm operates in tensor space to respect the natural data structure of videos. Experimental results show that a few discriminative frames determine the performance of the background extraction. Further, SOIR achieves high accuracy and high speed simultaneously when dealing with real video sequences, giving it an advantage in large-scale tasks.

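    A hedged, matrix-form sketch of the outlier-iterative-removal idea (the paper operates in tensor space and adds discriminative frame selection, both omitted here): the background is re-estimated from pixels not flagged as outliers, and the outlier mask is refined each iteration. Thresholds and shapes are placeholders.

        import numpy as np

        def iterative_background(frames, iters=5, k=3.0):
            # frames: (num_frames, num_pixels); start from the temporal median.
            bg = np.median(frames, axis=0)
            for _ in range(iters):
                residual = np.abs(frames - bg)
                thresh = k * np.median(residual)   # robust outlier threshold
                inlier = residual < thresh         # foreground pixels removed
                # Re-estimate each background pixel from its inlier observations.
                counts = np.maximum(inlier.sum(axis=0), 1)
                bg = (frames * inlier).sum(axis=0) / counts
            return bg

        video = np.random.rand(40, 1000)           # synthetic flattened frames
        print(iterative_background(video).shape)
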
  • Cross-layer Fairness-driven Concurrent Multipath Video Delivery over Heterogeneous Wireless Networks

    The growing availability of various wireless access technologies promotes increasing demand for mobile video applications. Stream Control Transmission Protocol (SCTP)-based Concurrent Multipath Transfer (CMT) improves wireless video delivery performance with its parallel transmission and bandwidth aggregation features. However, existing CMT solutions deployed at the transport layer alone are not accurate enough, owing to lower-layer uncertainties such as variations of the wireless channel. In addition, CMT-based video transmission may use excessive bandwidth in comparison with popular TCP-based flows, resulting in unfair sharing of network resources. This paper proposes a novel Cross-Layer Fairness-Driven SCTP-based Concurrent Multipath Transfer solution (CMT-CL/FD) to improve video delivery performance while remaining fair to competing TCP flows. CMT-CL/FD utilizes a cross-layer approach to monitor and analyze path quality, combining wireless channel measurements at the data-link layer with rate/bandwidth estimation at the transport layer. Furthermore, an innovative window-based flow-control mechanism is applied to balance delivery fairness and efficiency. Finally, CMT-CL/FD intelligently distributes video data over different paths depending on their estimated quality, to mitigate packet reordering and loss, under the constraint of TCP-friendly flow control. Simulation results show that CMT-CL/FD outperforms existing solutions in terms of both video delivery performance and TCP-friendliness.

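    A toy sketch of quality-driven data distribution: packets are assigned to the path furthest below its quality-proportional traffic share, one simple way to realize weighted scheduling of the kind described above. The quality scores stand in for the cross-layer monitor's estimates; none of this reflects the actual CMT-CL/FD protocol machinery.

        def schedule_packets(num_packets, path_quality):
            # path_quality: estimated per-path quality scores (cross-layer input).
            total = float(sum(path_quality))
            shares = [q / total for q in path_quality]   # target traffic shares
            sent = [0] * len(path_quality)
            schedule = []
            for _ in range(num_packets):
                # Pick the path furthest below its target share.
                deficits = [shares[i] * (sum(sent) + 1) - sent[i]
                            for i in range(len(sent))]
                path = deficits.index(max(deficits))
                sent[path] += 1
                schedule.append(path)
            return schedule

        print(schedule_packets(10, [3.0, 1.0]))  # favors the higher-quality path
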
  • Image Interpolation via Low-rank Matrix Completion and Recovery

    Methods of achieving image super-resolution have long been an object of research. These approaches suggest that when a low-resolution image is directly downsampled from its corresponding high-resolution image without blurring, i.e., the blurring kernel is the Dirac delta function, the reconstruction becomes an image-interpolation problem. Hence, a pervasive approach is to exploit the linear relationship among neighboring pixels to reconstruct a high-resolution image from a low-resolution input. This paper seeks an efficient method to determine the local order of the linear model implicitly. Based on the theory of low-rank matrix completion and recovery, a method for single-image super-resolution is proposed that formulates the reconstruction as the recovery of a low-rank matrix, which can be solved by the augmented Lagrange multiplier method. In addition, the proposed method handles noisy data and random perturbations robustly. Experimental results show that the proposed method is effective and competitive compared with other methods.

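    A minimal sketch of low-rank matrix completion by singular value thresholding; the paper solves the recovery with the augmented Lagrange multiplier method, but soft-thresholding of singular values is the core operation in both. Parameters and the toy rank-1 matrix are illustrative only.

        import numpy as np

        def svt_complete(observed, mask, tau=5.0, step=1.2, iters=200):
            # observed: matrix with known entries; mask: 1 where observed.
            Y = np.zeros_like(observed)
            X = np.zeros_like(observed)
            for _ in range(iters):
                U, s, Vt = np.linalg.svd(Y, full_matrices=False)
                X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
                Y += step * mask * (observed - X)          # enforce known entries
            return X

        rank1 = np.outer(np.arange(1, 9), np.arange(1, 7)).astype(float)
        mask = (np.random.rand(*rank1.shape) < 0.6).astype(float)
        print(np.round(svt_complete(rank1 * mask, mask), 1))
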
  • Two Maximum Entropy Based Algorithms for Running Quantile Estimation in Non-Stationary Data Streams

    The need to estimate a particular quantile of a distribution is an important problem that frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many semi-automatic surveillance analytics systems that detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper, we specifically address the problem of estimating the running quantile of a data stream when the memory for storing observations is limited. We make several major contributions: (i) we highlight the limitations of approaches previously described in the literature that make them unsuitable for non-stationary streams; (ii) we describe a novel principle for utilizing the available storage space; (iii) we introduce two novel algorithms that exploit the proposed principle in different ways; and (iv) we present a comprehensive evaluation and analysis of the proposed algorithms and the existing methods on both synthetic data sets and three large 'real-world' streams acquired in the course of operation of an existing commercial surveillance system. Our findings convincingly demonstrate that both of the proposed methods are highly successful and vastly outperform the existing alternatives. We show that the better of the two algorithms (the 'data-aligned histogram') exhibits far superior performance in comparison with the previously described methods, achieving more than 10 times lower estimation errors on real-world data, even when its available working memory is an order of magnitude smaller.

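    To illustrate the problem setting, here is a hedged sketch of a fixed-memory running quantile estimator over a bounded histogram; it is a naive baseline that assumes a known value range, not the authors' data-aligned histogram.

        import numpy as np

        class HistogramQuantile:
            # Fixed memory: `bins` counters over an assumed value range.
            def __init__(self, lo, hi, bins=100):
                self.edges = np.linspace(lo, hi, bins + 1)
                self.counts = np.zeros(bins)

            def update(self, x):
                i = np.clip(np.searchsorted(self.edges, x) - 1,
                            0, len(self.counts) - 1)
                self.counts[i] += 1

            def quantile(self, q):
                cum = np.cumsum(self.counts)
                i = np.searchsorted(cum, q * cum[-1])
                return self.edges[min(i + 1, len(self.edges) - 1)]

        est = HistogramQuantile(0.0, 1.0)
        for x in np.random.rand(10000):
            est.update(x)
        print(est.quantile(0.95))        # close to 0.95 for uniform data
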
  • A CMOS Readout with High-Precision and Low-Temperature-Coefficient Background Current Skimming for Infrared Focal Plane Array

    A high-performance CMOS readout structure employing a new background current skimming technique for infrared focal plane array (IFPA) applications is proposed, analyzed, and verified. Both the background current skimming circuit and the bias circuit have good immunity to threshold-voltage variations. In addition, the background current skimming circuit is independent of temperature variations. An experimental readout chip has been fabricated in the SMIC 0.18 μm 1P6M process, with a unit-cell size of 30 μm × 27 μm and power consumption of less than 0.06 mW. With a 4.0 V power supply, the ROIC provides a dynamic output range of over 3.0 V and an output linearity of more than 99%. The background suppression current, whose level is tunable between 470 nA and 5.0 μA, has a variation of 2.2%, corresponding to a temperature coefficient of 275 ppm/°C. Simulation and experimental results confirm the good performance of the proposed background current skimming circuit for IFPA applications.

  • CoDe4D: Color-Depth Local Spatio-Temporal Features for Human Activity Recognition from RGB-D Videos

    Human activity recognition has a variety of important real-world applications, such as video analysis, surveillance, and human-robot interaction. As a promising video representation method, local spatio-temporal (LST) features have received increasing attention from the computer vision, machine learning, and robotics communities. However, approaches based on traditional LST features use only color information and thus face several challenges, such as illumination changes and dynamic backgrounds. The recent availability of commercial color-depth cameras makes it much cheaper, faster, and easier to acquire depth information, offering the potential to implement more discriminative and robust LST features. In this paper, we introduce the new 4-dimensional color-depth (CoDe4D) LST feature, which incorporates both intensity and depth information acquired from RGB-D cameras. Our feature detector constructs a saliency map by applying independent filters along the x, y, z, and t dimensions to represent texture, shape, and pose variations, and selects its local maxima as interest points. Our multi-channel orientation histogram (MCOH) descriptor applies a 4D support region, adaptive to linear perspective view changes, to each interest point. Image gradients of the color-depth patches within the support region are then computed and quantized using a spherical-coordinate-based method to form the final feature vector. We build a complete activity recognition system by combining our features with bag-of-features representations and SVMs. To evaluate the performance of the CoDe4D LST features and the complete system, we conduct experiments on four benchmark color-depth human activity datasets: UTK Action3D, Berkeley MHAD, ACT42, and MSR Daily Activity 3D. Experimental results demonstrate the promising representative power of the CoDe4D features, which achieve state-of-the-art performance on activity recognition from RGB-D visual data.

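    A hedged sketch of the final classification stage named above: local descriptors are quantized into a bag-of-features histogram with k-means and classified with an SVM. The descriptor extraction (CoDe4D detector and MCOH descriptor) is replaced by random placeholders.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        # Placeholder local descriptors for 20 videos of 2 activity classes.
        videos = [rng.normal(loc=(i % 2), size=(100, 32)) for i in range(20)]
        labels = [i % 2 for i in range(20)]

        vocab = KMeans(n_clusters=16, n_init=10, random_state=0)
        vocab.fit(np.vstack(videos))                 # visual vocabulary

        def bof_histogram(descriptors):
            words = vocab.predict(descriptors)
            hist = np.bincount(words, minlength=16).astype(float)
            return hist / hist.sum()                 # normalized word histogram

        X = np.array([bof_histogram(v) for v in videos])
        clf = SVC(kernel='rbf').fit(X, labels)
        print(clf.predict(X[:4]))
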
  • Depth Map Rasterization Using Triangulation and Color Consistency for Various Sampling Structures

    The reconstruction of a dense depth map from sparsely sampled depth information has been studied as a way to improve stereoscopic 3D images. Among the various sparse-to-dense depth map reconstruction methods, triangular-patch-based approaches have been widely used. However, they have difficulty preserving depth discontinuities, because they assume that each triangular patch has a planar depth distribution. This paper presents a novel depth map rasterization method that considers depth discontinuities and color consistency within each triangular patch. Experimental results show that the proposed method reconstructs dense depth maps with higher accuracy than previous depth map upsampling methods for both regular and irregular depth map sampling structures.

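    For reference, a minimal sketch of the planar assumption the paper improves on: rasterizing one triangular patch by barycentric interpolation of its three vertex depths, which by construction cannot represent a depth discontinuity inside the patch. Vertices and depths are arbitrary examples.

        import numpy as np

        def rasterize_triangle(verts, depths, shape):
            # verts: three (x, y) vertices; depths: their depth values.
            ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
            (x0, y0), (x1, y1), (x2, y2) = verts
            det = (y1 - y2) * (x0 - x2) + (x2 - x1) * (y0 - y2)
            # Barycentric coordinates of every pixel w.r.t. the triangle.
            l0 = ((y1 - y2) * (xs - x2) + (x2 - x1) * (ys - y2)) / det
            l1 = ((y2 - y0) * (xs - x2) + (x0 - x2) * (ys - y2)) / det
            l2 = 1.0 - l0 - l1
            inside = (l0 >= 0) & (l1 >= 0) & (l2 >= 0)
            depth = np.full(shape, np.nan)
            # Planar interpolation: no discontinuity possible in the patch.
            depth[inside] = (l0 * depths[0] + l1 * depths[1]
                             + l2 * depths[2])[inside]
            return depth

        patch = rasterize_triangle([(1, 1), (30, 4), (10, 28)],
                                   [0.2, 0.8, 0.5], (32, 32))
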
  • Adaptive Slice Representation for Human Action Classification

    Common action recognition methods describe an action sequence along its time axis, i.e., first extracting features from the X-Y plane and then modeling the dynamic changes along the time axis. Beyond the ordinary X-Y-plane-based representation, other views, e.g., an X-T-slice-based representation, may be more efficient for distinguishing different actions. In this paper, we investigate different slicing views of the spatio-temporal volume for organizing action sequences and propose an efficient slice representation for human action recognition. First, a Minimum Average Entropy (MinAE) principle is proposed to select the optimal slicing angle for each action sequence adaptively. This concentrates the foreground pixels in the fewest slices, reducing the uncertainty caused by information dispersed across different slices. Then, the obtained slice sequence is transformed into a pair of 1D signals describing the distribution of foreground pixels along the time axis. Finally, Mel-Frequency Cepstrum Coefficient (MFCC) features are calculated to describe the spectral characteristics of the 1D signals over time. Thus, a 3D spatio-temporal action volume is efficiently transformed into low-dimensional spectral features. Extensive experiments on 2D human action datasets (UIUC and WEIZMANN) as well as the MSR Action3D depth dataset demonstrate the effectiveness of the slice-based representation, whose recognition performance reaches the state-of-the-art level with high efficiency.

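    A hedged sketch of the MinAE selection: for each candidate slicing angle, the silhouette volume is rotated, foreground pixels are counted per slice, and the angle whose slice distribution has the lowest entropy is kept. Rotating the whole (T, H, W) volume with scipy is a simplification of the paper's slicing; the angle grid is arbitrary.

        import numpy as np
        from scipy.ndimage import rotate

        def slice_entropy(volume, angle):
            # volume: (T, H, W) binary silhouettes; slice along x after rotation.
            rot = rotate(volume, angle, axes=(1, 2), order=0, reshape=True)
            counts = rot.sum(axis=(0, 1)).astype(float)  # foreground per slice
            p = counts / max(counts.sum(), 1.0)
            p = p[p > 0]
            return -(p * np.log2(p)).sum()

        def best_slicing_angle(volume, angles=range(0, 180, 15)):
            return min(angles, key=lambda a: slice_entropy(volume, a))

        vol = (np.random.rand(10, 32, 32) > 0.95).astype(np.uint8)
        print(best_slicing_angle(vol))
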
  • Exploratory Product Image Search with Circle-to-Search Interaction

    Exploratory search is emerging as a new form of information-seeking activity in the research community; it generally combines browsing and searching to help users gain additional knowledge and form accurate queries, thereby assisting their seeking and investigation activities. However, there have been few attempts at integrated exploratory search solutions in which image browsing is incorporated into the exploring loop. In this work, we investigate the challenges of understanding users' search interests from the product images being browsed and inferring their actual search intentions. We propose a novel interactive image exploring system that lets users switch smoothly between browsing and searching, and naturally complete visual-based exploratory search tasks in an effective and efficient way. The system enables users to specify their visual search interests by circling any visual object in a web page; it then automatically infers the users' underlying intent by analyzing the browsing context and the same or similar product images obtained through large-scale image search technology. Users can then use the recommended queries to complete intent-specific exploratory tasks. The proposed solution is one of the first attempts to understand users' interests in a visual-based exploratory product search task by integrating the "browse" and "search" activities. We evaluated our system on five million product images. The evaluation demonstrates that, compared with conventional image search methods, the proposed system provides accurate intent-driven search results, responds quickly to exploratory search demands, and provides users with robust results that support their exploration experience.

  • FDQM: Fast Quality Metric for Depth Maps without View Synthesis

    We propose a fast quality metric for depth maps, called FDQM, which efficiently evaluates the impact of depth map errors on the quality of synthesized intermediate views in multi-view video plus depth applications. In other words, FDQM assesses view synthesis distortions in the depth map domain without performing the actual view synthesis. First, we estimate the distortions at the pixel positions specified by the reference disparities and the distorted disparities, respectively. Then, we integrate these pixel-wise distortions into an FDQM score through a spatial pooling scheme that accounts for occlusion effects and the characteristics of human visual attention. As a benchmark for depth map quality assessment, we perform a subjective evaluation of intermediate views synthesized from depth maps compressed at various bit rates, and compare the subjective results with objective metric scores. Experimental results demonstrate that FDQM yields scores highly correlated with the subjective ones. Moreover, FDQM requires at least 10 times less computation than conventional quality metrics, since it does not perform the actual view synthesis.

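    A hedged sketch of the core measurement: the synthesis distortion a depth error would cause is approximated, without synthesizing a view, by comparing the reference texture at the positions addressed by the reference and the distorted disparities. The pooling here is a plain mean rather than the paper's occlusion- and attention-aware scheme, and rounding to integer shifts is a simplification.

        import numpy as np

        def depth_error_distortion(texture, disp_ref, disp_dist):
            # texture: (H, W) luminance; disparities give horizontal shifts.
            h, w = texture.shape
            xs = np.arange(w)
            dist = np.zeros_like(texture, dtype=float)
            for y in range(h):
                x_ref = np.clip(xs + np.round(disp_ref[y]).astype(int), 0, w - 1)
                x_err = np.clip(xs + np.round(disp_dist[y]).astype(int), 0, w - 1)
                # Intensity difference at the two addressed positions.
                dist[y] = np.abs(texture[y, x_ref] - texture[y, x_err])
            return dist.mean()          # simple spatial pooling (placeholder)

        tex = np.random.rand(48, 64)
        d_ref = np.full((48, 64), 3.0)
        print(depth_error_distortion(tex, d_ref, d_ref + np.random.randn(48, 64)))
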
  • A Compressive Sensing Approach to Describe Indoor Scenes for Blind People

    This paper introduces a new portable camera-based method for helping blind people recognize indoor objects. Unlike state-of-the-art techniques, which typically limit the recognition task to a single predefined class of objects, we propose a completely different scheme, termed coarse description. It aims to expand the recognition task to multiple objects while keeping the processing time under control by sacrificing some detail. The benefit is to increase a blind person's awareness and perception of the immediate environment. Coarse description is addressed via two image multi-labeling strategies, which differ in how image similarity is computed: the first uses the Euclidean distance, while the second relies on a semantic similarity measure modeled by Gaussian process (GP) estimation. To achieve fast computation, both strategies rely on a compact image representation based on compressive sensing. The proposed methodology was assessed on two datasets representing different indoor environments. Encouraging results were achieved in terms of both accuracy and processing time.

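    A minimal sketch of the first (Euclidean) multi-labeling strategy: images are compacted by a random compressive-sensing projection, and a test image inherits the labels of its closest training images. The GP-based semantic variant is omitted; the sensing matrix, image vectors, and label sets are placeholders.

        import numpy as np

        rng = np.random.default_rng(1)
        phi = rng.normal(size=(64, 1024)) / np.sqrt(64)  # random sensing matrix

        def compress(image_vec):
            return phi @ image_vec                       # compact representation

        train = rng.random((20, 1024))                   # placeholder image vectors
        train_labels = [{'chair'}, {'table', 'door'}] * 10
        codes = np.array([compress(v) for v in train])

        def coarse_description(image_vec, k=3):
            d = np.linalg.norm(codes - compress(image_vec), axis=1)
            nearest = np.argsort(d)[:k]
            # Union of the labels of the k most similar training images.
            return set().union(*(train_labels[i] for i in nearest))

        print(coarse_description(rng.random(1024)))
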
  • An Efficient Adaptive Binary Range Coder and Its VLSI Architecture

    In this paper, we propose a new hardware-efficient adaptive binary range coder (ABRC) and its VLSI architecture. Our approach reduces the bit width of the multiplication needed in the interval-division part and avoids the loop in the renormalization part of the ABRC. Probability estimation in the proposed ABRC is based on a look-up-table-free virtual sliding window. To obtain higher compression performance, we propose a new adaptive window-size selection algorithm. Compared with an ABRC with a single window, the proposed system provides faster probability adaptation at the initial encoding/decoding stage and more accurate probability estimation for very-low-entropy binary sources. The VLSI architecture of the proposed ABRC attains a throughput of 105.92 MSymbols/s on an FPGA platform and consumes 18.15 mW of dynamic power. Compared with the state-of-the-art MQ-coder (used in the JPEG2000 standard) and the M-coder (used in the H.264/AVC and H.265/HEVC standards), the proposed ABRC architecture provides comparable throughput with reduced memory and power consumption. Experimental results for a wavelet video codec with a JPEG2000-like bit-plane entropy coder show that the proposed ABRC reduces the bit rate by 0.8%-8% compared with the MQ-coder and by 1.0%-24.2% compared with the M-coder.

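    A hedged sketch of a look-up-table-free, virtual-sliding-window style probability estimator: an integer state tracks the probability of a one using shift-based updates, where the shift w plays the role of the window size (the paper's contribution is selecting that window size adaptively, which is not shown). Constants are illustrative.

        PREC = 1 << 16            # probability scaled to 16-bit integers

        def vsw_update(state, bit, w):
            # Exponential forgetting implemented with shifts only:
            # window size 2**w; larger w adapts more slowly but more precisely.
            if bit:
                state += (PREC - state) >> w
            else:
                state -= state >> w
            return max(1, min(state, PREC - 1))   # keep state inside (0, 1)

        state = PREC // 2                          # start from p = 0.5
        for b in [1, 1, 1, 0, 1, 1, 1, 1]:         # low-entropy toy source
            state = vsw_update(state, b, w=5)
        print(state / PREC)                        # estimated P(bit = 1)
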
  • Automatic Contrast Enhancement Technology with Saliency Preservation

    In this paper, we investigate the problem of image contrast enhancement. Most existing technologies suffer from excessive enhancement, introducing noise/artifacts and shifting visual attention regions. One frequently used remedy is manual parameter tuning, which is impractical for most applications because it is labor-intensive and time-consuming. In this research, we find that saliency preservation helps produce appropriately enhanced images, i.e., improved contrast without annoying artifacts. We therefore design an automatic contrast enhancement technology with a complete histogram modification framework and an automatic parameter selector. The framework combines the original image, its histogram-equalized product, and a visually pleasing version created by a sigmoid transfer function developed in our recent work. A visual quality criterion based on saliency preservation then guides the automatic parameter selection, from which a properly enhanced image is generated. We test the proposed scheme on the Kodak and VQEG databases and compare it with the classical histogram equalization technique and its variations as well as state-of-the-art contrast enhancement approaches. Experimental results demonstrate that our technique has superior saliency-preservation ability and an outstanding enhancement effect.

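    A hedged sketch of the combination framework: the output blends the original image, its histogram-equalized product, and a sigmoid-mapped version. In the paper the weights are chosen automatically by the saliency-preservation criterion; here they are fixed placeholders, and the sigmoid parameters are arbitrary.

        import numpy as np

        def hist_equalize(img):
            # img: uint8 grayscale image.
            hist = np.bincount(img.ravel(), minlength=256)
            cdf = np.cumsum(hist) / img.size
            return (cdf[img] * 255).astype(np.uint8)

        def sigmoid_map(img, gain=8.0, center=0.5):
            x = img.astype(float) / 255.0
            y = 1.0 / (1.0 + np.exp(-gain * (x - center)))
            return (255 * (y - y.min()) / (y.max() - y.min())).astype(np.uint8)

        def enhance(img, w_he=0.3, w_sig=0.3):     # weights: placeholders only
            blend = (w_he * hist_equalize(img) + w_sig * sigmoid_map(img)
                     + (1 - w_he - w_sig) * img)
            return np.clip(blend, 0, 255).astype(np.uint8)

        img = (np.random.rand(64, 64) * 180 + 20).astype(np.uint8)
        print(enhance(img).dtype, enhance(img).mean())
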
  • Design and ARM-Embedded Implementation of a Chaotic Map-Based Real-Time Secure Video Communication System

    A systematic methodology is proposed for a chaotic map-based real-time video encryption and decryption system with an Advanced RISC Machine (ARM)-embedded hardware implementation. Following the anti-control principle of dynamical systems, an 8-dimensional discrete-time chaotic map is first constructed that realizes a one-to-one surjection (bijection) on the integer range [0, N − 1], where N denotes the number of frame pixels, making it suitable for position scrambling of each video frame. Then, an 8-D discrete-time hyper-chaotic system is designed for encryption and decryption of RGB tricolor pixel values. Based on the ARM-embedded super4412 platform with a Cortex-A9 processor, together with the cross-platform Qt framework, an integrated chaotic map-based real-time secure video communication system is designed, implemented, and evaluated. The security of the designed system is tested using criteria from the NIST statistical test suite. The main feature of this method is that scrambling of RGB tricolor pixel positions and encryption of pixel values are performed at the same time to enhance security. Hardware implementation of such a secure video communication system is considerably more difficult than numerical simulation, but it was successfully implemented and tested in a real-world network environment. Both theoretical analysis and experimental results validate the feasibility and real-time performance of the new secure video communication system.

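    A toy sketch of chaos-based position scrambling, with a 1-D logistic map standing in for the paper's 8-dimensional map: a key-seeded chaotic sequence is sorted to obtain a bijective permutation of the N pixel positions, and the inverse permutation descrambles. This illustrates only the scrambling half, not the pixel-value encryption; the seed and map parameter are arbitrary.

        import numpy as np

        def chaotic_permutation(n, x0=0.37, mu=3.99):
            # Logistic map x <- mu * x * (1 - x); the seed x0 acts as the key.
            xs = np.empty(n)
            x = x0
            for i in range(n):
                x = mu * x * (1.0 - x)
                xs[i] = x
            return np.argsort(xs)              # bijection on {0, ..., n-1}

        frame = np.arange(16).reshape(4, 4)    # toy 'video frame'
        perm = chaotic_permutation(frame.size)
        scrambled = frame.ravel()[perm].reshape(frame.shape)
        inverse = np.argsort(perm)             # decryption permutation
        restored = scrambled.ravel()[inverse].reshape(frame.shape)
        assert (restored == frame).all()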

Aims & Scope

The emphasis is on, but not limited to:
1. Video A/D and D/A
2. Video Compression Techniques and Signal Processing
3. Multi-Dimensional Filters and Transforms
4. High-Speed Real-Time Circuits
5. Multi-Processor Systems: Hardware and Software
6. VLSI Architecture and Implementation for Video Technology 


Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Dan Schonfeld
Multimedia Communications Laboratory
ECE Dept. (M/C 154)
University of Illinois at Chicago (UIC)
Chicago, IL 60607-7053
tcsvt-eic@tcad.polito.it

Managing Editor
Jaqueline Zelkowitz
tcsvt@tcad.polito.it