
IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 11 • Nov. 2011

  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Freely Available from IEEE
  • Editor's Note

    Page(s): 2129 - 2130
    Freely Available from IEEE
  • Computational versus Psychophysical Bottom-Up Image Saliency: A Comparative Evaluation Study

    Page(s): 2131 - 2146

    The predictions of 13 computational bottom-up saliency models and a newly introduced Multiscale Contrast Conspicuity (MCC) metric are compared with human visual conspicuity measurements. The agreement between human visual conspicuity estimates and model saliency predictions is quantified through their rank order correlation. The maximum of the computational saliency value over the target support area correlates most strongly with visual conspicuity for 12 of the 13 models. A simple multiscale contrast model and the MCC metric both yield the largest correlation with human visual target conspicuity (>0.84). Local image saliency largely determines human visual inspection and interpretation of static and dynamic scenes. Computational saliency models therefore have a wide range of important applications, like adaptive content delivery, region-of-interest-based image compression, video summarization, progressive image transmission, image segmentation, image quality assessment, object recognition, and content-aware image scaling. However, current bottom-up saliency models do not incorporate important visual effects like crowding and lateral interaction. Additional knowledge about the exact nature of the interactions between the mechanisms mediating human visual saliency is required to develop these models further. The MCC metric and its associated psychophysical saliency measurement procedure are useful tools to systematically investigate the relative contribution of different feature dimensions to overall visual target saliency.

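The agreement measure used in the study above is a rank order correlation between model saliency values and human conspicuity estimates. A minimal sketch of Spearman's rank correlation, assuming no tied values (the function names are illustrative, not taken from the paper):

```python
def ranks(values):
    # Rank positions 1..n (this sketch assumes no ties).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(a, b):
    """Spearman rank-order correlation: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(a)
    d2 = sum((ra - rb) ** 2 for ra, rb in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))
```

A perfectly monotone relation between model saliency and human conspicuity yields rho = 1, a perfectly reversed one rho = -1.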
  • A Component-Wise Analysis of Constructible Match Cost Functions for Global Stereopsis

    Page(s): 2147 - 2159

    Match cost functions are common elements of every stereopsis algorithm that are used to provide a dissimilarity measure between pixels in different images. Global stereopsis algorithms incorporate assumptions about the smoothness of the resulting distance map that can interact with match cost functions in unpredictable ways. In this paper, we present a large-scale study on the relative performance of a structured set of match cost functions within several global stereopsis frameworks. We compare 272 match cost functions that are built from component parts in the context of four global stereopsis frameworks with a data set consisting of 57 stereo image pairs at three different variances of synthetic sensor noise. From our analysis, we infer a set of general rules that can be used to guide derivation of match cost functions for use in global stereopsis algorithms.

  • Bayesian Estimation of Beta Mixture Models with Variational Inference

    Page(s): 2160 - 2173
    Multimedia

    Bayesian estimation of the parameters in beta mixture models (BMM) is analytically intractable. Numerical solutions that simulate the posterior distribution are available but incur high computational cost. In this paper, we introduce an approximation to the prior/posterior distribution of the parameters in the beta distribution and propose an analytically tractable (closed form) Bayesian approach to the parameter estimation. The approach is based on the variational inference (VI) framework. Following the principles of the VI framework and utilizing the relative convexity bound, the extended factorized approximation method is applied to approximate the distribution of the parameters in BMM. In a fully Bayesian model where all of the parameters of the BMM are considered as variables and assigned proper distributions, our approach can asymptotically find the optimal estimate of the parameters posterior distribution. Also, the model complexity can be determined based on the data. The closed-form solution is proposed so that no iterative numerical calculation is required. Meanwhile, our approach avoids the drawback of overfitting in the conventional expectation maximization algorithm. The good performance of this approach is verified by experiments with both synthetic and real data.

  • Dynamic Processing Allocation in Video

    Page(s): 2174 - 2187

    Large stores of digital video pose severe computational challenges to existing video analysis algorithms. In applying these algorithms, users must often trade off processing speed for accuracy, as many sophisticated and effective algorithms require large computational resources that make it impractical to apply them throughout long videos. One can save considerable effort by applying these expensive algorithms sparingly, directing their application using the results of more limited processing. We show how to do this for retrospective video analysis by modeling a video using a chain graphical model and performing inference both to analyze the video and to direct processing. We apply our method to problems in background subtraction and face detection, and show in experiments that this leads to significant improvements over baseline algorithms.

  • Hough Forests for Object Detection, Tracking, and Action Recognition

    Page(s): 2188 - 2202

    The paper introduces Hough forests, which are random forests adapted to perform a generalized Hough transform in an efficient way. Compared to previous Hough-based systems such as implicit shape models, Hough forests improve the performance of the generalized Hough transform for object detection on a categorical level. At the same time, their flexibility permits extensions of the Hough transform to new domains such as object tracking and action recognition. Hough forests can be regarded as task-adapted codebooks of local appearance that allow fast supervised training and fast matching at test time. They achieve high detection accuracy since the entries of such codebooks are optimized to cast Hough votes with small variance and since their efficiency permits dense sampling of local image patches or video cuboids during detection. The efficacy of Hough forests for a set of computer vision tasks is validated through experiments on a large set of publicly available benchmark data sets and comparisons with the state-of-the-art.

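The voting mechanism behind the generalized Hough transform described above can be illustrated with a toy accumulator: each local patch casts displacement votes toward a hypothesized object centre, and the accumulator maximum gives the detection. A minimal sketch, with illustrative names (in a real Hough forest the displacement votes come from the leaves the patches fall into):

```python
from collections import Counter

def hough_vote(patches, vote_offsets):
    """patches: list of (x, y) patch positions.
    vote_offsets: for each patch, a list of (dx, dy) displacement votes.
    Returns the accumulator cell with the most votes and its count."""
    acc = Counter()
    for (x, y), offsets in zip(patches, vote_offsets):
        for dx, dy in offsets:
            acc[(x + dx, y + dy)] += 1  # cast a vote for a centre hypothesis
    centre, votes = acc.most_common(1)[0]
    return centre, votes
```

Three patches whose stored displacements agree on one centre outvote a stray patch, which is what makes the vote variance minimized by Hough forest training matter for detection accuracy.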
  • Light-Efficient Photography

    Page(s): 2203 - 2214
    Multimedia

    In this paper, we consider the problem of imaging a scene with a given depth of field at a given exposure level in the shortest amount of time possible. We show that by 1) collecting a sequence of photos and 2) controlling the aperture, focus, and exposure time of each photo individually, we can span the given depth of field in less total time than it takes to expose a single narrower-aperture photo. Using this as a starting point, we obtain two key results. First, for lenses with continuously variable apertures, we derive a closed-form solution for the globally optimal capture sequence, i.e., that collects light from the specified depth of field in the most efficient way possible. Second, for lenses with discrete apertures, we derive an integer programming problem whose solution is the optimal sequence. Our results are applicable to off-the-shelf cameras and typical photography conditions, and advocate the use of dense, wide-aperture photo sequences as a light-efficient alternative to single-shot, narrow-aperture photography.

  • On Improving the Efficiency of Tensor Voting

    Page(s): 2215 - 2228
    Multimedia

    This paper proposes two alternative formulations to reduce the high computational complexity of tensor voting, a robust perceptual grouping technique used to extract salient information from noisy data. The first scheme consists of numerical approximations of the votes, which have been derived from an in-depth analysis of the plate and ball voting processes. The second scheme simplifies the formulation while keeping the same perceptual meaning of the original tensor voting: The stick tensor voting and the stick component of the plate tensor voting must reinforce surfaceness, the plate components of both the plate and ball tensor voting must boost curveness, whereas junctionness must be strengthened by the ball component of the ball tensor voting. Two new parameters have been proposed for the second formulation in order to control the potentially conflicting influence of the stick component of the plate vote and the ball component of the ball vote. Results show that the proposed formulations can be used in applications where efficiency is an issue since they have a complexity of order O(1). Moreover, the second proposed formulation has been shown to be more appropriate than the original tensor voting for estimating saliencies by appropriately setting the two new parameters.

  • Optimization in Differentiable Manifolds in Order to Determine the Method of Construction of Prehistoric Wall Paintings

    Page(s): 2229 - 2244
    Multimedia

    In this paper, a general methodology is introduced for the determination of potential prototype curves used for the drawing of prehistoric wall paintings. The approach includes 1) preprocessing of the wall-painting contours to properly partition them, according to their curvature, 2) choice of prototype curves families, 3) analysis and optimization in a 4-manifold for a first estimation of the form of these prototypes, 4) clustering of the contour parts and the prototypes to determine a minimal number of potential guides, and 5) further optimization in a 4-manifold, applied to each cluster separately, in order to determine the exact functional form of the potential guides, together with the corresponding drawn contour parts. The methodology introduced simultaneously deals with two problems: 1) the arbitrariness in data-points orientation and 2) the determination of one proper form for a prototype curve that optimally fits the corresponding contour data. Arbitrariness in orientation has been dealt with via a novel curvature-based error, while the proper forms of curve prototypes have been exhaustively determined by embedding curvature deformations of the prototypes into 4-manifolds. Application of this methodology to celebrated wall paintings excavated at Tiryns, Greece, and the Greek island of Thera manifests that it is highly probable that these wall paintings were drawn by means of geometric guides that correspond to linear spirals and hyperbolae. These geometric forms fit the drawings' lines with an exceptionally low average error, less than 0.39 mm. Hence, the approach suggests the existence of accurate realizations of complicated geometric entities more than 1,000 years before their axiomatic formulation in the Classical Ages.

  • Robust Multiscale Stereo Matching from Fundus Images with Radiometric Differences

    Page(s): 2245 - 2258

    A robust multiscale stereo matching algorithm is proposed to find reliable correspondences between low contrast and weakly textured retinal image pairs with radiometric differences. Existing algorithms designed to deal with piecewise planar surfaces with distinct features and Lambertian reflectance do not apply in applications such as 3D reconstruction of medical images including stereo retinal images. In this paper, robust pixel feature vectors are formulated to extract discriminative features in the presence of noise in scale space, through which the response of low-frequency mechanisms alters and interacts with the response of high-frequency mechanisms. The deep structures of the scene are represented with the evolution of disparity estimates in scale space, which distributes the matching ambiguity along the scale dimension to obtain globally coherent reconstructions. The performance is verified both qualitatively by face validity and quantitatively on our collection of stereo fundus image sets with ground truth, which have been made publicly available as an extension of standard test images for performance evaluation.

  • Robust Visual Tracking and Vehicle Classification via Sparse Representation

    Page(s): 2259 - 2272

    In this paper, we propose a robust visual tracking method by casting tracking as a sparse approximation problem in a particle filter framework. In this framework, occlusion, noise, and other challenging issues are addressed seamlessly through a set of trivial templates. Specifically, to find the tracking target in a new frame, each target candidate is sparsely represented in the space spanned by target templates and trivial templates. The sparsity is achieved by solving an ℓ1-regularized least-squares problem. Then, the candidate with the smallest projection error is taken as the tracking target. After that, tracking is continued using a Bayesian state inference framework. Two strategies are used to further improve the tracking performance. First, target templates are dynamically updated to capture appearance changes. Second, nonnegativity constraints are enforced to filter out clutter which negatively resembles tracking targets. We test the proposed approach on numerous sequences involving different types of challenges, including occlusion and variations in illumination, scale, and pose. The proposed approach demonstrates excellent performance in comparison with previously proposed trackers. We also extend the method for simultaneous tracking and recognition by introducing a static template set which stores target images from different classes. The recognition result at each frame is propagated to produce the final result for the whole video. The approach is validated on a vehicle tracking and classification task using outdoor infrared video sequences.

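The core computation in the tracker above is an ℓ1-regularized least-squares problem. A minimal iterative soft-thresholding (ISTA) sketch in pure Python; the step-size choice, iteration count, and function names are illustrative assumptions, not the authors' solver:

```python
import math

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def soft_threshold(v, t):
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def ista_l1(A, y, lam, n_iter=500):
    """Minimize 0.5 * ||A x - y||^2 + lam * ||x||_1 by iterative
    soft thresholding. In the tracker, columns of A would be target
    templates plus trivial templates and y a candidate patch."""
    At = transpose(A)
    # Conservative step size: 1 / ||A||_F^2 <= 1 / ||A||_2^2.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(n_iter):
        r = [ai - yi for ai, yi in zip(matvec(A, x), y)]  # residual A x - y
        grad = matvec(At, r)                              # gradient A^T r
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)],
                           step * lam)
    return x
```

The candidate whose sparse code gives the smallest reconstruction error on the target templates would then be selected as the tracking result.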
  • Statistical Computations on Grassmann and Stiefel Manifolds for Image and Video-Based Recognition

    Page(s): 2273 - 2286
    Multimedia

    In this paper, we examine image and video-based recognition applications where the underlying models have a special structure: the linear subspace structure. We discuss how commonly used parametric models for videos and image sets can be described using the unified framework of Grassmann and Stiefel manifolds. We first show that the parameters of linear dynamic models are finite-dimensional linear subspaces of appropriate dimensions. Unordered image sets as samples from a finite-dimensional linear subspace naturally fall under this framework. We show that an inference over subspaces can be naturally cast as an inference problem on the Grassmann manifold. To perform recognition using subspace-based models, we need tools from the Riemannian geometry of the Grassmann manifold. This involves a study of the geometric properties of the space, appropriate definitions of Riemannian metrics, and definition of geodesics. Further, we derive statistical modeling of inter- and intraclass variations that respect the geometry of the space. We apply techniques such as intrinsic and extrinsic statistics to enable maximum-likelihood classification. We also provide algorithms for unsupervised clustering derived from the geometry of the manifold. Finally, we demonstrate the improved performance of these methods in a wide variety of vision applications such as activity recognition, video-based face recognition, object recognition from image sets, and activity-based video clustering.

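Distances on the Grassmann manifold are built from principal angles between subspaces. A minimal sketch of the one-dimensional case, where the single principal angle reduces to a normalized dot product (for k-dimensional subspaces the angles come from an SVD of the product of orthonormal basis matrices; the function name is illustrative):

```python
import math

def principal_angle_1d(u, v):
    """Principal angle between the 1-D subspaces spanned by u and v.
    The absolute value makes the angle invariant to the sign and scale
    of the spanning vectors, as required for subspaces."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    c = min(1.0, abs(dot) / (norm_u * norm_v))  # clamp rounding error above 1
    return math.acos(c)
```

The geodesic distance between two points on the Grassmann manifold is the root-sum-square of their principal angles, which in this 1-D sketch is just the angle itself.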
  • Trajectory Learning for Activity Understanding: Unsupervised, Multilevel, and Long-Term Adaptive Approach

    Page(s): 2287 - 2301

    Society is rapidly accepting the use of video cameras in many new and varied locations, but effective methods to utilize and manage the massive resulting amounts of visual data are only slowly developing. This paper presents a framework for live video analysis in which the behaviors of surveillance subjects are described using a vocabulary learned from recurrent motion patterns, for real-time characterization and prediction of future activities, as well as the detection of abnormalities. The repetitive nature of object trajectories is utilized to automatically build activity models in a 3-stage hierarchical learning process. Interesting nodes are learned through Gaussian mixture modeling, connecting routes formed through trajectory clustering, and spatio-temporal dynamics of activities probabilistically encoded using hidden Markov models. Activity models are adapted to small temporal variations in an online fashion using maximum likelihood regression and new behaviors are discovered from a periodic retraining for long-term monitoring. Extensive evaluation on various data sets, typically missing from other work, demonstrates the efficacy and generality of the proposed framework for surveillance-based activity analysis.

  • Unsupervised Organization of Image Collections: Taxonomies and Beyond

    Page(s): 2302 - 2315
    Multimedia

    We introduce a nonparametric Bayesian model, called TAX, which can organize image collections into a tree-shaped taxonomy without supervision. The model is inspired by the Nested Chinese Restaurant Process (NCRP) and associates each image with a path through the taxonomy. Similar images share initial segments of their paths and thus share some aspects of their representation. Each internal node in the taxonomy represents information that is common to multiple images. We explore the properties of the taxonomy through experiments on a large (~10^4) image collection with a number of users trying to quickly locate a given image. We find that the main benefits are easier navigation through image collections and reduced description length. A natural question is whether a taxonomy is the optimal form of organization for natural images. Our experiments indicate that although taxonomies can organize images in a useful manner, more elaborate structures may be even better suited for this task.

  • Are MSER Features Really Interesting?

    Page(s): 2316 - 2320

    Detection and description of affine-invariant features is a cornerstone component in numerous computer vision applications. In this note, we analyze the notion of maximally stable extremal regions (MSERs) through the prism of the curvature scale space, and conclude that in its original definition, MSER prefers regular (round) regions. Arguing that interesting features in natural images usually have irregular shapes, we propose alternative definitions of MSER which are free of this bias, yet maintain their invariance properties.

  • A Similarity Measure for Image and Volumetric Data Based on Hermann Weyl's Discrepancy

    Page(s): 2321 - 2329

    The paper focuses on similarity measures for translationally misaligned image and volumetric patterns. For measures based on standard concepts such as cross-correlation, L_p-norm, and mutual information, monotonicity with respect to the extent of misalignment cannot be guaranteed. In this paper, we introduce a novel distance measure based on Hermann Weyl's discrepancy concept that relies on the evaluation of partial sums. In contrast to standard concepts, in this case, monotonicity, positive-definiteness, and a homogeneously linear upper bound with respect to the extent of misalignment can be proven. We show that this monotonicity property is not influenced by the image's frequencies or other characteristics, which makes this new similarity measure useful for similarity-based registration, tracking, and segmentation.

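The partial-sum evaluation underlying Weyl's discrepancy has a compact form in one dimension: the maximum absolute interval sum of f - g equals the spread of its prefix sums. A 1-D sketch with illustrative naming (the paper treats image and volumetric patterns):

```python
def discrepancy(f, g):
    """Weyl discrepancy ||f - g||_D: the maximum absolute sum of f - g
    over all index intervals, computed via prefix sums. Any interval sum
    is a difference of two prefix sums, so the maximum over intervals
    equals max(prefix) - min(prefix)."""
    prefix, s = [0.0], 0.0
    for fi, gi in zip(f, g):
        s += fi - gi
        prefix.append(s)
    return max(prefix) - min(prefix)
```

Unlike cross-correlation, this measure grows monotonically as a box signal is shifted further from its reference, which is the property the paper proves and exploits.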
  • Motion Regularization for Matting Motion Blurred Objects

    Page(s): 2329 - 2336

    This paper addresses the problem of matting motion blurred objects from a single image. Existing single image matting methods are designed to extract static objects that have fractional pixel occupancy. This arises because the physical scene object has a finer resolution than the discrete image pixel and therefore only occupies a fraction of the pixel. For a motion blurred object, however, fractional pixel occupancy is attributed to the object's motion over the exposure period. While conventional matting techniques can be used to matte motion blurred objects, they are not formulated in a manner that considers the object's motion and tend to work only when the object is on a homogeneous background. We show how to obtain better alpha mattes by introducing a regularization term in the matting formulation to account for the object's motion. In addition, we outline a method for estimating local object motion based on local gradient statistics from the original image. For the sake of completeness, we also discuss how user markup can be used to denote the local direction in lieu of motion estimation. Improvements to alpha mattes computed with our regularization are demonstrated on a variety of examples.

  • TPAMI Information for authors

    Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois