
IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 7 • July 2014


  • Table of Contents

    Page(s): C1
    PDF (355 KB)
    Freely Available from IEEE
  • IEEE Transactions on Pattern Analysis and Machine Intelligence Editorial Board

    Page(s): C2
    PDF (319 KB)
    Freely Available from IEEE
  • As-Projective-As-Possible Image Stitching with Moving DLT

    Page(s): 1285 - 1298
    Multimedia
    PDF (3744 KB) | HTML

    The success of commercial image stitching tools often leads to the impression that image stitching is a “solved problem”. The reality, however, is that many tools give unconvincing results when the input photos violate fairly restrictive imaging assumptions; the main two being that the photos correspond to views that differ purely by rotation, or that the imaged scene is effectively planar. Such assumptions underpin the use of 2D projective transforms or homographies to align photos. In the hands of the casual user, such conditions are often violated, yielding misalignment artifacts or “ghosting” in the results. Accordingly, many existing image stitching tools depend critically on post-processing routines to conceal ghosting. In this paper, we propose a novel estimation technique called Moving Direct Linear Transformation (Moving DLT) that is able to tweak or fine-tune the projective warp to accommodate the deviations of the input data from the idealized conditions. This produces as-projective-as-possible image alignment that significantly reduces ghosting without compromising the geometric realism of perspective image stitching. Our technique thus lessens the dependency on potentially expensive post-processing algorithms. In addition, we describe how multiple as-projective-as-possible warps can be simultaneously refined via bundle adjustment to accurately align multiple images for large panorama creation.

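    To make the estimation step concrete, the sketch below implements a weighted direct linear transformation in the spirit of Moving DLT: each location in the image gets its own homography, with correspondences down-weighted by their distance from that location. This is a minimal illustration, not the authors' code; the Gaussian scale sigma, the weight floor gamma, and the omission of point normalization are all assumptions.

      import numpy as np

      def dlt_rows(x, y):
          # Two DLT constraint rows for one correspondence x -> y (2D points).
          x1, x2 = x
          y1, y2 = y
          return np.array([
              [0, 0, 0, -x1, -x2, -1, y2 * x1, y2 * x2, y2],
              [x1, x2, 1, 0, 0, 0, -y1 * x1, -y1 * x2, -y1],
          ])

      def moving_dlt(src, dst, center, sigma=8.0, gamma=0.05):
          # src, dst: (N, 2) matched points; center: location being warped.
          # Gaussian weights decay with distance from `center` and are clamped
          # below by gamma, so distant points still act as a weak global prior.
          A = np.vstack([dlt_rows(s, d) for s, d in zip(src, dst)])
          w = np.exp(-np.sum((src - center) ** 2, axis=1) / sigma ** 2)
          W = np.repeat(np.maximum(w, gamma), 2)    # two rows per point
          _, _, Vt = np.linalg.svd(W[:, None] * A)  # weighted least squares
          return Vt[-1].reshape(3, 3)               # smallest right singular vector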
  • Generalized Boundaries from Multiple Image Interpretations

    Page(s): 1312 - 1324
    PDF (5138 KB) | HTML

    Boundary detection is a fundamental computer vision problem that is essential for a variety of tasks, such as contour and region segmentation, symmetry detection, and object recognition and categorization. We propose a generalized formulation for boundary detection, with a closed-form solution, applicable to the localization of different types of boundaries, such as object edges in natural images and occlusion boundaries in video. Our generalized boundary detection method (Gb) combines low-level and mid-level image representations in a single eigenvalue problem and solves for the optimal continuous boundary orientation and strength. The closed-form solution enables our algorithm to achieve state-of-the-art results at a significantly lower computational cost than current methods. We also propose two complementary novel components that can seamlessly be combined with Gb: first, a soft-segmentation procedure that provides region input layers to our boundary detection algorithm for a significant improvement in accuracy at negligible computational cost; second, an efficient method for contour grouping and reasoning which, when applied as a final post-processing stage, further increases boundary detection performance.

    Open Access
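    The single-eigenvalue-problem idea can be illustrated with a simpler relative of Gb: pool gradient outer products from several input layers into one 2x2 matrix per pixel and read boundary strength and orientation off its principal eigenpair. A structure-tensor-style sketch under that assumption, not the exact Gb formulation:

      import numpy as np
      from scipy.ndimage import uniform_filter

      def multi_layer_boundary(layers, radius=3):
          # layers: list of (H, W) arrays (color channels, soft segmentations, ...).
          H, W = layers[0].shape
          M = np.zeros((H, W, 2, 2))
          for L in layers:
              gy, gx = np.gradient(L.astype(float))   # per-layer image gradients
              M[..., 0, 0] += gx * gx
              M[..., 0, 1] += gx * gy
              M[..., 1, 0] += gx * gy
              M[..., 1, 1] += gy * gy
          for i in range(2):                          # pool over a local window
              for j in range(2):
                  M[..., i, j] = uniform_filter(M[..., i, j], size=2 * radius + 1)
          evals, evecs = np.linalg.eigh(M)            # eigenvalues in ascending order
          strength = np.sqrt(evals[..., 1])           # large eigenvalue = strong edge
          orientation = np.arctan2(evecs[..., 1, 1], evecs[..., 0, 1])
          return strength, orientation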
  • Dynamic Probabilistic CCA for Analysis of Affective Behavior and Fusion of Continuous Annotations

    Page(s): 1299 - 1311
    PDF (1584 KB) | HTML

    Fusing multiple continuous expert annotations is a crucial problem in machine learning and computer vision, particularly when dealing with uncertain and subjective tasks related to affective behavior. Inspired by the concept of inferring shared and individual latent spaces in Probabilistic Canonical Correlation Analysis (PCCA), we propose a novel generative model that discovers temporal dependencies on the shared/individual spaces (Dynamic Probabilistic CCA, DPCCA). To accommodate the temporal lags that are prominent among continuous annotations, we further introduce a latent warping process, leading to the DPCCA with Time Warpings (DPCTW) model. Finally, we propose two supervised variants of DPCCA/DPCTW which incorporate inputs (i.e., visual or audio features), in a generative (SG-DPCCA) and a discriminative (SD-DPCCA) manner. We show that the resulting family of models (i) can be used as a unifying framework for solving the problems of temporal alignment and fusion of multiple annotations in time, (ii) can automatically rank and filter annotations based on latent posteriors or other model statistics, and (iii) by incorporating dynamics, modeling annotation-specific biases, noise estimation, time warping, and supervision, outperforms state-of-the-art methods both for the aggregation of multiple yet imperfect expert annotations and for the alignment of affective behavior.

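    The temporal lags the warping component addresses can be made visible with plain dynamic time warping: align two annotators' traces before fusing them. DPCTW learns its warping jointly with the shared latent space; classic DTW, sketched below, is only a minimal stand-in for that alignment step.

      import numpy as np

      def dtw_align(a, b):
          # Optimal monotone alignment between two 1-D annotation traces.
          n, m = len(a), len(b)
          D = np.full((n + 1, m + 1), np.inf)
          D[0, 0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  D[i, j] = abs(a[i - 1] - b[j - 1]) + min(
                      D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
          path, i, j = [], n, m                 # backtrack from the far corner
          while (i, j) != (0, 0):
              path.append((i - 1, j - 1))
              _, i, j = min((D[i - 1, j - 1], i - 1, j - 1),
                            (D[i - 1, j], i - 1, j),
                            (D[i, j - 1], i, j - 1))
          return path[::-1]                     # aligned index pairs

    Averaging the aligned value pairs gives a simple fused trace; DPCCA/DPCTW replace this averaging with inference over the shared and individual latent spaces.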
  • Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

    Page(s): 1325 - 1339
    PDF (2298 KB) | HTML

    We introduce a new dataset, Human3.6M, of 3.6 million accurate 3D human poses, acquired by recording the performance of 5 female and 6 male subjects under 4 different viewpoints, for training realistic human sensing systems and for evaluating the next generation of human pose estimation models and algorithms. Besides increasing the size of the datasets in the current state of the art by several orders of magnitude, we also aim to complement such datasets with a diverse set of motions and poses encountered as part of typical human activities (taking photos, talking on the phone, posing, greeting, eating, etc.), with additional synchronized image, human motion capture, and time-of-flight (depth) data, and with accurate 3D body scans of all the subject actors involved. We also provide controlled mixed-reality evaluation scenarios where 3D human models are animated using motion capture and inserted, using correct 3D geometry, in complex real environments, viewed with moving cameras, and under occlusion. Finally, we provide a set of large-scale statistical models and detailed evaluation baselines for the dataset, illustrating its diversity and the scope for improvement by future work in the research community. Our experiments show that our best large-scale model can leverage our full training set to obtain a 20% improvement in performance compared to a training set of the scale of the largest existing public dataset for this problem. Yet the potential for improvement by leveraging higher-capacity, more complex models with our large dataset is substantially greater and should stimulate future research. The dataset, together with code for the associated large-scale learning models, features, visualization tools, and the evaluation server, is available online at http://vision.imar.ro/human3.6m.

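    For readers building baselines on the dataset, the error measure most associated with Human3.6M-style evaluation is the mean per-joint position error after root alignment, sketched below. The root joint index and millimetre units are assumptions; the official evaluation server applies its own protocol.

      import numpy as np

      def mpjpe(pred, gt, root=0):
          # pred, gt: (T, J, 3) arrays of 3D joint positions (e.g., in mm).
          # Translate both skeletons so the root joint sits at the origin,
          # then average the per-joint Euclidean errors over joints and frames.
          pred = pred - pred[:, root:root + 1]
          gt = gt - gt[:, root:root + 1]
          return np.linalg.norm(pred - gt, axis=-1).mean()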
  • Iterative Discovery of Multiple Alternative Clustering Views

    Page(s): 1340 - 1353
    PDF (1678 KB) | HTML

    Complex data can be grouped and interpreted in many different ways. Most existing clustering algorithms, however, only find one clustering solution, and provide little guidance to data analysts who may not be satisfied with that single clustering and may wish to explore alternatives. We introduce a novel approach that provides several clustering solutions to the user for the purposes of exploratory data analysis. Our approach additionally captures the notion that alternative clusterings may reside in different subspaces (or views). We present an algorithm that simultaneously finds these subspaces and the corresponding clusterings. The algorithm is based on an optimization procedure that incorporates terms for cluster quality and novelty relative to previously discovered clustering solutions. We present a range of experiments that compare our approach to alternatives and explore the connections between simultaneous and iterative modes of discovery of multiple clusterings.

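    The iterative mode of discovery can be mimicked in a few lines: cluster, project out the subspace spanned by the current cluster means, and re-cluster in the residual subspace. The paper's algorithm instead optimizes cluster quality and novelty jointly; the toy loop below, with assumed k and number of views, only illustrates the idea that alternative clusterings live in different subspaces.

      import numpy as np
      from sklearn.cluster import KMeans

      def iterative_alternative_clusterings(X, k=3, n_views=2):
          Xv, solutions = X - X.mean(0), []
          for _ in range(n_views):
              labels = KMeans(n_clusters=k, n_init=10).fit_predict(Xv)
              solutions.append(labels)
              # Orthonormal basis of the subspace separating the found clusters...
              means = np.stack([Xv[labels == c].mean(0) for c in range(k)])
              Q, _ = np.linalg.qr(means.T)
              Xv = Xv - Xv @ Q @ Q.T  # ...then continue in its orthogonal complement.
          return solutions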
  • Multiple Kernel Learning for Visual Object Recognition: A Review

    Page(s): 1354 - 1369
    PDF (2331 KB) | HTML

    Multiple kernel learning (MKL) is a principled approach for selecting and combining kernels for a given recognition task. A number of studies have shown that MKL is a useful tool for object recognition, where each image is represented by multiple sets of features and MKL is applied to combine different feature sets. We review the state of the art for MKL, including different formulations and algorithms for solving the related optimization problems, with a focus on their applications to object recognition. One dilemma faced by practitioners interested in using MKL for object recognition is that different studies often provide conflicting results about its effectiveness and efficiency. To resolve this, we conduct extensive experiments on standard datasets to evaluate various approaches to MKL for object recognition. We argue that the seemingly contradictory conclusions offered by these studies stem from different experimental setups. The conclusions of our study are: (i) given a sufficient number of training examples and feature/kernel types, MKL is more effective for object recognition than simple kernel combination (e.g., choosing the best-performing kernel or averaging the kernels); and (ii) among the various approaches proposed for MKL, those based on sequential minimal optimization, semi-infinite programming, and the level method are computationally the most efficient.

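    The "average kernel" baseline that the review compares MKL against is easy to reproduce with precomputed kernels; the sketch below combines two feature channels this way on placeholder data. A real MKL solver (SMO-, SIP-, or level-method-based, per the review) would learn the combination weights instead of fixing them uniform.

      import numpy as np
      from sklearn.svm import SVC

      def combine_kernels(kernels, weights=None):
          # Convex combination of precomputed Gram matrices; uniform weights
          # give the average-kernel baseline.
          if weights is None:
              weights = np.ones(len(kernels)) / len(kernels)
          return sum(w * K for w, K in zip(weights, kernels))

      rng = np.random.default_rng(0)
      X1, X2 = rng.normal(size=(40, 5)), rng.normal(size=(40, 8))  # two channels
      y = rng.integers(0, 2, size=40)
      K_lin = X1 @ X1.T                                            # linear kernel
      K_rbf = np.exp(-0.5 * np.sum((X2[:, None] - X2[None]) ** 2, axis=-1))
      clf = SVC(kernel="precomputed").fit(combine_kernels([K_lin, K_rbf]), y)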
  • Relating Things and Stuff via Object Property Interactions

    Page(s): 1370 - 1383
    PDF (1459 KB) | HTML

    In the last few years, substantially different approaches have been adopted for segmenting and detecting “things” (object categories that have a well-defined shape, such as people and cars) and “stuff” (object categories that have an amorphous spatial extent, such as grass and sky). While things have typically been detected by sliding-window or Hough-transform-based methods, detection of stuff is generally formulated as a pixel- or segment-wise classification problem. This paper proposes a framework for scene understanding that models both things and stuff using a common representation, while preserving their distinct nature through a property list. This representation allows us to enforce sophisticated geometric and semantic relationships between thing and stuff categories via property interactions in a single graphical model. We use the latest advances in discrete optimization to efficiently perform maximum a posteriori (MAP) inference in this model. We evaluate our method on the Stanford dataset by comparing it against state-of-the-art methods for object segmentation and detection. We also show that our method achieves competitive performance on the challenging PASCAL '09 segmentation dataset.

  • Shape Analysis of Planar Multiply-Connected Objects Using Conformal Welding

    Page(s): 1384 - 1401
    PDF (2978 KB) | HTML

    Shape analysis is a central problem in the field of computer vision. In 2D shape analysis, classification and recognition of objects from their observed silhouettes are extremely crucial but difficult. It usually involves an efficient representation of 2D shape space with a metric, so that its mathematical structure can be used for further analysis. Although the study of 2D simply-connected shapes has been the subject of a large body of literature, the analysis of multiply-connected shapes is comparatively less studied. In this work, we propose a representation for general 2D multiply-connected domains with arbitrary topologies using conformal welding. A metric can be defined on the proposed representation space to measure dissimilarities between objects. The main idea is to map the exterior and interior of the domain conformally to unit disks and circle domains (a unit disk with several inner disks removed), using holomorphic 1-forms. A set of diffeomorphisms of the unit circle S^1 can be obtained, which together with the conformal modules are used to define the shape signature. A shape distance between signatures can then be defined to measure dissimilarities between shapes. We prove theoretically that the proposed shape signature uniquely determines multiply-connected objects under suitable normalization. We also introduce a reconstruction algorithm to obtain shapes from their signatures. This completes our framework and allows us to move back and forth between shapes and signatures. With that, a morphing algorithm between shapes can be developed through the interpolation of the Beltrami coefficients associated with the signatures. Experiments have been carried out on shapes extracted from real images. The results demonstrate the efficacy of our proposed algorithm as a stable shape representation scheme.

  • Stereo Time-of-Flight with Constructive Interference

    Page(s): 1402 - 1413
    Multimedia
    PDF (2022 KB) | HTML

    This paper describes a novel method to acquire depth images using a pair of ToF (Time-of-Flight) cameras. As opposed to approaches that filter, calibrate, or perform 3D reconstruction after image acquisition, we combine the measurements of the two cameras within a modified acquisition procedure. The proposed stereo-ToF acquisition is composed of three stages during which we actively modify the infrared lighting of the scene: first, the two cameras emit an infrared signal one after the other (stages 1 and 2), and then simultaneously (stage 3). Assuming the scene is static during the three stages, we gather the depth measurements obtained with both cameras and define a cost function to optimize the two depth images. A qualitative and quantitative evaluation of the performance of the proposed stereo-ToF acquisition is provided for both simulated and real ToF cameras. In both cases, the stereo-ToF acquisition produces more accurate depth measurements. Moreover, an extension to the multi-view ToF case and a detailed study of the interference characteristics of the system are included.

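    Continuous-wave ToF cameras recover depth from the phase shift of the returned infrared signal, d = c·phi / (4·pi·f_mod), which is the per-camera measurement the paper's cost function operates on. The sketch below shows that standard relation plus a toy confidence-weighted fusion of two depth maps; the paper's three-stage acquisition and optimization are considerably more involved, and the 20 MHz modulation frequency is an assumed typical value.

      import numpy as np

      C = 299_792_458.0  # speed of light (m/s)

      def tof_depth(phase, f_mod=20e6):
          # Standard continuous-wave ToF relation: d = c * phase / (4 * pi * f).
          return C * phase / (4 * np.pi * f_mod)

      def fused_depth(d1, d2, conf1, conf2):
          # Toy per-pixel fusion of two cameras' depth maps by confidence
          # weighting; a stand-in for the paper's joint cost over all stages.
          return (conf1 * d1 + conf2 * d2) / (conf1 + conf2)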
  • Structured Time Series Analysis for Human Action Segmentation and Recognition

    Page(s): 1414 - 1427
    Multimedia
    PDF (1597 KB) | HTML

    We address the problem of structure learning of human motion in order to recognize actions from a continuous monocular motion sequence of an arbitrary person from an arbitrary viewpoint. Human motion sequences are represented by multivariate time series in the joint-trajectory space. Under this structured time series framework, we first propose Kernelized Temporal Cut (KTC), an extension of previous work on change-point detection that incorporates Hilbert-space embeddings of distributions to handle the nonparametric nature and high dimensionality of human motion. Experimental results demonstrate the effectiveness of this approach, which yields real-time segmentation and high action segmentation accuracy. Second, a spatio-temporal manifold framework is proposed to model the latent structure of time series data, and an efficient spatio-temporal alignment algorithm, Dynamic Manifold Warping (DMW), is proposed for multivariate time series to calculate motion similarity between action sequences (segments). Furthermore, by combining the temporal segmentation algorithm and the alignment algorithm, online human action recognition can be performed by association with a few labeled examples from motion capture data. Results on human motion capture data and 3D depth sensor data demonstrate the effectiveness of the proposed approach in automatically segmenting and recognizing motion sequences, and its ability to handle noisy and partially occluded data in the transfer learning module.

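    The Hilbert-space embedding behind KTC is the same machinery as the kernel two-sample test: a candidate cut is scored by the maximum mean discrepancy (MMD) between the windows on either side of it. The sketch below implements that scoring with an RBF kernel; the window size and bandwidth are arbitrary choices, and KTC itself goes further than this simplified stand-in.

      import numpy as np

      def rbf_gram(A, B, sigma=1.0):
          # Pairwise RBF kernel values between the rows of A and B.
          d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None] - 2 * A @ B.T
          return np.exp(-d2 / (2 * sigma**2))

      def mmd2(X, Y, sigma=1.0):
          # Squared maximum mean discrepancy (biased estimate) between samples.
          return (rbf_gram(X, X, sigma).mean() + rbf_gram(Y, Y, sigma).mean()
                  - 2 * rbf_gram(X, Y, sigma).mean())

      def change_point_scores(seq, w=30, sigma=1.0):
          # seq: (T, d) multivariate series, e.g., joint trajectories.
          # Peaks in the score mark candidate action boundaries.
          return np.array([mmd2(seq[t - w:t], seq[t:t + w], sigma)
                           for t in range(w, len(seq) - w)])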
  • Tracking by Sampling and Integrating Multiple Trackers

    Page(s): 1428 - 1441
    PDF (2095 KB) | HTML

    We propose the visual tracker sampler, a novel tracking algorithm that can work robustly in challenging scenarios, where several kinds of appearance and motion changes of an object can occur simultaneously. The proposed tracking algorithm accurately tracks a target by searching for appropriate trackers in each frame. Since the real-world tracking environment varies severely over time, the trackers should be adapted or newly constructed depending on the current situation, so that each specific tracker takes charge of a certain change in the object. To do this, our method obtains several samples of not only the states of the target but also the trackers themselves during the sampling process. The trackers are efficiently sampled using the Markov Chain Monte Carlo (MCMC) method from the predefined tracker space by proposing new appearance models, motion models, state representation types, and observation types, which are the important ingredients of visual trackers. All trackers are then integrated into one compound tracker through an Interacting MCMC (IMCMC) method, in which the trackers interactively communicate with one another while running in parallel. By exchanging information with others, each tracker further improves its performance, thus increasing overall tracking performance. Experimental results show that our method tracks the object accurately and reliably in realistic videos, where appearance and motion drastically change over time, and outperforms even state-of-the-art tracking methods.

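    Stripped to its simplest form, sampling trackers rather than only target states is Metropolis sampling over a discrete space of tracker configurations. The toy below does exactly that, with an abstract score function standing in for the likelihood; the paper's interacting parallel chains (IMCMC) are considerably richer.

      import math
      import random

      def sample_trackers(space, score, n_iters=500, temp=0.2):
          # space: list of tracker configurations (appearance model x motion
          # model x ...); score: higher means the configuration explains the
          # recent frames better. Symmetric independence proposals.
          current = random.choice(space)
          samples = []
          for _ in range(n_iters):
              proposal = random.choice(space)
              accept = math.exp(min(0.0, (score(proposal) - score(current)) / temp))
              if random.random() < accept:
                  current = proposal
              samples.append(current)
          return samples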
  • Visual Tracking: An Experimental Survey

    Page(s): 1442 - 1468
    PDF (3538 KB) | HTML

    A large variety of trackers has been proposed in the literature over the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem and therefore remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.

    Open Access
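    Survival-curve evaluation treats a tracking failure like an event in survival analysis: the curve gives the fraction of videos a tracker survives past each frame, and videos tracked to the end are right-censored. A minimal Kaplan-Meier estimator on made-up failure data:

      def survival_curve(failure_frames, video_lengths):
          # failure_frames[i]: frame where the tracker failed on video i,
          # or None if it survived the whole video (right-censored).
          events = sorted({f for f in failure_frames if f is not None})
          S, curve = 1.0, []
          for t in events:
              at_risk = sum(1 for f, L in zip(failure_frames, video_lengths)
                            if (f is None or f >= t) and L >= t)
              failed = sum(1 for f in failure_frames if f == t)
              S *= 1.0 - failed / at_risk       # Kaplan-Meier product-limit step
              curve.append((t, S))
          return curve

      print(survival_curve([120, 40, None, 40], [300, 300, 250, 300]))
      # [(40, 0.5), (120, 0.25)]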
  • What Makes a Photograph Memorable?

    Page(s): 1469 - 1482
    PDF (5903 KB) | HTML

    When glancing at a magazine, or browsing the Internet, we are continuously exposed to photographs. Despite this overflow of visual information, humans are extremely good at remembering thousands of pictures along with some of their visual details. But not all images are equal in memory. Some stick in our minds while others are quickly forgotten. In this paper, we focus on the problem of predicting how memorable an image will be. We show that memorability is an intrinsic and stable property of an image that is shared across different viewers, and remains stable across delays. We introduce a database for which we have measured the probability that each picture will be recognized after a single view. We analyze a collection of image features, labels, and attributes that contribute to making an image memorable, and we train a predictor based on global image descriptors. We find that predicting image memorability is a task that can be addressed with current computer vision techniques. While making memorable images is a challenging task in visualization, photography, and education, this work is a first attempt to quantify this useful property of images.

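    At its core, the predictor described here is a regressor from global image descriptors to measured memorability scores. The sketch below wires that up with support vector regression; the random features and R² scoring are placeholders for the paper's actual descriptors (e.g., GIST) and its rank-correlation evaluation.

      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.svm import SVR

      rng = np.random.default_rng(0)
      features = rng.normal(size=(200, 512))          # stand-in global descriptors
      memorability = rng.uniform(0.4, 0.9, size=200)  # stand-in measured hit rates

      model = SVR(kernel="rbf", C=1.0)
      print(cross_val_score(model, features, memorability, cv=5, scoring="r2").mean())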
  • Learning Pullback HMM Distances

    Page(s): 1483 - 1489
    Multimedia
    PDF (589 KB) | HTML

    Recent work in action recognition has exposed the limitations of methods that directly classify local features extracted from spatio-temporal video volumes. In contrast, encoding the actions' dynamics via generative dynamical models has a number of attractive features; however, using all-purpose distances for their classification does not necessarily deliver good results. We propose a general framework for learning distance functions for generative dynamical models, given a training set of labelled videos. The optimal distance function is selected from a family of pullback distances, induced by a parametrised automorphism of the space of models. We focus here on hidden Markov models and their model space, and design an appropriate automorphism there. Experimental results show that pullback learning greatly improves action recognition performance with respect to base distances.

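    A pullback distance is compact to state in code: warp both models with a parametrised automorphism, then apply the base distance. The selection criterion below, the ratio of mean between-class to mean within-class distance over a candidate grid, is a toy stand-in for the paper's optimisation over its automorphism family.

      import numpy as np

      def pullback_distance(base_dist, warp, p, m1, m2):
          # d_p(m1, m2) = base_dist(warp(m1; p), warp(m2; p))
          return base_dist(warp(m1, p), warp(m2, p))

      def learn_p(models, labels, base_dist, warp, candidates):
          # Pick the warp parameter whose pullback distance best separates the
          # training classes (between/within mean-distance ratio).
          def score(p):
              pairs = [(pullback_distance(base_dist, warp, p, a, b), la == lb)
                       for i, (a, la) in enumerate(zip(models, labels))
                       for b, lb in list(zip(models, labels))[i + 1:]]
              within = np.mean([d for d, same in pairs if same])
              between = np.mean([d for d, same in pairs if not same])
              return between / within
          return max(candidates, key=score)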
  • Open Access

    Page(s): 1490
    PDF (1157 KB)
    Freely Available from IEEE
  • myIEEE

    Page(s): 1491
    PDF (771 KB)
    Freely Available from IEEE
  • Rock Stars of Cybersecurity Conference [advertisement]

    Page(s): 1492
    PDF (1863 KB)
    Freely Available from IEEE
  • IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors

    Page(s): C3
    PDF (319 KB)
    Freely Available from IEEE
  • IEEE Computer Society

    Page(s): C4
    PDF (355 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.


Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois