IEEE Transactions on Pattern Analysis and Machine Intelligence

Issue 7 • July 2008

  • [Front cover]

    Publication Year: 2008 , Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2008 , Page(s): c2
    Freely Available from IEEE
  • Bayes Classification of Online Arabic Characters by Gibbs Modeling of Class Conditional Densities

    Publication Year: 2008 , Page(s): 1121 - 1131
    Cited by:  Papers (11)

    This study investigates Bayes classification of online Arabic characters using histograms of tangent differences and Gibbs modeling of the class-conditional probability density functions. The parameters of these Gibbs density functions are estimated following the Zhu et al. constrained maximum entropy formalism, originally introduced for image and shape synthesis. We investigate two partition function estimation methods: one uses the training sample, and the other draws from a reference distribution. The efficiency of the corresponding Bayes decision methods, and of a combination of these, is shown in experiments using a database of 9,504 freely written samples by 22 writers. Comparisons to the nearest neighbor rule method and a Kohonen neural network method are provided.

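    As a rough illustration of the decision rule at work here (not the authors' implementation), the sketch below scores a feature histogram under per-class Gibbs models, log p(h|c) = -<lambda_c, h> - log Z_c, and picks the maximum-posterior class; the potentials, the log partition values, and the priors are all placeholder inputs that the paper estimates from data.

        import numpy as np

        def gibbs_log_density(h, lam, log_z):
            # Gibbs model: log p(h | class) = -<lambda, h> - log Z
            return -np.dot(lam, h) - log_z

        def bayes_classify(h, lams, log_zs, log_priors):
            # Pick the class maximizing log p(h|c) + log p(c).
            scores = [gibbs_log_density(h, lam, lz) + lp
                      for lam, lz, lp in zip(lams, log_zs, log_priors)]
            return int(np.argmax(scores))

        # Toy usage: 3 classes, 16-bin histogram of tangent differences.
        rng = np.random.default_rng(0)
        lams = [rng.normal(size=16) for _ in range(3)]
        log_zs = [0.0, 0.1, -0.2]   # estimated from training data or a reference distribution
        log_priors = np.log([1 / 3, 1 / 3, 1 / 3])
        h = rng.random(16)
        print(bayes_classify(h, lams, log_zs, log_priors))
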
  • Constrained connectivity for hierarchical image partitioning and simplification

    Publication Year: 2008 , Page(s): 1132 - 1145
    Cited by:  Papers (39)  |  Patents (1)

    This paper introduces an image partitioning and simplification method based on the constrained connectivity paradigm. According to this paradigm, two pixels are said to be connected if they satisfy a series of constraints defined in terms of simple measures such as the maximum gray-level differences over well-defined pixel paths and regions. The resulting connectivity relation generates a unique partition of the image definition domain. The simplification of the image is then achieved by setting each segment of the partition to the mean value of the pixels falling within this segment. Fine to coarse partition hierarchies (and, therefore, images of increasing degree of simplification) are produced by varying the threshold value associated with each connectivity constraint. The paper also includes a generalization to multichannel images, application examples, a review of related image segmentation techniques, and pseudocode for an implementation based on queue and stack data structures.

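    As a minimal illustration of the simplest such constraint (neighboring pixels connected when their gray-level difference stays within a threshold alpha), the sketch below labels the resulting partition by flood fill and flattens each segment to its mean; the paper's full method stacks additional constraints, such as a global range limit, on top of this.

        import numpy as np
        from collections import deque

        def alpha_partition(img, alpha):
            """Label alpha-connected components: 4-neighbors are joined
            when their absolute gray-level difference is <= alpha."""
            h, w = img.shape
            labels = -np.ones((h, w), dtype=int)
            cur = 0
            for sy in range(h):
                for sx in range(w):
                    if labels[sy, sx] >= 0:
                        continue
                    q = deque([(sy, sx)])
                    labels[sy, sx] = cur
                    while q:
                        y, x = q.popleft()
                        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                            if (0 <= ny < h and 0 <= nx < w
                                    and labels[ny, nx] < 0
                                    and abs(float(img[ny, nx]) - float(img[y, x])) <= alpha):
                                labels[ny, nx] = cur
                                q.append((ny, nx))
                    cur += 1
            return labels

        def simplify(img, alpha):
            labels = alpha_partition(img, alpha)
            out = img.astype(float).copy()
            for l in range(labels.max() + 1):
                out[labels == l] = img[labels == l].mean()
            return out   # increasing alpha yields coarser partitions

        img = np.array([[10, 11, 50], [10, 12, 52], [9, 11, 51]])
        print(simplify(img, alpha=2))
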
  • TRUST-TECH-Based Expectation Maximization for Learning Finite Mixture Models

    Publication Year: 2008 , Page(s): 1146 - 1157
    Cited by:  Papers (11)

    The expectation maximization (EM) algorithm is widely used for learning finite mixture models despite its greedy nature. Most popular model-based clustering techniques might yield poor clusters if the parameters are not initialized properly. To reduce this sensitivity to the initial points, a novel algorithm for learning mixture models from multivariate data is introduced in this paper. The proposed algorithm takes advantage of TRUST-TECH (TRansformation Under STability-reTaining Equilibria CHaracterization) to compute neighborhood local maxima on the likelihood surface using stability regions. Basically, our method coalesces the advantages of the traditional EM with that of the dynamic and geometric characteristics of the stability regions of the corresponding nonlinear dynamical system of the log-likelihood function. Two phases, namely, the EM phase and the stability region phase, are repeated alternately in the parameter space to achieve local maxima with improved likelihood values. The EM phase obtains the local maximum of the likelihood function and the stability region phase helps to escape from the local maximum by moving toward the neighboring stability regions. Though applied to Gaussian mixtures in this paper, our technique can be easily generalized to any other parametric finite mixture model. The algorithm has been tested on both synthetic and real data sets and the improvements in the performance compared to other approaches are demonstrated. The robustness with respect to initialization is also illustrated experimentally.

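    A sketch of the outer alternation only, assuming scikit-learn's GaussianMixture for the EM phase; the stability-region phase is replaced here by a plain perturbation of the current means, a crude stand-in for TRUST-TECH's exit-point computation on the underlying dynamical system.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def em_then_escape(X, k, rounds=5, step=0.5, seed=0):
            rng = np.random.default_rng(seed)
            best = GaussianMixture(n_components=k, random_state=0).fit(X)
            best_ll = best.score(X)
            for _ in range(rounds):
                # "Escape" phase (stand-in): perturb the current means, rerun EM.
                # TRUST-TECH instead computes exit points of the associated
                # dynamical system to reach neighboring stability regions.
                means = best.means_ + step * rng.normal(size=best.means_.shape)
                cand = GaussianMixture(n_components=k, means_init=means,
                                       random_state=0).fit(X)
                if cand.score(X) > best_ll:
                    best, best_ll = cand, cand.score(X)
            return best

        rng = np.random.default_rng(1)
        X = np.vstack([rng.normal(size=(200, 2)) + off
                       for off in ([0, 0], [4, 4], [8, 0])])
        print(em_then_escape(X, k=3).means_)
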
  • A Fast Algorithm for Learning a Ranking Function from Large-Scale Data Sets

    Publication Year: 2008 , Page(s): 1158 - 1170
    Cited by:  Papers (10)

    We consider the problem of learning a ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on the training data. Relying on an ε-accurate approximation for the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m²) to O(m), where m is the number of training samples. Experiments on public benchmarks for ordinal regression and collaborative filtering indicate that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when the algorithms are trained on the same data. However, since it is several orders of magnitude faster than the current state-of-the-art approaches, it is able to leverage much larger training data sets.

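    The computational point is easiest to see in code: a direct implementation of a smooth surrogate of the WMW statistic sums over all positive/negative pairs, which costs O(m²) per gradient, and it is this pairwise sum that the paper's ε-accurate error-function approximation collapses to O(m). The sketch below shows the naive version for a linear ranker; the sigmoid surrogate is an illustrative choice, not necessarily the paper's.

        import numpy as np

        def wmw_surrogate_grad(w, X_pos, X_neg):
            """Naive O(m^2) loss and gradient of a sigmoid surrogate of the
            WMW statistic for a linear ranker f(x) = w.x. The paper replaces
            this pairwise sum with an erfc-based approximation computable in O(m)."""
            grad = np.zeros_like(w)
            loss = 0.0
            for xp in X_pos:
                for xn in X_neg:
                    d = w @ (xp - xn)               # want f(xp) > f(xn), i.e. d > 0
                    s = 1.0 / (1.0 + np.exp(d))
                    loss += np.log1p(np.exp(-d))
                    grad += -s * (xp - xn)
            return loss, grad

        rng = np.random.default_rng(1)
        X_pos = rng.normal(1.0, 1.0, size=(50, 5))
        X_neg = rng.normal(0.0, 1.0, size=(60, 5))
        w = np.zeros(5)
        for _ in range(100):                        # plain gradient descent
            loss, g = wmw_surrogate_grad(w, X_pos, X_neg)
            w -= 0.001 * g
        print(loss)
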
  • Motion Segmentation and Depth Ordering Using an Occlusion Detector

    Publication Year: 2008 , Page(s): 1171 - 1185
    Cited by:  Papers (9)

    We present a novel method for motion segmentation and depth ordering from a video sequence in general motion. We first compute motion segmentation based on differential properties of the spatio-temporal domain and scale-space integration. Given a motion boundary, we describe two algorithms to determine depth ordering from two- and three-frame sequences. A remarkable characteristic of our method is its ability to compute depth ordering from only two frames. The segmentation and depth ordering algorithms are shown to give good results on six real sequences taken in general motion. We use synthetic data to show robustness to high levels of noise and illumination changes; we also include cases where no intensity edge exists at the location of the motion boundary or when no parametric motion model can describe the data. Finally, we describe psychophysical experiments showing that people, like our algorithm, can compute depth ordering from only two frames even when the boundary between the layers is not visible in a single frame.

  • Sequential Kernel Density Approximation and Its Application to Real-Time Visual Tracking

    Publication Year: 2008 , Page(s): 1186 - 1197
    Cited by:  Papers (34)

    Visual features are commonly modeled with probability density functions in computer vision problems, but current methods such as a mixture of Gaussians and kernel density estimation suffer either from a lack of flexibility, by fixing or limiting the number of Gaussian components in the mixture, or from large memory requirements, by maintaining a nonparametric representation of the density. These problems are aggravated in real-time computer vision applications since density functions are required to be updated as new data becomes available. We present a novel kernel density approximation technique based on the mean-shift mode finding algorithm and describe an efficient method to sequentially propagate the density modes over time. Although the proposed density representation is memory efficient, which is typical for mixture densities, it inherits the flexibility of nonparametric methods by allowing the number of components to be variable. The accuracy and compactness of the sequential kernel density approximation technique are illustrated by both simulations and experiments. Sequential kernel density approximation is applied to online target appearance modeling for visual tracking, and its performance is demonstrated on a variety of videos.

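    The building block is mean-shift mode finding on a weighted Gaussian KDE; the found modes, with their accumulated weights, then replace the raw samples as the compact density representation. A 1-D sketch of that compression step with a fixed bandwidth (the paper handles the general case and the sequential update):

        import numpy as np

        def mean_shift_modes(x, w, h, iters=100, tol=1e-6):
            """Find the modes of a 1-D weighted Gaussian KDE by running
            mean-shift from every sample, then merge converged points."""
            y = x.astype(float).copy()
            for _ in range(iters):
                k = np.exp(-0.5 * ((y[:, None] - x[None, :]) / h) ** 2) * w
                y_new = (k * x[None, :]).sum(axis=1) / k.sum(axis=1)
                if np.max(np.abs(y_new - y)) < tol:
                    y = y_new
                    break
                y = y_new
            modes, weights = [], []
            for yi, wi in zip(y, w):
                for j, m in enumerate(modes):
                    if abs(yi - m) < h:   # merge points that reached the same mode
                        weights[j] += wi
                        break
                else:
                    modes.append(yi)
                    weights.append(wi)
            return np.array(modes), np.array(weights)

        rng = np.random.default_rng(2)
        x = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(5, 0.7, 100)])
        w = np.full(x.size, 1.0 / x.size)
        print(mean_shift_modes(x, w, h=0.5))
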
  • Segmentation and Tracking of Multiple Humans in Crowded Environments

    Publication Year: 2008 , Page(s): 1198 - 1211
    Cited by:  Papers (73)  |  Patents (2)

    Segmentation and tracking of multiple humans in crowded situations is made difficult by interobject occlusion. We propose a model-based approach to interpret the image observations by multiple partially occluded human hypotheses in a Bayesian framework. We define a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation. The optimal solution is obtained by using an efficient sampling method, data-driven Markov chain Monte Carlo (DDMCMC), which uses image observations for proposal probabilities. Knowledge of various aspects, including human shape, camera model, and image cues, is integrated in one theoretically sound framework. We present experimental results and quantitative evaluation, demonstrating that the resulting approach is effective for very challenging data.

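    The sampler is standard Metropolis-Hastings except that part of the proposal mass is driven by image observations rather than blind diffusion. A toy 1-D sketch of that idea, with a placeholder posterior and placeholder detector outputs (and a simplification noted in the comments):

        import numpy as np

        rng = np.random.default_rng(0)

        def log_posterior(x):
            # Stand-in for the joint human-hypothesis likelihood.
            return -0.5 * ((x - 3.0) / 0.8) ** 2

        detections = np.array([2.7, 3.2, 6.0])   # detector output guiding proposals

        def propose(x):
            if rng.random() < 0.5:               # data-driven move: jump near a detection
                return rng.choice(detections) + 0.3 * rng.normal()
            return x + 0.5 * rng.normal()        # diffusion move: local random walk

        x, samples = 0.0, []
        for _ in range(5000):
            x_new = propose(x)
            # Symmetric-proposal Metropolis acceptance (a simplification: the
            # mixed proposal above is not exactly symmetric; a full DDMCMC
            # implementation would include the proposal ratio).
            if np.log(rng.random()) < log_posterior(x_new) - log_posterior(x):
                x = x_new
            samples.append(x)
        print(np.mean(samples[1000:]))
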
  • Tracking the Visual Focus of Attention for a Varying Number of Wandering People

    Publication Year: 2008 , Page(s): 1212 - 1229
    Cited by:  Papers (22)
    Multimedia

    In this paper, we define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W), determining where a person is looking when their movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding, cognitive science, and real-world applications. One such application, presented in this paper, monitors the attention passers-by pay to an outdoor advertisement by using a single video camera. In our approach to the VFOA-W problem, we propose a multiperson tracking solution based on a dynamic Bayesian network that simultaneously infers the number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting variable-dimensional state-space, we propose a Reversible-Jump Markov Chain Monte Carlo (RJMCMC) sampling scheme and a novel global observation model, which determines the number of people in the scene and their locations. To determine if a person is looking at the advertisement or not, we propose Gaussian Mixture Model (GMM)-based and Hidden Markov Model (HMM)-based VFOA-W models, which use head pose and location information. Our models are evaluated for tracking performance and ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where up to three mobile observers pass in front of an advertisement.

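    At its simplest, the HMM-based VFOA-W model is a two-state chain (looking at the advertisement vs. not) with head-pose emissions, decoded by Viterbi. A sketch with 1-D Gaussian pan-angle emissions; all parameter values are illustrative, not the paper's.

        import numpy as np

        def viterbi(obs, log_A, log_pi, emit_ll):
            """Most likely state path for a small HMM."""
            T, S = len(obs), log_pi.size
            dp = np.zeros((T, S))
            bp = np.zeros((T, S), dtype=int)
            dp[0] = log_pi + emit_ll(obs[0])
            for t in range(1, T):
                trans = dp[t - 1][:, None] + log_A     # trans[i, j]: state i -> j
                bp[t] = trans.argmax(axis=0)
                dp[t] = trans.max(axis=0) + emit_ll(obs[t])
            path = [int(dp[-1].argmax())]
            for t in range(T - 1, 0, -1):
                path.append(int(bp[t][path[-1]]))
            return path[::-1]

        # States: 0 = looking at the ad, 1 = not looking. Observation: pan angle.
        mu, sigma = np.array([0.0, 40.0]), np.array([10.0, 25.0])
        emit_ll = lambda o: -0.5 * ((o - mu) / sigma) ** 2 - np.log(sigma)
        log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
        log_pi = np.log(np.array([0.5, 0.5]))
        pans = np.array([2.0, 5.0, -3.0, 35.0, 50.0, 42.0, 4.0, 1.0])
        print(viterbi(pans, log_A, log_pi, emit_ll))
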
  • Balanced Exploration and Exploitation Model Search for Efficient Epipolar Geometry Estimation

    Publication Year: 2008 , Page(s): 1230 - 1242
    Cited by:  Papers (10)

    The estimation of the epipolar geometry is especially difficult when the putative correspondences include a low percentage of inlier correspondences and/or a large subset of the inliers is consistent with a degenerate configuration of the epipolar geometry that is totally incorrect. This work presents the balanced exploration and exploitation model (BEEM) search algorithm, which works especially well for these difficult scenes. The algorithm handles these two problems in a unified manner. It includes the following main features: 1) balanced use of three search techniques: global random exploration, local exploration near the current best solution, and local exploitation to improve the quality of the model, 2) exploitation of available prior information to accelerate the search process, 3) use of the best found model to guide the search process, escape from degenerate models, and define an efficient stopping criterion, 4) presentation of a simple and efficient method to estimate the epipolar geometry from two scale-invariant feature transform (SIFT) correspondences, and 5) use of the locality-sensitive hashing (LSH) approximate nearest neighbor algorithm for fast putative correspondence generation. The resulting algorithm, when tested on real images with or without degenerate configurations, gives high-quality estimates and achieves significant speedups compared to the state-of-the-art algorithms.

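    The balance among the three search moves is easier to see on a toy problem than on epipolar geometry; the sketch below applies the same global-exploration / local-exploration / exploitation pattern to robust 2-D line fitting with outliers, as a stand-in for the paper's fundamental-matrix estimation.

        import numpy as np

        rng = np.random.default_rng(0)

        def inliers(model, pts, tol=0.1):
            a, b = model                       # line y = a*x + b
            return np.abs(pts[:, 1] - (a * pts[:, 0] + b)) < tol

        def fit_ls(pts):                       # exploitation: least-squares refit
            A = np.c_[pts[:, 0], np.ones(len(pts))]
            return np.linalg.lstsq(A, pts[:, 1], rcond=None)[0]

        x = rng.uniform(0, 10, 300)
        pts = np.c_[x, 2 * x + 1 + 0.05 * rng.normal(size=300)]
        pts[:100, 1] = rng.uniform(0, 25, 100)           # one third outliers

        best, best_n = None, -1
        for _ in range(200):
            if best is None or rng.random() < 0.5:       # global random exploration
                i, j = rng.choice(len(pts), 2, replace=False)
                (x1, y1), (x2, y2) = pts[i], pts[j]
                if x1 == x2:
                    continue
                a = (y2 - y1) / (x2 - x1)
                cand = np.array([a, y1 - a * x1])
            elif rng.random() < 0.5:                     # local exploration near best
                cand = best + 0.05 * rng.normal(size=2)
            else:                                        # exploitation: refit on inliers
                cand = fit_ls(pts[inliers(best, pts)])
            n = inliers(cand, pts).sum()
            if n > best_n:
                best, best_n = cand, n
        print(best, best_n)
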
  • Universal and Adapted Vocabularies for Generic Visual Categorization

    Publication Year: 2008 , Page(s): 1243 - 1256
    Cited by:  Papers (28)

    Generic visual categorization (GVC) is the pattern classification problem that consists of assigning labels to an image based on its semantic content. This is a challenging task as one has to deal with inherent object/scene variations, as well as changes in viewpoint, lighting, and occlusion. Several state-of-the-art GVC systems use a vocabulary of visual terms to characterize images with a histogram of visual word counts. We propose a novel practical approach to GVC based on a universal vocabulary, which describes the content of all the considered classes of images, and class vocabularies obtained through the adaptation of the universal vocabulary using class-specific data. The main novelty is that an image is characterized by a set of histograms - one per class - where each histogram describes whether the image content is best modeled by the universal vocabulary or the corresponding class vocabulary. This framework is applied to two types of local image features: low-level descriptors such as the popular SIFT and high-level histograms of word co-occurrences in a spatial neighborhood. It is shown experimentally on two challenging data sets (an in-house database of 19 categories and the PASCAL VOC 2006 data set) that the proposed approach exhibits state-of-the-art performance at a modest computational cost.

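    A rough sketch of the characterization step, assuming scipy's kmeans2 for the universal vocabulary: each universal word is pulled toward class data by a crude MAP-like mean update, and an image is then described by whether each descriptor lands closer to a universal or an adapted word (a 2K-bin histogram, per class).

        import numpy as np
        from scipy.cluster.vq import kmeans2

        def adapt(universal, class_desc, tau=10.0):
            """Shift each universal word toward the class data assigned to it
            (a crude MAP-like mean adaptation)."""
            d = ((class_desc[:, None, :] - universal[None, :, :]) ** 2).sum(-1)
            assign = d.argmin(axis=1)
            adapted = universal.copy()
            for k in range(len(universal)):
                pts = class_desc[assign == k]
                adapted[k] = (pts.sum(0) + tau * universal[k]) / (len(pts) + tau)
            return adapted

        def bipartite_histogram(desc, universal, adapted):
            """2K-bin histogram: for each descriptor, is the nearest word a
            universal word or its adapted counterpart?"""
            both = np.vstack([universal, adapted])
            d = ((desc[:, None, :] - both[None, :, :]) ** 2).sum(-1)
            h = np.bincount(d.argmin(axis=1), minlength=len(both))
            return h / h.sum()

        rng = np.random.default_rng(0)
        all_desc = rng.normal(size=(1000, 8))             # pooled training descriptors
        universal, _ = kmeans2(all_desc, 16, seed=0)
        class_desc = rng.normal(0.5, 1.0, size=(200, 8))  # one class's descriptors
        adapted = adapt(universal, class_desc)
        print(bipartite_histogram(class_desc, universal, adapted))
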
  • Discriminative Feature Co-Occurrence Selection for Object Detection

    Publication Year: 2008 , Page(s): 1257 - 1269
    Cited by:  Papers (25)

    This paper describes an object detection framework that learns the discriminative co-occurrence of multiple features. Feature co-occurrences are automatically found by sequential forward selection at each stage of the boosting process. The selected feature co-occurrences are capable of extracting structural similarities of target objects leading to better performance. The proposed method is a generalization of the framework proposed by Viola and Jones, where each weak classifier depends only on a single feature. Experimental results obtained using four object detectors for finding faces and three different hand poses, respectively, show that detectors trained with the proposed algorithm yield consistently higher detection rates than those based on their framework while using the same number of features.

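    The weak learners are conjunctions of binary features grown greedily by weighted error, which is the sequential forward selection referred to above. A toy sketch with random binary features and uniform boosting weights; the real system uses thresholded image features inside a full boosting loop.

        import numpy as np

        def weighted_error(pred, y, w):
            return w[pred != y].sum()

        def forward_select_cooccurrence(F, y, w, max_feats=3):
            """Greedily AND together binary features to minimize weighted error."""
            chosen, current = [], np.ones(len(y), dtype=bool)
            best_err = 1.0
            for _ in range(max_feats):
                errs = [weighted_error(current & F[:, j], y, w)
                        for j in range(F.shape[1])]
                j = int(np.argmin(errs))
                if errs[j] >= best_err:   # stop when another feature no longer helps
                    break
                chosen.append(j)
                best_err = errs[j]
                current = current & F[:, j]
            return chosen, current, best_err

        rng = np.random.default_rng(0)
        F = rng.random((500, 20)) < 0.5                  # 20 binary features
        y = F[:, 3] & F[:, 7]                            # target is a true co-occurrence
        y = y ^ (rng.random(500) < 0.05)                 # label noise
        w = np.full(500, 1 / 500)                        # boosting weights (uniform here)
        print(forward_select_cooccurrence(F, y, w))
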
  • Multiscale Categorical Object Recognition Using Contour Fragments

    Publication Year: 2008 , Page(s): 1270 - 1281
    Cited by:  Papers (69)  |  Patents (1)

    Psychophysical studies show that we can recognize objects using fragments of outline contour alone. This paper proposes a new automatic visual recognition system based only on local contour features, capable of localizing objects in space and scale. The system first builds a class-specific codebook of local fragments of contour using a novel formulation of chamfer matching. These local fragments allow recognition that is robust to within-class variation, pose changes, and articulation. Boosting combines these fragments into a cascaded sliding-window classifier, and mean shift is used to select strong responses as a final set of detections. We show how learning can be performed iteratively on both training and test sets to bootstrap an improved classifier. We compare with other methods based on contour and local descriptors in our detailed evaluation over 17 challenging categories and obtain highly competitive results. The results confirm that contour is indeed a powerful cue for multiscale and multiclass visual object recognition.

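    Chamfer matching scores a contour fragment against an edge map through a distance transform, and sliding that score over the image localizes candidates. A minimal version, assuming scipy's Euclidean distance transform; the paper adds orientation terms and a learned codebook on top of this.

        import numpy as np
        from scipy.ndimage import distance_transform_edt

        def chamfer_score_map(edge_map, fragment):
            """fragment: (N, 2) array of (y, x) offsets of a contour fragment.
            Returns, for each placement, the mean distance from the fragment's
            points to the nearest edge pixel (lower is better)."""
            dt = distance_transform_edt(~edge_map)   # distance to nearest edge pixel
            H, W = edge_map.shape
            fh, fw = fragment.max(axis=0) + 1
            scores = np.full((H - fh + 1, W - fw + 1), np.inf)
            for y in range(scores.shape[0]):
                for x in range(scores.shape[1]):
                    scores[y, x] = dt[fragment[:, 0] + y, fragment[:, 1] + x].mean()
            return scores

        edges = np.zeros((64, 64), dtype=bool)
        edges[20, 10:30] = True                          # a horizontal edge segment
        frag = np.array([(0, i) for i in range(15)])     # a short horizontal fragment
        s = chamfer_score_map(edges, frag)
        print(np.unravel_index(s.argmin(), s.shape))     # best placement near (20, 10)
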
  • Path Similarity Skeleton Graph Matching

    Publication Year: 2008 , Page(s): 1282 - 1292
    Cited by:  Papers (38)

    This paper proposes a novel graph matching algorithm and applies it to shape recognition based on object silhouettes. The main idea is to match skeleton graphs by comparing the geodesic paths between skeleton endpoints. In contrast to typical tree or graph matching methods, we do not consider the topological graph structure. Our approach is motivated by the fact that visually similar skeleton graphs may have completely different topological structures. The proposed comparison of geodesic paths between endpoints of skeleton graphs yields correct matching results in such cases. The skeletons are pruned by contour partitioning with discrete curve evolution, which implies that the endpoints of skeleton branches correspond to visual parts of the objects. The experimental results demonstrate that our method is able to produce correct results in the presence of articulations, stretching, and contour deformations.

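    Endpoint correspondence reduces to comparing descriptors of the geodesic paths between skeleton endpoints and solving an assignment problem. The sketch below uses placeholder radius-profile descriptors and a plain L2 path dissimilarity where the paper uses a more careful path similarity, so it shows the structure of the method rather than its exact measure.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def endpoint_dissimilarity(paths_a, paths_b):
            """paths_x[i][j]: radius profile of the geodesic path between
            endpoints i and j of shape x (already resampled to equal length).
            Endpoint i vs k is compared through the paths emanating from them."""
            na, nb = len(paths_a), len(paths_b)
            D = np.zeros((na, nb))
            for i in range(na):
                for k in range(nb):
                    rows = [min(np.linalg.norm(paths_a[i][j] - paths_b[k][l])
                                for l in range(nb) if l != k)
                            for j in range(na) if j != i]
                    D[i, k] = np.mean(rows)
            return D

        rng = np.random.default_rng(0)
        paths_a = rng.random((4, 4, 10))     # 4 endpoints, 10-sample radius profiles
        paths_b = paths_a + 0.01 * rng.normal(size=(4, 4, 10))
        D = endpoint_dissimilarity(paths_a, paths_b)
        rows, cols = linear_sum_assignment(D)    # optimal endpoint correspondence
        print(list(zip(rows, cols)))
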
  • A Theoretical Analysis of Bagging as a Linear Combination of Classifiers

    Publication Year: 2008 , Page(s): 1293 - 1299
    Cited by:  Papers (11)  |  Patents (1)

    We apply an analytical framework for the analysis of linearly combined classifiers to ensembles generated by bagging. This provides an analytical model of bagging misclassification probability as a function of the ensemble size, which is a novel result in the literature. Experimental results on real data sets confirm the theoretical predictions. This allows us to derive a novel and theoretically grounded guideline for choosing bagging ensemble size. Furthermore, our results are consistent with explanations of bagging in terms of classifier instability and variance reduction, support the optimality of the simple average over the weighted average combining rule for ensembles generated by bagging, and apply to other randomization-based methods for constructing classifier ensembles. Although our results do not allow us to compare the bagging misclassification probability with that of an individual classifier trained on the original training set, we discuss how the considered theoretical framework could be exploited to this aim.

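    An empirical companion to the analysis: the quantity modeled analytically in the paper, misclassification probability as a function of ensemble size, can also be estimated by simulation. A sketch assuming scikit-learn on a synthetic task:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

        # Test error as a function of the number of bagged trees.
        for T in (1, 5, 10, 25, 50, 100):
            clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=T,
                                    random_state=0).fit(Xtr, ytr)
            err = 1.0 - clf.score(Xte, yte)
            print(f"ensemble size {T:3d}: misclassification {err:.3f}")
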
  • Inverse Compositional Estimation of 3D Pose And Lighting in Dynamic Scenes

    Publication Year: 2008 , Page(s): 1300 - 1307
    Cited by:  Papers (2)

    In this paper, we show how we can estimate, accurately and efficiently, the 3D motion of a rigid object and time-varying lighting in a dynamic scene. This is achieved in an inverse compositional tracking framework with a novel warping function that involves a 2D → 3D → 2D transformation. This also allows us to extend traditional two-frame inverse compositional tracking to a sequence of frames, leading to even higher computational savings. We prove the theoretical convergence of this method and show that it leads to significant reduction in computational burden. Experimental analysis on multiple video sequences shows impressive speedup over existing methods while retaining a high level of accuracy.

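    The efficiency argument is visible even in the simplest instance of the framework: for a pure 2-D translation, inverse compositional tracking precomputes the steepest-descent images and Hessian on the template once and reuses them every iteration. A sketch of that baseline (the paper's contribution is the 2D → 3D → 2D warp and lighting terms built on top of it):

        import numpy as np
        from scipy.ndimage import gaussian_filter, shift, sobel

        def ic_translation_track(template, frame, p, iters=30):
            """Inverse compositional Lucas-Kanade for a pure 2-D translation.
            The steepest-descent images and Hessian are precomputed on the
            template once, which is the source of the method's efficiency."""
            gy = sobel(template, axis=0, mode='nearest') / 8.0
            gx = sobel(template, axis=1, mode='nearest') / 8.0
            J = np.stack([gy.ravel(), gx.ravel()], axis=1)
            H_inv = np.linalg.inv(J.T @ J)               # precomputed 2x2 Hessian
            p = np.asarray(p, dtype=float)
            for _ in range(iters):
                warped = shift(frame, -p, order=1, mode='nearest')   # frame(x + p)
                err = (warped - template).ravel()
                p = p - H_inv @ (J.T @ err)              # inverse compositional update
            return p

        rng = np.random.default_rng(0)
        img = gaussian_filter(rng.random((64, 64)), 3.0)
        frame = shift(img, (2.0, -1.5), order=1, mode='nearest')
        print(ic_translation_track(img, frame, p=(0.0, 0.0)))   # approx. (2.0, -1.5)
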
  • Depth Map Calculation for a Variable Number of Moving Objects using Markov Sequential Object Processes

    Publication Year: 2008 , Page(s): 1308 - 1312
    Cited by:  Papers (3)
    Multimedia

    We advocate the use of Markov sequential object processes for tracking a variable number of moving objects through video frames with a view towards depth calculation. A regression model based on a sequential object process quantifies goodness of fit; regularization terms are incorporated to control within and between frame object interactions. We construct a Markov chain Monte Carlo method for finding the optimal tracks and associated depths and illustrate the approach on a synthetic data set as well as a sports sequence.

  • TPAMI Information for authors

    Publication Year: 2008 , Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2008 , Page(s): c4
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Meet Our Editors

Editor-in-Chief
David A. Forsyth
University of Illinois