Pattern Analysis and Machine Intelligence, IEEE Transactions on

Issue 7 • Date July 2010

  • [Front cover]

    Publication Year: 2010 , Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2010 , Page(s): c2
    Freely Available from IEEE
  • A Combinatorial Solution for Model-Based Image Segmentation and Real-Time Tracking

    Publication Year: 2010 , Page(s): 1153 - 1164
    Cited by:  Papers (9)

    We propose a combinatorial solution to determine the optimal elastic matching of a deformable template to an image. The central idea is to cast the optimal matching of each template point to a corresponding image pixel as a problem of finding a minimum cost cyclic path in the three-dimensional product space spanned by the template and the input image. We introduce a cost functional associated with each cycle, which consists of three terms: a data fidelity term favoring strong intensity gradients, a shape consistency term favoring similarity of tangent angles of corresponding points, and an elastic penalty for stretching or shrinking. The functional is normalized with respect to the total length to avoid a bias toward shorter curves. Optimization is performed by Lawler's Minimum Ratio Cycle algorithm parallelized on state-of-the-art graphics cards. The algorithm provides the optimal segmentation and point correspondence between template and segmented curve in computation times that are essentially linear in the number of pixels. To the best of our knowledge, this is the only existing globally optimal algorithm for real-time tracking of deformable shapes.
  • From Canonical Poses to 3D Motion Capture Using a Single Camera

    Publication Year: 2010 , Page(s): 1165 - 1181
    Cited by:  Papers (6)

    We combine detection and tracking techniques to achieve robust 3D motion recovery of people seen from arbitrary viewpoints by a single and potentially moving camera. We rely on detecting key postures, which can be done reliably, using a motion model to infer 3D poses between consecutive detections, and finally refining them over the whole sequence using a generative model. We demonstrate our approach in the cases of golf motions filmed using a static camera and walking motions acquired using a potentially moving one. We will show that our approach, although monocular, is both metrically accurate because it integrates information over many frames and robust because it can recover from a few misdetections.
  • Order-Preserving Moves for Graph-Cut-Based Optimization

    Publication Year: 2010 , Page(s): 1182 - 1196
    Cited by:  Papers (6)

    In the last decade, graph-cut optimization has been popular for a variety of labeling problems. Typically, graph-cut methods are used to incorporate smoothness constraints on a labeling, encouraging most nearby pixels to have equal or similar labels. In addition to smoothness, ordering constraints on labels are also useful. For example, in object segmentation, a pixel with a “car wheel” label may be prohibited above a pixel with a “car roof” label. We observe that the commonly used graph-cut alpha-expansion move algorithm is more likely to get stuck in a local minimum when ordering constraints are used. For a certain model with ordering constraints, we develop new graph-cut moves which we call order-preserving. The advantage of order-preserving moves is that they act on all labels simultaneously, unlike alpha-expansion. More importantly, for most labels alpha, the set of alpha-expansion moves is strictly smaller than the set of order-preserving moves. This helps to explain why in practice optimization with order-preserving moves performs significantly better than alpha-expansion in the presence of ordering constraints. We evaluate order-preserving moves for the geometric class scene labeling (introduced by Hoiem et al.) where the goal is to assign each pixel a label such as “sky,” “ground,” etc., so ordering constraints arise naturally. In addition, we use order-preserving moves for certain simple shape priors in graph-cut segmentation, which is a novel contribution in itself.
  • Rank Classification of Linear Line Structures from Images by Trifocal Tensor Determinability

    Publication Year: 2010 , Page(s): 1197 - 1210
    Cited by:  Papers (1)

    The problem we address is: Given line correspondences over three views, what is the condition of the line correspondences for the spatial relation of the three associated camera positions to be uniquely recoverable? The observed set of lines in space is called critical if there are multiple projectively nonequivalent configurations of the camera positions that can picture the same image triplet of the lines. We tackle the problem from the perspective of the trifocal tensor, a quantity that captures the relative pose of the cameras in relation to the captured views. We show that the rank of a matrix that leads to the estimation of the tensor is reduced to 7, 11, 15 if the observed lines come from a line pencil, a line bundle, and a line field, respectively, which are line families belonging to linear line space; and 12, 19, 23 if the lines come from a general linear ruled surface, a general linear line congruence, and a general linear line complex, which are subclasses of linear line structures. We show that the above line structures, with the exception of linear line congruence and linear line complex, ought to be critical line structures. All of these structures are quite typical in reality, and thus, the findings are important to the validity and stability of practically all algorithms related to structure from motion and projective reconstruction using line correspondences.
  • Self-Similarity and Points of Interest

    Publication Year: 2010 , Page(s): 1211 - 1226
    Cited by:  Papers (8)

    In this work, we present a new approach to interest point detection. Different types of features in images are detected by using a common computational concept. The proposed approach considers the total variability of local regions. The total sum of squares computed on the intensity values of a local circular region is divided into three components: between-circumferences sum of squares, between-radii sum of squares, and the remainder. These three components normalized by the total sum of squares represent three new saliency measures, namely, radial, tangential, and residual. The saliency measures are computed for regions with different radii and scale spaces are built in this way. Local extrema in scale space of each of the saliency measures are located. They represent features with complementary image properties: blob-like features, corner-like features, and highly textured points. Results obtained on image sets of different object classes and image sets under different types of photometric and geometric transformations show high robustness of the method to intraclass variations as well as to different photometric transformations and moderate geometric transformations and compare favorably with the results obtained by the leading interest point detectors from the literature. The proposed approach gives a rich set of highly distinctive local regions that can be used for object recognition and image matching.
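The sum-of-squares decomposition described in the abstract of "Self-Similarity and Points of Interest" can be illustrated with a small sketch. It assumes the circular region has already been resampled onto a polar grid, with one row per circumference and one column per radius direction; that layout, the function name `saliency_measures`, and the handling of flat regions are assumptions of this sketch and are not taken from the paper.

```python
import numpy as np

def saliency_measures(polar_patch):
    """Return (radial, tangential, residual) shares of the total sum of squares.

    polar_patch[i, j] is the intensity at circumference (ring) i and
    radius direction (ray) j. This sampling layout is hypothetical.
    """
    x = np.asarray(polar_patch, dtype=float)
    n_rings, n_rays = x.shape
    grand_mean = x.mean()
    total_ss = ((x - grand_mean) ** 2).sum()
    if total_ss == 0.0:
        # A perfectly flat region has no variability to split.
        return 0.0, 0.0, 0.0
    ring_means = x.mean(axis=1)  # one mean per circumference
    ray_means = x.mean(axis=0)   # one mean per radius direction
    # Variation explained by differences between circumferences.
    between_circumferences = n_rays * ((ring_means - grand_mean) ** 2).sum()
    # Variation explained by differences between radii.
    between_radii = n_rings * ((ray_means - grand_mean) ** 2).sum()
    remainder = total_ss - between_circumferences - between_radii
    return (between_circumferences / total_ss,
            between_radii / total_ss,
            remainder / total_ss)

# A patch whose intensity depends only on the ring index (blob-like).
blob = np.arange(4, dtype=float)[:, None] * np.ones((1, 8))
radial, tangential, residual = saliency_measures(blob)
```

By construction the three shares sum to one for any non-flat patch, so each can be read as the fraction of local variability attributed to that component.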
  • Spectral Symmetry Analysis

    Publication Year: 2010 , Page(s): 1227 - 1238
    Cited by:  Papers (4)

    We present a spectral approach for detecting and analyzing rotational and reflectional symmetries in n-dimensions. Our main contribution is the derivation of a symmetry detection and analysis scheme for sets of points in ℝ^n and its extension to image analysis by way of local features. Each object is represented by a set of points S ∈ ℝ^n, where the symmetry is manifested by the multiple self-alignments of S. The alignment problem is formulated as a quadratic binary optimization problem, with an efficient solution via spectral relaxation. For symmetric objects, this results in a multiplicity of eigenvalues whose corresponding eigenvectors allow the detection and analysis of both types of symmetry. We improve the scheme's robustness by incorporating geometrical constraints into the spectral analysis. Our approach is experimentally verified by applying it to 2D and 3D synthetic objects as well as real images.
  • Survey of Pedestrian Detection for Advanced Driver Assistance Systems

    Publication Year: 2010 , Page(s): 1239 - 1258
    Cited by:  Papers (119)

    Advanced driver assistance systems (ADASs), and particularly pedestrian protection systems (PPSs), have become an active research area aimed at improving traffic safety. The major challenge of PPSs is the development of reliable on-board pedestrian detection systems. Due to the varying appearance of pedestrians (e.g., different clothes, changing size, aspect ratio, and dynamic shape) and the unstructured environment, it is very difficult to cope with the demanded robustness of this kind of system. Two problems arising in this research area are the lack of public benchmarks and the difficulty in reproducing many of the proposed methods, which makes it difficult to compare the approaches. As a result, surveying the literature by enumerating the proposals one after another is not the most useful way to provide a comparative point of view. Accordingly, we present a more convenient strategy to survey the different approaches. We divide the problem of detecting pedestrians from images into different processing steps, each with attached responsibilities. Then, the different proposed methods are analyzed and classified with respect to each processing stage, favoring a comparative viewpoint. Finally, discussion of the important topics is presented, putting special emphasis on the future needs and challenges.
  • Two-Dimensional Polar Harmonic Transforms for Invariant Image Representation

    Publication Year: 2010 , Page(s): 1259 - 1270
    Cited by:  Papers (16)

    This paper introduces a set of 2D transforms, based on a set of orthogonal projection bases, to generate a set of features which are invariant to rotation. We call these transforms Polar Harmonic Transforms (PHTs). Unlike the well-known Zernike and pseudo-Zernike moments, the kernel computation of PHTs is extremely simple and has no numerical stability issue whatsoever. This implies that PHTs encompass the orthogonality and invariance advantages of Zernike and pseudo-Zernike moments, but are free from their inherent limitations. This also means that PHTs are well suited for applications where maximal discriminant information is needed. Furthermore, PHTs make available a large set of features for further feature selection in the process of seeking the best discriminative or representative features for a particular application.
  • Visual Word Ambiguity

    Publication Year: 2010 , Page(s): 1271 - 1283
    Cited by:  Papers (139)  |  Patents (1)

    This paper studies automatic image classification by modeling soft assignment in the popular codebook model. The codebook model describes an image as a bag of discrete visual words selected from a vocabulary, where the frequency distributions of visual words in an image allow classification. One inherent component of the codebook model is the assignment of discrete visual words to continuous image features. Despite the clear mismatch of this hard assignment with the nature of continuous features, the approach has been successfully applied for some years. In this paper, we investigate four types of soft assignment of visual words to image features. We demonstrate that explicitly modeling visual word assignment ambiguity improves classification performance compared to the hard assignment of the traditional codebook model. The traditional codebook model is compared against our method for five well-known data sets: 15 natural scenes, Caltech-101, Caltech-256, and Pascal VOC 2007/2008. We demonstrate that large codebook vocabulary sizes completely deteriorate the performance of the traditional model, whereas the proposed model performs consistently. Moreover, we show that our method profits in high-dimensional feature spaces and reaps higher benefits when increasing the number of image categories.
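The difference between hard and soft assignment mentioned in the abstract of "Visual Word Ambiguity" can be sketched as follows. The abstract does not spell out its four soft-assignment variants, so the Gaussian-kernel weighting used here is only one generic possibility chosen for illustration; the function names and the `sigma` parameter are assumptions of this sketch.

```python
import numpy as np

def _distances(features, codebook):
    # Euclidean distance from every feature to every visual word.
    return np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)

def hard_histogram(features, codebook):
    """Each feature votes only for its single nearest visual word."""
    nearest = _distances(features, codebook).argmin(axis=1)
    counts = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

def soft_histogram(features, codebook, sigma=1.0):
    """Each feature spreads one unit of weight over all visual words.

    The Gaussian kernel is an assumption of this sketch, not
    necessarily one of the paper's four variants.
    """
    logits = -0.5 * (_distances(features, codebook) / sigma) ** 2
    logits -= logits.max(axis=1, keepdims=True)  # avoid underflow
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)

codebook = np.array([[0.0, 0.0], [1.0, 0.0]])
# The third feature lies exactly halfway between the two words.
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]])
hard = hard_histogram(features, codebook)
soft = soft_histogram(features, codebook)
```

In this toy case the ambiguous midpoint feature is forced onto one word by the hard assignment, while the soft assignment splits its weight evenly between both words.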
  • Classification of Complex Information: Inference of Co-Occurring Affective States from Their Expressions in Speech

    Publication Year: 2010 , Page(s): 1284 - 1297
    Cited by:  Papers (3)  |  Patents (1)

    We present a classification algorithm for inferring affective states (emotions, mental states, attitudes, and the like) from their nonverbal expressions in speech. It is based on the observations that affective states can occur simultaneously and different sets of vocal features, such as intonation and speech rate, distinguish between nonverbal expressions of different affective states. The input to the inference system was a large set of vocal features and metrics that were extracted from each utterance. The classification algorithm conducted independent pairwise comparisons between nine affective-state groups. The classifier used various subsets of metrics of the vocal features and various classification algorithms for different pairs of affective-state groups. Average classification accuracy of the 36 pairwise machines was 75 percent, using 10-fold cross validation. The comparison results were consolidated into a single ranked list of the nine affective-state groups. This list was the output of the system and represented the inferred combination of co-occurring affective states for the analyzed utterance. The inference accuracy of the combined machine was 83 percent. The system automatically characterized over 500 affective state concepts from the Mind Reading database. The inference of co-occurring affective states was validated by comparing the inferred combinations to the lexical definitions of the labels of the analyzed sentences. The distinguishing capabilities of the system were comparable to human performance.
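The figure of 36 pairwise machines in the abstract of "Classification of Complex Information" follows from comparing every unordered pair among nine groups (9 × 8 / 2 = 36). The sketch below shows that count and one simple way of turning pairwise outcomes into a ranked list. The abstract does not say how the comparisons were consolidated, so the win-counting rule, the function `rank_groups`, the `prefers` callback, and the placeholder group names are all assumptions of this sketch.

```python
from itertools import combinations

def rank_groups(groups, prefers):
    """Rank groups by the number of pairwise comparisons each one wins.

    prefers(a, b) stands in for a trained pairwise machine and returns
    True when it favors group a over group b (hypothetical interface).
    """
    wins = {g: 0 for g in groups}
    pairs = list(combinations(groups, 2))
    for a, b in pairs:
        winner = a if prefers(a, b) else b
        wins[winner] += 1
    # sorted() is stable, so ties keep the input order of the groups.
    ranking = sorted(groups, key=lambda g: wins[g], reverse=True)
    return ranking, len(pairs)

# Nine placeholder group names; the paper's actual labels are not listed here.
groups = [f"group_{i}" for i in range(9)]
# Toy stand-in for the pairwise machines: always favor the lower index.
ranking, n_machines = rank_groups(groups, lambda a, b: a < b)
```

With the toy comparison rule, the first group wins all eight of its comparisons and the last wins none, so the ranking reproduces the input order.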
  • Mixtures of Factor Analyzers with Common Factor Loadings: Applications to the Clustering and Visualization of High-Dimensional Data

    Publication Year: 2010 , Page(s): 1298 - 1309
    Cited by:  Papers (3)

    Mixtures of factor analyzers enable model-based density estimation to be undertaken for high-dimensional data, where the number of observations n is not very large relative to their dimension p. In practice, there is often the need to further reduce the number of parameters in the specification of the component-covariance matrices. To this end, we propose the use of common component-factor loadings, which further reduces the number of parameters considerably. Moreover, it allows the data to be displayed in low-dimensional plots.
  • The Structure of Multiplicative Motions in Natural Imagery

    Publication Year: 2010 , Page(s): 1310 - 1316
    Cited by:  Papers (4)

    A theoretical investigation of the frequency structure of multiplicative image motion signals is presented, e.g., as associated with translucency phenomena. Previous work has claimed that the multiplicative composition of visual signals generally results in the annihilation of oriented structure in the spectral domain. As a result, research has focused on multiplicative signals in highly specialized scenarios where highly structured spectral signatures are prevalent, or introduced a nonlinearity to transform the multiplicative image signal to an additive one. In contrast, in this paper, it is shown that oriented structure is present in multiplicative cases when natural domain constraints are taken into account. This analysis suggests that the various instances of naturally occurring multiple motion structures can be treated in a unified manner. As an example application of the developed theory, a multiple motion estimator previously proposed for translation, additive transparency, and occlusion is adapted to multiplicative image motions. This estimator is shown to yield superior performance over the alternative practice of introducing a nonlinear preprocessing step.
  • Designing Highly Reliable Fiducial Markers

    Publication Year: 2010 , Page(s): 1317 - 1324
    Cited by:  Papers (15)  |  Patents (2)

    Fiducial markers are artificial landmarks added to a scene to facilitate locating point correspondences between images, or between images and a known model. Reliable fiducials solve the interest point detection and matching problems when adding markers is convenient. The proper design of fiducials and the associated computer vision algorithms to detect them can enable accurate pose detection for applications ranging from augmented reality and input devices for HCI to robot navigation. Marker systems typically have two stages, hypothesis generation from unique image features and verification/identification. A set of criteria for high robustness and practical use are identified and then optimized to produce the ARTag fiducial marker system. An edge-based method robust to lighting and partial occlusion is used for the hypothesis stage, and a reliable digital coding system is used for the identification and verification stage. Using these design criteria, large gains in performance are achieved by ARTag over conventional ad hoc designs.
  • Pairwise Costs in Multiclass Perceptrons

    Publication Year: 2010 , Page(s): 1324 - 1328
    Cited by:  Papers (5)

    A novel loss function to train a net of K single-layer perceptrons (KSLPs) is suggested, where a pairwise misclassification cost matrix can be incorporated directly. The complexity of the network remains the same; computing the gradient of the loss function does not necessitate additional calculations. Minimization of the loss requires a smaller number of training epochs. Efficacy of cost-sensitive methods depends on the cost matrix, the overlap of the pattern classes, and sample sizes. Experiments with real-world pattern recognition (PR) tasks show that employment of the novel loss function usually outperforms three benchmark methods.
  • Video Metrology Using a Single Camera

    Publication Year: 2010 , Page(s): 1329 - 1335
    Cited by:  Papers (2)

    This paper presents a video metrology approach using an uncalibrated single camera that is either stationary or in planar motion. Although theoretically simple, measuring the length of even a line segment in a given video is often a difficult problem. Most existing techniques for this task are extensions of single image-based techniques and do not achieve the desired accuracy especially in noisy environments. In contrast, the proposed algorithm moves line segments on the reference plane to share a common endpoint using the vanishing line information followed by fitting multiple concentric circles on the image plane. A fully automated real-time system based on this algorithm has been developed to measure vehicle wheelbases using an uncalibrated stationary camera. The system estimates the vanishing line using invariant lengths on the reference plane from multiple frames rather than the given parallel lines, which may not exist in videos. It is further extended to a camera undergoing a planar motion by automatically selecting frames with similar vanishing lines from the video. Experimental results show that the measurement results are accurate enough to classify moving vehicles based on their size.
  • Reinterpreting the Application of Gabor Filters as a Manipulation of the Margin in Linear Support Vector Machines

    Publication Year: 2010 , Page(s): 1335 - 1341
    Cited by:  Papers (8)

    Linear filters are ubiquitously used as a preprocessing step for many classification tasks in computer vision. In particular, applying Gabor filters followed by a classification stage, such as a support vector machine (SVM), is now common practice in computer vision applications like face identity and expression recognition. A fundamental problem occurs, however, with respect to the high dimensionality of the concatenated Gabor filter responses in terms of memory requirements and computational efficiency during training and testing. In this paper, we demonstrate how the preprocessing step of applying a bank of linear filters can be reinterpreted as manipulating the type of margin being maximized within the linear SVM. This new interpretation leads to sizable memory and computational advantages with respect to existing approaches. The reinterpreted formulation turns out to be independent of the number of filters, thereby allowing the examination of the feature spaces derived from an arbitrarily large number of linear filters, a hitherto untestable prospect. Further, this new interpretation of filter banks gives new insights, other than the often-cited biological motivations, into why the preprocessing of images with filter banks, like Gabor filters, improves classification performance.
  • On the Feature Selection Criterion Based on an Approximation of Multidimensional Mutual Information

    Publication Year: 2010 , Page(s): 1342 - 1343
    Cited by:  Papers (1)

    We derive the feature selection criterion presented in [1] and [2] from the multidimensional mutual information between features and the class. Our derivation: 1) specifies and validates the lower-order dependency assumptions of the criterion and 2) mathematically justifies the utility of the criterion by relating it to Bayes classification error.
  • TPAMI Information for authors

    Publication Year: 2010 , Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2010 , Page(s): c4
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) is published monthly. Its editorial board strives to present the most important research results in areas within TPAMI's scope.

Editor-in-Chief
David A. Forsyth
University of Illinois