Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004)

Date: 27 June - 2 July 2004


Displaying Results 1 - 25 of 148
  • Point matching as a classification problem for fast and robust object pose estimation

    Page(s): II-244 - II-250 Vol.2

    We propose a novel approach to point matching under large viewpoint and illumination changes that is suitable for accurate object pose estimation at a much lower computational cost than state-of-the-art methods. Most of these methods rely either on using ad hoc local descriptors or on estimating local affine deformations. By contrast, we treat wide-baseline matching of key points as a classification problem, in which each class corresponds to the set of all possible views of such a point. Given one or more images of a target object, we train the system by synthesizing a large number of views of individual key points and by using statistical classification tools to produce a compact description of this view set. At run-time, we rely on this description to decide to which class, if any, an observed feature belongs. This formulation allows us to use a classification method to reduce matching error rates, and to move some of the computational burden from matching to training, which can be performed beforehand. In the context of pose estimation, we present experimental results for both planar and non-planar objects in the presence of occlusions, illumination changes, and cluttered backgrounds. We show that the method is both reliable and suitable for initializing real-time applications.
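    As a rough illustration of the training-by-synthesis idea, the sketch below generates randomly warped views of each keypoint patch, compresses the view set with PCA, and classifies an observed patch by its nearest class mean. The classifier, warp ranges, dimensions, and rejection threshold are all illustrative stand-ins for the paper's statistical classification tools, not its actual method.

        # Sketch: keypoint matching as classification, trained by view synthesis.
        # PCA + nearest class mean stands in for the paper's classifier.
        import numpy as np
        from scipy.ndimage import affine_transform

        N_VIEWS, N_DIMS = 100, 20

        def synth_views(patch, rng):
            """Randomly rotated/scaled versions of one keypoint patch."""
            c = np.array(patch.shape) / 2.0
            views = []
            for _ in range(N_VIEWS):
                t = rng.uniform(-np.pi, np.pi)
                s = rng.uniform(0.8, 1.2)
                A = s * np.array([[np.cos(t), -np.sin(t)],
                                  [np.sin(t),  np.cos(t)]])
                # offset keeps the warp centered on the patch
                views.append(affine_transform(patch.astype(float), A,
                                              offset=c - A @ c))
            return np.stack([v.ravel() for v in views])

        def train(patches, rng=None):
            """One class per keypoint: compress all views, keep class means."""
            rng = rng or np.random.default_rng(0)
            X = np.vstack([synth_views(p, rng) for p in patches])
            labels = np.repeat(np.arange(len(patches)), N_VIEWS)
            mean = X.mean(0)
            _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
            basis = Vt[:N_DIMS]
            Z = (X - mean) @ basis.T
            means = np.stack([Z[labels == k].mean(0)
                              for k in range(len(patches))])
            return mean, basis, means

        def classify(patch, mean, basis, means, reject=50.0):
            """Nearest class mean, or None if no class is close enough."""
            z = (patch.ravel().astype(float) - mean) @ basis.T
            d = np.linalg.norm(means - z, axis=1)
            k = int(d.argmin())
            return k if d[k] < reject else None  # reject: arbitrary cutoff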

  • Towards robust structure-based enhancement and horizon picking in 3-D seismic data

    Page(s): II-482 - II-489 Vol.2

    We present a novel structure-enhancing adaptive filter guided by features derived from the gradient structure tensor. We employ this filter to reduce noise in seismic data and to assist in generating seed points for initializing an automatic horizon picking algorithm. In addition, our algorithm takes seismic attributes into consideration to reduce the possibility of false horizon generation and fault crossing. Comparative experimental results are presented to highlight the potential of our approach.

  • Frame synchronization and multi-level subspace analysis for video based face recognition

    Page(s): II-902 - II-907 Vol.2

    In this paper, we develop a new video-to-video face recognition algorithm. The major advantage of the video-based method is that more information is available in a video sequence than in a single image. In order to take advantage of the large amount of information in the video sequence, and at the same time overcome the processing speed and data size problems, we develop several new techniques, including temporal and spatial frame synchronization and multi-level subspace analysis for video cube processing. The method preserves all the spatial-temporal information contained in a video sequence. Near-perfect classification results are obtained on the XM2VTS face video database.

  • A cognitive vision system for action recognition in office environments

    Page(s): II-827 - II-833 Vol.2

    The emerging cognitive vision paradigm is concerned with vision systems that evaluate, gather and integrate contextual knowledge for visual analysis. In reasoning about events and structures, cognitive vision systems should rely on multiple computations in order to perform robustly even in noisy domains. Action recognition in an unconstrained office environment thus provides an excellent testbed for research on cognitive computer vision. In this contribution, we present a system that consists of several computational modules for object and action recognition. It applies attention mechanisms, visual learning, and contextual as well as probabilistic reasoning to fuse individual results and verify their consistency. Database technologies are used for information storage, and an XML-based communication framework integrates all modules into a consistent architecture.

  • Real-time combined 2D+3D active appearance models

    Page(s): II-535 - II-542 Vol.2

    Active appearance models (AAMs) are generative models commonly used to model faces. A closely related type of face model is the 3D morphable model (3DMM). Although AAMs are 2D, they can still be used to model 3D phenomena such as faces moving across pose. We first study the representational power of AAMs and show that they can model anything a 3DMM can, but possibly require more shape parameters. We quantify the number of additional parameters required and show that 2D AAMs can generate model instances that are not possible with the equivalent 3DMM. We proceed to describe how a non-rigid structure-from-motion algorithm can be used to construct the corresponding 3D shape modes of a 2D AAM. We then show how the 3D modes can be used to constrain the AAM so that it can only generate model instances that can also be generated with the 3D modes. Finally, we propose a real-time algorithm for fitting the AAM while enforcing the constraints, creating what we call a "combined 2D+3D AAM".

  • Segmenting, modeling, and matching video clips containing multiple moving objects

    Page(s): II-914 - II-921 Vol.2

    This paper presents a novel representation for dynamic scenes composed of multiple rigid objects that may undergo different motions and be observed by a moving camera. Multi-view constraints associated with groups of affine-invariant scene patches and a normalized description of their appearance are used to segment a scene into its rigid parts, construct three-dimensional projective, affine, and Euclidean models of these parts, and match instances of models recovered from different image sequences. The proposed approach has been implemented, and it is applied to the detection and recognition of moving objects in video sequences and the identification of shots that depict the same scene in a video clip (shot matching).

  • Shape representation and classification using the Poisson equation

    Page(s): II-61 - II-67 Vol.2

    Silhouettes contain rich information about the shape of objects that can be used for recognition and classification. We present a novel approach that allows us to reliably compute many useful properties of a silhouette. Our approach assigns to every internal point of the silhouette a value reflecting the mean time required for a random walk beginning at the point to hit the boundaries. This function can be computed by solving Poisson's equation, with the silhouette contours providing boundary conditions. We show how this function can be used to reliably extract various shape properties, including part structure and rough skeleton, local orientation and aspect ratio of different parts, and convex and concave sections of the boundaries. In addition, we discuss properties of the solution and show how to compute it efficiently using multi-grid algorithms. We demonstrate the utility of the extracted properties by using them for shape classification.
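    The core computation is easy to prototype: solve Poisson's equation with a source term of -1 inside the silhouette and zero values on its contour. The sketch below uses plain Jacobi iteration for clarity, whereas the paper advocates multi-grid algorithms for efficiency.

        # Sketch: Poisson-based shape representation on a binary silhouette.
        # Solves laplacian(U) = -1 inside the shape, U = 0 outside, by Jacobi
        # iteration (the paper uses multi-grid for efficiency).
        import numpy as np

        def poisson_field(mask, n_iter=5000):
            """mask: 2-D boolean array, True inside the silhouette."""
            U = np.zeros(mask.shape)
            for _ in range(n_iter):
                nbr = (np.roll(U, 1, 0) + np.roll(U, -1, 0) +
                       np.roll(U, 1, 1) + np.roll(U, -1, 1))
                # grid spacing h = 1, source term f = 1
                U = np.where(mask, 0.25 * (nbr + 1.0), 0.0)
            return U  # U(x) ~ mean time for a random walk from x to hit the boundary

        # Example: the field of a filled rectangle peaks near its center.
        m = np.zeros((64, 64), dtype=bool)
        m[16:48, 8:56] = True
        print(np.unravel_index(poisson_field(m).argmax(), m.shape))  # ~ (31, 31)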

  • Cue integration through discriminative accumulation

    Page(s): II-578 - II-585 Vol.2

    Object recognition systems aiming to work in real-world settings should use multiple cues in order to achieve robustness. We present a new cue integration scheme, which extends the idea of cue accumulation to discriminative classifiers. We derive and test the scheme for support vector machines (SVMs), but we also show that it is easily extendible to any large-margin classifier. In the case of one-class SVMs, the scheme can be interpreted as a new class of Mercer kernels for multiple cues. Experimental comparison with a probabilistic accumulation scheme is favorable to our method. Comparison with a voting scheme shows that our method may suffer as the number of object classes increases. Based on these results, we propose a recognition algorithm consisting of a decision tree where decisions at each node are taken using our accumulation scheme. Results obtained using this new algorithm compare very favorably to both accumulation (probabilistic and discriminative) and voting schemes.
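    A minimal sketch of the accumulation step, assuming scikit-learn and a binary problem: each cue trains its own SVM, and the real-valued margins are combined by a weighted sum before taking the sign. The cue features and weights here are hypothetical placeholders.

        # Sketch: discriminative accumulation of cues for binary classification.
        # Each cue trains its own SVM; weighted margins are summed before the
        # decision. Cue features and weights are hypothetical placeholders.
        import numpy as np
        from sklearn.svm import SVC

        def train_cue_svms(cue_features, y):
            """cue_features: list of (n_samples, d_cue) arrays, one per cue."""
            return [SVC(kernel="rbf").fit(X, y) for X in cue_features]

        def accumulate_predict(svms, cue_features, weights):
            """Sign of the weighted sum of per-cue SVM margins."""
            margin = sum(w * clf.decision_function(X)
                         for clf, X, w in zip(svms, cue_features, weights))
            return np.sign(margin)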

  • Bayesian fusion of camera metadata cues in semantic scene classification

    Page(s): II-623 - II-630 Vol.2

    Semantic scene classification based only on low-level vision cues has had limited success on unconstrained image sets. On the other hand, camera metadata related to capture conditions provides cues independent of the captured scene content that can be used to improve classification performance. We consider two problems: indoor-outdoor classification and sunset detection. Analysis of camera metadata statistics for images of each class revealed that metadata fields such as exposure time, flash fired, and subject distance are the most discriminative for both indoor-outdoor and sunset classification. A Bayesian network is employed to fuse content-based and metadata cues in the probability domain; it degrades gracefully even when specific metadata inputs are missing (a practical concern). Finally, we provide extensive experimental results on the two problems, using content-based and metadata cues, to demonstrate the efficacy of the proposed integrated scene classification scheme.
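    A naive-Bayes-flavored sketch of probability-domain fusion: each available metadata field contributes a log-likelihood term, and missing fields simply drop out of the sum, which is what gives the graceful degradation. The conditional probabilities below are illustrative numbers, not values from the paper.

        # Sketch: probability-domain fusion of metadata cues, naive-Bayes
        # style. Absent cues drop out of the sum: graceful degradation.
        # All probabilities are illustrative, not taken from the paper.
        import math

        CPT = {  # P(cue value | class)
            "flash_fired":   {True:  {"indoor": 0.7, "outdoor": 0.2},
                              False: {"indoor": 0.3, "outdoor": 0.8}},
            "long_exposure": {True:  {"indoor": 0.6, "outdoor": 0.1},
                              False: {"indoor": 0.4, "outdoor": 0.9}},
        }
        PRIOR = {"indoor": 0.5, "outdoor": 0.5}

        def classify(metadata):
            """metadata: dict of cue -> value; missing cues are marginalized out."""
            score = {c: math.log(p) for c, p in PRIOR.items()}
            for cue, value in metadata.items():
                for c in score:
                    score[c] += math.log(CPT[cue][value][c])
            return max(score, key=score.get)

        print(classify({"flash_fired": True}))                           # indoor
        print(classify({"flash_fired": False, "long_exposure": False}))  # outdoor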

  • Bayesian face recognition using support vector machine and face clustering

    Page(s): II-374 - II-380 Vol.2

    In this paper, we first develop a direct Bayesian-based support vector machine by combining Bayesian analysis with the SVM. Unlike traditional SVM-based face recognition methods, which need to train a large number of SVMs, the direct Bayesian SVM needs only one SVM, trained to classify the face difference between intra-personal variation and extra-personal variation. However, the added simplicity means that the method has to separate two complex subspaces with one hyper-plane, which affects the recognition accuracy. In order to improve the recognition performance, we develop three more Bayesian-based SVMs: the one-versus-all method, the hierarchical agglomerative clustering based method, and the adaptive clustering method. We show the improvement of the new algorithms over traditional subspace methods through experiments on two face databases, the FERET database and the XM2VTS database.
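    The single-SVM formulation is straightforward to prototype: label face-difference vectors as intra-personal or extra-personal and train one classifier on them. The sketch below assumes scikit-learn and uses random vectors as stand-ins for real face features.

        # Sketch: one SVM on face-difference vectors, labeled intra-personal
        # (+1) vs extra-personal (-1). Random vectors stand in for faces.
        import numpy as np
        from itertools import combinations
        from sklearn.svm import SVC

        def difference_set(faces, ids):
            """Labeled difference vectors from a gallery of face vectors."""
            X, y = [], []
            for i, j in combinations(range(len(faces)), 2):
                X.append(faces[i] - faces[j])
                y.append(+1 if ids[i] == ids[j] else -1)
            return np.array(X), np.array(y)

        def same_person(clf, probe, gallery_face):
            """True iff the difference falls on the intra-personal side."""
            return clf.predict((probe - gallery_face).reshape(1, -1))[0] == 1

        faces = np.random.default_rng(0).normal(size=(20, 128))  # stand-ins
        ids = np.repeat(np.arange(10), 2)  # two images per subject
        clf = SVC(kernel="rbf").fit(*difference_set(faces, ids))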

  • Robust subspace clustering by combined use of kNND metric and SVD algorithm

    Page(s): II-592 - II-599 Vol.2

    Subspace clustering has many applications in computer vision, such as image/video segmentation and pattern classification. The major issue in subspace clustering is to obtain the most appropriate subspace from the given noisy data. Typical methods (e.g., SVD, PCA, and eigen-decomposition) use least-squares techniques and are sensitive to outliers. In this paper, we present the k-th nearest neighbor distance (kNND) metric, which, without actually clustering the data, can exploit the intrinsic data cluster structure to detect and remove influential outliers as well as small data clusters. The remaining data provide a good initial inlier data set that resides in a linear subspace whose rank (dimension) is upper-bounded. This linear subspace constraint can then be exploited by simple algorithms, such as an iterative SVD algorithm, to (1) detect the remaining outliers that violate the correlation structure enforced by the low-rank subspace, and (2) reliably compute the subspace. As an example, we apply our method to extracting layers from image sequences containing dynamically moving objects.
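    A compact sketch of the kNND pre-filter followed by SVD subspace estimation: a point whose k-th nearest neighbor is far away cannot belong to a cluster of more than k points, so it is discarded before fitting. The median-based threshold is an illustrative choice, not the paper's rule.

        # Sketch: kNND outlier pre-filter, then SVD subspace estimation.
        # A large k-th nearest neighbor distance means a point cannot sit in
        # a cluster of more than k points; the 3x-median cutoff is illustrative.
        import numpy as np

        def knnd(X, k):
            """Distance from each row of X to its k-th nearest other row."""
            D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
            return np.sort(D, axis=1)[:, k]  # column 0 is the point itself

        def robust_subspace(X, k=5, rank=3):
            d = knnd(X, k)
            inliers = X[d < 3.0 * np.median(d)]  # drop outliers, tiny clusters
            mean = inliers.mean(0)
            _, _, Vt = np.linalg.svd(inliers - mean, full_matrices=False)
            return mean, Vt[:rank]  # rank-bounded linear subspace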

  • Computing depth under ambient illumination using multi-shuttered light

    Page(s): II-234 - II-241 Vol.2

    Range imaging has become a critical component of many computer vision applications. The quality of the depth data is of critical importance, but so is the need for speed. Shuttered light-pulse (SLP) imaging uses active illumination hardware to provide high quality depth maps at video frame rates. Unfortunately, current analytical models for deriving depth from SLP imagers are specific to the number of shutters and have a number of deficiencies. As a result, depth estimation often suffers from bias due to object reflectivity, incorrect shutter settings, or strong ambient illumination such as that encountered outdoors. These limitations make SLP imaging unsuitable for many applications requiring stable depth readings. This paper introduces a method that is general to any number of shutters. Using three shutters, the new method produces invariant estimates under changes in ambient illumination, producing high quality depth maps in a wider range of situations.

  • Probabilistic identity characterization for face recognition

    Page(s): II-805 - II-812 Vol.2

    We present a general framework for characterizing object identity in a single image or a group of images, with each image containing a transformed version of the object, and apply it to face recognition. Depending on the transformation, the group consists of either many still images or the frames of a video sequence. The object identity is either discrete- or continuous-valued. This probabilistic framework integrates all the evidence in the set and handles the localization problem, illumination, and pose variations through subspace identity encoding. Issues and challenges arising in this framework are addressed and efficient computational schemes are presented. Good face recognition results using the PIE database are reported.

  • Distortion estimation techniques in solving visual CAPTCHAs

    Page(s): II-23 - II-28 Vol.2

    This paper describes two distortion estimation techniques for object recognition that solve EZ-Gimpy and Gimpy-r, two of the visual CAPTCHAs ("completely automated public Turing test to tell computers and humans apart"), with high degrees of success. A CAPTCHA is a program that generates and grades tests that most humans can pass but current computer programs cannot. We have developed a correlation algorithm that correctly identifies the word in an EZ-Gimpy challenge image 99% of the time, and a direct distortion estimation algorithm that correctly identifies the four letters in a Gimpy-r challenge image 78% of the time.

  • Direct super-resolution and registration using raw CFA images

    Page(s): II-600 - II-607 Vol.2

    Super-resolution has been applied mainly to grayscale images, but producing a high-resolution color image using a single CCD sensor has not been investigated in detail until recently. This work aims at producing a high-resolution color image directly from raw "color mosaic" images obtained by a single CCD equipped with a color filter array. The method is based on a generalized formulation of super-resolution that simultaneously performs both resolution enhancement and demosaicing. The other key component is precise sub-pixel registration of the multiple raw images. We examine direct registration of raw images based on an imaging model, which provides precise estimation of motion among severely aliased raw images. The proposed method is verified through experiments using synthetic and real images.

  • A GMM parts based face representation for improved verification through relevance adaptation

    Page(s): II-855 - II-861 Vol.2

    Motivated by the success of parts-based representations in face detection, we have attempted to address some of the problems associated with applying such a philosophy to the task of face verification. Hitherto, a major problem with this approach in face verification has been the intrinsic lack of training observations, stemming from individual subjects, with which to estimate the required conditional distributions. The estimated distributions have to be general enough to encompass the differing permutations of a subject's face, yet still be able to discriminate between subjects. In our work, the well-known Gaussian mixture model (GMM) framework is employed to model the conditional density function of the parts-based representation of the face. We demonstrate that excellent performance can be obtained from our GMM-based representation through the employment of adaptation theory, specifically relevance adaptation (RA). Our results are presented for the frontal images of the BANCA database.

  • Novel region-based modeling for human detection within highly dynamic aquatic environment

    Page(s): II-390 - II-397 Vol.2

    One difficult challenge in autonomous video surveillance is handling highly dynamic backgrounds. This difficulty is compounded if foreground objects of interest are partially hidden by specular reflections or glare. In this paper, we provide numerous insights into the technical difficulties faced in developing an automated video surveillance system within a hostile environment: an outdoor public swimming pool. For robust detection performance, we focus on two central aspects: (i) effective modeling of the dynamic outdoor aquatic background, with rapid illumination changes, splashes, and random spatial movements of background elements owing to the movement of water ripples; and (ii) enhancing the visibility of swimmers that are partially hidden by specular reflections. This paper introduces several innovations. The first is a scheme that models the background as regions of dynamic homogeneous processes. This model facilitates an efficient spatial searching scheme for background subtraction that can exploit long-range spatial dependencies between pixels. The second is a spatio-temporal filtering scheme that enhances the detection of swimmers partially hidden by specular reflections of artificial nighttime lighting, serving as a pre-processing module to foreground detection for nighttime operation. These algorithms have been tightly integrated under a unified framework and demonstrated on a busy Olympic-sized outdoor public pool.

  • Recovering human body configurations: combining segmentation and recognition

    Page(s): II-326 - II-333 Vol.2

    The goal of this work is to detect a human figure in an image and localize its joints and limbs, along with their associated pixel masks. In this work we attempt to tackle this problem in a general setting. The dataset we use is a collection of sports news photographs of baseball players, varying dramatically in pose and clothing. The approach that we take is to use segmentation to guide our recognition algorithm to salient bits of the image. We use this segmentation approach to build limb and torso detectors, the outputs of which are assembled into human figures. We present quantitative results on torso localization, in addition to shortlisted full-body configurations.

  • From facial expression to level of interest: a spatio-temporal approach

    Page(s): II-922 - II-927 Vol.2

    This paper presents a novel approach to recognizing the six universal facial expressions from visual data and using them to derive the level of interest, drawing on psychological evidence. The proposed approach relies on a two-step classification built on top of refined optical flow computed from a sequence of images. First, a bank of linear classifiers is applied at the frame level, and the output of this stage is coalesced to produce a temporal signature for each observation. Second, the temporal signatures computed from the training data set are used to train discrete hidden Markov models (HMMs) to learn the underlying model for each universal facial expression. The average recognition rate of the proposed facial expression classifier is 90.9% without classifier fusion and 91.2% with fusion, using a five-fold cross-validation scheme on a database of 488 video sequences that includes 97 subjects. Recognized facial expressions were combined with the intensity of activity (motion) around the apex frame to measure the level of interest. To further illustrate the efficacy of the proposed approach, two sets of experiments were conducted: analysis of television (TV) broadcast data (108 facial expression sequences containing severe lighting conditions and diverse subjects and expressions) and emotion elicitation with 21 subjects.

  • Orthogonal complement component analysis for positive samples in SVM based relevance feedback image retrieval

    Page(s): II-586 - II-591 Vol.2

    Relevance feedback (RF) is an important tool for improving the performance of content-based image retrieval systems. Support vector machine (SVM) based RF is popular because it can generalize better than most other classifiers. However, directly using SVM in RF may not be appropriate, since SVM treats the positive and negative feedbacks equally. Given the different properties of positive samples and negative samples in RF, they should be treated differently. Considering this, we propose an orthogonal complement component analysis (OCCA) combined with SVM in this paper. We then generalize the OCCA to Hilbert space and define the kernel empirical OCCA (KEOCCA). Through experiments on a Corel photo database with 17,800 images, we demonstrate that the proposed method can significantly improve the performance of conventional SVM-based RF.

  • PCA-SIFT: a more distinctive representation for local image descriptors

    Page(s): II-506 - II-513 Vol.2

    Stable local feature detection and representation is a fundamental component of many image registration and object recognition algorithms. Mikolajczyk and Schmid (June 2003) recently evaluated a variety of approaches and identified the SIFT [D. G. Lowe, 1999] algorithm as being the most resistant to common image deformations. This paper examines (and improves upon) the local image descriptor used by SIFT. Like SIFT, our descriptors encode the salient aspects of the image gradient in the feature point's neighborhood; however, instead of using SIFT's smoothed weighted histograms, we apply principal components analysis (PCA) to the normalized gradient patch. Our experiments demonstrate that the PCA-based local descriptors are more distinctive, more robust to image deformations, and more compact than the standard SIFT representation. We also present results showing that using these descriptors in an image retrieval application results in increased accuracy and faster matching.
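    The descriptor step reduces to a projection: flatten the gradients of the normalized patch around a keypoint, normalize the vector, and project it onto an eigenspace learned offline. The sketch below follows the paper's reported sizes (41x41 patch, 20-D output) but is otherwise a generic reconstruction, not the authors' code.

        # Sketch: the PCA-SIFT descriptor step on pre-normalized patches.
        import numpy as np

        def grad_vector(patch):
            """Unit-norm concatenated x/y gradients of a 41x41 patch."""
            gy, gx = np.gradient(patch.astype(float))
            v = np.concatenate([gx.ravel(), gy.ravel()])
            return v / (np.linalg.norm(v) + 1e-8)  # illumination normalization

        def learn_eigenspace(patches, d=20):
            """Offline: PCA basis from many training gradient patches."""
            G = np.stack([grad_vector(p) for p in patches])
            mean = G.mean(0)
            _, _, Vt = np.linalg.svd(G - mean, full_matrices=False)
            return mean, Vt[:d]

        def pca_sift(patch, mean, basis):
            """Compact descriptor: project the gradient vector onto the basis."""
            return (grad_vector(patch) - mean) @ basis.T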

  • Jitter camera: high resolution video from a low resolution detector

    Page(s): II-135 - II-142 Vol.2

    Video cameras must produce images at a reasonable frame rate and with a reasonable depth of field. These requirements impose fundamental physical limits on the spatial resolution of the image detector. As a result, current cameras produce videos with a very low resolution. Moving the camera and applying super-resolution reconstruction algorithms can computationally enhance the resolution of videos. However, a moving camera introduces motion blur, which limits super-resolution quality. We analyze this effect and derive a theoretical result showing that motion blur has a substantial degrading effect on the performance of super-resolution. The conclusion is that, in order to achieve the highest resolution, motion blur should be avoided. Sampling the space-time volume of the video in a specific manner can minimize motion blur. We have developed a novel camera, called the "jitter camera", that achieves this sampling. By applying an adaptive super-resolution algorithm to the video produced by the jitter camera, we show that resolution can be notably enhanced for stationary or slowly moving objects, while it is improved slightly or left unchanged for objects with fast and complex motions. The end result is a video that has a significantly higher resolution than the captured one.

  • Integrating and employing multiple levels of zoom for activity recognition

    Page(s): II-928 - II-935 Vol.2

    To facilitate activity recognition, analysis of the scene at multiple levels of detail is necessary. Required prerequisites for our activity recognition are tracking objects across frames and establishing a consistent labeling of objects across cameras. This paper makes several innovative uses of the epipolar constraint in the context of activity recognition. We first demonstrate how we track heads and hands using the epipolar geometry. Next, we show how the detected objects are labeled consistently across cameras and zooms by employing epipolar, spatial, trajectory, and appearance properties. Finally, we show how our method, utilizing the multiple levels of detail, is able to answer activity recognition questions that are difficult to answer with a single level of detail.

  • Effect of colorspace transformation, the illuminance component, and color modeling on skin detection

    Page(s): II-813 - II-818 Vol.2

    Skin detection is an important preliminary process in human motion analysis. It is commonly performed in three steps: transforming the pixel color to a non-RGB colorspace, dropping the illuminance component of skin color, and classifying by modeling the skin color distribution. In this paper, we evaluate the effect of these three steps on skin detection performance. The importance of this study is a new comprehensive colorspace and color-modeling testing methodology that allows the best choices for skin detection to be made. Combinations of nine colorspaces, the presence or absence of the illuminance component, and two color modeling approaches are compared. Performance is measured by using a receiver operating characteristic (ROC) curve on a large dataset of 805 images with manual ground truth. The results reveal that (1) colorspace transformations can improve performance in certain instances, (2) dropping the illuminance component decreases performance, and (3) skin color modeling has a greater impact than colorspace transformation. We found that the best performance was obtained by transforming the pixel color to the SCT or HSI colorspaces, keeping the illuminance component, and modeling the color with the histogram approach.
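    The histogram color-modeling step can be sketched as a likelihood-ratio test between smoothed 3-D color histograms of skin and non-skin pixels; sweeping the decision threshold traces out the ROC curve. The colorspace transform is assumed to have been applied already, and the bin count is an illustrative choice.

        # Sketch: histogram skin-color model with a likelihood-ratio decision.
        # Assumes pixels are already in the chosen colorspace, range [0, 256).
        import numpy as np

        BINS = 32  # per channel; an illustrative choice

        def build_hist(pixels):
            """pixels: (n, 3) array; returns a Laplace-smoothed 3-D pmf."""
            h, _ = np.histogramdd(pixels, bins=(BINS,) * 3,
                                  range=((0, 256),) * 3)
            return (h + 1.0) / (h.sum() + BINS ** 3)

        def skin_mask(image, h_skin, h_nonskin, thresh=1.0):
            """Per-pixel P(color|skin)/P(color|nonskin) test; sweeping
            thresh traces out the ROC curve."""
            idx = image.reshape(-1, 3).astype(int) * BINS // 256
            ratio = (h_skin[idx[:, 0], idx[:, 1], idx[:, 2]] /
                     h_nonskin[idx[:, 0], idx[:, 1], idx[:, 2]])
            return (ratio > thresh).reshape(image.shape[:2])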

  • Efficient model-based linear head motion recovery from movies

    Page(s): II-414 - II-421 Vol.2

    We propose an efficient method that estimates the motion parameters of a human head from a video sequence by using a three-layer linear iterative process. In the innermost layer, we estimate the motion of each input face image in a video sequence based on a generic face model and a small set of feature points. A fast iterative least-squares method is used to recover these motion parameters. After that, we iteratively estimate three model scaling factors using multiple frames with the recovered poses in the middle layer. Finally, we update the 3D coordinates of the feature points on the generic face model in the outermost layer. Since all iterative processes can be solved linearly, the computational cost is low. Tests on synthetic data under noisy conditions and on two real video sequences have been performed. Experimental results show that the proposed method is robust and has good performance.
