
Proceedings of the Sixth IEEE Workshop on Applications of Computer Vision (WACV 2002)

Date: 4-4 Dec. 2002


  • Proceedings Sixth IEEE Workshop on Applications of Computer Vision (WACV 2002)

  • Multi-view face detection with FloatBoost

    Page(s): 184 - 188

    In this paper, a new boosting algorithm, called FloatBoost, is proposed to construct a strong face-nonface classifier. FloatBoost incorporates the idea of Floating Search into AdaBoost, and yields similar or higher classification accuracy than AdaBoost with a smaller number of weak classifiers. We also present a novel framework for fast multi-view face detection. A detector-pyramid architecture is designed to quickly discard a vast number of non-face sub-windows and hence perform multi-view face detection efficiently. This results in the first real-time multi-view face detection system, which runs at 5 frames per second on 320x240 image sequences.
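
    The forward-then-backward selection that FloatBoost adds to AdaBoost can be sketched on decision stumps as follows. This is an illustrative reconstruction under simplifying assumptions, not the authors' implementation; the stump learner and the backtracking criterion (plain training error) are invented for the example.

```python
import numpy as np

def stump_predict(X, feat, thresh, pol):
    # decision stump: +1 on one side of the threshold, -1 on the other
    return np.where(pol * X[:, feat] < pol * thresh, 1, -1)

def train_stump(X, y, w):
    # exhaustive search for the stump with minimum weighted error
    best = None
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for pol in (1, -1):
                err = w[stump_predict(X, feat, thresh, pol) != y].sum()
                if best is None or err < best[0]:
                    best = (err, feat, thresh, pol)
    return best

def ensemble_error(H, X, y):
    score = sum(a * stump_predict(X, f, t, p) for a, f, t, p in H)
    return float(np.mean(np.sign(score) != y))

def floatboost(X, y, rounds=5):
    H, w = [], np.ones(len(y)) / len(y)
    for _ in range(rounds):
        err, f, t, p = train_stump(X, y, w)          # forward (AdaBoost) step
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        H.append((alpha, f, t, p))
        w *= np.exp(-alpha * y * stump_predict(X, f, t, p))
        w /= w.sum()
        # backward (floating) step: drop any earlier weak classifier whose
        # removal lowers the ensemble's training error
        improved = True
        while improved and len(H) > 1:
            improved = False
            base = ensemble_error(H, X, y)
            for i in range(len(H) - 1):
                if ensemble_error(H[:i] + H[i + 1:], X, y) < base:
                    H = H[:i] + H[i + 1:]
                    improved = True
                    break
    return H
```

    The backward step is what distinguishes FloatBoost from plain AdaBoost: classifiers that were useful early on but are made redundant by later additions get pruned, keeping the ensemble small.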

  • Author index

    Page(s): 335 - 336
  • Dense disparity maps in real-time with an application to augmented reality

    Page(s): 225 - 230

    This work presents a technique for computing dense disparity maps from a binocular stereo camera system. The methods are applied in an Augmented Reality setting for combining real and virtual worlds with proper occlusions. The proposed stereo correspondence technique is based on area matching and facilitates an efficient strategy by using the concept of a three-dimensional similarity accumulator, whereby occlusions are detected and object boundaries are extracted correctly. The main contribution of this paper is the way we fill the accumulator, using absolute differences of images and computing a mean filter on these difference images. This is where the main advantages of the accumulator approach can be exploited, since all entries can be computed in parallel and thus extremely efficiently. Additionally, we perform an asymmetric correction step and a post-processing of the disparity maps that maintains object edges.
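
    The accumulator idea (one cost slice per candidate disparity, filled with mean-filtered absolute differences, followed by a winner-take-all pick) can be sketched as below. This is an illustrative reconstruction, not the authors' code; the occlusion handling and asymmetric correction steps are omitted.

```python
import numpy as np

def box_filter(img, win):
    # mean filter via summed window offsets (edge padding)
    pad = win // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(win):
        for dx in range(win):
            out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (win * win)

def sad_disparity(left, right, max_disp, win=3):
    # similarity accumulator: one mean-filtered absolute-difference
    # slice per candidate disparity; slices are independent, so they
    # could be computed in parallel
    h, w = left.shape
    acc = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        diff = np.abs(left[:, d:].astype(float) - right[:, :w - d].astype(float))
        acc[d, :, d:] = box_filter(diff, win)
    return np.argmin(acc, axis=0)     # winner-take-all over disparities
```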

  • Range synthesis for 3D environment modeling

    Page(s): 231 - 236

    In this paper, a range synthesis algorithm is proposed as an initial solution to the problem of 3D environment modeling from sparse data. We develop a statistical learning method for inferring and extrapolating range data from as little as one intensity image and from those (sparse) regions where both range and intensity information is available. Our work is related to methods for texture synthesis using Markov Random Field (MRF) methods. We demonstrate that MRF methods can also be applied to general intensity images with little associated range information and used to estimate range values where needed, without making any strong assumptions about the kind of surfaces in the world. Experimental results show the feasibility of our method.
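
    The flavour of the inference can be shown with a drastically simplified stand-in for the paper's MRF machinery: each unknown range value is copied from the known pixel whose intensity is most similar. The real method matches neighbourhoods, not single pixels; this per-pixel version is invented for illustration only.

```python
import numpy as np

def synthesize_range(intensity, range_map, known):
    # copy each unknown range value from the known pixel whose intensity
    # is closest (a crude per-pixel stand-in for MRF neighbourhood matching)
    out = range_map.astype(float).copy()
    ky, kx = np.nonzero(known)
    kvals = intensity[ky, kx].astype(float)
    for y, x in zip(*np.nonzero(~known)):
        i = int(np.argmin(np.abs(kvals - intensity[y, x])))
        out[y, x] = range_map[ky[i], kx[i]]
    return out
```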

  • Segmentation of myocardium using velocity field constrained front propagation

    Page(s): 84 - 89

    We present a velocity-constrained front propagation approach for myocardium segmentation from magnetic resonance intensity (MRI) images and their matching phase contrast velocity (PCV) images. Our curve evolution criterion depends on the prior probability distribution of the myocardial boundary and the conditional boundary probability distribution, which is constructed from the MRI intensity gradient, the PCV magnitude, and the local phase coherence of the PCV direction. A two-step boundary finding strategy is employed to facilitate the computation. For the first image frame, a gradient-only fast marching/level set step is used to approach the boundary, and a narrowband is formed around the curve. The initial boundary is then refined using the full information from the priors and all three image sources. For the other frames, the resulting contours from the previous frames are used as the initialization contours, and only the refinement step is needed. Experimental results from a canine MRI sequence are presented and compared to results from gradient-only segmentation.

  • Optimal motion estimation from visual and inertial measurements

    Page(s): 314 - 319

    Cameras and inertial sensors are good candidates to be deployed together for autonomous vehicle motion estimation, since each can be used to resolve the ambiguities in the estimated motion that result from using the other modality alone. We present an algorithm that computes optimal vehicle motion estimates by considering all of the measurements from a camera, rate gyro, and accelerometer simultaneously. Such optimal estimates are useful in their own right, and as a gold standard for the comparison of online algorithms. By comparing the motions estimated using visual and inertial measurements, visual measurements only, and inertial measurements only against ground truth, we show that using image and inertial data together can produce highly accurate estimates even when the results produced by each modality alone are very poor. Our test datasets include both conventional and omnidirectional image sequences, and an image sequence with a high percentage of missing data.

  • Adaptive aperture control for image acquisition

    Page(s): 320 - 324

    Image processing strongly relies on the quality of the input images, as images of good quality can significantly decrease the development effort for image processing and analysis algorithms. A flexible acquisition system for image enhancement, which is able to operate in real time under changing brightness conditions, is suggested. The system is based on controlling the aperture of the lens, which makes it usable in combination with all types of image sensors. The control scheme is based on an adaptive image quality estimator and can be used for full images and regions of interest within images. We demonstrate the real-time performance of our approach for different static and dynamic indoor and outdoor test scenarios, with and without regions of interest.
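
    The control loop can be sketched as simple proportional feedback, with mean image brightness standing in for the paper's adaptive quality estimator. The function names, gain, and target value are invented for the example; the actual estimator and control law are not specified here.

```python
import numpy as np

def control_aperture(capture, aperture, target=0.5, gain=0.8, steps=20):
    # proportional feedback: open or close the (normalised) aperture until
    # the mean image brightness reaches the target quality proxy
    for _ in range(steps):
        brightness = capture(aperture).mean()
        aperture += gain * (target - brightness)
        aperture = min(max(aperture, 0.0), 1.0)   # physical aperture limits
    return aperture
```

    With a scene whose brightness is proportional to the aperture opening, the loop settles at the aperture that yields the target brightness; a region of interest is handled by computing the same estimate over a sub-image.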

  • Eye typing using Markov and active appearance models

    Page(s): 132 - 136

    We propose a non-intrusive eye tracking system intended for everyday gaze typing using web cameras. We argue that high precision in gaze tracking is not needed for on-screen typing due to natural language redundancy. This facilitates the use of low-cost video components for advanced multi-modal interactions based on video tracking systems. Robust methods are needed to track the eyes using web cameras due to the poor image quality. A real-time tracking scheme using a mean-shift color tracker and an Active Appearance Model of the eye is proposed. From this model it is possible to infer the state of the eye, such as the eye corners and the pupil location, under scale and rotational changes.

  • Kinematic-based human motion analysis in infrared sequences

    Page(s): 208 - 212

    In an infrared (IR) image sequence of human walking, the human silhouette can in most cases be reliably extracted from the background regardless of lighting conditions and of the colors of the human surfaces and backgrounds. Moreover, some important regions containing skin, such as the face and hands, can be accurately detected in IR image sequences. In this paper, we propose a kinematic-based approach for automatic human motion analysis from IR image sequences. The proposed approach estimates 3D human walking parameters by performing a modified least squares fit of the 3D kinematic model to the 2D silhouette extracted from a monocular IR image sequence, where continuity and symmetry of human walking and detected hand regions are also considered in the optimization function. Experimental results show that the proposed approach achieves good performance in gait analysis with different view angles with respect to the walking direction, and is promising for further gait recognition.

  • Appearance-based eye gaze estimation

    Page(s): 191 - 195

    We present a method for estimating eye gaze direction, which represents a departure from conventional eye gaze estimation methods, the majority of which are based on tracking specific optical phenomena like corneal reflection and the Purkinje images. We employ an appearance manifold model, but instead of using a densely sampled spline to perform the nearest manifold point query, we retain the original set of sparse appearance samples and use linear interpolation among a small subset of samples to approximate the nearest manifold point. The advantage of this approach is that since we are only storing a sparse set of samples, each sample can be a high dimensional vector that retains more representational accuracy than short vectors produced with dimensionality reduction methods. The algorithm was tested with a set of eye images labelled with ground truth point-of-regard coordinates. We have found that the algorithm is capable of estimating eye gaze with a mean angular error of 0.38 degrees, which is comparable to that obtained by commercially available eye trackers.
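
    The sparse-sample interpolation step can be sketched as a k-nearest-neighbour query with inverse-distance weighting over the stored appearance vectors. This is an illustrative reconstruction under assumed details (the paper's exact interpolation scheme and k are not given here).

```python
import numpy as np

def estimate_gaze(query, samples, gazes, k=3, eps=1e-9):
    # nearest-manifold-point query approximated by inverse-distance
    # interpolation among the k closest stored appearance samples
    d = np.linalg.norm(samples - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + eps)
    w /= w.sum()
    return float(w @ gazes[idx])
```

    Because only the raw samples are stored, each appearance vector can stay high-dimensional; the interpolation replaces the densely resampled manifold.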

  • Boosting image orientation detection with indoor vs. outdoor classification

    Page(s): 95 - 99

    Automatic detection of image orientation is a very important operation in photo image management. In this paper, we propose an automated method based on the boosting algorithm to estimate image orientations. The proposed method has the capability of rejecting images based on the confidence score of the orientation detection. Also, images are classified into indoor and outdoor scenes, and this classification result is used to further refine the orientation detection. To select features more sensitive to rotation, we combine the features by a subtraction operation and select the most useful features with the boosting algorithm. The proposed method has several advantages: small model size, fast classification speed, and an effective rejection scheme.

  • Dynamical road modeling and matching for direct visual navigation

    Page(s): 237 - 241

    This paper proposes a new concept of direct visual navigation (DVN), which superimposes virtual direction indicators and traffic information onto the real road scene to give drivers efficient and direct visual navigation guidance. To align the virtual objects properly with respect to the real world, we need to solve the so-called Registration Problem in the Augmented Reality (AR) context. Traditional solutions always employ a fixed, known-structure model as well as the object depth information to obtain the 3D-2D correlations, which is not possible in the case of on-road driving navigation. With the constraints of road structure and on-road vehicle motion features, this paper presents a dynamical multi-lane road shape modeling method as well as a road model matching method that simplify the 3D-2D correlation problem to 2D-2D road model matching on the projective image. A road shape lookup table (RSL) concept is also presented to calculate the road model matching score. The algorithms proposed in this paper are validated with experimental results from real road tests under different conditions and road types.

  • Active facial tracking for fatigue detection

    Page(s): 137 - 142

    Vision-based driver fatigue detection is one of the most promising commercial applications of facial expression recognition technology, and facial feature tracking is its primary technical issue. Current facial tracking technology faces three challenges: (1) detection failure of some or all features due to a variety of lighting conditions and head motions; (2) multiple and non-rigid object tracking; and (3) feature occlusion when the head is at oblique angles. In this paper, we propose a new active approach. First, an active IR sensor is used to robustly detect pupils under variable lighting conditions. The detected pupils are then used to predict the head motion. Furthermore, face movement is assumed to be locally smooth so that a facial feature can be tracked with a Kalman filter. The simultaneous use of the pupil constraint and the Kalman filtering greatly increases the prediction accuracy for each feature position. Feature detection is accomplished in the Gabor space in the vicinity of the predicted location. Local graphs consisting of identified features are extracted and used to capture the spatial relationship among the detected features. Finally, a graph-based reliability propagation is proposed to tackle the occlusion problem and verify the tracking results. The experimental results show the validity of our active approach to real-life facial tracking under variable lighting conditions, head orientations, and facial expressions.
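
    The per-feature Kalman tracking under the local-smoothness assumption can be sketched with a standard constant-velocity filter on one image coordinate. This is a generic textbook filter, not the authors' parameterisation; the noise values q and r are invented for the example.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    # constant-velocity Kalman filter for one image coordinate of a
    # facial feature: predict with the motion model, correct with each
    # detected position
    F = np.array([[1.0, dt], [0.0, 1.0]])     # state: [position, velocity]
    H = np.array([[1.0, 0.0]])
    Q = q * np.eye(2)
    R = np.array([[r]])
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    track = []
    for z in measurements:
        x = F @ x                              # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                    # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)  # correct with measurement
        P = (np.eye(2) - K @ H) @ P
        track.append(x[0, 0])
    return track
```

    In the paper, the predicted position additionally constrains the Gabor-space search window for the feature detector.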

  • Automatic pose estimation of complex 3D building models

    Page(s): 148 - 152

    3D models of urban sites with geometry and facade textures are needed for many planning and visualization applications. Approximate 3D wireframe models can be derived from aerial images, but detailed textures must be obtained from ground-level images. Integrating such views with the 3D models is difficult, as only small parts of buildings may be visible in a single view. We describe a method that uses two or three vanishing points, and three 3D-to-2D line correspondences, to estimate the rotational and translational parameters of the ground-level cameras. The valid set among multiple combinations of 3D-to-2D line pairs is chosen by hypothesis generation and evaluation. Some experimental results are presented.
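
    The rotation part of such a pose estimate can be sketched for the case where all three vanishing points of the world axes are available and finite: each vanishing point back-projects to a rotation column. This is a standard construction, shown as an illustration; the paper's two-point case and the translation from line correspondences are not covered.

```python
import numpy as np

def rotation_from_vanishing_points(K, vps):
    # v_i ~ K r_i, so each rotation column is the normalised
    # back-projection of the corresponding vanishing point
    Kinv = np.linalg.inv(K)
    cols = []
    for vx, vy in vps:
        d = Kinv @ np.array([vx, vy, 1.0])
        d /= np.linalg.norm(d)
        if d[2] < 0:          # fix the sign so the axis points ahead of the camera
            d = -d
        cols.append(d)
    return np.column_stack(cols)
```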

  • Arm gesture detection in a classroom environment

    Page(s): 153 - 157

    Detecting human arm motion in a typical classroom environment is a challenging task due to the noisy and highly dynamic background, varying lighting conditions, and the small size and number of possible matched objects. A robust vision system that can detect events of students' hands being raised to ask questions is described. This system is intended to support the collaborative demands of distributed classroom lecturing and further serve as a test case for real-time gesture recognition vision systems. Various techniques, including temporal and spatial segmentation, skin color identification, and shape and feature analysis, are investigated and discussed. Limitations and problems are also analyzed, and testing results are illustrated.

  • Video de-abstraction or how to save money on your wedding video

    Page(s): 264 - 268

    There exists a growing body of work dealing with video still abstraction, the extraction of representative still images from a video sequence. This work focuses on the opposite direction: given a video abstract and raw unedited video data, we produce an edited video. We focus on the application of generating wedding videos. We use the existing wedding photo album as an abstract, and produce an edited wedding video from it. The photo album serves to determine the importance of the raw shots, as well as the style and order.

  • Group behavior recognition with multiple cameras

    Page(s): 177 - 183

    We propose in this paper an approach for recognizing the behaviors of groups of people using multiple cameras with overlapping fields of view (FOVs). In this context, behavior recognition first relies on low-level motion detection and frame-to-frame tracking, which generate a graph of mobile objects for each camera. Second, to take advantage of all cameras observing the same scene, a combination mechanism merges the graphs computed for each camera into a global one. This global graph is then used for long-term tracking of groups of people evolving in the scene. Finally, the result of the group tracking is used by a higher-level module which recognizes predefined scenarios corresponding to specific group behaviors. This article focuses on the graph combination mechanism and on the recognition of group behaviors, and results for these two algorithms are described.

  • A real-time precrash vehicle detection system

    Page(s): 171 - 176

    This paper presents an in-vehicle real-time monocular precrash vehicle detection system. The system acquires grey-level images through a forward-facing low-light camera and achieves an average detection rate of 10 Hz. The vehicle detection algorithm consists of two main steps: multi-scale driven hypothesis generation and appearance-based hypothesis verification. In the multi-scale hypothesis generation step, possible image locations where vehicles might be present are hypothesized. This step uses multi-scale techniques not only to speed up detection but also to improve system robustness by making system performance less sensitive to the choice of certain parameters. Appearance-based hypothesis verification verifies these hypotheses using Haar wavelet decomposition for feature extraction and Support Vector Machines (SVMs) for classification. The monocular system was tested under different traffic scenarios (e.g., simply structured highway, complex urban street, varying weather conditions), illustrating good performance.
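
    The feature-extraction side of such a verifier can be sketched with an integral image and a two-rectangle Haar-like response; this is a generic illustration of rectangle features computed in constant time, not the paper's exact wavelet decomposition.

```python
import numpy as np

def integral_image(img):
    # summed-area table: S[y, x] = sum of img[:y, :x]
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    S[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return S

def box_sum(S, y, x, h, w):
    # sum over img[y:y+h, x:x+w] from four table lookups
    return S[y + h, x + w] - S[y, x + w] - S[y + h, x] + S[y, x]

def haar_vertical_edge(S, y, x, h, w):
    # two-rectangle Haar-like feature: left half minus right half
    return box_sum(S, y, x, h, w // 2) - box_sum(S, y, x + w // 2, h, w - w // 2)
```

    A bank of such responses over each hypothesized window would then be fed to the SVM classifier.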

  • Segmentation of complex buildings from aerial images and 3D surface reconstruction

    Page(s): 215 - 219

    This paper presents a new method for extraction of buildings in aerial images. We first present a method based on rectangular buildings, which are the most common constructions. We then extend this method to more complex shapes by decomposition in a set of rectangles. These rectangles are used to enhance a 3D reconstruction of the digital elevation model (DEM). Based on stereo data, we use the DEM and the orthoimage for a first segmentation of all areas at elevation above ground. We estimate the rectangle parameters over any given blob and define a criterion for checking the similarity between shape and model. We introduce a new approach for automatic reconstruction of buildings of complex shapes using an iterative splitting of the region until it is covered by a set of rectangles. This automatic process is successfully illustrated on synthetic and real examples. In order to refine location and size of the model, we present a deformable rectangle template. The final rectangle and complex shape models are used together with elevation to obtain a 3D realistic reconstruction of the scene including building models.
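
    A common way to "estimate the rectangle parameters over any given blob" is via image moments: centroid from first moments, orientation and side lengths from the eigen-decomposition of the second-moment matrix (the variance of a uniform distribution over a side of length s is s²/12). The sketch below illustrates this standard construction; it is not claimed to be the paper's exact estimator.

```python
import numpy as np

def fit_rectangle(mask):
    # rectangle parameters from blob moments: centroid, side lengths
    # (ascending), and major-axis orientation
    ys, xs = np.nonzero(mask)
    cx, cy = xs.mean(), ys.mean()
    cov = np.cov(np.vstack([xs, ys]))
    evals, evecs = np.linalg.eigh(cov)            # ascending eigenvalues
    sides = np.sqrt(12.0 * evals)                 # var of uniform side s is s^2/12
    angle = np.arctan2(evecs[1, 1], evecs[0, 1])  # major-axis direction
    return (cx, cy), sides, angle
```

    Comparing the fitted rectangle area with the blob area gives a natural similarity criterion for the shape/model check.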

  • A model-driven method of estimating the state of clothes for manipulating it

    Page(s): 63 - 69

    Aiming at manipulating clothes, a model-driven method of estimating the state of hanging clothes is proposed. We suppose a system consisting of two manipulators and a camera. The task considered in this paper is to hold a pullover at its two shoulders by two manipulators respectively, as a first step for folding it. The proposed method estimates the state of the clothes held by one manipulator in a model-driven way and indicates the position to be held next by the other manipulator. First, the possible appearances of the pullover when it is held at one point are roughly predicted. Using discriminative features of the predicted appearances, the possible states for the observed appearance are selected. Each appearance of the possible state is partially deformed so as to get close to the observed appearance. The state whose appearance successfully approaches closest to the observed appearance is selected as the final decision. The point to be held next is determined according to the state. The results of preliminary experiments using actual images have shown the good potential of the proposed method.

  • A kernel logit approach for face and non-face classification

    Page(s): 100 - 104

    This paper introduces a kernel logit approach for face and non-face classification. The approach is based on the combined use of the multinomial logit model (MLM) and "kernel feature compound vectors." The MLM is one of the neural network models for multiclass pattern classification, and is expected to equal or exceed linear classification methods in classification performance. The "kernel feature compound vectors" are compound feature vectors of geometric image features and kernel features. Evaluation and comparison experiments were conducted using face and non-face images (face: training 100, cross-validation 300, test 325; non-face: training 200, cross-validation 1000, test 1000) gathered from the available face databases and other sources. The experimental result obtained by the proposed method was better than the results obtained by Support Vector Machines (SVM) and Kernel Fisher Discriminant Analysis (KFDA).

  • Genetic feature subset selection for gender classification: a comparison study

    Page(s): 165 - 170

    We consider the problem of gender classification from frontal facial images using genetic feature subset selection. We argue that feature selection is an important issue in gender classification and demonstrate that Genetic Algorithms (GA) can select good subsets of features (i.e., features that encode mostly gender information), reducing the classification error. First, Principal Component Analysis (PCA) is used to represent each image as a feature vector (i.e., eigen-features) in a low-dimensional space. Genetic Algorithms (GAs) are then employed to select a subset of features from the low-dimensional representation by disregarding certain eigenvectors that do not seem to encode important gender information. Four different classifiers were compared in this study using genetic feature subset selection: a Bayes classifier, a Neural Network (NN) classifier, a Support Vector Machine (SVM) classifier, and a classifier based on Linear Discriminant Analysis (LDA). Our experimental results show a significant error rate reduction in all cases. The best performance was obtained using the SVM classifier. Using only 8.4% of the features in the complete set, the SVM classifier achieved an error rate of 4.7% from an average error rate of 8.9% using manually selected features.
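
    The GA search over eigen-feature subsets can be sketched as below: individuals are bitmasks over the features, evolved with tournament selection, one-point crossover, and bit-flip mutation. The operators, rates, and toy fitness are invented for the example; the paper's fitness would be cross-validated classifier error on the selected eigen-features.

```python
import random

def ga_select(n_feat, fitness, pop=20, gens=30, pmut=0.05, seed=0):
    # tiny genetic algorithm over feature-subset bitmasks
    rng = random.Random(seed)
    popn = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop)]
    for _ in range(gens):
        scored = sorted(popn, key=fitness, reverse=True)
        nxt = scored[:2]                          # elitism: keep the best two
        while len(nxt) < pop:
            # tournament selection of two parents
            a, b = (max(rng.sample(scored, 3), key=fitness) for _ in range(2))
            cut = rng.randrange(1, n_feat)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < pmut) for bit in child]  # mutation
            nxt.append(child)
        popn = nxt
    return max(popn, key=fitness)
```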

  • Fast and robust planar registration by early consensus with applications to document stitching

    Page(s): 245 - 250

    This paper presents a fast and extremely robust feature-based method for planar registration of partly overlapping images that uses a two-stage robust fitting approach comprising a fast estimation of a transformation hypothesis (that we show is highly likely to be correct) followed by a confirmation and refinement stage. The method is particularly suited for automatic stitching of oversize documents scanned in two or more parts. We show simulations, also supported by practical experiments, that prove both the robustness and computational efficiency of the approach.
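
    The two-stage hypothesise/confirm idea can be sketched for the simplest planar case, a pure translation: a single correspondence yields a hypothesis, which is accepted as soon as enough points agree and then refined on the inliers. This RANSAC-style sketch is an illustration under assumed details, not the authors' estimator or transformation model.

```python
import numpy as np

def register_translation(pts_a, pts_b, tol=1.0, min_frac=0.5, seed=0):
    # stage 1: hypothesise a translation from one correspondence and stop
    # at the first hypothesis reaching consensus; stage 2: refine on inliers
    rng = np.random.default_rng(seed)
    n = len(pts_a)
    for i in rng.permutation(n):
        t = pts_b[i] - pts_a[i]                  # hypothesis from one match
        resid = np.linalg.norm(pts_a + t - pts_b, axis=1)
        inliers = resid < tol
        if inliers.sum() >= min_frac * n:        # early consensus reached
            return (pts_b[inliers] - pts_a[inliers]).mean(axis=0)  # refine
    return None
```

    Stopping at the first confirmed hypothesis is what makes the approach fast; the refinement over inliers restores accuracy.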

  • Pose estimation and integration for complete 3D model reconstruction

    Page(s): 143 - 147

    An automatic 3D model reconstruction technique is presented to acquire complete 3D models of real objects. The technique is based on novel approaches to pose estimation and integration. Two different poses of an object are used because a single pose often hides some surfaces from a range sensor. The presence of hidden surfaces makes the 3D model reconstructed from any single pose a partial model. Two such partial 3D models are reconstructed for two different poses of the object using a multi-view 3D modeling technique. The two partial 3D models are then registered. Coarse registration is facilitated by a novel pose estimation technique between the two models. The pose is estimated by matching a stable tangent plane (STP) of each pose model with the base tangent plane (BTP), which is invariant for a vision system. The partial models are then integrated into a complete 3D model based on the voxel classification defined in multi-view integration. Texture mapping is done to obtain a photo-realistic reconstruction of the object.
