
2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

Date: 23–28 June 2013


Displaying results 1–25 of 162
  • [Front cover]

    Page(s): i
  • [Title page iii]

    Page(s): iii
  • [Copyright notice]

    Page(s): iv
  • Table of contents

    Page(s): v - xviii
  • Real-Time Mobile Food Recognition System

    Page(s): 1 - 7

    We propose a mobile food recognition system whose purposes are estimating the calories and nutrition of foods and recording a user's eating habits. Since all the image recognition processing is performed on the smartphone itself, the system does not need to send images to a server and runs on an ordinary smartphone in real time. To recognize food items, the user first draws bounding boxes by touching the screen, and then the system starts food item recognition within the indicated bounding boxes. To recognize them more accurately, we segment each food item region by GrabCut, extract a color histogram and SURF-based bag-of-features, and finally classify it into one of fifty food categories with a linear SVM and a fast χ² kernel. In addition, the system estimates the direction in which a higher SVM output score is expected and shows it as an arrow on the screen, asking the user to move the smartphone camera accordingly. This recognition process is repeated about once a second. We implemented the system as an Android smartphone application that uses multiple CPU cores effectively for real-time recognition. In our experiments, we achieved an 81.55% classification rate for the top five category candidates when ground-truth bounding boxes are given. In addition, a user study produced positive evaluations compared with a food recording system without object recognition.

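    As an illustration of the classification stage described in the abstract, the sketch below trains an SVM on a precomputed χ² kernel over bag-of-features histograms. The feature matrices, labels, and gamma value are placeholders, and scikit-learn's exact chi2_kernel stands in for the paper's fast kernel; this is a hedged sketch, not the authors' implementation.

    ```python
    import numpy as np
    from sklearn.metrics.pairwise import chi2_kernel
    from sklearn.svm import SVC

    # Placeholder features: one normalized histogram row per food image
    # (color histogram + SURF bag-of-features concatenated), 50 categories.
    X_train = np.random.rand(500, 1500)
    y_train = np.random.randint(0, 50, 500)
    X_test = np.random.rand(10, 1500)

    # Precompute the chi-squared kernel and train an SVM on it.
    K_train = chi2_kernel(X_train, gamma=0.5)
    clf = SVC(kernel="precomputed").fit(K_train, y_train)

    # Rank categories for each test image, e.g. to report the top-5
    # candidates as in the paper's evaluation (indices into clf.classes_).
    K_test = chi2_kernel(X_test, X_train, gamma=0.5)
    scores = clf.decision_function(K_test)
    top5 = np.argsort(scores, axis=1)[:, -5:]
    ```
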
  • Style Finder: Fine-Grained Clothing Style Detection and Retrieval

    Page(s): 8 - 13

    With the rapid proliferation of smartphones and tablet computers, search has moved beyond text to other modalities like images and voice. For many applications, like fashion, visual search offers a compelling interface that can capture stylistic visual elements beyond color and pattern that are not easily described in text. However, extracting and matching such attributes remains an extremely challenging task due to the high variability and deformability of clothing items. In this paper, we propose a fine-grained learning model and multimedia retrieval framework to address this problem. First, an attribute vocabulary is constructed using human annotations obtained on a novel fine-grained clothing dataset. This vocabulary is then used to train a fine-grained visual recognition system for clothing styles. We report benchmark recognition and retrieval results on the Women's Fashion Coat Dataset and illustrate potential mobile applications for attribute-based multimedia retrieval of clothing items and image annotation.

  • Stereo Camera Tracking for Mobile Devices

    Page(s): 14 - 19

    We present our current work on a camera tracking algorithm designed for a mobile device equipped with a stereo camera. The tracker runs in real time on a prototype mobile platform and can be used as the core engine of augmented reality applications. To cope with the limited resources available, we design an algorithm that relies on the stereo camera only for the 3D reconstruction of points, while point tracking is performed on only one of the two images, reducing the computational effort. We show preliminary results in which the camera tracker has been validated in a realistic scenario and shown to be sufficiently robust for an augmented reality application.

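    The division of labor the abstract describes (stereo used only to reconstruct 3D points, tracking on a single image, pose from 2D-3D correspondences) can be sketched as follows, assuming rectified views, known projection matrices, and already-matched keypoints; the names and structure are illustrative, not the authors' code.

    ```python
    import cv2
    import numpy as np

    def triangulate_once(P_left, P_right, pts_left, pts_right):
        """Stereo is used only here: matched 2D keypoints -> 3D points (Nx3)."""
        X_h = cv2.triangulatePoints(P_left, P_right, pts_left.T, pts_right.T)
        return (X_h[:3] / X_h[3]).T

    def track_and_pose(prev_gray, cur_gray, pts2d, pts3d, K):
        """Per-frame work: Lucas-Kanade tracking on the left image only,
        then camera pose from 2D-3D correspondences via PnP."""
        p0 = pts2d.reshape(-1, 1, 2).astype(np.float32)
        p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None)
        ok = status.ravel() == 1
        _, rvec, tvec = cv2.solvePnP(pts3d[ok], p1[ok], K, None)
        return rvec, tvec, p1[ok].reshape(-1, 2), pts3d[ok]
    ```
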
  • Towards Auto-calibration of Smart Phones Using Orientation Sensors

    Page(s): 20 - 26

    In this paper, we address the problem of auto-calibration of cameras that can rotate freely and change focal length, and we present an algorithm for finding the intrinsic parameters using only two images. We utilize the orientation sensors found on many modern smartphones to help decompose the infinite homography into two equivalent upper triangular matrices that depend only on the intrinsic parameters. We account for small translations between views by calculating the homography from correspondences on objects that are far from the camera. We show results on both real and synthetic data, and quantify the tolerance of our system to small translations and errors in the orientation sensors. Our results are comparable to other recent auto-calibration work while requiring only two images and tolerating some translation.

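    The decomposition rests on the relation H∞ = K2·R·K1⁻¹, which rearranges to K2 = H∞·K1·Rᵀ: with R supplied by the orientation sensors, forcing the below-diagonal entries of H∞·K1·Rᵀ to vanish yields linear constraints on the intrinsics. The sketch below solves the simplest version, assuming square pixels, zero skew, and a known principal point; this simplification is mine and not necessarily the paper's exact parameterization.

    ```python
    import numpy as np

    def intrinsics_from_sensor_rotation(H, R, cx, cy):
        """Given the infinite homography H ~ K2 R K1^-1 and a sensor rotation R,
        recover focal lengths by forcing K2 = H K1 R^T to be upper triangular."""
        D = np.diag([1.0, 1.0, 0.0])                         # f1-dependent part of K1
        E = np.array([[0, 0, cx], [0, 0, cy], [0, 0, 1.0]])  # fixed part of K1
        A, B = H @ D @ R.T, H @ E @ R.T                      # K2 = f1*A + B
        below = [(1, 0), (2, 0), (2, 1)]                     # entries that must vanish
        a = np.array([A[i] for i in below])
        b = np.array([B[i] for i in below])
        f1 = -(a @ b) / (a @ a)                              # 1-D least squares
        K1 = f1 * D + E
        K2 = H @ K1 @ R.T
        return K1, K2 / K2[2, 2]
    ```
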
  • Detection of Moving Objects with Non-stationary Cameras in 5.8ms: Bringing Motion Detection to Your Mobile Device

    Page(s): 27 - 34

    Detecting moving objects from a mobile camera in real time is a challenging problem due to computational limits and the motion of the camera. In this paper, we propose a method for moving object detection on non-stationary cameras that runs within 5.8 milliseconds (ms) on a PC and in real time on mobile devices. To achieve real-time capability with satisfactory performance, the proposed method models the background with a dual-mode single Gaussian model (SGM) with age, and compensates for the motion of the camera by mixing neighboring models. Modeling with a dual-mode SGM prevents the background model from being contaminated by foreground pixels while still allowing it to adapt to changes in the background. Mixing neighboring models reduces the errors arising from motion compensation, and their influence is further reduced by keeping the age of the model. To decrease the computational load, the proposed method applies one dual-mode SGM to multiple pixels without performance degradation. Experimental results show the computational lightness and real-time capability of our method on a smartphone, with robust detection performance.

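    The dual-mode SGM with age can be reconstructed from the abstract roughly as follows: an "apparent" Gaussian performs the detection while a "candidate" Gaussian absorbs non-matching observations, and the candidate is promoted once it has been observed longer. This is a hedged per-pixel sketch; the paper's motion compensation by mixing neighboring models, and its sharing of one model across multiple pixels, are omitted.

    ```python
    import numpy as np

    class DualModeSGM:
        """Per-pixel dual-mode single Gaussian background model with age.
        Model 0 ("apparent") detects; model 1 ("candidate") absorbs outliers
        so that foreground pixels cannot contaminate the background model."""

        def __init__(self, shape, init_var=400.0, gate=9.0):
            self.init_var = init_var
            self.gate = gate                       # squared-Mahalanobis threshold
            self.mu = [np.zeros(shape), np.zeros(shape)]
            self.var = [np.full(shape, init_var), np.full(shape, init_var)]
            self.age = [np.zeros(shape), np.zeros(shape)]

        def apply(self, frame):
            f = frame.astype(np.float64)
            fits = (f - self.mu[0]) ** 2 < self.gate * self.var[0]
            for m, sel in ((0, fits), (1, ~fits)):
                alpha = 1.0 / (self.age[m] + 1.0)  # age-weighted running average
                mu = (1 - alpha) * self.mu[m] + alpha * f
                var = (1 - alpha) * self.var[m] + alpha * (f - mu) ** 2
                self.mu[m] = np.where(sel, mu, self.mu[m])
                self.var[m] = np.where(sel, var, self.var[m])
                self.age[m] = self.age[m] + sel
            # Promote the candidate where it has been observed longer.
            swap = self.age[1] > self.age[0]
            for arr in (self.mu, self.var, self.age):
                arr[0] = np.where(swap, arr[1], arr[0])
            self.var[1] = np.where(swap, self.init_var, self.var[1])
            self.age[1] = np.where(swap, 0.0, self.age[1])
            return ~fits                           # foreground mask
    ```
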
  • Mobile Video Capture of Multi-page Documents

    Page(s): 35 - 40

    This paper presents a mobile application for capturing images of printed multi-page documents with a smartphone camera. With today's document capture applications, the user has to carefully capture individual photographs of each page and assemble them into a document, a cumbersome and time-consuming experience. We propose a novel approach that uses video to capture multi-page documents. Our algorithm automatically selects the best still images corresponding to individual pages of the document from the video. The technique combines video motion analysis, inertial sensor signals, and an image quality (IQ) prediction technique to select the best page images from the video. For the latter, we extend a previous no-reference IQ prediction algorithm to suit the needs of our video application. The algorithm has been implemented on an iPhone 4S. Individual pages are successfully extracted for a wide variety of multi-page documents. OCR analysis shows that the quality of document images produced by our app is comparable to that of standard still captures. At the same time, user studies confirm that in the majority of trials, video capture provides an experience that is faster and more convenient than multiple still captures.

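    The frame-selection idea can be approximated with a generic no-reference sharpness score; the sketch below uses variance of the Laplacian as a stand-in for the paper's extended IQ predictor, and assumes the motion analysis has already grouped frames into per-page dwell segments.

    ```python
    import cv2

    def sharpness(frame):
        """Generic no-reference sharpness proxy: variance of the Laplacian."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return cv2.Laplacian(gray, cv2.CV_64F).var()

    def best_frame_per_page(page_segments):
        """page_segments: one list of candidate frames per detected page."""
        return [max(frames, key=sharpness) for frames in page_segments]
    ```
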
  • Collision Detection for Visually Impaired from a Body-Mounted Camera

    Page(s): 41 - 47

    A real-time collision detection system using a body-mounted camera is developed for visually impaired and blind people. The system computes sparse optical flow in the acquired videos, compensates for camera self-rotation using an external gyro sensor, and estimates collision risk in local image regions from the motion estimates. Experimental results for a variety of scenarios involving static and dynamic obstacles are reported in terms of time-to-collision and obstacle localization in test videos. The proposed approach successfully estimates collision risk for head-on obstacles as well as obstacles close to the user's walking path. An end-to-end collision warning system based on inputs from a video camera and a gyro sensor has been implemented on a generic laptop and on an embedded OMAP-3-compatible platform. The proposed embedded system is a valuable step toward a portable vision aid for visually impaired and blind patients.

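    Once rotation has been removed with the gyro, time-to-collision for a forward-moving camera can be read off the radial optical flow: a point at distance r from the focus of expansion moving outward at v pixels per frame has TTC ≈ r / v. A minimal sketch, assuming the focus of expansion and rotation-compensated flow are already available:

    ```python
    import numpy as np

    def time_to_collision(pts, flow, foe):
        """TTC (in frames) from rotation-compensated sparse optical flow.
        pts: Nx2 positions, flow: Nx2 flow vectors, foe: focus of expansion."""
        r_vec = pts - foe
        r = np.linalg.norm(r_vec, axis=1)
        v = np.sum(flow * (r_vec / r[:, None]), axis=1)  # radial flow component
        expanding = v > 1e-3                             # ignore receding points
        return np.median(r[expanding] / v[expanding])
    ```
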
  • Video Demo: An Egocentric Vision Based Assistive Co-robot

    Page(s): 48 - 49

    We present a video demo of a prototype egocentric vision based assistive co-robot system. In this co-robot system, the user wears a pair of glasses with a forward-looking camera and is actively engaged in the robot's control loop during navigational tasks. The egocentric vision glasses serve two purposes. First, they provide the visual input used to request that the robot find a certain object in the environment. Second, the motion patterns computed from the egocentric video for a specific set of head movements are exploited to guide the robot to the object. This is especially helpful for quadriplegic individuals who lack the hand functionality needed for other control modalities (e.g., a joystick). In our co-robot system, when the robot does not complete the object-finding task within a pre-specified time window, it actively solicits user control for guidance. The user can then use the egocentric vision based gesture interface to orient the robot toward the direction of the object, after which the robot automatically navigates toward the object until it finds it. Our experiments validated the efficacy of the closed-loop design in engaging the human in the loop.

  • Mobile Exergames - Burn Calories While Playing Games on a Smartphone

    Page(s): 50 - 51

    Exergames combine exercise with game play by requiring users to perform some kind of physical activity in order to score points in the game. In this paper, we present a novel mobile exergaming framework that requires users to physically move and jump in order to score points in a game played on a smartphone. Our system uses a custom-designed exercising pad (called the ExerPad) to track the user's physical movement and then automatically updates the corresponding game character's position on the screen. The ExerPad contains differently shaped images, which are captured by the smartphone's built-in camera and automatically detected by our shape detection algorithm. We also use the smartphone's built-in accelerometer and gyroscope to detect other physical movements such as jumping and turning. The experimental results show that the proposed mobile exergame helps its users burn calories and have fun at the same time.

  • A Mobile Vision System for Fast and Accurate Ellipse Detection

    Page(s): 52 - 53

    Many papers have addressed ellipse detection as a first step in various computer vision applications, but most of the proposed solutions are too slow to be applied in real time to large images or with limited hardware resources, as in the case of mobile devices. This demo is based on a novel algorithm for fast and accurate ellipse detection. The proposed algorithm relies on a careful selection of arcs that are candidates to form ellipses, and on the use of the Hough transform to estimate parameters in a decomposed space. The demo will show the algorithm running on a commercial smartphone.

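    For contrast with the paper's approach (arc selection plus a Hough transform in a decomposed parameter space, whose details are not reproduced here), a common fast baseline is direct least-squares ellipse fitting on edge contours:

    ```python
    import cv2

    def baseline_ellipses(gray, min_contour_len=20):
        """Simple baseline, not the paper's method: Canny edges, contours,
        then a least-squares ellipse fit on each sufficiently long contour."""
        edges = cv2.Canny(gray, 50, 150)
        contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
        return [cv2.fitEllipse(c) for c in contours if len(c) >= min_contour_len]
    ```
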
  • Stabilization of Magnified Videos on a Mobile Device for Visually Impaired

    Page(s): 54 - 55

    The camera function in smartphones has great potential to help visually impaired people discern scene details. Many apps have been developed to turn a smartphone into a handy video magnifier using digital zoom. As digital zoom normally cannot provide sufficient magnification for distant objects, optical telescopic attachments for smartphone cameras can be used to increase magnification power. However, the image jitter of hand-held phone cameras is greatly enlarged as well, and can impair patients' reading efficiency, especially for distant objects. Gyro-sensor-based stabilization methods are not effective for this application due to the limited precision of the gyro sensors in common mobile devices. We have implemented an image-motion-based video stabilization method on iOS that is more sensitive than the gyro sensor. The image-motion-based stabilization not only appeared to remove most of the image jitter visually, it also improved the performance of human subjects, including two visually impaired participants, at discerning distant details.

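    An image-motion-based stabilizer of the kind described can be sketched with phase correlation: measure the inter-frame translation directly from pixels (finer than a coarse gyro reading) and warp each frame to cancel it. The trajectory smoothing a real magnifier app would need to preserve intentional panning is omitted; this is an assumption-laden sketch, not the authors' iOS implementation.

    ```python
    import cv2
    import numpy as np

    def stabilize(frames):
        """Cancel accumulated inter-frame translation measured by phase correlation."""
        prev = np.float32(cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY))
        shift = np.zeros(2)
        out = [frames[0]]
        for frame in frames[1:]:
            cur = np.float32(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
            (dx, dy), _ = cv2.phaseCorrelate(prev, cur)  # sub-pixel translation
            shift += (dx, dy)
            M = np.float32([[1, 0, -shift[0]], [0, 1, -shift[1]]])
            h, w = frame.shape[:2]
            out.append(cv2.warpAffine(frame, M, (w, h)))
            prev = cur
        return out
    ```
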
  • An Augmented Linear Discriminant Analysis Approach for Identifying Identical Twins with the Aid of Facial Asymmetry Features

    Page(s): 56 - 63

    In this work, we propose an Augmented Linear Discriminant Analysis (ALDA) approach for identifying identical twins. It learns a common subspace that can identify not only which family an individual comes from but also distinguish between individuals within the same family. We evaluate ALDA against the traditional LDA approach for subspace learning on the Notre Dame twin database. We show that the proposed ALDA method, with the aid of facial asymmetry features, significantly outperforms other well-established facial descriptors (LBP, LTP, LTrP), and that the ALDA subspace does a much better job of distinguishing identical twins than LDA. We achieve a 48.50% VR at 0.1% FAR for identifying the family membership of identical twin individuals in a crowd and an average 82.58% VR at 0.1% FAR for verifying identical twin individuals within the same family, a significant improvement over traditional descriptors and the traditional LDA method.

  • Continuous 3D Face Authentication Using RGB-D Cameras

    Page(s): 64 - 69

    We present a continuous 3D face authentication system that uses an RGB-D camera to monitor the accessing user and ensure that only the authorized user operates a protected system. To the best of our knowledge, this is the first system that uses 3D face images for this purpose. By using depth images, we reduce the amount of user cooperation required by previous continuous authentication work in the literature. We evaluated our system on four 40-minute videos with variations in facial expression, occlusion, and pose, and achieved an equal error rate of 0.8%.

  • Fixation and Saccade Based Face Recognition from Single Image per Person with Various Occlusions and Expressions

    Page(s): 70 - 75

    Face recognition has been widely used in real-world applications over the past decade. Unlike other biometric traits such as fingerprints and irises, the face is the natural cue humans use to recognize a person, even one they have met only once. In this paper, we propose a novel method that simulates the mechanism of fixations and saccades in human visual perception to handle face recognition from a single image per person. Our method is robust to local deformations of the face (i.e., expression changes and occlusions). Especially for occlusion-related problems, which have received less attention than other challenging variations in illumination, expression, and pose, our method significantly outperforms state-of-the-art approaches across various types of occlusion. Experimental results on the FRGC and AR databases confirm the effectiveness of our method.

  • Issues in Rotational (Non-)invariance and Image Preprocessing

    Page(s): 76 - 83

    This paper addresses two problems that have been largely overlooked in the literature. First, many systems seek to use, and algorithms claim to provide, rotational invariance, for features such as fingerprint minutiae or SIFT/SURF features. We introduce a statistical test for rotational independence, using lossless rotations to show that the differences are statistically significant and cannot be attributed to image noise. We use this test to show experimentally that fingerprint feature extractors fail to be rotation independent. We show that the popular "rotation invariant" SURF and SIFT feature extractors, used in both biometrics and general vision, also fail the rotation independence test. We then introduce a match-twist-match (MTM) paradigm and experimentally demonstrate that, by reducing the effective angular difference between probe and gallery, we can improve system matching performance. Our analysis, using the FVC2002 and FVC2004 datasets, further shows that differences in extracted features impact the overall system performance of both fingerprint matchers tested. Using the MTM approach, we reduce our secure template system's errors by 10%-20%, helping us to define the current state of the art in the FVC-OnGoing secure template competition with an EER of 1.698%. We end by bringing to the forefront the growing danger of sensors over-preprocessing images, and show examples of the problems that can arise with preprocessing. As our rotation experiments showed, the impact of even modest numbers of feature errors suggests these preprocessing issues are likely very significant. We suggest the need for policy guidelines that require disclosure of the preprocessing steps used, and for the development of standards for testing the impact of preprocessing.

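    The core of such a test can be sketched with a lossless 90° rotation: a truly rotation-independent extractor would find the same keypoints on an image and on its rotated copy, so per-image differences should be indistinguishable from zero. A toy version for SIFT keypoint counts follows; the paper's actual protocol and statistics are richer.

    ```python
    import cv2
    from scipy import stats

    def rotation_independence_test(gray_images):
        """One-sample t-test on keypoint-count differences under a lossless
        90-degree rotation; rotation independence predicts a mean of zero."""
        sift = cv2.SIFT_create()
        diffs = []
        for img in gray_images:                   # 8-bit grayscale images
            rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)  # lossless
            diffs.append(len(sift.detect(img, None)) -
                         len(sift.detect(rotated, None)))
        return stats.ttest_1samp(diffs, popmean=0.0)
    ```
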
  • A New Metric for Latent Fingerprint Image Preprocessing

    Page(s): 84 - 91

    We propose a new image-based metric and explore its utility as a quality diagnostic for fingerprint image preprocessing. Due to the low quality of latent fingerprint images, preprocessing is a common step in the forensic analysis workflow and is critical to the success of fingerprint identification. Whereas fingerprint analysis is a well-studied field with a deep history, forensic image preprocessing is a relatively new domain in need of research, development of analysis, and best-practice guidance. Our new metric is based on an extension of Spectral Image Validation and Verification (SIVV), which was originally developed to differentiate ten-print or rolled fingerprint images from non-fingerprint images such as face or iris images. Several modifications are required to extend SIVV analysis to the latent space. We propose, implement, and test this new SIVV-based metric to measure latent fingerprint image quality and the effectiveness of the forensic latent fingerprint preprocessing step. Preliminary results show that the new metric can provide positive indications of both latent fingerprint image quality and the effectiveness of fingerprint preprocessing.

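    At the heart of SIVV is the reduction of the 2D power spectrum to a 1D radial signature, in which fingerprint ridge energy appears as a distinct mid-frequency peak; the latent metric builds on this signature. A compact sketch of the signature itself (NIST's implementation and the paper's extensions differ in the details):

    ```python
    import numpy as np

    def radial_spectrum(img, bins=128):
        """1D SIVV-style signature: annular average of the 2D power spectrum.
        A strong mid-frequency peak indicates fingerprint ridge energy."""
        f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
        power = np.abs(f) ** 2
        h, w = img.shape
        y, x = np.indices((h, w))
        r = np.hypot(y - h / 2.0, x - w / 2.0)
        r_bin = (r / r.max() * (bins - 1)).astype(int)
        sums = np.bincount(r_bin.ravel(), weights=power.ravel(), minlength=bins)
        counts = np.bincount(r_bin.ravel(), minlength=bins)
        return sums / np.maximum(counts, 1)
    ```
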
  • Minutiae-Based Matching State Model for Combinations in Fingerprint Matching System

    Page(s): 92 - 97

    In this paper we investigate the question of combining multi-sample matching results obtained during repeated attempts at fingerprint-based authentication. To utilize the information from multiple input templates most efficiently, we propose a minutiae-based matching state model that uses the relationships between the test templates and the enrolled template. The principle of the algorithm is that the matching parameters, i.e., the sets of matched minutiae between these templates, should be consistent in genuine matches. Experiments are performed on the FVC2002 fingerprint databases. Results show that a system using the proposed matching state model outperforms the original system based on raw matching scores. Likelihood ratios and a multilayer perceptron are used as combination methods.

  • Anti-spoofing in Action: Joint Operation with a Verification System

    Page(s): 98 - 104

    Besides the recognition task, today's biometric systems need to cope with an additional problem: spoofing attacks. To date, academic research has treated spoofing as a binary classification problem: systems are trained to discriminate between real accesses and attacks. However, spoofing countermeasures are not designed to operate stand-alone, but as part of the recognition system they protect. In this paper, we study techniques for decision-level and score-level fusion to integrate recognition and anti-spoofing systems, using an open-source framework that handles the ternary classification problem (clients, impostors, and attacks) transparently. By doing so, we are able to report the impact of different spoofing countermeasures, fusion techniques, and thresholding on the overall performance of the final recognition system. For a specific use case covering face verification, experiments show to what extent simple fusion improves the trustworthiness of the system when exposed to spoofing attacks.

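    Decision-level fusion of a verifier with a countermeasure can be as simple as an AND rule over two thresholds, and score-level fusion as a convex combination of calibrated scores. A minimal sketch of both; the thresholds, weights, and score scales are assumptions rather than values from the paper:

    ```python
    import numpy as np

    def decision_fusion(verif, spoof, t_verif, t_spoof):
        """AND rule: accept only samples that look both genuine and live."""
        return (np.asarray(verif) >= t_verif) & (np.asarray(spoof) >= t_spoof)

    def score_fusion(verif, spoof, w=0.5):
        """Convex combination of (already calibrated) scores; a single threshold
        on the fused score then separates clients from impostors and attacks."""
        return w * np.asarray(verif) + (1.0 - w) * np.asarray(spoof)
    ```
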
  • Computationally Efficient Face Spoofing Detection with Motion Magnification

    Page(s): 105 - 110

    For a robust face biometric system, a reliable anti-spoofing approach must be deployed to circumvent print and replay attacks. Several techniques have been proposed to counter face spoofing; however, a robust solution that is computationally efficient is still unavailable. This paper presents a new approach to spoofing detection in face videos using motion magnification. Eulerian motion magnification is used to enhance the facial expressions commonly exhibited by subjects in a captured video. Next, two types of feature extraction algorithms are proposed: (i) a configuration of LBP that provides improved performance compared to other, computationally expensive texture-based approaches, and (ii) a motion estimation approach using the HOOF descriptor. On the Print Attack and Replay Attack spoofing datasets, the proposed framework improves on state-of-the-art performance; in particular, the HOOF descriptor yields near-perfect half total error rates of 0% and 1.25%, respectively.

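    The texture branch can be illustrated with a uniform LBP histogram over the motion-magnified face crop, fed to a classifier such as an SVM; the specific LBP configuration the paper found efficient is not reproduced here.

    ```python
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_histogram(face_gray, P=8, R=1):
        """Uniform LBP histogram over a grayscale face crop; with
        method='uniform' the codes take values 0..P+1 (P+2 bins)."""
        codes = local_binary_pattern(face_gray, P, R, method="uniform")
        hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
        return hist
    ```
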
  • Shape and Texture Based Countermeasure to Protect Face Recognition Systems against Mask Attacks

    Page(s): 111 - 116

    Photographs, videos, or masks can be used to spoof face recognition systems. In this paper, a countermeasure is proposed to protect face recognition systems against 3D mask attacks. The lack of studies on countermeasures against mask attacks is mainly due to the unavailability of public databases dedicated to mask attacks. In this study, a 2D+3D mask attack database prepared for a research project in which the authors are all involved is used. The proposed countermeasure is based on the fusion of information extracted from both the texture and the depth images in the mask database, and it provides satisfactory results for protecting recognition systems against mask attacks. Another contribution of this study is that the countermeasure is integrated into the selected baseline systems for 2D and 3D face recognition, which makes it possible to analyze the performance of the systems with and without attacks, and with and without the countermeasure.

  • What Is a "Good" Periocular Region for Recognition?

    Page(s): 117 - 124

    In challenging image acquisition settings, where the performance of iris recognition algorithms degrades due to poor segmentation of the iris, image blur, specular reflections, and occlusions from eyelids and eyelashes, the periocular region has been shown to offer better recognition rates. However, the definition of the periocular region is subject to interpretation. This paper investigates what the best periocular region for recognition is by identifying sub-regions of the ocular image for both near-infrared (NIR) and visible light (VL) sensors. To determine the best periocular region, we test two fundamentally different algorithms on challenging periocular datasets of contrasting build, over four different periocular regions. Our results indicate that system performance does not necessarily improve as the ocular region becomes larger. Rather, in NIR images the eye shape is more important than the brow or cheek, as the image has little to no skin texture (leading to a smaller accepted region), while in VL images the brow is very important (requiring a larger region).
