
Applied Imagery Pattern Recognition Workshop, 2008. AIPR '08. 37th IEEE

Date 15-17 Oct. 2008


Displaying Results 1 - 25 of 44
  • Boosted multi image features for improved face detection

    Page(s): 1 - 8
    PDF (759 KB)

    In this paper, we present novel approaches for automatically detecting human faces in images, a task that is critical to any face recognition system. This paper expands on the traditional Viola-Jones approach by boosting a plethora of mixed feature sets for face detection; we do this by adding non-Haar-like elements to a large pool of mixed features in an AdaBoost framework. We show how to generate discriminative support vector machine (SVM) type features and Gabor-type features (at various orientations, frequencies, and central locations) and use this whole pool as candidate discriminative feature sets in modeling the patterns of a frontal-view human face. This general, high-diversity pool of features is used to build a boosted strong classifier; we show that this improves the generalization performance of the AdaBoost approach and, as a result, the robustness of the face detector. We report performance on the MIT+CMU face database and compare the results with other published face detection algorithms. We also discuss processing times and speed-up methods that offset the increase in complexity so that face detection can be achieved in real time.
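The mixed-feature boosting described above can be sketched as a minimal AdaBoost over decision stumps; here the feature columns are generic stand-ins for the Haar-like, Gabor, and SVM-type responses in the paper's pool (an illustrative reimplementation, not the authors' code).

```python
import numpy as np

def train_adaboost_stumps(X, y, n_rounds=10):
    """Minimal AdaBoost with decision stumps over a pooled feature matrix.

    X : (n_samples, n_features) responses of heterogeneous features
        (e.g. Haar-like, Gabor, and SVM-projection scores stacked column-wise).
    y : labels in {-1, +1}.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)            # sample weights
    learners = []                       # (feature, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol)
        err, j, thr, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = pol * np.where(X[:, j] > thr, 1, -1)
        w *= np.exp(-alpha * y * pred)  # upweight misclassified samples
        w /= w.sum()
        learners.append((j, thr, pol, alpha))
    return learners

def predict_adaboost(learners, X):
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in learners:
        score += alpha * pol * np.where(X[:, j] > thr, 1, -1)
    return np.sign(score)
```

Each round picks the single pooled feature whose threshold best separates the reweighted samples, which is how heterogeneous feature types compete inside one boosting framework.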
  • Tear-duct detector for identifying left versus right iris images

    Page(s): 1 - 4
    PDF (552 KB)

    In this paper, we present different pattern recognition approaches for automatically detecting tear ducts in eye images acquired for iris recognition, both to enhance iris recognition and to detect mislabeling in datasets. Detecting the tear duct in an image tells an iris recognition system whether the presented eye image is of a left or a right eye. This enables the iris matcher to match the enrolled image against images in the database belonging to the same side, reducing error rates by eliminating the chance of matching a left iris to a right iris or vice versa. This is a major problem in many single-iris image acquisition devices currently deployed in the field, where the recorded data are mislabeled due to human error. We present several techniques for detecting tear ducts, including boosted Haar features, support vector machines (SVM), and more traditional approaches such as PCA and LDA. Finally, we show that tear-duct detection improves left/right iris classification over previous approaches.
  • Classification of indecent videos by low complexity repetitive motion detection

    Page(s): 1 - 7
    PDF (433 KB)

    This paper proposes a fast method for detection of indecent video content using repetitive motion analysis. Unlike skin detection, motion provides invariant features irrespective of race and color. The video material to be evaluated is divided into short fixed-length sections. By filtering different combinations of B-frame motion vectors using adjacency in time and space, one dominant motion vector is constructed for each frame. The power spectral density estimate of this dominant motion vector is then computed using a periodogram with a Hamming window. The resulting power spectrum is then subjected to a Slepian selection window to restrict the spectrum to a limited frequency range typical of indecent movement, as empirically derived by us. A threshold detector is then applied to detect repetitive motion in video sections. However, there are instances where repetitive motion occurs in these shorter sections without the video as a whole being indecent. As a second step, an additional detector can be employed to determine whether the sections over a longer period of time can be classified as containing indecent material. The proposed method is resource efficient and does not require the typical IDCT step of video decoding. Further, the computationally expensive spectral estimation calculations are done using only one value per frame. Evaluations performed using a restricted set of videos show promising results, with a high true positive probability (>85%) at a low false positive probability (<10%) for the repetitive motion detection.
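The spectral step above can be sketched as follows: window the per-frame dominant-motion signal with a Hamming window, take a periodogram, and measure the power fraction inside a band. The band limits below are placeholders; the paper's empirically derived frequency range is not given here.

```python
import numpy as np

def repetitive_motion_score(dominant_motion, fs, band=(0.5, 4.0)):
    """Fraction of signal power inside a frequency band.

    dominant_motion : 1-D array, one dominant-motion value per frame.
    fs   : frame rate (frames/s).
    band : (low, high) in Hz, an assumed range for repetitive movement.
    """
    x = np.asarray(dominant_motion, dtype=float)
    x = x - x.mean()                       # drop the DC component
    w = np.hamming(len(x))                 # Hamming window before the periodogram
    psd = np.abs(np.fft.rfft(x * w)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    total = psd.sum()
    return psd[in_band].sum() / total if total > 0 else 0.0
```

A threshold on this score plays the role of the paper's repetitive-motion detector: a strongly periodic motion signal concentrates its power in the band, while aperiodic motion does not.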
  • An edge detection technique in images

    Page(s): 1 - 5
    PDF (710 KB)

    A novel edge detection method based on a neighbourhood similarity criterion is presented in this paper. In this algorithm, the pixels in the original image that have the minimum number of similar pixels among their neighbouring pixels in the filtering window are labelled as edge pixels. Simulation results demonstrate that this approach performs well on noise-free images and is superior to other methods on images corrupted by additive white Gaussian noise (AWGN). Moreover, the algorithm is fast and has low computational complexity.
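A direct (unoptimized) sketch of the criterion: count how many of a pixel's eight neighbours lie within a grey-level tolerance of it, and flag the pixel as an edge when too few do. Both parameter values are illustrative; the paper's settings are not reproduced here.

```python
import numpy as np

def similarity_edge_map(img, thresh=10, min_similar=3):
    """Label a pixel as an edge when fewer than `min_similar` of its 8
    neighbours are within `thresh` grey levels of it."""
    img = np.asarray(img, dtype=int)
    h, w = img.shape
    edges = np.zeros((h, w), dtype=bool)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            window = img[i - 1:i + 2, j - 1:j + 2]
            similar = np.abs(window - img[i, j]) <= thresh
            # subtract 1: the centre pixel is always similar to itself
            if similar.sum() - 1 < min_similar:
                edges[i, j] = True
    return edges
```

Pixels in flat regions have many similar neighbours and are left alone; pixels straddling an intensity step lose roughly half their similar neighbours and get flagged.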
  • Bandwidth efficient sensor architectures with feature extraction

    Page(s): 1 - 5
    PDF (1001 KB)

    We report on processing techniques to effectively control the data bandwidth in larger-format focal plane array (FPA) sensors. We have developed an image processing architecture for foveating, variable-acuity FPAs that gives a controlled reduction in the data rate via simple circuits that estimate activity on the FPA image plane. The goal of integrated on-FPA signal processing is to perform pre-processing that is usually done downstream in a dedicated processing module. The image pre-processing techniques described in this paper allow transmitting "active" pixel data while skipping unchanging pixels. Performing this pre-processing adjacent to the FPA allows significant reductions in data rate, size, weight, and power for small, low-cost systems that cannot accommodate a large image processor.
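The "transmit active pixels, skip unchanging ones" idea reduces to change detection against the last transmitted frame. A software sketch of the data-rate reduction (in the paper this happens in simple circuits on the FPA, not in software):

```python
import numpy as np

def active_pixels(prev_frame, cur_frame, delta=8):
    """Select pixels whose change since the previous frame exceeds `delta`
    grey levels; only these (index, value) pairs need be sent off the FPA."""
    prev_frame = np.asarray(prev_frame, dtype=int)
    cur_frame = np.asarray(cur_frame, dtype=int)
    mask = np.abs(cur_frame - prev_frame) > delta
    idx = np.flatnonzero(mask)             # flat indices of changed pixels
    return idx, cur_frame.ravel()[idx]     # transmit only these pairs

def reconstruct(prev_frame, idx, values):
    """Receiver side: patch the changed pixels into the held frame."""
    out = np.asarray(prev_frame, dtype=int).ravel().copy()
    out[idx] = values
    return out.reshape(np.shape(prev_frame))
```

When only a small fraction of the scene changes per frame, the transmitted index/value pairs are far smaller than the full frame, which is the bandwidth saving claimed above.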
  • A robust segmentation approach to iris recognition based on video

    Page(s): 1 - 8
    PDF (2456 KB)

    One of the key problems of conventional iris recognition methods is that they are based on processing a single iris image and require good image quality as an essential condition. These requisites entail considerable constraints on users when taking iris images. Video-based iris recognition can provide convenience and time efficiency to subjects, with undemanding restraints during iris acquisition. These videos, usually taken of moving subjects at a distance, are convenient for the user but introduce unexpected noise sources into the iris images, impacting the verification accuracy of iris recognition. To address this dilemma, this study introduces a new segmentation approach for video-based iris recognition. The proposed approach consists of two steps. The first step is video frame selection, which obtains qualified frames from near-infrared (NIR) video; the subjects' eye images are extracted based on the characteristic reflection spots generated by the specified video camera. The second step is iris segmentation, designed to isolate the iris from the eye image. Since iris images obtained from NIR video may suffer from different kinds of noise effects, a new strategy is proposed for iris segmentation to overcome these effects and process the eye images (frames) effectively in the presence of noise. More importantly, the proposed iris segmentation strategy not only separates the iris from the sclera and pupil, but also identifies the extraneous overlapping parts caused by eyelids, eyelashes, and reflection spots.
  • MirrorTrack — a real-time multiple camera approach for multi-touch interactions on glossy display surfaces

    Page(s): 1 - 5
    PDF (3130 KB)

    This paper presents a real-time multiple-camera approach for a multi-touch interaction system that takes advantage of a glossy display surface (such as a conventional LCD display) and the mirror effect at a low-azimuth camera angle to detect and track fingers and their reflections simultaneously. Building on our prior work: (1) we use multi-resolution processing to greatly improve the runtime performance of the system; (2) we employ different edge detection and pattern recognition algorithms at each processing resolution to detect fingers more accurately and efficiently; (3) we track both the location of a fingertip and its pointing direction so it can be identified more effectively; (4) we use a full stereo algorithm to compute finger locations in 3D space more accurately. Our system has many advantages: (1) it works with any glossy flat-panel display; (2) it avoids the clumsy set-up of a top-down camera and the concomitant screen-glare problems; (3) it supports both touch and hover operation; (4) it works with large vertical displays without the usual occlusion problems. We describe our approach and implementation in detail.
  • Behavior recognition architecture for surveillance applications

    Page(s): 1 - 8
    PDF (650 KB)

    Differentiating between normal human activity and aberrant behavior via closed circuit television cameras is a difficult and fatiguing task. The vigilance required of human observers when engaged in such tasks must remain constant, yet attention falls off dramatically over time. In this paper we propose an architecture for capturing data and creating a test and evaluation system to monitor video sensors and tag aberrant human activities for immediate review by human monitors. A psychological perspective provides the inspiration of depicting isolated human motion by point-light walker (PLW) displays, as they have been shown to be salient for recognition of action. Low level intent detection features are used to provide an initial evaluation of actionable behaviors. This relies on strong tracking algorithms that can function in an unstructured environment under a variety of environmental conditions. Critical to this is creating a description of "suspicious behavior" that can be used by the automated system. The resulting confidence value assessments are useful for monitoring human activities and could potentially provide early warning of IED placement activities.
  • A spatial feature enhanced MMI algorithm for multi-modal wild-fire image registration

    Page(s): 1 - 5
    PDF (284 KB)

    The integration of multi-spectral airborne imagery and geographic data for wildfire and emergency response requires 3D multiple view registration. Registration of maps, visible imagery and IR imagery, especially LWIR, is challenging because of the difference in brightness, color and features that are available in the different modalities. We have developed a semi-automated workflow for the registration and exploitation of this imagery and data that can produce quick-turnaround products for research and wildfire management. The technique is based upon an enhancement of the conventional maximization of mutual information. This technique largely overcomes the problems that arise from uncorrelated variations in pixel intensity between visible sensors, LWIR sensors that respond to temperature variations, and artificial colorations present in maps. A measure of registration confidence based upon the kurtosis of the search space has been developed so that operators can be cued to examine suspicious results produced by the semi-automated workflow algorithms. Experiments on real wild-fire imagery demonstrate the performance of the technique.
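The core of maximization-of-mutual-information (MMI) registration, plus a kurtosis-based confidence cue, can be sketched as below. The paper's spatial-feature enhancement is omitted, and the histogram bin count is an assumption.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information between two images from their joint histogram
    (the conventional MMI similarity measure, without spatial features)."""
    hist, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p = hist / hist.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

def registration_confidence(mi_surface):
    """Excess kurtosis of the MI search surface: a sharply peaked surface
    suggests a trustworthy registration; a flat one cues operator review."""
    s = np.asarray(mi_surface, dtype=float).ravel()
    s = s - s.mean()
    var = (s ** 2).mean()
    return float((s ** 4).mean() / var ** 2 - 3.0) if var > 0 else 0.0
```

In a registration loop, `mutual_information` would be evaluated over a grid of candidate transforms, the maximizer taken as the alignment, and `registration_confidence` applied to that grid of scores.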
  • Full body tracking using an agent-based architecture

    Page(s): 1 - 7
    PDF (5313 KB)

    We present an agent-based full body tracking and 3D animation system to generate motion data using stereo calibrated cameras. The novelty of our approach is that agents are bound to the body parts (bone structure) being tracked. These agents are autonomous, self-aware entities that are capable of communicating with other agents to perform tracking within agent coalitions. Each agent seeks "evidence" for its existence both from low-level features (e.g. motion vector fields, color blobs) and from its peers (other agents representing body parts with which it is compatible), and it also combines knowledge from high-level abstraction. Multiple agents may represent different "candidates" for a body part, and compete for a place within a coalition that constitutes the tracking of an articulated human body. The power of our approach is the flexibility by which domain information may be encoded within each agent to produce an overall tracking solution. We demonstrate the effectiveness of the tracking system by testing actions (random movement and walking).
  • Identity dominance: Using fingerprints to link an individual to a larger social structure

    Page(s): 1 - 4
    PDF (340 KB)

    This is a fingerprint pattern and ridge count analysis for two population groups used to associate an individual to a group through qualitative and quantitative comparison. The fingerprint data from the two groups were analyzed using a classification and regression tree algorithm. Four distinct trees were produced. The first tree separated the two populations using only finger number and pattern. Subsequent trees separated the two populations using finger number, pattern, and ridge count. Including ridge counts increased the per-finger classification accuracy from 56.4% to 73.9% and 79.5% for right and left loop patterns respectively. Whorls with both ridge counts improved the classification accuracy to 83.3%. The classification accuracies provided the basis for determining the probability of correctly associating a person to one of the two groups. For each finger, the probability of correctly associating the finger to the group is binomially distributed based upon the classification probabilities. Association is based upon a majority vote. In the worst case with only finger pattern and finger number available, the expected probability of correctly associating the individual is 54.1% using all ten fingers. Adding ridge counts raises the lower bound to 90.8%. The upper bound using whorls with two ridge counts is 98.4%. Between these two extremes are cases in which the patterns vary among the fingers. Because the probability of correctly associating the individual to the city depends on the data available, cases where the fingerprint patterns or the deltas are not discernible reduce the probability of correct association accordingly.
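The majority-vote probabilities quoted above follow from the binomial distribution. A sketch, assuming a strict-majority rule (under which the reported 56.4% per-finger accuracy reproduces the 54.1% ten-finger figure):

```python
from math import comb

def majority_correct_probability(p, n=10):
    """Probability that a majority vote over n independent per-finger
    classifications is correct, given per-finger accuracy p. A strict
    majority is required; ties count as failure (the paper's tie rule
    is not stated in the abstract, so this is an assumption)."""
    need = n // 2 + 1                      # strict majority threshold
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(need, n + 1))
```

With `p = 0.564` this evaluates to about 0.541, matching the abstract's worst-case figure; raising the per-finger accuracy (as ridge counts do) drives the ten-finger probability up sharply.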
  • Dual IR spectral video inspection of a concealed live animal

    Page(s): 1 - 4
    PDF (618 KB)

    Multi-spectral videos have been used in many different fields, most prevalently in military and medical applications. Computer vision experts are especially interested in using satellite-grade infrared (IR) sensors for object detection, recognition and identification (DRI) tasks. There has, however, been increased interest in using multi-spectral videos for tasks such as inspection/surveillance, image synthesis, N-D object modeling, collision avoidance and intelligent navigation system development. Implicit in the acquisition and processing of videos (and images) of 3D rigid-body objects are the issues of restoration and registration via the traditional affine transformation. In this paper, we present a dynamic scheme for passive-ID recognition of a 3D deformable object, a live hamster; the recognition ID is generated by the fusion of long-infrared (LIR, 8-12 μm) and middle-infrared (MIR, 5-8 μm) video cameras. This fusion, based on the nonlinear blind demixing of pixels, was previously applied to perform early passive breast cancer detection. By combining blind pixel demixing of a pair of spectral videos (image sequences) with the adaptive neighborhood histogram modification method, we have generalized local video restoration and registration for a live animal in a concealed environment.
  • Quantifying interpretability for motion imagery with applications to image compression

    Page(s): 1 - 8
    PDF (349 KB)

    For still imagery, the National Imagery Interpretability Rating Scale (NIIRS) has served as a community standard for quantifying interpretability. No comparable scale exists for motion imagery. This paper summarizes a series of user evaluations to understand and quantify the effects of critical factors affecting the perceived interpretability of motion imagery. These evaluations provide the basis for relating perceived image interpretability to image parameters, including ground sample distance (GSD) and frame rate. The first section of this paper presents the key findings from these studies. The second part of the paper applies these methods to quantifying information loss due to compression of motion imagery. We consider several methods for video compression (JPEG2000, MPEG-2, and H.264) at various bitrates. A set of objective image quality metrics was computed for the parent video clip and the various compressed products. The metrics are compared to subjective ratings provided by trained imagery analysts, who rated each clip relative to image interpretability tasks. Both the objective metrics and the analysts' ratings indicate the interpretability loss arising from compression. The findings indicate the compression rates at which image interpretability declines significantly, with implications for sensor system design, systems architecture, and mission planning.
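The objective-metric side of such an evaluation can be illustrated with PSNR, one common full-reference quality metric; the study's actual metric set is not spelled out in the abstract, so this is a generic stand-in rather than the authors' measure.

```python
import numpy as np

def psnr(reference, degraded, max_val=255.0):
    """Peak signal-to-noise ratio between a reference frame and its
    compressed/degraded counterpart, in dB."""
    ref = np.asarray(reference, dtype=float)
    deg = np.asarray(degraded, dtype=float)
    mse = np.mean((ref - deg) ** 2)
    if mse == 0:
        return float("inf")               # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Computed per frame and averaged over a clip, such a metric can then be correlated against analysts' interpretability ratings across bitrates, as the study does with its own metric set.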
  • Real-time, multiple hot-target tracking and multi-spectral fusion

    Page(s): 1 - 5
    PDF (6682 KB)

    The use of technology to enhance a user interface has always been the driving force for new and upcoming research. This is of even more importance when the technology is being used by the armed forces and needs to improve the soldier's situational awareness in hostile conditions. Night vision goggles can provide a tactical advantage to soldiers by giving them the ability to see when their foes cannot. Digital night vision goggles can also support algorithms that improve tactical advantage even further by cueing the soldier under certain conditions. However, cueing can also prove to be a distraction. We have developed several algorithms that can be used to track a thermally hot target through a scene and have explored several methods of presenting the information to the user through the display. Our target hardware is a four-aperture, three spectral band night vision goggle; the goggle includes stereo intensified visible/near infrared, short wave infrared and longwave infrared sensors. Thermal imagery is used to detect and track hot-targets and this information is used in several cueing schemes fused into the overall scene displayed to the user.
  • A survey on behavior analysis in video surveillance for homeland security applications

    Page(s): 1 - 8
    PDF (205 KB)

    Surveillance cameras are inexpensive and everywhere these days, but the manpower required to monitor and analyze them is expensive. Consequently the videos from these cameras are usually monitored sparingly or not at all; they are often used merely as an archive, to refer back to once an incident is known to have taken place. Surveillance cameras can be a far more useful tool if, instead of passively recording footage, they can be used to detect events requiring attention as they happen, and take action in real time. This is the goal of automated visual surveillance: to obtain a description of what is happening in a monitored area, and then to take appropriate action based on that interpretation. Video surveillance of humans is one of the most active research topics in computer vision, with a wide spectrum of promising homeland security applications. Video management and interpretation systems have become quite capable in recent years. This paper looks into how hardware and software can be put together to solve surveillance problems in an age of increased concern with public safety and security. In general, the framework of a video surveillance system includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, behavior understanding and description, and fusion of information from multiple cameras. Despite recent progress in computer vision and other related areas, there are still major technical challenges to be overcome before reliable automated video surveillance can be realized. This paper reviews developments and general strategies for the stages involved in video surveillance, and analyzes the feasibility of and challenges in combining motion analysis, behavior analysis, and standoff biometrics for identification of known suspects, anomaly detection, and behavior understanding.
  • Investigating useful and distinguishing features around the eyelash region

    Page(s): 1 - 6
    PDF (667 KB)

    Traditionally, iris recognition has focused on analyzing and extracting features from the iris texture. We propose to investigate the regions around the eyelashes and extract useful information that helps us perform ethnic classification. We propose an algorithm that is easy to implement and effective. First, we locate the eyelash region by using an active shape model (ASM) to model the eyelid boundary. Second, we extract local patches around the landmarks. After image processing, we are able to separate the eyelashes and extract features from their directions. These features are descriptive and can be used to train classifiers. Experimental results show our method can perform East-Asian/Caucasian classification with up to 93% accuracy, which shows that the proposed method is useful and promising.
  • Rapid training of image classifiers through adaptive, multi-frame sampling method

    Page(s): 1 - 7
    PDF (354 KB)

    Computer vision methods, such as automatic target recognition (ATR) techniques, have the potential to improve the accuracy of military systems for weapon deployment and targeting, resulting in greater utility and reduced collateral damage. A major challenge, however, is training the ATR algorithm to the specific environment and mission. Because of the wide range of operating conditions encountered in practice, advanced training based on a pre-selected training set may not provide the robust performance needed. Training on a mission-specific image set is a promising approach, but requires rapid selection of a small, but highly representative training set to support time-critical operations. To remedy these problems and make short-notice seeker missions a reality, we developed learning and mining using bagged augmented decision trees (LAMBAST). LAMBAST examines large databases and extracts sparse, representative subsets of target and clutter samples of interest. For data mining, LAMBAST uses a variant of decision trees, called random decision trees (RDTs). This approach guards against overfitting and can incorporate novel, mission-specific data after initial training via perpetual learning. We augment these trees with a distribution modeling component that eliminates redundant information, ignores misrepresentative class distributions in the database, and stops training when decision boundaries are sufficiently sampled. These augmented random decision trees enable fast investigation of multiple images to train a reliable, mission-specific ATR. This paper presents the augmented random decision tree framework, develops the sampling procedure for efficient construction of the sample, and illustrates the procedure using relevant examples.
  • Non-Gaussian methods in biomedical imaging

    Page(s): 1 - 6
    PDF (207 KB)

    Most statistical models for applications rely on the Gaussian assumption. Yet, in many realistic situations, the underlying variation or uncertainty is essentially non-Gaussian. In detection problems, for instance, the Gaussian assumption leads to false alarms in cases where the tail is fatter, as with the Laplace density function. In classification problems, the Gaussian model for variability may be too restrictive, and other models, such as the generalized Gaussian density function, are more appropriate. We present examples of such models applied to applications with multiple images, and show performance in two applications: functional magnetic resonance imaging and stem cell classification.
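The false-alarm point can be made concrete by comparing tail probabilities of a standard Gaussian and a unit-variance Laplace density at the same detection threshold:

```python
from math import erfc, exp, sqrt

def gaussian_tail(t):
    """P(X > t) for a standard Gaussian."""
    return 0.5 * erfc(t / sqrt(2.0))

def laplace_tail(t):
    """P(X > t) for a zero-mean, unit-variance Laplace density
    (scale b = 1/sqrt(2), so the tail is 0.5 * exp(-t * sqrt(2)))."""
    return 0.5 * exp(-t * sqrt(2.0))
```

At a threshold of four standard deviations the Laplace tail is tens of times larger than the Gaussian tail, so a detector whose threshold was calibrated under the Gaussian assumption fires far more often than its nominal false-alarm rate when the data are actually Laplacian.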
  • Image fusion with multiband linear arrays

    Page(s): 1 - 6
    PDF (521 KB)

    Combining information from multiple spectral bands to describe a scene in which objects will be tracked can aid a well designed algorithm in differentiating between interesting and unimportant objects. This will lead to a more effective automated tracking system. Successful image fusion requires accurate imagery and good image registration. While offering certain advantages, using linear arrays also introduces new challenges. This paper will discuss the design of a system that combines a dual band infrared (IR) imager with a visible imager to feed a robust tracker with an intuitive user interface.
  • Integrating monomodal biometric matchers through logistic regression rank aggregation approach

    Page(s): 1 - 7
    PDF (419 KB)

    A biometric system relies on a person's behavioral and/or physiological characteristics as an alternative means of authentication (traditional means being passwords, smart cards, IDs, etc.). However, a system based solely on a single biometric may not always meet security requirements. Multibiometric systems are therefore emerging as a trend that helps overcome the limitations of single-biometric solutions, such as when a user does not have a quality sample to present to the system, and reduces the ability of the system to be tricked fraudulently. A reliable and successful multibiometric system needs an effective fusion scheme to integrate the information presented by multiple matchers. In this research, we integrate the results of three monomodal biometric matchers (face, ear and iris) with the logistic regression approach to rank-level fusion. In this approach, not only the outcomes of the three monomodal matchers but also their effectiveness, based on previous research, are considered for the final rank aggregation. Experimental results indicate that the logistic regression method outperforms the Borda count and plurality voting methods. The system can contribute to homeland and border security or other security applications.
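Rank-level fusion can be sketched by contrasting an unweighted Borda count with a weighted variant in which the matcher weights would come from a fitted logistic regression model. The identities and weights below are purely illustrative, not the paper's fitted coefficients.

```python
def borda_fuse(rankings):
    """Borda-count fusion: each matcher's rank list contributes
    (n - position) points per identity; the highest total wins."""
    n = len(rankings[0])
    scores = {}
    for ranking in rankings:                  # ranking: identities, best first
        for pos, ident in enumerate(ranking):
            scores[ident] = scores.get(ident, 0) + (n - pos)
    return max(scores, key=scores.get), scores

def weighted_fuse(rankings, weights):
    """Weighted variant: each matcher's contribution is scaled by its
    weight (e.g. reflecting matcher effectiveness learned by logistic
    regression), so reliable matchers dominate the aggregation."""
    n = len(rankings[0])
    scores = {}
    for ranking, w in zip(rankings, weights):
        for pos, ident in enumerate(ranking):
            scores[ident] = scores.get(ident, 0) + w * (n - pos)
    return max(scores, key=scores.get), scores
```

With equal weights the two schemes agree; once one matcher (say, the face matcher) is weighted as substantially more reliable, the fused winner can change, which is the behavior the paper exploits.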
  • Multi features hybrid Active Shape Model for automated lip contours tracking in video sequence

    Page(s): 1 - 5
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (660 KB) |  | HTML iconHTML  

    We propose and evaluate a novel method for enhancing performance of lips contour tracking, which is based on the concept of active shape models (ASM) and multi features. On the first image of the video sequence, lip region is detected using the Bayesian's rule in which lip color information is modeled by a Gaussian mixture model (GMM) which is trained by expectation-maximization (EM) algorithm. The lip region is then used to initialize the lip shape model. A single feature-based ASM presents good performance only in particular conditions but gets stuck in local minima for noisy conditions (like beard, wrinkle, poor texture, low contrast between lip and skin, etc). To enhance the convergence, we propose to use 2 features: normal profile and grey level patches, and combine them with a voting approach. The standard ASM is not able to take into account temporal information from previous frames therefore the lip contours are tracked by replacing the standard ASM with a hybrid active shape model (HASM) which is capable to take advantage of the temporal information. Initial experimental results on video sequences show that MF-HASM is more robust to local minimum problem and gives a higher accuracy than traditional single feature-based method in lip tracking problem. View full abstract»

  • Current challenges in automating visual perception

    Page(s): 1 - 8

    After nearly half a century of computer vision research, application-specific systems are common but the goal of developing a robust, general-purpose computer vision system remains out of reach. Rather than focus on the strengths and weaknesses of current computer vision approaches, this paper will enumerate and investigate the challenges that must be overcome before this goal can be achieved. Key challenges include handling variations in environment or acquisition parameters such as lighting, view angle, distance, and image quality; recognizing naturally occurring as well as intentionally deceptive variations in object appearance; providing robust general-purpose image segmentation and co-registration; generating 3D representations from 2D images; developing useful object representations; providing required knowledge that is not represented in the image itself; and managing computational complexity. Each of these challenges, along with their relevance to solving the vision problem, will be discussed. Understanding these challenges as a whole may provide insight into underlying mechanisms that will provide the backbone of a robust general-purpose computer vision system.

  • Low-cost, high-speed computer vision using NVIDIA's CUDA architecture

    Page(s): 1 - 7

    In this paper, we introduce real-time image processing techniques using modern programmable graphics processing units (GPUs). GPUs are SIMD (single instruction, multiple data) devices that are inherently data-parallel. By utilizing NVIDIA's new GPU programming framework, the "compute unified device architecture" (CUDA), as a computational resource, we realize significant acceleration in image processing algorithm computations. We show that a range of computer vision algorithms map readily to CUDA with significant performance gains. Specifically, we demonstrate the efficiency of our approach by parallelizing and optimizing Canny's edge detection algorithm and applying it to a computation- and data-intensive video motion tracking algorithm known as "vector coherence mapping" (VCM). Our results show the promise of using such common low-cost processors for intensive computer vision tasks.
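    A rough illustration of why Canny's gradient stage maps so well to CUDA: each output pixel depends only on a small fixed neighborhood, so every pixel can be assigned its own GPU thread. The sketch below computes the Sobel gradient magnitude with explicit per-pixel loops standing in for the CUDA thread grid; it is a pedagogical reconstruction, not the paper's implementation.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gradient_magnitude(img):
    h, w = img.shape
    out = np.zeros((h, w))
    # Every (y, x) iteration is independent of all others -- on CUDA this
    # whole loop nest becomes one kernel launch over an h-by-w thread grid.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            gx = float(np.sum(SOBEL_X * patch))
            gy = float(np.sum(SOBEL_Y * patch))
            out[y, x] = np.hypot(gx, gy)
    return out

img = np.zeros((5, 5)); img[:, 2:] = 1.0   # vertical step edge
mag = gradient_magnitude(img)
print(mag[2, 1], mag[2, 2])  # edge columns respond; flat regions stay 0
```

    The full Canny pipeline adds smoothing, non-maximum suppression, and hysteresis thresholding; the first two stages share this per-pixel independence, which is what makes the algorithm a natural fit for a data-parallel device.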

  • Evaluation of compression schemes for wide area video

    Page(s): 1 - 6

    Current and upcoming wide-area aerial video collectors have very large effective focal plane arrays and can generate a tremendous amount of data. This presents significant challenges for onboard storage and for real-time downlink. This paper presents the results of an evaluation of a number of different image and video compression schemes on wide-area video. In general, we found that video compression produces 3 to 5 times more compression than single-image compression at equivalent quality, as measured by the Structural Similarity (SSIM) metric. We also found that the stream can be compressed by a factor of 100-200 without a perceptual loss in quality.
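    For reference, the Structural Similarity index used here as the quality measure compares two images through their means, variances, and covariance. Below is a minimal global-statistics version of the standard SSIM formula; the published metric applies it over local windows and averages, so this is a simplification for illustration.

```python
import numpy as np

def ssim_global(x, y, data_range=255.0):
    """Global SSIM: (2*mx*my + c1)(2*cov + c2) /
    ((mx^2 + my^2 + c1)(vx + vy + c2)), with the usual c1, c2 constants."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

rng = np.random.default_rng(1)
img = rng.uniform(0, 255, size=(64, 64))
noisy = np.clip(img + rng.normal(0, 20, size=img.shape), 0, 255)
print(round(ssim_global(img, img), 3))  # identical images score exactly 1.0
print(ssim_global(img, noisy) < 1.0)    # a degraded copy scores lower
```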

  • Exploitation of massive numbers of simple events

    Page(s): 1 - 6

    Emerging image-based sensor systems can observe a relatively large area (e.g., the size of an urban neighborhood) for long time intervals, either continually or with high revisit rates. This type of sensor data makes new types of exploitation possible, but only with the assistance of automated exploitation aids, because of the massive volume of data that must be studied as a whole. Automated methods to extract the simplest events from image sequences are often fairly robust (e.g., change events derived from EO or SAR image sequences or from video-derived tracks), and massive numbers of such events can contain information of high intelligence value. This paper examines the general-purpose problem of how massive numbers of the simplest sensor-derived events can be exploited. We summarize the basic functionality an intelligence analyst needs for studying this type of event data: in short, to understand the spatial structure, temporal structure and event-pair structure within an area of regard. We then present a number of algorithms for automated exploitation of such data, and some visualization tools to help analysts study it. Experimental results using all of these technologies are also presented.
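    As a toy illustration of the three kinds of structure named above (not the paper's algorithms), simple (time, x, y) change events can be summarized by per-location counts, coarse time bins, and co-occurring location pairs within a short time window:

```python
from collections import Counter
from itertools import combinations

# Hypothetical change events (time, x, y), e.g. from an image sequence.
events = [(0, 1, 1), (1, 1, 1), (2, 5, 5), (3, 1, 1), (10, 5, 5)]

# Spatial structure: how often each location changes.
spatial = Counter((x, y) for _, x, y in events)

# Temporal structure: event counts per coarse time bin (width 5 here).
temporal = Counter(t // 5 for t, _, _ in events)

# Event-pair structure: distinct location pairs active within 2 time units.
pairs = Counter(
    tuple(sorted([(a[1], a[2]), (b[1], b[2])]))
    for a, b in combinations(events, 2)
    if abs(a[0] - b[0]) <= 2 and (a[1], a[2]) != (b[1], b[2])
)
print(spatial[(1, 1)], temporal[0], pairs[((1, 1), (5, 5))])  # 3 4 3
```

    Even this naive summary hints at why such data needs automated aids: the pairwise pass is quadratic in the number of events, which is untenable at the event volumes these sensors produce without indexing or windowed processing.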
