
Computer Vision, IET

Issue 6 • November 2012


  • Special section: invited papers from editorial board members [Editorial]

    Page(s): 499

    In 2006 I agreed to become the first Editor-in-Chief of this journal. One of my first tasks was to appoint an international editorial board whose interests covered the broad theme of cognitive vision, that is to say computational vision motivated by either neuroscience or psychology. In 2011, after 6 years devoted to the journal, I decided it was time to pass on the baton, and Majid Mirmehdi agreed to take over as Editor-in-Chief.

  • Many-to-many feature matching in object recognition: a review of three approaches

    Page(s): 500 - 513

    The mainstream object categorisation community relies heavily on object representations consisting of local image features, due to their ease of recovery and their attractive invariance properties. Object categorisation is therefore formulated as finding, that is, 'detecting', a one-to-one correspondence between image and model features. This assumption breaks down for categories in which two exemplars may not share a single local image feature. Even when objects are represented as more abstract image features, a collection of features at one scale (in one image) may correspond to a single feature at a coarser scale (in the second image). Effective object categorisation therefore requires the ability to match features many-to-many. In this paper, we review our progress on three independent object categorisation problems, each formulated as a graph matching problem and each solving the many-to-many graph matching problem in a different way. First, we explore the problem of learning a shape class prototype from a set of class exemplars which may not share a single local image feature. Next, we explore the problem of matching two graphs in which correspondence exists only at higher levels of abstraction, and describe a low-dimensional spectral encoding of graph structure that captures the abstract shape of a graph. Finally, we embed graphs into geometric spaces, reducing the many-to-many graph-matching problem to a weighted point matching problem, for which efficient many-to-many matching algorithms exist.

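The embedding route in the last step of that review can be illustrated with a toy sketch. The `degree_embedding` and `emd_1d` names below are hypothetical, and a degree-based 1-D embedding with the earth mover's distance stands in for the authors' spectral encoding; it is an illustration of the reduction, not their method. The point is that the optimal 1-D coupling is inherently many-to-many: mass from one point may split across several points on the other side.

```python
import math

def degree_embedding(graph):
    # graph: dict mapping node -> iterable of neighbours;
    # embed each node as a 1-D point given by its degree
    return [float(len(nbrs)) for nbrs in graph.values()]

def emd_1d(xs, ys):
    # 1-D earth mover's distance between two uniform-mass point sets,
    # computed as the integral of |quantile_x(t) - quantile_y(t)| over
    # t in (0, 1]; the implied coupling is many-to-many
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    cuts = sorted({k / n for k in range(1, n + 1)} |
                  {k / m for k in range(1, m + 1)})
    total, prev = 0.0, 0.0
    for t in cuts:
        qx = xs[math.ceil(t * n) - 1]  # quantile of xs on (prev, t]
        qy = ys[math.ceil(t * m) - 1]
        total += (t - prev) * abs(qx - qy)
        prev = t
    return total

# A 4-node star versus a 4-node path: no one-to-one degree match
# exists, yet the embedded point sets remain directly comparable.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = emd_1d(degree_embedding(star), degree_embedding(path))
```

Splitting the single point in `emd_1d([0.0, 2.0], [1.0])` across both points on the other side gives a distance of 1.0, which is exactly the many-to-many behaviour a one-to-one matcher cannot express.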
  • Saliency in images and video: a brief survey

    Page(s): 514 - 523

    Salient image regions permit non-uniform allocation of computational resources. The selection of a commensurate set of salient regions is often a step taken in the initial stages of many computer vision algorithms, thereby facilitating object recognition, visual search and image matching. In this study, the authors survey the role and advancement of saliency algorithms over the past decade. They first offer a concise introduction to saliency, then present a summary of the saliency literature cast into its respective categories and further differentiated by domain, computational method, features, context and use of scale. They then discuss the achievements and limitations of the current state of the art. This information is augmented by an outline of the datasets and performance measures utilised, as well as the computational techniques pervasive in the literature.

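One of the oldest ideas such surveys cover, centre-surround contrast, fits in a few lines. This toy version (an assumption for illustration, not any specific surveyed model) scores each pixel by how much it differs from its local neighbourhood mean:

```python
def saliency_map(img):
    # toy centre-surround saliency: a pixel's saliency is the absolute
    # difference between its intensity and the mean of its 3x3
    # neighbourhood (borders use the truncated window)
    h, w = len(img), len(img[0])
    sal = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            nb = [img[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2))]
            sal[y][x] = abs(img[y][x] - sum(nb) / len(nb))
    return sal

# a single bright pixel pops out as the most salient location
img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
sal = saliency_map(img)
```

Thresholding such a map is one way the "commensurate set of salient regions" mentioned above gets selected before heavier processing is applied.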
  • Multimodal imaging: modelling and segmentation with biomedical applications

    Page(s): 524 - 539

    The maximum a posteriori (MAP) technique, combining intensity and spatial interactions, has been a standard statistical approach for image segmentation. Crucial steps for the MAP technique are model identification, incorporation of priors, and the optimisation approach. This paper describes an unsupervised MAP segmentation framework for N-dimensional multimodal images. The input image and its desired labelling are described by a joint Markov-Gibbs random field (MGRF) model of independent image signals and interdependent region labels. A kernel approach is used to model the joint and marginal probability densities of objects from the grey-level histogram, incorporating a generalised linear combination of Gaussians (LCG). A novel maximum likelihood estimate (MLE) for the number of classes in the LCG model is introduced. An approach is devised for MGRF model identification based on region characteristics. The segmentation process employs the LCG to provide an initial segmentation; the α-expansion move algorithm then iteratively refines the labelled image using the MGRF. The resulting MAP algorithm is studied in terms of convergence and sensitivity to initialisation, improper estimation of the number of classes, and discontinuities in the objects. The framework is modular, allowing incorporation of intensity and spatial interactions of varying complexity, and can be extended to incorporate shape priors.

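The overall structure (likelihood initialisation, then iterative refinement under a spatial prior) can be sketched on a 1-D signal. This is a caricature, not the paper's framework: it uses iterated conditional modes (ICM) with a Potts prior in place of α-expansion on an MGRF, and fixed class means in place of the LCG model.

```python
def icm_map_segment(signal, means, sigma=1.0, beta=10.0, iters=5):
    # Toy 1-D MAP labelling: Gaussian data term per class plus a Potts
    # smoothness prior, optimised by iterated conditional modes (ICM).
    # The paper uses alpha-expansion on an MGRF; ICM is a simpler
    # stand-in used here purely for illustration.
    labels = [min(range(len(means)), key=lambda k: abs(s - means[k]))
              for s in signal]                    # ML initialisation
    for _ in range(iters):
        for i, s in enumerate(signal):
            def energy(k):
                data = (s - means[k]) ** 2 / (2 * sigma ** 2)
                smooth = sum(beta for j in (i - 1, i + 1)
                             if 0 <= j < len(signal) and labels[j] != k)
                return data + smooth
            labels[i] = min(range(len(means)), key=energy)
    return labels

# an isolated noisy sample is relabelled by the smoothness prior
labels = icm_map_segment([0.0, 0.1, 5.0, 0.2, 0.0], means=[0.0, 5.0])
```

With a weaker prior (smaller `beta`) the outlier keeps its own label, which is exactly the data-versus-prior trade-off the MAP formulation balances.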
  • Backgroundless detection of pedestrians in cluttered conditions based on monocular images: a review

    Page(s): 540 - 550

    The significant progress in visual surveillance has been motivated by the need to emulate some of the human ability to monitor activity in human-made environments, particularly in the contexts of security and safety. The rapid rise in the number of cameras installed in public and private places makes such automation desirable, at least to reduce CCTV workload. Real-world applications of visual surveillance impose the need for robust real-time solutions able to deal with a wide range of circumstances and environmental conditions. Conventional approaches are based on what has become known as motion (or change) detection followed by tracking (in single- or multiple-camera systems). Objects of interest are represented by rectangular blobs, and decisions on whether something might be interesting are made from rules or learned patterns of presence and trajectories of such blobs. There is growing interest in looking 'inside the box' for applications that are concerned with detailed human activity recognition and with robust detection of people even when image backgrounds change, as is the case with a moving camera. In this study, the authors consider the general problem of robust pedestrian detection irrespective of background, reviewing the state of the art, showing some representative results and suggesting ways forward.

  • Sparse local discriminant projections for discriminant knowledge extraction and classification

    Page(s): 551 - 559

    One of the major disadvantages of linear dimensionality reduction algorithms, such as principal component analysis (PCA) and linear discriminant analysis (LDA), is that the projections lack physical interpretation. Moreover, which features or variables play an important role in feature extraction and classification in classical linear dimensionality reduction methods has still not been well investigated. This paper proposes a novel supervised learning method called sparse local discriminant projections (SLDP) for linear dimensionality reduction. Unlike recent manifold-learning-based methods such as locality preserving projections (LPP), SLDP introduces a sparse constraint into the objective function and integrates the local geometry, discriminant information and within-class geometry to obtain the sparse projections. The sparse projections can be efficiently computed by the Elastic Net. Notably, the sparse projections learned by SLDP have a direct physical interpretation and provide discriminant knowledge and insight into the extracted features. The experimental results show that SLDP gives reasonable semantic results and achieves performance competitive with techniques such as PCA, LPP, neighbourhood preserving embedding (NPE) and the recently proposed unified sparse subspace learning (USSL).

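The interpretability claim hinges on exact zeros in the projection vector. A minimal sketch of where the zeros come from is soft-thresholding, the l1 building block inside elastic-net solvers; the class-mean-difference direction below is a hypothetical example, not the SLDP objective:

```python
def soft_threshold(w, lam):
    # proximal operator of the l1 penalty: coefficients with magnitude
    # below lam are set exactly to zero, the rest shrink towards zero
    return [0.0 if abs(v) <= lam else (v - lam if v > 0 else v + lam)
            for v in w]

# hypothetical projection direction: difference of two class means
mean_a = [1.0, 0.02, 3.0, 0.01]
mean_b = [0.0, 0.05, 1.0, 0.04]
w = [a - b for a, b in zip(mean_a, mean_b)]
w_sparse = soft_threshold(w, 0.1)
# only features 0 and 2 survive the threshold, so a reader can point
# at exactly which input variables drive the projection
```

The exact zeros (rather than merely small weights, as in PCA or LPP) are what give the surviving features a direct physical reading.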
  • Empirical investigation into the correlation between vignetting effect and the quality of sensor pattern noise

    Page(s): 560 - 566

    The sensor pattern noise (SPN) is a unique attribute of image content that can facilitate identification of source digital imaging devices. Owing to its potential in forensic applications, it has drawn much attention in the digital forensic community. Although much work has been done on applications of the SPN, investigations into its characteristics have been largely overlooked in the literature. In this study, the authors aim to fill this gap by providing insight into the dependency of SPN quality on its location in images. They have observed that SPN components at the image periphery are not reliable for the task of source camera identification and tend to cause higher false-positive rates. Empirical evidence is presented in this work. The authors suspect that this location-dependent SPN quality degradation is strongly connected with the so-called 'vignetting effect', as both exhibit the same type of location dependency. The authors recommend that when image blocks are to be used for forensic investigations, they should be taken from the image centre before SPN extraction is performed, in order to reduce the false-positive rate.

  • Non-linear factorised dynamic shape and appearance models for facial expression analysis and tracking

    Page(s): 567 - 580

    Facial expressions exhibit non-linear shape and appearance deformations that vary across people and expressions. The authors present a non-linear factorised shape and appearance model for facial expression analysis and tracking. The novel non-linear factorised generative model of facial expressions, using conceptual manifold embedding and empirical kernel maps, provides accurate facial expression shape and appearance. It preserves non-linear facial deformations based on the configuration, face style and expression type. The proposed model supports tasks such as facial expression recognition, person identification, and global and local facial motion tracking. Given a sequence of images, temporal embedding, expression type and person identification parameters are iteratively estimated for facial expression analysis. The authors combine global facial motion estimation and local facial deformation estimation for tracking both large global and subtle local facial motions, employing a thin-plate spline for local deformation estimation. The global shape and appearance model provides appearance templates for the estimation of local deformation. Experimental results using the Cohn-Kanade AU-coded facial expression database demonstrate facial expression recognition using the estimated personal style parameter, and facial deformation tracking using global and local facial motion estimation.

  • Adaptive pattern spectrum image description using Euclidean and Geodesic distance without training for texture classification

    Page(s): 581 - 589

    Mathematical morphology can be used to extract a shape-size distribution called the pattern spectrum (PS) for texture description purposes. However, the structuring element (SE) used to compute it does not vary across the image, and therefore it does not capture the image's geometrical variations. The authors' proposal consists of computing an SE at each pixel whose size and shape vary with two distance criteria, a geodesic distance and a Euclidean distance, in order to fit the texture as well as possible. Combining the geodesic and Euclidean descriptors into a single descriptor, classification results on several textures from the VisTex and Brodatz databases show that this approach outperforms the classical PS and the geodesic and Euclidean descriptors taken separately and, in contrast with other adaptive methods, does not require previous training.

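The classical (non-adaptive) pattern spectrum that the proposal is compared against can be sketched in 1-D with a flat structuring element; the per-pixel adaptive SE of the paper is not reproduced here, and the truncated border windows are an assumption of this toy:

```python
def opening(sig, k):
    # grey-scale opening with a flat structuring element of half-width
    # k: erosion followed by dilation (truncated windows at borders)
    n = len(sig)
    eroded = [min(sig[max(0, i - k):i + k + 1]) for i in range(n)]
    return [max(eroded[max(0, i - k):i + k + 1]) for i in range(n)]

def pattern_spectrum(sig, max_k):
    # PS(k) = area removed when the opening size grows from k-1 to k;
    # peaks reveal the dominant structure sizes in the signal
    areas = [sum(opening(sig, k)) for k in range(max_k + 1)]
    return [areas[k - 1] - areas[k] for k in range(1, max_k + 1)]

# a width-1 spike is removed at size 1; a width-3 plateau at size 2
ps = pattern_spectrum([0, 5, 0, 0, 3, 3, 3, 0], max_k=2)
```

Because the same SE is applied everywhere, structures of one size are binned together regardless of local geometry, which is precisely the limitation the adaptive per-pixel SE is meant to address.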
  • Multi-human tracking from sparse detection responses

    Page(s): 590 - 602

    In this study, the authors focus on improving multi-human tracking from sparse detection responses. Many previous detection-based data association tracking methods used dense detection responses as input, but did not consider the case of sparse detection responses, and dense detection responses are difficult to obtain in complex environments all the time. To address this, the authors propose a particle-filter-based triple threshold method to build reliable trajectories. A topic model is applied to represent human appearance, so that the appearance of each person can be considered as a topic distribution. A cost function is then used to associate these trajectories within a sliding time window to produce the final tracking results. The cost function is composed of four parts, appearance cost, motion direction cost, object size cost and distance cost, integrated into a unified formula. Finally, three challenging datasets are used to evaluate the performance of the authors' approach in the cases of dense and sparse detection responses, respectively, with comparison against state-of-the-art approaches. The results show that the approach obtains better tracking performance than previous methods in both cases.

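A four-part cost of this kind is typically a weighted sum of per-cue distances. The sketch below illustrates that shape only; the individual distance measures, the tracklet state fields and the weights are all assumptions for illustration, not the paper's definitions:

```python
import math

def association_cost(a, b, w=(1.0, 1.0, 1.0, 0.1)):
    # a, b: tracklet end states with an appearance histogram ("hist"),
    # a motion direction vector ("dir"), a box size and a position;
    # the four terms mirror appearance, motion direction, object size
    # and distance costs combined in one formula
    app = sum(abs(x - y) for x, y in zip(a["hist"], b["hist"]))
    cos = (sum(x * y for x, y in zip(a["dir"], b["dir"])) /
           (math.hypot(*a["dir"]) * math.hypot(*b["dir"])))
    motion = 1.0 - cos
    size = abs(a["size"] - b["size"]) / max(a["size"], b["size"])
    dist = math.hypot(a["pos"][0] - b["pos"][0],
                      a["pos"][1] - b["pos"][1])
    return w[0] * app + w[1] * motion + w[2] * size + w[3] * dist

t1 = {"hist": [0.5, 0.5], "dir": (1.0, 0.0), "size": 10.0,
      "pos": (0.0, 0.0)}
t2 = {"hist": [0.5, 0.5], "dir": (1.0, 0.0), "size": 10.0,
      "pos": (3.0, 4.0)}
```

Within a sliding time window, trajectory pairs with the lowest such cost would be linked first.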
  • Fusion of visual cues of intensity and texture in Markov random fields image segmentation

    Page(s): 603 - 609

    This study proposes an algorithm that fuses visual cues of intensity and texture in Markov random field region-growing texture image segmentation. The idea is to segment the image in a way that takes EdgeFlow edges into consideration, which provides a single framework for identifying object boundaries based on texture and intensity descriptors. This is achieved by modifying the energy minimisation process so that it penalises merging regions that have EdgeFlow edges on the boundary between them. Experimental results confirm the hypothesis that the integration of edge information increases the precision of the segmentation by ensuring the conservation of homogeneous objects' contours during the region-growing process.

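The penalisation idea can be caricatured with a hard version of the same fusion: region growing that simply refuses to cross an edge pixel. The binary edge map and intensity tolerance below are assumptions of this sketch, not the paper's soft energy penalty:

```python
from collections import deque

def grow_region(img, edges, seed, tol):
    # BFS region growing fusing two cues: a neighbour joins the region
    # if its intensity is within tol of the seed (homogeneity cue) and
    # it is not marked as an edge pixel (EdgeFlow-style boundary cue)
    h, w = len(img), len(img[0])
    base = img[seed[0]][seed[1]]
    region, queue = {seed}, deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and (ny, nx) not in region
                    and not edges[ny][nx]
                    and abs(img[ny][nx] - base) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region

# a perfectly homogeneous strip, yet an edge at column 2 stops growth:
# intensity alone would merge everything, the edge cue preserves the
# contour
img = [[1, 1, 1, 1, 1]]
edges = [[False, False, True, False, False]]
region = grow_region(img, edges, (0, 0), tol=2)
```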
  • Brain magnetic resonance image segmentation based on an adapted non-local fuzzy c-means method

    Page(s): 610 - 625

    Intensity inhomogeneities cause considerable difficulties in the quantitative analysis of magnetic resonance images (MRIs). Consequently, intensity inhomogeneity estimation is a necessary step before quantitative analysis of MR data can be undertaken. This study proposes a new energy minimisation framework for simultaneous estimation of the intensity inhomogeneities and segmentation. The method is formulated by modifying the objective function of the standard fuzzy c-means algorithm to compensate for intensity inhomogeneities by using basis functions, and to compensate for noise by using improved non-local information. The energy function depends on the coefficients of the basis functions, the membership ratios, the centroids of the tissues and improved non-local information in the image. Intensity inhomogeneity estimation and image segmentation are achieved simultaneously by minimising this energy. The non-local framework has been widely used to provide non-local information; however, the traditional framework only considers neighbouring patch information, which loses information at corner and end points. This study presents an improved non-local framework that incorporates corner and end-point region information. Experimental results on both real MRIs and simulated MR data show that the authors' method obtains more accurate results when segmenting images with bias field and noise.

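For orientation, the baseline being modified is the standard fuzzy c-means membership update; the bias-field basis functions and the non-local patch term that the paper adds are omitted from this sketch:

```python
def fcm_memberships(xs, centroids, m=2.0):
    # standard fuzzy c-means membership update: the membership of
    # sample x in class i is inversely related to its distance to
    # centroid c_i, with fuzzifier m; rows sum to one. The paper's
    # method augments the underlying objective with bias-field basis
    # functions and an improved non-local term.
    U = []
    for x in xs:
        d = [abs(x - c) for c in centroids]
        if 0.0 in d:  # sample coincides with a centroid
            U.append([1.0 if di == 0.0 else 0.0 for di in d])
            continue
        U.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0))
                            for j in range(len(d)))
                  for i in range(len(d))])
    return U

# a sample halfway between two centroids gets equal memberships,
# while a sample near one centroid is assigned to it almost entirely
U = fcm_memberships([0.0, 5.0, 9.0], centroids=[0.0, 10.0])
```

The soft memberships are what make it possible to fold the bias field into the objective and re-estimate both jointly, instead of committing to hard labels early.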
  • Codebook reconstruction with holistic information fusion

    Page(s): 626 - 634

    The bag-of-features model has been shown to be one of the most successful methods for generic image categorisation. However, creating the codebook by clustering local feature vectors (e.g. with k-means) may lose holistic information about the images. This study presents a novel process called 'correlation feedback' for codebook construction. It introduces semantic similarities of words by measuring correlations among their distributions within an image. Furthermore, the authors employ a label propagation process to spread the affinities among all features. An enhanced codebook is constructed by fusing the new similarity matrix with locality preserving projection, a linear manifold learning algorithm that can be extended to both training and testing samples. Experimental results on 15 scene categories and ImageNet show the promising performance of incorporating the novel similarity into dictionary construction.

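The k-means step whose loss of holistic information motivates that work can be sketched in a 1-D toy form (requires k ≥ 2; the helper names are hypothetical):

```python
def kmeans_codebook(feats, k, iters=20):
    # baseline codebook construction by k-means on (1-D) local feature
    # values: each converged centre becomes a visual word; this is the
    # clustering step that, as the paper argues, ignores holistic
    # image-level information
    feats = sorted(feats)
    cents = [feats[i * (len(feats) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in feats:
            nearest = min(range(k), key=lambda i: abs(f - cents[i]))
            clusters[nearest].append(f)
        cents = [sum(c) / len(c) if c else cents[j]
                 for j, c in enumerate(clusters)]
    return cents

def quantise(f, cents):
    # assign a local feature to its nearest visual word
    return min(range(len(cents)), key=lambda i: abs(f - cents[i]))

words = kmeans_codebook([0.0, 0.1, 0.2, 10.0, 10.1], k=2)
```

Each feature is assigned to its word independently of the image it came from, which is the independence that correlation feedback is designed to compensate for.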

Aims & Scope

IET Computer Vision seeks original research papers in a wide range of areas of computer vision. The vision of the journal is to publish the highest-quality research that is relevant and topical to the field, while not forgetting work that aims to open new horizons and set the agenda for future avenues of research in computer vision.
