2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Date: 20-25 June 2011

Displaying Results 1 - 25 of 439
  • Efficient marginal likelihood optimization in blind deconvolution

    Publication Year: 2011 , Page(s): 2657 - 2664
    Cited by:  Papers (50)

    In blind deconvolution one aims to estimate from an input blurred image y a sharp image x and an unknown blur kernel k. Recent research shows that a key to success is to consider the overall shape of the posterior distribution p(x, k | y) and not only its mode. This leads to a distinction between MAP_{x,k} strategies, which estimate the mode pair (x, k) and often lead to undesired results, and MAP_k strategies, which select the best k while marginalizing over all possible x images. The MAP_k principle is significantly more robust than the MAP_{x,k} one, yet it involves a challenging marginalization over latent images. As a result, MAP_k techniques are considered complicated and have not been widely exploited. This paper derives a simple approximated MAP_k algorithm which involves only a modest modification of common MAP_{x,k} algorithms. We show that MAP_k can, in fact, be optimized easily, with no additional computational complexity.

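    The two strategies the abstract contrasts can be written out explicitly (notation only, following the abstract's definitions; not the paper's derivation):

    ```latex
    \begin{align*}
    (\hat{x},\hat{k}) &= \arg\max_{x,k}\; p(x,k \mid y)
        && \text{MAP}_{x,k}\text{: joint mode over sharp image and kernel} \\
    \hat{k} &= \arg\max_{k}\; p(k \mid y)
             = \arg\max_{k} \int p(x,k \mid y)\, dx
        && \text{MAP}_{k}\text{: marginalize over all sharp images } x
    \end{align*}
    ```
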
  • Natural image denoising: Optimality and inherent bounds

    Publication Year: 2011 , Page(s): 2833 - 2840
    Cited by:  Papers (21)

    The goal of natural image denoising is to estimate a clean version of a given noisy image, utilizing prior knowledge on the statistics of natural images. The problem has been studied intensively with considerable progress made in recent years. However, it seems that image denoising algorithms are starting to converge, and recent algorithms improve over previous ones by only fractional dB values. It is thus important to understand how much further natural image denoising algorithms can still be improved, and what inherent limits are imposed by the actual statistics of the data. The challenge in evaluating such limits is that constructing proper models of natural image statistics is a long-standing and yet unsolved problem. To overcome the absence of accurate image priors, this paper takes a non-parametric approach and represents the distribution of natural images using a huge set of 10^10 patches. We then derive a simple statistical measure which provides a lower bound on the optimal Bayesian minimum mean square error (MMSE). This imposes a limit on the best possible results of denoising algorithms which utilize a fixed support around a denoised pixel and a generic natural image prior. Our findings suggest that for small windows, state-of-the-art denoising algorithms are approaching optimality and cannot be further improved by more than ~0.1 dB.

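    A minimal sketch of the kind of non-parametric MMSE computation the abstract describes, assuming additive white Gaussian noise and estimating only the centre pixel of each patch; the function name and interface are illustrative, not the paper's code:

    ```python
    import numpy as np

    def mmse_center_pixel(prior_patches, test_clean, sigma, seed=0):
        """Monte-Carlo estimate of the minimum mean square error for denoising
        the centre pixel of a patch, using a large sample of clean patches as a
        non-parametric prior (illustrative sketch).

        prior_patches : (N, d) flattened clean patches representing p(x)
        test_clean    : (M, d) held-out clean patches used to generate noisy inputs
        sigma         : standard deviation of the additive white Gaussian noise
        """
        rng = np.random.default_rng(seed)
        noisy = test_clean + rng.normal(0.0, sigma, size=test_clean.shape)
        c = test_clean.shape[1] // 2                 # centre pixel of the flattened patch
        errors = []
        for x_true, y in zip(test_clean, noisy):
            # posterior weight of every prior patch under the Gaussian noise model
            log_w = -np.sum((prior_patches - y) ** 2, axis=1) / (2.0 * sigma ** 2)
            w = np.exp(log_w - log_w.max())
            w /= w.sum()
            x_hat = w @ prior_patches[:, c]          # posterior mean = MMSE estimate
            errors.append((x_true[c] - x_hat) ** 2)
        return float(np.mean(errors))                # approaches the true MMSE as N grows
    ```
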
  • A Sobolev-type metric for polar active contours

    Publication Year: 2011 , Page(s): 1017 - 1024

    Polar object representations have proven to be a powerful shape model for many medical as well as other computer vision applications, such as interactive image segmentation or tracking. Inspired by recent work on Sobolev active contours we derive a Sobolev-type function space for polar curves. This so-called polar space is endowed with a metric that allows us to favor origin translations and scale changes over smooth deformations of the curve. Moreover, the resulting curve flow inherits the coarse-to-fine behavior of Sobolev active contours and is thus very robust to local minima. These properties make the resulting polar active contours a powerful segmentation tool for many medical applications, such as cross-sectional vessel segmentation, aneurysm analysis, or cell tracking.

  • Multi-target tracking by continuous energy minimization

    Publication Year: 2011 , Page(s): 1265 - 1272
    Cited by:  Papers (33)

    We propose to formulate multi-target tracking as minimization of a continuous energy function. Unlike a number of recent approaches, we focus on designing an energy function that represents the problem as faithfully as possible, rather than one that is amenable to elegant optimization. We then go on to construct a suitable optimization scheme to find strong local minima of the proposed energy. The scheme extends the conjugate gradient method with periodic trans-dimensional jumps. These moves allow the search to escape weak minima and explore a much larger portion of the variable-dimensional search space, while still always reducing the energy. To demonstrate the validity of this approach we present an extensive quantitative evaluation both on synthetic data and on six different real video sequences. In both cases we achieve a significant performance improvement over an extended Kalman filter baseline as well as an ILP-based state-of-the-art tracker.

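    A skeleton of the optimization scheme described above, under the assumption that a concrete tracking energy and a set of trans-dimensional jump proposals are supplied by the caller (both are placeholders here, not the paper's implementation):

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def minimize_with_jumps(energy, x0, propose_jumps, n_rounds=20):
        """Conjugate-gradient descent interleaved with trans-dimensional jump
        moves that are accepted only when they lower the energy.  `energy` maps
        a state vector to a scalar; `propose_jumps` yields candidate state
        vectors of possibly different dimension (add/remove/split moves)."""
        x = np.asarray(x0, dtype=float)
        for _ in range(n_rounds):
            x = minimize(energy, x, method="CG").x    # continuous descent in the current dimension
            for candidate in propose_jumps(x):        # discrete trans-dimensional moves
                candidate = np.asarray(candidate, dtype=float)
                if energy(candidate) < energy(x):
                    x = candidate                     # jumps never increase the energy
        return x
    ```
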
  • Towards cross-category knowledge propagation for learning visual concepts

    Publication Year: 2011 , Page(s): 897 - 904
    Cited by:  Papers (11)

    In recent years, knowledge transfer algorithms have become one of the most active research areas in learning visual concepts. Most of the existing learning algorithms focus on leveraging a knowledge transfer process that is specific to a given category. However, in many cases, such a process may not be very effective when a particular target category has very few samples. In such cases, it is interesting to examine whether it is feasible to use cross-category knowledge to improve the learning process by exploring the knowledge in correlated categories. Such a task can be quite challenging due to variations in semantic similarities and differences between categories, which could either help or hinder the cross-category learning process. In order to address this challenge, we develop a cross-category label propagation algorithm, which can directly propagate inter-category knowledge at the instance level between the source and the target categories. Furthermore, this algorithm can automatically detect conditions under which the transfer process can be detrimental to the learning process. This gives us a way to know when the transfer of cross-category knowledge is both useful and desirable. We present experimental results on real image and video data sets to demonstrate the effectiveness of our approach.

  • Are sparse representations really relevant for image classification?

    Publication Year: 2011 , Page(s): 1545 - 1552
    Cited by:  Papers (44)

    Recent years have seen an increasing interest in sparse representations for image classification and object recognition, probably motivated by evidence from the analysis of the primate visual cortex. It is still unclear, however, whether or not sparsity helps classification. In this paper we evaluate its impact on the recognition rate using a shallow modular architecture, adopting both standard filter banks and filter banks learned in an unsupervised way. In our experiments on the CIFAR-10 and on the Caltech-101 datasets, enforcing sparsity constraints actually does not improve recognition performance. This has an important practical impact in image descriptor design, as enforcing these constraints can have a heavy computational cost.

  • Smoothly varying affine stitching

    Publication Year: 2011 , Page(s): 345 - 352
    Cited by:  Papers (18)

    Traditional image stitching using parametric transforms such as homography only produces perceptually correct composites for planar scenes or parallax-free camera motion between source frames. This limits mosaicing to source images taken from the same physical location. In this paper, we introduce a smoothly varying affine stitching field which is flexible enough to handle parallax while retaining the good extrapolation and occlusion handling properties of parametric transforms. Our algorithm, which jointly estimates both the stitching field and correspondence, permits the stitching of general motion source images, provided the scenes do not contain abrupt protrusions.

  • Noise resistant graph ranking for improved web image search

    Publication Year: 2011 , Page(s): 849 - 856
    Cited by:  Papers (13)

    In this paper, we exploit a novel ranking mechanism that processes query samples with noisy labels, motivated by the practical application of web image search re-ranking, where the originally highest ranked images are usually posed as pseudo queries for subsequent re-ranking. Availing ourselves of the low-frequency spectrum of a neighborhood graph built on the samples, we propose a graph-theoretical framework amenable to noise resistant ranking. The proposed framework consists of two components: spectral filtering and graph-based ranking. The former leverages sparse bases, progressively selected from a pool of smooth eigenvectors of the graph Laplacian, to reconstruct the noisy label vector associated with the query sample set and accordingly filter out the query samples with less authentic positive labels. The latter applies a canonical graph ranking algorithm with respect to the filtered query sample set. Quantitative image re-ranking experiments carried out on two public web image databases bear out that our re-ranking approach compares favorably with the state of the art and improves web image search engines by a large margin, even though we harvest the noisy queries from the top-ranked images returned by these search engines.

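    A simplified stand-in for the spectral filtering step: project the noisy pseudo-label vector onto the k smoothest eigenvectors of the normalized graph Laplacian. The paper selects a sparse basis progressively; plain truncation is used here only for illustration:

    ```python
    import numpy as np

    def spectral_filter_labels(W, y_noisy, k=10):
        """Low-pass filter a noisy pseudo-label vector with the k smoothest
        eigenvectors of the normalized graph Laplacian (simplified sketch).

        W       : (n, n) symmetric affinity matrix of the neighbourhood graph
        y_noisy : (n,) pseudo-labels, e.g. 1 for the originally top-ranked images
        """
        d = W.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
        L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W) * d_inv_sqrt[None, :]  # normalized Laplacian
        _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
        U = vecs[:, :k]                          # smooth (low-frequency) basis
        return U @ (U.T @ y_noisy)               # filtered labels: noisy positives are damped
    ```
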
  • Real-time human pose recognition in parts from single depth images

    Publication Year: 2011 , Page(s): 1297 - 1304
    Cited by:  Papers (434)  |  Patents (13)

    We propose a new method to quickly and accurately predict 3D positions of body joints from a single depth image, using no temporal information. We take an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem. Our large and highly varied training dataset allows the classifier to estimate body parts invariant to pose, body shape, clothing, etc. Finally we generate confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes. The system runs at 200 frames per second on consumer hardware. Our evaluation shows high accuracy on both synthetic and real test sets, and investigates the effect of several training parameters. We achieve state of the art accuracy in our comparison with related work and demonstrate improved generalization over exact whole-skeleton nearest neighbor matching.

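    An illustrative sketch of a per-pixel body-part classifier of the kind the abstract describes. The depth-normalized offset-difference feature, the -1 background convention and the use of scikit-learn's random forest are assumptions of this sketch, not details taken from the abstract:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def depth_feature(depth, px, py, u, v, big=1e6):
        """Offset-difference feature scaled by 1/depth at the reference pixel so
        the response does not change with distance from the camera (assumed
        feature form; depth is assumed strictly positive)."""
        h, w = depth.shape
        d = depth[py, px]
        def probe(off):
            ox, oy = int(px + off[0] / d), int(py + off[1] / d)
            return depth[oy, ox] if (0 <= ox < w and 0 <= oy < h) else big
        return probe(u) - probe(v)

    def train_part_classifier(depth_images, part_labels, offsets, n_trees=3):
        """Fit a random forest mapping per-pixel features to body-part labels;
        `part_labels` uses -1 for background/unlabelled pixels."""
        X, y = [], []
        for depth, lab in zip(depth_images, part_labels):
            for py, px in zip(*np.nonzero(lab >= 0)):    # labelled foreground pixels
                X.append([depth_feature(depth, px, py, u, v) for (u, v) in offsets])
                y.append(lab[py, px])
        clf = RandomForestClassifier(n_estimators=n_trees, max_depth=20)
        clf.fit(np.asarray(X), np.asarray(y))
        return clf
    ```
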
  • Online domain adaptation of a pre-trained cascade of classifiers

    Publication Year: 2011 , Page(s): 577 - 584
    Cited by:  Papers (27)

    Many classifiers are trained with massive training sets only to be applied at test time on data from a different distribution. How can we rapidly and simply adapt a classifier to a new test distribution, even when we do not have access to the original training data? We present an on-line approach for rapidly adapting a “black box” classifier to a new test data set without retraining the classifier or examining the original optimization criterion. Assuming the original classifier outputs a continuous number for which a threshold gives the class, we reclassify points near the original boundary using a Gaussian process regression scheme. We show how this general procedure can be used in the context of a classifier cascade, demonstrating performance that far exceeds state-of-the-art results in face detection on a standard data set. We also draw connections to work in semi-supervised learning, domain adaptation, and information regularization.

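    A minimal sketch of the re-scoring idea: points whose black-box score falls near the threshold are re-predicted by a Gaussian process fit on the confidently classified points. The RBF kernel, the margin parameter and the ±1 targets are illustrative choices, not the paper's:

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def adapt_scores(features, scores, threshold=0.0, margin=0.5):
        """Re-score test points near the decision threshold with a GP regressor
        fit on the confidently classified points.
        features : (n, d) feature array, scores : (n,) black-box outputs."""
        near = np.abs(scores - threshold) < margin    # uncertain points to be re-labelled
        conf = ~near                                  # points the original classifier is sure about
        targets = np.where(scores[conf] > threshold, 1.0, -1.0)
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
        gp.fit(features[conf], targets)
        adapted = scores.copy()
        adapted[near] = threshold + gp.predict(features[near])   # smooth scores near the boundary
        return adapted
    ```
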
  • Learning effective human pose estimation from inaccurate annotation

    Publication Year: 2011 , Page(s): 1465 - 1472
    Cited by:  Papers (24)

    The task of 2-D articulated human pose estimation in natural images is extremely challenging due to the high level of variation in human appearance. These variations arise from different clothing, anatomy, imaging conditions and the large number of poses a human body can take. Recent work has shown state-of-the-art results by partitioning the pose space and using strong nonlinear classifiers, such that the pose dependence and multi-modal nature of body part appearance can be captured. We propose to extend these methods to handle much larger quantities of training data, an order of magnitude larger than current datasets, and show how to utilize Amazon Mechanical Turk and a latent annotation update scheme to achieve high quality annotations at low cost. We demonstrate a significant increase in pose estimation accuracy, while simultaneously reducing computational expense by a factor of 10, and contribute a dataset of 10,000 highly articulated poses.

  • Shape grammar parsing via Reinforcement Learning

    Publication Year: 2011 , Page(s): 2273 - 2280
    Cited by:  Papers (19)

    We address shape grammar parsing for facade segmentation using Reinforcement Learning (RL). Shape parsing entails simultaneously optimizing the geometry and the topology (e.g. number of floors) of the facade, so as to optimize the fit of the predicted shape with the responses of pixel-level 'terminal detectors'. We formulate this problem in terms of a Hierarchical Markov Decision Process, by employing a recursive binary split grammar. This allows us to use RL to efficiently find the optimal parse of a given facade in terms of our shape grammar. Building on the RL paradigm, we exploit state aggregation to speed up computation, and introduce image-driven exploration in RL to accelerate convergence. We achieve state-of-the-art results on facade parsing, with a significant speed-up compared to existing methods, and substantial robustness to initial conditions. We demonstrate that the method can also be applied to interactive segmentation, and to a broad variety of architectural styles.

  • Parameter learning with truncated message-passing

    Publication Year: 2011 , Page(s): 2937 - 2943
    Cited by:  Papers (4)

    Training of conditional random fields often takes the form of a double-loop procedure with message-passing inference in the inner loop. This can be very expensive, as the need to solve the inner loop to high accuracy can require many message-passing iterations. This paper seeks to reduce the expense of such training, by redefining the training objective in terms of the approximate marginals obtained after message-passing is “truncated” to a fixed number of iterations. An algorithm is derived to efficiently compute the exact gradient of this objective. On a common pixel labeling benchmark, this procedure improves training speeds by an order of magnitude, and slightly improves inference accuracy if a very small number of message-passing iterations are used at test time.

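    The training objective above is defined on the approximate marginals obtained after a fixed number of message-passing iterations; a dense NumPy sketch of computing such truncated beliefs on a small pairwise model (illustrative only, not the paper's code):

    ```python
    import numpy as np

    def truncated_beliefs(unary, edges, pairwise, T):
        """Approximate marginals after exactly T rounds of sum-product message
        passing on a pairwise model.

        unary    : (n, L) non-negative unary potentials
        edges    : list of (i, j) node-index pairs
        pairwise : (L, L) shared pairwise potential table
        """
        unary = np.asarray(unary, dtype=float)
        n, L = unary.shape
        msgs = {(i, j): np.ones(L) for (a, b) in edges for (i, j) in ((a, b), (b, a))}
        for _ in range(T):                            # truncate after T iterations
            new = {}
            for (i, j) in msgs:
                incoming = unary[i].copy()
                for (k, tgt) in msgs:                 # product of messages into i, except from j
                    if tgt == i and k != j:
                        incoming = incoming * msgs[(k, i)]
                m = pairwise.T @ incoming             # sum over the states of node i
                new[(i, j)] = m / m.sum()
            msgs = new
        beliefs = unary.copy()
        for (i, j) in msgs:                           # belief = unary times all incoming messages
            beliefs[j] = beliefs[j] * msgs[(i, j)]
        return beliefs / beliefs.sum(axis=1, keepdims=True)
    ```
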
  • Structured light 3D scanning in the presence of global illumination

    Publication Year: 2011 , Page(s): 713 - 720
    Cited by:  Papers (10)  |  Patents (1)

    Global illumination effects such as inter-reflections, diffusion and sub-surface scattering severely degrade the performance of structured light-based 3D scanning. In this paper, we analyze the errors caused by global illumination in structured light-based shape recovery. Based on this analysis, we design structured light patterns that are resilient to individual global illumination effects using simple logical operations and tools from combinatorial mathematics. Scenes exhibiting multiple phenomena are handled by combining results from a small ensemble of such patterns. This combination also allows us to detect any residual errors that are corrected by acquiring a few additional images. Our techniques do not require explicit separation of the direct and global components of scene radiance and hence work even in scenarios where the separation fails or the direct component is too low. Our methods can be readily incorporated into existing scanning systems without significant overhead in terms of capture time or hardware. We show results on a variety of scenes with complex shape and material properties and challenging global illumination effects.

  • Sparse shape composition: A new framework for shape prior modeling

    Publication Year: 2011 , Page(s): 1025 - 1032
    Cited by:  Papers (8)

    Image appearance cues are often used to derive object shapes, which is usually one of the key steps of image understanding tasks. However, when image appearance cues are weak or misleading, shape priors become critical to infer and refine the shape derived from these appearance cues. Effective modeling of shape priors is challenging because: 1) shape variation is complex and cannot always be modeled by a parametric probability distribution; 2) a shape instance derived from image appearance cues (input shape) may have gross errors; and 3) local details of the input shape are difficult to preserve if they are not statistically significant in the training data. In this paper we propose a novel Sparse Shape Composition model (SSC) to deal with these three challenges in a unified framework. In our method, training shapes are adaptively composed to infer/refine an input shape. The a priori information is thus implicitly incorporated on-the-fly. Our model leverages two sparsity observations about the input shape instance: 1) the input shape can be approximately represented by a sparse linear combination of training shapes; 2) parts of the input shape may contain gross errors, but such errors are usually sparse. Using an L1-norm relaxation, our model is formulated as a convex optimization problem, which is solved by an efficient alternating minimization framework. Our method is extensively validated on two real-world medical applications, 2D lung localization in X-ray images and 3D liver segmentation in low-dose CT scans. Compared to state-of-the-art methods, our model exhibits better performance in both studies.

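    A compact sketch of the convex model described above (a sparse combination of training shapes plus a sparse gross-error term), solved here with plain proximal gradient (ISTA) rather than the paper's alternating minimization framework; names and regularization weights are illustrative:

    ```python
    import numpy as np

    def soft(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def sparse_shape_composition(y, D, lam_w=0.1, lam_e=0.1, n_iter=500):
        """Approximate the input shape y by D @ w + e with sparse w (few training
        shapes used) and sparse e (gross errors), via ISTA.

        y : (d,) pre-aligned input shape vector
        D : (d, n) matrix whose columns are aligned training shapes
        """
        d, n = D.shape
        w, e = np.zeros(n), np.zeros(d)
        step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1.0)   # 1 / Lipschitz constant of the gradient
        for _ in range(n_iter):
            r = D @ w + e - y                            # residual of the data term
            w = soft(w - step * (D.T @ r), step * lam_w)
            e = soft(e - step * r, step * lam_e)
        return D @ w, w, e                               # refined shape, coefficients, gross errors
    ```
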
  • Entropy rate superpixel segmentation

    Publication Year: 2011 , Page(s): 2097 - 2104
    Cited by:  Papers (37)

    We propose a new objective function for superpixel segmentation. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. We present a novel graph construction for images and show that this construction induces a matroid - a combinatorial structure that generalizes the concept of linear independence in vector spaces. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of ½ for the optimality of the solution. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.

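    The optimization ingredient the abstract relies on is greedy maximization of a monotone submodular objective; a generic sketch with a toy coverage function standing in for the entropy rate plus balancing term, and a simple cardinality bound standing in for the matroid constraint:

    ```python
    def greedy_submodular(candidates, gain, max_selected):
        """Repeatedly add the element with the largest marginal gain of the
        (monotone submodular) set function `gain`, up to `max_selected` picks."""
        selected, remaining = set(), set(candidates)
        while remaining and len(selected) < max_selected:
            best = max(remaining, key=lambda c: gain(selected | {c}) - gain(selected))
            if gain(selected | {best}) <= gain(selected):
                break                                 # no element adds positive gain
            selected.add(best)
            remaining.remove(best)
        return selected

    # Toy objective: coverage of pixel neighbourhoods (monotone and submodular).
    neighbourhoods = {0: {1, 2}, 1: {2, 3}, 2: {4, 5}, 3: {5, 6, 7}}

    def coverage(S):
        covered = set()
        for e in S:
            covered |= neighbourhoods[e]
        return len(covered)

    print(greedy_submodular(neighbourhoods.keys(), coverage, max_selected=2))  # e.g. {3, 0}
    ```
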
  • Coupled information-theoretic encoding for face photo-sketch recognition

    Publication Year: 2011 , Page(s): 513 - 520
    Cited by:  Papers (31)

    Automatic face photo-sketch recognition has important applications for law enforcement. Recent research has focused on transforming photos and sketches into the same modality for matching or developing advanced classification algorithms to reduce the modality gap between features extracted from photos and sketches. In this paper, we propose a new inter-modality face recognition approach by reducing the modality gap at the feature extraction stage. A new face descriptor based on coupled information-theoretic encoding is used to capture discriminative local face structures and to effectively match photos and sketches. Guided by maximizing the mutual information between photos and sketches in the quantized feature spaces, the coupled encoding is achieved by the proposed coupled information-theoretic projection tree, which is extended to the randomized forest to further boost the performance. We create the largest face sketch database, including sketches of 1,194 people from the FERET database. Experiments on this large-scale dataset show that our approach significantly outperforms the state-of-the-art methods.

  • Affinity learning on a tensor product graph with applications to shape and image retrieval

    Publication Year: 2011 , Page(s): 2369 - 2376
    Cited by:  Papers (15)

    As observed in several recent publications, improved retrieval performance is achieved when pairwise similarities between the query and the database objects are replaced with more global affinities that also consider the relations among the database objects. This is commonly achieved by propagating the similarity information in a weighted graph representing the database and query objects. Instead of propagating the similarity information on the original graph, we propose to utilize the tensor product graph (TPG) obtained by the tensor product of the original graph with itself. By virtue of this construction, not only local but also long-range similarities among graph nodes are explicitly represented as higher order relations, making it possible to better reveal the intrinsic structure of the data manifold. In addition, we improve the local neighborhood structure of the original graph in a preprocessing stage. We illustrate the benefits of the proposed approach on shape and image ranking and retrieval tasks. We are able to achieve a bull's eye retrieval score of 99.99% on the MPEG-7 shape dataset, which is much higher than that of state-of-the-art algorithms.

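    A toy sketch of propagating affinities on the tensor product graph: build W ⊗ W explicitly for a small graph, diffuse, and map the result back to pairwise affinities. The vec(identity) start, the damping factor alpha and the final reshape are choices of this sketch, not taken from the paper:

    ```python
    import numpy as np

    def tpg_affinities(W, n_steps=50, alpha=0.5):
        """Diffuse similarities on the tensor product graph of a small affinity
        matrix W with itself, then collapse back to an n x n affinity matrix."""
        n = W.shape[0]
        S = W / W.sum(axis=1, keepdims=True)          # row-normalized transition matrix
        K = np.kron(S, S)                             # tensor product graph on n*n node pairs
        q = np.eye(n).flatten()                       # start from vec(I): each node similar to itself
        acc = q.copy()
        for _ in range(n_steps):
            q = alpha * (K @ q)                       # walks of increasing length on the TPG
            acc += q
        return acc.reshape(n, n)                      # learned pairwise affinities
    ```
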
  • Scene shape from texture of objects

    Publication Year: 2011 , Page(s): 2017 - 2024
    Cited by:  Papers (2)

    Joint reasoning about objects and 3D scene layout has shown great promise in scene interpretation. One visual cue that has been overlooked is texture arising from a spatial repetition of objects in the scene (e.g., windows of a building). Such texture provides scene-specific constraints among objects, and thus facilitates scene interpretation. We present an approach to: (1) detecting distinct textures of objects in a scene, (2) reconstructing the 3D shape of detected texture surfaces, and (3) combining object detections and shape-from-texture toward a globally consistent scene interpretation. Inference is formulated within the reinforcement learning framework as a sequential interpretation of image regions, starting from confident regions to guide the interpretation of other regions. Our algorithm finds an optimal policy that maps states of detected objects and reconstructed surfaces to actions which ought to be taken in those states, including detecting new objects and identifying new textures, so as to minimize a long-term loss. Tests against ground truth obtained from stereo images demonstrate that we can coarsely reconstruct a 3D model of the scene from a single image, without learning the layout of common scene surfaces, as done in prior work. We also show that reasoning about texture of objects improves object detection.

  • What makes a chair a chair?

    Publication Year: 2011 , Page(s): 1529 - 1536
    Cited by:  Papers (18)

    Many object classes are primarily defined by their functions. However, this fact has been left largely unexploited by visual object categorization or detection systems. We propose a method to learn an affordance detector. It identifies locations in 3D space which “support” the particular function. Our novel approach “imagines” an actor performing an action typical for the target object class, instead of relying purely on the visual object appearance. So, function is handled as a cue complementary to appearance, rather than being a consideration after appearance-based detection. Experimental results are given for the functional category “sitting”. Such affordance is tested on a 3D representation of the scene, as can be realistically obtained through SfM or depth cameras. In contrast to appearance-based object detectors, affordance detection requires only very few training examples and generalizes very well to other sittable objects like benches or sofas when trained on a few chairs.

  • Recovery of corrupted low-rank matrices via half-quadratic based nonconvex minimization

    Publication Year: 2011 , Page(s): 2889 - 2896
    Cited by:  Papers (7)

    Recovering arbitrarily corrupted low-rank matrices arises in computer vision applications, including bioinformatic data analysis and visual tracking. The methods used involve minimizing a combination of the nuclear norm and the l1 norm. We show that by replacing the l1 norm on error items with nonconvex M-estimators, exact recovery of densely corrupted low-rank matrices is possible. The robustness of the proposed method is guaranteed by M-estimator theory. The multiplicative form of half-quadratic optimization is used to simplify the nonconvex optimization problem so that it can be efficiently solved by an iterative regularization scheme. Simulation results corroborate our claims and demonstrate the efficiency of our proposed method under tough conditions.

  • Image ranking and retrieval based on multi-attribute queries

    Publication Year: 2011 , Page(s): 801 - 808
    Cited by:  Papers (35)

    We propose a novel approach for ranking and retrieval of images based on multi-attribute queries. Existing image retrieval methods train separate classifiers for each word and heuristically combine their outputs for retrieving multiword queries. Moreover, these approaches also ignore the interdependencies among the query terms. In contrast, we propose a principled approach for multi-attribute retrieval which explicitly models the correlations that are present between the attributes. Given a multi-attribute query, we also utilize other attributes in the vocabulary which are not present in the query, for ranking/retrieval. Furthermore, we integrate ranking and retrieval within the same formulation, by posing them as structured prediction problems. Extensive experimental evaluation on the Labeled Faces in the Wild (LFW), FaceTracer and PASCAL VOC datasets shows that our approach significantly outperforms several state-of-the-art ranking and retrieval methods.

  • Contextualizing object detection and classification

    Publication Year: 2011 , Page(s): 1585 - 1592
    Cited by:  Papers (39)

    In this paper, we investigate how to iteratively and mutually boost object classification and detection by taking the outputs from one task as the context for the other. First, instead of intuitive feature and context concatenation or post-processing with context, the so-called Contextualized Support Vector Machine (Context-SVM) is proposed, in which the context dynamically adjusts the classification hyperplane, yielding a context-adaptive classifier. Then, an iterative training procedure is presented. In each step, Context-SVM, associated with the output context from one task (object classification or detection), is instantiated to boost the performance of the other task, whose augmented outputs are then further used to improve the former task by Context-SVM. The proposed solution is evaluated on the object classification and detection tasks of the PASCAL Visual Object Challenge (VOC) 2007 and 2010, and achieves state-of-the-art performance.

  • A fully automated greedy square jigsaw puzzle solver

    Publication Year: 2011 , Page(s): 9 - 16
    Cited by:  Papers (11)

    In the square jigsaw puzzle problem one is required to reconstruct the complete image from a set of non-overlapping, unordered, square puzzle parts. Here we propose a fully automatic solver for this problem, which, unlike some previous work, assumes no clues regarding parts' locations and requires no prior knowledge about the original image or its simplified (e.g., lower resolution) versions. To do so, we introduce a greedy solver which combines both informed piece placement and rearrangement of puzzle segments to find the final solution. Among our other contributions are new compatibility metrics which better predict the chances that two given parts are neighbors, and a novel estimation measure which evaluates the quality of puzzle solutions without the need for ground-truth information. Incorporating these contributions, our approach facilitates solutions that surpass state-of-the-art solvers on puzzles of sizes larger than previously attempted.

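    For context, a baseline pairwise compatibility of the kind the abstract improves on: sum of squared differences across the abutting edges of two square parts, plus a greedy pick of the most compatible neighbour (illustrative only; not the paper's metrics):

    ```python
    import numpy as np

    def edge_compatibility(part_a, part_b):
        """SSD between the right edge of part_a and the left edge of part_b,
        where parts are (H, W, 3) image arrays."""
        right = part_a[:, -1, :].astype(float)
        left = part_b[:, 0, :].astype(float)
        return float(np.sum((right - left) ** 2))

    def best_right_neighbour(part, candidates):
        """Greedy choice of the candidate most compatible with `part` on its right."""
        return int(np.argmin([edge_compatibility(part, c) for c in candidates]))
    ```
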
  • Learning photographic global tonal adjustment with a database of input / output image pairs

    Publication Year: 2011 , Page(s): 97 - 104
    Cited by:  Papers (7)

    Adjusting photographs to obtain compelling renditions requires skill and time. Even contrast and brightness adjustments are challenging because they require taking into account the image content. Photographers are also known for having different retouching preferences. As a result of this complexity, rule-based, one-size-fits-all automatic techniques often fail. This problem can greatly benefit from supervised machine learning, but the lack of training data has impeded work in this area. Our first contribution is the creation of a high-quality reference dataset. We collected 5,000 photos, manually annotated them, and hired 5 trained photographers to retouch each picture. The result is a collection of 5 sets of 5,000 example input-output pairs that enable supervised learning. We first use this dataset to predict a user's adjustment from a large training set. We then show that our dataset and features enable accurate adjustment personalization using a carefully chosen set of training photos. Finally, we introduce difference learning: this method models and predicts differences between users. It frees the user from using predetermined photos for training. We show that difference learning enables accurate prediction using only a handful of examples.
