Skip to Main Content
There has been recent progress on the problem of recognizing specific objects in very large datasets. The most common approach has been based on the bag-of-words (BOW) method, in which local image features are clustered into visual words. This can provide significant savings in memory compared to storing and matching each feature independently. In this paper we take an additional step to reducing memory requirements by selecting only a small subset of the training features to use for recognition. This is based on the observation that many local features are unreliable or represent irrelevant clutter. We are able to select Â¿usefulÂ¿ features, which are both robust and distinctive, by an unsupervised preprocessing step that identifies correctly matching features among the training images. We demonstrate that this selection approach allows an average of 4% of the original features per image to provide matching performance that is as accurate as the full set. In addition, we employ a graph to represent the matching relationships between images. Doing so enables us to effectively augment the feature set for each image through merging of useful features of neighboring images. We demonstrate adjacent and 2-adjacent augmentation, both of which give a substantial boost in performance.