Recently, the bag-of-words (BoW) model has led to many significant results in visual object classification. However, because visual words have limited descriptive and discriminative power, the resulting classification performance still falls short of that of its counterpart in the text domain, i.e. document categorization. Furthermore, for weakly labeled image data, where we know only whether an object is present or not, traditional learning-based methods may suffer from background clutter and large appearance variations. To address these issues, we propose a novel visual-phrase-based Multiple Instance Learning (MIL) method. In this method, visual phrases are first generated from over-segmented image regions of homogeneous appearance and the visual words within each region, which may enhance descriptive ability by enforcing spatial coherency. A MIL algorithm is then applied to learn efficiently from the weakly labeled image data. Experiments on benchmark datasets show that the proposed method consistently and significantly outperforms several state-of-the-art algorithms, such as Spatial Pyramid Matching (SPM) and Spatial-LTM.
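To make the two stages concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each local feature has already been quantized to a visual word index and assigned to a segmentation region, and it stands in for the visual phrase with a simple per-region word histogram. The bag score uses the standard MIL assumption (a bag is positive if at least one instance is positive) with a hypothetical linear instance scorer. The names `region_histograms` and `bag_score` and all parameters are illustrative assumptions.

```python
from collections import Counter, defaultdict


def region_histograms(word_ids, region_ids, vocab_size):
    """Group visual words by region and return one L1-normalized
    word histogram per region (an assumed stand-in for a visual phrase)."""
    if len(word_ids) != len(region_ids):
        raise ValueError("word_ids and region_ids must have equal length")
    counts = defaultdict(Counter)
    for word, region in zip(word_ids, region_ids):
        counts[region][word] += 1
    histograms = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        histograms[region] = [counter[w] / total for w in range(vocab_size)]
    return histograms


def bag_score(instances, weights, bias=0.0):
    """Score an image (bag) as the maximum linear score over its regions
    (instances), following the standard MIL assumption."""
    return max(
        sum(w * x for w, x in zip(weights, instance)) + bias
        for instance in instances
    )


# Toy image: four local features, three visual words, two regions.
hists = region_histograms([0, 1, 1, 2], [0, 0, 1, 1], vocab_size=3)
score = bag_score(list(hists.values()), weights=[1.0, 0.0, -1.0])
```

Under this assumption, a single region that scores highly is enough to label the whole image positive, which is why background regions in a weakly labeled image need not be annotated.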