In this paper, we study computational models and techniques for merging textual and image features to classify multimedia documents into semantically meaningful groups. A vector-based framework indexes documents on the basis of textual, pictorial, and composite (textual-pictorial) information. The scheme combines weighted document terms with color-invariant image features to obtain a high-dimensional image descriptor in vector form, which serves as the index. A classifier trained by supervised learning then organizes the multimedia documents. Due to space limitations, we focus on one application: classifying and finding pictures of people on the Internet. We report performance evaluations of the classification accuracy obtained by merging textual and pictorial information.
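The composite indexing idea can be illustrated with a minimal sketch. It is not the paper's implementation: the TF-IDF term weighting, the normalized-rgb (chromaticity) histogram used as the color-invariant feature, the mixing weight `alpha`, the nearest-neighbor classifier, and all function names and toy data are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency per term (assumed weighting scheme)."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in sorted(df.items())}


def tfidf(doc, idf):
    """Weighted term vector for one tokenized document; unknown terms are ignored."""
    tf = Counter(doc)
    return [tf[t] * idf[t] for t in idf]


def normalized_rgb_histogram(pixels, bins=4):
    """Histogram over chromaticity (r, g) = (R, G) / (R + G + B).

    Chromaticity is unchanged when R, G, B are scaled by a common factor,
    so the histogram is invariant to uniform intensity changes.
    """
    hist = [0.0] * (bins * bins)
    for R, G, B in pixels:
        s = R + G + B
        if s == 0:
            continue  # black pixels carry no chromaticity information
        r = min(int(R / s * bins), bins - 1)
        g = min(int(G / s * bins), bins - 1)
        hist[r * bins + g] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def composite(text_vec, image_vec, alpha=0.5):
    """Concatenate the normalized text and image parts; alpha weights the text part."""
    text_part = [alpha * x for x in l2_normalize(text_vec)]
    image_part = [(1 - alpha) * x for x in l2_normalize(image_vec)]
    return text_part + image_part


def nearest_neighbor(query, train_vecs, train_labels):
    """Label of the training vector closest to the query (squared Euclidean distance)."""
    def dist(k):
        return sum((a - b) ** 2 for a, b in zip(query, train_vecs[k]))
    return train_labels[min(range(len(train_vecs)), key=dist)]


# Toy labeled documents: (tokens, pixels, label). All values are made up.
train = [
    (["portrait", "face", "smile"], [(200, 150, 120)] * 4, "people"),
    (["mountain", "lake", "sky"], [(50, 100, 200)] * 4, "other"),
]
idf = fit_idf([tokens for tokens, _, _ in train])
train_vecs = [
    composite(tfidf(tokens, idf), normalized_rgb_histogram(pixels))
    for tokens, pixels, _ in train
]
train_labels = [label for _, _, label in train]

# Classify an unseen document from its text and its image together.
query = composite(
    tfidf(["face", "portrait"], idf),
    normalized_rgb_histogram([(180, 140, 110)] * 4),
)
prediction = nearest_neighbor(query, train_vecs, train_labels)
print(prediction)
```

Setting `alpha` to 1.0 or 0.0 in this sketch reduces the composite index to a text-only or image-only index, which is one simple way to compare the three kinds of information the abstract mentions.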