Content-based multimedia retrieval faces many challenges, such as the semantic gap, imbalanced data, and the varying quality of media. Feature selection, as a component of the retrieval process, plays an important role: its aim is to identify a subset of features by removing irrelevant or redundant ones. An effective feature subset can not only improve model performance and reduce computational complexity but also enhance semantic interpretability. To achieve these objectives, this paper proposes a novel metric that scores features for feature selection by integrating the correlation and reliability information between each feature and each class, as obtained from Multiple Correspondence Analysis (MCA). Based on these scores, a ranked list of features can be generated, and different selection criteria can then be applied to select a subset of features. To evaluate the proposed framework, it is compared with four other well-known feature selection methods, namely information gain, the chi-square measure, correlation-based feature selection, and Relief, over five popular classifiers using the benchmark data from the TRECVID 2009 high-level feature extraction task. The results show that the proposed method outperforms the other methods in terms of classification accuracy, the size of the feature subspace, and the ability to capture semantic information.
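The ranking-then-selection step can be illustrated with a minimal sketch. It assumes each feature already has a score; the abstract does not give the MCA-based formula, so the score values below are hypothetical placeholders and not outputs of the paper's metric. The helper names (`rank_features`, `select_top_k`, `select_above_threshold`, `select_above_mean`) and the three criteria shown (top-k, fixed threshold, above-mean) are illustrative choices, not necessarily the criteria used in the paper.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> List[str]:
    """Return feature names sorted by descending score; ties are broken by name."""
    return sorted(scores, key=lambda name: (-scores[name], name))


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Hypothetical criterion 1: keep the k highest-ranked features."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return rank_features(scores)[:k]


def select_above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Hypothetical criterion 2: keep every feature whose score reaches a fixed threshold."""
    return [name for name in rank_features(scores) if scores[name] >= threshold]


def select_above_mean(scores: Dict[str, float]) -> List[str]:
    """Hypothetical criterion 3: keep features scoring at least the mean score."""
    mean = sum(scores.values()) / len(scores)
    return select_above_threshold(scores, mean)


if __name__ == "__main__":
    # Placeholder scores standing in for the (unspecified) MCA-based metric.
    example_scores = {"f1": 0.82, "f2": 0.15, "f3": 0.64, "f4": 0.40, "f5": 0.64}
    print(rank_features(example_scores))
    print(select_top_k(example_scores, 2))
    print(select_above_mean(example_scores))
```

Separating the ranking from the selection criterion mirrors the framework described above: the same ranked list can be cut in different ways without recomputing any scores.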