Due to the barrier of visual polysemy, videos and images may be annotated with multiple tags. Discovering the correlations among different tags can significantly help predict precise labels for videos and images. Many recent studies on multi-label learning construct a linear subspace embedding that encodes the multi-label information, such that data points sharing many common labels tend to be close to each other in the embedded subspace. Motivated by advances in compressive sensing research, a sparse representation that selects a compact subset of the input data for its description can be more discriminative. In this paper, we propose a sparse multi-label learning method to circumvent the visually polysemous barrier of multiple tags. Our approach learns a multi-label-encoded sparse linear embedding space from a related dataset, and maps the target data into the learned representation space to achieve better annotation performance. Instead of using the l1-norm penalty (lasso) to induce sparse representations, we propose to formulate multi-label learning as a penalized least squares optimization problem with an elastic-net penalty. By casting the video concept detection and image annotation tasks into a sparse multi-label transfer learning framework, we thoroughly explore and compare ridge regression, the lasso, the elastic net, and multi-label extensions of sparse discriminant analysis.
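The three penalties contrasted above differ in how they shrink regression coefficients: ridge applies an l2 penalty (shrinkage without exact zeros), the lasso applies an l1 penalty (driving many coefficients exactly to zero), and the elastic net mixes both. The following sketch is not the paper's method; it is a minimal illustration of these penalized least squares formulations on synthetic data, using scikit-learn and an assumed sparse ground-truth weight vector, to show why the l1-based penalties yield the compact, sparse representations the abstract refers to.

```python
# Minimal sketch (illustrative, not the paper's implementation) contrasting
# the three penalized least squares models named in the abstract.
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]              # assumption: only 3 informative features
y = X @ true_w + 0.1 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)          # l2 penalty: shrinks, no exact zeros
lasso = Lasso(alpha=0.1).fit(X, y)          # l1 penalty: sparse coefficients
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mix of l1 and l2

# Count near-zero coefficients: the l1-based models select a compact subset.
sparsity = {name: int(np.sum(np.abs(m.coef_) < 1e-6))
            for name, m in [("ridge", ridge), ("lasso", lasso), ("enet", enet)]}
print(sparsity)
```

On this synthetic data the lasso and elastic net zero out most of the 17 uninformative coefficients, while ridge keeps all 20 nonzero, which is the discriminative-sparsity argument the paper builds on.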