In this paper we consider two traditional metrics for evaluating performance in automatic image annotation, the normalised score (NS) and the precision/recall (PR) statistics, particularly in connection with a de facto standard benchmark annotation task of 5000 Corel images. We also motivate and describe a third performance measure, de-symmetrised termwise mutual information (DTMI), as a principled compromise between the two traditional extremes. In addition to discussing the measures theoretically, we correlate them experimentally for a family of annotation system configurations derived from the PicSOM image content analysis framework. The obtained performance figures show that a system of this kind, based on the fusion of numerous global image features, clearly outperforms the methods from the literature that we consider.