While a listener may derive semantic associations for audio clips from direct auditory cues (e.g., hearing “bass guitar”) as well as from “context” (e.g., inferring “bass guitar” in the context of a “rock” song), most state-of-the-art systems for automatic music annotation ignore this context. Indeed, although contextual relationships correlate tags, many auto-taggers model tags independently. This paper presents a novel, generative approach to improve automatic music annotation by modeling contextual relationships between tags. A Dirichlet mixture model (DMM) is proposed as a second stage in the modeling process, to supplement any auto-tagging system that generates a semantic multinomial (SMN) over a vocabulary of tags when annotating a song. For each tag in the vocabulary, a DMM captures the broader context the tag defines by modeling tag co-occurrence patterns in the SMNs of songs associated with the tag. When annotating songs, the DMMs refine SMN annotations by leveraging contextual evidence. Experimental results demonstrate the benefits of combining a variety of auto-taggers with this generative context model, which also generally outperforms other approaches to modeling context.
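To make the two-stage idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that each tag's DMM is a weighted mixture of Dirichlet densities over the SMN simplex, and that a refined SMN can be obtained by scoring the first-stage SMN under every tag's DMM and normalizing the scores (Bayes' rule with a uniform tag prior). The abstract does not specify the refinement rule, so that step is an assumption. The function names (`dirichlet_logpdf`, `dmm_loglik`, `refine_smn`), the three-tag vocabulary, and all parameter values are hypothetical and hand-picked for illustration, not learned from data.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a point x in the interior of the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


def dmm_loglik(x, weights, alphas):
    """Log-likelihood of x under a Dirichlet mixture, using log-sum-exp for stability."""
    terms = [math.log(w) + dirichlet_logpdf(x, a) for w, a in zip(weights, alphas)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def refine_smn(smn, tag_dmms, eps=1e-6):
    """Score a first-stage SMN under each tag's DMM and normalize the scores.

    Assumption (not stated in the abstract): a uniform prior over tags, so the
    refined weight of a tag is proportional to its DMM likelihood.
    """
    # Smooth and renormalize so that log(0) never occurs.
    smoothed = [p + eps for p in smn]
    total = sum(smoothed)
    x = [p / total for p in smoothed]

    logliks = [dmm_loglik(x, w, a) for (w, a) in tag_dmms]
    m = max(logliks)
    scores = [math.exp(l - m) for l in logliks]
    z = sum(scores)
    return [s / z for s in scores]


# Hypothetical vocabulary: ["rock", "bass guitar", "piano"].
# Each entry is (mixture weights, list of Dirichlet parameter vectors).
tag_dmms = [
    ([0.5, 0.5], [[8.0, 2.0, 1.0], [6.0, 4.0, 1.0]]),  # context of "rock"
    ([1.0], [[5.0, 6.0, 1.0]]),                        # context of "bass guitar"
    ([1.0], [[1.0, 1.0, 9.0]]),                        # context of "piano"
]

smn = [0.6, 0.3, 0.1]  # first-stage SMN from some auto-tagger
refined = refine_smn(smn, tag_dmms)
```

In this toy setup the first-stage SMN puts most of its mass on "rock" and "bass guitar", so the "piano" DMM, which expects piano-dominated SMNs, assigns it a low likelihood and "piano" receives little weight in the refined SMN. In practice the mixture parameters would be estimated from the SMNs of training songs associated with each tag.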