Skip to Main Content
In this paper we describe a novel approach to applying text-based information retrieval techniques to music collections. We represent tracks with a joint vocabulary consisting of both conventional words, drawn from social tags, and audio muswords, representing characteristics of automatically-identified regions of interest within the signal. We build vector space and latent aspect models indexing words and muswords for a collection of tracks, and show experimentally that retrieval with these models is extremely well-behaved. We find in particular that retrieval performance remains good for tracks by artists unseen by our models in training, and even if tags for their tracks are extremely sparse.