Building models of the structure in musical signals raises the question of how to evaluate and compare different modeling approaches. One possibility is to use the model to impute deliberately removed patches of missing data, then to compare the model's predictions with the part that was removed. We analyze a corpus of popular music audio represented as beat-synchronous chroma features, and compare imputation based on simple linear prediction to more complex models including nearest-neighbor selection and shift-invariant probabilistic latent component analysis. Simple linear models perform best according to Euclidean distance, despite producing stationary results that are not musically meaningful. We therefore investigate alternative evaluation measures and observe that an entropy-difference metric correlates better with our expectations for musically consistent reconstructions. Under this measure, the best-performing imputation algorithm reconstructs masked sections by choosing the nearest neighbor to the surrounding observations within the song. This result is consistent with the large amount of repetition found in pop music.
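To make the two central ideas concrete, here is a minimal sketch of within-song nearest-neighbor imputation and an entropy-difference score. This is an illustration under our own assumptions, not the paper's implementation: the function names, the context-window size, and the patch-level entropy normalization are all choices made for this example.

```python
import numpy as np


def impute_nearest_neighbor(chroma, start, length, context=4):
    """Fill the masked beats chroma[:, start:start+length] by copying the
    segment elsewhere in the song whose surrounding beats best match the
    beats surrounding the mask (Euclidean distance on the context)."""
    n_beats = chroma.shape[1]
    # Observed context on either side of the masked region.
    target = np.concatenate([chroma[:, start - context:start],
                             chroma[:, start + length:start + length + context]],
                            axis=1)
    best_s, best_dist = None, np.inf
    for s in range(context, n_beats - length - context):
        # Skip candidates whose segment or context overlaps the mask.
        if abs(s - start) < length + context:
            continue
        cand = np.concatenate([chroma[:, s - context:s],
                               chroma[:, s + length:s + length + context]],
                              axis=1)
        d = np.linalg.norm(cand - target)
        if d < best_dist:
            best_dist, best_s = d, s
    return chroma[:, best_s:best_s + length].copy()


def patch_entropy(patch, eps=1e-12):
    """Shannon entropy of a chroma patch normalized to a distribution."""
    p = patch / (patch.sum() + eps)
    return -np.sum(p * np.log2(p + eps))


def entropy_difference(original, reconstruction):
    """Absolute entropy difference: low when the reconstruction is about
    as 'busy' as the original, even if not bin-for-bin identical."""
    return abs(patch_entropy(original) - patch_entropy(reconstruction))
```

On a perfectly repetitive song (the limiting case of the repetition the abstract points to), the nearest-neighbor fill recovers the masked patch exactly, and the entropy difference is zero; a stationary (e.g. constant) fill would instead show a large entropy gap even when its Euclidean distance is modest.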