We investigate a data-driven approach to the analysis and transcription of polyphonic music, using a probabilistic model that finds sparse linear decompositions of a sequence of short-term Fourier spectra. The resulting system represents each input spectrum as a weighted sum of a small number of "atomic" spectra chosen from a larger dictionary; this dictionary is, in turn, learned from the data so as to represent the given training set in an information-theoretically efficient way. When exposed to examples of polyphonic music, most of the dictionary elements take on the spectral characteristics of individual notes in the music, so the sparse decomposition can be used to identify the notes in a polyphonic mixture. Our approach differs from other methods of polyphonic analysis based on spectral decomposition by combining all of the following: a) a formulation in terms of an explicitly given probabilistic model, in which the process of estimating which notes are present corresponds naturally to the inference of latent variables in the model; b) a particularly simple generative model, motivated by very general considerations about efficient coding, that makes very few assumptions about the musical origins of the signals being processed; and c) the ability to learn a dictionary of atomic spectra (most of which converge to harmonic spectral profiles associated with specific notes) from polyphonic examples alone, with no separate training on monophonic examples required.
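The decomposition described above, each spectrum approximated as a sparse, non-negative weighted sum of dictionary atoms with the dictionary itself learned from mixtures, can be illustrated with the following minimal sketch. It is not the authors' probabilistic model: as an assumption, it substitutes non-negative sparse coding with an L1 penalty on the activations, solved by NMF-style multiplicative updates, and it uses NumPy. The toy "note" spectra come from a hypothetical helper, `harmonic_template`; the names `infer_activations` and `learn_dictionary` are likewise illustrative, not from the paper.

```python
import numpy as np


def harmonic_template(f0_bin, n_bins, n_harmonics=5, width=0.5):
    """Hypothetical toy 'note' spectrum: Gaussian peaks at integer multiples
    of f0_bin with amplitudes decaying as 1/h, normalised to unit L2 norm."""
    bins = np.arange(n_bins)
    s = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        s += np.exp(-0.5 * ((bins - h * f0_bin) / width) ** 2) / h
    return s / np.linalg.norm(s)


def infer_activations(X, W, lam=0.01, n_iter=500, eps=1e-12):
    """Sparse coding with a FIXED dictionary W (F x K): minimise
    0.5*||X - W H||^2 + lam*sum(H) subject to H >= 0, using multiplicative
    updates in which the L1 weight lam appears in the denominator."""
    X = X.reshape(W.shape[0], -1)          # accept one spectrum or F x T
    H = np.full((W.shape[1], X.shape[1]), 0.1)
    WtX, WtW = W.T @ X, W.T @ W
    for _ in range(n_iter):
        H *= WtX / (WtW @ H + lam + eps)
    return H


def learn_dictionary(X, K, lam=0.01, n_iter=1000, seed=0, eps=1e-12):
    """Learn W (F x K) and H (K x T) from the mixtures X alone by alternating
    multiplicative updates. Columns of W are renormalised to unit norm (with H
    rescaled so W @ H is unchanged), so the penalty can't be dodged by scaling."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.uniform(0.1, 1.0, (F, K))
    W /= np.linalg.norm(W, axis=0)
    H = rng.uniform(0.1, 1.0, (K, T))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_bins, T = 64, 200
    # Three synthetic "notes" (fundamentals at bins 5, 7, 11), one per column.
    D = np.stack([harmonic_template(f0, n_bins) for f0 in (5, 7, 11)], axis=1)
    # Sparse note activity: each note sounds in roughly 30% of frames.
    S = rng.uniform(0.5, 1.5, (3, T)) * (rng.random((3, T)) < 0.3)
    X = D @ S                                # polyphonic "spectrogram"
    W, H = learn_dictionary(X, K=5)          # dictionary larger than 3 notes
    print("relative error:", np.linalg.norm(X - W @ H) / np.linalg.norm(X))
    # With a known dictionary, decode a chord of notes 0 and 2.
    x = 1.0 * D[:, 0] + 0.8 * D[:, 2]
    print("activations:", infer_activations(x, D, n_iter=2000).ravel().round(3))
```

Penalising the sum of the activations is a common convex stand-in for a sparsity-inducing prior. In the chord example, the atom for the absent note is driven towards zero while the two sounding notes receive weights close to 1.0 and 0.8, slightly shrunk by the penalty.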