A key processing step in music-to-score alignment systems is the estimation of the instantaneous match between an audio observation and the score. We propose a general formulation of this matching measure, using a linear transformation from the symbolic domain to any time-frequency representation of the audio. We investigate learning this mapping for several common audio representations, based on a best-fit criterion. We evaluate the effectiveness of the mapping approach with two different alignment systems, on a large database of popular and classical polyphonic music. The results show that the learning procedure significantly improves the precision of the alignments compared with the heuristic templates commonly used in the literature.
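As a rough illustration of the idea, the sketch below fits a linear mapping from a symbolic representation to an audio representation by ordinary least squares, then scores one audio frame against one score state. The abstract does not specify the symbolic encoding, the exact best-fit criterion, or the matching measure, so the binary pitch-activation matrix `S`, the Frobenius-norm fit, the cosine similarity, and the function names `learn_mapping` and `match_score` are all assumptions made for this example and not the authors' method.

```python
import numpy as np


def learn_mapping(S, X):
    """Fit W so that X is approximated by W @ S in the least-squares sense.

    S: (n_pitches, n_frames) symbolic activations (assumed encoding).
    X: (n_bins, n_frames) time-frequency features of the aligned audio.
    Returns W with shape (n_bins, n_pitches).
    """
    # lstsq solves S.T @ W.T ~= X.T, one column of W.T per frequency bin.
    W_T, *_ = np.linalg.lstsq(S.T, X.T, rcond=None)
    return W_T.T


def match_score(W, s, x, eps=1e-12):
    """Cosine similarity between the mapped score state W @ s and frame x.

    The cosine measure is an assumption chosen for illustration only.
    """
    template = W @ s
    denom = np.linalg.norm(template) * np.linalg.norm(x) + eps
    return float(template @ x / denom)


if __name__ == "__main__":
    # Synthetic data standing in for aligned training material.
    rng = np.random.default_rng(0)
    S = (rng.random((5, 200)) < 0.3).astype(float)  # 5 pitches, 200 frames
    W_true = rng.random((16, 5))                    # 16 frequency bins
    X = W_true @ S
    W = learn_mapping(S, X)
    s = np.array([1.0, 0.0, 1.0, 0.0, 0.0])        # two active pitches
    print(match_score(W, s, W_true @ s))
```

In an alignment system, a score of this kind would be computed for each candidate score position and passed to the alignment algorithm. Heuristic templates correspond to fixing `W` by hand instead of learning it.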
Published in:
Applications of Signal Processing to Audio and Acoustics (WASPAA), 2011 IEEE Workshop on
Date of Conference: 16-19 Oct. 2011