This paper presents our experiences in building a system for the transcription of polyphonic piano music. By transcription we mean the conversion of an audio recording of a polyphonic piano performance into a series of notes and their onset times. Our final goal is a transcription system that handles polyphonic piano music over the entire piano range and at high degrees of polyphony. The system consists of three main stages. We first use a cochlear model based on the gammatone filterbank to transform the audio signal of a piano performance into a time-frequency representation. In the second stage, we use a network of coupled adaptive oscillators to extract partial tracks from the output of the cochlear model, and in the third stage, we employ artificial neural networks acting as pattern recognisers to extract notes from the output of the oscillator network. The system uses several such networks, each trained to recognise the occurrence of a specific note in the input signal.
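The abstract does not give implementation details of the cochlear front end, but the first stage it names, a gammatone filterbank, can be illustrated with a minimal sketch. The filter shapes below use the standard gammatone impulse response with ERB bandwidths (Glasberg and Moore's formula); the function names, sampling rate, and centre frequencies are illustrative assumptions, not the authors' actual configuration.

```python
import numpy as np

def erb(f):
    # Equivalent Rectangular Bandwidth in Hz (Glasberg & Moore, 1990)
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4):
    # Impulse response of a gammatone filter centred at fc (Hz):
    # t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), peak-normalised.
    t = np.arange(int(duration * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def gammatone_filterbank(signal, fs, centre_freqs):
    # Convolve the signal with each channel's impulse response;
    # returns an array of shape (channels, samples): a crude
    # time-frequency decomposition of the input.
    return np.stack([np.convolve(signal, gammatone_ir(fc, fs), mode="same")
                     for fc in centre_freqs])

# Illustrative usage: a two-note "chord" of sinusoids at 440 and 880 Hz.
fs = 8000
t = np.arange(fs) / fs
chord = np.sin(2 * np.pi * 440 * t) + np.sin(2 * np.pi * 880 * t)
out = gammatone_filterbank(chord, fs, [440.0, 660.0, 880.0])
energies = (out ** 2).sum(axis=1)
# The channels tuned to 440 and 880 Hz carry far more energy than
# the 660 Hz channel, which lies between the two partials.
```

In a full system the centre frequencies would cover the whole piano range (tens of channels spaced on an ERB scale), and the channel outputs would feed the oscillator network that tracks individual partials.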