In this paper, we propose a new approach, Harmonic-Temporal-Timbral Clustering (HTTC), for the analysis of a single-channel audio signal of multi-instrument polyphonic music: it estimates the pitch, onset time, power, and duration of all acoustic events and simultaneously classifies them into timbre categories. Each acoustic event is modeled by a harmonic structure and a smooth temporal envelope, both represented by Gaussian mixtures. Timbres are clustered into timbre categories based on the similarity between these spectro-temporal structures. The entire process is mathematically formulated as minimizing the I-divergence between the HTTC parametric model and the observed spectrogram of the music audio signal, with the harmonic, temporal, and timbral model parameters updated jointly through the EM algorithm. Experimental results are presented to evaluate the performance of the algorithm.
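To make the objective concrete, the sketch below illustrates the kind of fitting criterion the abstract describes: the I-divergence (generalized Kullback-Leibler divergence) between an observed nonnegative spectrogram and a parametric model, driven down by EM-style multiplicative updates. This is a minimal toy, not the paper's actual HTTC model: the fixed spectro-temporal `templates` and the weight-only update are simplifying assumptions standing in for the Gaussian-mixture harmonic/temporal structures and the full parameter set.

```python
import numpy as np

def i_divergence(Y, X, eps=1e-12):
    """I-divergence D(Y||X) = sum( Y*log(Y/X) - Y + X ) over all bins."""
    return float(np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X))

# Toy setup: model the spectrogram as a weighted sum of K fixed
# spectro-temporal templates (hypothetical stand-ins for the paper's
# Gaussian-mixture event models).
rng = np.random.default_rng(0)
F, T, K = 64, 32, 3
templates = rng.random((K, F, T))               # assumed fixed templates
true_w = np.array([2.0, 0.5, 1.0])              # ground-truth event powers
Y = np.einsum('k,kft->ft', true_w, templates)   # synthetic "observation"

w = np.ones(K)                                  # initial power estimates
d0 = i_divergence(Y, np.einsum('k,kft->ft', w, templates))
for _ in range(500):
    X = np.einsum('k,kft->ft', w, templates)    # current model spectrogram
    # EM-style multiplicative update; monotonically non-increasing
    # in the I-divergence for this linear model.
    w *= (np.einsum('kft,ft->k', templates, Y / (X + 1e-12))
          / templates.sum(axis=(1, 2)))

X = np.einsum('k,kft->ft', w, templates)
d_final = i_divergence(Y, X)
```

Because the observation is exactly representable by the model, the divergence drops toward zero and the estimated weights recover the true event powers; in the full HTTC formulation the same criterion is minimized over pitch, onset, duration, and timbre-cluster parameters as well.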