In our previous work, we proposed a time series analysis framework that detects outlying subsequences in an input time series to bring out events of interest in sports and surveillance audio. The input time series in this framework can consist of either mid-level audio classification labels or low-level cepstral features. In this paper, we present an algorithm that uses kernel alignment to merge the segmentation results obtained from these two time series representations of the same content. The algorithm first finds an optimal kernel bandwidth parameter (sigma) that aligns the similarity matrices obtained from the low-level and the mid-level time series. It then uses the gain in kernel alignment as a measure to further match the two segmentations. Our results with sports audio show that the proposed algorithm combines the advantages of the low-level and mid-level time series by suppressing irrelevant patterns while retaining sufficient information for discovering key audio classes.
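The bandwidth search described above can be illustrated with a minimal sketch. It is not the paper's implementation, and it rests on several assumptions: the low-level similarity matrix is taken to be a Gaussian kernel on cepstral-like feature vectors, the mid-level similarity matrix is taken to be a simple label-agreement kernel, alignment is the standard normalized Frobenius inner product of the two matrices, and sigma is chosen by a grid search. The function names (`gaussian_kernel`, `label_kernel`, `kernel_alignment`, `best_sigma`) and the toy data are hypothetical.

```python
import math


def gaussian_kernel(frames, sigma):
    """Gaussian similarity matrix for a list of low-level feature vectors."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [[math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2)) for b in frames]
            for a in frames]


def label_kernel(labels):
    """Assumed mid-level similarity: 1 if two frames share a label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def frobenius(k1, k2):
    """Frobenius inner product of two equally sized matrices."""
    return sum(x * y for r1, r2 in zip(k1, k2) for x, y in zip(r1, r2))


def kernel_alignment(k1, k2):
    """Normalized alignment between two similarity matrices (1.0 when identical)."""
    return frobenius(k1, k2) / math.sqrt(frobenius(k1, k1) * frobenius(k2, k2))


def best_sigma(frames, labels, sigmas):
    """Grid search for the bandwidth that best aligns the two matrices."""
    k_mid = label_kernel(labels)
    scored = [(kernel_alignment(gaussian_kernel(frames, s), k_mid), s)
              for s in sigmas]
    score, sigma = max(scored)
    return sigma, score


# Toy data (hypothetical): four 1-D "cepstral" frames and their class labels.
frames = [[0.0], [0.1], [5.0], [5.1]]
labels = ["a", "a", "b", "b"]
sigma, score = best_sigma(frames, labels, [0.01, 1.0, 100.0])
print(sigma, round(score, 4))
```

In this toy case a very small bandwidth makes the Gaussian kernel close to the identity matrix and a very large one makes it close to all ones, so the intermediate value aligns best with the label structure. How the gain in alignment is then used to match the segmentations is not specified in the abstract, so the sketch does not cover it.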