This paper describes a technique for audio clip retrieval. The audio clips are modeled using a common universal codebook based on a bag-of-features (BOF) approach. The features extracted from all clips are grouped into clusters using the k-means algorithm, and each audio clip is modeled by the normalized distribution of its features over the cluster bins. Latent semantic indexing (LSI) is applied to the feature-audio clip matrix to represent the data in a latent semantic space. The primary audio clip description is then converted to a vector in an anchor reference space, where each component of the anchor vector is a probabilistic similarity between the clip and the clip corresponding to that component. LSI is then applied to the new feature-audio clip matrix, mapping the data to a latent semantic space based on the anchor representation. For audio retrieval, the nearest-neighbor (NN) algorithm is used. The described algorithm demonstrates high retrieval performance.
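The codebook, anchor-space, and nearest-neighbor steps can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the clips are synthetic 2-D feature frames, the codebook size k = 4 is arbitrary, and the "probabilistic similarity" is taken to be a normalized exponential of the negative Euclidean distance between histograms, since the abstract does not define it. The two LSI steps are omitted to keep the sketch dependency-free. All function names (`kmeans`, `bof_histogram`, `anchor_vector`, `retrieve`) are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the returned centroids act as the shared codebook."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster received no points
                centroids[j] = [sum(c) / len(g) for c in zip(*g)]
    return centroids


def bof_histogram(frames, centroids):
    """Normalized count of a clip's frames falling into each cluster bin."""
    counts = [0] * len(centroids)
    for f in frames:
        counts[nearest(f, centroids)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def anchor_vector(hist, anchor_hists, scale=1.0):
    """Map a histogram to anchor space: one similarity per reference clip.

    ASSUMPTION: similarity = exp(-distance / scale), normalized to sum to 1.
    The abstract does not specify the actual probabilistic similarity.
    """
    weights = [math.exp(-dist(hist, a) / scale) for a in anchor_hists]
    z = sum(weights)
    return [w / z for w in weights]


def retrieve(query_vec, db_vecs):
    """Nearest-neighbor search: index of the closest database vector."""
    return min(range(len(db_vecs)), key=lambda i: dist(query_vec, db_vecs[i]))


# Synthetic data (assumption): two "sound classes" centered at (0, 0) and (5, 5).
rng = random.Random(1)


def make_clip(center, n=30):
    return [[center[0] + rng.gauss(0, 0.3), center[1] + rng.gauss(0, 0.3)]
            for _ in range(n)]


clips = [make_clip((0, 0)), make_clip((0, 0)), make_clip((5, 5)), make_clip((5, 5))]

# Codebook from the features of all clips, then one histogram per clip.
all_frames = [f for clip in clips for f in clip]
codebook = kmeans(all_frames, k=4)
hists = [bof_histogram(c, codebook) for c in clips]

# Every database clip doubles as an anchor (reference) clip here.
anchors = [anchor_vector(h, hists) for h in hists]

# Query with an unseen clip from the second class.
query = anchor_vector(bof_histogram(make_clip((5, 5)), codebook), hists)
best = retrieve(query, anchors)
print("closest database clip:", best)
```

In a fuller pipeline, a truncated singular value decomposition would be applied to the matrix of histograms and again to the matrix of anchor vectors before the nearest-neighbor search, which is where the LSI steps of the abstract would fit.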