The goal of this work is to build an audio information retrieval system that gives users flexibility in formulating their queries, from audio examples to naïve text. Specifically, this paper focuses on using naïve text queries to describe the information users seek. Naïve text queries, however, raise interoperability issues between the annotation and retrieval processes because of the wide variety of possible audio descriptions. In this paper, we propose an intermediate audio description layer (iADL) to resolve these interoperability issues. The iADL comprises two axes, corresponding to semantic and onomatopoeic descriptions, derived from human-to-human communication experiments on how people express sounds verbally. Various text modeling schemes, such as latent semantic analysis (LSA) and the latent topic model, are used to transform naïve text onto the proposed iADL.
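The abstract names LSA as one scheme for mapping naïve text queries into a shared description space. As a rough illustration of the idea (not the paper's actual pipeline), the sketch below folds a bag-of-words query into a low-rank latent space via truncated SVD; the vocabulary, documents, and dimensions are hypothetical toy values.

```python
import numpy as np

# Toy term-document count matrix (hypothetical vocabulary and clip
# descriptions, for illustration only): rows = terms, columns = docs.
terms = ["bark", "dog", "meow", "cat", "ring"]
A = np.array([
    [3., 0., 0.],   # "bark" occurs in doc 0
    [2., 0., 0.],   # "dog"  occurs in doc 0
    [0., 2., 0.],   # "meow" occurs in doc 1
    [0., 1., 0.],   # "cat"  occurs in doc 1
    [0., 0., 3.],   # "ring" occurs in doc 2
])

k = 2                                    # number of latent concepts kept
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k].T   # truncated SVD: A ~ Uk diag(sk) Vk^T

def fold_in(query):
    """Project a bag-of-words query vector into the latent space."""
    return (query @ Uk) / sk             # standard LSA folding: q^T Uk Sk^-1

def cosine(a, b):
    eps = 1e-12                          # guard against zero-norm doc vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

q = fold_in(np.array([1., 1., 0., 0., 0.]))      # naive text query: "dog bark"
sims = [cosine(q, Vk[d]) for d in range(A.shape[1])]
best = int(np.argmax(sims))                       # doc 0 describes barking
```

Here `Vk[d]` gives document `d`'s coordinates in the latent concept space, so retrieval reduces to a cosine-similarity ranking between the folded-in query and the stored descriptions.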