Automatic speech recognition (ASR) is an indispensable technology for communicating with a service robot. In real-world environments, an ASR system encounters many kinds of sound sources, which must be discriminated to improve recognition performance. ASR systems usually detect speech in the input signal with a voice activity detection (VAD) scheme. Speech and music, however, are not easily discriminated by VAD because they share characteristics such as periodicity. In this paper, we add a speech/music discriminator to the front end of the ASR system so that music streams are not passed to the recognizer. Our speech/music discriminator uses the mean of minimum cepstral distances (MMCD) as its feature parameter. Experimental results show that the MMCD parameter outperforms a conventional feature parameter, spectral flux.