Speech recognition systems still need to be improved for use in noisy environments. For this improvement, the development of standard evaluation corpora and assessment technologies is essential. Recently, the AURORA-2 and AURORA-3 corpora and their evaluation scenarios have had a significant impact on noisy speech recognition research. This paper introduces a Japanese noisy speech corpus and its evaluation scripts, called AURORA-2J. AURORA-2J is a Japanese connected digits corpus. Its data collection and evaluation scenarios are designed in the same way as those of AURORA-2, with the help of the ETSI AURORA group. Furthermore, we have collected an in-car speech corpus similar to AURORA-3. The in-car speech corpus includes Japanese connected digits and command words collected in a moving car. This paper describes the data collection, the baseline scripts, and the baseline performance.