This paper describes a robot referee for "rock-paper-scissors (RPS)" sound games: the robot decides the winner from the combination of rock, paper, and scissors uttered simultaneously by two or three people, without using any visual information. In this referee task, the robot has to cope with speech at a low signal-to-noise ratio (SNR) caused by the mixture of overlapping speech, robot motor noise, and ambient noise. Our robot referee system therefore consists of two subsystems: a real-time robot audition subsystem and a dialog subsystem specialized for RPS sound games. The robot audition subsystem recognizes simultaneous speech by exploiting two key ideas: preprocessing that consists of sound source localization and separation with a microphone array, and system integration based on missing feature theory (MFT). Preprocessing improves the SNR of a target sound signal using geometric source separation with a multi-channel post-filter. MFT uses only reliable acoustic features in speech recognition and masks out unreliable parts caused by interfering sounds and by preprocessing. MFT thus provides smooth integration between preprocessing and automatic speech recognition. The dialog subsystem is implemented as a system-initiative dialog system for multiple players based on deterministic finite automata. It first waits for a trigger command to start an RPS sound game, then controls the dialog with the players during the game, and finally decides the winner. The referee system is constructed for Honda ASIMO with an 8-ch microphone array. With two players, we attained an average task completion rate of 70% for the games.
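
To illustrate the final step of the dialog, deciding the winner from the recognized utterances, the following is a minimal sketch and not the implementation used on the robot. It assumes that the audition subsystem has already produced one recognized word per player. The names `BEATS` and `decide_winners` are hypothetical, and the draw rule for three players (all three hands present counts as a draw) is the conventional one, assumed here rather than taken from the system itself.

```python
from typing import List, Optional

# Hypothetical lookup table: each hand maps to the hand it defeats.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}


def decide_winners(hands: List[str]) -> Optional[List[int]]:
    """Return the indices of the winning players, or None for a draw.

    `hands` holds one recognized word per player (assumed already
    separated and recognized upstream).
    """
    distinct = set(hands)
    if not distinct <= BEATS.keys():
        raise ValueError(f"unrecognized hand in {hands!r}")

    # Assumed convention: a draw occurs when every player shows the same
    # hand, or when all three hands appear at once.
    if len(distinct) != 2:
        return None

    # Exactly two distinct hands: one of them defeats the other.
    a, b = distinct
    winning_hand = a if BEATS[a] == b else b
    return [i for i, hand in enumerate(hands) if hand == winning_hand]
```

For example, `decide_winners(["paper", "rock", "rock"])` selects player 0, while `decide_winners(["rock", "paper", "scissors"])` reports a draw, in which case a referee would typically ask the players to repeat the round.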