I. Introduction
To endow musically expressive robotic agents with music listening and cognition capabilities while they simultaneously generate corporeal motor responses, robot audition algorithms must rely on causal, low-cost computations. Moreover, they must cope with signal distortions caused by environmental noise and the robot's ego noise (i.e., motor noise generated during the robot's motion), since these degrade the performance of Music Information Retrieval (MIR) algorithms at the audio signal level.