I. Introduction
Device-free target tracking and gesture recognition have attracted increasing attention because they enable convenient human-machine interaction. Tracking based on acoustic signals is preferred over vision-based solutions, which require adequate lighting, and over RF-based solutions, which provide only coarse-grained resolution and lower accuracy. However, existing acoustic approaches built on portable devices, e.g., smartphones, can track only a single target, and therefore require that no object other than the intended target moves during tracking.