I. Introduction
Perceiving the depth of a scene is crucial for robotic tasks such as manipulation [1], [2], autonomous driving [3], [4], and navigation [5]. Although state-of-the-art sensors, including Time-of-Flight (ToF) cameras, structured-light cameras, and Light Detection and Ranging (LiDAR), can serve as quick solutions for depth sensing, their outputs are relatively sparse and may contain defects for many reasons, a well-known issue in the robotics community [6]. Stereo RGB cameras, on the other hand, provide a good alternative. Compared with these active sensors, they can adapt to highly dynamic situations because they project no light but merely receive it passively, and thus will not fail due to a missing return signal.