I. Introduction
Inspired by the structural organization of the human visual system, which tightly couples the retina and the visual cortex, the vision chip [1] integrates an image sensor and massively parallel computing elements on a single silicon substrate to form a compact real-time visual system-on-chip (SoC) suitable for end-to-end cognitive visual perception in edge scenarios. With the rapid development of deep neural networks (DNNs) [2], recently reported vision chips achieve intelligent edge processing. At the same time, increasing image resolution and a growing range of imaging modes, including 2D grayscale imaging and 3D depth imaging, can improve the capability and intelligence level of vision chips. However, these trends require vision chips to provide high throughput and flexible processing capability to handle such massive and multi-type visual information.