
A Real-Time 2D/3D Perception Visual Vector Processor for 1920 × 1080 High-Resolution High-Speed Intelligent Vision Chips


Abstract:

Edge computing on reliable multimodal (2D RGB/3D RGB-Depth) data has a wide range of applications. However, many currently reported visual processors cannot flexibly handle multimodal data, e.g., RGB-Depth visual streams. The key challenge is that these prior visual processors lack an efficient, unified instruction set architecture (ISA) for both conventional and intelligent cognition on 2D/3D multimodal sensory data. To fill this gap, this paper proposes a programmable intelligent visual vector processor compatible with multimodal 2D/3D visual data processing at 1920 × 1080-pixel resolution. The processor consists of a reconfigurable processing element (PE) array, a memory access network that can be flexibly configured as fine- or coarse-grained, and a high-throughput I/O interface. The vectorial PE array with neighbor-PE access increases the data reuse rate and parallel computation efficiency, and can implement both convolutional neural networks (CNNs) and conventional image processing algorithms. The proposed ISA is customized and specifically tailored to 2D/3D image processing, from RGB/Time-of-Flight (ToF) raw data to intelligent inference results. The chip is fabricated in a 55-nm CMOS process. Experimental results show that the chip attains an area efficiency of 14.41 GOPS/mm², a peak performance of 409.6 GOPS, and a peak throughput of 9.6 Gbps at 200 MHz. The measured processing speeds are 87 fps (480 × 270) or 31 fps (1920 × 1080) for ToF depth reconstruction, 219 fps (256 × 256) for 3D object classification, and 36 fps (256 × 256) for CNN-based 2D object tracking.
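The headline figures above can be normalized to the 200 MHz clock and to the frame sizes to make them easier to compare. The sketch below is only back-of-the-envelope arithmetic on the numbers quoted in the abstract; it assumes that the area-efficiency figure is peak performance divided by the relevant chip area (the abstract does not say whether this is core or die area) and it expresses the I/O throughput per 200 MHz cycle purely as a normalization, not as a statement about how the interface is clocked. The function names are illustrative.

```python
# Back-of-the-envelope metrics derived only from figures quoted in the abstract.
# Assumption: area efficiency = peak performance / area, with the same op count.

CLOCK_HZ = 200e6      # 200 MHz operating clock
PEAK_OPS = 409.6e9    # 409.6 GOPS peak performance
AREA_EFF = 14.41e9    # 14.41 GOPS/mm^2 area efficiency
IO_BPS = 9.6e9        # 9.6 Gbps peak throughput


def ops_per_cycle(peak_ops: float, clock_hz: float) -> float:
    """Operations completed per clock cycle at peak performance."""
    return peak_ops / clock_hz


def implied_area_mm2(peak_ops: float, ops_per_mm2: float) -> float:
    """Area implied by peak performance and area efficiency (assumption above)."""
    return peak_ops / ops_per_mm2


def bits_per_cycle(io_bps: float, clock_hz: float) -> float:
    """I/O throughput normalized to one cycle of the given clock."""
    return io_bps / clock_hz


def pixel_rate(width: int, height: int, fps: float) -> float:
    """Pixels processed per second at a given resolution and frame rate."""
    return width * height * fps


if __name__ == "__main__":
    print(f"ops/cycle:        {ops_per_cycle(PEAK_OPS, CLOCK_HZ):.0f}")
    print(f"implied area:     {implied_area_mm2(PEAK_OPS, AREA_EFF):.2f} mm^2")
    print(f"I/O bits/cycle:   {bits_per_cycle(IO_BPS, CLOCK_HZ):.0f}")
    # ToF depth reconstruction rates reported in the abstract
    print(f"ToF 480x270:      {pixel_rate(480, 270, 87) / 1e6:.1f} Mpixel/s")
    print(f"ToF 1920x1080:    {pixel_rate(1920, 1080, 31) / 1e6:.1f} Mpixel/s")
```

Under these assumptions, 409.6 GOPS at 200 MHz corresponds to 2048 operations per cycle, 9.6 Gbps to 48 bits per cycle, and the efficiency figure implies roughly 28.4 mm². The two ToF operating points correspond to about 11.3 and 64.3 Mpixel/s, respectively.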
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers (Volume: 71, Issue: 2, February 2024)
Page(s): 740-753
Date of Publication: 15 December 2023


I. Introduction

Inspired by the structural organization of the human visual system, which tightly couples the retina and the visual cortex, the vision chip [1] integrates an image sensor and massively parallel computing elements on a single silicon substrate to form a compact, real-time visual system-on-chip (SoC) suited to end-to-end cognitive visual perception in edge scenarios. With the rapid development of deep neural networks (DNNs) [2], currently reported vision chips achieve intelligent edge processing. At the same time, higher image resolutions and additional imaging modes, including 2D grayscale imaging and 3D depth imaging, can improve the capability and intelligence of vision chips. However, handling such massive, multi-type visual information requires vision chips with high throughput and flexible processing capability.
