Skip to Main Content
Convolutional networks (ConvNets) are biologically inspired hierarchical architectures that can be trained to perform a variety of detection, recognition and segmentation tasks. ConvNets have a feed-forward architecture consisting of multiple linear convolution filters interspersed with pointwise non-linear squashing functions. This paper presents an efficient implementation of ConvNets on a low-end DSP-oriented field programmable gate array (FPGA). The implementation exploits the inherent parallelism of ConvNets and takes full advantage of multiple hardware multiply accumulate units on the FPGA. The entire system uses a single FPGA with an external memory module, and no extra parts. A network compiler software was implemented, which takes a description of a trained ConvNet and compiles it into a sequence of instructions for the ConvNet Processor (CNP). A ConvNet face detection system was implemented and tested. Face detection on a 512 times 384 frame takes 100 ms (10 frames per second), which corresponds to an average performance of 3.4 times 109 connections per second for this 340 million connection network. The design can be used for low-power, lightweight embedded vision systems for micro-UAVs and other small robots.