This paper presents a parallel FDFM (Few DSP blocks and Few block RAMs) processor core approach for implementing a perceptron. In this approach, a perceptron is implemented as a processor core that uses only a few DSP slices and a few block RAMs in the FPGA. The approach is promising because high throughput can be obtained from multiple FDFM cores working in parallel. Moreover, even if the FPGA has little remaining space, a perceptron can still be implemented using only a few DSP slices and a few block RAMs. We have implemented 150 processor cores for perceptrons in a Xilinx Virtex-4 family FPGA XC4SX35-10FF668. The implementation results show that 150 processor cores can be implemented in the FPGA using 150 DSP48 slices, 190 block RAMs, and 11679 slices. The design runs at a clock frequency of 161.546 MHz, and a single evaluation of a 96-node perceptron can be performed $10.959 \times 10^6$ times per second. We have also implemented the design on the FPGA board of the Virtex-4 XtremeDSP development kit and confirmed that the 150 processor cores work correctly.
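As a rough sanity check on the figures quoted above, the following sketch derives a per-core throughput and an approximate cycle count per evaluation. It assumes that the reported $10.959 \times 10^6$ evaluations per second is the aggregate over all 150 cores and that every core runs at the reported clock frequency; neither assumption is stated in the abstract, and the function names are illustrative only.

```python
# Figures quoted in the abstract.
CLOCK_HZ = 161.546e6        # reported clock frequency
NUM_CORES = 150             # reported number of FDFM processor cores
EVALS_PER_SEC = 10.959e6    # reported evaluations per second (96-node perceptron)


def per_core_throughput(total_evals_per_sec: float, num_cores: int) -> float:
    """Evaluations per second for one core.

    ASSUMPTION: the reported figure is the aggregate over all cores
    and the cores work independently.
    """
    return total_evals_per_sec / num_cores


def cycles_per_evaluation(clock_hz: float, evals_per_sec_per_core: float) -> float:
    """Approximate clock cycles one core spends on a single evaluation.

    ASSUMPTION: each core runs at the full reported clock frequency.
    """
    return clock_hz / evals_per_sec_per_core


per_core = per_core_throughput(EVALS_PER_SEC, NUM_CORES)
cycles = cycles_per_evaluation(CLOCK_HZ, per_core)

print(f"per-core throughput: {per_core:.0f} evaluations/s")
print(f"cycles per evaluation: {cycles:.0f}")
```

Under these assumptions each core performs about 73,060 evaluations per second, or roughly 2,211 clock cycles per evaluation of the 96-node perceptron. These derived numbers are back-of-the-envelope estimates, not results reported by the authors.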
Published in:
Networking and Computing (ICNC), 2011 Second International Conference on
Date of Conference: Nov. 30 2011-Dec. 2 2011