Skip to Main Content
Embedded processors for video image recognition require to address both the cost (die size and power) versus real-time performance issue, and also to achieve high flexibility due to the immense diversity of recognition targets, situations, and applications. This paper describes IMAP, a highly parallel SIMD linear processor and memory array architecture that addresses these trading-off requirements. By using parallel and systolic algorithmic techniques, despite of its simple architecture IMAP achieves to exploit not only the straightforward per image row data level parallelism (DLP), but also the inherent DLP of other memory access patterns frequently found in various image recognition tasks, under the use of an explicit parallel C language (IDC). We describe and evaluate IMAP-CE, a latest IMAP processor, which integrates 128 of 100MHz 8 bit 4-way VLIW PEs, 128 of 2KByte RAMs, and one 16 bit RISC control processor, into a single chip. The PE instruction set is enhanced for supporting IDC codes. IMAP-CE is evaluated mainly by comparing its performance running IDC codes with that of a 2.4GHz Intel P4 running optimized C codes. Based on the use of parallelizing techniques, benchmark results show a speedup of up to 20 for image filter kernels, and of 4 for a full image recognition application.