FPGAs offer a promising platform for the implementation of artificial neural networks (ANNs) and their training, combining the use of custom-optimized hardware with low cost and fast development time. However, purely hardware realizations tend to focus on throughput, resorting to restrictions on applicable network topology or low-precision data representation, whereas flexible solutions allowing a wide variation of network parameters and training algorithms are usually restricted to software implementations. This paper proposes a mixed approach, introducing a system-on-chip (SoC) implementation where computations are carried out by a high-efficiency neural coprocessor with a large number of parallel processing elements. System flexibility is provided by on-chip software control and the use of floating-point arithmetic, and network parallelism is exploited through replicated logic and an application-specific coprocessor architecture, leading to fast training time. Performance results, design limitations, and trade-offs are discussed.