In this paper, a high-performance data-path architecture is proposed for synthesizing DSP kernels. The primitive resources of the data path are small, identical templates. The steering logic allows the control unit to quickly implement the desired templates, so that the system's performance benefits from the chaining of operations. The small number of these computational resources, coupled with their simple structure, allows more chaining and lower latency than existing methods that use template-based computational resources. Data-flow-graph scheduling and binding are handled by efficient algorithms that achieve minimum latency at the cost of a negligible overhead in the control circuit and the clock period. Compared with data paths implemented with primitive resources, the proposed architecture achieves lower latency. Its efficiency in exploiting chaining is also demonstrated relative to existing template-based architectures.
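The abstract does not describe the authors' scheduling and binding algorithms. To illustrate in a generic way why chaining lowers latency, the following sketch uses a plain as-soon-as-possible (ASAP) scheduler with unlimited resources. It places a dependent operation in the same clock cycle as its predecessor whenever their combined combinational delay still fits within the clock period. The function names `schedule` and `latency`, the operation delays, and the 10 ns clock period are all hypothetical. The sketch ignores resource limits, template structure, and binding, so it should not be read as the paper's method.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def schedule(deps, delay, clock_period, chain=True):
    """Hypothetical ASAP scheduler (unlimited resources, not the paper's algorithm).

    deps:  op -> set of predecessor ops in the data-flow graph
    delay: op -> combinational delay of op (assumed <= clock_period)
    Returns op -> (cycle, time within that cycle at which op's result is ready).
    """
    place = {}
    for op in TopologicalSorter(deps).static_order():  # predecessors come first
        assert delay[op] <= clock_period, "each op must fit in one clock cycle"
        cycle, ready = 0, 0.0
        for p in deps.get(op, ()):
            pc, pr = place[p]
            if not chain:
                pc, pr = pc + 1, 0.0  # without chaining, a result is usable only next cycle
            if (pc, pr) > (cycle, ready):  # latest-arriving predecessor wins
                cycle, ready = pc, pr
        if ready + delay[op] > clock_period:
            cycle, ready = cycle + 1, 0.0  # chaining would overrun the clock period
        place[op] = (cycle, ready + delay[op])
    return place

def latency(place):
    """Number of clock cycles spanned by the schedule."""
    return max(c for c, _ in place.values()) + 1

# Hypothetical kernel: m1 = x*y; a1 = m1 + z; a2 = a1 + w; m2 = a2 * k
deps = {"m1": set(), "a1": {"m1"}, "a2": {"a1"}, "m2": {"a2"}}
delay = {"m1": 4.0, "a1": 2.0, "a2": 2.0, "m2": 4.0}  # ns, assumed values
print(latency(schedule(deps, delay, 10.0)))               # 2
print(latency(schedule(deps, delay, 10.0, chain=False)))  # 4
```

With chaining, `m1`, `a1`, and `a2` together take 8 ns and fit in the first cycle, so only `m2` spills into a second cycle. Without chaining, each dependent operation needs its own cycle. A real template-based data path would also have to respect resource availability and the fixed structure of each template, which this sketch omits.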