Skip to Main Content
This paper demonstrates the design of efficient asynchronous bundled-data pipelines for the matrix-vector multiplication core of discrete cosine transforms (DCTs). The architecture is optimized for both zero and small-valued data, typical in DCT applications, yielding both high average performance and low average power. The proposed bundled-data pipelines include novel data-dependent delay lines with integrated control circuitry to efficiently implement speculative completion sensing. The control circuits are based on a novel control-circuit template that simplifies the design of such nonlinear pipelines. Extensive post-layout back-end timing analysis was performed to gain confidence in the timing margins as well as to quantify performance and energy. Comparison with a synchronous counterpart suggests that our best asynchronous design yields 30% higher average throughput with negligible energy overhead.