This paper presents a comprehensive energy-throughput comparison of two well-known asynchronous design styles applied to a matrix-vector multiplication core of the discrete cosine transform (DCT). The first design style, bundled-data pipelines, uses a single-rail synchronous datapath with recently proposed true-four-phase controllers integrated with data-dependent delay lines. The design achieves reasonably high average performance and very low energy but requires significant design effort to verify the two-sided timing constraints (setup and hold) typical of bundled-data pipelines. The second design style, 2D QDI (quasi-delay-insensitive) pipelines, consists of a network of small cells that communicate through delay-insensitive 1-of-N encoded channels. Transistor-level simulations show that, compared to the bundled-data counterpart, all QDI designs achieve higher throughput at the cost of larger area and energy; in particular, they achieve a 22% better Eτ² metric. In addition, the QDI designs require less design effort than the bundled-data counterpart because they require virtually no timing verification.
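As a rough illustration of how a design can consume more energy yet still score better on Eτ², the sketch below computes the metric as energy per operation times the square of the cycle time (lower is better). The function names and all numeric values are hypothetical and chosen only for illustration; they are not measurements from the paper.

```python
def e_tau2(energy, cycle_time):
    """Return the E*tau^2 metric: energy per operation times cycle time squared."""
    return energy * cycle_time ** 2


def relative_improvement(baseline, candidate):
    """Fractional reduction of `candidate` relative to `baseline` (positive = better)."""
    return (baseline - candidate) / baseline


# Hypothetical, normalized values (NOT results from the paper):
# the second design uses 30% more energy but has a 25% shorter cycle time.
bundled = e_tau2(energy=1.0, cycle_time=1.0)
qdi = e_tau2(energy=1.3, cycle_time=0.75)

print(f"bundled-data E*tau^2: {bundled:.4f}")
print(f"QDI E*tau^2:          {qdi:.4f}")
print(f"relative improvement: {relative_improvement(bundled, qdi):.1%}")
```

Because the cycle time enters quadratically, a modest throughput gain can outweigh a larger energy cost under this metric.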