
A 280 mV-to-1.1 V 256b Reconfigurable SIMD Vector Permutation Engine With 2-Dimensional Shuffle in 22 nm Tri-Gate CMOS

7 Author(s)
Hsu, S.K. ; Circuit Res. Lab., Intel Corp., Hillsboro, OR, USA ; Agarwal, A. ; Anders, M.A. ; Mathew, S.K.

An ultra-low voltage reconfigurable 4-way to 32-way SIMD vector permutation engine is fabricated in 22 nm tri-gate bulk CMOS, consisting of a 32-entry × 256b 3-read/1-write ported register file with a 256b byte-wise any-to-any permute crossbar for 2-dimensional shuffle. The register file integrates a vertical shuffle across multiple entries into read/write operations, and includes clock-less static reads with shared P/N dual-ended transmission gate (DETG) writes, improving register file VMIN by 250 mV across PVT variations with a wide dynamic operating range of 280 mV-1.1 V. The permute crossbar implements an interleaved folded byte-wise multiplexer layout forming an any-to-any fully connected tree to perform a horizontal shuffle with permute accumulate circuits, and includes vector flip-flops, stacked min-delay buffers, shared gates, and ultra-low voltage split-output (ULVS) level shifters improving logic VMIN by 150 mV, while enabling peak energy efficiency of 585 GOPS/W measured at 260 mV, 50 °C. The permutation engine achieves: (i) nominal register file performance of 1.8 GHz, 106 mW measured at 0.9 V, 50 °C, (ii) robust register file functionality measured down to 280 mV with peak energy efficiency of 154 GOPS/W, (iii) scalable permute crossbar performance of 2.9 GHz, 69 mW measured at 1.1 V, 50 °C with sub-threshold operation at 240 mV, 10 MHz consuming 19 μW, and (iv) a 64b 4 × 4 matrix transpose algorithm and AoS to SoA conversion with 40%-53% energy savings and 25%-42% improved peak throughput measured at 1.8 GHz, 0.9 V.
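As a purely illustrative software model of the 2-dimensional shuffle and the 64b 4 × 4 matrix transpose mentioned above, the sketch below treats a 256b vector as 32 byte lanes. The function names (`permute_bytes`, `shuffle_2d`, `transpose_4x4_64b`) and the per-lane row/byte selector encoding are assumptions made for this example. They are not taken from the paper and do not describe the actual circuit datapath or its control format.

```python
LANES = 32  # a 256b vector modeled as 32 byte lanes (assumption for this sketch)
ELEM = 8    # bytes per 64b element


def permute_bytes(src, byte_sel):
    """Horizontal any-to-any byte permute: output lane j takes src[byte_sel[j]]."""
    assert len(src) == LANES and len(byte_sel) == LANES
    return [src[k] for k in byte_sel]


def shuffle_2d(regfile, row_sel, byte_sel):
    """Hypothetical 2-D shuffle: output lane j reads byte byte_sel[j]
    of register-file entry row_sel[j] (vertical, then horizontal selection)."""
    assert len(row_sel) == LANES and len(byte_sel) == LANES
    return [regfile[row_sel[j]][byte_sel[j]] for j in range(LANES)]


def transpose_4x4_64b(rows):
    """Transpose a 4 x 4 matrix of 64b elements stored one row per 256b entry.

    Output row r, element c is input row c, element r, so each output row
    is produced by a single call to shuffle_2d in this model.
    """
    out = []
    for r in range(4):
        row_sel = [j // ELEM for j in range(LANES)]            # source entry = c
        byte_sel = [r * ELEM + j % ELEM for j in range(LANES)]  # source element = r
        out.append(shuffle_2d(rows, row_sel, byte_sel))
    return out


def pack(elems):
    """Pack four 64b integers into a 32-byte little-endian vector."""
    return [b for e in elems for b in e.to_bytes(ELEM, "little")]


def unpack(vec):
    """Unpack a 32-byte vector into four 64b integers."""
    return [int.from_bytes(bytes(vec[i:i + ELEM]), "little")
            for i in range(0, LANES, ELEM)]


if __name__ == "__main__":
    # Four "structures" with four 64b fields each (array-of-structures layout).
    aos = [pack([10 * r + c for c in range(4)]) for r in range(4)]
    soa = transpose_4x4_64b(aos)  # each output row now holds one field of all four
    for vec in soa:
        print(unpack(vec))
```

In this model, the transpose of four 4-field records is the same operation as an AoS to SoA conversion: each output vector gathers one field from all four input records.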

Published in: IEEE Journal of Solid-State Circuits (Volume: 48, Issue: 1)