Looking ahead to the next generation of mobile streaming computing, the energy efficiency demanded of end-user terminals will become ever more stringent. This paper presents the Xetal-Pro processor, the latest member of the Xetal low-power single-instruction multiple-data (SIMD) processor family from Philips. Its predecessor, Xetal-II, already ranks among the most computationally efficient processors available today [in terms of giga operations per second (GOPS)/Watt]; however, it cannot yet achieve the demanded energy efficiency (less than 1 pJ per operation). Unlike Xetal-II, Xetal-Pro supports ultrawide supply voltage (Vdd) scaling from the nominal supply down to the subthreshold region. Although aggressive Vdd scaling causes severe throughput degradation, this can be partly compensated for by the massive parallelism of the Xetal family. Xetal-II includes a large on-chip frame memory (FM), which does not scale well to an ultralow Vdd and therefore poses a major obstacle to improving energy efficiency. We therefore investigate both different FM realizations and alternative memory organizations. A hybrid memory system (HMS), which reduces non-local memory traffic and enables further Vdd scaling, is proposed. To support design space exploration of the appropriate number of scratchpad memory (SM) entries, the corresponding data locality analysis is also provided. Moreover, some circuit implementation issues unique to Xetal-Pro, such as the customized level shifter, are discussed. Compared to Xetal-II operating at the nominal voltage, Xetal-Pro provides up to a twofold improvement in energy efficiency even without Vdd scaling (essentially a consequence of data localization in the SM) while delivering the same ultrahigh throughput.
With Vdd scaling into the sub/near-threshold region, Xetal-Pro could achieve more than a tenfold energy reduction while still delivering a high throughput of 0.69 GOPS (counting multiply and add operations only). The insights gained from Xetal-Pro shed light on the direction of future ultralow-energy SIMD processors.
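
The two figures of merit used above, GOPS/Watt and pJ per operation, are reciprocal views of the same quantity, and the benefit of Vdd scaling can be estimated to first order from the quadratic dependence of dynamic switching energy on supply voltage. The following Python sketch illustrates both relations. It is not taken from the paper: the function names are hypothetical, the voltages in the usage note are illustrative rather than Xetal-Pro operating points, and the quadratic model ignores leakage, which becomes significant in the sub/near-threshold region.

```python
def pj_per_op(gops_per_watt: float) -> float:
    """Convert computational efficiency in GOPS/W to energy in pJ per operation.

    1 GOPS/W = 1e9 operations per joule, i.e. 1e-9 J = 1000 pJ per operation,
    so the target of 1 pJ per operation corresponds to 1000 GOPS/W.
    """
    if gops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    return 1000.0 / gops_per_watt


def relative_dynamic_energy(vdd: float, vdd_nominal: float) -> float:
    """First-order estimate of dynamic energy per operation relative to nominal.

    Assumes switching energy proportional to C * Vdd^2 with unchanged
    capacitance and activity; leakage energy is deliberately ignored.
    """
    if vdd <= 0 or vdd_nominal <= 0:
        raise ValueError("voltages must be positive")
    return (vdd / vdd_nominal) ** 2
```

For example, `pj_per_op(1000.0)` evaluates to 1.0, the energy target quoted above, and under the stated assumption halving the supply voltage (for instance, `relative_dynamic_energy(0.6, 1.2)`) reduces dynamic energy per operation to one quarter of its nominal value.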