
Extending the Performance and Energy-Efficiency of Shared Memory Multicores with Nanophotonic Technology

3 Author(s)
Morris, R.; Sch. of Electr. Eng. & Comput. Sci., Ohio Univ., Athens, OH, USA; Jolley, E.; Kodi, A.K.

As the number of cores on a single chip increases exponentially, the design and integration of both the on-chip network that facilitates intercore communication and the cache coherence protocol that enables shared memory programming have become critical for improved energy efficiency and overall chip performance. With traditional metal interconnects facing stringent energy constraints, researchers are pursuing disruptive solutions such as nanophotonics. Cache coherence in multicores can be enforced effectively by snoopy protocols; however, broadcasting every cache miss can limit scalability and consume excess energy. In this paper, we propose PULSE, a nanophotonic broadcast tree-based network for snoopy cache-coherent multicores. To limit the energy penalty from broadcasting (and thereby splitting) optical signals, we direct the optical signal from the external laser such that only the subset of requesters can receive the optical signal. Furthermore, as cache blocks are shared by only a few cores, we propose a multicast version of PULSE called multi-PULSE that predicts the sharers for each L2 miss and morphs the broadcast network into a multicast network. We evaluate energy and performance using CACTI and SIMICS on 16-core and 64-core versions of PULSE and multi-PULSE for the Splash-2, PARSEC, and SPEC CPU2006 benchmarks and compare against electrical networks, optical networks, and other cache filtering techniques. Our results indicate that PULSE outperforms competitive electrical/optical networks by 60 percent in terms of execution time, and multi-PULSE reduces average energy by 10 to 80 percent even with a few mispredictions.
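The energy penalty of splitting an optical signal, mentioned in the abstract, can be illustrated with a back-of-the-envelope sketch. It assumes an ideal, lossless 1:N splitter that divides power equally among receivers. This is a textbook simplification, not the power model used in the paper, and the function names are hypothetical.

```python
import math


def ideal_split_loss_db(receivers: int) -> float:
    """Per-receiver loss in dB when optical power is split equally.

    Assumes an ideal lossless 1:N splitter (illustrative assumption,
    not the paper's device model).
    """
    if receivers < 1:
        raise ValueError("receivers must be >= 1")
    return 10.0 * math.log10(receivers)


def relative_laser_power(receivers: int) -> float:
    """Laser power needed, relative to one receiver, so that each
    receiver still gets the same optical power after the split."""
    if receivers < 1:
        raise ValueError("receivers must be >= 1")
    return float(receivers)


if __name__ == "__main__":
    # Compare a full broadcast with a multicast to a few predicted sharers.
    for n in (4, 16, 64):
        print(f"{n:2d} receivers: split loss {ideal_split_loss_db(n):.2f} dB, "
              f"relative laser power {relative_laser_power(n):.0f}x")
```

Under this simplification, reaching 4 predicted sharers instead of all 64 cores needs one sixteenth of the laser power. That is the intuition behind steering the signal to a subset of receivers and behind multicasting in place of broadcasting.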

Published in:

Parallel and Distributed Systems, IEEE Transactions on (Volume: 25, Issue: 1)