By Topic

Fast and cycle-accurate modeling of a multicore processor

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Asif Khan ; Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA ; Muralidaran Vijayaraghavan ; Silas Boyd-Wickizer ; Arvind

An ideal simulator allows an architect to swiftly explore design alternatives and accurately determine their impact on performance. Design exploration requires simulators to be easily modifiable, and accurate performance estimates require detailed models. Unfortunately, detailed modeling not only impacts the ease with which a simulator can be modified, but also the speed at which it can be executed, resulting in fidelity being traded for simulation speed. Although FPGA-based simulators have dramatically higher speed than software simulators, sacrificing fidelity is still common. In this paper we present Arete, an FPGA-based processor simulator, which offers high performance along with accuracy and modifiability. We begin with a cycle-level specification of a multicore architecture which includes realistic in-order cores and detailed models of shared, coherent memory and on-chip network. We then describe how this specification is implemented faithfully and efficiently on FPGAs. Arete delivers a performance of up to 11 MIPS per core. We run a subset of the PARSEC benchmark suite on top of off-the-shelf SMP Linux, and achieve an average performance of 55 MIPS for an 8-core model.We also describe two significant architectural explorations: one involving three different branch predictors and the other requiring major modifications to the cache-coherence protocol.

Published in:

Performance Analysis of Systems and Software (ISPASS), 2012 IEEE International Symposium on

Date of Conference:

1-3 April 2012