By Topic

Near fine grain parallel processing using static scheduling on single chip multiprocessors

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
K. Kimura ; Dept. of Electr., Electron. & Comput. Eng., Waseda Univ., Tokyo, Japan ; H. Kasahara

With the increase of the number of transistors integrated on a chip, efficient use of transistors and scalable improvement of effective performance of a processor are getting important problems However, it has been thought that popular superscalar and VLIW world have difficult to obtain scalable improvement of effective performance in future because of the limitation of instruction level parallelism. To cope with this problem, a single chip multiprocessor (SCM) approach with multi grain parallel processing inside a chip, which hierarchically exploits loop parallelism and coarse grain parallelism among subroutines, loops and basic blocks in addition to instruction level parallelism, is thought one of the most promising approaches. This paper evaluates effectiveness of the single chip multiprocessor architectures with a shared cache, global registers, distributed shared memory and/or local memory for near fine grain parallel processing as the first step of research on SCM architecture to support multi grain parallel processing. The evaluation shows OSCAR (Optimally Scheduled Advanced Multiprocessor) architecture having distributed shared memory and local memory in addition to centralized shared memory and attachment of global register gives us significant speed rip such as 13.8% to 143.8% for four processors compared with shared cache architecture for applications which have been difficult to extract parallelism effectively

Published in:

Innovative Architecture for Future Generation High-Performance Processors and Systems, 1999. International Workshop

Date of Conference:

Dec 2000