Recent work has shown that pipelining and multiple instruction issuing are architecturally equivalent in their abilities to exploit parallelism, but there has been little work directly comparing the performance of these fine-grain parallel architectures with that of the coarse-grain multiprocessors. Using trace-driven simulations, the authors compare the performance of a superscalar processor and a pipelined processor using dynamic dependence checking with that of a shared memory multiprocessor. For very parallel programs, they find that the fine-grain processors must bypass an unrealistically large number of branches to match the performance of the multiprocessor. When executing programs with a wide range of potential parallelism, the best performance is obtained using a multiprocessor where each individual processor has a fine-grain parallelism of two to four
Published in:
System Sciences, 1991. Proceedings of the Twenty-Fourth Annual Hawaii International Conference on
(Volume:i
)
Date of Conference: 8-11 Jan 1991