Simultaneous multithreaded vector architectures combine the best of data-level and instruction-level parallelism and perform better than either approach could separately. Our design achieves performance equivalent to executing 15 to 26 scalar instructions per cycle for numerical applications.
Published in: Micro, IEEE (Volume: 17, Issue: 5)
Date of Publication: Sep/Oct 1997