The performance of the Astronautics ZS-1, a decoupled access/execute (DAE) processor, is examined. The CPU is composed of two subsystems: an access processor, which handles address generation and fixed-point operations, and an execute processor, which handles floating-point operations. The two subsystems communicate through a network of queues and operate in a largely decoupled manner. This architecture exhibits a form of fine-grain parallelism, called slip, that improves performance. Performance bounds for the ZS-1 are developed. A simple count of resource usage is sufficient to establish a good upper bound on performance for most vector loops. A dependence graph is used to form a second bound that is particularly useful for nonvector loops. This two-bound model accounts for compiler characteristics, such as loop unrolling, and for hardware characteristics, such as memory latency. The model is applied to the first 12 Livermore loops and compared with simulation results for a variety of memory systems. The comparison indicates how well the ZS-1 tolerates increased memory latency as a function of slip, and it provides insights regarding application codes, architectures, and compiler capabilities.
Published in: Proceedings of the Twenty-Third Annual Hawaii International Conference on System Sciences, 1990 (Volume: i)
Date of Conference: 2-5 Jan 1990
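The two-bound idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's model: the resource classes, issue rates, operation latencies, and function names below are all hypothetical, chosen only to show how a resource-usage count and a dependence-graph bound might be combined. The first bound divides each resource's per-iteration usage by its assumed throughput; the second takes, over every loop-carried dependence cycle, the total latency divided by the number of iterations the cycle spans. The larger of the two is a lower bound on cycles per iteration, and hence an upper bound on performance.

```python
from fractions import Fraction


def resource_bound(op_counts, units_per_cycle):
    """Cycles per iteration implied by the busiest resource.

    op_counts: operations of each resource class per loop iteration.
    units_per_cycle: operations of that class that can issue per cycle
    (hypothetical values, not ZS-1 specifications).
    """
    return max(Fraction(op_counts[r], units_per_cycle[r]) for r in op_counts)


def recurrence_bound(edges):
    """Cycles per iteration implied by loop-carried dependence cycles.

    edges: (src, dst, latency, distance) tuples, where distance is the
    number of iterations the dependence spans. Returns the maximum of
    latency/distance over all simple cycles, or 0 if there are none.
    Brute-force enumeration, adequate only for small graphs.
    """
    graph = {}
    for src, dst, latency, distance in edges:
        graph.setdefault(src, []).append((dst, latency, distance))
    best = Fraction(0)

    def walk(start, node, latency, distance, visited):
        nonlocal best
        for nxt, lat, dist in graph.get(node, []):
            if nxt == start:
                total = distance + dist
                if total > 0:  # ignore cycles that span no iterations
                    best = max(best, Fraction(latency + lat, total))
            elif nxt not in visited and nxt > start:
                # nxt > start: each cycle is explored from its smallest node only
                walk(start, nxt, latency + lat, distance + dist, visited | {nxt})

    for start in graph:
        walk(start, start, 0, 0, {start})
    return best


def cycles_per_iteration_bound(op_counts, units_per_cycle, edges):
    """Combined bound: the loop can run no faster than either limit."""
    return max(resource_bound(op_counts, units_per_cycle), recurrence_bound(edges))


# Hypothetical nonvector loop: x[i] = z[i] * (y[i] - x[i-1])
ops = {"memory": 3, "float": 2}      # two loads and a store; subtract and multiply
rates = {"memory": 1, "float": 1}    # assumed one operation per cycle per class
deps = [
    ("sub", "mul", 3, 0),            # assumed 3-cycle subtract feeds the multiply
    ("mul", "sub", 3, 1),            # assumed 3-cycle multiply feeds the next iteration
]
bound = cycles_per_iteration_bound(ops, rates, deps)
```

With these assumed numbers the resource count alone suggests 3 cycles per iteration, while the recurrence forces 6, which is why a dependence-graph bound matters for nonvector loops. For a vector loop with no loop-carried cycle, `recurrence_bound` returns 0 and the resource count dominates.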