I. Introduction
Modern vector processors have significant advantages over commodity-based scalar processors for memory-intensive scientific applications [1]. However, vector processors still use a single-core architecture, even though chip multiprocessors (CMPs) have become mainstream in recent processor designs. To realize more efficient and powerful computation on a vector processor, CMP architectures should be applied to vector processor design in the near future. Because the computational efficiency of vector processors relies on their high memory bandwidth, a chip multi-vector processor (CMVP) requires a novel memory design that provides each vector core on the chip with sufficiently high memory bandwidth to maintain that efficiency.