The performance gap between processor and memory is a serious problem in high-performance computing because effective performance is limited by memory performance. To overcome this problem, it is essential to make good use of the high bandwidth of on-chip memory. Architecture and compiler co-optimization is a promising approach for this purpose because most data accesses in high-performance computing are regular and/or predictable. We therefore propose a new VLSI architecture called SCIMA as a platform for this co-optimization. SCIMA integrates software-controllable memory (SCM) into the processor chip in addition to an ordinary data cache. The SCM and the cache can be reconfigured by software during computation, so the memory hierarchy itself becomes a target of compiler optimization. In this sense, SCIMA realizes architecture and compiler co-optimization. Toward this co-optimization, we have developed a directive-based compiler and an SCM usage algorithm that inserts the directives automatically. In this paper, we present the directives and outline the algorithm for automatic optimization.