1. Introduction
Recent processors achieve higher performance by combining a large number of cores with very wide SIMD instructions. Intel's latest Xeon Phi processor (Knights Landing) [1] provides more than 60 cores and supports AVX-512, whose vector length is 512 bits. ARM has also recently announced a vector instruction set named SVE (Scalable Vector Extension) [2]–[4]. SVE supports any vector length from 128 bits to 2048 bits in increments of 128 bits, which allows assembly programs to be written independently of the hardware vector length. It has been announced that Japan's flagship supercomputer, the post-K computer, will adopt ARM SVE as the SIMD instruction set of its manycore processor.