Skip to Main Content
Effective utilization of cache memories is a key factor in achieving high performance for computing large signal transforms. Nonunit stride access in the computation of large signal transforms results in poor cache performance, leading to severe degradation in the overall performance. In this paper, we develop a cache-conscious technique, called a dynamic data layout, to improve the performance of large signal transforms. In our approach, data reorganization is performed between computation stages to reduce cache misses. We develop an efficient search algorithm to determine an optimal tree with the minimum execution time among possible factorization trees based on the size of the signal transform and the data access stride. Our approach is applied to compute the fast Fourier transform (FFT) and the Walsh-Hadamard transform (WHT). Experiments were performed on Alpha 21264, MIPS R10000, UltraSPARC III, and Pentium 4. Experimental results show that our FFT and WHT achieve performance improvement of up to 3.52 times over other state-of-the-art FFT and WHT packages. The proposed optimization is portable across various platforms.