In digital signal processors (DSPs), memory references are among the most costly operations because of their long latency and substantial power consumption. Previously proposed twiddle-factor-based butterfly grouping methods can effectively reduce the memory references caused by twiddle factors when any existing fast Fourier transform (FFT) algorithm is implemented on a DSP. However, because of compiler limitations, a C implementation of this method on a DSP performs far worse than the corresponding TI assembly benchmark for the radix-2 decimation-in-frequency (DIF) FFT. In this paper, we propose a hand-coded assembly implementation of the radix-2 DIF FFT algorithm with the twiddle-factor-based butterfly grouping method on a TI TMS320C64x DSP. Experimental results show that, for a 1024-point radix-2 DIF FFT, our hand-coded assembly implementation is 8 times faster than the C implementation and slightly faster than the TI assembly benchmark, while requiring only 50% of the twiddle-factor memory references of the TI assembly benchmark.