Recent advances in technology make it possible to integrate multiple processors into a single chip to build high-performance parallel programmable digital signal processors (PPDSPs). These processors are expected to replace many dedicated digital signal processors in implementing important image and signal processing algorithms such as the discrete cosine transform (DCT). The paper addresses the issue of how to compare fast 2D-DCT algorithms when they are implemented on a PPDSP. Previously, the efficiency of these algorithms has been compared based on the number of operations. This comparison is reasonable when the algorithms are implemented on a dedicated DSP, but it may not be suitable for general-purpose PPDSPs. The paper proposes three parameters as new criteria for comparing the performance of DCT algorithms: the number of data accesses, the number of communications, and the distance of communications. An algorithm-level technique is developed to estimate these parameters for DCT algorithms. The comparison results based on these parameters show that the algorithm proposed by Cho and Lee (1991) might be the best choice for a PPDSP unless communication between remote processors incurs a large overhead. In that case, the conventional row-column method with a fast 1D-DCT algorithm might be the most efficient.
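For readers unfamiliar with the row-column method mentioned above, the following is a minimal sketch of the idea: apply a 1D-DCT to every row of a block, then to every column of the result. It is an illustration only, not the paper's implementation. The function names `dct_1d` and `dct_2d_row_column` are hypothetical, the orthonormal DCT-II scaling is an assumption, and the 1D transform here uses the direct definition rather than any of the fast 1D-DCT algorithms the paper compares.

```python
import math


def dct_1d(x):
    """Direct (not fast) orthonormal DCT-II of a sequence x.

    Assumption: orthonormal scaling; the paper does not specify one.
    """
    n = len(x)
    out = []
    for k in range(n):
        s = sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct_2d_row_column(block):
    """2D-DCT by the row-column method.

    Step 1: 1D-DCT on each row.
    Step 2: 1D-DCT on each column of the row-transformed block.
    """
    rows = [dct_1d(row) for row in block]
    # zip(*rows) iterates over columns; transform each one.
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [column][row]; transpose back to [row][column].
    return [list(r) for r in zip(*cols)]


if __name__ == "__main__":
    # A constant 8x8 block has all of its energy in the DC coefficient.
    block = [[1.0] * 8 for _ in range(8)]
    coeffs = dct_2d_row_column(block)
    print(round(coeffs[0][0], 6))
```

Because the two passes are separable, each row (and then each column) can in principle be handled by a different processor, with data exchanged between the passes. That exchange is the kind of communication cost the proposed parameters are meant to capture.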
Published in:
High Performance Computing on the Information Superhighway, 1997. HPC Asia '97
Date of Conference: 28 Apr-2 May 1997