Fully pipelined parallel architectures are derived for high-throughput and reduced-hardware realization of prime-factor cyclic convolution using hardware-efficient modules for short-length rectangular transforms (RT). Moreover, a new approach is proposed for computing block pseudocyclic convolution using a block cyclic convolution of equal length along with some correction terms, so that the block pseudocyclic representation of cyclic convolution for non-prime-factor lengths (N = rP, where r and P are not mutually prime) can be computed efficiently using the algorithms and architectures of short-length cyclic convolutions. Low-complexity algorithms are derived for efficient computation of those correction terms, and the overall complexities of the proposed technique are estimated for r = 2, 3, 4, 6, 8, and 9. The proposed algorithms are further used to design high-throughput and reduced-hardware structures for cyclic convolution where the cofactors are not relatively prime. The proposed structures for high-throughput implementation are found to offer a reduction of nearly 50%-75% in area-delay product over the existing structures for several convolution lengths. Low-complexity structures for the input/output addition units of short-length convolutions are also derived and used along with the high-throughput modules for hardware-efficient realization of multifactor convolution, which offers a reduction of nearly 25%-75% in area-delay complexity over the existing structures for various non-prime-factor-length convolutions.
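For readers unfamiliar with the term, the following is a minimal software sketch of what "prime-factor cyclic convolution" generally means. It uses the standard textbook index mapping based on the Chinese remainder theorem (often attributed to Agarwal and Cooley), and it is not the architecture or algorithm derived in the paper. The function names `cyclic_convolve` and `prime_factor_cyclic_convolve` are hypothetical and chosen only for illustration. The sketch also shows why lengths with non-coprime cofactors need separate treatment: the index mapping is one-to-one only when the two factors are relatively prime.

```python
from math import gcd


def cyclic_convolve(h, x):
    """Direct length-N cyclic convolution, used as a reference."""
    n = len(x)
    return [sum(h[k] * x[(i - k) % n] for k in range(n)) for i in range(n)]


def prime_factor_cyclic_convolve(h, x, n1, n2):
    """Length n1*n2 cyclic convolution computed as an n1-by-n2
    two-dimensional cyclic convolution (illustrative sketch only).

    Assumes gcd(n1, n2) == 1, so that i -> (i mod n1, i mod n2)
    is a bijection on 0..n1*n2-1.
    """
    n = n1 * n2
    if len(h) != n or len(x) != n:
        raise ValueError("both sequences must have length n1 * n2")
    if gcd(n1, n2) != 1:
        raise ValueError("index mapping requires relatively prime factors")

    # Map the 1-D sequences onto 2-D arrays using the CRT index map.
    h2 = [[0] * n2 for _ in range(n1)]
    x2 = [[0] * n2 for _ in range(n1)]
    for i in range(n):
        h2[i % n1][i % n2] = h[i]
        x2[i % n1][i % n2] = x[i]

    # Two-dimensional cyclic convolution, cyclic in both dimensions.
    # In practice each dimension would use a short-length algorithm.
    y2 = [[0] * n2 for _ in range(n1)]
    for a in range(n1):
        for b in range(n2):
            y2[a][b] = sum(
                h2[c][d] * x2[(a - c) % n1][(b - d) % n2]
                for c in range(n1)
                for d in range(n2)
            )

    # Map the 2-D result back to a 1-D sequence with the same index map.
    return [y2[i % n1][i % n2] for i in range(n)]


if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6]
    x = [2, -1, 0, 3, 1, 4]
    print(prime_factor_cyclic_convolve(h, x, 2, 3) == cyclic_convolve(h, x))
```

Calling the second function with factors such as 2 and 4 raises an error in this sketch, which corresponds to the non-prime-factor case (N = rP with r and P not mutually prime) that the abstract addresses by other means.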
IEEE Transactions on Circuits and Systems for Video Technology (Volume: 18, Issue: 10)
Date of Publication: Oct. 2008