This paper presents a performance-optimized version of the flexible triangle search (FTS) block-matching algorithm. The FTS is a fast block-matching algorithm for motion estimation proposed in previous work. Given a block of pixels, it searches for the best-matching block in a given search area using only a selected subset of the available positions, rather than examining every position as the computationally expensive full search algorithm does. Further analysis of the previous FPGA implementation of the FTS indicates that additional parallelism can be exploited to reduce the overall processing time of the algorithm. In addition, identifying the performance bottlenecks and redesigning some of the hardware modules can increase the maximum supported frequency of the entire FTS FPGA implementation. The proposed design changes were implemented in VHDL and synthesized for a Xilinx Virtex-5 device. Simulation results indicate that the proposed implementation reduces the average number of cycles required to process a block by 17%. Moreover, synthesis results indicate that the proposed design increases the maximum supported frequency by around 38% compared to the previous FPGA implementation of the FTS algorithm. Consequently, the maximum supported frame rate increases by around 66%.
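
The 66% figure can be checked from the two reported gains. The sketch below is only an illustration: it assumes that the achievable frame rate is proportional to the maximum clock frequency divided by the average number of cycles per block, and that the number of blocks per frame is unchanged. The function name `frame_rate_gain` is hypothetical, and the inputs are the rounded percentages quoted in the abstract.

```python
def frame_rate_gain(cycle_reduction: float, freq_increase: float) -> float:
    """Relative frame-rate gain from fewer cycles per block and a faster clock.

    Assumption (not stated in the abstract): frame rate is proportional to
    f_max / cycles_per_block, with the same number of blocks per frame.
    """
    cycles_ratio = 1.0 - cycle_reduction   # new cycles per block / old cycles per block
    freq_ratio = 1.0 + freq_increase       # new f_max / old f_max
    return freq_ratio / cycles_ratio - 1.0


# Rounded figures from the abstract: 17% fewer cycles, about 38% higher frequency.
gain = frame_rate_gain(0.17, 0.38)
print(f"{gain:.1%}")  # prints 66.3%
```

Under this assumption the two improvements multiply rather than add (1.38 / 0.83 ≈ 1.66), which is consistent with the reported increase of around 66%.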