Media processing has motivated strong changes in the focus and design of processors. These applications are composed of heterogeneous regions of code, some with high levels of DLP and others with only modest amounts of ILP. A common approach to dealing with these applications is to use μSIMD-VLIW processors. However, the ILP regions fail to scale as the width of the machine increases, even though a wider machine is desirable for high performance in the DLP regions. In this paper, we propose and evaluate adding vector capabilities to a μSIMD-VLIW core to speed up the execution of the DLP regions while, at the same time, reducing the fetch bandwidth requirements. Results show that, in the DLP regions, 2-issue and 4-issue vector-μSIMD-VLIW architectures outperform an 8-issue μSIMD-VLIW by factors of up to 2.7X and 4.2X, respectively (1.6X and 2.1X on average). As a result, the DLP regions account for less than 10% of the total execution time, and performance is dominated by the ILP regions.