Graphics and media processing is quickly emerging as one of the key computing workloads. Programmable graphics processors give designers extra flexibility by running a small program for each fragment in the graphics pipeline. We investigate low-cost mechanisms to obtain good performance for modern graphics programs on a general-purpose CPU. We present a compiler that compiles SIMD graphics programs and generates efficient code for a general-purpose CPU. For a group of typical graphics programs, the generated code can process between 0.3 and 25 million vertices per second on a 2.2 GHz Intel Pentium® 4 processor. We also evaluate the impact of three changes to the architecture and compiler. Adding support for new specialized instructions improves the performance of the programs by 27.4% on average. A novel compiler optimization called mask analysis improves the performance of the programs by 19.5% on average. Increasing the number of architectural SIMD registers from 8 to 16 significantly reduces the number of memory accesses due to register spills.