I. Introduction
GPUs have become the dominant co-processor architecture for accelerating highly parallel applications because they offer greater instruction throughput and memory bandwidth than conventional CPUs. They are used to accelerate workloads on desktops, workstations, and supercomputers, and GPU computing is also emerging as an important factor in mobile computing.