Programming
Unlike general-purpose processors, GPUs are optimized to perform floating-point operations on large data sets. Until recently, harnessing this processing power required knowledge of graphics libraries, such as OpenGL, which represents a significant time investment to master. However, recent developments in GPU architectures and development tools have made such devices more accessible to a broader audience. NVIDIA's Compute Unified Device Architecture (CUDA; www.nvidia.com/cuda), for example, lets users develop algorithms in C++ with some language extensions.

In CUDA, developers write kernels that execute on the GPU and define the behavior of a single thread of execution. Thousands of threads execute a kernel concurrently, and the GPU's thread manager maps them all to physical thread processors. The kernel is invoked on the host side, at which time the host CPU determines how many threads to execute; the host CPU also controls memory management and data transfer between the host and the device. A special compiler called nvcc translates kernels and host programs into code that executes on both the CPU and GPU.
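A minimal sketch of this workflow, using an illustrative element-wise vector addition (the kernel name vecAdd, the array size, and the block size of 256 threads are assumptions, not part of the original text):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Kernel: defines the work of a single thread, which adds one element pair.
    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)                                      // guard against surplus threads
            c[i] = a[i] + b[i];
    }

    int main()
    {
        const int n = 1 << 20;                 // one million elements (illustrative)
        size_t bytes = n * sizeof(float);

        // Host-side allocation and initialization.
        float *hA = (float *)malloc(bytes);
        float *hB = (float *)malloc(bytes);
        float *hC = (float *)malloc(bytes);
        for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

        // The host CPU manages device memory and data transfer.
        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes);
        cudaMalloc(&dB, bytes);
        cudaMalloc(&dC, bytes);
        cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

        // The host decides how many threads run the kernel:
        // here, enough 256-thread blocks to cover all n elements.
        int threadsPerBlock = 256;
        int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
        vecAdd<<<blocks, threadsPerBlock>>>(dA, dB, dC, n);

        // Copy the result back (implicitly waits for the kernel) and clean up.
        cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
        printf("c[0] = %f\n", hC[0]);          // expect 3.0

        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        free(hA); free(hB); free(hC);
        return 0;
    }

Compiled with nvcc (for example, nvcc vecadd.cu -o vecadd), the kernel is translated to GPU code while the host portion is handed to the regular C++ compiler, yielding a single executable that runs on both processors.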