This work presents a graphics processing unit (GPU)-based implementation of a fully 3-D iterative PET reconstruction code, FIRST (Fast Iterative Reconstruction Software for [PET] Tomography), which was developed by our group. We describe the main steps followed to convert the FIRST code (which can run on several CPUs using the Message Passing Interface [MPI] protocol) into a code in which the most time-consuming parts of the reconstruction process (forward and backward projection) are massively parallelized on a GPU. Our objective was to accelerate the reconstruction significantly without compromising the image quality or the flexibility of the CPU implementation. Therefore, we implemented the GPU version in CUDA C, an abstraction layer for GPU programming. The code reconstructs images from sinogram data using the same system response matrix, obtained from Monte Carlo simulations, as the CPU version. Memory use was optimized to ensure good performance on the GPU. The code was adapted for the VrPET small-animal PET scanner. The CUDA version is more than 70 times faster than the original code running on a single core of a high-end CPU, with no loss of accuracy.
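To make the roles of forward and backward projection concrete, the following is a minimal CPU sketch of one common iterative scheme, the MLEM update, written in plain Python. It is an illustration only: the abstract does not state which update rule FIRST uses, and the function names, the sparse row format of the system response matrix, and the toy data are all assumptions, not taken from the FIRST code.

```python
# Hypothetical toy example: names, data layout, and update rule (MLEM) are
# assumptions made for illustration; they are not taken from the FIRST code.

def forward_project(srm, image):
    """Expected counts per sinogram bin: y_i = sum_j a_ij * x_j.

    srm is a list of sparse rows; each row is a list of (voxel, weight) pairs.
    """
    return [sum(a * image[j] for j, a in row) for row in srm]


def back_project(srm, values, n_voxels):
    """Spread per-bin values back into voxels: b_j = sum_i a_ij * v_i."""
    out = [0.0] * n_voxels
    for row, v in zip(srm, values):
        for j, a in row:
            out[j] += a * v
    return out


def mlem(srm, sinogram, n_voxels, n_iter=50):
    """Iterate x_j <- x_j / s_j * sum_i a_ij * (m_i / y_i)."""
    # Sensitivity image: back projection of a sinogram of ones.
    sens = back_project(srm, [1.0] * len(srm), n_voxels)
    image = [1.0] * n_voxels  # uniform initial estimate
    for _ in range(n_iter):
        expected = forward_project(srm, image)
        ratios = [m / e if e > 0 else 0.0 for m, e in zip(sinogram, expected)]
        corr = back_project(srm, ratios, n_voxels)
        image = [x * c / s if s > 0 else 0.0
                 for x, c, s in zip(image, corr, sens)]
    return image


# Toy system: 3 sinogram bins, 2 voxels, noise-free data.
toy_srm = [[(0, 1.0)], [(1, 1.0)], [(0, 0.5), (1, 0.5)]]
toy_truth = [2.0, 4.0]
toy_sinogram = forward_project(toy_srm, toy_truth)
toy_recon = mlem(toy_srm, toy_sinogram, n_voxels=2)
```

In this sketch the two projection functions dominate the cost of each iteration, and their loops over sinogram bins and matrix elements are largely independent of one another. That independence is what makes these steps natural candidates for massive parallelization on a GPU.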