Image reconstruction for the ECAT HRRT PET scanner with MOLAR is computationally demanding and requires a computer cluster for reasonable run times. Parallel computing using GPUs and CUDA offers a means to accelerate MOLAR. However, forward and backprojection operations present unique challenges that must be overcome to achieve acceptable speedup. In this study we implement GPU-accelerated versions of MOLAR's forward projection, backprojection, and algorithm update modules and compare their performance to CPU-only versions. We optimized the GPU thread configuration for each of these modules separately, as well as for a hybrid forward-backprojection module that is used for algorithm updates. We also numerically evaluated the reconstruction results to assess the impact of floating-point to integer conversions dictated by the GPU architecture. GPU forward projection was 41 times faster than the CPU-only code, and GPU backprojection was 20 times faster. The optimal thread configurations always assigned 64 threads to a thread block, but distributed those threads differently across the nested indexing loops within each module. These results show that MOLAR's forward and backprojection modules can be accelerated enough to make the MOLAR reconstruction package much more efficient.
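To illustrate what distributing a fixed 64-thread block across nested loops can mean, the following minimal sketch enumerates the candidate block shapes such a tuning search might cover. It assumes three nested loops mapped to the three CUDA block dimensions; that mapping, the function name `block_shapes`, and the exhaustive search are illustrative assumptions and are not taken from the MOLAR implementation.

```python
from itertools import product

THREADS_PER_BLOCK = 64  # block size reported as optimal in the abstract


def block_shapes(total=THREADS_PER_BLOCK, dims=3):
    """Return every ordered tuple of `dims` positive integers whose product is `total`.

    Hypothetical helper: each tuple is one way to split the threads of a
    block across `dims` nested loop indices (assumed to be three here).
    """
    divisors = [d for d in range(1, total + 1) if total % d == 0]
    shapes = []
    # Choose the first dims-1 extents freely; the last one is then forced.
    for head in product(divisors, repeat=dims - 1):
        used = 1
        for extent in head:
            used *= extent
        if total % used == 0:
            shapes.append(head + (total // used,))
    return shapes


shapes = block_shapes()
for shape in shapes:
    print(shape)
```

Each printed tuple holds the same 64 threads but assigns them to the loop indices differently, for example `(64, 1, 1)` versus `(4, 4, 4)`. In a tuning run of this kind, every candidate would be timed separately for each module, since the best shape can differ between forward projection and backprojection.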