Image quality is critical for applications such as robotic vision, surveillance, and medical imaging. Images captured in real time are seldom noise-free and therefore require noise removal before further processing. Among the many noise removal schemes that have been proposed, anisotropic diffusion filtering is known to achieve highly precise results. However, this accuracy comes at the expense of high computational cost, especially for large data sets. The highly parallel nature of the filtering algorithm makes it a good candidate for General-Purpose Graphics Processing Unit (GPGPU) clusters. In this research, we present a GPGPU cluster-based implementation of the non-linear anisotropic diffusion filter. Our implementation maps the computationally intensive parts of the algorithm to the GPGPU devices, while communication and serial processing are performed by the CPU hosts. Our multi-node GPGPU implementation is capable of processing images as large as 156 megapixels and achieves a speed-up of 29x over an equivalent MPI-only implementation. In addition, it exhibits reasonable scaling behavior that improves with image size.
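To illustrate why this kind of filter parallelizes well, the following is a minimal sketch of one explicit update step of a non-linear anisotropic diffusion filter. The abstract does not state which discretization the authors use, so the sketch assumes the classic Perona-Malik scheme with an exponential conductance function; the function name `diffusion_step` and the parameters `kappa` (edge threshold) and `dt` (time step) are hypothetical and chosen only for illustration. It is written in plain Python rather than GPU code to keep it self-contained.

```python
import math

def diffusion_step(img, kappa=0.1, dt=0.2):
    """One explicit Perona-Malik-style update on a 2D list of floats.

    Assumed scheme (not taken from the paper): 4-neighbour stencil,
    conductance g(d) = exp(-(d / kappa)^2), zero-flux borders.
    dt should stay at or below 0.25 for this stencil to remain stable.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            c = img[y][x]
            total = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                # Clamp indices so border pixels see themselves (zero flux).
                ny = min(max(y + dy, 0), h - 1)
                nx = min(max(x + dx, 0), w - 1)
                d = img[ny][nx] - c
                # Small differences (noise) diffuse; large ones (edges) barely do.
                g = math.exp(-(d / kappa) ** 2)
                total += g * d
            # Each output pixel depends only on the previous iteration's values.
            out[y][x] = c + dt * total
    return out
```

Every output pixel is computed from the previous iteration's image alone, so all pixels can be updated independently. In a GPU mapping of a scheme like this, each pixel would typically be handled by one thread, and in a multi-node setting only the rows on tile boundaries would need to be exchanged between hosts after each iteration.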
Published in:
Application Accelerators in High Performance Computing (SAAHPC), 2012 Symposium on
Date of Conference: 10-11 July 2012