Ultrasonic imaging and reconstruction tools are commonly used to detect, identify and measure defects in mechanical parts. Because of the complexity of the underlying physics and the ever-growing quantity of acquired data, computation time is becoming a limitation to the optimal inspection of a mechanical part. This article presents the performance of several implementations of a computationally heavy algorithm, the Total Focusing Method (TFM), on both Graphics Processing Units (GPU) and General Purpose Processors (GPP). A combination of algorithmic and architectural optimizations has been used. More specifically, on GPU, details are given on shared memory usage and manual thread handling to improve cache locality, whereas on GPP, the benefits of SIMDization and multithreading are discussed. With those optimizations, domain-specific dimensions enable maximum use of the architectural capabilities, resulting in a refined algorithm similar to the MapReduce Berkeley dwarf. Both the GPU and GPP optimized implementations exhibit memory-bound behavior, with the GPU outperforming the GPP by a factor of 4.
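For readers unfamiliar with TFM, the following is a minimal sketch of its textbook delay-and-sum form, not the optimized GPU or GPP implementations evaluated in the article. It assumes a linear array lying on the surface (z = 0), a homogeneous medium with a single wave speed, and full matrix capture data stored as one signal per transmitter/receiver pair. All names (`tfm_pixel`, `tfm_image`, `fmc`, `elem_x`, `c`, `fs`) are hypothetical and chosen for illustration only.

```python
import math


def tfm_pixel(fmc, elem_x, px, pz, c, fs):
    """Delay-and-sum value at one pixel (px, pz).

    fmc[tx][rx] is the sampled signal received on element rx when
    element tx fires (assumed layout); elem_x holds element x-positions,
    c is the wave speed and fs the sampling frequency.
    """
    n_samples = len(fmc[0][0])
    # One-way time of flight from each element (at z = 0) to the pixel.
    tof = [math.hypot(px - ex, pz) / c for ex in elem_x]
    total = 0.0
    # Reduce step: sum the contribution of every transmit/receive pair.
    for tx, t_tx in enumerate(tof):
        for rx, t_rx in enumerate(tof):
            idx = int(round((t_tx + t_rx) * fs))  # nearest-sample lookup
            if idx < n_samples:
                total += fmc[tx][rx][idx]
    return total


def tfm_image(fmc, elem_x, xs, zs, c, fs):
    """Map step: evaluate every pixel independently (rows follow zs)."""
    return [[abs(tfm_pixel(fmc, elem_x, x, z, c, fs)) for x in xs]
            for z in zs]
```

Each pixel is computed independently and reduces over all element pairs, which is the map-then-reduce shape the abstract alludes to. The scattered reads into `fmc` also suggest why an implementation of this kind tends to be limited by memory access rather than arithmetic.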