Parallel implementation of the non-overlapping template matching test using CUDA