By Topic

Design and evaluation of a parallel k-nearest neighbor algorithm on CUDA-enabled GPU

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Shenshen Liang ; Chinese Acad. of Sci., Grad. Univ., Beijing, China ; Ying Liu ; Cheng Wang ; Liheng Jian

Recent developments in Graphics Processing Units (GPUs) have enabled inexpensive high performance computing for general-purpose applications. Due to GPU's tremendous computing capability, it has emerged as the co-processor of CPU to achieve a high overall throughput. CUDA programming model provides the programmers adequate C language like APIs to better exploit the parallel power of the GPU. K-nearest neighbor (KNN) is a widely used classification technique and has significant applications in various domains, especially in text classification. The computational-intensive nature of KNN requires a high performance implementation. In this paper, we propose CUKNN, a CUDA-based parallel implementation of KNN. It launches two CUDA kernels, distance calculation kernel and selecting kernel. In the distance calculation kernel, a great number of concurrent CUDA threads are issued, where each thread performs the calculation between the query object and a reference object; in the selecting kernel, threads in a block find the local-k nearest neighbors of the query object concurrently, and then a thread is invoked to find the global-k nearest neighbors out of the queues of local-k neighbors. Various CUDA optimization techniques are applied to maximize the utilization of GPU. We evaluate our implementation by using synthetic dataseis and a real physical simulation dataset. The experimental results demonstrate that CUKNN outperforms the serial KNN on an HP xw8600 workstation significantly, achieving up to 46.7IX speedup on the synthetic dataseis and 42.49X on the physical simulation dataset including I/O cost. It also shows good scalability when varying the number of dimensions of the reference dataset, the number of objects in the reference dataset, and the number of objects in the query dataset.

Published in:

Web Society (SWS), 2010 IEEE 2nd Symposium on

Date of Conference:

16-17 Aug. 2010