Parallelizing Principal Component Analysis for Robust Facial Recognition Using CUDA

Authors:

Todd Goodall (Holcombe Dept. of Electr. & Comput. Eng., Clemson Univ., Clemson, SC, USA); Scott Gibson; Melissa C. Smith

Facial recognition techniques are of interest for tracking and identification in densely populated areas where security is an important concern. Traditional recognition techniques have yielded acceptable results with high repeatability but require special conditions such as a voluntary and stationary subject, close proximity, and appropriate lighting. Because no single algorithm yields robust results under uncontrolled conditions, more than one algorithm must be considered. Three popular template-based algorithms used in facial recognition are Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and Independent Component Analysis (ICA). Since facial recognition algorithms require processor-intensive and complicated matrix calculations, these three algorithms could benefit from hardware that accelerates these calculations. PCA, the algorithm studied in this research, is a common mathematical method that has been parallelized for other applications, but not for facial recognition on General Purpose Graphics Processing Units (GPGPUs). NVIDIA's CUDA parallel computing architecture is employed to implement the PCA algorithm for the GPGPU device. A C implementation of the PCA algorithm has been optimized specifically for use with CUDA kernels, with future parallelization in mind. The results provided in this paper show that the parallel GPGPU implementation outperforms the multithreaded C implementation on a general purpose CPU.
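To make the matrix work behind PCA concrete, the following is a minimal serial C sketch of its usual steps: centering the images on the mean face, forming the covariance matrix, and estimating the leading principal component. It is not the authors' implementation. The function names (`pca_center`, `pca_covariance`, `pca_leading_component`), the row-major layout, and the use of power iteration for the eigenvector are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: center n images of d pixels each (row-major, n x d)
 * by subtracting the per-pixel mean image, which is written to mean[d]. */
void pca_center(double *x, size_t n, size_t d, double *mean)
{
    assert(n > 0 && d > 0);
    for (size_t j = 0; j < d; ++j) {
        double sum = 0.0;
        for (size_t i = 0; i < n; ++i) sum += x[i * d + j];
        mean[j] = sum / (double)n;
        for (size_t i = 0; i < n; ++i) x[i * d + j] -= mean[j];
    }
}

/* Hypothetical helper: sample covariance of already-centered data,
 * cov = X^T X / (n - 1), stored as a d x d row-major matrix. */
void pca_covariance(const double *x, size_t n, size_t d, double *cov)
{
    assert(n > 1);
    for (size_t a = 0; a < d; ++a) {
        for (size_t b = 0; b < d; ++b) {
            double sum = 0.0;
            for (size_t i = 0; i < n; ++i) sum += x[i * d + a] * x[i * d + b];
            cov[a * d + b] = sum / (double)(n - 1);
        }
    }
}

/* Hypothetical helper: power iteration for the leading eigenvector of cov.
 * v and tmp are caller-provided buffers of length d. On return, v holds the
 * estimated eigenvector scaled so its largest entry has magnitude 1 (not
 * unit length); the return value is the Rayleigh-quotient eigenvalue
 * estimate. Assumes the all-ones start vector is not orthogonal to the
 * leading eigenvector. */
double pca_leading_component(const double *cov, size_t d,
                             double *v, double *tmp, int iters)
{
    double lambda = 0.0;
    for (size_t j = 0; j < d; ++j) v[j] = 1.0;
    for (int it = 0; it < iters; ++it) {
        double vv = 0.0, vcv = 0.0, maxabs = 0.0;
        for (size_t a = 0; a < d; ++a) {
            tmp[a] = 0.0;
            for (size_t b = 0; b < d; ++b) tmp[a] += cov[a * d + b] * v[b];
            vv += v[a] * v[a];
            vcv += v[a] * tmp[a];
            double m = tmp[a] < 0.0 ? -tmp[a] : tmp[a];
            if (m > maxabs) maxabs = m;
        }
        lambda = vcv / vv;            /* Rayleigh quotient v^T C v / v^T v */
        if (maxabs == 0.0) break;     /* cov * v vanished; nothing to scale */
        for (size_t a = 0; a < d; ++a) v[a] = tmp[a] / maxabs;
    }
    return lambda;
}
```

In this sketch each entry of the covariance matrix is computed independently of the others, which is the kind of data-parallel structure that a GPU kernel could plausibly assign to one thread per entry. How the paper actually partitions the work is not stated in the abstract.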

Published in:

Application Accelerators in High Performance Computing (SAAHPC), 2012 Symposium on

Date of Conference:

10-11 July 2012