Data mining algorithms are inherently expensive, and with today's dataset sizes they are becoming even slower and harder to use. Previous work has focused on parallelizing data mining algorithms on different architectures, and more recently, applications have started to take advantage of the massive computational power and high bandwidth offered by GPUs. However, there has been almost no prior work on a general methodology for parallelizing all types of data mining applications on hybrid architectures. This paper presents a framework for fast and efficient parallelization of data mining algorithms on GPU systems. The framework implements I/O transfer models that handle the huge number of data entries processed by this class of algorithms, entries that carry numerous dependencies. The framework also allows users to specify the data requirements of each task, so that the data scheduler can efficiently map each task onto a GPU node and onto a block within each of these processors, improving the overall performance of the algorithm by around 20%.
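The idea of mapping tasks to GPU nodes from declared data requirements can be illustrated with a minimal sketch. It assumes a greedy, locality-aware policy: each task goes to the node that would need the fewest bytes transferred to hold the task's data, with ties broken by current load. This policy and every name in it (`Task`, `Node`, `transfer_cost`, `schedule`, the chunk sizes) are hypothetical illustrations, not the paper's scheduler, and the second level of the mapping (blocks within each GPU) is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    needs: set  # hypothetical: IDs of the data chunks this task reads


@dataclass
class Node:
    name: str
    resident: set = field(default_factory=set)  # chunks already in this GPU's memory
    load: int = 0  # number of tasks mapped to this GPU so far


def transfer_cost(task, node, sizes):
    """Bytes that must be copied to `node` before `task` can run there."""
    return sum(sizes[chunk] for chunk in task.needs - node.resident)


def schedule(tasks, nodes, sizes):
    """Greedy, locality-aware mapping of tasks onto GPU nodes (assumed policy).

    Each task goes to the node with the smallest transfer cost; ties are
    broken by the smaller load, then by the order of `nodes`.
    """
    assignment = {}
    for task in tasks:
        best = min(nodes, key=lambda n: (transfer_cost(task, n, sizes), n.load))
        best.resident |= task.needs  # the task's chunks now live on that GPU
        best.load += 1
        assignment[task.name] = best.name
    return assignment


if __name__ == "__main__":
    sizes = {"A": 100, "B": 50, "C": 80}  # hypothetical chunk sizes in bytes
    nodes = [Node("gpu0", {"A"}), Node("gpu1", {"C"})]
    tasks = [Task("t1", {"A", "B"}), Task("t2", {"C"}), Task("t3", {"B"})]
    print(schedule(tasks, nodes, sizes))
    # {'t1': 'gpu0', 't2': 'gpu1', 't3': 'gpu0'}
```

Here `t3` lands on `gpu0` because `t1` already pulled chunk `B` there, so it costs no transfer. The sketch ignores GPU memory capacity and eviction; a real scheduler would have to bound `resident` and account for dependencies between tasks.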
Published in:
2011 10th International Symposium on Parallel and Distributed Computing (ISPDC)
Date of Conference: 6-8 July 2011