SGMiner: A Fast and Scalable GPU-Based Frequent Pattern Miner on SSDs


A fast and scalable GPU- and disk-based method, called SGMiner, for frequent itemset mining that exploits multiple GPUs and SSDs.

Abstract:

Frequent itemset mining is widely used as an essential data mining technique. However, as data size grows, its applicability decreases owing to the relatively poor performance of existing methods. Although numerous efficient sequential frequent itemset mining methods have been developed, their achievable performance is clearly limited because they exploit only one thread. To overcome this limitation, a number of parallel methods using multi-core central processing units (CPUs), multiple machines, or many-core graphics processing units (GPUs) have been proposed. However, these methods are relatively slow and have low scalability, mainly owing to large memory requirements for intermediate data, significant disk I/O, and heavy computation. In this study, to resolve these problems, we propose SGMiner, a new, fast, and scalable GPU- and disk-based method for extracting frequent patterns on a single machine equipped with multiple GPUs and multiple solid-state drives (SSDs). It is based on an algorithm similar to the Apriori algorithm and, owing to its exploitation of SSDs, incurs neither intermediate data nor large disk I/O overheads. Moreover, we propose storing the transaction database in SSDs as bitmap transaction chunks, streaming the chunks to GPU device memory via the main memory with reduced I/O overhead, and performing fast support counting on the GPUs based on the chunks. In addition, for multiple GPUs and SSDs, we propose replicating the bitmap transaction chunks stored in SSDs to the GPUs in a streaming fashion. This can distribute a nearly equal workload across multiple GPUs with reduced I/O overheads. The experiments we conducted demonstrate that SGMiner outperforms the existing methods in terms of scalability and performance, with enhanced robustness.
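The chunk-based support counting described in the abstract can be illustrated with a small CPU-only sketch. This is not the authors' implementation: the abstract does not give the chunk layout or the GPU kernel, so the data layout (one bitmap per item per chunk, held as a Python integer) and the names `build_chunks` and `count_supports` are assumptions made purely for illustration. The sketch only shows the general idea that the support of a candidate itemset can be obtained by AND-ing item bitmaps and counting the set bits, one chunk at a time, so that no intermediate data has to be kept between chunks.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumed layout: a chunk maps an item id to a bitmap (a Python int) in which
# bit t is set when the t-th transaction of that chunk contains the item.
Chunk = Dict[int, int]


def build_chunks(transactions: List[List[int]], chunk_size: int) -> List[Chunk]:
    """Split the transactions into fixed-size chunks of per-item bitmaps."""
    chunks: List[Chunk] = []
    for start in range(0, len(transactions), chunk_size):
        chunk: Chunk = {}
        for offset, items in enumerate(transactions[start:start + chunk_size]):
            for item in items:
                chunk[item] = chunk.get(item, 0) | (1 << offset)
        chunks.append(chunk)
    return chunks


def count_supports(
    chunks: Iterable[Chunk], candidates: List[FrozenSet[int]]
) -> Dict[FrozenSet[int], int]:
    """Accumulate the support of each non-empty candidate, chunk by chunk."""
    supports = {cand: 0 for cand in candidates}
    for chunk in chunks:  # stands in for streaming one chunk at a time
        for cand in candidates:
            items = iter(cand)
            bits = chunk.get(next(items), 0)
            for item in items:
                bits &= chunk.get(item, 0)  # keep transactions containing all items
            supports[cand] += bin(bits).count("1")  # population count
    return supports


# Hypothetical toy database of five transactions, split into chunks of two.
example_transactions = [[1, 2, 3], [1, 2], [2, 3], [1, 3], [1, 2, 3]]
example_candidates = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 2, 3})]
example_supports = count_supports(
    build_chunks(example_transactions, chunk_size=2), example_candidates
)
```

Because each chunk contributes an independent partial count, the per-chunk loop is the part that a GPU-based method would run in parallel; the sketch keeps it sequential for readability.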
Published in: IEEE Access (Volume: 10)
Page(s): 62502 - 62519
Date of Publication: 01 June 2022
Electronic ISSN: 2169-3536
