Association mining, which finds relationships between items in a dataset, has proven practical in business applications. Many companies apply association mining to market data to analyze consumers' purchasing behavior. The Apriori algorithm is the most established algorithm for finding frequent itemsets in association mining. However, its time complexity is dominated by the number of candidate itemsets it generates. Research to date has focused on discovering frequent itemsets efficiently in large datasets, through optimized data structures, partitioning of datasets, and parallel data mining. In this paper, we propose a distributed association mining algorithm for finding frequent itemsets. Unlike most existing distributed algorithms, which center on reducing the size of the dataset, our algorithm focuses on reducing the size of the candidate itemsets. The work of generating candidate k-itemsets is evenly distributed to the nodes to balance the workload among processors. A complexity analysis of the distributed algorithm is also presented.
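The candidate-generation step that this work targets can be illustrated with the textbook Apriori join-and-prune procedure, which builds candidate k-itemsets from the frequent (k-1)-itemsets. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes that itemsets are sorted Python tuples of comparable items and that every node holds a full copy of the frequent (k-1)-itemsets for pruning. The helper names (`join_pairs`, `candidate`, `apriori_gen`, `distributed_apriori_gen`) and the round-robin dealing of join pairs to nodes are hypothetical choices made here to show one way of balancing candidate-generation work.

```python
from itertools import combinations


def join_pairs(prev):
    """Sort the frequent (k-1)-itemsets and return index pairs (i, j), i < j,
    whose itemsets share their first k-2 items (the Apriori join condition)."""
    prev = sorted(prev)
    pairs = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            if prev[i][:-1] == prev[j][:-1]:
                pairs.append((i, j))
            else:
                break  # itemsets with the same prefix are contiguous when sorted
    return prev, pairs


def candidate(prev_set, a, b):
    """Join a and b into a k-itemset; keep it only if every (k-1)-subset is frequent."""
    c = a + (b[-1],)  # sorted, since a < b and they differ only in the last item
    for sub in combinations(c, len(c) - 1):
        if sub not in prev_set:
            return None  # prune: an infrequent subset cannot yield a frequent superset
    return c


def apriori_gen(prev):
    """Serial candidate generation: C_k from L_{k-1}."""
    prev_sorted, pairs = join_pairs(prev)
    prev_set = set(prev_sorted)
    out = set()
    for i, j in pairs:
        c = candidate(prev_set, prev_sorted[i], prev_sorted[j])
        if c is not None:
            out.add(c)
    return out


def distributed_apriori_gen(prev, n_nodes):
    """Hypothetical balancing scheme: deal the join pairs round-robin so each
    node performs the same number of joins (to within one). Nodes are simulated
    sequentially here; each is assumed to hold a full copy of L_{k-1}."""
    prev_sorted, pairs = join_pairs(prev)
    prev_set = set(prev_sorted)
    shares = [pairs[r::n_nodes] for r in range(n_nodes)]
    per_node = []
    for share in shares:
        local = set()
        for i, j in share:
            c = candidate(prev_set, prev_sorted[i], prev_sorted[j])
            if c is not None:
                local.add(c)
        per_node.append(local)
    return shares, per_node


if __name__ == "__main__":
    L2 = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4)}
    print(sorted(apriori_gen(L2)))  # [(1, 2, 3), (1, 2, 4)]
    shares, per_node = distributed_apriori_gen(L2, 3)
    print([len(s) for s in shares])  # [2, 1, 1]
```

In this example, (1, 3, 4) and (2, 3, 4) are pruned because their subset (3, 4) is not frequent. The union of the per-node candidate sets equals the serial result. Round-robin dealing balances the number of joins, not necessarily the pruning cost per join, so it is only one possible balancing criterion.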
Published in:
Parallel and Distributed Processing with Applications (ISPA), 2010 International Symposium on
Date of Conference: 6-9 Sept. 2010