A new parallel algorithm for finding the frequent itemsets in databases is presented. It differs fundamentally from the well-known Apriori algorithm, in which the size of the new frequent itemsets increases by 1 at the beginning of every step. In our algorithm, the frequent itemsets are determined by progressively enlarging the interval to which their individual items belong: if at step $k$ the new candidates come from the intervals $[i, i+k]$, $i = 1, 2, \ldots, n-k$, then at step $k+1$ they come from the intervals $[i, i+k+1]$, $i = 1, 2, \ldots, n-k-1$. The frequent individual items are identified by their index. The basic idea is that the new frequent itemsets with individual items from the interval $[i, j]$ contain both item $i$ and item $j$.

The work of building the frequent itemsets is shared among $n$ processors. Processor $P_i$ computes, step by step, the sets $F_{i,j}$ of frequent itemsets with individual items from the intervals $[i, j]$, $j = i, \ldots, n$. To compute $F_{i,j}$, $P_i$ uses $F_{i,j-1}$, obtained in the previous step, and $F_{i+1,j}$, received from processor $P_{i+1}$.

The main advantage of our parallel algorithm is that its communication pattern is known before the algorithm starts, which allows the communication to be mapped onto the hardware. Another major advantage is that the set of transactions can be distributed to the processors before processing begins. This is possible because $P_i$ only computes $F_{i,j}$, $j = i, \ldots, n$, and every new itemset it must count contains item $i$; therefore $P_i$ needs only the transactions that contain the frequent item $i$.
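The interval recurrence above can be illustrated with a minimal sequential simulation in Python. This is a sketch under stated assumptions, not the authors' implementation. The function name `frequent_items_by_interval` is hypothetical. Items are re-indexed from 0 rather than 1. The $n$ processors are simulated by a loop; at step $d = j - i$, all processors would run concurrently. `F[(i, j)]` holds every frequent itemset whose items lie in $[i, j]$. The way $F_{i,j-1}$ and $F_{i+1,j}$ are combined is our own reading: a candidate containing both $i$ and $j$ is kept only if removing $j$ leaves a member of $F_{i,j-1}$ and removing $i$ leaves a member of $F_{i+1,j}$. Support counting is a naive scan.

```python
from collections import Counter
from itertools import chain


def frequent_items_by_interval(transactions, min_support):
    """Hypothetical sketch of interval-based frequent-itemset mining.

    Assumption: items are relabelled 0..n-1 (0-based, unlike the 1-based
    text) after infrequent items are discarded. F[(i, j)] holds every
    frequent itemset whose items all lie in the interval [i, j].
    """
    # Count single items; keep only the frequent ones, identified by index.
    counts = Counter(chain.from_iterable(set(t) for t in transactions))
    items = sorted(x for x, c in counts.items() if c >= min_support)
    index = {x: k for k, x in enumerate(items)}
    n = len(items)
    encoded = [frozenset(index[x] for x in t if x in index) for t in transactions]

    # Data distribution: simulated processor i keeps only transactions
    # that contain item i, since every new itemset it counts contains i.
    local = [[t for t in encoded if i in t] for i in range(n)]

    # Step 0: each processor starts with its own single item.
    F = {(i, i): {frozenset([i])} for i in range(n)}

    # Step d = j - i: processor i (i < n - d) computes F[i, i + d] from its
    # own F[i, j - 1] and F[i + 1, j], "received" from processor i + 1.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            left, right = F[(i, j - 1)], F[(i + 1, j)]
            # New itemsets contain both i and j: extend sets holding i by j,
            # then require the subset without i to be frequent as well.
            candidates = {a | {j} for a in left if i in a}
            candidates = {c for c in candidates if c - {i} in right}
            # Naive support count over the processor's local transactions.
            new = {c for c in candidates
                   if sum(1 for t in local[i] if c <= t) >= min_support}
            F[(i, j)] = left | right | new

    result = F[(0, n - 1)] if n else set()
    # Map indices back to the original item labels.
    return {frozenset(items[k] for k in s) for s in result}


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
    # prints ['a'], ['b'], ['c'], ['a', 'b'], ['a', 'c'], ['b', 'c'] (one per line)
    for s in sorted(frequent_items_by_interval(db, 3), key=lambda s: (len(s), sorted(s))):
        print(sorted(s))
```

In this sketch, support counting by processor `i` touches only `local[i]`, which mirrors the up-front distribution of transactions. The only data a processor needs from another is `F[(i + 1, j)]` from its right-hand neighbour, so the communication pattern is fixed before the run starts.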