Sequential pattern mining is becoming increasingly useful and essential in many scientific and commercial domains. The enormous size of available datasets and the potentially large number of candidate patterns demand efficient and scalable algorithms. In this paper, we present an efficient parallel algorithm named pre-clustering based sequential pattern mining (PCSPM). The algorithm groups the sequence data into clusters according to a similarity definition, distributes the clusters to the nodes of a distributed-memory parallel computer, and forms node sets according to the clusters. By confining most communication to within each node set, it greatly reduces unnecessary communication among the parallel computing nodes and therefore saves considerable communication time. The experimental results and the accompanying analysis show that the PCSPM algorithm is efficient and practical.
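The pre-clustering and node-set formation described above can be illustrated with a minimal sketch. It is not the PCSPM implementation: the abstract does not state the similarity definition or the distribution scheme, so the Jaccard similarity on item sets, the greedy threshold clustering, the round-robin node assignment, and the names `jaccard`, `pre_cluster`, and `form_node_sets` are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two item sets (assumed similarity definition)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pre_cluster(sequences: List[Sequence[str]], threshold: float) -> List[List[int]]:
    """Greedily group sequence indices into clusters (assumed scheme).

    Each sequence joins the first cluster whose representative (the
    cluster's first member) has similarity >= threshold; otherwise it
    starts a new cluster.
    """
    clusters: List[List[int]] = []
    for idx, seq in enumerate(sequences):
        items = set(seq)
        for cluster in clusters:
            representative = set(sequences[cluster[0]])
            if jaccard(items, representative) >= threshold:
                cluster.append(idx)
                break
        else:
            # No existing cluster was similar enough.
            clusters.append([idx])
    return clusters


def form_node_sets(num_clusters: int, num_nodes: int) -> Dict[int, List[int]]:
    """Assign node ids to clusters round-robin (assumed scheme).

    Each cluster ends up with its own node set, so that most
    communication can stay among the nodes of one set.
    """
    if num_clusters <= 0:
        raise ValueError("need at least one cluster")
    if num_nodes < num_clusters:
        raise ValueError("need at least one node per cluster")
    node_sets: Dict[int, List[int]] = {c: [] for c in range(num_clusters)}
    for node in range(num_nodes):
        node_sets[node % num_clusters].append(node)
    return node_sets
```

For instance, with the hypothetical sequences `[["a", "b", "c"], ["a", "b", "d"], ["x", "y"], ["x", "y", "z"]]` and a threshold of 0.5, `pre_cluster` groups the first two and the last two sequences together, and `form_node_sets(2, 4)` gives each cluster a set of two nodes. The mining step itself and the communication within each node set are outside the scope of this sketch.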