TFP: an efficient algorithm for mining top-k frequent closed itemsets

Authors: Jianyong Wang (Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China); J. Han; Y. Lu; P. Tzvetkov

Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without min_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a combined top-down and bottom-up FP-Tree traversal strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
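To make the mining task itself concrete, the following is a minimal brute-force sketch of "top-k frequent closed itemsets of length no less than min_l" on a toy database. It is not the TFP algorithm: it uses no FP-Tree, no closed node count or descendant_sum pruning, and it enumerates every candidate itemset, so it is only practical for a handful of items. The function name `top_k_closed_itemsets`, the tie-breaking order, and the toy transactions are assumptions made for illustration.

```python
from itertools import combinations


def top_k_closed_itemsets(transactions, k, min_l):
    """Return up to k (itemset, support) pairs, highest support first.

    An itemset is closed if no proper superset has the same support,
    which is equivalent to the itemset being equal to the intersection
    of all transactions that contain it.
    Hypothetical helper for illustration; exponential in the number of items.
    """
    tsets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*tsets))
    closed = []
    # Only consider candidates that satisfy the length constraint min_l.
    for size in range(max(min_l, 1), len(items) + 1):
        for cand in combinations(items, size):
            cset = frozenset(cand)
            covering = [t for t in tsets if cset <= t]
            if not covering:
                continue  # support 0: not frequent at any threshold
            closure = frozenset.intersection(*covering)
            if closure == cset:
                closed.append((cand, len(covering)))
    # Rank by support (descending); break ties lexicographically (an assumption).
    closed.sort(key=lambda pair: (-pair[1], pair[0]))
    return closed[:k]


# Toy database (assumed data, not from the paper).
db = [
    ["a", "b", "c"],
    ["a", "b", "c", "d"],
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c"],
]
result = top_k_closed_itemsets(db, k=3, min_l=2)
```

The support of the last itemset returned plays the role of the final min_support: the user supplies only k and min_l, and the threshold falls out of the result. An algorithm such as TFP aims to discover and raise that threshold during mining rather than after exhaustive enumeration.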

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: 17, Issue: 5)