We present FAST (finding associations from sampled transactions), a refined sampling-based mining algorithm that is distinguished from prior algorithms by its novel two-phase approach to sample collection. In phase I a large sample is collected to quickly and accurately estimate the support of each item in the database. In phase II, a small final sample is obtained by excluding "outlier" transactions in such a manner that the support of each item in the final sample is as close as possible to the estimated support of the item in the entire database. We propose two approaches to obtaining the final sample in phase II: trimming and growing. The trimming procedure starts from the large initial sample and removes outlier transactions until a specified stopping criterion is satisfied. In contrast, the growing procedure selects representative transactions from the initial sample and adds them to an initially empty data set
Published in:
Data Engineering, 2002. Proceedings. 18th International Conference on
Date of Conference: 2002