Mining Frequent Patterns with Wildcards from Biological Sequences

4 Author(s)
Yu He (Dept. of Computer Science, University of Vermont, Burlington, VT 05401); Xindong Wu; Xingquan Zhu; Abdullah N. Arslan

Frequent pattern mining from sequences is a crucial step for many domain experts, such as molecular biologists, to discover rules or patterns hidden in their data. To find specific patterns, many existing tools require users to specify gap constraints beforehand. In practice, it is often nontrivial for a user to provide such gap constraints. In addition, a change to the gap values may give completely different results and require a separate, time-consuming re-mining procedure. Consequently, it is desirable to develop an algorithm that automatically and efficiently finds patterns without user-specified gap constraints. In this paper, a frequent pattern mining problem without user-specified gap constraints is presented and studied. Given a sequence and a support threshold value, all subsequences whose support is not less than the given threshold value are discovered. These frequent subsequences then form patterns. Two heuristic methods (one-way scan and two-way scan) are proposed to mine frequent subsequences and estimate the maximum support for both artificial and real-world data. Given a specific pattern, the simulated results demonstrate that the one-way scan heuristic performs better at estimating the maximum support, with more than ninety percent accuracy.
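To illustrate why changing the gap values can change the results, here is a minimal Python sketch that counts occurrences of a pattern under a user-specified gap constraint. It is not the one-way or two-way scan heuristic from the paper. The abstract does not define support precisely, so this sketch assumes it is the number of position tuples that match the pattern with between `min_gap` and `max_gap` wildcard positions separating consecutive pattern characters. The function name, its parameters, and the example sequence are all hypothetical.

```python
def count_occurrences(seq, pattern, min_gap, max_gap):
    """Count position tuples in `seq` that spell out `pattern`, where the
    number of wildcard positions between consecutive pattern characters
    lies in [min_gap, max_gap]. (Assumed definition of support.)"""
    if not pattern:
        return 0
    # ways[i] = number of matches of the pattern prefix seen so far
    # whose last character sits at index i of seq.
    ways = [1 if ch == pattern[0] else 0 for ch in seq]
    for p in pattern[1:]:
        nxt = [0] * len(seq)
        for i, ch in enumerate(seq):
            if ch != p:
                continue
            # The previous character may sit at index k with
            # min_gap <= i - k - 1 <= max_gap.
            lo = max(0, i - 1 - max_gap)
            hi = i - 1 - min_gap
            if hi >= lo:
                nxt[i] = sum(ways[lo:hi + 1])
        ways = nxt
    return sum(ways)


seq = "ACGTACGT"  # toy sequence, chosen for illustration only
adjacent = count_occurrences(seq, "AC", 0, 0)  # A immediately followed by C
flexible = count_occurrences(seq, "AC", 0, 4)  # up to 4 wildcards allowed
exact_4 = count_occurrences(seq, "AC", 4, 4)   # exactly 4 wildcards
```

On this toy sequence the three calls return 2, 3, and 1 respectively, so the same pattern passes or fails a support threshold of 2 depending only on the gap values chosen. This sensitivity is what motivates mining without user-specified gap constraints.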

Published in:

2007 IEEE International Conference on Information Reuse and Integration

Date of Conference:

13-15 Aug. 2007