New data structures for analyzing frequent factors in strings

2 Author(s)
Baena-Garcia, M. ; Dipt. Lenguajes y Cienc. de la Comput., Univ. de Malaga, Malaga, Spain ; Morales-Bueno, R.

Discovering frequent factors from long strings is an important problem in many applications, such as biosequence mining. Classical approaches process a vast database of short strings. In this paper, however, we analyze a small database of long strings. The main difference lies in the high number of patterns to analyze. To tackle the problem, we have developed a new algorithm for discovering frequent factors in long strings. This algorithm uses a new data structure to arrange nodes in a trie. A positioning matrix is defined as a new positioning strategy. By using positioning matrices, we can apply advanced pruning heuristics in a trie at minimal computational cost. The positioning matrices let us process strings that include Short Tandem Repeats and calculate different interestingness measures efficiently. The algorithm has been successfully used in natural language and biological sequence contexts.
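The abstract does not define the positioning matrix, so the following is only a minimal sketch of the general idea it builds on: growing a trie of factors level by level and pruning by support. It is not the authors' data structure. The function name `frequent_factors`, its parameters `min_support` and `max_len`, and the use of plain start-position lists per node are all assumptions made for illustration. Each frequent factor keeps the list of positions where it starts; a factor is extended by one character only if it is itself frequent, since no extension can occur more often than its prefix.

```python
from collections import defaultdict


def frequent_factors(text, min_support, max_len=None):
    """Return {factor: count} for every factor (substring) of `text`
    that occurs at least `min_support` times, counting overlaps.

    Illustrative sketch only: start-position lists stand in for the
    paper's positioning matrices, which the abstract does not specify.
    """
    if min_support < 1:
        raise ValueError("min_support must be at least 1")
    limit = len(text) if max_len is None else min(max_len, len(text))

    result = {}
    # Trie level 0: the empty factor starts at every position.
    frontier = {"": list(range(len(text)))}
    length = 0
    while frontier and length < limit:
        next_frontier = {}
        for factor, starts in frontier.items():
            # Group occurrences by the character that follows them.
            buckets = defaultdict(list)
            for s in starts:
                end = s + length
                if end < len(text):
                    buckets[text[end]].append(s)
            # Support-based pruning: drop infrequent children, so their
            # subtrees are never explored.
            for ch, child_starts in buckets.items():
                if len(child_starts) >= min_support:
                    next_frontier[factor + ch] = child_starts
        for factor, starts in next_frontier.items():
            result[factor] = len(starts)
        frontier = next_frontier
        length += 1
    return result
```

For example, `frequent_factors("abababab", 3)` keeps `"abab"` (three overlapping occurrences) but prunes `"baba"` (two occurrences) and everything below it. Overlapping occurrences are counted here, which matters for repetitive inputs such as tandem repeats; whether the paper counts them the same way is not stated in the abstract. The sketch handles a single string, whereas the paper addresses a small database of long strings.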

Published in:

Intelligent Systems Design and Applications (ISDA), 2011 11th International Conference on

Date of Conference:

22-24 Nov. 2011