Abstract:
This paper extends Ukkonen's online suffix tree construction algorithm to support substring frequency queries by adding count fields to the internal nodes of the tree. This has applications in sequential data compression. One major problem is that Ukkonen's online construction algorithm does not maintain explicit end-of-string markers in the tree. The major part of our work concerns quickly determining where the end markers for a particular edge would be, so that frequencies can be obtained correctly. To this end, a complete characterization of all end markers on leaf edges is given. Furthermore, we found that an edge between two internal nodes can contain at most one end marker. Using these results, algorithms are given to update the count fields and to answer frequency queries correctly. All algorithms have been implemented and tested for correctness in practice.
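To illustrate what a count field buys and why missing end markers matter, here is a minimal sketch in Python. It is not the paper's algorithm and not Ukkonen's construction: it builds a naive, uncompressed suffix trie in quadratic time, and the names `Node`, `build_counted_suffix_trie`, `find`, `frequency`, and `leaf_count` are hypothetical, chosen only for this example. Each node's count records how many suffixes pass through it, which equals the number of (possibly overlapping) occurrences of the substring spelled by the path to that node. Counting leaves instead undercounts whenever a suffix ends at an interior position, which is the situation that arises when no explicit end marker is stored.

```python
class Node:
    """One node of a naive (uncompressed) suffix trie. Illustrative only."""

    def __init__(self):
        self.children = {}  # maps a character to the child Node
        self.count = 0      # number of suffixes whose path passes through here


def build_counted_suffix_trie(text):
    """Insert every suffix of text, bumping count on each node visited.

    Quadratic time and space: a stand-in for illustration, not an online
    linear-time construction.
    """
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.count += 1
    return root


def find(root, pattern):
    """Return the node reached by spelling pattern from root, or None."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return None
    return node


def frequency(root, pattern):
    """Occurrences of a non-empty pattern in the text, read from the count field."""
    node = find(root, pattern)
    return 0 if node is None else node.count


def leaf_count(node):
    """Count leaves below node; ignores suffixes that end at interior nodes."""
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children.values())


root = build_counted_suffix_trie("abracadabra")
print(frequency(root, "abra"))           # 2
print(leaf_count(find(root, "abra")))    # 1
```

In "abracadabra" the suffix "abra" is itself a prefix of the longer suffix starting at position 0, so it ends at an interior node rather than at a leaf. The count field reports both occurrences, while counting leaves sees only one.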
Published in: Data Compression Conference, 2004. Proceedings. DCC 2004
Date of Conference: 23-25 March 2004
Date Added to IEEE Xplore: 24 August 2004
Print ISBN: 0-7695-2082-0
Print ISSN: 1068-0314