A Fast Algorithm for the Largest Area First Parsing of Real Strings

We present a new algorithm for the largest area first (greedy) parsing of a string that is based on local updates in the enhanced suffix array, and that works in quasilin...

Abstract:

The largest area first parsing of a string often leads to the best results in grammar compression for a variety of input data. However, the fastest existing algorithm has Θ(N² log N) time complexity, which makes it impractical for real-life applications. We present a new largest area first parsing method that has O(N³) complexity in the improbable worst case but runs in quasilinear time for most practical purposes. This result is based on the fact that, in real data, the sum of the depths of the LCP-interval tree, taken over all positions in the suffix array of the input string, exceeds the size of the input by only a small factor α. We analyze the algorithm in terms of α, and experimental results confirm that the method is practical even for genome-sized inputs. We provide C++11 code implementing our method. Additionally, we show that combining the previous and new algorithms improves the worst-case complexity of largest area first parsing by a factor of ³√N.
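To make the quantity α more concrete, the following C++11 sketch estimates it for a given string. It is not the authors' implementation and not the parsing algorithm itself. It assumes that the depth of a suffix-array position is the number of non-root LCP-intervals containing it, and that α is the sum of these depths divided by N; the paper's exact convention may differ. The function names (`build_suffix_array`, `build_lcp`, `sum_of_depths`, `alpha_estimate`) are hypothetical, and the suffix array is built by naive sorting, which is only suitable for small inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive suffix array: sort suffix start positions lexicographically.
// Illustration only; a real tool would use a linear-time construction.
std::vector<int> build_suffix_array(const std::string& s) {
    std::vector<int> sa(s.size());
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&s](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    return sa;
}

// Kasai's algorithm: lcp[i] is the length of the longest common prefix
// of the suffixes at sa[i-1] and sa[i]; lcp[0] is 0.
std::vector<int> build_lcp(const std::string& s, const std::vector<int>& sa) {
    const int n = static_cast<int>(s.size());
    std::vector<int> rank(n), lcp(n, 0);
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;
    int h = 0;
    for (int i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }
        const int j = sa[rank[i] - 1];
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        lcp[rank[i]] = h;
        if (h > 0) --h;
    }
    return lcp;
}

// Sum over all suffix-array positions of the number of non-root
// LCP-intervals containing that position. This equals the sum of the
// widths of all non-root LCP-intervals, which a stack-based bottom-up
// traversal enumerates in linear time.
long long sum_of_depths(const std::vector<int>& lcp) {
    const int n = static_cast<int>(lcp.size());
    std::vector<std::pair<int, int>> stack;  // (lcp value, left boundary)
    stack.push_back({0, 0});                 // root interval, not counted
    long long total = 0;
    for (int i = 1; i <= n; ++i) {
        const int cur = (i < n) ? lcp[i] : 0;  // sentinel closes open intervals
        int lb = i - 1;
        while (cur < stack.back().first) {
            lb = stack.back().second;
            total += i - lb;                   // interval [lb, i-1] has i-lb positions
            stack.pop_back();
        }
        if (cur > stack.back().first) stack.push_back({cur, lb});
    }
    return total;
}

// Assumed definition: alpha = (sum of depths) / N.
double alpha_estimate(const std::string& s) {
    assert(!s.empty());
    const std::vector<int> sa = build_suffix_array(s);
    return static_cast<double>(sum_of_depths(build_lcp(s, sa))) / s.size();
}
```

For example, under these assumptions `alpha_estimate("banana")` gives 7/6: the non-root LCP-intervals cover 3, 2 and 2 suffix-array positions, and N = 6. Highly repetitive inputs such as a long run of a single character drive the sum toward quadratic growth, which matches the abstract's distinction between real data and the improbable worst case.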

Published in: IEEE Access (Volume: 8)

Page(s): 141990 - 142002

Date of Publication: 03 August 2020

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2020.3013676
