n-Gram Statistics for Natural Language Understanding and Text Processing


Abstract:

n-gram (n = 1 to 5) statistics and other properties of the English language were derived for applications in natural language understanding and text processing. They were computed from a well-known corpus composed of 1 million word samples. Similar properties were also derived from the most frequent 1000 words of three other corpuses. The positional distributions of n-grams obtained in the present study are discussed. Statistical studies on word length and trends of n-gram frequencies versus vocabulary are presented. In addition to a survey of n-gram statistics found in the literature, a collection of n-gram statistics obtained by other researchers is reviewed and compared.
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: PAMI-1, Issue: 2, April 1979)
Page(s): 164 - 172
Date of Publication: 30 April 1979

PubMed ID: 21868845
