Observing the history of a pattern in a retrospective corpus is an attractive way to sense the trends related to that pattern efficiently, where a pattern history is defined as the frequency distribution of the pattern over time. Pattern histories can provide information analysts with valuable clues for trend analysis. In this study, a pattern may be a single token or a sequence of words. To extract significant patterns from a large amount of text while computing the corresponding pattern histories, a scalable external-memory approach based on bucket-like suffix sorting and push-pop stack operations is proposed. To demonstrate the scalability and robustness of this approach, the experimental data consisted of 3,225,549 articles (about 4 GB) downloaded from PubMed, covering the 20 years from 1990 to 2009; the total computation time for the pattern histories was about 48 hours on a single PC. Experimental results showed that specific pattern histories did reveal the variations of some events and gave hints for trend analysis.
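To make the definition of a pattern history concrete, the following minimal Python sketch counts, for each year, how often every word n-gram occurs in a toy corpus. It is only an in-memory illustration of the output (a frequency distribution over time per pattern) and does not implement the external-memory suffix sorting or stack operations proposed here. The function name `pattern_histories`, the `max_len` parameter, the whitespace tokenization, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def pattern_histories(docs, max_len=2):
    """Return {pattern: Counter({year: count})} for word n-grams of 1..max_len words.

    Illustrative in-memory counting only; the names and the whitespace
    tokenization are assumptions, not the method described in the abstract.
    """
    histories = defaultdict(Counter)
    for year, text in docs:
        tokens = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                pattern = " ".join(tokens[i:i + n])
                histories[pattern][year] += 1
    return histories


# Hypothetical toy corpus of (year, text) pairs, not data from the study.
docs = [
    (1990, "gene therapy trial"),
    (1991, "gene therapy gene expression"),
    (1991, "stem cell therapy"),
]

histories = pattern_histories(docs)
print(dict(histories["gene therapy"]))  # yearly counts for a two-word pattern
print(dict(histories["gene"]))          # yearly counts for a single token
```

Holding every n-gram in a dictionary like this would not scale to millions of articles, which is presumably why an external-memory approach is needed for a corpus of the size used in the experiments.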