Although the abundance and accessibility of information represent an important cultural advance, they also introduce a new challenge: retrieving relevant information. At the same time, the growing body of available data provides an ideal test bed for theoretical constructions and models. This opportunity has stimulated considerable interest from researchers in many different communities, including physicists, mathematicians, economists, and statisticians. In this spirit, we seek the most suitable tools for examining large masses of data and extracting useful information from them. The information-theoretic method described in this article applies to any corpus of character strings, independent of the type of coding behind it. The method has great potential for fields where human intuition might fail: DNA and protein sequences, geological time series, stock market data, medical monitoring, and so on.