The authors take a real-world application from a text database and present a case history. The techniques ultimately led to a discovery contradicting an accepted paradigm in seismology. Using simple, tailored keyword extraction, they examined a text collection of earthquake data. A discovery was made when an unusual pattern emerged from the text. Treating the text discovery as a hypothesis, they then tested it against a more comprehensive numerical database. It was verified using a standard χ² statistic. The hypothesis was that significant earthquakes in the longitude regions that include California occur more often in the morning hours than at any other time of day.
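The kind of χ² check the abstract mentions can be sketched as a goodness-of-fit test of event counts per time-of-day bin against a uniform distribution. This is a minimal illustration only: the counts, the four 6-hour bins, the 0.05 significance level, and the function name `chi_square_uniform` are all assumptions made here, not values or methods taken from the paper.

```python
# Illustrative sketch only: the counts below are invented, not the paper's data.

def chi_square_uniform(observed):
    """Pearson chi-square statistic against equal expected counts per bin."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical event counts in four 6-hour local-time bins:
# 00-06, 06-12, 12-18, 18-24.
counts = [20, 40, 20, 20]
stat = chi_square_uniform(counts)

# Critical value of the chi-square distribution with 3 degrees of freedom
# (number of bins minus one) at an assumed significance level of 0.05.
CRITICAL_3DF_05 = 7.815
reject_uniform = stat > CRITICAL_3DF_05

print(f"chi-square = {stat:.2f}, reject uniform distribution: {reject_uniform}")
```

If the statistic exceeds the critical value, the assumption that events are spread evenly over the day is rejected at the chosen level; the test alone does not say which bin drives the departure, so the per-bin counts still need to be inspected.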
Published in:
Scientific and Statistical Database Management, 1997. Proceedings., Ninth International Conference on
Date of Conference: 11-13 Aug 1997