A problem that has arisen in recent years is extracting useful information from changes in a data stream that includes natural language. Statistical tests on single-word occurrences can reveal many apparent differences, but understanding the reasons behind such changes requires methods for discovering structure within the entire set of individually changed items. This work presents a methodology, based on utterance clustering and statistical tests on individual features, for understanding how a language model has changed. It further examines the clustering of lexical items via profiles of changes in their association scores. An analysis package based on these techniques can automatically isolate novel portions of the data stream, and human inspection of the isolated data then readily determines the nature of the observed change. We investigate several variants of this analysis on data drawn from an automated call center.
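As an illustration of a single-word statistical test of the kind mentioned above, the following is a minimal sketch and not the method used in this work. It assumes a log-likelihood ratio (G^2) test on word counts from two time periods, whitespace tokenization, and a chi-square threshold of 10.83 (one degree of freedom, p < 0.001). The function names `g2_statistic` and `changed_words` and the sample utterances are hypothetical.

```python
import math
from collections import Counter


def g2_statistic(k1, n1, k2, n2):
    """Log-likelihood ratio (G^2) for a word seen k1 times in n1 tokens
    versus k2 times in n2 tokens. Assumes n1 > 0 and n2 > 0."""
    def term(observed, expected):
        # A cell with zero observations contributes nothing to the sum.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    p = (k1 + k2) / (n1 + n2)  # pooled rate under the "no change" hypothesis
    return 2.0 * (
        term(k1, n1 * p) + term(n1 - k1, n1 * (1 - p))
        + term(k2, n2 * p) + term(n2 - k2, n2 * (1 - p))
    )


def changed_words(old_utterances, new_utterances, threshold=10.83):
    """Return (word, score) pairs whose frequency differs between the two
    periods, sorted by decreasing G^2. The threshold is an assumed cutoff."""
    old_counts = Counter(w for u in old_utterances for w in u.lower().split())
    new_counts = Counter(w for u in new_utterances for w in u.lower().split())
    n_old = sum(old_counts.values())
    n_new = sum(new_counts.values())
    flagged = []
    for word in set(old_counts) | set(new_counts):
        score = g2_statistic(old_counts[word], n_old, new_counts[word], n_new)
        if score >= threshold:
            flagged.append((word, score))
    return sorted(flagged, key=lambda pair: -pair[1])


# Hypothetical call-center utterances from two periods.
old = ["pay my bill"] * 50
new = ["pay my bill"] * 25 + ["reset my password"] * 25
flagged = changed_words(old, new)
```

On this toy data the test flags not only the new words but also words whose share merely shrank. That shows why per-word tests alone produce many apparent differences and why the changed items need further structure, such as clustering, before the change can be interpreted.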