The large volume of data on the Internet makes it extremely difficult to extract high-level information, such as recurring or time-varying trends in document content. Dimensionality reduction techniques can simplify the analysis, but the amount of data that remains is still quite large. If the analysis is restricted to text documents, Latent Dirichlet Allocation (LDA) can be used to quantify semantic, or topical, groupings in the data set. This paper proposes a method that combines LDA with the visualization capabilities of Self-Organizing Maps (SOMs) to track topic trends over time. By examining the response of a map over time, it is possible to build a detailed picture of how the contents of a data set change.
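As a rough illustration of the idea, and not the authors' implementation, the sketch below trains a small Self-Organizing Map on document-topic proportion vectors and then compares how many documents from each time slice land on each map node. Several things here are assumptions made for the example: the topic vectors are synthetic Dirichlet draws standing in for the output of a fitted LDA model, the two time slices and their parameters are invented, and the helper names (`sample_dirichlet`, `train_som`, `hit_map`) are hypothetical.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one topic-proportion vector from Dirichlet(alpha) via gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def best_matching_unit(weights, x):
    """Return the index of the node whose weight vector is closest to x."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols SOM with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]  # row-major grid
    weights = [[rng.random() for _ in range(dim)] for _ in coords]
    sigma0, sigma_end = max(rows, cols) / 2.0, 0.5
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)
            sigma = sigma0 + (sigma_end - sigma0) * frac
            b_row, b_col = coords[best_matching_unit(weights, x)]
            # Pull every node toward x, weighted by its grid distance to the BMU.
            for i, (r, c) in enumerate(coords):
                dist2 = (r - b_row) ** 2 + (c - b_col) ** 2
                h = math.exp(-dist2 / (2.0 * sigma * sigma))
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
            step += 1
    return weights


def hit_map(weights, docs, rows, cols):
    """Count how many documents map to each node (the map's response)."""
    counts = [[0] * cols for _ in range(rows)]
    for x in docs:
        bmu = best_matching_unit(weights, x)
        counts[bmu // cols][bmu % cols] += 1
    return counts


# Assumed stand-in for LDA output: 3 topics, two time slices whose dominant
# topic shifts from topic 0 (early) to topic 2 (late).
rng = random.Random(42)
early_docs = [sample_dirichlet([8.0, 1.0, 1.0], rng) for _ in range(50)]
late_docs = [sample_dirichlet([1.0, 1.0, 8.0], rng) for _ in range(50)]

ROWS, COLS = 3, 3
weights = train_som(early_docs + late_docs, ROWS, COLS)
early_hits = hit_map(weights, early_docs, ROWS, COLS)
late_hits = hit_map(weights, late_docs, ROWS, COLS)

for label, hits in (("early", early_hits), ("late", late_hits)):
    print(label)
    for row in hits:
        print(" ".join(f"{n:3d}" for n in row))
```

The map is trained once on all documents so that node positions stay fixed, and only the per-slice hit counts change. Comparing the two printed grids then shows where on the map the document mass moves as the dominant topic shifts. With real data, the synthetic vectors would be replaced by per-document topic proportions from an LDA model.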