This paper describes the use of a novel associative memory neural network architecture to perform unsupervised phrase detection in a large, unstructured English text corpus. To make the task deliberately harder, the network is exposed to more than 270,000 Web pages from the .edu domain without any textual substitution or correction (for spelling, grammar, etc.). The corpus, consisting of 150 million words, is represented as a string of sparse tokens, and phrase detection is performed using a distinctive information-theoretic quantity called mutual significance.
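The abstract does not define mutual significance, so the sketch below substitutes a different, well-known information-theoretic measure, pointwise mutual information (PMI) over adjacent word pairs, purely to illustrate the general idea of scoring candidate phrases from co-occurrence statistics without supervision. It is not the paper's associative-memory method, and the function name `pmi_phrases` and its `min_count` and `threshold` parameters are hypothetical choices made for this illustration.

```python
import math
from collections import Counter


def pmi_phrases(tokens, min_count=2, threshold=1.0):
    """Score adjacent token pairs with pointwise mutual information (PMI).

    Assumption: PMI is a stand-in for the paper's mutual significance,
    which this abstract does not define. Pairs seen fewer than
    `min_count` times, or scoring below `threshold` bits, are dropped.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs (avoid div by 0)

    scores = {}
    for (a, b), c_ab in bigrams.items():
        if c_ab < min_count:
            continue  # too rare to trust the estimate
        p_ab = c_ab / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        # PMI in bits: how much more often a and b are adjacent than chance
        score = math.log2(p_ab / (p_a * p_b))
        if score >= threshold:
            scores[(a, b)] = score

    # Highest-scoring candidate phrases first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    tokens = "new york is big . i love new york . the city is old".split()
    print([pair for pair, _ in pmi_phrases(tokens)])  # prints [('new', 'york')]
```

On this toy input only "new york" recurs often enough to pass `min_count`, and its PMI (about 2.9 bits) clears the threshold. A real run over a noisy 150-million-word Web corpus would need far larger count cutoffs, since raw PMI is known to overrate rare pairs.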
Proceedings of the International Joint Conference on Neural Networks, 2003 (Volume 4)
Date of Conference: 20-24 July 2003