Humans communicate with text in thousands of languages, in dozens of scripts, in a variety of binary codes, on millions of topics. There is a need, for both government and commercial applications, to identify these text characteristics to enable follow-on processing such as transcoding, translation, transliteration, routing and prioritization. This paper deals with the implementation of real-time mining of unstructured text on high-speed hardware capable of processing network data streams at gigabyte per second speeds.
Published in:
Data Mining Workshops, 2008. ICDMW '08. IEEE International Conference on
Date of Conference: 15-19 Dec. 2008