We implement a text-classification engine on a single FPGA chip clocked at 50 MHz. The classifier uses the non-parametric nearest-neighbor algorithm with a compression-based distance between two text files, computed by arithmetic coding. We have devised a parallel hardware architecture for computing the tag interval that encodes the data sequence in arithmetic coding. Even with a relatively slow 50 MHz clock, the hardware solution runs 26 times faster than a software implementation of the same classifier written in C++ and executed on a Pentium® D CPU clocked at 3 GHz. Such a hardware-based classifier is advantageous in many applications, not only because of its high execution speed but also because it can be embedded as a single chip in small special-purpose systems with limited computational resources. Examples include a communication board (passively monitoring network traffic and classifying anomalous patterns), a CCTV camera (classifying abnormal behavior for homeland security), a satellite (real-time classification of high-resolution images), and a small-scale weapon that requires real-time target classification. Because we use a universal distance computed by data compression, once a corpus of labeled texts is uploaded onto the chip there is no need for any feature extraction or machine learning.
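As a rough software illustration of the idea (not the hardware architecture described here), the following Python sketch classifies a text by nearest neighbor under a compression-based distance. Several choices are assumptions made only for this example: the adaptive order-0 byte model with Laplace smoothing, the use of the normalized compression distance (NCD) as the distance formula, and the function names `code_length_bits`, `ncd`, and `classify`. The sketch relies on the fact that the width of the arithmetic-coding tag interval equals the product of the model probabilities of the symbols, so the ideal code length is the negative base-2 logarithm of that width.

```python
import math
from typing import List, Tuple


def code_length_bits(data: bytes) -> float:
    """Ideal arithmetic-coding length of `data` in bits.

    Assumption: an adaptive order-0 byte model with every count
    initialised to 1. The tag-interval width is the product of the
    per-symbol probabilities, so -log2(width) is accumulated as a sum
    of logarithms to avoid floating-point underflow.
    """
    counts = [1] * 256
    total = 256
    bits = 0.0
    for b in data:
        bits -= math.log2(counts[b] / total)
        counts[b] += 1  # update the model after coding the symbol
        total += 1
    return bits


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed distance formula)."""
    cx, cy = code_length_bits(x), code_length_bits(y)
    cxy = code_length_bits(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: bytes, corpus: List[Tuple[bytes, str]]) -> str:
    """Return the label of the corpus text nearest to `query`."""
    nearest = min(corpus, key=lambda item: ncd(query, item[0]))
    return nearest[1]


if __name__ == "__main__":
    # Hypothetical labeled corpus, for illustration only.
    corpus = [
        (b"the cat sat on the mat " * 20, "text"),
        (b"0123456789 " * 20, "digits"),
    ]
    print(classify(b"the dog sat on the log", corpus))
```

This software loop processes one symbol at a time. An order-0 model also ignores symbol order, so a practical classifier would likely need a higher-order context model; the sketch only shows how a code length derived from the tag interval turns into a distance and a nearest-neighbor decision.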