Skip to Main Content
In this research, we propose NTC (Neural Text Categorizer) as the approach to text categorization. Traditional approaches to text categorization require encoding documents into numerical vectors which leads to the two main problems: huge dimensionality and sparse distribution in each numerical vector. In this research, documents are encoded into string vectors instead of numerical vectors, and a new neural network called NTC which receive a string vector as its input vector is used for text categorization. The goal of this research is to avoid the two main problems by encoding documents into alternative structured data to numerical vectors. We will validate the performance of NTC by comparing it with other machine learning algorithms on the standard test bed, Reuter 21578.