We present a series of comparative experiments on statistical classifiers for task classification. The experiments focus on three aspects: the effect of different features on classifier performance, the robustness of classifiers with different features to data variability, and the effect of training data size on classifier performance. For Chinese input sentences, three linguistic units can serve as features: Chinese characters, Chinese words, and semantic constituents. The advantages and disadvantages of each are analyzed in detail. A controlled study using Naive Bayes classifiers examines the impact of the different features on classifier performance. The classifiers are evaluated separately on clean and noisy test data to investigate their robustness, and learning curves for each feature type show the effect of training data size.
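
As an illustration of the controlled comparison described above, the following is a minimal sketch of a multinomial Naive Bayes classifier in which only the feature extractor changes between runs. It is not the implementation used in the experiments. The Laplace smoothing, the assumption that sentences arrive pre-segmented with spaces between words, the names `char_features`, `word_features`, and `NaiveBayes`, and the four toy training sentences with their `flight` and `weather` labels are all hypothetical choices made for this example. Semantic constituents are omitted because they would require a parser.

```python
import math
from collections import Counter, defaultdict


def char_features(sentence):
    # Character features: every non-space character is one feature.
    return [ch for ch in sentence if not ch.isspace()]


def word_features(sentence):
    # Word features: assumes the sentence is already segmented,
    # with words separated by spaces (an assumption of this sketch).
    return sentence.split()


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, extract, alpha=1.0):
        self.extract = extract              # feature extractor under test
        self.alpha = alpha                  # smoothing constant (assumed)
        self.class_counts = Counter()       # sentences seen per task label
        self.feature_counts = defaultdict(Counter)  # feature counts per label
        self.vocab = set()                  # all features seen in training

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for feat in self.extract(sentence):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for feat in self.extract(sentence):
                if feat not in self.vocab:
                    continue  # features never seen in training are skipped
                score += math.log((self.feature_counts[label][feat] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data: pre-segmented sentences with made-up task labels.
train_sentences = ["我 想 订 机票", "查询 航班 时间", "今天 天气 怎么样", "明天 会 下雨 吗"]
train_labels = ["flight", "flight", "weather", "weather"]

# Same classifier, same data; only the feature unit differs.
char_nb = NaiveBayes(char_features).fit(train_sentences, train_labels)
word_nb = NaiveBayes(word_features).fit(train_sentences, train_labels)

print(char_nb.predict("订 一 张 机票"))
print(word_nb.predict("订 一 张 机票"))
```

Keeping the classifier fixed and swapping only the extractor mirrors the idea of a controlled study: any difference in accuracy can then be attributed to the feature unit. A robustness check or learning curve could be sketched in the same way by evaluating both models on perturbed test sentences or on growing subsets of the training data.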