Abstract:
Machine learning methods for text classification have expanded from topic identification to more challenging tasks such as sentiment classification, and it is valuable to explore and compare the methods applied to sentiment classification and to investigate the factors that influence them. The chief aim of the present work is to compare four machine learning methods for sentiment classification of Chinese reviews. The corpus consists of 16,000 reviews from a website. We investigate the factors that affect performance, namely feature representation via Word-Based Unigram (WBU) and Bigram (WBB) and Chinese Character-Based Bigram (CBB) and Trigram (CBT), feature weighting schemes, and feature dimensionality. Experimental evaluations show that performance depends on these settings. We conclude that the Naive Bayes (NB) classifier obtains the best average performance on the task when using WBB and CBT as features with Boolean weighting across different dimensionalities.
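The feature representations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample review, and its word segmentation are hypothetical, and the paper's actual tokenizer and preprocessing are not described here.

```python
def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, in order (CBB for n=2, CBT for n=3)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return the overlapping word n-grams of a token list (WBU for n=1, WBB for n=2)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def boolean_features(grams):
    """Boolean weighting: a feature is 1 if the n-gram occurs at all, whatever its count."""
    return {gram: 1 for gram in grams}


# Hypothetical review text and a hypothetical word segmentation of a shorter phrase.
review = "服务很好很好"
tokens = ["服务", "很", "好"]

cbb = boolean_features(char_ngrams(review, 2))   # character-based bigrams
cbt = boolean_features(char_ngrams(review, 3))   # character-based trigrams
wbb = boolean_features(word_ngrams(tokens, 2))   # word-based bigrams
```

Because the weighting is Boolean, the repeated bigram 很好 contributes a single feature with value 1 rather than a count of 2. Feature vectors of this kind could then be passed to a classifier such as Naive Bayes.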
Published in: 2007 International Conference on Natural Language Processing and Knowledge Engineering
Date of Conference: 30 August 2007 - 01 September 2007
Date Added to IEEE Xplore: 29 October 2007