Text Document Preprocessing with the Bayes Formula for Classification Using the Support Vector Machine

4 Author(s)
Dino Isa (The University of Nottingham, Malaysia Campus, Semenyih); Lam H. Lee; V. P. Kallimani; R. RajKumar

This work implements an enhanced hybrid classification method that combines the naive Bayes approach with the support vector machine (SVM). The Bayes formula is used to vectorize (as opposed to classify) a document: each document is represented by a probability distribution over the categories to which it may belong. The categories come from a predetermined set of topics, such as those in the "20 Newsgroups" data set. With these probability distributions as the document vectors, the SVM then classifies the documents in the multidimensional space they define. A pure naive Bayes classifier assigns a document using only its highest category probability, which amounts to an inadvertent dimensionality reduction; the SVM overcomes this by using the probability values of every category for each document. The method can be applied to any data set and shows a significant reduction in training time compared to the Lsquare method, and a significant improvement in classification accuracy compared to pure naive Bayes systems and TF-IDF/SVM hybrids.
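
The vectorization step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a multinomial naive Bayes model with Laplace smoothing, whitespace tokenization, and a toy two-category corpus, and the names `train_naive_bayes` and `bayes_vectorize` are hypothetical. It shows only how the Bayes formula turns a document into a vector with one probability per category.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Estimate log priors and Laplace-smoothed log likelihoods per category.

    Assumption: multinomial naive Bayes over whitespace-separated tokens.
    """
    categories = sorted(set(labels))
    vocab = {w for d in docs for w in d.split()}
    word_counts = {c: Counter() for c in categories}
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())

    log_prior = {c: math.log(doc_counts[c] / len(docs)) for c in categories}
    log_like = {}
    for c in categories:
        total = sum(word_counts[c].values()) + len(vocab)  # add-one smoothing
        log_like[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return categories, log_prior, log_like


def bayes_vectorize(doc, model):
    """Return [P(category | doc) for each category] instead of a single label."""
    categories, log_prior, log_like = model
    scores = []
    for c in categories:
        score = log_prior[c]
        for w in doc.split():
            if w in log_like[c]:  # out-of-vocabulary words are ignored
                score += log_like[c][w]
        scores.append(score)
    # Normalize in log space so the posteriors sum to 1 without underflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


# Toy corpus (illustrative only, not from the paper).
docs = [
    "the team won the game",
    "the player scored a goal",
    "the cpu runs the code",
    "the compiler builds the code",
]
labels = ["sport", "sport", "tech", "tech"]

model = train_naive_bayes(docs, labels)
vector = bayes_vectorize("player scored goal", model)
print(model[0], vector)
```

Each document becomes a vector whose length equals the number of categories. In the hybrid scheme, these vectors (rather than only the largest entry) would be passed to an SVM as training and test features; any standard SVM implementation could fill that role.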

Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 20, Issue: 9)