Easier access to internet services has contributed to a drastic increase in the number of web pages. This growth has made it harder for internet users to retrieve the latest, most relevant, and highest-quality web information, because the enormous volume of content makes web information difficult to restructure. To ensure that current, high-quality, and relevant web information can be retrieved effectively, web documents need to be classified. This paper discusses the results of classifying web documents using extraction and machine learning techniques. Four kernels, namely the Radial Basis Function (RBF), linear, polynomial, and sigmoid kernels, are applied to test classification accuracy. The results show that classification accuracy increases as more web documents are used. The results also show that the linear kernel performs best for web document classification, compared with the RBF, polynomial, and sigmoid kernels.
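For readers unfamiliar with the four kernels named above, the following is a minimal sketch of their usual textbook definitions, written in plain Python. It is not the paper's implementation: the abstract does not state the classifier, the feature representation, or any parameter values. The parameter names `gamma`, `coef0`, and `degree`, their defaults, and the two example vectors are assumptions chosen for illustration.

```python
import math


def dot(x, y):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # k(x, y) = x . y
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # k(x, y) = (gamma * x . y + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    # k(x, y) = tanh(gamma * x . y + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


# Hypothetical term-count vectors for two web documents (assumed data).
doc_a = [1.0, 0.0, 2.0]
doc_b = [0.0, 1.0, 2.0]

for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("rbf", rbf_kernel),
    ("sigmoid", sigmoid_kernel),
]:
    print(name, kernel(doc_a, doc_b))
```

Each function returns a similarity score for a pair of document vectors. A kernel-based classifier would evaluate one of them over pairs of training documents, and the choice of kernel is what the reported accuracy comparison varies.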
Information Technology (ITSim), 2010 International Symposium in (Volume: 2)
Date of Conference: 15-17 June 2010