Most kernels used in SVMs are designed for continuous data and neglect the structure of text. In contrast to these classical kernels, we propose the use of various string kernels for spam filtering. Data preprocessing is also a vital part of text classification (TC), where the objective is to generate feature vectors usable by SVM kernels. We detail a feature mapping variant for TC that yields improved performance for the standard SVM in the filtering task. Furthermore, we propose an online active framework for spam filtering.
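To illustrate what a string kernel computes, the following is a minimal sketch of a p-spectrum kernel, which compares two messages by the contiguous substrings of length p they share. The abstract does not say which string kernels are used, so the choice of the spectrum kernel, the function names, and the default p=3 are all assumptions made for illustration only.

```python
from collections import Counter


def spectrum_features(text, p=3):
    """Count every contiguous substring of length p in text (assumed feature map)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(s, t, p=3):
    """Inner product of the p-gram count vectors of s and t."""
    fs, ft = spectrum_features(s, p), spectrum_features(t, p)
    # Iterate over the smaller feature map; missing p-grams count as 0.
    if len(fs) > len(ft):
        fs, ft = ft, fs
    return sum(count * ft[gram] for gram, count in fs.items())


def normalized_spectrum_kernel(s, t, p=3):
    """Cosine-normalized kernel, so message length does not dominate."""
    denom = (spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p)) ** 0.5
    return spectrum_kernel(s, t, p) / denom if denom else 0.0
```

A Gram matrix built from such a function over a set of messages could then be passed to any SVM implementation that accepts precomputed kernels. The kernel operates directly on the raw character sequence, so no explicit conversion to continuous feature vectors is needed.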
Published in: IEEE Symposium on Computers and Communications, 2009 (ISCC 2009)
Date of Conference: 5-8 July 2009