This paper introduces an approach to classifying emails as phishing or non-phishing using the C5.0 algorithm, which achieves very high precision, and an ensemble of other classifiers that achieve high recall. The instance representation is very small, consisting of only five features. The system is evaluated on over 8,000 emails, approximately half of which are phishing emails and the remainder legitimate. The results show the benefits of this recall-boosting technique over using any individual classifier or collection of classifiers.