By Topic

Phishing website detection using Latent Dirichlet Allocation and AdaBoost

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Venkatesh Ramanathan ; Department of Computer Science, George Mason University, Fairfax, Virginia, 22030, USA ; Harry Wechsler

One of the ways criminals steal identity in the cyberspace is using phishing. Attackers host phishing websites that resemble a legitimate website and entice users to click on hyperlinks which directs them to these fake websites. Attackers use these fake sites to capture personal information such as login, passwords and social security numbers from innocent victims, which they later use to commit crimes. We propose here a robust methodology to detect phishing websites that employs for semantic analysis a topic modeling technique, Latent Dirichlet Allocation, and for classification, AdaBoost. The methodology developed is a content driven approach that is device independent and language neutral. The website content of mobile and desktop clients are collected by employing an intelligent web crawler. The website contents that are not in English are translated to English using Google's language translator. Topic model is built using the translated contents of desktop and mobile clients. The phishing website classifier is built using (i) distribution probabilities for the topics found as features using Latent Dirichlet Allocation and (ii) AdaBoost voting technique. Experiments were conducted using one of the large public corpus of website data containing 47500 phishing websites and 52500 good websites. Results show that our method achieves a F-measure of 99%.

Published in:

Intelligence and Security Informatics (ISI), 2012 IEEE International Conference on

Date of Conference:

11-14 June 2012