Skip to Main Content
Phishing has been easy and effective way for trickery and deception on the Internet. While solutions such as URL blacklisting have been effective to some degree, their reliance on exact match with the blacklisted entries makes it easy for attackers to evade. We start with the observation that attackers often employ simple modifications (e.g., changing top level domain) to URLs. Our system, PhishNet, exploits this observation using two components. In the first component, we propose five heuristics to enumerate simple combinations of known phishing sites to discover new phishing URLs. The second component consists of an approximate matching algorithm that dissects a URL into multiple components that are matched individually against entries in the blacklist. In our evaluation with real-time blacklist feeds, we discovered around 18,000 new phishing URLs from a set of 6,000 new blacklist entries. We also show that our approximate matching algorithm leads to very few false positives (3%) and negatives (5%).