Phishing websites impersonate legitimate counterparts to lure users into visiting them. Once a user visits a phishing website, the site may steal the user's private information or trigger drive-by downloads. To detect a phishing website, human experts compare the identity that a website claims with the features of that website. For example, experts often compare the domain name in the URL against the claimed identity. Most legitimate websites have domain names that match their identities, whereas the domain names of phishing websites usually bear little relevance to their claimed (fake) identities. In addition to the blacklists, whitelists, heuristics, and classification techniques used in state-of-the-art systems, we propose to consider websites' identity claims. Our phishing detection system mimics this expert behavior. Given a website, our system learns the identity that the website claims and computes the textual relevance between this claimed identity and other features of the website. The system then uses this textual relevance as one of the features for classification, and our classifiers achieve a true positive rate above 98% with a false positive rate between 0.5% and 1%.
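To make the idea of textual relevance concrete, the following minimal sketch scores how well a claimed identity matches the domain name in a URL. It is an illustration only: the abstract does not state which relevance measure the system uses, so the character-trigram Jaccard similarity, the function names (`char_ngrams`, `domain_label`, `identity_relevance`), and the crude domain extraction are all assumptions made here. The claimed identity is assumed to be already available as a string.

```python
from typing import Set
from urllib.parse import urlparse


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of the lowercased alphanumeric text."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum())
    if len(cleaned) < n:
        # Too short for an n-gram: use the whole string, or nothing if empty.
        return {cleaned} if cleaned else set()
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def domain_label(url: str) -> str:
    """Crude guess at the registrable domain: the label before the last dot.

    Assumption for illustration: multi-part suffixes such as .co.uk are
    not handled; a real system would consult a public suffix list.
    """
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return parts[-2] if len(parts) >= 2 else host


def identity_relevance(claimed_identity: str, url: str) -> float:
    """Jaccard similarity between trigrams of the identity and of the domain.

    Hypothetical stand-in for the paper's textual relevance feature.
    """
    identity_grams = char_ngrams(claimed_identity)
    domain_grams = char_ngrams(domain_label(url))
    if not identity_grams or not domain_grams:
        return 0.0
    shared = identity_grams & domain_grams
    return len(shared) / len(identity_grams | domain_grams)


if __name__ == "__main__":
    # A domain that matches the claimed identity scores high.
    print(identity_relevance("PayPal", "https://www.paypal.com/signin"))
    # An unrelated domain that only mentions the brand in the path scores low.
    print(identity_relevance("PayPal", "http://secure-login.example-host.net/paypal"))
```

A score like this would be only one input among several; as the abstract describes, the relevance value is passed to a classifier together with other features rather than being thresholded on its own.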