Abstract:
Electronic mail has been in use for decades, and more than four billion users access their email through different domains and servers. Email is considered an official means of communication in remote working and in online business, and email labeling can reduce the effort needed to manage this communication. So far, email classification has sorted messages into categories such as spam, non-spam, junk, and social media. However, classification that takes into account the types of cybercrime committed through email has not been addressed. An email can be spam, phishing, fraudulent, harassing, or bullying, or it can be a general (normal) email. Identifying these types is one of the most challenging tasks for both email service providers and consumers. Several spam identification models have been proposed and tested, but very limited work has been done on multi-class classification of emails, even though emails can be classified into more than the two classes of spam and ham. In this paper, we propose a solution that classifies emails into four classes: fraudulent, suspicious, harassment, and normal. A deep learning approach, Long Short-Term Memory (LSTM), is used with stratified sampling to identify the email classes. An effort has also been made to balance the input dataset using over-sampling methods. The proposed model obtained a classification accuracy of more than 90% with stratified sampling only and more than 95% after applying data balancing techniques to the dataset.
Date of Conference: 23-25 November 2021
Date Added to IEEE Xplore: 09 February 2022