Abstract:
Networking plays a leading role in day-to-day activities. The heavy use of networking in education, financial transactions, business, and social gatherings across the globe has been accompanied by a sharp rise in spam URLs. In addition, the demand for the most relevant and up-to-date data in every sector has made spam URLs a severe threat. Spam URLs make it difficult to tell a real website from a fake one, so users fall for attackers' tricks, which in turn threatens system safety. It is therefore worthwhile to classify spam URLs in order to distinguish genuine websites from fake ones, and this has motivated the use of machine learning techniques for spam URL classification. In this research, spam URL data was taken from Kaggle, and machine learning classifiers were used to label each URL as spam or non-spam. Both 10-fold cross-validation and the hold-out method were used. Random forest yielded a high classification accuracy of 97% under 10-fold cross-validation, compared with the hold-out method. Support vector machine yielded 92% classification accuracy, followed by naive Bayes with 91%. The outcomes were assessed on several measures: accuracy, true positive rate, false positive rate, precision, and recall.
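The evaluation measures named in the abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of that calculation, not the authors' code: the function name `evaluate`, the `Metrics` container, the "spam"/"ham" labels, and the toy predictions are all illustrative assumptions, and the numbers are unrelated to the paper's results.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    true_positive_rate: float
    false_positive_rate: float
    precision: float
    recall: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def evaluate(y_true, y_pred, positive="spam"):
    """Compute binary classification measures (hypothetical helper).

    `positive` is the label treated as the positive class; every other
    label is treated as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    tpr = _ratio(tp, tp + fn)
    return Metrics(
        accuracy=_ratio(tp + tn, tp + tn + fp + fn),
        true_positive_rate=tpr,
        false_positive_rate=_ratio(fp, fp + tn),
        precision=_ratio(tp, tp + fp),
        recall=tpr,  # recall is the same quantity as the true positive rate
    )


# Toy labels for illustration only (assumed, not taken from the Kaggle data).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
result = evaluate(y_true, y_pred)
print(result)
```

Note that recall and true positive rate are the same ratio, TP / (TP + FN), so in practice the list reduces to four distinct measures.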
Published in: 2022 ASU International Conference in Emerging Technologies for Sustainability and Intelligent Systems (ICETSIS)
Date of Conference: 22-23 June 2022
Date Added to IEEE Xplore: 26 September 2022