QsecR: Secure QR Code Scanner According to a Novel Malicious URL Detection Framework

QsecR framework.

Abstract:

Malicious Uniform Resource Locators (URLs) are a major cybersecurity threat. Cyberattackers spread malicious URLs to carry out attacks such as phishing and malware distribution, which lead unsuspecting visitors into scams, resulting in monetary loss and information theft. The adoption of Quick Response (QR) codes with malicious URLs is a growing concern and an open security issue. Existing QR link detection scanner applications mostly use the blacklist method to detect malicious URLs, which is not the optimal method for detecting new websites. Recently, machine learning methods have gained popularity as a means of enhancing the detection of malicious URLs. However, these methods are entirely data-dependent, and a large and up-to-date dataset is required for training to create an effective detection method. This research proposes QsecR, a secure and privacy-friendly QR code scanner based on a malicious URL detection framework. QsecR is an Android QR code scanner based on predefined static feature classification that employs 39 classes of blacklist, lexical, host-based, and content-based features. A dataset containing 4000 real-world random URLs was gathered from URLhaus and PhishTank. QsecR is evaluated against several QR code scanners in terms of security and privacy. The experimental results show that QsecR outperforms the others and achieves a detection accuracy of 93.50% and a precision of 93.80%, which is significantly higher than current secure QR code scanners. QsecR is also one of the most privacy-friendly applications, requesting least-privilege permissions.

Published in: IEEE Access (Volume: 11)

Page(s): 92523 - 92539

Date of Publication: 03 July 2023

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2023.3291811

SECTION I.

Introduction

Android is the most popular operating system in the world, with more than 2.5 billion active users and a market share of about 75% among mobile operating systems [1]. Google Play Store revenue has risen significantly over the previous decade, reaching $38.6 billion in 2020 after growing by 167% during the past four years [2]. The store also offered approximately 2.9 million apps in 2020, which were downloaded 108 billion times [1]. With the increasing popularity of this operating system on the global market, the threats against it are also expanding rapidly [3]. Android was the most vulnerable operating system in the past few years [4].

One particular threat vector in Android is QR codes. The Quick Response code (QR code) is the most popular two-dimensional barcode, and it was invented by the Japanese company Denso Wave in 1994. A QR code can encode a variety of data types, such as numeric, alphabetic, Kanji, Katakana, and Hiragana characters, symbols, binary data, control codes, and others [5]. QR codes may be used to display text to the user, to add a vCard, to connect to a wireless network, or to open a Uniform Resource Locator (URL) that links to a webpage [6], [7]. They are extensively used in a variety of sectors such as payment, advertising, access control, product identification, and, more recently, COVID-19 vaccination and tracking [8]. QR codes are widely used because of their high data capacity, reading speed, and reliability [9].

In general, security issues build up when a technology is widely used and becomes popular. The QR code is no exception: alongside its broad range of advantages, it has attracted the attention of cyberattackers and has been misused as an attack vector [10]. The first cybercriminal attempts using QR codes were observed in September 2011 [11]. Attackers used a QR code embedded with a malicious link to lead visitors to a malicious website, where they unknowingly downloaded a copy of Jimm infected with the TrojanSMS.AndroidOS.Jfake.f malware, which sends SMS messages to premium-rate numbers. According to [12], financial crimes employing QR codes are on the rise, with around US$13 million reported stolen via QR code scams in China.

There is no doubt that the majority of Internet users lack the basic skills to access the Internet safely. Specifically, they are unable to distinguish between malicious and benign (safe) websites [13]. The success and widespread use of QR codes in attacks is due to the fact that they are unreadable by human eyes and can be read only using specific scanning devices [14], [15]. A malicious QR code link redirects the user to a website that an attacker has modified with the aim of gaining access to the victim’s sensitive information.

Depending on whether the QR code scanner is a human or an automated program, several attack scenarios are possible. These attacks are phishing, malware propagation, cross-site scripting (XSS), SQL injection, command injection, and attacks on the scanner applications [16], [17], [18]. Phishing and malware propagation attacks that involve human intervention are the focus of this paper. The paper also examines the potential privacy threats posed by Android QR code scanner applications that request unusual permissions from users during installation.

A QRishing attack is a form of phishing attack in which the attacker encodes a phishing website in a QR code [19], [20]. Phishing is the most frequent attack using QR codes in which the victim scans the QR code image with their smartphone and is led to a fake website that appears to be legitimate in order to steal sensitive information such as login details and credit card numbers [21], [22].

Malicious websites are frequently used by attackers to deliver malware, and the adoption of QR codes together with malware propagation is a growing concern [23]. In this method, the attacker encodes a malicious URL in a QR code; once scanned by a QR code scanner, it directs the victim to a webpage where they can fall victim to a drive-by download attack. The attacker can infect the user’s systems and cause serious harm through viruses, ransomware, spyware, botnets, Trojan horses, or worms [8].

Existing security techniques for detecting malicious QR links concentrate mostly on the security of the web browser. The efficiency and accuracy of the applications are entirely dependent on the capability of the browser and its malicious link detection method and plug-ins [24]. However, the majority of these browsers rely on the blacklist approach, which is limited in scope and fails to recognize newly generated or obfuscated malicious URLs.

Recently, some scanners have used machine learning techniques for malicious URL detection. Machine learning techniques have significantly improved the detection of malicious URLs over the past decade and overcome the limitations of the blacklist method, but they have limitations of their own, the most prominent being their dependence on data.

To address this research gap, we propose a malicious URL detection framework based on predefined static feature classification. We then extend the concept to the detection of malicious QR codes and develop QsecR, a secure and privacy-friendly QR code scanner for Android. Specifically,

We designed a heuristic malicious URL detection framework that is able to detect new malicious URLs in real time with a high level of accuracy.
We provide a predefined static feature classification for detecting malicious URLs. Predefined ranges of values are assigned to the classes for analysis and comparison. The 39 classes of blacklist, lexical, host-based, and content-based features are extracted and classified.
We have devised a feature evaluation method that assesses the value provided by individual features. It evaluates whether all features contribute to the final calculation, and if any feature does not provide a value, the method decides to use the value of another feature instead, based on specific conditions.
We developed QsecR, a secure and privacy-friendly QR code scanner for Android that protects against malicious QR links, and examined the outcomes of the comparison. QsecR achieves a detection accuracy of 93.50% and outperforms other secure Android QR code scanner applications.

The rest of the paper is structured as follows. Section II reviews the literature on the security and privacy of QR code scanners and examines in detail the malicious URL detection methods used by secure QR code scanner applications. Section III introduces the design and overall framework of QsecR, which includes three phases: redirection, feature extraction and classification, and malicious URL detection. Section IV describes the performance evaluation. It starts with a description of the dataset, which includes 4000 real-world URLs, and then presents several evaluation metrics to measure the performance of the scanners. It next evaluates current Android QR code scanner applications in terms of security and privacy: malware and phishing websites are selected to evaluate how well the scanners detect malicious URLs, and the permissions requested by each scanner during installation are analyzed. Finally, the performance evaluation of QsecR is given in the last part of that section. Findings indicate that QsecR achieves an average accuracy of 93.50% and a precision of 93.80%. Section V concludes the paper and outlines future work.

SECTION II.

Related Work

The security and privacy issues of QR code scanners have been relatively well studied over the past few years [6], [7], [8], [9], [15], [16], [17], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29]. We will briefly review the literature on the security and privacy of current QR code scanners.

A. Security of QR Code Scanners

QR code attacks are still a real concern for scanners [9], [24], [26]. QR-in-QR attacks, QR code payment attacks, QR code counterfeiting, and QR code information leakage have been identified as potential QR code security threats. Reference [31] describes a novel two-layer QR code attack using an automated method that can encode two distinct messages in a QR code, which can be decoded individually by switching scanning orientations. Reference [16] presents QR code attack scenarios for Bitcoin payment that alter wallet address characters via a QR code generator. Reference [14] suggests a nested QR code, which combines two QR codes simultaneously in a square space while both remain clearly readable depending on the scanner’s orientation.

Searching the Google Play Store for the words “QR code scanner” or “QR code reader” returns more than 300 applications, and most of them provide scanning services. A few of them provide a security method to protect users from link threats, while others offer no security measures despite their popularity. The average user has difficulty distinguishing between benign and malicious QR codes because they are unreadable by human eyes and require specific scanning devices. QR code scanner applications that offer security have been categorized in several ways [9], [30], but in this research they are categorized as link security-based and cryptographic-based [8].

1) Cryptographic-Based Method

Cryptographic-based methods are used in QR code scanners to encrypt, sign, and control access to the content in order to provide confidentiality and privacy [9]. Furthermore, digital signatures can accomplish authentication, integrity, and non-repudiation [21]. There are a limited number of applications that support generating and scanning cryptographic QR codes [9]. BarSec [32] is a comprehensive barcode security scanner that adopts symmetric and asymmetric cryptographic mechanisms and offers barcode authentication, data integrity, access control, and confidentiality. The AMP QR Code scanner presents an anti-malware and phishing detection method, which provides encryption for QR codes by using the AES mechanism [27].

Although some QR code security scanners provide cryptographic features, they are still vulnerable and have several fundamental drawbacks. Their main drawbacks are weak algorithms, short key lengths, failure to follow standard encryption structures or optimal encoding schemes, size overhead, and, most importantly, the need to use the same application (generator) to decode the QR code [17]. With these factors in mind, it is easy to see why cryptographic-based methods have not seen widespread adoption.

2) Link Security-Based (URL-Based) Method

Link security-based methods are an online protection technique in which the QR code scanner analyzes the URLs encoded in QR codes and prevents users from being redirected to phishing and malware websites [24]. Secure QR code scanners generally use two protection methods for link security, blacklist and machine learning, which are explained below [8], [9].

The blacklist method is the most preferred link security-based approach for QR code scanners. URLs that have already been identified as potentially harmful (phishing, malware) are stored in blacklist databases that have been gathered over time [29]. A URL is assumed to be harmful if it appears in the blacklist database, in which case a warning is produced; otherwise, it is assumed to be benign. This strategy is extremely fast and simple to implement due to the minimal query overhead, and it produces very few false positives [33], [34].
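As a minimal sketch of the lookup described above, the following Python fragment checks a URL's hostname against a small in-memory set. The set contents and the names `BLACKLIST` and `is_blacklisted` are hypothetical; real scanners query remote databases rather than a local set.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory blacklist of hostnames (assumption for illustration).
BLACKLIST = {"malicious.example", "phish.example"}

def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True if the URL's hostname appears in the blacklist."""
    host = (urlsplit(url).hostname or "").lower()
    return host in blacklist

print(is_blacklisted("http://phish.example/login"))   # listed host
print(is_blacklisted("https://new-phish.example/"))   # not yet listed
```

A host that has not yet been added, such as `new-phish.example` here, is reported as benign, which is the weakness of this approach for freshly created URLs.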

Reference [10] proposes SafeQR, a QR code scanner that is able to detect phishing and malware attacks by invoking the Application Programming Interfaces (APIs) of two well-known blacklist databases, Google Safe Browsing [35] and PhishTank [36]. The AMP QR Code Android scanner application [27] presents an anti-malware and phishing detection method that calls the VirusTotal [37] API.

The primary objective of malicious URL detection methods is to defend users against online attacks and related threats in real time [38]. Detection of malicious URLs has traditionally been done mostly via blacklists, but this method is not exhaustive and cannot identify newly created or obfuscated malicious URLs [39]. Attackers create hundreds of new websites daily, because malicious URLs are short-lived and are suspended once recognized, so the blacklist method is unable to identify fresh URLs that have not yet been added to the blacklist database [40], [41].

As big data becomes increasingly popular, machine learning techniques that are both generalizable and resistant to real attacks have evolved as the most widely used means of detecting malicious URLs [39]. Machine learning is a type of artificial intelligence and is the study of computer algorithms that improve detection models automatically via experience and analyzing data [42].

The main goal of machine learning is to develop a malicious URL detection model by utilizing sample data, referred to as a training dataset, and finding patterns in it [38]. These patterns can then be used to make predictions, categorize, and cluster objects without explicit programming [43]. This method classifies the features represented in a URL by extracting the APIs and other components of a website and then trains a prediction model on a dataset that includes both malicious and benign URLs.

The machine learning detection method is used in a limited number of QR code scanners [8], [24], [28]. QR fence [24] presents a threat-oriented QR malicious link detection framework based on a novel machine learning model that integrates multiple classification algorithms, such as IBK, NB, RT, J48, and Logistic, to train on 31 lexical and content-based features. Reference [28] proposes QRphish, an automated QR code phishing detection approach based on a Bayes classifier machine learning model trained on 20 lexical and host-based features. Reference [8] describes BarAI, a secure real-time artificial intelligence system against malicious QR code links. It uses multiple machine learning classifiers, namely Naive Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), K-nearest neighbours (KNN), and the decision tree J48 (DT) classifier, with 17 lexical features.

Although machine learning techniques have significantly improved the detection of malicious URLs over the past decade and overcome the limitations of the blacklist method, numerous critical weaknesses remain. The main drawbacks of machine learning are [34], [44], [45]:

Labelled Dataset. Supervised machine learning requires tedious and time-consuming labelling of datasets for training.
Massive Stores of Training Data. A huge dataset is required to train machine learning algorithms in order to create an appropriate model for detecting malicious URLs.
Data Dependent (Quality of Dataset). The detection model’s reliability and level of accuracy are entirely dependent on the quality of the dataset. According to [46], models that achieve a high level of accuracy on their training dataset are ineffective for detecting URLs in another dataset.
High Retraining Cost. Continuous retraining with an updated dataset enables a method to identify and respond to new, real-world malicious URLs, but it requires a significant amount of time and software resources [41].
Difficulty of Selecting Features. Feature selection is difficult and requires expertise to select informative features and classes that enhance the detection performance of a method.

Recently, deep learning (a type of machine learning) has been applied to the challenge of detecting malicious URLs, and it has been effective in tackling some problems in machine learning [47]. It eliminates the feature selection procedure, which increases system performance and prevents the loss caused by the selection of incompatible features [48]. It does not require tedious feature extraction, so a model can be trained with minimal effort and still yield an appropriate pattern for detecting malicious URLs [44], [49]. Despite this progress, significant issues remain. Table 1 presents the limitations of the detection methods of current QR code scanners.

TABLE 1 Weakness of Current QR Code Scanner’s Detection Methods

B. Privacy of QR Code Scanners

The safety of QR code scanner applications is a major concern for QR code security [20], [29]. There are several potential threats to the privacy of Android devices, but the most severe is an application’s request for excessive permissions [50]. There is always the possibility that an attacker will discover a vulnerability in a QR code scanner application, which may lead to control over the entire smartphone and access to the user’s sensitive data [9]. This vulnerability arises because the application seeks full permission to access the user’s smartphone resources during the installation process [24].

The APK file packages all the information related to an Android application. Android uses a permission system to limit application access to system resources. An application must declare the permissions it requires in the AndroidManifest.xml file if it attempts to access hardware or software resources [51]. The number of Android built-in permissions is continuously increasing, from 166 permissions at API level 15 to 325 permissions at API level 28. More permissions mean more opportunities for exploitation [52].

Developers request a variety of permissions on the end user’s smartphone, but they may be unaware of the risks associated with obtaining these permissions. Some developers request only the permissions necessary for their apps; others, however, believe that obtaining unnecessary permissions will ensure that their apps continue to run under all circumstances [50]. According to [8] and [9], developers of privacy-friendly QR code scanner applications need standard architectural choices for building applications with least-privilege permissions. The permissions that such applications need to request are camera (to scan the QR code) and internet (to check the URL link). The least-privilege permissions for Save-Privacy QR code scanners are [9]:

Camera: takes pictures and videos;
Network: gives network access and views network connections;
Wi-Fi: view Wi-Fi connections;

However, some of these scanners request unusual permissions which can lead to exploitation, such as changing or erasing the contents of the user’s SD card, location, microphone, Bluetooth, telephone access for the purpose of directly calling phone numbers, SMS, and drawing over the other apps to modify system settings, etc.
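The least-privilege idea can be illustrated with a short Python sketch that flags any requested permission outside an allowed set. Mapping the three categories listed above to the Android constants `CAMERA`, `INTERNET`, `ACCESS_NETWORK_STATE`, and `ACCESS_WIFI_STATE` is an assumption made for this example, and the names `LEAST_PRIVILEGE` and `excessive_permissions` are hypothetical.

```python
# Assumed mapping of the camera, network, and Wi-Fi categories to Android
# permission constants (illustrative, not taken from the paper).
LEAST_PRIVILEGE = {
    "android.permission.CAMERA",
    "android.permission.INTERNET",
    "android.permission.ACCESS_NETWORK_STATE",
    "android.permission.ACCESS_WIFI_STATE",
}

def excessive_permissions(requested):
    """Return the requested permissions that fall outside the allowed set."""
    return sorted(set(requested) - LEAST_PRIVILEGE)

manifest = [
    "android.permission.CAMERA",
    "android.permission.READ_SMS",
    "android.permission.ACCESS_FINE_LOCATION",
]
print(excessive_permissions(manifest))
```

In this sketch the SMS and location permissions are reported as excessive, while the camera permission passes.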

These permissions may expose scanners to vulnerabilities. Reference [53] reports that a popular QR code scanner application, downloaded over 10 million times from the Google Play Store, infected up to 10 million devices via a software update in December 2020. According to [54], six malicious QR code scanner applications with more than 500,000 downloads were discovered in the Google Play Store in 2018, which propagated a virus known as Andr/HiddnAD-AJ. These applications were able to evade Google’s scanning by hiding malicious code and delaying the start of operation until six hours after installation. Based on the findings of [55], Zebra Crossing or ZXing [56], with over 126 million installations in 2016, contained three specific vulnerabilities, namely code injection, unauthorized actions, and information leakage. Table 2 summarizes the related works, including our proposed scanner.

TABLE 2 Summary of Related Work

SECTION III.

QsecR Framework

The proposed framework of QsecR is shown in Figure 1. The malicious URL detection framework is separated into three phases. The first phase checks the URL for redirection. If a shortened or redirected URL is used, this phase resolves it to the original website so that the original URL can be evaluated by the features. The second phase is feature classification. This phase consists of four features, each of which consists of various classes, and its primary objective is to extract relevant information from the URL in order to detect malicious URLs. The third phase evaluates the result and detects malicious URLs. It evaluates the values given by the features. If all the features provide a value, it proceeds to the final computation; however, if any of them fails to deliver a value, it employs the other features’ values and uses all these values for detection.

FIGURE 1. Proposed malicious QR Code detection framework.

A. URL Redirection

Cybercriminals attempt to bypass malicious URL detection techniques using obfuscation methods. Short URLs have become established as the most effective obfuscation method for tricking users by displaying malicious URLs as legitimate, and are widely used in phishing and malware websites [47], [60]. The redirection phase takes the input URL from the QR code and then runs the algorithm for detecting obfuscated URLs. If a short URL is used, this phase resolves it to the original website so that the original URL can be evaluated by the features. This improves the detection accuracy of the framework, because the original URLs are sent to the feature extraction and classification part and evaluated by the features. If a URL is sent to these features without being resolved, some of them will evaluate the wrong website and provide an incorrect value, resulting in a high false positive rate.

The procedure for this phase begins with reading the QR code image. If the QR code contains plain text, the application tells the user that the QR code is benign and displays the content. However, if it includes a link, the application checks whether the URL matches a list of data formats and file extensions (URLs that include IP addresses, executable files, or multimedia file formats). If so, it displays the original URL and ends this phase (sending the URL to the next phase); if not, it opens the website in a WebView. Android WebView is a Chrome-powered system component that allows Android apps to show online information. In this part, the WebView checks whether the inserted URL is original or redirected. If original, it displays the original URL and ends this phase. If not, it opens the original website in the second WebView and redirects to the first WebView by running a method that is called “overrides URL loading”, repeating this until the original URL is displayed [61]. The first WebView counts the number of redirections, and the second one shows the original URL. The significance of this phase is that it returns the original URL even if the website has been redirected more than ten times. In the last step, the URL is displayed to the user, along with any redirection that occurred. The URL is then ready to go on to the next phase and be processed by the features.
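The control flow of this phase can be sketched in Python without a WebView. The sketch below is an illustration under stated assumptions, not the QsecR implementation: `next_hop` is a hypothetical callable that returns the redirect target of a URL (or `None`), the extension list in `SKIP_EXTENSIONS` is invented for the example, and the cap of 20 hops is an arbitrary choice that stays above the ten redirections mentioned above.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical list of executable and multimedia extensions (assumption).
SKIP_EXTENSIONS = (".exe", ".apk", ".mp3", ".mp4")

def skips_resolution(url):
    """True if the URL uses an IP address host or a listed file extension."""
    parts = urlsplit(url)
    try:
        ipaddress.ip_address(parts.hostname or "")
        return True
    except ValueError:
        pass
    return parts.path.lower().endswith(SKIP_EXTENSIONS)

def resolve_redirects(url, next_hop, max_hops=20):
    """Follow redirects until no further hop exists; return (url, hops)."""
    seen = {url}
    hops = 0
    while hops < max_hops:
        target = next_hop(url)
        if target is None or target in seen:  # final URL reached, or a loop
            break
        seen.add(target)
        url = target
        hops += 1
    return url, hops

# Hypothetical redirect chain standing in for real network responses.
chain = {
    "https://sho.rt/a": "https://mid.example/x",
    "https://mid.example/x": "https://final.example/",
}
print(resolve_redirects("https://sho.rt/a", chain.get))
```

Tracking visited URLs in `seen` is a design choice of this sketch so that a redirect loop terminates instead of running until the hop limit.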

B. Feature Extraction and Classification

Specific features must be selected and implemented in order to have a successful and effective detection system. Selecting features is hard and requires expertise, and ineffective feature selection may decrease detection performance [62]. The initial step is to classify and extract relevant information that adequately characterizes the URL. These classes are extracted by parsing and analyzing various components of URLs, as well as by using several APIs that provide valuable information.

In this research, the feature classes are gathered from previous malicious URL detection models that achieved a high level of accuracy [43], [63]. Reference [43] presents a malicious URL detection method based on supervised batch and online machine learning classifiers. That work focuses on multi-feature classification by employing 117 static and dynamic lexical, host-based, redirection, and content-based features. In [63], the authors propose a machine learning approach for malicious URL detection that combines linear and non-linear space transformation approaches. It uses 62 classes for feature classification, extracted from blacklist, lexical, host-based, and content-based features.

Although it is possible to utilize numerous classes, and this may improve the detection accuracy slightly, it is overwhelming and increases the response time. However, in this research, the classification is enhanced by having new classes and by utilizing only critical classes that assist in detecting malicious URLs to increase accuracy.

The critical classes refer to types of classes that assist in extracting essential statistical information from a URL, which leads to detecting whether this website is malicious or benign. These classes give relevant information regarding the website, which results in increased detection accuracy. The critical classes are identified according to the results of detecting malicious URLs in several datasets. Also, in this research, some new classes are added in the certificate and host-based features parts.

This phase consists of four features, each of which consists of various classes, and its primary objective is to extract relevant information from the URL in order to detect malicious URLs. These features are blacklist, lexical, host-based, and content-based. In the QsecR framework, we employ 39 classes, which are shown and described in detail in Table 3.

TABLE 3 List of Selected Features Along With Their Characteristics and Reasons for Their Selection

The blacklist feature searches the URL in several databases and checks if it has already been discovered to be harmful (Phishing, Malware) or not. The VirusTotal [64] API is used to search over 90 different blacklists and web security service websites to determine how many of them contain the provided URL.

The lexical feature is the textual properties of a URL and extracts various details from the URL strings. The URL is broken into multiple components to enhance feature classification, such as the entire URL, hostname, path, and top-level domain (TLD), and each of them is inspected individually for analysis.
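A minimal sketch of this decomposition, using Python's standard `urllib.parse`, is given here. The function name `lexical_components` and the three counts it returns are illustrative assumptions rather than the classes of Table 3, and taking the last hostname label as the TLD is a simplification that ignores multi-label suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit

def lexical_components(url):
    """Split a URL into the parts inspected by lexical analysis."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "url": url,
        "hostname": host,
        "path": parts.path,
        # Naive TLD: the last label of the hostname (assumption).
        "tld": labels[-1] if len(labels) > 1 else "",
        # Illustrative textual properties, not the paper's class list.
        "url_length": len(url),
        "dot_count": host.count("."),
        "digit_count": sum(ch.isdigit() for ch in url),
    }

info = lexical_components("https://login.example.com/a/b?id=42")
print(info["hostname"], info["path"], info["tld"])
```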

The host-based feature extracts a variety of information about the host and web rank. It looks at a website’s traffic statistics and popularity, as well as the host details and the website owner’s personal information. This feature invokes two APIs, WhoXy [65] and OpenPageRank [66]. WhoXy is a WHOIS API, a hosted web service that returns well-parsed WHOIS information in multiple formats. OpenPageRank provides host ranks and visit metrics.

The content-based feature requires the web page to be completely downloaded from the server so that statistical information can be extracted from the raw content. It provides a lot of valuable information and is categorized into HTML, JavaScript, and certificate parts. Each part has various classes and searches through the programming functions.

The URL that was delivered in the previous phase is used for each feature. Depending on its functionality, a feature may collect information by invoking APIs, parsing the URL, or downloading HTML. It then extracts the information according to the predefined values shown in Table 3. The classes are defined and categorized via numeric and binary values. Numeric classes count objects, and binary classes are defined by the presence of unique objects.

A feature’s values can be predefined (fixed) or dynamic, according to the value specified for a class. In malicious URL detection, machine learning and deep learning methods define dynamic values for the classes after training on the dataset and allocate a range for them. Predefined methods determine and set the values and the rules of the classes before feature extraction [67]. The predefined method is adopted here to overcome the data-dependent limitations of learning methods.

The predefined method assists in creating a range of classes and overcoming the majority of the limitations experienced by current malicious URL detection methods. The predefined values are generated and configured based on the static features and characteristics of URLs that are collected according to various conditions in multiple datasets [68]. It assigns to each class the value with the greatest performance in detecting malicious websites, and its performance is independent of the quality of the dataset. The predefined values of the classes in Table 3 are available on GitHub [69].

Afterward, each feature performs comparisons based on the predefined classes and returns a result. Each class has a value between 1 and 5, where 5 indicates that the URL is more likely to be malicious and 1 indicates that it is benign. Each comparison should return a class value. Finally, the overall outcome is determined by averaging the class values, so that a feature returns a final value between 1 and 5. These values are sent to the next phase for detection.
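The scoring described in this paragraph can be sketched as follows. This is an illustration only: the cut-off arguments `low` and `high` are hypothetical stand-ins for the predefined values of [69], the scores 1, 3, and 5 for numeric classes and 1 or 5 for binary classes are assumptions of the sketch, and the function names are invented.

```python
from statistics import mean

def score_numeric(count, low, high):
    """Map a counted quantity to a class value using two assumed cut-offs."""
    if count <= low:
        return 1   # looks benign
    if count <= high:
        return 3   # suspicious
    return 5       # likely malicious

def score_binary(present):
    """Map the presence of a suspicious object to a class value."""
    return 5 if present else 1

def feature_value(class_scores):
    """Average the class values so the feature stays in the range 1 to 5."""
    return mean(class_scores)

scores = [score_numeric(2, 3, 6), score_numeric(5, 3, 6), score_binary(True)]
print(scores, feature_value(scores))
```

Because new classes only add entries to the list that is averaged, a class can be added in this sketch without changing the other scores.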

Although obtaining so many additional classes is possible and may provide a new perspective on the research, it is overwhelming, increases the response time, and may provide some security challenges. Furthermore, due to the fact that this framework is data-independent, new classes can be added easily without having to retrain the entire framework.

C. Malicious URL Detection

The detection framework was developed using the quantity of accessible data and the total amount of data, and is constructed as follows. The proposed malicious URL detection framework (DF) is based on a predefined static feature classification method [68], [70] and is presented in Equation 1.

$\begin{equation*} DF=\sum \nolimits _{i=1}^{n} \left ({F_{i}\ast 20 }\right) \tag{1}\end{equation*}$

$\text{F}_{\mathrm {i}}$ represents the feature value, where i = {1, 2,…, n} indexes the features. Each feature value is multiplied by 20 to determine the weight of each feature, which is out of 100. DF is compared to the threshold value, which is 200; if it exceeds the threshold, the URL is malicious; otherwise, it is benign. The rule below is the feature evaluation method implemented in this research.
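A minimal sketch of Equation 1 and the threshold test is given below. The function names and the sample feature values are hypothetical; only the factor of 20 and the threshold of 200 come from the text.

```python
THRESHOLD = 200  # threshold value stated in the text

def detection_score(feature_values):
    """Equation 1: scale each 1-to-5 feature value by 20 and sum the results."""
    return sum(f * 20 for f in feature_values)

def is_malicious(feature_values, threshold=THRESHOLD):
    """A URL is malicious only when the score exceeds the threshold."""
    return detection_score(feature_values) > threshold

# Hypothetical values in the order blacklist, lexical, host-based, content-based.
benign_like = [1.0, 2.0, 1.5, 2.5]
malicious_like = [5.0, 3.0, 3.5, 4.0]
```

With four features, the score ranges from 80 to 400, so the threshold of 200 corresponds to an average feature value of 2.5.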

Rule:

$\begin{align*} &\text{if any } F_{i} = -1 \text{ and another } F_{i} \ge 3.5 \text{ or blacklist } \ge 3 \\ &\text{then the greatest feature value is assigned to that } F_{i} \end{align*}$

This method evaluates the values delivered by the features. It determines whether all of the features deliver a value for the final calculation; if any of them fails to deliver a value, the method decides, according to various conditions, whether to use another feature’s value in its place. $\text{F}_{\mathrm {i}}$ represents the feature value and is obtained using Equation 2.

$\begin{equation*} F_{i}=\sum \limits _{i=1}^{n} \left ({C_{i} }\right) \tag{2}\end{equation*}$

$\text{C}_{\mathrm {i}}$ indicates the value of a class, where i = {1, 2,…, n} indexes the classes of a feature. Each class can have a value of 1, 3, or 5, depending on the comparisons and conditions given in [69]. The predefined values assign a range of values to the classes, against which the analyses and comparisons are made to return a result.

Drawbacks of the existing secure QR code scanners were discovered through observation and experimentation. The feature evaluation method was applied in this phase to enhance the detection accuracy of malicious URLs, which solved the majority of detection problems. This method evaluates the values delivered by the features in the feature classification phase. It checks whether all the features provide a value for the final calculation (except the lexical feature, which always returns a value); if any of them fails to deliver a value, the method must decide how to proceed. This scenario may occur if an application programming interface (API) fails to respond or if a server unexpectedly goes down. This is the novelty of this research: based on an extensive set of calculations, the framework decides whether to use another feature’s value for detection. Even if two features do not respond, the framework is able to detect malicious URLs with a high level of accuracy. Figure 2 shows the pseudocode of the malicious detection framework.

FIGURE 2. Pseudocode of the malicious detection framework.
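The feature evaluation rule stated above can be illustrated with the short sketch below. It is not the pseudocode of Figure 2: the function name, the dictionary layout, and the behaviour when the condition is not met (missing values are simply left marked as missing) are assumptions made for illustration.

```python
MISSING = -1  # value a feature reports when it could not respond

def evaluate_features(features):
    """Fill missing feature values with the greatest available value,
    but only when the available features already point to a malicious URL.

    `features` maps a feature name to its 1-to-5 value, or MISSING.
    """
    available = {k: v for k, v in features.items() if v != MISSING}
    if len(available) == len(features) or not available:
        return dict(features)  # nothing is missing, or nothing to borrow from
    blacklist = features.get("blacklist", MISSING)
    strong_signal = blacklist >= 3 or any(v >= 3.5 for v in available.values())
    if not strong_signal:
        return dict(features)  # assumption: leave the gap untouched
    greatest = max(available.values())
    return {k: (greatest if v == MISSING else v) for k, v in features.items()}

# Hypothetical scans in which the content-based feature did not respond.
flagged = {"blacklist": 5, "lexical": 3, "host": 2, "content": MISSING}
quiet = {"blacklist": 1, "lexical": 2, "host": 2, "content": MISSING}
```

In the first case the blacklist value is at least 3, so the missing content-based value is replaced by the greatest value (5); in the second case no feature signals a malicious URL, so nothing is substituted.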

SECTION IV. Performance Evaluation and Experimental Setup

In this section, we evaluate the performance of the QsecR framework. First, we introduce the experimental dataset used in this research, which contains 4000 real-world random URLs. Second, we present the five evaluation metrics and their equations. Third, we present a comprehensive systematic review of QR code scanners and compare several applications from security and privacy perspectives. Fourth, we present the design and development of the QsecR application. Fifth, we evaluate the performance of QsecR and benchmark it against other secure QR code scanners using the proposed dataset. Sixth, we discuss the factors that lead QsecR to outperform other scanners in terms of accuracy.

A. Dataset

The proposed malicious URL detection framework utilizes a training dataset that includes 5,500 samples. The 1500 malicious URLs were taken from a malicious URL dataset on Kaggle that was collected from 2020 to 2022, and the 4000 benign samples were taken from the top 4000 Alexa site links in 2022.

The experimental dataset contains 4000 recently gathered real-world URLs: 2000 benign and 2000 malicious (1000 phishing and 1000 malware). The malicious URLs were collected from two of the most well-known malware and phishing databases, URLhaus and PhishTank [72], [73]. The entire dataset is stored in CSV format [59]. Each URL in the dataset is labelled as malicious or benign.

All the websites gathered for the dataset are online and their servers are responding, which allows our research to be evaluated in real time. The URLs were checked with different tools for verification and labelled as benign or malicious. To evaluate the framework’s performance, URLs with a variety of characteristics were picked for this dataset, such as shortened URLs, URLs with IP addresses, obfuscated URLs, extremely long and short URLs, and URLs that are redirected more than twice (Table 4).

TABLE 4 Types of Challenging URLs Used in Dataset
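Several of these URL characteristics can be recognized from the URL string alone. The sketch below shows one possible way to tag them using the Python standard library; the shortener host list and the length limit are illustrative assumptions, not values taken from the dataset.

```python
import ipaddress
from urllib.parse import urlsplit

SHORTENER_HOSTS = {"bit.ly", "rebrand.ly"}  # illustrative, not exhaustive

def host_is_ip(url):
    """Return True when the host part of the URL is a literal IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def describe_url(url, long_limit=75):
    """Tag a URL with a few of the characteristics listed for the dataset."""
    host = urlsplit(url).hostname or ""
    return {
        "ip_host": host_is_ip(url),
        "shortened": host in SHORTENER_HOSTS,
        "long": len(url) > long_limit,  # hypothetical length limit
    }
```

For example, `describe_url("http://192.168.0.1/login")` marks the URL as having an IP-address host, while redirect counts cannot be derived this way because they require contacting the server.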

B. Evaluation Metrics

We compared the results using the confusion matrix, a table designed to visualize the performance of QR link detection (Table 5). It includes the following prediction quality measures:

True Positive (TP) indicates the number of URLs that were correctly detected as malicious.
True Negative (TN) indicates the number of benign URLs correctly detected as benign.
False Positive (FP) indicates the number of benign URLs that were incorrectly detected as malicious.
False Negative (FN) indicates the number of malicious URLs that were incorrectly detected as benign.

TABLE 5 Confusion Matrix
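The four counts can be tallied directly from pairs of actual and predicted labels. The following sketch assumes the labels are the strings `"malicious"` and `"benign"`; the function name and the sample pairs are hypothetical.

```python
def confusion_counts(pairs):
    """Tally TP, TN, FP, and FN from (actual, predicted) label pairs,
    treating "malicious" as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in pairs:
        if actual == "malicious":
            counts["TP" if predicted == "malicious" else "FN"] += 1
        else:
            counts["TN" if predicted == "benign" else "FP"] += 1
    return counts

# Hypothetical results for five scanned URLs.
sample = [
    ("malicious", "malicious"),
    ("malicious", "benign"),
    ("benign", "benign"),
    ("benign", "malicious"),
    ("benign", "benign"),
]
```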

In addition, to represent QsecR's performance comprehensively, we evaluate it using the following metrics: Accuracy (Acc), False Positive Rate (FPR), Precision (Pre), Recall (Rec), and F-1 score (F1). The evaluation metrics are shown in Figure 3.

FIGURE 3. Evaluation metrics.
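For reference, the five metrics can be computed from the confusion matrix counts as sketched below. The formulas are the commonly used textbook definitions, which is an assumption here because the exact form used in Figure 3 is not reproduced in the text; the counts in the example are hypothetical.

```python
def metrics(tp, tn, fp, fn):
    """Compute the five evaluation metrics from confusion matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "fpr": fp / (fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts, not taken from the experiments.
example = metrics(tp=90, tn=80, fp=20, fn=10)
```

The sketch does not guard against empty classes; a division by zero would need handling if, for instance, no URL were predicted as malicious.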

C. Comparison of QR Code Scanners

In this section, we present a comprehensive systematic review of QR code scanner applications on Android and evaluate several scanner applications from the security and privacy perspective. These apps were selected randomly based on previous research, popularity, and security features. Unfortunately, related articles that developed secure QR code scanners do not provide public access to the applications for evaluation [24], [28].

Table 6 shows the details of the 15 secure QR code scanner applications, as well as QsecR, that were compared [74]. The items considered are version, number of downloads, user rating, and security features. The information in this table demonstrates the lack of security features in various scanners that have received millions of downloads. These scanners do not provide adequate protection to safeguard users from potential threats [75], [76].

TABLE 6 Qualitative Comparison of QR Code Scanners

The next step is to evaluate the secure QR code scanners that provide security features. Table 7 lists the components that were evaluated for these applications: check URL, display URL, get full URL (redirect), directly open URL, URL checking method, and detection framework. As seen in this table, the majority of secure QR code scanners do not meet baseline security requirements. The criteria for developing a secure and usable QR code scanner are presented in [9], [25]. The scanners in [58], [77], [78], [79], [80], [81], [82] are unable to follow a redirect and obtain the full URL. Some of them [77], [79] even open the website immediately without user permission. Furthermore, the major detection method used in these applications is the blacklist method, which has the lowest accuracy in detecting malicious URLs, and scanners that employ alternative detection methods are hard to find [32].

TABLE 7 Evaluate Security Features of QR Code Scanners

Malicious URLs were generated to evaluate the security strength of the QR code scanners. The Zoo [87] and Zphisher [88] toolkits, which are available on GitHub, were used to generate malicious URLs for malware propagation and phishing [69]. The Zoo is a very popular malware repository for analysis that gives interested researchers access to around 350 live malware projects. It contains 11 destructive Android tools in the categories of viruses, botnets, Trojans, and ransomware. Dendroid and Android Spy iBanking were selected from these tools because they are in the Android application file format (APK) and are therefore suitable for testing the QR code scanners. Dendroid is a Trojan Android remote administration toolkit that provides a variety of spying options, such as taking photos, downloading pictures, recording audio, video, and calls, sending texts, and more, and it obtains full permission from the user during installation [89]. Android Spy iBanking is a botnet with spying capabilities that include reading SMS messages, redirecting incoming voice calls, and capturing audio and video. It uses various banking Trojans in an effort to bypass mobile two-factor authentication.

Zphisher provides multiple up-to-date phishing websites and allows users to perform phishing attacks on several sites and social media platforms, such as Facebook, Twitter, PayPal, Instagram, Netflix, and many more. In this study, the PayPal website was selected to evaluate the efficiency of the QR code scanners in detecting phishing attacks.

Finally, the most well-known URL shortening websites, Bitly [90] and Rebrandly [91], were used to generate short URLs to evaluate the usability of existing scanners. The main reason for using short URLs is that most secure QR code scanners (malicious URL detection frameworks) cannot detect URL redirection, and those that can are not able to follow it twice.

Table 8 presents the security performance of the QR code scanners as well as QsecR. Nine malware and phishing QR codes that use obfuscation techniques were employed to evaluate the effectiveness of these scanners. Figure 4 shows the result of this evaluation and the number of phishing and malware QR codes detected by each app. The outcome reveals that [32] is capable of detecting some harmful websites and [57] flags phishing attacks as suspicious, while the others lack the ability to detect malicious URLs. Our proposed scanner detected all the phishing and malware samples.

TABLE 8 Evaluate the Security Performance of QR Code Scanners Utilizing Nine Malware and Phishing QR Codes

FIGURE 4. The number of detected phishing and malware QR codes by secure scanners.

Existing secure QR code scanners are susceptible to flaws for a number of reasons. First, current scanners are incapable of resolving shortened URLs and therefore send the shortened URL, rather than the original one, for detection. Second, the primary detection method of existing scanners is blacklist-based, which is incapable of identifying newly created or obfuscated malicious URLs. Third, the scanners that employ machine learning for detecting malicious URLs use weak feature classification and do not implement any fallback when a feature does not deliver a value for the final calculation.

The other weakness of some scanners is that they feature a suspicious-URL option that might make it difficult to distinguish between malicious and safe links. However, QsecR solves this weakness.

Furthermore, we have reviewed all the applications based on the permissions they require during installation. The safety of QR code scanner applications is a major concern for QR code security [20], [29]. There are several potential threats to the privacy of Android devices, but the most severe is an application’s request for excessive permissions [50], which may lead to gaining access control over the entire smartphone and acquiring entry to the user’s sensitive data.

The necessary permissions that applications should request are camera and internet (Wi-Fi, network) [9]. Table 9 shows the access requested by the applications.

* Camera (Cam): Take pictures and videos.
Storage (Stg): 1- Read the contents of your USB storage. 2- Modify or delete the contents of your USB storage.
Location (Loc): 1- Approximate location (network-based). 2- Precise location (GPS and network-based).
Contacts (Cont): 1- Read your contacts. 2- Modify your contacts.
* Wi-Fi Connection Information (wi-fi): 1- View Wi-Fi connections. 2- Connect and disconnect from Wi-Fi. 3- Allow Wi-Fi multicast reception.
Photos/Media/Files (Files): 1- Read the contents of your USB storage. 2- Modify or delete the contents of your USB storage.
Phone (Ph): 1- Read phone status and identity. 2- Directly call phone numbers.
Device ID & Call Information (DevID): Read phone status and identity.
Device & App History (DevHis): Read your web bookmarks and history.
Calendar (Cal): Read calendar events plus confidential information.
* Network (Net): 1- Full network access. 2- View network connections. 3- Receive data from the internet. 4- Change network connectivity.
Others (Oths): 1- Control flashlight. 2- Control vibration. 3- Prevent device from sleeping. 4- Disable your screen lock. 5- Run at startup. 6- Draw over other apps. 7- Install shortcuts. 8- Google Play license. 9- Microphone.

TABLE 9 Requested Permission From Applications During Installation

According to [8] and [9], privacy-preserving QR code scanner applications that follow least privilege must request access only to the camera, Wi-Fi, and network. However, some of these applications request unusual permissions. Most applications request access to storage to read, modify, or delete its contents. Reference [77] requests microphone access; [80] asks for Bluetooth permission; and [57], [75], [78], [80], [82], [83], [86] request access to the location. Reference [81] asks for uncommon permissions to read contacts, make calls, read SMS, draw over other apps, and modify system settings. Reference [82] requests highly unusual access to draw over other apps, control near-field communication, retrieve and run other apps, and read phone status and identity. Reference [83] requests access to the telephone for the purpose of directly calling phone numbers. References [79] and [86] request permission to access the calendar.

The proposed QR code scanner (QsecR) [59] is one of the most privacy-friendly applications and follows least privilege: it requests access only to the camera, Wi-Fi, and network. Next is [32], which requests access to the camera, Wi-Fi, storage, and network.
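A least-privilege check of this kind can be expressed as a simple set difference. In the sketch below, the mapping of "camera, Wi-Fi, and network" to specific Android manifest permission names is an assumption made for illustration, as are the function name and the sample request list.

```python
# Assumed mapping of the camera, Wi-Fi, and network accesses to
# Android manifest permission names.
ALLOWED = {
    "android.permission.CAMERA",
    "android.permission.INTERNET",
    "android.permission.ACCESS_NETWORK_STATE",
    "android.permission.ACCESS_WIFI_STATE",
}

def excess_permissions(requested):
    """Return the requested permissions that go beyond the allowed set."""
    return sorted(set(requested) - ALLOWED)

# Hypothetical request list of a scanner application.
requested = [
    "android.permission.CAMERA",
    "android.permission.INTERNET",
    "android.permission.RECORD_AUDIO",
    "android.permission.ACCESS_FINE_LOCATION",
]
```

An empty result means the application stays within the least-privilege set; any listed permission would need a justification.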

D. The QsecR Application

Following the recommendations presented in [9], [26], and [27], we developed the QsecR application (available on GitHub [59]). The application is developed in the Kotlin programming language on Android, based on the MVVM architecture. The QR code scanner was built with version 2.1.2 of a library based on ZXing [93]. Figure 5 shows the interface of QsecR, its features, and the process of detecting malicious URLs.

FIGURE 5. The interface, features, and process of detecting malicious URLs of QsecR.

Figure 5.1 presents the main interface of the QsecR application. The application contains an insert button that lets the user manually insert a URL or scan a QR code, as shown in Figure 5.2. Figure 5.3 demonstrates the process of manually inserting a website for malicious URL detection. Figure 5.4 illustrates the process of scanning a QR code using the scanner. If the application detects that the QR code contains text, it shows the message directly. Figure 5.5 shows the result of a scanned QR code, in which the application detected that the website is benign. The application shows the detection details when the user clicks on the website name; this part shows each feature’s value and the final score. Since the final score is less than the threshold value of 200, the URL is benign. Furthermore, since it is a safe website, the application allows the user to open the link by clicking on the open URL button. Figure 5.6 shows the procedure for detecting a URL when a feature does not respond. Here, the content-based feature did not respond because the server was down, so the website could not be downloaded and no information could be extracted. Since the other features did not detect that the URL was malicious (according to Figure 2), the framework did not substitute another feature’s value for the content-based feature. In the same situation, Figure 5.7 shows the procedure for detecting a malicious URL when the content-based feature does not respond. Since the blacklist feature detects that the URL is malicious (according to Figure 2), the value of the blacklist feature is employed for the content-based feature. Also, the application does not allow the user to open the link. Figure 5.8 shows the results of scanning several QR code images, which are kept in the history.
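The distinction between a text payload and a URL payload can be sketched as follows. This is an assumption about how such a check might look, not the application's actual logic: a payload is treated as a URL only when it has an `http` or `https` scheme and a host, and both function names are hypothetical.

```python
from urllib.parse import urlsplit

def is_web_url(payload):
    """Return True when a decoded QR payload looks like an http(s) URL."""
    try:
        parts = urlsplit(payload.strip())
    except ValueError:  # raised for malformed hosts such as an unclosed '['
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def handle_payload(payload):
    """Route a payload: URLs go to detection, anything else is shown as text."""
    return "check_url" if is_web_url(payload) else "show_text"
```

For instance, `handle_payload("Meeting at 10 am")` returns `"show_text"`, while an `https://` link is routed to the detection step.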

E. Performance Evaluation

The performance of secure QR code scanners was evaluated in the previous section. The outcome reveals that only Barsec [32] is capable of detecting some harmful websites, while the others lack the ability to detect malicious URLs in real time. To show the feasibility of the QsecR scanner, we evaluate the performance of QsecR and Barsec using the proposed dataset, which contains 4000 real-world URLs. The confusion matrix of QsecR, which measures the detection performance of a scanner, is shown in Table 10. QsecR correctly detected 1876 of the 2000 benign URLs, with 124 misclassified websites. It also correctly detected 1780 malicious URLs out of 2000 phishing and malware websites and failed to identify only 130 websites, which shows great detection performance.

TABLE 10 Confusion Matrix of QsecR

The confusion matrix of BarSec is presented in Table 11. It correctly identified 1546 of the 2000 benign URLs, with 454 false positives. In addition, it correctly identified 1599 malicious URLs out of a total of 2000 websites, with 401 false negatives.

TABLE 11 Confusion Matrix of BarSec
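As a quick check, assuming the usual definition of accuracy, the BarSec counts quoted above (TP = 1599, TN = 1546, FP = 454, FN = 401) give:

$\begin{equation*} Acc=\frac {TP+TN}{TP+TN+FP+FN}=\frac {1599+1546}{4000}=0.78625\approx 78.63\% \end{equation*}$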

The details of the performance evaluation of QsecR and Barsec are shown in Table 12 and illustrated in Figure 6. As shown in this table, the proposed QR link security detection framework outperforms the other scanners. The accuracy and precision of QsecR are 93.50% and 93.80%, respectively, whereas those of Barsec are 78.63% and 77.30%. The findings demonstrate that QsecR improved the malicious URL detection accuracy of Android QR code scanners.

TABLE 12 Detail Performance Evaluation of the QsecR and Barsec

FIGURE 6. The performance of the QsecR and Barsec.

F. Discussion

In this research, we propose QsecR, a secure, privacy-friendly, and usable QR code scanner built on a data-independent malicious URL detection framework. It outperforms other scanners with an accuracy of 93.50%, and the contributing factors are discussed here.

The redirection section plays a key role in detecting obfuscated URLs (short URLs) and sending the original URLs to the feature extraction and classification part, which evaluates the original URLs by features. If a URL is not resolved to its original form before being sent to these features, some of them will evaluate the website wrongly and provide an incorrect value, resulting in a high false-positive rate.
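The redirection step can be sketched as a loop that follows a chain of redirects until it reaches a URL that no longer redirects. In this illustration, `next_location` is a hypothetical callable standing in for the network request that would return the redirect target (or `None`), and `max_hops` is an assumed safety limit; the sample chain uses made-up addresses.

```python
def resolve_redirects(url, next_location, max_hops=5):
    """Follow a redirect chain and return the final URL and the hop count.

    `next_location(url)` returns the redirect target of `url`, or None
    when the URL does not redirect.
    """
    seen = {url}
    hops = 0
    while hops < max_hops:
        target = next_location(url)
        if target is None or target in seen:  # chain ended or looped back
            break
        seen.add(target)
        url = target
        hops += 1
    return url, hops

# Hypothetical chain: a short URL that redirects twice before the real page.
chain = {
    "https://bit.ly/a": "https://rebrand.ly/b",
    "https://rebrand.ly/b": "http://203.0.113.7/login",
}
final_url, hop_count = resolve_redirects("https://bit.ly/a", chain.get)
```

Only `final_url` would then be passed on to feature extraction, so that lexical and host-based features describe the destination rather than the shortening service.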

The other essential component of the malicious URL detection framework is effective feature classification. The feature classification is based on a predefined static classification, which assigns a value to the classes with the greatest performance in detecting malicious websites. The predefined method overcomes the majority of the limitations experienced by current malicious URL detection methods. Also, we employ 39 classes of blacklist, lexical, host-based, and content-based features. The classes are selected according to their importance and effectiveness in detecting malicious URLs. Although it is possible to implement over 150 different classes and this may improve the detection accuracy slightly, it is overwhelming, increases the response time, requires complicated calculations, and may present some security challenges.

The crucial aspect of this detection framework is the evaluation of the value that is delivered by features and situations where the features do not deliver a value, which is hard to find in the other frameworks. This scenario may play out if an API call is unsuccessful or if a server goes down. It will employ alternative feature values for detection based on a variety of circumstances.

The other important principle that sets QsecR (malicious URL detection framework) apart from other scanners is that it classifies websites exclusively as harmful or benign (not a third option such as suspicious).

SECTION V. Conclusion and Future Work

In this paper, we proposed QsecR, a secure and privacy-friendly QR code scanner. It is built on a data-independent malicious URL detection framework based on predefined static feature classification. We employed 39 classes of blacklist, lexical, host-based, and content-based features. Furthermore, we implemented a feature evaluation method that evaluates the values delivered by the features and uses the value of another feature when one of them does not deliver a value. We evaluated the performance of the proposed scanner using a dataset of 4000 real-world URLs and compared the result with other secure QR code scanners in terms of security and privacy. The result showed that QsecR outperforms the others, with accuracy and precision of 93.50% and 93.80%, respectively. Furthermore, it is one of the most privacy-friendly applications and follows least privilege in its permissions.

Although the proposed framework performs well, further improvement and further studies are still required to improve the entire system. Our objective is to improve the accuracy of malicious URL detection by assigning a priority coefficient to the classes and features according to their level of importance. We also plan to enhance the feature classification by expanding the number of classes to obtain more accurate detection of malicious URLs. Another potential research direction is evaluating secure QR code scanners in terms of response time. Last but not least, future research should focus on comprehensive measurements of device resource usage, specifically memory utilization, network usage, and CPU usage. Measuring device resources during experiments allows researchers to assess the performance and efficiency of applications and frameworks and how they operate under various conditions. This data helps identify bottlenecks, optimize resource allocation, and enhance overall system performance.
