A Systematic Literature Review on the Characteristics and Effectiveness of Web Application Vulnerability Scanners

Web applications have been a significant target for successful security breaches in the last few years. They are currently secured, as a primary method, by searching for their vulnerabilities with specialized tools referred to as Web Application Vulnerability Scanners (WVSs). Although these dynamic testing approaches have some advantages, few studies have explored their features and detection capabilities in a systematic way. This article reports findings from a Systematic Literature Review (SLR) that examines the characteristics and effectiveness of the most frequently used WVSs. A total of 90 research papers were carefully evaluated. Thirty (30) WVSs were collected and reported, with only 12 having at least one quantitative assessment of effectiveness. These 12 WVSs were evaluated by 15 original evaluation studies. We found that these evaluations mostly tested only two of the Open Web Application Security Project (OWASP) Top Ten vulnerability types: SQL injection (SQLi) (13/15) and Cross-Site Scripting (XSS) (8/15). Additionally, only one work evaluated six of the OWASP Top Ten vulnerability types, and for only one scanner. We also found that the reported detection rates differed widely across these 15 evaluations. Based on these surprising results, we suggest directions for future work.


I. INTRODUCTION
Web application vulnerabilities such as SQL injection (SQLi), Cross-Site Scripting (XSS), and Cross-Site Request Forgery (CSRF) are common and widely reported. These vulnerabilities give attackers unauthorized access to sensitive information, such as credit card data, accounts, and medical information. Approaches for identifying vulnerabilities in web applications are classified as either Black-box or White-box testing, also known as Dynamic Application Security Testing (DAST) and Static Application Security Testing (SAST), respectively. White-box testing is used by security consultants who are well versed in different programming languages, experienced in creating algorithms, and skilled in inspecting application code [25]. Black-box testing, on the other hand, is used by cyber security professionals who are experts in the different technologies, skilled in analyzing user-supplied input, and capable of thinking outside the box. Black-box testing, which tests applications dynamically, is the most common approach for identifying vulnerabilities [6]. Tools that use a dynamic or Black-box testing approach are usually called Web Vulnerability Scanners (WVSs). Software developers and cyber security experts use these scanners to find vulnerabilities in web applications. These scanners can automatically evaluate the security of web applications with minimal human intervention and are usually marketed as 'point-and-click pentesting' tools [25]. Many WVSs, both commercial and free/open source, are available to help developers and security analysts discover vulnerabilities in web applications. However, these scanners vary in their technical features and vulnerability detection performance. Therefore, the selection of a WVS should be based on factors such as scanner characteristics, availability of documentation, and the scanner's capability of detecting vulnerabilities.
To the best of our knowledge, the present article is to date the most comprehensive systematic literature review on the effectiveness and characteristics of WVSs. The contributions of this study are the following:
1) This survey performed a systematic literature review of published studies about WVSs. It covers four search engines and four key phrases, for a total of 16 searches.
2) The returned publications were then analyzed and classified by their type of contribution: methodology, approach, evaluation, or survey.
3) Each of the 30 acquired web vulnerability scanners was classified based on seven characteristics (number of citations, license type, last update date, scanner technology, run platform, user interface type, and documentation availability) and on its capability of detecting the OWASP Top Ten vulnerabilities.
4) Data were then collected and tallied on the effectiveness and per-vulnerability-type detection rates of web vulnerability scanners, as reported by the 15 evaluation studies obtained.
5) All details of the method, data, and findings are presented in tables and graphs within this article to make it a complete, self-contained, and state-of-the-art account of web vulnerability scanners.
The rest of this article is organized as follows. Section II (Background) introduces web application vulnerability types and web application scanner types and approaches. Section III (Related Work) describes related but less comprehensive surveys. Section IV presents the research questions. Section V (Research Methods) describes our systematic approach to this literature search. Section VI (Results and Findings) describes the findings and results in detail. Section VII (Discussion) analyzes the results collected by the study.

II. BACKGROUND

A. WEB APPLICATION VULNERABILITIES
A vulnerability is a security flaw that silently exists in a software program or application. Attackers or malicious entities try to exploit it to gain illegal access to data owned by the application. Exploitation involves three actors: the application itself (victim), the implementation of the attack that compromises the application (attack), and the entity that carries out the attack (attacker) [63]. The OWASP Top 10 2010, OWASP Top 10 2013, and OWASP Top 10 2017 lists are detailed in Table 1.

B. BLACK-BOX WEB APPLICATION SECURITY TESTING
For many years, Black-box security testing has been the most commonly used approach for testing web applications. It tests a running web application to discover security vulnerabilities and loopholes without prior knowledge of the application's internal code. Typically, the testers are treated as users of the application: they are given valid access to a user account and act like attackers to find vulnerabilities and flaws in the tested web application [1].
Adam et al. [1] defined four phases for a Black-box vulnerability test:
1) Planning phase: The rules and objectives for the test are set in this phase.
2) Discovery phase: This phase is divided into two stages. The first stage includes the initiation of the test and the collection of information. The second stage performs vulnerability analysis, which occurs after the attack phase.
3) Attack phase: Also known as "the heart of the test", this phase examines the various vulnerabilities in the target application.
4) Reporting phase: This phase produces documentation in combination with the other phases. An assessment plan is developed in the planning phase, whereas the discovery and attack phases involve recording events and periodically reporting them to the director. Finally, a report is presented that explains the identified vulnerabilities, ranks the risks, and gives tips for mitigating the recognized weaknesses [16].

C. BLACK-BOX WEB APPLICATION SECURITY TESTING ADVANTAGES
• Consistent: Black-box testing can reveal the consistencies or inconsistencies of the system's requirements specifications [87].
• Simple: The tester does not have to deal with the tested system's internal structure or code, so Black-box testing poses few difficulties. It merely involves examining the inputs and outputs of the tested system, so in-depth internal knowledge of the system is not required. Moreover, the source code does not need to be accessible for conducting the test.
• Rapid: Black-box tests do not take long to prepare because the tester is not required to have full knowledge of the system in question. These tests follow the user paths, which are limited in relatively small systems [87].
• Impartial: Black-box tests show whether or not the system works. The tests are taken from the user's point of view rather than the developer's, which keeps the two parties independent [87].

D. BLACK-BOX WEB APPLICATION SECURITY TESTING DISADVANTAGES
• It is hard to design precise test cases without exact specifications [87].
• It is not easy to identify all feasible inputs within a constrained testing time.
• Tests that have already been executed may be repeated by the developer [87].
• Several areas of the program may remain untested [87].

Table 1 (rows 11 to 16; the years in parentheses are the OWASP Top 10 editions that list the vulnerability):
11) Failure to Restrict URL Access (2010): This vulnerability allows users to craft a URL to access hidden content in the system and exploit it. The system needs to be configured so that URL access is allowed only with proper authorization [63], [65], [87].
12) Sensitive Data Exposure (2013, 2017): Sensitive data such as passwords, credit card details, and personal information must be protected by encryption both in transit and in storage. The data is exposed if it is not protected while in transit or in storage [63], [65], [87].
13) Insufficient Transport Layer Protection (2010): The transport layer protocol provides the encryption required to protect data in transit. Without proper encryption in transit, the data becomes vulnerable to man-in-the-middle and spoofing attacks [63], [65], [87].
14) Unvalidated Redirects and Forwards (2010, 2013): Many web sites direct users to other sites or web pages through links. If the credibility of the target site is not checked or verified, the user can become a victim of malware or phishing [63], [65], [87].
15) Insecure Deserialization (2017): The system uses serialization to convert binary data to ASCII characters so that it can be sent using common protocols. On the receiving side, the serialized data is deserialized. If the serialized data is sent without integrity checks, an attacker can modify it, and the program's behavior can be changed when the data is deserialized, leading to a serious vulnerability [63], [65], [87].
16) Using Components with Known Vulnerabilities (2013, 2017): A program uses various components, such as framework libraries and software modules, during its execution. If any of these components has a vulnerability, an attacker can exploit it to launch an attack [63], [65], [87].

VOLUME 4, 2016

E. WEB APPLICATION VULNERABILITY SCANNERS AND THEIR ARCHITECTURE
Black-box WVSs are automated testing tools used for examining and detecting vulnerabilities in web applications. Several WVSs test for the prevalent vulnerabilities in web applications and web servers. These scanners are academic research projects, open-source tools developed by academics and researchers interested in studying and improving web application vulnerability tools, or commercial products owned by software companies. Commercial scanners usually provide more effective results than open-source scanners; however, they can cost from just under 100.00 to over 6000.0 [81]. The design of a WVS includes three core components. First, the crawler module retrieves the content of the web pages. Second, the attacker module launches the attacks. Third, the analysis module reports the vulnerabilities.
• The Crawling Module is the most crucial component of a WVS. It explores the web application to identify and retrieve web pages and the related input vectors, such as input fields in HTML forms, GET and POST request parameters, and cookies. Moreover, the crawler generates an indexed list of all the crawled web pages. The quality of the crawler largely determines which vulnerabilities can be detected: a sub-par attack engine may miss a vulnerability, but a page that the crawler never reaches cannot be tested at all [48].
• The Fuzzing Module (the attacker module) exercises the URLs of the pages and the input vectors. After crawling, it sends attack patterns to the entry points identified in the previous step. For each entry point and vulnerability type tested, it generates an input value intended to trigger the vulnerability. For example, the fuzzer tries to detect XSS vulnerabilities by injecting malicious JavaScript code, and SQL injection vulnerabilities by using SQL strings with specific meanings, such as ticks and SQL operators [48].
• The Analysis Module examines the pages that the web application returns in response to the attacks launched by the attacker module, in order to detect potential vulnerabilities and provide feedback to the other modules. For instance, if an SQL injection test input returns a page that contains a database error message, the analysis module may deduce the presence of an SQL injection vulnerability [48].
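
The interplay of the three modules can be illustrated with a minimal sketch. It is not the design of any particular scanner: the in-memory `FAKE_SITE`, the function names `crawl`, `fuzz`, and `analyze`, the two payloads, and the error signature are all assumptions made for illustration, and a real WVS would issue HTTP requests and use far richer payload sets and detection heuristics.

```python
import re
from html.parser import HTMLParser

# Hypothetical in-memory application: path -> handler(params) -> HTML.
def home(params):
    return '<a href="/search">Search</a> <a href="/about">About</a>'

def about(params):
    return "<p>Static page</p>"

def search(params):
    q = params.get("q", "")
    if "'" in q:
        # Simulates an unsanitized value breaking a database query.
        return "<p>SQL syntax error near '%s'</p>" % q
    # The value is echoed without escaping (simulated reflected XSS).
    return '<form action="/search"><input name="q"></form><p>Results for %s</p>' % q

FAKE_SITE = {"/": home, "/about": about, "/search": search}

class LinkAndInputParser(HTMLParser):
    """Collects hyperlinks and named input fields from one page."""
    def __init__(self):
        super().__init__()
        self.links, self.inputs = [], []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "input" and "name" in attrs:
            self.inputs.append(attrs["name"])

def crawl(site, start="/"):
    """Crawling module: index reachable pages and their input vectors."""
    index, queue = {}, [start]
    while queue:
        path = queue.pop(0)
        if path in index or path not in site:
            continue
        parser = LinkAndInputParser()
        parser.feed(site[path]({}))
        index[path] = parser.inputs
        queue.extend(parser.links)
    return index

# Illustrative payloads only, one per vulnerability type.
PAYLOADS = {"SQLi": "'", "XSS": "<script>alert(1)</script>"}

def fuzz(site, index):
    """Fuzzing (attacker) module: send each payload to each input vector."""
    for path, inputs in index.items():
        for name in inputs:
            for kind, payload in PAYLOADS.items():
                yield path, name, kind, payload, site[path]({name: payload})

def analyze(responses):
    """Analysis module: look for evidence that a payload took effect."""
    findings = []
    for path, name, kind, payload, body in responses:
        if kind == "SQLi" and re.search(r"SQL syntax error", body):
            findings.append((path, name, kind))
        elif kind == "XSS" and payload in body:
            findings.append((path, name, kind))
    return findings

findings = analyze(fuzz(FAKE_SITE, crawl(FAKE_SITE)))
print(findings)
```

In this sketch, a page that `crawl` never indexes is never passed to `fuzz`, which is why crawler coverage bounds what the other two modules can find.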

III. RELATED WORK
Black-box testing has been the focus of many recent studies aimed at improving security in data, systems, and networks. However, only a few surveys and overviews on Black-box web vulnerability scanners were returned by this research. Bertoglio and Zorzo [18] systematically reviewed 54 primary studies, applying quality criteria to the selected papers to determine their reliability and credibility. The criteria grouped papers as 'Good', 'Very good', and 'Excellent'. The study identified scanners used for penetration testing and their characteristics. Based on their analysis, 13 scanners were identified as the most cited ones. The study also identified frameworks, methodologies, and security testing models. Additionally, it analyzed the relationship between scanners and models, as well as some challenges of penetration testing. The researchers further identified process efficiency and effectiveness as critical challenges, in addition to the vulnerability assessment process. They also noted that challenges in the analysis model and security scanners influence the security of scanners. Our research method is more extensive than the above study. We analyzed 320 studies in our work, and a total of 30 scanners were collected and identified in the paper. The characteristics of all the returned scanners were compiled from what was indicated in the research papers and from information available on the scanners' websites.
Similarly, a survey study by Mirjalili et al. [65] explored the applications of web penetration testing and its models and compared web vulnerability scanners. The survey reviewed previous literature on pen test methods and scanners and divided it into three categories. The first category examined and compared various methods and scanners. The second proposed a new method or scanner for detecting vulnerabilities in web applications. The third proposed a proper testing environment for executing web penetration testing. Moreover, the paper observed a correlation between 13 open-source and seven commercial scanners. It also noted two core factors for judging the effectiveness and efficiency of the scanners. The first is the "Structural Design", which deals with the GUI (Graphical User Interface), ease of use, customization, and performance. The other key factor is the "Supported Features and Functionalities", which incorporate crawling techniques (automatic/manual), analysis techniques, auditing, and logging, along with the generation of user reports. The researchers found that some of the reviewed scanners had technical problems, such as the inability to detect some types of attacks, including stored SQLi and stored XSS attacks. Also, some scanners did not support new technologies and were incapable of detecting vulnerabilities attributed to application logic flaws. In our survey, we identified 21 free/open-source and 9 commercial scanners. In addition, our research covers additional aspects such as the developer that created the scanner, the technology used to build it, and the scanner's operating platform (e.g., Windows, Mac OS X, or Linux). We also looked at whether the scanner's user interface was a GUI or a CLI. Furthermore, we included the availability of documents, such as the user manual and installation guide.
Another study, conducted by Kyriakos et al. [55], reviewed existing literature on web vulnerability scanners. The researchers examined fundamental open-source scanners and databases in depth, comparing them based on configuration, functionality, and support. The study also compared the scanners on their accuracy in identifying vulnerabilities and errors in a web application, and on their frequency. Moreover, it evaluated the functionality of the scanners based on categorization, vulnerability coverage, risk assessment inference, and counter-measuring. The researchers assessed configuration using architecture, operating system support, level of usage, required resources, modularity, and access control mode. They concluded that complete benchmarking of vulnerabilities, scanning strategy, and workflow is essential to support the execution of the scanners. In comparison, our analysis is more complete because it covers the most prevalent commercial and open-source scanners, whereas that study focused solely on open-source scanners. Additionally, we examined and discussed common web vulnerability scanner features, whereas that study addressed only three: configuration, functionality, and support. Furthermore, that study did not cover the performance of the examined scanners in finding vulnerabilities in web applications, whereas we reviewed in depth the findings of evaluation studies undertaken on these scanners, as well as the knowledge gap in this domain.
Kumar and Sheth [56] reviewed zero-day vulnerabilities and the web application scanners used to detect them in web services. The study explained different techniques for detecting and preventing zero-day vulnerabilities: statistical-based methods, behaviour- and signature-based methods, and hybrid techniques. The primary objective of each technique is to recognize the existence of exploits, eliminate them in real time, and minimize the damage caused by the attack. One significant challenge is to ensure that the threshold delay for analysis and quarantine on the victim's machine is not exceeded. However, in some cases, this can undermine the affected system. The researchers concluded that zero-day attacks can exploit obscure vulnerabilities because antivirus signatures, patches, and intrusion-detection signatures are absent or lacking. To combat zero-day attacks, updating the system can provide patches for most of the unknown vulnerabilities that were not detected during the system's development. In addition, the researchers suggested a robust framework designed to help the penetration tester detect and prevent zero-day vulnerabilities and remote code execution. Our research is thorough and includes information on all vulnerabilities identified by the Open Web Application Security Project (OWASP), covering the OWASP Top Ten 2010, the OWASP Top Ten 2013, and the OWASP Top Ten 2017. Furthermore, we examined both commercial and open-source scanners for detecting these security flaws.
Seng et al. [78] conducted another survey on the available methodologies used to assess web vulnerability scanners regarding test coverage, attack coverage, and vulnerability detection rate. It also highlighted the OWASP Top Ten vulnerabilities in web applications and the popular testbeds used to evaluate web vulnerability scanners. In this study, the authors investigated some popular web vulnerability scanners, including Acunetix Web Vulnerability Scanner, Burp Suite, N-Sparker, Wapiti, W3af, Vega, Arachni, and OWASP ZAP. Nevertheless, the paper could not answer some of the research questions aimed at quantifying the quality of web application security scanners. For instance, the suitable number of testbeds for benchmarking a web application security scanner remained unknown. The paper only showed that the number of testbeds used to benchmark web vulnerability scanners ranged from zero to thousands. In addition, the researchers did not specify the measurement metrics used to describe the test coverage, attack coverage, and vulnerability detection rate of web application vulnerability scanners.

IV. RESEARCH QUESTIONS
This paper investigates WVSs to address three main Research Questions (RQs):
RQ.1 What are the most cited web application vulnerability scanners? Ans. Table 3 reports the web vulnerability scanners most cited in other researchers' studies.
RQ.2 What are the general characteristics of the reported scanners? Ans. Table 4 and Table 5 list all the characteristics of the scanners to answer this question.
RQ.3 What are the most common OWASP Top Ten vulnerabilities tested by the reported scanners? Ans. Table 6 and Table 7 contain the evaluation results of studies conducted by other researchers to answer this question.

V. RESEARCH METHODS
A Systematic Literature Review (SLR) is a type of literature review that helps researchers find, classify, and investigate the existing literature for a particular research question. Its main objective is to assess the existing literature against the research question and to find the gaps in it. This SLR follows the guidelines provided by PRISMA [91].

A. SEARCH STRATEGY
PRISMA is a minimum set of evidence-based items for reporting meta-analyses and systematic reviews [91]. Its primary aim is to ensure that systematic reviews are reported transparently and completely. It also details the flow of information through four phases: identification, screening, eligibility, and inclusion.
Identification: The researchers used chains of related words to find papers relevant to the objective of the study. Some of the keywords used in this search included Black-box, penetration testing, scanner, and tool. The researchers further developed an adequate set of search phrases by studying relevant literature. The selected search phrases were "vulnerability scanner", "web application vulnerability scanner", "penetration testing tool", and "injection tool". The authors also surveyed journal papers and international conference proceedings from databases such as Google Scholar, ACM Digital Library, SpringerLink, and IEEE Xplore to acquire relevant research papers. Overall, 320 manuscripts were retrieved with the keywords stated above.
Screening: The researchers merged the results from all searches and eliminated duplicate entries. After removing the duplicated articles, the authors obtained a total of 233 papers from the resources outlined above. They then read the title and abstract of each paper for screening purposes and found only 179 articles related to web application vulnerability scanners.
Eligibility: The researchers assessed the full texts of the 179 articles and retained only studies that introduced, compared, evaluated, or reviewed web vulnerability scanners. As a result, 89 studies did not meet the research objectives and were excluded.
Included: The authors considered ninety (90) studies relevant and included them in this study, as they fulfilled the objectives of the study and answered the present research questions.
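
The counts reported for each PRISMA phase can be cross-checked with a short sketch. The engine names, search phrases, and the four totals (320, 233, 179, 90) come from the text above; the per-step differences other than the 89 full-text exclusions are derived here rather than stated in the text, and the variable and function names are illustrative assumptions.

```python
from itertools import product

ENGINES = ["Google Scholar", "ACM Digital Library", "SpringerLink", "IEEE Xplore"]
PHRASES = [
    "vulnerability scanner",
    "web application vulnerability scanner",
    "penetration testing tool",
    "injection tool",
]

# One search per (engine, phrase) pair.
searches = list(product(ENGINES, PHRASES))

# Record counts reported in the text after each stage.
FLOW = [
    ("retrieved", 320),
    ("after duplicate removal", 233),
    ("after title/abstract screening", 179),
    ("included after full-text assessment", 90),
]

def removed_per_step(flow):
    """Return how many records were dropped on entering each later stage."""
    return [(name, before - after)
            for (_, before), (name, after) in zip(flow, flow[1:])]

print(len(searches))
for name, dropped in removed_per_step(FLOW):
    print(f"{name}: {dropped} removed")
```

The last difference, 179 minus 90, reproduces the 89 studies excluded in the eligibility phase, which confirms that the reported totals are mutually consistent.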

B. INCLUSION AND EXCLUSION
A set of inclusion and exclusion criteria was used to filter all research papers after their discovery. The following criteria were used to determine a paper's inclusion:
• Only peer-reviewed articles are considered.
• The article should cover Black-box web vulnerability scanners.
• If a study has been published in more than one journal, the most complete version is included.
The exclusion criteria were as follows:
• Duplicate studies.
• Papers that are unrelated to Black-box web vulnerability scanners.
• Inaccessible articles: the authors were emailed to request a private copy, and the article was discarded if no response was received.
• Articles written in a language other than English.
• Very brief publications (e.g., posters) that make only a minor contribution.

VI. RESULTS AND FINDINGS
This section presents the results of the data collection, as well as how each question was answered. Please note that we only report on what was discovered in the reviewed papers. We do not personalize the information gathered.

A. PAPERS DISTRIBUTION
The 90 returned papers are distributed according to the search engines from which they were obtained, as depicted in Figure 2. The graph shows clear differences: Google Scholar returned the most relevant papers, while SpringerLink returned the fewest. This demonstrates that these resources produce different outcomes for the designated search terms. Furthermore, Figure 3 shows that the returned papers fall into three distinct categories: journal articles (51 papers), conference proceedings (34 papers), and workshop papers (5 papers). As shown in the figure, a majority of these papers (57%) are journal articles, conference papers represent 38% of the papers, and workshop papers make up the remaining 5%. This implies that the workshop papers have a smaller impact than the other types of literature. Conclusively, the majority (95%) of the afore-
Moving forward, Shelly [81] analyzed the flaws and limitations of several WVSs. The evaluated scanners were W3af, Acunetix WVS, Burp Suite Pro, HP WebInspect, IBM Security AppScan, and Netsparker. The researcher developed a custom vulnerable web application as a testbed for the selected scanners. It had two versions: a secure version for detecting false-positive results and an insecure version for detecting false-negative results. In this evaluation study, the researcher referred to the evaluated scanners as Scanner A, Scanner B, Scanner C, Scanner D, Scanner E, and Scanner F, in no particular order. The mean detection rates of these scanners for SQLi and XSS vulnerabilities were 96% and 43%, respectively.
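
The role of the two testbed versions can be sketched as follows. This is an illustration of the general idea only, not the procedure or data of [81]: the function name `score_scanner`, the (page, vulnerability type) labels, and all findings below are hypothetical.

```python
def score_scanner(seeded, reported_insecure, reported_secure):
    """Score one scanner against a two-version testbed.

    seeded:            vulnerabilities planted in the insecure version
    reported_insecure: findings the scanner reported on the insecure version
    reported_secure:   findings the scanner reported on the secure version
    """
    true_positives = seeded & reported_insecure
    # Planted flaws the scanner failed to report on the insecure version.
    false_negatives = seeded - reported_insecure
    # The secure version has the flaws fixed, so any finding there is spurious.
    false_positives = set(reported_secure)
    return {
        "detection_rate": 100 * len(true_positives) / len(seeded),
        "false_negatives": false_negatives,
        "false_positives": false_positives,
    }

# Hypothetical data, for illustration only.
seeded = {("login", "SQLi"), ("search", "SQLi"), ("comment", "XSS"), ("profile", "XSS")}
reported_insecure = {("login", "SQLi"), ("search", "SQLi"), ("comment", "XSS")}
reported_secure = {("search", "XSS")}

result = score_scanner(seeded, reported_insecure, reported_secure)
print(result)
```

Separating the two versions keeps the two error types apart: misses are counted only where flaws are known to exist, and spurious alerts only where they are known to be absent.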
Additionally, Vieira et al. [88] conducted an experimental study to evaluate SQLi vulnerabilities in different web services. In this study, well-known vulnerability scanners of three brands were used to identify the security loopholes in the available web services. The researchers decided not to mention the versions and brands of the evaluated scanners. Thus, they referred to the scanners as VS1.1, VS1.2, VS2, and VS3, where VS1.1 and VS1.2 are different versions of the same brand. The four evaluated scanners performed differently in detecting SQLi: the detection rates of VS1.1, VS1.2, VS2, and VS3 were 17.5%, 16.8%, 20%, and 31.4%, respectively. Therefore, the mean detection rate across all scanners is 21%.
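
As a worked check of this mean, the four reported rates can be averaged directly. The sketch uses only the values quoted above; the variable names are illustrative.

```python
from statistics import mean

# SQLi detection rates (%) reported in [88] for the anonymized scanners.
rates = {"VS1.1": 17.5, "VS1.2": 16.8, "VS2": 20.0, "VS3": 31.4}

mean_rate = mean(rates.values())
print(round(mean_rate, 1))
```

The unrounded mean is about 21.4%, which rounds to the 21% stated in the text.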
Further ahead, Makino and Klyuev [61] evaluated and compared OWASP ZAP and Skipfish in detecting SQLi and XSS in web applications. Two benchmarks were used to evaluate the effectiveness of the compared scanners: WAVSEP (Web Application Vulnerability Scanner Evaluation Project) and DVWA (Damn Vulnerable Web Application). The study characterized the distinctive features of each scanner and analyzed each scanner's reports in detail. The analysis concluded that OWASP ZAP performed better than Skipfish in detecting vulnerabilities while raising fewer false positives.
Moreover, Antunes and Vieira [7] compared the effectiveness of penetration testing and static code analysis techniques in detecting SQLi in web services code. They used three popular commercial WVSs to detect vulnerabilities in a set of vulnerable services: HP WebInspect, IBM Rational AppScan, and Acunetix Web Vulnerability Scanner. To ensure neutrality, the results were not attributed to specific brands; the scanners were referred to as VS1, VS2, and VS3 (in no particular order). The detection rates of VS1, VS2, and VS3 were 50.8%, 36.1%, and 9.8%, respectively, giving a mean of 32.2%.
Moving further, Šuteva et al. [85] tested and assessed six open-source or free WVSs (focusing principally on false-negative rates) using the well-known vulnerable web application 'WackoPicko'. The false-negative rates of all the scanners were very high, ranging from 68.8% for IronWasp to 100% for W3af. NetSparker showed a high detection rate for XSS vulnerabilities.
Also, Aliero et al. [5] conducted an analytical evaluation to compare the effectiveness of their proposed approach (SQLIV) with that of existing scanners (Acunetix WVS, IBM Security AppScan, OWASP ZAP, Wapiti, Vega, and W3af). The results showed that the two commercial scanners, Acunetix WVS and IBM AppScan, as well as the open-source scanner W3af, achieved a high detection rate of 80% for SQLi vulnerabilities.
Furthermore, Antunes and Vieira [12] proposed a new approach to designing a vulnerability testing scanner for web services. The researchers executed a case study to demonstrate their scanner's effectiveness in detecting SQLi vulnerabilities in web services. In this experiment, three commercial scanners representing the state of the art in vulnerability testing for web applications and web services were used: IBM Rational AppScan, HP WebInspect, and Acunetix Web Vulnerability Scanner. They were referred to as VS1, VS2, and VS3, in no particular order. The coverage of VS1, VS2, and VS3 was 51%, 38%, and 3%, respectively, with a mean of 31%.
Going further, Martirosyan [62] evaluated the effectiveness of Acunetix WVS in detecting the OWASP Top Ten vulnerabilities. The researcher used the MusicStore web application as a testbed for this study. The evaluation showed that the scanner detected the Insecure Direct Object References vulnerability with a perfect detection rate of 100%. However, it performed poorly in detecting Insecure Cryptographic Storage, with a detection rate of only 28%.
Moreover, Antunes and Vieira [10] evaluated three commercial scanners anonymously to compare their effectiveness with that of their own approach (SignWS) in detecting SQLi vulnerabilities. The three commercial scanners were Acunetix, IBM Rational AppScan, and HP WebInspect, named VS1, VS2, and VS3 in no particular order. The detection rates of VS1, VS2, and VS3 were 32.28%, 24.05%, and 1.90%, respectively, with a mean of 19%.
Further, Garn et al. [40] provided a methodology for improving the detection of XSS in web applications. They used Burp Suite Pro and OWASP Zed Attack Proxy (ZAP) to test their methodology, with Mutillidae II version 2.6.3 as the testbed. The results showed that Burp Suite Pro performed better in finding XSS vulnerabilities, with a detection rate of 88.9%, whereas the detection rate for ZAP was only 80%.
Moving forward, DURIĆ [27] ran an evaluation experiment to compare the performance of his approach for detecting SQLi with that of some well-known WVSs. The selected scanners were five open-source scanners (W3af, Nikto, Wapiti, Vega, and ZAP) and one commercial scanner, Acunetix. The author employed six experienced master's students to develop the testing environment for this experiment. Acunetix achieved the best performance, with a detection rate of 50%: it detected eight vulnerabilities out of 16, whereas Wapiti detected six, W3af detected five, and Vega detected only one. Interestingly, ZAP did not detect any vulnerabilities in any of the three applications.
Additionally, Antunes and Vieira [8] ran an evaluation experiment to compare their approach (VS.WS) with four commercial vulnerability scanners (two of which were different versions from the same vendor). The goal of this study was to identify SQLi vulnerabilities in web services. The evaluated scanners included HP WebInspect, IBM Rational AppScan, and Acunetix. To maintain anonymity and equality, the researchers did not mention the specific scanner names and their corresponding versions. They referred to the four scanners as VS1.1, VS1.2, VS2, and VS3, with VS1.1 and VS1.2 being two different versions from the same vendor. The detection rates of SQLi vulnerabilities for VS1.1, VS1.2, VS2, and VS3 were 84%, 84%, 30%, and 38%, respectively, with a mean of 59%.
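
The detection rates for the experiment in [27] follow directly from the reported counts of detected vulnerabilities out of 16. The sketch below recomputes them; only the counts quoted in the text are used (no count is given for Nikto, so it is omitted), and the names are illustrative.

```python
# Vulnerabilities detected out of 16, as reported for the experiment in [27].
TOTAL_VULNERABILITIES = 16
detected = {"Acunetix": 8, "Wapiti": 6, "W3af": 5, "Vega": 1, "ZAP": 0}

# Detection rate = detected / total, expressed as a percentage.
detection_rates = {
    scanner: 100 * count / TOTAL_VULNERABILITIES
    for scanner, count in detected.items()
}

for scanner, rate in detection_rates.items():
    print(f"{scanner}: {rate:.2f}%")
```

Eight out of 16 gives the 50% quoted for Acunetix; the remaining rates are derived here in the same way rather than quoted from the text.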
Lastly, Shah [79] conducted an evaluation study to measure the capability of Burp Suite in detecting vulnerabilities in web applications. The researcher used the OWASP Benchmark to evaluate the scanner's detection rate and crawling coverage. The scanner detected 26 vulnerabilities, representing 50% of the SQLi vulnerabilities in the tested web application, and produced no false positives. The scan took six hours and twenty minutes to complete. The two benchmarks used for evaluating the effectiveness of the compared scanners were the Web Application Vulnerability Scanner Evaluation Project (WAVSEP) and Damn Vulnerable Web Application (DVWA).
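A benchmark-based evaluation of this kind reports both detections and false positives, so the two can be combined into true-positive and false-positive rates. The sketch below is a minimal illustration under stated assumptions: the 52 vulnerable test cases are inferred from 26 detections being 50%, while the 48 non-vulnerable test cases, the test-case identifiers, and the function name are hypothetical. The combined score (true-positive rate minus false-positive rate) is one common way to summarize such results and is not taken from [79].

```python
from typing import Set, Tuple


def benchmark_score(vulnerable: Set[str], safe: Set[str],
                    flagged: Set[str]) -> Tuple[float, float, float]:
    """Return (true-positive rate, false-positive rate, TPR - FPR).

    vulnerable: ids of test cases that really contain a vulnerability
    safe:       ids of test cases that do not
    flagged:    ids of test cases the scanner reported as vulnerable
    """
    if not vulnerable or not safe:
        raise ValueError("both ground-truth sets must be non-empty")
    true_positives = len(flagged & vulnerable)
    false_positives = len(flagged & safe)
    tpr = true_positives / len(vulnerable)
    fpr = false_positives / len(safe)
    return tpr, fpr, tpr - fpr


# Assumed ground truth: 52 vulnerable cases (inferred), 48 safe cases (hypothetical).
vulnerable = {f"sqli-{i:03d}" for i in range(52)}
safe = {f"safe-{i:03d}" for i in range(48)}
# The scanner flags 26 vulnerable cases and no safe ones.
flagged = {f"sqli-{i:03d}" for i in range(26)}

tpr, fpr, score = benchmark_score(vulnerable, safe, flagged)

if __name__ == "__main__":
    print(f"TPR={tpr:.2f} FPR={fpr:.2f} score={score:.2f}")
```

Reporting the false-positive rate next to the detection rate makes results from different studies easier to compare, since a scanner that flags everything would otherwise appear perfect.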

VII. DISCUSSION
[Detection-rate tables, flattened in extraction. Numbers in parentheses after a vulnerability type give the OWASP Top Ten editions (2010, 2013, 2017) that list it. Scanner column headings were not preserved, so rates are listed in source order.]
First table, rates from [62]: Broken Authentication and Session Management (10,13): 50%. Insecure Direct Object References (10,13): 100%. Insecure Cryptographic Storage (10): 28%. Insufficient Transport Layer Protection (10): 29%. No rate listed for: Cross-Site Request Forgery (CSRF) (10,13); Missing Function Level Access Control (13); Security Misconfiguration (10,13,17); Failure to Restrict URL Access (10); Unvalidated Redirects and Forwards (10,13); Sensitive Data Exposure (13,17); Using Components with Known Vulnerabilities (13,17); Broken Authentication (17); XML External Entities (17); Broken Access Control (17).
Second table, rates preceding the XSS row label (row label not preserved, likely SQL injection): 31% [27], 100% [61], 55% [63], 67% [5], 0% [27], 59% [85], 44% [5], 96% [6], 37% [27], 48% [63], 98% [6], 95% [61], 64% [85], 67% [5].
Second table, Cross-Site Scripting (XSS) (10,13,17): 100% [61], 60% [85], 76% [63], 80% [40], 35% [85], 61% [6], 64% [63], 67.5% [6], 82% [61], 73% [85], 60% [85], 6% [27].
Second table, no rate listed for: Cross-Site Request Forgery (CSRF) (10,13); Broken Authentication and Session Management (10,13); Insecure Direct Object References (10,13); Missing Function Level Access Control (13); Security Misconfiguration (10,13,17); Insecure Cryptographic Storage (10); Failure to Restrict URL Access (10); Insufficient Transport Layer Protection (10); Unvalidated Redirects and Forwards (10,13); Sensitive Data Exposure (13,17); Using Components with Known Vulnerabilities (13,17); Broken Authentication (17); XML External Entities (17); Broken Access Control (17); Insecure Deserialization (17); Insufficient Logging and Monitoring (17).

This section examines and discusses the results of the preceding section. Based on the results derived from this research, it was found that only a very small number of surveys and overviews have been conducted on Black-box web vulnerability scanners; the majority of them merely summarize the concepts behind the approaches without targeting their characteristics and effectiveness [18], [65], [55], [78].
However, the present study presents a systematic literature review of the most cited web vulnerability scanners, summarizing their characteristics and discussing the results of the different evaluation studies conducted to compare their effectiveness in detecting common web application vulnerabilities. Based on the data collected from the reviewed studies, thirty (30) scanners were identified, and their frequencies in the reviewed studies varied from scanner to scanner. For example, Acunetix WVS was the most cited scanner, cited by 39 papers, whereas some scanners, including JSPChecker, Havij, SQLDOM, SQL check, Vinject, WebSSARI, SQL Guard, SecuriFly, and SQLInjectionGen, were reported by only one paper each. We also found no major difference between the frequencies of commercial and open-source scanners in the reviewed papers, which may indicate that researchers consider open-source scanners as important as commercial ones. Interestingly, among all returned studies, the technical features and characteristics of the web vulnerability scanners were discussed by only a small number of studies. Therefore, we investigated the scanners' official websites and documented their main characteristics, including the technology used to build the scanner and the scanner's operating platform (e.g., Windows, Mac OS X, or Linux). We also looked at whether the scanner's user interface was a GUI or a CLI. Moreover, we recorded the availability of documentation, such as the user manual and installation guide. As a result, we found Java to be the most frequently used language for building the tools.
We also found that all the identified commercial scanners, including Acunetix WVS, HP WebInspect, IBM Security AppScan, Burp Suite Pro, NetSparker, and QualysGuard, were provided with a Graphical User Interface (GUI), while some of the open-source scanners, such as Wapiti and Skipfish, were implemented with a command-line interface (CLI). Users can interact easily with scanners that offer a GUI to perform the scanning process; using scanners in CLI mode, however, requires more technical knowledge from the users. Furthermore, even though the present study identified thirty (30) web vulnerability scanners, only twelve (12) of them were evaluated by prior research. The evaluation studies focused on measuring the capabilities of the scanners in detecting the OWASP Top Ten vulnerabilities. Based on the data collected from the evaluative studies, most of the OWASP Top Ten vulnerabilities tested by the previous studies were SQL Injection and Cross-Site Scripting [6], [81], [88], [7], [85], [12], [10], [40], [27], [79]. Only one work evaluated the Acunetix WVS scanner against six vulnerability types from the OWASP Top Ten list: Broken Authentication and Session Management, Insecure Direct Object References, Insecure Cryptographic Storage, and Insufficient Transport Layer Protection, besides SQL Injection and Cross-Site Scripting. This might be because SQL Injection and Cross-Site Scripting are the most popular and the most exploited web application vulnerabilities. Furthermore, these vulnerabilities enable attackers to access the back-end database of web applications and to steal, destroy, and edit confidential information. The reported detection rates of the evaluated WVSs fall between 0% and 100% for SQLi, whereas those for XSS fall between 6% and 100%.
Interestingly, the evaluations conducted in these studies show inconsistencies in the results reported for the different scanners. Moreover, these scanners vary significantly in the vulnerability types they detect and in their detection rates. This, in turn, may drastically decrease the level of trust placed in WVSs and increases the need for further research that quantitatively evaluates the quality and accuracy of web application vulnerability scanners.

VIII. CONCLUSION AND KNOWLEDGE GAPS
In this article, we have systematically surveyed, collected, organized, and evaluated most of the available knowledge on web vulnerability scanners. We identified the most frequently used scanners and investigated their features and characteristics. We also collected and analyzed the reported detection rates and accuracy of these scanners in detecting OWASP Top Ten vulnerability types. We achieved this by examining the published research on web vulnerability scanners in three ways: 1) By examining articles that proposed a new method, algorithm, or scanner for detecting web vulnerabilities. 2) By examining articles that themselves analyzed and compared existing scanners for detecting web vulnerabilities. 3) By drawing insights from the existing surveys and literature reviews.
When we analyzed the relatively few (15) published evaluations of the performance of web vulnerability scanners, we discovered three unexpected and, we believe, very important findings: 1) SQLi and XSS were the most commonly tested of the OWASP Top Ten vulnerability types. The other types of vulnerabilities in the OWASP Top Ten list were almost never tested. Only one evaluation was found that reported evaluating four (4) other OWASP Top Ten vulnerabilities, and this study evaluated only one commercial web vulnerability scanner. A total of 13 studies evaluated SQLi and 8 studies evaluated XSS performance for several scanners; however, most studies evaluated only one or two scanners against only one or two non-standard, and hence difficult to replicate, web applications. 2) After analyzing and collating the efficacy results as published in the 15 evaluations, we found disparate and inconsistent efficacy reports, as detailed in Table 6 and Table 7. 3) We found no published evaluations assessing the usability or quality of use of web vulnerability scanners.
Based on these findings, we would like to make the following recommendations for future directions: 1) Evaluations of web vulnerability scanners should include multiple standard web applications, such as Damn Vulnerable Web Application (DVWA), OWASP Juice Shop, and Mutillidae. This would support experiment consistency and repeatability. 2) Evaluations of web vulnerability scanners should be based on the OWASP Top Ten vulnerability types or another common nomenclature for web vulnerabilities. The lack of standardization in this aspect makes it nearly impossible to adequately measure and compare the efficacy of different scanners. 3) Evaluations of web vulnerability scanners should disclose affiliations, or the lack thereof, with commercial sponsors that could bias the evaluation. 4) Evaluations of web vulnerability scanners from a usability or quality-of-use perspective should also be performed.