Benchmarking the security of web applications is complex and, although many metrics have been proposed, no consensual quantitative security metric exists so far. Static analysis is an effective approach for detecting vulnerabilities, but the complexity of applications and the large variety of vulnerabilities prevent any single tool from being foolproof. In this application paper we investigate the hypothesis of combining the outputs of multiple static code analyzers to define metrics for comparing the trustworthiness of web applications. Several experiments, including a benchmarking campaign over seven distinct open source web forums, show that the raw number of vulnerabilities reported by a set of tools allows a rough comparison of trustworthiness. We also study the use of normalization and false positive rate estimation to calibrate the output of each tool. Results show that calibration yields a very accurate metric that can be used to easily and automatically compare different applications.
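The calibration idea described in the abstract could be sketched roughly as follows. Note that the discounting formula, the tool names, the false positive rates, and the per-KLOC normalization below are purely illustrative assumptions for the sketch, not the paper's actual calibration procedure.

```python
# Hedged sketch: combining the outputs of multiple static analyzers into a
# comparable trustworthiness metric. All tool names and rates are hypothetical.

def calibrated_vulnerability_score(reports, fp_rates, kloc):
    """Estimate calibrated vulnerabilities per KLOC from several analyzers.

    reports  -- {tool_name: raw vulnerability count reported by the tool}
    fp_rates -- {tool_name: estimated false positive rate in [0, 1]}
    kloc     -- application size in thousands of lines of code (normalization)
    """
    total = 0.0
    for tool, raw in reports.items():
        # Discount each tool's raw count by its estimated false positive rate.
        total += raw * (1.0 - fp_rates.get(tool, 0.0))
    # Normalize by code size so applications of different sizes compare fairly.
    return total / kloc

# Example: compare two hypothetical web forums analyzed by three tools.
rates = {"toolX": 0.5, "toolY": 0.2, "toolZ": 0.1}
forum_a = calibrated_vulnerability_score(
    {"toolX": 40, "toolY": 25, "toolZ": 10}, rates, kloc=80)
forum_b = calibrated_vulnerability_score(
    {"toolX": 12, "toolY": 9, "toolZ": 4}, rates, kloc=30)
# Under this sketch, a lower score suggests higher trustworthiness.
```

A raw, uncalibrated comparison would simply sum the reported counts; the point of the calibration step is that tools with high false positive rates contribute proportionally less to the final score.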