Abstract:
Regular expression Denial of Service (ReDoS) is an algorithmic complexity attack that exploits the processing of regular expressions (regexes) to cause a denial of service. The attack is possible when a regex's evaluation time scales polynomially or exponentially with input length, posing significant challenges for software developers. The advent of Large Language Models (LLMs) has revolutionized the generation of regexes from natural language prompts, but not without risks. Prior work showed that LLMs can generate code with vulnerabilities and security smells. In this paper, we examined the correctness and security of regexes generated by LLMs, as well as the characteristics of LLM-generated vulnerable regexes. Our study also examined ReDoS patterns in actual software projects, aligning them with the corresponding regex equivalence classes and algorithmic complexity. Moreover, we analyzed developer discussions on GitHub and StackOverflow, constructing a taxonomy to investigate developers' experiences and perspectives on ReDoS. We found that GPT-3.5 was the best LLM at generating regexes that are both correct and secure. We also observed that LLM-generated regexes mainly exhibit polynomial ReDoS vulnerability patterns, which is consistent with the vulnerable regexes found in open source projects. Finally, we found that developers' main discussions around insecure regexes are related to mitigation strategies for removing vulnerable regexes.
CCS CONCEPTS • Software and its engineering → State based definitions; • Security and privacy → Denial-of-service attacks; • Computing methodologies → Multi-task learning.
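To make the polynomial versus exponential distinction concrete, the following is a minimal sketch that is not taken from the paper. It models how many times a naive backtracking matcher would test the end anchor before rejecting the input 'a' * n + '!' for two textbook patterns, `^a*a*$` and `^(a+)+$`. The patterns, the function names, and the counting model are illustrative assumptions; real regex engines apply optimizations, so their actual step counts may differ.

```python
def polynomial_attempts(n: int) -> int:
    """Model of ^a*a*$ on 'a' * n + '!' (assumed example pattern).

    Counts how often '$' is tested: once for every way the two greedy
    'a*' parts can split a prefix of the run of n 'a' characters.
    """
    attempts = 0
    for first in range(n, -1, -1):           # first a* gives back characters
        for _second in range(n - first, -1, -1):  # second a* gives back too
            attempts += 1                    # '$' is tested here and fails
    return attempts


def exponential_attempts(remaining: int) -> int:
    """Model of ^(a+)+$ on 'a' * remaining + '!' (assumed example pattern).

    Each iteration of the outer group consumes 1..remaining characters;
    after each choice '$' is tested (and fails), then the outer group
    may iterate again on what is left.
    """
    attempts = 0
    for group_len in range(1, remaining + 1):
        attempts += 1                        # '$' is tested here and fails
        attempts += exponential_attempts(remaining - group_len)
    return attempts


if __name__ == "__main__":
    for n in (5, 10, 15):
        print(n, polynomial_attempts(n), exponential_attempts(n))
```

In this model the first count grows as (n + 1)(n + 2) / 2 and the second as 2^n - 1, so for n = 10 the two patterns need 66 and 1023 anchor tests respectively; the gap widens quickly as n grows.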
Date of Conference: 15-16 April 2024
Date Added to IEEE Xplore: 18 June 2024
Conference Location: Lisbon, Portugal