Loading web-font TeX/Main/Regular
Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums | IEEE Conference Publication | IEEE Xplore

Understanding Regular Expression Denial of Service (ReDoS): Insights from LLM-Generated Regexes and Developer Forums


Abstract:

Regular expression Denial of Service (ReDoS) represents an algorithmic complexity attack that exploits the processing of regular expressions (regexes) to produce a denial...Show More

Abstract:

Regular expression Denial of Service (ReDoS) represents an algorithmic complexity attack that exploits the processing of regular expressions (regexes) to produce a denial-of-service attack. This attack occurs when a regex’s evaluation time scales polynomially or exponentially with input length, posing significant challenges for software developers. The advent of Large Language Models (LLMs) has revolutionized the generation of regexes from natural language prompts, but not without its risks. Prior works showed that LLMs can generate code with vulnerabilities and security smells. In this paper, we examined the correctness and security of regexes generated by LLMs as well as the characteristics of LLM-generated vulnerable regexes. Our study also examined ReDoS patterns in actual software projects, aligning them with corresponding regex equivalence classes and algorithmic complexity. Moreover, we analyzed developer discussions on GitHub and StackOverflow, constructing a taxonomy to investigate their experiences and perspectives on ReDoS. In this study, we found that GPT-3.5 was the best LLM to generate regexes that are both correct and secure. We also observed that LLM-generated regexes mainly have polynomial ReDoS vulnerability patterns, and it is consistent with vulnerable regexes found in open source projects. We also found that developers’ main discussions around insecure regexes is related to mitigation strategies to remove vulnerable regexes. CCS CONCEPTS • Software and its engineering\rightarrow State based definitions; • Security and privacy \rightarrow Denial-of-service attacks; • Computing methodologies \rightarrow Multi-task learning.
Date of Conference: 15-16 April 2024
Date Added to IEEE Xplore: 18 June 2024
ISBN Information:

ISSN Information:

Conference Location: Lisbon, Portugal

Contact IEEE to Subscribe

References

References is not available for this document.