Regardless of which positions you are fulfilling in the development lifecycle in your organization, regular expressions are useful tools to make your work more efficient. While writing a basic regex itself might not be very hard, defining a robust and secure one could be a real challenge.
You may start to ponder that I remembered that some security incidents were caused by an insufficient regex pattern where a malicious user input was NOT blocked or detected by the regex. You are right in that perspective and this is a very common scenarios of poor regex leading to a security incident. But in this article, I would like to talk about the danger of poorly designed regexes from a different perspective, ReDOS, which is an overlooked security issue by many developers and security engineers in my opinion.
ReDOS, an overlooked security risk
Before we start the exploration of ReDOS, we need to understand two basic things 1) Regular expression is a sequence of characters that specifies a search pattern in text, which are commonly used by string-searching algorithms 2) when the regex matching (string searching) is performed, the behind scene algorithm is using a so called backtracking algorithm, where it is a brute forcing method to find a solution/path in a context that will match with the regex, it means there could be an infinite loop when a regex matching is performed behind the scene. That is the root cause of the DOS caused by regular expression, ReDOS, for short.
Though the severity of ReDOS could be critical as it could paralyze your web server just with a single malformed string, this vulnerability is always overlooked by many security engineers because 1)they are not able to identify a regex with potential ReDOS vulnerability 2) ignore or underestimate the risk of it 3)not able to craft a malicious string to demonstrate the exploitation of this vulnerabilities. As a security engineer, I was trying to downplay the severity of the ReDOS vulnerability as well as I was not persuaded to believe one single string would be able to totally freeze a web server. I started to change my mind after I was able to exploit a ReDOS vulnerability imposed by an inefficient regex with an elaborate string. During the exploitation, the elaborate string consumed 100% CPU of the server when a regex match is performed between the string (payload) and regex
How did I start?
When I was performing a static code analysis against a piece of code with some help of a SAST tool, a vulnerability was flagged against the following regex and it states the following piece of regex could lead to ReDOS attack. (Note: The regex was modified and changed to make the POC easier to follow)
var regex = /\A([\sa-z!#]|\w#\w|[\s\w]+\”)*\z/g |
After reviewing the regex, I started to craft a string by using an online tool https://regex101.com/ to match the regex. It did not take me long to find out that the regex101 is reporting Catastrophic backtracking error when evaluating the following string.
That error message clearly shows the regex has a flaw in it. I was then using the Debugger functions provided by Regex101 to validate how the backtrack steps are created. After running the debugger, it indicated that there are some group repetitions in the regex, which makes the backtracking into an endless loop.
After figuring out a possible ReDOS vulnerability could be exploited, I created the following POC codes and ran it on a free tier AWS EC2 t2.micro instance (1G RAM).
The CPU usage of the EC2 instance hiked to 100% after the length of the test payload reached 99 characters after running the POC code, which really surprised me.
The above POC definitely changed my understanding of the ReDOS vulnerability as the impact could be catastrophic. You might start to ask yourself about how to spot this kind of vulnerability in your code and eliminate it before it gets exploited.
Identify DeROS vulnerabilities in you codes
It is relatively complicated to justify whether your regex is vulnerable to ReDOS. However, there are a couple of approaches that could make it a little bit easier for you.
Use some patterns to evaluate your Regex
Suggested by OWASP, there are so called pattern to identify evil regex
- Grouping with repetition
- Inside the repeated group:
- Repetition
- Alternation with overlapping
When you find a regex with repetition patterns, for example, + character is used in your regex, you should start to pay some attention to the regex as it contains repetition patterns.
Use some automation tool
There are also some automation tools that you might be able to use, for example, rxxr2 and ReScue are some open source tools you could use neatly. If you have your source code hosted in Github and have Github Advance Security enabled, you could scan your code with CodeQL and the vulnerable regex could be flagged as well though you still need to verify it manually.
Conclusion
Composing an efficient and robust regex is hard and to spot a regex vulnerable to ReDOS is not easier either. This article just attempts to explore the basic problems of ReDOS and how to identify and exploit it with the help of some tools by providing a concrete example.
As stated in the before ahead, most developers, even the security engineers are not really aware of the potential risks of an efficient regex. The best way to avoid the ReDOS risk caused by an inefficient regex is to inform your developers and security engineers about the danger of the ReDOS and perform a thorough code review and testing when a regex has to be used in your code.