Using HTML entity encoding to mitigate XSS vulnerabilities, then double-check it

HTML entity encoding (HTML encoding) is a commonly deployed escaping method for mitigating XSS vulnerabilities, and its use has grown along with awareness of XSS. A large portion of web applications rely on it to handle untrusted data, and in most cases it is robust enough to protect them from XSS attacks. In some situations, however, a web application can remain exposed to XSS even though HTML entity encoding is in place.

A real-world example

The following example is a mock-up based on one client's website (the original is a single-page application that relies heavily on JavaScript), where HTML entity encoding was deployed but failed to eliminate the XSS vulnerability. Suppose the vulnerable URL is http://www.example/test.jsp?query=userinput and the injection point is the query parameter. When a request is sent to it from a modern web browser, the response contains an <input> field and a short onload script, described next.

htmlencode is a custom server-side function that applies HTML entity encoding to a given string in order to combat XSS. The response shows two things: a) the user input is HTML-encoded and reflected in the value attribute of an <input> field, and b) when the page loads, that encoded value is assigned to the innerHTML property of another element.
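The response described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the client's actual source: the ids query and search_result and the myFunction handler are taken from the JavaScript quoted later in this post, while the htmlEncode helper (a stand-in for the site's htmlencode function), the renderPage function, and the choice of a <div> as the target element are hypothetical.

```javascript
// Hypothetical stand-in for the server-side htmlencode function.
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;") // encode "&" first so later entities are not re-encoded
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Assumed shape of the response. The user input is reflected, HTML-encoded,
// inside the double-quoted value attribute of the <input> field.
function renderPage(userInput) {
  return [
    '<body onload="myFunction()">',
    '  <input id="query" value="' + htmlEncode(userInput) + '">',
    '  <div id="search_result"></div>',
    '  <script>',
    '    function myFunction() {',
    '      document.getElementById("search_result").innerHTML =',
    '        document.getElementById("query").value;',
    '    }',
    '  </script>',
    '</body>',
  ].join("\n");
}

// Print the page as the server would send it for a sample input.
console.log(renderPage("<img src=x onerror=alert(1)>"));
```

Viewed as raw source, the reflected value contains only entities, which is why the output looks safe when inspected on its own.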

HTML entity encoding is not sufficient here

At first glance, the mitigation seems robust enough: the user input is HTML-encoded correctly and enclosed in double quotes. Yet it turns out that this web application is still vulnerable to XSS.

When the attack vector http://www.example/test.jsp?query=<img src=x onerror=alert(1)> is requested in a web browser, the malicious markup <img src=x onerror=alert(1)> is still parsed by the browser and its embedded JavaScript is executed, even though the user input appears HTML-encoded as &lt;img src=x onerror=alert(1)&gt; in the response page.

What is behind this scenario?

To take a closer look at the problem, we can analyze the source code of the response to the request carrying the attack vector.

<body onload="myFunction()">

The JavaScript statement document.getElementById("search_result").innerHTML=document.getElementById("query").value; is the culprit that defeats the HTML entity encoding. (The HTML parser is one of the most complicated and important components of a web browser; it controls how raw HTML source is turned into a web page.) When the HTML parser first runs and builds the response page, the entity-encoded value &lt;img src=x onerror=alert(1)&gt; in the input field is decoded as the parser processes the value attribute. Although it is decoded at this step, it is not yet interpreted as HTML content. Later, the decoded value is assigned to innerHTML, and it is then interpreted as HTML content because innerHTML instructs the HTML parser to parse the string as HTML. In short, the HTML-encoded value in the input field is parsed twice. As a consequence, the injected malicious code is executed in the web browser, resulting in an XSS attack.
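The two parsing steps can be imitated outside a browser with a minimal sketch. The decodeEntities and simulateDoubleParse functions below are hypothetical helpers written for illustration: the decoder handles only five entities, and the regular expression is a crude stand-in for what the HTML parser does when a string is assigned to innerHTML.

```javascript
// Simplified decoder covering only the five entities used in this sketch;
// a real HTML parser recognizes many more character references.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // decode "&amp;" last so "&amp;lt;" becomes "&lt;", not "<"
}

function simulateDoubleParse(servedAttributeValue) {
  // First parse: the HTML parser decodes entities while reading value="...".
  const domValue = decodeEntities(servedAttributeValue);
  // Second parse: innerHTML hands domValue back to the HTML parser, which
  // treats anything shaped like a tag as markup. The regex only approximates that.
  const wouldCreateElement = /<[a-zA-Z][^>]*>/.test(domValue);
  return { domValue, wouldCreateElement };
}

// The encoded value from the response page, as discussed in the text.
console.log(simulateDoubleParse("&lt;img src=x onerror=alert(1)&gt;"));
```

The encoding survives exactly one round of parsing. Once the decoded string reaches a second HTML-parsing sink, the original markup is back.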

Same flaw observed in some open source web applications

In research on some open source web applications using Qualys Web Application Scanner (WAS), WAS detected similar XSS vulnerabilities even though HTML entity encoding was applied. The following pattern was observed among these vulnerabilities.

<input onfocus="JavaScriptCode htmlencode(userinput) JavaScriptCode">

In this pattern, the user input is HTML-entity-encoded and reflected inside an event handler attribute (onfocus is one such event handler). As in the scenario discussed at the beginning, the encoding is defeated because the web browser (more precisely, the HTML parser) HTML-decodes the value of the event handler attribute before it is executed as JavaScript code.
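A minimal sketch of this pattern follows. The track('...') handler body, the payload, and all function names are hypothetical and chosen only for illustration; the decoder is a simplified imitation of how the HTML parser reads attribute values.

```javascript
// Hypothetical stand-in for the server-side htmlencode function.
function htmlEncode(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Simplified imitation of the entity decoding the HTML parser applies to attribute values.
function decodeEntities(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&");
}

// Server side: user input is encoded and placed inside a JavaScript string
// within the onfocus attribute (assumed handler body: track('...')).
function renderInput(userInput) {
  return '<input onfocus="track(\'' + htmlEncode(userInput) + '\')">';
}

// Browser side: the JavaScript engine receives the attribute value
// only after the HTML parser has decoded its entities.
function handlerSeenByJavaScript(inputTag) {
  const match = /onfocus="([^"]*)"/.exec(inputTag);
  return match ? decodeEntities(match[1]) : null;
}

// Show the served tag and the code that would actually run for a sample payload.
console.log(renderInput("');alert(1);//"));
console.log(handlerSeenByJavaScript(renderInput("');alert(1);//")));
```

The encoded single quote never reaches the JavaScript engine as an entity. It arrives as a real quote character, closes the string early, and lets the rest of the input run as code.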

Conclusion

This example is not a rare or special case. Now that building single-page applications is popular and considered a modern web development practice, it is common to see an HTML-encoded user input value reused elsewhere within the same page. Web developers and security engineers should bear in mind that HTML parsing is tricky. When HTML entity encoding is used to handle untrusted data, you should not only check whether the encoded user input is placed correctly in the response, but also pay attention to the whole context of the page.
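As one possible way to act on this advice in the opening example, assuming the search result only needs to be displayed as text, the value could be written to textContent instead of innerHTML. The function name showSearchResult and its doc parameter are illustrative and not taken from the original application.

```javascript
// doc is expected to behave like the browser's document object.
function showSearchResult(doc) {
  // The value has already been entity-decoded by the HTML parser at this point.
  const query = doc.getElementById("query").value;
  // textContent inserts the string as plain text, so it is not parsed as HTML again.
  doc.getElementById("search_result").textContent = query;
}

// In the page itself this could be wired up as:
//   <body onload="showSearchResult(document)">
```

Passing the document in as a parameter keeps the function easy to exercise outside a browser. Whether textContent is appropriate depends on whether the page genuinely needs to render markup at that point.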
