Tag Archives: data leakage

Steal Restricted Sensitive Data with Template Languages

A template language lets developers define placeholders that are later replaced with dynamic data (variables). As the definition suggests, its main purpose is flexibility: developers can insert dynamic data into a predefined template. The dynamic data may come from a different server or service, depending on the current session or use case. Numerous templating languages are widely used in web development; Handlebars, EJS, Django, Mustache and Freemarker are among the most popular. Using a template language involves three main components: the dynamic data (variables), the template, and the template engine that compiles the data and the template together.
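The three components can be seen in a toy engine. The following is a minimal Python sketch, not how Handlebars, EJS or any other real engine works internally; the `{{ path }}` syntax mirrors the examples in this post, and `render`, `lookup` and the sample data are hypothetical.

```python
import re

# Matches placeholders such as {{ organization.name }} (spaces optional).
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def lookup(data: dict, path: str):
    """Resolve a dotted path such as 'organization.name' in nested dicts."""
    value = data
    for key in path.split("."):
        value = value[key]
    return value


def render(template: str, data: dict) -> str:
    """Toy engine: substitute every placeholder with its value from data."""
    return PLACEHOLDER.sub(lambda m: str(lookup(data, m.group(1))), template)


if __name__ == "__main__":
    template = "Please join {{ organization.name }}. Best {{ user.name }}"  # the template
    data = {"organization": {"name": "Acme"}, "user": {"name": "Alice"}}  # the dynamic data
    print(render(template, data))  # the engine combines both
```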

How Template Languages Work

While template languages give web development more flexibility, they also introduce security issues. Server-side template injection (SSTI) is clearly the most notorious vulnerability class found across template languages.

Security Concerns beyond SSTI with Template Languages 

SSTI vulnerabilities could be avoided

Server-side template injection (SSTI) issues are the most common vulnerabilities found across many different template languages. SSTI occurs when an attacker is able to use native template syntax to inject a malicious payload into a template, which is then executed on the server side when the template engine processes the user-supplied template. A list of vulnerable template languages and their exploitation payloads can be found here; it is quite comprehensive.
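To illustrate the core SSTI problem without a third-party engine, here is a Python analogue: the built-in `str.format` lets a replacement field walk object attributes, so a user-controlled format string can reach data the developer never meant to expose. This is a stand-in chosen for illustration, not one of the engines listed above; `SECRET_KEY`, `User` and `render_greeting` are hypothetical names.

```python
# Module-level secret that the template author was never meant to see.
SECRET_KEY = "s3cr3t-signing-key"


class User:
    def __init__(self, name: str):
        self.name = name


def render_greeting(user_template: str, user: User) -> str:
    # VULNERABLE: the user controls the whole format string, so replacement
    # fields can traverse attributes of the objects passed in.
    return user_template.format(user=user)


if __name__ == "__main__":
    alice = User("Alice")
    print(render_greeting("Hello {user.name}", alice))
    # The bound __init__ method exposes its function's module globals.
    print(render_greeting("{user.__init__.__globals__[SECRET_KEY]}", alice))
```

Payloads against full engines typically go further, up to code execution, but the mechanism is the same: template syntax supplied by the user is evaluated by the engine.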

Most SSTI exploits lead to arbitrary code execution and server compromise. For that reason, many template languages ship with sandbox and sanitization features enabled by default, which stop the template engine from accessing risky modules. As a result, when the engine processes a user-provided template or data, it cannot reach these modules even if the malicious template contains a call to them. For example, Handlebars introduced a restriction in version 4.6.0 that forbids access to prototype properties and methods of the context object by default, to mitigate code execution caused by server-side template injection. Some applications that use a template language also deploy strict sanitization that disallows certain characters or regex patterns to prevent other vulnerabilities SSTI can cause, for example sanitizing the final output to prevent XSS.
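A sandbox restricts what the engine may touch. The sketch below applies that idea to Python's `str.format` by subclassing `string.Formatter` and rejecting any field that touches a name starting with an underscore, loosely in the spirit of the Handlebars prototype-access restriction. It is a toy under that assumption, far weaker than a real sandbox such as Jinja2's `SandboxedEnvironment`; `SandboxedFormatter` and `Organization` are hypothetical.

```python
import re
import string


class SandboxedFormatter(string.Formatter):
    """Toy sandbox: refuse any field that touches a name starting with '_'."""

    def get_field(self, field_name, args, kwargs):
        # "org.__init__.__globals__[KEY]" -> ["org", "__init__", "__globals__", "KEY", ""]
        parts = re.split(r"[.\[\]]", field_name)
        if any(part.startswith("_") for part in parts):
            raise ValueError(f"blocked template field: {field_name!r}")
        return super().get_field(field_name, args, kwargs)


class Organization:
    def __init__(self, name: str, api_key: str):
        self.name = name
        self.api_key = api_key


if __name__ == "__main__":
    fmt = SandboxedFormatter()
    org = Organization("Acme", "ak_live_123")
    print(fmt.format("Join {org.name}", org=org))  # allowed
    try:
        fmt.format("{org.__init__.__globals__}", org=org)
    except ValueError as exc:
        print(exc)  # blocked by the sandbox
    print(fmt.format("{org.api_key}", org=org))  # still allowed!
```

Note the last call: the sandbox stops attribute traversal, yet a plain field such as `{org.api_key}` still renders, because nothing checks whether the template is entitled to that data.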

Even if the template language itself provides a strong sandbox and a robust sanitization method is deployed on top of it so that the template cannot be abused through SSTI, your application can still be at risk if the way the template engine consumes dynamic data is improperly configured.

Data leakage can still occur when the template engine is able to process data outside the permitted scope.

Consider the following example.


In one application, an admin user can create an organization and perform sensitive operations through the dashboard or via API requests. Once an organization is created, the admin can add multiple users with limited permissions to the organization settings. A user can invite new users to join the organization by sending them an invitation email. To make the email dynamic and let users modify the email template, the application uses a template language to compile it.

In the standard workflow, a user invites a new user by taking the following steps.

Step 1: The user creates the following email template from the dashboard and uses it to send an email to a new user.

<h2>Dear Friends </h2>
<div>
<p>Please join {{ organization.name }} to share your fun moments by clicking the invitation link {{ organization.invitation_link }}. Your friends are waiting for you.</p>
<p>Best {{ user.name }}</p>
</div>

Step 2: The application processes the email template with the template engine once the user saves it.

The application server a) validates whether the template contains potential template injection threats, using both the sanitization and the sandbox methods, and b) if the template is safe and syntactically correct, replaces placeholders such as {{ organization.name }} and {{ user.name }} with dynamic data from the server. For example, the app server could query the database for the current organization and user data and present it as a JSON object.

Step 3: The invitation email is sent to the other user with the final output.

Once the template engine has replaced all the placeholders in the email template with dynamic data to generate the final email, the email is sent to the invited user.

Suppose the server-side security controls are robust enough to prevent server-side template injection through sanitization and the sandbox. The feature can still leave a security hole open because of missing access control on the dynamic data and insufficient validation of how that data is consumed.

In this case, the organization data pulled from the application server contains more than the user is permitted to access, for example the api_key and api_private_token, which should NOT be accessible to a team user in the normal workflow. Through the normal workflow, a non-admin user has no way to extract this sensitive data.

However, the user can now access them by crafting a template that steals them without triggering any violation. If the user sends an invitation email with the following crafted template, the organization's api_key and api_private_token will be disclosed to them.

<h2>Dear Friends </h2>
<div>
<p>Please join {{ organization.name }} to share your fun moments by clicking the invitation link {{ organization.invitation_link }}. Your friends are waiting for you.</p>
<p>Best {{ user.name }}</p>
{{ organization.api_key }} {{ organization.api_private_token }}
</div>
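The leak can be reproduced with a toy engine. In this Python sketch, assuming the server passes the whole organization record to the engine, the crafted template contains no SSTI syntax at all, yet the output carries both secrets. The `render` helper, the context layout and every value in it are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, data: dict) -> str:
    """Toy engine: resolve each dotted placeholder against nested dicts."""
    def replace(match):
        value = data
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return PLACEHOLDER.sub(replace, template)


# Hypothetical context built from the DB: the WHOLE organization record,
# including fields a team user must never see.
context = {
    "organization": {
        "name": "Acme",
        "invitation_link": "https://example.com/invite/abc",
        "api_key": "ak_live_123",
        "api_private_token": "tok_secret_456",
    },
    "user": {"name": "Bob"},
}

crafted = "Best {{ user.name }} {{organization.api_key}} {{ organization.api_private_token }}"

if __name__ == "__main__":
    print(render(crafted, context))  # both secrets end up in the email body
```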

Why does the template engine access more data than the user is permitted to?

There are various reasons why the server gives the template engine data outside the user's permission scope when processing the template. Here are three common ones, drawn from a couple of real scenarios I have experienced.

Reason 1: Sanitization and the sandbox only check for SSTI attack patterns

If the user-supplied template does NOT violate the rules defined to match SSTI attack patterns, the template engine proceeds with the replacement without validating whether the template is attempting to consume data beyond its intended scope.
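A minimal Python sketch of such a pattern-only check follows. The regular expressions are hypothetical examples of SSTI signatures, not a recommended or complete list; the point is that a template which only references organization fields passes untouched.

```python
import re

# Hypothetical SSTI signatures, illustrative only.
SSTI_PATTERNS = [
    re.compile(r"__\w+__"),          # Python-style dunder access
    re.compile(r"\bconstructor\b"),  # JavaScript prototype-chain tricks
    re.compile(r"\{\{[^}]*\("),      # function calls inside a placeholder
    re.compile(r"\{%"),              # statement/block tags
]


def passes_ssti_check(template: str) -> bool:
    """Return True when no SSTI signature matches (the ONLY check performed)."""
    return not any(pattern.search(template) for pattern in SSTI_PATTERNS)


crafted = "<p>Best {{ user.name }}</p> {{ organization.api_key }} {{ organization.api_private_token }}"

if __name__ == "__main__":
    print(passes_ssti_check("{{ user.__class__ }}"))  # False: blocked
    print(passes_ssti_check(crafted))                # True: the leak goes unnoticed
```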

Reason 2: Insufficient integration testing between microservices

It is very common for a company to have separate teams for frontend and backend development. The frontend team provides the interface where users define a template and validates the user-supplied template, whereas the backend team provides the functions that extract dynamic data and fill in the template once the frontend passes a validated template to the backend. Both teams seem to fulfill their responsibilities correctly. However, without a good suite of integration tests, the frontend is blind to what kind of dynamic data the backend provides, and the backend has no way to validate which data the frontend is allowed to consume.

Reason 3: Access control is not implemented in internal microservices

In microservice environments, I have often seen internal microservices deployed without any access control. Once a request passes the access control implemented in the public service, the internal microservice performs no further validation when the public service calls it. In this case, the internal service that pulls organization data from the database does not check whether the user has permission to access each field.
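One way to add that missing layer is to let the internal service enforce field-level permissions itself instead of trusting its caller. The Python sketch below assumes a simple role-to-fields policy; `FIELD_POLICY`, `Caller`, `ORGS` and `get_organization` are hypothetical, and a real service would take the caller's identity from a verified token rather than a plain argument.

```python
from dataclasses import dataclass

# Stand-in for the organizations table.
ORGS = {
    "org-1": {
        "name": "Acme",
        "invitation_link": "https://example.com/invite/abc",
        "api_key": "ak_live_123",
        "api_private_token": "tok_secret_456",
    }
}

# Assumed policy: which fields each role may read.
FIELD_POLICY = {
    "admin": frozenset({"name", "invitation_link", "api_key", "api_private_token"}),
    "member": frozenset({"name", "invitation_link"}),
}


@dataclass(frozen=True)
class Caller:
    user_id: str
    role: str


def get_organization(caller: Caller, org_id: str) -> dict:
    """Internal service: re-check permissions rather than trust the public service."""
    row = ORGS[org_id]  # stand-in for the real database query
    allowed = FIELD_POLICY.get(caller.role, frozenset())  # unknown role: no fields
    return {field: value for field, value in row.items() if field in allowed}
```

With this in place, a member's template context simply never contains `api_key`, so no template can leak it.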

How to prevent data leakage through template language abuse

Developers can adopt various measures during development to keep attackers from abusing a template language to leak data.

  • When validating the user-supplied template, use a whitelist of dynamic data (the variables inside {{ }}) rather than a blacklist wherever a whitelist is feasible.
  • Perform sanitization and validation after the template engine has compiled the user-supplied template, to check whether the output contains sensitive data.
  • Add access control and permission validation between services. If service A is going to consume data from service B, perform a permission check to ensure the user calling service A has the right permission to consume all the data provided by service B.
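The first two measures can be sketched in a few lines of Python: reject any placeholder that is not on a whitelist before rendering, and check the rendered output for known secret values afterwards. The whitelist contents and the helper names are assumptions for this invitation-email example.

```python
import re
from typing import List

PLACEHOLDER = re.compile(r"\{\{\s*([^}]*?)\s*\}\}")

# Hypothetical whitelist for the invitation-email template.
ALLOWED_VARIABLES = {"organization.name", "organization.invitation_link", "user.name"}


def disallowed_variables(template: str) -> List[str]:
    """Pre-render check: every placeholder that is not on the whitelist."""
    return [name for name in PLACEHOLDER.findall(template) if name not in ALLOWED_VARIABLES]


def leaks_secrets(rendered: str, secrets: List[str]) -> bool:
    """Post-render check: does the compiled output contain a known secret value?"""
    return any(secret and secret in rendered for secret in secrets)
```

Because anything not explicitly allowed is rejected, the whitelist also catches names nobody anticipated, which a blacklist of "dangerous" fields would miss.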

Besides adopting strict rules for processing templates during development, comprehensive and thorough testing is vital to catch overlooked areas.

Conclusion

While enjoying the flexibility template languages provide, developers and security teams should bear in mind that more flexibility also means a larger attack surface for malicious users. SSTI is not the only security issue you should be aware of; you also need to pay attention to potential data leakage caused by insufficient sanitization or missing access control on sensitive data. In other words, your sanitization patterns should match not only potential SSTI attack patterns but also sensitive data patterns.

Insecure logging could be a burden to your security team

If you are part of a security team, it is very likely that your team has spent the past two months feverishly remediating the vulnerabilities caused by log4j. It is frustrating and stressful, because the potential damage could be catastrophic if the vulnerability is exploited. However, the slim bright side is that it at least shows your developers are implementing logging in the product for monitoring or debugging purposes.

However, logging itself can be an often-overlooked security issue, because many developers treat logging as an internal debugging and monitoring function where security controls are frequently missing. I have observed many cases where improper logging turned into security incidents and placed a heavy burden on the security team to undo the damage.

Apply logging functions with security controls

Before we dive into the details, let us look at the following piece of code from an old internal project that I created a while back. If you use the Node.js Express framework, you will recognize that this code acts as middleware that logs every HTTP request, including its body, to a log file.

The code above clearly introduces security issues once you review it from a security perspective. First, sensitive information in HTTP requests is written to log files, which can cause data leakage if unauthorized users access those files. The internal project is a web application with login and registration functions. As a consequence of this logging, the registration verification token and the usernames and passwords in HTTP requests could be leaked into the log files.
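The original middleware is Express code. As a hedged, language-neutral illustration of the fix, here is a Python sketch that masks sensitive fields in a request body, including nested ones, before the line is written to the log. The key names in `SENSITIVE_KEYS` and the helpers `mask_body` and `log_request` are assumptions, not part of the original project.

```python
import json
import logging

# Hypothetical keys to mask; extend to whatever your application handles.
SENSITIVE_KEYS = {"password", "token", "verification_token", "api_key"}
MASK = "***"

logger = logging.getLogger("http.requests")


def mask_body(body):
    """Recursively mask values whose key is sensitive (case-insensitive)."""
    if isinstance(body, dict):
        return {
            key: MASK if isinstance(key, str) and key.lower() in SENSITIVE_KEYS else mask_body(value)
            for key, value in body.items()
        }
    if isinstance(body, list):
        return [mask_body(item) for item in body]
    return body


def log_request(method: str, path: str, body) -> str:
    """Build the log line from the MASKED body, log it, and return it."""
    line = f"{method} {path} {json.dumps(mask_body(body), sort_keys=True)}"
    logger.info(line)
    return line
```

Masking by key name is still a blacklist, so it only protects the fields someone thought to list.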

Potential Risks Caused by Logging

Risk 1: Sensitive data is logged in log files

Logging sensitive data without proper masking or filtering is a common security oversight, from startups to enterprises, for many reasons. A couple of years back, Twitter sent out an announcement urging its users to change their passwords because unmasked passwords had been written to an internal log file.

Reason 1: Security is not baked into the entire SDLC

Many development organizations involve their security teams only at the testing phase of the software development life cycle. Without consulting the security team during design and development, many developers do not know which data should be masked or filtered before implementing logging functions.

One tricky and representative example I experienced: a development team took a blacklist of data entries, such as IPs, passwords and tokens, from a document the security team had written a while back, and applied it as a filter in the logging function without consulting the security team. However, the 'referer' header, which contained sensitive API tokens from customers, was written to the log files because it was not on the predefined blacklist. The mistake was discovered only after the feature had shipped to production, and it took a while to purge the sensitive data from the logging systems.
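The difference between the two filtering strategies is easy to see in a small Python sketch. The header names in both sets are hypothetical; the allowlist logs only headers known to be safe, so a header nobody thought of, such as a `Referer` carrying a token, is dropped by default instead of leaking.

```python
# Hypothetical blacklist inherited from an old document.
HEADER_BLACKLIST = {"authorization", "cookie", "x-forwarded-for"}
# Hypothetical allowlist: only headers known to be safe and useful.
HEADER_ALLOWLIST = {"user-agent", "content-type", "accept"}


def filter_by_blacklist(headers: dict) -> dict:
    """Drop known-bad headers; anything unforeseen is still logged."""
    return {k: v for k, v in headers.items() if k.lower() not in HEADER_BLACKLIST}


def filter_by_allowlist(headers: dict) -> dict:
    """Keep only known-good headers; anything unforeseen is dropped."""
    return {k: v for k, v in headers.items() if k.lower() in HEADER_ALLOWLIST}


request_headers = {
    "User-Agent": "curl/8.0",
    "Authorization": "Bearer abc",
    "Referer": "https://app.example.com/callback?api_token=SECRET123",
}

if __name__ == "__main__":
    print(filter_by_blacklist(request_headers))  # Referer (and its token) survives
    print(filter_by_allowlist(request_headers))  # only User-Agent survives
```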

Reason 2: Lack of standard logging functions in a complex environment

As more and more companies adopt microservice architectures, their development environments grow more complex, and the lack of standard logging functions becomes another reason sensitive data ends up exposed in log files.

The following diagram shows a typical microservice workflow, where the API gateway is exposed to the public to handle API requests and many microservices are deployed in a private VPC to process them. Some developers are aware that sensitive data must be filtered out at the API gateway before it is sent to the S3 log system. However, when a request is handled by an internal microservice (for example, microservice B), developers might forget to filter it, believing that because the service resides in the internal VPC there is no need to remove sensitive data before writing to the log files. As a result, some internal microservices may write sensitive data to the log files.

Reason 3: Insufficient QA and Security Testing

It is common for QA teams to perform only black-box testing, while security teams rely only on automated scanning tools to find potential flaws in the code. In that situation it is very difficult for either team to uncover security issues caused by logging without manual code review.

Risk 2: Malicious data is processed and logged without validation

Another risk when implementing logging functions is that malicious data is processed and executed without any validation. You might wonder why you should validate data before processing it and storing it in a log file, when the entire purpose of logging is to capture raw data for analysis.

The reason is that you may be exposed to deserialization exploits when validation is absent from your logging functions. I have seen many developers dump entire objects into log files. When this happens, they very likely use a serialization function to serialize the object and write it to the log file, and later they may deserialize the logged data for analysis. In this case, you are potentially at risk of deserialization exploitation.
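The danger is easiest to show with Python's `pickle`, whose documentation warns that it is not secure against untrusted data. In the sketch below, `__reduce__` makes `pickle.loads` call `eval` on a harmless expression; an attacker who controls the serialized bytes could choose any callable instead. Logging plain JSON avoids this, because `json.loads` only builds dicts, lists, strings, numbers, booleans and None. `LoggedEvent` and the four helper functions are hypothetical.

```python
import json
import pickle


class LoggedEvent:
    """Stand-in for a crafted payload. In practice an attacker would supply or
    tamper with the serialized bytes; no attacker-defined class is needed."""

    def __reduce__(self):
        # pickle.loads() calls the callable returned here with these arguments.
        # eval("6*7") is harmless, but it could just as well be os.system.
        return (eval, ("6*7",))


def write_log_pickle(obj) -> bytes:
    return pickle.dumps(obj)


def read_log_pickle(blob: bytes):
    return pickle.loads(blob)  # UNSAFE: runs code embedded in the data


def write_log_json(event: dict) -> str:
    return json.dumps(event, sort_keys=True)


def read_log_json(line: str):
    return json.loads(line)  # only builds plain data types


if __name__ == "__main__":
    print(read_log_pickle(write_log_pickle(LoggedEvent())))  # eval ran while loading
    print(read_log_json(write_log_json({"user": "bob", "action": "login"})))
```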

Take Log4j as an example: besides the Log4Shell vulnerability, it has suffered from a couple of deserialization vulnerabilities where untrusted log data could lead to remote code execution.

Some Best Practices

To keep your logging function from becoming a burden on your security team, or even turning against you by causing a security incident, you can follow some best practices.

Involve security at every phase of the SDLC when implementing logging functions

Security applies at every phase of the software development life cycle (SDLC). If your organization does not have a security review procedure, it is time to define one now. Designing a secure logging function or log management feature together with your security team can save your organization much more time and effort. For example, if your logging function is going to log every request, the security team might ask you to use POST instead of GET for requests that carry sensitive data, since GET parameters are part of the URL that gets logged. They could also ask you to mask certain sensitive data as soon as it is processed, before it gets written to the log files.
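If sensitive values still end up in URLs, they can be masked before the URL is logged. Here is a Python sketch using the standard `urllib.parse` module; the parameter names in `SENSITIVE_PARAMS` are assumptions. Note that `urlencode` re-encodes the query, so the logged URL may differ cosmetically from the original (for example, spaces become `+`).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names to mask.
SENSITIVE_PARAMS = {"password", "token", "api_token", "access_token"}


def redact_url(url: str) -> str:
    """Return url with the values of sensitive query parameters replaced."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The same helper can be applied to headers that carry URLs, such as `Referer`.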

Implement a standard and centralized logging function

As an organization grows, its platform and services become more sophisticated. Without a standard or centralized logging function, each team is forced to choose its own way of implementing logging in the services it owns. This adds many security uncertainties, because you cannot foresee how each logging function is implemented.
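A centralized approach could be as small as a shared helper that every service uses to obtain its logger, with a redacting filter attached. The Python sketch below relies on the standard `logging.Filter` hook; the regular expressions in `REDACTIONS` and the `get_logger` helper are hypothetical. Since a logger-level filter does not see records propagated from child loggers, each service should log through the logger returned by `get_logger`.

```python
import logging
import re

# Hypothetical redaction rules shared by every service.
REDACTIONS = [
    (re.compile(r"(password|token|api_key)=[^&\s]+", re.IGNORECASE), r"\1=***"),
    (re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"), "Bearer ***"),
]


class RedactingFilter(logging.Filter):
    """Rewrites each record's message with secrets masked before handlers see it."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args first
        for pattern, replacement in REDACTIONS:
            message = pattern.sub(replacement, message)
        record.msg, record.args = message, ()
        return True  # keep the (now redacted) record


def get_logger(name: str) -> logging.Logger:
    """The one sanctioned way for a service to obtain a logger."""
    logger = logging.getLogger(name)
    if not any(isinstance(f, RedactingFilter) for f in logger.filters):
        logger.addFilter(RedactingFilter())
    return logger


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = get_logger("service-b")
    log.info("callback url=/cb?token=%s", "abc123")  # logged as token=***
```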

Continuously monitor and scan your log data

Sometimes unexpected data can still end up in log files even after you have set up strict logging functions that mask or filter all sensitive data. For example, your clients might not follow your API usage guidance and send sensitive data when calling your API endpoints. Your logging function may then write that data to the log files, because the API is being used in a way it was not designed for. In this case, you need a monitoring tool that scans your log data for unexpected sensitive data.
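A scanner can start as a handful of regular expressions run over log lines. This Python sketch flags lines that look like they contain a password parameter, a long bearer token or an AWS access key ID (which starts with `AKIA`); the detector set is an assumption, and a production setup would more likely use a dedicated secret-scanning tool.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical detectors; tune them to the secrets your platform handles.
DETECTORS = {
    "password_param": re.compile(r"password=[^&\s]+", re.IGNORECASE),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]{16,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_log_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, detector_name) for every suspicious match."""
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in DETECTORS.items():
            if pattern.search(line):
                yield lineno, name


if __name__ == "__main__":
    sample = ["GET /health 200", "POST /login?password=hunter2 200"]
    for lineno, name in scan_log_lines(sample):
        print(f"line {lineno}: possible {name}")
```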

Understand the data you are logging

"If you cannot measure it, you cannot manage it." This also applies to security: if you don't know what kind of data you are logging, you cannot really secure it. For example, if you dump an entire object into your log file by calling a serialization function without validating the object data, you are likely to log malicious data, which could lead to exploitation.
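Instead of serializing a whole object, a logging helper can emit only named fields and only primitive values. In the Python sketch below, `to_log_record`, `Signup` and the length cap are hypothetical; anything that is not a string, number, boolean or None is replaced by a placeholder rather than serialized.

```python
import json

MAX_LEN = 200  # hypothetical cap on logged string length


def to_log_record(obj, fields) -> str:
    """Serialize only the named fields of obj, and only primitive values."""
    record = {}
    for field in fields:
        value = getattr(obj, field, None)
        if value is None or isinstance(value, (bool, int, float)):
            record[field] = value
        elif isinstance(value, str):
            # json.dumps escapes newlines anyway; replacing them as well keeps a
            # decoded value from faking extra lines in plain-text viewers.
            record[field] = value.replace("\r", " ").replace("\n", " ")[:MAX_LEN]
        else:
            record[field] = f"<{type(value).__name__} omitted>"
    return json.dumps(record, sort_keys=True)


class Signup:
    """Hypothetical object a developer might be tempted to dump wholesale."""

    def __init__(self, email: str, password: str, note: str):
        self.email = email
        self.password = password  # never listed in the fields to log
        self.note = note
        self.session = object()


if __name__ == "__main__":
    signup = Signup("a@example.com", "hunter2", "hi\nFAKE ENTRY")
    print(to_log_record(signup, ("email", "note", "session")))
```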