
5 Typical Ways Engineers Leak Sensitive Information and How to Mitigate Them

Background

From the AWS incidents in which engineers leaked private keys to public GitHub repos, to the credential leaks found on the online API platform Postman, it is clear that engineers can be the weak link in the system when it comes to critical sensitive data leaks.

Drawing on publicly disclosed incidents, penetration-testing experience, and security monitoring programs I have participated in, we can classify the common ways in which an engineer leaks sensitive information into five areas.

Different Ways in Which Engineers Can Cause Sensitive Data Leakage

Sensitive Data Leaks in a GitHub Repo

A mistake many developers make is to store secrets in their source code and check them into source control tools such as GitHub and GitLab. According to the latest GitGuardian report, 5.5 out of every 1,000 commits exposed at least one secret, and 1 in 10 commit authors exposed a secret in a GitHub repo.

You may argue that the risk is manageable if these repos are private and only a few developers in your organization can access them, even though it is still a poor security practice. The Uber incident shows how bad things can get even when secrets live in a private repository: an attacker accessed an S3 bucket holding 57 million user records after extracting AWS credentials from a commit in a private repo. The worst case is mistakenly pushing secret-containing code into a public repository, whether under an enterprise or a personal GitHub account.

Mitigation

  1. Train engineers not to push company-related information to a public repo.
  2. Enable secret scanning in your CI/CD pipeline (see the sketch below).
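
To make mitigation 2 concrete, here is a minimal sketch of the kind of check a secret-scanning step can run, either as a pre-commit hook or as a CI job. In practice you would use a dedicated scanner such as gitleaks or GitGuardian's ggshield; the patterns and file handling below are simplified illustrations, not a complete rule set.

import re
import subprocess
import sys

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "private key header": re.compile(r"-----BEGIN (RSA|EC|OPENSSH) PRIVATE KEY-----"),
    "generic credential assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][A-Za-z0-9/+=_-]{16,}['\"]"
    ),
}

def staged_files():
    """Return the paths of files staged for the current commit."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    )
    return [p for p in out.stdout.splitlines() if p]

def scan(path):
    """Yield (rule name, line number) for each suspicious line.

    Simplification: this reads the working-tree copy of each staged file.
    """
    try:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for lineno, line in enumerate(fh, start=1):
                for name, pattern in SECRET_PATTERNS.items():
                    if pattern.search(line):
                        yield name, lineno
    except OSError:
        return

def main():
    findings = [
        f"{path}:{lineno}: possible {name}"
        for path in staged_files()
        for name, lineno in scan(path)
    ]
    if findings:
        print("Commit blocked, potential secrets found:")
        print("\n".join(findings))
        sys.exit(1)

if __name__ == "__main__":
    main()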

Sensitive Data Leaks in a Log File

The Log4j vulnerability may be the first thing that springs to mind when thinking about security in logging, yet logging itself can pose a significant risk because many developers use it as an internal debugging tool and log far more data than is required. Sometimes the logged data contains sensitive information that should not be recorded anywhere, and doing so can cause severe security incidents. A couple of years ago, for example, Twitter urged its users to change their passwords because unmasked passwords had been written to an internal log file.

We found three main reasons why developers end up logging sensitive information:

  • Debug log statements are not removed before shipping to production

During development, many developers use logging as a debugging method: they add log statements to trace behavior, then forget to remove them before merging the code into production.

  • Developers are not fully aware of what is logged

As a system grows more complex, a single function may require interactions across multiple services, with data passed between them. A payload transmitted from an upstream service may contain sensitive fields that developers of a downstream service are unaware of, so critical data may be logged unintentionally.

  • Filtering is not applied to debug logs

In some circumstances a developer needs to log an error or exception along with the payload that caused it. This becomes a problem if no filtering or masking is applied, because the exception or the payload may contain sensitive data.

Mitigation

  1. Integrate detection of sensitive log statements into your CI/CD process and run it at the PR level.
  2. Encourage peer reviewers to pay close attention to logging calls.
  3. Employ tagged or structured logging, with masking of sensitive fields (see the sketch below).
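
As an illustration of mitigation 3, here is a minimal sketch of structured logging with masking applied before anything reaches the log file. The wrapper class and the list of field names treated as sensitive are assumptions for the example; adapt them to your own logging setup and data classification.

import json
import logging

# Field names treated as sensitive here are illustrative; align them with your data classification.
SENSITIVE_KEYS = {"password", "ssn", "credit_card", "authorization", "api_key"}

def mask(payload):
    """Recursively replace the values of sensitive keys before they reach a log file."""
    if isinstance(payload, dict):
        return {k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else mask(v)
                for k, v in payload.items()}
    if isinstance(payload, list):
        return [mask(item) for item in payload]
    return payload

class StructuredLogger:
    """Thin wrapper that forces every log call through the masking step."""
    def __init__(self, name):
        self._logger = logging.getLogger(name)

    def info(self, event, **fields):
        self._logger.info(json.dumps({"event": event, **mask(fields)}))

logging.basicConfig(level=logging.INFO)
log = StructuredLogger("payments")

# The raw payload contains a password; the emitted log line does not.
log.info("user_login_failed", user_id=42, password="hunter2", reason="bad credentials")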

Sensitive Data Leaks When Using Online Tools

As more organizations move to the cloud, numerous tools have online versions: Base64 decoders, JSON formatters, diff checkers, data-sharing tools, API testing platforms (e.g., Postman), and even ChatGPT. Many developers like these web tools because a simple copy/paste produces the desired output.

However, using these online tools carries the risk of disclosing your data publicly, because you cannot control where it goes. Many engineers appear unaware of the consequences; according to a recent study, 2.3% of workers have pasted sensitive material into ChatGPT.

You can find exposed API tokens simply by searching Postman's public network with keywords such as "token" or "secret".

Mitigation

  1. Train engineers not to send sensitive data to online tools.
  2. Install an equivalent toolkit locally instead of using online tools (see the sketch below).
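
Most of these one-off transformations are already covered by the standard library of whatever language you use. As a small illustration of mitigation 2, the Python sketch below performs three common tasks locally, so the data never leaves your machine; the sample inputs are made up for the example.

import base64
import difflib
import json

# Base64 decode, locally instead of an online decoder
decoded = base64.b64decode("c2VjcmV0LXRva2Vu").decode("utf-8")
print(decoded)  # secret-token

# Pretty-print JSON, locally instead of an online formatter
raw = '{"user":"alice","roles":["admin","billing"]}'
print(json.dumps(json.loads(raw), indent=2))

# Diff two snippets, locally instead of an online diff checker
old = ["timeout = 30", "retries = 3"]
new = ["timeout = 60", "retries = 3"]
print("\n".join(difflib.unified_diff(old, new, lineterm="")))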

Sensitive Data Leaks in Misconfigured Cloud Environments

Misconfiguring a cloud environment is another way an engineer can unintentionally expose sensitive data. There are many types of cloud misconfiguration, but a typical one is granting overly broad permissions to cloud assets. For example, an engineer could mistakenly configure an S3 bucket policy to grant public access to a bucket containing sensitive information. Internal instances can also be accidentally deployed to the public internet without authorization.
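
A periodic audit job can catch this class of mistake early. Below is a minimal sketch using boto3 that flags S3 buckets without every Block Public Access setting enabled; it is a simplified illustration (single account, default credentials, no pagination), not a complete audit tool.

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def bucket_blocks_public_access(name):
    """Return True only if every Block Public Access setting is enabled for the bucket."""
    try:
        config = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
    except ClientError as err:
        # No configuration at all means nothing is blocked.
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            return False
        raise
    return all(config.values())

for bucket in s3.list_buckets()["Buckets"]:
    if not bucket_blocks_public_access(bucket["Name"]):
        print(f"REVIEW: {bucket['Name']} may allow public access")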

Mitigation

  1. Continuously monitor your cloud environment for misconfigurations.
  2. Apply your security configuration at the build stage, and be wary of granting engineers or DevOps ad hoc access or permissions.
  3. Give engineers the minimum permissions required for their tasks.

Sensitive Data Leaks in Insecure Channels

As a developer, it is safe to assume that you frequently use tools like Zoom and Slack. They are simple to use, but it is just as easy to unintentionally divulge private or sensitive data in a Zoom chat or a Slack message.

You might think you are just sharing among coworkers, but once sensitive information lands in a public channel that anyone at your company can read, there is no way to know how widely it will spread, inside or beyond the company.

Mitigation

  1. Train employees not to share sensitive data over chat that they would not put in an email.
  2. Integrate automation that detects sensitive data across these channels (see the sketch below).
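
As a sketch of what mitigation 2 might look like for Slack, the example below uses the slack_sdk client to scan recent messages in a public channel for credential-like patterns. The token, channel ID, and patterns are placeholders for the example; a real deployment would cover all channels and reuse the same rule set as your repository scanner.

import os
import re

from slack_sdk import WebClient  # assumes the slack_sdk package is installed

# Illustrative patterns; align these with the rules used by your secret scanner.
PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID
    re.compile(r"xox[baprs]-[0-9A-Za-z-]{10,}"),        # Slack token
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # private key material
]

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def scan_channel(channel_id, limit=200):
    """Flag recent messages in a channel that look like they contain secrets."""
    history = client.conversations_history(channel=channel_id, limit=limit)
    for message in history["messages"]:
        text = message.get("text", "")
        if any(p.search(text) for p in PATTERNS):
            print(f"REVIEW message {message.get('ts')}: possible secret shared in {channel_id}")

# "C0123456789" is a placeholder channel ID; iterate over conversations_list() in practice.
scan_channel("C0123456789")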

Conclusion

Both technical and human errors can expose sensitive information. Even if you have strong data protection systems in place to guard against technical failures, there are still many ways a developer can expose private information.

When it comes to cybersecurity, humans are often the weakest link, and human error is hard to eliminate. A developer might unintentionally push private information to a public repository under their personal account, and by the time an automated tool notices, it is already too late. As a developer, please keep these common mistakes in mind when committing code or sharing data.