The 9th of December 2021 brought one of the most widespread vulnerabilities in the modern technological era. If your business was affected by the Log4j vulnerability, you likely remember this day for all the wrong reasons. This vulnerability allows arbitrary code execution in applications that use the Apache Log4j library, which is relied on by some of the biggest names in tech. Think Apple, Microsoft, Cisco, IBM - the list goes on.
Check Point recorded over 800,000 exploitation attempts within 72 hours of the Log4j vulnerability becoming public. The risk was so severe that the vulnerability received a score of 10 out of 10 under the Common Vulnerability Scoring System (CVSS). For once, a 10/10 is NOT what you want to hear.
But over a year has passed, and developers have been working overtime to develop solutions. Tools such as Semgrep play a vital role in today's operations, making it fast and efficient to identify bugs and dependency vulnerabilities while enforcing code standards. In this article, we'll talk about Log4j and how to use Semgrep to identify Log4j and similar vulnerabilities proactively.
Apache Log4j is an open-source logging library developed by the Apache Software Foundation that is widely used in Java-based applications. The library enables developers to generate log output for debugging and monitoring their applications.
The vulnerability, named Log4Shell (CVE-2021-44228), exists within this library in versions 2.0-beta9 through 2.15.0, meaning that any application running a vulnerable version of this library potentially allows attackers to execute arbitrary code remotely.
The danger of this vulnerability lies in the ease and nature of its exploitation. Attackers can exploit it remotely, without prior access to the application, by sending specially crafted input that the application writes to its logs, which triggers the code execution. Plus, numerous proofs of concept became available right after the vulnerability was first disclosed, making it easy for anyone to exploit it.
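As a rough illustration of the version range above, the following Python sketch checks whether a Log4j version string falls between 2.0-beta9 and 2.15.0. The function name `is_in_log4shell_range` and the parsing rules are hypothetical, written for this article only. The three excluded backport releases are an assumption taken from the public advisory for CVE-2021-44228, and later backports on the 2.3.x and 2.12.x lines are not modeled, so confirm any result against the official advisory.

```python
import re

# Backported fix releases that the public advisory excludes from the
# affected range (assumption; verify against the current advisory text).
PATCHED_BACKPORTS = {(2, 3, 1), (2, 12, 2), (2, 12, 3)}

# Matches "2.14.1", "2.3", "2.0-beta9", "2.0-rc1", "2.14.1-SNAPSHOT".
_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?(?:-([A-Za-z]+)(\d*))?")


def is_in_log4shell_range(version: str) -> bool:
    """Return True if the version lies in 2.0-beta9 through 2.15.0."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch, tag, tag_number = match.groups()
    numbers = (int(major), int(minor), int(patch or 0))

    if numbers in PATCHED_BACKPORTS:
        return False
    if numbers == (2, 0, 0) and tag:
        # Pre-releases of 2.0 are ordered alpha < beta < rc, and the
        # range starts at beta9.
        tag = tag.lower()
        if tag == "beta":
            return int(tag_number or 0) >= 9
        return tag == "rc"
    return (2, 0, 0) <= numbers <= (2, 15, 0)
```

A call such as `is_in_log4shell_range("2.14.1")` returns `True`, while `is_in_log4shell_range("2.17.1")` returns `False`. A check like this only narrows down candidates; it does not replace a dependency scan.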
Attackers can then perform various malicious actions, such as data theft, system hijacking, and denial-of-service attacks. Given its potential for extensive harm, the severity of this vulnerability cannot be overstated, so identifying and addressing it is essential to prevent disastrous outcomes.
Semgrep is an open-source static analysis tool that has existed since 2020. It's a static application security testing (SAST) tool that scans source code for security vulnerabilities. Unlike many SAST tools, it analyzes code locally, making it ideal for developers who want to perform quick security checks during the development process.
Semgrep supports popular languages such as Java, Python, and JavaScript and contains pre-built rules that users may select from. Users can also create customized rules for specific use cases to evaluate the code against.
The Semgrep platform also provides an interface for creating custom rules for various languages.
With Jit, you can easily integrate Semgrep into your CI/CD pipeline, automate your SAST scans, and detect vulnerabilities early in development, avoiding the disastrous consequences that come with the exploitation of Log4j and other vulnerabilities.
After the vulnerability’s public release, Semgrep’s community quickly added a new rule to its registry to detect Log4j-related vulnerabilities. In this tutorial, we guide you through using Semgrep to detect Log4j vulnerabilities and explain how Semgrep and Semgrep rules work.
You can install the Semgrep CLI locally on macOS or Linux, or run it with Docker. If installing Semgrep locally is not an option, you can integrate Semgrep directly into supported CI platforms such as GitHub Actions, GitLab CI/CD, Jenkins, Azure Pipelines, and more.
Once the installation or integration is complete, you can choose the rules from the rule registry or create custom rules in YAML for customized detection capabilities.
Once you've set up the rules, you can run Semgrep on your code base to detect vulnerabilities.
Once Semgrep completes its scan, you can review the results and fix any detected issues. Semgrep provides detailed reports highlighting the lines of code with detected vulnerabilities.
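To make the review step more concrete, here is a minimal Python sketch that summarizes a Semgrep report, assuming the scan was run with the `--json` flag. The field names used (`results`, `check_id`, `path`, `start.line`, `extra.severity`, `extra.message`) reflect the JSON layout as we understand it and should be checked against the output of your Semgrep version. The function `summarize_findings` and the sample report are hypothetical and hand-written, not real scan output.

```python
import json
from collections import Counter


def summarize_findings(report_json: str):
    """Return (findings, counts_by_severity) from a Semgrep JSON report."""
    report = json.loads(report_json)
    findings = []
    for result in report.get("results", []):
        extra = result.get("extra", {})
        path = result.get("path", "?")
        line = result.get("start", {}).get("line", "?")
        findings.append({
            "rule": result.get("check_id", "unknown"),
            "location": f"{path}:{line}",
            "severity": extra.get("severity", "UNKNOWN"),
            "message": extra.get("message", ""),
        })
    counts = dict(Counter(finding["severity"] for finding in findings))
    return findings, counts


# Hand-written sample in the assumed report shape, not real scan output.
SAMPLE_REPORT = json.dumps({
    "results": [
        {
            "check_id": "log4j2_tainted_argument",
            "path": "src/App.java",
            "start": {"line": 12, "col": 9},
            "end": {"line": 12, "col": 48},
            "extra": {
                "severity": "WARNING",
                "message": "log4j logger.info tainted argument",
            },
        }
    ],
    "errors": [],
})

findings, counts = summarize_findings(SAMPLE_REPORT)
for finding in findings:
    print(finding["severity"], finding["location"], finding["rule"])
```

Grouping by severity like this makes it easier to decide which findings should block a merge and which can be handled later.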
There are multiple ways to run Semgrep, depending on your requirements. This tutorial walks through the steps to install and use Semgrep locally.
Install Semgrep by using the following command on a Linux terminal.
python3 -m pip install semgrep
Check the version of Semgrep you installed, which also confirms that the installation succeeded.
semgrep --version
You can use the following command to run a quick scan using Semgrep on the source code stored locally (replace the path with the path to where your source code is held).
semgrep --config=auto /home/lahiru/code/
This command assesses the code specified and provides the results locally.
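If you want to trigger the same scan from a script, a minimal Python sketch could look like the one below. It assumes the `semgrep` executable is on your `PATH` and uses the `--json` flag so the output can be parsed. The helper names `build_scan_command` and `run_scan` are hypothetical.

```python
import json
import subprocess


def build_scan_command(target: str, config: str = "auto") -> list:
    """Build the argument list for a Semgrep scan with JSON output."""
    return ["semgrep", "--config", config, "--json", target]


def run_scan(target: str, config: str = "auto") -> dict:
    """Run Semgrep on `target` and return the parsed JSON report."""
    completed = subprocess.run(
        build_scan_command(target, config),
        capture_output=True,
        text=True,
        check=False,  # inspect the report instead of raising on non-zero exit
    )
    return json.loads(completed.stdout)


# Example usage (requires Semgrep to be installed):
# report = run_scan("/home/lahiru/code/")
# print(len(report.get("results", [])), "findings")
```

Passing the command as a list rather than a single shell string avoids quoting problems when the target path contains spaces.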
Semgrep has a pre-built set of rules covering the most commonly found vulnerabilities. However, you can also create custom rules using YAML to customize the detection capabilities.
The following rule uses a simple pattern that matches any System.out.println() call with a single argument. The $MSG metavariable matches whatever expression is passed to the method, not only string literals (Semgrep metavariable names must be uppercase). The rule generates a warning for every such System.out.println() usage in Java code.
rules:
  - id: system-out-println
    message: Avoid using System.out.println() in production code
    severity: WARNING
    languages:
      - java
    pattern: System.out.println($MSG)
Once you save this rule as a .yaml file, you can use the following command to scan the current directory with your custom rule:
semgrep --config rule.yaml
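A common reason for a custom rule to be rejected is a missing top-level field. The sketch below, written under the assumption that a basic search-mode rule needs `id`, `message`, `severity`, `languages`, and one pattern key, checks a rule that has already been loaded into a Python dictionary. The function `missing_rule_fields` is hypothetical and does not cover taint-mode rules or every option in the rule schema.

```python
# Fields assumed to be required for a basic search-mode rule.
REQUIRED_FIELDS = ("id", "message", "severity", "languages")
# At least one of these pattern keys is expected as well.
PATTERN_KEYS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def missing_rule_fields(rule: dict) -> list:
    """Return the names of required fields that the rule does not define."""
    missing = [field for field in REQUIRED_FIELDS if field not in rule]
    if not any(key in rule for key in PATTERN_KEYS):
        missing.append("pattern")
    return missing


println_rule = {
    "id": "system-out-println",
    "message": "Avoid using System.out.println() in production code",
    "severity": "WARNING",
    "languages": ["java"],
    "pattern": "System.out.println($MSG)",
}
print(missing_rule_fields(println_rule))
```

For an authoritative check, Semgrep can validate the rule file itself with `semgrep --validate --config rule.yaml`.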
To detect Log4j issues, you can take the Log4j detection rule from the Semgrep rule registry and either save it locally as a custom rule or run it with the Semgrep CLI.
rules:
  - id: log4j2_tainted_argument
    patterns:
      - pattern: $LOGGER.$METHOD(...);
      - pattern-inside: |
          import org.apache.log4j.$PKG;
          ...
          Logger $LOGGER;
          ...
      - pattern-not: $LOGGER.$METHOD("...");
    message: log4j $LOGGER.$METHOD tainted argument
    languages:
      - java
    severity: WARNING
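The custom rule definition above lets developers look for a specific code pattern that can expose an application to the Log4j vulnerability. The three clauses under patterns must all hold for a match:
pattern: $LOGGER.$METHOD(...); matches any method call on the object bound to $LOGGER, with any arguments.
pattern-inside: restricts matches to code that imports from the org.apache.log4j package and declares a Logger, which binds $LOGGER to that logger variable.
pattern-not: $LOGGER.$METHOD("..."); excludes calls whose only argument is a string literal, since a constant message cannot carry attacker-controlled input.
Despite the word "tainted" in its message, the rule does not track where data comes from; it flags every logger call with a non-literal argument, so expect findings that need manual review. Note also that org.apache.log4j is the package namespace of the Log4j 1.x API, whereas code written against the Log4j 2 API imports from org.apache.logging.log4j, so you may need to adjust the import line to match your code base.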
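To show which calls a rule of this kind is meant to flag, here is a deliberately crude Python stand-in. It is not how Semgrep works: Semgrep matches on the parsed syntax tree, while this sketch uses line-based regular expressions and assumes one call per line and a logger variable named `logger`. The function `flag_non_literal_log_calls` and the sample Java lines are hypothetical.

```python
import re

# One call per line in the form receiver.method(arguments);
CALL_RE = re.compile(r"\b(\w+)\.(\w+)\((.*)\);")
# A single double-quoted string literal, allowing escaped characters.
LITERAL_ONLY_RE = re.compile(r'\s*"(?:[^"\\]|\\.)*"\s*')


def flag_non_literal_log_calls(java_source: str, logger_name: str = "logger"):
    """Return (line_number, line) for logger calls with non-literal arguments."""
    flagged = []
    for line_number, line in enumerate(java_source.splitlines(), start=1):
        match = CALL_RE.search(line)
        if match is None or match.group(1) != logger_name:
            continue
        # Mirror the pattern-not clause: skip calls with a lone string literal.
        if LITERAL_ONLY_RE.fullmatch(match.group(3)) is None:
            flagged.append((line_number, line.strip()))
    return flagged


SAMPLE = '''\
logger.info("Service started");
logger.info("User agent: " + userAgent);
logger.error(request.getHeader("X-Api-Version"));
'''

for line_number, line in flag_non_literal_log_calls(SAMPLE):
    print(line_number, line)
```

The first call logs a constant string and is skipped, while the other two pass values that may come from a request and are therefore reported for review.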
If you create a custom YAML rule file locally, you can use the following command to run the assessment using the custom rule:
semgrep --config /home/lahiru/SemgrepTest/rule.yaml
Log4j was a wake-up call for organizations to adopt a security-first mindset. Thankfully, despite the chaos and concern that Log4j caused, the development and security communities quickly delivered adaptive solutions such as Semgrep's detection rules.
By orchestrating Semgrep with Jit and using Jit's custom rules, you can easily incorporate SAST scans into your CI/CD pipeline, automate them, and get real-time notifications of vulnerabilities, so you can fix issues before they become a problem.