Cross-file analysis taint traces
Introduction
This article describes cross-file (interfile) dataflow analysis and dataflow traces in Semgrep Code. It explains how to enable these features and outlines their benefits compared to the analysis in Semgrep OSS.
Viewing the path of tainted data
With dataflow traces, Semgrep Code can visualize the path that tainted (untrusted) data takes in specific findings. This path helps you follow tainted data from its source to its sink as it propagates through the body of a function or a method. For general information about taint analysis, see Taint tracking.
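To make the terms concrete, here is a minimal Python sketch of the kind of flow a dataflow trace describes. The function names, the params dictionary, and the users table are hypothetical and are not taken from any Semgrep rule or example; they only illustrate a source, a propagation step, and a sink.

```python
import sqlite3


def get_username(params: dict) -> str:
    # Source: the value comes from outside the program, so it is untrusted.
    return params["username"]


def build_query(username: str) -> str:
    # Propagation: the tainted value is concatenated into a SQL string.
    return "SELECT id FROM users WHERE name = '" + username + "'"


def find_user(conn: sqlite3.Connection, params: dict) -> list:
    query = build_query(get_username(params))
    # Sink: the tainted string reaches the database without sanitization.
    return conn.execute(query).fetchall()


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical in-memory database, used only to make the sketch runnable.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn
```

A trace for a finding on code like this would be expected to point at the read in get_username as the source, the concatenation in build_query as an intermediate step, and the conn.execute call as the sink. Whether a particular rule reports this flow depends on the sources and sinks that the rule defines.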
When running Semgrep Code from the command line, pass the --dataflow-traces flag to use this feature.
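As a rough illustration, the sketch below runs a scan with --dataflow-traces from a Python script and keeps only the findings that carry a trace. It rests on several assumptions that this article does not confirm: that the semgrep executable is on your PATH, that the scan subcommand and the --config and --json options are available in your version, and that traces appear under an extra.dataflow_trace key in the JSON output. The helper names are hypothetical.

```python
import json
import shutil
import subprocess


def build_scan_command(target: str, config: str = "p/default") -> list:
    # Assumed invocation: only --dataflow-traces comes from this article.
    return ["semgrep", "scan", "--config", config, "--dataflow-traces", "--json", target]


def findings_with_traces(report: dict) -> list:
    # Assumption: each result stores its trace under extra["dataflow_trace"].
    return [
        result
        for result in report.get("results", [])
        if "dataflow_trace" in result.get("extra", {})
    ]


def scan(target: str) -> list:
    if shutil.which("semgrep") is None:
        raise RuntimeError("semgrep is not installed or not on PATH")
    completed = subprocess.run(
        build_scan_command(target), capture_output=True, text=True, check=False
    )
    # The JSON report is read from stdout; parsing fails if the scan itself failed.
    return findings_with_traces(json.loads(completed.stdout))
```

Calling scan(".") from the root of a repository would return the subset of findings that include a trace, provided the assumptions above hold for your installation. Check the output of your own Semgrep version before relying on the key names.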
You can view dataflow traces in:
- Semgrep AppSec Platform by going to Semgrep Code's Findings page. For more details, see Path of tainted data in Semgrep Code.
- The PR or MR comments created by Semgrep Code running in your CI/CD system. To enable this feature, see:
  - GitHub users: Dataflow traces in PR comments
  - GitLab users: Dataflow traces in MR comments
Get cross-file findings
To get cross-file (interfile) findings in your organization, follow the steps in Perform cross-file analysis.
Displaying tainted data in Semgrep Code
Not all Semgrep rules or rulesets make use of dataflow traces or taint tracking. Ensure that a ruleset that does, such as the default ruleset, is added in your Policies page. If the default ruleset is not added, go to https://semgrep.dev/p/default, and then click Add to Policy. You can also add other rules that use taint tracking from Semgrep Registry.
To view the detailed path of tainted data with dataflow traces:
- Log in to Semgrep AppSec Platform, and click Code in the left panel to view your findings.
- Select the finding you're interested in, then do one of the following actions:
  - If the default Group by Rule view is enabled, click the View details icon on the card of the finding.
  - If the No grouping view is enabled, click the header hyperlink on the card of the finding. In the example screenshot, this is tainted-sql-string.
- In the Data flow section, you can see the source, traces, and sink of the tainted data. The example below displays the path of tainted data across multiple files because Semgrep Pro Engine was enabled.