A pipeline that reasons like an attacker.
HELIX doesn't run a checklist. It runs a loop: form a hypothesis, execute a real tool, observe what came back, and re-decide what to try next. Underneath sit a search-driven planner and a six-layer guardrail engine that keep every move in scope and in bounds.
Six stages, recon to verification
Each engagement follows the same disciplined arc a senior operator would follow by hand, only autonomously.
Discover
Map the target: enumerate attack surface, endpoints, parameters and entry points. Every later move depends on this recon.
Understand
Build a model of how the target behaves: auth flows, roles, data shapes and business logic. This is the context the planner reasons over.
Exploit
Test hypotheses with real offensive tools. The agent decides the approach; sqlmap, nuclei, Frida and the rest do the work.
Chain
Combine individual weaknesses into a higher-impact attack path within the engagement, so a low-severity foothold becomes a serious finding.
Prove
Capture a copy-pasteable reproducer, assign a CVSS score and a CWE ID, and write language-specific remediation. No proof, no finding.
Verify
A Skeptic agent refutes anything unproven and the correlator dedupes across agents, so only confirmed, distinct issues ship.
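As a rough illustration of the Prove and Verify stages, the sketch below drops findings that carry no reproducer and then dedupes what remains across agents. The `Finding` fields, the `shippable` function and the choice of (CWE, endpoint) as the dedupe key are all assumptions made for this example, not HELIX's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are assumptions."""
    agent: str
    cwe: str
    endpoint: str
    reproducer: str  # empty string means no proof was captured


def shippable(findings: list[Finding]) -> list[Finding]:
    """Keep proven findings only, one per (CWE, endpoint) pair."""
    seen: set[tuple[str, str]] = set()
    kept: list[Finding] = []
    for finding in findings:
        # "No proof, no finding": skip anything without a reproducer.
        if not finding.reproducer.strip():
            continue
        # Assumed dedupe key; two agents reporting the same issue collapse.
        key = (finding.cwe, finding.endpoint)
        if key in seen:
            continue
        seen.add(key)
        kept.append(finding)
    return kept
```

A real Skeptic step would presumably re-run the reproducer rather than just check that one exists; the presence check here only stands in for that.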
Monte Carlo Tree Search, not a mega-prompt
A stateful MCTS planner drives the whole engagement. It hypothesizes candidate moves, executes the most promising one for real, observes the result, scores it, and re-decides. It uses UCB1 to balance exploiting a promising lead against exploring new ground, and it prunes branches that fail, so it never bangs on the same closed door twice. If it deduces there's a WAF in the way, it changes strategy.
hypoth login form may allow SQLi
execute sqlmap --level 3 /auth
observe 403, WAF signature detected
decide prune branch · pivot
hypoth JSON body bypasses WAF rule
execute tamper via content-type
observe time-based delay confirmed
decide promote · capture reproducer
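The selection step in the loop above can be sketched with the standard UCB1 formula: mean reward plus an exploration bonus that shrinks as a branch is visited more often. This is a minimal sketch, not HELIX's planner; the `Node` structure, the `pruned` flag and the exploration constant `c = sqrt(2)` are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One candidate move in the search tree (hypothetical structure)."""
    label: str
    visits: int = 0
    total_reward: float = 0.0
    pruned: bool = False
    children: list["Node"] = field(default_factory=list)


def ucb1(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: average reward plus exploration bonus."""
    if child.visits == 0:
        return math.inf  # untried moves are always explored first
    mean = child.total_reward / child.visits
    bonus = c * math.sqrt(math.log(max(parent_visits, 1)) / child.visits)
    return mean + bonus


def select(parent: Node) -> Node:
    """Pick the live child with the highest UCB1 score; pruned ones are skipped."""
    live = [child for child in parent.children if not child.pruned]
    return max(live, key=lambda child: ucb1(child, parent.visits))
```

In terms of the trace, the blocked sqlmap branch would be marked `pruned` after the 403, so `select` can only return the remaining branches, such as the JSON-body hypothesis.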
Six layers on every tool call
Autonomy without recklessness. Every single tool call passes through all six layers, in order, before anything touches your target. See the full controls on the Security page.
Scan mode
Passive, safe or full: you set the aggressiveness per engagement, so HELIX never pushes harder than you authorized.
Scope respect
A hard in-scope allow-list. Anything outside the boundary you defined is rejected before it can run.
Destructive-action blocking
A pattern detector stops data-destroying and service-saturating actions before they execute.
Budget cap
A hard LLM-spend ceiling per engagement. The operator stays within a cost bound you control.
Rate limiting
Request pacing that keeps engagements from degrading availability: pressure, not a flood.
Human-in-the-loop
Production targets require explicit approval gates. Staging first, then prod, with a human in the seat.
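One way to picture the six layers is as an ordered chain of checks where the first failure rejects the call. The sketch below assumes that shape; every name in it (`ToolCall`, `Engagement`, `authorize`, the pattern list, the thresholds) is hypothetical, and a real rate limiter would more likely delay a call than reject it.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

# Illustrative patterns only; a real detector would be far more thorough.
DESTRUCTIVE_PATTERNS = ("drop table", "rm -rf", "--flood")


@dataclass
class ToolCall:
    tool: str
    target_url: str
    command: str
    intrusive: bool
    is_production: bool = False
    approved_by_human: bool = False


@dataclass
class Engagement:
    scan_mode: str  # "passive", "safe" or "full"
    allowed_hosts: frozenset[str]
    budget_usd: float
    spent_usd: float = 0.0
    max_requests_per_sec: float = 5.0
    current_requests_per_sec: float = 0.0


def check_scan_mode(call: ToolCall, eng: Engagement) -> Optional[str]:
    # Assumption: passive mode forbids any intrusive tool call.
    if eng.scan_mode == "passive" and call.intrusive:
        return "scan mode: intrusive call in a passive engagement"
    return None


def check_scope(call: ToolCall, eng: Engagement) -> Optional[str]:
    host = urlparse(call.target_url).hostname
    if host not in eng.allowed_hosts:
        return f"scope: {host} is not on the allow-list"
    return None


def check_destructive(call: ToolCall, eng: Engagement) -> Optional[str]:
    lowered = call.command.lower()
    for pattern in DESTRUCTIVE_PATTERNS:
        if pattern in lowered:
            return f"destructive: command matches {pattern!r}"
    return None


def check_budget(call: ToolCall, eng: Engagement) -> Optional[str]:
    if eng.spent_usd >= eng.budget_usd:
        return "budget: LLM-spend ceiling reached"
    return None


def check_rate(call: ToolCall, eng: Engagement) -> Optional[str]:
    if eng.current_requests_per_sec >= eng.max_requests_per_sec:
        return "rate limit: request pace too high"
    return None


def check_human(call: ToolCall, eng: Engagement) -> Optional[str]:
    if call.is_production and not call.approved_by_human:
        return "approval: production target needs human sign-off"
    return None


# Same order as the six layers listed above.
LAYERS = (check_scan_mode, check_scope, check_destructive,
          check_budget, check_rate, check_human)


def authorize(call: ToolCall, eng: Engagement) -> Optional[str]:
    """Return None if every layer passes, else the first rejection reason."""
    for layer in LAYERS:
        reason = layer(call, eng)
        if reason is not None:
            return reason
    return None
```

Returning the first rejection reason keeps the decision auditable: each blocked call can be traced to the specific layer that stopped it.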
Posture that doesn't go stale between engagements
Schedule re-scans on an interval and HELIX produces run-over-run diffs (what's new, what's resolved, what regressed), so you watch posture move rather than read a one-off snapshot. You can also trigger an engagement straight from your CI pipeline via the public API, on whatever event matters to you.
run #41 → run #42 diff
+ new 2 BOLA on /v2/invoices
- resolved 5 XSS on /search
~ regressed 1 auth bypass returned
trigger: scheduled · interval 24h
also available: POST /v1/engagements
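A run-over-run diff like the one above can be sketched with plain set operations. This is a minimal sketch under two assumptions: each finding has a stable string key, and "regressed" means a finding that was resolved in some earlier run and has now reappeared. The `diff_runs` function and the key format are hypothetical.

```python
def diff_runs(
    previous: set[str],
    current: set[str],
    ever_resolved: set[str],
) -> dict[str, set[str]]:
    """Classify findings between two runs.

    previous / current: finding keys present in each run.
    ever_resolved: keys that were marked resolved in any earlier run.
    """
    appeared = current - previous
    return {
        "new": appeared - ever_resolved,        # never seen before
        "regressed": appeared & ever_resolved,  # fixed once, now back
        "resolved": previous - current,         # gone since the last run
    }
```

For example, with a finding key such as `"auth-bypass:/login"` present in `ever_resolved` and again in `current`, the function reports it under `"regressed"` rather than `"new"`.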
Watch the loop close on your target
Recon to reproducible proof, with guardrails you control.