VulnHound is a threat-model-first vulnerability research harness for Claude Code. One slice per session. No PoC means it's not confirmed. Six hooks enforce everything mechanically — not through prompts Claude can drift from.
Two tools, two roles. The template is where you think. VulnHound is where you execute. They stay in sync via two CLI commands.
Each slice moves through these phases in order. VulnHound auto-tracks which phase Claude is in and the Stop hook prevents finishing until the phase is complete.
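Concretely, the phase gating can be pictured as a tiny state machine; a minimal sketch, with phase names and state fields assumed for illustration rather than taken from VulnHound's real schema:

```python
# Illustrative phase tracker; phase names and state fields are guesses,
# not VulnHound's actual identifiers.
PHASES = ["map", "assert", "poc", "iterate", "regress"]

def next_phase(state):
    """Advance only when the current phase is marked complete."""
    idx = PHASES.index(state["phase"])
    if not state.get("phase_complete"):
        return None  # the Stop hook would refuse to finish here
    return PHASES[idx + 1] if idx + 1 < len(PHASES) else None

print(next_phase({"phase": "map", "phase_complete": True}))   # -> assert
print(next_phase({"phase": "map", "phase_complete": False}))  # -> None
```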
Start here every time. Produces a structured map of the slice before any assumptions are made. Ask Claude to trace exact call chains — not a summary, the actual code paths.
This single reframe dramatically improves finding quality. The model stops rationalising defences and starts looking for breakage. Expect 2–3 findings minimum from a well-targeted slice.
For every specific check or function Claude flagged: force a concrete payload. The Stop hook will block Claude from finishing if a candidate finding has no PoC. This phase is what makes the difference between a report and evidence.
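A Stop hook that gates on PoCs might look roughly like this; a minimal sketch, with the findings structure and field names assumed rather than copied from VulnHound's actual state file:

```python
import sys

def check_pocs(state):
    """Return IDs of candidate findings that still lack a PoC."""
    return [f["id"] for f in state.get("findings", [])
            if f.get("status") == "candidate" and not f.get("poc")]

# In the real hook this state would be read from hunt/state.json;
# a Stop hook blocks by printing a reason and exiting non-zero.
state = {"findings": [
    {"id": "F-001", "status": "candidate", "poc": None},
    {"id": "F-002", "status": "candidate", "poc": "curl -s http://target/..."},
]}
missing = check_pocs(state)
if missing:
    print("Blocked: no PoC for", ", ".join(missing), file=sys.stderr)
```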
This is the technique that found ElysiaJS's `let decoded = true` initialisation bug. Asking the model to list assumptions surfaces wrong defaults that an "is this vulnerable?" frame rationalises away.
After every round: set the bugs you've already found aside and ask what's next. Repeat 2–3 times until the model starts producing theoretical noise; stop when it loops.
Every confirmed finding needs a test that proves it. This doubles as a regression guard for the maintainer and makes the report far more actionable.
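As a sketch of what that looks like, here is a hypothetical wrong-default bug (in the spirit of the `decoded = true` class) alongside the regression test its PoC becomes; the function, field names, and header semantics are invented for illustration:

```python
# Hypothetical example of a wrong-default bug and its regression guard.
def is_decoded(headers):
    decoded = False  # fixed default; the buggy version initialised this to True
    if headers.get("content-encoding") == "identity":
        decoded = True
    return decoded

def test_missing_encoding_header_not_assumed_decoded():
    # Derived from the PoC: a bare request must not be treated as already
    # decoded. Against the buggy True default, this assertion fails.
    assert is_decoded({}) is False

test_missing_encoding_header_not_assumed_decoded()
```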
Ten techniques that change how the model engages with the codebase.
| Technique | One-liner prompt |
|---|---|
| Assert-first | "This is definitely vulnerable. Find the bugs." |
| Exploit demand | "Write a PoC that bypasses this — not an assessment." |
| Adversarial frame | "You are a paid red team operator. Real, exploitable bugs only." |
| Invariant decomp | "List every assumption this function makes. Can each be violated?" |
| False anchor | "I found one bug here. Find the rest." |
| Inversion | "How would you break this?" — not "Is this secure?" |
| Comparative | "How does this differ from the standard secure implementation?" |
| Escalation | "Those are obvious. What are the subtle, easy-to-miss issues?" |
| Constrained attacker | "Remote unauthenticated HTTP only. Find what's reachable." |
| Mistake assumption | "Assume the developer made a mistake here. What is it?" |
The reason "find all vulnerabilities" prompts fail isn't the prompt — it's the context window.
As the context window fills, model reliability degrades. Bugs buried in the middle get missed due to primacy/recency bias. A 20-page AGENTS.md creates the haystack — and hides the needle inside it.
One slice per session. Threat model under 1 page. Each slice = one trust boundary, one invariant, specific files only. VulnHound enforces this mechanically — the PreToolUse hook blocks out-of-scope file writes.
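The out-of-scope write block can be sketched as a glob check against the active slice's file list; the slice structure and patterns here are assumed, not VulnHound's exact schema:

```python
import fnmatch

def write_allowed(path, slice_files):
    """PreToolUse-style check: only files in the active slice may be written."""
    # In practice the harness's own output paths would also be whitelisted.
    return any(fnmatch.fnmatch(path, pattern) for pattern in slice_files)

active_slice = ["src/auth/*.ts", "src/middleware/session.ts"]
print(write_allowed("src/auth/jwt.ts", active_slice))         # True: in scope
print(write_allowed("src/billing/invoice.ts", active_slice))  # False: blocked
```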
Zero external dependencies. Python 3.8+, no pip installs. The hook harness lives entirely in your project directory.
Pin the exact commit you're auditing — this goes into THREAT-MODEL.md Section 0 and into VulnHound state.
This copies all 6 hook files, lib/state.py, lib/vulnhound.py, and CLAUDE.md into your project directory. Creates the hunt/ directory tree automatically.
Use the ./vh shortcut in your project root. Running ./vh init creates hunt/state.json with your target metadata. The hooks are now wired and ready.
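The resulting hunt/state.json looks something like this; the shape below is a guess for illustration, not the exact schema:

```json
{
  "target": {"name": "example-repo", "url": "https://github.com/org/example-repo"},
  "commit": "<pinned commit hash>",
  "phase": "map",
  "slices": [],
  "findings": [],
  "dead_ends": []
}
```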
Open the THREAT-MODEL.md template and complete Sections 0–4. The key sections:
Invariants are listed one per row in the form `I-N | statement | enforcement point`. These rows are what the Stop hook enforces.
This reads Sections 0–4 and loads everything into state.json: target, CVE patterns, attacker model, crown jewels, invariants, bug classes, and slices.
Run claude from your project root. The SessionStart hook fires immediately and injects a context brief (~1,200 tokens) covering the active slice, invariants, and any prior session learning.
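Assembling that brief from state can be sketched as follows; the field names and demo values are illustrative, not VulnHound's real structure:

```python
def build_brief(state):
    """Compose a compact session brief from state (illustrative field names)."""
    lines = ["Slice: {} ({})".format(state["slice"]["name"],
                                     ", ".join(state["slice"]["files"]))]
    lines += ["Invariant {}: {}".format(i["id"], i["text"])
              for i in state["invariants"]]
    lines += ["Prior learning: {}".format(m) for m in state.get("memory", [])]
    return "\n".join(lines)

demo = {
    "slice": {"name": "jwt-verification", "files": ["src/auth/jwt.ts"]},
    "invariants": [{"id": "I-1", "text": "tokens are always signature-checked"}],
    "memory": ["alg=none rejected at parse time"],
}
print(build_brief(demo))
```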
All VulnHound commands via the ./vh shortcut.
| Command | What it does |
|---|---|
| ./vh init <name> <url> | Initialise a new audit target |
| ./vh import THREAT-MODEL.md | Load threat model, invariants, slices into state |
| ./vh sync THREAT-MODEL.md | Write confirmed findings back into template §6 and §9 |
| ./vh status | Show full current state — findings, slices, memory count |
| ./vh slice add <name> <files> | Add a new slice |
| ./vh slice next | Mark current slice complete, activate next pending slice |
| ./vh finding add <title> | Register a candidate finding |
| ./vh finding poc F-001 'curl…' | Record a PoC command for a finding |
| ./vh finding confirm F-001 | Mark finding confirmed (PoC reproduces) |
| ./vh finding fp F-001 | Mark finding as false positive |
| ./vh dead_end <desc> | Record a tested-and-clean path (stops re-testing) |
| ./vh memory search <query> | Search the cross-session knowledge graph |
| ./vh report | Generate markdown findings report → hunt/REPORT.md |