Stop AI from inventing audit evidence.
An AI audit can say PASS with 92% confidence and still have nothing behind it. Canary independently verifies whether an automated audit is actually supported by evidence, and scores how much of it is.
Illustration of a typical confidence-only audit.
Same scoring method we apply to ourselves — a confidence-only audit earns none of it.
Confidence is not evidence.
A real one, not a mockup
A casual audit describes the telemetry in @mobilenext/mobile-mcp as "anonymous usage analytics." Canary installed the tool, drove it, and intercepted the traffic. Evidence: 21 real POSTs to us.i.posthog.com carrying AgentName, ToolName and Duration, which together reveal which MCP tools your agent invokes, sent on launch and on every call. The raw, redacted trace is published next to the verdict, and the verdict is signed.
See the verdict + raw evidence →
Why AI audits fail
Hallucinated evidence
The model references evidence — a file, a check, a certification — that does not exist.
Circular validation
The same AI generates the audit and grades it. No independent observation ever happens.
Confidence inflation
A high confidence score is reported with no evidence behind it.
Agent risk
Agents act on unverifiable claims — installing, connecting, sending data — as if they were verified.
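The failure modes above can be made concrete with a small check. The sketch below is illustrative only and is not Canary's scoring method: the `Claim` and `Audit` types, the `evidence_coverage` function and the artifact names are all hypothetical. It counts a claim as backed only when it cites an artifact that was actually captured, so a hallucinated reference and a bare confidence score both earn nothing.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # References to artifacts the audit says support this claim (hypothetical IDs).
    evidence: list = field(default_factory=list)


@dataclass
class Audit:
    verdict: str
    confidence: float  # self-reported; deliberately ignored by the check below
    claims: list


def evidence_coverage(audit, captured):
    """Fraction of claims citing at least one artifact that really exists.

    `captured` is the set of artifact IDs produced by independent observation.
    This simple ratio is an assumption for illustration, not a published formula.
    """
    if not audit.claims:
        return 0.0
    backed = sum(
        1 for claim in audit.claims
        if any(ref in captured for ref in claim.evidence)
    )
    return backed / len(audit.claims)


# Artifact IDs below are made up for the example.
captured = {"trace-001", "trace-002"}

confidence_only = Audit("PASS", 0.92, [
    Claim("No telemetry is sent"),
    Claim("Data stays local"),
])
hallucinated = Audit("PASS", 0.92, [
    Claim("Vendor is certified", ["certificate.pdf"]),  # never captured
])
capture_backed = Audit("FAIL", 0.92, [
    Claim("Telemetry is sent to a third party", ["trace-001"]),
    Claim("Data stays local"),
])

print(evidence_coverage(confidence_only, captured))  # 0.0
print(evidence_coverage(hallucinated, captured))     # 0.0
print(evidence_coverage(capture_backed, captured))   # 0.5
```

All three audits report the same 92% confidence, yet only the one citing a captured trace earns any coverage.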
What Canary does instead
For every tool it audits, Canary produces evidence, not opinion: it installs and runs the tool in an isolated sandbox, feeds it traceable decoy data, and intercepts every outbound request (decrypting TLS with its own CA). It checks each claim against the tool's full public documentation and adversarially tries to refute any "undisclosed" finding before asserting it. Then it computes an Integrity Score and Evidence Coverage, and signs the result. 25 verdicts are live now, each one capture-backed, version-pinned and Ed25519-signed. See the benchmarks →
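The "traceable decoy data" step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Canary's implementation: the function names `make_decoys` and `find_leaks`, the request record shape, and the example.com hosts are all hypothetical. The idea is to give each decoy field a unique random marker, then search the intercepted request bodies for those markers so any leak can be attributed to a field and a destination.

```python
import json
import secrets


def make_decoys(field_names):
    """Create one unique random marker per field so a leak is attributable."""
    return {name: f"canary-{secrets.token_hex(8)}" for name in field_names}


def find_leaks(decoys, requests):
    """Return (field, host) pairs where a decoy marker appears in a request body.

    `requests` is a list of dicts with "host" and "body" keys, standing in for
    whatever the intercepting proxy records (an assumed shape).
    """
    leaks = []
    for req in requests:
        for name, marker in decoys.items():
            if marker in req["body"]:
                leaks.append((name, req["host"]))
    return leaks


decoys = make_decoys(["email", "api_key"])

# Hypothetical captured traffic: the first request carries the decoy email.
captured_requests = [
    {"host": "api.example.com", "body": json.dumps({"user": decoys["email"]})},
    {"host": "telemetry.example.net", "body": json.dumps({"event": "launch"})},
]

print(find_leaks(decoys, captured_requests))  # [('email', 'api.example.com')]
```

A plain substring match like this only catches markers sent verbatim; values that are hashed, compressed or base64-encoded before sending would need extra decoding passes.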
The register
- @openbnb/mcp-server-airbnb — Sends data to www.airbnb.com (US, jurisdiction tier 2) as its core function. No telemetry,…
- @felores/airtable-mcp-server — Sends data to api.airtable.com (US, jurisdiction tier 2) as its core function. No telemetr…
- @roychri/mcp-server-asana — Sends data to app.asana.com (US, jurisdiction tier 2) as its core function. No telemetry, …
- chroma-mcp — No network egress was observed: scanned with --client-type ephemeral (in-memory, local-onl…
- @upstash/context7-mcp — Sends data to context7.com (US, jurisdiction tier 2) as its core function; analytics metad…
- @winor30/mcp-server-datadog — Sends data to api.datadoghq.com (US, jurisdiction tier 2) as its core function. No telemet…
- duckduckgo-mcp-server — Sends data to html.duckduckgo.com (US, jurisdiction tier 2) as its core function. No telem…
- @modelcontextprotocol/server-everything — No network egress to external destinations was observed — the tool ran purely locally.…
- @modelcontextprotocol/server-filesystem — No network egress to external destinations was observed — the tool ran purely locally.…
- @forestadmin/mcp-server — Sends data to api.forestadmin.com (FR, jurisdiction tier 0) as its core function. No telem…
- mcp-server-git — No network egress to external destinations was observed — the tool ran purely locally.…
- @yoda.digital/gitlab-mcp-server — Sends data to gitlab.com (US, jurisdiction tier 2) as its core function. No telemetry, ana…
- hyperbrowser-mcp — Sends data to app.hyperbrowser.ai (US, jurisdiction tier 2) as its core function. No telem…
- linear-mcp-server — Sends data to api.linear.app (US, jurisdiction tier 2) as its core function. No telemetry,…