Benchmarks

Every verdict carries its evidence

This is a dataset, not a leaderboard. The table below measures two different things, and they should be read separately.

Finding (varies)

What was actually observed about the tool's data flow: the risk axis. Values include clean (no external egress), functional egress, disclosed telemetry, and undisclosed telemetry. This is the asset.

Integrity (meta)

How well-evidenced this verdict is. Published verdicts score high by selection: we only publish what we can fully evidence. When we can't, we hold the verdict.
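The separation of the two axes can be sketched as a small record type. This is a minimal illustration, not the project's actual schema: the names `Finding`, `Verdict`, `is_publishable`, and `ASSUMED_PUBLISH_THRESHOLD` are hypothetical, and the source does not state a numeric publication cut-off.

```python
from dataclasses import dataclass
from enum import Enum


class Finding(Enum):
    """Risk axis: what was observed about the tool's data flow."""
    NO_EXTERNAL_EGRESS = "No external egress"
    FUNCTIONAL_EGRESS = "Functional egress"
    TELEMETRY_DISCLOSED = "Telemetry, disclosed"
    TELEMETRY_UNDISCLOSED = "Telemetry, undisclosed"


@dataclass(frozen=True)
class Verdict:
    tool: str
    finding: Finding   # risk axis, varies by tool
    evidence: str      # free-text summary of what was captured
    integrity: int     # meta axis, 0-100: how well-evidenced the verdict is


# Hypothetical cut-off, chosen only for illustration.
ASSUMED_PUBLISH_THRESHOLD = 50


def is_publishable(v: Verdict, threshold: int = ASSUMED_PUBLISH_THRESHOLD) -> bool:
    """Publication depends only on integrity, never on the finding."""
    return v.integrity >= threshold


published = Verdict("@sentry/mcp-server", Finding.TELEMETRY_DISCLOSED,
                    "2 requests captured", 100)
# A made-up entry standing in for a claim that could not be intercepted.
held = Verdict("example-held-tool", Finding.TELEMETRY_UNDISCLOSED,
               "claim not intercepted", 20)

print(is_publishable(published))  # True
print(is_publishable(held))       # False
```

Keeping `finding` and `integrity` as separate fields means a high integrity score never reads as a clean bill of health: a fully evidenced verdict can still report telemetry.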

Tool	Finding	Jurisdiction	Evidence	Verdict integrity
@felores/airtable-mcp-server	Functional egress	tier 2	12 requests captured	100/100
@forestadmin/mcp-server	Functional egress	—	2 requests captured	100/100
@mobilenext/mobile-mcp	Telemetry — disclosed	tier 2	21 requests captured	100/100
@notionhq/notion-mcp-server	Functional egress	tier 2	22 requests captured	100/100
@openbnb/mcp-server-airbnb	Functional egress	tier 2	5 requests captured	100/100
@roychri/mcp-server-asana	Functional egress	tier 2	38 requests captured	100/100
@sentry/mcp-server	Telemetry — disclosed	tier 2	2 requests captured	100/100
@ui5/mcp-server	Functional egress	tier 2	6 requests captured	100/100
@upstash/context7-mcp	Functional egress	tier 2	2 requests captured	100/100
@winor30/mcp-server-datadog	Functional egress	tier 2	18 requests captured	100/100
@yoda.digital/gitlab-mcp-server	Functional egress	tier 2	32 requests captured	100/100
duckduckgo-mcp-server	Functional egress	tier 2	1 request captured	100/100
hyperbrowser-mcp	Functional egress	tier 2	8 requests captured	100/100
linear-mcp-server	Functional egress	tier 2	5 requests captured	100/100
mcp-yahoo-finance	Functional egress	tier 2	31 requests captured	100/100
tavily-mcp	Functional + undocumented headers	tier 2	5 requests captured	100/100
@modelcontextprotocol/server-everything	No external egress	—	verified absent	100/100
@modelcontextprotocol/server-filesystem	No external egress	—	verified absent	100/100
@modelcontextprotocol/server-memory	No external egress	—	verified absent	100/100
@modelcontextprotocol/server-sequential-thinking	No external egress	—	verified absent	100/100
chroma-mcp	No external egress	—	verified absent	100/100
mcp-obsidian	No external egress	—	verified absent	100/100
mcp-server-git	No external egress	—	verified absent	100/100
mcp-server-time	No external egress	—	verified absent	100/100
playwright-mcp-server	No external egress	—	verified absent	100/100
A typical AI audit	unevaluated	—	model assertion only	0/100
A finding we could not fully capture	held for review	—	claim not intercepted	~20/100 · not published

Read the Finding column to differentiate between tools; read Verdict integrity as a trust stamp on the verdict itself. An audit that rests on model assertion alone scores zero on integrity; a claim we cannot intercept is held at a low score rather than published. As coverage grows, harder and partially evidenced cases will widen the published integrity range honestly, never by tuning the rubric.
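Reading the two columns separately can be sketched as two independent summaries over the same rows. This is an illustrative sketch only: the rows are a small sample copied from the table, and the helper names `parse_integrity` and `summarize` are hypothetical, not part of any published tooling.

```python
from collections import Counter

# (tool, finding, verdict integrity) for a few rows of the table.
ROWS = [
    ("@notionhq/notion-mcp-server", "Functional egress", "100/100"),
    ("duckduckgo-mcp-server", "Functional egress", "100/100"),
    ("tavily-mcp", "Functional + undocumented headers", "100/100"),
    ("mcp-server-git", "No external egress", "100/100"),
]


def parse_integrity(cell: str) -> int:
    """Turn a cell such as '100/100' into the integer score 100."""
    score, _, scale = cell.partition("/")
    if scale != "100":
        raise ValueError(f"unexpected integrity cell: {cell!r}")
    return int(score)


def summarize(rows):
    """Return the finding tally and the integrity range as separate results."""
    findings = Counter(finding for _, finding, _ in rows)
    scores = [parse_integrity(cell) for _, _, cell in rows]
    return findings, (min(scores), max(scores))


findings, integrity_range = summarize(ROWS)
print(findings["Functional egress"])  # 2
print(integrity_range)                # (100, 100)
```

The finding tally is where tools differ; the integrity range only says how well-evidenced the published verdicts are, which for this sample is uniformly 100.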