Does tavily-mcp send data, and where? — data-flow verdict

100/100 integrity 100% evidence coverage evidence-backed Measures evidence support, not confidence — how this is scored

Verdict (the facts)

Tool: npm/tavily-mcp
Integrity axis: undisclosed_processing — Observed behaviour matches the tool's stated function; the egress above is the tool doing its advertised job. 'honest' is the integrity axis — it does NOT imply the data flow is irrelevant; see the data-flow axis and jurisdiction.
Data-flow axis: Sends data to api.tavily.com (US, jurisdiction tier 2) as its core function; analytics metadata rides on the same functional request (disclosure: partial). No separate telemetry destination or third-party observability SDK was found.
Disclosure: partial — All traffic goes to a single host, api.tavily.com — the functional search/research backend; carrying the query and target URLs there is the tool's advertised purpose. There is no separate telemetry destination and no third-party analytics/error-reporting SDK. Riding on the functional request are analytics-metadata headers: X-Human-Id (opt-in, sent only if the operator sets TAVILY_HUMAN_ID, documented in the README as enabling per-user analytics) plus an always-on X-Session-Id (per-process random UUID, non-PII) and X-Client-Source attribution. The per-user analytics purpose is disclosed; the always-on session/attribution headers are not documented — hence partial.
Capture self-test: verified
Severity: low — integrity axis only (undeclared exfiltration). Functional egress and disclosed metadata are reported as neutral facts and are not graded here.
Version (pinned): 0.2.20 · commit fc09f6e5e76622987e0688ad061047cb240062db
Content hash: sha256:d13b0bd08a52986c0be20d86c37af0b6b17475f2869b14924feb9e34e02d2528
Signature: ed25519:RTgxTmND+U6YRZGNq+UjdTiWawz0gE69zVsZS2… · Ed25519 public key · sha256:49cf8457b42a7048
Scanned: 2026-06-14T00:00:00Z — Pinned to tavily-mcp@0.2.20 (git fc09f6e5e76622987e0688ad061047cb240062db), published 2026-05-29. This verdict applies to that exact version; a newer release would require a re-scan.
Re-verified: 2026-06-14 — pinned version current
Categories: search functional-partial US published
Observation history: 1 scan(s); first seen 2026-06-14T00:00:00Z · latest 2026-06-14T00:00:00Z

Observed egress destinations

host	country	jurisdiction	class	disclosure	frequency	kind
api.tavily.com	US	tier 2	functional	by purpose	on launch and on every tool call	search/research backend (carries the query + target URLs — the tool's function); piggybacked analytics headers ride on the same request (X-Human-Id opt-in disclosed; X-Session-Id/X-Client-Source always-on, undocumented)

Each destination is classified FUNCTIONAL (the tool's advertised job requires the call — a neutral fact about where your data goes), SESSION/AUTH (handshake with the same operator), or TELEMETRY/ERROR_REPORTING (an observability side-channel not required for the function). Disclosure is judged across the tool's full public doc surface, not just its README, and any 'undisclosed telemetry' finding is adversarially refuted before it is asserted.

Jurisdiction context: Tier 2 = third country (e.g. US): transferring EU personal data to a third country requires a transfer basis under GDPR Art. 44-49 (e.g. SCCs / EU-US Data Privacy Framework) — an obligation on you, the deployer; the tool gives no control over this flow. This is the applicable framework, not a finding that the tool violates it.

Evidence — the captured request (verify, don't just trust)

Capture self-test: verified — a beacon decoy was emitted from the tool's network context; its presence in the intercept means a 'no egress' result would have been trustworthy.

Observed: POST https://api.tavily.com/search ×5 — intercepted (the tool's HTTPS was terminated against the sandbox CA; the egress was then blocked by strict-egress, but the full request was captured)

Payload fields actually sent:

query
search_depth
topic
include_domains
exclude_domains
country
start_date
end_date
urls
extract_depth
format
url
instructions
select_paths
select_domains
chunks_per_source
input
model

Captured payload sample (one event):

{"query":"FILE-CONTENT::canary-edd5879f-file-95add22b7836::END","search_depth":"FILE-CONTENT::canary-edd5879f-file-95add22b7836::END","topic":"general","include_domains":["FILE-CONTENT::canary-edd5879f-file-95add22b7836::END"],"exclude_domains":["FILE-CONTENT::canary-edd5879f-file-95add22b7836::END"],"country":"FILE-CONTENT::canary-edd5879f-file-95add22b7836::END","start_date":"FILE-CONTENT::canary-edd5879f-file-95ad

Captured in the sandbox run. The distinct_id (a persistent machine identifier) and the write-only, public-by-design ingestion key are truncated above; payload_fields is the union observed across the run.

Reproduce it yourself (canary-sandbox (open methodology; Docker backend)):
python -m canary.cli scan <target> --backend docker # target: npm tavily-mcp@0.2.20
Re-run it yourself: the scanner installs the pinned version, drives the tool over MCP, and intercepts all egress.

Full raw captured trace + verification: /verdict/tavily-mcp/evidence.json — every captured request (redacted), the verdict content-hash and the package checksum, for an AI or auditor that wants the underlying observation, not just the conclusion.

Disclosure check (the §824 evidence)

Read: README (X-Human-Id / per-user analytics); package source (header construction)
Quoted from the tool's own docs: “X-Human-Id enables per-user analytics (set via TAVILY_HUMAN_ID; unset = off by default).”
Match: All traffic goes to a single host, api.tavily.com — the functional search/research backend; carrying the query and target URLs there is the tool's advertised purpose. There is no separate telemetry destination and no third-party analytics/error-reporting SDK. Riding on the functional request are analytics-metadata headers: X-Human-Id (opt-in, sent only if the operator sets TAVILY_HUMAN_ID, documented in the README as enabling per-user analytics) plus an always-on X-Session-Id (per-process random UUID, non-PII) and X-Client-Source attribution. The per-user analytics purpose is disclosed; the always-on session/attribution headers are not documented — hence partial.
Residual gap: X-Session-Id (random per-process UUID) and X-Client-Source are always-on attribution metadata not mentioned in any doc — low-sensitivity, non-PII; described as 'not mentioned in docs', not as user tracking.

How we know this — claims by basis

Observed — directly in the capture, reproducible

The tool sent 5 request(s) to api.tavily.com carrying fields: query, search_depth, topic, include_domains, exclude_domains, country, start_date, end_date, urls, extract_depth, format, url. — Captured in the sandbox run (published redacted in the evidence artifact); re-run the scan to reproduce. (confidence: high)

Inferred — our reasoning over the observation

The repeated requests suggest the flow fires on launch and on each tool call. — 5 requests in one run — an inferred pattern, not proven across launches. (confidence: medium)

Documented — the tool's own statement

The tool's own docs state (quoted): X-Human-Id enables per-user analytics (set via TAVILY_HUMAN_ID; unset = off by default). — README (X-Human-Id / per-user analytics); package source (header construction) (confidence: high)

Classified — our adversarially-reviewed judgment

api.tavily.com is classified as functional (required for the tool's advertised function). — Adversarially reviewed. (confidence: high)
Disclosure status: partial. — All traffic goes to a single host, api.tavily.com — the functional search/research backend; carrying the query and target URLs there is the tool's advertised purpose. There is no separate telemetry destination and no thi (confidence: high)

Method

Installed and run in an isolated container; fed traceable decoy data; all outbound traffic intercepted (TLS broken via own CA, iptables transparent redirect). Endpoints, resolved geo/jurisdiction and frequency are observed facts. Capture self-test passed.

Scope

Compares the tool's declared destinations against what was observed in one sandbox run. Checks transparency / integrity for a cooperative tool, NOT resistance to deliberate evasion. "honest"/"clean" means "observed without deviation within our reach", NOT "guaranteed no hidden egress". Out of scope: exfiltration split/chunked across requests; tool-side encryption of the payload before egress; input/time/state-triggered processing not triggered in the run.

Machine-readable verdict: /verdict/tavily-mcp.json. This page describes observed behaviour and its relation to the tool's own disclosures — it is not a legal judgment. Search context: does tavily-mcp send data, tavily-mcp privacy, tavily-mcp data flow, tavily-mcp telemetry, where does tavily-mcp send data, is tavily-mcp safe, what data does tavily-mcp collect, how to disable tavily-mcp telemetry, tavily-mcp opt out tracking, tavily-mcp GDPR data residency, tavily-mcp third-party / jurisdiction.