Does tavily-mcp send data, and where? — data-flow verdict
provisional · AUTOMATED — forensic confirmation pending. A preliminary, fact-based flag, not a judgment that the tool is unlawful or unsafe.
100/100 integrity
100% evidence coverage
evidence-backed
Measures evidence support, not confidence — how this is scored
Verdict (the facts)
- Tool
- npm/tavily-mcp
- Integrity axis
- undisclosed_processing — Observed behaviour matches the tool's stated function; the egress above is the tool doing its advertised job. 'honest' is the integrity axis — it does NOT imply the data flow is irrelevant; see the data-flow axis and jurisdiction.
- Data-flow axis
- Sends data to api.tavily.com (US, jurisdiction tier 2) as its core function; analytics metadata rides on the same functional request (disclosure: partial). No separate telemetry destination or third-party observability SDK was found.
- Disclosure
- partial — All traffic goes to a single host, api.tavily.com — the functional search/research backend; carrying the query and target URLs there is the tool's advertised purpose. There is no separate telemetry destination and no third-party analytics/error-reporting SDK. Riding on the functional request are analytics-metadata headers: X-Human-Id (opt-in, sent only if the operator sets TAVILY_HUMAN_ID, documented in the README as enabling per-user analytics) plus an always-on X-Session-Id (per-process random UUID, non-PII) and X-Client-Source attribution. The per-user analytics purpose is disclosed; the always-on session/attribution headers are not documented — hence partial.
- Capture self-test
- verified
- Severity
- low — integrity axis only (undeclared exfiltration). Functional egress and disclosed metadata are reported as neutral facts and are not graded here.
- Version (pinned)
- 0.2.20 · commit fc09f6e5e76622987e0688ad061047cb240062db
- Content hash
- sha256:d13b0bd08a52986c0be20d86c37af0b6b17475f2869b14924feb9e34e02d2528
- Signature
- ed25519:RTgxTmND+U6YRZGNq+UjdTiWawz0gE69zVsZS2… · Ed25519 public key · sha256:49cf8457b42a7048
- Scanned
- 2026-06-14T00:00:00Z — Pinned to tavily-mcp@0.2.20 (git fc09f6e5e76622987e0688ad061047cb240062db), published 2026-05-29. This verdict applies to that exact version; a newer release would require a re-scan.
- Re-verified
- 2026-06-14 — pinned version current
- Categories
- search functional-partial US published
- Observation history
- 1 scan; first seen 2026-06-14T00:00:00Z · latest 2026-06-14T00:00:00Z
Observed egress destinations
| host | country | jurisdiction | class | disclosure | frequency | kind |
| api.tavily.com | US | tier 2 | functional | by purpose | on launch and on every tool call | search/research backend (carries the query + target URLs — the tool's function); piggybacked analytics headers ride on the same request (X-Human-Id opt-in disclosed; X-Session-Id/X-Client-Source always-on, undocumented) |
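The header behaviour summarised in the row above can be sketched in a few lines. This is an illustration of the described behaviour only, not the package's source (tavily-mcp is an npm package; Python is used here to match the scanner's CLI). The function name and the X-Client-Source value are hypothetical; only the header names and the TAVILY_HUMAN_ID variable come from this page.

```python
import os
import uuid
from typing import Mapping, Optional

# One random UUID per process: matches the "per-process random UUID" description.
SESSION_ID = str(uuid.uuid4())


def build_analytics_headers(
    env: Optional[Mapping[str, str]] = None,
    client_source: str = "example-client",  # hypothetical value, not observed
) -> dict:
    """Return the analytics-metadata headers that ride on a functional request."""
    env = os.environ if env is None else env

    # Always-on headers (reported on this page as undocumented).
    headers = {
        "X-Session-Id": SESSION_ID,
        "X-Client-Source": client_source,
    }

    # Opt-in header: sent only when the operator sets TAVILY_HUMAN_ID.
    human_id = env.get("TAVILY_HUMAN_ID")
    if human_id:
        headers["X-Human-Id"] = human_id
    return headers


if __name__ == "__main__":
    print(build_analytics_headers(env={}))
    print(build_analytics_headers(env={"TAVILY_HUMAN_ID": "user-123"}))
```

With an empty environment the sketch emits only the two always-on headers; setting TAVILY_HUMAN_ID adds X-Human-Id, which is the opt-in path the README documents.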
Each destination is classified FUNCTIONAL (the tool's advertised job requires the call — a neutral fact about where your data goes), SESSION/AUTH (handshake with the same operator), or TELEMETRY/ERROR_REPORTING (an observability side-channel not required for the function). Disclosure is judged across the tool's full public doc surface, not just its README, and any 'undisclosed telemetry' finding is adversarially refuted before it is asserted.
Jurisdiction context: Tier 2 = third country (e.g. US): transferring EU personal data to a third country requires a transfer basis under GDPR Art. 44-49 (e.g. SCCs / EU-US Data Privacy Framework) — an obligation on you, the deployer; the tool gives no control over this flow. This is the applicable framework, not a finding that the tool violates it.
Evidence — the captured request (verify, don't just trust)
Capture self-test: verified — a beacon decoy was emitted from the tool's network context; its presence in the intercept means a 'no egress' result would have been trustworthy.
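The reasoning behind the self-test can be written as a small decision function. This is a minimal sketch of the logic as stated above, not the scanner's actual code; the enum and function names are hypothetical.

```python
from enum import Enum


class CaptureResult(Enum):
    EGRESS_OBSERVED = "egress observed"
    NO_EGRESS_TRUSTWORTHY = "no egress (capture verified)"
    INCONCLUSIVE = "inconclusive (capture not verified)"


def interpret_capture(beacon_seen: bool, tool_request_count: int) -> CaptureResult:
    """Interpret an intercept log in light of the decoy-beacon self-test.

    beacon_seen: whether the decoy beacon emitted from the tool's network
                 context showed up in the intercept.
    tool_request_count: number of requests attributed to the tool itself.
    """
    if tool_request_count > 0:
        # Requests were captured, so egress is a positive observation.
        return CaptureResult.EGRESS_OBSERVED
    if beacon_seen:
        # The interceptor demonstrably sees this network context,
        # so an empty log can be trusted.
        return CaptureResult.NO_EGRESS_TRUSTWORTHY
    # Empty log and no beacon: the capture itself may be broken.
    return CaptureResult.INCONCLUSIVE


if __name__ == "__main__":
    # The run on this page: beacon present, 5 requests captured.
    print(interpret_capture(beacon_seen=True, tool_request_count=5).value)
```

The point of the third branch is that an empty intercept without a visible beacon says nothing about the tool; it only says the capture was not verified.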
Observed: POST https://api.tavily.com/search ×5, intercepted (the tool's HTTPS was terminated against the sandbox CA; the egress was then blocked by strict-egress, but the full request was captured)
Payload fields actually sent:
- query
- search_depth
- topic
- include_domains
- exclude_domains
- country
- start_date
- end_date
- urls
- extract_depth
- format
- url
- instructions
- select_paths
- select_domains
- chunks_per_source
- input
- model
Captured payload sample (one event):
{"query":"FILE-CONTENT::canary-edd5879f-file-95add22b7836::END","search_depth":"FILE-CONTENT::canary-edd5879f-file-95add22b7836::END","topic":"general","include_domains":["FILE-CONTENT::canary-edd5879f-file-95add22b7836::END"],"exclude_domains":["FILE-CONTENT::canary-edd5879f-file-95add22b7836::END"],"country":"FILE-CONTENT::canary-edd5879f-file-95add22b7836::END","start_date":"FILE-CONTENT::canary-edd5879f-file-95ad
Captured in the sandbox run. The sample above is truncated; the payload field list is the union of fields observed across all requests in the run.
Reproduce it yourself with canary-sandbox (open methodology; Docker backend):
python -m canary.cli scan <target> --backend docker # target: npm tavily-mcp@0.2.20
The scanner installs the pinned version, drives the tool over MCP, and intercepts all egress.
Full raw captured trace + verification: /verdict/tavily-mcp/evidence.json (every captured request, redacted, plus the verdict content hash and the package checksum, for an AI or auditor that wants the underlying observation, not just the conclusion).
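Checking a downloaded artifact against a published digest of the form sha256:<hex> takes only the standard library. The sketch below assumes you already know which bytes the digest covers; this page does not specify the exact input of the content hash, so treat the choice of file as an assumption. The function name is hypothetical.

```python
import hashlib
import hmac


def matches_digest(data: bytes, expected: str) -> bool:
    """Check data against a digest string written as 'sha256:<hex>'."""
    algo, sep, hex_digest = expected.partition(":")
    if sep != ":" or algo != "sha256":
        raise ValueError(f"unsupported digest format: {expected!r}")
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; lower-case the expected hex for robustness.
    return hmac.compare_digest(actual, hex_digest.lower())


if __name__ == "__main__":
    # Self-contained demo on in-memory bytes. For a real check, read the
    # downloaded file with open(path, "rb").read() and pass the published digest.
    sample = b"example bytes"
    published = "sha256:" + hashlib.sha256(sample).hexdigest()
    print(matches_digest(sample, published))         # matching input
    print(matches_digest(sample + b"!", published))  # tampered input
```

A mismatch means either the artifact changed or the digest covers different bytes than the ones you hashed, so confirm the hashing input before drawing conclusions.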
Disclosure check (the §824 evidence)
- Read
- README (X-Human-Id / per-user analytics); package source (header construction)
- Quoted from the tool's own docs
- “X-Human-Id enables per-user analytics (set via TAVILY_HUMAN_ID; unset = off by default).”
- Match
- All traffic goes to a single host, api.tavily.com — the functional search/research backend; carrying the query and target URLs there is the tool's advertised purpose. There is no separate telemetry destination and no third-party analytics/error-reporting SDK. Riding on the functional request are analytics-metadata headers: X-Human-Id (opt-in, sent only if the operator sets TAVILY_HUMAN_ID, documented in the README as enabling per-user analytics) plus an always-on X-Session-Id (per-process random UUID, non-PII) and X-Client-Source attribution. The per-user analytics purpose is disclosed; the always-on session/attribution headers are not documented — hence partial.
- Residual gap
- X-Session-Id (random per-process UUID) and X-Client-Source are always-on attribution metadata not mentioned in any doc — low-sensitivity, non-PII; described as 'not mentioned in docs', not as user tracking.
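The residual gap above is a set difference between the headers observed in the capture and the headers the docs mention. A minimal sketch, with a hypothetical function name and with the header sets taken from this page (the observed set assumes a run where TAVILY_HUMAN_ID is set, so X-Human-Id is present):

```python
from typing import Iterable, List


def undocumented_headers(observed: Iterable[str], documented: Iterable[str]) -> List[str]:
    """Return observed header names that no doc mentions.

    HTTP header names are case-insensitive, so the comparison is too.
    """
    documented_lc = {name.lower() for name in documented}
    gap = {name for name in observed if name.lower() not in documented_lc}
    return sorted(gap, key=str.lower)


# Header names as reported on this page.
observed = ["X-Human-Id", "X-Session-Id", "X-Client-Source"]
documented = ["X-Human-Id"]

if __name__ == "__main__":
    print(undocumented_headers(observed, documented))
```

The sketch only reports which names are missing from the docs; judging their sensitivity (here: non-PII attribution metadata) remains a separate, human-reviewed step.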
How we know this — claims by basis
A verdict is a reproducible evidence container, not just a claim. Each assertion is tagged: an observation is in the capture and reproducible; an inference is our reasoning over it; documented is the tool’s own statement; a classification is our adversarially-reviewed judgment. Observation never reads as inference.
Observed — directly in the capture, reproducible
- The tool sent 5 requests to api.tavily.com carrying fields: query, search_depth, topic, include_domains, exclude_domains, country, start_date, end_date, urls, extract_depth, format, url, instructions, select_paths, select_domains, chunks_per_source, input, model. Captured in the sandbox run (published redacted in the evidence artifact); re-run the scan to reproduce. (confidence: high)
Inferred — our reasoning over the observation
- The repeated requests suggest the flow fires on launch and on each tool call. — 5 requests in one run — an inferred pattern, not proven across launches. (confidence: medium)
Documented — the tool's own statement
- The tool's own docs state (quoted): X-Human-Id enables per-user analytics (set via TAVILY_HUMAN_ID; unset = off by default). — README (X-Human-Id / per-user analytics); package source (header construction) (confidence: high)
Classified — our adversarially-reviewed judgment
- api.tavily.com is classified as functional (required for the tool's advertised function). — Adversarially reviewed. (confidence: high)
- Disclosure status: partial. All traffic goes to a single host, api.tavily.com, with no separate telemetry destination and no third-party analytics/error-reporting SDK. The per-user analytics purpose of X-Human-Id is disclosed; the always-on X-Session-Id and X-Client-Source headers are not documented. (confidence: high)
Method
Installed and run in an isolated container; fed traceable decoy data; all outbound traffic intercepted (TLS terminated with the sandbox's own CA, iptables transparent redirect). Endpoints, resolved geo/jurisdiction and frequency are observed facts. Capture self-test passed.
Scope
Compares the tool's declared destinations against what was observed in one sandbox run. Checks transparency / integrity for a cooperative tool, NOT resistance to deliberate evasion. "honest"/"clean" means "observed without deviation within our reach", NOT "guaranteed no hidden egress".
Out of scope: exfiltration split/chunked across requests; tool-side encryption of the payload before egress; input/time/state-triggered processing not triggered in the run.
Machine-readable verdict: /verdict/tavily-mcp.json.
This page describes observed behaviour and its relation to the tool's own disclosures — it is not a legal judgment.