ctx_fetch_and_index
Fetch a URL, convert it to markdown, and index it; raw HTML never enters context.
ctx_fetch_and_index brings web content — docs, changelogs, API references,
specs — into the knowledge base without the raw bytes ever entering context. The
tool fetches the page, converts HTML to markdown, and indexes the result. Only a
small per-source preview returns to you; the full text is searchable with
ctx_search afterward.
This is how you read the web under context-mode. A documentation page that would cost tens of kilobytes as raw HTML is reduced to a short preview, and the content you actually need comes back later as a focused search hit.
Parameters
| Parameter | Required | Description |
|---|---|---|
| url | one of | A single URL to fetch and index. |
| requests | one of | An array of objects, each with a url and an optional source, for multi-URL fetches. When both url and requests are supplied, requests wins. |
| source | no | Label for a single url. For a batch, put source in each requests entry. |
| concurrency | no | 1 to 8. Default 1. Parallelizes multi-URL fetches, capped by the host's logical CPU count. Indexing always runs serially. |
| ttl | no | Cache window in milliseconds. Default 24 hours. 0 bypasses the cache. |
| force | no | Re-fetch and ignore the cache. |
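
The precedence and capping rules in the table can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's implementation: `normalize_fetch_args` and its `cpu_count` argument are hypothetical names, and three behaviors are assumptions the table does not state: that an out-of-range concurrency is clamped rather than rejected, that an empty requests array is treated as absent, and that a top-level source is ignored once requests is used.

```python
import os
from typing import Optional


def normalize_fetch_args(
    url: Optional[str] = None,
    requests: Optional[list[dict]] = None,
    source: Optional[str] = None,
    concurrency: int = 1,
    cpu_count: Optional[int] = None,
) -> tuple[list[dict], int]:
    """Hypothetical helper: turn the documented parameters into (jobs, workers)."""
    if requests:
        # Documented rule: when both url and requests are supplied, requests wins.
        # Assumption: the top-level source is ignored for a batch.
        jobs = [{"url": r["url"], "source": r.get("source")} for r in requests]
    elif url:
        jobs = [{"url": url, "source": source}]
    else:
        raise ValueError("supply either url or requests")

    # Documented range is 1 to 8, capped by the host's logical CPU count.
    # Assumption: values outside the range are clamped instead of rejected.
    cpus = cpu_count or os.cpu_count() or 1
    workers = max(1, min(concurrency, 8, cpus))
    return jobs, workers
```

Under these assumptions, asking for a concurrency of 5 on a four-core host yields four fetch workers, and a call that passes both url and requests produces jobs only for the requests entries.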
JSON responses are chunked by key path, the same way markdown is chunked by heading, so a fetched API payload stays queryable section by section.
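
To make "chunked by key path" concrete, here is a minimal sketch of one way a JSON value could be split into path-addressed pieces. It is not the tool's actual chunker: the function name `chunk_by_key_path`, the `$.a.b[0]` path notation, and the choice to emit one chunk per leaf value are all assumptions made for illustration.

```python
import json
from typing import Any, Iterator


def chunk_by_key_path(value: Any, path: str = "$") -> Iterator[tuple[str, str]]:
    """Yield (key_path, text) pairs, one per leaf value of a parsed JSON document."""
    if isinstance(value, dict) and value:
        # Descend into objects, extending the path with each key.
        for key, child in value.items():
            yield from chunk_by_key_path(child, f"{path}.{key}")
    elif isinstance(value, list) and value:
        # Descend into arrays, extending the path with each index.
        for index, child in enumerate(value):
            yield from chunk_by_key_path(child, f"{path}[{index}]")
    else:
        # Scalars and empty containers become a chunk labeled by their path.
        yield path, json.dumps(value)


payload = {"limits": {"per_minute": 60}, "auth": {"schemes": ["hmac", "bearer"]}}
chunks = list(chunk_by_key_path(payload))
```

Each chunk carries its own path, which plays the role a heading plays for a markdown section: a query about rate limits can match the chunk under `$.limits` without pulling in the rest of the payload.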
Caching
Every fetch is cached on disk and reused within the TTL, so refetching the same
URL inside the window costs nothing. Stale entries are cleaned up after 14 days.
Set ttl to 0 or pass force when you need the live version of a page that
changes often, such as a status feed or a fast-moving changelog.
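
The interaction between ttl and force can be sketched as a single freshness check. This is a hedged illustration rather than the tool's code: `should_use_cache` and its arguments are hypothetical names, timestamps are assumed to be epoch milliseconds, and treating an entry as stale at exactly the TTL boundary is an assumption.

```python
from typing import Optional

DEFAULT_TTL_MS = 24 * 60 * 60 * 1000  # documented default: 24 hours


def should_use_cache(
    fetched_at_ms: Optional[float],
    now_ms: float,
    ttl_ms: int = DEFAULT_TTL_MS,
    force: bool = False,
) -> bool:
    """Return True when a cached copy may be reused instead of refetching."""
    if force or ttl_ms == 0:
        # Documented rule: force re-fetches, and a ttl of 0 bypasses the cache.
        return False
    if fetched_at_ms is None:
        # Nothing cached for this URL yet.
        return False
    # Assumption: an entry is fresh only while strictly younger than the TTL.
    return (now_ms - fetched_at_ms) < ttl_ms
```

With the default window, a page fetched an hour ago is served from disk, while the same page fetched 25 hours ago, or any page requested with force or a ttl of 0, triggers a new fetch.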
This is a plain HTTP fetch with no headless browser. Content that a JavaScript-only single-page app renders on the client may be missing from the indexed result. Prefer a server-rendered page or a direct API endpoint when the target is an SPA.
Example
Fetch several reference pages in parallel, each under its own label, then search across all of them in one call.
```
ctx_fetch_and_index(
  requests: [
    { url: "https://example.com/docs/auth", source: "vendor-auth" },
    { url: "https://example.com/docs/webhooks", source: "vendor-webhooks" },
    { url: "https://example.com/docs/ratelimits", source: "vendor-ratelimits" }
  ],
  concurrency: 5
)

ctx_search(
  queries: [
    "How is the webhook signature verified?",
    "What is the per-minute rate limit?"
  ]
)
```