Architecture
How context-mode is built: server, sandbox, knowledge base, security, and adapters.
context-mode is a small set of cooperating components with one job: keep raw bytes out of the context window. The model writes a program, the program runs where the data already lives, and only the printed answer comes back. Everything below exists to make that loop fast, safe, and durable across a long session.
The big picture
A request flows through five stages. Your agent calls a ctx_* tool over MCP.
The server validates the call and decides where it goes. Code runs in a
sandboxed subprocess; large output is truncated or indexed instead of returned
verbatim. Anything worth keeping lands in a searchable knowledge base. Across the
whole session, a separate database records what happened so memory survives
compaction.
The result is a system where a single payload typically costs a few hundred tokens instead of tens of thousands — about 98% saved across a full session.
MCP server
The server is the front door. It exposes the ctx_* tools your agent sees —
ctx_execute, ctx_search, ctx_batch_execute, ctx_fetch_and_index,
ctx_index, and the status tools — and routes every call to the right
subsystem. It owns no business logic of its own; it validates inputs, dispatches
to the executor or the content store, and shapes the response so that only the
answer crosses back into context.
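The validate-then-dispatch role described above could be sketched roughly as follows. This is an illustration only: the tool names come from the list above, but the `ToolCall` shape, the `route` function, and the mapping of tools to subsystems are assumptions, not context-mode's actual code.

```typescript
// Hypothetical sketch of a validate-then-dispatch front door.
type Subsystem = "executor" | "contentStore";

interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// Assumed mapping of tools to subsystems, for illustration only.
const ROUTES = new Map<string, Subsystem>([
  ["ctx_execute", "executor"],
  ["ctx_batch_execute", "executor"],
  ["ctx_search", "contentStore"],
  ["ctx_index", "contentStore"],
  ["ctx_fetch_and_index", "contentStore"],
]);

type RouteResult =
  | { ok: true; target: Subsystem }
  | { ok: false; error: string };

function route(call: ToolCall): RouteResult {
  const target = ROUTES.get(call.tool);
  if (target === undefined) {
    return { ok: false, error: `unknown tool: ${call.tool}` };
  }
  // Minimal input validation: ctx_execute must carry code to run.
  if (call.tool === "ctx_execute" && typeof call.args["code"] !== "string") {
    return { ok: false, error: "ctx_execute requires a string code argument" };
  }
  return { ok: true, target };
}
```

The point of the sketch is that the front door decides only where a call goes and whether it is well formed; the work itself happens in the subsystem it names.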
Tool descriptions follow a strict WHEN, WHEN NOT, RETURNS, EXAMPLE
structure and deliberately avoid forbidding language. The goal is to help the
model pick the right tool by describing fit, not by issuing prohibitions that the
model has to reason around.
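As a rough illustration of that structure, a description could be modeled as four fields and rendered in a fixed order. The `ToolDescription` interface, the `renderDescription` helper, and the wording of each field are hypothetical; only the four section names and the `ctx_execute(language: "ts", code: ...)` call form come from this page.

```typescript
// Hypothetical shape for a structured tool description.
interface ToolDescription {
  when: string;
  whenNot: string;
  returns: string;
  example: string;
}

// Illustrative wording only; not the project's real description text.
const executeDescription: ToolDescription = {
  when: "You want an answer computed from data without loading the raw data into context.",
  whenNot: "The content is already indexed and you only need to look something up; ctx_search fits better.",
  returns: "Only what the program prints.",
  example: 'ctx_execute(language: "ts", code: ...)',
};

// Render the four sections in the fixed WHEN, WHEN NOT, RETURNS, EXAMPLE order.
function renderDescription(d: ToolDescription): string {
  return [
    `WHEN: ${d.when}`,
    `WHEN NOT: ${d.whenNot}`,
    `RETURNS: ${d.returns}`,
    `EXAMPLE: ${d.example}`,
  ].join("\n");
}
```

Note that the WHEN NOT field points to a better-fitting tool rather than forbidding anything, which is the distinction the paragraph above draws.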
Polyglot executor
The executor runs the model's code in a sandboxed subprocess, isolated from the agent process. It supports 12 languages. For JavaScript and TypeScript it auto-detects Bun and uses it when present for faster startup, falling back to Node.js otherwise.
Output is handled on the way out, not on the way in. When a program prints more than a threshold, the executor truncates the stream or routes it into the knowledge base rather than flooding context. That is the whole point of "Think in Code": a script that reads a 60 KB file and prints one number costs you one number.
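The output policy described above might look something like the sketch below. Everything here is assumed for illustration: the `shapeOutput` name, the `Shaped` result type, the character-based threshold, and the optional `index` callback standing in for the knowledge base.

```typescript
// Hypothetical output policy: small output passes through unchanged; large
// output is either handed to an indexer or truncated.
type Shaped =
  | { kind: "inline"; text: string }
  | { kind: "truncated"; text: string; droppedChars: number }
  | { kind: "indexed"; note: string };

function shapeOutput(
  stdout: string,
  maxChars: number, // assumed threshold, measured in characters for simplicity
  index?: (content: string) => string, // stores content, returns a label
): Shaped {
  if (stdout.length <= maxChars) {
    return { kind: "inline", text: stdout };
  }
  if (index !== undefined) {
    const label = index(stdout);
    return { kind: "indexed", note: `output indexed as "${label}"` };
  }
  return {
    kind: "truncated",
    text: stdout.slice(0, maxChars),
    droppedChars: stdout.length - maxChars,
  };
}
```

A program that prints one number always takes the `inline` branch, so the cost to context is that one number regardless of how much data the program read.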
```typescript
// ctx_execute(language: "ts", code: ...)
// Uses Bun.file, so this particular script needs the Bun runtime.
const log = await Bun.file("build.log").text();
const errors = log.split("\n").filter((l) => l.includes("ERROR"));
console.log(`${errors.length} errors; first: ${errors[0] ?? "none"}`);
```
Knowledge base
The content store is a SQLite database using the FTS5 full-text engine with BM25
ranking. Indexed content is chunked by type so search returns focused, relevant
snippets: Markdown splits by headings, JSON splits by key paths, and plain text
splits by lines. When you call ctx_search, BM25 scores the chunks and returns
the best matches — not the raw document.
This is what lets a fetched web page, a long log, or a directory of files become durable, queryable memory instead of a one-time context cost.
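To make the chunking idea concrete, here is a minimal sketch of the Markdown case only, splitting on headings. The `Chunk` type and `chunkMarkdown` function are hypothetical, and the sketch makes a simplifying assumption: heading-like lines inside fenced code blocks are not treated specially.

```typescript
interface Chunk {
  title: string; // heading text, or "" for content before the first heading
  body: string;
}

// Split Markdown into one chunk per heading.
function chunkMarkdown(source: string): Chunk[] {
  const chunks: Chunk[] = [];
  let title = "";
  let lines: string[] = [];

  // Emit the chunk collected so far, skipping completely empty ones.
  const flush = (): void => {
    const body = lines.join("\n").trim();
    if (title !== "" || body !== "") {
      chunks.push({ title, body });
    }
  };

  for (const line of source.split("\n")) {
    const heading = /^#{1,6}\s+(.*)$/.exec(line);
    if (heading !== null) {
      flush();
      title = (heading[1] ?? "").trim();
      lines = [];
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

Each chunk would then become one row in the full-text index, so a search hit returns a single section rather than the whole document.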
Security layer
Before any command runs, a security layer evaluates it against deny and allow rules and screens for shell-escape attempts. Denied commands never reach the sandbox. The layer distinguishes two kinds of refusal: a neutral redirect, which simply steers a command to a better tool, and a true restriction, which blocks something genuinely unsafe. Routing-deny reasons keep those cases separate so the model gets an accurate signal instead of a blanket "no".
A redirect is not a failure. When a command is pointed at the sandbox instead of being run inline, that is the system protecting your context window, not restricting what you can do.
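One way to picture the redirect-versus-restriction split is a verdict type with three outcomes. The sketch below is an assumption-laden illustration: the `Verdict` and `Rules` types, the `evaluate` function, the example patterns, the crude shell-escape check, the rule ordering, and the allow-by-default fallback are all hypothetical rather than context-mode's real rules.

```typescript
// Hypothetical rule evaluation that keeps redirects apart from restrictions.
type Verdict =
  | { action: "allow" }
  | { action: "redirect"; tool: string; reason: string }
  | { action: "restrict"; reason: string };

interface Rules {
  deny: RegExp[]; // genuinely unsafe patterns
  allow: RegExp[]; // explicitly permitted commands
  redirect: { pattern: RegExp; tool: string }[]; // better handled elsewhere
}

function evaluate(command: string, rules: Rules): Verdict {
  // Deny rules win over everything else.
  if (rules.deny.some((p) => p.test(command))) {
    return { action: "restrict", reason: "matches a deny rule" };
  }
  // Crude shell-escape screen: command substitution and backticks.
  if (/\$\(|`/.test(command)) {
    return { action: "restrict", reason: "possible shell escape" };
  }
  if (rules.allow.some((p) => p.test(command))) {
    return { action: "allow" };
  }
  for (const r of rules.redirect) {
    if (r.pattern.test(command)) {
      return { action: "redirect", tool: r.tool, reason: "a sandboxed tool fits better" };
    }
  }
  return { action: "allow" }; // assumed default for this sketch
}

// Example rules, invented for illustration. Patterns avoid the `g` flag so
// that RegExp.test stays stateless between calls.
const exampleRules: Rules = {
  deny: [/\brm\s+-rf\b/],
  allow: [/^git status$/],
  redirect: [{ pattern: /^cat\s/, tool: "ctx_execute" }],
};
```

With these example rules, `cat build.log` yields a `redirect` verdict naming `ctx_execute`, while `rm -rf /` yields a `restrict` verdict, so the caller can tell a nudge from a block.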
Runtime detection
A runtime-detection layer probes the machine for available language runtimes —
which interpreters are installed, whether Bun is present, what versions exist —
so the executor can choose the fastest viable path for each language. This is
also what context-mode doctor reports on when it checks your runtimes.
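The selection step can be sketched as a pure function over whatever the probe found. The `Detected` map and `pickJsRuntime` function are hypothetical names; the only behavior taken from this page is the preference for Bun over Node.js when both are available.

```typescript
// Hypothetical probe result: runtime name -> version string, if installed.
type Detected = Map<string, string>;

// Prefer Bun for JavaScript/TypeScript when present, else fall back to Node.js.
function pickJsRuntime(detected: Detected): "bun" | "node" | null {
  if (detected.has("bun")) return "bun";
  if (detected.has("node")) return "node";
  return null; // no usable JavaScript runtime was found
}
```

A real probe would presumably fill the map by running each runtime's version command (for example `bun --version` or `node --version`); keeping the choice separate from the probing makes the preference order easy to test.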
SessionDB and analytics
Session events are captured in a separate SQLite database, SessionDB. It runs in
WAL journal mode with a configured busy_timeout and no external lockfile, which
makes it multi-writer safe: several hook processes can record events
concurrently without corrupting state. SQLite still commits one write at a time,
but busy_timeout makes a contending writer wait briefly instead of failing, and
WAL keeps readers from blocking writers. That property matters because hooks
fire from many short-lived processes, not one long-running one.
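The connection setup implied above comes down to two standard SQLite pragmas. The helper below only builds the statements, so it stays independent of any particular SQLite binding; the `sessionDbPragmas` name and the 5000 ms default are assumptions, not values taken from context-mode.

```typescript
// Build the connection-setup statements for a WAL-mode SQLite database.
// Both pragmas are standard SQLite; the default timeout is an assumed value.
function sessionDbPragmas(busyTimeoutMs: number = 5000): string[] {
  if (!Number.isInteger(busyTimeoutMs) || busyTimeoutMs < 0) {
    throw new RangeError("busyTimeoutMs must be a non-negative integer");
  }
  return [
    "PRAGMA journal_mode = WAL;",
    `PRAGMA busy_timeout = ${busyTimeoutMs};`,
  ];
}
```

Each hook process would run these statements right after opening its connection. The journal mode persists in the database file once set, whereas busy_timeout applies per connection, so every short-lived process has to set it again.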
On top of SessionDB sit the session modules: extract pulls structured signal out
of raw events, snapshot captures point-in-time state, and analytics turns the
record into the numbers you see in ctx stats and the Insight dashboard. Because
this history lives outside the context window, it survives compaction — your
agent can search what happened earlier even after the conversation has been
summarized.
Adapters
Each supported host connects through a per-host adapter that wires the hooks for that CLI. context-mode spans more than a dozen agent CLIs through three integration paradigms: JSON stdin/stdout hooks, TypeScript plugins, and MCP-only routing files. The adapter is the thin layer that translates a given host's event model into the same internal pipeline, so the executor, knowledge base, and SessionDB behave identically no matter where the agent runs.
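The adapter idea can be sketched as a small interface that turns one host-specific message into a shared internal shape. All names here are hypothetical: `InternalEvent`, `HostAdapter`, the imaginary `example-cli` host, and its `before_tool` and `after_tool` event names are invented for illustration and do not describe any real CLI.

```typescript
// Internal event shape that every adapter is assumed to produce.
interface InternalEvent {
  host: string;
  kind: "tool_call" | "tool_result";
  payload: unknown;
}

interface HostAdapter {
  host: string;
  // Translate one raw host message, or return null to ignore it.
  translate(raw: string): InternalEvent | null;
}

// Adapter for an imaginary host that sends one JSON object per hook call.
const jsonHookAdapter: HostAdapter = {
  host: "example-cli",
  translate(raw: string): InternalEvent | null {
    let msg: unknown;
    try {
      msg = JSON.parse(raw);
    } catch {
      return null; // not JSON: nothing to record
    }
    if (typeof msg !== "object" || msg === null) return null;
    const event = (msg as Record<string, unknown>)["event"];
    if (event === "before_tool") {
      return { host: "example-cli", kind: "tool_call", payload: msg };
    }
    if (event === "after_tool") {
      return { host: "example-cli", kind: "tool_result", payload: msg };
    }
    return null; // unknown event types are skipped
  },
};
```

Because everything downstream sees only `InternalEvent`, supporting another host would mean writing another `translate` function rather than touching the executor, knowledge base, or SessionDB.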
Recorded decisions
A few architectural choices are worth stating plainly, because they shape how the rest of the system behaves:
- SessionDB is multi-writer safe. Concurrent hook processes can all write at once thanks to WAL mode and busy_timeout, with no external lockfile to contend over.
- Tool descriptions are structured, not restrictive. Every tool follows WHEN, WHEN NOT, RETURNS, EXAMPLE and avoids forbidding language so the model chooses by fit.
- Deny reasons separate redirect from restriction. A neutral redirect to a better tool is reported differently from a true safety block.
- Stats use a strict compression formula. The savings figure in ctx stats comes from a fixed formula, so the number you see is consistent and reproducible rather than an estimate.