Architecture

How context-mode is built: server, sandbox, knowledge base, security, and adapters.

context-mode is a small set of cooperating components with one job: keep raw bytes out of the context window. The model writes a program, the program runs where the data already lives, and only the printed answer comes back. Everything below exists to make that loop fast, safe, and durable across a long session.

The big picture

A request flows through five stages. Your agent calls a ctx_* tool over MCP. The server validates the call and decides where it goes. Code runs in a sandboxed subprocess; large output is truncated or indexed instead of returned verbatim. Anything worth keeping lands in a searchable knowledge base. Across the whole session, a separate database records what happened so memory survives compaction.

The result is a system where a single payload typically costs a few hundred tokens instead of tens of thousands — about 98% saved across a full session.
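The arithmetic behind a figure like that is simple to sketch. The helper below is a hypothetical illustration with made-up numbers; it is not the formula ctx stats uses, which this page does not spell out.

```typescript
// Hypothetical helper: fraction of tokens kept out of the context window.
// NOT the formula behind ctx stats; that formula is not given here.
function savedFraction(rawTokens: number, returnedTokens: number): number {
  if (rawTokens <= 0) return 0; // nothing to save on an empty payload
  return 1 - returnedTokens / rawTokens;
}

// Illustrative numbers only: a 20,000-token payload answered in 400 tokens.
const saved = savedFraction(20000, 400);
console.log(`${(saved * 100).toFixed(0)}% saved`);
```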

MCP server

The server is the front door. It exposes the ctx_* tools your agent sees — ctx_execute, ctx_search, ctx_batch_execute, ctx_fetch_and_index, ctx_index, and the status tools — and routes every call to the right subsystem. It owns no business logic of its own; it validates inputs, dispatches to the executor or the content store, and shapes the response so that only the answer crosses back into context.
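A minimal sketch of that validate-then-dispatch step might look like the following. Only the ctx_* tool names come from this page; the subsystem labels, the routing table, and the `route` function are assumptions made for illustration, including the guess that the indexing tools belong to the content store.

```typescript
// Hypothetical routing table; the tool-to-subsystem mapping is an assumption.
type Subsystem = "executor" | "content-store";

const ROUTES = new Map<string, Subsystem>([
  ["ctx_execute", "executor"],
  ["ctx_batch_execute", "executor"],
  ["ctx_search", "content-store"],
  ["ctx_index", "content-store"],
  ["ctx_fetch_and_index", "content-store"],
]);

type RouteResult =
  | { ok: true; subsystem: Subsystem }
  | { ok: false; error: string };

function route(tool: string, args: unknown): RouteResult {
  // Validate before dispatching: arguments must be a plain object.
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return { ok: false, error: "arguments must be an object" };
  }
  const subsystem = ROUTES.get(tool);
  if (subsystem === undefined) {
    return { ok: false, error: `unknown tool: ${tool}` };
  }
  // The server holds no business logic; it only names the subsystem to call.
  return { ok: true, subsystem };
}

console.log(route("ctx_execute", { language: "ts", code: "console.log(1)" }));
```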

Tool descriptions follow a strict WHEN, WHEN NOT, RETURNS, EXAMPLE structure and deliberately avoid forbidding language. The goal is to help the model pick the right tool by describing fit, not by issuing prohibitions that the model has to reason around.
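One way to keep every description in that shape is to build it from a typed record. This is a sketch under assumptions: the four section names come from the text above, while the interface, the render function, and the sample wording are invented for illustration and are not the real ctx_execute description.

```typescript
// Hypothetical shape; only the four section names come from the docs.
interface ToolDescription {
  when: string;
  whenNot: string;
  returns: string;
  example: string;
}

function renderDescription(d: ToolDescription): string {
  return [
    `WHEN: ${d.when}`,
    `WHEN NOT: ${d.whenNot}`,
    `RETURNS: ${d.returns}`,
    `EXAMPLE: ${d.example}`,
  ].join("\n");
}

// Illustrative wording. Note that WHEN NOT describes fit ("a better tool exists")
// rather than issuing a prohibition.
const rendered = renderDescription({
  when: "the answer can be computed from data that already lives on disk",
  whenNot: "the content is already indexed; ctx_search is a better fit",
  returns: "whatever the program prints",
  example: 'ctx_execute(language: "ts", code: ...)',
});
console.log(rendered);
```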

Polyglot executor

The executor runs the model's code in a sandboxed subprocess, isolated from the agent process. It supports 12 languages. For JavaScript and TypeScript it auto-detects Bun and uses it when present for faster startup, falling back to Node.js otherwise.

Output is handled on the way out, not on the way in. When a program prints more than a threshold, the executor truncates the stream or routes it into the knowledge base rather than flooding context. That is the whole point of "Think in Code": a script that reads a 60 KB file and prints one number costs you one number.

Process in the sandbox, return only the answer

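// ctx_execute(language: "ts", code: ...)
// Read the whole log inside the sandbox. Bun.file is Bun-specific; under Node.js,
// readFile from node:fs/promises does the same job.
const log = await Bun.file("build.log").text();
// Keep only the lines that mention an error.
const errors = log.split("\n").filter((l) => l.includes("ERROR"));
// This one summary line is all that returns to the context window.
console.log(`${errors.length} errors; first: ${errors[0] ?? "none"}`);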
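The output handling described in this section can be sketched as a single decision on the way out. The function name, the result shape, and the character-based limit are all assumptions; the actual threshold and its unit are not stated here.

```typescript
// Hypothetical result shape for what crosses back into context.
type Shaped =
  | { kind: "inline"; text: string }
  | { kind: "truncated"; text: string; droppedChars: number }
  | { kind: "indexed"; preview: string; totalChars: number };

// maxChars is an illustrative limit; `index` stands in for the knowledge base.
function shapeOutput(
  stdout: string,
  maxChars: number,
  index?: (fullText: string) => void,
): Shaped {
  if (stdout.length <= maxChars) {
    return { kind: "inline", text: stdout }; // small output passes through
  }
  if (index) {
    index(stdout); // the full text goes to the store, not to context
    return { kind: "indexed", preview: stdout.slice(0, maxChars), totalChars: stdout.length };
  }
  return {
    kind: "truncated",
    text: stdout.slice(0, maxChars),
    droppedChars: stdout.length - maxChars,
  };
}

console.log(shapeOutput("3 errors; first: ERROR disk full", 1000).kind);
```

Either branch for large output keeps the cost bounded by the limit, whatever the program printed.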

Knowledge base

The content store is a SQLite database using the FTS5 full-text engine with BM25 ranking. Indexed content is chunked by type so search returns focused, relevant snippets: Markdown splits by headings, JSON splits by key paths, and plain text splits by lines. When you call ctx_search, BM25 scores the chunks and returns the best matches — not the raw document.

This is what lets a fetched web page, a long log, or a directory of files become durable, queryable memory instead of a one-time context cost.
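The three chunking strategies named above can be sketched in a few lines each. This is a simplified illustration, not the store's implementation: the `Chunk` shape, the function names, and the lines-per-chunk grouping are assumptions, and the Markdown splitter ignores edge cases such as `#` lines inside fenced code.

```typescript
interface Chunk {
  label: string; // heading, key path, or line range
  text: string;
}

// Markdown: start a new chunk at every heading line.
function chunkMarkdown(src: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { label: "(preamble)", text: "" };
  for (const line of src.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      if (current.text.trim() !== "") chunks.push(current);
      current = { label: line.replace(/^#+\s*/, ""), text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.text.trim() !== "") chunks.push(current);
  return chunks;
}

// JSON: one chunk per leaf value, labelled with its key path.
function chunkJson(value: unknown, path: string = "$"): Chunk[] {
  if (value === null || typeof value !== "object") {
    return [{ label: path, text: JSON.stringify(value) }];
  }
  const chunks: Chunk[] = [];
  if (Array.isArray(value)) {
    value.forEach((item, i) => chunks.push(...chunkJson(item, `${path}[${i}]`)));
  } else {
    for (const [key, item] of Object.entries(value)) {
      chunks.push(...chunkJson(item, `${path}.${key}`));
    }
  }
  return chunks;
}

// Plain text: fixed-size groups of lines (the group size is an assumption).
function chunkLines(src: string, linesPerChunk: number): Chunk[] {
  const size = Math.max(1, Math.floor(linesPerChunk));
  const lines = src.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += size) {
    const end = Math.min(i + size, lines.length);
    chunks.push({ label: `lines ${i + 1}-${end}`, text: lines.slice(i, end).join("\n") });
  }
  return chunks;
}

console.log(chunkMarkdown("# Setup\nInstall it.\n## Usage\nRun it.\n").map((c) => c.label));
```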

See the Knowledge base page for how indexing, chunking, and BM25 search work in detail.

Security layer

Before any command runs, a security layer evaluates it against deny and allow rules and screens for shell-escape attempts. Denied commands never reach the sandbox. The layer distinguishes two kinds of refusal: a neutral redirect, which simply steers a command to a better tool, and a true restriction, which blocks something genuinely unsafe. Routing-deny reasons keep those cases separate so the model gets an accurate signal instead of a blanket "no".

A redirect is not a failure. When a command is pointed at the sandbox instead of being run inline, that is the system protecting your context window, not restricting what you can do.
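The distinction between redirect and restriction maps naturally onto a tagged result type. The sketch below is hypothetical throughout: the rule shapes, the sample patterns, the evaluation order (escape screen, then deny, then redirect, then allow), and the default of refusing unlisted commands are assumptions, not the project's actual policy.

```typescript
type Decision =
  | { action: "allow" }
  | { action: "redirect"; tool: string; reason: string } // neutral: a better tool exists
  | { action: "restrict"; reason: string };              // true safety block

interface Rules {
  deny: RegExp[];
  allow: RegExp[];
  redirect: { pattern: RegExp; tool: string }[];
}

function evaluate(command: string, rules: Rules): Decision {
  // Crude shell-escape screen: command substitution via $(...) or backticks.
  if (/\$\(|`/.test(command)) {
    return { action: "restrict", reason: "shell escape attempt" };
  }
  if (rules.deny.some((r) => r.test(command))) {
    return { action: "restrict", reason: "matched a deny rule" };
  }
  for (const { pattern, tool } of rules.redirect) {
    if (pattern.test(command)) {
      return { action: "redirect", tool, reason: "output may be large; run it in the sandbox" };
    }
  }
  if (rules.allow.some((r) => r.test(command))) return { action: "allow" };
  return { action: "restrict", reason: "not covered by an allow rule" };
}

// Illustrative rules only.
const RULES: Rules = {
  deny: [/^rm\s+-rf\b/],
  allow: [/^git\s+status\b/],
  redirect: [{ pattern: /^cat\s/, tool: "ctx_execute" }],
};

console.log(evaluate("cat build.log", RULES));
```

Because the two refusals carry different tags, the caller can word a redirect as guidance and a restriction as a block.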

Runtime detection

A runtime-detection layer probes the machine for available language runtimes — which interpreters are installed, whether Bun is present, what versions exist — so the executor can choose the fastest viable path for each language. This is also what context-mode doctor reports on when it checks your runtimes.
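For JavaScript and TypeScript, the selection step reduces to trying Bun first and falling back to Node.js. In this sketch the probe is injected so the logic can be shown without touching the machine; the `Probe` type, the function name, and the version strings are assumptions. A real probe would likely spawn something like `bun --version` and return null when the binary is missing.

```typescript
// A probe returns the runtime's version string, or null if it is not installed.
type Probe = (binary: string) => string | null;

interface Runtime {
  binary: string;
  version: string;
}

function pickJsRuntime(probe: Probe): Runtime | null {
  // Preference order: Bun for faster startup, Node.js as the fallback.
  for (const binary of ["bun", "node"]) {
    const version = probe(binary);
    if (version !== null) return { binary, version };
  }
  return null; // no JavaScript runtime available
}

// Fake probe with made-up versions, simulating a machine that has only Node.js.
const nodeOnly: Probe = (binary) => (binary === "node" ? "20.0.0" : null);
console.log(pickJsRuntime(nodeOnly));
```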

SessionDB and analytics

Session events are captured in a separate SQLite database, SessionDB. It runs in WAL journal mode with a configured busy_timeout and no external lockfile, which makes it multi-writer safe: several hook processes can record events concurrently without corrupting state. SQLite still applies writes one at a time, so a process that finds the database busy waits up to the busy_timeout rather than failing. That property matters because hooks fire from many short-lived processes, not one long-running one.
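Both settings correspond to standard SQLite pragmas. The helper below only builds the statements a connection would run on open; the function name and the 5000 ms value are assumptions, since the configured timeout is not given here.

```typescript
// Builds the pragma statements for a SessionDB-style connection.
// The timeout value passed in is illustrative, not the project's setting.
function sessionDbPragmas(busyTimeoutMs: number): string[] {
  if (!Number.isInteger(busyTimeoutMs) || busyTimeoutMs < 0) {
    throw new RangeError("busy_timeout must be a non-negative integer (milliseconds)");
  }
  return [
    "PRAGMA journal_mode = WAL;",               // readers no longer block the writer
    `PRAGMA busy_timeout = ${busyTimeoutMs};`,  // wait for a lock instead of failing at once
  ];
}

console.log(sessionDbPragmas(5000).join("\n"));
```

WAL mode is stored in the database file once set, but busy_timeout applies per connection, so each short-lived hook process has to set it again on open.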

On top of SessionDB sit the session modules: extract pulls structured signal out of raw events, snapshot captures point-in-time state, and analytics turns the record into the numbers you see in ctx stats and the Insight dashboard. Because this history lives outside the context window, it survives compaction — your agent can search what happened earlier even after the conversation has been summarized.

See the Session continuity page for how memory persists across compaction and resumes.

Adapters

Each supported host connects through a per-host adapter that wires the hooks for that CLI. context-mode spans more than a dozen agent CLIs through three integration paradigms: JSON stdin/stdout hooks, TypeScript plugins, and MCP-only routing files. The adapter is the thin layer that translates a given host's event model into the same internal pipeline, so the executor, knowledge base, and SessionDB behave identically no matter where the agent runs.
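As a rough sketch of the JSON stdin/stdout paradigm, an adapter can be thought of as a function from a host's raw payload to one shared event shape. Every name here is hypothetical, including the host name, the payload fields, and the event kinds; real hosts each define their own hook formats.

```typescript
// The shared shape every adapter produces (hypothetical).
interface PipelineEvent {
  host: string;
  kind: "tool_call" | "session_start";
  tool?: string;
}

interface Adapter {
  host: string;
  toPipeline(raw: string): PipelineEvent | null; // null: ignore this payload
}

// Adapter for an imaginary CLI that sends one JSON object per hook on stdin.
const jsonHookAdapter: Adapter = {
  host: "example-cli",
  toPipeline(raw: string): PipelineEvent | null {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return null; // malformed input never reaches the pipeline
    }
    if (typeof parsed !== "object" || parsed === null) return null;
    const payload = parsed as Record<string, unknown>;
    if (payload.event === "before_tool" && typeof payload.tool === "string") {
      return { host: this.host, kind: "tool_call", tool: payload.tool };
    }
    if (payload.event === "start") {
      return { host: this.host, kind: "session_start" };
    }
    return null;
  },
};

console.log(jsonHookAdapter.toPipeline('{"event":"before_tool","tool":"ctx_search"}'));
```

Everything downstream sees only `PipelineEvent`, which is what lets the same executor, knowledge base, and SessionDB serve every host.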

Recorded decisions

A few architectural choices are worth stating plainly, because they shape how the rest of the system behaves:

SessionDB is multi-writer safe. Concurrent hook processes can all record events safely: WAL mode keeps readers from blocking the writer, busy_timeout makes competing writers wait briefly instead of failing, and there is no external lockfile to contend over.
Tool descriptions are structured, not restrictive. Every tool follows WHEN, WHEN NOT, RETURNS, EXAMPLE and avoids forbidding language so the model chooses by fit.
Deny reasons separate redirect from restriction. A neutral redirect to a better tool is reported differently from a true safety block.
Stats use a strict compression formula. The savings figure in ctx stats comes from a fixed formula, so the number you see is consistent and reproducible rather than an estimate.

Where to go next

Think in Code

Benchmarks
