Use the corpus

The corpus is designed to be useful in several ways. Pick whichever fits your workflow.

1. Plug into your AI assistant (one line, hosted)

The fastest path. The corpus runs as a hosted MCP server at corpus.lantern.io/mcp. Zero install, no toolchain, and it always reflects the latest committed state of the repo (it auto-deploys on every push to main).

Claude Code

claude mcp add --transport http -s user circumvention-corpus https://corpus.lantern.io/mcp

Verify with claude mcp list; it should show ✓ Connected. The server's four tools become available in any conversation.

Codex CLI

Add the server to ~/.codex/config.toml (create the file if it doesn't exist):

[mcp_servers.circumvention-corpus]
url = "https://corpus.lantern.io/mcp"

Then, in any Codex session, run /mcp to list the configured servers; circumvention-corpus should appear with its four tools.

Claude Desktop

Edit your config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "circumvention-corpus": {
      "url": "https://corpus.lantern.io/mcp",
      "transport": "http"
    }
  }
}
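
If the config file already lists other MCP servers, the entry has to be merged under mcpServers rather than pasted over the whole file. A minimal Python sketch of that merge, using only the standard library; the helper names add_mcp_server and update_config_file are hypothetical, and the entry shape is copied from the JSON above rather than taken from any Claude Desktop reference:

```python
import copy
import json
from pathlib import Path


def add_mcp_server(config, name, url):
    """Return a copy of config with an HTTP MCP server entry under mcpServers."""
    merged = copy.deepcopy(config)
    # Entry shape mirrors the JSON snippet above; existing servers are left untouched.
    merged.setdefault("mcpServers", {})[name] = {"url": url, "transport": "http"}
    return merged


def update_config_file(path, name, url):
    """Read the JSON config at path (if it exists), merge the entry, write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_mcp_server(config, name, url)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

Calling update_config_file with the config path for your platform, "circumvention-corpus", and "https://corpus.lantern.io/mcp" would leave any other keys in the file as they were. Back up the file first if you are unsure of its contents.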

Restart Claude Desktop.

Cursor / VS Code Copilot / other MCP clients

Any MCP client that supports the Streamable HTTP transport takes a URL. The shape is the same: drop the URL above into your client's MCP config.

2. Browse this site

Every paper has a stable URL: /papers/<id>/. Tag indexes (censors, techniques, defenses) let you walk the field by axis. The whole site rebuilds from the YAML on every push to main; whatever you see here matches the source repo.

3. Read the YAML directly

Every paper is a small YAML file in corpus/papers/. The JSON schema documents every field. The taxonomy documents the controlled-vocabulary IDs that tag fields use. If you're building your own tooling on top of the corpus, this is the most boring, most stable interface — clone the repo, walk the directory.

git clone https://github.com/getlantern/circumvention-corpus
cd circumvention-corpus
ls corpus/papers/                       # one YAML per paper
yq '.censors' corpus/papers/2023-wu-fully-encrypted-detect.yaml
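
For tooling that walks the directory, a minimal Python sketch along the same lines follows. It assumes that a paper's ID is its YAML filename without the extension (consistent with the example file above, but not something the schema is quoted as guaranteeing here), and the function names are hypothetical. It uses only the standard library; reading the fields inside each file would additionally need a YAML parser.

```python
from pathlib import Path


def list_paper_ids(repo_root):
    """Return paper IDs found under corpus/papers/, sorted.

    ASSUMPTION: the ID is the YAML filename stem,
    e.g. 2023-wu-fully-encrypted-detect.yaml -> 2023-wu-fully-encrypted-detect.
    """
    papers_dir = Path(repo_root) / "corpus" / "papers"
    return sorted(p.stem for p in papers_dir.glob("*.yaml"))


def paper_url_path(paper_id):
    """Site-relative path for a paper, following the /papers/<id>/ pattern."""
    return f"/papers/{paper_id}/"
```

Run from a local clone, list_paper_ids(".") gives one entry per paper file, and paper_url_path maps each entry to its page on the site.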

4. Self-host the MCP server (offline / privacy)

This route is for users behind aggressive censorship who can't reach Cloudflare, and for anyone who'd rather not send queries off-machine. The corpus ships a Go MCP server that speaks the stdio transport: a single binary with no runtime dependencies.

go install github.com/getlantern/circumvention-corpus/cmd/corpus-mcp@latest

# Then register it. The --corpus flag points at a local clone of the repo.
git clone https://github.com/getlantern/circumvention-corpus ~/code/circumvention-corpus
claude mcp add -s user circumvention-corpus \
  "$(go env GOPATH)/bin/corpus-mcp" -- --corpus "$HOME/code/circumvention-corpus"

What the MCP server exposes

Four tools, designed to compose:

search_papers: Keyword + tag-filter search. Filters: censors, techniques, defenses, year_min, year_max, venue, core_only. Returns ranked records with abstract, tags, and team notes.
get_paper: Full record for a single paper id, plus any extracted findings tagged to it. Use after search_papers when the agent needs the full notes / references / metadata.
list_taxonomy: Returns the controlled vocabulary so the agent knows the canonical IDs to filter on. Especially useful as the first call in a session, because it gives the model a map of the field's structure.
find_related: Papers that share tags with a given paper. mode = same_technique (default), same_censor, or same_defense.

Example questions the MCP makes easy:

"Find every paper that evaluates a defense against the GFW's fully-encrypted-traffic detector."
"What did anyone publish about Iran's censorship in 2024-2025?"
"For my new protocol design: which papers should I read about active probing?"
"Show me the citation neighborhood of 2023-wu-fully-encrypted-detect."
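
To make the composition concrete, here is a rough Python sketch of the messages an MCP client might send for the Iran question above. The tools/call method and its name/arguments parameters come from the MCP specification; the tag ID "iran" and the list-valued shape of the censors filter are assumptions, since the real IDs come from list_taxonomy and the exact argument types are defined by the server's tool schema. In practice your MCP client builds these messages for you (after its own initialize handshake), so this only illustrates what travels over the wire.

```python
import json


def tool_call(request_id, name, arguments=None):
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Step 1: fetch the controlled vocabulary so the censor ID is canonical.
taxonomy_request = tool_call(1, "list_taxonomy")

# Step 2: filter on that ID and the year range.
# ASSUMPTIONS: "iran" as the tag ID and a list-valued "censors" filter are guesses.
search_request = tool_call(
    2,
    "search_papers",
    {"censors": ["iran"], "year_min": 2024, "year_max": 2025},
)

print(json.dumps(search_request, indent=2))
```

Calling list_taxonomy first means the filter values in the second request can be checked against the vocabulary instead of guessed.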

5. Build something on top

The schema is CC0. The metadata is CC0. Build whatever you want with it — your own UI, a notification system that pings you when papers tagged with a specific technique appear, a sister index for a different region. The whole point of having a structured-metadata layer is that the data outlives whatever interface we put on top of it.
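
As one illustration, the core of the notification idea could be sketched in Python like this. It compares the paper IDs seen on a previous run with the records parsed on the current run and reports new papers carrying a given tag. The function name, the example IDs, the techniques field holding a list of tag IDs, and the active-probing tag are all hypothetical; check the schema and taxonomy for the real field names and IDs.

```python
def new_papers_with_tag(previous_ids, current_records, field, tag_id):
    """Return IDs of papers not seen before whose `field` list contains tag_id."""
    seen = set(previous_ids)
    return sorted(
        paper_id
        for paper_id, record in current_records.items()
        if paper_id not in seen and tag_id in (record.get(field) or [])
    )


# Hypothetical snapshots: IDs, field name, and tag IDs are illustrative only.
previous = ["2024-example-a"]
current = {
    "2024-example-a": {"techniques": ["active-probing"]},
    "2025-example-b": {"techniques": ["active-probing"]},
    "2025-example-c": {"techniques": ["traffic-shaping"]},
}

alerts = new_papers_with_tag(previous, current, "techniques", "active-probing")
print(alerts)  # ['2025-example-b']
```

Persisting the list of seen IDs between runs (a text file is enough) and running the check after each pull of the repo would turn this into a simple alert.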