Contribute

If you'd rather use the corpus than contribute to it, see "Use the corpus".

Add a paper

Pick a stable id: YYYY-firstauthor-shortslug (lowercase, dashes).
Create corpus/papers/<id>.yaml following the schema.
Tag against the controlled vocabulary in the taxonomy. If a tag you need doesn't exist, add it to schema/taxonomy.yaml in the same PR.
Set visibility honestly. If unsure, default to non-public; promoting later is easy, recalling a leak isn't.
Write the notes field. The abstract is what the authors said; the notes are what your team thinks about it.
Open a PR. CI runs the corpus integrity test (every tag must resolve, every reference must exist).
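
As a rough illustration of what the id rule and the integrity test check, here is a minimal sketch. It is not the project's actual CI code: the function names, the `tags` and `references` field names, and the exact id pattern (ASCII lowercase only) are assumptions, so defer to the schema wherever it differs.

```python
import re

# Assumed reading of "YYYY-firstauthor-shortslug (lowercase, dashes)":
# four digits, an ASCII lowercase surname, then a dash-separated slug.
ID_PATTERN = re.compile(r"\d{4}-[a-z]+-[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_id(paper_id: str) -> bool:
    """True if the whole string matches the assumed id pattern."""
    return ID_PATTERN.fullmatch(paper_id) is not None


def check_integrity(papers: dict[str, dict], taxonomy: set[str]) -> list[str]:
    """Return one message per malformed id, unresolved tag, or missing reference.

    `papers` maps paper id to its parsed record; `tags` and `references`
    are hypothetical field names used only for this sketch.
    """
    errors = []
    for paper_id, record in papers.items():
        if not is_valid_id(paper_id):
            errors.append(f"{paper_id}: id does not match YYYY-firstauthor-shortslug")
        for tag in record.get("tags", []):
            if tag not in taxonomy:
                errors.append(f"{paper_id}: unknown tag {tag!r}")
        for ref in record.get("references", []):
            if ref not in papers:
                errors.append(f"{paper_id}: reference {ref!r} not in corpus")
    return errors
```

An empty list from `check_integrity` means every tag resolved and every reference pointed at an existing record.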

For private papers

Don't put them in this repo. Use a separate private repo with the same schema (Lantern team: circumvention-corpus-private). The MCP server reads both data dirs locally; the public site you're reading right now serves only visibility: public records.
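
The public-only filter can be pictured as below. This is a sketch under assumptions, not the site's real build code: `public_records` is a hypothetical helper, and treating a record with a missing `visibility` field as non-public is a choice made here to match the "default to non-public" advice.

```python
def public_records(records: list[dict]) -> list[dict]:
    """Keep only records explicitly marked `visibility: public`.

    A record with no visibility field, or any other value, is dropped,
    so an unlabeled record never reaches the public output.
    """
    return [r for r in records if r.get("visibility") == "public"]
```

Filtering on an exact match (rather than excluding known private values) means a typo in the field fails closed.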

Add a tag to the taxonomy

Open a PR editing schema/taxonomy.yaml. New terms should have a definition and ideally a citation to a paper that uses the concept. Synonyms map alternate spellings to the canonical term.
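
To show how synonyms resolve to a canonical term, here is a small sketch. The in-memory layout (a set of canonical terms plus a synonym-to-term mapping) and the `canonical_tag` helper are assumptions for illustration; the real structure of schema/taxonomy.yaml may differ.

```python
from typing import Optional


def canonical_tag(tag: str, terms: set[str], synonyms: dict[str, str]) -> Optional[str]:
    """Resolve a tag to its canonical term, or None if it is unknown.

    `terms` holds canonical terms; `synonyms` maps each alternate
    spelling to one canonical term (hypothetical layout).
    """
    if tag in terms:
        return tag
    target = synonyms.get(tag)
    # Guard against a synonym that points at a term that was never defined.
    return target if target in terms else None
```

A `None` result is the case where a new term, or a new synonym entry, would need to be added in the PR.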

Extract findings from a paper

The findings/ directory holds extracted claims (one- to three-sentence statements like "the GFW's classifier achieves 94% precision on Snowflake DTLS handshakes") tagged against the same vocabulary as papers. This is the highest-leverage curation work: it's what lets the corpus answer questions like "what did anyone find about technique X" without anyone re-reading every paper.
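
The "what did anyone find about technique X" lookup can be sketched as a tag index over findings. The `Finding` fields and `index_by_tag` are hypothetical names chosen for this example; the actual findings schema is whatever the repo defines.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    paper_id: str            # id of the paper the claim was extracted from
    claim: str               # the one- to three-sentence statement
    tags: tuple[str, ...]    # terms from the shared taxonomy


def index_by_tag(findings: list[Finding]) -> dict[str, list[Finding]]:
    """Group findings under every tag they carry, preserving input order."""
    index: dict[str, list[Finding]] = defaultdict(list)
    for finding in findings:
        for tag in finding.tags:
            index[tag].append(finding)
    return dict(index)
```

With such an index, answering a question about one technique is a single dictionary lookup on its tag rather than a pass over every paper.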

An LLM (Claude, GPT, etc.) can propose findings if you feed it a paper; commit them only after a human has reviewed them.