Contribute
If you'd rather use the corpus than contribute to it, see "Use the corpus".
Add a paper
- Pick a stable id: YYYY-firstauthor-shortslug (lowercase, dashes).
- Create corpus/papers/<id>.yaml following the schema.
- Tag against the controlled vocabulary in the taxonomy. If a tag you need doesn't exist, add it to schema/taxonomy.yaml in the same PR.
- Set visibility honestly. If unsure, default to non-public; promoting later is easy, recalling a leak isn't.
- Write the notes field. The abstract is what the authors said; the notes are what your team thinks about it.
- Open a PR. CI runs the corpus integrity test (every tag must resolve, every reference must exist).
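To check a record locally before opening a PR, the two integrity rules above can be sketched in a few lines of Python. This is a minimal illustration, not the repo's actual CI test: the field names `tags` and `references`, the regex for the id shape, and the in-memory dicts standing in for parsed YAML are all assumptions.

```python
import re

# Assumed id shape: YYYY-firstauthor-shortslug, lowercase with dashes.
# The real schema may be stricter or looser than this pattern.
ID_PATTERN = re.compile(r"\d{4}-[a-z]+(?:-[a-z0-9]+)+")


def is_valid_id(paper_id: str) -> bool:
    """True if the id looks like YYYY-firstauthor-shortslug."""
    return ID_PATTERN.fullmatch(paper_id) is not None


def check_integrity(papers: dict, known_tags: set) -> list:
    """Return a list of problems; an empty list means the corpus passes.

    `papers` maps paper id -> record dict (hypothetical fields:
    "tags" and "references"); `known_tags` is the taxonomy vocabulary.
    """
    problems = []
    for paper_id, record in papers.items():
        if not is_valid_id(paper_id):
            problems.append(f"{paper_id}: id is not YYYY-firstauthor-shortslug")
        # Every tag must resolve against the controlled vocabulary.
        for tag in record.get("tags", []):
            if tag not in known_tags:
                problems.append(f"{paper_id}: unknown tag {tag!r}")
        # Every reference must point at a paper that exists in the corpus.
        for ref in record.get("references", []):
            if ref not in papers:
                problems.append(f"{paper_id}: reference {ref!r} not in corpus")
    return problems
```

For example, with two placeholder records where one uses a tag missing from the vocabulary, `check_integrity` returns a single message naming that paper and tag.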
For private papers
Don't put them in this repo. Use a separate private repo with the same schema (Lantern team: circumvention-corpus-private). The MCP server reads both data dirs locally; the public site you're reading right now serves only visibility: public records.
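The split between "reads both data dirs" and "serves only public records" can be pictured as a merge followed by a filter. The sketch below assumes each record is a dict with a `visibility` field and that records are keyed by paper id; the function names are hypothetical and this is not the MCP server's or the site's actual code.

```python
def merge_records(public_dir_records: dict, private_dir_records: dict) -> dict:
    """Combine records loaded from the public and private data dirs.

    Both arguments map paper id -> record dict. If an id appears in both,
    the private copy wins here; that tie-break is an assumption.
    """
    merged = dict(public_dir_records)
    merged.update(private_dir_records)
    return merged


def public_view(records: dict) -> dict:
    """Keep only records explicitly marked visibility: public.

    A record with a missing or unrecognized visibility value is treated
    as non-public, so a forgotten field fails closed rather than leaking.
    """
    return {
        paper_id: record
        for paper_id, record in records.items()
        if record.get("visibility") == "public"
    }
```

A local tool would work from the full `merge_records(...)` result, while anything published would go through `public_view(...)` first.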
Add a tag to the taxonomy
Open a PR editing schema/taxonomy.yaml. New terms should have a definition and ideally a citation to a paper that uses the concept. Synonyms map alternate spellings to the canonical term.
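As a rough illustration of how synonyms resolve to a canonical term, the sketch below builds a lookup table from a taxonomy and normalizes incoming tags against it. The taxonomy shape (term mapped to a dict with `definition` and `synonyms`) and the example term are assumptions for illustration; check schema/taxonomy.yaml for the real layout.

```python
def build_synonym_index(taxonomy: dict) -> dict:
    """Map every known spelling (lowercased) to its canonical term.

    `taxonomy` is assumed to look like:
        {"canonical-term": {"definition": "...", "synonyms": ["alt", ...]}}
    """
    index = {}
    for term, entry in taxonomy.items():
        index[term.lower()] = term
        for alt in entry.get("synonyms", []):
            index[alt.lower()] = term
    return index


def canonicalize(tag: str, index: dict):
    """Return the canonical term for `tag`, or None if it is unknown."""
    return index.get(tag.strip().lower())


# Placeholder taxonomy entry, not copied from the real file.
taxonomy = {
    "domain-fronting": {
        "definition": "placeholder definition",
        "synonyms": ["domain fronting", "DomainFronting"],
    }
}
index = build_synonym_index(taxonomy)
print(canonicalize("Domain Fronting", index))  # domain-fronting
```

Returning None for an unknown tag, rather than passing it through, keeps misspellings from silently entering the corpus.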
Extract findings from a paper
The findings/ directory holds extracted claims (one- to three-sentence statements like "the GFW's classifier achieves 94% precision on Snowflake DTLS handshakes") tagged against the same vocabulary as papers. This is the highest-leverage curation work — it's what makes the corpus answer questions like "what did anyone find about technique X" without re-reading every paper.
An LLM (Claude, GPT, etc.) can propose findings if you feed it a paper; commit them only after human review.
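One way to picture the workflow is a small record type plus two helpers: one that holds back unreviewed proposals, and one that answers "what did anyone find about technique X". Every field name here (`paper_id`, `claim`, `tags`, `proposed_by`, `reviewed`) is hypothetical; the actual layout of files in findings/ may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    paper_id: str                 # id of the paper the claim comes from
    claim: str                    # one- to three-sentence statement
    tags: list = field(default_factory=list)  # same vocabulary as papers
    proposed_by: str = "human"    # e.g. "llm" for machine-proposed findings
    reviewed: bool = False        # set True only after a human has checked it


def committable(findings: list) -> list:
    """Keep only findings a human has reviewed, whoever proposed them."""
    return [f for f in findings if f.reviewed]


def findings_for(tag: str, findings: list) -> list:
    """Return the findings tagged with `tag`, in their original order."""
    return [f for f in findings if tag in f.tags]
```

In this sketch an LLM-proposed finding starts with `reviewed=False` and is dropped by `committable` until a person flips the flag; `findings_for` is then a plain tag lookup over what was committed.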