FINDING · EVALUATION
Latent semantic analysis (LSA) applied to the Chinese-language Wikipedia (942,033 terms across 94,863 documents, rank-600 reduction) uncovered 122 previously unknown keywords filtered by the Great Firewall of China (GFC), starting from only 12 seed concepts; probing each list of 2,500 candidate terms took 1.2 to 6.7 hours, 3.5 hours on average.
From 2007-crandall-conceptdoppler — ConceptDoppler: A Weather Tracker for Internet Censorship · §4.3 · 2007 · Computer and Communications Security
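A minimal sketch of the seed-expansion step, using scikit-learn's TfidfVectorizer and TruncatedSVD as stand-ins for the paper's term-document weighting and rank reduction; the Chinese tokenization, the corpus loader, and the ranking by an averaged seed centroid are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

def expand_seeds(documents, seed_terms, k=600, n_candidates=2500):
    """Rank corpus terms by latent-space similarity to a set of seed terms."""
    # Term-document weighting; the paper builds its matrix from the
    # Chinese-language Wikipedia (94,863 documents, 942,033 terms).
    # NOTE: the default tokenizer does not segment Chinese; a real run
    # would pass a Chinese word segmenter via the tokenizer= argument.
    vectorizer = TfidfVectorizer()
    doc_term = vectorizer.fit_transform(documents)       # docs x terms
    vocab = vectorizer.get_feature_names_out()
    term_index = {t: i for i, t in enumerate(vocab)}

    # Rank-k reduction (k=600 in the paper) via truncated SVD.
    svd = TruncatedSVD(n_components=k)
    svd.fit(doc_term)
    term_vecs = svd.components_.T                        # terms x k latent coords

    # One simple ranking choice (an assumption, not the paper's exact
    # metric): average the seed vectors, then sort every term in the
    # vocabulary by cosine similarity to that centroid.
    seed_idx = [term_index[s] for s in seed_terms if s in term_index]
    centroid = term_vecs[seed_idx].mean(axis=0, keepdims=True)
    sims = cosine_similarity(term_vecs, centroid).ravel()
    ranked = np.argsort(-sims)
    return [vocab[i] for i in ranked[:n_candidates]]
```

The top-ranked candidates would then be probed against the filter in batches of 2,500, as in §4.3.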
Implications
- Circumvention servers can run LSA over a continuously updated news corpus to maintain a near-real-time keyword blacklist, enabling proactive server-side evasion before clients encounter blocks.
- The efficiency gain from LSA is a prerequisite for tracking a dynamic blacklist: exhaustively probing every word in a natural-language vocabulary is infeasible at scale, both in compute and in probe traffic (see the arithmetic sketch below).
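Back-of-envelope arithmetic using the figures reported above; a sketch only, assuming the average probe rate of one 2,500-term list per 3.5 hours would hold across the full vocabulary:

```python
VOCAB_SIZE = 942_033   # terms in the Chinese-language Wikipedia corpus
LIST_SIZE = 2_500      # candidate terms per probed list
AVG_HOURS = 3.5        # average reported time to probe one list

secs_per_term = AVG_HOURS * 3600 / LIST_SIZE            # ~5.0 s per probe
exhaustive_days = VOCAB_SIZE * secs_per_term / 86_400   # ~55 days

print(f"~{secs_per_term:.1f} s per probe")
print(f"Exhaustive vocabulary sweep: ~{exhaustive_days:.0f} days of continuous probing")
print(f"One LSA-ranked list of {LIST_SIZE}: ~{AVG_HOURS} hours")
```

At roughly 55 days per full sweep, exhaustive probing could never keep pace with a blacklist that changes on shorter timescales, which is the point of the second implication.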
Extracted by claude-sonnet-4-6 — review before relying.