2007-crandall-conceptdoppler

ConceptDoppler: A Weather Tracker for Internet Censorship

Jedidiah R. Crandall, Daniel Zinn, Michael Byrd, Earl Barr, Rich East · ACM Conference on Computer and Communications Security (CCS) · 2007


Tags

censors: generic
techniques: measurement-platform

Findings extracted from this paper

ChinaNET (CHINANET-*) performed 324/389 = 83.3% of all filtering observed across 296 probed hosts over a two-week period, and 99.1% of the filtering that occurred at the first hop past the Chinese border, even though ChinaNET routers made up only 77% of the first-hop routers encountered.

§3.3.1 evaluation keyword-filtering rst-injection cn
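The headline share is a simple tally over filtering observations. The sketch below assumes a hypothetical record format (netname of the router where filtering was located, plus whether that router was the first hop past the border); `share_by_netname` and the synthetic data are illustrative, not the paper's tooling or measurements.

```python
# Hypothetical record format (not from the paper): one entry per filtering
# observation, giving the netname of the router where filtering was located
# and whether that router was the first hop past the Chinese border.
Observation = tuple[str, bool]  # (netname, is_first_hop)


def share_by_netname(observations: list[Observation], prefix: str,
                     first_hop_only: bool = False) -> float:
    """Fraction of filtering observations attributed to netnames starting with prefix."""
    # Optionally restrict the pool to first-hop observations only.
    pool = [obs for obs in observations if obs[1] or not first_hop_only]
    if not pool:
        return 0.0
    hits = sum(1 for netname, _ in pool if netname.startswith(prefix))
    return hits / len(pool)


# Synthetic data shaped to match the reported 324/389 split.
synthetic = [("CHINANET-BACKBONE", False)] * 324 + [("OTHER-NET", False)] * 65
print(f"{share_by_netname(synthetic, 'CHINANET'):.1%}")  # 83.3%
```

Passing `first_hop_only=True` gives the analogous share restricted to the first hop past the border, the basis of the 99.1% figure.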
GFC keyword filtering exhibits strong diurnal patterns: its effectiveness drops markedly during busy network periods, at times letting more than a quarter of packets containing known filtered keywords pass through unimpeded. The blocking timeout after a keyword-triggered RST was measured at 90 seconds on the tested route.

§3.2 detection keyword-filtering rst-injection cn
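A diurnal pattern like this can be surfaced by bucketing keyword probes by hour and computing the fraction that escaped a reset. The sketch below is a minimal illustration under stated assumptions: the `(hour_of_day, was_reset)` input format and all function names are hypothetical, and it assumes the 90-second post-RST block applies per host pair.

```python
from collections import defaultdict

# Value reported for the tested route; treat it as route-specific.
BLOCK_TIMEOUT_S = 90
# Flag hours where more than a quarter of keyword packets got through.
LEAKY_THRESHOLD = 0.25


def pass_rate_by_hour(probes):
    """probes: iterable of (hour_of_day, was_reset) for packets carrying a known
    filtered keyword. Returns {hour: fraction of those packets NOT reset}."""
    sent = defaultdict(int)
    passed = defaultdict(int)
    for hour, was_reset in probes:
        sent[hour] += 1
        if not was_reset:
            passed[hour] += 1
    return {hour: passed[hour] / sent[hour] for hour in sorted(sent)}


def leaky_hours(rates, threshold=LEAKY_THRESHOLD):
    """Hours in which more than `threshold` of keyword packets got through."""
    return [hour for hour, rate in rates.items() if rate > threshold]


def next_probe_time(last_rst_s: float) -> float:
    """Earliest time (seconds) to reuse the same host pair, assuming the
    post-RST block applies per host pair and lasts BLOCK_TIMEOUT_S."""
    return last_rst_s + BLOCK_TIMEOUT_S
```

Spacing probes by the timeout matters because a probe sent while a previous block is still active would likely be reset regardless of its content, which would inflate the apparent filtering rate.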
GFC keyword filtering is distributed across the backbone, not confined to border routers: only 29.6% of filtering occurred at the first hop into China's address space, 11.8% occurred beyond the third hop (with as many as 13 hops past the border in one case), and 28.3% of the 296 probed Chinese hosts were reachable via paths with no filtering at all.

§3.3 detection keyword-filtering rst-injection cn
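One way to locate the filtering router, assumed here rather than taken from this summary, is to resend a keyword-bearing probe with increasing IP TTL: the forged RST appears only once the TTL is large enough for the packet to reach the filtering hop. The sketch below works on hypothetical per-TTL results and a hypothetical border TTL (e.g. taken from a traceroute) and buckets the outcome into the categories reported above.

```python
from typing import Optional


def filtering_ttl(rst_by_ttl: dict[int, bool]) -> Optional[int]:
    """Smallest TTL whose keyword probe drew a reset, or None if none did."""
    hits = [ttl for ttl, got_rst in rst_by_ttl.items() if got_rst]
    return min(hits) if hits else None


def hops_past_border(rst_by_ttl: dict[int, bool], border_ttl: int) -> Optional[int]:
    """1 = first hop inside China's address space, 2 = the next hop, and so on.
    border_ttl is the TTL that reaches that first in-country hop."""
    ttl = filtering_ttl(rst_by_ttl)
    return None if ttl is None else ttl - border_ttl + 1


def bucket(hops: Optional[int]) -> str:
    """Map a hop count onto the reporting categories used in this finding."""
    if hops is None:
        return "no filtering"
    if hops < 1:
        return "before border"
    if hops == 1:
        return "first hop"
    if hops <= 3:
        return "second or third hop"
    return "beyond third hop"
```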
Latent semantic analysis applied to the Chinese-language Wikipedia (942,033 terms across 94,863 documents, k=600 rank reduction) discovered 122 previously unknown GFC-filtered keywords starting from only 12 seed concepts; each list of 2,500 candidate terms required 1.2–6.7 hours to probe, with an average of 3.5 hours.

§4.3 evaluation keyword-filtering measurement-platform cn
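The LSA step can be sketched at toy scale: build a term-document matrix, reduce it to rank k, and rank other terms by cosine similarity to a seed concept; the top of that ranking is the kind of candidate list that then gets probed. The weighting scheme and SVD solver used in the paper are not described here, so this sketch assumes raw counts and computes the rank-k term vectors (rows of U_k Σ_k) by power iteration on A·Aᵀ. That is a swapped-in technique suitable only for tiny inputs; a corpus of 942,033 terms would need a sparse truncated SVD. All names are hypothetical.

```python
import math
import random


def term_doc_matrix(docs, vocab):
    """A[i][j] = count of vocab[i] in docs[j] (raw counts, no weighting)."""
    index = {term: i for i, term in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for token in doc:
            if token in index:
                A[index[token]][j] += 1.0
    return A


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_eigenpairs(C, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    C = [row[:] for row in C]
    n = len(C)
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        for i in range(n):  # deflate so the next pass finds the next eigenpair
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_term_vectors(A, k):
    """Rows of U_k * Sigma_k, via eigenpairs of A A^T (eigenvalue = sigma^2)."""
    C = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]
    pairs = top_eigenpairs(C, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(len(A))]


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 0.0 if na == 0 or nb == 0 else sum(x * y for x, y in zip(a, b)) / (na * nb)


def related_terms(seed_term, vocab, vectors):
    """Other terms ranked by cosine similarity to seed_term in the reduced space."""
    s = vectors[vocab.index(seed_term)]
    scored = [(t, cosine(s, vec)) for t, vec in zip(vocab, vectors) if t != seed_term]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage: two topical clusters; k plays the role of the paper's k=600.
docs = [["falun", "gong", "protest"], ["falun", "gong", "protest"], ["falun", "gong"],
        ["weather", "rain"], ["weather", "rain", "forecast"]]
vocab = ["falun", "gong", "protest", "weather", "rain", "forecast"]
vectors = lsa_term_vectors(term_doc_matrix(docs, vocab), k=2)
print(related_terms("falun", vocab, vectors)[:2])
```

At the reported average of 3.5 hours per 2,500-term list, probing works out to roughly 5 seconds per candidate term (3.5 × 3600 / 2500 ≈ 5.04 s).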
When the GFC keyword blacklist is known, several server-side-only evasion techniques become viable that require no client modification: IP packet fragmentation to split keywords across MTU boundaries, HTML comment injection mid-keyword (e.g., 'Fa<!-- Comment -->lun Gong'), alternative URL percent-encodings (e.g., 'F%61lun Gong'), and spam-style character substitution ('F@1un G0-ng'). The GFC implementation was also observed not to check control characters in URL requests.

§5 defense keyword-filtering cn
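The rewrites above can be illustrated against a naive substring matcher. In this sketch the helper names and the matcher are hypothetical stand-ins, not a model of the GFC's actual matching logic. The fragmentation helper only mimics the effect of spreading a keyword over two packets at the string level; it does not build IP fragments. For the percent-encoded form, the server presumably publishes links as `F%61lun Gong`, an unmodified browser requests them in that form, and the server decodes the request back to the original text.

```python
from urllib.parse import unquote

# Hypothetical helpers illustrating server-side rewrites of a blacklisted keyword.


def inject_html_comment(word: str, pos: int, comment: str = " Comment ") -> str:
    """Split word with an HTML comment; a browser renders the text unchanged."""
    return f"{word[:pos]}<!--{comment}-->{word[pos:]}"


def percent_encode_at(word: str, pos: int) -> str:
    """Replace the character at pos with its %XX escape (ASCII characters only)."""
    return f"{word[:pos]}%{ord(word[pos]):02X}{word[pos + 1:]}"


def split_inside_keyword(payload: str, keyword: str) -> list[str]:
    """Cut payload mid-keyword, mimicking how fragmentation spreads a keyword
    over two packets. A censor that reassembles fragments would still match."""
    idx = payload.find(keyword)
    if idx < 0:
        return [payload]
    cut = idx + len(keyword) // 2
    return [payload[:cut], payload[cut:]]


def naive_filter(chunk: str, blacklist: list[str]) -> bool:
    """Stand-in censor: flag a chunk if it contains any blacklisted keyword verbatim."""
    return any(k in chunk for k in blacklist)


blacklist = ["Falun Gong"]
print(inject_html_comment("Falun Gong", 2))  # Fa<!-- Comment -->lun Gong
print(percent_encode_at("Falun Gong", 1))    # F%61lun Gong
print(unquote("F%61lun Gong"))               # Falun Gong
```

Each rewritten form slips past `naive_filter`, while the browser rendering (comment) or server-side decoding (percent-encoding) restores the original text for the reader.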