2015-ensafi-analyzing
findings extracted from this paper
-
The GFW blocks Tor primarily by dropping SYN/ACK segments entering China from blacklisted IP/port pairs, not by dropping SYN segments leaving China. Of 142,802 CN→Tor-Relay measurements, 81.52% were Server-to-client-dropped, versus only 0.55% Client-to-server-dropped. Measurements toward Tor directory authorities, by contrast, showed substantial Client-to-server drops (19.61%), suggesting that the GFW may treat authorities differently from ordinary relays.
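The percentages above are shares of labeled measurements. Below is a minimal sketch that tallies those shares and converts a reported percentage back into an approximate count. It assumes each measurement has already been reduced to one of the outcome labels named here. Any other labels the paper may use are counted in the total but not reported, and the function names are hypothetical.

```python
from collections import Counter

# Outcome labels named in the finding above; any other label is counted in
# the total but not reported as its own share.
OUTCOMES = ("Server-to-client-dropped", "Client-to-server-dropped", "No-packets-dropped")


def outcome_shares(labels):
    """Percentage of all measurements that fall into each named outcome."""
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        return {o: 0.0 for o in OUTCOMES}
    return {o: 100.0 * counts[o] / total for o in OUTCOMES}


def approx_count(share_percent, total):
    """Turn a reported percentage back into an approximate measurement count."""
    return round(share_percent / 100.0 * total)


if __name__ == "__main__":
    total = 142_802  # CN->Tor-Relay measurements
    print(approx_count(81.52, total))  # 116412 Server-to-client drops (approx.)
    print(approx_count(0.55, total))   # 785 Client-to-server drops (approx.)
```

Because shares are computed against every label, the three reported values need not sum to 100 if other outcome categories exist.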
-
GFW filtering failures (cases where blocked Tor traffic passed through) showed no conspicuous geographic pattern across China. The maximum observed Pearson correlation coefficient between neighboring clients' failure counts was only 0.26, a weak correlation, and failure cases were distributed geographically in proportion to Internet penetration rather than clustered by province or ISP region.
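The statistic quoted above can be sketched as follows. This assumes each client is summarized as a list of failure counts indexed by server, and that the set of "neighboring" client pairs is supplied from outside. The neighbor definition, data layout, and function names here are hypothetical. The sketch computes Pearson's r for each neighbor pair and reports the maximum, which is the quantity the finding gives as 0.26.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN when either series is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def max_neighbor_correlation(failures, neighbor_pairs):
    """Largest defined Pearson r over the given neighbor pairs.

    failures: dict client_id -> list of failure counts, one entry per server
    neighbor_pairs: iterable of (client_a, client_b) treated as geographic neighbors
    """
    best = float("-inf")
    for a, b in neighbor_pairs:
        r = pearson(failures[a], failures[b])
        if not math.isnan(r):
            best = max(best, r)
    return best
```

On Python 3.10 and later, `statistics.correlation` computes the same coefficient. The hand-written version above simply makes the formula explicit.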
-
GFW failures are both persistent and intermittent. For four client/server pairs, all 22 hourly measurements over a full day returned No-packets-dropped (i.e., the pair was entirely unblocked), while many other pairs showed only sporadic failures. Temporal analysis showed that failures cluster in bursts lasting hours: the probability of a second failure decays sharply beyond roughly 5 hours after the first.
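The decay described above can be quantified in a simple way. Below is a hedged sketch that assumes each client/server pair yields an hourly list of booleans, with True meaning the GFW failed to block that measurement. It pools all pairs and estimates the probability that a failure is followed by another failure exactly `lag` hours later. This is an illustrative estimator with a hypothetical name, not necessarily the analysis the paper used. Bursty failures would show up as high probabilities at small lags that fall off beyond about 5 hours.

```python
def pooled_recurrence(all_series, max_lag):
    """Estimate P(failure at hour t + lag | failure at hour t) for lag = 1..max_lag.

    all_series: list of per-pair hourly series, each a list of booleans
                (True = the GFW failed to block that hourly measurement).
    Returns dict lag -> probability, or None when no failure has a
    measurement `lag` hours after it.
    """
    hits = {lag: 0 for lag in range(1, max_lag + 1)}
    anchors = {lag: 0 for lag in range(1, max_lag + 1)}
    for series in all_series:
        for lag in range(1, max_lag + 1):
            # Only hours that have a measurement `lag` hours later can be anchors.
            for t in range(len(series) - lag):
                if series[t]:
                    anchors[lag] += 1
                    if series[t + lag]:
                        hits[lag] += 1
    return {
        lag: (hits[lag] / anchors[lag] if anchors[lag] else None)
        for lag in range(1, max_lag + 1)
    }
```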
-
Routing is the dominant structural factor in GFW failures. CERNET (the China Education and Research Network) accounted for 503 failures across 135 destination IPs, by far the most of any network. Packets transiting CERNET→CERNET links reached Tor destinations at a Tor-to-non-Tor traversal ratio of r = 0.9896, close to 1.0, meaning Tor traffic crossed these links almost as freely as ordinary traffic. Within the CHINANET and CNC Group backbones, the same ratio dropped to 0.403 and 0.272 respectively (Table 4), indicating heavy intra-ISP filtering.
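The ratio r can be read as the traversal rate of Tor-bound packets across a link type divided by the traversal rate of non-Tor control packets across the same link type. Under that reading, values near 1.0 mean no Tor-specific filtering on that link. Below is a minimal sketch under that assumed definition. The record format, link labels, and function name are hypothetical.

```python
from collections import defaultdict


def traversal_ratios(records):
    """Per link type: (Tor traversal rate) / (non-Tor traversal rate).

    records: iterable of (link_type, is_tor, traversed) tuples, where
             `traversed` says whether the packet made it across that link.
    A ratio is None when it cannot be computed (no Tor probes, no control
    probes, or no control probe ever traversed the link).
    """
    # counts[link][is_tor] = [number traversed, number sent]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for link, is_tor, traversed in records:
        bucket = counts[link][is_tor]
        bucket[1] += 1
        if traversed:
            bucket[0] += 1

    ratios = {}
    for link, c in counts.items():
        tor_hit, tor_n = c[True]
        ctl_hit, ctl_n = c[False]
        if tor_n == 0 or ctl_n == 0 or ctl_hit == 0:
            ratios[link] = None
        else:
            ratios[link] = (tor_hit / tor_n) / (ctl_hit / ctl_n)
    return ratios
```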
-
The hybrid idle scan technique turns approximately 1% of the total IPv4 address space into passive measurement vantage points without requiring control of either the censored client or the destination server. This enabled full bipartite connectivity measurements between 161 geographically stratified Chinese clients and 176 servers over 27 days. After pruning for data quality, 36% of the raw measurements were usable. ARMA modeling was chosen over Hidden Markov Models because only level-shift detection was needed.
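Below is a simplified, single-trial sketch of the IP ID side channel that underlies the idle scan. It is not the paper's exact procedure. The paper models background IP ID noise with ARMA and looks for level shifts, while this sketch substitutes a fixed noise estimate and hard thresholds. It assumes:

- the client uses a global IP ID counter that increments by one per packet sent;
- a SYN/ACK that reaches the client elicits exactly one RST;
- a server whose SYN/ACK goes unanswered retransmits it.

Under those assumptions, a spoofed SYN leads to a client IP ID increase of 0 (Server-to-client-dropped), 1 (No-packets-dropped), or 2 or more (Client-to-server-dropped). The thresholds and function names are hypothetical.

```python
IP_ID_MODULUS = 1 << 16  # the IPv4 Identification field is 16 bits wide


def ip_id_delta(before, after):
    """IP ID increase between two replies from the client, allowing for wraparound."""
    return (after - before) % IP_ID_MODULUS


def classify(before, after, background_rate, window_s):
    """Classify one spoofed-SYN trial from the change in the client's IP ID.

    before, after: IP IDs seen in the client's replies to two probes that
                   bracket the spoofed SYN.
    background_rate: estimated IP ID increments per second caused by the
                     client's unrelated traffic (assumed known here).
    window_s: seconds between the two probes.
    """
    # Subtract 1 because the client's reply to the first probe itself used an
    # IP ID, then subtract the expected background increments.
    excess = ip_id_delta(before, after) - 1 - background_rate * window_s
    if excess < 0.5:
        # No RST sent: the server's SYN/ACK never reached the client.
        return "Server-to-client-dropped"
    if excess < 1.5:
        # One RST: the SYN/ACK arrived and the RST presumably reached the server.
        return "No-packets-dropped"
    # Several RSTs: the server kept retransmitting, so its RSTs were dropped.
    return "Client-to-server-dropped"
```

A single noisy trial can easily be misclassified. This is why repeated trials and a time-series model such as ARMA are needed in practice.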