2023-raman-global
findings extracted from this paper
-
Manual analysis of 700+ unique packet groupings from possibly tampered connections yielded 19 high-confidence tampering signatures (up from 6 in prior work), which together cover 86.9% of all possibly tampered connections. Post-SYN signatures account for 43.2% of possibly tampered connections (99.5% matching a known signature), post-ACK for 16.1% (98.7%), and post-first-data-packet (PSH+ACK) for 5.3% (97.9%). Table 1 describes each signature as a flag-sequence pattern of the form ⟨X→Y⟩.
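
One way to read the ⟨X→Y⟩ notation is "the client sent X, then the connection ended with Y". The Python sketch below buckets a flag sequence by the stage at which a reset arrived. The function name `classify_stage`, the string encoding of flags, and the rule that only the packet immediately before the first reset decides the stage are assumptions made for illustration; this is not the paper's matching logic, and timeout-based (silent drop) signatures are left out.

```python
from typing import List, Optional

# Hypothetical stage labels, keyed by the last flag set seen before the reset.
STAGE_BY_LAST_FLAGS = {
    "SYN": "post-SYN",
    "ACK": "post-ACK",
    "PSH+ACK": "post-PSH",
}

RESET_FLAGS = {"RST", "RST+ACK"}


def classify_stage(flags: List[str]) -> Optional[str]:
    """Return the stage at which the first reset arrived.

    Returns None when no reset is present, when the reset is the first
    packet, or when the preceding packet is not one of the three stages
    modeled here.
    """
    for i, flag in enumerate(flags):
        if flag in RESET_FLAGS:
            if i == 0:
                return None  # nothing precedes the reset
            return STAGE_BY_LAST_FLAGS.get(flags[i - 1])
    return None  # no reset observed


if __name__ == "__main__":
    # Illustrative sequences only, not measured data.
    print(classify_stage(["SYN", "RST"]))
    print(classify_stage(["SYN", "ACK", "RST+ACK"]))
    print(classify_stage(["SYN", "ACK", "PSH+ACK", "RST"]))
```

A lookup keyed on the last pre-reset packet keeps the sketch small; a faithful implementation would need the full signature table and timing information.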
-
Sampling 1-in-10,000 TCP connections at Cloudflare's 285+ PoPs (serving ~17–20% of the Internet's websites, handling 45M HTTP requests/second at average load) over two weeks in January 2023 revealed that 25.7% of all sampled connections were 'possibly tampered.' The passive technique requires no vantage points inside censored networks, so it also covers cellular networks, enterprise networks, and networks in countries with low Internet penetration, which active measurement cannot reach.
-
Post-handshake tampering signatures (⟨SYN;ACK→RST⟩ and ⟨SYN;ACK→RST+ACK⟩) constitute 34.4% of tampered connections from Iranian networks, but over 70% from Sri Lankan networks and over 81% from networks in Turkmenistan. This suggests that censors in the latter two countries disproportionately block at the IP/TCP-handshake level, before any application-layer content is visible, which is consistent with IP-list-based blocking rather than SNI-based DPI.
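
A per-country share like the ones above is a simple ratio: connections matching either post-handshake signature, divided by all tampered connections from that country. The sketch below shows the arithmetic; the record format (a country code paired with a signature string) and the name `post_handshake_share` are hypothetical, and the sample records in the usage lines are made up.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# The two post-handshake signatures named in the finding, as plain strings.
POST_HANDSHAKE = {"SYN;ACK→RST", "SYN;ACK→RST+ACK"}


def post_handshake_share(records: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Map each country to the fraction of its tampered connections
    that match a post-handshake signature."""
    total: Counter = Counter()
    matched: Counter = Counter()
    for country, signature in records:
        total[country] += 1
        if signature in POST_HANDSHAKE:
            matched[country] += 1
    # Every country in `total` has at least one record, so no division by zero.
    return {country: matched[country] / total[country] for country in total}


if __name__ == "__main__":
    # Made-up records for illustration; not data from the paper.
    sample = [
        ("AA", "SYN;ACK→RST"),
        ("AA", "SYN→RST"),
        ("BB", "SYN;ACK→RST+ACK"),
    ]
    print(post_handshake_share(sample))
```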
-
Censoring middleboxes predominantly use RST injection rather than in-path packet dropping, because injecting forged RST/RST+ACK packets does not require the middlebox to sit in the data path; off-path copies of packets suffice. The Great Firewall (GFW) specifically injects both RST and RST+ACK packets simultaneously after an offending PSH, a known idiosyncratic signature. Iran's censor, by contrast, uses post-handshake RST injection (⟨SYN;ACK→RST⟩) as well as packet drops at the same stage.
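
The dual-reset pattern described for the GFW can be checked mechanically: look for both a bare RST and a RST+ACK somewhere after the first data packet. The sketch below is a minimal version under stated assumptions: flags are encoded as strings, `has_dual_reset_after_psh` is a hypothetical name, and packet timing (the "simultaneously" part) is ignored because the sketch sees only packet order.

```python
from typing import List


def has_dual_reset_after_psh(flags: List[str]) -> bool:
    """True if both a RST and a RST+ACK appear after the first PSH+ACK."""
    if "PSH+ACK" not in flags:
        return False
    # Only packets that follow the first data packet are considered.
    tail = flags[flags.index("PSH+ACK") + 1:]
    return "RST" in tail and "RST+ACK" in tail


if __name__ == "__main__":
    # Illustrative sequences only, not captured traffic.
    print(has_dual_reset_after_psh(["SYN", "ACK", "PSH+ACK", "RST", "RST+ACK"]))
    print(has_dual_reset_after_psh(["SYN", "ACK", "PSH+ACK", "RST"]))
```

A match on this pattern alone would not attribute a connection to any particular censor; it only flags the flag combination.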
-
Passive measurement of real user connections demonstrates that published active-measurement test lists (Citizen Lab, Herdict, GreatFire, Berkman Klein, and top-K lists) miss a considerable fraction of domains that are actively being tampered with, as confirmed in §5.5. Because passive measurement is driven by real user requests rather than an a priori domain list, it can discover blocked domains that were never included in any test list and has no dependency on volunteers providing ground truth.
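
The coverage gap can be framed as a set difference: domains seen in tampered connections, minus the union of all test lists. The sketch below illustrates that framing only; the function name `missed_by_test_lists`, the list labels, and the example domains are hypothetical, and real test lists would need normalization (case, subdomains, URL-to-domain extraction) that is omitted here.

```python
from typing import Dict, Set


def missed_by_test_lists(observed: Set[str], test_lists: Dict[str, Set[str]]) -> Set[str]:
    """Return observed tampered domains that appear in none of the test lists."""
    # Union of every list; yields an empty set when no lists are given.
    covered: Set[str] = set().union(*test_lists.values())
    return observed - covered


if __name__ == "__main__":
    # Made-up domains and list names for illustration.
    observed = {"a.example", "b.example", "c.example"}
    lists = {
        "list_one": {"a.example"},
        "list_two": {"b.example", "d.example"},
    }
    print(missed_by_test_lists(observed, lists))
```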