2020-niaki-iclab
findings extracted from this paper
-
ICLab's semi-automated block page discovery — combining HTML tag-frequency vector clustering with locality-sensitive hashing (LSH) of page text — identified 48 previously unknown block page signatures from 13 countries: 15 via structural clustering across 5 countries and 33 via textual similarity clustering across 8 countries. The system seeds from 308 manually verified regular expressions and uses a URL-to-country ratio sort (largest ratio discovered: 286) to prioritize candidates for manual review, eliminating reliance on brittle hand-maintained regex lists alone.
-
Between January 2017 and September 2018, ICLab conducted 53,906,532 measurements of 45,565 URLs across 62 countries and 234 ASes, detecting blocking of 3,602 unique URLs in 60 countries via DNS manipulation, TCP packet injection, and block page delivery. Iran blocked 20–30% of Alexa top-500 URLs — more than any other monitored country — while Saudi Arabia consistently blocked roughly 10%. The global trend in detected censorship shows a steady decrease, which the authors attribute to rising adoption of TLS and circumvention tools.
-
ICLab's longitudinal monitoring detected censorship shifts coinciding with political events weeks before press coverage: Turkey's filtering rate rose from roughly 3% to 5% in late April 2017 — with blocked content shifting from pornography to news and political sites — ahead of a June 2017 constitutional referendum. India's censorship dropped from roughly 2% to 0.8% following a net neutrality announcement in late 2017, then partially recovered to roughly 1.5% after mid-2018 regulations clarified that illegal-content filtering would continue. Within the same country, different blocking techniques were applied to different content categories simultaneously (e.g., Turkey used DNS manipulation for illegal/streaming URLs but block pages for pornography and news).
-
Of 19,493,925 TCP packet injection events ICLab detected, only 0.7% (143,225) could be definitively attributed to censorship after multi-heuristic filtering; a further 58% (15,589,882) were RST-or-ICMP-unreachable events classified only as 'probable censorship' because ordinary network failure could not be excluded. Block pages appeared in just 3.4% of definitively-censored injections, meaning the vast majority of censor-side TCP disruption is covert. DNS manipulation detection achieved a false positive rate of approximately 10⁻⁴ using a threshold of θ=11 autonomous systems, cross-checked against block page observations.
-
ICLab's commercial VPN vantage points reside in data-center ('content') ASes for 41% of monitored networks, which may experience less aggressive censorship than residential ISPs, making VPN-based measurements a systematic lower bound on blocking rates faced by ordinary users. In countries where both VPN and volunteer-operated device (VOD) vantages coexist, identical block pages were observed from both AS types, indicating similar overt blocking policies, but covert IP-based or RST-injection blocking may still differ by AS class.