2021-hoang-great
findings extracted from this paper
-
GFWatch tested 534M distinct domains over 9 months (averaging 411M/day) and detected 311K censored domains, the largest such measurement in the literature. Of the 138.7K censored base domains, only 1.3% appear among the 100K most popular domains, confirming that the GFW targets large numbers of obscure, unpopular domains far beyond well-known sites like Facebook or Twitter.
-
A circumvention strategy that holds DNS responses and discards those matching the known forged-IP pool achieves 99.8% accuracy, correctly classifying 1,005,444,476 of 1,007,002,451 poisoned resolutions. From inside China, 99% of forged responses arrive ahead of the legitimate response by at most 364ms, which establishes 364ms as the recommended hold-on duration. From outside China, 11% of forged responses arrive after the legitimate one, so the check against the forged-IP pool is needed to avoid misclassifying genuine responses as poisoned.
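The hold-on-and-filter idea above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `pick_answer`, the `(arrival_ms, ip)` tuple format, and the sample addresses (taken from documentation-only ranges) are assumptions made for the example. Only the 364ms window comes from the finding.

```python
from typing import Iterable, Optional, Set, Tuple

HOLD_ON_MS = 364  # hold-on window reported for the inside-China vantage point


def pick_answer(
    responses: Iterable[Tuple[int, str]],
    forged_pool: Set[str],
    hold_on_ms: int = HOLD_ON_MS,
) -> Optional[str]:
    """Return the first answer, within the hold-on window, not in the forged pool.

    `responses` holds hypothetical (arrival_ms, ip) pairs for one query.
    The window starts when the first response arrives.
    """
    ordered = sorted(responses)
    if not ordered:
        return None
    first_ms = ordered[0][0]
    for arrival_ms, ip in ordered:
        if arrival_ms - first_ms > hold_on_ms:
            break  # stop waiting once the hold-on window has elapsed
        if ip not in forged_pool:
            return ip  # first response that is not a known forged address
    return None  # only forged answers were seen inside the window


# Hypothetical forged pool and answers (documentation address ranges).
forged = {"203.0.113.7"}
forged_first = pick_answer([(0, "203.0.113.7"), (120, "198.51.100.10")], forged)
legit_first = pick_answer([(0, "198.51.100.10"), (50, "203.0.113.7")], forged)
too_late = pick_answer([(0, "203.0.113.7"), (500, "198.51.100.10")], forged)
```

Because the function checks every held response against the pool rather than simply taking the last one, it handles both orderings: a forged response that arrives first, and a forged response that trails the legitimate one.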
-
The GFW's DNS filtering is bidirectional: it poisons DNS queries regardless of whether they originate inside or outside China. As a result, it has polluted the caches of major public DNS resolvers worldwide: Google (74,715 censored domains), Cloudflare (71,560), OpenNIC (65,567), and OpenDNS (63,295), with 77K censored domains found polluted in total. The problem is compounded because 38% of censored base domains (53K) have at least one authoritative name server inside China, which ensures systematic external pollution for those domains.
-
GFWatch discovered 1,781 unique forged IPv4 addresses used in GFW DNS poisoning, yet injection is non-random: only 600 (33.6%) account for 99% of all censored responses, with the remainder in a long tail responsible for just 1%. The forged IPv4 pool is dominated by addresses belonging to Facebook (783 IPs, 44%), WZ Communications (277, 15.6%), Twitter (200, 11.2%), and Dropbox (180, 10.1%); all forged IPv6 responses use the bogus Teredo prefix 2001::/32.
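Since every forged IPv6 answer falls in 2001::/32, a prefix test is one simple way to flag suspicious AAAA responses. The sketch below uses Python's standard `ipaddress` module; the helper name `in_teredo_prefix` and the sample addresses are assumptions for illustration, and a match is only a hint, since legitimate Teredo traffic uses the same prefix.

```python
import ipaddress

# Prefix named in the finding above (the Teredo range).
TEREDO = ipaddress.ip_network("2001::/32")


def in_teredo_prefix(addr: str) -> bool:
    """True if `addr` is an IPv6 address inside 2001::/32."""
    ip = ipaddress.ip_address(addr)
    return ip.version == 6 and ip in TEREDO


# Hypothetical sample answers.
inside = in_teredo_prefix("2001::1")
outside = in_teredo_prefix("2001:db8::1")  # 2001:0db8::/32 is a different /32
ipv4 = in_teredo_prefix("192.0.2.1")
```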
-
The GFW uses substring-matching regular expressions rather than exact domain matching, so 41K of the 311K censored domains are overblocked: they are unrelated domains that happen to contain a censored domain string. The three base domains responsible for the most overblocking (919.com, jetos.com, 33a.com) together caused 15K unrelated domains to be inadvertently censored.
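The effect of matching without respect for label boundaries can be shown with two toy rules. This is an illustrative sketch, not the GFW's actual rule set: the pattern shapes, the function names, and every test name except 919.com are assumptions.

```python
import re
from typing import List


def loose_rule(censored: str) -> "re.Pattern[str]":
    # Matches any name that merely ends with the censored string.
    return re.compile(r".*" + re.escape(censored) + r"$")


def strict_rule(censored: str) -> "re.Pattern[str]":
    # Matches only the censored domain itself or its true subdomains.
    return re.compile(r"(?:.*\.)?" + re.escape(censored) + r"$")


def overblocked(names: List[str], censored: str) -> List[str]:
    """Names caught by the loose rule that the strict rule would spare."""
    loose, strict = loose_rule(censored), strict_rule(censored)
    return [n for n in names if loose.match(n) and not strict.match(n)]


# Hypothetical test names; only 919.com comes from the finding above.
names = ["919.com", "www.919.com", "shop919.com", "example.org"]
collateral = overblocked(names, "919.com")
```

Here `shop919.com` is blocked only because its name ends with the string `919.com`, which is the kind of collateral match the finding describes.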