FINDING · DETECTION
The GFW's blocking rules use substring-matching regular expressions rather than exact domain matching. As a result, 41K of the 311K censored domains are overblocked: they are unrelated domains that happen to contain the string of a censored base domain. The three base domains responsible for the most overblocking (919.com, jetos.com, 33a.com) together caused 15K unrelated domains to be inadvertently censored.
From 2021-hoang-great — How Great is the Great Firewall? Measuring China's DNS Censorship · §4.1 · 2021 · USENIX Security Symposium
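To make the mechanism concrete, here is a minimal Python sketch contrasting an unanchored pattern with one anchored at a label boundary. The two patterns and the sample names (such as `shop919.com`) are illustrative assumptions, not the GFW's actual rules or domains taken from the paper's dataset.

```python
import re

# One of the base domains named in the finding, used purely as an example.
BASE = "919.com"

# Unanchored: matches any name that contains the base string anywhere.
loose = re.compile(re.escape(BASE))

# Anchored at a label boundary: matches only the base domain and its subdomains.
strict = re.compile(r"(^|\.)" + re.escape(BASE) + r"$")


def is_blocked(pattern, domain):
    """Return True if the (hypothetical) blocking pattern matches the domain."""
    return pattern.search(domain.lower()) is not None


# Hypothetical names: the first two are related to the base, the third is not.
for name in ["919.com", "www.919.com", "shop919.com"]:
    print(name, is_blocked(loose, name), is_blocked(strict, name))
```

Under these assumptions the unrelated name `shop919.com` is caught by the loose pattern but not by the strict one, which is the kind of collateral match the finding describes.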
Implications
- Domain names chosen for circumvention infrastructure should be audited against the GFW base domain list to avoid substring collisions that cause automatic blocking.
- Even newly registered domains with innocuous content can be collaterally censored if their names textually contain a blocked base domain string, so operators should screen candidate domains before deployment.
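The screening step suggested above could be sketched roughly as follows. The function name `substring_collisions` and the inline list of blocked base domains are hypothetical; in practice the list would come from a measured dataset of censored base domains, and plain substring containment is only an approximation of whatever patterns the GFW actually applies.

```python
from typing import Iterable, List


def substring_collisions(candidate: str, blocked_bases: Iterable[str]) -> List[str]:
    """Return the blocked base domains whose text occurs inside the candidate name.

    Assumption: simple case-insensitive substring containment stands in for
    the censor's matching rules, which are not reproduced here.
    """
    name = candidate.strip().lower().rstrip(".")
    bases = {b.strip().lower() for b in blocked_bases}
    return sorted(b for b in bases if b and b in name)


# Hypothetical blocklist built from the three base domains named in the finding.
BLOCKED = ["919.com", "jetos.com", "33a.com"]

# Hypothetical candidate names an operator might be considering.
for candidate in ["my33a.com", "relay-example.net"]:
    hits = substring_collisions(candidate, BLOCKED)
    status = "collides with " + ", ".join(hits) if hits else "no collision found"
    print(candidate + ": " + status)
```

An empty result only means no collision with the supplied list; it says nothing about other blocking rules or about domains censored after the list was compiled.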
Extracted by claude-sonnet-4-6 — review before relying.