2026-sheffey-geedge
findings extracted from this paper
-
The largest single source of censored domains in the Geedge Networks Leak (GNL) is MESA lab's SNI monitoring data: E21-SNI-Top200w.txt contains 57,362 censored domains and E21-SNI-Top120W-20221020.txt contains 36,467, totaling over 93K domains from network tap data alone for a single country (E21 = Ethiopia per InterSecLab attribution). A separate Xinjiang dataset (XJ-CUCC-SNI-Top200w.txt) contains 13,604 domains. These datasets "do not seem to come from popular domain lists, and instead appear to be gathered from network taps," confirming that Geedge builds censorship target lists directly from passive traffic observation.
-
Of 6,915,266 domains extracted from the 572 GiB Geedge Networks Leak (GNL), 298,955 censored domains (93.7% of all GNL-censored domains) appear in neither the Tranco top-1M list nor the CitizenLab test lists. Measurements from China (Guangzhou/Nanjing), Myanmar, Pakistan, and Algeria confirmed censorship via DNS injection and SNI-based TLS connection termination. Depending on the country, the GNL covers 25–62% of the Tranco domains found to be censored, so the overlap is substantial but incomplete. This vendor-side ground truth reveals a censorship surface roughly two orders of magnitude larger than curated academic test lists.
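The two overlap figures above (the share of censored domains missing from public lists, and the share of a public list's censored domains that the leak covers) are both plain set arithmetic. A minimal sketch follows, assuming exact matching on lower-cased domain names; the paper's actual matching rule (for example, by registered domain) is not stated in these notes, and the function names and toy data are hypothetical.

```python
def normalize(domain: str) -> str:
    """Lower-case a domain and drop surrounding whitespace and any trailing dot."""
    return domain.strip().lower().rstrip(".")


def unlisted_share(censored, *reference_lists):
    """Return (count, fraction) of censored domains absent from every reference list."""
    censored_set = {normalize(d) for d in censored}
    known = set()
    for ref in reference_lists:
        known.update(normalize(d) for d in ref)
    unlisted = censored_set - known
    fraction = len(unlisted) / len(censored_set) if censored_set else 0.0
    return len(unlisted), fraction


def coverage(leak_domains, censored_reference):
    """Fraction of a reference list's censored domains that also appear in the leak."""
    leak_set = {normalize(d) for d in leak_domains}
    ref_set = {normalize(d) for d in censored_reference}
    return len(ref_set & leak_set) / len(ref_set) if ref_set else 0.0


# Toy data, invented purely for illustration (not taken from the leak).
gnl_censored = ["a.example", "b.example", "C.example.", "d.example"]
tranco = ["a.example", "x.example"]
citizenlab = ["b.example"]

count, fraction = unlisted_share(gnl_censored, tranco, citizenlab)
print(count, fraction)  # 2 0.5
```

With the real inputs, `unlisted_share` would correspond to the 298,955 / 93.7% figure and `coverage` to the per-country 25–62% range, provided the same matching rule is used.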
-
The GNL reveals that Geedge actively maintains dedicated VPN-infrastructure tracking datasets. The China-specific component includes 7,016 domains in a "vpn-finder-plugins" repository (mesalab_git/intelligence-learning-engine) and 4,810 NordVPN server domains. A Pakistan-specific file dated April 2024 (geedge_docs/TSGEN/.../Psiphon-CDN_20240430.json) lists 68 Psiphon CDN domains, and a Myanmar deployment file (M22-VPN List.html, 27 domains) further confirms that country-specific VPN blocklists are operationally maintained. The "Appsketch" program reverse-engineers VPN apps to extract domains and IP addresses for blocking.