2017-weinberg-topics
findings extracted from this paper
-
Survival analysis of 423,265 pages with Wayback Machine histories shows that pages on politically controversial topics have substantially shorter lifetimes than pages on uncontroversial topics. For probe-list purposes, topic change must be treated as 'death' alongside page deletion, since a page that switches topic no longer contains the sensitive material that made it censorship-relevant.
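
To make the 'topic change counts as death' rule concrete, here is a minimal sketch of how a page's lifetime might be derived from a snapshot history. The snapshot tuple layout, the topic labels, and the function name `page_lifetime` are assumptions for illustration and are not taken from the paper. A page that is still alive and on-topic at its last snapshot is treated as right-censored.

```python
from datetime import date

def page_lifetime(snapshots):
    """Return (lifetime_in_days, died) for one page.

    snapshots: list of (date, alive, topic) tuples sorted by date
    (hypothetical layout). The first snapshot defines the original topic.
    A page 'dies' at the first snapshot where it is gone OR its topic differs.
    """
    first_date, _, first_topic = snapshots[0]
    for when, alive, topic in snapshots[1:]:
        if not alive or topic != first_topic:
            return (when - first_date).days, True
    # No death observed: the lifetime is right-censored at the last snapshot.
    return (snapshots[-1][0] - first_date).days, False

# Page that stays online but switches topic: counted as dead.
changed = page_lifetime([
    (date(2014, 1, 1), True, "politics"),
    (date(2014, 7, 1), True, "politics"),
    (date(2015, 1, 1), True, "shopping"),
])

# Page still on-topic at its last snapshot: censored observation.
surviving = page_lifetime([
    (date(2014, 1, 1), True, "politics"),
    (date(2014, 7, 1), True, "politics"),
])
```

The `(duration, died)` pairs are the usual input shape for a survival estimator such as Kaplan-Meier; this sketch does not assume which estimator the paper used.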
-
China's Great Firewall adds sites to its blacklist within hours of their becoming newsworthy and drops them again just as quickly. By contrast, Pakistan's pornography crackdown used a rarely updated blocklist, causing 50% of consumption to shift to unlisted sites. An outdated probe list will therefore underestimate the effectiveness of the GFW and overestimate the effectiveness of censorship in countries with static lists.
-
Analysis of 758,191 URLs across 22 probe lists found near-zero URL-level Jaccard similarity between nearly all list pairs (most < 0.01), including between country blacklists; even at hostname level, blacklists share little with each other or with researcher-curated lists like ONI's 12,107-URL list, indicating that any single probe list systematically misses large portions of what is actually censored.
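
The URL-level versus hostname-level comparison can be sketched as follows. This is an illustrative example, not the paper's code: the helper names and the two tiny lists are made up, and hostnames are extracted with the standard library's `urlsplit`, which may differ from the normalization the authors applied.

```python
from urllib.parse import urlsplit

def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention for two empty lists
    return len(a & b) / len(union)

def hostnames(urls):
    """Lower-cased hostnames of the URLs; entries without a host are skipped."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.lower())
    return hosts

# Hypothetical probe lists that share a site but no exact URL.
list_a = ["http://example.com/a", "http://example.org/news"]
list_b = ["http://example.com/b", "http://example.net/"]

url_sim = jaccard(list_a, list_b)                         # no URL in common
host_sim = jaccard(hostnames(list_a), hostnames(list_b))  # one host in common
```

Two lists can have zero overlap at URL level while still sharing hosts, which is why the finding reports both granularities.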
-
Topic correlation analysis across 2,904 list-topic pairs (585 significant after Bonferroni correction at α = 0.05) shows social media is disproportionately represented in country blacklists relative to the broader web; video-sharing sites are also frequently blocked, likely to suppress political organization, copyright infringement, or competition with local businesses.
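
For reference, Bonferroni correction divides the significance level by the number of tests, so with 2,904 list-topic pairs at α = 0.05 each individual test must reach p ≤ 0.05 / 2904 ≈ 1.7e-5. A minimal sketch follows; the function names and the sample p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

def significant(p_values, alpha=0.05):
    """Flag which p-values survive correction for len(p_values) tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# Threshold implied by the counts quoted in the finding above.
threshold = bonferroni_threshold(0.05, 2904)

# Made-up p-values: with 3 tests the threshold is 0.05 / 3.
flags = significant([0.001, 0.02, 0.04])
```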
-
Syria's 2015 blocklist contained a disproportionately large share of software-related sites because censors applied indiscriminate TLD-based blocking of all .il (Israeli) domain names regardless of content, demonstrating that non-topic-based criteria (country-code TLD, ASN) can sweep in entirely unrelated infrastructure and are detectable only through anomaly spot-checks rather than content analysis.
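
One way to support such a spot-check is to compare the share of each top-level domain in a blocklist against a baseline list and flag large discrepancies for manual review. The sketch below is an assumption-laden illustration, not the procedure used in the paper: the function names, the ratio cutoff of 5, and the sample URLs are all hypothetical, and the last hostname label is used as a crude stand-in for the TLD.

```python
from collections import Counter
from urllib.parse import urlsplit

def tld_shares(urls):
    """Fraction of URLs under each TLD (last hostname label, crude)."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname
        if not host or "." not in host:
            continue
        tld = host.rsplit(".", 1)[-1].lower()
        if tld.isdigit():
            continue  # skip IPv4 literals such as 192.0.2.1
        counts[tld] += 1
    total = sum(counts.values())
    return {tld: n / total for tld, n in counts.items()} if total else {}

def overrepresented(blocklist_urls, baseline_urls, ratio=5.0):
    """TLDs whose blocklist share is >= ratio times their baseline share.

    TLDs absent from the baseline are flagged as well. The cutoff is an
    arbitrary illustrative value, not one taken from the paper.
    """
    block = tld_shares(blocklist_urls)
    base = tld_shares(baseline_urls)
    flagged = []
    for tld, share in block.items():
        base_share = base.get(tld, 0.0)
        if base_share == 0.0 or share / base_share >= ratio:
            flagged.append(tld)
    return sorted(flagged)

# Made-up data: .il is 75% of the blocklist but 10% of the baseline.
blocklist = ["http://a.example.il/", "http://b.example.il/",
             "http://c.example.il/", "http://news.example.com/"]
baseline = ["http://x.example.com/"] * 9 + ["http://y.example.il/"]

flagged = overrepresented(blocklist, baseline)
```

A flagged TLD is only a prompt for manual inspection; it does not by itself show that the censor blocked by TLD rather than by content.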