FINDING · EVALUATION
Clustering block pages by term frequency vector achieves an F-1 measure of 0.98 against manually identified block-page templates; clustering by page length performs far worse, with an F-1 measure of 0.64. Across the full ONI dataset, which spans five years of measurements, only 37 distinct term frequency vectors were found, indicating that filtering vendors rarely change block-page HTML structure.
From 2014-jones-automated — Automated Detection and Fingerprinting of Censorship Block Pages · §5.1, §5.2 · 2014 · Internet Measurement Conference
Implications
- The structural stability of block-page templates (only 37 distinct vectors over five years) suggests that a small, largely static signature library is enough to identify which commercial filtering product is in use. Circumvention tools can use this to tailor evasion to a specific vendor.
- HTML-structure clustering is more reliable than page-length heuristics for fingerprinting, so diagnostic tooling should prefer tag-frequency vectors over size thresholds when attributing blocking to a specific filtering product.
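
The preference for tag-frequency vectors over size thresholds can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the helper names (`TagCounter`, `tag_vector`, `group_by_vector`) and the sample pages are hypothetical, and pages are grouped by exact vector match, which is a simplification rather than the clustering procedure evaluated in the paper.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags seen while parsing an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> count together.
        self.counts[tag] += 1


def tag_vector(html):
    """Return a hashable tag-frequency vector for one page (hypothetical helper)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return frozenset(parser.counts.items())


def group_by_vector(pages):
    """Group page names whose tag-frequency vectors match exactly.

    Assumption: exact matching stands in for a real clustering step.
    """
    groups = defaultdict(list)
    for name, html in pages.items():
        groups[tag_vector(html)].append(name)
    return sorted(sorted(names) for names in groups.values())


# Made-up sample pages: "a" and "b" share a template but differ in length.
pages = {
    "a": "<html><body><h1>Blocked</h1><p>Site A is blocked.</p></body></html>",
    "b": "<html><body><h1>Blocked</h1><p>Another, much longer notice about site B.</p></body></html>",
    "c": "<html><body><div><p>Access denied</p><p>Contact admin</p></div></body></html>",
}
clusters = group_by_vector(pages)
```

Here pages "a" and "b" land in the same group even though their byte lengths differ, because the vector depends only on tag counts and not on the text inside the tags. A length threshold would have to tolerate that variation without merging unrelated templates.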
Extracted by claude-sonnet-4-6 — review before relying.