FINDING · EVALUATION
A Random Forest classifier trained solely on structural features of third-party request trees achieves a ROC AUC of 0.81 and 72% balanced accuracy across 4,660 news domains with ≥50 daily observations. When the inclusion threshold is raised to ≥100 and ≥150 daily observations, ROC AUC falls to 0.78 and 0.68 respectively, a drop driven by the smaller training set rather than by feature quality.
From 2025-sivan-sevilla-probing — Probing the third-party infrastructure of digital news on the Web · §5.1, Table 3 · 2025 · Free and Open Communications on the Internet
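For readers less familiar with the two metrics reported in this finding, the following is a minimal sketch of how ROC AUC and balanced accuracy are computed for a binary classifier. The labels, scores, and the 0.5 decision threshold are invented toy values, and the function names `roc_auc` and `balanced_accuracy` are illustrative; nothing here reproduces the paper's data, features, or model.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen positive
    example is scored above a randomly chosen negative one (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Mean of the recall on the positive class and the recall on the
    negative class, so a majority class cannot inflate the score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


# Toy data (assumption): 3 positive and 5 negative examples with
# made-up classifier scores.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.5, 0.1]

# Hypothetical decision threshold of 0.5 for turning scores into labels.
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
bal_acc = balanced_accuracy(labels, predictions)
print(f"ROC AUC: {auc:.2f}, balanced accuracy: {bal_acc:.2f}")
```

ROC AUC depends only on how the scores rank the examples, while balanced accuracy depends on the chosen threshold, which is why the two numbers in the finding can move independently.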
Implications
- Third-party request structure alone is a strong signal for site classification; circumvention infrastructure operators should expect similar structural fingerprinting to be applied to proxy or mirror sites.
- Model performance is highly sensitive to training-set breadth: classifiers built on small domain samples degrade substantially. This suggests that actively expanding the diversity of circumvention infrastructure reduces classifier confidence.
Extracted by claude-sonnet-4-6 — review before relying.