2024-tsai-modeling
findings extracted from this paper
-
CenDTect, an unsupervised decision-tree system using iterative parallel DBSCAN, analyzed more than 70 billion Censored Planet data points (January 2019 – December 2022) and discovered 15,360 HTTP(S) censorship event clusters across 192 countries and 1,166 DNS event clusters across 77 countries. Manual validation against 38 known censorship events from news reports confirmed that all human-identified events were recoverable from CenDTect's output. The system additionally identified more than 100 ASes in 32 countries with persistent ISP-level blocking and 11 temporary blocking events in 2022 correlated with elections, protests, and armed conflict.
-
CenDTect uses cross-classification accuracy (how well a decision tree trained on one domain's blocking pattern predicts another domain's blocking) as a distance metric to cluster domains that share the same blocking policy. Its advantage over prior time-series approaches is interpretability: the resulting decision tree directly reveals the blocking mechanism (which ISP, which port, which protocol) instead of producing an opaque anomaly score. The approach scales to planet-scale measurement volumes without requiring labelled training data.
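
The cross-classification idea can be illustrated with a minimal sketch. It is not CenDTect's implementation: the tiny categorical tree, the symmetrized distance (one minus the mean of the two cross-accuracies), the feature names `asn` and `protocol`, and the toy measurements are all assumptions made for illustration. Only the Python standard library is used.

```python
from collections import Counter

FEATURES = ["asn", "protocol"]  # hypothetical vantage-point features


def majority(labels):
    # Most common label; ties go to the label seen first.
    return Counter(labels).most_common(1)[0][0]


def train_tree(rows, labels, features):
    """Tiny categorical decision tree (a stand-in, not the paper's trees)."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)

    def errors_after_split(feature):
        # Misclassifications if each branch predicted its majority label.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) - Counter(g).most_common(1)[0][1]
                   for g in groups.values())

    best = min(features, key=errors_after_split)
    rest = [f for f in features if f != best]
    branches = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = branches.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return {
        "feature": best,
        "default": majority(labels),
        "children": {value: train_tree(r, l, rest)
                     for value, (r, l) in branches.items()},
    }


def predict(tree, row):
    # Walk down until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


def cross_distance(rows, labels_a, labels_b, features):
    """1 - mean cross-classification accuracy (symmetrization is assumed)."""
    tree_a = train_tree(rows, labels_a, features)
    tree_b = train_tree(rows, labels_b, features)
    acc_ab = sum(predict(tree_a, r) == y for r, y in zip(rows, labels_b)) / len(rows)
    acc_ba = sum(predict(tree_b, r) == y for r, y in zip(rows, labels_a)) / len(rows)
    return 1.0 - (acc_ab + acc_ba) / 2


# Toy measurements: the same four vantage points probe every domain.
rows = [
    {"asn": "AS1", "protocol": "http"},
    {"asn": "AS1", "protocol": "https"},
    {"asn": "AS2", "protocol": "http"},
    {"asn": "AS2", "protocol": "https"},
]
blocked_a = [True, True, False, False]   # blocked everywhere in AS1
blocked_b = [True, True, False, False]   # same policy as domain a
blocked_c = [False, True, False, True]   # blocked on https in both ASes

dist_ab = cross_distance(rows, blocked_a, blocked_b, FEATURES)
dist_ac = cross_distance(rows, blocked_a, blocked_c, FEATURES)
root_feature_a = train_tree(rows, blocked_a, FEATURES)["feature"]
print(dist_ab, dist_ac, root_feature_a)
```

In this toy data, domains a and b share a policy and end up at distance 0, while domain c sits farther away. The root split of domain a's tree names the feature that explains the blocking, which is the interpretability property described above. A pairwise distance matrix built this way could then be passed to a density-based clusterer such as DBSCAN.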