2024-calle-toward

Toward Automated DNS Tampering Detection Using Machine Learning

Paola Calle, Larissa Savitsky, Arjun Nitin Bhagoji, Nguyen Phong Hoang, Shinyoung Cho · Free and Open Communications on the Internet · 2024


Tags

censors: generic
techniques: dns-poisoning ml-classifier
defenses: amp-cache

findings extracted from this paper

Majority-vote ML inference (OCSVM + IF) over OONI data uncovered at least 5 previously undocumented DNS injection IPs active in Russia (e.g., 195.19.90.226, 95.167.13.51, 61.95.167.13.50, 188.19.132.154, 144.85.142.29.248) absent from OONI's existing blocking-fingerprints database, along with novel fingerprints in Italy, Czech Republic, and the UK. Records with fewer than 50 instances were excluded as a conservative false-positive filter.

§4.3, Table 5 detection dns-poisoning ml-classifier measurement-platform ru
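The voting-and-filtering step in this finding can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each flagged measurement carries the returned answer IP plus one binary vote per model (1 = tampered), and the names `majority_vote` and `candidate_injection_ips` are hypothetical. With only two voters, as in the OCSVM + IF pairing, a strict majority in this sketch means both models must agree.

```python
from collections import Counter

# Threshold from the finding: answer IPs seen fewer than 50 times are dropped.
MIN_INSTANCES = 50


def majority_vote(votes):
    """Return 1 (tampered) only if strictly more than half of the votes are 1."""
    return int(sum(votes) * 2 > len(votes))


def candidate_injection_ips(records, known_fingerprints, min_instances=MIN_INSTANCES):
    """Return {answer_ip: count} for new, frequent, ensemble-flagged IPs.

    records: iterable of (answer_ip, votes) pairs, where votes holds one
    binary prediction per model (hypothetical record layout).
    """
    # Count how often each answer IP appears in measurements the ensemble flags.
    counts = Counter(ip for ip, votes in records if majority_vote(votes))
    # Keep IPs that clear the frequency filter and are not already known.
    return {
        ip: n
        for ip, n in counts.items()
        if n >= min_instances and ip not in known_fingerprints
    }


if __name__ == "__main__":
    # Documentation-range IPs (RFC 5737), not real injection addresses.
    records = [("192.0.2.1", [1, 1])] * 60 + [("192.0.2.3", [1, 0])] * 100
    print(candidate_injection_ips(records, known_fingerprints=set()))
    # prints {'192.0.2.1': 60}
```

In practice the set of known fingerprints would be drawn from OONI's blocking-fingerprints database; here it is simply a Python set of IP strings.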
XGBoost trained on a single month of OONI data already achieves near-optimal performance: expanding the training window to 24 months shifts FNR by only 0–5 percentage points (PP), FPR by 0.07 PP, and accuracy by 0.10 PP, suggesting that larger windows add noise and risk overfitting rather than improving detection. Isolation Forest degrades more sharply, with accuracy dropping by about 5 PP once the training window exceeds 6 months.

§4.2, Figure 3 evaluation dns-poisoning ml-classifier measurement-platform generic
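One way to set up this training-window comparison is sketched below. It assumes, hypothetically, that each window consists of the calendar months immediately preceding the test month and that metrics are stored as rates in [0, 1]; `training_months` and `pp_deviation` are illustrative names, not part of OONI or the paper's code.

```python
def training_months(test_month, window):
    """Return the `window` months before test_month ('YYYY-MM'), oldest first."""
    year, month = map(int, test_month.split("-"))
    months = []
    for _ in range(window):
        # Step back one calendar month, wrapping from January to December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        months.append(f"{year:04d}-{month:02d}")
    return months[::-1]


def pp_deviation(baseline, other):
    """Absolute per-metric difference in percentage points.

    Both arguments map metric names (e.g. 'fnr', 'fpr', 'acc') to rates in [0, 1].
    """
    return {name: abs(other[name] - baseline[name]) * 100 for name in baseline}


if __name__ == "__main__":
    print(training_months("2023-03", 4))  # ['2022-11', '2022-12', '2023-01', '2023-02']
```

Comparing each window's metrics against the 1-month baseline with `pp_deviation` yields deviations in the same percentage-point units the finding reports.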
For the Isolation Forest model, resolver ASN (SHAP importance 0.237) and probe ASN (0.220) are the two most predictive features for DNS tampering, reflecting that censorship is topologically concentrated at specific network vantage points. For XGBoost, headers_match dominates (0.317), followed by asn_control_match (0.177), indicating that supervised models rely more on cross-layer consistency signals. DNS tampering represents only 0.5–0.8% of all OONI measurements across 2022–2023 (Figure 2), creating severe class imbalance in any training set.

§4.1, Table 4, Figure 2 detection dns-poisoning ml-classifier measurement-platform generic
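To make the two quantities in this finding concrete, the sketch below ranks features by mean absolute SHAP value (a common global-importance convention; the paper's exact aggregation is not stated here) and converts a tampering prevalence into a negative-to-positive weight of the kind passed to XGBoost's `scale_pos_weight`. Whether the authors reweighted classes this way is an assumption. At 0.5% prevalence the ratio is 199:1, and at 0.8% it is 124:1. Feature names such as `resolver_asn` are hypothetical column labels.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP value| across explained samples.

    shap_rows: list of per-sample SHAP vectors, each aligned with feature_names.
    """
    n = len(shap_rows)
    importance = [
        sum(abs(row[j]) for row in shap_rows) / n for j in range(len(feature_names))
    ]
    # Highest mean absolute contribution first.
    return sorted(zip(feature_names, importance), key=lambda pair: pair[1], reverse=True)


def positive_class_weight(prevalence):
    """Negative-to-positive ratio for a minority-class prevalence in (0, 1).

    Often used as XGBoost's scale_pos_weight for imbalanced binary tasks.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return (1.0 - prevalence) / prevalence


if __name__ == "__main__":
    # Toy SHAP values for two hypothetical features, not values from the paper.
    rows = [[0.5, -0.1], [-0.3, 0.1]]
    print(mean_abs_shap(rows, ["resolver_asn", "probe_asn"]))
    print(round(positive_class_weight(0.005)), round(positive_class_weight(0.008)))  # prints 199 124
```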
XGBoost achieves a False Positive Rate of 0.0005, True Positive Rate of 0.9403, and overall accuracy of 0.9991 on OONI global DNS measurement data (2.5% stratified sample), vastly outperforming unsupervised alternatives: Isolation Forest achieves FPR 0.1321 / ACC 0.8699, and One-Class SVM degrades to FPR 0.9711 / ACC 0.0598, making OCSVM effectively unusable for this task.

§4.1, Table 3 evaluation dns-poisoning ml-classifier measurement-platform generic
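The FPR, TPR and accuracy in this finding follow the standard confusion-matrix definitions, sketched below together with a per-stratum sampler in the spirit of the 2.5% stratified sample. The stratification key is an assumption (the finding does not say which attribute was stratified on), and `stratified_sample` and `rates` are hypothetical helper names.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw about `fraction` of rows independently within each stratum key(row)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        # Keep at least one row so small strata are not lost entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def rates(y_true, y_pred):
    """FPR, TPR and accuracy for binary labels (1 = tampered, 0 = clean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "fpr": _ratio(fp, fp + tn),
        "tpr": _ratio(tp, tp + fn),
        "acc": _ratio(tp + tn, len(pairs)),
    }


if __name__ == "__main__":
    print(rates([1, 1, 0, 0, 0], [1, 0, 1, 0, 0]))  # fpr = 1/3, tpr = 0.5, acc = 0.6
```

Because tampered measurements are a tiny minority, accuracy alone can look excellent even for a weak detector; reporting FPR and TPR alongside it exposes failures such as the OCSVM result above.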