2019-sheffey-improving
findings extracted from this paper
-
An adaptive censor that retrains its classifier on both unmodified and GAN-transformed Meek traffic (the 'informed NN') only partially recovers detection capability. Against modified traffic, the informed NN achieves a PR-AUC of 0.440 versus 0.309 for the naive NN, and an FPR of 0.667 versus 1.000. However, the informed NN suffers from catastrophic interference: on unmodified data its FPR is far worse than the naive classifier's (0.545 vs. 0.002).
-
A GAN-based adversarial transformer applied to Meek traffic signatures increases mean classifier FPR from 0.183 to 0.834 and decreases mean area under the precision-recall curve (PR-AUC) from 0.990 to 0.414. These means are taken across the naive neural network, informed neural network, and CART decision tree classifiers, evaluated on three geographically distinct datasets (residential, university, AWS).
-
The paper notes a limitation in its dataset construction: Meek traffic is compared against average HTTPS traffic across all domains rather than against traffic to the specific CDN fronting host (e.g., ajax.aspnetcdn.com for meek-azure). A transformed signature that mimics generic HTTPS may therefore still appear anomalous relative to the traffic expected for that host. Real-world GAN-guided shaping would need to target host-specific traffic baselines, not population-wide HTTPS baselines.
-
Prior ML classifiers achieve near-perfect detection of unmodified Meek traffic using side-channel features: Wang et al. attain a false positive rate (FPR) as low as 0.0002 with a CART decision tree, Yao et al. achieve 99.98% accuracy with a hidden Markov model, and Nasr et al. deanonymize Meek flows with an FPR of 0.0005 using a neural network. The distinguishing features are the TCP payload size distribution (Meek payloads are concentrated in the 60–70 byte range) and the inter-arrival time distribution (Meek shows higher latency).
-
Incorporating a perturbation loss (the mean absolute difference between original and transformed traffic signatures) into the GAN's training objective constrains the transformer to make minimal modifications, reducing the implementation overhead a real-time traffic shaper would require. The perturbation loss is weighted at 10× relative to the classification losses, which favors sparse modifications while still fooling the discriminator.
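
The perturbation loss described above can be illustrated with a minimal sketch. It assumes a traffic signature is a fixed-length list of numbers (for example, histogram bin values) and that the weighted perturbation term is simply added to a single scalar classification loss. The function names, the signature representation, and the additive combination are illustrative assumptions, not the paper's implementation; only the mean-absolute-difference definition and the 10× weight come from the finding.

```python
# Hypothetical sketch: names and the additive combination are assumptions.
PERTURBATION_WEIGHT = 10.0  # 10x weighting relative to classification losses


def perturbation_loss(original, transformed):
    """Mean absolute difference between two equal-length signatures."""
    if len(original) != len(transformed):
        raise ValueError("signatures must have the same length")
    if not original:
        raise ValueError("signatures must be non-empty")
    total = sum(abs(o - t) for o, t in zip(original, transformed))
    return total / len(original)


def transformer_loss(classification_loss, original, transformed,
                     weight=PERTURBATION_WEIGHT):
    """Classification loss plus the weighted perturbation penalty.

    classification_loss is assumed to be a precomputed scalar measuring
    how well the transformed signature fools the discriminator.
    """
    return classification_loss + weight * perturbation_loss(original, transformed)


if __name__ == "__main__":
    original = [1.0, 2.0, 3.0, 4.0]
    transformed = [1.0, 2.0, 3.0, 8.0]  # only one bin is modified
    print(perturbation_loss(original, transformed))
    print(transformer_loss(0.5, original, transformed))
```

Because the penalty grows linearly with every modified bin and is scaled by 10, a transformer trained against an objective of this shape is pushed toward changing as few signature entries as it can while still lowering the classification loss.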