2024-wails-precisely
findings extracted from this paper
-
CNN-based deep learning reduces obfs4 false positive rate by an order of magnitude versus the best decision tree (FPR 2.9×10⁻³ vs. 3×10⁻²) while maintaining 100% recall, and achieves near-perfect Snowflake data-flow detection (Precλ=1k = 0.95, Fλ=1k = 0.97). However, at realistic base rates λ > 10⁶ all CNN classifiers still yield near-zero precision, leaving per-flow deep learning alone insufficient for nation-state-scale deployment.
-
The paper identifies that circumvention systems relying on long-lived, consistent proxy servers are fundamentally vulnerable to host-based temporal detection regardless of per-flow obfuscation quality, and recommends adversarial examples, ephemeral obfuscation servers, and programmable or polymorphic protocols as countermeasures. Snowflake's volunteer-browser proxy architecture—where proxies are ephemeral and addresses are not reused—is highlighted as inherently more resistant to host-based classification than static bridge designs like obfs4.
-
State-of-the-art ML-based obfs4 detection (Wang et al. decision tree) achieves 97% precision at equal base rates (λ=1) but precision collapses to 3% at a still-conservative λ=1,000; at λ=10⁶ precision approaches zero for all classifiers tested. This base-rate failure was previously uncharacterized because prior evaluations only considered balanced or near-balanced datasets.
-
Combining a CNN flow classifier with host-based temporal accumulation eliminates all false positive classifications after observing at most 38 flows per host while maintaining perfect recall for all obfs4 and obfs⋆ bridges. The scheme requires only 14 bits of state per (IP, port) pair; tracking 4×10⁹ destination services requires no more than 50 GiB of storage, feasible on commodity hardware.
-
obfs4 and obfs⋆ produce characteristic wire patterns—bursts of roughly MTU-sized payloads followed by a randomly-sized chaff packet—that CNN classifiers detect purely from packet-size sequences without payload inspection. A trivial per-bridge entropy-biasing re-encoding (obfs⋆) completely defeats the hand-tuned decision tree (0% precision, 0% recall) but does not reduce CNN detectability, because the CNN generalizes across size-distribution variants.