2013-dyer-protocol

Protocol Misidentification Made Easy with Format-Transforming Encryption

Kevin P. Dyer, Scott E. Coull, Thomas Ristenpart, Thomas Shrimpton · Computer and Communications Security · 2013


Tags

censors: generic
techniques: dpi
defenses: format-transform marionette

Findings extracted from this paper

Manually generated FTE regexes achieve a 100% misclassification rate against all six tested DPI systems (appid, l7-filter, YAF, bro, nProbe, and the proprietary enterprise-grade DPI-X) for the HTTP, SSH, and SMB target protocols. Each regex took less than 30 minutes to specify and debug against the known classifiers.

§4.2, Figure 4 defense dpi ml-classifier generic
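To illustrate why a classifier that relies only on regex membership is fooled, here is a minimal sketch with two hypothetical, simplified signatures (not the actual rules of any of the six tested systems) and a toy format-transform that hex-encodes ciphertext into an HTTP request line. Hex encoding is a stand-in chosen for brevity; the paper's FTE instead maps ciphertexts into the chosen regex's language by ranking, which packs far more data per byte.

```python
import re

# Hypothetical, simplified signatures in the spirit of regex-based DPI.
# They are NOT the real l7-filter/appid/YAF/bro/nProbe/DPI-X rules.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST) /[^ ]* HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-2\.0-[^\r\n]*\r\n"),
}


def classify(payload: bytes) -> str:
    """Return the first protocol whose regex matches the payload, else 'unknown'."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return name
    return "unknown"


def toy_http_format(ciphertext: bytes) -> bytes:
    """Toy format-transform: hex-encode the ciphertext into a GET path.

    Every output lies in the regular language
    GET /[0-9a-f]* HTTP/1.1\\r\\n\\r\\n, which the HTTP signature accepts.
    """
    return b"GET /" + ciphertext.hex().encode() + b" HTTP/1.1\r\n\r\n"


def toy_http_unformat(message: bytes) -> bytes:
    """Invert toy_http_format: extract the hex path and decode it."""
    path = message[len(b"GET /"):message.index(b" HTTP/1.1")]
    return bytes.fromhex(path.decode())
```

For example, a 32-byte ciphertext such as `bytes(range(32))` is classified as `"unknown"` on its own, but `classify(toy_http_format(ct))` returns `"http"`, and `toy_http_unformat` recovers the original bytes on the receiving side.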
FTE proxy overhead compared to socks-over-ssh: the intersection-ssh format incurred no average latency increase (0%) and only 16% bandwidth overhead (1,348 KB vs. 1,164 KB for socks-over-ssh, per Alexa Top 50 site). The worst-case auto-http format incurred a 29% latency increase (7.1 s vs. 5.5 s) and 181% bandwidth overhead (3,279 KB), primarily due to ciphertext expansion and FTE/SOCKS negotiation on persistent TCP connections that carried no data.

§5, Figures 5–6 evaluation dpi generic
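The overhead percentages can be checked directly. This short sketch assumes, as the percentages imply, that 1,164 KB and 5.5 s are the socks-over-ssh baseline figures; `pct_increase` is a hypothetical helper name.

```python
def pct_increase(baseline: float, measured: float) -> float:
    """Relative overhead of `measured` over `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# Assumed baseline: socks-over-ssh, averaged per Alexa Top 50 site.
BASELINE_KB = 1164
BASELINE_S = 5.5

bandwidth_ssh = pct_increase(BASELINE_KB, 1348)   # intersection-ssh
latency_http = pct_increase(BASELINE_S, 7.1)      # auto-http
bandwidth_http = pct_increase(BASELINE_KB, 3279)  # auto-http

print(f"intersection-ssh bandwidth: +{bandwidth_ssh:.1f}%")  # +15.8%
print(f"auto-http latency:          +{latency_http:.1f}%")   # +29.1%
print(f"auto-http bandwidth:        +{bandwidth_http:.1f}%") # +181.7%
```

The computed values reproduce the stated 16%, 29%, and 181% figures (the last being 181.7% before truncation), which supports reading the smaller numbers as the baseline.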
An FTE-tunneled Tor circuit using the intersection, manual, and auto HTTP formats successfully traversed the Great Firewall of China from a VPS inside China to a server in the United States on port 80. A persistent tunnel polling a censored URL every five minutes remained active for one month, until the VPS account was terminated, with no blocking observed.

§6 deployment dpi active-probing cn
Default Tor connections to a private bridge inside China were detected by the Great Firewall via active probing: an initial connection succeeded, and approximately 15 minutes later a probe from a Chinese IP address performed a TLS handshake with the bridge, after which the bridge's (IP, port) combination was blacklisted. In subsequent connection attempts the SYN got through, but spoofed TCP RSTs then terminated both the client and bridge connections.

§6 detection active-probing rst-injection cn
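The observed sequence (flag, delayed probe, blacklist, RST injection) can be modeled as a small state machine. This is a hypothetical toy model under stated assumptions, not the Great Firewall's actual logic: a flow is flagged when passive DPI thinks it looks like Tor, and the `endpoint_speaks_tor` callback stands in for the probe's handshake check.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

Endpoint = tuple[str, int]  # (IP, port)
PROBE_DELAY_S = 15 * 60     # roughly the 15-minute delay reported above


@dataclass
class ToyCensor:
    """Hypothetical model of probe-then-blacklist; not the GFW's real logic."""
    blacklist: set[Endpoint] = field(default_factory=set)
    # Scheduled probes as (due_time_seconds, endpoint) pairs.
    pending_probes: list[tuple[float, Endpoint]] = field(default_factory=list)

    def on_connection(self, now: float, endpoint: Endpoint, looks_like_tor: bool) -> str:
        """Decide what happens to a client connection to `endpoint` at time `now`."""
        if endpoint in self.blacklist:
            # The SYN gets through, then spoofed RSTs tear down both sides.
            return "syn-ok-then-rst"
        if looks_like_tor:
            # Passive DPI flagged the flow; schedule an active probe for later.
            self.pending_probes.append((now + PROBE_DELAY_S, endpoint))
        return "allowed"

    def run_probes(self, now: float, endpoint_speaks_tor: Callable[[Endpoint], bool]) -> None:
        """Fire every probe that is due; blacklist endpoints that answer like Tor."""
        still_pending = []
        for due, endpoint in self.pending_probes:
            if due > now:
                still_pending.append((due, endpoint))
            elif endpoint_speaks_tor(endpoint):
                self.blacklist.add(endpoint)
        self.pending_probes = still_pending
```

In this toy model, a bridge at a documentation address such as `("203.0.113.7", 443)` is allowed on first contact, blacklisted once `run_probes` runs at or after the 15-minute mark, and reset on every later connection. A flow that does not look like Tor to passive DPI, as with an FTE-formatted tunnel, never schedules a probe at all.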
Regex-based DPI is fundamentally vulnerable to format-transforming encryption: because every tested system (including the proprietary enterprise-grade DPI-X, rated for 1.5 Gbps at $8,000) classifies protocols solely by membership in a regular language, an FTE scheme can guarantee that every ciphertext it outputs matches any chosen regex. The paper argues this forces DPI to adopt machine learning, active probing, or non-regular semantic checks, but notes that making such checks fast, scalable, and low in false positives at line rate for arbitrary target protocols remains an open problem.

§3, §7 detection dpi ml-classifier active-probing generic
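The guarantee rests on ranking a regular language: a bijection between integers in [0, |L_n|) and the strings of length n accepted by a DFA. The sketch below shows rank/unrank over a small hand-written DFA (strings over {a, b} with no "bb"); the DFA, alphabet, and function names are hypothetical, and the encryption step that produces an integer below `count(START, n)` is omitted. A real FTE format would compile its DFA from the user-chosen regex.

```python
from functools import lru_cache

# Hypothetical example DFA for the language "strings over {a, b} with no 'bb'".
ALPHABET = "ab"
START = 0
ACCEPTING = {0, 1}
DELTA = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0}  # (1, "b") has no transition


@lru_cache(maxsize=None)
def count(state: int, length: int) -> int:
    """Number of strings of exactly `length` symbols accepted starting from `state`."""
    if length == 0:
        return 1 if state in ACCEPTING else 0
    return sum(
        count(DELTA[(state, sym)], length - 1)
        for sym in ALPHABET
        if (state, sym) in DELTA
    )


def unrank(c: int, length: int) -> str:
    """Map c in [0, count(START, length)) to the c-th accepted string in lex order."""
    if not 0 <= c < count(START, length):
        raise ValueError("c out of range for this length")
    state, out = START, []
    for remaining in range(length, 0, -1):
        for sym in ALPHABET:
            nxt = DELTA.get((state, sym))
            if nxt is None:
                continue
            n = count(nxt, remaining - 1)
            if c < n:  # the c-th string starts with this symbol
                out.append(sym)
                state = nxt
                break
            c -= n     # skip every string that starts with this symbol
    return "".join(out)


def rank(s: str) -> int:
    """Inverse of unrank: index of accepted string s among strings of its length."""
    state, c = START, 0
    for i, ch in enumerate(s):
        remaining = len(s) - i - 1
        for sym in ALPHABET:
            if sym == ch:
                break
            nxt = DELTA.get((state, sym))
            if nxt is not None:
                c += count(nxt, remaining)  # strings that sort before s here
        state = DELTA[(state, ch)]  # KeyError if s leaves the language
    if state not in ACCEPTING:
        raise ValueError("string not accepted")
    return c
```

With length 4 there are `count(START, 4) == 8` accepted strings, and `unrank` enumerates them as `aaaa, aaab, aaba, abaa, abab, baaa, baab, baba`; `rank` maps each back to its index. In an FTE-style pipeline the sender would unrank an encrypted integer into a string the DPI regex accepts, and the receiver would rank it to recover the integer.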