DEFENSES
format-transform Format-transforming encryption
Synonyms: FTE
10 papers on file
- 2025-niere-transport Transport Layer Obscurity: Circumventing SNI Censorship on the TLS-Layer
- 2025-sheffey-extended Extended Abstract: I’ll Shake Your Hand: What Happens After DNS Poisoning
- 2024-niere-http-smuggling Turning Attacks into Advantages: Evading HTTP Censorship with HTTP Request Smuggling
- 2024-niere-tls-attacker TLS-Attacker: A Dynamic Framework for Analyzing TLS Implementations
- 2023-niere-poster Poster: Circumventing the GFW with TLS Record Fragmentation
- 2020-oakley-protocol Protocol Proxy: An FTE-based covert channel
- 2017-tanash-decline The Decline of Social Media Censorship and the Rise of Self-Censorship after the 2016 Failed Turkish Coup
- 2015-dyer-marionette Marionette: A Programmable Network-Traffic Obfuscation System
- 2014-luchaup-libfte LibFTE: A Toolkit for Constructing Practical, Format-Abiding Encryption Schemes
- 2013-dyer-protocol Protocol Misidentification Made Easy with Format-Transforming Encryption
23 findings tagged here
-
The Henan Firewall is stateless in two exploitable ways: (1) it only inspects segments whose TCP header is exactly 20 bytes, so adding any TCP option (e.g., TCP Timestamps, which Windows disables by default) bypasses it entirely; (2) it does not perform TCP reassembly, so splitting a TLS ClientHello across two TCP segments such that the SNI extension straddles the boundary also bypasses the censor. Both bypasses require only client-side changes and have already been implemented in Xray, GoodbyeDPI, and Shadowrocket. TLS record fragmentation (splitting the ClientHello across multiple TLS records within one TCP segment) also defeats both the Henan Firewall and the GFW, since neither performs TLS reassembly.
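The record-fragmentation bypass changes only framing, because TLS allows a single handshake message to span several records. The following is a minimal sketch that assumes the caller already holds a complete ClientHello record and has located an offset inside the SNI hostname. The function name `fragment_tls_record` is hypothetical, and this is not the Xray or GoodbyeDPI code:

```python
import struct

# TLS record header: 1-byte content type, 2-byte legacy version, 2-byte length.
HEADER = struct.Struct("!BHH")


def fragment_tls_record(record: bytes, split_at: int) -> bytes:
    """Re-frame one TLS record as two records carrying the same bytes.

    `split_at` is an offset into the record's fragment (for example, a
    position inside the SNI hostname). Both output records keep the
    original content type and version; only the length fields change.
    """
    if len(record) < HEADER.size:
        raise ValueError("record shorter than the 5-byte TLS header")
    content_type, version, length = HEADER.unpack_from(record)
    fragment = record[HEADER.size:]
    if length != len(fragment):
        raise ValueError("length field does not match the fragment size")
    if not 0 < split_at < length:
        raise ValueError("split point must fall strictly inside the fragment")
    pieces = (fragment[:split_at], fragment[split_at:])
    return b"".join(HEADER.pack(content_type, version, len(p)) + p for p in pieces)
```

Writing the returned bytes with a single `send` call usually keeps both records in one TCP segment, although TCP itself does not guarantee segment boundaries.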
-
The GFW's QUIC censor does not reassemble QUIC client Initial packets that are split across multiple UDP datagrams, nor does it reassemble QUIC CRYPTO frames split within a single datagram. Three practical bypasses follow: (1) send any UDP datagram with a random payload before the QUIC Initial; because the GFW keeps 60-second UDP flow state, it does not inspect the Initial as a mid-flow packet; (2) fragment the TLS ClientHello SNI across multiple QUIC CRYPTO frames; (3) use an unknown QUIC version number in the first packet, which triggers Version Negotiation and leaves the censor with a payload it cannot decrypt. Chrome exercises both reassembly gaps on its own: its Chaos Protection feature (since 2021) splits the ClientHello across CRYPTO frames, and its post-quantum Kyber key agreement (since v124, Sep 2024) enlarges the ClientHello enough to force fragmentation across UDP datagrams. As of January 2025, the GFW also does not block ECH-containing QUIC payloads unless the outer (cleartext) SNI is on the blocklist.
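Bypass (2) only requires choosing where to cut the ClientHello when building CRYPTO frames. The sketch below assumes that the plaintext ClientHello bytes and the cut offsets are already known. It builds only the unprotected frame payload; a real client would still apply Initial packet protection (RFC 9001) and padding. The function names are hypothetical. CRYPTO frames (type 0x06) and the variable-length integer encoding follow RFC 9000:

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")                          # prefix 00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")               # prefix 01
    if value < 1 << 30:
        return (value | 0x8000_0000).to_bytes(4, "big")          # prefix 10
    if value < 1 << 62:
        return (value | 0xC000_0000_0000_0000).to_bytes(8, "big")  # prefix 11
    raise ValueError("value too large for a QUIC varint")


def crypto_frames(client_hello: bytes, split_points: list) -> bytes:
    """Emit CRYPTO frames covering `client_hello`, cut at `split_points`.

    Each frame is: type 0x06, Offset (varint), Length (varint), data.
    """
    if any(not 0 < p < len(client_hello) for p in split_points):
        raise ValueError("split points must fall strictly inside the ClientHello")
    cuts = [0, *sorted(set(split_points)), len(client_hello)]
    frames = []
    for start, end in zip(cuts, cuts[1:]):
        chunk = client_hello[start:end]
        frames.append(b"\x06" + encode_varint(start) + encode_varint(len(chunk)) + chunk)
    return b"".join(frames)
```

Because every CRYPTO frame carries its own offset, the server reassembles the ClientHello regardless of the order in which the frames arrive.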
-
Marionette, the prior programmable protocol system, executes user-specified plugin code in a generic Python runtime, making proxies and clients vulnerable to a malicious or buggy protocol distributor and creating a single point of failure in distributed networks like Tor. Marionette also lacks support for multiple simultaneous protocols and version upgrades, limiting its ability to respond to changing censorship rules across heterogeneous client populations.
-
Classical public-key steganography (Algorithm 1 from [54]) has a 100% failure rate when encoding a 16-byte message using GPT-2, because GPT-2's per-token entropy frequently drops to near zero and standard rejection sampling then cannot find an acceptable token. Entropy bounding reduces the failure rate to 0–10% but introduces detectable statistical bias: the selected tokens come from a visibly different probability distribution than baseline samples.
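The failure mode can be reproduced with a simplified stand-in for rejection-sampling steganography. This is not Algorithm 1 of [54]: it uses a symmetric keyed hash, and the `next_token_dist` callback, the toy distributions, and the `max_tries` bound are all hypothetical. Each output token must hash to the next message bit, so when the model's distribution collapses onto a single token, every bit that disagrees with that token's hash is impossible to encode:

```python
import hashlib
import random


def hash_bit(token: str, key: bytes) -> int:
    """Keyed one-bit hash of a token; the receiver recomputes it to decode."""
    return hashlib.sha256(key + token.encode()).digest()[0] & 1


def embed(bits, next_token_dist, key, rng, max_tries=16):
    """Rejection-sample one cover token per message bit.

    Returns the token list, or None when some bit cannot be matched
    within `max_tries` draws from the model's next-token distribution.
    """
    tokens = []
    for bit in bits:
        dist = next_token_dist(tokens)
        words = list(dist)
        weights = [dist[w] for w in words]
        for _ in range(max_tries):
            tok = rng.choices(words, weights=weights)[0]
            if hash_bit(tok, key) == bit:
                tokens.append(tok)
                break
        else:
            return None  # no acceptable token found: encoding fails
    return tokens


def extract(tokens, key):
    return [hash_bit(t, key) for t in tokens]


# Hypothetical model outputs: one collapsed (near-zero entropy), one flat.
def collapsed(_prefix):
    return {"the": 1.0}


def flat(_prefix):
    return {f"w{i}": 1.0 for i in range(256)}


key = b"shared-key"
msg = [(byte >> i) & 1 for byte in b"16-byte message!" for i in range(8)]  # 128 bits
print(embed(msg, collapsed, key, random.Random(0)))  # None: half the bits are unreachable
```

With the flat distribution, each draw matches the needed bit with probability close to one half, so the same call almost always succeeds.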
-
Protocol Proxy uses 'protected static protocols' — UDP-based protocols whose blocking causes severe collateral damage (e.g., Synchrophasor power-grid traffic, NTP) — as cover channels. Because any detection rule that fires on Protocol Proxy traffic also fires on legitimate PMU traffic, censors face a forced trade-off between blocking circumvention and disrupting critical infrastructure.
-
A deterministic Hidden Markov Model trained on 770,000+ real Synchrophasor samples produces interpacket timing that is statistically indistinguishable from the host protocol: the two-sample Kolmogorov–Smirnov test yields p = 0.21, so the null hypothesis of identical distributions is not rejected at the 0.05 level, and χ² homogeneity tests for the three timing states yield p = 0.82, 0.37, and 0.15, respectively.
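The two-sample KS statistic D is the largest vertical gap between the two empirical CDFs. The following is a standard-library sketch that uses the asymptotic p-value Q_KS(sqrt(nm/(n+m)) * D). In practice `scipy.stats.ks_2samp` is the usual tool, and for small samples its exact p-values can differ from this approximation. The function names are hypothetical, and real inputs would be interpacket gaps from captured traffic:

```python
import bisect
import math


def kolmogorov_q(lam: float) -> float:
    """Asymptotic tail Q_KS(lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    if lam < 0.2:
        return 1.0  # the series converges slowly here, and Q_KS is ~1 anyway
    total = 0.0
    for j in range(1, 101):
        term = 2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(1.0, max(0.0, total))


def ks_two_sample(a, b):
    """Return (D, approximate p-value) for samples a and b."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in xs + ys:  # the maximum gap occurs at an observed value
        fa = bisect.bisect_right(xs, v) / n
        fb = bisect.bisect_right(ys, v) / m
        d = max(d, abs(fa - fb))
    effective_n = n * m / (n + m)
    return d, kolmogorov_q(math.sqrt(effective_n) * d)
```

A p-value above the chosen significance level (0.05 above) means the test found no evidence that the two timing distributions differ. It does not prove that they are identical.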
-
Observation-based FTE constructs each packet field exclusively from values previously observed in real host-protocol traffic, guaranteeing syntactic equivalence. Wireshark correctly decodes Protocol Proxy-generated packets as valid Synchrophasor frames with correct checksums, and the Phasor Data Concentrator hardware accepts them; any rule blocking Protocol Proxy traffic must therefore also block legitimate PMU packets.
-
The Protocol Proxy achieves an observed goodput of only 182 bps against a 54 Mbps baseline link (>99.99% reduction), well below its theoretical ceiling of 15,477 bps; the gap is attributed to TCP retransmission overhead and to the TCP header transiting the proxy. By comparison, Tor's baseline goodput was measured at 7.31 Mbps.
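Working through the reported figures shows the scale of the slowdown. Observed goodput reaches about 1.2% of the theoretical ceiling, and the Tor baseline is roughly 40,000 times faster:

```latex
\[
1 - \frac{182\ \text{bps}}{54 \times 10^{6}\ \text{bps}} \approx 1 - 3.4 \times 10^{-6} \approx 99.9997\%,
\qquad
\frac{182}{15{,}477} \approx 1.2\%,
\qquad
\frac{7.31 \times 10^{6}}{182} \approx 4.0 \times 10^{4}.
\]
```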
-
Static protocols — UDP-based with no application-layer handshake — are immune to stateful protocol analysis that defeated SkypeMorph: without a handshake state machine, a censor cannot flag discrepancies between observed and expected protocol states. This eliminates the detection vector that Houmansadr et al. (2013) exploited to identify SkypeMorph via handshake mismatch.
-
Marionette is the first programmable obfuscation system to simultaneously satisfy all five threat-model dimensions evaluated in Figure 2: resistance to blacklist DPI, whitelist DPI, statistical-test DPI, protocol-enforcing proxy traversal, and multi-layer traffic control, while sustaining throughput above 1 Mbps (up to 6.7 Mbps). Every prior system (obfs4, ScrambleSuit, SkypeMorph, StegoTorus, FTE, JumpBox, etc.) fails at least one dimension, most commonly stateful proxy traversal or statistical-feature control.
-
Format-Transforming Encryption (FTE) fails under proxy-induced ciphertext modification (a single character change causes decryption failure), while Marionette's probabilistic context-free grammar (CFG) templates tolerate header rewriting, connection multiplexing, and content alteration by intermediate proxies. This was validated across 10,000 streams through Squid 3.4.9, with 5.8 Mbps downstream and 0.41 Mbps upstream goodput.
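A toy integrity check shows why a single changed character breaks FTE decryption. This sketch uses hex encoding as a stand-in for FTE's unranking step and applies only HMAC-SHA256 (no encryption). The request template and function names are hypothetical:

```python
import hashlib
import hmac
from typing import Optional

PREFIX, SUFFIX = "GET /", " HTTP/1.1"
TAG_LEN = 32  # bytes of HMAC-SHA256 output


def seal(message: bytes, key: bytes) -> str:
    """Toy stand-in for FTE output: hex(message || tag) inside a request line."""
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return PREFIX + (message + tag).hex() + SUFFIX


def open_sealed(line: str, key: bytes) -> Optional[bytes]:
    """Return the message, or None if the line was altered in transit."""
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        return None
    try:
        blob = bytes.fromhex(line[len(PREFIX):-len(SUFFIX)])
    except ValueError:
        return None
    if len(blob) < TAG_LEN:
        return None
    message, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return message if hmac.compare_digest(tag, expected) else None
```

If a proxy rewrites any hex digit of the path, the recovered message no longer matches its tag and the receiver rejects the whole payload. A grammar-based template can instead treat the proxy-rewritable parts of the message as cover that carries no data.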
-
Format-transforming encryption (FTE) as deployed in the Tor Browser Bundle is detected by combining a URI Shannon-entropy threshold (≥5.5 bits) with an exact URI length check (239 bytes) on the first HTTP GET request. This embellished test produces only 264 false positives across approximately 10 million HTTP URIs in three campus datasets, while a length-only test yields a false-positive rate of roughly 15% over the same flows.
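The combined test is cheap to express. The following is a minimal sketch that assumes the entropy is computed over the character distribution of the first GET request's URI and that the thresholds are applied exactly as stated above. The function names are hypothetical:

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy, in bits per character, of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def looks_like_fte(first_get_uri: str, length: int = 239, min_entropy: float = 5.5) -> bool:
    """Flag a flow only if the URI has the exact length AND high entropy."""
    return len(first_get_uri) == length and shannon_entropy(first_get_uri) >= min_entropy
```

Requiring both conditions is what suppresses false positives. Many benign URIs are long or random-looking, but few are exactly 239 characters and also near the entropy ceiling of a base64-like alphabet.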
-
CART decision-tree classifiers trained on entropy-based and packet-header features detect all five Tor pluggable transports (obfsproxy3/4, FTE, meek-amazon, meek-google) with average PR-AUC=0.987, TPR=0.986, and FPR=0.003 on synthetic traces. On 14 million real campus flows the highest per-obfuscator FPR is 0.65%, and meek-google yields only 842 false positives across all three datasets. However, cross-environment portability is poor: classifiers trained on an Ubuntu/campus setup and tested on a Windows/home network achieve true-positive rates as low as 52% with false-positive rates reaching 12%.
-
DFA state-space explosion makes DFA-based FTE impractical for many realistic network-monitor regexes: the minimum DFA for `(a|b)*a(a|b){16}` has 131,073 states requiring 266 MB of precomputed tables, while the equivalent NFA has only 36 states requiring 73 KB, a reduction of more than three orders of magnitude. Some formats in the Snort corpus required up to 383 MB under DFA-based ranking, rendering them prohibitive for deployment.
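The blow-up can be reproduced for the family `(a|b)*a(a|b){k}` with a subset construction. In the hand-built NFA below, state 0 loops on both symbols and branches on `a`, and states 1 through k+1 count the remaining symbols, for k + 2 NFA states in total. The subset construction reaches 2^(k+1) DFA states, which is also the minimal DFA size for this language. For k = 16 that is 2^17 = 131,072. The cited 131,073 states and 36-state NFA presumably reflect a different construction or counting convention, which is an assumption on our part. The function names are hypothetical:

```python
def nfa_step(states, symbol, k):
    """One step of a hand-built NFA for (a|b)*a(a|b){k} with states 0..k+1."""
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)          # the (a|b)* loop
            if symbol == "a":
                nxt.add(1)      # guess that this 'a' is the marked one
        elif s <= k:
            nxt.add(s + 1)      # consume one of the k trailing symbols
        # state k+1 is accepting and has no outgoing transitions
    return frozenset(nxt)


def count_dfa_states(k):
    """Count the DFA states reachable by the subset construction."""
    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for symbol in "ab":
            nxt = nfa_step(current, symbol, k)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


def count_nfa_states(k):
    return k + 2


for k in (4, 8, 12, 16):
    print(k, count_nfa_states(k), count_dfa_states(k))  # last line: 16 18 131072
```

The NFA grows linearly in k while the DFA doubles with every extra trailing symbol. That exponential gap is what NFA-based ranking avoids storing.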
-
In PostgreSQL benchmarks, FPE-encrypted account-balance fields (libfte P-DD scheme, regex `\-[0-9]{9}`) reduce throughput by only 0.8% for complex mixed-transaction workloads (USUUI) and only 1.1% for SELECT-only workloads, relative to conventional authenticated encryption. Per-query latency for FPE versus authenticated encryption is identical across all five tested query types.
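Format-preserving encryption of a `\-[0-9]{9}` field can be sketched as rank, encrypt, unrank. This is not libfte's P-DD code. It is an illustration under stated assumptions: a toy 8-round Feistel network over 30 bits with HMAC-SHA256 round functions, plus cycle walking to stay inside the format's 10^9 ranks. A production system would use a vetted scheme such as NIST SP 800-38G FF1 instead. The function names are hypothetical:

```python
import hashlib
import hmac
import re

FORMAT = re.compile(r"\-[0-9]{9}")
DOMAIN = 10 ** 9            # number of strings matching the format
HALF_BITS = 15              # 2**30 >= 10**9, split into two 15-bit halves
MASK = (1 << HALF_BITS) - 1
ROUNDS = 8


def _round(key, rnd, half):
    """Toy Feistel round function: truncated HMAC-SHA256 of (round, half)."""
    mac = hmac.new(key, bytes([rnd]) + half.to_bytes(2, "big"), hashlib.sha256)
    return int.from_bytes(mac.digest()[:2], "big") & MASK


def _feistel(key, x, inverse=False):
    """A permutation of [0, 2**30); inverse=True undoes it."""
    left, right = x >> HALF_BITS, x & MASK
    for rnd in (reversed(range(ROUNDS)) if inverse else range(ROUNDS)):
        if inverse:
            left, right = right ^ _round(key, rnd, left), left
        else:
            left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def _cycle_walk(key, x, inverse=False):
    # Re-apply the 30-bit permutation until the result lands back in [0, 10**9).
    y = _feistel(key, x, inverse)
    while y >= DOMAIN:
        y = _feistel(key, y, inverse)
    return y


def rank(s):
    if not FORMAT.fullmatch(s):
        raise ValueError("input does not match the format")
    return int(s[1:])


def unrank(n):
    return "-" + f"{n:09d}"


def encrypt_balance(s, key):
    return unrank(_cycle_walk(key, rank(s)))


def decrypt_balance(s, key):
    return unrank(_cycle_walk(key, rank(s), inverse=True))
```

Because the ciphertext matches the same regex as the plaintext, it fits the existing column type unchanged. This is consistent with the near-zero throughput penalty reported above.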
-
A deterministic FTE scheme (T-DD) that maps 16-digit credit card numbers to 7-byte ciphertext strings achieves simultaneous encryption and compression, reducing on-disk table size from 112 MB (authenticated encryption) to 42 MB, a 62.5% reduction, while maintaining provable privacy. The compression arises because the plaintext encoding is redundant: 16 ASCII digits occupy 16 bytes but take only 10^16 distinct values, which fit in 7 bytes since 2^56 ≈ 7.2 × 10^16.
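The size arithmetic behind the T-DD result can be checked by ranking the digit string as an integer. This sketch omits the encryption step entirely and only shows that every 16-digit number fits in 7 bytes. The function names are hypothetical:

```python
DIGITS = set("0123456789")


def pack_card_number(digits: str) -> bytes:
    """Rank a 16-digit string as an integer and store it in 7 bytes."""
    if len(digits) != 16 or not set(digits) <= DIGITS:
        raise ValueError("expected exactly 16 ASCII decimal digits")
    return int(digits).to_bytes(7, "big")  # 10**16 - 1 < 2**56, so this never overflows


def unpack_card_number(blob: bytes) -> str:
    """Unrank 7 bytes back to the zero-padded 16-digit string."""
    return f"{int.from_bytes(blob, 'big'):016d}"
```

A deterministic FTE scheme would insert a keyed permutation between these two steps, so the stored 7 bytes are ciphertext rather than the plain rank.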
-
LibFTE exposes a regex-based API (Python, C++, JavaScript) that instantiates DPI-defeating FTE schemes from a regular-expression format specification alone, without expert cryptographic knowledge. The DCRS FTE scheme implemented in the library makes ciphertexts indistinguishable from real HTTP, SMTP, SMB, or other network-protocol messages under state-of-the-art DPI, and was already integrated into the Tor Browser Bundle at the time of publication.
-
LibFTE's NFA-based 'relaxed ranking' sidesteps the PSPACE-hardness obstacle that previously made direct NFA ranking unworkable. Across 3,458 Snort IDS regular expressions in the network-monitor-circumvention setting, NFA-based ranking reduces client/server memory requirements by as much as 30% compared to DFA-based approaches.
-
Manually-generated FTE regexes achieve a 100% misclassification rate against all six tested DPI systems — appid, l7-filter, YAF, bro, nProbe, and the proprietary enterprise-grade DPI-X — for HTTP, SSH, and SMB target protocols. Each regex took less than 30 minutes to specify and debug against known classifiers.
-
FTE proxy overhead compared to socks-over-ssh: the intersection-ssh format incurred 0% average latency increase and only 16% bandwidth overhead (1,348 KB vs. a 1,164 KB socks-over-ssh baseline per Alexa Top 50 site). The worst-case auto-http format incurred a 29% latency increase (7.1 s vs. 5.5 s) and 181% bandwidth overhead (3,279 KB), primarily due to ciphertext expansion and FTE/SOCKS negotiation on persistent empty TCP connections.
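If 1,164 KB and 5.5 s are taken as the socks-over-ssh baseline figures, the ratios reproduce the stated overheads:

```latex
\[
\frac{1{,}348}{1{,}164} \approx 1.158 \;(\approx 16\%\ \text{overhead}),\qquad
\frac{3{,}279}{1{,}164} \approx 2.817 \;(\approx 181.7\%\ \text{overhead}),\qquad
\frac{7.1\ \text{s}}{5.5\ \text{s}} \approx 1.29 \;(\approx 29\%\ \text{latency increase}).
\]
```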
-
An FTE-tunneled Tor circuit using intersection, manual, and auto HTTP formats successfully traversed the Great Firewall of China from a VPS inside China to a server in the United States on port 80. A persistent tunnel polling a censored URL every five minutes remained active for one month until VPS account termination, with no blocking observed.
-
Regex-based DPI is fundamentally vulnerable to format-transforming encryption: because every tested system (including the proprietary enterprise-grade DPI-X, rated for 1.5 Gbps at $8,000) classifies protocols solely by membership in a regular language, any ciphertext can be guaranteed to match any chosen regex. The paper argues this forces DPI to adopt machine learning, active probing, or non-regular semantic checks — but notes that making such checks fast, scalable, and low-false-positive at line rate for arbitrary target protocols remains an open problem.
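The core of the argument is that the classifier checks only membership in a regular language. In the toy sketch below, a hypothetical two-rule classifier is paired with an unranking function for its fixed-length HTTP rule. Any integer below 26^8 (about 37 bits, such as a ciphertext block) then becomes a request that the classifier labels `http`. The rules and names are illustrative, not those of any tested DPI system, and real FTE ranks arbitrary regexes via automata rather than this fixed-length shortcut:

```python
import re

# Hypothetical toy rule set: each protocol is identified purely by regex membership.
DPI_RULES = {
    "http": re.compile(r"GET /[a-z]{8} HTTP/1\.1\r\n"),
    "ssh": re.compile(r"SSH-2\.0-[A-Za-z0-9]{4,16}\r\n"),
}

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LANGUAGE_SIZE = len(ALPHABET) ** 8  # strings matched by the toy HTTP rule


def classify(payload: str) -> str:
    for proto, rule in DPI_RULES.items():
        if rule.fullmatch(payload):
            return proto
    return "unknown"


def unrank_http(n: int) -> str:
    """Map an integer in [0, 26**8) to the n-th string of the HTTP rule's language."""
    if not 0 <= n < LANGUAGE_SIZE:
        raise ValueError("integer outside the language's rank range")
    chars = []
    for _ in range(8):
        n, digit = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "GET /" + "".join(reversed(chars)) + " HTTP/1.1\r\n"


def rank_http(s: str) -> int:
    """Inverse of unrank_http: read the 8-letter path as a base-26 number."""
    n = 0
    for ch in s[5:13]:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return n
```

Since `rank_http` inverts `unrank_http`, the receiver recovers the integer exactly. The classifier has no way to tell these requests from benign ones that happen to match the same rule.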
-
DPI boxes used for censorship do not rely solely on simple regular expressions but also employ context-sensitive languages for protocol identification. The paper notes that precise knowledge of these DPI patterns could be fed directly into format-transforming encryption to enable targeted protocol misidentification.