DEFENSES
obfs4 (randomized handshake)
7 papers on file
- 2025-himmelberger-drivel Drivel: A Quantum-Safe Fully Encrypted Protocol Proxy
- 2025-pereira-position Position Paper: A Case for Machine-Checked Verification of Circumvention Systems
- 2020-frolov-detecting Detecting Probe-resistant Proxies
- 2015-frolov-the-use-of-tls The use of TLS in censorship circumvention
- 2015-ensafi-active-probing Examining how the Great Firewall discovers hidden circumvention servers
- 2015-wang-seeing Seeing through Network-Protocol Obfuscation
- 2012-lincoln-bootstrapping Bootstrapping Communications into an Anti-Censorship System
53 findings tagged here
-
During the June 2025 Iran shutdown, circumvention tool performance diverged sharply by transport design. Psiphon's multi-protocol architecture sustained 1.5 million concurrent users, roughly one-third of its normal Iranian base. Lantern's "proxyless" protocol (domain-fronting via CDN, ~40% of Lantern's Iranian traffic) showed moderate success. Tor usage collapsed during the blackout but bridge connections surged and rebounded quickly after the shutdown was lifted. BeePass (serving 500k+ daily users at shutdown onset) used live A/B testing of port/obfuscation-prefix combinations to probe the censors' blocking parameters in real time. The Ceno Browser's P2P network grew from 600 active peers on June 13 to ~8,000 by July 11, indicating that decentralized fallback paths stayed up even during peak blocking.
-
Classification from the first 5 packets × 320 bytes (a 1,600-byte burst) achieves near-perfect accuracy across Tor (F1=0.9990), VPN (F1=0.9871), malware (F1=0.9954), and IoT attack traffic (F1=0.9966), with IP addresses masked and only header and initial payload retained. The earliest portion of each packet provides sufficient discriminative information for a classification decision made within the first five packets of a flow.
-
Drivel evaluates its design against the GFW's fully-encrypted-traffic detector (documented in Wu et al. 2023). The thesis demonstrates that switching to post-quantum primitives does not by itself change the traffic's appearance to a statistical censor classifier — the fully-encrypted detection problem is independent of the underlying cryptographic algorithm and must be addressed at the traffic-shaping layer regardless of key-exchange choice.
-
Drivel is an obfs4-style fully-encrypted proxy protocol that replaces obfs4's pre-quantum cryptographic primitives with post-quantum alternatives. It is one of the first circumvention protocols explicitly designed to remain secure under a quantum adversary, addressing the forward-secrecy threat that deployed circumvention traffic recorded today will be decrypted in the future.
-
Most deployed circumvention protocols (obfs4, Shadowsocks, Trojan, VMess, etc.) still rely on pre-quantum primitives (X25519, AES-GCM, ChaCha20). Drivel is the first published treatment of how to migrate a fully-encrypted pluggable transport to post-quantum primitives, providing a design template and security analysis that does not exist elsewhere in the circumvention literature.
-
State-of-the-art ML classifiers (Deep Fingerprinting, Decision Tree, Random Forest, nPrintML) trained on known UPGen protocols and benign traffic always incur high out-of-distribution false-positive rates when attempting to block unknown UPGen protocols — in the vast majority of experiments the OOD FPR is 100%. The one exception (SSH OOD, Deep Fingerprinting) achieved a UPGen TPR of only 20%. By contrast, identical classifiers successfully generalize to block unknown Obfs4 flows with near-zero collateral damage in 3 of 4 cases.
-
In laboratory benchmarks, the best UPGen-generated protocol achieves 252 ms TTFB latency (vs 212 ms Obfs4, 313 ms TLS) and 4.25 Gbit/s throughput per core (vs 4.65 Gbit/s Obfs4, 9.42 Gbit/s TLS). The worst-case UPGen protocol (4.5 RTT handshake) reaches 677 ms TTFB while still sustaining 3.70 Gbit/s throughput. In large-scale distributed Tor simulations, the choice of UPGen protocol had no statistically significant effect on end-to-end Tor flow performance.
-
The paper evaluates two short-term mitigations—TCP delayed ACK on the proxy server and connection multiplexing—but finds both are limited: delayed ACK produces atypical ACK timing that may itself be fingerprintable, and multiplexing only adds entropy without eliminating the RTTdiff signal. Critically, obfs4 and ScrambleSuit's delay-based timing obfuscation are described as 'fundamentally limited' because they manipulate inter-arrival times without eliminating the underlying transport/application-layer session misalignment. The paper concludes no existing obfuscation scheme provides a principled defense against timing-based proxy fingerprinting.
-
Cross-layer RTT discrepancy (RTTdiff) is a protocol-agnostic fingerprint that exploits an inherent architectural property of all proxy setups: transport-layer sessions terminate at the proxy while application-layer sessions remain end-to-end. Evaluation across 10 proxy protocols—including VMess, Shadowsocks, VLESS, Trojan, XTLS-Vision, and obfs4-wrapped SOCKS—shows near-identical detection rates for all except obfs4, confirming the fingerprint is not tied to any specific obfuscation scheme. At FPR=0.01, per-website detection rates exceed 70% across all tested client and proxy location combinations.
-
Random padding alone raises the classifier FPR only slightly (0.11% to 0.15%), and connection multiplexing alone raises it to 0.53%; however, combining both defenses raises the FPR to 2.57% (at a TPR of 93.40%), which makes the detector impractical for a real-world censor.
-
CNN-based deep learning reduces obfs4 false positive rate by an order of magnitude versus the best decision tree (FPR 2.9×10⁻³ vs. 3×10⁻²) while maintaining 100% recall, and achieves near-perfect Snowflake data-flow detection (precision 0.95 and F-score 0.97 at λ=1,000). However, at realistic base rates λ > 10⁶ all CNN classifiers still yield near-zero precision, leaving per-flow deep learning alone insufficient for nation-state-scale deployment.
-
The paper identifies that circumvention systems relying on long-lived, consistent proxy servers are fundamentally vulnerable to host-based temporal detection regardless of per-flow obfuscation quality, and recommends adversarial examples, ephemeral obfuscation servers, and programmable or polymorphic protocols as countermeasures. Snowflake's volunteer-browser proxy architecture—where proxies are ephemeral and addresses are not reused—is highlighted as inherently more resistant to host-based classification than static bridge designs like obfs4.
-
State-of-the-art ML-based obfs4 detection (Wang et al. decision tree) achieves 97% precision at equal base rates (λ=1) but precision collapses to 3% at a still-conservative λ=1,000; at λ=10⁶ precision approaches zero for all classifiers tested. This base-rate failure was previously uncharacterized because prior evaluations only considered balanced or near-balanced datasets.
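The collapse follows from the definition of precision once benign flows outnumber proxy flows. The sketch below is illustrative only: it assumes λ is the ratio of benign flows to obfs4 flows, and it plugs in a TPR of 1.0 and an FPR of 3×10⁻² (the decision-tree figures quoted in a neighbouring finding, which may not be the exact operating point used in the paper).

```python
def precision_at_base_rate(tpr: float, fpr: float, lam: float) -> float:
    """Precision when there are `lam` benign flows for every proxy flow."""
    true_positives = tpr           # expected hits per proxy flow
    false_positives = fpr * lam    # expected false alarms per proxy flow
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    for lam in (1, 1_000, 1_000_000):
        p = precision_at_base_rate(tpr=1.0, fpr=0.03, lam=lam)
        print(f"lambda={lam:>9,}  precision={p:.5f}")
```

Under these assumed inputs the function gives roughly 0.97 at λ=1 and roughly 0.03 at λ=1,000, which matches the figures reported above.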
-
Combining a CNN flow classifier with host-based temporal accumulation eliminates all false positive classifications after observing at most 38 flows per host while maintaining perfect recall for all obfs4 and obfs⋆ bridges. The scheme requires only 14 bits of state per (IP, port) pair; tracking 4×10⁹ destination services requires no more than 50 GiB of storage, feasible on commodity hardware.
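As a sanity check on the storage claim, the raw counter state can be computed directly. This is a back-of-the-envelope sketch: it counts only the 14 bits per service and ignores any keys, indexing, or table overhead, which this summary does not describe.

```python
def raw_state_gib(services: int, bits_per_service: int) -> float:
    """Raw state size in GiB, with no allowance for keys or table overhead."""
    total_bytes = services * bits_per_service / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    print(f"{raw_state_gib(4 * 10**9, 14):.2f} GiB")
```

The raw figure comes to about 6.5 GiB, comfortably inside the stated 50 GiB bound.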
-
obfs4 and obfs⋆ produce characteristic wire patterns—bursts of roughly MTU-sized payloads followed by a randomly-sized chaff packet—that CNN classifiers detect purely from packet-size sequences without payload inspection. A trivial per-bridge entropy-biasing re-encoding (obfs⋆) completely defeats the hand-tuned decision tree (0% precision, 0% recall) but does not reduce CNN detectability, because the CNN generalizes across size-distribution variants.
-
Obfuscated proxy traffic (including Shadowsocks, VMess, VLESS, Trojan, obfs4, and REALITY) can be reliably fingerprinted by detecting encapsulated TLS handshakes — the inner TLS ClientHello that appears inside an outer encrypted tunnel. This fingerprint is protocol-agnostic: any proxy that wraps TLS-bearing application traffic will produce it. The authors deployed a similarity-based classifier within a mid-size ISP serving over one million users and demonstrated detection with minimal collateral damage.
-
Achieving active security (FEP-CCFA) requires that on any AEAD decryption failure a fully encrypted protocol silently return the empty string and keep the channel open indefinitely, never emitting a channel-closure signal. Any observable behavioral difference — including connection termination timing — leaks information about ciphertext-boundary locations to an active adversary.
-
Shadowsocks transmits a fixed-size AEAD-encrypted length field followed by the AEAD-encrypted payload with no support for reducing ciphertext size via fragmentation, while Obfs4 permits input-side padding but not output fragmentation. These designs impose distinct minimum output message lengths, allowing a passive adversary to distinguish between them — and identify short-message sessions — based solely on the minimum observed message length.
-
No existing fully encrypted protocol — including Obfs4, Shadowsocks, VMess, and Obfuscated OpenSSH — simultaneously satisfies passive indistinguishability (FEP-CPFA), active-manipulation resistance (FEP-CCFA), and output-length shaping. The paper presents a novel stream-based construction that provably satisfies all three using AEAD-authenticated length blocks, an output buffer supporting arbitrary fragmentation, and a padding mechanism allowing the sender to emit exactly p output bytes on demand.
-
Obfs4's data-transport phase encrypts per-record length fields with an unauthenticated stream cipher. An active adversary can overwrite this field to force a predictable TCP connection termination at a calculable byte offset; the authors experimentally confirmed that Tor-over-Obfs4 connections can be reliably distinguished from other FEPs because client initiation messages have consistent lengths.
-
Censors optimize for utility under asymmetric misclassification costs rather than raw accuracy: false positives (blocking legitimate traffic) carry economic and political costs that make censors conservative about deploying classifiers with high false-positive rates. Multi-flow stateful classifiers — such as the obfs4 Elligator probabilistic distinguisher, which requires correlating observations across multiple connections — are operationally more expensive than single-packet or connection-initiation classifiers, which the author suggests explains why probabilistic multi-flow distinguishers have not been exploited in practice even when theoretically available.
-
Despite fully encrypted protocols existing since obfs2 in 2012, the first documented evidence of the GFW passively detecting them purely by randomness appeared only in 2021 — approximately a decade later — and was limited to certain foreign IP address ranges and a subsampled fraction of traffic. Meanwhile, the GFW had been discovering obfs2/obfs3 servers via active probing as early as 2013, indicating censors found active-probing-based address discovery cheaper and more reliable than passive statistical classifiers for this protocol family.
-
Three independent implementation flaws in obfs4proxy's Elligator encoding made obfs4 public-key representatives passively distinguishable from uniform random bytes: (1) non-canonical square roots allowed a square-then-root test matching 100% of obfs4 outputs but only ~50% of random strings; (2) bit 255 was always zero; (3) only large prime-order subgroup points were encoded. A classifier exploiting these achieves 100% sensitivity (obfs4 never falsely marked as random) at less-than-100% specificity. All three were fixed in obfs4proxy-0.0.12 (December 2021) and 0.0.14 (September 2022).
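Flaw (2) is the simplest of the three to illustrate, and it shows why sensitivity is 100% while specificity is not. The sketch below assumes each representative is a 32-byte little-endian string, so that bit 255 is the most significant bit of the last byte; that layout and the function names are assumptions for illustration, and the other two flaws are not modelled.

```python
from typing import Iterable


def high_bit_clear(representative: bytes) -> bool:
    """True if bit 255 (MSB of byte 31, little-endian) is zero."""
    if len(representative) != 32:
        raise ValueError("expected a 32-byte representative")
    return (representative[31] & 0x80) == 0


def consistent_with_flawed_obfs4(representatives: Iterable[bytes]) -> bool:
    """True if every observed representative has bit 255 clear.

    Pre-fix obfs4 output always passes; a single failure rules it out.
    """
    return all(high_bit_clear(r) for r in representatives)


def random_pass_probability(n: int) -> float:
    """Chance that n uniformly random 32-byte strings all pass the test."""
    return 0.5 ** n
```

A single observation leaves a 50% chance that uniformly random bytes pass, so a distinguisher of this kind only becomes confident after correlating several connections to the same server.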
-
Protocol fingerprinting — including DPI-based identification of VPNs, circumvention tools, and E2EE messengers — was active in only 6% of countries during the measurement period (13% all-time), but all confirmed instances came from focused individual studies, not from mass measurement platforms like OONI or Censored Planet. The authors flag encrypted traffic analysis (ETA) tools and next-generation firewalls (NGFWs) capable of blocking Signal or Tor Browser as an emerging threat to freedom of expression.
-
The GFW's fully-encrypted detector (deployed Nov 2021) operates by exempting likely-benign traffic and blocking the rest. Five inferred exemption rules applied to the first TCP payload (pkt): Ex1 — popcount(pkt)/len(pkt) ≤ 3.4 or ≥ 4.6 (bits/byte); Ex2 — first 6+ bytes are printable ASCII [0x20–0x7e]; Ex3 — more than 50% of bytes are printable ASCII; Ex4 — more than 20 contiguous printable ASCII bytes; Ex5 — first bytes match TLS or HTTP fingerprint. Traffic failing all five exemptions is blocked. Experiments confirmed all rules still held as of February 2023.
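The five exemptions can be written down as a small predicate over the first payload. This is a sketch of the rules as summarized above, not the censor's actual implementation: the Ex5 check is a simplified, hypothetical stand-in (a TLS record prefix and a few HTTP methods), and treating an empty payload as not blocked is an assumption.

```python
def _printable(b: int) -> bool:
    return 0x20 <= b <= 0x7E


def ex1_popcount(pkt: bytes) -> bool:
    """Average set bits per byte is <= 3.4 or >= 4.6."""
    avg = sum(bin(b).count("1") for b in pkt) / len(pkt)
    return avg <= 3.4 or avg >= 4.6


def ex2_printable_prefix(pkt: bytes) -> bool:
    """First six bytes are all printable ASCII."""
    return len(pkt) >= 6 and all(_printable(b) for b in pkt[:6])


def ex3_mostly_printable(pkt: bytes) -> bool:
    """More than half of the bytes are printable ASCII."""
    return sum(_printable(b) for b in pkt) > len(pkt) / 2


def ex4_printable_run(pkt: bytes) -> bool:
    """More than 20 contiguous printable ASCII bytes."""
    run = 0
    for b in pkt:
        run = run + 1 if _printable(b) else 0
        if run > 20:
            return True
    return False


def ex5_known_protocol(pkt: bytes) -> bool:
    """Hypothetical, simplified TLS/HTTP prefix check."""
    looks_tls = len(pkt) >= 3 and pkt[0] == 0x16 and pkt[1] == 0x03
    looks_http = pkt.startswith((b"GET ", b"POST ", b"HEAD ", b"PUT "))
    return looks_tls or looks_http


EXEMPTIONS = (ex1_popcount, ex2_printable_prefix, ex3_mostly_printable,
              ex4_printable_run, ex5_known_protocol)


def would_block(pkt: bytes) -> bool:
    """True if the first payload matches none of the five exemptions."""
    if not pkt:
        return False  # assumption: nothing to classify
    return not any(rule(pkt) for rule in EXEMPTIONS)
```

For example, `would_block(bytes([0x0F, 0xF0]) * 100)` is true: the payload averages exactly 4 set bits per byte, contains no printable ASCII, and matches neither prefix. An HTTP request line is exempted by Ex2.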
-
OpenVPN's application-layer P_ACK packets — uniform in size and concentrated only in the handshake phase — provide a timing and count fingerprint detectable via threshold comparison over 10-packet bins. Tunnel-based obfuscation wrappers (Stunnel, SSH, obfs2/3, Shadowsocks) that do not add random padding preserve the 1:1 packet correspondence with the underlying OpenVPN stream, leaving 16 of 20 tested tunnel-based obfuscated configurations vulnerable to ACK fingerprinting.
-
A two-phase passive-filter-plus-active-probing framework deployed at a 1-million-user ISP identified 85.90% of vanilla OpenVPN flows (1,718/2,000) and 72.67% of obfuscated flows (1,468/2,020), with an upper-bound false positive rate of 0.0039% across over 10 million flows — three orders of magnitude lower than prior ML-based approaches (1.4–5.5%). The system processed 15 TB and 2 billion flows per day on a single commodity server.
-
Even with tls-auth/tls-crypt HMAC protection making OpenVPN servers nominally 'probe-resistant' (silent to unauthenticated clients), the framework fingerprints servers via TCP-level timing side channels: a complete 16-byte client-reset probe triggers an immediate connection drop (HMAC validation fails after full packet reassembly), while a 15-byte truncated probe causes the server to stall awaiting the final byte until a server-specific handshake timeout expires. Over 97% of non-OpenVPN endpoints have RST thresholds below 500 or above 4,000 bytes, versus OpenVPN's characteristic 1,550–1,660 bytes derived from default MTU configurations.
-
Current randomized-payload circumvention tools (obfs4/ScrambleSuit, SkypeMorph, VoIP-tunneling) rely on censors 'defaulting open' — treating unidentified traffic as innocuous. If censors instead block all traffic not explicitly recognizable as meaningful plaintext, these tools fail entirely. The paper notes anecdotal evidence this is already occurring, including blocking of some TLS 1.3 connections.
-
In Iran in 2013, censors dropped or throttled certain TCP connections after 60 seconds, severely disrupting circumvention protocols like obfs4 that fuse session state with a single long-lived TCP connection, while short-lived HTTP connections were largely unaffected. obfs4 has no session concept independent of the underlying TCP connection; when that connection is terminated, all end-to-end state is lost and a new session must restart from scratch.
-
Turbo Tunnel inserts an interior session/reliability protocol (KCP or QUIC) between the obfuscation layer and user streams, decoupling end-to-end session state from any single transport connection. A session survives TCP termination, proxy rotation, or unreliable carriers by retransmitting lost packets over a new connection bearing the same session identifier. The pattern was implemented in obfs4, meek, and Snowflake, with Turbo Tunnel–enabled Snowflake shipping in Tor Browser alpha releases 9.5a13 (desktop) and 10.0a1 (Android).
-
Manually-crafted decision trees combining probe non-response, FIN/RST close type, and connection timing achieved a false-positive rate below 0.001% for obfs4, Lampshade, Shadowsocks, and OSSH across 1.9 million endpoints; for OSSH specifically, 7 of 8 flagged Tap endpoints were confirmed genuine Psiphon proxies by developers. MTProto was the sole exception, producing 3,144 false positives (0.56% of Tap, 0.02% of ZMap) because its infinite-timeout behavior is shared by a non-negligible population of common hosts.
-
Endpoints that never close a connection and never respond to any probe ('infinite timeout') represent 0.7% of the ISP Tap dataset and 42% of the ZMap active-scan dataset; this is the single most common probe-indifferent behavior in both datasets. MTProto already exploits this: its strategy of keeping failed connections open indefinitely produces the highest false-positive rate (0.56% of Tap) among all tested protocols, making it effectively unblockable at acceptable collateral-damage thresholds.
-
Across 433,286 endpoints from a 10 Gbps university ISP passive tap, 94% responded with data to at least one of 8 protocol probes (TLS, HTTP, STUN, S7, Modbus, DNS-AXFR, random bytes, empty); all five tested probe-resistant proxies (obfs4, Lampshade, Shadowsocks, MTProto, OSSH) never responded with data to any probe. This single filter reduces the suspect set from 433,286 to ~26,000 endpoints and rules out 94% of ISP-observed hosts as non-proxies with zero false negatives against the tested protocols.
-
Each probe-resistant proxy exposes a unique TCP close-threshold fingerprint: obfs4 closes with FIN at 8,192–16,384 bytes and RST at the next multiple of 1,448 bytes beyond that; Lampshade at FIN 256 bytes / RST 257 bytes; Shadowsocks-python and -outline both at FIN 50 bytes (outline also RST at 51); OSSH at FIN 24 bytes / RST 25 bytes. A binary-search tool using random probes can discover these thresholds remotely without knowing any shared secret, providing a protocol-specific fingerprint independent of payload content.
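The threshold search can be sketched as an ordinary binary search over probe length. To keep the example self-contained, the network is abstracted away: `closes_after` is a hypothetical callable standing in for "open a fresh connection, send n random bytes, and report whether the server closed". The search assumes that behaviour is monotonic in n.

```python
from typing import Callable, Optional


def find_close_threshold(closes_after: Callable[[int], bool],
                         lo: int = 1, hi: int = 65536) -> Optional[int]:
    """Smallest probe length in [lo, hi] at which the endpoint closes.

    Returns None if the endpoint stays open even at `hi` bytes.
    """
    if not closes_after(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if closes_after(mid):
            hi = mid          # threshold is at mid or below
        else:
            lo = mid + 1      # threshold is above mid
    return lo


if __name__ == "__main__":
    # Simulated endpoints using the FIN thresholds quoted above.
    print(find_close_threshold(lambda n: n >= 24))    # OSSH-like
    print(find_close_threshold(lambda n: n >= 256))   # Lampshade-like
```

Each step costs one connection, so a threshold below 65,536 bytes is found in at most about 17 probes. A real tool would run the search twice, once for the FIN boundary and once for the RST boundary.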
-
Frolov et al. (2020) found that over 94% of Internet servers respond with data to at least one popular protocol probe, making probe-resistant proxies that remain entirely silent statistically anomalous. Censors can further fingerprint silent proxies by their unique timeout or data-limit behaviors before connection close (e.g., Lampshade closes immediately after 256 bytes of unrecognized data, or waits exactly 90 seconds before timing out).
-
Frolov et al. (2020) found that obfs4, Shadowsocks Outline, Psiphon's OSSH, and Lantern's Lampshade are all identifiable by TCP flag and timing patterns when servers close connections on error, because each tool's timeout value and FIN/ACK behavior are distinct. Their recommended mitigation—'forever read' on errors so the prober always closes first—forces the server to terminate with FIN/ACK consistently across all code paths.
-
Frolov and Wustrow show that every major TLS-based circumvention tool (Tor Browser, Lantern, OpenVPN, Psiphon, etc.) produces a TLS ClientHello fingerprint that is statistically distinguishable from real Chrome or Firefox: differences include cipher-suite ordering, extension set, extension ordering, ALPN values, and curve preferences. A passive observer with a classifier over ClientHello fields can identify the tool with high precision without decrypting any traffic.
-
Conjure phantom hosts resist active probing by requiring knowledge of a per-client registration seed secret before the station responds. A ZMap scan of over 1 billion random IP/port combinations found that 99.4% of responding servers returned no data after a random OSSH-style probe and 7.42% closed with TCP RST — behavior indistinguishable from Conjure's OSSH transport — meaning censors face steep false-positive rates when attempting to identify phantom proxies via active probing.
-
obfs4 successfully established Tor circuits on the authors' own unpublished bridge relays but failed to connect to any public obfs4 bridge, consistent with the GFW having scraped and blacklisted public bridge addresses. This demonstrates that address confidentiality is a prerequisite for obfs4's effectiveness, independent of its obfuscation properties.
-
In the heavily censored environment (E3), all successful connections used meek domain-fronting bridges (meek-amazon: 11 participants, meek-google: 9, meek-azure: 3); not a single participant successfully connected using flashproxy, fte, fte-ipv6, obfs4, or scramblesuit, despite all being available as built-in options.
-
77% of public bridges offer only vanilla Tor, which is trivially detectable via TLS certificate pattern matching. An additional 15% offer Pluggable Transports with conflicting security properties (e.g., obfs4 + obfs3 + obfs2 co-deployed on the same bridge), allowing a censor to confirm and block the bridge via the weakest PT and thereby disable all stronger PTs on the same IP — including active-probing-resistant transports like obfs4 and ScrambleSuit.
-
Tor's vanilla TLS certificate presents a distinctive pattern (SubjectCN=www.[random].com; IssuerCN=www.[random].net using base32 random strings), which never changes across certificate rotations every 2 hours. Using this pattern against Censys and Shodan scan data without running any active scans, the authors discovered 694 private bridges and 645 private proxies, and deanonymized the IP address of 35% of public bridges with clients (23% of all active public bridges) in April 2016.
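The certificate pattern lends itself to a pair of regular expressions applied to scan records. The sketch below is deliberately coarse: it assumes the lowercase base32 alphabet (a-z, 2-7), leaves the label length unbounded because the summary gives no bounds, and uses hypothetical field names.

```python
import re

# Assumption: lowercase RFC 4648 base32 characters, any length >= 1.
SUBJECT_RE = re.compile(r"www\.[a-z2-7]+\.com")
ISSUER_RE = re.compile(r"www\.[a-z2-7]+\.net")


def matches_tor_cert_pattern(subject_cn: str, issuer_cn: str) -> bool:
    """True if both common names fit the www.[base32].com / .net shape."""
    return bool(SUBJECT_RE.fullmatch(subject_cn)
                and ISSUER_RE.fullmatch(issuer_cn))
```

On its own this check also matches ordinary domains whose labels happen to use only base32 characters, so a practical filter would presumably combine it with further certificate features.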
-
The Great Firewall of China blocked newly published obfs4 Tor Browser default bridges after delays of 7, 2, 18, 11, and 36 days following the first public software release, and up to 57 days after bridges were first discoverable via bug-tracker ticket filing. Iran showed no blocking of the same default bridges across the entire five-month measurement period.
-
Some obfs4 bridges exhibited a periodic semi-blocking pattern from China, cycling between reachable and blocked states with a period of roughly 24 hours. This diurnal pattern differed between the two China probe sites and between bridges, and one blocking failure coincided with a documented nationwide GFW outage that also briefly restored access to Google services.
-
Randomization-based obfuscation systems (obfs2/3, obfs4, ScrambleSuit, Dust) resist blacklist DPI but fail entirely under protocol-whitelist filtering, as explicitly demonstrated during the Iranian elections where censors permitted only known-good protocols. Pure randomization provides no signal of being a permitted protocol, making it trivially blockable under any whitelist regime.
-
The GFW sends protocol-specific probe payloads tailored to each circumvention tool: Tor bridges receive a TLS ClientHello mimicking Tor's own; obfs2/obfs3 servers receive random-looking payloads; Shadowsocks servers receive random bytes. A server that responds differently to these crafted probes versus innocent traffic (e.g., by sending a valid protocol handshake in response to a probe) reveals itself and is subsequently blocked.
-
CART decision-tree classifiers trained on entropy-based and packet-header features detect all five Tor pluggable transports (obfsproxy3/4, FTE, meek-amazon, meek-google) with average PR-AUC=0.987, TPR=0.986, and FPR=0.003 on synthetic traces. On 14 million real campus flows the highest per-obfuscator FPR is 0.65%, and meek-google yields only 842 false positives across all three datasets. However, cross-environment portability is poor: classifiers trained on an Ubuntu/campus setup and tested on a Windows/home network achieve true-positive rates as low as 52% with false-positive rates reaching 12%.
-
The paper demonstrates that 'having no fingerprint is itself a fingerprint': randomizing obfuscators that emit uniformly random bytes from the first packet are detectable precisely because conventional protocols (TLS, SSH, HTTP) always begin with fixed plaintext headers. This structural distinction requires no deep payload parsing — the attack operates on only the first TCP packet — and achieves TPR=1.0 / FPR=0.002 against obfsproxy3/4 using commodity-implementable statistics.
-
Obfsproxy3 and obfsproxy4 are reliably detected by an entropy-distribution test (KS test, block size k=8) applied to the first 2,048 bytes of the first client-to-server packet, combined with a minimum payload-length check of 149 bytes. On three university campus datasets totaling over 14 million TCP flows, the test achieves TPR=1.0 with FPR ranging from 0.24% to 0.33%. Omitting the length check raises the SSL/TLS false-positive rate to approximately 23%.
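A rough sketch of such a first-packet test follows. It makes several simplifying assumptions that are not taken from the paper: the reference distribution is sampled from a seeded pseudo-random generator, the decision compares the two-sample KS statistic against a hypothetical cutoff `max_d` instead of a p-value, and all names are illustrative.

```python
import math
import random
from bisect import bisect_right
from collections import Counter

K = 8            # block size in bytes
MIN_LEN = 149    # minimum first-payload length
WINDOW = 2048    # bytes of the first payload that are examined


def block_entropies(data: bytes, k: int = K) -> list:
    """Shannon entropy (bits) of the byte values in each full k-byte block."""
    out = []
    for i in range(0, len(data) - k + 1, k):
        counts = sorted(Counter(data[i:i + k]).values())  # stable float sums
        out.append(-sum(c / k * math.log2(c / k) for c in counts))
    return out


def ks_statistic(a: list, b: list) -> float:
    """Two-sample Kolmogorov-Smirnov statistic (max gap between ECDFs)."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in set(a) | set(b))


def pseudo_random(n: int, seed: int) -> bytes:
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


REFERENCE = block_entropies(pseudo_random(4096, seed=0))


def looks_randomized(first_payload: bytes, max_d: float = 0.15) -> bool:
    """Flag payloads whose block entropies resemble uniform random bytes."""
    if len(first_payload) < MIN_LEN:
        return False  # the length check described above
    sample = block_entropies(first_payload[:WINDOW])
    return ks_statistic(sample, REFERENCE) <= max_d
```

A payload of 2,048 pseudo-random bytes is flagged, while a payload of one repeated byte is not, because every block has zero entropy and the KS statistic is 1.0.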
-
Obfsproxy (predecessor to obfs4) listens on randomized ports as an explicit defense against discovery by comprehensive Internet-wide scanning, because an adversary must scan all 65,535 ports to locate bridges rather than a single known port — multiplying scan cost by roughly 65,000× relative to a single-port sweep.
-
ScrambleSuit defeats active probing by requiring clients to prove knowledge of an out-of-band shared secret before the server responds; a probing censor receives only silence. Two mechanisms are provided: session tickets (preferred for non-Tor applications) and an authenticated UniformDH handshake (optimized for Tor's shared-secret bridge distribution model), with both producing payloads computationally indistinguishable from random.
-
Tor's fixed 512-byte cells packed into TLS 1.0 records produce a characteristic TCP payload of 586 bytes (512 + 74 bytes of TLS overhead). A perimeter filter running a simple exponential moving average (τ ← ατ + (1−α)1ₗ₌₅₈₆, α=0.1, T=0.4) identifies Tor flows within a few dozen packets; this attack succeeds at backbone rates of ~540,000 packets/second on commodity hardware. Obfsproxy does not alter packet sizes or timings and therefore does not defeat this classifier.
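The moving-average filter can be sketched in a few lines. The update rule is transcribed literally from the formula above, so α multiplies the previous estimate. This summary does not confirm whether the original weights history by α or by 1−α, and that choice strongly affects how many packets are needed before the threshold is crossed. The initial value τ=0 and the strict comparison against T are also assumptions.

```python
from typing import Iterable, Optional

TOR_PAYLOAD_LEN = 512 + 74  # 586 bytes: one cell plus TLS overhead


def first_flagged_packet(payload_lengths: Iterable[int],
                         alpha: float = 0.1,
                         threshold: float = 0.4) -> Optional[int]:
    """1-based index of the packet at which tau first exceeds the threshold.

    Returns None if the flow is never flagged.
    """
    tau = 0.0  # assumed starting value
    for n, length in enumerate(payload_lengths, start=1):
        hit = 1.0 if length == TOR_PAYLOAD_LEN else 0.0
        tau = alpha * tau + (1 - alpha) * hit  # literal form of the update
        if tau > threshold:
            return n
    return None
```

A flow with no 586-byte payloads is never flagged, whatever its length. The filter keeps only one floating-point value per flow, which is consistent with the claim that it runs at backbone packet rates.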