TECHNIQUES
fully-encrypted-detect Fully-encrypted protocol detection
Generalization of random-payload-detect: detect any protocol where the entire byte stream looks uniformly random. (USENIX Security 2023.)
10 papers on file
- 2025-geedge-mesa-leak Geedge & MESA Leak: Analyzing the Great Firewall's Largest Document Leak
- 2025-himmelberger-drivel Drivel: A Quantum-Safe Fully Encrypted Protocol Proxy
- 2025-interseclab-internet-coup The Internet Coup
- 2025-wilson-extended Extended Abstract: Shaperd: Easily Adoptable Real-Time Traffic Shaper for Fully Encrypted Protocols
- 2017-frolov-water-pluggable WATER: a programmable framework for pluggable transports
- 2023-fenske-security Security Notions for Fully Encrypted Protocols
- 2023-fifield-comments Comments on certain past cryptographic flaws affecting fully encrypted censorship circumvention protocols
- 2023-wu-fully-encrypted-detect How the Great Firewall of China detects and blocks fully encrypted traffic
- 2022-blocking-tls-circumvention Large scale blocking of TLS-based censorship circumvention tools in China
- 2020-bock-detecting Detecting and Evading Censorship-in-Depth: A Case Study of Iran's Protocol Filter
44 findings tagged here
-
Stage 1 of the detection pipeline uses a lightweight heuristic: restrict analysis to IP addresses in "VPS-dense ASNs," which censors already target for resource-intensive inspection of fully-encrypted traffic. This pre-filter dramatically reduces the search space before applying the more expensive dual-role behavioral analysis. The evaluation was conducted without Stages 1 and 3 due to dataset limitations, so the reported 23% recall and 0.18% FPR are conservative estimates of the full pipeline's performance.
-
LetsVPN permanently exited the Chinese mainland market in late April 2026 after its technical team spent 20 days making hourly adjustments and confirmed it could not restore connectivity. The official announcement designated April 8, 2026 as the effective service-termination date and initiated a full refund program.
-
AEGIS, a flow-physics-only ML classifier using a Hyperbolic Liquid State Space Model evaluated on a 400GB adversarial corpus including VLESS Reality, GhostBear, and AMOI-morphed traffic, achieves F1-score 0.9952, 99.50% TPR, and 0.2141% FPR at 262.27 µs inference latency on an RTX 4090. The system discards all payload bytes and classifies traffic exclusively on 6-dimensional flow physics: packet size, inter-arrival time, directionality, TCP window size, TCP flags, and payload ratio.
-
Explicitly disentangling packet headers (structured, low-entropy) from encrypted payloads (high-entropy, stochastic) into separate MoE branches yields consistent gains across six datasets: 86.85% F1 on 120-class TLS 1.3 traffic (CSTNET-TLS), 97.88% F1 on USTC-TFC2016 malware/benign flows, and 92.65% F1 on imbalanced IoT traffic (CIC-IoT2022), demonstrating that headers and payloads carry fundamentally different and independently exploitable discriminative signals.
-
Encrypted traffic exhibits a 'full-frequency' spectral property where both low- and high-frequency components are highly active with comparable intensity, unlike natural images which are dominated by low-frequency components. Fourier Transform analysis across CIC-IoT2023, DoHBrw2020, and ISCX-Tor2016 confirms this distinction is pervasive. This signature is an inherent consequence of encryption disrupting byte-level semantics into a visually disordered, noise-like spatial pattern.
-
FreeUp operates under a zero-positive (unsupervised) learning paradigm — trained exclusively on normal traffic with no labeled anomaly examples — yet achieves 95.53% AUC on Tor traffic and 85.44% AUC on DNS-over-HTTPS tunneling detection. This demonstrates that frequency-aware anomaly detectors generalize to novel circumvention protocols without requiring any labeled attack data, eliminating the labeling bottleneck that previously limited ML-based censorship detection.
-
I2P payload entropy was measured at close to 8 bits per byte across sampled packets (Figure 9), confirming that payload content is cryptographically indistinguishable from random noise and provides no usable signal for classification. All experimental variants using raw payload alone achieved poor and high-variance accuracy (72.5–76.5%), while excluding payload improved accuracy to 99.5% in lab conditions.
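The "bits per byte" figure above is an empirical Shannon entropy over byte values. A minimal sketch of that measurement follows; the function name and the sample inputs are illustrative assumptions, not taken from the paper.

```python
import math
import os
from collections import Counter


def shannon_entropy_bits_per_byte(payload: bytes) -> float:
    """Empirical Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Illustrative inputs: random bytes stand in for an encrypted payload,
# and a repeated HTTP request stands in for structured plaintext.
random_like = os.urandom(65536)
plaintext = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 100

print(shannon_entropy_bits_per_byte(random_like))
print(shannon_entropy_bits_per_byte(plaintext))
```

Short packets cannot reach 8 bits per byte under this estimator, because a payload of n bytes can show at most log2(n) bits of empirical entropy, so comparisons are only meaningful between samples of similar size.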
-
Systematic measurement platforms (OONI, Censored Planet, Cloudflare Radar, NetBlocks) have inherent blind spots due to geographic coverage and protocol-specific test constraints; critical censorship discoveries — including GFW fully-encrypted protocol blocking and regional Chinese censorship — were first surfaced by user reports on forums and GitHub issue pages of circumvention tools, not by automated measurement infrastructure.
-
Two Iranian ASes apply a protocol allowlist that drops traffic not matching known application-layer protocol patterns (after ~6 packets), independently of the destination IP. Experiments with fresh /26 phantom subnets showed that prefixing Conjure connections with a plain HTTP GET payload evaded this blocking for four weeks, whereas TLS Client Hello-prefixed connections were blocked within 72 hours and SSH-prefixed connections were blocked within 72 hours of a port rotation. HTTP GET on port 80 was the only tested prefix that survived the full experiment window.
-
Iran's June 2025 shutdown enforced a strict national protocol whitelist: only DNS (UDP/53), HTTP (port 80), and HTTPS (port 443) traffic from Iranian networks to external servers was forwarded; all other protocols—including OpenVPN (UDP/1194), SSH (port 22), and arbitrary TCP/UDP ports—were silently dropped without response by DPI at the border.
-
The September 2025 leak of ~600 GB from Geedge Networks and the MESA Lab (Institute of Information Engineering, Chinese Academy of Sciences) is the largest known document disclosure from the GFW vendor ecosystem. It establishes a direct lineage: MESA Lab (founded 2012 by Fang Binxing's team, annual contracted revenue >35M RMB by 2016) spun out Geedge Networks in 2018, with MESA alumni filling key engineering roles (e.g. Zheng Chao as CTO). The leak includes ~64 GB of MESA git repositories, ~35 GB of MESA internal documents, ~15 GB of Geedge internal documents, and a ~3 GB Jira export — providing direct access to source code, work logs, and internal communications behind GFW R&D.
-
Internal Geedge documents confirm active contracts to deploy GFW-derived censorship and surveillance infrastructure in Myanmar, Pakistan, Ethiopia, Kazakhstan, and at least one additional unidentified country under the Belt and Road framework, in addition to domestic deployments in Xinjiang, Jiangsu, and Fujian. The exported product (the Tiangou Secure Gateway / TSG line) is not a stripped-down export variant — leaked TSG documentation shows DPI, active-probing, ML classifiers, and granular per-region traffic control rules that mirror the domestic GFW capability set.
-
Drivel evaluates its design against the GFW's fully-encrypted-traffic detector (documented in Wu et al. 2023). The thesis demonstrates that switching to post-quantum primitives does not by itself change the traffic's appearance to a statistical censor classifier — the fully-encrypted detection problem is independent of the underlying cryptographic algorithm and must be addressed at the traffic-shaping layer regardless of key-exchange choice.
-
Drivel is an obfs4-style fully-encrypted proxy protocol that replaces obfs4's pre-quantum cryptographic primitives with post-quantum alternatives. It is one of the first circumvention protocols explicitly designed to remain secure under a quantum adversary, addressing the risk that circumvention traffic recorded today is decrypted later once quantum attacks become practical.
-
Most deployed circumvention protocols (obfs4, Shadowsocks, Trojan, VMess, etc.) still rely on pre-quantum primitives (X25519, AES-GCM, ChaCha20). Drivel is the first published treatment of how to migrate a fully-encrypted pluggable transport to post-quantum primitives, providing a design template and security analysis that do not exist elsewhere in the circumvention literature.
-
InterSecLab's 76-page analysis of the Geedge/MESA leak (based on nine months of indexing and translating >100,000 documents) characterizes the Tiangou Secure Gateway (TSG) product line as a commercially deployable detection stack that combines deep packet inspection, real-time mobile subscriber monitoring, active probing, ML-based traffic classifiers, and granular per-region rule sets. TSG is not a research prototype — leaked documentation includes deployment timelines and client government interactions for Kazakhstan, Ethiopia, Pakistan, Myanmar, and one unnamed country, with censorship rules explicitly tailored to each region.
-
Censorship classifiers and traffic analysis attacks consistently exploit the initial seconds of a proxy connection, where packet-size, inter-arrival-time, and burst features are maximally discriminative. Cited work demonstrates that website fingerprinting classifiers trained solely on the first few seconds of Tor traffic achieve high accuracy, and real-world GFW detection of fully-encrypted protocols also targets early-connection bytes.
-
Among surveyed channels, Skyhook, PushRSS, SQS, AMPCache, and Meek satisfy all three UP channel properties (unidirectional, no client auth, higher bandwidth); CloudTransport and Raven do not because they require authenticated user accounts; Tor's email- and Telegram-based bridge distribution also fails the no-auth requirement. The analysis was prompted in part by the 2022 GFW entropy-based blocking event, which required software updates to be pushed to users before fully-encrypted protocols could resume functioning.
-
Custom CCAs that deviate from standard TCP/QUIC congestion response fundamentally contradict the core circumvention principle of traffic indistinguishability: by failing to back off under congestion signals, they produce traffic patterns that diverge from the vast majority of Internet flows that censors value, eliminating the collateral-damage protection that makes circumvention tools hard to block wholesale.
-
Shaperd's adaptive blocking-detection mode can integrate with external blockage-detection tools (e.g., Troll Patrol) to detect when a constraint set is no longer effective and automatically switch to an alternate constraint set, changing packet patterns to restore connectivity without user intervention.
-
The GFW detects fully encrypted protocols using ad-hoc rules including the percentage of printable ASCII characters per packet (threshold: over 50%) and the observation that FEP entropy is considerably higher than normal encrypted TLS traffic. These rules are subject to frequent changes, making rigid FEP designs unable to adapt.
-
Packet timings are a distinct detection vector for circumvention tools beyond payload content and packet lengths, as demonstrated by Wails et al. 2024. Prior FEP-specific shaping work (Fenske et al.) addressed packet lengths but explicitly left timing shaping for future work, leaving a known gap in detection resistance.
-
Shaperd's proof-of-concept prototype (~1000 lines of Go) introduces a 4.1% throughput overhead for a single entropy constraint; the first additional constraint adds 5.1% overhead and the second adds 5.5%, with total overhead scaling with constraint count and rigor.
-
Shaperd introduces a constraint-agnostic traffic shaping system that operates on both packet content and timing in real time, designed for drop-in integration with any existing FEP. The system uses a four-component constraint definition (function, value, comparison operator, target packets) capable of expressing any rule based on a computable deterministic function over packet contents.
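The four-component constraint definition can be pictured as a small record type. The sketch below is an illustration only: the class name, field names, and the example rule are assumptions, and Shaperd's actual Go types and configuration syntax are not reproduced here.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Constraint:
    """Hypothetical model of a (function, value, comparison, targets) rule."""
    function: Callable[[bytes], float]          # deterministic function over packet contents
    value: float                                # reference value to compare against
    comparison: Callable[[float, float], bool]  # e.g. operator.gt, operator.le
    targets: Callable[[int], bool]              # selects packets by their index in the flow

    def satisfied_by(self, index: int, packet: bytes) -> bool:
        if not self.targets(index):
            return True  # the rule does not apply to this packet
        return self.comparison(self.function(packet), self.value)


def printable_fraction(packet: bytes) -> float:
    """Fraction of bytes in the printable ASCII range 0x20 to 0x7E."""
    if not packet:
        return 0.0
    return sum(0x20 <= b <= 0x7E for b in packet) / len(packet)


# Example rule (an assumption for illustration): the first packet of a flow
# must be more than 50% printable ASCII.
first_packet_mostly_printable = Constraint(
    function=printable_fraction,
    value=0.5,
    comparison=operator.gt,
    targets=lambda index: index == 0,
)

print(first_packet_mostly_printable.satisfied_by(0, b"hello world"))
```

Keeping the function, threshold, comparison, and packet selector as separate fields is what lets a rule set be swapped without touching the shaping code that enforces it.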
-
WATER (WebAssembly Transport Executables Runtime) defines a pluggable-transport architecture in which the transport logic is compiled to a WASM module that is loaded and executed at runtime by a thin Go host process. This separates the stable host ABI (dial, accept, read, write) from the rapidly-evolving transport logic, allowing new or updated transports to be delivered as small WASM binaries without recompiling or redeploying the host application.
-
WATER (WebAssembly Transport Executables Runtime) separates transport logic from the host application by compiling it to a WASM module (WATM) that is distributed and loaded independently at runtime. Deploying a new or updated circumvention technique requires only distributing the new WATM binary and optional configuration — no change to the host application and no app-store update cycle is required.
-
Discop achieves provably perfect steganographic security (D_KL(P_c ‖ P_s) = 0) by constructing multiple 'distribution copies' of a generative model's predicted distribution and using the copy index to encode the secret message. Because all copies share identical token probabilities, the stego distribution is exactly equal to the cover distribution and no steganalyzer can perform better than random guessing.
-
Achieving active security (FEP-CCFA) requires that on any AEAD decryption failure a fully encrypted protocol silently return the empty string and keep the channel open indefinitely, never emitting a channel-closure signal. Any observable behavioral difference — including connection termination timing — leaks information about ciphertext-boundary locations to an active adversary.
-
Shadowsocks transmits a fixed-size AEAD-encrypted length field followed by the AEAD-encrypted payload with no support for reducing ciphertext size via fragmentation, while Obfs4 permits input-side padding but not output fragmentation. These designs impose distinct minimum output message lengths, allowing a passive adversary to distinguish between them — and identify short-message sessions — based solely on the minimum observed message length.
-
No existing fully encrypted protocol — including Obfs4, Shadowsocks, VMess, and Obfuscated OpenSSH — simultaneously satisfies passive indistinguishability (FEP-CPFA), active-manipulation resistance (FEP-CCFA), and output-length shaping. The paper presents a novel stream-based construction that provably satisfies all three using AEAD-authenticated length blocks, an output buffer supporting arbitrary fragmentation, and a padding mechanism allowing the sender to emit exactly p output bytes on demand.
-
Obfs4's data-transport phase encrypts per-record length fields with an unauthenticated stream cipher. An active adversary can overwrite this field to force a predictable TCP connection termination at a calculable byte offset; the authors experimentally confirmed that Tor-over-Obfs4 connections can be reliably distinguished from other FEPs because client initiation messages have consistent lengths.
-
Censors optimize for utility under asymmetric misclassification costs rather than raw accuracy: false positives (blocking legitimate traffic) carry economic and political costs that make censors conservative about deploying classifiers with high false-positive rates. Multi-flow stateful classifiers — such as the obfs4 Elligator probabilistic distinguisher, which requires correlating observations across multiple connections — are operationally more expensive than single-packet or connection-initiation classifiers, which the author suggests explains why probabilistic multi-flow distinguishers have not been exploited in practice even when theoretically available.
-
Despite fully encrypted protocols existing since obfs2 in 2012, the first documented evidence of the GFW passively detecting them purely by randomness appeared only in 2021 — approximately a decade later — and was limited to certain foreign IP address ranges and a subsampled fraction of traffic. Meanwhile, the GFW had been discovering obfs2/obfs3 servers via active probing as early as 2013, indicating censors found active-probing-based address discovery cheaper and more reliable than passive statistical classifiers for this protocol family.
-
Three independent implementation flaws in obfs4proxy's Elligator encoding made obfs4 public-key representatives passively distinguishable from uniform random bytes: (1) non-canonical square roots allowed a square-then-root test matching 100% of obfs4 outputs but only ~50% of random strings; (2) bit 255 was always zero; (3) only large prime-order subgroup points were encoded. A classifier exploiting these achieves 100% sensitivity (obfs4 never falsely marked as random) at less-than-100% specificity. All three were fixed in obfs4proxy-0.0.12 (December 2021) and 0.0.14 (September 2022).
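Flaw (2) shows why such a distinguisher is one-sided and needs several observations. The sketch below assumes a 32-byte little-endian representative, so that bit 255 is the most significant bit of the last byte; the function names are hypothetical and the code is not taken from obfs4proxy or from the paper.

```python
import os


def bit255_is_zero(representative: bytes) -> bool:
    """True if bit 255 is clear, assuming a 32-byte little-endian encoding."""
    if len(representative) != 32:
        raise ValueError("expected a 32-byte representative")
    return (representative[31] & 0x80) == 0


def consistent_with_flawed_obfs4(representatives: list[bytes]) -> bool:
    """One observation with bit 255 set rules out the pre-fix encoder."""
    return all(bit255_is_zero(r) for r in representatives)


def chance_random_passes(n: int) -> float:
    """Probability that n independent uniform strings all have bit 255 clear."""
    return 0.5 ** n


# Uniform random strings pass a single check half the time, so the false
# match rate only becomes small after many observations of the same server.
samples = [os.urandom(32) for _ in range(20)]
print(consistent_with_flawed_obfs4(samples), chance_random_passes(20))
```

The need to accumulate observations per server is what makes this a multi-connection test rather than a single-packet one.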
-
The GFW detects Shadowsocks by flagging apparently high-entropy connections that are not TLS or HTTP, but this detection is brittle: connections are explicitly allowed if the first 6 bytes of the first packet of a flow are all printable ASCII characters (range 0x20–0x7E). Adding a 6-byte alphanumeric preamble to the Shadowsocks message definition is sufficient to bypass this heuristic and requires only a short patch to the protocol specification file.
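The preamble idea can be sketched in a few lines. This is an illustration under assumptions: the helper names are hypothetical, the receiver is assumed to strip a fixed-length preamble before normal processing, and the code does not reflect any particular Shadowsocks implementation or the patch the source refers to.

```python
import secrets
import string

# Letters and digits all fall inside the printable ASCII range 0x20 to 0x7E.
ALPHANUMERIC = (string.ascii_letters + string.digits).encode("ascii")
PREAMBLE_LENGTH = 6  # matches the 6-byte prefix the heuristic inspects


def add_printable_preamble(message: bytes, length: int = PREAMBLE_LENGTH) -> bytes:
    """Prepend `length` random alphanumeric bytes to the first message of a flow."""
    preamble = bytes(secrets.choice(ALPHANUMERIC) for _ in range(length))
    return preamble + message


def strip_preamble(data: bytes, length: int = PREAMBLE_LENGTH) -> bytes:
    """Receiver side: drop the fixed-length preamble before decrypting."""
    return data[length:]


ciphertext = secrets.token_bytes(64)  # stand-in for an encrypted first message
wire = add_printable_preamble(ciphertext)
print(wire[:PREAMBLE_LENGTH], strip_preamble(wire) == ciphertext)
```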
-
The GFW's fully-encrypted detector (deployed Nov 2021) operates by exempting likely-benign traffic and blocking the rest. Five inferred exemption rules applied to the first TCP payload (pkt): Ex1 — popcount(pkt)/len(pkt) ≤ 3.4 or ≥ 4.6 (bits/byte); Ex2 — first 6+ bytes are printable ASCII [0x20–0x7e]; Ex3 — more than 50% of bytes are printable ASCII; Ex4 — more than 20 contiguous printable ASCII bytes; Ex5 — first bytes match TLS or HTTP fingerprint. Traffic failing all five exemptions is blocked. Experiments confirmed all rules still held as of February 2023.
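The five exemption rules can be written as predicates over the first TCP payload. The sketch below follows the thresholds listed above; the function names are hypothetical, and Ex5 is a deliberately simplified placeholder because the finding does not spell out the TLS and HTTP fingerprints.

```python
def is_printable(b: int) -> bool:
    return 0x20 <= b <= 0x7E


def ex1_popcount(pkt: bytes) -> bool:
    """Average number of set bits per byte is <= 3.4 or >= 4.6."""
    avg = sum(bin(b).count("1") for b in pkt) / len(pkt)
    return avg <= 3.4 or avg >= 4.6


def ex2_printable_prefix(pkt: bytes) -> bool:
    """The first 6 bytes are all printable ASCII."""
    return len(pkt) >= 6 and all(is_printable(b) for b in pkt[:6])


def ex3_majority_printable(pkt: bytes) -> bool:
    """More than 50% of bytes are printable ASCII."""
    return sum(is_printable(b) for b in pkt) > len(pkt) / 2


def ex4_printable_run(pkt: bytes) -> bool:
    """More than 20 contiguous printable ASCII bytes appear anywhere."""
    run = 0
    for b in pkt:
        run = run + 1 if is_printable(b) else 0
        if run > 20:
            return True
    return False


def ex5_known_protocol(pkt: bytes) -> bool:
    """Placeholder fingerprint: TLS handshake record header or an HTTP method."""
    http_methods = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ", b"OPTIONS ")
    looks_like_tls = len(pkt) >= 3 and pkt[0] == 0x16 and pkt[1] == 0x03
    return looks_like_tls or pkt.startswith(http_methods)


EXEMPTIONS = (ex1_popcount, ex2_printable_prefix, ex3_majority_printable,
              ex4_printable_run, ex5_known_protocol)


def would_be_blocked(pkt: bytes) -> bool:
    """A non-empty first payload is blocked only if no exemption applies."""
    if not pkt:
        return False
    return not any(rule(pkt) for rule in EXEMPTIONS)


print(would_be_blocked(b"GET / HTTP/1.1\r\n"), would_be_blocked(bytes([0x0F, 0xF0] * 50)))
```

Because the rules describe what is allowed, any one of them is enough to exempt a flow, which is why a short printable prefix or a skewed bit count defeats the detector.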
-
The GFW applies the fully-encrypted detector probabilistically and only to a targeted subset of IP address space. Each qualifying connection is blocked with probability p = 26.3% (geometric distribution fit over 109,489 affected IPs in a 10% IPv4 scan); residual censorship then blocks the same 3-tuple (client IP, server IP, server port) for 180 seconds after a first block. The detector only monitors ~26% of connections and targets specific IP ranges of popular data centers (VPS providers such as Alibaba US, Constant, DigitalOcean, Linode); large CDNs (Akamai, Cloudflare) and most residential/enterprise IPs are unaffected. 98% of scanned IPs were unaffected. Simulated on live university traffic, the rules would block ~0.6% of normal connections as collateral damage.
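The per-connection blocking probability translates directly into how quickly a monitored server gets hit. The worked sketch below assumes each qualifying connection is blocked independently with p = 0.263, which is what a geometric fit implies; it ignores the 180-second residual censorship window and the IP-range targeting, and the function names are illustrative.

```python
import math

P_BLOCK = 0.263  # per-connection blocking probability reported above


def prob_blocked_within(n: int, p: float = P_BLOCK) -> float:
    """Probability that at least one of n qualifying connections is blocked."""
    return 1.0 - (1.0 - p) ** n


def expected_connections_until_block(p: float = P_BLOCK) -> float:
    """Mean of the geometric distribution: connections up to and including the first block."""
    return 1.0 / p


def connections_for_confidence(target: float, p: float = P_BLOCK) -> int:
    """Smallest n such that prob_blocked_within(n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


for n in (1, 5, 10):
    print(n, round(prob_blocked_within(n), 3))
print(round(expected_connections_until_block(), 2), connections_for_confidence(0.99))
```

A measurement that sends only one or two probes per server would therefore miss most affected addresses, which is one reason repeated connections are needed when scanning for this behavior.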
-
Starting October 3, 2022, more than 100 users reported simultaneous blocking of TLS-based circumvention servers running Trojan, Xray, V2Ray TLS+WebSocket, VLESS, and gRPC. Blocking was port-specific initially (mainly port 443, but also non-443 ports), then escalated to full IP blocking when users switched ports. Domain names were not added to DNS or SNI blocklists. naiveproxy was notably not affected. The blocking was dynamic in at least some cases (browsers could still reach the port, but circumvention tools could not), strongly indicating protocol-level identification rather than blind port blocking.
-
The October 2022 blocking wave is the confirmed operational deployment of the fully-encrypted-traffic detector later formalized in Wu et al. (USENIX Security 2023). The detector was therefore in live production by late 2022 at the latest, months before the academic paper describing it was published. This event establishes that the GFW's passive fully-encrypted classifier operates at scale in adversarial real-world conditions, not just in controlled experiments.
-
Current randomized-payload circumvention tools (obfs4/ScrambleSuit, SkypeMorph, VoIP-tunneling) rely on censors 'defaulting open' — treating unidentified traffic as innocuous. If censors instead block all traffic not explicitly recognizable as meaningful plaintext, these tools fail entirely. The paper notes anecdotal evidence this is already occurring, including blocking of some TLS 1.3 connections.
-
Against censors that detect blacklisted application protocols by examining only the first 30 packets of a flow (e.g., the technique in Wang et al. 2015), a single IP migration after 30 packets have been exchanged is sufficient to defeat detection while incurring minimal performance overhead—the client continues the connection normally on the new address.
-
ScholarCloud's 'message blinding' — a non-public byte mapping (f: [0, 2^8) → [0, 2^8)) applied between domestic and remote proxy — successfully evades GFW deep packet inspection with 0.22% average packet loss rate, statistically indistinguishable from native VPN (0.21%). The paper reports that even this simple encoding suffices because the GFW cannot classify the traffic; confidentiality of the algorithm is the operative property, not cryptographic strength. Because the operator controls both proxy endpoints, the blinding scheme can be rotated at any time without requiring client-side updates.
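A byte mapping of this kind can be sketched as a lookup table applied to every byte. The sketch assumes the mapping is a bijection (otherwise the remote proxy could not undo it) and derives the table from a shared seed; the function names and the seeding scheme are assumptions, not ScholarCloud's actual design.

```python
import random


def make_blinding_table(seed: int) -> bytes:
    """Build a permutation of 0..255 from a shared secret seed.

    `random` is not a cryptographic generator; as the finding notes, secrecy
    of the mapping rather than cryptographic strength is the operative property.
    """
    table = list(range(256))
    random.Random(seed).shuffle(table)
    return bytes(table)


def invert_table(table: bytes) -> bytes:
    """Compute the inverse permutation used by the receiving proxy."""
    inverse = bytearray(256)
    for plain, blinded in enumerate(table):
        inverse[blinded] = plain
    return bytes(inverse)


def apply_table(data: bytes, table: bytes) -> bytes:
    """Map every byte of `data` through a 256-entry table."""
    return data.translate(table)


table = make_blinding_table(seed=2024)  # hypothetical shared seed
inverse = invert_table(table)
message = b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"
blinded = apply_table(message, table)
print(blinded != message, apply_table(blinded, inverse) == message)
```

Rotating the scheme then amounts to both proxies switching to a new table, which involves no change on the end user's device.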
-
The paper demonstrates that 'having no fingerprint is itself a fingerprint': randomizing obfuscators that emit uniformly random bytes from the first packet are detectable precisely because conventional protocols (TLS, SSH, HTTP) always begin with fixed plaintext headers. This structural distinction requires no deep payload parsing — the attack operates on only the first TCP packet — and achieves TPR=1.0 / FPR=0.002 against obfsproxy3/4 using commodity-implementable statistics.
-
Undetectability of a message requires that it be indistinguishable from 'random noise': an attacker cannot sufficiently distinguish whether the message exists or not. This is distinct from anonymity, which protects only the relationship between an item of interest (IOI) and a subject, not the IOI's existence itself. Undetectability is possible only for subjects not involved in the IOI; senders and recipients cannot achieve it against each other.