CENSORS
generic Generic / not censor-specific
Use for technique papers that don't evaluate against a specific censor.
325 papers on file
- 2026-almutairi-server Server, Client, or Relay? Dual-Role Detection of Circumvention Relays
- 2026-anon-anytls-anytls-sing-box-2026 What Is the AnyTLS Protocol? AnyTLS Principles, sing-box Deployment, and a Complete Client Configuration Guide (2026) | 二毛
- 2026-edorh-shieldshare ShieldShare: Building a VPN-backed Android Hotspot for Secure Internet Sharing with Per-User Traffic Accounting
- 2026-fan-activeflowmark-assessing-tor ActiveFlowMark: Assessing Tor Anonymity under Active Bandwidth Watermarking
- 2026-fares-game The Game Has Changed: Revisiting proxy distribution and game theory
- 2026-ferrel-aegis-adversarial-entropy-guided AEGIS: Adversarial Entropy-Guided Immune System -- Thermodynamic State Space Models for Zero-Day Network Evasion Detection
- 2026-he-trafficmoe-heterogeneity-aware-mixture TrafficMoE: Heterogeneity-aware Mixture of Experts for Encrypted Traffic Classification
- 2026-jois-assemblage Assemblage: Chipping Away at Censorship with Generative Steganography
- 2026-kamali-huma Huma: Censorship Circumvention via Web Protocol Tunneling with Deferred Traffic Replacement
- 2026-kang-censorless-serverless CensorLess: Cost-Efficient Censorship Circumvention Through Serverless Cloud Functions
- 2026-kulatilleke-mambanetburst-direct-byte-level MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining
- 2026-lange-towards Towards Automated DNS Censorship Circumvention
- 2026-lee-quicstep QUICstep: Evaluating connection migration based QUIC censorship circumvention
- 2026-lian-decompose-understand-fuse Decompose to Understand, Fuse to Detect: Frequency-Decoupled Anomaly Detection for Encrypted Network Traffic
- 2026-lipphardt-dual Dual Standards: Examining Content Moderation Disparities Between API and WebUI Interfaces in Large Language Models
- 2026-lugoloobi-known-their-actions Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces
- 2026-mathews-tracing-chain-deep Tracing the Chain: Deep Learning for Stepping-Stone Intrusion Detection
- 2026-micallef-reportor-facilitating-user ReporTor: Facilitating User Reporting of Issues Encountered in Naturalistic Web Browsing via Tor Browser
- 2026-patterniha-mitm-domainfronting MITM-DomainFronting: client-only domain fronting via local TLS MITM with a user-installed CA
- 2026-pulls-ephemeral-network-layer-fingerprinting Ephemeral Network-Layer Fingerprinting Defenses
- 2026-ratliff-mirage Mirage: Private, Mobility-based Routing for Censorship Evasion
- 2026-rohrer-convolutional-neural-networks-deanonymisation-i2p Convolutional-Neural-Networks for Deanonymisation of I2P Traffic
- 2026-sheffey-geedge Geedge Cases: Censorship Measurement Insights from the Geedge Networks Leak
- 2026-song-personafingerprint-measuring-persona PersonaFingerprint: Measuring Persona Inference on Modern Websites with LLM-Driven Browsing
- 2026-tolley-architectural Architectural VPN Vulnerabilities, Disclosure Fatigue, and Structural Failures
- 2026-vilalonga-obscura-enabling-ephemeral Obscura: Enabling Ephemeral Proxies for Traffic Encapsulation in WebRTC Media Streams Against Cost-Effective Censors
- 2026-xian-more-than-meets More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting
- 2026-yan-efficient-provably-secure Efficient Provably Secure Linguistic Steganography via Range Coding
- 2026-yang-invisible-adversaries-systematic Invisible Adversaries: A Systematic Study of Session Manipulation Attacks on VPNs
- 2026-yuan-demux-boundary-aware-multi-scale DEMUX: Boundary-Aware Multi-Scale Traffic Demixing for Multi-Tab Website Fingerprinting
- 2026-zohaib-extended Extended Abstract: CensorAlert -- Leveraging LLM Agents for Automated Censorship Report Aggregation and Analysis
- 2025-arora-improving-performance-security Improving the Performance and Security of Tor's Onion Services
- 2025-berke-unique-whose-web How Unique is Whose Web Browser? The role of demographics in browser fingerprinting among US users
- 2025-h-ller-evaluating Evaluating Onion Address Collection Methods
- 2025-himmelberger-drivel Drivel: A Quantum-Safe Fully Encrypted Protocol Proxy
- 2025-inyangson-amigo Amigo: Secure Group Mesh Messaging in Realistic Protest Settings
- 2025-kamali-anix Anix: Anonymous Blackout-Resistant Microblogging with Message Endorsing
- 2025-lee-onions-got-puzzled Onions Got Puzzled: On the Challenges of Mitigating Denial-of-Service Problems in Tor Onion Services
- 2025-lipphardt-1-800-censorship 1-800-Censorship: Analyzing internet censorship data using the Internet Yellow Pages
- 2025-midtlien-fingerprint-resistant Fingerprint-resistant DTLS for usage in Snowflake
- 2025-mixon-baca-hidden Hidden Links: Analyzing Secret Families of VPN Apps
- 2025-nourin-nobody Is Nobody There? Good! Globally Measuring Connection Tampering without Responsive Endhosts
- 2025-pereira-extended Extended Abstract: Traffic Shaping for Network Protocols: A Modular and Developer-Friendly Framework
- 2025-pereira-position Position Paper: A Case for Machine-Checked Verification of Circumvention Systems
- 2025-rodriguez-revisiting Revisiting BAT Browsers: Protecting At-Risk Populations from Surveillance, Censorship, and Targeted Attacks
- 2025-sharma-cenpush CenPush: Blocking-Resistant Control Channel Using Push Notifications
- 2025-sheffey-extended Extended Abstract: I’ll Shake Your Hand: What Happens After DNS Poisoning
- 2025-sivan-sevilla-probing Probing the third-party infrastructure of digital news on the Web
- 2025-syverson-onion-location-measurements-fingerprinting Onion-Location Measurements and Fingerprinting
- 2025-tusing-minecraft-tunnels Minecraft tunnels for covert communications
- 2025-umesh-improved An Improved BGP Internet Graph for Optimizing Refraction Proxy Placement
- 2025-vafa-learning Learning from Censored Experiences: Social Media Discussions around Censorship Circumvention Technologies
- 2025-vilalonga-extended Extended Abstract: Using TURN Servers for Censorship Evasion
- 2025-vines-extended Extended Abstract: Nobody’s Fault but Mine: Using Unauthenticated Unidirectional Pushes for Client Update
- 2025-wails-censorship Censorship Evasion with Unidentified Protocol Generation
- 2025-walsh-improved-open-world-fingerprinting Improved Open-World Fingerprinting Increases Threat to Streaming Video Privacy but Realistic Scenarios Remain Difficult
- 2025-wang-custom Is Custom Congestion Control a Bad Idea for Circumvention Tools?
- 2025-wendzel-survey A Survey of Internet Censorship and its Measurement: Methodology, Trends, and Challenges
- 2025-wilson-extended Extended Abstract: Shaperd: Easily Adoptable Real-Time Traffic Shaper for Fully Encrypted Protocols
- 2025-wrana-sok-surveillance SoK: The Spectre of Surveillance and Censorship in Future Internet Architectures
- 2025-xue-discriminative The Discriminative Power of Cross-layer RTTs in Fingerprinting Proxy Traffic
- 2017-frolov-water-pluggable WATER: a programmable framework for pluggable transports
- 2024-ahmed-extended Extended Abstract: The Impact of Online Censorship on LLMs
- 2024-almutairi-fingerprinting Fingerprinting VPNs with Custom Router Firmware: A New Censorship Threat Model
- 2024-awwad-digital Digital Repression in Palestine
- 2024-bhaskar-understanding Understanding Routing-Induced Censorship Changes Globally
- 2024-bocovich-snowflake Snowflake, a censorship circumvention system using temporary WebRTC proxies
- 2024-calle-toward Toward Automated DNS Tampering Detection Using Machine Learning
- 2024-chen-extended Extended Abstract: Oscur0: One-shot Circumvention without Registration
- 2024-chi-just Just add WATER: WebAssembly-based Circumvention Transports
- 2024-durumeric-ten-years-zmap Ten Years of ZMap
- 2024-gao-extended Extended Abstract: Leveraging Large Language Models to Identify Internet Censorship through Network Data
- 2024-gosain-out Out in the Open: On the Implementation of Mobile App Filtering in India
- 2024-hanlon-detecting Detecting VPN Traffic through Encapsulated TCP Behavior
- 2024-holland-detorrent DeTorrent: An Adversarial Padding-only Traffic Analysis Defense
- 2024-kon-netshuffle NetShuffle: Circumventing Censorship with Shuffle Proxies at the Edge
- 2024-kon-spotproxy SpotProxy: Rediscovering the Cloud for Censorship Circumvention
- 2024-kujath-analyzing Analyzing Prominent Mobile Apps in Latin America
- 2024-lorimer-extended Extended Abstract: Traffic Splitting for Pluggable Transports
- 2024-m-ller-turning Turning Attacks into Advantages: Evading HTTP Censorship with HTTP Request Smuggling
- 2024-mixon-baca-snitch Attacking Connection Tracking Frameworks as used by Virtual Private Networks
- 2024-moon-pryde Pryde: A Modular Generalizable Workflow for Uncovering Evasion Attacks Against Stateful Firewall Deployments
- 2024-niere-tls-attacker TLS-Attacker: A Dynamic Framework for Analyzing TLS Implementations
- 2024-pu-exploring Exploring Amazon Simple Queue Service (SQS) for Censorship Circumvention
- 2024-ruo-lost Lost in Translation: Characterizing Automated Censorship in Online Translation Services
- 2024-tang-automatic Automatic Generation of Web Censorship Probe Lists
- 2024-tsai-modeling Modeling and Detecting Internet Censorship Events
- 2024-vilalonga-looking Looking at the Clouds: Leveraging Pub/Sub Cloud Services for Censorship-Resistant Rendezvous Channels
- 2024-vines-communication Communication Breakdown: Modularizing Application Tunneling for Signaling Around Censorship
- 2024-vines-ten Ten Years Gone: Revisiting Cloud Storage Transports to Reduce Censored User Burdens
- 2024-wails-precisely On Precisely Detecting Censorship Circumvention in Real-World Networks
- 2024-wang-identifying Identifying VPN Servers through Graph-Represented Behaviors
- 2024-xue-bridging Bridging Barriers: A Survey of Challenges and Priorities in the Censorship Circumvention Landscape
- 2024-xue-fingerprinting Fingerprinting Obfuscated Proxy Traffic with Encapsulated TLS Handshakes
- 2024-zillien-look Look What's There! Utilizing the Internet's Existing Data for Censorship Circumvention with OPPRESSION
- 2023-amich-deresistor DeResistor: Toward Detection-Resistant Probing for Evasion of Internet Censorship
- 2023-arora-detor-onion Provably Avoiding Geographic Regions for Tor's Onion Services
- 2023-bischof-destination Destination Unreachable: Characterizing Internet Outages and Shutdowns
- 2023-brown-augmenting Augmenting Rule-based DNS Censorship Detection at Scale with Machine Learning
- 2023-ding-discop Discop: Provably secure steganography in practice based on "distribution copies"
- 2023-fenske-security Security Notions for Fully Encrypted Protocols
- 2023-fifield-comments Comments on certain past cryptographic flaws affecting fully encrypted censorship circumvention protocols
- 2023-fifield-running Running a high-performance pluggable transports Tor bridge
- 2023-jia-voiceover Voiceover: Censorship-Circumventing Protocol Tunnels with Generative Modeling
- 2023-master-worldwide A Worldwide View of Nation-state Internet Censorship
- 2023-nourin-detecting Detecting Network Interference Without Endpoint Participation
- 2023-raman-advancing Advancing the Art of Censorship Data Analysis
- 2023-raman-global Global, Passive Detection of Connection Tampering
- 2023-ramesh-certainty CERTainty: Detecting DNS Manipulation at Scale using TLS Certificates
- 2023-sharma-dolphin Dolphin: A Cellular Voice Based Internet Shutdown Resistance System
- 2023-sun-telepath TELEPATH: A Minecraft-based Covert Communication System
- 2023-tran-crowdsourcing Crowdsourcing the Discovery of Server-side Censorship Evasion Strategies
- 2023-tulloch-lox Lox: Protecting the Social Graph in Bridge Distribution
- 2023-ververis-website Website blocking in the European Union: Network interference from the perspective of Open Internet
- 2023-wails-proteus Proteus: Programmable Protocols for Censorship Circumvention
- 2023-wang-chasing Chasing Shadows: A security analysis of the ShadowTLS proxy
- 2023-wang-self-censorship Self-Censorship Under Law: A Case Study of the Hong Kong National Security Law
- 2023-xue-use The Use of Push Notification in Censorship Circumvention
- 2022-bhaskar-many Many Roads Lead To Rome: How Packet Headers Influence DNS Censorship Measurement
- 2022-cheng-in-depth In-Depth Evaluation of the Impact of National-Level DNS Filtering on DNS Resolvers over Space and Time
- 2022-figueira-stegozoa Stegozoa: Enhancing WebRTC Covert Channels with Video Steganography for Internet Censorship Circumvention
- 2022-harrity-get GET /out: Automated Discovery of Application-Layer Censorship Evasion Strategies
- 2022-hoang-measuring Measuring the Accessibility of Domain Name Encryption and Its Impact on Internet Filtering
- 2022-raman-network Network Measurement Methods for Locating and Examining Censorship Devices
- 2022-ramesh-vpnalyzer VPNalyzer: Systematic Investigation of the VPN Ecosystem
- 2022-waheed-darwin-s Darwin's Theory of Censorship: Analysing the Evolution of Censored Topics with Dynamic Topic Models
- 2022-xue-openvpn OpenVPN is Open to VPN Fingerprinting
- 2021-basso-measuring Measuring DoT/DoH blocking using OONI Probe: a preliminary study
- 2021-bock-weaponizing Weaponizing Middleboxes for TCP Reflected Amplification
- 2021-bock-your Your Censor is My Censor: Weaponizing Censorship Infrastructure for Availability Attacks
- 2021-elmenhorst-web Web censorship measurements of HTTP/3 over QUIC
- 2021-gosain-too Too Close for Comfort: Morasses of (Anti-) Censorship in the Era of CDNs
- 2021-kaptchuk-meteor Meteor: Cryptographically Secure Steganography for Realistic Distributions
- 2021-kwan-exploring Exploring Simple Detection Techniques for DNS-over-HTTPS Tunnels
- 2021-lorimer-oustralopithecus OUStralopithecus: Overt User Simulation for Censorship Circumvention
- 2021-rosen-balboa Balboa: Bobbing and Weaving around Network Censorship
- 2021-satija-blindtls BlindTLS: Circumventing TLS-Based HTTPS Censorship
- 2021-sharma-camoufler Camoufler: Accessing The Censored Web By Utilizing Instant Messaging Channels
- 2021-ververis-understanding Understanding Internet Censorship in Europe: The Case of Spain
- 2021-wei-domain Domain Shadowing: Leveraging Content Delivery Networks for Robust Blocking-Resistant Communications
- 2020-barradas-poking Poking a Hole in the Wall: Efficient Censorship-Resistant Internet Communications by Parasitizing on WebRTC
- 2020-barradas-towards Towards a Scalable Censorship-Resistant Overlay Network based on WebRTC Covert Channels
- 2020-birtel-slitheen Slitheen++: Stealth TLS-based Decoy Routing
- 2020-bock-come Come as You Are: Helping Unmodified Clients Bypass Censorship with Server-side Evasion
- 2020-fifield-turbo Turbo Tunnel, a good way to design censorship circumvention protocols
- 2020-frolov-detecting Detecting Probe-resistant Proxies
- 2020-frolov-httpt HTTPT: A Probe-Resistant Proxy
- 2020-govil-mimiq MIMIQ: Masking IPs with Migration in QUIC
- 2020-minaei-moneymorph MoneyMorph: Censorship Resistant Rendezvous using Permissionless Cryptocurrencies
- 2020-nasr-massbrowser MassBrowser: Unblocking the Censored Web for the Masses, by the Masses
- 2020-niaki-iclab ICLab: A Global, Longitudinal Internet Censorship Measurement Platform
- 2020-oakley-protocol Protocol Proxy: An FTE-based covert channel
- 2020-raman-censored Censored Planet: An Internet-wide, Longitudinal Censorship Observatory
- 2020-raman-measuring Measuring the Deployment of Network Censorship Filters at Global Scale
- 2020-sharma-siegebreaker SiegeBreaker: An SDN Based Practical Decoy Routing System
- 2020-vandersloot-running Running Refraction Networking for Real
- 2020-wang-symtcp SymTCP: Eluding Stateful Deep Packet Inspection with Automated Discrepancy Discovery
- 2019-bock-geneva Geneva: Evolving Censorship Evasion Strategies
- 2019-chai-importance On the Importance of Encrypted-SNI (ESNI) to Censorship Circumvention
- 2019-frolov-conjure Conjure: Summoning Proxies from Unused Address Space
- 2019-hoang-measuring Measuring I2P Censorship at a Global Scale
- 2019-iszaevich-distributed Distributed Detection of Tor Directory Authorities Censorship in Mexico
- 2019-nasr-enemy Enemy At the Gateways: Censorship-Resilient Proxy Distribution Using Game Theory
- 2019-sheffey-improving Improving Meek With Adversarial Techniques
- 2018-barradas-effective Effective Detection of Multimedia Protocol Tunneling using Machine Learning
- 2018-bocovich-secure Secure asymmetry and deployability for decoy routing systems
- 2018-hoang-empirical An Empirical Study of the I2P Anonymity Network and its Censorship Resistance
- 2018-hobbs-sudden How Sudden Censorship Can Increase Access to Information
- 2018-manfredi-multiflow MultiFlow: Cross-Connection Decoy Routing using TLS 1.3 Session Resumption
- 2018-martiny-proof-of-censorship Proof-of-Censorship: Enabling centralized censorship-resistant content providers
- 2018-mcdonald-403 403 Forbidden: A Global View of CDN Geoblocking
- 2018-nisar-incentivizing Incentivizing Censorship Measurements via Circumvention
- 2018-tschantz-bestiary A Bestiary of Blocking: The Motivations and Modes behind Website Unavailability
- 2018-vandersloot-quack Quack: Scalable Remote Measurement of Application-Layer Censorship
- 2018-wright-identifying On Identifying Anomalies in Tor Usage with Applications in Detecting Internet Censorship
- 2017-barradas-deltashaper DeltaShaper: Enabling Unobservable Censorship-resistant TCP Tunneling over Videoconferencing Streams
- 2017-bocovich-lavinia Lavinia: An audit-payment protocol for censorship-resistant storage
- 2017-cho-churn A Churn for the Better: Localizing Censorship using Network-level Path Churn and Network Tomography
- 2017-darer-filteredweb FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs
- 2017-frolov-isp-scale An ISP-Scale Deployment of TapDance
- 2017-gebhart-internet Internet Censorship in Thailand: User Practices and Potential Threats
- 2017-gosain-devil-s The Devil's in The Details: Placing Decoy Routers in the Internet
- 2017-heydari-scalable Scalable Anti-Censorship Framework Using Moving Target Defense for Web Servers
- 2017-javaid-online Online Advertising under Internet Censorship
- 2017-jermyn-autosonda Autosonda: Discovering Rules and Triggers of Censorship Devices
- 2017-lee-usability A Usability Evaluation of Tor Launcher
- 2017-li-detor DeTor: Provably Avoiding Geographic Regions in Tor
- 2017-li-lib-cdot-erate lib·erate, (n): A library for exposing (traffic-classification) rules and avoiding them efficiently
- 2017-matic-dissecting Dissecting Tor Bridges: a Security Evaluation of Their Private and Public Infrastructures
- 2017-morshed-when When the Internet Goes Down in Bangladesh
- 2017-nasr-waterfall The Waterfall of Liberty: Decoy Routing Circumvention that Resists Routing Attacks
- 2017-pearce-augur Augur: Internet-Wide Detection of Connectivity Disruptions
- 2017-pearce-global Global Measurement of DNS Manipulation
- 2017-singh-characterizing Characterizing the Nature and Dynamics of Tor Exit Blocking
- 2017-tanash-decline The Decline of Social Media Censorship and the Rise of Self-Censorship after the 2016 Failed Turkish Coup
- 2017-ververis-internet Internet Censorship Capabilities in Cyprus: An Investigation of Online Gambling Blocklisting
- 2017-wang-your Your State is Not Mine: A Closer Look at Evading Stateful Internet Censorship
- 2017-weinberg-topics Topics of Controversy: An Empirical Analysis of Web Censorship Lists
- 2016-akbar-dns-sly DNS-sly: Avoiding Censorship through Network Complexity
- 2016-bocovich-slitheen Slitheen: Perfectly Imitated Decoy Routing through Traffic Replacement
- 2016-douglas-ghostpost GhostPost: Seamless Restoration of Censored Social Media Posts
- 2016-douglas-salmon Salmon: Robust Proxy Distribution for Censorship Circumvention
- 2016-elahi-framework A Framework for the Game-theoretic Analysis of Censorship Resistance
- 2016-fifield-censors Censors' Delay in Blocking Circumvention Proxies
- 2016-fifield-fingerprintability Fingerprintability of WebRTC
- 2016-hahn-games Games Without Frontiers: Investigating Video Games as a Covert Channel
- 2016-khattak-sok SoK: Making Sense of Censorship Resistance Systems
- 2016-kohls-skypeline SkypeLine: Robust Hidden Data Transmission for VoIP
- 2016-li-mailet Mailet: Instant Social Networking under Censorship
- 2016-mcpherson-covertcast CovertCast: Using Live Streaming to Evade Internet Censorship
- 2016-nasr-game Game of Decoys: Optimal Decoy Routing Through Game Theory
- 2016-safaka-matryoshka Matryoshka: Hiding Secret Communication in Plain Sight
- 2016-scott-satellite Satellite: Joint Analysis of CDNs and Network-Level Interference
- 2016-singh-politics The Politics of Routing: Investigating the Relationship Between AS Connectivity and Internet Freedom
- 2016-tschantz-sok SoK: Towards Grounding Censorship Circumvention in Empiricism
- 2016-zarras-leveraging Leveraging Internet Services to Evade Censorship
- 2016-zolfaghari-practical Practical Censorship Evasion Leveraging Content Delivery Networks
- 2015-aceto-internet Internet Censorship detection: A survey
- 2015-aceto-monitoring Monitoring Internet Censorship with UBICA
- 2015-burnett-encore Encore: Lightweight Measurement of Web Censorship with Cross-Origin Requests
- 2015-crandall-forgive Forgive Us our SYNs: Technical and Ethical Considerations for Measuring Internet Filtering
- 2015-dyer-marionette Marionette: A Programmable Network-Traffic Obfuscation System
- 2015-ellard-rebound Rebound: Decoy Routing on Asymmetric Routes Via Error Messages
- 2015-fifield-blocking-resistant Blocking-resistant communication through domain fronting
- 2015-gill-characterizing Characterizing Web Censorship Worldwide: Another Look at the OpenNet Initiative Data
- 2015-jones-can Can Censorship Measurements Be Safe(r)?
- 2015-jones-ethical Ethical Concerns for Censorship Measurement
- 2015-levin-alibi Alibi Routing
- 2015-narayanan-no No Encore for Encore? Ethical Questions for Web-Based Censorship Measurement
- 2015-nisar-case A Case for Marrying Censorship Measurements with Circumvention
- 2015-vines-rook Rook: Using Video Games as a Low-Bandwidth Censorship Resistant Communication Platform
- 2015-wang-seeing Seeing through Network-Protocol Obfuscation
- 2014-anderson-global Global Network Interference Detection over the RIPE Atlas Network
- 2014-brubaker-cloudtransport CloudTransport: Using Cloud Storage for Censorship-Resistant Networking
- 2014-connolly-trist TRIST: Circumventing Censorship with Transcoding-Resistant Image Steganography
- 2014-ensafi-detecting Detecting Intentional Packet Drops on the Internet via TCP/IP Side Channels
- 2014-houmansadr-no No Direction Home: The True Cost of Routing Around Decoys
- 2014-jones-automated Automated Detection and Fingerprinting of Censorship Block Pages
- 2014-jones-facade Facade: High-Throughput, Deniable Censorship Circumvention Using Web Search
- 2014-li-facet Facet: Streaming over Videoconferencing for Censorship Circumvention
- 2014-luchaup-libfte LibFTE: A Toolkit for Constructing Practical, Format-Abiding Encryption Schemes
- 2014-nobori-vpn VPN Gate: A Volunteer-Organized Public VPN Relay System with Blocking Resistance for Bypassing Government Censorship Firewalls
- 2014-roos-measuring Measuring Freenet in the Wild: Censorship-resilience under Observation
- 2014-tan-censorship Censorship Resistance as a Side-Effect
- 2014-wachs-censorship-resistant A Censorship-Resistant, Privacy-Enhancing and Fully Decentralized Name System
- 2014-wustrow-tapdance TapDance: End-to-Middle Anticensorship without Flow Blocking
- 2013-benson-gaining Gaining Insight into AS-level Outages through Analysis of Internet Background Radiation
- 2013-dalek-method A Method for Identifying and Confirming the Use of URL Filtering Products for Censorship
- 2013-das-self-censorship Self-Censorship on Facebook
- 2013-detal-revealing Revealing Middlebox Interference with Tracebox
- 2013-durumeric-zmap ZMap: Fast Internet-wide Scanning and its Security Applications
- 2013-dyer-protocol Protocol Misidentification Made Easy with Format-Transforming Encryption
- 2013-fifield-oss OSS: Using Online Scanning Services for Censorship Circumvention
- 2013-geddes-cover Cover Your ACKs: Pitfalls of Covert Channel Censorship Circumvention
- 2013-hasan-building Building Dissent Networks: Towards Effective Countermeasures against Large-Scale Communications Blackouts
- 2013-houmansadr-i I want my voice to be heard: IP over Voice-over-IP for unobservable censorship circumvention
- 2013-houmansadr-parrot The Parrot is Dead: Observing Unobservable Network Communications
- 2013-invernizzi-message Message In A Bottle: Sailing Past Censorship
- 2013-khattak-towards Towards Illuminating a Censorship Monitor's Model to Facilitate Evasion
- 2013-ruffing-identity-based Identity-Based Steganography and Its Applications to Censorship Resistance
- 2013-verkamp-five Five Incidents, One Theme: Twitter Spam as a Weapon to Drown Voices of Protest
- 2013-wachs-feasibility On the Feasibility of a Censorship Resistant Decentralized Name System
- 2013-wang-rbridge rBridge: User Reputation based Tor Bridge Distribution with Privacy Preservation
- 2013-winter-scramblesuit ScrambleSuit: A Polymorphic Network Protocol to Circumvent Censorship
- 2013-winter-towards Towards a Censorship Analyser for Tor
- 2013-zhou-sweet SWEET: Serving the Web by Exploiting Email Tunnels
- 2012-duan-hold-on Hold-On: Protecting Against On-Path DNS Poisoning
- 2012-fifield-evading Evading Censorship with Browser-Based Proxies
- 2012-filast-ooni OONI: Open Observatory of Network Interference
- 2012-lincoln-bootstrapping Bootstrapping Communications into an Anti-Censorship System
- 2012-ling-extensive Extensive Analysis and Large-Scale Empirical Evaluation of Tor Bridge Discovery
- 2012-moghaddam-skypemorph SkypeMorph: Protocol Obfuscation for Tor Bridges
- 2012-rogers-secure Secure Communication over Diverse Transports
- 2012-schuchard-routing Routing Around Decoys
- 2012-sparks-collateral The Collateral Damage of Internet Censorship by DNS Injection
- 2012-thomas-adapting Adapting Social Spam Infrastructure for Political Censorship
- 2012-vasserman-one-way One-way indexing for plausible deniability in censorship resistant storage
- 2012-verkamp-inferring Inferring Mechanics of Web Censorship Around the World
- 2012-wang-censorspoofer CensorSpoofer: Asymmetric Communication using IP Spoofing for Censorship-Resistant Web Browsing
- 2012-weinberg-stegotorus StegoTorus: A Camouflage Proxy for the Tor Anonymity System
- 2011-bachrach-h00t #h00t: Censorship Resistant Microblogging
- 2011-bonneau-scrambling Scrambling for lightweight censorship resistance
- 2011-dainotti-analysis Analysis of Country-wide Internet Outages Caused by Censorship
- 2011-danezis-anomaly-based An anomaly-based censorship-detection system for Tor
- 2011-espinoza-automated Automated Named Entity Extraction for Tracking Censorship of Current Events
- 2011-houmansadr-cirripede Cirripede: Circumvention Infrastructure using Router Redirection with Plausible Deniability
- 2011-jones-hiding Hiding Amongst the Clouds: A Proposal for Cloud-based Onion Routing
- 2011-karlin-decoy Decoy Routing: Toward Unblockable Internet Communication
- 2011-kathuria-bypassing Bypassing Internet Censorship for News Broadcasters
- 2011-liu-tor Tor Instead of IP
- 2011-mccoy-proximax Proximax: A Measurement Based System for Proxies Dissemination
- 2011-roberts-mapping Mapping Local Internet Control
- 2011-seltzer-infrastructures Infrastructures of Censorship and Lessons from Copyright Resistance
- 2011-sfakianakis-censmon CensMon: A Web Censorship Monitor
- 2011-shklovski-online Online Contribution Practices in Countries that Engage in Internet Blocking and Censorship
- 2011-smits-bridgespa BridgeSPA: Improving Tor Bridges with Single Packet Authorization
- 2011-wiley-dust Dust: A Blocking-Resistant Internet Transport Protocol
- 2011-wright-fine-grained Fine-Grained Censorship Mapping: Information Sources, Legality and Ethics
- 2011-wustrow-telex Telex: Anticensorship in the Network Infrastructure
- 2010-burnett-chipping Chipping Away at Censorship Firewalls with User-Generated Content
- 2010-mahdian-fighting Fighting Censorship with Algorithms
- 2010-pfitzmann-terminology A terminology for talking about privacy by data minimization: Anonymity, Unlinkability, Undetectability, Unobservability, Pseudonymity, and Identity Management
- 2009-backes-anonymity Anonymity and Censorship Resistance in Unstructured Overlay Networks
- 2009-cao-skyf2f SkyF2F: Censorship Resistant via Skype Overlay Network
- 2009-mclachlan-risks On the risks of serving whenever you surf: Vulnerabilities in Tor's blocking resistance design
- 2009-weaver-detecting Detecting Forged TCP Reset Packets
- 2008-aycock-good "Good" Worms and Human Rights
- 2008-sovran-pass Pass it on: Social Networks Stymie Censors
- 2007-crandall-conceptdoppler ConceptDoppler: A Weather Tracker for Internet Censorship
- 2006-clayton-failures Failures in a Hybrid Content Blocking System
- 2006-dingledine-design Design of a blocking-resistant anonymity system
- 2006-wolfgarten-investigating Investigating large-scale Internet content filtering
- 2005-perng-censorship Censorship Resistance Revisited
- 2004-danezis-economics The Economics of Censorship Resistance
- 2004-k-psell-achieve How to Achieve Blocking Resistance for Existing Systems Enabling Anonymous Web Surfing
- 2003-dornseif-government Government mandated blocking of foreign Web content
- 2003-feamster-thwarting Thwarting Web Censorship with Untrusted Messenger Discovery
- 2002-feamster-infranet Infranet: Circumventing Web Censorship and Surveillance
- 2002-serjantov-anonymizing Anonymizing Censorship Resistant Systems
- 2001-handley-network Network Intrusion Detection: Evasion, Traffic Normalization, and End-to-End Protocol Semantics
- 2001-stubblefield-dagster Dagster: Censorship-Resistant Publishing Without Replication
- 2001-waldman-tangler Tangler: A Censorship-Resistant Publishing System Based On Document Entanglements
- 2000-waldman-publius Publius: A robust, tamper-evident, censorship-resistant web publishing system
- 1998-ptacek-insertion Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection
- 1996-anderson-eternity The Eternity Service
859 findings tagged here
-
A detection pipeline exploiting the "dual-role" behavioral fingerprint of single-IP circumvention relays, evaluated using only its dual-role behavioral stage, achieved 23.2% recall (96 of 414 ground-truth relays) with a 0.18% false-positive rate against 97,651 benign TLS servers, for an overall accuracy of 99.5%. The ground-truth set covered OpenVPN, WireGuard, and SOCKS relays identified in a 17 TB single-day backbone trace (WIDE Project, April 9, 2025).
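The headline figures can be cross-checked from the reported counts. This is a small arithmetic sketch: the finding does not give the false-positive count directly, so it is back-computed here from the rounded 0.18% FPR (an assumption), which makes the derived precision approximate.

```python
# Cross-check the reported detection metrics from the counts in the finding.
# Assumption: the false-positive count is back-computed from the rounded
# 0.18% FPR, so it (and the derived precision) is approximate.

RELAYS = 414      # ground-truth relays
DETECTED = 96     # true positives
BENIGN = 97_651   # benign TLS servers
FPR = 0.0018      # reported false-positive rate

false_pos = round(FPR * BENIGN)   # benign servers wrongly flagged (about 176)
true_neg = BENIGN - false_pos

recall = DETECTED / RELAYS
accuracy = (DETECTED + true_neg) / (RELAYS + BENIGN)
precision = DETECTED / (DETECTED + false_pos)

print(f"recall    = {recall:.1%}")     # 23.2%
print(f"accuracy  = {accuracy:.1%}")   # 99.5%
print(f"precision = {precision:.1%}")  # 35.3%
```

Because benign servers outnumber relays by more than 200 to 1, accuracy is dominated by true negatives; under this approximation, roughly two of every three flagged hosts would be benign.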
-
The paper identifies a fundamental architectural vulnerability in single-IP circumvention designs: after accepting a client connection, the relay must open new observable flows (visible through DNS lookups or TLS SNI) to reach the end services, so the same host behaves both as a server and as a client. This server-and-client contrast is detectable. Destinations are weighted in a Relay Suspicion Score by category: user-facing domains (news, social media) carry w = 0.9 and infrastructure domains carry w = 0.1, so a relay that reaches user-facing domains scores high. The paper argues that this host-level signal is censorship-invariant and cannot be concealed by link obfuscation, since it arises from what the relay does on its outbound side rather than from how its client link is encoded.
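The dual-role test and the category weighting can be sketched as follows. Only the two weights (0.9 and 0.1) come from the finding; the host model, the `toy_categorize` lookup, and the choice to aggregate weights by taking their mean are hypothetical illustrations, not the paper's scoring rule.

```python
from dataclasses import dataclass, field

# Category weights from the finding; the aggregation rule (a mean) and every
# name below are hypothetical illustrations, not the paper's implementation.
CATEGORY_WEIGHTS = {"user_facing": 0.9, "infrastructure": 0.1}


@dataclass
class HostActivity:
    """Flows seen for one IP address in a passive trace (hypothetical model)."""
    inbound_peers: set[str] = field(default_factory=set)     # hosts that connected to it
    outbound_names: list[str] = field(default_factory=list)  # DNS/SNI names it contacted


def is_dual_role(host: HostActivity) -> bool:
    """True if the host both accepts connections (server) and opens new flows (client)."""
    return bool(host.inbound_peers) and bool(host.outbound_names)


def relay_suspicion_score(host: HostActivity, categorize) -> float:
    """Mean category weight over the host's outbound names; 0.0 if not dual-role."""
    if not is_dual_role(host):
        return 0.0
    weights = [CATEGORY_WEIGHTS[categorize(name)] for name in host.outbound_names]
    return sum(weights) / len(weights)


def toy_categorize(name: str) -> str:
    # Hypothetical lookup; a deployment would use a domain-category feed.
    return "user_facing" if name in {"news.example", "social.example"} else "infrastructure"


relay = HostActivity({"198.51.100.7"}, ["news.example", "social.example", "ocsp.example"])
plain_server = HostActivity({"198.51.100.8"}, [])
print(relay_suspicion_score(relay, toy_categorize))         # about 0.633
print(relay_suspicion_score(plain_server, toy_categorize))  # 0.0
```

A server that only answers inbound connections never becomes dual-role, so it scores zero regardless of its traffic volume; the category weights then separate relays browsing user-facing sites from hosts that only fetch infrastructure resources such as OCSP or update servers.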
-
Stage 1 of the detection pipeline is a lightweight heuristic that restricts analysis to IP addresses in "VPS-dense ASNs", which censors already target for resource-intensive inspection of fully encrypted traffic. This pre-filter sharply reduces the search space before the more expensive dual-role behavioral analysis runs. Because of dataset limitations, the evaluation omitted Stages 1 and 3, so the reported 23.2% recall and 0.18% FPR reflect the behavioral analysis alone and are presented as conservative lower bounds on the full pipeline's performance.
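A Stage 1 pre-filter of this kind amounts to an ASN membership test applied before any behavioral analysis. In this sketch the prefix-to-ASN table and the set of "VPS-dense" ASNs are hypothetical (documentation prefixes and private-use ASNs); how the paper decides which ASNs count as VPS-dense is not specified here.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table and VPS-dense ASN set (documentation prefixes,
# private-use ASNs). A real system would use a longest-prefix-match table
# built from BGP data instead of this linear scan.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
    ipaddress.ip_network("203.0.113.0/24"): 64502,
}
VPS_DENSE_ASNS = {64500, 64502}


def asn_of(ip: str) -> Optional[int]:
    """Return the ASN originating this address, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for net, asn in PREFIX_TO_ASN.items():
        if addr in net:
            return asn
    return None


def stage1_prefilter(candidate_ips):
    """Keep only addresses in VPS-dense ASNs; the rest never reach Stage 2."""
    return [ip for ip in candidate_ips if asn_of(ip) in VPS_DENSE_ASNS]


candidates = ["192.0.2.10", "198.51.100.20", "203.0.113.30", "8.8.8.8"]
print(stage1_prefilter(candidates))  # ['192.0.2.10', '203.0.113.30']
```

The design choice is to spend the expensive per-host behavioral analysis only where single-IP relays are expected to be concentrated; the filter itself costs one lookup per address.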
-
ShieldShare demonstrates that an Android application can route all hotspot-client traffic through a VPN tunnel without root access by placing a SOCKS5/HTTP/HTTPS proxy layer between the hotspot and the VPN, with per-client traffic accounting and quota management. The proxy layer is needed because Android's native hotspot does not apply the device's VPN routing to traffic from connected clients; ShieldShare interposes a proxy on the device to bridge that gap. The system is released as open source.
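For the SOCKS5 path, the first thing such a proxy does with a hotspot client's request is parse the destination out of the standard RFC 1928 request message. The parser below follows that RFC's wire format; it is a generic illustration, not ShieldShare's code. The assumption behind the design is that, once the proxy (running as an ordinary app) opens its own connection to the parsed host and port, Android routes that app-originated socket through the active VPN.

```python
import ipaddress
import struct

SOCKS_VERSION = 5
CMD_CONNECT = 0x01
ATYP_IPV4, ATYP_DOMAIN, ATYP_IPV6 = 0x01, 0x03, 0x04


def parse_socks5_request(data: bytes) -> tuple[int, str, int]:
    """Parse an RFC 1928 request (VER CMD RSV ATYP DST.ADDR DST.PORT).

    Returns (command, host, port); raises ValueError on malformed input.
    """
    if len(data) < 4:
        raise ValueError("truncated request header")
    ver, cmd, rsv, atyp = data[0], data[1], data[2], data[3]
    if ver != SOCKS_VERSION or rsv != 0:
        raise ValueError("not a SOCKS5 request")
    offset = 4
    if atyp == ATYP_IPV4:
        host = str(ipaddress.IPv4Address(data[offset:offset + 4]))  # 4 packed bytes
        offset += 4
    elif atyp == ATYP_DOMAIN:
        if len(data) < offset + 1:
            raise ValueError("missing domain length")
        length = data[offset]  # one length octet precedes the name
        host = data[offset + 1:offset + 1 + length].decode("ascii")
        offset += 1 + length
    elif atyp == ATYP_IPV6:
        host = str(ipaddress.IPv6Address(data[offset:offset + 16]))  # 16 packed bytes
        offset += 16
    else:
        raise ValueError(f"unknown address type {atyp:#x}")
    if len(data) < offset + 2:
        raise ValueError("truncated address or port")
    (port,) = struct.unpack("!H", data[offset:offset + 2])  # network byte order
    return cmd, host, port


req = bytes([SOCKS_VERSION, CMD_CONNECT, 0, ATYP_DOMAIN, 11]) + b"example.com" + struct.pack("!H", 443)
print(parse_socks5_request(req))  # (1, 'example.com', 443)
```

A full proxy would also handle the preceding method-negotiation exchange and send the reply message; both are omitted here to keep the sketch focused on where the destination comes from.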
-
ShieldShare's modular architecture (VPN detection, hotspot management, HTTP/HTTPS/SOCKS5 proxy forwarding, traffic metering) shows that community-proxy deployment on commodity, unrooted Android hardware is technically feasible, and that accurate per-client bandwidth allocation and accounting can be maintained without root privileges. The evaluation confirms that client traffic is reliably routed through the VPN tunnel.
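The per-client accounting and quota idea can be sketched in a few lines. This is a language-agnostic illustration in Python, not ShieldShare's Android code. The class name, the client identifier (for example a MAC address or a proxy-auth username), and the "refuse rather than truncate" policy are all hypothetical.

```python
from collections import defaultdict


class TrafficMeter:
    """Hypothetical per-client byte accounting with a fixed quota (illustrative only)."""

    def __init__(self, quota_bytes: int):
        self.quota_bytes = quota_bytes
        self.used = defaultdict(int)  # client id -> bytes relayed so far

    def try_consume(self, client_id: str, nbytes: int) -> bool:
        """Charge nbytes to the client; refuse (and charge nothing) if it would exceed the quota."""
        if self.used[client_id] + nbytes > self.quota_bytes:
            return False
        self.used[client_id] += nbytes
        return True

    def remaining(self, client_id: str) -> int:
        """Bytes the client may still relay before hitting its quota."""
        return self.quota_bytes - self.used[client_id]
```

In such a design, the proxy layer would call `try_consume` for each chunk it relays for a client and drop or pause the connection once the call returns `False`.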
-
NATA (Non-invasive Active Traffic-correlation Analysis) injects low-frequency bandwidth waveforms (sinusoidal, square-wave, triangular) into Tor TCP connections at an upstream gateway without endpoint compromise, payload decryption, or Tor-browser modification. BM-Net, a selective state-space classifier trained on the exit-side observations, achieves a 99.65% binary detection F1 score distinguishing watermarked from natural traffic on a 20,000-trace real-world dataset.
-
BM-Net achieves a 99.65% binary detection F1 score for distinguishing bandwidth-watermarked Tor flows from natural traffic, outperforming all evaluated baselines (next best: the Tik-Tok website-fingerprinting attack at 75.96% F1). The paper attributes the gap to active perturbation imposing a deterministic low-frequency throughput constraint, rather than relying on subtle natural metadata, which makes the detection task fundamentally easier than passive website fingerprinting.
-
BM-Net achieves a 99.65% binary detection F1 score distinguishing watermarked from natural Tor flows, and a 97.5% macro-F1 score for fine-grained modulation classification across sinusoidal, square-wave, and triangular patterns. The fine-grained test set contains 201 held-out samples collected from ten clients across five geographic regions (Europe, North America, Australia, Southeast Asia, East Asia), with training traces including traffic collected under WTF-PAD and Walkie-Talkie defenses.
-
Active bandwidth perturbation has an inherent detectability–stability trade-off: overly aggressive low-rate phases cause Tor SENDME-based flow control stalls, retransmissions, timeouts, or circuit replacement before sufficient correlation evidence is collected. The paper selects a 30-second modulation period and an empirically determined minimum shaping rate; the usable shaping range varies with relay load, path length, TCP congestion control behavior, and Tor multiplexing.
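A minimal sketch of the three modulation geometries as a time-varying shaping rate, assuming the 30-second period from the finding and a floor at a minimum rate r_min. The r_min/r_max values and the exact waveform formulas are hypothetical, since the paper's parameters are not given here. The key property is that the rate never falls below r_min, which reflects the stability constraint described above.

```python
import math


def shaping_rate(t: float, r_min: float, r_max: float,
                 period: float = 30.0, shape: str = "sine") -> float:
    """Target shaping rate at time t (seconds) for one modulation geometry.

    The rate never drops below r_min, mirroring the stability floor described above.
    """
    phase = (t % period) / period              # position within the current period, in [0, 1)
    if shape == "sine":
        u = 0.5 * (1.0 + math.sin(2.0 * math.pi * phase))
    elif shape == "square":
        u = 1.0 if phase < 0.5 else 0.0        # high half-period, then low half-period
    elif shape == "triangle":
        u = 1.0 - abs(2.0 * phase - 1.0)       # 0 -> 1 -> 0 over one period
    else:
        raise ValueError(f"unknown shape: {shape}")
    return r_min + (r_max - r_min) * u
```

The square wave switches abruptly between r_max and r_min. This is consistent with the observation that it is the geometry most likely to be smoothed out by multiplexing and jitter, and also the one most likely to trigger flow-control stalls if r_min is set too low.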
-
Tornettools-based simulations with a 1%-scaled Tor network show that with 1 adversary-controlled exit relay (148 Mbps), the exit-observation probability p_exit ≈ 2.13%; with 5 adversary-controlled relays, p_exit exceeds 10%. Because Tor's path selection is bandwidth-weighted, high-bandwidth malicious exits attract a disproportionate share of circuits, and repeated observations over multiple circuits compound the correlation risk even at modest relay counts.
-
Using a 1%-scaled tornettools simulation with historical Tor consensus data, a single adversary-controlled exit relay at 148 Mbps yields an exit-observation probability of approximately 2.13%; deploying 5 adversary-controlled exit relays pushes observation probability above 10%. The aggregation effect is concave: across T independent observation windows, the cumulative correlation probability is 1 − (1 − P_corr)^T, which rises quickly over the first windows and then saturates.
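The aggregation formula above can be evaluated directly. In this sketch, `p` is the per-window correlation probability P_corr. Plugging in the 2.13% exit-observation probability is a hypothetical simplification that assumes every observed window yields a successful correlation and ignores classifier error.

```python
import math


def cumulative_probability(p: float, windows: int) -> float:
    """Probability of at least one successful correlation over independent windows."""
    return 1.0 - (1.0 - p) ** windows


def windows_needed(p: float, target: float) -> int:
    """Smallest number of windows T with 1 - (1 - p)^T >= target (0 < p < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

For example, `windows_needed(0.0213, 0.5)` gives the number of windows after which a correlation is more likely than not under that simplification. Each additional window adds less than the previous one, which is the concavity noted above.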
-
BM-Net achieves a 97.5% macro-F1 score for fine-grained classification of three modulation geometries (sinusoidal, square-wave, triangular) from noisy exit-side Tor observations, evaluated on a held-out set of 201 labeled samples collected across cross-continental Tor paths. Residual errors concentrate between natural traffic and square-wave modulation, because abrupt low-rate transitions are partially smoothed by Tor multiplexing and network jitter.
-
Fine-grained modulation classification (natural vs. sinusoidal vs. square-wave vs. triangular) achieves 97.5% macro-F1 on a 201-sample held-out test set. Square-wave waveforms are the hardest class (F1 = 95.7%), while sinusoidal and triangular each reach 99.0% F1, because abrupt square-wave transitions are partially smoothed by Tor multiplexing and network dynamics.
-
NATA (Non-invasive Active Traffic-correlation Analysis) requires no endpoint compromise, no Tor-browser modification, and no payload decryption. The adversary controls only an upstream network gateway (ISP/AS level) to impose bandwidth modulation on Tor TCP connections, and observes traffic at adversary-controlled exit relays — a Shaper–Sniffer architecture that operates purely at the network-infrastructure layer.
-
Padding-based client-side defenses including WTF-PAD and Walkie-Talkie are insufficient against active bandwidth perturbation: they reshape packet timing and burst structure but cannot remove the upstream rate limit imposed by the gateway shaper. BM-Net trained on a defense-aware dataset containing both undefended and WTF-PAD/Walkie-Talkie traces still achieves 99.65% F1, and the paper explicitly notes that 'client-side padding and burst reshaping may alter the logical traffic pattern, but they do not directly remove the rate limit imposed by the upstream bottleneck.'
-
Client-side padding defenses (WTF-PAD and Walkie-Talkie) do not remove active bandwidth watermarks because they operate on packet timing and burst-level structure, not on the upstream rate limit; BM-Net still achieves 99.65% binary detection F1 on a mixed dataset containing both defended and undefended traces. The upstream shaper's rate constraint causes delayed, queued, or dropped packets whose throughput envelope persists at the exit relay regardless of application-layer obfuscation.
-
Using tornettools-based simulations with historical Tor consensus data scaled to 1% of the real network (80 relays), the adversary's exit-observation probability p̂exit grows monotonically with adversary-controlled bandwidth: a single exit relay at 148 Mbps yields p̂exit ≈ 2.13%, and 5 adversary-controlled exit relays push p̂exit above 10%.
-
An infrastructure-level adversary must balance watermark detectability against connection stability: the paper's threat model requires a minimum shaping rate rmin to prevent Tor circuit stalls, timeouts, or circuit replacement, and notes that repeated poor-throughput events can cause the circuit to be abandoned before sufficient watermark evidence is accumulated. This detectability–stability trade-off constrains the practical attack to macroscopic (30-second) modulation periods rather than fine-grained packet-level timing manipulation.
-
WTF-PAD and Walkie-Talkie client-side defenses — which operate on packet timing, padding, and burst-level structure — do not remove the throughput constraint imposed by an upstream rate limiter. When the shaping rate decreases, excess traffic is delayed, queued, or dropped; exit-side throughput retains the imposed modulation waveform. BM-Net was trained and evaluated on a dataset that includes both undefended and WTF-PAD/Walkie-Talkie-defended traces, confirming detection persists under this mixed condition.
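Why padding cannot erase the watermark can be illustrated with a per-step rate limiter that queues excess traffic. This is a simplified stand-in for whatever shaper the paper uses: the backlog is unbounded, so drops are not modeled. When the client offers more than the cap, padded or not, the bytes leaving the bottleneck follow the cap exactly.

```python
def rate_limit(offered, caps):
    """Per-step rate limiter with a FIFO backlog (a simplified stand-in for the gateway shaper).

    offered: bytes the client tries to send in each step (including any padding)
    caps:    bytes the shaper allows out in each step
    Returns (bytes sent per step, bytes still queued at the end).
    """
    backlog = 0
    sent = []
    for demand, cap in zip(offered, caps):
        backlog += demand            # new traffic joins the queue
        out = min(backlog, cap)      # never exceed the imposed rate
        backlog -= out
        sent.append(out)
    return sent, backlog
```

With caps of `[4, 1, 4, 1]`, both unpadded demand of 5 per step and padded demand of 8 per step produce the same output `[4, 1, 4, 1]`. The padding only lengthens the queue, so the exit-side envelope still carries the modulation.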
-
Simulations extending the ENEM19 game-theory framework show that ephemeral proxy schemes (modeled on Snowflake/Lantern) effectively neutralize both the "optimal" and "aggressive" censors from the original framework. In overprovisioned settings (proxies arriving at 250/step vs. 200 clients/step), even the null censor scenario outperforms either censor in equal-arrival settings. Over 90% of waiting users receive a proxy within 1 time step. The critical variable is not censor sophistication but proxy arrival rate relative to client demand—high proxy churn combined with high arrival rate defeats both enumeration strategies tested.
-
The host-profiling censor (passive traffic analysis: count connections per server, block those exceeding a threshold τ within a window w) blocks essentially all circumvention user traffic within 30 time steps for all classifier qualities tested (ρ_TP ∈ {0.9, 0.95, 0.99}), while causing far less collateral damage than zig-zag (never exceeding ~30% innocent server blocking). Snowflake resists this attack well: with w=3, τ=3, over 94.48% of users receive a proxy within 2 steps even with worst-classifier rules, and final unblocked server rates are 91.24–99.04%. The host profiling approach is feasible for passive censors who cannot enumerate the distribution channel.
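A minimal sketch of the host-profiling rule as described: count connections per server over a sliding window of w steps and block once the count exceeds τ. Two details are assumptions. First, only connections the traffic classifier flags are counted (the boolean `flagged` stands in for a classifier with a given ρ_TP/ρ_FP). Second, steps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class HostProfilingCensor:
    """Sliding-window host profiling: block a server once more than `threshold`
    classifier-flagged connections to it fall within the last `window` time steps."""

    def __init__(self, window: int, threshold: int):
        self.window = window                 # w
        self.threshold = threshold           # tau
        self.recent = defaultdict(deque)     # server -> steps of recent flagged connections
        self.blocked = set()

    def observe(self, step: int, server: str, flagged: bool) -> None:
        if server in self.blocked or not flagged:
            return
        q = self.recent[server]
        q.append(step)
        while q and q[0] <= step - self.window:   # keep only steps in (step - w, step]
            q.popleft()
        if len(q) > self.threshold:
            self.blocked.add(server)
```

One plausible reading of the Snowflake result follows from this structure: a short-lived proxy serving only a few clients rarely accumulates more than τ flagged connections inside a single w-step window before it churns away.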
-
Multi-censor simulations show that single-censor-optimized distribution strategies perform suboptimally in realistic multi-region deployments. When two networks have different censor strategies (e.g., one optimal, one zig-zag), the distributor cannot detect that a proxy is blocked until all censors have blocked it; this leaves clients without reachable proxies despite the proxy appearing "available" from the distributor's view. The authors conclude that "single-censor evaluation does not accurately predict more realistic deployment performance." A zig-zag censor in one region with 0.25 weight caused 44.4% collateral damage while reducing proxy lifetime to a median of 4 steps.
-
The zig-zag traffic analysis attack (support for which is confirmed in the Geedge TSG leak) rapidly enumerates all static proxy pools. With ζ_watch ∈ {4, 6} steps and a best-quality classifier (ρ_TP=0.99, ρ_FP=0.001), almost total proxy enumeration and user blockage occurs well before step 300. Even ζ_watch=2 leaves ~50% of users blocked. Collateral damage is high across all settings when ζ_watch ≥ 4: eventually ~50% of innocent servers are also blocked. However, Snowflake-style ephemeral proxies resist zig-zag effectively: reachability remains above 95% after 360 steps because churn prevents the censor from expanding its known proxy set beyond agents' direct assignments.
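The core of a zig-zag enumeration can be sketched as alternating expansion over a client–server connection graph. This sketch collapses the ζ_watch period into a single static set of observed connections, which is a simplifying assumption. It also omits time, churn, and per-agent assignment, so it illustrates the technique rather than reproducing the paper's simulator.

```python
def zigzag(connections, seed_proxies, looks_like_proxy, max_rounds=10):
    """Alternate between 'who uses a known proxy?' and 'what else do those users contact?'.

    connections:      set of (client, server) pairs seen during the watch period
    seed_proxies:     proxies the censor already knows about
    looks_like_proxy: classifier deciding whether a server is a proxy (may err)
    """
    proxies = set(seed_proxies)
    users = set()
    for _ in range(max_rounds):
        # Zig: every client talking to a known proxy is treated as a circumvention user.
        users |= {c for c, s in connections if s in proxies}
        # Zag: servers those users contact that the classifier flags become new proxies.
        new = {s for c, s in connections if c in users and looks_like_proxy(s)} - proxies
        if not new:
            break
        proxies |= new
    return proxies, users
```

Classifier false positives in the "zag" step are what turn into collateral damage against innocent servers. With ephemeral proxies, the connections a censor observes later tend to point at proxies that have already disappeared, so the expansion stalls.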
-
Adversarial pre-padding — prepending stochastic byte noise to packets — degrades ET-BERT encrypted traffic classification accuracy from >99% to 25.68%, exposing a structural vulnerability in all payload-byte-dependent detection systems. White-box adversarial attacks (Ayaka AH-MSI) additionally achieve evasion rates exceeding 99.5% against standard continuous-time sequence models via Manifold Shattering, where adversaries align malicious temporal distributions with benign baselines.
-
AEGIS, a flow-physics-only ML classifier using a Hyperbolic Liquid State Space Model evaluated on a 400GB adversarial corpus including VLESS Reality, GhostBear, and AMOI-morphed traffic, achieves F1-score 0.9952, 99.50% TPR, and 0.2141% FPR at 262.27 µs inference latency on an RTX 4090. The system discards all payload bytes and classifies traffic exclusively on 6-dimensional flow physics: packet size, inter-arrival time, directionality, TCP window size, TCP flags, and payload ratio.
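A sketch of extracting the six flow-physics dimensions named above from packet metadata, without ever reading payload bytes. The encodings are hypothetical choices, not necessarily AEGIS's: ±1 for direction, raw TCP flag bits, and payload ratio defined as payload length over packet size.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float      # seconds
    size: int             # bytes on the wire
    outbound: bool        # client -> server?
    tcp_window: int
    tcp_flags: int        # raw flag bits, e.g. 0x18 for PSH|ACK
    payload_len: int      # bytes of TCP payload (counted, never inspected)


def flow_physics(packets):
    """Map each packet to a 6-dim vector; payload contents are never read."""
    features = []
    prev_ts = None
    for p in packets:
        iat = 0.0 if prev_ts is None else p.timestamp - prev_ts
        prev_ts = p.timestamp
        direction = 1.0 if p.outbound else -1.0
        payload_ratio = p.payload_len / p.size if p.size else 0.0
        features.append([float(p.size), iat, direction,
                         float(p.tcp_window), float(p.tcp_flags), payload_ratio])
    return features
```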
-
Automated proxy engines (e.g., Xray-core running VLESS Reality in automated mode) generate deterministically rigid inter-arrival time distributions because they cannot synthesize the stochastic variance of human-driven IAT, even when volumetrically anchored to benign distributions ('Fat Middle' anchoring via AMOI). The AEGIS Thermodynamic Variance Detector identifies this rigidity via Shannon Entropy of hidden states across 1,000-packet causal windows, rendering volumetric anchoring mathematically distinguishable from genuine human traffic.
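AEGIS computes entropy over the model's hidden states. The sketch below substitutes a much simpler quantity: the Shannon entropy of a histogram of raw inter-arrival times over consecutive, non-overlapping windows. It only illustrates why rigid machine timing stands out. The bin width and windowing are hypothetical, and this is not the paper's detector.

```python
import math
from collections import Counter


def shannon_entropy(values, bin_width):
    """Entropy (bits) of a histogram of values bucketed into bins of bin_width."""
    counts = Counter(int(v // bin_width) for v in values)
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def windowed_entropy(iats, window=1000, bin_width=0.005):
    """Entropy of inter-arrival times over consecutive, non-overlapping windows."""
    return [shannon_entropy(iats[i:i + window], bin_width)
            for i in range(0, len(iats) - window + 1, window)]
```

A perfectly regular sender (all IATs equal) has zero entropy in every window. Human-paced traffic spreads across many bins and scores several bits higher.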
-
Gaussian noise injection stress testing shows AEGIS maintains F1-scores of 0.9913 at 5% IAT noise and 0.9753 at 10% IAT noise, but degrades to 0.5939 at 15% Gaussian noise — establishing the 'Manifold Shattering Threshold.' The paper asserts that sustaining 15% IAT noise in practice corrupts the adversary's own C2 channel integrity, making this threshold operationally unachievable for high-throughput tunnels.
-
Flow-physics classifiers face a fundamental 'Human Entropy Horizon': when VLESS Reality multiplexes true human entropy (a human actively browsing web applications), AEGIS achieves a detection rate of only 1.17%, because XTLS wrappers impart near-zero mechanical overhead and the temporal physics remain entirely stochastic. This implies adversaries operating at human interaction speeds can evade flow-based detection, but must abandon automated high-throughput C2 scripts.
-
Routing-guided conditional aggregation (CA) that dynamically weights header versus payload contributions using per-sample MoE routing probabilities outperforms static fusion on all six datasets, demonstrating that the relative discriminative utility of headers versus payloads varies by application type — and that classifiers can adaptively shift reliance to whichever modality is less obfuscated.
-
Explicitly disentangling packet headers (structured, low-entropy) from encrypted payloads (high-entropy, stochastic) into separate MoE branches yields consistent gains across six datasets: 86.85% F1 on 120-class TLS 1.3 traffic (CSTNET-TLS), 97.88% F1 on USTC-TFC2016 malware/benign flows, and 92.65% F1 on imbalanced IoT traffic (CIC-IoT2022), demonstrating that headers and payloads carry fundamentally different and independently exploitable discriminative signals.
-
Pretraining on 30 GB of unlabeled mixed traffic (ISCX-VPN2016 NonVPN, CICIDS2017, WIDE backbone) via masked language modeling, followed by fine-tuning, enables TrafficMoE to classify VPN application traffic at 88.72% F1 and VPN service traffic at 92.61% F1, exceeding all fully supervised and prior pretraining baselines; labels are needed only for the fine-tuning stage.
-
TrafficMoE achieves 97.65% accuracy and F1-score on the ISCX-Tor2016 dataset, substantially outperforming all baselines including the best pretraining-based competitor FlowletFormer (91.16% F1), by separately modeling protocol headers and encrypted payloads via dual-branch sparse Mixture-of-Experts rather than treating them as a unified byte stream.
-
An uncertainty-aware filtering (UF) mechanism quantifies per-token reliability via Shannon entropy of the cross-modal header–payload attention matrix, finding that encrypted payloads still contain low-entropy tokens with stable cross-modal alignment that serve as reliable classification anchors — demonstrating that nominally randomized byte streams retain exploitable low-entropy structure.
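A NumPy sketch of entropy-based token reliability: compute the Shannon entropy of each row of a row-stochastic cross-modal attention matrix and keep the lowest-entropy tokens. The attention direction (payload tokens attending to header tokens) and the hard `keep_ratio` cut-off are assumptions. The paper's UF mechanism may weight tokens rather than drop them.

```python
import numpy as np


def token_entropy(attn: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of each row of a row-stochastic attention matrix.

    attn[i, j]: attention from payload token i to header token j (rows sum to 1).
    """
    p = np.clip(attn, 1e-12, 1.0)          # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def reliable_tokens(attn: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Indices of the lowest-entropy (most confidently aligned) payload tokens."""
    ent = token_entropy(attn)
    k = max(1, int(len(ent) * keep_ratio))
    return np.argsort(ent, kind="stable")[:k]
```

A token whose attention concentrates on one header position gets entropy near 0 and is kept as an anchor. A token spreading attention uniformly over n positions gets the maximum, ln n.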
-
Assemblage's anti-censorship collateral damage argument rests on the economic and social value of AI-generated image communities. Blocking DeviantArt (65M MAU), Reddit (1.21B MAU), X/Twitter (611M MAU), or Telegram (1B MAU) to suppress steganographic circumvention would cause massive collateral damage to legitimate users and, for platforms popular in China, to Chinese companies' revenue. The paper observes that even in authoritarian regimes, everyday users actively post AI-generated content, making blanket platform blocking politically and economically costly.
-
Lossy image compression is the primary practical barrier to deploying Assemblage on major platforms. Of 8 tested platforms, WeChat and Rednote (combined 2.6 billion MAU) failed because they serve only lossy-compressed downloads, destroying embedded steganographic content. Platforms that preserve lossless originals (Reddit, X/Twitter, DeviantArt, Discord, Imgur, Telegram) succeeded end-to-end. Discord serves ~30 KB compressed thumbnails by default but provides lossless originals via its native "Download" option.
-
Assemblage's diffusion-model steganography (Pulsar) encodes 300–618 bytes per image vector (mean ± SD by model). Generating one local state takes ~9.5 sec on an Apple M4 Pro; encoding takes ~4.4 sec; decoding takes ~4.2 sec. Sending a compressed 300-word message requires only K+h = 4+2 images using the church-256 model, with total send time ~90 sec and receive time ~30 sec. Perceptual-hash candidate detection runs in ~0.33 ms per image, making scanning all ~150 daily posts on /r/AIArt take under 1 second.
-
Assemblage inherits the bootstrapping limitation of all generative steganographic schemes: sender and receiver must share a symmetric key before communication begins. Public-key steganography exists in theory but does not currently support common image/text channels efficiently. The paper identifies three viable deployment scenarios: (1) travelers who carry a pre-shared secret before entering a censored region; (2) users in countries with episodic censorship who establish the key during uncensored periods; (3) a hybrid where a one-time signaling channel establishes the secret, after which Assemblage carries subsequent traffic.
-
Balboa's synchronous leaf-content replacement adds non-negligible timing differences that allow censors to identify its activity with up to ~90% accuracy over different network conditions. The timing anomaly arises because Balboa performs data substitution directly at each data exchange, delaying the server's response while covert data is prepared.
-
Without chunk-based padding, an XGBoost classifier identifies the target website from covert data-chunk sizes with 91% accuracy (Tranco top-100). Chunking at 2 MB reduces accuracy to 12% at a 21.3% bandwidth overhead, while 16 MB chunks reduce accuracy to near random guessing at a 480.3% overhead. Chunks as small as 64 KB already reduce accuracy to 64%, demonstrating a monotonic fingerprinting–overhead tradeoff.
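The chunking defense amounts to rounding every covert object up to a whole number of fixed-size chunks, so an observer sees only chunk counts. The sketch below assumes chunk sizes in binary units (KiB/MiB). The overhead it computes depends entirely on the object-size distribution, so it will not reproduce the paper's 21.3% and 480.3% figures.

```python
def padded_size(n_bytes: int, chunk: int) -> int:
    """Round a covert object up to a whole number of fixed-size chunks (integer ceiling)."""
    return -(-n_bytes // chunk) * chunk


def padding_overhead(sizes, chunk: int) -> float:
    """Extra bytes sent, as a fraction of the real covert bytes."""
    real = sum(sizes)
    padded = sum(padded_size(s, chunk) for s in sizes)
    return (padded - real) / real
```

Larger chunks collapse more distinct object sizes into the same padded size, which is why classifier accuracy falls as chunk size grows while overhead rises.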
-
Huma separates proxy duties between untrusted Decoy Websites (DWs), which relay encrypted messages and serve content, and trusted Shade Proxies (SPs) outside the censored region, which decrypt requests and contact covert destinations. Even if a DW is compromised, the censor learns only whether a specific UID can access the system — no destination, no content, and no client network-layer information. SP assignment is centrally managed by the Huma Authority, preventing DW-SP collusion.
-
Huma's deferred-reply / double-request receive (DRR) protocol reduces a traffic-fingerprinting XGBoost classifier's accuracy to at most 54% (near random guessing) across geographically distributed clients (San Francisco, Frankfurt, Bangalore). A Kolmogorov-Smirnov test on absolute page-load timing distributions yields D=0.03, p=0.98 for U.S. clients — substantially tighter than Waterfall of Liberty's D=0.11 at p=0.5 — confirming that Huma flows are statistically indistinguishable from benign HTTPS fetches.
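The D statistic quoted above is the largest vertical gap between the two empirical CDFs. A pure-Python sketch of the two-sample statistic follows. For the accompanying p-value, `scipy.stats.ks_2samp` computes both; this sketch only shows what D measures.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:   # advance past every sample equal to x (handles ties)
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d
```

A D of 0.03 means the page-load-time CDFs of Huma and benign fetches never differ by more than 3 percentage points at any timing value.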
-
WebSocket, required by HTTPT and WebTunnel to establish covert channels inside TLS connections, was used by as few as 6.3% of websites in 2021, sharply limiting the pool of volunteer websites that can act as proxies for these tools. By contrast, Huma's traffic replacement scheme embeds covert data in standard HTTP leaf objects (images, scripts, CSS), requiring only that the DW serve HTTP content, a near-universal property.
-
Despite AWS, Google, and Microsoft having publicly withdrawn CDN-level domain-fronting support to preserve commercial relationships with censoring states, domain fronting remains functional on AWS Lambda as of early 2026. Microsoft Azure Functions explicitly rejects mismatched SNI/Host headers, whereas AWS Lambda permits a client to present a legitimate *.lambda-url.*.on.aws SNI while routing internally to a different serverless function via the HTTP Host header.
-
During the cold-start phase on a newly migrated serverless bridge (approximately the first 0–50 invocations), average function duration spikes to over 6,000 ms and success rate occasionally drops below 90%. The system stabilizes between invocations 100–200, with average durations consistently below 1,000 ms and success rates above 95%; AWS Lambda by default supports up to 1,000 concurrent invocations without throttling.
-
CensorLess vanilla mode costs $0.27/month for a single proxy processing 6.76 GB of traffic monthly, a 97.1% reduction (34.4×) over SpotProxy's optimal single-NIC configuration ($9.28/month). The private mode, which adds a t4g.micro EC2 VPS for end-to-end encryption via SOCKS, costs $3.41/month — still 63.3% cheaper than SpotProxy's cheapest option. Costs remain below $3.50/day even when scaling to 300 proxies.
-
Striding with factor 4 (early downsampling) produces the largest single-factor degradation in the ablation study: average macro-F1 drops from 0.9909 to 0.9772 and cross-dataset variance increases from 4.77×10⁻⁵ to 4.51×10⁻⁴, with worst-case dataset performance falling to a minimum F1 of 0.9524. Fine-grained byte order and short-range structure (protocol headers, payload signatures, repeated byte motifs) carry essential discriminative signal that stride-based aggregation destroys.
-
Early downsampling via striding (stride=4) is the single most damaging ablation, reducing average macro-F1 from 0.9909 to 0.9772 and increasing cross-dataset variance from 4.77×10⁻⁵ to 4.51×10⁻⁴, while the worst-case dataset drops to F1=0.9524 — far larger degradation than any other design choice including Mamba-1 vs Mamba-2.
-
A burst of just 5 packets truncated to 320 bytes each (1600 bytes total) suffices for macro-F1 ≥0.9824 across all six benchmarks; the classification token reads from the final recurrent state after a 4-layer Mamba-2 stack processing this fixed-length prefix, with no additional flow-level or session-level context required.
-
Classification from the first 5 packets × 320 bytes (1600-byte burst) achieves near-perfect accuracy across Tor (F1=0.9990), VPN (F1=0.9871), malware (F1=0.9954), and IoT attack traffic (F1=0.9966), with IP addresses masked and only header and initial payload retained. The earliest portion of each packet provides sufficient discriminative information for a classification decision made within the first kilobyte of a flow.
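A sketch of building the fixed 5 × 320 = 1600-byte input with IP addresses masked. Several details are assumptions rather than MambaNetBurst's documented preprocessing: packets are raw IPv4 starting at the IP header (no link layer), IPv6 is not handled, masking zeroes header bytes 12–19, and short packets or flows are right-padded with zeros.

```python
BURST_PACKETS = 5        # first N packets of the flow
BYTES_PER_PACKET = 320   # per-packet truncation length


def mask_ipv4_addresses(pkt: bytes) -> bytes:
    """Zero the IPv4 source and destination addresses (header bytes 12-19)."""
    buf = bytearray(pkt)
    if len(buf) >= 20 and buf[0] >> 4 == 4:   # version nibble says IPv4
        buf[12:20] = bytes(8)
    return bytes(buf)


def burst_input(packets) -> bytes:
    """Build the fixed 5 x 320 = 1600-byte model input from raw IP packets."""
    out = bytearray()
    for pkt in packets[:BURST_PACKETS]:
        p = mask_ipv4_addresses(pkt)[:BYTES_PER_PACKET]
        out += p + bytes(BYTES_PER_PACKET - len(p))            # right-pad short packets
    out += bytes(BURST_PACKETS * BYTES_PER_PACKET - len(out))  # pad flows with < 5 packets
    return bytes(out)
```

Masking the addresses keeps the model from memorizing specific servers, so it must rely on header fields and early payload structure instead.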
-
MambaNetBurst classifies Tor traffic (ISCXTor2016) at F1=0.9990 and VPN traffic (ISCXVPN2016) at F1=0.9871 using only the first 5 packets (1600 bytes total) with no pre-training, matching or exceeding pre-trained baselines like ET-BERT (ISCXTor F1=0.9967, ISCXVPN F1=0.9565) and NetMamba (ISCXTor F1=0.9986, ISCXVPN F1=0.9806) at 2.5–2.7M parameters.
-
Mamba-2's constrained scalar-times-identity A-matrix acts as an implicit regularizer for packet-byte sequences: under matched settings it yields higher mean F1 (0.9909 vs 0.9874), better worst-case F1 (0.9824 vs 0.9769), and 48% lower cross-dataset variance (4.77×10⁻⁵ vs 9.21×10⁻⁵) relative to Mamba-1, while delivering 30–60% faster backward passes and 2–4× lower GPU memory usage.
-
Mamba-2 (2.5M parameters) is Pareto-optimal on the accuracy-vs-inference-time frontier: it achieves average macro-F1 of 0.9909 with 30-60% faster backward passes than Mamba-1 and 2-3× faster inference than linear Transformers with FlashAttention-2 at medium-to-large batch sizes on a single RTX 3090. Memory usage is 2-4× lower than Transformer-based counterparts, enabling single-GPU operation at sequence length 1600.
-
Supervised byte-level training without pre-training reduces total compute by an estimated 3–15× in wall-clock training time and 2–4× in training memory footprint compared to pre-trained Transformer baselines (ET-BERT, YaTC, NetMamba), while achieving equivalent or superior classification F1 across six benchmarks spanning encrypted app identification, VPN/Tor, malware, and IoT attack traffic.
-
Eliminating self-supervised pretraining reduces total wall-clock training time by an estimated 3-15× relative to ET-BERT, YaTC, and NetMamba, while achieving comparable or superior accuracy. Pretraining in representative baselines typically consumes 10-100× more compute than downstream fine-tuning; removing it also eliminates the risk of negative transfer from mismatched pretraining corpora under concept drift.
-
MambaNetBurst achieves macro-F1 of 0.9990 on ISCXTor2016 and 0.9871 on ISCXVPN2016 without any pretraining, matching or exceeding heavily pretrained baselines such as ET-BERT (F1=0.9967/0.9565) and YaTC (F1=0.9986/0.9806). High-accuracy Tor and VPN traffic classification is therefore achievable with a compact 2.5M-parameter supervised model that requires no pretraining corpus.
-
As of October 2024, 22% (~220K) of Tranco top-1M domains support QUIC; of those, only 12.8% (~28K) are fully QUICstep-compatible (support IP-address migration). However, port-migration support grew 20% in 3 months (26,234 → 31,262 domains from August to late September 2024). Cloudflare hosts 74.6% of QUIC-supporting domains but only 0.2% support connection migration; if Cloudflare enabled it, 87.2% of QUIC-supporting domains would become compatible. Among QUIC-SNI-blocked domains in China (28,458 total), 2,404 (8.45%) support QUIC and 828 (34.4%) of those are QUICstep-compatible today.
-
A censor attempting to block QUICstep by dropping all QUIC connections that arrive without a preceding Initial/Handshake packet would cause significant collateral damage. Analysis of 24-hour campus traces (3,786,050 unique QUIC connections) found that 29.1% (1,100,439 connections) lacked QUIC Initial or Handshake packets, representing legitimate connection migration from mobile handoffs and similar events. This high baseline rate means blanket "no handshake" blocking would disrupt roughly 3 in 10 QUIC connections unrelated to circumvention.
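The hypothetical "no handshake" rule can be sketched from the QUIC version 1 wire format (RFC 9000). The Header Form bit (0x80) separates long from short headers, and bits 0x30 of a long header's first byte give the packet type. These bits are not covered by header protection. The flow keying and the rule itself are illustrative assumptions. Note also that QUIC version 2 (RFC 9369) uses different type values.

```python
# QUIC version 1 long-header packet types (RFC 9000, Section 17.2).
INITIAL, ZERO_RTT, HANDSHAKE, RETRY = 0, 1, 2, 3


def long_packet_type(first_byte: int):
    """Return the long-header packet type, or None for a short-header (1-RTT) packet."""
    if not first_byte & 0x80:          # Header Form bit clear -> short header
        return None
    return (first_byte & 0x30) >> 4    # Long Packet Type: bits 0x30 of the first byte


def flows_without_handshake(flows):
    """Hypothetical censor rule: flag UDP flows that never show an Initial or Handshake packet.

    flows: dict mapping a flow identifier (e.g. a 5-tuple) to the first byte of each
    observed UDP payload, in order.
    """
    return {
        fid for fid, first_bytes in flows.items()
        if not any(long_packet_type(b) in (INITIAL, HANDSHAKE) for b in first_bytes)
    }
```

Because such a rule keys on the address tuple, a legitimately migrated connection appears as a new flow carrying only short-header packets. It is flagged exactly like a QUICstep flow, which is the source of the collateral damage measured above.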
-
QUICstep reduces proxy (handshake channel) traffic by a median of 93% across 100 tested domains compared to full VPN tunneling. For www.youtube.com specifically, proxy traffic dropped from 3.634 MB (full VPN) to 96 KB (QUICstep), a 97.4% reduction. Page load time improved by up to 84% versus full VPN. Performance gain is greatest when the handshake channel is bandwidth-limited (1–5 Mbps): QUICstep/VPN ratios of 0.07–0.09 at 1 Mbps, 0.34–0.46 at 5 Mbps from London to nearby proxies. Psiphon's free tier (2 Mbps) and Tor (~10 Mbps median) are both well within the bandwidth regime where QUICstep provides substantial gains.
-
FreeUp achieves 86.68% AUC on CIC-IoT2023, 85.44% AUC on DoHBrw2020 (malicious DNS-over-HTTPS tunneling), and 95.53% AUC / 93.22% F1 on ISCX-Tor2016 (Tor anonymous traffic), outperforming all nine baselines by more than 3 percentage points of AUC on the first two datasets. The ISCX-Tor2016 result demonstrates that frequency-decoupled ML classifiers can detect Tor-like anonymous traffic with high confidence under zero-positive (unsupervised) training.
-
FreeUp's dual-branch architecture requires only 0.73G MACs and 6.46M parameters at inference — comparable to or lower than simpler baselines (MFR: 1.01G MACs / 11.18M params; ARCADE: 0.82G MACs / 6.70M params) — while achieving substantially higher detection accuracy. The two branches can be deployed in parallel with minimal memory usage, making frequency-decoupled ML detection computationally practical for real-time network monitoring at scale.
-
Encrypted traffic exhibits a 'full-frequency' spectral property where both low- and high-frequency components are highly active with comparable intensity, unlike natural images which are dominated by low-frequency components. Fourier Transform analysis across CIC-IoT2023, DoHBrw2020, and ISCX-Tor2016 confirms this distinction is pervasive. This signature is an inherent consequence of encryption disrupting byte-level semantics into a visually disordered, noise-like spatial pattern.
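The "full-frequency" observation can be illustrated with NumPy's 2-D FFT. The sketch measures what fraction of spectral power lies outside a small centred low-frequency disc. Treating 1,024 ciphertext bytes as a 32×32 image and using a disc radius of a quarter of the half-width are hypothetical choices. This is not FreeUp's actual frequency decomposition.

```python
import numpy as np


def high_freq_energy_fraction(img: np.ndarray, radius_frac: float = 0.25) -> float:
    """Fraction of spectral power outside a centred low-frequency disc."""
    x = img.astype(float) - img.mean()          # drop the DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(x))) ** 2
    total = power.sum()
    if total == 0:
        return 0.0                               # constant image: no spectral content
    h, w = x.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h // 2, xx - w // 2)       # distance from zero frequency after fftshift
    low = r <= radius_frac * min(h, w) / 2
    return float(power[~low].sum() / total)
```

Noise-like byte images, as ciphertext tends to be, put most of their power outside the disc. A smooth image, such as a single low-frequency sinusoid, puts essentially all of it inside, mirroring the contrast with natural images described above.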
-
Ablation experiments show that removing the high-frequency branch from FreeUp degrades AUC from 86.68% to 77.09% on CIC-IoT2023 (−9.6 pp) and from 95.53% to 95.10% on ISCX-Tor2016. Removing the entire frequency-decoupled framework drops AUC to 82.10% on CIC-IoT2023 and 81.26% on DoHBrw2020. The large CIC-IoT2023 drop without the high-frequency branch indicates that high-frequency components carry much of the discriminative signal in encrypted-traffic anomaly detection.
-
FreeUp operates under a zero-positive (unsupervised) learning paradigm, trained exclusively on normal traffic with no labeled anomaly examples, yet achieves 95.53% AUC on Tor traffic and 85.44% AUC on DNS-over-HTTPS tunneling detection. This suggests that frequency-aware anomaly detectors can flag circumvention and tunneling traffic without labeled attack data, easing the labeling bottleneck that has limited ML-based censorship detection.
-
When the researchers attempted to use Gemini 2.5 Flash via its API as a third independent LLM judge for evaluating moderation decisions, Gemini automatically blocked all judging attempts, citing safety reasons. This occurred even though the research task (judging whether a response is more or less moderated) does not itself produce harmful content. The incident illustrates that LLM safety systems can over-block legitimate research use cases and that providers apply different thresholds: Claude Haiku 4.5 and GPT-4o completed all judging tasks without safety refusals.
-
Category-level analysis of 100 statements across 5 sensitive content categories found that interface-based moderation gaps vary significantly by topic. Sexuality showed the strongest WebUI/API gap (WebUI 7.0× more likely to be moderated than API per GPT-4o judge for Gemini). Political ideology followed at 2.0×, then hate speech at 1.0×. Miscellaneous offensive topics showed the inverse pattern (API more moderated at 0.3×). Religious content showed WebUI moderation with no API moderation. The pattern suggests public-facing WebUI interfaces prioritize reputational risk management for high-scrutiny categories.
-
API and WebUI interfaces show statistically significant response-length differences in opposite directions across models. Gemini API responses averaged 2,333 characters vs. 1,746 for WebUI (API responses 34% longer; t=5.028, p<0.0001, Cohen's d=0.50). ChatGPT WebUI responses averaged 2,752 characters vs. 1,389 for API (WebUI responses 98% longer; t=-9.800, p<0.0001, d=-0.98). The opposite directions across models suggest fundamentally different generation parameters rather than simple post-hoc filtering, indicating architectural or policy-level differences at the provider level.
-
An empirical study of 100 sensitive statements tested on Gemini (2.5 Flash) and ChatGPT (GPT-5) found that WebUI interfaces are systematically more restrictive than their API counterparts. According to GPT-4o judge: WebUI was moderated 18% of the time vs. 9% (Gemini API) and 13% (ChatGPT API). DeBERTa classifier found 82% of WebUI responses moderated vs. 58% of API responses. The Gemini WebUI:API ratio ranged from 2.0:1 (GPT-4o) to 7.0:1 (Claude), and ChatGPT from 1.4:1 (GPT-4o) to 15.6:1 (Claude). Neither Google nor OpenAI discloses these interface-specific policies.
-
Strong classifiers can be trained from fewer than one third of the available traces, with gains diminishing rapidly beyond that point. At inference time, macro F1 rises sharply within the first 40% of observed actions across all four datasets, meaning model identity can be inferred while the agent is still actively navigating the page.
-
Passive JavaScript UI traces are sufficient to fingerprint the underlying LLM of a browser agent with up to 96% macro F1 across 14 frontier models, roughly an order of magnitude above random chance (1/14 ≈ 7%). Even the weakest model–dataset combination (Qwen3.5-9B on 2WikiMultiHopQA) reaches 63.7% F1 against that ~7% random baseline.
-
In open-set fingerprinting (leave-one-agent-out protocol), the majority of models exceed AUROC 0.60 for unknown-agent detection, but closed-set and open-set performance are dissociated: Seed-2-lite achieves 96.1% closed-set F1 yet scores below-chance open-set AUROC (0.38–0.47 on three of four datasets), while GPT-5.4 achieves AUROC 0.84 open-set despite ranking third in closed-set F1.
-
SHAP analysis shows that timing-based features (inter-event interval (IEI) standard deviation, mean click IEI, and time to first action) dominate agent identity classification under normal conditions, receiving substantially larger attributions than structural or action-type features. Agents are distinguishable primarily by their tempo: how long they pause before acting and how variable that pause is.
-
Injecting uniformly sampled random delays between agent actions substantially degrades an unadapted XGBoost classifier, but a classifier retrained on delayed traces largely recovers performance across all four datasets. Under 5-second delay injection, the classifier shifts weight onto structural features (click-coordinate dispersion, structural key ratio, link-click ratio) that survive timing perturbation.
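A sketch of the three timing features and of the uniform-delay defense. Several details are hypothetical interpretations rather than the paper's exact feature definitions: IEI is read as inter-event interval, "click IEI" as the interval preceding each click, and the trace format is a list of `(timestamp, action_type)` pairs.

```python
import random
from statistics import mean, pstdev


def timing_features(actions, page_ready_ts):
    """Timing features from a UI trace of (timestamp_seconds, action_type) pairs, sorted by time."""
    ts = [t for t, _ in actions]
    ieis = [b - a for a, b in zip(ts, ts[1:])]                      # inter-event intervals
    click_ieis = [b[0] - a[0] for a, b in zip(actions, actions[1:])  # interval preceding each click
                  if b[1] == "click"]
    return {
        "iei_std": pstdev(ieis) if ieis else 0.0,
        "mean_click_iei": mean(click_ieis) if click_ieis else 0.0,
        "time_to_first_action": ts[0] - page_ready_ts if ts else None,
    }


def inject_delays(actions, max_delay, rng):
    """Defense sketch: add a uniform random pause before every action (delays accumulate)."""
    shifted, offset = [], 0.0
    for t, kind in actions:
        offset += rng.uniform(0.0, max_delay)
        shifted.append((t + offset, kind))
    return shifted
```

`inject_delays` changes only timestamps, so action order and any structural features (click coordinates, key and link ratios) are untouched. That is consistent with a retrained classifier shifting its weight onto those features.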
-
At operationally realistic base rates—1 million connection pairs per hour with only 10 true stepping-stone chains—a detector with a 1% FPR generates approximately 10,000 false alarms per hour while correctly flagging all 10 intrusions, making classical statistical methods (which cannot reach FPR ≪ 10⁻²) operationally unusable; deep learning methods must target FPR ≤ 10⁻³ to be viable.
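The base-rate arithmetic above can be checked directly. The function below assumes, as the finding does, a detector that flags every true chain (TPR = 1) unless told otherwise.

```python
def alert_stats(pairs_per_hour: int, true_chains: int, fpr: float, tpr: float = 1.0):
    """Expected false alarms, true alarms, and precision per hour for a given operating point."""
    false_alarms = fpr * (pairs_per_hour - true_chains)
    true_alarms = tpr * true_chains
    flagged = false_alarms + true_alarms
    precision = true_alarms / flagged if flagged else 0.0
    return false_alarms, true_alarms, precision
```

At FPR = 10⁻², about 10,000 false alarms accompany the 10 true ones, so precision is below 0.1%. By the same arithmetic, FPR = 10⁻³ still yields about 1,000 false alarms per hour, roughly 1% precision. This is why the target is stated as an upper bound on FPR rather than a comfortable operating point.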
-
In network-mode, ESPRESSO achieves a TPR of only 0.132 at FPR ≤ 10⁻³ for DNS-tunneled traffic, compared to 0.992 for SSH traffic at the same threshold. The paper attributes this to the polling-based communication mechanism of dnscat2, which disrupts the timing patterns that interval-based flow correlation relies on.
-
ESPRESSO, a deep learning flow correlator combining a transformer backbone with time-aligned interval features and online triplet mining, achieves TPR >0.99 at FPR ≤ 10⁻³ for SSH, SOCAT, and ICMP stepping-stone traffic in network-mode detection, versus DCF's TPR of 0.320–0.956 across those same protocols at the same threshold. On the harder mixed-protocol dataset in network-mode, ESPRESSO achieves TPR 0.748 at FPR ≤ 10⁻³, more than double DCF's 0.334.
-
Ablation experiments show that replacing ESPRESSO's transformer backbone with a CNN ('Modified DCF') while retaining time-aligned interval features achieves performance competitive with the full ESPRESSO model across most protocols (e.g., SOCAT network-mode pAUC 0.997 vs. 0.989 at FPR ≤ 10⁻³), demonstrating that the time-interval feature representation—not the transformer architecture—is the primary driver of correlation accuracy.
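Partial AUC restricted to a low-FPR region is the metric quoted in this ablation. This minimal sketch shows one common way to compute it:

- integrate the empirical ROC curve up to `max_fpr` with the trapezoidal rule;
- normalize by `max_fpr`, so a perfect detector scores 1.0.

Two points are assumptions: whether the paper normalizes this way, and whether it applies the McClish correction used by scikit-learn's `roc_auc_score(max_fpr=...)`.

```python
def partial_auc(scores_pos, scores_neg, max_fpr=1e-3):
    """Normalized partial AUC over FPR in [0, max_fpr]; higher score = more likely correlated.

    O(n^2) threshold sweep for clarity; meaningful only with >> 1/max_fpr negatives.
    """
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    n_pos, n_neg = len(scores_pos), len(scores_neg)
    roc = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / n_pos
        fpr = sum(s >= t for s in scores_neg) / n_neg
        roc.append((fpr, tpr))
    area = 0.0
    for (f0, t0), (f1, t1) in zip(roc, roc[1:]):
        if f0 >= max_fpr:
            break
        if f1 > max_fpr:
            # Interpolate TPR so the last segment ends exactly at max_fpr.
            t1 = t0 + (t1 - t0) * (max_fpr - f0) / (f1 - f0)
            f1 = max_fpr
        area += (f1 - f0) * (t0 + t1) / 2  # trapezoid
    return area / max_fpr
```

Under this normalization a perfectly separating correlator scores 1.0, while one that cannot separate flows scores `max_fpr / 2`.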
-
A systematic robustness evaluation found that ESPRESSO is highly robust to packet padding alone but that even modest artificial timing jitter causes significant performance degradation, identifying timing-based perturbations as the primary vulnerability of correlation-based stepping-stone (and by extension, anonymity-network) detectors.
-
CAPTCHAs co-occurred with 'Resource Inaccessible' in 70% of CAPTCHA reports and appeared in 23% of all 'Resource Inaccessible' reports; overall, 14% of the 119 reports involved both problems. Two CAPTCHA failure modes were identified: excessively repeated CAPTCHAs, and broken CAPTCHA servers that made the underlying website permanently inaccessible. Google's 'Unusual traffic detected from your computer' error appeared in 5% of all reports.
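The report categories overlap, so the co-occurrence figures can be cross-checked against the category totals for the same 119 reports (61% 'Resource Inaccessible', 18% CAPTCHAs). This sketch assumes the counts are those percentages of 119, rounded to whole reports. Both directions put the overlap at roughly 15 to 17 reports (about 12% to 14%). Inclusion-exclusion puts the reports with at least one of the two problems at roughly two thirds of the total.

```python
TOTAL = 119
ri = round(0.61 * TOTAL)       # 'Resource Inaccessible' reports (61%)
captcha = round(0.18 * TOTAL)  # CAPTCHA reports (18%)

# The overlap is reported from both sides; each gives an estimate of the same count.
both_from_captcha = 0.70 * captcha  # 70% of CAPTCHA reports also cite RI
both_from_ri = 0.23 * ri            # 23% of RI reports also cite CAPTCHAs

for label, both in (("via CAPTCHA share", both_from_captcha), ("via RI share", both_from_ri)):
    either = ri + captcha - both  # inclusion-exclusion
    print(f"{label}: both ~ {both:.1f} ({both / TOTAL:.0%}), either ~ {either:.1f} ({either / TOTAL:.0%})")
```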
-
17% of ReporTor reports cited broken content; investigation found that several websites returned HTTP 403 errors through Tor Browser but loaded normally in Firefox, revealing deliberate differential treatment of Tor traffic masquerading as technical failure. Blocked resources included advertising platforms (e.g., t.co) and JavaScript files handling cookie-consent dialogs, and 8% of reports involved authentication failures where initial page load succeeded but subsequent auth steps were silently refused.
-
Two mechanistically distinct blocking categories account for Tor exit-node inaccessibility: explicit blocks (deliberate CDN/WAF configuration, e.g., Akamai Bot Manager renders AirBnB inaccessible over Tor) and dynamic blocks (abuse-detection systems that flag Tor exit-node IPs because pooled traffic from diverse users raises apparent abuse scores, triggering rate-limiting or blocking despite no explicit Tor policy). Cloudflare does not block Tor by default, but its aggressive IP scoring results in disproportionate blocking in practice.
-
'Resource Inaccessible' was the most frequently reported issue (61% of 119 submitted reports) during a month of naturalistic Tor Browser browsing, followed by CAPTCHAs (18%), Broken Content (17%), Other Issues (13%), and Timeouts (5%). These categories document the operational failure modes that degrade everyday Tor Browser usability beyond protocol-level censorship.
-
Ephemeral defenses were integrated with a WireGuard fork and deployed as Mullvad VPN's opt-in 'DAITA' (Defense Against AI-guided Traffic Analysis) feature. The feature has run on Android, iOS, macOS, Linux, and Windows for over one year and serves thousands of daily users, a number that is still growing. Individual defenses are derived deterministically from seeds in 43.6 ± 4.7 ms on a commodity laptop, making per-connection unique defenses practical at VPN scale.
-
Ephemeral blocking defenses reduce DF accuracy from 89.0% (undefended) to 10.2% and RF from 90.1% to 14.7% with standard 30-epoch training, at 97.5% bandwidth and 68.4% delay overhead; under infinite training, DF rises to only 29.2% and RF to 24.3%, still far below undefended baselines of 92.7% and 94.7%. Defenses are tunable at deployment time by adjusting Maybenot framework-wide limits, enabling overhead-vs-protection trade-offs without redeployment.
-
The ephemeral property — using a unique seed-derived defense per connection — prevents attackers from training classifiers on the exact deployed defense variant. Stacked combinations with height H=5 from N=1,000 base defenses yield 6.88×10^25 unique defenses (polynomial growth O(N^{2H})). Attacks trained on ephemeral defenses also generalize significantly better across other randomized defense families than attacks trained on static defenses.
-
Padding-only defenses that inject bursty traffic cause severe additional delay under realistic network bottlenecks: Break-Pad's delay overhead increases from 0% to 332.6% and FRONT's from 0% to 111.2% under a per-trace simulated PPS bottleneck. Even ephemeral padding defenses induce 43.9% delay overhead under bottleneck conditions, compared to 0% without a bottleneck, due to congestion from dummy packets.
-
With infinite training time, Laserbeak achieves 93.5%, 95.9%, and 95.9% accuracy against ephemeral padding, FRONT, and Interspace respectively, compared to 96.5% undefended — confirming that padding-only defenses provide no meaningful protection against a sufficiently trained deep-learning WF adversary. Only ephemeral blocking defenses retain measurable protection, reducing Laserbeak to 71.8% accuracy under infinite training versus 96.5% undefended.
-
MIRAGE's differentially private routing function provably bounds adversary inference: for a routing protocol satisfying ε-DP with ε = ln(4), any hypothesis test achieving a true positive rate of 80% necessarily incurs a false positive rate of at least 20%. The TPR-to-FPR ratio is bounded by e^ε for any ε-DP routing function, providing a formal privacy guarantee against routing-level statistical disclosure attacks.
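The bound is easy to evaluate for a given operating point. This minimal sketch applies two constraints:

- the ratio bound quoted above, TPR ≤ e^ε·FPR;
- the complementary constraint from the standard hypothesis-testing characterization of ε-DP, FPR ≥ 1 − e^ε·(1 − TPR).

At ε = ln(4) and TPR = 0.8 both constraints give 0.2.

```python
import math


def min_fpr_under_dp(tpr, eps):
    """Smallest FPR any test can have at the given TPR against an eps-DP routing function."""
    # Ratio bound quoted in the text: TPR <= e^eps * FPR.
    ratio_bound = tpr / math.exp(eps)
    # Complementary constraint: (1 - TPR) >= e^-eps * (1 - FPR), i.e. FPR >= 1 - e^eps * (1 - TPR).
    complement_bound = 1 - math.exp(eps) * (1 - tpr)
    return max(0.0, ratio_bound, complement_bound)


print(min_fpr_under_dp(0.8, math.log(4)))
```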
-
Embedding explicit TTL values in mesh-routed messages leaks proximity information — a recipient can infer that a high-TTL message originator was recently nearby. MIRAGE mitigates this with memoryless TTLs: carriers independently discard messages with probability q per epoch, implementing a branching process with replication factor R ≤ nmax·(1−q). Setting q > 1 − 1/nmax ensures sub-critical message extinction with expected lifetime ≈ −ln(nmax)/ln(R) epochs.
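The extinction threshold and lifetime estimate follow directly from n_max and q. This minimal sketch makes two assumptions. It treats the upper bound R = n_max·(1 − q) as the actual replication factor. It reads the lifetime formula as the number of epochs for about n_max initial copies to decay to about one.

```python
import math


def ttl_parameters(n_max, q):
    """Branching-process view of memoryless TTLs with per-epoch discard probability q."""
    q_critical = 1 - 1 / n_max          # q must exceed this for sub-critical extinction
    R = n_max * (1 - q)                 # replication factor (upper bound taken as exact)
    if R >= 1:
        return q_critical, R, math.inf  # not sub-critical: no finite lifetime estimate
    lifetime = -math.log(n_max) / math.log(R)  # epochs for ~n_max copies to decay to ~1
    return q_critical, R, lifetime


print(ttl_parameters(4, 0.8))
```

With n_max = 4 the threshold is q > 0.75. At q = 0.8, R = 0.8 and the estimate is about 6.2 epochs. Pushing q toward the threshold drives R toward 1 and the lifetime up sharply.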
-
MIRAGE delivers 15× more messages than random-walk protocols and significantly outperforms probabilistic flooding in delivery rate. On the pedestrian YJMob100K dataset at p=0.6, MIRAGE achieved a delivery rate of 36.9%, compared to 9.1% for probabilistic flooding (4.1×) and 3.2% for handoff (11.5×). MIRAGE incurs substantially lower network load than maximal flooding (86.9% delivery) while maintaining better delivery rates than all non-flooding baselines.
-
The PPBR (probabilistic profile-based routing) protocol leaks user community membership through observable routing decisions: in a controlled experiment with 800 majority and 200 minority users, a statistical disclosure attack achieved a true positive rate of 100% and false positive rate of 0% when identifying minority users. Even under a conservative PPBR configuration (top 1/3 fraction acceptance), the attack achieved 100% TPR and only 0.4% FPR.
-
MIRAGE constructs a global mobility graph using locally differentially private per-user submissions, requiring only O(ln(|M|/β) / (α²ε²)) users to achieve per-edge accuracy α with probability 1−β. For a 100-district map with ε=0.05 and α=0.5, fewer than 1 million users suffice for top-2 district reporting; for top-3 districts the requirement drops to under 200K users.
-
CNN-based passive traffic analysis failed to deanonymize I2P services when transferred from a controlled lab to the public I2P network. Lab-trained models produced mostly unusable results. The 'Without port' variant assigned packets to Class 2 at 71.6–88.4× the true count. The 'Without payload' variant, though better, still overcounted by 12.8–13.2×. Lab-learned patterns therefore do not generalize to real-world I2P traffic.
-
Fano's inequality establishes a theoretical lower bound on deanonymization error probability as a function of anonymity set size |Θ|, prior uncertainty H(X), and mutual information leakage I(X;Y). For a sufficiently large network of N nodes with uniform routing and no leakage, Pe ≥ (log N − 1) / log(N − 1) (logarithms base 2), which approaches 1 (perfect anonymity). The authors found that closed-form estimation of I(X;Y) from I2P traffic features was analytically intractable, requiring ML approximation, and that ML also failed in practice.
-
Applying Fano's inequality gives Pe ≥ (H(X|Y) − 1)/log|Θ| with H(X|Y) = H(X) − I(X;Y). With zero leakage this is the paper's bound, Pe ≥ (H(X) − 1)/log|Θ|. The deanonymization error rate therefore approaches 1 (perfect anonymity) when two conditions hold: the anonymity set |Θ| is large, and the mutual information leakage I(X;Y) between observed traffic Y and target identity X is minimized. For example, a uniform default tunnel length of 3 hops across all nodes contributes no differential leakage because p(y=3)=1. Standardized network parameters thus reduce identifiability.
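The bound is straightforward to evaluate. This minimal sketch uses base-2 logarithms and the log(N − 1) denominator from the N-node expression; using log|Θ| instead gives the slightly weaker form. `leakage` stands for an estimate of I(X;Y), which the authors found hard to obtain in practice.

```python
import math


def fano_lower_bound(h_x, leakage, alphabet_size):
    """Fano lower bound on deanonymization error probability (entropies in bits).

    h_x: prior entropy H(X); leakage: mutual information I(X;Y);
    alphabet_size: number of candidate identities (must be > 2).
    """
    h_x_given_y = h_x - leakage  # H(X|Y) = H(X) - I(X;Y)
    bound = (h_x_given_y - 1) / math.log2(alphabet_size - 1)
    return max(0.0, min(1.0, bound))


# Uniform prior over N nodes: H(X) = log2 N.
for n in (10, 1_000, 1_000_000):
    print(n, round(fano_lower_bound(math.log2(n), 0.0, n), 4))
```

With no leakage the bound is about 0.90 for 1,000 nodes and about 0.95 for one million. Each bit of leakage lowers it by 1/log2(N − 1).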
-
Lab-trained CNN models completely failed to generalize to real public I2P network traffic: the 'without payload' variant produced 12.8–13.2× more false positives for the target service class than ground-truth packets actually existed (Table VIII), rendering all models forensically unusable. The authors conclude that heterogeneity and dynamism of real-world I2P traffic prevents lab-derived classifiers from achieving practical deanonymization.
-
I2P payload entropy is close to 8 bits per byte (Figure 9), confirming strong encryption that renders payload content analytically unusable. Across all CNN experiments, models trained on payload data alone achieved 72.5–76.5% accuracy versus 95.17–99.5% for metadata-only variants. The encrypted payload acted as 'noise that confused the model' rather than as a signal.
-
I2P payload entropy was measured at close to 8 bits per byte across sampled packets (Figure 9), confirming that payload content is cryptographically indistinguishable from random noise and provides no usable signal for classification. All experimental variants using raw payload alone achieved poor and high-variance accuracy (72.5–76.5%), while excluding payload improved accuracy to 99.5% in lab conditions.
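Per-byte Shannon entropy is the statistic behind the ~8 bits-per-byte observation. The sketch below computes it in bits per byte; the sample inputs are illustrative.

```python
import math
import os
from collections import Counter


def byte_entropy(payload: bytes) -> float:
    """Shannon entropy of a payload in bits per byte (0 = constant, 8 = uniform)."""
    if not payload:
        return 0.0
    counts = Counter(payload)
    n = len(payload)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


print(byte_entropy(b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"))  # plaintext: well below 8
print(byte_entropy(os.urandom(4096)))  # random bytes: close to 8
```

Entropy estimated from a single packet is capped at log2 of its length. Packets shorter than 256 bytes therefore cannot reach 8 bits per byte, even when their content is random.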
-
Unsupervised k-Means clustering over I2P flow features (port, payload length, protocol) found no natural cluster structure: distortion decreased nearly linearly with k up to k=20 with no elbow, indicating I2P traffic lacks the simple separable patterns that enable clustering-based traffic classification. The 435-packet dataset yielded one cluster of 75 and clusters as small as 3, with no forensically useful groupings.
-
Unsupervised k-Means clustering on I2P traffic features (port, payload length, protocol type) produced no natural cluster structure — distortion decreased almost linearly with k showing no elbow point — confirming that I2P's obfuscation successfully destroys simple separable patterns that shallow classifiers rely on. CNNs were required to detect any signal at all.
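The 'no elbow' observation can be made concrete with a simple criterion. This sketch uses one heuristic: the largest second difference of the distortion curve, which must also be large relative to the average per-step drop. The paper's own elbow judgement may have been visual. The threshold `min_curvature_ratio` is an illustrative assumption.

```python
def find_elbow(distortions, min_curvature_ratio=0.5):
    """Return the k (1-based) with the sharpest bend in a distortion curve, or None.

    distortions[i] is the k-Means distortion for k = i + 1. The bend at k is the
    second difference d[k-1] - 2*d[k] + d[k+1]; a curve that falls almost linearly
    has second differences near zero everywhere and therefore no elbow.
    """
    second_diff = [
        distortions[i - 1] - 2 * distortions[i] + distortions[i + 1]
        for i in range(1, len(distortions) - 1)
    ]
    if not second_diff:
        return None
    best = max(range(len(second_diff)), key=second_diff.__getitem__)
    avg_step = (distortions[0] - distortions[-1]) / (len(distortions) - 1)
    # Require the bend to be large relative to the average per-step drop.
    if avg_step <= 0 or second_diff[best] < min_curvature_ratio * avg_step:
        return None
    return best + 2  # second_diff index j corresponds to k = j + 2
```

A curve like [100, 40, 30, 25, 22, 20] yields an elbow at k = 2. A distortion curve that falls almost linearly with k, as reported for I2P, yields none.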
-
Under controlled lab conditions, a CNN trained on packet metadata (ports, sizes, TCP sequence numbers) achieved 99.5% accuracy classifying I2P packets with the 'Without payload' variant, versus only 72.5–76.5% using encrypted payload alone. Applied to the full recorded dataset, the 'Without payload' model's accuracy on the dominant irrelevant-traffic class dropped to 95.17%. It still classified 100% of target-class packets correctly, but the resulting high false-positive rate made it forensically unreliable.
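Class imbalance explains how 95.17% accuracy on irrelevant traffic, combined with perfect recall on target packets, can still be forensically useless. This sketch uses hypothetical packet counts, not the paper's. When target packets are rare, even a 4.83% error rate on the majority class swamps the true positives.

```python
def flagged_counts(n_irrelevant, n_target, irrelevant_accuracy, target_recall):
    """Expected true and false positives when a classifier flags the target class."""
    false_pos = round(n_irrelevant * (1 - irrelevant_accuracy))
    true_pos = round(n_target * target_recall)
    return true_pos, false_pos


# Hypothetical capture: 10,000 irrelevant packets and 50 target-service packets.
tp, fp = flagged_counts(10_000, 50, irrelevant_accuracy=0.9517, target_recall=1.0)
print(tp, fp, f"flagged/true = {(tp + fp) / tp:.1f}x, precision = {tp / (tp + fp):.1%}")
```

In this hypothetical capture, 483 false positives accompany 50 true ones, so precision stays below 10%.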
-
CNN models trained on I2P lab traffic achieved 99.5% validation accuracy using metadata alone (packet sizes, ports, TCP sequence numbers) versus only 72.5–76.5% accuracy when using encrypted payload only. This demonstrates that packet metadata is far more discriminating than payload content for traffic classification in encrypted anonymity networks.
-
Joint multi-task training with a combined loss L_joint = L_site + λ·L_pers shows that increasing λ from 0 to 2 raises mixed-site persona accuracy from approximately 45% to approximately 80% while website accuracy declines only from approximately 90% to approximately 75%, demonstrating a wide regime where an attacker can gain strong persona inference at modest cost to existing WFP capability.
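The combined objective can be written out per training window. This minimal sketch assumes both heads are softmax classifiers trained with cross-entropy, which the text does not specify. Setting λ = 0 recovers a site-only WFP model.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def joint_loss(site_logits, site_label, persona_logits, persona_label, lam):
    """L_joint = L_site + lambda * L_pers for one traffic window."""
    return cross_entropy(site_logits, site_label) + lam * cross_entropy(persona_logits, persona_label)
```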
-
Using only 1,000-packet windows of signed packet lengths and inter-arrival times (no payload, no URLs, no cookies), a passive adversary achieves approximately 84% accuracy at inferring behavioral persona in a mixed-site open-world setting spanning 10 modern websites and 15 canonical personas plus an open-world class. Per-site persona macro-F1 typically ranges from about 0.78 to 0.91 across representative platforms including Bilibili, eBay, Yahoo, Zhihu, and LinkedIn.
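The input representation can be sketched as follows. Three details are illustrative assumptions rather than details from the paper:

- packets arrive as `(timestamp_seconds, size_bytes, outgoing)` tuples;
- outgoing packets get a positive sign;
- a trailing partial window is dropped.

```python
def make_windows(packets, window=1000):
    """Split a trace into fixed-size windows of (signed length, inter-arrival time) pairs.

    packets: list of (timestamp_seconds, size_bytes, outgoing) tuples.
    Trailing packets that do not fill a whole window are dropped.
    """
    feats = []
    prev_t = None
    for t, size, outgoing in packets:
        signed_len = size if outgoing else -size     # direction encoded in the sign
        iat = 0.0 if prev_t is None else t - prev_t  # gap since the previous packet
        feats.append((signed_len, iat))
        prev_t = t
    return [feats[i:i + window] for i in range(0, len(feats) - window + 1, window)]
```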
-
In open-world evaluation, an average of 34.4% of traffic from unseen personas is misattributed to a specific canonical persona (MisAttr@OW), and misattributions concentrate heavily: on average 58.7% of misattributed windows fall into just the top-3 canonical personas, rising to 66.8% and 69.3% on Bilibili and Zhihu respectively. The classifier correctly rejects unseen personas as OW with an average F1 of only 65.6%.
-
Attack accuracy scales steeply with persona-labeled training data: mixed-site open-world persona accuracy rises from 55.0% at 500 windows/persona to 65.0% at 1,000, 76.0% at 2,000, and 84.0% at 5,000 windows/persona across 10 sites (results consistent across 3 random seeds with std ≤1.0%). LLM-driven browsing agents make large-scale persona-labeled traffic generation practical for adversaries.
-
A site-only WFP encoder trained without any persona labels already encodes substantial persona information: attaching a lightweight MLP probe to its frozen representations recovers persona accuracy roughly 20–30 percentage points above a random-encoder baseline across all 10 sites (e.g., approximately 53% vs. 21% on Amazon, 49% vs. 27% on YouTube, using the same probe architecture and training budget).
-
Re-testing in 2025 on a Pixel 10 Pro XL running Android 16 with October 2025 security updates confirmed that blind in/on-path VPN inference attacks remain fully viable despite CVE-2019-9461, CVE-2019-14899, and CVE-2024-49734 having been formally closed. All three core attack primitives—VPN-assigned internal IP discovery, active connection inference, and TCP reset injection via sequence/acknowledgment window scanning—succeeded across OpenVPN, WireGuard, and NordLynx.
-
Six widely deployed VPN and circumvention tools—OpenVPN, WireGuard/NordLynx, NordWhisper, Orbot (Tor on Android), Lantern, and Psiphon—all failed to block internal IP inference, connection-state detection, and TCP reset injection under identical adversarial conditions on fully patched Android 16. Application-layer obfuscation in Lantern and Psiphon did not prevent TCP-layer disruption; Orbot's VPN-style encapsulation of Tor traffic was bypassed via the same tunnel-level side channels.
-
The CVE system is structurally incapable of tracking cross-vendor architectural vulnerabilities. In 2019 correspondence with MITRE, the authors were told that CVE identifiers apply only to specific software implementation mistakes and that CVE-2019-14899 'should not have been assigned.' This left the architectural VPN inference attack surface permanently untracked. Between CVE-2019-14899 (2019) and CVE-2024-49734 (2024), no new CVE was assigned despite continued reporting and confirmed exploitability. The result was a five-year gap in the public record during which vendor patch claims went unchallenged.
-
The paper proposes an Internet Freedom vulnerability registry with five design principles: persistent cross-vendor tracking under shared identifiers (e.g., IF-ARCH-2025-001) as long as a risk remains reproducible; human-centered impact ratings anchored to harm potential for journalists and dissidents rather than CVSS-style exploitability scores; timestamped re-verification hooks with linked PCAPs and minimal reproduction scripts; a structured media interface to counter vendor narrative capture; and open public APIs for integration into risk dashboards so that users of tools like Orbot or Lantern can directly query their configuration's exposure to known metadata-based attacks.
-
The server-side variant of the blind VPN inference attack—where an in/on-path adversary exploits predictable NAT assignment and tunnel routing semantics to inject spoofed packets indistinguishable from legitimate encrypted traffic—has remained unacknowledged and unmitigated across all tested platforms since its concurrent disclosure in 2019. Unlike the client-side variant, which received partial fixes from Google (CVE-2019-9461, CVE-2024-49734) and Apple (iOS 17.2.1), no vendor has proposed a viable remediation or claimed ownership of the server-side attack surface.
-
Obscura's browser-to-browser (B-B) WebRTC connections produce DTLS ClientHello and ServerHello messages indistinguishable from genuine browser traffic. The authors captured 100 handshakes and compared them against Facebook Messenger, Google Meet, Discord, and a reference WebRTC app using the dfind tool. No unique identifiers were found in Chrome-to-Chrome (C-C) connections. The sole Firefox-specific fingerprint (ServerHello length 86 bytes, cipher TLS_AES_128_GCM_SHA256, extension field length 46 bytes) matches the default Firefox WebRTC profile, so blocking it would also block all legitimate Firefox WebRTC users.
-
A differential degradation attack (DDA) that selectively drops RTP packets carrying the last packet of a video frame — exploiting the fact that a single lost packet causes the entire encoded frame to be discarded — reduces Protozoa's covert throughput to single-digit KBps at 1920×1080 with 15% frame loss and at 426×240 with 50% frame loss, while maintaining acceptable video quality for legitimate WebRTC traffic.
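This sketch shows one plausible way a censor could find the frame-ending packets the attack targets. In SRTP the RTP header is authenticated but not encrypted. The marker bit, which in most video payload formats flags the last packet of a frame, is therefore visible to the network. The sketch assumes that signal; the paper's exact packet-selection method is not described here.

```python
import random


def is_frame_end(rtp_packet: bytes) -> bool:
    """True for an RTP version-2 packet whose marker bit is set (last packet of a video frame)."""
    return len(rtp_packet) >= 12 and (rtp_packet[0] >> 6) == 2 and bool(rtp_packet[1] & 0x80)


def degrade(packets, drop_prob, seed=0):
    """Differential degradation: drop only frame-ending packets, each with probability drop_prob."""
    rng = random.Random(seed)
    return [p for p in packets if not (is_frame_end(p) and rng.random() < drop_prob)]
```

Dropping only marker packets discards whole frames at the receiver while touching a small fraction of the packet stream. A real deployment would also need to separate RTP from RTCP and other multiplexed traffic.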
-
Under baseline conditions (0% packet loss, no bandwidth constraint, 115 ms RTT), Obscura achieves average throughputs of 1.79 Mbps for Firefox-to-Firefox, 1.49 Mbps for Chrome-to-Chrome, and 1.32 Mbps for Pion-to-Pion connections; P-P connections collapse when the 2 Mbps target video bitrate exceeds the 1500 Kbps bandwidth constraint, while C-C connections remain usable at 10% packet loss with an average of 460 Kbps.
-
Empirical evaluation against nine major commercial VPN providers found all five tested connection tracking frameworks (Linux Netfilter, FreeBSD PF, IPFW, IPFilter, natd) and eight of nine providers vulnerable to at least one session manipulation attack, resulting in 19 assigned CVEs/CNVDs.
-
DNS hijacking via shared VPN NAT is feasible because the full 16-bit TxID space (up to 65,536 values) can be brute-forced in an average of 4.27 seconds, well within a typical 10-second DNS request timeout; browser DNS cache windows range from 60 seconds (Chrome/Edge) to 660 seconds or more (Firefox), with longer windows enlarging the injection race window.
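The timing figures translate into a required spoofing rate. This minimal sketch assumes the resolver's source port is already known, for example through the shared-NAT side channel, so only the 16-bit TxID must be guessed. `hit_probability` is an illustrative helper.

```python
TXID_SPACE = 2 ** 16        # 16-bit DNS transaction ID
BRUTE_FORCE_SECONDS = 4.27  # average time reported for covering the space

rate = TXID_SPACE / BRUTE_FORCE_SECONDS
print(f"~{rate:,.0f} spoofed responses per second")


def hit_probability(responses_sent, space=TXID_SPACE):
    """Chance that at least one of `responses_sent` distinct TxID guesses is correct."""
    return min(1.0, responses_sent / space)
```

Covering the space in 4.27 s requires roughly 15,000 spoofed responses per second, comfortably inside the 10-second timeout.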
-
A co-tenant attacker sharing the same VPN server can launch a port-exhaustion DoS in an average of 4 seconds with over 90% success rate, inject forged HTTP responses in 64.11 seconds at a 66.7% success rate, and hijack DNS responses at success rates of 20% to 70%.
-
When a VPN server uses Port Preservation for NAT, a co-tenant off-path attacker can infer another user's externally mapped source port by sending probe SYN packets with guessed ports through the tunnel and spoofed SYN/ACK verification packets outside the tunnel; confirmation comes from observing which port the VPN server forwards the response to, enabling targeted TCP session hijacking.
-
Spoofed TCP RST packets with sequence numbers stepped at 60,000-unit intervals sent outside the VPN tunnel can evict a victim's ESTABLISHED session entry (timeout drops from 432,000 s to 10 s in Netfilter pre-patch); approximately 71,000 RST packets suffice and can be sent in seconds on modern hardware. Controlling RST TTL to equal the hop count to the VPN server bypasses the RFC 5961 challenge-ACK countermeasure.
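The packet count follows from dividing the 32-bit sequence space by the step size. This minimal sketch assumes the 60,000 stride is small enough that at least one RST lands inside the window the connection tracker accepts. `sweep_seconds` and the send rates passed to it are illustrative.

```python
import math

SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit
STEP = 60_000        # spacing between consecutive spoofed RSTs

rst_needed = math.ceil(SEQ_SPACE / STEP)
print(rst_needed)  # 71583


def sweep_seconds(packets_per_second):
    """Time to cover the whole sequence space at a given send rate."""
    return rst_needed / packets_per_second
```

At 100,000 packets per second the sweep finishes in under a second.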
-
A plug-and-play Boundary Preserving Aggregation Module (overlapping window partitioning with joint packet- and burst-level features, W=20ms, stride=10ms) consistently improves existing WF baselines without architectural modification: applied to DF, AUC rises from 0.780 to 0.901 and P@5 from 0.315 to 0.545; applied to ARES'25, P@5 rises from 0.869 to 0.900 in the open-world 5-tab setting. The module's consistent gains across all three tested baselines confirm that fixed non-overlapping window segmentation is a structural vulnerability in prior WF pipelines.
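The overlapping partitioning step can be sketched as below, with W = 20 ms and stride 10 ms as in the text. The sketch omits the joint packet- and burst-level feature extraction that the module applies within each window. The function name is illustrative.

```python
def overlapping_windows(timestamps, width=0.020, stride=0.010):
    """Group packet indices into overlapping time windows [lo, lo + width), lo stepping by stride.

    timestamps must be sorted and in seconds.
    """
    if not timestamps:
        return []
    start, end = timestamps[0], timestamps[-1]
    windows = []
    k = 0
    while start + k * stride <= end:
        lo = start + k * stride
        hi = lo + width
        windows.append([i for i, t in enumerate(timestamps) if lo <= t < hi])
        k += 1
    return windows
```

With stride S smaller than width W, any burst shorter than W − S (10 ms here) lies entirely inside at least one window. Fixed non-overlapping segmentation, by contrast, can split such a burst across two windows.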
-
DEMUX achieves a P@5 of 0.943 and MAP@5 of 0.961 in the closed-world 5-tab multi-tab website fingerprinting setting, outperforming the strongest prior baseline (ARES'25) by 9.2 and 6.2 percentage points respectively. ARES'25's P@K degrades from 0.900 at 2-tab to 0.851 at 5-tab (a drop of 4.9 pp), while DEMUX improves from 0.926 to 0.943 over the same range, expanding the absolute margin from 2.6 to over 9 points.
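Definitions of P@K and MAP@K vary slightly across multi-tab WF papers. This sketch follows the common information-retrieval definitions, which is an assumption about how these scores were computed. Each trace has a ranked list of predicted sites and a set of sites actually open.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k predicted sites that were actually open in the trace."""
    return sum(1 for site in ranked[:k] if site in relevant) / k


def average_precision_at_k(ranked, relevant, k):
    """AP@k: precision@i summed over ranks i <= k holding a relevant site, over min(k, |relevant|)."""
    hits, score = 0, 0.0
    for i, site in enumerate(ranked[:k], start=1):
        if site in relevant:
            hits += 1
            score += hits / i
    return score / min(k, len(relevant)) if relevant else 0.0


def map_at_k(all_ranked, all_relevant, k):
    """MAP@k averaged over traces."""
    pairs = list(zip(all_ranked, all_relevant))
    return sum(average_precision_at_k(r, rel, k) for r, rel in pairs) / len(pairs)
```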
-
The open-world 5-tab setting is harder: each trace contains one unmonitored site, which substantially increases noise and class imbalance. There, DEMUX achieves an AUC of 0.998, P@5 of 0.951, and MAP@5 of 0.966, while ARES'25 achieves 0.988/0.869/0.911. DEMUX's P@5 advantage over ARES'25 remains large in this setting: 8.2 pp, versus 9.2 pp in the closed-world 5-tab setting. State-of-the-art WF attacks are thus not defeated by open-world conditions or unmonitored co-browsing traffic.
-
The RF baseline uses the Traffic Aggregation Matrix (TAM) representation, which counts packets in each direction over fixed time slots rather than tracking per-packet sequences. Under TrafficSliver it shows unexpectedly strong robustness, achieving a P@2 of 0.702 and exceeding most other CNN-based methods under that defense. Var-CNN likewise achieves a P@2 of 0.826 under TrafficSliver despite mediocre no-defense performance. This suggests that tolerance to partial packet loss is architecturally separable from peak single-observer accuracy.
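A TAM-style representation reduces a trace to per-slot packet counts in each direction. In this minimal sketch, the slot width and number of slots are illustrative defaults, not RF's actual parameters.

```python
def traffic_aggregation_matrix(packets, slot=0.05, n_slots=1200):
    """Count outgoing/incoming packets per fixed time slot.

    packets: list of (timestamp_seconds relative to trace start, outgoing: bool).
    Returns [outgoing_counts, incoming_counts]; late packets fold into the last slot.
    """
    tam = [[0] * n_slots, [0] * n_slots]
    for t, outgoing in packets:
        idx = min(int(t / slot), n_slots - 1)
        tam[0 if outgoing else 1][idx] += 1
    return tam
```

Each cell is a count, so removing a random subset of packets lowers counts roughly in proportion rather than shifting every later position in a sequence. That is one plausible reason such aggregates degrade gracefully under traffic splitting. This is an interpretation, not a result stated in the text.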
-
Under the TrafficSliver defense — which splits traffic across multiple Tor entry nodes so no single observer sees more than a partial fraction of packets — TMWF collapses to a P@2 of 0.399 and ARES'23 to 0.429, while DEMUX retains a P@2 of 0.940, exceeding the next-best competitor by 2.5 points. WTF-PAD and FRONT are substantially weaker defenses, with most methods maintaining near-baseline performance under WTF-PAD.
-
A Tor relay serving as a Bento server and hosting a CenTor instance for 10,000 simultaneous clients experienced only approximately 0.4% performance degradation compared to a vanilla relay; running 20 concurrent CenTor instances on a single relay caused roughly 1% degradation. Shadow-aware routing in a low-relay-density region (US, 933 of 6,666 relays in shadow) produced 0.7% relay degradation while delivering 3.6% client performance improvement.
-
CenTor, a CDN for Tor onion services deployed as a Bento network function, achieved approximately 56.4% reduction in download time compared to standard Tor under a geographically-aware shadow configuration, while location-unaware CenTor (reducing hops without shadow restriction) achieved a 34.5% reduction. The key mechanism is eliminating the server-side 3-hop circuit by deploying replicas in non-anonymous mode, cutting the default 6-hop onion service path to 3 hops.
-
CenTor protects origin onion service operators from DoS and deanonymization by routing all client traffic through geographically distributed Bento replicas running inside SGX-based Trusted Execution Environments (TEEs). The original operator can go fully offline after deploying static content; replicas enforce confidentiality and integrity of hosted content with ephemeral per-enclave encryption keys, preventing malicious Bento node operators from inspecting or modifying content even if they control the underlying hardware.
-
CenTor's anonymity scoring function quantifies the privacy cost of geographic shadow selection using six parameters (client density, AS-level and country-level entropy, relay density, exit density, guard density). Prior work establishes that reducing the client anonymity set by 20x—retaining at least 5% of total Tor users—still provides strong anonymity; accordingly, CenTor recommends minimum thresholds of CD, EL, EC ≥ 0.05 and RD, ED ≥ 0.2 for safe shadow operation.
-
Interactive communication on Tor incurs latencies more than 5x greater than direct Internet paths; onion services compound this by creating a 6-hop circuit (3 client-side plus 3 server-side). Shadow simulations at 100% Tor network scale (752,338 active clients, 6,666 nodes) showed that deploying 3,500 and 6,000 Bento servers caused only 4.4% and 9.6% client-side performance degradation respectively, demonstrating that programmable middlebox overlays are feasible at Tor scale.
-
Simulating a shift from a 0% to a 100% male sample changes Shannon entropy estimates by more than 10% for User-Agent (downward) and by more than 68% for WebGL Renderer (upward). Prior large fingerprinting studies likely misrepresent real-world risk due to uncontrolled male bias. These include Panopticlick (83.6–94.2% unique, predominantly reached via tech-oriented channels) and AmIUnique (90% of desktop fingerprints unique). A directly comparable study had 76.5% male participants.
-
Approximately 60% of users in the 8,400-participant US dataset had a unique overall browser fingerprint when combining 13 standard attributes, matching FingerprintJS's advertised 60% accuracy. Fingerprinting risk followed strict monotonic trends: uniqueness increased with age (65+ group most at risk) and decreased with income (household income under $25,000 group at greatest risk), while males showed more unique overall fingerprints but females showed higher uniqueness on passive-fingerprintable attributes (User-Agent, Languages).
-
A simple three-hidden-layer MLP trained on only 13 standard browser attributes achieves AUROC above 0.5 for every tested demographic group: gender 0.663–0.679, age 55+ 0.644, Hispanic ethnicity 0.60, Asian race 0.698, Black race 0.677, and high-income bracket 0.617. Because the model used only attributes already collected by mainstream fingerprinting scripts (e.g., FingerprintJS), richer real-world attribute sets would yield substantially higher demographic inference accuracy.
-
User-Agent and Accept-Language browser attributes are transmitted in HTTP request headers, enabling passive server-side fingerprinting without JavaScript execution or any browser-detectable signal. In the 8,400-user dataset, Hispanic users make up only 11% of the sample but more than 45% of users whose Languages value is 'es-US'. This substantially reduces their anonymity set size relative to the general population.
-
Screen resolution (572 distinct values, 4.5% unique, entropy 5.51), WebGL Unmasked Renderer (654 distinct values, 3.2% unique, entropy 6.833), and User-Agent (434 distinct values, 2.8% unique, entropy 4.613) are simultaneously the most uniquely identifying individual attributes and the strongest demographic predictors by normalized mutual information across all five demographic categories tested (gender, age, income, Hispanic ethnicity, race).
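The per-attribute statistics above (distinct values, share unique, entropy) can be computed from a column of fingerprint values. A normalized mutual information score against a demographic column can be computed the same way. In this sketch the geometric-mean NMI normalization is an assumption, since the paper's normalization is not stated here.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def attribute_stats(values):
    """Distinct values, share of users whose value is unique, and entropy in bits."""
    counts = Counter(values)
    unique_users = sum(1 for c in counts.values() if c == 1)
    return len(counts), unique_users / len(values), entropy_bits(values)


def normalized_mutual_information(attr, demo):
    """NMI = I(A;D) / sqrt(H(A) * H(D)), using I = H(A) + H(D) - H(A,D)."""
    h_a, h_d = entropy_bits(attr), entropy_bits(demo)
    mi = h_a + h_d - entropy_bits(list(zip(attr, demo)))
    return mi / math.sqrt(h_a * h_d) if h_a > 0 and h_d > 0 else 0.0
```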
-
The Ahmia search engine provided the most onion addresses (18,069 in a single day, ranging 18,000–22,000 week-to-week), outperforming five other sources combined (36,028 total across six engines). However, Ahmia's intentional exclusion blacklist contains 46,000+ hashed addresses, and crawling onion services for 20 days yielded 48,745 unique v3 addresses, 11,809 of which were on Ahmia's blacklist — meaning any index-based collection systematically misses a significant share of the onion ecosystem by design.
-
Combining six onion search engines/repositories plus clearnet search engines, Tor2web-style DNS leakage, and 20 days of self-run crawling (2.9 million pages), the authors assembled 482,614 unique v3 onion addresses — the largest known collection. Verifying against HSDir blinded public keys showed the collected addresses accounted for 25% of observed blinded keys but were responsible for 66% of all successful service descriptor downloads, confirming a heavy-tailed usage distribution.
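Collecting addresses from search engines and crawls yields many malformed strings. v3 addresses carry a checksum that a collector can verify offline. This is a sketch of that check as specified in Tor's rend-spec-v3. It is an illustration, not the authors' pipeline. It does not derive the blinded keys used for the HSDir comparison, and it does not check that the key is a valid curve point.

```python
import base64
import hashlib


def onion_v3_address(pubkey: bytes) -> str:
    """Encode a 32-byte ed25519 public key as a v3 onion address."""
    version = b"\x03"
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    return base64.b32encode(pubkey + checksum + version).decode().lower() + ".onion"


def is_valid_v3_onion(address: str) -> bool:
    """Check length, base32 alphabet, version byte, and checksum of a v3 onion address."""
    label = address.lower()
    if label.endswith(".onion"):
        label = label[: -len(".onion")]
    if len(label) != 56:
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:  # binascii.Error subclasses ValueError
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != b"\x03":
        return False
    expected = hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2]
    return checksum == expected
```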
-
Drivel is an obfs4-style fully-encrypted proxy protocol that replaces obfs4's pre-quantum cryptographic primitives with post-quantum alternatives. It is one of the first circumvention protocols explicitly designed to remain secure against a quantum adversary. It addresses the 'harvest now, decrypt later' threat, in which circumvention traffic recorded today is decrypted once quantum computers are available.
-
Most deployed circumvention protocols (obfs4, Shadowsocks, Trojan, VMess, etc.) still rely on pre-quantum primitives. The most exposed are classical public-key components such as X25519 key exchange; symmetric ciphers such as AES-GCM and ChaCha20 are much less affected by known quantum attacks. Drivel is the first published treatment of how to perform this migration in the specific context of a fully-encrypted pluggable transport. It provides a design template and security analysis that do not exist elsewhere in the circumvention literature.
-
InterSecLab frames the Geedge/TSG export program as the commoditization of national firewall capability. Rather than each censor state independently developing detection infrastructure, states contract Geedge for a turnkey system. That system incorporates the cumulative R&D of MESA Lab (>10 years, National Science and Technology Progress Award winners). This structural shift means the marginal cost for an autocratic government to acquire GFW-grade censorship is now a procurement decision, not a multi-year engineering program. The report also notes that Geedge's relationship with the MESA Lab gives customer states indirect access to ongoing academic R&D improvements, not just a static product.
-
Amigo introduces a decentralized continuous key agreement protocol and a novel routing scheme for secure group mesh messaging over short-range radio (Bluetooth/Wi-Fi Direct) when governments disable the Internet during protests. Extensive simulations show that prior approaches fail to scale to realistic protest environments with high link churn, physical spectrum contention, and dense mobility. Amigo's protest-specific optimizations address these conditions, but they also reveal that scaling to protests with thousands of participants remains an open challenge.
-
Simulations show that previous secure mesh messaging systems fail to provide efficient private group communication under realistic protest conditions — specifically high node mobility, link churn, and RF spectrum contention — conditions that prior work did not evaluate. Bridgefy, the most widely deployed protest mesh app, was broken cryptographically in 2021 and 2022, and even its successor designs lack the scalability needed for protests with thousands of participants.
-
In a 600-node simulation on a 25×25 grid representing a city-wide blackout environment, Anix messages reached over 90% of users within 23 simulation steps (~23 hours) even when adversarial Sybil nodes composing 2% of the network refused to forward messages authored by legitimate users. The simulation modeled a 5-day blackout with 120 one-hour steps.
-
Anix provides two cryptographically distinct identity revocation primitives: soft revocation rotates a user's identity key pair and re-notifies only the retained subset of trusted contacts via encrypted unicast, silently excluding the revoked party; hard revocation broadcasts a signed certificate containing the compromised public key components, instructing all contacts to reject both the revoked identity and any downstream identities produced through subsequent soft revocations.
-
Bridgefy included both sender and receiver long-term identifiers on every message; Albrecht et al. found this unsafe and the deployed security upgrades proved insufficient, leaving Bridgefy unable to provide anonymity. Firechat similarly transmits long-term public user IDs with every message, uniquely identifying accounts to every recipient in the mesh.
-
Standard ECDSA signature schemes are vulnerable to public key recovery attacks that allow an adversary to recover the signer's public verification key (up to a small set of candidates) from any signed message, linking all pseudonymous messages authored under different one-time pseudonyms back to a single user identity. This attack needs no side channel; it operates solely on the message and its ECDSA signature.
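To make the linking attack concrete, here is a toy pure-Python sketch over secp256k1 (an assumption; the source does not name a curve). It signs two messages carrying different pseudonyms with the same long-term key, recovers the candidate public keys Q = r⁻¹(sR − zG) from each signature, and intersects the candidate sets. The helper names, the sample key, and the hash-derived nonce are hypothetical illustrations, not the paper's code; the nonce derivation is not RFC 6979 and the arithmetic is not constant time.

```python
import hashlib

# secp256k1 domain parameters: y^2 = x^3 + 7 over F_P, group order N
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Affine point addition; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    if a[0] == b[0] and (a[1] + b[1]) % P == 0:
        return None
    if a == b:
        lam = 3 * a[0] * a[0] * pow(2 * a[1], -1, P) % P
    else:
        lam = (b[1] - a[1]) * pow(b[0] - a[0], -1, P) % P
    x = (lam * lam - a[0] - b[0]) % P
    return (x, (lam * (a[0] - x) - a[1]) % P)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication (not constant time; illustration only)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def digest(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big")

def sign(d: int, msg: bytes):
    # Toy deterministic nonce; real code must use RFC 6979 or a CSPRNG.
    k = digest(d.to_bytes(32, "big") + msg) % N
    r = scalar_mult(k, G)[0] % N
    s = pow(k, -1, N) * (digest(msg) + r * d) % N
    return r, s

def recover_public_keys(msg: bytes, sig):
    """Return candidate public keys Q = r^-1 (sR - zG) for both y-lifts of r."""
    r, s = sig
    z = digest(msg)
    x = r  # assumes R.x < N, ignoring the rare case R.x = r + N
    y = pow((x**3 + 7) % P, (P + 1) // 4, P)  # square root works because P % 4 == 3
    candidates = []
    for ry in (y, P - y):
        s_r_minus_z_g = point_add(scalar_mult(s, (x, ry)), scalar_mult((-z) % N, G))
        candidates.append(scalar_mult(pow(r, -1, N), s_r_minus_z_g))
    return candidates

alice_private = 0x1F2E3D4C5B6A7988  # hypothetical long-term signing key
alice_public = scalar_mult(alice_private, G)
m1 = b"pseudonym-7f3a: march starts at noon"
m2 = b"pseudonym-c91e: avoid the north bridge"
keys1 = set(recover_public_keys(m1, sign(alice_private, m1)))
keys2 = set(recover_public_keys(m2, sign(alice_private, m2)))
linked = keys1 & keys2  # the shared long-term key links both pseudonyms
```

The intersection contains exactly the signer's key, so two messages with unrelated pseudonyms are tied to one identity using nothing but the messages and their signatures.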
-
Rangzen's transitive trust scheme suffers from two structural defects: diminishing trust (each relay hop multiplicatively reduces a message's trust score, degrading trustworthy messages from distant authors) and path dependency (the same message accrues different trust scores depending on which route it traveled, making scores incomparable across recipients). These defects prevent any user from gauging network-wide endorsement of a message.
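A minimal sketch of both defects, assuming a simplified model in which a message's score is the author's trust multiplied by the trust of every relaying hop (the function name and the per-hop values are hypothetical; Rangzen's exact scoring may differ):

```python
from math import prod

def transitive_trust(author_trust: float, hop_trusts: list) -> float:
    """Simplified multiplicative model: each relay hop scales the score by its trust."""
    return author_trust * prod(hop_trusts)

# The same message from the same author, reaching two recipients by different routes.
via_short_route = transitive_trust(1.0, [0.9])      # one well-trusted hop
via_long_route = transitive_trust(1.0, [0.5, 0.5])  # two weaker hops
# Diminishing trust: even uniformly trusted hops erode the score with distance.
after_five_hops = transitive_trust(1.0, [0.9] * 5)
```

The two recipients end up with different scores for an identical message (path dependency), and the five-hop score falls well below any single hop's trust (diminishing trust), so neither score says anything about network-wide endorsement.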
-
In 24-hour live proxy deployments, covertDTLS mimicry had an 18.2% DTLS handshake failure rate (vs 12.5% baseline, 27.0% randomization, 25.8% Chrome webextension). Randomization generates ≈994 billion unique fingerprint permutations (cipher shuffling: 109,600; extension shuffling: 994,218,624,000), making blocklist-based fingerprinting infeasible, but at the cost of higher connection failures due to cipher mismatches. Mimicry of DTLS 1.2 was stable and effective; DTLS 1.3 mimicry is not yet achievable with the current Pion library.
-
The DTLS ClientHello extensions field is the most prominent feature for fingerprinting Snowflake's Pion WebRTC stack. A passive DPI tool (dfind) validated against the MacMillan et al. dataset of 6,500 DTLS handshakes reliably identifies Pion-based implementations via unique extension byte patterns. Chrome randomized its extension list order starting with version 129.0.6668.58 (September 2024), yielding 6! = 720 unique permutations and hardening it against deterministic matching. Firefox adopted DTLS 1.3 by default from version 127 (May 2024), which changes the extension structure entirely and renders DTLS 1.2 mimicry obsolete for Firefox traffic.
-
Firefox adopted DTLS 1.3 by default for WebRTC in May 2024 (version 127); Chrome has implemented DTLS 1.3 in BoringSSL but not yet enabled it by default. DTLS 1.3's Encrypted Client Hello (ECH) extension would encrypt extension lists and make passive field-based fingerprinting of those extensions obsolete — but censors may choose to block DTLS 1.3 ECH unless browsers adopt it widely enough that blocking causes unacceptable collateral damage. The Pion library (used by Snowflake standalone proxies) has no concrete roadmap for DTLS 1.3 support, creating a growing gap.
-
Beyond business-filing cross-references, the paper introduces a method of linking VPN provider families by showing they share VPN server cryptographic credentials (Shadowsocks passwords, server TLS fingerprints) across distinct app identities. This extends prior ownership-attribution methods that relied solely on corporate registry data and code similarity, adding shared live infrastructure as a linkage signal that is harder for operators to obscure.
-
Three families of VPN apps with combined Google Play download counts exceeding 700 million share not only common ownership but also hardcoded cryptographic credentials, including Shadowsocks passwords embedded in their APKs. An attacker who extracts these hardcoded passwords can passively decrypt all traffic of users of these apps. Business filing and APK analysis linked the families to the same operators; one previously identified family (Innovative Connecting / Autumn Breeze / Lemon Clove) had already been linked to the People's Liberation Army.
-
Of 640,694 TLS 1.3 servers in the Tranco Top 1M (Feb 2025), 51.28% parse ECH extensions but only 43% actually complete an ECH handshake, and virtually all of those are Cloudflare servers (278,040). Only 6 non-Cloudflare servers completed an ECH handshake. Cloudflare's own servers have a 44% non-advertisement rate: servers that can complete an ECH handshake but do not publish their ECH configuration in DNS, typically because the operator manages their own DNS outside Cloudflare. The total number of advertised ECH configurations dropped from ~180,000 in November 2024 to ~150,000 by April 2025.
-
Censorship classifiers and traffic analysis attacks consistently exploit the initial seconds of a proxy connection, where packet-size, inter-arrival-time, and burst features are maximally discriminative. Cited work demonstrates that website fingerprinting classifiers trained solely on the first few seconds of Tor traffic achieve high accuracy, and real-world GFW detection of fully-encrypted protocols also targets early-connection bytes.
-
The framework confines active traffic shaping to the first N seconds of a connection (N is a user-defined parameter, e.g., N=10), after which normal unmodified traffic resumes. The authors hypothesize that this design keeps per-session throughput and latency overhead negligible, since the shaping window is a small fraction of total connection time; N can be extended to the full session if the censor is believed capable of classifying beyond early traffic.
-
The framework's GAN-based schedule generator trains on short session windows (e.g., the first 10 seconds) of real browsing traffic from the Tranco Top 1000 sites, learning joint distributions of packet sizes, inter-arrival times, and burst patterns to produce realistic synthetic schedules. This repurposes GAN architectures previously used for traffic analysis (e.g., GANDaLF) as a defense-side cover-traffic generator.
-
The proposed framework operates as a transparent shim between application and network layers, enforcing a configurable schedule over packet size, timing, and burst patterns. The shaping logic is transport-agnostic — applicable across TCP, UDP, QUIC, and TLS — and activates only after the underlying protocol handshake completes, making it reusable across heterogeneous circumvention stacks.
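The windowed, handshake-gated shaping described above can be sketched as follows. This is a simulation under stated assumptions, not the framework's implementation: a fixed hand-crafted schedule stands in for a GAN-generated one, packets are only padded up to the slot size (never split), send times are computed rather than slept, and `Slot` and `shape` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    size: int    # target on-wire packet size in bytes
    gap: float   # minimum inter-departure gap in seconds

def shape(packets, schedule, start=0.0, window_s=10.0):
    """Apply a size/timing schedule only inside [start, start + window_s).

    packets: list of (arrival_time_s, payload_len) in arrival order.
    start:   time the underlying handshake completed; shaping begins here.
    Returns a list of (send_time_s, wire_len).
    """
    out = []
    last_send = None
    for i, (t, length) in enumerate(packets):
        earliest = t if last_send is None else max(t, last_send)  # never reorder
        if start <= t < start + window_s:
            slot = schedule[i % len(schedule)]
            if last_send is not None:
                earliest = max(earliest, last_send + slot.gap)
            wire_len = max(length, slot.size)  # pad up; longer packets pass as-is
        else:
            wire_len = length                  # outside the window: unmodified
        out.append((earliest, wire_len))
        last_send = earliest
    return out

packets = [(0.0, 100), (0.01, 300), (11.0, 50)]
shaped = shape(packets, [Slot(1200, 0.05)], start=0.0, window_s=10.0)
```

Packets inside the 10-second window are padded to the slot size and spaced by the slot gap, while the packet at t = 11 s leaves unchanged, which is why overhead is confined to the start of the session. Setting `window_s` to the session length reproduces the "shape everything" option.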
-
The framework is designed for adoption into existing censorship-resistant systems in the same manner as uTLS — as a drop-in Go library requiring minimal code changes. Primary integration targets are Tor pluggable transports and WireGuard-based VPNs that currently lack built-in traffic obfuscation. Predefined hand-crafted schedules are provided alongside GAN-generated ones to enable developer stress-testing without model inference.
-
Security arguments for existing circumvention systems are based on ad-hoc adversary models that are often incomplete or unrepresentative of real-world adversaries, leading to allegedly secure designs that fail against relatively straightforward attacks. Protocols that substitute or parasitize a cover application's encrypted traffic channel fail against application-aware adversaries who observe or induce violations of application-specific behavioral invariants — a weakness that pre-trained classifiers on custom traces fail to surface.
-
A machine-checked EasyCrypt proof demonstrates that a conjunctive SNI + traffic-profile adversary achieves a true positive rate of 1.0 against meek, with a false positive rate bounded by Pr[Game0(MeekEnc).main()=true] ≤ (1/10000) × (1/1000) = 10⁻⁷, under the assumption that meek traffic follows a normal distribution centered at 512 bytes and background traffic a Poisson-like distribution centered at 1024 bytes.
-
The blocking-resistance of CenPush derives from the collateral damage a censor would incur by blocking APNs or FCM: doing so would break push notifications for every app on iOS or Android respectively. This is the same collateral-damage deterrent mechanism that makes CDN-based domain fronting and TLS-over-CDN transports resilient, applied to the control plane rather than the data plane.
-
CenPush uses mobile platform push-notification services (APNs, FCM) as a blocking-resistant control channel for distributing fresh proxy IPs and client configuration to users in censored regions. Push notification infrastructure is already widely deployed, has high collateral-damage cost to block, and is a server-push channel — meaning the client never has to initiate a query to an out-of-band endpoint that a censor could block.
-
CenPush is implemented and evaluated specifically for Tor bridge distribution, replacing the existing polled bridge-line fetching with push delivery. The design is presented as a general mechanism applicable to any circumvention tool that needs to push fresh proxy addresses to clients — not just Tor bridges — whenever censors block the tool's normal update channel.
-
152 of 5,478 crawled domains (approximately 2.8%) deployed active bot-detection measures—captcha delivery or perimeter protection—that blocked automated OpenWPM crawling entirely. The authors note this disproportionately excludes untrustworthy sites, biasing the training dataset toward well-resourced trustworthy outlets and limiting recall on the untrustworthy class.
-
The five most important predictive features are: (1) average children per non-leaf tree node, (2) 7-day rolling average of maximum tree breadth, (3) 7-day rolling average of average breadth, (4) average children per parent, and (5) 7-day rolling average of third-party requests. Temporal stability features (rolling means and daily deltas) rank ahead of most static snapshot features, indicating that behavioral consistency over time is more discriminative than point-in-time structure.
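A sketch of how three of these features could be computed from a page's third-party request tree, plus a trailing 7-day rolling average. The tree representation (a dict from each node to the nodes it loaded) and the definition of breadth as the node count per depth level are assumptions for illustration; the paper's exact feature definitions may differ.

```python
from statistics import mean

def tree_features(children: dict, root: str) -> dict:
    """Structural features of a third-party request tree.

    children maps each node to the list of nodes it loaded (hypothetical
    representation); leaves may be absent or map to an empty list.
    """
    non_leaf = [n for n, kids in children.items() if kids]
    avg_children = (sum(len(children[n]) for n in non_leaf) / len(non_leaf)) if non_leaf else 0.0
    breadths, level = [], [root]
    while level:  # breadth = number of nodes at each depth
        breadths.append(len(level))
        level = [kid for node in level for kid in children.get(node, [])]
    return {"avg_children_non_leaf": avg_children,
            "max_breadth": max(breadths),
            "avg_breadth": mean(breadths)}

def rolling_mean(daily_values: list, window: int = 7) -> list:
    """Trailing rolling average; the first entries use shorter windows."""
    out = []
    for i in range(len(daily_values)):
        chunk = daily_values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

tree = {"news.example": ["cdn.example", "ads.example", "analytics.example"],
        "ads.example": ["tracker1.example", "tracker2.example"]}
features = tree_features(tree, "news.example")
daily_max_breadth = [1, 2, 3, 4, 5, 6, 7, 8]
smoothed = rolling_mean(daily_max_breadth)
```

Computing the snapshot features per crawl day and then smoothing them is what turns a point-in-time structure into the temporal-stability signal that ranks highest.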
-
Of 8,004 unique third-party domains identified across 3,410 crawled news sites, 997 appear exclusively on untrustworthy websites and 2,992 appear exclusively on trustworthy ones. Domains disproportionately associated with untrustworthy sites include Yandex, Zemanta, and PayPal; domains exclusive to trustworthy sites are predominantly small-to-medium advertising and analytics actors rather than major platform giants.
-
Trustworthy news sites show dramatically more complex third-party structures than untrustworthy ones: mean MaxBreadth 39.22 vs 19.63, mean ThirdPartyRequests 137.45 vs 74.12, and mean unique third-party domains 44.15 vs 20.31. This finding contradicts prior work (Han et al. 2022); the authors attribute it to untrustworthy sites being under-resourced and optimized for content spread rather than user experience.
-
Scanning 0.91B unique SANs extracted from 3.7B certificates across 17 CT logs revealed 3,330 unique .onion addresses configured by 26,937 domains. After six months, only 2,101 onions (63%) remained reachable, of which 1,505 (72%) had matching clearnet index pages, constituting the effectively enumerable target set for a targeted OLF adversary.
-
Local onion association—periodically downloading the full set of onion associations from a CT-log-based API and performing each lookup locally—produces a traffic pattern from the guard's perspective that is indistinguishable from generic onion service access, eliminating both the OLF fingerprint and the DNS-based Website Oracle attack vector. This approach requires no per-connection clearnet exit circuit and imposes negligible overhead given the current ~1,500 stable O-L site count.
-
OLF reduces an adversary's target anonymity set from roughly 10,000 active onionsites to the ~1,500 stably available O-L sites—nearly an order of magnitude. Because O-L requires an exit circuit with a DNS lookup, a DNS-based Website Oracle further collapses the false-positive rate, making OLF effectively a closed-world attack on the enumerated O-L site list.
-
Circuit fingerprinting from a guard-relay position achieves ≥99.9% accuracy with FPR ≤0.1% for all four Tor circuit types (general, HSDir, introductory, rendezvous) using the Deep Fingerprinting classifier on the first 512 cells, despite Tor's deployed partial defenses. Onion-Location fingerprinting (OLF) combining these circuit classifiers then achieves 98.81–99.87% accuracy (FPR 0.16–1.23%) distinguishing O-L sessions from ordinary clearnet or onion-only visits.
-
Automatic Onion-Location redirect was disabled in Tor Browser 13.0.12 as a direct result of this research, because automatic redirect forces the distinguishable clearnet-then-onion circuit pattern on every visit without user awareness. Manual O-L remains in Tor Browser but is still fingerprintable with the same near-perfect accuracy since the exit→onion circuit sequence is identical whether the redirect is automatic or manually triggered.
-
MinecruftPT encodes circumvention traffic steganographically inside the Minecraft Java Edition network protocol, making a censored connection appear to a network observer as an ordinary online Minecraft game session. The cover channel is a high-volume, varied-packet-size TCP protocol with a large and active user population, making statistical fingerprinting harder than for lower-volume cover protocols.
-
MinecruftPT achieves mimicry by implementing enough of the Minecraft protocol to pass as a real client-server game session, not just in header structure but in behavioral sequence. The paper evaluates it under DPI and traffic-shape analysis, finding that faithful protocol mimicry at the behavioral level (packet sequence, message types, timing) is necessary to defeat classifiers that go beyond simple byte-pattern matching.
-
MinecruftPT uses the TCP-based Minecraft protocol rather than a WebRTC/UDP approach. The paper notes this gives it an availability advantage in environments where WebRTC is filtered or where UDP is blocked — a common configuration in corporate or institutional networks and some national censorship regimes. This positions it as complementary to Snowflake in the circumvention transport portfolio.
-
Longitudinal AS topology studies cited by the authors show that 95% of core-to-core AS links remain unchanged year-over-year and that large transit providers adjust their peering only gradually, with almost all churn occurring at the customer edge. This implies that high-usage transit ASes identified for RN proxy deployment are likely to retain their topological position for months to years, lending temporal robustness to placement recommendations derived from a single measurement snapshot.
-
An AS+IXP multigraph fusing CAIDA traceroutes (13.6M paths), 256M BGP updates from RouteViews/RIPE RIS, and IXP membership data yields 87,157 AS vertices, 1,588 IXP vertices, and 510,810 edges — an order of magnitude richer than BGP-only baselines. Hidden private peering links and IXP fabric connections invisible to BGP alone materially affect coverage estimates for refraction networking proxy placement.
-
The proposed system adopts the turbo tunnel architecture to provide a reliability layer over lossy TURN relay paths and to allow traffic reassembly at a single bridge across multiple TURN proxies. Three encapsulation modes are specified: direct application data inside TURN messages, DTLS datagrams via WebRTC data channels, and video frames inside WebRTC media streams — the latter two mimicking the encapsulation strategies of existing WebRTC circumvention systems such as Snowflake and TorKameleon.
-
TURN servers used by major applications such as Facebook Messenger for media relay are hypothesized to be less likely blocked in censored regions due to collateral damage to legitimate WebRTC traffic. Providers like Cloudflare, Metered Video, and ExpressTURN supply geographically distributed TURN infrastructure that can be used without any special configuration by a censorship evasion system.
-
Traffic splitting across N of M available TURN proxies (1 ≤ N ≤ M) is hypothesized to resist active probing because each TURN server responds to probing requests exactly as a regular TURN server would, providing no distinguishing signal. Additionally, proxy ephemerality combined with splitting allows on-the-fly migration to new proxies when existing ones are blocked, maintaining connectivity even under partial blocking.
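The split-and-reassemble idea can be sketched with sequence-numbered chunks. Here a plain reordering buffer stands in for the turbo tunnel session layer, which would additionally provide acknowledgements and retransmission; `split_across_proxies`, `Reassembler`, and the chunk size are hypothetical.

```python
import random

def split_across_proxies(data: bytes, n_proxies: int, chunk_size: int = 4):
    """Cut a byte stream into sequence-numbered chunks, assigned round-robin to N proxies."""
    lanes = [[] for _ in range(n_proxies)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        lanes[seq % n_proxies].append((seq, data[offset:offset + chunk_size]))
    return lanes

class Reassembler:
    """Bridge-side reordering buffer that releases bytes strictly in sequence order."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}
        self.stream = bytearray()

    def receive(self, seq: int, chunk: bytes) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate, e.g. a resend over a new proxy after migration
        self.pending[seq] = chunk
        while self.next_seq in self.pending:
            self.stream += self.pending.pop(self.next_seq)
            self.next_seq += 1

message = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
lanes = split_across_proxies(message, n_proxies=3)
arrivals = [pkt for lane in lanes for pkt in lane]
random.Random(7).shuffle(arrivals)  # proxies deliver with different delays
bridge = Reassembler()
for seq, chunk in arrivals:
    bridge.receive(seq, chunk)
```

Because ordering lives in the sequence numbers rather than in any single path, the bridge recovers the stream regardless of which proxy carried which chunk, and a blocked proxy's chunks can be resent through a replacement without confusing the receiver.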
-
The paper enumerates five adversarial attack surfaces against a video-steganography UP channel: (1) wholesale blocking of the hosting platform, (2) mass-scanning and blocking encoded videos (noted as generally cost-prohibitive per the steganography literature), (3) enumerating videos via pseudorandom tags (feasible but hampered by tag-list overlap with unrelated content and time-window dynamics), (4) banning accounts posting encoded videos, and (5) tracking anticensorship users viewing encoded content. The pseudorandom tag window design specifically prevents preemptive enumeration because the top-n results for a tag at epoch t differ from those at t±1.
-
UP channels based on free third-party content hosting (video, audio, images, ML models) provide no-cost scalability: steganographic videos once uploaded are free to distribute to arbitrarily many users, and the channel sustains adversarial financial denial-of-service attacks without incurring operator costs. This contrasts with meek, SQS, AMPCache, and Skyhook, which face financial DoS risk because adversaries can drive up hosting costs by using those channels as intended.
-
The paper defines Unauthenticated Push (UP) channels as a distinct archetype from signaling/rendezvous channels, characterized by three properties: strictly unidirectional delivery, no client authentication or account association required, and higher bandwidth (kilobytes to megabytes) to support software updates rather than just minimal proxy-address exchanges. This design deliberately shifts operational-security burden onto senders to approach receiver anonymity.
-
A concrete UP channel implementation uses keyed steganographic encoding embedded in videos posted to a public hosting service (e.g., flickr.com), addressed via a time-epoch pseudorandom tag generator drawn from publicly known trending-topic lists. Clients query the top-n videos matching the current epoch tag and attempt decryption; real-world video size variability supports data transmissions from a few kilobytes (configuration updates) to megabytes (software updates).
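A minimal sketch of a time-epoch pseudorandom tag generator. The source only says tags are pseudorandom per epoch and drawn from public trending-topic lists; keying the generator with HMAC-SHA256 and a shared secret, the one-hour epoch, and the sample topic list are all assumptions for illustration.

```python
import hashlib
import hmac

def epoch_tag(key: bytes, unix_time: float, trending: list, epoch_len_s: int = 3600) -> str:
    """Map the current time epoch to one tag from a public trending-topic list.

    Hypothetical construction: HMAC-SHA256(key, epoch index) selects the tag,
    so sender and receivers holding the key agree on it without communicating.
    """
    epoch = int(unix_time // epoch_len_s)
    mac = hmac.new(key, epoch.to_bytes(8, "big"), hashlib.sha256).digest()
    return trending[int.from_bytes(mac[:8], "big") % len(trending)]

trending = ["football", "election", "weather", "music", "travel"]
shared_key = b"hypothetical-shared-secret"
tag_now = epoch_tag(shared_key, 0, trending)
tag_later_same_epoch = epoch_tag(shared_key, 3599, trending)
```

A client would then search the hosting site for the top-n videos under the current tag and try to decode each; because the result set for a tag changes from one epoch to the next, a censor cannot list encoded videos ahead of time.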
-
Three open-source DPI tools (Zeek, libprotoident, nDPI) fail to identify 93–100% of UPGen flows across all tools. libprotoident misidentified 7% of UPGen flows as RTMP; nDPI and Zeek produced zero false labels. On a real-world MAWI/WIDE backbone capture, Zeek failed to recognize 90% of flows and nDPI failed on 67%, confirming that unidentified-protocol traffic is common in the wild; allowlisting without significant collateral damage (≥4%) is infeasible.
-
State-of-the-art ML classifiers (Deep Fingerprinting, Decision Tree, Random Forest, nPrintML) trained on known UPGen protocols and benign traffic always incur high out-of-distribution false-positive rates when attempting to block unknown UPGen protocols — in the vast majority of experiments the OOD FPR is 100%. The one exception (SSH OOD, Deep Fingerprinting) achieved a UPGen TPR of only 20%. By contrast, identical classifiers successfully generalize to block unknown Obfs4 flows with near-zero collateral damage in 3 of 4 cases.
-
In laboratory benchmarks, the best UPGen-generated protocol achieves 252 ms TTFB latency (vs 212 ms Obfs4, 313 ms TLS) and 4.25 Gbit/s throughput per core (vs 4.65 Gbit/s Obfs4, 9.42 Gbit/s TLS). The worst-case UPGen protocol (4.5 RTT handshake) reaches 677 ms TTFB but 3.70 Gbit/s throughput. In large-scale distributed Tor simulations, the choice of UPGen protocol had no statistically significant effect on end-to-end Tor flow performance.
-
UPGen's generator samples 18 independent parameters to produce 4.2×10^22 distinct structured encrypted protocols (entropy 38.4 bits). Each proxy is assigned a unique generated protocol, so identifying one protocol exposes only a single proxy. The generator was designed by studying 27 real-world encrypted protocols and sampling from observed structural patterns (greeting strings, handshake patterns, field orderings, key encodings).
-
Combinations of Bayesian methods, data augmentation with mixup, and NOTA defensive padding cut the open-world false positive rate by up to 92% at 0.5 recall on HTTPS-only traffic and 75% on Tor traffic relative to the deterministic MSP baseline. Even with these improvements, sustaining a world size in the hundreds of millions (approaching YouTube-scale) requires accepting recall of 0.5–0.6 and precision of only 0.1–0.2; at precision 0.5 and recall 0.5, the maximum workable world size is only 37.5M for HTTPS-only (Table 3), far below YouTube's ~10 billion video catalog.
-
Extrapolating empirical FPRs using Wang's base-rate-adjusted precision formula (𝜋_r), the best HTTPS-only approach can sustain precision 0.5 at recall 0.5 only up to a world size of 37.5M videos; precision 0.1 at recall 0.5 extends to 337.5M — still short of YouTube's ~10 billion catalog (Table 3). For Tor, the corresponding limits are 4.8M and 42.9M, making dragnet surveillance of unselected users on large platforms effectively infeasible at any acceptable precision with current techniques.
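The extrapolation can be reproduced from Wang's r-precision, π_r = TPR / (TPR + r · FPR), where r is the ratio of unmonitored to monitored videos. Solving for r at a target precision and multiplying by the size of the monitored set gives the largest supported world. The sketch assumes a monitored set of 60 videos (matching the 60-way closed world) and the best-case FPRs at 0.5 recall (0.0000008 HTTPS-only, 0.0000063 Tor); under those assumptions it reproduces all four limits quoted above.

```python
def max_world_size(recall: float, fpr: float, precision: float, monitored: int = 60) -> float:
    """Largest world size at which a classifier still reaches the target precision.

    r-precision: pi_r = TPR / (TPR + r * FPR). Solving for r gives
    r = TPR * (1 - pi_r) / (pi_r * FPR). World size is approximated as
    monitored * r (adding the monitored items themselves changes nothing at this scale).
    """
    r = recall * (1 - precision) / (precision * fpr)
    return monitored * r

for label, fpr in [("HTTPS-only", 0.0000008), ("Tor", 0.0000063)]:
    for precision in (0.5, 0.1):
        size_m = max_world_size(0.5, fpr, precision) / 1e6
        print(f"{label:10s} precision={precision}: {size_m:.1f}M videos")
```

The ratio between the two precision targets is always (0.9 / 0.1) / (0.5 / 0.5) = 9, which is why each 0.1-precision limit is exactly nine times the 0.5-precision one.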
-
When a fingerprinting model is trained on traffic collected from one geographic vantage point and tested on traffic from a different continent, the HTTPS-only open-world FPR at 0.5 recall increased by factors ranging from 2.8x (EU-West-2) to 50.3x (Africa) relative to the same-vantage baseline — despite 60-way closed-world accuracy remaining above 0.99 across all vantage-point pairs (Table 5). For Tor traffic the effect was weaker but still reached 25.2x (Asia-Pacific Southeast-1), showing path diversity also disrupts Tor-based fingerprinting.
-
The paper establishes, for the first time in a large open-world scenario (64,000 unmonitored test videos), that HTTPS-only video stream fingerprinting is significantly easier than Tor-based fingerprinting because DASH adaptive bitrate selection introduces a second-order network-condition effect: clients request entirely different video segments at different quality levels depending on path conditions, causing traffic traces from different geographic vantage points to diverge at the application layer even when network conditions are nominally similar. This makes NOTA and synthetic training sample techniques less effective on Tor data due to inherent trace noisiness.
-
Tor provides substantial and measurable protection against video stream fingerprinting: the best-case FPR at 0.5 recall is 0.0000063 for Tor versus 0.0000008 for HTTPS-only connections, roughly an 8x increase. Translating to world sizes, at 0.5 recall and 0.1 precision the maximum viable platform catalog is 42.9M videos over Tor versus 337.5M over HTTPS-only (Tables 3–4), confirming Tor degrades adversary capability even after an assumed prior website-fingerprinting step that identifies video platform visits.
-
Custom CCAs that deviate from standard TCP/QUIC congestion response fundamentally contradict the core circumvention principle of traffic indistinguishability: by failing to back off under congestion signals, they produce traffic patterns that diverge from the vast majority of Internet flows that censors value, eliminating the collateral-damage protection that makes circumvention tools hard to block wholesale.
-
A two-stage threshold classifier evaluated on 10,080 synthetic flows across 1,260 network condition combinations (20 RTTs × 21 loss rates × 3 bandwidths) achieved 100% accuracy in Stage 1 separating loss-based from non-loss-based CCAs, and produced only 16 false positives from BBR flows in Stage 2, correctly flagging all 1,257 Hysteria and 1,257 Brutal flows as custom CCAs.
-
Shaperd's adaptive blocking-detection mode can integrate with external blockage-detection tools (e.g., Troll Patrol) to detect when a constraint set is no longer effective and automatically switch to an alternate constraint set, changing packet patterns to restore connectivity without user intervention.
-
Packet timings are a distinct detection vector for circumvention tools beyond payload content and packet lengths, as demonstrated by Wails et al. 2024. Prior FEP-specific shaping work (Fenske et al.) addressed packet lengths but explicitly left timing shaping for future work, leaving a known gap in detection resistance.
-
Shaperd's proof-of-concept prototype (~1000 lines of Go) incurs only 4.1% throughput overhead for a single entropy constraint; the first additional constraint added 5.1% overhead and the second added 5.5%, with total overhead growing with the number and strictness of constraints.
-
Shaperd introduces a constraint-agnostic traffic shaping system that operates on both packet content and timing in real time, designed for drop-in integration with any existing FEP. The system uses a four-component constraint definition (function, value, comparison operator, target packets) capable of expressing any rule based on a computable deterministic function over packet contents.
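The four-component constraint definition maps naturally onto a small record type. This sketch only checks whether a packet satisfies a rule; Shaperd also rewrites content and timing to satisfy its constraints, which is not shown. `Constraint`, `shannon_entropy`, and the example rule are hypothetical.

```python
import math
import operator
from collections import Counter
from dataclasses import dataclass
from typing import Callable

def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())

@dataclass
class Constraint:
    function: Callable[[bytes], float]          # deterministic function of packet contents
    value: float                                # value to compare against
    comparison: Callable[[float, float], bool]  # e.g. operator.lt
    targets: Callable[[int], bool]              # which packet indices the rule covers

    def satisfied(self, index: int, payload: bytes) -> bool:
        if not self.targets(index):
            return True  # rule does not apply to this packet
        return self.comparison(self.function(payload), self.value)

# Hypothetical rule: the first three packets must stay below 7 bits/byte of entropy.
low_entropy_start = Constraint(shannon_entropy, 7.0, operator.lt, lambda i: i < 3)
```

Because the function slot accepts any computable deterministic function of packet bytes, other rules (length bounds, printable-ASCII ratios, fixed prefixes) fit the same four fields without changing the checker.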
-
The paper concludes with design guidelines for future FIA-based privacy-enhancing technologies, identifying that path-aware routing in SCION and NDN's in-network caching both create new surveillance exposure: SCION path headers reveal routing metadata to on-path censors; NDN caching at routers means content is replicated at points under censor control. The authors recommend that PETs built on FIAs treat these architectural features as threat vectors, not privacy benefits.
-
Wrana et al. systematically assess how well existing surveillance and censorship mechanisms can target users of Future Internet Architectures (FIAs) — including NDN, SCION, XIA, and MobilityFirst — finding that DPI and flow-correlation techniques from the current internet map onto FIA traffic with moderate adaptation. The paper identifies that FIA naming/addressing schemes introduce new censorship attack surfaces (e.g., content-name-based filtering in NDN) not present in IP-based architectures.
-
Per-flow RTTdiff detection rates are only ~20% because most proxy flows connect to CDN-cached content (Cloudflare, Google, Fastly) that sits within 5 ms of the proxy, suppressing the discrepancy. However, aggregating across flows per website visit yields detection rates exceeding 70% (the abstract reports that approximately 80% of top-5K domains generate at least one detectable flow), with half of those detections made within the first 60 packets. This means an adversary can reliably expose client and proxy IPs after just a few website visits.
-
The paper evaluates two short-term mitigations—TCP delayed ACK on the proxy server and connection multiplexing—but finds both are limited: delayed ACK produces atypical ACK timing that may itself be fingerprintable, and multiplexing only adds entropy without eliminating the RTTdiff signal. Critically, obfs4 and ScrambleSuit's delay-based timing obfuscation are described as 'fundamentally limited' because they manipulate inter-arrival times without eliminating the underlying transport/application-layer session misalignment. The paper concludes no existing obfuscation scheme provides a principled defense against timing-based proxy fingerprinting.
-
IMAP/SSL traffic on port 993 constitutes less than 1% of total ISP traffic but accounts for nearly one third of all false positives in the RTTdiff exploit, because IMAP's non-RESTful multi-connection pattern violates the request-response correlation assumption. The overall per-flow FPR is bounded at 0.6–0.7% (on par with GFW's estimated FPR against fully-encrypted proxies), but implementing a pre-filter to whitelist IMAP traffic reduces the FPR by approximately one third, making the fingerprint substantially more precise.
-
Proxy users who resolve DNS locally (at the client) are approximately twice as susceptible to RTTdiff fingerprinting compared to users who resolve DNS at the proxy, across all tested client/proxy location combinations. Local DNS returns IPs optimally reachable from the client's region, which may be geographically distant from the proxy, increasing the proxy-to-server path distance and thus the RTTdiff discrepancy.
-
Cross-layer RTT discrepancy (RTTdiff) is a protocol-agnostic fingerprint that exploits an inherent architectural property of all proxy setups: transport-layer sessions terminate at the proxy while application-layer sessions remain end-to-end. Evaluation across 10 proxy protocols—including VMess, Shadowsocks, VLESS, Trojan, XTLS-Vision, and obfs4-wrapped SOCKS—shows near-identical detection rates for all except obfs4, confirming the fingerprint is not tied to any specific obfuscation scheme. At FPR=0.01, per-website detection rates exceed 70% across all tested client and proxy location combinations.
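A minimal sketch of the per-flow signal and per-visit aggregation. The use of medians and the 20 ms threshold are assumptions for illustration, not the paper's estimator or operating point. Transport-layer RTTs (data to ACK) are measured to the proxy where TCP terminates, while application-layer RTTs (request to response) span the full path to the server; for a direct connection the two roughly agree.

```python
from statistics import median

def rttdiff_ms(transport_rtts_ms, app_rtts_ms):
    """Application-layer RTT minus transport-layer RTT for one flow."""
    return median(app_rtts_ms) - median(transport_rtts_ms)

def flow_flagged(transport_rtts_ms, app_rtts_ms, threshold_ms=20.0):
    """Hypothetical fixed threshold on the cross-layer discrepancy."""
    return rttdiff_ms(transport_rtts_ms, app_rtts_ms) > threshold_ms

def visit_flagged(flows, threshold_ms=20.0):
    """A website visit is flagged if any of its flows shows a large discrepancy."""
    return any(flow_flagged(t, a, threshold_ms) for t, a in flows)

cdn_flow = ([40, 42, 41], [44, 45, 43])        # content cached near the proxy
origin_flow = ([40, 41, 42], [120, 130, 125])  # distant origin server
```

The CDN flow alone escapes detection, but a single origin fetch in the same visit is enough to flag it, which mirrors the gap between the ~20% per-flow and >70% per-visit detection rates.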
-
Because WATER uses a sing-box-compatible interface, a single WASM transport module written once is immediately usable by any application that embeds the WATER host runtime — including lantern-box (Lantern's proxy SDK), any other sing-box-derived client (33k+ GitHub stars as of 2024), and standalone WATER host binaries. This gives each new transport a substantially larger deployment surface than a single-app pluggable transport achieves.
-
WATER (WebAssembly Transport Executables at Runtime) defines a pluggable-transport architecture in which the transport logic is compiled to a WASM module that is loaded and executed at runtime by a thin Go host process. This separates the stable host ABI (dial, accept, read, write) from the rapidly-evolving transport logic, allowing new or updated transports to be delivered as small WASM binaries without recompiling or redeploying the host application.
-
IoT devices pose the primary false-positive risk: many IoT devices (printers, smart bulbs, cameras, vacuum cleaners) maintain very few sessions with a small number of fixed cloud IPs — behaviorally similar to a VPN client. In the CIC IoT 2022 dataset, only 2 devices were misclassified (a Google Nest Cam connecting to nexusapi-us1.dropcam.com and a device using Alibaba cloud) out of the full dataset with WINDOW=300 s and T=500 packets.
-
The detection method requires no DPI and was fully implemented as a Linux kernel module on a NETGEAR R6120 router with only a 580 MHz processor, 16 MB ROM, and 64 MB RAM, adding negligible overhead. Unlike ML-based or DPI-based VPN classifiers, the statistical model operates pre-NAT on per-device private IP flows, making it immune to obfuscation techniques that alter packet payloads or disguise protocol handshakes.
-
A passive, router-level VPN fingerprinting technique exploits the design convention that all user traffic is tunneled to a single VPN server IP. By counting packets per device-to-IP session at the home router and flagging sessions where PACKETS_COUNT exceeds threshold T=500 within WINDOW=300 seconds, the method achieved a 100% detection rate for all VPN implementations that route all traffic through one server, with zero false positives across uncontrolled 4-day experiments.
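A sketch of the counting rule with the reported parameters (WINDOW = 300 s, T = 500). Modeling WINDOW as fixed tumbling windows per (device, remote IP) session is an assumption; the paper's exact session and reset semantics may differ, and eviction of old counters is omitted.

```python
from collections import defaultdict

WINDOW_S = 300   # seconds, as reported
THRESHOLD = 500  # packets per session per window, as reported

def detect_vpn_sessions(packets, window_s=WINDOW_S, threshold=THRESHOLD):
    """Flag (device, remote) pairs whose packet count in one window exceeds the threshold.

    packets: iterable of (timestamp_s, device_private_ip, remote_ip), observed pre-NAT.
    """
    counts = defaultdict(int)
    flagged = set()
    for ts, device, remote in packets:
        key = (device, remote, int(ts // window_s))  # tumbling window per session
        counts[key] += 1
        if counts[key] > threshold:
            flagged.add((device, remote))
    return flagged

# 501 packets to one IP within 50 s, versus 400 packets to another.
busy = [(i * 0.1, "192.168.1.20", "203.0.113.5") for i in range(501)]
light = [(i * 0.1, "192.168.1.30", "198.51.100.7") for i in range(400)]
# 600 packets spread evenly over 600 s: 300 per window, never over T.
spread = [(float(i), "192.168.1.40", "192.0.2.9") for i in range(600)]
```

The `spread` case shows why the second countermeasure works: if traffic to any one server never accumulates T packets inside a single window, the session is never flagged.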
-
The authors propose two countermeasures: (1) widespread adoption of traffic splitting so not all user traffic is routed through a single VPN tunnel, neutralizing the single-destination session signature; and (2) VPN servers should rotate at random intervals so that no prolonged session to one IP accumulates enough packets to trigger the threshold T.
-
Of 9 popular VPN providers tested (ProtonVPN, Hide.me, Turbo VPN, Kaspersky VPN, Hotspot Shield, Secure VPN, Fast VPN Pro, VPN Super, VPN Gate), 7 were successfully detected. Kaspersky VPN evaded detection because it exchanged keepalive packets with a secondary server exactly every 300 seconds, matching the chosen WINDOW and causing the session counter to reset. Hotspot Shield evaded detection because of previously documented traffic leakage in which not all traffic is tunneled.
-
Snowflake's blocking resistance rests on a large, constantly changing pool of volunteer WebRTC proxies implemented as lightweight JavaScript browser extensions or web pages. Because the proxy population is in constant churn and new addresses appear faster than censors can enumerate and block them, IP-list blocking is structurally ineffective. The system is designed so that when an in-use proxy goes offline, the client seamlessly migrates to another with no disruption to upper network layers.
-
Snowflake proxies are simple enough to run as JavaScript inside a web page or browser extension, making them far cheaper to operate than a traditional VPN or proxy server. This low operational cost enables a large volunteer pool (orders of magnitude more participants than server-hosted bridge networks), which is the structural property that makes IP enumeration hard for censors.
-
XGBoost trained on a single month of OONI data achieves near-optimal performance; expanding the training window to 24 months produces deviations of only 0–5 percentage points (PP) for FNR, 0.07 PP for FPR, and 0.10 PP for accuracy, suggesting that larger windows introduce noise and overfitting rather than improving detection. Isolation Forest degrades more sharply, with accuracy dropping by ~5 PP as training data grows beyond 6 months.
-
For the Isolation Forest model, resolver ASN (SHAP importance 0.237) and probe ASN (0.220) are the two most predictive features for DNS tampering, reflecting that censorship is topologically concentrated at specific network vantage points. For XGBoost, headers_match dominates (0.317), followed by asn_control_match (0.177), indicating that supervised models rely more on cross-layer consistency signals. DNS tampering represents only 0.5–0.8% of all OONI measurements across 2022–2023 (Figure 2), creating severe class imbalance in any training set.
-
XGBoost achieves a False Positive Rate of 0.0005, True Positive Rate of 0.9403, and overall accuracy of 0.9991 on OONI global DNS measurement data (2.5% stratified sample), vastly outperforming unsupervised alternatives: Isolation Forest achieves FPR 0.1321 / ACC 0.8699, and One-Class SVM degrades to FPR 0.9711 / ACC 0.0598, making OCSVM effectively unusable for this task.
-
Because Oscur0 starts with 0-RTT data lacking a full handshake, the station-side connection establishment is vulnerable to replay attacks. Oscur0 mitigates this by including a random 10-byte nonce in the encrypted application data of the first packet; the station checks each arriving nonce against a bloom filter of recently-seen IDs and drops duplicate connections, preventing replay without requiring a full round-trip handshake.
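The replay check can be sketched as a Bloom filter keyed on the 10-byte nonce. This Python version is an illustration under assumptions: the filter size, hash count, salted BLAKE2b hashing, and the two-generation `rotate()` used to forget old nonces are choices made here, not details taken from Oscur0.

```python
import hashlib
import os


class BloomFilter:
    def __init__(self, num_bits=1 << 20, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from differently salted BLAKE2b digests of the item.
        for i in range(self.num_hashes):
            h = hashlib.blake2b(item, digest_size=8, salt=i.to_bytes(16, "little"))
            yield int.from_bytes(h.digest(), "little") % self.num_bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class ReplayGuard:
    """Accept a first packet only if its 10-byte nonce was not seen recently."""

    def __init__(self):
        self.current = BloomFilter()
        self.previous = BloomFilter()

    def rotate(self) -> None:
        # Hypothetical aging: call periodically so only recent nonces are remembered.
        self.previous, self.current = self.current, BloomFilter()

    def accept(self, nonce: bytes) -> bool:
        if len(nonce) != 10:
            raise ValueError("expected a 10-byte nonce")
        if nonce in self.current or nonce in self.previous:
            return False  # possible replay: drop the connection
        self.current.add(nonce)
        return True


guard = ReplayGuard()
nonce = os.urandom(10)
print(guard.accept(nonce), guard.accept(nonce))
```

A Bloom filter can return false positives but never false negatives, so in this sketch the failure mode is dropping an occasional fresh connection, never admitting a replay of a nonce still inside the remembered window.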
-
Oscur0 eliminates Conjure's separate registration phase by steganographically encoding ECDH public key, phantom IP, and transport parameters into the encrypted application data of the first UDP (DTLS 1.2 with Connection ID) packet sent to the phantom IP, using Elligator encoding to make the public key indistinguishable from random bytes. This removes several round trips — registration, TCP handshake, and application handshake — compared to standard Conjure, and means censors cannot block the scheme by blocking registration alone.
-
Registration-dependent Refraction Networking schemes such as Conjure create multiple single points of failure: censors can block registration channels independently of phantom connections. Domain fronting, a primary registration channel, has been progressively banned by major CDNs — Microsoft Azure in 2021 and Fastly in early 2024 — reducing its viability as a covert registration mechanism.
-
Prior circumvention transports that tunneled over VoIP or voice-conferencing software were identifiable to censors by their TCP retransmission fingerprint: real VoIP applications do not retransmit dropped packets in the same way, making the covert channel's reliability mechanisms a distinguishing artifact. DTLS and QUIC avoid this because they natively support both fault-tolerant and sequential delivery modes without external indicators of which mode is active.
-
WATMs are designed to be generic: any application that embeds the WATER host runtime can use the same WATM binary without modification. This means a single successfully deployed transport module reaches users of every WATER-enabled application simultaneously, collapsing the per-app porting effort that traditionally delays circumvention tool updates.
-
WATER (WebAssembly Transport Executables Runtime) separates transport logic from the host application by compiling it to a WASM module (WATM) that is distributed and loaded independently at runtime. Deploying a new or updated circumvention technique requires only distributing the new WATM binary and optional configuration — no change to the host application and no app-store update cycle is required.
-
Traditional circumvention tool development and deployment is slow because new strategies must be developed, integrated into each tool separately, and then distributed via platform app-stores. WATER's WASM module architecture specifically addresses this asymmetry: censors evolve blocking techniques quickly, while circumventors are bottlenecked by binary release cycles. The paper argues that dynamic WATM delivery breaks this bottleneck by decoupling transport updates from application releases.
-
A decade of ZMap-based studies has produced documented operational norms including blocklist hygiene (organizations can opt out of scans via ZBlocklist) and ethical rate-limiting practices. The same blocklist infrastructure that protects opt-out organizations also provides a model for reducing proxy infrastructure visibility.
-
Cloud-hosted services represent an open measurement problem for ZMap because IPs are shared, ephemeral, and behind CDN layers, making traditional IP-to-service attribution unreliable. The paper identifies reconciling scan-based observation with cloud infrastructure as a key challenge for the next decade.
-
A decade of Internet-wide scanning practice has established that cloud-hosted services present a fundamental measurement ambiguity: IP ownership is ephemeral and shared, making per-IP findings unreliable and complicating the attribution of services to specific operators or censors.
-
IPv6 measurement remains an open problem for ZMap because, unlike IPv4, the address space is too large for exhaustive single-packet enumeration. This asymmetry means IPv6-addressed infrastructure is structurally harder to enumerate by scanning, and therefore harder for a censor to discover and blocklist.
-
LZR, built on top of ZMap, can identify 99% of unexpected Internet services in five handshakes by acting as a shim between ZMap and ZGrab. This gives censors and researchers alike an efficient active-probing primitive to fingerprint proxy protocols at scale.
-
ZMap can scan the entire public IPv4 address space on a single port in under 45 minutes on a gigabit connection; with a 10 GigE connection and PF_RING, the same scan completes in 5 minutes. This makes Internet-wide enumeration of proxy infrastructure operationally trivial for any well-resourced actor.
-
ZMap can scan the entire public IPv4 address space on a single port in under 45 minutes on a 1 Gbps connection; with a 10 GigE connection and PF_RING, the full IPv4 address space scan completes in 5 minutes. This throughput enables near-real-time Internet-wide enumeration of any service listening on a given port.
-
After a decade of ZMap-based measurement, the authors identify IPv6 scanning as an unresolved open problem: the vastly larger IPv6 address space makes exhaustive scanning infeasible, fundamentally changing the threat model for service discovery compared to IPv4.
-
CenDTect (Tsai et al., NDSS 2024) uses decision trees and a novel clustering method on Censored Planet plus OONI data to identify blocking policies and provide interpretable insights at local and country levels. A separate approach (Duncan & Chen, 2023) applies sequence-to-sequence models and CNN image classification — treating network reachability data as grayscale images — to distinguish censored from uncensored content.
-
Brown et al. (2023) combined supervised ML models trained on expert-labeled data with unsupervised models establishing a baseline of 'normal' behavior to detect DNS-based censorship from Satellite and OONI datasets, achieving high true-positive rates for both known and new DNS censorship instances. The hybrid supervised/unsupervised approach is proposed as a template for the LLM-based system.
-
The proposed LLM-based censorship detection system plans to use ICLab as the primary dataset for its semantic richness across all network-stack levels, then cross-reference with OONI and Censored Planet to reduce false negatives. The paper explicitly notes ICLab lacks the scale and geographic coverage of OONI/Censored Planet but offers richer per-measurement context suited to LLM feature learning.
-
The daily volume of network reachability data collected by censorship monitoring platforms such as ICLab, OONI, and Censored Planet surpasses the roughly 16 GB corpus (BooksCorpus plus English Wikipedia) on which BERT was trained. This scale mismatch motivates applying LLMs, which thrive on large unlabeled corpora, to censorship measurement data rather than hand-labeling data for rule-based systems.
-
Rule-based censorship detection systems rely on predefined regular expressions designed by human experts and fail to adapt to evolving censor techniques, leading to false negatives and poor scalability as data volume grows. In contrast, learning-based models are described as thriving on large data volumes and offering contextual understanding that rule-based systems lack.
-
The encapsulated TCP three-way handshake (3WHS) is detected in 80.59% of VPN flows but only 0.33% of plain UDP flows, making it—on its own—a near-practical VPN detector with 0.33% FPR; its presence is required by the classifier regardless of the compliance-rate threshold t.
-
Random padding alone raises the classifier FPR only slightly (0.11% to 0.15%), and connection multiplexing alone raises it to 0.53%; however, combining both defenses raises FPR to 2.57%, making the detector impractical for a real-world censor and yielding TPR of 93.40%.
-
A protocol-agnostic classifier that identifies RFC-mandated TCP behaviors (three-way handshake, 500ms ACK, 2×RMSS acknowledgement) leaking through UDP-based VPN tunnels achieves a false positive rate of 0.11–0.29% on real campus traffic, an order of magnitude lower than ML-based VPN detection techniques (FPR 1.4–5.5%) and on par with the GFW's estimated heuristic FPR of 0.6%.
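The decision rule described above (the encapsulated 3WHS is required, and the rate of RFC-consistent ACK behavior is then compared against t) can be sketched as follows. Feature extraction is assumed away entirely: the `TunnelFlowFeatures` fields are hypothetical counts that a real system would have to infer from packet sizes and timings inside the UDP tunnel.

```python
from dataclasses import dataclass


@dataclass
class TunnelFlowFeatures:
    # Hypothetical per-flow observations extracted from a UDP tunnel.
    has_encapsulated_3whs: bool  # inner SYN / SYN-ACK / ACK pattern observed
    ack_checks_total: int        # inner-ACK opportunities examined
    ack_checks_compliant: int    # ...that matched the 500 ms / 2xRMSS ACK rules


def compliance_rate(f: TunnelFlowFeatures) -> float:
    """Fraction of examined ACK opportunities consistent with RFC-mandated TCP behavior."""
    if f.ack_checks_total == 0:
        return 0.0
    return f.ack_checks_compliant / f.ack_checks_total


def looks_like_vpn(f: TunnelFlowFeatures, t: float) -> bool:
    # The encapsulated 3WHS is required regardless of t; the compliance
    # rate must additionally reach the threshold t.
    return f.has_encapsulated_3whs and compliance_rate(f) >= t


print(looks_like_vpn(TunnelFlowFeatures(True, 10, 9), t=0.8))
```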
-
Web browsing VPN traffic achieves only 32.35–42.44% TPR—far below SSH (99.43–99.56%) and file transfer (83.95–99.73%)—because DNS queries interleaved with TCP streams disrupt detection of the encapsulated 3WHS, confirming that connection multiplexing is a naturally occurring and effective evasion for web-browsing workloads.
-
An attacker who generates 10 defended copies of each training trace (re-sampling noise each time) improves Tik-Tok accuracy against DeTorrent from 31.9% to 48.2%, demonstrating that dataset augmentation with multiple defended samples is a practical countermeasure against randomized padding defenses including DeTorrent and FRONT.
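A sketch of the augmentation strategy: each training trace is passed through a randomized defense several times, with fresh randomness each time, and every copy keeps the original label. The defense here is a deliberately toy stand-in (random dummy-packet insertion over a direction sequence); it is not DeTorrent or FRONT, and traces are simplified to +1/-1 packet directions.

```python
import random


def toy_random_padding(trace, rng, max_dummies=50):
    """Stand-in randomized defense: insert dummy packets at random positions.

    A trace is a list of packet directions (+1 outgoing, -1 incoming).
    """
    defended = list(trace)
    for _ in range(rng.randint(0, max_dummies)):
        pos = rng.randint(0, len(defended))
        defended.insert(pos, rng.choice((1, -1)))
    return defended


def augment(dataset, defense, copies=10, seed=0):
    """Return `copies` independently defended versions of every (trace, label)."""
    rng = random.Random(seed)
    out = []
    for trace, label in dataset:
        for _ in range(copies):
            out.append((defense(trace, rng), label))
    return out


train = [([1, -1, 1, 1], "site_a"), ([-1, -1, 1], "site_b")]
print(len(augment(train, toy_random_padding)))
```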
-
Against the state-of-the-art DeepCoFFEA flow-correlation attacker, FC-DeTorrent reduces the true positive rate at a 10⁻⁵ false positive rate to approximately 0.12, less than half that of the next-best defense, Decaf (TPR ≈ 0.29), while using 97.3% bandwidth overhead and without delaying any real traffic packets.
-
DeTorrent exhibits strong diminishing returns in the bandwidth-performance tradeoff: increasing the dummy-download budget from N=1,000 to N=3,000 reduces Tik-Tok accuracy by ~19.1 percentage points, while increasing it from N=5,000 to N=7,000 yields only an additional 4.9-point reduction (accuracy floor near 20.8% at ~210% overhead). Even at the lowest tested budget (~40% overhead), Tik-Tok accuracy falls to 52.8%.
-
DeTorrent is implemented as a Tor pluggable transport on top of the WFPadTools/Obfsproxy framework and deployed against live Tor traffic; a modest VPS with 4 GB RAM and 2 vCPUs running at under 50% CPU utilization can defend five simultaneous connections in real time with no GPU required. Performance drops only 0.7% when the generator is trained on one dataset partition and tested on another.
-
DeTorrent reduces closed-world Tik-Tok attack accuracy from 93.4% to 31.9% on the BE dataset — 10.5 percentage points better than the next-best padding-only defense (FRONT at 42.4%) — and reduces Deep Fingerprinting accuracy from 94.3% to 30.0%, at a bandwidth overhead of 98.9%. On the larger DF dataset, Tik-Tok accuracy falls from 97.7% to 79.5%.
-
NetShuffle targets edge networks — small autonomous systems and entities that obtain IP address blocks from upstream providers — as a new class of support base for circumvention infrastructure. This class has received scant attention from prior work, which has focused on cloud providers and volunteer desktop machines. Edge networks represent a large pool of diverse IP space that is harder to block via ASN blackholing compared to a small number of major cloud providers.
-
NetShuffle decouples regular proxy services (e.g., HTTPS proxies, Tor bridges) from their network addresses via continuous in-network change using programmable switches at edge networks. Because the network location of a proxy is in constant flux, blocking by IP or address enumeration becomes structurally ineffective: the proxy service itself is unchanged but its visible address rotates continuously.
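One hypothetical way to make a service's visible address change on a schedule is to derive it from a keyed hash of the current epoch over the edge network's address pool, and to have the switch translate that address back to a fixed internal backend. This Python sketch is illustrative only: the HMAC-based selection, the epoch notion, and the function names are assumptions, and it does not reproduce NetShuffle's data-plane design or how clients learn current addresses.

```python
import hashlib
import hmac
import ipaddress


def address_for_epoch(key, service_id, epoch, pool):
    """Pick the public address a service answers on during `epoch` (hypothetical)."""
    msg = f"{service_id}:{epoch}".encode()
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    usable = pool.num_addresses - 2  # skip the network and broadcast addresses
    offset = 1 + int.from_bytes(digest[:8], "big") % usable
    return pool.network_address + offset


def rewrite_destination(dst, key, service_id, epoch, pool, internal_addr):
    """Switch-side sketch: translate the current public address to the fixed backend."""
    if dst == address_for_epoch(key, service_id, epoch, pool):
        return internal_addr
    return None  # stale or unrelated address: not forwarded to this service


# Usage: the same proxy appears at (generally) different addresses in successive epochs.
pool = ipaddress.ip_network("198.51.100.0/24")
backend = ipaddress.ip_address("10.0.0.5")
key = b"hypothetical-switch-secret"
for epoch in range(3):
    print(epoch, address_for_epoch(key, "https-proxy", epoch, pool))
```

In a design along these lines, `epoch` would typically come from wall-clock time (for example `int(time.time()) // period`), and the translation would run in the switch data plane rather than in Python.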
-
NetShuffle was prototyped in testbed environments and operated on a live campus network for more than one month. The evaluation shows that the in-network address shuffling provided by programmable switches is transparent to both services and clients and incurs negligible performance overhead, validating the drop-in appliance deployment model.
-
SpotProxy's active fleet-management algorithm continuously searches for cheaper Spot and regular VM instances and migrates the proxy fleet to lower-cost options. The paper demonstrates that this approach yields significant cost savings compared to operating a fixed fleet of on-demand instances, while simultaneously improving anti-blocking properties through higher IP churn.
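A greedy reading of the fleet-management loop: compare each proxy's current hourly price with the cheapest available offer and migrate when the relative saving clears a threshold. The `Offer` fields, the 10% threshold, and the prices in the usage line are illustrative assumptions, not SpotProxy's algorithm or real cloud quotes; capacity limits and preemption risk are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    instance_type: str
    region: str
    hourly_price: float  # illustrative USD/hour, must be > 0
    is_spot: bool


def plan_migrations(fleet, offers, min_saving=0.10):
    """Greedy sketch: move a proxy to the cheapest offer if it saves >= min_saving.

    `fleet` maps proxy_id -> current Offer. Returns {proxy_id: new Offer}.
    """
    if not offers:
        return {}
    cheapest = min(offers, key=lambda o: o.hourly_price)
    plan = {}
    for proxy_id, current in fleet.items():
        saving = 1 - cheapest.hourly_price / current.hourly_price
        if saving >= min_saving:
            plan[proxy_id] = cheapest  # each migration also yields a fresh IP
    return plan


fleet = {"proxy-1": Offer("vm-medium", "region-a", 0.096, False)}
print(plan_migrations(fleet, [Offer("vm-medium", "region-b", 0.035, True)]))
```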
-
SpotProxy exploits cloud Spot VMs — instances backed by excess capacity that can be reclaimed at any moment and re-spawned at new IP addresses — to create a high-churn proxy fleet. The observation is that Spot VM preemption, which is an operational liability for normal workloads, is a circumvention asset: it continuously refreshes proxy IP addresses, making censor enumeration and blocklisting structurally ineffective.
-
SpotProxy adapts both WireGuard and Snowflake to work with its active proxy migration mechanism, demonstrating that the approach is protocol-agnostic. The active migration mechanism allows clients to move between proxies seamlessly without performance degradation or connection disruption when a proxy is replaced — a requirement for any high-churn proxy infrastructure.
-
Chivo Wallet posts logs of every in-app event to NewRelic ('log-api.newrelic.com'), including keystrokes — DUI national ID numbers, phone numbers, and passwords — without privacy-policy disclosure. Separately, MiTelcel (76% Mexican mobile market share, 10M+ downloads) leaks users' phone numbers and emails to five distinct third-party servers via the HTTP 'referer' field on every 'Experiencias' tab click.
-
The Chivo Wallet app — the official El Salvador government Bitcoin wallet with 1M+ downloads — uses Microsoft CodePush to check 'codepush.appcenter.ms' for JavaScript/HTML/CSS updates each time it opens, bypassing Google Play Store review entirely. This allows the government of El Salvador to push arbitrary behavioral changes to all users' devices without any app store vetting or user notification.
-
In Latin America, censorship predominantly takes the form of targeted surveillance coupled with physical threats rather than network-level blocking. Mexico had documented Pegasus infections of journalists and activists between 2019 and 2022, at least 25 private spyware vendors sold surveillance tools to Mexican federal and state police, and at least 119 journalists have been killed in Mexico since 2000. Dynamic analysis of 8 widely used LATAM apps (combined 100M+ downloads) found security failures across all three assessed categories: cleartext traffic, undisclosed PII exfiltration to third parties, and unvetted external code-update mechanisms.
-
MiClaro Colombia sends device latitude and longitude to multiple third-party servers without user disclosure, in violation of its own privacy policy. Among the four Movistar country variants, the Argentina app requests access to all phone-call-related permissions while the Uruguay app requests none — demonstrating that third-party SDK inclusion, background receivers, and dangerous permissions vary substantially by country version of the same ostensibly unified telco app.
-
The SAT Móvil app (Mexico's official tax service, 1M+ downloads) consistently fetches its 'Chat' page over cleartext HTTP, exposing citizen ID numbers (CURP, RFC), passwords, and tax documents to any in-path attacker. None of the four major Latin American telco apps (MiTelcel, MiTigo, MiClaro, MiMovistar) implement HSTS on SMS-delivered external links, leaving all of them vulnerable to SSL-stripping downgrade attacks.
-
Censors employing deep learning can use DTLS connection duration as a precise identifier to classify and block Snowflake traffic. The paper proposes switching PT connections after a variable time limit as a countermeasure to prevent duration-based classification.
-
The authors propose a 'shim' pluggable transport that splits client traffic across N PT connections using unmodified existing PT bridges as proxies and a gateway bridge that correlates streams back into a Tor circuit via the Turbo Tunnel reliability pattern. This architecture enables all existing and future PTs to benefit from traffic splitting without modifying each PT's client or server code individually.
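The core of splitting and re-joining a stream can be shown with sequence-numbered chunks: the client spreads chunks over N paths, and the gateway releases bytes strictly in sequence order. This is a toy that assumes lossless paths and a round-robin scheduler; Turbo Tunnel-style designs put a full reliability layer (retransmission, flow control) inside the tunnel, which is not modeled here.

```python
def split_stream(data: bytes, n_paths: int, chunk_size: int = 512):
    """Cut a byte stream into sequence-numbered chunks, assigned round-robin to paths."""
    paths = [[] for _ in range(n_paths)]
    for seq, offset in enumerate(range(0, len(data), chunk_size)):
        paths[seq % n_paths].append((seq, data[offset:offset + chunk_size]))
    return paths


class Reassembler:
    """Gateway side: release bytes in sequence order regardless of arrival path."""

    def __init__(self):
        self.next_seq = 0
        self.pending = {}
        self.output = bytearray()

    def receive(self, seq: int, chunk: bytes) -> None:
        self.pending[seq] = chunk
        # Flush every chunk that is now contiguous with what was already delivered.
        while self.next_seq in self.pending:
            self.output += self.pending.pop(self.next_seq)
            self.next_seq += 1


paths = split_stream(b"x" * 2048, n_paths=3)
print([len(p) for p in paths])
```

With round-robin over N paths, an observer on any single path sees roughly 1/N of the chunks, which is the partial-trace effect the splitting defense relies on.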
-
Initial attempts to split Snowflake traffic naively across multiple WebRTC proxies produced either no improvement in performance or a net negative effect. The authors attribute this to the wide variance in proxy network stability and bandwidth and flag it as an open problem requiring more advanced splitting algorithms.
-
Because traffic splitting is not ubiquitous network behavior, split PT traffic may appear anomalous to a censor, allowing them to distinguish normal PT use from split PT use even without classifying the underlying protocol. The authors flag this as a key open risk to be evaluated empirically and note that splitting across multiple bridges or multiple PT types may simultaneously raise and lower different detection signals.
-
When a user splits traffic across N paths, a censor observing a single path sees only a partial trace, substantially reducing the accuracy of classifiers trained on complete network traces. Prior Tor traffic-splitting work (TrafficSliver, CoMPS, multipath Tor studies) has validated this defense against website fingerprinting outside the PT context.
-
Of 4,488 total HTTP Request Smuggling test vectors, 2,015 (44.9%) were accepted by at least one web server. CL*/TE vectors had a 99.0% acceptance rate (1,103/1,114); TE*/CL had 76.0% (859/1,130); CL/TE* had only 4.7% (53/1,130); and TE/CL* had 0%. Nginx 1.25.2 accepted 1,315 vectors while Apache 2.4.57 accepted only 11, reflecting HRS countermeasures added in Apache 2.4.25 and 2.4.52.
-
The root cause of port-shadow vulnerabilities is that connection-tracking frameworks maintain five shared, globally-accessible resources across all VPN clients on the same server. The paper's formal model identifies these as: the conntrack table, the NAT table, the port space, the routing table, and the ARP/neighbor cache. Any of these shared resources can be used as a side-channel. Bounded model checking confirmed that enforcing strict process isolation around all five resources eliminates the attack surface.
-
The "port shadow" exploit abuses five shared, limited resources in Linux conntrack/Netfilter (and analogous frameworks in BSD, Windows) to let an off-path attacker intercept or redirect encrypted VPN traffic, de-anonymize a VPN peer's source IP, or portscan a peer hidden behind a VPN server — all without compromising the VPN's cryptographic layer. Four concrete attacks are demonstrated; formal model checking with bounded model checking verified six process-isolation mitigations that prevent the shared-resource collision.
-
Stateful firewalls used as censorship middleboxes exhibit counter-intuitive implementation behaviors: FW-3 forwards ACK packets before a TCP handshake is initiated, and FW-1 actively spoofs RST packets in response to unsolicited traffic to thwart evasion attempts. These vendor-specific quirks create or close evasion opportunities that are invisible to rule-verification tools and not predictable from policy documentation alone.
-
Evasion attacks generated against one firewall-deployment combination do not transfer well to other settings: a deployment-agnostic approach (used by censorship circumvention tools) fails to generate effective attacks across diverse victim stacks and attacker capabilities. Pryde's deployment-aware, modular workflow finds successful attacks across configurations with and without insider threats, and against multiple attacker success criteria (data delivery vs. victim ACK vs. attacker receipt of ACK).
-
TCP-compliant packet alphabets are insufficient for modeling stateful firewall evasion. Including non-TCP-compliant traffic — specifically flipped-direction SYNs, out-of-window seq/ack numbers, and packets that form a parallel TCP connection in the reverse direction — is what unlocks discovery of deep attack paths. Prior model-inference work (Alembic) that restricted itself to compliant sequences produced models incapable of generating any of the 6,000+ attacks Pryde found.
-
Pryde generates more than 6,000 successful and unique evasion attacks against 4 popular stateful firewalls, which is 2–3 orders of magnitude higher than censorship circumvention algorithms (e.g., Geneva) and black-box fuzzing. The gap arises because circumvention tools only uncover shallow evasion sequences and cannot systematically explore the full attack-state space.
-
Web security vulnerabilities whose exploitation depends on parser divergence between two co-located systems are structurally isomorphic to censorship circumvention attacks, where the censor acts as the frontend parser and the destination server as the backend. The authors demonstrated this by directly converting all HRS test vectors from prior security research into circumvention probes with no modification, showing that censorship-circumvention techniques can be systematically constructed from existing vulnerability corpora.
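To make the frontend/backend analogy concrete: the textbook CL.TE shape of an HTTP Request Smuggling vector sends both `Content-Length` and `Transfer-Encoding: chunked`. A parser that honors `Content-Length` sees one request; a parser that honors chunked encoding sees the body end at the zero-length chunk and treats the remainder as a second request. This Python sketch builds such a request and decodes it the chunked way. It is a generic illustration, not one of the paper's test vectors, and the chunked decoder ignores extensions and trailers.

```python
def build_cl_te_probe(host: str, smuggled: bytes) -> bytes:
    """Request whose body boundary differs under Content-Length vs. chunked parsing."""
    body = b"0\r\n\r\n" + smuggled
    head = (
        "POST / HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Transfer-Encoding: chunked\r\n"
        "\r\n"
    ).encode()
    return head + body


def parse_chunked(rest: bytes):
    """Minimal chunked decoder (no extensions/trailers): return (body, leftover)."""
    pos, body = 0, bytearray()
    while True:
        line_end = rest.index(b"\r\n", pos)
        size = int(rest[pos:line_end], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body), rest[pos + 2:]  # skip the CRLF that ends the message
        body += rest[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF


raw = build_cl_te_probe("example.com", b"GET /hidden HTTP/1.1\r\nHost: example.com\r\n\r\n")
head, _, rest = raw.partition(b"\r\n\r\n")
# A Content-Length parser treats all of `rest` as one body; a chunked parser
# stops at "0\r\n\r\n" and sees the leftover bytes as a second request.
te_body, leftover = parse_chunked(rest)
print(leftover.decode())
```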
-
Reusing the same 64-bit client ID across rendezvous attempts caused approximately 3-minute delays in failure scenarios because the SQS queue deletion API can take up to 60 seconds to complete, forcing subsequent attempts to wait for the previous outgoing queue's deletion before a new queue with the same ID can be created. The fix generates a fresh random 64-bit client ID per attempt, eliminating the dependency on prior-attempt cleanup.
-
A single shared bidirectional SQS queue was rejected for Snowflake rendezvous because SQS provides no mechanism to direct messages to a specific consumer — all polling clients would receive all other clients' messages, creating a privacy violation. The adopted design uses one shared incoming queue (broker-read-only) plus per-client temporary outgoing queues identified by randomly generated 64-bit IDs, with the broker periodically deleting queues idle for more than a configurable number of minutes.
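The queue bookkeeping described in the two paragraphs above (a fresh random 64-bit ID per attempt, per-client outgoing queues, and deletion of queues idle past a configurable limit) can be modeled in memory as below. No AWS calls are made; the queue-name format and the 5-minute default are assumptions, and a real broker would issue an SQS `DeleteQueue` request where this sketch only drops the entry.

```python
import secrets


def new_client_id() -> str:
    """Fresh random 64-bit client ID for every rendezvous attempt (never reused)."""
    return f"{secrets.randbits(64):016x}"


class OutgoingQueueRegistry:
    """In-memory stand-in for the broker's per-client outgoing-queue bookkeeping."""

    def __init__(self, idle_limit_s: float = 300.0):
        self.idle_limit_s = idle_limit_s  # configurable; 5 minutes is an assumption
        self.last_used = {}               # client_id -> last activity timestamp

    def queue_name(self, client_id: str, now: float) -> str:
        self.last_used[client_id] = now
        return f"snowflake-client-{client_id}"  # hypothetical naming scheme

    def reap_idle(self, now: float) -> list:
        """Return client IDs whose queues a real broker would now delete."""
        stale = [cid for cid, t in self.last_used.items() if now - t > self.idle_limit_s]
        for cid in stale:
            del self.last_used[cid]
        return stale


registry = OutgoingQueueRegistry()
print(registry.queue_name(new_client_id(), now=0.0))
```

Because each attempt uses a new ID, a retry never waits on the slow deletion of the previous attempt's queue.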
-
Amazon SQS routes client traffic through a single fixed HTTPS endpoint (https://sqs.us-east-1.amazonaws.com), making it infeasible for a censor to distinguish circumvention-bound SQS traffic from legitimate AWS service traffic; blocking this signaling channel would require blocking all Amazon SQS, imposing significant collateral damage on businesses and developers.
-
AWS credentials distributed publicly to enable client access to the SQS API were flagged by GitHub's automated secret-scanning and AWS Support requested their deletion, even though the credentials carried intentionally limited permissions. The operational workaround adopted — base64-encoding credentials before public distribution — bypasses automated scanning but provides no real security.
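Why base64 offers no protection: it is an encoding, not encryption, so anyone who extracts the string from the client (a censor included) reverses it with one standard-library call. The key ID below is the placeholder AWS uses in its documentation, not a real credential.

```python
import base64

# AWS documentation placeholder key ID, not a real credential.
access_key_id = "AKIAIOSFODNN7EXAMPLE"

obfuscated = base64.b64encode(access_key_id.encode()).decode()
recovered = base64.b64decode(obfuscated).decode()  # no secret needed to reverse it

print(obfuscated)
print(recovered == access_key_id)  # True
```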
-
CenDTect, an unsupervised decision-tree system using iterative parallel DBSCAN, analyzed more than 70 billion Censored Planet data points (January 2019 – December 2022) and discovered 15,360 HTTP(S) censorship event clusters across 192 countries and 1,166 DNS event clusters across 77 countries. Manual validation against 38 known censorship events from news reports confirmed all human-identified events were recoverable from CenDTect's output. The system additionally identified more than 100 ASes in 32 countries with persistent ISP-level blocking and 11 temporary blocking events in 2022 correlated with elections, protests, and armed conflict.
-
CenDTect uses cross-classification accuracy — how well a decision tree trained on one domain's blocking pattern predicts another domain's blocking — as a distance metric to cluster domains that share the same blocking policy. This metric outperforms prior time-series approaches because it is interpretable (the resulting decision tree directly reveals the blocking mechanism: which ISP, which port, which protocol) rather than producing opaque anomaly scores. The approach scales to planetary-measurement volumes without requiring labelled training data.
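A compact sketch of cross-classification accuracy as a distance. CenDTect trains full decision trees and clusters with DBSCAN; to stay short, this Python version swaps in a depth-1 tree (a single `feature == value` rule) and only computes the pairwise distance, defined here as 1 minus the mean accuracy of each domain's rule on the other domain's measurements. The feature names (`asn`, `port`) and the toy data are hypothetical.

```python
def predict(rule, feats):
    feature, value = rule
    if feature is None:  # constant rule: always predict `value`
        return value
    return feats.get(feature) == value


def accuracy(rule, samples):
    return sum(predict(rule, f) == y for f, y in samples) / len(samples)


def fit_stump(samples):
    """Depth-1 tree: the best single `feature == value` rule (or a constant)."""
    candidates = [(None, True), (None, False)]
    for feats, _ in samples:
        candidates.extend(feats.items())
    candidates = list(dict.fromkeys(candidates))  # dedupe, keep order
    return max(candidates, key=lambda r: accuracy(r, samples))


def cross_classification_distance(a, b):
    """Symmetric distance: 1 - mean accuracy of each domain's rule on the other."""
    return 1 - (accuracy(fit_stump(a), b) + accuracy(fit_stump(b), a)) / 2


feats = [{"asn": "AS1", "port": 80}, {"asn": "AS1", "port": 443},
         {"asn": "AS2", "port": 80}, {"asn": "AS2", "port": 443}]
dom_a = list(zip(feats, [True, True, False, False]))  # blocked by ISP AS1
dom_b = list(zip(feats, [True, True, False, False]))  # same policy as dom_a
dom_c = list(zip(feats, [False, True, False, True]))  # blocked on port 443
print(cross_classification_distance(dom_a, dom_b))  # 0.0
print(cross_classification_distance(dom_a, dom_c))  # 0.5
```

The fitted rule itself is the interpretable part: for `dom_a` it is `("asn", "AS1")`, which names the blocking network directly.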
-
Separating the Broker role (a server that holds and manages bridge information) from both the rendezvous channel and the censorship evasion system enables modular protocol design: the rendezvous carrier can be swapped independently of the proxy system. The authors identify broker authentication and multi-broker load distribution as open problems not addressed in the current prototype.
-
Using Google Pub/Sub as a rendezvous channel adds 7.17 seconds of bootstrapping overhead vs. a 1.32-second direct baseline when establishing a TorKameleon WebRTC bridge connection (total: 8.49s vs. 1.32s). The dominant bottleneck is subscription creation time (5.23s), not the message exchange itself (3.26s), averaged across 10 samples with 113 ms cross-Atlantic latency.
-
The paper surveys the rendezvous channel design space and identifies at least six prior carrier approaches: domain fronting via CDNs, AMP cache proxying, Amazon SQS queues, push notification services, email tunneling (Mailet, SWEET), and cryptocurrency covert channels (MoneyMorph). Pub/Sub adds bidirectional real-time messaging with broad IoT/enterprise adoption as a new carrier class not previously evaluated for circumvention rendezvous.
-
The system uses a shared Pub/Sub topic for all users, where session IDs (SIDs) are visible to all subscribers on the broker topic. The paper argues this does not compromise user anonymity because SIDs are randomly generated per-session by client-side software with no link to user identity, and all subsequent bridge-info payloads are encrypted under a session-specific symmetric key exchanged via asymmetric encryption.
-
The paper documents that bridge distribution across major circumvention tools (Tor Browser's Moat, Snowflake) relies entirely on domain fronting (meek) for automated, user-friendly bootstrapping. This concentration means a censor that defeats domain fronting — or that pressures CDN providers to stop offering it — removes essentially all automated bridge-discovery pathways simultaneously, leaving only manual out-of-band methods (email/Telegram accounts) that require many user interactions.
-
Raceboat formalizes a decomposition of application-protocol-tunneling channels into three reusable components (Transport, User Model, Encoding) and a channel manager that supports mixing unidirectional channels. By composing seven different channels from these modular components (including email, AWS S3, and Redis variants), the paper demonstrates that the current ad-hoc one-protocol-one-implementation model wastes significant re-implementation effort: the same transport or encoding logic is duplicated across Snowflake, meek, CloudTransport, and others.
-
The paper argues that a greater diversity of signaling channels reduces the censor's leverage: when many independent services (cloud storage, email, push notifications, domain fronting) can each bootstrap a circumvention connection, a censor must block all of them to prevent access, and the collateral damage of blocking each may deter action. Skyhook specifically targets cloud storage as an additional independent pathway alongside existing channels like meek, Raven (email), and PushRSS.
-
Skyhook redesigns the 2014 CloudTransport concept as a signaling channel for bridge/proxy bootstrapping rather than a general-purpose browsing channel. By scoping to two-message exchanges (~1KB per direction, ~1 minute latency tolerance), Skyhook eliminates the requirement for censored users to create paid cloud storage accounts — the key usability barrier in the original design — and uses unilateral permissioning over AWS S3 objects so blocking Skyhook requires blocking all HTTPS traffic to an entire AWS S3 region.
-
CNN-based deep learning reduces the obfs4 false positive rate by an order of magnitude versus the best decision tree (FPR 2.9×10⁻³ vs. 3×10⁻²) while maintaining 100% recall, and achieves near-perfect Snowflake data-flow detection (precision 0.95 and F-score 0.97 at λ = 1,000). However, at realistic base rates λ > 10⁶ all CNN classifiers still yield near-zero precision, leaving per-flow deep learning alone insufficient for nation-state-scale deployment.
-
The paper identifies that circumvention systems relying on long-lived, consistent proxy servers are fundamentally vulnerable to host-based temporal detection regardless of per-flow obfuscation quality, and recommends adversarial examples, ephemeral obfuscation servers, and programmable or polymorphic protocols as countermeasures. Snowflake's volunteer-browser proxy architecture—where proxies are ephemeral and addresses are not reused—is highlighted as inherently more resistant to host-based classification than static bridge designs like obfs4.
-
State-of-the-art ML-based obfs4 detection (Wang et al. decision tree) achieves 97% precision at equal base rates (λ=1) but precision collapses to 3% at a still-conservative λ=1,000; at λ=10⁶ precision approaches zero for all classifiers tested. This base-rate failure was previously uncharacterized because prior evaluations only considered balanced or near-balanced datasets.
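The base-rate effect follows from the definition of precision. With λ benign flows per obfs4 flow, precision = TPR / (TPR + λ·FPR). The sketch below assumes TPR = 1 and back-solves the FPR that yields 97% precision at λ = 1; plugging in λ = 1,000 then gives about 3%, matching the collapse described above.

```python
def precision_at_base_rate(tpr: float, fpr: float, lam: float) -> float:
    """Precision when the censor sees `lam` benign flows for every obfs4 flow."""
    true_positives = tpr         # expected hits per positive flow
    false_positives = lam * fpr  # expected false alarms from `lam` negative flows
    return true_positives / (true_positives + false_positives)


# Assumption: TPR = 1.0; choose the FPR that gives 97% precision at lambda = 1.
fpr = 0.03 / 0.97
for lam in (1, 1_000, 1_000_000):
    print(lam, round(precision_at_base_rate(1.0, fpr, lam), 5))
```

The same formula shows why the CNN's lower FPR (2.9×10⁻³) still fails at λ = 10⁶: roughly 2,900 false alarms accompany every true detection.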
-
Combining a CNN flow classifier with host-based temporal accumulation eliminates all false positive classifications after observing at most 38 flows per host while maintaining perfect recall for all obfs4 and obfs⋆ bridges. The scheme requires only 14 bits of state per (IP, port) pair; tracking 4×10⁹ destination services requires no more than 50 GiB of storage, feasible on commodity hardware.
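A hypothetical reading of the host-based accumulation step: keep two 7-bit saturating counters per (IP, port), one for flows seen and one for flows the CNN flagged, packed into the 14 bits mentioned above, and flag a service once enough flows agree. The 7+7 split, the 0.9 ratio, and the use of 38 as the minimum flow count are assumptions for illustration, not the paper's exact decision rule.

```python
COUNTER_MAX = (1 << 7) - 1  # 7-bit saturating counters -> 14 bits per service


class HostAccumulator:
    """Per-(IP, port) accumulation of per-flow classifier verdicts (hypothetical rule)."""

    def __init__(self, min_flows=38, min_positive_ratio=0.9):
        self.min_flows = min_flows
        self.min_positive_ratio = min_positive_ratio
        self.state = {}  # (ip, port) -> packed int: flows << 7 | positives

    def _unpack(self, ip, port):
        packed = self.state.get((ip, port), 0)
        return packed >> 7, packed & COUNTER_MAX

    def update(self, ip, port, flow_is_positive):
        flows, positives = self._unpack(ip, port)
        flows = min(flows + 1, COUNTER_MAX)
        if flow_is_positive:
            positives = min(positives + 1, COUNTER_MAX)
        self.state[(ip, port)] = (flows << 7) | positives
        return self.is_flagged(ip, port)

    def is_flagged(self, ip, port):
        flows, positives = self._unpack(ip, port)
        return flows >= self.min_flows and positives / flows >= self.min_positive_ratio


acc = HostAccumulator()
verdicts = [acc.update("192.0.2.1", 443, True) for _ in range(38)]
print(verdicts.index(True) + 1)  # flows needed before the first flag
```

As a rough check on the storage figure, 4×10⁹ services × 14 bits is 7×10⁹ bytes, about 6.5 GiB for the raw counters; the sketch does not model the space needed for keys or indexing.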
-
obfs4 and obfs⋆ produce characteristic wire patterns—bursts of roughly MTU-sized payloads followed by a randomly-sized chaff packet—that CNN classifiers detect purely from packet-size sequences without payload inspection. A trivial per-bridge entropy-biasing re-encoding (obfs⋆) completely defeats the hand-tuned decision tree (0% precision, 0% recall) but does not reduce CNN detectability, because the CNN generalizes across size-distribution variants.
-
Circumvention tools circulate through word-of-mouth and underground distribution networks rather than official app stores, making the ecosystem opaque and creating a supply-chain attack surface: adversarially-operated tools (including, per prior work, apps linked to the People's Liberation Army) reach users through the same channels as legitimate tools. The survey documents that providers are aware of misbehaving players but lack coordinated mechanisms to flag or exclude them.
-
The first multi-perspective study of the circumvention-tool ecosystem surveyed 12 leading CT providers collectively serving over 100 million users, plus CT users in Russia and China. Beyond technical blocking challenges, the study found that funding constraints, usability problems, misconceptions (users and providers hold inaccurate beliefs about each other's capabilities), and misbehaving players (tools operated by adversarial actors) are equally significant threats to the ecosystem's health — and are largely unaddressed by the academic research community.
-
Obfuscated proxy traffic (including Shadowsocks, VMess, VLESS, Trojan, obfs4, and REALITY) can be reliably fingerprinted by detecting encapsulated TLS handshakes — the inner TLS ClientHello that appears inside an outer encrypted tunnel. This fingerprint is protocol-agnostic: any proxy that wraps TLS-bearing application traffic will produce it. The authors deployed a similarity-based classifier within a mid-size ISP serving over one million users and demonstrated detection with minimal collateral damage.
-
While stream multiplexing reduces the visibility of encapsulated TLS handshakes by merging inner connections, the paper cautions that multiplexing plus random padding alone is "inherently limited" as a long-term countermeasure. Censors can adapt by monitoring burst sizes and round-trip counts at the outer-connection level, which remain correlated with the number of inner TLS sessions regardless of padding.
-
DeTorOS enables provable geographic avoidance for Tor onion services by running a TEE-backed Bento function as a trusted middlebox: both the client and the onion service upload their respective 3-hop circuit halves to this enclave, which computes the never-once or never-twice avoidance proof without revealing either party's circuit to the other.
-
Computing a never-once avoidance proof for a 6-hop onion-service circuit takes an average of 64.85 seconds — incurred once at connection setup — because the system must collect round-trip timing measurements across all six relays before running the geographic proof; SGX execution overhead is nominal, and the paper notes that lower-RTT circuits (more likely to be DeTorOS-compliant) reduce subsequent data-transfer latency.
-
Never-twice provable avoidance succeeds for 72.4% of sampled source-destination pairs on 6-hop onion-service circuits, compared to approximately 98% on the original 3-hop DeTor circuits; the degradation arises because the additional hops increase round-trip time, making it harder to rule out forbidden-region traversal via speed-of-light bounds.
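The speed-of-light reasoning behind these proofs can be sketched as a simplified, hypothetical never-once check in the spirit of DeTor, not DeTorOS's implementation. It assumes known hop coordinates, approximates the forbidden region by a few sample points, takes fiber propagation as roughly 200 km/ms, and treats `delta` as an assumed margin for measurement error. A proof succeeds only when the measured RTT is too small to fit a detour through the forbidden region, which is why the larger RTTs of 6-hop circuits make proofs fail more often.

```python
import math

EARTH_RADIUS_KM = 6371.0
KM_PER_MS = 200.0  # assumption: ~2/3 of the speed of light, typical for fiber

def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def min_rtt_via_forbidden(path, forbidden_points):
    """Smallest physically possible RTT (ms) if one leg detoured through a forbidden point.
    The detour is on one direction only; the other direction follows the direct legs."""
    legs = [great_circle_km(path[i], path[i + 1]) for i in range(len(path) - 1)]
    total = sum(legs)
    best_one_way = math.inf
    for i, leg in enumerate(legs):
        for f in forbidden_points:
            detour = great_circle_km(path[i], f) + great_circle_km(f, path[i + 1])
            best_one_way = min(best_one_way, total - leg + detour)
    return (total + best_one_way) / KM_PER_MS

def never_once(path, forbidden_points, measured_rtt_ms, delta=0.0):
    """True if even the inflated measured RTT is too small to include a forbidden detour."""
    return (1 + delta) * measured_rtt_ms < min_rtt_via_forbidden(path, forbidden_points)
```

For example, a New York, London, Frankfurt path with a measured RTT 10% above the direct-path minimum can be proven to avoid Beijing, but not London, which lies on the path itself.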
-
DeTorOS's security relies on the honest-but-curious model: if the onion service refuses to participate or lies about its circuit, the client receives no avoidance guarantee. The paper explicitly flags this as an open limitation and notes it cannot be closed without either requiring a TEE on the onion service side or fundamental protocol changes.
-
Tor's built-in country-exclusion mechanism is unreliable: circuits configured to exclude US Tor nodes only actually bypassed the US 12% of the time, motivating provably-avoidant circuit construction.
-
Across all years of the KIO dataset (2016–2021), a large majority of events were full-network shutdowns; their count grew significantly from 2016 to 2019, with no significant decline through 2021. Censors are also increasingly using app-specific bans and throttling alongside full shutdowns; the three restriction categories are not mutually exclusive, and all three rose over the period.
-
Using merged IODA and KIO data across 155 countries (Jan 2018–Aug 2021), elections increase the daily probability of an Internet shutdown by a factor of 16, coups by a factor of roughly 300, and protests by a factor of 9. These political mobilization events do not increase the probability of spontaneous outages, providing a discriminating signal between intentional and unintentional disruptions.
-
The merged KIO-IODA dataset (Jan 2018–Aug 2021) documents 219 national-scale Internet shutdowns across 35 countries and 714 spontaneous outages across 150 countries; the 35 shutdown-affected countries collectively represent more than 1 billion estimated Internet users. Myanmar (53 IODA events), Syria (52), and Iraq (38) are the most frequently affected countries in the shutdown dataset.
-
Countries where state-owned providers originate more than 50% of domestic address space show significantly higher shutdown prevalence; this state-ownership factor predicts shutdowns but shows no discernible difference for spontaneous outages. Countries with shutdowns have a median V-Dem liberal democracy score of 0.151 (maximum 0.481), compared to 0.279 for countries with spontaneous outages and 0.465 for countries with neither.
-
Over 55% of government-ordered shutdowns last a multiple of 30 minutes (vs. 15% of spontaneous outages), and 45% last precisely 4.5, 5.5, 8, or 10 hours (vs. <1% of spontaneous outages). The median recurrence interval between successive shutdown events within the same country is 1 day versus 39 days for spontaneous outages, with 67.7% of shutdowns falling exactly on 1-, 2-, 3-, or 4-day intervals versus 0.17% of outages.
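Both timing signatures can be checked mechanically. A minimal sketch, assuming event start and end times are already known; the tolerances and function names are hypothetical, and the paper reports these as aggregate statistics rather than as this exact classifier.

```python
from datetime import datetime, timedelta

HALF_HOUR = timedelta(minutes=30)
ROUND_INTERVALS = [timedelta(days=d) for d in (1, 2, 3, 4)]

def is_half_hour_multiple(start, end, tolerance=timedelta(minutes=1)):
    """True if the event duration is within `tolerance` of a multiple of 30 minutes."""
    rem = (end - start) % HALF_HOUR
    return rem <= tolerance or HALF_HOUR - rem <= tolerance

def round_day_recurrence_share(starts, tolerance=timedelta(minutes=5)):
    """Fraction of gaps between consecutive events that fall on a 1-, 2-, 3- or 4-day interval."""
    starts = sorted(starts)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    if not gaps:
        return 0.0
    hits = sum(any(abs(g - r) <= tolerance for r in ROUND_INTERVALS) for g in gaps)
    return hits / len(gaps)
```

A country whose events score high on both checks looks scheduled (for example, exam-period shutdowns), whereas spontaneous outages rarely do.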
-
Discop's core algorithm is modality-agnostic and deploys unchanged across text generation (GPT-2, DistilGPT-2, Transformer-XL), image completion (Image GPT), and text-to-speech (Tacotron + WaveRNN), requiring only that both parties share the generative model, PRNG, and seed. The same zero-KLD security proof applies across all modalities.
-
Discop with Huffman-tree recursion achieves entropy utilization of 0.92–0.94 (bits embedded ÷ entropy available) and an embedding capacity of 3.48–5.29 bits/token across nucleus-sampling parameters p=0.80–0.98 with GPT-2, matching or exceeding ADG (0.78–0.84 utilization, 3.07–4.89 bits/token) while maintaining exactly zero KL divergence. Per-bit embedding time is 2.17E-03 to 5.52E-03 seconds, comparable to ADG.
-
Discop achieves provably perfect steganographic security (D_KL(P_c‖P_s) = 0) by constructing multiple 'distribution copies' of a generative model's predicted distribution and using the copy index to encode the secret message. Because all copies share identical token probabilities, the stego distribution is exactly equal to the cover distribution and no steganalyzer can perform better than random guessing.
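The distribution-copy idea can be illustrated with a one-bit-per-step simplification: two copies whose sampling points are offset by 1/2 in a shared random value. This is our sketch, not Discop's full algorithm (which uses more copies and Huffman-tree recursion for higher capacity); fixed toy distributions stand in for a generative model, and all names are hypothetical. Because `r` and `(r + 1/2) mod 1` are both uniform, the emitted token follows the model's distribution whichever bit is sent.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample(probs, u):
    """Inverse-CDF sampling: index of the token whose interval contains u in [0, 1)."""
    cdf = list(accumulate(probs))
    return min(bisect_right(cdf, u * cdf[-1]), len(probs) - 1)

def copies(probs, r):
    """Tokens picked by copy 0 (point r) and copy 1 (point r + 1/2, wrapped)."""
    return sample(probs, r), sample(probs, (r + 0.5) % 1.0)

def discop_encode(bits, dists, seed):
    """Embed `bits` while drawing one token per distribution; returns (tokens, bits_embedded)."""
    prng = random.Random(seed)  # PRNG and seed shared with the receiver
    tokens, used = [], 0
    for probs in dists:
        t0, t1 = copies(probs, prng.random())
        if t0 == t1:
            tokens.append(t0)   # copies collide: this token carries no bit
            continue
        bit = bits[used] if used < len(bits) else 0  # pad after the message ends
        tokens.append(t1 if bit else t0)
        used += 1
    return tokens, min(used, len(bits))

def discop_decode(tokens, dists, seed):
    prng = random.Random(seed)
    out = []
    for tok, probs in zip(tokens, dists):
        t0, t1 = copies(probs, prng.random())
        if t0 != t1:
            out.append(0 if tok == t0 else 1)
    return out
```

Low-entropy steps (where both copies land on the same token) simply carry no bit, which is how capacity tracks the model's entropy without distorting it.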
-
All prior provably-secure steganography methods introduce measurable distribution distortion: ADG achieves Max KLD of 4.54E-02 to 6.76E-02 bits/token, and Meteor with its heuristic sorting reaches Max KLD up to 9.01E+00 bits/token (Table II, GPT-2, p=0.80). These non-zero KL divergences give any statistical steganalyzer a non-negligible distinguishing advantage, violating the security definition even when average divergence appears small.
-
Replacement-based covert channels that substitute genuine media streams with ciphertext (Protozoa replacing WebRTC video, Balboa replacing audio) are immediately detectable when the censor controls or has plaintext access to the protocol gateway — for example, a WebRTC relay that decrypts and validates incoming media. Censors can also systematically suppress these channels by selectively degrading or blocking encrypted traffic for which they have no decryption trapdoor.
-
Achieving active security (FEP-CCFA) requires that on any AEAD decryption failure a fully encrypted protocol silently return the empty string and keep the channel open indefinitely, never emitting a channel-closure signal. Any observable behavioral difference — including connection termination timing — leaks information about ciphertext-boundary locations to an active adversary.
-
Shadowsocks transmits a fixed-size AEAD-encrypted length field followed by the AEAD-encrypted payload with no support for reducing ciphertext size via fragmentation, while Obfs4 permits input-side padding but not output fragmentation. These designs impose distinct minimum output message lengths, allowing a passive adversary to distinguish between them — and identify short-message sessions — based solely on the minimum observed message length.
-
No existing fully encrypted protocol — including Obfs4, Shadowsocks, VMess, and Obfuscated OpenSSH — simultaneously satisfies passive indistinguishability (FEP-CPFA), active-manipulation resistance (FEP-CCFA), and output-length shaping. The paper presents a novel stream-based construction that provably satisfies all three using AEAD-authenticated length blocks, an output buffer supporting arbitrary fragmentation, and a padding mechanism allowing the sender to emit exactly p output bytes on demand.
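The role of arbitrary fragmentation in output-length shaping can be sketched with a send buffer that always releases exactly `p` bytes. This illustrates only the buffering idea and is not the paper's construction: `seal_frame` is a hypothetical placeholder that performs no encryption (a real protocol would AEAD-protect the length block and body), and the frame layout and tag size are assumptions.

```python
import os
import struct

TAG_LEN = 16                 # assumption: AEAD tag size
OVERHEAD = 3 + TAG_LEN       # 2-byte length + 1-byte type + tag

def seal_frame(payload: bytes, padding: bool = False) -> bytes:
    """Placeholder frame (NOT encryption): length, type, payload, random stand-in tag."""
    header = struct.pack(">HB", len(payload), 1 if padding else 0)
    return header + payload + os.urandom(TAG_LEN)

class ShapedSender:
    """Output buffer that can emit exactly p bytes whenever the shaper asks."""
    def __init__(self):
        self.buf = bytearray()

    def write(self, data: bytes) -> None:
        self.buf += seal_frame(data)

    def emit(self, p: int) -> bytes:
        # Top up with padding frames until at least p bytes are buffered.
        while len(self.buf) < p:
            need = p - len(self.buf)
            pad_len = min(max(0, need - OVERHEAD), 0xFFFF)
            self.buf += seal_frame(bytes(pad_len), padding=True)
        out = bytes(self.buf[:p])
        del self.buf[:p]  # the rest stays buffered, so frames may split anywhere
        return out
```

Because frame boundaries need not align with emitted chunks, the wire-level sizes can follow any schedule the sender chooses, including sizes smaller than one minimum-length frame.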
-
Obfs4's data-transport phase encrypts per-record length fields with an unauthenticated stream cipher. An active adversary can overwrite this field to force a predictable TCP connection termination at a calculable byte offset; the authors experimentally confirmed that Tor-over-Obfs4 connections can be reliably distinguished from other FEPs because client initiation messages have consistent lengths.
-
Censors optimize for utility under asymmetric misclassification costs rather than raw accuracy: false positives (blocking legitimate traffic) carry economic and political costs that make censors conservative about deploying classifiers with high false-positive rates. Multi-flow stateful classifiers — such as the obfs4 Elligator probabilistic distinguisher, which requires correlating observations across multiple connections — are operationally more expensive than single-packet or connection-initiation classifiers, which the author suggests explains why probabilistic multi-flow distinguishers have not been exploited in practice even when theoretically available.
-
Three independent implementation flaws in obfs4proxy's Elligator encoding made obfs4 public-key representatives passively distinguishable from uniform random bytes: (1) non-canonical square roots allowed a square-then-root test matching 100% of obfs4 outputs but only ~50% of random strings; (2) bit 255 was always zero; (3) only large prime-order subgroup points were encoded. A classifier exploiting these achieves 100% sensitivity (obfs4 never falsely marked as random) at less-than-100% specificity. All three were fixed in obfs4proxy-0.0.12 (December 2021) and 0.0.14 (September 2022).
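Flaw (2) alone yields the kind of multi-flow probabilistic distinguisher discussed above. A minimal sketch against pre-fix obfs4proxy, assuming the censor has already extracted the 32-byte representative from each handshake; function names are hypothetical. Curve25519 values are little-endian, so bit 255 is the top bit of byte 31. A uniform source clears that bit half the time, so after n flows that all have it clear, the likelihood ratio favors obfs4 by 2ⁿ.

```python
import math

def bit255_clear(representative: bytes) -> bool:
    """Bit 255 is the most significant bit of the last byte (little-endian encoding)."""
    return representative[31] & 0x80 == 0

def log2_likelihood_ratio(representatives) -> float:
    """log2 P(data | buggy obfs4) - log2 P(data | uniform random bytes)."""
    score = 0.0
    for rep in representatives:
        if not bit255_clear(rep):
            return -math.inf  # impossible under the buggy encoder: not this obfs4
        score += 1.0          # log2(1 / 0.5) per observation
    return score
```

This matches the sensitivity/specificity profile in the text: obfs4 flows are never rejected, while a random stream is mislabeled with probability 2⁻ⁿ.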
-
Shadowsocks 'stream cipher' methods lacked integrity protection on ciphertexts, enabling a decryption oracle: an attacker who can guess as few as 4 bytes of plaintext prefix (5 bytes without controlling a /24) can replay a recorded session with a modified 7-byte target header, causing the server to send the decryption of the entire recorded stream to an attacker-controlled host. This provides an efficient active test for identifying Shadowsocks servers; once identified, a censor can block by IP address.
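The redirection step relies on stream-cipher malleability: XOR-ing the ciphertext with (guessed plaintext ⊕ chosen plaintext) changes the decrypted bytes to the chosen value. A minimal sketch in which `toy_keystream` is a hypothetical placeholder for the real cipher and KDF, the IV that precedes real Shadowsocks ciphertext is ignored, and the target header uses the SOCKS5-style layout (type 0x01, 4-byte IPv4, 2-byte port). Our reading of the /24 variant is that patching only the first 4 bytes leaves the unknown last octet and port unchanged, so the connection still lands inside the attacker's /24.

```python
import hashlib
import ipaddress
import struct

def toy_keystream(key: bytes, n: int) -> bytes:
    # Placeholder for the stream cipher's keystream; NOT Shadowsocks' real construction.
    return hashlib.shake_256(key).digest(n)

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ipv4_target(ip: str, port: int) -> bytes:
    """7-byte target header: ATYP 0x01, 4-byte IPv4 address, 2-byte big-endian port."""
    return b"\x01" + ipaddress.IPv4Address(ip).packed + struct.pack(">H", port)

def redirect(ciphertext: bytes, guessed_prefix: bytes, new_prefix: bytes) -> bytes:
    """Flip ciphertext bits so the first len(guessed_prefix) plaintext bytes become new_prefix."""
    delta = xor(guessed_prefix, new_prefix)
    return xor(ciphertext[:len(delta)], delta) + ciphertext[len(delta):]
```

No integrity check catches the change, so the server decrypts the forged header and connects to the attacker's host, forwarding the decryption of the rest of the recorded stream.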
-
VMess's encrypted command block used a non-keyed hash over variable-length fields in a MAC-then-encrypt construction where the receiver cannot locate the hash without first parsing the protected data, enabling an active distinguishing attack: by replaying an authentic request 16 times with the padding-length field P set to 0000–1111, an attacker observes that a VMess server reads exactly P+N+4 bytes before disconnecting, with max and min byte counts differing by exactly 15 with every intermediate value present. V2Ray mitigated this in v4.23.4 by disconnecting after a timeout rather than after receiving a full command block.
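The fingerprint reduces to a simple test over 16 observed byte counts. A minimal sketch; `simulated_replay_reads` is a hypothetical model that assumes each replay flips the padding-length bits through malleable encryption (so the server sees P′ = P ⊕ d) and that N is fixed across replays.

```python
def simulated_replay_reads(n: int, original_p: int):
    """Hypothetical server model: replay d makes the server see P' = P xor d,
    then read P' + n + 4 bytes before disconnecting."""
    return [(original_p ^ d) + n + 4 for d in range(16)]

def looks_like_vmess(byte_counts) -> bool:
    """Fingerprint from the text: 16 counts whose max and min differ by exactly 15,
    with every intermediate value present."""
    if len(byte_counts) != 16:
        return False
    counts = sorted(byte_counts)
    return counts == list(range(counts[0], counts[0] + 16))
```

The timeout-based disconnect in V2Ray v4.23.4 removes this signal because the number of bytes read before closing no longer depends on P.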
-
Variable bitrate encoding (e.g., the OPUS codec's 6–510 kbps range) in VoIP protocols leaks content properties through packet timing, enabling ML classifiers to distinguish protocol tunnels from real conversations. An audio tunnel without timing shaping was identifiable with auROC 0.981 and aucPR 0.959 by an AutoGluon-Tabular classifier examining 1000-packet flow windows.
-
Voiceover's DCGAN, trained on ~400 hours of two-person telephone conversations, generates conversation timing templates that constrain when the tunnel transmits audio. This reduces ML classifier performance from auROC 0.981/aucPR 0.959 (unshaped baseline) to auROC 0.682/aucPR 0.482, and the improvement holds at 500-packet windows (auROC 0.68/aucPR 0.50), suggesting robustness to memory-limited adversaries.
-
Protocol mimicry that replicates only statistical or syntactic traffic properties is insufficient for unobservability: Houmansadr et al. (2013) showed SkypeMorph was trivially detectable by the absence of Skype control channels, missing login-server communication, and failure to replicate implementation-specific bugs present in real Skype—demonstrating that full behavioral replication, not just traffic shaping, is required to withstand scrutiny.
-
Skype for Web normalizes packet sizes such that Voiceover transmissions and genuine audio conversations produce nearly identical packet size CDFs across Ubuntu 18.04 and Windows 10, across all tested modulation parameters (carrier frequency, sampling frequency, baud rate, frame length). This makes the Skype-based tunnel inherently immune to packet-size fingerprinting without requiring explicit size shaping.
-
Voiceover achieves 31.16 bytes/s goodput with default parameters—roughly half the 62.32 bytes/s of the unshaped baseline—because GAN-imposed silence periods reduce transmission time. Skype's OPUS codec bounds the theoretical ceiling at 750–63,750 bytes/s, so all multimedia tunnels over this path are constrained to low-bandwidth use cases; the authors explicitly position Voiceover as an out-of-band channel for sharing secret keys rather than a general-purpose data path.
-
HTTP/URL/keyword filtering was the most prevalent censorship method both during the measurement period (49% of countries) and historically (69%), despite 82% global HTTPS adoption. The authors attribute this persistence to censors lacking technical sophistication to upgrade, and to uneven HTTPS adoption leaving older methods effective in underserved regions.
-
IP and port blocking dropped from 30% of countries historically to only 9% during the study period (six countries), with the decline attributed to difficulty maintaining ephemeral blocklists, CDN collateral damage, and IPv6 expansion. Iran is a significant exception: it has implemented port allowlisting — permitting only ports 80, 443, and 53 — on multiple occasions, blocking all other ports entirely.
-
TLS-based filtering (SNI blocking) was active in 41% of 70 surveyed countries during the June 2020–May 2021 measurement period and 44% historically, driven by the 82% global HTTPS adoption rate (Mozilla telemetry, Oct 2021). China took the unprecedented step of blocking ESNI traffic entirely, and the authors note that widespread ECH deployment could render this entire censorship category obsolete.
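SNI filtering works because the server name travels in cleartext in the ClientHello. A minimal sketch of what such a middlebox parses, assuming the whole ClientHello arrives in a single TLS record (real censors must also handle fragmentation across records and TCP segments); `build_client_hello` is a hypothetical helper that produces a minimal test message, not a usable handshake.

```python
import struct

def extract_sni(record: bytes):
    """Return the SNI host name from a TLS record holding a complete ClientHello, or None."""
    try:
        if record[0] != 0x16:                             # 0x16 = handshake record
            return None
        body = record[5:5 + struct.unpack(">H", record[3:5])[0]]
        if body[0] != 0x01:                               # 0x01 = ClientHello
            return None
        p = 4 + 2 + 32                                    # handshake header, version, random
        p += 1 + body[p]                                  # skip session_id
        p += 2 + struct.unpack(">H", body[p:p + 2])[0]    # skip cipher_suites
        p += 1 + body[p]                                  # skip compression_methods
        end = p + 2 + struct.unpack(">H", body[p:p + 2])[0]
        p += 2
        while p + 4 <= end:
            ext_type, ext_len = struct.unpack(">HH", body[p:p + 4])
            p += 4
            if ext_type == 0x0000:                        # server_name extension
                # list length (2 bytes), name_type (1 byte), host_name length (2 bytes)
                if body[p + 2] == 0:                      # name_type 0 = host_name
                    name_len = struct.unpack(">H", body[p + 3:p + 5])[0]
                    return body[p + 5:p + 5 + name_len].decode("ascii")
            p += ext_len
        return None
    except (IndexError, struct.error, UnicodeDecodeError):
        return None                                       # truncated or malformed input

def build_client_hello(host: str) -> bytes:
    """Hypothetical minimal ClientHello carrying only an SNI extension."""
    name = host.encode("ascii")
    sni = struct.pack(">HBH", len(name) + 3, 0, len(name)) + name
    ext = struct.pack(">HH", 0x0000, len(sni)) + sni
    hello = (b"\x03\x03" + bytes(32)                      # legacy_version, random
             + b"\x00"                                    # empty session_id
             + b"\x00\x02\x13\x01"                        # one cipher suite
             + b"\x01\x00"                                # one compression method (null)
             + struct.pack(">H", len(ext)) + ext)
    handshake = b"\x01" + len(hello).to_bytes(3, "big") + hello
    return b"\x16\x03\x01" + struct.pack(">H", len(handshake)) + handshake
```

With ECH, the name a parser like this recovers is the outer, client-facing server name rather than the actual destination, which is why ECH threatens this whole category.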
-
In Brunei, censorship is confined to AS10094, which serves approximately 70% of the country's Internet users. The censor injects RST packets bearing a distinctive fingerprint — the censored query's IP ID field — in response to HTTP requests containing censored Host headers, and censors on all ports without residual censorship. A SYN followed immediately by a PSH+ACK with a censored payload is sufficient to trigger blocking without a completed TCP handshake.
-
Censoring middleboxes' TCP non-compliance — specifically, their willingness to censor bidirectionally without completing the three-way handshake — enables external vantage points outside a censoring country to trigger and measure censorship without any local endpoint participation. The approach requires only a confirmed censored domain per AS, evidence of bidirectional censorship, and minimal residual censorship.
-
Geneva — originally designed to evolve censorship-evasion packet sequences — was repurposed by inverting its fitness function to discover censorship-triggering packet sequences instead. Training against non-responsive IP addresses allows Geneva to attribute all responses to middleboxes, enabling fully automated discovery of triggering strategies without any endpoint cooperation.
-
Tajikistan routes virtually all national egress and ingress traffic through a single state-run AS (AS51346, Tojiktelecom) under a 2016 national decree, creating a centralized chokepoint. The censor injects RST+ACK packets with a unique 22-byte all-zero payload, censors on all ports, and requires two PSH+ACK packets containing the censored content before injecting — possibly modeling typical multi-resource HTTP browsing behavior.
-
The Censored Planet data analysis pipeline matched more than 60.89% of all HTTP-response data across four years of measurements to either a known blockpage fingerprint or a confirmed non-censorship fingerprint. Over 60 million individual measurements were specifically classified as expected Akamai CDN behavior—responses that previous work had routinely misclassified as censorship because Akamai's edge configuration returns connection timeouts or HTTP 301 redirects when the test domain and vantage-point server are both Akamai-hosted.
-
Prior DNS-manipulation measurement systems suffered from high false-positive rates because DNS anomalies are also produced by benign infrastructure (CDNs, geo-DNS, captive portals). CERTainty's TLS certificate inspection step disambiguates these cases, establishing that certificate validation is a necessary complement to DNS-response comparison for reliable censor classification.
-
Of the Tranco top-10K domains, 286 (3.26%) returned geoblocking signatures for all Russian vantage points in May 2022, with CDN-mediated blocking dominant: 87 domains via Cloudflare and 57 via Akamai. DNS-level geoblocking alone affected 68 domains, and 29 domains implemented both DNS and TCP geoblocking simultaneously, rendering public-resolver circumvention of DNS blocks ineffective for those targets.
-
At 64 bps FSK encoding over cellular voice, Dolphin achieves a bit error rate below 2% across all tested data sizes (100–5000 bytes), cellular providers, and geographic distances up to ~3600 miles. Rates of 128 bps and above cause BER to jump to 5–22%, making transmission too unreliable for practical use.
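Binary FSK at 64 bps can be sketched with two tones and a non-coherent energy detector. The 8 kHz sampling rate and the 1200/2200 Hz tone pair are assumptions for illustration (Dolphin's actual parameters may differ), and the channel here is noiseless, unlike a real cellular voice codec. At 8 kHz, 64 bps gives 125 samples per bit, and a 20-byte chunk (160 bits) takes 2.5 s of audio.

```python
import math

SAMPLE_RATE = 8000            # assumption: telephony-band sampling rate
BIT_RATE = 64                 # bits per second, as in the text
F0, F1 = 1200.0, 2200.0       # hypothetical tones for bit 0 and bit 1
SPB = SAMPLE_RATE // BIT_RATE  # 125 samples per bit

def modulate(bits):
    """Each bit becomes SPB samples of one of two tones (phase-continuous in time)."""
    out = []
    for i, b in enumerate(bits):
        f = F1 if b else F0
        out += [math.sin(2 * math.pi * f * (i * SPB + n) / SAMPLE_RATE) for n in range(SPB)]
    return out

def tone_energy(chunk, f):
    """Non-coherent (phase-insensitive) correlation energy of a chunk at frequency f."""
    c = sum(x * math.cos(2 * math.pi * f * n / SAMPLE_RATE) for n, x in enumerate(chunk))
    s = sum(x * math.sin(2 * math.pi * f * n / SAMPLE_RATE) for n, x in enumerate(chunk))
    return c * c + s * s

def demodulate(samples):
    bits = []
    for i in range(0, len(samples) - SPB + 1, SPB):
        chunk = samples[i:i + SPB]
        bits.append(1 if tone_energy(chunk, F1) > tone_energy(chunk, F0) else 0)
    return bits
```

Longer symbols (lower bit rates) give the detector more samples per decision, which is consistent with the sharp BER increase reported at 128 bps and above.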
-
FSK-encoded Dolphin audio is distinguishable from normal human speech via offline amplitude analysis: Dolphin's mean signal amplitude is 0.4 (std 224) versus 205 (std 1590) for natural speech — approximately an order of magnitude lower — enabling classification by a telecom operator who records calls. The paper also notes that standard CRC checksums appearing periodically every chunk provide a unique detectable signature if the adversary attempts to decode the audio.
-
Documented Internet shutdown events grew from 75 in 2016 to 213 in 2019 across 33 countries, with individual shutdowns lasting from hours to 472 days (Chad). These shutdowns completely sever IP connectivity, rendering all existing circumvention tools (Tor, VPNs, Shadowsocks, etc.) non-functional since they require at least partial Internet access to operate.
-
An adversary introducing audio perturbations every 2.5 seconds (sufficient to corrupt each 20-byte chunk at 64 bps) degrades PESQ call quality to 1.6, below the 2.0 'unusable' threshold, making the attack self-defeating. However, targeting only acknowledgment windows (every ~12.5 s under Dolphin's default batch-of-5 configuration) achieves PESQ 3.6 — acceptable to human callers — while fully disrupting Dolphin data transfer.
-
The system is designed to protect crowdsourced volunteer privacy by storing only AS-level granularity alongside randomized short-lived client identifiers, explicitly discarding source IP addresses and any browser-identifying information. AS-level resolution is sufficient for server-side evasion because strategies are evolved per-censor-ASN rather than per-user.
-
Server-side censorship evasion strategies require zero client-side changes: clients bypass censorship without installing software or even being aware of the evasion, and this approach has been adopted in production tools including Psiphon's packetman. The packet manipulations exploit weaknesses in how censors track or tear down TCP connections, occurring entirely at the server during the three-way handshake.
-
All existing automated server-side strategy discovery tools — Geneva, Alembic, and SymTCP — require researcher control of a client during training, even when the discovered strategies are deployed exclusively server-side. This dependency makes it infeasible to train against censors in networks where researchers cannot place a controlled machine.
-
Ling et al. demonstrated that relying on third-party email providers to verify users leaves Tor's BridgeDB vulnerable to censors able to create many accounts, enabling bridge enumeration through sock-puppet attacks at scale. Prior work has also shown that active and passive detection techniques, including traffic-flow analysis, DPI, website fingerprinting, and active probing, can reveal Tor bridges, making Tor inaccessible to the majority of users in some regions.
-
The Lox check-blockage protocol response size and time grow linearly with the number of blocked bridges — 6 kB / 11 ms at 5% blocked, 63 kB / 64 ms at 50%, and 126 kB / 122.5 ms at 100% — creating a bandwidth bottleneck a strategic and patient censor can exploit by triggering mass bridge blockages during a critical event (election, coup) to deny successful blockage migrations at the moment users most need them.
-
In a Rust implementation evaluated over 10,000 runs with 3,600 bridges (1,800 open-entry single-bridge buckets and 600 hot-spare three-bridge buckets), Lox's trust promotion protocol incurs the highest latency at 364.2 ms response time and 378 kB response size due to the encrypted migration hashtable, but this operation occurs only once per user. All other protocols complete in under 16 ms with request sizes under 3.4 kB.
-
Lox's trust level scheme (L=0 through L=4, requiring 30, 14, 28, 56, and 84 days respectively per level before upgrading, per Table 2) with blockage inheritance — invited users inherit their inviter's blockage count d — prevents a censor from resetting their reputation through self-invitation after causing blocking events, while users with d ≥ 4 become ineligible to migrate, capping the damage a persistent infiltrator can do.
-
Lox uses Chase et al.'s keyed-verification algebraic MAC anonymous credentials in a single-issuer/verifier setting with jointly-chosen credential IDs (neither party can unilaterally select them), so a fully compromised Lox Authority cannot link credential showings to specific users or reconstruct the social graph — the LA learns only that a shown credential was authentically issued.
-
VPNalyzer is the first study to measure DNS leaks during tunnel failure, discovering that 8 VPN providers — including TunnelBear and Private Internet Access — allow DNS queries to bypass their kill switch or firewall rules, exposing users' ISP IP addresses and queried domain names to their ISP and DNS resolvers outside the tunnel.
-
Only 11 of 80 tested VPN providers supported IPv6 connectivity; 5 providers — Astrill VPN, Norton Secure VPN, Turbo VPN, SurfEasy VPN, and a university VPN — failed to block IPv6 traffic when the VPN tunnel did not support it, silently leaking all IPv6 data directly to the user's ISP even when IPv4 was fully tunneled.
-
Among 80 tested VPN providers, 26 leaked user traffic during tunnel failure: 18 exhibited a missing or broken kill switch leaking all traffic types, and 8 additional providers leaked only DNS traffic. In a case study of 39 top providers with all security settings explicitly enabled ('custom secure mode'), 10 still leaked traffic, with 6 leaking even with the 'kill switch' feature activated.
-
29 of 80 VPN providers — including paid services — configure clients to resolve DNS through third-party public resolvers (Google Public DNS, Cloudflare, OpenDNS, Quad9) rather than provider-operated infrastructure. Three self-hosted solutions (Algo, Streisand, Outline) hardcode public DNS with no easy override, causing connection failures in regions where those services are blocked.
-
27 of 80 tested VPN providers had servers within a single AS (AS 9009, M247 Ltd), and VPNalyzer identified 14 providers sharing 4 specific IP blocks within that AS; 2 additional providers shared an IP block in AS 60068 (Datacamp). Such infrastructure concentration enables censors to block multiple VPN products simultaneously with a single IP-range or AS-level rule.
-
OpenVPN's application-layer P_ACK packets — uniform in size and concentrated only in the handshake phase — provide a timing and count fingerprint detectable via threshold comparison over 10-packet bins. Tunnel-based obfuscation wrappers (Stunnel, SSH, obfs2/3, Shadowsocks) that do not add random padding preserve the 1:1 packet correspondence with the underlying OpenVPN stream, leaving 16 of 20 tested tunnel-based obfuscated configurations vulnerable to ACK fingerprinting.
-
34 of 41 obfuscated OpenVPN configurations and 18 of 20 UDP configurations were co-located with vanilla TCP OpenVPN servers within the same /29 subnet; probing the /29 subnet of a suspected obfuscated or UDP endpoint revealed nearby vanilla TCP servers, enabling confirmation by 'guilt by association' even when the obfuscated endpoint itself resisted direct fingerprinting. Some providers additionally share infrastructure across different VPN brands, further compounding exposure.
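The guilt-by-association step amounts to enumerating the /29 around a suspected endpoint and probing the neighbors. A minimal sketch using the standard `ipaddress` module; `probe_vanilla_openvpn` is a hypothetical callback standing in for an actual vanilla-OpenVPN probe, and `hosts()` skips the network and broadcast addresses of the /29.

```python
import ipaddress

def slash29_neighbors(ip: str):
    """Other usable host addresses in the /29 that contains `ip`."""
    addr = ipaddress.ip_address(ip)
    net = ipaddress.ip_network(f"{ip}/29", strict=False)
    return [str(h) for h in net.hosts() if h != addr]

def guilt_by_association(ip: str, probe_vanilla_openvpn) -> bool:
    """True if any /29 neighbor answers like a vanilla TCP OpenVPN server."""
    return any(probe_vanilla_openvpn(n) for n in slash29_neighbors(ip))
```

A /29 holds only eight addresses, so this confirmation step is cheap enough to run on every suspicious endpoint.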
-
A two-phase passive-filter-plus-active-probing framework deployed at a 1-million-user ISP identified 85.90% of vanilla OpenVPN flows (1,718/2,000) and 72.67% of obfuscated flows (1,468/2,020), with an upper-bound false positive rate of 0.0039% across over 10 million flows — three orders of magnitude lower than prior ML-based approaches (1.4–5.5%). The system processed 15 TB and 2 billion flows per day on a single commodity server.
-
Even with tls-auth/tls-crypt HMAC protection making OpenVPN servers nominally 'probe-resistant' (silent to unauthenticated clients), the framework fingerprints servers via TCP-level timing side channels: a complete 16-byte client-reset probe triggers an immediate connection drop (HMAC validation fails after full packet reassembly), while a 15-byte truncated probe causes the server to stall awaiting the final byte until a server-specific handshake timeout expires. Over 97% of non-OpenVPN endpoints have RST thresholds below 500 or above 4,000 bytes, versus OpenVPN's characteristic 1,550–1,660 bytes derived from default MTU configurations.
-
OpenVPN's unencrypted opcode header byte is exploited to fingerprint vanilla and XOR-obfuscated flows: the XOR patch specification excludes the first buffer byte (the opcode) from reversal, so opcodes are always XOR-ed with the same key byte and map deterministically to fixed ciphertext values. All 4 of the top-5 VPN providers that offer obfuscated services use XOR-based obfuscation, and all were flagged by opcode fingerprinting over 90% of the time.
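Because the opcode byte is always XOR-ed with the same key byte, a censor can brute-force all 256 key bytes and check whether the decoded opcodes form a plausible OpenVPN flow. A minimal sketch, assuming `first_bytes` holds the first payload byte of each packet (after any TCP length prefix) and that the flow starts with a client hard reset; the plausibility rule is our simplification. OpenVPN packs the opcode into the high 5 bits and the key_id into the low 3 bits.

```python
# OpenVPN opcodes live in the high 5 bits of the first byte; the low 3 bits are key_id.
P_CONTROL_HARD_RESET_CLIENT_V2 = 7
P_CONTROL_HARD_RESET_CLIENT_V3 = 10
VALID_OPCODES = set(range(1, 11))

def candidate_xor_keys(first_bytes):
    """Key bytes under which packet 0 decodes to a client hard reset and every
    packet decodes to a defined opcode. Key 0 means an unobfuscated flow."""
    keys = []
    for k in range(256):
        ops = [(b ^ k) >> 3 for b in first_bytes]
        if (ops[0] in (P_CONTROL_HARD_RESET_CLIENT_V2, P_CONTROL_HARD_RESET_CLIENT_V3)
                and all(op in VALID_OPCODES for op in ops)):
            keys.append(k)
    return keys
```

Over a flow of many packets, random-looking first bytes rarely all decode to valid opcodes under a single key, so a non-empty result is a strong signal.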
-
Censoring middleboxes respond to non-compliant TCP sequences because they must handle asymmetric routing and cannot rely on observing both sides of a connection. The ⟨SYN; PSH+ACK⟩ sequence elicited responses from 69.6% of 184 tested censoring middleboxes with a maximum amplification of 7,455×, while a lone PSH+ACK with no prior handshake elicited responses from 33.2% of middleboxes.
-
Anycast CDN architecture dominates popular web content delivery: in the US, 59% of Alexa top-1k websites use anycast CDNs vs. 19% DNS-based; in Saudi Arabia, 57% use anycast CDNs. IP geolocation databases such as Maxmind are severely inaccurate for anycast infrastructure — reporting only <15% of Saudi Alexa websites as in-country vs. 90% measured by RTT-based multilateration — causing prior research to incorrectly attribute "nation-state hegemony" over developing-country Internet traffic.
-
CacheBrowser and CDNReaper require clients to contact foreign CDN front-end IPs directly, but this only works for DNS-based CDNs; anycast CDNs use the same IP globally, so bypassing local DNS still routes the client to a local front-end. Only approximately 11% of Alexa top-1k websites use DNS-based CDNs across the five tested countries, and for potentially blocked sites (Citizen Lab lists), CacheBrowser can access only ~18% of 2,769 blocked URLs in Brazil.
-
CDN infrastructure causes 61%–92% of country-specific Alexa top-1k websites to be hosted within the client's own country across India, Iran, Saudi Arabia, Brazil, and the US, as measured by the authors' R-CBG multilateration technique achieving >89% accuracy. This traffic localization means web requests to popular sites rarely cross national borders, undermining the foundational assumption of decoy routing, domain fronting, CacheBrowser, and CovertCast.
-
Conjure's initial registration step requires the client to connect to an overt website hosted outside the censor's jurisdiction before deriving the unused IP address for actual decoy routing, but CDN traffic localization means this bootstrap connection frequently terminates at a local front-end and never crosses the border. The paper finds that for India's Alexa top-100 sites, only 23 websites had any parallel (leaf) HTTP connections terminating outside the country, with a median of just 3 such external leaf connections per site.
-
Variable-length sampling (Adaptation 2) achieves a provably secure but impractical encoding: a 16-byte plaintext encoded with GPT-2 requires 502–2994 tokens, produces 2.3–13.6 KiB of stegotext (149×–870× overhead), and takes 42–765 seconds even with GPU acceleration, depending on security parameter k=16–128.
-
Classical public-key steganography (Algorithm 1 from [54]) has a 100% failure rate when encoding a 16-byte message using GPT-2, because GPT-2's per-token entropy drops near zero frequently and standard rejection sampling cannot find an acceptable token. Entropy bounding reduces failure to 0–10% but introduces detectable statistical bias: selected tokens come from a visibly different probability distribution than baseline samples.
-
Meteor encodes bits by embedding a PRG-masked random value into the token-sampling randomness of a generative model, recovering bits proportional to the shared prefix length of the sampled interval. Expected throughput per sampling event is asymptotically within 1/2 of the Shannon entropy of the channel (proven in Appendix A), so Meteor automatically adapts to high entropy variability without explicit signaling or padding.
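The prefix-sharing mechanism can be sketched on an integer sampling range. This is a simplified illustration, not Meteor's reference implementation: a fixed toy distribution replaces the language model's per-step distribution, `random.Random` stands in for a keyed cryptographic PRG (it is not secure), the precision `B = 32` is an assumption, and the function names are hypothetical. Each token conveys exactly the leading bits on which its interval's endpoints agree, so the number of bits per token adapts to the distribution automatically.

```python
import random
from bisect import bisect_right
from itertools import accumulate

B = 32  # bits of sampling precision (assumption)

def interval_uppers(probs):
    """Exclusive upper bound of each token's interval on the integer range [0, 2^B)."""
    cum = list(accumulate(probs))
    uppers = [round(c / cum[-1] * (1 << B)) for c in cum]
    uppers[-1] = 1 << B
    return uppers

def token_interval(uppers, i):
    lo = uppers[i - 1] if i else 0
    return lo, uppers[i] - 1  # inclusive bounds

def shared_prefix_len(lo, hi):
    """Number of leading bits (out of B) on which lo and hi agree."""
    return B - (lo ^ hi).bit_length()

def bits_to_int(bits):
    v = 0
    for b in bits:
        v = (v << 1) | b
    return v

def meteor_encode(message_bits, probs, seed):
    prg = random.Random(seed)  # stand-in for the shared keyed PRG
    uppers = interval_uppers(probs)
    tokens, pos = [], 0
    while pos < len(message_bits):
        chunk = message_bits[pos:pos + B]
        chunk = chunk + [0] * (B - len(chunk))         # pad the final chunk
        r = bits_to_int(chunk) ^ prg.getrandbits(B)    # masked value is uniform
        i = bisect_right(uppers, r)                    # token whose interval holds r
        tokens.append(i)
        pos += shared_prefix_len(*token_interval(uppers, i))  # bits actually conveyed
    return tokens

def meteor_decode(tokens, probs, seed, nbits):
    prg = random.Random(seed)
    uppers = interval_uppers(probs)
    out = []
    for i in tokens:
        mask = prg.getrandbits(B)
        lo, hi = token_interval(uppers, i)
        n = shared_prefix_len(lo, hi)
        prefix = (lo ^ mask) >> (B - n)  # top n bits of r, with the mask removed
        out += [(prefix >> (n - 1 - j)) & 1 for j in range(n)]
    return out[:nbits]
```

A token whose interval straddles the midpoint of the range conveys zero bits, while a narrow interval near an edge can convey several, which is the adaptive behavior described above.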
-
OUStralopithecus (OUStral), a Selenium-based OUS implementing empirically-derived human browsing distributions — Weibull dwell times (λ=30s, k=0.75), Von der Weth action probabilities (45.1% internal-link clicks, 33% new-URL navigations), and Dubroy tab-switching rates — generated 471 requests with all Cloudflare Bot Management scores above the recommended blocking threshold of 30, while Slitheen and Waterfall consistently scored 1. Because Cloudflare has full HTTP-layer visibility (unavailable to a passive network censor), the paper argues a censor observing only encrypted traffic would be even less able to flag OUStral.
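The per-page behavior model can be sketched with Python's `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape. Only the Weibull parameters and the two action probabilities come from the text; lumping the remaining 21.9% into an "other" action is an assumption, Dubroy tab switching is not modeled, and `next_step` is a hypothetical name.

```python
import random

DWELL_SCALE_S = 30.0  # Weibull scale λ (seconds), from the text
DWELL_SHAPE = 0.75    # Weibull shape k, from the text
ACTIONS = [("click_internal_link", 0.451),
           ("navigate_new_url", 0.33),
           ("other", 0.219)]  # remaining probability mass, lumped together (assumption)

def next_step(rng: random.Random):
    """Sample how long to stay on the current page and which action to take next."""
    dwell = rng.weibullvariate(DWELL_SCALE_S, DWELL_SHAPE)
    names, weights = zip(*ACTIONS)
    action = rng.choices(names, weights=weights, k=1)[0]
    return dwell, action
```

With shape k < 1 the dwell distribution is heavy-tailed: most visits are short, but occasional long pauses occur, which fixed-interval reloading (as in Waterfall's OUS) never produces.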
-
Traffic replacement systems that only shape individual HTTPS flows remain vulnerable to censors monitoring inter-connection patterns over time. Waterfall's OUS (reloading the same page every second), Slitheen's OUS (naïve PhantomJS with no crawling), and Slitheen++'s OUS all produced non-human connection patterns detectable at the session level even when per-flow content is well-concealed. OUStral addresses this by shaping the distribution and sequencing of connections across an entire browsing session.
-
Prior overt user simulators (OUS) using PhantomJS — including Slitheen, Waterfall, and Slitheen++ — received Cloudflare Bot Management scores of 1 (certainly bot-generated) and would be blocked by any operator following Cloudflare's recommended cut-off of 30. Slitheen++ improved marginally by adding user-agent randomization and brief inter-request pauses, but all PhantomJS-based OUS implementations were trivially detectable as bots.
-
Across tunnelling systems that apply traffic shaping against ML adversaries, a clear throughput cost emerges: Slitheen + OUStral with WebM replacement achieves up to 2.2 Mbps with 4.7x overhead; Protozoa (WebRTC, end-to-end) achieves up to 1.4 Mbps; DeltaShaper (VoIP) achieves only 7 kbps at 2x overhead. By contrast, Conjure (no traffic shaping) reaches 100 Mbps. Additionally, end-to-middle decoy-routing deployments incur a throughput penalty from packet-boundary parsing at the relay station that end-to-end systems (Protozoa, DeltaShaper) avoid.
-
Extending Slitheen to replace WebM video/audio frames reduced mean overhead from ~20x (image-only Slitheen) to 4.7x (±1.6) over 100 ten-minute sessions, while raising throughput to a mean of 581.7 kbps in video-only mode (max 2023.3 kbps, min 78.2 kbps) and 721.6 kbps in background-video mode (max 1528 kbps). This compares favorably to DeltaShaper's 2x overhead at only 7 kbps and Protozoa's up to 1.4 Mbps, while preserving Slitheen's resistance to traffic-analysis attacks.
-
Balboa's covert signaling protocol derives per-connection keys as KDF(TLS_master_secret ∥ pre_shared_secret) and signals by XOR-ing the MAC of a TLS Application Data record with this derived key. Because the master secret is ephemeral, the scheme inherits TLS forward secrecy: unlike Telex-based signaling, which modifies the Client Random, a future server compromise cannot retroactively identify which historical connections used Balboa. A censor impersonating a client also has only a negligible probability of guessing the modified MAC without the pre-shared secret.
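The signaling step can be sketched as follows. The source does not name the KDF, so this sketch assumes HKDF-SHA256 (RFC 5869) with a hypothetical `balboa-signal` info label; raw byte strings stand in for the TLS master secret and for a record's MAC, which Balboa obtains from inside the TLS library.

```python
import hashlib
import hmac


def derive_signal_key(master_secret: bytes, pre_shared_secret: bytes, length: int) -> bytes:
    """KDF(master_secret || pre_shared_secret), instantiated as HKDF-SHA256 (assumption)."""
    ikm = master_secret + pre_shared_secret
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()  # HKDF-Extract, default salt
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # HKDF-Expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + b"balboa-signal" + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (the key is derived at the MAC's length)."""
    return bytes(x ^ y for x, y in zip(a, b))


def mark_record_mac(mac: bytes, key: bytes) -> bytes:
    """Client side: replace the record's MAC with MAC XOR key to signal Balboa use."""
    return xor_bytes(mac, key)


def is_balboa_signal(observed_mac: bytes, expected_mac: bytes, key: bytes) -> bool:
    """Server side: the record carries the signal iff observed == expected XOR key."""
    return hmac.compare_digest(observed_mac, xor_bytes(expected_mac, key))
```

A record whose MAC matches the plain value is ordinary traffic; one that matches the XOR-ed value carries the signal. Without the pre-shared secret, an impersonating client cannot compute the key and so cannot produce the XOR-ed MAC.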
-
Balboa runs unmodified application binaries on standard inputs, intercepting TLS via dynamic library injection (LD_PRELOAD / DYLD_INSERT_LIBRARIES) to replace plaintext with covert data while preserving all TLS record lengths and non-timing characteristics. This yields goodput of 145 kbps for audio streaming and up to 8 Mbps for web browsing, versus 2.56 kbps for DeltaShaper and 19 kbps for Freewave, both of which run real applications on non-standard inputs.
-
Balboa currently supports only TLS 1.2 stream cipher suites, covering approximately 81% of TLS connections; an active censor can force non-stream cipher suite negotiation, causing Balboa to silently enter pass-through mode—a potential denial-of-service vector. Separately, if the server's traffic model deviates from the local baseline (e.g., the same audio file streamed repeatedly), a sufficiently powerful censor can detect the anomaly independently of whether Balboa is running.
-
A random-forest classifier trained on TCP statistics distinguishes Balboa-enabled traffic from baseline with 66–84% accuracy at zero network latency (key features: average TCP window advertisement and data transmit time), but accuracy falls to near-random (50–57%) once realistic latency is introduced (≥5 ms mean). Adding four additional innocent clients to the classification task further reduces accuracy—e.g., VLC at zero latency drops from 84% to 66%.
-
By extracting TLS session keys through library debugging hooks (SSLKEYLOGFILE for GnuTLS/NSS/Rustls; an injected SSL_new() callback for OpenSSL) rather than reimplementing the TLS handshake, Balboa leaves the ClientHello entirely untouched. This prevents the class of fingerprinting attacks documented by Frolov and Wustrow that identified meek and similar tools via observable differences in cipher-suite ordering and TLS extension patterns, while remaining compatible with OpenSSL, GnuTLS, NSS, and Rustls without requiring application source-code modifications.
-
Large-file transfers via Camoufler (using Telegram as the IM channel) show modest overhead compared to direct wget: a 10 MB file takes 13.6s vs. 7.9s direct, 50 MB takes 52.1s vs. 35s, and 100 MB takes 93.3s vs. 68s. The overhead stems from the server downloading the complete file before forwarding it, but performance still substantially exceeds prior tunneling systems such as SWEET (email-based) and CovertCast (video-based), which the authors describe as incurring >10s even for small webpage loads.
-
Camoufler defeats active probing of its server endpoints by keeping server IM IDs private (shared only out-of-band with trusted clients) and configuring the server to respond only to those trusted IDs. An adversary systematically probing IM IDs to find Camoufler servers would receive no response from the server, making enumeration futile. When E2M-encrypted IM providers could collude with a censor, an additional application-layer key exchange (DH with RSA-wrapped ephemeral key, AES-256, PFS via key deletion) prevents the provider from revealing plaintext even under coercion.
-
Traffic analysis comparing Camoufler clients (fetching blocked websites) to regular IM clients (exchanging multimedia) shows indistinguishable packet-exchange rates and packet-size distributions: a 1.3 MB document download via Camoufler peaked at >700 packets/s, matching the >800 packets/s spike from a 1.5 MB video download by a regular IM client. Packet sizes cluster identically in two bins (<100 bytes for ACKs; >1,200 bytes for data) regardless of whether the underlying content is a web page or a video.
-
Camoufler's blocking-resistance relies on collateral-damage economics: IM platforms had ~2.5 billion active users as of January 2019 (projected >3 billion by 2022) and are embedded in essential business and commercial operations (airline e-tickets, professional collaboration tools). Blocking all IM to disrupt Camoufler would require the censor to harm its own economy; the threat model requires only that the censor permits at least one IM platform, in which case Camoufler remains operational.
-
Camoufler tunnels censored web traffic through real Instant Messaging applications (Signal, Telegram, WhatsApp, Slack, Skype), achieving a median page-load time of 3.6s (average 4.1s) over Signal and 2.3s median (average 2.7s) over Telegram for Alexa top-1,000 sites — compared to 120s for CovertCast loading BBC News and only 2.56 Kbps throughput for DeltaShaper. Over 90% of TTFB trials across 10 popular sites completed under 2s, with 50% under 1s.
-
The authors developed 'Aladdin,' a 10-step OONI-based measurement experiment that isolates SNI-based blocking (step 1), Host-header blocking (step 2), DNS injection (step 3), system-resolver vs. DoH discrepancy (steps 4–5), TLS interception (steps 6–8), and TLSv1.3-specific SNI dependency (step 10); this methodology exposed Vodafone's Allot TLS interception that OONI's Web Connectivity test had recorded only as a generic certificate error.
-
Spain's blocking infrastructure, initially mandated for copyright and gambling enforcement, was repurposed to block 24 unique Catalan referendum URLs during October 2017, including the IPFS gateway and two GitHub Pages domains. GitHub Pages was blocked only via DNS manipulation (pointing to 127.0.0.1) rather than HTTP blocking specifically to avoid collateral blocking of all of GitHub.
-
Analyzing over 3 million OONI network measurements (2016–2020) from 17 ASes covering 98.45% of broadband and 90.94% of mobile subscribers in Spain, the study detected 16 unique blockpages, 2 DPI vendors (Fortinet/Fortigate in Telefonica; Allot in Vodafone), and 78 blocked websites across copyright, political, civil-rights, and referendum categories.
-
DPI blocking by Spanish ISPs (Fortinet/Telefonica) was circumvented by inserting a tab escape character (\t) into HTTP GET request headers, or by delaying HTTP GET transmission — the same techniques reported to have bypassed DPI blocking of Catalan referendum sites in 2017. Both techniques exploited the DPI's shallow, stateless inspection of the opening HTTP request.
-
Vodafone (AS12357, AS12430, AS6739) deployed Allot-based TLS interception to block womenonweb.org: the system resolver returned a legitimate IP (67.213.76.19), but connecting to it triggered a forged certificate signed by Allot; disabling TLS certificate validation fetched the Vodafone blockpage, confirming a man-in-the-middle box rather than a redirect. OONI's standard Web Connectivity test recorded only a generic ssl_error:certificate verify failed and missed this entirely.
-
Active-probing censors who discover a shadow domain can be defeated by adding a CDN rule that only fetches from the blocked back-end when a secret custom request header is present; without it the CDN returns an innocuous response. Layering domain fronting over domain shadowing (DfDs) further hides the shadow domain by routing the initial request through an allowed front domain with the Host header set to the shadow domain, so the censor never sees the shadow domain in the SNI or DNS query even during active inspection.
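A sketch of the secret-header gate as plain Python logic, assuming a CDN rule that can branch on a request header. The header name, secret, back-end, and decoy body are hypothetical placeholders; a real deployment would express this rule in the CDN's own configuration language, where header-name matching is case-insensitive.

```python
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict

SECRET_HEADER = "X-Shadow-Key"        # hypothetical header name
SECRET_VALUE = "shared-out-of-band"   # hypothetical secret
BLOCKED_BACKEND = "blocked.example"   # hypothetical blocked back-end
DECOY_BODY = "<html><body>Under construction</body></html>"


@dataclass
class Request:
    host: str
    headers: Dict[str, str] = field(default_factory=dict)


def edge_rule(req: Request, fetch_backend: Callable[[str, Request], str]) -> str:
    """Proxy to the blocked back-end only if the secret header is present and correct."""
    token = req.headers.get(SECRET_HEADER, "")
    if hmac.compare_digest(token.encode(), SECRET_VALUE.encode()):
        return fetch_backend(BLOCKED_BACKEND, req)
    # An active probe without the header only ever sees the innocuous decoy.
    return DECOY_BODY
```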
-
Of 6 major CDNs surveyed (Google Cloud CDN, AWS CloudFront, Azure CDN, Fastly, Cloudflare, StackPath), 5 support full API automation of the three steps required for domain shadowing: setting the front-end, setting the back-end, and rewriting the Host header. Cloudflare restricts Host header rewriting to enterprise-tier accounts only, making it unsuitable without paid upgrade. All six CDNs allow arbitrary back-end domain binding by design, and all back-end DNS CNAMEs can be indirected to evade any CDN-side blocklist of popular domains.
-
In 200-request latency experiments, all five CDN providers used for domain shadowing yielded lower round-trip times than directly fetching from the origin server; Azure, Fastly, and StackPath showed median delays less than half those of direct visits. User-configured VPS HTTP proxies — including a powerful AWS t3a.2xlarge instance (8 vCPU, 32 GB RAM) — still underperformed CDN-based domain shadowing.
-
Google Cloud CDN and Amazon CloudFront disabled domain fronting by 2021 by enforcing SNI/Host header consistency, causing Tor Meek, Psiphon, Lantern, and Signal to halt or migrate their domain-fronting deployments. Domain shadowing avoids this failure mode entirely because it does not rely on the SNI/Host mismatch that CDNs were able to patch with a simple header equality check.
-
Domain shadowing makes all three traffic indicators (connecting URL, SNI, and Host header) appear to belong to an allowed shadow domain while fetching content from a blocked back-end domain via the CDN. Unlike domain fronting, it exploits a legitimate CDN feature (arbitrary back-end binding) rather than an SNI/Host mismatch quirk, so CDNs cannot disable it by enforcing header consistency without breaking legitimate use cases such as third-party service outsourcing via CNAME. The technique was demonstrated successfully accessing www.facebook.com from a heavily censored country.
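The difference between the two techniques under an SNI/Host consistency check can be modeled in a few lines. The domain names and the `BACKEND_BINDINGS` table are hypothetical; they stand in for a CDN's back-end binding plus Host-header rewrite.

```python
from dataclasses import dataclass

# Hypothetical CDN configuration: the shadow domain is bound to the blocked back-end.
BACKEND_BINDINGS = {"shadow.example": "blocked-site.example"}


@dataclass
class EdgeRequest:
    sni: str   # from the TLS ClientHello; visible to an on-path censor
    host: str  # HTTP Host header; inside TLS, visible only to the CDN


def passes_consistency_check(req: EdgeRequest) -> bool:
    """The kind of check that ended domain fronting: SNI must equal Host."""
    return req.sni.lower() == req.host.lower()


def origin_for(req: EdgeRequest) -> str:
    """Route by Host, then apply any configured back-end binding (Host rewrite)."""
    host = req.host.lower()
    return BACKEND_BINDINGS.get(host, host)


fronted = EdgeRequest(sni="allowed-front.example", host="blocked-site.example")
shadowed = EdgeRequest(sni="shadow.example", host="shadow.example")
```

The check rejects the fronted request because its SNI and Host differ, while the shadowed request is internally consistent and reaches the blocked origin only through configuration the censor cannot see.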
-
Protozoa's covert channel throughput degrades gracefully under bandwidth constraints but remains usable for common applications: average throughput is 975 Kbps at 1500 Kbps cap, 460 Kbps at 750 Kbps, and 91 Kbps at 250 Kbps. Under 2% and 5% packet loss the channel sustains 1130 Kbps and 360 Kbps, respectively, while 10% loss (near WebRTC tear-down threshold) still yields 160 Kbps without breaking the connection. Traffic analysis resistance is preserved across all these conditions, with AUC peaking at 0.65.
-
CRON restricts multi-hop covert circuits (N≥1 relays) to delay-tolerant traffic only, because establishing multiple simultaneous WebRTC video calls is 'highly atypical in normal user profiles' and would trigger S1 behavioral anomaly detection. Real-time interactive tunneling is limited to direct circuits (N=0) within pre-existing calls, and active mode introduces only bounded variability in call times and frequency to stay within plausible user-profile ranges.
-
Protozoa creates a ≈1.4 Mbps covert channel over WebRTC by replacing encoded video frames with covert payload while preserving SRTP packet size and timing properties, making Protozoa flows 'hardly distinguishable from unmodified WebRTC streams using existing ML-based traffic classifiers.' Since all unencrypted packet fields remain intact, DPI cannot detect the tunnel either.
-
Even when individual WebRTC flows pass traffic analysis, a censor can identify CRON users via three long-term statistical attack types: S1 (simultaneous video calls, atypical for normal users), S2 (sudden connections to previously unknown parties), and S3 (calls at anomalous times, frequencies, or durations). Relay nodes in multi-hop circuits are particularly exposed via S1 because conducting multiple simultaneous video calls is highly atypical in normal user profiles.
-
Slitheen++ achieves a median covert site loading time of 7 seconds in the naive setup, rising to 8 seconds with crawling and 13 seconds with a 1-second thinking-time (TT) delay. The Baseline-to-Covert factor ranges from 3.7–8.5 without TT and from 7.6–21.4 when crawling and 1-second TT are combined, reflecting the fundamental tradeoff between stealth overt behavior and covert throughput.
-
Slitheen++ embeds covert upstream data by applying HTTP/2-like header field compression to overt HTTP requests, using the recovered space for covert data placement. This ensures that neither timing information nor observable changes to packet sizes or delays can reveal decoy routing use to an omniscient passive censor. GZIP compression was explicitly avoided to prevent the CRIME side-channel attack.
-
Slitheen++'s relay station introduces minimal overt forwarding overhead: 95% of setups saw downstream per-packet delays between 1 ms and a maximum of 4 ms, with on average only 0.0029% of downstream packets affected (peak 0.006% in any single scenario). Upstream delays were similarly low except for a single outlier near 60 ms caused by thread contention during crawling-induced relay load spikes.
-
The original Slitheen appended covert upstream data directly to overt HTTP requests, significantly changing upstream traffic patterns and enabling censor identification even when traffic is encrypted. This upstream traffic analysis vulnerability—absent from Slitheen's original threat model—is the primary weakness Slitheen++ addresses.
-
A censor can identify Slitheen relay connections by observing that all packets in a suspected overt flow arrive in strict order while flows from the same source naturally exhibit out-of-order delivery: the relay station's traffic-server component reorders TCP segments to enable TLS record decryption, creating a statistically anomalous per-connection ordering pattern. The reordering buffer also increases per-packet round-trip times, providing a secondary timing signal.
-
Geddes et al. demonstrated that acknowledgement packets in covert-channel circumvention systems can be identified through timing characteristics and selectively interfered with to disrupt the tunnel [§4.3, CCS 2013]. A Turbo Tunnel session layer adds fixed-overhead headers and periodic ACK/keepalive traffic that may produce distinctive timing patterns absent in legitimate flows, potentially increasing susceptibility to traffic-shape classifiers.
-
Manually crafted decision trees combining probe non-response, FIN/RST close type, and connection timing achieved a false-positive rate below 0.001% for obfs4, Lampshade, Shadowsocks, and OSSH across 1.9 million endpoints; for OSSH specifically, developers confirmed that 7 of 8 flagged Tap endpoints were genuine Psiphon proxies. MTProto was the sole exception, producing 3,144 false positives (0.56% of Tap, 0.02% of ZMap) because its infinite-timeout behavior is shared by a non-negligible population of common hosts.
-
Endpoints that never close a connection and never respond to any probe ('infinite timeout') represent 0.7% of the ISP Tap dataset and 42% of the ZMap active-scan dataset; this is the single most common probe-indifferent behavior in both datasets. MTProto already exploits this: its strategy of keeping failed connections open indefinitely produces the highest false-positive rate (0.56% of Tap) among all tested protocols, making it effectively impossible to single out at acceptable levels of collateral damage.
-
The authors' ISP Tap dataset yielded 129,000 unique response sets across 433,286 endpoints while ZMap's 1.5 million endpoints produced only 31,000 unique sets — with over 42% of ZMap endpoints behaving identically (infinite timeout, no data) due to firewall chaff. This vantage-point bias means the effective false-positive rate a censor faces when targeting ISP-observed traffic is ~28× lower than against random scans (0.02% vs 0.56% for MTProto), making ISP-scale active probing far more actionable than Internet-wide scanning alone.
-
Across 433,286 endpoints from a 10 Gbps university ISP passive tap, 94% responded with data to at least one of 8 protocol probes (TLS, HTTP, STUN, S7, Modbus, DNS-AXFR, random bytes, empty); all five tested probe-resistant proxies (obfs4, Lampshade, Shadowsocks, MTProto, OSSH) never responded with data to any probe. This single filter reduces the suspect set from 433,286 to ~26,000 endpoints and rules out 94% of ISP-observed hosts as non-proxies with zero false negatives against the tested protocols.
-
Each probe-resistant proxy exposes a unique TCP close-threshold fingerprint: obfs4 closes with FIN at 8,192–16,384 bytes and RST at the next multiple of 1,448 bytes beyond that; Lampshade at FIN 256 bytes / RST 257 bytes; Shadowsocks-python and -outline both at FIN 50 bytes (outline also RST at 51); OSSH at FIN 24 bytes / RST 25 bytes. A binary-search tool using random probes can discover these thresholds remotely without knowing any shared secret, providing a protocol-specific fingerprint independent of payload content.
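A sketch of the threshold search, assuming a hypothetical `probe(n)` oracle that sends n random bytes on a fresh connection and reports how the server ended it. Here `simulated_server` replaces real endpoints, with thresholds taken from the values above; the search assumes responses only move from timeout toward FIN and then RST as probes grow.

```python
from typing import Callable, Optional, Tuple

# Oracle: send n random bytes on a fresh TCP connection and report how the server
# ended it: "FIN", "RST", or "TIMEOUT" (no close within the wait window).
Probe = Callable[[int], str]


def min_satisfying(pred: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Smallest n in [lo, hi] with pred(n) True, assuming pred flips False->True once."""
    if not pred(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def close_thresholds(probe: Probe, max_bytes: int = 1 << 16) -> Tuple[Optional[int], Optional[int]]:
    """Return (bytes at which the server first closes at all, bytes at which it first RSTs)."""
    first_close = min_satisfying(lambda n: probe(n) != "TIMEOUT", 0, max_bytes)
    first_rst = min_satisfying(lambda n: probe(n) == "RST", 0, max_bytes)
    return first_close, first_rst


def simulated_server(fin_at: int, rst_at: Optional[int]) -> Probe:
    """Stand-in for a real endpoint that closes after fin_at bytes and resets after rst_at."""
    def probe(n: int) -> str:
        if rst_at is not None and n >= rst_at:
            return "RST"
        if n >= fin_at:
            return "FIN"
        return "TIMEOUT"
    return probe
```

Each threshold costs about 18 connections (one upper-bound check plus roughly 17 halvings of a 64 KiB range), far fewer than a linear sweep.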
-
MIMIQ requires modifications only to a single trusted network (an ISP or enterprise AS): an address allocation server and several critical edge switches. Remote QUIC servers and the wider Internet require no changes. The authors argue ISPs have financial incentives to deploy MIMIQ as a privacy-enhancing service, and that QUIC's rapid adoption (600K+ QUIC-enabled domains, 1.6M QUIC-capable IPs as of 2020) means server-side support is increasingly given.
-
At a round-trip time of 50 ms, MIMIQ incurs only a 10% throughput drop when migrating every 25 packets (frequency = 4 per 100 packets). Per-migration latency ranges from 7–64 ms at 10 ms RTT and 43–99 ms at 50 ms RTT as migration frequency scales from 0 to 50 migrations per 100 packets.
-
MIMIQ leverages QUIC's connection migration to change a client's IP address mid-connection without disrupting ongoing transfers. QUIC's specification requires that endpoints never reuse connection IDs during migration and that migration tokens are encrypted; this makes migration events indistinguishable from a second distinct client initiating a new connection from a fresh IP address.
-
Against censors that detect blacklisted application protocols by examining only the first 30 packets of a flow (e.g., the technique in Wang et al. 2015), a single IP migration after 30 packets have been exchanged is sufficient to defeat detection while incurring minimal performance overhead—the client continues the connection normally on the new address.
-
Migrating the client IP address every 25–100 packets reduces state-of-the-art website fingerprinting attack accuracy to below 10% in the closed-world setting, outperforming advanced dedicated defenses such as HyWF multipathing. The mechanism works because most fingerprinting classifiers rely on as many packets per flow as possible, and flow splitting degrades feature quality.
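A sketch of flow splitting from an observer's point of view, assuming migrations occur after a uniformly random 25 to 100 packets (the source gives the range, not the distribution). Each resulting sub-flow is what an observer that cannot link migrations would hand to a fingerprinting classifier. The function name is hypothetical.

```python
import random
from typing import List, Sequence


def split_into_subflows(packets: Sequence, min_gap: int = 25, max_gap: int = 100,
                        seed: int = 0) -> List[list]:
    """Cut a packet sequence where the client would migrate to a new IP address."""
    rng = random.Random(seed)
    subflows: List[list] = []
    i = 0
    while i < len(packets):
        step = rng.randint(min_gap, max_gap)  # packets sent before the next migration
        subflows.append(list(packets[i:i + step]))
        i += step
    return subflows
```

A 1,000-packet page load becomes at least ten unrelated-looking flows of at most 100 packets each, so the classifier never sees the full trace.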
-
MoneyMorph's threat model exploits the economic cost of blocking entire cryptocurrency networks: the censor is left with a binary choice — ban the full blockchain (incurring economic harm to the censored region) or allow all transactions including covert bootstrapping traffic. This assumption is grounded in the censor's observed tolerance of Bitcoin despite known circumvention use.
-
A prototypical Python implementation of MoneyMorph completes all cryptographic operations in under 50 milliseconds on a commodity Intel Core i7 (2.2 GHz, 16 GB RAM): fresh key-pair generation takes approximately 120ms, shared key derivation approximately 41ms, and symmetric encryption/decryption under 1ms. The dominant latency in practice is blockchain confirmation time, not computation.
-
MoneyMorph provides provable chosen-covertext attack security (SBS-CCA) for proxy bootstrapping, unlike prior email or social-media rendezvous approaches which offer only heuristic security. Under SBS-CCA, the censor's advantage in distinguishing a covertext-bearing transaction from a random transaction in the same space is negligible.
-
Sibling transaction analysis across 45 million Bitcoin transactions (blocks 580,000–600,000, June–Oct 2019) shows 32% use the Pay2PKeyHash + Pay2ScriptHash combination MoneyMorph employs. In Monero, the two-input two-output structure matches 42% of all transactions. In Zcash, only 11–19% of transactions are shielded, giving it the lowest sibling rate despite the highest bandwidth.
-
Zcash shielded transactions provide the highest per-transaction bandwidth of any tested cryptocurrency: 1148 bytes for the challenge covertext and 1168 bytes for the response, at a transaction fee of less than 0.01 USD. Bitcoin yields only 20/40 bytes at $0.34 fee and Ethereum only 20 bytes at $0.18 fee.
-
Protocol Proxy uses 'protected static protocols' — UDP-based protocols whose blocking causes severe collateral damage (e.g., Synchrophasor power-grid traffic, NTP) — as cover channels. Because any detection rule that fires on Protocol Proxy traffic also fires on legitimate PMU traffic, censors face a forced trade-off between blocking circumvention and disrupting critical infrastructure.
-
A deterministic Hidden Markov Model trained on 770,000+ real Synchrophasor samples produces interpacket timing that is statistically indistinguishable from the host protocol: the two-sample Kolmogorov–Smirnov test yields p = 0.21 (threshold 0.05, fail to reject null), and χ² homogeneity p-values for all three timing states are 0.82, 0.37, and 0.15 respectively.
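The two-sample test can be sketched without external libraries; in practice `scipy.stats.ks_2samp` does the same job. This sketch computes D = sup|F_a(x) − F_b(x)| over the pooled samples and an asymptotic p-value from the Kolmogorov distribution series with the common (√n_e + 0.12 + 0.11/√n_e) small-sample correction, a choice of this sketch rather than necessarily the authors'.

```python
import bisect
import math
from typing import List


def ks_statistic(a: List[float], b: List[float]) -> float:
    """D = max |F_a(x) - F_b(x)|; the maximum is attained at a pooled sample point."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d: float, n: int, m: int) -> float:
    """Asymptotic p-value: Q(lam) = 2 * sum_{j>=1} (-1)^(j-1) exp(-2 j^2 lam^2)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here and Q(lam) is ~1 anyway
    total = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam) for j in range(1, 101))
    return max(0.0, min(1.0, total))
```

Comparing generated and real interpacket gaps, a p-value above 0.05 (as with the reported p = 0.21) means the test fails to reject the hypothesis that both samples come from the same distribution.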
-
Observation-based FTE constructs each packet field exclusively from values previously observed in real host-protocol traffic, guaranteeing syntactic equivalence. Wireshark correctly decodes Protocol Proxy-generated packets as valid Synchrophasor frames with correct checksums, and the Phasor Data Concentrator hardware accepts them; any rule blocking Protocol Proxy traffic must therefore also block legitimate PMU packets.
-
The Protocol Proxy achieves an observed goodput of only 182 bps against a 54 Mbps baseline link (>99.99% reduction), well below the theoretical ceiling of 15,477 bps; the gap is attributed to TCP retransmission overhead and the TCP header transiting the proxy. By comparison, Tor's baseline goodput was measured at 7.31 Mbps.
-
Static protocols — UDP-based with no application-layer handshake — are immune to stateful protocol analysis that defeated SkypeMorph: without a handshake state machine, a censor cannot flag discrepancies between observed and expected protocol states. This eliminates the detection vector that Houmansadr et al. (2013) exploited to identify SkypeMorph via handshake mismatch.
-
Censored Planet collected 21.8 billion measurements over 20 months from more than 95,000 vantage points in 221 countries, covering 66–173 more countries than OONI and ICLab, with a median of 8 ASes per country versus OONI's 4 and ICLab's 1. In March 2020, it achieved coverage of 9,014 ASes compared to OONI's 1,915. Censored Planet and OONI together covered all 21 countries rated 'Not Free' by Freedom House, while ICLab reached only 4.
-
Of 21.8 billion raw measurements, approximately 7% (1.5 billion) were initially flagged as blocked; iterative HTML clustering and DBSCAN image clustering then removed ~500 million false positives, leaving ~1 billion confirmed blocked measurements. The clustering process formed 457 new response clusters, of which 308 were confirmed blockpages and 149 were false positives, with Cloudflare bot-checks being a notable source of false positives in HTTPS measurements.
-
Mann-Kendall trend analysis at 99% significance on 20 months of data found increasing censorship activity in more than 100 countries, driven primarily by DNS and HTTPS blocking methods, and identified 11 website categories facing rising censorship including human rights content, news media, and provocative attire. Countries such as Norway (ranked #1 in press freedom) showed aggressive DNS blocking across 25 ASes targeting more than 50 domains in at least 6 categories including hrw.org.
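A minimal Mann-Kendall implementation, assuming a per-country time series of monthly censorship rates as input. It ignores the tie correction to the variance (real measurement series often contain ties), so treat it as a sketch; the 99% significance level corresponds to `alpha=0.01`.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def mann_kendall(series: List[float], alpha: float = 0.01) -> Tuple[str, float]:
    """Return (trend label, two-sided p-value) using the normal approximation."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18  # variance of S with no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < alpha:
        return ("increasing" if z > 0 else "decreasing"), p
    return "no trend", p
```

For 20 strictly increasing monthly values, S = 190 and z ≈ 6.1, well past the 99% threshold.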
-
Censored Planet achieves 93% /24 vantage-point continuity and 99.01% AS continuity between weekly scans, versus ICLab's 64% and OONI's 36% AS continuity. Applying bitmap-based anomaly detection on the resulting longitudinal time series detected 15 prominent censorship events over 20 months, two-thirds of which had not been previously reported, while OONI data showed no corresponding increase for most newly discovered events due to sparse volunteer measurements.
-
During the Sri Lanka social-media block following the April 21, 2019 bombings, Censored Planet measured HTTP(S) censorship jumping from 0.1% to 2% in one week and discovered 22 blocked domains versus the 7 reported by NetBlocks and AccessNow; 5 of those extra domains were only present in the Alexa top-sites list, not the Citizen Lab Global Test List. Blocking remained elevated through May 12, 2019, contradicting public reports that the ban was lifted by May 1st.
-
In HTTP tests, more than 50% of filter responses that indicated censorship contained an injected HTML blockpage; the remainder used TCP RST injection or connection timeout. In HTTPS measurements, canonical template matching had a failure rate of only 1.9%, and 95% of Hyperquack measurements completed within 3.5 hours across ~45,000 vantage points.
-
SiegeBreaker's session bootstrapping (from initial email to installed SP redirection rule) averaged 3–4 seconds across 100 trials, with the dominant delay attributed to email handling (SMTP connection, Selenium composition) rather than network latency; this setup cost is not included in the download-time benchmarks. The auxiliary ping-based switch-selection signal encodes 48 bits across three ICMP header fields (IP-ID, ping sequence number, ping identifier), requiring ~281 trillion spoofed ping packets per client–OD pair to brute-force.
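A sketch of packing the 48-bit signal into the three 16-bit header fields, and of the resulting brute-force space. The field order and function names are assumptions; the source names only the fields.

```python
from typing import Tuple

SIGNAL_BITS = 48


def encode_signal(signal: int) -> Tuple[int, int, int]:
    """Split a 48-bit value into (ip_id, icmp_seq, icmp_ident), 16 bits each (order assumed)."""
    if not 0 <= signal < 1 << SIGNAL_BITS:
        raise ValueError("signal must fit in 48 bits")
    return (signal >> 32) & 0xFFFF, (signal >> 16) & 0xFFFF, signal & 0xFFFF


def decode_signal(ip_id: int, icmp_seq: int, icmp_ident: int) -> int:
    """Reassemble the 48-bit value from the three 16-bit header fields."""
    for value in (ip_id, icmp_seq, icmp_ident):
        if not 0 <= value <= 0xFFFF:
            raise ValueError("each field is 16 bits")
    return (ip_id << 32) | (icmp_seq << 16) | icmp_ident


# A blind attacker must try every 48-bit combination per client-OD pair.
SEARCH_SPACE = 1 << SIGNAL_BITS
```

2^48 = 281,474,976,710,656, which is where the ~281 trillion figure comes from.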
-
SiegeBreaker explicitly acknowledges two unresolved attack vectors: (1) latency-based traffic analysis attacks (forced-asymmetry / RAD-style), which the system does not mitigate, and (2) website fingerprinting attacks against the proxied traffic, for which no defense is implemented. Additionally, the email-based control channel is vulnerable to a censor who can delay or block emails to the controller's address, disrupting rule installation before the client's SYN packet arrives.
-
Prior decoy routing deployments suffered severe throughput degradation: the TapDance ISP pilot reported average client throughput of only ~5 KB/s, making it unsuitable for most web content; other DR prototypes restricted evaluation to files under 1 MB in controlled lab settings, with some reporting over 30 seconds to load home pages under 1.5 MB in size.
-
All prior decoy routing systems (Cirripede, Telex, TapDance, Slitheen, Waterfall) require the DR to inspect every traversing flow — either all TCP SYN packets or all TLS flows — to identify DR requests, creating a privacy breach for non-DR users and a computational bottleneck. SiegeBreaker eliminates this by using an out-of-band email pre-registration (encrypted to the controller's 2048-bit RSA public key) that pins the controller's inspection rule to a single client-IP/OD-IP/ISN triple, so only authenticated potential DR flows are ever redirected.
-
SiegeBreaker achieves near-native TCP performance in Internet experiments: average download time for Alexa top-500 home pages via SB was 1.8 s versus 1.7 s for direct wget, across 500 concurrent client instances; bulk downloads of 1 GB files over a shared 1 Gbps link showed SB and native TCP sharing bandwidth almost equally, and throughput remained stable under 15 Gbps of cross-traffic or 50,000 parallel flows on the SDN switch.
-
The first production Refraction Networking deployment used four TapDance stations at Merit Network observing 140 Gbps aggregate capacity and served up to 33,000 unique users per month across 559,000 Psiphon installations, proxying up to 500 Mbps of circumvention traffic during the first year of continuous operation.
-
Snort contains two novel TCP Timestamp discrepancies versus Linux: it omits RFC 7323-mandated timestamp validation on RST packets in SYN_RECV state, and its PAWS TSval acceptance window is 'off by two' — a TSval of 0 or 0xffffffff following a packet with TSval 0x80000000 is accepted by Linux but rejected by Snort, enabling insertion-based evasion by crafting packets that fall in the divergent range.
-
Snort interprets the TCP urgent pointer as the offset to the last byte of urgent data and discards all payload bytes before that offset, while Linux consumes only 1 urgent byte and leaves the remaining payload intact. Injecting a packet with the URG flag and the urgent-pointer offset pointing to an insignificant padding byte allows the full sensitive payload to reach the server while Snort strips it — a novel evasion strategy not previously reported.
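The divergence can be illustrated with a deliberately simplified model that works on a byte index rather than on the TCP header's urgent-pointer field (whose off-by-one semantics are themselves a long-standing source of ambiguity). Both functions are hypothetical models of the behavior described above, not code from Linux or Snort.

```python
def linux_view(payload: bytes, urg_index: int) -> bytes:
    """Model: Linux pulls out one urgent byte (out-of-band) and delivers the rest in order."""
    return payload[:urg_index] + payload[urg_index + 1:]


def snort_view(payload: bytes, urg_index: int) -> bytes:
    """Model of the reported Snort behavior: bytes before the urgent offset are discarded."""
    return payload[urg_index:]


sensitive = b"GET /forbidden HTTP/1.1\r\nHost: example.com\r\n\r\n"
packet = sensitive + b"X"  # insignificant padding byte placed after the real request
urg_index = len(sensitive)  # urgent data points at the padding byte
```

With this layout the server-side model delivers the full request, while the IDS model is left with only the padding byte to inspect.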
-
SymTCP uses selective symbolic execution over Linux's TCP implementation (S2E + KLEE) to enumerate all packet sequences reaching 47 binary-level accept or drop points from LISTEN to ESTABLISHED, then conducts differential testing against a blackbox DPI to confirm discrepancies; the open-sourced system requires no DPI source access and covers 37 of 47 drop points within the operationally relevant handshake window.
-
SymTCP generated 56,787 candidate insertion/evasion packets in approximately one hour using concolic execution over Linux's TCP stack. Evaluating a sampled set of 10,000 test cases against real DPI systems yielded 6,082 evasions against Zeek, 652 against Snort, and 4,587 against the Great Firewall of China — discovering 14 novel evasion strategies beyond those found by prior manual approaches.
-
Conjure achieves 20% lower latency, 14% faster download bandwidth, and over 1400 times faster upload bandwidth compared to TapDance on a 20 Gbps ISP testbed. TapDance upload is throttled to approximately 0.1 Mbps because it must reconnect for every 32 KBytes sent; Conjure maintains a single persistent connection. At the 99th percentile, Conjure is 281 ms (92%) faster than TapDance.
-
For IPv4, Conjure derives both the phantom host IP and TCP port from the client's registration seed, making exhaustive scanning infeasible: a censor enumerating from a /10 of potential client source IPs (4 million addresses) against a /16 of phantom IPs (65K addresses) across all 65K ports would require approximately 50 years at 10 Gbps with ZMap. Phantom hosts are additionally firewalled to respond only to the registering client IP, defeating single-vantage-point ZMap scans.
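A sketch of seed-based phantom selection and of the scan-space arithmetic. Conjure's actual derivation function is not given here, so HMAC-SHA256 with hypothetical labels and the `198.18.0.0/16` prefix are assumptions. The probe rate passed to `scan_years` is also an assumption: at roughly 11.4 million probes per second, the 2^54 probes work out to about 50 years.

```python
import hashlib
import hmac
import ipaddress
from typing import Tuple

PHANTOM_PREFIX = ipaddress.ip_network("198.18.0.0/16")  # hypothetical phantom /16


def derive_phantom(seed: bytes,
                   prefix: ipaddress.IPv4Network = PHANTOM_PREFIX) -> Tuple[ipaddress.IPv4Address, int]:
    """Map a registration seed to (phantom IP, TCP port); the HMAC labels are hypothetical."""
    ip_digest = hmac.new(seed, b"phantom-ip", hashlib.sha256).digest()
    port_digest = hmac.new(seed, b"phantom-port", hashlib.sha256).digest()
    offset = int.from_bytes(ip_digest[:8], "big") % prefix.num_addresses
    port = 1 + int.from_bytes(port_digest[:8], "big") % 65535  # never port 0
    return prefix.network_address + offset, port


def scan_space(client_prefix_len: int = 10, phantom_prefix_len: int = 16,
               ports: int = 1 << 16) -> int:
    """Probes needed to try every (client source IP, phantom IP, port) combination."""
    return (1 << (32 - client_prefix_len)) * (1 << (32 - phantom_prefix_len)) * ports


def scan_years(probes_per_second: float) -> float:
    """Wall-clock years to exhaust scan_space() at the given probe rate."""
    return scan_space() / probes_per_second / (365.25 * 24 * 3600)
```

Both client and station can compute the same phantom from the shared seed, while a censor without the seed faces the full 2^22 × 2^16 × 2^16 = 2^54 search.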
-
IPv6 phantom addresses drawn from an ISP's /32 prefix provide 2^96 potential addresses, making exhaustive enumeration and pre-image attacks computationally infeasible. Analysis of 4013 observed IPv6 addresses in a deployed /32 found approximately 75 bits of entropy (out of a maximum 96), with enough overlap with legitimate address distributions that blocking high-entropy addresses would produce significant collateral damage to real IPv6 services.
-
Conjure registration is unidirectional: the client embeds a steganographic ciphertext tag in a complete HTTPS request payload encrypted under a Diffie-Hellman shared secret, and the station passively observes it without sending any reply or spoofing packets. This design makes registration flows indistinguishable from normal HTTPS traffic and enables 25% more viable registration decoys than TapDance by removing the requirement to exclude decoys with short TCP windows or connection timeouts.
-
Oman and Qatar deploy layered blocking: after a TCP handshake to geti2p.net completes normally, a TCP RST is injected immediately after the TLS ClientHello (SNI-based blocking), while HTTP connections to the mirror site receive injected packets redirecting to explicit national block pages. Kuwait applied only the HTTP mirror block, and only at one of six tested ASes (AS47589, Kuwait Telecommunication Company), with all other Kuwaiti networks leaving I2P fully accessible—illustrating significant ISP-level variation within a single country.
-
An adaptive censor that retrains classifiers on both unmodified and GAN-transformed Meek traffic ('informed NN') partially recovers detection capability: informed NN achieves a PR-AUC of 0.440 against modified traffic versus 0.309 for the naive NN, and achieves FPR of 0.667 versus 1.000 for the naive NN. However, the informed NN suffers from catastrophic interference and performs worse on FPR than the naive classifier on unmodified data (0.545 vs. 0.002).
-
A GAN-based adversarial transformer applied to Meek traffic signatures increases mean classifier FPR from 0.183 to 0.834 and decreases mean area under the precision-recall curve (PR-AUC) from 0.990 to 0.414 across naive neural network, informed neural network, and CART decision tree classifiers evaluated on three geographically distinct datasets (residential, university, AWS).
-
The paper notes a dataset-construction limitation: Meek traffic is compared against average HTTPS traffic across all domains rather than against traffic to the specific CDN fronting host (e.g., ajax.aspnetcdn.com for meek-azure). A transformed signature that mimics generic HTTPS may therefore still look anomalous relative to the traffic expected for that CDN host. Real-world GAN-guided shaping must consequently target host-specific traffic baselines, not population-wide HTTPS baselines.
-
Prior ML classifiers achieve near-perfect detection of unmodified Meek traffic using side-channel features: Wang et al. attain a false positive rate (FPR) as low as 0.0002 with a CART decision tree, Yao et al. achieve 99.98% accuracy with a hidden Markov model, and Nasr et al. deanonymize Meek flows with FPR of 0.0005 using a neural network. The distinguishing features are TCP payload size distributions (Meek concentrates 60–70 byte payloads) and inter-arrival time distributions (higher latency).
-
Incorporating perturbation loss — the mean absolute difference between original and transformed traffic signatures — into the GAN's training objective constrains the transformer to make minimal modifications, reducing the implementation overhead a real-time traffic shaper would require. The perturbation loss is weighted at 10× relative to classification losses, enforcing sparse modifications while still fooling the discriminator.
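The loss combination can be sketched in a few lines. This assumes traffic signatures are plain lists of floats (for example, normalized histogram bins) and that the classification loss arrives precomputed. The function names are hypothetical, and the gradient-based GAN training loop is omitted entirely.
```python
def perturbation_loss(original: list[float], transformed: list[float]) -> float:
    """Mean absolute difference between two equal-length traffic signatures."""
    if len(original) != len(transformed):
        raise ValueError("signatures must have the same length")
    return sum(abs(a - b) for a, b in zip(original, transformed)) / len(original)
def generator_objective(classification_loss: float,
                        original: list[float],
                        transformed: list[float],
                        perturbation_weight: float = 10.0) -> float:
    """Loss the transformer minimises: fool the discriminator while changing little.
    The default weight of 10 mirrors the 10x weighting described above.
    """
    return classification_loss + perturbation_weight * perturbation_loss(original, transformed)
```
Because the perturbation term is weighted ten times more heavily, a transformation that shifts many bins slightly costs more than one that changes a few bins, which pushes the generator toward sparse edits.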
-
Capturing as little as 30 seconds of a multimedia-tunneling flow is sufficient for XGBoost to reach the same AUC achieved with a 60-second window (AUC=0.99 for Facet s=50%, AUC=0.95 for DeltaShaper ⟨320×240, 8×8, 6, 1⟩ at 30s). Classification performance degrades monotonically below 30 seconds, reaching AUC≈0.81 (Facet) and 0.75 (DeltaShaper) at 1 second.
-
CovertCast — which scrapes web content into colored-matrix frames broadcast over YouTube live streams — is fully detected by the χ² similarity classifier with TPR=100% and FPR=2%. The KL and EMD classifiers achieve TPR>96.5%, indicating the system provides essentially no unobservability in practice.
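A χ²-style similarity test over packet-length histograms can be sketched as follows. This is a minimal illustration, not the paper's classifier. The 50-byte bin width, the 0.2 decision threshold, and all function names are hypothetical, and the reference flow stands in for legitimate YouTube traffic.
```python
from collections import Counter
def length_histogram(packet_lengths: list[int], bin_width: int = 50) -> dict[int, float]:
    """Normalised histogram of packet lengths, bucketed into fixed-width bins."""
    counts = Counter(length // bin_width for length in packet_lengths)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}
def chi_square_distance(p: dict[int, float], q: dict[int, float]) -> float:
    """Symmetric chi-square distance between two histograms; 0 when identical."""
    distance = 0.0
    for b in set(p) | set(q):
        pi, qi = p.get(b, 0.0), q.get(b, 0.0)
        if pi + qi > 0:
            distance += (pi - qi) ** 2 / (pi + qi)
    return distance
def looks_covert(flow_lengths, reference_lengths, threshold=0.2) -> bool:
    """Flag a flow whose length histogram is far from a legitimate reference."""
    return chi_square_distance(length_histogram(flow_lengths),
                               length_histogram(reference_lengths)) > threshold
```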
-
Feature importance analysis of XGBoost models reveals that Facet covert channels are identifiable primarily through packets in the 115–195 byte range (dominated by Skype audio packets), while DeltaShaper is identifiable through two distinct packet-length clusters: 85–100 bytes and 1105–1205 bytes. XGBoost assigns non-zero importance to only ~58% of the 300 quantized packet-length bins for Facet and ~42% of 600 bins for DeltaShaper, indicating that leakage is concentrated in a narrow portion of the packet-size distribution.
-
Unsupervised and semi-supervised anomaly detection methods (OCSVM, Isolation Forest, shallow autoencoders) perform near-random when attempting to detect multimedia protocol tunneling: OCSVM achieves average AUC between 0.518–0.584 across all tested configurations, Isolation Forest between 0.519–0.557, and autoencoders reach a maximum AUC of 0.702 only under optimal hyperparameter search. The paper concludes that labeled training data is a hard requirement for effective covert-channel detection.
-
Decision tree classifiers (XGBoost) can flag 90% of Facet multimedia-tunneling traffic while erroneously flagging only 2% of legitimate Skype connections (FPR=2%). Against DeltaShaper at its most conservative configuration (⟨160×120, 4×4, 6, 1⟩), XGBoost achieves AUC=0.85, demonstrating that existing unobservability claims for all three systems (Facet, CovertCast, DeltaShaper) were flawed.
-
Despite I2P's decentralized design, a censor can block more than 95% of peer IP addresses known to a stable I2P client by operating only 10 routers in the network. The censor learns this by passively monitoring the distributed netDb through injected floodfill and non-floodfill nodes, exploiting the fact that I2P's peer-discovery mechanism exposes the near-complete address space to any sufficiently resourced participant.
-
A blocking rate of more than 70% of I2P peer IP addresses is sufficient to cause significant latency in web browsing activities, while blocking more than 90% of peer IP addresses can make the I2P network unusable. The cost to reach the 95% blocking threshold is operating only 10 censor-controlled routers.
-
I2P obfuscates payload content to prevent protocol identification, but flow analysis can still fingerprint I2P traffic because the first four handshake messages between I2P routers have fixed lengths of exactly 288, 304, 448, and 48 bytes. The I2P team acknowledged this and was developing an authenticated key agreement protocol to resist automated identification.
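The fixed-length handshake makes flow fingerprinting almost trivial, as this minimal sketch shows. It assumes the censor has already reassembled TCP and recovered application-layer message sizes in flow order. The input format and function name are hypothetical.
```python
# Lengths of the first four I2P router handshake messages, as reported above.
I2P_HANDSHAKE_LENGTHS = (288, 304, 448, 48)
def matches_i2p_handshake(message_lengths: list[int]) -> bool:
    """Return True if a flow's first four message lengths match the I2P pattern.
    `message_lengths` is assumed to hold reassembled message sizes in the order
    they appeared on the connection, both directions interleaved.
    """
    return tuple(message_lengths[:4]) == I2P_HANDSHAKE_LENGTHS
```
Payload obfuscation does not help here, because the check never looks at message contents.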
-
A simpler but effective complement to IP-list blocking is to block I2P's small set of hardcoded reseed servers. First-time users then cannot fetch the RouterInfos of other peers and are prevented from joining the network at all. Reseed servers are thus a single point of failure for bootstrapping, functionally equivalent to Tor's directory authorities.
-
For 1 MB files, PIR responses reach 73.1 MB per retrieval even for a database of only 50,000 entries, making proof-of-censorship impractical for image or video streaming content providers. By contrast, for 256-byte (Twitter-like) messages the system remains workable at 10 million files, with 8.0 MB queries and 2.0 MB replies. The reply size also stays roughly constant (2.0 MB) between 500k and 10 million files.
-
The proof-of-censorship scheme uses single-server computational PIR with homomorphic encryption so that the server, having signed both the PIR query hash and its reply, cannot selectively omit responses for a targeted file without returning garbage data. A client detecting the mismatch publishes the upload ticket, signed reply, and query seed as a compact, transferable cryptographic proof of censorship verifiable by any third party holding the server's long-term public key.
-
On a quad-core Intel Core i5 (3.30 GHz) against a database of 1 million 256-byte messages, the prototype produces a 3.8 MB PIR query (28 ms client-side generation) and a 2.0 MB proof requiring 2.8 s of server-side processing; third-party proof validation takes 52 ms, and the 120-byte upload ticket validates in 381 µs. All client-side operations are fast enough for smartphone or JavaScript implementations.
-
A censoring server cannot selectively withhold PIR responses for a targeted file while honestly answering others. If a probabilistic polynomial-time (PPT) algorithm A could distinguish queries for the targeted file from all other queries, it would directly violate the query privacy of the underlying PIR scheme. The server's only remaining evasion strategy is an indiscriminate shutdown, refusing all queries or all signatures. That behavior is externally distinguishable and gives the server no plausible-deniability defense.
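The reduction argument can be stated compactly. The notation is introduced here for illustration and is not taken from the paper:
- $\mathsf{Query}(i)$ is the PIR query for index $i$.
- $i^{*}$ is the targeted file.
- $\lambda$ is the security parameter.
```latex
\[
  \Big|\, \Pr\big[ A(\mathsf{Query}(i^{*})) = 1 \big]
        - \Pr\big[ A(\mathsf{Query}(j)) = 1 \big] \,\Big|
  \;\ge\; \varepsilon(\lambda)
  \qquad \text{for some } j \neq i^{*}
\]
```
PIR query privacy requires this gap to be negligible in $\lambda$ for every PPT $A$ and every pair of indices. A server strategy that singles out $i^{*}$ with non-negligible advantage $\varepsilon(\lambda)$ would therefore itself be such a distinguisher, contradicting the scheme's security.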
-
Proofs of censorship are transferable and persistent: even if a content provider restores a censored file, previously generated proofs remain cryptographically valid and can serve as a reputation mechanism, a trigger for smart-contract financial penalties (e.g., Ethereum bonds), or mandatory disclosures to transparency databases such as Lumen, enabling accountability for transient or temporally-selective censorship that current transparency reports cannot capture.
-
Across 85,421 Cloudflare-hosted domains crawled from five vantage points, 524 websites employed country-based blocking (Cloudflare error 1009). Ukraine (VPN) received 313 geo-blocks while Scotland (same VPN provider) received only 175, suggesting that IP/ASN reputation or exit-node characteristics cause significant variation in observed blocking rates even when controlling for the access method.
-
Because a disproportionate number of Tor exit nodes are located in the EU, GDPR-motivated blanket blocking of EU IP ranges creates collateral access restrictions for Tor users globally. This illustrates that privacy-protective legislation and censorship-circumvention infrastructure can have directly competing effects when server-side enforcement is implemented via coarse geographic IP filtering.
-
After GDPR took effect on May 25, 2018, 74 websites that had previously served all three EU vantage points (London, Sofia, Frankfurt) began blocking them; 40 returned explicit 'Blocked due to GDPR' blockpages with HTTP 403, 7 used HTTP 451 Unavailable For Legal Reasons, and all 47 sites with explicit blockpages were local news outlets.
-
Ukraine and Scotland both used the same VPN provider yet Ukraine received 1,874 CAPTCHA challenges vs. 309 for Scotland, and 1,519 browser verification challenges vs. 1,091 — a roughly 6× and 1.4× difference respectively. Only Ukraine was flagged as a VPN or Tor node by OctoNet's HTTP filter, indicating that IP/ASN reputation drives security-motivated blocking independently of the transport protocol used.
-
The paper enumerates at least eight distinct non-censorship motivations for server-side geo-blocking — economic sanctions, third-party liability (SESTA), copyright, GDPR compliance, security/fraud concerns, hosting costs, revenue optimization, and misconfiguration — each of which can produce the same observable signals (403 blockpages, DNS failures, TCP resets) as government censorship. Naive measurement methods that treat all location-based unavailability as censorship will produce systematic false positives.
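A measurement pipeline can at least avoid labeling every unavailability as censorship by attaching coarse motivation labels first. The following is a minimal, hypothetical sketch using signals mentioned in these findings: HTTP 451, GDPR blockpages served with 403, and Cloudflare error 1009. The label names and the assumption that the Cloudflare page text contains "Error 1009" are illustrative. A response with no HTTP reply at all (DNS failure, TCP reset) is deliberately left for corroboration, because server-side blocking and censorship look alike there.
```python
from typing import Optional
def label_unavailability(status: Optional[int], body: str = "") -> str:
    """Attach a coarse, hypothetical motivation label to one fetch result.
    `status` is None when no HTTP response arrived (DNS failure, TCP reset).
    """
    text = body.lower()
    if status is None:
        # Censorship and server-side blocking produce the same signal here.
        return "needs-corroboration"
    if status == 451:
        return "legal-restriction"          # HTTP 451 Unavailable For Legal Reasons
    if "error 1009" in text:
        return "cloudflare-country-block"   # site owner blocks the visitor's country
    if status == 403 and "gdpr" in text:
        return "gdpr-geoblock"              # explicit GDPR blockpage
    if status == 403:
        return "unexplained-403"            # security, sanctions, misconfiguration, ...
    return "no-block-signal"
```
Only the "needs-corroboration" bucket would then feed into cross-vantage-point comparison before anything is reported as government censorship.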
-
Packet-length frequency distributions reliably distinguish regular Skype calls from irregular streams using Earth Mover's Distance (EMD): regular streams consistently produce EMD < 0.1 against a reference stream, while irregular streams range from 0.025 to 0.25. At the breakeven threshold ∆I = 0.066, an EMD classifier achieves 83% accuracy (equal sensitivity and specificity). An aggressive policy (∆A) blocks 95% of legitimate calls to catch all irregular streams; a conservative policy (∆C = 0.11) passes 80% of irregular streams to avoid false positives.
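For one-dimensional histograms on shared bins, EMD reduces to the summed absolute difference of the two cumulative distributions. The sketch below makes several assumptions that are not taken from the paper:
- 30 equal-width bins up to 1500 bytes.
- Division by (bins − 1) so that the distance lies in [0, 1].
- Reuse of the ∆I = 0.066 threshold from the text, which is only meaningful under the paper's own normalization.
```python
def histogram(lengths: list[int], n_bins: int = 30, max_len: int = 1500) -> list[float]:
    """Normalised packet-length histogram over equal-width bins in [0, max_len]."""
    counts = [0] * n_bins
    for length in lengths:
        idx = min(length * n_bins // max_len, n_bins - 1)  # clamp oversize packets
        counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]
def emd_1d(p: list[float], q: list[float]) -> float:
    """1-D Earth Mover's Distance between histograms on the same bins.
    Sums |CDF_p - CDF_q| across bins, divided by (n_bins - 1) so the result
    lies in [0, 1]; this normalisation is an assumption of the sketch.
    """
    cdf_gap, work = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi
        work += abs(cdf_gap)
    return work / (len(p) - 1)
BREAKEVEN_THRESHOLD = 0.066  # Delta_I from the text; scale depends on normalisation
def is_irregular(stream, reference, threshold=BREAKEVEN_THRESHOLD) -> bool:
    """Flag a stream whose length distribution is far from the reference call."""
    return emd_1d(histogram(stream), histogram(reference)) > threshold
```
Moving the threshold down toward ∆A or up toward ∆C trades false positives against missed irregular streams, as the policies above describe.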
-
Across eight combinations of traffic features (packet length, bi-gram packet length, inter-packet time, bi-gram inter-packet time) and two similarity metrics (EMD, KS), adversarial classification accuracy against DeltaShaper streams ranges from 72–90% in unperturbed conditions. Bi-gram inter-packet times with EMD achieves 88% accuracy, matching packet-length/EMD, but requires roughly 10x the computation (~64s vs ~6s). Bandwidth throttling to 300 Kbps degrades classifier accuracy from 88% to 75%, but also drops Skype frame rate from 30 to 5 FPS, creating collateral damage that limits censor deployment of throttling as a detection aid.
-
The Lavinia audit protocol is designed so that auditors are cryptographically indistinguishable from ordinary readers: an auditor cannot reveal her status to a server without forfeiting her own payment, and servers are therefore forced to serve content in response to every request. Any reader may additionally claim to be an auditor, and servers cannot verify such claims, further preventing selective serving.
-
The burn contract mechanism defends against deliberate auditor-chain termination attacks, in which a malicious actor poses as an auditor and refuses to post her secret, preventing all subsequent auditors from performing their audits. If the previous auditor fails, the current auditor can burn both her predecessor's payment and her own, receive a small fraction of those funds as incentive, and forward the chain secret to the next auditor — preventing a single compromised link from collapsing the entire revenue stream for a document.
-
Lavinia requires its underlying payment system to satisfy four properties for suitability in censorship-resistant contexts: (1) coercion-resistance through geo-political distribution or anonymization, (2) redeemable with a distributable secret, (3) time-locked escrow preventing early redemption, and (4) an append-only public log. The paper demonstrates that Bitcoin satisfies all four properties, with Zerocash extensions providing payment anonymization to prevent linking payments to specific documents.
-
Theorem 1 proves a dominant strategy Nash equilibrium in which all rational servers honestly store and serve all files, subject to the constraint that per-server audit payment exceeds routing cost and file-serving payment exceeds storage cost. At 2017 prices, storage hardware cost approximately $0.03/GB and bandwidth cost approximately $0.03/GB, so the minimum per-file hosting payment must exceed (η + BR) × $0.03/GB × |f|.
-
Lavinia allows a publisher to publish content, submit payments, and then cease all interaction with the system — continued document availability is not contingent on the original publisher remaining online or reachable. This specifically protects against out-of-band coercion tactics such as rubber-hose cryptanalysis in the case that the publisher is captured or prosecuted.
-
Network-level path churn is critical for censor localization: 25%, 30%, 38%, and 67% of ICLab source-destination pairs observe distinct AS-level path changes over periods of one day, week, month, and year respectively. Without path churn, nearly 90% of constructed CNFs return five or more solutions (ambiguous), compared to less than 2% when multiple distinct paths are included.
-
Combining boolean network tomography with BGP path churn from the ICLab platform identifies 108 censoring ASes located in 49 countries across 4.9M measurements, reducing the candidate set of potential censoring ASes by 97% on average. 97.9% of constructed SAT CNFs return exactly one solution enabling exact AS-level censor identification, with less than 0.7% returning no solution.
-
Splitting measurement data by individual URL and time granularity (day, week, month) is necessary for SAT solvability: coarser time granularity reduces solvability because censorship policies change and noise accumulates, producing unsolvable CNFs. The authors solved 34,298 CNFs in total, each averaging 43 clauses and 17.41ms to solve using an off-the-shelf SAT solver (picosat).
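The clause structure behind these CNFs can be illustrated with a tiny brute-force solver standing in for picosat. This is a minimal sketch that assumes noise-free observations:
- A censored path yields the clause "some AS on this path censors".
- An uncensored path yields "no AS on this path censors".

The AS names are hypothetical, and enumeration only scales to toy inputs.
```python
from itertools import product
def localize_censors(ases, observations):
    """Enumerate censor assignments consistent with (as_path, censored) observations."""
    solutions = []
    for bits in product([False, True], repeat=len(ases)):
        censor = dict(zip(ases, bits))
        # Each observation must agree: censored iff some AS on the path censors.
        if all(any(censor[a] for a in path) == censored
               for path, censored in observations):
            solutions.append({a for a, is_censor in censor.items() if is_censor})
    return solutions
ases = ["AS1", "AS2", "AS3"]
one_path = [(["AS1", "AS2", "AS3"], True)]
with_churn = one_path + [(["AS1", "AS3"], False)]
```
With the single path, every non-empty subset of the three ASes is a solution, so the result is ambiguous. Adding one churned, uncensored path that avoids AS2 leaves exactly one solution, {AS2}. This mirrors why path churn drives the fraction of uniquely solvable CNFs so high.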
-
Without per-site connection limits, popular decoy hosts risk resource exhaustion (Apache's default cap is 150 simultaneous connections); enforcing an initial limit of 30 concurrent clients per site—coordinated across stations via a central collector—kept the median site load at ~5 simultaneous clients, with the 99th-percentile site peaking at 37 after the limit was raised to 45.
-
Filtering candidate decoy sites by a minimum 15 KB TCP window eliminated 24% of the initial ~5,500 HTTPS hosts; a 30-second HTTP-timeout floor eliminated a further 11%; and AES-128-GCM cipher-suite support requirements eliminated an average of 32%—together reducing the viable decoy-site pool by approximately 55% before any live reachability tests.
-
The one-week trial served over 50,000 unique users (peak daily count: 57,000) with up to 4,000 concurrent sessions simultaneously, demonstrating that a four-station refraction deployment co-located at two mid-sized network operators can support tens of thousands of real censored users.
-
The trial explicitly obtained no evidence about TapDance's resistance to adversarial censor countermeasures: its scale and duration were judged small enough that censors likely did not observe it, leaving theoretical censorship-resistance claims unvalidated against active blocking responses.
-
TapDance was deployed on four ISP uplinks (two 40 Gbps, two 10 Gbps) using commodity 1U servers running a Rust/PF_RING zero-copy implementation; CPU load remained below 25% while handling a peak of ~14,000 new TLS connections per second across 34 cores, with cumulative mirrored traffic peaking at 55 Gbps across all stations.
-
Approximately 10% of respondents (n=23) held uncertain or incorrect beliefs about which actor was responsible for a given block, systematically conflating government censorship with geoblocking, paywalls, and platform-side restrictions. This misidentification cascaded into inappropriate tool selection and inaccurate risk assessment: users who could not distinguish state blocking from licensing restrictions could neither choose the right circumvention tool nor accurately gauge the legal jeopardy of accessing the content. Respondents specifically requested a pre-visit blocking-actor classification tool.
-
Nearly 70% (n=160) of respondents reported self-censoring online for fear of the law. Frequency of exposure to blocked content was a statistically significant, ordered predictor of self-censorship (Goodman-Kruskal's gamma = 0.421, 95% CI [0.247, 0.595], p < 0.05), with self-censorship increasing monotonically as exposure to blocked content increased. Notably, self-censorship rates did not differ significantly between respondents inside and outside Thailand, suggesting the chilling effect extends beyond the reach of domestic ISP-level blocking.
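Goodman-Kruskal's gamma compares concordant and discordant pairs of ordinal responses, which is why it suits a question such as "how often do you encounter blocked content?" paired with a self-censorship scale. The following minimal sketch ignores tied pairs and omits the confidence interval. The paper's survey data is not reproduced here, so the inputs in the test are toy values.
```python
from itertools import combinations
def goodman_kruskal_gamma(x: list[int], y: list[int]) -> float:
    """Gamma = (C - D) / (C + D) over all pairs of observations; ties are ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # variables move in opposite directions
    return (concordant - discordant) / (concordant + discordant)
```
A positive value such as the reported 0.421 means that respondents with more exposure to blocked content more often also ranked higher on self-censorship than the reverse.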
-
Of 229 Thai Internet users surveyed, 63% (n=144) had attempted to circumvent censorship, and of those, roughly 90% (n=132) reported success using VPNs (32.64%), proxies (32.64%), or Tor (23.61%). Failures were isolated to proxies (n=2), VPNs (n=2), and alternative searches (n=3), indicating that existing circumvention tools were technically adequate but that availability and comprehensibility—not raw capability—were the binding constraints on user success.
-
Users in Thailand relied on incident-driven tool selection—running a fresh Google search for a proxy or VPN each time they hit a block—which the paper identifies as a systematic vulnerability: the Thai Royal Police exploited this pattern after the 2014 coup by linking a phishing application to a government block page, harvesting email addresses and gaining application-level access to Facebook profile information. The paper further notes that orchestrated stricter censorship could drive users to a government-operated malicious tool.
-
Social media—primarily Facebook—was the dominant venue for direct, experienced threats: 9 of 15 respondents who had content blocked reported being censored on Facebook, and respondents observed that government censorship was shifting away from website blocking toward social media surveillance precisely because social media platforms are 'hard to block.' Respondents lacked any effective technical defenses against peer reporting, group-administrator censorship, and intermediary liability; they relied instead on social management strategies such as abbreviating references to royalty, running 'trial posts,' and self-censoring likes and shares.
-
Customer-cone size — the AS selection metric used by prior work (Houmansadr et al. 2014) — is poorly correlated with actual path frequency (Spearman rank correlation = 0.2). 33.17% of paths to Alexa top-100 prefixes traverse 1-hop customers of the largest-cone AS (AS3356, cone size 24,553) without transiting AS3356 itself, showing that cone-based heuristics systematically misidentify which ASes actually carry traffic.
-
A CAPTCHA-gated registration scheme with sequences of reCAPTCHAs at random intervals and short solve windows limits automated censor deployment. With 5 minutes spent per registration, a human adversary working non-stop for 24 hours can create at most 288 censors; combined with a 12-hour registration reset cycle, this bounds the adversary's censor accumulation rate.
-
For complete blockage (>99%) over 10 hours, the adversary requires a swarming ratio of 12.8, translating to 128,000 censors against a single server with 10,000 CoAs. Scaling to a 10-server, 10-interface deployment forces the adversary to operate 106,700 humans in parallel; with a 5-minute CAPTCHA registration and a 12-hour reset cycle, achieving complete blockage within 10 hours requires 1,067 non-stop human operators in the first two hours.
-
A credit-based accounting method dynamically assigns users to larger groups as their trust score accumulates (credit increases by G−1 per unblocked interval), requiring a user's credit to be twice the group's risk before joining. This reduces the total number of CoAs needed while making it costly for censor agents to infiltrate large groups, since they must wait through many clean intervals before the group reaches exploitable size.
-
A proof-of-concept Linux prototype using UMIP (open-source MIPv6) with three routers and five commodity machines (2.4GHz Intel Core 2 Duo, 4GB RAM) demonstrated correct CoA rotation every 10 seconds. Signaling overhead was reduced to one-third of standard MIPv6 by eliminating return routability messages; per-packet transmission overhead was 24 bytes (IPsec ESP), identical to the baseline secure-channel cost, yielding zero net overhead attributable to the MTD mechanism.
-
The MI-MTD framework uses Mobile IPv6 Care-of Addresses (CoAs) rotated among randomized user groups every shuffling interval. With 1,000,000 users, 5,000 censors, and 10,000 CoAs (swarming ratio φ=0.5), per-interval access probability is 60.88%; over one minute with 10-second shuffling intervals, blocking probability drops to approximately 0.358%, meaning users retain ~99.6% chance of access.
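The one-minute figure follows from compounding the per-interval blocking probability, assuming each shuffling interval's CoA assignment is independent. The sketch below reproduces that arithmetic. It does not model how the 60.88% per-interval access probability itself is derived, and the function name is hypothetical.
```python
def blocked_throughout(per_interval_access: float, window_s: float, shuffle_s: float) -> float:
    """Probability a user is blocked in every shuffling interval of a time window.
    Assumes intervals are independent, so per-interval blocking compounds.
    """
    intervals = int(window_s // shuffle_s)
    return (1.0 - per_interval_access) ** intervals
p_block = blocked_throughout(0.6088, window_s=60, shuffle_s=10)
# six intervals: 0.3912 ** 6 ≈ 0.00358, i.e. roughly 0.358%
```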
-
Ad server domains are structurally immune to censor blocking due to collateral-damage risk: Google DoubleClick is embedded in 1,843,854 publisher sites and PubMatic in 215,046, making IP-blocking of these domains prohibitively costly for any censor. Measurements of Alexa top-10K confirm the top 20 ad servers handle more than 75.6% of all ad requests.
-
82.2% of ad requests from Alexa top-500 websites are sent over HTTPS (Table 2), encrypting the HTTP Referer field. This prevents censors from correlating a user's direct-path ad request back to a censored publisher domain in the vast majority of cases; only the remaining 17.8% of HTTP ad requests are vulnerable to Referer-based traffic analysis.
-
Relay-based circumvention severely degrades ad relevance: across Alexa top-500 uncensored sites, the overlap between ad sets fetched via Tor and the direct-path ground truth averaged only 28%, with near-zero overlap for sites serving geo-targeted ads. For blocked sites, only ~16% of ads shown via Tor were in the user's language.
-
ADVENTION's split-path design — fetching publisher content via relay and ad requests via the direct path — raises average ad-set overlap from 28% (Tor) to 70%; combining ADVENTION with Intelligent Relay Selection (language-matched relay) further increases average overlap to ~80%. For blocked sites, ADVENTION with IRS raised ad relevance from ~16% to 100%.
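The split-path decision can be sketched as a per-request routing function. This assumes ad requests are recognized by hostname suffix. The two suffixes come from the ad servers named above, but the list, the matching rule, and `choose_path` are hypothetical; ADVENTION's actual ad-request classification may differ. Intelligent Relay Selection is not modeled.
```python
# Hypothetical ad-server suffixes; a real deployment would need a maintained list.
AD_DOMAINS = ("doubleclick.net", "pubmatic.com")
def choose_path(hostname: str) -> str:
    """Send ad requests over the direct path and everything else via the relay."""
    host = hostname.lower().rstrip(".")
    for suffix in AD_DOMAINS:
        # Match the domain itself or any subdomain, never a lookalike suffix.
        if host == suffix or host.endswith("." + suffix):
            return "direct"
    return "relay"
```
The design relies on the collateral-damage argument above: ad servers are assumed to stay reachable on the direct path, so only publisher content needs the relay.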
-
ADVENTION provides up to 47% improvement in average page load time (PLT) compared to Tor, because ad requests — which are often on the critical rendering path — are served over the direct channel rather than through the relay. The exact improvement depends on webpage structure and bottleneck resources.
-
Of the 55 filters that inspected the HTTP Host header, 26 keyed only on the first Host header in a multi-Host request, 27 keyed only on the last, and only 2 examined both. Placing a benign Host header in the position the filter reads and the blocked URL in the other position bypassed the filter, and this divergence in behavior tracks RFC 7230's requirement to reject multi-Host requests with a 400 error — which none of the tested filters implemented.
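A duplicate-Host probe is easy to construct by hand. In this minimal sketch, the hostnames are placeholders and the function name is hypothetical. Against a filter that reads only the first Host header, the benign name goes first; for a filter that reads only the last, the order is swapped. RFC 7230 section 5.4 requires a compliant origin server to answer such a request with 400, so a strictly compliant origin would reject it.
```python
def multi_host_request(first_host: str, last_host: str, path: str = "/") -> bytes:
    """Build an HTTP/1.1 GET carrying two Host headers in the given order."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {first_host}",
        f"Host: {last_host}",
        "Connection: close",
        "",   # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
# Filter keys on the first Host header: put the benign name first.
probe = multi_host_request("benign.example", "blocked.example")
```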
-
HTTP GET fuzzing via subtle token modifications bypassed large fractions of filters: removing the `\r\n` before the Host header bypassed 36–38 of 44 Host-header filters; embedding the censored URL in the middle of a long hostname string bypassed 33–35 filters; placing the URL in an after-Host field with a non-empty Host bypassed 29–36 filters. Blacklist coverage was also weak: no filter blocked all 100 of the Alexa top adult sites, and some blocked as few as 31.
-
Among the 44 non-DNS filters, 11 did not reassemble TCP segments and 7 did not reassemble IP fragments before inspection, meaning a censored URL split across segment or fragment boundaries evaded detection. Five filters applied fragment/segment reassembly timeouts of under 2 seconds despite maintaining HTTP request state for more than 8.5 seconds, creating a window where a deliberately fragmented flow with artificial delay avoids inspection entirely.
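The segmentation evasion amounts to making sure the censored string never appears whole in one packet. This minimal sketch only computes the split; the function name is hypothetical. Each chunk would be written with a separate `send()` on a socket with `TCP_NODELAY` set, although the kernel does not strictly guarantee one segment per call.
```python
def split_across_segments(request: bytes, keyword: bytes) -> list[bytes]:
    """Split a request into two chunks so that `keyword` straddles the boundary."""
    start = request.find(keyword)
    if start < 0:
        return [request]          # nothing to hide
    cut = start + len(keyword) // 2
    return [request[:cut], request[cut:]]
```
For the filters with short reassembly timeouts, a sender could additionally pause between the two sends for longer than the filter's reassembly window but shorter than the server's request timeout.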
-
Autosonda classified 76 commercial web filters in the NYC metropolitan area into three categories: 21 (27.63%) performed DNS blacklist filtering, 44 (57.89%) matched on the HTTP Host header of GET requests, and 11 (14.47%) performed a DNS lookup of the Host header value and blocked based on the resulting IP. Autosonda found circumvention paths for 100% of filters tested.
-
All 76 filters inspected only TCP traffic: sending the identical HTTP request over UDP bypassed censorship 100% of the time. Additionally, 17 of the 49 filters that censored requests to EC2 servers only inspected traffic on port 80 and passed through the same requests sent to port 9900 without modification. No filter triggered on URI query strings, so appending query parameters to any censored URL bypassed every tested filter.
-
In the heavily censored environment (E3), all successful connections used meek domain-fronting bridges (meek-amazon: 11 participants, meek-google: 9, meek-azure: 3); not a single participant successfully connected using flashproxy, fte, fte-ipv6, obfs4, or scramblesuit, despite all being available as built-in options.
-
The authors recommend 'smart automation' for bridge selection: the client first connects via a hard-to-censor bridge, then contacts a central Tor server over that Tor connection to identify the best available bridge for the user's location and network conditions, then reconnects using that bridge — eliminating the manual trial-and-error that caused 79% of attempts to fail. This is contrasted with 'naive automation' (sequential blind retry) which avoids UI friction but wastes time on non-working bridges.
-
Participants spent 64–78% of their total connection time on the progress/waiting screen (not in the configuration UI), and the simulated censorship environment was the dominant predictor of connection time (Kruskal–Wallis χ² = 80.5, df = 2, p < 10⁻¹⁵). In E3, each failed bridge attempt added several minutes of timeout before the user could retry, compounding the overall latency.
-
79% of total user attempts (363 of 458) to connect to Tor in simulated censored environments failed. In the most heavily censored condition (E3, requiring a meek or custom bridge), only 50% (10/20) of participants using the original interface connected, and even with the redesigned interface only 68% (13/19) succeeded within 40 minutes.
-
A redesigned Tor Launcher interface significantly increased success rates (Pearson χ² = 2.808, p < 0.047) and reduced median connection time in E3 from 40:08 to 20:25 (Mann–Whitney Z = −1.84, p < 0.0328, r = 0.172); configuration time also dropped significantly (Z = −3.28, p < 0.0005, r = 0.307). Changes included eliminating yes/no bridge and proxy question screens, adding auto-detection for proxies, consolidating options, and surfacing meek bridges as a fallback recommendation.
-
DeTor circuits have significantly lower end-to-end RTTs than standard Tor circuits because high-RTT paths cannot satisfy avoidance proofs, effectively self-selecting for shorter routes. Bandwidth distributions are similar to standard Tor. However, intentional packet-delay defenses proposed for Tor (to defeat timing attacks) would increase effective δ and reduce DeTor proof coverage, creating a tension between delay-based anonymity defenses and RTT-based geographic avoidance.
-
Never-twice avoidance — ensuring no country appears on both the entry leg (source→entry) and exit leg (exit→destination) of a Tor circuit — succeeds for 98.6% of source-destination pairs not in the same country, using only client-side RTT measurements. This directly defeats traffic-correlation deanonymization attacks that require an adversary on both legs of the circuit simultaneously.
-
DeTor proves geographic avoidance using speed-of-light RTT constraints rather than Internet topology maps. If the measured end-to-end RTT satisfies (1+δ)·Re2e < Rmin, where Rmin is the theoretical minimum RTT that would include any point in the forbidden region, then packets provably could not have traversed that region — even against adversaries who forge traceroute and BGP responses.
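The RTT test can be sketched with great-circle distances and an assumed propagation speed of two-thirds of c. This is only an approximation, and two further simplifications apply:
- The forbidden region is represented by sampled points rather than its true geometry, so the result is weaker than a real proof.
- $R_{min}$ is the full round trip over the hops plus the cheapest single detour through a forbidden point, taken once.

All names here are hypothetical.
```python
import math
EARTH_RADIUS_KM = 6371.0
# Assumed propagation speed: two-thirds of c (km per millisecond).
SPEED_KM_PER_MS = 299_792.458 * (2 / 3) / 1000
def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))
def min_rtt_via_region(hops, forbidden_points):
    """Lower bound (ms) on a round trip over `hops` that visits a forbidden point:
    the direct round trip plus the cheapest one-way detour through any sample."""
    legs = list(zip(hops, hops[1:]))
    base = sum(great_circle_km(a, b) for a, b in legs)
    detour = min(great_circle_km(a, p) + great_circle_km(p, b) - great_circle_km(a, b)
                 for a, b in legs for p in forbidden_points)
    return (2 * base + detour) / SPEED_KM_PER_MS
def proves_avoidance(measured_rtt_ms, hops, forbidden_points, delta=0.0):
    """DeTor-style condition: (1 + delta) * R_e2e < R_min."""
    return (1 + delta) * measured_rtt_ms < min_rtt_via_region(hops, forbidden_points)
```
A larger δ makes the condition harder to satisfy. This is why delay-based defenses reduce proof coverage, as noted in the findings on this page.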
-
Tor's built-in country-exclusion feature provides only the illusion of control: among circuits configured to exclude the US, only 12% could be identified as definitively avoiding US territory. The remaining 88% of 'trusted' circuits fail to deliver a proof of avoidance, meaning standard Tor policy and provable security diverge sharply.
-
Bridges that carry clients are highly stable: their median lifetime is 116 days (~4 months) and 84% never change IP address, with 90% having at most one IP change. This means current censor policies that remove bridge IP blocks every 25 hours are far more conservative than necessary — an adversary could sustain blocks for months without significant collateral damage.
-
Tor's vanilla TLS certificate presents a distinctive pattern (SubjectCN=www.[random].com; IssuerCN=www.[random].net using base32 random strings), which never changes across certificate rotations every 2 hours. Using this pattern against Censys and Shodan scan data without running any active scans, the authors discovered 694 private bridges and 645 private proxies, and deanonymized the IP address of 35% of public bridges with clients (23% of all active public bridges) in April 2016.
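The certificate pattern can be expressed as a pair of regular expressions over the subject and issuer common names. In this minimal sketch, the lowercase base32 alphabet follows the description above. The 8–20 character length bound is an assumption about Tor's hostname generator, and the names are hypothetical. A plain regex will also match some legitimate certificates, so hits would need further filtering.
```python
import re
# Lowercase base32 label; the 8-20 length range is an assumption.
_B32 = "[a-z2-7]{8,20}"
SUBJECT_RE = re.compile(rf"^www\.{_B32}\.com$")
ISSUER_RE = re.compile(rf"^www\.{_B32}\.net$")
def looks_like_tor_cert(subject_cn: str, issuer_cn: str) -> bool:
    """Match the vanilla-Tor SubjectCN/IssuerCN naming pattern."""
    return bool(SUBJECT_RE.match(subject_cn) and ISSUER_RE.match(issuer_cn))
```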
-
Because Bangladesh's ban targeted specific named applications rather than underlying protocols, users successfully substituted functionally equivalent but unlisted apps: 'Banning Facebook, Viber, and Whatsapp for security purposes was not sufficient. For example, I used IMO to operate those apps. So, ultimately, nothing happened.' Authorities responded by expanding the blocklist to cover substitute apps, producing a reactive cat-and-mouse dynamic over the 26-day ban.
-
The Bangladesh Telecommunication Regulatory Commission (BTRC) directed ISPs to block Facebook, Viber, WhatsApp, and Facebook Messenger on November 18, 2015; the ban expanded over 26 days to include Twitter, Skype, IMO, and Instagram, with a coincidental 1-hour complete internet blackout at the outset. Blocking was enforced at the ISP level via written BTRC directives, targeting specific named platforms rather than underlying protocols or ports.
-
At least one participant was unable to use VPN during Bangladesh's ban because her Windows Phone (Lumia) did not carry VPN client apps in its app store, leaving her 'totally unable to communicate' for the ban's duration despite awareness of the workaround. Device platform and app-store access restrictions created a hard circumvention barrier independent of user intent or technical knowledge.
-
During Bangladesh's 2015 internet ban, police conducted roadside stops and physically inspected mobile phones for VPN software, confiscating devices found with VPN installed and asserting VPN use was illegal — despite no official government directive prohibiting VPN. This extra-legal enforcement, carried out by low-ranking constables, created a chilling deterrent effect on circumvention adoption beyond the technical challenge of blocking.
-
Prior to Bangladesh's 2015 internet ban, only 1 of 21 study participants had prior knowledge of VPN or IP-masking software; during the 26-day ban, VPN knowledge spread virally through social networks until it was described as 'fairly commonplace,' with adoption driven almost entirely by peer-to-peer instruction rather than technical documentation. Users required only procedural knowledge — installation steps and connection — not understanding of VPN mechanics.
-
Evaluation of the top 10,000 Alexa websites finds that 3,916 (39%) support HTTPS, of which 1,976 (50%) perform HTTP 3XX redirects that echo the requested path in the Location header and 812 (20%) replay the URL in HTTP 404 error responses — both usable as upstream covert channels readable by downstream-only decoy routers without intercepting upstream traffic.
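The upstream channel works because the overt site echoes the request path back in its response. The sketch below shows only the encoding layer: base64url in the path, recovered from an echoed Location header. It ignores TLS and whatever encryption or tagging Waterfall applies to the covert bytes. The path prefix and function names are hypothetical.
```python
import base64
from urllib.parse import urlsplit
def encode_covert_path(payload: bytes, prefix: str = "/") -> str:
    """Client side: hide upstream bytes in the request path as unpadded base64url."""
    token = base64.urlsafe_b64encode(payload).decode("ascii").rstrip("=")
    return prefix + token
def decode_from_location(location: str, prefix: str = "/") -> bytes:
    """Decoy-router side: recover the bytes from an echoed Location header value."""
    path = urlsplit(location).path
    token = path[len(prefix):]
    return base64.urlsafe_b64decode(token + "=" * (-len(token) % 4))
```
The same decoding would apply to a path replayed in a 404 body, once the router has extracted it from the page.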
-
Waterfall's Overt User Simulator caches previously loaded overt-website responses and replays them to generate cover traffic, overcoming Slitheen's 40% downstream throughput ceiling (caused by restricting covert replacement to leaf HTTP objects only). Because downstream-only decoy routers intercept all downstream TLS records — not just leaf content — Waterfall achieves higher covert capacity while perfectly mimicking overt browsing patterns against traffic analysis.
-
Aggregate measurements across nearly 180 countries over 17 days found that 60% of reflectors experienced some degree of connectivity disruption; the bias of detected blocks toward Citizen Lab Block List sites held for both inbound and outbound filtering, and temporal variability corroborated documented censorship events around political timelines.
-
Of 2,134 tested sites, 229 (10.7%) were invalid for inbound blocking detection due to ingress filtering or network-origin discrimination; 431 additional sites were invalid for outbound blocking detection, of which 75% were Cloudflare-hosted and 7% Fastly-hosted because anycast topology prevents RST packets from returning to the originating anycast node.
-
Validation against the Citizen Lab Block List (CLBL) showed a consistent bias toward listed sites: CLBL sites make up 56.7% of the input dataset, yet for 99% of reflectors more than 56.7% of the sites detected as inbound-blocked were CLBL-listed; 95% of reflectors showed the same directional bias for outbound filtering, confirming that the method detects real censorship rather than measurement noise.
-
Augur's Internet-wide ZMap scan found 22.7 million hosts (of 140 million reachable) using shared monotonically-increasing IP ID counters across 234 countries (median 1,667 reflectors per country); filtering to ethical infrastructure via CAIDA Ark reduced this to 53,130 reflectors in 179 countries (median 15 per country), representing 4,214 ASes.
-
Using sequential hypothesis testing (SHT) with false positive and false negative rates both set to 10^-5, more than 90% of reflectors required 40 or fewer experiment trials to reach a blocking decision; over 17 days the system collected 207.6 million runs across 47 trials spanning 2,134 sites and 2,050 reflectors.
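Sequential hypothesis testing stops as soon as the evidence crosses a decision boundary, which is why most reflectors needed only a few dozen trials. The sketch below uses Wald's sequential probability ratio test, a standard form of SHT, over Bernoulli trials (1 = the trial looked like blocking). It is not presented as Augur's exact formulation: the per-trial probabilities `p0` (no blocking) and `p1` (blocking) are hypothetical parameters, while `alpha` and `beta` default to the 10^-5 error rates quoted above.

```python
import math


def sprt(observations, p0, p1, alpha=1e-5, beta=1e-5):
    """Wald's SPRT on 0/1 trial outcomes.

    p0: assumed P(outcome == 1) when there is no blocking (H0).
    p1: assumed P(outcome == 1) when blocking is present (H1).
    Returns (decision, trials_used); decision is "H0", "H1", or "undecided".
    """
    upper = math.log((1 - beta) / alpha)  # crossing above accepts H1
    lower = math.log(beta / (1 - alpha))  # crossing below accepts H0
    llr = 0.0  # running log-likelihood ratio log P(data|H1) / P(data|H0)
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n
```

With the hypothetical values `p0=0.1` and `p1=0.9`, six consecutive blocking-like trials already exceed the boundary of about ln(10^5) ≈ 11.5; noisier or more ambiguous outcomes push the stopping time toward the tens of trials reported above.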
-
CloudFlare platform policy creates outsized blocking: 80% of CloudFlare-hosted websites discriminate against at least 60% of studied Tor exits, while Amazon- and Akamai-hosted sites show high policy diversity. Social networking and shopping sites are the most aggressive discriminators — 50% block over 60% of studied exits — while search engines are least aggressive, with 83% blocking fewer than 20% of exits.
-
Conservative exit policies (Reduced-Reduced, which additionally blocks SSH, Telnet, and IRC ports beyond the default) have no statistically significant correlation with IP blacklisting rates or abuse complaint volume. Web-traffic accounts for 98.88% of all connections on Reduced-Reduced exits, confirming that ports 80/443 are the primary abuse vector and that port-restriction does not meaningfully reduce exposure.
-
7% of 84 commercial IP blacklists proactively blacklist Tor exit relay IPs as a matter of policy: the Snort IP and Paid Aggregator blacklists listed newly deployed relay IPs within 3 hours of their first appearance in the Tor consensus and maintained the listing for the entire relay lifetime. In total, 88% of all Tor exits appear on at least one commercial blacklist, compared to 9% of VPNGate and 69% of HMA VPN endpoints.
-
Real Tor users browsing the Alexa Top 1M websites via deployed exit relays experience failed HTTP requests at rates of 15.8–33.4% and failed HTTPS handshakes at rates of 35.0–49.6%, representing severe service degradation compared to non-Tor browsing (Table 8).
-
20.03% of Alexa Top 500 website front-page loads showed discrimination against Tor exit users. Exercising search functionality on compatible sites raised discrimination by 3.89% (to 21.33%), while exercising login functionality raised it by 7.48% (to 24.56%), demonstrating that headless front-page-only crawlers significantly underestimate the true blocking rate Tor users face.
-
All five Republic of Cyprus ISPs (Callsat AS24672, Cablenet AS35432, Cyta AS6866, MTN AS15805, and Primetel) used DNS hijacking as their sole blocking mechanism, creating local zone entries that override legitimate DNS replies and redirect users to ISP-controlled block pages or error pages.
-
The Republic of Cyprus National Betting Authority (NBA) blocklist grew from 95 URL entries in February 2013 to 2,563 entries in April 2017 — approximately 27 times its initial size — with entries specifying full URL paths rather than just domain names, requiring DPI-capable infrastructure for correct enforcement.
-
DNS hijacking used by Cypriot ISPs to block gambling websites also suppressed MX record responses for blocked domains, rendering email delivery to those domains impossible — collateral damage not mandated by the 2012 gambling law, which required only URL blocking.
-
Cypriot ISPs could not enforce HTTPS URL entries from the NBA blocklist because SSL/TLS interception was not deployed; connections to port 443 for blocked domains simply timed out with no block page or user notification, meaning HTTPS entries were effectively under-blocked.
-
Topic correlation analysis across 2,904 list-topic pairs (585 significant after Bonferroni correction at α = 0.05) shows social media is disproportionately represented in country blacklists relative to the broader web; video-sharing sites are also frequently blocked, likely to suppress political organization, copyright infringement, or competition with local businesses.
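The Bonferroni correction divides the family-wise level by the number of tests, so with 2,904 list-topic pairs and α = 0.05 each individual pair must reach p ≤ 0.05/2904 ≈ 1.72 × 10^-5 to count as significant. A minimal sketch of that filter (the p-values in the usage check are made up):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Indices of tests that remain significant after Bonferroni correction."""
    m = len(p_values)
    threshold = alpha / m  # per-test significance level
    return [i for i, p in enumerate(p_values) if p <= threshold]
```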
-
DNS-sly requires out-of-band distribution of a 2.3 MB compressed bootstrap package (user profile map) before covert communication begins. The authors explicitly reject automated in-band bootstrapping to preserve deniability, accepting a hard scalability constraint as the cost; the particular censored environment tested did not interfere with DNS traffic at all, enabling successful censored-site retrieval at the same throughput rates as uncensored tests.
-
Active probing resistance was evaluated by simultaneously querying 5 additional DNS resolvers for every domain during DNS-sly operation. DNS-sly's response change distribution falls within one standard deviation of the other resolvers, making probing attacks unable to distinguish DNS-sly servers from ordinary resolvers. TTL-based re-encoding prohibition neutralizes forced-divergence probing where an attacker sends repeated identical queries to expose responder state.
-
Schuchard et al. demonstrated that latency differences caused by a decoy routing proxy communicating with a distant covert destination are sufficient not only to detect the use of decoy routing but also to fingerprint which specific censored webpage the client accessed. All prior decoy routing systems (Telex, Cirripede, Curveball, TapDance, Rebound) remained vulnerable to this attack at time of publication.
-
TapDance's non-blocking asymmetric design leaves the overt connection open but abandoned, enabling an active censor to inject a TCP ACK carrying a stale sequence number; the overt server responds with its true TCP state, exposing the discrepancy and confirming decoy routing. The attack requires no clean-path routing capability: the injected packet is forwarded through the tainted path by the non-blocking TapDance station itself.
-
Adding a DPI apparatus with true positive rate TPR and false positive rate FPR creates three ordered thresholds Fam ≤ Fab ≤ Fmb governing censor strategy: allow all traffic (CTP ≤ Fam), deploy the apparatus (Fam < CTP ≤ Fmb), or block all traffic (CTP > Fmb). The apparatus does not qualitatively change the Nash equilibrium structure; it only shrinks the CTP range the circumventor can sustain, with the ordering Fmb ≥ Fab ≥ Fam holding whenever TPR ≥ FPR.
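The censor's three-way best response can be written as a small decision rule over the thresholds named above. This is a minimal sketch: the threshold values used in the check are hypothetical, and computing Fam and Fmb from TPR, FPR, and the payoffs is outside its scope.

```python
def censor_strategy(ctp, f_am, f_mb):
    """Censor's best response to a circumvention traffic proportion (CTP).

    f_am, f_mb: the model's thresholds Fam and Fmb (requires Fam <= Fmb).
    """
    if not f_am <= f_mb:
        raise ValueError("expected Fam <= Fmb")
    if ctp <= f_am:
        return "allow"       # CTP <= Fam: let all traffic through
    if ctp <= f_mb:
        return "deploy_dpi"  # Fam < CTP <= Fmb: filter with the apparatus
    return "block"           # CTP > Fmb: block the whole channel
```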
-
A censor can mount a zero-collateral-damage flooding attack by injecting fake CRS-protocol-conformant traffic into open channels, inflating the apparent CTP and evicting real circumvention traffic to throttled or sacrificial protocols. If injection is costless the censor can drive real circumvention throughput to zero while keeping all channels nominally open; the attack is equally effective against both throttling and dumping CTP control strategies.
-
In a single-round censorship game, the only Nash equilibrium that keeps the channel open requires the circumvention traffic proportion (CTP) to satisfy CTP ≤ F, where F = (βant+βbnt)/(αact+αbct+βant+βbnt). In indefinitely repeated games, a stable equilibrium exists at CTP = Z = (1−p)·CTPmax, where p is the per-round continuation probability, allowing a non-zero proportion of circumvention traffic to flow indefinitely without triggering shutdown.
-
The optimal multi-protocol CRS traffic allocation distributes circumvention traffic across n cover protocols proportionally to each protocol's non-circumvention traffic volume (CTPi = Li · CTP/(1−CTP)), keeping every individual protocol below the blocking threshold. This makes individual protocol channels independently optimizable, with the sole selection criterion being maximizing cover traffic volume L rather than any other protocol property.
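The allocation rule can be checked numerically. The sketch reads CTPi as the volume of circumvention traffic placed on protocol i, in the same units as Li. That reading is an assumption, but it is the one under which every protocol ends up carrying exactly the proportion CTP. The cover volumes in the check are hypothetical.

```python
def allocate(ctp, cover_volumes):
    """Circumvention volume per protocol: CTP_i = L_i * CTP / (1 - CTP).

    cover_volumes: non-circumvention traffic volume L_i for each protocol.
    """
    if not 0 <= ctp < 1:
        raise ValueError("ctp must be in [0, 1)")
    return [l * ctp / (1 - ctp) for l in cover_volumes]


def proportion(circumvention, cover):
    """Share of a protocol's traffic that is circumvention traffic."""
    return circumvention / (circumvention + cover)
```

For example, `allocate(0.2, [100, 400])` places 25 and 100 units on the two protocols, so each carries a 20% circumvention share. The protocol with more cover traffic absorbs proportionally more, which is why cover volume L is the only selection criterion.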
-
Throttling—capping total CRS traffic at Fab and withholding surplus—strictly dominates dumping surplus traffic onto a sacrificial protocol that will subsequently be blocked. Table 2 shows that at CTP = Fab·1.05 the circumventor's relative utility drops to 0.88 of the Fab baseline when dumping, while throttling preserves all open protocols; under a censor flooding attack dumping additionally loses protocol n entirely, making throttling the dominant strategy in both attack and no-attack conditions.
-
Snowflake exclusively uses WebRTC data channels (on-wire protocol: DTLS), whereas the majority of WebRTC applications use media channels (DTLS-SRTP or SRTP/SDES); a censor can therefore block Snowflake by filtering data-channel flows alone without blocking WebRTC media applications, incurring minimal collateral damage and reducing the overblocking deterrent.
-
The authors extend Houmansadr et al.'s 'parrot is dead' argument to WebRTC: because WebRTC is a large multi-protocol framework, superficial mimicry that fails to replicate exact DTLS version, cipher suite ordering, certificate common name ('WebRTC'), 30-day validity period, STUN server selection, and ICE packet sequence leaves detectable residual distinguishers, making deep fingerprint conformance especially hard for standalone non-browser implementations such as Snowflake's client.
-
Among the five WebRTC applications analyzed (Google Hangouts, Facebook Messenger, OpenTokRTC, Sharefest, Snowflake), Snowflake is uniquely identifiable by its use of DTLSv1.2 (all others use DTLSv1.0), its 17 offered cipher suites, and its exclusive selection of TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256—a cipher suite not chosen by any other application in the study.
-
STUN and TURN packets carry a SOFTWARE attribute that explicitly names the server implementation (e.g., 'Citrix-3.2.5.1 Marshal West' for OpenTokRTC), and the choice of STUN servers, forced-TURN usage, and STUN message-type sequence (Binding-only vs. Allocate+CreatePermission vs. send-indication) differ across applications, providing a passive censor with reliable application-level fingerprints orthogonal to the DTLS layer.
-
A DTLS fingerprinting script run on one full day of network traffic at Lawrence Berkeley National Laboratory found only 7 DTLS handshakes with 3 unique client fingerprints and 3 unique server fingerprints, suggesting there may not be enough naturally occurring WebRTC traffic to provide meaningful cover for a WebRTC-based circumvention system.
-
Password-protected Castle game sessions (passwords distributed via a BridgeDB-like mechanism) prevent censors from joining instances to observe in-game state or identify participants; when a client fails to supply the correct password within a timeout, the Castle proxy falls back to an AI player, making Castle instances indistinguishable from legitimate games even to an adversary who enters the lobby.
-
Castle structurally avoids all three covert-channel pitfalls identified by Geddes et al.: architecture mismatch is avoided by supporting both client-server and P2P modes; channel mismatch is avoided because RTS games implement application-layer reliability over UDP (matching the requirements of proxied TCP, unlike VoIP), which defeats selective-drop denial-of-service attacks; and content mismatch is avoided because legitimate RTS traffic has high natural variance driven by map, strategy, and player count.
-
A single undergraduate ported Castle to two closed-source commercial RTS games (each with >8.5 million copies sold, from different studios) in under 6 hours per game using a ~500-LOC Python/AutoHotkey codebase; 17 of the Top 20 best-selling RTS games share the unit-command structure Castle requires, and 11 have community-decoded replay formats, enabling rapid adaptation to new titles.
-
Castle's packet-size and inter-packet-time distributions (measured via Kolmogorov-Smirnov statistic) fall within the variance observed between legitimate human-game sessions when using ≤50 units/command at ~1 command/second; the best-performing classifier (Herrmann) achieved only ~60% accuracy—roughly 10% above random guessing—against multiple Castle configurations, while two other classifiers (Liberatore, Shmatikov timing) performed near chance.
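The two-sample Kolmogorov-Smirnov statistic used here is the largest vertical gap between the empirical CDFs of two samples, for example the packet sizes of a Castle session and of a human-played session. A dependency-free sketch follows. The comparison procedure in the usage note is an assumption about how such statistics would be read, not a restatement of the paper's setup.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("samples must be non-empty")
    d = 0.0
    # Both empirical CDFs are step functions, so the maximum gap occurs
    # at one of the observed points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

In use, one would compute D for Castle-versus-human pairs and compare it with the spread of D across human-versus-human pairs. A Castle configuration passes when its D falls inside that spread.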
-
Vanilla Castle achieves 42–190 bytes/second (average) and transfers a 10 KB file in 52–238 seconds depending on the game (0-A.D. / Aeons / Conquerors); game-specific exploitation of per-unit click logging in Aeons raised throughput to ~3 KB/s. These rates are sufficient for asynchronous text-based communication (tweets, email, news articles) and bootstrapping Tor bridge IP distribution.
-
χ² homogeneity tests on 70 audio signal pairs show that at SNR ≥ 25 dB the probability that a statistical test fails to distinguish modulated from original signals rises to 77.13% (i.e., the rate of successful discrimination falls below 23%). Crucially, this analysis requires access to the original unmodulated signal; for live voice transmissions the censor cannot obtain such a pairing, so statistical detection is infeasible in practice.
-
SkypeLine's m-ary modulation (Mode B using 128-bit Hadamard sequences) achieves a peak data rate of 2,407 bps, representing a 12,035% improvement over FHSS-based DSSS (Takahashi et al., 20.5 bps) and 19,256% over phase-coding techniques (Nutzinger et al., 12.5 bps). Four-layer parallel binary modulation (Mode A, Quattro) achieves a peak of 224 bps and mean of 106.61 bps at ≥99% reconstruction accuracy.
-
A Skype prototype operating under real-world conditions achieves 64 bps (white Gaussian noise, no ECC) at ≥99% reconstruction accuracy and ≥23 dB SNR. With Opus/SILK encoding (vector quantization), throughput is constrained to approximately 72 bps at two modulation layers; additional layers fail to meet the 99% accuracy bound because the VQ codec's noise reduction filters out the embedded pseudo-noise sequences.
-
Wireshark captures of Skype traffic with and without hidden information at inaudible SNR show no statistically significant differences in inter-arrival times (mean IAT 0.019 s in all conditions) and only a 2.6% difference in mean packet length (130.34 bytes unmodulated vs. 126.98 bytes at inaudible SNR), well within one standard deviation (SD ≈ 12–14 bytes) and insufficient for reliable content-mismatch detection.
-
By transmitting application-level social media content over genuine SMTP/IMAP connections rather than imitating email protocols, Mailet achieves channel and content consistency, making it immune to the differential channel attacks (channel mismatch and content mismatch) that defeated earlier hide-within systems such as StegoTorus and FreeWave.
-
Mailet resists proxy enumeration because clients communicate exclusively through widely-used email hosting providers over standard POP3/SMTP/IMAP ports; no direct client-to-Mailet-server connection ever exists, so even if a censor learns a Mailet server's IP address, blocking it requires blocking all email to major providers — collateral damage that is politically infeasible.
-
Mailet's GCM-based Credential Recovery (GCM-CR) achieves a 120× speedup over traditional garbled-circuit 2PC for privately reconstructing split credentials inside a live TLS record, enabling a single Mailet server to support up to 200 simultaneous sessions with each service request completing in approximately 1 second.
-
Mailet clients' daily email traffic patterns remained within the normal range of genuine email users, validated against the Enron dataset (517,425 emails, 151 users) combined with simulated Twitter usage patterns from 100 randomly sampled accounts, demonstrating that per-user daily email frequency is a poor Mailet detector with high false-positive and false-negative rates.
-
Mailet's (2,2)-threshold credential scheme distributes a user's social media credential as Cred1 ⊕ Cred2 across two randomly chosen servers; an adversary corrupting fraction ρ of the server pool has at most probability ρ² of compromising both servers for a given user, and under standard AES assumptions a single compromised server leaks no information about the credential beyond its length.
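The Cred1 ⊕ Cred2 split is standard two-party XOR secret sharing: one share is a uniformly random pad and the other is the credential XORed with it, so either share alone is uniformly random and reveals only the length. The sketch below shows that split and the ρ² bound. It does not model how Mailet encrypts the shares or recombines them inside TLS via GCM-CR, and the function names are hypothetical.

```python
import secrets
from typing import Tuple


def split_credential(cred: bytes) -> Tuple[bytes, bytes]:
    """(2,2) XOR sharing: cred == share1 XOR share2."""
    share1 = secrets.token_bytes(len(cred))  # uniformly random pad
    share2 = bytes(c ^ s for c, s in zip(cred, share1))
    return share1, share2


def combine(share1: bytes, share2: bytes) -> bytes:
    """Recover the credential; requires both shares."""
    return bytes(a ^ b for a, b in zip(share1, share2))


def compromise_bound(rho: float) -> float:
    """Upper bound on P(both chosen servers are corrupted)."""
    return rho ** 2


def exact_compromise_prob(n_servers: int, n_corrupt: int) -> float:
    """Exact probability when two distinct servers are drawn at random."""
    return (n_corrupt * (n_corrupt - 1)) / (n_servers * (n_servers - 1))
```

Drawing two distinct servers without replacement gives k(k−1)/(n(n−1)), which never exceeds ρ² = (k/n)². For 10 corrupted servers out of 100 it is about 0.0091, against the bound of 0.01.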
-
CovertCast's broadcast model decouples server workload from client count: one server can serve unlimited simultaneous clients without per-connection overhead, unlike hide-within systems such as FreeWave where server costs grow linearly with users. This architecture also defeats Sybil-based DoS attacks, because flooding the server with fake client requests does not increase server load — the server never processes individual client connections.
-
Under degraded network conditions, CovertCast page load times increased by 2–3× at 800 Kbps (below YouTube's minimum 720p bitrate of 1.5 Mbps), with 20 of over 4,000 images dropped at 800 Kbps; at 10% packet loss, 35 images were missed due to YouTube temporarily accelerating video playback; at 20% packet loss, 720p video could not be loaded at all.
-
A KL-divergence classifier trained to distinguish CovertCast streams from real YouTube streams achieved only 33–45% true positive rate on packet-size distributions and 36–41% on inter-packet timing distributions — below random guessing — while maintaining 86–98% true negative rates. Overall classifier accuracy was approximately 65–68%, driven entirely by the high true negative rate rather than genuine detection capability.
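The source does not say how the KL-divergence classifier was constructed. One common construction, sketched here as an assumption, histograms a flow's packet sizes (or inter-packet times) and assigns the flow to whichever reference distribution is closer in KL divergence. The bin edges, add-one smoothing, and reference samples are all hypothetical choices.

```python
import math
from collections import Counter


def histogram(values, edges):
    """Normalized histogram over fixed bin edges, with add-one smoothing."""
    counts = Counter(sum(1 for e in edges if v >= e) for v in values)
    n_bins = len(edges) + 1
    total = len(values) + n_bins  # smoothing keeps every bin probability > 0
    return [(counts[i] + 1) / total for i in range(n_bins)]


def kl_divergence(p, q):
    """D_KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def classify(sample, ref_youtube, ref_covertcast, edges):
    """Label a flow by the reference distribution it is closer to."""
    h = histogram(sample, edges)
    d_youtube = kl_divergence(h, histogram(ref_youtube, edges))
    d_covert = kl_divergence(h, histogram(ref_covertcast, edges))
    return "covertcast" if d_covert < d_youtube else "youtube"
```

When CovertCast's distributions sit inside the natural spread of YouTube streams, most CovertCast flows land closer to the YouTube reference. This yields the pattern reported above: a high true negative rate alongside a true positive rate below chance.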
-
CovertCast uses the identical video codecs, streaming protocols (RTMP/HTTPS), and server endpoints as any other YouTube live stream, making it indistinguishable from regular streaming traffic to both passive protocol-analysis and active traffic-manipulation attacks. Any active attack that disrupts CovertCast connections — such as selective packet dropping — would equally disrupt all non-circumvention viewers of the same streaming service, imposing prohibitive collateral damage.
-
Because CovertCast clients connect to live-streaming service infrastructure (e.g., YouTube servers) rather than to CovertCast servers directly, IP-address blacklisting of CovertCast infrastructure does not allow censors to identify or disrupt client connections. Discovering the CovertCast server's IP address is therefore irrelevant to the censor's blocking goal.
-
Matryoshka achieves an average covert rate of ~3 bits/word after human enhancement; for a 5-word hidden message averaging 5.5 characters per word, the final enhanced stegotext is approximately 73 words. This is roughly 10× the covert rate of Spammimic (~0.3 bits/word), the prior leading approach.
-
After crowdsourced (MTurk) enhancement, 88% of stegotexts on average pass a One-Class SVM trained on 150K sentences from Wikipedia, Brown, and Reuters corpora as natural language; pre-enhancement, only 25–58% pass. For calibration, the same classifier correctly rejects 97% of randomly generated sentences as non-natural-language.
-
A mixed Huffman codebook combining character-level coding with explicit entries for the 300 most frequent English words (covering ~65% of written material) achieves a 52% compression ratio on average across 4,825 sentences of 4–15 words—7 percentage points better than a character-only alphabet—directly increasing the covert bits available per output word.
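A mixed codebook of this kind can be sketched as a standard Huffman construction over a token stream in which frequent words are single symbols and all other words are spelled out letter by letter. The `frequent_words` set is a hypothetical stand-in for the 300-word list, and treating each space as its own symbol is an assumption about tokenization.

```python
import heapq
import itertools
from collections import Counter


def tokenize(sentence, frequent_words):
    """Frequent words become one symbol each; other words become letters."""
    tokens = []
    for i, word in enumerate(sentence.lower().split()):
        if i:
            tokens.append(" ")
        if word in frequent_words:
            tokens.append(word)
        else:
            tokens.extend(word)
    return tokens


def huffman_code(freqs):
    """Build a prefix-free {symbol: bitstring} code from symbol counts."""
    tiebreak = itertools.count()  # avoids comparing dicts on equal counts
    heap = [(count, next(tiebreak), {sym: ""}) for sym, count in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in freqs}
    while len(heap) > 1:
        c1, _, left = heapq.heappop(heap)
        c2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (c1 + c2, next(tiebreak), merged))
    return heap[0][2]
```

Each frequent word costs one short codeword instead of several letter codewords, so fewer bits are needed per hidden message. That is the mechanism behind the gain over a character-only alphabet.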
-
Users required 4.0–5.8 minutes on average to enhance a stegotext into natural language across three experiments, inserting 4–8 extra words per sentence; this is comparable to the time required to write a short email. The random-word-selection baseline consistently required more time and inserted more words, confirming that n-gram-guided word choice meaningfully reduces human editing burden.
-
The Viterbi-based probabilistic decoder achieves zero character error rate on 96%, 93%, and 95% of decoded messages across the three corpora experiments (dreams, animals, facebook). For the small fraction of failures, only 15% of characters on average were corrupted rather than total message loss.
-
The top 10 CDNs collectively host nearly 20% of the Alexa top 10,000 domains (1,967 domains); CloudFlare alone hosts 726 of them (about 7% of the top 10,000) and operates across 75 ASes with 107,008 IP addresses. CDN-hosted domains receive interference out of proportion to their ~20% share of sites, suggesting that censors target popular shared-infrastructure sites as a high-leverage blocking strategy.
-
Satellite's single-node measurement methodology, probing 1/10th of 12 million discovered open DNS resolvers across 20,000 ASes and 169 countries, detected 4,819 instances of ISP-level DNS hijacking across 117 countries while measuring 10,000 domains with weekly precision from a single external vantage point.
-
Domestic mesh traceroutes (both source and destination inside the target country) uncovered 5,562 new AS edges not present in standard BGP table–derived topology datasets, far exceeding the 647 new edges found by inside-out/outside-in traceroutes using up to 25 probes. Russia, the US, France, the UK, and Ukraine gained the most new edges.
-
A decision tree with linear regression at leaves (DTLR) trained on AS-topology features for 168 countries predicts Freedom House freedom category (Free/Partly Free/Not Free) with 95% accuracy. Average FPI prediction error was 3.47%, and prediction error remained ≤8 points (on a 0–100 scale) 90% of the time under leave-one-out cross-validation.
-
IP density (number of IP addresses per person) is the single most predictive feature of a country's Freedom of Press Index. A normalized IP density value of ≥0.167 reliably predicts high freedom of expression, while normalized maximum BGP policy-compliant path length ≥0.643 reliably predicts low freedom.
-
Singapore's AS topology (257 domestic ASes with 3,022 international connections) resembles that of high-freedom countries, yet its Freedom of Press Index is 33 (Partly Free). This makes it a structural outlier where rich international BGP connectivity coexists with enforced information controls: the DTLR model predicts Singapore's FPI should be ≥70 (Free).
-
Measured data overhead when loading web pages across four circumvention channels over DSL: instant messaging (Skype text) added 39% overhead, email added 107%, file sharing (Dropbox) added 272%, and VoIP audio modulation added an 84× overhead. Latency was lowest for instant messaging; VoIP latency was dominated by its limited 1200-baud audio encoding bandwidth.
-
To match legitimate user behavior, the Camouflage dispatcher enforces empirically derived per-protocol session time limits: email 1–3 minutes, file sharing 5–10 minutes, instant messaging 15–20 minutes, and VoIP 20–30 minutes (Table 1). Sessions exceeding these windows produce a detectable deviation from population-level usage norms.
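A dispatcher enforcing these windows needs two operations: pick a session length inside the protocol's window, and detect when a session has overrun it. The windows below are the Table 1 values. The protocol keys, the uniform draw, and the function names are hypothetical, because the source does not say how durations are chosen within a window.

```python
import random

# Session-length windows in minutes (Table 1 values quoted above).
SESSION_WINDOWS = {
    "email": (1, 3),
    "file_sharing": (5, 10),
    "instant_messaging": (15, 20),
    "voip": (20, 30),
}


def plan_session(protocol, rng=random):
    """Choose a session duration (minutes) inside the protocol's window."""
    low, high = SESSION_WINDOWS[protocol]
    return rng.uniform(low, high)  # assumption: uniform within the window


def session_expired(protocol, elapsed_minutes):
    """True once a session has run past the protocol's upper bound."""
    return elapsed_minutes > SESSION_WINDOWS[protocol][1]
```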
-
Protocol imitation systems (SkypeMorph, CensorSpoofer, StegoTorus) fail to achieve unobservability because they implement the target protocol only partially, creating statistical discrepancies that censors can detect. Houmansadr et al. (2013) demonstrated this as a fundamental flaw: unobservability by imitation is categorically insufficient as a circumvention design principle.
-
A single-protocol circumvention system creates a detectable anomaly: when the system is active, the traffic pattern on that protocol diverges from the same user's baseline behavior, which anomaly-based detectors can classify. Users who also legitimately use the tunneled service in daily life produce two distinct signatures — one with and one without the circumvention layer — further compounding detectability.
-
In Italy, gambling and betting sites were censored primarily via DNS hijacking toward explicit blockpages with ISP-level plausible-DNS-resolution rates as low as 4.5% (NGI), 31.2% (Wind), and 46.1% (Telecom Italia), while the academic GARR network showed no censorship. File-sharing sites (thepiratebay.sx) faced a more aggressive multi-layer response: 2 of 4 ISPs showed less than 50% TCP reachability (versus near 100% for betting sites), and control DNS resolvers were also affected, indicating coordinated infrastructure-wide blocking rather than ISP-level DNS hijacking alone.
-
In South Korea, adult websites (e.g., hardsextube.com) were censored exclusively via HTTP content substitution — a JavaScript redirect to the official blockpage http://warning.or.kr — with 98% of content-size-ratio samples falling below the 0.3 detection threshold, while no DNS tampering or TCP-level blocking was observed. All other tested countries had fewer than 16% of samples below the threshold.
-
UBICA's crowdsourced measurement campaign across 31 countries deployed 200+ probes (47 GUI clients, 188 headless clients, 16 BISmark routers) and tested more than 16,000 targets (~15,000 hostnames) over 4 months. Its content-size ratio algorithm detects blockpage substitution by comparing average resource size per country against a global baseline, using a threshold of 0.3 (midpoint between the two observed distribution modes minus a 0.2 guard interval) without requiring a pre-existing uncensored ground truth.
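The content-size ratio idea can be illustrated per sample: divide each response size by a global baseline and count how often a country's samples fall below 0.3. This is a minimal sketch under an assumed baseline (the mean over all samples from all countries); UBICA's exact baseline definition may differ. The country codes and byte counts in the check are made up.

```python
from statistics import mean


def blocked_fraction(sizes_by_country, threshold=0.3):
    """Per country, the fraction of samples whose size ratio is below threshold.

    sizes_by_country: {country: [response size in bytes, one per sample]}
    Baseline (assumption): mean size over every sample from every country.
    """
    baseline = mean(s for sizes in sizes_by_country.values() for s in sizes)
    return {
        country: sum(1 for s in sizes if s / baseline < threshold) / len(sizes)
        for country, sizes in sizes_by_country.items()
    }
```

A small blockpage served in place of the real content produces ratios far below 0.3, so a country where nearly all samples fall under the threshold (as with South Korea's 98%) stands out against countries where few do.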
-
He et al. found that 65% of sampled routes between public traceroute servers have some degree of AS-level asymmetry; John et al. found that asymmetry reaches 96% on Tier-1 ISP backbone links due to hot-potato routing. These figures invalidate the symmetric-route assumption underlying Telex and Cirripede and motivate a fully asymmetric design.
-
Because Rebound never terminates the client–decoy connection, connection-state probes (including 0trace-style TTL-expiry probes that bypass the decoy router via an alternate route) cannot reveal any discrepancy between the observed and actual state: the connection to the decoy host is always exactly in the state a censor would expect.
-
Rebound's mole protocol generates a characteristic traffic pattern — a steady stream of long HTTP GET requests followed by 404-style error responses — that may be identifiable via traffic analysis even though the channel is TLS-encrypted; the paper acknowledges this as an unmitigated vulnerability and notes that intermingling with ordinary requests reduces observability but further lowers effective throughput.
-
Rebound eliminates the stack-fingerprinting vulnerability present in Telex, Curveball, Cirripede, and TapDance by never forging packets addressed to the client; all data from the decoy router to the client travels through the real decoy host, so the TCP/IP stack fingerprint observed by a censor is always that of the genuine decoy.
-
In an Internet measurement from a residential Verizon FiOS client 12 hops from the Rebound router (26 ms RTT), Rebound achieves 129,398 bytes/s (≈126 KB/s) for 1 MB transfers, compared to 354,676 bytes/s for Curveball and 1,174,240 bytes/s for plain HTTP — sufficient to stream 360p video but roughly 3× slower than Curveball. The unoptimised Python router implementation uses less than half a core of an Intel Xeon E5620 at 2.4 GHz at sustained full speed.
-
Domain fronting exploits the fact that major CDN providers (Google, Amazon CloudFront, Akamai, Microsoft Azure) terminate TLS at the edge before inspecting the Host header, so the SNI visible to a censor names a permitted CDN domain (e.g., www.google.com) while the inner HTTP Host header routes the request to a blocked destination. Blocking the fronted service requires blocking the entire CDN, creating collateral damage that most censors are unwilling to accept for major providers.
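On the wire, the trick is that the TLS handshake names the front domain (the SNI the censor sees), while the encrypted HTTP request names the real destination in its Host header. The sketch below separates the two with Python's `ssl` module. `front.example` and `blocked.example` are placeholder names, and whether a particular CDN still routes a mismatched Host header is not assumed.

```python
import socket
import ssl


def build_fronted_request(hidden_host: str, path: str = "/") -> bytes:
    """HTTP/1.1 request whose Host header names the blocked destination."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fronted_get(front_domain: str, hidden_host: str, path: str = "/") -> bytes:
    """Send the request inside TLS to the front domain (needs network access).

    server_hostname sets the SNI, so a censor observes only front_domain;
    the Host header travels encrypted inside the tunnel.
    """
    context = ssl.create_default_context()
    with socket.create_connection((front_domain, 443), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=front_domain) as tls:
            tls.sendall(build_fronted_request(hidden_host, path))
            chunks = []
            while chunk := tls.recv(4096):
                chunks.append(chunk)
    return b"".join(chunks)
```

A call such as `fronted_get("front.example", "blocked.example")` would expose only `front.example` in the ClientHello. This is the visibility gap the next finding formalizes.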
-
The paper formally characterizes the censor's visibility gap: the SNI field in the TLS ClientHello and the HTTP Host header inside the tunnel are the two places that reveal destination, and CDNs that terminate TLS before forwarding HTTP requests prevent censors from correlating them. Any censor capable of correlating SNI to inner-Host (e.g., through CDN cooperation or plaintext HTTP/2 framing) can defeat domain fronting without CDN blocking.
-
Yemen's national ISP (YemenNet) uses explicit blockpages for social and Internet-tools content while applying stealthy techniques (TCP RST injection and HTTP GET requests that go unanswered) specifically for political and conflict content that is constitutionally protected. Censorship also ceases intermittently when the ISP exhausts its filtering product licenses.
-
Beverly et al. found that 77% of Internet clients can spoof source addresses within their own /24 and 11% can spoof within their own /16, with these characteristics holding across a wide range of countries and regions. The authors use this result to argue that IP-spoofed cover traffic — where measurement probes appear to originate from many hosts in the same AS — is broadly feasible in practice.
-
The authors argue that it is almost certainly impossible to eliminate — or even definitively quantify — the risk to users who perform censorship measurements, because surveillance system capabilities are rapidly evolving and in some cases unknowable; retribution in adversarial environments may not follow due process. The paper explicitly states that its techniques had not been deployed on real networks as of writing because "a better consideration of the associated risks is warranted."
-
Surveillance systems are fundamentally more selective than censorship systems because of storage constraints: as of 2009 the NSA could store only 7.5% of the traffic received across 592 tapped 10 Gbps links, having only 69 backhaul links of 10 Gbps each, and the authors' campus network retains non-alert metadata for ~36 hours and IDS alerts for ~1 year. Censorship systems, by contrast, are transaction-focused and retain only enough data to process real-time requests. This asymmetry creates an exploitable gap: traffic that does not stand out from the population is discarded before reaching human analysts.
-
Requiring consent from device owners for co-opted censorship measurements reduces coverage and continuity, and may paradoxically increase danger: soliciting consent signals intent to participants and draws attention, whereas the prevalence of malware and third-party trackers provides plausible deniability for unwitting device owners. The authors note that more widespread co-opted measurements collectively provide greater individual protection by normalizing unexplained outbound traffic.
-
The Encore technique uses cross-origin HTTP requests to induce a visiting user's browser to silently fetch a censorship target URL, enabling passive measurement of web filtering for sites including Facebook, YouTube, and Twitter. The ethical argument for deployment rests on the observation that nearly all major websites already embed content from these platforms, so the additional traffic is indistinguishable from normal browsing behavior.
-
University IRBs are not equipped to evaluate censorship measurement research because it falls outside the formal definition of 'human subjects' research (which requires direct intervention with individuals to collect individualized data). Despite this, the work poses real and potentially serious risks to people, leaving a governance gap with no clear institutional oversight body.
-
The legality of a measurement method within a given country does not equate to safety for implicated subjects: authoritarian regimes may assess network logs based on ulterior motives unrelated to technical specifics, or may lack sufficient technical understanding to distinguish measurement traffic from deliberate access. Subjects may additionally face privacy hazards such as being falsely implicated in accessing illegal content.
-
Three approaches to gathering censorship measurements exist: deploying researchers with software (snapshot coverage, researcher safety risk), deploying software to at-risk citizens (continuous but endangers locals), and co-opting existing deployed software (continuous, widespread coverage, but raises consent issues since device owners may be unwittingly implicated). The third approach offers substantially greater measurement capabilities but introduces the most unresolved ethical risk.
-
On a 370-node PlanetLab deployment, Alibi Routing achieved near 100% success avoiding both the USA and China (Tables 1–2), with an average search cost of 1.0–1.66 nodes contacted (Table 4). In simulation over 20,000 globally distributed nodes, with the TTL capped at 7, success rates were 93–100% at δ=0.5–1.0 and the average search cost was under 40 nodes (Table 3).
-
For the vast majority of source-destination pairs avoiding the USA or China on PlanetLab, Alibi Routing introduces less than 50% latency inflation; some pairs even see latency improvement due to overlay shortcutting (Figure 9). Latency inflation is relatively insensitive to the inequality factor δ when relays are successfully found.
-
Property 1 proves that a peer inside a forbidden region F cannot satisfy the safety condition: appearing safe would require reporting an RTT lower than (3/c)·distance(peer,F), a physical impossibility. Property 2 follows: all trustworthy peers ignore packets routing through F regardless of attacker-controlled neighbor sets, making Alibi Routing safe without assuming honest neighbor selection.
-
Alibi Routing fails for source-destination pairs close to or inside the forbidden region: approximately 10% of pairs cannot provably avoid China and 22% cannot avoid the USA at δ=1.0 (Figure 5), with a strong monotonic correlation between proximity to the forbidden region and the number of available relays (Figure 6). Additionally, about 50% of nodes in target regions fail the alibi condition when avoiding the USA due to its BGP routing centrality causing actual paths to transit it despite geographic distance (Figure 7a).
-
Alibi Routing proves that packets avoided a forbidden geographic region by appealing to physical impossibility. A relay r authenticates (MACs) the packets it forwards, and the observed RTT must satisfy (1+δ)·R(s,r) < min_{f∈F}{R(s,f) + R(f,r)}. The minimum RTT between a point q and F is estimated as (3/c)·ShortestDistance(q,F), assuming fiber-optic links carry signals at 2/3 the speed of light. The proof requires only GPS coordinates and local RTT measurements; it needs no BGP modifications or PKI.
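A minimal Python sketch of the alibi test above. It makes four assumptions, none from the paper:
- The forbidden region F is approximated by a hypothetical list of sample coordinates, so the minimum over F becomes a minimum over those points.
- ShortestDistance is taken as great-circle distance.
- Each leg of the detour through F is bounded below by (3/c)·distance.
- All names are illustrative.

```python
import math

SPEED_OF_LIGHT_KM_PER_MS = 299_792.458 / 1000.0  # c in km per millisecond
EARTH_RADIUS_KM = 6371.0


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(distance_km):
    """Physical RTT lower bound: 2*d travelled at (2/3)c, i.e. (3/c)*d."""
    return 3.0 * distance_km / SPEED_OF_LIGHT_KM_PER_MS


def alibi_holds(measured_rtt_ms, src, relay, forbidden_points, delta):
    """True if (1+delta)*R(s,r) beats the fastest possible detour through F.

    forbidden_points is a hypothetical sampling of the forbidden region F.
    """
    fastest_detour = min(
        min_rtt_ms(great_circle_km(src, f)) + min_rtt_ms(great_circle_km(f, relay))
        for f in forbidden_points
    )
    return (1 + delta) * measured_rtt_ms < fastest_detour
```

Consider a single forbidden sample 90° of longitude from the source. The fastest physically possible detour is then roughly 200 ms. At δ = 0.5, a measured 10 ms RTT passes the test and a 150 ms RTT fails it.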
-
The paper identifies a structural conflict between two values. Internet research has a scalability imperative, under which a project reaching millions of devices is considered superior. Human-subjects ethics frameworks, by contrast, are designed to minimize the number of people exposed to risk. Under U.S. law, Encore is compliant for two reasons. It exploits known, intentional web functionality: the same-origin policy permits cross-origin requests for embedded resources. It also provides an opt-out mechanism. The authors note, however, that this compliance does not transfer to every jurisdiction where measurements occur.
-
C-Saw's design demonstrates that coupling circumvention capability with censorship measurement creates a self-reinforcing incentive loop: users opt in for improved page load times, their participation grows the vantage-point pool, and richer measurements enable finer-grained technique selection per ISP and URL. The system avoids requiring a pre-populated URL list by building a blocked-URL database dynamically from user-initiated requests.
-
Rook constructs per-field symbol tables by observing 600 packets (~60 seconds) of real gameplay at session start, then restricts substituted values to only those previously observed with frequency within two orders of magnitude of the median. This ensures altered packets never contain field values that are absent or anomalously rare in legitimate traffic, defeating value-anomaly and out-of-range DPI filters.
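The value-restriction rule can be sketched as follows. This assumes the 600 observed packets have already been parsed into per-field value lists, which this sketch does not cover. It also reads "within two orders of magnitude of the median" as a factor of 100 in either direction, which is an interpretation. Function names are hypothetical.

```python
from collections import Counter
from statistics import median


def build_symbol_table(observed_values, spread=100):
    """Keep field values whose observed count lies within `spread`x of the median count.

    spread=100 is one reading of "within two orders of magnitude of the median".
    """
    counts = Counter(observed_values)
    if not counts:
        return set()
    med = median(counts.values())
    return {value for value, n in counts.items() if med / spread <= n <= med * spread}


def may_substitute(value, table):
    """A substitute value is allowed only if it appeared often enough in real play."""
    return value in table
```

Under this rule, a value seen only twice, when the median count is 350, falls outside the table. It can therefore never be chosen as a substitute.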
-
Rook achieves 34 bits/second client-to-server and 26 bits/second server-to-client within Team Fortress 2, sufficient for OTR-encrypted real-time chat. Rook use did not trigger Valve Anti-Cheat warnings and did not noticeably degrade gameplay for co-located legitimate players.
-
Kolmogorov-Smirnov two-sample tests on packet-size distributions and inter-packet timing show that standard Rook (altering ~1-in-10 packets) is statistically indistinguishable from normal TF2 gameplay across 20 samples each. High-bandwidth Rook (1-in-2 packets) shows a slightly higher average bandwidth but remains difficult to distinguish on traffic-shape metrics.
-
Because Rook runs the actual game client and server rather than mimicking them, active anti-mimicry probes receive identical responses to a normal game instance. Systems based on protocol mimicry are vulnerable to probes that expose non-conforming behavior, but Rook eliminates this attack surface entirely.
-
Game-specific trigram analysis of mutable fields distinguishes high-bandwidth Rook (1-in-2 substitution rate) in server-side packet counts, showing clearly reduced distinct-trigram counts versus baseline. Standard Rook (1-in-10) produces only a few outliers and is not reliably distinguishable; any detector would face a high false-negative or false-positive rate against normal-rate Rook.
-
Format-transforming encryption (FTE), as deployed in the Tor Browser Bundle, is detected by two checks on the first HTTP GET request: a URI Shannon-entropy threshold (≥5.5 bits) and an exact URI length of 239 bytes. This combined test produces only 264 false positives across approximately 10 million HTTP URIs in three campus datasets. A length-only test yields a false-positive rate of roughly 15% over the same flows.
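A short sketch of the combined test. It assumes the URI of the first GET has already been extracted from the flow, and that entropy is measured over the URI's characters. The function names are illustrative.

```python
import math
from collections import Counter

FTE_URI_LENGTH = 239        # exact URI length reported for FTE's first GET
ENTROPY_THRESHOLD = 5.5     # bits per character


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_like_fte(first_get_uri):
    """Flag a flow whose first HTTP GET URI has FTE's fixed length and high entropy."""
    return (len(first_get_uri) == FTE_URI_LENGTH
            and shannon_entropy(first_get_uri) >= ENTROPY_THRESHOLD)
```

The two checks are ordered deliberately. The cheap length check discards most URIs, and the entropy check then removes benign 239-byte URIs that a length-only rule would flag.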
-
CART decision-tree classifiers trained on entropy-based and packet-header features detect all five Tor pluggable transports (obfsproxy3/4, FTE, meek-amazon, meek-google) with average PR-AUC=0.987, TPR=0.986, and FPR=0.003 on synthetic traces. On 14 million real campus flows the highest per-obfuscator FPR is 0.65%, and meek-google yields only 842 false positives across all three datasets. However, cross-environment portability is poor: classifiers trained on an Ubuntu/campus setup and tested on a Windows/home network achieve true-positive rates as low as 52% with false-positive rates reaching 12%.
-
The paper demonstrates that 'having no fingerprint is itself a fingerprint': randomizing obfuscators that emit uniformly random bytes from the first packet are detectable precisely because conventional protocols (TLS, SSH, HTTP) always begin with fixed plaintext headers. This structural distinction requires no deep payload parsing — the attack operates on only the first TCP packet — and achieves TPR=1.0 / FPR=0.002 against obfsproxy3/4 using commodity-implementable statistics.
-
Obfsproxy3 and obfsproxy4 are reliably detected by an entropy-distribution test (KS test, block size k=8) applied to the first 2,048 bytes of the first client-to-server packet, combined with a minimum payload-length check of 149 bytes. On three university campus datasets totaling over 14 million TCP flows, the test achieves TPR=1.0 with FPR ranging from 0.24% to 0.33%. Omitting the length check raises the SSL/TLS false-positive rate to approximately 23%.
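A sketch of the entropy-distribution test with the length check. It rests on these assumptions, none from the paper:
- The reference distribution comes from seeded pseudo-random bytes.
- The two-sample KS statistic D is computed directly rather than via a p-value.
- The 0.2 cut-off on D is a hypothetical stand-in for the paper's significance criterion.

`Random.randbytes` requires Python 3.9 or later.

```python
import math
import random
from bisect import bisect_right
from collections import Counter

BLOCK_SIZE = 8          # k = 8
MAX_BYTES = 2048        # only the first 2,048 bytes of the first packet
MIN_PAYLOAD_LEN = 149   # minimum payload-length check
KS_THRESHOLD = 0.2      # hypothetical cut-off on the KS statistic D


def block_entropies(data, k=BLOCK_SIZE):
    """Shannon entropy (bits, rounded) of each non-overlapping k-byte block."""
    result = []
    for start in range(0, len(data) - k + 1, k):
        counts = Counter(data[start:start + k])
        h = -sum((n / k) * math.log2(n / k) for n in counts.values())
        result.append(round(h, 6))  # rounding merges float-noise duplicates
    return result


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in set(a) | set(b))


# Reference: block entropies of uniformly random bytes (seeded for repeatability).
REFERENCE = block_entropies(random.Random(0).randbytes(16 * MAX_BYTES))


def looks_randomized(first_payload):
    """Flag a first client-to-server payload as likely obfsproxy3/4."""
    if len(first_payload) < MIN_PAYLOAD_LEN:
        return False  # the length check suppresses short, high-entropy payloads
    sample = block_entropies(first_payload[:MAX_BYTES])
    return ks_statistic(sample, REFERENCE) < KS_THRESHOLD
```

A uniformly random payload has block entropies close to the reference, so D stays small and the payload is flagged. Plaintext-headed protocols such as TLS or HTTP have lower-entropy blocks and push D up.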
-
A semantics-based attack that flags HTTP flows carrying structurally invalid PDF documents as Stegotorus produces false-positive rates as high as 43% across three campus datasets (10,847 PDF flows examined), because malformed, partial, and non-standard PDFs are common in real network traffic. By contrast, active HTTP-response fingerprinting of a suspected Stegotorus server yields only 0.03% false positives (3 matching servers out of 9,320 Alexa-top-10K servers), but requires active probing and is detectable by the proxy operator.
-
The paper acknowledges that modern blind steganalysis tools combining first- and second-order statistical classifiers (e.g., SVM-based universal steganalysis) are likely capable of detecting TRIST-embedded images, though this was not experimentally verified. The authors note these attacks rely on large feature vectors and are computationally more expensive than histogram or blockiness attacks, but do not claim invulnerability.
-
TRIST evades the self-calibrated blockiness detector — proven effective against OutGuess — by embedding at JPEG quality 30 and then transcoding the steg image up to quality 90 before transmission. This renders the blockiness-based message length estimator unreliable across the full range of message lengths from 0 to approximately 39 KB, as shown over 20 cover images from the BOSS dataset.
-
By embedding messages in heavily quantized DCT frequency components at base JPEG quality 30, TRIST achieves near-zero bit error rates when images are transcoded to higher quality levels and back. The quantization mapping is many-to-one, so noise introduced by re-encoding tends to be stabilized on output, making the message robust against commodity transcoding proxies that re-encode images in-flight.
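A toy model of why coarse quantization survives transcoding:
1. The sender stores a coefficient as an index at a coarse step.
2. A proxy re-encodes it at a finer step, adding some noise.
3. The receiver re-quantizes at the coarse step.

The step sizes (48 and 7) and the noise bound are hypothetical numbers chosen for illustration; they are not the JPEG quality-30 or quality-90 tables. The model also ignores the DCT/IDCT and colour-space round trips.

```python
import random


def quantize(coefficient, step):
    return round(coefficient / step)


def dequantize(index, step):
    return index * step


def survives_transcode(index, coarse_step, fine_step, noise):
    """True if the coarse index is recovered after one finer re-encode."""
    coefficient = dequantize(index, coarse_step)                      # sender
    reencoded = dequantize(quantize(coefficient + noise, fine_step), fine_step)  # proxy
    return quantize(reencoded, coarse_step) == index                  # receiver


def survival_rate(coarse_step, fine_step, noise_bound, trials=2000, seed=0):
    """Fraction of random indices/noise draws that survive the round trip."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        index = rng.randint(-20, 20)
        noise = rng.uniform(-noise_bound, noise_bound)
        ok += survives_transcode(index, coarse_step, fine_step, noise)
    return ok / trials
```

With a coarse step of 48, the worst-case error of 5 + 3.5 stays well under half a coarse step, so every index survives. With a coarse step of 4 and a proxy step of 7, index 1 comes back as 2. That is the many-to-one failure that embedding at low quality avoids.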
-
Using low DCT frequency components (indices 10, 9, 8, 3) at JPEG quality 30 achieves near-zero message error rates for image rescaling in the 75–95% range across a wide range of sharpening sigma values. Higher-frequency component sets (indices 18, 17, 16, 10) only survive rescaling above 100%, making them unsuitable for scenarios where censors reduce image dimensions.
-
TRIST integrated with StegoTorus as a one-hop SOCKS proxy introduces minimal additional bandwidth overhead: JPEG steganography throughput falls between StegoTorus's PDF and JSON schemes across link delays of 20–400 ms and 1–4 parallel circuits. The steganographic expansion factor is 1:6 to 1:12 (message bytes to cover JPEG file length), adequate for basic web surfing.
-
Using TCP IPID side channels combined with SYN backlog state inference, the authors detect intentional packet drops between two arbitrary Internet hosts without controlling either host. The only requirements are a client with a globally incrementing IPID (~1% of IP space) and a server with an open port; an ARMA model handles autocorrelated noise.
-
Client-to-server packet drops (RSTs from client to server are dropped in transit) indicate the simplest null-routing mechanism: the server's destination IP is null-routed at the censor. The method distinguishes this from server-to-client drops (stateless return-path filtering) and from RST/ICMP injection—cases where the packet is not dropped but a forged termination packet is inserted—which both appear as the 'no-packets-dropped' outcome in the IPID time series.
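A simplified sketch of the three-way inference. It rests on these assumptions, none from the paper:
- The client's background IPID rate is already known. The paper models background noise with ARMA; here it is a single hypothetical constant.
- Each spoofed SYN makes the server send one SYN-ACK to the client, plus retransmissions only when no RST reaches the server.
- The 0.5 and 1.5 cut-offs are illustrative.
- IPID wraparound is ignored.

```python
def classify_drops(ipid_increase, window_s, background_rate_per_s, spoofed_syns):
    """Infer where packets are dropped from the client's global IPID counter.

    ipid_increase: observed IPID advance over the measurement window.
    background_rate_per_s: hypothetical estimate of the client's normal IPID rate.
    spoofed_syns: SYNs sent to the server with the client's address as source.
    """
    extra = ipid_increase - background_rate_per_s * window_s
    per_syn = extra / spoofed_syns  # RSTs the client sent per spoofed SYN
    if per_syn < 0.5:
        # SYN-ACKs never reached the client, so it sent no RSTs.
        return "server-to-client drop"
    if per_syn < 1.5:
        # One SYN-ACK, one RST; injected RST/ICMP also lands here.
        return "no packets dropped"
    # RSTs were lost, so the server retransmitted SYN-ACKs and the client answered each.
    return "client-to-server drop"
```

Extra IPID increments above the background therefore separate the three outcomes: about zero, about one, or several per spoofed SYN.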
-
Page length comparison at a 30.19% size-difference threshold achieves a 95.03% true positive rate and 1.371% false positive rate for block page detection, outperforming DOM similarity (95.35% TP, 3.732% FP) on false positive rate and cosine similarity (97.94% TP, 1.938% FP, 74.23% precision) on precision. These metrics were evaluated via ten-fold cross-validation on the ONI dataset of ~500,000 entries from 49 countries spanning 2007–2012.
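A sketch of the length-comparison rule. It assumes a control fetch of the same URL from an unfiltered vantage point is available. It also assumes the size difference is normalized by the larger of the two page lengths; the exact normalization used in the paper is not stated here.

```python
LENGTH_THRESHOLD = 0.3019  # 30.19% size difference


def relative_size_difference(test_len, control_len):
    """Difference between two page lengths as a fraction of the larger one."""
    larger = max(test_len, control_len)
    if larger == 0:
        return 0.0
    return abs(test_len - control_len) / larger


def is_block_page(test_html, control_html, threshold=LENGTH_THRESHOLD):
    """Flag the test page if its length differs from the control by more than the threshold."""
    return relative_size_difference(len(test_html), len(control_html)) > threshold
```

A 1,000-byte response to a URL whose control copy is 10,000 bytes is flagged. A 9,500-byte response is not.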
-
Five commercial filtering products (FortiGuard, Squid, Netsweeper, Websense, WireFilter) were identified in 7 of 36 block-page clusters via copyright notices in HTML comments, HTTP header strings, or URL path patterns; the remaining 29 clusters contained no identifying markup. WireFilter was first detected in the wild in Saudi Arabia (AS 25019) in 2011, representing a newly deployed filtering product not previously observed in measurements.
-
Within a single country mandate, different ISPs implement censorship with different filtering tools and mechanisms: Thailand's AS 9737 and AS 17552 use structurally distinct block-page templates (vector 17 is ~1,000 bytes using div layout; vector 8 is ~6,000 bytes using table layout). Both ISPs actively obfuscate their filtering product by reporting generic 'Server: Apache/2.2.9 (Debian)' or 'Server: Apache' HTTP headers instead of the actual product identifier.
-
Term frequency clustering of block pages achieves an F-1 measure of 0.98, correctly recovering manually identified block-page templates; page-length clustering performs far worse at F-1 of 0.64. Across the full ONI dataset, only 37 distinct term frequency vectors were found from five years of measurements, indicating that filtering vendors rarely change block-page HTML structure.
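One way to illustrate the "few distinct vectors" observation: represent each block page by the counts of its HTML start tags, then group pages whose vectors are identical. Two choices here are assumptions rather than the paper's method. Tags are treated as the "terms", and grouping is by exact match; the paper's clustering may weight terms or merge near-identical vectors.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class _TagCounter(HTMLParser):
    """Collects start-tag names (lowercased by HTMLParser) from a document."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def term_frequency_vector(html):
    """Hashable tag-count vector for one page."""
    parser = _TagCounter()
    parser.feed(html)
    parser.close()
    return frozenset(parser.tags.items())


def group_by_vector(pages):
    """Map each distinct term-frequency vector to the page IDs that share it."""
    groups = defaultdict(list)
    for page_id, html in pages.items():
        groups[term_frequency_vector(html)].append(page_id)
    return groups
```

Two pages built from the same template with different wording fall into the same group. A page with a different layout, such as a table instead of a div, forms its own group.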
-
By deploying covert channels inside legitimate high-traffic web services (e.g., OpenSearch sites), Facade raises the censor's cost of blocking to unacceptable collateral damage: blocking Facade requires blocking the legitimate web service, which harms local businesses and normal users. Facade explicitly assumes censors are unwilling to block major platforms such as AWS or popular search services.
-
Facade encodes 78.04 bits per HTTP GET request using search-query terms, compared to Infranet's 3 bits per URL — a ~26× improvement — while maintaining comparable statistical deniability. StegoTorus encodes 12,000 bits per URL but offers no statistical deniability against traffic-pattern analysis.
-
Facade routes all encoded HTTP requests through a Selenium-controlled Chrome browser instance, so every message the censor observes is generated by a real browser implementation. This defeats 'parrot attack' fingerprinting, which exploits discrepancies between a protocol emulator's responses to error conditions and those of the genuine client or server.
-
Facade faces an inverse tradeoff between upstream throughput and deniability: pure search encoding maximizes bits per request (78.04 bits) but does not reflect real user click behavior, while mixing in click-range mapping (lg(k) bits per URL, k=8 → 3 bits) reduces throughput but better models normal browsing. Neither pure strategy is optimal; the design requires tuning the search-to-click ratio.
-
Analysis of the AOL search corpus shows an average search query length of 17.42 bytes with an entropy of 4.48 bits/byte, yielding 78.04 bits of deniable information per HTTP GET request. This entropy matches real user search behavior, making entropy-based traffic analysis unable to distinguish Facade traffic from genuine search sessions.
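The estimate is a product of mean query length and per-byte entropy, sketched below. Pooling all queries into one byte distribution is an assumption; the paper's exact estimator may differ.

```python
import math
from collections import Counter


def bits_per_request(queries):
    """Estimate deniable bits per GET: mean query length x per-byte corpus entropy."""
    corpus = "".join(queries).encode("utf-8")
    counts = Counter(corpus)
    total = len(corpus)
    entropy_per_byte = -sum((n / total) * math.log2(n / total) for n in counts.values())
    mean_length = total / len(queries)
    return mean_length * entropy_per_byte
```

Plugging in the AOL figures gives 17.42 bytes × 4.48 bits/byte ≈ 78.04 bits per request.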
-
Content inconsistency — transmitting non-native payloads (e.g., modem signals or general web traffic) over VBR-encoded VoIP/video channels — is sufficient for censors to detect camouflage systems via packet-length traffic analysis. Channel inconsistency — requiring reliable transport over a loss-tolerant UDP channel — allows selective disruption: dropping 5% of packets stalls SkypeMorph indefinitely, and dropping 90% for under one second desynchronizes the FreeWave modem.
-
A single Facet server with 15 Mbit/s bandwidth and 4 virtual CPU cores supports up to 20 simultaneous video-streaming sessions. Server-side downstream bandwidth consumption is approximately 150 kbit/s lower than a Squid HTTP proxy for equivalent video quality (240p), because Facet fetches only the video stream without page overhead such as advertisements and images.
-
For decentralized videoconferencing systems (e.g., Skype), where peers communicate directly, publicly distributing the Facet server's conferencing ID allows a censor to pinpoint the server's IP address via active probing. Centralized systems (e.g., Google Hangouts, FaceTime) hide the proxy IP behind the provider's relay server, so active probing cannot identify the Facet server.
-
Without traffic morphing, a χ² packet-length classifier can identify 90% of Facet (video-over-Skype) sessions with only a 10% false positive rate on genuine videoconferencing. To block 80% of Facet connections, the censor need only disrupt 4% of genuine Skype calls; blocking 70% requires disrupting only 2%.
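A sketch of a χ² packet-length test. It bins a session's packet sizes, compares that histogram with a reference histogram from genuine videoconferencing, and flags the session if the statistic exceeds a threshold. It uses the symmetric χ² distance, one common variant; the paper's exact statistic may differ. The bin width, the reference data, and the threshold are hypothetical. Raising or lowering the threshold trades detection rate against the false-positive rate quoted above.

```python
from collections import Counter


def length_histogram(packet_lengths, bin_width=100):
    """Normalized histogram of packet lengths, keyed by bin index."""
    counts = Counter(length // bin_width for length in packet_lengths)
    total = len(packet_lengths)
    return {b: n / total for b, n in counts.items()}


def chi_square_distance(observed, expected):
    """Symmetric chi-square distance between two normalized histograms (0 to 2)."""
    bins = set(observed) | set(expected)
    return sum((observed.get(b, 0.0) - expected.get(b, 0.0)) ** 2
               / (observed.get(b, 0.0) + expected.get(b, 0.0))
               for b in bins)


def flag_session(packet_lengths, reference_hist, threshold, bin_width=100):
    """Flag a session whose length distribution is far from the reference."""
    hist = length_histogram(packet_lengths, bin_width)
    return chi_square_distance(hist, reference_hist) > threshold
```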
-
Facet's video morphing — embedding the requested video in a fraction s of H.264 macroblocks within a randomly chosen chat video — raises the censor's required false positive rate dramatically. At steganography level s=0.125, blocking 90% of Facet connections requires disrupting over 40% of genuine videoconferencing traffic; blocking 80% requires disrupting at least 20% of legitimate calls.
-
DFA state-space explosion makes DFA-based FTE impractical for many realistic network-monitor regexes: the minimum DFA for `(a|b)*a(a|b){16}` has 131,073 states requiring 266 MB of precomputed tables, while the equivalent NFA has only 36 states requiring 73 KB — a reduction of roughly four orders of magnitude. Some formats in the Snort corpus required up to 383 MB under DFA-based ranking, rendering them prohibitive for deployment.
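The blow-up can be reproduced with a subset construction on a small hand-built NFA for `(a|b)*a(a|b){n}`. This is not libfte's construction: the NFA below has n + 2 states, whereas the paper's NFA count comes from its own construction. No dead state is added.

```python
def nfa_step(n, state, symbol):
    """NFA for (a|b)*a(a|b){n}: state 0 loops, 'a' enters 1, n more symbols reach accept."""
    accept = n + 1
    if state == 0:
        return {0, 1} if symbol == "a" else {0}
    if state < accept:
        return {state + 1}
    return set()  # the accepting state has no outgoing edges


def reachable_dfa_states(n):
    """Count the subsets reachable by the subset construction."""
    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for symbol in "ab":
            nxt = frozenset(s for q in current for s in nfa_step(n, q, symbol))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)
```

The count doubles with each increment of n (2^(n+1)). For n = 16 it gives 2^17 = 131,072 reachable subsets, in line with the 131,073-state minimum DFA quoted above. The NFA, meanwhile, grows only linearly in n.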
-
In PostgreSQL benchmarks, FPE-encrypted account-balance fields (libfte P-DD scheme, regex `\-[0-9]{9}`) reduce throughput by only 0.8% for complex mixed-transaction workloads (USUUI) and only 1.1% for SELECT-only workloads, relative to conventional authenticated encryption. Per-query latency for FPE versus authenticated encryption is identical across all five tested query types.
-
A deterministic FTE scheme (T-DD) that maps 16-digit credit card numbers to 7-byte ciphertext strings achieves simultaneous encryption and compression, reducing on-disk table size from 112 MB (authenticated encryption) to 42 MB — a 62.5% reduction — while maintaining provable privacy. The compression arises because the ciphertext format's message space is smaller than the plaintext's.
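The compression argument is about message-space size. There are 10^16 possible card numbers; indexing them needs 54 bits, so 7 bytes suffice, compared with 16 ASCII bytes. The sketch below only ranks and packs. In a real FTE/FPE scheme, a cipher over the 7-byte space would sit between ranking and packing; that step is omitted, and all helper names are hypothetical.

```python
def min_bytes_for(message_space_size):
    """Smallest byte width whose 2**(8*b) values can index every message."""
    bits = (message_space_size - 1).bit_length()
    return -(-bits // 8)  # ceiling division


def rank_card_number(digits):
    """Rank a 16-digit decimal string into [0, 10**16)."""
    if len(digits) != 16 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 16 ASCII decimal digits")
    return int(digits)


def pack(rank, width):
    """Fixed-width big-endian encoding of the rank."""
    return rank.to_bytes(width, "big")


def unpack(blob):
    """Inverse of pack + rank: back to a zero-padded 16-digit string."""
    return f"{int.from_bytes(blob, 'big'):016d}"
```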
-
LibFTE exposes a regex-based API (Python, C++, JavaScript) that instantiates DPI-defeating FTE schemes from a regular-expression format specification alone, without expert cryptographic knowledge. The DCRS FTE scheme implemented in the library makes ciphertexts indistinguishable from real HTTP, SMTP, SMB, or other network-protocol messages under state-of-the-art DPI, and was already integrated into the Tor Browser Bundle at time of publication.
-
LibFTE's NFA-based 'relaxed ranking' sidesteps the PSPACE-hardness obstacle that previously made direct NFA ranking unworkable. Across 3,458 Snort IDS regular expressions in the network-monitor-circumvention setting, NFA-based ranking reduces client/server memory requirements by as much as 30% compared to DFA-based approaches.
-
In a 140-hour measurement, a 10-node Darknet was connected to the Opennet by a single bridge link. Requests forwarded into the Darknet succeeded only 0.08% of the time, versus 8.46% for requests forwarded within the Opennet. This roughly 100× gap in success rate is caused by ID-space isolation between the two overlay segments.
-
An 8-week measurement in June–August 2012 discovered 58,571 unique Freenet installations across 102,376 distinct IP addresses; approximately 25% were in the US and 12.5% in Germany, with Europe and North America collectively representing the vast majority — users from countries typically associated with Internet censorship were a small minority.
-
Freenet's deployed Opennet topology uses uniformly random long-range contacts rather than Kleinberg-optimal distance-proportional selection, yielding an average routing length of 37.17 hops in simulation; adopting a 1/d distance distribution (r=1) reduces this to fewer than 13 hops — a 2.9× improvement achievable via a Kademlia-style bucket system.
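The difference between the two contact distributions can be sketched by sampling long-range contact distances with P(d) ∝ 1/d^r. Here r = 0 is uniform and r = 1 is Kleinberg's distribution. The location space is modelled as a ring of N discrete positions; this is an assumption, since Freenet locations are real numbers on a circle. The Kademlia-style bucket system is not shown.

```python
import random


def sample_contact_distances(ring_size, r, count, rng):
    """Draw `count` ring distances d in [1, ring_size // 2] with P(d) proportional to 1/d**r."""
    distances = range(1, ring_size // 2 + 1)
    weights = [1.0 / d ** r for d in distances]
    return rng.choices(distances, weights=weights, k=count)


def short_link_fraction(samples, max_distance):
    """Fraction of sampled contacts that are within max_distance positions."""
    return sum(d <= max_distance for d in samples) / len(samples)
```

On a 1,000-position ring, about 43% of r = 1 contacts fall within 10 positions, versus about 2% under uniform selection. Those nearby contacts are what greedy routing needs in its final hops.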
-
Freenet users exhibit a median session length of 95–99 minutes (p=0.975–0.99), substantially longer than all measured P2P file-sharing systems (1–60 minutes for Napster, Gnutella, FastTrack, Overnet, BitTorrent, KAD); ~2% of sessions exceeded 100 hours, and the distribution is best modeled by a lognormal fit (residual error 0.019) rather than Weibull or exponential.
-
The FNPProbeRequest message, designed to return location and uptime of a node sampled via an 18-hop Metropolis-Hastings random walk, can be used to reliably track individual node online times — capturing >98% of online nodes per sampling interval — enabling intersection attacks on anonymity even though it cannot target a specific node by design.
-
The paper argues that the advantage in the censor-vs-circumvention arms race lies with the censor due to fundamental asymmetry: a nation state controls centralized communication infrastructure while dissidents depend on it. Standalone anti-censorship tools therefore face a structurally disadvantaged security posture that iterative patching cannot overcome.
-
Centralized communication architectures have a single global point of failure: governments can leverage centralization to surveil with or without operator cooperation, as demonstrated by the Snowden revelations about Skype, Facebook, and Google. A compromised broker in a centralized design enables monitoring and censorship that spans all users of the service.
-
The paper sketches a decentralized DHT-based communication protocol where all payloads are encrypted in TLS and explicit redirection enables a form of onion routing. Because the censor cannot distinguish censored from non-censored streams, it is forced into a binary choice: block all protocol traffic (overblocking) or allow all of it.
-
If a communication protocol is regularly used for business and commerce, blocking it may be too politically and economically costly for a censor. The paper posits that censorship resistance achieved as a side-effect of widespread general adoption is harder to defeat than a niche protocol designed solely to circumvent censorship.
-
Known attacks on existing circumvention tools include steganographic detection, enumeration of decoy-router locations, and machine-learning traffic classifiers. The paper acknowledges that these attacks defeat current approaches (Infranet, Collage, Telex, SkypeMorph, FreeWave) and argues that no iterative patch can neutralize the censor's long-term structural advantage.
-
DNSSEC fails to withstand legal attacks because governments can legally compel DNS authority operators to manipulate entries and certify the changes; the trust chains DNSSEC establishes mirror DNS zone delegations and therefore inherit the same jurisdictional vulnerabilities. A Danish police incident demonstrated the collateral damage: 8,000 legitimate domains were accidentally removed when censorship procedures were executed against a single target. Chinese DNS injection has been shown to have worldwide effects on name resolution through out-of-bailiwick NS record chains.
-
Blockchain-based naming systems such as Namecoin are insufficient under a strong adversary model where a nation-state can muster more computational resources than all other participants combined, allowing it to produce alternative valid chain histories. This vulnerability is most acute during system bootstrapping and in censored regions where the user base is small, precisely the conditions under which a censorship-resistant naming layer is most needed.
-
Asymmetric IP routing is a fundamental constraint on prior E2M designs: tier-2 ISPs typically see around 25% of packets on asymmetric paths, while tier-1 ISPs can have up to 90% of packets on asymmetric flows. Because Telex requires observing both directions of a connection to derive the client-server TLS master secret, this asymmetry severely constrains where it can be deployed. TapDance resolves this by using chosen-ciphertext steganography to leak the master secret from client to station in a single upstream packet, making it functional under fully asymmetric routing.
-
TapDance introduces chosen-ciphertext steganography, which allows the client to embed an arbitrary-length hidden message inside a valid TLS ciphertext without invalidating the TLS MAC or session. By exploiting ciphertext malleability in both stream-cipher (counter) mode and CBC mode, the client can choose specific byte values to appear in the ciphertext while constraining plaintext to a safe ASCII range (0x40–0x7F), encoding 6 bits of tag data per ciphertext byte. This provides unbounded covert-channel bandwidth, compared to the fixed 224-bit TLS nonce used by Telex and Decoy Routing or the 24-bit TCP ISN used by Cirripede.
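A byte-level sketch of the stream-cipher (counter mode) case, where plaintext = ciphertext XOR keystream and the client knows the keystream. It assumes the tag occupies the low six bits of each chosen ciphertext byte; this bit layout and the function names are illustrative. The CBC-mode variant and TLS record and MAC handling are omitted.

```python
def choose_ciphertext_byte(keystream_byte, tag_bits):
    """Pick a ciphertext byte carrying 6 tag bits whose plaintext lies in 0x40-0x7F."""
    if not 0 <= tag_bits < 64:
        raise ValueError("tag_bits must fit in 6 bits")
    # Choose the top two ciphertext bits so the plaintext's top two bits become 0b01.
    high = (keystream_byte & 0xC0) ^ 0x40
    return high | tag_bits


def plaintext_for(ciphertext_byte, keystream_byte):
    """Counter-mode decryption of a single byte."""
    return ciphertext_byte ^ keystream_byte


def extract_tag_bits(ciphertext_byte):
    """What a passive observer such as the station reads back."""
    return ciphertext_byte & 0x3F
```

Only the top two plaintext bits are constrained. The low six plaintext bits therefore absorb whatever the keystream contributes, while the observer reads exactly the tag bits off the wire.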
-
All three prior end-to-middle (E2M) schemes — Telex, Cirripede, and Decoy Routing — require an inline flow-blocking component at the participating ISP, which adds latency, introduces a single point of failure, and may violate carrier SLAs. In private discussions with ISPs, the authors found that despite willingness to assist Internet freedom technically and financially, none were willing to deploy existing E2M technologies due to these operational impacts. TapDance removes the inline blocking requirement entirely, requiring only a passive tap and packet-injection capability.
-
Scanning a 1% sample of the IPv4 address space and the Alexa top-1-million domains, the authors found that over half of all TLS hosts will leave an incomplete HTTP request connection open for at least 60 seconds before sending data or closing the connection; many had timeouts exceeding 5 minutes. The 16-core TapDance station prototype processes over 12,000 tag verifications per second per core, with approximately 90% of CPU time consumed by a single ECC point multiplication on Curve25519. The station adds a median latency of 270 milliseconds to page downloads versus direct connections, and a single station instance can be overwhelmed by approximately 1.2 Gbps of TLS application-layer traffic.
-
Because TapDance does not block client-to-server packets, a censor can inject a TCP packet with a stale acknowledgment number directly to the true decoy server; the server will reply with its actual TCP sequence state, which will differ from the sequence numbers the TapDance station has been using — confirming the flow is proxied. This active packet-injection attack is qualitatively easier to execute against TapDance than against Telex or Cirripede, which used inline blocking to prevent such probes from reaching the server. Table 1 in the paper confirms that TapDance, unlike Telex, lacks replay/preplay attack resistance and has no traffic-analysis defense.
-
In the August 2012 Bell-Dery BGP route leak, TTL analysis at per-prefix granularity revealed that two IP addresses within AS577 maintained constant TTLs and unaffected packet rates throughout the disruption, while 37 of 38 other active /16 prefixes experienced significant volume drops and TTL changes indicating rerouting through longer paths. This demonstrates that BGP route leaks can affect subnets within a single AS asymmetrically, and that TTL inspection can identify unaffected sub-AS paths.
-
During the February 2012 Dodo-Telstra BGP route leak, AS1221 (Telstra) exhibited a 20-minute congestion phase in which γC and γ3 both dropped while η rose from approximately 3 to 5 seconds, followed by a complete outage during which zero darknet sources were observed from the AS. The congestion phase produced measurable packet loss before the full blackout, providing an early-warning window of roughly 20 minutes.
-
Conficker-like traffic to TCP port 445 constitutes more than 40% of packets observed at the UCSD /8 Network Telescope and Windows XP/NT hosts consistently emit exactly 2-packet SYN flows; γC stayed within the narrow band 1.98–2.02 throughout an entire month (January 2012) with no large-scale outages. A second signal from default Windows 3-SYN flows (approximately 156 million flows/month from ~14K hosts/hour) provides a non-malware-specific validation stream with inter-packet times consistently between 3.09 and 3.37 seconds.
-
Two IBR-derived metrics can distinguish packet-loss-induced outages from packet-filtering censorship:
- γ: the average number of SYN packets per flow, counting retransmissions.
- η: the inter-packet time between retransmitted SYNs.

During Libya's 2011 packet-filtering phase, γC remained near pre-censorship values despite reduced source counts. BGP route leaks, by contrast, caused measurable γ decreases and η increases. The difference arises because filtering reduces the host population but preserves per-flow OS retransmit behavior, while congestion causes routers to drop individual packets mid-flow.
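γ and η can be computed from darknet SYN flows as follows. This assumes flows have already been reconstructed, for example keyed by source, destination, and port, as sorted lists of SYN timestamps. The flow-keying step is not shown, and the data layout is an assumption.

```python
from statistics import mean


def gamma_eta(flows):
    """Return (gamma, eta) for a list of flows of SYN timestamps in seconds.

    gamma: mean number of SYN packets per flow (retransmissions included).
    eta: mean gap between successive SYNs within a flow.
    """
    gamma = mean(len(f) for f in flows)
    gaps = [later - earlier for f in flows for earlier, later in zip(f, f[1:])]
    eta = mean(gaps) if gaps else float("nan")
    return gamma, eta
```

For example, two flows of two SYNs each, spaced 3.1 s and 3.3 s apart, give γ = 2 and η ≈ 3.2 s. That is in the range of the Windows retransmit timings quoted above.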
-
Libya's 2011 Internet shutdown combined two distinct censor techniques across separate episodes: BGP-level route withdrawal and later packet filtering. During the packet-filtering episode, γC remained near its pre-censorship baseline (~2.0 packets/flow) even as the number of reachable Conficker sources dropped, confirming that the mechanism was per-subnet allowlisting rather than link saturation.
-
Self-censorship of status updates was significantly higher for users whose friend networks spanned greater political diversity, indicating that perceived audience heterogeneity amplifies the chilling effect even in the absence of any explicit platform enforcement action.
-
Das and Kramer measured last-minute self-censorship on Facebook by tracking text typed into composer fields but never submitted; 71% of users in their study composed at least one status update or comment that they ultimately withheld during the 17-day observation window.
-
The study collected behavioral data from approximately 3.9 million Facebook users by instrumenting client-side JavaScript to detect composition-then-abandonment events, without capturing the suppressed text itself; this passive measurement approach allowed population-scale inference of suppression rates without content-level surveillance.
-
Comment self-censorship was primarily driven by the relationship between commenter and post author: users were more likely to suppress comments when the audience included the post's author, underscoring that relational asymmetry — not just content sensitivity — shapes suppression decisions.
-
In 80% of measured paths (72 PlanetLab VPs × 5,000 Alexa targets), at least one intermediate router returns the full IP packet in ICMP time-exceeded replies (RFC1812-compliant), enabling per-hop detection of packet modifications. The majority of these full-ICMP routers reside in the network core rather than the access segment.
-
Middleboxes that randomize TCP sequence numbers do not update the sequence numbers inside TCP SACK blocks; tracebox found two PlanetLab VPs with stateful seq-number randomizers that cycled approximately every 20 seconds. When SACK blocks reference sequence numbers outside the current window, the Linux TCP stack waits for a full RTO instead of fast-retransmitting, producing up to 50% throughput degradation in controlled measurements.
-
Of 72 PlanetLab vantage points, 7 (~10%) automatically stripped or replaced TCP options (Multipath TCP, MD5, and Window Scale) with NOPs at the very first hop, and 2 VPs always altered TCP sequence numbers. These modifications occurred without any corresponding update to dependent fields, corrupting the TCP stream for higher-layer protocols.
-
tracebox estimates middlebox location with an error of at most 4 hops in 61% of cases. Errors of more than 13 hops (the length of ~60% of paths) are rare: each such error value occurs in fewer than 1% of cases. Of the MSS-modifying middleboxes detected, 52% were located in the network core and only 2.7% close to the source vantage point.
-
tracebox identified a transparent HTTP proxy or IDS within a National Research Network (SUNET) that intercepted port-80 SYN probes but not port-21 SYN probes, producing shorter observed path lengths to port 80. It also found proxy misconfigurations causing forwarding loops for non-HTTP traffic, where ICMP replies alternated between two routers indefinitely.
-
High-speed Internet-wide scanning enables a censor or attacker to locate every publicly reachable host vulnerable to a newly disclosed flaw within hours of disclosure; in a concrete example, 3.4 million UPnP-vulnerable devices were identified in under 2 hours — faster than network operators could apply patches — with a 150-SLOC probe module written in approximately 4 hours.
-
Comprehensive Internet-wide scanning enables cross-IP tracking of users and devices by correlating stable cryptographic identifiers — TLS certificates or SSH host keys presented by home routers and cable modems — with public geolocation data across DHCP lease changes, defeating the anonymity assumption behind dynamic IP addresses.
-
By scanning ports 443 and 9001 and fingerprinting responses with Tor's TLS v1 cipher-suite handshake pattern, ZMap identified 79–86% of all allocated Tor bridge fingerprints in a single scan, demonstrating that bridges whose protocol is distinguishable are largely discoverable through comprehensive Internet-wide scanning even though their addresses are not publicly listed.
-
Manually generated FTE regexes achieve a 100% misclassification rate against all six tested DPI systems (appid, l7-filter, YAF, bro, nProbe, and the proprietary enterprise-grade DPI-X) for HTTP, SSH, and SMB target protocols. Each regex took less than 30 minutes to specify and debug against known classifiers.
-
FTE proxy overhead compared to socks-over-ssh: the intersection-ssh format incurred 0% average latency increase and only 16% bandwidth overhead (1,164 KB vs. 1,348 KB per Alexa Top 50 site). The worst-case auto-http format incurred 29% latency increase (5.5 s vs. 7.1 s) and 181% bandwidth overhead (3,279 KB), primarily due to ciphertext expansion and FTE/SOCKS negotiation on persistent empty TCP connections.
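The percentages above can be rechecked from the raw figures, taking socks-over-ssh as the baseline (1,164 KB and, by inference from the latency comparison, 5.5 s per site).

```python
def pct_increase(baseline: float, measured: float) -> float:
    """Relative increase of `measured` over `baseline`, in percent."""
    return 100.0 * (measured - baseline) / baseline

print(f"{pct_increase(1164, 1348):.1f}")   # 15.8  (intersection-ssh bandwidth)
print(f"{pct_increase(5.5, 7.1):.1f}")     # 29.1  (auto-http latency)
print(f"{pct_increase(1164, 3279):.1f}")   # 181.7 (auto-http bandwidth)
```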
-
Regex-based DPI is fundamentally vulnerable to format-transforming encryption: because every tested system (including the proprietary enterprise-grade DPI-X, rated for 1.5 Gbps at $8,000) classifies protocols solely by membership in a regular language, any ciphertext can be guaranteed to match any chosen regex. The paper argues this forces DPI to adopt machine learning, active probing, or non-regular semantic checks — but notes that making such checks fast, scalable, and low-false-positive at line rate for arbitrary target protocols remains an open problem.
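The guarantee rests on ranking: a DFA for the regex induces a bijection between integers and the accepted strings of a given length, so an FTE encoder can encrypt, read the ciphertext as an integer, and unrank it into a matching string. A minimal sketch of rank/unrank over a toy DFA for `x[ab]*y`, assuming lexicographic order over a four-symbol alphabet; the encryption step and the real protocol regexes are omitted.

```python
import re
from functools import lru_cache

ALPHABET = "abxy"                                             # symbols in lexicographic order
DELTA = {(0, "x"): 1, (1, "a"): 1, (1, "b"): 1, (1, "y"): 2}  # DFA for x[ab]*y
ACCEPT = {2}

@lru_cache(maxsize=None)
def count(state: int, length: int) -> int:
    """Number of strings of exactly `length` symbols leading from `state` to acceptance."""
    if length == 0:
        return 1 if state in ACCEPT else 0
    return sum(count(DELTA[(state, s)], length - 1)
               for s in ALPHABET if (state, s) in DELTA)

def unrank(c: int, n: int) -> str:
    """Return the c-th (0-based, lexicographic) accepted string of length n."""
    if not 0 <= c < count(0, n):
        raise ValueError("rank out of range for this length")
    out, q = [], 0
    for remaining in range(n, 0, -1):
        for s in ALPHABET:
            if (q, s) not in DELTA:
                continue
            k = count(DELTA[(q, s)], remaining - 1)
            if c < k:                  # the target string continues with symbol s
                out.append(s)
                q = DELTA[(q, s)]
                break
            c -= k                     # skip every string that continues with s
    return "".join(out)

def rank(w: str) -> int:
    """Inverse of unrank for an accepted string w."""
    c, q = 0, 0
    for i, ch in enumerate(w):
        remaining = len(w) - i
        for s in ALPHABET:
            if s == ch:
                break
            if (q, s) in DELTA:
                c += count(DELTA[(q, s)], remaining - 1)
        q = DELTA[(q, ch)]
    return c

# Every 3-bit value maps to a distinct 5-symbol string matching x[ab]*y, and back.
n = 5
for value in range(count(0, n)):
    w = unrank(value, n)
    assert re.fullmatch(r"x[ab]*y", w) and rank(w) == value
print(unrank(0, n), unrank(7, n))   # xaaay xbbby
```

Because every integer in range decodes to an accepted string, a regex-membership classifier cannot tell encoded ciphertext from genuine protocol messages.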
-
Online scanning services (OSS) span security scanners, ad networks (Google AdSense), web diagnostics, and link shorteners, categories economically important enough that blocking them wholesale causes severe collateral damage. The paper identifies five broad OSS categories with dozens of providers, and notes that translation services, photo printers, RSS aggregators, and image hosts are additional unexplored candidates, making exhaustive enumeration by a censor infeasible.
-
In the standard redirect design the cooperating proxy's IP address or domain name appears in plaintext HTTP redirect responses, because the censored client cannot present a valid TLS certificate to the OSS and must use plain HTTP. A censor inspecting OSS-bound traffic can extract the proxy address from the Location header or URL query parameters. The no-redirect variant (client and server each initiate single scans of each other) eliminates this leakage at the cost of higher latency and server-side OSS enumeration.
-
Injecting a single replayed ACK packet every 100 ms into a SkypeMorph session is sufficient to permanently stall data transfer: the server continuously resets its sequence counter back to the replayed position and never advances, while legitimate VoIP call traffic is completely unaffected. The attack requires the censor to induce only a small amount of server-to-client packet loss to prevent the legitimate ACK counter from overtaking the injected value, as shown in Figure 5b.
-
FreeWave's modem synchronization depends on a preamble transmitted only at connection start (approximately 0.25 seconds for a 2048-symbol preamble); a censor applying 95% packet loss for under one second at the beginning of the session reliably prevents synchronization and breaks the connection, while reducing VoIP MOS only briefly and leaving the remainder of the session intact (Figure 2). With fixed data-frame designs, the censor can repeat preamble-targeted drops on every frame, achieving complete desynchronization at low average packet loss rates tolerable to legitimate VoIP.
-
By targeting SkypeMorph's deterministic ACK-flagging schedule (one ACK every ~100 ms) and capping overall packet loss at 5–20%, a censor can drop up to 47% of ACK packets, reducing SkypeMorph throughput from its normal ~200 KB/s to 5–10 KB/s (a reduction of roughly 95% or more) while VoIP call quality remains within acceptable MOS thresholds. The attack exploits the reliability mismatch between the loss-tolerant UDP cover channel and the TCP-like retransmission layer SkypeMorph builds over it.
-
FreeWave's modem generates audio whose packet-length distribution has dramatically lower variance than human speech, even when transmitted through Skype's variable-bit-rate encoder; Figure 9 shows that English and Portuguese speech samples produce high-variance packet-length sequences while modem audio produces a narrow, nearly constant distribution, providing a reliable passive classifier for modem-over-VoIP traffic. This content mismatch persists even with perfect emulation of the VoIP protocol framing.
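A minimal sketch of the passive test, assuming the classifier only compares packet-length variance against a threshold; the 25 bytes² cut-off and both sample traces are synthetic placeholders, not values from Figure 9.

```python
from statistics import pvariance

def looks_like_modem(packet_lengths, var_threshold: float = 25.0) -> bool:
    """Flag a VoIP flow whose packet-length variance is suspiciously low.

    var_threshold (bytes^2) is hypothetical; a real classifier would calibrate
    it on labelled speech and modem captures.
    """
    if len(packet_lengths) < 2:
        raise ValueError("need at least two packets")
    return pvariance(packet_lengths) < var_threshold

speech = [64, 120, 97, 45, 130, 88, 52, 141, 76, 110]        # synthetic, high variance
modem = [101, 102, 101, 100, 101, 102, 101, 101, 100, 101]   # synthetic, near-constant
print(looks_like_modem(speech), looks_like_modem(modem))     # False True
```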
-
Purpose-built or uncommon radio hardware provides governments a legal pretext for crackdowns, is subject to import restrictions, and aids identification of dissidents via radio direction-finding equipment. The authors conclude that only ubiquitous, innocuous devices—smartphones and standard indoor WiFi access points—can be used in a dissent network without raising suspicion or endangering users.
-
Mesh networks can reach meaningful scale only by adopting centralized management, planned growth, and a static topology—properties that simultaneously create a single point of failure and make nodes easy targets for government radio direction-finding. Decentralized, organic, mobile mesh retains safety properties but at the cost of near-zero effective capacity as network size grows.
-
Pseudonymity is insufficient for dissent networks: social-network profile information can be correlated with external data to deanonymize users, and fixed-infrastructure networks enable localization attacks even without explicit identity. The authors argue that true anonymity—or at minimum strong deniability where usage is non-incriminating and activity is difficult to trace—is required to protect participants.
-
Hypothetical fixed parrot systems (SkypeMorph+ and StegoTorus+) that correct all passive detection failures remain unambiguously detectable via active and proactive attacks (Table II). Supernode cache flushing and TCP control channel manipulation — e.g., sending RST causes genuine Skype to drop the call immediately while parrots produce no reaction — distinguish them from genuine Skype because the parrot cannot actually execute Skype protocol logic.
-
CensorSpoofer's IP-spoofing architecture has an unfixable detection flaw: the spoofer cannot receive or respond to SIP probe messages (INVITE, invalid SIP, BYE for random call IDs) directed at the spoofed dummy host, making four SIP probing tests (Table IV) reliably distinguish CensorSpoofer from genuine Ekiga at local-censor cost. The nmap-based dummy-host selection algorithm identifies only 12.1% of 10,000 random IPs as candidate hosts; SIP probing of 10,000 random addresses found zero IETF-based VoIP clients.
-
The authors enumerate 12 requirements a parrot system must satisfy simultaneously (Correct, SideProtocols, IntraDepend, InterDepend, Err, Network, Content, Patterns, Users, Geo, Soft, OS), whereas a censor needs to detect only one failure. They conclude 'unobservability by imitation is a fundamentally flawed approach' and recommend embedding covert traffic in genuine encrypted payloads of a real running protocol (e.g., FreeWave in Skype voice, SWEET in email), which constrains detection to state-level omniscient (OM) adversaries performing large-scale multi-flow analysis.
-
SkypeMorph and StegoTorus-Embed fail 5 of 9 standard Skype identification tests (Table I), including the TCP control channel (T9), SoM packet headers (T3), and periodic message exchanges (T6/T7). All failures are detectable by a local (LO) passive censor at line speed without requiring ISP-scale statistical analysis.
-
The StegoTorus-HTTP module returns '200 OK' for non-existent URIs, produces no response to HEAD, OPTIONS, DELETE, and TEST method requests, and omits xref tables from generated PDF files. Using httprecon with 9 request types, the StegoTorus server is distinguishable from any real HTTP server by an OB (resource-limited) censor that records port-80 destination IPs at line speed and fingerprints them offline.
-
For a Collage-style system with T forward-security time intervals and k rendezvous-point identities (e.g., k popular Flickr hashtags), standard public-key steganography requires distributing kT public keys, whereas an IBST-based solution requires distributing only 1 master public key. The paper states this exact comparison directly as its efficiency argument.
-
Key distribution is the primary bootstrapping weakness of steganography-based censorship-resistance systems: a censor can simply block stego-key distribution. Identity-based steganographic tagging (IBST) eliminates this attack surface by requiring only a single master public key, which can be bundled with the client software — no key distribution inside the censored area is necessary.
-
The IBST construction is provably secure under the bilinear decisional Diffie-Hellman (BDDH) assumption in the random oracle model. Any adversary with advantage ε(λ) against IBST indistinguishability implies an adversary against BDDH with advantage at least ε(λ)/(e·(1+q_E)), where q_E is the number of private-key extraction queries and e is the base of the natural logarithm. Tags produced by the scheme are computationally indistinguishable from uniform random bitstrings for any party lacking the recipient's private key.
-
Replacing Telex's original stego-tagging with the IBST scheme and using time periods as identities achieves eventual forward security with arbitrarily short rotation intervals. The key material a client needs after a master-key rotation is only the new master public key — 'a few hundred bytes' — small enough to fit in covert channels such as steganographic images, avoiding the original Telex design's problem of large bundled key sets expiring before a client updates its software.
-
The paper proves that immediate forward security is impossible for Telex-like decoy-routing systems. The Telex station must decide whether to treat a connection as a Telex request after the first client message, using only received messages and its long-term key — an eavesdropper who stores all network traffic can replay the station's entire view once it compromises the station's long-term key, retroactively decrypting all sessions.
-
In a DHT-based censorship-resistant name system, poisoning attacks (injecting invalid mappings) are neutralized by requiring signature verification on stored values; eclipse attacks (isolating specific mappings from the network) require replication across multiple DHT nodes. Critically, decentralizing lookups from a single ISP resolver to a DHT shifts query visibility from ISPs to arbitrary peers, requiring per-query encryption keyed to secrets known only to the querying client to limit adversaries to confirmation attacks.
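One way to key queries so that peers see only opaque lookups is to derive both the DHT key and the record-encryption key from the name itself. A minimal sketch under that assumption; the derivation below is hypothetical and not the construction of any specific system. It also shows the residual confirmation attack: a peer who guesses the name can test the guess, but cannot enumerate names from the lookup key alone.

```python
import hashlib
import hmac

def derive_query(zone_id: bytes, label: str):
    """Derive (lookup_key, record_key) from a zone identifier and a label.

    Storage peers only ever see lookup_key; recomputing record_key requires
    already knowing zone_id and label.
    """
    secret = hashlib.sha256(zone_id + b"\x00" + label.encode()).digest()
    lookup_key = hashlib.sha256(b"lookup" + secret).digest()
    record_key = hmac.new(secret, b"record-encryption", hashlib.sha256).digest()
    return lookup_key, record_key

zone = bytes(32)                           # placeholder zone public key
observed, _ = derive_query(zone, "news")   # what a DHT peer sees in a GET

# Confirmation attack: test guessed names against the observed lookup key.
for guess in ["mail", "news", "blog"]:
    if derive_query(zone, guess)[0] == observed:
        print("confirmed:", guess)         # confirmed: news
```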
-
DNSSEC's hierarchical delegation structure provides no protection against state-level censors: governments can legally compel top-level domain operators to alter records, and coerced results still validate because they are signed by the coerced-but-technically-legitimate authority — making end-to-end DNSSEC security insufficient to detect such attacks.
-
Pseudo-TLDs (e.g., '.key' for cryptographic-identifier namespaces, '.pet' for petname systems) allow multiple censorship-resistant name systems with distinct security trade-offs to coexist transparently alongside DNS via Name Service Switch configuration, with system-specific resolution logic applied per TLD and no application reconfiguration required by users.
-
In an adversary model where the censor may hold more computational power than all honest nodes combined, a squatting attack lets the adversary enumerate and pre-register every memorable name; the authors use this to formally prove that no single name system can simultaneously provide memorable, secure, and global names (Zooko's triangle).
-
In simulated event-driven (crisis) blocking where all corrupt users simultaneously block bridges on day 300, available bridges drop from ~500 to ~150 and thirsty users (users left with no unblocked bridge) spike to 25%; maintaining 50 reserve bridges (~10% of deployed stock) halves the thirsty-user count, and 100 reserve bridges nearly eliminate thirstiness among users who had accumulated sufficient credits.
-
Knowing a user's bridge assignment narrows the adversary's anonymity set to the small group sharing that bridge, deanonymizing Tor users even when the bridge itself is not compromised; rBridge addresses this using 1-out-of-m Oblivious Transfer, Pedersen commitments, and non-interactive zero-knowledge proofs so the bridge distributor learns nothing about which bridges a user holds.
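Of the building blocks named above, the Pedersen commitment is the simplest to show. A minimal sketch with deliberately tiny, insecure parameters (p = 23, subgroup order q = 11, generators g = 4 and h = 9, all chosen here for illustration): a commitment hides m behind a random r, opens only to (m, r), and commitments multiply to a commitment of the sum. How rBridge combines this with oblivious transfer and zero-knowledge proofs is not modelled.

```python
# Toy parameters: p = 2q + 1 with q = 11; g and h generate the order-q subgroup.
# Insecure by design: log_g(h) is trivially computable at this size.
P, Q, G, H = 23, 11, 4, 9

def commit(m: int, r: int) -> int:
    """Pedersen commitment C = g^m * h^r mod p (exponents reduced mod q)."""
    return pow(G, m % Q, P) * pow(H, r % Q, P) % P

def open_ok(c: int, m: int, r: int) -> bool:
    """Check that (m, r) opens commitment c."""
    return c == commit(m, r)

c1, c2 = commit(3, 5), commit(4, 2)
print(open_ok(c1, 3, 5))                       # True
print(c1 * c2 % P == commit(3 + 4, 5 + 2))     # True: commitments add homomorphically
```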
-
rBridge tolerates up to ~30% malicious users with acceptable bridge protection, but fails at f≥50%; with f=5% under aggressive blocking, over 95% of users are never bridge-starved and ~50% of bridges are never blocked, while conservative blocking (corrupt users waiting 225 days before acting) causes ~10% of users to be thirsty 15% of the time because delayed blockers accumulate enough credits to inject additional malicious invitees.
-
rBridge outperforms Proximax by at least one order of magnitude across all robustness metrics under aggressive blocking with 5% malicious users: to support 200 users for 30 days, Proximax requires at least 2400 bridges while rBridge needs only 108, and in Proximax fewer than 5% of bridges produce more than 20 user-hours versus 99% in rBridge.
-
ScrambleSuit's prototype achieves a mean goodput of 148 KB/s (σ=61 KB/s) versus Tor's 286 KB/s (σ=227 KB/s) over a 100 Mbit/s LAN — roughly half Tor's throughput — with 45–50% total protocol overhead compared to Tor's 19.6%. Disabling inter-arrival time obfuscation raises goodput to 321 KB/s (σ=231 KB/s), demonstrating that artificial delays are the dominant cost rather than padding or cryptography.
-
ScrambleSuit achieves polymorphism by seeding each server's PRNG with a randomly generated 256-bit value, which generates server-specific probability distributions over packet lengths (up to 100 bins) and inter-arrival times (bins in [0, 10) ms). The seed is shared with clients after authentication, so both sides shape traffic identically; a censor monitoring two distinct ScrambleSuit servers observes different distributions and cannot build a single universal classifier.
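A minimal sketch of seed-derived polymorphism, assuming a random number of bins (1 to 100), uniformly drawn bin values, and normalised random weights; ScrambleSuit's exact derivation may differ, the length bounds are hypothetical, and `random.Random` stands in for the cryptographic PRNG a real deployment would use.

```python
import random
import secrets

MIN_LEN, MAX_LEN = 21, 1448   # hypothetical bounds for shaped packet lengths (bytes)

def _normalised(rng: random.Random, n: int):
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [w / total for w in raw]

def server_profile(seed: int, max_bins: int = 100):
    """Derive server-specific length and inter-arrival distributions from a seed."""
    rng = random.Random(seed)   # stand-in for a CSPRNG; deterministic per seed
    n_len = rng.randint(1, max_bins)
    lengths = [rng.randint(MIN_LEN, MAX_LEN) for _ in range(n_len)]
    n_iat = rng.randint(1, max_bins)
    iats_ms = [rng.random() * 10.0 for _ in range(n_iat)]   # bins in [0, 10) ms
    return {"length": (lengths, _normalised(rng, n_len)),
            "iat_ms": (iats_ms, _normalised(rng, n_iat))}

def sample(profile, kind: str, rng: random.Random):
    """Draw one target packet length or delay from the shared profile."""
    values, probs = profile[kind]
    return rng.choices(values, weights=probs, k=1)[0]

seed = secrets.randbits(256)          # server generates this once
client_view = server_profile(seed)    # client rebuilds it after authenticating
assert client_view == server_profile(seed)
```

Both endpoints shape traffic from the same profile, while a second server with a different seed produces unrelated distributions.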
-
Client proof-of-work puzzles are ineffective as an active-probing defense because a state-level censor with parallel hardware can solve multiple puzzles simultaneously, one per CPU core. The authors estimate that the Tor bridge churn rate (rate of new bridge IP addresses) is too low to raise a well-equipped censor's workload beyond practical limits without simultaneously making the scheme impractical for legitimate clients — the same balancing problem as PoW for spam.
-
Tor's traffic contains a characteristic prevalence of 586-byte packets (Tor's 512-byte cells plus TLS header overhead) that form a strong flow-level fingerprint detectable from a few dozen captured packets. ScrambleSuit's packet length morphing eliminates this signature and shifts the distribution toward MTU-sized packets, but the authors note that defeating a censor using the VNG++ classifier, which relies on coarse features like connection duration, total bytes, and burstiness, would require only a marginal increase in ScrambleSuit's overhead.
-
When using a foreign encrypted email provider (AlienMail), the censor observes only an encrypted connection to the foreign mail server (e.g., Gmail's servers in the U.S.); it cannot see the recipient address or the SWEET server's IP, making spam-filtering-style blocking of the SWEET endpoint entirely infeasible. This anonymity is provided by the mail provider's own TLS, requiring no additional obfuscation from the client.
-
When using a domestic email provider that collaborates with the censor (DomesticMail), SWEET clients must embed tunneled data via steganography (image or text) and coordinate a secondary secret email account with the SWEET server out-of-band. This prevents the censor from discovering the SWEET server association via recipient-field inspection, but adds operational complexity and requires an out-of-band bootstrapping channel.
-
Across more than 11,700,000 DNS requests over 6 days at ICSI's border network and 15,200,000 DNS transactions in a 1.5-hour trace at UC Berkeley's border, secondary differing DNS replies were essentially absent in normal traffic, yielding effectively zero false positives. Only two benign authority servers produced anomalous dual replies at Berkeley: one for the BBC returning two addresses within the same /24, and one for businessinsider.com returning a SERVFAIL. Neither would disrupt a Hold-On resolver.
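A minimal sketch of the measurement's core count, assuming replies are keyed by (client, resolver, transaction ID, query name) and that an "answer" is any hashable summary such as a tuple of A records or an rcode; the exact matching rules used in the study are not given here. Identical duplicate replies are not flagged, only replies that disagree.

```python
from collections import defaultdict

def differing_replies(trace):
    """Return transactions that received two or more replies with different answers.

    trace: iterable of (client, resolver, txid, qname, answer) reply records.
    """
    answers = defaultdict(set)
    for client, resolver, txid, qname, answer in trace:
        answers[(client, resolver, txid, qname)].add(answer)
    return {k: v for k, v in answers.items() if len(v) > 1}

trace = [
    ("10.0.0.5", "192.0.2.53", 0x0001, "example.org", ("93.184.216.34",)),
    ("10.0.0.5", "192.0.2.53", 0x0001, "example.org", ("93.184.216.34",)),   # duplicate
    ("10.0.0.7", "192.0.2.53", 0x1A2B, "blocked.example", ("203.0.113.66",)),  # injected
    ("10.0.0.7", "192.0.2.53", 0x1A2B, "blocked.example", ("198.51.100.10",)), # legitimate
]
print(len(differing_replies(trace)))   # 1
```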
-
Because browser-based proxies can only initiate outbound connections, flash proxies connect to censored clients rather than the reverse, requiring the facilitator to maintain a registry of client IP addresses; a censor can impersonate a legitimate flash proxy to query the facilitator and enumerate the IP addresses of circumvention users.
-
With 512 PlanetLab nodes each advertising 50 KB/s as malicious Tor middle routers, the theoretical catch probability that at least one bridge circuit traverses a controlled node reaches P(k=512, b=50 KB/s, n=30) ≈ 99% after only 30 circuits. In real-world validation, the 21st circuit created by a bridge client traversed one of the 512 controlled PlanetLab nodes, matching theory. The result generalizes: the 30-circuit exposure threshold applies to any adversary whose nodes' aggregated bandwidth reaches the equivalent of 512 × 50 KB/s ≈ 25.6 MB/s.
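Treating circuits as independent trials with a fixed per-circuit probability p of picking a malicious middle router (an assumption; the paper's own P(k, b, n) formula is not reproduced here), the 99%-after-30-circuits figure can be back-calculated to p:

```python
def catch_probability(p: float, circuits: int) -> float:
    """P(at least one of `circuits` independent circuits uses a malicious middle)."""
    return 1.0 - (1.0 - p) ** circuits

def implied_per_circuit(target: float, circuits: int) -> float:
    """Per-circuit probability p that yields `target` after `circuits` circuits."""
    return 1.0 - (1.0 - target) ** (1.0 / circuits)

p = implied_per_circuit(0.99, 30)
print(f"{p:.3f}")                          # 0.142
print(f"{catch_probability(p, 21):.2f}")   # 0.96 by the 21st circuit
```

Under this model, any bandwidth allocation that gives the same aggregate selection probability p produces the same curve.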
-
Tor's bandwidth-weighted path selection creates a structural amplification: 60% of middle routers selected across 430 circuits had bandwidth above 1 MB/s, yet only 10% of all Tor routers exceed 1 MB/s. This skew means that an adversary advertising a single high-bandwidth middle node achieves selection probability far exceeding its proportional count in the network, making high-bandwidth Sybil nodes highly cost-effective for bridge discovery.
-
The paper identifies three countermeasure classes against bridge discovery: (i) CAPTCHA on email/HTTPS distribution (limited by automated solving services); (ii) uniform random middle-node selection, which defeats bandwidth-Sybil attacks but degrades Tor throughput by routing through low-bandwidth nodes; (iii) DHT-based P2P architecture where no central server holds all bridge IPs, making systematic enumeration infeasible—though DHT systems introduce Sybil and eclipse-attack vulnerabilities of their own.
-
Large-scale email and HTTPS enumeration of Tor bridges using 500+ PlanetLab nodes and 2,000 Yahoo accounts discovered 2,365 distinct bridges over approximately one month. The bridge HTTPS server rate-limits distribution to 3 bridges per 24-bit IP prefix per day, and the email server to 1 reply per account per day; these controls are circumvented by sourcing requests from hundreds of distinct prefixes. Bridge distribution follows a weighted coupon collector model proportional to bridge bandwidth, not uniform probability.
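A Monte Carlo sketch of the weighted coupon collector, assuming each response returns one bridge drawn with probability proportional to its bandwidth. With equal weights the mean approaches the classic n·H_n (about 29.3 for 10 bridges); one low-bandwidth bridge makes full enumeration far slower. The weights are hypothetical.

```python
import random

def draws_to_collect_all(weights, rng: random.Random) -> int:
    """Number of bandwidth-weighted draws until every bridge has been seen once."""
    n = len(weights)
    seen, draws = set(), 0
    while len(seen) < n:
        seen.add(rng.choices(range(n), weights=weights, k=1)[0])
        draws += 1
    return draws

def mean_draws(weights, trials: int = 2000, seed: int = 0) -> float:
    """Average collection time over `trials` simulated enumerations."""
    rng = random.Random(seed)
    return sum(draws_to_collect_all(weights, rng) for _ in range(trials)) / trials

uniform = [1.0] * 10                  # ten bridges with equal bandwidth
skewed = [1.0] * 9 + [0.1]            # one bridge with a tenth of the bandwidth
print(round(mean_draws(uniform), 1))  # expected value 10 * H_10 ≈ 29.3
print(round(mean_draws(skewed), 1))   # dominated by waiting for the rare bridge
```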
-
A single malicious Tor middle router advertising 10 MB/s bandwidth discovered 2,369 distinct bridges in 14 days. The catch probability is determined solely by the aggregated bandwidth M = k·b of malicious middle routers regardless of how that bandwidth is distributed across nodes: three routers at 10 MB/s each achieve strictly greater catch probability than 512 nodes at 50 KB/s each. This means a well-resourced single node is equivalent to or surpasses hundreds of low-bandwidth Sybil nodes.
-
The paper explicitly flags that BTP's fixed-size b-byte connection tag creates an active-probing oracle: a censor that sends b−1 bytes and observes no close, then sends one more byte and observes a close, can confirm the endpoint is running BTP. Preventing such active-probing attacks is identified as future work.
-
BTP's forward secrecy guarantee depends on reliably destroying old keys, but the paper notes that secure deletion from persistent storage—especially solid-state storage—is difficult with current operating systems and hardware. The recommended mitigation is passphrase-derived encryption of stored secrets, though this shifts the problem to passphrase protection.
-
BTP achieves forward secrecy over unidirectional transports—where ephemeral in-band key exchange is impossible—by using a one-way key derivation function (NIST SP 800-108) to produce sequential temporary secrets from an initial shared secret. Once both devices destroy a given temporary secret, no keys derived from it can be reconstructed even if devices are later compromised.
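A minimal sketch of such a one-way ratchet, assuming the NIST SP 800-108 counter-mode KDF with HMAC-SHA256 as the PRF (input block: 32-bit counter, label, zero byte, context, 32-bit output length in bits). The labels, the period encoding, and the chaining of one secret into the next are illustrative assumptions, not BTP's exact parameters.

```python
import hashlib
import hmac

def kdf_ctr(key: bytes, label: bytes, context: bytes, out_len: int = 32) -> bytes:
    """NIST SP 800-108 KDF in counter mode with HMAC-SHA256 as the PRF."""
    out, i = b"", 1
    l_bits = (out_len * 8).to_bytes(4, "big")
    while len(out) < out_len:
        block = i.to_bytes(4, "big") + label + b"\x00" + context + l_bits
        out += hmac.new(key, block, hashlib.sha256).digest()
        i += 1
    return out[:out_len]

def next_secret(secret: bytes, period: int) -> bytes:
    """One-way step to the next temporary secret; the old one can then be erased."""
    return kdf_ctr(secret, b"ROTATE", period.to_bytes(8, "big"))

def tag_key(secret: bytes) -> bytes:
    """Per-period key for connection tags (hypothetical label)."""
    return kdf_ctr(secret, b"TAG", b"")

s0 = bytes(32)               # placeholder for the initial shared secret
s1 = next_secret(s0, 1)
s2 = next_secret(s1, 2)
del s0                       # illustrative only: Python cannot guarantee secure erasure
```

Knowing s2 does not reveal s1 or s0, because each step runs HMAC forward only.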
-
BTP's wire protocol contains no handshakes, timeouts, or plaintext headers. Connections open with a pseudo-random b-byte tag that the recipient can compute in advance from its key state, making BTP frames indistinguishable from random data to a passive observer who does not know the shared secret.
-
BTP's secret retention period for transport t is Rt + 2C + Lt, where Rt is the rotation period, C is the maximum clock-skew tolerance, and Lt is the maximum transport latency. With Rt = 2C + Lt only two temporary secrets need simultaneous storage. Concrete durations: TCP with automatic clocks (C=10s, Lt=60s) requires 2 minutes 40 seconds; TCP with manual clocks (C=3600s) requires 4 hours 2 minutes; mail with manual clocks (Lt=2 weeks) requires 4 weeks 4 hours.
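The retention formula can be evaluated directly. The manual-clock durations of 4 hours 2 minutes and 4 weeks 4 hours follow from C = 3600 s; with C = 1800 s the same formula gives 2 hours 2 minutes.

```python
from datetime import timedelta

def retention(clock_skew_s: float, latency_s: float) -> timedelta:
    """Retention Rt + 2C + Lt with the rotation period Rt chosen as 2C + Lt."""
    rotation = 2 * clock_skew_s + latency_s
    return timedelta(seconds=rotation + 2 * clock_skew_s + latency_s)

TWO_WEEKS = 14 * 24 * 3600
print(retention(10, 60))            # 0:02:40
print(retention(3600, 60))          # 4:02:00
print(retention(3600, TWO_WEEKS))   # 28 days, 4:00:00
print(retention(1800, 60))          # 2:02:00
```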
-
Content-oblivious replication delegates ongoing availability maintenance to 'manifest guarantors' — nodes holding content manifests — who periodically sample chunk replication factors and restore missing replicas without knowing the plaintext they protect, freeing the original publisher from any post-publication obligation. Two honest manifest holders (one content, one key) are sufficient to maintain replication with overwhelming probability even under adversarial conditions and high churn.
-
Simulation over erasure code parameters uniformly sampled from m∈[1,5] and n∈[5,500] shows that a 50-of-500 code is the best trade-off between overhead and robustness: it requires nearly 10× storage overhead to support 2^60 variable-size chunks and allows the network to tolerate more than 70% node failure before data is lost. Replication combined with erasure coding yields better durability than either strategy alone.
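A back-of-the-envelope check of the 50-of-500 result, assuming each fragment's node fails independently (a simplification; the paper's simulation also models churn and repair). Data are lost only if fewer than 50 of the 500 fragments survive, which stays negligible at 70% node failure, whereas 10 full replicas at the same 10× overhead fail together with probability 0.7^10 ≈ 2.8%.

```python
from math import comb

def erasure_loss(m: int, n: int, fail: float) -> float:
    """P(data lost) for an m-of-n code: fewer than m of n fragments survive."""
    alive = 1.0 - fail
    return sum(comb(n, k) * alive ** k * fail ** (n - k) for k in range(m))

def replication_loss(copies: int, fail: float) -> float:
    """P(data lost) when every one of `copies` full replicas fails."""
    return fail ** copies

# Same 10x storage overhead: 50-of-500 coding versus 10 full replicas.
for f in (0.5, 0.7, 0.9):
    print(f"fail={f}: code {erasure_loss(50, 500, f):.2e}, "
          f"replicas {replication_loss(10, f):.2e}")
```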
-
A hybrid garbage-collection scheme combining time-based expiry (last-access timestamp cutoff), popularity-based retention, and editor-signed manifest exemptions forces adversaries conducting pollution or exhaustion attacks to continuously re-access or re-upload junk to prevent its deletion. A single honest editor's signature is sufficient to exempt important but infrequently accessed content from deletion indefinitely, while malicious editors cannot explicitly remove content from the system.
-
One-way indexing separates a published file into encrypted content blocks (indexed by hash1(block)), a content manifest (indexed by hash2(keyword)), and a key manifest (indexed by hash3(keyword)), so a storer holding all content chunks cannot recover the plaintext or keywords without inverting a cryptographic one-way function. Using distinct hash functions for each manifest type also minimizes the probability that a single node stores both manifests, preventing correlation.
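A minimal sketch of the three index keys, assuming domain-separated SHA-256 stands in for hash1, hash2, and hash3; the block contents and keyword are placeholders. Because the manifests are indexed under unrelated digests of the same keyword, they land at unrelated DHT positions.

```python
import hashlib

def _h(domain: bytes, data: bytes) -> str:
    """Domain-separated SHA-256, standing in for hash1/hash2/hash3."""
    return hashlib.sha256(domain + b"|" + data).hexdigest()

def index_entries(encrypted_blocks, keyword: str):
    """DHT keys for a published file under one-way indexing (sketch)."""
    kw = keyword.encode()
    return {
        "content_blocks": [_h(b"hash1", blk) for blk in encrypted_blocks],
        "content_manifest": _h(b"hash2", kw),
        "key_manifest": _h(b"hash3", kw),
    }

blocks = [b"ciphertext-block-0", b"ciphertext-block-1"]   # placeholder encrypted blocks
entries = index_entries(blocks, "protest-footage")
assert entries["content_manifest"] != entries["key_manifest"]
```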
-
In a 250-node PlanetLab deployment with 10–15% silent node failures and high churn, the median user retrieved a 20 MB file in 65–85 seconds end-to-end (search + manifest download + chunk fetch + reconstruction + decryption). 15.12% of DHT lookups and 11.24% of maintenance operations failed; 20% of nodes accounted for 80% of failures, yet nodes with working connections completed lookups and maintained sufficient guarantors for manifest replication.
-
The MIT ANA Spoofer project shows that over 400 ASes (22%) and 88.7 million IP addresses (15.7%) permit outbound IP address spoofing, constraining where CensorSpoofer proxy nodes can be deployed. ASes applying ingress/egress filtering make IP-spoofing-based downstream channels infeasible from those locations.
-
#h00t achieves censorship resistance by truncating a key-derivation-function output to k bits to produce a 'short tag', deliberately inducing collisions across unrelated groups. A censor cannot block a targeted group's short tag without simultaneously blocking all colliding groups — including innocuous, high-traffic ones — forcing heavy-handed censorship that creates domestic blowback. The design provides plausible deniability: subscribers can claim they follow a foreign pop star rather than a dissident group.
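A minimal sketch of short-tag derivation, assuming PBKDF2-HMAC-SHA256 as the key-derivation function; the salt and iteration count are hypothetical. Truncating to k bits leaves only 2^k tags, so many unrelated group names share each tag, and blocking one tag blocks all of them.

```python
import hashlib

ITERATIONS = 200          # hypothetical KDF cost, kept low for the demo
SALT = b"h00t-demo"       # hypothetical system-wide salt

def short_tag(group: str, k: int) -> int:
    """Top k bits of a PBKDF2-HMAC-SHA256 output for the group name."""
    digest = hashlib.pbkdf2_hmac("sha256", group.encode(), SALT, ITERATIONS)
    return int.from_bytes(digest, "big") >> (256 - k)

def colliding_groups(target: str, candidates, k: int):
    """Other names sharing the target's short tag; blocking one blocks them all."""
    t = short_tag(target, k)
    return [c for c in candidates if c != target and short_tag(c, k) == t]

# With k = 8 there are only 256 short tags for all groups to share.
popular = [f"#popstar{i}" for i in range(2000)]
print(len(colliding_groups("#dissidentgroup", popular, 8)))   # on average about 2000 / 256
```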
-
Even with end-to-end encrypted messages, a censor observing subscription queries can detect anomalous interest in a short tag (e.g., a sudden domestic surge in followers of a foreign pop star's hashtag) and use timing/size traffic analysis to distinguish #h00t subscriptions from ordinary hashtag follows. The paper flags this as an open threat and proposes two mitigations: (1) push cover traffic for randomly selected short tags to all clients regardless of their actual subscriptions, or (2) silently redirect normal clients' hashtag follows to the corresponding #h00t short tags.
-
Both Egypt and Libya demonstrate that concentration of Internet infrastructure under state ownership—in Egypt, all submarine fiber backhaul terminated at a single facility, the Ramses Exchange, controlled by the state telecommunications provider—makes country-wide BGP-based shutdowns technically straightforward. The authors conclude that the small number of state-controlled parties involved in international connectivity was the critical enabling factor, not any novel technical capability.
-
Unsolicited background radiation traffic to the UCSD network telescope—particularly Conficker worm scanning (TCP SYN, port 445, 48-byte packets)—dropped nearly simultaneously with Egyptian BGP route withdrawals on January 27, corroborating control-plane analysis with data-plane evidence. Crucially, some worm-infected hosts continued to generate outbound scanning traffic even after their prefixes were BGP-withdrawn, because packet filtering was absent; this asymmetry between inbound unreachability and outbound connectivity can distinguish pure BGP-based blocking from combined BGP-plus-filtering approaches.
-
Egypt's Internet shutdown on January 27, 2011 was accomplished via BGP route withdrawals: approximately 2,500 IPv4 prefixes (out of 2,928 visible) disappeared within a 20-minute window beginning at 22:12:26 GMT, leaving only 176 prefixes visible by 23:30:00 GMT. The shutdown lasted more than five days, with BGP connectivity beginning to return at 09:29:31 GMT on February 2, and more than 2,500 Egyptian prefixes back in global BGP tables by 09:56:11 GMT.
-
During Egypt's 5.5-day Internet blackout, active CAIDA Ark measurements found that only 1% of probes to Egyptian IPv4 prefixes received responses, compared to 16–17% on normal days. The minority of addresses that retained bidirectional connectivity all mapped to BGP prefixes that had not been withdrawn—including prefixes serving the Egyptian stock exchange and two national banks, whose 83 prefixes were kept live until January 31 at 20:46:48 GMT before being simultaneously withdrawn.
-
Libya implemented escalating Internet disruptions before executing a sustained blackout: a 6.8-hour curfew on February 18 and an 8.3-hour curfew on February 19, followed by a 3.7-day near-total blackout beginning March 3. The authors detected what they believe were Libya's attempts to test firewall-based packet filtering before transitioning to more aggressive BGP-based disconnection, demonstrating a two-phase escalation pattern.
-
Graduated censorship — limiting the suppression rate to remain within the typical weekly variance band — evades the weekly-interval detector entirely. The paper acknowledges that detecting slow-ramp blocking requires extending the observation window beyond seven days.
-
Per-jurisdiction user counts are modeled as a Poisson process; the detector infers the 99.99th-percentile credible interval for the underlying rate λ from the observed count via a Gamma-Poisson approximation rather than a Gaussian assumption, correctly treating small-jurisdiction zero-user days as non-anomalous.
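A minimal sketch of such an interval. It computes the exact central interval for a Poisson rate by inverting the Poisson CDF (equivalent to taking Gamma quantiles); the deployed detector's precise approximation may differ. For an observed count of zero the interval is roughly [0, 9.9], so a small jurisdiction that normally sees a few users per day is not flagged on a zero-user day.

```python
import math

def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if lam == 0:
        return 1.0
    return sum(math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1)) for i in range(k + 1))

def _solve_decreasing(f, target: float, lo: float, hi: float, iters: int = 200) -> float:
    """Bisection for f(lam) == target where f is decreasing in lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def rate_interval(k: int, coverage: float = 0.9999):
    """Central interval for the Poisson rate given an observed count k."""
    alpha = 1.0 - coverage
    hi_bound = k + 20.0 * math.sqrt(k + 1) + 50.0
    upper = _solve_decreasing(lambda lam: poisson_cdf(k, lam), alpha / 2, 0.0, hi_bound)
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda lam: poisson_cdf(k - 1, lam), 1 - alpha / 2, 0.0, hi_bound)
    return lower, upper

lo, hi = rate_interval(0)
print(f"{lo:.2f} {hi:.2f}")   # 0.00 9.90
```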
-
The detector constructs its 'typical ratio' baseline exclusively from the 50 largest jurisdictions, then discards outliers beyond four inter-quartile ranges of the median before fitting N(m,v). This ensures a jurisdiction undergoing active censorship cannot bias the global model and mask its own anomaly.
-
A censor can defeat the anomaly detector without triggering an alert by replacing blocked user traffic with synthetic requests from adversary-controlled machines, keeping per-jurisdiction connection counts within the typical range. The paper explicitly identifies this as an unaddressed active-attack vector.
-
The deployed system uses 7-day intervals and a baseline built from the 50 largest Tor jurisdictions; a jurisdiction's user-count ratio is flagged when it falls outside the 99.99th percentile of the fitted Normal distribution N(m,v), yielding an expected false-alarm rate of approximately 1 in 10,000 per jurisdiction-week.
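A minimal sketch of the baseline fit and the ratio test, assuming the ratio is this week's user count over the previous week's, the outlier rule keeps ratios within four inter-quartile ranges of the median, and the 99.99% band is two-sided. It assumes nonzero previous-week counts; the Poisson treatment of small counts is not included.

```python
import statistics
from statistics import NormalDist

def fit_baseline(counts, top: int = 50, iqr_k: float = 4.0) -> NormalDist:
    """Fit N(m, v) to week-over-week ratios of the `top` largest jurisdictions.

    counts: {jurisdiction: (users_previous_week, users_this_week)}
    """
    largest = sorted(counts, key=lambda j: counts[j][0], reverse=True)[:top]
    ratios = [counts[j][1] / counts[j][0] for j in largest]
    med = statistics.median(ratios)
    q1, _, q3 = statistics.quantiles(ratios, n=4)
    kept = [r for r in ratios if abs(r - med) <= iqr_k * (q3 - q1)]  # drop outliers
    return NormalDist.from_samples(kept)

def is_anomalous(model: NormalDist, prev: int, cur: int, coverage: float = 0.9999) -> bool:
    """Flag a jurisdiction whose ratio falls outside the central `coverage` band."""
    tail = (1.0 - coverage) / 2
    return not model.inv_cdf(tail) <= cur / prev <= model.inv_cdf(1.0 - tail)

# Synthetic data: 49 stable jurisdictions, one large censored one, one tiny one.
counts = {f"j{i}": (1000 + i, (1000 + i) * (1.0 + 0.01 * ((i % 11) - 5))) for i in range(49)}
counts["censored"] = (5000, 1000)   # in the top 50, but trimmed as an outlier
counts["tiny"] = (3, 0)             # outside the top 50, so never in the baseline
model = fit_baseline(counts)
print(is_anomalous(model, 800, 400), is_anomalous(model, 800, 800))   # True False
```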
-
If clients probe the top 1,000 Alexa-ranked sites to discover a deflecting router, a censor would have to block more than 95% of those 1,000 sites to prevent any client from joining Cirripede. Clients aware of failed probes can continue cycling through additional popular sites, further raising the blocking cost.
-
Cloud-based onion routing confronts censors with a collateral-damage dilemma: blocking a cloud provider's IP prefixes requires blocking all co-hosted services (Amazon EC2 hosted over 1 million instances sharing common IP prefixes in 2010), while allowing the traffic means circumvention succeeds. Rotating IP addresses—by retiring and spinning up new VM instances or via DHCP/gratuitous ARPs—reduces the window a blocked address remains in service, forcing censors into a perpetual cat-and-mouse game across all major cloud providers simultaneously.
-
COR does not solve the bootstrapping problem: a user's first connections to the COR bootstrapping network are vulnerable to the same IP-enumeration and blocking attacks as public Tor directory connections. To mitigate directory-partitioning attacks, directory retrieval is always performed through an existing COR circuit, and directories return only a random subset of available nodes rather than the full list—but this subset-delivery design is itself exploitable by a malicious directory that can fingerprint users via uniquely-assigned relay subsets.
-
COR circuit construction enforces four properties to prevent single-entity de-anonymization in a limited-provider setting: (1) entry and exit ASPs must differ; (2) entry and exit CHPs must differ; (3) the same ASP's relays must not surround another ASP's relay without an intervening hop of a distinct ASP; and (4) at least two relays per traversed datacenter so an adversary with only perimeter visibility cannot trivially correlate ingress/egress.
-
Decoy routing places the circumvention service at transit routers rather than fixed-IP edge proxies, so the client addresses packets to any reachable decoy destination and the router hijacks the flow on the client's behalf. A single well-placed router may lie on paths to millions of destinations, making circumvention proxies appear ubiquitously deployed from an adversary's perspective. Blocking such a router requires disrupting ordinary traffic for large fractions of the Internet, qualitatively raising the cost of IP-address-based censorship.
-
An adversary aware of a decoy router's location can force decoy-routed flows to be unprocessable by fragmenting all packets so that the first fragment is too small to contain a complete TCP header, preventing flow assignment and forcing the router into expensive reassembly. Alternatively, the adversary can use small-fragment attacks to grow the router's state table, analogous to NAT resource exhaustion. The paper identifies fragmentation-based denial as a harder-to-mitigate attack class than sentinel replay.
-
A preplay attack defeats the TLS-sentinel covert channel: the adversary intercepts each ClientHello, immediately sends a copy to the decoy destination before the client's copy arrives, causing the sentinel to be consumed and poisoned. The client can never establish a decoy routing session while ordinary TLS to the decoy destination continues to work normally, giving the adversary both blocking capability and forensic confirmation that decoy routing was attempted. The paper notes this vulnerability is specific to the TLS sentinel and that alternatives such as port-knocking sentinels may not share it.
-
TCP flow hijacking by the decoy proxy is practical under an asymmetric routing assumption: expected sequence numbers are recoverable from ACK values in client-originated packets alone, so the decoy router need not observe return traffic. The proxy forges a TCP RST to the decoy destination and mimics its TCP options (timestamp, window scale, SACK) to reduce detectability; these options are conveyed encrypted inside the sentinel's 28-byte TLS random field.
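Under the asymmetric-routing assumption above, both halves of the TCP state can be read off a single client-to-server segment. The sketch below is a hypothetical illustration: the `ClientSegment` type and helper names are ours, sequence arithmetic is done modulo 2^32, and it ignores window scaling, SACK state, and any server data still in flight beyond the client's last ACK.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap modulo 2^32


@dataclass
class ClientSegment:
    """Fields a decoy router can read from one client-to-server segment (hypothetical type)."""
    seq: int          # client's sequence number for this segment
    ack: int          # next byte the client expects from the server
    payload_len: int  # bytes of application data in the segment
    syn: bool = False
    fin: bool = False


def recover_state(seg: ClientSegment) -> tuple[int, int]:
    """Return (client_next, server_next) without seeing any server-to-client packet."""
    # SYN and FIN each consume one sequence number in addition to the payload.
    consumed = seg.payload_len + int(seg.syn) + int(seg.fin)
    client_next = (seg.seq + consumed) % SEQ_SPACE
    # The client's cumulative ACK names the next server byte it expects, so a proxy
    # impersonating the server continues (or retransmits) from exactly this value.
    server_next = seg.ack % SEQ_SPACE
    return client_next, server_next


def rst_to_decoy_seq(seg: ClientSegment) -> int:
    """Sequence number for a RST spoofed from the client toward the decoy destination."""
    # The exact next expected client sequence number keeps the RST in-window.
    client_next, _ = recover_state(seg)
    return client_next
```

For example, a 200-byte segment with `seq=1000, ack=5000` yields `(1200, 5000)`: the forged RST uses 1200 and the proxy answers the client starting at 5000.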
-
Clients embed HMAC-derived, time-varying sentinels into the 28-byte random field of the TLS ClientHello message, which decoy routers can scan at line rate. Sentinels are keyed to the current hour and a per-hour sequence number, providing freshness. This covert channel requires no out-of-band signaling and is invisible to passive observers who see only a normal TLS handshake toward the decoy destination.
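A minimal sketch of the sentinel scheme, assuming a per-client shared key, HMAC-SHA256 truncated to 28 bytes, and a big-endian encoding of (hour, sequence number); these encodings and the `SentinelTable` helper are illustrative assumptions, not the paper's wire format. Precomputing the current hour's sentinels into a hash table is one way a router could check each ClientHello with a single lookup, and removing a sentinel on first match mirrors the "consumed" behavior that the preplay attack exploits.

```python
import hashlib
import hmac
import struct

SENTINEL_LEN = 28  # bytes of the ClientHello random field that carry the sentinel


def make_sentinel(client_key: bytes, hour: int, seq: int) -> bytes:
    """HMAC over (hour, per-hour sequence number), truncated to 28 bytes (assumed encoding)."""
    msg = struct.pack(">QI", hour, seq)
    return hmac.new(client_key, msg, hashlib.sha256).digest()[:SENTINEL_LEN]


class SentinelTable:
    """Router-side table of sentinels expected during one hour (hypothetical helper)."""

    def __init__(self, client_keys: dict[str, bytes], hour: int, per_hour: int):
        self._expected = {
            make_sentinel(key, hour, seq): client_id
            for client_id, key in client_keys.items()
            for seq in range(per_hour)
        }

    def check(self, client_random: bytes):
        """Return the matching client id and consume the sentinel, or None."""
        # A TLS 1.2 random is a 4-byte time field followed by 28 random bytes.
        return self._expected.pop(client_random[-SENTINEL_LEN:], None)
```

A second ClientHello carrying the same sentinel returns `None`, which is why a censor that replays the sentinel first can poison it.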
-
Encrypting traffic at the application layer still discloses communicating parties to every ISP along the path; overlay anonymization is subject to blacklisting of exit nodes and traffic analysis. The paper argues that effective privacy requires building anonymity into the network routing layer itself, with the necessary tradeoff being hardware cost and routing inefficiency for privacy-requiring circuits.
-
Tor-like anonymizing overlays are easily censored because they rely on centralized, publicly visible relay lists; governments can blacklist Tor nodes or monitor all Tor exit traffic so that traffic analysis can reveal the source. Traffic to or from Tor 'essentially advertises itself as probably worth tracking.'
-
The design guarantees that as long as an end host can reach any non-censoring ISP, it can trampoline to any service; the anonymity properties make it difficult for ISPs to selectively block flows without cutting off the end host from the outside world entirely. Wikileaks-like services require only one willing authority for name resolution, not universal cooperation.
-
Channel blocking risk in Proximax is modeled as an independent Poisson process with rate λj; when a proxy is advertised on multiple channels simultaneously the risk parameters add (Λi = γ + Σλj), so each additional dissemination channel shortens expected proxy lifetime 1/Λi. The analytic result is that redundant multi-channel broadcasting is strictly suboptimal once cumulative risk exceeds the marginal usage gain.
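The additive form of Λi follows if each blocking source is modeled as an independent exponential clock. Reading γ as a baseline blocking rate that applies regardless of channels (an assumption; the text does not define it), the proxy survives only while no clock has fired:

```latex
\Pr[T_i > t] \;=\; e^{-\gamma t}\prod_{j=1}^{m} e^{-\lambda_j t} \;=\; e^{-\Lambda_i t},
\qquad \Lambda_i = \gamma + \sum_{j=1}^{m}\lambda_j,
\qquad \mathbb{E}[T_i] = \int_0^\infty e^{-\Lambda_i t}\,dt = \frac{1}{\Lambda_i}.
```

Adding an (m+1)-th channel therefore shortens the expected lifetime from 1/Λi to 1/(Λi + λ_{m+1}).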
-
A sophisticated censor can infiltrate a proxy distribution system, accumulate large numbers of proxy addresses and channel identities, and delay mass-blocking for weeks or months to maximize information before acting. The paper argues this is self-limiting: delayed blocking extends proxy lifetimes (benefiting system yield), and the infiltrating account's subtree reputation score degrades sharply the moment it begins blocking proxies, triggering exclusion from future proxy assignments.
-
Proximax uses fast-flux DNS — multiple IP addresses registered to one personalized domain with short TTLs and round-robin rotation — to resist channel-level DNS blocking. When a channel's domain is blocked, the system issues a fresh individualized hostname, forcing the censor to repeat discovery rather than permanently suppressing the channel with a single DNS entry removal.
-
Open proxy distribution registrations are vulnerable to adversary flooding with fictitious accounts that inflate yield scores via dummy connections. Proximax uses invitation-only registration with RICO-style subtree reputation scoring — a compromised sub-node taints the entire inviting user's subtree — and sub-linearly credits usage from closely clustered source IP prefixes to limit bot-driven inflation.
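One way to credit usage sub-linearly per address cluster, as a sketch: group client addresses by a fixed prefix and credit each prefix with the square root of its connection count. The /24 grouping, the square-root curve, and the `credited_usage` name are assumptions for illustration; the text says only that crediting is sub-linear.

```python
import ipaddress
import math
from collections import Counter


def credited_usage(client_ips: list[str], prefix_len: int = 24) -> float:
    """Usage credit that grows sub-linearly within each source prefix (assumed scheme)."""
    # Map every address to its enclosing network, e.g. 203.0.113.7 -> 203.0.113.0/24.
    prefixes = Counter(
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in client_ips
    )
    # A prefix with c connections earns sqrt(c): 100 bots behind one /24 count
    # as 10, while 100 clients in distinct prefixes still count as 100.
    return sum(math.sqrt(count) for count in prefixes.values())
```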
-
Proximax frames proxy distribution as a yield-maximization problem: the expected yield of a proxy is its attracted usage Ui divided by its total blocking risk Λi. A dissemination channel should only be assigned a proxy if the channel's own yield ratio u/λ exceeds the proxy's current yield ratio; otherwise the added risk outweighs the additional traffic and the channel must not be used at all.
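The assignment rule can be checked with exact arithmetic. Assuming usage and risk both add when a channel is attached (the text states this for risk), the combined yield (U+u)/(Λ+λ) is the mediant of U/Λ and u/λ, so it exceeds the current yield exactly when u/λ > U/Λ. The function names below are hypothetical.

```python
from fractions import Fraction


def yield_ratio(usage, risk) -> Fraction:
    """Expected yield U/Λ: usage attracted per unit of blocking risk."""
    return Fraction(usage) / Fraction(risk)


def should_assign(proxy_usage, proxy_risk, chan_usage, chan_risk) -> bool:
    """Add the channel only if its own yield ratio beats the proxy's current one."""
    return yield_ratio(chan_usage, chan_risk) > yield_ratio(proxy_usage, proxy_risk)


def yield_after_assign(proxy_usage, proxy_risk, chan_usage, chan_risk) -> Fraction:
    """Combined yield (U+u)/(Λ+λ), assuming usage and risk are both additive."""
    return yield_ratio(proxy_usage + chan_usage, proxy_risk + chan_risk)
```

With a proxy at U=10, Λ=2 (yield 5), a channel with u=3, λ=1 is rejected and would have lowered the yield to 13/3; a channel with u=12, λ=2 is accepted and raises it to 11/2.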
-
Each round of copyright enforcement drove deeper architectural decentralization: centralized servers (BBSs/FTP) → central directory (Napster) → supernodes (KaZaA/Grokster) → pure protocol (BitTorrent). Even after Grokster was shut down its software continued to work, because no fixed corporate entity remained as the control point.
-
DNS infrastructure is a primary chokepoint target: U.S. DHS seized domain names of sites including rojadirecta.org — found non-infringing under Spanish law — without Congressional authority. The proposed PROTECT-IP Act (2011) would have authorized DNS injection against 'non-domestic' domains. Developers countered with a browser plug-in distributing alternate domains outside U.S. jurisdiction; Mozilla refused a DHS demand to remove it.
-
When RIAA filed suit against more than 30,000 individual filesharers, users migrated toward anonymous channels, small-world networks of vetted peers, ephemeral pointers, and user-generated IP blacklists for spoofed-peer detection. The University of Washington demonstrated IP-to-person attribution is unreliable — a networked laser printer received a DMCA takedown notice.
-
Censorship operating at the infrastructure layer (hosting, DNS, ISPs) rather than the content layer produces opacity: blocklists must be kept secret lest they become menus of blocked content, accuracy cannot be examined, and harms are divided from those with incentive or expertise to oppose them. The consistent pattern in anti-censorship responses is to distribute, decentralize, encrypt, and obfuscate — making circumvention traffic indistinguishable from permitted use.
-
The U.S. 'five strikes' program had major ISPs reduce bandwidth of accused subscribers; challenging required a $35 fee with only one permitted defense category ('unauthorized use of account'). Users responded by routing traffic through VPNs and anonymizing networks such as I2P to bypass ISP-level monitoring entirely.
-
21% of all URLs that CensMon began tracking were found accessible on the very first re-probe, indicating initial inaccessibility was a transient network failure rather than censorship. The false-network-failure rate fell to near zero after 3 consecutive tracking attempts, providing a practical threshold for classifying persistent inaccessibility as filtering.
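A sketch of the resulting classification rule, assuming each URL is re-probed after its initial failure and is labeled as likely filtered only after three consecutive failed re-probes; the function name and labels are hypothetical.

```python
def classify_url(reprobes: list[bool], threshold: int = 3) -> str:
    """Classify a URL that was inaccessible when tracking began.

    reprobes: chronological re-probe outcomes, True meaning the URL was reachable.
    """
    failures = 0
    for reachable in reprobes:
        if reachable:
            # Reachable before the threshold: treat the earlier failure as a network glitch.
            return "transient failure"
        failures += 1
        if failures >= threshold:
            return "likely filtered"
    return "undetermined"  # not enough consecutive evidence yet
```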
-
Some politically active bloggers in the studied country deliberately continued publishing on officially court-blocked platforms, reasoning that official blockage created a legal defense against persecution: 'if they say you wrote this on your blog, I will say all of these blogs are blocked according to this court decision—they don't exist and they are officially inaccessible to citizens.' This co-option of censor infrastructure as a shield was treated as a serious protective strategy.
-
A politically active blogger in an anonymized censored country explicitly avoided BlackBerry encryption, stating: 'they can't crack that encryption and they would just get suspicious. Cause they listen to me and listen to me and then suddenly I am encrypting and so that means I am really saying something they don't want me to.' This documents censor behavior where the mere use of strong encryption, independent of content, serves as a targeting signal.
-
Blocking in the studied country was erratic and inconsistent: some geographic areas accessed the Internet through channels outside the main government-controlled pipeline and experienced no blocking, while other areas experienced sudden unexplained block-and-unblock cycles (e.g., a video sharing site and a microblogging site were blocked for 2-3 days in 2010 and then unblocked without explanation). Users frequently could not distinguish between deliberate blocking and ordinary technical outages, and this ambiguity itself amplified self-censorship among users who had not been directly targeted.
-
Forum and blog platform operators in the censored country were systematically coerced into serving as first-line censorship enforcers: they monitored user comments, warned users that Internet anonymity did not exist, gave users chances to self-remove offending posts, and ultimately handed user identifying information to government agencies when users did not comply. Larger forums hired full-time moderators operating 24 hours a day to manage this compliance workload.
-
Users lacking technical circumvention skills bypassed blocking via social relays: technically savvy friends or contacts in unblocked regions copied blocked content into email or reposted it on social network profiles, allowing censored information to reach users who had no direct access to proxies or anonymizers. This informal bypass required no circumvention software on the recipient's end.
-
A passive observer of BridgeSPA traffic sees only a TCP connection timeout on failed authorization or a successful TLS connection on success—exactly what they would observe with an unmodified Tor bridge. The ConnectionTag is indistinguishable from the normally-random ISN and timestamp fields in Linux 2.6, so no new observable artifact is introduced. However, BridgeSPA does not address the separate problem that Tor traffic itself remains fingerprint-distinguishable from HTTPS; this is an orthogonal concern.
-
BridgeSPA encodes a 32-bit SHA256-HMAC ConnectionTag derived from a time-limited MACKey into the TCP SYN packet's ISN (lower 3 bytes) and TCP timestamp (lower 1 byte) fields—values that are uniformly random in Linux 2.6 and therefore carry the tag innocuously. Bridges silently drop unauthorized SYN packets without returning any response, preventing aliveness queries. MACKeys rotate every 1–7 days (bridge-configured), so hoarded descriptors become stale within the epoch.
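The tag placement can be sketched as bit manipulation: the upper 24 bits of the 32-bit tag replace the low 3 bytes of the ISN, its low 8 bits replace the low byte of the TCP timestamp value, and the remaining bits stay random. The HMAC input (`context`, for example addresses, port, and a coarse timestamp) is an assumption, as are the function names; the text does not specify them.

```python
import hashlib
import hmac


def connection_tag(mac_key: bytes, context: bytes) -> int:
    """32-bit tag: HMAC-SHA256 over an assumed per-connection context, truncated."""
    return int.from_bytes(hmac.new(mac_key, context, hashlib.sha256).digest()[:4], "big")


def embed_tag(tag: int, random_isn: int, random_tsval: int) -> tuple[int, int]:
    """Hide tag bits 31..8 in the ISN's low 3 bytes and bits 7..0 in the timestamp's low byte."""
    isn = (random_isn & 0xFF000000) | (tag >> 8)
    tsval = (random_tsval & 0xFFFFFF00) | (tag & 0xFF)
    return isn, tsval


def extract_tag(isn: int, tsval: int) -> int:
    """Reassemble the 32-bit tag from the two carrier fields."""
    return ((isn & 0x00FFFFFF) << 8) | (tsval & 0xFF)


def bridge_accepts(mac_key: bytes, context: bytes, isn: int, tsval: int) -> bool:
    """Bridge-side check; on False the bridge silently drops the SYN."""
    expected = connection_tag(mac_key, context).to_bytes(4, "big")
    observed = extract_tag(isn, tsval).to_bytes(4, "big")
    return hmac.compare_digest(expected, observed)
```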
-
Measured over 5,000 SYN/SYN-ACK pairs on a shared physical network hub—the best-case vantage for an adversary—BridgeSPA's DoorKeeper adds a mean latency of approximately 90 µs (280±20 µs baseline vs. 370±80 µs with BridgeSPA). This overhead is consistent with prior SilentKnock analysis concluding that an adversary would need hundreds of observed connections before gaining statistical advantage in distinguishing SPA-protected hosts from dynamic-firewall behavior.
-
An active man-in-the-middle adversary can hijack a live BridgeSPA TCP SYN by intercepting the ConnectionTag-bearing packet and racing to complete the bridge connection before the client's timestamp rounds to a new minute. Mitigating this requires the client to re-send the full (non-truncated) ConnectionTag after TLS is established, causing the bridge to act as a cover service (e.g., IMAP over TLS) until validated—but this mitigation is undermined by the fact that Tor bridge TLS certificates are currently distinguishable from other service certificates.
-
Encrypted protocols such as SSL/TLS remain fully fingerprint-able through their unencrypted handshakes: DPI can apply static string matching, packet-length comparison, and timing profiling to the cleartext cipher-negotiation and key-exchange phase to identify and block the protocol even though the payload is encrypted.
-
Dust defeats DPI fingerprinting by constructing all packets from entirely encrypted or single-use random bytes (defeating static string matching), appending a random number of random padding bytes to every packet (defeating length matching), and permitting a complete client–server conversation to be encoded in a single UDP or TCP packet (defeating timing analysis for sufficiently small payloads).
-
Dust eliminates the in-band key-exchange fingerprint surface via an out-of-band half-handshake: the server's public key, IP, port, and a single-use secret are bundled into a PBKDF-encrypted invite packet transmitted out-of-band; only the decryption password (not the server IP) appears in plaintext, defeating the email/IM IP-address blocking attacks documented against prior systems.
-
BitTorrent's Message Stream Encryption (MSE), despite omitting static strings from the handshake, can be identified with 96% accuracy using packet-size analysis and direction-of-packet-flow; MSE also uses a cleartext Diffie-Hellman key exchange, leaving an additional fingerprint surface.
-
The obfuscated-openssh handshake encrypts SSH with a key derived from an iterated-hash PBKDF whose slowness was intended to prevent real-time censor analysis; Wiley argues this defense fails because modern censors use statistical packet sampling with offline processing, and the slow key generation itself introduces a timing side-channel detectable from the inter-packet delay between the first and second packets.
-
Censorship mapping tools that detect filtering by probing blocked content create concentrated access patterns that are qualitatively different from normal user behavior, potentially exposing volunteers to scrutiny even in countries where individual access to filtered content would not ordinarily trigger enforcement action. The paper identifies this as a fundamental ethical tension intrinsic to any filtering measurement methodology.
-
Open DNS resolvers, widely available across the internet as public services, make DNS poisoning trivially detectable globally: a researcher can connect to a resolver in a target country and compare responses against a trusted reference resolver, without requiring volunteer proxies or in-country infrastructure.
-
National-level filtering is not homogeneous: the administrative burden of maintaining up-to-date filtering rules at national scale leads states to delegate implementation to regional authorities or individual ISPs, producing measurable filtering differences between geographic regions and providers within the same country.
-
Collage's threat model identifies the censor's two most dangerous capabilities as: (1) aggregate traffic-flow analysis (e.g., NetFlow statistics) to detect anomalous access patterns to specific content hosts, and (2) joining the system as a sender or receiver to discover content locations and mount denial-of-service or deniability attacks. The censor is assumed to monitor all egress traffic but is modeled as computationally limited against joint statistical distributions across arbitrary user pairs.
-
Rateless erasure coding with ε=0.01 adds only a 0.5% storage and traffic overhead. Consistent hashing of message identifiers to task-database entries ensures that when 50% of tasks are replaced, sender and receiver still share at least one task if three or more tasks are mapped per identifier. At a 10× send rate, message recovery succeeds even if 90% of published vectors are blocked.
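The identifier-to-task mapping can be illustrated with rendezvous (highest-random-weight) hashing, a consistent-hashing variant chosen here for brevity; the paper's exact construction may differ, and the function names are hypothetical. Each party ranks the tasks in its own database by a hash of (identifier, task) and keeps the top k, so tasks present in both databases are ranked identically, and removing a task only changes the selection for identifiers that had picked it.

```python
import hashlib


def _weight(message_id: str, task: str) -> int:
    digest = hashlib.sha256(f"{message_id}\x00{task}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def tasks_for(message_id: str, task_db: list[str], k: int = 3) -> list[str]:
    """Top-k tasks for an identifier under highest-random-weight hashing."""
    return sorted(task_db, key=lambda task: _weight(message_id, task), reverse=True)[:k]
```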
-
The paper demonstrates that no single steganographic algorithm can provide both availability and deniability, since almost all production algorithms have been broken and steganography alone does not hide the identities of communicating parties. Collage addresses this by treating the embedding algorithm as a swappable component in a layered architecture—vector layer, message layer, application layer—so that compromise of the embedding scheme does not compromise the system, and stronger algorithms (e.g., digital watermarking) can be substituted as they mature.
-
Production steganography tools achieve encoding rates of 0.01–0.05 (fraction of cover-medium bytes available for hidden data), yielding 20–100× increases in storage, traffic, and transfer time relative to the raw message. A 23 KB one-day news summary requires approximately 9 JPEG photos (~3 KB data per photo plus encoding overhead) and takes under 1 minute to retrieve over a fast connection; over an unreliable broadband wireless link the same message was received in under 5 minutes with sender time under 1 minute.
-
Collage leverages platform-scale user-generated content—Flickr's 3.6 billion images with 6 million new per day and Twitter's ~500K tweets/day as of 2009—as a covert channel substrate. Because the censor cannot block all UGC platforms simultaneously without removing massive amounts of legitimate content, the system achieves availability and user deniability that fixed-infrastructure proxies (e.g., Tor relays) cannot: accessing Flickr or Twitter does not implicate the user as a circumvention tool operator.
-
A simple entropy argument proves the dynamic key distribution problem requires at least Ω(k log(n/k) / log(k + log n)) keys: the algorithm must identify which k of n users are adversaries from at most ℓ log ℓ bits of feedback (ℓ round outcomes each indexing one of ℓ keys), and distinguishing among C(n,k) adversary sets requires log C(n,k) = Ω(k log(n/k)) bits.
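The final step of the entropy argument can be filled in as follows. If ℓ ≥ k log n, the bound holds directly (whenever 2 log(k + log n) ≥ 1); otherwise ℓ < k log n bounds log ℓ, and dividing the feedback inequality by log ℓ gives the claimed form:

```latex
\begin{gathered}
\ell\log\ell \;\ge\; \log\binom{n}{k} \;\ge\; \log\Bigl(\frac{n}{k}\Bigr)^{k} \;=\; k\log\frac{n}{k},\\[4pt]
\ell < k\log n \;\Longrightarrow\; \log\ell < \log(k\log n) \;\le\; \log\bigl((k+\log n)^{2}\bigr) \;=\; 2\log(k+\log n),\\[4pt]
\ell \;\ge\; \frac{k\log(n/k)}{\log\ell} \;>\; \frac{k\log(n/k)}{2\log(k+\log n)} \;=\; \Omega\!\left(\frac{k\log(n/k)}{\log(k+\log n)}\right).
\end{gathered}
```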
-
By reusing keys already held by trusted (non-suspicious) users for ℓ−1 of ℓ subgroups when bisecting the suspicious cohort — issuing only one fresh key per round — the total proxy count drops from O(k log n) to O(k² log n / log log n) in expectation. The information-theoretic lower bound is Ω(k log(n/k) / log(k + log n)), so this bound is tight in n up to a factor of k.
-
In the Clouds P2P protocol, a blocking attack against a specific topic requires adversaries to occupy at least 50% of the 200-peer region closest to the resource provider to be effective; below that threshold, query messages routed through multiple paths bypass the censorship. This 50% threshold holds regardless of the number of clouds κ created per peer.
-
The number of clouds per peer κ has no measurable effect on censorship resistance (Figure 5 curves are identical across κ = 1–4), while cloud size is the dominant driver of message overhead. This decoupling means designers can increase κ to improve anonymity without degrading censorship resistance or incurring bandwidth cost.
-
Cloud locality — building clouds from semantically close peers via short-distance links — ensures that 2-wise and 3-wise cloud intersections have median cardinality between 40 and 50 peers, and the probability that a peer participates in clouds whose pairwise intersection falls below 40 is below 10⁻⁴, rendering intersection attacks infeasible in practice.
-
The surrounding attack on peer anonymity is also effective only when adversaries control at least 50% of the ~100 semantically closest peers to the target; at 25% malicious peers, at least 10 honest peers still join the target's cloud at every step of the joining algorithm, preserving k-anonymity.
-
The Clouds protocol retrieves approximately 70% of available answers even in the absence of attackers, a ~30% drop in retrieval performance relative to an insecure semantic overlay network (SON). This baseline loss stems from the probabilistic message delivery of cloud-based routing, not from adversarial interference.
-
Website fingerprinting attacks that match file sizes and access patterns against a database of known sites remain applicable to SkyF2F, but are limited to the granularity of 512-byte fixed-size stream cells, since streams are multiplexed within a single tunnel circuit. The authors note this is less effective than against SafeWeb, where full request/response sizes are directly observable.
-
Because Skype relies on a central login server, it is technically possible for a censor to block Skype, but the paper observes that blocking widely-deployed services like Skype or Google inflicts real economic harm, making it a credible deterrent. Additionally, Skype's proprietary, closed-source protocol and P2P architecture make it harder to characterize and selectively filter than open protocols.
-
SkyF2F's friend-to-friend service model, where a server publishes its appid only to trusted contacts rather than publicly, provides significant resistance to both sybil attacks (malicious censor-controlled servers) and DoS exhaustion attacks. A censor posing as a client can establish many tunnels to exhaust a public server's resources; restricting service to a trusted friend list eliminates most of this attack surface.
-
SkyF2F tunnels censored traffic through Skype's encrypted overlay network, forcing the censor into an all-or-nothing dilemma: blocking SkyF2F requires blocking Skype entirely, which causes actual economic damage to businesses and users who depend on it. Because Skype users are identified by pseudonym and all messages are routed to overlay addresses rather than Internet addresses, IP-based blocking, DNS filtering, port blocking, and keyword filtering are all rendered ineffective.
-
A censor hosting Skype supernodes can perform passive traffic-flow analysis on relayed streams even without breaking encryption, since supernode-relayed conversations expose traffic metadata. However, with thousands of supernodes in the Skype network, the probability that any censor-controlled supernode relays a specific SkyF2F tunnel is low, making large-scale correlation high-cost.
-
Using Tor exit nodes to query the bridge authority, the authors enumerated 247 bridge descriptors over two weeks (out of 1,716 active bridges during that period). An adversary running a relay advertising just 10 MBps of bandwidth would discover 63% of bridges that relay at least 40 circuits and 87% of bridges running at least 80 circuits, because all Tor clients proactively build circuits every 10 minutes.
-
A circuit-clogging attack against bridge operators—using median-normalized latency correlations—achieved an AUC of 0.884 and an equal error rate of 0.2 when distinguishing the victim bridge from innocent bridges in PlanetLab experiments with 180 victim and 180 disjoint runs. With 10 repeated clogging experiments and a majority-vote threshold, the false positive (and false negative) rate drops below 0.033, confirming a bridge operator's identity with high confidence given a candidate set of ≤4.4 bridges from the winnowing stage.
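The majority-vote figure can be sanity-checked with a binomial model. Assuming the 10 clogging runs err independently, each with probability equal to the 0.2 equal error rate, and counting a 5–5 tie as an error, the probability that at least 5 of 10 runs err is about 0.0328, which appears consistent with the reported bound of 0.033. The independence assumption and the tie rule are ours, not the paper's.

```python
from math import comb


def vote_error(p_single: float, trials: int = 10, min_wrong: int = 5) -> float:
    """P(at least `min_wrong` of `trials` independent runs err), each with probability p_single."""
    return sum(
        comb(trials, i) * p_single**i * (1 - p_single) ** (trials - i)
        for i in range(min_wrong, trials + 1)
    )


print(round(vote_error(0.2), 4))  # 0.0328
```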
-
The architectural coupling of 'surfing' and 'serving' in Tor's bridge design, where enabling the bridge service is required to use Tor as a client, means a bridge always accepts connections whenever its operator is online. This lets a remote non-global adversary probe a bridge's availability at negligible cost (less than 2 bps per bridge per status check via SYN/RST). Of the 247 enumerated bridges, only an average of 29.6 (about 12%) were accessible at any given moment, providing a highly discriminating availability signal for intersection attacks.
-
An 'unfair queuing' mechanism that partitions CPU time between bridge-operator circuits and bridge-client circuits using a time-allocation parameter τ=0.9 reduced the circuit-clogging AUC from 0.884 to 0.520 (median-normalized) and 0.412 (mean-normalized)—indistinguishable from random guessing—in 20 PlanetLab experiments. The mechanism eliminates latency interference between the two circuit types without requiring the bridge to ever refuse connections, but introduces up to 1−τ performance loss for client traffic.
-
Cross-referencing the online/offline status of 87 monitored bridges against 186,935 Wikipedia users' edit sessions showed that 95.7% of users with 50 or more sessions matched zero bridges after winnowing. For users with 180 or more sessions (a surrogate for long-term pseudonymous activity), only 89 false positives remained among 2,329 users—a false positive rate of 0.000439—meaning that even if 10,000 Tor clients volunteer to bridge, on average only 4.4 bridges remain after the winnowing stage.
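The reported rate and the 4.4-bridge figure agree if the false positive rate is normalized per (user, monitored bridge) pair; that normalization is our inference from the numbers, checked below.

```python
users, monitored_bridges, false_positives = 2_329, 87, 89

# False positives per (user, bridge) pair examined.
fp_rate = false_positives / (users * monitored_bridges)
print(round(fp_rate, 6))  # 0.000439

# Expected innocent bridges surviving winnowing for one target user
# if 10,000 clients run bridges.
print(round(10_000 * fp_rate, 1))  # 4.4
```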
-
Injectors sending multiple RSTs with increasing sequence numbers to overcome the RST_SEQ_DATA race condition produce a detection signature (RST_SEQ_CHANGE) that cannot arise from a standards-compliant TCP endpoint: the second RST must have a sequence number exceeding both the preceding RST and any ACK yet observed from the receiver. This creates an inherent design tension — a robust injector that uses sequence-incremented multi-packet RSTs to ensure delivery is precisely the kind most detectable by passive monitoring.
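A simplified passive check for the RST_SEQ_CHANGE signature, following the condition stated above: within one connection, flag a RST whose sequence number exceeds both the same side's previous RST and every ACK seen so far from the receiver. The `Segment` trace format is hypothetical, and the sketch ignores sequence-number wraparound and retransmission subtleties that a production detector must handle.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    """One TCP segment of a single connection (hypothetical trace format)."""
    from_sender: bool          # True: RST-sending side -> receiver; False: receiver -> sender
    rst: bool
    seq: int
    ack: Optional[int] = None  # cumulative ACK, used here only on receiver -> sender segments


def rst_seq_change(trace: Iterable[Segment]) -> bool:
    """Flag a later RST whose seq exceeds both the previous RST and every receiver ACK so far."""
    highest_ack: Optional[int] = None
    previous_rst_seq: Optional[int] = None
    for seg in trace:
        if not seg.from_sender:
            if seg.ack is not None:
                highest_ack = seg.ack if highest_ack is None else max(highest_ack, seg.ack)
            continue
        if not seg.rst:
            continue
        beyond_acks = highest_ack is None or seg.seq > highest_ack
        if previous_rst_seq is not None and seg.seq > previous_rst_seq and beyond_acks:
            return True
        previous_rst_seq = seg.seq
    return False
```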
-
Out-of-band RST injectors fundamentally face race conditions because they cannot modify in-flight packets: a data packet may pass the injector's observation point before the forged RST is generated, producing detectable out-of-sequence RSTs (RST_SEQ_DATA) or post-RST data packets (DATA_SEQ_RST). A passive detector exploiting these two race conditions, plus a third signature (RST_SEQ_CHANGE) from multi-packet injectors, reliably identifies injected RSTs across four network datasets totaling 30.2M TCP flows.
-
Individual RST injectors exhibit stable, idiosyncratic header-field fingerprints enabling device-level identification across geographically separated sites. Sandvine devices produce back-to-back RST pairs where the second packet's sequence number is exactly 12,503 higher than the first (a known implementation bug confirmed by Sandvine's CTO) with IPID increments of 4 then 1; 90% of 193 alerting Comcast IP addresses across all four datasets matched this fingerprint. The GFW SEQ 1460 injector always increments sequence numbers by 1,460 regardless of actual MTU or window size.
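Matching the sequence-number part of these fingerprints needs only the deltas between back-to-back RSTs, computed modulo 2^32. The labels and function are hypothetical, and the IPID pattern is deliberately left out of this sketch.

```python
SEQ_SPACE = 2 ** 32


def classify_rst_burst(rst_seqs: list[int]) -> str:
    """Label a burst of back-to-back injected RSTs by their sequence-number deltas."""
    deltas = [(b - a) % SEQ_SPACE for a, b in zip(rst_seqs, rst_seqs[1:])]
    if deltas == [12_503]:
        return "Sandvine-like pair"
    if deltas and all(d == 1_460 for d in deltas):
        return "GFW SEQ 1460-like"
    return "unknown"
```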
-
On a crawled Orkut subgraph of 42,474 users (≈90% Brazilian nodes, treated as the censored domain; 15% of external nodes serve as proxies, i.e., 1.5% overall), the median node reaches 7 proxies, more than on the synthetic graph because of the higher average degree (5.59 vs. 4.65) and lower clustering. Even when subverted trust links reach half the total proxy count, more than 94% of users can still access at least one proxy unknown to the censor.
-
The CleanFeed first stage populates its IP blocklist by automatically resolving hostnames from the IWF database via DNS. Content providers can serve false DNS results pointing to high-traffic third-party IP addresses (e.g., Google cache servers at 66.102.9.104), causing the first stage to redirect legitimate traffic through the proxy. Automated IP-update processes cannot reliably distinguish a genuine IP migration from a spoofed DNS result, and this can cause legitimate sites to be blocked collaterally.
-
Tor's public relay list (a few thousand IP addresses as of 2006) can be trivially enumerated and blocked by a censor. The paper proposes 'bridge relays' drawn from Tor's existing user base of hundreds of thousands of people, creating a pool of frequently-changing IP addresses that is too large and dynamic for a censor to enumerate completely. Bridge relays rate-limit relayed connections to ~10 KB/s and publish descriptors only to a private bridge directory authority rather than the public consensus.
-
The paper proposes dividing public bridge addresses into 8 pools (n=3 bits from HMAC(identity-key, authority-secret)) each assigned a distinct distribution strategy: time-windowed release, IP-subnet-partitioned assignment, time+location combined, mailing-list rotation, email/CAPTCHA delivery, and social-trust delegation. Deploying all strategies concurrently forces the attacker to allocate resources across every channel simultaneously, making all strategies more robust than any single strategy deployed alone.
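Deriving a bridge's pool from 3 bits of an HMAC can be sketched as below. We key the HMAC with the authority secret so that outsiders cannot predict pool membership from a bridge's public identity key; that argument order, the use of SHA-256, and the function name are assumptions about the notation HMAC(identity-key, authority-secret).

```python
import hashlib
import hmac

POOL_BITS = 3  # 2**3 = 8 pools


def bridge_pool(identity_key: bytes, authority_secret: bytes) -> int:
    """Pool index in [0, 8) taken from the top 3 bits of the HMAC output."""
    digest = hmac.new(authority_secret, identity_key, hashlib.sha256).digest()
    return digest[0] >> (8 - POOL_BITS)
```

Because the index is a deterministic function of the identity key, a bridge stays in the same pool for as long as it keeps that key.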
-
If bridges run on predictable ports and any TCP connection to a bridge port reveals it as a Tor bridge, a censor can scan the entire address space of residential ISP ranges to enumerate and block all bridges. The paper proposes 'scanning resistance': bridges require a nonced hash of a pre-shared password before revealing Tor behavior, and respond to unauthenticated connections by impersonating an ordinary HTTPS server (e.g., default Apache page or a random legitimate website).
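A minimal sketch of the password check, assuming the client sends a fresh 16-byte nonce followed by SHA-256(nonce ‖ password), and that the bridge remembers nonces it has accepted so a recorded handshake cannot be replayed by a scanner. The message layout, the replay cache, and the class name are illustrative assumptions.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16


def client_proof(password: bytes) -> bytes:
    """Nonce followed by a hash binding it to the pre-shared password."""
    nonce = os.urandom(NONCE_LEN)
    return nonce + hashlib.sha256(nonce + password).digest()


class ScanResistantBridge:
    """Reveals Tor only to clients that prove knowledge of the shared password."""

    def __init__(self, password: bytes):
        self._password = password
        self._seen_nonces: set[bytes] = set()

    def respond(self, message: bytes) -> str:
        nonce, proof = message[:NONCE_LEN], message[NONCE_LEN:]
        expected = hashlib.sha256(nonce + self._password).digest()
        if nonce not in self._seen_nonces and hmac.compare_digest(proof, expected):
            self._seen_nonces.add(nonce)
            return "speak Tor"
        # Unauthenticated or replayed input gets the cover behavior, e.g. a default Apache page.
        return "serve ordinary HTTPS page"
```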
-
Tor's 2006 TLS handshake contained multiple identifying fingerprints exploitable by censors: the X.509 organizationName field was set to 'Tor', the relay nickname appeared in the commonName field, clients always presented certificates (unlike browsers), and Tor used two-certificate chains (identity cert + per-session TLS cert) while most consumer HTTPS services use a single certificate. The paper flags these as sufficient for a censor to identify Tor traffic without deep payload inspection.
-
Tor encrypts payload but does not obscure traffic volume, leaving a residual publisher-vs-reader asymmetry: a user publishing a home video generates a markedly different upload/download ratio than one reading news. The paper also notes that website fingerprinting attacks, in which the adversary pre-downloads hundreds of popular sites and matches their traffic patterns against a Tor client's stream, remain possible even through bridge circuits, and are exacerbated by the variety of protocols Tor carries (web browsing and IM produce different timing signatures).
-
Theorem 1 proves that censorship resistance (CR) implies Private Information Retrieval (PIR): any system achieving low censorship susceptibility must implement PIR as an underlying primitive. CR systems cannot be built with cryptographic primitives weaker than PIR.
-
Server-deniability schemes (Publius) and data-entanglement schemes (Tangler, Dagster) both achieve censorship susceptibility of 1 under the cooperative-server model. Publius fails because the Publius URL encodes the hosting servers and document identity in public, enabling direct query filtering. Tangler and Dagster fail because their limited-width entanglement graphs allow a censor to remove a document with collateral damage too small to prevent selective censorship — only a small number of blocks per document are entangled.
-
PIR alone does not achieve censorship resistance. Using the QRA (Quadratic Residuosity Assumption) PIR scheme as a direct CR implementation, a filter can replace a query component — substituting a quadratic residue for a non-residue at the target column index — forcing the server to return an incorrect result for the targeted document while leaving all other documents unaffected, yielding censorship susceptibility of 1.
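To make the substitution attack concrete, here is a toy single-server PIR in the style of Kushilevitz and Ostrovsky's quadratic-residuosity scheme; the paper's exact QRA construction may differ. The database is a bit matrix whose columns are documents. The client sends a quadratic residue for every column except the wanted one, which gets a non-residue with Jacobi symbol +1. The tiny primes, the function names, and the column-per-document layout are assumptions for illustration only.

```python
import math
import random

P, Q = 499, 547  # toy primes for illustration; real schemes use large secret primes
N = P * Q


def _is_qr_mod_prime(a: int, p: int) -> bool:
    # Euler's criterion for an odd prime p and a coprime to p.
    return pow(a, (p - 1) // 2, p) == 1


def is_qr(a: int) -> bool:
    """Quadratic residuosity mod N; efficient only with the factorization (client side)."""
    return _is_qr_mod_prime(a % P, P) and _is_qr_mod_prime(a % Q, Q)


def _unit(rng: random.Random) -> int:
    while True:
        x = rng.randrange(2, N)
        if math.gcd(x, N) == 1:
            return x


def random_qr(rng: random.Random) -> int:
    return pow(_unit(rng), 2, N)


def random_pseudo_nonresidue(rng: random.Random) -> int:
    # Non-residue modulo both P and Q (Jacobi symbol +1): hard to tell from a QR without P, Q.
    while True:
        x = _unit(rng)
        if not _is_qr_mod_prime(x % P, P) and not _is_qr_mod_prime(x % Q, Q):
            return x


def make_query(num_docs: int, wanted: int, rng: random.Random) -> list[int]:
    return [random_pseudo_nonresidue(rng) if j == wanted else random_qr(rng) for j in range(num_docs)]


def server_answer(db: list[list[int]], query: list[int]) -> list[int]:
    # db[r][j] is bit r of document j; each row multiplies y_j for 1-bits and y_j^2 for 0-bits.
    answers = []
    for row in db:
        z = 1
        for bit, y in zip(row, query):
            z = z * (y if bit else y * y) % N
        answers.append(z)
    return answers


def decode(answers: list[int]) -> list[int]:
    # z_r is a non-residue exactly when the wanted document's bit r is 1.
    return [0 if is_qr(z) else 1 for z in answers]


def censoring_filter(query: list[int], banned_doc: int, rng: random.Random) -> list[int]:
    # Overwrite the banned column with a fresh QR. Queries for other documents already
    # carry a QR there, so only requests for the banned document change.
    tampered = list(query)
    tampered[banned_doc] = random_qr(rng)
    return tampered
```

A filter that blindly rewrites column `banned_doc` turns every retrieval of that document into all zeros while leaving every other retrieval intact, without ever learning which document a given query targets.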
-
Theorem 3 demonstrates that having the server digitally sign its response together with the verbatim client query is sufficient to achieve CR when built atop any secure PIR protocol. This construction (sys+S) eliminates query modification as an attack vector, reducing the censor's viable strategies to query-dropping only — an advantage bounded above by the underlying PIR adversary's advantage, proving that the censor must shut down the entire service to achieve selective filtering.
-
Under a threat model granting the censor universal inspection of server communications and processing logs — with only the server's signing key withheld — data-replication systems (Freenet, Gnutella, Eternity Service) and anonymous-communication systems (Free Haven, Serjantov's scheme) all achieve censorship susceptibility of 1. Because document names are publicly known, a censor with full server visibility can selectively drop any targeted query without disrupting access to other documents.
-
The paper argues that censorship is an economic activity in which both censor and target incur costs, and that binary 'blocked/unblocked' models are as unrealistic as an omnipotent global adversary. Technology changes (e.g., moveable type, online publishing, trusted computing) can shift the cost parameters dramatically, making quantitative cost modeling — rather than binary vulnerability analysis — the correct framing for censorship-resistance evaluation.
-
Discretionary P2P networks avoid the social-choice and incentive-manipulation problems inherent in random distribution, which requires collective agreement on a system-wide resource ratio (rs, bs) and thus creates incentives to subvert voting or reputation mechanisms. By allowing nodes to self-select content, discretionary systems need no election schemes, reputation systems, or electronic cash, enabling simpler and more stable designs.
-
Under the paper's economic model, the aggregate censorship-resistance defense budget is always at least as large in a discretionary P2P network (nodes serve content they choose) as in a random-distribution network: for every node i, td ≥ ts, so the total cost imposed on the censor satisfies Σtd ≥ Σts. Equality holds only when all nodes share identical preferences (ri = rs); in all other cases discretionary distribution is strictly harder to censor.
-
In a random-distribution network, nodes whose utility is non-decreasing under censorship will set their defense budget to zero. For example, in a network with rs = 0.5 (equal red/blue), a censor shifting the distribution to rc = 0 (all blue) increases the utility of strongly blue-preferring nodes; they then invest nothing in resistance, reducing aggregate network defense.
-
Under the paper's quadratic utility function and linear defense probability P(t) = t/T, a node will invest zero resources fighting censorship when the censor's imposed distribution reduces its utility by less than half (i.e., when Ui(rc,bc) ≥ Ui(ri,bi)/2). Nodes whose preferences most diverge from the censor's are the first to resist; mild censorship therefore attracts little aggregate resistance.
-
The paper presents a systematic taxonomy of blocking criteria across ISO/OSI layers: circumstance-based (addresses including sender/receiver/kind/physical location; timing including send time, receive time, duration, frequency; data-transfer properties; services including protocols, names, addresses) and content-based (file type/MIME, statistical detection of encrypted or compressed data, pattern matching for keywords or phrases, and website fingerprinting via request-count/byte-volume signatures).
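The taxonomy lists statistical detection of encrypted or compressed data without fixing a particular statistic. A common choice, assumed here, is the Shannon entropy of the byte histogram: ciphertext and well-compressed data approach 8 bits per byte. The 7.5 threshold and the function names are hypothetical.

```python
import math
import os
from collections import Counter


def bits_per_byte(data: bytes) -> float:
    # Shannon entropy of the byte-value histogram, between 0.0 and 8.0.
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    # Hypothetical threshold: near-uniform bytes suggest ciphertext or compressed data.
    return bits_per_byte(data) >= threshold


if __name__ == "__main__":
    print(bits_per_byte(b"hello hello hello"))
    print(looks_encrypted(os.urandom(4096)))
```

Short flows give noisy estimates, so a real classifier would need a minimum sample size. Entropy alone also cannot tell encryption apart from compression, which matches the taxonomy's grouping of the two.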
-
The paper proposes using CAPTCHAs (hard AI problems) to gate forwarder-list access, forcing the blocker to expend human resources solving every puzzle while each blockee solves only one. However, a 'stealing cycles from humans' attack allows a censor to relay CAPTCHAs to unwitting third parties (e.g., visitors to an attacker-operated website) who solve them on the censor's behalf.
-
NAT and firewalls make volunteer forwarders (JAPR) unreachable for inbound connections by default, and volunteers have no personal incentive to reconfigure their systems to accept them. The design response is to reverse the connection direction: JAPR initiates contact with JAPB. This shifts the NAT/firewall configuration burden to the motivated blockee, who benefits directly from solving it.
-
For a secure steganographic system the embedding ratio is at least 1:10, meaning 1 MB of web content requires 10 MB of transmitted cover data; for a system robust against active attacks (e.g., StirMark bilinear distortions) the ratio is probably 1:100. A censor need not break the steganographic algorithm with high accuracy — suspicion alone is sufficient, since the censor can probe suspected nodes directly by acting as a blockee.
-
The protocol between blockee and volunteer forwarder is designed to be transport-layer independent from the outset, allowing substitution of plain TCP with SSL tunnels, SMTP, or steganographic channels as the censor escalates detection. The system is intentionally deployed in a weak initial form to observe how quickly and in what manner the censor adapts, then hardened iteratively based on measured censor behavior.
-
The paper evaluates all major circumvention techniques available in 2003 and concludes that only application-layer proxies (HTTP, SOCKS, JAP, peek-a-booty) and IP tunneling can defeat all three blocking layers (IP filtering, DNS tampering, filtering proxies) simultaneously. Encryption alone cannot circumvent IP or DNS blocking; HTTPS hides URL paths but not the destination host; DNS-over-HTTPS/DNSSEC can detect but not defeat DNS tampering without a third-party resolver.
-
An empirical DNS survey of North Rhine-Westphalia providers (May 2003) found that kids.stormfront.org — not named in the blocking order — was returned with obscure errors by 56% of surveyed servers, while rotten.com (also not in the order) was erroneously blocked by 11% of providers. www.stormfront.org itself was blocked by 12 providers with 0% still accessible, demonstrating that real-world DNS-tampering deployments systematically over-block non-targeted names at high rates.
-
The survey of NRW provider DNS implementations revealed at least five distinct tampering strategies in the wild: name hijacking to a government redirect server, NXDOMAIN for entire zones, name astrayment to 127.0.0.1 (the user's own machine) or to addresses unallocated at the time such as 1.1.1.1, silence (no reply), and provoked SERVFAIL. One provider (tops.net) additionally set tracking cookies on users redirected to its block-notification page, showing that name hijacking creates a surveillance vector beyond the blocking itself.
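A measurement tool could map each resolver observation onto the survey's categories. The sketch below assumes the prober already knows the correct answer, the block-page address, and which prefixes were unallocated. The addresses come from documentation ranges (RFC 5737), and the category labels and function names are hypothetical. A single observation cannot distinguish zone-wide NXDOMAIN from a per-name one.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Optional

NOERROR, SERVFAIL, NXDOMAIN = 0, 2, 3  # standard DNS RCODE values


@dataclass
class Observation:
    rcode: Optional[int]                       # None: no reply before the timeout
    answers: List[str] = field(default_factory=list)


def classify(obs: Observation, expected_ips: set, redirect_ips: set,
             unallocated_nets: list) -> str:
    """Map one resolver observation onto the survey's tampering categories."""
    if obs.rcode is None:
        return "silence"
    if obs.rcode == SERVFAIL:
        return "provoked SERVFAIL"
    if obs.rcode == NXDOMAIN:
        return "NXDOMAIN"
    if obs.rcode != NOERROR or not obs.answers:
        return "other"
    if all(a in expected_ips for a in obs.answers):
        return "unmodified"
    if any(a in redirect_ips for a in obs.answers):
        return "hijack to redirect server"
    ips = [ipaddress.ip_address(a) for a in obs.answers]
    if any(ip.is_loopback for ip in ips):
        return "astray to loopback"
    if any(ip in net for ip in ips for net in unallocated_nets):
        return "astray to unallocated space"
    return "other"


if __name__ == "__main__":
    unallocated_2003 = [ipaddress.ip_network("1.0.0.0/8")]
    print(classify(Observation(NOERROR, ["1.1.1.1"]), {"203.0.113.5"},
                   {"192.0.2.10"}, unallocated_2003))
```

Note that a genuine SERVFAIL can occur without tampering, so the label only makes sense when it appears for blocked names and not for controls.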
-
DNS zone architecture prevents providers from blocking individual hostnames without also disrupting all other services (email, chat, file transfer) for every name in the same DNS zone. A provider blocking www.bad.example.com must create a synthetic zone for bad.example.com, requiring continuous re-synchronization with authoritative servers at 3–24 hour intervals; failing to replicate MX records blocks email to non-targeted addresses in the zone.
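The synthetic-zone problem can be illustrated with a toy record list. The records, the block-page address, and the function names below are hypothetical. The sketch only shows that the provider must copy every non-targeted record type from the authoritative data, and that omitting MX silently breaks mail for the whole zone.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (owner name, record type, value)


def synthesize_zone(authoritative: List[Record], blocked_host: str, block_ip: str,
                    copy_types=("A", "MX", "NS", "CNAME", "TXT")) -> List[Record]:
    """Build a provider's replacement zone that overrides only blocked_host."""
    # Copy every record except those of the blocked host, but only of the listed types.
    synthetic = [r for r in authoritative if r[0] != blocked_host and r[1] in copy_types]
    synthetic.append((blocked_host, "A", block_ip))
    return synthetic


def lookup(zone: List[Record], name: str, rtype: str) -> List[str]:
    return [value for owner, t, value in zone if owner == name and t == rtype]


if __name__ == "__main__":
    auth = [
        ("www.bad.example.com", "A", "198.51.100.7"),
        ("bad.example.com", "MX", "10 mail.bad.example.com"),
        ("mail.bad.example.com", "A", "198.51.100.8"),
    ]
    careless = synthesize_zone(auth, "www.bad.example.com", "192.0.2.10", copy_types=("A",))
    print(lookup(careless, "bad.example.com", "MX"))
```

Because `authoritative` changes over time, the provider would have to re-run the synthesis at every re-synchronization interval. Any record added upstream in between is invisible to its users.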
-
IP-level blocking causes severe over-blocking because more than 87% of all domains deploy name-based virtual hosting on shared IP addresses (per Edelman's 2003 survey of .com/.net/.org). A single blocked IP can deny access to thousands of unrelated sites; when xs4all.nl was blocked in 1996/1997, between 3,000 and 6,000 separate websites were collaterally blocked.
-
Active-server document anonymity is achieved by routing decryption through a randomly chosen ephemeral 'decrypter' node: the storer holds only ciphertext {h}k while key k is delivered separately to the decrypter via onion routing. Neither the storer nor any other single node can reconstruct the plaintext share, so a storer cannot identify the document it is hosting even by attempting to retrieve it.
-
An adversary who wishes to expose storers by having forwarders log storer identities must compromise all n−k+1 chosen forwarders before or during the publication event; forwarders that legitimately delete the storer mapping immediately after acknowledging publication render this attack ineffective unless the adversary pre-positions malicious nodes at sufficient density. The paper notes that with a reasonably large forwarder population the probability of the required simultaneous compromise is small.
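To put a rough number on "small", assume the n−k+1 forwarders are drawn uniformly at random without replacement from a population of which a fixed count is adversarial. The probability that every chosen forwarder is malicious is then hypergeometric. This model and the parameter values are assumptions for illustration; the paper's own analysis may differ.

```python
from math import comb


def p_all_forwarders_malicious(population: int, malicious: int, n: int, k: int) -> float:
    """P(all n-k+1 uniformly chosen forwarders are adversarial), no replacement."""
    m = n - k + 1
    if m > malicious:
        return 0.0
    return comb(malicious, m) / comb(population, m)


if __name__ == "__main__":
    # Hypothetical: 1000 forwarders, 10% malicious, n = 10 shares, k = 3 needed.
    print(p_all_forwarders_malicious(1000, 100, n=10, k=3))
```

Even a 10% malicious fraction yields a probability below 10^-7 when eight forwarders must all be compromised. This is the intuition behind the paper's "reasonably large forwarder population" remark.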
-
The paper proposes a forwarder/storer role split in which forwarders hold only an anonymous return-address pointer to the storer, and deliberately forget the storer's identity upon receiving a publication acknowledgment. Because forwarders neither hold content nor retain storer addresses post-publication, coercing a forwarder after publication yields no actionable information about where shares are held.
-
Publius splits document keys into n shares where any k reconstruct the document, requiring a censor to coerce only n−k+1 servers to suppress it. Because all Publius server locations are discoverable by any reader, the paper argues this threshold is easily achievable, making location-secrecy of storers a necessary — not optional — property for censorship-resistant storage systems.
-
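An attacker can conduct stealth port scans against a victim without revealing their own IP by exploiting a 'patsy' host whose OS uses a globally incrementing IP Identifier. The attacker sends the victim a SYN spoofed from the patsy's address. If the port is open, the victim answers the patsy with a SYN/ACK, and the patsy replies with an unsolicited RST that consumes one IP ID. If the port is closed, the victim sends the patsy a RST, which the patsy ignores. By probing the patsy before and after, the attacker sees an ID increment of 2 (rather than 1) exactly when the port is open. Choosing a different patsy for each port makes the scan very hard to detect.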
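The idle scan can be simulated without any networking. The sketch below assumes the patsy sends no other traffic during the scan, since any unrelated packet would also bump its counter. The class and function names are hypothetical.

```python
class Patsy:
    """Idle host with a single, globally incrementing 16-bit IP ID counter."""

    def __init__(self, initial_id: int = 1000):
        self.ip_id = initial_id

    def _send(self) -> int:
        sent_id = self.ip_id
        self.ip_id = (self.ip_id + 1) % 65536
        return sent_id

    def answer_probe(self) -> int:
        # The attacker's unsolicited SYN/ACK draws a RST whose IP ID the attacker reads.
        return self._send()

    def receive_synack(self) -> None:
        # Unexpected SYN/ACK from the victim: the patsy replies RST, consuming one ID.
        self._send()

    def receive_rst(self) -> None:
        # Incoming RSTs are dropped silently, so no ID is consumed.
        pass


class Victim:
    def __init__(self, open_ports):
        self.open_ports = set(open_ports)

    def receive_spoofed_syn(self, port: int, claimed_source: Patsy) -> None:
        # Replies go to the spoofed source address, i.e. to the patsy.
        if port in self.open_ports:
            claimed_source.receive_synack()
        else:
            claimed_source.receive_rst()


def idle_scan(patsy: Patsy, victim: Victim, port: int) -> bool:
    before = patsy.answer_probe()
    victim.receive_spoofed_syn(port, patsy)
    after = patsy.answer_probe()
    # A gap of 2 means the patsy emitted one extra packet: its RST to the victim.
    return (after - before) % 65536 == 2


if __name__ == "__main__":
    patsy, victim = Patsy(), Victim(open_ports={22, 80})
    print([p for p in (21, 22, 23, 80) if idle_scan(patsy, victim, p)])
```

The victim only ever sees packets from the patsy's address, which is what keeps the attacker's IP out of its logs.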
-
The user-level normalizer, norm, processes a realistic 100,000-packet trace (88% TCP) at approximately 101,000 pkts/sec (397 Mb/s) with all normalizations enabled on a $1,000 AMD Athlon 1.1 GHz PC, against a memory-copy-only baseline of 727,270 pkts/sec. The authors conclude that a kernel implementation could sustain a bidirectional 100 Mbps access link with enough headroom to weather high-speed small-packet flooding attacks.
-
Passive NIDS can be evaded via three fundamental classes of ambiguity: incomplete protocol analysis (none of the four commercial systems tested by Ptacek and Newsham in 1998 correctly reassembled IP fragments), divergent end-system behavior (different OS stacks resolve overlapping TCP retransmissions differently), and topology uncertainty (low-TTL packets may not reach the victim end-system, so the NIDS cannot determine which packets are delivered).
-
The paper derives a closed-form expression for the expected number of later blocks that link to the n-th block: with c=10 cross-links per block, there is a 55% probability that the 10^7th block in the system will have been linked by at least one subsequent legitimate block after 10^5 additional blocks are added. This quantifies the minimum corpus activity required before a publisher can safely announce a document and have plausible censor-resistance.
-
Dagster identifies every block by the cryptographic hash of its contents (block ID), making it infeasible for an adversary to pre-empt a name with bogus data — an attack that directly affects Publius, where an attacker who possesses a target document can insert garbage under the same name that the legitimate document would have occupied. Content-addressing also makes the system robust to the naming ambiguity observed in Freenet (where a single document was posted under three distinct capitalizations).
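A minimal content-addressed store in the spirit described above follows. SHA-256 is an assumed stand-in for whatever hash Dagster actually uses, and `BlockStore` with its methods is hypothetical. Since a block's ID is derived from its bytes, an adversary cannot claim a name in advance with bogus data.

```python
import hashlib


class BlockStore:
    """Content-addressed block store: a block's ID is the hash of its bytes."""

    def __init__(self):
        self._blocks = {}

    @staticmethod
    def block_id(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        bid = self.block_id(data)
        self._blocks[bid] = data  # idempotent: identical bytes always map to one ID
        return bid

    def put_claimed(self, claimed_id: str, data: bytes) -> str:
        # Refuse to store data under a name it does not hash to (blocks name pre-emption).
        if self.block_id(data) != claimed_id:
            raise ValueError("block ID does not match content hash")
        return self.put(data)

    def get(self, bid: str) -> bytes:
        data = self._blocks[bid]
        if self.block_id(data) != bid:  # self-verifying on every read
            raise ValueError("corrupted block")
        return data


if __name__ == "__main__":
    store = BlockStore()
    print(store.put(b"legitimate document"))
```

The same property rules out the Freenet-style ambiguity: differently capitalized copies of a document simply hash to different IDs instead of competing for one human-chosen name.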
-
Dagster achieves censorship resistance on a single server — without geographic replication — by cryptographically intertwining legitimate and illegitimate data into a directed acyclic graph. Each new block XORs the publisher's content with c pre-existing blocks before encrypting with a fresh key, so removing any one block destroys the decodability of every block that later links to it. This creates a legal constraint: a censor cannot excise a censorable block without simultaneously destroying an unknown number of legally protected blocks that depend on it.
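The intertwining step can be sketched as follows. It follows the order given above (XOR with c existing blocks, then encrypt under a fresh key). The block size, the SHA-256 counter-mode keystream standing in for the paper's cipher, and all names are hypothetical.

```python
import hashlib
import os
import random

BLOCK_SIZE = 32  # hypothetical fixed block size, in bytes


def keystream(key: bytes, length: int) -> bytes:
    # SHA-256 in counter mode as a stand-in stream cipher (not the paper's cipher).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def publish(store: dict, content: bytes, c: int, rng: random.Random):
    """Store one block that depends on c existing blocks; return its retrieval recipe."""
    if len(content) != BLOCK_SIZE:
        raise ValueError("content must be exactly one block")
    links = rng.sample(sorted(store), c)  # c pre-existing blocks, legitimate or not
    mixed = content
    for bid in links:
        mixed = xor(mixed, store[bid])
    key = os.urandom(16)  # fresh key per publication
    block = xor(mixed, keystream(key, BLOCK_SIZE))
    bid = hashlib.sha256(block).hexdigest()
    store[bid] = block
    return bid, links, key


def retrieve(store: dict, bid: str, links, key: bytes) -> bytes:
    mixed = xor(store[bid], keystream(key, BLOCK_SIZE))
    for link in links:
        mixed = xor(mixed, store[link])  # KeyError if a linked block was removed
    return mixed


if __name__ == "__main__":
    rng = random.Random(7)
    store = {}
    for _ in range(8):  # seed the server with random-looking blocks
        filler = os.urandom(BLOCK_SIZE)
        store[hashlib.sha256(filler).hexdigest()] = filler
    recipe = publish(store, b"A" * BLOCK_SIZE, c=3, rng=rng)
    print(retrieve(store, *recipe))
```

Deleting any entry of `links` makes `retrieve` fail for this block and, transitively, for every later block that linked to it. This is the dependency the legal argument relies on.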
-
Dagster requires both clients and servers to enforce a randomness predicate rand?(x) on every block before storage or forwarding, ensuring all server-stored data is statistically indistinguishable from uniform random noise. This provides server deniability — the operator can credibly deny knowledge of content — and also closes the attack present in Publius and Freenet where a malicious client could post plaintext, potentially exposing the operator for 'knowingly' hosting illegal content.
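The summary does not specify how rand?(x) is computed. As an assumed stand-in, the sketch applies a chi-square test to byte frequencies. The 1024-byte minimum and the acceptance bound of 400 are hypothetical; with 255 degrees of freedom the statistic has mean 255 and standard deviation about 22.6, so the bound rejects only grossly non-uniform data.

```python
import os
from collections import Counter

CHI2_LIMIT = 400.0  # hypothetical acceptance bound for 255 degrees of freedom
MIN_LEN = 1024      # hypothetical minimum block length for a meaningful test


def rand_predicate(block: bytes) -> bool:
    """Stand-in for Dagster's rand?(x): chi-square test of byte frequencies."""
    if len(block) < MIN_LEN:
        return False
    expected = len(block) / 256
    counts = Counter(block)
    chi2 = sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))
    return chi2 < CHI2_LIMIT


if __name__ == "__main__":
    print(rand_predicate(os.urandom(4096)))
    print(rand_predicate(b"attack at dawn " * 300))
```

Both clients and servers would apply the same check, so plaintext uploads are refused before any operator can be said to host them knowingly.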
-
Dagster's randomness predicate cannot distinguish legitimate random-looking blocks from adversarially generated filler, leaving the system vulnerable to storage-exhaustion denial-of-service: an attacker can submit arbitrarily many random blocks that pass the predicate, consuming server disk until legitimate publications are refused. The paper identifies anonymous digital cash (as proposed in the Eternity Service) or hash-cash proof-of-work as candidate mitigations but does not implement either.
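Since the paper names hash-cash as a candidate mitigation but does not implement it, the following is a generic hash-cash-style sketch, not Dagster's design. Each upload must carry a nonce such that SHA-256(nonce || block) begins with a chosen number of zero bits. The difficulty parameter and function names are hypothetical.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def mint(block: bytes, bits: int) -> int:
    """Find a nonce so that SHA-256(nonce || block) starts with `bits` zero bits."""
    for nonce in count():
        d = hashlib.sha256(nonce.to_bytes(8, "big") + block).digest()
        if leading_zero_bits(d) >= bits:
            return nonce


def verify(block: bytes, nonce: int, bits: int) -> bool:
    # One hash for the server versus about 2**bits expected hashes for the uploader.
    d = hashlib.sha256(nonce.to_bytes(8, "big") + block).digest()
    return leading_zero_bits(d) >= bits


if __name__ == "__main__":
    blk = b"\x8f" * 64
    n = mint(blk, 12)
    print(n, verify(blk, n, 12))
```

Binding the nonce to the block's bytes prevents an attacker from reusing one stamp across many filler blocks. Flooding the disk therefore costs roughly 2^bits hashes per block stored.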
-
Publius cryptographically binds the URL to both the document content and the key shares via name_i = wrap(H(M · share_i)), where · denotes concatenation. Any unauthorized modification to the stored encrypted file, a share, or the URL itself causes the tamper check to fail, preventing silent content substitution by a malicious server.
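A sketch of the name computation and tamper check follows. The folding definition of `wrap` (XOR of the digest's two halves) is an assumption, and SHA-256 stands in for H. The function names are hypothetical.

```python
import hashlib


def wrap(digest: bytes) -> bytes:
    # Assumed definition: fold the digest in half by XORing its two halves.
    half = len(digest) // 2
    return bytes(a ^ b for a, b in zip(digest[:half], digest[half:]))


def publius_name(document: bytes, share: bytes) -> bytes:
    # name_i = wrap(H(M . share_i)), with SHA-256 standing in for H.
    return wrap(hashlib.sha256(document + share).digest())


def tamper_check(document: bytes, share: bytes, url_names, i: int) -> bool:
    """Recompute name_i from the retrieved document and share; compare with the URL."""
    return publius_name(document, share) == url_names[i]


if __name__ == "__main__":
    M = b"<html>manifesto</html>"
    shares = [b"share-1", b"share-2", b"share-3"]
    url_names = [publius_name(M, s) for s in shares]
    print([tamper_check(M, s, url_names, i) for i, s in enumerate(shares)])
```

A server that swaps the document or its share produces a different `name_i`, so the client detects the substitution from the URL alone without trusting the server.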
-
A malicious server operator with write access can mount a redirection attack by inserting a fake update file pointing to adversary-controlled content. If the client retrieves only k shares and Mallory controls k collaborating servers, all k update URLs match and the client proxy follows the redirect. A 1-bit non-updatable flag in the Publius URL blocks this vector by instructing clients to ignore all update files.
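The client-side decision can be sketched as a small function. It follows the description above: follow an update only when every contacted server returns the same update URL, unless the non-updatable flag is set. Treating disagreeing or missing update files as "do not follow" is an assumption, and the function name is hypothetical.

```python
from typing import List, Optional


def resolve_update(update_urls: List[Optional[str]], non_updatable: bool) -> Optional[str]:
    """Decide whether a client proxy follows update files returned by k servers.

    update_urls[i] is the update URL served by the i-th contacted server, or None
    if that server holds no update file.
    """
    if non_updatable:
        return None  # flag set in the Publius URL: ignore all update files
    if update_urls and None not in update_urls and len(set(update_urls)) == 1:
        return update_urls[0]  # all k contacted servers agree, so the redirect is followed
    return None  # assumption: any disagreement means no redirect


if __name__ == "__main__":
    evil = "http://mallory.example/fake"
    print(resolve_update([evil] * 3, non_updatable=False))
    print(resolve_update([evil] * 3, non_updatable=True))
```

The first call shows why controlling all k contacted servers is enough for Mallory. The second shows that the 1-bit flag closes the hole regardless of how many servers collude.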
-
Publius's delete mechanism requires the publisher to supply H(server_domain · PW) per server rather than a bare password, preventing any single malicious server from learning the global password and deleting the document from all hosting servers. However, the paper acknowledges that an adversary who identifies the publisher can apply coercive ('rubber-hose') methods to obtain the URL and password directly from the author, bypassing all cryptographic protections.
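The per-server delete token can be sketched as follows. It assumes plain string concatenation of domain and password, with SHA-256 standing in for H; `PubliusServer` is a hypothetical minimal server. It shows that a token leaked by one malicious server is useless at any other.

```python
import hashlib
import hmac


def delete_token(server_domain: str, password: str) -> str:
    # H(server_domain . PW), with SHA-256 standing in for H.
    return hashlib.sha256((server_domain + password).encode()).hexdigest()


class PubliusServer:
    def __init__(self, domain: str):
        self.domain = domain
        self.docs = {}  # name -> (content, stored delete token)

    def store(self, name: str, content: bytes, token: str) -> None:
        self.docs[name] = (content, token)

    def delete(self, name: str, token: str) -> bool:
        # Constant-time comparison against the token registered at publication.
        if name in self.docs and hmac.compare_digest(self.docs[name][1], token):
            del self.docs[name]
            return True
        return False


if __name__ == "__main__":
    a, b = PubliusServer("a.example"), PubliusServer("b.example")
    for s in (a, b):
        s.store("doc", b"content", delete_token(s.domain, "correct horse"))
    print(b.delete("doc", a.docs["doc"][1]))  # server A's token replayed at B
```

Nothing here helps against the rubber-hose attack: whoever extracts the password itself can compute a valid token for every server.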
-
Publius provides source anonymity once content is published but offers no connection-based anonymity at upload time. A network-layer eavesdropper between the publisher and the servers, or a server's connection log, can reveal the publisher's IP address. The paper explicitly states that Publius must be combined with a mix-network or crowd-anonymity tool (e.g., Crowds, Onion Routing) to protect publisher identity during the upload phase.
-
Publius encrypts content under a symmetric key K, then splits K into n shares using Shamir secret sharing such that any k shares reconstruct K. Each server stores the encrypted document plus one share, so an adversary must corrupt or destroy n−k+1 servers to censor the document. Each unit increase in n, or decrease in k, raises that threshold by one.
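A compact Shamir split and reconstruct over a prime field follows. The field (the Mersenne prime 2^521 − 1), the parameters n = 5 and k = 3, and the function names are illustrative choices, not Publius's actual encoding.

```python
import secrets

PRIME = 2**521 - 1  # Mersenne prime, large enough to hold a symmetric key K


def split(secret: int, n: int, k: int):
    """Shamir-split `secret` into n points; any k of them reconstruct it."""
    if not (0 <= secret < PRIME and 1 <= k <= n):
        raise ValueError("bad parameters")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(shares) -> int:
    """Lagrange interpolation at x = 0 over GF(PRIME)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    K = secrets.randbits(128)
    pts = split(K, n=5, k=3)
    print(reconstruct(pts[:3]) == K, reconstruct([pts[1], pts[3], pts[4]]) == K)
```

With n = 5 and k = 3, a censor must take out n−k+1 = 3 servers. The 2 surviving shares are fewer than k and reveal nothing about K.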
-
The paper argues that any network IDS operating without complete, OS-specific per-connection state cannot reliably reconstruct the byte stream seen by the end-system. TCP and IP reassembly ambiguities leave unavoidable blind spots unless the IDS fully emulates each target's OS. This is a fundamental architectural limitation rather than an implementation bug, and it applies equally to any DPI-based censor.
-
IP-level fragment overlap attacks operate independently of TCP: crafting overlapping IP fragments whose reassembly by the IDS yields benign content while the end-system's reassembly yields the true payload. The paper demonstrates this is a separate attack surface from TCP-level evasion, exploitable below the transport layer before any TCP stream reconstruction begins.
-
Different operating systems apply different precedence rules when TCP segments overlap—some implementations use 'first data wins,' others 'last data wins.' An IDS applying a single universal reassembly policy will systematically diverge from the actual target end-system whenever overlapping segments appear, creating a predictable and repeatable evasion surface that is an inherent consequence of policy misalignment rather than a configuration flaw.
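The divergence can be shown with a toy reassembler parameterized by overlap policy. The segment contents are hypothetical, and the sketch assumes the segments leave no gaps in the stream. It does not claim which real stacks use which policy.

```python
from typing import List, Tuple

Segment = Tuple[int, bytes]  # (stream offset, payload)


def reassemble(segments: List[Segment], policy: str) -> bytes:
    """Rebuild a byte stream; `policy` is "first" or "last" data wins on overlap."""
    if policy not in ("first", "last"):
        raise ValueError("policy must be 'first' or 'last'")
    buf = {}
    for offset, data in segments:
        for i, byte in enumerate(data):
            pos = offset + i
            if policy == "first" and pos in buf:
                continue  # keep the byte that arrived earlier
            buf[pos] = byte  # "last": later data overwrites
    return bytes(buf[p] for p in sorted(buf))  # assumes no gaps in the stream


if __name__ == "__main__":
    segs = [(0, b"AT"), (2, b"XXXX"), (2, b"TACK")]
    print(reassemble(segs, "first"), reassemble(segs, "last"))
```

With this input, an IDS using "first" reconstructs `ATXXXX` while a "last" end-system delivers `ATTACK`. Because the attacker can flip the order of the overlapping segments, it can target whichever policy the IDS does not share.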
-
An 'evasion' attack exploits the mirror condition: the IDS drops a TCP segment that the end-system accepts, due to differences in overlap-resolution policy. The IDS reconstructs 'ATTCK' while the end-system sees 'ATTACK'; the missing segment carries the content that would trigger the signature, leaving the censor with an incomplete—and non-matching—view of the stream.
-
An 'insertion' attack sends TCP segments whose forged TTL values are high enough to reach the IDS/censor but low enough to expire before arriving at the true destination. The IDS incorporates the spurious segment into its reconstructed stream and sees 'ATXTACK'. The end-system never receives that segment and assembles the intended byte stream 'ATTACK', so signature-based content matching fails without disrupting delivery.
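A hop-count model makes the TTL condition concrete. It assumes a device `hops` routers away receives a packet iff its initial TTL is at least `hops`, and it concatenates payloads in send order, ignoring TCP sequence numbers. The hop distances and names are hypothetical.

```python
from typing import List, Tuple

Packet = Tuple[bytes, int]  # (payload, initial TTL)

CENSOR_HOPS = 3    # hypothetical distance to the censor's tap
RECEIVER_HOPS = 6  # hypothetical distance to the true destination


def seen_at(packets: List[Packet], hops: int) -> bytes:
    # Simplified model: reach a device `hops` away iff TTL >= hops; payloads in send order.
    return b"".join(payload for payload, ttl in packets if ttl >= hops)


# The inserted "X" has a TTL that is >= CENSOR_HOPS but < RECEIVER_HOPS.
stream = [(b"A", 64), (b"T", 64), (b"X", 4), (b"T", 64),
          (b"A", 64), (b"C", 64), (b"K", 64)]

if __name__ == "__main__":
    print(seen_at(stream, CENSOR_HOPS), seen_at(stream, RECEIVER_HOPS))
```

The attacker needs only a rough estimate of the two hop distances, for example from traceroute. Any TTL in the half-open interval [CENSOR_HOPS, RECEIVER_HOPS) works.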
-
Anderson establishes that anonymity and physical redundancy are substitutes: 'Anonymity enables us to reduce diversity.' Tamper-resistant hardware security modules controlling anonymized file servers ensure no identifiable group of people — including sysadmins — can locate or delete a specific file without breaking a quorum of hardware modules distributed across jurisdictions.
-
Using Byzantine-fault-tolerant protocols (specifically Rampart), seven replicas suffice to resist a conspiracy of any two malicious administrators or the accidental destruction of four systems with guaranteed complete recovery. Signing all files with a system key further ensures that a full recovery is possible as long as a single valid copy and an uncompromised public key survive.
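The replica count follows from the standard Byzantine agreement bound. Assuming Rampart shares it (tolerating f faulty members only when fewer than a third of the n members are faulty), two malicious administrators require:

```latex
n \ge 3f + 1, \qquad f = 2 \;\Longrightarrow\; n \ge 3 \cdot 2 + 1 = 7
```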
-
Effective censorship of a distributed service requires simultaneous enforcement across every jurisdiction hosting nodes. With no head office to coerce, a legal attack requires coordination across multiple independent legal systems — making successful suppression 'very expensive indeed — hopefully beyond even the resources of governments.' Local bans (e.g., country-level) do not affect nodes in other jurisdictions.
-
The Eternity Service's core design stores a file on 100 servers worldwide but retains records of only 10 for auditing, destroying the remaining 90 records. Even if a user is legally compelled to disclose all 10 known server locations and those copies are seized, 90 copies survive at unknown locations and can be retrieved via anonymous broadcast once the user leaves the jurisdiction.
-
Traffic analysis is identified as the primary threat to location secrecy in a distributed anonymous storage system: if an adversary can correlate inter-server communications or link requests to stored file locations, it can target physical seizure. The paper proposes mix-nets (Chaum 1981) for user-facing file delivery and dining-cryptographers ring protocols for inter-server communications, supplemented by traffic padding, so that even traffic analysis yields no actionable location information.
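A single dining-cryptographers round on a ring of servers can be sketched as follows. Each adjacent pair shares a fresh random pad, and every server announces the XOR of its two pads, with the sender also mixing in its message. The ring size, the message, and the function names are hypothetical, and collisions between simultaneous senders are not handled.

```python
import secrets
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def dc_net_round(num_parties: int, sender: int, message: bytes):
    """One dining-cryptographers round on a ring of num_parties servers."""
    size = len(message)
    # pads[i] is the secret shared by neighbours i and (i + 1) % num_parties.
    pads = [secrets.token_bytes(size) for _ in range(num_parties)]
    announcements = []
    for i in range(num_parties):
        # Party i combines the pad shared with its right neighbour and with its left one.
        out = xor(pads[i], pads[(i - 1) % num_parties])
        if i == sender:
            out = xor(out, message)
        announcements.append(out)
    return announcements


def combine(announcements):
    # Every pad appears in exactly two announcements, so only the message survives.
    return reduce(xor, announcements)


if __name__ == "__main__":
    ann = dc_net_round(5, sender=2, message=b"server 7 holds block 42")
    print(combine(ann))
```

Each announcement on its own is uniformly random, so an observer of inter-server traffic learns the message but not which server contributed it. Traffic padding then hides when rounds carry real content.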