2023-tran-crowdsourcing

Crowdsourcing the Discovery of Server-side Censorship Evasion Strategies

Nhi Tran, Kevin Bock, Dave Levin · Free and Open Communications on the Internet · 2023

Tags

censors: generic
techniques: measurement-platform

Findings extracted from this paper

The proposed crowdsourced system runs multiple isolated Geneva training pools on a controlled server, one pool per censorship system (initially China and Iran), and uses JavaScript to instruct volunteer browsers to send forbidden requests to isolated ports. Volunteers do not need to download or install any software. The server observes whether each strategy succeeds or fails and uses that signal to drive genetic evolution entirely from the server side.

§3 Design defense keyword-filtering dpi cn ir
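
The pool-per-censor arrangement in the finding above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `TrainingPool` class, the port numbers, the placeholder strategy labels, and the success-rate fitness function are all assumptions made for this example, and the real system evolves Geneva strategies rather than opaque strings.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TrainingPool:
    """One isolated pool per censor; each candidate strategy gets its own port."""
    censor: str
    base_port: int
    strategies: list
    # strategy -> [successes, trials]; hypothetical bookkeeping for this sketch
    results: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0]))

    def port_for(self, strategy):
        # The port a volunteer is sent to identifies the strategy under test.
        return self.base_port + self.strategies.index(strategy)

    def strategy_on(self, port):
        return self.strategies[port - self.base_port]

    def record(self, port, succeeded):
        # The server notes whether the forbidden request got through.
        entry = self.results[self.strategy_on(port)]
        entry[0] += int(succeeded)
        entry[1] += 1

    def fitness(self, strategy):
        successes, trials = self.results[strategy]
        return successes / trials if trials else 0.0

    def survivors(self, keep):
        # Selection step of a genetic algorithm: keep the fittest strategies.
        return sorted(self.strategies, key=self.fitness, reverse=True)[:keep]


pools = {
    "cn": TrainingPool("cn", 8000, ["strategy-a", "strategy-b", "strategy-c"]),
    "ir": TrainingPool("ir", 9000, ["strategy-a", "strategy-b", "strategy-c"]),
}

# Simulated volunteer reports: (censor, port, whether the request succeeded).
reports = [("cn", 8000, False), ("cn", 8001, True), ("cn", 8001, True),
           ("cn", 8002, False), ("ir", 9002, True)]
for censor, port, ok in reports:
    pools[censor].record(port, ok)

best_cn = pools["cn"].survivors(keep=1)
best_ir = pools["ir"].survivors(keep=1)
```

Keeping a separate results table per pool means outcomes observed for one censor never influence selection for another, which is the point of isolating the pools.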
Browsers cannot independently set the HTTP Host header or TLS SNI field, which rules out the standard censorship triggers used in Geneva training. The paper proposes two workarounds: (1) keyword-based HTTP triggers that place forbidden strings in URL parameters, which work only against censors that employ keyword filtering; and (2) registering domains whose names contain a censored substring, exploiting censors that overblock through overbroad regular expressions (e.g., a rule aimed at torproject.org that also matches mentorproject.org).

§3 Design — Triggering Censorship defense keyword-filtering sni-blocking dpi cn
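
Both workarounds in the finding above can be illustrated with a short sketch. The regular expressions are assumptions about how an overbroad rule and a stricter rule might be written, not rules recovered from any real censor. The hostname `training.example`, the parameter name `q`, and the `FORBIDDEN_KEYWORD` placeholder are likewise hypothetical.

```python
import re
from urllib.parse import urlencode

# Assumed overbroad rule: an unanchored match on the blocked domain string.
OVERBROAD_RULE = re.compile(r"torproject\.org")
# Assumed stricter rule: matches only the domain itself and its subdomains.
ANCHORED_RULE = re.compile(r"(^|\.)torproject\.org$")


def is_blocked(rule, hostname):
    return rule.search(hostname) is not None


def keyword_trigger_url(base_url, keyword):
    # Workaround (1): carry the forbidden string in a URL parameter, which a
    # browser can set even though it cannot rewrite the Host header.
    return f"{base_url}?{urlencode({'q': keyword})}"


# Workaround (2): a registered domain that merely contains the censored string.
overblocked = is_blocked(OVERBROAD_RULE, "mentorproject.org")
strictly_blocked = is_blocked(ANCHORED_RULE, "mentorproject.org")

trigger = keyword_trigger_url("http://training.example:8001/", "FORBIDDEN_KEYWORD")
```

The second workaround only helps against a censor whose rule behaves like the unanchored pattern. Under the anchored variant, the look-alike domain would not trigger censorship at all.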
The system is designed to protect crowdsourced volunteer privacy by storing only AS-level granularity alongside randomized short-lived client identifiers, explicitly discarding source IP addresses and any browser-identifying information. AS-level resolution is sufficient for server-side evasion because strategies are evolved per-censor-ASN rather than per-user.

§3 Design — Protecting Users defense measurement-platform generic
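
A minimal sketch of the data-minimization idea in the finding above, under stated assumptions: the prefix-to-ASN table is a hypothetical stand-in built from documentation address ranges and documentation AS numbers, the ten-minute identifier lifetime is invented for the example, and a fresh identifier is drawn per record for brevity rather than per browsing session.

```python
import ipaddress
import secrets
import time

# Hypothetical lookup table (documentation prefixes and documentation ASNs);
# a real deployment would consult a routing-derived dataset instead.
PREFIX_TO_ASN = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}

CLIENT_ID_TTL_SECONDS = 600  # assumed lifetime of the short-lived identifier


def lookup_asn(source_ip):
    addr = ipaddress.ip_address(source_ip)
    for prefix, asn in PREFIX_TO_ASN.items():
        if addr in prefix:
            return asn
    return None


def make_record(source_ip, strategy, succeeded, now=None):
    """Build the stored record; the IP is used for the ASN lookup, then dropped."""
    now = time.time() if now is None else now
    return {
        "asn": lookup_asn(source_ip),
        "client_id": secrets.token_hex(8),  # random, not derived from the IP
        "client_id_expires": now + CLIENT_ID_TTL_SECONDS,
        "strategy": strategy,
        "succeeded": succeeded,
    }


record = make_record("192.0.2.17", "strategy-b", True, now=1000.0)
```

Because the identifier is random rather than a hash of the address, the stored record cannot be reversed to recover the volunteer's IP.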
Server-side censorship evasion strategies require zero client-side changes: clients bypass censorship without installing software or even being aware of the evasion. This approach has been adopted in production tools, including Psiphon's packetman. The packet manipulations are applied entirely at the server during the three-way handshake and exploit weaknesses in how censors track or tear down TCP connections.

§1 Introduction defense dpi rst-injection middlebox-interference generic
All existing automated server-side strategy discovery tools (Geneva, Alembic, and SymTCP) require researchers to control a client during training, even when the discovered strategies are deployed exclusively server-side. This dependency makes it infeasible to train against censors in networks where researchers cannot place a controlled machine.

§1 Introduction evaluation dpi middlebox-interference generic