2010-burnett-chipping

Chipping Away at Censorship Firewalls with User-Generated Content

Sam Burnett, Nick Feamster, Santosh Vempala · USENIX Security Symposium · 2010


Tags

censors: generic
techniques: dpi keyword-filtering
defenses: steganography tunneling

findings extracted from this paper

Collage's threat model identifies the censor's two most dangerous capabilities as: (1) aggregate traffic-flow analysis (e.g., NetFlow statistics) to detect anomalous access patterns to specific content hosts, and (2) joining the system as a sender or receiver to discover where content is stored, then mounting denial-of-service attacks or attacks on user deniability. The censor is assumed to monitor all egress traffic. However, it is modeled as lacking the computational resources to analyze the joint statistical distribution of accesses across arbitrary pairs of users.

§3.1 detection traffic-shape ml-classifier dpi cn generic
Rateless erasure coding with ε=0.01 adds only a 0.5% storage and traffic overhead. Consistent hashing of message identifiers to task-database entries ensures that when 50% of tasks are replaced, sender and receiver still share at least one task if three or more tasks are mapped per identifier. At a 10× send rate, message recovery succeeds even if 90% of published vectors are blocked.

§4.2, §4.3, Figure 4, Figure 5a defense ip-blocking generic
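
The identifier-to-task mapping above can be sketched with a consistent-hash ring. This is a minimal illustration under stated assumptions, not Collage's implementation. The following are all hypothetical:
- the SHA-256 ring positions;
- the rule of taking the k tasks that follow the identifier clockwise;
- the names `tasks_for`, `task-N`, and `msg-N`.

Each party builds the ring from its own task database. Replacing some tasks therefore perturbs only identifiers whose neighboring ring positions changed, so sender and receiver often still agree on some of the k tasks.

```python
import bisect
import hashlib
import random


def _ring_position(value: str) -> int:
    # Hypothetical choice: first 8 bytes of SHA-256 as a 64-bit ring position.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def tasks_for(identifier: str, tasks: list[str], k: int = 3) -> list[str]:
    """Return the k distinct tasks that follow `identifier` clockwise on the ring."""
    ring = sorted((_ring_position(t), t) for t in set(tasks))
    if k > len(ring):
        raise ValueError("k exceeds the number of distinct tasks")
    positions = [pos for pos, _ in ring]
    # First ring slot strictly after the identifier's position (wrapping around).
    start = bisect.bisect_right(positions, _ring_position(identifier))
    return [ring[(start + i) % len(ring)][1] for i in range(k)]


if __name__ == "__main__":
    # Illustrative experiment: the receiver keeps half of the sender's tasks
    # and replaces the other half with new ones.
    random.seed(0)
    sender_db = [f"task-{i}" for i in range(100)]
    receiver_db = random.sample(sender_db, 50) + [f"new-task-{i}" for i in range(50)]
    ids = [f"msg-{i}" for i in range(1000)]
    shared = sum(
        bool(set(tasks_for(m, sender_db)) & set(tasks_for(m, receiver_db)))
        for m in ids
    )
    print(f"identifiers with at least one shared task: {shared / len(ids):.1%}")
```

Raising `k` makes a shared task more likely at the cost of more tasks per message. That is the intuition behind mapping several tasks to each identifier.
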
The paper argues that no single steganographic algorithm can provide both availability and deniability. Almost all production algorithms have been broken, and steganography alone does not hide the identities of the communicating parties. Collage addresses this by treating the embedding algorithm as a swappable component in a layered architecture (vector layer, message layer, application layer). As a result, compromise of the embedding scheme does not compromise the system, and stronger algorithms (e.g., digital watermarking) can be substituted as they mature.

§4.1, §2 defense dpi traffic-shape generic
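
The layering described above can be sketched as an interface boundary. The message layer depends only on an abstract embedder, so the steganographic scheme can be replaced without touching the layers above it. This is a hypothetical sketch: `Embedder`, `AppendEmbedder`, `MessageLayer`, and fixed-size chunking are illustrative names and simplifications, not Collage's API. In particular, `AppendEmbedder` is a toy stand-in that offers no secrecy at all.

```python
from abc import ABC, abstractmethod


class Embedder(ABC):
    """Vector-layer interface: hide bytes in a cover vector and recover them."""

    @abstractmethod
    def encode(self, cover: bytes, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, vector: bytes) -> bytes: ...


class AppendEmbedder(Embedder):
    """Toy placeholder: appends a length-prefixed payload. Provides NO secrecy."""

    def encode(self, cover: bytes, data: bytes) -> bytes:
        return cover + data + len(data).to_bytes(4, "big")

    def decode(self, vector: bytes) -> bytes:
        n = int.from_bytes(vector[-4:], "big")
        return vector[-4 - n:-4]


class MessageLayer:
    """Splits a message across cover vectors using whatever embedder it is given."""

    def __init__(self, embedder: Embedder, chunk_size: int = 4):
        self.embedder = embedder
        self.chunk_size = chunk_size

    def send(self, message: bytes, covers: list[bytes]) -> list[bytes]:
        chunks = [message[i:i + self.chunk_size]
                  for i in range(0, len(message), self.chunk_size)]
        if len(chunks) > len(covers):
            raise ValueError("not enough cover vectors")
        return [self.embedder.encode(c, d) for c, d in zip(covers, chunks)]

    def receive(self, vectors: list[bytes]) -> bytes:
        return b"".join(self.embedder.decode(v) for v in vectors)
```

A stronger embedder, such as one wrapping an image-domain steganography tool, could be passed to `MessageLayer` in place of `AppendEmbedder` without any other change. That substitution is the property the layered design is meant to preserve.
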
Production steganography tools achieve encoding rates of 0.01–0.05 (the fraction of cover-medium bytes available for hidden data). Because the overhead is the inverse of the encoding rate, storage, traffic, and transfer time increase 20–100× relative to the raw message. A 23 KB one-day news summary requires approximately 9 JPEG photos (~3 KB of data per photo, plus encoding overhead) and takes under 1 minute to retrieve over a fast connection. Over an unreliable broadband wireless link, the receiver obtained the same message in under 5 minutes, and sending took under 1 minute.

§5, Figure 5 evaluation traffic-shape generic
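
The overhead arithmetic above can be checked directly. The function names below are hypothetical. `vectors_needed` deliberately ignores erasure-coding and encoding overhead, so it gives a lower bound on the photo count.

```python
import math


def expansion_factor(encoding_rate: float) -> float:
    """Bytes of cover medium needed per byte of hidden data (1 / rate)."""
    if not 0 < encoding_rate <= 1:
        raise ValueError("encoding rate must be in (0, 1]")
    return 1 / encoding_rate


def vectors_needed(message_kb: float, payload_kb_per_vector: float) -> int:
    """Minimum number of cover vectors, ignoring encoding overhead."""
    return math.ceil(message_kb / payload_kb_per_vector)


for rate in (0.05, 0.01):
    print(f"rate {rate}: {expansion_factor(rate):.0f}x cover bytes")
print(vectors_needed(23, 3))  # 8
```

The bound of 8 photos for a 23 KB message at ~3 KB per photo sits just below the ~9 photos reported once encoding overhead is included.
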
Collage uses platform-scale user-generated content as a covert-channel substrate. As of 2009, Flickr hosted 3.6 billion images with 6 million new per day, and Twitter carried ~500K tweets/day. The censor cannot block all UGC platforms simultaneously without removing massive amounts of legitimate content. The system therefore achieves availability and user deniability that fixed-infrastructure proxies (e.g., Tor relays) cannot: accessing Flickr or Twitter does not, by itself, implicate the user as someone using a circumvention tool.

§1, §4.1 defense ip-blocking dpi cn generic