2017-deng-random

The Random Forest based Detection of Shadowsock's Traffic

Ziye Deng, Zihan Liu, Zhouguo Chen, Yubin Guo · Intelligent Human-Machine Systems and Cybernetics · 2017


Tags

censors: cn
techniques: ml-classifier
defenses: shadowsocks

findings extracted from this paper

The classifier uses a 3,000-dimension binary vector recording which upstream and downstream packet sizes appear across the full session, combined with aggregate biflow statistics (total packets, burst length, transmission time, incoming/outgoing fractions). This packet-size presence vector is the highest-dimensionality feature in the set.

§IV.B, Table 1 detection ml-classifier traffic-shape cn
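
The packet-size feature described above can be sketched as follows. This is a minimal illustration, not the authors' code: the even split into 1,500 upstream and 1,500 downstream slots (one per byte size up to a typical Ethernet MTU), the `Packet` type, and both function names are assumptions made for this example. Only a subset of the aggregate statistics is shown.

```python
from dataclasses import dataclass

# Assumption: 1,500 size slots per direction, giving 3,000 dimensions in total.
MAX_SIZE = 1500


@dataclass
class Packet:
    size: int       # packet size in bytes
    upstream: bool  # True for client-to-server, False for server-to-client


def size_presence_vector(packets):
    """Binary vector laid out as [upstream sizes | downstream sizes].

    A slot is 1 if at least one packet of that size and direction
    appeared in the session; repeat occurrences are not counted.
    """
    vec = [0] * (2 * MAX_SIZE)
    for p in packets:
        if 1 <= p.size <= MAX_SIZE:
            offset = 0 if p.upstream else MAX_SIZE
            vec[offset + p.size - 1] = 1
    return vec


def aggregate_stats(packets):
    """A few of the aggregate biflow statistics (hypothetical subset)."""
    total = len(packets)
    outgoing = sum(1 for p in packets if p.upstream)
    return {
        "total_packets": total,
        "outgoing_fraction": outgoing / total if total else 0.0,
        "incoming_fraction": (total - outgoing) / total if total else 0.0,
    }


# Made-up session: three upstream packets, two downstream packets.
flow = [
    Packet(60, True),
    Packet(1448, False),
    Packet(60, True),
    Packet(1448, False),
    Packet(517, True),
]
vec = size_presence_vector(flow)
stats = aggregate_stats(flow)
```

Because the vector records presence rather than counts, the two 60-byte upstream packets set a single slot.
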
The authors trained on 1 GB of captured Shadowsocks traffic and 1 GB of non-Shadowsocks traffic from a single host, then tested on over 1 GB of each from 26 randomly selected hosts. This demonstrates that the model generalizes across hosts, but the paper reports no explicit false-positive or false-negative rates.

§V.B–C evaluation ml-classifier cn
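
To show why the missing error rates matter, the sketch below computes accuracy, false-positive rate, and false-negative rate from confusion-matrix counts. The counts are invented for illustration and do not come from the paper; `error_rates` is a hypothetical helper.

```python
def error_rates(tp, fp, tn, fn):
    """Accuracy, false-positive rate, and false-negative rate from raw counts.

    "Positive" here means a flow classified as Shadowsocks.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }


# Two hypothetical classifiers, each evaluated on 1,000 Shadowsocks flows
# and 1,000 other flows. These numbers are NOT results from the paper.
balanced = error_rates(tp=850, fp=150, tn=850, fn=150)
skewed = error_rates(tp=700, fp=0, tn=1000, fn=300)
```

Both hypothetical classifiers reach the same accuracy, yet one never flags ordinary traffic while the other mislabels a noticeable share of it. An accuracy figure alone cannot distinguish these cases.
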
A Random Forest classifier with 100 CART trees and a sqrt(C) feature-selection strategy achieves over 85% accuracy detecting Shadowsocks traffic from biflow statistics. Accuracy increases monotonically with train-set and test-set size before plateauing.

§V, Abstract detection ml-classifier dpi cn
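
A comparable configuration can be sketched with scikit-learn, assuming that library is an acceptable stand-in; the entry does not say which implementation the authors used. The data below is synthetic (random binary features with an arbitrary label rule) and only exercises the configuration, so the resulting accuracy says nothing about Shadowsocks detection.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 400 "flows" with 64 binary features each.
rng = np.random.default_rng(0)
n_flows, n_features = 400, 64
X = rng.integers(0, 2, size=(n_flows, n_features)).astype(float)

# Arbitrary illustrative label: 1 if at least two of the first three features are set.
y = ((X[:, 0] + X[:, 1] + X[:, 2]) >= 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 100 trees, each split choosing among sqrt(number of features) candidates.
# scikit-learn's trees are an optimized variant of CART; Gini impurity is its default.
clf = RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_features="sqrt",
    random_state=0,
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
```

With 64 features, each split considers 8 candidates. Applying the same rule to a feature set of roughly 3,000 dimensions would give about 54 candidates per split.
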
Shadowsocks traffic appears as ordinary TCP with no payload keywords or obvious protocol markers because the entire payload is encrypted; firewalls cannot distinguish it from generic TLS without behavioral flow analysis. This makes signature- and keyword-based detection ineffective against it.

§III.A detection dpi keyword-filtering cn
The paper notes that Shadowsocks can also serve as a transport layer for Tor and VPN connections, so a Shadowsocks flow detector functions as a first-stage classifier that unmasks compounded anonymity systems. The authors explicitly cite this as a motivation for detection.

§II detection ml-classifier dpi cn