2017-deng-random
findings extracted from this paper
-
The classifier uses a 3,000-dimensional binary vector recording which upstream and downstream packet sizes appear across the full session, combined with aggregate biflow statistics (total packets, burst length, transmission time, incoming/outgoing fractions). This packet-size presence vector is the highest-dimensional feature in the set.
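A minimal sketch of how such a feature vector could be assembled for one session. The layout is an assumption, not taken from the paper: the 3,000 slots are split into 1,500 upstream and 1,500 downstream sizes (1 to 1,500 bytes), and "burst length" is taken to mean the longest run of consecutive packets in one direction. The `Packet` type and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

MAX_SIZE = 1500            # ASSUMPTION: per-direction size range (typical Ethernet MTU)
VECTOR_DIM = 2 * MAX_SIZE  # 3,000 slots: upstream block first, then downstream block


@dataclass
class Packet:
    size: int         # packet size in bytes
    upstream: bool    # True for client -> server
    timestamp: float  # seconds since some reference point


def size_presence_vector(packets: Sequence[Packet]) -> List[int]:
    """Binary vector: a slot is 1 if that (direction, size) pair occurs in the session."""
    vec = [0] * VECTOR_DIM
    for p in packets:
        size = min(max(p.size, 1), MAX_SIZE)  # clamp into 1..MAX_SIZE
        offset = 0 if p.upstream else MAX_SIZE
        vec[offset + size - 1] = 1
    return vec


def aggregate_stats(packets: Sequence[Packet]) -> List[float]:
    """[total packets, longest burst, transmission time, outgoing fraction, incoming fraction]."""
    total = len(packets)
    if total == 0:
        return [0, 0, 0.0, 0.0, 0.0]
    # ASSUMPTION: a burst is a run of consecutive packets in the same direction.
    longest = current = 1
    for prev, cur in zip(packets, packets[1:]):
        current = current + 1 if cur.upstream == prev.upstream else 1
        longest = max(longest, current)
    times = [p.timestamp for p in packets]
    outgoing = sum(1 for p in packets if p.upstream)
    return [total, longest, max(times) - min(times),
            outgoing / total, (total - outgoing) / total]


# Toy session, not data from the paper.
session = [
    Packet(120, True, 0.0),
    Packet(1500, False, 0.05),
    Packet(120, True, 0.10),
    Packet(64, True, 0.2),
]
presence = size_presence_vector(session)
stats = aggregate_stats(session)
features = presence + stats  # one row of the classifier's input matrix
```

Because the vector only records presence, the repeated 120-byte upstream packet sets a single slot; a count-based histogram would instead record 2 there.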
-
The authors trained on 1 GB of captured Shadowsocks traffic and 1 GB of non-Shadowsocks traffic from a single host, then tested on over 1 GB of each from 26 randomly selected hosts. This demonstrates that the model generalizes across hosts, but the paper reports no explicit false-positive or false-negative rates.
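For reference, the two rates the paper leaves out can be computed from per-flow labels and predictions as sketched below. The helper name `error_rates` and the toy labels are illustrative only; none of the numbers come from the paper.

```python
from typing import Sequence, Tuple


def error_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate); label 1 means Shadowsocks."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0  # benign flows flagged as Shadowsocks
    fnr = fn / positives if positives else 0.0  # Shadowsocks flows that were missed
    return fpr, fnr


# Toy labels, not results from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
fpr, fnr = error_rates(y_true, y_pred)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```

A single accuracy figure cannot be decomposed into these two rates without the underlying confusion counts, which is why their absence limits how the reported result can be interpreted.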
-
A Random Forest classifier with 100 CART trees and a sqrt(C) feature-selection strategy achieves over 85% accuracy in detecting Shadowsocks traffic from biflow statistics. Accuracy increases monotonically with training-set and test-set size before plateauing.
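A comparable configuration can be sketched with scikit-learn, whose decision trees are an optimized CART variant. This is not the authors' implementation: the data below is synthetic, the label rule and matrix sizes are placeholders, and `max_features="sqrt"` is assumed to correspond to the paper's sqrt(C) strategy, with C read as the number of features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for biflow feature rows; NOT data from the paper.
rng = np.random.default_rng(0)
n_flows, n_features = 400, 64  # placeholder sizes
X = rng.integers(0, 2, size=(n_flows, n_features)).astype(float)
y = (X[:, :8].sum(axis=1) > 4).astype(int)  # arbitrary synthetic label rule

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(
    n_estimators=100,     # 100 trees, as in the paper
    max_features="sqrt",  # each split considers about sqrt(n_features) candidates
    random_state=0,
)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy reflects only the synthetic label rule and says nothing about Shadowsocks detection performance.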
-
Shadowsocks traffic appears as ordinary TCP with no payload keywords or obvious protocol markers because the entire payload is encrypted; firewalls cannot distinguish it from generic TLS without behavioral flow analysis. This makes signature- and keyword-based detection ineffective against it.
-
The paper notes that Shadowsocks can also serve as a transport layer for Tor and VPN connections, so a Shadowsocks flow detector can act as a first-stage classifier that unmasks compounded anonymity systems. The authors explicitly cite this as a motivation for detection.