2016-kohls-skypeline
findings extracted from this paper
-
χ² homogeneity tests on 70 audio signal pairs show that at SNR ≥ 25 dB a statistical test fails to distinguish modulated from original signals in 77.13% of cases (i.e., discrimination succeeds in fewer than 23% of cases). Crucially, this analysis requires access to the original unmodulated signal; for live voice transmissions no such pairing is feasible for the censor, so statistical detection is unrealizable in practice.
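
As a rough illustration of the kind of test involved, the sketch below computes a χ² homogeneity statistic between the amplitude histograms of an original and a modulated signal. The binning, the sample range of [-1, 1], and the function names `histogram` and `chi2_homogeneity` are assumptions made for illustration; the paper's exact procedure is not given in these notes.

```python
def histogram(samples, n_bins, lo=-1.0, hi=1.0):
    """Count samples into n_bins equal-width bins over [lo, hi] (assumed range)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for s in samples:
        idx = int((s - lo) / width)
        idx = max(0, min(idx, n_bins - 1))  # clamp out-of-range samples
        counts[idx] += 1
    return counts


def chi2_homogeneity(counts_a, counts_b):
    """Return (statistic, degrees_of_freedom) for a 2 x k contingency table."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    used_bins = 0
    for a, b in zip(counts_a, counts_b):
        col = a + b
        if col == 0:
            continue  # bins empty in both signals carry no information
        used_bins += 1
        exp_a = total_a * col / grand  # expected count if both share one distribution
        exp_b = total_b * col / grand
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat, used_bins - 1
```

The statistic is compared against a χ² critical value for the returned degrees of freedom (for example 16.919 for 9 degrees of freedom at α = 0.05); a value below the threshold means the test cannot tell the two signals apart. Note that the function needs both histograms, which mirrors the requirement for the original signal.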
-
The paper's threat model explicitly assumes that censors can enforce client-side VoIP software (e.g., TOM-Skype in China), giving the adversary access to the pre-encoding audio signal at both endpoints. Despite this, SkypeLine forces the censor into an all-or-nothing position: stopping the hidden data transfer requires blocking the entire VoIP service, since no network-layer observable (packet headers, timing, encrypted payload) distinguishes steganographic from legitimate calls.
-
SkypeLine's m-ary modulation (Mode B, using 128-bit Hadamard sequences) achieves a peak data rate of 2,407 bps, roughly two orders of magnitude above FHSS-based DSSS (Takahashi et al., 20.5 bps; quoted as a 12,035% improvement) and phase-coding techniques (Nutzinger et al., 12.5 bps; quoted as 19,256%). Four-layer parallel binary modulation (Mode A, Quattro) achieves a peak of 224 bps and a mean of 106.61 bps at ≥99% reconstruction accuracy.
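
To make the m-ary idea concrete, here is a minimal sketch under stated assumptions: each symbol selects one row of a 128 x 128 Hadamard matrix (so one sequence carries log2(128) = 7 bits), the row is added to the carrier at low gain, and the receiver picks the row with the highest correlation. The Sylvester construction, the additive embedding, and the names `hadamard`, `embed`, and `detect` are illustrative choices, not necessarily the paper's implementation.

```python
def hadamard(n):
    """Build an n x n Hadamard matrix (Sylvester construction; n a power of two)."""
    h = [[1]]
    while len(h) < n:
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def embed(carrier, symbol, h, gain):
    """Add the Hadamard row chosen by `symbol` to the carrier, scaled by `gain`."""
    return [c + gain * s for c, s in zip(carrier, h[symbol])]


def detect(received, h):
    """Return the index of the row that correlates most strongly with `received`."""
    scores = [sum(r * s for r, s in zip(received, row)) for row in h]
    return max(range(len(h)), key=lambda i: scores[i])
```

Because the rows are mutually orthogonal, the correct row scores `gain * n` plus a carrier term, while every other row sees only the carrier term. That orthogonality is what lets a single sequence carry several bits instead of one.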
-
A Skype prototype operating under real-world conditions achieves 64 bps (white Gaussian noise, no ECC) at ≥99% reconstruction accuracy and ≥23 dB SNR. With Opus/SILK encoding (vector quantization), throughput is constrained to approximately 72 bps at two modulation layers; additional layers fail to satisfy the 99% accuracy bound because the VQ codec's noise reduction filters out the embedded pseudo-noise sequences.
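
The SNR thresholds in these findings relate the power of the carrier audio to the power of the embedded sequence. A minimal sketch, assuming the standard definition 10·log10(P_signal / P_noise) and treating the sample-wise difference between the modulated and original signals as the noise; the function name `snr_db` is hypothetical.

```python
import math


def snr_db(original, modulated):
    """SNR in dB, with the embedded sequence (modulated - original) as the noise."""
    signal_power = sum(x * x for x in original) / len(original)
    noise = [m - x for m, x in zip(modulated, original)]
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        return math.inf  # nothing embedded
    return 10.0 * math.log10(signal_power / noise_power)
```

Under this definition, an embedded sequence with one tenth of the carrier's amplitude sits at 20 dB, so the ≥23 dB bound corresponds to an embedding amplitude of roughly 7% of the carrier's.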
-
Wireshark captures of Skype traffic with and without hidden information at inaudible SNR show no statistically significant differences in inter-arrival times (mean IAT 0.019 s in all conditions). Mean packet length differs by only 2.6% (130.34 bytes unmodulated vs. 126.98 bytes at inaudible SNR), which is well within one standard deviation (SD ≈ 12–14 bytes) and insufficient for reliable content-mismatch detection.
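
The packet-length comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and takes the lower end of the reported SD range (12 bytes) as a conservative assumption; the variable names are illustrative.

```python
mean_unmodulated = 130.34  # bytes, from the capture without hidden data
mean_modulated = 126.98    # bytes, at inaudible SNR
sd_lower_bound = 12.0      # bytes, lower end of the quoted SD range (assumption)

# Absolute and relative shift of the mean packet length.
abs_diff = mean_unmodulated - mean_modulated
rel_diff_percent = 100.0 * abs_diff / mean_unmodulated

# Fraction of one standard deviation that the shift represents.
shift_in_sd = abs_diff / sd_lower_bound
```

The shift is about 3.36 bytes, or roughly 0.28 of one standard deviation, so individual packet lengths from the two conditions overlap almost completely.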