2026-yan-efficient-provably-secure
Efficient Provably Secure Linguistic Steganography via Range Coding
arXiv: 2604.08052
findings extracted from this paper
The vanilla range-coding baseline suffers from two provable-security failures that RRC corrects: (1) distortion of the token probability distribution, because interval boundaries do not align with token probabilities, and (2) randomness reuse across sampling steps, which exposes detectable statistical bias. The rotation mechanism addresses both by introducing fresh PRNG-seeded randomness o ~ U(0,1) at each step and applying a modulo rotation to the decimal state.
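A minimal sketch of that step, assuming the range-coding state is a decimal in [0,1) and tokens are drawn by inverse-CDF sampling; the function names and state handling here are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def rotate(state: float, o: float) -> float:
    # Modulo rotation: if o ~ U(0,1) is fresh at every step, the
    # rotated state is U(0,1) regardless of any bias in `state`,
    # removing the randomness-reuse failure of the baseline.
    return (state + o) % 1.0

def sample_token(probs: np.ndarray, u: float) -> int:
    # Inverse-CDF sampling: feeding a U(0,1) value through the token
    # CDF reproduces the LM distribution exactly, so the baseline's
    # interval-misalignment distortion vanishes.
    cdf = np.cumsum(probs)
    return min(int(np.searchsorted(cdf, u, side="right")), len(probs) - 1)

probs = np.array([0.6, 0.3, 0.1])                # toy next-token distribution
o = np.random.default_rng(42).random()           # fresh PRNG-seeded o ~ U(0,1)
token = sample_token(probs, rotate(0.125, o))    # 0.125 stands in for a message-derived state
```

Because (state + o) mod 1 is uniform whenever o is, each step's sample is statistically clean even though the state itself encodes message bits.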
Rotation Range-Coding (RRC) steganography achieves approximately 100% entropy utilization (99.98% on GPT-2) while maintaining zero KL divergence at every generative step, outperforming all prior provably-secure baselines: SparSamp at 96.76%, Discop w/ sort at 95.17%, and iMEC at 71.44%. At each step, the rotation mechanism transforms the discrete uniform random variable into a continuous uniform one, preserving the original LM probability distribution exactly.
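A quick numerical illustration of that discrete-to-continuous transformation; the toy next-token distribution and the 8-bit state width are assumptions for the demo, not figures from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.5, 0.3, 0.15, 0.05])     # toy next-token distribution

# Discrete uniform states k / 2^b, as message bits would induce (b = 8).
k = rng.integers(0, 2**8, size=200_000)
state = k / 2**8

# Rotation with fresh o ~ U(0,1): (state + o) mod 1 is continuous U(0,1).
rotated = (state + rng.random(state.size)) % 1.0

# Inverse-CDF sampling; the clip guards a float rounding edge case.
tokens = np.searchsorted(np.cumsum(probs), rotated).clip(max=probs.size - 1)
print(np.bincount(tokens, minlength=probs.size) / tokens.size)
# ~ [0.5, 0.3, 0.15, 0.05]: empirical frequencies match probs (zero KL)
```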
RRC steganography achieves an embedding speed of up to 1554.66 bits/s on GPT-2, the fastest among all provably-secure methods tested, and sustains approximately 1500 bits/s across message lengths from 64 to 1024 bits. Embedding a 128-bit secret message takes 0.082 seconds, and the method scales to at least 8192-bit messages (13.5 seconds, 99.99% entropy utilization) with no upper bound on message length.
Fine-tuned BERT and RoBERTa steganalysis discriminators achieve only 47.8–50.6% detection accuracy across GPT-2, OPT-1.3B, and Llama-2-7B stegotext, which is indistinguishable from random guessing. Human evaluators perform similarly poorly (46.6–50.6% accuracy, F1 ≤ 51.5%), and the paper notes that statistical classifiers already outperform humans on this discrimination task.
RRC steganography is training-free, model-agnostic, and plug-and-play: it requires no modification to the underlying language model and was validated on GPT-2, OPT-1.3B, and Llama-2-7B. The symmetric-key design requires only that sender and receiver agree on a shared PRNG seed and the secret message length l before communication.
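A minimal sketch of that shared-seed agreement, with NumPy's generator standing in for whatever PRNG the paper uses (a real deployment would want a cryptographic PRNG); the constant names are hypothetical:

```python
import numpy as np

SHARED_SEED = 0xC0FFEE   # symmetric key, agreed out of band (hypothetical value)
MESSAGE_LEN = 128        # l, also agreed before communication

def per_step_uniforms(seed: int, n_steps: int) -> np.ndarray:
    # Sender and receiver seed identical generators, so the receiver
    # reproduces every o_t ~ U(0,1) and can undo each rotation.
    return np.random.default_rng(seed).random(n_steps)

assert np.array_equal(per_step_uniforms(SHARED_SEED, 16),
                      per_step_uniforms(SHARED_SEED, 16))
```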