FINDING · EVALUATION
Eliminating self-supervised pretraining reduces total wall-clock training time by an estimated 3-15× relative to ET-BERT, YaTC, and NetMamba, while achieving comparable or superior accuracy. Pretraining in representative baselines typically consumes 10-100× as much compute as downstream fine-tuning; removing it also eliminates the risk of negative transfer from mismatched pretraining corpora under concept drift.
From 2026-kulatilleke-mambanetburst-direct-byte-level — MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining · §VI-F · 2026 · arXiv preprint
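A back-of-envelope reading of these figures: expressed in units of the baseline's fine-tuning cost, a pretrain-then-fine-tune pipeline costs (1 + k) units when pretraining is k× the fine-tuning compute, while a pretraining-free model costs r units for its single training run. The sketch below only illustrates that arithmetic; k reflects the 10-100× range cited above, and r is a hypothetical value, not a measurement from the paper.

```python
# Back-of-envelope check of the wall-clock claim above. All quantities are in units
# of the baseline's fine-tuning cost T_ft; k and r are illustrative assumptions,
# not measurements reported in the paper.

def end_to_end_speedup(k: float, r: float) -> float:
    """Speedup of a pretraining-free model over a pretrain-then-fine-tune baseline.

    k: pretraining compute as a multiple of fine-tuning compute (finding cites 10-100).
    r: pretraining-free model's full training cost as a multiple of the baseline's
       fine-tuning cost (hypothetical).
    """
    baseline_total = 1.0 + k  # fine-tuning plus pretraining
    return baseline_total / r


if __name__ == "__main__":
    for k in (10.0, 30.0, 100.0):
        for r in (2.0, 7.0):
            print(f"pretrain/fine-tune ratio k={k:>5.0f}, relative cost r={r:.0f}: "
                  f"~{end_to_end_speedup(k, r):.0f}x faster end to end")
```

Under this toy model, a 3-15× end-to-end reduction alongside a 10-100× pretraining overhead implies r > 1, i.e. the byte-level model's full training run costs more than the baselines' fine-tuning stage alone; that reconciliation is an inference, not a figure reported in §VI-F.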
Implications
- Rapid classifier retraining (hours rather than days) becomes feasible for a well-resourced censor, so circumvention tools cannot rely on a slow adversarial adaptation cycle; transport-layer obfuscation must be adaptable on a similar timescale.
- Concept-drift attacks (deliberately shifting protocol byte distributions over time to confuse classifiers trained on stale data) are less effective against pretraining-free models that can be cheaply retrained on fresh traffic samples.
Tags
Extracted by claude-sonnet-4-6 — review before relying.