FINDING · EVALUATION

Eliminating self-supervised pretraining reduces total wall-clock training time by an estimated factor of 3-15 relative to ET-BERT, YaTC, and NetMamba, while achieving comparable or superior accuracy. In these representative baselines, pretraining typically consumes 10-100× more compute than downstream fine-tuning; dropping it also avoids the risk of negative transfer from a mismatched pretraining corpus under concept drift.
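
A minimal back-of-envelope sketch of why the wall-clock claim is plausible. The per-run hours below are illustrative assumptions, not figures from the paper; only the 10-100× pretraining/fine-tuning compute ratio and the 3-15× overall estimate come from the finding above. The resulting factor depends entirely on the assumed cost of training from scratch relative to one fine-tuning run.

```python
def baseline_hours(pretrain_h: float, finetune_h: float) -> float:
    """Total wall-clock time for a pretrain-then-fine-tune baseline."""
    return pretrain_h + finetune_h

finetune_h = 2.0             # assumed fine-tuning cost of a baseline (hours)
scratch_h = 5 * finetune_h   # assumed cost of training from scratch, no pretraining

for ratio in (10, 100):      # pretraining assumed to cost 10-100x fine-tuning
    total = baseline_hours(pretrain_h=ratio * finetune_h, finetune_h=finetune_h)
    print(f"{ratio:>3}x pretraining: baseline {total:5.0f} h, "
          f"from scratch {scratch_h:.0f} h, speedup {total / scratch_h:.1f}x")
```

Under these assumptions the speedup ranges from roughly 2× to 20×, bracketing the paper's 3-15× estimate: removing a pretraining stage that dominates total compute yields a multi-fold reduction even when from-scratch training costs several fine-tuning runs.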

From 2026-kulatilleke-mambanetburst-direct-byte-level · MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining · §VI-F · 2026 · arXiv preprint

Implications

Tags

censors
generic
techniques
ml-classifier

Extracted by claude-sonnet-4-6 — review before relying.