FINDING · EVALUATION

Supervised byte-level training without pre-training cuts training cost by an estimated 3–15× in wall-clock time and 2–4× in training memory footprint relative to pre-trained baselines (ET-BERT, YaTC, NetMamba), while matching or exceeding their classification F1 across six benchmarks spanning encrypted app identification, VPN/Tor, malware, and IoT attack traffic.
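
To make the setting concrete, the sketch below illustrates what supervised byte-level classification without tokenization or pre-training amounts to: raw packet bytes over the fixed 0–255 alphabet are embedded directly and the model is trained from scratch with an ordinary cross-entropy objective. All names and layer sizes are assumptions for illustration; this is not the paper's MambaNetBurst architecture, and its Mamba-style encoder is stood in for by a GRU purely to keep the example runnable.

```python
# Minimal sketch of direct byte-level supervised classification (assumed
# hyperparameters; a stand-in encoder, not the paper's model).
import torch
import torch.nn as nn

class ByteLevelClassifier(nn.Module):
    def __init__(self, num_classes: int, d_model: int = 128):
        super().__init__()
        # One embedding per byte value: the "vocabulary" is fixed at 256,
        # so no tokenizer or learned vocabulary is required.
        self.byte_embed = nn.Embedding(256, d_model)
        # Stand-in sequence encoder (GRU) in place of a Mamba-style
        # state-space model.
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, byte_seq: torch.Tensor) -> torch.Tensor:
        # byte_seq: (batch, seq_len) int64 tensor of raw byte values 0-255.
        x = self.byte_embed(byte_seq)
        _, h = self.encoder(x)          # final hidden state summarizes the flow
        return self.head(h.squeeze(0))  # (batch, num_classes) logits

# Supervised training directly on labeled traffic, with no pre-training stage.
model = ByteLevelClassifier(num_classes=6)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

bytes_batch = torch.randint(0, 256, (8, 1024))   # 8 flows, 1024 bytes each
labels = torch.randint(0, 6, (8,))               # placeholder class labels
loss = loss_fn(model(bytes_batch), labels)
loss.backward()
optimizer.step()
```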

From 2026-kulatilleke-mambanetburst-direct-byte-level · MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining · §VI-F, §V-A · 2026 · arXiv preprint

Implications

Tags

censors
generic
techniques
ml-classifier

Extracted by claude-sonnet-4-6 — review before relying.