FINDING · EVALUATION

Mamba-2 (2.5M parameters) is Pareto-optimal on the accuracy-versus-inference-time frontier: it achieves an average macro-F1 of 0.9909, with backward passes 30-60% faster than Mamba-1 and inference 2-3× faster than linear Transformers using FlashAttention-2 at medium-to-large batch sizes on a single RTX 3090. Memory usage is 2-4× lower than Transformer-based counterparts, enabling single-GPU operation at sequence length 1600.
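The Pareto-optimality claim means no other model is simultaneously at least as accurate and at least as fast. A minimal sketch of that check is below; the numbers for the comparison models are illustrative placeholders, not measurements from the paper (only the Mamba-2 macro-F1 of 0.9909 appears in the finding), and the model names other than Mamba-2 are hypothetical labels.

```python
# Sketch: finding Pareto-optimal models on an accuracy-vs-latency frontier.
# Values are (macro-F1, relative inference time). Only Mamba-2's F1 is from
# the finding; the rest are hypothetical, chosen only to reflect the
# reported direction (Mamba-2 faster and more accurate).
models = {
    "Mamba-2": (0.9909, 1.0),
    "Mamba-1": (0.9880, 1.5),                # hypothetical
    "LinearTransformer+FA2": (0.9850, 2.5),  # hypothetical
}

def pareto_front(points):
    """Return model names not dominated by any other point.

    A point dominates another if it has >= F1 and <= latency,
    with at least one strict inequality.
    """
    front = []
    for name, (f1, t) in points.items():
        dominated = any(
            (f2 >= f1 and t2 <= t) and (f2 > f1 or t2 < t)
            for other, (f2, t2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front

print(pareto_front(models))  # with these placeholder values: ['Mamba-2']
```

With these placeholder values Mamba-2 dominates both baselines, so it is the entire frontier; with real measurements several models can co-exist on the front when neither dominates the other.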

From 2026-kulatilleke-mambanetburst-direct-byte-level · MambaNetBurst: Direct Byte-level Network Traffic Classification without Tokenization or Pretraining · §V-C, §V-D, Figure 2 · 2026 · arXiv preprint

Implications

Tags

censors
generic
techniques
ml-classifier

Extracted by claude-sonnet-4-6 — review before relying.