FINDING · DETECTION
A Naive Bayes classifier built on 17 LIWC-derived and keyword features achieved 79.34% accuracy (10-fold cross-validation) in predicting whether Sina Weibo posts would be censored, with precision 0.80 and recall 0.85 for the censored class. It outperformed every single-domain feature set as well as the full 408-feature combination (accuracy 0.69, roughly 69%).
From 2018-ng-detecting — Detecting Censorable Content on Sina Weibo: A Pilot Study · §5, Table 1 · 2018 · Hellenic Conference on Artificial Intelligence
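The evaluation protocol behind these numbers (Naive Bayes scored by 10-fold cross-validation, with precision and recall reported for the censored class) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the Gaussian variant of Naive Bayes, the pooling of counts across folds, the function names, and the two-feature synthetic data are all assumptions made here, and the toy scores say nothing about the paper's results.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log-prior plus per-feature mean and variance for each class."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior (features assumed independent)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


def cross_validate(X, y, k=10, seed=0):
    """k-fold cross-validation; counts are pooled over folds, label 1 = censored."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    tp = fp = fn = correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            pred = predict(model, X[i])
            if pred == y[i]:
                correct += 1
            if pred == 1 and y[i] == 1:
                tp += 1
            elif pred == 1 and y[i] == 0:
                fp += 1
            elif pred == 0 and y[i] == 1:
                fn += 1
    return {
        "accuracy": correct / len(X),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def make_toy_data(n_per_class=100, seed=42):
    """Synthetic stand-in for the feature matrix; NOT Weibo data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, -1.0), (1, 1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = cross_validate(X, y, k=10)
print(scores)
```

With real data, `X` would hold one row of the 17 selected feature values per post and `y` the censored (1) or uncensored (0) label.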
Implications
- Keyword scrubbing alone is insufficient for evading platform censorship; the dominant signal is stylistic (informal register, modal particles), so user-guidance tools must target linguistic style, not just lexical content.
- Circumvention tools embedding pre-submission feedback could flag posts scoring high on LIWC informal-language and modal-particle dimensions as elevated censorship risk.
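The pre-submission feedback idea in the second implication could look roughly like the sketch below. Everything in it is hypothetical: the particle list is a short illustrative set of common Mandarin sentence-final particles and does not reproduce any LIWC dictionary, `particle_rate` and `flag_elevated_risk` are names invented for this example, and the 0.05 threshold is arbitrary rather than calibrated against the study.

```python
# Illustrative particle set only; a real tool would use the LIWC categories
# (or a trained classifier) rather than this hand-picked list.
MODAL_PARTICLES = set("啊吧呢吗啦嘛呀哦")


def particle_rate(text):
    """Share of non-whitespace characters that are listed modal particles."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in MODAL_PARTICLES for c in chars) / len(chars)


def flag_elevated_risk(text, threshold=0.05):
    """Flag a draft whose particle rate meets an (arbitrary) threshold."""
    return particle_rate(text) >= threshold


draft = "好啊好啊"
print(particle_rate(draft), flag_elevated_risk(draft))
```

A single-feature rate like this is only a crude proxy; the finding above rests on a classifier over 17 features, so a practical tool would more plausibly surface the classifier's score than a lone threshold.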
Tags
Extracted by claude-sonnet-4-6 — review before relying.