FINDING · DETECTION

A Naive Bayes classifier built on 17 LIWC-derived and keyword features achieved 79.34% accuracy under 10-fold cross-validation when predicting censorship of Sina Weibo posts, with precision 0.80 and recall 0.85 for the censored class. It outperformed every single-domain feature set as well as the full 408-feature combination (69% accuracy).
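
The evaluation setup described here (Naive Bayes, 10-fold cross-validation, precision and recall for the censored class) can be illustrated with a minimal sketch. It assumes a Gaussian Naive Bayes variant, which the finding does not specify, and it runs on synthetic 17-feature data generated by the hypothetical helper `make_synthetic`; it does not use the Weibo corpus and will not reproduce the reported figures.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature row."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=log_posterior)


def cross_validate(X, y, k=10, seed=0):
    """k-fold cross-validation; metrics are pooled over all held-out predictions."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    tp = fp = fn = correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            pred = predict(model, X[i])
            correct += pred == y[i]
            tp += pred == 1 and y[i] == 1  # label 1 stands for "censored"
            fp += pred == 1 and y[i] == 0
            fn += pred == 0 and y[i] == 1
    return {
        "accuracy": correct / len(X),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def make_synthetic(n=200, n_features=17, seed=1):
    """Hypothetical stand-in data: class 1 has each feature mean shifted by 0.8."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(0.8 * label, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


X, y = make_synthetic()
scores = cross_validate(X, y)
print(scores)
```

Pooling predictions across folds is one common way to report cross-validated precision and recall; averaging per-fold scores is another, and the finding does not say which was used.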

From 2018-ng-detecting — Detecting Censorable Content on Sina Weibo: A Pilot Study · §5, Table 1 · 2018 · Hellenic Conference on Artificial Intelligence

Implications

Keyword scrubbing alone is insufficient for platform-censorship evasion; the dominant signal is stylistic (informal register, mood particles), so user-guidance tools must target linguistic style, not just lexical content.
Circumvention tools embedding pre-submission feedback could flag posts scoring high on LIWC informal-language and modal-particle dimensions as elevated censorship risk.
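
As a rough illustration of the pre-submission feedback idea, the sketch below scores a post by the share of its characters that are Mandarin sentence-final particles and flags it above a cut-off. The particle set `MODAL_PARTICLES` is a tiny hypothetical stand-in for a LIWC category, not the LIWC dictionary, and `RISK_THRESHOLD` is an arbitrary placeholder that a real tool would need to calibrate on labeled posts.

```python
# Hypothetical stand-in for a LIWC-style category: a few common Mandarin
# sentence-final particles. This is not the LIWC dictionary.
MODAL_PARTICLES = frozenset("吧呢啊嘛啦呀哦")

# Hypothetical cut-off, chosen only for illustration.
RISK_THRESHOLD = 0.05


def particle_density(post: str) -> float:
    """Fraction of non-whitespace characters that are listed particles."""
    chars = [c for c in post if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in MODAL_PARTICLES for c in chars) / len(chars)


def flag_post(post: str, threshold: float = RISK_THRESHOLD) -> bool:
    """Return True when the particle density meets or exceeds the threshold."""
    return particle_density(post) >= threshold
```

A single density score is far cruder than the 17-feature classifier in the finding; it only shows where a style-based signal could plug into a feedback loop before a post is submitted.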

Tags