FINDING · EVALUATION

CensorshipDetector, an XLM-RoBERTa model fine-tuned on 587,819 Baidu Baike articles (censored) and Chinese Wikipedia (uncensored), achieved 91% accuracy on a held-out validation set of Chinese news articles. It classified 93% of Chinese state media articles as censored (average censorship score 0.93) and 87% of New York Times Chinese articles as uncensored (average censorship score 0.13).
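
The per-source figures above mix two kinds of summary: a classification rate and an average score. The following is a minimal sketch of how such numbers can be derived from per-article scores. It assumes the censorship score is a value in [0, 1] and that an article is labeled censored when its score is at least 0.5; neither detail is stated in this finding. The function name `summarize` and the sample scores are hypothetical and are not data from the paper.

```python
from statistics import mean


def summarize(scores, expected_label, threshold=0.5):
    """Return (fraction of articles given expected_label, mean score).

    Assumption: a score >= threshold means "censored". The 0.5 default
    is illustrative, not a value taken from the paper.
    """
    predicted = ["censored" if s >= threshold else "uncensored" for s in scores]
    hits = sum(p == expected_label for p in predicted)
    return hits / len(scores), mean(scores)


# Made-up scores for illustration only.
state_media_scores = [0.97, 0.97, 0.88, 0.25]
nyt_scores = [0.05, 0.10, 0.75, 0.10]

state_rate, state_avg = summarize(state_media_scores, "censored")
nyt_rate, nyt_avg = summarize(nyt_scores, "uncensored")

# Overall accuracy pools the correct predictions from both sources.
total = len(state_media_scores) + len(nyt_scores)
correct = state_rate * len(state_media_scores) + nyt_rate * len(nyt_scores)
overall_accuracy = correct / total

print(f"state media: rate={state_rate:.2f}, mean score={state_avg:.2f}")
print(f"NYT Chinese: rate={nyt_rate:.2f}, mean score={nyt_avg:.2f}")
print(f"overall accuracy={overall_accuracy:.2f}")
```

The rate depends on the chosen threshold while the mean score does not, which is why reporting both gives a fuller picture of how well the two sources are separated.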

From 2025-ahmed-llm-censorship-bias · An Analysis of Chinese Censorship Bias in LLMs · §4.7, §4.7.2 · 2025 · Proceedings on Privacy Enhancing Technologies

Implications

Tags

censors
cn
techniques
keyword-filtering
ml-classifier

Extracted by claude-sonnet-4-6 — review before relying.