FINDING · EVALUATION

Across all 7 LLMs tested (GPT-4o, GPT-4o Mini, Gemini 1.5 Flash, Gemini 1.5 Pro, Llama 3.2, Claude 3.5 Haiku, Claude 3.5 Sonnet), statistically significant evidence of censorship bias was found in at least one evaluation metric per model: responses to Simplified Chinese prompts were more neutral, more similar to sanitized text, and less opinionated than responses to semantically identical Traditional Chinese prompts (p < 0.05 across refusal-rate, sentiment, CensorshipDetector classification, and word-embedding analyses).
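The refusal-rate comparison behind this finding can be sketched as a two-proportion significance test. The counts and the helper function below are hypothetical illustrations, not the paper's data or code; the paper's full pipeline also covers sentiment, classifier, and embedding analyses.

```python
from math import sqrt, erfc

def refusal_rate_ztest(refused_a, n_a, refused_b, n_b):
    """Two-sided two-proportion z-test on refusal counts (illustrative helper)."""
    p_a, p_b = refused_a / n_a, refused_b / n_b
    pooled = (refused_a + refused_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p via normal approximation
    return z, p_value

# Hypothetical counts for one model (not from the paper):
# 34/200 refusals on Simplified Chinese prompts vs 12/200 on Traditional.
z, p = refusal_rate_ztest(34, 200, 12, 200)
print(f"z={z:.2f}, p={p:.4f}")  # p < 0.05 -> refusal rates differ significantly
```

A gap like this would count as significant evidence on the refusal-rate metric; the paper applies analogous tests per model and per metric.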

From 2025-ahmed-llm-censorship-bias · An Analysis of Chinese Censorship Bias in LLMs · Table 4, §6 · 2025 · Proceedings on Privacy Enhancing Technologies

Implications

Tags

censors
cn
techniques
keyword-filtering
ml-classifier

Extracted by claude-sonnet-4-6 — review before relying.