2023-streisand-where
findings extracted from this paper
-
In China, it is typically the publisher (not the government) who censors translated books, doing so to avoid penalties such as harsh scrutiny of future publications, book confiscation, and suspension of publishing rights. Authors are often unaware that their translations were altered until well after publication. This self-censorship produces more restrictive outcomes than direct government censorship because publishers, lacking clear rules, err on the side of caution.
-
A case study on Chapter 5 of Chinese Literature: A Very Short Introduction (Knight) found 7 censored topics, 5 removed paragraphs, 31 removed sentences, and 2 removed/altered words in the Chinese translation. Censored topics included 2000 Nobel laureate Gao Xingjian, the Tiananmen Square Massacre, Mao Zedong, the Cultural Revolution, the Great Leap Forward, the plasma economy scandal in Henan, and a discussion of book censorship itself.
-
ChatGPT correctly identified missing sentences in a partially censored translation and correctly judged a complete translation as complete in a control condition, demonstrating that LLMs are a viable complementary detection method. The paper notes that having multiple independent detection approaches (NLP alignment, bitext mining, LLM-based reasoning) improves overall robustness by enabling cross-validation.
-
The paper proposes detecting translation censorship by back-translating the Chinese text to English via Google Translate, embedding each paragraph with distiluse-base-multilingual-cased-v1, and solving a bipartite matching (a linear sum assignment problem) in which the cost of pairing two paragraphs is their negated cosine similarity. Paragraphs whose similarity falls below a threshold are flagged as cut; matched paragraphs are then compared recursively at the sentence level to detect alterations.
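The matching step can be illustrated with a minimal sketch. This is not the paper's code: it assumes the paragraph embeddings have already been computed and are passed in as plain lists of floats. The function name `match_paragraphs` and the threshold value 0.5 are hypothetical. It also treats every original paragraph left without a partner as cut, which is an assumption. To stay dependency-free, it finds the optimal assignment by exhaustive search, which is only practical for a handful of paragraphs; on real inputs one would likely call `scipy.optimize.linear_sum_assignment` on the negated similarity matrix instead.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def match_paragraphs(orig_vecs, trans_vecs, threshold=0.5):
    """Pair each translated paragraph with a distinct original paragraph.

    Returns (pairs, cut): pairs is a sorted list of (orig_index, trans_index)
    whose similarity is at least `threshold` (hypothetical default); cut lists
    the original indices with no such partner.
    """
    n, m = len(orig_vecs), len(trans_vecs)
    if m > n:
        raise ValueError("sketch assumes the translation has no extra paragraphs")

    # sim[i][j] = similarity of original paragraph i and translated paragraph j
    sim = [[cosine(o, t) for t in trans_vecs] for o in orig_vecs]

    # Exhaustive search over all injective assignments; the cost of a pair is
    # its negated cosine similarity, so the minimum total cost is the best match.
    best_cost, best = math.inf, ()
    for assignment in permutations(range(n), m):  # assignment[j] = original index
        cost = sum(-sim[assignment[j]][j] for j in range(m))
        if cost < best_cost:
            best_cost, best = cost, assignment

    pairs = sorted((i, j) for j, i in enumerate(best) if sim[i][j] >= threshold)
    matched = {i for i, _ in pairs}
    cut = [i for i in range(n) if i not in matched]
    return pairs, cut


# Toy example: three original paragraphs, two surviving in the translation.
original = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
translated = [[0.9, 0.1, 0], [0, 0.1, 0.9]]
pairs, cut = match_paragraphs(original, translated)
print(pairs, cut)  # [(0, 0), (2, 1)] [1]
```

The same function could be reused for the sentence-level pass by feeding it sentence embeddings from each matched paragraph pair.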
-
The paper argues that an effective counter to translation censorship is to deliberately trigger the Streisand effect: publishing detected censored content side by side with the original on a public website brings the censored text to a broader audience, including people who would never have read the censored version, and makes the censorship itself backfire. Censors avoid publicizing removals precisely to prevent this outcome.