2023-streisand-where

Where Have All the Paragraphs Gone? Detecting and Exposing Censorship in Chinese Translation

Mizhang Streisand, Eric Wustrow, Amir Houmansadr · Free and Open Communications on the Internet · 2023

Tags

censors: cn
techniques: keyword-filtering

findings extracted from this paper

In China, it is typically the publisher (not the government) who censors translated books, doing so to avoid penalties such as harsh scrutiny of future publications, book confiscation, and suspension of publishing rights. Authors are often unaware that their translations were altered until well after publication. Because there are no clear rules, publishers err on the side of caution, so this self-censorship produces more restrictive outcomes than direct government censorship.

§1 Introduction policy keyword-filtering cn
A case study on Chapter 5 of Chinese Literature: A Very Short Introduction (Knight) found 7 censored topics, 5 removed paragraphs, 31 removed sentences, and 2 removed/altered words in the Chinese translation. Censored topics included 2000 Nobel laureate Gao Xingjian, the Tiananmen Square Massacre, Mao Zedong, the Cultural Revolution, the Great Leap Forward, the plasma economy scandal in Henan, and a discussion of book censorship itself.

§3 Case Study / Table 1 evaluation keyword-filtering cn
ChatGPT correctly identified missing sentences in a partially censored translation and correctly judged a complete translation as complete in a control condition, demonstrating that LLMs are a viable complementary detection method. The paper notes that having multiple independent detection approaches (NLP alignment, bitext mining, LLM-based reasoning) improves overall robustness by enabling cross-validation.

§4 Discussion / Appendix C detection measurement-platform cn
The paper proposes detecting translation censorship by back-translating the Chinese text to English via Google Translate, embedding each paragraph with distiluse-base-multilingual-cased-v1, and matching original paragraphs to back-translated paragraphs as a bipartite linear sum assignment problem whose costs are the negated cosine similarities. Paragraphs whose similarity falls below a threshold are flagged as cut; matched paragraphs are then compared recursively at the sentence level to detect alterations.

§2 Methodology detection keyword-filtering measurement-platform cn
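
The paragraph-matching step in the finding above can be sketched roughly as follows. This is a minimal illustration and not the authors' code. It assumes paragraph embeddings have already been computed (for example with the multilingual model named above) and uses SciPy's linear_sum_assignment for the bipartite matching. The function names, the toy 4-dimensional vectors, and the 0.5 threshold are hypothetical choices made for this example; the summary does not state the threshold the paper uses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_similarity_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T


def find_cut_paragraphs(orig_emb, trans_emb, threshold=0.5):
    """Match original paragraphs to back-translated ones.

    Returns (matches, cut): matches maps original index -> translation index,
    cut lists original indices with no sufficiently similar counterpart.
    The default threshold is an assumed value, not one taken from the paper.
    """
    sim = cosine_similarity_matrix(np.asarray(orig_emb, dtype=float),
                                   np.asarray(trans_emb, dtype=float))
    # linear_sum_assignment minimizes total cost, so negate the similarities
    # to obtain the matching with maximum total similarity.
    rows, cols = linear_sum_assignment(-sim)
    matches = {int(r): int(c) for r, c in zip(rows, cols)
               if sim[r, c] >= threshold}
    cut = [i for i in range(sim.shape[0]) if i not in matches]
    return matches, cut


# Toy stand-ins for real embeddings: four original paragraphs, three
# back-translated ones (the second original paragraph has no counterpart).
orig_emb = np.eye(4)
trans_emb = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.9],
])

matches, cut = find_cut_paragraphs(orig_emb, trans_emb)
print("matched pairs:", matches)
print("flagged as cut:", cut)
```

With real data, the rows of the two arrays would be the embeddings of the original English paragraphs and of the back-translated paragraphs. The same routine could then be reapplied to the sentences of each matched pair to look for sentence-level alterations.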
The paper argues that an effective counter to translation censorship is to actively trigger the Streisand effect: publishing detected censored content side-by-side with the original on a public website causes the censored text to reach a broader audience (including people who would not have read the censored version) and makes the censorship itself backfire. Censors deliberately avoid publicizing removals precisely to prevent this outcome.

§1 Introduction / §5 Conclusion defense keyword-filtering cn