2017-knockel-measuring
findings extracted from this paper
-
Chinese mobile games widely implement keyword censorship client-side: blacklists were found embedded in plain text, XML, JSON, compiled Lua, compiled C++, and encrypted formats requiring reverse engineering to extract. The client-side implementation exposed 132 keyword lists from 113 different games in the first experiment alone. Games must submit their blocked keyword list to regulators (the Ministry of Culture, MOC, or the State Administration of Press, Publication, Radio, Film and Television, SAPPRFT) to obtain a publication license, making keyword filtering a regulatory compliance artifact rather than purely an operational choice.
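To make "client-side" concrete, here is a minimal sketch of how a game client might check chat input against an embedded list. It assumes the list ships as a JSON array and that matching is a case-insensitive substring test. Both are illustrative assumptions, and the names `load_blacklist`, `is_blocked`, and `SAMPLE_LIST` are hypothetical rather than taken from any game in the study.

```python
import json


def load_blacklist(raw_json: str) -> list[str]:
    """Parse a JSON array of blocked keywords (assumed file layout)."""
    return [kw for kw in json.loads(raw_json) if kw]


def is_blocked(message: str, blacklist: list[str]) -> bool:
    """Return True if any blacklisted keyword occurs as a substring."""
    lowered = message.lower()
    return any(kw.lower() in lowered for kw in blacklist)


# Placeholder data for illustration only, not keywords from the paper.
SAMPLE_LIST = '["casino", "badword"]'
blacklist = load_blacklist(SAMPLE_LIST)
```

Because the list and the check both live in the shipped client, anyone who can unpack the game files can recover the full list, which is what made the extraction described above possible.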
-
Analysis of 183,111 unique keywords collected from more than 200 Chinese mobile games found no evidence of a central state or provincial authority controlling keyword list generation. The only consistently significant predictors of keyword list similarity were whether games shared the same developer (Mantel r=0.17, p<0.001) or publisher (r=0.15, p<0.001); city, province, and genre showed no significant correlation (p>0.58). This indicates Chinese companies have substantial flexibility in determining which content to block under the 'self-discipline' intermediary liability framework.
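A Mantel test correlates the off-diagonal entries of two pairwise matrices and estimates significance by permuting the rows and columns of one of them. The sketch below is a generic permutation Mantel test on toy data, not the paper's pipeline: the choice of Jaccard similarity, the four games, and their keywords are all assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def upper_triangle(matrix, order):
    """Flatten entries above the diagonal after relabelling rows/cols by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(a, b, permutations=999, seed=0):
    """One-sided permutation Mantel test for two symmetric matrices."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    b_flat = upper_triangle(b, identity)
    observed = pearson(upper_triangle(a, identity), b_flat)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        # Small tolerance so float rounding does not hide exact ties.
        if pearson(upper_triangle(a, order), b_flat) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


def jaccard(s, t):
    """Jaccard similarity of two keyword sets."""
    return len(s & t) / len(s | t)


# Toy data (hypothetical): keyword set and developer per game.
games = {
    "g1": ({"a", "b", "c"}, "devA"),
    "g2": ({"a", "b", "d"}, "devA"),
    "g3": ({"x", "y"}, "devB"),
    "g4": ({"x", "y", "z"}, "devB"),
}
names = list(games)
similarity = [[jaccard(games[x][0], games[y][0]) for y in names] for x in names]
shared_dev = [[1.0 if games[x][1] == games[y][1] else 0.0 for y in names] for x in names]

r, p = mantel(similarity, shared_dev)
```

With only four games the permutation p-value is coarse; the point is the shape of the computation, where `shared_dev` could be swapped for a shared-publisher, same-city, or same-genre indicator matrix.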
-
When controlling for shared-developer as a confound, shared-publisher correlation collapsed to r=0.047 (p=0.0015) in the first experiment and r=0.064 (p=0.015) in the second; when controlling for shared-publisher, shared-developer remained r=0.095 (p<0.001) and r=0.13 (p<0.001) respectively. This demonstrates that development teams — not publishing entities — are the primary locus of keyword list authorship in the Chinese mobile gaming ecosystem.
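"Controlling for" a third matrix is commonly done with a partial Mantel statistic, which is the first-order partial correlation of the flattened matrices. The formula below is that standard form, shown on the assumption that the paper used it; the notes do not state the exact statistic. Here A is keyword-list similarity, B is the shared-publisher indicator, and C is the shared-developer indicator being held fixed.

```latex
r_{AB \cdot C} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{\left(1 - r_{AC}^{2}\right)\left(1 - r_{BC}^{2}\right)}}
```

Swapping the roles of B and C gives the shared-developer correlation with shared-publisher held fixed.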
-
Forensic analysis of keyword list formatting artifacts — C-style escapes appearing in XML files, XML entities appearing in non-XML files, and double-backslash encoding traceable to a 2004 leaked QQ keyword list — provides evidence that developers copy and circulate keyword lists across companies through informal channels including old web applications and bulletin boards. This keyword propagation mechanism explains partial overlap between unrelated companies' lists without implying a central authority.
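The formatting artifacts described above can be flagged mechanically. The following is a minimal sketch, assuming each keyword is available as a raw string together with the format of the file it came from. The regular expressions, the function name `find_artifacts`, and the label strings are illustrative choices, not the paper's tooling.

```python
import re

# A backslash followed by a common C escape character, \xNN, or \uNNNN.
C_ESCAPE = re.compile(r"\\(?:[nrt\\]|x[0-9A-Fa-f]{2}|u[0-9A-Fa-f]{4})")
# Named or numeric XML character entities such as &amp; or &#x4e2d;.
XML_ENTITY = re.compile(r"&(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);")
# Two consecutive literal backslashes.
DOUBLE_BACKSLASH = re.compile(r"\\\\")


def find_artifacts(keyword: str, container_format: str) -> set[str]:
    """Return labels for encoding artifacts that look out of place."""
    found = set()
    if container_format == "xml" and C_ESCAPE.search(keyword):
        found.add("c_escape_in_xml")
    if container_format != "xml" and XML_ENTITY.search(keyword):
        found.add("xml_entity_outside_xml")
    if DOUBLE_BACKSLASH.search(keyword):
        found.add("double_backslash")
    return found
```

A keyword carrying an escape style foreign to its container suggests it was copied from a source that used a different format, which is the inference the finding relies on.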
-
Content analysis of 7,000 randomly sampled keywords (±1.1% at 95% confidence) found Social content (gambling, illicit goods, competitor references) was the dominant theme at 51.16%, followed by Technology/URLs at 16.81%, Political content at 15.00%, People (officials, dissidents) at 6.57%, and Event-related keywords at only 4.89%. Gaming keyword lists lacked references to current events from 2016–2017 that were found actively censored on Chinese chat applications during the same period, suggesting games face lower scrutiny for real-time event censorship than communication platforms.
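The sampling margin can be sanity-checked with the usual worst-case formula for a proportion. This sketch assumes simple random sampling, p = 0.5, and z = 1.96; the optional finite population correction uses the 183,111 keyword total as the sampling frame, which is an assumption rather than something stated in these notes.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, population=None):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction for sampling without replacement.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


moe_plain = margin_of_error(7000)
moe_fpc = margin_of_error(7000, population=183111)
```

Both values land between 1.1% and 1.2%, which is in line with the stated ±1.1%; the small difference may come from rounding or from a different formula in the paper.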