2026-zohaib-extended
findings extracted from this paper
-
CensorAlert aggregates censorship signals from heterogeneous open sources, including OONI, Cloudflare Radar, NetBlocks, the GitHub issue trackers of circumvention tools (Hysteria, Xray), the Net4People BBS, the NTC Party forum, Mastodon, X, Telegram channels, and arXiv. It normalizes each item into a common schema that preserves the item's timestamp, source, and raw data, together with its provenance as a link to the original content.
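The normalization step can be pictured as mapping every source-specific record onto one shared record type. This is a minimal sketch under stated assumptions: the class NormalizedItem, the function normalize, and all field names are hypothetical (not CensorAlert's actual schema), and timestamps are assumed to arrive as ISO 8601 strings that are converted to UTC.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class NormalizedItem:
    """Hypothetical common schema; field names are illustrative only."""
    timestamp: datetime      # publication/observation time, normalized to UTC
    source: str              # e.g. "ooni", "github", "net4people", "mastodon"
    raw: dict[str, Any]      # original payload, kept as received
    provenance_url: str      # link back to the original content


def normalize(source: str, published: str, url: str, raw: dict[str, Any]) -> NormalizedItem:
    """Map one source-specific record onto the shared schema.

    ASSUMPTION: `published` is an ISO 8601 string; naive times are treated as UTC.
    """
    ts = datetime.fromisoformat(published)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)   # assume naive timestamps are already UTC
    else:
        ts = ts.astimezone(timezone.utc)       # convert aware timestamps to UTC
    if not url:
        # Provenance is mandatory: every item must link to its original content.
        raise ValueError("every item must keep a link to its original content")
    # Shallow copy so later mutation of the caller's dict does not alter the item.
    return NormalizedItem(timestamp=ts, source=source, raw=dict(raw), provenance_url=url)
```

Keeping the raw payload alongside the normalized fields means later stages (deduplication, scoring) can work on a uniform shape while the original evidence stays inspectable.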
-
Systematic measurement platforms (OONI, Censored Planet, Cloudflare Radar, NetBlocks) have inherent blind spots due to limited geographic coverage and protocol-specific test constraints. Critical censorship discoveries, including the GFW's blocking of fully encrypted protocols and regional censorship within China, were first surfaced by user reports on forums and on the GitHub issue pages of circumvention tools, not by automated measurement infrastructure.
-
CensorAlert generates text embeddings (OpenAI text-embedding-3-small) from each item's summary, title, and tags to cluster semantically similar reports within configurable time windows; near-duplicates such as reposts, copied headlines, and translations are collapsed into a single canonical post that preserves all original source URLs and metadata.
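A rough sketch of the near-duplicate collapse, assuming the embeddings have already been computed elsewhere (for example with text-embedding-3-small) and that a greedy, time-ordered pass with a cosine-similarity threshold is an acceptable stand-in for the paper's clustering. The names Report, Cluster, and collapse, the 24-hour window, and the 0.9 threshold are all hypothetical choices, not values taken from CensorAlert.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Report:
    text: str                 # summary + title + tags, i.e. the text that was embedded
    url: str                  # original source URL
    published: datetime
    embedding: list[float]    # precomputed vector (API call not shown)


@dataclass
class Cluster:
    canonical: Report                                      # earliest report becomes the canonical post
    source_urls: list[str] = field(default_factory=list)   # every merged report's URL


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def collapse(reports: list[Report],
             window: timedelta = timedelta(hours=24),   # hypothetical time window
             threshold: float = 0.9) -> list[Cluster]:  # hypothetical similarity cutoff
    clusters: list[Cluster] = []
    # Process oldest first so the earliest report in each group is the canonical one.
    for r in sorted(reports, key=lambda rep: rep.published):
        best, best_sim = None, threshold
        for c in clusters:
            # Only compare against clusters whose canonical post is inside the window.
            if r.published - c.canonical.published > window:
                continue
            sim = cosine(r.embedding, c.canonical.embedding)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append(Cluster(canonical=r, source_urls=[r.url]))
        else:
            best.source_urls.append(r.url)   # keep provenance of the merged duplicate
    return clusters
```

Each merged report contributes its URL to the cluster, so provenance survives deduplication. A production system would likely replace this quadratic scan with an approximate nearest-neighbor index.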
-
CensorAlert's LLM agent scores each ingested item 0–5 on five independent dimensions — credibility, novelty, impact, timeliness, and verifiability — then computes a normalized significance score (0–10) from these components; items are processed every two hours using OpenAI GPT-5 Thinking (hosted on Azure) constrained to return structured JSON output.
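This summary does not state how the five 0–5 component scores are combined into the 0–10 significance score. The sketch assumes the simplest option, an unweighted mean rescaled by 2, and also shows validating the model's structured JSON before scoring. DIMENSIONS and significance are hypothetical names, and the JSON keys are assumed to match the dimension names.

```python
import json

# Assumed JSON keys, one per scoring dimension named in the paper.
DIMENSIONS = ("credibility", "novelty", "impact", "timeliness", "verifiability")


def significance(llm_json: str) -> float:
    """Parse the agent's structured JSON and map five 0-5 scores onto 0-10.

    ASSUMPTION: equal weights, i.e. the mean of the five scores multiplied by 2.
    """
    data = json.loads(llm_json)
    scores = []
    for dim in DIMENSIONS:
        value = data.get(dim)
        # Reject missing, non-numeric (including bool), or out-of-range values.
        if isinstance(value, bool) or not isinstance(value, (int, float)) or not 0 <= value <= 5:
            raise ValueError(f"{dim} must be a number in [0, 5], got {value!r}")
        scores.append(float(value))
    return round(sum(scores) / len(scores) * 2, 2)
```

For example, scores of 4, 3, 5, 4, and 2 average 3.6 and map to 7.2. Rejecting missing or out-of-range values keeps a malformed model response from silently skewing the ranking.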