There’s no Tiananmen Square in the new Chinese image-making AI
There’s a new text-to-image AI in town. ERNIE-ViLG, developed by the Chinese tech company Baidu, can generate images that capture the cultural specificity of China. It also makes better anime art than DALL-E 2 and other Western image-making AIs.
But there are many things the AI won’t show you, such as Tiananmen Square, the country’s second-largest city square and a national symbol.
When a demo version of the software was released in August, users quickly discovered that certain words were labeled “sensitive” and blocked from producing any results. These included both explicit mentions of political leaders and words that are controversial only in political contexts. It seems that China’s sophisticated online censorship system has now extended to AI.
It’s not uncommon for similar AIs to restrict users from creating certain types of content. DALL-E 2 prohibits sexual content, faces of public figures, and images of medical treatment. But the case of ERNIE-ViLG highlights the question of where exactly the line lies between moderation and political censorship.
The ERNIE-ViLG model is part of Wenxin, a large-scale natural-language processing project from Baidu, China’s top AI company. It was trained on a data set of 145 million image-text pairs and contains 10 billion parameters, the values that a neural network adjusts as it learns and that the AI uses to discern the subtle differences between concepts and art styles.
That means ERNIE-ViLG has a smaller training data set than DALL-E 2 (650 million pairs) and Stable Diffusion (2.3 billion pairs) but more parameters than either one (DALL-E 2 has 3.5 billion parameters and Stable Diffusion has 890 million). Baidu released a demo version on its own platform in late August and later on Hugging Face, the popular international AI community.
The main difference between ERNIE-ViLG and Western models is that the Baidu-developed model understands prompts written in Chinese and is less likely to make mistakes when it comes to culturally specific words.
A Chinese video creator compared results from different models for prompts that included Chinese historical figures and pop culture celebrities. He found that ERNIE-ViLG produced more accurate images than DALL-E 2 and Stable Diffusion. ERNIE-ViLG has also been embraced by the Japanese anime community, whose members found that it can produce more satisfying anime art than other models, likely because it was trained on more anime data.
But ERNIE-ViLG, like the other models, will be defined by what it allows. ERNIE-ViLG has no published explanation of its content moderation policy, and Baidu declined to comment for this story.
When the ERNIE-ViLG demo was first released on Hugging Face, users who entered certain words would get the message “Sensitive words found. Please enter again (存在敏感词，请重新输入),” which was a surprisingly honest admission about the filtering mechanism. However, since at least September 12, the message has read “The content entered doesn’t meet relevant rules. Please try again after adjusting it. (输入内容不符合相关规则，请调整后再试！)”
In a test by MIT Technology Review, a number of Chinese words were blocked. These included the names of high-profile Chinese leaders like Xi Jinping and Mao Zedong; terms that could be considered politically sensitive, like “revolution” and “climb walls” (a metaphor for using a VPN in China); and the name of Baidu’s founder and CEO, Yanhong (Robin) Li.
While words like “democracy” and “government” are allowed on their own, prompts that combine them with other words, such as “democracy Middle East” or “British government,” are blocked. Tiananmen Square in Beijing also can’t be found in ERNIE-ViLG, likely because of its association with the Tiananmen Massacre, references to which are heavily censored in China.
In today’s China, social media companies usually have proprietary lists of sensitive words, built from both government instructions and their own operational decisions. This means that whatever filter ERNIE-ViLG uses is likely to differ from those used by Tencent-owned WeChat or by Weibo, which is operated by Sina Corporation. Some of these platforms have been systematically tested by the Toronto-based research group Citizen Lab.
Badiucao, a Chinese-Australian cartoonist who uses an alias to protect his identity, was one of the first to notice the censorship in ERNIE-ViLG. His artworks often criticize the Chinese government and its leaders, so these were among the first prompts he fed into the model.
“I was also exploring its ecosystem intentionally,” says Badiucao. He notes that it’s new territory and that censorship has caught up to it. “But [the result] is quite a shame.”
As an artist, Badiucao doesn’t agree that moderation should be allowed in these AIs, because he believes he should be the one to decide what’s acceptable in his art. Still, he cautions that censorship motivated by moral concerns should not be confused with censorship for political reasons. “It’s different when an AI judges what it cannot generate based on commonly agreed-upon moral standards and when a government, as a third party, says you can’t because it harms the country or the national government,” he says.
The difficulty in drawing a line between censorship and moderation also comes from differences between cultures and legal systems, says Giada Pistilli, principal ethicist at Hugging Face. Different cultures may interpret the same imagery in different ways. Pistilli says that in France, religious symbols are not allowed in public, which is that country’s expression of secularism. “When you go to the US, secularism means that everything, like every religious symbol, is allowed.”
In January, the Chinese government proposed a new regulation banning any AI-generated content that “endangers national security and social stability,” which would cover AIs like ERNIE-ViLG.
What could help in ERNIE-ViLG’s case is for the developer to release a document explaining the moderation decisions, says Pistilli: “Is it censored because it’s the law that’s telling them to do so? Is it because they think it’s wrong? It helps to explain our arguments.”
Despite the built-in censorship, ERNIE-ViLG will continue to be an important player in the development of large-scale text-to-image AI. AI models trained on specific language data sets can overcome some of the limitations of English-based mainstream models. ERNIE-ViLG is especially useful for users who need an AI that understands Chinese and can generate accurate images accordingly.
Just as Chinese social media platforms have thrived despite strict censorship, ERNIE-ViLG may eventually do the same. Such models are too valuable to be ignored.
I’m a journalist who specializes in investigative reporting and writing. I have written for the New York Times and other publications.