The dark secret behind those cute AI-generated animal images
Another month brings another flood of amazing images created by artificial intelligence. OpenAI’s new picture-making neural net, DALL-E 2, was presented in April. It could produce amazing high-resolution images of almost anything it was asked for, easily outperforming the original DALL-E in almost every respect.
Now, just a few weeks later, Google Brain has revealed its own image-making AI, called Imagen. It performs even better than DALL-E 2: it scores higher on a standard measure for rating the quality of computer-generated images, and the pictures it produced were preferred by a group of human judges.
“We’re living through the AI space race!” one Twitter user commented. Another user tweeted, “The stock imagery industry is officially toast.”
Many of Imagen’s images are indeed jaw-dropping. Some of its outdoor scenes could have been lifted from the pages of National Geographic. Marketing teams could use Imagen to produce billboard-ready ads in just a few clicks.
Google is leaning into cuteness, just as OpenAI did with DALL-E. Both firms promote their tools with pictures of anthropomorphic animals doing adorable things: a fuzzy panda dressed as a chef making dough, a corgi sitting in a house made of sushi, a teddy bear swimming the 400-meter butterfly at the Olympics, and so on.
There’s a technical, as well as a PR, reason for this. Combining concepts such as “fuzzy panda” and “making dough” forces the neural network to learn how to manipulate these concepts in a way that makes sense. But the cuteness of these tools hides a darker side, one the public isn’t being shown, because seeing it would reveal an ugly truth.
Most images that Google and OpenAI make public are cherry-picked. We see only cute images that match their prompts with astonishing accuracy, and that’s to be expected. But we also see no images that contain hateful stereotypes, racism, or misogyny. There is no sexist or violent imagery. There is no panda porn. Yet from what we know about how these tools are built, there should be.
It’s no secret that large models, such as DALL-E 2 and Imagen, trained on vast numbers of documents and images taken from the web, absorb the worst aspects of that data as well as the best. OpenAI and Google explicitly acknowledge that.
Scroll down the Imagen website, past the dragon fruit wearing a karate belt and the small cactus wearing a hat and sunglasses, to the section on societal impact and you get this: “While a subset of our training data was filtered to remove noise and undesirable content, such as pornographic imagery and toxic language, we also utilized [the] LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen is based on text encoders that were trained from uncurated web-scale data and inherits the social biases of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.”
It’s the same kind of acknowledgement that OpenAI made when it revealed GPT-3 in 2020: “internet-trained models have internet-scale biases.” And as Mike Cook, who researches AI creativity at Queen Mary University of London, has pointed out, it’s in the ethics statements that accompanied Google’s large language model PaLM and OpenAI’s DALL-E 2. These firms know their models can produce terrible content, and they don’t know how to fix it.
For now, the solution is to keep them caged up. OpenAI is limiting DALL-E 2 to a few trusted users. Google does not plan to release Imagen.
This would be fine if these were merely proprietary tools, but these firms are pushing the boundaries of AI, and their work shapes the technology we all live with. They create new marvels, and new horrors, and then move on with a shrug. When Google’s in-house ethics team raised concerns about large language models in 2020, it sparked a fight that ended with two of its leading researchers being fired.
Image-making AIs and large language models have the potential to transform the world, but only if their harms are brought under control, and that will require more research. Small steps are being taken to open these neural networks up to widespread study. A few weeks ago, Meta released a large language model to researchers, warts and all. And Hugging Face is due to release its open-source version of GPT-3 in the coming months.
For now, enjoy the teddies.
I’m a journalist who specializes in investigative reporting and writing. I have written for the New York Times and other publications.