(San Francisco) ChatGPT can now generate images, and they are surprisingly detailed.
OpenAI, the San Francisco artificial intelligence (AI) company, showed a new version of its DALL-E image generator to a small group of testers on Wednesday and integrated it with ChatGPT, its popular online chatbot.
DALL-E 3 can produce more convincing images than previous versions and is particularly good at images featuring letters, numbers and human hands, OpenAI says.
“It understands and represents what the user is asking much better,” says OpenAI researcher Aditya Ramesh, adding that DALL-E 3 has a more accurate understanding of the English language.
A Swiss Army knife
The addition of the latest version of DALL-E makes ChatGPT a Swiss Army knife of generative AI: it can produce text, images, sounds, software and other digital media. ChatGPT’s success last year sparked a race among Silicon Valley tech giants to be at the forefront of AI advancements.
On Tuesday, Google released a new version of its Bard bot, which connects to several of the company’s popular services, including Gmail, YouTube and Docs. Midjourney and Stable Diffusion, two other image generators, were updated this summer.
ChatGPT has long been able to connect to other online services like Expedia, OpenTable, and Wikipedia. But the addition of the image generator is a first.
DALL-E and ChatGPT were previously separate applications. But with the latest version of ChatGPT, users can obtain digital images simply by describing what they want to see. They can even create images from descriptions generated by ChatGPT itself, which automates the creation of graphics, artwork, and other media.
In a recent demonstration, OpenAI researcher Gabriel Goh obtained detailed text descriptions from ChatGPT that were then used to generate images. For example, ChatGPT created descriptions of the logo of a restaurant called Mountain Ramen; then, it generated multiple images from these descriptions in a matter of seconds.
DALL-E 3 can produce images from long descriptions and closely follow very detailed instructions, Goh said. Like all image generators – and other AI systems – DALL-E 3 can make mistakes, he added.
To refine its technology, OpenAI will not share DALL-E 3 with the general public until October. DALL-E 3 will then be offered through ChatGPT Plus, a service that costs US$20 per month.
AI image generation can be used to spread large amounts of misinformation online, experts warn. To avoid this, OpenAI has integrated tools into DALL-E 3 that are supposed to block certain subjects, such as images with sexual connotations and depictions of public figures. OpenAI also attempts to limit DALL-E’s ability to imitate the style of certain artists.
In recent months, AI has been used as a source of visual disinformation: a fabricated image of an explosion at the Pentagon, and not a very convincing one, caused a brief fall in stock markets in May. There are also fears that this technology could be used for malicious purposes during elections.
According to Sandhini Agarwal, an OpenAI researcher who studies security and policy, DALL-E 3 generally generates images that are more stylized than photorealistic. But she admits the model could produce convincing scenes, like the type of grainy footage shot by security cameras.
For the most part, OpenAI doesn’t plan to block potentially problematic content from DALL-E 3. This approach would be “too broad,” says Sandhini Agarwal, because the images can be harmless or dangerous depending on the context. “It really depends on where they’re used and how people talk about them,” she says.
This article was published in the New York Times.