Ian Sansavera, a software architect at a New York startup called Runway, typed out a short description of what he wanted to see in a video. “A quiet river in the forest,” he wrote.
Less than two minutes later, an experimental internet service generated a short video of a lazy river in a forest. The running water glistened in the sun as it wove through the trees and ferns, took a turn and gently splashed against the rocks.
Runway, which plans to open its service to a small group of testers this week, is one of many companies developing technology using artificial intelligence (AI) that will soon allow people to generate videos simply by typing a few words in a box on a computer screen.
These companies represent the next step in an industry race — involving giants like Microsoft and Google, as well as much smaller start-ups — to create new kinds of artificial intelligence-based systems that some say could be the next big technological innovation, as important as web browsers or the iPhone.
New video-generation systems could speed up the work of filmmakers and other digital artists, while also offering a fast new way to create hard-to-detect fake news online, making it even harder to know what is real on the internet.
These systems are examples of so-called generative AI, which can instantly create text, images and sounds. Another example is ChatGPT, the online chatbot created by San Francisco startup OpenAI, which stunned the tech industry with its capabilities late last year.
Google and Facebook’s parent company Meta unveiled the first video-generating systems last year, but did not make them available to the public because they feared the systems could be used to spread misinformation with a new speed and efficiency.
Runway CEO Cris Valenzuela said he thinks the technology is too important to keep in a research lab, despite the risks it entails. “It’s one of the most impressive technologies we’ve built in the last hundred years,” he said. “People have to actually use it.”
Generative AI
The ability to edit and manipulate films and videos is nothing new, of course. Filmmakers have been doing it for over a century. In recent years, researchers and digital artists have used artificial intelligence and software to create and edit videos, often referred to as deepfakes.
But systems like the one created by Runway could eventually replace editing skills with the press of a button.
Runway’s technology generates videos from any short description. To get started, a user simply types a description, much as they would jot down a quick note.
This works best if the scene has a bit of action, but not too much – something like “a rainy day in the big city” or “a dog with a cell phone in the park”. Press “Enter” and the system generates a video in a minute or two.
The technology can reproduce common images, such as a cat sleeping on a carpet. It can also combine disparate concepts to generate weirdly funny videos, like a cow at a birthday party.
The videos are only four seconds long, and on closer inspection they are choppy and blurry. Sometimes the images are weird, distorted and disturbing. The system has a habit of fusing animals like dogs and cats with inanimate objects like balls or cell phones. But given the right prompt, it produces videos that hint at where the technology is headed.
“At this point, if I see a high-resolution video, I’m probably going to trust it,” said Phillip Isola, a Massachusetts Institute of Technology professor and AI specialist. “But that will change quite quickly.”
Like other technologies that use generative AI, Runway’s system learns by analyzing digital data — in this case, photos, videos and captions describing the content of those images. By training this type of technology on increasingly large amounts of data, researchers are confident that they can quickly improve and extend its skills. Soon, experts say, these systems will produce professional-looking mini-movies, complete with music and dialogue.
“In the past, to do something like this, you needed a camera. You needed props. You needed a filming location. You needed permission. You had to have money,” said Susan Bonser, a Pennsylvania-based author and publisher who has experimented with early incarnations of generative video technology. “Today, none of this is necessary. You can just sit back and imagine it.”
This article was originally published in The New York Times.