Deepfakes now close to perfection thanks to voice cloning

Until now, deepfakes, one of the favorite tools of the disinformation world, have fallen short on the voice: it was either a crude imitation or an audio montage that sounded fake. That was without counting on a young Ukrainian company, indirectly rewarded with an Emmy Award for its voice cloning of Richard Nixon.

On July 20, 1969, Neil Armstrong became the first man to set foot on the Moon, soon followed by Buzz Aldrin. From the Oval Office, Richard Nixon conveyed the pride of every American to the Apollo 11 astronauts during a historic phone call. “Hello, Neil and Buzz!” the American president said to the two astronauts, who were listening in their spacesuits, standing on the surface of the Moon. “I can’t tell you how proud we all are. For every American, this has to be the proudest day of our lives!”

But two days earlier, in complete secrecy, Nixon had asked his speechwriter, William Safire, to write him a second speech in case the mission ended in disaster. The Massachusetts Institute of Technology turned it into an Emmy Award-winning film to warn about a new generation of deepfakes: to the lip-sync manipulation that Barack Obama fell victim to in early 2019, a new voice cloning technology has now been added. Nixon never gave that second speech. And yet…

In the film In Event of Moon Disaster, MIT’s Center for Advanced Virtuality has the former US president deliver this second speech. We see Nixon, solemn and grave, sitting behind his desk in the White House, as he begins with these words:

“Good evening. Fate has ordained that the men who went to the Moon to explore in peace will stay on the Moon to rest in peace. These brave men, Neil Armstrong and Edwin Aldrin, know that there is no hope for their recovery. But they also know that there is hope for mankind in their sacrifice. They will be mourned by their families and friends. They will be mourned by their nation; they will be mourned by the people of the world.”

Granted, the voice wavers slightly, but the illusion is almost perfect, to the point of being deeply unsettling. For MIT, at least, the educational goal has been achieved, as confirmed by the Emmy Award for interactive documentary received on September 29. For the record, the video manipulation is based on the real speech in which Nixon announced his resignation in the wake of the Watergate affair…

“We will have a real-time voice converter by the end of the year.”

Alex Serdiuk, CEO of Respeecher

to franceinfo

What is new, then, is this voice: it is neither an impersonation of Richard Nixon nor a montage of television appearances or recorded speeches. This is near-perfect voice cloning, built on the voice of an actor who read the speech as Nixon would have.

“Our technology is based on deep learning and artificial intelligence,” explains Alex Serdiuk, co-founder and CEO of Respeecher, in Kiev. “Cloning goes through a training phase. We feed the system as many audio recordings of the target voice as possible, in this case Richard Nixon’s. During this time, the computers keep churning. This phase took almost three months for Nixon. Now, fifteen days are enough. Once the system is trained, voice conversion takes only a few minutes.”

Progress is such that Respeecher already has a prototype that works in real time. “You speak into a microphone and the cloned voice comes out half a second later,” says Alex Serdiuk. “We have already shown this prototype to several customers. I hope that by the end of the year we will have a real-time voice converter up and running.”

The technology is impressive, but it also sends a chill down the spine. How can one not imagine someone with bad intentions picking up the phone and passing themselves off, with a cloned voice, as the French President, the Pope or Kim Jong-un?

That is not enough to alarm Jean-Marc Dumontet, producer of “C’est Canteloup” on TF1, a show built entirely on visual deepfakes since the start of the 2020 season. Listening to him, one senses that the showman has become one of the leading specialists on the subject. Voice cloning does not worry him either:

“I don’t believe in these crude falsifications that could mislead the public. If François Hollande announced that he supports Eric Zemmour, between news channels and social networks, it would be denied and condemned within 45 seconds. And I think it obviously would not serve François Hollande, but it would not serve Eric Zemmour either. So I don’t see the point, except for the gag, the antics, the fun and the entertainment!”

“The deepfake takes a long time to achieve a satisfactory result.”

Jean-Marc Dumontet, producer of “C’est Canteloup” on TF1

to franceinfo

Besides, these technologies, both visual and vocal, remain extremely complex to master:

“The deepfake takes a long time to polish characters,” adds Jean-Marc Dumontet. “Our technology is excellent today because we have been using it daily for two years. Even I, if I wanted to fabricate a convincing Emmanuel Macron speech, would need a lot of time.”

Although Alex Serdiuk has exclusive technology, he knows the danger is real:

“Sooner or later, voice cloning will fall into the wrong hands. So educating the public is fundamental. That’s why we got involved in the Nixon project. And as far as we’re concerned, we don’t launch any project at a client’s request without written permission from the person whose voice is to be cloned, or from their descendants.”

From Kiev, however, he denies having opened Pandora’s box: “It’s just a tool, like Photoshop, like the Internet. Photoshop is used around the world to create posters and magazines, not just to make images lie. The Internet, too, has a dark side, but it is a technology without which we couldn’t be talking to each other right now. So it’s just a tool that requires education.”

This technology can be frightening, but it also has another application that raises questions of a different kind: bringing late stars such as Michael Jackson or Whitney Houston back to the stage. Their return as holograms was already under way before this new voice cloning technique appeared, but the technique opens up new possibilities.

With cloned voices, these translucent figures could sing not only their hits but also new songs written after their death. This is pure speculation: Alex Serdiuk stays silent about his current projects, but nothing prevents us from imagining this kind of use.

Respeecher’s technology has already been used in this kind of context. Jon Favreau, creator of The Mandalorian, the Star Wars series on Disney+, entrusted Respeecher with making the voice of Mark Hamill, alias Luke Skywalker, sound 40 years younger for the character’s appearance as a young man in the season 2 finale. The illusion was so convincing that no fan noticed the slightest anomaly in the voice. Disney+ kept quiet and waited almost a year before revealing… that it was a cloned voice.
