The new system enables “full control over the target by transferring the rigid head pose, facial expression and eye motion with a high level of photorealism.” Here, a source actor (the input) is used to manipulate a portrait video of a target actor (the output). Image: H. Kim et al., 2018
But it’s more than just facial expressions. The new technique transfers an array of movements, including the full 3D head position, head rotation, eye gaze, and eye blinking. The system uses AI in the form of generative neural networks to do the trick, taking data from the tracked face models and calculating, or predicting, photorealistic frames for the given target actor. Impressively, the animators don’t have to alter the graphics for existing body hair, the target actor’s body, or the background.
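For readers curious what “generative neural networks” means here: broadly, the approach renders a synthetic image of the target driven by the source’s pose, expression, and eye motion, and a conditional network translates that rendering into a photorealistic frame. The snippet below is a deliberately simplified, hypothetical PyTorch sketch of that rendering-to-frame idea; the layer sizes, class name, and training details are illustrative assumptions, not the authors’ actual architecture.

```python
# Minimal sketch (not the authors' code): a conditional encoder-decoder that
# maps a synthetic rendering of the target -- carrying the source actor's head
# pose, expression, and eye motion -- to a photorealistic output frame.
import torch
import torch.nn as nn

class RenderToVideoGenerator(nn.Module):
    def __init__(self, in_channels=3, out_channels=3, base=64):
        super().__init__()
        # Encoder: downsample the conditioning rendering into feature maps.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, 4, stride=2, padding=1),
            nn.BatchNorm2d(base * 2),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # Decoder: upsample back to a full-resolution RGB frame.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 2, base, 4, stride=2, padding=1),
            nn.BatchNorm2d(base),
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(base, out_channels, 4, stride=2, padding=1),
            nn.Tanh(),  # output pixel values in [-1, 1]
        )

    def forward(self, conditioning_frame):
        return self.decoder(self.encoder(conditioning_frame))

# Usage: feed a batch of rendered conditioning frames, get predicted frames.
generator = RenderToVideoGenerator()
rendered = torch.randn(1, 3, 256, 256)   # placeholder conditioning input
predicted_frame = generator(rendered)    # shape: (1, 3, 256, 256)
```

In the real system such a generator would be trained adversarially on footage of the target actor so its outputs match that person’s appearance; the toy network above only illustrates the input-to-output mapping.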
Secondary algorithms are used to correct glitches and other artifacts, giving the videos a slick, super-realistic look. They’re not perfect, but holy crap they’re impressive. The paper describing the technology was accepted for presentation at SIGGRAPH 2018 and published in the peer-reviewed journal ACM Transactions on Graphics.
Deep Video Portraits offers a highly efficient way to produce computer animation and to rework pre-existing acting performances with photorealistic results. The system could be used, for example, in audio dubbing when creating versions of films in other languages: if a film is shot in English, this tech could alter the actors’ lip movements to match dubbed French or Spanish audio.
Unfortunately, this system will likely be abused—a problem not lost on the researchers.
“For example, the combination of photo-real synthesis of facial imagery with a voice impersonator or a voice synthesis system, would enable the generation of made-up video content that could potentially be used to defame people or to spread so-called ‘fake-news’,” writes Zollhöfer at his Stanford blog. “Currently, the modified videos still exhibit many artifacts, which makes most forgeries easy to spot. It is hard to predict at what point in time such ‘fake’ videos will be indistinguishable from real content for our human eyes.”
Sadly, deepfake tech is already being used in porn, with early efforts to reduce or eliminate these invasive videos proving to be largely futile. But for the burgeoning world of fake news, there are some potential solutions, like watermarking algorithms. In the future, AI could be used to detect fakes, sniffing for patterns that are invisible to the human eye. Ultimately, however, it’ll be up to us to discern fact from fiction.
“In my personal opinion, most important is that the general public has to be aware of the capabilities of modern technology for video generation and editing,” writes Zollhöfer. “This will enable them to think more critically about the video content they consume every day, especially if there is no proof of origin.”
[ACM Transactions on Graphics via BoingBoing]