Visual Culture

AI Researchers Created an Uncanny Video of the Mona Lisa Talking

Screenshot of Few-Shot Adversarial Learning of Realistic Neural Talking Head Models by Egor Zakharov via YouTube.

As we march toward a post-truth future bolstered by fake images, we can at least be entertained along the way. Last week, researchers from Samsung’s AI Center and the Skolkovo Institute of Science and Technology in Moscow released an explainer on how they’ve trained neural networks to transform still portraits into lifelike videos, calling them “talking head models.” One example turned Leonardo da Vinci’s Mona Lisa into a breathing, moving human. It’s fascinating to see the world’s most famous subject come to life, especially considering the painting’s much-debated and mysterious history. Dropping her static, signature smile for once, she looks quite contemporary. But it’s also jarring to consider how fast AI imaging is advancing, and what that means for our already contested media landscape.
Over the past three years, we’ve seen the eerie powers of AI algorithms serving up bogus faceswap videos ranging from hilarious to seriously disconcerting: Nicolas Cage replacing other actors in films, a Jordan Peele-voiced mimicry of Barack Obama, and famous actresses having their faces edited into incest porn. On the still-image side, tech company Nvidia has been hard at work producing lifelike, generated images of unassuming people who look like they could be the faces of future friendly neighborhood bots. Ever since Redditors coined the term deepfake in 2017 for AI-edited or AI-generated videos of humans, we’ve seen the writing on the wall; soon enough we won’t be able to trust our own eyes, and we will rely on the same companies funneling money into AI research to come up with solutions for discerning the truth in this new reality.
These neural networks from Moscow are fed a huge number of images showing the “facial landmarks” of humans, then trained to extract the particular features from a static portrait and recreate them in a video format. This can be done with one image, as with the Mona Lisa, but the resulting “person” will be limited in range of motion and emotion. However, if the goal is to create a new video of a real person and the algorithm can be trained on multiple frames from a source video, it can create a much more believable deepfake.
In order to pull off the effect with just one image, the neural networks rely on the huge dataset they are given, breaking down the images of real people into their features, codifying them, and re-using them to animate others. If you think a video made with one image doesn’t look quite as authentic as those made with multiple, not to worry—it won’t be long until the process is more refined.
It’s a common theme in sci-fi novels that if humans can do it, they will. As AI technology moves forward, “please consume responsibly” will be applicable for visuals, too, especially as the veracity of images presented as news becomes harder to establish. But in art and visual culture, AI poses some intriguing possibilities for interactivity and identity, as well as the age-old search for truth. At the very least, thinking about all of the world’s famous paintings and photographs coming to life is just plain cool. Maybe the next digital influencer won’t be Lil Miquela, but the Venus de Milo.
Jacqui Palumbo is Artsy’s Senior Editor, Visual Culture.