A new tool could guard against deepfake voice scams

It should make it much harder to trick people with audio deepfakes


Scammers can use AI to create deepfake mimics of people’s voices. But researchers are fighting back. They’ve found a way to trick the tricksters and keep scammers from impersonating your voice.

Moor Studio/DigitalVision Vectors/Getty Images Plus

This is another in a year-long series of stories identifying how the burgeoning use of artificial intelligence is impacting our lives — and ways we can work to make those impacts as beneficial as possible.

Imagine scrolling through TikTok and seeing famous YouTuber MrBeast pop up. He says he’s giving away brand-new iPhones. “Click the link below to claim yours now!” Do you click? Maybe. It sure looks and sounds like MrBeast. But it’s actually a deepfake — a phony clip created by artificial intelligence, or AI. Last October, this TikTok clip tricked some fans into sharing personal details. They also paid shipping fees for a phone that would never arrive. But a new tool — AntiFake — could help prevent such scams.

Most deepfake defenses simply scan existing video or audio files to try to see if they’re real. AntiFake is different. It aims to protect recordings of our voices so that deepfake AI models have more trouble learning to mimic them in the first place.

In 2023, a Canadian woman got a scary call. A voice that sounded like her grandson’s said he had been in a car accident and needed money to stay out of jail. She believed the story and paid up. But it was a scam. AI had created a deepfake of her grandson’s voice. Tero Vesalainen/iStock/Getty Images Plus

That could make it harder for AI to create voices for deepfake videos. It also could make it harder to create deepfake voices for phone scams. In some of those scams, crooks have used deepfake voices to call someone’s family members asking for money. In others, scammers have used the same technique to break into people’s voice-protected bank accounts.

“We have seen that attackers use [AI tools] to conduct financial fraud or trick our family and friends,” says Zhiyuan Yu. “This is a real risk.” Yu is a PhD student in computer science at Washington University in St. Louis, Mo. He worked with the university’s Ning Zhang to build AntiFake.

The computer scientists described their research at the 2023 ACM Conference on Computer and Communications Security. It was held in Copenhagen, Denmark, last November.

AntiFake complements past work by other groups that guards images against AI deepfake copycats. “How do we make sure society is using AI responsibly?” asks Zhang. “That’s the ultimate goal of our work.”

In early April, this tool won a contest! Ning Zhang’s group was one of four that shared a $35,000 first prize in the Voice Cloning Challenge, run by the U.S. Federal Trade Commission. The contest asked researchers for creative ways to help protect people from scams that use deepfaked voices.

Make some noise

All it takes to create a deepfake voice is real audio or video of someone talking. Often, an AI model can learn to mimic someone’s voice based on just 30 seconds or so of speech. It does this by creating something called an embedding.

This is basically a “series of numbers,” says Yu. The numbers act like an address for the speaker’s identity in a vast mathematical map of all voices. Voices that sound similar are located closer together in this map.
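To picture what an embedding is, here’s a rough Python sketch. The three-number vectors and the similarity check are made up for illustration; real speaker embeddings come out of a neural network and hold hundreds of numbers.

```python
import numpy as np

# Toy "embeddings": each voice boiled down to a short list of numbers.
# Real speaker embeddings are produced by a neural network and are much
# longer; these three-number vectors are invented just for illustration.
alice_clip1 = np.array([0.9, 0.1, 0.4])
alice_clip2 = np.array([0.8, 0.2, 0.5])   # a second recording of the same person
bob_clip    = np.array([-0.3, 0.9, 0.1])  # a different speaker

def similarity(a, b):
    """Cosine similarity: values near 1.0 mean the voices sit close together on the 'map'."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(similarity(alice_clip1, alice_clip2))  # high: same speaker, nearby addresses
print(similarity(alice_clip1, bob_clip))     # lower: different speaker, farther away
```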

People, of course, don’t use this sort of map to recognize voices. We pay more attention to some sound-wave frequencies and less to others. The AI model, though, uses all of the frequencies to create good embeddings.

AntiFake protects voice recordings by adding some noise to the frequencies that people pay less attention to. Human listeners can still understand the speech. That noise, however, can mess up an AI model’s ability to create a good embedding of the voice. That AI ends up creating an address to the wrong part of the map. So any deepfake generated with this embedding won’t sound like the original voice.
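Here’s a toy Python sketch of where such noise could hide. It sprinkles faint random noise only into the higher frequencies, which our ears notice less. The real AntiFake tool does not use random static; it computes its noise carefully so the AI’s embedding lands in the wrong place. This sketch just illustrates the frequency idea.

```python
import numpy as np

def add_faint_high_frequency_noise(audio, sample_rate, cutoff_hz=6000, strength=0.002):
    """Add weak random noise only above cutoff_hz, where human hearing is less
    sensitive. A toy stand-in for AntiFake's carefully computed perturbation."""
    spectrum = np.fft.rfft(audio)                        # move the clip into frequency space
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    high = freqs > cutoff_hz                             # pick the less-noticeable band
    noise = np.random.randn(high.sum()) + 1j * np.random.randn(high.sum())
    spectrum[high] += strength * np.abs(spectrum).max() * noise
    return np.fft.irfft(spectrum, n=len(audio))          # back to a playable waveform

# Example: one second of stand-in "speech" at 16 kHz.
sample_rate = 16000
clip = 0.1 * np.random.randn(sample_rate)
protected_clip = add_faint_high_frequency_noise(clip, sample_rate)
```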

People can hear background noise in a file protected with AntiFake. That’s not ideal. But adding this level of noise allows the tool to protect against a wide variety of different deepfake attacks. If they knew which AI model a scammer was going to use to mimic someone’s voice, Yu says, his team could design more specific, subtler noise protection — sounds that people can’t hear.

But in the real world, he says, “there’s no way you will know” what tools scammers will use. Defenders have to be ready for anything.


Testing, testing

To test AntiFake, Yu’s team posed as deepfake attackers. They used five different deepfake AI models to create 60,000 voice files mimicking different people. Next, they checked which deepfakes could fool human listeners and voice-authentication systems (like the ones for some bank accounts). The researchers picked 600 deepfake clips that fooled both people and machines.

Next, the team added AntiFake protection to the 600 voice clips those deepfakes had been based on. The scientists then sent the protected files back through the five deepfake AI models. This time, those models didn’t do such a good job. More than 95 percent of the new deepfake samples no longer tricked people or voice-authentication systems.
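The shape of that test can be sketched in a few lines of Python. All the function names here (protect, clone_voice, fools_verifier) are hypothetical stand-ins for AntiFake, the deepfake models and the human or machine checks the team used; they are not real library calls.

```python
def protection_rate(clips, protect, clone_voice, fools_verifier):
    """Share of clips whose deepfake clones stop fooling the checks once the
    source recording is protected. All callables are hypothetical stand-ins."""
    blocked = 0
    for clip in clips:
        fake = clone_voice(protect(clip))       # clone from the protected recording
        if not fools_verifier(fake, clip):      # the fake no longer passes as the speaker
            blocked += 1
    return blocked / len(clips)

# In the study, 600 protected clips were re-attacked, and more than 95 percent
# of the resulting deepfakes failed to fool people or voice-authentication
# systems. In these terms, protection_rate(...) came out above 0.95.
```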

The voice-authentication systems used to test the AntiFake-protected files are ones that banks and other companies actually use, notes Shimaa Ahmed. “That was very good.” Ahmed is an expert in computer security at the University of Wisconsin–Madison. She did not take part in building AntiFake.

She can envision people using AntiFake to protect their identities. “There is value in this,” she says. “It can be integrated with many applications.” For example, a social media site might someday prompt users to check a box if they’d like to add protection to a file they’re uploading.

But adding a bit of noise to protect voices concerns Ahmed. “If my voice is my job, for example, I would like to have my voice as it is,” she says. So people who are voice actors or singers might hesitate to guard their voices this way.

“This is actually a problem that we want to address in the future,” Yu notes. Instead of adding noise, he thinks AntiFake could modify the rhythm or tone of a voice. The voice would sound clear and unaltered to listeners. But it would contain carefully hidden signals to trick deepfake AI models.

Someone has to apply AntiFake to a voice file to protect it. So the technique can’t protect audio that someone has already posted or shared. It also can’t stop a scammer from secretly recording someone as they speak. But it’s an intriguing new tool to defend against deepfakes, says Ahmed — which is a “very challenging” task. People need all the help they can get to protect their identities online.

Note: This story has been updated to include mention that AntiFake shared top prize in a federal competition for tech to limit voice cloning.

This is one in a series presenting news on technology and innovation, made possible with generous support from the Lemelson Foundation. 

Kathryn Hulick is a freelance science writer and the author of Strange But True: 10 of the World's Greatest Mysteries Explained, a book about the science of ghosts, aliens and more. She loves hiking, gardening and robots.