What Is Deepfake and How to Use It | DiscoverDataScience.org (2024)

Deepfakes are synthetic media created by machine-learning algorithms. They are named for the deep-learning methods used in the creation process and the fake events they depict.

Deepfake methods intersect disciplines and industries, from computer science and programming to visual effects, computer animation, and even neuroscience. They can be convincingly realistic and difficult to detect when done well and with the aid of sophisticated and powerful technologies.

But ultimately, machine learning is a foundational concept for data scientists, and as such, it offers an interesting area of study in the context of deepfakes and the predictive models used to create them. The training methods, algorithmic structures and synthetic output of these models offer insight into deep learning and data.

A Brief History of Deepfake Technology

In 2017, a Reddit user by the name of “deepfakes” posted pornographic videos created with face-swapping technology that replaced the original subjects’ faces with those of known celebrities.

Though they crop up in a variety of applications, deepfakes have been used within the porn industry more than any other to date. A 2019 report released by Amsterdam-based cybersecurity firm Sensity, formerly Deeptrace, found that “nonconsensual deepfake pornography accounted for 96% of the total deepfake videos online.”

This, however, is not where the deepfake story begins, ends, or best succeeds.

Deep-learning technology, including rudimentary versions of the models that make deepfakes — also known as synthetic media — has existed for decades, but the limited graphics processing power of computers at that time made most applications cumbersome and impractical.

According to freeCodeCamp contributor Nick McCullum, the cognitive psychologist and computer scientist Geoffrey Hinton contributed significantly to the study of deep learning through his work on the artificial neural network.

Hinton’s artificial neural network, an integral component of advanced deepfake techniques used today, was intended to closely resemble the architecture of the human brain, relaying signals through layers of nodes that process large amounts of data to learn and classify information.

Similar to the way neurons in the human brain create meaning as they process the data they receive, artificial neural networks, or ANNs, pass raw data (noise) from their input layers to their middle (hidden) layers and finally to the output layer.

As we’ll see when we get to the section on how to create a deepfake video, image, or audio by way of artificial intelligence deep-learning models, the most accurate synthetic media outputs are those that result from a large volume of high-quality data.

For example, some of the most popular deepfakes have come from visual effects specialist Chris Ume, who offered a peek behind the curtain of his unsettlingly realistic viral deepfakes of Tom Cruise on TikTok.

In an interview with Science Weekly, Ume explained that such sophisticated deepfakes require “as much data as possible: pictures, videos, anything you can find. And then you scrub through them and you clean it up so you only have the best of the best.”

This abundance of available data is a big part of what makes the Tom Cruise videos so uncannily authentic. The actor has been filmed and photographed for nearly 40 years, so the sheer volume of data that could be used for training makes the output — i.e., the deepfake — a stunningly accurate representation.

“The important thing is you cover all angles. You have as much expression as possible and even try to have a lot of different light angles, so the machine knows how Tom’s face reacts in certain scenes,” said Ume.

How to Create a Deepfake Video

There are several ways to make deepfake videos. For a piece of synthetic media to qualify as a true deepfake, it must use deep-learning training techniques to achieve the goal of facial manipulation. This includes altering expressions, swapping the faces of two real people, or generating a nonexistent human face from a dataset that includes thousands of images of real people.

Recalling Ume’s explanation of the resources needed to make realistic synthetic media, we can deduce that some methods are more precise and exacting than others.

In addition to massive amounts of pre-training data in the form of random faces and the training data required for Cruise’s face specifically, Ume attributes the authenticity of his deepfakes to the performance of professional actor Miles Fisher.

Fisher has perfected Cruise’s gestures, mannerisms and facial expressions, which provided the destination videos that Ume used in his deepfakes. Ume compared the current deepfake technology to that of Photoshop, telling host Alex Hern that, just as it takes professional-level skills to create superior images with Photoshop, you need a high level of skill and experience to generate undetectable deepfakes.

“You can’t do it by just pressing a button,” Ume told The Verge in a recent interview. “That’s important, that’s a message I want to tell people.”

All of this is to say that high-quality synthetic media require impeccable data for both the source and target media to effectively train the models.

It’s also worth noting that for regular people, this amount of data does not exist in the wild.

Tech and Gear

To create a deepfake on the level of the videos found on Ume’s @deeptomcruise TikTok account, you would need a high-powered machine and GPU.

You can find several no-code apps, websites and open-source software that allow for facial manipulation in one of two categories: facial expression manipulation and facial identity manipulation. The latter is commonly known as “face swapping.”

Commercial Apps and Websites for Facial Manipulation

FaceSwap
Face2Face
Reface
Deepfakes Web
DeepFaceLab

In addition to these tools, we’ve seen a deluge of shallowfakes that employ methods as simple as slowing or accelerating audio and video or mislabeling media with the intent to deceive.

In 2019, human rights campaigner and Witness program manager Sam Gregory raised awareness of the dangers of shallowfakes in a speech he gave to an EmTech Digital audience.

“By these ‘shallowfakes’ I mean the tens of thousands of videos circulated with malicious intent worldwide right now, crafted not with sophisticated AI, but often simply relabeled and re-uploaded, claiming an event in one place has just happened in another,” Gregory said.

From a data science and machine-learning standpoint, shallowfakes are of little value, but it’s important to note their existence and understand the difference between a shallowfake and a deepfake.

The artificial intelligence and deep-learning technology currently used for deepfakes typically involve generative adversarial networks, or GANs, and autoencoders.

The Science Behind Deepfakes

Data scientists interested in the implications of deepfake technology for private enterprise, government entities, cybersecurity, and public safety can learn a lot from studying deepfake methods and the science behind them.

In the face of evolving deep-learning models, it is becoming more crucial for researchers, companies, and world leaders to develop the skills and resources to address the potential threat posed by harmful synthetic media.

Government agencies and large corporations are invested in the advancement of deepfake detection. Thanh Thi Nguyen and colleagues authored a paper titled “Deep Learning for Deepfakes Creation and Detection,” which emphasized the importance of deepfake detection.

“To address the threat of face-swapping technology or deepfakes, the United States Defense Advanced Research Projects Agency (DARPA) initiated a research scheme in media forensics (named Media Forensics or MediFor) to accelerate the development of fake digital visual media detection methods.”

The ability to create and detect deepfakes will become a more valuable skill set as the technology advances and the potential for nefarious uses of deep learning escalates.

Autoencoders

An autoencoder is an unsupervised neural network that can reduce the dimensionality of raw data and generate an output that replicates its input.

Autoencoders consist of encoders and decoders. When data is fed through the first layer — the input layer — of the autoencoder’s neural network, the encoder compresses the image and feeds it to the decoder. The decoder then attempts to reconstruct the original data.

Deepfakes leverage autoencoders by training two network pairs, one encoder-decoder pair for the source-image dataset and another for the target-image dataset. The pairs share the encoder network, which allows the encoder to learn the structure of a human face. When the source image then passes through the decoder that has been trained for the target image, it synthesizes the two images in the reconstruction process.
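To make the shared-encoder arrangement concrete, here is a minimal sketch of the data flow in plain Python. It is an illustration only, not the implementation of any tool named in this article: the class name FaceSwapAutoencoder, the layer sizes, and the flat list standing in for an image are all assumptions, and the weights are random and untrained, so the output is not a usable face. What it shows is the wiring: one encoder feeds two decoders, and a swap is simply the source image decoded by the target decoder.

```python
import random


def make_layer(n_in, n_out, rng):
    """Return an n_out x n_in weight matrix filled with small random values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, vector):
    """Multiply a weight matrix by a vector (a linear layer with no bias)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


class FaceSwapAutoencoder:
    """Hypothetical toy model: one shared encoder, one decoder per identity."""

    def __init__(self, image_size=64, latent_size=8, seed=0):
        rng = random.Random(seed)
        # The encoder is shared by both identities.
        self.encoder = make_layer(image_size, latent_size, rng)
        # Each identity gets its own decoder.
        self.decoders = {
            "source": make_layer(latent_size, image_size, rng),
            "target": make_layer(latent_size, image_size, rng),
        }

    def encode(self, image):
        """Compress an image into a smaller latent vector."""
        return apply_layer(self.encoder, image)

    def reconstruct(self, image, identity):
        """Encode an image, then decode it with the chosen identity's decoder."""
        return apply_layer(self.decoders[identity], self.encode(image))

    def swap(self, source_image):
        """Face swap: encode the source image, decode with the target decoder."""
        return self.reconstruct(source_image, "target")


model = FaceSwapAutoencoder()
source_face = [0.5] * 64          # stand-in for a flattened 8x8 image
latent = model.encode(source_face)
swapped = model.swap(source_face)
print(len(source_face), len(latent), len(swapped))
```

In a real system each encoder-decoder pair would first be trained to reconstruct its own dataset, and the layers would be convolutional rather than single linear maps.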

Generative Adversarial Networks — GANs

A generative adversarial network, or GAN, is a machine-learning method in which two neural networks, a generator and a discriminator, compete to boost their levels of accuracy.

In this model, which is often referred to as a zero-sum game, the generator converts random noise into a synthetic image that imitates the images in a training dataset. This synthetic image is added to a stream of real images that is then fed to the discriminator. The job of the discriminator is to differentiate the real images from the synthetic images.

The goal of a neural network is to minimize errors. In the case of deepfakes, this means minimizing the difference between the fake image and the real images. To achieve this result, the process is repeated with model-weight adjustments until the output reaches the desired level of accuracy.
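The alternating contest can be sketched with a deliberately tiny example in which single numbers stand in for images. Everything here is an assumption made for illustration: the generator only shifts noise by one parameter (theta), the discriminator is a single sigmoid unit (w, c), the real data is drawn from a normal distribution centered at 3.0, and the generator uses the common non-saturating loss. The gradients are written out by hand so the weight adjustments are visible.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def generator(z, theta):
    """Turn a noise sample z into a fake sample by shifting it by theta."""
    return z + theta


def discriminator(x, w, c):
    """Estimated probability that sample x is real."""
    return sigmoid(w * x + c)


def train_step(w, c, theta, x_real, z, lr=0.1):
    """One alternating update: discriminator first, then generator."""
    # Discriminator step: minimize -log D(real) - log(1 - D(fake)).
    x_fake = generator(z, theta)
    p_real = discriminator(x_real, w, c)
    p_fake = discriminator(x_fake, w, c)
    grad_w = -(1.0 - p_real) * x_real + p_fake * x_fake
    grad_c = -(1.0 - p_real) + p_fake
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: minimize -log D(fake) against the updated discriminator.
    p_fake = discriminator(generator(z, theta), w, c)
    grad_theta = -(1.0 - p_fake) * w
    theta -= lr * grad_theta
    return w, c, theta


rng = random.Random(0)
w, c, theta = 0.0, 0.0, 0.0
for _ in range(2000):
    x_real = rng.gauss(3.0, 1.0)   # assumed "real" data distribution
    z = rng.gauss(0.0, 1.0)        # random noise fed to the generator
    w, c, theta = train_step(w, c, theta, x_real, z, lr=0.05)

print("generator shift:", theta, "discriminator weights:", w, c)
```

The discriminator's gradient pushes it to separate real from fake samples, while the generator's gradient pushes theta in whatever direction the discriminator currently rates as more real. Even in a toy like this the two sides can oscillate rather than settle, which is one reason GAN training is considered difficult.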

First Order Motion Model

The first order motion model is an image animation technique that animates a still source image according to the motion in a driving video. Its source code accompanies the paper First Order Motion Model for Image Animation.

According to the authors, the model “is trained to reconstruct the training videos by combining a single frame and a learned latent representation of the motion in the video.”

Dimitris Poulopoulos, a machine learning engineer and contributor to Towards Data Science, summarized the first order motion model and provided an interactive example in which he used the source code to create a shell script, and then applied the model weights, a YAML configuration file, a source image, and a driving video.

How Neural Networks Make Deepfakes Possible

We’ve referenced neural networks throughout this article. That’s because neural networks are the basis for deepfake technology.

Neural networks make machine learning possible through a “feed-forward” structure of interconnected nodes. These nodes mirror the neurons in the human brain. And like the human brain, a computer can learn to perform a task through training.

In fact, researchers at MIT in 2016 reported that their computational model of the human brain’s face-recognition mechanism generated a spontaneous reproduction of “invariant representations” of faces.

They had designed and trained a machine-learning scheme for the model and discovered that “the trained system included an intermediate processing step that represented a face’s degree of rotation — say, 45 degrees from center — but not the direction — left or right.”

According to the researchers, this step, which wasn’t built into the algorithm and appeared to mimic the human brain in its recognition of faces and objects, was “an indication that their system and the brain are doing something similar.”

Christof Koch of the Allen Institute for Brain Science considered the findings significant.

“In this day and age, when everything is dominated by either big data or huge computer simulations, this shows you how a principled understanding of learning can explain some puzzling findings,” said Koch.

Neural networks of half a century ago consisted of fewer than five layers. Today, the advanced capabilities of graphics processing units can power neural networks with depths of 50 layers or more.

For data scientists to fully understand deep learning, they need a solid foundational knowledge of neural networks.

The process is fairly straightforward.

The nodes throughout the layers of a neural network receive input signals and perform calculations that result in output signals that the nodes then feed forward to the next layer. The more layers of nodes, the deeper the network.

The connection that allows the transmission of these signals, the synapse, is associated with a weight that determines the influence of the node on the final output. During training, the synapse weights are adjusted repeatedly.
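A short sketch can make the feed-forward pass and the weight adjustment concrete. The layer sizes, weight values, sigmoid activation, squared-error objective, and function names below are all assumptions chosen for illustration. The first part passes a signal through a hidden layer to an output layer, and the second part repeatedly adjusts the weights of a single node so its output moves toward a target.

```python
import math


def sigmoid(x):
    """Numerically stable logistic activation."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def layer_forward(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies sigmoid."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(sigmoid(total))
    return outputs


def network_forward(inputs, layers):
    """Feed the signal forward through every layer in order."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


def adjust_weights(inputs, target, weights, bias, lr=0.5):
    """One gradient-descent step on squared error for a single sigmoid node."""
    output = layer_forward(inputs, [weights], [bias])[0]
    delta = (output - target) * output * (1.0 - output)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


# A network with 2 inputs, 3 hidden nodes, and 1 output node (made-up weights).
layers = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.0, 0.0]),
    ([[0.6, -0.1, 0.8]], [0.0]),
]
prediction = network_forward([1.0, 0.5], layers)
print("prediction:", prediction)

# Train one node so that the input [1.0, 0.5] produces an output near 1.0.
weights, bias = [0.0, 0.0], 0.0
for _ in range(100):
    weights, bias = adjust_weights([1.0, 0.5], 1.0, weights, bias)
trained_output = layer_forward([1.0, 0.5], [weights], [bias])[0]
print("output after training:", trained_output)
```

With all weights at zero the node outputs exactly 0.5, and each adjustment nudges that output toward the target. Full networks apply the same idea to every weight in every layer through backpropagation.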

According to McCullum, weights “are a very important topic in the field of deep learning because adjusting a model’s weights is the primary way through which deep learning models are trained.”

The output layer of the neural network makes predictions based on the calculations of the hidden layers in a deep network. The training process consists of the network determining which input values should be used in the next layer’s calculations.

In order for the computer to make these determinations, the program must be soft coded to enable the computer to interpret the problem and solve it on its own.

Uses: The Good & Bad

The ethical implications of deepfake technology have been debated for the last several years. While there are many beneficial uses for image and video manipulation, researchers will need to stay on top of this evolving branch of artificial intelligence and continue to hone their skills to guard against harmful applications employed by bad actors.

The threat inherent in such powerful technology is real, and it increases with each new development. As deepfakes improve and become less detectable, the risk to our security expands in scope and potential impact.

According to Nguyen and colleagues, deepfakes become a threat to the world when they are used to falsify the speech and actions of world leaders.

“Deepfakes therefore can be abused to cause political or religion tensions between countries, to fool public and affect results in election campaigns, or create chaos in financial markets by creating fake news,” the authors posited.

Constructive Applications:

Entertainment
Intercultural communication
Disruption of extremist groups

Laws and policies have been implemented to thwart the exploitation of deepfake methods, but critics say many of these rules don’t go far enough.

After Facebook announced its policy to ban deepfakes, The Guardian reported that “Facebook did not give a reason as to why it limited its policy exclusively to those videos manipulated using AI tools, but it is likely that the company wanted to avoid putting itself in a situation where it had to make subjective decisions about intent or truth.”

And according to IEEE Spectrum, “Identity fraud was the top worry regarding deepfakes for more than three-quarters of respondents to a cybersecurity industry poll by the biometric firm iProov.”

Data scientists who work in the financial services industry will likely be at the forefront of AI for detecting deepfakes and performing digital media forensics.

Destructive Applications:

Bullying
Scams
Extortion
Identity fraud

As human rights activist Mark Latonero argued, we need data scientists and technology companies to take a proactive approach to the inevitable erosion of trust that will follow the deepfake evolution. If the notion that we, as mere humans, can’t trust anything we see takes hold, it will undermine democracy.

“Now is really the time for companies, researchers, and others to build these very strong connections to civil society, and the different country offices where your products might launch. … Engage with the people who are closest to the issues in these countries. Build those alliances now,” he said. “When something does go wrong — and it will — we can start to have the foundation for collaboration and knowledge exchange.”