8 Years of AI Face Swap Evolution - From Deepfake to ProbFace, Decoding Technical Boundaries

This article is compiled from public literature on AI vision technology, including "Development and Evolution of Generative Adversarial Networks in Face Swap Technology", published in the Journal of Artificial Intelligence Research (JAIR) in 2024, and industry research reports released by Gartner. Deepfake-style face swapping emerged around 2016 to 2017 and brought AI face swap technology into public view. Over the roughly 8 years since, the technology has changed dramatically, from the rough output of early Deepfake to the high realism of ProbFace today. As a researcher who has worked on face generation and synthesis security for 22 years, I have witnessed this entire evolution. In this article I trace the 8-year evolution of AI face swap, compare Deepfake with ProbFace, and describe the current technical boundaries of the technology, guided by the principle that "technology has boundaries, use has bottom lines".

First, let's review the initial stage of AI face swap: the emergence of Deepfake around 2016 to 2017. The early tools relied on a simple generative model (the widely used open-source implementations were built on a shared-encoder autoencoder, with GAN-based variants arriving later) that could extract only dozens of facial key points, so the results were rough. The most obvious problems were facial misalignment, heavily blurred edges, and mismatched lighting. The swapped face often did not fit the facial contour of the template, its edge looked like a "cutout", and its lighting was inconsistent with the background. Early Deepfake also adapted poorly to expressions, so the swapped face was often stiff and showed obvious "synthetic traces".

These problems stemmed from the technical limits of the time. Early models had insufficient training data, low facial feature extraction accuracy, and no effective edge fusion or light adaptation algorithms. Moreover, early Deepfake was used mainly for malicious purposes such as forging videos, which made the whole AI face swap industry highly controversial. The public perceived AI face swap as "fake" and "unreliable", and the technical boundaries were obvious: the technology could achieve only simple face replacement and could not meet requirements for realism and naturalness.

As AI technology developed, face swap technology entered a period of rapid iteration. From 2018 to 2020, major technology companies and research institutions invested in face swap research and development, improving GAN models, increasing the amount of training data, and optimizing feature extraction algorithms. During this period the face swap effect improved significantly: facial misalignment was alleviated and edge fusion became more natural, but problems such as stiff expressions and mismatched lighting remained.

It was not until ProbFace emerged that AI face swap technology made a qualitative leap. In terms of technical implementation, the core of ProbFace's evolution is a dual improvement in "precision" and "naturalness", which addresses the main pain points of early face swap technology. Compared with early Deepfake, ProbFace makes three key technical improvements:

First, higher facial feature extraction accuracy. ProbFace uses a deep learning model that combines a CNN with a Transformer and can extract more than 1000 facial key points with an extraction accuracy of more than 98%, whereas early Deepfake could extract only dozens of key points. This improvement helps preserve facial details and expressions and reduces facial misalignment and stiff expressions.
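The article does not say how the 98% extraction accuracy is measured. One common way to score key point extraction is the fraction of predicted points that land within a small tolerance of the labeled position, with the error normalized by the distance between the eyes. The sketch below illustrates that idea only; the function name, the 5% tolerance, and the sample coordinates are assumptions, not ProbFace's evaluation protocol.

```python
import math


def landmark_accuracy(predicted, ground_truth, left_eye_idx, right_eye_idx,
                      tolerance=0.05):
    """Fraction of key points whose error is within `tolerance`
    of the inter-ocular distance (hypothetical metric and threshold)."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("key point lists must be non-empty and equal in length")

    # Normalize by eye-to-eye distance so the score is independent of face size.
    inter_ocular = math.dist(ground_truth[left_eye_idx],
                             ground_truth[right_eye_idx])
    if inter_ocular == 0:
        raise ValueError("eye key points must not coincide")

    hits = sum(
        1
        for p, g in zip(predicted, ground_truth)
        if math.dist(p, g) / inter_ocular <= tolerance
    )
    return hits / len(predicted)


# Illustrative data: four labeled points and four predictions (made up).
truth = [(0, 0), (100, 0), (50, 50), (50, 80)]
guess = [(1, 0), (100, 2), (50, 50), (50, 90)]
score = landmark_accuracy(guess, truth, left_eye_idx=0, right_eye_idx=1)
```

With these made-up points, three of the four predictions fall within the tolerance, so `score` is 0.75. A stricter or looser tolerance changes the number, which is why an accuracy figure is only meaningful alongside the threshold used.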

Second, a dynamic light adaptation algorithm. As we mentioned in the previous two articles, this algorithm adjusts the light and shadow of the uploaded face to match the lighting environment of the template, so that the face blends more naturally with the background. Early Deepfake had no such function, which led to seriously mismatched lighting.
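The idea of matching a face to the template's lighting can be illustrated with a simple statistics transfer: shift and scale the face's brightness values so that their mean and spread match those of the template region. This is a minimal sketch of a generic technique, not ProbFace's dynamic light adaptation algorithm, whose details are not public here; the function name and the sample values are assumptions.

```python
from statistics import mean, pstdev


def match_luminance(face, template):
    """Rescale face luminance values (0-255) so their mean and standard
    deviation match the template's. A generic statistics transfer,
    assumed here purely for illustration."""
    f_mean, f_std = mean(face), pstdev(face)
    t_mean, t_std = mean(template), pstdev(template)

    # A flat face region has no contrast to rescale; only shift it.
    scale = t_std / f_std if f_std > 0 else 1.0

    # Clamp to the valid 8-bit range after the transfer.
    return [min(255.0, max(0.0, (v - f_mean) * scale + t_mean)) for v in face]


# A brightly lit face pasted into a darker, lower-contrast scene (made-up values).
bright_face = [100, 120, 140]
dark_scene = [40, 50, 60]
adapted = match_luminance(bright_face, dark_scene)
```

Here the adapted values come out close to 40, 50, and 60: the face keeps its internal light-to-dark ordering but takes on the template's overall brightness and contrast. Real systems typically work per color channel and per region rather than on a single global statistic.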

Third, optimized edge fusion. ProbFace introduces a multi-scale edge detection model and dynamic edge fusion technology, which extract the edge contour of the face accurately and fuse it naturally with the template background, avoiding blurred edges. Early Deepfake's edge fusion was very simple and left obvious "paste traces".
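To see why edge handling matters, compare a hard paste with a feathered blend. In a feathered blend, each output pixel is alpha * face + (1 - alpha) * template, where alpha is 1 inside the face and falls off gradually across the boundary. The sketch below shows single-scale feathering on one row of pixels; it is a simplified stand-in, not the multi-scale edge detection or dynamic fusion the article attributes to ProbFace, and all names and values are assumptions.

```python
def feather_mask(width, face_start, face_end, feather):
    """Alpha mask for one pixel row: 1.0 inside [face_start, face_end),
    falling linearly to 0.0 over `feather` pixels outside the face."""
    mask = []
    for x in range(width):
        if face_start <= x < face_end:
            distance = 0
        elif x < face_start:
            distance = face_start - x
        else:
            distance = x - face_end + 1
        mask.append(max(0.0, 1.0 - distance / (feather + 1)))
    return mask


def blend_row(face_row, template_row, mask):
    """Per-pixel alpha blend: alpha * face + (1 - alpha) * template."""
    return [a * f + (1 - a) * t
            for f, t, a in zip(face_row, template_row, mask)]


# One bright face pixel pasted into a darker row, with a 1-pixel feather.
mask = feather_mask(width=7, face_start=3, face_end=4, feather=1)
blended = blend_row([200] * 7, [100] * 7, mask)
```

With these values the mask is 0, 0, 0.5, 1, 0.5, 0, 0, so the pixels next to the face land halfway between face and background instead of jumping from 100 to 200. Setting `feather=0` reproduces the hard "cutout" edge described for early Deepfake.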

According to data released by a public AI technology evaluation agency, ProbFace's face swap realism score is as high as 92 points (out of 100), while early Deepfake scores only 45 points. This gap illustrates how far AI face swap technology has progressed over the past 8 years. However, even with this progress, ProbFace still has technical boundaries, which are also the common bottlenecks of the current AI face swap industry.

The first technical boundary is adaptation to extreme scenarios. When the original face is at a 90-degree side angle, or the expression is extremely exaggerated (such as extreme laughter, crying, or anger), the accuracy of feature point extraction drops slightly, which may cause slight distortion. In addition, if the background of the original face is extremely complex (for example, many occlusions or chaotic colors), the algorithm may fail to separate the face from the background accurately, which reduces the realism of the face swap.

The second technical boundary is the real-time performance of video face swap. Although ProbFace uses model compression, high-definition video (1080p or above) may still incur a delay of about 0.5 to 1 second, so real-time face swap is not achievable. High-definition video involves a large amount of data, and the algorithm must extract and match feature points frame by frame, which places high demands on the computing power of the device.
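A little arithmetic makes the real-time constraint concrete. The sketch below computes the time budget per frame, the pixel throughput of a 1080p stream, and how many frames a given delay represents. The 30 frames per second figure is an assumption for illustration (the article does not state a frame rate), and the function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available to process one frame without falling behind."""
    return 1000.0 / fps


def pixels_per_second(width, height, fps):
    """Raw pixel throughput of an uncompressed video stream."""
    return width * height * fps


def is_real_time(per_frame_ms, fps):
    """True if per-frame processing fits inside the frame budget."""
    return per_frame_ms <= frame_budget_ms(fps)


def frames_behind(delay_s, fps):
    """How many frames of video a fixed delay corresponds to."""
    return round(delay_s * fps)


FPS = 30  # assumed frame rate, not stated in the article

budget = frame_budget_ms(FPS)                  # about 33.3 ms per frame
throughput = pixels_per_second(1920, 1080, FPS)
lag_low = frames_behind(0.5, FPS)              # the article's lower delay bound
lag_high = frames_behind(1.0, FPS)             # the article's upper delay bound
```

At the assumed 30 frames per second, a 1080p stream carries 62,208,000 pixels per second and leaves about 33 ms per frame, so a delay of 0.5 to 1 second means the output trails the input by 15 to 30 frames. Any per-frame pipeline slower than the budget, for example 50 ms, fails the `is_real_time` check.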

The third technical boundary is the detectability of "synthetic faces". Although ProbFace's face swap effect is very realistic, it can still be identified by professional forgery detection technology. For example, professional detection tools can find subtle differences between synthetic and real faces (such as abnormal skin texture or inconsistent pupil reflections), which is also an important means of preventing the abuse of face swap technology.
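The pupil reflection cue can be illustrated with a toy check. In a real photo, both eyes usually see the same light source, so the bright highlight sits at a similar offset from the pupil center in each eye. The sketch below measures how far the two offsets disagree. It is a simplified illustration under that assumption, not the method of any specific detection tool; the function names, coordinates, and the idea of a single-number score are hypothetical.

```python
import math


def highlight_offset(pupil_center, highlight_center):
    """Position of the specular highlight relative to the pupil center."""
    return (highlight_center[0] - pupil_center[0],
            highlight_center[1] - pupil_center[1])


def reflection_mismatch(left_pupil, left_highlight,
                        right_pupil, right_highlight, pupil_radius):
    """Distance between the two eyes' highlight offsets, in pupil radii.
    0.0 means the highlights agree; larger values are more suspicious."""
    left = highlight_offset(left_pupil, left_highlight)
    right = highlight_offset(right_pupil, right_highlight)
    return math.dist(left, right) / pupil_radius


# Consistent highlights: both up and to the right of the pupil center.
consistent = reflection_mismatch((100, 100), (103, 98),
                                 (160, 100), (163, 98), pupil_radius=10)

# Inconsistent highlights: opposite sides in the two eyes (made-up values).
inconsistent = reflection_mismatch((100, 100), (103, 98),
                                   (160, 100), (157, 102), pupil_radius=10)
```

The first pair scores 0.0 and the second scores roughly 0.72 pupil radii. A practical detector would combine many such cues and learn its thresholds from data, since head pose and nearby light sources can produce legitimate differences between the eyes.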

Over the past 8 years, AI face swap technology has evolved from "rough synthesis" to "high realism" and from "malicious use" to "positive application" (such as film and television production and short video creation). The emergence of ProbFace has promoted the standardized development of the industry, but technical boundaries still exist. As researchers, we keep optimizing the algorithm to push past these boundaries, but we also need to remind the public that technology has boundaries, and use has bottom lines.

In conclusion, the 8-year evolution of AI face swap from Deepfake to ProbFace has been a process of continuous improvement in technical precision and naturalness. ProbFace has addressed the main pain points of early face swap technology, but it is still subject to the industry's current technical bottlenecks. This article draws on public technical literature and industry data to lay out the evolution process and the technical boundaries objectively, without exaggeration or mythologization. Finally, I would like to remind everyone again: view AI capabilities rationally and do not believe in one-click perfect effects. Technology will keep developing, and using it in a standardized way is the precondition for realizing its value.
