In 1997, Bregler, Covell, and Slaney published a paper describing a genuinely innovative and, at the time, unique piece of software: Video Rewrite. It automated what production studios could previously achieve only with hours upon hours of manual work, building on earlier research that synthesised realistic audio from text and samples and modelled lip movements in 3D space. The result was a system that combined these concepts to animate a speaker's mouth convincingly to match new audio.
The intention was to use the system to dub films, teleconference sessions, and create special effects. The actual applications of the technology would go far beyond this.
It wouldn’t be until 2016 that a further pair of papers, from the Technical University of Munich and the University of Washington, would demonstrate the creation of deepfakes on consumer hardware. The Face2Face project could realistically replace the mouth area in a target video with lip-synced animations. While it didn’t provide a voice, the means of doing so were well established and could be incorporated. The Synthesizing Obama project used the equivalent of a high-specced gaming machine of the time to create, in under an hour, a 66-second video of Obama speaking words he never had.
The first widely known use of these ideas was, as is often the case, to create fake pornography, often of celebrities, and post it online. The approach was rapidly extended beyond celebrities, with pornographic images and videos of private individuals created for revenge and blackmail purposes.
It wouldn’t be until 2020 that the first publicly documented fraud cases using deepfaked voice samples would emerge. Attackers used the technology to clone a target’s voice and place calls to companies with it. In October 2021, a successful fraud attempt using this approach came to light, with a company in the UAE losing $35 million to the attack. Given this success, it is no surprise that the tactic is only becoming more common as time goes on.
Deepfakes have also been used to limited effect (so far) in global politics. In March 2022, a video emerged of Volodymyr Zelensky calling for Ukrainians to put down their weapons and surrender to the Russian invasion. Fortunately, the fake was poorly made, with several visible flaws making it unconvincing to many. It is only a matter of time before such fakes become more common in the service of other malicious agendas, whether financial, criminal, or political, as well as more realistic and convincing.
We are at the point now where free phone applications can use this technology to replace faces in videos or photos and lip-sync public figures to popular songs. They are easy enough to use that there are social media accounts dedicated to sharing deepfake videos of celebrities, such as the popular @deeptomcruise TikTok account. Anyone with savvy, persistence, media samples, and an average computer can create convincing videos making public figures speak words they have never said.
Since disinformation is easy to disseminate and propaganda is a weapon, deepfake social engineering will only become more common, with defences sadly lacking.
What can be done against these attacks? Various companies have developed, and published under open-source licences, technologies designed to detect media created with deepfake techniques, and guides are available. These remain largely unknown, and the technologies are not widely deployed. Even if they were, constant improvements to deepfake techniques make fakes hard for automated systems, or even the most cautious humans, to recognise. MIT has published a site where users can test their perceptions against real and fabricated videos of Joseph Biden and Donald Trump making statements. The results are not promising for people’s ability to differentiate between the two.
There are several tells to pay attention to, which are becoming more subtle as the technology progresses:
- Look at the face; almost all of the best deepfakes are focused on facial changes or replacements, and there can be signs here
- Examine the forehead and cheeks for signs of computer-generated imagery, such as artefacts of overly smooth or overly wrinkled skin
- Look at the eyes, eyebrows, and shadows around them; many deepfake algorithms do not account for shadows properly and can slip on the physics of these small facial movements
- If the subject is wearing glasses, is there glare, are there reflections, are there oddities in these movements as the face moves?
- Hair, and especially facial hair, is difficult for existing algorithms to replicate convincingly
- Check any other facial markings, such as moles, scars, or birthmarks, for signs of odd movement, or for marks missing from or added to the actual subject
- Blinking too much or too little can be a sign of a deepfake video
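The blink tell above can even be quantified. A common heuristic in liveness-detection work is the eye aspect ratio (EAR): the ratio of an eye's vertical landmark distances to its horizontal width, which drops sharply when the eye closes. The sketch below is illustrative only; the landmark coordinates and thresholds are synthetic assumptions, not the output of any real face-tracking library, and a real pipeline would feed in per-frame landmarks from a detector.

```python
import math

def eye_aspect_ratio(eye):
    """Compute the eye aspect ratio (EAR) from six (x, y) eye landmarks.

    eye[0] and eye[3] are the horizontal corners; (eye[1], eye[5]) and
    (eye[2], eye[4]) are the vertical pairs. EAR falls sharply when the
    eye closes, so a sustained low-EAR run indicates a blink.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])
    vertical = dist(eye[1], eye[5]) + dist(eye[2], eye[4])
    horizontal = dist(eye[0], eye[3])
    return vertical / (2.0 * horizontal)

def count_blinks(ear_series, threshold=0.21, min_frames=2):
    """Count blinks as runs of at least `min_frames` consecutive frames
    where EAR falls below `threshold` (both values are assumptions)."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:
        blinks += 1
    return blinks

# Synthetic clip: an open eye (EAR ~0.30) with two brief closures.
series = [0.30] * 10 + [0.10] * 3 + [0.30] * 10 + [0.10] * 3 + [0.30] * 5
print(count_blinks(series))  # 2 blinks detected
```

Comparing the blink count over a minute of video against a typical human rate (roughly 15 to 20 blinks per minute) gives a crude plausibility check, though, as noted above, newer fakes increasingly get blinking right.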
To test your ability to spot these, use the MIT site above. Given the difficulty of doing so and that audio deepfakes are far more convincing than most videos, how can you protect against these attacks effectively without sinking in more effort than necessary?
The Simple Protection
The easiest way to protect yourself or your organisation against deepfakes is to verify whenever you are asked to do something unexpected, using a separate channel from the one the request arrived on. If you are asked to transfer money on a voice or VoIP call, send an email or an instant message to check; if the request came through a social media video call, phone the person on a known number. If the request concerns a company, check it with a third party. No one, not even the supposed CEO, should be able to bypass procedures designed to verify a request before $35 million is transferred to an unknown third party.
While technologies designed to protect against deepfakes may become commonplace, there will always be an arms race between attackers and defenders. As known signs of fakes emerge and become learned, attackers will fix those flaws in their systems and improve the realism of their attacks. Artificial intelligence is being used to try to detect the cracks left behind in an image’s digital “fingerprint” by the manufacturing process, but this technology is neither widely known nor accessible. Thus common sense must prevail and the four-eyes principle be adopted: two individuals must approve crucial actions before they are taken, and both must verify via a communication channel separate and distinct from the voice or video instruction.
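The four-eyes rule with out-of-band verification can be sketched as a simple approval record. This is a minimal illustration, not a real workflow library; the class and channel names are invented for the example, and a production system would add authentication, logging, and role checks.

```python
from dataclasses import dataclass, field

@dataclass
class ActionRequest:
    """A high-value action requested over some channel (e.g. a video call).

    Illustrative sketch: each approver must verify the request over a
    channel distinct from the one the instruction arrived on, and at
    least two distinct people must approve (the four-eyes principle).
    """
    description: str
    request_channel: str
    approvals: dict = field(default_factory=dict)  # approver -> channel used

    def approve(self, approver, verification_channel):
        # Reject in-band "verification": confirming on the same channel
        # the (possibly deepfaked) instruction came through proves nothing.
        if verification_channel == self.request_channel:
            raise ValueError("verify out-of-band, not on the request channel")
        self.approvals[approver] = verification_channel

    def is_authorised(self):
        # Four eyes: at least two distinct individuals have approved.
        return len(self.approvals) >= 2

req = ActionRequest("Transfer funds to new supplier", request_channel="video call")
req.approve("alice", "phone call to known number")
print(req.is_authorised())  # False - only one approver so far
req.approve("bob", "in-person confirmation")
print(req.is_authorised())  # True - two people, both verified out-of-band
```

The design point is that the approvals dictionary is keyed by approver, so one person confirming twice still counts as a single pair of eyes, and any attempt to "verify" on the original call is refused outright.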
“I’ll believe it when I see it!” I wouldn’t if I were you; vishing is here.
Written by James Bore on behalf of Samurai Digital Security Limited
Edited by Dr David J Day