Study: Watermarking AI Deepfake Images Offers Little Security

A team of researchers has developed new attack techniques that expose vulnerabilities in watermarking systems.

Showcase of Art created with Midjourney (Image: Midjourney)

Study results released recently by researchers at the University of Maryland assert that watermarking may not be as effective against deepfake images as tech giants might hope. Several of those companies recently added watermarking, a method of adding metadata to digital content to establish its origin, to bolster security measures against deepfake images produced by their AI models. However, the University of Maryland team disputes the efficacy of this technique, stating that it could be overcome relatively easily.

The Innate Vulnerability of Image Watermarking

The research paper, “Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks”, published on arXiv, details the findings. The team was co-led by Soheil Feizi, Associate Professor of Computer Science at the University of Maryland. He said in an email to The Register that the study reveals “fundamental and practical vulnerabilities of image watermarking as a defense against deepfakes.” The research shows a direct trade-off between false negatives (watermarked images classified as unmarked) and false positives (unmarked images classified as watermarked). Essentially, watermark detection systems can offer high performance with few false negatives, or high robustness with few false positives, but not both simultaneously.
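To make the trade-off concrete, here is a minimal sketch of a threshold-based detector. It is not the paper's analysis: the detector scores are hypothetical, drawn from two overlapping Gaussian distributions, and the function names are chosen purely for illustration. It only shows that when scores for watermarked and unmarked images overlap, moving the decision threshold lowers one error rate while raising the other.

```python
import random


def error_rates(marked_scores, unmarked_scores, threshold):
    """Return (false_negative_rate, false_positive_rate) for one threshold.

    An image is flagged as watermarked when its score is >= threshold.
    """
    false_negatives = sum(s < threshold for s in marked_scores)
    false_positives = sum(s >= threshold for s in unmarked_scores)
    return (false_negatives / len(marked_scores),
            false_positives / len(unmarked_scores))


def simulate_scores(n=2000, seed=0):
    """Hypothetical detector scores (an assumption, not data from the study)."""
    rng = random.Random(seed)
    marked = [rng.gauss(1.0, 1.0) for _ in range(n)]    # watermarked images
    unmarked = [rng.gauss(0.0, 1.0) for _ in range(n)]  # unmarked images
    return marked, unmarked


def sweep(thresholds=(-1.0, 0.0, 0.5, 1.0, 2.0)):
    """Evaluate both error rates at each threshold, lowest threshold first."""
    marked, unmarked = simulate_scores()
    return [(t, *error_rates(marked, unmarked, t)) for t in thresholds]


if __name__ == "__main__":
    for t, fnr, fpr in sweep():
        print(f"threshold={t:+.1f}  false negatives={fnr:.2%}  "
              f"false positives={fpr:.2%}")
```

Raising the threshold in this toy setup can only increase the false-negative rate and decrease the false-positive rate, so no single threshold minimizes both.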

New Attack Techniques

The researchers have developed new attack techniques for watermarking systems. For low-perturbation images, meaning those with imperceptible watermarks, they presented a method known as diffusion purification. Originally proposed as a defense against adversarial examples, this technique involves adding Gaussian noise to an image and then using a diffusion model's denoising process to remove the added noise, which strips out the watermark along with it. For high-perturbation images, meaning those with perceptible watermarks, the team devised a spoofing mechanism that could make unmarked images appear to be watermarked, potentially leading to undesired public relations or financial consequences for companies marketing AI models.
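The noise-then-denoise idea can be sketched on a toy example. This is not the researchers' implementation. A 3x3 mean filter stands in for the diffusion model's denoiser, and the watermark is a hypothetical faint checkerboard, chosen because it is especially easy to smooth away. In this toy the smoothing step does most of the damage to the watermark, whereas a real attack relies on a trained diffusion model to reconstruct a clean image. All names and parameters below are assumptions made for illustration.

```python
import random

SIZE = 64          # toy image is SIZE x SIZE grayscale
AMPLITUDE = 2.0    # strength of the hypothetical watermark


def pattern(x, y):
    """Hypothetical watermark pattern: a +1/-1 checkerboard."""
    return 1.0 if (x + y) % 2 == 0 else -1.0


def make_watermarked_image():
    """Smooth gradient image plus a faint checkerboard watermark."""
    return [[64.0 + x + y + AMPLITUDE * pattern(x, y) for x in range(SIZE)]
            for y in range(SIZE)]


def detector_score(image):
    """Correlate the image with the known pattern (about AMPLITUDE if marked)."""
    total = sum(image[y][x] * pattern(x, y)
                for y in range(SIZE) for x in range(SIZE))
    return total / (SIZE * SIZE)


def add_gaussian_noise(image, sigma, rng):
    """Step 1: perturb every pixel with zero-mean Gaussian noise."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def mean_filter_denoise(image):
    """Step 2 stand-in: 3x3 mean filter (NOT a diffusion model)."""
    out = []
    for y in range(SIZE):
        row = []
        for x in range(SIZE):
            window = [image[ny][nx]
                      for ny in range(max(0, y - 1), min(SIZE, y + 2))
                      for nx in range(max(0, x - 1), min(SIZE, x + 2))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def purify(image, sigma=10.0, seed=0):
    """Add noise, then denoise; the watermark signal is weakened on the way."""
    rng = random.Random(seed)
    return mean_filter_denoise(add_gaussian_noise(image, sigma, rng))


if __name__ == "__main__":
    marked = make_watermarked_image()
    print("score before:", round(detector_score(marked), 3))
    print("score after: ", round(detector_score(purify(marked)), 3))
```

A detector that flags images scoring above half of `AMPLITUDE` would accept the original image and reject the purified one, while the underlying gradient image survives the filter largely intact.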