Artists aim to thwart AI with data-poisoning software and legal action

As the use of artificial intelligence (AI) has permeated the creative media space — especially art and design — the definition of intellectual property (IP) seems to be evolving in real time as it becomes increasingly difficult to understand what constitutes plagiarism.

Over the past year, AI-driven art platforms have pushed the limits of IP rights by utilizing extensive data sets for training, often without the explicit permission of the artists who crafted the original works.

For instance, platforms like OpenAI’s DALL-E and Midjourney’s service offer subscription models, indirectly monetizing the copyrighted material that constitutes their training data sets.

In this regard, an important question has emerged: Do these platforms operate within the norms established by the “fair use” doctrine, which in its current iteration allows copyrighted work to be used for criticism, comment, news reporting, teaching and research purposes?

Recently, Getty Images, a major supplier of stock photos, initiated lawsuits against Stability AI in both the United States and the United Kingdom. Getty has accused Stability AI’s visual-generating program, Stable Diffusion, of infringing on copyright and trademark laws by using images from its catalog without authorization, particularly those with its watermarks.

However, Getty must present more comprehensive proof to support its claims, which might prove challenging since Stable Diffusion has been trained on an enormous cache of more than 12 billion compressed pictures.

In a related case, artists Sarah Andersen, Kelly McKernan and Karla Ortiz initiated legal proceedings against Stability AI, Midjourney and the online art community DeviantArt in January, accusing the organizations of infringing the rights of “millions of artists” by training their AI tools using five billion images scraped from the web “without the consent of the original artists.”

AI poisoning software

Responding to the complaints of artists whose works were plagiarized by AI, researchers at the University of Chicago recently released a tool called Nightshade, which enables artists to integrate undetectable alterations into their artwork.

These modifications, while invisible to the human eye, can poison AI training data: The subtle pixel changes disrupt an AI model’s learning process, leading to incorrect labeling and recognition.

Even a handful of these images can corrupt the AI’s learning process. For instance, a recent experiment showed that introducing a few dozen misrepresented images was sufficient to impair Stable Diffusion’s output significantly.
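To make the idea of an “invisible” pixel change concrete, here is a minimal Python sketch. It is not Nightshade’s method, which the article does not describe in detail. It assumes a toy grayscale image stored as a list of rows, and it uses random noise as a stand-in for the targeted changes a real poisoning tool would compute. The function names `perturb` and `max_pixel_change` and the `epsilon` limit are hypothetical, chosen only for illustration.

```python
import random


def perturb(image, epsilon=2, seed=0):
    """Return a copy of `image` with each pixel shifted by at most `epsilon`.

    `image` is a list of rows of 0-255 grayscale values. Random noise is a
    stand-in here (an assumption for illustration): a real poisoning tool
    would compute targeted shifts, which this toy does not attempt.
    """
    rng = random.Random(seed)
    return [
        # Clamp to the valid 0-255 range after shifting each pixel.
        [min(255, max(0, px + rng.randint(-epsilon, epsilon))) for px in row]
        for row in image
    ]


def max_pixel_change(original, altered):
    """Largest absolute per-pixel difference between two same-sized images."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(original, altered)
        for a, b in zip(row_a, row_b)
    )


# A small synthetic 8x8 image used only for this example.
image = [[(x * y) % 256 for x in range(8)] for y in range(8)]
altered = perturb(image, epsilon=2)

# No pixel moves by more than 2 out of 255 brightness levels.
print(max_pixel_change(image, altered) <= 2)
```

A shift of two brightness levels out of 255 is difficult for a person to notice, which is the sense in which such alterations are described as invisible. Random noise like this would not by itself mislead a model; the sketch only shows how small the per-pixel budget can be.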

The University of Chicago team had previously developed its own tool called Glaze, which was meant to mask an artist’s style from AI detection. Their new offering, Nightshade, is slated for integration with Glaze, expanding its capabilities further.

In a recent interview, Ben Zhao, lead developer for Nightshade, said that tools like his will help nudge companies toward more ethical practices. “I think right now there’s very little incentive for companies to change the way that they have been operating — which is to say, ‘Everything under the sun is ours, and there’s nothing you can do about it.’ I guess we’re just sort of giving them a little bit more nudge toward the ethical front, and we’ll see if it actually happens,” he added.

*An example of Nightshade poisoning art data sets. Source: HyperAllergic*

Despite Nightshade’s potential to safeguard future artwork, Zhao noted that the platform cannot undo the effects on art already processed by older AI models. Moreover, there are concerns about the software’s potential misuse for malicious purposes, such as contaminating large-scale digital image generators.

However, Zhao is confident that this latter use case would be challenging since it requires thousands of poisoned samples.

Independent artist Autumn Beverly believes that tools like Nightshade and Glaze have empowered her to share her work online once again without fear of misuse. However, Marian Mazzone, an expert associated with the Art and Artificial Intelligence Laboratory at Rutgers University, thinks that such tools may not provide a permanent fix and suggests that artists should pursue legal reforms to address ongoing issues related to AI-generated imagery.

Asif Kamal, CEO of Artfi, a Web3 solution for investing in fine art, told Cointelegraph that creators using AI data-poisoning tools are challenging traditional notions of ownership and authorship while prompting a reevaluation of copyright and creative control:

“The use of data-poisoning tools is raising legal and ethical questions about AI training on publicly available digital artwork. People are debating issues like copyright, fair use and respecting the original creators’ rights. That said, AI companies are now working on various strategies to address the impact of data-poisoning tools like Nightshade and Glaze on their machine-learning models. This includes improving their defenses, enhancing data validation and developing more robust algorithms to identify and mitigate pixel poisoning strategies.”

Yubo Ruan, founder of ParaX, a Web3 platform powered by account abstraction and a zero-knowledge virtual machine, told Cointelegraph that as artists continue to adopt AI-poisoning tools, there needs to be a reimagining of what constitutes digital art and how its ownership and originality are determined.

“We need a reevaluation of today’s intellectual property frameworks to accommodate the complexities introduced by these technologies. The use of data-poisoning tools is highlighting legal concerns about consent and copyright infringement, as well as ethical issues related to the use of public artwork without fairly compensating or acknowledging its original owners,” he said.

Stretching IP laws to their limit

Beyond the realm of digital art, the influence of generative AI is also being noticed across other domains, including academia and video-based content. In July, comedian Sarah Silverman, alongside authors Christopher Golden and Richard Kadrey, took legal action against OpenAI and Meta in a U.S. district court, accusing the tech giants of copyright infringement.

The litigation claims that both OpenAI’s ChatGPT and Meta’s Llama were trained on data sets sourced from illicit “shadow library” sites, allegedly containing the plaintiffs’ copyrighted works. The lawsuits point out specific instances where ChatGPT summarized their books without including copyright management information, using Silverman’s The Bedwetter, Golden’s Ararat and Kadrey’s Sandman Slim as key examples.

Separately, the lawsuit against Meta asserts that the company’s Llama models were trained using data sets from similarly questionable origins, specifically citing The Pile from EleutherAI, which reportedly includes content from the private tracker Bibliotik.

The authors asserted that they never consented to their works being used in such a manner and are therefore seeking damages and restitution.

As AI adoption accelerates, many companies are still grappling with the legal and technical questions the technology raises.

While companies like Adobe have started using a mark to flag AI-generated data, companies like Google and Microsoft have said they are willing to face any legal heat should customers be sued for copyright infringement while using their generative AI products.