Five tools to protect your work from AI training
The use of images available on the internet to train artificial intelligence (AI) models keeps tensions high between developers and creators. While the legal debate continues, especially within the European Union (EU), the technical reality is clear: any published file is vulnerable to data extraction unless it carries explicit restrictions and recognizable security measures. By 2026, the protection of a work no longer depends on an invisible watermark. The most effective solutions combine cryptographic authentication, exclusion registries, scraping-control protocols, and legal safeguards around text and data mining (TDM).
We analyze the five tools and frameworks that define the current state of creator protection against unauthorized AI training, incorporating the legal perspective of Juan Carlos Guerrero, partner in Intellectual Property and Technology at ECIJA.
Content Credentials
Developed by Adobe as part of the Coalition for Content Provenance and Authenticity (C2PA) standard, Content Credentials embed verifiable cryptographic metadata stating authorship, editing history, and usage preferences, including a 'Do not train' tag. Unlike traditional invisible watermarks, they integrate digital signatures and cryptographic fingerprints into a standard adopted by media outlets and platforms. They do not prevent copying, but they leave a verifiable mark. Guerrero points out that these credentials 'enhance traceability and can be decisive in demonstrating that the developer was able to identify the reserved rights'.
In practice, they are useful for photographers, brands, and professionals working in C2PA-compliant environments. Safe Creative, recognized as an official validator since the end of 2025, incorporates the CR logo on registered works, allowing users to verify information about authenticity, origin, and editing.
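For illustration, the 'Do not train' preference travels as an assertion inside the C2PA manifest that is cryptographically bound to the file. Below is a minimal sketch of a manifest definition of the kind accepted by the open-source c2patool; the `c2pa.training-mining` assertion and its entries follow public C2PA documentation, but field names can vary between spec versions, so treat the details as an assumption to check against current docs.

```json
{
  "claim_generator": "example-app/1.0",
  "assertions": [
    {
      "label": "c2pa.training-mining",
      "data": {
        "entries": {
          "c2pa.ai_generative_training": { "use": "notAllowed" },
          "c2pa.ai_training": { "use": "notAllowed" },
          "c2pa.data_mining": { "use": "notAllowed" }
        }
      }
    }
  ]
}
```

Once signed into the image (for example, `c2patool photo.jpg -m manifest.json -o signed.jpg`), the preference travels with the file and can be verified by any C2PA-compliant reader, even after the work changes hands.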
Spawning Exclusion Registry
The Spawning platform runs the 'Do Not Train' registry and the 'Have I Been Trained?' tool, which lets users check whether an image appears in certain datasets. Its main function is to let creators express their opposition and to let developers consult that information before training models. However, the expert cautions that these systems 'do not replace the opt-out provided for in European legislation', but rather reinforce 'the visibility and traceability of the rights holder's objection'.
In Spain, the text and data mining regime was introduced by Royal Decree-Law 24/2021 and allows the lawful use of accessible works for TDM unless the rights holder expressly objects. For this reason, simply being included in a private registry is not enough: it is essential to formulate an unequivocal objection that automated systems can read. Spawning also promotes the ai.txt protocol, an evolution of the classic robots.txt adapted for AI, which tells crawlers which content may not be used for training.
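As a reference point, the robots.txt convention that ai.txt extends already lets a site refuse the crawlers AI companies have publicly documented. The user-agent tokens below are real and published by the respective companies; compliance, however, is voluntary on the crawler's side.

```
# robots.txt — refuse documented AI training crawlers
User-agent: GPTBot            # OpenAI
Disallow: /

User-agent: CCBot             # Common Crawl, a frequent training source
Disallow: /

User-agent: Google-Extended   # Google's AI training opt-out token
Disallow: /
```

Rules like these only bind crawlers that choose to honor them, which is exactly why a documented, unequivocal objection matters for the legal analysis.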
ImageSentinel
This is a research framework aimed at protecting large collections of images from generative models. Instead of simply flagging individual files, it introduces 'sentinel' images into datasets to detect if unauthorized material has been incorporated. It does not prevent use, but can provide relevant evidence in case of litigation. Although still in the academic phase, it is of interest to image banks, institutional archives, and large repositories.
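ImageSentinel's actual detection pipeline is described in the research literature; purely to illustrate the sentinel idea, the hypothetical Python sketch below plants known images in a collection and later scans a suspect dataset for near-duplicates with a simple perceptual hash. The real framework is considerably more robust, and all file paths here are made up.

```python
from PIL import Image

def average_hash(path: str, size: int = 8) -> int:
    """Tiny perceptual hash: downscale, grayscale, threshold by the mean."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Hypothetical sentinels planted in the protected collection.
sentinels = {p: average_hash(p) for p in ["sentinel_01.png", "sentinel_02.png"]}

# Scan a (hypothetical) scraped dataset for near-duplicates of the sentinels.
for candidate in ["scraped/img_0001.jpg", "scraped/img_0002.jpg"]:
    h = average_hash(candidate)
    for name, sentinel_hash in sentinels.items():
        if hamming(h, sentinel_hash) <= 5:  # small distance: likely the same image
            print(f"possible sentinel match: {candidate} ~ {name}")
```

A match does not prove that training occurred, but it shows that protected material entered the dataset, which is the kind of evidence the framework aims to produce.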
Advanced Disruption Tools
Recent research, such as that conducted by the Commonwealth Scientific and Industrial Research Organisation (CSIRO), explores methods that subtly alter an image's pixels, imperceptibly to the human eye, so that AI systems learn distorted representations during training.
These types of techniques are known as adversarial defenses and represent an advance over earlier generations of 'anti-AI noise', which are now easily neutralized by automated processes. However, they require technical knowledge to apply and do not guarantee exclusion from training.
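CSIRO's exact method is not detailed here; as a generic sketch of the adversarial-defense family, the snippet below applies a single FGSM-style step: each pixel is nudged by a tiny amount along the sign of a surrogate model's loss gradient, keeping the change under a perceptibility budget. The gradient is faked with random values to keep the example self-contained; a real tool would compute it from an actual feature extractor.

```python
import numpy as np

def protect(image: np.ndarray, grad: np.ndarray, eps: float = 4 / 255) -> np.ndarray:
    """One FGSM-style step: push the image toward higher loss for a surrogate model.

    `image` is float32 in [0, 1]; `grad` is the surrogate's loss gradient
    (assumed, not computed here) — only its sign is used, not its magnitude.
    """
    perturbed = image + eps * np.sign(grad)
    return np.clip(perturbed, 0.0, 1.0)

# Toy usage: a random image and a random gradient standing in for a real model.
rng = np.random.default_rng(0)
img = rng.random((256, 256, 3), dtype=np.float32)
fake_grad = rng.standard_normal((256, 256, 3)).astype(np.float32)
protected = protect(img, fake_grad)
print(float(np.abs(protected - img).max()))  # at most eps: visually imperceptible
```

The per-pixel budget `eps` is what keeps the perturbation invisible to people while still skewing what a model learns from the image.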
ai.txt protocol
This integrates into the server and automatically declares that the content may not be used for training or 'fine-tuning' (additional training that specializes a model for specific tasks). While it does not block unauthorized downloads or guarantee legal compliance, it can be relevant in a legal assessment. Guerrero points out that what matters is that the 'objection is unequivocal and readable by machines. If the developer cannot reasonably identify the restriction, it will be easier for them to invoke the TDM exception'. It is not a fortress, then, but a technical signal demonstrating that the objection was detectable by automated processes.
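Spawning offers a generator that produces the file; below is a simplified illustration of the kind of directives it can contain, served from the site root. Treat the exact syntax as an assumption and take it from Spawning's current specification.

```
# ai.txt — served at the site root, e.g. https://example.com/ai.txt
User-Agent: *
Disallow: /       # default: no content may be used for AI training
Disallow: *.jpg   # media-type rules extend the robots.txt syntax
Disallow: *.png
```

Like robots.txt, the file is a declaration rather than a barrier, which is precisely the 'technical signal' role described above.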
Which tools are left behind?
First-generation invisible watermark systems, such as the NO AI project, and basic web services like the original ArtShield Watermarker are now insufficient when used in isolation. The strategy of passing an image off as AI-generated so that models discard it has lost its effectiveness against more advanced systems. Tools like Glaze and Nightshade, which were disruptive in 2023, have had to evolve to keep pace with models that are more resistant to simple interference. Their current value is as a deterrent, and it depends on using recent versions combined with other protection mechanisms, such as legal copyright protection.
What to do if the work is part of a training dataset?
Currently, there is no automatic mechanism to ensure the removal of a work included in a training dataset. However, detection by creators is crucial: although 'it does not automatically generate a right to compensation, it constitutes fundamental evidence if integrated into a broader legal strategy,' says Guerrero. The actions to be taken include:
- Formulating or reinforcing the opt-out (reservation of rights) against text and data mining.
- Sending withdrawal or exclusion requests.
- Exploring legal actions for copyright infringement or unfair competition, especially if a valid reservation is ignored.
As for economic compensation, there is likewise no general mandatory remuneration system for AI training; it depends on voluntary agreements, specific licenses, or individual litigation in which the infringement is demonstrated.
Recommendations
In an environment of multimodal models that learn from images, text, and video, protection against unauthorized training has ceased to be a one-off action and has become a comprehensive strategy.
It is well established that technical measures work best when combined and accompanied by a coherent legal and documentary strategy. 'There is a clear difference between those who publish without a strategy and those who combine explicit legal opposition with a machine-readable technical signal and traceability measures,' Guerrero maintains. A first step is to register the creative process of a work, from sketches to final results, with services such as Safe Creative, which makes it possible to demonstrate authorship in the event of a dispute. At registration, the platform also lets the author record an objection to the use of the work for training purposes, adding an extra layer of protection.
The lawyer insists that 'the right strategy is not to try to make the work invisible, but to make it difficult to argue that the automated system has not identified the rights holder's reserved rights'. The fact is that in an environment where style is both identity and economic value, protecting it requires the same professionalism with which it is created.