Red Lines under the EU AI Act: Understanding the ban of the untargeted scraping of facial images and facial recognition databases
Blog 5 | Red Lines under the EU AI Act Series
This blog is the fifth of a series that explores prohibited AI practices under the EU AI Act and their interplay with existing EU law. You can find the whole series here.
1. Introduction
The fifth blog in the “Red lines under the EU AI Act” series focuses on unpacking the Article 5(1)(e) prohibition on placing on the market, putting into service, or using AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage. Notably, this provision specifically targets the acts that precede facial recognition itself, which is addressed separately under a different provision of the AI Act, Article 5(1)(h). A number of key takeaways emerge from our analysis:
- The European Commission Guidelines echo Recital 43 AI Act by acknowledging that the untargeted scraping of facial images is a particularly intrusive practice which “adds to the feeling of mass surveillance and can lead to gross violations of fundamental rights, including the right to privacy”. This, in turn, is consistent with previous case law of Data Protection Authorities (DPAs) on the basis of the GDPR, which remains the most comprehensive protection in facial recognition use-cases;
- The prohibition expressly differentiates between “targeted” and “untargeted” scraping, thereby limiting the scope of its application and excluding qualified “targeted” scraping from its scope;
- An analysis of the practices that fall outside the scope of the AI Act’s prohibition finds that some use-cases, such as the scraping of facial images for training AI models that generate new images of fictitious persons, may lead to increasingly complex compliance scenarios triggering both copyright and data protection rules.
Following this brief introduction, Section 2 outlines the rationale behind the prohibition, while Section 3 examines its scope as defined by the differentiation between “targeted” and “untargeted” scraping. Section 4 outlines what falls outside the scope of the prohibition, potentially including use-cases of AI-driven deepfakes, while Section 5 explores the AI Act’s interplay with other relevant areas of EU law, including the GDPR and the Law Enforcement Directive (LED). After noting significant DPA cases on facial recognition, Section 6 offers concluding reflections and key takeaways.
2. Context and rationale: untargeted scraping of facial images as a particularly intrusive practice posing “unacceptable risk”, consistent with past case law under the GDPR
Article 5(1)(e) AI Act prohibits the creation or expansion of facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage. The European Commission’s Guidelines on Prohibited Artificial Intelligence Practices under the AI Act recognize that the untargeted scraping of facial images “seriously interferes with individuals’ right to privacy and data protection and deny those individuals the right to remain anonymous”. This is further supported by Recital 43 AI Act, which recognizes that the untargeted scraping of facial images can add to the feeling of mass surveillance and lead to gross violations of fundamental rights, including the right to privacy.
The context and rationale of the AI Act’s prohibition is consistent with past case law by DPAs across the EU on the basis of the GDPR. Indeed, the expansion and creation of facial recognition databases on the basis of the untargeted scraping of data, including biometric data such as facial images, has been a continuous area of serious concern for DPAs. From 2022 to 2024, several DPAs imposed large fines on Clearview AI for GDPR violations due to practices related to facial recognition, as highlighted in Section 5 of this blog.
3. Defining facial recognition databases and (targeted vs.) untargeted scraping
Article 5(1)(e) AI Act states that the following practice shall be prohibited: “the placing on the market, the putting into service for this specific purpose, or the use of AI systems that create or expand facial recognition databases through the untargeted scraping of facial images from the internet or CCTV footage” (emphasis added).
Four cumulative conditions must be met for the prohibition to apply. The practice must:
- constitute the placing on the market, putting into service for this specific purpose, or use of an AI system;
- aim to create or expand facial recognition databases;
- employ AI tools for untargeted scraping; and
- source images from either the internet or CCTV footage.
The Guidelines clarify that, as with the other Article 5 prohibitions, all four cumulative conditions above must be met simultaneously to trigger the prohibition. This approach, a consistent element of the AI Act’s full set of prohibited practices, seems to ensure a targeted ban on very specific uses of AI technologies. The prohibition applies to both providers and deployers who, in line with their roles in the value chain, must not place on the market, put into service, or use AI systems for this specific purpose.
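The cumulative nature of the four conditions can be made concrete with a minimal, purely illustrative sketch (not legal advice): the prohibition is a logical conjunction, so a practice escaping any one condition escapes the ban. The field names below are our own shorthand for the conditions, not terms from the Act.

```python
# Illustrative sketch of the four cumulative Article 5(1)(e) conditions.
# A practice is prohibited only if ALL four conditions hold simultaneously.
from dataclasses import dataclass


@dataclass
class Practice:
    placed_on_market_or_used: bool       # placing on the market, putting into
                                         # service for this purpose, or use
    creates_or_expands_facial_db: bool   # aims at a facial recognition database
    ai_untargeted_scraping: bool         # employs AI tools for untargeted scraping
    internet_or_cctv_source: bool        # images sourced from internet or CCTV


def prohibited_under_art_5_1_e(p: Practice) -> bool:
    """All four cumulative conditions must be met to trigger the prohibition."""
    return (p.placed_on_market_or_used
            and p.creates_or_expands_facial_db
            and p.ai_untargeted_scraping
            and p.internet_or_cctv_source)


# A system using targeted scraping (third condition not met) falls outside
# the prohibition, even if the other three conditions are satisfied.
targeted = Practice(True, True, False, True)
print(prohibited_under_art_5_1_e(targeted))  # False
```

The conjunction mirrors the Guidelines’ point that excluding a single element, for example by scraping in a targeted manner or by sourcing images from elsewhere, takes a practice outside Article 5(1)(e), although other EU law (notably the GDPR) may still apply.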
The Guidelines stress that Article 5(1)(e) AI Act does not require that the sole purpose of the database is to be used for facial recognition; it is sufficient that the database can be used for facial recognition. The Guidelines define a “database” in this context as any collection of data or information that is specially organized for rapid search and retrieval by a computer, and may be temporary, centralized or decentralized.
An important distinction in the application of this provision is between targeted and untargeted scraping – the prohibition does not apply to any scraping tool with which a database for face recognition may be constructed or expanded, but only to tools for untargeted scraping. In this context, untargeted scraping is defined as a technique absorbing as much data and information as possible from different sources, without a specific focus on a given individual or group of individuals. This may be done using a variety of scraping tools and techniques, including web crawlers, bots, or other means that allow for the automatic extraction of data from a variety of sources, including CCTV footage, social media, and other websites.
It is crucial to determine the precise scope of the scraping, since the Guidelines further note that the prohibition does not cover “targeted” scraping, such as the collection of images or videos of specific individuals or pre-defined groups of persons for law enforcement purposes. Furthermore, in more complex systems combining targeted with untargeted searches, only the untargeted scraping is prohibited.
Notably, the Guidelines highlight that the publication of images on social media by natural persons does not constitute consent for inclusion in facial recognition databases, aligning with the GDPR notion of (valid) consent as a legal basis for processing personal data.
4. What falls outside the scope of the prohibition?
While specifically targeted scraping is in some cases allowed, several other practices fall outside the prohibition’s scope, including the untargeted scraping of biometric data other than facial images (such as voice samples) and, importantly, non-AI scraping methods. The Guidelines also note that AI systems which harvest large amounts of facial images from the internet to build AI models that generate new images of fictitious persons similarly fall outside the scope of the prohibition.
While the logic behind this last use-case is seemingly to permit the effective training of AI models, and it explicitly falls outside the scope of the prohibition, attention should be paid to its compliance with both copyright and data protection laws. Indeed, AI systems that scrape large amounts of facial images to build AI models may trigger the dual application of EU copyright rules, which protect the images themselves to the extent they enjoy copyright protection, and of the GDPR, which protects facial images as personal data, or even as special-category biometric data where they are processed for the purpose of uniquely identifying a person. While the scope of this prohibition was agreed upon by the co-legislators during the final negotiations on the AI Act, this particular use-case may not account for the increasing sophistication of AI-driven deepfakes.
In fact, at the time of writing, the European Parliament has reportedly reached a political agreement on the AI Act Omnibus, whose latest compromise text adds a new entry to the list of prohibited practices: once adopted, non-consensual sexual deepfakes will be banned under the revised AI Act.
It is also worth noting that while this new ban will strengthen protection, it will not cover all use-cases of AI-driven deepfakes, potentially marking an area of ongoing review by regulators and legislators alike. For this purpose, outside of the Omnibus procedure, Article 112 AI Act empowers the Commission to assess and review the list of prohibited practices on a yearly basis, and the resulting assessment report must be submitted to the European Parliament and the Council.
5. Interplay with other EU laws: From the GDPR to the LED
5.1. Facial recognition as a long-standing regulatory priority for DPAs across the EU
The creation and expansion of facial recognition databases on the basis of untargeted scraping of facial images has been a prominent area of regulatory intervention on the basis of the GDPR. In February 2022, the Italian DPA (Garante) fined Clearview AI €20 million and imposed a ban on the company’s further collection and processing of data, including biometric data, and ordered the erasure of such data relating to citizens on Italian territory.
In October 2022, the French DPA (CNIL) similarly imposed a fine of €20 million on Clearview AI, recognizing the very serious risk to individuals’ fundamental rights posed by its facial recognition software. In September 2024, in an ex officio investigation, the Dutch DPA (AP) fined Clearview AI €30 million for “illegal data collection for facial recognition.”
In their investigations, DPAs found breaches of the GDPR’s Article 6 (lawfulness of processing), Article 9 (processing of special categories of personal data), and a failure to fulfil data subject rights, particularly those found in Article 15 (right of access) and Article 17 (right to erasure). The Garante also found breaches of key principles of data protection, in particular of lawfulness, fairness and transparency (Article 5(1)(a) GDPR), the purpose limitation principle (Article 5(1)(b) GDPR), and the storage limitation principle (Article 5(1)(e) GDPR). As such, in addition to constituting a prohibited practice under the AI Act, the untargeted scraping of facial images for the purposes of creating or expanding a facial recognition database also contravenes several obligations found in the GDPR.
The Guidelines themselves similarly note that, in general, the processing of personal data via the untargeted scraping of the Internet or CCTV material to build up or expand face recognition databases is unlawful, and there is no legal basis under the GDPR for such activity.
5.2. Law Enforcement use of facial recognition databases
Law Enforcement Authorities (LEAs) use facial recognition databases for identification purposes, allowing for the automated identification of individuals that may in some way be related to criminal events, such as suspects, wanted persons, victims, or witnesses. Among the different types of databases used for face matching by LEAs are also databases consisting of surveillance footage or private data sources and open-source data from the internet.
While the AI Act prohibits the creation or expansion of facial recognition databases through the untargeted scraping of images from the internet or CCTV footage, the provision does not appear to prohibit LEAs’ use, for face matching and identification purposes, of pre-existing databases that were created through such untargeted scraping. Hence, there might be a legal gap between the prohibition on creating new databases or expanding existing ones through image scraping, and the use of databases created before the AI Act prohibition entered into force.
The AI Act’s Article 5(1)(e) prohibition admits no exceptions for law enforcement use, unlike Article 5(1)(h) on real-time remote biometric identification (to be explored in the final instalment of this blog series), which has a carve-out for competent authorities in public spaces under strict conditions. The AI Act’s blanket ban seems intentional to prevent circumvention through law enforcement justifications.
The LED, the specific legal framework for data protection in law enforcement, takes a more balanced approach: it may permit particularly intrusive practices if proportionate, necessary, and legally grounded. Hence, if a biometric database is strictly necessary, sufficiently targeted (i.e., footage related to a specific investigation), and proportionate for law enforcement purposes, it may satisfy the LED’s requirements.
Article 10 LED governs the processing of special categories of data, including biometric data processed for the purpose of uniquely identifying a natural person, and permits such processing only where it is strictly necessary, subject to appropriate safeguards, and authorized by Union or Member State law. Untargeted scraping does not seem to satisfy Article 10 conditions.
Hence, even though the LED does not explicitly prohibit the use of databases built from untargeted scraping, its strict requirements implicitly lead to the same normative position as the AI Act. The primary difference is that the AI Act’s prohibition does not engage with that balancing at all: untargeted scraping is simply prohibited. The two legal instruments thus create overlapping and mutually reinforcing layers of prohibition. One question that remains is whether a database created outside of the EU can be used by LEAs in the EU in accordance with the LED or the AI Act.
6. Concluding Reflections and Key Takeaways
The AI Act’s prohibition is consistent with previous case law of DPAs, on the basis of the GDPR, which remains the most comprehensive protection in facial recognition use-cases
The prohibition’s differentiation between the targeted and untargeted scraping of facial images, and the resulting ban on untargeted scraping, echoes several DPAs’ enforcement actions, particularly the line of Clearview AI cases between 2022 and 2024. DPAs, including the Italian Garante, the Dutch AP, and the CNIL, found that Clearview AI’s facial recognition software breached several of the GDPR’s key principles and obligations.
The prohibition expressly differentiates between “targeted” and “untargeted” scraping, thereby limiting the scope of its application and excluding qualified “targeted” scraping from its scope
The differentiation between targeted and untargeted scraping is also significant because the AI Act does not include a blanket ban on all scraping of facial images. Indeed, it acknowledges that in some cases, such as in law enforcement contexts, targeted scraping may be lawful when strictly necessary and proportionate. The LED sets out specific conditions for such use-cases, which are tightly regulated across the EU. An analysis of the interplay between the LED and AI Act shows an alignment between the two regulations, creating mutually reinforcing layers of prohibition.
Some use-cases, such as the harvesting of facial images for training AI models that generate new images of fictitious persons, may lead to increasingly complex compliance scenarios
When analyzing the practices or use-cases that fall outside the scope of the prohibition, we also found that specific AI-driven deepfakes have so far not been captured by Article 5 AI Act. This seems to have been recognized by the legislators as well: on 11 March 2026, the European Parliament reportedly reached a political agreement on the AI Act Omnibus, which aims to include a new ban on non-consensual sexual deepfakes. While this development will allow for further protection, the (new) ban does not cover all AI-driven deepfakes.