Data Protection Law Errors in Google Spain SL, Google Inc. v. Agencia Espanola de Proteccion de Datos, Mario Costeja Gonzalez
The following is a guest post by Scott D. Goss, Senior Privacy Counsel, Qualcomm Incorporated, addressing the recent “Right to be Forgotten” decision by the European Court of Justice.
There has been quite a bit of discussion surrounding the European Court of Justice’s judgment in Google Spain SL, Google Inc. v. Agencia Espanola de Proteccion de Datos (AEPD), Mario Costeja Gonzalez. In particular, some interesting perspectives have been shared by Daniel Solove, Ann Cavoukian and Christopher Wolf, and Martin Husovec. The ruling has been so controversial that the newly appointed EU Justice Commissioner, Martine Reicherts, delivered a speech defending it. I’d like to add to the discussion.[1] Rather than focusing on the decision’s policy implications or on the practicalities of implementing the Court’s ruling, I’d like to offer thoughts on a few points of data protection law.
To start, I don’t think “right to be forgotten” is an apt description of the decision; the phrase distorts the discussion. Even if Google were to follow the Court’s ruling to the letter, the information doesn’t cease to exist on the Internet. Rather, the implementation of the Court’s ruling just makes Internet content linked to people’s names harder to find. The ruling, therefore, could be thought of as “the right to hide.” Alternatively, the decision could be described as “the right to force search engines to inaccurately generate results.” I recognize that such a description doesn’t roll off the tongue quite so simply, but I’ll explain why that description is appropriate below.
I believe the Court made a few important legal errors that should be of interest to all businesses that process personal data. The first was the Court’s determination that Google was a “controller” as defined under EU data protection law, and the second was its application of the information relevance question. After discussing both, I’ll explain why “the right to force search engines to inaccurately generate results” may be a more appropriate description of the Court’s ruling.
1. “Controller” status must be determined from the activity giving rise to the complaint
To understand how the Court erred in determining that Google is a “controller” in this case, it helps to understand how search engines work. At a conceptual level, search comprises three primary data processing activities: (i) caching all the available content, (ii) indexing the content, and (iii) ranking the content. During the initial caching phase, a search engine’s robot minions scour the Internet, noting all the content they find and its location. The cache can consist of copies of all or parts of the web pages on the Internet. The cache is then indexed, sorting the content to enable much faster searching. Indexing is important because without it every page of the Internet would have to be examined for each user’s search query, which would take immense computing power and significant time. Finally, the content within the index is ranked for relevance.
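To make the three activities concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the pages, the URLs, and the word-count ranking rule are all hypothetical, and real search engines are vastly more sophisticated. The point is simply that the cache and index steps record what exists and where, while the ranking step is where a judgment about order is applied.

```python
from collections import defaultdict

# Hypothetical stand-in for the web: URL -> page text (illustrative data only).
PAGES = {
    "https://example.com/a": "notice of auction mario costeja gonzalez",
    "https://example.com/b": "gettysburg address four score and seven years ago",
    "https://example.com/c": "mario costeja gonzalez profile mario costeja gonzalez",
}

def build_cache(pages):
    """Step (i): copy each page's content, keyed by its location."""
    return dict(pages)

def build_index(cache):
    """Step (ii): map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in cache.items():
        for word in text.split():
            index[word].add(url)
    return index

def search(index, cache, query):
    """Step (iii): find pages containing every query word, then rank them."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))

    # Toy ranking rule (an assumption): count query-word occurrences on the page.
    def score(url):
        tokens = cache[url].split()
        return sum(tokens.count(w) for w in words)

    # Highest score first; ties broken alphabetically by URL.
    return sorted(matches, key=lambda u: (-score(u), u))

cache = build_cache(PAGES)
index = build_index(cache)
results = search(index, cache, "Mario Costeja Gonzalez")
```

In this sketch, build_cache and build_index make no choices about the content they record; only the score function inside search embodies a decision about which result should come first.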
From a data processing perspective, I believe that caching and indexing achieve two objective goals: determining the available content of the Internet and where it can be found. Tellingly, the only time web pages are not cached and indexed is when website publishers, not search engines, include a special instruction telling search engines to ignore the page. This instruction is typically given in a file called robots.txt.
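As a rough illustration of how a publisher’s instruction is honored, the sketch below uses Python’s standard urllib.robotparser module to check URLs against a robots.txt file. The site, the rules, and the crawler name “ExampleBot” are hypothetical, and this is not a description of how any particular search engine is implemented. Note also that robots.txt is conventionally a file placed at the root of a website and governs crawling; publishers have other mechanisms for controlling indexing that are not shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a website owner (an assumption for
# illustration): every crawler is asked to stay out of the /archive/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url, user_agent="ExampleBot"):
    """Return True if the publisher's rules allow this crawler to fetch the URL."""
    return parser.can_fetch(user_agent, url)

blocked = may_crawl("https://example.com/archive/1998/notice.html")
allowed = may_crawl("https://example.com/news/today.html")
```

Here the crawler makes no decision of its own about the archived page: the publisher’s rule determines that the page is skipped, while the news page remains eligible for caching and indexing.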
The web pages that are cached and indexed could be the text of the Gettysburg Address, the biography of Dr. Martin Luther King, Jr., the secret recipe for Coca-Cola, or newspaper articles that include people’s names. It is simply a fact that the letters comprising the name “Mario Costeja Gonzalez” could be found on certain web pages. Search engines cannot control that fact any more than they could take a picture of the sky and be said to control the clouds in the picture.
After creating the cache and index, the next step involves ranking the content. Search engine companies employ legions of the world’s best minds and immense resources to determine rank order. Such ranking is subjective and takes judgment. Arguably, ranking search results could be considered a “controller” activity, but the ranking of search results was never at issue in the Costeja Gonzalez case. This is a key point underlying the Court’s errors. Mr. Costeja Gonzalez’s complaint was not that Google ranked search results about him too high (i.e., Google’s search result ranking activity), but rather that the search engine indexed the information at all. The appropriate question, therefore, is whether Google is the “controller” of the index. The question of whether Google’s process of ranking search results confers “controller” status on Google is irrelevant. The Court’s error was to conflate Google’s activity of ranking search results with its caching and indexing of the Internet.
Some may defend the Court by arguing that controller status for some activities automatically confers controller status for all activities. This would be error. The Article 29 Working Party opined,
[T]he role of processor does not stem from the nature of an entity processing data but from its concrete activities in a specific context. In other words, the same entity may act at the same time as a controller for certain processing operations and as a processor for others, and the qualification as controller or processor has to be assessed with regard to specific sets of data or operations.
Opinion 1/2010 on the concepts of “controller” and “processor”, page 25, emphasis added. In this case, Mr. Costeja Gonzalez’s complaint focused on the presence of certain articles about him in the index. Therefore, the “concrete activities in a specific context” is the act of creating the index and the “specific sets of data” is the index itself. The Article 29 WP went on to give an example of an entity acting as both a controller and a processor of the same data set:
An ISP providing hosting services is in principle a processor for the personal data published online by its customers, who use this ISP for their website hosting and maintenance. If however, the ISP further processes for its own purposes the data contained on the websites then it is the data controller with regard to that specific processing.
I submit that creation of the index is analogous to an ISP hosting service. In creating an index, search engines create a copy of everything on the Internet, sort it, and identify its location. These are objective, computational exercises, not activities where the personal data is noted as such and treated with some separate set of processing. Following the Article 29 Working Party opinion, search engines could be considered processors in the caching and indexing of Internet content because such activities are mere objective and computational exercises, but controllers in the ranking of the content due to the subjective and independent analysis involved.
Further, as argued in the Opinion of Advocate General Jaaskinen, a controller needs to recognize that they are processing personal data and have some intention to process it as personal data. (See paragraph 82). It is the web publishers who decide what content goes into the index. Not only do they have discretion in deciding to publish the content on the Internet in the first instance, but they also have the ability to add the robots.txt code to their web pages which directs search engines to not cache and index. The mark of a controller is one who “determines the purposes and means of the processing of personal data.” (Art. 2, Dir. 95/46 EC). In creation of the index, rather than “determining”, search engines are identifying the activities of others (website publishers) and heeding their instructions (use or non-use of robots.txt). I believe such processing cannot, as a matter of law, rise to the level of “controller” activities.
Finding Google to be a “controller” may have been correct if either the facts or the complaint had been different. Had Mr. Costeja Gonzalez produced evidence that (i) the web pages he wanted removed contained the “robots.txt” instruction, or (ii) the particular web pages were removed from the Internet by the publisher but not by Google in its search results, then it may have been appropriate to hold Google to be a “controller” due to these independent activities. Such facts would be similar to the example given by the Article 29 Working Party of an ISP’s independent use of personal data maintained by its web hosting customers. Similarly, had Mr. Costeja Gonzalez’s complaint been that search results regarding his prior bankruptcy were ranked too high, then I could understand (although I might still disagree) a finding that Google was a controller. But that was not his complaint. His complaint was that certain information was included in the index at all, and over that, I believe, Google should be deemed to have no more control than it has over the content of the Internet itself.
2. “Relevance” of Personal Data must be evaluated in light of the purpose of the processing.
The Court’s second error arose in the application of the controller’s obligations. After finding that Google is the controller of the index, the Court incorrectly applied the relevancy question. To be processed legitimately, personal data must be “relevant and not excessive in relation to the purposes for which they are collected and/or further processed.” Directive 95/46 EC, Article 6(1)(c) (emphasis added). Relevancy is thus a question in relation to the purpose of the controller, not as to the data subject, a customer, or anyone else. The purpose of the index, in Google’s own words, is to “organize the world’s information and make it universally accessible and useful.” (https://www.google.com/about/company/). With that purpose in mind, all information on the Internet is, by definition, relevant. While clearly there are legal boundaries to the information that Google can make available, the issue is whether privacy law contains one of those boundaries. I suggest that in the context of caching and creating an index of the Internet, it does not.
The Court found that Google legitimizes its data processing under the legitimate interest test of Article 7(f) of the Directive. Google’s legitimate interests must be balanced against the data subjects’ fundamental rights under Article 1(1). Since Article 1(1) provides no guidance as to what those rights are (other than “fundamental”), the Court looks to subparagraph (a) of the first paragraph of Article 14. That provision gives data subjects a right to object to the processing of their personal data, but offers little guidance as to when controllers must comply. Specifically, it provides that, in cases of legitimate interest processing, a data subject may,
“object at any time on compelling legitimate grounds relating to his particular situation to the processing of data relating to him, save where otherwise provided by national legislation. Where there is a justified objection, the processing instigated by the controller may no longer involve those data.”
What are those “compelling legitimate grounds” for a “justified objection”? The Court relies on Article 12(b): “the rectification, erasure, or blocking of data the processing of which does not comply with the provisions of this Directive, in particular because of the incomplete or inaccurate nature of the data”. It is here the Court erred.
The Court took the phrase “incomplete or inaccurate nature of the data” and erroneously applied it to the interests of the data subject. Specifically, the Court held that the question is whether the search results were “incomplete or inaccurate” representations of the data subject as he/she exists today. I submit that was not the intent of Article 12(b). Rather, Article 12(b) was referring back to the same use of that phrase in Article 6 providing that:
“personal data must be: . . . (d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete, having regard to the purposes for which they were collected or for which they are further processed, are erased or rectified.”
The question is not whether the search results are “incomplete or inaccurate” representations of Mr. Costeja Gonzalez, but whether the search results are inaccurate as to the purpose of the processing. The purpose of the processing is to copy, sort, and organize the information on the Internet. In this case, queries for the characters “Mario Costeja Gonzalez” displayed articles that he admits were actually published on the Internet. Such results, therefore, are by definition not incomplete or inaccurate as to the purpose of the data processing activity. To put it simply, the Court applied the relevancy test to the wrong party (Mr. Costeja Gonzalez) as opposed to Google and the purpose of its index.
To explain by analogy, examine the same legal tests applied to a credit reporting agency. A credit reporting analogy is helpful because it also has at least three parties involved in the transaction. In the case of the search engine, those parties are the search engine, the data subject, and the end user conducting the search. In the case of credit reporting, the three parties involved are the credit ratings businesses, the consumers who are rated (i.e., the data subjects), and the lenders and other institutions that purchase the reports. It is well-established law that consumers can object to information used by credit ratings businesses as being outdated, irrelevant, or inaccurate. The rationale for this right is found in Article 12(b) and Article 6 in relation to the purpose of the credit reporting processing activity.
The purpose of credit reporting is to provide lenders an opinion on the credit worthiness of the data subject. The credit ratings business must take care that the information they use is not “inaccurate or incomplete” or they jeopardize the purpose of their data processing by generating an erroneous credit score. For example, if a credit reporting agency collected information about consumers’ height or weight, consumers would be able to legitimately object. Consumers’ objections would not be founded on the fact that the information is not representative of who they are – indeed such data may be completely accurate and current. Instead, consumers’ objections would be founded on the fact that height or weight are not relevant for the purpose of assessing consumers’ credit worthiness.
Returning to the Costeja Gonzalez case, the issue was whether the index (not the ranking of such results) should include particular web pages containing the name of Mr. Costeja Gonzalez. Since the Court previously determined that Google was the “controller” of the index (which I contend was error), the Court should have determined Google’s purpose for the index and then asked whether the contested web pages were incorrect, inadequate, irrelevant or excessive as to that purpose. As discussed above, Google’s professed goal is to enable the discovery of the world’s information and to that end the purpose of the index is to, as much as technologically possible, catalog the entire Internet: all the good, bad, and ugly. For that purpose, any content on the Internet about the key words “Mario Costeja Gonzalez” is, by definition, not incorrect, inadequate, irrelevant or excessive because the goal is to index everything. Instead, however, the Court erred by asking whether the web pages were incorrect, inadequate, irrelevant or excessive as to Mr. Costeja Gonzalez, the data subject. Applying the relevancy question as to Mr. Costeja Gonzalez is, well, not relevant.
Some may argue that the Court recognized the purpose of the processing when making the relevancy determination by finding that Mr. Costeja Gonzalez’s rights must be balanced against the public’s right to know. By including the public’s interest in the relevancy evaluation, some may argue, the Court has appropriately directed the relevancy inquiry to the right parties. I disagree. First, I do not believe it was appropriate to inquire as to the relevancy of the links vis-a-vis Mr. Costeja Gonzalez in the first instance and, therefore, to balance it against other interests (in this case the public) does not cure the error. Secondly, to weigh the interests of the public, one must presume that the purpose of searching for individuals’ names is to obtain correct, relevant, not inadequate and not excessive information. I do not believe such presumptions are well-founded. For example, someone searching for “Scott Goss” may be searching for all current, relevant, and non-excessive information about me. On the other hand, fifteen years from now perhaps someone is searching for all privacy articles written in 2014 and they happened to know that I wrote one and so searched using my name. One cannot presume to know the purpose of an individual’s search query other than a desire to have access to all the information on the Internet containing the query term.
If not the search engines, where would it be pertinent to ask the question of whether information on the Internet about Mr. Costeja Gonzalez was incorrect, inadequate, irrelevant or excessive? The answer to this question is clear: it is the entities that have undertaken the purpose of publishing information about Mr. Costeja Gonzalez. Specifically, website publishers process personal data for the purpose of informing their readers about those individuals. The website publishers, therefore, have the burden to ensure that such information is not incorrect, inadequate, irrelevant or excessive as to Mr. Costeja Gonzalez. That there may be an exception in data protection law for web publishers does not mean that courts should be free to foist obligations onto search engines.
3. The right to force search engines to inaccurately generate results
Finally, the “right to force search engines to inaccurately generate results” is, I believe, an apt description of the ruling. A search engine’s cache and index are supposed to contain all of the web’s information that web publishers want the world to know. Users expect that search engines will identify all information responsive to their queries when they search. Users further expect that search engines will rank all the results based upon their determination of the relevancy of the results in relation to the query. The Court’s ruling forces search engines to generate an incomplete list of search results by gathering all information relevant to the search and then pretending that certain information doesn’t exist on the Internet at all. The offending content is still on the Internet; people just cannot rely on finding the content using individuals’ names entered into search engines (at least the search engines on European country-coded domains).
[1] These thoughts are my own and not those of the company for which I work, and I do not profess to be an expert in search technologies or in the arguments made by the parties in the case.