CDT's "Always On" – the Digital Student

FPF attended the “Always On – Digital Student” forum hosted by the Center for Democracy & Technology and the Data Quality Campaign on September 24, 2014. 

Nuala O’Connor, President and CEO of CDT; Aimee Guidera, founder and Executive Director, DQC; and Brenda Leong, FPF Education Policy Fellow.

The Digital Student Challenge

Jon Phillips of Dell opened the session with a challenge: we need to change the learning process, as well as how we measure the learning that is occurring.  He emphasized the need for students to be more active players in their own learning journey, with ownership of both the goals and the process for achieving them.  This is a growing refrain in privacy circles, recently addressed by FPF’s parent blogger, Olga Garcia-Kaplan.

Phillips emphasized four aspects of good digital design:

–       helping the teacher and student orchestrate the learning environment

–       creating a collaborative environment for meaningful learning to occur

–       maintaining focus on the critical value of experiential learning

–       applying assistive or adaptive tools so that technology enables learning for all students

Phillips’ key point is that true personalization should be the central tenet, while allowing students a personal voice and the self-direction to understand the relevance of the learning process for their own goals.  He sees personalization as resting on digital literacy, data security, and integration. He also pointed out that in such a model, even with an increasing role for technology, the teacher’s impact remains critically important for success.  Because teachers can be overwhelmed by technology opportunities, we should help deliver the relevant and effective tools they can actually use.

Phillips concluded with a challenge: create the freedom for teachers to take risks, reward those risks, and find ways to bring students into the technology conversation and process.

Principles and Ethics

Following Phillips’ presentation, small groups of attendees addressed a variety of questions facing the edtech community, which were then pulled into a larger discussion with a panel of privacy experts.  The discussions addressed many of the key issues that FPF is currently working on and publicizing.  First, participants agreed that there are many “basics” that most schools, companies, and parents agree on for privacy and data security protections: many of the things that schools have always done (grade work, take attendance, discipline students) are now digitized and thus maintained, evaluated, and aggregated in ways not historically possible.

Most attendees agreed that once this “low-hanging fruit” is addressed, the tougher questions of ethics must follow, covering new uses of data that were previously inconceivable.  The role of parents, the possibility of certification processes for specific education apps, and the need to clarify what a student’s expectation of privacy really is, or should be, were all explored.

The panel also agreed that part of the way forward is to clarify the conversation, separating out not only “privacy” from “security” as individual requirements and challenges, but also disentangling privacy-specific concerns from the broader curriculum and educational reform movements.  FPF recently published on the ethics of student privacy, and the trust required between parents and schools.

Importantly, as pointed out by panelist Shannon Sevier, President of the National PTA, parents must be included in the conversation.  Since most parents will not be technology experts, much less privacy savvy, communication must include the what/why/how of data collection and use by schools, as well as clear descriptions of the benefits, rather than the rarer (but media-attractive) scare stories. A blended approach to classroom and homework is an important part of changing the way learning occurs, but safeguards are needed to ensure equity of access to the various technology portals, something not yet available for all students.

All agreed that today, use of technology is growing and changing faster than best practices and communication have adapted, so it is incumbent on companies and privacy advocates to help families and policymakers catch up, with outreach and programming that helps make clear what the existing rules are, and guides the conversation toward future enforceable standards.

At the end of the day, many voices agreed that citizenship, while moving into a digital age, is still a question of values and judgment and learned behaviors. Students must be taught their “digital citizenship” with its associated rights and responsibilities, just as they have been in more traditional forms in the past.  FPF continues its work with a group of education and industry stakeholders.  If you are interested in participating, contact us at [email protected].

Transparency About What Data is Used is Key for Parents and Students

Last week, the New York Times hosted a Room for Debate on Protecting Student Privacy in On-Line Learning.  The question: “Is the collection of data from schools an invasion of students’ privacy?”  Jules weighed in, making the key point that schools must continuously and effectively communicate with parents about what technology is being used, what data is being collected, and how that information is safeguarded.  FERPA/Sherpa parent blogger Olga Garcia-Kaplan also participated, emphasizing that the focus must stay on the education and needs of individual students.

Thoughts on the Data Innovation Pledge

Yesterday, as he accepted the IAPP Privacy Vanguard award, Intel’s David Hoffman made a “data innovation pledge” that he would work only to promote ethical and innovative uses of data. As someone who only relatively recently entered the privacy world by diving headfirst into the sea of challenges surrounding big data, I think an affirmative pledge of the sort David is proposing is a great idea.

While pledges can be accused of being mere rhetorical flourishes, words do matter. A simple pledge can communicate a good deal — and engage the public in a way that drives conversation forward. Think of Google’s early motto, “Don’t be evil.” For years this commitment fueled a large reservoir of trust in the company. Every new product and service that Google releases is viewed through the lens of whether or not it is “evil.” That sets a high standard for the folks at Google, and for that, we should be pleased.

Of course, pledges present obligations and challenges. Using data only for good presents a host of new questions. As FPF explored in our whitepaper on benefit-risk analysis for Big Data, there are different aspects to consider when evaluating the benefits of data use — and some of these factors are largely subjective. Ethics is a broad field, and it also exposes the challenging philosophical underpinnings of privacy.

The very concept of privacy has always been a philosophical conundrum, but so much of the rise of the privacy profession has focused on compliance issues and the day-to-day reality of data protection. But today, we’re swimming in a sea of data, and all of this information makes us more and more transparent to governments, industry, and each other. It’s the perfect catalyst to consider what the value of “privacy” truly is. Privacy as an information on/off switch may be untenable, but privacy as a broader ethical code makes a lot of sense.

There are models to learn from. As David points out, other professions are bound by ethical codes, and much of that seeps into how we think about privacy. Doctors not only pledge to do no harm, but they also pledge to keep our confidences about our most embarrassing or serious health concerns. Questionable practices around innovation and data in the medical field led to review boards to protect patients and human test subjects and reaffirmed the role of every medical professional to do no evil.

Similar efforts are needed today as everything from our wristwatches to our cars is “datafied.” In particular, I think about all of the debates that have swirled around the use of technology in the classroom in recent years. A data innovation pledge could help relieve worried parents. If Monday’s FTC workshop is any indication, similar ethical conversations may even be needed for everyday marketing and advertising.

The fact is that there are a host of different data uses that could benefit from greater public confidence. A data innovation pledge is a good first step. There is no question that companies need to do more to show the public how they are promoting innovative and ethical uses of data. Getting that balance right is tough, but here’s to privacy professionals helping to lead that effort!

-Joseph Jerome, Policy Counsel

FTC Wants Tools to Increase Transparency and Trust in Big Data

However we want to define “Big Data” – and the FTC’s latest workshop on the subject suggests a consensus definition remains elusive – the path forward seems to call for more transparency and the establishment of firmer frameworks on the use of data. As Chairwoman Ramirez suggested in her opening remarks, Big Data calls for a serious conversation about “industry’s ethical obligations as stewards of information detailing nearly every facet of consumers’ lives.”

Part of that challenge is that Big Data uses can often be “discriminatory.” Highlighting findings from his paper on Big Data and discrimination, Solon Barocas began the workshop by noting that the whole point of data mining is to differentiate and to draw distinctions. In effect, Big Data is a rational form of discrimination, driven by apparent statistical relationships rather than any capriciousness. When humans introduce unintentional biases into the data, there is no ready solution at a technical or legal level. Barocas called for lawyers and public policy makers to have a conversation with the technologists and computer scientists working directly with data analytics – a sentiment echoed when panelists realized a predictive analytics conference was going on simultaneously across town.

But the key takeaway from the workshop wasn’t that Big Data could be used as a tool to exclude or include; instead, there was considerable desire to understand more about industry’s approach to big data. Everyone in the civil rights community agreed that data could be a good thing, and a number of examples were put forth to suggest once more that data has the potential to be used for good or for ill. Pam Dixon of the World Privacy Forum noted that classifying individuals creates a “data paradox,” where the same data can be used to help or to harm that individual. For our part, FPF released a report alongside the Anti-Defamation League detailing Big Data’s ability to combat discrimination. FTC staff repeatedly asked not just for more examples of positive uses of big data by the private sector, but also inquired as to what degree of transparency would help policy makers understand Big Data decision-making.

FTC Chief Technologist Latanya Sweeney, whose earlier study suggested that web searches for African-American names were more likely than searches for white-sounding names to return ads suggesting the person had an arrest record, followed up with a look at credit card advertising and website demographics. Sweeney presented evidence that advertisements for harshly criticized credit cards were often directed to the homepage of Omega Psi Phi, a popular black fraternity.

danah boyd observed that there was a general lack of transparency about how Big Data is being used within industry, for a variety of complex reasons. FTC staff and Kristin Amerling from Senate Commerce singled out the opacity surrounding the practices of data brokers when describing some of the obstacles policy makers face when trying to understand how Big Data is being used.

Moreover, while consumers and policy makers are trying to grapple with what companies are doing with their streams of data, industry is also placed in the difficult position of making huge decisions about how that data can be used. For example, boyd cited the challenges JPMorgan Chase faces when using analytics to evaluate human trafficking. She applauded the positive work the company was doing, but noted that expecting it to have the ability or expertise to effectively intervene in trafficking perhaps asks too much. They don’t know when to intervene or whether to contact law enforcement or social services.

These questions are outside the scope of their expertise, but even general use of Big Data can prove challenging for companies. “A lot of the big names are trying their best, but they don’t always know what the best practices should be,” she concluded.

FTC Commissioner Brill explained that she supports a legislative approach to increase transparency and accountability among data brokers, their data sources, and their consumers in order to help consumers and policy makers “begin to understand how these profiles are being used in fact, and whether and under what circumstances they are harming vulnerable populations.” In the meantime, she encouraged industry to take more proactive steps. Specifically, she recommended again that data brokers explore how their clients are using their information, take steps to prevent any inappropriate uses, and further inform the public. “Companies can begin this work now, and provide all of us with greater insight into – and greater assurances about – their models,” she concluded.

A number of legal regimes may already apply to Big Data, however. Laws that govern the provision of credit, housing, and employment will likely play a role in the Big Data ecosystem. Carol Miaskoff of the Equal Employment Opportunity Commission suggested there was real potential for Big Data to gather information about successful employees and use it to screen people for employment in a way that exacerbates prejudices built into the data. Emphasizing his recent white paper, Peter Swire suggested there were analogies to be made between sectoral regulation in privacy and sectoral legislation in anti-discrimination law. With existing laws in place, he argued that it was past time to “go do the research and see what those laws cover” in the context of Big Data.

“Data is the economic lubricant of the economy,” the Better Business Bureau’s C. Lee Peeler argued, and he supported the FTC’s continued efforts to explore the subject of Big Data. He cited earlier efforts by the Commission to examine inner-city marketing practices, which produced a number of best practices still valid today. He encouraged the FTC to look at what companies are doing with Big Data on a self-regulatory basis as a basis for developing workable solutions to potential problems.

So what is the path forward? Because Big Data is, in the words of Promontory’s Michael Spadea, a nascent industry, there is a very real need for guidelines on not just how to evaluate the risks and benefits of Big Data but also how to understand what is ethically appropriate for business. Chris Wolf highlighted FPF’s recent Data-Benefit Analysis and suggested companies were already engaged in detailed analysis of the use of Big Data, though everyone recognized that business practices and trade secrets precluded making much of this public.

FTC staff noted there was a “transparency hurdle” to get over in Big Data. Recognizing that “dumping tons of information” onto consumers would be unhelpful, staff picked up on Swire’s suggestion that industry needed some mechanism to justify what is going on to either regulators or self-regulatory bodies. Spadea argued that “the answer isn’t more transparency, but better transparency.” The Electronic Frontier Foundation’s Jeremy Gillula recognized the challenge companies face in revealing their “secret sauce,” but encouraged them to look at more ways to give consumers more general information about what is going on. Otherwise, he recommended, consumers ought to collect big data on big data and turn data analysis back on data brokers and industry at large through open-source efforts.

At the same time, Institutional Review Boards, which are used in human subject testing research, were again proposed as a model for how companies can begin affirmatively working through these problems. Citing a KPMG report, Chris Wolf insisted that strong governance regimes, including “a strong ethical code, along with process, training, people, and metrics,” were essential to confront the many ethical and philosophical challenges that swirled around the day’s discussions.

Jessica Rich, the Director of the FTC’s Bureau of Consumer Protection, cautioned that the FTC would be watching. In the meantime, industry is on notice. The need for clearer data governance frameworks is clear, and careful consideration of Big Data projects should be both reflexive and something every industry privacy professional talks about.


iOS 8 and Privacy: Major New Privacy Features

iOS 8 includes several new privacy features founded on Apple’s core privacy principles of consent, choice and transparency. With these principles in mind, Apple created and incorporated increasingly granular controls for location, opportunities for developers to communicate to users how and why they use data, and limits on how third parties can track your device.

Users now have greater visibility regarding application access to location information.

In previous versions of iOS, apps could prompt users for permission to use Location Services, and, once a user gave an app access, the app could access the user’s location any time it was running, including when the app was not on screen (i.e., in the background).  In iOS 8, Location Services has two modes: “While Using the App” – whereby the app can only access location when the app is on screen or made visible to the user by iOS turning the status bar blue – or “Always.”  Apps have to decide which Location Services mode to request, and Apple encourages them to request “Always” location permission only when users would “thank them for doing so.” In fact, iOS 8 will at times present a reminder notice to users if an app with “Always” location permission uses Location Services while the app is not on screen.
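
For developers, this distinction surfaces directly in Core Location. Below is a minimal sketch of the two request paths, assuming the standard CLLocationManager calls that shipped with iOS 8 and the corresponding purpose-string keys in the app’s Info.plist (written in current Swift for readability):

```swift
import CoreLocation

final class LocationPermissionExample: NSObject, CLLocationManagerDelegate {
    private let manager = CLLocationManager()

    // "While Using the App": location is available only while the app is on screen
    // (or iOS shows the blue status bar). Requires an NSLocationWhenInUseUsageDescription
    // purpose string in Info.plist.
    func requestWhenInUseAccess() {
        manager.delegate = self
        manager.requestWhenInUseAuthorization()
    }

    // "Always": background access. Requires an NSLocationAlwaysUsageDescription purpose
    // string; Apple advises requesting it only when users would "thank you for doing so,"
    // and iOS 8 periodically reminds users that the app has been using location in the
    // background.
    func requestAlwaysAccess() {
        manager.delegate = self
        manager.requestAlwaysAuthorization()
    }
}
```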

Users will be able to limit access to their contacts.

In iOS 8, users can rely on a picker, controlled and mediated by iOS, to share a specific contact with an app without giving the app access to their entire address book.

Apps will be able to link directly to privacy settings.

With iOS 8, apps will be able to link directly to their settings, including their privacy settings, making it easier for users to control their privacy. Before, apps could only give instructions on how to go to the phone’s settings to change the privacy controls. This new feature makes control over privacy settings more accessible to users.
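
As a rough illustration of how this works in practice, an app can open its own page in Settings through the system-provided settings URL (the constant was introduced as UIApplicationOpenSettingsURLString in iOS 8; the sketch below uses current Swift names):

```swift
import UIKit

// Minimal sketch: deep-link into this app's own Settings page, where the user can
// review and change the app's privacy permissions.
func openAppSettings() {
    guard let url = URL(string: UIApplication.openSettingsURLString) else { return }
    UIApplication.shared.open(url)
}
```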

Apple’s new Health app implements additional protections for users’ health data.

Apple’s new Health app and HealthKit APIs give third-party health and fitness apps a secure location to store their data and give users an easy-to-read dashboard for their health and fitness data. Apple has implemented a number of features and safeguards to protect user privacy. First, a user has full control as to which apps can input data into Health and which apps can access Health data. Second, all Health data on the iOS device is encrypted with keys protected by a user’s passcode. Finally, developers are required to obtain express user consent before sharing Health data with third parties, and even then they may only do so for the limited purpose of providing health or fitness services to the user. These features and restrictions allow users to have control over their HealthKit data.
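
To illustrate the consent model, here is a minimal HealthKit sketch using the standard HKHealthStore authorization call; the specific data types shown are illustrative, not drawn from the post:

```swift
import HealthKit

// Minimal sketch: request per-type HealthKit permissions. iOS presents the consent
// sheet, and the user decides, type by type, what the app may write or read.
func requestHealthAccess() {
    guard HKHealthStore.isHealthDataAvailable(),
          let stepCount = HKObjectType.quantityType(forIdentifier: .stepCount),
          let heartRate = HKObjectType.quantityType(forIdentifier: .heartRate) else { return }

    let store = HKHealthStore()
    store.requestAuthorization(toShare: [stepCount], read: [stepCount, heartRate]) { success, error in
        // 'success' means the request was processed, not that every type was granted;
        // the app simply receives no data for read types the user declined.
    }
}
```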

Apple requires apps accessing sensitive data to have a privacy policy disclosing their practices to users

Apple requires apps that utilize the HealthKit or HomeKit APIs, offer third-party keyboards, or target kids to have a privacy policy, supporting industry standards and California law. App privacy policies should include what data is collected, what the app plans to do with that data, and, if the app plans to share it with any third parties, who they are. Users will be able to see the privacy policy on the App Store before and after downloading an app.

iOS 8 places additional emphasis on disclosure of why developers want access to data.

Apple strongly encourages developers to explain why their apps request a user’s data or location when a user is prompted to give an app access. Developers can do so in “purpose strings,” which are part of the notice that appears when an app first tries to access a protected data class.

Apple’s iOS encourages a “just in time” model, where users should be prompted for access after they take an action in an app that requires the data.  The “just in time” prompt and access flow is mediated by iOS and replaces consent models such as those consisting of strings of permissions that pop up after installation like a conga line or users having to give an app access to all data if they want to use an app. Moreover, Apple continues its practice of encouraging app developers to only ask for access to data when needed, and to gracefully handle not getting permission to access a user’s data.

MAC address randomization makes it more difficult to track and individualize iOS devices.

Wi-Fi enabled devices generally scan for available wireless networks.  These scans include the device Media Access Control (MAC) address, which is a 12 character string of letters and numbers, required by networking standards to identify a device on a network and assigned by the manufacturer.  Mobile Location Analytic companies have, at times, relied on these scans, and the fact that Wi-Fi devices’ MAC addresses do not change, to track individual mobile devices as they move around a venue.

In iOS 8, Apple devices will generate and use random MAC addresses to passively scan for networks, shielding users’ true MAC addresses until a user decides to associate with a specific network. Randomizing MAC addresses makes this kind of tracking much more difficult. However, your device can still be tracked when you are connected to a Wi-Fi network or using Bluetooth.  FPF’s Mobile Location Analytics Code of Conduct governs the practices of the leading location analytics companies and provides an opt-out from mobile location tracking. Visit Smart-Places for more details or to opt-out.

Summary

iOS 8’s new “prompting with purpose” disclosures, refined location settings, strict requirements for HealthKit, HomeKit, and kids’ apps, and MAC address randomization will provide greater transparency, protection, and control over privacy for iOS users.

Lessons from Fair Lending Law for Fair Marketing and Big Data

Where discrimination presents a real threat, big data need not necessarily lead us to a new frontier. Existing laws, including the Equal Credit Opportunity Act and other fair lending laws, provide a number of protections that are relevant when big data is used for online marketing related to lending, housing, and employment. In comments to be presented at the FTC public workshop, Professor Peter Swire will discuss his work in progress entitled Lessons from Fair Lending Law for Fair Marketing and Big Data. Swire explains that fair lending laws already provide guidance as to how to approach discrimination that allegedly has an illegitimate, disparate impact on protected classes. Data actually plays an important role in being able to assess whether a disparate impact exists! Once a disparate impact is shown, the burden shifts to creditors to show that their actions serve a legitimate business need and that no less discriminatory alternative exists. Fair lending enforcement has encouraged the development of rigorous compliance mechanisms, self-testing procedures, and a range of proactive measures by creditors.

Big Data: A Tool for Fighting Discrimination and Empowering Groups

Even as big data uses are examined for evidence of facilitating unfair and unlawful discrimination, data can help to fight discrimination. It is already being used in myriad ways to protect and to empower vulnerable groups in society. In partnership with the Anti-Defamation League, FPF prepared a report that looked at how businesses, governments, and civil society organizations are leveraging data to provide access to job markets, to uncover discriminatory practices, and to develop new tools to improve education and provide public assistance.  

Big Data: A Tool for Fighting Discrimination and Empowering Groups explains that although big data can introduce hidden biases into information, it can also help dispel existing biases that impair access to good jobs, good education, and opportunity.

Big Data: A Benefit and Risk Analysis

On September 11, 2014, FPF released a whitepaper that we hope will help frame the big data conversation moving forward and promote a better understanding of how big data can shape our lives. Big Data: A Benefit and Risk Analysis provides a practical guide for how benefits can be assessed in the future, and it also shows how data is already being used in the present.

Privacy professionals have become experts at evaluating risk, but moving forward with big data will require rigorous analysis of project benefits to go along with traditional privacy risk assessments. We believe companies or researchers need tools that can help evaluate the cases for the benefits of significant new data uses.  Big Data: A Benefit and Risk Analysis is intended to help companies assess the “raw value” of new uses of big data. Particularly as data projects involve the use of health information or location data, more detailed benefit analyses that clearly identify the beneficiaries of a data project, its size and scope, and that take into account the probability of success and evolving community standards are needed.   We hope this guide will be a helpful tool to ensure that projects go through a process of careful consideration.

Identifying both benefits and risks is a concept grounded in existing law. For example, the Federal Trade Commission weighs the benefits to consumers when evaluating whether business practices are unfair or not. Similarly, the European Article 29 Data Protection Working Party has applied a balancing test to evaluate legitimacy of data processing under the European Data Protection Directive. Big data promises to be a challenging balancing act.

Click here to read the full document: Big Data: A Benefit and Risk Analysis.

Data Protection Law Errors in Google Spain SL, Google Inc. v. Agencia Espanola de Proteccion de Datos, Mario Costeja Gonzalez

The following is a guest post by Scott D. Goss, Senior Privacy Counsel, Qualcomm Incorporated, addressing the recent “Right to be Forgotten” decision by the European Court of Justice.

There has been quite a bit of discussion surrounding the European Court of Justice’s judgment in Google Spain SL, Google Inc. v. Agencia Espanola de Proteccion de Datos (AEPD), Mario Costeja Gonzalez.  In particular, some interesting perspectives have been shared by Daniel Solove, Ann Cavoukian and Christopher Wolf, and Martin Husovec.  The ruling has been so controversial that newly appointed EU Justice Commissioner Martine Reicherts delivered a speech defending it.  I’d like to add to the discussion.[1]  Rather than focusing on the decision’s policy implications or on the practicalities of implementing the Court’s ruling, I’d like instead to offer thoughts on a few points of data protection law.

To start, I don’t think “right to be forgotten” is an apt description of the decision, and instead distorts the discussion.  Even if Google were to follow the Court’s ruling to the letter, the information doesn’t cease to exist on the Internet.  Rather, the implementation of the Court’s ruling just makes internet content linked to peoples’ names harder to find.  The ruling, therefore, could be thought of as, “the right to hide”.  Alternatively, the decision could be described as, “the right to force search engines to inaccurately generate results.”  I recognize that such a description doesn’t roll off the tongue quite so simply, but I’ll explain why that description is appropriate below.

I believe the Court made a few important legal errors that should be of interest to all businesses that process personal data.  The first was the Court’s determination that Google was a “controller” as defined under EU data protection law; the second was its application of the information relevance question.  I’ll then explain why “the right to force search engines to inaccurately generate results” may be a more appropriate description of the Court’s ruling.

1.       “Controller” status must be determined from the activity giving rise to the complaint

To understand how the Court erred in determining that Google is a “controller” in this case, it helps to understand how search engines work.  At a conceptual level, search comprises three primary data processing activities:  (i) caching all the available content, (ii) indexing the content, and (iii) ranking the content.  During the initial caching phase, a search engine’s robot minions scour the Internet, noting all available content and its location.  The cache can be copies of all or parts of the web pages on the Internet.  The cache is then indexed to enable much faster searching by sorting the content.  Indexing is important because, without it, searching would take immense computing power and significant time, as each page of the Internet would have to be examined for users’ search queries.  Finally, the content within the index is ranked for relevance.

From a data processing perspective, I believe that caching and indexing achieve two objective goals: determining the available content of the Internet and where it can be found.  Tellingly, the only time web pages are not cached and indexed is when website publishers, not search engines, include a special instruction telling search engines to ignore the page.  This convention is known as robots.txt (a site-level file; individual pages can use an equivalent robots meta tag).
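
To make the caching-and-indexing step concrete, here is a toy sketch (my own illustration, not how any production search engine is built) of the objective part of the pipeline: given cached page text, build an inverted index mapping each term to the pages where it can be found. Ranking would be a separate, judgment-laden step.

```swift
// Toy inverted index over cached pages: a purely mechanical mapping from each term
// to the URLs where it appears. No judgment about relevance or ranking is involved.
func buildInvertedIndex(cachedPages: [String: String]) -> [String: Set<String>] {
    var index: [String: Set<String>] = [:]
    for (url, text) in cachedPages {
        let terms = text.lowercased()
            .split { !$0.isLetter && !$0.isNumber }
            .map(String.init)
        for term in terms {
            index[term, default: []].insert(url)
        }
    }
    return index
}

// Looking up a name simply reports where those characters can be found, e.g.:
// buildInvertedIndex(cachedPages: pages)["gonzalez"] ?? []
```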

The web pages that are cached and indexed could be the text of the Gettysburg address, the biography of Dr. Martin Luther King, Jr., the secret recipe for Coca-Cola, or newspaper articles that include peoples’ names.  It is simply a fact that the letters comprising the name “Mario Costeja Gonzalez” could be found on certain web pages.  Search engines cannot control that fact any more than they could take a picture of the sky and be said to control the clouds in the picture.

After creating the cache and index, the next step involves ranking the content.  Search engine companies employ legions of the world’s best minds and immense resources to determine rank order.  Such ranking is subjective and takes judgment.  Arguably, ranking search results could be considered a “controller” activity, but the ranking of search results was never at issue in the Costeja Gonzalez case.  This is a key point underlying the Court’s errors.  Mr. Costeja Gonzalez’s complaint was not that Google ranked search results about him too high (i.e., Google’s search result ranking activity), but rather that the search engine indexed the information at all.  The appropriate question, therefore, is whether Google is the “controller” of the index.  The question of whether Google’s process of ranking search results confers “controller” status on Google is irrelevant.  The Court’s error was to conflate Google’s activity of ranking search results with its caching and indexing of the Internet.

Some may defend the Court by arguing that controller status of some activities automatically anoints controller status on all activities.  This would be error.  The Article 29 Working Party opined,

[T]he role of processor does not stem from the nature of an entity processing data but from its concrete activities in a specific context.  In other words, the same entity may act at the same time as a controller for certain processing operations and as a processor for others, and the qualification as controller or processor has to be assessed with regard to specific sets of data or operations.

Opinion 1/2010 on the concepts of “controller” and “processor”, page 25, emphasis added.  In this case, Mr. Costeja Gonzalez’s complaint focused on the presence of certain articles about him in the index.  Therefore, the “concrete activities in a specific context” is the act of creating the index and the “specific sets of data” is the index itself.  The Article 29 WP went on to give an example of an entity acting as both a controller and a processor of the same data set:

An ISP providing hosting services is in principle a processor for the personal data published online by its customers, who use this ISP for their website hosting and maintenance.  If however, the ISP further processes for its own purposes the data contained on the websites then it is the data controller with regard to that specific processing.

I submit that creation of the index is analogous to an ISP hosting service.  In creating an index, search engines create a copy of everything on the Internet, sort it, and identify its location.  These are objective, computational exercises, not activities where the personal data is noted as such and treated with some separate set of processing.  Following the Article 29 Working Party opinion, search engines could be considered processors in the caching and indexing of Internet content because such activities are mere objective and computational exercises, but controllers in the ranking of the content due to the subjective and independent analysis involved.

Further, as argued in the Opinion of Advocate General Jaaskinen, a controller needs to recognize that they are processing personal data and have some intention to process it as personal data. (See paragraph 82).  It is the web publishers who decide what content goes into the index.  Not only do they have discretion in deciding to publish the content on the Internet in the first instance, but they also have the ability to add the robots.txt code to their web pages which directs search engines to not cache and index.  The mark of a controller is one who “determines the purposes and means of the processing of personal data.” (Art. 2, Dir. 95/46 EC).  In creation of the index, rather than “determining”, search engines are identifying the activities of others (website publishers) and heeding their instructions (use or non-use of robots.txt).  I believe such processing cannot, as a matter of law, rise to the level of “controller” activities.

Finding Google to be a “controller” may have been correct if either the facts or the complaint had been different.  Had Mr. Costeja Gonzalez produced evidence that (i) the web pages he wanted removed contained the “robots.txt” instruction or (ii) the particular web pages were removed from the Internet by the publisher but not by Google in its search results, then it may have been appropriate to hold Google to be a “controller” due to these independent activities.  Such facts would be similar to the example given by the Article 29 Working Party of an ISP’s independent use of personal data maintained by its web hosting customers.  Similarly, had Mr. Costeja Gonzalez’s complaint been that search results regarding his prior bankruptcy were ranked too high, then I could understand (albeit I may still disagree) that Google would be found to be a controller.  But that was not his complaint.  His complaint was that certain information was included in the index at all – and that, I believe, is something Google can no more control than the content of the Internet itself.

2.       “Relevance” of Personal Data must be evaluated in light of the purpose of the processing.

The Court’s second error arose in the application of the controller’s obligations.  Interestingly, after finding that Google is the controller of the index, the Court incorrectly applied the relevancy question.  To be processed legitimately, personal data must be “relevant and not excessive in relation to the purposes for which they are collected and/or further processed.” Directive 95/46 EC, Article 6(c) (emphasis added). Relevancy is thus a question in relation to the purpose of the controller – not as to the data subject, a customer, or anyone else.  The purpose of the index, in Google’s own words, is to “organize the world’s information and make it universally accessible and useful.” (https://www.google.com/about/company/).  With that purpose in mind, all information on the Internet is, by definition, relevant.  While clearly there are legal boundaries to the information that Google can make available, the issue is whether privacy law contains one of those boundaries.  I suggest that in the context of caching and creating an index of the Internet, it is not.

The court found that Google legitimizes its data processing under the legitimate interest test of Article 7(f) of the Directive.  Google’s legitimate interests must be balanced against the data subjects’ fundamental rights under Article 1(1).  Since Article 1(1) provides no guidance as to what those rights are (other than “fundamental”), the Court looks to subparagraph (a) of the first paragraph of Article 14.  That provision provides data subjects with a right to object to data processing of their personal data, but offers little guidance as to when controllers must oblige.  Specifically, it provides in cases of legitimate interest processing, a data subject may,

“object at any time on compelling legitimate grounds relating to his particular situation to the processing of data relating to him, save where otherwise provided by national legislation.  Where there is a justified objection, the processing instigated by the controller may no longer involve those data.”

What are those “compelling legitimate grounds” for a “justified objection”?  The Court relies on Article 12(b) “the rectification, erasure, or blocking of data the processing of which does not comply with the provisions of this Directive, in particular because of the incomplete or inaccurate nature of the data”.  It is here the Court erred.

The Court took the phrase “incomplete or inaccurate nature of the data” and erroneously applied it to the interests of the data subject.  Specifically, the Court held that the question is whether the search results were “incomplete or inaccurate” representations of the data subject as he/she exists today.  I submit that was not the intent of Article 12(b).  Rather, Article 12(b) was referring back to the same use of that phrase in Article 6 providing that:

“personal data must be: . . . (d) accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete, having regard to the purposes for which they were collected or for which they are further processed, are erased or rectified.”

The question is not whether the search results are “incomplete or inaccurate” representations of Mr. Costeja Gonzalez, but whether the search results are inaccurate as to the purpose of the processing.  The purpose of the processing is to copy, sort, and organize the information on the internet.  In this case, queries for the characters “Mario Costeja Gonzalez” displayed articles that he admits were actually published on the Internet.  Such results, therefore, are by definition not incomplete or inaccurate as to the purpose of the data processing activity.  To put it simply, the Court applied the relevancy test to the wrong party (Mr. Costeja Gonzalez) as opposed to Google and the purpose of its index.

To explain by analogy, examine the same legal tests applied to a credit reporting agency.  A credit reporting analogy is helpful because it also has at least three parties involved in the transaction.  In the case of the search engine, those parties are the search engine, the data subject, and the end user conducting the search.  In the case of credit reporting, the three parties involved are the credit ratings businesses, the consumers who are rated (i.e., the data subjects), and the lenders and other institutions that purchase the reports.  It is well-established law that consumers can object to information used by credit ratings businesses as being outdated, irrelevant, or inaccurate.  The rationale for this right is found in Article 12(b) and Article 6 in relation to the purpose of the credit reporting processing activity.

The purpose of credit reporting is to provide lenders an opinion on the credit worthiness of the data subject.  The credit ratings business must take care that the information they use is not “inaccurate or incomplete” or they jeopardize the purpose of their data processing by generating an erroneous credit score.  For example, if a credit reporting agency collected information about consumers’ height or weight, consumers would be able to legitimately object.  Consumers’ objections would not be founded on the fact that the information is not representative of who they are – indeed such data may be completely accurate and current.  Instead, consumers’ objections would be founded on the fact that height or weight are not relevant for the purpose of assessing consumers’ credit worthiness.

Returning to the Costeja Gonzalez case, the issue was whether the index (not the ranking of such results) should include particular web pages containing the name of Mr. Costeja Gonzalez.  Since the Court previously determined that Google was the “controller” of the index (which I contend was error), the Court should have determined Google’s purpose for the index and then set the inquiry as to whether the contested web pages were incorrect, inadequate, irrelevant or excessive as to Google’s purpose.  As discussed above, Google’s professed goal is to enable the discovery of the world’s information, and to that end the purpose of the index is to catalog, as much as is technologically possible, the entire Internet – all the good, bad, and ugly.  For that purpose, any content on the Internet about the key words “Mario Costeja Gonzalez” is, by definition, not incorrect, inadequate, irrelevant or excessive, because the goal is to index everything.  Instead, however, the Court erred by asking whether the web pages were incorrect, inadequate, irrelevant or excessive as to Mr. Costeja Gonzalez, the data subject.  Applying the relevancy question as to Mr. Costeja Gonzalez is, well, not relevant.

Some may argue that the Court recognized the purpose of the processing when making the relevancy determination by finding that Mr. Costeja Gonzalez’s rights must be balanced against the public’s right to know.  By including the public’s interest in the relevancy evaluation, some may argue, the Court has appropriately directed the relevancy inquiry to the right parties.  I disagree.  First, I do not believe it was appropriate to inquire as to the relevancy of the links vis-a-vis Mr. Costeja Gonzalez in the first instance and, therefore, to balance it against other interests (in this case the public) does not cure the error.  Secondly, to weigh the interests of the public, one must presume that the purpose of searching for individuals’ names is to obtain correct, relevant, not inadequate and not excessive information.  I do not believe such presumptions are well-founded.  For example, someone searching for “Scott Goss” may be searching for all current, relevant, and non-excessive information about me.  On the other hand, fifteen years from now perhaps someone is searching for all privacy articles written in 2014 and they happened to know that I wrote one and so searched using my name.  One cannot presume to know the purpose of an individual’s search query other than a desire to have access to all the information on the Internet containing the query term.

If not the search engines, where would it be pertinent to ask whether information on the Internet about Mr. Costeja Gonzalez was incorrect, inadequate, irrelevant or excessive?  The answer to this question is clear:  it is the entities that have undertaken the purpose of publishing information about Mr. Costeja Gonzalez.  Specifically, website publishers process personal data for the purpose of informing their readers about those individuals.  The website publishers, therefore, have the burden to ensure that such information is not incorrect, inadequate, irrelevant or excessive as to Mr. Costeja Gonzalez.  That there may be an exception in data protection law for web publishers does not mean that courts should be free to foist obligations onto search engines.

3.       The right to force search engines to inaccurately generate results

Finally, the “right to force search engines to inaccurately generate results” is, I believe, an apt description of the ruling.  A search engine’s cache and index is supposed to contain all of the web’s information that web publishers want the world to know.  Users expect that search engines will identify all information responsive to their queries when they search.  Users further expect that search engines will rank all the results based upon their determination of the relevancy of the results in relation to the query.  The Court’s ruling forces search engines to generate an incomplete list of search results by gathering all information relevant to the search and then pretending that certain information doesn’t exist on the Internet at all.  The offending content is still on the Internet; people just cannot rely on finding it by entering individuals’ names into search engines (at least the search engines on European country-coded domains).

 


[1] These thoughts are my own and not those of the company for which I work, and I do not profess to be an expert in search technologies or the arguments made by the parties in the case.