Consumer Genetic Testing: Beginning to Assess Privacy Practices

Genetic testing is becoming more widely available to consumers; such testing can be an exciting new opportunity to help individuals flesh out family histories, discover cultural connections, and learn about their personal backgrounds.  The availability of low-cost genetic sequencing and analysis has led to numerous businesses offering a variety of services, including some that provide detailed health and wellness reports that explain how genetics can influence risks for certain diseases.  The enthusiastic public response demonstrates that there is great demand for this knowledge.

But, as with so many new technologies, this new data analysis also raises privacy questions.  DNA can be immensely revealing. And by its nature, DNA includes information about an individual’s close relatives – not just data about the person tested.  The broad US law protecting health privacy, HIPAA, only protects health information when handled by specific types of entities, such as health care providers or health insurers.  If your doctor orders a genetic test, all the providers involved are bound by HIPAA requirements.  But if you order a consumer genetic test on your own, those restrictions are not applicable.

To ensure that genetic information isn’t misused, Congress acted, providing protections in some areas.  The Genetic Information Nondiscrimination Act of 2008 (GINA) prohibits the use of genetic information to make health insurance and employment decisions.  GINA was a landmark when it passed, but it does not provide comprehensive protections.  For example, GINA does not apply to decisions about schools, mortgage lending, or housing. And it excludes other forms of insurance like life insurance, long-term care, and disability insurance, although some states do provide some additional protections in these areas.

Given the gaps in legal protection, it is particularly important that companies offering genetic testing to consumers provide rock-solid, legally enforceable commitments ensuring their data won’t be used to harm them. Consumers, in turn, should look for commitments not to share genetic information without explicit permission, the ability to delete their information, and promises to use the data only for the expected purposes. FPF has begun discussions with a number of consumer genetics companies and hopes to share best practices guidance in the upcoming months.

But before we begin, there are some useful lessons that FPF can share from our work in other sectors. It helps to understand some of the language that is common to the legal construction of privacy policies and terms of service, as well as the underlying protections provided by federal and state consumer protection laws.

  1. Companies do not own your data when they claim a perpetual license to use your information. When you provide a company with data – whether that data is DNA, user comments, profile pictures, or other content the company needs to hold and use to provide services – the company will often declare that it has a perpetual, royalty-free, worldwide license to use your information. Corporate intellectual property lawyers insist on this language to give themselves the right to use the data on an ongoing basis, subject to the restrictions they place on themselves; such restrictions can include commitments to use data only for the services described in a company’s policies, and users’ right to demand deletion of the data. Search the phrase “perpetual license,” and you will find it in the policies of almost every online service that allows the submission of user content. This does not mean the company owns your data and can use it for any purpose it pleases; companies typically cannot make a book out of your private photos or publish your DNA. But several times a year, someone reads “perpetual license” and sounds an alarm that is picked up by the media. The fact that reporters’ own publications have the same language in their online policies is typically not considered. Often, a company will respond by making a cosmetic amendment to its terms, explaining that indeed it does not own consumers’ data. This is the Groundhog Day story of privacy. In 2008, Google’s terms were debated. In 2011, Dropbox was critiqued. In 2012, Twitter and Facebook came under scrutiny. In 2015, it was Microsoft. Last week, AncestryDNA was the latest company to encounter this flap and accordingly updated its terms to explain that it had never asserted legal ownership of consumers’ data. Companies can get ahead of this issue by using clear terms from the outset. Smart consumers and critics should recognize this legal language by now and appreciate that it does not grant a company “ownership rights to user data.” Look for the limitations on what a company can actually do with the data and your rights to opt in or out.
  2. All bets are not off when a company is sold. The Federal Trade Commission (FTC) has repeatedly made clear that it will hold a successor company responsible for using data only in ways compatible with the original privacy policy. Back in the Toysmart case, where sensitive children’s data was involved, the FTC required that Toysmart’s buyer abide by the terms of the Toysmart privacy statement. If the buyer wanted to make changes to that policy, it could not change how the information previously collected by Toysmart was used unless it provided notice to consumers and obtained their affirmative consent (“opt-in”) to the new uses. The FTC will surely hold companies that collect and process DNA to this standard.
  3. Policies cannot be changed at any time. The FTC has been clear that material changes to consumer privacy policies can’t be made without first giving consumers prominent notice and offering them choices before data is used in any manner inconsistent with the terms they were initially provided. So if a company holds sensitive data, it should not claim that it may change its policy at any time and immediately apply the new terms to data it previously collected. If the change is material, a company may not apply it retroactively without consumers’ express, affirmative consent.

These are just some of the baseline issues that are worth understanding before beginning to think through the important commitments genetics companies can make to promote trust and responsible data use in this emerging industry.  Stay tuned for that effort!

Homomorphic Encryption Signals the Future for Socially Valuable Research on Private Data

Encryption has become a cornerstone of the technologies that support communication, commerce, banking, and myriad other essential activities in today’s digital world. In an announcement this week, Google revealed a new marketing measurement tool that relies on a particular type of advanced encryption to allow advertisers to understand whether their online ads have resulted in in-store purchases. The announcement created controversy because of the types of data and analysis involved, with the theme of media coverage being that “Google knows your credit card purchases.” Although the details were lost in much of the coverage, we were far more intrigued by the apparent advances in homomorphic encryption that Google seems to have achieved in order to apply this double-blind method at scale.

The Importance of Encryption

If well encrypted, even data that is made public—intentionally or due to a data breach—remains totally unintelligible to those who may try to access it. However, data that is well encrypted also loses its utility, as it can no longer be analyzed or used in ways that we might want and need. For example, if cloud-based data is fully encrypted and only the owner holds the key, the data is unreadable by the cloud provider, safe from attackers, and unavailable to law enforcement authorities that might approach the cloud provider. If the key is lost, the data may be lost forever. This level of protection is often sought out for purposes of data security and privacy—but it also means that the cloud provider cannot perform useful computing on the data that we may want performed, such as to easily provide a “search” function. If the data cannot be read, it cannot be searched or analyzed. Similarly, research cannot be conducted on encrypted data.

But what if there were methods of encryption that ensured data was converted into ciphertext, protecting the privacy of individuals, while still enabling research to be conducted on the data? The most advanced technique being developed to enable the performance of some basic functions on encrypted data – adding, matching, sorting – is known as homomorphic encryption. This method, recently reviewed in Forbes, has made great strides in recent years, but it has required substantial computing resources and has thus been quite limited in use. Some recent successes have started to make these processes more efficient, exciting those who consider fully homomorphic encryption to be a “holy grail” for researchers.
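To make the idea concrete, here is a minimal sketch of the Paillier cryptosystem, a well-known partially homomorphic scheme in which multiplying two ciphertexts produces an encryption of the sum of the underlying plaintexts. This is an illustration only, not the scheme used by any product mentioned here; the primes are toy-sized, and a real deployment would use a vetted cryptographic library and much larger keys.

```python
import random
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p, q):
    # Toy Paillier key generation; p and q are small primes for illustration only.
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1                      # standard simplification: g = n + 1
    mu = pow(lam, -1, n)           # works because L(g^lam mod n^2) = lam when g = n + 1
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

def add_encrypted(pub, c1, c2):
    # Homomorphic addition: multiplying ciphertexts adds the underlying plaintexts.
    n, _ = pub
    return (c1 * c2) % (n * n)

if __name__ == "__main__":
    pub, priv = keygen(104729, 104723)   # toy primes; real keys are ~2048-bit
    c1, c2 = encrypt(pub, 15), encrypt(pub, 27)
    total = add_encrypted(pub, c1, c2)
    print(decrypt(pub, priv, total))     # 42, computed without decrypting c1 or c2
```

Even this limited additive property lets a party compute sums and counts over values it never sees in the clear; fully homomorphic schemes extend the same idea to arbitrary computations, at much greater computational cost.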

In the words of security expert Bruce Schneier when IBM researcher Craig Gentry first discovered a fully homomorphic cryptosystem: “Visions of a fully homomorphic cryptosystem have been dancing in cryptographers’ heads for thirty years. I never expected to see one. It will be years before a sufficient number of cryptographers examine the algorithm that we can have any confidence that the scheme is secure, but — practicality be damned — this is an amazing piece of work.”

Comparing Datasets through Homomorphic Encryption Methods

One of the reasons researchers are enthused is that very often the datasets they wish to study belong to separate organizations, each of which has promised to protect the privacy and personal information of the data subjects. Fully homomorphic encryption provides the ability to generate aggregated reports about the comparisons between separate, fully encrypted datasets without revealing the underlying raw data. This could prove revolutionary for fields such as medicine, scientific research, and public policy. For example, datasets can be compared to analyze whether people provided with homeless services end up in housing or holding jobs; whether student aid helps students succeed; or whether certain kinds of support can prevent people from being re-admitted to hospitals. These uses all depend on comparing sensitive data held by different parties and subject to strict sharing protections. Homomorphic encryption allows such datasets to be encrypted, thereby protecting personal information from scrutiny, while still being compared and analyzed to gain insights from aggregate-level summary reports.

Similarly, homomorphic encryption can be used in the fields of advertising and marketing. Google announced that it was using a new double-blind encryption system to enable a de-identified analysis of encrypted data about who has clicked on an advertisement in combination with de-identified data held by companies that maintain credit card purchase records. Google can provide a report to an advertiser that summarizes the relationship between the two databases to conclude, for example, that “5% of the people who clicked on your ad ended up purchasing in your store.”
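Google has not published the details of its system, but one well-established way to achieve this kind of double-blind matching is a Diffie–Hellman-style private set intersection: each party blinds hashed identifiers with its own secret key, and because the blinding operations commute, only identifiers present in both datasets collide, so the overlap can be counted without either side seeing the other’s raw records. The sketch below is a toy illustration of that general technique under those assumptions, not Google’s actual protocol; it omits the aggregation of purchase amounts, noise addition, and the other safeguards a real deployment would need.

```python
import hashlib
import secrets

# Toy prime modulus for the group (the Curve25519 field prime); real deployments
# use standardized elliptic-curve groups and carefully vetted protocols.
P = 2**255 - 19

def hash_to_group(identifier: str) -> int:
    digest = hashlib.sha256(identifier.encode()).hexdigest()
    return int(digest, 16) % P

def blind(values, secret):
    # Each party exponentiates hashed identifiers with its own secret key.
    return {pow(v, secret, P) for v in values}

# Party A holds ad-click identifiers; Party B holds purchase identifiers (hypothetical data).
clicks = [hash_to_group(u) for u in ["alice", "bob", "carol", "dave"]]
purchases = [hash_to_group(u) for u in ["carol", "dave", "erin"]]

a_key = secrets.randbelow(P - 3) + 2
b_key = secrets.randbelow(P - 3) + 2

# Each party blinds its own set, exchanges it, and the other party blinds it again.
# Because exponentiation commutes, only identifiers common to both sets collide.
double_blind_clicks = blind(blind(clicks, a_key), b_key)
double_blind_purchases = blind(blind(purchases, b_key), a_key)

overlap = len(double_blind_clicks & double_blind_purchases)
print(f"{overlap / len(clicks):.0%} of clicks led to a purchase")  # 50%
```

In practice the matched records would feed only an aggregate computation (counts or total spend) rather than being inspected directly, and thresholds or added noise would further limit what either party can infer about individuals.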

If the encryption is sound and the combined data is truly unintelligible to both Google and its partners, yet the mathematics can still enable useful comparisons, this methodology could be an important advance for privacy-protective research and data sharing. When different researchers hold datasets that include private or sensitive information, this approach could enable valuable insights while respecting individual privacy.

Google seems to have put years of top-level research into advancing these privacy-protective encryption methods. We expect that after the initial controversy over its application to analyze the effectiveness of advertising, researchers will take a hard look at how these methods can be used for a wide range of socially valuable research. And with respect to advertising itself, it’s certainly good to see genuinely sophisticated privacy-enhancing technologies being used when sensitive data is analyzed. Although we appreciate the importance of providing users with choices and notices, at the end of the day there is nothing better than scientific, technically advanced protections to ensure personal information is protected.

"Your Phone May Be Tracking Your Every Move; Here's How to Stop It"

During the International Association of Privacy Professionals’ Global Privacy Summit 2017, FPF’s CEO, Jules Polonetsky, took a moment to speak with NBC 4 Los Angeles about the privacy implications of granting apps permission to track your location. Jules explained:

“If there’s an appropriate message, they offer everybody who’s been at a bar at midnight downtown a cheap taxi ride home, well that’s kind of cool,” Polonetsky said. “If it’s we know you were at a doctor’s office or at a strip club, or at a casino, or any place where you may be somewhat sensitive about where you are, the idea that that will be used to try to target you later on certainly can be creepy and more than offensive.”

WATCH STORY

Announcing the Inaugural Issue of Future of Privacy Forum's Privacy Scholarship Reporter

Future of Privacy Forum is pleased to announce it has published the inaugural issue of the Privacy Scholarship Reporter. This regular newsletter will highlight recent privacy research and is published by the Privacy and Data Responsibility Research Coordination Network (RCN), an FPF initiative supported by the National Science Foundation.

The RCN is a community of academic researchers and industry leaders fostering industry-academic cooperation to address research priorities identified in the National Privacy Research Strategy (NPRS). If you are interested in advancing the privacy research agenda by creating new partnerships, research, and discussions based on the issues under the NPRS, we encourage you to join the RCN today.

Each issue of the Privacy Scholarship Reporter will focus on one of three research priorities from the NPRS: 1) increasing transparency of data collection, sharing, use, and retention; 2) assuring that information flows and use are consistent with privacy rules; and 3) advancing progress on practical de-identification techniques and policy and reducing privacy risks of analytical algorithms.

The first issue focuses on research regarding mitigation of privacy risks posed by some analytical algorithms. The highlighted papers address issues such as discrimination in the workplace, applying accountability mechanisms to the use of algorithms in a big data environment, and how data from Google Street View is being used to estimate demographics.

READ ISSUE 1

Want the next issue sent directly to your inbox? Join the RCN here.

5th Annual Public Policy Conference on the Law & Economics of Privacy and Data Security

The Program on Economics & Privacy, in partnership with the Future of Privacy Forum, and the Journal of Law, Economics & Policy, will hold its 5th Annual Public Policy Conference on the Law & Economics of Privacy and Data Security, on Thursday, June 8, 2017 at George Mason University Antonin Scalia Law School in Arlington, VA.

Data flows are central to an increasingly large share of the economy. A wide array of products and business models—from the sharing economy and artificial intelligence to autonomous vehicles and embedded medical devices—rely on personal data. Consequently, privacy regulation leaves a large economic footprint. As with any regulatory enterprise, the key to sound data policy is striking a balance between competing interests and norms that leaves consumers better off; finding an approach that addresses privacy concerns, but also supports the benefits of technology is an increasingly complex challenge. Not only is technology continuously advancing, but individual attitudes, expectations, and participation vary greatly. New ideas and approaches to privacy must be identified and developed at the same pace and with the same focus as the technologies they address.

This year’s symposium will include panels on Unfairness under Section 5: Unpacking “Substantial Injury”, Conceptualizing the Benefits and Costs from Data Flows, and The Law and Economics of Data Security.

WHAT

5th Annual Public Policy Conference on the Law & Economics of Privacy and Data Security

WHEN

Thursday, June 8, 2017

8:00 a.m. – 3:40 p.m.

WHERE

George Mason University, Founders Hall

3351 Fairfax Drive

Arlington, VA 22201

REGISTER HERE

VIEW AGENDA

The 5th Annual Public Policy Conference on the Law & Economics of Privacy and Data Security is supported in part by the National Science Foundation under Grant No. 1654085.

The Top 10: Student Privacy News (April – May 2017)

The Future of Privacy Forum tracks student privacy news very closely, and shares relevant news stories with our newsletter subscribers.* Approximately every month, we post “The Top 10,” a blog with our top student privacy stories.

The Top 10

  1. A bipartisan group of senators introduced a bill that would overturn a federal prohibition on tracking the educational and employment outcomes of college students. This legislation “would allow the federal government, as well as families and prospective students, to obtain more accurate and complete data about whether students at a particular college, or in a certain major, graduate on time and find well-paying jobs, among other things.” News articles have reported on reactions to the bill so far, and New America’s new privacy fellow writes about the privacy protections built into the bill. It may be worth reading FPF’s comments to the Commission on Evidence-Based Policymaking on this topic.
  2. President Trump has ordered a review of cybersecurity education and workforce-development efforts as part of his Cybersecurity Executive Order signed on May 11th. Among other departments, the U.S. Department of Education is supposed to provide a report to the President within 120 days with “findings and recommendations regarding how to support the growth and sustainment of the nation’s cybersecurity workforce in both the public and private sectors.” It may be useful to re-read Ben Herold’s EdWeek article on the state of K-12 cybersecurity education.
  3. The Electronic Frontier Foundation (EFF) has released a report, “Spying on Students: School-Issued Devices and Student Privacy,” which was picked up by multiple news outlets. Jim Siegl released a very interesting blog post highlighting his findings when he investigated EFF’s results, and the report was criticized by other privacy experts. Check out this great EdWeek article by Ben Herold.
  4. Edmodo has had a tough week. After reports of a data breach last week, the company was criticized for ad tracking. Some in the media noted that this “revives skepticism surrounding ‘free’ edtech tools.” This follows the exposure of student data held by Schoolzilla, discovered by a security researcher in late April (see their statement here).
  5. The Parent Coalition for Student Privacy and the Campaign for a Commercial-Free Childhood have released a student privacy parent toolkit (already featured in several news outlets). While alarmist in many sections, the guide also has some useful information for parents. The groups are hosting a webinar on the guide on May 23.
  6. Last year, the National Academy of Education held a workshop on education research and student privacy. Yesterday, they released their report from the workshop with recommendations.
  7. There has been a big focus in the news lately on top tech companies in schools. Both EdWeek and CNN wrote about how Amazon, Apple, Google, and Microsoft “Battle for K-12 Market, and Loyalties of Educators,” and the NYTimes front page story on Sunday was “How Google Took Over the Classroom” (that article’s author, Natasha Singer, was also interviewed on the radio).
  8. Jeffrey Johnson from Utah Valley University wrote a great paper on “Structural Justice in Student Analytics, or, the Silence of the Bunnies.”
  9. Future Ready Schools released a “Personalized Learning Guidebook Geared to Rural Districts’ Needs.” The Center for Curriculum Redesign also published a report on personalized learning, “clarifying the confusions over terminology and structure, and including its recommendations for progress.” It is worth revisiting Monica Bulger’s fantastic paper on the topic, “Personalized Learning: The Conversations We’re Not Having.” EdWeek also reported that the “Chan Zuckerberg Initiative Gives More for Personalized Learning at State Level,” and CRPE says schools should start “with the ‘why’ in personalized learning.”
  10. Lots of articles on Social-Emotional Learning this month: Audrey Watters names social-emotional learning as a trend to watch; “How a Chicago School Is Using Data to Improve School Climate” and “Teacher Prep Slow to Embrace Social-Emotional Learning” via EdWeek; “ClassDojo app takes mindfulness [meditation] to scale in public education;” new tool Panorama Student Success is an “in-depth online progress report for students” that includes data on “social-emotional well-being;” and EdSurge asks how we can measure SEL skills. In the meantime, EdSurge asks “Social-Emotional Learning Is the Rage in K-12. So Why Not in College?”

*Want more news stories? Email Amelia Vance at avance AT fpf.org to subscribe to our student privacy newsletter.

FPF Joins Leading Civil Society Groups, Academics and Companies to Participate in the Work of the Partnership on AI

FPF has been invited to join the Partnership on AI, an organization started by the world’s leading AI researchers. In this capacity, we will work with companies and civil society stakeholders to define and advance a shared vision of AI that benefits people and society. FPF is proud to join this organization and help drive this important work forward.

“The Partnership on AI is a unique organization that is directly engaging with the most important issues that arise from the increasingly sophisticated capabilities of artificial intelligence systems,” said Brenda Leong, FPF Senior Counsel. “As AI technologies advance, fair information practices, data ethics, and new methods of transparency will be essential to shaping AI uses that benefit society. FPF looks forward to sharing our expertise with the Partnership and supporting responsible implementation of artificial intelligence.”

Established last year, the Partnership on AI seeks to study and formulate best practices on AI technologies and to advance public understanding and awareness of AI.

The Partnership includes forward-thinking commercial companies, nonprofit organizations, and other leaders who are developing a diverse, balanced, and global set of perspectives on AI. The Partnership hopes to address issues including “fairness and inclusivity, explanation and transparency, security and privacy, values and ethics, collaboration between people and AI systems, interoperability of systems, and of the trustworthiness, reliability, containment, safety, and robustness of the technology.”

In light of FPF’s experience and knowledge of digital privacy across a broad variety of technology platforms, we are eager to contribute to the Partnership and explore critical questions – how can we apply good privacy strategies to AI? Are there unique challenges in the future that will require new applications of privacy standards?

We join the Partnership and support our partners’ shared goals: advancing public understanding and awareness of AI, including writing and other communications on core technologies, potential benefits, and costs; and acting as trusted and expert points of contact as questions, concerns, and aspirations arise from the public and others in the area of AI.

READ MORE

AI Ethics: The Privacy Challenge

BRUSSELS PRIVACY SYMPOSIUM

AND CALL FOR PAPERS

AI ETHICS: THE PRIVACY CHALLENGE

The Future of Privacy Forum and the Brussels Privacy Hub of the Vrije Universiteit Brussel

are partnering with IEEE Security & Privacy in a call for papers on

 AI Ethics: The Privacy Challenge

6 NOVEMBER, 2017

Brussels Privacy Symposium • Vrije Universiteit Brussel • Pleinlaan 5, 1050, Brussel

Researchers are encouraged to submit interdisciplinary works in law and policy, computer science and engineering, social studies, and economics for publication in a special issue of IEEE Security & Privacy. Authors of selected submissions will be invited to present their work in a workshop, which will be hosted by the VUB on 6 November 2017, in Brussels, Belgium.

This year’s event follows up on the 2016 Brussels Privacy Symposium regarding Identifiability: Policy and Practical Solutions for Anonymization and Pseudonymization. For 2017, the event will focus on privacy issues surrounding Artificial Intelligence. Enhancing efficiency, increasing safety, improving accuracy, and reducing negative externalities are just some of AI’s key benefits. However, AI also presents risks of opaque decision making, biased algorithms, security and safety vulnerabilities, and disruption of labor markets. In particular, AI and machine learning challenge traditional notions of privacy and data protection, including individual control, transparency, access, and data minimization. On content and social platforms, they can lead to narrowcasting, discrimination, and filter bubbles.

A group of industry leaders recently established a partnership to study and formulate best practices on AI technologies. Last year, the White House issued a report titled Preparing for the Future of Artificial Intelligence and announced a National Artificial Intelligence Research and Development Strategic Plan, laying out a strategic vision for federally funded AI research and development. These efforts seek to reconcile the tremendous opportunities that machine learning, human–machine teaming, automation, and algorithmic decision making promise in enhanced safety, efficiency gains, and improvements in quality of life, with the legal and ethical issues that these new capabilities present for democratic institutions, human autonomy, and the very fabric of our society.

Papers and Symposium discussion will address the following issues:

• Privacy values in design

• Algorithmic due process and accountability

• Fairness and equity in automated decision making

• Accountable machines

• Formalizing definitions of privacy, fairness, and equity

• Societal implications of autonomous experimentation

• Deploying machine learning and AI to enhance privacy

• Cybersafety and privacy

READ MORE

FPF Comments on FTC and NHTSA Connected Vehicle Workshop

Future of Privacy Forum submitted written comments to the Federal Trade Commission and the Department of Transportation, National Highway Traffic Safety Administration in response to their request for input on the benefits and privacy and security issues associated with current and future motor vehicles.

FPF commends the FTC and NHTSA for working together to host a public workshop focused on privacy and security issues related to connected vehicles. It is a valuable opportunity to expand the dialogue among regulators, industry, and advocates regarding expectations for consumer privacy in a rapidly evolving field.

As the automotive sector becomes more data-intensive, conversations of this kind are vital for fostering informed and constructive consumer protections. We look forward to participating in the workshop. Below, we highlight several items for consideration by the agencies and for the workshop.

FPF recommends that the FTC and NHTSA:

  1. highlight the importance of transparency and communication around consumer data use, including through the provision of clear user interfaces and resources that are: 1) publicly available; 2) accessible before purchase; and 3) reviewable throughout the life of a vehicle; as well as the incorporation of consumer privacy controls when appropriate;
  2. understand the importance of distinguishing between types of data in the vehicle context for any regulatory approaches to privacy (i.e. between data that is operationally critical or not, personally identifiable or not, sensitive or not), as well as the importance of accurately mapping data flows in a vehicle before apportioning responsibility between actors;
  3. encourage alignment between federal and state regulatory guidance and encourage industry self-regulatory efforts;
  4. consider the risks of connected vehicle data collection by state and local regulators, and propose guidance resources to support these regulators in data management best practices;
  5. monitor new entrants to the market that may seek to monetize connected vehicle data without fully understanding existing consumer protections; and
  6. recognize that this technological shift will have impacts beyond the automotive sector, particularly in the insurance and credit industries.

A cohosted workshop by NHTSA and the FTC is an important step in enabling advocates, industry, and consumers to build an understanding of the regulatory landscape of this rapidly evolving sector. We commend the agencies for working together, and we look forward to participating in the beginning of this dialogue around an emerging field.

Read the full comments here.

FPF has done extensive work in this area.

Our analyses are based on our own research, as well as interactions with members of industry, academics, advocates, and regulators.

Privacy Scholarship Research Reporter: Issue 1, May 2017 – Algorithms: Privacy Risk and Accountability

Notes from FPF

Across academic, policy, and industry circles, making progress on the cluster of issues related to algorithmic accountability has become a leading priority. The inaugural issue of the Future of Privacy Forum’s Privacy Scholarship Reporter provides a clear and compelling look into some of the most worrisome problems and promising solutions.

Although not everyone is familiar with the specific concept of algorithmic accountability, it covers well-known topics, at least broadly speaking. We all know that debate exists over what individuals and organizations should do to be responsible for their actions and intentions. Some of the discussion focuses on moral accountability and some of it deals with legal liability. Of course, some of the crucial conversations address both issues. After all, moral arguments can be levers that lead to legal reform, especially when the rapid pace of technological development strains historically rooted legal reasoning that’s become out-of-sync with the disruptive times.

To think about algorithmic accountability is to consider situations where decision-making has been delegated to computers. Today, algorithms recommend all kinds of seemingly innocuous things, from what to watch to how to drive efficiently to a destination. But they also play potent roles in socially charged outcomes. For example, algorithms affect what prices different people pay when they shop. Algorithms influence when loans and insurance are withheld. Algorithms impact how people consume news and form political opinions. Algorithms play a role in determining who gets placed on the government’s no-fly list, who is subject to heightened government surveillance, which prisoners are offered parole, how self-driving cars make life-or-death decisions, how facial recognition technology identifies criminals, and how online advertisers detect when consumers are feeling vulnerable. This is a very partial list of algorithmic powers. The reach continues to grow as artificial intelligence matures and big data sets become increasingly inexpensive to procure, create, store, and subject to speedy and fine-tuned analysis.

When we closely examine the impact of algorithms in different contexts, it becomes clear that they are given varying levels of power and autonomy. Algorithms inform human judgment. Algorithms mediate human judgment. And algorithms displace human judgment. Algorithms can be advisors, information shapers, and executives. They can even be soothsayers that try to foretell what will happen in the future. Unsurprisingly, the wide scope and breadth of algorithmic impact have engendered strong hopes and fears.

What you’ll find in this issue are smart takes on some of the fundamental questions of algorithmic fairness. How can transparency norms be properly applied to opaque algorithmic processes that can seem inscrutable due to computational complexity or intellectual property protections? How can transparency norms be prevented from creating problems concerning privacy and unfairness? How can unfair algorithmic processes be identified and redressed? How can predictive analytics be properly used, given the fact that they’re only making inferences about potentialities? How can implicit and explicit biases be prevented from polluting computations that are represented as objective calculations? How can appropriate opt-in standards be created and enforced when algorithmic surveillance, sorting, and sentencing are becoming ubiquitous? And how can algorithmic analysis of sensitive information enhance social welfare? Is there important scholarship missing from our list? Send your comments or feedback to [email protected]. We look forward to hearing from you.

Evan Selinger, FPF Senior Fellow


The Ethics of Algorithms: Mapping the Debate

B. MITTELSTADT, P. ALLO, M. TADDEO, S. WACHTER, L. FLORIDI

More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper attempts to clarify the ethical importance of algorithmic mediation by: 1) providing a prescriptive map to organize the debate; 2) reviewing the current discussion of ethical aspects of algorithms; and 3) assessing the available literature in order to identify areas requiring further work to develop the ethics of algorithms.

Authors’ Abstract

In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.

“The Ethics of Algorithms: Mapping the Debate” by B. Mittelstadt, P. Allo, M. Taddeo, S. Wachter, L. Floridi, Big Data & Society, Vol. 3(2), DOI: 10.1177/2053951716679679 (November 2016).

 

Accountability for the Use of Algorithms in a Big Data Environment

A. VEDDER, L. NAUDTS

Decision makers, both in the private and public sphere, increasingly rely on algorithms operating on Big Data. As a result, special mechanisms of accountability concerning the making and deployment of algorithms are becoming more urgent. In the upcoming EU General Data Protection Regulation, concepts such as accountability and transparency are guiding principles. Yet the authors argue that the accountability mechanisms present in the regulation cannot be applied in a straightforward way to algorithms operating on Big Data. The complexities and the broader scope of algorithms in a Big Data setting call for effective, appropriate accountability mechanisms.

Authors’ Abstract

Accountability is the ability to provide good reasons in order to explain and to justify actions, decisions, and policies for a (hypothetical) forum of persons or organizations. Since decision-makers, both in the private and in the public sphere, increasingly rely on algorithms operating on Big Data for their decision-making, special mechanisms of accountability concerning the making and deployment of algorithms in that setting become gradually more urgent. In the upcoming General Data Protection Regulation, the importance of accountability and closely related concepts, such as transparency, as guiding protection principles, is emphasized. Yet, the accountability mechanisms inherent in the regulation cannot be appropriately applied to algorithms operating on Big Data and their societal impact. First, algorithms are complex. Second, algorithms often operate on a random group-level, which may pose additional difficulties when interpreting and articulating the risks of algorithmic decision-making processes. In light of the possible significance of the impact on human beings, the complexities and the broader scope of algorithms in a big data setting call for accountability mechanisms that transcend the mechanisms that are now inherent in the regulation.

“Accountability for the Use of Algorithms in a Big Data Environment” by A. Vedder, L. Naudts, International Review of Law, Computers and Technology, forthcoming (January 2017).

 

Accountable Algorithms

J. A. KROLL, J. HUEY, S. BAROCAS, E. W. FELTEN, J. R. REIDENBERG, D. G. ROBINSON, H. YU

Many important decisions historically made by people are now made by computers. Algorithms can count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an audit, and grant or deny immigration visas. This paper argues that accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? The authors propose that additional approaches are needed to ensure that automated decision systems — with their potentially incorrect, unjustified or unfair results — are accountable and governable. This article describes a new technological toolkit that can be used to verify that automated decisions comply with key standards of legal fairness.

Authors’ Abstract

Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.

The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.

We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.

The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.

The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.

Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability.

“Accountable Algorithms” by J. A. Kroll, J. Huey, S. Barocas, E. W. Felten, J. R. Reidenberg, D. G. Robinson, H. Yu, University of Pennsylvania Law Review, Vol. 165, 2017 (forthcoming), March 2016.

 

Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

A. DATTA, S. SEN, Y. ZICK

In this paper, the authors develop a formal foundation to improve the transparency of algorithmic decision-making systems. Specifically, they introduce a family of Quantitative Input Influence (QII) measures that attempt to capture the degree of influence of inputs on the outputs of systems. These measures can provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination).
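As the abstract below explains, QII aggregates the marginal influence of an input across many sets of inputs using principled measures such as the Shapley value. The toy sketch below, which is not the authors’ implementation, illustrates the basic Shapley computation for a hypothetical three-input scoring function: each input’s influence is its average marginal contribution across all orders in which inputs could be revealed.

```python
from itertools import permutations

def shapley_values(features, value_fn):
    """Exact Shapley values by averaging marginal contributions over all orderings.
    Exponential in the number of features, so suitable only for tiny examples."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        included = set()
        prev = value_fn(included)
        for f in order:
            included.add(f)
            current = value_fn(included)
            totals[f] += current - prev
            prev = current
    return {f: totals[f] / len(orderings) for f in features}

# Hypothetical toy "credit score": the value of knowing a given coalition of inputs.
def toy_score(known_inputs):
    score = 0.0
    if "income" in known_inputs:
        score += 40
    if "age" in known_inputs:
        score += 10
    if "income" in known_inputs and "zip_code" in known_inputs:
        score += 20   # interaction: zip_code matters only alongside income
    return score

print(shapley_values(["income", "age", "zip_code"], toy_score))
# {'income': 50.0, 'age': 10.0, 'zip_code': 10.0}
# income and zip_code split the credit for their joint effect; age gets its solo effect.
```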

Authors’ Abstract

Algorithmic systems that employ machine learning play an increasing role in making substantive decisions in modern society, ranging from online personalization to insurance and credit decisions to predictive policing. But their decision-making processes are often opaque—it is difficult to explain why a certain decision was made. We develop a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of influence of inputs on outputs of systems. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals (e.g., a loan decision) and groups (e.g., disparate impact based on gender). Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g. loan decisions) and the marginal influence of individual inputs within such a set (e.g.,income). Since a single input may be part of multiple influential sets, the average marginal influence of the input is computed using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting. Further, since transparency reports could compromise privacy, we explore the transparency-privacy tradeoff and prove that a number of useful transparency reports can be made differentially private with very little addition of noise. Our empirical validation with standard machine learning algorithms demonstrates that QII measures are a useful transparency mechanism when black box access to the learning system is available. In particular, they provide better explanations than standard associative measures for a host of scenarios that we consider. Further, we show that in the situations we consider, QII is efficiently approximable and can be made differentially private while preserving accuracy.

“Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems” by A. Datta, S. Sen, Y. Zick, Carnegie Mellon University, Pittsburgh, USA, 2016 IEEE Symposium on Security and Privacy.

 

Data-Driven Discrimination at Work

P. T. KIM

A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the bases on which employment decisions are made; and they may exacerbate inequality because error detection is limited and feedback effects can compound bias. Given these risks, this paper argues for a legal response to classification bias — a term that describes the use of classification schemes, like data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics.

Authors’ Abstract

A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although data algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Algorithms built on inaccurate, biased, or unrepresentative data can produce outcomes biased along lines of race, sex, or other protected characteristics. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the basis on which employment decisions are made; and they may further exacerbate inequality because error detection is limited and feedback effects compound the bias. Given these risks, I argue for a legal response to classification bias — a term that describes the use of classification schemes, like data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics. Addressing classification bias requires fundamentally rethinking anti-discrimination doctrine. When decision-making algorithms produce biased outcomes, they may seem to resemble familiar disparate impact cases; however, mechanical application of existing doctrine will fail to address the real sources of bias when discrimination is data-driven. A close reading of the statutory text suggests that Title VII directly prohibits classification bias. Framing the problem in terms of classification bias leads to some quite different conclusions about how to apply the anti-discrimination norm to algorithms, suggesting both the possibilities and limits of Title VII’s liability-focused model.

“Data-Driven Discrimination at Work” by P. T. Kim, William & Mary Law Review, 2017, forthcoming.

 

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

T. GEBRU, J. KRAUSE, Y. WANG, DUYUN CHEN, J. DENG, E. LIEBERMAN AIDEN, L. FEI-FEI

As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative to human review. Here, the authors present a method that attempts to determine socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, the authors attempted to determine the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful.

Authors’ Abstract

The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.

“Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US” by T. Gebru, J. Krause, Y. Wang, Duyun Chen, J. Deng, E. Lieberman Aiden, L. Fei-Fei, Cornell University Library, submitted 22 Feb 2017 (v1), last revised 2 Mar 2017.

 

Tackling the Algorithmic Control Crisis – the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents

B. BODO, N. HELBERGER, K. IRION, F. ZUIDERVEEN BORGESIUS, J. MOLLER, B. VAN DER VELDE, N. BOL, B. VAN ES, C. DE VREESE

The objectives of this paper are two-fold. The authors’ first aim is to describe one possible approach to researching the individual and societal effects of algorithmic recommenders, and to share experiences with the readers. The second aim is to contribute to a more fundamental discussion about the ethical and legal issues of “tracking the trackers,” as well as the costs and trade-offs involved. This paper discusses the relative merits, costs and benefits of different approaches to ethically and legally sound research on algorithmic governance. It argues that besides shedding light on how users interact with algorithmic agents, we also need to be able to understand how different methods of monitoring our algorithmically controlled digital environments compare to each other in terms of costs and benefits. The article concludes with a number of concrete suggestions for how to address the practical, ethical and legal challenges of researching algorithms and their effects on users and society.

Authors’ Abstract

Algorithmic agents permeate every instant of our online existence. Based on our digital profiles built from the massive surveillance of our digital existence, algorithmic agents rank search results, filter our emails, hide and show news items on social networks feeds, try to guess what products we might buy next for ourselves and for others, what movies we want to watch, and when we might be pregnant.

Algorithmic agents select, filter, and recommend products, information, and people; they increasingly customize our physical environments, including the temperature and the mood. Increasingly, algorithmic agents don’t just select from the range of human created alternatives, but also they create. Burgeoning algorithmic agents are capable of providing us with content made just for us, and engage with us through one-of-a-kind, personalized interactions. Studying these algorithmic agents presents a host of methodological, ethical, and logistical challenges.

The objectives of our paper are two-fold. The first aim is to describe one possible approach to researching the individual and societal effects of algorithmic recommenders, and to share our experiences with the academic community. The second is to contribute to a more fundamental discussion about the ethical and legal issues of “tracking the trackers”, as well as the costs and trade-offs involved. Our paper will contribute to the discussion on the relative merits, costs and benefits of different approaches to ethically and legally sound research on algorithmic governance.

We will argue that besides shedding light on how users interact with algorithmic agents, we also need to be able to understand how different methods of monitoring our algorithmically controlled digital environments compare to each other in terms of costs and benefits. We conclude our article with a number of concrete suggestions for how to address the practical, ethical and legal challenges of researching algorithms and their effects on users and society.

“Tackling the Algorithmic Control Crisis – the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents” by B. Bodo, N. Helberger, K. Irion, F. Zuiderveen Borgesius, J. Moller, B. van der Velde, N. Bol, B. van Es, C. de Vreese, 19 YALE J. L. & TECH. 133 (2017).

 

Why a Right to Explanation of Automated Decision Making Does Not Exist in the General Data Protection Regulation

S. WACHTER, B. MITTELSTADT, L. FLORIDI

This paper argues that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against harmful automated decision-making, and therefore runs the risk of being toothless. The authors propose a number of legislative steps that, they argue, would improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.

Authors’ Abstract

Since approval of the EU General Data Protection Regulation (GDPR) in 2016, it has been widely and repeatedly claimed that a ‘right to explanation’ of decisions made by automated or artificially intelligent algorithmic systems will be legally mandated by the GDPR. This right to explanation is viewed as an ideal mechanism to enhance the accountability and transparency of automated decision-making. However, there are several reasons to doubt both the legal existence and the feasibility of such a right. In contrast to the right to explanation of specific automated decisions claimed elsewhere, the GDPR only mandates that data subjects receive limited information (Articles 13-15) about the logic involved, as well as the significance and the envisaged consequences of automated decision-making systems, what we term a ‘right to be informed’. Further, the ambiguity and limited scope of the ‘right not to be subject to automated decision-making’ contained in Article 22 (from which the alleged ‘right to explanation’ stems) raises questions over the protection actually afforded to data subjects. These problems show that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against automated decision-making, and therefore runs the risk of being toothless. We propose a number of legislative steps that, if taken, may improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.

“Why a Right to Explanation of Automated Decision Making Does Not Exist in the General Data Protection Regulation” by S. Wachter, B. Mittelstadt, L. Floridi, International Data Privacy Law, forthcoming.

 

Exposure Diversity as a Design Principle for Recommender Systems

N. HELBERGER, K. KARPINNEN & L. D’ACUNTO

Some argue that algorithmic filtering and adaption of online content to personal preferences and interests is often associated with a decrease in the diversity of information to which users are exposed. Notwithstanding the question of whether these claims are correct, this paper discusses whether and how recommendations can also be designed to stimulate more diverse exposure to information and to discourage potential “filter bubbles” rather than create them. Combining insights from democratic theory, computer science, and law, the authors suggest design principles and explore the potential and possible limits of “diversity sensitive design.”

Authors’ Abstract

Personalized recommendations in search engines, social media and also in more traditional media increasingly raise concerns over potentially negative consequences for diversity and the quality of public discourse. The algorithmic filtering and adaption of online content to personal preferences and interests is often associated with a decrease in the diversity of information to which users are exposed. Notwithstanding the question of whether these claims are correct or not, this article discusses whether and how recommendations can also be designed to stimulate more diverse exposure to information and to break potential ‘filter bubbles’ rather than create them. Combining insights from democratic theory, computer science and law, the article makes suggestions for design principles and explores the potential and possible limits of ‘diversity sensitive design’.

“Exposure Diversity as a Design Principle for Recommender Systems” by N. Helberger, K. Karpinnen & L. D’Acunto, Information, Communication and Society (December 2016).