Privacy Scholarship Research Reporter: Issue 1, May 2017 – Algorithms: Privacy Risk and Accountability
Notes from FPF
Across academic, policy, and industry circles, making progress on the cluster of issues related to algorithmic accountability has become a leading priority. The inaugural issue of the Future of Privacy Forum’s Privacy Scholarship Research Reporter provides a clear and compelling look at some of the most worrisome problems and promising solutions.
Although not everyone is familiar with the specific concept of algorithmic accountability, it covers well-known topics, at least broadly speaking. We all know that debate exists over what individuals and organizations should do to be responsible for their actions and intentions. Some of the discussion focuses on moral accountability and some of it deals with legal liability. Of course, some of the crucial conversations address both issues. After all, moral arguments can be levers that lead to legal reform, especially when the rapid pace of technological development strains historically rooted legal reasoning that has fallen out of sync with disruptive times.
To think about algorithmic accountability is to consider situations where decision-making has been delegated to computers. Today, algorithms recommend all kinds of seemingly innocuous things, from what to watch to how to drive efficiently to a destination. But they also play potent roles in socially charged outcomes. For example, algorithms affect what prices different people pay when they shop. Algorithms influence when loans and insurance are withheld. Algorithms impact how people consume news and form political opinions. Algorithms play a role in determining who gets placed on the government’s no-fly list, who is subject to heightened government surveillance, which prisoners are offered parole, how self-driving cars make life-or-death decisions, how facial recognition technology identifies criminals, and how online advertisers detect when consumers are feeling vulnerable. This is a very partial list of algorithmic powers. The reach continues to grow as artificial intelligence matures and big data sets become increasingly inexpensive to procure, create, store, and subject to speedy and fine-tuned analysis.
When we closely examine the impact of algorithms in different contexts, it becomes clear that they are given varying levels of power and autonomy. Algorithms inform human judgment. Algorithms mediate human judgment. And algorithms displace human judgment. Algorithms can be advisors, information shapers, and executives. And algorithms can even be soothsayers that try to foretell what will happen in the future. Unsurprisingly, the scope and breadth of algorithmic impact have engendered strong hopes and fears.
What you’ll find in this issue are smart takes on some of the fundamental questions of algorithmic fairness. How can transparency norms be properly applied to opaque algorithmic processes that can seem inscrutable due to computational complexity or intellectual property protections? How can transparency norms be prevented from creating problems concerning privacy and unfairness? How can unfair algorithmic processes be identified and redressed? How can predictive analytics be properly used, given that they only make inferences about potentialities? How can implicit and explicit biases be prevented from polluting computations that are represented as objective calculations? How can appropriate opt-in standards be created and enforced when algorithmic surveillance, sorting, and sentencing are becoming ubiquitous? And how can algorithmic analysis of sensitive information enhance social welfare? Is there important scholarship missing from our list? Send your comments or feedback to [email protected]. We look forward to hearing from you.
Evan Selinger, FPF Senior Fellow
The Ethics of Algorithms: Mapping the Debate
B. MITTELSTADT, P. ALLO, M. TADDEO, S. WACHTER, L. FLORIDI
More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper attempts to clarify the ethical importance of algorithmic mediation by: 1) providing a prescriptive map to organize the debate; 2) reviewing the current discussion of ethical aspects of algorithms; and 3) assessing the available literature in order to identify areas requiring further work to develop the ethics of algorithms.
Authors’ Abstract
In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.
“The Ethics of Algorithms: Mapping the Debate” by B. Mittelstadt, P. Allo, M. Taddeo, S. Wachter, L. Floridi, Big Data & Society, Vol. 3(2), DOI: 10.1177/2053951716679679 (November 2016).
Accountability for the Use of Algorithms in a Big Data Environment
A. VEDDER, L. NAUDTS
Decision makers, both in the private and public sphere, increasingly rely on algorithms operating on Big Data. As a result, special mechanisms of accountability concerning the making and deployment of algorithms are becoming more urgent. In the upcoming EU General Data Protection Regulation, concepts such as accountability and transparency are guiding principles. Yet, the authors argue that the accountability mechanisms present in the regulation cannot be applied in a straightforward way to algorithms operating on Big Data. The complexities and the broader scope of algorithms in a Big Data setting call for effective, appropriate accountability mechanisms.
Authors’ Abstract
Accountability is the ability to provide good reasons in order to explain and to justify actions, decisions, and policies for a (hypothetical) forum of persons or organizations. Since decision-makers, both in the private and in the public sphere, increasingly rely on algorithms operating on Big Data for their decision-making, special mechanisms of accountability concerning the making and deployment of algorithms in that setting become gradually more urgent. In the upcoming General Data Protection Regulation, the importance of accountability and closely related concepts, such as transparency, as guiding protection principles, is emphasized. Yet, the accountability mechanisms inherent in the regulation cannot be appropriately applied to algorithms operating on Big Data and their societal impact. First, algorithms are complex. Second, algorithms often operate on a random group-level, which may pose additional difficulties when interpreting and articulating the risks of algorithmic decision-making processes. In light of the possible significance of the impact on human beings, the complexities and the broader scope of algorithms in a big data setting call for accountability mechanisms that transcend the mechanisms that are now inherent in the regulation.
“Accountability for the Use of Algorithms in a Big Data Environment” by A. Vedder, L. Naudts, International Review of Law, Computers and Technology, forthcoming (January 2017).
Accountable Algorithms
J. A. KROLL, J. HUEY, S. BAROCAS, E. W. FELTEN, J. R. REIDENBERG, D. G. ROBINSON, H. YU
Many important decisions historically made by people are now made by computers. Algorithms can count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an audit, and grant or deny immigration visas. This paper argues that accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? The authors propose that additional approaches are needed to ensure that automated decision systems — with their potentially incorrect, unjustified or unfair results — are accountable and governable. This article describes a new technological toolkit that can be used to verify that automated decisions comply with key standards of legal fairness.
Authors’ Abstract
Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.
The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.
We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.
The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.
The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.
Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability.
“Accountable Algorithms” by J. A. Kroll, J. Huey, S. Barocas, E. W. Felten, J. R. Reidenberg, D. G. Robinson, H. Yu, University of Pennsylvania Law Review, Vol. 165 (forthcoming 2017), March 2016.
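The Article’s toolkit centers on cryptographic techniques, such as commitments and zero-knowledge proofs, that let a decision maker fix a procedure before it is applied and allow an auditor to verify afterwards that the announced procedure was actually used. As a rough illustration of the commit-then-verify idea only, and not the authors’ construction, the following Python sketch commits to a hypothetical scoring rule with a hash-based commitment; the policy text, field names, and threshold are invented for the example.

```python
import hashlib
import secrets

def commit(policy_source: str) -> tuple[str, str]:
    """Commit to a decision policy before it is used.

    Publish the commitment; keep the nonce secret until audit time.
    """
    nonce = secrets.token_hex(32)
    digest = hashlib.sha256((nonce + policy_source).encode()).hexdigest()
    return digest, nonce

def verify(commitment: str, nonce: str, policy_source: str) -> bool:
    """Auditor checks that the revealed policy matches the prior commitment."""
    return hashlib.sha256((nonce + policy_source).encode()).hexdigest() == commitment

# Hypothetical decision rule, kept as source text so it can be committed to.
POLICY = "def decide(applicant): return applicant['score'] >= 700"

commitment, nonce = commit(POLICY)        # published before any decisions are made
# ... decisions are made; later, policy and nonce are disclosed to the auditor ...
assert verify(commitment, nonce, POLICY)  # the announced rule was fixed in advance

# Procedural regularity: the auditor re-runs the committed rule on a contested case.
scope = {}
exec(POLICY, scope)
print(scope["decide"]({"score": 640}))    # prints False under the committed rule
```

In the Article’s fuller setting, zero-knowledge proofs would let the auditor check such properties without the policy ever being disclosed; the sketch above assumes disclosure to the auditor.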
Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems
A. DATTA, S. SEN, Y. ZICK
In this paper, the authors develop a formal foundation to improve the transparency of algorithmic decision-making systems. Specifically, they introduce a family of Quantitative Input Influence (QII) measures that attempt to capture the degree of influence of inputs on outputs of systems. These measures can provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination).
Authors’ Abstract
Algorithmic systems that employ machine learning play an increasing role in making substantive decisions in modern society, ranging from online personalization to insurance and credit decisions to predictive policing. But their decision-making processes are often opaque—it is difficult to explain why a certain decision was made. We develop a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of influence of inputs on outputs of systems. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals (e.g., a loan decision) and groups (e.g., disparate impact based on gender). Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the marginal influence of individual inputs within such a set (e.g., income). Since a single input may be part of multiple influential sets, the average marginal influence of the input is computed using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting. Further, since transparency reports could compromise privacy, we explore the transparency-privacy tradeoff and prove that a number of useful transparency reports can be made differentially private with very little addition of noise. Our empirical validation with standard machine learning algorithms demonstrates that QII measures are a useful transparency mechanism when black box access to the learning system is available. In particular, they provide better explanations than standard associative measures for a host of scenarios that we consider. Further, we show that in the situations we consider, QII is efficiently approximable and can be made differentially private while preserving accuracy.
“Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems” by A. Datta, S. Sen, Y. Zick, Carnegie Mellon University, Pittsburgh, USA, 2016 IEEE Symposium on Security and Privacy.
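The Shapley-value aggregation mentioned in the abstract can be approximated for a black-box model by averaging marginal contributions over random orderings of the inputs, with features outside the current set drawn from the population. The sketch below is a generic permutation-sampling approximation of that idea, not the authors’ QII implementation (which defines influence through interventions on quantities of interest and adds differential privacy); the model, feature names, and data are hypothetical.

```python
import random

def shapley_influence(predict, x, background, features, samples=200, rng=random.Random(0)):
    """Approximate a Shapley-style influence score for each feature of x on predict(x).

    predict    -- black-box model: feature dict -> numeric output
    x          -- the individual's feature dict being explained
    background -- population records used to randomize features not yet set to x's values
    """
    scores = {f: 0.0 for f in features}
    for _ in range(samples):
        order = features[:]
        rng.shuffle(order)
        z = dict(rng.choice(background))   # start from a random population record
        prev = predict(z)
        for f in order:                    # switch x's features on one at a time
            z[f] = x[f]
            cur = predict(z)
            scores[f] += cur - prev        # marginal contribution of feature f
            prev = cur
    return {f: s / samples for f, s in scores.items()}

# Hypothetical scoring model and synthetic population, purely for illustration.
model = lambda r: 0.6 * r["income"] + 0.4 * r["age"]
population = [{"income": random.random(), "age": random.random()} for _ in range(1000)]
print(shapley_influence(model, {"income": 0.9, "age": 0.2}, population, ["income", "age"]))
```

Summed across features, these scores approximately account for the gap between the individual’s prediction and the population average, which is what makes Shapley-style aggregation a principled way to apportion joint influence among inputs.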
Data-Driven Discrimination at Work
P. T. KIM
A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the bases on which employment decisions are made; and they may exacerbate inequality because error detection is limited and feedback effects can compound bias. Given these risks, this paper argues for a legal response to classification bias — a term that describes the use of classification schemes, like data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics.
Author’s Abstract
A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although data algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Algorithms built on inaccurate, biased, or unrepresentative data can produce outcomes biased along lines of race, sex, or other protected characteristics. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the basis on which employment decisions are made; and they may further exacerbate inequality because error detection is limited and feedback effects compound the bias. Given these risks, I argue for a legal response to classification bias — a term that describes the use of classification schemes, like data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics. Addressing classification bias requires fundamentally rethinking anti-discrimination doctrine. When decision-making algorithms produce biased outcomes, they may seem to resemble familiar disparate impact cases; however, mechanical application of existing doctrine will fail to address the real sources of bias when discrimination is data-driven. A close reading of the statutory text suggests that Title VII directly prohibits classification bias. Framing the problem in terms of classification bias leads to some quite different conclusions about how to apply the anti-discrimination norm to algorithms, suggesting both the possibilities and limits of Title VII’s liability-focused model.
“Data-Driven Discrimination at Work” by P. T. Kim, William & Mary Law Review, forthcoming (2017).
Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
T. GEBRU, J. KRAUSE, Y. WANG, D. CHEN, J. DENG, E. LIEBERMAN AIDEN, L. FEI-FEI
As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative to human review. Here, the authors present a method that attempts to determine socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, the authors attempted to determine the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful.
Authors’ Abstract
The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.
“Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US” by T. Gebru, J. Krause, Y. Wang, D. Chen, J. Deng, E. Lieberman Aiden, L. Fei-Fei, arXiv preprint (Cornell University Library), submitted 22 Feb 2017 (v1), last revised 2 Mar 2017.
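The paper’s pipeline has two stages: deep convolutional networks detect vehicles in street imagery and classify their make, model, and year, and the resulting per-precinct vehicle statistics are then used to estimate demographic variables. The toy sketch below illustrates only the second stage, using synthetic data and a plain least-squares fit; the counts, coefficients, and outcome are hypothetical, and the authors’ models are far richer.

```python
import numpy as np

# Hypothetical per-precinct counts of detected vehicle types (columns: sedans,
# pickup trucks, minivans) plus a survey-measured outcome for a labeled subset.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=[120, 60, 40], size=(500, 3)).astype(float)
income = 30_000 + 150 * counts[:, 0] - 80 * counts[:, 1] + rng.normal(0, 2_000, size=500)

# Fit an ordinary least-squares model on labeled precincts, then predict the rest.
X = np.column_stack([np.ones(len(counts)), counts])
coef, *_ = np.linalg.lstsq(X[:400], income[:400], rcond=None)
predicted = X[400:] @ coef

print(coef.round(1))           # intercept plus one weight per vehicle type
print(predicted[:5].round(0))  # estimated income for five held-out precincts
```

The same template extends to categorical outcomes such as the sedan-versus-pickup voting pattern the abstract describes, for example by swapping the least-squares fit for a logistic model.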
Tackling the Algorithmic Control Crisis – the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents
B. BODO, N. HELBERGER, K. IRION, F. ZUIDERVEEN BORGESIUS, J. MOLLER, B. VAN DER VELDE, N. BOL, B. VAN ES, C. DE VREESE
The objectives of this paper are two-fold. The authors’ first aim is to describe one possible approach to researching the individual and societal effects of algorithmic recommenders, and to share their experiences with readers. The second aim is to contribute to a more fundamental discussion about the ethical and legal issues of “tracking the trackers,” as well as the costs and trade-offs involved. This paper discusses the relative merits, costs and benefits of different approaches to ethically and legally sound research on algorithmic governance. It argues that, besides shedding light on how users interact with algorithmic agents, researchers also need to be able to understand how different methods of monitoring our algorithmically controlled digital environments compare to each other in terms of costs and benefits. The article concludes with a number of concrete suggestions for how to address the practical, ethical and legal challenges of researching algorithms and their effects on users and society.
Authors’ Abstract
Algorithmic agents permeate every instant of our online existence. Based on our digital profiles built from the massive surveillance of our digital existence, algorithmic agents rank search results, filter our emails, hide and show news items on social networks feeds, try to guess what products we might buy next for ourselves and for others, what movies we want to watch, and when we might be pregnant.
Algorithmic agents select, filter, and recommend products, information, and people; they increasingly customize our physical environments, including the temperature and the mood. Increasingly, algorithmic agents don’t just select from the range of human-created alternatives, but also create. Burgeoning algorithmic agents are capable of providing us with content made just for us and of engaging with us through one-of-a-kind, personalized interactions. Studying these algorithmic agents presents a host of methodological, ethical, and logistical challenges.
The objectives of our paper are two-fold. The first aim is to describe one possible approach to researching the individual and societal effects of algorithmic recommenders, and to share our experiences with the academic community. The second is to contribute to a more fundamental discussion about the ethical and legal issues of “tracking the trackers”, as well as the costs and trade-offs involved. Our paper will contribute to the discussion on the relative merits, costs and benefits of different approaches to ethically and legally sound research on algorithmic governance.
We will argue that besides shedding light on how users interact with algorithmic agents, we also need to be able to understand how different methods of monitoring our algorithmically controlled digital environments compare to each other in terms of costs and benefits. We conclude our article with a number of concrete suggestions for how to address the practical, ethical and legal challenges of researching algorithms and their effects on users and society.
“Tackling the Algorithmic Control Crisis – the Technical, Legal, and Ethical Challenges of Research into Algorithmic Agents” by B. Bodo, N. Helberger, K. Irion, F. Zuiderveen Borgesius, J. Moller, B. van der Velde, N. Bol, B. van Es, C. de Vreese, 19 YALE J. L. & TECH. 133 (2017).
Why a Right to Explanation of Automated Decision Making Does Not Exist in the General Data Protection Regulation
S. WACHTER, B. MITTELSTADT, L. FLORIDI
This paper argues that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against harmful automated decision-making, and therefore runs the risk of being toothless. The authors propose a number of legislative steps that, they argue, would improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.
Authors’ Abstract
Since approval of the EU General Data Protection Regulation (GDPR) in 2016, it has been widely and repeatedly claimed that a ‘right to explanation’ of decisions made by automated or artificially intelligent algorithmic systems will be legally mandated by the GDPR. This right to explanation is viewed as an ideal mechanism to enhance the accountability and transparency of automated decision-making. However, there are several reasons to doubt both the legal existence and the feasibility of such a right. In contrast to the right to explanation of specific automated decisions claimed elsewhere, the GDPR only mandates that data subjects receive limited information (Articles 13-15) about the logic involved, as well as the significance and the envisaged consequences of automated decision-making systems, what we term a ‘right to be informed’. Further, the ambiguity and limited scope of the ‘right not to be subject to automated decision-making’ contained in Article 22 (from which the alleged ‘right to explanation’ stems) raises questions over the protection actually afforded to data subjects. These problems show that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against automated decision-making, and therefore runs the risk of being toothless. We propose a number of legislative steps that, if taken, may improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.
“Why a Right to Explanation of Automated Decision Making Does Not Exist in the General Data Protection Regulation” by S. Wachter, B. Mittelstadt, L. Floridi, International Data Privacy Law, forthcoming.
Exposure Diversity as a Design Principle for Recommender Systems
N. HELBERGER, K. KARPPINEN & L. D’ACUNTO
Some argue that algorithmic filtering and adaption of online content to personal preferences and interests is often associated with a decrease in the diversity of information to which users are exposed. Notwithstanding the question of whether these claims are correct, this paper discusses whether and how recommendations can also be designed to stimulate more diverse exposure to information and to discourage potential “filter bubbles” rather than create them. Combining insights from democratic theory, computer science, and law, the authors suggest design principles and explore the potential and possible limits of “diversity sensitive design.”
Authors’ Abstract
Personalized recommendations in search engines, social media and also in more traditional media increasingly raise concerns over potentially negative consequences for diversity and the quality of public discourse. The algorithmic filtering and adaption of online content to personal preferences and interests is often associated with a decrease in the diversity of information to which users are exposed. Notwithstanding the question of whether these claims are correct or not, this article discusses whether and how recommendations can also be designed to stimulate more diverse exposure to information and to break potential ‘filter bubbles’ rather than create them. Combining insights from democratic theory, computer science and law, the article makes suggestions for design principles and explores the potential and possible limits of ‘diversity sensitive design’.
“Exposure Diversity as a Design Principle for Recommender Systems” by N. Helberger, K. Karppinen & L. D’Acunto, Information, Communication & Society (December 2016).