Framing the "Big Data Industry"


For all the hype surrounding Big Data, discussions about it often still devolve into debates over buzzwords and concepts like business intelligence, data analytics, and machine learning. Hidden in each of these terms are important privacy and ethical considerations. A recent article by Kirsten Martin in MIS Quarterly Executive attempts to bring these considerations to the surface by moving beyond framing Big Data as merely a business asset or a computational technique. Instead, Martin suggests analyzing risks and rewards at a macro level by looking at the entire Big Data ecosystem, which she terms the Big Data Industry (BDI).

Yes, her paper still largely focuses on the negative impacts of Big Data, but rather than offering a general sense of doom and gloom, it provides a systemic analysis of where the data industry faces specific challenges. The article is peppered with examples of privacy-invading headlines, like Target’s purported ability to predict pregnancy, but her framing is particularly helpful because it largely divorces the “risks” posed by Big Data from individual company practices, anecdotes, and hypotheticals. Instead, she describes the entire Big Data information supply chain, from upstream data sources to downstream data uses. Consumer-facing firms, tracking companies, and data aggregators (or data brokers) work together to exchange information and add value to different data sources.

Martin breaks down the negative effects that can impact individuals at different points in the supply chain. She highlights some of the existing concerns around downstream uses of Big Data. For example, she notes that both incorrect and correct inferences about individuals could limit individuals’ opportunities, encourage consumer manipulation, and ultimately be viewed as disrespectful of individuals’ concerns. While these sorts of Big Data harms have long been debated, Martin places them on a spectrum alongside concerns that arise upstream with the suppliers of data, including poor data quality, biases in the data, and privacy issues in the collection and sharing of information. Drawing an analogy to food providers, which have become responsible for everything from labor conditions to how products are farmed, she argues that Big Data Industry players, by choosing and creating supply chains, similarly become “responsible for the conduct and treatment of users throughout the chain.”

Martin appears to believe that viewing Big Data as one complete supply chain will make it easier for members of the Big Data Industry to identify and monitor economic and ethical issues within that chain. Yet problems also exist across this nascent industry as a whole. Even if data supply chains can be well understood, Martin seems more concerned with the systemic issues she sees in the BDI. Specifically, the norms and practices now being established throughout the data supply chain give rise to “everyone does it” ethical questions, and the BDI in particular poses two pivotal ethical considerations.

First, data supply chains may create negative externalities, especially in aggregate. Air pollution, for example, can become a generalized societal problem through global warming, and the harm from actions across the manufacturing industry can be considerably greater than the pollution caused by any individual company. Martin posits that the Big Data Industry presents a similar dynamic, wherein every member that captures, aggregates, or uses information creates costs to society in the form of surveillance. By contributing to a “larger system of surveillance” while frequently remaining invisible to individuals, the BDI may be generating an informational power imbalance. Perhaps because individual companies in the BDI fail to see themselves as part of a larger data ecosystem, few have been in a position to account for, or even to consider, the possibility that their data practices contribute to such a negative externality.

Second, the Big Data Industry may foster “destructive demand” for consumer-facing companies to collect and sell increasing amounts of consumer data under lower standards. According to Martin, demand can become destructive (1) when a primary market that promises a customer-facing relationship becomes a front for a secondary market, (2) when the standards and quality of the secondary market are lower than those of the primary market, and (3) when consumer-facing companies have limited accountability to consumers for their transactions and dealings in the secondary market. Martin sees a cautionary tale for the BDI in the recent mortgage crisis and the role that mortgage-backed securities played in warping the financial industry. She warns that problems are inevitable as the buying and selling of consumer data becomes more important than “selling an application or providing a service.” Invoking the specter of the data broker boogeyman, Martin argues that consumer-facing organizations lack accountability for their activities in the secondary data market, particularly so long as consumers remain in the dark about what is going on behind the scenes in the greater BDI.

So how can the Big Data Industry address these concerns? Martin places much of her faith in the hope that organizations like the Census Bureau that have “unique influence,” as well as “providers of key products within the Big Data Industry, such as Palantir, Microsoft, SAP, IBM,” can help shape sustainable industry practices going forward. These practices would embody a number of solutions under the rubrics of data stewardship, data integrity, and data due process. Many of the proposals under the first two amount to endorsing additional transparency mechanisms. For example, publicly linking companies through a larger supply chain could create “a vested interest in ensuring others in the chain uphold data stewardship and data due process practices.”

Data due process, on the other hand, would help firms “internalize the cost of surveillance.” Additional internal oversight and due process procedures would, according to Martin, “increase the cost of holding individualized yet comprehensive data and internalize the cost of contributing to surveillance.” As for what these mechanisms could look like, Martin points to ideas like consumer subject review boards, which were first popularized at a Future of Privacy Forum event two years ago and which we have continued to expand upon. Her call for data integrity professionals mirrors the notion of “algorithmists” who could monitor not just the quality of upstream data sources but also downstream data uses. (As an aside, she chastises business schools, which race to train Big Data professionals yet do not require business students to take courses in ethics.) Effective ethical reviews would require such professionals, who could help mitigate some of the risks inherent in the data supply chain.

While Martin’s proposals are not a panacea, industry and regulators alike should take her suggestions seriously. Her framing of a greater Big Data Industry gives companies, as well as regulators and watchdogs, a path forward for better targeting their efforts to promote public trust in Big Data. She has identified places in the information supply chain where certain industry segments may need to get “more skin in the game,” so to speak. And, at the very least, Martin has moved Big Data from an amorphous buzzword to a full-fledged ecosystem with some shape to it.

-Joseph Jerome, Policy Counsel