What's Scary About Big Data, and How to Confront It
Any discussion surrounding the benefits–and the risks–presented by Big Data often focuses on the far-off future. The world of Minority Report is frequently invoked, but in the wake of April’s “Big Data Week,” it is time to recognize that Big Data is already here. In their recent book, Big Data: A Revolution That Will Transform How We Live, Work, and Think, Viktor Mayer-Schönberger and Kenneth Cukier act as heralds of Big Data and suggest that the real phenomenon is the “datafication” of our world. They describe the transformation of our entire world into “oceans of data that can be explored,” offering us a new perspective on reality. The language and rhetoric in the book highlight Big Data’s potential: the scale of Big Data, they suggest, allows us to “extract new insights” and “create new forms of value” in ways that will fundamentally change how we interact with one another.
These new insights can be used for good or for ill, but that’s true of any new piece of knowledge. What exactly is it then that some find so disconcerting about Big Data?
Mayer-Schönberger and Cukier recognize that Big Data is on a “direct collision course” with our traditional privacy paradigms, and further, that it opens the door to create the sort of propensity models seen in Minority Report. However, the pair are more concerned with what they term the “dictatorship of data.” They fear that well-meaning organizations may “become so fixated on the data, and so obsessed with the power and promise it offers, that [they] fail to appreciate its limitations.”
And these limitations are very real. The popular statistician Nate Silver argues that it is time to admit that “we have a prediction problem. We love to predict things–and we aren’t very good at it.” It is this dynamic that presents the biggest worries about Big Data. Its promise is that by transforming our entire world, our whole experience, into data points, the numbers will be able to speak for themselves; but this alone will not cure our prediction predilection. As Kate Crawford of Microsoft Research recently pointed out, Big Data is full of hidden biases. “Data and data sets are not objective,” she states. “They are creations of human design.”
Google Flu Trends is often held out as something that can only be done on the scale provided by Big Data. Using aggregated Internet searches to chart the spread of a disease demonstrates how seemingly mundane web browsing can produce new insights, but it is important to recognize the limitations behind the project’s underlying algorithms. Google Flu Trends got things wrong this year. Why? As Google admits, not everyone who searches for “flu” is actually sick. This year, due to extensive media coverage, more people than anticipated were using Google simply to learn more about the flu. The result was that the algorithms behind the scenes began to see signs of the flu’s spread where it didn’t actually exist. Google Flu Trends’ mistake can be excused for a number of reasons: not only is the tool largely a data experiment, but it also has a generally benevolent purpose. Had a similar algorithm informed a decision by the CDC to quarantine a community or otherwise directly impact individuals, it would be a different conversation. Organizations and individuals need to become more aware of the biases and assumptions that underlie our datafied world.
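The failure mode is easy to see in miniature. The toy sketch below (with made-up numbers, and in no way Google’s actual algorithm) fits flu prevalence to search volume alone; when searches spike because of news coverage rather than illness, the estimate overshoots.

```python
# Toy illustration (hypothetical numbers, not Google's method): a naive model
# that maps search volume directly to flu incidence will over-predict whenever
# people search for reasons other than being sick, e.g. heavy media coverage.

# Hypothetical weekly history: searches per 1,000 users and true flu rate (%).
history = [
    (12, 1.2), (15, 1.5), (20, 2.0), (30, 3.1), (25, 2.4),
]

# Fit the simplest possible model: flu_rate ~ slope * searches.
slope = sum(f for _, f in history) / sum(s for s, _ in history)

def predict_flu_rate(searches_per_1000):
    """Estimate flu prevalence from search volume alone."""
    return slope * searches_per_1000

# A media-driven spike: searches double, but actual illness does not.
searches_during_news_cycle = 60   # many searchers are curious, not sick
actual_flu_rate = 3.0             # hypothetical ground truth

predicted = predict_flu_rate(searches_during_news_cycle)
print(f"predicted {predicted:.1f}% vs actual {actual_flu_rate:.1f}%")
# The model roughly doubles the true rate, "seeing" flu that isn't there.
```

The point is not the arithmetic but the assumption baked into it: that searching is a proxy for being sick. Once that assumption breaks, the data speaks fluently and wrongly.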
This requires establishing a data conversation with users. In order to strengthen our understanding of individual privacy without cutting off technological innovation, individuals need to be educated about how their data is used. To start this conversation, we need more transparency. Jules Polonetsky and Omer Tene suggest that organizations should disclose the logic underlying their decision-making processes as best they can without compromising their algorithmic “secret sauce.” Such disclosure has two key benefits: it allows outsiders to monitor how data is used, and it allows individuals to become more active participants in the decisions made about them.
Today, the data deluge that Big Data presents encourages passivity and misguided efforts to get off the grid. With an “Internet of Things” ranging from our cars to our appliances, even to our carpets, retreating to our homes and turning off our phones will do little to stem the datafication tide. Transparency for transparency’s sake is meaningless. We need mechanisms to achieve transparency’s benefits. We need to encourage users to see their data as a feature that can be turned on or off, and toggled at will. Letting users declare their own data preferences will encourage them to care about what their data says about them and to engage actively in how their information is processed.
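To make the idea concrete, here is a minimal sketch of what user-declared, machine-readable preferences might look like; the categories and field names are hypothetical, not any vendor’s actual schema. Each data category is a toggle the user can flip, and processing code is expected to check it first.

```python
# A minimal sketch of user-declared data preferences (hypothetical schema).
from dataclasses import dataclass, field

@dataclass
class DataPreferences:
    """Per-category toggles a user can flip at will."""
    share_location: bool = False
    share_purchase_history: bool = False
    share_search_history: bool = True
    notes: dict = field(default_factory=dict)   # free-form context, if any

def allowed(prefs: DataPreferences, category: str) -> bool:
    """Processing should consult the user's toggle before using a category."""
    return getattr(prefs, f"share_{category}", False)

prefs = DataPreferences(share_search_history=False)   # the user opts out
print(allowed(prefs, "search_history"))               # -> False
```

The design choice that matters is not the particular fields but that the preference lives with the user and is consulted at the point of use, rather than being buried in a one-time consent form.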
The challenge will be making this process both easily accessible and fun for users. The BlueKai Registry suggests one possible avenue by allowing consumers to see what data companies have associated with their computers, and Google and Yahoo already offer settings managers for users to select who sees what data. More organizations must think carefully about how to strike a balance between user-friendly and comprehensive controls.
At the same time, transparency also allows experts to police companies in order to monitor, expose, and prevent practices we do not want. Mayer-Schönberger and Cukier call for the rise of the “algorithmist,” a new professional who would evaluate the selection of data sources, the choice of analytical tools, and the algorithms themselves. While offering individuals opportunities to understand and challenge the decisions made about them is important, internal algorithmists, alongside the watchful eyes of regulators and privacy advocates, can help ensure that companies are held accountable. This could go a long way toward alleviating fears about Big Data and providing an environment where society can safely maximize its benefits.