Alissa Cooper, at the Center for Democracy and Technology, has just published a superb paper on search log files. She does a great job of walking through all the reasons search engines retain a person’s searches, flags the privacy risks of having those queries sitting around long term, and reviews potential solutions.
As search becomes an increasingly essential part of so many Internet users’ daily lives, the breadth and depth of information contained in query logs grows to unparalleled levels. As a body of data that can reveal the interests, preferences, search strategies, and linguistic behaviors of entire populations, query logs are a true bounty for research of all kinds, conducted internally, at the search engine companies, and externally, by academics and others.
But the great promise of query logs as a research tool is bound by the privacy risks that arise for some of the very same reasons that the logs are so useful in the first place—the richness of detail that they offer about individuals’ lives.
Achieving the right balance between protecting privacy and promoting the utility of the logs is thus difficult but necessary to ensure that Internet users can continue to rely on Web search without fear of adverse privacy consequences.
We look forward to hosting some frank discussions at the Forum about the risks and rewards of log file retention, by search engines as well as adservers. As Alissa lays out, there is much more companies can do in this area to maintain the functions users want while reducing privacy consequences.