Last week the major search engines presented before the Article 29 Working Party of European data regulators. Brendon Lynch of Microsoft had this handy chart which gave an overview of how the companies handle search data.
One thing I haven’t seen discussed is how companies are, or are not, handling the very tricky issue of stripping out proper names from search records. The most obvious way that a user could be identified from search logfiles (other than via legal process) is when they have searched for their own name. Pulling out proper names is actually a difficult process: it requires a database of names in every language in use, a way to leave in the many “famous” names, some calculations around velocity, etc. Not a trivial effort, but achievable in a useful, albeit imperfect, manner.
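To make the moving parts concrete, here is a minimal sketch of such a filter. It is an illustration under stated assumptions, not how any of the search engines actually does it: the function names, the `[NAME]` placeholder, and the threshold are all hypothetical, the name database is reduced to a small set of full names, and "velocity" is interpreted here as the number of distinct users who searched for a name within the log window.

```python
from collections import defaultdict

REDACTED = "[NAME]"  # hypothetical placeholder token


def find_names(query, known_names, max_words=3):
    """Return (start, end, phrase) word spans that match the name list.

    Longer matches win, so a three-word name is preferred over a two-word one.
    """
    words = query.lower().split()
    spans = []
    i = 0
    while i < len(words):
        match = None
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in known_names:
                match = (i, i + n, phrase)
                break
        if match:
            spans.append(match)
            i = match[1]
        else:
            i += 1
    return spans


def redact_log(records, known_names, famous_names, velocity_threshold=3):
    """Redact proper names from (user_id, query) records.

    A name is left in place if it is on the famous list, or if at least
    `velocity_threshold` distinct users searched for it (an assumed proxy
    for "velocity"). Everything else that matches the name list is replaced.
    """
    # First pass: count how many distinct users searched each name.
    users_per_name = defaultdict(set)
    for user_id, query in records:
        for _, _, phrase in find_names(query, known_names):
            users_per_name[phrase].add(user_id)

    # Second pass: rewrite each query, keeping only the "safe" names.
    cleaned = []
    for user_id, query in records:
        words = query.split()
        out, pos = [], 0
        for start, end, phrase in find_names(query, known_names):
            out.extend(words[pos:start])
            keep = (phrase in famous_names
                    or len(users_per_name[phrase]) >= velocity_threshold)
            out.extend(words[start:end] if keep else [REDACTED])
            pos = end
        out.extend(words[pos:])
        cleaned.append((user_id, " ".join(out)))
    return cleaned


if __name__ == "__main__":
    log = [
        ("u1", "jane doe phone number"),
        ("u2", "barack obama speech"),
        ("u3", "john smith plumber"),
        ("u4", "john smith reviews"),
        ("u5", "who is john smith"),
    ]
    names = {"jane doe", "barack obama", "john smith"}
    famous = {"barack obama"}
    for user_id, query in redact_log(log, names, famous):
        print(user_id, query)
```

In this toy log, the name searched by a single user is replaced, while the famous name and the name searched by three different users are left alone. The imperfections are easy to see even here: a real name list will miss many names, and a common name can pass the velocity test while still identifying someone.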
We note that Yahoo is the only one of the listed companies that has applied a public retention period to ad-serving log-files. In Europe, behavioral advertising service provider Wunderloop has qualified for data regulator-backed privacy seals by committing to very limited log-file retention periods. In the Federal Trade Commission Staff Report on Behavioral Advertising, released last week, clear guidance was provided that data should only be kept for as long as needed for the service. This would be a very good time for the big 3, plus many of the ad networks, to begin the process of figuring out how long they really need records of the Web sites a user has visited. Why wait for pressure from EU regulators, threats of US regulation and press criticism? This effort is not as tricky to handle as stripping out proper names, nor as difficult a challenge to take on.