Google Case Highlights the Risks and Rewards of Data Retention

January 30, 2006

Appeared in the Toronto Star on January 30, 2006 as Oceans of Data Ripe for Use – and Abuse

Appeared on the BBC on January 30, 2006 as Risks and Rewards of Net Data

The Internet community has been buzzing for the past ten days about the U.S. Department of Justice’s demand for search data from the world’s leading search engines. Yahoo!, AOL, and Microsoft have all reportedly complied with the request; Google, however, refused, paving the way for a major court battle.

While much of the focus has been on the privacy implications of the USDOJ request, the story highlights a much bigger issue – the significant risks and rewards that arise from retaining enormous amounts of data.

Canadians have become accustomed to protecting their personal information by safeguarding their identification cards, shredding bank statements, or trusting their health provider to protect their medical files, yet they have limited control over search engines, Internet service providers, and e-commerce companies that retain an ever-expanding mountain of data that can reveal personal preferences, interests, and habits.

The USDOJ demand stems from an attempt to prove that legislation, rather than technologies such as content filtering, would be more effective at blocking children’s access to "harmful" materials. In order to prove its case, it sought data from the leading search engines that would allow it to gauge the amount of available pornography on the Internet as well as the frequency with which Internet users search for such content.

The authorities’ initial data request was stunning for its sheer breadth. The USDOJ requested all web addresses (URLs) contained in the Google database as well as a record of "all queries that have been entered into your company’s search engine between June 1, 2005 and July 31, 2005." In other words, it wanted a list chronicling every website in Google’s database along with literally every search request over a two-month period.

When it faced resistance, the USDOJ agreed to a narrower request that included a random sample of one million web addresses as well as a list of every search string during a one-week period.

Although none of this data relates to a specific individual – it covers hundreds of millions of Internet users – the request has still produced a chilling effect as many begin to question whether search requests thought to be anonymous could ultimately be tracked back to them.

In a broader context, the demand also highlights the growing challenge associated with data retention. All companies, particularly those operating online, recognize the value of retaining information about their users. Some information is essential to providing customer service, while other data can be used to provide users with a customized experience by eliminating the need to re-enter passwords, automatically posting relevant content, or sending permission-based email marketing that accurately reflects the users’ interests.

The value of information extends beyond personal data. Once data is aggregated, retailers can spot trends among demographic groups, ISPs can gauge usage patterns, and search engines can identify what is on the minds of the world’s Internet users.

Given its value, it comes as little surprise to find that companies retain such data for lengthy periods, using sophisticated data mining technologies to analyze the information. While these previous examples illustrate the rewards of data retention (which benefit both companies and their customers), significant risks also exist.

The same data can be mined for purposes that extend far beyond the reasons for which it was initially provided. The Google case provides a classic illustration in this regard as mere search terms take on a new significance in the hands of Department of Justice lawyers.

Some data is not consciously provided at all – it is simply gathered automatically with little thought given to its potential uses. For example, private parties may demand automatically generated ISP server logs to assist with new defamation or copyright lawsuits.

One of the biggest risks associated with data retention comes not from requests that proceed through the legal system, but from security vulnerabilities that put sensitive data into the hands of hackers. Last year, more than 50 million people in North America received notifications that their personal information had been placed at risk due to a security breach.

Policy makers worldwide have scarcely begun to reconcile the risks and rewards of data retention. In the immediate aftermath of the Google issue, at least one U.S. politician has called for new legislation to set limits on data retention and establish a positive obligation to destroy data under certain circumstances. In Europe, the debate has centered on mandating data retention to assist law enforcement.

While Canadian privacy law establishes general obligations on data retention and destruction, there are few clear legal obligations to either retain or destroy information. In light of recent events, it is time to search for some solutions.

Michael Geist holds the Canada Research Chair in Internet and E-commerce Law at the University of Ottawa, Faculty of Law. He can be reached at mgeist@uottawa.ca or online at www.michaelgeist.ca.