My weekly Law Bytes column (Toronto Star version, freely available version, BBC version) examines the U.S. Department of Justice’s demand for search data from the world’s leading search engines. I argue that while much of the focus has been on the privacy implications of the USDOJ request, the story highlights a much bigger issue – the significant risks and rewards that arise from retaining enormous amounts of data.
The authorities’ initial data request was stunning for its sheer breadth. The USDOJ requested all web addresses (URLs) contained in the Google database as well as a record of "all queries that have been entered into your company’s search engine between June 1, 2005 and July 31, 2005." In other words, it wanted a list chronicling every website in Google’s database along with literally every search request over a two-month period. When it faced resistance, the USDOJ agreed to a narrower request that included a random sample of one million web addresses as well as a list of every search string during a one-week period.
Although none of this data relates to a specific individual – it covers hundreds of millions of Internet users – the request has still produced a chilling effect as many begin to question whether search requests thought to be anonymous could ultimately be tracked back to them.
In a broader context, the demand also highlights the growing challenge associated with data retention. All companies, particularly those operating online, recognize the value of retaining information about their users. Some information is essential to providing customer service, while other data can be used to provide users with a customized experience by eliminating the need to re-enter passwords, automatically posting relevant content, or sending permission-based email marketing that accurately reflects the users’ interests. The value of information extends beyond personal data. Once the data is aggregated, retailers can spot trends among demographic groups, ISPs can gauge usage patterns, and search engines can identify what is on the minds of the world’s Internet users.
Given its value, it comes as little surprise to find that companies retain such data for lengthy periods, using sophisticated data mining technologies to analyze the information. While these previous examples illustrate the rewards of data retention (which benefit both companies and their customers), significant risks also exist.
The same data can be mined for purposes that extend far beyond the reasons for which it was initially provided. The Google case provides a classic illustration in this regard as mere search terms take on a new significance in the hands of Department of Justice lawyers. Some data is not consciously provided at all – it is simply gathered automatically with little thought given to its potential uses. For example, private parties may demand automatically generated ISP server logs to assist with new defamation or copyright lawsuits. One of the biggest risks associated with data retention comes not from requests that proceed through the legal system, but from security vulnerabilities that put sensitive data into the hands of hackers. Last year, more than 50 million people in North America received notifications that their personal information had been placed at risk due to a security breach.
While Canadian privacy law establishes general obligations on data retention and destruction, there are few clear legal obligations to either retain or destroy information. In light of recent events, it is time to search for some solutions.