The Risks and Rewards of Data Retention

January 30, 2006

My weekly Law Bytes column (Toronto Star version, freely available version, BBC version) examines the U.S. Department of Justice’s demand for search data from the world’s leading search engines. I argue that while much of the focus has been on the privacy implications of the USDOJ request, the story highlights a much bigger issue – the significant risks and rewards that arise from retaining enormous amounts of data.

The authorities’ initial data request was stunning for its sheer breadth. The USDOJ requested all web addresses (URLs) contained in the Google database as well as a record of "all queries that have been entered into your company’s search engine between June 1, 2005 and July 31, 2005." In other words, it wanted a list chronicling every website in Google’s database along with literally every search request over a two-month period. When it faced resistance, the USDOJ agreed to a narrower request that included a random sample of one million web addresses as well as a list of every search string during a one-week period.

Although none of this data relates to a specific individual – it covers hundreds of millions of Internet users – the request has still produced a chilling effect as many begin to question whether search requests thought to be anonymous could ultimately be tracked back to them.

In a broader context, the demand also highlights the growing challenge associated with data retention. All companies, particularly those operating online, recognize the value of retaining information about their users. Some information is essential to providing customer service, while other data can be used to provide users with a customized experience by eliminating the need to re-enter passwords, automatically posting relevant content, or sending permission-based email marketing that accurately reflects the users’ interests. The value of information extends beyond personal data. Once data is aggregated, retailers can spot trends among demographic groups, ISPs can gauge usage patterns, and search engines can identify what is on the minds of the world’s Internet users.

Given its value, it comes as little surprise to find that companies retain such data for lengthy periods, using sophisticated data mining technologies to analyze the information. While these previous examples illustrate the rewards of data retention (which benefit both companies and their customers), significant risks also exist.

The same data can be mined for purposes that extend far beyond the reasons for which it was initially provided. The Google case provides a classic illustration in this regard as mere search terms take on a new significance in the hands of Department of Justice lawyers. Some data is not consciously provided at all – it is simply gathered automatically with little thought given to its potential uses. For example, private parties may demand automatically generated ISP server logs to assist with new defamation or copyright lawsuits. One of the biggest risks associated with data retention comes not from requests that proceed through the legal system, but from security vulnerabilities that put sensitive data into the hands of hackers. Last year, more than 50 million people in North America received notifications that their personal information had been placed at risk due to a security breach.

While Canadian privacy law establishes general obligations on data retention and destruction, there are few clear legal obligations to either retain or destroy information. In light of recent events, it is time to search for some solutions.

2 Comments

Anonymous says:
January 30, 2006 at 3:54 am

Don’t do it like Europe
The European Parliament recently passed a bill drafted by the EU Commission (Dec. 14, 2005), requiring all communication providers in the EU to record connection data of e-mails, phone calls, cellphone calls, web browsing, file sharing, locations of cellphones during call initiation and other information for at least six months and up to two years. That amounts to roughly 639,000 CDs of data for Germany alone – per day!

While access to this information is currently supposed to be limited to “serious criminal offenses” such as terrorism (whatever the definition of that is), the Music Industry for example has already lobbied for softening this restriction in order to better go after whoever they think might be violating copyrights. I do not even want to think about the possible other abuses of this data…

Sources (unfortunately only available in German):
http://www.heise.de/ct/aktuell/meldung/66857
http://www.spiegel.de/netzwelt/politik/0,1518,390770,00.html

omnipotent speck says:
January 31, 2006 at 6:38 pm

confusing Google position
Google opposes the United States action, for good reason, but is the first to act whenever China requests some new level of restriction to be added to Google searches inside China. I honestly don’t know who to root for here.

As for the EU ruling, Canada’s C-74 was a mirror image of this, was it not? I seem to remember the recording industry lobbying for this pretty heavily here as well. If cost of storage is a problem, perhaps the recording industry would be happy to forward some of the funds collected on recordable media that was actually used for data back-up, and not for bootleg copies of Jessica Simpson’s musical butchery.