The Untold Story Behind the LAC-Canadiana Digitization Plan

The need for a large-scale Canadian digitization strategy has been readily apparent for many years, with experts repeatedly pointing to the benefits that would come from improved access to Canadian history and culture. While other countries have marched ahead with ambitious projects that often incorporate historical text records, photographs, and video, Canada has fallen behind. 

Library and Archives Canada, which is charged with preserving and making accessible Canada’s documentary heritage, has led the digitization effort, but most of its work over the past decade has failed to bear much fruit.

Given the past disappointments, my weekly technology law column (Toronto Star version, homepage version) notes that the launch of a massive new digitization project should have been a cause for celebration. Last June, the LAC and Canadiana, an alliance of public and academic libraries focused on digital preservation, announced plans to digitize and create metadata on 60 million historical Canadian documents. The documents are currently on microfiche, and the project envisions digitizing the images and adding transcriptions and metadata (data about data content) to improve their searchability.

Yet as the details of the project dubbed Héritage leaked out, controversy arose with concerns that the historical documents would be placed behind a paywall that would require individual Canadians to pay monthly fees for access. That generated a significant outcry from many groups, with then-Canadian Heritage Minister James Moore assuring the House of Commons that the new head of LAC would closely examine the project.

After the outcry subsided, however, Héritage began to proceed largely as planned. The key supporters of the project – Canadiana, the major library associations, and the LAC – tried to assure critics that their concerns were unfounded, promising to make the digitized microfiche copies freely available to all and restricting additional fees to value-added services such as transcription or metadata. However, newly obtained documents under the Access to Information Act raise troubling questions about public access and promises of exclusivity made by the LAC.

Among the documents obtained is the previously unreleased contract between LAC and Canadiana (underlying MOU here). The contract, which was signed in May 2013, does not provide for digital public access to the documents without a paywall. Rather, the minimum requirements are that the LAC will provide physical access in its reading rooms, Canadiana will charge fees for hosting the content, and at least ten per cent of the collection will be made freely available online each year. After ten years, the entire collection will be openly available to the public online.

The contractual terms are inconsistent with public statements that provided assurances that all digital copies would be publicly available on completion and that the ten per cent restrictions would only apply to works with additional transcription or metadata. Canadiana officials now say that they plan to go beyond the contractual requirements, yet it is surprising that the LAC did not insist on full public access within the contract.

The contract also grants Canadiana exclusive rights to host and make accessible online the entire collection for ten years. However, internal LAC documents readily acknowledge that there was nothing to stop anyone from doing the same thing, since the documents are in the public domain and there is free access to the physical copies.

In fact, granting exclusivity rights is difficult to reconcile with the role of the LAC in digitizing the historical records, which is far more extensive than is generally appreciated. The contract indicates that the LAC will digitize no less than two-thirds of the collection. Given that the LAC is doing most of the digitizing and Canadiana hopes to rely on crowdsourcing techniques for some of the transcription and metadata, the extensive public contribution creates real doubts about the need for any paywall or exclusivity.

The Héritage project promises to offer unprecedented access to Canadian historical documents. Yet the fine print of the agreement may leave many wondering how a deal could have been reached without mandating free online public access, while granting exclusive rights that do not exist.

12 Comments

  1. pat donovan says:

    strike three
    if it’s an asset, it will be monetized.
    the fees will only rise.

    If it’s information, it will be SSS info.
    sterilized, slanted and sanitized.

    big bro by patronage contract means
    (with OUR luck, floppy disks onna 286 for a server)

    only in canada, eh?
    welcome to quebec efficiency, western charm and newbie patronage systems.

    pat donovan

  2. Richard Ellis says:

    Readers may wish to view the response of Canadiana.org to some of the issues raised now in this blog, but previously in others.
    http://www.canadiana.ca/en/lac-project-faq

  3. Then maybe LAC should be given the funding to do this without having to rely on third parties?

  4. My view – Canadiana admits to paywall…
    From the Canadiana FAQ linked above:

    What about full text searching?

    Very little of the archival text is currently visible to our search tools. This is because archival records present some exceptional technological challenges: Optical Character Recognition (OCR) software, a component of digitization which extracts searchable text from the raw images, doesn’t recognize the handwritten or cursive text found in these records. Transcribing the collection to produce this data will take enormous time and resources—arguably, beyond the scope of any single institution to accomplish, which explains this partnership.

    We hope to generate the revenues to create this text data from donations, sponsorships, and an optional premium site with enhanced features—this last being at the origin of the “paywall” myth. Individuals will be able to choose whether they want to pay and support the project (again, the digitized images will always be free) but the more revenue we collect, the more data we can create. Until the completion of the project, this searchable, full-text data will be one of the premium services.

    So, imagine the scenario this creates. You can VIEW any IMAGE of a digitized document. But unless you know exactly what document you want, there will be no way to FIND that document without paying. This is disingenuous at best. They appear to be saying “We don’t charge for viewing documents, but we do charge for the ability to know what document to view.”

    And – since searchable full-text data will be a premium service until the completion of the project – that would seem to imply that LAC has agreed to put off searching ANY of these items without paying until 2023.

  5. This is the Canadiana quote, standalone (so as not to confuse it with what I wrote)
    What about full text searching?

    Very little of the archival text is currently visible to our search tools. This is because archival records present some exceptional technological challenges: Optical Character Recognition (OCR) software, a component of digitization which extracts searchable text from the raw images, doesn’t recognize the handwritten or cursive text found in these records. Transcribing the collection to produce this data will take enormous time and resources—arguably, beyond the scope of any single institution to accomplish, which explains this partnership.

    We hope to generate the revenues to create this text data from donations, sponsorships, and an optional premium site with enhanced features—this last being at the origin of the “paywall” myth. Individuals will be able to choose whether they want to pay and support the project (again, the digitized images will always be free) but the more revenue we collect, the more data we can create. Until the completion of the project, this searchable, full-text data will be one of the premium services.

  6. Harper government leadership is at it again.

    The massive amount of OCR work necessary can only be accomplished by crowdsourcing. This process is subject to often random task selection by the volunteer crowd. What the public needs is guidance and leadership that makes it possible to know what is there and what needs to be done.

  7. I think they are being deceptive in their comments on the FAQ page.

    I tried to look through a few pages and received this message: “Subscription Required
    Early Canadiana Online is funded by its users through subscriptions. Become a subscriber today to access all of the site’s content and features.”

    If that isn’t a paywall, what is? And what are these “features”?

  8. Russell McOrmond says:

    My commentary, as a Sysadmin at Canadiana and Open* Advocate
    When this story was first discussed I made two postings on the topic:

    http://mcormond.blogspot.ca/2013/06/good-new-canadiana-lac-project-spun.html
    http://mcormond.blogspot.ca/2013/06/why-is-license-required-for-canadiana.html

    If people still have concerns that they would like to hear from an “insider”, they can ask there where I’ll get notified in email of additional comments.

    I’m still confused by the concern. There aren’t any “copyright” style restrictions, and the material is freely available to all patrons of the libraries who are funding this project. The libraries have just opted to not also fund access for people who are not patrons.

    While we can all wish the federal government would fund these access projects, I don’t see the value in constantly critiquing the libraries who have decided to actually do something while we all wait for some “perfect” day in the future that may never come.

    Note: Canadiana is a charity whose board is made up of representatives from the library community who fund it. It isn’t an entity that can be usefully spoken of separate from the libraries who created and fund it.

    Ray,

    ECO (Early Canadiana Online) is a separate project from the Heritage project, and it is only Heritage that Mr. Geist’s article is about. Canadiana isn’t limited to using the same business model for every project, and libraries use whatever options they have available to increase access to as much material as possible.

  9. Universities and their libraries are ultimately funded by the public through the government. Those who are likely to use the Canadiana site are not going to be too concerned about the distinction between the Heritage and ECO projects. As long as these documents and publications are a vital part of Canadian history we would do better not to favour elite access by one sector of society.

  10. “…the material is freely available to all patrons of the libraries who are funding this project. The libraries have just opted to not also fund access for people who are not patrons.”

    Is it not public information in the first place? Didn’t all the people pay for it already, not just the lucky ones with access to certain libraries?

  11. Ray, ECO is nothing new. Most Canadian universities and many large public libraries subscribe to it, which is why Canadiana turned to them for funding when the government refused to fund the Heritage project. Unlike the content for Heritage, hard copies of ECO content are NOT available freely from LAC, or anywhere else that I’m aware of, but the digitization was done from microforms held privately by Canadiana. Canadiana also offers the Canadiana Discovery Portal free of charge, and they’ve partnered with the University of Alberta Libraries to digitize many of their microforms for the Internet Archive (also free of charge).

    Honestly the only people I’ve ever known to use ECO were Canadian historians, that is academics, who worked/studied at universities, because the general public really doesn’t have that much need for, say, the full text of the 1833 Canadian Surgeons Quarterly claiming that TB is caused by living in an unchaste state, that they can’t wait for a hard copy to come into their local public library through ILL.

  12. Cara, I largely agree. Most of these documents are historical in nature, but those with interests in the history are not limited to academics. Most of the general public is unaware that this material exists. The material is, however, extensive. For those of us who are not within the walls of academe the pay walls are a serious impediment. We may want to pursue a particular line of inquiry, but when we are faced with a pay wall we are more likely to go away. We have no assurance that what we seek will prove relevant to our study.

    The Internet Archive and Hathi Trust are wonderful sites, but they are restricted by US copyright law. I often find myself looking at material between 1923 and Canadian copyright expiry.