Case Study
Colorado Universities Add Public Domain Research Information to
Discovery Service
Innovative Interfaces, Inc.
HathiTrust borrows part of its name and logo from the elephant, pairing the Hindi word for the pachyderm with a long memory, and “trust,” which the organization’s website calls “a core value of research libraries and one of their greatest assets.” HathiTrust brings together the rich collections of prestigious institutions from the United States and Europe for digital preservation and open sharing purposes. Approximately 75% of HathiTrust material is in copyright and the remaining 25% is in the public domain. The digital library, copyrighted and public domain material combined, represents over seven million volumes and over 350 terabytes of data, all stored on state-of-the-art servers.
One way research libraries can leverage the sharing purpose of the HathiTrust is by harvesting its extensive public domain collection. Content that is public domain in the US includes US Government documents and works published anywhere in the world prior to 1923. Works published outside the US prior to 1870 are in the public domain worldwide. Publications in the US from 1923 to 1963 may be part of the public domain if they did not comply with copyright formalities. A team of HathiTrust partners is at work manually assessing the copyright status of these volumes so they can be added to the collection.
Jeremy York, Project Librarian at HathiTrust, coordinates projects, facilitates new partnerships, and helps maintain the technology infrastructure. “Our fundamental purpose is preservation,” says York, “and the partners believe in a very tight coupling between preservation and access.” HathiTrust uses a digital presentation vendor that provides the scalability, storage redundancy, and disaster recovery capabilities to keep the collective “memories” of prestigious libraries intact. (A complete list of HathiTrust partners can be found at www.hathitrust.org/community.)
University of Denver
In May 2010, Innovative Interfaces harvested the HathiTrust public domain worldwide collection for University of Denver. Library staff had been watching the HathiTrust collection grow, but needed “a sustainable way” to harvest the material, and keep harvesting it, as the collection expanded, according to Betty Meagher, Head of Metadata and Materials Processing at Penrose Library. Says Meagher: “We now have an automatic ‘pull’ from HathiTrust and get regular emails from the Encore system about the new information that comes in.”
Because Penrose Library is a research library and also an authorized Federal Depository Library, the extensive assets of HathiTrust were particularly attractive. Meagher says that the HathiTrust material complements the collection they do have and has valuable information in full-text, such as the Federal Register of the United States and other important population information. “There are huge runs of information we simply don’t have room for in the library,” says Meagher.
University of Colorado, Boulder
Staff at the University of Colorado, Boulder Libraries looked at integrating external collections into their library’s Encore system. “We’d talked about harvesting Google Books into Encore, but I had reservations about both the quality of the metadata and the appropriateness to our research community,” says Jina Choi Wakimoto, Faculty Director of Cataloging and Metadata Services. “It’s the trusted depository that matters and we have that with HathiTrust. Users also get full-text!”
As an example of quality, Wakimoto mentions that searching for “Chinese bronzes” and limiting to the HathiTrust Collection facet results in “a wonderful list of books and exhibition catalogs available in public domain.” By tapping the Encore Services team, the Libraries have completed harvesting projects from both local digital collections and HathiTrust. In a single search, users may benefit from both sets of materials. Wakimoto points out that a search for “Zoroaster” presents relevant collection facets from both the Libraries’ Thomas MacLaren Architectural Drawings and the HathiTrust collection.
As with the University of Denver, what Wakimoto calls the “mechanics” of including a growing open collection had not been tenable for staff in the past. “When we had Encore we had the perfect solution,” she says. “We now have quality metadata: a prestigious group of information from real research partners. All we had to do is tell Encore Services the domain of the material we wanted and how often we wanted the new harvest to come in.”
How Harvesting Works
HathiTrust, headquartered at the University of Michigan, makes its materials available through OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), a standard that allows metadata records to be collected or harvested from a repository. Encore is able to harvest the public domain material from HathiTrust in the Dublin Core(SM) format. Libraries can choose to harvest materials that are public domain in the US only or worldwide. By taking advantage of Innovative Interfaces’ Encore Harvesting services, harvesting of the HathiTrust data through OAI-PMH is seamless to the library.
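For readers curious about what an OAI-PMH harvest involves, the following is a minimal Python sketch of the general protocol, not a description of how Encore Harvesting services are implemented. It builds a ListRecords request for Dublin Core (oai_dc) records and reads identifiers, titles, and the resumption token from a response. The base URL, the set name "example:pd", and the sample response are hypothetical placeholders; a real harvest would use the endpoint and set names published by the repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard XML namespaces for OAI-PMH 2.0 and Dublin Core elements.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"

# Hypothetical response, trimmed to one record, for illustration only.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample public domain volume</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def build_list_records_url(base_url, set_spec=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for Dublin Core records."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: only the verb goes with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            # A set restricts the harvest to one slice of the repository.
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(OAI_NS + "header/" + OAI_NS + "identifier")
        titles = [el.text for el in record.iter(DC_NS + "title")]
        records.append({"identifier": identifier, "titles": titles})
    # A missing or empty token means the harvest is complete.
    token = root.findtext(".//" + OAI_NS + "resumptionToken")
    return records, (token or None)


if __name__ == "__main__":
    # Hypothetical endpoint and set name, used only to show the URL shape.
    print(build_list_records_url("https://repository.example.org/oai",
                                 set_spec="example:pd"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A recurring harvest would request the first URL, parse the response, and keep requesting with the returned resumption token until none comes back. Choosing between US-only and worldwide public domain material would amount to passing a different set name.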
As the public domain collection grows, so will the size of the harvested materials at Penrose Library and University of Colorado Libraries. In fact, the HathiTrust collection is growing by 700 volumes every day. As the libraries move forward, they will be able to gauge patterns of usage with Google Analytics™.
Over half a million records from the HathiTrust’s public domain collection are accessible in Encore at two universities. An example of a search for “intellectual property” limited to the HathiTrust collection facet can be accessed at both University of Denver (http://bit.ly/acrovW) and University of Colorado, Boulder (http://bit.ly/9lDkae). The entire collection of the HathiTrust public domain can be retrieved on either Encore system by searching “public domain items worldwide.” Questions about Encore can be directed to info@encoreforlibraries.com.
Trademarks: Dublin Core(SM), OCLC, Inc; Google Books™ and Google Analytics™, Google.
Innovative Interfaces dedicates its energies to meeting the needs of libraries and the challenges of library automation. The company has fulfilled this mission with first-rate services and products such as the Millennium integrated library system, the INN-Reach direct consortia borrowing solution, Electronic Resource Management, and the Encore discovery services platform. Today, thousands of libraries of all types in over 50 countries rely on Innovative's products, services, and support. The company is located in Emeryville, California, with offices in Australia, Canada, France, Hong Kong, Korea, Portugal, Spain, Sweden, Taiwan, Thailand, and the United Kingdom. www.iii.com
