The number of non-digital copyright records (70 million) and the constraints on funding make the digitization of copyright records a long-term project. But that doesn’t mean we can’t make some records available sooner rather than later. We’re looking at several strategies and are eager for your feedback and ideas.
First, we can demonstrate what’s possible, engage users, and make some records available online through a search and retrieval pilot. The pilot would use a small but complete subset of the records, indexed by multiple fields, with links to images of the original copyright records. One option is the set of records of transfers and assignments of copyrights. The 2.5 million catalog cards with indexes to approximately 350,000 documents recorded between 1870 and 1977 have already been digitized, and PDF copies of the documents also exist. Transfer and assignment records must be consulted to determine the complete ownership history of any copyright, so their availability online would nicely complement the Catalog of Copyright Entries (CCE) from 1891 to 1977, which is being digitized onsite here at the Library and made available through the Internet Archive website. That effort is nearly two-thirds complete, and CCE records are now available online back to 1936. Another option is the set of records of prints and labels registered between 1922 and 1940. This set is much smaller, about 43,000 registrations, and could be done sooner, but it may be too narrow to serve as a model for the 16 million records referring to many other types of copyrightable material. A third option is the set of registration records from 1971 to 1977. This is a much larger set of 7.7 million catalog cards with indexes to 2.8 million registrations and would require considerably more time to complete. I seek your comments on which of these three options would be most useful to you.
Second, as an interim measure while full record indexing is underway, we are considering making the catalog card images available online through a virtual card catalog organized hierarchically by type of record, time period, drawer name, and card image number. This could be done after digitization of each set and would enable online searching of these records in a manner that mimics searching the actual cards. While this would require a few more steps to search for a particular term, it would enable viewing surrounding records, a feature considered useful by some users.
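For readers who think in code, here is a minimal Python sketch of how such a virtual card catalog might be organized. It is an illustration only, not a description of any system we have built: the `CardImage` class, the path layout, and the sample drawer name are all hypothetical. It shows the four-level hierarchy (type of record, time period, drawer name, card image number) and how browsing a drawer in card order makes the surrounding cards easy to view.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class CardImage:
    """One scanned catalog card, addressed by its place in the hierarchy.

    All field values used below are hypothetical examples.
    """
    record_type: str   # e.g. "assignments"
    period: str        # e.g. "1870-1977"
    drawer: str        # e.g. "Aa-Ab" (assumed drawer label)
    card_number: int   # position of the image within the drawer

    def path(self) -> str:
        # Assumed path layout: type/period/drawer/zero-padded card number.
        return f"{self.record_type}/{self.period}/{self.drawer}/{self.card_number:04d}"


def build_catalog(cards):
    """Group cards by (type, period, drawer) and sort each drawer by card number."""
    catalog = defaultdict(list)
    for card in cards:
        catalog[(card.record_type, card.period, card.drawer)].append(card)
    for drawer_cards in catalog.values():
        drawer_cards.sort()
    return catalog


def surrounding(catalog, card, radius=1):
    """Return the card plus up to `radius` neighbors on each side in its drawer."""
    drawer_cards = catalog[(card.record_type, card.period, card.drawer)]
    i = drawer_cards.index(card)
    return drawer_cards[max(0, i - radius): i + radius + 1]
```

A user who lands on one card could then page backward and forward through the same drawer, much as one would flip through the physical cards.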
Third, we are exploring the feasibility, costs, and benefits of optical character recognition (OCR) and double-blind data capture as possible options for extracting data from copyright records. Indexing 70 million records is a daunting task, far beyond present staff resources. At the same time, the accuracy and integrity of the records are of paramount importance. Through prototyping, piloting, and your feedback, we plan to find the approach that captures the necessary information correctly and completely. Whether the data is captured through keyboarding or OCR, it must go through a second pass for verification. In concert with this, we are considering how we might use crowdsourcing to engage large numbers of interested persons to help with the data capture and verification.
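To make the verification idea concrete, here is a minimal Python sketch. It assumes that double-blind capture means two people (or one person and an OCR pass) transcribe the same card independently, and that only the fields where the two transcriptions disagree are sent on for a second look. The field names and the whitespace normalization rule are hypothetical choices for illustration, not a specification of our process.

```python
def normalize(value):
    """Collapse runs of whitespace so trivial spacing differences are not flagged."""
    if value is None:
        return None
    return " ".join(value.split())


def compare_captures(first, second):
    """Return the fields on which two independent captures of one card disagree.

    Each capture is a dict mapping a field name to the transcribed text.
    The result maps each disputed field to the pair of competing values.
    """
    disputed = {}
    for field in sorted(set(first) | set(second)):
        a, b = first.get(field), second.get(field)
        if normalize(a) != normalize(b):
            disputed[field] = (a, b)
    return disputed


# Hypothetical example: two transcriptions of the same card.
capture_a = {"title": "Sunrise  Waltz", "claimant": "J. Smith", "year": "1925"}
capture_b = {"title": "Sunrise Waltz", "claimant": "J. Smyth", "year": "1925"}

# Only the claimant differs in substance, so only that field needs review.
needs_review = compare_captures(capture_a, capture_b)
```

Fields that match are accepted as captured, so reviewers spend their time only on the disagreements. The same comparison could in principle be applied to transcriptions contributed by volunteers.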
Fourth, we are going to publicize the project through the Copyright website and other media such as this blog to generate excitement, seek input, and garner support for the project.
As always, your feedback and comments are most important and most welcome.