As we digitize, capture and index the pre-1978 Copyright records, one goal is that they be searchable in combination with the existing post-1977 records, eventually resulting in a search capability that spans the full realm of creativity and copyright ownership from 1870 to the present. More than 18 million records from 1978 to the present are already available online, and our first thought is to add the pre-1978 records to that same database. However, before we go further with that idea, we’d like to know what you like and what you don’t like about the existing online search functionality for Copyright records. It’s available at the following address: http://cocatalog.loc.gov/. The basic search page looks like this:
Before 2007, when the current search functionality was installed, the post-1977 Copyright records were maintained in three separate files, one each for monograph registrations, serial registrations, and transfer/assignment documents. The index files for searching were similarly kept in separate files, although some combined searching of the files was possible. Only left-anchored searching was available, a disadvantage when a searcher didn’t know the exact title or at least how the title began. The old system had been developed at the Library in the 1970s, and by 2003 the time had come to replace it. A decision was made to use the same software for Copyright records that was already in place for the Library’s bibliographic records and which is still used today. Several benefits derived from this. It meant that the same tool would be used for both Copyright and bibliographic records with a very similar look and feel, a benefit for users of both sets of records. It avoided the cost of buying or developing and maintaining new software just for Copyright records. It leveraged the knowledge and experience that the Library had gained since implementing the software several years earlier. It entailed a conversion of the Copyright data, resulting in records that are cleaner, more consistent, better organized, and more portable. And all records are now stored in one database. The new software also supported improved indexing and keyword searching.
As with any tool, some users had developed expertise in using the old system and were sad to see it retired. Moving to the new software was a good decision at the time, but before we add 16 million more records, we’d like to hear what you think of the present system for searching Copyright records. The Copyright Office is currently conducting an online survey that’s available when one searches the records at the link given above. The survey can only be completed once and requires that your browser’s pop-up blocker be turned off. If you haven’t already completed the survey, please take a few minutes to give us feedback about what you like or don’t like about this search capability, or your thoughts on adding the pre-1978 records. Question 6 of the survey has a text box in which you can give us your comments. If you’ve already completed the survey but have additional comments, please add them to this post. Whatever feedback or comments you provide based on your experience in searching Copyright records online will be most appreciated and will be taken into account in deciding how to organize, index and make available the pre-1978 records.
August 23, 2012 at 8:14 pm
I have several ideas.
I would very much appreciate it if your system embedded the search terms in the URL, so that a user whose search returned multiple results didn’t have to finish looking at everything within a finite amount of time, and so that the search results could be bookmarked.
The best way to do this is to use a web app that slices each URL into its component pieces and uses them as variables for database queries. This can be made transparent to the user, so that any number of (database-resident) documents look like a file system to a web server. Then a search URL could be saved, and whenever you typed that string into your browser the same search would be executed. That would make it easy for users to, say, periodically launch the URL in a tool like lynx, save the output to a file, and diff it against their last search to see any new changes. It could be run by a script and the results parsed to show changes automatically. Also, it would be great if people could set up an account and save searches that would run, say, once a week and email them the results (like they can do on PubMed).
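A minimal sketch of that idea in Python, using only the standard library. The URL layout (`/search/<field>/<terms>?page=<n>`), the host `example.org`, and the function names `parse_search_url` and `new_results` are all hypothetical illustrations, not the catalog's actual URL scheme. The second function is a simple stand-in for the diff step: it reports result lines that were not in the previously saved output.

```python
from urllib.parse import urlsplit, parse_qs, unquote_plus


def parse_search_url(url):
    """Slice a bookmarkable search URL into variables for a database query.

    Assumed (hypothetical) layout: /search/<field>/<terms>?page=<n>
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 3 or segments[0] != "search":
        raise ValueError("not a search URL: " + url)
    query = parse_qs(parts.query)
    return {
        "field": segments[1],
        "terms": unquote_plus(segments[2]),       # "moby+dick" -> "moby dick"
        "page": int(query.get("page", ["1"])[0]),  # default to the first page
    }


def new_results(previous, current):
    """Return result lines in `current` that were absent from `previous`."""
    seen = set(previous)
    return [line for line in current if line not in seen]


if __name__ == "__main__":
    print(parse_search_url("https://example.org/search/title/moby+dick?page=2"))
    print(new_results(["Record A", "Record B"], ["Record A", "Record B", "Record C"]))
```

Because every search parameter lives in the URL itself, the same string can be bookmarked, emailed, or run from a scheduled script without relying on a server-side session.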
Also, are there any plans to ever use cryptographic hash functions (like an MD5 checksum) or public key cryptography to allow all documents to be “authenticated” simply? This would be similar to what is done in the open source software community using md5sum (for simple checksums that show whether a file has been modified) or gpg/pgp software (for more sophisticated digital signing).
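The simple checksum case can be sketched with Python's standard `hashlib` module. The function names `checksum` and `is_unmodified` are hypothetical, and the sketch defaults to SHA-256 rather than MD5, since MD5 is no longer considered collision-resistant; passing `"md5"` reproduces what md5sum computes where the platform allows it. Note that a bare checksum only detects modification; it does not by itself prove who published the document, which is what a digital signature adds.

```python
import hashlib
import hmac


def checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of `data` using the named hash algorithm."""
    return hashlib.new(algorithm, data).hexdigest()


def is_unmodified(data: bytes, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Check downloaded bytes against a digest published alongside them."""
    actual = checksum(data, algorithm)
    # compare_digest compares the two strings in constant time
    return hmac.compare_digest(actual, expected_hex.lower())


if __name__ == "__main__":
    record = b"Example catalog record text"  # placeholder content
    published = checksum(record)             # digest the publisher would post
    print(is_unmodified(record, published))         # same bytes
    print(is_unmodified(record + b"!", published))  # altered bytes
```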
Also, it’s great that you are digitizing the old records. What is the timeframe for that? (You’re probably just working backwards from 1978?)
August 23, 2012 at 10:08 pm
Having the LOC utilize the most robust search technology is the service we need to easily access all information within the Library’s holdings. I applaud the LOC for working to make available via digital technology this new resource.
August 24, 2012 at 7:58 pm
I agree with Christopher. A modernized functionality will be a great help to all users.
August 28, 2012 at 9:44 am
Christopher,
Thank you for your ideas about improving the search capability. The notion of saving search terms for later use makes sense. Searches can be bookmarked in the current system but, as you probably know, the bookmarks are only good for about the length of one search session. Maybe we could improve on that.
We actually have experience with digital signatures (the encryption of a checksum over an object using PKI) and have used them for many years in one of our ingestion systems. Are you thinking of them being used when a copy of a record is downloaded?
Good progress is being made on the digitization of the catalog cards. More than 22 million have been scanned, quality checked, and copied to secure storage. We began the scanning with the most recent time period, 1971 to 1977, and we are now about halfway through the 1946 to 1954 period. The decision to work backward in time was based on a survey of public users who indicated that they search the most recent cards more often than the earlier ones. With continued funding we plan to finish the card digitization by the end of 2014. We are also exploring the option of making available a virtual card catalog which I described in posts on March 22 and April 5, as an interim measure to share the card images sooner rather than later.
October 19, 2012 at 6:46 pm
Thank you for every other wonderful post. Where else may just anybody get that type of info in such a perfect way of writing? I’ve a presentation next week, and I’m at the search for such information.