PMC Back Issue Digitization Project
A number of journals that joined PubMed Central (PMC) prior to 2008 have benefited from NLM's back issue digitization project, offered to publishers whose archival content was not yet available in electronic form. By scanning back issues that were available only in print, NLM has helped create a complete digital archive of these journals in PMC. The details of the project, which is now in its final phase, are as follows:
- The full cost of scanning the back issues and creating the related OCR and XML files was covered by NLM and, in some cases, the Wellcome Trust and the U.K. Joint Information Systems Committee (JISC). See the announcement of the collaboration between NLM, the Wellcome Trust and JISC. The Wellcome Trust site has information about the journals sponsored by Wellcome and JISC.
- Participating journals have given NLM permanent rights to archive the scanned material and make it freely available to the public through PMC, subject to normal ‘fair use’ provisions of copyright law. In return, NLM offers to provide the publisher with a complete electronic copy of its material, at no cost. As with existing content in PMC, copyright for the scanned material remains with the publisher or with individual authors, as applicable.
- NLM scanned back to the first issue of each journal. Each issue was scanned cover to cover, with pages scanned at resolutions ranging from 300 dpi to 600 dpi, depending on the nature of the source material. A PDF file was created for every article or other discrete item in an issue. Grayscale and color graphics in an article were reproduced in the PDF file as true representations of the original pages.
- OCR text, of sufficient quality to build indexes for full text searching and to use for other background processing, was generated automatically from the scanned images. There was no manual correction of the OCR text to improve its accuracy, and PMC users do not have direct access to the OCR text.
- An XML record was created for the citation and abstract of any scanned article that is not already listed in NLM's PubMed abstracts database, and these abstracts are being added to PubMed. See the PMC FAQs for exceptions to this statement.
- For complete technical details, see the NLM Image Specifications and Functional Requirements for Citation Capture [PDF–750K].
In 2008, with scanning mostly complete for participating journals, NLM scaled back PMC scanning activity. It expects to complete the remaining scanning work in the queue by early 2009.