About the Text Generated by Optical Character Recognition without Correction
The periodicals in this collection were scanned at 600 dots per inch and captured as bitonal (black and white, not greyscale) TIFF images. The text for the twenty-two periodicals converted by Cornell University Library was generated by a fully automated process of optical character recognition, with no human intervention beyond initial calibration. The OCR process was implemented by Cornell University Library staff. A similar process was used by the University of Michigan Digital Library Production Service to prepare searchable text for Garden and Forest for the Library of Congress.
Why are there strange characters and odd spacing in the text?
The text was generated automatically by optical character recognition (OCR). OCR works best on uniform clear print. You will notice problems with small print, special fonts, and decorated text. Strange characters, particularly ~, occur in places where the OCR process determined that a character was present but not what character it was. Strange spacing, such as a sequence of blank lines, may occur when pages have illustrations. Decorative blocks, such as the title block on the front page of each issue of Garden and Forest, also cause problems for OCR. In some cases, the columns on the page were not recognized as separate sequences of text.
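The two symptoms described above (the ~ marker for unrecognized characters and runs of blank lines near illustrations) can be counted in a page of uncorrected text. The following is a minimal sketch in Python, not part of the conversion process described here; the function name, the report format, and the threshold of three blank lines are assumptions made for illustration.

```python
UNRECOGNIZED = "~"  # marker the text above names for unrecognized characters


def ocr_problem_report(text, min_blank_run=3):
    """Count '~' markers and locate long runs of blank lines.

    min_blank_run is a hypothetical threshold: runs shorter than this
    are treated as ordinary paragraph spacing.
    """
    lines = text.splitlines()
    blank_runs = []  # (1-based starting line number, run length)
    run_start = None
    for i, line in enumerate(lines):
        if line.strip() == "":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_blank_run:
                blank_runs.append((run_start + 1, i - run_start))
            run_start = None
    # Handle a blank run that reaches the end of the text.
    if run_start is not None and len(lines) - run_start >= min_blank_run:
        blank_runs.append((run_start + 1, len(lines) - run_start))
    return {
        "unrecognized_chars": text.count(UNRECOGNIZED),
        "blank_runs": blank_runs,
    }


if __name__ == "__main__":
    sample = "Gar~en and F~rest\n\n\n\nNext line\n"
    print(ocr_problem_report(sample))
```

A page with many markers or long blank runs is probably one where the page image will serve a reader better than the text.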
Why was the text not re-keyed?
The advantage of OCR over re-keying is in the cost. The cost of fully automated OCR is around 15 cents per page; the cost of re-keying is determined by the number of characters. These pages are dense with words, and re-keying to the 99.95% accuracy rate required for most Library of Congress projects to date would certainly cost a dollar or two per page. Human correction of OCR would probably cost at least 4 or 5 times as much as the OCR itself. The per-page difference in cost becomes significant for a collection of around 750,000 pages.
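To make the scale concrete, the per-page figures above can be multiplied out over the collection. This is a rough sketch using only the approximate numbers quoted in the text; it assumes "a dollar or two" means 100 to 200 cents per page and treats the correction figure as the cost of correction alone at 4 to 5 times the OCR cost, which the text gives as a lower bound. Amounts are kept in whole cents to avoid rounding error.

```python
PAGES = 750_000                  # approximate collection size from the text
OCR_CENTS = 15                   # fully automated OCR, per page
REKEY_CENTS = (100, 200)         # "a dollar or two" per page (assumed range)
CORRECTION_MULTIPLIER = (4, 5)   # correction costs at least 4-5x the OCR


def total_dollars(cents_per_page, pages=PAGES):
    """Total cost in whole dollars for a given per-page cost in cents."""
    return cents_per_page * pages // 100


ocr_total = total_dollars(OCR_CENTS)
rekey_low, rekey_high = (total_dollars(c) for c in REKEY_CENTS)
correction_low, correction_high = (
    total_dollars(OCR_CENTS * m) for m in CORRECTION_MULTIPLIER
)

if __name__ == "__main__":
    print(f"OCR only:         ${ocr_total:,}")
    print(f"Re-keying:        ${rekey_low:,} to ${rekey_high:,}")
    print(f"Human correction: at least ${correction_low:,} to ${correction_high:,}")
```

Under these assumptions, uncorrected OCR comes to roughly $112,500 for the whole collection, against $750,000 to $1,500,000 for re-keying.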
Why not simply use page images?
Most users will want to view the page images for reading. The converted text is primarily to support searching. Search the full text of a periodical for a place, say Chesapeake Bay. [Choose "match this exact phrase" as an option, and "match words exactly," rather than "include word variants."] Imagine using microfilm or even browsing the original paper issues. Would you find these items? When you search the full text, links from result lists will take you to the uncorrected text. Often, there will be a Best Match button at the top (beside the American Memory logo). This will take you to the section of the retrieved text with the best or most matches to the words you entered. View page links will take you to the image of the selected page. You will be able to browse backwards and forwards through the volume.
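The idea behind a "best match" jump, going to the section with the most matches to the search words, can be illustrated with a small sketch. This is not the American Memory implementation, whose details are not given here; the function name and the simple scoring rule (counting exact, case-insensitive word matches per section) are assumptions for illustration only.

```python
import re


def best_match_section(sections, query):
    """Return the index of the section with the most query-word matches.

    Words are matched exactly (no word variants) and case-insensitively.
    Returns None when no section contains any query word.
    """
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    best_index, best_score = None, 0
    for index, section in enumerate(sections):
        words = re.findall(r"\w+", section.lower())
        score = sum(words.count(term) for term in terms)
        if score > best_score:  # ties keep the earliest section
            best_index, best_score = index, score
    return best_index


if __name__ == "__main__":
    sections = [
        "The garden in spring",
        "Oysters of Chesapeake Bay and the bay shore",
        "Forest notes",
    ]
    print(best_match_section(sections, "Chesapeake Bay"))
```

Because the underlying text is uncorrected, a word that OCR misread (for example, with a ~ in place of a letter) would not count toward the score in a sketch like this.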