NIST Special Database 13
NIST Census Miniform Test Database: Binary Images from Paper and Microfilm
NIST Special Database 13 is a set of 1990 Census Miniform images. A
Miniform is a non-sensitive portion of the Industry and Occupation
section of an actual Census Long Form with handwritten responses to
three questions.
The database is available on CD-ROM and contains images of 3000 paper
miniforms (9000 fields), 3000 microfilm miniforms (9000 fields), and
files containing ASCII transcriptions of the strings that were written
in the miniform fields. This database is designed for the evaluation
of optical character recognition (OCR) systems in a difficult but
realistic form-based task on binary images from microfilm.
Each miniform image contains three fields with handwritten answers to
the following questions (Long Form Questions 28b, 29a and 29b
respectively):
- Describe the activity performed at location where employed.
- What kind of work was this person doing?
- What were this person's most important activities or duties?
A possible set of responses would therefore be:
- hospital
- registered nurse
- patient care
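To illustrate how recognized field strings might be compared with the ASCII transcriptions, here is a minimal sketch in Python. It is not the scoring procedure used at the Conference, which this page does not describe. The function names, the normalization (lowercasing and collapsing whitespace) and the misrecognized string "registered nvrse" are all assumptions made for the example.

```python
def normalize(text):
    """Lowercase and collapse runs of whitespace (an assumed convention)."""
    return " ".join(text.lower().split())


def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference char
                            curr[j - 1] + 1,      # insert a hypothesis char
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def score_fields(references, hypotheses):
    """Return (fraction of exactly matching fields, character error rate)."""
    if not references or len(references) != len(hypotheses):
        raise ValueError("need equally sized, non-empty lists of fields")
    exact = 0
    errors = 0
    ref_chars = 0
    for ref, hyp in zip(references, hypotheses):
        ref, hyp = normalize(ref), normalize(hyp)
        exact += ref == hyp
        errors += edit_distance(ref, hyp)
        ref_chars += len(ref)
    cer = errors / ref_chars if ref_chars else 0.0
    return exact / len(references), cer


# The three example responses above, with one hypothetical OCR error.
REFERENCES = ["hospital", "registered nurse", "patient care"]
HYPOTHESES = ["hospital", "registered nvrse", "patient care"]

if __name__ == "__main__":
    accuracy, cer = score_fields(REFERENCES, HYPOTHESES)
    print(f"field accuracy: {accuracy:.3f}, character error rate: {cer:.3f}")
```

Field accuracy counts only exact matches, while the character error rate gives partial credit for nearly correct fields. Which measure is appropriate depends on how the recognized text will be used.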
The forms were scanned from microfilm, yielding images of much lower
quality than forms scanned from paper. The images are 624 by 744
pixels, sampled at 78.74 pixels/cm (200 pixels/inch). They are packed
five to a file and are CCITT Group 4 compressed. Source code for image
manipulation, including programs to decompress and unpack the images,
is present on the CD-ROM. The code is written in the C programming
language and was developed on Sun workstations running SunOS 4.1.1*.
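The image figures quoted above can be checked with a few lines of arithmetic. The sketch below, in Python, derives the physical size of one miniform image and its uncompressed size. It assumes one bit per pixel with rows stored without padding, which is an assumption about the unpacked raster and not a description of the file format on the CD-ROM. The constant names are chosen for this example.

```python
WIDTH_PX = 624          # image width in pixels (from the description)
HEIGHT_PX = 744         # image height in pixels (from the description)
PIXELS_PER_INCH = 200   # sampling resolution (from the description)
IMAGES_PER_FILE = 5     # images packed into each file (from the description)
CM_PER_INCH = 2.54

# Resolution in metric units; should agree with the quoted 78.74 pixels/cm.
pixels_per_cm = PIXELS_PER_INCH / CM_PER_INCH

# Physical extent of the scanned area.
width_in = WIDTH_PX / PIXELS_PER_INCH
height_in = HEIGHT_PX / PIXELS_PER_INCH
width_cm = width_in * CM_PER_INCH
height_cm = height_in * CM_PER_INCH

# Uncompressed size, assuming 1 bit per pixel and no row padding.
# 624 is a multiple of 8, so each row fills a whole number of bytes.
raw_bits = WIDTH_PX * HEIGHT_PX
raw_bytes = raw_bits // 8
raw_bytes_per_file = raw_bytes * IMAGES_PER_FILE

if __name__ == "__main__":
    print(f"resolution: {pixels_per_cm:.2f} pixels/cm")
    print(f"size: {width_in:.2f} x {height_in:.2f} in "
          f"({width_cm:.2f} x {height_cm:.2f} cm)")
    print(f"uncompressed: {raw_bytes} bytes per image, "
          f"{raw_bytes_per_file} bytes per five-image file")
```

Under these assumptions each image covers 3.12 by 3.72 inches and occupies 58,032 bytes before compression.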
NIST Special Database 13 was the third of three produced in conjunction
with the Second Census Optical Character Recognition Systems
Conference, and was used for the system benchmark testing. (The first,
Special Database 11, contained microfilm training data. The second,
Special Database 12, contained paper and microfilm training data.) NIST
and the Bureau of the Census sponsored the Conference, in which the
participants sought to determine the state of the art of the OCR
industry on a challenging, realistic task. The results of the
Conference were published in NIST Internal Report (IR) 5452. That
report is available on the Internet in PostScript form via anonymous
FTP from the server sequoyah.ncsl.nist.gov, maintained by NIST's Visual
Image Processing Group. It is also available on request in hardcopy
form.
NIST Special Database 13 comes with a 30-page guide that presents an
overview of the Conference and its results and documents the file
formats and software.
*Specific hardware and software products identified were used in order
to adequately support the development of the technology described in
this document. In no case does such identification imply recommendation
or endorsement by the National Institute of Standards and Technology,
nor does it imply that the equipment identified is necessarily the best
available for the purpose.
Price: $90.00
Special pricing for multiple copies available. Call for details.
A PDF version of the Users' Guide is available.
For more information please contact:
- Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2300
Gaithersburg, MD 20899-2310
(301) 975-2008 (VOICE) / (301) 926-0416 (FAX)
The scientific contact for this database is:
- Stanley Janet
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
PH: (301) 975-2916
e-mail: stan.janet@nist.gov
Keywords:
ASCII Reference, Binary Image Database, census forms, Census OCR Systems
Conference, Character recognition, hand print, hand-printed characters,
microfilm documents, NIST, OCR, optical character recognition, paper,
software recognition.