NIST Special Database 2
NIST Structured Forms Reference Set of Binary Images (SFRS)
Price: $90.00
The NIST Structured Forms Database consists of 5,590 pages of binary, black-and-white images of synthesized documents.
The documents in this database are 12 different tax forms from the IRS 1040 Package X for the year 1988. These include Forms 1040, 2106, 2441, 4562, and 6251 together with Schedules A, B, C, D, E, F, and SE.
Eight of these forms contain two pages or form faces; therefore, there are 20 different form faces represented in the database.
The document images in this database appear to be real forms prepared by individuals, but the images have been automatically derived and synthesized using a computer.
There are 900 simulated tax submissions represented in the database averaging 6.2 form faces per submission. This significant new database totals approximately 5.9 gigabytes of uncompressed image data including image format documentation and example software.
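As a rough consistency check on the figures above, the following sketch recomputes the form-face count, the average faces per submission, and the uncompressed size. It assumes each form face is a US Letter page (8.5 x 11 inches) stored at 1 bit per pixel; neither the page size nor the storage layout is stated in this description, so the result is only an estimate.

```python
# Back-of-the-envelope check of the database figures.
# ASSUMPTIONS (not stated in the description): US Letter pages
# (8.5 x 11 in) stored uncompressed at 1 bit per pixel.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0

# Figures taken from the database description.
RESOLUTION_PPI = 300
NUM_IMAGES = 5590
NUM_SUBMISSIONS = 900
NUM_FORMS = 12
NUM_TWO_PAGE_FORMS = 8


def estimate_bytes_per_image(width_in, height_in, ppi, bits_per_pixel=1):
    """Return the uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


# Each two-page form contributes one extra face beyond its first page.
form_faces = NUM_FORMS + NUM_TWO_PAGE_FORMS
faces_per_submission = NUM_IMAGES / NUM_SUBMISSIONS
bytes_per_image = estimate_bytes_per_image(
    PAGE_WIDTH_IN, PAGE_HEIGHT_IN, RESOLUTION_PPI
)
total_bytes = bytes_per_image * NUM_IMAGES

print(f"form faces:           {form_faces}")
print(f"faces per submission: {faces_per_submission:.1f}")
print(f"bytes per image:      {bytes_per_image:,}")
print(f"total size:           {total_bytes / 1e9:.2f} GB")
```

Under these assumptions the estimate comes to about 5.88 GB, which is in line with the stated total of approximately 5.9 gigabytes.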
The database has the following features:
- 900 simulated tax submissions
- 5,590 images of completed structured form faces
- 300 pixel/inch resolution
- 5,590 text files containing entry field answers
- 20 tables of entry field types and contexts
- image format documentation and example software
Suitable for both document processing and automated data capture research, development, and evaluation, the data set can be used for:
- forms identification: determining which form face an image contains
- field isolation: locating the entry fields on the form
- character segmentation: separating entry field values into characters
- character recognition: identifying specific machine-printed characters
This database is a valuable tool for measurement of system performance and system comparison on complex forms.
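One way such a performance measurement might look is sketched below: recognized field values are compared against reference answers, giving a field-level accuracy and a character error rate based on edit distance. This is an illustrative scoring scheme, not a procedure defined by NIST. The field names and the dictionary layout are hypothetical, and the actual format of the answer text files is described in the Users' Guide rather than here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def score_fields(reference: dict, hypothesis: dict) -> dict:
    """Compare recognized field values against reference answers."""
    total_chars = 0
    total_errors = 0
    exact_matches = 0
    for field, ref_value in reference.items():
        # A field the system failed to return counts as an empty string.
        hyp_value = hypothesis.get(field, "")
        total_chars += len(ref_value)
        total_errors += edit_distance(ref_value, hyp_value)
        exact_matches += int(ref_value == hyp_value)
    return {
        "field_accuracy": exact_matches / max(len(reference), 1),
        "char_error_rate": total_errors / max(total_chars, 1),
    }


# Hypothetical field names and values, for illustration only.
reference = {"name": "JOHN DOE", "wages": "41250", "filing_status": "1"}
hypothesis = {"name": "JOHN D0E", "wages": "41250", "filing_status": "1"}

scores = score_fields(reference, hypothesis)
print(scores)
```

Reporting both numbers is a deliberate choice: a single misread character fails the whole field in the first metric but barely moves the second, so the pair separates recognition errors from their downstream effect on data capture.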
System Requirements: CD-ROM drive with software to read ISO-9660 format.
Special pricing is available for multiple copies. Call for details.
A PDF version of the Users' Guide is available.
For more information on Special Database 2, please contact:
- Standard Reference Data Program
National Institute of Standards and Technology
100 Bureau Dr., Stop 2300
Gaithersburg, MD 20899-2310
(301) 975-2008 (VOICE) / (301) 926-0416 (FAX)
The scientific contact for this database is:
- Michael Garris
National Institute of Standards and Technology
100 Bureau Drive, Stop 8940
Building 225, Room A216
Gaithersburg, MD 20899-8940
(301) 975-2928
michael.garris@nist.gov
Keywords:
ASCII Reference, automated character recognition, automated data capture,
Binary Image Database, forms identification, image format documentation,
IRS, NIST, Machine Print, OCR, optical character recognition, printed
characters, software recognition, synthesized documents, tax forms.