About SOIC
What is SOIC?
How much does the software cost?
Where can industry and occupation information be
found?
Why collect industry and occupation information?
What is the advantage of an automated coding system?
What types of data are accepted by SOIC?
Will the software be updated?
Coding Results
What is the overall performance of SOIC?
How well does the software perform on industry
narratives?
How well does the software perform on
occupation narratives?
What are the limitations to using an automated
system?
What can be done to data to improve the software’s
performance?
Should I perform quality control on coded data?
About SOIC
What is SOIC?
The Standardized Occupation and Industry Coding (SOIC) system
is a standalone Windows-based software package that assigns 3-digit numerical
codes to narrative industry and occupation descriptions. Data can be
imported from existing files or entered directly into SOIC using
the data entry screen, and the software can code one record at a time or all
records in a file. The software assigns codes based on the 1990 Bureau
of the Census Alphabetical Index of Industries and Occupations.
How much does the software cost?
Because SOIC was created in the public domain, the software can
be downloaded free of charge.
Where can industry and occupation information be found?
Many data sets contain narrative text for industry and occupation,
including:
- Vital records systems
- Cancer registries
- Workers’ compensation systems
- Healthcare records
Why collect industry and occupation information?
The collection of industry and occupation (I&O) information
serves many purposes:
- To associate specific health outcomes (e.g., a particular cancer
or cause of death) with certain industries and/or occupations
- To identify areas in need of further research
- To assess socioeconomic status and identify persons who may be
at high risk of disease or injury
This type of information can be used by public health workers, industrial
organizations, employers, and others to provide the best possible hazard
abatement and control, and safety and health programs for workers.
What is the advantage of an automated coding system?
Manually assigning industry and occupation (I&O) codes is
often expensive, time consuming, and inconsistent. An automated
system like SOIC can decrease the number of cases a coder must review
and improve consistency across a records system.
What types of data are accepted by SOIC?
SOIC offers flexible data access features. A user can:
- Enter data directly into SOIC
- Operate directly on external data tables in dBase, FoxPro, and
Access
- Import and export text files
Will the software be updated?
The current version of SOIC assigns codes based on the 1990 Bureau
of the Census (BOC) coding scheme. No further revisions will be made to
this version. Although the BOC developed a 2000 industry and occupation
coding scheme, currently, there are no plans to create a version of SOIC
that incorporates these codes.
Coding Results
What is the overall performance of SOIC?
NIOSH compared SOIC-assigned codes with codes manually assigned by an
expert for 48,067 cases from a death certificate-based surveillance
system. The number of software-assigned codes that matched the expert
manual coder is shown below. In this test there was no adjudication of
the results; that is, the mismatched cases were not reviewed to determine
whether the SOIC autocoder or the manual coder was actually correct. These
results are provided as an illustration. Coding results will vary and
depend upon overall data quality. The software does not perform well on
narratives with company names and other ambiguous information.
Industry codes matched | 36,376 cases (76%)
Occupation codes matched | 36,207 cases (75%)
Both occupation and industry codes matched | 30,389 cases (63%)
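The percentages above can be reproduced from the reported counts. The following is a minimal sketch of that arithmetic; the names `TOTAL_CASES`, `matched`, and `match_rate` are illustrative and are not part of SOIC.

```python
# Counts reported for the NIOSH comparison of SOIC against an expert coder.
TOTAL_CASES = 48_067
matched = {
    "industry": 36_376,
    "occupation": 36_207,
    "both": 30_389,
}


def match_rate(count, total=TOTAL_CASES):
    """Return the share of cases that matched, as a whole-number percentage."""
    return round(100 * count / total)


rates = {name: match_rate(count) for name, count in matched.items()}
print(rates)
```

The same function can be applied to counts from your own comparison by passing a different `total`.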
How well does the software perform on industry narratives?
To identify and categorize SOIC errors, unique industry and occupation
narratives were reviewed separately. Using Bureau of the Census industry
divisions as a guide, SOIC and manually assigned codes were compared.
Of the 48,067 total cases, 16,096 cases
contained unique industry narratives (all duplicates were removed, e.g.,
the file contained the term "construction roofing" in the industry
narrative field only once, regardless of how many times it appeared
in the original file of 48,067 cases). More than half (9,262)
were coded correctly. The software incorrectly coded 3,296
narratives and could not code 3,538 narratives. The problems identified
among the 3,296 incorrectly coded narratives are characterized below.
Codes were in different division categories but were unrelated (e.g., construction vs. hospital) | 1,159 (35%)
Codes were in different division categories but were related (e.g., food manufacturing vs. retail grocery) | 1,032 (31%)
Codes were in the same division category | 825 (25%)
Codes were in different division categories and one code was an unclassified code | 248 (8%)
Codes were in different division categories and one code was a non-worker code | 32 (1%)
How well does the software perform on occupation narratives?
To identify and categorize SOIC errors, unique industry and occupation
narratives were reviewed separately. Using Bureau of the Census occupation
divisions as a guide, SOIC and manually assigned codes were compared.
Of the 48,067 total cases, 9,808 cases
contained unique occupation narratives (all duplicates were removed, e.g.,
the file contained the term "carpenter" in the occupation narrative
field only once, regardless of how many times it appeared in
the original file of 48,067 cases). Almost half (4,852)
were coded correctly. The software incorrectly coded 2,020
narratives and could not code 2,936 narratives. The problems identified
among the 2,020 incorrectly coded narratives are characterized below.
Codes were in different division categories but were unrelated (e.g., nurse vs. machine operator) | 789 (39%)
Codes were in the same division category | 748 (37%)
Codes were in different division categories but were related (e.g., machine operators vs. operating engineers) | 388 (19%)
Codes were in different division categories and one code was an unclassified code | 60 (3%)
Codes were in different division categories and one code was a non-worker code | 35 (2%)
What are the limitations to using an automated system?
Data quality problems often pose challenges for an automated
industry and occupation (I&O) coding system. Common problems include:
- Multiple ways of reporting industries or occupations
  - Industry: Hauling/Transporting/Trucking
  - Occupation: Teacher/Tutor/Instructor
- State-specific business names
  - Industry: Bar Church Key Pub/Harper’s County Ham
- Ambiguous I&O narratives
  - Industry: Food/Computers
  - Occupation: Healthcare/Assistant
What can be done to data to improve the software’s performance?
Data quality plays an integral part in software performance.
The cleaner, more straightforward your data, the better the software performs.
Several steps can be taken to ensure well-coded data. Before coding, review
the industry and occupation (I&O) narratives in your data and modify
as needed:
- Spell out abbreviated words and acronyms
  Example:
  - AFB = Air Force Base
  - ED = Emergency Department
- Use business type instead of business name
  Example:
  - Construction instead of Smith & Sons Contracting
- Make sure that industry and occupation narratives are in appropriate
  fields
- Delete non-essential words
  Example:
  - a teacher
  - works as a lawyer
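Some of these cleanup steps can be scripted before the data are loaded into SOIC. The sketch below is one possible approach, not a feature of SOIC: the abbreviation table, the list of filler prefixes, and the function name `clean_narrative` are all assumptions made for illustration, and a real data set would need its own lists. Replacing business names with business types usually requires human review and is not attempted here.

```python
import re

# Hypothetical lookup tables; extend them for your own data.
ABBREVIATIONS = {"AFB": "Air Force Base", "ED": "Emergency Department"}
FILLER_PREFIXES = ("works as a ", "works as an ", "a ", "an ")


def clean_narrative(text):
    """Expand known abbreviations and drop leading filler words."""
    # Collapse runs of whitespace and trim both ends.
    text = " ".join(text.split())

    # Expand whole words that exactly match an upper-case abbreviation.
    def expand(match):
        word = match.group(0)
        return ABBREVIATIONS.get(word, word)

    text = re.sub(r"[A-Za-z]+", expand, text)

    # Remove one leading filler phrase, if present.
    lowered = text.lower()
    for prefix in FILLER_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break
    return text


print(clean_narrative("works as a lawyer"))
print(clean_narrative("ED nurse"))
```

Abbreviations are matched only in upper case so that ordinary words or names such as "Ed" are left alone; whether that trade-off suits a given file is a judgment call.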
Should I perform quality control on coded data?
After data have been coded, we strongly encourage performing
quality control to ensure that there are no systematic errors.
- Randomly sample electronically coded cases
- Independently code sample manually
- Compare records and adjudicate mismatches
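The three steps above can be sketched as follows. This is a minimal illustration that assumes the coded records have been exported from SOIC into a list of dictionaries; the field names (`id`, `industry_code`, `occupation_code`), the function names, and the code values are placeholders, not SOIC's export format.

```python
import random


def sample_for_review(records, sample_size, seed=0):
    """Draw a reproducible random sample of coded records."""
    rng = random.Random(seed)
    return rng.sample(records, min(sample_size, len(records)))


def find_mismatches(sample, manual_codes):
    """List records whose software codes differ from the manual codes.

    manual_codes maps record id -> (industry_code, occupation_code),
    as assigned independently by a human coder.
    """
    mismatches = []
    for record in sample:
        auto = (record["industry_code"], record["occupation_code"])
        manual = manual_codes[record["id"]]
        if auto != manual:
            mismatches.append((record["id"], auto, manual))
    return mismatches


# Placeholder data: ten records that all carry the same codes.
records = [
    {"id": i, "industry_code": "060", "occupation_code": "567"}
    for i in range(10)
]
sample = sample_for_review(records, sample_size=4)
```

The list returned by `find_mismatches` is the set of cases to adjudicate; a mismatch does not by itself indicate which coder was correct.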
Page last updated: July 10, 2007
Page last reviewed: May 13, 2008
Content Source: National Institute for Occupational Safety and Health (NIOSH)