NIOSH - National Institute for Occupational Safety and Health

SOIC Frequently Asked Questions

About SOIC

What is SOIC?
How much does the software cost?
Where can industry and occupation information be found?
Why collect industry and occupation information?
What is the advantage of an automated coding system?
What types of data are accepted by SOIC?
Will the software be updated?


Coding Results

What is the overall performance of SOIC?
How well does the software perform on industry narratives?
How well does the software perform on occupation narratives?
What are the limitations to using an automated system?
What can be done to data to improve the software’s performance?
Should I perform quality control on coded data?

About SOIC

What is SOIC?
The Standardized Occupation and Industry Coding (SOIC) system is a standalone Windows-based software package that assigns 3-digit numerical codes to narrative industry and occupation descriptions. Data can be imported from existing files or entered directly through the SOIC data entry screen, and the software can code one record at a time or all records in a file. The software assigns codes based on the 1990 Bureau of the Census Alphabetical Index of Industries and Occupations.

How much does the software cost?
Because SOIC was created in the public domain, the software can be downloaded free of charge.

Where can industry and occupation information be found?
Many data sets contain narrative text for industry and occupation, including:

  • Vital records systems
  • Cancer registries
  • Workers’ compensation systems
  • Healthcare records

Why collect industry and occupation information?
The collection of industry and occupation (I&O) information serves many purposes:

  • To associate specific health outcomes (e.g., a particular cancer or cause of death) with certain industries and/or occupations
  • To identify areas in need of further research
  • To assess socioeconomic status and identify persons who may be at high risk of disease or injury

This type of information can be used by public health workers, industrial organizations, employers, and others to provide the best possible hazard abatement and control, and safety and health programs for workers.

What is the advantage of an automated coding system?
Manually assigning industry and occupation (I&O) codes is often expensive, time-consuming, and inconsistent. An automated system like SOIC can reduce the number of cases a coder must review and make coding more consistent across a records system.

What types of data are accepted by SOIC?
SOIC offers flexible data access features. A user can:
  • Enter data directly into SOIC
  • Operate directly on external data tables in dBase, FoxPro, and Access
  • Import and export text files

Will the software be updated?
The current version of SOIC assigns codes based on the 1990 Bureau of the Census (BOC) coding scheme. No further revisions will be made to this version. Although the BOC developed a 2000 industry and occupation coding scheme, there are currently no plans to create a version of SOIC that incorporates these codes.

Coding Results

What is the overall performance of SOIC?
NIOSH compared SOIC-assigned codes with an expert’s manually assigned codes for 48,067 cases from a death certificate-based surveillance system. The number of software-assigned codes that matched the expert manual coder’s codes is shown below. In this test there was no adjudication of the results; that is, the mismatched cases were not reviewed to determine whether the SOIC autocoder or the manual coder was actually correct. These results are provided as an illustration. Coding results will vary and depend upon overall data quality. The software does not perform well on narratives with company names and other ambiguous information.

  • Industry codes matched: 36,376 cases (76%)
  • Occupation codes matched: 36,207 cases (75%)
  • Both occupation and industry codes matched: 30,389 cases (63%)
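
The percentages above follow directly from the reported counts. A minimal Python sketch of the arithmetic, assuming each rate is the matched count divided by all 48,067 cases and rounded to a whole percent (the variable and function names are illustrative and are not part of SOIC):

```python
# Counts reported in the NIOSH comparison described above.
TOTAL_CASES = 48_067
matched = {
    "industry": 36_376,
    "occupation": 36_207,
    "both": 30_389,
}


def match_rate(matches: int, total: int) -> int:
    """Return the share of matching cases as a whole-number percentage."""
    return round(100 * matches / total)


rates = {name: match_rate(count, TOTAL_CASES) for name, count in matched.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
# industry: 76%
# occupation: 75%
# both: 63%
```

The same function can be applied to your own comparison counts when you evaluate SOIC against manually coded data.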

How well does the software perform on industry narratives?
To identify and categorize SOIC errors, unique industry and occupation narratives were reviewed separately. Using Bureau of the Census industry divisions as a guide, SOIC and manually assigned codes were compared. Of the 48,067 total cases, 16,096 cases contained unique industry narratives (all duplicates were removed; for example, the term "construction roofing" was kept only once in the industry narrative field, however many times it appeared in the original file of 48,067 cases). More than half (9,262) were coded correctly. The software incorrectly coded 3,296 narratives and could not code 3,538 narratives. The problems identified among the incorrectly coded narratives are characterized below.

  • Codes were in different division categories but were unrelated (e.g., construction vs. hospital): 1,159 (35%)
  • Codes were in different division categories but were related (e.g., food manufacturing vs. retail grocery): 1,032 (31%)
  • Codes were in the same division category: 825 (25%)
  • Codes were in different division categories and one code was an unclassified code: 248 (8%)
  • Codes were in different division categories and one code was a non-worker code: 32 (1%)
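
The duplicate removal described above can be approximated in a few lines of Python. This is only a sketch: the source does not say how duplicates were identified, so the rule used here (narratives match after trimming extra whitespace and ignoring case) is an assumption, and the function name is hypothetical. The final check confirms that the three reported outcome counts add up to the 16,096 unique industry narratives.

```python
def unique_narratives(narratives):
    """Return narratives with duplicates removed, keeping first-seen order.

    Assumption: two narratives are duplicates when they are identical
    after collapsing whitespace and ignoring case.
    """
    seen = set()
    unique = []
    for text in narratives:
        key = " ".join(text.split()).casefold()
        if key and key not in seen:
            seen.add(key)
            unique.append(text.strip())
    return unique


sample = ["construction roofing", "Construction Roofing ", "hospital", "construction roofing"]
deduped = unique_narratives(sample)
print(deduped)  # ['construction roofing', 'hospital']

# Reported outcome counts for the unique industry narratives.
outcomes = {"coded correctly": 9_262, "coded incorrectly": 3_296, "not coded": 3_538}
assert sum(outcomes.values()) == 16_096
```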

How well does the software perform on occupation narratives?
To identify and categorize SOIC errors, unique industry and occupation narratives were reviewed separately. Using Bureau of the Census occupation divisions as a guide, SOIC and manually assigned codes were compared. Of the 48,067 total cases, 9,808 cases contained unique occupation narratives (all duplicates were removed; for example, the term "carpenter" was kept only once in the occupation narrative field, however many times it appeared in the original file of 48,067 cases). Almost half (4,852) were coded correctly. The software incorrectly coded 2,020 narratives and could not code 2,936 narratives. The problems identified among the incorrectly coded narratives are characterized below.

  • Codes were in different division categories but were unrelated (e.g., nurse vs. machine operator): 789 (39%)
  • Codes were in the same division category: 748 (37%)
  • Codes were in different division categories but were related (e.g., machine operators vs. operating engineers): 388 (19%)
  • Codes were in different division categories and one code was an unclassified code: 60 (3%)
  • Codes were in different division categories and one code was a non-worker code: 35 (2%)

What are the limitations to using an automated system?
Data quality problems often make automated industry and occupation (I&O) coding difficult. Common challenges include:

  • Multiple ways of reporting industries or occupations
    • Industry: Hauling/Transporting/Trucking
    • Occupation: Teacher/Tutor/Instructor
  • State-specific business names
    • Industry: Bar Church Key Pub/Harper’s County Ham
  • Ambiguous I&O narratives
    • Industry: Food/Computers
    • Occupation: Healthcare/Assistant

What can be done to data to improve the software’s performance?
Data quality plays an integral part in software performance. The cleaner, more straightforward your data, the better the software performs. Several steps can be taken to ensure well-coded data. Before coding, review the industry and occupation (I&O) narratives in your data and modify as needed:

  • Spell out abbreviated words and acronyms
    Example:
    • AFB = Air Force Base
    • ED = Emergency Department
  • Use business type instead of business name
    Example:
    • Construction instead of Smith & Sons Contracting
  • Make sure that industry and occupation narratives are in appropriate fields
  • Delete non-essential words
    Example:
    • teacher instead of a teacher
    • lawyer instead of works as a lawyer
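
Some of the cleanup steps above can be scripted before the data are imported into SOIC. The following Python sketch is one possible approach and is not part of SOIC: the abbreviation table, the list of filler phrases, and the function name are all hypothetical and should be adapted to your own data. It expands only exact, upper-case abbreviations that stand alone as a word, so an abbreviation with punctuation attached (for example "ED,") would be left unchanged. Replacing business names with business types and moving narratives to the correct field usually require human review and are not attempted here.

```python
import re

# Hypothetical lookup table; extend it with the abbreviations found in your data.
ABBREVIATIONS = {
    "AFB": "Air Force Base",
    "ED": "Emergency Department",
}

# Hypothetical leading phrases that add no coding information.
LEADING_FILLER = re.compile(r"^(works as|worked as|a|an|the)\s+", re.IGNORECASE)


def clean_narrative(text: str) -> str:
    """Expand known abbreviations and drop leading filler words."""
    cleaned = " ".join(text.split())  # collapse repeated whitespace
    # Strip filler repeatedly, so "works as a lawyer" becomes "lawyer".
    while True:
        stripped = LEADING_FILLER.sub("", cleaned)
        if stripped == cleaned:
            break
        cleaned = stripped
    # Replace whole-word, exact-case abbreviations only.
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)


print(clean_narrative("works as a lawyer"))  # lawyer
print(clean_narrative("a teacher"))          # teacher
print(clean_narrative("Edwards AFB"))        # Edwards Air Force Base
```

Keeping the original narrative in a separate column makes it possible to check what the script changed.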

Should I perform quality control on coded data?
After data have been coded, we strongly encourage performing quality control to ensure that there are no systematic errors.

  • Randomly sample electronically coded cases
  • Independently code sample manually
  • Compare records and adjudicate mismatches
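
The quality control steps above could be supported with a short script such as the following Python sketch. It assumes the coded records have been exported from SOIC to a list of dictionaries; the field names ("id", "narrative", "soic_code"), the function names, and the code values are made up for illustration and are not real SOIC fields or Census codes. The manual coding itself still has to be done independently by a person.

```python
import random


def draw_qc_sample(records, sample_size, seed=0):
    """Randomly sample coded records for independent manual coding."""
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(records, min(sample_size, len(records)))


def find_mismatches(sample, manual_codes):
    """Return sampled records whose software code differs from the manual code."""
    return [
        record
        for record in sample
        if manual_codes[record["id"]] != record["soic_code"]
    ]


# Hypothetical records; the codes are placeholders, not real Census codes.
records = [
    {"id": 1, "narrative": "carpenter", "soic_code": "001"},
    {"id": 2, "narrative": "teacher", "soic_code": "002"},
    {"id": 3, "narrative": "nurse", "soic_code": "003"},
]

sample = draw_qc_sample(records, sample_size=3)

# Codes assigned by an independent manual coder for the sampled records.
manual_codes = {1: "001", 2: "002", 3: "009"}

mismatches = find_mismatches(sample, manual_codes)
for record in mismatches:
    print(record["id"], record["narrative"], record["soic_code"], manual_codes[record["id"]])
```

Each mismatch should then be adjudicated to decide whether the software or the manual coder was correct, and recurring patterns should be checked for systematic errors.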
Page last updated: July 10, 2007
Page last reviewed: May 13, 2008
Content Source: National Institute for Occupational Safety and Health (NIOSH)
