NIOSH - National Institute for Occupational Safety and Health

What is SOIC?

Background

Many data sets contain narrative text for industry and occupation. These include vital records systems, cancer registries, workers' compensation systems, and healthcare records. Manually assigning industry and occupation (I&O) codes can be expensive, time-consuming, and inconsistent. Furthermore, because some industry and occupation titles are rare or include infrequently used synonyms, even experienced coders have great difficulty reaching agreement.

To decrease the number of cases a manual coder must review and to create national consistency, NIOSH led the development of the Standardized Occupation and Industry Coding (SOIC) software. The development of SOIC was a collaborative effort that included the National Association for Public Health Statistics and Information Systems, the National Center for Health Statistics (NCHS), the Bureau of Labor Statistics (BLS), the National Center for Chronic Disease Prevention and Health Promotion, and the Bureau of the Census (BOC).

The Software

SOIC codes occupation and industry narratives according to the 1990 BOC Alphabetical Index of Industries and Occupations, supplemented with special codes for non-paid workers, non-workers, and the military as defined in the NCHS Instruction Manual, Part 19. This website provides the current version (SOIC 1.5) and its documentation for download. The SOIC software may be downloaded free of charge. Minimum system requirements include:

  • 90 MHz Pentium with 32 MB of RAM
  • Windows® 98, NT, ME, or 2000
  • Minimum 30 MB of free disk space

The SOIC system client was written using the Microsoft Visual Basic programming language and the Microsoft Access database management system. SOIC data tables and data files are stored as Access tables and files. SOIC offers several data access features: the main window can be used for data entry; text or ASCII files can be imported and exported; and files in Microsoft Access, dBase, or FoxPro formats can be opened directly in the software. Microsoft conventions for Windows applications were used wherever possible. The software has an easy-to-use interface whose data entry screen is modeled on the U.S. standard death certificate, and an extensive system menu with options for opening and saving files, editing or finding records, and coding a single record or an entire file.

The data entry screen.

The Coding Process

To assign industry and occupation codes, the software uses a stepwise series of increasingly complex coding modules. Narrative information is processed through each module until an industry or occupation code is assigned or the narrative is determined to be uncodable.

SOIC coding process flow diagram.

Auto-Spell: corrects some misspellings and expands fused words, acronyms, and abbreviations.

Lookup Tables: assigns codes based on exact matches to various I&O narrative combinations.

  • Paired-phrase matching: commonly occurring I&O narratives.
  • Company matching: a limited list of state-specific industry names.
  • Idiom matching: misleading industry narratives.

Knowledge Base: assigns codes based on static handwritten coding rules that emulate the logic a manual coder would typically apply (for example, by performing “fuzzy” matching on word fragments). The 2,055 rules comprise 848 industry rules and 1,207 occupation rules.
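As a rough illustration of what matching on word fragments could look like, the following Python sketch applies a small list of handwritten rules to an occupation narrative. It is not taken from SOIC: the `FragmentRule` class, the `apply_rules` function, the two example rules, and the codes `OCC-NURSE` and `OCC-TRUCK` are all hypothetical placeholders, not 1990 BOC codes or actual SOIC rules.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FragmentRule:
    """Hypothetical rule: fires when every fragment occurs in the narrative."""
    fragments: Tuple[str, ...]
    code: str  # placeholder label, NOT a real 1990 BOC code


# Illustrative rules only; the real knowledge base is not reproduced here.
OCCUPATION_RULES: List[FragmentRule] = [
    FragmentRule(fragments=("REGIST", "NURS"), code="OCC-NURSE"),
    FragmentRule(fragments=("TRUCK", "DRIV"), code="OCC-TRUCK"),
]


def apply_rules(narrative: str, rules: List[FragmentRule]) -> Optional[str]:
    """Return the code of the first rule whose fragments all appear, else None."""
    text = narrative.upper()
    for rule in rules:
        if all(fragment in text for fragment in rule.fragments):
            return rule.code
    return None


print(apply_rules("Registered nursing", OCCUPATION_RULES))
print(apply_rules("truck driver", OCCUPATION_RULES))
print(apply_rules("farmer", OCCUPATION_RULES))
```

Because the rules test fragments rather than whole words, "Registered nursing" and "Registered nurse" would both satisfy the first rule in this sketch, which is the kind of tolerance a manual coder applies without thinking about it.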

Word-to-Code: predicts codes based on word patterns observed in data used to develop the software.
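The stepwise process described above can be sketched as a simple cascade: each module is tried in order, and processing stops at the first module that returns codes. The Python below is a minimal sketch under stated assumptions, not the SOIC implementation (which was written in Visual Basic). All function names, the abbreviation and paired-phrase tables, and the codes `OCC-NURSE` and `IND-HOSPITAL` are hypothetical, and the sketch simplifies matters by coding occupation and industry together as a pair.

```python
from typing import Callable, Dict, List, Optional, Tuple

Codes = Tuple[str, str]  # (occupation code, industry code), placeholder labels
Module = Callable[[str, str], Optional[Codes]]

# Hypothetical data for illustration; not SOIC's actual tables.
ABBREVIATIONS: Dict[str, str] = {"RN": "REGISTERED NURSE", "HOSP": "HOSPITAL"}
PAIRED_PHRASES: Dict[Tuple[str, str], Codes] = {
    ("REGISTERED NURSE", "HOSPITAL"): ("OCC-NURSE", "IND-HOSPITAL"),
}


def auto_spell(text: str) -> str:
    """Normalize case and expand known abbreviations, word by word."""
    return " ".join(ABBREVIATIONS.get(w, w) for w in text.upper().split())


def lookup_tables(occ: str, ind: str) -> Optional[Codes]:
    """Exact match on the (occupation, industry) narrative pair."""
    return PAIRED_PHRASES.get((occ, ind))


def knowledge_base(occ: str, ind: str) -> Optional[Codes]:
    """One toy handwritten rule that matches on word fragments."""
    if "NURS" in occ and "HOSP" in ind:
        return ("OCC-NURSE", "IND-HOSPITAL")
    return None


def word_to_code(occ: str, ind: str) -> Optional[Codes]:
    """Stub: a real module would predict codes from learned word patterns."""
    return None


MODULES: List[Tuple[str, Module]] = [
    ("lookup tables", lookup_tables),
    ("knowledge base", knowledge_base),
    ("word-to-code", word_to_code),
]


def code_record(occupation: str, industry: str) -> Tuple[str, Optional[Codes]]:
    """Run the narratives through each module until one assigns codes."""
    occ, ind = auto_spell(occupation), auto_spell(industry)
    for name, module in MODULES:
        result = module(occ, ind)
        if result is not None:
            return name, result
    return "uncodable", None


print(code_record("rn", "hosp"))
print(code_record("staff nurse", "general hospital"))
print(code_record("unknown", "unknown"))
```

Ordering the modules from exact matching to pattern-based prediction means the cheapest and most reliable method gets the first chance at each record, and only the remaining narratives fall through to the less certain modules.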

Coding Results

NIOSH compared the codes assigned by SOIC with those assigned manually by an expert coder for 48,067 cases from a death-certificate-based surveillance system. The number of software-assigned codes that matched the expert's codes is shown below. In this test there was no adjudication of the results; that is, the mismatched cases were not reviewed to determine whether the SOIC autocoder or the manual coder was actually correct. These results are provided as an illustration. Coding results will vary and depend upon overall data quality. The software does not perform well on narratives with company names and other ambiguous information.

Number of SOIC-assigned codes that matched manually assigned codes

  Industry codes matched                        36,376 cases (76%)
  Occupation codes matched                      36,207 cases (75%)
  Both occupation and industry codes matched    30,389 cases (63%)
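The percentages above follow directly from the reported counts and the 48,067 cases compared. The short Python sketch below reproduces the arithmetic; the variable and function names are illustrative choices, and only the counts come from the comparison described on this page.

```python
TOTAL_CASES = 48_067  # cases in the NIOSH comparison

# Matched counts as reported in the results above.
MATCHES = {
    "industry": 36_376,
    "occupation": 36_207,
    "both": 30_389,
}


def match_rate(matched: int, total: int) -> float:
    """Percentage of cases where the SOIC code equals the manual code."""
    return 100.0 * matched / total


rates = {name: match_rate(count, TOTAL_CASES) for name, count in MATCHES.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

Note that these are agreement rates with a single manual coder, not accuracy figures, since the mismatched cases were not adjudicated.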

Software Version

The current SOIC software version available for download is v. 1.5 and is based on the 1990 BOC industry and occupation coding scheme. The software is provided as a resource tool for injury and illness researchers where uniform coding of industry and occupation is beneficial to prevention efforts. No further revisions will be made to this version. User support is limited. Assistance may be requested by contacting the NIOSH SOIC group.

The BOC developed a 2000 industry and occupation coding scheme. Currently, NIOSH is not planning to create a version of SOIC that incorporates these codes.

Download SOIC Software
Page last updated: July 10, 2007
Page last reviewed: May 13, 2008
Content Source: National Institute for Occupational Safety and Health (NIOSH)
