What is SOIC?
Background

Many data sets contain narrative text describing industry and occupation, including vital records systems, cancer registries, workers' compensation systems, and healthcare records. Manually assigning industry and occupation (I&O) codes can be expensive, time-consuming, and of limited consistency. Furthermore, because some industry and occupation titles are rare or include infrequently used synonyms, even experienced coders have great difficulty reaching agreement. To reduce the number of cases a manual coder must review and to create national consistency, NIOSH led the development of the Standardized Occupation and Industry Coding (SOIC) software. SOIC was developed collaboratively with the National Association for Public Health Statistics and Information Systems, the National Center for Health Statistics (NCHS), the Bureau of Labor Statistics (BLS), the National Center for Chronic Disease Prevention and Health Promotion, and the Bureau of the Census (BOC).

The Software

SOIC codes occupation and industry narratives according to the 1990 BOC Alphabetical Index of Industries and Occupations, supplemented with special codes for non-paid workers, non-workers, and the military as defined in the NCHS Instruction Manual, Part 19. This website provides the current version of the software (SOIC 1.5) and its documentation, which may be downloaded free of charge. Minimum system requirements include:
The SOIC client was written in Microsoft Visual Basic and uses the Microsoft Access database management system; SOIC data tables and data files are stored as Access tables and files. SOIC offers several data access features: the main window can be used for data entry; text or ASCII files can be imported and exported; and files in Microsoft Access, dBase, or FoxPro format can be opened directly in the software. Microsoft conventions for Windows applications were followed wherever possible. The easy-to-use data entry screen is modeled on the U.S. standard death certificate, and an extensive system menu provides options for opening and saving files, editing or finding records, and coding either a single record or an entire file.
The Coding Process

To assign industry and occupation codes, the software uses a stepwise series of increasingly complex coding modules. Narrative information is processed through each module until an industry or occupation code is assigned or the narrative is determined to be uncodable. The modules are:

Auto-Spell: corrects some misspellings and expands fused words, acronyms, and abbreviations.
Lookup Tables: assigns codes based on exact matches to various I&O narrative combinations.
Knowledge Base: assigns codes based on static, handwritten coding rules that emulate the logic a manual coder would typically apply (e.g., “fuzzy” matching on word fragments). There are 2,055 rules: 848 industry rules and 1,207 occupation rules.
Word-to-Code: predicts codes based on word patterns observed in the data used to develop the software.

Coding Results

NIOSH compared SOIC-assigned codes with an expert’s manually assigned codes for 48,067 cases from a death-certificate-based surveillance system. The number of software-assigned codes that matched the expert manual coder is shown below. In this test the results were not adjudicated; that is, mismatched cases were not reviewed to determine whether the SOIC autocoder or the manual coder was actually correct. These results are provided only as an illustration: coding results will vary and depend on overall data quality. The software does not perform well on narratives containing company names or other ambiguous information.
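The stepwise cascade described under The Coding Process can be illustrated with a minimal Python sketch. It is not SOIC's implementation (SOIC was written in Visual Basic with Access tables), and everything in it is an assumption for illustration: the abbreviation list, lookup table, rule keywords, training data, and codes such as "A01" are hypothetical toys; Python's `difflib` similarity stands in for the knowledge base's hand-written fuzzy rules; the lookup matches a single narrative rather than I&O combinations; and the word-to-code step is reduced to simple word-level voting. What it shows is the control flow: normalize the narrative, try each module in order of increasing complexity, stop at the first one that assigns a code, and otherwise report the narrative as uncodable.

```python
import re
from collections import Counter, defaultdict
from difflib import get_close_matches
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Toy resources for illustration only. SOIC's real spelling lists, lookup
# tables, 2,055 rules, and 1990 BOC codes are NOT reproduced here; the codes
# "A01", "B01", ... are hypothetical placeholders.
ABBREVIATIONS = {"rn": "registered nurse", "mgr": "manager", "tchr": "teacher"}
LOOKUP_TABLE = {"registered nurse": "A01", "teacher": "A02"}
RULE_KEYWORDS = {"carpenter": "B01", "electrician": "B02"}

WordModel = Dict[str, Counter]


def auto_spell(narrative: str) -> str:
    """Lowercase, split into alphabetic words, and expand known abbreviations."""
    words = re.findall(r"[a-z]+", narrative.lower())
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


def lookup_module(text: str) -> Optional[str]:
    """Exact match of the whole normalized narrative against the lookup table."""
    return LOOKUP_TABLE.get(text)


def knowledge_base_module(text: str) -> Optional[str]:
    """Stand-in for hand-written rules: fuzzy-match each word to a rule keyword."""
    for word in text.split():
        hits = get_close_matches(word, list(RULE_KEYWORDS), n=1, cutoff=0.8)
        if hits:
            return RULE_KEYWORDS[hits[0]]
    return None


def train_word_to_code(examples: Iterable[Tuple[str, str]]) -> WordModel:
    """Count how often each word co-occurs with each code in already-coded data."""
    model: WordModel = defaultdict(Counter)
    for narrative, code in examples:
        for word in auto_spell(narrative).split():
            model[word][code] += 1
    return model


def word_to_code_module(text: str, model: WordModel) -> Optional[str]:
    """Let every known word vote for the codes it was seen with; pick the top code."""
    votes: Counter = Counter()
    for word in text.split():
        votes.update(model.get(word, Counter()))
    return votes.most_common(1)[0][0] if votes else None


def code_narrative(narrative: str, model: WordModel) -> Optional[str]:
    """Run the modules in order of increasing complexity; None means uncodable."""
    text = auto_spell(narrative)
    modules: List[Callable[[str], Optional[str]]] = [
        lookup_module,
        knowledge_base_module,
        lambda t: word_to_code_module(t, model),
    ]
    for module in modules:
        code = module(text)
        if code is not None:
            return code  # the first module that assigns a code wins
    return None


# Hypothetical training data for the word-to-code step.
TRAINING = [("grocery cashier", "C01"), ("cashier", "C01"), ("stock clerk", "C02")]
model = train_word_to_code(TRAINING)
print(code_narrative("RN", model))               # -> A01 (lookup after expansion)
print(code_narrative("Carpentr", model))         # -> B01 (fuzzy rule match)
print(code_narrative("grocery checker", model))  # -> C01 (word-to-code vote)
print(code_narrative("xyzzy", model))            # -> None (uncodable)
```

Placing the exact-match lookup first means a narrative with a known exact match never reaches the more general, more error-prone word-based step; in this sketch that ordering is a design assumption that follows the page's description of increasingly complex modules.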
Software Version

The current SOIC software version available for download is v. 1.5, which is based on the 1990 BOC industry and occupation coding scheme. The software is provided as a resource tool for injury and illness researchers whose prevention efforts benefit from uniform coding of industry and occupation. No further revisions will be made to this version. User support is limited; assistance may be requested by contacting the NIOSH SOIC group.
Page last updated: July 10, 2007
Page last reviewed: May 13, 2008
Content Source: National Institute for Occupational Safety and Health (NIOSH)