How to Use HEDS -- Reference

CONTENTS

INTRODUCTION

DISCLAIMER

HEDS SCREENS

INFORMATION ON DOWNLOADS

INTRODUCTION

The Human Exposure Database System (HEDS) enables you to view and download human exposure study information. HEDS uses the EPA's Environmental Information Management System (EIMS) as its metadata repository. Therefore, when you are using the system, you can expect to see pages from both HEDS and EIMS.

To access all of the EIMS functions described below, you must enter EIMS through the HEDS portal. (The HEDS portal is the pathway into the HEDS portion of EIMS.) If you enter through another EIMS portal, you can access some but not all of these functions on the EIMS pages. If you do not see the HEDS logo at the top of the screen or the HEDS-Home link at the bottom, you are not in the HEDS portal. To get to the HEDS portal, use the following URL: http://www.epa.gov/eims/?p=ord-heds. The HEDS portal limits your searches to HEDS entries within EIMS.

To get back to HEDS from EIMS at any time, click on the EIMS Download link for a HEDS entry and then click on one of the HEDS links or click on the HEDS Home link in the footer.

Every data set, document, and study in HEDS is assigned an Entry ID number, which is used in both HEDS and EIMS.

The term "active," when used with study, data set, or customized selection, refers to the most recent choice of that item type.

HEDS is optimized for use with the Netscape browser version 4.0 or higher. Some functions may operate differently for other browsers. Also, some functions may operate differently depending on user settings in the browser. In most cases, the difference is not significant. In instances where it may be, additional information is provided.

Note that you can leave this page at any time with your browser's Back button to try any of the HEDS Studies links described here. To return to this page, go to the HEDS Home Page and click How To Use--Reference on the navigation bar or Help in the page footer.

DISCLAIMER

Product and corporate names may be trademarks or registered trademarks of commercial companies and are used only for explanation and to the owners' benefit, without intent to infringe. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.

HEDS HOME PAGE

The HEDS Home Page navigation bar includes the following links:

The main window includes two links:

HEDS STUDIES

This is the starting page for retrieving any HEDS-related information. This page contains links for HEDS projects and studies. A HEDS study contains data sets and documents pertinent to that study. EIMS contains links to and between the data sets and documents within a study. Some HEDS studies may be related to a project, and links from the project to the related studies will be available.

The HEDS Studies page includes the same HEDS navigation bar as the HEDS Home Page. The two kinds of links on the main part of this page and the information they provide are described as follows.

Project. If you click on a project link, you will be taken to an EIMS Summary page having the EIMS navigation bar.

The EIMS Summary page includes

The EIMS navigation bar includes

Links at the bottom of the page provide access to other EPA sites. The links include a HEDS-Home link to enable you to get back to HEDS.

Study. If you click on a study link, you will be taken to the HEDS Study Information Directory page for a particular study.

STUDY INFORMATION DIRECTORY

This page provides the following links:

At the bottom of this page is a shortcut for returning directly to a previously viewed Data Set or Document for the active study. To use this feature, you must have previously accessed a data set or document and noted its Entry ID number. Enter the number in the field provided. Be sure to click the appropriate radio button for either Download Data Set or Download Document, and then click Submit. If you click the incorrect radio button for the Entry ID number you enter, you will get an error message reminding you to click the correct radio button.

If you enter an Entry ID number for a data set or document that does not belong to the active study, you will get a message informing you of that fact. Return to the Study Information Directory page and enter a number that belongs to the active study, or return to the HEDS Studies page and select the study associated with the desired Entry ID number.

Download Data Set takes you to the HEDS Download Complete Data Set page. Download Document takes you to the EIMS Downloads page for that document.



DOWNLOAD COMPLETE DATA SET

The page header includes the Entry ID number, title, and a portion of the abstract and the data use and constraints. To view the full abstract and related information, click the View Full Description link just below the text for Data Use and Constraints.

This page and the pages described below have a navigation bar different from other HEDS pages. This navigation bar includes the following sections: Complete Data Set, Customized Data Set, and General.

Under COMPLETE DATA SET on the navigation bar, the Download link refreshes the page. Under the Browse subheading, the links perform the same actions as the similarly named Browse links on the main part of the page. The links on the Download Complete Data Set page are described in the following paragraphs.

Immediately below the Download and Browse Options heading are links that take you to parts of this reference document. Below those links are the following links:

Under GENERAL on the navigation bar, the three links are described as follows.

Under CUSTOMIZED DATA SET on the navigation bar, only the Select Columns link is active on this page. To create a customized data set based on the active data set, click the Select Columns link. The Select Data Set Columns page appears.

SELECT DATA SET COLUMNS

The page header includes the Entry ID number, title, and a portion of the abstract and the data use and constraints. To view the full abstract and related information, click the View Full Description link just below the text for Data Use and Constraints.

The Select Data Set Columns page enables you to select columns of a data set for browsing or downloading. By default, all columns are deselected. To select all columns, click the Select All Columns link. All columns are selected, and the link changes to Deselect All Columns. (Note that the record identifying, or key, columns are always selected.)

To select or deselect individual columns, click the selection box to the left of each column name. A check mark indicates that the column is selected. To clear all manually selected columns, click the Clear button. If you want most of the columns, you can click the Select All Columns link and then deselect the columns you do not want. In that case, however, downloading the complete data set is recommended.

When you have selected the desired columns, click Submit. The Selected Data Set Columns page appears.

SELECTED DATA SET COLUMNS

This page enables you to review your column selections. If you are not satisfied with your selection, click Back to return to the Select Data Set Columns page and revise it, or click Select Columns in the navigation bar to make a new selection. If you are satisfied with your selection, click the Data Set link in the navigation bar to go to the Download Customized Data Set page.

Note that on the navigation bar under Customized Data Set the links for Data Set, Code Set, and Data Dictionary are now active. The Data Set link takes you to the Download Customized Data Set page. The Code Set and Data Dictionary links take you to pages that allow for downloading or browsing the code set and the data dictionary for the customized data set, respectively. See Code Set and Data Dictionary for further information.

DOWNLOAD CUSTOMIZED DATA SET

The page header includes the Entry ID number, title, and a portion of the abstract and the data use and constraints. To view the full abstract and related information, click the View Full Description link just below the text for Data Use and Constraints. The links on this page are described in the following paragraphs.

DOWNLOAD CODE SET FOR CUSTOMIZED DATA SET

This page allows you to download the code set for a customized data set. To reach this page, on the navigation bar, under Customized Data Set, click Code Set. The page header includes the Entry ID number, title, and a portion of the abstract and the data use and constraints. To view the full abstract and related information, click the View Full Description link just below the text for Data Use and Constraints. The links on this page are described in the following paragraphs.

DOWNLOAD DATA DICTIONARY FOR CUSTOMIZED DATA SET

This page allows you to download the data dictionary for a customized data set. To reach this page, on the navigation bar, under Customized Data Set, click Data Dictionary. The page header includes the Entry ID number, title, and a portion of the abstract and the data use and constraints. To view the full abstract and related information, click the View Full Description link just below the text for Data Use and Constraints. The links on this page are described in the following paragraphs.

EIMS NAVIGATION BAR LINKS

As indicated above, some HEDS links take you to EIMS. In EIMS pages, the navigation bar may show both active and inactive links, with inactive links in a subdued color or grayed out. Of the available links, the following are particularly useful to HEDS users.

The following links are also available.

INFORMATION ON DOWNLOADS

CONTENTS OF A DOWNLOAD PACKAGE

Information downloaded from HEDS includes data set metadata (as a readme file), data set file(s), data dictionary file(s), and code set file(s). For a complete data set, these files are provided as a group in a zipped package. For customized data sets, the files are not zipped but are provided as individual files. The following sections describe aspects of downloads.

DATA SET METADATA

DATA SET PACKAGES

DATA SET

DATA DICTIONARY

CODE SET



DATA SET METADATA

Data set metadata information is taken from the Environmental Information Management System (EIMS), ORD's metadata repository. EIMS metadata can be accessed via the Data Set Metadata link on the HEDS navigation bar or the Study Metadata link on the Study Information Directory page.

Data Set Metadata Sample

An example of data set metadata follows.

Data Set Description

Entry ID:

17419

Data Set contains 119 columns, 459 rows, and 1 section(s)

Name:

NHEXAS PHASE I REGION 5 STUDY--DESCRIPTIVE QUESTIONNAIRE DATA

Abstract:

This data set includes responses for 459 descriptive questionnaires. The Descriptive Questionnaire was used to enumerate individuals within a household for sampling purposes (basis for selection of sample individual), to identify general characteristics of the living quarters and occupants, and to provide a basis for assessing potential bias due to refusals in subsequent steps. It includes a few general questions about the household and a set of demographic questions about each full-time resident of the household. Keywords: questionnaire; exposure survey.

The National Human Exposure Assessment Survey (NHEXAS) is a federal interagency research effort coordinated by the Environmental Protection Agency (EPA), Office of Research and Development (ORD). Phase I consists of demonstration/scoping studies using probability-based sampling designs. The NHEXAS Phase I Questionnaires were organized into six modules for simplicity in administration (to minimize respondent burden and maximize participation rates at each step) and for collecting information that can be temporally related to the exposure, concentration and/or biological measurements collected in NHEXAS: Descriptive, Baseline, Technician, Follow-up, Time and activity diary, and Dietary diary (and follow-up). The Region 5 study was conducted in EPA's Region 5 (Ohio, Michigan, Illinois, Indiana, Wisconsin, and Minnesota), and included personal exposure, residential concentration, and biomarker measurements of metals and VOCs. The study was conducted by the Research Triangle Institute (RTI) and the Environmental and Occupational Health Sciences Institute (EOHSI). The scope and design of the study are detailed in the following article: E. Pellizzari et al., Population-Based Exposure Measurements in EPA Region 5: A Phase I Field Study in Support of the National Human Exposure Assessment Survey. Journal of Exposure Analysis and Environmental Epidemiology, Vol. 5, No. 3, 1995, pp. 327-358.

Data Use And Constraints:

These data are the result of a probability-based sampling design specific to the population under study. Thus the data may or may not be representative of subsets of this study's population or of other populations. The study was designed to test certain hypotheses and thus may limit its applicability for other purposes. When using these data it is important to consider the percentage of nonresponses or nondetects in the data as an indicator of its usefulness for other purposes. No liability is accepted by the U.S. EPA for any errors or omissions in the results included in the data set, associated information and/or documentation.

Based on the output format selected by the user and the software into which the data set is imported, the user may notice that measurements and sampling weights are zero-padded to the right of nonzero decimal digits. These zeroes are purely a function of the formatting process and the software's acceptance of that format and should not be construed to represent significant digits in the value. Measurement values are provided with four significant digits; sampling weights are provided with three decimal digits.

Notice:

The current data is draft data and should not be used for any definitive purposes.

Additional Information:

To access the data set, click on the downloads link on the navigation bar. Then click on the download entry "Access Data Set".

Version:

1.0.

DATA SET PACKAGES

General

The data set portion of a typical download package contains three types of information for the selected data set: data, data dictionary, and code set. The data dictionary provides characteristics of the data columns in a given data file. The code set provides the map between the code values used for the responses in the data and the descriptions those code values represent.

The download package is a zipped file that can be opened using most unzip software. For information on software to use in unzipping the file, go to the HEDS Home Page and click on the Related Web Sites link.

The individual files are provided in dBase IV format (.dbf) or ASCII format (.txt) and should be importable into most database or analytical software tools. The files follow naming conventions and other specifications that allow importing into a broad range of tools and versions likely to be used. In ASCII files, the columns are delimited by tabs, and columns with text are surrounded by double quotation marks (").
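
As an illustration of the ASCII layout just described, the following minimal Python sketch reads a tab-delimited file whose text columns are surrounded by double quotation marks. The sample content and the column names HHID, RESPNAME, and AGE are invented for the example and are not taken from any HEDS data set.

```python
import csv
import io

# Hypothetical sample in the layout described above: tab-delimited, with
# text columns wrapped in double quotation marks. Column names are invented.
SAMPLE = '"HHID"\t"RESPNAME"\t"AGE"\n"A001"\t"Smith, J"\t34\n"A002"\t"Lee"\t51\n'


def read_heds_text(handle):
    """Return a list of dicts, one per data row, keyed by column name."""
    reader = csv.DictReader(handle, delimiter="\t", quotechar='"')
    return list(reader)


# For a real download, open the file with open(path, newline="") instead.
rows = read_heds_text(io.StringIO(SAMPLE))
print(rows[0]["RESPNAME"])
```

Every value is returned as a string. Use the DATATYPE column of the data dictionary to decide which columns to convert to numbers.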

Analytical software packages do not all handle these formats in the same way. Some make their own judgments about how to interpret the contents of the file being imported. Files downloaded from HEDS were imported into Excel 97, Access 97, SPSS 10, and dBase IV. The following notes describe characteristics of the software that should be taken into account when importing the data, data dictionary, and code set files. In most cases, data in the .dbf format import more easily and with fewer problems than data in the .txt format.

Importing dBase IV (.dbf) Files

Note: For a data set downloaded in .dbf format, two additional files (xxxxxds.prt and xxxxxds.txt) may appear in the unzipped data set. These are files dBase uses to help with printing. They do not affect use of other files.

Excel 97 imports data in the .dbf format without problems. It keeps the numeric formats with the correct number of decimal digits. Excel 97 seems to set a default cell format for all columns as "number" with zero decimal digits.

Access 97 imports data in the .dbf format. It changes numeric columns to double (double precision), and it sets the width of all text fields to 255 characters.

SPSS 10 imports data in the .dbf format without problems, and it carries over format correctly. It adds a first column called d_r that has all blank values.

dBase IV imports data in the .dbf format without problems.

Importing ASCII (.txt) Files

In ASCII files, the columns are delimited by tabs, and columns with text are surrounded by double quotation marks ("). These characteristics have an impact on the importing of .txt files into the four software packages previously noted.

Excel 97 provides a text import wizard. In the successive screens of the wizard, selections should be made in Step 1 for Delimited; in Step 2 for Tab delimiter and double quote ("); and in Step 3 for General as the Column Data Format.

Excel 97 does not carry over the data type. For example, if a column is formatted as character and contains only numeric values, Excel changes the format to numeric. You can correct this problem in Step 3 of the wizard. First, under Data Preview, select the column. Then, under Column Data Format, select Text. Repeat this procedure for other columns as desired before you click Finish. The data dictionary included with the data set will help you determine the format for a column.

Excel 97 does not carry over the column width. Thus, for example, all of the decimal digits of a value may not be visible. To correct this problem, adjust the column widths as necessary in the Excel worksheet.

Access 97 follows its own rules for converting data. It generalizes, not keeping detailed format information. For example, it changes integer to long integer and float type columns to double (double precision). Access 97 shows all of what is in the field in terms of decimal digits and width within the definition of its data type. All character columns are assigned a text data type with a length of 255.

SPSS 10 encounters several errors in importing .txt files. It cannot read the column names, which have double quotation marks around them, and it converts column names to variable names V1, V2, etc. SPSS 10 retains the double quotation marks around character strings in the data cells, a fact that also increases the width of the character fields by 2. It defines the width for numeric fields based on the values in the first couple of rows. It also uses 2 as the default number of decimal digits. You can change these through various SPSS options or as part of the import process. The actual values are maintained, but the format affects what appears on output.

dBase IV will not import ASCII-formatted (.txt) files.

Importing Customized Data Sets

Customized data sets are currently available only in ASCII format. Depending on the target software, it may be easier to import the files into Excel first and then import them into the other software.

File Naming Conventions

The following file naming conventions are used. If the HEDS Entry ID number for the data is 12345 and the files are in dBase IV format, then the data set (ds) file has a name like 12345ds.dbf. The data dictionary (dd) file describing this data set has a name like 12345dd.dbf. The related code set (cs) file has a name like 12345cs.dbf.
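
The naming convention above can be sketched as a small Python helper. The function name heds_filename is hypothetical, and using "txt" as the extension for ASCII downloads is an assumption based on the formats named in this document.

```python
def heds_filename(entry_id, kind, extension="dbf"):
    """Build a file name such as 12345ds.dbf.

    kind is "ds" (data set), "dd" (data dictionary), or "cs" (code set).
    """
    if kind not in ("ds", "dd", "cs"):
        raise ValueError("kind must be 'ds', 'dd', or 'cs'")
    return f"{entry_id}{kind}.{extension}"


for kind in ("ds", "dd", "cs"):
    print(heds_filename(12345, kind))
```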

Segments for Complete Data Sets

To fit most software packages and the versions most likely to be in use, the files containing the actual data (data set files) have been limited to a maximum of 255 data columns. If the entire data set contains no more than 255 data columns, the download package contains one data set file, one data dictionary file, and one code set file. Thus, if HEDS data set 12345 has 1 segment, and if the download is in .dbf format, the following files would be included in this package:

12345ds.dbf     Data set
12345dd.dbf     Data dictionary for the data set
12345cs.dbf     Code set for the data set


However, if a complete data set contains more than 255 data columns, the files in the download package are provided in segments. Each segment contains the same set of record identifying columns at the beginning of each record to enable matching of data from different segment files. The segment number is included as the last character in the segmented file's filename. For example, if HEDS data set 45678 has two segments, and if the download is in .dbf format, the names of the files for the second segment's data set, data dictionary, and code set would be 45678ds2.dbf, 45678dd2.dbf, and 45678cs2.dbf, respectively.

Some users may have software enabling them to merge segmented files into joined files. Therefore, the download package for segmented files also includes a data dictionary and code set for the entire, nonsegmented data set. These data dictionary and code set files for the entire data set do not contain a segment number as the last character in the file name. Note that the package does not include a nonsegmented data set. Thus, if HEDS data set 45678 has two segments, and if the download is in .dbf format, the following files would be included in this package:

45678ds1.dbf    Segment 1 file of the data set
45678ds2.dbf    Segment 2 file of the data set
45678dd1.dbf    Data dictionary for segment 1 data file
45678dd2.dbf    Data dictionary for segment 2 data file
45678cs1.dbf    Code set for segment 1 data file
45678cs2.dbf    Code set for segment 2 data file
45678dd.dbf     Data dictionary for the complete data set
45678cs.dbf     Code set for the complete data set


In this example, the data dictionary for the complete data set would list the data columns in the following order: the record identifying columns from segment 1, the data columns from segment 1, and the data columns that follow the repeated record identifying columns in segment 2. Segments are created to keep data columns of a similar category together, so the column order may not reflect the original order of columns in a questionnaire.
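
If you want to join segment files yourself, one possible approach is to match rows on the record identifying columns that each segment repeats. The Python sketch below assumes the segments have been read as tab-delimited text. The sample rows, the column names (HHID, Q1, Q2, Q3), and the function names are invented for illustration, and you must supply the list of record identifying columns yourself.

```python
import csv
import io

# Hypothetical two-segment data set. HHID stands in for the record
# identifying columns that every segment repeats.
SEG1 = '"HHID"\t"Q1"\t"Q2"\n"A001"\t1\t2\n"A002"\t2\t1\n'
SEG2 = '"HHID"\t"Q3"\n"A002"\t9\n"A001"\t7\n'


def read_rows(text):
    """Parse tab-delimited text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t", quotechar='"'))


def merge_segments(first, second, key_columns):
    """Join rows from two segments on the record identifying columns."""
    index = {tuple(row[k] for k in key_columns): row for row in second}
    merged = []
    for row in first:
        key = tuple(row[k] for k in key_columns)
        match = index.get(key)
        if match is None:
            raise KeyError(f"no matching record in second segment for {key}")
        combined = dict(row)
        # Add only the non-key columns from the second segment.
        combined.update({c: v for c, v in match.items() if c not in key_columns})
        merged.append(combined)
    return merged


merged = merge_segments(read_rows(SEG1), read_rows(SEG2), ["HHID"])
print(merged[0])
```

Matching on the key values rather than on row position means the join still works if the segments list their records in a different order.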

Customized Data Sets

All customized downloads from a given data set are assigned standardized filenames in the download process. If you are downloading multiple selections from the same data set, it is recommended that you assign unique names to each selection's files.

For a customized data set of 255 or fewer data columns, one data file, one data dictionary file, and one code set file will be available for download through links on separate pages. The data file will contain the record identifying columns for the data set followed by the selected columns in the order they appear in the full data set.

For a customized data set of more than 255 columns, the files in the download package are provided in sections. (Sections of customized data sets are similar to segments of complete data sets.) The first section will include the record identifying columns followed by the selected columns in the order they appear in the full data set up to a total of 255 columns. The next section will include the record identifying columns followed by the next set of selected columns in the order they appear in the full data set up to a total of 255 columns. The sections do not necessarily keep data columns for categories of information together as in the complete data set download. There is also no data dictionary or code set for the full selection.
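
The sectioning rule can be pictured with the following Python sketch. It assumes that the 255-column total for each section counts the repeated record identifying columns, which is one reading of the description above. The function name and the sample column names are hypothetical, and this is not the code HEDS itself uses.

```python
def split_into_sections(id_columns, selected_columns, limit=255):
    """Split selected columns into sections of at most `limit` columns.

    Each section starts with the record identifying columns, followed by
    the next run of selected columns in their original order.
    """
    per_section = limit - len(id_columns)
    if per_section <= 0:
        raise ValueError("limit must exceed the number of identifying columns")
    sections = []
    for start in range(0, len(selected_columns), per_section):
        chunk = selected_columns[start:start + per_section]
        sections.append(list(id_columns) + list(chunk))
    return sections


# Small limit so the effect is visible; HEDS uses 255.
sections = split_into_sections(["HHID"], ["Q1", "Q2", "Q3", "Q4", "Q5"], limit=3)
for section in sections:
    print(section)
```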

If you are selecting more than 255 data columns, consider downloading the complete data set package instead.

Important Download Information for Internet Explorer Users

The Internet Explorer (5.0+) browser does not process the customized download files in the same manner as the Netscape browser. It is recommended that, if you are using Internet Explorer (5.0+), you use one of the following methods to download the customized files.

Method 1

Method 2

For either method, you must then edit the saved file before using it. Open the saved file in your preferred editing software. You will see tab-delimited data fields, with double quotation marks (") around columns with text, preceded by header text such as Content-Type: text/tab separated values and Content-Disposition: attachment; filename=17420cs1.txt. This header text precedes each section of data within the file. Save each section as a separate file, without the header, so that it can be imported into other software.
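
The manual editing step can also be scripted. The Python sketch below splits a saved file into its sections and drops the header lines. The exact header layout (one Content-Type line, one Content-Disposition line, then a blank line) is an assumption made for this example, as are the sample rows, so check it against your own saved file before relying on it.

```python
import re

# Hypothetical saved file; the exact header layout is an assumption.
SAVED = (
    "Content-Type: text/tab separated values\n"
    "Content-Disposition: attachment; filename=17420cs1.txt\n"
    "\n"
    '"COL_NAME"\t"CHAR_VAL"\n'
    '"Q1"\t"1"\n'
    "Content-Type: text/tab separated values\n"
    "Content-Disposition: attachment; filename=17420cs2.txt\n"
    "\n"
    '"COL_NAME"\t"CHAR_VAL"\n'
    '"Q9"\t"2"\n'
)

FILENAME = re.compile(r"filename=(\S+)")


def split_sections(text):
    """Return {filename: data_text} with the header lines removed."""
    sections = {}
    current = None
    for line in text.splitlines():
        if line.startswith("Content-Type:"):
            current = None  # a new section starts; wait for its file name
            continue
        if line.startswith("Content-Disposition:"):
            match = FILENAME.search(line)
            current = match.group(1) if match else f"section{len(sections) + 1}.txt"
            sections[current] = []
            continue
        if current is not None and line:
            sections[current].append(line)
    return {name: "\n".join(lines) + "\n" for name, lines in sections.items()}


parts = split_sections(SAVED)
for name in parts:
    print(name)
```

Each value in the returned dictionary can then be written to a file of its own and imported as a normal tab-delimited file.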

If you use Internet Explorer, it may be easier to download the complete data set.

See Importing Customized Data Sets for further information.

DATA SET

HEDS data sets fall into three general categories: questionnaires, analytical results, and QA analytical results. Because studies may differ in the information collected, one format cannot be used for all data sets in a category. However, some consistency in approach has been followed.

In a data set containing questionnaire responses, each row of the data file represents one participant's responses. If the questionnaire is administered to a participant more than once, each row represents that participant's responses on a particular date.

In a data set file, the first set of data columns contains columns for participant identification, columns relating to the sampling design of the study, columns for sampling weights related to a probability-based sampling design if used, columns with dates relating to the administration of the questionnaire, and columns with miscellaneous record information. These are the record identifying columns, and they are repeated in any segment or section. The remaining data columns provide the responses to the questions in the order they appear in the questionnaire. At the end of all the response columns, there may be additional columns containing calculated data.

In many studies, the names of the data columns reflect the question number. To facilitate the use of these data files in multiple versions and types of software, the column names have been limited to eight characters beginning with an alphabetic character. The .dbf or .txt files include the column names and the data.
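
If you rename columns before importing, a quick check such as the following Python sketch can confirm that a name still fits the rule stated above (no more than eight characters, beginning with an alphabetic character). The function name is hypothetical, and the check deliberately makes no assumption about which characters are allowed after the first.

```python
def is_valid_column_name(name):
    """True if name has 1 to 8 characters and starts with an ASCII letter."""
    return 0 < len(name) <= 8 and name[0].isascii() and name[0].isalpha()


for candidate in ("RESPNAME", "Q12A", "1STQ", "LONGCOLUMN"):
    print(candidate, is_valid_column_name(candidate))
```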

A data set containing analytical results includes results for a similar set of analyzed samples, for example, metals in air or volatile organic compounds (VOCs) in water. The organization of analytical results across a study's data files is study-dependent. Each record in an analytical results data file represents all the results from one analysis of a sample. In most cases, the sample ID uniquely identifies the rows; in some cases, additional data columns may be required. (See the Primary Key column in the data dictionary.) The first group of data columns includes (1) various basic information associated with the study, such as sample ID, sampling medium, analyte (chemical) class, and participant ID; and (2) information related to the study's sampling design and any probability-based sampling weights. These are the record identifying columns, and they are repeated in any segment or section. The remaining data columns include the analytical results in groups of columns organized by analyte. One analyte's group of columns might include the concentration, the units of concentration, a detection or quantitation limit, a data quality flag, and a comment about the result for this sample. The data file from a metals analysis for a set of samples might include a group of columns each for lead, cadmium, chromium, and arsenic. Additional columns might include comments or ancillary information on the sample. The column names have been limited to eight characters beginning with an alphabetic character, and the .dbf or .txt files include the column names and the data.

A data set containing QA analytical results will include results only for a specific type of QA sample (blanks, duplicates, spikes) for an analyte class. The organization of QA analytical results across a study's data files is study-dependent. The organization of data columns in these data files is similar to that in the analytical result files described above. In addition these data files contain columns specific to the type of QA analyses included in the file, such as type of duplicate (field or analytical), percent recovery for spikes, and the sample ID for a replicate sample, if applicable. These analytical results are also grouped in columns by analyte. The column names have been limited to eight characters beginning with an alphabetic character, and the .dbf or .txt files include the column names and the data.

DATA DICTIONARY

The data dictionary provides characteristics of the data columns in a given data file. Each data dictionary file contains the same columns of descriptors. Each row in a data dictionary file provides information on one of the data columns in the data file. The rows in the data dictionary are in the same order as the data columns in the data file. The columns in the data dictionary are as follows:

COL_NAME -- column name, max 8 characters (see Notes below)

COLLABEL -- column label, max 40 characters (see Notes below)

EXT_DESC -- extended column description, max 250 characters (see Notes below)

DATATYPE -- column data type [number, integer, character, or string (alphanumeric)]; max 9 characters (NOTE: date is not an acceptable data type for HEDS. A predefined numeric format is used for dates and years. See Notes below.)

COLWIDTH -- maximum number of spaces taken up by data column, max 4 digits

COL_FMT -- column format, specifies a format for reading and printing the data in the column (see Notes below)

UNITS -- units, max 10 characters (blank if units are specified in a separate column)

PRMRYKEY -- primary key, max 1 character, contains "Y" or "P" if column is a primary key for the data file, blank otherwise

MINIMUM -- minimum permissible value, max 20 characters, present for numeric data column, does not include nonresponse code values. Permissible values are the values included in the data column that are considered usable. (See Notes below.)

MAXIMUM -- maximum permissible value, max 20 characters, present for numeric data column, does not include missing value codes. Permissible values are the values included in the data column that are considered usable. (See Notes below.)

MISSVALS -- list of the study-defined missing value or nonresponse codes for this column, separated by a semicolon (;) or a comma (,), max 40 characters, present if data column contains any nonresponse codes. Some analytical software tools, such as SAS and SPSS, allow a user to define missing values for a data column. This software option enables a user to include or not include in an analysis any rows that contain the assigned missing values.

COL_NUM -- a number that represents a relative ordering of the columns for the data file, not data set, associated with the data dictionary. This ordering will differ depending on whether the data file is a segment of the complete data set or a customized data set.

WT_NAME -- list of weight names associated with this data column, max 100 characters, present if study uses probability-based sampling weights and if weights can be applied to the data column. These names are links to study documentation describing the creation and use of weights.

WT_TYPE -- type of weight to be used for this data column, if more than one level of response, e.g., household and participant, is included in the data file, max 10 characters, not required but included if study uses probability-based sampling weights and if weights can be applied to the data column.

COMMENTS -- additional information about the data column, max 80 characters

CHECKSUM -- sum of all the values in the data column to be used as a check that all records in the data file have transferred correctly; a number, where size depends on data column; available for most numeric columns
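
As one way to use the CHECKSUM descriptor, the Python sketch below sums each numeric data column and compares the result with the value recorded in the data dictionary. It assumes the data and data dictionary rows have already been read into dictionaries keyed by column name, that blank cells are skipped, and that every nonblank value (including any nonresponse codes) is part of the sum. The sample rows and function names are invented.

```python
import math


def column_checksum(rows, column):
    """Sum every nonblank value in one data column."""
    total = 0.0
    for row in rows:
        value = row[column].strip()
        if value:
            total += float(value)
    return total


def verify_checksums(data_rows, dictionary_rows, rel_tol=1e-9):
    """Return the names of columns whose sum differs from CHECKSUM."""
    mismatches = []
    for entry in dictionary_rows:
        expected = entry.get("CHECKSUM", "").strip()
        if not expected:
            continue  # no checksum recorded for this column
        actual = column_checksum(data_rows, entry["COL_NAME"])
        if not math.isclose(actual, float(expected), rel_tol=rel_tol, abs_tol=1e-9):
            mismatches.append(entry["COL_NAME"])
    return mismatches


# Hypothetical data rows and data dictionary rows.
data = [{"HHID": "A001", "AGE": "34"}, {"HHID": "A002", "AGE": "51"}]
dictionary = [
    {"COL_NAME": "HHID", "CHECKSUM": ""},
    {"COL_NAME": "AGE", "CHECKSUM": "85"},
]
print(verify_checksums(data, dictionary))
```

A nonempty result suggests that some records were lost or altered in transfer, or that the import changed the values in that column.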

Notes

The following notes relate to the columns in a data dictionary file.

COL_NAME, COLLABEL and EXT_DESC are progressively detailed ways to describe what is contained in the data column. COL_NAME should be able to transfer with the data to provide the column name in most analytical software. Some software, such as SPSS and SAS, may provide options for using the COLLABEL. For both COL_NAME and COLLABEL, however, the number of characters allowed may not be adequate for a good description of the column's contents. Thus the extended column description is available and includes the actual phrasing of the question from a questionnaire, where available.

COL_FMT describes any formatting definitions for a data column. Decimal values include the decimal point. Specifications for COL_FMT are as follows:

MINIMUM, MAXIMUM, and MISSVALS help in understanding what to expect from the data in a column by specifying the end points of the permissible value range and the code values for nonresponses. They are available only for numeric data columns.
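
A minimal Python sketch of how these three descriptors might be applied follows. It splits a MISSVALS entry on semicolons or commas, then keeps only the values that are not nonresponse codes and that fall within the MINIMUM to MAXIMUM range. The function names and the sample values are hypothetical.

```python
import re


def parse_missvals(missvals):
    """Split a MISSVALS entry on semicolons or commas."""
    return [code.strip() for code in re.split(r"[;,]", missvals) if code.strip()]


def usable_values(values, missvals, minimum=None, maximum=None):
    """Keep numeric values that are not nonresponse codes and lie in range."""
    codes = {float(code) for code in parse_missvals(missvals)}
    kept = []
    for raw in values:
        if not raw.strip():
            continue  # skip blank cells
        number = float(raw)
        if number in codes:
            continue
        if minimum is not None and number < float(minimum):
            continue
        if maximum is not None and number > float(maximum):
            continue
        kept.append(number)
    return kept


print(parse_missvals("97; 98,99"))
print(usable_values(["34", "99", "51", "120"], "98;99", minimum="0", maximum="110"))
```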

The data dictionary .dbf or .txt files include the column names and the data dictionary information.

CODE SET

Code sets are a shortcut way of including responses in a data set. For example, a single number or character can be defined as a placeholder for the full response. Thus the number 1 can be assigned for the response "male," and the number 2 for "female." The numbers 1 and 2 are the code values; male and female are the code descriptions.

The code set file contains the code sets for all the data columns in the associated data file. A code set is available for each data column that contains code values, whether those code values represent descriptive responses or nonresponse categories. (Nonresponse categories include responses like "missing," "don't know," and "not applicable." Some software packages allow handling of nonresponses differently than permissible values when processing the data.) The code set for a given data column is identified by the name of the data column using it. If several columns use the same code set, a code set is included for each with the data column name as identifier. Each row in the code set contains information for a unique code value in a code set, including nonresponse codes. The code set file has the following format:

COL_NAME -- name of data column to which code values and descriptions apply, max 8 characters

CHAR_VAL -- code value for code set in character format, max 10 characters; exists for all code values

NUM_VAL -- code value for code set in numeric format; available only for numeric data columns

SHRTDESC -- short description associated with code value and for use in software package labeling, max 20 characters

EXT_DESC -- extended description associated with code value, max 250 characters, available when short description is not adequate for understanding

The code set .dbf or .txt files include the column names and the code set information.
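
The code set layout lends itself to a simple lookup table. The Python sketch below builds one from code set rows and uses it to translate a code value into its short description. The column name SEX and the nonresponse code 9 are invented for the example, the male/female codes follow the example given earlier in this section, and the function names are hypothetical.

```python
def build_code_lookup(code_rows):
    """Map each data column name to a {code value: short description} dict."""
    lookup = {}
    for row in code_rows:
        lookup.setdefault(row["COL_NAME"], {})[row["CHAR_VAL"]] = row["SHRTDESC"]
    return lookup


def decode(lookup, column, value):
    """Return the description for a code value, or the value itself if none."""
    return lookup.get(column, {}).get(str(value), value)


# Hypothetical code set rows; only three of the code set columns are shown.
code_rows = [
    {"COL_NAME": "SEX", "CHAR_VAL": "1", "SHRTDESC": "male"},
    {"COL_NAME": "SEX", "CHAR_VAL": "2", "SHRTDESC": "female"},
    {"COL_NAME": "SEX", "CHAR_VAL": "9", "SHRTDESC": "missing"},
]
lookup = build_code_lookup(code_rows)
print(decode(lookup, "SEX", "2"))
```

Values from columns without a code set pass through unchanged, so the same function can be applied to every column of a data row.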

Office of Research & Development | National Exposure Research Laboratory
Send questions or comments to Carry Croghan,
Webmaster at Croghan.Carry@epa.gov

