2. Where can I find the data files and list of data items that are available from NHANES 2007-2008?
List/description of Mobile Examination Center (MEC) exam data items
List/description of MEC lab data items
List/description of household interview data items
List/description of demographic data items
3. Why are there so many data files?
The data files have been separated to reduce the time needed to download
data and documentation from the Internet, and to make the files easier to
produce, edit, and validate. As a result, you must merge files together
for analysis. Please refer to the following
SAS code example to learn how to merge files together: NHANES data merge code example
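The SAS example referenced above is not reproduced here. As a rough illustration of what merging involves, the following Python sketch joins two small in-memory tables on a shared respondent identifier. The helper name `merge_on_key`, the use of `SEQN` as the key, the column names, and all values are assumptions chosen for illustration; they are not taken from the NHANES merge example.

```python
def merge_on_key(left, right, key="SEQN"):
    """Left-join two lists of dict records on a shared key (one row per key)."""
    # Index the right-hand table by key for constant-time lookups.
    right_by_key = {row[key]: row for row in right}
    merged = []
    for row in left:
        combined = dict(row)  # copy so the input records are not modified
        # Add the matching right-hand columns, if this participant has any.
        combined.update(right_by_key.get(row[key], {}))
        merged.append(combined)
    return merged


# Toy records (made-up values), standing in for a demographic file
# and an examination file that share a respondent identifier.
demo = [
    {"SEQN": 1, "RIDAGEYR": 34},
    {"SEQN": 2, "RIDAGEYR": 61},
    {"SEQN": 3, "RIDAGEYR": 12},
]
exam = [
    {"SEQN": 1, "BMXWT": 70.5},
    {"SEQN": 3, "BMXWT": 41.2},
]

merged = merge_on_key(demo, exam)
for row in merged:
    print(row)
```

A left join is used here so that every participant in the first table is kept, even when the second table has no record for them; whether that is the right choice depends on the analysis.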
4. On the NHANES
2007-2008 data page I see links for data. How do I access the data from
these links?
The Docs and Procedures files are in Adobe PDF format, so you should be able
to view these directly in your browser, if configured with Adobe Acrobat. A
PDF file can be saved from this view using the "File/Save As..." menu and
specifying a location on your local computer or network to store the file.
Or you can right-click the file name directly on the webpage and select
"Save Target As..." from the popup box, then specify a location to save the
file on your computer.
Clicking on the Data link will open a dialog box from which you can
specify a location to store the file (using the "Save" button) or open it
directly with SAS (using the "Open" button).
5. Next to the name of each questionnaire
section, laboratory component, or exam component on the
NHANES 2007-2008 data page there are links
that appear as follows: [Data, Docs, Procedures]. What are these links for?
In previous years, the files for each data section were made available as a
self-extracting zip file containing the codebook, documentation, file
frequencies, and SAS transport dataset. In addition, the codebook,
documentation, file frequencies, and SAS transport dataset were each
directly accessible through a separate link, in brackets, next to
the data section name.
Starting with data years 2003-2004, the documentation, codebook, and
frequencies have been combined in a single Adobe PDF file, accessible from
the single Docs link. A new, direct link to Procedures is also
being provided for each data section on the data page. For exam sections
this will link to the examination procedures manual for that section, and
for questionnaire sections this will link to the questionnaire instrument.
Procedure manuals for laboratory sections are also available, but as there
may be multiple documents for a given lab, these will remain on a separate
page, accessible by clicking the Laboratory Procedures Manuals link
at the top of the list of laboratory data sections. Because all data and
documentation files are directly accessible, self-extracting zip files will
no longer be required.
6. What format are the data files in? Can
they be used with SAS, SPSS, or STATA?
The files are in SAS transport file format. They can be used with any
package that supports this file format. For statistical/analytical packages
that do not support SAS transport file format, you need to convert the file
to a different format using an appropriate software package. Please note
that NHANES 2007-2008 is a complex probability sample and proper analysis of
the data usually requires statistical software that specifically
incorporates sample design complications such as weighting and clustering.
7. What are the different data formats?
Since 1999, data files have been released as SAS Transport files in .xpt
format. This includes 1999+ survey data, as well as newly released or
updated data files from NHANES III, II, and I. The SAS Transport files can
be opened directly as a temporary work file, or permanent libraries can be
created using SAS. Please see the
Continuous NHANES Web Tutorial for instructions. Also, the files can
be opened with the free SAS System Viewer and converted to other formats for
use with other software packages. Please see Question 9
for more information on this option.
NHANES III data released or updated after 1999 are available as SAS
Transport (.xpt) files and can be used like the continuous survey files.
NHANES III data released before 1999 were released as .dat files, which are
formatted ASCII data files (text files). Running the associated SAS code
creates a SAS dataset. Please see the
NHANES III Web Tutorial for instructions on using these data files.
Additionally, the text files can be used with other software packages.
Please see your software package's instructions for working with text files
(.dat or .txt).
NHANES II and I data files released or updated after 1999 are also available
as SAS Transport (.xpt) files and can be used like the continuous NHANES
data files. NHANES II and I data files released before 1999 are formatted
text files (.txt) wrapped in a self-extracting executable file (.exe).
Running the associated SAS code creates a SAS dataset from the text file.
Please see the NHANES
II Web Tutorial or NHANES I Web Tutorial
for instructions on how to do this. Additionally, the text files can be used
with other software packages. Please see your software package's
instructions for working with text files (.txt).
8. When will other data be available from
NHANES 2007-2008?
As other data are processed and become ready for public release, they will be
posted on the NHANES website. Certain data will only be available at the NCHS
Research Data Center (RDC). The RDC data
consist of adolescent data (people less than 20 years old), such as youth
conduct disorder, sexual behavior, drug use, alcohol use, and CDISC. Please
refer to the NHANES What's New page for
further details.
9. Do I need to use SAS software to view NHANES data?
No. You can view NHANES data with the SAS System Viewer, a free download
from SAS Institute. Currently, most NHANES data are available in the SAS
transport format (.xpt), which can be used in several statistical software
programs, including SUDAAN and SPSS. Users who want other data formats can
use the SAS System Viewer to convert the transport file into space-, tab-, or
comma-delimited text files for use in additional software programs, such as
Microsoft Excel.
10. Where can I find the analytic
guidelines (weighting, variance estimation, sample design)?
The analytic
guidelines provide information on the sample design and on
recommended methodologies for analyzing the data. In particular, the
guidelines provide information on how the sample persons were selected, how
the various survey weights were calculated, what particular survey weight
should be used to provide survey estimates, how to compute sampling
variances for those estimates, and recommended sample sizes for analysis.
11. Will data and weights be available on
public use files for single years such as 1999, 2000, 2001, or 2002?
No.
12. Will data and weights be available on
public use files in combined datasets for three year and six year periods
such as 1999-2001, 2002-2004, 2006-2008, 1999-2004, 2001-2006, or 2003-2008?
No. Continuous NHANES data are grouped into two-year periods for public
release (e.g., 1999-2000, 2001-2002, 2003-2004). Combining two or more
two-year periods is possible (e.g., 1999-2002). The two-year sample
weights should be used for NHANES 1999-2000, NHANES 2001-2002, NHANES
2003-2004, and NHANES 2005-2006 analyses, respectively. The four-year sample
weights should be used for combined analyses of NHANES 1999-2000 & NHANES
2001-2002 data.
Researchers should calculate six-year sample weights for NHANES 1999-2004 as
follows. The weights for the first two datasets (NHANES 1999-2002) are
already averaged into a four-year sample weight, so the six-year weight is
WT99-04 = (2/3) x WT99-02 + (1/3) x WT03-04
where WT99-02 is the variable WTMEC4YR from the NHANES 2001-2002 demographic
file dataset, and WT03-04 is the variable WTMEC2YR from the NHANES 2003-2004
demographic file dataset.
Eight-year sample weights for NHANES 1999-2006 should be calculated in the
same way as the six-year sample weight:
WT99-06 = (1/2) x WT99-02 + (1/4) x WT03-04 + (1/4) x WT05-06
where WT05-06 is the variable WTMEC2YR from the NHANES 2005-2006 demographic
file dataset.
Six-year sample weights for 2001-2006 can be calculated by combining the
2-year weights found in the demographic files. For example:
WT01-06 = (1/3) x WT01-02 + (1/3) x WT03-04 + (1/3) x WT05-06
Please refer to the NHANES Analytic
Guidelines provided with the data release files to determine the
appropriate methodology for analyses of combined years of data.
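The weight formulas above can be sketched in Python as follows. This is a minimal illustration, not the method prescribed by the Analytic Guidelines. It assumes that each participant belongs to exactly one survey cycle, so only the term for that cycle applies to a given record. That reading, the function name `combined_weight`, the cycle labels, and the example weight are all assumptions made for illustration.

```python
# Fraction of the combined period contributed by each release, taken from
# the formulas in the text. The dictionary keys are hypothetical labels.
SIX_YEAR_1999_2004 = {"1999-2002": 2 / 3, "2003-2004": 1 / 3}
EIGHT_YEAR_1999_2006 = {"1999-2002": 1 / 2, "2003-2004": 1 / 4, "2005-2006": 1 / 4}
SIX_YEAR_2001_2006 = {"2001-2002": 1 / 3, "2003-2004": 1 / 3, "2005-2006": 1 / 3}


def combined_weight(cycle, weight, fractions):
    """Scale one participant's sample weight by the fraction for their cycle.

    `weight` is the released weight for that cycle: the four-year weight for
    "1999-2002", otherwise the two-year weight.
    """
    if cycle not in fractions:
        raise ValueError(f"cycle {cycle!r} is not part of this combined period")
    return fractions[cycle] * weight


# Made-up two-year weight for a 2003-2004 participant, rescaled for a
# combined 1999-2004 analysis.
example = combined_weight("2003-2004", 30000.0, SIX_YEAR_1999_2004)
print(example)
```

In each table the fractions sum to 1, so the rescaled weights continue to represent a single population rather than several populations stacked together.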
13. What is the sample size for a
particular data item, questionnaire section, examination component, or
laboratory analyte?
For any particular questionnaire section, examination component, or
laboratory data file, you will only find records for survey participants who
were eligible. For example, suppose 6,000 people were eligible for an
examination in the MEC and only 5,000 were eligible for the muscular
strength component due to age restrictions. Of the 5,000 suppose only 4,500
participated in the examination; the other 500 either refused or did not
have enough time to participate in the exam. The data file would have 5,000
records with 500 records having missing data. For further details refer to
the "frequency" counts document for each of the data files.
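The distinction between the number of records and the number of usable values in the example above can be sketched as follows. The field name `strength` and the placeholder values are hypothetical; only the counts come from the example.

```python
# 5,000 eligible participants appear in the file; 4,500 have a result and
# 500 have a missing value (represented here as None).
records = [{"SEQN": i, "strength": 50.0} for i in range(4500)]
records += [{"SEQN": i, "strength": None} for i in range(4500, 5000)]

n_records = len(records)
n_missing = sum(1 for r in records if r["strength"] is None)
n_with_data = n_records - n_missing

print(n_records, n_missing, n_with_data)
```

The sample size available for an analysis of this item is the count of records with data, not the number of records in the file.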
14. Where can I find a description of the codebook contents?
Documentation Contents
15. How do I determine the skip patterns
for a questionnaire section?
The first step is to review all of the documentation for the questionnaires.
To review skip patterns look at the complete questionnaire instrument. Please
note that not all questionnaire items are released due to small sample sizes
and confidentiality/sensitivity issues, but all skip pattern integrity was
maintained and validated.
16. How are missing values, "blank but
applicable", "don't know" and other values coded?
There are codes for refused (7-fill: that is, 7, or 77, or 777, …, depending
on the number of digits required for a particular data value) and don't know
(9-fill). Missing values (a blank field) mean that the person was not
asked the question or given the test. There is no longer a specific code for
those cases where the variable response is “blank but applicable”; for such
cases the values are designated as missing values. For laboratory data there
are special considerations. When a laboratory value was less than the lower
limit of detection (LOD), a “fill” value based on the LOD was used instead
of the sample value, as the sample value was deemed “not detectable.” An
indicator variable (taking the value 0 or 1) is used to identify which values
are real and which values are fill values.
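As an illustration of the 7-fill and 9-fill convention, the following Python sketch classifies a raw coded value given the number of digits in its field. The function name `classify_response` and the idea of passing the field width explicitly are assumptions for illustration. In practice, the codebook for each variable should be checked, because a 7 or 9 may be a legitimate value for some items.

```python
def classify_response(value, width):
    """Classify a raw coded value for a field that is `width` digits wide.

    `value` is a number, or None / "" for a blank field.
    """
    if value is None or value == "":
        return "missing"  # blank field: not asked the question or given the test
    code = int(value)
    if code == int("7" * width):
        return "refused"  # 7-fill: 7, 77, 777, ...
    if code == int("9" * width):
        return "don't know"  # 9-fill: 9, 99, 999, ...
    return "valid"


# A two-digit field: 77 is the refused code, but a plain 7 is an ordinary value.
for raw in (77, 7, 99, None):
    print(raw, classify_response(raw, 2))
```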
17. I have questions about using the data,
protocols, etc. Where can I get help?
First, and most important, refer to the questionnaire, exam component, or
laboratory descriptions. A second option is to contact NCHS by using
the Contact Us online
feature. If you need help beyond this, you can pose your question to the NHANES listserv. Please note, however,
that the NHANES program staff do not routinely provide technical responses
to questions posted to the listserv.
18. Why isn't the adolescent data on
alcohol use, smoking, sexual behavior, reproductive health and drug use
available as a public release file?
These files have not been released on the NHANES website due to
confidentiality concerns. Adolescent data files containing this sensitive
information will be made available at the NCHS Research Data Center.