III. Data Set Requests
Responsibilities of Investigators Seeking Access to Study Data
Responsibilities of Study Investigators in Preparing Data Sets
- Data Storage and Format
- Content of Limited Access Data
- Timing of Release of Limited Access Data
IV. Procedures for Protection of Privacy for Limited Access Data Sets
Institute Review and Approval of Limited Access Preparation
Guidelines for Limited Access Preparation
Note that the following policy is an update from the
10/25/2002 policy available at:
For contract-supported clinical trials and
epidemiology studies: Requirements for preparation of limited access data sets
have been modified to shorten the timeline and expand the data to be included,
as described below. These changes will be effective with contracts awarded on
or after October 1, 2005.
For grant-supported new and competing applications for
selected epidemiology studies and clinical trials: Applications received on or
after October 1, 2005 will be expected to include provisions for submission of
limited access data sets as described below. Applicants are expected to include
the costs of limited access data set preparation, with appropriate
justification, in their budget requests. Funds awarded for limited access data
set preparation will be restricted for use solely for that purpose and only
upon release by the NHLBI.
In general the following types of studies will be
included under this policy:
* Clinical trials and epidemiology studies that are
supported by the U01 (cooperative agreement) mechanism AND have 500 or more
participants
* Trials or studies requesting $500,000 direct costs
or more in any one year and identified as being of high programmatic interest
to the NHLBI, as indicated in the Institute's letter of agreement to accept
assignment of the application
* Ancillary studies based on clinical trials or
epidemiology studies that are required by this policy to provide limited access
data sets
Requests for exceptions to these guidelines will be
considered by the NHLBI if adequately justified. Examples of adequate
justification include: unavoidable and unanticipated delays in making data
available within the parent study for analysis; presence of provisions in
informed consent prohibiting LADS release; evidence of unacceptability of LADS
release to communities under study; measurements on too small a subset of
participants to be of scientific value. All such requests should be addressed
to the Director of the Program Division funding the award.
Policies for data sharing from studies of American
Indian or Alaska Native tribes and other sovereign entities will be developed
with them and will be provided as available at a later date.
The National Heart, Lung, and Blood Institute (NHLBI)
has supported data collection from participants in numerous clinical trials and
epidemiologic studies. These data from well-characterized population samples
constitute an important scientific resource. It is the view of the NHLBI that
their full value can only be realized if they are made available, under
appropriate terms and conditions consistent with the informed consent provided
by individual participants, in a timely manner to the largest possible number
of qualified investigators.
Under no circumstances will data relating to an
individual be distributed in any way that is inconsistent with his or her
informed consent. Data sets without an informed consent permitting use by
non-study researchers will only be released if the requester's IRB has approved
a waiver of informed consent based on minimal risk to the participants [see
Institutional Review Board section].
Data sets distributed under this policy include only
limited access data, i.e., records with personal identifiers and
other variables that might enable individual participants to be identified,
such as outliers, dates, and study sites, removed or otherwise modified.
Because it may still be possible to combine the limited access data with other
publicly available data and thereby determine with reasonable certainty the
identity of individual participants, these data sets are not truly anonymous.
They are, therefore, only provided to investigators who agree in advance to
adhere to established policies for distribution.
Limited access data sets are available for NHLBI
studies supported by contract and for selected studies supported by cooperative
agreements or other grants. However, data will not be provided for limited
access if the Institute deems them to be unreliable or invalid. All proposed
data exclusions must be strongly justified, and each one, whether proposed by
the study investigators or Institute staff, must be reviewed and approved by the
director of the NHLBI program division that sponsored the study.
Data - Information collected and recorded
from study participants through periodic examinations and follow-up contacts,
not to include original specimens or images.
Commercial purpose - Data will be considered
as being for a commercial purpose if they are to be used by an investigator who
is an employee of a for-profit organization, if they are to be used by an
investigator to satisfy a contractual relationship with a for-profit
organization, or if they are to be used by an investigator as the basis for a
consulting relationship with a for-profit organization. Data will also be
considered as being for a commercial purpose if the investigator(s) take any
affirmative steps to facilitate commercial use of results derived from the
data.
Non-Commercial Purpose Data Set - A data
set consisting of all records except those for participants who requested that
their data not be shared beyond the initial study investigators.
Commercial Purpose Data Set - A data set
consisting of all records except those for participants who requested that
their data not be shared beyond the initial study investigators or used for
commercial purposes.
Non-Commercial Purpose Pedigree/Genetic Data
Set - A pedigree/genetic data set consisting of all pedigree and
genetic data except those for participants who requested that their data not be
shared beyond the initial study investigators.
Commercial Purpose Pedigree/Genetic Data Set -
A pedigree/genetic data set consisting of all pedigree and genetic data
except those for participants who requested that their data not be shared
beyond the initial study investigators or used for commercial purposes.
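The relationship between the two data sets defined above can be illustrated with a minimal sketch. It assumes records are held as dictionaries keyed by a hypothetical "participant_id" field and that the two opt-out lists are available as collections of identifiers; none of these names come from the policy itself.

```python
def build_limited_access_sets(records, no_sharing_ids, no_commercial_ids):
    """Split records into the Non-Commercial and Commercial Purpose Data Sets.

    records: list of dicts with a "participant_id" key (hypothetical layout).
    no_sharing_ids: participants who asked that their data not be shared
        beyond the initial study investigators.
    no_commercial_ids: participants who asked that their data not be used
        for commercial purposes.
    """
    no_sharing = set(no_sharing_ids)
    no_commercial = set(no_commercial_ids)

    # Non-commercial set: everyone except those who refused any sharing.
    non_commercial_set = [
        r for r in records if r["participant_id"] not in no_sharing
    ]
    # Commercial set: additionally drop those who refused commercial use.
    commercial_set = [
        r for r in non_commercial_set
        if r["participant_id"] not in no_commercial
    ]
    return non_commercial_set, commercial_set
```

Under these assumptions the Commercial Purpose Data Set is always a subset of the Non-Commercial Purpose Data Set, which matches the definitions above. The same filter would apply to the pedigree/genetic data sets.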
III. Data Set Requests
Responsibilities of Investigators Seeking Access to Study Data
To ensure that the confidentiality and privacy of
study participants are protected, all investigators seeking access to data from
NHLBI-supported studies that are in the possession of the Institute must
execute and submit with their requests the appropriate standard Distribution
Agreement for each study. Because of the potential for identification of
individual participants and consistent with the conditions included in the
Distribution Agreements, investigators seeking access to study data must also
submit an approval from their Institutional Review Board (IRB). An expedited
review from the IRB is acceptable.
Unless a specific request for the Non-Commercial
Purpose Data Set is received, investigators requesting access to study data
will be provided with the Commercial Purpose Data Set. Investigators seeking
access to the Non-Commercial Purpose Data Set must submit a signed statement
affirming that they will not be using the data for a commercial purpose as
defined above. Investigators who do so must recognize that if they subsequently
develop results of potential commercial value, they will have to replicate
those results using the Commercial Purpose Data Set before they can take any
affirmative steps to facilitate commercial use of the results.
Investigators interested in receiving a
Pedigree/Genetic Data Set must specifically request it. Investigators seeking
access to a Pedigree/Genetic Data Set must describe the specific need for
access to it in the Research Project description of their signed Data
Distribution Agreement. Investigators using these data sets are strongly
discouraged from publishing individual pedigree structures and are prohibited
from investigation into issues such as non-paternity.
Investigators should recognize that they are bound
by the conditions of the relevant study Distribution Agreement. Failure to
comply with it could result in denial of further access to NHLBI data sets.
Moreover, violation of the confidentiality requirements in a Distribution
Agreement may lead to legal action against the recipients of the data by study
participants, their families, or the U.S. Government.
Responsibilities of Study Investigators in Preparing Data Sets
Investigators in NHLBI studies covered by this
policy are required as part of the terms and conditions of their awards to
prepare and deliver to the NHLBI limited access data sets that satisfy NHLBI
requirements. Included among them are documentation, elimination of personal
identifiers, and modification of other data elements so as to reduce the
likelihood that any individual participant can be identified.
Two limited access data sets, i.e., a Non-Commercial
Purpose Data Set and a Commercial Purpose Data Set, and, if applicable, two
pedigree/genetic data sets, i.e., a Non-Commercial Purpose Pedigree/Genetic
Data Set and a Commercial Purpose Pedigree/Genetic Data Set, and associated
documentation, must be provided in electronic form to the Institute. In
addition, investigators must provide the Institute with two separate lists of
participant identification numbers, one consisting of those participants who
asked that their data not be shared beyond the initial study investigators
and the other of those participants who asked that their data not be used for
commercial purposes.
Investigators in ancillary studies based on ongoing
(parent) studies that are required by this policy to produce limited access
data sets must submit ancillary study data to the NHLBI through the parent
study Coordinating Center or limited access data submission process established
by the parent study. Ancillary studies conducted on small subsets of a study
sample may be appropriate for exclusion from limited access data sets; requests
for their exclusion should be justified and addressed as described in the
- Documentation: Documentation for
limited access data sets must be comprehensive and sufficiently clear to enable
investigators who are not familiar with a data set to use it. The documentation
must include data collection forms, study procedures and protocols,
descriptions of all variable recoding performed, and a list of major study
In addition, a summary documentation file, usually called
a "readme" file, is required. It must provide a complete overview of the data
and a description of their use for investigators who are not familiar with the
data set. It must also contain a brief description of the study (including a
general orientation to the study, its components, and its examination and
follow-up timeline), a listing of all limited access files being provided, a
description of system requirements, program code for generating a
SAS file from the SAS export data file, and a frequency distribution for
selected key variables.
- Data Storage and Format: The data
are to be stored on a CD-ROM unless the investigators and the NHLBI mutually
agree upon an alternative storage medium. Both the comprehensive documentation
and the summary documentation must be prepared in a consistent format, as a
WordPerfect, MS Word, ASCII, or Portable Document Format (PDF) file, and
included on the same storage medium as the limited access data set. To ensure
access by users with disabilities, all PDF files must be created in Adobe
Acrobat version 5.0 or higher. Documentation that is not available in
electronic form, such as data collection forms, should be scanned into a
graphics file, converted to a PDF file using Adobe Acrobat version 5.0 or
higher, and saved on the same medium as the data set. Pedigree data should be
provided in a format readable by standard genetic analysis programs such as
SAGE and SOLAR, with one individual's data per line beginning with pedigree
identifier, individual's id, father's id, mother's id, and individual's
- Content of Limited Access Data: In
addition to summary information, limited access data sets also include for each
participant those raw data elements (e.g., food item data or individual
electrocardiographic lead scores) that have not otherwise been processed
into summary information.
a. Clinical Trials: Included are
baseline, interim visit, ancillary, and outcome data, along with
laboratory measurements not otherwise summarized.
b. Observational Epidemiology Studies: Included are
all of the examination data obtained in each examination
cycle, ancillary data, and/or all of the follow-up information available up to
the last follow-up cycle cutoff date.
- Timing of Release of Limited Access Data
a. Clinical Trials: Data are prepared
by the study coordinating center and sent to the NHLBI after publication of the
primary clinical trial results. They are available for release once they are
received and checked by the NHLBI. The data sets must be submitted to the NHLBI
no later than 3 years after the final visit of the participants to their
clinical trial sites or 2 years after the main paper of the trial has been
published, whichever comes first.
b. Observational Epidemiology Studies:
Epidemiology studies typically have an examination component and a
mortality/morbidity follow-up component. Data from each cycle of an examination
or follow-up component are prepared by the study coordinating center and sent
to the NHLBI for distribution as a limited access data set no later than 3
years after the completion of each examination or follow-up cycle or 2 years
after the baseline, follow-up, genetic, ancillary study, or other data set is
finalized within the study for analysis for use in publication, whichever comes
first.
c. Ancillary Studies: In those cases
in which the timeline for an ancillary study differs from that of its parent
study, the release date will relate to the timeline of the ancillary study.
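The submission deadlines above reduce to simple date arithmetic. The following is a minimal sketch, not an official calculator: it takes the earlier of "3 years after completion" and "2 years after publication or finalization", as stated for clinical trials, and assumes the same rule for each epidemiology cycle. The function names and the handling of February 29 are assumptions made for illustration.

```python
from datetime import date


def add_years(d, years):
    """Return the same calendar day a number of years later."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # February 29 in a non-leap target year: fall back to February 28
        # (an assumption; the policy does not address this case).
        return d.replace(year=d.year + years, day=28)


def submission_deadline(completion_date, finalized_or_published=None):
    """Latest date by which a limited access data set must reach the NHLBI.

    completion_date: final participant visit (clinical trial) or completion
        of an examination or follow-up cycle (epidemiology study).
    finalized_or_published: publication date of the main trial paper, or the
        date the data set was finalized for analysis; None if not yet known.
    """
    by_completion = add_years(completion_date, 3)
    if finalized_or_published is None:
        return by_completion
    # "Whichever comes first": take the earlier of the two candidate dates.
    return min(by_completion, add_years(finalized_or_published, 2))
```

For example, a trial whose final visit was on June 30, 2006 and whose main paper appeared on March 15, 2007 would have a deadline of March 15, 2009 under this sketch, because the 2-year publication rule falls before the 3-year completion rule.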
IV. Procedures for Protection of Privacy for Limited Access Data Sets
Institute Review and Approval of Limited Access Preparation
The NHLBI requires that the data be provided in a
manner that protects the privacy of study participants. The Institute requires
appropriate documentation of the steps taken to protect their privacy in
preparing a limited access data set. A summary of all proposed modifications
and deletions to be made to a data set in preparing it for limited access must
be submitted to and approved by the director of the division that sponsored the
study prior to their implementation.
Guidelines for Limited Access Preparation
The following guidelines provide a framework for
decision-making regarding preparation of limited access data sets:
- All data for participants who refused to permit
sharing their data with other researchers must be deleted from the
Non-Commercial Purpose Data Set.
- All data for participants who refused only to
permit sharing their data for commercial purposes must also be deleted from the
Commercial Purpose Data Set.
- Participant identifiers:
a. Obvious identifiers (e.g., name, addresses,
social security numbers, place of birth, city of birth, contact data) must be
deleted.
b. New identification numbers must replace original
identification numbers. Codes linking the new and original data should be sent
to the NHLBI in a separate file, not included on the CD-ROM, so that linkage
may be made if necessary for future research.
- Variables that might lead to the identification
of participants and of centers in multicenter studies, or variables that are
sensitive, inaccurate, or of limited scientific utility:
a. Clinical center identifier -- In trials or
studies that have only a few centers and relatively few participants per
center, the data set should not contain center identifiers. In trials that have
either many centers or a large number of participants per center, the data may
offer little possibility of identifying individuals. For such trials, the
investigators and the NHLBI will determine whether to include center identifiers
on a case-by-case basis.
b. Interviewer or technician identification numbers
must be recoded or deleted.
c. Sensitive data, including illicit drug use, risky
behaviors (e.g., carrying a gun or exhibiting violent behavior), sexual
behaviors, and selected medical conditions (e.g., alcoholism, HIV/AIDS) must be
d. Regional variables with little or no variation
within a center because they could be used to identify that center must be
e. Unedited, verbatim responses that are stored as
text data (e.g., specified in "other" category) must be deleted.
f. Pedigree and genetic data will be distributed in
separate data sets only to investigators specifically requesting them.
Genotyping data for any person in whom potential pedigree errors are detected
must be deleted.
- Dates: All dates should be coded relative to a
specific reference point (e.g., date of randomization or study entry). This
provides privacy protection for individuals known to be in a study who are
known to have had some significant event (e.g., a myocardial infarction) on a
- Variables with low frequencies for some values
that might be used to identify participants may be recoded. These might
include:
a. Socioeconomic and demographic data (e.g., marital
status, occupation, income, education, language, number of years married).
b. Household and family composition (e.g., number in
household, number of siblings or children, ages of children or step-children,
number of brothers and sisters, relationships, spouse in study).
c. Numbers of pregnancies, births, or multiple
children within a birth.
d. Anthropometry measures (e.g., height, weight,
waist girth, hip girth, body mass index).
e. Physical characteristics (e.g., missing
f. Detailed medication, hospitalization, and cause
of death codes, especially those related to sensitive medical conditions as
listed above, such as HIV/AIDS or psychiatric disorders.
g. Prior medical conditions with low frequency
(e.g., group specific cancers into broader categories) and related questions
such as age at diagnosis and current status.
h. Parent and sibling medical history (e.g.,
parents' ages at death).
- Race/ethnicity and sex information when very few
participants are in certain groups or cells.
a. Polychotomous variables: values or groups should
be collapsed so as to ensure a minimum number of participants (e.g., at least
20) for each value within each race-sex cell.
b. Continuous variables: distributions should be
truncated if needed to ensure that a minimum number of participants (e.g., at
least 20) have the same highest and lowest values in each race-sex cell.
c. Dichotomous variables: data should either be
grouped with other related variables so as to ensure a minimum number of
participants (e.g., at least 20) in each race-sex cell or deleted.
- The investigators may realize that other
variables may make it easy to identify individuals. All such variables should
be recoded or removed. The NHLBI program officer or project administrator
should be consulted concerning such variables.
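Several of the guidelines above (relative dates, collapsing of low-frequency values, truncation of continuous distributions) can be sketched in code. The functions below are an illustration under stated assumptions, not a prescribed procedure: the function names, the "other" label, and the default threshold of 20 (taken from the "e.g., at least 20" wording) are hypothetical, and each function is meant to be applied to the values within a single race-sex cell.

```python
from collections import Counter
from datetime import date


def days_from_reference(event_date, reference_date):
    """Code a date as days elapsed since a reference point
    (e.g., date of randomization or study entry)."""
    return (event_date - reference_date).days


def collapse_rare_values(values, min_count=20, other_label="other"):
    """Pool categories held by fewer than min_count participants."""
    counts = Counter(values)
    rare = {v for v, n in counts.items() if n < min_count}
    return [other_label if v in rare else v for v in values]


def top_bottom_code(values, min_count=20):
    """Truncate a continuous variable so that at least min_count
    participants share the lowest value and the highest value."""
    if len(values) < 2 * min_count:
        # Too few participants for this sketch to truncate both tails.
        raise ValueError("cell is too small to truncate")
    ordered = sorted(values)
    low = ordered[min_count - 1]   # the min_count-th smallest value
    high = ordered[-min_count]     # the min_count-th largest value
    return [min(max(v, low), high) for v in values]
```

Note that the pooled group produced by collapse_rare_values may itself fall below the threshold, so its size should be checked again after collapsing. Any actual recoding plan would still need the review and approval described earlier in this section.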
June 27, 2005