I. Introduction
II. Definitions
III. Data Set Requests
A. Responsibilities of Investigators Seeking Access to Study Data
B. Responsibilities of Study Investigators in Preparing Data Sets
- Documentation
- Data Storage and Format
- Content of Limited Access Data
- Timing of Release of Limited Access Data
IV. Procedures for Protection of Privacy for Limited Access Data Sets
A. Institute Review and Approval of Limited Access Preparation
B. Guidelines for Limited Access Preparation
I. Introduction
The National Heart, Lung, and Blood Institute (NHLBI)
has supported data collection from participants in numerous clinical trials and
epidemiologic studies. These data from well-characterized population samples
constitute an important scientific resource. It is the view of the NHLBI that
their full value can only be realized if they are made available, under
appropriate terms and conditions consistent with the informed consent provided
by individual participants, in a timely manner to the largest possible number
of qualified investigators.
Under no circumstances will data relating to an
individual be distributed in any way that is inconsistent with his or her
informed consent. Data sets without an informed consent permitting use by
non-study researchers will only be released if the requester's IRB has approved
a waiver of informed consent based on minimal risk to the participants [see
Institutional Review Board section].
Data sets distributed under this policy include only
"limited access data", i.e., records with personal identifiers and
other variables that might enable individual participants to be identified,
such as outliers, dates, and study sites, removed or otherwise modified.
Because it may still be possible to combine the limited access data with other
publicly available data and thereby determine with reasonable certainty the
identity of individual participants, these data sets are not truly anonymous.
They are, therefore, only provided to investigators who agree in advance to
adhere to established policies for distribution.
Limited access data sets are available for NHLBI
studies supported by contract and for selected studies supported by cooperative
agreements or other grants. However, data will not be provided for limited
access if the Institute deems them to be unreliable or invalid. All proposed
data exclusions must be strongly justified, and each one, whether proposed by
the study investigators or Institute staff, must be reviewed and approved by the
director of the NHLBI program division that sponsored the study.
II. Definitions
Commercial purpose - Data will be considered
as being for a commercial purpose if they are to be used by an investigator who
is an employee of a for-profit organization, if they are to be used by an
investigator to satisfy a contractual relationship with a for-profit
organization, or if they are to be used by an investigator as the basis for a
consulting relationship with a for-profit organization. Data will also be
considered as being for a commercial purpose if the investigator(s) take any
affirmative steps to facilitate commercial use of results derived from the
data.
Non-Commercial Purpose Data Set - A data
set consisting of all records except those for participants who requested that
their data not be shared beyond the initial study investigators.
Commercial Purpose Data Set - A data
consisting of all records except those for participants who requested that
their data not be shared beyond the initial study investigators or used for
commercial purposes.
III. Data Set Requests
A. Responsibilities of Investigators Seeking Access to Study Data
To ensure that the confidentiality and privacy of
study participants are protected, all investigators seeking access to data from
NHLBI-supported studies that are in the possession of the Institute must
execute and submit with their requests the appropriate standard Distribution
Agreement for each study. Because of the potential for identification of
individual participants and consistent with the conditions included in the
Distribution Agreements, investigators seeking access to study data must also
submit an approval from their Institutional Review Board (IRB). An expedited
review from the IRB is acceptable.
Unless a specific request for the Non-Commercial
Purpose Data Set is received, investigators requesting access to study data
will be provided with the Commercial Purpose Data Set. Investigators seeking
access to the Non-Commercial Purpose Data Set must submit a signed statement
affirming that they will not be using the data for a commercial purpose as
defined above. Investigators who do so must recognize that if they subsequently
develop results of potential commercial value, they will have to replicate
those results using the Commercial Purpose Data Set before they can take any
affirmative steps to facilitate commercial use of the results.
Investigators should recognize that they are bound
by the conditions of the relevant study Distribution Agreement. Failure to
comply with it could result in denial of further access to NHLBI data sets.
Moreover, violation of the confidentiality requirements in a Distribution
Agreement may lead to legal action against the recipients of the data by study
participants, their families, or the U.S. Government.
B. Responsibilities of Study Investigators in Preparing Data Sets
Investigators in NHLBI studies supported by contract
and selected other NHLBI-supported observational studies and clinical trials
are required as part of the terms and conditions of their awards to prepare and
deliver to the NHLBI limited access data sets that satisfy NHLBI requirements.
Included among them are documentation, elimination of personal identifiers, and
modification of other data elements so as to reduce the likelihood that any
individual participant can be identified. Limited access data and associated
documentation must be provided in electronic form.
The investigators must provide the Institute with
two limited access data sets, i.e., the Non-Commercial Purpose Data Set and the
Commercial Purpose Data Set. In addition, the investigators must provide the
Institute with two separate lists of patient identification numbers, one
consisting of those participants who asked that their data not be shared
beyond the initial study investigators and another of those participants who
asked that their data not be used for commercial purposes.
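The relationship between the two exclusion lists and the two data sets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a dictionary with an `id` key), the function name `build_limited_access_sets`, and the sample values are all hypothetical and are not prescribed by this policy.

```python
def build_limited_access_sets(records, no_sharing_ids, no_commercial_ids):
    """Return (non_commercial_set, commercial_set) as lists of records.

    records           -- iterable of dicts, each with an "id" key (assumed layout)
    no_sharing_ids    -- participants who asked that their data not be shared
                         beyond the initial study investigators
    no_commercial_ids -- participants who asked that their data not be used
                         for commercial purposes
    """
    no_sharing = set(no_sharing_ids)
    no_commercial = set(no_commercial_ids)
    # Non-Commercial Purpose Data Set: everyone except the "no sharing" group.
    non_commercial = [r for r in records if r["id"] not in no_sharing]
    # Commercial Purpose Data Set: additionally drop the "no commercial use" group.
    commercial = [r for r in non_commercial if r["id"] not in no_commercial]
    return non_commercial, commercial


# Hypothetical sample: participant 2 refused all sharing,
# participant 3 refused commercial use only.
records = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
non_commercial, commercial = build_limited_access_sets(
    records, no_sharing_ids=[2], no_commercial_ids=[3]
)
print([r["id"] for r in non_commercial])
print([r["id"] for r in commercial])
```

Under these assumptions, the Commercial Purpose Data Set is always a subset of the Non-Commercial Purpose Data Set, which matches the definitions in Section II.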
- Documentation: Documentation for
limited access data sets must be comprehensive and sufficiently clear to enable
investigators who are not familiar with a data set to use it. The documentation
must include data collection forms, study procedures and protocols,
descriptions of all variable recoding performed, and a list of major study
publications.
In addition, a summary documentation file, usually called
a "readme" file, is required. It must provide a complete overview of the data
and a description of their use for investigators who are not familiar with the
data set. It must also contain a brief description of the study (including a
general orientation to the study, its components, and its examination and
follow-up timeline), a listing of all limited access files being provided, a
description of system requirements, the program code for generating a
SAS file from the SAS export data file, and a frequency distribution for
selected key variables.
- Data Storage and Format: The data
are to be stored on a CD-ROM unless the investigators and the NHLBI mutually
agree upon an alternative storage medium. Both the comprehensive documentation
and the summary documentation must be prepared in a consistent format, as a
WordPerfect, MS Word, ASCII, or portable document format (PDF) file, and
included on the same storage medium as the limited access data set. To ensure
access by users with disabilities, all PDF files must be created in Adobe
Acrobat version 5.0 or higher. Documentation that is not available in
electronic form, such as data collection forms, should be scanned into a
graphics file, converted to a PDF file using Adobe Acrobat version 5.0 or
higher, and saved on the same medium as the data set.
- Content of Limited Access Data: In
addition to summary information, limited access data sets also include for each
participant those raw data elements (e.g., food item data or individual
electrocardiographic lead scores) that have not otherwise been processed
into summary information.
a. Clinical Trials: Included are
baseline, interim visit, and outcome data, along with laboratory measurements
not otherwise summarized.
b. Observational Epidemiology Studies:
Included are all of the examination data obtained in each examination
cycle, and/or all of the follow-up information available up to the last
follow-up cycle cutoff date.
- Timing of Release of Limited Access Data
a. Clinical Trials: Data are
prepared by the study coordinating center and sent to the NHLBI after
publication of the primary clinical trial results. They are available for
release once they are received and checked by the NHLBI. The data sets must be
submitted to the NHLBI no later than 3 years after the publication of the
primary outcome paper.
b. Observational Epidemiology Studies:
Epidemiology studies typically have an examination component and a
mortality/morbidity follow-up component.
i. Examination Component:
Data from each cycle of an examination component are prepared by the study
coordinating center and sent to the NHLBI for distribution as a limited access
data set no later than 5 years after the last patient visit of that cycle.
ii. Follow-up Component: Data
from a follow-up component are prepared by the study coordinating center and
sent to the NHLBI for distribution no later than 5 years after the last
follow-up cycle cutoff date.
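The submission deadlines above reduce to simple date arithmetic, sketched here in Python. The function names and the study-type labels are hypothetical, and the way a "year" is counted (same calendar day, with February 29 falling back to February 28) is an assumption, since the policy does not define it.

```python
from datetime import date

# Years allowed after the reference date, taken from the text above.
# The dictionary keys are illustrative labels, not official terms.
YEARS_ALLOWED = {
    "clinical_trial": 3,    # after publication of the primary outcome paper
    "epi_examination": 5,   # after the last patient visit of the cycle
    "epi_follow_up": 5,     # after the last follow-up cycle cutoff date
}


def add_years(d, years):
    """Return the same calendar day `years` later (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def submission_deadline(study_type, reference_date):
    """Latest date by which the limited access data set is due at the NHLBI."""
    return add_years(reference_date, YEARS_ALLOWED[study_type])


# Hypothetical example: primary outcome paper published on 15 June 2000.
print(submission_deadline("clinical_trial", date(2000, 6, 15)))
```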
IV. Procedures for Protection of Privacy for Limited Access Data Sets
A. Institute Review and Approval of Limited Access Preparation
The NHLBI requires that the data be provided in a
manner that protects the privacy of study participants. The Institute requires
appropriate documentation of the steps taken to protect their privacy in
preparing a limited access data set. A summary of all proposed modifications
and deletions to be made to a data set in preparing it for limited access must
be submitted to and approved by the director of the division that sponsored the
study prior to their implementation.
B. Guidelines for Limited Access Preparation
The following guidelines provide a framework for
decision-making regarding preparation of limited access data sets:
- All data for participants who refused to permit
sharing their data with other researchers must be deleted from the
Non-Commercial Purpose Data Set.
- All data for participants who refused only to
permit sharing their data for commercial purposes must also be deleted from the
Commercial Purpose Data Set.
- Participant identifiers:
a. Obvious identifiers (e.g., name, addresses,
social security numbers, place of birth, city of birth, contact data) must be
deleted.
b. New identification numbers must replace
original identification numbers. Codes linking the new and original data should
be sent to the NHLBI in a separate file, not included on the CD-ROM, so that
linkage may be made if necessary for future research.
- Variables that might lead to the identification
of participants and of centers in multicenter studies, or variables that are
sensitive, inaccurate, or of limited scientific utility:
a. Clinical center identifier -- In trials or
studies that have only a few centers and relatively few participants per
center, the data set should not contain center identifiers. In trials that have
either many centers or a large number of participants per center, the data may
offer little possibility of identifying individuals. For such trials, the
investigators and the NHLBI will determine whether to include center
identifiers on a case-by-case basis.
b. Interviewer or technician identification
numbers must be recoded or deleted.
c. Sensitive data, including illicit drug use,
risky behaviors (e.g., carrying a gun or exhibiting violent behavior), sexual
behaviors, and selected medical conditions (e.g., alcoholism, HIV/AIDS) must be
deleted.
d. Regional variables with little or no variation
within a center must be deleted, because they could be used to identify that
center.
e. Unedited, verbatim responses that are stored as
text data (e.g., specified in "other" category) must be deleted.
f. Identification of family relationships and
pedigrees.
g. Genetic markers sufficient for individual
identification.
- Dates: All dates should be coded relative to
a specific reference point (e.g., date of randomization or study entry). This
provides privacy protection for individuals known to be in a study who are
known to have had some significant event (e.g., a myocardial infarction) on a
particular date.
- Variables with low frequencies for some values
that might be used to identify participants may be recoded. These might
include:
a. Socioeconomic and demographic data (e.g.,
marital status, occupation, income, education, language, number of years
married).
b. Household and family composition (e.g., number
in household, number of siblings or children, ages of children or
step-children, number of brothers and sisters, relationships, spouse in
study).
c. Numbers of pregnancies, births, or multiple
children within a birth.
d. Anthropometry measures (e.g., height, weight,
waist girth, hip girth, body mass index).
e. Physical characteristics (e.g., missing
limbs).
f. Detailed medication, hospitalization, and cause
of death codes, especially those related to sensitive medical conditions as
listed above, such as HIV/AIDS or psychiatric disorders.
g. Prior medical conditions with low frequency
(e.g., group specific cancers into broader categories) and related questions
such as age at diagnosis and current status.
h. Parent and sibling medical history (e.g.,
parents' ages at death).
- Race/ethnicity and sex information when very
few participants are in certain groups or cells.
a. Polychotomous variables: values or groups
should be collapsed so as to ensure a minimum number of participants (e.g., at
least 20) for each value within each race-sex cell.
b. Continuous variables: distributions should be
truncated if needed to ensure that a minimum number of participants (e.g., at
least 20) have the same highest and lowest values in each race-sex cell.
c. Dichotomous variables: data should either be
grouped with other related variables so as to ensure a minimum number of
participants (e.g., at least 20) in each race-sex cell or deleted.
- The investigators may realize that other
variables may make it easy to identify individuals. All such variables should
be recoded or removed. The NHLBI program officer or project administrator
should be consulted concerning such variables.
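Several of the guidelines above are mechanical enough to sketch in code: replacing identification numbers, coding dates relative to a reference point, truncating continuous distributions, and collapsing rare categories. The Python below is a minimal illustration, not an NHLBI-specified procedure; every function name is hypothetical, the threshold of 20 is the example value from the text, and each function is assumed to be applied to one race-sex cell at a time.

```python
import random
from collections import Counter
from datetime import date


def assign_new_ids(original_ids, seed=None):
    """Return a linkage map {original_id: new_id}, new IDs in shuffled order.

    The map plays the role of the linkage file and should be stored
    separately from the limited access data set.
    """
    shuffled = list(original_ids)
    random.Random(seed).shuffle(shuffled)
    return {orig: new for new, orig in enumerate(shuffled, start=1)}


def days_from_reference(reference, event):
    """Code an event date as days elapsed since the reference date
    (e.g., date of randomization or study entry)."""
    return (event - reference).days


def truncate_extremes(values, min_count=20):
    """Top- and bottom-code values so that at least min_count participants
    share the lowest value and at least min_count share the highest."""
    if len(values) < min_count:
        raise ValueError("cell has fewer participants than min_count")
    ordered = sorted(values)
    low, high = ordered[min_count - 1], ordered[-min_count]
    return [min(max(v, low), high) for v in values]


def collapse_rare(values, min_count=20, other="OTHER"):
    """Merge categories seen fewer than min_count times into one group.
    The merged group can itself be small and still needs a manual check."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]


# Tiny demonstration; min_count=2 is used only to keep the example short.
linkage = assign_new_ids(["A17", "B02", "C33"], seed=0)
offset = days_from_reference(date(2001, 3, 1), date(2001, 4, 15))
ages = truncate_extremes([30, 41, 45, 52, 58, 63, 90], min_count=2)
status = collapse_rare(
    ["married", "married", "single", "single", "widowed"], min_count=2
)
print(sorted(linkage.values()), offset, ages, status)
```

A sketch like this covers only the routine recoding steps. Decisions about which variables are sensitive, and whether a recoded variable still allows identification, remain matters for the investigators and the NHLBI program officer.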
October 25, 2002