
I. Introduction

II. Definitions

III. Data Set Requests

A. Responsibilities of Investigators Seeking Access to Study Data
B. Responsibilities of Study Investigators in Preparing Data Sets

  1. Documentation
  2. Data Storage and Format
  3. Content of NHLBI Data Set
  4. Timing of Release of NHLBI Data Set

IV. Procedures for Protection of Privacy for NHLBI Data Sets

A. Institute Review and Approval of NHLBI Data Set Preparation
B. Guidelines for NHLBI Data Set Preparation


Note that the following policy is an update from the 10/25/2002 policy available at: http://www.nhlbi.nih.gov/resources/deca/policy.htm.

For contract-supported clinical trials and epidemiology studies: Requirements for preparation of data sets have been modified to shorten the timeline and expand the data to be included, as described below. These changes will be effective with contracts awarded on or after October 1, 2005.

For grant-supported new and competing applications for selected epidemiology studies and clinical trials: Applications received on or after October 1, 2005 will be expected to include provisions for submission of data sets as described below. Applicants are expected to include the costs of data set preparation, with appropriate justification, in their budget requests. Funds awarded for data set preparation will be restricted for use solely for that purpose and only upon release by the NHLBI.

In general the following types of studies will be included under this policy:

* Clinical trials and epidemiology studies that are supported by the U01 (cooperative agreement) mechanism AND have 500 or more participants

* Trials or studies requesting $500,000 direct costs or more in any one year and identified as being of high programmatic interest to the NHLBI, as indicated in the Institute's letter of agreement to accept assignment of the application

* Ancillary studies based on clinical trials or epidemiology studies that are required by this policy to provide NHLBI data sets.

Requests for exceptions to these guidelines will be considered by the NHLBI if adequately justified. Examples of adequate justification include: unavoidable and unanticipated delays in making data available within the parent study for analysis; presence of provisions in informed consent prohibiting data set release; evidence of unacceptability of data set release to communities under study; measurements on too small a subset of participants to be of scientific value. All such requests should be addressed to the Director of the Program Division funding the award.

Policies for data sharing from studies of American Indian or Alaska Native tribes and other sovereign entities will be developed with them and will be provided as available at a later date.

I. Introduction

The National Heart, Lung, and Blood Institute (NHLBI) has supported data collection from participants in numerous clinical trials and epidemiologic studies. These data from well-characterized population samples constitute an important scientific resource. It is the view of the NHLBI that their full value can only be realized if they are made available, under appropriate terms and conditions consistent with the informed consent provided by individual participants, in a timely manner to the largest possible number of qualified investigators.

Under no circumstances will data relating to an individual be distributed in any way that is inconsistent with his or her informed consent. Data sets without an informed consent permitting use by non-study researchers will only be released if the requester's IRB has approved a waiver of informed consent based on minimal risk to the participants [see Institutional Review Board section].

Data sets distributed under this policy include only data from which personal identifiers and other variables that might enable individual participants to be identified (such as outliers, dates, and study sites) have been removed or otherwise modified. Because it may still be possible to combine the data with other publicly available data and thereby determine with reasonable certainty the identity of individual participants, these data sets are not truly anonymous. They are, therefore, only provided to investigators who agree in advance to adhere to established policies for distribution.

Data sets are available for NHLBI studies supported by contract and for selected studies supported by cooperative agreements or other grants. However, data will not be provided if the Institute deems them to be unreliable or invalid. All proposed data exclusions must be strongly justified, and each one, whether proposed by the study investigators or by Institute staff, must be reviewed and approved by the director of the NHLBI program division that sponsored the study.

II. Definitions

Data - Information collected and recorded from study participants through periodic examinations and follow-up contacts, not to include original specimens or images.

Commercial purpose - Data will be considered as being for a commercial purpose if they are to be used by an investigator who is an employee of a for-profit organization, if they are to be used by an investigator to satisfy a contractual relationship with a for-profit organization, or if they are to be used by an investigator as the basis for a consulting relationship with a for-profit organization. Data will also be considered as being for a commercial purpose if the investigator(s) take any affirmative steps to facilitate commercial use of results derived from the data.

Non-Commercial Purpose Data Set – A data set consisting of all records except those for participants who requested that their data not be shared beyond the initial study investigators.

Commercial Purpose Data Set – A data set consisting of all records except those for participants who requested that their data not be shared beyond the initial study investigators or used for commercial purposes.

Non-Commercial Purpose Pedigree/Genetic Data Set – A pedigree/genetic data set consisting of all pedigree and genetic data except those for participants who requested that their data not be shared beyond the initial study investigators.

Commercial Purpose Pedigree/Genetic Data Set – A pedigree/genetic data set consisting of all pedigree and genetic data except those for participants who requested that their data not be shared beyond the initial study investigators or used for commercial purposes.

III. Data Set Requests

A. Responsibilities of Investigators Seeking Access to Study Data

To ensure that the confidentiality and privacy of study participants are protected, all investigators seeking access to data from NHLBI-supported studies that are in the possession of the Institute must execute and submit with their requests the appropriate standard Distribution Agreement for each study. Because of the potential for identification of individual participants and consistent with the conditions included in the Distribution Agreements, investigators seeking access to study data must also submit an approval from their Institutional Review Board (IRB). An expedited review from the IRB is acceptable.

Unless a specific request for the Non-Commercial Purpose Data Set is received, investigators requesting access to study data will be provided with the Commercial Purpose Data Set. Investigators seeking access to the Non-Commercial Purpose Data Set must submit a signed statement affirming that they will not be using the data for a commercial purpose as defined above. Investigators who do so must recognize that if they subsequently develop results of potential commercial value, they will have to replicate those results using the Commercial Purpose Data Set before they can take any affirmative steps to facilitate commercial use of the results.

Investigators interested in receiving a Pedigree/Genetic Data Set must specifically request it. Investigators seeking access to a Pedigree/Genetic Data Set must describe the specific need for access to it in the Research Project description of their signed Data Distribution Agreement. Investigators using these data sets are strongly discouraged from publishing individual pedigree structures and are prohibited from investigation into issues such as non-paternity.

Investigators should recognize that they are bound by the conditions of the relevant study Distribution Agreement. Failure to comply with it could result in denial of further access to NHLBI data sets. Moreover, violation of the confidentiality requirements in a Distribution Agreement may lead to legal action against the recipients of the data by study participants, their families, or the U.S. Government.

B. Responsibilities of Study Investigators in Preparing Data Sets

Investigators in NHLBI studies covered by this policy are required as part of the terms and conditions of their awards to prepare and deliver to the NHLBI data sets that satisfy NHLBI requirements. Included among them are documentation, elimination of personal identifiers, and modification of other data elements so as to reduce the likelihood that any individual participant can be identified.

Two data sets, i.e., a Non-Commercial Purpose Data Set and a Commercial Purpose Data Set, and, if applicable, two pedigree/genetic data sets, i.e., a Non-Commercial Purpose Pedigree/Genetic Data Set and a Commercial Purpose Pedigree/Genetic Data Set, and associated documentation, must be provided in electronic form to the Institute. In addition, investigators must provide the Institute with two separate lists of participant identification numbers, one consisting of those participants who asked that their data not be shared beyond the initial study investigators and the other of those participants who asked that their data not be used for commercial purposes.
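As an illustration only, the relationship between the two exclusion lists and the two data sets, following the definitions in Section II, might be sketched as below. The record layout, the `participant_id` field name, and the function name are hypothetical and are not prescribed by this policy.

```python
def build_data_sets(records, no_sharing_ids, no_commercial_ids):
    """Derive the two data sets from the two exclusion lists.

    records: list of dicts with a "participant_id" key (hypothetical layout).
    no_sharing_ids: participants who asked that their data not be shared
        beyond the initial study investigators.
    no_commercial_ids: participants who asked that their data not be used
        for commercial purposes.
    """
    no_sharing = set(no_sharing_ids)
    no_commercial = set(no_commercial_ids)

    # Non-Commercial Purpose Data Set: everyone except the "no sharing" list.
    non_commercial_set = [
        r for r in records if r["participant_id"] not in no_sharing
    ]
    # Commercial Purpose Data Set: additionally drop the "no commercial use" list.
    commercial_set = [
        r for r in non_commercial_set if r["participant_id"] not in no_commercial
    ]
    return non_commercial_set, commercial_set


# Made-up records for five participants.
records = [{"participant_id": pid, "sbp": 120 + pid} for pid in range(1, 6)]
non_commercial, commercial = build_data_sets(
    records, no_sharing_ids=[2], no_commercial_ids=[4]
)
print([r["participant_id"] for r in non_commercial])  # [1, 3, 4, 5]
print([r["participant_id"] for r in commercial])      # [1, 3, 5]
```

The Commercial Purpose Data Set is built from the Non-Commercial one so that a participant on either list never appears in it.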

Investigators in ancillary studies based on ongoing (parent) studies that are required by this policy to produce data sets must submit ancillary study data to the NHLBI through the parent study Coordinating Center or data submission process established by the parent study. Ancillary studies conducted on small subsets of a study sample may be appropriate for exclusion from data sets; requests for their exclusion should be justified and addressed as described in the Introduction above.

  1. Documentation

    Documentation for data sets must be comprehensive and sufficiently clear to enable investigators who are not familiar with a data set to use it. The documentation must include data collection forms, study procedures and protocols, descriptions of all variable recoding performed, and a list of major study publications.

    In addition, a summary documentation file, usually called a "readme" file, is required. It must provide a complete overview of the data and a description of their use for investigators who are not familiar with the data set. It must also contain a brief description of the study (including a general orientation to the study, its components, and its examination and follow-up timeline), a listing of all files being provided, a description of system requirements, the program code for generating a SAS file from the SAS export data file, and a frequency distribution for selected key variables.


    Selected documentation will be used to describe the study on the Data Repository website. Examples include forms, data dictionaries, descriptive statistics, and the study protocol. These documents must be accessible to people with disabilities in accordance with Section 508 of the Rehabilitation Act. Information Management Services has prepared a summary document in Adobe Acrobat format on 508 compliance issues, with links to resources on creating and checking accessibility. The HHS also maintains a website devoted to 508 issues.
  2. Data Storage and Format

    The data are to be stored on a CD ROM unless the investigators and the NHLBI mutually agree upon an alternative storage medium. Both the comprehensive documentation and the summary documentation must be prepared in a consistent format, as a WordPerfect, MS Word, ASCII, or portable document format (PDF) file, and included on the same storage medium as the data set. To ensure access by users with disabilities, all PDF files must be created in Adobe Acrobat version 5.0 or higher. Documentation that is not available in electronic form, such as data collection forms, should be scanned into a graphics file, converted to a PDF file using Adobe Acrobat version 5.0 or higher, and saved on the same medium as the data set. Pedigree data should be provided in a format readable by standard genetic analysis programs such as SAGE and SOLAR, with one individual's data per line beginning with pedigree identifier, individual's id, father's id, mother's id, and individual's sex.
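    A minimal sketch of the pedigree line layout described above follows. Only the order of the five leading fields comes from this policy; the space delimiter, the use of "0" for an unknown parent, and the "M"/"F" sex codes are assumptions for illustration and should be checked against the documentation of the genetic analysis program actually used.

```python
def pedigree_line(pedigree_id, individual_id, father_id, mother_id, sex, *extra):
    """Format one individual's record as a single delimited line.

    Leading fields follow the order given in the policy. Writing "0" for a
    missing parent is an assumption, not a stated requirement.
    """
    fields = [pedigree_id, individual_id, father_id or "0", mother_id or "0", sex, *extra]
    return " ".join(str(f) for f in fields)


# Made-up three-person family: two founders and one child.
family = [
    ("F01", "101", None, None, "M"),
    ("F01", "102", None, None, "F"),
    ("F01", "201", "101", "102", "F"),
]
lines = [pedigree_line(*person) for person in family]
print("\n".join(lines))
```

    Any additional variables for an individual would follow the five leading fields on the same line, which the optional `extra` arguments allow for.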
  3. Content of NHLBI Data Set

    In addition to summary information, data sets also include for each participant those raw data elements (e.g., food item data or individual electrocardiographic lead scores) that have not otherwise been processed into summary information.

    a. Clinical Trials – Included are baseline, interim visit, ancillary data, and outcome data, along with laboratory measurements not otherwise summarized.

    b. Observational Epidemiology Studies – Included are all of the examination data obtained in each examination cycle, ancillary data, and/or all of the follow-up information available up to the last follow-up cycle cutoff date.

  4. Timing of Release of NHLBI Data Set

a. Clinical Trials – Data are prepared by the study coordinating center and sent to the NHLBI after publication of the primary clinical trial results. They are available for release once they are received and checked by the NHLBI. The data sets must be submitted to the NHLBI no later than 3 years after the final visit of the participants to their clinical trial sites or 2 years after the main paper of the trial has been published, whichever comes first.

b. Observational Epidemiology Studies – Epidemiology studies typically have an examination component and a mortality/morbidity follow-up component. Data from each cycle of an examination or follow-up component are prepared by the study coordinating center and sent to the NHLBI for distribution as a data set no later than 3 years after the completion of each examination or follow-up cycle or 2 years after the baseline, follow-up, genetic, ancillary study, or other data set is finalized within the study for analysis for use in publication, whichever comes first.

c. Ancillary Studies – In those cases in which the timeline for an ancillary study differs from that of its parent study, the release date will relate to the timeline of the ancillary study.
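The "whichever comes first" rule for clinical trials can be worked through with a short calculation. This is a sketch only: the function names are hypothetical, and treating a February 29 start date as February 28 in a non-leap year is an assumption the policy does not address.

```python
from datetime import date


def add_years(d, years):
    """Shift a date by whole years; February 29 falls back to February 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def trial_submission_deadline(final_visit, main_paper_published=None):
    """Earlier of: final participant visit + 3 years, or main paper + 2 years."""
    deadline = add_years(final_visit, 3)
    if main_paper_published is not None:
        deadline = min(deadline, add_years(main_paper_published, 2))
    return deadline


# Made-up dates: final visit 30 June 2006, main paper published 15 January 2007.
deadline = trial_submission_deadline(date(2006, 6, 30), date(2007, 1, 15))
print(deadline)  # 2009-01-15
```

In this example the publication-based date (15 January 2009) precedes the visit-based date (30 June 2009), so it governs.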

IV. Procedures for Protection of Privacy for NHLBI Data Sets

A. Institute Review and Approval of NHLBI Data Set Preparation

The NHLBI requires that the data be provided in a manner that protects the privacy of study participants. The Institute requires appropriate documentation of the steps taken to protect their privacy in preparing a data set. A summary of all proposed modifications and deletions to be made to a data set in preparing it must be submitted to and approved by the director of the division that sponsored the study prior to their implementation.

B. Guidelines for NHLBI Data Set Preparation

The following guidelines provide a framework for decision-making regarding preparation of data sets:

  1. All data for participants who refused to permit sharing their data with other researchers must be deleted from the Non-Commercial Purpose Data Set.
  2. All data for participants who only refused to permit sharing their data for commercial purposes must also be deleted from the Commercial Purpose Data Set.
  3. Participant identifiers:

a. Obvious identifiers (e.g., name, addresses, social security numbers, place of birth, city of birth, contact data) must be deleted.

b. New identification numbers must replace original identification numbers. Codes linking the new and original data should be sent to the NHLBI in a separate file, not included on the CD ROM, so that linkage may be made if necessary for future research.

  • Variables that might lead to the identification of participants and of centers in multicenter studies, or variables that are sensitive, inaccurate, or of limited scientific utility:

a. Clinical center identifier -- In trials or studies that have only a few centers and relatively few participants per center, the data set should not contain center identifiers. In trials that have either many centers or a large number of participants per center, the data may offer little possibility of identifying individuals. For such trials, the investigators and the NHLBI will determine on a case-by-case basis whether to include center identifiers.

b. Interviewer or technician identification numbers must be recoded or deleted.

c. Sensitive data, including illicit drug use, risky behaviors (e.g., carrying a gun or exhibiting violent behavior), sexual behaviors, and selected medical conditions (e.g., alcoholism, HIV/AIDS) must be deleted.

d. Regional variables with little or no variation within a center must be deleted because they could be used to identify that center.

e. Unedited, verbatim responses that are stored as text data (e.g., specified in "other" category) must be deleted.

f. Pedigree and genetic data will be distributed in separate data sets only to investigators specifically requesting them. Genotyping data for any person in whom potential pedigree errors are detected must be deleted.

  • Dates: All dates should be coded relative to a specific reference point (e.g., date of randomization or study entry). This provides privacy protection for individuals known to be in a study who are known to have had some significant event (e.g., a myocardial infarction) on a particular date.
  • Variables with low frequencies for some values, that might be used to identify participants, may be recoded. These might include:

a. Socioeconomic and demographic data (e.g., marital status, occupation, income, education, language, number of years married).

b. Household and family composition (e.g., number in household, number of siblings or children, ages of children or step-children, number of brothers and sisters, relationships, spouse in study).

c. Numbers of pregnancies, births, or multiple children within a birth.

d. Anthropometry measures (e.g., height, weight, waist girth, hip girth, body mass index).

e. Physical characteristics (e.g., missing limbs).

f. Detailed medication, hospitalization, and cause of death codes, especially those related to sensitive medical conditions as listed above, such as HIV/AIDS or psychiatric disorders.

g. Prior medical conditions with low frequency (e.g., group specific cancers into broader categories) and related questions such as age at diagnosis and current status.

h. Parent and sibling medical history (e.g., parents' ages at death).

  • Race/ethnicity and sex information when very few participants are in certain groups or cells.

a. Polychotomous variables: values or groups should be collapsed so as to ensure a minimum number of participants (e.g., at least 20) for each value within each race-sex cell.

b. Continuous variables: distributions should be truncated if needed to ensure that a minimum number of participants (e.g., at least 20) have the same highest and lowest values in each race-sex cell.

c. Dichotomous variables: data should either be grouped with other related variables so as to ensure a minimum number of participants (e.g., at least 20) in each race-sex cell or deleted.

  • The investigators may realize that other variables may make it easy to identify individuals. All such variables should be recoded or removed. The NHLBI program officer or project administrator should be consulted concerning such variables.
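Several of the guidelines above are mechanical enough to sketch in code: replacing identification numbers, coding dates relative to a reference point, collapsing low-frequency values, and truncating continuous distributions. The sketch below is illustrative only. All function names and the example data are hypothetical, the threshold of 20 is the example figure from the guidelines rather than a fixed requirement, and each function is assumed to be applied within a single race-sex cell.

```python
import random
from collections import Counter
from datetime import date


def assign_new_ids(original_ids, rng=None):
    """Map each original ID to a new sequential ID in shuffled order.

    The returned linkage table is meant to be kept in a separate file,
    apart from the distributed data set.
    """
    rng = rng or random.Random()
    shuffled = list(original_ids)
    rng.shuffle(shuffled)
    return {original: new for new, original in enumerate(shuffled, start=1)}


def days_from_reference(event_date, reference_date):
    """Code a date as days elapsed since a reference point (e.g., randomization)."""
    return (event_date - reference_date).days


def collapse_rare_values(values, min_count=20, other_label="Other"):
    """Recode categories held by fewer than min_count participants in one cell."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


def truncate_tails(values, min_count=20):
    """Top- and bottom-code so at least min_count participants share each extreme."""
    if len(values) < 2 * min_count:
        raise ValueError("cell too small to truncate with this min_count")
    ordered = sorted(values)
    low_cap, high_cap = ordered[min_count - 1], ordered[-min_count]
    return [min(max(v, low_cap), high_cap) for v in values]


# Example on made-up data for a single race-sex cell.
linkage = assign_new_ids(["A-17", "B-03", "C-42"])
event_day = days_from_reference(date(2004, 3, 31), date(2004, 3, 1))
marital = ["married"] * 30 + ["single"] * 25 + ["widowed"] * 3 + ["separated"] * 2
collapsed = Counter(collapse_rare_values(marital))
weights = truncate_tails(list(range(40, 140)))
print(event_day, dict(collapsed), min(weights), max(weights))
```

Note that in this example the collapsed "Other" group still contains only 5 participants, so it would need to be merged with a related category or reviewed further before release. Decisions of that kind are the ones the guidelines refer to the NHLBI program officer or project administrator.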

October, 2008
