Frequently Asked Questions on Data Sharing
This list of Frequently Asked Questions will be updated as we receive additional questions and finalize the NIH statement on sharing research data. We encourage readers to check in regularly for updates.
March 5, 2003:
- Why should I share my final research data?
Data sharing achieves many important goals for the scientific community, such as
- reinforcing open scientific inquiry
- encouraging diversity of analysis and opinion
- promoting new research, testing of new or alternative hypotheses and methods of analysis
- supporting studies on data collection methods and measurement
- facilitating education of new researchers
- enabling the exploration of topics not envisioned by the initial investigators
- permitting the creation of new datasets by combining data from multiple sources.
- Who benefits from data sharing?
Everyone benefits, including investigators, funding agencies, the scientific
community, and, most importantly, the public. Data sharing provides more
effective use of NIH resources by avoiding unnecessary duplication of data
collection. It also conserves research funds to support more investigators.
The initial investigator benefits, because as the data are used and published
more broadly, the initial investigator's reputation grows.
- Is data sharing widely accepted as a good practice?
National scientific organizations have made a commitment to the sharing and
archiving of data through their ethical codes (e.g., the American Sociological
Association) or publication policies (e.g., the American Psychological
Association). More than 15 years ago, the National Academy of Sciences described
the benefits of sharing data. (See http://books.nap.edu/catalog/2033.html)
For many years, the National Science Foundation (NSF) Economics Program has
required data underlying an article arising from an NSF grant to be placed in a
public archive. Similar expectations exist at the National Institute of Justice.
Moreover, many scientific journals require that authors make available the data
included in their publications. In the biological sciences, protein and DNA
sequences are made available to researchers through data archives, such as
GenBank. Since 1996, NIH has required data sharing in several areas, such as
DNA sequences, mapping information, and crystallographic coordinates.
- What do you mean by final research data?
By "final research data", we mean recorded factual material commonly accepted in
the scientific community as necessary to validate research findings. Final
research data do not include laboratory notebooks, partial datasets, preliminary
analyses, drafts of scientific papers, plans for future research, peer review
reports, communications with colleagues, or physical objects, such as gels or
laboratory specimens.
- Does "final research data" include data that were not originally produced under an NIH grant or contract?
Sometimes. For example, where NIH support is sought to transform or link
datasets (as opposed to producing new data), the investigator should include a
data-sharing plan in the application.
- What do you mean by unique data?
By "unique data" we mean data that cannot be readily replicated. Examples of
studies producing unique data include: large surveys that are too expensive to
replicate; studies of unique populations, such as centenarians; studies
conducted at unique times, such as a natural disaster; studies of rare
phenomena, such as rare metabolic diseases.
- What kinds of data are candidates for sharing?
Potentially all kinds of data are candidates for sharing, but unique data are
especially important. Some biologic sciences already have data-sharing plans in
place, such as genetic mapping. But other basic science data are also amenable
to sharing. Data from human subjects (e.g., surveys, clinical studies) also can
be shared if the identity and privacy of research participants can be protected.
- Can you give me some examples of data that have been shared?
Examples of shared epidemiologic data include the Framingham Heart Study, the
Honolulu Heart Program, the Atherosclerosis Risk in Communities, Epidemiology
of Chronic Disease in the Oldest Old, and the Iowa 65+ Rural Health Study.
Examples of shared data from clinical trials include the Asymptomatic Cardiac
Ischemia Pilot, the Intermittent Positive Pressure Breathing Study, and the
Safety and Efficacy Trial of Zidovudine for Asymptomatic HIV Infected
Individuals. Examples of shared datasets from the basic sciences include a
growing number of genome sequences and maps, as well as protein and nucleotide
databases (see ENTREZ http://www.ncbi.nlm.nih.gov/Database/index.html and other
resources for molecular biology at the National Center for Biotechnology
Information at http://www.ncbi.nlm.nih.gov).
- Data from my studies are generated from a very small number of rats, and I publish the final data. Am I expected to provide these data to other investigators as well?
Publishing these final data constitutes an acceptable mechanism for sharing data.
- How soon after data collection am I obliged to share the final data?
Recognizing that the value of data often depends on their timeliness, data
sharing should occur in a timely fashion. NIH expects the timely release and
sharing of data to be no later than the acceptance for publication of the main
findings from the final dataset. This time point will be influenced by the
nature of the data collected. Data from small studies can be analyzed and
submitted for publication relatively quickly. If data from large epidemiologic
or longitudinal studies are collected over several discrete time periods or
waves, data should be released in waves as data become available or main
findings from waves of the data are published. NIH recognizes that the
investigators who collected the data have a legitimate interest in benefiting
from their investment of time and effort. NIH continues to expect that the
initial investigators may benefit from the first and continuing use, but not
from prolonged exclusive use. While NIH also understands that an institution's
desire to exercise its intellectual property rights may justify a need to delay
disclosure of research findings, a delay of 30 to 60 days is generally viewed
as a reasonable period for such activity.
- Does data sharing pertain only to published data?
No. Data-sharing plans should encompass all data from funded research that can
be shared without compromising individual subjects' rights and privacy,
regardless of whether the data have been used in a publication. Furthermore,
data sharing prior to the publication of major results is encouraged in many
instances, for example, when data are collected to provide a resource for the
scientific community (as in the case of many large surveys).
- Due to circumstances beyond my control (an earthquake!), I was unable to recontact a substantial portion of the sample in my longitudinal study. I was planning to put my data in an archive, but the resulting high rate of attrition makes the data minimally useful. Should I still archive the final dataset?
Investigators need to find a balance between the value of the final data and
the costs associated with archiving. If the data are of limited usefulness,
then it is probably not worth the expense and effort of putting them in an
archive. However, if the investigator has published results based on this
dataset, then the dataset should be shared.
- I am preparing an SBIR application. Am I required to submit a data-sharing plan?
Yes. The specific nature of the data you will collect will determine whether or
not you may share the final dataset. If the final data are not amenable to
sharing, for example, if they are proprietary, then you need to explain this
in your application. Under the Small Business Act, SBIR grantees may withhold
their data for 4 years after the end of the award. The Small Business Act
provides authority for NIH to protect from disclosure and nongovernmental use
all SBIR data developed from work performed under an SBIR funding agreement
for a period of 4 years after the closeout of either a Phase I or Phase II
grant unless NIH obtains permission from the awardee to disclose these data.
The data rights protection period lapses only upon expiration of the protection
period applicable to the SBIR award, or by agreement between the small business
concern and NIH.
- I don't want to share my data, which were generated under an NIH grant. Can I be forced to do so?
When the PI and the authorized institutional official sign the face page of an
NIH application, they are assuring compliance with policies and regulations
governing research awards. NIH expects grantees to follow these rules and to
conduct the work described in the application. Thus, if an application describes
a data sharing plan, NIH expects that plan to be enacted. In some instances,
NIH may make data sharing a term and condition of award.
Under specific circumstances, your data also may be accessible through the
Freedom of Information Act (FOIA). If your competitive grant was awarded after
April 17, 2000 and if your data were cited in a Federal regulation or
administrative order, then your data may also be accessible through FOIA.
(See http://grants.nih.gov/grants/policy/a110/a110_guidance_dec1999.htm).
- Will the data-sharing plan affect the priority score of my application?
No. Reviewers will not factor the proposed data-sharing plan into the
determination of scientific merit or priority score. Program staff is
responsible for overseeing the data-sharing policy and for assessing the
appropriateness and adequacy of the proposed data-sharing plan. Program
concerns must be resolved prior to making any award.
- My research, which seeks support from both the public and private sectors, will involve proprietary data. How do I deal with the data-sharing issue in my application?
NIH recognizes that there may be circumstances where a cofunder has requested
restrictions on data sharing as a condition of funding. These restrictions
should be identified in the application and a proposal made about how data from
the cofunded project will be shared. Should you believe that you are unable to
share any of the data, your justification will be considered by NIH program
staff.
- I'm a busy investigator. I don't have time to process requests for my data. What should I do?
In addition to publishing small datasets, there are several alternatives to
responding to each separate request to share data (e.g., putting data in an
archive or restricted access facility, and setting up a web site for data
access). Archives and data enclaves provide technical assistance for users
with questions or problems and may spare busy investigators time.
- Can I share data with colleagues under my own auspices?
Yes. Your data-sharing plans should indicate the criteria for deciding who can
receive your data and whether or not you will place any conditions on their use.
Data should be made as widely and freely available as possible while
safeguarding the confidentiality of the data and privacy of participants.
You should not place limits on the questions or methods others might pursue
nor should you require co-authorship as a condition for receiving the data.
- Should the data source be cited or acknowledged in papers that rely on shared data?
It is appropriate to acknowledge the source of data upon which a manuscript is
based. Many investigators include this information in the methods and/or
reference sections of their manuscripts. Journals generally include an
acknowledgement section, in which the authors can recognize people who helped
them gain access to the data. However, you should check the policies of the
journal to which you plan to submit.
- Should I consider contributing my research data to a data archive?
Maybe. Archives are organizations that collect and distribute data. They
understand what is needed to prepare data for wider distribution and
documentation for users. They provide stable, reliable, and cost-effective
means for distributing data. They also provide protections for the dataset
and technical assistance for requestors.
- Where can I find guidance on preparing data for sharing and archiving?
Guidance is available from a variety of sources. For example, the
Inter-University Consortium for Political and Social Research at the University
of Michigan has prepared an excellent set of guidelines for preparing data for
archiving. While these guidelines were written with social science data in mind,
they are broadly applicable. See http://www.icpsr.umich.edu/ACCESS/dpm.html
For molecular biology information, the National Center for Biotechnology
Information (NCBI), a division of the National Library of Medicine (NLM) at
the National Institutes of Health, is ready to assist researchers who have
genome-specific and molecular data to submit. For more information about
submitting and accessing NCBI data, see the NCBI Website at
http://www.ncbi.nlm.nih.gov/Genbank/index.html
- How do I pay for preparing data for sharing and archiving?
NIH recognizes that it takes time and money to prepare data for sharing. You
can request funds for data archiving and sharing as part of your grant
application for collecting the data. If you have already collected the data,
you may want to ask your NIH Project Officer about a competitive or
administrative supplement. NIH recommends that you consider procedures and
costs for data sharing during the application process rather than after the
data have been collected.
- Should I address data sharing in my NIH application?
Yes. By the October 1, 2003 application receipt date, NIH requests that all
extramural applicants seeking $500,000 or more in direct costs in any one year
provide a data-sharing plan in their applications.
- What do I need to include in my application and where do I put the information about data sharing?
Scientists submitting grant, cooperative agreement, or contract applications should
include a data-sharing plan, or provide justification for the absence of such
a plan, in a brief paragraph to be placed immediately after the Research Plan
Section (i.e., immediately after PHS 398 Section I. Letters of Support in the
Research Plan Section of their application) so it does not count toward the
application page limit. Additional information on data sharing might be
included in other sections of the application, as appropriate. For example,
if you are producing a large dataset that will become an important resource
for the scientific community, you probably want to mention this in the
significance section. If you are requesting funds to prepare, document, and
archive the data, you would want to include relevant information in the budget
and budget justification sections. In the Human Subjects section of the
application, you should discuss the potential risks to research participants
posed by data sharing and steps you will take to address those risks.
- The informed consent form for my recently completed study states explicitly that only my research team will see the data provided and that we will not share the data. Am I now expected to share them?
No, but if you plan to collect additional data from those subjects under a
grant with a data-sharing plan, you should revise the consent procedure to be
consistent with the data-sharing plan. In preparing and submitting a
data-sharing plan during the application process, investigators should avoid
developing or relying on consent processes that promise research participants
not to share data with other researchers. Such promises should not be made
routinely or without adequate justification described in the data-sharing plan.
- How can I protect the privacy of my subjects?
It is the responsibility of the investigators, their IRB, and their institution
to protect the rights of participants and the confidentiality of their data.
Data should be redacted to strip all individual identifiers, and effective
strategies should be adopted to minimize risk of disclosing a participant's
identity. Options to protect privacy include: withholding part of the data,
statistically altering the data in ways that will not compromise secondary
analyses, requiring researchers who seek data to commit to protect privacy
and confidentiality, and providing data access in a controlled site, sometimes
referred to as a data enclave. Some investigators use hybrid methods, releasing
a redacted dataset for general use but providing access to more sensitive data
through a user contract or data enclave. In most instances, sharing data is
possible without compromising participant confidentiality and privacy.
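As a rough illustration of the redaction step described above, the following minimal Python sketch drops direct identifiers from each record and coarsens exact age into bands. It is only a sketch under stated assumptions: the column names (name, ssn, address, phone, age), the 10-year band width, and the top-coding at 90 are hypothetical choices, not NIH requirements, and a real dataset would need review by the investigators and their IRB.

```python
# Hypothetical list of direct identifiers; adapt to the actual dataset.
DIRECT_IDENTIFIERS = {"name", "ssn", "address", "phone"}


def redact_record(record, age_bin=10, max_age=90):
    """Return a copy of one record with direct identifiers removed
    and exact age replaced by a coarser age group."""
    # Keep only fields that are not listed as direct identifiers.
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

    # Coarsen age so that rare exact values are less identifying.
    if "age" in shared:
        age = shared.pop("age")
        if age >= max_age:
            # Top-code the oldest participants into one open-ended group.
            shared["age_group"] = f"{max_age}+"
        else:
            low = (age // age_bin) * age_bin
            shared["age_group"] = f"{low}-{low + age_bin - 1}"
    return shared


def redact_dataset(records):
    """Apply redact_record to every record in a list of dicts."""
    return [redact_record(r) for r in records]
```

Removing identifiers and coarsening variables in this way does not by itself guarantee that participants cannot be identified; it is one of several options, alongside user contracts and data enclaves, mentioned in the answer above.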
- Can institutions and investigators subject to the Federal Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule share data in accord with the NIH Data Sharing policy?
Yes. NIH recognizes that data sharing may be complicated or limited, in some
cases, by institutional policies or local IRB rules, as well as by local, state
and Federal laws and regulations like the Privacy Rule. To protect the rights
and privacy of people who participate in NIH-sponsored research, data intended
for broader use should be free of identifiers that would permit linkages to
individual research participants, and exclude variables that could lead to
deductive disclosure of the identity of individual subjects. When data sharing
is limited, applicants should explain such limitations in their data sharing
plans.
- I collect data on sensitive and, sometimes, illegal behaviors. Are these data too sensitive to be shared?
Not necessarily. The collection of sensitive data does not preclude sharing.
For example, the National Center for Chronic Disease Prevention and Health
Promotion at CDC operates the Youth Risk Behavior Surveillance System (YRBSS),
available at http://www.cdc.gov/nccdphp/dash/yrbs/, which provides data on six
health risk behaviors among youth: unintentional injuries and violence, tobacco
use, alcohol and other drug use, sexual behaviors, dietary behaviors, and
physical activity. Similarly, data from the National Survey of Family Growth,
which includes statistical data on family life, marriage and divorce,
contraception, sexual experience, pregnancy, and infertility, can be obtained
from the National Center for Health Statistics.
Sensitive data can be shared so long as appropriate privacy safeguards are in
place. Investigators must determine if and how the rights and privacy of the
subjects can be protected. And investigators collecting data on sensitive and
illegal behaviors should obtain a Certificate of Confidentiality
(http://grants.nih.gov/grants/policy/coc/) to protect against the involuntary
release of data that could identify research participants.
- Can data from a clinical trial be shared?
It depends. Participants' privacy must be protected in accord with all
applicable laws and regulations. Clinical trial datasets are frequently rich
in items that could potentially identify individual subjects. For example, many
early phase trials use small samples, which make it difficult to protect the
privacy of the participants. Researchers who are planning clinical trials and
intend to share the resulting data should think carefully about the study
design, the informed consent documents, and the structure of the resulting
data prior to the initiation of the study.
There are many precedents for sharing of clinical trial data. For example, data
from a number of clinical trials supported by the National Heart, Lung, and
Blood Institute (NHLBI) are available for research use (See
http://www.nhlbi.nih.gov/resources/deca/directry.htm). The National Institute
of Allergy and Infectious Diseases (NIAID) also lists the clinical trial
datasets that it has made available through the National Technical
Information Service (NTIS) for public use
(See http://www.niaid.nih.gov/research/aidsdata.htm).
- Are data on DNA and protein sequences archived?
Yes. For example, GenBank (http://www.ncbi.nih.gov/Genbank/) and Entrez
(http://www.ncbi.nlm.nih.gov/Entrez/) archive gene sequencing data. The sharing
of materials, data, and software in a timely manner has been an essential
element in the rapid progress that has been made in the genetic analysis of
mammalian genomes.
- I did not request support for sharing data in my application, which was funded. Can I charge requestors for the costs associated with sharing the data?
Yes, as long as such costs are reasonable and not excessive and reflect actual
costs associated with complying with the request. These expenses for preparing
and shipping the data might include costs of personnel, computing time,
supplies, and other directly related expenses. NIH requirements for
accountability for various types of income under NIH grants are specified
elsewhere; see
http://grants.nih.gov/grants/policy/nihgps_2003/NIHGPS_Part8.htm#_Toc54600138
- I am working on a select pathogen and cannot share the data for reasons of national security. Is this an acceptable reason for not sharing?
Yes.
- If I am required to submit a revised data-sharing plan, what do I need to do?
As is the case with PIs who submit any additional or revised application
material, your revised data-sharing plan must be signed by your institutional
official and by you.
- I want to request a dataset from a recent publication. How do I do this?
You should check the publication to see if reference is made to an archive, an
enclave, or a Website where the data might be available. If no such information
is provided, you may wish to send a letter to the PI to see if the data are
available for sharing, and where you might be able to get the data and
associated documentation.
February 16, 2004:
- I am a PI on a P30 center grant with a budget in excess of $500,000 (direct costs) in each year. Some of the research projects that collect survey data benefit by the infrastructure support provided by the P30 but these research projects are not funded by NIH. Am I still expected to share data from these research grants?
If any NIH support (i.e., partial support) is provided for resource development, even if those research resources were developed primarily with non-NIH funds, then those research resources must be shared in line with NIH policy as if NIH funded the entire project. It should be emphasized that although a data sharing plan is only required of grants awarding direct costs of $500,000 or more in any one year, data sharing itself (without a specific plan submission) continues to be a requirement of all NIH-funded grants. If the P30 maintains core resources that actually house and are the final repository of the data, e.g., a high throughput array analysis core, then any project using the center’s resources would be subject to the center’s data sharing plan.