[Federal Register: August 28, 2007 (Volume 72, Number 166)]
[Notices]               
[Page 49290-49297]
From the Federal Register Online via GPO Access [wais.access.gpo.gov]
[DOCID:fr28au07-117]                         

-----------------------------------------------------------------------

DEPARTMENT OF HEALTH AND HUMAN SERVICES

National Institutes of Health

 
Policy for Sharing of Data Obtained in NIH Supported or Conducted 
Genome-Wide Association Studies (GWAS)

AGENCY: National Institutes of Health, HHS.

ACTION: Notice.

-----------------------------------------------------------------------

Background

    The NIH is interested in advancing genome-wide association studies 
(GWAS) to identify common genetic factors that influence health and 
disease. For the purposes of this policy, a genome-wide association 
study is defined as any study of genetic variation across the entire 
human genome that is designed to identify genetic associations with 
observable traits (such as blood pressure or weight), or the presence 
or absence of a disease or condition.\1\ Whole genome information, when 
combined with clinical and other phenotype data, offers the potential 
for increased understanding of basic biological processes affecting 
human health, improvement in the prediction of disease and patient 
care, and

[[Page 49291]]

ultimately the realization of the promise of personalized medicine. In 
addition, rapid advances in understanding the patterns of human genetic 
variation and maturing high-throughput, cost-effective methods for 
genotyping are providing powerful research tools for identifying 
genetic variants that contribute to health and disease.
---------------------------------------------------------------------------

    \1\ To meet the definition of a GWAS, the density of genetic 
markers and the extent of linkage disequilibrium should be 
sufficient to capture (by the r^2 parameter) a large proportion of 
the common variation in the genome of the population under study, 
and the number of samples (in a case-control or trio design) should 
provide sufficient power to detect variants of modest effect.
---------------------------------------------------------------------------
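
    As an illustration of the r^2 parameter mentioned in footnote 1, 
the sketch below computes r^2 between two biallelic markers from 
haplotype and allele frequencies, and then the share of common variants 
that a marker set "captures." It is a minimal sketch and not part of 
the policy: the function names are hypothetical, and the 0.8 threshold 
is an assumption chosen for the example, since the policy specifies no 
numeric cutoff.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Return r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1)
          together with allele B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    # D is the deviation of the observed haplotype frequency from the
    # value expected if the two markers were independent.
    d = p_ab - p_a * p_b
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def fraction_captured(best_r2_per_variant, threshold=0.8):
    """Share of variants whose best r^2 with any genotyped marker meets
    the threshold. The default of 0.8 is an assumption for illustration."""
    values = list(best_r2_per_variant)
    if not values:
        raise ValueError("at least one variant is required")
    return sum(1 for r2 in values if r2 >= threshold) / len(values)
```

    For example, two markers with allele frequencies of 0.5 whose 
alleles always occur together (p_ab = 0.5) give r^2 = 1, while 
independent markers (p_ab = 0.25) give r^2 = 0.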

    For these reasons, the NIH announced in May 2006 that it planned 
to: (1) Update NIH data sharing policies for research applications 
involving GWAS data; (2) initiate a public consultation process to 
inform policy development activities; and (3) track GWAS applications 
and awards at a central level (NOT-OD-06-071--
http://grants.nih.gov/grants/guide/notice-files/NOT-OD-06-071.html). 
A call for public comments on a proposed GWAS policy was issued on 
August 30, 2006 (NOT-OD-06-094--
http://grants.nih.gov/grants/guide/notice-files/NOT-OD-06-094.html). 
Between August 30 and November 30, 2006, the NIH solicited public 
comments from a range of public sectors (see Preamble below). 
Following the comment period, NIH convened a Town Hall Meeting in 
Bethesda, Maryland, on December 14, 2006, to provide an opportunity for 
direct interaction with interested stakeholders on the important policy 
questions raised through the proposed policy (NOT-OD-06-022--
http://grants.nih.gov/grants/guide/notice-files/NOT-OD-06-022.html).

    This Notice provides the NIH response to the public comments 
received during the public consultation activities and presents the 
revised GWAS policy developed by the NIH in response to the feedback 
received and further internal development of the issues. The policy 
addresses (1) Data sharing procedures, (2) data access principles, (3) 
intellectual property, and (4) issues regarding the protection of 
research participants through all phases of GWAS. Many of the 
principles contained in the policy reflect existing NIH policies and 
other NIH discussions.
    The goal of the policy is to advance science for the benefit of the 
public through the creation of a centralized NIH GWAS data repository. 
Maximizing the availability of resources facilitates research and 
enables medical science to better address the health needs of people 
based on their individual genetic information.

Protecting Research Participants

    The potential for public benefit to be achieved through sharing 
GWAS data is significant. However, genotype and phenotype information 
generated about individuals, such as data related to the presence or 
risk of developing particular diseases or conditions and information 
regarding paternity or ancestry, may be sensitive. Therefore, 
protecting the privacy of the research participants and the 
confidentiality of their data is critically important. Risks to 
individuals, groups, or communities should be balanced carefully with 
potential benefits of the knowledge to be gained through GWAS. The 
sensitive nature of GWAS information about participants and the broad 
data distribution goals of the NIH GWAS data repository highlight the 
importance of the informed consent process to this research.
    The NIH recognizes that scientific, ethical and societal issues 
relevant to this policy are evolving, and the agency has established 
on-going mechanisms to oversee GWAS policy implementation across the 
agency and to monitor whole genome association data use practices. The 
NIH will revisit and revise the policy and related practices as 
appropriate.

Preamble: Summary of Public Comments on Proposed Policy

    On August 30, 2006, the NIH published the Proposed Policy for 
Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide 
Association Studies (GWAS) 
(http://grants.nih.gov/grants/guide/notice-files/NOT-OD-06-094.html) 
for public comment in the Federal Register and the NIH Guide for Grants 
and Contracts. The comment period ended 
with a Town Hall meeting held in Bethesda, Maryland on December 14, 
2006, that was attended by a total of 374 people (on-site and via 
webcast).
    Overall the NIH received 196 written comments from professional 
societies, patient advocacy groups, privacy groups, individual 
scientists, and private citizens. The comments reflected a variety of 
interests and perspectives. In developing policies, the NIH strives to 
be respectful of the diversity of individual and group interests, 
incorporating appropriate protections while promoting maximum public 
benefit from the research it sponsors. The NIH GWAS policy and its 
implementation are expected to evolve in response to advances in 
scientific knowledge, available technologies, and the legal and ethical 
issues they raise.

I. Rationale for a Centralized Data Repository

    Respondents asked for clarification of the rationale for creation 
of a central data repository instead of distributed repositories under 
the control of individual (and non-governmental) institutions and 
investigators. Concerns expressed about a central data repository 
included, for example, the resources required to maintain it and the 
extent to which it would duplicate efforts and resources already 
invested by multiple institutions.
    The advantages and limitations of central versus distributed data 
repositories have been discussed extensively at the NIH. From a 
scientific standpoint, a central repository offers a number of 
important advantages: Tighter and more consistent control over the 
standards and quality of the genotype and phenotype data included; the 
ability to standardize and update terminology and format as technology 
and methodology improve; consistent, defined and transparent security 
and standards for access to data; a long-term commitment to maintenance 
of data after studies have been completed; a common point of entry for 
all investigators who use the data; a consistent and defined approach 
to removal of data in the event of withdrawal of participant consent; 
facilitation of meta-analyses and analyses that use data from multiple 
studies; and the ability to implement consistent participant 
protections at the level of data submission and data access. Individual 
investigators and many institutions may lack sufficient resources to 
ensure consistency and quality control, or a long-term commitment to 
data storage and access. One of the potential disadvantages of a 
central repository residing at NIH is that the data may be accessible 
through the Federal Freedom of Information Act (FOIA), unless they are 
exempt from release under one of the FOIA exemptions. This is further 
discussed in the Protection of Research Participants section below.
    As clinical and genomics research progresses, genotype and 
phenotype data are being collected into databases maintained by a 
variety of investigators, studies, and institutions. The NIH is 
concerned that the present situation may provide less consistent 
standards for the protection of research participants, data quality, 
and data access than would a central repository. However, the NIH 
recognizes that other databases will be designed to achieve different 
scientific aims or to integrate different analytic capacities, and the 
NIH GWAS policy is not intended to constrain the development of such 
databases or to curtail the deposition of NIH-supported GWAS data into 
other databases (as may be appropriate or required for some research 
programs). Among the on-going charges to the trans-NIH Technical 
Standards Steering

[[Page 49292]]

Committee established through the GWAS governance structure (see 
Oversight and Governance section below) will be explicit consideration 
of the evolving technical capacities and interoperability needed to 
facilitate the submission of data into the NIH GWAS data repository \2\ 
through other major database systems (e.g., the NCI caBIG network). 
This committee also will provide a forum for inter-IC coordination of 
data structures and standards to maintain interoperability of NIH 
databases.
---------------------------------------------------------------------------

    \2\ Currently named the NIH database of Genotypes and Phenotypes 
(dbGaP) (http://www.ncbi.nlm.nih.gov/entrez/query/Gap/gap_tmpl/about.html).
---------------------------------------------------------------------------

II. Protection of Research Participants

Non-Research Use of Data

    Respondents noted that data held by the Government are subject to 
the FOIA, and thus could be obtained outside of the Controlled Access 
data request process described in the GWAS policy. Respondents 
expressed concern that data could be obtained for non-research purposes 
(e.g., by law enforcement agencies, employers, or insurance companies) 
or for purposes beyond the scope of the research uses envisioned within 
the GWAS policy.
    As an agency of the Federal Government, the NIH is required to 
release Government records in response to a request under the FOIA, 
unless they are exempt from release under one of the FOIA exemptions. 
Although the NIH-held data will be coded and the NIH will not hold 
direct identifiers to individuals within the NIH GWAS data repository, 
the agency recognizes the personal and potentially sensitive nature of 
the genotype-phenotype data. Further, the NIH takes the position that 
technologies available within the public domain today, and 
technological advances expected over the next few years, make the 
identification of specific individuals from raw genotype-phenotype data 
feasible and increasingly straightforward.
    The agency believes that release of unredacted GWAS datasets in 
response to a FOIA request would constitute an unreasonable invasion of 
personal privacy under FOIA Exemption 6, 5 U.S.C. 552(b)(6). Therefore, 
among the safeguards that the NIH foresees using to preserve the 
privacy of research participants and confidentiality of genomic data is 
the redaction of individual-level genotype and phenotype data from 
disclosures made in response to FOIA requests and the denial of 
requests for unredacted datasets.
    In addition, the NIH acknowledges that legitimate requests for 
access to data made by law enforcement offices to the NIH may be 
fulfilled. The NIH will not possess direct identifiers within the NIH 
GWAS data repository, nor will the NIH have access to the link between 
the data keycode and the identifiable information that may reside with 
the primary investigators and institutions for particular studies. The 
release of identifiable information may be protected from compelled 
disclosure by the primary investigator's institution if a Certificate 
of Confidentiality is or was obtained for the original study. Within 
the final GWAS policy, the NIH explicitly encourages investigators to 
consider the potential appropriateness of obtaining a Certificate of 
Confidentiality as an added measure of protection against future 
compelled disclosure of identities for studies planning to collect 
genome-wide association data.

Stigmatization

    Respondents commented that some data to be included in the 
repository may be highly sensitive because they may suggest the 
existence either of individually identifiable or socially undesirable 
traits. These data have implications for both participants and family 
members.
    Tools for analysis of genomic data increasingly are able to make 
inferences about some individual traits (e.g., height, weight, and skin, 
hair, and eye color) and to identify predilections for characteristics 
(e.g., risk of developing some diseases) and behaviors with social 
stigma. In recognition of these risks, the NIH policy includes steps to 
protect the interests and privacy concerns of individuals, families and 
identifiable groups who participate in GWAS research. The NIH is asking 
institutions submitting GWAS datasets to certify that an Institutional 
Review Board (IRB) and/or Privacy Board (as applicable) has considered 
such risks and that investigators have stripped the data of all 
identifiers before the data are submitted. The NIH Data Access 
Committees (DACs) will approve access only for research uses that are 
consistent with an individual's consent as defined by the submitting 
institution. In addition, in the event that requests raise questions or 
concerns related to privacy and confidentiality, risks to populations 
or groups, or other relevant topics, the DACs will consult with other 
experts as appropriate.

Informed Consent

    Respondents asked for clarification regarding appropriate informed 
consent processes and consent documentation for individuals 
participating in studies for which data are to be submitted to the NIH 
GWAS data repository. Concern was raised that participants may not be 
aware of the potential privacy risks associated with placement of their 
genotype and phenotype data in a central repository at the NIH. 
Respondents also commented that adequate consent for data sharing 
requires participants to understand both the risks and potential 
benefits of the proposed sharing. Key stakeholders in these 
considerations are: Research participants (both those who have 
participated in on-going or prior studies for which GWAS were not 
anticipated and those who may participate in prospective GWAS); 
investigators developing informed consent processes; institutions 
approving the submission of datasets to the NIH GWAS data repository; 
and IRBs asked to review studies proposing genome-wide association 
analysis. Respondents commented that additional institutional resources 
are likely to be required if additional consent is needed for data 
sharing.
    As noted elsewhere and reflected in the GWAS oversight structure 
established to manage implementation of the GWAS policy (see Oversight 
and Governance section below), the NIH recognizes that the ethical 
considerations relevant to GWAS data sharing are complex and dynamic. 
Therefore, the NIH is developing informational materials as a resource 
for IRBs and institutions for their consideration of the issues 
relevant to reviewing and approving individual studies proposing data 
submission to the NIH GWAS data repository. The NIH intends to continue 
to engage the Office for Human Research Protections, the research 
community, and the public to explore the participant protection issues 
related to GWAS and to identify best practices for the consideration 
and risk-benefit analysis of genotype and phenotype data sharing under 
this policy. These efforts will include discussion of the optimal 
methods for communicating with participants about relevant issues 
through the informed consent process for prospective studies, and 
discussion of issues to consider in the institutional review of consent 
materials for use of existing samples or data proposed for GWAS. 
Participant interests relevant to GWAS data sharing extend beyond 
individual participants to families, communities, and their respective 
cultural sensitivities. The NIH believes that institutional 
deliberations regarding data submission

[[Page 49293]]

to the NIH GWAS data repository should include these broader interests. 
Further, especially complex issues exist with regard to GWAS where 
participant consent has been provided by proxy (e.g., pediatric 
research or some studies involving mental health disorders). Discussion 
of this topic will be included in the informational materials \3\ that 
the NIH is developing for submitting institutions and IRBs asked to 
review proposed GWAS.
---------------------------------------------------------------------------

    \3\ The NIH anticipates releasing additional GWAS implementation 
documents, including a Points to Consider document on informed 
consent issues related to the submission of data to the repository.
---------------------------------------------------------------------------

    The GWAS policy applies to genome-wide association research 
utilizing genetic materials and data collected both prospectively and 
retrospectively. For prospective studies, in which GWAS are conceived 
within the study designs at the time research participants provide 
their consent, the NIH expects specific discussion within the informed 
consent process and documentation that participants' genotype and 
phenotype data will be shared for research purposes through the NIH 
GWAS data repository. For retrospective studies performed using 
existing genetic materials and previously collected data, the NIH 
anticipates considerable variation in the extent to which data sharing 
and future genetic research have been addressed within the informed 
consent documents. As described in the policy, the submitting 
institution will determine whether a study is appropriate for 
submission to the NIH GWAS data repository (including an IRB and/or 
Privacy Board review of specific study elements, such as participant 
consent). The NIH anticipates that a number of GWAS proposing to 
include pre-existing data or samples may require additional consent of 
the research participants. The NIH may give programmatic consideration 
to requests for funds or other resources needed to conduct additional 
participant consent when appropriate.
    In the event that participants withdraw consent for sharing of 
their individual-level genotype and phenotype data through the NIH GWAS 
data repository, the submitting institution will be responsible for 
alerting the NIH GWAS data repository and requesting that the specific 
record be removed from future data distributions. However, data that 
have been distributed to researchers will not be retracted.

Return of Results

    Respondents asked for clarification of plans for return of results 
to study participants.
    The NIH does not anticipate that participants will be able to 
obtain individual results of secondary analyses on data obtained from 
their participation in primary studies. Because the NIH GWAS data 
repository and secondary data users will not have access to identifying 
information or to the link to the keycode within the data, neither will 
be able to return individual results directly to subjects. Secondary 
investigators may share their findings with primary investigators, who 
may determine whether it is appropriate to return individual or 
aggregate research results to participants whose health may be 
affected, following established institutional procedures (e.g., IRB 
approval) and specific parameters defined within the original study.

Oversight and Governance of the NIH GWAS Data Repository, Submission 
and Access

    Some respondents commented on the importance of adequate oversight 
of policies for data submission and access, and on the details of the 
repository. A need for oversight of the quality control measures for 
genotype and phenotype data and of the security measures for the 
repository was noted by many respondents. Some respondents commented on 
the importance of the policies established by the Data Access 
Committees, and their function within the Institutes and Centers.
    The NIH has developed a governance structure for GWAS that provides 
oversight tailored to the specific role involved. The NIH Director will 
oversee the GWAS policy and its implementation. In carrying out this 
responsibility, the NIH Director will be informed by a Senior Oversight 
Committee composed of Institute and Center (IC) Directors and 
appropriate leadership from within the Office of the Director. The 
Senior Oversight Committee will be responsible for the on-going 
management and stewardship of GWAS policy and operating implementation 
procedures across ICs. Reporting to the Senior Oversight Committee will 
be two Steering Committees charged with the implementation, 
communication, and development of specific procedures related to the 
conduct, submission and data release practices for GWAS supported by 
the NIH. One of these groups, the Research Participant Protection and 
Data Management Steering Committee, will include among its members the 
chairs of all Data Access Committees at the NIH as well as appropriate 
staff from NIH policy and oversight offices (e.g., the Office of 
Science Policy and the Office of Human Subjects Research). This 
committee will work to promote consistent and robust participant 
protections across relevant NIH programs. The second group, the 
Technical Standards Steering Committee, will include membership from 
scientific programs across the NIH as well as staff from the National 
Center for Biotechnology Information. This committee will focus on the 
challenges and needs associated with building and maintaining the NIH 
GWAS data repository and on formulating or stimulating the 
consideration of data standards (for genotype or phenotype data) where 
appropriate. Critical input from individual genome-wide association 
research programs and studies will be provided to the two Steering 
Committees through the ICs' Data Access Committees or other project 
oversight bodies created for specific studies, e.g., community 
representative groups, scientific advisory boards.
    In order to maintain GWAS policy consistent with evolving 
technological and ethical considerations, the NIH Director will solicit 
recommendations on the policy from external experts representing public 
and scientific stakeholders through the Advisory Committee to the 
Director.

III. Scientific Publication

    Some respondents commented on the considerable logistical 
difficulties posed by limiting the period of publication exclusivity, 
particularly considering the complexity of many of the studies and the 
lag time between submission and publication of peer-reviewed scientific 
papers. Some respondents were concerned that submitting investigators 
would not receive appropriate credit for their work and would have 
insufficient control over use of their data. Concern was expressed 
about enforcing compliance with publication policies. Some respondents 
commented that the limited period of exclusivity could stimulate a rush 
to publish initial analyses prematurely, deterring subsequent studies 
and reducing the overall quality of the reports.
    The NIH initially proposed that GWAS datasets be made available as 
soon as appropriate quality control measures (as defined for a given 
NIH program) were complete and that a 9-month period of exclusivity 
would exist for primary investigators to submit analyses of GWAS 
datasets for publication. The NIH believes that an extended period of 
exclusivity would

[[Page 49294]]

undermine the potential benefits of data sharing. However, in response 
to concerns raised through the public comment process, the NIH has 
lengthened this exclusivity period to 12 months in the final policy. 
The publication exclusivity period will commence on the date that a 
GWAS dataset is first made available through the NIH GWAS data 
repository, and the expiration date of this time period will be 
featured prominently in all descriptions and overviews of the dataset 
provided through both the public and controlled access pathways of the 
NIH GWAS data repository. The policy now is explicit on the inclusion 
within this exclusivity period of electronic and other means of 
information dissemination beyond peer-reviewed publications. As part of 
an overarching desire for transparency in the use of GWAS datasets, the 
names, institutional affiliations, and Data Access Committee-approved 
research uses for all GWAS data users will be available to the public 
within the NIH GWAS data repository. GWAS data users will be encouraged 
to collaborate with the primary investigators for GWAS as appropriate. 
The period of exclusivity is consistent with existing practices for 
other genome-wide association programs already available or in the 
pipeline for deposition into the NIH GWAS data repository, and is 
intended only as an upper limit as some NIH programs may stipulate 
shorter (or no) publication exclusivity timelines. The NIH anticipates 
that over time investigators will become more comfortable with the GWAS 
data sharing policy as the benefits of greater research access to the 
data are realized.
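
    The timing rule described above (an exclusivity period of up to 12 
months that commences on the date a dataset is first made available) 
can be sketched as a date calculation. This is an illustrative sketch 
only: the function name is hypothetical, and treating the expiration as 
the same calendar day a whole number of months later (clamped to the 
last day of shorter months) is an assumption, since the policy does not 
define how the period is counted.

```python
import calendar
from datetime import date

# Upper limit stated in the final policy; individual NIH programs may
# stipulate a shorter period or none at all.
MAX_EXCLUSIVITY_MONTHS = 12


def exclusivity_expiration(first_available: date,
                           months: int = MAX_EXCLUSIVITY_MONTHS) -> date:
    """Return the assumed last day of the publication exclusivity period."""
    if not 0 <= months <= MAX_EXCLUSIVITY_MONTHS:
        raise ValueError("months must be between 0 and 12")
    # Advance the calendar month, carrying over into the next year.
    total = first_available.month - 1 + months
    year = first_available.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter months (e.g., February 29 -> February 28).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(first_available.day, last_day))
```

    Under these assumptions, a dataset first made available on January 
25, 2008, would carry an expiration date of January 25, 2009, and a 
program stipulating no exclusivity (months=0) would return the release 
date itself.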

IV. Intellectual Property

    Respondents raised concerns that the policy might diminish the 
intellectual property rights of the submitting investigators, as well 
as their ability to obtain patents. Some respondents questioned whether 
the proposed policy text is a violation of the Bayh-Dole Act.
    The NIH believes that the intellectual property section of the 
policy presents no conflict with, or infringement upon, rights granted 
by the Bayh-Dole Act or any other federally-created intellectual 
property rights. Funding recipients are still able to elect title to 
any inventions or discoveries developed under the respective federal 
funding agreements that are or may be patentable, consistent with the 
Bayh-Dole Act and NIH policies. The NIH expects that intellectual 
property issues or questions that may occur will be resolvable through 
appropriate negotiations under the rubrics provided previously in NIH 
guidance to the research community within the Research Tools Policy 
(http://ott.od.nih.gov/policy/research_tool.html) and the Best 
Practices for the Licensing of Genomic Inventions 
(http://www.ott.nih.gov/policy/genomic_invention.html). The NIH 
encourages development of new diagnostics, therapeutics, or other 
interventions 
building on basic discoveries, and believes they will be enabled 
through the NIH GWAS data repository. The NIH anticipates that 
downstream technology development opportunities will increase as a 
result of broad research access to genotype-phenotype associations 
provided through the GWAS policy. The NIH has engaged in informal 
discussions with academic and private sector experts in intellectual 
property; these interactions, as well as formal responses received from 
stakeholders through the GWAS public consultation process, have 
suggested that the GWAS policy is consistent with existing practices 
and can be expected to better promote the development of exciting new 
discoveries for the public benefit.

Policy for Genome-Wide Association Studies (GWAS)

    Effective Date: January 25, 2008.

I. Principles

    The NIH is interested in advancing genome-wide association studies 
(GWAS) to identify common genetic factors that influence health and 
disease. For the purposes of this policy, a genome-wide association 
study is defined as any study of genetic variation across the entire 
human genome that is designed to identify genetic associations with 
observable traits (such as blood pressure or weight), or the presence 
or absence of a disease or condition.\4\ Whole genome information, when 
combined with clinical and other phenotype data, offers the potential 
for increased understanding of basic biological processes affecting 
human health, improvement in the prediction of disease and patient 
care, and ultimately the realization of the promise of personalized 
medicine. In addition, rapid advances in understanding the patterns of 
human genetic variation and maturing high-throughput, cost-effective 
methods for genotyping are providing powerful research tools for 
identifying genetic variants that contribute to health and disease.
---------------------------------------------------------------------------

    \4\ To meet the definition of a GWAS, the density of genetic 
markers and the extent of linkage disequilibrium should be 
sufficient to capture (by the r^2 parameter) a large proportion of 
the common variation in the genome of the population under study, 
and the number of samples (in a case-control or trio design) should 
provide sufficient power to detect variants of modest effect.
---------------------------------------------------------------------------

    Consistent with the NIH mission to improve public health through 
research, the NIH believes that the full value of GWAS to the public 
can be realized only if the genotype and phenotype datasets are made 
available as rapidly as possible to a wide range of scientific 
investigators. Rapid and broad data access is particularly important 
for GWAS because of the significant resources they require; the 
challenges of analyzing large datasets; and the extraordinary 
opportunities for making comparisons across multiple studies.
    Protection of research participants is a fundamental principle 
underlying biomedical research. The NIH is committed to responsible 
stewardship of data throughout the research process, which is essential 
to protecting the interests of study participants and to maintaining 
public trust in biomedical research.
    In consideration of the evolving scientific, ethical, and societal 
issues related to this policy, the NIH is establishing a governance 
structure for NIH GWAS activities that will:
     Ensure ongoing, high-level agency oversight;
     Obtain regular input from public representatives, 
including those with expertise in bioethics, privacy, data security, 
and appropriate scientific and clinical disciplines; and
     Revisit and revise the policy as appropriate.

II. Applicability

    This NIH policy applies to:
     Competing grant applications that include GWAS and are 
submitted to the NIH for the January 25, 2008, and subsequent receipt 
dates;
     Proposals for contracts that include GWAS and are 
submitted to the NIH on or after January 25, 2008; and
     NIH intramural research projects that include GWAS and are 
approved on or after January 25, 2008.
    An application or proposal will be identified as GWAS by applicants 
and/or NIH staff (see NOT-OD-06-071--
http://grants.nih.gov/grants/guide/notice-files/NOT-OD-06-071.html).
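
    The applicability criteria above reduce to a single date comparison 
per category, which the following sketch illustrates. It is not an 
official determination tool: the names are hypothetical, and the caller 
is assumed to supply the date relevant to each category (receipt date 
for competing grant applications, submission date for contract 
proposals, approval date for intramural projects).

```python
from datetime import date
from enum import Enum

# Effective date stated in the policy.
POLICY_EFFECTIVE_DATE = date(2008, 1, 25)


class ProjectKind(Enum):
    """Hypothetical labels for the three categories in the policy; each
    value names the date that the policy compares against."""
    COMPETING_GRANT_APPLICATION = "receipt date"
    CONTRACT_PROPOSAL = "submission date"
    INTRAMURAL_PROJECT = "approval date"


def policy_applies(kind: ProjectKind, includes_gwas: bool,
                   key_date: date) -> bool:
    """Return True if the policy covers the project.

    key_date is the date named by kind.value (an assumption of this
    sketch; the caller must pass the right one).
    """
    if not isinstance(kind, ProjectKind):
        raise TypeError("kind must be a ProjectKind")
    # All three categories share the same threshold: on or after the
    # effective date, and only if the project includes GWAS.
    return includes_gwas and key_date >= POLICY_EFFECTIVE_DATE
```

    For instance, a contract proposal that includes GWAS and is 
submitted on January 25, 2008, falls under the policy, while one 
submitted a day earlier does not.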


III. Data Management

Data Repository

    To facilitate broad and consistent access to NIH-supported GWAS 
datasets, the NIH has developed a central NIH GWAS data repository \5\ 
at

[[Page 49295]]

the National Center for Biotechnology Information (NCBI), National 
Library of Medicine. The repository will provide a single point of 
access to basic information about NIH-supported GWAS and to available 
genotype-phenotype datasets for GWAS. Although the NIH envisions that 
access to all NIH-supported GWAS datasets will be possible through this 
repository, it does not intend the repository to become the exclusive 
point of data submission for these data, nor does it intend the central 
database to delimit the structures or tools that may be appropriate for 
other similar databases. The repository also will accept GWAS datasets 
contributed from other sources.
---------------------------------------------------------------------------

    \5\ Currently named the NIH database of Genotypes and Phenotypes 
(dbGaP) (http://www.ncbi.nlm.nih.gov/entrez/query/Gap_tmpl/about.html).

---------------------------------------------------------------------------

    To ensure the security of the data held by the repository, the NCBI 
will employ multiple tiers of data security (such as sequential 
firewalls and independent networks) based on the content and level of 
risk associated with the data. The NIH will establish and maintain 
operating policies and procedures for the repository to address issues 
including, but not limited to, the privacy and confidentiality of GWAS 
research participants, the interests of individuals and groups, data 
access procedures, and data security mechanisms. These will be reviewed 
periodically by the GWAS oversight bodies.
Data Submission
    All investigators who receive NIH support to conduct genome-wide 
analysis of genetic variation in a study population are expected to 
submit to the NIH GWAS data repository descriptive information about 
their studies for inclusion in an open access portion of the NIH GWAS 
data repository. All data and information will be submitted to a high 
security network within the NCBI through a secure transmission process. 
Submissions should include the following:
     The protocol,
     Questionnaires,
     Study manuals,
     Variables measured, and
     Other supporting documentation.
    In addition, the NIH strongly encourages the submission of curated 
and coded phenotype, exposure, genotype, and pedigree data, as 
appropriate, to the NIH GWAS data repository as soon as quality control 
procedures have been completed at the local institution. These detailed 
data will be made available through a controlled access process 
according to the GWAS Data Access procedures (described in Data Access 
section below). Investigators who elect to submit their GWAS data to 
additional data repositories or networks should verify that appropriate 
data security, confidentiality, and privacy measures are in place for 
protection of GWAS participants. Irrespective of where the data are 
submitted, researchers submitting GWAS data are encouraged to consider 
whether a Certificate of Confidentiality might be appropriate for their 
data as an additional safeguard with regard to involuntary disclosure 
of the research participant identities. Further information about 
Certificates of Confidentiality is available at the following Web site: 
http://grants2.nih.gov/grants/policy/coc/.

    In order to minimize risks to study participants, data submitted to 
the NIH GWAS data repository will be de-identified and coded using a 
random, unique code. Data should be de-identified according to the 
following criteria: the identities of data subjects cannot be readily 
ascertained or otherwise associated with the data by the repository 
staff or secondary data users (45 CFR 46.102(f)); the 18 identifiers 
enumerated at section 45 CFR 164.514(b)(2) (the HIPAA Privacy Rule) are 
removed; and the submitting institution has no actual knowledge that 
the remaining information could be used alone or in combination with 
other information to identify the subject of the data.\6\ Keys to codes 
will be held by submitting institutions. Submissions of GWAS data 
should be accompanied by a written certification (detailed below) 
stating that the identities of research participants will not be 
disclosed to the NIH GWAS data repository. Therefore, the NIH GWAS data 
repository will be unable to provide individual research results 
derived from analyses of submitted data to participants. General 
information regarding known publications analyzing GWAS datasets will 
be made available through the repository.
---------------------------------------------------------------------------

    \6\ The identities of data subjects cannot be readily 
ascertained or otherwise associated with the data by the repository 
staff or secondary data users (Common Rule); and the following data 
elements have been removed (HIPAA Privacy Rule).
    1. Names.
    2. All geographic subdivisions smaller than a state, including 
street address, city, county, precinct, ZIP Code, and their 
equivalent geographical codes, except for the initial three digits 
of a ZIP Code if, according to the current publicly available data 
from the Bureau of the Census: a. The geographic unit formed by 
combining all ZIP Codes with the same three initial digits contains 
more than 20,000 people. b. The initial three digits of a ZIP Code 
for all such geographic units containing 20,000 or fewer people are 
changed to 000.
    3. All elements of dates (except year) for dates directly 
related to an individual, including birth date, admission date, 
discharge date, date of death; and all ages over 89 and all elements 
of dates (including year) indicative of such age, except that such 
ages and elements may be aggregated into a single category of age 90 
or older.
    4. Telephone numbers.
    5. Facsimile numbers.
    6. Electronic mail addresses.
    7. Social Security numbers.
    8. Medical record numbers.
    9. Health plan beneficiary numbers.
    10. Account numbers.
    11. Certificate/license numbers.
    12. Vehicle identifiers and serial numbers, including license 
plate numbers.
    13. Device identifiers and serial numbers.
    14. Web universal resource locators (URLs).
    15. Internet protocol (IP) address numbers.
    16. Biometric identifiers, including fingerprints and 
voiceprints.
    17. Full-face photographic images and any comparable images.
    18. Any other unique identifying number, characteristic, or 
code, unless otherwise permitted by the Privacy Rule for re-
identification.
    In addition, the submitting institution should have no actual 
knowledge that the remaining information could be used alone or in 
combination with other information to identify the individual who is 
the subject of the information.
---------------------------------------------------------------------------
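
    As an illustration only, and not as part of this policy or as an 
NIH-provided tool, the ZIP Code rule (item 2), the date and age rule 
(item 3), and the random, unique participant code described above 
might be sketched in Python as follows. The record field names, the 
function names, and the set of low-population ZIP prefixes are 
hypothetical placeholders; actual prefixes must be taken from current 
Bureau of the Census data. The sketch covers only a few of the 18 
identifiers and does not by itself satisfy the criteria above.

```python
import secrets

# Hypothetical placeholder values: three-digit ZIP prefixes whose combined
# population is 20,000 or fewer. Real values must come from Census data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def generalize_zip(zip_code, low_population_zip3=LOW_POPULATION_ZIP3):
    """Keep only the first three ZIP digits, or 000 for low-population units."""
    zip3 = zip_code.strip()[:3]
    return "000" if zip3 in low_population_zip3 else zip3


def generalize_age(age):
    """Aggregate all ages over 89 into a single '90 or older' category."""
    return "90+" if age > 89 else str(age)


def new_participant_code(existing_codes):
    """Draw a random code that is unique within this submission."""
    while True:
        code = secrets.token_hex(8)
        if code not in existing_codes:
            existing_codes.add(code)
            return code


def deidentify(record, key_table, existing_codes):
    """Return a coded record; the key table stays with the submitting institution."""
    code = new_participant_code(existing_codes)
    key_table[code] = record["medical_record_number"]  # held locally, never submitted
    return {
        "participant_code": code,
        "zip3": generalize_zip(record["zip_code"]),
        "age": generalize_age(record["age"]),
        # The year may be kept unless it would indicate an age over 89.
        "birth_year": None if record["age"] > 89 else record["birth_year"],
        "phenotype": record["phenotype"],
    }
```

    In this sketch the key table linking codes to local identifiers is 
kept separate from the returned record, mirroring the expectation that 
keys to codes are held by submitting institutions.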

    All submissions to the NIH GWAS data repository should be 
accompanied by a certification by the responsible Institutional 
Official(s) of the submitting institution that they approve submission 
to the NIH GWAS data repository.
    The certification should assure that:
     The data submission is consistent with all applicable laws 
and regulations,\7\ as well as institutional policies;
---------------------------------------------------------------------------

    \7\ Applicable federal regulations may include HHS human 
subjects regulations (45 CFR part 46), FDA human subjects 
regulations (21 CFR parts 50 and 56), and the Health Insurance 
Portability and Accountability Act Privacy Rule (45 CFR part 160 and 
part 164, Subparts A and E).
---------------------------------------------------------------------------

     The appropriate research uses of the data and the uses 
that are specifically excluded by the informed consent documents are 
delineated;
     The identities of research participants will not be 
disclosed to the NIH GWAS data repository; and
     An IRB and/or Privacy Board, as applicable, reviewed and 
verified that:
    [cir] The submission of data to the NIH GWAS data repository and 
subsequent sharing for research purposes are consistent with the 
informed consent of study participants from whom the data were 
obtained;
    [cir] The investigator's plan for de-identifying datasets is 
consistent with the standards outlined above;
    [cir] It has considered the risks to individuals, their families, 
and groups or populations associated with data submitted to the NIH 
GWAS data repository; and
    [cir] The genotype and phenotype data to be submitted were 
collected in a manner consistent with 45 CFR part 46.
    While the NIH encourages data sharing through this policy,

[[Page 49296]]

circumstances beyond the control of investigators may preclude 
submission of GWAS data to the NIH GWAS data repository. Applications 
submitted to the NIH for support of GWAS in which the above 
expectations for data submission cannot be met will be considered for 
funding on a case-by-case basis by the appropriate IC.
    Submitting investigators and their institutions may request removal 
of data on individual participants from the NIH GWAS data repository in 
the event that a research participant withdraws his or her consent. 
However, data that have been distributed for approved research use will 
not be retrieved.
Data Access
    The basic descriptive and aggregate summary information submitted 
to the NIH GWAS data repository for each NIH-supported or conducted 
GWAS will be available publicly through the NIH GWAS data repository. 
Access to the genotype and phenotype datasets submitted and stored in 
the NIH GWAS data repository, along with appropriate automated 
calculations (e.g., quality control measures, simple genotype-phenotype 
associations, or a listing of all variants known to be in linkage 
disequilibrium \8\ with variants measured in the genotype), will be 
provided for research purposes through an NIH Data Access Committee 
(DAC). Membership of the DACs will include Federal staff with expertise 
in areas such as the relevant scientific disciplines, research 
participant protection, and privacy. The NIH 
anticipates that individual DACs may be established based on 
programmatic areas of interest and the relevant needs for technical and 
ethics expertise. All DACs will operate according to common principles 
and follow similar procedures to ensure the consistency and 
transparency of the GWAS data access process.
---------------------------------------------------------------------------

    \8\ Linkage disequilibrium information will be based on data 
from the International HapMap Project (http://www.hapmap.org/).

---------------------------------------------------------------------------

    Investigators and institutions seeking data from the NIH GWAS data 
repository will be expected to meet data security measures (such as 
physical security, information technology security, and user training) 
and will be asked to submit a data access request, including a Data Use 
Certification, that is co-signed by the investigator and the designated 
Institutional Official(s). Data access requests should include a brief 
description of the proposed research use of the requested GWAS 
dataset(s). Within a Data Use Certification, investigators will agree, 
among other things,\9\ to:
---------------------------------------------------------------------------

    \9\ Investigators requesting access to GWAS datasets who also 
have access to identifying information for the individuals within 
the dataset will require IRB approval.
---------------------------------------------------------------------------

     Use the data only for the approved research;
     Protect data confidentiality;
     Follow appropriate data security protections;
     Follow all applicable laws, regulations and local 
institutional policies and procedures for handling GWAS data;
     Not attempt to identify individual participants from whom 
data within a dataset were obtained;
     Not sell any of the data elements from datasets obtained 
from the NIH GWAS data repository;
     Not share with individuals other than those listed in the 
request any of the data elements from datasets obtained from the NIH 
GWAS data repository;
     Agree to the listing of a summary of approved research 
uses within the NIH GWAS data repository along with his or her name and 
organizational affiliation;
     Agree to report, in real time, violations of the GWAS 
policy to the appropriate DAC;
     Acknowledge the GWAS policy with regard to publication and 
intellectual property; and
     Provide annual progress reports on research using the GWAS 
dataset.
    Data Access Committees or their designees will review requests for 
access to determine whether the proposed use of the dataset is 
scientifically and ethically appropriate and does not conflict with 
constraints or informed consent limitations identified by the 
institutions that submitted the dataset to the NIH GWAS data 
repository. In the event that requests raise concerns related to 
privacy and confidentiality, risks to populations or groups, or other 
concerns, the DAC will consult with other experts as appropriate.

IV. Publication

    The NIH expects that investigators who contribute data to the NIH 
GWAS data repository will retain the exclusive right to publish 
analyses of the dataset for a defined period of time following the 
release of a given genotype-phenotype dataset through the NIH GWAS data 
repository (including the pre-computed analyses of the data). During 
this period of exclusivity, the NIH will grant access through the DACs 
to other investigators, who may analyze the data, but are expected not 
to submit their analyses or conclusions for publication during the 
exclusivity period. The maximum period of exclusivity is twelve months 
from the date that the GWAS dataset is made available for access 
through the NIH GWAS data repository, although a shorter period of 
exclusivity may be determined by the NIH funding IC. Contributing 
investigators are encouraged to shorten the period of publication 
exclusivity at their own discretion. Publication exclusivity is 
expected to extend to all forms of public disclosure, including meeting 
abstracts, oral presentations, and publicly accessible electronic 
submissions (e.g., Web sites, web blogs). Following expiration of the 
exclusive publication period for a given GWAS dataset, the NIH expects 
that all investigators with access to the data may submit publications 
or present analyses for any purpose consistent with the practices and 
policies of their institutions and the NIH. The NIH also expects all 
investigators who access GWAS datasets to acknowledge the Contributing 
Investigator(s) who conducted the original study, the funding 
organization(s) that supported the work, and the NIH GWAS data 
repository in all resulting oral or written presentations, disclosures, 
or publications of the analyses.
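
    As an illustration only, the end of the publication exclusivity 
period described above could be computed as in the following Python 
sketch. The function names are hypothetical. The sketch assumes that 
"twelve months" means calendar months counted from the release date, 
that a release date with no counterpart in the target month falls back 
to that month's last day, and that publication is permitted starting 
on the computed end date; the policy text does not settle these 
details.

```python
import calendar
from datetime import date

MAX_EXCLUSIVITY_MONTHS = 12  # maximum period stated in the policy


def add_months(start, months):
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def exclusivity_end(release_date, months=MAX_EXCLUSIVITY_MONTHS):
    """Date on which the exclusivity period ends for a released dataset."""
    if not 0 <= months <= MAX_EXCLUSIVITY_MONTHS:
        raise ValueError("exclusivity period must be between 0 and 12 months")
    return add_months(release_date, months)


def may_publish(today, release_date, months=MAX_EXCLUSIVITY_MONTHS):
    """True once the exclusivity period has ended (assumed inclusive of the end date)."""
    return today >= exclusivity_end(release_date, months)
```

    A shorter period set by the funding IC or by the contributing 
investigator would be passed through the months argument.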

V. Intellectual Property

    It is the hope of the NIH that genotype-phenotype associations 
identified through NIH-supported and NIH-maintained GWAS datasets and 
their obvious implications will remain available to all investigators, 
unencumbered by intellectual property claims. The NIH discourages 
premature claims on pre-competitive information that may impede 
research, though it encourages patenting of technology suitable for 
subsequent private investment that may lead to the development of 
products that address public needs.
    The NIH will provide approved GWAS data users with certain 
automated calculations (described under the Data Access section) as a 
component of the GWAS datasets distributed through the NIH GWAS data 
repository.
    The NIH expects that NIH-supported genotype-phenotype data made 
available through the NIH GWAS data repository and all conclusions 
derived directly from them will remain freely available, without any 
licensing requirements, for uses such as, but not necessarily limited 
to, markers for developing assays and guides for identifying new 
potential targets for

[[Page 49297]]

drugs, therapeutics, and diagnostics. The intent is to discourage the 
use of patents to prevent the use of or block access to any genotype-
phenotype data developed with NIH support. The NIH encourages broad use 
of NIH-supported genotype-phenotype data that is consistent with a 
responsible approach to management of intellectual property derived 
from downstream discoveries, as outlined in the NIH's Best Practices 
for the Licensing of Genomic Inventions (http://www.ott.nih.gov/policy/genomic_invention.html) 
and its Research Tools Policy (http://ott.od.nih.gov/policy/research--tool.html).
    The filing of patent applications and/or the enforcement of 
resultant patents in a manner that might restrict use of NIH-supported 
genotype-phenotype data could diminish the potential public benefit 
they could provide. Approved users and their institutions, through the 
execution of an NIH Data Use Certification, will acknowledge the goal 
of ensuring the greatest possible public benefit from NIH-supported 
GWAS.
Expectations Defined in the Policy for Investigators
    The detailed expectations are enumerated in the individual sections 
of this policy, and summarized as follows:
Investigators Submitting GWAS Data Are Expected To
     Provide descriptive information about their studies;
     Submit coded genotypic and phenotypic data to the NIH GWAS 
data repository; and
     Submit certification by the Institutional Official(s) of 
the responsible submitting institution that it has reviewed and 
approved submission to the NIH, noting any limitations on data use 
based on the relevant informed consents and providing assurance that 
all data are submitted to the NIH in accord with applicable laws and 
regulations and that the identities of research participants will not 
be disclosed to the NIH GWAS data repository.
Investigators Requesting and Receiving GWAS Data Are Expected To
     Submit a description of the proposed research project;
     Submit a data access request, including a Data Use 
Certification co-signed by the designated Institutional Official(s) at 
their sponsoring institution;
     Protect data confidentiality;
     Ensure that data security measures are in place;
     Notify the appropriate Data Access Committee of policy 
violations; and
     Submit annual progress reports detailing significant 
research findings.
Inquiries
    Specific questions about this Notice should be directed to: Laura 
Lyman Rodriguez, Ph.D., Special Advisor to the Director, National Human 
Genome Research Institute, 31 Center Drive, Room 4B09, Bethesda, MD 
20892, Phone: 301-496-0844; or Sam Shekar, M.D., M.P.H., Assistant 
Surgeon General and Director, Office of Extramural Programs, Office of 
Extramural Research, 1 Center Drive, Bethesda, MD 20892, Phone: 
301-435-3492.
    E-mail inquiries should be directed to GWAS@nih.gov.
    Additional information and detailed implementation guidance related 
to the NIH GWAS Policy will be provided at http://grants.nih.gov/grants/gwas/index.htm.


    Dated: August 22, 2007.
Elias A. Zerhouni,
Director, National Institutes of Health.
 [FR Doc. E7-17030 Filed 8-27-07; 8:45 am]

BILLING CODE 4140-01-P