I The Responsible Use and Publication of Data Generated by the TCGA Pilot Project
The primary purpose of the TCGA, as described on the Project’s web site (cancergenome.nih.gov) and in recent papers on the Project (description paper reference, Scientific American), is to develop and publish a comprehensive catalogue of the genomic changes found in individual cancer types. As is now standard practice in large-scale genomic research projects, the TCGA Pilot Project will adopt and follow a policy of releasing data as quickly as possible prior to publication, anticipating that they will be useful to many investigators. The TCGA Pilot Project anticipates that its data will be of high value in a number of research areas and will be used in many ways. Those include but are not limited to development of new analytical methods, identification of the genomic etiology of individual tumor types and subtypes, and development of new experimental diagnostic, therapeutic and preventive approaches and strategies for cancer. Thus, the TCGA Project recognizes that the data should be available to all users for any purpose, limited only by the need to avoid identifiability of the research participants (Lowrance and Collins, Science, August 3, 2007).
The NCI and NHGRI have identified the TCGA Pilot as a “community resource project…a research project specifically devised and implemented to create a set of data, reagents or other material whose primary utility will be as a resource for the broad scientific community.” This concept was developed at a meeting that was held to discuss the release of pre-publication data from large resource-generating scientific projects. That meeting, the “Fort Lauderdale meeting,” was held in January 2003 and was sponsored by the Wellcome Trust and the NHGRI, one of the TCGA’s funders. The report from that meeting is at http://www.genome.gov/Pages/Research/WellcomeReport0303.pdf.
The recommendations from the Fort Lauderdale meeting address the roles and responsibilities of data producers, data users, and funders of community resource projects, with the aim of establishing and maintaining an appropriate balance between the interests that data users have in rapid access to data and the needs that data producers have to publish and receive recognition for their work. The conclusion of the attendees at the Fort Lauderdale meeting was that a “responsible use” approach would be the best way to ensure that first-rate data producers will continue to participate in such projects and produce and quickly release large-scale data sets of broad use to a wide range of investigators. “Responsible use” was defined as allowing the data producers to have the opportunity to publish the initial global analyses of the data, as specifically articulated at the outset of the project, within a reasonable period of time.
The TCGA Project currently plans to prepare several manuscripts based on TCGA data (to be elaborated by the TCGA Steering Committee; this should include both topics and rough estimates of timing of submission):
- Commentary detailing the scientific aims and organization of TCGA (to be submitted early 2008).
- Interim integrated microarray data analysis of GBM partial set (to be submitted mid 2008).
- Analysis of DNA sequencing data for the GBM sample set (to be submitted late 2008).
- Final integrated microarray data analysis and sequence data analysis of completed GBM set (to be submitted late 2009).
- Ovary and Lung reports corresponding to the second through fourth items above, on a timeline about one year behind that of the GBM reports.
To act in accord with the Fort Lauderdale principles and support the continued prompt public release of large-scale genomic data prior to publication, researchers who plan to prepare manuscripts that would be comparable to the analyses described above, and journal editors who receive such manuscripts, are encouraged to coordinate their independent reports with the Project’s publication schedule described above. This may be done by contacting the co-chairs of the Project’s Publications Group (TBN).
Beyond the topics described above, researchers are free, and indeed encouraged, to publish results based on integrating TCGA data with data from other sources, particularly in efforts to study the role of specific genes and genomic changes in the biology of cancer. Researchers also are encouraged to use TCGA data to publish on the development of novel methods to analyze genomic data related to cancer and genotype-phenotype relationships in cancer. This may include the application of these methods to portions of the data, for example specific cancer subtypes or particular aspects of tumor biology.
The NCI and NHGRI do not consider deposition of data from the TCGA Pilot Project, like those from other large-scale genomic projects, into its own (http://cancergenome.nih.gov/dataportal/data/about/) or public databases to be the equivalent of publication in a peer-reviewed journal. Therefore, although the data are available to others, the producers still consider them to be formally unpublished and expect that the data will be used in accord with standard scientific etiquette and practices concerning unpublished data.
Prior to the publication of the initial paper, the TCGA Project requests that authors who use data from the Project acknowledge the Project and reference the description paper (which has the TCGA Project as author): Reference for description paper.
Authors are also encouraged to acknowledge the appropriate sample donors and research groups, which can be found in the description paper. Similarly, the TCGA Project requests that Journal editors and reviewers attempt to ensure that the description paper is cited and that appropriate acknowledgements are made.
To ensure protection of genetic privacy for sample donors, data users will have to agree to certain conditions, described in the TCGA Patient Protection Policy and Controlled Access Policy, as to how the data will be used. For example, users will have to agree that they will share these data only with others who have also completed a data access agreement and that they will not patent discoveries in a way that prevents others from using the data (refer to IP policy). This means that reviewers of a manuscript who need to see any controlled-access TCGA data underlying a result must also agree to these user access conditions before they can see these data.
Meeting presentations of TCGA data and analyses are possible and encouraged. However, to keep track of meeting presentations, and to avoid similar or identical presentations of the same data at a single meeting, we request that each presenter submit their abstract to the TCGA publication working group 2 weeks before the abstract is due. If duplicate meeting presentations occur, the presenters will be contacted by the publication working group, which will suggest how to divide the presentations to minimize overlap. Public oral presentations of the data at meetings are also allowed and encouraged, but each investigator is asked to keep track of when and where these presentations occurred. The TCGA Publication committee will provide each investigator with a series of 2-3 slides that must be displayed on all posters, or shown as part of an oral presentation, in order to properly cite the TCGA project and its many contributors. It is critical that the TCGA project also be properly cited and identified in meeting abstracts, and language will be provided for this purpose.