Database Development in Toxicogenomics: Issues and Efforts

William B. Mattes,1 Syril D. Pettit,2 Susanna-Assunta Sansone,3 Pierre R. Bushel,4 and Michael D. Waters4

1Pfizer Inc, Groton, Connecticut, USA; 2ILSI Health and Environmental Sciences Institute, Washington, DC, USA; 3European Molecular Biology Laboratory-European Bioinformatics Institute, Hinxton, United Kingdom; 4National Center for Toxicogenomics, National Institute of Environmental Health Sciences, National Institutes of Health, Department of Health and Human Services, Research Triangle Park, North Carolina, USA

Abstract

The marriage of toxicology and genomics has created not only opportunities but also novel informatics challenges. As with the larger field of gene expression analysis, toxicogenomics faces the problems of probe annotation and data comparison across different array platforms. Toxicogenomics studies are generally built on standard toxicology studies that generate biological end point data; as such, one goal of toxicogenomics is to detect relationships between changes in gene expression and changes in those biological parameters. These challenges are best addressed through data collection into a well-designed toxicogenomics database. A successful publicly accessible toxicogenomics database will serve as a repository for data sharing and as a resource for analysis, data mining, and discussion. It will offer a vehicle for harmonizing nomenclature and analytical approaches and serve as a reference for regulatory organizations to evaluate toxicogenomics data submitted as part of registrations. Such a database would capture the experimental context of in vivo studies with great fidelity, such that the dynamics of the dose response could be probed statistically with confidence. This review presents the collaborative efforts among the European Molecular Biology Laboratory-European Bioinformatics Institute ArrayExpress, the International Life Sciences Institute Health and Environmental Sciences Institute, and the National Institute of Environmental Health Sciences National Center for Toxicogenomics Chemical Effects in Biological Systems knowledge base. The goal of this collaboration is to establish public infrastructure on an international scale and examine other developments aimed at establishing toxicogenomics databases.
In this review we discuss several issues common to such databases: the requirement for identifying minimal descriptors to represent the experiment, the demand for standardizing data storage and exchange formats, the challenge of creating standardized nomenclature and ontologies to describe biological data, the technical problems involved in data upload, the necessity of defining parameters that assess and record data quality, and the development of standardized analytical approaches.

Key words: ArrayExpress, bioinformatics, CEBS, database, EBI, HESI, MIAME, NCT, toxicogenomics. Environ Health Perspect 112:495-505 (2004). doi:10.1289/txg.6697 available via http://dx.doi.org/ [Online 15 January 2004]

This article is part of the mini-monograph "Application of Genomics to Mechanism-Based Risk Assessment."

Address correspondence to W.B. Mattes, GeneLogic, Inc., 610 Professional Dr., Gaithersburg, MD 20879 USA. Telephone: (240) 364-6238. Fax: (240) 364-6262. E-mail: wmattes@genelogic.com

We thank A. Brazma, Microarray Informatics (EMBL-EBI); C. Bradfield, McArdle Laboratory for Cancer Research, University of Wisconsin, Madison, WI; W. Tong, National Center for Toxicological Research, Jefferson, AR; and W. Eastin, National Toxicology Program, National Institute of Environmental Health Sciences, Research Triangle Park, NC, for their review of this manuscript prior to submission. We also thank the microarray informatics team at EMBL-EBI, the expression profiler developers, and the ArrayExpress curation and development teams. We especially thank S. Contrino for his contribution to Tox-MIAMExpress. The ArrayExpress project is funded by EMBL, the European Commission [TEMBLOR (The European Molecular Biology Linked Original Resources) grant], the EBI Industry Programme (Biostandards), the CAGE (Compendium of Arabidopsis Gene Expression) consortium, and the Health and Environmental Sciences Institute (HESI) Toxicogenomics Database grant.
The authors declare they have no competing financial interests.

Received 25 August 2003; accepted 12 January 2004.