Annotation and Cross-Indexing of Array Elements on Multiple Platforms

William B. Mattes

Investigative Toxicology, Pfizer Inc, Kalamazoo, Michigan, USA

Abstract

On the surface, transcript profiling using microarrays seems to offer a way of looking at the global response of the cell to perturbation, with a focus on changes in gene expression. The difficulty, however, is that the response of a particular gene is actually measured on the array by an element that is a short, defined nucleic acid sequence. Sequences that map back to the same genetic locus may actually be given different names and descriptions when they are deposited in public sequence databases; when such sequences are used in microarray construction, elements that monitor the same genetic locus may have different names and descriptions. The algorithm described here uses a hierarchical approach to assign a single best annotation to the elements in a given microarray in such a fashion that elements from one microarray platform may be cross-indexed with those of another. The algorithm relies on the nucleic acid accession number for a given array element, and uses that to retrieve annotation from the most recent versions of LocusLink and UniGene. Both database resources are searched, with a priority being given to annotation derived from the curated LocusLink database. In lieu of annotation found in these databases, the default GenBank annotation is used. As a final outcome, a cross-chip identifier is generated that may be used to cross-index array elements. The program is available as a practical extraction and report language (Perl) script that can run under any Perl interpreter.

Key words: annotation, cross-platform, indexing, LocusLink, microarray, UniGene.

Environ Health Perspect 112:506-510 (2004). doi:10.1289/txg.6698 available via http://dx.doi.org/ [Online 15 January 2004]

This article is part of the mini-monograph "Application of Genomics to Mechanism-Based Risk Assessment."
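The hierarchical lookup described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the Perl of the published script and is not the author's implementation. It assumes that the LocusLink, UniGene, and GenBank annotations have already been loaded into dictionaries keyed by accession number. The function names `annotate` and `cross_index`, the `LL:`/`UG:`/`GB:` identifier prefixes, and all accessions, locus IDs, and cluster IDs in the example data are hypothetical; the abstract does not specify the format of the cross-chip identifier.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    source: str       # resource that supplied the annotation
    cross_id: str     # hypothetical cross-chip identifier
    description: str  # name or title taken from that resource


def annotate(accession, locuslink, unigene, genbank):
    """Pick one annotation per accession: LocusLink, then UniGene, then GenBank."""
    if accession in locuslink:                      # curated source has priority
        locus_id, name = locuslink[accession]
        return Annotation("LocusLink", f"LL:{locus_id}", name)
    if accession in unigene:                        # fall back to the UniGene cluster
        cluster_id, title = unigene[accession]
        return Annotation("UniGene", f"UG:{cluster_id}", title)
    # Default: keep the GenBank description and key on the accession itself.
    return Annotation("GenBank", f"GB:{accession}", genbank.get(accession, ""))


def cross_index(platform_a, platform_b, locuslink, unigene, genbank):
    """Map each shared cross-chip identifier to the elements on both platforms."""
    def by_cross_id(platform):
        index = defaultdict(list)
        for element, accession in platform.items():
            ann = annotate(accession, locuslink, unigene, genbank)
            index[ann.cross_id].append(element)
        return index

    index_a, index_b = by_cross_id(platform_a), by_cross_id(platform_b)
    return {cid: (index_a[cid], index_b[cid]) for cid in index_a if cid in index_b}


# Made-up example data (none of these identifiers are real).
LOCUSLINK = {"ACC001": ("1001", "gene alpha"), "ACC002": ("1001", "gene alpha")}
UNIGENE = {
    "ACC002": ("Xx.55", "cluster containing ACC002"),
    "ACC003": ("Xx.77", "transcribed sequence"),
    "ACC004": ("Xx.77", "transcribed sequence"),
}
GENBANK = {"ACC001": "clone 12, mRNA sequence", "ACC005": "uncharacterized clone"}

CHIP_A = {"A_elem_1": "ACC001", "A_elem_2": "ACC003", "A_elem_3": "ACC005"}
CHIP_B = {"B_elem_9": "ACC002", "B_elem_4": "ACC004", "B_elem_7": "ACC006"}

SHARED = cross_index(CHIP_A, CHIP_B, LOCUSLINK, UNIGENE, GENBANK)
```

In this toy data, `ACC001` and `ACC002` are different accessions assigned to the same LocusLink locus, so the elements that carry them on the two chips are paired. `ACC002` also appears in the UniGene table, but the LocusLink entry takes precedence. Elements that fall through to the GenBank default are paired only if both platforms use the identical accession.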
Address correspondence to W.B. Mattes, Gene Logic Inc., 610 Professional Dr., Gaithersburg, MD 20879 USA. Telephone: (240) 364-6238. Fax: (240) 364-6262. E-mail: wmattes@genelogic.com

The author thanks the many colleagues who offered support and advice. The input of B. Pennie (Pfizer Inc), P. Lord (Johnson & Johnson Pharmaceutical Research Division), R. Paules [National Institute of Environmental Health Sciences (NIEHS)], and D. Robinson (Pfizer Inc) from the International Life Sciences Institute Health and Environmental Sciences Institute (ILSI HESI) Committee on the Application of Genomics to Mechanism-Based Risk Assessment was critical to the initiation and continuation of this effort. J. Fostel (NIEHS), I. Reardon (Pfizer Inc), C. Storer (Pfizer), and M. Lawton (Pfizer Inc) offered especially helpful comments over the course of this project on the algorithm and Perl programming in general. The author also thanks C. Bradfield (McArdle Laboratory for Cancer Research, University of Wisconsin) for a careful review of this article. Finally, the author is indebted to S. Pettit (ILSI HESI) for her constant support and suggestions.

The author declares no competing financial interests.

Received 25 August 2003; accepted 12 January 2004.