SDF Download Page
ARYEXP: European Bioinformatics Institute (EBI) ArrayExpress Repository for Gene Expression Experiments
Structure-Index Locator File
** Updated Version 2a DSSTox Structure-Index Locator File, 06 March 2009 (Source website content extracted 20Jan2009)
Quick & Easy File Downloads: FTP Download Instructions
Description
Auxiliary Data File (ARYEXP_Aux_v2a)
Source Website & Contact
Main Citation
Guidance for Use
SDF Fields
Version 2 Update **
SDF Content
New Users: For general information, see DSSTox Project Goals and About DSSTox. For additional information on DSSTox SDF (Structure Data Format) files and their use in Chemical Relational Databases, see More on SDF and More on CRDs.
Description: The European Bioinformatics Institute (EBI) ArrayExpress Repository is a public repository for transcriptomics (gene expression) data that supports use of MIAME guidelines in accordance with Microarray Gene Expression Data Society (MGED) recommendations. Since ArrayExpress went online in 2002, the Repository has grown to more than 7,700 experiments as of 2009. Public data in ArrayExpress are made available for browsing and querying on experiment properties, submitter, species, etc. Queries return summaries of experiments, and complete data or subsets can be retrieved (http://www.ebi.ac.uk/microarray-as/aer/entry). Recent additions to ArrayExpress include new portals for programmatic access, through which users can query and download data in a systematic or automated manner from the ArrayExpress FTP site.
The EBI ArrayExpress Repository and the National Center for Biotechnology Information (NCBI) GEO Series (see accompanying DSSTox file, GEOGSE) are the two main public repositories of gene expression data and microarray experiments associated with the scientific literature. Deposition of data into one of these two resources is now a standard precondition for journal publication of microarray studies. At the time of this writing, neither resource has standard requirements for reporting of chemical information associated with submitter-deposited experiments. As a result, until now it has been difficult to assess the chemical-related content or, more specifically, the chemical exposure-related content in these resources, and microarray experiments have consequently been isolated from other public sources of chemically-indexed information pertaining to toxicology. This DSSTox project was undertaken to use chemical information linkages to contribute to building a public toxicogenomics capability and to encourage the application of structure-activity relationship (SAR) concepts to gene expression data where sufficient comparable experiments on chemical analogs are available.
The DSSTox ARYEXP data file is a chemical-index file of unique chemical substances pertaining to the chemical exposure-related experimental content (identified by us as Chemical_StudyType = "Treatment") within the ArrayExpress Repository as of the date of data extraction (see Note). The chemical exposure-related content of the ArrayExpress Repository was identified through a series of automated methods that filtered for characteristics such as experimental design type (compound treatment, dose response, or time course); the occurrence of keywords (compound, chemical, treatment, drug, etc.) in the experimental description category; or the occurrence of specific accession numbers such as TOXM. These automated methods, however, were insufficient and had to be supplemented by extensive manual curation and review of the chemical content extracted from ArrayExpress fields and free-text description submitter entries (Williams-Devane et al. 2009). The final DSSTox ARYEXP file contains the full complement of DSSTox Standard Chemical Fields for each unique substance, as well as URL link(s) to one or more chemical-specific Experiment_Accession number data page(s) within the ArrayExpress Repository. All ArrayExpress Experiment Accession numbers pertaining to the same chemical substance (i.e., the same DSSTox_Generic_SID) are listed in the Experiment_Accession field in the same ARYEXP chemical record.
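The kind of automated pre-filter described in this paragraph can be pictured with a minimal Python sketch. The dictionary keys (`design_types`, `description`, `accession`), the function name, and the `E-XXXX-*` accessions are illustrative assumptions, not the ArrayExpress schema or the authors' code; only the design types, keywords, and the TOXM tag come from the text. Such a filter would only flag candidates, which still required manual curation.

```python
import re

# Filter criteria named in the description; the real lists may have been longer.
DESIGN_TYPES = {"compound treatment", "dose response", "time course"}
KEYWORDS = ("compound", "chemical", "treatment", "drug")
ACCESSION_TAGS = ("TOXM",)

def is_candidate(experiment: dict) -> bool:
    """Return True if a (hypothetical) experiment record looks exposure-related."""
    # 1. Experimental design type
    designs = {d.lower() for d in experiment.get("design_types", [])}
    if designs & DESIGN_TYPES:
        return True
    # 2. Keyword (singular or plural) in the free-text description
    text = experiment.get("description", "").lower()
    if any(re.search(rf"\b{re.escape(k)}s?\b", text) for k in KEYWORDS):
        return True
    # 3. Specific tag inside the accession number, e.g. E-TOXM-31
    parts = experiment.get("accession", "").split("-")
    return any(tag in parts for tag in ACCESSION_TAGS)

# E-XXXX-* accessions are made-up placeholders.
experiments = [
    {"accession": "E-TOXM-31", "design_types": [], "description": ""},
    {"accession": "E-XXXX-1", "design_types": ["Dose response"], "description": ""},
    {"accession": "E-XXXX-2", "design_types": [],
     "description": "Wild-type versus mutant comparison"},
]
candidates = [e["accession"] for e in experiments if is_candidate(e)]
print(candidates)
```

Keyword matching of this sort is deliberately permissive, which is consistent with the need for the manual review step that followed.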
The DSSTox ARYEXP chemical index file has been incorporated into the DSSTox Structure-Browser, and deposited into PubChem, enabling a user to locate particular chemical-associated experiments or those associated with close chemical analogs through a structure similarity search.
ARYEXP Auxiliary Data File: During the course of this project, a large amount of chemical-associated information of potential use for toxicogenomics investigations was initially curated from the full ArrayExpress Repository file. Prior to identifying chemical exposure-related ArrayExpress content (i.e., Treatment vs. other uses, such as Reference, Vehicle, Media, etc.), we created a full listing of ArrayExpress Repository chemical-experiment pairs (i.e., one record per Experiment Accession number, with some DSSTox_Generic_SID substances spanning multiple records and experiments), along with a full complement of summary experimental descriptors and indices provided by ArrayExpress. These summary experimental fields include MIAME score elements, species, array type, number of samples, and URL linkages to raw data, among others. This content is contained in the Auxiliary Data File (ARYEXP_Aux) offered in the Download Table below in SD or table format. The file contains the full complement of DSSTox Standard Chemical Fields, as well as 44 Source-specific content fields from ArrayExpress experiment annotations (an MS Word doc file listing all fields and their definitions is also included in the Download Table below). The content of these files will be incorporated, along with the GEOGSE files, into the Chemical Effects in Biological Systems (CEBS) database and is being provided to the EBI ArrayExpress project in the hope of improving chemical annotation and data linkages of public gene expression resources in the future.
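The relationship between the auxiliary file (one record per chemical-experiment pair) and the main ARYEXP index (one record per unique substance, "Treatment" experiments only) can be sketched in Python as follows. This is an illustration under assumptions, not the procedure actually used: the rows, SID values, and `E-AAAA-*` accessions are made up, and only the three field names come from this page.

```python
from collections import defaultdict

# Hypothetical rows shaped like ARYEXP_Aux records (one row per pair).
aux_rows = [
    {"DSSTox_Generic_SID": 1, "Experiment_Accession": "E-AAAA-1", "Chemical_StudyType": "Treatment"},
    {"DSSTox_Generic_SID": 1, "Experiment_Accession": "E-AAAA-2", "Chemical_StudyType": "Treatment"},
    {"DSSTox_Generic_SID": 1, "Experiment_Accession": "E-AAAA-3", "Chemical_StudyType": "Vehicle"},
    {"DSSTox_Generic_SID": 2, "Experiment_Accession": "E-AAAA-1", "Chemical_StudyType": "Treatment"},
    {"DSSTox_Generic_SID": 3, "Experiment_Accession": "E-AAAA-2", "Chemical_StudyType": "Reference"},
]

def build_index(rows):
    """Collapse Treatment pairs into one accession list per DSSTox_Generic_SID."""
    index = defaultdict(list)
    for row in rows:
        if row["Chemical_StudyType"] != "Treatment":
            continue  # Reference, Vehicle, Media, etc. stay in the Aux file only
        accessions = index[row["DSSTox_Generic_SID"]]
        if row["Experiment_Accession"] not in accessions:
            accessions.append(row["Experiment_Accession"])
    return dict(index)

index = build_index(aux_rows)
print(index)
```

In this toy input, substance 3 appears only as a Reference and therefore has no record in the resulting index, while substance 1 carries two accessions in a single record.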
Source Website: EBI ArrayExpress is located online at http://www.ebi.ac.uk/microarray-as/aer.
Note: The EBI ArrayExpress Repository is regularly updated; the DSSTox ARYEXP_v2a content represents a snapshot of the chemical exposure-related content of that repository extracted on 20Jan2009 (v1a corresponded to data extraction on 20Sep2008).
Source Contact: Contact ArrayExpress staff at arrayexpress@ebi.ac.uk.
Main Citations: For more information on this project and procedures used to extract data and chemically annotate gene expression experiments in the two main public repositories, ArrayExpress and GEO, see:
Williams-Devane, C.R., M.A. Wolf, and A.M. Richard (2009) DSSTox Chemical-index Files for Exposure-Related Experiments in ArrayExpress and Gene Expression Omnibus: Enabling Toxico-chemogenomics Data Linkages, Bioinformatics, 25:692-694.
Williams-Devane, C.R., M.A. Wolf, and A.M. Richard (2009) Towards a public toxicogenomics capability for supporting predictive toxicology: Survey of current resources and chemical indexing of experiments in GEO and ArrayExpress, Toxicological Sciences, in press.
Guidance for Use: ARYEXP represents a departure from previously published DSSTox data files, which either contain toxicology data of potential use for structure-activity relationship (SAR) modeling, or are high-interest chemical inventories for environmental toxicology from the EPA or National Toxicology Program. This is the first DSSTox file to chemically index a public repository of microarray experiments of potential use for toxicogenomics investigations. The DSSTox ARYEXP file is an inventory of unique chemical substances, with each chemical mapped to one or more experiments contained within the ArrayExpress Repository and, in each case, chemical exposure (or treatment) is deemed a primary objective of the experiment. The file was created to encourage consideration of chemical structure and chemical similarity as an organizing principle for such data, to aid in association of common gene expression patterns, and to aid in the aggregation of multiple data types for potential toxicogenomics investigation. Users should be aware that the chemically indexed experimental content of the public ArrayExpress Repository spans a large diversity of treatment conditions, species, array types, data annotation, laboratories, etc. Hence, data aggregation by chemical or chemical similarity must also consider and attempt to control for these many variables in a public repository. An auxiliary data file, ARYEXP_Aux, is offered for download that includes a larger set of chemical-experiment pairs (including all categories of chemical experiment association) for the ArrayExpress Repository and 44 additional data fields.
DSSTox Standard Chemical Fields (20)
Source_ChemicalName (new field added Feb2009)
Note_ARYEXP
Chemical_StudyType
Experiment_Accession
Experiment_URL
e.g. Acetonitrile: E-TOXM-31
Version 2 Update: ARYEXP_v2a and ARYEXP_Aux_v2a contain updated content extracted from the ArrayExpress website as of 20Jan2009 (v1a corresponded to data extraction on 20Sep2008). The method of data extraction and file construction is documented in the Main Citations. A total of 191 new experiments were determined to be associated with a chemical substance and were included in the updated ARYEXP_Aux_v2a file. Of these, 163 were labeled by us as chemical "treatment" experiments, and these new experiments correspond to 74 new unique chemicals (3 chemicals were deleted from v1a, leaving a net total of 71 new unique chemical records associated with "treatment" experiments). Hence, 163 new chemical treatment experiment links are provided in PubChem (for a total of 1999 PubChem chemical-experiment pair entries), 71 new chemicals with links to one or more experiments have been added to the ARYEXP_v2a structure-index file, and a total of 161 new URLs to experiments were added. The chemical content totals for ARYEXP_v1a and v2a are summarized in the table below.
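The counts in this update can be cross-checked against the v1a and v2a totals reported in the content summary on this page (887 and 958 unique chemical records; 1836 and 1999 treatment chemical-experiment pairs). The short Python check below uses only those published numbers; the variable names are ours.

```python
# Unique chemical records
v1a_chemicals = 887
new_chemicals = 74       # new unique chemicals from the 163 new treatment experiments
deleted_chemicals = 3    # chemicals removed from v1a
net_new_chemicals = new_chemicals - deleted_chemicals
v2a_chemicals = v1a_chemicals + net_new_chemicals

# Treatment chemical-experiment pairs (PubChem entries)
v1a_pairs = 1836
new_treatment_links = 163
v2a_pairs = v1a_pairs + new_treatment_links

print(net_new_chemicals, v2a_chemicals, v2a_pairs)
```

Both results agree with the v2a totals in the summary table, so the stated changes account fully for the difference between versions.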
Whereas in v1a, the field TestSubstance_ChemicalName was used to store the Source-provided chemical name obtained from the ArrayExpress experimental record (with all abbreviations and sometimes errors), in v2a (and in all DSSTox files posted after Jan09), this Source-provided chemical name has been moved to a new field Source_ChemicalName. The Standard Chemical Field, TestSubstance_ChemicalName, now carries a default, quality-reviewed chemical name used for this substance (DSSTox_Generic_SID) across all DSSTox files (this can be a common, generic or trade name). Structure_InChI and Structure_InChIKey codes have been updated to correspond to the newly published NIST recommended standard InChI options (see http://www.epa.gov/ncct/dsstox/MoreonInChI.html#InChIDSSTox).
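For users who still hold v1a records, the field change described here amounts to the following transformation, shown as a hedged Python sketch. The function name, the `reviewed_names` lookup, the SID value, and the source name string are hypothetical; only the three field names are taken from this page.

```python
def migrate_record(v1a_record: dict, reviewed_names: dict) -> dict:
    """Return a v2a-style copy of a v1a record (illustrative only)."""
    record = dict(v1a_record)  # leave the input untouched
    # The source-provided name moves to the new field, unchanged.
    record["Source_ChemicalName"] = record["TestSubstance_ChemicalName"]
    # The standard field now carries the quality-reviewed default name.
    sid = record["DSSTox_Generic_SID"]
    record["TestSubstance_ChemicalName"] = reviewed_names[sid]
    return record

# Hypothetical example: SID 1 and the name strings are placeholders.
v1a = {"DSSTox_Generic_SID": 1, "TestSubstance_ChemicalName": "name as submitted"}
v2a = migrate_record(v1a, {1: "reviewed default name"})
print(v2a)
```

The practical consequence is that name-based joins built against v1a should be keyed on Source_ChemicalName (or, more robustly, on DSSTox_Generic_SID) when moving to v2a.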
** Note that a misalignment of PubChem substances to URL listings in v1a caused misdirection of some substances to ArrayExpress Experiment descriptions. This problem has been corrected in v2a.
For more information and version history, and to locate specific updated chemical records, consult the ARYEXP_LogFile in the Download Table below and version update entries in the Note_ARYEXP field.
ARYEXP SDF Content Summary - 06 March 2009
| ARYEXP SDF Content | Totals_v1a | Totals_v2a |
|---|---|---|
| # Unique Chemical Records | 887 | 958 |
| DSSTox Standard Chemical Fields | 20 | 20 |
| DSSTox Standard Toxicity Fields | 1 | 1 |
| ARYEXP Source Fields | 4 | 5 |
| Total # Fields | 25 | 26 |
| Total # Treatment Experiment Accession IDs* | 1836 | 1999 |
| Chemical Content | Counts_v1a | Counts_v2a |
| defined organic | 628 | 674 |
| inorganic | 60 | 61 |
| organometallic | 20 | 20 |
| no structure | 179 | 203 |
| STRUCTURE_TestedForm_DefinedOrganic: | | |
| parent | 544 | 585 |
| complex | 61 | 66 |
| salt | 23 | 23 |
| salt complex | 0 | 0 |
| TestSubstance_Description: | | |
| single chemical compound | 669 | 716 |
| macromolecule | 165 | 182 |
| mixture or formulation | 42 | 46 |
* Note: Total includes replicate Experiment Accession IDs and corresponds to unique chemical-experiment pairs, which includes many cases where the same Experiment Accession ID is mapped to different unique chemicals (i.e., experiment/study tested many chemicals).
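The distinction drawn in this note, between the number of chemical-experiment pairs and the number of distinct Experiment Accession IDs, can be illustrated with a small Python sketch. The chemical labels and `E-AAAA-*` accessions are made-up placeholders.

```python
# Each tuple is one unique (chemical, experiment accession) pair.
pairs = {
    ("chemical A", "E-AAAA-1"),
    ("chemical B", "E-AAAA-1"),  # same study, different chemical
    ("chemical C", "E-AAAA-1"),
    ("chemical A", "E-AAAA-2"),
}

total_pairs = len(pairs)                                  # what the table total counts
unique_accessions = len({acc for _, acc in pairs})        # distinct experiments
print(total_pairs, unique_accessions)
```

Here one study that tested three chemicals contributes three pairs but only one accession, which is why the reported total exceeds the number of distinct experiments.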
File Download Notes: The following files are offered in the Download Table below:
Structure Data File (SDF) is the main DSSTox product, providing the complete inventory of chemical structures, DSSTox Standard Chemical Fields, and all Source-specific data fields [Note: the structure field is blank for all records containing mixtures or undefined substances];
Data Table MS Excel (MS Office 2003) file contains the full SDF data contents in spreadsheet table form, minus the chemical structure field [file created with CambridgeSoft ChemDraw Ultra plug-in to MS Excel 2004];
Structures Table (PDF) file contains a tiled format graphical view of all chemical structures contained in the SDF file, annotated with TestSubstance_CASRN and truncated TestSubstance_ChemicalName field entries for the tested form of the chemical [file created with ACD ChemFolder, ver. 11.00, ACD Labs].
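For readers without a cheminformatics toolkit, the data fields of an SDF can be read with a few lines of standard-library Python, as in the sketch below. It relies only on general SDF conventions (a molblock, then `> <FieldName>` data items each ended by a blank line, then a `$$$$` record terminator); the function name and the sample record, including its name and accession values, are hypothetical and are not taken from the ARYEXP file. The sample has an empty structure block, as would be the case for a mixture or undefined substance.

```python
def parse_sdf(lines):
    """Return a list of (molblock, fields) tuples from an iterable of SDF lines."""
    records = []
    block, fields, name, in_data = [], {}, None, False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "$$$$":               # end of record
            records.append(("\n".join(block), fields))
            block, fields, name, in_data = [], {}, None, False
        elif line.startswith(">") and "<" in line:  # data header: > <FieldName>
            name = line[line.index("<") + 1 : line.rindex(">")]
            fields[name] = ""
            in_data = True
        elif in_data:
            if line.strip() == "":               # blank line closes the data item
                name = None
            elif name is not None:               # value line (may span several lines)
                fields[name] = f"{fields[name]}\n{line}" if fields[name] else line
        else:
            block.append(line)                   # still inside the molblock
    return records

# Hypothetical record with zero atoms and zero bonds.
sample = "\n".join([
    "",                                          # molecule name line (blank)
    "  hypothetical",                            # program/timestamp line
    "",                                          # comment line
    "  0  0  0  0  0  0  0  0  0  0999 V2000",   # counts line
    "M  END",
    "> <TestSubstance_ChemicalName>",
    "hypothetical mixture",
    "",
    "> <Experiment_Accession>",
    "E-XXXX-1",
    "",
    "$$$$",
])

records = parse_sdf(sample.splitlines())
molblock, fields = records[0]
print(fields)
```

A dedicated toolkit is preferable when the structures themselves are needed; this sketch is only meant for pulling out the text fields.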
You will need Adobe Acrobat Reader, available as a free download, to view the Adobe PDF files on this page. See EPA's PDF page to learn more about PDF, and for a link to the free Acrobat Reader.
Zip files may be decompressed using a utility such as JZip.
These files constitute the main DSSTox products. DSSTox Structure Data Files and DSSTox File Names adhere to strict formatting standards and conventions. For additional information, see More on DSSTox Standard Chemical Fields, Known Problems & Fixes, Chemical Information Quality Review Procedures, and How to Use DSSTox Files.
Quick & Easy File Downloads: FTP Download
Acknowledgements: All original and updated file content was extracted from the on-line ArrayExpress resource by ClarLynda Williams, using a combination of automated and manual curation. QA review, corrections to submitter chemical information, and structure annotation were carried out by Maritja Wolf (Lockheed Martin, Contractor for EPA). We thank Jennifer Fostel (NIEHS CEBS) and Chihae Yang (Ohio State University) for their helpful comments in the review of this work. We also thank Tom Transue (Lockheed Martin, Contractor for EPA) for assistance with loading of ARYEXP into the DSSTox Structure-Browser and QA review, and Erik Griffis for assistance in reviewing v1a ArrayExpress content. Updated files were created by ClarLynda Williams and Maritja Wolf.
DSSTox Citation:
Williams-Devane, C.R., M.A. Wolf, and A.M. Richard (2009) DSSTox European Bioinformatics Institute (EBI) ArrayExpress Repository for Gene Expression Data (ARYEXP and ARYEXP_Aux): SDF Files and Documentation, Updated versions: ARYEXP_v2a_958_06Mar2009, ARYEXP_Aux_v2a_2556_06Mar2009, www.epa.gov/ncct/dsstox/sdf_aryexp.html
Disclaimer: Every effort is made to ensure that DSSTox SDF files and associated documentation are error-free, but neither the DSSTox Source collaborators nor the EPA DSSTox project team make guarantees of accuracy, nor are any of these persons to be held liable for any subsequent use of these public data. The contents of this webpage and supporting documents have been subjected to review by the EPA National Center for Computational Toxicology and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use. See additional disclaimers.
EPA/600/C-06/009