SDF Download Page
NTPHTS: National Toxicology Program High Throughput Screening Project
Structure-Index File
** Version 2c DSSTox Structure-Index File, updated 11 March 2009
A total of 9 structures modified from v2b
PubChem IDs removed from DSSTox file and provided in separate txt file to extract NCGC assay results from PubChem
Update: Instructions for accessing PubChem Bioassay Data associated with the NCGC NTPHTS 1408 structure-index file.
Quick & Easy File Downloads: FTP Download Instructions
Description
Source Website & Contact
Retrieving PubChem Bioassay Data
SDF Fields
SDF Content Summary
Version 2c Update
SDF Download Table
Acknowledgements, DSSTox Citation & Disclaimer
New Users: For general information, see DSSTox Project Goals and About DSSTox. For additional information on DSSTox SDF (Structure Data Format) files and their use in Chemical Relational Databases, see More on SDF and More on CRDs.
Description: The National Toxicology Program (NTP) has initiated a High Throughput Screening (HTS) Project to explore new approaches to evaluating chemicals across a spectrum of high-throughput biological assays. Assays are being selected based on their potential to be informative of animal bioassay results and relevant to human health risk assessments. As an initial phase of this project, the NTP has provided a set of 1408 chemicals from NTP inventories to the NIH Chemical Genomics Center (NCGC), part of the NIH Molecular Libraries & Imaging Roadmap (MLR) Initiative, for HTS in bioassays relevant to toxicology. Assays are described and assay results reported in PubChem for this NTPHTS chemical data set in the same manner as for compounds from the Molecular Libraries Small Molecule Repository. For an early interview on this NTP HTS project, see: Newsletter of the Society for Biomolecular Sciences - Chris Portier: HTS Takes US National Toxicology Program to Next Level.
The DSSTox project is collaborating with the NTP HTS project to provide structure-annotation and chemoinformatics support for this effort. Drawing largely from the contents of the existing NTPBSI Structure-Index Locator File, with many of the 1408 NTP HTS chemicals having been used in historical NTP toxicity studies, the DSSTox NTPHTS Structure-Index File provides the full complement of DSSTox Standard Chemical Fields for the NTP HTS chemical set. Additionally, the DSSTox NTPHTS file is accompanied here by a .txt file containing the NCGC Source PubChem_SID (Substance Identifier) codes. NCGC is the Source Substance depositor for HTS assay data associated with the NTPHTS structures in PubChem.
Maintenance of this DSSTox NTPHTS file will be coordinated with expansion of the NTP HTS program. Inclusion of the DSSTox NTPHTS structure-index content in the DSSTox Master Structure-Index File additionally allows linkages to be made to other DSSTox Structure-Data Files, including the NTPBSI Structure-Index Locator File which links to the chemical-specific content of the NTP Bioassay On-line Database. See Coordinating Public Efforts and Work in Progress for additional information on DSSTox NTP collaborations.
Note: The DSSTox NTPHTS (1408) substance/structure inventory is also deposited in PubChem as part of the DSSTox Source Substance listing (and provides links to the NTP website and to this webpage); however, DSSTox Substances are not directly linked to NCGC Bioassay data since the latter were deposited by the NCGC Source.
Source Website: NIEHS’s National Toxicology Program, NTP High Throughput Screening Project
Source Contact: National Institute of Environmental Health Sciences, National Toxicology Program: Ray Tice, email: tice@niehs.nih.gov; Cynthia Smith, email: smith19@niehs.nih.gov
Retrieving PubChem Bioassay Data (updated 10 March 2009)
A full list of NTP HTS bioassays for which data are available in PubChem can be obtained from the following keyword search:
Go to the PubChem Home Page
Under the header PubChem Text Search, select PubChem Bioassay and in the box to the right enter the keywords: ntp ncgc
Listed will be all PubChem Assay IDs (AIDs) in which the DSSTox NTPHTS chemical inventory has been tested by the NIH Chemical Genomics Center (NCGC) exclusively for this 1408 set (as of 06Mar2009, there are 65 assays listed).
To download data for a single assay, click on any "AID link"
Bioassay Summary: Bioassay Results: Data Table (All) Select
Bioactivity Analysis: Data Table, Complete Select Tab
At bottom of page, Results Exports: Bioassay or Structure
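The keyword search and single-assay download described above can also be approximated in a script. The following is a minimal sketch, assuming the NCBI E-utilities ESearch service (database name `pcassay`) and the PubChem PUG REST assay endpoint are used in place of the web forms; the function names are illustrative, and the assays returned today may differ from the 65 listed in March 2009.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoints: NCBI E-utilities ESearch and PubChem PUG REST.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_assay_search_url(keywords="ntp ncgc", retmax=200):
    """Build an ESearch URL for a keyword search of PubChem BioAssay."""
    params = {"db": "pcassay", "term": keywords,
              "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


def parse_aids(esearch_json):
    """Extract the AIDs from the text of an ESearch JSON response."""
    result = json.loads(esearch_json)["esearchresult"]
    return [int(aid) for aid in result["idlist"]]


def assay_csv_url(aid):
    """URL for the data table of a single assay in CSV form."""
    return f"{PUG_REST}/assay/aid/{aid}/CSV"


def fetch_aids(keywords="ntp ncgc"):
    """Run the keyword search over the network and return the AIDs."""
    with urlopen(build_assay_search_url(keywords)) as response:
        return parse_aids(response.read().decode("utf-8"))


# Example (requires network access):
# aids = fetch_aids()
# print(len(aids), assay_csv_url(aids[0]))
```

Large assays may be truncated by the CSV endpoint, so the Results Exports option on the assay page remains the reference route for complete data tables.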
HTS bioassays used to screen this NTPHTS set are further described in PubChem on these pages and some assays are being used to screen larger sets of chemicals, with data also retrievable within PubChem using the appropriate Bioassay search criteria.
For example, HTS data for NTPHTS has been generated by the NCGC for 62,237 substances in the NCGC Small Molecule Inventory:
Cell viability assay (AID 559): http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=559
Alternatively, to batch retrieve PubChem NCGC-NTPHTS substances and corresponding Assay data:
Download the following Substance ID list file (save to a folder on your desktop):
NTPHTS_NCGC_1408sids.txt
(file contains PubChem SIDs registered to NCGC for retrieving bioassay data for current NCGC NTP assays)
Go to Entrez Batch
Select "Pubchem Substance" in the "Database" drop down list.
Browse to your desktop to retrieve file NTPHTS_NCGC_1408sids.txt.
Click "Retrieve" to retrieve records
When the page is returned, click on "Retrieve records for 1408 UID(s)". All 1408 NCGC Substances corresponding to the NTP HTS set will be listed:
Click on Links: ... "Bioassays" (and in the Pull down menu, select "PubChem Bioassays") to retrieve a full list of the bioassays in which these PubChem SIDs have been tested.
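The batch retrieval steps above can be sketched in code as well. This example assumes that NTPHTS_NCGC_1408sids.txt holds one numeric SID per line (the file layout is not described on this page) and that the PubChem PUG REST `substance/sid/.../aids` operation is used instead of Entrez Batch; the helper names and the chunk size of 100 are illustrative choices, not documented limits.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def read_sids(text):
    """Parse SIDs from file text; assumes whitespace-separated integers."""
    return [int(token) for token in text.split() if token.isdigit()]


def chunked(items, size):
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def aids_url(sids):
    """PUG REST URL listing the assays in which the given SIDs were tested."""
    sid_list = ",".join(str(sid) for sid in sids)
    return f"{PUG_REST}/substance/sid/{sid_list}/aids/JSON"


# Example usage (file name from this page; requires the downloaded file):
# with open("NTPHTS_NCGC_1408sids.txt") as handle:
#     sids = read_sids(handle.read())
# urls = [aids_url(batch) for batch in chunked(sids, 100)]
```

Splitting the 1408 SIDs into batches keeps each request URL short; each URL can then be fetched with any HTTP client.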
DSSTox Standard Chemical Fields (20)
Source_ChemicalName new field added Jan2009
NTPHTS SDF Content Summary - 11 March 2009
NTPHTS SDF Content | Totals_v1a | Totals_v2a | Totals_v2b | Totals_v2c |
---|---|---|---|---|
# Records | 1408 | 1408 | 1408 | 1408 |
DSSTox Standard Chemical Fields | 18 | 18 | 19 | 20 |
NTPHTS Source Fields | 1 | 3 | 3 | 2 |
Total # Fields | 19 | 21 | 22 | 22 |
Chemical Content | Counts_v1a | Counts_v2a | Counts_v2b | Counts_v2c |
defined organic | 1348 | 1348 | 1348 | 1348 |
inorganic | 27 | 27 | 27 | 27 |
organometallic | 19 | 19 | 19 | 19 |
no structure | 14 | 14 | 14 | 14 |
STRUCTURE_TestedForm_DefinedOrganic: | | | | |
parent | 1279 | 1279 | 1279 | 1279 |
complex | 50 | 50 | 50 | 50 |
salt | 18 | 18 | 18 | 18 |
salt complex | 1 | 1 | 1 | 1 |
TestSubstance_Description: | | | | |
single chemical compound | 1343 | 1351 | 1354 | 1354 |
defined mixture or formulation | 47 | * (NA) | * (NA) | * (NA) |
undefined mixture | 5 | * (NA) | * (NA) | * (NA) |
macromolecule | 3 | 3 | 3 | 3 |
unspecified or multiple forms | 0 | 0 | 0 | 0 |
mixture or formulation | * (NA) | 54 | 51 | 51 |
* (NA) = field entry not applicable for DSSTox file version indicated
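As a quick consistency check on the v2c column of the content summary, the category counts can be summed and compared with the record total. This is only an illustrative sketch using the numbers reported above; the dictionary names are not part of any DSSTox tooling.

```python
# Counts copied from the Counts_v2c column of the content summary.
chemical_content = {"defined organic": 1348, "inorganic": 27,
                    "organometallic": 19, "no structure": 14}
tested_form = {"parent": 1279, "complex": 50, "salt": 18, "salt complex": 1}
description = {"single chemical compound": 1354, "macromolecule": 3,
               "unspecified or multiple forms": 0,
               "mixture or formulation": 51}

total_records = 1408

# Chemical content categories cover every record.
print(sum(chemical_content.values()) == total_records)
# Tested forms partition the defined organic records.
print(sum(tested_form.values()) == chemical_content["defined organic"])
# TestSubstance_Description categories cover every record.
print(sum(description.values()) == total_records)
```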
Note: NTPHTS contains 55 chemical records with duplicate structures (duplicate DSSTox_CID); 1 pair of these consists of a monomer and polymer with same STRUCTURE represented, whereas 54 of these pairs are ostensibly the same Test Substances (same TestSubstance_CASRN and TestSubstance_Description), but were obtained from different lots or chemical manufacturers (different Lot and Vial numbers are maintained in internal NTP tracking identification). Because these chemical substances are provided as distinct chemical samples for HTS evaluation, they must be differentiated as separate chemical records in the DSSTox SDF, as well as by distinct DSSTox_RID and PubChem_SID assignments. The HTS data for these duplicates will be presented as separate PubChem SID listings, but will be grouped for easy viewing and comparison by common DSSTox_Generic_SID and DSSTox_CID values. For more information on accessing NTP HTS data within PubChem, see Searching DSSTox Files in PubChem.
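The grouping of duplicate structures by DSSTox_CID described in this note can be reproduced from the SDF itself. The following is a minimal sketch that reads only the SDF data fields (`> <FieldName>` blocks) with plain Python and ignores the structure block; the sample records and their ID values are hypothetical, and a cheminformatics toolkit would be a more robust choice for real work.

```python
from collections import defaultdict


def parse_sdf_fields(sdf_text):
    """Yield one dict of data fields per SDF record (structures ignored)."""
    for record in sdf_text.split("$$$$"):
        fields, name = {}, None
        for line in record.splitlines():
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                # Data header such as "> <DSSTox_CID>".
                name = line[line.index("<") + 1:line.rindex(">")]
                fields[name] = ""
            elif name is not None:
                if not line.strip():
                    name = None  # a blank line ends the field value
                elif fields[name]:
                    fields[name] += "\n" + line
                else:
                    fields[name] = line
        if fields:
            yield fields


def duplicate_groups(records, key="DSSTox_CID", id_field="DSSTox_RID"):
    """Map each shared key value to the record IDs that carry it."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get(key):
            groups[rec[key]].append(rec.get(id_field))
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


# Hypothetical three-record sample; real DSSTox records have more fields.
SAMPLE = """
  sample A

M  END
> <DSSTox_RID>
1001

> <DSSTox_CID>
42

$$$$

  sample B

M  END
> <DSSTox_RID>
1002

> <DSSTox_CID>
42

$$$$

  sample C

M  END
> <DSSTox_RID>
1003

> <DSSTox_CID>
77

$$$$
"""

print(duplicate_groups(parse_sdf_fields(SAMPLE)))
```

In the sample, the two records sharing DSSTox_CID 42 are reported together, while the record with a unique DSSTox_CID is omitted.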
SDF Version History: An initial list of NTP HTS chemical names and CASRN was provided by NTP Sources Brad Collins and Cynthia Smith. Based initially on CASRN matching, DSSTox Standard Chemical Fields were populated largely from the existing DSSTox Master File content (see also Chemical Information Quality Review Procedures), with approximately 200 substances not contained in NTPBSI. Some corrections were applied to the original NTP HTS Source information (and communicated back to the NTP Source), and information on new substances not already occurring in the DSSTox Master File was entered into the Master File and NTPHTS after thorough Chemical Information Quality Review Procedures.
Version 2a Update: NTPHTS_v2a has no additional substance records, but includes minor QA corrections and two additional fields, PubChem_CID and Note_NTPHTS. Changes to DSSTox Standard Chemical Fields include new ID fields: DSSTox_RID, DSSTox_Generic_SID and DSSTox_FileID (replacing DSSTox_SID and DSSTox_ID_FileName - see More on Standard Chemical Fields). Entries in the TestSubstance_Description field also have been simplified.
Version 2b Update: NTPHTS_v2b includes 1 structure modification and the new Source_ChemicalName field (25 character abbreviated InChI for use in structure-indexing applications) has been added as a DSSTox Standard Chemical Field to all DSSTox files.
Version 2c Update: NTPHTS_v2c includes 9 structure modifications and the new Source_ChemicalName field. PubChem_CID and PubChem_SID fields were deleted since NCGC currently maintains both the NTPHTS structures and corresponding bioassay data; NCGC PubChem SIDs are provided in a .txt file below for ease in accessing PubChem bioassay data on these compounds.
File Download Notes: The following files are offered in the Download table below:
Structure Data File (SDF) is the main DSSTox product, providing the complete inventory of chemical structures, DSSTox Standard Chemical Fields, and all Source-specific data fields [Note: the structure field is blank for all records containing mixtures or undefined substances];
Data Table MS Excel (MS Office 2003) file contains the full SDF data contents in spreadsheet table form, minus the chemical structure field [file created with CambridgeSoft ChemFinder plug-in to MS Excel 2003];
Structures Table (PDF) file contains a tiled format graphical view of all chemical structures contained in the SDF file, annotated with TestSubstance_CASRN and truncated TestSubstance_ChemicalName field entries for the tested form of the chemical [file created with ACD ChemFolder, ver. 10.01, ACD Labs].
You will need Adobe Acrobat Reader, available as a free download, to view the Adobe PDF files on this page. See EPA's PDF page to learn more about PDF, and for a link to the free Acrobat Reader.
Zip files may be decompressed using a utility such as JZip.
These files constitute the main DSSTox products. DSSTox Structure Data Files and DSSTox File Names adhere to strict formatting standards and conventions. For additional information, see More on DSSTox Standard Chemical Fields, Known Problems & Fixes, Chemical Information Quality Review Procedures, and How to Use DSSTox Files.
Quick & Easy File Downloads: FTP Download
Acknowledgements: The DSSTox SDF file of the NTP HTS chemical list was originally provided by Cynthia Smith and Brad Collins of the NTP. Initial structure annotation, QA review, problem resolution, and subsequent QA review, field additions, and structure modifications to NTPHTS_v2 were carried out by Maritja Wolf (Lockheed Martin, Contractor for EPA). In addition, we thank Ajit Jadhav (NIH, NCGC) for assistance in implementing the NTPHTS structures file with the NTP HTS assay data, and for providing instructions on batch retrieval of bioassay data in PubChem.
DSSTox Citation: Smith, C., B. Collins, R. Tice, M.A. Wolf, and A.M. Richard (2009) DSSTox National Toxicology Program High Throughput Screening Structure-Index File: SDF File and Documentation, Updated version: NTPHTS_v2c_1408_11Mar2009, www.epa.gov/ncct/dsstox/sdf_ntphts.html
Disclaimer: Every effort is made to ensure that DSSTox SDF files and associated documentation are error-free, but neither the DSSTox Source collaborators nor the EPA DSSTox project team make guarantees of accuracy, nor are any of these persons to be held liable for any subsequent use of these public data. The contents of this webpage and supporting documents have been subjected to review by the EPA National Center for Computational Toxicology and approved for publication. Approval does not signify that the contents reflect the views of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
EPA/600/C-06/012