

More on DSSTox Standard Chemical Fields

DSSTox Standard Chemical Fields are a central contribution of the DSSTox effort. They are included in each DSSTox Structure Data File and minimally describe the chemical content within each of these files. By providing a common set of chemical information fields spanning diverse toxicity databases, these fields offer common search metrics to explore diverse toxicity information domains. This page provides a general overview of the objectives, definitions, features and uses of DSSTox Standard Chemical Fields. See also Notes below.

Main Objectives:

- To promote minimum standards for chemical structure annotation of public chemical toxicity databases
- To accurately convey the actual form of the chemical substance tested, predicted, or represented in the original Source database
- To provide identifiers for accurately indexing all DSSTox content by Structure, Test Substance, and Record
- To provide chemical field content that will facilitate structure-based searching across DSSTox SDF files
- To provide chemical information content that will facilitate use of DSSTox SDF files in structure-activity modeling studies

*** TestSubstance_ChemicalName field modified (February 2009)

*** New Substance_modify_yyyymmdd field added (October 2008)

*** New STRUCTURE_InChIKey field added (February 2008)

*** Revised DSSTox Standard Chemical Fields (June 2007)

For complete reference information, see:
DSSTox Standard Chemical Field Definition Table


DSSTox Standard Chemical Field Relationships:
The following figure illustrates the relationship between the STRUCTURE_... content-linked fields (blue) and the TestSubstance_... content-linked fields (yellow), with the STRUCTURE_Shown field (pink) relating the two categories of fields. The light green ID fields are more specific than TestSubstance, and are uniquely assigned to each TestSubstance record (RID) and Source version record (FileID) across DSSTox data files. For more info on content-linked fields, see Note below.

For a full definition of a particular Standard Chemical Field, click on field of interest in the figure above or list below:

DSSTox Standard Chemical Field Notes:

- Revisions to DSSTox Standard Chemical Fields - June 2007
- Use of DSSTox Standard Chemical Fields
- DSSTox Master Structure-Index File
- DSSTox Source-Specific Chemical Fields
- Creating a Defined Organic Parent (DOP) file for SAR studies
- Content-linked groupings of Standard Chemical Fields
- Chemical information content of existing toxicity databases
- STRUCTURE Field - 2D versus 3D
- Deemphasis on use of Chemical Names
- Locating replicate chemical information in DSSTox SDF files
- Unique substance record identification in DSSTox SDF files
- Quality assurance of Standard Chemical Field information
- Blank (or empty) chemical fields

Revisions to DSSTox Standard Chemical Fields - June 2007

DSSTox Standard Chemical Fields were revised in June 2007 to improve internal data management of structural information across diverse DSSTox toxicology data sets. All revised fields are defined, and modifications from earlier versions are noted, in the DSSTox Standard Chemical Field Definition Table. These changes were motivated by the need to:

- represent chemical information uniformly and consistently across DSSTox databases (see DSSTox Master Structure-Index File);
- assign unique chemical structure and test substance indexing (Record IDs, Chemical IDs and Substance IDs) for internal referencing, for cross-referencing across data files, and for interfacing with outside projects, such as NIH PubChem;
- simplify content and eliminate redundant information.

The following changes have been made to existing Standard Chemical Fields (June 2007):

Additional details on the use of the various DSSTox ID fields are provided for the DSSTox Master File and in Quality Review and Data File Construction.

The following fields from previous version files have been eliminated as Standard Chemical Fields due to their reliance on Source-specific information and occasional use. They may be used, as needed, as Source-specific content fields.

Use of DSSTox Standard Chemical Fields:

DSSTox Standard Chemical Fields are used to minimally annotate the general chemical information content of DSSTox data files. In each DSSTox Field Definition File, these fields are listed and defined first and separately from the DSSTox Standard Toxicity Fields and the Source-specific toxicity-related fields, the latter of which are primarily biological in nature and study-specific. The entire complement of DSSTox Standard Chemical Fields is included in each DSSTox database. However, only those allowable field entries that are needed to encompass the information content of the particular DSSTox database under construction are included. For example, if no inorganic or organometallic structure is present in a particular database, then these entries will not be listed as allowable field entries in the DSSTox Field Definition File for that database. In addition, DSSTox Standard Chemical Fields constitute the primary content of DSSTox Structure-Index Files.

DSSTox Master File:

The DSSTox Master File was created to facilitate central management of DSSTox Standard Chemical Fields for all unique chemical substances occurring in all published DSSTox Data Files, and to aid new DSSTox data file creation and quality review. This is a dynamic file that grows with the addition of new unique test substances during the construction of each new DSSTox data file.

The DSSTox Master File is a consolidation of all DSSTox Standard Chemical Fields in all DSSTox Structure Data File Types, i.e. Databases, Structure-Index Files, and Structure-Index Locator Files, whether published, unpublished, or in development. The DSSTox Master File content has been subject to extensive Chemical Information Quality Review Procedures prior to its incorporation into the Master File. The DSSTox Master File is no longer offered as a downloadable SDF due to its increasing complexity and size, and due to prohibitions by the commercial concern, CAS SciFinder, on dissemination of more than 10K CAS Registry numbers in any public database. The DSSTox project views the CAS RN as essential annotation to DSSTox records due to historical use and reliance, despite its non-unique nature and unreliability as a definitive substance indexing system for a public database concern.

DSSTox Source-specific Chemical Fields:

DSSTox Source-specific fields generally pertain to experimental results associated with the TestSubstance, i.e. biological or toxicological information obtained from the Source. In some cases, however, DSSTox Source-specific fields will relate to chemical information and yet be study-specific. An example is the field ChemClass_FHM, which pertains to general organic classifications that were used in the EPAFHM database study. Other DSSTox databases employ different general organic classifications and thus the field is named specific to the study (e.g., ChemClass_DBP in DBPCAN, and ChemClass_ERB in NCTRER). In other cases, DSSTox Source-specific fields will include one or more chemical property fields (e.g., CLOGP in EPAFHM), whose values were generated using a version of a proprietary or commercial application. At this time, such fields are not included in the list of DSSTox Standard Chemical Fields since the property calculation module is not publicly available, there are alternate calculation models available, the version and name of the application must be specified, and the field is not generally included across most or all DSSTox SDF files. The only exception to this rule is the TestSubstance_ChemicalName field, which contains the Source-assigned chemical name, for cross-referencing purposes, but is considered a Standard Chemical Field due to its importance as a central chemical identifier. Note: As of February 2009, this is no longer the case. The Source-assigned chemical name is now carried in a new field, Source_ChemicalName, and the TestSubstance_ChemicalName field contains a single generic or common name assigned to each DSSTox_Generic_SID.

Creating a Defined Organic Parent (DOP) file for SAR Studies:

DSSTox Standard Chemical Fields contain all the information needed to easily create a simplified-to-parent DSSTox "Defined Organic Parent" (DOP) SDF file for use in Structure-Activity Relationship (SAR) modeling studies:

- the STRUCTURE_ChemicalType field can be used to separate "defined organic" compounds from all other compounds (inorganics, organometallics, mixtures);
- for the subset of "defined organic" compounds, the STRUCTURE_TestedForm_DefinedOrganic field can be used to identify and segregate salt or complex entries;
- for all "defined organic" compounds that are in salt or complex form (i.e., more than a single chemical entity in the STRUCTURE field), the STRUCTURE_Parent_SMILES field contains the SMILES code for the corresponding "desalted" structure, which can be used to generate a "desalted" STRUCTURE field entry;
- the DSSTox_CID field can be used to locate all instances of "replicates" in the database with respect to 2D structure (i.e., isomers), parent structure (different salt or complex forms of the same parent), or CAS (substances with different purity or composition, but represented by the same STRUCTURE field; see Note).

Assuming an informed user with access to a general Chemical Relational Database application, the following steps can be used to construct a DOP file for these purposes:

1. Search for all records satisfying the condition: STRUCTURE_ChemicalType = "defined organic". Rename Subset A.

2. In Subset A, search for all records satisfying the conditions: STRUCTURE_TestedForm_DefinedOrganic = "salt" or "complex". Rename Subset B.

3. For all records in Subset B, convert STRUCTURE_Parent_SMILES to structure using a standard "SMILES-to-structure" converter (see More on SMILES) and enter "desalted" structure in STRUCTURE field. Replace Subset B records in Subset A. Rename Subset A and save as DOP SDF file.

4. Alternatively, just convert entire column of STRUCTURE_Parent_SMILES to structure and replace contents of STRUCTURE field for all compounds of Subset A, rename and save as DOP SDF file.

Note that the modified STRUCTURE field entries will require changes in the entries of all corresponding STRUCTURE_... content-related fields. Also, replicates will need to be resolved by the user, e.g. by choosing one representative record. Once these changes are made, a new DSSTox DOP database file can be finalized by adding ", simplified to parent" as a qualifier to the previous entry in the STRUCTURE_Shown field (e.g., "tested chemical, simplified to parent").
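The steps above can be sketched in Python as follows. This is only an illustrative sketch, not a DSSTox tool: it assumes records have already been read into dictionaries keyed by field name, and that the user supplies a SMILES-to-structure converter (the `smiles_to_structure` argument is a hypothetical stand-in for whatever converter is available). It does not regenerate the other STRUCTURE_... content-related fields or resolve replicates, both of which remain the user's responsibility, and it applies the STRUCTURE_Shown qualifier only to the records it modifies, which is one possible reading of the procedure.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def build_dop_records(
    records: List[Record],
    smiles_to_structure: Callable[[str], str],
) -> List[Record]:
    """Return simplified-to-parent copies of the "defined organic" records.

    smiles_to_structure is a user-supplied (hypothetical) converter that
    turns a SMILES string into a STRUCTURE field entry.
    """
    dop: List[Record] = []
    for rec in records:
        # Step 1: keep only "defined organic" records (Subset A).
        if rec.get("STRUCTURE_ChemicalType") != "defined organic":
            continue
        new_rec = dict(rec)  # copy, so the input records stay untouched
        # Step 2: salts and complexes form Subset B.
        if rec.get("STRUCTURE_TestedForm_DefinedOrganic") in ("salt", "complex"):
            # Step 3: replace STRUCTURE with the desalted parent structure.
            new_rec["STRUCTURE"] = smiles_to_structure(
                rec["STRUCTURE_Parent_SMILES"]
            )
            # Assumption: the qualifier is appended only to modified records.
            new_rec["STRUCTURE_Shown"] = (
                rec.get("STRUCTURE_Shown", "") + ", simplified to parent"
            )
        dop.append(new_rec)
    return dop
```

Passing the converter in as an argument keeps the sketch independent of any particular cheminformatics package.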

Note: As of 01Mar2005, DOP files are no longer a standard offering for DSSTox databases containing salts and complexes (see reasons).

Content-linked groupings of Standard Chemical Fields:

As shown in the Field Relationships figure above, we have constructed two categories of DSSTox Standard Chemical Fields to clearly distinguish information in the STRUCTURE_... content-related fields from information in the TestSubstance_... content-related fields.

- The contents of the STRUCTURE field are used to automatically generate all STRUCTURE_... content-related fields and are explicitly related to these fields. These STRUCTURE_... content-related fields are also directly related to and indexed by the DSSTox Chemical Identifier (DSSTox_CID) integer-value field.

- The TestSubstance_... content-related fields are explicitly related to the actual test substance used in the toxicity experiment or toxicity evaluation, and the Source characterization of the test substance. These TestSubstance_... content-related fields are also directly related to and indexed by the DSSTox Substance Identifier (DSSTox_Generic_SID) integer-value field.

- The contents of the STRUCTURE_Shown field link the STRUCTURE_... content-related fields to the TestSubstance_... content-related fields by specifying the relationship between the two categories of fields, i.e., what is shown to what was actually tested (see Field Relationships figure).

The need for this type of Standard Chemical Field construction accompanied the decision to include a representative structure in DSSTox data records for TestSubstance_Description = "mixture or formulation". Defined mixture entries, i.e. mixtures with known components, occur in a diverse array of public toxicity databases. Structure-annotation of these records does not necessarily serve SAR studies, since these records may be excluded. However, structure-annotation of defined mixture records allows structure-search locating of all records pertaining to a particular chemical structure, its close structural analogs, or defined mixtures containing the component structure or its analogs. A longer-term goal of the DSSTox Project is to identify all mixture components through the use of DSSTox_CID linkages to the primary DSSTox_Generic_SID, so that the mixture could be found by a structure-search of any one of the known mixture components.

Chemical information content of existing toxicity databases:

Many publicly available toxicity databases, not currently indexed by chemical structure, have diverse chemical data contents. Typically, the majority of the data are for defined organics (i.e., chemicals that can be represented by a defined chemical structure). In addition, toxicity databases often include toxicity information for mixtures and/or ill-defined substances that cannot be represented by a defined chemical structure, or for inorganics and organometallics that include coordinated or bound metals. DSSTox SDF files are particularly designed for use in creating a structure-searching capability and/or developing SAR (Structure Activity Relationship) predictive models. These uses primarily consider defined organics with unique 2D chemical structures and, in some cases, other substances that can be represented by a 2D structure.

Standard practice in toxicology is to report the tested form of a chemical (i.e., salt or complex), whereas SAR modeling studies typically simplify salts and complexes to the neutral parent form prior to modeling or exclude these chemicals entirely from a modeling study (see note on Creating a DSSTox DOP file for SAR studies). Frequently, insufficient information is provided in the literature to determine how salts and complexes were considered in SAR model development, and whether the modeled form of the chemical used in the SAR study was the actual tested form. Use of DSSTox Standard Chemical Fields should encourage more consistent reporting of this information in relation to the modeled forms of chemicals used in SAR studies.

Primary indexing of chemical substance information within the DSSTox Master File and DSSTox SDF files is accomplished with the DSSTox_CID and DSSTox_Generic_SID chemical and test substance identifiers, fields that track the STRUCTURE-content and TestSubstance content-related fields (see Field Relationships figure).

STRUCTURE field - 2D versus 3D:

DSSTox SDF files contain 2D chemical structure representations that can be drawn with standard chemical drawing programs. A standard 2D rendering of the molecular structure provides an easily interpreted visual representation of the structure and is the most useful representation for data mining and structure-searching applications. DSSTox structures are stored in what is known as "MDL mol file" format within a standard SDF file (More on SDF). The mol format includes Cartesian coordinates, bond connectivities, and bond types, and can support either 2D or 3D structure representations. DSSTox SDF files can be used in Chemical Relational Database structure-search applications, or in Structure-Activity Relationship (SAR) modeling studies. There are a number of specialized resources available to SAR modelers that can batch-convert an SDF file containing 2D chemical structure representations to an SDF file containing reasonable 3D representations (see also SDF Viewers, Structure Browsers & CRD Applications).

Note: No other DSSTox Standard Chemical Fields are sensitive to 3D coordinates; hence, they are unaffected by conversion of the STRUCTURE field from 2D to 3D.
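To illustrate how the mol block and the data fields sit side by side in an SDF record, here is a minimal Python sketch of an SDF reader. It relies only on the general SDF layout (a mol block ending in `M  END`, data items introduced by `> <FieldName>` header lines, and records terminated by `$$$$`). The function name `parse_sdf` and the choice to store the mol block under a `STRUCTURE` key are assumptions made for this example; a dedicated cheminformatics toolkit is a better choice for real work.

```python
from typing import Dict, List, Optional


def parse_sdf(text: str) -> List[Dict[str, str]]:
    """Split SDF text into records: mol block under "STRUCTURE", plus data fields.

    Any trailing text that is not closed by a "$$$$" line is ignored.
    """
    records: List[Dict[str, str]] = []
    mol_lines: List[str] = []
    fields: Dict[str, str] = {}
    field_name: Optional[str] = None
    in_molblock = True
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # End of record: store it and reset the per-record state.
            records.append({"STRUCTURE": "\n".join(mol_lines), **fields})
            mol_lines, fields, field_name, in_molblock = [], {}, None, True
        elif in_molblock:
            mol_lines.append(line)
            if line.startswith("M  END"):
                in_molblock = False
        elif line.startswith(">"):
            # Data header line, e.g. ">  <DSSTox_CID>".
            start, end = line.find("<"), line.rfind(">")
            field_name = line[start + 1:end] if 0 <= start < end else None
            if field_name is not None:
                fields[field_name] = ""
        elif field_name is not None:
            if line == "":
                field_name = None  # a blank line closes the data item
            elif fields[field_name]:
                fields[field_name] += "\n" + line
            else:
                fields[field_name] = line
    return records
```

Because the data fields are parsed separately from the mol block, replacing the 2D coordinates in the mol block with 3D coordinates leaves every other field in the record untouched.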

Deemphasis on use of Chemical Names:

A primary objective of the DSSTox project is to promote the primacy of chemical structure as the most useful and informative chemical search metric for exploring toxicity information. Hence, the chemical identifier fields, TestSubstance_CASRN and TestSubstance_ChemicalName, which carry little intrinsic chemical information, are employed here as secondary substance references, mainly for cross-referencing purposes back to the Source and the historical literature. The field TestSubstance_ChemicalName generally corresponds to the chemical name listed in the original Source database, and corresponds to the first listed name if more than one name is provided by the Source. Any detected errors or truncations of the chemical name occurring in the Source database are corrected in the DSSTox SDF file; however, there is no further refinement of the chemical name to be more precise or informative. Hence, we do not recommend use of the TestSubstance_ChemicalName field for definitive identification purposes. A user interested in locating chemical name synonyms is referred to public sources such as ChemFinder.com or NLM's ChemID Plus. Note that a systematic IUPAC Chemical Name (STRUCTURE_ChemicalName_IUPAC) is provided in DSSTox data files that directly corresponds to and is generated automatically from the contents of the STRUCTURE field.

Unlike the TestSubstance_ChemicalName field, which is the only DSSTox Standard Chemical Field carrying Source-specific information and varying across DSSTox data files, the TestSubstance_CASRN field entry is consistent across DSSTox data files and represents our best judgement as to which CASRN is most recent and relevant to the corresponding substance record. Since this may not in all cases correspond to the CASRN assignment of the Source, any discrepancies and alternate CASRN are provided in the Source-specific Note_NAMEID field. For further details pertaining to TestSubstance_CASRN assignment, see DSSTox Chemical Information Quality Review Procedures.

Update February 2009: A new non-standard field, Source_ChemicalName, is being added to DSSTox files published as of this date to provide a direct correspondence of the DSSTox data file to the Source-provided chemical name. The TestSubstance_ChemicalName field will henceforth carry a single common or trade name of the chemical in the DSSTox Master File, shared by all DSSTox Data Files. We do not characterize this as a standard chemical name, as such, since IUPAC systematic names better fit that description, but we now characterize TestSubstance_ChemicalName more accurately as a DSSTox Standard Chemical Field, since it has a common data entry for each DSSTox_Generic_SID across all DSSTox files.

Locating replicate chemical record information in DSSTox SDF files:

Replicate chemical record information pertaining to chemical structure can be most reliably located using the DSSTox_CID field entry, which is unique to chemical structure and common across all DSSTox data files. Replicate chemical record information pertaining to test substance can be most reliably located using the DSSTox_Generic_SID entry, which is unique to test substance characteristics and common across all DSSTox data files. Note that DSSTox_Generic_SID is more specific than DSSTox_CID, i.e. two different test substances (assigned different DSSTox_Generic_SIDs) can share the same DSSTox_CID and chemical structure representation. The converse is not true: different DSSTox_CIDs will always correspond to different DSSTox_Generic_SIDs.
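As a rough illustration of the paragraph above, the following Python sketch groups records by an identifier field to find replicates, and checks the stated one-way relationship between the two identifiers. It assumes records are dictionaries keyed by field name; the function names and the ID values used in the usage note are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List

Record = Dict[str, str]


def find_replicates(records: List[Record], field: str) -> Dict[str, List[Record]]:
    """Return {field value: records} for values shared by more than one record."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        value = rec.get(field, "")
        if value:  # skip records where the field is empty or missing
            groups[value].append(rec)
    return {value: recs for value, recs in groups.items() if len(recs) > 1}


def sids_with_multiple_cids(records: List[Record]) -> List[str]:
    """List any DSSTox_Generic_SID that appears with more than one DSSTox_CID.

    Per the text, this list should be empty: one structure may cover several
    test substances, but a test substance maps to a single structure.
    """
    cids_per_sid = defaultdict(set)
    for rec in records:
        sid, cid = rec.get("DSSTox_Generic_SID"), rec.get("DSSTox_CID")
        if sid and cid:
            cids_per_sid[sid].add(cid)
    return sorted(sid for sid, cids in cids_per_sid.items() if len(cids) > 1)
```

For example, `find_replicates(records, "DSSTox_CID")` returns structure-level replicates, while `find_replicates(records, "DSSTox_Generic_SID")` returns the narrower set of test-substance replicates.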

In previous versions of DSSTox data files, the ChemicalReplicateCount field was incorporated to allow for easy identification and counting of "replicates" with respect to 2D structures, parent structures, or CAS registry numbers within a particular DSSTox data file. This field is no longer considered a Standard Chemical Field, however, due to its reliance on Source-specific information and occasional use. The field may be used, as needed, as a Source-specific content field (e.g., FDAMDD).

Unique record identification in DSSTox SDF files:

The fields DSSTox_RID (new DSSTox field, June 2007) and DSSTox_FileID (replaces DSSTox_FileName_ID, June 2007) provide for unique identification and location of any record in a current or archived DSSTox SDF file. The DSSTox_FileID entry specifies a numerical ID counter, from 1 to the total number of records in a file, followed by an abbreviated file name that includes the NAMEID and file version (e.g., 5_CPDBAS_v4a for record 5). Since replicate records can exist within a DSSTox SDF file with respect to fields such as STRUCTURE (i.e., same DSSTox_CID) or TestSubstance_CASRN (i.e., same DSSTox_Generic_SID), the DSSTox_FileID field entry provides a simple, unambiguous means for locating a specific record in any DSSTox SDF file, including past versions of files. The DSSTox_FileID numerical counter can also be used to conveniently sort or segregate records upon SDF import or in a merging of DSSTox data files. The DSSTox_FileID is essential for documenting the origin of each record if a user merges DSSTox SDF files into a single or larger database file. This field also greatly facilitates replacement of an older version of a particular DSSTox SDF file with a newer version in a merged database. Finally, when reporting errors, users need only specify the DSSTox_FileID field entry of the record in question to uniquely identify the record containing the error.

The DSSTox_RID is an integer field with a 1:1 correspondence to the current DSSTox_FileID field entry, but whereas the latter will change with DSSTox file version, the DSSTox_RID is file version independent unless the TestSubstance or Structure characteristics of the record change. This field allows for a constant record/substance identifier that is invariant to DSSTox file version changes to Source-specific content. For more information on use of these fields, see DSSTox Master File.
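A DSSTox_FileID entry such as 5_CPDBAS_v4a can be split into its parts for sorting or for tracing records in a merged file. The Python sketch below assumes the layout `<counter>_<NAMEID>_<version>` inferred from that single example, with the counter before the first underscore and the version after the last one; the `FileID` type and `parse_file_id` function are names invented here, and entries that follow a different layout would need different handling.

```python
from typing import NamedTuple


class FileID(NamedTuple):
    counter: int   # position of the record within its file, starting at 1
    nameid: str    # abbreviated database name, e.g. "CPDBAS"
    version: str   # file version, e.g. "v4a"


def parse_file_id(value: str) -> FileID:
    """Split a DSSTox_FileID entry, assuming <counter>_<NAMEID>_<version>."""
    counter, _, rest = value.partition("_")    # text before the first "_"
    nameid, _, version = rest.rpartition("_")  # text after the last "_"
    if not counter.isdigit() or not nameid or not version:
        raise ValueError(f"unexpected DSSTox_FileID layout: {value!r}")
    return FileID(int(counter), nameid, version)
```

Sorting merged records with a key such as `lambda rec: parse_file_id(rec["DSSTox_FileID"])` orders them numerically by counter (so record 2 precedes record 10), then by NAMEID and version.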

Quality assurance of DSSTox Standard Chemical Field information:

For details of quality assurance procedures used in DSSTox SDF file construction and review, see DSSTox Chemical Information Quality Review Procedures. Information on quality review procedures is also provided for each DSSTox database in the Log File (see Sample Log File) posted for download on the corresponding DSSTox SDF Download Page (see, e.g., EPAFHM).

Note: Users can access File Error Report or Contact Us to report residual errors for correction in subsequent versions.

Blank (or empty) chemical fields:

When a user encounters an empty or null standard chemical field in a DSSTox SDF record, it generally means that the information is either not available or the field is not applicable for that data record. In the case of missing or unavailable TestSubstance_CASRN, the field entry "NOCAS" is used. Blank entries in DSSTox Standard Chemical Fields will also occur in the following cases:

- If the TestSubstance_Description entry is “mixture or formulation” or “unspecified or multiple forms” and no structure can be assigned, blank entries will occur in all STRUCTURE_... content-related fields except STRUCTURE_Shown and STRUCTURE_ChemicalType, which will have a "no structure" entry;

- If STRUCTURE_ChemicalType has any entry other than "defined organic", then STRUCTURE_TestedForm_DefinedOrganic is blank;

- The ChemicalNote field is blank unless additional information is available for the record.

Note: If the first record of a DSSTox SDF file contains a null or blank entry due to one of the above conditions being met, the word "blank" is entered as a place-holder field entry to enforce the original ordering of fields upon SDF import into a Chemical Relational Database application. After SDF import, a user can choose to delete these place-holder field entries.
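After import, the place-holder entries and the "NOCAS" marker can be handled with a few lines of Python. This is a sketch under the assumption that records are dictionaries keyed by field name, in file order; the function names are invented for this example. Only the first record is touched, since that is the only place the text says the "blank" place-holder is used.

```python
from typing import Dict, List

Record = Dict[str, str]


def clear_first_record_placeholders(records: List[Record]) -> List[Record]:
    """Return copies of the records with "blank" place-holders in the first
    record replaced by empty strings; later records are copied unchanged."""
    cleaned = [dict(rec) for rec in records]
    if cleaned:
        cleaned[0] = {
            name: ("" if value == "blank" else value)
            for name, value in cleaned[0].items()
        }
    return cleaned


def has_casrn(record: Record) -> bool:
    """True if the record carries an actual CASRN rather than "NOCAS" or nothing."""
    return record.get("TestSubstance_CASRN", "") not in ("", "NOCAS")
```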
