More on Structure Data Format (SDF)

Structure Data Format (SDF) files, also known as SD files, are plain ASCII text files that follow a strict format for representing multiple chemical structure records and their associated data fields. The format was originally developed and published by Molecular Design Limited (MDL) and has become the most widely used public standard for exchanging structure and data information on chemicals. Virtually all Chemical Relational Database (CRD) applications used for structure-searching of chemical information can import and export SDF files (More on CRDs). The topic areas below provide further information on SDF files.

Details of SDF and other MDL file formats
Sample SDF file
DSSTox SDF files
Compliance with SDF standard specifications (MDL CTFiles)
Public SDF Tools
Known Problems & Fixes


Details of SDF and other MDL file formats:

MDL technical documentation on SDF and other MDL file formats can be downloaded from the MDL website:
http://www.mdli.com/downloads/public/ctfile/ctfile.jsp
In addition, users are referred to the main literature citation for MDL file formats:

Dalby, A., J.G. Nourse, W.D. Hounshell, A.K.I. Gushurst, D.L. Grier, B.A. Leland, J. Laufer (1992) Description of several chemical structure file formats used by computer programs developed at Molecular Design Limited, J. Chem. Inf. Comput. Sci. 32:244-255.

The general format of an SDF file consists of blocks of information, with a single compound record format represented below (Dalby et al., 1992, Fig. 11, Section 5):

Single SDF record graphic consisting of 5 lines: MOLfile, Data Header, Data, Blank line, and end of record delimiter.

*c = Compound record format is repeated for the length of the SDF file.
*d = Data item format is repeated for each data item associated with a compound record.
*l = A separate line is used for each data value.
MOLfile format is the MDL format for storage of chemical structure information.
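
The record layout above can be sketched in a few lines of Python. This is a minimal illustration rather than a complete SDF reader: it assumes each MOLfile ends with an "M  END" line, that each data header begins with ">" and carries the field name in angle brackets, and that each record ends with "$$$$". The function names and the sample field names (ExampleName, ExampleFormula) are hypothetical and are not DSSTox fields.

```python
import re

# Data header lines start with ">" and carry the field name in angle brackets.
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def parse_record(lines):
    """Split one record's lines into (molfile_text, {field_name: value})."""
    # The MOLfile block runs up to and including the "M  END" line.
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), -1)
    molfile = "\n".join(lines[: end + 1])
    fields, name = {}, None
    for line in lines[end + 1:]:
        header = FIELD_HEADER.match(line)
        if header:
            name = header.group(1)
            fields[name] = []
        elif name is not None:
            if line.strip():
                fields[name].append(line)  # a separate line for each data value
            else:
                name = None  # a blank line closes the data item
    return molfile, {k: "\n".join(v) for k, v in fields.items()}


def parse_sdf(text):
    """Return a list of (molfile_text, fields) tuples, one per compound record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # end-of-record delimiter
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    return records


# Illustrative input only: a one-atom MOLfile reused for two records.
MOL = "\n".join([
    "example", "", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])
SAMPLE = (
    MOL + "\n>  <ExampleName>\nmethane\n\n>  <ExampleFormula>\nCH4\n\n$$$$\n"
    + MOL + "\n>  <ExampleName>\nmethane again\n\n$$$$\n"
)

records = parse_sdf(SAMPLE)
for molfile, fields in records:
    print(fields)
```

Keeping the MOLfile as an opaque text block leaves structure interpretation to a dedicated chemistry toolkit; only the data items are split into name and value pairs here.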

Sample SDF File:

In addition to their widespread use, the many consistent formatting features of SDF files and the ease of viewing and editing these files have made them ideal for DSSTox development purposes. An SDF file is simple ASCII text; hence, it can be viewed in any conventional text editor or word processor. A sample SDF file is shown below for two simple compound records (1,2-trans-dichloroethene and bromochloroacetonitrile) containing four data fields each.


Sample SDF file view of two molecule records, showing atom-bond connection table followed by 4 data fields.

Note that if a field entry is blank or null for any particular record in the CRD, the field will not be listed in that record of the SDF file.
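
Because blank fields are omitted from a record, the set of field names can differ from one record to the next. One way to handle this when tabulating an SDF file is to collect the union of field names in first-seen order and fill the gaps with empty strings. The sketch below assumes each record's data items have already been read into a dictionary; the function name field_table and the field names shown are hypothetical.

```python
def field_table(records):
    """Build (columns, rows) from per-record field dictionaries.

    Columns are the union of all field names, in the order first seen.
    Fields missing from a record are filled with an empty string.
    """
    columns = []
    for fields in records:
        for name in fields:
            if name not in columns:
                columns.append(name)
    rows = [[fields.get(name, "") for name in columns] for fields in records]
    return columns, rows


# Illustrative data only: the second record lacks ExampleCAS,
# and the first record lacks ExampleNote.
records = [
    {"ExampleName": "compound A", "ExampleCAS": "placeholder-1"},
    {"ExampleName": "compound B", "ExampleNote": "only in record 2"},
]
columns, rows = field_table(records)
print(columns)
for row in rows:
    print(row)
```

Note that column order here depends on which record first mentions each field, so a field absent from the first record lands after the fields that record does contain.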

DSSTox SDF Files:

DSSTox SDF files adhere both to MDL SDF standards and to some additional restrictions on SDF content. DSSTox data files are characterized as "clean SDF" in the sense that they have been purged of CRD application-dependent information that is automatically inserted upon SDF export from the CRD applications used in DSSTox file development (More on CRDs). In particular, DSSTox SDF files contain only DSSTox Standard Chemical Fields and Source-Specific Fields, such as those listed in the Central DSSTox Field Definition Table, and no extraneous field or file information. In addition, to ensure the proper ordering of fields upon import of DSSTox SDF files, we insert the text entry "blank" in any field in the first SDF record that has a blank or null entry in the CRD. We list below a few specific features of SDF files that are either restricted or included for use in DSSTox SDF files.

Header note is a text string inserted upon SDF export of data from a CRD application. Since header notes are generally application-specific, this note is deleted from DSSTox SDF files and replaced with a blank line.

Data fields (text), under a strict reading of the MDL SDF format requirements, must have a hard carriage return inserted in any text field exceeding 200 characters in length. We do not adhere to this specification in DSSTox files because InChI and SMILES fields frequently exceed 200 characters for larger molecules and are considered essential chemical information fields for DSSTox data files. We have contacted MDL with a request to consider modifying this limiting standard specification for SDF format.

x,y,z coordinates can support 2D or 3D structure representations. Main DSSTox SDF files generally contain 2D structure representations that can be easily printed and visualized in 2D; such representations have the z coordinates set to zero.

Stereochemistry can be represented in a limited way by special atom and bond labels, even when 2D (x,y) coordinate representations are used.

Sample SDF file of one chemical record, highlighting features such as: header note, x,y,z coordinates, connection bond table, and representation of chiral centers.
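
The 2D convention described above (z coordinates set to zero) can be checked with a short script. The sketch below assumes a V2000 MOLfile, in which the fourth line is the counts line (atom count in the first three characters) and each atom line stores x, y, and z in three consecutive 10-character columns. The helper names is_2d and atom_line are hypothetical, and the sample coordinates are illustrative only.

```python
def is_2d(molfile):
    """Return True if every atom in a V2000 MOLfile has z == 0."""
    lines = molfile.splitlines()
    n_atoms = int(lines[3][0:3])  # counts line: atom count in columns 1-3
    for line in lines[4:4 + n_atoms]:
        z = float(line[20:30])  # z coordinate occupies columns 21-30
        if z != 0.0:
            return False
    return True


def atom_line(x, y, z, symbol):
    """Format one V2000 atom line (illustrative helper for the samples below)."""
    return f"{x:10.4f}{y:10.4f}{z:10.4f} {symbol:<3} 0" + "  0" * 11


COUNTS = "  2  0  0  0  0  0  0  0  0  0999 V2000"  # two atoms, no bonds

flat = "\n".join(
    ["example", "", "", COUNTS,
     atom_line(0.0, 0.0, 0.0, "C"), atom_line(1.5, 0.0, 0.0, "Cl"), "M  END"]
)
lifted = "\n".join(
    ["example", "", "", COUNTS,
     atom_line(0.0, 0.0, 0.0, "C"), atom_line(1.5, 0.0, 0.7, "Cl"), "M  END"]
)

print(is_2d(flat))
print(is_2d(lifted))
```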

Compliance with SDF standard specifications (MDL CTFiles):

In the course of the DSSTox project and database development, we have encountered some instances where CRD applications were not fully compliant with SDF standards upon SDF file export. These problems are typically specific to a particular CRD version and to the export of particular types of chemical information. Since we have not exhaustively evaluated all currently available CRD applications, and some problems have been corrected in subsequent CRD version releases, we do not list these specific problems here. Difficulties are not generally encountered with the import of “clean SDF” files into these applications, but arise more frequently upon export-to-SDF from these applications. Problems include truncated field lengths, bond types represented in non-standard ways, and application-specific fields automatically added to the SDF. When we have encountered such problems, we have reported them to the product vendors; in addition, we have developed procedures to compensate for them. See also Known Problems & Fixes.

Strictly speaking, the MDL SDF format specifications, published in 1992 (Dalby et al., J. Chem. Inf. Comput. Sci., 1992, 32:244-255), require a hard carriage return to be inserted in any text field exceeding 200 characters in length. We deliberately violate this specification in DSSTox files because InChI and SMILES fields frequently exceed 200 characters in length for larger molecules and are considered essential chemical information fields. We have contacted MDL with a request to consider modifying this limiting standard specification.
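
Users whose software enforces the 200-character rule may want to know in advance which records are affected. The sketch below scans SDF text and reports every data line longer than the limit. It assumes data headers begin with ">" and carry the field name in angle brackets, and that records end with "$$$$"; the function name long_data_lines and the sample field name ExampleSMILES are hypothetical.

```python
import re

FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")
MAX_LEN = 200  # line-length limit cited from the 1992 format description


def long_data_lines(text, limit=MAX_LEN):
    """Return (record_number, field_name, line_length) for each over-long data line."""
    hits = []
    record, name = 1, None
    for line in text.splitlines():
        if line.strip() == "$$$$":  # end of record
            record += 1
            name = None
            continue
        header = FIELD_HEADER.match(line)
        if header:
            name = header.group(1)
        elif name is not None:
            if not line.strip():
                name = None  # blank line closes the data item
            elif len(line) > limit:
                hits.append((record, name, len(line)))
    return hits


# Illustrative input only: the structure block is reduced to "M  END",
# and the second record carries a 250-character data line.
sample = (
    "M  END\n>  <ExampleSMILES>\n" + "C" * 10 + "\n\n$$$$\n"
    "M  END\n>  <ExampleSMILES>\n" + "C" * 250 + "\n\n$$$$\n"
)
print(long_data_lines(sample))
```

The script only reports long lines; it does not wrap them, since inserting a carriage return into an InChI or SMILES string would change how the value is read back.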

Public SDF Tools:

Many features of SDF files, such as the strict ASCII text and content formatting, and labeling of fields and records, make these files relatively easy to modify and manipulate with automated procedures. See Tools & Scripts for a listing of downloadable program scripts, mainly open-source code developed by us and others, that can be applied to editing and modifying SDF files.
