Diversity Set II Information
The NCI Diversity Set II was derived from the almost 140,000 compounds
available for distribution from the DTP repository. Only compounds
having at least 250 mg of material available were considered. This was
done to allow a large number of copies to be made and to ensure
adequate amounts to supply refill requests. We also wanted to ensure
that the computer representation of the chemical structure was
reasonable. With the help of the PubChem team, we compared the
connection table encoded by the old SANSS format to the connection
table generated by processing the structure picture and output via an
MDL molfile. Only compounds that showed identical connection tables
were considered. Furthermore, we checked the molecular formula
generated from the structure against the molecular formula independently
entered in our database and used only compounds where the formulae
matched. The more than 80,000 compounds meeting these criteria were
then reduced to the final set using the programs Chem-X (Oxford
Molecular Group) and Catalyst (Accelrys, Inc.). Both Chem-X and
Catalyst use defined pharmacophoric centers (i.e., hydrogen bond
acceptor, hydrogen bond donor, positive charge, aromatic, hydrophobic,
acid, base) and defined distance intervals to create a finite set of
three-dimensional, 3-point pharmacophores, resulting in over 1,000,000
possible pharmacophores for the Diversity Set II selection. The
selection protocol considers each molecule, all its pharmacophores and
each of its conformational isomers. During the generation of the
diversity set, the pharmacophores for any candidate compound are
compared to the set of all pharmacophores found in structures already
accepted into the set. If the current structure has more than 5 new
pharmacophores, it is added to the set. An additional objective with
the NCI Diversity Set II was to create a diverse set of compounds that
were amenable to forming structure-based hypotheses. Thus, molecules
that were relatively rigid, with 5 or fewer rotatable bonds, that
tended to be planar, that had 1 or fewer chiral centers, and that had
pharmacologically desirable features (i.e., did not contain obvious
leaving groups, weakly bonded heteroatoms, organometallics, polycyclic
aromatic hydrocarbons, etc.) were given priority in the final selection. This
resulted in a set of 3046 compounds. This set was sent to the Molecular
Libraries Small Molecule Repository, where the compounds were checked for
purity via LC/Mass Spec. Only compounds with a purity of 90% or better
by this method were accepted. This resulted in a final set of
1364 compounds.
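
The pharmacophore-based acceptance step described above can be illustrated with a minimal Python sketch. It is not the Chem-X or Catalyst implementation. It assumes each candidate has already been reduced to a set of hashable pharmacophore keys pooled over all of its conformers, and it processes candidates in the order given, since the order used for the actual selection is not stated here. The name select_diverse and its parameters are hypothetical.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def select_diverse(
    candidates: Iterable[Tuple[str, Set[Hashable]]],
    threshold: int = 5,
) -> List[str]:
    """Greedy selection: accept a compound only if it contributes
    more than `threshold` pharmacophores not yet seen in the set.

    `candidates` yields (compound_id, pharmacophore_keys) pairs; the
    keys are assumed to be pooled over all conformers of the compound.
    """
    seen: Set[Hashable] = set()   # pharmacophores of accepted compounds
    selected: List[str] = []
    for compound_id, pharmacophores in candidates:
        new = set(pharmacophores) - seen
        if len(new) > threshold:  # "more than 5 new pharmacophores"
            selected.append(compound_id)
            seen |= new
    return selected


# Toy data: integers stand in for real pharmacophore keys.
toy = [
    ("A", set(range(0, 10))),   # 10 new keys
    ("B", set(range(5, 13))),   # only 3 keys not already covered by A
    ("C", set(range(8, 16))),   # 6 keys not already covered by A
]
print(select_diverse(toy))
```

Because the procedure is greedy, the outcome depends on the order in which candidates are presented; a different ordering of the same compounds can yield a different set.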
Diversity Set Data
Structural Data
These compounds are also in the Molecular Libraries Small Molecule Repository (MLSMR) and
are shipped to the screening centers that are part of the
Molecular Libraries Program. To see PubChem data on testing done in the Molecular
Libraries Program, search by the PubChem substance ID (SID) for the MLSMR data deposition. The
following file lists each NSC number with its equivalent PubChem SID. For further information or help, please
contact Dan Zaharevitz.
Download CSV file with identifiers
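
As a rough illustration of using the identifier file, the Python sketch below builds an NSC-to-SID lookup table. The column names "NSC" and "SID" are assumptions, since the actual header of the CSV file is not shown on this page; adjust them to match the downloaded file. The function name load_nsc_to_sid is hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def load_nsc_to_sid(
    handle: TextIO,
    nsc_col: str = "NSC",   # assumed column name
    sid_col: str = "SID",   # assumed column name
) -> Dict[str, str]:
    """Read a CSV with a header row and map NSC number -> PubChem SID.

    Rows missing either identifier are skipped.
    """
    mapping: Dict[str, str] = {}
    for row in csv.DictReader(handle):
        nsc = (row.get(nsc_col) or "").strip()
        sid = (row.get(sid_col) or "").strip()
        if nsc and sid:
            mapping[nsc] = sid
    return mapping


# Made-up identifiers, for demonstration only.
sample = io.StringIO("NSC,SID\n111,900001\n222,900002\n333,\n")
lookup = load_nsc_to_sid(sample)
print(lookup)
```

With a real file, pass an open file object instead of the in-memory sample, for example open(path, newline=""). The resulting SIDs can then be used to search PubChem for the MLSMR depositions.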