United States National Library of Medicine, National Institutes of Health

Sample NLM® Data

INSTRUCTIONS FOR FTP OF SAMPLE RECORDS
Connect to NLM's anonymous ftp server: ftp://ftp.nlm.nih.gov/nlmdata/sample/
(log in as a non-fee/anonymous user; use your e-mail address as the password)
The server has a directory for each NLM database. Change to the directory you want and get the desired files.
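
For scripted access, the steps above can be sketched with Python's standard ftplib module. This is a minimal sketch, not an NLM-provided tool: the function names and the `database_dir` and `email` parameters are illustrative, and no directory names are assumed because this page does not list them.

```python
from ftplib import FTP
from urllib.parse import urlsplit

SAMPLE_URL = "ftp://ftp.nlm.nih.gov/nlmdata/sample/"


def split_ftp_url(url):
    """Return (host, directory) for an ftp:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "ftp" or not parts.hostname:
        raise ValueError(f"not an ftp URL: {url!r}")
    return parts.hostname, parts.path or "/"


def fetch_samples(database_dir, email, url=SAMPLE_URL):
    """Download the files in one database directory to the current folder.

    Assumption: database_dir contains only plain files; a subdirectory
    would make the RETR command fail.
    """
    host, base = split_ftp_url(url)
    with FTP(host) as ftp:
        # Anonymous login, with the e-mail address as the password.
        ftp.login(user="anonymous", passwd=email)
        ftp.cwd(base)
        print(ftp.nlst())  # expected: one directory per NLM database
        ftp.cwd(database_dir)
        for name in ftp.nlst():
            with open(name, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)
```

A call would look like `fetch_samples("<database directory>", "you@example.org")`, with the directory name taken from the listing the server returns.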

  1. MEDLINE®/PubMed® (includes approximately 98% of all records in PubMed)

    NLM distributes MEDLINE/PubMed data in XML format.

    2009 Production Year Data
    The DTDs used for the forthcoming 2009 production year are available from http://www.nlm.nih.gov/databases/dtd/.

    2008 Production Year Data
    The NLMMEDLINE DTD used for the 2008 production year data is available at http://www.nlm.nih.gov/databases/dtd/nlmmedline_080101.dtd. This DTD references the NLMMedlineCitation DTD at http://www.nlm.nih.gov/databases/dtd/nlmmedlinecitation_080101.dtd, which in turn references the new NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_080101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_080101.dtd.
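
    Because a validating parser needs every DTD in this chain, it can help to list them in order. The sketch below only encodes the reference chain described above; the `REFERENCES` table and the `dtd_chain` function are illustrative names, and the URL pattern is inferred from the four 2008 URLs given here.

```python
DTD_BASE = "http://www.nlm.nih.gov/databases/dtd/"

# Each 2008 DTD mapped to the DTD it references (None ends the chain).
REFERENCES = {
    "nlmmedline": "nlmmedlinecitation",
    "nlmmedlinecitation": "nlmsharedcatcit",
    "nlmsharedcatcit": "nlmcommon",
    "nlmcommon": None,
}


def dtd_chain(start, version="080101"):
    """Return the URLs of `start` and every DTD it pulls in, in order."""
    urls = []
    name = start
    while name is not None:
        urls.append(f"{DTD_BASE}{name}_{version}.dtd")
        name = REFERENCES[name]
    return urls


for url in dtd_chain("nlmmedline"):
    print(url)
```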

    Six large sample files of 30,000 records each, named medsamp2008a.xml through medsamp2008f.xml and provided in both .gz and .zip format, are available for ftp (see access instructions at the top of this page). These files contain records in MEDLINE, PubMed-not-MEDLINE, and OLDMEDLINE statuses. Please note that maintained versions of all sample records may reside in PubMed during the year.
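
    A downloaded .gz sample can be checked without unpacking it first. The following is a rough sketch under two assumptions that this page does not state: that the compressed files are named by appending .gz to the .xml name, and that each record is a `MedlineCitation` element. Adjust `suffix` and `record_tag` if the files differ.

```python
import gzip
import xml.etree.ElementTree as ET
from string import ascii_lowercase


def sample_names(year=2008, count=6, suffix=".xml.gz"):
    """Build names such as medsamp2008a.xml.gz for the first `count` letters."""
    return [f"medsamp{year}{letter}{suffix}" for letter in ascii_lowercase[:count]]


def count_records(path, record_tag="MedlineCitation"):
    """Stream a gzipped XML file and count elements named `record_tag`."""
    total = 0
    with gzip.open(path, "rb") as stream:
        # iterparse reads incrementally, so the file is never held whole.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == record_tag:
                total += 1
                elem.clear()  # drop the record's children once counted
    return total
```

    Running `count_records` on each name from `sample_names()` gives a quick way to compare a download against the stated 30,000 records per file.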

    A small sample file of 138 representative records is also available. It uses the 2008 DTDs and covers each of the five status categories of records distributed to MEDLINE/PubMed licensees (i.e., MEDLINE, In-Data-Review, In-process, PubMed-not-MEDLINE, and OLDMEDLINE). (Note: elements new for 2008 are not yet represented in these records.)

    Documentation
    A document describing the MEDLINE/PubMed data elements (including definitions of the record status categories) is available at http://www.nlm.nih.gov/bsd/licensee/data_elements_doc.html.

  2. CCRIS, GENE-TOX and HSDB®
    Sample CCRIS, GENE-TOX and HSDB data in an abbreviated XML format are available for ftp. See the instructions at the top of this page for obtaining the abbreviated DTDs, sample records in XML format, and two files of documentation for each database from NLM's ftp server. The two documentation files are a .readme file, which defines the elements using legacy format element names, and a conversion table, which maps legacy format element names to the new XML element names.

  3. TOXLINE® Subset
    Sample TOXLINE Subset data in XML format are available for ftp. See the instructions at the top of this page for obtaining sample records and DTDs from NLM's ftp server. Multiple DTDs and sample files are available for TOXLINE Subset: toxspec.dtd defines the XML for the entire TOXLINE Subset and archival.dtd defines the XML for the archival subfiles only. (Note that licensees must have special arrangements with BIOSIS and IPA before NLM will distribute their data.) Other DTDs and sample files are present for each individual subfile of the database. Updates for the various subfiles comprising this database, if available, will be placed on the NLM server for licensees at the end of each month. The frequency of updates will be irregular, as NLM depends on outside suppliers whose schedules are not fixed. Each update file will be a complete replacement for that specific subfile.

  4. ChemIDplus Subset and DIRLINE®
    Sample ChemIDplus and DIRLINE data in XML format are available for ftp. See the instructions at the top of this page for obtaining the DTDs and sample records in XML format from NLM's ftp server. Note that licensees must contact U.S. Pharmacopeia Convention, Inc. (USP), for possible special arrangements before NLM will distribute ChemIDplus.

  5. Catfile, CatfilePlus, and Serfile
    Catfile is available in MARC 21 format only; CatfilePlus and Serfile are also available in XML format. Sample files of MARC 21 and XML-formatted products are available per access instructions at the top of this page.

    CatfilePlus in XML and Serfile in XML are defined by three NLM DTDs:
    The 2008 NLMCatalogRecord DTD is available at http://www.nlm.nih.gov/databases/dtd/nlmcatalogrecord_080101.dtd. This DTD references the NLMSharedCatCit DTD at http://www.nlm.nih.gov/databases/dtd/nlmsharedcatcit_080101.dtd, which in turn references the NLMCommon DTD at http://www.nlm.nih.gov/databases/dtd/nlmcommon_080101.dtd.

    Data element descriptions applicable to CatfilePlus in XML and Serfile in XML are available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_element_desc2.html. A description of attributes for these elements is available at http://www.nlm.nih.gov/bsd/licensee/catrecordxml_attributevalues_alpha2.html.

    2009 Production Year Data
    The DTDs used for the forthcoming 2009 production year are available from http://www.nlm.nih.gov/databases/dtd/.

    General information on the MARC 21 record structure is available from the Library of Congress at http://lcweb.loc.gov/marc/marcdocz.html.

Last reviewed: 22 August 2008
Last updated: 22 August 2008
First published: 01 January 1999
Permanence level: Permanence Not Guaranteed