PubChem Download Facility Help

This document describes how to use the download facility built into PubChem, which lets you create a file containing a set of structures from the results of an Entrez search or an ID list. Note that the number of records that can be retrieved this way is limited to 500,000 (except for images, which are limited to 50,000); if you need larger sets of records, you should use the PubChem FTP site, which contains the entire database in the same formats available here. All download requests are kept private; without your unique 64-bit key, nobody else can see what records you have requested.

Steps to downloading:

1) Perform a search in PC-Substance or PC-Compound Entrez.

2) From the Display menu near the top of the Entrez results page, select PubChem Download.

3) This will take you to a format selection page. There are two menus, one to choose the data format:
A second menu lets you choose the compression for the resulting data file:
4) Press the Download button to begin the download process. Because the records are being retrieved directly from the PubChem database, it is necessary to queue download requests in order to prevent server overload. You will see a series of self-refreshing pages during this process. In particular, the Queue status shows what's happening:
You do not have to keep your browser open on this page the entire time; you can bookmark this status page and come back to it later to check your request's progress, anytime within 24 hours of the initial request.

5) When the download is finished, your file should start transferring automatically. You can also download by FTP from the given URL link, either directly through your browser or with any FTP client. Your file will remain on the FTP site for at least a week.

It is now possible to download directly without going through Entrez. Simply navigate to the download service URL (http://pubchem.ncbi.nlm.nih.gov/pc_fetch), select a database, and supply a list of IDs. These should be SIDs for PubChem Substance or CIDs for PubChem Compound, and you may either enter them in the web page form or upload a local file of IDs. The IDs may be integers separated by any combination of white space, comma, or semicolon. The rest of the download operation then proceeds as described above. Note that these additional inputs will not appear in the web form when downloading from Entrez.

The Save Job button produces an XML data structure that may be used with PUG, or as a model for constructing PUG download requests. See http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html for more information on accessing PubChem through PUG.
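If you prepare the ID list with a script before pasting or uploading it, the ID format described above (integers separated by any combination of white space, comma, or semicolon) can be checked locally first. The following Python sketch is only an illustration and is not part of PubChem; the function names parse_id_list, within_limit, and format_id_file are hypothetical, and the 500,000 figure is simply the record limit quoted at the top of this document.

```python
import re

# Record limit quoted in this document for non-image downloads (assumption:
# it applies equally to ID lists supplied directly to the download service).
MAX_RECORDS = 500_000


def parse_id_list(text):
    """Split text on white space, commas, or semicolons and return integer IDs."""
    tokens = [t for t in re.split(r"[\s,;]+", text) if t]
    ids = []
    for token in tokens:
        # Accept plain ASCII digit strings only; anything else is rejected.
        if not re.fullmatch(r"[0-9]+", token):
            raise ValueError(f"not an integer ID: {token!r}")
        ids.append(int(token))
    return ids


def within_limit(ids, limit=MAX_RECORDS):
    """Return True if the number of IDs does not exceed the download limit."""
    return len(ids) <= limit


def format_id_file(ids):
    """Render IDs one per line, a layout that satisfies the white-space rule."""
    return "\n".join(str(i) for i in ids) + "\n"


if __name__ == "__main__":
    cids = parse_id_list("2244, 3672;  5090\n1983")
    print(cids)                 # [2244, 3672, 5090, 1983]
    print(within_limit(cids))   # True
    print(format_id_file(cids), end="")
```

The text returned by format_id_file can be saved to a local file and uploaded through the web form, or pasted into the form directly. Whether the service accepts duplicate IDs is not stated here, so the sketch leaves duplicates untouched.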