Published Patent Application (Pre-Grant Publications or PGPUBs) Data Products

A patent application is a document submitted by an inventor requesting a patent be issued.

A patent application document contains bibliographic front page information, an abstract (summary), specification and claims as originally filed, and drawings depicting the invention.

 

Patent Application Publication Multi-Page Images (2001-current Calendar Year)

Contains the images of each patent application publication (non-provisional utility and plant) published weekly (Thursdays) from March 15, 2001 to the current calendar year in Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression (multi-page TIFFs) from the USPTO USAApp optical disc product (discontinued 12/31/2011).

Each weekly file contains approximately 5,000 patent application publications and is approximately 6 GB (compressed). The entire yearly collection is approximately 4 TB.

Available weekly (7-14 days after publication) for no charge: https://bulkdata.uspto.gov/data3/patent/application/multipagetiff/2016 and http://patents.reedtech.com/pampi.php

Patent Application Publication Single-Page Images (current Calendar Year)

Contains the images of each patent application publication (non-provisional utility and plant) published weekly (Thursdays) from March 15, 2001 to the current calendar year in Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression (single-page TIFFs).

Each weekly file contains approximately 5,000 published patent applications and is approximately 8 GB (compressed). Backfiles are approximately 10 GB (compressed). The entire yearly collection is approximately 4 TB.

Available weekly for no charge: https://bulkdata.uspto.gov/data/patent/application/yellowbook/2016 and http://patents.reedtech.com/payb.php

Documentation: https://www.uspto.gov/learning-and-resources/xml-resources/xml-resources-retrospective

(*An annual subscription for the current Calendar Year is available on blu-ray discs for $5,200. Contact ipd@uspto.gov for ordering information.)

Patent Application Publication Full-Text (2001-current Calendar Year)

Contains the full text of each patent application publication (non-provisional utility and plant) published weekly (Thursdays) in the current calendar year (excludes images/drawings). The file format is eXtensible Markup Language (XML) in accordance with the Patent Application Version 4.3 International Common Element (ICE) Document Type Definition (DTD). These files are a subset and concatenation of the Patent Application Publication Data/XML Version 4.3 ICE.

Because each file is a concatenation of individual XML documents, it is not a single well-formed XML document: it will not parse successfully or open/display by default in Internet Explorer, and it will not import into MS Excel. A well-formed XML document has exactly one root start tag and one matching end tag, whereas a concatenated file contains 5,000 or more such start/end tag pairs. If you extract one document from the Patent Application Publication Full Text file, place it in a directory with the correct DTD, and double-click it, Internet Explorer will parse and open the document successfully. NOTE: You may receive a warning about ActiveX controls.

All Patent Application Publication Full Text files will open successfully in MS Word, Notepad, WordPad, and TextPad.
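The one-document-at-a-time extraction described above can also be scripted. The following is a minimal Python sketch, not an official USPTO tool. It assumes (this is not stated in the product description) that every document in the concatenated file begins with its own `<?xml` declaration at the start of a line. The function name `split_concatenated_xml` and the two-document sample are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator


def split_concatenated_xml(lines: Iterable[str]) -> Iterator[str]:
    """Yield each XML document from lines holding many concatenated documents.

    Assumption: a line starting with "<?xml" marks the start of a new document.
    """
    current = []
    for line in lines:
        if line.lstrip().startswith("<?xml") and current:
            document = "".join(current)
            if document.strip():  # skip whitespace-only chunks
                yield document
            current = []
        current.append(line)
    document = "".join(current)
    if document.strip():
        yield document


# Hypothetical two-document sample standing in for a weekly file.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<doc id="1"><title>First</title></doc>\n'
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<doc id="2"><title>Second</title></doc>\n'
)

docs = list(split_concatenated_xml(sample.splitlines(keepends=True)))
# Each piece now has a single root element, so a standard parser accepts it.
roots = [ET.fromstring(d.strip().encode("utf-8")) for d in docs]
print(len(docs), [r.get("id") for r in roots])
```

For a real weekly file, the same function could be fed an open text file handle instead of `sample.splitlines(keepends=True)`, which keeps only one document in memory at a time.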

These 52 zip files are available for no charge from: https://bulkdata.uspto.gov/data2/patent/application/redbook/fulltext/2016 and http://patents.reedtech.com/parbft.php

Documentation: https://www.uspto.gov/learning-and-resources/xml-resources/xml-resources-retrospective

(*An annual subscription for the current Calendar Year is available for $2,500. Contact ipd@uspto.gov for ordering information.)

Patent Application Publication Full-Text with Embedded Images (2001-current Calendar Year)

Contains the full text, images/drawings, and complex work units (tables, mathematical expressions, chemical structures, and genetic sequence data) of each patent application publication (non-provisional utility and plant) published weekly (Thursdays) in the current calendar year. The file format is eXtensible Markup Language (XML) in accordance with the Patent Application Version 4.3 International Common Element (ICE) Document Type Definition (DTD).

Tables and sequence data are included using CALS markup. Mathematical expressions are included using MathML markup and external Mathematica Notebook (NB) files. Chemical structures are represented by external CambridgeSoft Corp. ChemDraw (CDX) files and MDL Information Systems (MOL) files. Drawings, mathematical expressions, and chemical structures are also included as external Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression image files.

Each weekly file contains approximately 5,000 patent application publications. An optional weekly Supplemental zip file may accompany it, containing lengthy sequence listings (anything over 300 pages) or lengthy tables (anything over 200 pages).
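Because the product mixes XML with several kinds of external files, it can help to take an inventory of an archive before processing it. The following Python sketch groups zip members by file extension. The extension-to-format mapping, the flat archive layout, and the member names in the demo are all assumptions made for illustration; the actual layout of a weekly file should be checked against the documentation.

```python
import io
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed mapping from file extension to the formats named in the description.
KIND_BY_EXTENSION = {
    ".xml": "full text (XML)",
    ".tif": "image (TIFF)",
    ".tiff": "image (TIFF)",
    ".cdx": "chemical structure (ChemDraw)",
    ".mol": "chemical structure (MOL)",
    ".nb": "mathematical expression (Mathematica)",
}


def group_members_by_kind(archive: zipfile.ZipFile) -> dict[str, list[str]]:
    """Group archive member names by the kind implied by their extension."""
    groups = defaultdict(list)
    for name in archive.namelist():
        if name.endswith("/"):  # skip directory entries
            continue
        extension = PurePosixPath(name).suffix.lower()
        groups[KIND_BY_EXTENSION.get(extension, "other")].append(name)
    return dict(groups)


# Build a small in-memory archive with hypothetical member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for member in [
        "app/doc.xml",
        "app/fig-D00001.TIF",
        "app/chem-C00001.CDX",
        "app/chem-C00001.MOL",
        "app/math-M00001.NB",
        "app/readme.txt",
    ]:
        zf.writestr(member, b"")

with zipfile.ZipFile(buffer) as zf:
    groups = group_members_by_kind(zf)

for kind, names in groups.items():
    print(kind, names)
```

Anything the mapping does not recognize, including any nested archives, lands in the "other" group so that nothing is silently dropped.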

Available for no charge: https://bulkdata.uspto.gov/data2/patent/application/redbook/2016 and http://patents.reedtech.com/parbfti.php

Documentation: https://www.uspto.gov/learning-and-resources/xml-resources/xml-resources-retrospective

(*An annual subscription for the current Calendar Year is available on DVD-ROMs for $5,200. Contact ipd@uspto.gov for ordering information.)

Patent Application Publication Bibliographic (2001-current Calendar Year)

Contains the bibliographic text (i.e., front page) of each patent application publication (non-provisional utility and plant) published weekly (Thursdays) in the current calendar year (excludes images/drawings). The file format is eXtensible Markup Language (XML) in accordance with the Patent Application Version 4.3 International Common Element (ICE) Document Type Definition (DTD). These files are a subset and concatenation of the Patent Application Publication Data/XML Version 4.3 ICE (Text Only).

Because each file is a concatenation of individual XML documents, it is not a single well-formed XML document: it will not parse successfully or open/display by default in Internet Explorer, and it will not import into MS Excel. A well-formed XML document has exactly one root start tag and one matching end tag, whereas a concatenated file contains 5,000 or more such start/end tag pairs. If you extract one document from the Patent Application Publication Bibliographic file, place it in a directory with the correct DTD, and double-click it, Internet Explorer will parse and open the document successfully. NOTE: You may receive a warning about ActiveX controls.

All Patent Application Publication Bibliographic files will open successfully in MS Word, Notepad, WordPad, and TextPad. Available on publication day (Thursdays).
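Once a single document has been separated from a bibliographic file, its front-page fields can be read with a standard XML parser. The sketch below is illustrative only: the sample record is made up, and the element names (`publication-reference`, `document-id`, `doc-number`, `kind`, `date`, `invention-title`) are assumptions about the Version 4.3 DTD that should be verified against the DTD itself. The function name `summarize_front_page` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up sample record; element names are assumptions about the v4.3 DTD.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<us-patent-application>
  <us-bibliographic-data-application>
    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>20160000001</doc-number>
        <kind>A1</kind>
        <date>20160107</date>
      </document-id>
    </publication-reference>
    <invention-title>Example widget</invention-title>
  </us-bibliographic-data-application>
</us-patent-application>
"""


def summarize_front_page(xml_text: str) -> dict:
    """Return a few front-page fields from one application document."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    doc_id = root.find(".//publication-reference/document-id")
    title = root.find(".//invention-title")

    def field(name):
        # A missing element yields None rather than raising an error.
        return doc_id.findtext(name) if doc_id is not None else None

    return {
        "number": field("doc-number"),
        "kind": field("kind"),
        "date": field("date"),
        # itertext() keeps text nested inside any inline markup in the title.
        "title": "".join(title.itertext()).strip() if title is not None else None,
    }


summary = summarize_front_page(SAMPLE)
print(summary)
```

Returning `None` for absent fields, rather than raising, is a deliberate choice here so that a batch run over thousands of documents does not stop at the first record that deviates from the assumed structure.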

The data set is approximately 2.7 MB per week (compressed).

These product files are available for no charge from: https://bulkdata.uspto.gov/data2/patent/application/redbook/bibliographic/2016 and http://patents.reedtech.com/parbbib.php

For more information on bulk data products, contact ipd@uspto.gov.