Research datasets

To advance research on intellectual property, entrepreneurship, and innovation, the Office of the Chief Economist (OCE) releases datasets that facilitate economic research on patents and trademarks, an element in the USPTO economics research agenda. OCE offers these data in forms convenient for public use and academic research, consistent with the agency's responsibility to make patent and trademark information open and transparent. Furthermore, OCE data releases support White House policy that champions transparency and access to government data under the "data.gov" umbrella of initiatives. Since these data have not been commonly used in the research community, OCE provides supplementary documentation that comprehensively describes the data and presents initial findings.

The following datasets and accompanying documentation are available for download.

Patent Litigation Docket Reports Data

The Patent Litigation Dataset contains detailed patent litigation data on 74,623 unique district court cases filed during the period 1963-2015. OCE collected all of the data from Public Access to Court Electronic Records (PACER) and RECAP, an independent project designed to serve as a repository for litigation data sourced from PACER. The final output datasets, provided in five different files, include information on the litigating parties involved and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, which cover more than 5 million separate documents contained in the case docket reports.

The Office of the Chief Economist at the USPTO conducted a pilot project during which these data were collected and organized with the intent of making them accessible for public use. We encourage users to provide any feedback on the quality and coverage of the data and to share any suggested improvements. Please provide all feedback to EconomicsData@uspto.gov.

Patent Claims Research Dataset

The Patent Claims Research Dataset contains detailed information on claims from U.S. patents granted between 1976 and 2014 and U.S. patent applications published between 2001 and 2014. The dataset is derived from the Patent Application Publication Full-Text and Patent Grant Full Text files, available at https://bulkdata.uspto.gov/, to which the Office of the Chief Economist (OCE) applied a Python algorithm to identify individual claims as well as the dependency relationships between claims. From the parsed claims text, OCE created six data files containing individually parsed claims, claim-level statistics, and document-level statistics, including newly developed measures of patent scope.
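To illustrate what identifying individual claims and their dependency relationships can involve, the following is a minimal sketch in Python. It is not OCE's algorithm, which is described in the dataset's documentation. The function names, the regular expressions, and the sample claims text are all hypothetical, and the sketch assumes each claim starts on its own line with a number and a period.

```python
import re

# Hypothetical pattern: captures the first number after "claim" or "claims".
# Ranges and lists such as "claims 1 to 3" are deliberately not handled here.
CLAIM_REF = re.compile(r"\bclaims?\s+(\d+)", re.IGNORECASE)

# Assumption: every claim begins on a new line as "<number>. <text>".
CLAIM_START = re.compile(
    r"^\s*(\d+)\.\s+(.*?)(?=^\s*\d+\.\s|\Z)", re.MULTILINE | re.DOTALL
)


def split_claims(text):
    """Split raw claims text into {claim_number: normalized_claim_text}."""
    claims = {}
    for match in CLAIM_START.finditer(text):
        # Collapse line breaks and repeated spaces inside a claim.
        claims[int(match.group(1))] = " ".join(match.group(2).split())
    return claims


def find_dependencies(claims):
    """Map each claim to the earlier claims it refers to.

    An empty list marks a claim as independent.
    """
    dependencies = {}
    for number, body in claims.items():
        referenced = {int(n) for n in CLAIM_REF.findall(body)}
        dependencies[number] = sorted(r for r in referenced if r < number)
    return dependencies


# Made-up claims text, for illustration only.
SAMPLE = """\
1. A widget comprising a frame and a lever.
2. The widget of claim 1, wherein the lever is steel.
3. The widget of claim 2, further comprising a spring.
4. A method of making a widget, comprising casting a frame.
"""

claims = split_claims(SAMPLE)
dependencies = find_dependencies(claims)
print(dependencies)
```

In this sample, claims 1 and 4 come out as independent, while claims 2 and 3 depend on claims 1 and 2 respectively. Real claims text contains multiple-dependency phrasing, ranges, and formatting irregularities that a sketch like this does not cover.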

Cancer Moonshot Patent Data

The USPTO Cancer Moonshot Patent Data contains detailed information on published patent applications and granted patents relevant to cancer research and development (R&D). We generate the dataset using USPTO examiner tools to execute a series of queries designed to identify the various fields and subject matter that cancer-related innovations encompass. The final dataset consists of roughly 270,000 patent documents spanning the 1976 to 2016 period and is intended to help identify promising R&D on the horizon in diagnostics, therapeutics, data analytics, and model biological systems.

Patent Examination Research Dataset (Public PAIR)

The Patent Examination Research Dataset (PatEx) contains detailed information on 9.2 million publicly viewable patent applications filed with the USPTO through December 2014. The data are sourced from the Public Patent Application Information Retrieval system (Public PAIR). The data files include information on each application’s characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.

PatentsView

PatentsView is a prototype patent data visualization and analysis platform intended to increase the value, utility, and transparency of US patent data. The initiative is supported by the Office of the Chief Economist in the US Patent & Trademark Office (USPTO), with additional support from the US Department of Agriculture (USDA). The PatentsView platform is built on a newly developed database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The platform uses data derived from USPTO bulk data files. These data are provided for research purposes and do not constitute the official USPTO record. The data visualization tool, query tool, and flexible API enable a broad spectrum of users to examine the dynamics of inventor patenting activity over time and space. They also permit users to explore patent technologies, assignees, citation patterns, and co-inventor networks.

Historical Patent Data Files

Patent classification systems are largely designed for administrative purposes, limiting their value for most research purposes. To address this deficiency, Hall, Jaffe, and Trajtenberg (2001) developed a higher-level classification for the National Bureau of Economic Research (NBER) Patent Citation Data File by aggregating U.S. Patent Classification (USPC) classes into economically relevant technology categories. While this NBER classification scheme has proven valuable for researchers investigating US patent grants, comparable information on patent applications remained unavailable. For that reason, OCE developed a probability-matching algorithm to apply NBER classifications to patent applications as well as in-force and expired patents. From matched data, we construct the USPTO Historical Patent Data Files, four research datasets containing time series and micro-level data by NBER sub-category on applications, grants, and in-force patents spanning two centuries of innovation.

Patent Assignment Dataset

The USPTO allows parties to record assignments of patents and patent applications in order to maintain, as far as possible, a complete history of claimed interests in a patent. The USPTO also permits recording of other documents that affect title (such as certificates of name change and mergers of businesses) or are relevant to patent ownership (such as licensing agreements, security interests, mortgages, and liens). The Patent Assignment Dataset contains detailed information on 6.8 million patent assignments and other transactions recorded at the USPTO since 1970 and involving roughly 11.1 million patents and patent applications. The Patent Assignment Dataset is updated annually.

Trademark Assignment Dataset

The USPTO allows parties to record assignments of trademark applications and registrations to maintain a complete history of claimed interests in a mark. The Trademark Assignment Dataset contains detailed information on more than 873,000 assignments and other transactions recorded at the USPTO since 1952 and involving 1.6 million unique trademark properties. The Trademark Assignment Dataset is updated annually.

Trademark Case Files Dataset

The Trademark Case Files Dataset contains detailed information on 7.7 million trademark applications filed with or registrations issued by the USPTO between 1870 and December 2015. It is derived from the USPTO main database for administering trademarks and includes data on mark characteristics and designs, prosecution events, ownership, classification, renewal history, foreign priority, and international registration. The Trademark Case Files Dataset is updated annually.

Legal Authority

The release of these data is consistent with the agency's responsibility under 35 USC 2 to make information about patents and trademarks available to the public. Providing research datasets to allow for study of the economics of patents and trademarks is also an element in the USPTO economics research agenda. Furthermore, it supports the Obama administration's policy championing transparency and access to government data under the "data.gov" umbrella of initiatives.