The HealthData.gov API gives software developers programmatic access to the contents of our data catalog. Use it to find recently added datasets, search the catalog, download the contents of the catalog for analysis, or build a new data catalog tool.
HealthData.gov uses CKAN for its API. We are running CKAN version 1.7. Documentation for CKAN’s API can be found at http://docs.ckan.org/en/latest/api-v2.html, and CKAN’s support for RDF is described at http://wiki.ckan.org/RDF_and_CKAN.
Questions about the data catalog API can be sent to the HealthDataGov Google Group.
The base URL for the HealthData.gov API is http://hub.healthdata.gov/api.
In this article:
- Accessing the Complete Data Catalog Listing
- Searching the Data Catalog
- Accessing Individual Datasets
- JSON Schema
Accessing the Complete Data Catalog Listing
A JSON listing of every dataset in the catalog can be accessed at http://hub.healthdata.gov/api/2/rest/dataset.
The response is a JSON list of dataset IDs (GUIDs). It looks like this:
["0056861d-28cd-4f8d-97b3-6205517637c3", "00aada73-a456-4547-ac5a-e5ffdc6b4847", "02588273-41d6-4ae5-a90a-1e336d0f129e", "03edc320-4eb7-4089-b66a-a54760a44b28", "0477da33-0795-4669-bba5-cc494604b022", "05457387-7ab6-4c1a-9dba-b1e5bdd5f2ad", "05b7319c-20a1-43f5-a01a-3847933d4ccf", "0660d0f4-b600-4d1e-a0be-228fc2857a12", "067da109-762f-4417-acea-521f227aea42", "088c4f1b-b266-40e8-a12b-cda2b97670eb", "08d78f4d-40c0-4691-948c-a4f17df65e59", "09bda462-ef6b-43ee-955f-b3e40d288eec", . . .
You can also get a list of slugs (i.e. the name that goes into the URL for each dataset) rather than GUIDs using http://hub.healthdata.gov/api/1/rest/dataset. The response is a JSON list of strings:
["dietary-supplements-labels", "nursing-home-compare", "child-growth-charts", "home-health-compare", "genetics-home-reference", "renal-dialysis-facility-medicare-cost-report-data-1996", "renal-dialysis-facility-medicare-cost-report-data-2001", "health-resources-county-comparison", "home-health-agency-medicare-cost-report-data", "omha-appeals-listed-state", "renal-dialysis-facility-medicare-cost-report-data", "hospital-medicare-cost-report-data-fy1995", "2008-basic-stand-alone-hospice", "find-shortage-areas-hpsas-eligible", "part-national-summary-data-file-cy2004", "departmental-appeals-board-decisions", . . .
Alternatively, you can get the complete metadata record for each dataset in the response using http://hub.healthdata.gov/api/search/dataset?all_fields=1&rows=500. The fields are described in more detail below.
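The listing endpoints above could be called from Python's standard library along these lines. This is a minimal sketch: the helper names `listing_url` and `fetch_json` are illustrative and not part of the API, and it assumes the endpoints respond as described above.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://hub.healthdata.gov/api"  # base URL given in this article


def listing_url(api_version):
    """Return the listing URL: version 2 lists GUIDs, version 1 lists slugs."""
    if api_version not in (1, 2):
        raise ValueError("api_version must be 1 or 2")
    return f"{BASE_URL}/{api_version}/rest/dataset"


def fetch_json(url, timeout=30):
    """Download a URL and decode its JSON body (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(listing_url(2))` should then return the list of GUID strings, and `fetch_json(listing_url(1))` the list of slugs.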
Searching the Data Catalog
The data catalog can be searched using URLs such as:
http://hub.healthdata.gov/api/search/dataset?q=medicare&start=0&rows=20
Use the q parameter to specify the search term. Note that the results are paged. Use start and rows to specify the page of results to load. See the CKAN search API documentation for details. The response for the above search is:
{"count": 164, "results": ["medicare-enrollment-dashboard", "medicare-tools-downloadable", "medicare-appeals-council-decisions", "medicare-appeals-council-decisions-1", "medicare-medicaid-statistical", "medicare-geographic-variation", "data-compendium", "chronic-conditions-chart-book", "helpful-contacts", "plans-quality-compare", "2008-chronic-conditions", "2008-basic-stand-alone-hospice", "2008-basic-stand-alone-durable", "2008-basic-stand-alone-prescription", "active-project-reports", "2008-basic-stand-alone-home", "2008-basic-stand-alone-skilled", "2008-basic-stand-alone-carrier", "claims-listed-state", "omha-appeals-listed-state"] }
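Because results are paged, a client usually needs one request per page. The sketch below builds the search URLs and works out the start offsets needed to cover a result count; `search_url` and `page_starts` are illustrative names, not part of CKAN.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"


def search_url(term, start=0, rows=20):
    """Build a search URL using the q, start and rows parameters."""
    return SEARCH_URL + "?" + urlencode({"q": term, "start": start, "rows": rows})


def page_starts(count, rows=20):
    """Return the start offset of every page needed to cover `count` results."""
    return list(range(0, count, rows))
```

For the response above, the "count" value of 164 with 20 rows per page means nine requests, with start values running from 0 to 160.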
The primary fields (see below) support filtering. Use author=___ to filter the results by agency. For instance http://hub.healthdata.gov/api/search/dataset?author=Centers%20for%20Medicare%20%26%20Medicaid%20Services returns only datasets submitted by the Centers for Medicare & Medicaid Services:
{"count": 149, "results": ["2008-basic-stand-alone-carrier", "2008-basic-stand-alone-durable", "2008-basic-stand-alone-home", "2008-basic-stand-alone-hospice", "2008-basic-stand-alone-inpatient", "2008-basic-stand-alone-outpatient", "2008-basic-stand-alone-prescription", "2008-basic-stand-alone-skilled", "2008-chronic-conditions", "active-project-reports"] }
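Agency names often contain spaces and ampersands, which must be percent-encoded in the query string. A small sketch of building the filter URL with the standard library follows; `author_filter_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

SEARCH_URL = "http://hub.healthdata.gov/api/search/dataset"


def author_filter_url(agency):
    """Build a search URL that filters on the author (agency) field."""
    # quote_via=quote encodes spaces as %20 (not "+") and "&" as %26,
    # which matches the example URL shown above.
    query = urlencode({"author": agency}, quote_via=quote)
    return SEARCH_URL + "?" + query
```

Passing "Centers for Medicare & Medicaid Services" reproduces the example URL given above.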
You can also search for recently revised entries using URLs such as http://hub.healthdata.gov/api/search/revision?since_time=2012-05-05. The result is a list of revision GUIDs. You can find the dataset GUID from a revision GUID by appending the revision GUID to http://hub.healthdata.gov/api/2/rest/revision/, such as http://hub.healthdata.gov/api/2/rest/revision/b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d, which gives this output:
{ "id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d", "timestamp": "2012-05-30T22:16:35.228513", ... "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"], "groups": [] }
The dataset GUID is the GUID in the packages element (e5784720-a9a5-407e-bc36-84420289f1a9 in this example). You can plug that into the dataset details API explained next.
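The revision lookup can be sketched as follows. The helper names are illustrative, and the sample revision includes only the fields shown in the example output (the elided fields are left out).

```python
import json

API = "http://hub.healthdata.gov/api"


def revisions_since_url(day):
    """URL listing revision GUIDs since a date given as YYYY-MM-DD."""
    return f"{API}/search/revision?since_time={day}"


def revision_url(revision_guid):
    """URL of the record for a single revision."""
    return f"{API}/2/rest/revision/{revision_guid}"


def dataset_urls(revision):
    """Map each dataset GUID in a revision's packages element to its JSON URL."""
    return [f"{API}/2/rest/dataset/{guid}" for guid in revision.get("packages", [])]


# Sample taken from the example output, with the elided fields omitted.
SAMPLE_REVISION = json.loads("""
{"id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d",
 "timestamp": "2012-05-30T22:16:35.228513",
 "packages": ["e5784720-a9a5-407e-bc36-84420289f1a9"],
 "groups": []}
""")
```

A revision may list more than one package, which is why `dataset_urls` returns a list rather than a single URL.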
Accessing Individual Datasets
Dataset metadata is available in machine-readable form in JSON, RDF/XML, and Notation 3.
JSON
To access a particular dataset in JSON, append the dataset GUID to http://hub.healthdata.gov/api/2/rest/dataset/. The response is a JSON object containing information about the dataset. For instance the URL http://hub.healthdata.gov/api/2/rest/dataset/e5784720-a9a5-407e-bc36-84420289f1a9 gives:
{ "id": "e5784720-a9a5-407e-bc36-84420289f1a9", "metadata_created": "2012-05-30T22:16:35.228513", "metadata_modified": "2012-05-30T22:16:35.228513", "author": "Centers for Medicare & Medicaid Services", "tags": ["claims", "enrollment", "expenditures", "inpatient", "managed care", "medicaid", "prescription drug"], "name": "validation-reports", "notes_rendered": "<p>Medicaid Analytic eXtract (MAX) Validation Reports ...", "url": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/list.asp", "notes": "Medicaid Analytic eXtract (MAX) Validation Reports These documents contain ...", "title": "MAX Validation Reports", "extras": { "Unit of Analysis": "Person", "hd2-workflow-id": "753", "Agency": "Department of Health & Human Services", "Geographic Granularity": "State", "Technical Documentation": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/list.asp", "Collection Frequency": "Annually", "Agency Program URL": "http://www.cms.gov/MedicaidDataSourcesGenInfo/MVR/", "Date Updated": "2011-10-19", "Date Released": "2003-01-01", "author_id": "http://healthdata.gov/id/agency/cms", "Subject Area 1": "Medicaid", "Geographic Scope": "State" }, "revision_id": "b1dae0c1-10d6-4c4d-8f2b-e9eb46d59d7d" }
To find the URL for a dataset, you can also look for the link in the “Metadata API” field on the dataset page on www.healthdata.gov.
CKAN has three types of fields: primary fields, “extras” (general metadata), and “resources” (downloadable files). All but the primary fields are optional. Field definitions are documented at the end of this page.
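Since only the primary fields are guaranteed, client code should treat extras and resources as optional. A minimal sketch of separating the three groups is shown here; `split_fields` is a hypothetical helper, and the sample record is a trimmed copy of the example above.

```python
PRIMARY_FIELDS = ("id", "title", "notes", "notes_rendered", "author", "url", "tags")


def split_fields(record):
    """Split a dataset record into (primary fields, extras, resources)."""
    primary = {key: record[key] for key in PRIMARY_FIELDS if key in record}
    extras = record.get("extras", {})        # optional general metadata
    resources = record.get("resources", [])  # optional downloadable files
    return primary, extras, resources


# Trimmed copy of the example record shown above.
SAMPLE_RECORD = {
    "id": "e5784720-a9a5-407e-bc36-84420289f1a9",
    "title": "MAX Validation Reports",
    "author": "Centers for Medicare & Medicaid Services",
    "extras": {"Subject Area 1": "Medicaid", "Date Released": "2003-01-01"},
}
```

Using `.get` with a default keeps the code working for records, like the sample, that carry no resources element.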
RDF XML and Notation 3 (N3)
You can also access the dataset metadata in RDF, in either XML or Notation 3 format. To build the URL, concatenate http://hub.healthdata.gov/dataset/, the dataset GUID or name, and either “.rdf” or “.n3”. (This is the public page for the dataset on our CKAN site plus the file extension. Alternatively, you can set the HTTP Accept header to application/rdf+xml or text/n3 on the public page URL.)
Taking the same dataset as above, the RDF metadata can be accessed at http://hub.healthdata.gov/dataset/e5784720-a9a5-407e-bc36-84420289f1a9.rdf. We use Dublin Core, DCAT, and other vocabularies as appropriate.
You can also find the URL in the Metadata API field on www.healthdata.gov's dataset pages.
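Both ways of requesting RDF described above can be sketched with the standard library. The helper names `rdf_url` and `rdf_request` are illustrative; the request object is only constructed here, not sent.

```python
from urllib.request import Request

PUBLIC_DATASET_URL = "http://hub.healthdata.gov/dataset/"


def rdf_url(dataset_id, extension="rdf"):
    """Build the RDF URL by appending ".rdf" or ".n3" to the public page URL."""
    if extension not in ("rdf", "n3"):
        raise ValueError("extension must be 'rdf' or 'n3'")
    return f"{PUBLIC_DATASET_URL}{dataset_id}.{extension}"


def rdf_request(dataset_id, mime_type="application/rdf+xml"):
    """Prepare a request for the public page with an Accept header instead."""
    return Request(PUBLIC_DATASET_URL + dataset_id, headers={"Accept": mime_type})
```

The prepared request can be passed to `urllib.request.urlopen` to download the metadata; use "text/n3" as the MIME type for Notation 3.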
JSON Schema
The JSON output for datasets uses the following schema:
Primary Fields
field | type | description |
---|---|---|
id | GUID | The unique identifier for the dataset in the HealthData.gov API. |
title | plain text | The display name for the dataset. |
notes | plain text | The description of the dataset. |
notes_rendered | HTML text | The description of the dataset rendered in HTML using Markdown. |
author | plain text | The name of the federal agency that submitted the dataset to HealthData.gov. |
url | url | The URL to the home page for the dataset, which may link to downloadable files. |
tags | array of strings | Tags associated with the dataset. |
Extras Fields
field | type | description |
---|---|---|
author_id | uri | A URI uniquely identifying the agency submitting the data. The URI is in the http://healthdata.gov/id/agency space and while it does not currently resolve to a resource it can be used as a canonical identifier for the agency. |
Group Name | plain text | A display name shared across datasets that are related. |
Agency | plain text | The name of the federal department submitting the data. Generally “Health and Human Services.” |
Subject Area 1 | string | A subject area. Subjects come from a fixed vocabulary, currently: Administrative, Biomedical Research, Children's Health, Epidemiology, Health Care Cost, Health Care Providers, Medicaid, Medicare, Other, Population Statistics, Quality Measurement, Safety, Treatments. |
Subject Area 2 | string | A subject area. See above. |
Subject Area 3 | string | A subject area. See above. |
Date Released | date | The date the dataset was first made available to the public (possibly before it was posted on HealthData.gov). Format: YYYY-MM-DD. |
Date Updated | date | The date the dataset was last changed, i.e. the last change to the data itself and not necessarily the metadata record. Format: YYYY-MM-DD. |
Agency Program URL | url | The URL of the agency program responsible for the data. |
Collection Frequency | string | The frequency with which the data was collected, which is sometimes different from the frequency at which the data is published. Possible values are Annually, Semi-Annually, Quarterly, Monthly, Weekly, Daily. |
Coverage Period Start | date | The start of the coverage period, i.e. the date range that the data pertains to. Format: YYYY-MM-DD. |
Coverage Period End | date | The end of the coverage period, i.e. the date range that the data pertains to. If the coverage period end date is omitted, the dataset may cover the period from the start date to the present time. Format: YYYY-MM-DD. |
Coverage Period Fiscal Year Start | year | For coverage periods that are based on fiscal years rather than calendar years, the starting fiscal year of the coverage period. Format: YYYY. |
Coverage Period Fiscal Year End | year | For coverage periods that are based on fiscal years rather than calendar years, the ending fiscal year of the coverage period. If the coverage period end fiscal year is omitted, the dataset may cover the period from the starting fiscal year to the present time. Format: YYYY. |
Unit of Analysis | plain text | The unit of analysis, i.e. the object of study. Examples are “recalled food items” and “renal dialysis facility”. |
Geographic Scope | plain text | The geographic region covered by the dataset. If omitted, the dataset is typically national in scope. |
Geographic Granularity | string | The granularity of the geographic coverage. Possible values are Latitude/Longitude Coordinate, Street Address, Census Tract, City, MSA (metropolitan statistical area), ZIP Code, County, State, Sub-National Region, and Country. |
Technical Documentation | url | The URL to technical documentation for the dataset. |
Data Dictionary | url | The URL to a data dictionary for the dataset. |
Collection Instrument | url | The URL to information about the data collection instrument. |
License Agreement Required | integer | Whether a license agreement must be agreed to before using the data (1 if yes, 0 if no, omitted if not known). |
License Agreement | url | The URL to a license agreement that must be agreed to before using the data. |
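The date, year, and flag formats listed above can be converted into native Python values. The sketch below assumes the values arrive as strings, or as numbers that `int` accepts, as in the example record; `parse_extras` is a hypothetical helper.

```python
from datetime import date

DATE_FIELDS = ("Date Released", "Date Updated",
               "Coverage Period Start", "Coverage Period End")
YEAR_FIELDS = ("Coverage Period Fiscal Year Start",
               "Coverage Period Fiscal Year End")


def parse_extras(extras):
    """Return a copy of the extras with dates, years and flags converted."""
    parsed = dict(extras)
    for key in DATE_FIELDS:
        if key in parsed:
            parsed[key] = date.fromisoformat(parsed[key])  # YYYY-MM-DD
    for key in YEAR_FIELDS:
        if key in parsed:
            parsed[key] = int(parsed[key])  # YYYY
    flag = "License Agreement Required"
    if flag in parsed:
        parsed[flag] = bool(int(parsed[flag]))  # 1 means yes, 0 means no
    return parsed
```

Fields that are absent stay absent in the result, which preserves the "omitted if not known" meaning of the license flag and the open-ended coverage periods.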
Resource Fields
A dataset may have one or more resource records, each of which represents a downloadable file or a query tool interface. Multiple files are often specified when the dataset is available in multiple formats. Each resource record uses these fields:
field | type | description |
---|---|---|
url | url | The URL of the downloadable file or the query interface. |
name | plain text | The display name of the media format, e.g. CSV. Currently the same as the format attribute. |
format | string | The media format. Possible values are API, CSV, ESRI, Feed, KML, Map, Query Tool, RDF, Text, Widget, XLS, XML. |
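When a dataset is offered in several formats, a client can pick the resource it wants by the format field. This sketch assumes the resource records sit in a "resources" list on the dataset record; the example record and its example.com URLs are made up for illustration.

```python
def urls_for_format(record, media_format):
    """Return the URLs of all resources whose format matches, ignoring case."""
    wanted = media_format.lower()
    return [
        resource["url"]
        for resource in record.get("resources", [])
        if resource.get("format", "").lower() == wanted
    ]


# Hypothetical record, not taken from the catalog.
EXAMPLE_RECORD = {
    "resources": [
        {"url": "http://example.com/data.csv", "name": "CSV", "format": "CSV"},
        {"url": "http://example.com/query", "name": "Query Tool", "format": "Query Tool"},
    ]
}
```

An empty list means the dataset has no resource in the requested format.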