FAQ: COVID-19 Data and Surveillance

Frequently Asked Questions

Updated Nov. 20, 2020

National COVID-19 Case Surveillance

To protect Americans from serious infectious diseases and other health threats, public health authorities conduct national case surveillance to monitor more than 120 diseases and conditions. For these conditions, public health departments collect information on individuals with the infection in a population, which is known as case surveillance. One goal of case surveillance is to provide information needed for taking public health action to prevent cases and spread of disease; another is to control outbreaks. Case surveillance is especially important for new diseases, such as COVID-19, to understand the similarities and differences among cases, including:

  • Demographic, clinical, and epidemiologic characteristics;
  • Exposure and contact history; and
  • Course of clinical illness and care received.

During the COVID-19 response, state and jurisdictional health departments voluntarily send case data to CDC using the National Notifiable Diseases Surveillance System. To protect individuals’ privacy, COVID-19 case data are sent to CDC without personal identifiers, such as names or home addresses. A national standardized case definition is used to define confirmed, probable, and suspect cases and deaths.

Unlike data collected for clinical trials and research studies, in which scientists comprehensively measure and follow the health status of patients, national case surveillance data focus on capturing demographic and risk factor information about people with COVID-19.

The process for reporting, collecting, and analyzing disease data is called a data supply chain. Under state disease reporting laws, hospitals, healthcare providers, and laboratories must report confirmed or probable COVID-19 cases and deaths to state or local health departments. These laws are designed to help health departments quickly identify outbreaks and control the spread of disease. The figure below illustrates how data are transferred for case reporting (from hospitals, healthcare providers, and laboratories to local, state, regional, or territorial public health) and how data move for case notification (from state or territorial public health departments to CDC). These two steps of information flow make up national case surveillance. While case reporting is mandatory under state reportable disease laws, case notification from state and local health departments to CDC is voluntary and includes deidentified data.

Using the National Notifiable Diseases Surveillance System, health departments voluntarily send COVID-19 case and death data to CDC that do not include personally identifiable information. Because COVID-19 has been designated as a public health emergency of international concern, CDC reports national case surveillance data to the World Health Organization under International Health Regulations (2005). CDC also publishes deidentified COVID-19 national case surveillance data at data.cdc.gov, with additional privacy protections in place for public use.

To obtain timely and detailed data on COVID-19 cases in the United States, CDC uses two data sources. The first data source is an aggregate count based on a robust, multistep process to collect data and confirm the case and death numbers with jurisdictions daily:

  • A CDC data team collects information from jurisdictions’ websites, and a separate CDC data team double-checks the information collected.
  • CDC then shares the data back with the jurisdictions for confirmation or corrections.
  • CDC reconciles any differences and posts the finalized information to a CDC website.

This process is collaborative, with CDC and the jurisdictions working together to ensure the accuracy of the COVID-19 case and death numbers published on CDC’s website. Aggregate counts provide the most up-to-date validated numbers on cases and deaths; CDC may retrospectively update the counts after posting based on any updated information from jurisdictions.

The second data source involves line-level data for each case, which provide additional information about whether the patient died and other details such as age and race and ethnicity. CDC receives the line-level data primarily from state health departments without personal identifiers such as names or home addresses. Because it can be time-consuming for jurisdictions to collect the additional information, these data can lag behind the aggregate counts. Although CDC receives this information for most cases, it does not receive it for all cases.

The COVID-19 pandemic has put unprecedented demands on the public health data supply chain. In many states, the large number of COVID-19 cases has severely strained the ability of hospitals, healthcare providers, and laboratories to report cases with complete demographic information, such as race and ethnicity. The unprecedented volume of cases has also limited the ability of state and local health departments to conduct thorough case investigations and collect all requested case data.

As a result, many COVID-19 case notifications submitted to CDC do not have complete information on patient demographics; signs and symptoms of illness; underlying health conditions; characteristics of hospitalizations such as ventilator use; clinical outcomes; exposures; and factors that may put people at higher risk for severe disease. Because it can be time-consuming for jurisdictions to collect the additional information, these data can lag behind the aggregate counts. Because of missing data, analyses of these data elements are likely an underestimate of the true occurrence.

Most states have demographic factors like age and sex for the majority of reported cases. With thousands of cases being reported, however, completeness of other elements, such as race and ethnicity, is unlikely to improve in the immediate future for some jurisdictions.

Because the racial and ethnic composition of the U.S. population varies by geographic area, comparisons of COVID-19 case information should consider the population of each geographic area. Additionally, because completeness of race and ethnicity information may vary by state or geographic area and other patient factors, such as severity of illness, CDC’s case data may not be generalizable to the entire U.S. population.

Case surveillance provides information on the characteristics of a disease within a population, usually through laboratory confirmation of cases using a standard case definition. CDC uses national case surveillance to:

  • Track the spread of COVID-19 around the country to identify areas of concern and inform state decision makers;
  • Help state and local public health departments better control COVID-19 by evaluating trends in case demographics, exposures, and outcomes to identify those groups most at risk, such as healthcare workers, racial and ethnic minorities, older adults, and people with certain underlying health conditions; and
  • Analyze exposure information and health outcomes among COVID-19 patients to develop guidance for the public, at-risk groups, and healthcare providers.

The COVID-19 pandemic has put unprecedented strain on the public health data supply chain. In many states, the large number of COVID-19 cases has severely strained the ability of hospitals, healthcare providers, and laboratories to report cases with complete demographic information, such as race and ethnicity. The unprecedented volume of cases has also limited the ability of state and local health departments to conduct thorough case investigations and collect all requested case data. As a result, many COVID-19 case notifications submitted to CDC do not have complete information on patient demographics, clinical outcomes, exposures, and factors that may put people at higher risk for severe disease.

National case surveillance data are constantly changing. For instance, as new information is gathered about previously reported cases, health departments provide updated data to CDC. As more information and data become available, analyses might find changes in surveillance data and trends during a previously reported time window.

A key challenge with case reporting is that people who are infected with the virus that causes COVID-19 may have mild or no symptoms. These people might not have sought testing or health care and are, therefore, less likely to be reported as cases. Similarly, cases in people who have had severe outcomes, such as hospitalization, intensive care unit (ICU) admission, and death, are more likely to be reported than cases in people with less severe illnesses. These challenges result in limitations when analyzing and interpreting the data.

CDC continues to work with state, local, and territorial health departments to accelerate reporting of national case surveillance data, improve data quality, and gather complete information about all COVID-19 cases.

CDC is working with healthcare providers, electronic health record developers, laboratories, and state and local health departments to modernize disease surveillance by automating the generation and transmission of case reports from the electronic health record to public health agencies for review and action for the COVID-19 response.

For example, expanded use of electronic case reporting, which makes the submission of information from healthcare providers to public health departments seamless and automated, will reduce the burden of manually reporting COVID-19 cases, increase timeliness of reporting, and improve data completeness by pulling data directly from the medical record.

CDC COVID-19 Surveillance

Public health surveillance is the ongoing, systematic collection, analysis, and interpretation of health-related data essential to planning, implementation, and evaluation of public health practice.

For surveillance of COVID-19 and the virus that causes it, SARS-CoV-2, CDC is using multiple surveillance systems in collaboration with state, local, territorial, academic, and commercial partners to monitor COVID-19 in the United States. COVID-19 surveillance draws from a combination of data sources using existing influenza and viral respiratory disease surveillance, syndromic surveillance, case reporting, lab reporting, health care systems reporting, ongoing research platforms, and new surveillance systems designed to answer specific questions. Combined, the data from these systems create an updated picture of COVID-19’s spread and its effects in the United States and are used to inform the U.S. national public health response to COVID-19.

  • To monitor spread of COVID-19 in the United States
  • To understand disease severity and the spectrum of illness due to COVID-19
  • To understand risk factors for severe disease and transmission of COVID-19
  • To monitor for changes in the virus that causes COVID-19
  • To estimate disease burden due to COVID-19
  • To produce data for forecasting COVID-19’s spread and impact
  • To understand how COVID-19 impacts the capacity of the U.S. healthcare system (for example, availability and shortages of key resources)

COVID-19 data can be used to help public health professionals, policymakers, and healthcare providers monitor the spread of COVID-19 in the United States and support a better understanding of the spectrum of illness, the effectiveness of community intervention, and social disruptions associated with COVID-19 in the United States. These data help inform U.S. national, state, local, tribal, and territorial public health responses to COVID-19.

Detailed and accurate data will allow us to better understand and track the size and scope of the outbreak and strengthen prevention and response efforts.

CDC provides this information on the Cases, Data, & Surveillance webpage.

Understanding the Data

CDC’s COVID-19 counts include confirmed and probable cases and deaths. The case classifications for COVID-19 are described in an updated interim COVID-19 position statement and case definition issued by the Council of State and Territorial Epidemiologists (CSTE) on August 5, 2020. Although this updated case definition includes three case classifications (suspect, probable, and confirmed), CDC case counts exclude suspect cases and deaths.

A previous COVID-19 position statement issued by CSTE on April 5, 2020, included a case definition and made COVID-19 a nationally notifiable disease. A notifiable disease or condition is one for which regular, frequent, and timely information regarding individual cases is considered necessary to prevent and control the disease or condition.

A probable case or death is defined as any one of the following:

The virus that causes COVID-19 spreads very easily and sustainably between people. The more closely people interact with others and the longer that interaction, the higher the risk of COVID-19 spread. Practicing preventive actions such as avoiding close contact, wearing face coverings, washing hands often, and cleaning and disinfecting helps prevent the spread of COVID-19. Differences in community characteristics and changes in preventive behavior can result in increases or decreases in cases over time and across geographic areas.

The COVID-19 death count shown on the Cases and Deaths by State tab on the COVID-19 Data Tracker includes deaths reported daily by state, local, and territorial health departments. This count reflects the most up-to-date information received by CDC based on preliminary reporting from health departments.

In contrast, provisional COVID-19 death counts from the National Center for Health Statistics (NCHS) are updated Monday through Friday with information collected from death certificates. These data represent the most accurate death counts. However, because it can take several weeks for death certificates to be submitted and processed, there is on average a delay of 1–2 weeks before they are reported. Therefore, the provisional death counts may not include all deaths that occurred during a given time period, especially for more recent time periods. Death counts from earlier weeks are continually revised and may increase or decrease as new and updated death certificate data are received. Provisional COVID-19 death counts may therefore differ from those on other published sources, such as media reports or the COVID-19 Data Tracker webpage.

The mortality rate is the number of people who died due to COVID-19 divided by the total number of people in the population. Since this is an ongoing outbreak, the mortality rate can change daily.
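The definition above can be written as a short calculation. The following is a minimal sketch, not CDC's method: the function name, the sample figures, and the scaling to deaths per 100,000 people (a common reporting convention that the text above does not specify) are all assumptions made for illustration.

```python
def mortality_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Deaths due to COVID-19 divided by total population.

    The result is scaled to `per` people (assumed convention: per 100,000).
    Pass per=1 to get the raw proportion described in the text.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if deaths < 0:
        raise ValueError("deaths cannot be negative")
    return deaths * per / population


# Hypothetical figures for illustration only; not real surveillance data.
rate = mortality_rate(deaths=250, population=1_000_000)
print(f"{rate:.1f} deaths per 100,000 people")
```

Because both the death count and the reporting of deaths change during an ongoing outbreak, recomputing the rate with updated inputs will give a different result from day to day.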

Case numbers reported on other websites may differ from what is posted on CDC’s website because CDC’s overall case numbers are validated through a confirmation process with each jurisdiction. The process used for finding and confirming cases displayed by other reporting jurisdictions may differ. Differences between reporting jurisdictions and CDC’s website may occur due to the timing of reporting and website updates.

Case surveillance data are useful for tracking national trends in disease incidence (the number of new cases of a disease in a population at a certain time period). Limitations of using case surveillance data to understand the epidemiology (who, what, where, when, how) of COVID-19 include the following:

First, case surveillance data do not represent the true burden of COVID-19 in the United States. Many people infected with the virus that causes COVID-19 do not seek medical care or get tested. The information collected might be limited if people are unavailable or unwilling to provide additional information or if medical records are unavailable for data extraction.

Second, most of the case reports captured by health departments are based on laboratory reports that usually contain limited information on the patient. Because of the volume of cases, most health departments are unable to conduct investigations of every case to obtain additional information. Because of this, most case reports are missing data on patient demographics, symptoms, underlying health conditions, characteristics of hospitalizations such as ventilator use, and other factors such as recent travel history. Because of missing data, analyses of these data elements are likely an underestimate of the true occurrence.

Third, it is difficult to capture asymptomatic cases through case surveillance. People who are asymptomatic are unlikely to seek testing unless they are identified through active screening (e.g., contact tracing), and investigation of symptomatic people is prioritized.
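The second limitation, that missing data lead to underestimates, can be illustrated with a small calculation. This is a sketch under stated assumptions: the function name, the "Yes"/"No" coding, the use of None for a missing value, and the ten sample records are all hypothetical and do not reflect CDC's actual data format.

```python
from typing import Dict, Optional, Sequence


def summarize_field(values: Sequence[Optional[str]]) -> Dict[str, float]:
    """Summarize one yes/no data element across case reports.

    Each value is "Yes", "No", or None when the element is missing.
    """
    total = len(values)
    if total == 0:
        raise ValueError("no case reports to summarize")
    known = [v for v in values if v is not None]
    yes_count = sum(1 for v in known if v == "Yes")
    return {
        "total_cases": total,
        "known": len(known),
        "reported_yes": yes_count,
        "completeness_pct": len(known) * 100 / total,
    }


# Hypothetical hospitalization status for ten case reports.
hospitalized = ["Yes", None, "No", None, None, "Yes", None, "No", None, None]
summary = summarize_field(hospitalized)
print(summary)
```

In this made-up sample, only 4 of 10 reports record hospitalization status, and 2 of those are "Yes". Some of the 6 reports with missing status may also describe hospitalized patients, so the reported count of 2 is a lower bound on the true number.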

When disease volume is high and a limited number of data elements are captured on each reported case, case surveillance data can be used to assess population burden, track the spread of the disease, monitor increases and decreases in cases in association with mitigation strategies, and study selected demographics such as age, sex, race and ethnicity, and geography. Clinical details and other characteristics about people with COVID-19 can be better assessed through special studies. CDC conducts these special epidemiologic studies to better understand risk factors, such as underlying conditions that might put people at increased risk for serious infection. CDC also conducts special studies using hospitalization and treatment data to better understand the clinical course of COVID-19 illness.

New COVID-19 cases and deaths are recorded based on data collected and reported by state, local, and territorial health departments. This information can be affected by local testing practices, laboratory capacity, and medical resources. Comparisons of the COVID-19 situation among jurisdictions should not be based on rates of new cases and deaths alone.

When studying the COVID-19 situation in these jurisdictions, the rate of new COVID-19 cases should be combined with other data, including the number of tests performed, the proportion of tests that are positive for SARS-CoV-2, testing policies, excess deaths, and hospital and ICU admission rates.
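One of the measures listed above, the proportion of tests that are positive, shows why case numbers alone can mislead. The sketch below is illustrative only: the function name and the two jurisdictions' figures are hypothetical, and it treats each positive test as one case, which is a simplification.

```python
def percent_positive(positive_tests: int, total_tests: int) -> float:
    """Percentage of performed tests that were positive for SARS-CoV-2."""
    if total_tests <= 0:
        raise ValueError("total_tests must be positive")
    if not 0 <= positive_tests <= total_tests:
        raise ValueError("positive_tests must be between 0 and total_tests")
    return positive_tests * 100 / total_tests


# Two hypothetical jurisdictions with the same number of positive tests
# but very different testing volumes.
jurisdiction_a = percent_positive(positive_tests=500, total_tests=10_000)
jurisdiction_b = percent_positive(positive_tests=500, total_tests=2_500)
print(f"A: {jurisdiction_a:.1f}% positive, B: {jurisdiction_b:.1f}% positive")
```

Both hypothetical jurisdictions report 500 positives, but the second performed far fewer tests and has a much higher percent positive, which suggests that more infections there may be going undetected.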

In addition, jurisdictions vary in the completeness of certain demographic data for COVID-19 cases. Most states have demographic factors like age and sex for most reported cases. However, in many states, the large number of COVID-19 cases has severely strained the ability to report cases with complete demographic information for race and ethnicity. With thousands of cases being reported, completeness of these elements is unlikely to improve in the immediate future for some jurisdictions. Because the racial and ethnic composition of the U.S. population varies by geographic area, comparisons of COVID-19 case information should consider the population of each geographic area. In addition, because completeness of race and ethnicity information may vary by state or geographic area and by other patient factors, such as severity of illness, CDC’s case data may not be generalizable to the entire U.S. population.

CDC has worked with state and jurisdictional health departments to improve reporting of critical case surveillance data elements such as age, race and ethnicity, and death. With thousands of cases being reported, the reporting of some data elements remains low, but state and jurisdictional health departments have continued to make improvements in completeness of data collection for COVID-19 through methods such as automated data flows. As the epidemic changes and the number of new cases goes down, CDC and our state, tribal, local, and territorial partners will continue to evaluate the most efficient means to increase the completeness and availability of actionable public health data.

Surveillance Reports

Yes. On April 3, CDC posted the first COVIDView report, which is published weekly. This report, updated each Friday, summarizes and interprets key indicators, including information related to COVID-19 outpatient visits, emergency department visits, hospitalizations, deaths, and laboratory test results.

In these weekly reports, CDC provides data on hospitalization rates and patient demographics as part of its COVID-NET surveillance system.

COVID-19 surveillance data are also used to produce publications, including CDC’s Morbidity and Mortality Weekly Report (MMWR), and to inform guidance documents to protect people from COVID-19 in a variety of settings.

CDC COVID Data Tracker

CDC COVID Data Tracker is a website that allows users to interact with a variety of COVID-19 data that are updated daily. The website builds on other agency efforts—such as CDC’s weekly COVID-19 surveillance report, COVIDView—to capture the impact the virus is having in the United States. CDC COVID Data Tracker presents data using visual dashboards that include interactive maps and graphs.

Tabs on CDC COVID Data Tracker are updated daily unless otherwise specified in the footnote of a given tab. Specifics of data reporting are described in a footnote on each page.

Yes, there are multiple datasets that can be downloaded directly from COVID Data Tracker. To download data from COVID Data Tracker, navigate to the data table in the tab you are viewing and click on the download icon.

To download case and death data over time, including historical data, visit the U.S. and State Trends tab and click the download icon. Alternatively, the data can be downloaded at data.cdc.gov.

To download the most current aggregate case and death data, visit the Cases and Deaths by State tab on COVID Data Tracker and click the download icon.

Line-level data, including patient sex, age group, hospitalization status and race/ethnicity (where available) can be downloaded via the COVID-19 Case Surveillance Public Use Dataset.

You can conduct your own analyses using the available datasets to determine the number and selected characteristics of lab-confirmed cases reported to CDC through a specific date. You can download deidentified CDC case surveillance data, which include fields for initial case report date to CDC, date of first positive specimen collection, case status (lab-confirmed vs. probable), and others.
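As an illustration of such an analysis, the sketch below counts lab-confirmed cases reported through a chosen date. It is a sketch under assumptions, not a description of the dataset's actual schema: the column names (cdc_report_dt, pos_spec_dt, current_status), the status labels, the ISO date format, and the four sample rows are all made up for the example. Check the data dictionary of the downloaded file and adjust the names before running this on real data.

```python
import csv
import io
from datetime import date

# Hypothetical sample in place of a downloaded file; column names, status
# labels, and date format are assumptions made for illustration.
SAMPLE_CSV = """cdc_report_dt,pos_spec_dt,current_status,sex,age_group
2020-04-01,2020-03-30,Laboratory-confirmed case,Female,50 - 59 Years
2020-04-15,,Probable Case,Male,20 - 29 Years
2020-05-02,2020-04-29,Laboratory-confirmed case,Male,70 - 79 Years
2020-06-10,2020-06-08,Laboratory-confirmed case,Female,30 - 39 Years
"""


def count_lab_confirmed(rows, through: date) -> int:
    """Count lab-confirmed cases first reported to CDC on or before `through`."""
    count = 0
    for row in rows:
        reported = date.fromisoformat(row["cdc_report_dt"])
        is_confirmed = row["current_status"] == "Laboratory-confirmed case"
        if is_confirmed and reported <= through:
            count += 1
    return count


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
confirmed_through_may = count_lab_confirmed(rows, through=date(2020, 5, 31))
print(confirmed_through_may)  # 2 for the sample rows above
```

To run this against a downloaded file, replace io.StringIO(SAMPLE_CSV) with an opened file handle. Because line-level data lag behind the aggregate counts, a total computed this way may be lower than the aggregate figure for the same date.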