HCUP National Estimates

Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on producing national and regional estimates. This tutorial was created for researchers who are using HCUP national databases, understand the design of the national databases, and are ready to produce national and regional estimates.

In this tutorial you'll learn how to produce national and regional estimates by weighting the unweighted HCUP data.

About HCUP

Before we get started, a quick word about HCUP:

HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.

If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.

Learning Objectives

There are three learning objectives in this tutorial:

The first objective is to understand how the three national databases (the NIS, or Nationwide Inpatient Sample; the NEDS, or Nationwide Emergency Department Sample; and the KID, or Kids' Inpatient Database) can be weighted to produce national and regional estimates.

The second objective is to select and apply the appropriate discharge or hospital weight in order to generate national estimates at the discharge or hospital level from unweighted record counts.

The third objective is to understand when it is appropriate to use the NIS and NEDS databases as unweighted samples. This module introduces weighting each of the three national HCUP databases.

Weighting HCUP Data

Why do we need to weight HCUP data?

Most researchers working with the HCUP nationwide databases are interested in using the data to create national and regional estimates of hospitals, hospital discharges, or emergency department visits, and will therefore want to weight the data.

The national HCUP samples are designed to facilitate the development of such estimates. The samples are built so that they can be weighted up to national and regional levels.

For an in-depth explanation of the sample designs, you can access the HCUP online course on Sample Design of National Databases. The next sections will cover how each national database can be used to produce national and regional estimates.

Nationwide Inpatient Sample (NIS)

The NIS is a database of hospital inpatient discharges which can be used to create national and regional estimates of hospital utilization, access, costs and quality.

In order to perform such analyses on the NIS data contained in the Core File, you must weight the unweighted observations.

Weighting the data will enable you to produce nationally representative estimates from the sample of hospitals in the HCUP NIS database.

NIS Discharge Weights

The weights you apply to the data depend on the type of estimates you want to produce. NIS data can be weighted to produce discharge-level estimates or hospital-level estimates.

To produce discharge-level estimates, such as estimates of the total number of discharges with a diagnosis of asthma in the US or estimates of the total number of discharges in the US for individuals age 65 and over, you must apply a discharge weight to each record in the Core File.

The discharge weights are calculated for NIS data by first stratifying the NIS hospitals on the same variables that were used for creating the sample. These variables are geographic region, urban/rural location, teaching status, bed size, and ownership. A weight is then calculated for each stratum by dividing the number of universe discharges in that stratum - obtained from American Hospital Association (AHA) data - by the number of NIS discharges in the stratum. Weighted estimates are calculated by uniformly applying stratum weights to the discharges according to the stratum from which the discharge was drawn.

Weights have been assigned to each discharge and are stored in each record in the data element DISCWT. When the discharge weights are applied to the unweighted NIS data, the result is an estimate of the number of discharges for the entire universe. In the case of the NIS, the universe is all inpatient discharges from community hospitals in the US.
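The stratum weight calculation described above can be sketched in a short DATA step. This is only an illustration: the stratum codes, the discharge counts, and the variable names universe_dischgs, sample_dischgs, and discwt_demo are hypothetical and are not taken from AHA or NIS data.

```sas
/* Hypothetical stratum totals, for illustration only (not real AHA or NIS figures) */
data stratum_weights;
    input stratum universe_dischgs sample_dischgs;
    /* Stratum weight = universe discharges divided by sampled discharges */
    discwt_demo = universe_dischgs / sample_dischgs;
    /* Applying the weight to every sampled discharge recovers the universe total */
    weighted_total = sample_dischgs * discwt_demo;
    datalines;
1011 50000 10000
1012 36000 8000
;
run;

proc print data=stratum_weights noobs;
run;
```

In this sketch every discharge in the first stratum would carry a weight of 5 and every discharge in the second a weight of 4.5, which is the sense in which the stratum weights are applied uniformly.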


About the Demonstrations

This tutorial will use SAS® to demonstrate how to weight HCUP data to produce national and regional estimates. In addition to SAS®, there are several other statistical software packages which are capable of producing statistics from the stratified, single-cluster sampling design of the national HCUP databases. STATA® and SPSS® are two commonly used examples. For a more detailed explanation of how to use these software packages to work with the national HCUP databases please refer to the documentation available on HCUP-US, including the Methods Report on Calculating Nationwide Inpatient Sample Variances.

During all demonstrations, this tutorial will refer to CCS diagnosis codes. The Clinical Classifications Software (CCS) uses a categorization scheme that collapses the universe of ICD-9-CM diagnosis codes into over 260 clinically meaningful diagnosis categories. It groups procedure codes in a similar way. The CCS categorization scheme has been applied to the records within the HCUP databases, and the CCS codes are stored in each record.

NIS Unweighted Discharge Record Count

As a means of demonstration in this tutorial, we will tabulate the number of records in the NIS for which asthma is indicated as a principal diagnosis.

1.   First determine which records do and do not have asthma listed as the principal CCS diagnosis. The CCS code for asthma is 128.

2.   Use PROC SURVEYMEANS to generate statistics about the records which do and do not have asthma listed as a principal diagnosis. The SURVEYMEANS procedure accounts for the complex sample design of the NIS.

 

CODE: Count records with CCS=128 (asthma) from 2007 NIS File

 

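/* Unweighted count of records with a principal CCS diagnosis of asthma */
Title1 'Count records with CCS=128 (asthma) from 2007 NIS File';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each record: asthma = 1 if the principal CCS diagnosis is 128, else 0 */
data asthma;
    set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
    if dxccs1 eq 128 then asthma = 1;
    else asthma = 0;
run;

/* No WEIGHT statement, so SUM is the unweighted number of asthma records */
PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ;
    VAR asthma;
    CLUSTER hospid ;
    STRATA NIS_stratum ;
run;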

 

3.   The resulting output will contain a data summary of the number of strata, clusters, and total observations in the data. In this example, the summary confirms a database composed of 60 strata, 1,044 clusters - each cluster representing a single hospital - and 8,043,415 records - the number of records in the 2007 NIS.

4.   The statistics section provides the results of the analysis. The output in this example confirms that the number of records in the NIS with a principal diagnosis of asthma is 81,443. Remember, this is the number of records in the NIS for which asthma is indicated as a principal diagnosis. This is not an estimate of the number of hospital discharges nationwide for asthma.

OUTPUT: Count records with CCS=128 (asthma) from 2007 NIS File

 

Count records with CCS=128 (asthma) from 2007 NIS File

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 1044

Number of Observations 8043415

 

 

Statistics

 

Variable         Mean    Std Error of Mean      Sum        Std Dev
asthma       0.010125             0.000315    81443    2810.775895

 

 

 

A printer-friendly version of all example code and output shown in this tutorial is available.

NIS National Discharge-Level Estimates

Next this tutorial will cover an example of how to weight at the discharge level.

To estimate the number of hospital discharges nationwide with a principal diagnosis of asthma, weight the data by using the WEIGHT keyword and the DISCWT data element in the PROC SURVEYMEANS step.

CODE: Produce national estimate of discharges with CCS=128 (asthma) from

2007 NIS File (weighted)

 

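/* Weighted national estimate of discharges with a principal CCS diagnosis of asthma */
Title1 'Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each record: asthma = 1 if the principal CCS diagnosis is 128, else 0 */
data asthma;
    set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
    if dxccs1 eq 128 then asthma = 1;
    else asthma = 0;
run;

/* WEIGHT discwt turns the record count into a national discharge estimate */
PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ;
    VAR asthma;
    WEIGHT discwt;
    CLUSTER hospid ;
    STRATA NIS_stratum ;
run;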

 

In this example, the result is 402,088 - an estimate of the number of hospital discharges, nationwide, with a principal diagnosis of asthma in 2007.

OUTPUT: Produce national estimate of discharges with CCS=128 (asthma) from

2007 NIS File (weighted)

 

Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 1044

Number of Observations 8043415

Sum of Weights 39541948

 

 

Statistics

 

Variable         Mean    Std Error of Mean       Sum    Std Dev
asthma       0.010169             0.000321    402088      13985
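As a point estimate, the weighted Sum is the total of DISCWT across the records flagged as asthma. The sketch below adds up those weights directly; it assumes the asthma data set created by the DATA step above is still available in the WORK library. It does not account for the sample design, so it is a way to see what the weight does rather than a substitute for PROC SURVEYMEANS, which is still needed for standard errors.

```sas
/* Assumes the ASTHMA data set from the DATA step above is still in WORK */
proc means data=asthma sum maxdec=0;
    where asthma = 1;   /* keep only records with a principal diagnosis of asthma */
    var discwt;         /* the sum of the weights is the weighted record count */
run;
```

The total should agree with the weighted Sum reported by PROC SURVEYMEANS, apart from rounding.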

To verify that you have weighted the data correctly, perform a simple query on HCUPnet, the online system which provides quick access to national and regional estimates of HCUP data.

1.   Go to HCUPnet and select "National Statistics on All Stays."

2.   Describe yourself as "Researcher, medical professional."

3.   In this example, you are running a query on a particular diagnosis so you should select "Statistics on specific diagnoses or procedures."

4.   Select 2007 as the data year.

5.   You are using CCS codes to identify asthma patients, so select "Diagnoses grouped by CCS" and then "Principal diagnosis."

6.   Highlight CCS code 128 for asthma and select "Next."

7.   Select "Number of discharges."

8.   Select "All patients in all hospitals."

The HCUPnet results and the SAS® results should be the same: in this case, 402,088 discharges with a principal diagnosis of asthma. Because the NIS approximates a 20 percent sample of US community hospitals, another rough check on the plausibility of your weighted estimate is to multiply the number of unweighted discharges by 5. The two numbers should be close, though not identical.
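The multiply-by-5 check can be written out with the two counts from the demonstrations above. This is a minimal sketch; the variable names are arbitrary, and the results are written to the SAS log.

```sas
data _null_;
    unweighted = 81443;    /* unweighted asthma records in the 2007 NIS */
    weighted   = 402088;   /* weighted national estimate from PROC SURVEYMEANS */
    rough      = unweighted * 5;          /* 81,443 x 5 = 407,215 */
    avg_weight = weighted / unweighted;   /* average discharge weight, about 4.94 */
    put rough= comma12. avg_weight= 6.3;
run;
```

The rough figure of 407,215 is within about 1.3 percent of the weighted estimate, which is the kind of agreement this check is meant to confirm.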

NIS Regional Discharge-Level Estimates

You might want to also produce regional estimates of hospital discharges with a diagnosis of asthma once you have weighted the data. If so, one method for producing these estimates is to create a variable for region by using information contained in the NIS_STRATUM data element. Then, you can use the DOMAIN keyword to indicate that you want to produce estimates of asthma discharges by region. The resulting output will contain a separate line item estimate for each region.

CODE: Produce regional estimates of discharges with CCS=128 (asthma)

from 2007 NIS File (weighted)

 

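/* Weighted regional estimates of discharges with a principal CCS diagnosis of asthma */
Title1 'Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ;

data asthma;
    set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
    retain dischgs 1;   /* constant 1 on every record, so its weighted sum counts discharges */
    region = substr(left(put(nis_stratum,8.)),1,1);   /* region is the first digit of NIS_STRATUM */
    if dxccs1 eq 128 then asthma = 1;
    else asthma = 0;
run;

/* DOMAIN requests a separate estimate for each region-by-asthma combination */
PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR ;
    VAR dischgs;
    WEIGHT discwt ;
    CLUSTER hospid ;
    STRATA NIS_stratum ;
    DOMAIN region * asthma ;
run;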


 

 

OUTPUT: Produce regional estimates of discharges with CCS=128 (asthma)

from 2007 NIS File (weighted)

 

Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 1044

Number of Observations 8043415

Sum of Weights 39541948

 

 

Statistics

 

Variable         Mean    Std Error of Mean         Sum    Std Dev
dischgs      1.000000                    0    39541948     799355

 

 

Domain Analysis: region*asthma

 

region  asthma  Variable       Mean    Std Error of Mean         Sum        Std Dev
1       0       dischgs    1.000000                    0     7660700         335678
1       1       dischgs    1.000000                    0       92596    8089.096894
2       0       dischgs    1.000000                    0     9038455         322029
2       1       dischgs    1.000000                    0       91657    5868.575661
3       0       dischgs    1.000000                    0    15112513         589289
3       1       dischgs    1.000000                    0      160784    9133.832891
4       0       dischgs    1.000000                    0     7328192         256261
4       1       dischgs    1.000000                    0       57051    3505.469814

 

 

 

Check your results using HCUPnet. The first part of the query on HCUPnet will be the same as that you performed for the national estimate. In terms of patient and hospital characteristics, this time you want to see the discharges by region, so you should select "Region of the US."

The HCUPnet results and the SAS® results should be the same.
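One further consistency check is that the four regional asthma estimates add up to the national estimate. The sketch below hard-codes the values from the output shown in this tutorial and writes the comparison to the SAS log; the variable names are arbitrary.

```sas
data _null_;
    /* Weighted asthma discharges for regions 1 to 4, from the Domain Analysis output */
    regional_total = sum(92596, 91657, 160784, 57051);
    national_total = 402088;   /* weighted national estimate */
    difference = regional_total - national_total;
    put regional_total= national_total= difference=;
run;
```

With these values the regional figures total 402,088 and the difference is zero. In other analyses a difference of a unit or so can arise from rounding in the printed output.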

NIS Discharge Weights over Time

NIS data are available annually going back to 1988. The NIS discharge weight variable has changed over time.

Years            Variable Name   Use
2001 and later   DISCWT          All national estimates
2000             DISCWT          National estimates except those including total charge
2000             DISCWTcharge    National estimates of total charge
1998-1999        DISCWT          All national estimates
1988-1997        DISCWT_U        All national estimates

NIS Hospital Weights

To produce hospital-level estimates you must apply hospital weights to the data.

The hospital weights are calculated according to the NIS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year. Since twenty percent of the AHA universe hospitals in each stratum are sampled when possible, the hospital weights are usually near five.

The hospital weights are represented by the data element HOSPWT and are stored in each hospital record in the Hospital File. When the hospital weights are applied to the unweighted NIS hospital observations, the result is the number of hospitals for the entire universe - in the case of the NIS, the universe is all US community hospitals.


NIS Unweighted Hospital Record Count

As a first step toward estimating the number of teaching hospitals nationwide, use the HOSP_TEACH variable to tabulate the number of hospital records in the NIS that are classified as teaching hospitals.

CODE: Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File

 

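/* Unweighted count of NIS hospital records classified as teaching hospitals */
Title1 'Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each hospital: teach = 1 if HOSP_TEACH is 1, else 0 */
data TEACH1;
    set NIS2007.nis_2007_hospital (keep=HOSPID DISCWT NIS_STRATUM HOSP_TEACH);
    if hosp_teach = 1 then teach = 1;
    else teach = 0;
run;

/* No WEIGHT statement, so SUM is the unweighted number of teaching hospitals */
PROC SURVEYMEANS DATA=TEACH1 SUM STD MEAN STDERR ;
    VAR teach;
    CLUSTER hospid ;
    STRATA NIS_stratum ;
run;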

 

The result is 191. This is the number of hospital records in the NIS which are classified as teaching hospitals. This is not a national estimate of teaching hospitals because you did not use a weighting variable.

OUTPUT: Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File

 

Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File

 

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 1044

Number of Observations 1044

 

 

Statistics

 

Variable        Mean    Std Error of Mean           Sum     Std Dev
teach       0.182950             0.003682    191.000000    3.844516

NIS National Hospital-Level Estimate

To estimate the number of teaching hospitals nationwide, weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.

CODE: Produce national estimate of hospitals with HOSP_TEACH=1 from

2007 NIS HOSPITAL File (weighted)

 

/* Weighted national estimate of teaching hospitals */
Title1 'Produce national estimate of hospitals with HOSP_TEACH =1 from 2007 NIS HOSPITAL File (weighted)';
libname nis2007 "C:\NIS 2007\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each hospital: teach = 1 if HOSP_TEACH is 1, else 0 */
data TEACH1;
    set NIS2007.nis_2007_hospital (keep=HOSPID HOSPWT NIS_STRATUM HOSP_TEACH);
    if hosp_teach = 1 then teach = 1;
    else teach = 0;
run;

/* WEIGHT hospwt turns the hospital count into a national estimate */
PROC SURVEYMEANS DATA=TEACH1 SUM STD MEAN STDERR ;
    VAR teach;
    WEIGHT hospwt;
    CLUSTER hospid ;
    STRATA NIS_stratum ;
run;

 

The result is 927 after rounding. This is an estimate of the number of teaching hospitals nationwide.

OUTPUT: Produce national estimate of hospitals with HOSP_TEACH=1 from

2007 NIS HOSPITAL File (weighted)

 

Produce national estimate of hospitals with HOSP_TEACH =1 from 2007 NIS HOSPITAL File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 1044

Number of Observations 1044

Sum of Weights 5099

 

 

Statistics

 

Variable        Mean    Std Error of Mean           Sum      Std Dev
teach       0.181774             0.003672    926.863668    18.722859

 

To verify that you have weighted the data correctly, check the results on HCUPnet, which provides hospital-level statistics for more recent years of the NIS.

1.   Because you are looking at hospital-level data, select "Statistics on U.S. Hospitals."

2.   Select 2007 as the data year.

3.   Then create your own hospital group. In this example, select "All bed sizes," "All locations," and "All hospital ownership/control types."

4.   You are only interested in teaching hospitals, so under teaching status, select "Teaching."

5.   At this point, you are interested in a national estimate, so select "Entire U.S." and select "Next."

There are 927 teaching hospitals nationwide according to HCUPnet as well as the SAS® program.
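A second quick check is to total the hospital weights themselves. The sketch below reuses the library and file names from the programs above and should reproduce the Sum of Weights shown in the Data Summary (5099 for 1,044 hospitals), giving an average weight a little under five.

```sas
libname nis2007 "C:\NIS 2007\";

/* N = number of sampled hospitals, SUM = universe hospitals they represent,
   MEAN = average hospital weight */
proc means data=nis2007.nis_2007_hospital n sum mean maxdec=2;
    var hospwt;
run;
```

An average near five is what you would expect when roughly twenty percent of universe hospitals in each stratum are sampled.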

NIS Hospital Weights over Time

Note that the variable for hospital weights depends on the data year. For the 1998 NIS and later years, HOSPWT should be used to create nationwide estimates. For NIS databases prior to 1998, the variable HOSPWT_U should be used.

NIS Unweighted Analysis

Depending on the nature of your research, you may be interested in using the NIS as an unweighted sample.

An analysis of the association between hospital-level results from a survey on safe hospital practices and hospital-level inpatient risk-adjusted mortality scores is one example of research for which it would be appropriate to use the NIS as an unweighted sample.

In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.

NIS Summary

Let's review what you need to do to create national or regional estimates using NIS data.

Weight your data with discharge weights (DISCWT) for discharge-level estimates.

Weight your data with hospital weights (HOSPWT) for hospital-level estimates.

Check the weighted data totals by performing a quick query on HCUPnet.

In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NIS as an unweighted sample, do not apply weights to your data.

If you are interested in a more detailed statistical explanation of weighting, refer to the documentation available on the HCUP-US website, particularly the Methods Report on Calculating Nationwide Inpatient Sample Variances.

Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.

Nationwide Emergency Department Sample (NEDS)

The NEDS is a database of emergency department visits - visits for which the patient was treated and released as well as visits that resulted in a hospital admission - going back annually to 2006.

The NEDS can be used to produce national and regional estimates of emergency department care, utilization, access, costs and quality.

In order to produce national or regional estimates from the NEDS data contained in the Core File, you must weight the unweighted observations.

NEDS Discharge Weights

The weights you apply will depend on the type of analysis you are performing. Like the NIS data, NEDS data can be weighted to produce discharge-level estimates or hospital-level estimates.

To produce discharge-level estimates, such as estimating the number of emergency department visits for influenza in the US or estimating the number of emergency department visits for hip fractures among the elderly in the US, you must apply a discharge weight to each record in the Core File.

The discharge weights are calculated for NEDS data by first stratifying the NEDS hospitals on the same variables that were used for creating the sample. These variables are geographic region, trauma center designation, urban/rural location, teaching status, and ownership. A weight is then calculated for each stratum by dividing the number of universe discharges in that stratum - obtained from AHA data - by the number of NEDS discharges in the stratum. Weighted estimates are calculated by uniformly applying the stratum weights to the discharges according to the stratum from which the discharge was drawn.

Weights have been assigned to each record. In each record in the NEDS Core File, the weight is stored in the data element DISCWT. When the discharge weights are applied to the unweighted NEDS observations, the result is an estimate of the number of discharges for the entire universe. In the case of the NEDS, the universe is emergency department visits nationwide.


NEDS Unweighted Discharge Record Count

This tutorial will now demonstrate weighting the data using SAS®.

1.   As with the NIS data, begin by running a simple program to see how many records there are in the NEDS for which influenza is indicated as a first-listed diagnosis.

2.   Note that influenza is CCS code 123, so look for records in which DXCCS1 equals 123.

CODE: Count records with CCS=123 (influenza) from 2006 NEDS File

 

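/* Unweighted count of records with a first-listed CCS diagnosis of influenza */
Title1 'Count records with CCS=123 (influenza) from 2006 NEDS File';
libname neds2006 "C:\NEDS 2006\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each record: influenza = 1 if the first-listed CCS diagnosis is 123, else 0 */
data influenza1;
    set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
    if dxccs1 eq 123 then influenza = 1;
    else influenza = 0;
run;

/* No WEIGHT statement, so SUM is the unweighted number of influenza records */
PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR ;
    VAR influenza;
    CLUSTER hosp_ed ;
    STRATA NEDS_stratum ;
run;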

 

3.   In the resulting output, the data summary provides the number of strata, clusters, and total observations in the data. The summary in this example confirms a database that contains 71 strata, 958 clusters - each cluster representing a single hospital emergency department - and 25,954,816 records - the number of records in the 2006 NEDS.

4.   The statistics section provides the results of the particular analysis. The result is 46,185. This is the number of records in the NEDS for which influenza is indicated as a first-listed diagnosis. This is not an estimate of the number of emergency department visits nationwide for influenza.



OUTPUT: Count records with CCS=123 (influenza) from 2006 NEDS File

 

Count records with CCS=123 (influenza) from 2006 NEDS File

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 71

Number of Clusters 958

Number of Observations 25954816

 

 

Statistics

 

Variable         Mean    Std Error of Mean      Sum        Std Dev
influenza    0.001779          0.000065779    46185    1852.887261

 

NEDS National Discharge-Level Estimates

To estimate the number of nationwide emergency department visits with a first-listed diagnosis of influenza, weight the data using the WEIGHT keyword and the DISCWT data element in the PROC SURVEYMEANS step.

CODE: Produce national estimate of discharges with CCS=123 (influenza)

from 2006 NEDS File (weighted)

 

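/* Weighted national estimate of ED visits with a first-listed CCS diagnosis of influenza */
Title1 'Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)';
libname neds2006 "C:\NEDS 2006\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each record: influenza = 1 if the first-listed CCS diagnosis is 123, else 0 */
data influenza1;
    set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
    if dxccs1 eq 123 then influenza = 1;
    else influenza = 0;
run;

/* WEIGHT discwt turns the record count into a national estimate */
PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR ;
    VAR influenza;
    WEIGHT discwt;
    CLUSTER hosp_ed ;
    STRATA NEDS_stratum ;
run;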

 

 

The result is 211,740 - an estimate of the number of nationwide emergency department visits, both those in which the patient was treated and released and those that resulted in a hospital admission, with a first-listed diagnosis of influenza in 2006.

OUTPUT: Produce national estimate of discharges with CCS=123 (influenza)

from 2006 NEDS File (weighted)

 

Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 71

Number of Clusters 958

Number of Observations 25954816

Sum of Weights 120033750

 

 

Statistics

 

Variable         Mean    Std Error of Mean       Sum        Std Dev
influenza    0.001764          0.000070788    211740    8898.259246

 


 

To verify that you have weighted the data correctly, perform a query on HCUPnet.

1.   Go to HCUPnet and select "National Statistics on All ED Visits."

2.   Select "All ED Visits."

3.   You are running a query on a particular first-listed diagnosis so select the first option, "Statistics on specific diagnoses."

4.   Select 2006 as the data year.

5.   Use CCS codes to identify influenza patients. Select "Diagnoses grouped by Clinical Classifications Software (CCS)" and then "First-listed diagnosis."

6.   Highlight CCS code 123 for influenza and select "Next."

7.   Select "Number of discharges."

8.   Select "All patients in all hospitals."

The HCUPnet results and the SAS® results should be the same.

NEDS Regional Discharge-Level Estimates

Once you have weighted the data, you might want to also produce regional estimates of emergency department visits for influenza. If so, one method for producing these estimates is to create a variable for region by using information contained in the NEDS_STRATUM data element. Then, use the DOMAIN keyword to indicate that you want to produce estimates of influenza discharges by region.

CODE: Produce regional estimates of discharges with CCS=123 (influenza)

from 2006 NEDS File (weighted)

 

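/* Weighted regional estimates of ED visits with a first-listed CCS diagnosis of influenza */
Title1 'Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)';
libname neds2006 "C:\NEDS 2006\";
options obs = MAX PageSize=51 LineSize=146 ;

data influenza1;
    set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
    retain edrecs 1;   /* constant 1 on every record, so its weighted sum counts ED visits */
    region = substr(left(put(neds_stratum,8.)),1,1);   /* region is the first digit of NEDS_STRATUM */
    if dxccs1 eq 123 then influenza = 1;
    else influenza = 0;
run;

/* DOMAIN requests a separate estimate for each region-by-influenza combination */
PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR ;
    VAR edrecs;
    WEIGHT discwt ;
    CLUSTER hosp_ed ;
    STRATA NEDS_stratum ;
    DOMAIN region * influenza ;
run;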

 

 

OUTPUT: Produce regional estimates of discharges with CCS=123 (influenza)

from 2006 NEDS File (weighted)

 

Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 71

Number of Clusters 958

Number of Observations 25954816

Sum of Weights 120033750

 

 

Statistics

 

Variable        Mean    Std Error of Mean          Sum    Std Dev
edrecs      1.000000                    0    120033750    2477212

 

 

Domain Analysis: region*influenza

 

region  influenza  Variable      Mean    Std Error of Mean         Sum        Std Dev
1       0          edrecs    1.000000                    0    23520030        1120105
1       1          edrecs    1.000000                    0       27885    2620.732277
2       0          edrecs    1.000000                    0    27789334        1153925
2       1          edrecs    1.000000                    0       47477    3196.225129
3       0          edrecs    1.000000                    0    46771539        1656723
3       1          edrecs    1.000000                    0      118487    7742.256091
4       0          edrecs    1.000000                    0    21741106         889365
4       1          edrecs    1.000000                    0       17892    1467.104457

 

 

 


 

Check your results by running a quick query on HCUPnet.

1.   The first part of the query on HCUPnet will be the same as that you performed for the national estimate of emergency department visits with a first-listed diagnosis of influenza.

2.   Select "Number of discharges."

3.   Select "All patients in all hospitals."

4.   To see the discharges by region this time, select "Region of the U.S." under patient and hospital characteristics.

The HCUPnet results and the SAS® results should be the same.

NEDS Hospital Weights

To produce hospital-level estimates, such as the number of US emergency departments with a trauma center designation, you will apply a hospital weight, HOSPWT, to the data.

HOSPWT is also calculated according to the NEDS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year.

To demonstrate how to weight at the hospital level, let's consider an analysis which requires us to estimate the number of emergency departments in the US with a trauma center.

NEDS Unweighted Hospital ED Record Count

First, tabulate the number of emergency department records in the NEDS which are classified as having a trauma center.

CODE: Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9)

from 2006 NEDS HOSPITAL File

 

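/* Unweighted count of NEDS hospital records with a trauma center */
Title1 'Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File';
libname neds2006 "C:\NEDS 2006\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each emergency department: trauma = 1 if HOSP_TRAUMA is 1, 2, 3, 8, or 9, else 0 */
data TRAUMA1;
    set NEDS2006.neds_2006_hospital (keep=HOSP_ED DISCWT NEDS_STRATUM HOSP_TRAUMA);
    if hosp_trauma in (1,2,3,8,9) then trauma = 1;
    else trauma = 0;
run;

/* No WEIGHT statement, so SUM is the unweighted number of trauma-center records */
PROC SURVEYMEANS DATA=TRAUMA1 SUM STD MEAN STDERR ;
    VAR trauma;
    CLUSTER hosp_ed ;
    STRATA NEDS_stratum ;
run;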

 

 

The result is 131. This is the number of emergency department records in the NEDS with a trauma center.

OUTPUT: Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9)

from 2006 NEDS HOSPITAL File

 

Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 71

Number of Clusters 958

Number of Observations 958

 

 

Statistics

 

Variable        Mean    Std Error of Mean           Sum    Std Dev
trauma      0.136743                    0    131.000000          0

 

NEDS National Hospital ED-Level Estimate

To estimate the number of emergency departments with a trauma center nationwide, weight the data using the WEIGHT keyword and the HOSPWT data element in the PROC SURVEYMEANS step.

CODE: Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9)

from 2006 NEDS HOSPITAL File (weighted)

/* Weighted national estimate of emergency departments with a trauma center */
Title1 'Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)';
libname neds2006 "C:\NEDS 2006\";
options obs = MAX PageSize=51 LineSize=146 ;

/* Flag each emergency department: trauma = 1 if HOSP_TRAUMA is 1, 2, 3, 8, or 9, else 0 */
data TRAUMA1;
    set NEDS2006.neds_2006_hospital (keep=HOSP_ED HOSPWT NEDS_STRATUM HOSP_TRAUMA);
    if hosp_trauma in (1,2,3,8,9) then trauma = 1;
    else trauma = 0;
run;

/* WEIGHT hospwt turns the record count into a national estimate */
PROC SURVEYMEANS DATA=TRAUMA1 SUM STD MEAN STDERR ;
    VAR trauma;
    WEIGHT hospwt;
    CLUSTER hosp_ed ;
    STRATA NEDS_stratum ;
run;

 

 

The result is 697. This is an estimate of the number of emergency departments with a trauma center nationwide.

OUTPUT: Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9)

from 2006 NEDS HOSPITAL File (weighted)

 

Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 71

Number of Clusters 958

Number of Observations 958

Sum of Weights 4845

 

 

Statistics

 

Variable        Mean    Std Error of Mean           Sum    Std Dev
trauma      0.143860                    0    697.000000          0

 

NEDS Unweighted Analysis

Depending on the nature of your research, you may be interested in using the NEDS as an unweighted sample.

One example of an analysis in which it would be appropriate to use the NEDS as an unweighted sample is a hospital-level study in which NEDS emergency department-level data are linked to data on pre-hospital care, such as that provided by emergency medical services.

In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.

NEDS Summary

Let's review what you need to do to create national or regional estimates using NEDS data.

Weight your data with discharge weights (DISCWT) for discharge-level estimates.

Weight your data with hospital weights (HOSPWT) for hospital-level estimates.

Check the weighted data totals by performing a quick query on HCUPnet.

In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NEDS as an unweighted sample, do not apply weights to your data.

Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.

Kids' Inpatient Database (KID)

The third national database is the KID - the database designed specifically for the study of pediatric conditions that require hospitalization.

The KID is produced every three years starting with the 1997 data year.

In order to produce national or regional estimates using KID data, you must weight the unweighted observations in the KID file.

KID Discharge Weights

KID data must be weighted to perform discharge-level analyses. Because of the sample design of the KID, it cannot be used as an unweighted database. For more information on the unique sample design of the KID, see the Sample Design tutorial.

Discharge weights are calculated for KID data by stratifying the hospitals on the same variables that were used for creating the sample and then creating weights by stratum. The stratifying variables are geographic region, urban/rural location, teaching status, bed size, ownership, and children's hospital.

Remember that the KID was designed for the study of pediatric hospitalizations, and that the discharges in the KID are a combination of newborn discharges (including both complicated and uncomplicated) and non-newborn pediatric discharges. Because of this, for each stratum, weights are created for both newborn discharges and non-newborn pediatric discharges.

The weights for newborn discharges (both complicated and uncomplicated) are created by dividing the number of universe newborns in the stratum by the number of KID newborns in the stratum.

The weights for non-newborn discharges are created by dividing the number of universe non-newborn pediatric discharges in the stratum by the number of KID non-newborn discharges in the stratum.

Weighted estimates are generated by applying the stratum weights to the discharges according to the stratum from which the discharge was drawn and according to whether the discharge is newborn or non-newborn.
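The weight calculation described above can be written compactly. The notation below is introduced only for illustration and is not taken from the HCUP documentation: s is the stratum, t is the discharge type (newborn or non-newborn), N is the number of universe discharges, n is the number of KID discharges, and y is the quantity being totaled for record i.

```latex
% Stratum-by-type discharge weight
w_{s,t} = \frac{N_{s,t}}{n_{s,t}}, \qquad t \in \{\text{newborn}, \text{non-newborn}\}

% Weighted national estimate: each KID record i carries the weight of
% its own stratum s(i) and discharge type t(i)
\hat{Y} = \sum_{i \in \text{KID}} w_{s(i),\,t(i)} \; y_i
```

For a count of discharges with a given diagnosis, y is 1 for records with that diagnosis and 0 otherwise, so the estimate is the sum of the weights of the matching records.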

KID Unweighted Discharge Record Count

Next, this tutorial will demonstrate how to weight the KID to produce national and regional estimates of pediatric hospital discharges with a principal diagnosis of cystic fibrosis - a discharge-level estimate. Remember that pediatric discharges are defined as those for which the patient was age 20 or less at admission.

1.   Tabulate the number of records in the KID for which cystic fibrosis is indicated as a principal diagnosis, that is, records in which DXCCS1 equals 56.

CODE: Count records with CCS=56 (cystic fibrosis) from 2006 KID File

 

Title1 'Count records with CCS=56 (cystic fibrosis) from 2006 KID File';

libname kid2006 "C:\KID 2006\";

options obs = MAX PageSize=51 LineSize=146 ;

 

data cf1;

set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_stratum);

if dxccs1 eq 56 then cysticf = 1;

else cysticf = 0;

run;

 

PROC SURVEYMEANS DATA=cf1 SUM STD MEAN STDERR ;

VAR cysticf;

CLUSTER hospid ;

STRATA KID_stratum ;

run;

 

2.   The resulting output provides a data summary of the number of strata, clusters, and total observations in the data. The summary shown here confirms that the database contains 60 strata, 3,739 clusters (each cluster representing a single hospital), and 3,131,324 records, the number of records in the 2006 KID.

3.   The statistics section provides the results of the particular analysis. The result is 4,063. This is the number of records in the KID for which cystic fibrosis is indicated as a principal diagnosis. This is not an estimate of pediatric discharges nationwide for cystic fibrosis.

OUTPUT: Count records with CCS=56 (cystic fibrosis) from 2006 KID File

 

Count records with CCS=56 (cystic fibrosis) from 2006 KID File

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 3739

Number of Observations 3131324

 

 

Statistics

 

Variable    Mean        Std Error of Mean    Sum            Std Dev
cysticf     0.001298    0.000103             4063.000000    346.994786

 


 

KID National Discharge-Level Estimates

Weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.

CODE: Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

 

Title1 'Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)';

libname kid2006 "C:\KID 2006\";

options obs = MAX PageSize=51 LineSize=146 ;

 

data cf1;

set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_STRATUM);

if dxccs1 eq 56 then cysticf = 1;

else cysticf = 0;

run;

 

PROC SURVEYMEANS DATA=cf1 SUM STD MEAN STDERR ;

VAR cysticf;

WEIGHT discwt;

CLUSTER hospid ;

STRATA KID_stratum ;

run;

 

 

The result is 6,947. This is an estimate of the number of pediatric hospital discharges, nationwide, with a principal diagnosis of cystic fibrosis in 2006.

OUTPUT: Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

 

Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 3739

Number of Observations 3131324

Sum of Weights 7558812.48

 

 

Statistics

 

Variable    Mean        Std Error of Mean    Sum            Std Dev
cysticf     0.000919    0.000075764          6946.648756    601.770739
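In the output above, the Std Dev reported next to Sum is the estimated standard error of the weighted total. A rough 95 percent interval, assuming a normal approximation, is therefore 6,947 plus or minus 1.96 x 602, or about 5,770 to 8,130 discharges. The following is a minimal sketch of how the same limits could be requested from SAS directly; it reuses the data set and variable names from the tutorial program, and the only additions are the CLSUM keyword and the ALPHA= option.

```sas
/* Sketch: request confidence limits for the weighted total.        */
/* Library path and names follow the tutorial's 2006 KID example.   */
libname kid2006 "C:\KID 2006\";

data cf1;
  set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_STRATUM);
  cysticf = (dxccs1 eq 56);   /* 1 if principal diagnosis CCS is 56, else 0 */
run;

/* CLSUM adds two-sided confidence limits for the Sum;              */
/* ALPHA=0.05 asks for 95 percent limits.                           */
proc surveymeans data=cf1 sum std clsum alpha=0.05;
  var cysticf;
  weight discwt;
  cluster hospid;
  strata KID_stratum;
run;
```

SAS bases its limits on a t distribution rather than the normal value of 1.96, so the printed limits may differ slightly from the hand calculation.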


 

Note that HCUPnet contains a path for querying National Statistics on Children. However, this path provides statistics only on discharges in which the patient was age 17 or less. The KID file from the Central Distributor provides data on discharges where patients are age 20 or less. As a result, the weighted estimates produced will not exactly match those provided by HCUPnet unless you limit the data set to those age 17 or less when creating your estimate of discharges.

That said, use HCUPnet to get a ballpark idea of what the estimate should be.

1.   Select "National Statistics on Children."

2.   Select "Researcher, medical professional."

3.   Select "Statistics on specific diagnoses or procedures."

4.   Select 2006 as the data year.

5.   Use CCS codes to identify cystic fibrosis patients, so select "Diagnoses grouped by Clinical Classifications Software (CCS)" and then "Principal diagnosis."

6.   Highlight CCS code 56 for cystic fibrosis and select "Next."

7.   Select "Number of discharges."

8.   Select "All patients in all hospitals."

The HCUPnet total will be smaller than the total produced by SAS®, but in the same overall range.
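To bring the SAS estimate closer to the HCUPnet figure, the weighted total can be restricted to patients age 17 or less. The sketch below is one way to do this, not the tutorial's own program. It assumes the Core File carries an AGE data element holding age in years, and the flag name age0_17 is made up for illustration.

```sas
/* Sketch: weighted cystic fibrosis discharges for ages 0-17 only.  */
/* AGE is assumed to be on the Core File; age0_17 is a made-up name. */
libname kid2006 "C:\KID 2006\";

data cf_age;
  set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_STRATUM AGE);
  if dxccs1 eq 56 then cysticf = 1;
  else cysticf = 0;
  /* Missing values compare low in SAS, so exclude them explicitly */
  if not missing(age) and age le 17 then age0_17 = 1;
  else age0_17 = 0;
run;

proc surveymeans data=cf_age sum std;
  var cysticf;
  weight discwt;
  cluster hospid;
  strata KID_stratum;
  domain age0_17;   /* read the row where age0_17 = 1 */
run;
```

The DOMAIN keyword is used here instead of deleting the older patients so that every hospital and stratum stays in the variance calculation.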

KID Regional Discharge-Level Estimates

Once you have weighted the data, you might also want to produce regional estimates of pediatric hospital discharges for cystic fibrosis. Because the design of the KID is different from that of the NIS and NEDS, you need to use a different method for creating regional estimates than the one used above. For the KID, you have to merge the Hospital File with the Core File in order to pick up the hospital region data element. Then, you use the DOMAIN keyword to indicate that you want to produce estimates of cystic fibrosis discharges by region. Note that the same method can be used to produce regional estimates from the NIS and NEDS.

CODE: Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

 

Title1 'Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)';

libname kid2006 "C:\KID 2006\";

options obs = MAX PageSize=51 LineSize=146 ;

 

data cf1;

set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1);

retain dischgs 1;

if dxccs1 eq 56 then cysticf = 1;

else cysticf = 0;

run;

 

proc sort data=cf1;

by hospid;

run;

 

proc sort data=KID2006.kid_2006_hospital (keep=HOSPID KID_STRATUM Hosp_region) out=hosp;

by hospid;

run;

 

data cf2;

merge cf1 (in=a)

hosp (in=b);

by hospid;

if a and b;

region = Hosp_region ;

run;

 

PROC SURVEYMEANS DATA=cf2 SUM STD MEAN STDERR ;

VAR dischgs;

WEIGHT discwt ;

CLUSTER hospid ;

STRATA KID_stratum ;

DOMAIN cysticf * region ;

run;

 

 

 

 

OUTPUT: Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

 

Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)

 

The SURVEYMEANS Procedure

 

Data Summary

 

Number of Strata 60

Number of Clusters 3739

Number of Observations 3131324

Sum of Weights 7558812.48

 

 

Statistics

 

Variable    Mean        Std Error of Mean    Sum        Std Dev
dischgs     1.000000    0                    7558812    123453

 

 

Domain Analysis: cysticf*region

 

cysticf    region    Variable    Mean        Std Error of Mean    Sum            Std Dev
0          1         dischgs     1.000000    0                    1276711        56801
           2         dischgs     1.000000    0                    1645709        59677
           3         dischgs     1.000000    0                    2894738        92104
           4         dischgs     1.000000    0                    1734707        67196
1          1         dischgs     1.000000    0                    1302.943850    329.418971
           2         dischgs     1.000000    0                    2002.856118    397.992297
           3         dischgs     1.000000    0                    2222.916758    365.441907
           4         dischgs     1.000000    0                    1417.932030    304.223111

 

 

KID Weights over Time

The KID discharge weight variable has changed over time.

Years             Variable Name    Use
2003 and later    DISCWT           All national estimates
2000              DISCWT           National estimates except those including total charge
2000              DISCWTcharge     National estimates of total charge
1997              DISCWT_U         All national estimates
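If you run the same program against several KID data years, the choice of weight can be wrapped in a small macro. The sketch below is a hypothetical helper, not part of HCUP. The macro name kid_weight and its totchg parameter are made up for illustration, and the logic simply restates the table above.

```sas
/* Hypothetical helper (not part of HCUP): returns the name of the  */
/* KID discharge weight for a data year, following the table above. */
/* Set totchg=1 when the estimate involves total charge.            */
%macro kid_weight(year, totchg=0);
  %if &year = 1997 %then DISCWT_U;
  %else %if &year = 2000 and &totchg = 1 %then DISCWTcharge;
  %else DISCWT;
%mend kid_weight;

/* Write the selected weight names to the SAS log */
%put Weight for 2006: %kid_weight(2006);
%put Weight for 2000, total charge: %kid_weight(2000, totchg=1);
%put Weight for 1997: %kid_weight(1997);
```

Inside a survey procedure, the macro call would stand in for the variable name, for example WEIGHT %kid_weight(2000, totchg=1); in a program that reads the 2000 KID. The macro falls through to DISCWT for any other year, so it should only be called with actual KID data years.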

KID Limitations

Unlike NIS and NEDS data, KID data cannot be used to produce hospital-level estimates because the KID was sampled at the discharge level rather than at the hospital level.

Note that the KID cannot be used as an unweighted sample because of the design of the database.

KID Summary

Let's review what you need to do to create national or regional estimates using KID data.

Weight your data with discharge weights (DISCWT) for discharge-level estimates.

Check the weighted data totals by performing a quick query on HCUPnet.

Hospital-level analyses are not possible with KID data.

The KID cannot be used as an unweighted sample. Weights should always be applied to the data.

Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.

Key Points

In summary, weighting is a key concept when working with the HCUP national databases.

What to do:

Remember that the NIS, NEDS, and KID are sample databases. Thus, to produce national or regional estimates from these databases, you must be sure to properly weight the data.

It is important that you select the proper weight based on the database, the year of data, and the type of analysis you are conducting.

Check your estimates against HCUPnet to ensure that you are using the weights appropriately and calculating estimates and variances accurately.

And keep in mind that proper statistical techniques must also be used to calculate standard errors and confidence intervals when using each of the national databases. For detailed instructions, refer to the special report Calculating Nationwide Inpatient Sample Variances on the HCUP-US Website.

What not to do:

State-level analyses cannot be conducted with the national HCUP databases because the sampling frames are not designed with state as a stratification variable. If you are interested in analyses by state, you should use the state-specific databases.

HCUPnet cannot be used to check unweighted estimates, as weights have been applied to the HCUPnet data.

Remember that the KID cannot be used as an unweighted database.

Resources and Other Training

If you are looking for more information on the subject matter covered here, several resources are available on the HCUP User Support website.

If you can't find what you need, feel free to email the HCUP Technical Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available to answer technical questions you may have.

Thank you for accessing this module. There are several other HCUP online tutorials. Access these tutorials to learn if there are other topics that could be helpful to you.

If you have any feedback regarding this module, please email us at hcup@ahrq.gov.