HCUP National Estimates
1. Introduction
   Welcome
   About HCUP
2. NIS
   Nationwide Inpatient Sample (NIS)
   NIS Unweighted Discharge Record Count
   NIS National Discharge-Level Estimates
   NIS Regional Discharge-Level Estimates
   NIS Discharge Weights Over Time
   NIS Unweighted Hospital Record Count
   NIS National Hospital-Level Estimate
   NIS Hospital Weights Over Time
   NIS Summary
3. NEDS
   Nationwide Emergency Department Sample (NEDS)
   NEDS Unweighted Discharge Record Count
   NEDS National Discharge-Level Estimates
   NEDS Regional Discharge-Level Estimates
   NEDS Unweighted Hospital ED Record Count
   NEDS National Hospital ED-Level Estimate
   NEDS Summary
4. KID
   Kids' Inpatient Database (KID)
   KID Unweighted Discharge Record Count
   KID National Discharge-Level Estimates
   KID Regional Discharge-Level Estimates
   KID Summary
5. Wrap-Up
   Key Points
Welcome
Thank you for joining us for this Healthcare Cost and Utilization Project (HCUP) online tutorial on producing national and regional estimates. This tutorial was created for researchers who are using HCUP national databases, understand the design of the national databases, and are ready to produce national and regional estimates.
In this tutorial you'll learn how to produce national and regional estimates by weighting the unweighted HCUP data.
About HCUP
Before we get started, a quick word about HCUP:
HCUP is sponsored by the Agency for Healthcare Research and Quality (AHRQ). HCUP is a family of databases, software tools, and related research products that enable research on a variety of healthcare topics.
If you are unfamiliar with HCUP or would like a refresher, please consider taking our General Overview Course.
Learning Objectives
There are three learning objectives in this tutorial:
The first objective is to understand how the three national databases (the NIS, or Nationwide Inpatient Sample; the NEDS, or Nationwide Emergency Department Sample; and the KID, or Kids' Inpatient Database) can be weighted to produce national and regional estimates.
The second objective is to select and apply the appropriate discharge or hospital weight in order to generate national estimates at the discharge or hospital level from unweighted record counts.
The third objective is to understand when it is appropriate to use the NIS and NEDS databases as unweighted samples. This module introduces weighting each of the three national HCUP databases.
Weighting HCUP Data
Why do we need to weight HCUP data?
Most researchers working with the HCUP nationwide databases are interested in using the data to create national and regional estimates of hospitals, hospital discharges, or emergency department visits, and will therefore want to weight the data.
The national HCUP samples are designed to facilitate the development of such estimates. The samples are built so that they can be weighted up to national and regional levels.
For an in-depth explanation of the sample designs, you can access the HCUP online course on Sample Design of National Databases. The next sections will cover how each national database can be used to produce national and regional estimates.
Nationwide Inpatient Sample (NIS)
The NIS is a database of hospital inpatient discharges which can be used to create national and regional estimates of hospital utilization, access, costs and quality.
In order to perform such analyses on the NIS data contained in the Core File, you must weight the unweighted observations.
Weighting the data will enable you to produce nationally representative estimates from the sample of hospitals in the HCUP NIS database.
NIS Discharge Weights
The weights you apply to the data depend on the type of estimates you want to produce. NIS data can be weighted to produce discharge-level estimates or hospital-level estimates.
To produce discharge-level estimates, such as estimates of the total number of discharges with a diagnosis of asthma in the US or estimates of the total number of discharges in the US for individuals age 65 and over, you must apply a discharge weight to each record in the Core File.
The discharge weights are calculated for NIS data by first stratifying the NIS hospitals on the same variables that were used for creating the sample. These variables are geographic region, urban/rural location, teaching status, bed size, and ownership. A weight is then calculated for each stratum by dividing the number of universe discharges in that stratum - obtained from American Hospital Association (AHA) data - by the number of NIS discharges in the stratum. Weighted estimates are calculated by uniformly applying stratum weights to the discharges according to the stratum from which the discharge was drawn.
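The stratum weight calculation described above can be sketched in a few lines of Python (the tutorial's own demonstrations use SAS). The stratum labels and counts below are invented for illustration and are not actual AHA or NIS figures; the function name is likewise hypothetical.

```python
# Hypothetical universe and sample discharge counts per stratum
# (illustrative numbers only, not actual AHA or NIS figures).
universe_discharges = {"stratum_A": 500_000, "stratum_B": 120_000}
sample_discharges = {"stratum_A": 100_000, "stratum_B": 30_000}


def stratum_weights(universe, sample):
    """Weight for each stratum = universe discharges / sampled discharges."""
    return {s: universe[s] / sample[s] for s in sample}


weights = stratum_weights(universe_discharges, sample_discharges)

# Every discharge drawn from a stratum carries that stratum's weight,
# so the weighted sample count reproduces the universe count.
for s, w in weights.items():
    print(s, w, w * sample_discharges[s])
```

In the released files this step has already been done for you: each record simply carries its stratum's weight.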
Weights have been assigned to each discharge and are stored in each record in the data element DISCWT. When the discharge weights are applied to the unweighted NIS data, the result is an estimate of the number of discharges for the entire universe. In the case of the NIS, the universe is all inpatient discharges from community hospitals in the US.
About the Demonstrations
This tutorial will use SAS® to demonstrate how to weight HCUP data to produce national and regional estimates. In addition to SAS®, several other statistical software packages are capable of producing statistics from the stratified, single-cluster sampling design of the national HCUP databases. STATA® and SPSS® are two commonly used examples. For a more detailed explanation of how to use these software packages to work with the national HCUP databases, please refer to the documentation available on HCUP-US, including the Methods Report on Calculating Nationwide Inpatient Sample Variances.
All demonstrations in this tutorial refer to CCS diagnosis codes. The Clinical Classifications Software (CCS) uses a categorization scheme that collapses the universe of ICD-9-CM diagnosis codes into over 260 clinically meaningful diagnosis categories. It does the same for procedure codes. The CCS categorization scheme has been applied to the records within the HCUP databases and the CCS codes are stored in each record.
NIS Unweighted Discharge Record Count
As a demonstration, we will tabulate the number of records in the NIS for which asthma is indicated as a principal diagnosis.
1. First determine which records do and do not have asthma listed as the principal CCS diagnosis. The CCS code for asthma is 128.
2. Use PROC SURVEYMEANS to generate statistics about the records which do and do not have asthma listed as a principal diagnosis. The SURVEYMEANS procedure accounts for the complex sample design of the NIS.
CODE: Count records with CCS=128 (asthma) from 2007 NIS File

    Title1 'Count records with CCS=128 (asthma) from 2007 NIS File';
    libname nis2007 "C:\NIS 2007\";
    options obs = MAX PageSize=51 LineSize=146;

    /* Flag records whose principal CCS diagnosis is asthma (128) */
    data asthma;
      set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
      if dxccs1 eq 128 then asthma = 1;
      else asthma = 0;
    run;

    /* No WEIGHT statement: the SUM is an unweighted record count */
    PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR;
      VAR asthma;
      CLUSTER hospid;
      STRATA NIS_stratum;
    run;

3. The resulting output will contain a data summary of the number of strata, clusters, and total observations in the data. In this example, the summary confirms a database composed of 60 strata, 1,044 clusters - each cluster representing a single hospital - and 8,043,415 records - the number of records in the 2007 NIS.
4. The statistics section provides the results of the analysis. The output in this example confirms that the number of records in the NIS with a principal diagnosis of asthma is 81,443. Remember, this is the number of records in the NIS for which asthma is indicated as a principal diagnosis. This is not an estimate of the number of hospital discharges nationwide for asthma.
OUTPUT: Count records with CCS=128 (asthma) from 2007 NIS File

    Count records with CCS=128 (asthma) from 2007 NIS File

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata                 60
    Number of Clusters             1044
    Number of Observations      8043415

    Statistics
    Variable   Mean       Std Error of Mean   Sum     Std Dev
    asthma     0.010125   0.000315            81443   2810.775895

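The indicator-variable approach used in the DATA step above can be illustrated with a minimal Python sketch (the tutorial's own demonstrations use SAS). The five CCS codes below are hypothetical; the last lines only check that the reported Mean agrees with the reported Sum and Number of Observations.

```python
# Hypothetical principal-diagnosis CCS codes for five records
# (128 is the CCS code for asthma, as stated in the tutorial).
dxccs1 = [128, 55, 128, 2, 100]

# Same logic as the DATA step: asthma = 1 when DXCCS1 is 128, else 0.
asthma = [1 if code == 128 else 0 for code in dxccs1]

# Summing a 0/1 indicator counts the flagged records;
# its mean is the share of records that are flagged.
unweighted_count = sum(asthma)
unweighted_share = unweighted_count / len(asthma)
print(unweighted_count, unweighted_share)

# Consistency check on the figures reported in the output above:
# Sum / Number of Observations should match the Mean column.
reported_mean = round(81_443 / 8_043_415, 6)
print(reported_mean)
```

Because the indicator is coded 0/1, the unweighted Sum is a record count and the Mean is a proportion of records.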
A printer-friendly version of all example code and output shown in this tutorial is available.
NIS National Discharge-Level Estimates
Next, this tutorial will cover an example of how to weight at the discharge level.
To estimate the number of hospital discharges nationwide with a principal diagnosis of asthma, weight the data by using the WEIGHT keyword and the DISCWT data element in the PROC SURVEYMEANS step.
CODE: Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

    Title1 'Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)';
    libname nis2007 "C:\NIS 2007\";
    options obs = MAX PageSize=51 LineSize=146;

    data asthma;
      set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
      if dxccs1 eq 128 then asthma = 1;
      else asthma = 0;
    run;

    /* WEIGHT discwt turns the SUM into a national estimate */
    PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR;
      VAR asthma;
      WEIGHT discwt;
      CLUSTER hospid;
      STRATA NIS_stratum;
    run;

In this example, the result is 402,088 - an estimate of the number of hospital discharges, nationwide, with a principal diagnosis of asthma in 2007.
OUTPUT: Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

    Produce national estimate of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata                 60
    Number of Clusters             1044
    Number of Observations      8043415
    Sum of Weights             39541948

    Statistics
    Variable   Mean       Std Error of Mean   Sum      Std Dev
    asthma     0.010169   0.000321            402088   13985

To verify that you have weighted the data correctly, perform a simple query on HCUPnet, the online system which provides quick access to national and regional estimates of HCUP data.
1. Go to HCUPnet and select "National Statistics on All Stays."
2. Describe yourself as "Researcher, medical professional."
3. In this example, you are running a query on a particular diagnosis so you should select "Statistics on specific diagnoses or procedures."
4. Select 2007 as the data year.
5. You are using CCS codes to identify asthma patients, so select "Diagnoses grouped by CCS" and then "Principal diagnosis."
6. Highlight CCS code 128 for asthma and select "Next."
7. Select "Number of discharges."
8. Select "All patients in all hospitals."
The HCUPnet results and the SAS® results should be the same - in this case 402,088 discharges with a principal diagnosis of asthma. Note that since the NIS contains close to a 20 percent sample of all US hospital discharges, another simple check on the accuracy of your weighted estimate is to multiply the number of unweighted discharges by 5; the product should be close to, though not exactly equal to, the weighted estimate.
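The multiply-by-5 check can be worked through with the figures reported in this tutorial's output. The Python sketch below is only arithmetic on those published numbers; the variable names are illustrative.

```python
# Figures taken from the unweighted and weighted SAS output in this tutorial.
unweighted_asthma = 81_443      # unweighted records with principal dx asthma
weighted_asthma = 402_088       # weighted national estimate
n_records = 8_043_415           # Number of Observations
sum_of_weights = 39_541_948     # Sum of Weights

# Rough check: a sample of roughly 20 percent implies a factor of about 5.
rough_estimate = unweighted_asthma * 5

# The average discharge weight shows how close to 5 the factor really is.
average_weight = sum_of_weights / n_records

# Relative gap between the rough check and the weighted estimate.
relative_gap = abs(rough_estimate - weighted_asthma) / weighted_asthma

print(rough_estimate, round(average_weight, 3), round(relative_gap, 4))
```

The rough figure lands within about 1.3 percent of the weighted estimate, which is the kind of agreement this check is meant to confirm; it is not a substitute for the HCUPnet comparison.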
NIS Regional Discharge-Level Estimates
You might also want to produce regional estimates of hospital discharges with a diagnosis of asthma once you have weighted the data. If so, one method for producing these estimates is to create a variable for region by using information contained in the NIS_STRATUM data element. Then, you can use the DOMAIN keyword to indicate that you want to produce estimates of asthma discharges by region. The resulting output will contain a separate line item estimate for each region.
CODE: Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

    Title1 'Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)';
    libname nis2007 "C:\NIS 2007\";
    options obs = MAX PageSize=51 LineSize=146;

    data asthma;
      set NIS2007.nis_2007_core (keep=KEY HOSPID DISCWT NIS_STRATUM DXCCS1);
      retain dischgs 1;  /* constant 1, so the weighted SUM counts discharges */
      /* Region is the first character of the stratum code */
      region = substr(left(put(nis_stratum,8.)),1,1);
      if dxccs1 eq 128 then asthma = 1;
      else asthma = 0;
    run;

    /* DOMAIN produces a separate estimate for each region/asthma combination */
    PROC SURVEYMEANS DATA=asthma SUM STD MEAN STDERR;
      VAR dischgs;
      WEIGHT discwt;
      CLUSTER hospid;
      STRATA NIS_stratum;
      DOMAIN region * asthma;
    run;

OUTPUT: Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

    Produce regional estimates of discharges with CCS=128 (asthma) from 2007 NIS File (weighted)

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata                 60
    Number of Clusters             1044
    Number of Observations      8043415
    Sum of Weights             39541948

    Statistics
    Variable   Mean       Std Error of Mean   Sum        Std Dev
    dischgs    1.000000   0                   39541948   799355

    Domain Analysis: region*asthma
    region   asthma   Variable   Mean       Std Error of Mean   Sum        Std Dev
    1        0        dischgs    1.000000   0                   7660700    335678
    1        1        dischgs    1.000000   0                   92596      8089.096894
    2        0        dischgs    1.000000   0                   9038455    322029
    2        1        dischgs    1.000000   0                   91657      5868.575661
    3        0        dischgs    1.000000   0                   15112513   589289
    3        1        dischgs    1.000000   0                   160784     9133.832891
    4        0        dischgs    1.000000   0                   7328192    256261
    4        1        dischgs    1.000000   0                   57051      3505.469814

Check your results using HCUPnet. The first part of the query on HCUPnet will be the same as the one you performed for the national estimate. In terms of patient and hospital characteristics, this time you want to see the discharges by region, so you should select "Region of the US."
The HCUPnet results and the SAS® results should be the same.
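Alongside the HCUPnet comparison, one further internal check (a suggestion, not part of the original tutorial procedure) is to confirm that the regional estimates add back up to the national figures. The Python sketch below uses only the sums reported in the SAS output in this tutorial; the dictionary names are illustrative.

```python
# Weighted sums from the region*asthma domain analysis (2007 NIS output).
asthma_by_region = {1: 92_596, 2: 91_657, 3: 160_784, 4: 57_051}
other_by_region = {1: 7_660_700, 2: 9_038_455, 3: 15_112_513, 4: 7_328_192}

# National figures from the weighted national run.
national_asthma = 402_088
sum_of_weights = 39_541_948

# Regional asthma estimates should total the national asthma estimate.
regional_asthma_total = sum(asthma_by_region.values())

# Asthma plus non-asthma discharges should total the sum of weights.
all_discharges_total = regional_asthma_total + sum(other_by_region.values())

print(regional_asthma_total == national_asthma)
print(all_discharges_total == sum_of_weights)
```

Both comparisons hold for the figures shown here. Displayed sums are rounded, so in other analyses a difference of a discharge or two can appear without indicating an error.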
NIS Discharge Weights over Time
NIS data are available annually going back to 1988. The NIS discharge weight variable has changed over time.
Years | Variable Name | Use
2001 and later | DISCWT | All national estimates
2000 | DISCWT | National estimates except those including total charge
2000 | DISCWTcharge | National estimates of total charge
1998-1999 | DISCWT | All national estimates
Before 1998 | DISCWT_U | All national estimates
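The year-to-variable mapping in the table above can be expressed as a small lookup. The Python function below is a hypothetical helper, not part of HCUP; it assumes the DISCWT_U row applies to data years before 1998, which matches the hospital-weight rule (HOSPWT_U before 1998) given later in this tutorial.

```python
def nis_discharge_weight(year, total_charge=False):
    """Return the NIS discharge weight variable name for a data year.

    Hypothetical helper based on the table above. Assumes DISCWT_U
    covers data years before 1998 (NIS data begin in 1988).
    """
    if year < 1988:
        raise ValueError("NIS data are available from 1988 onward")
    if year >= 2001:
        return "DISCWT"
    if year == 2000:
        # The 2000 NIS has a separate weight for total-charge estimates.
        return "DISCWTcharge" if total_charge else "DISCWT"
    if year in (1998, 1999):
        return "DISCWT"
    return "DISCWT_U"  # assumption: years before 1998


print(nis_discharge_weight(2007))
print(nis_discharge_weight(2000, total_charge=True))
print(nis_discharge_weight(1995))
```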
NIS Hospital Weights
To produce hospital-level estimates you must apply hospital weights to the data.
The hospital weights are calculated according to the NIS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year. Since twenty percent of the AHA universe hospitals in each stratum are sampled when possible, the hospital weights are usually near five.
The hospital weights are represented by the data element HOSPWT and are stored in each hospital record in the Hospital File. When the hospital weights are applied to the unweighted NIS hospital observations, the result is the number of hospitals for the entire universe - in the case of the NIS, the universe is all US community hospitals.
NIS Unweighted Hospital Record Count
Before estimating the number of teaching hospitals nationwide, use the HOSP_TEACH variable to tabulate the unweighted number of hospital records in the NIS which are classified as teaching hospitals.
CODE: Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File

    Title1 'Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File';
    libname nis2007 "C:\NIS 2007\";
    options obs = MAX PageSize=51 LineSize=146;

    /* Flag hospital records classified as teaching hospitals */
    data TEACH1;
      set NIS2007.nis_2007_hospital (keep=HOSPID DISCWT NIS_STRATUM HOSP_TEACH);
      if hosp_teach = 1 then teach = 1;
      else teach = 0;
    run;

    /* No WEIGHT statement: the SUM is an unweighted hospital count */
    PROC SURVEYMEANS DATA=TEACH1 SUM STD MEAN STDERR;
      VAR teach;
      CLUSTER hospid;
      STRATA NIS_stratum;
    run;

The result is 191. This is the number of hospital records in the NIS which are classified as teaching hospitals. This is not a national estimate of teaching hospitals because you did not use a weighting variable.
OUTPUT: Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File

    Count hospital records with HOSP_TEACH=1 from 2007 NIS HOSPITAL File

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata              60
    Number of Clusters          1044
    Number of Observations      1044

    Statistics
    Variable   Mean       Std Error of Mean   Sum          Std Dev
    teach      0.182950   0.003682            191.000000   3.844516

NIS National Hospital-Level Estimate
To estimate the number of teaching hospitals nationwide, weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.
CODE: Produce national estimate of hospitals with HOSP_TEACH=1 from 2007 NIS HOSPITAL File (weighted)

    Title1 'Produce national estimate of hospitals with HOSP_TEACH =1 from 2007 NIS HOSPITAL File (weighted)';
    libname nis2007 "C:\NIS 2007\";
    options obs = MAX PageSize=51 LineSize=146;

    data TEACH1;
      set NIS2007.nis_2007_hospital (keep=HOSPID HOSPWT NIS_STRATUM HOSP_TEACH);
      if hosp_teach = 1 then teach = 1;
      else teach = 0;
    run;

    /* WEIGHT hospwt turns the SUM into a national hospital estimate */
    PROC SURVEYMEANS DATA=TEACH1 SUM STD MEAN STDERR;
      VAR teach;
      WEIGHT hospwt;
      CLUSTER hospid;
      STRATA NIS_stratum;
    run;

The result is 927 (926.86 before rounding). This is an estimate of the number of teaching hospitals nationwide.
OUTPUT: Produce national estimate of hospitals with HOSP_TEACH=1 from 2007 NIS HOSPITAL File (weighted)

    Produce national estimate of hospitals with HOSP_TEACH =1 from 2007 NIS HOSPITAL File (weighted)

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata              60
    Number of Clusters          1044
    Number of Observations      1044
    Sum of Weights              5099

    Statistics
    Variable   Mean       Std Error of Mean   Sum          Std Dev
    teach      0.181774   0.003672            926.863668   18.722859

To verify that you have weighted the data correctly, run a query on HCUPnet, which can be used to check results for more recent years of the NIS.
1. Because you are looking at hospital-level data, select "Statistics on U.S. Hospitals."
2. Select 2007 as the data year.
3. Then create your own hospital group. In this example, select "All bed sizes," "All locations," and "All hospital ownership/control types."
4. You are only interested in teaching hospitals, so under teaching status, select "Teaching."
5. At this point, you are interested in a national estimate, so select "Entire U.S." and select "Next."
There are 927 teaching hospitals nationwide according to HCUPnet as well as the SAS® program.
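The hospital-level figures reported above can be cross-checked against one another with simple arithmetic. The Python sketch below uses only numbers from the two SAS outputs in this section; the variable names are illustrative.

```python
# Figures from the unweighted and weighted hospital-level output (2007 NIS).
unweighted_teaching = 191           # teaching hospital records in the sample
sampled_hospitals = 1_044           # Number of Observations
weighted_teaching = 926.863668      # weighted Sum for teach
sum_of_hospital_weights = 5_099     # Sum of Weights

# Share of teaching hospitals in the sample and in the weighted universe.
unweighted_share = unweighted_teaching / sampled_hospitals
weighted_share = weighted_teaching / sum_of_hospital_weights

# Average hospital weight; a 20 percent sample implies a value near five.
average_hospital_weight = sum_of_hospital_weights / sampled_hospitals

print(round(unweighted_share, 6), round(weighted_share, 6))
print(round(average_hospital_weight, 2), round(weighted_teaching))
```

The two shares reproduce the Mean column in each output, and the average weight of roughly 4.9 is consistent with hospital weights that are usually near five.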
NIS Hospital Weights over Time
Note that the variable for hospital weights depends on the data year. For the 1998 NIS and later years, HOSPWT should be used to create nationwide estimates. For NIS databases prior to 1998, the variable HOSPWT_U should be used.
NIS Unweighted Analysis
Depending on the nature of your research, you may be interested in using the NIS as an unweighted sample.
An analysis of the association between hospital-level results from a survey on safe hospital practices and hospital-level inpatient risk-adjusted mortality scores is one example of research for which it would be appropriate to use the NIS as an unweighted sample.
In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.
NIS Summary
Let's review what you need to do to create national or regional estimates using NIS data.
Weight your data with discharge weights (DISCWT) for discharge-level estimates.
Weight your data with hospital weights (HOSPWT) for hospital-level estimates.
Check the weighted data totals by performing a quick query on HCUPnet.
In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NIS as an unweighted sample, do not apply weights to your data.
If you are interested in a more detailed statistical explanation of weighting, refer to the documentation available on the HCUP-US website, particularly the Methods Report on Calculating Nationwide Inpatient Sample Variances.
Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.
Nationwide Emergency Department Sample (NEDS)
The NEDS is a database of emergency department visits - visits for which the patient was treated and released as well as visits that resulted in a hospital admission - going back annually to 2006.
The NEDS can be used to produce national and regional estimates of emergency department care, utilization, access, costs and quality.
In order to produce national or regional estimates of the NEDS data contained in the Core File, you must weight the unweighted observations.
NEDS Discharge Weights
The weights you apply will depend on the type of analysis you are performing. Like the NIS data, NEDS data can be weighted to produce discharge-level estimates or hospital-level estimates.
To produce discharge-level estimates, such as estimating the number of emergency department visits for influenza in the US or estimating the number of emergency department visits for hip fractures among the elderly in the US, you must apply a discharge weight to each record in the Core File.
The discharge weights are calculated for NEDS data by first stratifying the NEDS hospitals on the same variables that were used for creating the sample. These variables are geographic region, trauma center designation, urban/rural location, teaching status, and ownership. A weight is then calculated for each stratum by dividing the number of universe discharges in that stratum - obtained from AHA data - by the number of NEDS discharges in the stratum. Weighted estimates are calculated by uniformly applying the stratum weights to the discharges according to the stratum from which the discharge was drawn.
Weights have been assigned to each record. In each record in the NEDS Core File, the weight is stored in the data element DISCWT. When the discharge weights are applied to the unweighted NEDS observations, the result is an estimate of the number of discharges for the entire universe. In the case of the NEDS, the universe is emergency department visits nationwide.
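As a rough illustration of what applying DISCWT to the records does, the Python sketch below computes a weighted total from a handful of hypothetical records. The weights and flags are invented for illustration. It produces only the point estimate; PROC SURVEYMEANS additionally uses the strata and clusters to compute standard errors.

```python
# Hypothetical records: an influenza flag (1/0) and a discharge weight each.
# Values are made up for illustration; real weights come from DISCWT.
records = [
    {"influenza": 1, "discwt": 4.5},
    {"influenza": 0, "discwt": 5.0},
    {"influenza": 1, "discwt": 4.5},
    {"influenza": 0, "discwt": 5.25},
]

# Unweighted count: how many sample records are flagged.
unweighted_count = sum(r["influenza"] for r in records)

# Weighted total: each flagged record counts for discwt universe visits.
weighted_total = sum(r["influenza"] * r["discwt"] for r in records)

# Sum of weights: the estimated size of the whole universe.
sum_of_weights = sum(r["discwt"] for r in records)

print(unweighted_count, weighted_total, sum_of_weights)
```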
NEDS Unweighted Discharge Record Count
This tutorial will now demonstrate weighting the data using SAS®.
1. As with the NIS data, begin by running a simple program to see how many records there are in the NEDS for which influenza is indicated as a first-listed diagnosis.
2. Note that influenza is CCS code 123, so look for records in which DXCCS1 equals 123.
CODE: Count records with CCS=123 (influenza) from 2006 NEDS File

    Title1 'Count records with CCS=123 (influenza) from 2006 NEDS File';
    libname neds2006 "C:\NEDS 2006\";
    options obs = MAX PageSize=51 LineSize=146;

    /* Flag records whose first-listed CCS diagnosis is influenza (123) */
    data influenza1;
      set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
      if dxccs1 eq 123 then influenza = 1;
      else influenza = 0;
    run;

    /* No WEIGHT statement: the SUM is an unweighted record count */
    PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR;
      VAR influenza;
      CLUSTER hosp_ed;
      STRATA NEDS_stratum;
    run;

3. In the resulting output, the data summary provides the number of strata, clusters, and total observations in the data. The summary in this example confirms a database that contains 71 strata, 958 clusters - each cluster representing a single hospital emergency department - and 25,954,816 records - the number of records in the 2006 NEDS.
4. The statistics section provides the results of the particular analysis. The result is 46,185. This is the number of records in the NEDS for which influenza is indicated as a first-listed diagnosis. This is not an estimate of the number of emergency department visits nationwide for influenza.
OUTPUT: Count records with CCS=123 (influenza) from 2006 NEDS File

    Count records with CCS=123 (influenza) from 2006 NEDS File

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata                  71
    Number of Clusters               958
    Number of Observations      25954816

    Statistics
    Variable    Mean       Std Error of Mean   Sum     Std Dev
    influenza   0.001779   0.000065779         46185   1852.887261

NEDS National Discharge-Level Estimates
To estimate the number of nationwide emergency department visits with a first-listed diagnosis of influenza, weight the data using the WEIGHT keyword and the DISCWT data element in the PROC SURVEYMEANS step.
CODE: Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

    Title1 'Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)';
    libname neds2006 "C:\NEDS 2006\";
    options obs = MAX PageSize=51 LineSize=146;

    data influenza1;
      set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
      if dxccs1 eq 123 then influenza = 1;
      else influenza = 0;
    run;

    /* WEIGHT discwt turns the SUM into a national estimate */
    PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR;
      VAR influenza;
      WEIGHT discwt;
      CLUSTER hosp_ed;
      STRATA NEDS_stratum;
    run;

The result is 211,740 - an estimate of the number of nationwide emergency department visits, both those in which the patient was treated and released and those that resulted in a hospital admission, with a first-listed diagnosis of influenza in 2006.
OUTPUT: Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

    Produce national estimate of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata                   71
    Number of Clusters                958
    Number of Observations       25954816
    Sum of Weights              120033750

    Statistics
    Variable    Mean       Std Error of Mean   Sum      Std Dev
    influenza   0.001764   0.000070788         211740   8898.259246

To verify that you have weighted the data correctly, perform a query on HCUPnet.
1. Go to HCUPnet and select "National Statistics on All ED Visits."
2. Select "All ED Visits."
3. You are running a query on a particular first-listed diagnosis so select the first option, "Statistics on specific diagnoses."
4. Select 2006 as the data year.
5. Use CCS codes to identify influenza patients. Select "Diagnoses grouped by Clinical Classifications Software (CCS)" and then "First-listed diagnosis."
6. Highlight CCS code 123 for influenza and select "Next."
7. Select "Number of discharges."
8. Select "All patients in all hospitals."
The HCUPnet results and the SAS® results should be the same.
NEDS Regional Discharge-Level Estimates
Once you have weighted the data, you might also want to produce regional estimates of emergency department visits for influenza. If so, one method for producing these estimates is to create a variable for region by using information contained in the NEDS_STRATUM data element. Then, use the DOMAIN keyword to indicate that you want to produce estimates of influenza discharges by region.
CODE: Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

    Title1 'Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)';
    libname neds2006 "C:\NEDS 2006\";
    options obs = MAX PageSize=51 LineSize=146;

    data influenza1;
      set NEDS2006.neds_2006_core (keep=HOSP_ED DISCWT NEDS_STRATUM DXCCS1);
      retain edrecs 1;  /* constant 1, so the weighted SUM counts ED visits */
      /* Region is the first character of the stratum code */
      region = substr(left(put(neds_stratum,8.)),1,1);
      if dxccs1 eq 123 then influenza = 1;
      else influenza = 0;
    run;

    /* DOMAIN produces a separate estimate for each region/influenza combination */
    PROC SURVEYMEANS DATA=influenza1 SUM STD MEAN STDERR;
      VAR edrecs;
      WEIGHT discwt;
      CLUSTER hosp_ed;
      STRATA NEDS_stratum;
      DOMAIN region * influenza;
    run;

OUTPUT: Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

    Produce regional estimates of discharges with CCS=123 (influenza) from 2006 NEDS File (weighted)

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata                   71
    Number of Clusters                958
    Number of Observations       25954816
    Sum of Weights              120033750

    Statistics
    Variable   Mean       Std Error of Mean   Sum         Std Dev
    edrecs     1.000000   0                   120033750   2477212

    Domain Analysis: region*influenza
    region   influenza   Variable   Mean       Std Error of Mean   Sum        Std Dev
    1        0           edrecs     1.000000   0                   23520030   1120105
    1        1           edrecs     1.000000   0                   27885      2620.732277
    2        0           edrecs     1.000000   0                   27789334   1153925
    2        1           edrecs     1.000000   0                   47477      3196.225129
    3        0           edrecs     1.000000   0                   46771539   1656723
    3        1           edrecs     1.000000   0                   118487     7742.256091
    4        0           edrecs     1.000000   0                   21741106   889365
    4        1           edrecs     1.000000   0                   17892      1467.104457

Check your results by running a quick query on HCUPnet.
1. The first part of the query on HCUPnet will be the same as the one you performed for the national estimate of emergency department visits with a first-listed diagnosis of influenza.
2. Select "Number of discharges."
3. Select "All patients in all hospitals."
4. In terms of patient and hospital characteristics, this time, to see the discharges by region, select "Region of the U.S."
The HCUPnet results and the SAS® results should be the same.
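The region variable in the program above is simply the first character of the stratum code. The Python sketch below mirrors that step on a few hypothetical records (the stratum values and weights are invented) and then compares the regional influenza estimates reported in the output with the national estimate. The helper name `region_of` is illustrative.

```python
from collections import defaultdict


def region_of(stratum):
    """First character of the stratum code, mirroring the SUBSTR/PUT step."""
    return str(stratum)[0]


# Hypothetical records: (NEDS_STRATUM, DISCWT, influenza flag).
records = [(11100, 4.0, 1), (11100, 4.0, 0), (32410, 5.5, 1), (41200, 6.0, 0)]

# Weighted influenza visits accumulated by region.
influenza_by_region = defaultdict(float)
for stratum, weight, flag in records:
    influenza_by_region[region_of(stratum)] += weight * flag

print(dict(influenza_by_region))

# Regional influenza estimates reported in the 2006 NEDS output above.
reported_regional = [27_885, 47_477, 118_487, 17_892]
reported_national = 211_740

# Each displayed sum is rounded, so allow a small rounding difference.
difference = sum(reported_regional) - reported_national
print(difference)
```

For the reported figures the regional estimates total 211,741, one more than the national estimate of 211,740, a gap that is consistent with rounding of the displayed sums.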
NEDS Hospital Weights
To produce hospital-level estimates, such as the number of US emergency departments with a trauma center designation, you will apply a hospital weight, HOSPWT, to the data.
HOSPWT is also calculated according to the NEDS strata. Within each of the strata, each hospital's weight is equal to the number of universe hospitals it represents during the year.
To demonstrate how to weight at the hospital level, let's consider an analysis which requires us to estimate the number of emergency departments in the US with a trauma center.
NEDS Unweighted Hospital ED Record Count
First, tabulate the number of emergency department records in the NEDS which are classified as having a trauma center.
CODE: Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File

    Title1 'Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File';
    libname neds2006 "C:\NEDS 2006\";
    options obs = MAX PageSize=51 LineSize=146;

    /* Flag emergency departments classified as having a trauma center */
    data TRAUMA1;
      set NEDS2006.neds_2006_hospital (keep=HOSP_ED DISCWT NEDS_STRATUM HOSP_TRAUMA);
      if hosp_trauma in (1,2,3,8,9) then trauma = 1;
      else trauma = 0;
    run;

    /* No WEIGHT statement: the SUM is an unweighted record count */
    PROC SURVEYMEANS DATA=TRAUMA1 SUM STD MEAN STDERR;
      VAR trauma;
      CLUSTER hosp_ed;
      STRATA NEDS_stratum;
    run;

The result is 131. This is the number of emergency department records in the NEDS with a trauma center.
OUTPUT: Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File

    Count hospital records with HOSP_TRAUMA= (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata              71
    Number of Clusters           958
    Number of Observations       958

    Statistics
    Variable   Mean       Std Error of Mean   Sum          Std Dev
    trauma     0.136743   0                   131.000000   0

NEDS National Hospital ED-Level Estimate
To estimate the number of emergency departments with a trauma center nationwide, you will weight the data.
CODE: Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)

    Title1 'Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)';
    libname neds2006 "C:\NEDS 2006\";
    options obs = MAX PageSize=51 LineSize=146;

    data TRAUMA1;
      set NEDS2006.neds_2006_hospital (keep=HOSP_ED HOSPWT NEDS_STRATUM HOSP_TRAUMA);
      if hosp_trauma in (1,2,3,8,9) then trauma = 1;
      else trauma = 0;
    run;

    /* WEIGHT hospwt turns the SUM into a national estimate */
    PROC SURVEYMEANS DATA=TRAUMA1 SUM STD MEAN STDERR;
      VAR trauma;
      WEIGHT hospwt;
      CLUSTER hosp_ed;
      STRATA NEDS_stratum;
    run;

The result is 697. This is an estimate of the number of emergency departments with a trauma center nationwide.
OUTPUT: Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)

    Produce national estimate of hospitals with HOSP_TRAUMA = (1, 2, 3, 8, or 9) from 2006 NEDS HOSPITAL File (weighted)

    The SURVEYMEANS Procedure

    Data Summary
    Number of Strata              71
    Number of Clusters           958
    Number of Observations       958
    Sum of Weights              4845

    Statistics
    Variable   Mean       Std Error of Mean   Sum          Std Dev
    trauma     0.143860   0                   697.000000   0

NEDS Unweighted Analysis
Depending on the nature of your research, you may be interested in using the NEDS as an unweighted sample.
One example of an analysis in which it would be appropriate to use the NEDS as an unweighted sample is a hospital-level study in which NEDS emergency department-level data are linked to data on pre-hospital care, such as that provided by emergency medical services.
In most cases, it is critical to weight the data to produce accurate, unbiased results. However, if your research does not necessitate creating national or regional estimates, do not use any weights in your programming.
NEDS Summary
Let's review what you need to do to create national or regional estimates using NEDS data.
Weight your data with discharge weights (DISCWT) for discharge-level estimates.
Weight your data with hospital weights (HOSPWT) for hospital-level estimates.
Check the weighted data totals by performing a quick query on HCUPnet.
In most cases, weighting the data is critical to producing accurate and unbiased results. However, if you are not producing national or regional estimates, but rather are using the NEDS as an unweighted sample, do not apply weights to your data.
Remember that the national databases cannot be used to conduct state-level analyses. If you are interested in performing state-level analyses, use the HCUP state-specific databases.
Kids' Inpatient Database (KID)
The third national database is the KID, the database designed specifically for
the study of pediatric conditions that require hospitalization.
The KID is produced every three years starting with the 1997 data year.
In order to produce national or regional estimates using KID data, you must
weight the unweighted observations in the KID file.
KID Discharge Weights
KID data must be weighted to perform discharge-level analyses. Because of the
sample design of the KID, it cannot be used as an unweighted database. For more
information on the unique sample design of the KID, see the Sample Design
tutorial.
Discharge weights are calculated for KID data by stratifying the hospitals on
the same variables that were used for creating the sample and then creating
weights by stratum. The stratifying variables are geographic region,
urban/rural location, teaching status, bed size, ownership, and children's
hospital.
Remember that the KID was designed for the study of pediatric hospitalizations,
and that the discharges in the KID are a combination of newborn discharges
(both complicated and uncomplicated) and non-newborn pediatric discharges.
Because of this, two sets of weights are created for each stratum: one for
newborn discharges and one for non-newborn pediatric discharges.
The weights for newborn discharges (both complicated and uncomplicated) are
created by dividing the number of universe newborns in the stratum by the
number of KID newborns in the stratum.
The weights for non-newborn discharges are created by dividing the number of
universe non-newborn pediatric discharges in the stratum by the number of KID
non-newborn discharges in the stratum.
Weighted estimates are generated by applying the stratum weights to the
discharges according to the stratum from which the discharge was drawn and
according to whether the discharge is newborn or non-newborn.
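The weight calculation described above can be sketched in a few lines. This is a minimal Python illustration, not part of the SAS workflow in this tutorial; the stratum numbers, universe counts, and sample counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
# Hypothetical counts for illustration only; these are NOT actual KID
# or universe figures. Keys are (stratum, discharge type).
universe_counts = {
    (1, "newborn"): 1000,
    (1, "non-newborn"): 1500,
    (2, "newborn"): 400,
    (2, "non-newborn"): 900,
}
sample_counts = {
    (1, "newborn"): 100,
    (1, "non-newborn"): 1200,
    (2, "newborn"): 50,
    (2, "non-newborn"): 600,
}


def stratum_weights(universe, sample):
    """Weight = universe discharges / sampled discharges, per stratum and type."""
    return {key: universe[key] / sample[key] for key in sample}


weights = stratum_weights(universe_counts, sample_counts)

# Each sampled discharge carries the weight of its own stratum and type.
sampled_discharges = [
    (1, "newborn"),
    (1, "non-newborn"),
    (2, "newborn"),
    (2, "newborn"),
]
weighted_estimate = sum(weights[d] for d in sampled_discharges)

print(weights)
print(weighted_estimate)
```

In the KID itself this work is already done: the resulting weight is delivered on each record as the discharge weight, so analysts apply it rather than recompute it.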
KID Unweighted Discharge Record Count
Next, this tutorial will demonstrate how to weight the KID to produce national
and regional estimates of pediatric hospital discharges with a principal
diagnosis of cystic fibrosis, a discharge-level estimate. Remember that
pediatric discharges are defined as those for which the patient was age 20 or
less at admission.
1. Tabulate the number of records in the KID for which cystic fibrosis is
indicated as a principal diagnosis, that is, records in which DXCCS1 equals 56.
CODE: Count records with CCS=56 (cystic fibrosis) from 2006 KID File
Title1 'Count records with CCS=56 (cystic fibrosis) from 2006 KID File';
libname kid2006 "C:\KID 2006\";
options obs = MAX PageSize=51 LineSize=146 ;
data cf1;
  set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_stratum);
  /* Flag records with a principal diagnosis of cystic fibrosis */
  if dxccs1 eq 56 then cysticf = 1; else cysticf = 0;
run;
/* No WEIGHT statement: this run only counts KID records */
PROC SURVEYMEANS DATA=cf1 SUM STD MEAN STDERR ;
  VAR cysticf;
  CLUSTER hospid ;
  STRATA KID_stratum ;
run;
2. The resulting output provides a data summary of the number of strata,
clusters, and total observations in the data. The summary shown here confirms a
database that contains 60 strata, 3,739 clusters (each cluster representing a
single hospital), and 3,131,324 records, the number of records in the 2006 KID.
3. The statistics section provides the results of the particular analysis. The
result is 4,063. This is the number of records in the KID for which cystic
fibrosis is indicated as a principal diagnosis. Because no weights were
applied, this is not an estimate of pediatric discharges nationwide for cystic
fibrosis.
OUTPUT: Count records with CCS=56 (cystic fibrosis) from 2006 KID File
Count records with CCS=56 (cystic fibrosis) from 2006 KID File
The SURVEYMEANS Procedure
Data Summary
Number of Strata | 60
Number of Clusters | 3739
Number of Observations | 3131324
Statistics
Variable | Mean | Std Error of Mean | Sum | Std Dev
cysticf | 0.001298 | 0.000103 | 4063.000000 | 346.994786
KID National Discharge-Level Estimates
Weight the data using the WEIGHT keyword in the PROC SURVEYMEANS step.
CODE: Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)
Title1 'Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)';
libname kid2006 "C:\KID 2006\";
options obs = MAX PageSize=51 LineSize=146 ;
data cf1;
  set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1 KID_STRATUM);
  /* Flag records with a principal diagnosis of cystic fibrosis */
  if dxccs1 eq 56 then cysticf = 1; else cysticf = 0;
run;
PROC SURVEYMEANS DATA=cf1 SUM STD MEAN STDERR ;
  VAR cysticf;
  WEIGHT discwt;   /* apply the KID discharge weight */
  CLUSTER hospid ;
  STRATA KID_stratum ;
run;
The result is 6,947. This is an estimate of the number of pediatric hospital
discharges, nationwide, with a principal diagnosis of cystic fibrosis in 2006.
OUTPUT: Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)
Produce national estimate of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)
The SURVEYMEANS Procedure
Data Summary
Number of Strata | 60
Number of Clusters | 3739
Number of Observations | 3131324
Sum of Weights | 7558812.48
Statistics
Variable | Mean | Std Error of Mean | Sum | Std Dev
cysticf | 0.000919 | 0.000075764 | 6946.648756 | 601.770739
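One quick way to read the two KID outputs side by side is to confirm that each reported mean is simply the sum divided by the matching total: the number of observations for the unweighted run, and the sum of weights for the weighted run. The Python sketch below uses only figures copied from the outputs in this section; the variable names are illustrative.

```python
# Figures copied from the SURVEYMEANS outputs in this section.
unweighted_sum = 4063.0        # KID records with DXCCS1 = 56
n_observations = 3131324       # records in the 2006 KID
weighted_sum = 6946.648756     # national estimate of cystic fibrosis discharges
sum_of_weights = 7558812.48    # weighted total of all KID discharges

# Unweighted mean: share of KID records flagged as cystic fibrosis.
unweighted_mean = unweighted_sum / n_observations

# Weighted mean: estimated share of national pediatric discharges.
weighted_mean = weighted_sum / sum_of_weights

# Average number of national discharges represented by each flagged record.
average_weight = weighted_sum / unweighted_sum

print(round(unweighted_mean, 6))
print(round(weighted_mean, 6))
print(round(average_weight, 2))
```

The rounded means reproduce the Mean column in each output, and the ratio shows that each cystic fibrosis record in the KID stands in for roughly 1.7 discharges nationwide.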
Note that HCUPnet contains a path for querying National Statistics on Children.
However, this path provides statistics only on discharges in which the patient
was age 17 or less. The KID file from the Central Distributor provides data on
discharges where patients are age 20 or less. As a result, the weighted
estimates produced will not exactly match those provided by HCUPnet unless you
limit the data set to those age 17 or less when creating your estimate of
discharges.
That said, use HCUPnet to get a ballpark idea of what the estimate should be.
1. Select "National Statistics on Children."
2. Select "Researcher, medical professional."
3. Select "Statistics on specific diagnoses or procedures."
4. Select 2006 as the data year.
5. Use CCS codes to identify cystic fibrosis patients, so select "Diagnoses
grouped by Clinical Classifications Software (CCS)" and then "Principal
diagnosis."
6. Highlight CCS code 56 for cystic fibrosis and select "Next."
7. Select "Number of discharges."
8. Select "All patients in all hospitals."
The HCUPnet total will be smaller than the total produced by SAS®, but in the
same overall range.
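The gap between the two totals comes from the age cutoff, which a small example makes concrete. The Python sketch below uses four made-up records; AGE and DISCWT are assumed here to be the names of the age and discharge weight elements, and the helper function is hypothetical.

```python
# Hypothetical discharge records for illustration only.
# AGE and DISCWT are assumed element names; the values are made up.
records = [
    {"AGE": 3, "DISCWT": 1.5},
    {"AGE": 17, "DISCWT": 2.0},
    {"AGE": 18, "DISCWT": 1.5},
    {"AGE": 20, "DISCWT": 2.5},
]


def weighted_total(rows, max_age=None):
    """Sum discharge weights, optionally keeping only rows at or below max_age."""
    return sum(
        row["DISCWT"]
        for row in rows
        if max_age is None or row["AGE"] <= max_age
    )


all_ages_total = weighted_total(records)                 # age 20 or less
age_17_or_less_total = weighted_total(records, max_age=17)

print(all_ages_total)
print(age_17_or_less_total)
```

The restricted total is the one comparable to the HCUPnet children's path. When standard errors matter, survey software generally expects the age restriction to be expressed as a domain rather than by deleting records, so treat this as a sketch of the point estimate only.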
KID Regional Discharge-Level Estimates
Once you have weighted the data, you might also want to produce regional
estimates of pediatric hospital discharges for cystic fibrosis. Because the
design of the KID is different from that of the NIS and NEDS, you need to use a
different method for creating regional estimates than the one used above. For
the KID, you have to merge the Hospital File with the Core File in order to
pick up the hospital region data element. Then, you use the DOMAIN keyword to
indicate that you want to produce estimates of cystic fibrosis discharges by
region. Note that the same method can be used to produce regional estimates
from the NIS and NEDS.
CODE: Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)
Title1 'Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)';
libname kid2006 "C:\KID 2006\";
options obs = MAX PageSize=51 LineSize=146 ;
data cf1;
  set KID2006.kid_2006_core (keep=HOSPID DISCWT DXCCS1);
  retain dischgs 1;   /* constant 1 so that SUM counts discharges */
  if dxccs1 eq 56 then cysticf = 1; else cysticf = 0;
run;
proc sort data=cf1;
  by hospid;
run;
proc sort data=KID2006.kid_2006_hospital (keep=HOSPID KID_STRATUM Hosp_region) out=hosp;
  by hospid;
run;
/* Merge the Hospital File onto the Core File to pick up the region */
data cf2;
  merge cf1 (in=a) hosp (in=b);
  by hospid;
  if a and b;
  region = Hosp_region ;
run;
PROC SURVEYMEANS DATA=cf2 SUM STD MEAN STDERR ;
  VAR dischgs;
  WEIGHT discwt ;
  CLUSTER hospid ;
  STRATA KID_stratum ;
  DOMAIN region * cysticf ;
run;
OUTPUT: Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)
Produce regional estimates of discharges with CCS=56 (cystic fibrosis) from 2006 KID File (weighted)
The SURVEYMEANS Procedure
Data Summary
Number of Strata | 60
Number of Clusters | 3739
Number of Observations | 3131324
Sum of Weights | 7558812.48
Statistics
Variable | Mean | Std Error of Mean | Sum | Std Dev
dischgs | 1.000000 | 0 | 7558812 | 123453
Domain Analysis: cysticf*region
cysticf | region | Variable | Mean | Std Error of Mean | Sum | Std Dev
0 | 1 | dischgs | 1.000000 | 0 | 1276711 | 56801
0 | 2 | dischgs | 1.000000 | 0 | 1645709 | 59677
0 | 3 | dischgs | 1.000000 | 0 | 2894738 | 92104
0 | 4 | dischgs | 1.000000 | 0 | 1734707 | 67196
1 | 1 | dischgs | 1.000000 | 0 | 1302.943850 | 329.418971
1 | 2 | dischgs | 1.000000 | 0 | 2002.856118 | 397.992297
1 | 3 | dischgs | 1.000000 | 0 | 2222.916758 | 365.441907
1 | 4 | dischgs | 1.000000 | 0 | 1417.932030 | 304.223111
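The merge-then-domain logic can be sketched outside SAS as well. The Python example below has two parts. The first mimics the merge on HOSPID and the by-region weighted sum using made-up Core File and Hospital File rows. The second uses the actual regional sums from the output above to check that they add back to the national estimate of 6946.648756 from the weighted national run. All names other than the HCUP data element names are illustrative.

```python
import math

# Part 1: hypothetical rows, for illustration only.
core = [
    {"HOSPID": 10, "DISCWT": 2.0, "DXCCS1": 56},
    {"HOSPID": 10, "DISCWT": 2.0, "DXCCS1": 100},
    {"HOSPID": 20, "DISCWT": 1.5, "DXCCS1": 56},
    {"HOSPID": 30, "DISCWT": 3.0, "DXCCS1": 56},  # no Hospital File match
]
hospital_region = {10: 1, 20: 3}  # HOSPID -> HOSP_REGION

cf_by_region = {}
for row in core:
    region = hospital_region.get(row["HOSPID"])
    if region is None:
        continue  # mirrors "if a and b": keep matched records only
    if row["DXCCS1"] == 56:
        cf_by_region[region] = cf_by_region.get(region, 0.0) + row["DISCWT"]

print(cf_by_region)

# Part 2: regional cystic fibrosis sums copied from the output above.
regional_cf = {
    1: 1302.943850,
    2: 2002.856118,
    3: 2222.916758,
    4: 1417.932030,
}
national_cf = 6946.648756  # Sum from the weighted national run

regional_total = sum(regional_cf.values())
totals_agree = math.isclose(regional_total, national_cf, abs_tol=1e-6)

print(round(regional_total, 6), totals_agree)
```

If the regional sums did not add back to the national figure, a likely cause would be records dropped during the merge, so this check is a cheap way to catch an incomplete match between the two files.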
KID Weights Over Time
The KID discharge weight variable has changed over time.
Years | Variable Name | Use
2003 and later | DISCWT | All national estimates
2000 | DISCWT | National estimates except those including total charge
2000 | DISCWTcharge | National estimates of total charge
1997 | DISCWT_U | All national estimates
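When a program has to handle several KID years, the table above can be encoded as a small lookup so the correct weight is chosen consistently. The Python helper below is a hypothetical sketch; the function name and its `total_charge` argument are illustrative, and only the years and variable names come from the table.

```python
def kid_discharge_weight(year, total_charge=False):
    """Return the KID discharge weight variable name for a data year.

    Follows the table above: DISCWT for 2003 and later, DISCWT or
    DISCWTcharge for 2000 depending on whether total charge is being
    estimated, and DISCWT_U for 1997.
    """
    if year >= 2003:
        return "DISCWT"
    if year == 2000:
        return "DISCWTcharge" if total_charge else "DISCWT"
    if year == 1997:
        return "DISCWT_U"
    # The table lists no weight for any other year.
    raise ValueError(f"No KID discharge weight listed for year {year}")


print(kid_discharge_weight(2006))
print(kid_discharge_weight(2000, total_charge=True))
print(kid_discharge_weight(1997))
```

Raising an error for unlisted years is a deliberate choice here: a silent default could apply the wrong weight without any warning.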
KID Limitations
Unlike NIS and NEDS data, KID data cannot be used to produce hospital-level
estimates because the data for the KID were sampled at the discharge level
rather than at the hospital level.
Note that the KID cannot be used as an unweighted sample because of the design
of the database.
KID Summary
Let's review what you need to do to create national or regional estimates using
KID data.
Weight your data with discharge weights (DISCWT) for discharge-level estimates.
Check the weighted data totals by performing a quick query on HCUPnet.
Hospital-level analyses are not possible with KID data.
The KID cannot be used as an unweighted sample. Weights should always be
applied to the data.
Remember that the national databases cannot be used to conduct state-level
analyses. If you are interested in performing state-level analyses, use the
HCUP state-specific databases.
Key Points
In summary, weighting is a key concept when working with the HCUP national
databases.
What to do:
Remember that the NIS, NEDS, and KID are sample databases. Thus, to produce
national or regional estimates from these databases, you must be sure to
properly weight the data.
It is important that you select the proper weight based on the database, the
year of data, and the type of analysis you are conducting.
Check your estimates against HCUPnet to ensure that you are using the weights
appropriately and calculating estimates and variances accurately.
Keep in mind that proper statistical techniques must also be used to calculate
standard errors and confidence intervals when using each of the national
databases. For detailed instructions, refer to the special report Calculating
Nationwide Inpatient Sample Variances on the HCUP-US Website.
What not to do:
State-level analyses cannot be conducted with the national HCUP databases
because the sampling frames are not designed with state as a stratification
variable. If you are interested in analyses by state, you should use the
state-specific databases.
HCUPnet cannot be used to check unweighted estimates, as weights have been
applied to the HCUPnet data.
Remember that the KID cannot be used as an unweighted database.
Resources and Other Training
If you are looking for more information on the subject matter covered here,
several resources are available on the HCUP User Support website.
If you can't find what you need, feel free to email the HCUP Technical
Assistance staff at hcup@ahrq.gov. AHRQ has senior research personnel available
to answer technical questions you may have.
Thank you for accessing this module. There are several other HCUP online
tutorials. Access these tutorials to learn if there are other topics that could
be helpful to you.
If you have any feedback regarding this module, please email us at hcup@ahrq.gov.