1997
Data Quality
All the statistics published
in this report are estimates of population values, such as the number
of households using natural gas. These estimates are based on a randomly
chosen subset of the entire population of households. The universe includes
all households in the 50 States and the District of Columbia, including
households on military installations.
The two major types of nonresponse
are unit nonresponse and item nonresponse. Unit nonresponse occurs when
a sampled household does not participate in the survey. Item nonresponse
occurs when a particular item of interest is missing from a completed
questionnaire. The next two sections provide details on the procedures
followed for each type of nonresponse.
Adjustments for Unit Nonresponse
Weight adjustment was used
to reduce unit nonresponse bias in the survey statistics. Weights were
calculated for each sample household. The household weight reflected
the selection probability for that household and additional adjustments.
These adjustments included correcting for potential biases arising from
the failure to list all housing units in the sample area and failure
to contact all sample housing units. Contacts were unsuccessful with
19.0 percent of the eligible units.
Six factors are used in the
processing of Residential Energy Consumption Survey (RECS) results to
develop an overall weight for each household for which a completed questionnaire,
either a personal interview or mailed questionnaire, is obtained. The
factors are the basic weight, a noninterview adjustment, a first-stage
ratio estimate, and three second-stage ratio adjustments. The overall
household weight is the product of these six factors.
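As a rough illustration of the product described above, the following Python sketch multiplies the six factors together. The function name, argument names, and numeric values are hypothetical and are not taken from the RECS processing system.

```python
from math import prod


def overall_weight(basic, niaf, first_stage, second_stage):
    """Return the overall household weight as the product of six factors.

    second_stage is a sequence holding the three second-stage ratio
    adjustments. All names here are illustrative assumptions.
    """
    if len(second_stage) != 3:
        raise ValueError("expected three second-stage ratio adjustments")
    return basic * niaf * first_stage * prod(second_stage)


# Hypothetical factor values for one household.
example = overall_weight(
    basic=15000.0,
    niaf=1.25,
    first_stage=0.98,
    second_stage=(1.02, 0.99, 1.01),
)
print(round(example, 1))
```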
The Basic Weight
The basic weight is calculated and applied
to households at the Secondary Sampling Unit (SSU) level.
Basic Weight = 1 / (Probability of Selection)
For the 1997 RECS, all households in the
same SSU had the same probability of selection and hence the same basic
weight.
The Noninterview Adjustment
The noninterview adjustment factor
(NIAF) compensates for nonresponse households and for nonhousehold units
that were identified during the survey. Basically, this adjustment reflects
the ratio of the number of completed and uncompleted responses among
those selected to the number of completed responses. Since the probabilities
of selection are constant within an SSU for 1997, these adjustments
were applied at the SSU level.
The NIAF is computed at the SSU level and is equal
to:
(Total Completed Plus Uncompleted Responses in the SSU) /
(Completed Responses in the SSU)
If the ratio exceeds 2.0, then the NIAF is
set equal to 2.0 and the NIAFs for SSUs in the same Primary Sampling
Unit (PSU) and with the same metropolitan status are increased.
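The ratio and its cap can be sketched in Python as follows. This is a minimal sketch only: the function name and the counts are hypothetical, and the redistribution to other SSUs in the same PSU is not implemented because the text does not specify how the increase is computed.

```python
def niaf(completed, uncompleted, cap=2.0):
    """Return (factor, capped) for one SSU.

    factor is (completed + uncompleted) / completed, limited to cap.
    capped is True when the raw ratio exceeded the cap, which signals
    that other SSUs in the same PSU and metropolitan status would need
    an increase (that step is not specified here and is left out).
    """
    if completed <= 0:
        raise ValueError("an SSU needs at least one completed response")
    ratio = (completed + uncompleted) / completed
    return min(ratio, cap), ratio > cap


# Hypothetical counts: 80 completed and 20 uncompleted responses.
print(niaf(80, 20))
# Hypothetical counts where the raw ratio is 4.0 and the cap applies.
print(niaf(10, 30))
```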
The First-Stage Ratio Adjustment Factors
The primary purpose of the first-stage
adjustment factor is to reduce the sampling variation in the estimates
of the number of housing units by main space-heating fuel resulting
from sampling of PSUs during the first stage of the sample design. The
correlation between main space-heating fuel and other important energy-related
characteristics implies that this adjustment will also reduce the sampling
variation for many important variables collected for the RECS.
In some cases, a single PSU comprising
all or part of a large metropolitan area was large enough in population
to be a stratum by itself. PSUs of this type are called Self-Representing
(SR) PSUs because the sample from each SR PSU represents only that PSU.
The first-stage ratio adjustment factor was 1.0 for all observations
in SR PSUs.
In other strata, one PSU was
selected from among two or more PSUs in the stratum. Each of the PSUs
selected from these strata is called a Non-Self-Representing (NSR) PSU
because each such PSU represents not only itself; it also represents
the unselected PSUs in the stratum.
The 1990 Census data were used
to determine the difference between the distribution of the main space-heating
fuel in the set of selected NSR PSUs and the distribution in the set
of all PSUs (selected and unselected) in the strata from which the NSR
PSUs are selected. Fuels are under-represented if the percentage of
households using the fuel is lower in the selected NSR PSUs than the
percentage in the set of all PSUs in the NSR strata. Fuels are over-represented
if the opposite occurs. The weights for the responding households in
NSR PSUs are adjusted upward when their main space-heating fuel is under-represented
and the weights are adjusted downward when it is over-represented.
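The text does not give the exact formula for the first-stage factor. One plausible form, shown here purely as an assumption, is the ratio of a fuel's share among all PSUs in the NSR strata to its share among the selected NSR PSUs. The fuel names and shares below are invented for illustration.

```python
def first_stage_factors(share_all, share_selected):
    """Return a hypothetical first-stage factor for each fuel.

    share_all: fuel -> share of households in all PSUs of the NSR strata.
    share_selected: fuel -> share of households in the selected NSR PSUs.
    The ratio-of-shares form is an assumption, not the documented method.
    """
    factors = {}
    for fuel, target in share_all.items():
        observed = share_selected[fuel]
        if observed <= 0:
            raise ValueError(f"no households using {fuel} in selected PSUs")
        factors[fuel] = target / observed
    return factors


# Invented shares: gas is under-represented, electricity over-represented.
factors = first_stage_factors(
    {"gas": 0.50, "electric": 0.30, "oil": 0.20},
    {"gas": 0.40, "electric": 0.40, "oil": 0.20},
)
print(factors)
```

Under this assumed form, an under-represented fuel gets a factor above 1.0 (weights move up) and an over-represented fuel gets a factor below 1.0 (weights move down), which matches the direction described in the text.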
The Second-Stage Ratio Adjustments
The second-stage ratio adjustments
are used to improve the accuracy of the estimates of the number of households
using data obtained from the Bureau of the Census as control totals.
The RECS can be used to produce an estimate of the number of households
in the country, but the Bureau of the Census produces much more accurate
estimates. Improving the accuracy of the data on the number of households
also improves the accuracy of almost all other estimates obtained from
the RECS. The first priority is the accuracy of estimates for the number
of households for the nine Census divisions and for the four largest
States. The second priority is the accuracy of estimates for the number
of households for three demographic cells (multiperson households, single-member
female households, and single-member male households).
The ratio adjustment process
was carried out in three steps. In step one, the population was divided
into 15 geographical cells. (Hawaii and Alaska were treated as separate
cells because their climates differ from that of the rest of the country.)
Control totals giving the number of households in each cell were derived
from Current Population Survey results. A ratio adjustment equal to
the control total divided by the weighted count using the weights after
the first-stage ratio adjustment was created. Multiplying the weights
after the first-stage ratio adjustment by the ratio yields the new weights
which, when summed, equal the control totals for the 15 cells. This
calculation yielded a weighted total number of households equal to 101,481,000.
Refer to Table B1 for estimates for each of the 15 geographical areas.
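The step-one calculation can be sketched in Python as below. The function name and the input weights are hypothetical; only the two control totals (Alaska 229 and Hawaii 386, in thousands of households) come from Table B1. Because the third step is described as the same calculation applied to different input weights, the same function could in principle be reused there.

```python
from collections import defaultdict


def ratio_adjust(weights, cells, control_totals):
    """Scale weights so that each cell's weighted count equals its control total.

    weights: input weights (here, after the first-stage ratio adjustment).
    cells: the cell label of each household, in the same order as weights.
    control_totals: cell label -> control total.
    """
    weighted_count = defaultdict(float)
    for w, c in zip(weights, cells):
        weighted_count[c] += w
    ratios = {c: control_totals[c] / weighted_count[c] for c in weighted_count}
    return [w * ratios[c] for w, c in zip(weights, cells)]


# Hypothetical weights (thousands of households) for four sample households.
cells = ["Alaska", "Alaska", "Hawaii", "Hawaii"]
weights = [100.0, 120.0, 200.0, 180.0]
controls = {"Alaska": 229.0, "Hawaii": 386.0}
adjusted = ratio_adjust(weights, cells, controls)
print(adjusted)
```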
The third step is the same as
the first step except that the input weights are those resulting from
the second step. This produced a set of weights whose sum reproduced
the 15 geographic cell control totals and yielded estimates that are
quite close to the control totals for the three demographic cells.
Table B1. Control Totals for Ratio Adjustment of Sampling in the 1997 RECS
Location | Thousands of Households
New England | 5,310
Middle Atlantic (minus New York State) | 7,597
East North Central | 16,907
West North Central | 7,153
South Atlantic (minus Florida) | 12,764
East South Central | 6,344
West South Central (minus Texas) | 3,876
Mountain | 6,179
Pacific (minus Alaska, California, and Hawaii) | 3,532
New York | 6,827
Florida | 5,929
Texas | 6,964
California | 11,484
Alaska | 229
Hawaii | 386
Total United States | 101,481
Source: EIA's linear extrapolation from U.S. Bureau of the Census, 1996 and 1997 Current Population Survey.
Adjustments for Item Nonresponse
Item nonresponse occurs
when respondents do not know the answer or refuse to answer a question,
or when an interviewer does not ask a question or does not record an
answer. The incidence of the latter, the interviewer not asking and/or
not recording the answer, was greatly reduced by the use of Computer
Assisted Personal Interviewing (CAPI). The majority of nonresponse was
due to interviewers recording answers of "Don't Know" and "Refused."
Some item nonresponse was due to programming problems in the questionnaire.
Table B2 lists the most frequently imputed items in the 1997 RECS.
The number of item imputations
for the 181 households receiving mail questionnaires was considerable,
since these questionnaires contained only a small subset of questions
from the household interview. For the mail questionnaires, a modified
hot-deck imputation method was used. A hot-deck matrix was created for
mail questionnaires and personal-interview households using Census region,
type of housing unit structure, space-heating fuel, water-heating fuel,
and presence and type of air-conditioning. Whenever possible, a donor
personal-interview household was chosen for each mail questionnaire
household from the same cell of the hot-deck matrix. For 90 percent
of the mail questionnaires, donors matched on all hot-deck variables.
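A generic hot-deck lookup along these lines can be sketched in Python. This is not the modified method EIA used, whose details are not given here: the field names, the random choice among donors, and the behavior when no donor matches are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical field names for the five matching variables named in the text.
MATCH_KEYS = ("region", "structure", "heat_fuel", "water_fuel", "ac")


def build_hot_deck(donors):
    """Group personal-interview (donor) households by hot-deck cell."""
    deck = defaultdict(list)
    for household in donors:
        deck[tuple(household[k] for k in MATCH_KEYS)].append(household)
    return deck


def impute(recipient, deck, item, rng):
    """Copy item from a randomly chosen donor in the recipient's cell.

    Returns None when the cell has no donor; the fallback rule for
    partial matches is not described in the text and is not modeled.
    """
    cell = tuple(recipient[k] for k in MATCH_KEYS)
    candidates = deck.get(cell)
    if not candidates:
        return None
    return rng.choice(candidates)[item]


# Invented records: one donor and one mail-questionnaire recipient.
donor = {"region": "South", "structure": "single-family detached",
         "heat_fuel": "gas", "water_fuel": "gas", "ac": "central",
         "year_built": 1975}
recipient = {"region": "South", "structure": "single-family detached",
             "heat_fuel": "gas", "water_fuel": "gas", "ac": "central"}
deck = build_hot_deck([donor])
value = impute(recipient, deck, "year_built", random.Random(0))
print(value)
```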
Table B2. Household Questionnaire Items Most Frequently Imputed in the 1997 RECS
Imputed Item | Cases Imputed | Percentage of Total Sample (a) (5,721) | Method of Imputing | Question Number on Questionnaire
Income in past 12 months | 1,016 | 17.8 | Hot deck | J-14a
Year home was built | 395 | 6.9 | Hot deck | A-15a
Age of water-heating equipment | 348 | 6.1 | Deductive/Hot deck | E-4
Way household used central AC equipment | 297 | 5.2 | Hot deck | F-6a
Number of children between the ages of 1 and 12 | 250 | 4.4 | Hot deck | J-1e
Number of infants under the age of 1 | 238 | 4.2 | Hot deck | J-1d
Way household used Window/Wall AC equipment | 149 | 2.6 | Hot deck | F-11
Use programmable or manual features of thermostat | 126 | 2.2 | Hot deck | F-6b
Fuel used to heat hot water | 122 | 2.1 | Hot deck | E-1
Electricity shut off because bill was not paid | 120 | 2.1 | Hot deck | K-4
Could not use heat because ran out of bulk fuel | 120 | 2.1 | Hot deck | K-5a
Could not use heat because utility fuel shut off | 199 | 2.1 | Hot deck | K-5b
Could not use heat because equipment broken | 119 | 2.1 | Hot deck | K-5c
Amount of heat provided by main heating equipment | 108 | 1.9 | Hot deck | D-6
Type of self-cleaning oven | 104 | 1.8 | Hot deck | B-3
Received employment income in last 12 months | 103 | 1.8 | Hot deck | K-1a
Received retirement income in last 12 months | 103 | 1.8 | Hot deck | K-1b
Received cash benefits in last 12 months | 103 | 1.8 | Hot deck | K-1c
Received non-cash benefits in last 12 months | 103 | 1.8 | Hot deck | K-1d
Government help in paying home heating costs | 102 | 1.8 | Hot deck | K-2a
Government help in paying home cooling costs | 102 | 1.8 | Hot deck | K-2b
Government help in paying other home energy costs | 102 | 1.8 | Hot deck | K-2c
Amount of wood burning in past 12 months | 97 | 1.7 | Hot deck | H-7d
Age of householder | 93 | 1.6 | Allocative | J-9
Amount of heating assistance received | 82 | 1.4 | Hot deck | K-3d
(a) Mailed interviews are not included in the percentage. To account for these, add 3 percentage points to the percentages given.
Source: Energy Information Administration, Office of Energy Markets and End Use, Form EIA-457 A of the 1997 Residential Energy Consumption Survey (RECS). RECS Public Use Data Files.
The use of CAPI techniques
allowed EIA to program skip patterns, edit checks, and range checks
into the questionnaire. As a result, the quality of the data collected
during the interview improved and the amount of time needed to edit
and clean the data was reduced. Some of this improvement can be attributed
to the fact that the 1997 RECS questionnaire was shorter than the 1993
RECS questionnaire. But the switch to CAPI did result in cleaner data.
For example, the data collected during the paper and pencil interviews
for the 1993 RECS resulted in 40 variables with more than 100 cases
where there were missing data. On the other hand, the data collected
during the CAPI interviews for the 1997 RECS resulted in only 22 variables
with more than 100 cases where there were missing data.
The questions on both income
and year home was built have resulted in a substantial amount of missing
data for each RECS. The 1997 RECS was no exception. The large amount
of missing data for the age of the water-heating equipment, the number
of children, and the number of infants was caused by errors in the skip
patterns in the CAPI questionnaire. The plans for the 1997 RECS questionnaire
included a question concerning the use of evaporative or swamp coolers
in housing units located in hot, dry areas of the country and a question
concerning the use of automobile block heaters in cold areas of the
country, but errors in the skip patterns forced the CAPI instrument
to skip these questions for all households.
Quality of Specific Data Items
Housing Unit Type
There is a fine line between
the definitions of various types of housing units. The distinction between
a single-family attached unit and a unit in an apartment building is
particularly complex. The collection and editing of the data on housing
type changed from the paper-and-pencil questionnaire for the 1993 RECS
to the CAPI questionnaire for the 1997 RECS. The change in the data
collection and editing procedures may have contributed to changes in
the survey results. For example, the estimated number of occupied single-family
attached units increased from 7.3 million for the 1993 RECS to 10.0
million for the 1997 RECS. Conversely, the number of occupied housing
units in buildings with two to four units decreased from 8.0 million
for the 1993 RECS to 5.6 million for the 1997 RECS.
Programmable (Set-Back or Clock) Thermostats
The 1993 and 1997 RECS
both contained questions on the presence of a programmable thermostat.
In both surveys, the thermostats were referred to as "set-back or clock
thermostats," rather than as programmable thermostats. For the 1993 RECS, the
question was placed in the section on conservation measures and usage
(following questions on insulation, weather stripping, and caulking).
For the 1997 RECS, it was placed in the space-heating section, immediately
following the question on the presence of a thermostat. The 1997 RECS
also included a question that asked respondents if they programmed the
thermostat or used the manual features. Based on the 1993 RECS, an estimated
10.8 million households had programmable thermostats in 1993. Based
on the 1997 RECS, an estimated 33.1 million households had programmable
thermostats in 1997. Of these 33.1 million, an estimated 10.2 million
programmed their thermostats and an estimated 22.9 million used the
manual features.
The large increase in the number
of housing units with programmable thermostats from 1993 to 1997 is
questionable. The change in the placement of the question may have contributed
to the large change in the survey results. In addition, the question
concerning programmed versus manual use of the thermostats may have
changed how the interviewers coded the question on the presence of a
programmable thermostat.
Estimation of Sampling Error
Sampling error is the random
difference between a survey estimate and an actual population value.
It occurs because the survey estimate is calculated from a randomly
chosen subset of the entire population. The sampling error averaged
over all possible samples would be zero, but there is only one sample
for the 1997 RECS. Therefore, the sampling error is not zero and is
unknown for the 1997 RECS sample. However, the sample design permits
sampling errors to be estimated. This section describes how the sampling
errors were estimated and how they were made available to readers of
this report who are interested in the precision of the estimates in
this report.
Throughout this report, standard
errors are given as percents of their estimated values; that is, as
relative standard errors (RSE). The RSE is also known as the coefficient
of variation.
For a given population parameter
Y that is estimated by the survey statistic Ŷ, the relative standard
error of Ŷ, RSE(Ŷ), and standard error of Ŷ, S(Ŷ), are given by:
RSE(Ŷ) = [S(Ŷ)/Ŷ] × 100.
S(Ŷ) = [RSE(Ŷ)/100] × Ŷ.
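The two relationships above are inverses of each other, which the following short Python sketch makes concrete. The function names and the numbers are hypothetical and do not come from the report's tables.

```python
def rse_percent(estimate, standard_error):
    """Relative standard error, in percent of the estimate."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return standard_error / estimate * 100.0


def standard_error(estimate, rse):
    """Standard error recovered from an RSE given in percent."""
    return rse / 100.0 * estimate


# Hypothetical estimate of 50.0 million households with an RSE of 4 percent.
se = standard_error(50.0, 4.0)
print(se, rse_percent(50.0, se))
```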
For some surveys, a convenient
algebraic formula for computing variances can be obtained. However,
the RECS used a multistage area sample design of such complexity (see
Appendix A, "How the Survey Was Conducted") that it is virtually impossible
to construct an exact algebraic expression for estimating variances.
In particular, convenient formulas based on an assumption of simple
random sampling, typical of most standard statistical packages, are
inappropriate for the RECS estimates. Such formulas tend to give low
values for standard errors, making the estimates appear much more accurate
than is the case. Instead, the method used to estimate sampling variances
for this survey was balanced half-sample replication. The balanced half-sample
replication method involves calculating the value for a statistic using
the full sample and calculating the value for each of a systematic set
of half samples. (Each half sample contains approximately one-half of
the observations contained in the full sample.) The variance is estimated
using the differences between the value of the statistic calculated
using the full sample and the values of the statistic calculated using
each of the half samples.
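A common textbook form of the balanced half-sample estimator takes the mean of the squared differences between each half-sample value and the full-sample value. The Python sketch below uses that form; whether the RECS calculation used exactly this form (for example, without further adjustment factors) is an assumption, and the numbers are invented.

```python
from math import sqrt


def half_sample_variance(full_estimate, half_sample_estimates):
    """Mean squared difference between half-sample and full-sample values."""
    k = len(half_sample_estimates)
    if k == 0:
        raise ValueError("at least one half-sample estimate is required")
    return sum((h - full_estimate) ** 2 for h in half_sample_estimates) / k


# Invented values: a full-sample estimate and four half-sample estimates.
variance = half_sample_variance(10.0, [9.0, 11.0, 10.0, 10.0])
standard_error = sqrt(variance)
print(variance, standard_error)
```

Constructing the balanced set of half samples itself (typically from a Hadamard matrix over pairs of sampling units) is outside the scope of this sketch.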
Generalized Variances
For every estimate in this
report, the RSE was computed by the balanced half-sample replication
method. This RSE was used for any statistical tests or confidence intervals
given in the text, or to determine if the estimate was too inaccurate
to publish (RSE greater than 50 percent).
Space limitations prevent publishing
the complete set of RSEs with this document. Instead, a generalized
variance technique is provided, by which the reader can compute an approximate
RSE for each of the estimates in the detailed tables. For the statistic
in the ith row and jth column of a particular
table, the approximate RSE is given by:
RSE(i,j) = R(i) × C(j)
where R(i) is the RSE row factor
given in the last column of row i, and C(j) is the RSE column factor
given at the top of column j. This value for the relative standard error
can be used to construct confidence intervals and to perform hypothesis
tests by standard statistical methods. However, because the generalized
variance procedure gives only approximate RSEs, such confidence intervals
and statistical tests must also be regarded as only approximate.
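As an illustration of how a reader might apply the row and column factors, the Python sketch below computes an approximate RSE and a normal-approximation confidence interval. The function names and factor values are hypothetical, and the default multiplier of 1.96 (an approximate 95-percent interval) is a standard statistical convention rather than something stated in this report.

```python
def approx_rse(row_factor, column_factor):
    """Approximate RSE (percent) as the product R(i) x C(j)."""
    return row_factor * column_factor


def approx_confidence_interval(estimate, rse, z=1.96):
    """Approximate interval: estimate plus or minus z standard errors."""
    se = rse / 100.0 * estimate
    return estimate - z * se, estimate + z * se


# Hypothetical factors and estimate, not taken from the detailed tables.
rse = approx_rse(row_factor=5.0, column_factor=1.2)
low, high = approx_confidence_interval(estimate=20.0, rse=rse)
print(rse, low, high)
```

Because the generalized factors are themselves approximations, an interval computed this way should be read as indicative only.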
Contact:
- Eileen M. O'Brien
- RECS Survey Manager
- Phone: (202) 586-1122
- Fax: (202) 586-0018
Page last modified on 02/25/2004