1997 Data Quality: 1997 Residential Energy Consumption Survey (RECS)



 
1997 Data Quality

All the statistics published in this report are estimates of population values, such as the number of households using natural gas. These estimates are based on a randomly chosen subset of the entire population of households. The universe includes all households in the 50 States and the District of Columbia, including households on military installations.

The two major types of nonresponse are unit nonresponse and item nonresponse. Unit nonresponse occurs when a sampled household does not participate in the survey. Item nonresponse occurs when a particular item of interest is missing from a completed questionnaire. The next two sections provide details on the procedures followed for each type of nonresponse.

Adjustments for Unit Nonresponse

Weight adjustment was used to reduce unit nonresponse bias in the survey statistics. Weights were calculated for each sample household. The household weight reflected the selection probability for that household and additional adjustments. These adjustments included correcting for potential biases arising from the failure to list all housing units in the sample area and failure to contact all sample housing units. Contacts were unsuccessful with 19.0 percent of the eligible units.

Six factors are used in the processing of Residential Energy Consumption Survey (RECS) results to develop an overall weight for each household for which a completed questionnaire, either a personal interview or mailed questionnaire, is obtained. The factors are the basic weight, a noninterview adjustment, a first-stage ratio estimate, and three second-stage ratio adjustments. The overall household weight is the product of these six factors.


The Basic Weight

The basic weight is calculated and applied to households at the Secondary Sampling Unit (SSU) level.

Basic Weight = 1/ (Probability of Selection)

For the 1997 RECS, all households in the same SSU had the same probability of selection and hence the same basic weight.
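
As an illustration only, the basic weight and the product of the six factors can be sketched in Python. The function names and the numeric values below are hypothetical and are not taken from the RECS processing system.

```python
from math import prod


def basic_weight(selection_probability: float) -> float:
    """Basic weight = 1 / (probability of selection)."""
    if not 0.0 < selection_probability <= 1.0:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def overall_weight(basic, noninterview, first_stage, second_stage):
    """Overall household weight: the product of the six factors.

    second_stage holds the three second-stage ratio adjustments.
    """
    factors = [basic, noninterview, first_stage, *second_stage]
    if len(factors) != 6:
        raise ValueError("expected exactly six factors")
    return prod(factors)


# Hypothetical household: selected with probability 1 in 15,000.
example_weight = overall_weight(
    basic_weight(1 / 15000),
    noninterview=1.25,
    first_stage=1.0,
    second_stage=(1.02, 0.99, 1.01),
)
```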

The Noninterview Adjustment

The noninterview adjustment factor (NIAF) compensates for nonresponse households and for nonhousehold units that were identified during the survey. Basically, this adjustment reflects the ratio of the number of completed and uncompleted responses among those selected to the number of completed responses. Since the probabilities of selection are constant within an SSU for 1997, these adjustments were applied at the SSU level.

The NIAF is computed at the SSU and is equal to:

NIAF = (Total Completed Plus Uncompleted Responses in the SSU) / (Completed Responses in the SSU)

If the ratio exceeds 2.0, then the NIAF is set equal to 2.0 and the NIAFs for SSUs in the same Primary Sampling Unit (PSU) and with the same metropolitan status are increased.
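
A minimal Python sketch of the NIAF calculation with its cap of 2.0 follows. The function name is hypothetical, and the redistribution of the excess to other SSUs in the same PSU is not modeled because the text does not give the rule used.

```python
NIAF_CAP = 2.0  # cap stated in the text


def niaf(completed: int, uncompleted: int, cap: float = NIAF_CAP) -> float:
    """Noninterview adjustment factor for one SSU.

    Ratio of (completed + uncompleted) responses to completed responses,
    limited to the cap. How the excess is passed to other SSUs in the
    same PSU is not specified in the text and is left out here.
    """
    if completed <= 0:
        raise ValueError("an SSU needs at least one completed response")
    ratio = (completed + uncompleted) / completed
    return min(ratio, cap)


# Hypothetical SSUs: 80 completed and 20 uncompleted; 10 completed and 30 uncompleted.
typical = niaf(80, 20)
capped = niaf(10, 30)
```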


The First-Stage Ratio Adjustment Factors

The primary purpose of the first-stage adjustment factor is to reduce the sampling variation in the estimates of the number of housing units by main space-heating fuel resulting from sampling of PSUs during the first stage of the sample design. The correlation between main space-heating fuel and other important energy-related characteristics implies that this adjustment will also reduce the sampling variation for many important variables collected for the RECS.

In some cases, a single PSU comprising all or part of a large metropolitan area was large enough in population to be a stratum by itself. PSUs of this type are called Self-Representing (SR) PSUs because the sample from each SR PSU represents only that PSU. The first-stage ratio adjustment factor was 1.0 for all observations in SR PSUs.

In other strata, one PSU was selected from among two or more PSUs in the stratum. Each of the PSUs selected from these strata is called a Non-Self-Representing (NSR) PSU because each such PSU represents not only itself; it also represents the unselected PSUs in the stratum.

The 1990 Census data were used to determine the difference between the distribution of the main space-heating fuel in the set of selected NSR PSUs and the distribution in the set of all PSUs (selected and unselected) in the strata from which the NSR PSUs are selected. Fuels are under-represented if the percentage of households using the fuel is lower in the selected NSR PSUs than the percentage in the set of all PSUs in the NSR strata. Fuels are over-represented if the opposite occurs. The weights for the responding households in NSR PSUs are adjusted upward when their main space-heating fuel is under-represented and the weights are adjusted downward when it is over-represented.
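
The text does not give the formula for the first-stage factor. One plausible form, shown here purely as an assumption, is the ratio of a fuel's share in all PSUs of the NSR strata to its share in the selected NSR PSUs. The function name and the counts are hypothetical.

```python
def first_stage_factors(all_psu_counts, selected_psu_counts):
    """Assumed form of the adjustment: share in all PSUs / share in selected PSUs.

    A factor above 1 raises weights for an under-represented fuel;
    a factor below 1 lowers weights for an over-represented fuel.
    """
    total_all = sum(all_psu_counts.values())
    total_selected = sum(selected_psu_counts.values())
    factors = {}
    for fuel, count_all in all_psu_counts.items():
        share_all = count_all / total_all
        share_selected = selected_psu_counts[fuel] / total_selected
        factors[fuel] = share_all / share_selected
    return factors


# Hypothetical Census household counts by main space-heating fuel.
all_psus = {"natural gas": 500, "electricity": 300, "fuel oil": 200}
selected_psus = {"natural gas": 60, "electricity": 25, "fuel oil": 15}
factors = first_stage_factors(all_psus, selected_psus)
```

In this made-up example natural gas is over-represented in the selected PSUs, so its factor falls below 1, while the factors for electricity and fuel oil rise above 1.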

The Second-Stage Ratio Adjustments

The second-stage ratio adjustments are used to improve the accuracy of the estimates of the number of households using data obtained from the Bureau of the Census as control totals. The RECS can be used to produce an estimate of the number of households in the country, but the Bureau of the Census produces much more accurate estimates. Improving the accuracy of the data on the number of households also improves the accuracy of almost all other estimates obtained from the RECS. The first priority is the accuracy of estimates for the number of households for the nine Census divisions and for the four largest States. The second priority is the accuracy of estimates for the number of households for three demographic cells (multiperson households, single-member female households, and single-member male households).

The ratio adjustment process was carried out in three steps. In step one, the population was divided into 15 geographical cells. (Hawaii and Alaska were treated as separate cells because their climates differ from that of the rest of the country.) Control totals giving the number of households in each cell were derived from Current Population Survey results. For each cell, a ratio adjustment was computed as the control total divided by the weighted count of households, using the weights produced by the first-stage ratio adjustment. Multiplying those weights by the ratio yields new weights which, when summed, equal the control totals for the 15 cells. This calculation yielded a weighted total of 101,481,000 households. Refer to Table B1 for the control totals for each of the 15 geographical areas.

The third step is the same as the first step except that the input weights are those resulting from the second step. This produced a set of weights whose sum reproduced the 15 geographic cell control totals and yielded estimates that are quite close to the control totals for the three demographic cells.
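
The sequence of ratio adjustments can be sketched in Python as follows. This is an illustration, not the production procedure: the second step is assumed here to be the same calculation applied to the three demographic cells, the household records and demographic control totals are hypothetical, and only the Alaska and Hawaii control totals (in thousands) come from Table B1.

```python
from collections import defaultdict


def cell_totals(weights, cells):
    """Sum of weights within each cell."""
    totals = defaultdict(float)
    for weight, cell in zip(weights, cells):
        totals[cell] += weight
    return dict(totals)


def ratio_adjust(weights, cells, control_totals):
    """Multiply each weight by (control total / weighted count) for its cell."""
    counts = cell_totals(weights, cells)
    return [w * control_totals[c] / counts[c] for w, c in zip(weights, cells)]


# Hypothetical input weights (thousands of households) after the first-stage adjustment.
weights = [100.0, 150.0, 120.0, 80.0]
geo = ["Alaska", "Alaska", "Hawaii", "Hawaii"]
demo = ["multiperson", "single female", "multiperson", "single male"]

geo_controls = {"Alaska": 229.0, "Hawaii": 386.0}  # Table B1, thousands
demo_controls = {  # hypothetical, chosen to sum to 615
    "multiperson": 400.0,
    "single female": 120.0,
    "single male": 95.0,
}

step1 = ratio_adjust(weights, geo, geo_controls)
step2 = ratio_adjust(step1, demo, demo_controls)  # assumed form of step two
step3 = ratio_adjust(step2, geo, geo_controls)

final_geo = cell_totals(step3, geo)    # reproduces the geographic controls
final_demo = cell_totals(step3, demo)  # only approximately matches demo_controls
```

Ending on the geographic step is what makes the final weights reproduce the geographic control totals exactly while the demographic totals are matched only approximately.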

Table B1. Control Totals for Ratio Adjustment of Sampling in the 1997 RECS

Location                                          Thousands of Households
New England                                         5,310
Middle Atlantic (minus New York State)              7,597
East North Central                                 16,907
West North Central                                  7,153
South Atlantic (minus Florida)                     12,764
East South Central                                  6,344
West South Central (minus Texas)                    3,876
Mountain                                            6,179
Pacific (minus Alaska, California, and Hawaii)      3,532
New York                                            6,827
Florida                                             5,929
Texas                                               6,964
California                                         11,484
Alaska                                                229
Hawaii                                                386
Total United States                               101,481

     Source: EIA's linear extrapolation from U.S. Bureau of the Census, 1996 and 1997 Current Population Survey.

Adjustments for Item Nonresponse

Item nonresponse occurs when respondents do not know the answer or refuse to answer a question, or when an interviewer does not ask a question or does not record an answer. The incidence of the latter, the interviewer not asking and/or not recording the answer, was greatly reduced by the use of Computer Assisted Personal Interviewing (CAPI). The majority of nonresponse was due to interviewers recording answers of "Don't Know" and "Refused." Some item nonresponse was due to programming problems in the questionnaire. Table B2 lists the most frequently imputed items in the 1997 RECS.

The number of item imputations for the 181 households receiving mail questionnaires was considerable, since these questionnaires contained only a small subset of questions from the household interview. For the mail questionnaires, a modified hot-deck imputation method was used. A hot-deck matrix was created for mail questionnaires and personal-interview households using Census region, type of housing unit structure, space-heating fuel, water-heating fuel, and presence and type of air-conditioning. Whenever possible, a donor personal-interview household was chosen for each mail questionnaire household from the same cell of the hot-deck matrix. For 90 percent of the mail questionnaires, donors matched on all hot-deck variables.
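
A simplified Python sketch of hot-deck imputation is shown below. It is not the modified procedure used for the RECS: the record layout, the variable names, the random choice among donors, and the fallback of dropping matching variables from the end of the list when no full match exists are all assumptions made for illustration.

```python
import random

# Matching variables named in the text; the key names are hypothetical.
MATCH_VARS = ("region", "structure", "heat_fuel", "water_fuel", "ac_type")


def hot_deck_impute(recipients, donors, item, rng=None):
    """Fill `item` for each recipient from a donor in the same hot-deck cell.

    A full match on all variables is tried first. If no donor matches,
    trailing variables are dropped one at a time (an assumed fallback).
    """
    rng = rng or random.Random(0)
    filled = []
    for recipient in recipients:
        record = dict(recipient)
        for n_vars in range(len(MATCH_VARS), -1, -1):
            keys = MATCH_VARS[:n_vars]
            pool = [d for d in donors if all(d[k] == recipient[k] for k in keys)]
            if pool:
                record[item] = rng.choice(pool)[item]
                record["matched_on"] = n_vars  # number of variables matched
                break
        filled.append(record)
    return filled


# Hypothetical personal-interview donors and mail-questionnaire recipients.
donors = [
    {"region": "South", "structure": "single-family", "heat_fuel": "gas",
     "water_fuel": "gas", "ac_type": "central", "income": 3},
    {"region": "West", "structure": "single-family", "heat_fuel": "gas",
     "water_fuel": "gas", "ac_type": "none", "income": 5},
]
recipients = [
    {"region": "South", "structure": "single-family", "heat_fuel": "gas",
     "water_fuel": "gas", "ac_type": "central"},
    {"region": "West", "structure": "apartment", "heat_fuel": "electricity",
     "water_fuel": "electricity", "ac_type": "none"},
]
imputed = hot_deck_impute(recipients, donors, "income")
```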

Table B2. Household Questionnaire Items Most Frequently Imputed in the 1997 RECS

Imputed Item                                        Cases    Percentage  Method of           Question Number
                                                    Imputed  of Total    Imputing            on Questionnaire
                                                             Sample(a)
                                                             (5,721)
Income in past 12 months                            1,016    17.8        Hot deck            J-14a
Year home was built                                 395      6.9         Hot deck            A-15a
Age of water-heating equipment                      348      6.1         Deductive/Hot deck  E-4
Way household used central AC equipment             297      5.2         Hot deck            F-6a
Number of children between the ages of 1 and 12     250      4.4         Hot deck            J-1e
Number of infants under the age of 1                238      4.2         Hot deck            J-1d
Way household used Window/Wall AC equipment         149      2.6         Hot deck            F-11
Use programmable or manual features of thermostat   126      2.2         Hot deck            F-6b
Fuel used to heat hot water                         122      2.1         Hot deck            E-1
Electricity shut off because bill was not paid      120      2.1         Hot deck            K-4
Could not use heat because ran out of bulk fuel     120      2.1         Hot deck            K-5a
Could not use heat because utility fuel shut off    199      2.1         Hot deck            K-5b
Could not use heat because equipment broken         119      2.1         Hot deck            K-5c
Amount of heat provided by main heating equipment   108      1.9         Hot deck            D-6
Type of self-cleaning oven                          104      1.8         Hot deck            B-3
Received employment income in last 12 months        103      1.8         Hot deck            K-1a
Received retirement income in last 12 months        103      1.8         Hot deck            K-1b
Received cash benefits in last 12 months            103      1.8         Hot deck            K-1c
Received non-cash benefits in last 12 months        103      1.8         Hot deck            K-1d
Government help in paying home heating costs        102      1.8         Hot deck            K-2a
Government help in paying home cooling costs        102      1.8         Hot deck            K-2b
Government help in paying other home energy costs   102      1.8         Hot deck            K-2c
Amount of wood burning in past 12 months            97       1.7         Hot deck            H-7d
Age of householder                                  93       1.6         Allocative          J-9
Amount of heating assistance received               82       1.4         Hot deck            K-3d

     (a) Mailed interviews are not included in the percentage. To account for these, add 3 percentage points to the percentage points given.
     Source: Energy Information Administration, Office of Energy Markets and End Use, Form EIA-457 A of the 1997 Residential Energy Consumption Survey (RECS). RECS Public Use Data Files.
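
For readers who want to reproduce the percentages in Table B2, the arithmetic can be sketched as follows. The function names are hypothetical; the base of 5,721 and the 3-point addition for mailed questionnaires come from the table header and its footnote.

```python
PERSONAL_INTERVIEW_SAMPLE = 5721   # base shown in the Table B2 header
MAIL_ADJUSTMENT_POINTS = 3.0       # footnote: add 3 percentage points


def percent_imputed(cases: int, sample_size: int = PERSONAL_INTERVIEW_SAMPLE) -> float:
    """Imputed cases as a percentage of the sample, to one decimal place."""
    return round(100.0 * cases / sample_size, 1)


def percent_including_mail(cases: int) -> float:
    """Percentage after the footnote's adjustment for mailed questionnaires."""
    return round(percent_imputed(cases) + MAIL_ADJUSTMENT_POINTS, 1)


income_pct = percent_imputed(1016)                  # income in past 12 months
income_pct_with_mail = percent_including_mail(1016)
```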

The use of CAPI techniques allowed EIA to program skip patterns, edit checks, and range checks into the questionnaire. As a result, the quality of the data collected during the interview improved and the amount of time needed to edit and clean the data was reduced. Some of this improvement can be attributed to the fact that the 1997 RECS questionnaire was shorter than the 1993 RECS questionnaire. But the switch to CAPI did result in cleaner data. For example, the data collected during the paper and pencil interviews for the 1993 RECS resulted in 40 variables with more than 100 cases where there were missing data. On the other hand, the data collected during the CAPI interviews for the 1997 RECS resulted in only 22 variables with more than 100 cases where there were missing data.

The questions on both income and year home was built have resulted in a substantial amount of missing data for each RECS. The 1997 RECS was no exception. The large amount of missing data for the age of the water-heating equipment, the number of children, and the number of infants was caused by errors in the skip patterns in the CAPI questionnaire. The plans for the 1997 RECS questionnaire included a question concerning the use of evaporative or swamp coolers in housing units located in hot, dry areas of the country and a question concerning the use of automobile block heaters in cold areas of the country, but errors in the skip patterns forced the CAPI instrument to skip these questions for all households.

 

Quality of Specific Data Items

Housing Unit Type

There is a fine line between the definitions of various types of housing units. The distinction between a single-family attached unit and a unit in an apartment building is particularly complex. The collection and editing of the data on housing type changed from the paper-and-pencil questionnaire for the 1993 RECS to the CAPI questionnaire for the 1997 RECS. The change in the data collection and editing procedures may have contributed to changes in the survey results. For example, the estimated number of occupied single-family attached units increased from 7.3 million for the 1993 RECS to 10.0 million for the 1997 RECS. Conversely, the number of occupied housing units in buildings with two to four units decreased from 8.0 million for the 1993 RECS to 5.6 million for the 1997 RECS.


Programmable (Set-Back or Clock) Thermostats

The 1993 and 1997 RECS both contained questions on the presence of a programmable thermostat. In both surveys, the thermostats were referred to as "set-back or clock thermostats," but not programmable thermostats. For the 1993 RECS, the question was placed in the section on conservation measures and usage (following questions on insulation, weather stripping, and caulking). For the 1997 RECS, it was placed in the space-heating section, immediately following the question on the presence of a thermostat. The 1997 RECS also included a question that asked respondents if they programmed the thermostat or used the manual features. Based on the 1993 RECS, an estimated 10.8 million households had programmable thermostats in 1993. Based on the 1997 RECS, an estimated 33.1 million households had programmable thermostats in 1997. Of these 33.1 million, an estimated 10.2 million programmed their thermostats and an estimated 22.9 million used the manual features.

The large increase in the number of housing units with programmable thermostats from 1993 to 1997 is questionable. The change in the placement of the question may have contributed to the large change in the survey results. In addition, the question concerning programmed versus manual use of the thermostats may have changed how the interviewers coded the question on the presence of a programmable thermostat.


Estimation of Sampling Error

Sampling error is the random difference between a survey estimate and an actual population value. It occurs because the survey estimate is calculated from a randomly chosen subset of the entire population. The sampling error averaged over all possible samples would be zero, but there is only one sample for the 1997 RECS. Therefore, the sampling error is not zero and is unknown for the 1997 RECS sample. However, the sample design permits sampling errors to be estimated. This section describes how the sampling errors were estimated and how they were made available to readers of this report who are interested in the precision of the estimates in this report.

Throughout this report, standard errors are given as percents of their estimated values; that is, as relative standard errors (RSE). The RSE is also known as the coefficient of variation.

For a given population parameter Y that is estimated by the survey statistic Y', the relative standard error of Y', RSE(Y'), and the standard error of Y', S(Y'), are related by:

RSE(Y') = [S(Y')/Y'] × 100

and, equivalently,

S(Y') = [RSE(Y')/100] × Y'.
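
The two relationships above are inverses of each other, which a short Python sketch makes concrete. The function names and the example figures are hypothetical.

```python
def relative_standard_error(estimate: float, standard_error: float) -> float:
    """RSE in percent: (standard error / estimate) x 100."""
    if estimate == 0:
        raise ValueError("RSE is undefined for an estimate of zero")
    return standard_error / estimate * 100.0


def standard_error_from_rse(estimate: float, rse_percent: float) -> float:
    """Standard error: (RSE / 100) x estimate."""
    return rse_percent / 100.0 * estimate


# Hypothetical estimate of 10.0 million households with a standard error of 0.3 million.
rse = relative_standard_error(10.0, 0.3)
se_back = standard_error_from_rse(10.0, rse)
```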

For some surveys, a convenient algebraic formula for computing variances can be obtained. However, the RECS used a multistage area sample design of such complexity (see Appendix A, "How the Survey Was Conducted") that it is virtually impossible to construct an exact algebraic expression for estimating variances. In particular, convenient formulas based on an assumption of simple random sampling, typical of most standard statistical packages, are inappropriate for the RECS estimates. Such formulas tend to give low values for standard errors, making the estimates appear much more accurate than is the case. Instead, the method used to estimate sampling variances for this survey was balanced half-sample replication. The balanced half-sample replication method involves calculating the value for a statistic using the full sample and calculating the value for each of a systematic set of half samples. (Each half sample contains approximately one-half of the observations contained in the full sample.) The variance is estimated using the differences between the value of the statistic calculated using the full sample and the values of the statistic calculated using each of the half samples.
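
The idea can be sketched in Python for a simplified design. The sketch assumes two PSUs per stratum, a weighted total as the statistic, a half-sample pattern taken from a Sylvester Hadamard matrix, and the common textbook estimator (the mean squared difference between half-sample and full-sample values). The text does not state that the RECS used exactly this form, and the data are hypothetical.

```python
from math import sqrt


def sylvester_hadamard(min_order: int):
    """Hadamard matrix of +1/-1 entries whose order is a power of two >= min_order."""
    h = [[1]]
    while len(h) < min_order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_variance(strata):
    """Balanced half-sample variance of a weighted total.

    strata: list of (total_a, total_b), the weighted totals of the two PSUs
    in each stratum. Each half sample keeps one PSU per stratum and doubles it.
    """
    full = sum(a + b for a, b in strata)
    hadamard = sylvester_hadamard(len(strata) + 1)
    squared_diffs = []
    for row in hadamard:
        # Column 0 is all ones and is skipped; stratum s uses column s + 1.
        half = sum(2 * (a if row[s + 1] == 1 else b)
                   for s, (a, b) in enumerate(strata))
        squared_diffs.append((half - full) ** 2)
    return full, sum(squared_diffs) / len(squared_diffs)


# Hypothetical weighted PSU totals for three strata.
estimate, variance = brr_variance([(10, 14), (20, 17), (8, 9)])
rse_percent = 100.0 * sqrt(variance) / estimate
```

Because the half samples are balanced, the result for a total equals the sum over strata of the squared difference between the two PSU totals, which is a convenient check on the construction.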

Generalized Variances

For every estimate in this report, the RSE was computed by the balanced half-sample replication method. This RSE was used for any statistical tests or confidence intervals given in the text, or to determine if the estimate was too inaccurate to publish (RSE greater than 50 percent).

Space limitations prevent publishing the complete set of RSEs with this document. Instead, a generalized variance technique is provided, by which the reader can compute an approximate RSE for each of the estimates in the detailed tables. For the statistic in the ith row and jth column of a particular table, the approximate RSE is given by:


RSE(i,j) = R(i) × C(j)

where R(i) is the RSE row factor given in the last column of row i, and C(j) is the RSE column factor given at the top of column j. This value for the relative standard error can be used to construct confidence intervals and to perform hypothesis tests by standard statistical methods. However, because the generalized variance procedure gives only approximate RSEs, such confidence intervals and statistical tests must also be regarded as only approximate.
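
As a worked illustration, the approximate RSE and an approximate confidence interval can be computed as sketched below. The row and column factors and the estimate are hypothetical, the 1.96 multiplier is the usual normal-theory value for a 95-percent interval, and the 50-percent publication threshold is the one stated earlier in this section.

```python
PUBLICATION_THRESHOLD = 50.0  # estimates with RSE above 50 percent were not published


def approximate_rse(row_factor: float, column_factor: float) -> float:
    """Generalized-variance approximation: RSE(i,j) = R(i) x C(j), in percent."""
    return row_factor * column_factor


def approximate_confidence_interval(estimate: float, rse_percent: float, z: float = 1.96):
    """Approximate interval: estimate +/- z x standard error."""
    standard_error = rse_percent / 100.0 * estimate
    return estimate - z * standard_error, estimate + z * standard_error


def too_imprecise_to_publish(rse_percent: float) -> bool:
    return rse_percent > PUBLICATION_THRESHOLD


# Hypothetical table cell: estimate of 100.0 with R(i) = 8.0 and C(j) = 1.25.
rse_ij = approximate_rse(8.0, 1.25)
low, high = approximate_confidence_interval(100.0, rse_ij)
```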


Contact:
Eileen M. O'Brien
RECS Survey Manager
Phone: (202) 586-1122
Fax: (202) 586-0018
Page last modified on 02/25/2004
