

2001 Data Quality

All the statistics published in the RECS tables are estimates of population values, such as the number of households using natural gas. These estimates are based on a randomly chosen subset of the entire population of households. The universe includes all households in the 50 States and the District of Columbia, including households on military installations.

The differences between the estimated values and the actual population values are due to two types of errors: sampling errors and nonsampling errors.

  • Sampling errors are random differences between the survey estimate and the population value. They occur because the survey estimate is calculated from a randomly chosen subset of the entire population. Averaged over all possible samples, the sampling error would be zero. However, there is only one sample for the 2001 RECS, so the sampling error for that particular sample is nonzero and unknown. The sample design does permit sampling errors to be estimated. The section Estimation of Sampling Error describes how the sampling errors for the 2001 RECS were calculated.

  • Nonsampling errors arise from sources of variability other than the sampling process. They would be expected to occur in all possible samples, and therefore also in the average of all estimates from all possible samples. Nonsampling errors can result from:

    • Inaccuracies in data collection: due to questionnaire design errors, interviewer error, respondent misunderstanding, and data processing error;
    • Unit nonresponse: when an entire sampled household does not participate in the survey; and
    • Item nonresponse: when a particular item of interest is missing from a completed questionnaire.

The sections Adjustments for Unit Nonresponse, Adjustments for Item Nonresponse, and Quality of Specific Data Items give more detail on nonsampling errors in the 2001 RECS.
 

Adjustments for Unit Nonresponse

Weight adjustment was used to reduce unit nonresponse bias in the survey statistics. Weights were calculated for each sample household. The household weight reflected the selection probability for that household and additional adjustments. These adjustments included correcting for potential biases arising from the failure to list all housing units in the sample area and failure to contact all sample housing units.

Six factors are used in the processing of Residential Energy Consumption Survey (RECS) results to develop an overall weight for each household for which a completed questionnaire, either a personal interview or mailed questionnaire, is obtained.

The factors are:

  • The basic weight
  • A noninterview adjustment
  • A first-stage ratio estimate
  • Three second-stage ratio adjustments

The overall household weight is the product of these six factors.
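Because the overall weight is a plain product, the combination step can be sketched in a few lines of Python. This is an illustrative sketch, not EIA's processing code. The function name `overall_weight` and its argument layout are hypothetical, and the factor values in the example are made up.

```python
from math import prod
from typing import Sequence


def overall_weight(basic: float, niaf: float, first_stage: float,
                   second_stage: Sequence[float]) -> float:
    """Overall household weight as the product of the six RECS factors.

    Hypothetical helper: `second_stage` holds the three second-stage
    ratio adjustment factors, one per adjustment step.
    """
    if len(second_stage) != 3:
        raise ValueError("expected exactly three second-stage ratio factors")
    return basic * niaf * first_stage * prod(second_stage)


# Made-up factor values for a single household.
w = overall_weight(basic=20_000.0, niaf=1.2, first_stage=0.97,
                   second_stage=(1.03, 0.99, 1.01))
```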


The Basic Weight
The basic weight is calculated and applied to households at the Secondary Sampling Unit (SSU) level.

Basic Weight = 1/ (Probability of Selection)

For the 2001 RECS, all households in the same SSU had the same probability of selection and hence the same basic weight.
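The basic-weight rule can be sketched in Python, assuming a hypothetical mapping from SSU identifiers to selection probabilities. Every household inherits its SSU's weight, which mirrors the statement that all households in an SSU share one basic weight. The identifiers and probabilities below are illustrative only.

```python
def basic_weights(ssu_prob, household_ssu):
    """Basic weight = 1 / (probability of selection), assigned per SSU.

    ssu_prob: {ssu_id: selection probability}   (hypothetical structure)
    household_ssu: {household_id: ssu_id}
    Every household in an SSU inherits that SSU's basic weight.
    """
    for ssu, p in ssu_prob.items():
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid selection probability for SSU {ssu!r}: {p}")
    return {hh: 1.0 / ssu_prob[ssu] for hh, ssu in household_ssu.items()}


# Illustrative probabilities only.
weights = basic_weights({"SSU-A": 1 / 12_000, "SSU-B": 1 / 8_000},
                        {"hh1": "SSU-A", "hh2": "SSU-A", "hh3": "SSU-B"})
```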


The Noninterview Adjustment

The noninterview adjustment factor (NIAF) compensates for nonresponding households and for nonhousehold units identified during the survey. In effect, the adjustment is the ratio of completed plus uncompleted responses among the selected units to completed responses. Because the probabilities of selection were constant within an SSU for 2001, these adjustments were applied at the SSU level.

The NIAF is computed at the SSU and is equal to:

(Total Completed Plus Uncompleted Responses in the SSU) / (Completed Responses in the SSU)

If the ratio exceeds 2.0, the NIAF is set to 2.0, and the NIAFs for other SSUs in the same Primary Sampling Unit (PSU) with the same metropolitan status are increased to compensate.
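The NIAF calculation and the 2.0 cap can be sketched in Python. The text does not say how much the other SSUs' NIAFs are raised. This sketch therefore assumes one plausible rule: the weight removed by capping is spread proportionally over the uncapped SSUs in the same PSU and metropolitan-status group, so the group's adjusted weighted total is unchanged. That rule, the dictionary keys, and the function names are all hypothetical.

```python
NIAF_CAP = 2.0  # maximum NIAF stated in the text


def niaf_ratio(completed: int, uncompleted: int) -> float:
    """Uncapped ratio (completed + uncompleted) / completed for one SSU."""
    if completed <= 0:
        raise ValueError("SSU has no completed responses")
    return (completed + uncompleted) / completed


def group_niafs(ssus, cap=NIAF_CAP):
    """NIAFs for the SSUs of one PSU x metropolitan-status group.

    ssus: list of dicts with hypothetical keys 'basic_weight',
    'completed', and 'uncompleted'.
    ASSUMPTION: weight lost by capping is spread proportionally over the
    uncapped SSUs so the group's adjusted weighted total is unchanged.
    """
    ratios = [niaf_ratio(s["completed"], s["uncompleted"]) for s in ssus]
    factors = [min(r, cap) for r in ratios]
    # Weighted count of completed households in each SSU.
    base = [s["basic_weight"] * s["completed"] for s in ssus]
    target = sum(b * r for b, r in zip(base, ratios))
    capped = sum(b * f for b, f, r in zip(base, factors, ratios) if r > cap)
    free = sum(b * f for b, f, r in zip(base, factors, ratios) if r <= cap)
    if free > 0:
        k = (target - capped) / free  # uplift for the uncapped SSUs
        factors = [f * k if r <= cap else f for f, r in zip(factors, ratios)]
    return factors


group = [
    {"basic_weight": 100.0, "completed": 10, "uncompleted": 15},  # ratio 2.5, capped
    {"basic_weight": 100.0, "completed": 20, "uncompleted": 5},   # ratio 1.25
]
factors = group_niafs(group)
```

In this example, the first SSU is capped at 2.0 and the second SSU's factor rises from 1.25 to 1.5. This keeps the group's weighted total at 5,000. A production rule would also need to handle uplifts that push an uncapped SSU past the cap.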


The First-Stage Ratio Adjustment Factors

The primary purpose of the first-stage adjustment factor is to reduce the sampling variation in the estimates of the number of housing units by main space-heating fuel resulting from sampling of PSUs during the first stage of the sample design. The correlation between main space-heating fuel and other important energy-related characteristics implies that this adjustment will also reduce the sampling variation for many important variables collected for the RECS.

In some cases, a single PSU comprising all or part of a large metropolitan area was large enough in population to be a stratum by itself. PSUs of this type are called Self-Representing (SR) PSUs because the sample from each SR PSU represents only that PSU. The first-stage ratio adjustment factor was 1.0 for all observations in SR PSUs.

In other strata, one PSU was selected from among two or more PSUs in the stratum. Each PSU selected from these strata is called a Non-Self-Representing (NSR) PSU because it represents not only itself but also the unselected PSUs in its stratum.

The 1990 Census data were used to determine the difference between the distribution of the main space-heating fuel in the set of selected NSR PSUs and the distribution in the set of all PSUs (selected and unselected) in the strata from which the NSR PSUs are selected. Fuels are under-represented if the percentage of households using the fuel is lower in the selected NSR PSUs than the percentage in the set of all PSUs in the NSR strata. Fuels are over-represented if the opposite occurs. The weights for the responding households in NSR PSUs are adjusted upward when their main space-heating fuel is under-represented and the weights are adjusted downward when it is over-represented.
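The up-or-down logic can be sketched in Python as a ratio of fuel shares. This is a simplification that assumes the factor for each main space-heating fuel is the percentage in all PSUs of the NSR strata divided by the percentage in the selected NSR PSUs. The actual RECS computation may work with weighted counts rather than percentages. Function names and the example percentages are hypothetical.

```python
def first_stage_factors(pct_all_nsr_strata, pct_selected_nsr_psus):
    """Hypothetical first-stage ratio factors by main space-heating fuel.

    Both arguments map fuel -> percentage of households using that fuel
    (1990 Census): in all PSUs of the NSR strata, and in the selected NSR PSUs.
    A factor above 1 raises weights for an under-represented fuel; a factor
    below 1 lowers them for an over-represented fuel.
    """
    return {fuel: pct_all_nsr_strata[fuel] / pct_selected_nsr_psus[fuel]
            for fuel in pct_all_nsr_strata}


def first_stage_factor(fuel, psu_is_self_representing, factors):
    """SR PSUs always get 1.0; NSR PSUs get the factor for the household's fuel."""
    return 1.0 if psu_is_self_representing else factors[fuel]


factors = first_stage_factors(
    {"natural gas": 52.0, "electricity": 28.0, "fuel oil": 20.0},  # made-up %
    {"natural gas": 48.0, "electricity": 32.0, "fuel oil": 20.0},
)
```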


The Second-Stage Ratio Adjustments

The second-stage ratio adjustments use data from the Bureau of the Census as control totals to improve the accuracy of the estimated number of households. The RECS can produce its own estimate of the number of households in the country, but the Bureau of the Census produces much more accurate estimates. Improving the accuracy of the household counts also improves the accuracy of almost all other RECS estimates. The first priority is accurate estimates of the number of households for the nine Census divisions and for the four largest States. The second priority is accurate estimates of the number of households for three demographic cells: multiperson households, single-member female households, and single-member male households.

The ratio adjustment process was carried out in three steps:

  • Step One: The population was divided into 15 geographical cells. (Alaska and Hawaii were treated as separate cells because their climates differ from that of the rest of the country.) Control totals giving the number of households in each cell were derived from Current Population Survey results. For each cell, a ratio adjustment was computed: the control total divided by the weighted count of households, using the weights after the first-stage ratio adjustment. Multiplying those weights by the ratio yields new weights whose sums equal the control totals for the 15 cells. This calculation yielded a weighted total of 106,990,000 households. Table B1 gives the control totals for each of the 15 geographical cells.
  • Step Two: The control totals for the three demographic cells (multiperson households, single-member female households, and single-member male households) were applied in the same way, using the weights from Step One as input.
  • Step Three: The same as Step One, except that the input weights are those resulting from Step Two. This produced a set of weights whose sums reproduced the 15 geographic-cell control totals and yielded estimates quite close to the control totals for the three demographic cells.
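The three steps can be sketched in Python as repeated cell-by-cell ratio scaling. The sequence resembles a truncated form of raking (iterative proportional fitting) that ends on the geographic margin. The toy example uses two hypothetical geographic cells in place of the 15 real ones, and all names and numbers are illustrative.

```python
from collections import defaultdict


def ratio_adjust(weights, cells, controls):
    """Scale weights so the weighted count in each cell equals its control total."""
    totals = defaultdict(float)
    for w, c in zip(weights, cells):
        totals[c] += w
    return [w * controls[c] / totals[c] for w, c in zip(weights, cells)]


def second_stage(weights, geo_cells, demo_cells, geo_controls, demo_controls):
    """Three-step second-stage ratio adjustment as described in the text."""
    w = ratio_adjust(weights, geo_cells, geo_controls)   # Step One: geographic cells
    w = ratio_adjust(w, demo_cells, demo_controls)       # Step Two: demographic cells
    return ratio_adjust(w, geo_cells, geo_controls)      # Step Three: geographic again


weights = [10.0, 20.0, 30.0, 40.0]  # weights after the first-stage adjustment
geo = ["North", "North", "South", "South"]
demo = ["multi", "single_f", "multi", "single_m"]
w = second_stage(weights, geo, demo, {"North": 100.0, "South": 50.0},
                 {"multi": 80.0, "single_f": 40.0, "single_m": 30.0})
```

Because Step Three ends on the geographic cells, those totals are reproduced exactly. The demographic totals are matched only approximately, which is the trade-off the text describes.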

Table B1. Control Totals for Ratio Adjustment of Sampling in the 2001 RECS

| Location | Thousands of Households |
|---|---:|
| New England | 5,407 |
| Middle Atlantic (minus New York State) | 7,766 |
| East North Central | 17,091 |
| West North Central | 7,400 |
| South Atlantic (minus Florida) | 13,986 |
| East South Central | 6,818 |
| West South Central (minus Texas) | 4,133 |
| Mountain | 6,725 |
| Pacific (minus Alaska, California, and Hawaii) | 3,603 |
| New York | 7,081 |
| Florida | 6,328 |
| Texas | 7,669 |
| California | 12,347 |
| Alaska | 227 |
| Hawaii | 409 |
| Total United States | 106,990 |
Source: EIA's linear extrapolation from U.S. Bureau of the Census, 2000 and 2001 Current Population Survey.
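Step One forces the weights to reproduce each cell's control total. The weighted U.S. total after that step should therefore equal the sum of the 15 cells. A quick Python check confirms that the table's cells add up to the published total:

```python
# Table B1 control totals, in thousands of households.
CONTROL_TOTALS = {
    "New England": 5_407,
    "Middle Atlantic (minus New York State)": 7_766,
    "East North Central": 17_091,
    "West North Central": 7_400,
    "South Atlantic (minus Florida)": 13_986,
    "East South Central": 6_818,
    "West South Central (minus Texas)": 4_133,
    "Mountain": 6_725,
    "Pacific (minus Alaska, California, and Hawaii)": 3_603,
    "New York": 7_081,
    "Florida": 6_328,
    "Texas": 7_669,
    "California": 12_347,
    "Alaska": 227,
    "Hawaii": 409,
}

total = sum(CONTROL_TOTALS.values())
print(len(CONTROL_TOTALS), total)  # 15 106990
```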

Adjustments for Item Nonresponse

Item nonresponse occurs:
  • When respondents do not know the answer or refuse to answer a question or
  • When an interviewer does not ask a question or does not record an answer.

The incidence of the latter, the interviewer not asking and/or not recording the answer, was greatly reduced by the use of Computer Assisted Personal Interviewing (CAPI). The majority of nonresponse was due to interviewers recording answers of “don’t know” and “refused.”

Methods of Imputation

Values were imputed for missing items in otherwise completed RECS Household Questionnaires. Two imputation methods were used in the 2001 RECS: “hot-deck” and “deductive” procedures.

Hot-Deck: This procedure requires sorting the file of household data by variables related to the missing item. A household with the same values of the related variables is then selected, and this “donor” household supplies the missing value to the “donee” household. For example, consider a six-room, single-family detached housing unit with two full bathrooms, three household members, and a household income of $50,000 per year. It could serve as the donor for a donee housing unit that has the same characteristics but is missing a value for annual household income. The donee would then be assigned an income of $50,000.
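A sequential hot-deck can be sketched in Python, using the income example from the paragraph above. This sketch assumes one common variant: each record with a missing item takes the value of the most recent donor in sort order with identical matching variables. The field names are hypothetical, and the exact RECS donor-selection rules are not given here. Records with no earlier donor in their class stay missing. Real procedures typically collapse classes to handle that case.

```python
def hot_deck(records, match_vars, item):
    """Sequential hot-deck imputation sketch.

    records: list of dicts (hypothetical field names); missing values are None.
    match_vars: fields that must agree between donor and donee.
    Each missing `item` takes the value of the most recent donor, in sorted
    order, with the same match_vars values. Records are updated in place.
    """
    def key(rec):
        return tuple(rec[v] for v in match_vars)

    last_donor = {}
    for rec in sorted(records, key=key):  # stable sort keeps file order within a class
        k = key(rec)
        if rec[item] is not None:
            last_donor[k] = rec[item]
        elif k in last_donor:
            rec[item] = last_donor[k]
            rec[item + "_imputed"] = "hot deck"
    return records


households = [
    {"rooms": 6, "full_baths": 2, "unit_type": "single-family detached",
     "hh_size": 3, "income": 50_000},   # donor
    {"rooms": 6, "full_baths": 2, "unit_type": "single-family detached",
     "hh_size": 3, "income": None},     # donee
]
hot_deck(households, ["rooms", "full_baths", "unit_type", "hh_size"], "income")
```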

Deductive: This procedure uses information available from the RECS questionnaire, or from related external data sources such as utility bills and the Rental Agent Survey, to logically deduce a value for the missing item. For example, if a respondent reports not using air-conditioning at all, a value of zero can be deduced for a missing response to the question about the number of air-conditioned rooms.
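The air-conditioning example can be written as a deductive rule in Python. The field names `uses_ac` and `rooms_ac` are hypothetical, and the rule fires only when the answer it depends on was actually given.

```python
def deduce_rooms_air_conditioned(rec):
    """Deductive imputation sketch (hypothetical field names).

    If the respondent reports not using air-conditioning at all, the number
    of air-conditioned rooms must be zero.
    """
    if rec.get("rooms_ac") is None and rec.get("uses_ac") is False:
        rec["rooms_ac"] = 0
        rec["rooms_ac_imputed"] = "deductive"
    return rec


no_ac = deduce_rooms_air_conditioned({"uses_ac": False, "rooms_ac": None})
has_ac = deduce_rooms_air_conditioned({"uses_ac": True, "rooms_ac": None})
```

When `uses_ac` is true, the rule cannot deduce a value. That case would fall to another method, such as hot-deck.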

Most Frequently Imputed Household Questionnaire Items

Table B2 presents the household questionnaire items most frequently imputed in the 2001 RECS. In addition to these 13 items, responses were imputed for 126 other questionnaire items. Of these 126 items, 34 involved imputations for 11 to 45 survey cases (between 0.23 percent and 0.93 percent of the sample). The remaining 92 involved imputations for 10 or fewer cases (less than 0.23 percent).

Table B2. Household Questionnaire Items Most Frequently Imputed in the 2001 RECS

| Imputed Item | Cases Imputed | Percentage of Total Sample | Imputation Method | Question Number |
|---|---:|---:|---|---|
| Household income in past 12 months | 487 | 10.1 | Hot Deck | J-14 |
| Year housing unit was built | 483 | 10.0 | Hot Deck | A-15 |
| Were all rooms heated last winter | 189 | 3.9 | Deductive/Hot Deck | D-10 |
| Number of rooms not heated last winter | 189 | 3.9 | Deductive/Hot Deck | D-10a |
| Natural gas available in neighborhood | 186 | 3.9 | Deductive/Hot Deck | A-17 |
| Use a separate built-in range top or burners | 117 | 2.4 | Deductive | B-1 |
| Range top or burners fuel | 115 | 2.4 | Deductive | B-1b1/B-1d |
| Use a separate built-in oven | 106 | 2.2 | Deductive | B-1 |
| Oven fuel | 104 | 2.2 | Deductive | B-1b2/B-1f |
| Fuel used to heat hot water | 88 | 1.8 | Hot Deck | E-1 |
| Amount of heat provided by main space heating equipment | 60 | 1.2 | Hot Deck | D-6 |
| Entire housing unit air conditioned | 55 | 1.1 | Deductive | F-3 |
| Number of rooms air conditioned | 51 | 1.1 | Deductive | F-3a |

Source: Energy Information Administration, Office of Energy Markets and End Use, Form EIA-457A of the 2001 Residential Energy Consumption Survey (RECS). RECS Public-Use Data Files.
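The percentage column in Table B2, and the thresholds quoted above it, equal cases imputed divided by the 4,822 housing units in the 2001 RECS sample (the figure given under Housing Unit Type). A short Python check reproduces several of them:

```python
SAMPLE_SIZE = 4_822  # housing units in the 2001 RECS


def pct_of_sample(cases, digits=1, n=SAMPLE_SIZE):
    """Imputed cases as a rounded percentage of the total sample."""
    return round(100 * cases / n, digits)


# Spot-check Table B2 rows against their published percentages.
for item, cases, published in [("Household income", 487, 10.1),
                               ("Year built", 483, 10.0),
                               ("Natural gas available", 186, 3.9),
                               ("Rooms air conditioned", 51, 1.1)]:
    assert pct_of_sample(cases) == published, item

# Thresholds for the 126 less frequently imputed items.
print(pct_of_sample(11, digits=2), pct_of_sample(45, digits=2))  # 0.23 0.93
```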

Comparison with Previous Surveys: The incidence of item nonresponse in the 2001 RECS is consistent with that in the 1997 RECS, when imputations were performed for 145 variables. Of those variables, 6 had missing data for 100 or more cases and 80 had missing data for 10 or fewer cases.

Stove and Oven Questions: Of particular note in the above table are the imputations for “use a separate built-in range top or burners” and “use a separate built-in oven.” Question B-1 of the 2001 Household Questionnaire presented respondents with a list of cooking appliances and asked them to report all that they had. The list included:

  • a stove that has both burners and one or two ovens;
  • a separate built-in range top or burners;
  • a separate built-in oven; and
  • a built-in or stove-top grill.

The 1997 questionnaire did not ask about built-in or stove-top grills, and it asked about each of the other three items in a separate question.

This change in questionnaire format produced an unusually large number of respondents who reported having a built-in or stove-top grill but did not report also having a stove (115 cases) or an oven (104 cases). These housing units were compared with those reporting a built-in or stove-top grill together with both a stove and an oven. The comparison showed that the results were highly unusual and likely due to item nonresponse. Accordingly, it was deduced, and imputed, that these housing units did in fact have a stove and an oven.

Respondents who did not report having a stove or an oven were not asked about the fuel used by these appliances, so the fuel was also imputed. Among housing units that reported having both a stove and an oven, the same fuel was used for both appliances in 91 percent of cases. Accordingly, the fuel for the imputed stoves and ovens was deduced to be the same as the fuel used by the built-in or stove-top grill.
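The deductive rule for grills, stoves, and ovens can be sketched in Python. The sketch also applies the “not asked” coding for oven self-cleaning follow-ups described in the next paragraph. All field names are hypothetical.

```python
def impute_cooking(rec):
    """Sketch of the stove/oven deductive rule (hypothetical field names).

    When a built-in or stove-top grill is reported but a stove (range top or
    burners) or an oven is not, the missing appliance is imputed as present and
    its fuel is set to the grill's fuel. Oven self-cleaning follow-ups were
    never asked in these cases, so they are recorded as "not asked".
    """
    if not rec.get("has_grill"):
        return rec
    for appliance in ("stove", "oven"):
        if not rec.get(f"has_{appliance}"):
            rec[f"has_{appliance}"] = True
            rec[f"{appliance}_fuel"] = rec["grill_fuel"]
            rec[f"{appliance}_imputed"] = "deductive"
            if appliance == "oven":
                rec["oven_self_cleaning"] = "not asked"
    return rec


grill_only = impute_cooking({"has_grill": True, "grill_fuel": "natural gas",
                             "has_stove": False, "has_oven": False})
has_oven = impute_cooking({"has_grill": True, "grill_fuel": "electricity",
                           "has_stove": False, "has_oven": True,
                           "oven_fuel": "electricity"})
```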

Self Cleaning Features: Finally, the follow-up questions about the presence and type of oven self-cleaning features, normally asked whenever an oven was reported, were also not asked in the 104 cases where a built-in or stove-top grill was reported but an oven was not. No value was imputed for these cases. Instead, a response of “not asked” was recorded on the data file.

Quality of Specific Data Items


This section addresses some of the difficulties encountered in trying to obtain meaningful energy data on specific Household Questionnaire items in the 2001 RECS.

Housing Unit Type

Historically, how a housing unit was characterized in the RECS, e.g., whether it was a single-family detached unit, a single-family attached unit, a mobile home, an apartment in a 2-4 unit building, or an apartment in a building with more than 4 units, was the work of the interviewer. Upon the interviewer’s arrival at the selected housing unit they completed a summary sheet that included their observation as to the type of housing unit. In addition, the interviewers also recorded the number of floors and units in apartment buildings with more than 4 units. There was no independent verification of how the unit was characterized by either the householder or others.

There are two exceptions to these procedures. In the 1997 RECS, the householder, as part of the on-site, in-person interview, characterized the type of housing unit they lived in. The householder also provided the interviewer with the number of floors in apartment buildings with more than 4 units. The interviewer accepted the householder’s responses without question.

In the 2001 RECS, the interviewer again became responsible for characterizing the type of housing unit. For apartment buildings with more than 4 units, the interviewer also recorded the number of floors and units in the building. These observations were recorded before the formal on-site, in-person household interview began. However, in the 2001 interview the householder was asked to confirm the interviewer’s characterization of the housing unit. If the householder disagreed, they were asked for their own characterization, and the interviewer resolved the contradiction. Of the 4,822 housing units in the 2001 RECS, the householder disagreed with the interviewer’s characterization in 36 cases (0.75 percent). Of these, 19 were re-characterized based on the householder’s input.

The change in the data collection procedures may have contributed to changes in the survey results. For example, the estimated number of occupied single-family detached units decreased from 63.8 million for the 1997 RECS to 63.1 million for the 2001 RECS. Conversely, the number of occupied housing units in buildings with two to four units increased from 5.6 million for the 1997 RECS to 9.5 million for the 2001 RECS.


Programmable (Set-Back or Clock) Thermostats

In the space-heating section of the 1997 RECS, respondents who reported having a thermostat were asked two follow-up questions: whether it was a set-back or clock thermostat, and whether they actually programmed it or used only its manual features. An estimated 44.9 million households reported having set-back or clock (programmable) thermostats in 1997. Of these, an estimated 11.7 million households reported programming their thermostats, and an estimated 33.2 million reported using only the manual features. The very large number of households with programmable thermostats was questionable in itself. It was even more so compared with the 10.8 million households reporting a programmable thermostat in the 1993 RECS, which placed a comparable question in the conservation measures and usage section of the questionnaire.

The change in the placement of the question in the 1997 RECS may have contributed to the large change in the survey results. In addition, the question concerning programmed versus manual use of the thermostats may have changed how the interviewers coded the question on the presence of a programmable thermostat.

In the 2001 RECS questionnaire, the wording of the follow-up question after the respondent reported having a thermostat was revised to more explicitly describe the type of thermostat. Specifically, the question asked: “Is that thermostat programmable? That is, can you set it so that the temperature setting automatically changes at the times of the day or night that you choose?” In response, 25.1 million households in the 2001 RECS reported that they had such a thermostat. This number of households is substantially lower than the 44.9 million households that reported having this type of thermostat in the 1997 RECS and substantially higher than the 10.8 million households in the 1993 RECS.

The 2001 estimate of the number of programmable thermostats is probably the most accurate of the three attempts to determine the actual number in U.S. housing units. Nevertheless, some response error is inevitable, and its effect on accuracy cannot be quantified. For example, respondents asked about a programmable thermostat may understand “programming” differently than the drafters of the question intended.


Estimation of Sampling Error

Sampling error is the random difference between a survey estimate and an actual population value. It occurs because the survey estimate is calculated from a randomly chosen subset of the entire population. The sampling error averaged over all possible samples would be zero, but there is only one sample for the 2001 RECS. Therefore, the sampling error is not zero and is unknown for the 2001 RECS sample. However, the sample design permits sampling errors to be estimated.

This section describes how the sampling errors were estimated and how they were made available to readers of RECS tables and analyses who are interested in the precision of the estimates in the RECS tables.


Relative Standard Errors (RSEs)

Throughout the RECS tables, standard errors are given as percents of their estimated values; that is, as relative standard errors (RSE). The RSE is also known as the coefficient of variation.

For a population parameter $Y$ estimated by the survey statistic $\hat{Y}$, the relative standard error $\mathrm{RSE}(\hat{Y})$ and the standard error $S(\hat{Y})$ are related by:

$$\mathrm{RSE}(\hat{Y}) = \frac{S(\hat{Y})}{\hat{Y}} \times 100$$

$$S(\hat{Y}) = \frac{\mathrm{RSE}(\hat{Y})}{100} \times \hat{Y}$$
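The two formulas are inverses of each other. Here they are as a minimal Python sketch (the function names are hypothetical, and the example RSE is made up):

```python
def rse_from_se(se, estimate):
    """RSE in percent: S(Y-hat) / Y-hat x 100."""
    return se / estimate * 100.0


def se_from_rse(rse, estimate):
    """Standard error from an RSE given in percent: RSE / 100 x Y-hat."""
    return rse / 100.0 * estimate


# Hypothetical: an estimate of 25.1 (million households) with a 4 percent RSE.
se = se_from_rse(4.0, 25.1)  # about 1.0 million households
```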

For some surveys, a convenient algebraic formula for computing variances can be obtained. However, the RECS used a multistage area sample design of such complexity (see Survey Methods) that it is virtually impossible to construct an exact algebraic expression for estimating variances. In particular, convenient formulas based on an assumption of simple random sampling, typical of most standard statistical packages, are inappropriate for the RECS estimates. Such formulas tend to give low values for standard errors, making the estimates appear much more accurate than is the case.


Balanced Half-Sample Replication
Instead, the method used to estimate sampling variances for the RECS was balanced half-sample replication. The balanced half-sample replication method involves calculating the value for a statistic using the full sample and calculating the value for each of a systematic set of half samples. (Each half sample contains approximately one-half of the observations contained in the full sample.) The variance is estimated using the differences between the value of the statistic calculated using the full sample and the values of the statistic calculated using each of the half samples.

For every estimate in the RECS tables, the RSE was computed by the balanced half-sample replication method. This RSE was used for any statistical tests or confidence intervals given in the text, or to determine if the estimate was too inaccurate to publish (RSE greater than 50 percent).
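The replication idea, together with the 50 percent publication rule, can be sketched in Python. This is a minimal sketch under textbook assumptions that the text does not confirm for the RECS. It assumes each stratum splits into exactly two half-sample units, uses a Sylvester-type Hadamard matrix to choose balanced half samples, and takes a weighted total as the statistic. All names and data are hypothetical.

```python
import math


def sylvester_hadamard(n):
    """n x n Hadamard matrix (+1/-1 entries) for n a power of 2."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def brr_total(pairs):
    """Balanced half-sample variance of an estimated total.

    pairs: one (y_a, y_b) tuple per stratum, the weighted totals of the two
    half-sample units in that stratum (hypothetical two-units-per-stratum layout).
    Returns (full-sample estimate, estimated variance).
    """
    n_rep = 1
    while n_rep < len(pairs):
        n_rep *= 2
    h = sylvester_hadamard(n_rep)  # orthogonal columns provide the balance
    full = sum(a + b for a, b in pairs)
    sq_dev = 0.0
    for r in range(n_rep):
        # Half sample r keeps unit a where h[r][s] is +1, else unit b, and
        # doubles its weight so the half sample estimates the full total.
        half = sum(2 * (a if h[r][s] == 1 else b) for s, (a, b) in enumerate(pairs))
        sq_dev += (half - full) ** 2
    return full, sq_dev / n_rep


def rse_percent(estimate, variance):
    return 100.0 * math.sqrt(variance) / estimate


pairs = [(510.0, 490.0), (300.0, 320.0), (205.0, 195.0)]  # made-up data
total, var = brr_total(pairs)
rse = rse_percent(total, var)
publish = rse <= 50.0  # estimates with RSE above 50 percent are withheld
```

For a linear statistic like this total, the replicate variance equals the sum of squared within-stratum differences, here 900. The benefit of replication is that the same recipe applies unchanged to ratios, means, and other nonlinear statistics, where no simple formula exists.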


Generalized Variances
Instead of publishing the complete set of RSEs, a generalized variance technique is provided. With it, the reader can compute an approximate RSE for each estimate in the detailed RECS tables. For the statistic in the ith row and jth column of a particular table, the approximate RSE is given by:

RSE (i,j) = R(i) × C(j)

where R(i) is the RSE row factor given in the last column of row i, and C(j) is the RSE column factor given at the top of column j.

This value for the relative standard error can be used to construct confidence intervals and to perform hypothesis tests by standard statistical methods. However, because the generalized variance procedure gives only approximate RSEs, such confidence intervals and statistical tests must also be regarded as only approximate.
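Using a table's row and column factors can be sketched in Python: compute the approximate RSE, convert it to a standard error, and form a confidence interval. The sketch assumes a normal approximation at the 95 percent level (z = 1.96). The row and column factors in the example are hypothetical, not values from any RECS table.

```python
def approx_rse(row_factor, col_factor):
    """Generalized-variance RSE (percent) for cell (i, j): R(i) x C(j)."""
    return row_factor * col_factor


def approx_ci(estimate, rse, z=1.96):
    """Approximate confidence interval from an RSE given in percent.

    z = 1.96 assumes a normal approximation at the 95 percent level.
    """
    se = rse / 100.0 * estimate
    return estimate - z * se, estimate + z * se


rse = approx_rse(1.2, 2.5)        # hypothetical row and column factors
low, high = approx_ci(25.1, rse)  # e.g., an estimate of 25.1 million households
```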

Contact:
Eileen M. O'Brien
RECS Survey Manager
Phone: (202) 586-1122
Fax: (202) 586-0018
URL: http://www.eia.doe.gov/emeu/recs/recs2001/appendixb.html
