2001 Data Quality

All the statistics published in the RECS tables are estimates of population values, such as the number of households using natural gas. These estimates are based on a randomly chosen subset of the entire population of households. The universe includes all households in the 50 States and the District of Columbia, including households on military installations. The differences between the estimated values and the actual population values are due to two types of errors: sampling errors and nonsampling errors.
The sections Adjustments for Unit Nonresponse, Adjustments for Item Nonresponse, and Quality of Specific Data Items provide more details about nonsampling errors in the 2001 RECS.
Weight adjustment was used to reduce unit nonresponse bias in the survey statistics. Weights were calculated for each sample household. The household weight reflected the selection probability for that household and additional adjustments. These adjustments included correcting for potential biases arising from the failure to list all housing units in the sample area and failure to contact all sample housing units. Six factors are used in the processing of Residential Energy Consumption Survey (RECS) results to develop an overall weight for each household for which a completed questionnaire, either a personal interview or mailed questionnaire, is obtained. The factors are:
The overall household weight is the product of these six factors.
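As a minimal sketch, the overall weight is the product of the six factors. The individual factors are not listed in this section, so the factor values below are hypothetical placeholders. `overall_weight` is an illustrative name and is not part of any RECS software.

```python
import math

def overall_weight(factors):
    """Return the overall household weight as the product of its six factors.

    The factor values passed in are assumed to be positive adjustment factors;
    this sketch does not know what each of the six RECS factors represents.
    """
    if len(factors) != 6:
        raise ValueError("expected exactly six weighting factors")
    if any(f <= 0 for f in factors):
        raise ValueError("weighting factors must be positive")
    return math.prod(factors)  # math.prod requires Python 3.8+

# Hypothetical factors for one responding household (not actual RECS values).
example_factors = [2000.0, 1.05, 1.10, 0.98, 1.02, 1.01]
weight = overall_weight(example_factors)
```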
The basic weight is calculated and applied to households at the Secondary Sampling Unit (SSU) level.
Table B1. Control Totals for Ratio Adjustment of Sampling in the 2001 RECS

| Location | Thousands of Households |
|---|---|
| New England | 5,407 |
| Middle Atlantic (minus New York State) | 7,766 |
| East North Central | 17,091 |
| West North Central | 7,400 |
| South Atlantic (minus Florida) | 13,986 |
| East South Central | 6,818 |
| West South Central (minus Texas) | 4,133 |
| Mountain | 6,725 |
| Pacific (minus Alaska, California, and Hawaii) | 3,603 |
| New York | 7,081 |
| Florida | 6,328 |
| Texas | 7,669 |
| California | 12,347 |
| Alaska | 227 |
| Hawaii | 409 |
| Total United States | 106,990 |

Source: EIA's linear extrapolation from U.S. Bureau of the Census, 2000 and 2001 Current Population Survey.
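Table B1 lists the control totals used in the ratio adjustment. This section does not spell out the adjustment formula, so the sketch below assumes a standard cell-based ratio adjustment. Under that assumption, every weight in a control cell is multiplied by the ratio of the cell's control total to the sum of the sample weights in that cell. The control totals come from Table B1. The function `ratio_adjust` and the household weights are hypothetical.

```python
# Control totals from Table B1, in thousands of households.
CONTROL_TOTALS = {
    "New England": 5407,
    "Middle Atlantic (minus New York State)": 7766,
    "East North Central": 17091,
    "West North Central": 7400,
    "South Atlantic (minus Florida)": 13986,
    "East South Central": 6818,
    "West South Central (minus Texas)": 4133,
    "Mountain": 6725,
    "Pacific (minus Alaska, California, and Hawaii)": 3603,
    "New York": 7081,
    "Florida": 6328,
    "Texas": 7669,
    "California": 12347,
    "Alaska": 227,
    "Hawaii": 409,
}
THOUSAND = 1000

def ratio_adjust(households, control_totals):
    """Assumed ratio adjustment: scale weights so each cell hits its control total.

    `households` is a list of (cell, weight) pairs, with weights expressed as
    numbers of households. Every cell present in the sample gets one factor.
    """
    cell_sums = {}
    for cell, weight in households:
        cell_sums[cell] = cell_sums.get(cell, 0.0) + weight
    adjusted = []
    for cell, weight in households:
        factor = control_totals[cell] * THOUSAND / cell_sums[cell]
        adjusted.append((cell, weight * factor))
    return adjusted

# The 15 control cells add up to the published U.S. total.
assert sum(CONTROL_TOTALS.values()) == 106990

# Hypothetical sample: two households in Alaska, one in Hawaii.
sample = [("Alaska", 100000.0), ("Alaska", 80000.0), ("Hawaii", 350000.0)]
adjusted = ratio_adjust(sample, CONTROL_TOTALS)
```

After adjustment, the weights in each cell sum to that cell's control total, and the relative sizes of weights within a cell are preserved.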
The incidence of the latter, the interviewer not asking and/or not recording the answer, was greatly reduced by the use of Computer Assisted Personal Interviewing (CAPI). The majority of nonresponse was due to interviewers recording answers of “don’t know” and “refused.”
Values were imputed for missing question items in otherwise completed RECS Household Questionnaires. Two imputation methods were used in the 2001 RECS: “hot-deck” and “deductive” procedures.
Hot-Deck: This procedure sorts the file of household data by variables related to the missing item. A “donor” household with the same values of the related variables is then selected, and it supplies the missing value to the “donee” household. For example, a six-room, single-family detached housing unit with two full bathrooms, three household members, and a household income of $50,000 per year would serve as the donor for a similar housing unit (the donee) that has all the same characteristics but a missing value for the annual household income item.
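The hot-deck steps can be sketched as follows. The exact sort variables and donor-selection rules used in the 2001 RECS are not given here, so this is an assumed sequential hot deck. It sorts by the matching variables. Each donee then takes the value of the nearest preceding donor in the same class, and falls back to the first donor in that class if none precedes it. The field names (`rooms`, `full_baths`, `type`, `size`, `income`) and the `imputed` flag are hypothetical.

```python
def hot_deck_impute(records, match_keys, item):
    """Assumed sequential hot deck: fill missing `item` from a matching donor.

    Records are dicts and None marks a missing response. Returns sorted copies;
    donees with no donor in their class are left missing.
    """
    def cls(r):
        return tuple(r[k] for k in match_keys)

    # Stable sort on the matching variables (file order is kept within a class).
    ordered = sorted((dict(r) for r in records), key=cls)

    # First reported value in each class, used when no donor precedes a donee.
    first_donor = {}
    for r in ordered:
        if r[item] is not None:
            first_donor.setdefault(cls(r), r[item])

    last_donor = {}
    for r in ordered:
        c = cls(r)
        if r[item] is None:
            donor_value = last_donor.get(c, first_donor.get(c))
            if donor_value is not None:
                r[item] = donor_value
                r["imputed"] = True
        else:
            last_donor[c] = r[item]  # only reported values act as donors
    return ordered

# Hypothetical records mirroring the example in the text.
households = [
    {"id": 1, "type": "single-family detached", "rooms": 6, "full_baths": 2, "size": 3, "income": 50000},
    {"id": 2, "type": "single-family detached", "rooms": 6, "full_baths": 2, "size": 3, "income": None},
    {"id": 3, "type": "apartment", "rooms": 4, "full_baths": 1, "size": 1, "income": 30000},
]
result = hot_deck_impute(households, ["type", "rooms", "full_baths", "size"], "income")
```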
Deductive: This procedure uses information available from the RECS questionnaire, or from related external data sources such as utility bills and the Rental Agent Survey, that permits a value for the missing item to be logically deduced. For example, if a respondent reports not using air-conditioning at all, a value of zero can be deduced for a missing response to the question on the number of air-conditioned rooms in the home.
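A deductive edit is a rule that fills a missing value only when other answers determine it logically. The sketch below encodes the air-conditioning example above. The field names `uses_air_conditioning`, `rooms_air_conditioned`, and `rooms_air_conditioned_method` are hypothetical.

```python
def deduce_rooms_air_conditioned(record):
    """Deductive edit (hypothetical fields): no AC use implies zero AC rooms.

    Returns a new dict; the value is filled only when the rule applies, and
    every other missing value is left for a different imputation method.
    """
    out = dict(record)
    if out.get("rooms_air_conditioned") is None and out.get("uses_air_conditioning") is False:
        out["rooms_air_conditioned"] = 0
        out["rooms_air_conditioned_method"] = "deductive"
    return out

no_ac = deduce_rooms_air_conditioned({"uses_air_conditioning": False, "rooms_air_conditioned": None})
uses_ac = deduce_rooms_air_conditioned({"uses_air_conditioning": True, "rooms_air_conditioned": None})
```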
Table B2 presents the household questionnaire items most frequently imputed in the 2001 RECS. In addition to these 13 items, responses were imputed for 126 other questionnaire items. Of these 126 items, 34 involved imputations for 11 to 45 survey cases (between 0.23 and 0.93 percent of the sample), and 92 involved imputations for 10 or fewer cases (less than 0.23 percent).
Table B2. Household Questionnaire Items Most Frequently Imputed in the 2001 RECS

| Imputed Item | Cases Imputed | Percentage of Total Sample | Imputation Method | Question Number |
|---|---|---|---|---|
| Household income in past 12 months | 487 | 10.1 | Hot Deck | J-14 |
| Year housing unit was built | 483 | 10.0 | Hot Deck | A-15 |
| Were all rooms heated last winter | 189 | 3.9 | Deductive/Hot Deck | D-10 |
| Number of rooms not heated last winter | 189 | 3.9 | Deductive/Hot Deck | D-10a |
| Natural gas available in neighborhood | 186 | 3.9 | Deductive/Hot Deck | A-17 |
| Use a separate built-in range top or burners | 117 | 2.4 | Deductive | B-1 |
| Range top or burners fuel | 115 | 2.4 | Deductive | B-1b1/B-1d |
| Use a separate built-in oven | 106 | 2.2 | Deductive | B-1 |
| Oven fuel | 104 | 2.2 | Deductive | B-1b2/B-1f |
| Fuel used to heat hot water | 88 | 1.8 | Hot Deck | E-1 |
| Amount of heat provided by main space heating equipment | 60 | 1.2 | Hot Deck | D-6 |
| Entire housing unit air conditioned | 55 | 1.1 | Deductive | F-3 |
| Number of rooms air conditioned | 51 | 1.1 | Deductive | F-3a |

Source: Energy Information Administration, Office of Energy Markets and End Use, Form EIA-457A of the 2001 Residential Energy Consumption Survey (RECS). RECS Public-Use Data Files.
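The "Percentage of Total Sample" column is the number of imputed cases divided by the sample size. Using the 4,822 housing units reported for the 2001 RECS, the short check below recomputes the table's percentages. It also recomputes the 0.23 and 0.93 percent thresholds quoted for the other 126 items. The dictionary and function names are illustrative.

```python
SAMPLE_SIZE = 4822  # housing units in the 2001 RECS

# Cases imputed for the 13 items in Table B2, in table order.
CASES_IMPUTED = {
    "Household income in past 12 months": 487,
    "Year housing unit was built": 483,
    "Were all rooms heated last winter": 189,
    "Number of rooms not heated last winter": 189,
    "Natural gas available in neighborhood": 186,
    "Use a separate built-in range top or burners": 117,
    "Range top or burners fuel": 115,
    "Use a separate built-in oven": 106,
    "Oven fuel": 104,
    "Fuel used to heat hot water": 88,
    "Amount of heat provided by main space heating equipment": 60,
    "Entire housing unit air conditioned": 55,
    "Number of rooms air conditioned": 51,
}

def percent_of_sample(cases, n=SAMPLE_SIZE):
    """Imputed cases as a percentage of the total sample, to one decimal place."""
    return round(100.0 * cases / n, 1)

table_percentages = {item: percent_of_sample(c) for item, c in CASES_IMPUTED.items()}

# Bounds quoted in the text for items with 11 to 45 imputed cases.
low = 100.0 * 11 / SAMPLE_SIZE
high = 100.0 * 45 / SAMPLE_SIZE
```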
Comparison with Previous Surveys: The incidence of item nonresponse in the 2001 RECS is consistent with the rate experienced in 1997, when imputations were performed for 145 variables. Of these variables, 6 contained missing data for 100 or more cases, and 80 contained missing data for 10 or fewer cases.
Stove and Oven Questions: Of particular note in Table B2 are the imputations for “use a separate built-in range top or burners” and “use a separate built-in oven.” Question B-1 of the 2001 Household Questionnaire presented respondents with a list of cooking appliances and asked them to report all that they had. The list included a stove that has both burners and one or two ovens, a separate built-in range top or burners, a separate built-in oven, and a built-in or stove-top grill. The 1997 questionnaire did not ask about built-in or stove-top grills, and respondents were asked about each of the other three items in a separate question.
As a result of this change in questionnaire format, an unusually large number of respondents who reported having a built-in or stove-top grill did not report also having a stove (115 cases) or an oven (104 cases). These housing units were compared with those reporting a built-in or stove-top grill together with both a stove and an oven. The comparison indicated that the results were highly unusual and likely due to item nonresponse. Accordingly, it was deduced and imputed that these housing units did in fact have a stove and an oven. Respondents who did not report having a stove or an oven were not asked about the fuel used by these appliances, so the fuel was also imputed. Among housing units that reported having both a stove and an oven, the same fuel was used for both appliances in 91 percent of the cases. Accordingly, the fuel used by the stoves and ovens was deduced to be the same one used by the built-in or stove-top grills.
Self-Cleaning Features: Finally, the follow-up questions about the presence and type of self-cleaning oven features, which were asked whenever an oven was reported, were not asked in the 104 cases where a built-in or stove-top grill was reported but an oven was not. No response value was imputed for these cases. Instead, a response of “not asked” was recorded on the data file.
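The stove, oven, and self-cleaning handling described above can be sketched as a single deductive edit. The sketch treats the stove and oven independently, because the text reports different case counts for each. It assumes that a fuel was recorded for the grill. All field names (`has_grill`, `grill_fuel`, `has_stove`, `stove_fuel`, `has_oven`, `oven_fuel`, `oven_self_cleaning`) are hypothetical.

```python
def impute_cooking(record):
    """Deductive edit for grill-only reports (hypothetical field names).

    If a built-in or stove-top grill was reported, a missing stove or oven is
    imputed as present and given the grill's fuel. The oven's self-cleaning
    follow-up was never asked, so it is coded "not asked" instead of imputed.
    """
    out = dict(record)
    if out.get("has_grill"):
        if not out.get("has_stove"):
            out["has_stove"] = True
            out["stove_fuel"] = out["grill_fuel"]
        if not out.get("has_oven"):
            out["has_oven"] = True
            out["oven_fuel"] = out["grill_fuel"]
            out["oven_self_cleaning"] = "not asked"
    return out

grill_only = impute_cooking(
    {"has_grill": True, "grill_fuel": "natural gas", "has_stove": False, "has_oven": False}
)
no_grill = impute_cooking({"has_grill": False, "has_stove": True, "stove_fuel": "electricity", "has_oven": False})
```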
Historically, characterizing a housing unit in the RECS was the work of the interviewer. The categories were a single-family detached unit, a single-family attached unit, a mobile home, an apartment in a 2-4 unit building, or an apartment in a building with more than 4 units. On arriving at the selected housing unit, the interviewer completed a summary sheet that included the interviewer's observation of the type of housing unit. For apartment buildings with more than 4 units, the interviewer also recorded the number of floors and units. Neither the householder nor anyone else independently verified this characterization.
There were two exceptions to this procedure. In the 1997 RECS, the householder characterized the type of housing unit as part of the on-site, in-person interview. The householder also gave the interviewer the number of floors in apartment buildings with more than 4 units. The interviewer accepted the householder’s responses without question.
In the 2001 RECS, responsibility for characterizing the type of housing unit was returned to the interviewer, who recorded these observations before beginning the formal on-site, in-person household interview. This included, for apartment buildings with more than 4 units, the number of floors and units in the building. However, in 2001 the householder was also asked to confirm the interviewer’s characterization of the housing unit. A householder who disagreed was asked for his or her own characterization, and the interviewer resolved any contradictions between the two. Householders in 0.75 percent (36 cases) of the 4,822 housing units included in the 2001 RECS disagreed with the interviewer’s characterization, and 19 of these cases were re-characterized based on the householder’s input.
The change in the data collection procedures may have contributed to changes in the survey results. For example, the estimated number of occupied single-family detached units decreased from 63.8 million for the 1997 RECS to 63.1 million for the 2001 RECS. Conversely, the number of occupied housing units in buildings with two to four units increased from 5.6 million for the 1997 RECS to 9.5 million for the 2001 RECS.
In the space-heating section of the 1997 RECS, respondents who reported having a thermostat in their home were asked two follow-up questions: whether it was a set-back or clock thermostat, and whether they actually programmed it or used only its manual features. An estimated 44.9 million households reported having set-back or clock (programmable) thermostats in 1997. Of these, an estimated 11.7 million households reported that they programmed their thermostats, and an estimated 33.2 million reported that they used only the manual features. The very large number of households with programmable thermostats was questionable in itself. It was even more so when compared with the 10.8 million households reporting a programmable thermostat in the 1993 RECS, where a comparable question was included in the conservation measures and usage section of the questionnaire.
The change in the placement of the question in the 1997 RECS may have contributed to the large change in the survey results. In addition, the question concerning programmed versus manual use of the thermostats may have changed how the interviewers coded the question on the presence of a programmable thermostat.
In the 2001 RECS questionnaire, the wording of the follow-up question after the respondent reported having a thermostat was revised to more explicitly describe the type of thermostat. Specifically, the question asked: “Is that thermostat programmable? That is, can you set it so that the temperature setting automatically changes at the times of the day or night that you choose?” In response, 25.1 million households in the 2001 RECS reported that they had such a thermostat. This number of households is substantially lower than the 44.9 million households that reported having this type of thermostat in the 1997 RECS and substantially higher than the 10.8 million households in the 1993 RECS.
The 2001 estimate of the number of programmable thermostats is probably the most accurate of the three attempts to determine the actual number in U.S. housing units. Nevertheless, some response error is inevitable, and its effect on accuracy cannot be quantified. For example, respondents may understand “programming” a thermostat differently than the drafters of the question intended.
Sampling error is the random difference between a survey estimate and an actual population value. It occurs because the survey estimate is calculated from a randomly chosen subset of the entire population. Averaged over all possible samples, the sampling error would be zero, but the 2001 RECS has only one sample. The sampling error of a given 2001 RECS estimate is therefore generally not zero, and its actual value is unknown. However, the sample design permits the sampling errors to be estimated.
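A toy enumeration shows why the sampling error averages to zero over all possible samples even though a single sample usually misses the true value. The sketch uses simple random sampling without replacement from a small, fully known hypothetical population. The RECS design is far more complex, so this only illustrates the idea. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction
from itertools import combinations

def sampling_errors(population, n):
    """Sampling error of the sample mean for every possible sample of size n.

    Assumes simple random sampling without replacement, so each combination
    of n population members is an equally likely sample.
    """
    true_mean = Fraction(sum(population), len(population))
    return [Fraction(sum(s), n) - true_mean for s in combinations(population, n)]

population = [3, 7, 8, 12, 20]  # hypothetical population values (mean 10)
errors = sampling_errors(population, 2)
average_error = sum(errors) / len(errors)
```

Individual samples miss the true mean (the sample {3, 7} has an error of -5), but the average error over all 10 possible samples is exactly zero.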
This section describes how the sampling errors were estimated and how they were made available to readers of RECS tables and analyses who are interested in the precision of the estimates in the RECS tables.
Throughout the RECS tables, standard errors are given as percents of their estimated values; that is, as relative standard errors (RSE). The RSE is also known as the coefficient of variation.
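For a given population parameter $Y$ that is estimated by the survey statistic $\hat{Y}$, the relative standard error of $\hat{Y}$, $\mathrm{RSE}(\hat{Y})$, and the standard error of $\hat{Y}$, $S(\hat{Y})$, are related by:

$$\mathrm{RSE}(\hat{Y}) = \frac{S(\hat{Y})}{\hat{Y}} \times 100$$

$$S(\hat{Y}) = \frac{\mathrm{RSE}(\hat{Y})}{100} \times \hat{Y}$$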
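The two conversions above are inverses of each other. A minimal sketch, with illustrative function names and a hypothetical estimate and RSE:

```python
def relative_standard_error(std_error, estimate):
    """RSE in percent: the standard error as a percentage of the estimate."""
    return 100.0 * std_error / estimate

def standard_error(rse_percent, estimate):
    """Standard error recovered from an RSE expressed in percent."""
    return rse_percent * estimate / 100.0

# Hypothetical estimate of 50.0 (e.g., million households) with a 4 percent RSE.
se = standard_error(4.0, 50.0)
rse = relative_standard_error(se, 50.0)
```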
For some surveys, a convenient algebraic formula for computing variances can be obtained. However, the RECS used a multistage area sample design of such complexity (see Survey Methods) that it is virtually impossible to construct an exact algebraic expression for estimating variances. In particular, convenient formulas based on an assumption of simple random sampling, typical of most standard statistical packages, are inappropriate for the RECS estimates. Such formulas tend to give low values for standard errors, making the estimates appear much more accurate than is the case.
For every estimate in the RECS tables, the RSE was computed by the balanced half-sample replication method. This RSE was used for any statistical tests or confidence intervals given in the text, or to determine if the estimate was too inaccurate to publish (RSE greater than 50 percent).
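Balanced half-sample replication (also called balanced repeated replication, BRR) is a standard variance technique. The sketch below shows its textbook form for an estimated total. It assumes exactly two PSUs per stratum and uses a Sylvester-type Hadamard matrix to pick the half-samples. Each replicate keeps one PSU per stratum and doubles its weight, and the variance is the mean squared deviation of the replicate estimates from the full-sample estimate. The strata, replicate count, and estimators actually used for the RECS are not described here, so the function names and PSU totals are hypothetical. The 50 percent publication cutoff comes from the text.

```python
import math

def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of 2), Sylvester construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h

def brr_total(strata):
    """BRR variance of an estimated total.

    `strata` is a list of (y1, y2) pairs: the weighted totals of the two
    sampled PSUs in each stratum. Returns (estimate, variance, rse_percent).
    """
    n_strata = len(strata)
    order = 1
    while order <= n_strata:  # column 0 is all +1, so one extra column is needed
        order *= 2
    h = sylvester_hadamard(order)
    estimate = sum(y1 + y2 for y1, y2 in strata)
    squared_devs = []
    for row in h:
        # Column k + 1 chooses the PSU kept in stratum k (+1 -> first, -1 -> second).
        replicate = sum(2 * (y1 if row[k + 1] == 1 else y2)
                        for k, (y1, y2) in enumerate(strata))
        squared_devs.append((replicate - estimate) ** 2)
    variance = sum(squared_devs) / len(squared_devs)
    rse = 100.0 * math.sqrt(variance) / estimate
    return estimate, variance, rse

def is_publishable(rse_percent):
    """Estimates with an RSE greater than 50 percent are too inaccurate to publish."""
    return rse_percent <= 50.0

strata = [(10.0, 12.0), (20.0, 17.0), (5.0, 5.5)]  # hypothetical PSU totals
estimate, variance, rse = brr_total(strata)
```

For a total with a fully orthogonal balanced design, the replicate variance equals the sum over strata of $(y_{h1} - y_{h2})^2$, which is 13.25 for these hypothetical values.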
$$\mathrm{RSE}(i,j) = R(i) \times C(j)$$

where $R(i)$ is the RSE row factor given in the last column of row $i$, and $C(j)$ is the RSE column factor given at the top of column $j$.
This value of the relative standard error can be used to construct confidence intervals and perform hypothesis tests by standard statistical methods. However, because the generalized variance procedure gives only approximate RSEs, such confidence intervals and statistical tests must also be regarded as approximate.
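The generalized variance lookup and its use in an interval can be sketched together. The row and column factors and the estimate below are hypothetical, not values from any RECS table. The multiplier z = 1.96 is an assumption that gives a nominal 95 percent interval under a normal approximation. As noted above, the result is only approximate.

```python
def rse_from_factors(row_factor, column_factor):
    """Approximate RSE (percent) for table cell (i, j): R(i) * C(j)."""
    return row_factor * column_factor

def approximate_confidence_interval(estimate, rse_percent, z=1.96):
    """Approximate interval: estimate plus or minus z standard errors.

    The standard error is recovered from the RSE; z = 1.96 assumes a normal
    approximation and a nominal 95 percent confidence level.
    """
    se = rse_percent / 100.0 * estimate
    return estimate - z * se, estimate + z * se

# Hypothetical factors and estimate (not taken from any RECS table).
rse = rse_from_factors(1.5, 2.0)
low, high = approximate_confidence_interval(50.0, rse)
```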