1996
Introduction
The data contained in these Profiles, Summary Tape File tables and Public Use Microdata
Samples are based on the American Community Survey (ACS) sample interviewed in 1996. The
ACS is designed to provide accurate estimates for the housing units and household
population of the four sites participating in the 1996 ACS. The ACS, like any other
statistical activity, is subject to error. The purpose of this documentation is to provide
data users with a basic understanding of the ACS sample design, estimation methodology,
and accuracy of the ACS data.
Sample Design
- Sites -- Three urban sites: Multnomah County/Portland, OR, Rockland County, NY and
Brevard County, FL -- and one rural site: Fulton County, PA, participated in the 1996
ACS.
The primary sampling unit was the housing unit, including all occupants. Persons living in
group quarters were not included in the sample.
- Master Address File -- In the urban sites the Census Bureau developed a Master Address
File (MAF) that served as the housing unit sampling frame for the survey. The MAF was
constructed by an automated match of the sites' 1990 Census Address Control File and the
United States Postal Service 1995 Delivery Sequence File. For the Fulton County site the
Census Bureau compiled a sampling frame by canvassing and listing each housing unit in the
county, then computerizing the list.
- Sampling Rates -- An automated procedure designated the sample units. Two sampling rates
were employed. Most of the housing units selected into the survey were sampled at a rate
of 15 percent. For functioning governmental units in which there were fewer than 1,000
housing units on the sampling frame, the sampling rate was 30 percent. All of Fulton
County and a few governmental units in the three urban sites were sampled at 30 percent. A
variable sampling rate was used for the purpose of providing relatively more reliable
estimates for small areas.
- Public Use Microdata Sample -- A stratified systematic sample with probability equal to
the housing unit weight was used to select units for the public use microdata sample
(PUMS) file. Three sampling rates were used to account for subsampling for the field
personal visit cases and for the furlough adjustment. Housing units were stratified to
improve the reliability of estimates derived from the PUMS. The PUMS was selected to
represent roughly 5% of the population universe for each site.
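The two-rate rule described under "Sampling Rates" can be sketched as a small function. This is only an illustration: the function name and its argument are hypothetical, and it does not model any site-specific exceptions beyond the "fewer than 1,000 housing units" threshold stated in the text.

```python
def sampling_rate(housing_units_in_governmental_unit: int) -> float:
    """Return the sampling rate described in the text for a governmental unit.

    Assumption: the only input is the number of housing units on the
    sampling frame for the functioning governmental unit.
    """
    if housing_units_in_governmental_unit < 0:
        raise ValueError("housing unit count cannot be negative")
    # Fewer than 1,000 housing units: 30 percent; otherwise 15 percent.
    return 0.30 if housing_units_in_governmental_unit < 1000 else 0.15
```

For instance, a governmental unit with 999 housing units on the frame would fall under the 30 percent rate, while one with 1,000 would fall under the 15 percent rate.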
Data Collection
Three data collection modes were used to conduct the 1996 ACS: Mail, Computer Assisted
Telephone Interviewing (CATI), and Computer Assisted Personal Interviewing (CAPI). These
three modes are described below.
- Mail Phase -- The Mail phase began with a prenotice letter mailed to each housing unit
on the next to last Wednesday of the month preceding the sample month. The ACS
questionnaire was mailed one week later, followed by a reminder card one week after that.
A replacement questionnaire was mailed three weeks later if the original questionnaire had
not yet been checked in at the processing site. Check-in of mail return questionnaires for
a sample panel was cut off at the start of the third month following the sample month.
- CATI Phase -- Approximately five weeks after the mailout of the original ACS
questionnaire, the CATI staff began contacting non-responding sample households by
telephone. Late mail returns were removed from the CATI workload on a daily basis. This
phase of nonresponse follow-up lasted for approximately four weeks.
- CAPI Phase -- The CAPI universe consisted of all outstanding non-response cases
remaining after the completion of the CATI phase. A 1 in 3 subsample was selected from the
outstanding cases and forwarded to the field interviewers. Field interviewers visited each
assigned housing unit and attempted to conduct an interview. Late mail returns were
removed from the CAPI workload on a daily basis. The CAPI phase of nonresponse follow-up
lasted approximately one month.
Confidentiality of the Data
- Confidentiality Edit -- To maintain the confidentiality required by law (Title 13,
United States Code), the Census Bureau applies a confidentiality edit to the ACS data to
assure that published data do not disclose information about specific individuals,
households, or housing units. As a result, a small amount of uncertainty is introduced
into the estimates of ACS characteristics. The sample itself provides adequate protection
for most areas for which sample data are published since the resulting data are estimates
of the actual characteristics. However, small areas require more protection. The
confidentiality edit is implemented by identifying a subset of individual housing units
from the sample data files as having a unique combination of specified person and
household characteristics within a block group. The confidentiality edit is controlled so
that the basic structure of the data is preserved.
Errors in the Data
- Sampling Error -- The data in the ACS products are estimates of the actual figures that
would have been obtained by interviewing the entire population using the same methodology.
The estimates from the chosen sample also differ from other samples of housing units and
persons within those housing units. Sampling error in data arises due to the use of
probability sampling, which is necessary to ensure the integrity and representativeness of
sample survey results. The implementation of statistical sampling procedures provides the
basis for the statistical analysis of sample data.
- Nonsampling Error -- In addition to sampling error, data users should realize that other
types of errors may be introduced during any of the various complex operations used to
collect and process survey data. For example, operations such as editing, reviewing, or
keying data from questionnaires may introduce error into the estimates. These and other
sources of error contribute to the nonsampling error component of the total error of
survey estimates. Nonsampling errors may affect the data in two ways. Errors that are
introduced randomly increase the variability of the data. Systematic errors which are
consistent in one direction introduce bias into the results of a sample survey. The Census
Bureau protects against the effect of systematic errors on survey estimates by conducting
extensive research and evaluation programs on sampling techniques, questionnaire design,
and data collection and processing procedures. In addition, an important goal of the ACS
is to minimize the amount of nonsampling error introduced through nonresponse for sample
housing units. One way of accomplishing this is by following up on mail nonrespondents
during the CATI and CAPI phases.
- Standard Errors -- The standard error is a measure of the deviation of a sample estimate
from the average of all possible samples. Sampling errors and some types of nonsampling
errors are estimated by the standard error. The sample estimate and its estimated standard
error permit the construction of interval estimates with a prescribed confidence that the
interval includes the average result of all possible samples. The next section describes
the method of calculating standard errors and confidence intervals for the estimates in
this ACS product.
Calculation of Standard Errors
Direct Standard Errors
- Methodology Used -- Direct estimates of the standard errors were calculated for all
estimates reported in this product. They are provided in the Profiles and in tables of
Summary Tape File estimates. The standard errors, in most cases, are calculated using
standard variance estimation software using a methodology that takes into account the
sample design and estimation procedures.
- Exceptions -- There are four cases for which the direct standard error estimates are not
appropriate.
- The estimate of the number or proportion of people, households, housing
units or families in a geographic area with a specific characteristic is zero. A special
procedure was used to estimate the standard error.
- There are no sample observations available to compute an estimate of a proportion or per
capita amount or an estimate of its standard error. The estimate is represented in the
tables by "--" and the standard error estimate by "**".
- Only a small number of identical values are reported and used to calculate an
aggregate, median, mean or per capita amount. In this case, there are too few sample
observations to compute a stable estimate of the standard error. The standard error
estimate is represented in the tables by "*".
- The estimate of the number of people having a specified characteristic is controlled
to be equal to an independently derived population estimate at the county level. These
county estimates are produced by the Census Bureau's Population Estimates Program using
standard demographic analysis techniques. For these cases the standard error is zero. (See
"Estimation Procedure" for a further explanation.)
- Sums and Differences -- The standard errors estimated from these tables are for
individual estimates. Additional calculations are required to estimate the standard errors
for sums of and differences between two sample estimates. The estimate of the standard
error of a sum or difference is approximately the square root of the sum of the two
individual standard errors squared; that is, for standard errors SE(X) and SE(Y) of
estimates X and Y:
\[ SE(X + Y) = SE(X - Y) = \sqrt{[SE(X)]^2 + [SE(Y)]^2} \]
This method, however, will underestimate (overestimate) the standard error if the two
items in a sum are highly positively (negatively) correlated or if the two items in a
difference are highly negatively (positively) correlated.
- Ratios -- Frequently, the statistic of interest is the ratio of two variables, where the
numerator may or may not be a subset of the denominator. The standard error of the ratio
between two sample estimates X and Y is approximated as follows:
\[ SE(X/Y) = \frac{X}{Y} \sqrt{\frac{[SE(X)]^2}{X^2} + \frac{[SE(Y)]^2}{Y^2}} \]
- Confidence Intervals -- A sample estimate and its estimated standard error may be used
to construct confidence intervals about the estimate. These intervals are ranges that will
contain the average value of the estimated characteristic that results over all possible
samples, with a known probability.
For example, if all possible samples that could result under the 1996 ACS sample design
were independently selected and surveyed under the same conditions, and if the estimate
and its estimated standard error were calculated for each of these samples, then:
- Approximately 68 percent of the intervals from one estimated standard error below the
estimate to one estimated standard error above the estimate would contain the average
result from all possible samples;
- Approximately 90 percent of the intervals from 1.65 times the estimated standard error
below the estimate to 1.65 times the estimated standard error above the estimate would
contain the average result from all possible samples.
- Approximately 95 percent of the intervals from two estimated standard errors below the
estimate to two estimated standard errors above the estimate would contain the average
result from all possible samples.
The intervals are referred to as 68 percent, 90 percent, and
95 percent confidence intervals, respectively.
- Confidence Intervals of Ratios, Sums, and Differences -- Confidence intervals also may
be constructed for the ratio, sum of, or difference between two sample figures. This is
done by first computing the ratio, sum, or difference, then obtaining the standard error
of the ratio, sum, or difference (using the formulas given earlier), and finally forming a
confidence interval for this estimated ratio, sum, or difference as above. One can then
say with specified confidence that this interval includes the ratio, sum, or difference
that would have been obtained by averaging the results from all possible samples.
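The rules above for sums, differences, ratios, and confidence intervals can be sketched in a few lines. This is a minimal sketch, not the Census Bureau's software: the function names are hypothetical, and the ratio function uses the common first-order approximation that treats the numerator and denominator as uncorrelated, which is an assumption of this sketch.

```python
import math

# Multipliers given in the text for 68, 90, and 95 percent intervals.
Z_MULTIPLIER = {68: 1.0, 90: 1.65, 95: 2.0}


def se_sum_or_difference(se_x: float, se_y: float) -> float:
    """Approximate SE of X + Y or X - Y, ignoring any correlation."""
    return math.sqrt(se_x ** 2 + se_y ** 2)


def se_ratio(x: float, se_x: float, y: float, se_y: float) -> float:
    """Approximate SE of X / Y (assumed first-order approximation)."""
    if x == 0 or y == 0:
        raise ValueError("the approximation needs nonzero estimates")
    return abs(x / y) * math.sqrt((se_x / x) ** 2 + (se_y / y) ** 2)


def confidence_interval(estimate: float, se: float, level: int = 90):
    """Return (lower, upper) bounds at the 68, 90, or 95 percent level."""
    z = Z_MULTIPLIER[level]
    return estimate - z * se, estimate + z * se
```

As the text cautions, the sum and difference rule understates or overstates the standard error when the two estimates are strongly correlated, so these helpers are best treated as rough approximations.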
Limitations -- The user should be careful when computing and interpreting confidence
intervals.
- The estimated standard errors included in this data product do not include all portions
of the variability due to nonsampling error that may be present in the data. In
particular, the standard errors do not reflect the effect of correlated errors introduced
by interviewers, coders, or other field or processing personnel. Thus, the standard errors
calculated represent a lower bound of the total error. As a result, confidence intervals
formed using these estimated standard errors may not meet the stated levels of confidence
(i.e., 68, 90, or 95 percent). Thus, some care must be exercised in the interpretation of
the data in this data product based on the estimated standard errors.
- Zero or small estimates; very large estimates -- The value of almost all ACS
characteristics is greater than or equal to zero by definition. For zero or small
estimates use of the method given previously for calculating confidence intervals relies
on large sample theory, and may result in negative values which for most characteristics
are not admissible. In this case the lower limit of the confidence interval is set to zero
by default. A similar caution holds for estimates of totals close to a control total or
estimated ratios near one, where the upper limit of the confidence interval is set to its
largest admissible value. In these situations the level of confidence of the adjusted
range of values is less than the prescribed confidence level.
- Small tabulation areas -- Estimates for small tabulation areas (particularly small areas
with high nonresponse rate) are based on small samples. Again, using large sample theory
methodology to construct confidence intervals may result in values that are not admissible
for the characteristic of interest or in overstating the confidence for a given range. The
user should exercise caution in the analysis of data for these areas.
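The adjustment described under "Zero or small estimates; very large estimates" amounts to truncating the interval at its admissible limits. A minimal sketch follows; the function name and the `lower_limit` and `upper_limit` arguments are hypothetical, and the caller is assumed to know the largest admissible value (for example, 100 for a percentage).

```python
def admissible_interval(estimate, se, z=1.65, lower_limit=0.0, upper_limit=None):
    """Confidence interval truncated to the admissible range of values.

    lower_limit defaults to zero because almost all characteristics are
    nonnegative; upper_limit is optional (e.g. a control total or 100).
    """
    lower = max(estimate - z * se, lower_limit)
    upper = estimate + z * se
    if upper_limit is not None:
        upper = min(upper, upper_limit)
    return lower, upper
```

Truncation keeps the bounds meaningful, but as noted above the confidence attached to the truncated range is lower than the nominal level.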
Generalized Standard Errors
The information provided in Tables A through C can be used to approximate the standard
errors of sample estimates of totals and proportions in the Profiles and
Summary Tape File tables, and from the PUMS. Tables A and B give the basic standard error
for an estimate of a characteristic that would result under a simple random sampling
design. The estimates are for person, family, and housing unit characteristics. Design
factors by subject are provided in Table C. The term "subject" refers to a
characteristic, such as age for persons, tenure for housing units, and poverty for
families. The design factors reflect the effects of the actual sample design and
estimation procedures used for the 1996 American Community Survey. Details of the sample
design and estimation procedures are provided elsewhere in this chapter.
To approximate the standard error of an estimate of a total or a proportion using
Tables A through C, follow the steps described in the next section. A proportion is defined
as a ratio of two estimates where the numerator is a subset of the denominator. For
example, the proportion of Black lawyers is the ratio of Black lawyers to all lawyers.
An inspection of the formulas used to calculate the simple random sampling standard
errors suggests that when dealing with zero estimates or very small estimates of totals
and percentages the standard error estimates approach zero. This is also the case for very
large estimates of totals and percentages. Zero or small estimates, like any other sample
estimates, are still subject to sampling variability and therefore an estimated standard
error of zero or close to zero is not adequate. For an estimated total that is less than
75 or within 75 of the total size of the tabulation area, use a basic standard error of
21. For estimated percentages that are less than 2 or greater than 98, use the basic
standard errors in Table B that are shown in the "2 or 98" row.
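The special handling of very small and very large estimates can be expressed as two small helpers. This is a sketch under stated assumptions: the function names are hypothetical, `table_se` stands for whatever value the user has read from Table A, and the boundary cases (an estimate of exactly 75, or exactly 2 or 98 percent) are treated as ordinary estimates.

```python
def basic_se_for_total(estimate, area_total, table_se):
    """Basic SE for a total, applying the floor of 21 from the text."""
    # Less than 75, or within 75 of the tabulation area total.
    if estimate < 75 or area_total - estimate < 75:
        return 21.0
    return table_se


def table_b_row_percentage(percentage):
    """Percentage to use when choosing a Table B row.

    Table B rows pair p with 100 - p (for example "20 or 80"), so the
    value is folded to the lower half and floored at the "2 or 98" row.
    """
    folded = min(percentage, 100.0 - percentage)
    return max(folded, 2.0)
```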
Confidence intervals can be constructed from generalized standard errors just as they
are from direct standard errors. However, for estimates other than totals and proportions,
generalized standard errors cannot be calculated from Tables A through C.
- To get standard errors for estimates of means, medians, aggregates
and per capita amounts the user should use direct standard errors since generalized
standard errors are not provided for these estimates. If two or more tabulation areas are
combined to obtain the median value of a characteristic, the user is referred to
"Medians" below for a description of how to approximate the standard error. The
method relies on linear interpolation. The standard error approximations that result from
this method may be less adequate for characteristics that deviate from the linear
assumption. Users should exercise caution when analyzing these
characteristics.
- Table C includes a design factor for "Children Ever Born." However, the
standard errors shown in Table A and the formula provided below the table are not
appropriate to calculate standard errors for fertility estimates. The user should use
direct standard errors for these estimates (Table P28). Direct standard errors are not
available for estimates of areas formed by combining two or more standard tabulation areas
or for estimates derived from the PUMS.
Approximate the basic standard error using a formula based on two quantities: the estimate
of "Children Ever Born" and w, the unweighted number of women in a specific age group. The
estimate is obtained from Table P28, and w is approximated by the product of f and the
total estimate of women in the specific age group, where f is the site level sampling
fraction. Note that this method provides
adequate standard error estimates for PUMS estimates; however, the approximation
deteriorates as the size of the tabulation area (or the specific age group) decreases. To
obtain the adjusted standard error follow the procedure explained in the section "Use
of Tables to Approximate Standard Errors." Standard error estimates
approximated using this methodology are conservative and therefore the specified
confidence of statistical intervals calculated as described in "Confidence
Intervals" may be overstated.
Use of Tables to Approximate Standard Errors
Tables A through C are used in the following manner to approximate standard errors:
- Obtain the basic standard error from either Table A (for an estimate of a total) or
Table B (for an estimate of a percentage) or use the formula given below the appropriate
table. Obtain the number of persons, number of housing units, or number of families for
the county in the appropriate matrix for estimates of total. Use these numbers to
determine which column to look under in Table A. When working with the PUMS, multiply this
basic standard error by 1.83.
- Use Table C to obtain the appropriate design factor for the characteristic; for example,
educational attainment or ancestry. Multiply the basic standard error by this factor to
get the ACS standard error estimate.
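The two steps above reduce to a pair of multiplications. The sketch below assumes the basic standard error and the design factor have already been read from Tables A through C; the function and argument names are hypothetical.

```python
PUMS_FACTOR = 1.83  # multiplier stated in the text for PUMS estimates


def approximate_standard_error(basic_se, design_factor, from_pums=False):
    """Adjust a basic (simple random sampling) SE by the design factor."""
    if basic_se < 0 or design_factor <= 0:
        raise ValueError("basic_se must be >= 0 and design_factor > 0")
    adjusted = basic_se * PUMS_FACTOR if from_pums else basic_se
    return adjusted * design_factor
```

For example, a basic standard error of 1.3 with a design factor of 1.8 gives about 2.3 for a non-PUMS estimate.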
Medians -- For the standard error of the median of a characteristic, it is necessary to
examine the distribution from which the median is derived, as the estimated number of
persons, households, families or housing units with the characteristic and the
distribution of the characteristic affect the standard error. An approximate method is
given here. As the first step, compute one-half of the estimated number having the
characteristic on which the median is based (refer to this result as B/2). Treat B/2 as if
it were an ordinary estimate and obtain its standard error as instructed above. Compute
the desired confidence interval about B/2. Starting with the lowest value of the
characteristic, cumulate the frequencies in each category of the characteristic until the
sum equals or first exceeds the lower limit of the confidence interval about B/2. By
linear interpolation, obtain a value of the characteristic corresponding to this sum. This
is the lower limit of the confidence interval of the median. In a similar manner, continue
cumulating frequencies until the sum equals or exceeds the count in excess of the upper
limit of the interval about B/2. Interpolate as before to obtain the upper limit of the
confidence interval for the estimated median.
When interpolation is required in the upper open-ended interval of a distribution to
obtain a confidence bound, use 1.5 times the lower limit of the open-ended confidence
interval as the upper limit of the open-ended interval.
The following examples of standard errors and confidence intervals are based on real
data from the 1996 ACS Test.
Example 1 - Proportion or Percentage Estimate
The estimated poverty rate for census tract 601 in Brevard County is 19.3 percent. The
base of the estimated percentage is 6,063. From Table B, use the row corresponding to
"20 or 80" percent and interpolating between the two columns 5,000 and 7,500 one
can approximate the basic standard error of this estimate as follows:
BasicSE(19.3) = 1.4 - (1.4 - 1.1)*[(6,063 - 5,000) / (7,500 - 5,000)] = 1.3.
The design factor for "Poverty Status in the Past 12 Months (persons)" for
Brevard County is 1.8. The approximate standard error estimate for the estimated poverty
rate of 19.3 percent is determined by multiplying the basic standard error 1.3 by the
design factor 1.8 from Table C. This yields an estimated standard error of 2.3. (The level
of precision on each calculation is the same as for the estimates.)
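The interpolation in this example can be reproduced with a short helper. The function name is hypothetical; the table values (1.4 at a base of 5,000 and 1.1 at a base of 7,500) are the ones quoted in the calculation above.

```python
def interpolate_basic_se(base, lower_base, lower_se, upper_base, upper_se):
    """Linearly interpolate a basic SE between two adjacent table entries."""
    if not lower_base <= base <= upper_base:
        raise ValueError("base must lie between the two table entries")
    if upper_base == lower_base:
        return lower_se
    fraction = (base - lower_base) / (upper_base - lower_base)
    return lower_se + (upper_se - lower_se) * fraction


basic = round(interpolate_basic_se(6063, 5000, 1.4, 7500, 1.1), 1)
print(basic)                   # 1.3
print(round(basic * 1.8, 1))   # 2.3
```

The same helper applies when interpolating between rows of Table A, as in Example 2, by passing the two bracketing estimates and their tabled standard errors.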
To avoid interpolation use the formula given below Table B. The use of the formula is
illustrated here.
BasicSE(19.3) = 1.2.
(Note that the two basic standard error estimates are not identical.) Again, to get the
final standard error of 2.2 multiply 1.2 by the design factor 1.8.
To calculate the lower and upper bounds of the 90 percent confidence interval around
19.3 percent using the second final standard error, simply multiply 2.2 by 1.65, then add
and subtract the product from 19.3. Thus the 90 percent confidence interval for this
estimated percentage is found to be
[19.3 - 1.65(2.2)] to [19.3 + 1.65(2.2)] or 15.7 to 22.9.
Example 2 - Total Estimate
Consider the data in example 1. The estimate of persons in poverty in census tract 601
in Brevard County is 1,171. From Table A, use the column labeled "400,000" and
interpolating between the rows 1,000 and 2,500, approximate the basic standard error as
follows:
BasicSE(1,171) = 119 - (119-75)*[(2,500-1,171)/(2,500-1,000)] = 80.
Multiply 80 by the design factor 1.8 to approximate the ACS standard error estimate.
The standard error estimate is found to be 144 persons.
Avoid linear interpolation by using the formula given below Table A. The population of
Brevard county is 447,597. Thus,
BasicSE(1,171) = 82.
Note that using the formula yields a slightly different result. Multiply 82 by the
design factor 1.8 to get a final standard error estimate equal to 147. (Keep in mind that
the two results are approximations of the standard error estimate.)
Proceed as before to construct a 90 percent confidence interval around the estimate of
1,171 persons in poverty. The upper and lower bounds of the 90 percent confidence interval
are
[1,171 - 1.65(147)] to [1,171 + 1.65(147)] or 928 to 1,414.
Example 3 - Difference Between Two Sample Estimates
The following is an illustration of the calculations required to construct a confidence
interval for the estimated difference between two sample estimates. The 1996 ACS poverty
rate estimates for census tracts 601 and 611 in Brevard County are 19.3 and 5.2 percent,
respectively. The obvious question in comparing these two areas is whether they are
really different with respect to the characteristic of interest or whether the apparent
difference is just the result of sampling. The difference in the poverty rate for
the two tracts is 14.1 percent. To compute the final standard error estimate of the
difference use the formula given in "Sums and Differences."
First, calculate the standard error estimate of 5.2 and combine it with the results
obtained in example 1, as follows (the design factors are incorporated into the
calculation):
\[ SE(14.1) = \sqrt{[SE(19.3)]^2 + [SE(5.2)]^2} = 2.6 \]
The 90 percent confidence interval for the difference is computed as before:
[14.1 - 1.65(2.6)] to [14.1 + 1.65(2.6)] or 9.8 to 18.4.
Since this confidence interval doesn't include zero, we can conclude with 90 percent
confidence that the poverty rates in the two tracts are really different.
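The comparison in this example can be written as a simple check of whether the confidence interval for the difference excludes zero. This is a sketch: the function names are hypothetical, and the standard error of 1.4 used for the 5.2 percent estimate in the usage note is an illustrative assumption, not a figure taken from the tables.

```python
import math


def difference_interval(est_a, se_a, est_b, se_b, z=1.65):
    """Confidence interval for est_a - est_b, ignoring correlation."""
    diff = est_a - est_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return diff - z * se_diff, diff + z * se_diff


def is_significant(est_a, se_a, est_b, se_b, z=1.65):
    """True when the interval for the difference does not include zero."""
    lower, upper = difference_interval(est_a, se_a, est_b, se_b, z)
    return lower > 0 or upper < 0
```

Calling `is_significant(19.3, 2.2, 5.2, 1.4)` with the assumed standard error returns `True`, in line with the conclusion drawn above.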
Example 4 - Ratio of Two Estimates
For reasonably large areas or large samples, ratio estimates are approximately normally distributed.
The methods described in the previous examples can be used to calculate a confidence
interval around a ratio estimate. Suppose that one wished to express the poverty rate of
census tract 601 relative to the poverty rate of census tract 611. The ratio of the two
estimates of interest is
19.3/5.2 = 3.7. Thus, the poverty rate of census tract 601 is 3.7 times the
corresponding rate of census tract 611. Using the formula to calculate the standard error
of a ratio estimate we have:
SE(3.7) = 1.0.
Using the result above, the 90 percent confidence interval for this ratio is
[3.7 - 1.65(1.0)] to [3.7 + 1.65(1.0)] or 2.0 to 5.4.
Example 5 - Median Estimate
Compute a 90 percent confidence interval for median adjusted household income in the
Brevard County Profile. This median is computed using all of the 183,251 households in
Brevard County. Half of that number is 91,626 = B/2. To avoid interpolation in finding the
basic standard error, use the formula given below Table A.
BasicSE(91,626) = 511.
Multiply 511 by the design factor 1.3 for "Household Income in the Past 12
Months" from Table C to get the final standard error of 664.
Calculate the 90 percent confidence interval bounds around B/2 = 91,626.
[91,626 - 1.65(664)] to [91,626 + 1.65(664)] or 90,530 to 92,722.
Use the Profile "1996 Adjusted Income" table to determine the confidence
interval for the median. The table below shows the number of households in each income
category. A column of cumulative totals has been added. From this table, it is clear that
the 90,530th household has an income that falls in the range $25,000 to
$34,999.
1996 ADJUSTED INCOME   | Number in Range | Cumulative Number
Households             |                 |
Less than $5,000       |           7,490 |             7,490
$5,000 to $9,999       |          12,191 |            19,681
$10,000 to $14,999     |          15,105 |            34,786
$15,000 to $24,999     |          32,972 |            67,758
$25,000 to $34,999     |          29,648 |            97,406
$35,000 to $49,999     |          34,457 |           131,863
$50,000 to $74,999     |          31,556 |           163,419
$75,000 to $99,999     |          11,377 |           174,796
$100,000 to $149,999   |           6,242 |           181,038
$150,000 or more       |           2,213 |           183,251
Interpolating for the lower bound gives:
The upper bound also falls in the $25,000 to $34,999 range and is computed in a similar
fashion.
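The interpolation for both bounds can be sketched as follows, using the household counts from the income table in this example. Several choices here are assumptions of the sketch rather than statements from the source: each category is treated as a continuous interval (for example, $25,000 up to $35,000), the lowest category is assumed to start at zero, and the open-ended top category is left out. The function name is hypothetical.

```python
# (lower edge, upper edge, households) for each closed income category.
CATEGORIES = [
    (0, 5000, 7490),
    (5000, 10000, 12191),
    (10000, 15000, 15105),
    (15000, 25000, 32972),
    (25000, 35000, 29648),
    (35000, 50000, 34457),
    (50000, 75000, 31556),
    (75000, 100000, 11377),
    (100000, 150000, 6242),
]


def value_at_count(target, categories=CATEGORIES):
    """Income at which the cumulative household count reaches target."""
    if target <= 0:
        raise ValueError("target must be positive")
    cumulative = 0
    for low, high, count in categories:
        if cumulative + count >= target:
            # Linear interpolation within the category holding the target.
            fraction = (target - cumulative) / count
            return low + fraction * (high - low)
        cumulative += count
    raise ValueError("target falls in the open-ended top category")


lower_bound = value_at_count(90530)  # lower limit of the interval about B/2
upper_bound = value_at_count(92722)  # upper limit of the interval about B/2
```

Under these assumptions both bounds fall in the $25,000 to $34,999 category, roughly $32,700 and $33,400. A different convention for the category edges would shift the results slightly.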
ESTIMATION PROCEDURE
The estimates that appear in this product are obtained from a raking ratio estimation
procedure that results in the assignment of two sets of weights: a weight to each sample
person record and a weight to each sample housing unit record. For any given tabulation
area, a characteristic total is estimated by summing the weights assigned to the persons,
households, families or housing units possessing the characteristic in the tabulation
area. Estimates of person characteristics are based on the person weight and estimates of
family, household or housing unit characteristics are based on the housing unit weight.
Each sample person or housing unit record is assigned exactly one weight to be used to
produce estimates of all characteristics. For example, if the weight given to a sample
person or housing unit had the value 6, all characteristics of that person or housing unit
would be tabulated with the weight of 6. The estimation procedure, however, does assign
weights varying from person to person or housing unit to housing unit.
The estimation procedure used to assign the weights was performed independently within
each of the 1996 ACS sites.
- Initial Housing Unit Weighting Factors - This process produced the following factors:
- Base Weight (BW) - This factor was assigned to every housing unit based on its sampling
stratum and is the inverse of the housing unit's sampling rate. Base weights were either
3.3333 or 6.6667.
- Unduplication Adjustment Factor (UAF) - Addresses already included in other Census
Bureau surveys were not subjected to sampling. This factor adjusted the base weight to
account for the probability that an address had already been selected into some other
survey's sample. Factors were computed and assigned based on the following groups.
Site x County Block Type x 1990 Census Address Control File Status
- CAPI Subsampling Factor (SSF) - The weights of the CAPI cases were adjusted to reflect
the results of CAPI subsampling. This factor was assigned to each record as follows:
Selected in CAPI subsampling: SSF = 3.0
Not selected in CAPI subsampling: SSF = 0.0
Not a CAPI case: SSF = 1.0
- Variation in Monthly Response by Mode (VMS) - This factor made the total weight of the
Mail, Delivery, CATI, and CAPI records to be tabulated in a month equal to the total base
weight of all cases originally mailed for that month. The value of VMS for Mail and
Delivery cases was 1.0. For CATI and CAPI cases, VMS was computed and assigned based on
the following groups.
Site x Month
- Noninterview Factor (NIF) - This factor adjusted the weight of all responding
occupied housing units to account for both responding and non-responding housing units.
This factor was computed in two stages: NIF1 and NIF2. NIF1 was computed and assigned to
occupied housing units based on the following groups.
Site x County x Building Type [1] x Tract
After having adjusted each occupied housing unit for NIF1, NIF2 was computed and
assigned to occupied housing units based on the following groups.
Site x Building Type x Month
NIF was then computed for each occupied housing unit as the product of NIF1 and NIF2.
Vacancies were assigned a value of NIF = 1.0. Non-responding housing units were now
assigned a weight of 0.0.
- Noninterview Factor - Mode (NIFM) - This factor adjusted the weight of just the
responding CAPI occupied housing units to account for both CAPI respondents and all
non-respondents. This factor was computed as if NIF had not already been assigned to every
occupied housing unit record. This factor was not used directly but rather as part of
computing the next factor: MBF. NIFM was computed and assigned to occupied CAPI housing
units based on the following groups.
Site x Building Type x Month
Mail, Delivery and CATI cases received a value of NIFM = 1.0. Vacancies received a
value of NIFM = 1.0.
- Mode Bias Factor (MBF) - This factor made the total weight of the groups below the same
as if NIFM had been used instead of NIF. MBF was computed and assigned to occupied housing
units based on the following groups.
Site x Tenure [2] x Month x Marital Status [3]
Vacancies received a value of MBF = 1.0.
- Furlough Adjustment Factor (FAF) - This factor adjusted the weights of the February 1996
CAPI records to account for the "missing" January 1996 CAPI cases caused by the
government-wide furlough and bad weather in Dec 95/Jan 96. This factor was assigned to
each record as follows:
Feb 1996 CAPI records: FAF = 2.0 [4]
All other records: FAF = 1.0
- First Housing Unit Post-Stratification Factor (HPSF1) - This factor made the number of
housing units in a tract equal to the tract counts from the 1997 Master Address File
(MAF), after all factors through FAF had been applied. HPSF1 was computed and assigned to
all housing units based on the following groups.
Site x County x Tract
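To make the role of the factors concrete, the sketch below forms a housing unit weight as the product of the factors listed above. The source does not spell this product out at this point, so the list of multiplied factors is an assumption based on the factor descriptions (NIFM is excluded because the text says it is not used directly). The function names and the dictionary layout are hypothetical.

```python
from math import prod

# Assumed order of the multiplicative factors; NIFM is deliberately absent.
FACTOR_ORDER = ("BW", "UAF", "SSF", "VMS", "NIF", "MBF", "FAF", "HPSF1")


def base_weight(sampling_rate):
    """Base weight: the inverse of the housing unit's sampling rate."""
    return 1.0 / sampling_rate


def housing_unit_weight(factors):
    """Multiply the weighting factors for one housing unit record."""
    return prod(factors[name] for name in FACTOR_ORDER)
```

For example, a record sampled at 15 percent and selected in CAPI subsampling (SSF = 3.0), with every other factor equal to 1.0, would carry a weight of about 20, while a record not selected in CAPI subsampling (SSF = 0.0) would carry a weight of zero.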
- Person Weighting Factors - Initially each person in an occupied housing unit received
the housing unit weight as their person weight. At this point everyone in the household
had the same weight. These person weights were then individually adjusted based on each
person's age, race, sex, and Hispanic origin as described below.
- Person Post-Stratification Factor (PPSF) - This factor was applied to individual persons
based on their age, race, sex and Hispanic origin. It adjusts the person weights so that
the weighted sample counts will match county population control counts by age, race, sex,
and Hispanic origin. These population control counts are independently derived by the
Census Bureau's Population Division.
This is an iterative procedure that first computes PPSF using the following groups:
County x Race (5) x Sex (6) x Age Groups used for Race adjustment
After applying the value of PPSF computed above, a second stage value of PPSF is
computed using the following groups:
County x Hispanic Origin (7) x Sex x Age Groups used for Hispanic adjustment
The above two steps were repeated up to six times, or until the change to PPSF from one
iteration to the next became small.
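The two-stage iteration described above resembles a raking (iterative proportional fitting) procedure. The sketch below is one way to express it, assuming each person record already carries a cell label for the race-based grouping and another for the Hispanic-origin grouping. The function name, field names, convergence tolerance, and example counts are hypothetical; the source states only that the steps were repeated up to six times or until the change became small.

```python
from collections import defaultdict


def rake_ppsf(persons, race_controls, hisp_controls, max_iter=6, tol=1e-4):
    """Return one PPSF value per person.

    persons: list of dicts with keys 'weight', 'race_cell', 'hisp_cell'.
    race_controls / hisp_controls: {cell: population control count}.
    """
    ppsf = [1.0] * len(persons)
    stages = (("race_cell", race_controls), ("hisp_cell", hisp_controls))
    for _ in range(max_iter):
        largest_change = 0.0
        for cell_key, controls in stages:
            # Weighted sample total per cell using the current PPSF.
            totals = defaultdict(float)
            for person, factor in zip(persons, ppsf):
                totals[person[cell_key]] += person["weight"] * factor
            # Scale every person in a cell by control / weighted total.
            for i, person in enumerate(persons):
                cell = person[cell_key]
                ratio = controls[cell] / totals[cell]
                largest_change = max(largest_change, abs(ratio - 1.0))
                ppsf[i] *= ratio
        if largest_change < tol:  # change from the last pass is small
            break
    return ppsf


# Hypothetical example with two cells in each dimension.
persons = [
    {"weight": 10.0, "race_cell": "W", "hisp_cell": "H"},
    {"weight": 10.0, "race_cell": "W", "hisp_cell": "N"},
    {"weight": 10.0, "race_cell": "B", "hisp_cell": "H"},
    {"weight": 10.0, "race_cell": "B", "hisp_cell": "N"},
]
factors = rake_ppsf(persons, {"W": 30, "B": 20}, {"H": 20, "N": 30})
final_weights = [p["weight"] * f for p, f in zip(persons, factors)]
```

In this small example the adjusted weights satisfy both sets of controls after the first pass, and the second pass only confirms that no further change is needed.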
- Rounding - The final product of all person weights (BW x . . . x HPSF1
x PPSF) was rounded to an integer. Rounding was performed so that the sum of
the rounded weights was within one person of the sum of the unrounded weights for any of
the groups listed below:
County
County x Race
County x Race x Hisp
County x Race x Hisp x Sex
County x Race x Hisp x Sex x Age
County x Race x Hisp x Sex x Age x Tract
County x Race x Hisp x Sex x Age x Tract x Block
For example, the estimate of the number of White Hispanic males age 30 using the
rounded weights is within one of the number produced using the unrounded weights.
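One technique that satisfies a nested "within one" property of this kind is cumulative rounding: sort the records by the hierarchy, round the running total, and take differences. The sketch below illustrates that technique only; the source does not say which rounding algorithm was used, and the function name and example records are hypothetical. Because each rounded running total is within one half of the unrounded running total, any group that is contiguous in the sort order has a rounded sum within one of its unrounded sum.

```python
import math


def controlled_round(records):
    """Round weights to integers while preserving group totals to within one.

    records: list of (sort_key, weight) with non-negative weights, where
             sort_key is a tuple ordered from the broadest level to the
             finest (for example county, race, ..., tract, block).
    Returns a list of (sort_key, integer_weight) in sorted order.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    rounded = []
    running_total = 0.0
    previous_rounded_total = 0
    for key, weight in ordered:
        running_total += weight
        # Round half up so the rounded running total never decreases.
        rounded_total = math.floor(running_total + 0.5)
        rounded.append((key, rounded_total - previous_rounded_total))
        previous_rounded_total = rounded_total
    return rounded


# Hypothetical keys: (county, race, hisp, sex, age, tract, block).
sample = [
    (("001", "White", "Hisp", "M", "30", "0101", "1001"), 10.4),
    (("001", "White", "Hisp", "M", "30", "0101", "1002"), 10.4),
    (("001", "White", "Hisp", "M", "30", "0102", "2001"), 10.4),
    (("001", "White", "NonHisp", "F", "45", "0101", "1001"), 7.7),
    (("001", "Black", "NonHisp", "F", "45", "0102", "2001"), 3.3),
]
rounded_sample = controlled_round(sample)
```

Sorting by the full key makes every level of the hierarchy a contiguous run of records, which is what lets one pass cover all of the listed groups at once.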
- Final Housing Unit Weighting Factors - This process produced the following factors:
- Principal Person Factor (PPF) - This factor adjusted for differential response depending
on the race, Hispanic origin, sex, and age of the principal person in the household. The
principal person was defined as the female spouse of the responding householder. If there
was no such person, then the responding householder was the principal person.
The value of PPF was the PPSF of the principal person.
- Second Housing Unit Post-stratification Factor (HPSF2) - This factor made the number of
housing units in a tract again equal to the 1997 MAF control count totals after all
factors (BW x . . . x HPSF1 x PPF) had been applied. HPSF2 was
computed and assigned to all housing units based on the following groups.
Site x County x Tract
- Rounding - The final product of all housing unit weights (BW x . . . x HPSF1
x PPF x HPSF2) was rounded to an integer. Rounding was performed
so that total rounded weight was within one housing unit of the total unrounded weight for
any of the groups listed below:
Site
Site x County
Site x County x Tract
Site x County x Tract x Block
CONTROL OF NONSAMPLING ERROR
As mentioned earlier, sample data are subject to nonsampling error. This component of
error could introduce serious bias into the data, and the total error could increase
dramatically over that which would result purely from sampling. While it is impossible to
completely eliminate nonsampling error from a survey operation, the Census Bureau attempts
to control the sources of such error during the collection and processing operations.
Described below are the primary sources of nonsampling error and the programs instituted
for control of this error. The success of these programs, however, is contingent upon how
well the instructions actually were carried out during the survey.
- Undercoverage -- It is possible for some sample housing units or persons to be missed
entirely by the survey. The undercoverage of persons and housing units can introduce
biases into the data. A major way to avoid undercoverage in a survey is to ensure that its
sampling frame, for ACS an address list in each site, is as complete and accurate as
possible.
The source of addresses in the three urban sites was a new product, the Master Address
File (MAF), currently being developed by the Census Bureau. The MAF is created by
combining the 1990 Census Address Control File and the Delivery Sequence File of the
United States Postal Service. An attempt is made to assign all appropriate geographic
codes to each MAF address via an automated procedure using the Census Bureau TIGER files.
A manual coding operation based in the appropriate regional offices is attempted for
addresses which could not be automatically coded. The MAF was used as the source of
addresses for selecting sample housing units and mailing questionnaires. TIGER produced
the location maps for personal visit CAPI assignments.
In Fulton County, PA the Census Bureau conducted a manual listing and map-spotting
operation in the summer of 1995. Interviewers hand delivered ACS questionnaires to the
addresses from this listing.
In the CATI and CAPI nonresponse follow-up phases, efforts were made to minimize the
chances that housing units that were not part of the sample were interviewed in place of
units in sample by mistake. If a CATI interviewer called a mail nonresponse case and was
not able to reach the exact address, no interview was conducted and the case was eligible
for CAPI. During CAPI follow-up, the interviewer had to locate the exact address for each
sample housing unit. In some multi-unit structures the interviewer could not locate the
exact sample unit or found a different number of units than expected. In these cases the
interviewers were instructed to list the units in the building and follow a specific
procedure to select a replacement sample unit.
- Respondent and Interviewer Error -- The person answering the questionnaire or responding
to the questions posed by an interviewer could serve as a source of error, although the
questions were phrased as clearly as possible based on testing, and detailed instructions
for completing the questionnaire were provided to each household. In addition,
respondents' answers were edited for completeness, and problems were followed up as
necessary.
- Interviewer Monitoring -- The interviewer may misinterpret or otherwise incorrectly
enter information given by a respondent; may fail to collect some of the information for a
person or household; or may collect data for households that were not designated as part
of the sample. To control these problems, the work of interviewers was monitored
carefully. Field staff were prepared for their tasks by using specially developed training
packages that included hands-on experience in using survey materials. A sample of the
households interviewed by CAPI interviewers was reinterviewed to control for the
possibility that interviewers may have fabricated data.
- Item Nonresponse -- Nonresponse to particular questions on the survey questionnaire and
instrument allows for the introduction of bias into the data, since the characteristics of
the nonrespondents have not been observed and may differ from those reported by
respondents. As a result, any imputation procedure using respondent data may not
completely reflect this difference either at the elemental level (individual person or
housing unit) or on the average.
Some protection against the introduction of large biases is afforded by minimizing
nonresponse. In the ACS, nonresponse for the CATI and CAPI operations was reduced
substantially by the requirement that the automated instrument receive a response to each
question before the next one could be asked. For mail responses, the clerical edit and
follow-up operations were aimed at obtaining a response for every question on selected
questionnaires. Values for any items that remained unanswered were imputed by computer using
reported data for a person or housing unit with similar characteristics.
- Clerical Review -- Questionnaires returned by mail were edited for completeness and
acceptability. They were reviewed by clerks for content omissions and population coverage.
If necessary, a telephone follow-up was made to obtain missing information. Potential
coverage errors were included in this follow-up, as well as questionnaires with too many
omissions to be accepted as returned.
- Processing Error -- The many phases involved in processing the survey data represent
potential sources for the introduction of nonsampling error. The processing of the survey
questionnaires includes the clerical editing, follow-up by telephone, and keying of data
from completed questionnaires; the manual coding of write-in responses; and the electronic
data processing. The various field, coding and computer operations undergo a number of
quality control checks to ensure their accurate application.
- Automated Editing -- After data collection was completed, any remaining incomplete or
inconsistent information was imputed during the final automated edit of the collected
data. Imputations, or computer assignments of acceptable codes in place of unacceptable
entries or blanks, were needed most often when an entry for a given item was lacking or
when the information reported for a person or housing unit on that item was inconsistent
with other information for that same person or housing unit. As in other surveys and
previous censuses, the general procedure for changing unacceptable entries was to assign
an entry for a person or housing unit that was consistent with entries for persons or
housing units with similar characteristics. Assigning acceptable values in place of blanks
or unacceptable entries enhances the usefulness of the data.
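The idea of assigning a value from a record with similar characteristics can be illustrated with a sequential hot-deck sketch. This is an assumption made for illustration: the source does not name the imputation algorithm, and the function name, field names, and example records below are hypothetical. A missing item is filled from the most recently processed record in the same class that reported the item.

```python
def hot_deck_impute(records, item, class_vars):
    """Fill missing values of `item` from a donor in the same class.

    records: list of dicts; a missing value is represented by None.
    item: name of the field to impute.
    class_vars: fields that define "similar characteristics".
    Returns new dicts; the input records are left unchanged.
    """
    last_donor = {}  # class -> most recent reported value
    result = []
    for record in records:
        record = dict(record)
        cell = tuple(record[var] for var in class_vars)
        if record[item] is None:
            if cell in last_donor:
                record[item] = last_donor[cell]
                record[item + "_imputed"] = True  # flag the assignment
        else:
            last_donor[cell] = record[item]
        result.append(record)
    return result


# Hypothetical housing unit records.
housing_units = [
    {"tenure": "Owner", "units": "Single", "rooms": 6},
    {"tenure": "Renter", "units": "Multi", "rooms": 3},
    {"tenure": "Owner", "units": "Single", "rooms": None},
    {"tenure": "Renter", "units": "Multi", "rooms": None},
]
imputed = hot_deck_impute(housing_units, "rooms", ["tenure", "units"])
```

A record with no earlier donor in its class is left missing in this sketch; a production system would need a fallback value for that case.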
Table A. Unadjusted Standard Error for Estimated Totals
[Based on a 15 percent simple random sample.
For estimates from the PUMS, multiply standard errors from the table or the formula by
1.83.]
Estimate¹ \ Size of publication area² | 500 | 750 | 1,000 | 2,500 | 5,000 | 7,500 | 10,000 | 25,000 | 50,000 | 100,000 | 250,000 | 400,000 |
75 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
125 | 23 | 24 | 25 | 26 | 26 | 26 | 27 | 27 | 27 | 27 | 27 | 27 |
250 | 27 | 31 | 33 | 36 | 37 | 37 | 37 | 38 | 38 | 38 | 38 | 38 |
500 | . | 31 | 38 | 48 | 51 | 52 | 52 | 53 | 53 | 53 | 53 | 53 |
750 | . | . | 33 | 55 | 60 | 62 | 63 | 64 | 65 | 65 | 65 | 65 |
1,000 | . | . | . | 58 | 68 | 70 | 72 | 74 | 75 | 75 | 75 | 75 |
2,500 | . | . | . | . | 84 | 97 | 103 | 113 | 116 | 118 | 119 | 119 |
5,000 | . | . | . | . | . | 97 | 119 | 151 | 160 | 165 | 167 | 168 |
7,500 | . | . | . | . | . | . | 103 | 173 | 191 | 199 | 204 | 205 |
10,000 | . | . | . | . | . | . | . | 185 | 214 | 226 | 234 | 236 |
15,000 | . | . | . | . | . | . | . | 185 | 245 | 270 | 283 | 287 |
25,000 | . | . | . | . | . | . | . | . | 267 | 327 | 358 | 366 |
50,000 | . | . | . | . | . | . | . | . | . | 377 | 477 | 499 |
75,000 | . | . | . | . | . | . | . | . | . | 327 | 547 | 589 |
100,000 | . | . | . | . | . | . | . | . | . | . | 585 | 654 |
250,000 | . | . | . | . | . | . | . | . | . | . | . | 731 |
¹ To get a better approximation of the standard error of an estimated count,
use the formula below.
N = Size of area
Ŷ = Estimate of characteristic total
² The size of area is the population estimate of the county containing the tabulation
area if the estimate is a person characteristic, the estimate of housing units in the
county if the estimate is a housing characteristic, or the estimate of families if the
estimate is a family characteristic.
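The formula referred to in the first table note is not reproduced in this text. As a hedged sketch, the code below assumes the usual simple random sampling form, SE(Y) = sqrt(((1 - f) / f) x Y x (1 - Y / N)), where f is the 15 percent sampling rate, Y is the estimated total, and N is the size of area. This form is an assumption; its results land close to the Table A entries but need not match every cell. The function name and parameters are hypothetical.

```python
import math

SAMPLING_RATE = 0.15  # the 15 percent rate stated in the Table A heading


def unadjusted_se_total(estimate, area_size, sampling_rate=SAMPLING_RATE):
    """Approximate unadjusted standard error of an estimated total.

    Assumes simple random sampling without replacement; the multiplier
    (1 - f) / f is about 5.67 for a 15 percent sample.
    """
    multiplier = (1.0 - sampling_rate) / sampling_rate
    return math.sqrt(multiplier * estimate * (1.0 - estimate / area_size))


# An estimate of 250 in an area of size 500 (Table A lists 27).
se_small_area = unadjusted_se_total(250, 500)
# An estimate of 1,000 in an area of size 400,000 (Table A lists 75).
se_large_area = unadjusted_se_total(1000, 400000)
```

The term (1 - Y / N) shrinks the standard error as the estimate approaches the size of the area, which is why the entries along the diagonal of Table A are smaller than those to their right.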
Table B. Unadjusted Standard Error in Percentage Points for Estimated Percentages
[Based on a 15 percent simple random sample.
For estimates from the PUMS, multiply standard errors from the table or the formula by
1.83.]
Estimated percentage \ Base of estimated percentage¹ | 500 | 750 | 1,000 | 2,500 | 5,000 | 7,500 | 10,000 | 25,000 | 50,000 | 100,000 | 250,000 | 400,000 |
2 or 98 | 1.4 | 1.1 | 1.0 | 0.6 | 0.4 | 0.4 | 0.3 | 0.2 | 0.1 | 0.1 | 0.1 | 0.1 |
5 or 95 | 2.3 | 1.9 | 1.6 | 1.0 | 0.7 | 0.6 | 0.5 | 0.3 | 0.2 | 0.2 | 0.1 | 0.1 |
10 or 90 | 3.2 | 2.6 | 2.3 | 1.4 | 1.0 | 0.8 | 0.7 | 0.5 | 0.3 | 0.2 | 0.1 | 0.1 |
15 or 85 | 3.9 | 3.1 | 2.7 | 1.7 | 1.2 | 1.0 | 0.9 | 0.5 | 0.4 | 0.3 | 0.2 | 0.1 |
20 or 80 | 4.3 | 3.5 | 3.0 | 1.9 | 1.4 | 1.1 | 1.0 | 0.6 | 0.4 | 0.3 | 0.2 | 0.2 |
25 or 75 | 4.6 | 3.8 | 3.3 | 2.1 | 1.5 | 1.2 | 1.0 | 0.7 | 0.5 | 0.3 | 0.2 | 0.2 |
30 or 70 | 5.0 | 4.0 | 3.5 | 2.2 | 1.5 | 1.3 | 1.1 | 0.7 | 0.5 | 0.3 | 0.2 | 0.2 |
35 or 65 | 5.1 | 4.2 | 3.6 | 2.3 | 1.6 | 1.3 | 1.1 | 0.7 | 0.5 | 0.4 | 0.2 | 0.2 |
40 or 60 | 5.2 | 4.3 | 3.7 | 2.3 | 1.6 | 1.4 | 1.2 | 0.7 | 0.5 | 0.4 | 0.2 | 0.2 |
50 | 5.3 | 4.4 | 3.8 | 2.4 | 1.7 | 1.4 | 1.2 | 0.8 | 0.5 | 0.4 | 0.2 | 0.2 |
¹ For an estimated percentage not shown in the table, use the formula below to
get standard error approximations. Use this table only for proportions, that is, where the
numerator is a subset of the denominator.
B = Base of estimated percentage
p = Estimated percentage
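The formula referred to in the table note is not reproduced in this text. As a hedged sketch, the code below assumes the usual simple random sampling form for a percentage, SE(p) = sqrt(((1 - f) / f) x p x (100 - p) / B), where f is the 15 percent sampling rate, p is the estimated percentage, and B is its base. This form is an assumption; its results are close to the Table B entries but need not match every cell. The function name and parameters are hypothetical.

```python
import math

SAMPLING_RATE = 0.15  # the 15 percent rate stated in the Table B heading


def unadjusted_se_percentage(percentage, base, sampling_rate=SAMPLING_RATE):
    """Approximate unadjusted standard error, in percentage points.

    percentage: estimated percentage between 0 and 100.
    base: base (denominator) of the estimated percentage.
    """
    multiplier = (1.0 - sampling_rate) / sampling_rate
    return math.sqrt(multiplier * percentage * (100.0 - percentage) / base)


# An estimated 50 percent on a base of 500 (Table B lists 5.3).
se_half = unadjusted_se_percentage(50, 500)
# An estimated 10 percent on a base of 1,000 (Table B lists 2.3).
se_tenth = unadjusted_se_percentage(10, 1000)
```

Because p x (100 - p) is symmetric around 50, an estimate of 10 percent and an estimate of 90 percent on the same base give the same standard error, which is why Table B pairs its row labels.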
Table C. Standard Error Design Factors - 1996 ACS Test
[Design factors are site specific; use the appropriate column.]
Characteristics | Brevard | Rockland | Multnomah | Fulton |
Population | | | | |
Persons | 1.6 | 1.7 | 1.6 | 1.8 |
Families | 1.0 | 0.9 | 1.0 | 1.4 |
Households | 0.8 | 0.6 | 0.7 | 0.9 |
Age | 1.5 | 1.5 | 1.5 | 1.1 |
Sex | 1.4 | 1.5 | 1.3 | 1.1 |
Race | 1.7 | 2.0 | 1.6 | 1.5 |
Hispanic Origin | 1.7 | 1.8 | 1.6 | 1.5 |
Marital Status | 1.2 | 1.2 | 1.3 | 0.9 |
Ancestry | 1.8 | 2.0 | 1.8 | 1.6 |
Household Size | 1.2 | 1.3 | 1.2 | 1.0 |
Household Type and Relationship | 1.3 | 1.3 | 1.3 | 1.1 |
Children Ever Born | 1.3 | 1.3 | 1.4 | 1.3 |
Work Disability and Functional Limitation | 1.4 | 1.4 | 1.3 | 1.0 |
Place of Birth | 1.7 | 1.8 | 1.7 | 1.6 |
Residence 5 years ago | 1.9 | 2.0 | 1.8 | 1.7 |
Year of Entry | 1.7 | 2.0 | 2.4 | 1.0 |
Language Spoken at Home and English Ability | 1.5 | 1.8 | 1.4 | 1.4 |
Educational Attainment | 1.4 | 1.5 | 1.4 | 1.2 |
School Enrollment | 1.4 | 1.3 | 1.3 | 1.1 |
Family Type | 1.2 | 1.3 | 1.2 | 1.1 |
Employment Status | 1.3 | 1.3 | 1.2 | 0.9 |
Industry | 1.5 | 1.5 | 1.4 | 1.1 |
Occupation | 1.5 | 1.5 | 1.4 | 1.1 |
Class of Worker | 1.5 | 1.5 | 1.4 | 1.1 |
Hours Per Week and Weeks Worked in past 12 months | 1.3 | 1.3 | 1.3 | 1.0 |
Number of Workers in Family | 1.2 | 1.2 | 1.2 | 1.0 |
Place of Work | 1.5 | 1.4 | 1.3 | 1.1 |
Means of Transportation to Work | 1.4 | 1.4 | 1.4 | 1.1 |
Travel Time to Work | 1.5 | 1.5 | 1.4 | 1.2 |
Private Vehicle Occupancy | 1.4 | 1.4 | 1.4 | 1.1 |
Time Leaving Home to Go to Work | 1.5 | 1.5 | 1.4 | 1.1 |
Type of Income in the Past 12 Months | 0.9 | 0.8 | 0.8 | 0.8 |
Household Income in the Past 12 Months | 1.3 | 1.4 | 1.3 | 1.1 |
Family Income in the Past 12 Months | 1.3 | 1.4 | 1.3 | 1.1 |
Poverty Status in the Past 12 Months (persons) | 1.8 | 1.9 | 1.8 | 1.5 |
Poverty Status in the Past 12 Months (families) | 1.3 | 1.3 | 1.2 | 1.1 |
Armed Forces and Veteran Status | 1.3 | 1.2 | 1.2 | 0.9 |
Housing | | | | |
Housing units | 0.5 | 0.5 | 0.7 | 1.1 |
Age of Householder | 1.3 | 1.3 | 1.3 | 1.0 |
Race of Householder | 1.1 | 1.1 | 1.1 | 0.9 |
Hispanic Origin of Householder | 1.5 | 1.6 | 1.7 | 1.0 |
Condominium Status | 0.9 | 0.9 | 0.9 | 0.8 |
Units in Structure | 1.1 | 1.1 | 1.0 | 0.9 |
Occupied by Tenure | 1.0 | 1.1 | 1.0 | 0.9 |
Vacant | 1.9 | 1.9 | 1.9 | 1.3 |
Gross Rent | 1.5 | 1.5 | 1.4 | 1.1 |
Year Structure Built | 1.1 | 1.2 | 1.2 | 1.0 |
Rooms, Bedrooms | 1.2 | 1.3 | 1.2 | 1.0 |
Kitchen Facilities, Plumbing Facilities, and Source of Water | 0.6 | 0.5 | 0.5 | 0.7 |
Heating System | 0.6 | 0.4 | 0.7 | 0.8 |
Sewage Disposal | 0.7 | 0.6 | 0.6 | 0.9 |
House Heating Fuel | 0.9 | 0.7 | 1.1 | 0.9 |
Telephone | 1.0 | 1.0 | 1.0 | 0.9 |
Vehicles Available | 1.2 | 1.3 | 1.2 | 1.0 |
Length at Residence | 1.3 | 1.3 | 1.3 | 1.1 |
Mortgage Status and SMOC | 1.2 | 1.2 | 1.2 | 1.1 |
Gross Rent as a Percent of Household Income | 1.6 | 1.6 | 1.4 | 1.2 |
SMOC as a Percent of Household Income | 1.2 | 1.3 | 1.2 | 1.0 |
Air Conditioning | 0.7 | 1.1 | 1.2 | 1.0 |
Value | 1.2 | 1.2 | 1.2 | 1.0 |
Aggregate Persons by Tenure by Units in Structure | 2.0 | 2.2 | 2.0 | 2.0 |
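As a minimal sketch of how a design factor is typically used, the code below assumes the adjusted standard error is the unadjusted standard error (from Table A, Table B, or the corresponding formula) multiplied by the Table C factor for the characteristic and site. The dictionary holds only two rows copied from Table C, and the function and variable names are hypothetical.

```python
# Two rows copied from Table C; extend with other rows as needed.
DESIGN_FACTORS = {
    "Race": {"Brevard": 1.7, "Rockland": 2.0, "Multnomah": 1.6, "Fulton": 1.5},
    "Vacant": {"Brevard": 1.9, "Rockland": 1.9, "Multnomah": 1.9, "Fulton": 1.3},
}


def adjusted_standard_error(unadjusted_se, characteristic, site):
    """Scale an unadjusted standard error by the site-specific design factor.

    Assumes the adjustment is a simple multiplication.
    """
    return unadjusted_se * DESIGN_FACTORS[characteristic][site]


# Unadjusted standard error of 38 (Table A: estimate of 250 in an area
# of size 25,000), for a race characteristic in Rockland County.
race_se = adjusted_standard_error(38, "Race", "Rockland")
```

A factor below 1.0, as for several housing characteristics, reduces the standard error relative to the simple random sampling approximation.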
Footnotes:
(1) Single-Unit: Only one housing unit at the basic street address, or Multi-Unit: More
than one housing unit at the basic street address.
(2) Owner, Renter, or Not applicable. "Not applicable" applies to temporarily occupied
housing units.
(3) Married/Widowed, or Other
(4) A temporary duplicate of each February 1996 CAPI record was
created prior to estimation and assigned a tabulation month of January 1996. This
temporary record was weighted separately through the factor MBF. In the FAF stage, the
total weight of this temporary January 1996 CAPI record was added to the total weight of
its corresponding original February CAPI record. The temporary record was then deleted.
(5) White, Black, or Other
(6) Male or Female
(7) Hispanic or Non-Hispanic