U.S. Census Bureau

 Small Area Income & Poverty Estimates

 Model-based Estimates for States, Counties, & School Districts


2000 County-Level Estimation Details

The 2000 state and county estimates of poverty and income were released in October 2003. For an overview of the changes in methodology between this release and the previous release, see Estimation Procedure Changes.


Here are some points to consider about the 2000 estimates of poverty for counties:

Using counties in the ASEC sample. Our use of the ASEC implicitly assumes that the counties in the survey sample are representative of those not selected, but this need not be the case. The ASEC sample is designed to represent each state's population and only incidentally represents counties. The characteristics of some counties guarantee that they are included, e.g., most counties in large metropolitan areas and counties with large populations. More generally, while all counties have a nonzero probability of being included in the sample, some have higher probabilities than others. Further, the probability of selecting a county may be related to its income and poverty level. On the other hand, a comparison of regression equations based on census data for counties in the ASEC sample with equations based on all counties indicates remarkably similar results, providing some assurance that the ASEC counties are largely representative of all counties.

The survey weights used in estimation at the national level are not appropriate for county-level estimates. The ASEC sample design selects some primary sampling units (usually a county or group of counties) to represent a set of counties in the same stratum. The sum of the weights for sample households from such a county estimates the total population of the entire set of counties it represents. Because we want each county in the ASEC sample to stand for itself, we have adjusted the weights to make each county self-representing.

Estimation of the model equation. ASEC sampling variances are not constant over all counties. We avoid giving observations with larger variances (a great deal of uncertainty) the same influence on the regression as observations with smaller variances (less uncertainty) by, in effect, weighting each observation by the inverse of its uncertainty. Representing this uncertainty requires recognizing that it arises from two sources: sampling error in the direct survey estimate and lack of fit of the regression model.

To estimate the lack-of-fit component, we estimate our model using the Census 2000 data and assume that the lack-of-fit component of residual variance is the same when the same model is fit to the ASEC and to the census. Since we have separate estimates of sampling variance for each observation in Census 2000, we use them to estimate the unknown lack-of-fit component with a maximum likelihood procedure (see "Chapter 8: Accuracy of the Data" in Census 2000, STF3 documentation).

Next we fit a regression equation to the ASEC data. We assume the sampling variance of the log of the number of people in poverty is inversely proportional to the square root of the sample size (in households) and the lack-of-fit variance is the same as that estimated in the census regression. We estimate the ASEC regression parameters and the two components of the ASEC variance with a maximum likelihood procedure.
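The inverse-variance weighting described above can be illustrated with a small weighted least squares fit. This is a minimal sketch, not the production procedure: it treats both variance components as known, whereas the actual estimation obtains them by maximum likelihood, and it uses a single predictor for brevity. The function names and the proportionality constant k are hypothetical.

```python
import math

def total_variance(lack_of_fit_var, k, n_households):
    """Lack-of-fit variance plus a sampling variance assumed to be
    inversely proportional to the square root of the sample size.
    k is a hypothetical proportionality constant."""
    return lack_of_fit_var + k / math.sqrt(n_households)

def weighted_line_fit(x, y, total_var):
    """Fit y = a + b*x, weighting each observation by 1 / total_var."""
    w = [1.0 / v for v in total_var]
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b
```

Counties with small samples get a larger total variance and therefore less influence on the fitted coefficients.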

Combining model and direct survey estimates. Final estimates are weighted averages of the model predictions and the direct ASEC estimates, where they exist. The two weights for each county add to 1.0, and we compute the weight on the model prediction as the sampling variance divided by the total variance (sampling plus lack-of-fit) of the direct estimate. With this technique, the larger the sampling variance of the direct estimate, the smaller its contribution and the larger the contribution from the prediction model. These weights are commonly referred to as "shrinkage weights," and the final estimates as "shrinkage" or "Empirical Bayes" estimates. For counties not in the ASEC sample, the weight on the model's predictions is one and the weight on the direct survey estimate is zero.
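The weighting rule in the preceding paragraph can be written out as a short sketch. The function name and argument names are hypothetical; all quantities are on the log scale, and a county outside the ASEC sample is represented here by passing None for the direct estimate.

```python
def shrinkage_estimate(model_log, direct_log, sampling_var, lack_of_fit_var):
    """Return (combined log-scale estimate, weight on the model prediction).

    The weight on the model prediction is the sampling variance divided by
    the total variance (sampling plus lack-of-fit) of the direct estimate.
    """
    if direct_log is None:
        # County not in the sample: the model prediction gets all the weight.
        return model_log, 1.0
    w_model = sampling_var / (sampling_var + lack_of_fit_var)
    combined = w_model * model_log + (1.0 - w_model) * direct_log
    return combined, w_model
```

For example, with a sampling variance three times the lack-of-fit variance, the model prediction receives weight 0.75 and the direct estimate 0.25.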

Controlling to State Estimates. The last steps in the production process are transforming the county estimates from the log scale to estimates of numbers and controlling them to the independently derived state estimates. We make a simple ratio adjustment to the county-level estimates to ensure that they sum to the state totals. We control model-based estimates at the state level to the national level direct estimates derived from the ASEC. We adjust the estimated standard errors of the county estimates to reflect this additional level of control.
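The simple ratio adjustment can be sketched as follows, assuming the county estimates have already been transformed from the log scale to numbers. The function name is hypothetical, and the sketch does not cover the corresponding adjustment to the standard errors.

```python
def control_to_state_total(county_estimates, state_total):
    """Scale one state's county estimates so that they sum to state_total."""
    raw_sum = sum(county_estimates)
    if raw_sum <= 0:
        raise ValueError("county estimates must have a positive sum")
    factor = state_total / raw_sum
    return [c * factor for c in county_estimates]
```

Every county in a state is multiplied by the same factor, so the counties' shares of the state total are unchanged.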

The estimates of the number of school-aged children in poverty are handled slightly differently. The Department of Education, a major sponsor of the SAIPE project, requires that the estimated numbers of school-aged children in poverty be integers. We use an algorithm to round the counties’ estimates so that the estimates of school-aged children in poverty for the counties in each state sum to that state’s estimate. Note that this algorithm is first applied to the states’ estimates, so they are integers and add to the integer-valued national estimate.
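The text does not name the rounding algorithm. One common way to round a set of values to integers while preserving an integer total is the largest-remainder method, sketched below purely as an illustration; it is an assumption, not necessarily the algorithm used in production, and the function name is hypothetical.

```python
import math

def round_preserving_total(values, integer_total):
    """Round values to integers that sum exactly to integer_total.

    Largest-remainder method: take the floor of every value, then add 1
    to the values with the largest fractional parts until the total is met.
    """
    floors = [math.floor(v) for v in values]
    shortfall = integer_total - sum(floors)
    if not 0 <= shortfall <= len(values):
        raise ValueError("values must sum to approximately integer_total")
    # Indices ordered from largest to smallest fractional part.
    order = sorted(range(len(values)),
                   key=lambda i: values[i] - floors[i],
                   reverse=True)
    result = list(floors)
    for i in order[:shortfall]:
        result[i] += 1
    return result
```

Applied first to the states against the national total and then to the counties within each state, a scheme of this kind keeps all three levels consistent.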

We do not control estimates of county median household income to the state medians because the estimation model does not produce the entire household income distribution, which would be required to do so.

Standard Errors and Confidence Intervals. One goal of our small area estimation work is providing estimates of the uncertainty surrounding the estimates of the numbers of people in poverty. The census and model-based estimates shown in the tables are accompanied by their 90-percent confidence intervals. These intervals were constructed from estimated standard errors.

For the model-based estimates, the standard error depends mainly on the uncertainty about the model and the ASEC sampling variance. While the variance of the shrinkage weights could also be a significant component of uncertainty about our estimates (if sizeable and ignored, we would be underestimating the standard errors), our research indicates that its contribution is negligible.

For the census, we derive the standard errors from a set of generalized variance functions that reflects the nature of the census sample design for the long form questionnaire (for further information, see Quantifying Uncertainty in the Estimates).

The Model for Total Number of People in Poverty

The model is multiplicative; that is, we model the number of people in poverty as the product of a series of predictors that are numbers (not rates), and we model the unknown errors. To estimate the coefficients in the model, we take logarithms of the dependent and all independent variables. Our choice of a multiplicative model is motivated, in part, by the fact that the distribution of the number in poverty has a huge range -- from zero in some counties to more than a million in the largest county (with a mean of 10,000), based on the Census 2000 -- and the distribution is highly skewed. Taking the logarithm of all variables makes their distributions more centered and symmetrical and has the effect of diminishing the otherwise inordinate influence of large counties on the coefficient estimates. Another advantage of a multiplicative model is that it makes it plausible to maintain that the (unobserved) errors for every county, no matter how large or small, are drawn from the same distribution.
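The equivalence between the multiplicative form and the linear-in-logarithms form can be checked numerically. This is a minimal sketch with hypothetical function names and made-up coefficients; it omits the error term and says nothing about how counties with a count of zero, for which the logarithm is undefined, are treated.

```python
import math

def predict_log_scale(intercept, coefs, predictors):
    """Linear form: log(y) = intercept + sum of coef_k * log(x_k)."""
    return intercept + sum(b * math.log(x) for b, x in zip(coefs, predictors))

def predict_count(intercept, coefs, predictors):
    """Multiplicative form: y = exp(intercept) * product of x_k ** coef_k."""
    return math.exp(intercept) * math.prod(x ** b for b, x in zip(coefs, predictors))

# Illustrative values only; these are not estimated coefficients.
log_y = predict_log_scale(0.1, [0.6, 0.4], [1200.0, 800.0])
y = predict_count(0.1, [0.6, 0.4], [1200.0, 800.0])
```

Here math.log(y) and log_y agree up to floating-point error, which is why the coefficients can be estimated by regression on the logged variables.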

The predictor variables in the regression model used to estimate the total number of people in poverty by county for income year 2000 are:

For further information on these variables see Information about Data Inputs.

The dependent variable is the log of the total number of people in poverty in each county as measured by the three-year average of values from the ASEC for 2000, 2001, and 2002. We combine the regression predictions, in the log scale, with the logs of the direct ASEC sample estimates, and then transform the results into estimates of the numbers of people in poverty. Finally, we control the estimates to the independent estimates of state totals.

The Model for the Number of Related Children Ages 5 to 17 in Families in Poverty

The estimation model for related children age 5 to 17 in poverty parallels that for all people in poverty in structure. There are five predictor variables:

For further information on these variables see Information about Data Inputs.

The dependent variable is the log of the number of related children in poverty ages 5 to 17 in each county as measured by the three-year weighted average of the ASEC for 2000, 2001, and 2002. We combine the regression predictions, in the log scale, with the logs of the direct ASEC sample estimates, and then transform the results into estimates of the numbers in poverty. Finally, we control the estimates to the independent estimates of state totals.

The Model for the Number of People Under Age 18 in Poverty

The estimation model for people under age 18 in poverty is quite similar. There are five predictor variables:

For further information on these variables see Information about Data Inputs.

The dependent variable is the log of the number of people in poverty under age 18 in each county as measured by the three-year weighted average of the ASEC for 2000, 2001, and 2002. We combine the regression predictions, in the log scale, with the logs of the direct ASEC sample estimates, and then transform the results into estimates of the numbers in poverty. Finally, we control the estimates to the independent estimates of state totals.

The Model for Median Household Income

Like the models for the number of people in poverty, the model for median household income is multiplicative. A consequence of the multiplicative form and the model performing well relative to the direct ASEC estimates of median household income is that the standard errors of the estimates are proportional to the point estimates. In other words, the unobserved errors associated with wealthy counties are larger than the unobserved errors in counties with high proportions of poverty. To estimate the model, we take logarithms of the dependent and all independent variables; i.e., the model is linear in logarithms. However, we report median household income in the linear scale and, as a result, the confidence intervals are asymmetric. The predictor variables in the regression model used to generate the estimate for 2000 county median household income are:

We define the nonfiler rate as the difference between the estimated total population and the total exemptions claimed on IRS tax returns, divided by the estimated total population. For further information on these variables see Information about Data Inputs.
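The asymmetry of the confidence intervals for median household income follows from building the interval on the log scale and then exponentiating its endpoints. The sketch below assumes a normal approximation, with 1.645 as the two-sided 90-percent critical value; the function name and inputs are hypothetical.

```python
import math

Z_90 = 1.645  # two-sided 90-percent critical value of the standard normal

def log_scale_interval(log_estimate, log_se, z=Z_90):
    """Return (point, lower, upper) in the linear scale from a log-scale
    estimate and its log-scale standard error."""
    point = math.exp(log_estimate)
    lower = math.exp(log_estimate - z * log_se)
    upper = math.exp(log_estimate + z * log_se)
    return point, lower, upper
```

The upper bound lies farther from the point estimate than the lower bound does, and the width of the interval grows in proportion to the point estimate.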

The dependent variable is the log of county median household income interpolated from the three ASEC surveys conducted in 2000, 2001, and 2002. We adjust the 2000 and 2002 ASEC surveys to express incomes in 2000 dollars before computing median household income, using the official Consumer Price Index for All Urban Consumers (CPI-U).
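The price adjustment amounts to rescaling each income by the ratio of the base-year index to the index for the year in which the income was received. A minimal sketch, with a hypothetical function name and placeholder index values rather than published CPI-U figures:

```python
def to_base_year_dollars(amount, cpi_income_year, cpi_base_year):
    """Express an income amount in base-year dollars using a price index."""
    return amount * cpi_base_year / cpi_income_year

# Placeholder index values for illustration only, not official CPI-U data.
adjusted = to_base_year_dollars(30000.0, cpi_income_year=100.0, cpi_base_year=110.0)
```

The adjustment is applied to the incomes before the median is computed.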


Source: U.S. Census Bureau, Data Integration Division, Small Area Estimates Branch