Documentation for Puerto Rico Population Estimates

Source: U.S. Bureau of the Census
Internet Release date: April 30, 1997

The 1996 total population estimate for the Commonwealth of Puerto Rico and the estimates for its municipios were produced using two different methods. The total estimate for the Commonwealth was produced using a cohort component method developed by the Rural Urban Projections office (RUP) of Population Division, while the municipio estimates were calculated using a ratio correlation method developed by the Population Estimates Branch (PEB).

The Cohort Component Method

The cohort component method developed by RUP uses components of population change (births, deaths, and migration) to produce Puerto Rico population estimates by single year of age and sex. To make estimates using the cohort component method, a base population is required. The base year for the 1996 estimates is 1990. Since the Census Bureau makes midyear to midyear estimates, the census population was moved to midyear using the intercensal growth rate.

Base mortality for 1990 for Puerto Rico was derived by calculating a life table using an average of registered age-specific deaths for 1989-1991 and the aforementioned midyear 1990 population estimate. Registered deaths by age and sex for 1990-1993 were used to project the population from midyear 1990. For 1994, only total deaths and infant deaths were available.

In the case of fertility, the population was estimated using total reported births by age of mother, by sex, for 1990-1993. For 1994, only total registered births were available.

The estimate for migration was derived by using the difference between two independently calculated populations for Puerto Rico. One population estimate used intercensal migration data while the second used housing unit data.
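The step of moving the census population to midyear with the intercensal growth rate can be sketched as follows. This is only an illustration under stated assumptions, not RUP's actual procedure: it assumes exponential growth between two censuses taken on April 1, the function name midyear_population is hypothetical, and the population counts are round placeholder values rather than census figures.

```python
import math
from datetime import date


def midyear_population(pop_prev, date_prev, pop_curr, date_curr, midyear):
    """Carry a census count forward to a midyear date.

    Assumption of this sketch: the population grows exponentially at the
    average annual rate observed between the two censuses.
    """
    years_between = (date_curr - date_prev).days / 365.25
    # Average annual (continuous) intercensal growth rate.
    rate = math.log(pop_curr / pop_prev) / years_between
    # Fraction of a year from the later census date to midyear.
    years_forward = (midyear - date_curr).days / 365.25
    return pop_curr * math.exp(rate * years_forward)


# Hypothetical round counts, used only to show the mechanics.
pop_1980 = 3_200_000
pop_1990 = 3_500_000

base = midyear_population(
    pop_1980, date(1980, 4, 1),
    pop_1990, date(1990, 4, 1),
    date(1990, 7, 1),
)
print(round(base))
```

Because the census date and midyear are only three months apart, the adjustment is small: the midyear base is slightly larger than the census count when the intercensal growth rate is positive.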
Methodology for Puerto Rico Municipio Population Estimates

In the ratio-correlation model used for Puerto Rico, a multiple-regression equation is used to relate changes in the distribution of births, deaths, and housing units to changes in the distribution of population among municipios. For both development of the regression equation and the computation of the population estimates, the variables are transformed by calculating ratios of percentage shares in the later year to corresponding percentage shares in the earlier year. These transformations cause the resulting coefficients in the predicting equation to add to approximately 1.0. The regression equation is given by:

Y predicted = .03 + .13 births + .09 deaths + .75 housing units

The r-square is .81.

Below is an extract of a paper by Michael J. Batutis that provides a general-purpose explanation of the regression method used at various times in the population estimates program. For a hardcopy of the paper call (301) 457-2380.

Subnational Population Estimates Methods of the U.S. Bureau of the Census
Prepared By Michael J. Batutis, Chief, Population Estimates Branch, Population Division, Bureau of the Census
October 1991

(From page 9 of the Batutis paper)

B. Regression Method

Regression in a variety of forms has a long history in population estimates at nearly all levels of geography. In the usual applications, regression is a stock model whereby the population at time t is used as a base and a regression equation is used to estimate, or predict, the population at time t+n. There is no attempt in this method to deal with the demographic dynamics of population change. The application of regression that is used by the Census Bureau traces its roots to a 1954 article by Schmitt and Crosetti in which they tested the accuracy of several methods of estimating population (Schmitt and Crosetti, 1954). One of the methods was a so-called ratio-correlation model, although the reasons for this name are obscure.
The model is a least-squares, linear regression model in which the independent variables are ratios of county proportions of selected symptomatic indicators of population change in the estimate interval to the corresponding proportions in the base interval. The dependent variable in the model is the change in a county's share of the state population between the base point and the estimate date. Schmitt and Crosetti presumably called this method the ratio-correlation method because they chose the symptomatic indicators of population change for the model by examining a zero-order correlation matrix and selecting independent variables that were highly correlated with population. Although this procedure may be desirable, it is not intrinsic to the model and a better name would be ratio-regression, or simply regression. For purposes of this handbook, the term ratio-regression is used.

In equation form, the generalized ratio-regression model is:

Editor's note: formulas did not translate well to the text file; phone author for hardcopy.

(6)  $Y_{t} = X_{1} + B X_{2,t} + B X_{3,t} + \dots + B X_{i,t} + U_{t}$

where
$Y_{t}$ = the estimated value as of the most recent census.
$X_{1}$ = a constant.
$B$ = regression coefficient.
$X_{2} \dots X_{i}$ = independent variable.
$U_{t}$ = a term for random error.

In a ratio-regression model, the dependent variable $y_{t}$ takes the form:

(7)  $y_{t,k} = \frac{P_{t,k} / \sum_{k=1}^{m} P_{t,k}}{P_{t-10,k} / \sum_{k=1}^{m} P_{t-10,k}}$

where
$P$ = total population.
$t$ = the year of the most recent census.
$k$ = an index for geographic areas.
$m$ = the number of geographic areas.

Therefore in the ratio-regression model, the dependent variable is the ratio of the kth geographic area's share of the population across all k areas at the most recent census to the kth geographic area's share of population at the census ten years prior.
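As a small numerical illustration of equation (7), the sketch below computes the dependent variable for three areas. The area labels, the population counts, and the function name share_ratio are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def share_ratio(pop_later, pop_earlier):
    """Ratio of each area's population share at the later census
    to its share at the earlier census, as in equation (7)."""
    total_later = sum(pop_later.values())
    total_earlier = sum(pop_earlier.values())
    return {
        k: (pop_later[k] / total_later) / (pop_earlier[k] / total_earlier)
        for k in pop_later
    }


# Hypothetical counts for three areas at two successive censuses.
pop_1980 = {"A": 100_000, "B": 50_000, "C": 50_000}
pop_1990 = {"A": 110_000, "B": 66_000, "C": 44_000}

y = share_ratio(pop_1990, pop_1980)
for area, value in y.items():
    print(area, round(value, 3))
```

In this example area A grows at the same rate as the total, so its share is unchanged and its ratio is 1.0; area B gains share (ratio above 1) and area C loses share (ratio below 1).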
If the additional step is taken of subtracting 1 from $y_{t}$, as in

(8)  $\hat{y}^{*}_{t,k} = \hat{y}_{t,k} - 1.0$

then the interpretation of $\hat{y}^{*}_{t,k}$ is the change in population share in the kth geographic area from one census to the next.

The independent variables are defined in a similar way in the ratio-regression model, except that each independent variable represents a different symptomatic indicator variable, like school enrollment or births. In equation form, the independent variables are:

(9)  $X_{i,t,k} = \frac{X_{i,t,k} / \sum_{k=1}^{m} X_{i,t,k}}{X_{i,t-10,k} / \sum_{k=1}^{m} X_{i,t-10,k}}$

where
$X$ = the value of the symptomatic indicator variable.
$i$ = an index for the symptomatic indicators.
$t$ = the most recent decennial census year.
$k$ = an index for geographic areas.
$m$ = the number of geographic areas.

All of these equations really express the rather simple idea that the change in a geographic area's (say, a county's) share of a number of symptomatic indicators over all counties in a state is related to the change in that county's share of state population. The choice of symptomatic indicator variables is guided by a demonstrated or presumed relationship to population coupled with the availability of the symptomatic indicator data in the post-censal period. This last point is particularly important since the application of the ratio-regression model to a population estimate requires the substitution of current data into the independent variables. Thus, equation (9) becomes

(10)  $X_{i,t+n,k} = \frac{X_{i,t+n,k} / \sum_{k=1}^{m} X_{i,t+n,k}}{X_{i,t,k} / \sum_{k=1}^{m} X_{i,t,k}}$

where
$n$ = the number of years between the most recent census and the estimate date.
The result of the model, $\hat{y}$, becomes

(11)  $\hat{y}_{t+n,k} = \frac{P_{t+n,k} / \sum_{k=1}^{m} P_{t+n,k}}{P_{t,k} / \sum_{k=1}^{m} P_{t,k}}$

and

(12)  $\hat{y}^{*}_{t+n,k} = \hat{y}_{t+n,k} - 1.0$

The ratio-regression model is used to estimate the change in a geographic sub-area's share of a larger geographic area's population since the last decennial census. The estimated change is used to compute a new share at the estimate date, and applying the new share to an independent population total for the parent geographic area results in the estimated population at time t+n for the kth geographic sub-area.

The effectiveness of the ratio-regression model for preparing population estimates depends on two major factors. First, the method presumes that an accurate population estimate for some higher level of geography is available. If the higher level estimate has a large degree of bias, the ratio-regression method may distort local variations in population. Second, and more seriously, the model assumes stability among the relationships of the independent and dependent variables. The regression coefficients are calculated on the basis of the experience of the prior decade and are held constant through the post-censal estimating interval until the next succeeding census. If the assumed relationships change differentially over time for various types of sub-areas, the method will produce a seriously distorted distribution of population.

Another factor that affects the accuracy of the ratio-regression model is multicollinearity among the independent variables. The choice of independent variables is based on the strength of their correlation with population. Therefore, the independent variables are also likely to exhibit high correlations among themselves. If the pattern of intercorrelation changes over time, the accuracy of the regression coefficients is reduced.
The strengths of the ratio-regression method are ease of application, provided that symptomatic indicators are available at the appropriate levels of geography, and the flexibility and stability that are inherent in the use of symptomatic indicator data. The method has been widely applied in numerous estimating environments and variations are suggested frequently. The model has generated a voluminous literature and has generally tested well against decennial census results.
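To show how the pieces described in this document fit together, the sketch below applies the Puerto Rico municipio equation quoted earlier (.03 + .13 births + .09 deaths + .75 housing units) to three areas. It is a minimal illustration, not the Population Estimates Branch's production procedure. The area labels, all counts, the control total, and the function names are hypothetical. The final rescaling of shares so that they sum to 1 is an assumption of this sketch; the text above states only that the new share is applied to an independent total for the parent area.

```python
def shares(values):
    """Each area's proportion of the total across all areas."""
    total = sum(values.values())
    return {k: v / total for k, v in values.items()}


def share_ratios(later, earlier):
    """Ratio of later-period share to earlier-period share, by area."""
    s_later, s_earlier = shares(later), shares(earlier)
    return {k: s_later[k] / s_earlier[k] for k in later}


# Coefficients from the Puerto Rico equation quoted in this document.
INTERCEPT, B_BIRTHS, B_DEATHS, B_HOUSING = 0.03, 0.13, 0.09, 0.75


def estimate_populations(base_pop, ind_base, ind_now, control_total):
    """Estimate area populations with the ratio-regression approach."""
    ratios = {
        name: share_ratios(ind_now[name], ind_base[name])
        for name in ("births", "deaths", "housing")
    }
    base_share = shares(base_pop)
    raw_share = {}
    for k in base_pop:
        # Predicted ratio of the new population share to the census share.
        y_hat = (INTERCEPT
                 + B_BIRTHS * ratios["births"][k]
                 + B_DEATHS * ratios["deaths"][k]
                 + B_HOUSING * ratios["housing"][k])
        raw_share[k] = y_hat * base_share[k]
    # Assumption of this sketch: rescale so the shares sum to exactly 1,
    # which forces the area estimates to add to the control total.
    total_raw = sum(raw_share.values())
    return {k: control_total * s / total_raw for k, s in raw_share.items()}


# Hypothetical census populations and indicator counts for three areas.
base_pop = {"A": 100_000, "B": 60_000, "C": 40_000}
ind_base = {
    "births": {"A": 1_500, "B": 900, "C": 600},
    "deaths": {"A": 800, "B": 480, "C": 320},
    "housing": {"A": 40_000, "B": 24_000, "C": 16_000},
}
ind_now = {
    "births": {"A": 1_500, "B": 1_000, "C": 500},
    "deaths": {"A": 800, "B": 480, "C": 320},
    "housing": {"A": 41_000, "B": 27_000, "C": 15_000},
}
control_total = 210_000  # hypothetical independent estimate for the parent area

estimates = estimate_populations(base_pop, ind_base, ind_now, control_total)
for area, pop in estimates.items():
    print(area, round(pop))
```

Since the four coefficients sum to 1.0, an area whose indicator shares are all unchanged gets a predicted ratio of 1.0 and keeps its census share. In the hypothetical data, area B gains share of births and housing units, so its estimated population share rises above its census share.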