LandScan Document

LandScan Global Population 1998 Database

For more information and/or a copy of the LandScan Global Population 1998 Database please contact:
Eddie A. Bright
Oak Ridge National Laboratory
P.O. Box 2008, MS 6237
Oak Ridge, TN 37831

August, 2002

LandScan Global Population 2001 Database Release

LandScan Global Population 2000 Database Release (ORNL Website). The Global GIS DVD volume switched to this dataset.

Global GIS Raster Warning:

For the Global GIS product to make use of this dataset, some processing was done to the original raster data. ArcView Data Publisher cannot query image pixel values. To overcome this problem, a compromise was made: we created a 0.1 by 0.1 degree point file to store the image pixel values. Thus any values obtained from this method are approximate. For example, you may wish to query the exact top of a mountain peak for its elevation. If this point does not fall exactly at a 0.1 x 0.1 degree point, the value returned to the user will be from the closest point. The value will not be as precise as the original dataset.

(Inserted by Trent Hare)
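The closest-point behavior described above can be illustrated with a minimal sketch. It assumes that the point file's nodes fall on exact multiples of 0.1 degree; the function name `nearest_grid_point` and the sample coordinate are hypothetical and are not part of the Global GIS product.

```python
def nearest_grid_point(lon, lat, spacing=0.1):
    """Snap a coordinate to the closest node of a regular point grid.

    Assumes grid nodes sit on exact multiples of `spacing` degrees.
    """
    # Count whole grid steps first, then convert back to degrees.
    col = round(lon / spacing)
    row = round(lat / spacing)
    # Rounding to 10 decimals hides floating-point noise such as 86.90000000000001.
    return round(col * spacing, 10), round(row * spacing, 10)


# A query placed on a summit is answered from the closest stored point,
# so the value returned belongs to a location up to 0.05 degree away on each axis.
query = (86.925, 27.988)
snapped = nearest_grid_point(*query)
offset = (abs(query[0] - snapped[0]), abs(query[1] - snapped[1]))
print(snapped, offset)
```

Any value read this way describes the stored point, not the queried location, which is why peak elevations and similar extremes are likely to be understated.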

Portions of this original document were deleted because not all the datasets from LandScan are used in the Global GIS product. The single point population count is an approximate value because the data were thinned from 30" x 30" to 6' x 6' (where " is seconds and ' is minutes). The total population calculation uses the same 6' x 6' sample size, but was calculated with an aggregate formula that sums the values prior to thinning. The same thinning process was applied to the Lights at Night, Land Cover, and Slope datasets from the LandScan product.

(Inserted by Trent Hare)
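The difference between the thinned point value and the aggregated total can be sketched as follows. Going from 30" to 6' spacing means each coarse sample stands for a 12 x 12 block of original cells. Which cell of the block survives thinning is not stated in this document; keeping the upper-left cell is an assumption made only for illustration, and the grid values are synthetic.

```python
FACTOR = 12  # 6 arc-minutes / 30 arc-seconds = 12 original cells per side


def thin(grid, factor=FACTOR):
    """Keep one sample per block (the upper-left cell, an assumption)."""
    return [row[::factor] for row in grid[::factor]]


def block_sum(grid, factor=FACTOR):
    """Sum every factor x factor block so the overall total is preserved."""
    rows, cols = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            )
            for c0 in range(0, cols, factor)
        ]
        for r0 in range(0, rows, factor)
    ]


# Synthetic 24 x 24 grid: 1 person per cell plus one 500-person town cell.
fine = [[1] * 24 for _ in range(24)]
fine[5][7] = 500

sampled = thin(fine)       # point values: the town cell is missed entirely
totals = block_sum(fine)   # aggregate values: the town is counted in its block
print(sampled, totals)
```

In this sketch the thinned samples never see the town, while the block sums still add up to the full population. That mirrors why the single point count is approximate but the total population figure is not affected by thinning.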

Summary
Introduction
Ambient Versus Residential Population
Best Available Population Databases

Input Variables
- Roads
- Slope
- Land Cover
- Populated Places
- Nighttime Lights
- Exclusion Areas
- Urban Density Factor
- Coastlines
Population Model

Results
Conclusions
References

Summary

The LandScan Global Population Project is a worldwide population database at 30" X 30" resolution for estimating ambient populations at risk. Best available census counts are distributed to cells based on probability coefficients which, in turn, are based on road proximity, slope, land cover, and nighttime lights. Implementation will proceed region by region to complete global coverage in approximately one year. Version 1.2 has been completed for the entire world. Verification and validation (V&V) studies have been conducted routinely for all regions and more extensively for portions of the Middle East and the Southwestern United States.

Introduction

Natural and manmade disasters place vast populations at risk, often with little or no advance warning. Geographic information is essential for quick and effective response. How will a contaminant be dispersed? Where will it go? How many people are at risk? Who are they? Where are they? Emergency response by the United Nations, the United States, and other national and international organizations requires simulation of contaminant transport by air and water plus improved estimates of global population distribution.

Air diffusion models available today are capable of estimating contaminant plumes at spatial precisions far exceeding those of most official censuses. For many years, the U.S. Census Bureau has enhanced the precision of global population estimates through a manual procedure designed to allocate rural populations to 20' X 30' cells and urban populations to circles centered on major population concentrations. Yet, analysis of most hazardous releases requires data resolutions on the order of 1 square kilometer or even finer. To meet this need, Oak Ridge National Laboratory (ORNL) has developed an automated procedure to allocate rural and urban population distributions to 30" X 30" cells. The resulting population distribution can be used for (a) emergency response to natural disasters, terrorist incidents, or other threats; (b) humanitarian relief in famines and other long term disasters; (c) protection of civilian populations; (d) estimation of populations affected by global sea level rise; and (e) numerous other environmental and demographic applications.

ORNL's Global Population Project, part of a larger global database effort called LandScan, collects best available census counts (usually at province level) for each country, calculates a probability coefficient for each cell, and applies the coefficients to the census counts, which are employed as control totals for appropriate areas (usually provinces). Ideally, the polygons associated with aggregate populations are administrative units with accurate census counts, but the procedure will work for any polygon. The probability coefficient is based on slope, proximity to roads, land cover, nighttime lights, and an urban density factor. GIS is essential for conflation of diverse input variables, computation of probability coefficients, allocation of population to cells, and reconciliation of cell totals with aggregate (usually province) control totals. Remote sensing is an essential source of two input variables (land cover and nighttime lights) and one ancillary database (high-resolution panchromatic imagery) used in verification and validation (V&V) of the population model and resulting LandScan database.

Ambient Versus Residential Population

The resulting LandScan distribution represents an ambient population which integrates diurnal movements and collective travel habits into a single measure. This is desirable for purposes of emergency response and, fortuitously, is easier to accomplish with currently available global imagery and other geographic data. Consider, for example, the hypothetical case of a cell with a major, multilane highway passing through an uninhabited desert. If a nuclear, biological, or chemical (NBC) release contaminates the cell, many lives will be at risk even though no one lives there. Most official census counts, if available at such fine resolution, would show zero population because most national censuses are concerned with residential population based primarily on where people sleep rather than where they work or travel. In the LandScan procedure, population will be apportioned to the cell based on the presence of a highway and perhaps on nighttime lights emanating from ambient traffic. Consider another cell containing a large agricultural field and no houses. Most censuses would place farm workers in their village residences and record zero populations for their fields. Yet, a few lives are at risk in the fields, depending on when the NBC release occurs. Hence, our procedure will show a small population in the crop cell and a slightly reduced population in the village to suggest, albeit imprecisely, the collective time that villagers are in their fields rather than their homes. Even arid grassland cells will have sparse populations assigned to simulate the movements of nomads and other herders. The same can be said of factories, airports, and other places of work and travel. Currently, we integrate all ambient population into a single value for each cell and do not attempt to distinguish the timing of such movements.

Best Available Population Databases

Census Counts

All population counts, even the most sophisticated high-resolution official censuses of advanced nations like the United States, are stochastic estimates. Accuracy and precision are limited by the census takers' access to homes and even to whole neighborhoods; by the census takers' understandings of personal work and travel habits; and by the frequency with which censuses can be undertaken. These limits are exacerbated in many nations due to lack of resources and, all too often, outright manipulation of census figures to meet political objectives. In addition, many nations are reluctant to release detailed census counts, and some release only a national total. For most of the world, the best available official census data are at province level (i.e., one administrative division below national) and of varying age, sometimes decades old. A few nations (e.g., Israel) release high-quality census counts for sub-provinces, but only a few release the geometry of sub-province boundaries in digital form (e.g., U.S. Census TIGER files).

The variable quality of census figures from country to country presents a major challenge to global population distribution efforts such as LandScan. Official census counts must be acquired from published sources and evaluated skeptically. Fortunately, for most countries the demographic literature is surprisingly rich, deficiencies are recognized by scholars, and adjustments have been proposed in the literature. In addition to published articles and reports, the World Wide Web has become an invaluable resource in locating and acquiring population data and understanding consensus and disagreement among demographers. Ultimately, ORNL analysts must choose a single number for each nation or province based on their own professional judgments of arguments and evidence offered by demographers. We reiterate that our purpose is not to count people and certainly not to count them in their nighttime residences. Our purpose is to distribute populations based on their likely ambient locations integrated over a 24-hour period for typical days, weeks, and seasons.

P-95 Circles and Rural Cells

Since 1965, the Geographic Studies Branch of the Center for International Research (CIR) of the U.S. Bureau of the Census has generated the most authoritative and, prior to 1995, the finest spatial resolution population database available for the whole earth (Leddy, 1994). CIR acquires latest census counts; conducts extensive evaluations; projects total country population growth based on births, deaths, and migrations; distributes country population to small areas; and projects small area populations annually for 12 years. Rural populations are allocated to cells measuring 20' latitude by 30' longitude. In certain areas, such as the United States, Western Europe, and Israel, rural populations are allocated to "mini-cells" measuring 5' X 7.5'. Urban agglomerations of 25,000 people or more are covered by one or more circles encompassing at least 95% of the population. These features, ranging from 0.3 to 2.0 nautical miles in radius, are known as P-95 circles. Each circle must contain at least 5,000 people, and at least 80% of the area covered by large circles (0.6 nautical mile radius or greater) must be residential built-up. Smaller circles (0.5 nautical mile radius) often are placed on the expanding edge of cities in anticipation of future growth.

The Global Demography Project

The Global Demography Project (Tobler et al., 1995), conducted by the National Center for Geographic Information and Analysis (NCGIA), developed a 1994 population database at 5' X 5' resolution for most of the world (57° S to 72° N). This constitutes the finest resolution global population database yet produced. However, its utility is limited due to three factors acknowledged by its authors. (1) Census data were obtained from the United Nations Statistical Division, which makes no attempt to evaluate the accuracy of census counts provided by individual nations. (2) Census dates, ranging from 1979 to 1994, were projected to 1994 based on annual growth rates by country also provided by the United Nations. (3) The algorithm employed to distribute population from administrative units (usually provinces) to cells is purely cartographic and is based on population alone. The authors note certain types of error resulting from these factors, and suggest that improvement would result from a "smart" interpolation or co-Kriging that incorporates ancillary data such as location and size of towns and cities, roads, railroads, natural features, and nighttime lights.

Input Variables

Calculation of the probability coefficient for each cell depends on publicly available databases offering worldwide coverage of roads, slope, land cover, and nighttime lights at scales of 1:1,000,000 or larger and resolutions of 1 km. or finer. The sources and characteristics of current databases are discussed in this section. All data are processed and transformed into a 30" X 30" lat/lon grid cell system.
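For readers working with the 30" X 30" grid, the following minimal sketch shows how a coordinate maps to a cell and how cell area shrinks with latitude. The grid origin (upper-left corner at 180° W, 90° N), the spherical earth radius, and the function names are assumptions for illustration and are not taken from the LandScan documentation.

```python
import math

CELLS_PER_DEGREE = 120      # 3600 arc-seconds / 30 arc-seconds
EARTH_RADIUS_KM = 6371.0    # mean spherical radius, an approximation


def cell_index(lon, lat):
    """Row and column of the cell containing a point.

    Assumes row 0 starts at 90 N and column 0 starts at 180 W.
    """
    col = math.floor((lon + 180.0) * CELLS_PER_DEGREE)
    row = math.floor((90.0 - lat) * CELLS_PER_DEGREE)
    return row, col


def cell_area_km2(lat):
    """Approximate area of a 30" x 30" cell centred at the given latitude."""
    side_ns = math.radians(1.0 / CELLS_PER_DEGREE) * EARTH_RADIUS_KM
    side_ew = side_ns * math.cos(math.radians(lat))
    return side_ns * side_ew


print(cell_index(0.0, 0.0))
print(round(cell_area_km2(0.0), 3), round(cell_area_km2(60.0), 3))
```

Under these assumptions a cell covers somewhat less than 1 square kilometer at the equator and about half that at 60° latitude, so cell counts should be read as counts per cell rather than as densities per square kilometer.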

Transportation networks (i.e., roads, railroads, airports, and navigable waterways) are primary indicators of population. As a single indicator, roads are preferred because of their vital role in human settlements with or without other forms of transport. It would be helpful to know the location of all roads and to calculate road densities as suggestive of population densities, but this is not possible for most of the world. The United States is an exception due to the availability of Census TIGER files which include the geometry of local roads and even some private driveways and farm roads. The best universal coverage of road networks comes from the National Imagery and Mapping Agency's (NIMA) Vector Smart Map (VMAP) series. VMAP-Level 0 (formerly Digital Chart of the World) is publicly available and covers the entire world at 1:1,000,000 scale. We consider VMAP-Level 0 a staple source for global coverage of road networks, though we plan to include VMAP-Level 1 data (1:250,000 scale) in future iterations as tiles become available.

Slope is an important variable in calculating the LandScan population probability coefficient because most human settlements occur on flat to gently sloping terrain. Even in regions noted for hillside settlement, relative measures of slope may correspond (inversely) with population density. The ideal measure of slope would be the area (at resolutions approaching the typical size of individual home sites) in each slope category, expressed as a percentage of LandScan cell area. LandScan's slope resolution is limited, however, by data availability and by the processing burden that would be required for global coverage. Hence, LandScan employs NIMA's Digital Terrain Elevation Data (DTED) Level 0, 30 Arc Second Terrain Data. We calculate a single gradient for each 1 km. LandScan cell.
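The document does not say how the single gradient per cell is computed. The sketch below assumes a central-difference estimate over a 3 x 3 window of elevations, with east-west spacing adjusted for latitude; the function name and the metres-per-degree constant are illustrative assumptions, not the LandScan procedure.

```python
import math

METRES_PER_DEGREE = 111_320.0  # approximate length of one degree of latitude


def slope_degrees(window, lat, cell_deg=30 / 3600):
    """Slope of the centre cell of a 3 x 3 elevation window, in degrees.

    `window` holds elevations in metres; rows run north to south and
    columns west to east. Central differences are an assumed method.
    """
    dy = cell_deg * METRES_PER_DEGREE            # north-south cell size
    dx = dy * math.cos(math.radians(lat))        # east-west cell size
    dz_dx = (window[1][2] - window[1][0]) / (2 * dx)
    dz_dy = (window[0][1] - window[2][1]) / (2 * dy)
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


flat = [[500, 500, 500]] * 3
ramp = [[400, 500, 600]] * 3   # rises 100 m per cell toward the east

print(slope_degrees(flat, lat=0.0), slope_degrees(ramp, lat=0.0))
```

A gradient computed this way averages terrain over roughly 2 km, so steep but narrow features are smoothed out. That is consistent with the resolution limits the text acknowledges.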

Perhaps the best single indicator of population density is land cover type. With local knowledge and well-structured in situ sampling one conceivably might determine average densities per unit of area for each land cover type which then could be multiplied by the total area occupied by that type. In most regions population would range from extremely low density in desert, water, wetlands, ice, or tundra land cover to high density in developed land cover associated with urban land use. Arid grasslands, forests, and cultivated lands would range in between. Globally, of course, such rigorous in situ sampling is infeasible, especially in politically sensitive areas. Alternatively, LandScan analysts assign relative weights to each land cover type and employ these weights in calculating the probability coefficient for each cell.

Even at 1 km. resolution, land cover can be a good indicator of relative population density, and its efficacy improves as resolution approaches the typical size of individual home sites. For example, the National Oceanic and Atmospheric Administration's (NOAA) Coastal Change Analysis Program (C-CAP) has demonstrated that high intensity developed and low intensity developed land cover can be distinguished reliably for coastal regions of the United States with Landsat Thematic Mapper (TM) imagery at 30 m. resolution (Dobson et al., 1995). Currently, the best land cover database available worldwide is the U.S. Geological Survey's (USGS) Global Land Cover Characteristics (GLCC) database derived from Advanced Very High Resolution Radiometer (AVHRR) satellite imagery at 1 km. resolution (Loveland, 1991). Globally, GLCC is the staple land cover database for calculation of LandScan probability coefficients. Regionally, we find it reasonably reliable for all land cover types except wetlands and developed lands, but there is considerable variation in accuracy from cell to cell. In test comparisons against C-CAP data in the United States, most wetlands were recorded as water in GLCC. For all areas we tested, GLCC's developed land cover category is a rasterized version of VMAP-Level 0's "populated polygons," with attendant limitations which are discussed in the following section.

The LandScan Land Cover Database is derived from the U.S. Geological Survey's (USGS) Global Land Cover Characteristics (GLCC) database with the following substantial modifications:

1. The LandScan Land Cover Database has been georegistered at 30 arc second resolution in a common grid for the entire globe. The original GLCC database was in Goode's Homolosine projection.
2. Considerable effort has been devoted to reconciling the positional accuracy of diverse global databases. Mismatches among databases were most conspicuous on coastlines. On the southern coast of France, for example, positional errors amounted to several kilometers, and these have been corrected. Globally, the LandScan Land Cover database coastlines are based on NIMA's World Vector Shoreline (WVS) at 1:250,000 scale. Typically, this coastline differs somewhat from the related line representing the seaward boundary of administrative units, and both of these differ from the land/water boundary indicated on the GLCC gridded database. In the final LandScan Land Cover Database, Version 1.1, the WVS takes precedence, and water is assigned to all cells extending more than one-half cell beyond the WVS coastline. Wherever the land surface had to be expanded to reach the WVS shoreline, we inserted an "unclassified land" class.
3. The LandScan Land Cover Database contains a much improved "urban" class. We replaced the USGS "urban" class with two new classes: developed and partly developed. The developed class is composed of GLCC's urban cells plus all cells included in the Census Bureau's P-95 circles. The partly developed class is derived from Nighttime Lights of the World and contains all cells with a frequency value of 90% or greater. The "partly developed" class typically includes suburban areas, small towns, and scattered industries, airports, etc.
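The replacement of the "urban" class described in item 3 can be sketched as a per-cell rule. The output codes 1 (Developed) and 25 (Partly Developed) come from the class list in this document. The input code for the GLCC urban class, the precedence of "developed" over "partly developed", and the function name `reclassify` are assumptions for illustration.

```python
DEVELOPED = 1            # LandScan class code from the list in this document
PARTLY_DEVELOPED = 25    # LandScan class code from the list in this document
GLCC_URBAN = 1           # assumed code of the original GLCC "urban" class


def reclassify(glcc_class, in_p95_circle, light_frequency_pct):
    """Return the LandScan land cover class for one cell.

    Assumes that "developed" takes precedence when a cell qualifies
    for both new classes.
    """
    if glcc_class == GLCC_URBAN or in_p95_circle:
        return DEVELOPED
    if light_frequency_pct >= 90:
        return PARTLY_DEVELOPED
    return glcc_class


# (GLCC class, inside a P-95 circle?, nighttime light frequency in percent)
cells = [(1, False, 100), (7, True, 40), (2, False, 95), (2, False, 30)]
print([reclassify(*cell) for cell in cells])
```

The sketch leaves other classes untouched; how lit cells over water or ice are handled is not covered by the text and is not modelled here.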

For further information and detailed land cover class definitions check the Global Land Cover Characteristics (GLCC) database web site at:

http://edcdaac.usgs.gov/glcc/glcc.html

This GLCC web site at EROS Data Center points to the following documents:

USGS, National Mapping Program. 1992. Standards for digital line graphs for land use and land cover, technical instructions. Referral STO-1-2.

Anderson, J. R., E. E. Hardy, J. T. Roach, and R. E. Witmer. 1976. A land use and land cover classification system for use with remote sensing data: U.S. Geological Survey Professional Paper 965.

For further information on Nighttime Lights of the World check the National Geophysical Data Center web site at:

http://julius.ngdc.noaa.gov:8080/production/html/BIOMASS/night.html

______________________________________

LandScan Land Cover Classes:

1 "Developed"
2 "Dry Cropland & Pasture"
3 "Irrigated Cropland"
5 "Cropland/Grassland"
6 "Cropland/Woodland"
7 "Grassland"
8 "Shrubland"
9 "Shrubland/Grassland"
10 "Savanna"
11 "Deciduous Broadleaf Forest"
12 "Deciduous Needleleaf Forest"
13 "Evergreen Broadleaf Forest"
14 "Evergreen Needleleaf Forest"
15 "Mixed Forest"
16 "Water"
17 "Herbaceous Wetland"
18 "Wooded Wetland"
19 "Barren"
20 "Herbaceous Tundra"
21 "Wooded Tundra"
22 "Mixed Tundra"
23 "Bare Tundra"
24 "Snow or Ice"
25 "Partly Developed"
28 "Unclassified"

==================================== ====================================

Global land cover databases are expected to improve as new satellite data become available. The MODIS (200-500 m. resolution, 36 spectral bands) satellite, if successfully launched in the near future, likely will replace AVHRR as the staple data source. Landsat MSS (60 m.) and TM (30 m.) will be principal sources for local area coverage, and these may be augmented in many local applications with finer resolution sources such as SPOT (10/20 m.). New commercially available "small sat" data may be employed in certain instances to enhance spatial precision, temporal frequency, or spectral definition.

Populated Places

VMAP-Level 0 contains three categories of human settlement features. Two of them are point features distinguished only as "named" or "unnamed" populated places; the other consists of polygon boundaries for larger urban areas. Attributes for named populated places and populated polygons provide the name but not the population count for each place. Populated polygons originally were digitized from small-scale maps, sometimes aeronautical charts dating from the 1970s, and they are now notoriously imprecise and out of date.

We match the populated polygons with nighttime lights (discussed in the following section) and assign a greater probability weighting for LandScan cells containing both features than that for cells containing only nighttime lights.

Nighttime Lights

Several deficiencies of the previously discussed databases can be overcome with satellite data produced by the Defense Meteorological Satellite Program (DMSP), which measures nighttime light emanating from the earth's surface at 1 km. resolution. Unfortunately, the DMSP Operational Linescan System (OLS) data which measure light intensity (Elvidge et al., 1997; Sutton et al., 1997; Sutton 1997) have not been released to the public. Hence, LandScan employs the Nighttime Lights of the World light frequency data processed and provided by NOAA's National Geophysical Data Center (NGDC). Frequency data cover the Northern Hemisphere and South America, but most areas south of the equator are limited to a binary value indicating lights present versus no lights present.

Investigating the efficacy of nighttime lights for estimating population in the United States, Sutton et al. (1997) found that saturated pixels (i.e., adjusted pixel value of 64) cover almost 8% of the territory of the contiguous 48 states and account for about 80% of the population in those states. Conversely, about 17% of the population, occupying about 90% of the land area, is dispersed too sparsely for detection (i.e., adjusted pixel value of 1) by this particular sensor. Sutton (1997) further investigated the correlation of nighttime lights with population density and his model accounted for 25% of the variation in population density. Thus, at the high end of the population/light spectrum, no further distinction of population densities is possible once light saturation occurs. At the low end of the spectrum, no further distinction is possible in pixels with undetected lights. Sutton et al. (1997) suggest that nighttime lights "might also be used as a primary informant to a 'smart' interpolation program for modeling human population distributions in areas where only large scale aggregate data are available." They recommend candidate variables to include city locations, coastlines, landforms, railroads, airports, harbors, and rivers.

Exclusion Areas

Areas with ambient populations of less than 1 person per LandScan cell are determined by identifying the Census Bureau's 20' X 30' cells with zero rural populations and no P-95 circles. These are then compared with populated places, roads, land cover, and nighttime lights. If none of these databases contradict the 20' X 30' cell data, zero population is assigned to all LandScan cells inside the 20' X 30' zero population cells and also to adjacent LandScan cells which likewise show no indicators of population, even if they lie within 20' X 30' cells which contain some population. Water cells and ice cells are assigned zero population.
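The exclusion logic can be summarized as a per-cell test. This is a simplified sketch: it assumes the comparison against populated places, roads, land cover, and nighttime lights is made cell by cell, and it treats "land cover suggests people" as a precomputed flag. Class codes 16 (Water) and 24 (Snow or Ice) come from the class list in this document; the function and parameter names are hypothetical.

```python
WATER = 16         # LandScan land cover code from the list in this document
SNOW_OR_ICE = 24   # LandScan land cover code from the list in this document


def is_exclusion_cell(land_cover, in_zero_census_cell, adjacent_to_excluded,
                      indicators):
    """Decide whether a LandScan cell receives zero population.

    `indicators` maps each ancillary database (populated places, roads,
    land cover, nighttime lights) to True when it suggests people are
    present. Evaluating the indicators per cell is a simplification.
    """
    if land_cover in (WATER, SNOW_OR_ICE):
        return True
    if any(indicators.values()):
        return False   # an ancillary database contradicts the zero count
    return in_zero_census_cell or adjacent_to_excluded


no_signs = {"places": False, "roads": False, "land_cover": False, "lights": False}
road_only = dict(no_signs, roads=True)

print(is_exclusion_cell(19, True, False, no_signs))    # barren, empty census cell
print(is_exclusion_cell(19, True, False, road_only))   # a road contradicts the census
print(is_exclusion_cell(19, False, True, no_signs))    # neighbour of an excluded cell
```

The third case reflects the extension of exclusion areas into adjacent cells that lie inside populated 20' X 30' census cells but show no indicators of population themselves.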

Urban Density Factor

We match the point locations and diameters of P-95 circles with nighttime lights, and increase the probability weighting for LandScan cells containing both features over cells containing only nighttime lights. The associated P-95 population values proportionally increase the probability weighting, but absolute P-95 values are not employed in the final calculation of LandScan cell populations.

Coastlines

Considerable effort is required to reconcile the positional accuracy of diverse global databases, and mismatches among databases are most conspicuous on coastlines. Globally, LandScan coastlines are based on NIMA's World Vector Shoreline (WVS) at 1:250,000 scale. Typically, this coastline differs somewhat from the related line representing the seaward boundary of administrative units, and both of these differ from the land/water boundary indicated on the GLCC gridded database. In the final LandScan Global Population Database, Version 1.1, the WVS takes precedence, and no population is apportioned to cells extending more than one-half cell beyond the WVS coastline.

Population Model

Best available census counts (usually at province level) are allocated to 30" X 30" cells through a "smart" interpolation based on the relative likelihood of population occurrence in cells due to road proximity, slope, land cover, and nighttime lights. Probability coefficients are assigned to each value of each input variable, and a composite probability coefficient is calculated for each LandScan cell. Coefficients for all regions are based on the following factors:

Roads, weighted by distance from major roads,

Elevation, weighted by favorability of slope categories,

Land Cover, weighted by type with exclusions for certain types, and

Nighttime lights of the World, weighted by frequency.

The resulting coefficients are weighted values, independent of census data, which can then be used to apportion shares of actual population counts within any particular area of interest. Coefficients vary considerably from country to country even within a particular region. Control totals can be based on any administrative unit (nation, province, district, minor civil division) or arbitrary polygon for which census data are available. The resulting population distribution is normalized and compared with appropriate control totals to ensure that aggregate distributions are consistent with census control totals. Successful operation of the model has been demonstrated for various control totals, control areas, and weighting values.
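The apportionment step can be illustrated with a minimal sketch. The document does not state how the per-variable weights are combined into a composite coefficient; multiplying them is an assumption made here so that a zero weight (for example, water) excludes a cell. All weights, the control total, and the function names are hypothetical.

```python
def composite_coefficient(road_w, slope_w, land_cover_w, lights_w):
    """Combine per-variable weights into one coefficient (product is assumed)."""
    return road_w * slope_w * land_cover_w * lights_w


def allocate(control_total, coefficients):
    """Share a census control total among cells in proportion to coefficients."""
    total = sum(coefficients)
    if total == 0:
        raise ValueError("no cell in this control area can receive population")
    return [control_total * c / total for c in coefficients]


# Hypothetical weights (road, slope, land cover, lights) for four cells
# belonging to one province.
cell_weights = [
    (1, 1, 0, 1),   # water: land cover weight of zero excludes the cell
    (3, 2, 5, 1),   # cultivated land near a major road
    (1, 2, 5, 1),   # cultivated land far from roads
    (3, 2, 5, 2),   # as above, with frequent nighttime lights
]
coefficients = [composite_coefficient(*w) for w in cell_weights]
populations = allocate(50_000, coefficients)
print(coefficients, populations)
```

Because each share is the control total multiplied by the cell's fraction of the summed coefficients, the cell populations always sum back to the census figure for the control area, whatever weights are chosen.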

Results

Distribution of Probability Coefficients

Probability coefficients are derived from the population model for each country. In one region in Version 1.1 of the database, for example, integer values ranged from 0 for exclusion areas (usually water, desert, or other wilderness areas) to 1 for the most remote land areas to as high as 65,000 for urban centers.

The generic model remains the same for all regions, but the probability weights of individual variables must be customized for each country due to economic, physical, and cultural factors. For example, nighttime lights tend to be intense in energy rich nations, like Kuwait, and less intense in energy poor nations like North Korea. A notable example is the main highway extending from Kuwait City to Kuwait's western border. It is so brightly lit with streetlights that a uniform probability weight would have caused vast urban populations to be distributed across uninhabited desert. Similarly, rural population densities associated with cultivated land cover in one region may differ greatly from those associated with the same land cover type in another region.

All weighting values employed in the actual LandScan calculation for each world region are retained and archived for future reference.

Verification and Validation

Verification of spatially explicit global population databases is inherently limited by the difficulty of establishing a suitable reference database for purposes of comparison. The ideal reference database would be actual census counts for sample areas at the same resolution or finer resolution than the database being evaluated, in this case 30" X 30" cells. Such a comparison may be feasible for certain countries that collect and disseminate high resolution (e.g., block or tract level) census data, but this applies only to urbanized areas of the United States, certain other advanced countries, and a few less developed countries. Unfortunately, suitable reference data are least likely to be available for unstable hostile regions where improved population distributions are most needed for government planning, i.e., to assess risks to military and diplomatic personnel and civilian populations or to provide humanitarian relief.

In rare instances, V&V may include direct fieldwork in the region of interest. More often, it will depend on tests of (a) consistency with ancillary data and/or (b) surrogate analysis of similar areas. Thus, verification of data and validation of underlying models necessarily depends on indirect measures including:

A. Verification based on best available census counts at finest available resolution: This check is conducted for all countries comprising the LandScan Global Population Database. However, the results do not constitute an accuracy assessment because (a) the same data are employed in calculation of the LandScan results for each country and (b) actual census data are rarely available at full 30" X 30" resolution.

B. Surrogate area analysis: The results of verifications based on best available actual census counts in areas of good reference data (e.g., United States) may be extrapolated to areas of poor reference data.

C. Ancillary data analysis: Indicators of population (e.g., buildings, settlements, or land cover: high intensity developed, low intensity developed, cultivated, etc.) will be derived primarily from satellite imagery or aerial photographs. The imagery must not have been employed in calculation of the LandScan database, and generally it should be at finer spatial resolution than the input data.

D. Input data analysis: Verification, validation, and sensitivity analyses of input data (in this case land cover, elevation, roads, nighttime lights).

A robust V&V effort should include elements from each of these approaches because each has certain strengths and weaknesses. Census verification (A) provides reasonably precise estimates for certain developed countries, but census counts usually represent residential rather than ambient population and spatial precision is limited to province level, or even country level, for most of the world. Surrogate area analysis and extrapolation (B) provides reasonably good indications of accuracies for nearby areas with similar physical, economic, and cultural traits. However, official census counts typically characterize residential populations rather than the ambient populations estimated in LandScan. Hence, the correspondence would not be 100% even if both databases (official census and LandScan) were perfect. This approach (B) may serve as a useful test of the overall methodology. However, extrapolated results may be misleading for LandScan databases in areas far removed (physically, economically, and culturally) from the surrogate area. In ancillary data analysis (C), buildings may appear to be residential and may suggest that people are present, but the inference is a subjective, non-quantitative indicator of population. Input database accuracies (D) can establish likely sources of error and bounds of error, but this approach incorporates no direct comparison of predicted versus actual population values.

Comparison with Tel Aviv Imagery

A comparison for Tel Aviv, Israel, reveals excellent correspondence between LandScan gridded population densities and developed land cover identifiable on high resolution panchromatic imagery. In total, the image contains about 21 settlements identifiable through visual interpretation. Of these, 17 appear as elevated population values in the LandScan database. The image contains 42 settlements designated as P-95 circles; all of these also appear as elevated population values in LandScan. This is not surprising since the locations of P-95 circles are used in the LandScan calculation. Conversely, however, in the easternmost sub-province of the image one substantial settlement interpreted on the imagery appears as elevated population values in LandScan and yet does not appear as a P-95 circle.

When province boundaries are overlaid on the image, in one case the LandScan data appear to reflect the influence of diverse province control totals, resulting in an artificially abrupt change in population density. In most cases throughout the image, however, the sharp LandScan gradients correspond with interpreted gradients from developed areas to their sparsely settled arid fringes. This correspondence is conspicuous in the abrupt gradient to unpopulated land in the southwestern quadrant of the image.

Census Validation for the Southwestern United States

The United States provides a unique opportunity for V&V of the LandScan methodology due to the availability of population counts and census unit boundary geometries at fine spatial resolution. We focused on the Southwestern region due to its arid climate and other physical similarities with Israel. State population counts for Arizona, California, Nevada, and Utah were distributed to 30" X 30" cells based on the LandScan population model with coefficients modified to account for distinctive regional differences between the Southwestern United States and Israel. The resulting cell values were then aggregated to counties and compared to actual census counts at county level.
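The simulation described above (distribute a state total to cells, aggregate the cells to counties, compare with county census counts) can be sketched in a few lines of Python. This is a minimal illustration, not the LandScan implementation: the cell weights, cell identifiers, county labels, and census counts below are all hypothetical, and the weights merely stand in for whatever relative likelihood the population model assigns to each cell.

```python
from collections import defaultdict

def apportion(control_total, weights):
    """Distribute a control total across cells in proportion to their weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one cell needs a positive weight")
    return {cell: control_total * w / total_weight for cell, w in weights.items()}

def aggregate(cell_population, cell_to_county):
    """Sum cell populations into the county that contains each cell."""
    totals = defaultdict(float)
    for cell, pop in cell_population.items():
        totals[cell_to_county[cell]] += pop
    return dict(totals)

# Hypothetical inputs: four cells in two counties of one state.
weights = {"c1": 6.0, "c2": 2.0, "c3": 1.0, "c4": 1.0}
cell_to_county = {"c1": "A", "c2": "A", "c3": "B", "c4": "B"}
census_by_county = {"A": 750.0, "B": 250.0}

# The state total (1000) is the only census figure the simulation sees.
cells = apportion(1000.0, weights)
simulated_by_county = aggregate(cells, cell_to_county)

# Signed percentage difference of the simulation from the county census count.
pct_diff = {
    county: 100.0 * (simulated_by_county[county] - census) / census
    for county, census in census_by_county.items()
}
```

Because the cells are scaled to the state total, the simulated population always sums to that total; only its distribution among counties can differ from the census.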

For V&V purposes the input census data were deliberately entered as aggregate state totals, artificially limiting the calculation to the type of census counts available for most of the world, presuming states to be equivalent to countries elsewhere and counties equivalent to provinces elsewhere.

The final LandScan results, however, incorporate best available census counts down to census tract level. Consequently, the cell populations used for V&V analysis bear little relationship to those appearing in the final LandScan database. Without doubt, the final LandScan population database for the United States has far greater accuracy and spatial precision than the experimental results produced for the V&V exercise. Thus, the results obtained here are a conservative V&V measure (i.e., differences in comparison to census data are greatly exaggerated).

Even so, the overall correspondence is such that 87.8% of the simulated LandScan population for the Southwestern United States corresponds with the county totals of the official census (i.e., only 12.2% of the total population is placed in a county other than that indicated in the official census count). Respectively, this correspondence is 90.4% in Arizona, 88.2% in California, 88.9% in Nevada, and 74.9% in Utah. The results indicate a difference of less than 20% (±) between the census count and the simulated coarse LandScan ambient population in 40.3% of the counties, and these counties contain the vast majority of the total population. Of course, small percentages of urban populations redistributed to rural cells cause large percentage differences in sparsely populated counties. Thus, most of the sizable percentage differences occur in Nevada, Utah, and a few sparsely populated counties in Arizona and California. This characteristic is abundantly clear in that most substantial differences occur in counties whose populations are negligible at this regional scale in spite of our diligent attempt to depict the full range of values with standard commercial software.
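One plausible way to compute a correspondence figure of this kind is sketched below. The source does not give the formula, so this is an assumption: when the simulated and census populations share the same overall total, every person over-allocated to one county is under-allocated to another, and the misplaced share is half the sum of absolute differences. The function name and the county values are hypothetical.

```python
def correspondence(simulated, census):
    """Percent of population placed in the same unit as the census indicates.

    Assumes both dictionaries share keys and sum to the same total, so each
    misplaced person is counted once as an excess and once as a shortfall.
    """
    total = sum(census.values())
    misplaced = 0.5 * sum(abs(simulated[unit] - census[unit]) for unit in census)
    return 100.0 * (1.0 - misplaced / total)

# Hypothetical two-county example: 50 people land in A instead of B.
simulated = {"A": 800.0, "B": 200.0}
census = {"A": 750.0, "B": 250.0}
share = correspondence(simulated, census)  # about 95: 50 of 1000 are misplaced
```

Note that the smaller county differs by 20% while the larger differs by under 7%, even though the same 50 people account for both differences.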

A cursory investigation suggests that a key cause of "over" estimation in rural Western United States counties is the presence of federal lands, which are treated the same as other lands in the simulated LandScan calculation but generally contain few residents who would be counted in the official census. This factor could have been recognized by including federal lands, which are available in digital form, in the United States LandScan calculation, but we elected not to do so. Undoubtedly, some ambient populations are present even on the most remote federal lands. Ranchers, hunters, and hikers, for example, contribute to a sparse continual presence. National parks, in particular, may at times have ambient populations equivalent to the residential populations of small cities.

Among urban counties, the most conspicuous "over" estimation of simulated ambient population is for Sacramento, California (49.0%). Certainly, part of the explanation is that Sacramento fares unusually well in key LandScan variables. The slope variable, for example, would heavily favor Sacramento over San Francisco in a statewide simulation. Also, as state capital, Sacramento has a disproportionately large number of administrative buildings compared with its resident population. Buildings and related infrastructure are reflected in two LandScan variables: land cover and nighttime lights. As with other administrative and institutional centers, such as college towns, ambient populations in fact may warrant considerably higher LandScan values than are indicated by official census counts.

Again, we remind the reader that none of the differences discussed above apply to the final LandScan database, which is based on tract level census data. Indeed, some of the advantages claimed for LandScan, such as recognition of ambient populations in state capitals and national parks, actually may diminish with finer resolution census data input. Consider, for example, that our simulated, statewide calculation for Arizona may account for Arizonans who depart from Phoenix to visit the Grand Canyon while our final database only accounts for movements within each census tract that touches the Grand Canyon. As global policy, we have adopted the geometry associated with the "best available census count" as the areal unit for which such travel patterns will be reconciled. We interpret the simulated results in the Southwestern United States to mean that the LandScan algorithm works as intended.

P-95 Circles and Rural Cells Compared to Tract Level Census Data in the Southwestern United States

For comparison, consider the accuracy and precision inherent in other attempts to characterize local population distributions. We mapped P-95 circles and rural cells (employing mini-cells wherever available) and intersected them with census tracts in the Southwestern United States, based on a uniform distribution within each circle or cell. A histogram of percentage differences between these estimates and official census counts for census tracts depicts substantial differences. Of 6,811 census tracts, only 1,105 (16.2%) show differences of less than 10% (±), and another 1,077 (15.8%) show differences of 10 to 20% (±). Some 701 census tracts (10.3%) differ by 100% or more. Most of the large differences occur in census tracts with small populations, and they are due, not to error per se, but to the spatial resolution of the P-95 circles and rural cells.
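The intersection under a uniform-distribution assumption amounts to areal weighting: each circle or cell contributes to a tract in proportion to the share of its area that falls inside the tract. The sketch below assumes those area shares have already been computed by a GIS overlay; the zone names, tract names, populations, and bin edges are hypothetical and chosen only to show how a small tract can produce a very large percentage difference.

```python
def areal_estimate(source_pop, overlap_fraction):
    """Estimate target-unit populations from zones assumed uniformly populated.

    overlap_fraction[(zone, unit)] is the share of the zone's area inside unit.
    """
    estimate = {}
    for (zone, unit), fraction in overlap_fraction.items():
        estimate[unit] = estimate.get(unit, 0.0) + source_pop[zone] * fraction
    return estimate

def bin_differences(estimate, census):
    """Count units by absolute percentage difference from the census count."""
    bins = {"<10": 0, "10-20": 0, "20-100": 0, ">=100": 0}
    for unit, count in census.items():
        diff = abs(100.0 * (estimate.get(unit, 0.0) - count) / count)
        if diff < 10:
            bins["<10"] += 1
        elif diff < 20:
            bins["10-20"] += 1
        elif diff < 100:
            bins["20-100"] += 1
        else:
            bins[">=100"] += 1
    return bins

# Hypothetical zones: one populated-place circle and one rural cell.
source_pop = {"circle1": 1000.0, "rural1": 100.0}
overlap_fraction = {
    ("circle1", "t1"): 0.5,
    ("circle1", "t2"): 0.5,
    ("rural1", "t2"): 0.25,
    ("rural1", "t3"): 0.75,
}
census = {"t1": 520.0, "t2": 450.0, "t3": 30.0}

estimate = areal_estimate(source_pop, overlap_fraction)
histogram = bin_differences(estimate, census)
```

In this toy case the sparsely populated tract t3 is estimated at 75 against a census count of 30, a difference of 150%, while the two larger tracts differ by less than 20%.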

For comparison, a histogram of percentage differences between final LandScan estimates and official census counts for census tracts in the Southwestern United States would show a difference of 0 for every census tract. The correspondence is perfect because census tracts are employed as control totals in the LandScan calculation. Thus, it is the finer spatial resolution of LandScan, rather than any fundamental error in the P-95/rural cell values, that results in this highly favorable comparison.

P-95 Circles and Rural Cells Compared to County Census Data in Germany

We mapped P-95 circles and rural cells (employing mini-cells that were available for most of Germany) and intersected them with counties in Germany, based on a uniform distribution within each circle or cell. A histogram of percentage differences between these estimates and official census counts for counties depicts substantial differences. Of 445 counties, 246 (55.3%) show differences of less than 10% (±), and none differ by 100% or more. Most of the large differences occur in counties with small populations, and they are due, not to error per se, but to the spatial resolution of the P-95 circles and rural cells.

For comparison, a histogram of percentage differences between final LandScan estimates and official census counts for counties in Germany would show a difference of 0 for every county. The correspondence is perfect because counties are employed as control totals in the LandScan calculation. Thus, it is the finer spatial resolution of LandScan, rather than any fundamental error in the P-95/rural cell values, that results in this highly favorable comparison.

Census Validation for Israel

Analysis similar to that above can be conducted for certain foreign areas that have recent, high-quality, fine-resolution census data. For Israel, we (a) employed province level census totals as input data, (b) simulated ambient population for 1 km cells, (c) aggregated cell values for sub-provinces, and (d) compared to official census data for sub-provinces. The results indicate good correspondence between census data and LandScan data, except for the same trend observed in the Southwestern United States, i.e., areas with small populations were "over" estimated. The two sub-provinces with the smallest official census counts showed differences of 35 and 39%, respectively. Even so, the overall correspondence is such that 91% of the simulated LandScan population for Israel corresponds with the sub-province totals of the official census (i.e., only 9% of the total population is placed in a sub-province other than that indicated in the official census count). Conversely, among sub-provinces with census populations of 100,000 or more, differences range from 1.1% to 16.3% (±).

Again this V&V analysis is a conservative assessment based on artificially coarse results obtained by keeping the input data at unnecessarily high levels of aggregation. The final LandScan results are based on sub-province input data and will correspond precisely with official census totals for sub-provinces.

P-95 Circles and Rural Cells Compared to Sub-province Census Data in Israel

We mapped P-95 circles and rural cells (employing mini-cells wherever available) and intersected them with sub-provinces in Israel, based on a uniform distribution within each circle or cell. A histogram of percentage differences between these estimates and official census counts for sub-provinces depicts substantial differences. Of 14 sub-provinces, only 6 (42.9%) show differences of less than 10% (±), and another 4 (28.6%) show differences of 10 to 20% (±). All other sub-provinces differ by less than 28% (±). These differences are due, not to error per se, but to the spatial resolution of the P-95 circles and rural cells.

For comparison, a histogram of percentage differences between final LandScan estimates and official census counts for sub-provinces in Israel would show a difference of 0 for every sub-province. The correspondence is perfect because sub-provinces are employed as control totals in the LandScan calculation. Thus, it is the finer spatial resolution of LandScan, rather than any fundamental error in the P-95/rural cell values, that results in this highly favorable comparison.

Plume Intersections With Population Databases

The V&V results indicate that LandScan is a viable new source of geographic information for estimating populations at risk. To illustrate the potential impact on such estimates, we intersected hypothetical contaminant plumes with the LandScan database and with the P-95 circles/rural cell population distributions. Populations at risk calculated by intersecting notional plumes with LandScan and with P-95/rural cell populations were then compared with official census counts in the Southwestern United States and Germany. The results indicate that LandScan produces more precise and accurate results by a considerable margin for small plumes and by a non-negligible margin for large plumes.
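A toy version of such a plume intersection is sketched below to show why spatial resolution matters most for small plumes. Everything in it is hypothetical: the 5 x 5 grid, the cell populations, and the elliptical footprint are illustrative assumptions, not the plumes or data used in the V&V work. A cell counts as exposed when its center lies inside the footprint.

```python
def inside_plume(x, y, cx=2.0, cy=2.0, a=1.5, b=0.75):
    """Hypothetical elliptical footprint centered at (cx, cy), semi-axes a and b."""
    return ((x - cx) / a) ** 2 + ((y - cy) / b) ** 2 <= 1.0

def population_at_risk(grid):
    """Sum the population of every cell whose center lies inside the plume."""
    return sum(pop for (x, y), pop in grid.items() if inside_plume(x, y))

# Hypothetical fine grid: one dense cell in an otherwise sparse area.
fine = {(float(i), float(j)): 10.0 for i in range(5) for j in range(5)}
fine[(2.0, 2.0)] = 760.0

# The same total spread uniformly, as a single coarse zone would imply.
total = sum(fine.values())
uniform = {cell: total / len(fine) for cell in fine}

at_risk_fine = population_at_risk(fine)
at_risk_uniform = population_at_risk(uniform)
```

Here the plume covers three cells, including the dense one, so the fine grid reports 780 people at risk while the uniform distribution reports 120. A plume large enough to cover the whole grid would return the same total of 1000 from both, which is consistent with the margin narrowing for large plumes.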

Conclusions

LandScan provides global coverage of population at 30" X 30" resolution, the finest spatial resolution yet developed. LandScan employs a "smart" interpolation procedure based on variables similar to those recommended by Tobler et al. (1995), Sutton (1997), and Sutton et al. (1997).

V&V conducted in the Southwestern U.S., Israel, and Germany indicates that greater spatial precision can be achieved with no sacrifice in aggregate accuracy compared to previous global population databases. Indeed, LandScan's inherent correspondence with best available census counts for finest available census units actually represents an improvement in accuracy over previous global population databases. For Israel, even the simulated coarse LandScan results matched official census counts better than did P-95 circles/rural cells (9.0% versus 13.2%, respectively). In addition, high resolution imagery for Tel Aviv, Israel, shows excellent correspondence between LandScan cell values and settlements identifiable on the imagery. Census validation efforts in the United States and Israel indicate that an overwhelming majority of the total population is properly apportioned to census areas, even when LandScan is artificially constrained to unnecessarily coarse aggregations of census input data. Most of the significant differences in the Southwestern United States and in Israel occur in sparsely populated areas.

References

Dobson, J. E., E. A. Bright, P. R. Coleman, R. C. Durfee, and B. A. Worley, 2000. A Global Population Database for Estimating Population at Risk. Photogrammetric Engineering & Remote Sensing 66(7).

Dobson, J. E., E. A. Bright, R. L. Ferguson, D. W. Field, L. L. Wood, K. D. Haddad, H. Iredale, III, V. V. Klemas, R. J. Orth, and J. P. Thomas, 1995. NOAA Coastal Change Analysis Program; Guidance for Regional Implementation Version 1.0, NOAA technical report NMFS 123.

Elvidge, C. D., K. E. Baugh, E. A. Kihn, H. W. Kroehl, and E. R. Davis, 1997. Mapping city lights with nighttime data from the DMSP Operational Linescan System. Photogrammetric Engineering & Remote Sensing 63(6), pp. 727-734.

Loveland, T., J. Merchant, D. Ohlen, and J. Brown, 1991. Development of a land-cover characteristics database for the conterminous U. S. Photogrammetric Engineering & Remote Sensing 57(11), pp. 1453-1463.

Leddy, R., 1994. Small area populations for the United States. Presented at the Annual Meeting of the Association of American Geographers, San Francisco, CA.

Sutton, P., 1997. Modeling population density with night-time satellite imagery and GIS. Computers, Environment, and Urban Systems, 21(3/4), pp. 227-244.

Sutton, P., D. Roberts, C. Elvidge, and H. Meij, 1997. A comparison of nighttime satellite imagery and population density for the continental United States. Photogrammetric Engineering & Remote Sensing 63(11), pp. 1303-1313.

Tobler, W. R., U. Deichmann, J. Gottsegen, and K. Maloy, 1995. The Global Demography Project. Technical Report No. 95-6. National Center for Geographic Information and Analysis. UCSB. Santa Barbara, CA, 75p.

Email comments to: bhaduribl@ornl.gov