Welcome to the Archive of Census Related Products (ACRP), one of
several efforts to support the Population, Land Use,
and Emissions (PLUE) Data Project at CIESIN. This archive is a
collection of georeferenced data files containing census information
that spans the United States and its territories. This coverage will
expand to include Mexico and Canada, establishing a North American
repository for population, land use, and emissions data and integrated
data products. These data files are value-added products derived from
the original 1990 census files compiled by the U.S. Bureau of the
Census. The major components are:
TIGER - boundary files based on TIGER 1992 files containing
U.S. census geographies.
STF - demographic data files containing population and housing
characteristics from the 1990 Summary Tape File STF3A and STF1B.
STP - migration files derived from the STP28 Special Tabulation
for 1990 which show the movement of persons by county.
PUMS - public-use microdata samples which provide information for a
sample of housing units with data on the characteristics of each unit
and the people in it. The data are 5% and 1% samples of the
population and housing of the U.S.
The products are available via FTP from the ACRP archive at
CIESIN. Documentation describing the data and the file naming convention is
included, as well as programs to facilitate format conversion and
platform portability. The archive supports the integration of
georeferenced population, land use, and emissions data by decision
makers and planners for the analysis of human settlement patterns.
Organization of the Archive
The data products reside in the /pub/census/usa
directory. The major branches (data directories) are tiger/, stf/,
stp/, grid/, and pums/ organized by state.
There are readme and content files in each directory to explain
the content and file names. A diagram showing the directory structure
of the entire archive can be found in /pub/census/dtree.txt.
The file naming convention is consistent across the states.
The executable binaries used to unzip files on multiple platforms
and license information are provided in the /pub/census/src
directory.
Documentation (readme files) describing the data, the file naming
convention, and directories is available in the /pub/census/.support
directory. Text documents describing variables, format processing and
information about the data can be found in the /pub/census/.support/infofiles directory.
The program source code for converting the data to different
formats across multiple platforms can be found in the /pub/census/.support/
directory.
Files covering the entire U.S. can be found in the
/pub/census/usa/tiger/, stf/, and grid/ data directories. Data at the
state, county, puma, mcd (Minor Civil Divisions), place, tract,
blockgroup, and block levels are available. The 1980 mcd and tract
files are also provided where available.
Boundary Files
The boundary files contain census geographic entities extracted from the
TIGER/Line Files, 1992. These data have been processed to create the following:
State boundary files for county, county subdivision (mcd), place,
1990 tract/block numbering area, blockgroup, and 1980 tract and mcd's
(where available). The tract and blockgroup files are grouped by MSA
(Metropolitan Statistical Area) and CMSA (Consolidated Metropolitan
Statistical Area).
County boundary files for all census blocks. Each census block is identified
by a unique POLygon IDentification (POLID) field which matches the POLID
in the STF data.
Boundary ASCII Format (BNA). The BNA file format can be directly
imported into ATLAS GIS (SMI). It can also be converted for other
desktop packages, such as MAPINFO, ARC/INFO, and SAS.
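As a rough illustration of reading a BNA file outside a GIS package, the sketch below parses BNA-style text into features. It assumes the commonly described layout (a header line of quoted name fields followed by a point count, then one "x,y" line per point). The exact layout of the archive's files is not stated here, so the field arrangement, the function name read_bna, and the sample record are all assumptions.

```python
import csv
import io

def read_bna(text):
    """Parse BNA-style text into a list of (names, points) tuples.

    Assumed layout (not confirmed by the archive documentation):
      "name1","name2",...,N   <- header: quoted names, then a point count
      x,y                     <- repeated abs(N) times
    """
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    features = []
    i = 0
    while i < len(rows):
        header = rows[i]
        names = [h.strip() for h in header[:-1]]
        # A negative count is assumed to mark a line rather than a polygon.
        count = abs(int(header[-1]))
        points = [(float(x), float(y)) for x, y in rows[i + 1:i + 1 + count]]
        features.append((names, points))
        i += 1 + count
    return features

# Hypothetical sample record; the identifiers and coordinates are made up.
sample = '''"A1","Alpha",4
-73.9,42.6
-73.7,42.6
-73.7,42.8
-73.9,42.6
'''
features = read_bna(sample)
```

Each feature comes back as its list of name fields plus a list of (x, y) tuples, which is enough to hand the coordinates to another tool or to look up the matching POLID.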
Code
Several format conversion programs (code) reside in /pub/census/.support/. These include:
atlsbna.sas - converts SAS Gmap dataset to Atlas BNA format
bna2sas.sas - converts BNA files to SAS Gmap format
cnvtdlm.sas - converts SAS dataset to sequential file in delimited format
latlonm - converts x-y coordinates from latitude, longitude to miles
mlatlon - converts x-y coordinates from miles to latitude, longitude
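The method used by latlonm and mlatlon is not described here. As a minimal sketch of what such a conversion can look like, the following uses a simple equirectangular approximation around a reference point. The function names, the reference-point parameters, and the Earth radius constant are assumptions for illustration, not the archive's implementation.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius

def latlon_to_miles(lat, lon, ref_lat, ref_lon):
    """Approximate x-y offsets in miles from a reference point."""
    x = (math.radians(lon - ref_lon) * EARTH_RADIUS_MILES
         * math.cos(math.radians(ref_lat)))
    y = math.radians(lat - ref_lat) * EARTH_RADIUS_MILES
    return x, y

def miles_to_latlon(x, y, ref_lat, ref_lon):
    """Inverse of latlon_to_miles for the same reference point."""
    lat = ref_lat + math.degrees(y / EARTH_RADIUS_MILES)
    lon = ref_lon + math.degrees(
        x / (EARTH_RADIUS_MILES * math.cos(math.radians(ref_lat))))
    return lat, lon

# Round trip for a point one degree north and east of the reference.
x, y = latlon_to_miles(41.0, -74.0, 40.0, -75.0)
lat, lon = miles_to_latlon(x, y, 40.0, -75.0)
```

This approximation is reasonable over county-sized areas but loses accuracy over long distances, where a proper map projection would be the better choice.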
Public Use Microdata Samples (PUMS) Documentation Files
Data dictionaries which contain a full technical description of the PUMS
data in text format. The series covers the decennial years 1940 through
1990. An additional file contains the 15% sample for the year 1970.
Equivalency Tables for 1990 PUMS: relate census geography to PUMAs.
Public Use Microdata Sample Areas (PUMA) Boundary Files
The PUMA boundary files consist of 5% sample (apuma) and 1% sample
(bpuma) areas for the mapping of 1990 PUMS data covering the
continental U.S., Alaska, and Hawaii. These boundary files are created
based on equivalency files generated by the Geographic Correspondence
Engine (Geocorr). A national census tract to PUMA geography
correspondence file is used in merging the two files, resulting in the
PUMA geographies. An additional file is also available consisting of
geographic centroids for the PUMA coverages calculated by UIC (Urban
Information Center/Office of Computing, University of Missouri).
The purpose of this project was to address the issue of the absence of
an authoritative boundary layer for the geographies associated with
the Public Use Microdata Sample (PUMS) data files. The results are
available in standard formats for easy use within desktop geographic
information systems. Technical documentation explaining in detail how
the PUMA boundaries are generated is available in the archive.
Spatial Coverage
United States
Format
The data files are presented in 3 widely used formats: Atlas GIS "agf"
system format, ARC/ARCView "shp" shapefile format, and the "bna" ascii-export
format used by a number of geographic import/export utility packages. Each
format has its own subdirectory in the archive.
thinned - county and state files for the contiguous United
States. These files are a simple concatenation of the county files for
the continental U.S. The files have been thinned in three steps, each
step reducing the number of polygon points by 50%. The thinning level
is indicated by "thx" in the filename, where x is 1, 2, or 3 and
identifies the thinning step, and therefore the level of reduction,
that produced the file. The designation "00" within the filename
indicates the continental U.S.
These data files contain the location (latitude/longitude) of all street
intersections found within a county. The files give the English names
of both intersecting streets, as well as the location and a unique "node"
number (for streets which frequently intersect each other).
The files can provide a "nearest location" (e.g. for a mortgage deed), or
can be imported into a desktop GIS and copied to any
geographic layer (e.g. census tract). A relative measure of "street
density" can then be calculated based on the number of street
intersections per land area or unit.
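The street density measure described above can be sketched as a count of intersections per tract divided by the tract's land area. The sketch assumes each intersection has already been assigned to a tract (for example by a GIS overlay). The tract identifiers, area values, and the function name street_density are hypothetical.

```python
from collections import Counter

def street_density(intersection_tracts, land_area_by_tract):
    """Return intersections per unit of land area for each tract."""
    counts = Counter(intersection_tracts)
    return {
        tract: counts.get(tract, 0) / area
        for tract, area in land_area_by_tract.items()
        if area > 0  # skip tracts with no land area to avoid dividing by zero
    }

# Hypothetical input: the tract each intersection falls in, and tract land areas.
intersections = ["T1", "T1", "T2", "T1"]
land_area = {"T1": 1.5, "T2": 0.5, "T3": 2.0}
density = street_density(intersections, land_area)
```

The result is only a relative measure: it depends on the units chosen for land area, so values are comparable across tracts but not across differently prepared files.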
The standard extract files describe the nation's population and
housing characteristics. These data files are organized by state and
contain 225 FAFVAR (Frequently Asked For Variables) derived from 1990
STF3A, including standard geographic identification variables (Federal
Information Processing Standards [FIPS codes]). Polygon centroids
in latitude-longitude are also included. Population data tabulated for
each polygon contain demographic information about age distribution, education
levels, ethnicity, income distribution, labor force status, children, and
housing attributes. Housing items include the size and state of the housing
unit, value of the unit, water, sewage, heating, monthly owner costs and
other related information.
The data files are sub-divided by geographic level: county,
county subdivisions, place-within-county, tract/block numbering area
(bna), and blockgroup. Each file contains a unique POLID field which
matches a similar field in the corresponding TIGER-based boundary
files.
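Matching data records to boundary records on POLID amounts to a keyed join. A minimal sketch follows, assuming both files have been read into lists of dictionaries. Only the POLID field name comes from the description above; the other field names, the sample values, and the function name join_on_polid are hypothetical.

```python
def join_on_polid(data_rows, boundary_rows):
    """Inner-join two lists of dicts on their shared POLID key."""
    boundary_by_polid = {row["POLID"]: row for row in boundary_rows}
    joined = []
    for row in data_rows:
        match = boundary_by_polid.get(row["POLID"])
        if match is not None:
            # Combine the boundary attributes with the demographic record.
            joined.append({**match, **row})
    return joined

# Hypothetical records for illustration only.
data_rows = [
    {"POLID": "A1", "PERSONS": 1000},
    {"POLID": "A9", "PERSONS": 10},
]
boundary_rows = [
    {"POLID": "A1", "NAME": "Alpha"},
    {"POLID": "A2", "NAME": "Beta"},
]
joined = join_on_polid(data_rows, boundary_rows)
```

Records without a counterpart are dropped here. Keeping them instead (an outer join) would be a reasonable alternative when checking for polygons that lack data.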
Tract and blockgroup files are grouped by MSA and CMSA code
(non-metro remainder-of-state files are coded "9999"). These files are
consistent with the organization of the TIGER-based boundary files,
with the exception of the New England states.
Data files are available for metro, 5-digit ZIP code, and
place-within-state census geographic entities. These have no
corresponding boundary files.
Demographic data files were split into "a" and "b" sub-files to
accommodate dBase III, which is limited to 128 variables. The "a" and
"b" files can be matched on the corresponding POLID field.
The entire 1990 STF3A database has been grouped by state according to
the state postal codes and subdivided by the following geographic
summary levels: state and county (slvl=040 and slvl=050), county
subdivision/mcd (slvl=060), tracts (slvl=140), blockgroup (slvl=150),
places (slvl=155), and others (all other slvls). The other summary
level includes groups such as Indian reservations, tribal districts,
congressional districts, etc. The tabulation geographies, the
split-blockgroups (slvl=090), split-tracts (slvl=080) and split-places
(slvl=070), have been extracted into individual files by state. A
detailed description of the variables has been compiled in the file
3xptvar.lst.
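The summary level grouping described above can be expressed as a lookup table. The sketch below sorts records into groups by their slvl code, using only the codes listed in this section. The record layout (a dictionary with an "slvl" key) and the function name group_by_level are assumptions, and the sketch keeps state and county separate although the archive stores them together.

```python
SUMMARY_LEVELS = {
    "040": "state",
    "050": "county",
    "060": "county subdivision/mcd",
    "070": "split-place",
    "080": "split-tract",
    "090": "split-blockgroup",
    "140": "tract",
    "150": "blockgroup",
    "155": "place",
}

def group_by_level(records):
    """Group records by summary level; unlisted codes fall under 'other'."""
    groups = {}
    for rec in records:
        level = SUMMARY_LEVELS.get(rec["slvl"], "other")
        groups.setdefault(level, []).append(rec)
    return groups

# Hypothetical records; "999" stands in for any code not listed above.
records = [{"slvl": "050"}, {"slvl": "140"}, {"slvl": "999"}, {"slvl": "140"}]
groups = group_by_level(records)
```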
The 1980 Summary Tape File (STF3A) consists of over 1100 variables
which contain demographic data from the Census of Population and
Housing. Data files are included for 1980 mcd and tract levels. The
file is located at /pub/census/usa/stf/xx/3xpt/xx_1980.zip (where xx
represents the state postal code), organized by state.
Refer to the contents .xx80 file for a description of the 1980 data.
Note: There are no Standard Extract files for this database because
the boundary layers are not available or the entire U.S. is not
covered. One could try to match these data on the 1980 mcd and tract
(m8_ and t8_) boundary files located in /pub/census/usa/tiger/xx/bna_st
(where xx represents the state postal code), organized by state.
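Because the directory layout is the same for every state, the paths above can be generated from the state postal code. The sketch below assumes that the postal code appears in lowercase in the paths and that the two fragments of the 1980 STF path join into a single path. Neither assumption is confirmed here, and the function names are hypothetical.

```python
def stf_1980_path(state):
    """Path of the 1980 STF3A file for a state (assumed lowercase codes)."""
    xx = state.lower()
    return f"/pub/census/usa/stf/{xx}/3xpt/{xx}_1980.zip"

def tiger_state_bna_path(state):
    """Directory holding the state-level BNA boundary files."""
    return f"/pub/census/usa/tiger/{state.lower()}/bna_st"

# Example for New York; check the archive's readme files for the real casing.
stf_path = stf_1980_path("NY")
tiger_path = tiger_state_bna_path("NY")
```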
Spatial Coverage
Entire United States
Format
SAS transport (.XPT)
Zip Equivalency Files
Census Block Level
The zip equivalency files contain a subset of population
counts/housing units data derived from the STF3B header file including
5-digit ZIP codes for each census block. These data files contain a
POLID census block field that corresponds to the POLID of the
TIGER-based boundary files at the census block level. The TIGER data
can be found at /pub/census/usa/tiger/xx/bnablk (where xx represents the state postal code).
The Geocorr search engine is an alternate way of accessing the data in
the STF3B header file.
Census Blockgroup Level
These zip equivalency files contain data that have been aggregated from
the census block level to the census blockgroup level and include
population centroids. The "POP" and "HUS" counts are summed for all
census blocks belonging to the same blockgroup. These totals will be
slightly different from the STF3A reported PERSONS count because of
sample count weighting. Total metro, urban, and urbanized area
populations are included based on the census block level. Population
centroids for the blockgroups have been calculated by "weighting" the
census block spatial centroids with the population counts.
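The population centroid calculation described above is a weighted mean of the block centroids. A minimal sketch follows, assuming each block is given as a (latitude, longitude, population) tuple. The function name population_centroid and the sample values are hypothetical, and the exact procedure used for the archive files may differ.

```python
def population_centroid(blocks):
    """Population-weighted mean of block centroids.

    blocks: list of (lat, lon, pop) tuples for one blockgroup.
    Returns (lat, lon), or None when the blockgroup has no population.
    """
    total = sum(pop for _, _, pop in blocks)
    if total == 0:
        return None
    lat = sum(lat * pop for lat, _, pop in blocks) / total
    lon = sum(lon * pop for _, lon, pop in blocks) / total
    return lat, lon

# Hypothetical blocks: the centroid is pulled toward the more populous block.
blocks = [(40.0, -75.0, 100), (42.0, -73.0, 300)]
centroid = population_centroid(blocks)
```

Averaging latitude and longitude directly is adequate for areas as small as a blockgroup. Unpopulated blockgroups have no defined weighted centroid, which is why the sketch returns None in that case.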
These data files contain population and housing data extracted from the
1990 STF1B database on CDROM. The population items include total population,
age, race, and Hispanic origin. The housing items include number of housing
units, tenure, room density, mean contract rent, mean value, and mean number
of rooms. Some data items are fairly limited in descriptive information.
In addition, the records contain relevant information from geographic header
records which includes land area, water area, centroids, MSA codes, place
codes, and special area codes. These data files are identified as populated
census blocks (xblk).
Data files for blocks with zero population and housing unit counts are
identified as non-populated census blocks (zblk). Geographic header
information is included with the "zblk" files.
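The division into xblk and zblk files can be sketched as a simple partition of block records. The sketch assumes a block counts as non-populated only when both its population and housing unit counts are zero, and it borrows the field names POP and HUS for illustration. The actual field names in the STF1B files and the function name split_blocks are assumptions.

```python
def split_blocks(records):
    """Partition block records into populated (xblk) and non-populated (zblk)."""
    xblk, zblk = [], []
    for rec in records:
        # Assumed rule: non-populated means zero persons and zero housing units.
        if rec["POP"] == 0 and rec["HUS"] == 0:
            zblk.append(rec)
        else:
            xblk.append(rec)
    return xblk, zblk

# Hypothetical block records.
records = [
    {"POLID": "B1", "POP": 25, "HUS": 10},
    {"POLID": "B2", "POP": 0, "HUS": 0},
    {"POLID": "B3", "POP": 0, "HUS": 2},
]
xblk, zblk = split_blocks(records)
```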
Both the populated and non-populated census block records contain
a POLID field which corresponds to the POLID in the boundary files
based on TIGER92. These data are organized by state using the county
FIPS code.
These files can be found in the state directories:
These files contain 100 percent data with over 1100 variables consisting
of population and housing items. The population information includes age,
race, sex, marital status, Hispanic origin, household type, and other demographic
data. They are cross tabulated by age, race, sex, or Hispanic origin. The
housing information includes tenure, number of units, value, number of
rooms per unit, and the use of the unit.
The data files are sub-divided by geographic levels: county, county
subdivision, place-within-county, tract/bna (block numbering area), blockgroup,
and block and organized by state.
NOTE: The range of information for Illinois, New York,
Pennsylvania, and Wisconsin has been expanded to include three
additional levels: cdrom/, extract/, and full/.
cdrom/ - data files derived from the cdrom version of STF1B
containing a dozen variables
extract/ - data files containing 80+ variables extracted from the
full STF1B database
full/ - data files which are the full STF1B database containing
more than 1,100 variables. The "header" record is a separate
file.
Spatial Coverage
United States
Format
Comma Separated Value (.CSV)
Code
stf1bxz.sas - SAS program for the xblk and zblk files.
stf1bdoc.zip - documentation from the U.S. Bureau of the Census
stf1bvar.lbl - variable description for the census block
Enhanced Migration Files
The migration files are county to county migration data files
organized by state which contain enhancements that involve separations
of table matrices, the addition of multiple count fields for in and
out moves on table matrices, and creation of a third file for
net-migration flows. In all files, information about immigration and
emigration is stored on a single record for ease of processing.
The "p1" and "p2" files contain all the data from the original census
STP28 files. The data in the p1 files are extracted by race while the
p2 files provide data based on Hispanic origin. The p1 and p2 files
have been improved by creating two count fields per record in place of
one: the POPIN field contains person counts moving into the county
(COUNTY) from another county (COUNTY2), while the POPOUT field
contains person counts moving in the opposite direction. Both county
codes appear in each record to make processing easier for subsetting
the files.
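The two-count record layout can be illustrated by building paired records from one-directional flows. The sketch below takes (origin, destination, persons) tuples and produces one record per (COUNTY, COUNTY2) pair with POPIN and POPOUT counts, plus a net figure computed as POPIN minus POPOUT. The input layout, the NET field name, the county codes, and the function name pair_flows are assumptions rather than the archive's actual processing.

```python
from collections import defaultdict

def pair_flows(flows):
    """Build {(COUNTY, COUNTY2): {"POPIN", "POPOUT", "NET"}} from flows.

    flows: iterable of (origin, destination, persons) tuples.
    """
    records = defaultdict(lambda: {"POPIN": 0, "POPOUT": 0})
    for origin, dest, persons in flows:
        # From the destination's point of view these persons moved in.
        records[(dest, origin)]["POPIN"] += persons
        # From the origin's point of view the same persons moved out.
        records[(origin, dest)]["POPOUT"] += persons
    return {
        key: {**counts, "NET": counts["POPIN"] - counts["POPOUT"]}
        for key, counts in records.items()
    }

# Hypothetical flows between two counties, C1 and C2.
flows = [("C1", "C2", 50), ("C2", "C1", 20)]
paired = pair_flows(flows)
```

With both directions on one record, a file can be subset by either county code without a second pass over the data.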
The total flows (tf) state level file contains "total flows" of persons
for all states (into the county, out of the county, and within county
moves). These flows are reported for all counties within a state, and all
counties outside of the state which were sources or destinations of moves
involving the state of interest. This file was created by aggregating the
more detailed data in the p1 file.
The interstate migration (im) files were created in order to generate
the total flows (tf) files. The "im" files contain all migration from
a state of interest to all other states.
Macro that reads the county to county special migration
files and converts the raw files to 3 SAS data sets: a total
population flow, and p1 and p2 files (see file stp28c.sas).
Citation
The suggested citation for this web site:
Center for International Earth Science Information Network
(CIESIN). Integrated Population, Land Use, and Emissions Data Project
(PLUE). Palisades, NY: CIESIN, Columbia University. Available at http://sedac.ciesin.columbia.edu/plue.
The suggested citation for ACRP data products:
Center for International Earth Science Information Network
(CIESIN). 1996. Archive of Census Related Products (ACRP): data
set name. Saginaw, MI: CIESIN. Available at: http://sedac.ciesin.columbia.edu/plue/cenguide.html.
Credits
The ACRP archive was developed in 1996 through the collaborative
efforts of John Blodgett, Urban Information Center/Office of
Computing, University of Missouri, and Henk Meij of CIESIN.