Collecting Data on Governments – Innovation at Work!


Written by: Carma Hogue, Assistant Division Chief, Governments Division

Government finance, public pensions, education spending, and taxes are hot issues today, and in the information age, where information is readily available and more easily monitored and measured, statistics tell the stories.

2009 State and Local Government Expenditures

The U.S. Census Bureau’s Governments Division collects data on federal, state, and local governments and continually researches new ways to make data collection more efficient and the data more precise.

On March 15, 2012, the Council of Professional Associations on Federal Statistics held a workshop on censuses and surveys of governments.  Attendees at the conference included representatives from academia, the private sector, several federal statistical agencies, and members of a 2007 Committee on National Statistics panel on government statistics.

Governments Division staff presented their completed and planned research on a host of topics; we believe many readers will find this work of interest.

For more information on the papers, see http://www.census.gov/govs/pubs/research_reports.html


Using Historical Census Data to Reveal Migration Patterns of the Young, Single, and College Educated


Written by: James Fitzsimmons, Assistant Division Chief, Population Division

Between 1965 and 2000, the young, single, and college-educated population in the United States—the “YSCE” population—migrated in patterns that were often at odds with those of other segments of the nation’s population.

In general, larger metropolitan statistical areas were more likely to have consistent net in-migration of the YSCE population, while smaller metros, micropolitan statistical areas, and areas outside of metros and micros were more likely to experience YSCE net out-migration. These findings were often the opposite of those for the total population. Within metro areas, migration to principal cities also was a hallmark of the YSCE population.

Among the other findings reported in the recently released Population Division working paper Historical Migration of the Young, Single, and College Educated: 1965 to 2000, by Justyna Goworowska and Todd Gardner: less than one-fifth of states saw consistent net in-migration of the YSCE population during that period, while about half of states experienced consistent net out-migration of the group.

The working paper’s focus on migration of the YSCE population, a group with outsized human capital and potential impact on population growth, was possible thanks to the Census Bureau’s Historical Census Files Project. That project has recovered all available microdata from the 1960, 1970, and 1980 censuses, and it is in the process of harmonizing these files with ones from the 1990 and 2000 censuses.

The central outcome of the historical files project is a time series of anonymized historical decennial census microdata files available to researchers within the Census Bureau as well as to those with approved projects through the Census Bureau’s national network of secure Research Data Centers.

An eventual project goal is to extend the historical microdata holdings to earlier censuses, but at present the full range of data gathered from the “long form” of five consecutive censuses, along with documentation, is at hand for researchers with approved projects.  In its analysis of migration patterns of the YSCE population, the working paper has shed light on only one of a long list of potential subjects that would lend themselves to further study with the historical microdata series.


Calibrated Bayes Modeling at the Census


Written by: Roderick Little, Associate Director for Research and Methodology and Chief Scientist

Federal statistics have a rather schizophrenic view of survey inference. The preferred approach for inferences about descriptive population quantities from large surveys is the so-called “design” or “randomization” based approach, where population values are treated as fixed and uncertainty is based on probabilistic selection of the sample. This approach is widely attributed to the famous 1934 paper by Jerzy Neyman, and the Census Bureau was a pioneer in putting it into practice, led by Morris Hansen and others.

The design-based approach does not work well for situations where the survey information is limited and the so-called “direct” estimates it produces are noisy, like small area estimation. It also falls down for problems such as missing data where response cannot be considered random. An alternative is the modeling paradigm, which bases inference on a statistical model for the population values. It is also widely practiced at the Bureau. Indeed, economists and other social scientists are trained as modelers and are often somewhat mystified by the design-based approach. This leads to controversies over such matters as how and when design weights need to be included in the analysis.

I favor the approach known as “calibrated Bayes,” where all inferences are based on Bayesian models, but the models chosen must have good repeated sampling properties. To me everything is modeling, but some models make limited assumptions and lead to answers similar to “direct” design-based approaches, while others make stronger modeling assumptions that allow useful estimates in situations where direct estimates are too noisy. I have argued that this approach is more unified than the existing paradigm and provides a valuable way forward for official statistics. See “Calibrated Bayes, an Alternative Inferential Paradigm for Official Statistics” for more details.
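To make the idea concrete, here is a minimal sketch, in Python, of the kind of shrinkage a simple Bayesian model produces for small areas. It is an illustration under simplifying assumptions (a normal-normal model with a known between-area variance), not anyone's production methodology, and all of the numbers are invented.

```python
# Minimal sketch of normal-normal Bayesian shrinkage for small areas.
# Assumptions: direct estimates are unbiased with known sampling
# variances, and the true area means are drawn from a common normal
# distribution with known between-area variance. All values invented.
import numpy as np

# Hypothetical direct survey estimates for eight small areas,
# each with a known sampling variance (larger = noisier).
direct = np.array([52.1, 47.9, 60.3, 44.5, 55.0, 39.8, 63.2, 50.6])
sampling_var = np.array([4.0, 25.0, 9.0, 36.0, 4.0, 49.0, 16.0, 9.0])

# Between-area variance; in practice this would be estimated from
# the data (e.g., by method of moments or REML).
between_var = 16.0

# Precision-weighted overall mean.
grand_mean = np.average(direct, weights=1.0 / (sampling_var + between_var))

# Posterior mean: precise direct estimates keep most of their weight,
# while noisy ones are pulled toward the overall mean.
w = between_var / (between_var + sampling_var)
model_based = w * direct + (1.0 - w) * grand_mean

for d, v, m in zip(direct, sampling_var, model_based):
    print(f"direct={d:5.1f}  sampling_var={v:4.0f}  model-based={m:5.1f}")
```

Areas with small sampling variance end up with estimates close to their direct estimates, the “limited assumptions” end of the spectrum; areas with very noisy direct estimates borrow strength from the rest, which is where the stronger modeling assumptions earn their keep.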

This may seem a rather abstruse topic, but it’s fun to think about, and fundamental since it underlies nearly everything we do. What’s your view?


Take Your Best Shot (and may the best model win)!


Written by: Nancy Bates, Senior Researcher for Survey Methodology, Associate Directorate for Research and Methodology

On August 31, the Census Bureau launched a nationwide prize competition under Section 105 of the America COMPETES Reauthorization Act of 2010 (Public Law 111-358). The contest – dubbed the Census Return Rate Challenge – encourages teams and individuals to compete for prize money by predicting 2010 Census mail return rates. The contest ends on November 1.

The challenge is to create a statistical model that accurately predicts 2010 Census mail return rates for small geographic areas (census block groups). Nationwide, 79.3% of households that received a 2010 Census mail form completed it and mailed it back, but the level of mail return varied greatly by geography. The Census Return Rate Challenge asks participants to model these variations using predictive variables found in the updated 2010 Census Planning Database.

The 2010 Census Planning Database is a block-group level database that assembles a range of geographic, housing, demographic, and census operational data extracted from the 2010 Census and the 2006-2010 American Community Survey. Participants are provided a sample of the database on which to build their models.

The Census Bureau will use the winning models for planning purposes for the decennial census and for demographic sample surveys. Participants are encouraged to develop and evaluate different statistical approaches in order to propose the best predictive model for these geographic units.
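For readers curious what an entry might look like, here is a hypothetical starter sketch, not an official baseline. The file name, predictor columns, and target variable below are invented stand-ins for fields in the challenge's sample of the 2010 Census Planning Database, and mean absolute error is used simply as one reasonable scoring choice.

```python
# Hypothetical starter model for predicting block-group mail return
# rates. All file and column names are invented placeholders; swap in
# the actual fields from the challenge data.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

df = pd.read_csv("planning_database_sample.csv")  # placeholder file name

predictors = [
    "pct_renter_occupied",      # invented names; substitute real
    "pct_age_18_24",            # Planning Database variables
    "median_household_income",
    "pct_vacant_units",
]
X, y = df[predictors], df["mail_return_rate"]  # invented target name

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print("Holdout MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```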

We hope the competition will generate new ideas on predicting census return rates as well as attract a range of talent and expertise. To help attract this talent, prizes are offered.

Prizes:

1st place: $14,000
2nd place: $7,500
3rd place: $2,500
Visualization prize: $1,000

Rules and Information

Contest details, rules, and eligibility guidelines are available at http://www.kaggle.com/c/us-census-challenge.


Demography at the U.S. Census Bureau

Written by: Howard Hogan, Ph.D., Chief Demographer

Demography literally means “writing about people.” However, it has come to mean the quantitative or statistical study of human populations. As such, most of what the U.S. Census Bureau does can be described as “demography,” from the taking of the census every 10 years to collecting information on race, ethnicity, employment, income, poverty, commuting, health, and crime victimization; the list would be long.

A narrower definition of demography focuses on the processes that determine the growth and composition of human populations. These processes are the most fundamental of all human activities: birth, death, and movement. Demography is the oldest of the social sciences, tracing its origin to 1662, when John Graunt analyzed the death rolls of London.

Each of these processes is greatly influenced by age, time, and what demographers refer to as cohort. A cohort is a group of people who experience the same event at the same time. The most famous cohort is the people born during the Baby Boom, from July 1946 through June 1964.

In all human populations, the chance of dying follows a predictable pattern. The probability of dying is relatively high just after birth, falls to a low in the early teenage years, rises slowly during the adult years, and increases dramatically in old age. This overall pattern seems to be determined by basic biology. In studying the chance of dying at a given age, demographers typically analyze the conditional probability: given that a person has reached a certain birthday, what is the chance he or she will not survive to the next birthday?
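To make that conditional probability concrete, here is a small worked example in standard life-table notation; the symbols are standard, but the numbers are invented for illustration and are not actual mortality rates.

```latex
% l_x: persons alive at exact age x
% d_x: deaths between exact ages x and x+1
% q_x: conditional probability of dying between ages x and x+1
q_x = \frac{d_x}{l_x}
% Invented example: if l_{70} = 80{,}000 people reach age 70 and
% d_{70} = 2{,}400 of them die before age 71, then
% q_{70} = 2{,}400 / 80{,}000 = 0.03.
```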

Of course, while the general pattern may be the same, the level of mortality can differ greatly, as can the specific details. These are determined by things such as nutrition, public health, access to health care, and smoking. Thus, demographers quickly return to wider issues that affect humans.

Similarly, fertility tends to follow a general pattern, with few births to women under 16 or over 40 and a peak generally in the 20s. This much is driven by biology. However, the exact level and pattern are driven by customs of marriage and social expectations, as well as by factors such as nutrition and access to birth control. Of course, all of these factors are related to wider issues such as education, class, income, race, and ethnicity. To understand fertility, the demographer must again tackle a wider set of issues.

Population movement is the least biologically driven of the three basic processes. There is a general tendency for young adults to be more mobile than young children or older adults, but population movement can be driven by economics (jobs), laws, and the availability of housing, as well as by natural or man-made disasters. One needs only to remember Hurricane Katrina to see how quickly population movement can take place.

How do Census Bureau demographers use these concepts? For one thing, they look well beyond our borders.

What is happening demographically around the world will affect the United States in many ways, so the demographers at the U.S. Census Bureau do not restrict their work to the U.S. population. They have an active program to gather and study population statistics from around the world to inform federal agencies as well as U.S. businesses of these trends. A fine example of this is the report “An Aging World,” which describes how reduced fertility and mortality around the world are producing a population that is, on average, older than ever before in human history.

Births, marriage, deaths, movement, aging — demographers study the processes that affect us all.


Hard-to-Reach Populations: Research Wanted

Written by: Nancy Bates, Senior Researcher for Survey Methodology, Associate Directorate for Research and Methodology

The purple and green areas represent two hard-to-count clusters from the 2010 Census.

As the statistical agency responsible for enumerating every person residing within the United States, the Census Bureau has finding and counting the so-called “hard to reach” in its organizational DNA. Still, in 2008, when I was researching how best to target the 2010 Census communications campaign, I was struck by the lack of empirical, peer-reviewed research on methods to reach hard-to-count populations.

So, I did what any good empirical researcher would do when confronted with an untapped research opportunity – I pitched the idea of holding a special research conference devoted to the topic. Finding support wasn’t difficult, since it had been 20 years since a similar conference was held in the U.S. Obviously, new hard-to-reach populations have emerged since then (as have innovative solutions for measuring them). Why not pull together researchers from around the world to share their stories and successes? Thus the International Conference on Methods for Surveying and Enumerating Hard to Reach Populations (aka “H2R 2012”) was born.

The conference will be held October 31 through November 3 in New Orleans, Louisiana, at the New Orleans Marriott at the Convention Center. Addressing both the statistical and survey design aspects of including hard-to-reach groups, researchers will report findings from censuses, surveys, and other research related to the identification, definition, measurement, and methodologies for surveying undercounted populations. The conference is supported by the Census Bureau and more than 20 other government agencies, nonprofits, and private sector survey research firms. The American Statistical Association will manage the conference.

The conference will include a plenary session on the All Ireland Traveller Health Survey. Travellers are a minority group on the island of Ireland, with a separate identity, culture, and history. They are nomadic and socially disadvantaged, with high illiteracy levels, their own language (“Shelta”), and poor life expectancy and health status. The community is hard to reach in both geographical and psychosocial terms. The plenary will present both a methodological perspective from the survey director and a community perspective from community peer researchers.

In addition, the program will feature more than 150 paper presentations, including sessions on immigrant populations, populations affected by natural disasters, stigmatized populations, and homeless populations. Research on innovative sampling techniques, recruitment methods, use of community-based organizations, and social marketing and outreach campaigns will also be presented.

Perhaps the most exciting outcome of the conference will be its work products: a 30-chapter invited monograph, a special issue planned for the Journal of Official Statistics, and online conference proceedings.

Registration is now open and the online program is available at the H2R website. (http://www.amstat.org/meetings/h2r/2012/index.cfm?fuseaction=main)


Steven Ruggles, Census Data Processing, Part 2

Written by: Todd Gardner, Historian/Survey Statistician, Center for Economic Studies

In a recent presentation at the Census Bureau, Dr. Steven Ruggles, the director of the Minnesota Population Center (MPC) at the University of Minnesota, talked about the history of processing data from the census. Ruggles argued that the needs of the Census Bureau drove innovation in data processing technology up to 1960, but the private sector rather than the Census Bureau has played that role in the last 50 years. During this period the costs of data collection, storage and analysis have declined rapidly and the quantity of data collected has grown at an extraordinary pace.

Following the 1960 census, improvements in computers brought more potential for research using census data. The Census Bureau responded to researchers’ requests for data by releasing the 1960 Public Use Microdata Sample (PUMS), a 1-in-1000 sample of the records from the 1960 long form. PUMS files are for statistical purposes only and do not contain any personal information that would allow individuals to be identified. This dataset, which was delivered on 13 UNIVAC tapes (or 18,000 punch cards), allowed researchers to address a variety of questions that would not have been possible using publicly available tabulations. Since the sample consisted of microdata—records at the person- and household-level—it offered the opportunity to develop customized measures and to do multivariate analyses.

The 1960 PUMS was well received by the research community, so following the 1970 census, the Census Bureau released a one-percent sample of the 1960 census (a tenfold increase over what had initially been released) along with six percent of the records from the 1970 long form. Perhaps most importantly, the concurrent release of the revised 1960 sample and the 1970 sample allowed researchers to examine change over time very easily, as both samples used the same codes and formats.

The Census Bureau released a sample of records from the 1980 census, and outside researchers took up the task of producing samples of historical censuses. Hal Winsborough at the University of Wisconsin contracted with the Census Bureau to create samples from the 1940 and 1950 censuses, extending the series of available microdata to five censuses. Projects at the University of Washington and Penn led by Sam Preston produced samples of the 1900 and 1910 censuses, then the latest censuses available to the public. In the late 1980s, Ruggles led efforts at the University of Minnesota to produce samples of historical censuses dating back to 1850.

Though the consistent coding of the 1960 and 1970 census samples had demonstrated the power of interoperability for studying change over time, none of the other samples were produced in a consistent manner. Ruggles initiated the Integrated Public Use Microdata Series (IPUMS) project to “harmonize” all of the census public use samples: that is, to produce new versions of these datasets with consistent codes, record layouts, and integrated documentation, without any loss of information from the original datasets. The initial release of IPUMS data came not long after the development of the first web browsers, and Ruggles was quick to take advantage of this technology to disseminate the harmonized census microdata, leading to a rapid increase in the use of the IPUMS database for research.
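In miniature, the harmonization Ruggles describes amounts to mapping each census year's native codes onto one consistent scheme. The sketch below illustrates the idea with an invented variable and invented codes; actual IPUMS codings are far more elaborate and preserve every source distinction.

```python
# Toy illustration of harmonization: translate year-specific codes
# for a variable into one consistent coding scheme, so the same code
# means the same thing in every census year. All codes are invented.
RECODE = {
    1960: {1: "owned", 2: "rented"},
    1970: {0: "owned", 1: "rented", 9: "not reported"},
}

def harmonize_tenure(year: int, raw_code: int) -> str:
    """Return the harmonized label for a year-specific tenure code."""
    return RECODE[year].get(raw_code, "unknown")

print(harmonize_tenure(1960, 2))  # -> rented
print(harmonize_tenure(1970, 1))  # -> rented
```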

The IPUMS now includes data from all censuses from 1850 to 2000 (with the exception of the 1890 census, which was destroyed by fire). Ruggles is also involved in efforts to digitize entire historical censuses. This project is known as the North Atlantic Population Project (NAPP), since most of the countries involved are in North America and northern and western Europe; it currently contains 120 million person records from 24 censuses covering the period from 1800 to 1910. Ruggles predicts that by 2016 the combined IPUMS and NAPP data releases will grow to 1,150 censuses and surveys from 110 countries, comprising 1.5 billion person records. MPC has also done extensive work harmonizing recent census data from other countries. IPUMS-International currently contains data from 185 censuses from 62 countries, comprising some 400 million person records spanning the period from 1960 to 2010.

The greatest challenges to these efforts are that many datasets are inaccessible or at risk of loss, and that whatever metadata exist are typically sketchy. Despite these challenges, improvements in computing technology and a rapid decline in the cost of storing data have brought about new opportunities for data collection and analysis. Where it cost about $1,200 to store one megabyte of data in 1980, the same amount of storage now costs about $0.00004, a decline of more than seven orders of magnitude. These factors have combined to bring about a marked acceleration in the pace of discovery in recent years.

Ruggles pointed out that, between all of MPC’s projects, the center currently has data on more than 850 million people, or roughly as many people as Facebook. MPC is now collaborating with the Census Bureau, as well as other research organizations, to expand its projects. Current large-scale efforts include the National Historical GIS Project and Terra Populus, or “TerraPop,” an effort to preserve, integrate, and disseminate global-scale spatiotemporal data describing population and the environment.

Where the Census Bureau once drove innovation in data processing technology, it is now a beneficiary of the technological changes of recent years. The Census Bureau is now collaborating with the research community in a variety of ways to improve data collection and to produce new data products.


From Tally Marks to Modern Computers — The Early Evolution of Census Data Processing

Written by: Todd Gardner, Historian/Survey Statistician, Center for Economic Studies

When the nation counted 3.9 million residents in the very first census in 1790, the sheer workload of hand-tabulating the results was one of the founders’ greatest challenges.  As the nation grew, so did the challenge.  And when counting the results of the census took almost as long as the decade itself, the search for solutions ultimately led to the creation of modern data processing technology.

This was the thesis of Dr. Steven Ruggles, Regents Professor of History and Population Studies at the University of Minnesota, during a recent guest lecture at the Census Bureau’s Center for Economic Studies seminar series.

Ruggles, best known as the creator of the Integrated Public Use Microdata Series (IPUMS) at the Minnesota Population Center, argued that between 1850 and 1960 the needs of census data processing drove innovation in computing technology.

The first device to accelerate census tallying was created in 1872 by the chief clerk of the census, Charles W. Seaton. The machine used rollers to add up manually entered keystrokes. However, even with the Seaton Machine, the census took almost the full decade to process.

In 1888 the Census Office held a competition to find a more efficient way to process and tabulate data. Herman Hollerith’s electric tabulator won the competition, capturing and processing data by reading holes in punch cards. Census clerks could transfer the information on the paper census forms to punch cards, each clerk creating about 500 cards a day. Variations on the Hollerith machine were used to process the 1890-1940 censuses.

World War II and the military’s need for faster ballistics calculations led to the next big data processing advance. After the war, the focus shifted to peacetime applications of computers, and in 1951 the Census Bureau took delivery of UNIVAC, becoming the first civilian client of the modern digital computer. Built by the Eckert-Mauchly Computer Corporation at a cost of $400,000, UNIVAC processed data at a pace far outstripping the old Hollerith machines, with data entered on magnetic tape and processed by sophisticated vacuum-tube circuits.

In this same decade, Census Bureau engineers and staff at the National Bureau of Standards tackled the last big data processing hurdle: eliminating the need to transfer data to punch cards before computers could read it. The “Film Optical Sensing Device for Input to Computers,” or FOSDIC, made census forms machine-readable, and the 1960 Census was the first to use the technology.

As Ruggles contended, advances in data processing up until the 1960s were driven by the growing challenge of processing the decennial census. See our next entry to learn how, from the 1960s onward, the research community, increasingly equipped with modern computer technology, was the driving force behind revolutionary innovations in population data infrastructure and the expansion of the scale of all scientific data.

The Seaton Machine (1870-1880 Censuses)

The Hollerith Electric Tabulator, 1902

UNIVAC I, 1952

The Census Bureau at the Association of American Geographers Annual Meeting

Written by: Michael Ratcliffe, Assistant Division Chief, Geography Division

Geography is central to the work of the Census Bureau, providing the framework for survey design, sample selection, data collection, tabulation, and dissemination.  Geography provides meaning and context to statistical data.

Given the diversity of our population, our economic activities, and our geographic areas, use of the latest and best geographic methodologies is critical to the Census Bureau’s ability to serve as the leading provider of statistical and geospatial data.  Our geographic area concepts, information, and statistical data must keep pace with the needs of the researchers and analysts who work to understand the changing distribution and characteristics of our people, places and economy.

The Association of American Geographers’ annual meeting in February brought together more than 8,600 geographers and other social scientists. Many of those attending rely on Census Bureau geographic and statistical information as a core component of their work.

Census Bureau staff delivered presentations or served as panelists in 24 sessions. Researchers from our Population Division discussed methodologies to estimate, map, and visualize global subnational demographic data, and the use of those data to address issues affecting population groups in various countries.

In response to the increasing interest in place-based analysis and planning, and the importance of understanding how our census geographic concepts relate to the perceptions and expectations of researchers and data users, Geography Division researchers organized three sessions focused on identifying, articulating, and defining places and communities. Other presentations explored new techniques for assessing the quality of spatial data and for measuring job accessibility.

Geographers have long understood the power of maps and other tools for visualizing data. The Census Bureau shares this vision and has a rich history as an innovator of methods and techniques for visualizing statistical data. Census Bureau Director Dr. Robert Groves delivered the keynote address, in which he discussed Census Bureau efforts to develop new ways to visualize and disseminate data using the latest technologies.

For more information about the Census Bureau’s activities at the 2012 annual meeting of the Association of American Geographers, see our website:

http://www.census.gov/research/conferences/census_at_the_association_of_american_geographers_annual_meeting.php


Geography at the Census Bureau

Written by: James Fitzsimmons, Assistant Division Chief, Population Division

As a field of study dating from antiquity, geography ranges broadly from the social sciences to the physical sciences and humanities.  The Census Bureau’s work in geography is concentrated in the domain of the social sciences and in such methodological subfields as cartography, geographic information systems, and remote sensing.

One of geography’s central questions—Where is it?—is the foundation for the agency’s most fundamental activities.  All residences and firms ultimately occupy a particular location or locations, and gains in program efficiency and effectiveness come about in part from detailed understanding of the geography of people and activities.  Geography often serves as the organizing dimension for the agency’s data collection, tabulation, presentation, and dissemination.

Collection of Census Bureau data, for example, depends heavily on America’s digital map—TIGER—and using that map more effectively along with the Master Address File is one subject of current research.  Recognizing settlement patterns informs strategies for designing samples and directing enumeration to ensure appropriate coverage while minimizing duplication and field interviewers’ travel.

Delineation of statistical areas such as census blocks, tracts, and urban areas is another geographic activity that is crucial to agency data collection and presentation programs and one that has benefited from continued research.  A host of questions underlies decisions on qualification of areas, delineating boundaries, and determining when areas have grown together sufficiently to be regarded as a single, larger area.

Finally, both reviewing data and disseminating them in ways that facilitate and encourage their use gain from geographic insights. Geographers continue to be central to the effective presentation of Census Bureau data, and their work has benefited both the identification of potential data problems and the discernment of patterns and relationships.

The Census Bureau’s 19th century census atlases were leaders in the field of data presentation. Last decade’s Census Atlas of the United States  (see: http://www.census.gov/population/www/cen2000/censusatlas/) was the first comprehensive atlas of population and housing produced by the Census Bureau since the 1920s. It contains nearly 800 maps covering topics such as language and ancestry characteristics, housing patterns and the geographic distribution of the population.

The growing number of web applications, recently introduced or in preparation, for reviewing and displaying data requires sustained research on mapping best practices if the Census Bureau is to continue to excel in this arena.

Census Superintendent J.D.B. DeBow published the first map in a census publication in 1854. This map showed his delineation of the United States into four major regions, based on the major drainage systems of North America.

Center of Population Map

Census Bureau “Center of Population” maps trace the center point of the nation's population starting with the first census in 1790. (See animated historical map: http://www.census.gov/geo/www/maps/2010center_pop_animation/2010mean_center_animation.html)