Department of Health and Human Services
Administration for Children and Families
Office of Planning, Research & Evaluation (OPRE)

II. Methodological Issues and Limitations in Studies

Approaches to Studying Welfare

Studies of families receiving TANF and those of families that have left welfare fall into three broad categories: descriptive, analytic, and experimental. Descriptive studies use qualitative/ethnographic and quantitative data to assess the status of welfare leavers and recipients at one or more points in time. Such studies can document the demographic characteristics of these families (e.g., race/ethnicity, age, educational attainment); their employment, income, and earnings; and the hardships and barriers to work they face. They are extremely useful because they clearly illustrate the circumstances of families that are either still on welfare or have recently exited the program and can alert policymakers and program administrators about emergent problems and unmet needs in these populations. Although descriptive studies are sometimes used to determine whether the circumstances of welfare families have improved or deteriorated over time, such direct comparisons may be misleading if the types of families entering welfare are also changing (this issue is addressed in greater depth below). In addition, descriptive studies comparing welfare recipients or welfare leavers before and after welfare reform or any particular change in welfare policy cannot be used to draw strong conclusions about the impacts of changes in policy because one cannot be certain about how the welfare population would have changed in the absence of a policy shift.

Analytic studies attempt to understand how and why the circumstances of welfare leavers and recipients are changing over time. At a minimum, such studies use multivariate statistical techniques to isolate the disparate factors that can influence the status and outcomes of these families. For example, where a descriptive study might note that the earnings of families that left welfare declined from one year to the next, an analytic study could find that the earnings of both high school graduates and high school dropouts increased but that a higher percentage of leavers in the latter year were high school dropouts (and dropouts have lower earnings than graduates). Thus, an analytic study could demonstrate that what appears to be bad news (falling average earnings) masks good news (rising wages for all subgroups). Analytic studies also attempt to ascertain why observed changes occur; for example, have policy changes or policy choices contributed to changes in the circumstances of the welfare recipients and leavers?
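The composition effect just described can be made concrete with a small numerical sketch (all earnings figures and subgroup shares below are hypothetical):

```python
# Hypothetical mean monthly earnings for welfare leavers, by education level.
# Both subgroups earn more in year 2, but the dropout share of leavers grows.
earnings = {
    "year1": {"graduates": 1000, "dropouts": 600},
    "year2": {"graduates": 1050, "dropouts": 650},
}
dropout_share = {"year1": 0.50, "year2": 0.70}

def overall_mean(year):
    """Caseload-wide mean earnings: subgroup means weighted by subgroup shares."""
    s = dropout_share[year]
    return s * earnings[year]["dropouts"] + (1 - s) * earnings[year]["graduates"]

avg1, avg2 = overall_mean("year1"), overall_mean("year2")

# Every subgroup's earnings rose from year 1 to year 2...
assert earnings["year2"]["graduates"] > earnings["year1"]["graduates"]
assert earnings["year2"]["dropouts"] > earnings["year1"]["dropouts"]
# ...yet the overall mean fell, purely because the mix of leavers shifted.
print(round(avg1), round(avg2))  # 800 770
```

This is the composition (Simpson's-paradox-style) effect the paragraph describes: the falling average reflects a shift toward the lower-earning subgroup, not worsening outcomes within any subgroup.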

Analytic studies that try to assess the impacts of policies on welfare caseloads, welfare recipients, and welfare leavers all require variation in welfare policies, and this variation generally arises over time and across jurisdictions (e.g., states). However, many factors other than welfare policies also change over time, such as the economy and nonwelfare policies (e.g., the EITC), and there are many differences between jurisdictions besides how they approach public assistance. Further complicating analytic studies, “welfare policy” comprises a variety of specific policies (e.g., benefit levels, sanctions, time limits), jurisdictions may elect to implement policies with offsetting effects, and different jurisdictions may implement the same de jure policy in different ways. As a result of these complications, subtle differences in the approach taken by analytic studies that try to explain changes in welfare recipients’ status and outcomes (e.g., the specific states and time periods, the way policy choices are modeled, the set of policies and nonpolicy factors considered) can produce divergent findings.

Experimental studies are considered the “gold standard” for assessing the impact of policies on welfare recipients and welfare leavers. Under these experiments, applicants to welfare programs are randomly assigned to either a treatment or control group. The treatment group is subject to a new welfare policy while the control group is subject to the rules governing the prior policy regime. Differences in outcomes between the two groups can be ascribed to the policy differences.

There are, however, several important limitations to experimental studies. Experiments are expensive to conduct and are usually confined to a limited geographic area and a limited set of policy changes. As such, it is not clear whether findings from even well-done experimental studies apply outside of the study area (e.g., will a program that works in Nevada be as effective in Tennessee?). In addition, social science experiments are not conducted under controlled laboratory conditions, and an experiment can be contaminated. For example, in some cases a policy shift may be implemented statewide while a few welfare applicants are placed in a control group and must function under the previous policy regime; if the control group members believe the new rules apply to them because those rules are being widely advertised, then the control group has effectively received the treatment. Finally, policies may have effects that occur before affected individuals can be randomly sorted into treatment and control groups. For example, if random assignment takes place after an individual has applied for welfare, then the experiment cannot detect the effects of policies that deter or discourage individuals from applying for benefits in the first place.

Descriptive, analytic, and experimental studies all have important limitations, but as long as one is mindful of these limitations, a great deal of useful information can be gleaned from each. Descriptive studies allow policymakers and analysts to document changes in the status of welfare recipients and welfare leavers. For example, if a growing share of welfare recipients have physical or mental health problems, it is important to focus on meeting their health needs and finding jobs that can accommodate their limitations even if this change in the composition of the caseload is the result of positive trends—i.e., because the most work-ready and able recipients are leaving welfare at a faster rate.

Analytic studies assessed as a group can provide some understanding as to why the status of welfare recipients and leavers may be changing over time or differing across states. Although it may be inappropriate to draw strong inferences about causal relationships from any single analytic study, an analyst or policymaker can be more confident about findings that consistently arise in multiple analytic studies using different data sets and a variety of empirical specifications. Finally, well-executed experimental studies can demonstrate the specific policies that can influence the status of specific populations, and policymakers can use this information to help decide whether a particular approach is more or less likely to work in other settings.

Data Sources for Studying Welfare

Descriptive and analytic studies of welfare populations rely on two types of data: administrative data and survey data. The most basic information on the size and characteristics of the welfare caseload comes from administrative program data collected by the states and submitted to the federal government (the Department of Health and Human Services [HHS]); summary information from these data is produced regularly by HHS.12

Survey data can provide a richer set of characteristics on current and former welfare recipients, although there are scope and quality issues with these data relative to administrative data—these issues are discussed later. Survey data are obtained through direct questioning of individuals and families; surveys can be conducted by mail, over the phone, or in person. Some surveys attempt to reinterview the same households at different points in time, gathering longitudinal data, while others cover a single point in time (cross section). Cross-section surveys may be repeated at different points in time, but the samples of surveyed households in different rounds of the survey are independent of one another—i.e., different households are surveyed in different rounds.13 Cross-section and repeated cross-section data can provide snapshots of welfare populations at different points in time, but there is no way to know how the circumstances of particular families change over time—this requires longitudinal data. Generally, it is more costly to collect longitudinal data because the same households (and in some cases individuals who leave those households) have to be relocated for every round of data collection, and those who cannot be located may differ markedly from those who can. As such, findings on how the same families are faring over time may be subject to attrition bias.

Several large, nationally representative surveys ascertain whether individuals currently receive or had been on welfare and can be used to study welfare recipients and leavers. These include the Current Population Survey (CPS—repeated cross sections), the Survey of Income and Program Participation (SIPP—longitudinal), and the National Survey of America’s Families (NSAF—repeated cross sections); these three surveys are discussed at greater length in the following section. In addition, the National Longitudinal Survey of Youth (both the 1979 and 1997 cohorts) and the Panel Study of Income Dynamics (PSID) (both longitudinal) can and have been used to study welfare-related topics, and other data sets such as the National Health Interview Survey (NHIS), the National Survey of Families and Households (NSFH), the National Survey of Family Growth (NSFG), and the Survey of Adolescent Health (AdHealth) all allow for at least limited analyses of welfare populations.

In addition to national surveys, many smaller surveys have been conducted that focus on welfare populations in specific locations. For example, the Women’s Employment Survey (WES) tracked a sample of welfare recipients in a single Michigan county from 1997 to 2003. The Three City Study (TCS) was fielded to help understand how welfare reform has affected children and focuses on low-income families in Boston, Chicago, and San Antonio; it is a longitudinal study that surveyed families in these three cities in 1999, 2001, and 2005. The Project on Devolution and Urban Change (Urban Change) aimed to assess how welfare reform affected low-income families with children and their communities; it gathered data in four of the nation’s largest urban counties (Cuyahoga County, Ohio, which includes Cleveland; Los Angeles County, California; Miami-Dade County, Florida; and Philadelphia County, Pennsylvania) from 1997 to 2001. In addition, many jurisdictions have undertaken studies of welfare recipients and welfare leavers that collected survey data. HHS funded a series of welfare caseload studies (in Colorado, the District of Columbia, Illinois, Maryland, Missouri, and South Carolina) in which all jurisdictions used the same core survey questions. Similarly, HHS funded a series of welfare leaver studies in which researchers tried to focus on common administrative data elements across jurisdictions.

Some studies of welfare populations use administrative data from state welfare programs, sometimes linked with other administrative data such as a state’s Unemployment Insurance (UI) system records. Administrative data records are the definitive source of information about program participation and provide information on the size of welfare caseloads and the rate at which families exit from the program. By linking data on current and former welfare recipients from TANF records with UI wage records, analysts can tell how many working recipients and former recipients are covered by states’ UI systems. Linking TANF data with data on other public assistance programs (e.g., food stamps, Medicaid, child welfare) can be used to study multiple program participation. Many of the jurisdiction-specific studies use both administrative and survey data in their analyses.
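A minimal sketch of the kind of record linkage described above, using hypothetical identifiers and wage amounts (real TANF–UI matches use confidential person identifiers and quarterly wage files):

```python
# Hypothetical TANF leaver records and state UI quarterly wage records,
# keyed on a shared (hypothetical) person identifier.
tanf_leavers = [
    {"person_id": 101, "exit_quarter": "2004Q1"},
    {"person_id": 102, "exit_quarter": "2004Q1"},
    {"person_id": 103, "exit_quarter": "2004Q2"},
    {"person_id": 104, "exit_quarter": "2004Q2"},
]
ui_wages = {101: 3200.0, 103: 2750.0, 104: 1900.0}  # person_id -> quarterly earnings

# Link the two systems: a leaver "matches" if the UI file shows covered earnings.
matched = [r for r in tanf_leavers if r["person_id"] in ui_wages]
share_with_ui_earnings = len(matched) / len(tanf_leavers)
print(share_with_ui_earnings)  # 0.75

# Caveat from the text: self-employment, out-of-state jobs, and federal jobs
# never appear in state UI files, so unmatched leavers are not necessarily jobless.
```

The same join logic extends to other administrative systems (food stamps, Medicaid, child welfare) for studying multiple program participation.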

A final source of data comes from welfare experiments. Experimental data generally combine both administrative and survey data for the population studied in the experiment.

Advantages and Disadvantages of Different Data Sources

A key difference across survey, administrative, and experimental data sources is scope. Experimental data are the most limited in scope because they are collected to answer a specific research question, usually the impact of a specific program or program component. They are only collected for individuals and families in either the treatment or control groups of the study population, and the study population is often limited in place to a single city, county, or state and is always limited in time. While experimental data have been used in subsequent studies not related to the original experiment, this practice is relatively rare.

Administrative data are limited to the geographic entity that administers the welfare program, generally the state.14 Administrative data systems collect very accurate information on factors that influence eligibility for benefits and benefit levels, but less attention may be paid to other items often reported in administrative data (e.g., education levels). Data systems can differ substantially from state to state, so a national picture built from state administrative data can be hard to obtain. To study families that have left welfare, state TANF data must be linked to other data systems, like the UI system. Not all jobs are captured in state UI wage records (e.g., self-employment, jobs in other states, federal jobs), so matching is necessarily incomplete.15

Survey data can also be limited to a single jurisdiction or set of jurisdictions (e.g., WES, TCS), but they can be nationally representative. However, national general-purpose surveys do not necessarily collect information on all the aspects of the welfare population that are of interest to policymakers. For example, the CPS does not collect information on several barriers to work that are of interest, such as mental health status or experience of domestic violence.

All survey data are also subject to a common set of measurement problems. First, all surveys have some element of nonresponse; that is, some individuals or families selected for the survey sample cannot be contacted or refuse to be interviewed.16 The concern is that if those who respond differ from those who do not (e.g., nonrespondents may be less likely to be working or have lower incomes), then the results will not be representative of the whole survey sample, and findings from these data may be biased. Surveys of welfare recipients in specific geographic areas have achieved reasonably high response rates: Hauan and Douglas (2004) report that four of the six surveys of TANF recipients they review had response rates of over 70 percent, and a review of location-specific studies that surveyed former welfare recipients found six studies with response rates over 70 percent (Acs and Loprest 2004). Acs and Loprest (2002) discuss several studies that use administrative data to compare respondents with nonrespondents, or respondents who were easy to contact with those who were more difficult to contact, to gauge the extent of nonresponse bias in leaver studies. Although the results vary by study, surveys with response rates over 50 percent generally found similar average employment rates and other outcomes for welfare leavers (Acs and Loprest 2002).

Large national data sets used for welfare analysis also have nonresponse issues. The SIPP response rates for the first wave of the 1996 and 2001 panels are 92 and 87 percent, respectively. The NSAF reports separate response rates for adults and children: in the 1997 round, 62 and 65 percent, respectively; in 1999, 59 and 62 percent; and by the 2002 round, response rates had fallen to 52 and 55 percent for adults and children, respectively. The CPS has a survey response rate over 90 percent (Weinberg 2006), so it is likely to have a small nonresponse bias. All three data sets use weights to adjust for nonresponse in the surveys as well as for complex sampling. It is important to use these weights for results to be nationally representative.17
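A rough illustration of how such nonresponse weights work, using a simple weighting-class adjustment with hypothetical response rates and records (actual survey weighting is considerably more elaborate):

```python
# Hypothetical respondents, each with a base sampling weight and a
# weighting class whose response rate is known from the sample frame.
response_rate = {"low_income": 0.50, "other": 0.80}
respondents = [
    {"employed": 1, "base_weight": 100, "cls": "other"},
    {"employed": 1, "base_weight": 100, "cls": "other"},
    {"employed": 0, "base_weight": 100, "cls": "low_income"},
]

# Inflate each base weight by the inverse of its class's response rate so
# respondents also stand in for similar nonrespondents.
for r in respondents:
    r["weight"] = r["base_weight"] / response_rate[r["cls"]]

weighted_emp = sum(r["employed"] * r["weight"] for r in respondents)
total_weight = sum(r["weight"] for r in respondents)
unweighted_rate = sum(r["employed"] for r in respondents) / len(respondents)
weighted_rate = weighted_emp / total_weight
print(round(unweighted_rate, 3), round(weighted_rate, 3))  # 0.667 0.556
```

Because the low-income class (here, less likely to be employed) responds at a lower rate, the unweighted employment rate overstates employment; the weights pull the estimate back toward the full population.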

Unlike the NSAF and the CPS, the SIPP tracks families over time. As such, in addition to survey nonresponse, the SIPP has the additional problem of attrition—families that respond to the initial wave of the SIPP may not respond to subsequent waves, and the families that stop responding to the survey may be systematically different from the families that provide information at each wave. Theoretically, when the SIPP data are used to make point-in-time assessments, using the wave-specific weight should help adjust for nonresponse for any particular data collection wave. Using the panel or longitudinal weight should adjust for nonresponse and attrition for comparisons that track families over time. The quality of these weighting adjustments, however, is of some concern.18 Despite these concerns, SIPP data are widely used in studies of low-income populations and we use them in our analysis in this study.

The second problem in survey data is reporting error. Individuals sometimes report inaccurate information, either intentionally or because they misunderstand the question or do not recall the correct response. For welfare-related analyses, we are particularly interested in underreporting of TANF receipt. Some have suggested that confusion over geography-specific TANF program names may lead to misreports of income (Kindleberger 1999). Nelson and Zedlewski (2003) suggest that there is greater misreporting of TANF income among Spanish-speaking respondents.

All the major national surveys have an undercount of TANF recipients. According to the NSAF, there were 2.3 million families receiving welfare in 1997 (weighted count), compared with approximately 3.7 million families receiving TANF reported in administrative data, meaning the NSAF captures about 62 percent of the total caseload.19 In 2002, the undercount in the NSAF is similar: the NSAF reports about 1.3 million cases while the administrative data indicate that there were 2.1 million families receiving TANF. However, both the CPS and SIPP show an increase in underreporting of income, particularly welfare receipt, over time. The 1996 SIPP panel captures about 80 percent of the caseload; the 2001 panel captures about 60 percent (unpublished tabulations). In the CPS, the share of the TANF caseload captured falls from 71 percent in 1993 to 61 percent in 1998 (Wheaton and Giannarelli 2000). It is not known whether the problem continued to worsen through 2005 and beyond. Consequently, when making comparisons across SIPP panels or years of the CPS, differences over time between welfare recipients and welfare leavers may reflect both true changes in these groups and changes in the types of individuals who report welfare receipt.
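The undercount figures above are simple coverage ratios; for instance, using the 1997 NSAF numbers cited in the text:

```python
# Weighted survey count vs. administrative caseload (millions of families, 1997).
nsaf_families = 2.3
admin_families = 3.7
coverage = nsaf_families / admin_families
print(f"{coverage:.0%}")  # 62%
```

The same calculation underlies the SIPP and CPS coverage figures, with the survey-weighted count of reported recipients in the numerator and the administrative caseload in the denominator.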

In general, responses to retrospective questions may have more error than answers to current status questions (e.g., “Were you on welfare in the last two years?” versus “Did you receive welfare last month?”). As such, studies that follow the same individuals over time, like the SIPP (longitudinal data), may be more useful for assessing welfare recipients’ and leavers’ circumstances compared with cross-sectional data that ask retrospective questions. Further, when asking retrospective questions, the recall period likely affects the accuracy of responses. People are more likely to remember and accurately report on more recent events (e.g., last month, last week) than events that happened in the more distant past (e.g., last calendar year, in the last two years).

A related problem to reporting error is item nonresponse. Some survey respondents may refuse to answer certain questions or say they simply cannot recall or do not know the correct response. Many public-use data sets adjust for item nonresponse by imputing a response and noting when such imputations have taken place by adding variables known as allocation or imputation flags. Imputations are made using a variety of statistical techniques that are designed to not bias findings.20
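A minimal sketch of the hot-deck imputation procedure described in footnote 20, using hypothetical records (production imputation systems are far more sophisticated):

```python
import random

random.seed(42)  # for a reproducible sketch

# Hypothetical records; None marks item nonresponse on monthly earnings.
records = [
    {"educ": "hs", "earnings": 1200},
    {"educ": "hs", "earnings": 1400},
    {"educ": "hs", "earnings": None},
    {"educ": "college", "earnings": 2600},
    {"educ": "college", "earnings": None},
]

# Hot-deck: pool valid responses from donors in the same cell
# (here, the same education level).
donors = {}
for r in records:
    if r["earnings"] is not None:
        donors.setdefault(r["educ"], []).append(r["earnings"])

# Fill each missing item with a randomly drawn donor value, and set the
# "allocation flag" the text mentions so analysts can identify imputations.
for r in records:
    if r["earnings"] is None:
        r["earnings"] = random.choice(donors[r["educ"]])
        r["imputed"] = True

assert all(r["earnings"] is not None for r in records)
```

Drawing donors from within the same cell is what keeps the imputed values plausible for respondents with similar characteristics, which is how such procedures aim to avoid biasing findings.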

The third problem with using survey data for welfare research is that surveys that do not explicitly target welfare recipients or low-income families may end up interviewing only a small number of these families, making it more difficult to draw statistically significant inferences from them. For example, the unweighted count of welfare recipients fell from 1,458 in the 1997 NSAF to 530 in the 2002 round.21

Issues in Assessing Research on Welfare Recipients and Leavers under TANF

When considering the research on the status of families that are either on or have recently left welfare, it is easy to fixate on the weaknesses and shortcomings of different analytic approaches and data sources. However, many studies have sound methods and provide rich information about welfare populations. The key is to appreciate the limits of different studies and, where possible, examine multiple studies seeking to answer similar questions.

Although the TANF program is entering its second decade, most of the available research on welfare reform focuses on the effects of waivers to the AFDC program in the years immediately preceding TANF and on the very early years of the TANF program. Grogger, Karoly, and Klerman (2002) and Blank (2002) provide two excellent, comprehensive syntheses of research through 2001 on the impacts of TANF legislation on welfare caseloads, employment, earnings, use of other government programs, fertility and marriage, household income and poverty, food security and housing, and child well-being. Indeed, most available studies use data that predate 2000, and it is still rare to find studies using data any later than 2002.

Since the creation of the TANF program, only a handful of studies have used nationally representative data sets to examine how the status of current and former welfare recipients has changed in the wake of the 1996 federal welfare reform, and given the diversity of methods and time periods considered, it is difficult to draw strong, common conclusions. Indeed, the existing studies vary in the populations they consider (e.g., all welfare recipients, single-parent, female-headed households, or only those assistance units whose heads fall into a narrow age range), the specific years they consider, and the characteristics and outcomes they assess. Even when a common characteristic is considered, it may be measured differently (e.g., mean age vs. distribution across age intervals).

Further, studies that rely on different nationally representative data sets may have to use different definitions of a welfare leaver. For example, the SIPP lets an analyst observe month-to-month transitions in welfare receipt, but the NSAF can only identify leavers based on respondents who report no receipt at the time of the interview but some receipt in the past two years. Thus, a SIPP leaver is necessarily a recent leaver and an NSAF leaver may have been off TANF for almost two years. Because those who remain off welfare longer are likely to have different characteristics than the entire group of TANF leavers in the month of exit, it would not be at all surprising to find differences in the characteristics and outcomes of welfare leavers across the two data sets even when the same year is considered.

Given how difficult it is to compare studies at a given point in time, it is even harder to use studies to assess changes over time. One simply cannot use point-in-time information from one data set and compare it to point-in-time information from another data set from a later year and make any reasonable assessment about changes in the characteristics and outcomes of welfare recipients and leavers. Even two different studies that use the same data set but consider different years may be too dissimilar to make assessments over time.

Only a few studies explicitly consider changes over time using consistent data sets and definitions.22 But even these studies only use data through 2002. We simply know very little about how the characteristics and outcomes of welfare recipients and leavers have changed as the sluggish economy of 2000–2002 gave way to moderate growth from 2003 onward.

In addition, even using consistent data, it is difficult to draw conclusions concerning changes in the status and outcomes of welfare populations and the effects of welfare policies and other factors on their outcomes because the composition of the welfare caseload (and by extension, the composition of families leaving welfare) may have changed over time. (We address the evidence on these changes in the next section.) A comparison of employment, income, or any other outcome measure (and attempts to interpret the impact of welfare reform on these changes) needs to consider that the characteristics of who is on welfare and who has left welfare at any two points in time could be substantially different given the large decline in the overall size of the caseload. For example, a finding that employment rates for a group of welfare recipients fell over time needs to be interpreted in light of whether the characteristics indicating greater employability (e.g., higher education levels, fewer health problems) have changed as well. Careful studies will assess the status of welfare populations at two points in time while accounting for observable differences. In addition, given the dramatic decline in the welfare caseload, there may be important unobserved differences between the relatively large welfare population at the start of welfare reform and the relatively small population today. These differences can include day-to-day coping skills, undiagnosed health issues, or attitudes toward work and welfare. Unobserved differences also need to be taken into account when interpreting changes in outcomes over time.

For the balance of this report, we focus on national-level data and research to assess how the status of welfare recipients and welfare leavers has changed since federal welfare reform in 1996. We supplement these findings with experimental studies and location-specific studies using both survey and administrative data, both to deepen our understanding of the root causes of observed changes and their implications and to fill gaps when no other information is available.




12 These data can be accessed through the internet at http://www.acf.hhs.gov/programs/ofa/caseload/caseloadindex.htm and http://www.acf.hhs.gov/programs/ofa/character/indexchar.htm.

13 Some repeated cross-section surveys do reinterview some subset of respondents from round to round, but these data generally do not lend themselves to longitudinal analysis because the families that are captured in multiple rounds are not necessarily representative of the population or any specific population subgroup and, in general, these families are only interviewed at two points in time.

14 It is important to note that about one-third of the total TANF caseload can be found in just two states: California and New York. Consequently, learning more about trends in the size and status of the TANF caseloads in these two jurisdictions can contribute significantly to understanding national trends.

15 For a discussion of problems matching administrative data across programs, see Goerge and Lee (2002); for a discussion of measuring employment and income using administrative and survey data, see Hotz and Scholz (2002).

16 When using administrative data as the source for a survey sample, the quality of the contact information (such as phone number and address) is important for limiting survey nonresponse due to inability to contact respondents. This may be a greater problem for surveys of former recipients, whose information in TANF administrative records may be out of date; the problem is exacerbated as the time between exiting welfare and the survey increases. National surveys using random-digit-dial methods do not have this problem.

17 Nonresponse is a greater concern in the NSAF than in the SIPP and CPS. For an analysis of nonresponse and weighting adjustments in the NSAF, see Triplett (2006).

18 This report cannot provide a comprehensive assessment of the quality of SIPP data. For more information about the problems with the SIPP, see Besharov, Morrow, and Shi (2006); Lamas, Tim, and Eargle (1994); and Rizzo, Kalton, and Brick (1994).

19 Administrative data are from http://www.acf.hhs.gov/programs/ofa/caseload/caseloadindex.htm#2002.

20 One common procedure for imputing, or filling in, missing data is called hot-decking: a valid response drawn from a respondent with characteristics similar to those of the respondent who did not answer a particular question is used to fill in the missing response. Other techniques involve using regression-based predictions for the unreported items. For more information on imputation techniques, see Rubin (1987).

21 The NSAF does oversample low-income families, and 530 observations on welfare recipients is a sufficiently large sample to sustain analysis. However, the drop in the number of welfare recipients in the NSAF over time illustrates the growing challenges of using broad-based survey data to study this population.

22 See, for example, Acs et al. (2001), Bavier (2003), and Loprest and Zedlewski (2006). The findings from these and other similar papers are discussed in Section III of this report.

 
