U.S. Department of Labor
Employment & Training Administration


The National Agricultural Workers Survey

Statistical Methods of the National Agricultural Workers Survey

Introduction

The National Agricultural Workers Survey (NAWS) is an employment-based survey of randomly sampled hired crop workers. Designed in response to the Immigration Reform and Control Act of 1986 (IRCA), the survey was to determine if a shortage of seasonal agricultural workers was to be expected annually from 1990 to 1993. Since 1993, when the IRCA mandate expired, the Department of Labor (DOL) has continued carrying out the survey and making its findings available to numerous federal government agencies. Today, NAWS is the only national information source on the demographic characteristics and the employment, health, and living conditions of hired crop workers. NAWS findings are used for a multitude of purposes, including informing debate on immigration policy, allocating farm worker program funds, and program design and evaluation.

Overview of Sampling Design

The mobility of a large segment of the hired crop workforce and the temporal nature of agricultural work pose unique challenges to obtaining a nationally representative sample of migrant and seasonal crop farm workers. To overcome these challenges, the NAWS utilizes a multi-stage sampling process that focuses on a sampling universe of employed crop farm workers. As a household survey is infeasible, NAWS locates workers at their places of employment.

The NAWS is conducted in three cycles each year to account for seasonal changes in farm worker employment patterns. Interviews are allocated by proportional allotment across 12 regions and three cycles based on quarterly data from the U.S. Department of Agriculture’s (USDA) Agricultural Labor Survey. Within the 12 regions, 80 farm labor areas (FLAs) form a roster from which sampling locations are selected. These FLAs are aggregates of counties that have similar farm labor usage. The FLAs are selected in each region with probabilities proportional to the size of the seasonal farm labor force.

Within FLAs, counties are drawn with probabilities proportional to the size of farm labor expenses. In order to maintain regionally representative data and yet have an adequate distribution of rare events, simple random sampling is used in the sampling of growers. The end stage of sampling is the selection of farm workers at the establishment. Farm workers are selected using simple random sampling, and measures are taken to avoid bias, such as selecting across sites at establishments where workers can be found at more than one field and may be engaged in various tasks. This sampling scheme minimizes bias and creates a nationally representative selection of farm workers.

Description of Universe

Entity                      Universe     Sample
Agricultural Region               12         12
Farm Labor Areas                 487         80
Farms                      2,000,000      1,500
Crop Workers (estimated)   1,800,000      3,200

The universe for the study is the population of field workers active in crop agriculture in the continental United States. The NAWS uses multi-stage sampling, relying on probabilities proportional to size, to interview about 3,200 randomly selected crop workers each fiscal year.

Multi-Stage Sampling Procedure

Regions, Farm Labor Areas, and Counties

NAWS samples workers from 12 regions, which are aggregated from 17 USDA-designated agricultural regions (Table 1).

Table 1. NAWS Sampling Regions

NAWS Sampling Region   USDA Region Code   USDA Region Name   States in USDA Region
AP12                   AP1                Appalachian I      NC, VA
AP12                   AP2                Appalachian II     KY, TN, WV
CBNP                   CB1                Corn Belt I        IL, IN, OH
CBNP                   CB2                Corn Belt II       IA, MO
CBNP                   NP                 Northern Plains    KS, NE, ND, SD
CA                     CA                 California         CA
DLSE                   DL                 Delta              AR, LA, MS
DLSE                   SE                 Southeast I        AL, GA, SC
FL                     FL                 Florida            FL
LK                     LK                 Lake               MI, MN, WI
MN12                   MN1                Mountain I         ID, MT, WY
MN12                   MN2                Mountain II        CO, NV, UT
MN3                    MN3                Mountain III       AZ, NM
NE1                    NE1                Northeast I        CT, ME, MA, NH, NY, RI, VT
NE2                    NE2                Northeast II       DE, MD, NJ, PA
PC                     PC                 Pacific            OR, WA
SP                     SP                 Southern Plains    OK, TX

The number of interviews per region is proportional to the size of the seasonal farm labor force in that region, as determined by the USDA’s National Agricultural Statistics Service (NASS) [1].
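The proportional allocation described above can be illustrated with a minimal sketch. The function name, the rounding rule (largest remainder), and the labor-force figures are assumptions made for illustration; the survey documentation does not say how fractional interviews are rounded.

```python
def allocate_interviews(labor_force, total_interviews):
    """Split total_interviews across regions in proportion to labor-force size."""
    total_size = sum(labor_force.values())
    # Exact (fractional) share of interviews for each region.
    exact = {r: total_interviews * s / total_size for r, s in labor_force.items()}
    # Start from the integer part of each share.
    allocation = {r: int(x) for r, x in exact.items()}
    # Assumed rounding rule: leftover interviews go to the largest remainders.
    leftover = total_interviews - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda r: exact[r] - allocation[r], reverse=True)
    for r in by_remainder[:leftover]:
        allocation[r] += 1
    return allocation


# Hypothetical labor-force sizes for three regions (not NAWS data).
sizes = {"CA": 500_000, "FL": 150_000, "PC": 100_000}
print(allocate_interviews(sizes, 1000))
```

With these made-up figures, the region holding two-thirds of the labor force receives two-thirds of the interviews, and the allocations always sum to the requested total.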

The next stage of sampling is a random selection of farm labor areas (FLAs) in each region, using probabilities proportional to the size of seasonal agricultural payroll. A FLA is a grouping of counties that are similar in agricultural production and farm labor use. Labor expense data at the county level for directly-hired and labor-contracted crop workers are obtained from the Census of Agriculture. Eighty FLAs are selected to represent the 12 agricultural regions, with a minimum of three FLAs being sampled in each region. These 80 FLAs contain 395 counties.

To account for the seasonality of the industry, NAWS interviews are administered three times a year in cycles lasting ten to twelve weeks. The cycles start in February, June and October. The number of interviews conducted in each cycle is proportionate to the number of farm workers employed at that time of the year and is determined by pro-rating the quarterly estimates of farm workers (obtained from the USDA NASS-administered Agricultural Labor Survey), to match the three NAWS cycles.

For each cycle, approximately 30 farm labor areas (FLAs) are selected using probabilities proportional to size (PPS) of the seasonal agricultural payroll. Using the same PPS sampling, the number of interviews is allocated to each selected FLA. Finally, within each FLA, counties are drawn in a random order, again using PPS of the seasonal agricultural payroll. As discussed above, the Census of Agriculture provides county-level hired and contract labor expense data for hired crop workers to the NAWS. These data are used to approximate the size of the crop labor force in each county. Interviewing begins in the first selected county and, as a county's work force is exhausted, moves to the next randomly-selected county on the list. This procedure is followed until all the allocated interviews in a FLA have been completed.
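As an illustration of selection with probabilities proportional to size, the sketch below uses systematic PPS sampling along a cumulative size scale. This is only one common way to implement PPS; the text does not state which PPS procedure the NAWS uses, and the function name, FLA labels, and payroll figures are hypothetical.

```python
import random


def pps_systematic(sizes, n, start=None):
    """Select n units with probability proportional to size (systematic PPS)."""
    total = sum(sizes.values())
    interval = total / n
    if start is None:
        # Random start in [0, interval).
        start = random.random() * interval
    # Selection points are evenly spaced along the cumulative size scale.
    points = [start + k * interval for k in range(n)]
    selected, cumulative, i = [], 0.0, 0
    for unit, size in sizes.items():
        cumulative += size
        # A unit is selected once for every point that falls in its size range,
        # so a unit larger than the interval can be selected more than once.
        while i < n and points[i] < cumulative:
            selected.append(unit)
            i += 1
    return selected


# Hypothetical seasonal payroll figures for four FLAs (not NAWS data).
payroll = {"FLA-A": 40, "FLA-B": 10, "FLA-C": 30, "FLA-D": 20}
print(pps_systematic(payroll, 2, start=25))
```

A unit's chance of selection grows with its share of the cumulative scale, which is the property the sampling design relies on when it uses payroll as the measure of size.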

Selection of Employers

Within each selected county, employers are selected at random from a grower sampling list. The list of agricultural employers is constructed primarily from the unemployment insurance (UI) micro-data file. The Bureau of Labor Statistics (BLS) provides the UI data file directly to the NAWS contractor. For each employer, the file contains a 4-digit production code from the North American Industry Classification System, number of workers, total quarterly payroll, and weeks worked. This information is used to help determine employer eligibility and to choose only crop agriculture employers. The NAWS contractor supplements the BLS information with data from other sources to create the grower sampling list.

Interviewing Process

Once the randomly selected employer is located, the interviewer determines if the employer is familiar with his/her work force. If not, the interviewer seeks the name of the packinghouse manager, personnel manager, farm labor contractor, or crew leader who can help construct a sampling frame of the workers in the operation. The interviewers have specific sampling instructions, designed by a sampling statistician, which they must follow.

Sample Size

The number of persons interviewed each fiscal year has fluctuated between 1,900 and 3,600. In FY 2004, 3,046 crop workers were interviewed. The target N for fiscal 2005 is 2,400.

Reliability Requirements

Probability sampling used in the design and implementation of the survey controls the sampling errors of the survey's estimates. Sampling error estimates are calculated from the survey data. The standard errors for proportions based on the total sample are less than five percent. Standard errors for major sub-domains of the population are larger but acceptable for this study.

Response Rates

As farm workers are often reticent, considerable effort is made to maximize both employer and worker response rates. Growers' associations, extension and social services, training institutions, as well as individual employers and farm workers are informed of the importance of the survey and the need for their voluntary cooperation. Respondents are provided a pledge of confidentiality and $20 for their participation.

Employer Response Rate

The sampling design (described above) involves obtaining a random selection of employers. In Fiscal Year 2003, 75 percent of the employers who were eligible, i.e., who employed crop workers when they were contacted by an interviewer, agreed to participate in the survey.

Worker Response Rate

As there are no universe lists of workers, the sampling frame of workers is constructed after contact with the employer. After creating the worker frame, a random sample of workers is chosen. Interviewers then approach workers directly to set up interview appointments in the worker’s home or another agreed-upon location. Approximately 90 percent of the approached workers agree to be interviewed.

Sample Weighting

Post-sampling weights are constructed from each worker's probability of inclusion, taking into account the year, season, and region in which the farm worker was sampled, as well as the number of days per week the farm worker was employed. Appendix A, "Post-Sampling Weights," describes how these weights are constructed.

Non-response Bias

The honoraria to cover any costs farm worker respondents incur as a result of participating have enabled the survey to achieve an overall usable response rate of at least 90 percent. This high level of response helps to minimize non-response bias.


APPENDIX A: POST-SAMPLING WEIGHTS

Introduction

The NAWS sample is drawn with probabilities-proportional-to-size (PPS) and is designed to be self-weighting. According to the sample design, each worker in a region has, in theory, an equal chance of selection on any given day. Data limitations, however, make this design difficult to achieve in practice. For example, there is no accurate measure of the number of workers at any given farm for the weeks of data collection. As such, post-sampling weights are required.

Other small deviations from the sampling plan make it necessary to implement post-sampling weights. These deviations include discrepancies between the number of interviews allocated and completed, and the unequal probabilities of finding part-time versus full-time workers (probabilities which can only be established after interviews are conducted). Post-sampling weights, therefore, are used in the NAWS to adjust the relative value of each interview so that national estimates can be obtained from the sample.

The weighting scheme is composed of five weights. The first weight (week weight) compensates for the probability of finding respondents who have workweeks of differing lengths. The next three weights (region, cycle, and year) adjust for the relative importance of a region, a sampling cycle, and a sampling year. The last adjustment, the season weight, accounts for the different probabilities of workers who work in more than one season/cycle.

The most important post-sampling adjustment is to scale each interview to represent a certain number of farm workers: the size adjustment. The size adjustment is done at the region level by the region weight, which uses measures of size obtained from the U.S. Department of Agriculture’s (USDA) Agricultural Labor Survey. The cycle and year weights serve slightly different roles. They allow different cycles and sampling years to be combined for statistical analysis. These weights are also based on USDA measures of size.

The PPS sampling utilized by the NAWS avoids the need for relatively large post-sampling weights. This procedure also reduces the impact of the weights on the variability of the final estimates.

Deciding on the Level of Adjustment for Size

Even a plan that is self-weighting in principle can utilize sampling weights to adjust for deviations from the plan. If, for example, ten interviews should be obtained at a farm but only two are done, those two interviews could be given five times the weight they would have received otherwise. Thus, each interview needs to be adjusted to represent a certain value in terms of size.

Instead of making this adjustment at the farm level, it could be made at any higher level in the sampling plan. In general, more reliable size information is obtained by raising the level at which adjustments are made. This is due to the statistical effect of averaging, greater year-to-year stability over larger geographic areas, and the absence or suppression of data due to confidentiality considerations. On the other hand, lower-level adjustments are more sensitive, if the information used for making the adjustments is reasonably accurate.

For two important reasons, the size adjustment in the NAWS is made at the region level. First, the region is the lowest level with enough interview coverage to calculate size adjustment weights [2]. Second, the NAWS uses measures of size provided by the USDA, which are reported by quarter and region [3]. By using USDA figures to make the size adjustment, the NAWS can adjust the statistics by season and region to make the results more reliable. The size adjustment is therefore made at the region level in the form of the region weight.

Adjustment for Days Worked per Week: the Week Weight

The week weight is the first of the five weights in the weighting scheme. It adjusts for the probability of finding part-time versus full-time farm workers. A part-time worker, who works only two or three days per week, is less likely to be found by the interviewing staff than one who works six days per week. Therefore, respondents are weighted in inverse proportion to the length of their workweek.

A conservative adjustment for the number of days worked is appropriate to avoid excessively large sampling weights. Field reports indicate that relatively few workers are contacted on Sundays, and a review of the interviews indicates that virtually no workers report Sunday hours without Saturday hours. Accordingly, workers reporting at least six workdays per week nearly always have a full chance of selection. Adjustments are therefore made only for workers with fewer than six days of work per week. This choice of six (rather than seven) days affects the weights by less than 17%, not an appreciable amount statistically.

The week weight is computed as follows. The number of days per week worked is obtained from the current farm task at the time of the interview (if two tasks are reported, the one with more days per week is used). Seven-day workweeks are truncated to six, as explained previously. The week weight is defined as: WTWEEK = 6 / (length of the workweek).

For the few workers not reporting the number of days, WTWEEK is assigned a default value of 1.
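The week-weight rule can be written out as a short sketch. The function name and the use of `None` for a missing response are illustrative assumptions; the truncation at six days, the formula 6 / (length of the workweek), and the default of 1 come from the description above.

```python
def week_weight(days_per_week=None):
    """Return the week weight for a worker's reported days per week."""
    if days_per_week is None:
        # Workers who do not report the number of days get the default weight.
        return 1.0
    if days_per_week < 1:
        raise ValueError("days_per_week must be at least 1")
    # Seven-day workweeks are truncated to six.
    days = min(days_per_week, 6)
    return 6 / days


for days in (2, 3, 6, 7, None):
    print(days, week_weight(days))
```

A worker reporting three days per week receives twice the weight of a six-day worker, while six-day and seven-day workers are treated alike.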

Size Adjustment at the Region Level: the Region Weight

The region weight, which is based on USDA measures of regional farm employment activity, adjusts for the number of interviews collected in a region relative to the region’s importance [4]. If the number of interviews collected is small compared to the region’s importance, then a weight greater than one is assigned to each interview, and vice versa. This adjustment ensures that the sample is nationally representative.

Correspondence Between the USDA Data and the NAWS Sampling Cycles

The calculation of the region weight relies on two pieces of information: the USDA regional measures of size and the number of interviews completed in each region. The first step in the process of calculating the region weight is to apportion the USDA quarterly size figures among the NAWS sampling cycles.

While the USDA figures are reported quarterly, the NAWS sampling year covers a 12-month non-overlapping period from September to August, divided into three cycles. Accordingly, it is necessary to adjust the USDA figures to fit the NAWS sampling frame by apportioning the four quarters into three cycles.

For example, the number of farm workers in the NAWS fall cycle for a region is assumed to be the total number of workers for that region in USDA quarter 1 of the current fiscal year (FYc) plus one-third the number of workers for that region in USDA quarter 2 of the next fiscal year (FYp). The algorithm for the winter, spring, and summer cycles is interpreted similarly.
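The fall-cycle example can be expressed as a weighted sum of quarterly figures. In the sketch below, only the fall shares (all of quarter 1 plus one-third of quarter 2) come from the text; the function name, the dictionary layout, and the worker counts are hypothetical, and the caller is assumed to supply each quarter's figure from the appropriate fiscal year. Shares for the other cycles are not stated in the text and are deliberately left out.

```python
from fractions import Fraction


def cycle_size(quarter_sizes, shares):
    """Apportion quarterly worker counts to one cycle using per-quarter shares."""
    return sum(quarter_sizes[q] * share for q, share in shares.items())


# Fall cycle, as described in the text: all of Q1 plus one-third of Q2.
FALL_SHARES = {"Q1": Fraction(1), "Q2": Fraction(1, 3)}

# Hypothetical quarterly worker counts for one region (not USDA data).
region_quarters = {"Q1": 90_000, "Q2": 60_000}
print(cycle_size(region_quarters, FALL_SHARES))
```

Using `Fraction` keeps the one-third share exact, so the apportioned figures are not affected by floating-point rounding.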

Grouping NAWS Regions According to Interview Coverage

The second step in calculating the region weight is to determine the appropriate NAWS region grouping based on interview coverage. The 12 NAWS regions cannot always be weighted separately since there is insufficient interview coverage for some regions in some of the sampling cycles, particularly in the winter [5]. It is therefore necessary to combine regions in case of non-sampling or when few interviews are allocated.

When a region has no interviews in a sampling cycle it is automatically combined with a sampled region that has similar agricultural characteristics in that season. This is necessary since the region with no interviews has a certain amount of USDA-reported size, which has to be allocated within the sample. Once the two regions are combined, the interviews in the sampled region represent the sum of the size in both regions. This process can be extended to any number of regions, provided they are agriculturally similar in the given cycle.

In addition to non-sampled regions, some regions have low interview coverage in certain cycles. As it would not be sensible to apportion the entire size for such a region among a handful of interviews [6], low-coverage regions are combined with agriculturally similar regions [7].

Once the USDA figures are allocated to the appropriate NAWS regions (or combinations thereof), the region weight is calculated [8]. The region weight is essentially the size of a region divided by the total number of interviews in that region. Thus, if the number of interviews increases for the same size, the region weight decreases. The region weight is attached to all interviews in the region.
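A minimal sketch of the region-weight calculation, including the pooling of combined regions, might look as follows. The function name, the grouping chosen, and all size and interview figures are hypothetical; only the rule itself (pooled size divided by pooled interviews, attached to every interview in the group) is taken from the text.

```python
def region_weights(usda_size, interviews, groups):
    """Compute one weight per region; regions in the same group are pooled."""
    weights = {}
    for group in groups:
        # Size and interviews are summed across the combined regions.
        size = sum(usda_size[r] for r in group)
        n = sum(interviews.get(r, 0) for r in group)
        if n == 0:
            raise ValueError(f"group {group} has no interviews; combine it further")
        for r in group:
            weights[r] = size / n
    return weights


# Hypothetical cycle-level sizes and completed interviews (not NAWS data).
usda_size = {"CA": 300_000, "PC": 60_000, "NE1": 20_000, "NE2": 10_000}
interviews = {"CA": 600, "PC": 100, "NE1": 0, "NE2": 50}
groups = [("CA",), ("PC",), ("NE1", "NE2")]
print(region_weights(usda_size, interviews, groups))
```

In this example the unsampled region's size is carried by the interviews collected in the region it is combined with, so no USDA-reported size is lost from the sample.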

Combining Different Sampling Cycles: the Cycle Weight

The NAWS combines data from the different sampling cycles (seasons) within the same sampling year in order to generate more observations for statistical analysis. To combine cycles it is necessary to adjust for the number of farm workdays represented in each cycle in relation to the number of interviews collected in the cycle. For instance, if the same number of interviews are obtained in each of the three cycles in fiscal year 2003, but the USDA reports more workers for the fall and spring/summer cycles compared to the winter cycle, then the interviews in the fall and spring/summer should be worth relatively more in terms of size than the interviews conducted in the winter cycle. Accordingly, the interviews in the winter are down-weighted in relation to the interviews in the other seasons (cycles) before the cycles are combined.

The cycle weight is calculated similarly to the region weight, but at the cycle- rather than region-level: the sum of the USDA size for a cycle is divided by the number of interviews in that cycle.
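The effect of the cycle weight on a pooled estimate can be seen in a small sketch. It assumes equal interview counts in each cycle, as in the example above; the function names, cycle sizes, and attribute counts are invented for illustration.

```python
def cycle_weights(cycle_size, cycle_interviews):
    """Cycle weight: USDA size for the cycle divided by interviews in the cycle."""
    return {c: cycle_size[c] / cycle_interviews[c] for c in cycle_size}


def pooled_share(weights, interviews, positives):
    """Weighted share of respondents with some attribute across all cycles."""
    numerator = sum(weights[c] * positives[c] for c in weights)
    denominator = sum(weights[c] * interviews[c] for c in weights)
    return numerator / denominator


# Hypothetical figures: equal interviews, fewer workers in the winter cycle.
size = {"fall": 900_000, "winter": 500_000, "spring/summer": 1_100_000}
interviews = {"fall": 1000, "winter": 1000, "spring/summer": 1000}
positives = {"fall": 200, "winter": 500, "spring/summer": 300}

weights = cycle_weights(size, interviews)
print(weights)
print(pooled_share(weights, interviews, positives))
```

Here the unweighted share would be one-third, but the weighted share is lower because the winter interviews, where the attribute is most common, count for less.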

Combining Different Sampling Years: the Year Weight

The year weight allows different sampling years to be combined for statistical analysis. It follows the same rationale as the cycle weight, but at the sampling-year level. If the same number of interviews is collected in each sampling year, interviews taking place in years with more farm work activity are weighted more heavily in the combined sample.

Combining sampling years must also take into account budgetary effects on sample size across sampling years. For example, if an increase in the NAWS budget tripled the number of interviews obtained in a sampling year, and two sampling years were joined without any adjustment, the larger sampling year would have an unduly large effect on the results.

To avoid this, the year weight is calculated as the ratio of the total number of farm workers in a sampling year to the number of interviews obtained in that sampling year.

The Problem of Double Counting Workers: the Season Weight

The calculation of worker-based weights is complicated by the fact that workers have the potential of being sampled several times a year. While the USDA reports the number of farm workers employed each season, the same worker could be reported in multiple seasons. Because of this repetition of workers across seasons, it is impossible to derive the total number of persons working in agriculture during the year. The only information available to avoid double-counting farm workers comes from the NAWS itself. Although NAWS employment information is not available for each interviewee for each cycle during the year, one-year retrospective employment histories are obtained from each respondent.

In constructing the seasonal weight, the following two assumptions are imposed: 1) workers who worked in a previous season will also work in the next corresponding season (for example, a worker sampled in spring 2003 who reported having worked the previous summer (2002) is assumed to work in the following summer (2003)) [9], and 2) a worker is equally likely to be sampled in each season worked.

The second assumption is based on the relationship between the amount of farm work done by the worker in each season and the number of interviews obtained in that region for the season. Recall that the NAWS interview allocation is proportional to seasonal agricultural payroll in the counties. Thus, the probability of being sampled is related to the amount of work performed by individual workers.

With these simplifying assumptions, it is possible to calculate a seasonal weight, which is simply the inverse of the number of seasons the interviewee did farm work during the previous year.

The season weight is similar to the week weight in the sense that respondents who spend more time (seasons) working in agriculture have a greater chance of being sampled. Hence, to account for unequal sampling probability, the weighting has to be inversely proportional to the number of seasons worked. As previously noted, there are only three NAWS seasons (cycles) per year and, by definition, an interviewee always performs farm work during the trimester in which he or she is sampled. Furthermore, the one-year retrospective employment history reveals in which of the two previous trimesters the respondent did farm work. If the interviewee did not work in either of the two previous trimesters, the seasonal weight is 1/1 or 1.00. If the interviewee worked during one of the two prior trimesters, the seasonal weight is 1/2 or 0.50. Finally, if the interviewee worked in both of the prior trimesters, the seasonal weight is 1/3 or 0.33.
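The three cases above reduce to a single rule, sketched below. The function and parameter names are illustrative assumptions; the values follow the text.

```python
def season_weight(worked_prior_1, worked_prior_2):
    """Inverse of the number of trimesters with farm work in the past year.

    The trimester of the interview always counts as worked; the two boolean
    arguments say whether the respondent worked in each of the two previous
    trimesters.
    """
    seasons_worked = 1 + int(worked_prior_1) + int(worked_prior_2)
    return 1 / seasons_worked


print(season_weight(False, False))
print(season_weight(True, False))
print(season_weight(True, True))
```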

Once the individual components are calculated, composite weights are calculated as the product of the week, season, and region weights. The cycle and year weights are also factored into the composite weights when multiple cycles or sampling years are used. The composite weights are adjusted so that the sum of the weights is equal to the total number of interviews at the next level of aggregation. These adjusted composite weights based on farm workers can then be used for calculating the estimated proportion of workers with various attributes. The composite weight (PWTYCRD) is used for almost all NAWS analysis. It is included in the public access dataset.
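The composition step can be sketched as follows for a single cycle. This is an illustration under simplifying assumptions, not the procedure used to produce PWTYCRD: the record layout and the figures are hypothetical, the cycle and year weights are omitted, and the adjustment is applied across one flat list of interviews rather than at a specific level of aggregation.

```python
def composite_weights(records):
    """Multiply the component weights, then rescale to sum to the interview count."""
    raw = [r["week"] * r["season"] * r["region"] for r in records]
    # Rescale so that the adjusted weights sum to the number of interviews.
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical component weights for three interviews (not NAWS data).
records = [
    {"week": 1.0, "season": 1.0, "region": 500.0},
    {"week": 2.0, "season": 0.5, "region": 500.0},
    {"week": 1.0, "season": 0.5, "region": 600.0},
]
weights = composite_weights(records)
print(weights)
print(sum(weights))
```

Because proportions depend only on the relative size of the weights, rescaling them does not change estimated shares of workers with a given attribute.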


Footnotes

[1] Each quarter, USDA NASS provides DOL a special tabulation on the number of hired farm workers for each NAWS sampling region. 

[2] Most of the 12 NAWS regions are visited in every cycle.  If a NAWS region is not sampled in a cycle, or if there are too few interviews, the region can be combined with adjacent regions for weighting purposes.

[3] The USDA is the only source of quarterly statistics on levels of farm worker employment.   The Census of Agriculture, for instance, reports data annually rather than quarterly, and the statistics are only published every five years. 

[4] The USDA figures are reported by region and quarter, which allows the weight to be sensitive to seasonal fluctuations in regional importance. 

[5] The NAWS allocates interviews at the FLA level, and only a subset of the FLAs is chosen for sampling in each cycle.  This process makes the sampling sensitive to seasonal fluctuations in size and doesn’t ensure that every NAWS region is represented in every sampling cycle or that there is ample interview coverage in the regions that do enter the sampling.

[6] This would place too much importance on a small number of interviews, possibly skewing the national results. 

[7] Most of the regional combination occurs during winter cycles, when there is very little agricultural activity in the Northeast and Midwest regions, and in some Southern states.  In the winter cycles, the regions AP12, CBLK, DLSE, MN12, and NE12 are generally combined into the Rest of Country (ROC) region.

[8] When regions are combined, the USDA agricultural size is added across the combined regions and apportioned among the sum of the interviews in those regions.

[9] For some purposes, including the calculation of changes in work histories from year to year, this assumption cannot be used.  However, for purposes such as obtaining demographic descriptions of the worker population, this assumption provides satisfactory estimates.



 
Created: July 01, 2005