U.S. Department of Agriculture
National Agricultural Statistics Service
Research and Development Division


DEVELOP AND EVALUATE A GENERALIZED AUTOMATED EDIT AND IMPUTATION SYSTEM FOR SURVEY USE 

    The National Agricultural Statistics Service (NASS) conducts a wide variety of agricultural surveys and censuses. The surveys typically run on a rigid schedule. For example, data collection begins around the first of the month; editing (correcting nonsampling errors to the extent possible) and imputation must be nearly complete within the following two weeks; and the results are published near the end of the month. Timeliness is therefore an important attribute of data quality. New and innovative procedures are sought to make the edit and imputation process more efficient while maintaining its timeliness.

    An automated edit and imputation system would resolve obvious data inconsistencies without statistician intervention, leaving only those that computer edit actions cannot resolve for statisticians to review. This approach has the potential to reduce the editing burden during tight survey periods. However, it assumes that the computer's resolution of the inconsistencies is reasonable.

    In 1997, Research staff examined the effectiveness of the Structured Program for Economic Editing and Referrals (SPEER) system, developed by the Bureau of the Census, using Agricultural Survey data. While the system had many advantages, it had some drawbacks for use in NASS. Most prominently, edit specifications were limited to ratio edits and simple balance edits; more complex edits had to be handled by another system. Imputation options were also somewhat limited, although other options could have been programmed.
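    To make the two edit types concrete, the following is a minimal Python sketch of a ratio edit and a simple balance edit. It is an illustration only, not SPEER code: the function names, field names, and bounds (a yield between 50 and 250 bushels per acre) are all hypothetical.

```python
def ratio_edit(numerator, denominator, lower, upper):
    """Pass when lower <= numerator / denominator <= upper.

    A non-positive denominator is treated as a failure here; that rule
    is an assumption made for this sketch.
    """
    if denominator <= 0:
        return False
    return lower <= numerator / denominator <= upper


def balance_edit(parts, total, tolerance=0.0):
    """Pass when the parts add up to the reported total (within tolerance)."""
    return abs(sum(parts) - total) <= tolerance


# Hypothetical record: yield is 13500 / 90 = 150 bushels per acre.
yield_ok = ratio_edit(13500.0, 90.0, lower=50.0, upper=250.0)

# Same record with a misplaced decimal: 135000 / 90 = 1500 bushels per acre.
yield_bad = ratio_edit(135000.0, 90.0, lower=50.0, upper=250.0)

# Corn and soybean acres should add up to total crop acres.
acres_ok = balance_edit([60.0, 30.0], total=90.0)
acres_bad = balance_edit([60.0, 30.0], total=100.0)
```

    A record that fails either check would be flagged for correction or referral.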

    The methodology developed by Statistics Canada does not impose these restrictions on edit formulations; its only restriction is that the edits be linear. It also allows more imputation options for variables within a single survey. However, Statistics Canada's system (GEIS, the Generalized Edit and Imputation System) depends on an Oracle database, which NASS does not use.

    During 1998, Todd Todaro of the Research Division staff developed a system in SAS that uses the methodology developed by Statistics Canada. This system, called AGGIES (AGricultural Generalized Imputation and Edit System), includes modules for edit specification; checking the consistency of the edit specifications; editing data and determining the minimum number of values to change in order to pass all the edits (error localization); item imputation specifications; item imputation; and outlier detection, to flag or exclude unusual values in imputation routines. AGGIES was evaluated and the results were encouraging. The research report describing the system and the results was distributed in January 1999.
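    The error localization step can be illustrated with a small brute-force Python sketch. It is not the AGGIES algorithm: it simply tries ever larger sets of fields and, for each set, searches a finite grid of candidate values (an assumption made to keep the example short; a production system would use a linear programming or similar solver instead). Edits use a hypothetical coefficients-and-bound form, and all names and values are illustrative.

```python
from itertools import combinations, product


def passes(record, edits):
    """True when every linear edit sum(coef * field) <= bound holds."""
    return all(
        sum(c * record[f] for f, c in coefs.items()) <= bound
        for coefs, bound in edits
    )


def localize(record, edits, candidates):
    """Return a smallest set of fields whose values can be changed so that
    the record passes all edits, or None if the grid offers no fix."""
    fields = list(record)
    for k in range(len(fields) + 1):          # smallest sets first
        for subset in combinations(fields, k):
            for values in product(candidates, repeat=k):
                trial = dict(record)
                trial.update(zip(subset, values))
                if passes(trial, edits):
                    return set(subset)
    return None


edits = [
    ({"planted": 1, "idle": 1, "cropland": -1}, 0),   # planted + idle <= cropland
    ({"planted": -1, "idle": -1, "cropland": 1}, 0),  # cropland <= planted + idle
    ({"harvested": 1, "planted": -1}, 0),             # harvested <= planted
    ({"planted": -1}, 0), ({"idle": -1}, 0),          # non-negativity
    ({"cropland": -1}, 0), ({"harvested": -1}, 0),
]

record = {"planted": 100, "idle": 20, "cropland": 120, "harvested": 1000}
to_change = localize(record, edits, candidates=range(0, 2001, 10))
```

    Here only the harvested value needs to change: raising the planted value instead would break the balance between planted, idle, and cropland acres, so no other single field resolves the failure.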

    Additional evaluations for other States, commodities, and surveys/censuses are being conducted. In addition, enhancements and modifications to the system are being made. For example, a screen to interactively review and override the changes made by AGGIES is being tested, as is integration with other SAS-based macro-editing modules. Another project is to develop a donor imputation module, which could better preserve the original distribution of the data.
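    As a rough illustration of donor imputation, the Python sketch below fills a missing value by copying it from the most similar complete record. The nearest-neighbor rule, the matching field, and all names are assumptions made for this example; the text does not say which donor method is planned.

```python
def donor_impute(recipient, donors, match_fields, target):
    """Copy `target` from the donor closest to the recipient.

    Closeness is the squared Euclidean distance over `match_fields`,
    a simple choice made for this sketch.
    """
    def distance(donor):
        return sum((donor[f] - recipient[f]) ** 2 for f in match_fields)

    nearest = min(donors, key=distance)
    filled = dict(recipient)
    filled[target] = nearest[target]   # an actually reported value
    return filled


# Hypothetical complete records that passed all edits.
donors = [
    {"cropland": 100.0, "harvested": 80.0},
    {"cropland": 500.0, "harvested": 450.0},
    {"cropland": 900.0, "harvested": 700.0},
]

# Hypothetical record with a missing harvested value.
recipient = {"cropland": 120.0, "harvested": None}

filled = donor_impute(recipient, donors, ["cropland"], "harvested")
```

    Because the imputed value is one that some respondent actually reported, donor methods tend to keep imputed values within the observed range rather than concentrating them at a mean.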