U.S. Department of Agriculture National Agricultural Statistics Service Research and Development Division
of the Director
DEVELOP AND EVALUATE A GENERALIZED AUTOMATED EDIT AND IMPUTATION SYSTEM FOR SURVEY USE
An automated edit and imputation system would resolve obvious data inconsistencies without statistician intervention, leaving only those inconsistencies that computer edit actions cannot resolve for statisticians to review. This approach has the potential to reduce the editing burden during tight survey periods. However, it assumes that the computer's resolution of the inconsistencies is reasonable.
In 1997, Research staff examined the effectiveness of the Structured Program for Economic Editing and Referrals (SPEER) system, developed by the Bureau of the Census, using Agricultural Survey data. While the system had many advantages, there were some drawbacks to using it in NASS. Most prominently, the edit specifications were limited to ratio edits and simple balance edits; more complex edits needed to be handled by another system. Imputation options were also somewhat limited, although other options could have been programmed.
The methodology developed by Statistics Canada does not impose these restrictions on edit formulations; its only restriction is that the edits be linear. It also allows more imputation options for variables within a single survey. However, use of Statistics Canada's system, the Generalized Edit and Imputation System (GEIS), depends on an Oracle database, which NASS does not use.
During 1998, Todd Todaro, of the Research Division staff, developed a system in SAS that uses the methodology developed by Statistics Canada. This system, called AGGIES for the AGricultural Generalized Imputation and Edit System, includes modules for edit specification, checking the consistency of the edit specifications, editing data and determining the minimum number of values to change in order to pass all the edits (error localization), item imputation specification, item imputation, and outlier detection to flag or exclude unusual values in imputation routines. AGGIES was evaluated and the results were encouraging.
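The idea of linear edits and error localization can be illustrated with a small sketch. The code below is not AGGIES or GEIS code, and it does not reproduce the algorithm those systems use. It is a brute-force illustration in Python, written under stated assumptions: the field names (`total`, `corn`, `soy`, `corn_prod`), the yield bounds of 50 and 250, and the function names are all hypothetical. Each edit is written as a linear inequality, and the search finds the smallest sets of fields that could be changed so that every edit can pass.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical fields and edits, for illustration only.
FIELDS = ["total", "corn", "soy", "corn_prod"]

# Each edit is (coefficients in FIELDS order, bound) and means
# sum(coefficient * value) <= bound.
EDITS = [
    ([-1, 1, 1, 0], 0),    # balance edit, part 1: corn + soy <= total
    ([1, -1, -1, 0], 0),   # balance edit, part 2: total <= corn + soy
    ([0, 50, 0, -1], 0),   # ratio edit: corn_prod / corn >= 50
    ([0, -250, 0, 1], 0),  # ratio edit: corn_prod / corn <= 250
    ([-1, 0, 0, 0], 0),    # non-negativity of each field
    ([0, -1, 0, 0], 0),
    ([0, 0, -1, 0], 0),
    ([0, 0, 0, -1], 0),
]


def feasible(rows):
    """Return True if some real vector x satisfies every row (coeffs . x <= bound).

    Uses Fourier-Motzkin elimination, which is exact but only practical
    for small systems such as this example.
    """
    rows = [([Fraction(c) for c in coeffs], Fraction(b)) for coeffs, b in rows]
    nvars = len(rows[0][0]) if rows else 0
    for j in range(nvars):
        pos = [r for r in rows if r[0][j] > 0]
        neg = [r for r in rows if r[0][j] < 0]
        kept = [r for r in rows if r[0][j] == 0]
        for pc, pb in pos:
            for nc, nb in neg:
                # Positive multiples chosen so the coefficient on x_j cancels.
                coeffs = [(-nc[j]) * a + pc[j] * c for a, c in zip(pc, nc)]
                kept.append((coeffs, (-nc[j]) * pb + pc[j] * nb))
        rows = kept
    # All variables are eliminated; each remaining row reads 0 <= bound.
    return all(bound >= 0 for _, bound in rows)


def localize(record):
    """Return every smallest set of fields whose change lets all edits pass."""
    values = [record[f] for f in FIELDS]
    for k in range(len(FIELDS) + 1):
        solutions = []
        for free in combinations(range(len(FIELDS)), k):
            reduced = []
            for coeffs, bound in EDITS:
                # Move the contribution of the fields kept fixed to the bound.
                fixed = sum(c * v for i, (c, v) in enumerate(zip(coeffs, values))
                            if i not in free)
                reduced.append(([coeffs[i] for i in free], bound - fixed))
            if feasible(reduced):
                solutions.append(tuple(FIELDS[i] for i in free))
        if solutions:
            return solutions
    return []


# A record whose production value fails the ratio edit.
bad_ratio = {"total": 100, "corn": 60, "soy": 40, "corn_prod": 600000}
print(localize(bad_ratio))

# A record that fails the balance edit; more than one single-field fix exists.
bad_balance = {"total": 100, "corn": 60, "soy": 50, "corn_prod": 9000}
print(localize(bad_balance))
```

A record that already passes every edit returns a single empty set, meaning no field needs to change. When several minimal sets exist, as in the second record, a production system needs an additional rule (for example, field reliability weights) to choose among them; that choice is outside the scope of this sketch.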
The research report describing the system and the results was distributed in January 1999. Additional evaluations for other States, commodities, and surveys/censuses are being conducted. In addition, enhancements and modifications to the system are being made. For example, a screen to interactively review and override the changes made by AGGIES is being tested, as is integration with other SAS-based macro-editing modules. Another project is to develop a donor imputation module, which would potentially better preserve the original distribution of the data.
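To make the donor imputation idea concrete, the following is a minimal sketch of nearest-neighbor donor imputation in Python. It is an assumption about how such a module might work, not a description of the planned AGGIES module. The function name, the field names, and the choice of squared distance on unscaled matching fields are all hypothetical. Missing values in a recipient record are copied from the complete record (the donor) that is closest on the fields both records report. Because imputed values come from actual reported records, they tend to follow the observed distribution more closely than a mean or ratio estimate would.

```python
def nearest_donor_impute(recipient, donors, match_fields):
    """Fill the recipient's missing (None) fields from its nearest donor.

    Distance is the sum of squared differences over match_fields; this
    assumes those fields are on comparable scales (a simplification).
    """
    missing = [field for field, value in recipient.items() if value is None]

    def distance(donor):
        return sum((donor[f] - recipient[f]) ** 2 for f in match_fields)

    donor = min(donors, key=distance)
    filled = dict(recipient)
    for field in missing:
        filled[field] = donor[field]  # copy the donor's reported value
    return filled


# Hypothetical complete records that passed all edits.
donors = [
    {"total": 100, "corn": 60},
    {"total": 200, "corn": 150},
    {"total": 520, "corn": 300},
]
recipient = {"total": 190, "corn": None}
print(nearest_donor_impute(recipient, donors, match_fields=["total"]))
```

A copied donor value is not guaranteed to satisfy the recipient's edits, so in practice the filled record would be re-edited, or donors would be restricted to those whose values keep the recipient consistent.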