The three papers described below were commissioned in 2004 from Arthur Chapman by the GBIF DIGIT programme to address concerns about data quality and use. Arthur Chapman has spent over 20 years working in a number of capacities for the Australian Government, mainly in data management, information analysis, environmental modeling, development of environmental decision support systems and information presentation. He has been a member of a number of international organizations and committees involved with the use of biodiversity data. (For more details see: http://www.anbg.gov.au/people/chapman.arthur.html).
Principles of Data Quality
Data quality and errors in data are often neglected issues with environmental databases, modeling systems, GIS, decision support systems, etc. Too often, data are used uncritically without consideration of the errors contained within, and this can lead to erroneous results, misleading information, unwise environmental decisions and increased costs. The rapid increase in the exchange and availability of taxonomic and species-occurrence data has now made the consideration of the principles concerning data quality an important agenda item as users of the data begin to require more and more detail on the quality of this information. This paper expands on these issues and discusses a number of principles of data quality that should become core to the business of the natural history collections and observational communities as they release their data to the broader community.
Principles and Methods of Data Cleaning - Primary Species and Species-Occurrence Data
This document examines methods for preventing, detecting and cleaning errors in primary biological collections databases. It first sets out a set of simple principles that should be followed in any data cleaning exercise. It then discusses guidelines, methodologies and tools that can assist the natural history collections community and the observational communities to follow best practice in digitizing, documenting and validating information.
Uses of Primary Species-Occurrence Data
This paper examines uses for primary species-occurrence data in research, education and other areas of human endeavor, and provides examples from the literature of many of these uses. The paper examines not only data from labels or from observational notes, but also the data inherent in museum and herbarium collections themselves, which are long-term storage receptacles of information and data that are still largely untouched. Projects include the study of species and their distributions through both time and space, their use for education, both formal and public, for conservation and scientific research, use in medicine and forensic studies, in natural resource management and climate change, in art, history and recreation, and for social and political use.
It is recognized that our understanding of these topics, and the tools available for facilitating error checking and cleaning, are rapidly evolving. As a result, GBIF sees these papers as interim discussions of the issues as they stood in 2004. We expect there will be future versions of these documents and would appreciate input from the data provider and user communities. Comments and suggestions can be submitted to:
Larry Speers
Senior Programme Officer
Digitization of Natural History Collections
Global Biodiversity Information Facility
Universitetsparken 15
2100 Copenhagen Ø
Denmark
E-mail: lspeers@gbif.org
and
Arthur Chapman
Australian Biodiversity Information Services
PO Box 7491, Toowoomba South
Queensland 4352
Australia
E-mail: papers.digit@gbif.org