Introduction

At the request of several data providers, we have prepared guidance on information management practices that the ORNL DAAC requires before it can archive and distribute data.

During Data Preparation

The ORNL DAAC offers the following Best Practices that investigators should follow to improve the usability of their data.

  1. Assign Descriptive File Names
  2. Use Consistent and Stable File Formats
  3. Define the Parameters
  4. Use Consistent Data Organization
  5. Perform Basic Quality Assurance
  6. Assign Descriptive Data Titles
  7. Provide Documentation

Metadata

To archive and distribute data sets, we need metadata, which is information about the data we distribute. Metadata serves two purposes: it describes the data so that others can understand what they represent, and it helps users find data of interest. Metadata can take the form of a document or a specially formatted list of the parameters, keywords, spatial and temporal extent, investigators, and other information about the data set. Please contact the DAAC by e-mail for assistance in preparing and formatting metadata.
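As a rough illustration of what a "specially formatted list" might look like, the sketch below builds a metadata record in Python and checks that the items named above are present. The field names, the `missing_fields` helper, and all values are hypothetical placeholders chosen for this example; they are not an official ORNL DAAC schema.

```python
import json

# Hypothetical field names based on the items listed in the text above;
# this is NOT an official ORNL DAAC metadata schema.
REQUIRED_FIELDS = (
    "title",
    "investigators",
    "parameters",
    "keywords",
    "spatial_extent",
    "temporal_extent",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Illustrative placeholder values only.
example_record = {
    "title": "Example soil temperature measurements",
    "investigators": ["A. Researcher"],
    "parameters": [
        {
            "name": "soil_temp",
            "units": "degrees C",
            "description": "Soil temperature at 10 cm depth",
        }
    ],
    "keywords": ["soil", "temperature"],
    "spatial_extent": {"west": -84.5, "east": -84.0, "south": 35.8, "north": 36.1},
    "temporal_extent": {"start": "2001-01-01", "end": "2001-12-31"},
}

if __name__ == "__main__":
    # An empty list means every required field has a value.
    print(missing_fields(example_record))
    # A plain-text rendering that could accompany the data files.
    print(json.dumps(example_record, indent=2))
```

A check like this is only a convenience for catching omissions early; the DAAC can advise on the actual content and format it expects.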

The metadata accompanying your data should be written for a user 20 years into the future -- what does that person need to know to use your data properly? Prepare the metadata for a user who is unfamiliar with your project, methods, or observations.

A small amount of time invested in documenting your data will save money in the future. Data producers and users cannot afford to be without documented data. The initial expense of documenting data is clearly outweighed by the potential costs of duplicated or redundant data generation.

See the Best Practices section on Provide Documentation for a description of the metadata that should be in the data set documentation.

Submitting Data to the ORNL DAAC

All of the holdings at the ORNL DAAC are organized into what are called "data sets," a term used loosely to include all data archived at the DAAC. A data set includes all the information associated with a single research effort (typically the same investigator(s), same methods, possibly several sites or years). Data set components consist of the data and metadata stored in digital files. These files contain tabular data, spatial data, or companion information.

Tabular data sets present and store your research results in list or spreadsheet style. These data will be stored in our archives as American Standard Code for Information Interchange (ASCII) text in your original column/row arrangement. Using ASCII file formats will ensure that your data are readable in the future.
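The sketch below shows one way to write a small table as plain ASCII text and confirm that no non-ASCII characters slipped in. It assumes a comma-delimited layout, which the text above does not prescribe; the `to_ascii_csv` helper, the column names, and the values are hypothetical.

```python
import csv
import io


def to_ascii_csv(header, rows):
    """Serialize a table as comma-delimited text and confirm it is pure ASCII."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    text = buffer.getvalue()
    # Raises UnicodeEncodeError if any character falls outside ASCII,
    # for example a degree sign typed into a units column.
    text.encode("ascii")
    return text


# Illustrative placeholder data only.
header = ["site", "date", "soil_temp_degC"]
rows = [
    ["A1", "2001-06-01", 18.4],
    ["A1", "2001-06-02", 19.1],
]

if __name__ == "__main__":
    print(to_ascii_csv(header, rows), end="")
```

Spelling units out in ASCII (here `degC` rather than a degree symbol) keeps the file readable by any text tool, which is the point of the recommendation above.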

Spatial data sets present and store your research results as images or Geographic Information System (GIS) files. We don't offer any general recommendations about vector or raster data formats, except to say that the format needs to be clearly documented. These data sets may be stored in our archives as binary, netCDF, HDF-EOS, or ASCII files.

Companion information is any additional information pertaining to your research. This information includes definitions of the parameters, quality control or quality assurance steps you have taken, instructions on how to use your findings, descriptions of your findings, caveats and errata, or any other information you want to include. The companion information will be stored in our archives as ASCII text, PDF, HTML, or JPEG files.

If you are ready to start preparing data sets to archive, please contact the DAAC for assistance.


Cartoon

Courtesy of American Scientist (Vol. 886, p. 525)