Earth Science Data Terminology and Formats

The DAAC Alliance data centers process, archive, and distribute data sets, or groups of data sets, derived from EOS instruments and other Earth science measurement systems. The sections below define the two classes of data products (standard and special), describe the processing levels applied to standard products, and summarize the principal data formats used for distribution.

Data Products and Types

The data centers process, archive, and distribute EOSDIS data products. The products are data sets, or groups of data sets, derived from EOS instruments and other ESE Earth science measurement systems. They can be either standard data products (SDPs) or special data products.

Standard Data Products (SDPs)

Data products are considered to be SDPs if they are:

  • Generated as part of a research investigation using EOS data.
  • Recognized to have wide research utility.
  • Generated routinely.
  • Produced for spatially and/or temporally extensive sets of data.

SDPs are produced at the DAACs or by Science Investigator-led Processing Systems (SIPSs). These products are formally defined in EOSDIS requirements documentation.

Special Data Products

Data products are considered to be special data products if they are:

  • Generated as part of a research investigation using EOS data.
  • Produced for a limited region or time period.
  • Not accepted as standard by the EOS Investigators Working Group (IWG) and NASA Headquarters.
  • Referred to as "special data products" to distinguish them from other nonstandard products such as ancillary data sets.

Special data products are normally generated at the investigators' Scientific Computing Facilities (SCFs).

Data Processing Levels for Standard Data Products

EOSDIS SDPs are processed at various levels ranging from Level 0 to Level 4. Level 0 products are raw data at full instrument resolution. At higher levels, the data are converted into more useful parameters and formats. All EOS instruments must have Level 1 SDPs. Most have products at Levels 2 and 3, and some have products at Level 4.

The data processing levels described in the table below and referenced in the following sections are identical to the EOSDIS Data Panel's definitions and are consistent with the Committee on Data Management, Archiving, and Computing (CODMAC) definitions.


Level 0: Reconstructed, unprocessed instrument and payload data at full resolution, with any and all communications artifacts (e.g., synchronization frames, communications headers, duplicate data) removed. (In most cases, the EOS Data and Operations System (EDOS) provides these data to the DAACs as production data sets for processing by the Science Data Processing Segment (SDPS) or by a SIPS to produce higher level products.)

Level 1A: Reconstructed, unprocessed instrument data at full resolution, time-referenced, and annotated with ancillary information, including radiometric and geometric calibration coefficients and georeferencing parameters (e.g., platform ephemeris) computed and appended but not applied to the Level 0 data.

Level 1B: Level 1A data that have been processed to sensor units (not all instruments have Level 1B data).

Level 2: Derived geophysical variables at the same resolution and location as Level 1 source data.

Level 3: Variables mapped on uniform space-time grid scales, usually with some completeness and consistency.

Level 4: Model output or results from analyses of lower level data (e.g., variables derived from multiple measurements).
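The step from Level 2 to Level 3 can be illustrated with a small sketch (all values hypothetical; NumPy is assumed to be available): scattered swath retrievals at their native locations are averaged onto a uniform 1-degree grid.

```python
import numpy as np

# Hypothetical Level 2 swath data: scattered (lat, lon, value) retrievals.
lats = np.array([10.2, 10.7, 11.1, 10.4])
lons = np.array([20.1, 20.6, 20.9, 21.3])
vals = np.array([290.0, 291.0, 289.5, 292.0])  # e.g., brightness temperature (K)

# Map onto a uniform 1-degree grid (Level 3 style) by averaging per cell.
grid_sum = np.zeros((2, 2))
grid_cnt = np.zeros((2, 2))
i = (lats - 10).astype(int)  # row index within the 10-12 degree lat band
j = (lons - 20).astype(int)  # column index within the 20-22 degree lon band
np.add.at(grid_sum, (i, j), vals)
np.add.at(grid_cnt, (i, j), 1)

# Cells with no observations are left as NaN (incomplete coverage is normal).
level3 = np.where(grid_cnt > 0, grid_sum / np.maximum(grid_cnt, 1), np.nan)
```

Real Level 3 processing also handles quality flags, weighting, and temporal compositing; the sketch shows only the spatial binning idea.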


Data Format Descriptions

Hierarchical Data Format (HDF)
HDF is designed to facilitate the sharing of scientific data. Developed by the National Center for Supercomputing Applications (NCSA), it offers platform independence, user extensibility, and embedded metadata for units, labels, and other descriptors. Standard data types include multidimensional arrays, text, tables, raster images, and palettes. HDF files are portable and can be shared across most common platforms, including many workstations and high-performance computers; an HDF file created on one computer can be read on a different system without modification. The format is extensible and can easily accommodate new data models, whether added by the HDF development team or by HDF users.
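The embedded-metadata feature can be sketched briefly. EOS products historically use HDF4; purely as an illustration, the example below uses the successor HDF5 format via the h5py library (an assumption, not part of the original text), with a hypothetical sea surface temperature dataset.

```python
import numpy as np
import h5py  # assumed available; illustrates HDF5, the successor to NCSA HDF4

# Write a dataset with embedded metadata (units, label) stored as attributes.
with h5py.File("sst_example.h5", "w") as f:
    dset = f.create_dataset("sst", data=np.full((180, 360), 285.0, dtype="f4"))
    dset.attrs["units"] = "kelvin"
    dset.attrs["long_name"] = "sea surface temperature"

# Any HDF5 reader, on any platform, recovers both the data and its descriptors.
with h5py.File("sst_example.h5", "r") as f:
    units = f["sst"].attrs["units"]
    shape = f["sst"].shape
```

Because the units and labels travel inside the file, no separate header document is needed to interpret the data.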

HDF-EOS
The HDF for the Earth Observing System (HDF-EOS) data format is standard HDF with EOS Core System (ECS) conventions, data types, and metadata. HDF-EOS adds three geolocation data types (point, grid, and swath) that allow file contents to be queried by Earth coordinates and time. An HDF-EOS file also contains ECS core metadata essential for ECS search services. An HDF-EOS file can be read by any tool that processes standard HDF files. A data product need not fit any of the grid, point, or swath models to be considered HDF-EOS. If the product includes ECS metadata, it is a valid HDF-EOS file.

HDF-EOS is implemented as a C library extension of the standard HDF library, with FORTRAN bindings. The format ensures that data can be accessed by EOSDIS scientists and nonscientists from multiple disciplines, and its use can eliminate duplication of software development effort, especially for analysis and visualization software. EOSDIS data providers must supply written justification for deviating from the HDF-EOS (or HDF) format.

The network Common Data Form (netCDF)
netCDF is an interface for array-oriented data access and a freely distributed collection of software libraries for C, FORTRAN, C++, Java, and Perl that provide implementations of the interface. The netCDF software was developed at the Unidata Program Center in Boulder, Colorado, and augmented by contributions from other netCDF users. The netCDF libraries define a machine-independent format for representing scientific data. Together, the interface, libraries, and format support the creation, access, and sharing of scientific data.

netCDF data have the following features:

  • Self-describing: a netCDF file includes information about the data it contains.
  • Architecture-independent: a netCDF file is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers.
  • Directly accessible: a small subset of a large data set may be accessed without the need to first read through the preceding data.
  • Appendable: data can be appended to a netCDF data set along one dimension without copying the data set or redefining its structure.
  • Sharable: one writer and multiple readers can simultaneously access the same file.

(This information was taken from the Unidata Web site.)

American Standard Code for Information Interchange (ASCII)
An ASCII text file is one in which each byte represents one character according to the ASCII code. ASCII files are human readable and are sometimes called plain text files. Files that have been formatted with a word processor should be transmitted as binary files to preserve the formatting.
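The one-byte-per-character property is easy to verify with the standard library alone (the file name is hypothetical):

```python
# A plain ASCII text file: each byte encodes exactly one character.
line = "station,temp_c\nKSFO,14.5\n"

# newline="" disables platform newline translation so bytes match exactly.
with open("obs.txt", "w", encoding="ascii", newline="") as f:
    f.write(line)

# Reading the raw bytes back shows the one-byte-per-character mapping.
with open("obs.txt", "rb") as f:
    raw = f.read()
decoded = raw.decode("ascii")
```

Writing with `encoding="ascii"` also raises an error if any non-ASCII character slips in, which is one way such files stay portable.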

Binary
A binary file is computer readable but not human readable. Binary formats are used for executable programs and numeric data, whereas text formats are used for textual data. Many files contain a combination of binary and text formats. Such files are usually considered to be binary.
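A minimal sketch of a binary numeric record, using Python's standard `struct` module (the layout and file name are illustrative assumptions):

```python
import struct

# Pack two 32-bit floats and a 32-bit int into a fixed-size binary record.
record = struct.pack("<ffi", 10.5, 11.25, 42)   # "<" = little-endian layout

with open("obs.bin", "wb") as f:
    f.write(record)

# The bytes are machine-readable, not human-readable; a reader must know
# the exact layout ("<ffi") to recover the original values.
with open("obs.bin", "rb") as f:
    t1, t2, count = struct.unpack("<ffi", f.read())
```

This dependence on a known byte layout is why binary formats, unlike ASCII, need documentation (or a self-describing container such as HDF or netCDF) to be interpreted correctly.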