Earth Science Data Terminology and Formats
The DAAC Alliance data centers process, archive, and distribute data sets, or groups of data sets, derived from EOS instruments and other Earth science measurement systems. These data sets can be either standard data products (SDPs) or special data products. SDPs are processed at various levels ranging from Level 0 to Level 4. Level 0 products are raw data at full instrument resolution. At higher levels, the data are converted into more useful parameters and formats.
Data Products and Types
The data centers process, archive, and distribute EOSDIS data products. The products are data sets, or groups of data sets, derived from EOS instruments and other ESE Earth science measurement systems. They can be either standard data products (SDPs) or special data products.
Standard Data Products (SDPs)
Data products are considered to be SDPs if they are:
- Generated as part of a research investigation using EOS data.
- Recognized to have wide research utility.
- Generated routinely.
- Produced for spatially and/or temporally extensive sets of data.
SDPs are produced at the DAACs or by Science Investigator-led Processing Systems (SIPSs). These products are formally defined in EOSDIS requirements documentation.
Special Data Products
Data products are considered to be special data products if they are:
- Generated as part of a research investigation using EOS data.
- Produced for a limited region or time period.
- Not accepted as standard by the EOS Investigators Working Group (IWG) and NASA Headquarters.
- Referred to as "special data products" to distinguish them from other nonstandard products such as ancillary data sets.
Special data products are normally generated at the investigators' Scientific Computing Facilities (SCFs).
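The two sets of criteria above can be read as a simple decision rule. The following Python sketch is illustrative only: the `ProductTraits` record, its field names, and the `classify` function are hypothetical and are not part of any EOSDIS software. It also assumes that acceptance by the IWG and NASA Headquarters is what separates a standard product from a special one, and it labels anything that fits neither list (for example, ancillary data sets) as "other".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductTraits:
    """Hypothetical record of the criteria listed in the text."""
    from_eos_investigation: bool  # generated as part of a research investigation using EOS data
    wide_research_utility: bool   # recognized to have wide research utility
    generated_routinely: bool     # generated routinely
    extensive_coverage: bool      # spatially and/or temporally extensive (False = limited region or period)
    accepted_as_standard: bool    # accepted as standard by the IWG and NASA Headquarters (assumed flag)


def classify(traits: ProductTraits) -> str:
    """Return 'standard', 'special', or 'other' under the assumptions stated above."""
    if not traits.from_eos_investigation:
        return "other"
    if (traits.accepted_as_standard
            and traits.wide_research_utility
            and traits.generated_routinely
            and traits.extensive_coverage):
        return "standard"
    if not traits.accepted_as_standard and not traits.extensive_coverage:
        return "special"
    return "other"


# A routinely generated, widely useful, global product accepted as standard.
global_product = ProductTraits(True, True, True, True, True)
# A one-off regional study product that was not accepted as standard.
regional_product = ProductTraits(True, False, False, False, False)

print(classify(global_product))    # standard
print(classify(regional_product))  # special
```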
Data Processing Levels for Standard Data Products
EOSDIS SDPs are processed at various levels ranging from Level 0 to Level 4. Level 0 products are raw data at full instrument resolution. At higher levels, the data are converted into more useful parameters and formats. All EOS instruments must have Level 1 SDPs. Most have products at Levels 2 and 3, and some have products at Level 4.
The data processing levels described in the table below and referenced in the following sections are identical to the EOSDIS Data Panel's definitions and are consistent with the Committee on Data Management, Archiving, and Computing (CODMAC) definitions.
Data Format Descriptions
- Hierarchical Data Format (HDF)
- HDF is designed to facilitate sharing of scientific data. Its
features include platform independence, user extendibility, and embedded
metadata for units, labels, and other descriptors. Standard data types
include multidimensional array, text, table, raster image, and palette.
HDF files are portable, and they can be shared across most common
platforms, including many workstations and high-performance computers.
An HDF file created on one computer can be read on a different system
without modification. HDF was developed by the National Center for
Supercomputing Applications (NCSA). This format is extensible and can
easily accommodate new data models, regardless of whether they are added
by the HDF development team or by HDF users.
- HDF-EOS
- The HDF for the Earth Observing System (HDF-EOS) data format is
standard HDF with EOS Core System (ECS) conventions, data types, and
metadata. HDF-EOS adds three geolocation data types (point, grid, and
swath) that allow file contents to be queried by Earth coordinates and
time. An HDF-EOS file also contains ECS core metadata essential for ECS
search services. An HDF-EOS file can be read by any tool that processes
standard HDF files. A data product need not fit any of the grid, point,
or swath models to be considered HDF-EOS. If the product includes ECS
metadata, it is a valid HDF-EOS file.
HDF-EOS is implemented as a C library extension of the standard HDF library (with FORTRAN bindings). This format ensures that data can be accessed by EOSDIS scientists and nonscientists from multiple disciplines. Use of HDF-EOS also can eliminate duplication of software development efforts, especially for analysis and visualization software. EOSDIS data providers must supply written justification for deviating from the HDF-EOS (or HDF) format.
- The network Common Data Form (netCDF)
- netCDF is an interface for array-oriented data access and a freely
distributed collection of software libraries for C, FORTRAN, C++, Java,
and Perl that provide implementations of the interface. The netCDF
software was developed at the Unidata Program Center in Boulder,
Colorado, and augmented by contributions from other netCDF users. The
netCDF libraries define a machine-independent format for representing
scientific data. Together, the interface, libraries, and format support
the creation, access, and sharing of scientific data.
netCDF data have the following features: (1) self-describing: a netCDF file includes information about the data it contains; (2) architecture-independent: a netCDF file is represented in a form that can be accessed by computers with different ways of storing integers, characters, and floating-point numbers; (3) directly accessible: a small subset of a large data set may be accessed without the need to first read through the preceding data; (4) appendable: data can be appended to a netCDF data set along one dimension without copying the data set or redefining its structure; and (5) sharable: one writer and multiple readers can simultaneously access the same file. (This information was taken from the Unidata Web site.)
- American Standard Code for Information Interchange (ASCII)
- An ASCII text file is one in which each byte represents one
character according to the ASCII code. ASCII files are human readable
and are sometimes called plain text files. Files that have been
formatted with a word processor should be transmitted as binary files to
preserve the formatting.
- Binary
- A binary file is computer readable but not human readable. Binary
formats are used for executable programs and numeric data, whereas text
formats are used for textual data. Many files contain a combination of
binary and text formats. Such files are usually considered to be
binary.
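The difference between the ASCII and binary descriptions above can be illustrated with a short Python sketch that uses only the standard library. It stores the same three numbers once as ASCII text and once as packed binary values. The sample values and the `is_ascii` helper are made up for illustration, and the big-endian 32-bit layout is an assumption chosen to show the general idea of an architecture-independent representation; it is not meant to reproduce the actual on-disk encoding of HDF or netCDF files.

```python
import struct

values = [1.5, -2.25, 1024.0]

# Text form: each byte is one ASCII character, so the content is human readable.
text_bytes = " ".join(str(v) for v in values).encode("ascii")

# Binary form: each value is packed as a 32-bit float. The '>' prefix fixes the
# byte order (big-endian), so the bytes are the same on any host architecture.
binary_bytes = struct.pack(">3f", *values)


def is_ascii(data: bytes) -> bool:
    """Return True if every byte falls in the 7-bit ASCII range."""
    return all(b < 128 for b in data)


print(text_bytes)              # readable characters
print(binary_bytes.hex())      # raw bytes shown as hexadecimal
print(is_ascii(text_bytes), is_ascii(binary_bytes))

# Reading the binary form back requires knowing its layout in advance.
restored = struct.unpack(">3f", binary_bytes)
```

The text form can be inspected in any editor, while the binary form is more compact and preserves the exact stored values but is only meaningful to a program that knows how it was packed.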