Data & Metadata Formats

Data Formats

HDF

The Hierarchical Data Format (HDF) is designed to facilitate managing and sharing scientific data. HDF includes two formats (HDF4 and HDF5), software for accessing data in HDF formats, and applications for working with HDF data. HDF is designed for efficient storage and access of high volume, complex data, and for mixing varieties of data types in a single container. HDF libraries are used to read and write data, to define data types and structures for applications, and to control how data is stored. HDF applications include commercial and free software for viewing, creating, comparing, searching, analyzing and visualizing HDF data, and for converting between HDF and other formats. There are specialized libraries for HDF in application domains. These libraries promote the standard use of HDF, enabling data consumers to more easily share their data and applications. Some libraries, such as HDF-EOS, are broad in scope, and support a very wide range of applications. For more information about HDF as a scientific data format, see http://hdfgroup.org.

HDF-EOS

Hierarchical Data Format for the Earth Observing System (HDF-EOS) is NASA’s primary format for standard data products derived from EOS instruments. Because many Earth science data structures need to be geolocated, NASA developed the HDF-EOS format with additional conventions and data types for HDF files. There are two versions of HDF-EOS: HDF-EOS2 and HDF-EOS5. HDF-EOS2 uses HDF4 and HDF-EOS5 uses HDF5.

HDF-EOS2 and HDF-EOS5 support three geospatial data types (grid, point, swath) and HDF-EOS5 also supports a “Zonal Average” datatype. HDF-EOS provides uniform access to diverse data types in a geospatial context. The HDF-EOS software libraries allow a user to query or subset the contents of a file by Earth coordinates and time if there is a spatial dimension in the data. HDF-EOS also provides a container for EOS inventory, archive and product specific metadata. HDF-EOS2 is used operationally by MODIS, MISR, ASTER, Landsat, AIRS and other EOS instruments. HDF-EOS5 is used by EOS Aura instruments.
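The idea of subsetting by Earth coordinates can be sketched without the HDF-EOS library itself. The example below is a minimal illustration in plain Python, assuming a hypothetical regular latitude/longitude grid. The function `subset_grid`, the coordinate lists, and the sample values are invented for illustration and are not part of the HDF-EOS API.

```python
def subset_grid(values, lats, lons, lat_range, lon_range):
    """Return the part of a 2-D grid that falls inside a bounding box.

    values: list of rows, one row per entry in `lats`
    lats, lons: coordinate of each row / column (hypothetical regular grid)
    lat_range, lon_range: (min, max) tuples, both ends inclusive
    """
    row_idx = [i for i, lat in enumerate(lats)
               if lat_range[0] <= lat <= lat_range[1]]
    col_idx = [j for j, lon in enumerate(lons)
               if lon_range[0] <= lon <= lon_range[1]]
    return [[values[i][j] for j in col_idx] for i in row_idx]


# Hypothetical 3 x 4 grid with its row and column coordinates.
lats = [10.0, 20.0, 30.0]
lons = [100.0, 110.0, 120.0, 130.0]
values = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

subset = subset_grid(values, lats, lons, (15.0, 30.0), (105.0, 125.0))
print(subset)  # [[6, 7], [10, 11]]
```

A real HDF-EOS grid carries its projection and coordinate information inside the file, so the library can perform this kind of selection without the caller supplying coordinate lists.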

Tools that process standard HDF files will also read HDF-EOS files; however, standard HDF library calls cannot access geolocation data, time data, and product metadata as easily as with HDF-EOS library calls. For an overview of data tools, see Section 6. For more information on HDF-EOS, see http://www.hdfeos.org.

netCDF

The network Common Data Form (netCDF) is an interface for array-oriented data access and a freely distributed collection of software libraries for C, FORTRAN, C++, Java, and Perl that provide implementations of the interface. The netCDF software was developed at the Unidata Program Center in Boulder, Colorado, and augmented by contributions from other netCDF users. The netCDF libraries define a machine-independent format for representing scientific data. Together, the interface, libraries, and format support the creation, access, and sharing of scientific data.

For more information or to obtain netCDF software, see http://www.unidata.ucar.edu/software/netcdf. (The above information on netCDF was taken from the Unidata Web site.)

ASCII

An American Standard Code for Information Interchange (ASCII) text file is one in which each byte represents one character according to the ASCII code. ASCII files are human-readable and are sometimes called plain text files. Files that have been formatted with a word processor should be transmitted as binary files to preserve the formatting.

Binary

A binary file is computer-readable but not human-readable. Binary formats are used for executable programs and numeric data, whereas text formats are used for textual data. Many files contain a combination of binary and text formats; such files are usually considered to be binary. Binary files can depend on machine architecture, for example on the byte order used to store multi-byte numbers.
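One common form of architecture dependence is byte order. The short sketch below uses Python's standard `struct` module to show that the same integer is stored as different bytes under little-endian and big-endian layouts, and that reading the bytes with the wrong assumption gives a wrong value. The variable names are illustrative only.

```python
import struct

value = 1
little = struct.pack("<I", value)  # 4-byte unsigned int, little-endian
big = struct.pack(">I", value)     # same value, big-endian

print(little.hex())  # 01000000
print(big.hex())     # 00000001

# Interpreting little-endian bytes as big-endian yields a different number.
misread = struct.unpack(">I", little)[0]
print(misread)       # 16777216
```

Self-describing formats such as HDF and netCDF record how values are stored, which is one reason they are described as machine-independent.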

Shapefile

A shapefile is a digital vector (non-topological) storage format for storing geometric location and associated attribute information. The shapefile format specified by ESRI can be used by ArcView, ArcInfo, ArcGIS and other widely used GIS software. A shapefile stores map (geographic) features and attribute data as a collection of files having the same prefix and several file extensions. Geographic features in a shapefile can be represented by points, lines, or polygons (areas). NOTE: An individual shapefile is actually a collection of files, as described above, that must be moved or distributed as a group; otherwise the shapefile can be rendered unusable.
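Because the parts of a shapefile must travel together, it can help to check a directory before copying or distributing it. The sketch below is one possible check using only the Python standard library. It assumes that the `.shp`, `.shx`, and `.dbf` files are the mandatory parts, as in the ESRI specification; the function name `shapefile_parts` and the sample file names are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: .shp, .shx and .dbf are the mandatory parts of a shapefile.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def shapefile_parts(shp_path):
    """Return (files sharing the shapefile's prefix, missing required extensions)."""
    shp_path = Path(shp_path)
    prefix = shp_path.stem + "."
    parts = sorted(
        p for p in shp_path.parent.iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
    found = {p.suffix.lower() for p in parts}
    missing = sorted(REQUIRED_EXTENSIONS - found)
    return parts, missing


# Demonstration in a temporary directory with an incomplete shapefile.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("roads.shp", "roads.dbf", "rivers.shp"):
        (Path(tmp) / name).touch()
    parts, missing = shapefile_parts(Path(tmp) / "roads.shp")
    part_names = [p.name for p in parts]

print(part_names)  # ['roads.dbf', 'roads.shp']
print(missing)     # ['.shx']
```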

TIFF

A TIFF (Tagged Image File Format) is a raster data format for storage, transfer, display, and printing of raster images, such as clipart, logotypes, and scanned documents. The TIFF imagery file format can be used to store and transfer digital satellite imagery, scanned aerial photos, elevation models, scanned maps or the results of many types of geographic analysis. TIFF is a full-featured format in the public domain, capable of supporting compression, tiling, and extension to include geographic metadata.

GeoTIFF

GeoTIFF implements the geographic metadata formally, using compliant TIFF tags and structures. GeoTIFF refers to TIFF files that have geographic (or cartographic) data embedded as tags within the TIFF file. The geographic data can then be used to position the image in the correct location and geometry on the screen of a geographic information display. GeoTIFF is a metadata format that provides geographic information to associate with the image data, and the TIFF file structure allows both the metadata and the image data to be encoded in the same file.

GeoTIFF makes use of a public tag structure which is platform interoperable between any and all GeoTIFF-savvy readers. GIS, CAD, image processing, desktop mapping and any other types of systems using geographic images can read any GeoTIFF files created on any system to the GeoTIFF specification.

JPEG

JPEG is the standard algorithm for the compression of digital images devised by the Joint Photographic Experts Group. JPEG files commonly have the filename extension .jpg. The JPEG standard uses a ‘lossy’ data compression method in which some data is sacrificed (lost) to achieve greater compression. Files formatted using JPEG are not geolocated.
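The general idea of lossy compression can be illustrated with simple quantization. The sketch below is not the JPEG algorithm; it only shows how rounding values to a coarser step leaves fewer distinct values to store, at the cost of not being able to recover the originals exactly. The function names and sample values are hypothetical.

```python
def quantize(samples, step):
    """Replace each sample by the index of its nearest multiple of `step`."""
    return [round(s / step) for s in samples]


def dequantize(indices, step):
    """Rebuild approximate samples from the quantized indices."""
    return [i * step for i in indices]


original = [12, 13, 14, 200, 203]
restored = dequantize(quantize(original, 10), 10)

print(restored)              # [10, 10, 10, 200, 200]
print(restored == original)  # False: detail was lost
```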

 

Metadata Formats

The roles of metadata elements vary with their scope. The following terms are frequently used to categorize metadata by scope:

  • Collection or aggregate metadata – These are metadata elements that describe an entire set of data products or files. Values of collection metadata apply to all of the products in a specific collection. Collections may represent the same release of any given data product, sets of data generated during an experiment, a campaign or an algorithmic test.
  • Granule or product metadata – These are metadata elements that describe a single instance of a data product. Values of granule metadata apply to all of the data in that one granule. Typical metadata in this category describe spatial and temporal extent of the data as well as the quality and lineage of the data.
  • Local or parameter metadata – These are metadata elements that describe a specific component of the data product. Values of local metadata apply specifically to the component with which they are associated. These elements often specify units of measure, the range of data within the array, the dimension sizes and identifying names for each dimension.
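The three scopes above nest naturally: a collection contains granules, and a granule contains parameters. The sketch below models that nesting with Python dataclasses. All class names, field names, and sample values are hypothetical and are not taken from any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterMetadata:
    """Local or parameter level: describes one component of a product."""
    name: str
    units: str
    valid_range: tuple
    dimensions: dict  # dimension name -> size


@dataclass
class GranuleMetadata:
    """Granule or product level: describes one instance of a product."""
    granule_id: str
    start_time: str
    end_time: str
    bounding_box: tuple  # (west, south, east, north)
    parameters: list = field(default_factory=list)


@dataclass
class CollectionMetadata:
    """Collection or aggregate level: applies to every granule it holds."""
    short_name: str
    version: str
    granules: list = field(default_factory=list)


sst = ParameterMetadata("sea_surface_temperature", "K", (271.0, 313.0),
                        {"lat": 180, "lon": 360})
granule = GranuleMetadata("G0001", "2010-01-01T00:00:00Z",
                          "2010-01-01T23:59:59Z",
                          (-180.0, -90.0, 180.0, 90.0), [sst])
collection = CollectionMetadata("EXAMPLE_SST", "1", [granule])

# Walk from collection scope down to parameter scope.
units = collection.granules[0].parameters[0].units
print(collection.short_name, granule.granule_id, units)
```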

 

NetCDF-CF

NetCDF (network Common Data Form) is a data model for array-oriented scientific data, a freely distributed collection of access libraries that implement that data model, and a self-describing, machine-independent format. The NetCDF-CF (Climate and Forecast) conventions are a set of codified recommendations for practices built around published specifications. While CF is a convention rather than an established metadata standard, it is a critically important step towards better interoperability.

THREDDS Catalog Service Metadata

The Thematic Realtime Environmental Distributed Data Services (THREDDS) metadata addresses information stored in catalogs that contain information related to THREDDS Data Server (TDS) behavior, dataset grouping, and to dataset description. Most of the information that can be included in the catalogs overlaps with various ISO metadata standards. This overlap is explored on three levels: datasets, catalog structure, and service definitions. The THREDDS catalog specification defines a small but easy-to-use set of metadata and its XML schema.

ECHO

The EOS Clearinghouse (ECHO) is a spatial and temporal metadata registry that enables the science community to more easily use and exchange NASA’s data and services. ECHO stores metadata from a variety of science disciplines and domains held at 12 EOSDIS data centers. ECHO provides a single point of discovery, understanding and access for users, eliminating their need to have knowledge of location and type of data. ECHO metadata are specifically designed for use in ECHO.

FGDC–CSDGM

The Federal Geographic Data Committee (FGDC) is an interagency committee that promotes the coordinated development, use, sharing, and dissemination of geospatial data on a national basis. The current Federal standard for geospatial data is the Content Standard for Digital Geospatial Metadata (CSDGM). The standard provides a common set of terminology and definitions for the documentation of digital geospatial data. Efforts are underway to develop and finalize a new national (ANSI) standard - The North American Profile (NAP) of ISO 19115. This profile was developed to facilitate the transition from CSDGM to ISO. The FGDC recently endorsed ISO 19115, 19115-2, and other ISO Standards, along with the NAP, as U.S. National Standards, so Federal Agencies can use the standard(s) that most closely fits their needs.

GCMD DIF

The NASA Global Change Master Directory (GCMD) is an online system with information about Earth science data sets, intended for use by the scientific research community. The GCMD offers descriptions of Earth science data sets (metadata) using a defined set of fields known as the Directory Interchange Format (DIF). The DIF has been endorsed by NASA’s Earth Science Data Systems Working Groups (ESDSWG) as a recommended standard.

OGC CS/W

The Open Geospatial Consortium’s Catalog Service for the Web (CS/W) provides catalog services for clients to find needed data and data-related services. CS/W specifies the interfaces, bindings, and a framework for defining application profiles that are required to publish and access digital catalogues for geospatial data and services. The CS/W specification does not require the use of a specific catalog schema. However, CS/W encourages the adoption of standard schemas for maximum interoperability. Specifically, OGC developed two application profiles for CS/W: the ISO19115/19119 profile and the electronic business Registry Information Model (ebRIM) profile.
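As an illustration of how a client might begin talking to a catalog service, the sketch below builds a GetCapabilities request URL. It assumes the key-value-pair HTTP GET binding and version 2.0.2 of the catalog service specification; the endpoint `https://example.org/csw` and the function name are hypothetical, and a real server may expect different parameters.

```python
from urllib.parse import urlencode


def csw_get_capabilities_url(endpoint, version="2.0.2"):
    """Build a GetCapabilities request URL for a catalog service endpoint."""
    params = {
        "service": "CSW",
        "version": version,  # assumed version; check what the server supports
        "request": "GetCapabilities",
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, used only to show the resulting URL shape.
url = csw_get_capabilities_url("https://example.org/csw")
print(url)
# https://example.org/csw?service=CSW&version=2.0.2&request=GetCapabilities
```

The response to such a request describes which operations and catalog schemas the service supports, which is where a client would learn whether the ISO or ebRIM profile is in use.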

Dublin Core

The Dublin Core metadata standard is a simple yet effective element set for describing a wide range of networked resources. It consists of fifteen properties for use in resource descriptions, which are part of a larger set of metadata vocabularies and technical specifications maintained by the Dublin Core Metadata Initiative (DCMI). Although the Dublin Core standard is not very broad, the DCMI has established standard ways to refine elements and encourage the use of encoding and vocabulary schemes.
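A simple record using some of the fifteen elements might be encoded in XML as sketched below, using Python's standard `xml.etree.ElementTree` module. The element names and namespace are those of the Dublin Core element set, but the `record` wrapper element, the function name, and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core element set.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_record(fields):
    """Build an XML element holding the given Dublin Core fields."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dublin_core_record({
    "title": "Sample granule",
    "creator": "Example Data Center",
    "date": "2010-01-01",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```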

METS

The Metadata Encoding and Transmission Standard (METS) schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using XML. In addition, METS provides the ability to associate a digital object with behaviors or services. The METS community consists, primarily, of university libraries, archives, and museums. The typical use of METS is to create a profile that is intended to describe a class of METS documents so that those documents can be created and processed in conformance to a profile.

NASA Earth Science Data Model

The NASA Earth Science Data Model (ESDM) was developed for NASA’s EOSDIS Core System (ECS). ESDM supports data management and search services for the diversity of Earth Science Data available through NASA’s Data Centers. It consists of a bounded set of attributes intended to cover the essential characteristics of all Earth Science Data sets, as well as provisions for product-specific metadata.