Understanding and Finding ARM Data

(2-9-98 Raymond A. McCord, rev. 12-20-2004 Giri Palanisamy)

The Atmospheric Radiation Measurement (ARM) program has generated billions of measurements of radiation, radiative transfer, meteorology, clouds, etc. (see URL http://www.arm.gov/ for more information). These measurements have been gathered by several dozen types of instruments and are stored in millions of data files. These files may be searched or categorized by site, facility, instrument type, measurement type, and time period. The intent of this documentation is to provide an overview of the organization of ARM data to help you find data you can use. After a few key words are established, the structure of the data is described, followed by a few hints for finding information about the ARM data that is not included in the files. Options for finding data are based on these structural concepts and are described in the last section. A more detailed description of requesting and accessing data files from the Archive is presented in the Overview of ARM Archive User Interface.

Key words

Understanding the data from the ARM program requires familiarity with a few key words and concepts. The following key words are used to describe ARM data:

  • algorithm: a formula, function or logic used to compute derived measurements
  • data file: A single data file contains a limited time period of measurements from a data stream. Data files allow the information to be segmented into manageable increments for processing, storage and retrieval; usually one file per day.
  • data level: An indication of the extent of formatting, processing, or reviewing that has been completed on a data file
  • data quality report (DQR): A description of an event or observation that is believed to alter data quality from its normal patterns of measurement error. The scope of a DQR is usually limited to a specific time interval and class of data files.
  • data source: An object which generates data. Sources include instruments, instrument groups, and algorithms.
  • data stream: Collections of data files that have a common source and structure are called data streams.
  • "development" data: Data processes which are operated during the final phases of implementation of a data source result in data which are labeled as "Development." Development data files are available to users, but additional information about the operating conditions and potential for errors must be considered. [Revision note, 7/3/2001: The concept of development data files or data streams is no longer used in the current ARM data system. This concept is historically restricted to SGP data streams. Many of the older data streams are still named Dsgp.... Several of these data streams will be renamed to sgp... Pathways to access Dsgp... data streams are limited to the power user option of the browser interface.]
  • facilities: a location where instruments are installed within a site
  • instrument: A field device which measures one or more physical attributes. Some instruments may be complex collections of multiple 'sensors' or instrument groups. For simplicity, each physically integrated device will be called 'an instrument'
  • Intensive Operational Period (IOP): To meet specific experimental objectives (e.g., instrument comparison, detailed observation of phenomenon, etc.), ARM has intermittent periods when normal operations are revised. These periods are called IOPs and they regularly include extra varieties of data files. The IOP data are stored in a separate structure from the routine ARM data (See more information at: http://iop.archive.arm.gov/).
  • measurement: A value which represents a physical attribute of the environment. Examples include wind speed, air temperature, total direct radiation, etc.
  • netCDF: A binary, self-documenting file format used to store ARM data files. NetCDF was developed by UCAR. Public domain software libraries for working with netCDF files can be found at http://www.unidata.ucar.edu/packages/netcdf
  • site: A geographic region where ARM instruments are installed.
  • value added product (VAP): ARM data files and data streams that are derived from advanced algorithms or include the merging of data from more than one input data source are called value added products (VAPs). The value added products have a data level of c1 or higher and have a data flow pattern that is different from the directly processed data streams. The scope of active VAPs is continually evolving. More information on VAPs can be found at http://www.arm.gov/data/vaps_all.php

Explanation of ARM data structure

This section describes ARM data as a hierarchy of information (see Figure 1). [Revision note, 2/12/2002: Although the figure illustrates data through 1998, the concepts shown in the figure are still valid.] The discussion begins with the smallest unit of information and moves up the hierarchy. Although the user should understand the smaller units of ARM data, the available quantity of data is large enough that it is easier to consider data at the higher levels.

Measurement: the smallest portion of ARM data

The smallest unit of ARM data is called a measurement. It is a value that represents an attribute of the environment (e.g., cloud base height). These measurements are usually stored as a series of values recorded over a time sequence. Measurements include values that directly represent environmental attributes (e.g., air temperature), but they may also include values about the operating condition of the instrument (e.g., the azimuth of the sensor). Each measurement is directly related to the time of observation and integration period. In addition to 'time of observation', the minimal context required to interpret a measurement includes the definition of the location, the units recorded, and the description of the method (reference to instrument, sensor, or algorithm). A discussion of the types of ARM measurements can be found at http://www.arm.gov/measurements/measclass.php. During its history the ARM program has generated billions of measurement values.

Data file: a collection of related measurements

The ARM measurements are aggregated together in data files. A data file usually contains a time series of one or more measurements for a known time interval and a single location. Most data are stored in netCDF format. This format allows for the definition of data fields and storage of operational information in the header of the file. See above for more information on netCDF. Some data files also contain one or more measurements distributed over a region or other dimension at a single time (e.g., soil water and temperature profiles, satellite images, mesonet files). Data from satellites are stored in the HDF format developed by NASA (see http://hdf.ncsa.uiuc.edu/ for more information on HDF). A single data file is the smallest unit of information distributed by ARM to its data users. Persons needing less than the contents of a single data file must derive the subset with their own tools. At this time, the ARM program has generated millions of data files.
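Because a single data file is the smallest unit distributed, any finer selection is left to the user. As a minimal sketch of that step, assuming the time series has already been read into memory as (time, value) pairs, a time-window subset might look like the following. The function name, the hourly sampling, and the temperature values are illustrative assumptions and are not part of any ARM tool.

```python
from datetime import datetime, timedelta


def subset_by_time(samples, start, end):
    """Return the (time, value) pairs with start <= time < end."""
    return [(t, v) for t, v in samples if start <= t < end]


# Hypothetical one-day file: one air temperature value per hour.
day = datetime(2004, 1, 1)
samples = [(day + timedelta(hours=h), 270.0 + h) for h in range(24)]

# Keep only the samples from 06:00 up to (not including) 12:00.
morning = subset_by_time(
    samples, day + timedelta(hours=6), day + timedelta(hours=12)
)
print(len(morning))  # prints 6
```

A half-open interval (start included, end excluded) is used so that adjacent windows never select the same sample twice.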

Data stream: a series of data files

Collections of data files that have a common source and structure are called data streams. A data stream represents the output of a single data source (i.e., instrument, instrument group, or algorithm). All files within a single data stream have a common information content (e.g., same measurement types and operational information, and similar name). At this time, the ARM Program has generated more than 2400 data streams.
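The relationship between data files and data streams can be sketched as a small data structure. This is only an illustration of the concept, assuming one file per day and using hypothetical stream and measurement names; it does not reflect how the Archive itself stores its records.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataFile:
    stream: str         # name of the data stream this file belongs to
    day: date           # period covered, assuming one file per day
    measurements: dict  # measurement name -> list of values


@dataclass
class DataStream:
    name: str
    files: list = field(default_factory=list)

    def add(self, data_file):
        """Accept a file only if it matches the stream's name and structure."""
        if data_file.stream != self.name:
            raise ValueError("file belongs to a different data stream")
        if self.files and set(data_file.measurements) != set(self.files[0].measurements):
            raise ValueError("file does not share the stream's measurement types")
        self.files.append(data_file)


# Hypothetical stream name and measurement name.
stream = DataStream("example_stream")
stream.add(DataFile("example_stream", date(2004, 1, 1), {"air_temperature": [271.3, 271.9]}))
stream.add(DataFile("example_stream", date(2004, 1, 2), {"air_temperature": [270.8, 272.1]}))
print(len(stream.files))  # prints 2
```

The check in `add` mirrors the statement that all files within a data stream share the same measurement types; a file with different fields would belong to a different stream.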

Data stream groups: conceptually related groups of data streams

Data streams can be categorized into more than one grouping scheme (e.g., by location, source, or processing status). Understanding relationships between data streams can simplify the overwhelming magnitude of ARM data. All of the data streams within a group may share common attributes. For example, all data streams from similar sources (instruments or algorithms) will have similar (often identical) data fields containing measurements. Data streams from similar sources will also reference common instrument descriptions.
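As a rough illustration of how one set of data streams can fall into several grouping schemes at once, the sketch below groups the same records first by site and then by source. The stream and source names are hypothetical; only the site codes (SGP, NSA) are taken from this document.

```python
from collections import defaultdict

# Hypothetical catalog entries describing three data streams.
streams = [
    {"name": "stream_a", "site": "SGP", "source": "instrument_x"},
    {"name": "stream_b", "site": "SGP", "source": "instrument_y"},
    {"name": "stream_c", "site": "NSA", "source": "instrument_x"},
]


def group_by(records, key):
    """Map each distinct value of `key` to the names of the matching streams."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["name"])
    return dict(groups)


by_site = group_by(streams, "site")
by_source = group_by(streams, "source")
print(by_site)
print(by_source)
```

The same function would apply to any other shared attribute, such as data level, which is why a single data stream can appear in more than one group.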

Data streams may be "grouped" in the following categories:

Spatial Structure

Data generated by ARM typically represent a geographic point (rather than a geographic line or area). Collectively the ARM points represent a model cell of a General Circulation Model (GCM). These point locations are structured in a two-level hierarchy:

1) Sites - Southern Great Plains, Tropical Western Pacific, and North Slope of Alaska. [Revision note, 2/13/2002: The usage of virtual sites to designate data "in development" is currently inactive. Dsgp... is the only development site with stored data. Many of these data streams are being renamed to sgp... Dsgp... data streams are accessible only from the Power User option in the Browser Interface.] Developmental data have limited accessibility and users from the general scientific community are encouraged to discuss the use of this type of data with the appropriate technical contacts.

2) Facilities - similar instrument installations at different geographic locations within a site. The SGP site has central, extended, boundary, and intermediate facilities. The TWP site has installations on Manus and Nauru Island (A third site for operational staging will soon be implemented at Darwin, Australia). The NSA site includes installations at Barrow and Atqasuk.

External Data - The ARM data structure also includes similar data collected by other programs. These data frequently represent spatial areas such as satellite scenes and interpolated mesonet data. More information about the spatial extent of the external data can be obtained from URLs:

The following table describes the magnitude of the major dimensions of the ARM data structure.

Dimension | Number of values | Discussion
Site | 3 | The major field locations operated by ARM: Southern Great Plains (SGP), North Slope of Alaska, Tropical Western Pacific
Facility | ~30 for SGP, in 5 categories | Central, Extended, Boundary, Intermediate, External
Data level | ~8 | Versions of the same information that reflect increments of data processing and QA review; most data streams have fewer than 3 data levels.
Time (day, month, or year) | Currently spans 8-9 years | Will continue several more years; many values are possible depending on the time resolution chosen.
Data stream | ~2000 | Can be partitioned by data level or facility; can be partitioned by source (> 100 types of instruments and algorithms).

Supporting information external to the data files

Although the data files contain the measurement values, units, sample time, and location, many other types of information can contribute to the interpretation of the data. These include:

  • Data Quality Reports (DQRs) (see URL: http://www.db.arm.gov/PIFCARDQR2/)
    • These reports provide descriptions of events which are known or likely to alter the 'normal' pattern of measurement error. These reports are 'event-driven' and describe problems at a variety of scales (one or more instruments, one or more facilities, one or more time intervals)
    • DQRs that are specifically related to the requested data files are provided with each data request from the Archive.
    • DQRs received after the data files have been requested will be "retroactively distributed" to the data users. This retroactive distribution provides DQRs that are specifically related to the user's previous requests.
  • Instrument and VAP descriptions
  • Measurement descriptions (see URL: http://www.arm.gov/data/types.stm)
  • Intensive Operational Period (IOP) descriptions
    • ARM operations and instrumentation are periodically modified to provide a better focus on a particular observation objective. During these IOPs, additional observations for routine measurements or additional instruments may exist. Descriptions of IOPs can be found at URL: http://www.arm.gov/docs/iops.html
    • IOP data can be accessed from an ARM IOP Data Browser. Access to this Data Browser requires the creation of an Archive account (go to Archive User Interface login and click "Create Account"). Records from this account will enable the Archive to notify previous requesters of data when new versions are available.
    • The occasional operation of unmanned aerospace vehicles (UAVs) has been a special type of IOP in the ARM program.
      • Information about the UAV operations and data can be found at URL: http://www.arm.gov/uav/index.html
      • UAV data files are accessible from the ARM IOP Data Browser. Search for UAV data by selecting the related year and site subdirectory.
  • Technical Contact information
  • Presentation of "Quick Look" data display
  • Site operations information (weather observations, maintenance records, etc).
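The pairing of DQRs with requested data files, described in the DQR items above, amounts to matching on data stream and on overlapping time intervals. The following is a minimal sketch of that idea under stated assumptions: the class names, stream names, and report descriptions are hypothetical, and the Archive's actual matching rules may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DataFile:
    stream: str
    start: datetime
    end: datetime


@dataclass
class DQR:
    stream: str
    start: datetime
    end: datetime
    description: str


def related_dqrs(data_file, dqrs):
    """Return DQRs for the file's stream whose interval overlaps the file's interval."""
    return [
        d for d in dqrs
        if d.stream == data_file.stream
        and d.start < data_file.end
        and data_file.start < d.end
    ]


# Hypothetical one-day file and three hypothetical reports.
f = DataFile("example_stream", datetime(2004, 1, 2), datetime(2004, 1, 3))
reports = [
    DQR("example_stream", datetime(2004, 1, 2, 6), datetime(2004, 1, 2, 9), "sensor shaded"),
    DQR("example_stream", datetime(2004, 2, 1), datetime(2004, 2, 2), "power outage"),
    DQR("other_stream", datetime(2004, 1, 2), datetime(2004, 1, 3), "calibration drift"),
]
print([d.description for d in related_dqrs(f, reports)])  # prints ['sensor shaded']
```

Running the same check again when a new DQR arrives, against the files a user requested earlier, is one way to picture the retroactive distribution.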

Options for Finding Data

ARM data can be 'found' at the Archive by using various combinations of the following approaches:

  • Query Search using ARM Data Browser in the Archive User Interface
    • This approach is effective for finding data if you can specify your data of interest in the logic of the ARM data structure. The ARM Data Browser expects the user to specify selections of site, date range, measurement or instrument categories, measurements or instruments, and facilities to identify available data (see URL: http://www.archive.arm.gov/). The Archive User Interface responds with a summary of the quantity of available data and an option to 'order' data files to be retrieved from the mass storage system of the Archive (see the Overview of ARM Archive User Interface). If you are browsing for data with non-specific criteria, determining data availability by querying can be tedious (e.g., too many data files for a general search).
  • Catalog review in Archive User Interface
    • This approach is effective for finding data if you have general or non-specific criteria for "data of interest" (e.g., acceptable data may be obtained from several alternative time periods or locations). The catalog interface displays data availability (number of files) in a hierarchy of tables that are summarized by year, site, facility, instrument category, etc. (The catalog interface is also accessible at URL: http://www.archive.arm.gov/.) The summarization in the catalog interface has a minimum resolution of monthly increments. If your "data of interest" include time ranges smaller than one month, then requesting data with the browser interface may be more efficient.
  • IOP Data Archive
    • The IOP data are stored separately from the routine ARM data files.
      • See more information at: http://iop.archive.arm.gov/
      • This enables the flexibility necessary to document the operational exceptions contained in the IOP data.
    • The IOP Data Browser (http://iop.archive.arm.gov/arm-iop/) provides a simple interface to the hierarchical directory structure in which the IOP data are stored. It allows the user to download individual files immediately, and provides a mechanism for the selection and subsequent download of entire directory trees in compressed tar or zip formats.
    • Access to the IOP data browser uses the Archive account name as the username and password for the IOP system
      • The login process enables the Archive to track data usage and provide follow up information about IOP data.
    • Additional information about IOP data can be found on the main ARM page at: http://www.arm.gov/campaigns. Information stored under the IOP web pages also includes direct links to the IOP data browser.
  • Contact Archive User Services
    • Help with locating ARM data from the Archive can be obtained by contacting Archive User Services via e-mail (armarchive@ornl.gov) or phone (888-276-3282 or 865-241-4851). Technical assistance about ARM instruments and measurements is also available from the ARM Working Group Translators. The Translators have a general knowledge of data related to their working group and are familiar with the research activities of many ARM PIs within their working group.
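To illustrate the catalog review option listed above, the sketch below counts files per month, site, and facility, which is the kind of summary the catalog is described as presenting. The file records and facility names are hypothetical, and the real catalog interface may organize its tables differently.

```python
from collections import Counter
from datetime import date

# Hypothetical file records: (site, facility, day covered by the file).
files = [
    ("SGP", "facility_1", date(2004, 1, 1)),
    ("SGP", "facility_1", date(2004, 1, 2)),
    ("SGP", "facility_2", date(2004, 1, 15)),
    ("SGP", "facility_1", date(2004, 2, 1)),
]


def monthly_catalog(records):
    """Count files per (year, month, site, facility)."""
    return Counter((d.year, d.month, site, facility) for site, facility, d in records)


catalog = monthly_catalog(files)
for key in sorted(catalog):
    print(key, catalog[key])
```

Because the day of the month is discarded in the summary, a monthly count cannot show which days within the month have data. That is the reason the browser interface may be more efficient for time ranges shorter than one month.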