Data Quality Program

Introduction

One of the goals of the ARM Climate Research Facility is to provide datastreams of quality suitable for scientific research. Maintaining data quality for a program of the size and complexity of the ARM Facility is a significant challenge; efforts toward this end have matured and evolved over the life of the Facility.

The Data Quality (DQ) Office was established in July 2000 to help coordinate the continued evolution and implementation of efforts to ensure the quality of the data collected by ARM’s field instrumentation. The DQ Office is responsible for ensuring that quality assurance results are communicated to (1) data users so that they may make informed decisions when conducting research with the data, and (2) ARM's infrastructure scientists, engineers, and site operators to facilitate improved instrument performance and thereby minimize the amount of unacceptable data collected.

Toward these goals, the DQ Office, instrument mentors, site scientists, and others in ARM help review and assess ARM's datastreams and write and submit appropriate Data Quality Reports (DQRs) for data users as needed. These groups also work closely with ARM Site Operations to impart information about data quality that will initiate troubleshooting and/or corrective maintenance activity. Work is also performed to devise and implement automated schemes to flag questionable data as they are processed into ARM data files. To learn about ARM datastream documentation and standard formatting and naming protocols, please refer to the Data Management and Documentation Plan.

In addition, the DQ Office participates in the collection and presentation of documentation on the data quality program to help achieve consistency of presentation within the ARM Facility and to make this information available and useful to data users. This includes web pages describing ARM’s instruments and measurements. Standardization of the quality assurance procedures that are applied to data is also an ongoing process in order to establish data quality baselines and protocols for each instrument that are consistent across the facility sites.

The results of routine datastream checking, often on an instrument-by-instrument basis, allow us to identify known problems and questionable data and to communicate this information to data users. We also communicate how we made these determinations. In general, we deem unflagged data acceptable for most purposes. Higher-level datastream intercomparisons and the generation of value-added products (VAPs) can augment the routine checking by giving us an idea of the relative utility of datastreams and, in essence, telling us "how good" the acceptable data might be. These higher-level checks can also point out deficiencies that are not necessarily detectable within individual datastream checks. The creation of the higher-level products also provides the user community with heavily screened, best-estimate data sets ready for use in high-level scientific research.

The sections below provide more description and documentation of the various parts of ARM's DQ program. This page and the instrument- and measurement-specific data quality information contained within the instrument web pages are living documents that allow us to provide updates, changes, and notices of progress.

Data Quality Health and Status – Working Display of Current and Recent Quality Control Results

The DQ Office has a website dedicated to displaying data quality health and status information in near real time so that those involved in quality assurance activities can monitor data quality.

Within this site you can find color-coded hourly summaries of data quality flags produced on a daily basis with accompanying diagnostic plots (DQ Explorer), a specialized diagnostic plots browser with thumbnail views (DQ PlotBrowser), an interactive plotting capability (NCVweb), a data quality documentation Wiki, and links to supporting information such as instrument and maintenance logs and various reports related to the data quality effort.
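
As a rough illustration of how per-sample flags could be rolled up into color-coded hourly summaries, the Python sketch below assigns each hour a color based on the fraction of its samples that carry a flag. The function name, the color names, and the 50% threshold are assumptions made for this example; they are not taken from DQ Explorer.

```python
from collections import defaultdict


def hourly_summary(samples, red_fraction=0.5):
    """Summarize (hour, flag) pairs as one color per hour.

    A flag of 0 means the sample passed every check. The color scheme and
    the red_fraction threshold are hypothetical, chosen for illustration.
    """
    total = defaultdict(int)
    flagged = defaultdict(int)
    for hour, flag in samples:
        total[hour] += 1
        if flag != 0:
            flagged[hour] += 1

    colors = {}
    for hour in sorted(total):
        fraction = flagged[hour] / total[hour]
        if fraction == 0:
            colors[hour] = "green"      # no flagged samples in this hour
        elif fraction < red_fraction:
            colors[hour] = "yellow"     # some samples flagged
        else:
            colors[hour] = "red"        # most or all samples flagged
    return colors
```

Only hours that contain at least one sample appear in the result; a real display would presumably also need a way to mark hours with missing data.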

Instrument Web Pages – Specifications and Expectations

The ARM Climate Research Facility has created detailed web pages that describe the specifications of its instruments. A fundamental property of quality control is a statement of expectations. Quality is the measure of how closely something conforms to an expectation. Without an expectation, a quality assessment is not possible, and so the expectation serves as the baseline against which the observations can be compared. The instrument handbook contained on each instrument web page includes our current understanding of the measurement systems and their quirks and deficiencies, including common problems encountered or inherent to the measurement.

For a quick summary of ARM instruments at each field site, view the Instrument Location Table or the alphabetized list. To learn about the specific data the ARM instruments collect, see Measurements.

Automated Quality Control – In-File Flagging

Automated Quality Control refers to data quality checks that are applied when datastreams are processed by ARM, generally during ingest into the site data systems. For most ARM instruments at present, data are checked for violations of simple physical limits (minimum, maximum) and maximum change (delta) on each data field. Samples that exceed these criteria are flagged, and these flags are included in the netCDF files. Some instrument data files contain the results of more sophisticated quality checks. The results of automatic flagging are shown as color codes in the ARM Data Archive’s Data Browser.
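
The minimum, maximum, and delta checks described above can be sketched in a few lines of Python. This is a minimal illustration, not ARM's ingest code: the function name, the bit values used for the flags, and the example limits are all assumptions, and the actual flag encoding for a given datastream is defined in its file header and instrument handbook.

```python
# Hypothetical bit values for illustration only.
FLAG_MIN = 1    # value is below the allowed minimum
FLAG_MAX = 2    # value is above the allowed maximum
FLAG_DELTA = 4  # change from the previous sample exceeds the allowed delta


def flag_samples(values, vmin, vmax, max_delta):
    """Return one integer flag per sample; 0 means every check passed."""
    flags = []
    previous = None
    for value in values:
        flag = 0
        if value < vmin:
            flag |= FLAG_MIN
        if value > vmax:
            flag |= FLAG_MAX
        # The delta check needs a previous sample, so the first is skipped.
        if previous is not None and abs(value - previous) > max_delta:
            flag |= FLAG_DELTA
        flags.append(flag)
        previous = value
    return flags


# Example: air temperature in degrees C with made-up limits.
temps = [20.0, 20.5, 35.0, 21.0, -60.0]
print(flag_samples(temps, vmin=-40.0, vmax=50.0, max_delta=10.0))
```

In this sketch a single spike trips the delta check twice, once on the jump up and once on the return, so a user screening on the delta flag alone would discard the good sample that follows a bad one.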

Information about flagging is included in the header (known as a Data Object Design, or DOD) of each netCDF file. The data user needs to carefully read and consider this header information and any quality flags when processing their data.

The format in which the ARM Facility stores quality-control information in data files changed in the spring of 2001 to improve the consistency of data representation across the fixed ARM research sites. Details about flagging for an instrument can be found on the instrument’s web page in its instrument handbook, in Section 5 (“Data Description and Examples”).

Quality Control Applied by Instrument Mentors, Site Scientists, and the Data Quality Office – "Value-Added" Checking

Instrument Mentors are charged with developing the technical specifications for instruments procured for the ARM Climate Research Facility. The instrument mentor then tests and operates the instrument system either at his or her location or at one of ARM's research sites. In addition, the mentor works with data system personnel on data ingest software requirements. Data ingest involves the conversion of datastreams to the International System of Units (SI), as well as the acquisition of parameters that can be used to monitor instrument performance (e.g., monitoring an instrument's output voltage for a 5-V power supply or the continuity of the wire in a hot-wire anemometer). Therefore, data collection and ingest are the focus of this first level of data quality assurance. At this level, instrument mentors and site operators routinely monitor the quality of the data. Instrument mentors provide all calibration, operations, and maintenance documents and lists of spare parts to Site Operations. Typically, the mentor provides additional detailed documentation and hands-on training so site operators can offer appropriate support.

The next level of mentor data quality assurance involves beta release of datastreams from individual instruments. The mentor receives the data from the instrument to determine whether the technical specifications of the instrument are being met. When the mentor is satisfied that the instrument is functioning properly and the technical specifications have been met, the data are formally released for use. After this release, the instrument mentor is charged with reviewing and reporting on the state of the instrument's data on a monthly basis. Mentors report their findings to data users via a monthly summary report and in DQRs. Mentors also provide the DQ Office with guidance on how to perform data quality inspections and assessments. Instrument mentors and Site Operations personnel are the key players in the problem resolution process. Refer to the web page of each instrument for specific data quality information, contained within Section 6 (“Data Quality”) of each instrument handbook.

The DQ Office then provides the first line of defense for routine data inspection and assessment activities for the production data set collected from all ARM sites by performing daily to weekly checks of the data. The main goal of this inspection and assessment is to provide quick notification to site operators and instrument mentors whenever irregularities in data quality are found, in order to minimize the amount of unacceptable data collected at a field site. The DQ Office also extracts data quality information from VAPs (see below) to further evaluate the quality of the instrument datastreams that serve as input, and to validate the fidelity of the VAPs themselves. To this end, the DQ Office works closely with VAP translators, who translate the scientific algorithms developed by principal investigators to the rest of the ARM infrastructure.

DQ Office inspections are based on the results of in-file flagging, additional DQ Office-applied tests, diagnostic plots, and other supporting evidence such as Site Operations maintenance information and reports. Datastreams from different instruments often are compared, and the overall quality of the data is assessed based on guidelines developed or prescribed by instrument mentors and DQ Office staff. When problems are encountered, mentors, site operators, and site scientists are notified to begin the problem resolution process and/or to investigate further (see next section). The DQ Office issues weekly instrument assessment reports to instrument mentors and site operators. Problem reports are issued as needed.

Site Scientist data quality assessment efforts may involve in-depth evaluation of both individual and multiple datastreams to address a data quality issue specific to their site. These efforts complement and augment those of instrument mentors and the DQ Office. Site scientists may also perform research on topics related to site data quality issues. Site scientists are also responsible for making sure that the problem resolution process is conducted in a timely manner at their sites.

Data Quality Reporting – Notification of Problems to Data Users

A Data Quality Problem Report (DQPR) allows the DQ Office, instrument mentors, site operators, site scientists, and others within the ARM infrastructure to formally submit problem reports in order to initiate and track the problem resolution process. Oversight of the process is performed by the respective site scientists and the DQ Office manager. A DQPR is closed when the problem is solved, with the final act being the writing of a DQR if necessary. If a problem cannot be solved within a reasonable amount of time (usually 30–45 days) or is catastrophic in nature, it will be escalated within the DQPR system to the attention of the ARM Problem Review Board, which meets bi-weekly to review outstanding problems.
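
The escalation rule in the preceding paragraph can be written as a simple check. The sketch below is illustrative only: the function name and the 45-day default are assumptions (the text gives a usual range of 30–45 days rather than a fixed limit), and it does not describe how the DQPR system itself is implemented.

```python
from datetime import date


def should_escalate(opened, today, catastrophic=False, limit_days=45):
    """Return True if an open problem report should go to the review board.

    limit_days is a hypothetical cutoff picked from the "usually 30-45 days"
    range; a catastrophic problem is escalated regardless of its age.
    """
    age_days = (today - opened).days
    return catastrophic or age_days > limit_days


# A report opened on 1 January 2024, checked on two later dates.
print(should_escalate(date(2024, 1, 1), date(2024, 1, 20)))
print(should_escalate(date(2024, 1, 1), date(2024, 3, 1)))
```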

A DQR is a written statement about the quality of data within a particular datastream. The information may be quite simple (e.g., stating an instrument system was turned off and the data do not exist) or quite complex (e.g., providing detailed analyses and equations that should be used to adjust the instrument's data). At present, when a person orders ARM data from the ARM Data Archive, all DQRs written on the desired datastreams are attached to the data order. DQRs are typically written by the appropriate instrument mentor, who has the final word on the data quality of his or her instrument. These DQRs, as described earlier, provide a quality designation beyond that of in-file flagging.

Value-Added Products and Best Estimates – Second Generation Datastreams

Many of the scientific needs of the ARM Climate Research Facility are met through the analysis and processing of existing datastreams into VAPs. Despite extensive instrumentation deployed at the ARM sites, there may be quantities of interest that are either impractical or impossible to measure directly or routinely. Physical models using ARM instrument data as inputs are implemented as VAPs and can help fill some of the unmet measurement needs of the Facility. In addition, when more than one measurement of a particular variable is available at a site, ARM also produces "best-estimate" VAPs for that measurement. Information about each VAP can be found on the VAP’s web page and in the accompanying technical report, which is linked from its page if available. In addition, the operational mode and status of each VAP can be found on the ARM VAP status page.
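
To make the idea of a "best estimate" from several measurements concrete, the Python sketch below averages, at each time step, only those values whose quality flag is zero. This is a generic illustration under stated assumptions: the function name, the input layout, and the choice of a simple mean are hypothetical, and the algorithms actually used by ARM's best-estimate VAPs are described in each VAP's technical report.

```python
def best_estimate(series):
    """Combine several instruments' measurements of the same variable.

    series is a list of (values, flags) pairs, one pair per instrument,
    all of equal length. A flag of 0 marks a sample that passed its checks.
    Returns the mean of the unflagged values at each time step, or None
    where every instrument's sample is flagged.
    """
    if not series:
        return []
    length = len(series[0][0])
    result = []
    for i in range(length):
        good = [values[i] for values, flags in series if flags[i] == 0]
        result.append(sum(good) / len(good) if good else None)
    return result


# Two hypothetical instruments measuring the same quantity.
instrument_a = ([10.0, 12.0, 50.0, 13.0], [0, 0, 2, 4])
instrument_b = ([12.0, 99.0, 14.0, 13.5], [0, 2, 0, 1])
print(best_estimate([instrument_a, instrument_b]))
```

Returning None rather than a fill value keeps time steps with no acceptable data distinguishable from genuine measurements.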

Data Plots – Online Visualization of Datastreams

Visualizations of collected data of varying sophistication are available. Some of these data plots are intended, and are more useful, for operational diagnostic purposes, while others are useful for scientific inquiry. An online compendium of data plots from across the ARM Facility is available for browsing.

Data User Notes – General Online Guidance for the Data User

Several situations may arise during instrument operation that can affect the quality of the data, but may not be flagged or otherwise corrected; the user needs to be aware of them. Some of these instances may be documented in general DQRs and will be attached to data orders.

Whether or not a DQR exists, these instances are documented within the instrument handbook found on each instrument web page. Such user advice can be found in Section 5 (“Data Description and Examples”) under the topics "User Notes and Known Problems" and "Frequently Asked Questions."

The data user is urged to read and heed such information when available. Site-specific issues, based on the vagaries of measurements in diverse locations such as the tropics, midlatitudes, and polar regions, will be called out in these websites or in general DQRs, if they exist.

Calibration and Maintenance Information – Expected and Actual Instrument Performance

The ARM Climate Research Facility provides information on calibration and maintenance procedures and the results of applying such procedures. Some of this information is available in Section 7 (“Instrument Details”) of the instrument handbook on each instrument web page.
