NOAA, National Environmental Satellite, Data, and Information Service, National Climatic Data Center, U.S. Department of Commerce

The USHCN Version 2 Serial Monthly Dataset


Introduction

The United States Historical Climatology Network (U.S. HCN) is essentially a subset of the U.S. Cooperative Observer Network operated by NOAA's National Weather Service (NWS). The approximately 1200 HCN stations were originally selected according to factors such as record longevity, percentage of missing values, spatial coverage, and the number of station moves and/or other station changes that may affect data homogeneity. Most HCN stations are situated in rural areas or small towns; however, a smaller number are also part of the NOAA NWS synoptic network, whose stations are generally located at airports in more urbanized environments. HCN datasets have been developed at NOAA's National Climatic Data Center (NCDC) in collaboration with the Department of Energy's Carbon Dioxide Information Analysis Center (CDIAC).

The U.S. HCN project dates to the mid-1980s (Quinlan et al. 1987). At that time, in response to the need for an accurate, unbiased, modern historical climate record for the United States, personnel at the Global Change Research Program of the U.S. Department of Energy and at NCDC defined a network of 1219 stations in the contiguous United States whose observations would comprise a key baseline dataset for monitoring U.S. climate. Since then, the U.S. HCN dataset has been updated several times (e.g., Karl et al. 1990; Easterling et al. 1996). The U.S. HCN Version 2 serial monthly data release is the most recent update to the HCN datasets. Version 2 data were produced using a new set of quality control and homogeneity assessment algorithms. Two papers are in preparation that formally describe the Version 2 homogenization algorithm and provide an overall assessment of the Version 2 maximum and minimum temperature trends. In the meantime, a brief summary of HCN processing steps is provided below. The methodology used in previous releases of the Version 1 monthly data is described on the USHCN Version 1 web site.

Version 2 Data Processing Steps

The data from each HCN station were subjected to the following quality control and homogeneity testing and adjustment procedures.

Quality Evaluation and Database Construction

First, daily maximum and minimum temperatures and total precipitation were extracted from a number of different NCDC data sources and subjected to a series of quality evaluation checks. The three sources of daily observations were DSI-3200, DSI-3206, and DSI-3210. Daily maximum and minimum temperature values that passed the evaluation checks were used to compute monthly average values. However, no monthly temperature average or total precipitation value was calculated for station-months in which more than nine daily values were missing or flagged as erroneous. Monthly values calculated from the three daily data sources were then merged with two additional sources of monthly data values to form a comprehensive dataset of serial monthly temperature and precipitation values for each HCN station. Duplicate records between data sources were eliminated. Following the merging procedure, the monthly values from all stations were subjected to an additional set of quality evaluation procedures, which removed between 0.1 and 0.2% of monthly temperature values and less than 0.02% of monthly precipitation values.
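The monthly aggregation rule described above can be sketched as follows; the function name and data layout are illustrative, not NCDC's actual processing code.

```python
# Illustrative sketch of the monthly aggregation rule: a monthly mean is
# computed only when nine or fewer daily values in the month are missing
# or flagged as erroneous. Names and data layout are hypothetical.

def monthly_mean(daily_values, max_missing=9):
    """Average a month of daily values; return None if too many are missing.

    daily_values: one entry per day; None marks a value that is missing
    or failed quality evaluation.
    """
    valid = [v for v in daily_values if v is not None]
    n_missing = len(daily_values) - len(valid)
    if n_missing > max_missing:
        return None  # station-month is left missing
    return sum(valid) / len(valid)

# A 30-day month with 3 missing days still yields a mean;
# one with 10 missing days does not.
print(monthly_mean([10.0] * 27 + [None] * 3))   # 10.0
print(monthly_mean([10.0] * 20 + [None] * 10))  # None
```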

Time of Observation Bias Adjustments

Next, monthly temperature values were adjusted for the time-of-observation bias (Karl et al. 1986; Vose et al. 2003). The Time of Observation Bias (TOB) arises when the 24-hour daily summary period at a station begins and ends at an hour other than local midnight. When the summary period ends at an hour other than midnight, monthly mean temperatures exhibit a systematic bias relative to the local midnight standard (Baker, 1975). In the U.S. Cooperative Observer Network, the ending hour of the 24-hour climatological day typically varies from station to station and can change at a given station during its period of record. The TOB-adjustment software uses an empirical model to estimate and adjust the monthly temperature values so that they more closely resemble values based on the local midnight summary period. The metadata archive is used to determine the time of observation for any given period in a station's observational history. This adjustment is the first of several adjustments to the USHCN monthly temperature values.
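The mechanics of applying such an adjustment can be sketched as below. The actual model of Karl et al. (1986) estimates the bias empirically from a station's location, month, and observation hour; the offset values and metadata structure here are hypothetical placeholders used only to show the bookkeeping.

```python
# Minimal sketch of applying a time-of-observation bias adjustment.
# The offsets below are hypothetical; the real TOB model estimates them
# empirically per station, month, and observation hour.

def apply_tob_adjustment(monthly_means, obs_hours, offset_model):
    """Shift each monthly mean toward a local-midnight summary period.

    monthly_means: monthly mean temperatures
    obs_hours: ending hour of the climatological day for each month,
               taken from the station history metadata
    offset_model: maps an observation hour to an estimated bias
                  (offset is 0 when the day already ends at midnight)
    """
    return [t - offset_model.get(hour, 0.0)
            for t, hour in zip(monthly_means, obs_hours)]

# Hypothetical offsets: afternoon observers tend to carry a warm bias,
# morning observers a cool bias, relative to a midnight summary period.
offsets = {17: 0.4, 7: -0.3, 24: 0.0}
print(apply_tob_adjustment([15.0, 15.0, 15.0], [17, 7, 24], offsets))
```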

Homogeneity Testing and Adjustment Procedures

Following the TOB adjustments, the homogeneity of the TOB-adjusted temperature series is assessed. In previous releases of the U.S. HCN monthly dataset, homogeneity adjustments were performed using the procedure described in Karl and Williams (1987). This procedure was used to evaluate non-climatic discontinuities (artificial changepoints) in a station's temperature or precipitation series caused by known changes to a station, such as station relocations and equipment changes. Since knowledge of such changes comes from the station history metadata archive maintained at NCDC, the original U.S. HCN homogenization algorithm was known as the Station History Adjustment Program (SHAP).

Unfortunately, station histories are often incomplete, so artificial discontinuities in a data series may occur on dates with no associated record in the metadata archive. Undocumented station changes obviously limit the effectiveness of SHAP. To remedy the problem of incomplete station histories, the Version 2 homogenization algorithm addresses both documented and undocumented discontinuities.

The potential for undocumented discontinuities adds a layer of complexity to homogeneity testing. Tests for undocumented changepoints, for example, require different sets of test-statistic percentiles than those used in analogous tests for documented discontinuities (Lund and Reeves, 2002). For this reason, tests for undocumented changepoints are inherently less sensitive than their counterparts used when changes are documented. Tests for documented changes should, therefore, also be conducted where possible to maximize the power of detection for all artificial discontinuities. In addition, since undocumented changepoints can occur in all series, accurate attribution of any particular discontinuity between two climate series is more challenging (Menne and Williams, 2005).
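The sensitivity difference can be illustrated with a toy statistic: when the change date is documented, the shift statistic is evaluated at a single known date, but when it is undocumented, the test must take the maximum over every candidate date, so larger values arise by chance and larger critical values are needed. The statistic below is illustrative only, not the exact form used in the HCN algorithm.

```python
# Toy illustration of documented vs. undocumented changepoint testing.
# Documented: evaluate a shift statistic at one known date.
# Undocumented: maximize the same statistic over all candidate dates,
# which inflates the null distribution and demands larger percentiles.

def shift_statistic(series, k):
    """Absolute difference of means before/after a candidate changepoint k."""
    before, after = series[:k], series[k:]
    return abs(sum(before) / len(before) - sum(after) / len(after))

def max_shift_statistic(series):
    """Maximize the shift statistic over all interior candidate dates."""
    return max(shift_statistic(series, k) for k in range(1, len(series)))

series = [0.1, -0.2, 0.0, 0.2, 1.1, 0.9, 1.0, 1.2]
# Documented test: only the known change date (index 4) is evaluated.
print(shift_statistic(series, 4))
# Undocumented test: every date is a candidate; the maximum is reported.
print(max_shift_statistic(series))
```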

The USHCN Version 2 homogenization algorithm addresses these and other issues according to the following steps. At present, only temperature series are evaluated for artificial changepoints.

  1. First, a series of monthly temperature differences is formed between numerous pairs of station series in a region. The difference series are calculated between each target station series and a number (up to 40) of highly correlated series from nearby stations. In effect, a matrix of difference series is formed for a large fraction of all possible combinations of station series pairs in each localized region. The station pool for this pairwise comparison of series includes U.S. HCN stations as well as other U.S. Cooperative Observer Network stations.
  2. Tests for undocumented changepoints are then applied to each paired difference series. A hierarchy of changepoint models is used to distinguish whether the changepoint appears to be a change in mean with no trend (Alexandersson and Moberg, 1997), a change in mean within a general trend (Wang, 2003), or a change in mean coincident with a change in trend (Lund and Reeves, 2002). Since each difference series is composed of values from two station series, a changepoint date in any one difference series is temporarily attributed to both station series used to calculate the differences. The result is a matrix of potential changepoint dates for each station series.
  3. The full matrix of changepoint dates is then "unconfounded" by identifying the series common to multiple paired-difference series that have the same changepoint date. Since each series is paired with a unique set of neighboring series, it is possible to determine whether more than one nearby series share the same changepoint date.
  4. The magnitude of each relative changepoint is calculated using the most appropriate two-phase regression model (e.g., a jump in mean with no trend in the series, a jump in mean within a general linear trend, etc.). This magnitude is used to estimate the "window of uncertainty" for each changepoint date, since the most probable date of an undocumented changepoint is subject to some sampling uncertainty, the magnitude of which is a function of the size of the changepoint. Any cluster of undocumented changepoint dates that falls within overlapping windows of uncertainty is conflated to a single changepoint date according to
    1. a known change date as documented in the target station's history archive (meaning the discontinuity does not appear to be undocumented), or
    2. the most common undocumented changepoint date within the uncertainty window (meaning the discontinuity appears to be truly undocumented).
  5. Finally, multiple pairwise estimates of relative step change magnitude are re-calculated at all documented and undocumented discontinuities attributed to the target series. The range of the pairwise estimates for each target step change is used to calculate confidence limits for the magnitude of the discontinuity. Adjustments are made to the target series using the estimates for each discontinuity.
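The pairwise logic of steps 1 through 3 can be sketched as follows. Detection here is reduced to a toy mean-shift search, and the agreement threshold is arbitrary; the real algorithm uses a hierarchy of changepoint models with proper significance testing, and a station pool of up to 40 correlated neighbors.

```python
# Simplified sketch of steps 1-3: form target-minus-neighbor difference
# series, locate the strongest step in each, and attribute a changepoint
# to the target only when several pairs agree on the same date (the
# "unconfounding" idea). The detection method and threshold are toys.

def best_changepoint(diff_series):
    """Return the index maximizing the before/after mean shift."""
    def shift(k):
        before, after = diff_series[:k], diff_series[k:]
        return abs(sum(before) / len(before) - sum(after) / len(after))
    return max(range(1, len(diff_series)), key=shift)

def attribute_changepoints(target, neighbors, min_agreeing=2):
    """Dates flagged in multiple target-neighbor pairs are attributed
    to the target series rather than to its neighbors."""
    votes = {}
    for nbr in neighbors:
        diffs = [t - n for t, n in zip(target, nbr)]
        k = best_changepoint(diffs)
        votes[k] = votes.get(k, 0) + 1
    return sorted(k for k, v in votes.items() if v >= min_agreeing)

# Target with an artificial +1.0 step at index 5; neighbors are smooth,
# so every pairwise difference series flags the same date.
target = [0.0] * 5 + [1.0] * 5
neighbors = [[0.0] * 10, [0.1] * 10, [-0.1] * 10]
print(attribute_changepoints(target, neighbors))  # [5]
```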

Estimation of Missing Values

Following the homogenization process, estimates for missing data are calculated using a weighted average of values from highly correlated neighboring stations. The weights are determined using a procedure similar to the SHAP routine. This program, called FILNET, uses the results from the TOB and homogenization algorithms to obtain a more accurate estimate of the climatological relationship between stations. The FILNET program also estimates data across intervals in a station record where multiple discontinuities occur within a short time, which prevents the reliable estimation of appropriate adjustments.
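The infilling idea can be sketched as a correlation-weighted average of neighbor values; the weighting scheme below is a simplification of the actual FILNET procedure, and the weights shown are hypothetical.

```python
# Illustrative sketch of filling a missing monthly value with a weighted
# average of neighboring stations, in the spirit of FILNET. The weights
# stand in for correlation-derived weights; the real procedure is more
# involved.

def fill_missing(neighbor_values, weights):
    """Weighted average of available neighbor values.

    Weights might be derived from each neighbor's correlation with the
    target station; neighbors reporting None are skipped.
    """
    pairs = [(v, w) for v, w in zip(neighbor_values, weights) if v is not None]
    if not pairs:
        return None  # nothing to estimate from
    total_w = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_w

# Three neighbors report 14.0, 15.0, 16.0 with hypothetical weights.
print(fill_missing([14.0, 15.0, 16.0], [0.9, 0.8, 0.5]))
```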

Urbanization Effects

In the original HCN, the regression-based approach of Karl et al. (1988) was employed to account for urban heat islands. In contrast, no specific urban correction is applied in HCN Version 2 because the changepoint detection algorithm effectively accounts for any "local" trend at any individual station. In other words, the residual impact of urbanization and other changes in land use is likely small in HCN Version 2. Figure 1, the minimum temperature time series for Reno, Nevada, provides anecdotal evidence in this regard. In brief, the red line represents TOB-adjusted data, and the green line represents fully adjusted data. The TOB-adjusted data clearly indicate that the station in Reno experienced both major step changes (e.g., a move from the city to the airport during the 1930s) and trend changes (e.g., a growing urban heat island beginning in the 1970s). In contrast, the fully adjusted (homogenized) data indicate that both the step-type changes and the trend changes have been effectively addressed through the changepoint detection process used in HCN Version 2.


Figure 1. Difference between annual minimum temperatures at Reno, Nevada and the mean from 10 nearby stations. The red line indicates TOB adjusted data; the green line is based on the fully adjusted data. Units are °F.

References