GLOBALVIEW: Data Extension

The GLOBALVIEW products are derived using the data extension and integration techniques described by Masarie and Tans [1995]. These techniques were first developed using CO2 measurements and have since been applied to measurements of CH4, CO, and other atmospheric trace constituents. Extended records are derived from measurements but contain no actual data. To facilitate use in carbon cycle modeling studies, the measurements have been processed (smoothed, interpolated, and extrapolated), resulting in extended records that are evenly spaced in time.

Data extension attempts to address the issue of temporal discontinuities in and among measurement records by characterizing what we have learned from actual observations and extending this knowledge in time beyond the observations themselves. Thus far, "knowledge" has been defined as average behavior, i.e., average seasonal cycle and trends, and how, on average, a site differs from other sites that are nearby in latitude. It is this average behavior that is extended beyond the measurements.

Creating an Extended Record

We use NOAA ESRL atmospheric CO2 measurements from samples collected in the Azores (AZR) to describe the data extension technique. In this example, we construct an extended CO2 record at AZR spanning the period January 1, 1979 to January 1, 2007. In the top panel of the figure below, a smooth curve is fitted to the actual observations from weekly air samples collected at AZR (open squares). Because we want an uninterrupted, synchronized time series, we extract values from the smooth curve at 48 equal time steps per year (an interval of about 7.6 days) from 1979 to 2007. If there were no gaps in the observational record, we would be done. However, the AZR CO2 record does contain measurement gaps, the most significant occurring in 1989-1990 and 1992-1994.
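The time grid described above can be sketched as follows. This is only an illustration: the function name `time_grid`, the use of decimal years, and the choice to include the final endpoint are assumptions, not details taken from the GLOBALVIEW processing code.

```python
STEPS_PER_YEAR = 48  # "weekly" steps per year, as stated in the text


def time_grid(start_year, end_year, steps_per_year=STEPS_PER_YEAR):
    """Return evenly spaced decimal-year time steps from start_year to end_year.

    Including the final endpoint is an assumption made for this sketch.
    """
    n_steps = (end_year - start_year) * steps_per_year
    return [start_year + i / steps_per_year for i in range(n_steps + 1)]


grid = time_grid(1979, 2007)

# Approximate spacing in days, using an average year length of 365.25 days.
interval_days = 365.25 / STEPS_PER_YEAR
```

With these assumptions the grid runs from 1979.0 to 2007.0, and the spacing rounds to the 7.6 days quoted in the text.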

Figure 1. Development of the extended record at Azores (AZR). See text for discussion.

There are several ways one might fill in missing data. We could allow the curve fitting routine to perform an interpolation, but this approach produces values based only on the average seasonal pattern and trends derived from the entire AZR CO2 record. This would be a reasonable strategy if all we had were observations from AZR, but, in fact, there are many long-term high-precision CO2 records available from many laboratories throughout the world. Alternatively, we could use a multi-dimensional transport model and all available atmospheric CO2 observations to create a matrix of CO2 values defined everywhere in space and time (data assimilation), and then extract values from the matrix wherever gaps occur. This would also be a reasonable strategy except that, at present, model comparison experiments show that the results differ from model to model [e.g., Gurney et al., Nature, Vol. 415, 7 February 2002].

The data extension procedure relies only on observations... and of course, a few key assumptions!

What have we learned from observations at AZR? In Figure 1, panel 1 above, we see a seasonal pattern and long-term trend in the CO2 observations at AZR (open symbols). We also note that both the seasonal pattern and trend vary in time. We can model these features by fitting curves to the data. The smooth curve, S(t), captures both the seasonal pattern and long-term trend; the trend curve, T(t), captures the long-term trend (seasonal pattern removed).
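To make the distinction between S(t) and T(t) concrete, here is a deliberately simplified sketch. It is not the curve-fitting routine used for GLOBALVIEW: it assumes a quadratic trend plus two annual harmonics fitted by ordinary least squares, and all function names and the synthetic "observations" are made up for illustration. A fixed-form fit like this cannot follow year-to-year changes in the seasonal pattern or trend, which the text notes are present in the real record.

```python
import math


def design_row(t, n_poly=3, n_harm=2):
    """Basis at time t (years since a reference year): polynomial terms, then harmonics."""
    row = [t ** k for k in range(n_poly)]
    for h in range(1, n_harm + 1):
        row += [math.sin(2 * math.pi * h * t), math.cos(2 * math.pi * h * t)]
    return row


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(times, values, n_poly=3, n_harm=2):
    """Least-squares coefficients from the normal equations (X^T X) c = X^T y."""
    rows = [design_row(t, n_poly, n_harm) for t in times]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(p)]
    return solve(xtx, xty)


def smooth_curve(coef, t, n_poly=3, n_harm=2):
    """S(t): long-term trend plus seasonal harmonics."""
    return sum(c * b for c, b in zip(coef, design_row(t, n_poly, n_harm)))


def trend_curve(coef, t, n_poly=3):
    """T(t): polynomial part only, i.e. the seasonal pattern removed."""
    return sum(coef[k] * t ** k for k in range(n_poly))


# Synthetic, made-up record: linear growth plus an annual cycle, 48 steps per year.
times = [i / 48 for i in range(480)]
values = [350.0 + 1.5 * t + 3.0 * math.sin(2 * math.pi * t) for t in times]
coef = fit(times, values)
```

Evaluating `smooth_curve(coef, t)` and `trend_curve(coef, t)` at the same time shows the seasonal contribution as their difference.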

How do observations at AZR compare with observations nearby in latitude? We would like to compare the AZR observations with a reference time series, but what reference should we use? The data extension procedure constructs a reference using all available observations from marine boundary layer (MBL) sampling sites.

We must first construct a matrix of CO2 mixing ratios as a function of time and sine of latitude. The matrix is derived from all available observations from locations that sample large volumes of well-mixed marine boundary layer (MBL) air. We fit a smooth curve to each MBL time series (similar to the one shown in Figure 1, panel 1 for AZR). Next, we extract values from each curve at "weekly" intervals for the period 1979 to 2007, but only where measurements exist. We then step through each week to construct a weekly distribution of CO2 mixing ratios as a function of sine latitude using all available MBL values. We fit a curve to this north-south distribution and extract values from this curve at 0.05 sine latitude intervals. Lastly, we "glue" these weekly north-south values together to construct the MBL reference matrix (Figure 2).
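The assembly of the reference matrix can be sketched as below. The text does not specify the form of the north-south curve fit, so this sketch substitutes piecewise-linear interpolation, held flat beyond the outermost sites; that substitution, the function names, and the example site names and values are all assumptions made for illustration.

```python
import math

# Sine-latitude grid at 0.05 intervals, from -1.00 to 1.00 (41 points).
SINE_LAT_GRID = [round(-1.0 + 0.05 * i, 2) for i in range(41)]


def interpolate(points, x):
    """Piecewise-linear stand-in for the north-south curve fit; flat beyond the end sites."""
    points = sorted(points)
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def mbl_reference_matrix(site_latitudes, weekly_values):
    """Build matrix[week][grid index] from per-site smooth-curve values.

    site_latitudes: {site: latitude in degrees}
    weekly_values:  one dict per week, {site: value}; a site is omitted in
                    weeks where it has no measurements.
    """
    matrix = []
    for week in weekly_values:
        points = [(math.sin(math.radians(site_latitudes[s])), v) for s, v in week.items()]
        matrix.append([interpolate(points, x) for x in SINE_LAT_GRID])
    return matrix


# Hypothetical sites and values, two weeks; SITE_S has no data in week 2.
latitudes = {"SITE_N": 30.0, "SITE_S": -30.0}
weeks = [{"SITE_N": 360.0, "SITE_S": 350.0}, {"SITE_N": 360.0}]
matrix = mbl_reference_matrix(latitudes, weeks)
```

A reference time series for a given site is then the column of this matrix nearest the site's sine latitude (or an interpolation between neighboring columns).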

Figure 2. CO2 MBL Reference Matrix. Only the most recent 10 years are shown.

We can now extract a reference time series from the MBL matrix at the latitude of AZR (Figure 1, panel 2). To see how observations at AZR differ from this reference, we subtract the MBL reference, REF_AZR(t), from the smooth values, S_AZR(t), for periods where AZR observations actually exist. This difference distribution, Δ_AZR,REF(t), is shown in blue in Figure 1, panel 3. We represent this distribution by a curve fit. The curve, δ_AZR,REF(t), tells us how, on average, measurements at AZR differ from the MBL reference. We extract values from this curve to fill gaps in the difference distribution. Remember, these gaps in the difference distribution exist because of actual gaps in the AZR observations. The filled-in values are shown in red in Figure 1, panel 3.
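A minimal sketch of the difference distribution and its gap filling follows. Two assumptions are made here. First, the difference is taken as site minus reference, S(t) - REF(t), so that adding it back to the reference recovers the smooth values. Second, the fitted curve is replaced by a much cruder stand-in, the mean difference at each step of the year; the actual curve fit is not reproduced. Function names and numbers are illustrative only.

```python
def difference_distribution(smooth, reference):
    """S(t) - REF(t) at each time step; None where the site has no observations."""
    return [None if s is None else s - r for s, r in zip(smooth, reference)]


def average_by_step(diff, steps_per_year=48):
    """Stand-in for the fitted curve: mean observed difference at each step of the year."""
    sums = [0.0] * steps_per_year
    counts = [0] * steps_per_year
    for i, d in enumerate(diff):
        if d is not None:
            sums[i % steps_per_year] += d
            counts[i % steps_per_year] += 1
    # A step of the year that was never observed stays None.
    return [s / c if c else None for s, c in zip(sums, counts)]


def fill_gaps(diff, steps_per_year=48):
    """Keep observed differences unchanged; fill gaps from the average curve."""
    avg = average_by_step(diff, steps_per_year)
    return [avg[i % steps_per_year] if d is None else d for i, d in enumerate(diff)]


# Tiny made-up example with 4 steps per year and two gaps (None).
smooth = [351.0, 352.0, None, 354.0, 353.0, None, 355.0, 356.0]
reference = [350.0] * 8
diff = difference_distribution(smooth, reference)
filled = fill_gaps(diff, steps_per_year=4)
```

In this toy example the gap at index 2 is filled from the only other observation at that step of the year, and likewise for index 5; observed differences pass through untouched.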

Finally, we construct the extended record at AZR by adding the MBL reference (Figure 1, panel 2) to the difference distribution (Figure 1, panel 3). The resulting extended record is shown in Figure 1, panel 4. Note that the blue symbols in panel 4 are exactly the smooth values extracted at weekly time steps from the AZR record where AZR observations exist (panel 1). The red symbols are interpolated and extrapolated values derived for periods where observations at AZR do not exist.
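The construction can be summarized in a few lines of notation. The symbols D and EXT are introduced here only for illustration and do not appear in the original description; the difference is written as site minus reference, the convention under which the blue values in panel 4 reproduce the smooth curve exactly.

```latex
\begin{align*}
\Delta_{\mathrm{AZR,REF}}(t) &= S_{\mathrm{AZR}}(t) - \mathrm{REF}_{\mathrm{AZR}}(t)
  && \text{(steps where AZR observations exist)} \\[4pt]
D_{\mathrm{AZR}}(t) &=
  \begin{cases}
    \Delta_{\mathrm{AZR,REF}}(t) & \text{where observations exist} \\
    \delta_{\mathrm{AZR,REF}}(t) & \text{in gaps (value from the fitted curve)}
  \end{cases} \\[4pt]
\mathrm{EXT}_{\mathrm{AZR}}(t) &= \mathrm{REF}_{\mathrm{AZR}}(t) + D_{\mathrm{AZR}}(t) =
  \begin{cases}
    S_{\mathrm{AZR}}(t) & \text{where observations exist} \\
    \mathrm{REF}_{\mathrm{AZR}}(t) + \delta_{\mathrm{AZR,REF}}(t) & \text{in gaps}
  \end{cases}
\end{align*}
```

The first case of the last line is why the observed portions of the extended record coincide with the smooth curve: the reference cancels. Only the gap-filled portions depend on the MBL reference and the average difference curve.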

How confident are we in the extended records? Unlike data assimilation schemes, data extension is model-independent, based entirely on actual data, relatively straightforward (and thus reproducible), and robust - improving with both time and increasing spatial coverage. A limitation of this technique is in the assumption that average behavior is constant in time. This assumption is most vulnerable in regions such as southern China where rapid economic development will likely result in an increasing regional "bump" in CO2 concentrations. With time, this "bump" will likely become the trend in average behavior relative to other sites and our assumption will again be reasonable. A second limitation is that extended values do not capture synoptic scale events because the method depends on relationships that are averaged over time.

Confidence in the smoothed values depends on the density of the data, the relative occurrence of rejected data, the "scatter" in the data, the type and number of corrections applied, and the length of the measurement period. Masarie and Tans [1995] describe in detail the relative weighting scheme and provide an example of how extended records and relative weights have been used in a 2-D modeling application. Users may choose to ignore our weighting scheme; sufficient information is included in the weight files so that users may devise their own weighting scheme.

For a comprehensive description of the data extension procedure, see Masarie and Tans [1995] and the Version History (GLOBALVIEW-CO2, 2000).