2002 Progress Report: Integrating Numerical Models and Monitoring Data

EPA Grant Number: R829402C002
Subproject: this is subproject number 002, established and managed by the Center Director under grant R829402
(EPA does not fund or establish subprojects; EPA awards and manages the overall grant for this center).

Center: Center for Integrating Statistical and Environmental Science
Center Director: Stein, Michael
Title: Integrating Numerical Models and Monitoring Data
Investigators: Stein, Michael; Amit, Yali; Beletsky, Dmitry; Kotamarthi, V. Rao; Lesht, Barry; Schwab, David
Current Investigators: Stein, Michael; Amit, Yali; Beletsky, Dmitry; Chen, Li; Kotamarthi, V. Rao; Lesht, Barry; Nakamura, Noboru; Schwab, David; Stroud, Jonathan; Zhang, Zepu
Institution: University of Chicago
Current Institution: Argonne National Laboratory; National Oceanic and Atmospheric Administration; University of Chicago; University of Michigan; University of Pennsylvania
EPA Project Officer: Smith, Bernice
Project Period: March 12, 2002 through March 11, 2007
Project Period Covered by this Report: March 12, 2002 through March 11, 2003
RFA: Environmental Statistics Center (2001)
Research Category: Environmental Statistics, Ecological Indicators/Assessment/Restoration

Description:

Objective:

The objective of this research project is to develop statistical approaches to problems in which both monitoring data and output from a physical model are available to assess the state of the physical environment. This project can be organized into eight subprojects, and sections of this report correspond to these subprojects. The subprojects cover a broad range of environmental applications, including air pollution monitoring, stratospheric ozone, adjustment of emissions inventories, sediment transport in Lake Michigan, and chlorophyll levels in Lake Michigan.

The development of statistical models and methods for spatial-temporal processes is central to much of this project, and the area perhaps most in need of advancement is the application of statistics to air and water pollution. We have been addressing this area from theoretical and practical perspectives, with each perspective challenging and supporting the other. One particularly challenging problem that arises in many of the subprojects is the development of statistical models for the errors made by deterministic numerical models, which is of great importance to describing and understanding why models are not successful, and is a critical component of developing effective data assimilation schemes for pollution models.

Subprojects B, C, and D have active collaborations with U.S. Environmental Protection Agency (EPA) scientists. Principal Investigator (PI) Michael Stein met with all of the collaborators on a June visit to Research Triangle Park (RTP), NC, and several doctoral students also have visited EPA, including Mikyoung Jun, who spent the summer of 2002 at RTP. Jason Ching of the EPA recently made a highly productive visit to Chicago, and we hope that our other collaborators will be able to visit us during the coming year.

Progress Summary:

A. Space-Time Covariance Functions

We are studying theoretical properties of covariance functions for processes varying in space and time, which are fundamental to the statistical analysis of environmental data. As part of this project, we are developing new classes of space-time covariance functions, studying their properties, and applying these new models to environmental processes. Michael Stein is the PI for this subproject.
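As a point of reference for the kind of object studied here, the sketch below builds the simplest textbook space-time covariance, a separable product of an exponential spatial term and an exponential temporal term, and confirms by Cholesky factorization that the resulting covariance matrix is positive definite. It is an illustrative baseline only and is not one of the new classes developed in this subproject; the function names, parameter values, and sample locations are all hypothetical.

```python
import math


def separable_cov(h, u, sigma2=1.0, range_s=100.0, range_t=2.0):
    """Covariance for spatial distance h and time lag u (separable exponential model)."""
    return sigma2 * math.exp(-abs(h) / range_s) * math.exp(-abs(u) / range_t)


def cov_matrix(points, **params):
    """Covariance matrix (nested lists) for a list of (x, y, t) observation points."""
    n = len(points)
    K = [[0.0] * n for _ in range(n)]
    for i, (xi, yi, ti) in enumerate(points):
        for j, (xj, yj, tj) in enumerate(points):
            h = math.hypot(xi - xj, yi - yj)
            K[i][j] = separable_cov(h, ti - tj, **params)
    return K


def cholesky(K):
    """Lower-triangular Cholesky factor; raises ValueError if K is not positive definite."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


# Hypothetical observation points: (x in km, y in km, t in days).
points = [(0, 0, 0), (50, 0, 0), (0, 80, 1), (120, 40, 3)]
K = cov_matrix(points)
L = cholesky(K)  # succeeds only if K is a valid (positive definite) covariance matrix
```

A separable model forces the spatial correlation structure to be the same at every time lag, which is one reason richer classes of space-time covariance functions are of interest.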

B. Comparing Community Multiscale Air Quality (CMAQ) Model Output to Monitoring Data

We are developing broadly applicable and easily interpretable ways of comparing CMAQ output to monitoring data that consider the space-time nature of environmental processes. The specific application under current study is comparing model output and data for daily sulfate levels. The PIs for this subproject are Rao Kotamarthi and Michael Stein. Mikyoung Jun is a graduate student working on the subproject. The EPA collaborators for this effort are Peter Finkelstein and Robin Dennis.

C. Correcting Emissions by Comparing Model Output and Monitoring Data

Persistent inconsistencies between CMAQ output and monitoring data may be largely due to problems with the emissions inventory. We are developing statistical methods for adjusting emissions that consider the spatial dependence and outlying values shown in the differences between model output and data. The PI for this subproject is Michael Stein. Hae-Kyung Im is a graduate student working on the effort, and the EPA collaborator is Alice Gilliland.

D. Statistical Issues Arising in the Study of High-Resolution Versions of CMAQ

Our EPA collaborator, Jason Ching, has been developing a version of CMAQ that runs at very high resolution. We have begun investigating methods for describing the space-time variations in air pollution from this model and for comparing this high resolution output with lower resolution output. These methods focus on the use of empirical variograms and related quantities to describe the differences between the space-time variations of models at different resolutions. Michael Stein is the PI for this subproject. Xiaofeng Shao is a graduate student working on the effort, and Jason Ching is the EPA collaborator.
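For readers unfamiliar with empirical variograms, the following minimal sketch computes the classical estimator, half the average squared difference between pairs of values grouped by separation distance. The report does not specify which estimator or binning the subproject uses, so the lag-class rule, the function name, and the toy transect of differences are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


def empirical_variogram(coords, values, bin_width):
    """Return {k: (mean_distance, semivariance, pair_count)} for each lag class k.

    Each pair is assigned to the lag class k whose center k * bin_width
    is nearest to the pair's separation distance.
    """
    sq_sum = defaultdict(float)
    dist_sum = defaultdict(float)
    count = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            k = round(h / bin_width)
            sq_sum[k] += (values[i] - values[j]) ** 2
            dist_sum[k] += h
            count[k] += 1
    return {
        k: (dist_sum[k] / count[k], sq_sum[k] / (2 * count[k]), count[k])
        for k in sorted(count)
    }


# Toy transect: hypothetical differences between two model outputs at unit spacing.
coords = [(float(i),) for i in range(5)]
diff = [0.0, 1.0, 0.0, 1.0, 0.0]
vario = empirical_variogram(coords, diff, bin_width=1.0)
```

Because the toy series alternates, its semivariance oscillates with lag (0.5 at odd lags, 0 at even lags), a simple picture of how a regular pattern in the differences shows up in a variogram.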

E. Data Assimilation in Hydrodynamic Models

We are addressing a number of challenges in applying data assimilation methods to combining remotely sensed observations with a sediment transport program on Lake Michigan. Some of the challenges include the spatially dependent observation errors, the strongly nonlinear relationship between the observed reflectances and the sediment levels, the potentially large number of observations at a time (more than 10,000), and the critical impact of resuspensions on sediment levels. We have begun work on describing the discrepancies between observations and model output through spatial deformations. The idea is to use the hydrodynamic model to generate classes of spatial deformations that are guaranteed to be invertible mappings from Lake Michigan to itself. The PIs for this subproject are Dmitry Beletsky, Barry Lesht, Dave Schwab, and Michael Stein. In addition, Jon Stroud from the University of Pennsylvania, who is not currently funded by the Center for Integrating Statistical and Environmental Science (CISES), works on this project.

F. Estimating Deformations of Stationary Random Processes

Many environmental processes show evidence of nonstationarity in space. Most efforts to date on estimating such nonstationarities assume that one has many replicates of the process over time. However, with remotely sensed or other large datasets, it may be feasible to estimate nonstationarities from a small number (perhaps one) of replications over time, thus removing the need to assume stationarity and/or independence across time. We are developing methods for representing spatial deformations, statistical approaches for estimating these deformations based on a single dense realization of the process, and effective algorithms for computing these estimates. The PIs for this subproject are Yali Amit and Michael Stein. A graduate student, Ethan Anderes, also is working on the subproject.

G. Statistical Analysis of Phytoplankton in Lake Michigan

As part of the Episodic Events Great Lakes Experiment, chlorophyll levels have been indirectly observed along vertical cross sections of Lake Michigan via a fluorometric technique. This project addresses a number of problems raised by these data, including the modeling of spatial variograms when variations in the vertical and horizontal are fundamentally different, the development of a hybrid physical-statistical approach to correct for a severe bias in the fluorometric observations when taken in strong sunlight, and calibrating the fluorometric readings to laboratory measurements from water samples whose locations do not coincide with any of the fluorometric observations. This project began before the establishment of CISES, but received some CISES funding given its strong overlap with the objectives of this project. The PIs for this subproject are Barry Lesht and Michael Stein. Leah Welty is a graduate student working on the effort.

H. Combining Physical Models and Total Ozone Mapping Spectrometer (TOMS) Ozone Data for Assessing Stratospheric Ozone Trends

This subproject is part of the project titled "The Detection of a Recovery in Stratospheric and Total Ozone," and is described in more detail in the reports for that project (see reports for EPA Agreement No. R829402C001). However, the notion of combining physical models for the stratosphere with observations fits in with the theme of this project. The PIs for this subproject are Michael Stein and Don Wuebbles. Serge Guillas is a postdoctoral research associate working on the subproject.

In addition to these projects, a major effort by Rao Kotamarthi (investigator) and Alexis Zubrow (senior program) has enabled us to run CMAQ on the CISES computers. This achievement will be of great value in continuing the work on subprojects B and C as well as that of the project titled "Air Quality and Reported Asthma Incidence in Illinois" (see reports for EPA Agreement No. R829402C003). We also can run the UIUC-2D model for the stratosphere on our computers, which will be of value for subproject H.

Results to Date

A. Space-Time Covariance Functions. We have developed new ways of thinking about space-time covariance functions, new classes of models, and their properties. We expect this project to have a broad impact on the future theoretical development of space-time statistical models, the models actually used in environmental applications, and the statistical methods employed to assess the adequacy of such models. This project has important direct implications for much of the rest of the work at CISES, for which space-time statistical models are needed.

B. Comparing CMAQ Output to Monitoring Data. We have developed numerical and graphical summaries for comparing CMAQ output with monitoring data that allow one to assess CMAQ's ability to capture the dynamic patterns in air pollution, and we have applied these methods to daily sulfate levels. We expect our methods to be broadly applicable to assessing numerical models for air pollution. A paper on this topic is nearing completion.

C. Correcting Emissions by Comparing Model Output and Monitoring Data. This project is fairly early in its development, but we have what we believe is a sensible model for comparing monthly ammonia depositions from CMAQ to observations, and we presently are fitting this model to obtain estimated correction factors for emissions with defensible standard errors. This project should play an important role in assessing the feasibility of correcting emissions estimates using CMAQ.

D. Statistical Issues Arising in the Study of High-Resolution Versions of CMAQ. This project is in its preliminary stages, but we already have found that a simple bilinear interpolation scheme of low-resolution output provides more consistency with high-resolution output than assuming that pollution levels are constant within a grid cell at lower resolution. In particular, the bilinear interpolation removes a periodicity in the spatial variogram of differences between the model outputs at different resolutions caused by the coarser gridding. We also have found that what we call "interaction" variograms are more useful for looking at space-time variations than ordinary space-time variograms.
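The contrast between the two ways of refining low-resolution output can be sketched as follows. This is a toy illustration under stated assumptions, not the subproject's code: coarse values are assumed to sit at integer grid nodes, the "true" fine-scale field is a hypothetical linear function, and all names are invented for the example.

```python
import math


def bilinear(grid, x, y):
    """Bilinear interpolation of grid[j][i] (value at x=i, y=j) at fractional (x, y)."""
    ny, nx = len(grid), len(grid[0])
    x = min(max(x, 0.0), nx - 1.0)  # clamp to the grid extent
    y = min(max(y, 0.0), ny - 1.0)
    i0 = min(int(x), nx - 2)
    j0 = min(int(y), ny - 2)
    fx, fy = x - i0, y - j0
    low = (1 - fx) * grid[j0][i0] + fx * grid[j0][i0 + 1]
    high = (1 - fx) * grid[j0 + 1][i0] + fx * grid[j0 + 1][i0 + 1]
    return (1 - fy) * low + fy * high


def piecewise_constant(grid, x, y):
    """Value of the coarse cell whose center (an integer node) is nearest to (x, y)."""
    ny, nx = len(grid), len(grid[0])
    i = min(max(int(math.floor(x + 0.5)), 0), nx - 1)
    j = min(max(int(math.floor(y + 0.5)), 0), ny - 1)
    return grid[j][i]


def smooth_field(x, y):
    """Hypothetical smooth pollutant field used as the fine-scale reference."""
    return 2.0 * x + 3.0 * y


coarse = [[smooth_field(i, j) for i in range(4)] for j in range(4)]
fine_points = [(0.25 * a, 0.25 * b) for a in range(13) for b in range(13)]

err_bilinear = max(abs(bilinear(coarse, x, y) - smooth_field(x, y)) for x, y in fine_points)
err_constant = max(abs(piecewise_constant(coarse, x, y) - smooth_field(x, y)) for x, y in fine_points)
```

A linear field is the best case for bilinear interpolation, which reproduces it exactly, so the gap between the two errors here overstates what should be expected with real model output. The piecewise-constant error also repeats with the coarse grid spacing, which gives some intuition for how coarse gridding can induce periodic structure in differences between resolutions.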

E. Data Assimilation in Hydrodynamic Models. We have made substantial strides in developing spatially dependent statistical models for observation error and physical model error, and in adapting and developing computational methods for implementing an ensemble Kalman filter for sediment transport in Lake Michigan. We have gained a much better understanding of the nonlinear relationship between satellite observations and the physical quantity of interest (sediment levels), which is an essential part of any data assimilation scheme. We expect that what we learn considering sediment transport in Lake Michigan will inform data assimilation methods for any environmental indicator measured using remote sensing.
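For orientation, the analysis step of a basic ensemble Kalman filter can be sketched as below. This is a generic stochastic (perturbed-observation) update for a single scalar observation of one state component, with a linear observation operator and independent observation error. The sediment transport problem described here involves nonlinear observations and spatially dependent errors, so this is a simplified stand-in rather than the scheme under development; all names and numbers are hypothetical.

```python
import random


def enkf_update(ensemble, obs_index, y_obs, obs_var, rng):
    """Update each ensemble member using one scalar observation of state[obs_index]."""
    n_ens = len(ensemble)
    n_state = len(ensemble[0])
    mean = [sum(m[k] for m in ensemble) / n_ens for k in range(n_state)]
    # Sample covariance between every state component and the observed component.
    cov_xy = [
        sum((m[k] - mean[k]) * (m[obs_index] - mean[obs_index]) for m in ensemble) / (n_ens - 1)
        for k in range(n_state)
    ]
    var_y = cov_xy[obs_index]
    gain = [c / (var_y + obs_var) for c in cov_xy]  # Kalman gain for a scalar observation
    updated = []
    for m in ensemble:
        # Perturb the observation so the updated ensemble keeps a realistic spread.
        y_pert = y_obs + rng.gauss(0.0, obs_var ** 0.5)
        innovation = y_pert - m[obs_index]
        updated.append([m[k] + gain[k] * innovation for k in range(n_state)])
    return updated


rng = random.Random(0)
raw = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 2.0)) for _ in range(200)]
# Correlate the two components so that observing the first also informs the second.
prior = [[a, 0.5 * a + b] for a, b in raw]
posterior = enkf_update(prior, obs_index=0, y_obs=5.0, obs_var=1.0, rng=rng)

prior_mean = sum(m[0] for m in prior) / len(prior)
post_mean = sum(m[0] for m in posterior) / len(posterior)
```

With many observations at one time, updates of this form are often applied one observation at a time, but that shortcut relies on independent observation errors, which is exactly the assumption that spatially dependent errors violate.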

F. Estimating Deformations of Stationary Random Processes. We have developed ways of representing smooth, invertible spatial deformations based on flows along potential surfaces. By restricting the class of deformations in this way, which yields deformations that are curl free, we hope to obtain more realistic deformations than an unrestricted approach would give. We have invented a likelihood-based approach to estimating these deformations, and have developed efficient algorithms for implementing this procedure. We hope that these methods will be broadly useful in modeling of nonstationary spatial and spatial-temporal processes, which occur commonly in environmental applications. They are likely to be of particular value with large spatial datasets such as those often obtained with remote sensing.
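To make the curl-free property concrete, the sketch below displaces each point by a small multiple of the gradient of a scalar potential and checks numerically that the resulting displacement field has zero curl and a positive Jacobian determinant (so the map is locally invertible). This one-step gradient map is a simplification of the flow-based construction described above; the potential, the step size, and the function names are assumptions chosen for illustration.

```python
import math

EPS = 0.3  # hypothetical deformation strength, small enough to keep the map invertible


def grad_potential(x, y):
    """Analytic gradient of the hypothetical potential phi(x, y) = sin(x) * cos(y)."""
    return math.cos(x) * math.cos(y), -math.sin(x) * math.sin(y)


def deform(x, y):
    """Map (x, y) to (x, y) + EPS * grad(phi)."""
    gx, gy = grad_potential(x, y)
    return x + EPS * gx, y + EPS * gy


def jacobian(x, y, h=1e-4):
    """Central-difference Jacobian [[du/dx, du/dy], [dv/dx, dv/dy]] of the map (u, v)."""
    ux1, vx1 = deform(x + h, y)
    ux0, vx0 = deform(x - h, y)
    uy1, vy1 = deform(x, y + h)
    uy0, vy0 = deform(x, y - h)
    return [
        [(ux1 - ux0) / (2 * h), (uy1 - uy0) / (2 * h)],
        [(vx1 - vx0) / (2 * h), (vy1 - vy0) / (2 * h)],
    ]


def curl_and_det(x, y):
    """Return (curl, determinant) of the deformation at (x, y)."""
    (a, b), (c, d) = jacobian(x, y)
    return c - b, a * d - b * c


checks = [curl_and_det(0.4 * i, 0.4 * j) for i in range(10) for j in range(10)]
max_abs_curl = max(abs(curl) for curl, _ in checks)
min_det = min(det for _, det in checks)
```

Because the displacement is a gradient, its Jacobian is symmetric, which is what forces the curl to vanish; keeping `EPS` small relative to the curvature of the potential keeps the determinant positive.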

G. Statistical Analysis of Phytoplankton in Lake Michigan. We developed methods for analyzing and understanding spatial data for which the vertical and horizontal variations are fundamentally different, which will be the case for many environmental processes. We invented a mathematical model to correct for quenching bias in fluorescence measurements. This is critical to ensuring that the measurement technique is viable during daylight hours. We developed calibration methods for estimating chlorophyll concentrations from fluorescence, which are essential to using fluorescence measurements as an indicator of the biological status of a body of water. We expect at least two papers to emerge from Leah Welty's thesis work in the coming months.

H. Combining Physical Models and TOMS Ozone Data for Assessing Stratospheric Ozone Trends. We have shown how a physical model for stratospheric ozone (UIUC 2-D model) can explain some of the temporal variations in total column ozone better than a more empirical approach, thus leading to the possibility of more accurate trend estimates than purely statistical approaches yield. The scope for using physical models to remove natural variation from environmental time series and thus get better estimates of anthropogenic effects is obvious, and this could significantly enhance our ability to assess environmental impacts of human activities.

Future Activities:

The activities described below will be performed in the next reporting period.

Space-Time Covariance Functions. We will apply the new models that we have developed in a number of our other projects, and will compare the results obtained to those for previously suggested models. Some of the models proposed in our project are expressed in terms of their spectral densities, and additional research needs to be conducted to compute their corresponding covariance functions quickly and accurately.
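The basic computation involved, recovering a covariance function from a spectral density by numerical integration, can be illustrated in one dimension. The sketch below uses a Gaussian spectral density, chosen only because its covariance function is known in closed form and can serve as a check; it is not one of the space-time models proposed in this project, and the truncation point and step count are assumptions.

```python
import math


def gaussian_spectral_density(omega):
    """Spectral density whose covariance function is exp(-u**2 / 2)."""
    return math.exp(-0.5 * omega * omega) / math.sqrt(2.0 * math.pi)


def covariance_from_spectrum(density, u, omega_max=10.0, n_steps=2000):
    """Approximate C(u) = 2 * integral from 0 to omega_max of density(w) * cos(w * u) dw.

    Uses the trapezoid rule and assumes the density is even in w and
    negligible beyond omega_max.
    """
    h = omega_max / n_steps
    total = 0.5 * (density(0.0) + density(omega_max) * math.cos(omega_max * u))
    for k in range(1, n_steps):
        w = k * h
        total += density(w) * math.cos(w * u)
    return 2.0 * h * total


lags = [0.0, 0.5, 1.0, 2.0]
numeric = [covariance_from_spectrum(gaussian_spectral_density, u) for u in lags]
exact = [math.exp(-0.5 * u * u) for u in lags]
max_error = max(abs(a - b) for a, b in zip(numeric, exact))
```

The trapezoid rule is very accurate here because the density is smooth and decays rapidly. Densities with heavy tails, or integrals over several spatial dimensions plus time, are much harder to evaluate, which is where speed and accuracy become real concerns.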

Comparing CMAQ Output to Monitoring Data. We will apply our methods to pollutants other than sulfate. As runs from an updated version of CMAQ become available, we expect to repeat our analyses and seek evidence of improvement in the newer version as measured by our methods. We will refine and further develop our statistical methods; in particular, we will improve our methods for handling missing data. As this project progresses, we will deepen our understanding of the relationship between model output and monitoring data to develop sensible data assimilation methods for CMAQ.

Correcting Emissions by Comparing Model Output and Monitoring Data. We are just now fitting our model to the same period in 1990 that Dr. Gilliland has used to estimate emissions corrections and, upon completing this project, we will compare our results to hers and then try to explain any discrepancies we may find. We also will implement a Bayesian version of our procedure and compare these results with the plug-in approach we are using. A longer term objective is to develop models and methods for making emissions corrections that are not constant across space.

Statistical Issues Arising in the Study of High-Resolution Versions of CMAQ. There are a number of issues that we will address. Two of our highest priorities are to: (1) provide easily interpretable and useful statistical (both numerical and graphical) summaries of what is lost when using lower resolution models, and inferences about extreme values; and (2) use the highest resolution model as a kind of "truth" so that we can develop statistical descriptions of model error. In addition to being helpful in understanding what is gained by using higher resolution models, the development of statistical models for the errors of deterministic physical models is one of the most important and challenging problems in successfully applying data assimilation to air pollution models, for which an assumption of no model error is untenable. We have had discussions with Jason Ching about additional CMAQ runs that could be conducted relatively easily, and would enhance our ability to quantify model errors. We have met with Noboru Nakamura in Geophysical Sciences at the University of Chicago about this subproject, and may add him to the team.

Data Assimilation in Hydrodynamic Models. We think we have a feasible and sensible data assimilation scheme for sediment transport in Lake Michigan using satellite observations of lake color, and we will implement this scheme and assess its effectiveness. This approach assumes that model and observational errors are additive, and we will investigate modeling errors through the spatial deformations that we are now obtaining through the hydrodynamic model as described. Once we can demonstrate in principle the effectiveness of the deformation approach, we will need to address the difficult computational issues of estimating these deformation errors. It will likely be necessary to restrict the class of possible deformations to something that is of fairly low effective dimension.

Estimating Deformations of Stationary Random Processes. In addition to continuing the development of our likelihood-based approach for estimating deformations and the necessary algorithms for computing the estimates with large spatial datasets, we will apply our methods to environmental problems. Analyzing the output of a physical model such as CMAQ might be a sensible choice to examine because of the extensiveness and completeness of the output.

Statistical Analysis of Phytoplankton in Lake Michigan. This work is now completed, except for converting parts of Leah Welty's thesis into papers. This project will not receive additional funding.

Combining Physical Models and Total Ozone Mapping Spectrometer Ozone Data for Assessing Stratospheric Ozone Trends. We will run the UIUC 2-D model for the stratosphere to approximate ozone patterns that would have been observed if, instead of the Montreal Protocol being implemented, a kind of "steady state" in ozone-destroying compounds had existed. To the extent that one could find a statistically significant increase in ozone in recent years relative to what this model predicts, we then could claim to have detected a "recovery" in stratospheric ozone. We would repeat this kind of analysis using other models for the stratosphere to determine how robust the conclusions are to the choice of physical model.


Journal Articles on this Report:

Stein ML. Space-time covariance functions. Journal of the American Statistical Association 2005;100(469):310-321. Listed under R829402C002 (2002), R829402C002 (2004), and R829402C002 (Final).

Supplemental Keywords:

    atmosphere, ozone, water, watersheds, stratospheric ozone, chemical transport, ecological effects, particulates, environmental chemistry, environmental policy, Great Lakes, EPA Region 5, air quality, health effects, regulation, ecosystem sustainability, decisionmaking, exploratory research, environmental biology, air pollution, chemical transport modeling, chemical transport models, ecological health, ecological models, ecological risk, ecosystem health, human health risk, monitoring, policymaking, risk assessment, risk management, statistical methodology, statistical methods, stochastic models, trend monitoring, Ecosystem Protection/Environmental Exposure & Risk, Economic, Social, & Behavioral Science Research Program, Air, Geographic Area, Scientific Discipline, Health, RFA, PHYSICAL ASPECTS, Ecosystem/Assessment/Indicators, Engineering, Chemistry, & Physics, Risk Assessments, Environmental Statistics, Great Lakes, Applied Math & Statistics, Health Risk Assessment, Physical Processes, Ecological Risk Assessment, Environmental Engineering, EPA Region, particulate matter, Ecological Effects - Environmental Exposure & Risk, Ecosystem Protection, Monitoring/Modeling, Environmental Monitoring, risk assessment, trend monitoring, ozone, chemical transport models, particulate, stochastic models, statistical methodology, air quality, computer models, ecological risk, ecosystem health, environmental indicators, ozone, chemical transport, health risk analysis, human health risk, monitoring, statistical models, particulates, statistical methods, watersheds, Region 5, air pollution, sediment transport, stratospheric ozone, emissions monitoring, data models, exposure, water, chemical transport modeling, ecological models, ecological effects, ecological health, human exposure
    Relevant Websites:

    http://www.stat.uchicago.edu/~cises/index.html
    http://galton.uchicago.edu/~cises/


    Main Center Abstract and Reports:
    R829402    Center for Integrating Statistical and Environmental Science

    Subprojects under this Center:
    R829402C001 Detection of a Recovery in Stratospheric and Total Ozone
    R829402C002 Integrating Numerical Models and Monitoring Data
    R829402C003 Air Quality and Reported Asthma Incidence in Illinois
    R829402C004 Quasi-Experimental Evidence on How Airborne Particulates Affect Human Health
    R829402C005 Model Choice Stochasticity, and Ecological Complexity
    R829402C006 Statistical Approaches to Detection and Downscaling of Climate Variability and Change

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Conclusions drawn by the principal investigators have not been reviewed by the Agency.

