United States Environmental Protection Agency
Ground Water & Drinking Water

M/DBP Stakeholder Meeting Statistics Workshop

Meeting Summary:

November 19, 1998
Governor's House, Washington DC

Final
February 1999

Prepared for:
U.S. Environmental Protection Agency
Office of Ground Water and Drinking Water
401 M Street, SW
Washington, D.C. 20460

Prepared by:

RESOLVE
1255 23rd St, NW
Washington, DC 20037
EPA Contract No. 68-W4-0001
Work Assignment No. 195
Task 5

SAIC
1710 Goodridge Drive
McLean, VA 22102
EPA Contract No. 68-C6-0059
Work Assignment 1-20
SAIC Project No. 01-0833-08-3556-110

TABLE OF CONTENTS

1. Introduction

2. Initial Presentations

2.1 Regulatory Background and Context

2.2 Issues - ICR Recovery and Detection

3. Panelist Presentations: Responding to Problem Statement

3.1 Some Characteristics of Cryptosporidium Sampling Data - Dr. David Parkhurst

3.2 Issue: How should ICR data be used? - Dr. Bertram Price

3.3 Handling Cryptosporidium ICR Data - Dr. Richard O. Gilbert

3.4 Interpretation of Cryptosporidium and Giardia Monitoring Data Generated by ICR Program - Dr. Jery Russell Stedinger

3.5 Encounter Sampling for Polar Bear, Cryptosporidium, Giardia, and other critters - Dr. Lyman McDonald

3.6 The Rat, the Dancing Chicken, and the Fireman - Dr. Lawrence Mayer

4. Open Discussion

4.1 Summary of Discussion

4.2 List of Comments, Issues, and Possible Action Items

LIST OF ATTACHMENTS

Attachment 1: Statistics Workshop: Panelists - Biosketch (November 19, 1998)

Attachment 2: Draft II Agenda & Participants List - M/DBP Stakeholder Meeting Statistics Workshop

Attachment 3: Issue Paper: November 19, 1998 Meeting on ICR Cryptosporidium Statistical Treatments for Recovery and Detection

Attachment 4: Regulatory Background and Context (Stig Regli, USEPA)

Attachment 5: Issues: ICR Recovery and Detection (Mike Messner, USEPA)

Attachment 6: Some Characteristics of Cryptosporidium Data (David Parkhurst, Indiana University, November 19, 1998)

Attachment 7: How Should ICR Data Be Used (Bertram Price, Price Associates Incorporated, November 19, 1998)

Attachment 8: Handling Cryptosporidium ICR Data (Dr. Richard O. Gilbert, Battelle, November 19, 1998)

Attachment 9: Interpretation of Cryptosporidium and Giardia Monitoring Data Generated by ICR Program (Dr. Jery Russell Stedinger, Cornell, November 19, 1998)

Attachment 10: Encounter Sampling for Polar Bear, Cryptosporidium, Giardia, and other critters (Dr. Lyman McDonald)

Attachment 11: The Rat, the Dancing Chicken, and the Fireman (Dr. Lawrence Mayer, November 1998)

1. Introduction

The U.S. Environmental Protection Agency (EPA) held a public meeting for Microbial-Disinfection Byproducts (M-DBP) stakeholders on November 19, 1998 to discuss statistical methods for analyzing microbial data being collected under the Information Collection Rule (ICR) and the ICR Supplement Survey Program. In particular, this Statistics Workshop reviewed issues associated with evaluating the occurrence of Cryptosporidium in source waters of drinking water supplies. The workshop was organized around the participation of six panelists with expertise in the related fields of statistics, environmental engineering, environmental monitoring, and human health. These panelists have varying degrees of previous involvement with the occurrence of pathogens in drinking water. The six panelists were (see Attachment 1 for biosketches):

  • Dr. David Parkhurst of Indiana University

  • Dr. Bertram Price of Price Associates, Inc.

  • Dr. Jery Russell Stedinger of Cornell University

  • Dr. Richard O. Gilbert of Battelle Memorial Institute

  • Dr. Lyman McDonald of WEST, Inc.

  • Dr. Lawrence Mayer of Good Samaritan Medical Center

Abby Arnold of RESOLVE, the meeting facilitator, began the Statistics Workshop by welcoming the participants, introducing the expert panelists, and reviewing the objectives of the meeting and the proposed agenda [Attachment 2, including meeting participants list]. The purpose of the workshop was to get feedback from the expert panelists who were not involved in the current efforts and to identify ideas and thoughts on the process. The specific objectives identified in meeting materials were:

  • Review statistical approaches for evaluating a recovery factor for Cryptosporidium, provide feedback, and suggest alternatives and enhancements.

  • Review approaches for dealing with microbial nondetects, provide feedback and suggestions and enhancements, and discuss the characterization of the occurrence distributions to support national regulatory impact analyses.

Arnold explained that the meeting would consist of a discussion of the regulatory background and a definition of the problem, followed by panelist presentations concerning several questions related to the treatment of the ICR microbial occurrence data. After the panelist presentations there would be an open discussion of the problems and issues.

As part of the meeting materials, an issue paper for this meeting was prepared by EPA and sent to participants who had pre-registered for the workshop [Attachment 3]. This issue paper provided an overview of the problem, including discussions of the ICR method recovery, the detection capability of the ICR method, the intended use of the ICR data, and statistical modeling to predict outcomes and choose sample sizes.

2. Initial Presentations

Stig Regli (EPA) and Mike Messner (EPA) provided initial presentations on regulatory background and context and background on the ICR recovery and detection issues.

2.1 Regulatory Background and Context

Stig Regli presented a brief summary of the relevant regulatory context for the discussions at this workshop [Attachment 4]. Regli provided a summary of current microbial regulations including the Surface Water Treatment Rule (SWTR) and the Total Coliform Rule (TCR). A discussion was provided of the pending Interim Enhanced SWTR that will address the removal of Cryptosporidium via filtration and other requirements such as sanitary surveys.

There are several remaining issues related to microbial protection of water supplies including: whether all systems are providing adequate treatment for pathogens; and whether distribution systems are adequately protected from pathogen intrusion or bacterial growth. Several regulatory options are under consideration for the Long-Term 2 ESWTR (LT2ESWTR) including a minimum fixed level of treatment for Cryptosporidium (e.g., 3-log removal), minimum fixed inactivation levels, levels of treatment dependent on source water quality, and best management practices requirements for distribution systems. Information from the ICR, ICR Supplemental Survey, available research, and other surveys will be used to support LT2ESWTR development.

Regli discussed the specific ICR microbial monitoring requirements. These requirements include, for large (serving greater than 100,000 people) surface water systems, monitoring of source waters (one sample each month) for 18 months for Giardia, Cryptosporidium, viruses, total coliforms, and fecal coliforms or E. coli. Finished water may have to be monitored if source water limits for protozoa or viruses are exceeded.

Regli provided a list of the intended uses of ICR data for Regulatory Impact Analysis (RIA) purposes. These uses included: developing national distributions of source water protozoa occurrence; characterization of co-occurrence of pathogens, indicators, and DBP precursors; characterization of pathogen occurrence by water body type and geographic region; and predicting national pathogen occurrence in finished waters for regulatory purposes.

Following Regli's presentation, meeting participants discussed the following points:

  • Based on the objectives of the workshop, the focus of discussion in this workshop will be on the generation of national distributions of pathogens in source water and finished water and the issues associated with this including how to consider below detects and the role of an adjustment factor for recovery in interpreting the data.

  • The statistical approach may differ depending on the specific questions being asked.

  • The protection of source water instead of only requiring treatment is an option that may not be an appropriate topic for the workshop discussion. However, treatment requirements (e.g., the level which might be required) may be based on source water occurrence levels.

  • In assessing health effects, one needs to understand finished water occurrence, but the ICR will provide useful information only on source water occurrence. It will be important to understand the relationships:

      • between source water occurrence and finished water occurrence, and

      • between finished water occurrence and health risk.

  • The focus should be on source water microbial occurrence since only a fraction of systems are conducting finished water monitoring and most of their Cryptosporidium results will be non-detects.

2.2 Issues - ICR Recovery and Detection

Dr. Messner provided a presentation on issues related to recovery and detection for the microbial analytical methods used under the ICR [Attachment 5]. The issues associated with the ICR method include low and variable recovery, variable volume analyzed, and the number of nondetects. The ICR method involves the counting of oocysts and there are numerous steps in which losses of oocysts can occur. The method provides integer results (i.e., 0, 1, etc.) which must be divided by the volume of water analyzed to determine a count per volume.

Messner also mentioned that the term "Method Detection Limit" does not have a use in the discussion of these methods. The expression "one per volume analyzed" does not support the use of this term.

Messner provided a discussion of estimating a recovery factor from the ICR Spiking Study. In this study, 70 water utility participants of random source water will provide two spiked samples during the ICR. Messner provided preliminary results of the spiking study. Information on recovery was presented by month and by volume analyzed.

Messner presented scenarios of possible results for individual source water microbial data under the ICR based on 18 observations. Source water may have many detections, few detections, or all zeros (i.e., zero counts). For these source waters Messner indicated that EPA wanted to know the true distribution means and true distributions of 90th or other percentiles for microbial occurrence. For the ICR data, the observed distributions will be biased because of the fractional recovery rate and will differ from the true distribution. Some adjustment or correction for this recovery bias may be needed, but this procedure could be problematic.

Messner noted that from the ICR data we would want to derive a national occurrence distribution for use in estimating the number of cryptosporidiosis cases attributed to drinking water. This derivation would be based on source water occurrence estimates, treatment efficiency estimates (by which to predict finished water oocyst concentrations) and application of dose-response information. Two methods for conducting this analysis are: 1) a system-by-system approach and aggregation of the results; or 2) using Monte Carlo techniques to simulate system level characteristics from national distributions.

The following questions and comments were made during this presentation:

  • Regarding the Spiking Study, it may be useful to examine whether recoveries correlate with individual laboratories.

  • Background levels of microbials were not subtracted out in computing recoveries from the preliminary Spiking Study data. It was noted that the source water samples were spiked with rather high levels of microbials and background levels are not expected to significantly influence the results.

  • Could part of a sample be lost in the analytical process? For example, if the water is very dirty then the pellet volume is large and only part of the pellet may be examined.

  • Is the concentration of microbials in the pellet uniform? What is the level of randomness?

  • There is temporal variability in 18 monthly samples. Using the 18 monthly samples results in 6 months being sampled twice. Should the analysis be designed around 12 monthly samples? Another participant thought that the complement of 18 monthly samples should be used.

3. Panelist Presentations: Responding to Problem Statement

Prior to the workshop, the expert panelists were provided with three questions that formed the problem statement for discussion purposes. The three questions were:

    1. How should a national distribution of source water oocyst concentrations (means or 90th percentiles) be derived from ICR results that were produced by a measurement method having low recovery and poor detection capability? For this purpose, how should zeros and low recovery be treated (see questions 2 and 3)? Finally, how can we gauge the resulting distribution's uncertainty?

    2. How should zeros be treated? Should they be replaced by nonzero values? Can confidence/probability intervals be produced and, if so, how can they be used in answering question 1?

    3. How should low recovery be treated? Can the results simply be "adjusted for recovery" (i.e., dividing estimates by an estimate of the method's mean recovery)?

Prior to this workshop, the panel participated in a conference call and corresponded by e-mail so as to develop a coordinated series of presentations.

3.1 Some Characteristics of Cryptosporidium Sampling Data - Dr. David Parkhurst

Dr. David Parkhurst presented a discussion of characteristics associated with sampling data of Cryptosporidium [Attachment 6]. Using several examples, Parkhurst demonstrated that count data are not like continuous data. In summary, an occasional nonzero value among a series of zeros through time may: 1) be associated with a high concentration estimate because of a low recovery rate; and 2) wrongly suggest that the true concentration is currently much higher than usual. But such values could also correctly indicate that the current concentration is higher than usual. However, there is no way to determine which of these is occurring. The only way to resolve this ambiguity is to improve sampling by: increasing the number of samples; increasing the effective volume of each sample; or increasing the recovery rate.

Parkhurst also noted that zeros provide useful information; that is, zeros are unlikely if oocysts are common and are more likely when oocysts are at low concentrations. However, the treatment of zeros can be made more quantitative. Parkhurst provided an example of oocysts randomly and independently distributed in a water sample (which leads to a Poisson distribution). In this example, the probability of a count of zero (in a sample of given volume and recovery) is 5 percent if the mean count is about 3 and 10 percent if the mean count is about 2.3. Therefore, a zero sample lets you say there is only a 5 percent probability that the oocyst concentration of the water sampled was greater than 3/(recovery rate x volume), and only a 10 percent probability that the oocyst concentration of the water sampled was greater than 2.3/(recovery rate x volume).
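The arithmetic behind these figures can be checked with a short sketch. It assumes, as in Parkhurst's example, that counts follow a Poisson distribution. The function names, the 10-liter volume, and the 40 percent recovery rate are illustrative assumptions and are not values from the workshop.

```python
import math

def prob_zero_count(mean_count):
    """P(count = 0) for a Poisson-distributed count with the given mean."""
    return math.exp(-mean_count)

def concentration_bound(alpha, recovery, volume):
    """Concentration at which a zero count would occur with probability alpha.

    Solves exp(-c * recovery * volume) = alpha for c.
    """
    return -math.log(alpha) / (recovery * volume)

print(round(prob_zero_count(3.0), 3))   # 0.05
print(round(prob_zero_count(2.3), 3))   # 0.1

# Hypothetical sample: 10 L examined, 40 percent recovery (assumed values).
print(round(concentration_bound(0.05, 0.4, 10.0), 2))  # 0.75 oocysts per liter
```

The numerators 3 and 2.3 quoted above are simply -ln(0.05) and -ln(0.10), rounded.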

Parkhurst noted that for a sequence of samples, like 18 monthly samples, the mean concentration can be calculated by (sum of counts)/(sum of sample volumes). The sum of the counts includes zeros, and the sample volumes are "effective volumes" (i.e., water volumes sampled x fraction inspected).
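A minimal sketch of this calculation follows. The function name and the six-sample data set are hypothetical and are used only to show how zeros enter the total.

```python
def mean_concentration(counts, volumes_sampled, fractions_inspected):
    """Total count divided by total effective volume.

    Effective volume = water volume sampled x fraction inspected.
    Zero counts add nothing to the numerator, but their volumes still
    count in the denominator.
    """
    effective = [v * f for v, f in zip(volumes_sampled, fractions_inspected)]
    total_volume = sum(effective)
    if total_volume <= 0:
        raise ValueError("total effective volume must be positive")
    return sum(counts) / total_volume

# Hypothetical data: six samples, 20 L each, half of each pellet inspected.
counts = [0, 0, 1, 0, 2, 0]
volumes = [20.0] * 6
fractions = [0.5] * 6
print(mean_concentration(counts, volumes, fractions))  # 0.05 oocysts per liter
```

Dropping the four zero samples from this hypothetical set would raise the estimate from 0.05 to 0.15 per liter, which is the overestimate that counting zeros as zeros avoids.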

In summary, Parkhurst noted that the ICR cyst and oocyst data consist of counts and that those counts are often zero. Parkhurst added that:

  • Zeros must be counted as zeros to balance out overestimates from nonzero counts.

  • Imperfect recovery increases the proportion of zeros and can be corrected for to some extent but variable recovery makes this difficult.

  • Even a single zero sample provides some information.

  • Upper percentiles depend on one or two values, which may vary greatly because of random sampling; summary statistics that use all the data are usually preferred.

  • It is an inescapable conclusion that for scarce objects that are hard to find we need many large, high quality samples.

Participants had the following comments and questions during this presentation:

  • When the concentrations are low you have to include the zeros to calculate the correct average. If in the ICR method we are not measuring the entire aliquot this could be a problem. Is there a way to address this?

  • The use of Bayesian and non-Bayesian statistics may produce different results. Currently, Bayesian statistics are not being used, but a deterministic approach is being used to estimate numbers of cancer deaths avoided under alternate regulatory strategies. We may want to look at both approaches. Do we become Bayesian de facto when dealing with human health?

  • All sample volumes are not equal since as water quality degrades the volumes assayed become lower.

3.2 Issue: How should ICR data be used? - Dr. Bertram Price

Dr. Bertram Price provided a presentation on how to use ICR data to: 1) characterize the distribution of source water oocysts across the ICR plants; and 2) evaluate regulatory options (e.g., monitoring requirements, compliance decision rules) [Attachment 7]. In addition, he considered how one might specifically account for the high percentage of zero measurements and low recovery rate. Price noted that the zero measurements are as meaningful as any other measurements and should be treated as zeros in any and all statistical analyses. The matrix spiking analyses can be used to get a handle on recovery rates.

The topics addressed by Price included: 1) Data Quality Objectives (DQOs); 2) single measurement characteristics including sensitivity, detection limits, and bias (recovery); 3) combining multiple measurements; and 4) recovery.

Regarding DQOs, are the data sufficient for the intended application? The ICR data do not have to be perfect or even "good." We need to determine only if the data are good enough for the intended applications.

Sensitivity and detection limits are method design concepts. Sensitivity is the concentration estimate corresponding to a count of one oocyst in the sample. The detection limit is the minimum true mean oocyst concentration necessary to conclude with a high degree of confidence that a nonzero oocyst measurement does, in fact, indicate the existence of a true oocyst population. To improve sensitivity, you must revise the method to increase the fraction of sample examined, analyze multiple samples and use the average as the oocyst concentration estimate (which has improved sensitivity relative to a single measurement), or both. A detection limit protects against false positives. Previously, there has been no indication that false positives were a problem for the oocyst measurement method. If the false positive rate is extremely small, the detection limit concept and the classification of measurements as "nondetects" are not useful for interpreting the data. Distributions that can possibly be used to describe oocyst measurements include: Poisson, negative binomial, or lognormal. Poisson and negative binomial should be adequate. These distributions incorporate the inherent discrete nature of the measurements, i.e., the measurements are counts. The lognormal should not be applied.

The Poisson or negative binomial distributions also can be used effectively to describe multiple oocyst measurements and averages of oocyst measurements. The negative binomial incorporates random fluctuations in the true mean oocyst level over time.
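The difference between the two count models can be illustrated by comparing the probability of a zero count. The sketch below uses the standard result that a Poisson count whose mean varies according to a gamma distribution is negative binomial. The parameter values are hypothetical and do not come from the presentation.

```python
import math

def poisson_p0(mean):
    """P(count = 0) when the true mean is fixed."""
    return math.exp(-mean)

def negbin_p0(mean, shape):
    """P(count = 0) when the Poisson mean fluctuates as a gamma variable
    with the given average and shape; the count is then negative binomial."""
    return (shape / (shape + mean)) ** shape

def negbin_variance(mean, shape):
    """Count variance under the same model: mean + mean^2 / shape."""
    return mean + mean ** 2 / shape

# Hypothetical values: average count of 1, shape 0.5 (strong fluctuation).
print(round(poisson_p0(1.0), 3))       # 0.368
print(round(negbin_p0(1.0, 0.5), 3))   # 0.577
print(negbin_variance(1.0, 0.5))       # 3.0
```

For the same average count, the fluctuating-mean model predicts more zeros and a larger variance than the Poisson model. As the shape parameter grows, the two models converge.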

Recovery adjustment of the ICR data is essential for the direct interpretation of the data and to conduct simulations to evaluate regulatory options. Recovery studies are needed. To address recovery, one could determine, in theory, adjustments for each sample by: specifying an acceptable recovery estimation error, designing an experiment in split samples and spiking levels, and estimating recovery using point estimates and confidence limits. Each measurement, then, would have its own recovery adjustment. The measurement, adjusted (divided) by the recovery estimate, could be reported with a confidence interval that reflects the uncertainty in the recovery estimate. This approach would be very expensive and therefore impractical if applied to every sample, but may produce useful information on recovery for subsequent analyses of the oocyst data set if applied to a judiciously selected subset of samples.

In conclusion, Price summarized his presentation as follows: 1) the principal use of the ICR data is to conduct simulations to estimate exposure distributions and evaluate regulatory options; 2) zero counts should be treated as zeros in all statistical analyses and the nondetect designation is meaningless; 3) the preponderance of zero measurements is most likely due to small water volumes and poor recovery; 4) recovery adjustment estimation and application are the most significant statistical issues concerning the interpretation of the ICR data; and 5) the supplemental survey should include sufficient samples and analyses to assure that the data collected will "solve" the recovery adjustment problem.

The following comments and questions were made during this presentation:

  • Based on the size of confidence intervals, would the use of an action level be possible?

  • There is uncertainty as to whether there may be some real zeros and that oocysts may not be present in some source waters. This uncertainty remains if there is a measurement of 1. Because of this uncertainty, terminology is important in these discussions. A censored level may exist for these types of analyses.

  • If measurements are combined and, therefore, we are looking at the aggregate, then won't there be a reduced confidence around the single measurements?

  • The issue of confidence intervals on any calculated recovery rate was raised.

3.3 Handling Cryptosporidium ICR Data - Dr. Richard O. Gilbert

Dr. Richard Gilbert provided a presentation on the handling of Cryptosporidium data under the ICR [Attachment 8]. Gilbert's presentation specifically addressed some ideas for how zeros (or nondetects) and low recovery should be treated in deriving national distributions of source water oocyst concentrations (means or 90th percentiles) and the methods employed in estimating these distributions.

In addressing the issue of zeros, Gilbert suggested that we: 1) do not use the term "nondetects" for zeros because zeros are real values; 2) do not replace zeros with an arbitrarily chosen detection limit; and 3) do not use data analysis methods developed for data sets that contain nondetects or less-than values.

Gilbert suggested that the oocyst concentrations be corrected by dividing by the recovery rate. Ideally, we would like to have a recovery rate for each individual filter sample. An average recovery rate can be obtained in a special study. However, this average recovery rate will over- or underestimate the recovery for samples from another site, time, or sampling process and will increase the uncertainty in the national distribution.

Special recovery studies can be conducted to generate a distribution of recovery values that can be applied to the individual sites and sampling and measurement conditions under the ICR. Gilbert suggested that we make use of the estimated distribution of recovery values in developing the national distribution of oocyst concentrations. For example, recovery values could be selected at random from the estimated distribution of recovery values and used to represent an "uncertainty" component of the computed oocyst concentration from a filter sample.
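One way to picture the random-selection idea is the sketch below. It repeatedly divides a single sample's concentration by a recovery value drawn at random from a list of recovery study results. The function name, the list of recovery fractions, and the sample values are hypothetical. Resampling from observed values is only one possible way of representing the "estimated distribution," and the summary does not say which Gilbert intended.

```python
import random

def simulate_adjusted_concentrations(count, effective_volume, recovery_values,
                                     n_draws=1000, seed=0):
    """Recovery-adjusted concentrations for one sample, one per random draw.

    recovery_values: recovery fractions (0 < r <= 1) from a recovery study.
    """
    if any(r <= 0 or r > 1 for r in recovery_values):
        raise ValueError("recovery fractions must lie in (0, 1]")
    rng = random.Random(seed)
    return [count / (effective_volume * rng.choice(recovery_values))
            for _ in range(n_draws)]

# Hypothetical inputs: 2 oocysts counted in 10 L effective volume.
draws = simulate_adjusted_concentrations(2, 10.0, [0.1, 0.2, 0.3, 0.5])
print(min(draws), max(draws))  # spread attributable to recovery uncertainty
```

A zero count stays at zero under every draw, so this kind of adjustment widens the uncertainty of nonzero results only.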

The concentration of oocysts (C) can be computed from the count of oocysts (K), the volume of water filtered (V), the fraction of the centrifuged pellet that is examined for oocysts (F), and the fraction of oocysts in the equivalent volume (EV, where EV = V x F) that is recovered (R). If counts have a Poisson distribution and V, F, and R are constants (or have very small uncertainty), then we can estimate the mean concentration for each of the individual ICR sites using: C = 100K/(V x F x R). An average C can be computed by averaging the 18 monthly values for each site. These means can be used to estimate the national distribution of mean concentrations and confidence intervals around the means.
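The calculation can be sketched as follows. The sketch assumes that V is in liters and that R is a fraction between 0 and 1, so that the factor of 100 expresses C per 100 liters. The summary does not state the units, so this reading is an assumption, as are the function names and sample values.

```python
def oocyst_concentration(k, v, f, r):
    """C = 100K / (V x F x R).

    Assumed units (not stated in the summary): V in liters, F and R as
    fractions, C in oocysts per 100 liters.
    """
    return 100.0 * k / (v * f * r)

def site_mean_concentration(samples):
    """Average of the per-sample C values for one site (zeros included)."""
    values = [oocyst_concentration(k, v, f, r) for k, v, f, r in samples]
    return sum(values) / len(values)

# Hypothetical monthly samples as (K, V, F, R) tuples.
samples = [(0, 100.0, 0.5, 0.4), (2, 100.0, 0.5, 0.4), (1, 50.0, 0.5, 0.4)]
print(oocyst_concentration(2, 100.0, 0.5, 0.4))    # 10.0 per 100 L
print(round(site_mean_concentration(samples), 2))  # 6.67 per 100 L
```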

Gilbert provided some caveats in using this approach when the counts are not Poisson and V, F, and R are not constants and do not have small uncertainty. The bias in the average C may be large if V, F, and R are highly uncertain. The average C will be accurate if the number of samples is large (but 18 may be too small). The confidence limits of the average C can be computed using the normal distribution if the number of samples is large (but 18 may be too small). Lastly, the average C can be adjusted to reduce the bias using Jackknife or bootstrap-type approaches.
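As an illustration of a jackknife-type adjustment, the sketch below applies the standard leave-one-out bias correction to a ratio-of-totals estimate. The summary does not say which estimator or procedure Gilbert had in mind, so the choice of estimator, the function names, and the data are assumptions.

```python
def ratio_of_totals(counts, volumes):
    """Total count divided by total effective volume."""
    return sum(counts) / sum(volumes)

def jackknife_bias_corrected(counts, volumes):
    """Jackknife estimate: n * full - (n - 1) * mean of leave-one-out values."""
    n = len(counts)
    full = ratio_of_totals(counts, volumes)
    leave_one_out = []
    for i in range(n):
        # Recompute the estimate with sample i removed.
        c = counts[:i] + counts[i + 1:]
        v = volumes[:i] + volumes[i + 1:]
        leave_one_out.append(ratio_of_totals(c, v))
    return n * full - (n - 1) * sum(leave_one_out) / n

# Hypothetical data with unequal effective volumes (liters).
counts = [0, 1, 2, 0, 0, 3]
volumes = [12.0, 8.0, 5.0, 15.0, 10.0, 4.0]
print(ratio_of_totals(counts, volumes))
print(jackknife_bias_corrected(counts, volumes))
```

When every sample has the same effective volume the estimate is linear in the counts and the correction leaves it unchanged. The adjustment matters only when the volumes (or recoveries) vary from sample to sample.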

Gilbert further elaborated on the use of the estimated national distributions. These national distributions can be used to estimate percentiles of the site means, mean of the site means, and range and variability of the site means. However, Gilbert warned that the national distributions are not sufficient to estimate the variability of concentrations over time for any given site. Uncertainties in estimating the national distribution are influenced by the following: unrepresentative sites or sampling times and locations; too few samples collected at the sites; changes in the true oocyst counts over time; and changes in the volume of water filtered.

Gilbert suggested an approach for addressing the level of uncertainty. First, use the data as collected, retaining the zero counts, and compute the weighted mean concentration using the equation presented. A national distribution should then be constructed from these average concentrations. Then, go back to the raw counts and adjust the zeros using "worst case" assumptions and recompute the weighted mean and national distribution. If the regulatory impact analysis (RIA) differs for these two cases, then it is important to estimate the uncertainty of the national distribution. If the RIA does not differ very much for these two cases, then quantifying the uncertainty of the estimated national distribution is not so important.

Gilbert also provided some simple options for estimating the uncertainty in the national distribution of site means. One option was to compute the mean of the 300 site means and compute confidence limits on that mean. Another option presented was to estimate the 90th percentile of the 300 site means and then obtain confidence limits on that estimated percentile using a simple nonparametric approach. A third option is to conduct Monte Carlo uncertainty and sensitivity analysis for each site and use that information to put uncertainty bounds on the true national distribution of site mean oocyst concentrations.
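The "simple nonparametric approach" is not spelled out in the summary. One standard possibility, sketched here as an assumption rather than as Gilbert's method, uses order statistics: the number of site means falling below the true 90th percentile is binomial, so a pair of ranks in the sorted data can be chosen to bracket that percentile with the desired confidence. The function names are hypothetical, and the calculation assumes independent sites and a sample large enough for such ranks to exist.

```python
import math

def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

def percentile_ci_ranks(n, p=0.90, level=0.95):
    """1-based ranks (lower, upper) of the sorted data that bracket the
    p-th quantile, plus the coverage probability actually achieved."""
    alpha = 1.0 - level
    lower = 1
    for r in range(1, n + 1):
        if binom_cdf(r - 1, n, p) <= alpha / 2:
            lower = r          # largest rank still inside the lower tail
        else:
            break
    upper = n
    for r in range(1, n + 1):
        if binom_cdf(r - 1, n, p) >= 1 - alpha / 2:
            upper = r          # smallest rank reaching the upper tail
            break
    coverage = binom_cdf(upper - 1, n, p) - binom_cdf(lower - 1, n, p)
    return lower, upper, coverage

lower, upper, coverage = percentile_ci_ranks(300)
print(lower, upper, round(coverage, 3))
```

With the 300 site means sorted in ascending order as `sorted_means`, the interval would run from `sorted_means[lower - 1]` to `sorted_means[upper - 1]`. The returned coverage should be checked, because for small samples no pair of ranks may reach the requested level.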

Gilbert concluded his presentation by noting that the panel was asked to focus on how to cope with the ICR data problems. Gilbert suggested that in the future we should focus on how to avoid these problems. One approach to think about is to use Ranked Set Sampling to obtain water samples that are more representative of the water population of interest.

Also, "how to cope" depends on the use of the data and quality and quantity of data required for making regulatory decisions. The DQO planning process is useful in these situations.

The following points were discussed following Gilbert's presentation:

Should the means be used or the 90th percentiles in evaluating oocyst occurrence? Do these protect public health? If using the 90th percentile, how are we sure that this is not just based on an extreme event? As an alternative, maybe we should determine a level of protection and pull down the percentage from this level. The comment was made that the regulatory level must lower endemic risks as well as eliminate outbreaks. The question is whether one or two numbers (i.e., the mean and 90th percentile) is required to accomplish this need. The 90th was chosen over the mean. To achieve something like a 99th percentile, the sampling design would be prohibitive.

3.4 Interpretation of Cryptosporidium and Giardia Monitoring Data Generated by ICR Program - Dr. Jery Russell Stedinger

Dr. Stedinger provided a presentation on interpreting microbial ICR monitoring data [Attachment 9]. Stedinger began by listing several lessons: 1) at low concentrations many samples have zero counts; 2) a zero count is a zero count and not a concentration below a detection limit; and 3) to correctly analyze the data, it must be represented correctly.

Stedinger provided two examples of Cryptosporidium data sets (from New Jersey and West Virginia) and presented several scenarios of analyzing the data. Stedinger calculated percentiles, means, and CV using lognormal, Poisson, gamma/Poisson, and gamma/beta/Poisson distributions. The beta distribution was introduced to explicitly represent the variability in the ICR method's recovery rate. Lessons learned from evaluating the New Jersey data included: 1) treating data as censored misrepresents variability; 2) the Poisson distribution did not fit count data because of variability in the effective concentrations; and 3) the gamma/beta/Poisson model represents variability in recovery rates and counts, thereby revealing variability in actual concentrations. Lessons learned from evaluating the West Virginia data included: 1) volumes analyzed and oocyst concentrations were inversely related; 2) total counts divided by total volume can give a highly biased estimate of the mean; and 3) the gamma/beta/Poisson model represents variability in counts and recovery rates, thereby better revealing variability of oocysts.
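The layered structure of a gamma/beta/Poisson model can be illustrated by simulation. In the sketch below the true concentration varies as a gamma variable, the recovery rate as a beta variable, and the observed count is Poisson given their product and the volume. All parameter values and function names are hypothetical, and the sketch simulates from the model only. It does not reproduce the way Stedinger fitted the model to the New Jersey or West Virginia data.

```python
import math
import random

def poisson_draw(rng, lam):
    """Poisson variate by Knuth's multiplication method (small means only)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_counts(n, conc_mean, conc_shape, rec_a, rec_b, volume, seed=0):
    """Simulate n observed counts under a gamma/beta/Poisson model."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # True concentration: gamma with the given mean and shape.
        conc = rng.gammavariate(conc_shape, conc_mean / conc_shape)
        # Recovery fraction: beta(rec_a, rec_b), mean rec_a / (rec_a + rec_b).
        recovery = rng.betavariate(rec_a, rec_b)
        counts.append(poisson_draw(rng, conc * recovery * volume))
    return counts

# Hypothetical parameters: 0.1 oocysts/L on average, mean recovery 0.4, 10 L.
counts = simulate_counts(500, 0.1, 0.5, 2.0, 3.0, 10.0, seed=1)
print(sum(counts) / len(counts))            # average observed count
print(counts.count(0) / len(counts))        # fraction of zero counts
```

Comparing the simulated fraction of zeros with exp(-mean count) shows how variability in concentration and recovery produces more zeros than a plain Poisson model with the same average.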

Stedinger suggested that in modeling national oocyst concentrations that we should use all observations and supplemental data from all sites that we can see (i.e., oocysts, effective volumes, and recovery rate distributions). The average national distribution of Cryptosporidium can be derived for each region or season employing sample turbidity and other water quality parameters.

Stedinger noted that for risk analysis, the key concern of public health risks occurs with high doses. At locations where oocyst concentrations vary widely with water quality parameters, the sampling should focus on defining health risks in periods of high risk.

In conclusion, Stedinger noted that to correctly interpret data we must consider volume analyzed, the sampling distribution for counts, and recovery rate distribution. The ICR samples should include information on laboratory, flow levels, Giardia concentrations, turbidity, suspended solids, and other water chemistry parameters. This information will help explain recovery rates and variations in oocyst concentrations.

The following comments were made by participants during this presentation:

  • Recovery rates cannot be measured along with the ambient concentration. Therefore, recovery rates representing a matrix of source water have to be developed.

  • Recovery rates corresponding to a matrix of source water may be correlated with characteristics of the water matrix.

3.5 Encounter Sampling for Polar Bear, Cryptosporidium, Giardia, and other critters - Dr. Lyman McDonald

Dr. McDonald provided a presentation on "encounter sampling" associated with microbial and other similar sampling experiences [Attachment 10]. McDonald provided as an example of this type of monitoring for the aerial survey of polar bears. In encounter sampling not all individuals in a sample are encountered and counted. The probability of encountering and counting an individual may depend on the characteristics of the individual or other factors. McDonald defined this problem as unequal probability sampling from finite sampling theory. When counting individuals, one must divide by the probability of the encounter to adjust for the individuals missed.
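The adjustment described here, dividing each count by its encounter probability, has the form of the Horvitz-Thompson estimator from finite sampling theory. The sketch below is a minimal illustration with hypothetical counts and probabilities.

```python
def adjusted_total(counts, encounter_probs):
    """Estimated true total: each count divided by its encounter probability."""
    if any(p <= 0 or p > 1 for p in encounter_probs):
        raise ValueError("encounter probabilities must lie in (0, 1]")
    return sum(c / p for c, p in zip(counts, encounter_probs))

# Hypothetical survey: three units with different chances of detection.
counts = [3, 0, 1]
probs = [0.5, 0.25, 0.25]
print(adjusted_total(counts, probs))  # 10.0
```

For the ICR data the encounter probability plays the role of the recovery rate, so a zero count remains zero after adjustment while each nonzero count is scaled up.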

McDonald noted that average recovery rates can be used if the interest is in estimating the total number of individuals or mean densities. However, average recovery rates cannot be used to conduct site-specific evaluations because over- and underestimates can occur and the extreme values of densities are not represented in the distribution.

For the ICR data it is important to know if we are interested in the individual sample values (ratios) for each water source or, instead, if we are interested in the summary ratios of the totals. The mean density of oocysts over time is of some value and probably best estimated by the ratios of totals.
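
The difference between the two summaries can be seen in a small sketch. The counts and volumes are hypothetical and no recovery adjustment is applied; the function names are illustrative only.

```python
def ratio_of_totals(counts, volumes_l):
    """Total oocysts counted divided by total volume analyzed."""
    return sum(counts) / sum(volumes_l)


def mean_of_ratios(counts, volumes_l):
    """Average of the individual sample concentrations (count / volume)."""
    ratios = [c / v for c, v in zip(counts, volumes_l)]
    return sum(ratios) / len(ratios)


# Hypothetical samples from one water source; zero counts are kept as zeros.
counts = [0, 3, 0, 1]
volumes_l = [2.0, 10.0, 1.0, 5.0]

print(ratio_of_totals(counts, volumes_l))  # 4 / 18, about 0.222 oocysts/L
print(mean_of_ratios(counts, volumes_l))   # (0 + 0.3 + 0 + 0.2) / 4 = 0.125
```

The ratio of totals weights each sample by the volume analyzed, whereas the mean of ratios gives a 1 L sample the same weight as a 10 L sample.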

McDonald noted the probability of encounters is the recovery rate of oocysts. Also, zeros are zeros in that they are real data and should not be replaced by a nonzero number or a detection limit. Given enough data in the spiking studies, it may be possible to model and estimate unique recovery rates for different source waters.

McDonald questioned whether, under the ICR, we must work with the individual data from each of the 18 samples or whether we can evaluate the ratios of totals across time. McDonald also suggested that the data need to be adjusted for recovery when making inferences about the mean of the distribution and when summarizing the data for each water source by the ratio of totals. McDonald would not make adjustments for recovery if the objective is to make inferences about the 90th percentile of the data for a given water source.

3.6 The Rat, the Dancing Chicken, and the Fireman - Dr. Lawrence Mayer

Dr. Mayer provided a general presentation on the discussion of the problem statement [Attachment 11]. Mayer noted that his perspective is as an epidemiologist who has worked to protect the public health and welfare. Mayer made several introductory comments including: 1) Cryptosporidium as a public health problem is here to stay and we must concentrate on minimizing its impact; 2) Cryptosporidium is transmitted by routes other than water; 3) the disease mechanism of cryptosporidiosis is not known; 4) there are no good treatments for cryptosporidiosis; and 5) a single oocyst could cause an infection and death.

Mayer reinforced several previously made statistical comments including: 1) zero counts are zero counts; 2) plotting counts against volumes and similar plots will be useful; and 3) estimating the national distribution of averages is a different task than estimating if a particular water plant has a problem. Mayer also provided additional thoughts on the application of statistics including: 1) no threshold on the number of oocysts will provide a rational policy with regard to risk; 2) statistics is not an effective tool for risk analysis; and 3) epidemiology splits into public health and clinical epidemiology and they are not always the same.

Mayer suggested alternative issues to consider. The group may want to consider a stratified analysis by type of plant, seasonality, and demographics of population to determine what drives the counts. Also, consider an empirical Bayesian approach by using the sample plan measure level to determine when to sample next and to build an estimate of prior odds. Avoid transforming the data. In addition, epidemiological measures may prove useful (e.g., attributable risk).

Mayer also raised the concern of equilibrium and non-equilibrium failures. The former might result from the occurrence of unusual combinations of oocyst concentrations in raw water and poor water treatment plant performance. Their frequency can be estimated from the data sets being collected. Non-equilibrium failures result from events which are not reflected in available data sets, such as sabotage or gross mismanagement of a water plant.

4. Open Discussion

After the six panelists made their presentations, an open discussion of the issues was conducted involving all workshop participants. Questions and issues were raised by the workshop participants based on the presentations by the six panelists. This section is organized into a summary of the discussion followed by a list of issues and action items captured at the workshop.

4.1 Summary of Discussion

Question 1: How should a national distribution of source water oocyst concentration be derived?

There are different approaches available to calculate the 90th percentile. A list could be generated of the various approaches. Price mentioned that the problem with using the 90th percentile, or any other summary statistic, is the necessity to adjust for recovery (i.e., the adjustment and its characteristics are the statistical problem). The 90th percentile has more meaning for some people in that there usually is an actual sample value associated with the 90th percentile as opposed to a zero. It was mentioned that the 90th percentile may have more meaning for people when they are comparing their source water oocyst concentrations. Price continued that the confidence interval for the 90th percentile is much wider than the confidence interval for the mean (i.e., in any sample, the 90th percentile as an estimator of the true 90th percentile of the distribution has less precision than the average as an estimator of the true mean).
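
Price's point about precision can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that concentrations follow a lognormal distribution and that 18 samples are taken per site; the order-statistic definition of the 90th percentile is only one of the possible approaches, and all names and parameter values are hypothetical.

```python
import math
import random
import statistics


def percentile_90(values):
    """Order-statistic 90th percentile (one of several possible definitions)."""
    ordered = sorted(values)
    rank = math.ceil(0.9 * len(ordered))  # 1-based rank of the order statistic
    return ordered[rank - 1]


def estimator_spreads(n_samples=18, n_replicates=2000, seed=1):
    """Standard deviation of the sample mean and of the sample 90th percentile
    across repeated samples from an ASSUMED lognormal(0, 1) distribution."""
    rng = random.Random(seed)
    means, p90s = [], []
    for _ in range(n_replicates):
        sample = [rng.lognormvariate(0.0, 1.0) for _ in range(n_samples)]
        means.append(statistics.mean(sample))
        p90s.append(percentile_90(sample))
    return statistics.stdev(means), statistics.stdev(p90s)


spread_mean, spread_p90 = estimator_spreads()
print(f"spread of sample mean:            {spread_mean:.2f}")
print(f"spread of sample 90th percentile: {spread_p90:.2f}")
```

Under these assumptions the 90th percentile estimate varies considerably more from sample to sample than the mean does, which corresponds to a wider confidence interval.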

EPA must develop a regulation and the regulation will be based on the best data available using best judgement. The decision to use the 90th percentile to characterize source water oocyst occurrence was developed during the 1992-93 Regulatory Negotiation Committee meetings.

Correlations, Modeling, and Graphical Techniques

The use of modeling and Monte Carlo techniques was suggested as a tool for developing national distributions and the interrelationships between variables. It would be possible to use regression and plotting techniques on the variables associated with the initial recovery rates (from the Spiking Study) presented earlier in the day. Graphics could be used to show variations and relationships. Empirical data could be used to examine recovery rates to study issues like laboratory variation.

It may be useful to correlate Cryptosporidium oocyst concentrations with water quality parameters such as turbidity. How would adjusting oocyst concentrations by recovery rates impact these types of analyses? Will modifying all oocyst data by recovery rates diminish the ability to correlate? Or would using the unadjusted data be more useful?
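
One part of these questions can be explored with a small sketch: dividing every concentration by a single average recovery rate only rescales the data, so the Pearson correlation with turbidity is unchanged, whereas sample-specific recovery adjustments can change it. All data, recovery rates, and names below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations for one source water.
turbidity = [1.0, 2.5, 4.0, 6.0, 9.0]
raw_conc = [0.0, 0.1, 0.3, 0.2, 0.8]

# Adjustment by one average recovery rate (assumed value).
avg_recovery = 0.12
adjusted_const = [c / avg_recovery for c in raw_conc]

# Adjustment by sample-specific recovery rates (assumed values).
recoveries = [0.30, 0.05, 0.30, 0.05, 0.30]
adjusted_varying = [c / r for c, r in zip(raw_conc, recoveries)]

r_raw = pearson(turbidity, raw_conc)
r_const = pearson(turbidity, adjusted_const)
r_varying = pearson(turbidity, adjusted_varying)
print(r_raw, r_const, r_varying)
```

Whether the sample-specific adjustment strengthens or weakens a correlation depends on how the recovery rates themselves relate to the water quality parameter, which the sketch does not address.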

Estimating the Recovery Rate

The recovery rate estimate is critical to the evaluation of the ICR data. The recovery rate needs to be properly estimated; otherwise, bias may become an issue. A remaining question is whether there is a minimum effective volume for the purpose of estimating the recovery rate. Because the Supplemental Survey is being conducted with the new Method 1622 and not the ICR method, it could not be modified to assist in estimating a recovery rate for the ICR.

Use of 18 Months or 12 Months of data

If we base the analysis on the mean concentration then we cannot use 18 months of data and must use 12 months of data. It would be possible to use 12 months of data based on a moving window. It was mentioned that if the weighted average is used then monthly samples are not counted equally. It was noted that 18 monthly data points were collected so that there would be an overlay of 6 months.
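
The moving-window idea mentioned above can be sketched as follows: 18 monthly values yield seven overlapping 12-month windows. The function name and the monthly values are hypothetical, and each month is weighted equally within a window.

```python
def moving_window_means(monthly_values, window=12):
    """Mean of every consecutive `window`-month span of the monthly values."""
    if len(monthly_values) < window:
        raise ValueError("need at least one full window of data")
    n_windows = len(monthly_values) - window + 1
    return [
        sum(monthly_values[start:start + window]) / window
        for start in range(n_windows)
    ]


# Hypothetical 18 monthly concentrations (oocysts/L); zeros stay zeros.
monthly = [0.0, 0.2, 0.0, 0.0, 1.5, 0.0, 0.3, 0.0, 0.0,
           0.0, 0.4, 0.0, 0.0, 0.1, 0.0, 0.0, 2.0, 0.0]

window_means = moving_window_means(monthly)
print(len(window_means))  # 7 windows: months 1-12, 2-13, ..., 7-18
print(window_means)
```

Each window covers every calendar month exactly once, so no season is counted twice within a single 12-month mean.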

Geographical and Temporal Differences

There may be a need for estimating the distribution of concentrations of source waters for different regions and other characteristics such as water body type. Site-by-site characterization was not the intended use of the ICR data. There also would be temporal variability. The ICR data set does not account for 100-year events.

Purpose of ICR data

What are the intended purposes of the ICR data? For example, will the data be used for characterizing individual sites? Or would the data be better suited for characterizing regions or just characterizing oocysts nationally? What would be the basis for using Monte Carlo simulations during the regulatory development process? It was mentioned that people are interested in the distribution of oocyst concentrations. People are not necessarily interested in the distribution of means or the distribution of 90th percentiles.

When oocyst data are reported, two pieces of information are needed. In order to correctly interpret the data, an analyst needs to know the concentration and the counts (or the counts and the volumes of water analyzed). Without both, the usefulness of the information is compromised, since it is not possible to make sense of data reported only as concentrations with "less than detections." The idea of sensitivity introduced by Price is important in this regard, or perhaps the "unit" of analysis (one over the effective volume) could be reported and would be a better term.
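
A minimal sketch of a record that keeps both pieces of information is given below. The class and field names are hypothetical; the "unit" is computed as one over the effective volume, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleResult:
    """One sample, stored as a count and the effective volume analyzed."""
    count: int
    effective_volume_l: float

    @property
    def unit(self):
        """Concentration represented by a single oocyst: 1 / effective volume."""
        return 1.0 / self.effective_volume_l

    @property
    def concentration(self):
        """Unadjusted concentration (oocysts/L): count x unit."""
        return self.count * self.unit


# Hypothetical results: two zero counts that carry different information.
small_volume = SampleResult(count=0, effective_volume_l=0.5)
large_volume = SampleResult(count=0, effective_volume_l=10.0)

for result in (small_volume, large_volume):
    print(result.count, result.unit, result.concentration)
```

Both samples report a concentration of zero, but the units (2.0 versus 0.1 oocysts/L) show how much more sensitive the larger-volume sample was; that distinction is lost when only a concentration is reported.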

Dr. Stedinger summarized three purposes for ICR data:

    1. Characterization of a particular site.

    2. Characterization of region or nation (requires a probability density function and distribution concentration for Crypto.)

    3. Basis of Monte Carlo simulation of regulatory procedures (requires a probability density function and distribution concentration for Crypto.)

Further Participation of Panelists

It was suggested that the expertise of the six panelists may be needed in this and other deliberations on an ongoing basis.

4.2 List of Comments, Issues, and Possible Action Items

List of Specific Comments and Issues

  • Evaluate whether and when to use the 90th percentile and when to use the continuous mean; for example, after recovery adjustment.

  • How should the 90th percentile be calculated?

  • Confidence intervals around mean and percentiles should be included.

  • Statistical implications should be differentiated from policy when using percentiles or means.

  • Need to address whether to use 12 or 18 months of data and whether to adjust for replication of the two seasons and single year data from other two seasons.

  • Suggest that the estimates of regional distributions be computed. It is difficult to estimate national distributions of concentrations.

  • Estimate regional distributions by source, water body type, and time of year.

  • The estimate of recovery rate and its distribution is critical.

  • Use modeling to develop national distributions and interrelationships between variables.

  • Break out variables by source water, time of year, and water body type.

  • For future studies: specify that a minimum volume has to be sampled.

  • Use empirical data to look at recovery rates and laboratory variation.

  • Use regression modeling on Spiking Study recovery rates and use graphics to show variation.

  • Do not assign values to samples that counted zero (i.e., count zeros).

  • Develop questions on what variables we should look at and what variables are important to evaluate.

  • In the future, it would be desirable for the method to specify a constant volume and a constant effective volume, to improve the comparability of results and to ease statistical analysis.

Other Comments

  • The 90th percentile of 18 monthly sample values appears to be biased high relative to the 90th percentile for the true concentrations in the bulk water being sampled.

  • The 90th percentile may be too high when we do not have even distributions and when we multiply by recovery factors.

  • How should we correlate Cryptosporidium to different variables? Will modifying the oocyst data by recovery factors diminish the ability to conduct correlations with other water quality parameters? If it does, should we use raw data? We must look at correct data in relation to variables and note when we don't have accurate correction factors.

  • We may want to look at the distribution of Cryptosporidium across the nation and characterize extremes by looking at the high end of the distribution.

  • There is a need for more of this kind of expertise on an ongoing basis throughout the M/DBP process.

Questions with Sampling and Suggested Vocabulary

  • Lab Spiking Study: Is there a random distribution of oocysts in the concentrated pellet? Is there random variability/heterogeneity in these samples?

  • How should false positives be addressed?

  • What is the preferred way to describe the distribution of occurrence?

  • Should we use Monte Carlo or other modeling techniques?

