USGS - science for a changing world

Upper Midwest Environmental Sciences Center

Long Term Resource Monitoring Program (LTRMP)

Estimating Variance Components using LTRMP Survey Data


Introduction

Variance components analysis is often used to assess the proportions of variance attributable to random components within an experimental design setting. Such analyses are used to estimate the relative importance of variance components and may also be useful for power analyses. Variance components analyses using LTRMP data face a number of challenges, including the presence of fixed effects, strata, unbalanced designs, spatial and temporal correlation, skewed data, and nonproportional sampling. These issues are addressed below.

Fixed vs. Random Effects

Efforts to extend variance components analysis to data sets containing variances associated with both random and fixed effects must, at minimum, ensure that the interpretations of variance estimates from these two types of effects are not confused. Fixed effects are specific, deliberately selected effects, while random effects are arguably interchangeable with other effects from a larger population. Consequently, the variance estimates for fixed effects may more properly be termed pseudo or finite variances.
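To make the distinction concrete, the following minimal sketch computes a finite ("pseudo") variance for a fixed set of effects. It assumes the common convention of dividing the sum of squared deviations by the number of effects minus one; the function name and the season effects are hypothetical and are not LTRMP estimates.

```python
def finite_variance(effects):
    """Finite ("pseudo") variance of a fixed, fully enumerated set of effects.

    Assumes the divisor a - 1, where a is the number of effects.
    """
    a = len(effects)
    if a < 2:
        raise ValueError("need at least two effects")
    mean = sum(effects) / a
    return sum((e - mean) ** 2 for e in effects) / (a - 1)


# Hypothetical season effects (spring, summer, fall) on the response scale.
season_effects = [-1.0, 0.0, 1.0]
season_finite_var = finite_variance(season_effects)
```

Because the three seasons are the only levels of interest, this quantity describes the spread among those particular effects rather than estimating the variance of a larger population of seasons.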

For the LTRMP, fixed effects include effects associated with field station, possibly spatial strata (these could also be viewed purely as restrictions on randomization), season, and their interactions. By contrast, components associated with annual sampling events are often treated as random. This is based on the assumptions that we have no a priori interest in particular years and that, apart from some specific model, sampled years are interchangeable with some larger set of years. (Given that most monitoring programs have short lifetimes and that years are not selected randomly, this assumption may be open to criticism.) Components associated with year include year and any interaction with year (e.g., year * season). Less commonly considered sources of variation may also be treated as random (e.g., backwater lakes within a given backwater stratum, sampling day within a multiday sampling period, observer effects) but are not considered further in this document.

Unbalanced Designs

Variance component estimates from unbalanced designs are generally approximate. As the design underlying virtually any analysis of LTRMP data will be unbalanced, all LTRMP variance component estimates should be viewed as approximate. Possible exceptions are analyses in which inferences are confined to a single stratum within reaches and, for the macroinvertebrate component, analyses of year effects within single reaches (with strata effects ignored).

Parametric Methods for Estimating Variance Components

Likelihood-based methods of estimating variance components are well developed but, as popularly used, are appropriate only for estimating variance components from normally distributed data. For the LTRMP, typically only water quality data will appear essentially normal, and then only where missing (below-detection) values are a trivial proportion of the data or have been imputed. Data from other sources cannot be made normal without compromising the information contained in those data.

Parametric methods of estimating variance components using discrete data have traditionally been viewed as challenging for all but the simplest models (Searle et al. 1992). Recently, some advances have been made for estimating variance components for binary and binomial data (Snijders and Bosker 1999; Goldstein et al. 2002; Browne et al. 2005). However, implementing these methods or interpreting their results will often be challenging, particularly in the presence of fixed effects, nonproportional sampling, stratification, or spatial or temporal correlation. We are unaware of corresponding advances for count data.
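As one illustration of the binary-data approaches cited above, the sketch below computes a variance partition coefficient using the latent-variable formulation for a random-intercept logistic model, in which the level-1 variance is fixed at pi^2/3 (the variance of the standard logistic distribution). It assumes a two-level model with no fixed effects, weights, or correlation; the function name and the input value are hypothetical.

```python
import math


def latent_vpc(sigma2_u):
    """Share of latent-scale variance attributable to the random intercept.

    sigma2_u is the random-intercept variance on the logit scale.
    The level-1 variance is taken as pi^2 / 3 (standard logistic).
    """
    if sigma2_u < 0:
        raise ValueError("variance must be nonnegative")
    return sigma2_u / (sigma2_u + math.pi ** 2 / 3)


# Hypothetical random-intercept (e.g., year) variance on the logit scale.
vpc_year = latent_vpc(0.5)
```

The result is interpretable on the latent scale only; other formulations discussed in the cited references (e.g., linearization or simulation) may give different proportions for the same fitted model.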

Spatial and Temporal Correlation

Spatial and temporal correlation, where present in LTRMP data, may be presumed to be positive and may occur at any spatial or temporal scale. For example, spatial correlation may occur among sampled observations and/or among stratum-specific or field station means, while temporal correlation may occur among observations from the same site (vegetation component, Pool 8) and among annual stratum- or field station-specific means. Absent evidence to the contrary, nontrivial spatial correlation should be presumed among samples from the vegetation and water quality components, and among strata and reach means from all components. Similarly, absent evidence to the contrary, nontrivial temporal correlation should be presumed for site-specific data from the vegetation component in Pool 8 (years 2001 through 2004) and among strata and reach means from all biotic components. In all instances, failing to account for such correlation will lead to underestimation of one or more variance components. Unless data are either presumed normally distributed or are binomial and independent of strata, explicit adjustment for spatial and/or temporal correlation will be challenging. One caveat is that spatial correlation at the sampling scale may possibly be ignored if the purpose of the analysis relates solely to the design.
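A simple way to see why ignoring positive correlation understates uncertainty is to compare the variance of a mean under independence with its variance under correlation. The sketch below assumes equal pairwise correlation among all observations (a simplification; real spatial or temporal correlation usually decays with distance or lag), and the numbers are hypothetical.

```python
def variance_of_mean(sigma2, n, rho=0.0):
    """Variance of the mean of n observations with common variance sigma2
    and common pairwise correlation rho (equicorrelation is an assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma2 / n * (1.0 + (n - 1) * rho)


# Hypothetical values: 10 sites, unit variance, modest positive correlation.
naive = variance_of_mean(1.0, 10)               # treats sites as independent
adjusted = variance_of_mean(1.0, 10, rho=0.2)   # allows for the correlation
inflation = adjusted / naive                    # how far the naive value falls short
```

With positive rho the inflation factor exceeds one and grows with n, so an analysis that treats correlated observations as independent attributes too little variance to the corresponding means.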

Nonparametric Methods

Where data are nonnormal and unbalanced, variance components analysis typically proceeds by equating observed mean squares with their expected values (the method of moments). This method presumes that data and means are independent (i.e., not spatially or temporally correlated). Expected mean squares may differ depending on whether they derive from fixed or random effects. This means that simply treating fixed effects as random effects, without appropriate adjustments, may yield incorrect estimates of the variances associated with the fixed effects.
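The mean squares approach can be sketched for the simplest case: a single random factor (say, year) with unequal numbers of observations per level. The code below assumes independent observations, no fixed effects, and no sampling weights, so it is an illustration of the technique rather than an analysis suited to LTRMP data as they stand; the function name and data are hypothetical.

```python
def anova_variance_components(groups):
    """Method-of-moments estimates for an unbalanced one-way random model.

    groups: list of lists, one inner list of observations per level.
    Returns (between-level variance, within-level variance, n0), where
    E[MS_between] = sigma2_within + n0 * sigma2_between.
    """
    a = len(groups)
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    if a < 2 or total_n <= a:
        raise ValueError("need at least two levels and some replication")

    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)

    ms_between = ss_between / (a - 1)
    ms_within = ss_within / (total_n - a)

    # Effective group size for unbalanced data.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (a - 1)

    sigma2_within = ms_within
    sigma2_between = (ms_between - ms_within) / n0
    return sigma2_between, sigma2_within, n0


# Hypothetical observations from three years with unequal sample sizes.
years = [[1.0, 3.0], [4.0, 6.0, 8.0], [10.0, 12.0]]
s2_year, s2_resid, n0 = anova_variance_components(years)
```

The between-level estimate can be negative when the between mean square is smaller than the within mean square; how to handle that case (for example, truncating at zero) is a separate decision not made here.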

Corrections for Nonproportional Sampling

If inference to the sampled population is desired, variance component estimation using data derived from designs that include nonproportional sampling, such as are used in the LTRMP designs, must weight observations by sampling weights (see Estimating Means and Standard Errors from LTRMP Survey Data; Courbois and Urquhart 2004 [p. 249]).
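The effect of weighting can be sketched as follows. The example assumes stratified random sampling with weights equal to stratum size divided by stratum sample size, and it uses a simple weighted plug-in estimator of the population variance. That estimator is one option among several and is not necessarily the one recommended by Courbois and Urquhart (2004); the strata, sizes, and values are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted (ratio-type) estimate of the population mean."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def weighted_variance(values, weights):
    """Simple weighted plug-in estimate of the population variance.

    Uses the divisor (sum of weights - 1); an illustrative choice only.
    """
    total_w = sum(weights)
    mean = weighted_mean(values, weights)
    return sum(w * (y - mean) ** 2 for y, w in zip(values, weights)) / (total_w - 1)


# Hypothetical design: stratum A has 100 units, stratum B has 20 units,
# and 2 units are sampled from each (nonproportional sampling).
values = [1.0, 3.0, 10.0, 12.0]
weights = [100 / 2, 100 / 2, 20 / 2, 20 / 2]

w_mean = weighted_mean(values, weights)
w_var = weighted_variance(values, weights)
unweighted_mean = sum(values) / len(values)
```

Here the unweighted mean is pulled toward the small, oversampled stratum, while the weighted mean reflects the much larger stratum A; variance estimates are affected in the same way.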

Will Variance Components Inferences Complement LTRMP Status and Trend Information?

Maybe. A major impediment to treating variance components inferences as complementary to status and trend information is that all three sets of estimates are derived using different methods and may be derived under different assumptions. Comparability among methods for both variance components and status and trend estimation may be poorly known. Another impediment is that, for nonlinear data, variances at the sampling and aggregated (e.g., year) scales are typically presumed to vary on different distributional scales (e.g., for counts, on count and log scales, respectively). The relationship between variance components at these different scales may be complex. Inferences for data that are ostensibly normally distributed may be expected to be qualitatively complementary.
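The scale issue for counts can be illustrated with the lognormal distribution. The sketch below assumes that annual means are lognormally distributed, so that a variance specified on the log scale maps to the count scale through the standard lognormal moment formulas; this distributional choice and the numbers are assumptions for illustration only.

```python
import math


def lognormal_moments(mu, sigma2):
    """Mean and variance on the count scale implied by a normal
    distribution with mean mu and variance sigma2 on the log scale."""
    mean = math.exp(mu + sigma2 / 2.0)
    variance = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, variance


# Same hypothetical log-scale year variance, two different abundance levels.
low = lognormal_moments(math.log(2.0), 0.25)
high = lognormal_moments(math.log(20.0), 0.25)
```

The two cases share one log-scale variance but have very different count-scale variances, which is one reason that variance proportions computed on one scale need not carry over to the other.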

We note that Kincaid et al. (2004) published estimated variance proportions for proportion and count monitoring data. The focus of that study, however, was strictly monitoring, rather than the combined monitoring and research foci that typically characterize LTRMP analyses. In addition, the data used in that study did not derive from a stratified design and did not include fixed effects (i.e., represented a two-level system).

Multivariate Responses

The above comments were written with univariate outcomes in mind. Multivariate responses may be addressable using methods described by Borcard et al. (2004) or in references therein.

References

Borcard, D., P. Legendre, C. Avois-Jacquet, and H. Tuomisto. 2004. Dissecting the spatial structure of ecological data at multiple scales. Ecology 85:1826-1832.

Browne, W. J., S. V. Subramanian, K. Jones, and H. Goldstein. 2005. Variance partitioning in multilevel logistic models that exhibit over-dispersion. Journal of the Royal Statistical Society, Series A 168:599-614.

Courbois, J-Y. P., and N. S. Urquhart. 2004. Comparison of survey estimates of the finite population variance. Journal of Agricultural, Biological, and Environmental Statistics 9:236-251.

Goldstein, H., W. Browne, and J. Rasbash. 2002. Partitioning variation in multilevel models. Understanding Statistics 1:223-232.

Kincaid, T. M., D. P. Larsen, and N. S. Urquhart. 2004. The structure of variation and its influence on the estimation of status: indicators of condition on lakes in the Northeast, U.S.A. Environmental Monitoring and Assessment 98:1-21.

Lohr, S. L. 1999. Sampling: Design and analysis. Duxbury Press Publishing Company, Pacific Grove, California.

Searle, S. R., G. S. Casella, and C. E. McCulloch. 1992. Binary and discrete data. Pages 367-377 in S. R. Searle, G. S. Casella, and C. E. McCulloch, editors. Variance components. John Wiley & Sons, New York.

Snijders, T. A. B., and R. J. Bosker. 1999. Multilevel analysis. Sage, London. 266 pp.

Contact: Further information about variance component analysis using LTRMP data may be obtained from Brian Gray, LTRMP statistician, Upper Midwest Environmental Sciences Center, La Crosse, Wisconsin, at brgray@usgs.gov.

URL: http://www.umesc.usgs.gov/ltrmp/variance.html
Page Last Modified: October 2, 2007