STATISTICAL METHODS AND SOFTWARE FOR THE

ANALYSIS OF OCCUPATIONAL EXPOSURE DATA WITH

NON-DETECTABLE VALUES

 

 

Edward L. Frome

Computer Science and Mathematics Division

Oak Ridge National Laboratory

 

Paul F. Wambach

U. S. Department of Energy

 

Date Published:  September  2005

 

 


 

 

CONTENTS

 

ABSTRACT

1.  INTRODUCTION

2.  STATISTICAL ANALYSIS FOR COMPLETE SAMPLES
     2.1  CONFIDENCE LIMITS FOR THE MEAN EXPOSURE LEVEL
     2.2  CONFIDENCE LIMITS FOR THE pTH PERCENTILE
     2.3  CONFIDENCE LIMITS FOR EXCEEDANCE FRACTION

3.  ANALYSIS OF DATA WITH NON-DETECTS
     3.1  MAXIMUM LIKELIHOOD ESTIMATION FOR LOGNORMAL DATA WITH NON-DETECTS
     3.2  CONFIDENCE LIMITS FOR THE MEAN EXPOSURE LEVEL WITH NON-DETECTS
     3.3  CONFIDENCE LIMITS FOR THE pTH PERCENTILE WITH NON-DETECTS
     3.4  CONFIDENCE LIMITS FOR EXCEEDANCE FRACTIONS WITH NON-DETECTS
     3.5  NON-PARAMETRIC METHODS FOR SAMPLES WITH NON-DETECTS
     3.6  NON-PARAMETRIC UPPER TOLERANCE LIMIT AND EXCEEDANCE FRACTION

4.  APPLICATIONS
     4.1  EXAMPLE 1.  SURFACE WIPE SAMPLES FROM ELEVATED SMELTER SURFACES
     4.2  EXAMPLE 2.  TWA BERYLLIUM EXPOSURE DATA

5.  DISCUSSION

6.  ACKNOWLEDGEMENTS

REFERENCES

APPENDIX

 

ABSTRACT

 

Environmental exposure measurements are, in general, positive and may be subject to left censoring; i.e., the measured value is less than a “detection limit.”  In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit.  Parametric methods used to determine acceptable levels of exposure are often based on a two-parameter lognormal distribution.  The mean exposure level, an upper percentile, and the exceedance fraction are used to characterize exposure levels, and confidence limits are used to describe the uncertainty in these estimates.  Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations.  In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described, and graphical methods are used to evaluate the lognormal assumption.  If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used.  The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on an upper percentile (i.e., the upper tolerance limit) is obtained using a nonparametric approach.

 

All of these methods are well known, but computational complexity has limited their use in routine data analysis with left censored data.  The recent development of the R environment for statistical data analysis and graphics has greatly enhanced the availability of high-quality nonproprietary (open source) software that serves as the basis for implementing the methods in this paper.  Numerical examples are provided, and R [R Development Core Team (2004)] functions are available at the analysis of occupational exposure data (AOED) web site.

 

 

 

 

 

 

 

1.  INTRODUCTION

 

Regulatory and advisory criteria for evaluating the adequacy of occupational exposure controls are generally expressed as limits that are not to be exceeded in a work shift or a shorter time period if the agent is acutely hazardous.  Exposure monitoring results above the limit require minimal interpretation and should trigger immediate corrective action.  Demonstrating compliance with a limit is more difficult.

The American Industrial Hygiene Association (AIHA) has published a consensus standard with two basic strategies for evaluating an exposure profile [Mulhausen and Damiano (1998)].  The first approach is based on the mean of the exposure distribution, and the second approach considers the “upper tail” of the exposure profile.  Statistical methods for estimating the mean, an upper percentile of the distribution, the exceedance fraction, and the uncertainty in each of these parameters are described.  Most of the AIHA methods are based on the assumptions that the exposure data do not contain non-detects and that a lognormal distribution can be used to describe the data.  Exposure monitoring results from a compliant workplace tend to contain a high percentage of non-detected results when the detection limit is close to the exposure limit, and in some situations the lognormal assumption may not be reasonable.  There are parametric methods for censored lognormal data, and non-parametric methods, that can be used with left censored data to calculate all of the statistics recommended by the AIHA for the complete data case.  However, the only practical way to compute these statistics is with statistical software.  The recent availability of free, high-quality statistical software means that complex calculations are no longer a barrier to the statistical analysis of almost any occupational exposure data set.  This also eliminates the need for the special tables and graphical methods that are used in the complete data case for the lognormal distribution.

 

Statistical methods for the analysis of right censored data using various parametric and non-parametric methods are well known and generally referred to as “survival analysis” [Cox and Oakes (1984) or Kalbfleisch and Prentice (1980)].  In this situation, the dependent or response variable (say T) is usually the time to the occurrence of an event; i.e., the “survival time” (or time to failure) of an observational or experimental unit (e.g., animal, person, or machine).  T may be referred to as a “lifetime random variable”; it is by definition positive and may be subject to “censoring.”  As a typical example, let Ti represent the survival time of the ith patient in a clinical trial.  If the trial ends and the patient is not known to have “failed,” the observed survival time, say ti, is right censored (i.e., it is only known that Ti is greater than ti).  This can occur for several reasons.  Suppose, for example, that all patients enter the trial at the same time and are followed until a specified end date; then those individuals still at risk have a censored survival time that is the same for all surviving patients (Type I censoring).  If patients enter the trial at random and the trial ends at a fixed date, then the value of ti is different for each surviving patient (random censoring).  Statistical methods for the analysis of right censored data are widely used, and computer software for survival analysis is available in most general purpose statistical programs (e.g., the R survival library).

 

In this report, the dependent or response variable of interest is the amount, say X, of a measured quantity.  X is a positive random variable, and as a result of the analytic methods used, the observed value for the ith measurement may be reported as (left) “censored” and is referred to as a non-detect or as being less than a “detection limit” (DL), say DLi (i.e., it is only known that Xi is less than DLi).  A frequent assumption is that the distribution of X is lognormal.  See Aitchison and Brown (1969) and Crow and Shimizu (1988) for a general treatment of the lognormal distribution and its applications.  Schmoyer et al. (1996) considered the lognormal model for contaminant concentrations in environmental risk assessment for both complete and left censored samples.  Akritas et al. (1994) provide a detailed discussion of various methods that have been proposed for parameter estimation for left censored data.  Methods were classified into three general areas as described by Helsel (1990).  They are: (1) simple substitution, e.g., replace a censored observation with one-half of its value; (2) parametric methods, e.g., censored data maximum likelihood (ML); and (3) “robust parametric methods” based on variations of “probability plot regression.”  Simulation studies have been done to compare these methods under various conditions, with the primary focus on bias and mean square error of location and scale parameters [see Helsel and Cohn (1988), Newman et al. (1989)].  Akritas et al. (1994) have reviewed these and other studies, and note that ML methods under the lognormal model provide expressions for the variance of the parameter estimates.  This is important when an upper percentile, say Xp, of the exposure distribution is of interest.  For example, the ML estimate of log(Xp) is a linear combination of the lognormal parameters, and the standard error of this quantity can be estimated from the ML parameter covariance matrix.  Consequently, confidence limits for both Xp and the exceedance fraction can be obtained using the ML approach.  Taylor et al. (2001) have noted that regarding all non-detected values as censored outcomes from a lognormal distribution may not always be appropriate.  If there is reason to believe that a non-negligible proportion of the non-detects are “true zero” exposures, then a censored lognormal mixture model (a zero-inflated lognormal model with censoring) should be considered.  ML methods for estimation and hypothesis testing are described, as is the relationship between ML parameter estimates from the mixture model and those based on either a left truncated or a censored lognormal model.  Moulton and Halsey (1995) emphasize that it is also possible that non-detects may be from a second (possibly lognormal) distribution rather than a point mass at zero.  Fowlkes (1979) has described methods for studying the mixture of two lognormal distributions, although he did not consider left censored data.  In situations where the lognormal model (or some other distribution such as the gamma or Weibull) is not reasonable, a non-parametric approach can be used.  All of the methods just described can be implemented using R, the ML method being the most difficult.  It is also possible to develop procedures for all of these methods in several proprietary statistical programs that are available commercially.
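To make the ML computations concrete, the following minimal sketch (an illustration, not the report's mlndln function described in the Appendix) fits the lognormal model to left censored data with the R base function optim.  A detect contributes the normal density of y = log(x) to the log likelihood; a non-detect contributes the normal probability below the log of its detection limit:

cenlnorm.mle <- function(x, det) {
  ## x: measured value for a detect, or the detection limit for a
  ## non-detect; det: 1 for a detect, 0 for a non-detect
  y <- log(x)
  negll <- function(par) {
    mu <- par[1]; sig <- exp(par[2])    # exp() keeps sigma positive
    -sum(det * dnorm(y, mu, sig, log = TRUE) +
         (1 - det) * pnorm(y, mu, sig, log.p = TRUE))
  }
  fit <- optim(c(mean(y), log(sd(y))), negll, hessian = TRUE)
  ## solve(hessian) is the large-sample covariance matrix on the
  ## (mu, log sigma) scale; the delta method converts it to (mu, sigma)
  list(mu = fit$par[1], sigma = exp(fit$par[2]), cov = solve(fit$hessian))
}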

 

In the discussion that follows, the generic term “acceptable” refers to the situation where the distribution of the exposure measurement X satisfies a specified criterion indicating, for example, that the workplace is “safe” or that the surfaces of a survey unit are “clean.”  The term survey unit describes all or part of an entity (e.g., building, piece of equipment) that is being evaluated.  The term “unacceptable” means that the distribution of the exposure measurements indicates that the workplace or survey unit is “contaminated” or “hazardous.”  The formal statistical procedure used to demonstrate that an exposure distribution is acceptable is to state a null hypothesis of the form H0: θ ≥ L, where θ represents a parameter of the exposure distribution (e.g., the mean, a percentile, or the exceedance fraction), and θ ≥ L indicates that the exposure distribution is unacceptable.  Then, based on a random sample from the exposure distribution, an estimate of θ and an upper confidence limit (UCL) with a specified confidence level, say γ, are calculated.  If the 100γ% UCL is less than L, then the null hypothesis is rejected and the exposure distribution is acceptable.  A Type I error occurs if H0 is rejected when it is true (i.e., the X distribution is incorrectly considered to be acceptable).  This will occur with a probability (Type I error rate) that is less than or equal to α = (1 − γ), with α = 1 − γ when θ = L.  These and other related procedures are described in detail in an occupational exposure context by Mulhausen and Damiano (1998).
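Expressed in R, the decision rule is a one-line comparison.  The following sketch is only illustrative; the UCL itself would come from one of the interval procedures described in Sections 2 and 3:

## Reject H0: theta >= L, and declare the exposure profile acceptable,
## only when the 100*gam% UCL for theta falls below the limit L
is.acceptable <- function(ucl, L)  ucl < L
is.acceptable(ucl = 3.24, L = 5)   # TRUE: reject H0, profile acceptable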

 

 

 

4.  APPLICATIONS

 

In several situations of practical interest, statistical analysis of left censored data from a lognormal distribution is required.  The "exact" results for complete samples described in Section 2 have not been developed for censored data.  The methods presented here are "large sample" results and follow directly from the properties of ML estimators described in Section 3.  Each of the examples will describe the censored data equivalent of the exact methods used with complete samples.  The emphasis here is on describing the methods and software.  The focus of the examples is two areas of application that are part of the Department of Energy (DOE) Chronic Beryllium Disease Prevention Program.  The DOE is concerned with monitoring objects (e.g., equipment, buildings) for beryllium contamination and workers for exposure to beryllium in the workplace.  The first example describes the results of a survey to evaluate possible beryllium “contamination” based on surface wipe sampling of a smelter facility used to recycle metal.  The second example describes the results of a beryllium worker-monitoring program using 8-hour time-weighted average (TWA) measurements.  In both situations “limit values” have been established to determine if exposure levels are acceptable; i.e., the object is “clean” or the workplace is “safe.”  In general, the limit value will depend on the strategy that is being used, as described in the introduction.  In the examples, the null hypothesis of interest is that the 95th percentile of the exposure distribution does not exceed the specified limit.  Hewett (1996) explains that occupational exposure limits (OELs) are generally single shift limits used for day-to-day risk management that will also constrain long-term, working lifetime exposures of each individual worker to protective levels.  OELs are based on health or toxicology studies that establish protective mean exposure levels.  A work environment that rarely exceeds the OEL will also maintain mean levels well below the OEL.  Day-to-day exposure prevention is achieved through investigations to determine cause and corrective actions for exposure measurements above the OEL.

Ninety-five-percent confidence that fewer than 5% of measurements are above a specified limit is a statistical definition of compliance that has come into widespread use to determine the monitoring efforts needed to demonstrate compliance [see Chapter 7, Mulhausen and Damiano (1998)].  In this situation the upper tolerance limit (i.e., the 95% UCL for the 95th percentile) and the UCL for the exceedance fraction are of primary interest (see Section 2.3) in determining if the exposure distribution is acceptable.  The exact results for samples from a lognormal (or normal) distribution described in Section 2 and in the Appendices of Mulhausen and Damiano (1998) are based on the assumption of complete samples; i.e., no left censored data.  The statistical methods and computer software for the analysis of left censored data described in Section 3 can be used to calculate the censored data equivalent of all of the statistics described by Mulhausen and Damiano (1998).  Details describing R and the R driver functions used to obtain these results are given in the Appendix and at the AOED web site.  All of the R functions can be downloaded from the AOED web site.

In the examples, results obtained using R interactively are shown in a monospaced font like this (where “>” is the R prompt).  To duplicate the results in the examples, read the Appendix and then visit the AOED web site and complete steps 1-4.  Note that Exhibit 1 is listed at the end of readall.R at the AOED web site, and that the data frame SESdata and the character string IpSESdata will be in the R working directory.

4.1  Example 1.  Surface Wipe Samples from Elevated Smelter Surfaces

The data in Exhibit 1 are 31 surface wipe samples from elevated surfaces of a smelter with beryllium contamination.  Exhibit 1 illustrates one method that can be used to enter data into R in the format required for ML estimation.  This would normally be done by using a text editor to create a file (example1.txt).  All characters on a line to the right of the # sign are comments.  The data are entered into the R working directory using the R function source; i.e., if the file is in the directory (folder) where the R session was initiated, source("example1.txt") will input the vectors x and det and the data.frame SESdata.

 

                     Exhibit 1 of Section 4.1

       #    Illustrates one method that can be used to enter

       #    data into R in the format required for ML estimates

       #

       #       Surface Wipe Samples from Smelter

            IpSESdata <- "SESdata:  Smelter-Elevated Surfaces "

       #    IpSESdata is Character string for Use by qqlognB()

       x <- c(15,15,15,25,25,40,40,40,45,50,50,70,75,95,100,125,125,
             145,145,150,150,165,270,290,345,395,395,420,495,840,1140)

       x <- x/1000         # wipe samples micrograms per 100 cm^2

       det<- c(0,0,0,rep(1,28) )   # first three values are censored

       SESdata<- data.frame(x,det) # R data frame for mlndln()

       #

    ML estimates of µ, σ, log(E(X)), and σ² are obtained using:

> mlndln(SESdata)

              mu     sigma       logE      sig2      -2Log(L) Conver

mle   -2.2907643 1.2760000 -1.4766777 1.6281796 -12.852885390      0

semle  0.2311395 0.1754489  0.3137301 0.4477474  -0.002005525     28
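As an independent cross-check (an illustration that is not part of the AOED functions), the same left censored lognormal model can be fit by ML with survreg in the R survival package; the intercept and scale estimates should agree with the µ and σ reported by mlndln:

library(survival)
## det = 1 marks a detect; det = 0 marks a left censored value
fit <- survreg(Surv(x, det, type = "left") ~ 1,
               data = SESdata, dist = "lognormal")
c(mu = unname(coef(fit)), sigma = fit$scale)   # compare with -2.291, 1.276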

 

The R function mlndln is described in the Appendix and is available at the AOED web site.  The data in Exhibit 1 are shown graphically in Figure 1.  ML estimates of µ and σ are shown in the title of the plot.  To obtain Figure 1, use the following at the R prompt:

 

>qqlognB(SESdata,IpSESdata,L=0.2,unit = "mug/100 cm^2",p=0.95,gam=0.95)

 

If equipment is being evaluated for release to the public or for non-beryllium use, the DOE has established a release limit for removable beryllium contamination of L95 = 0.2 µg/100 cm².

 

 

 

Figure 1.  Results for Surface Wipe Samples in Example 1.

 

The ML estimate of the exceedance fraction (see Section 2.3), the 95% LCL, and the 95% UCL are obtained using the R function efclnd, based on the method described in Section 3.4:

 

> efclnd(SESdata,gam=0.95,L=0.2)

   f_MLE LCL_0.95 UCL_0.95

29.66864 19.45963 41.80762

 

The nonparametric estimates of  the exceedance fraction,  95% LCL, and 95% UCL are obtained using efclnp based on the method described in Section 3.5:

 

> efclnp(SESdata,gam=0.95,L=0.2)

     fnp   LCL_95   UCL_95

29.03226 16.06111 45.19044

 

The exceedance fraction is an estimate of the percentage of surface area that is expected to exceed the release limit Lp = 0.2 µg/100 cm² with p = 0.95.  Both the point estimate and the UCL for F exceed F0 = 100(1 − p) = 5%, indicating that the equipment is not acceptable.  In fact, the 95% LCL indicates that at the 95% confidence level at least 19.5% of the surface area exceeds the release limit.  These lognormal based and nonparametric estimates of the exceedance fraction and 95% CLs are shown in the lower right of Figure 1, along with the lognormal based estimate of the 95th percentile Xp = 0.825, the lower 95% CL = LX(0.95,0.95) = 0.446, and the upper 95% CL = UX(0.95,0.95) = 1.526 (see Section 3.3).  The GM, GSD, ML estimates of the (arithmetic) mean of X (with confidence limits) based on the lognormal model, the distribution free Kaplan-Meier mean (with confidence limits), the percent non-detects, the sample size (n), the number of detects (m), R² (as defined in Section 3.5), and the z value for the limit Lp are in the upper left corner of Figure 1.
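Both estimates can be checked from first principles.  In the sketch below, the first line evaluates the lognormal exceedance fraction 100·Pr[X > L] at the ML estimates, and the second line assumes (based on Section 3.5 and the Clopper and Pearson (1934) reference) that the nonparametric limits are exact binomial limits; 9 of the 31 wipe samples exceed L = 0.2:

100 * (1 - pnorm((log(0.2) - (-2.2908)) / 1.2760))   # 29.67 = f_MLE
## the endpoints of a two-sided 90% Clopper-Pearson interval are
## one-sided 95% limits (9 exceedances in n = 31)
100 * binom.test(9, 31, conf.level = 0.90)$conf.int  # 16.06  45.19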

 

All of these summary statistics, and a brief description of each, can be obtained using the R function allss(dd,L,p,gam); e.g.,

 

> allss(dd=SESdata,L=0.2,p=0.95,gam=0.95)

           sstat                                             Sec

mu        -2.291 ML estimate of mean of y=log(x)        Sec 3.1

se.mu      0.231 Estimate of standard error of mu       Sec 3.1

   ...

Fnp_0.2   29.030 Nonparmetric estimate of F for limit L Sec 3.6

FnLCL_95  16.060 Nonparmetric estimate of LCL for  F    Sec 3.6

FnUCL_95  45.190 Nonparmetric estimate of UCL for  F    Sec 3.6

>

 

4.2  Example 2.  TWA Beryllium Exposure Data

 

As part of a chronic disease prevention program, the DOE adopted an 8-hour TWA OEL limit value of 0.2 micrograms per cubic meter proposed by the American Conference of Governmental Industrial Hygienists (DOE 10 CFR Part 850 and ACGIH 2004).  Figure 2 summarizes the results of 280 personal 8-hour TWA beryllium exposure readings at a DOE facility.  These data contain 175 non-detects that range in value from 0.005 to 0.100 µg/m³.  This is an example of random (progressive) left censored data (available at the AOED web site in file beTWA.txt).  The q-q plot in Figure 2 is based on the PLE as described in Section 3.5 and was calculated using the R function plend(beTWA).  Figure 2 can be obtained using the R utility function qqlognB.  To obtain Figure 2, use the following at the R prompt:

 

>beTWA <- read.table("beTWA.txt")

>qqlognB(beTWA,IpbeTWA,L=0.2, unit = "mug/m^3")

 

ML estimates of µ, σ, log(E(X)), and σ² are obtained using:

> mlndln(beTWA)

              mu     sigma       logE      sig2      -2Log(L) Conver

 

mle   -5.1786787 1.5357165 -3.9994324 2.3585614 -2.175955e+02      0

semle  0.1340638 0.1155163  0.1485077 0.3548366 -8.918476e-03    105

>

The ML estimate of the 95-95 geometric upper tolerance limit is calculated using the results in Section 3.3, equation (9); i.e.,

    ŷp = µ̂ + z0.95 σ̂ = −5.1787 + 1.645(1.5357) = −2.652,

and

    var(ŷp) = var(µ̂) + zp² var(σ̂) + 2 zp cov(µ̂, σ̂)
            = (0.1341)² + (1.645)²(0.1155)² + 2(1.645)(−0.008918)
            = 0.0247.

Then, using equation (6), UX(0.95,0.95) = exp[−2.652 + 1.659(0.0247)^½] = 0.091.  The nonparametric upper tolerance limit from Section 3.6 is 0.13.  Both estimates are well below the limit L0.95 = 0.2 µg/m³, indicating that the workplace is acceptable (in compliance).
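The same computation can be scripted in R.  In this sketch the estimates, standard errors, and covariance are taken from the mlndln output above, and the 1.659 multiplier follows the text:

mu <- -5.1787;  sig <- 1.5357        # ML estimates
se.mu <- 0.1341; se.sig <- 0.1155    # standard errors
cov.ms <- -0.008918                  # cov(mu, sigma)
zp <- qnorm(0.95)                    # 1.645
yp <- mu + zp * sig                  # log-scale 95th percentile: -2.652
var.yp <- se.mu^2 + zp^2 * se.sig^2 + 2 * zp * cov.ms   # 0.0247
exp(yp + 1.659 * sqrt(var.yp))       # UX(0.95,0.95) = 0.091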

 

The lognormal based estimate of the exceedance fraction for L0.95 = 0.2 is 1.01%, and the 95% upper confidence limit Uf(0.2,0.95) = 3.24% < 5%, indicating the workplace is acceptable.  The non-parametric estimate of the exceedance fraction is 1.43%, with 95% UCL = 3.24%.  All of the above results provide strong support for the decision that the workplace is in compliance.

 

Figure 2.  Results for 8-hour TWA Data in Example 2.

 


 

 

APPENDIX

 

R (R Development Core Team, 2004) is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form.  It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows, and MacOS.  Detailed documentation on all aspects of R is available at the R home page http://www.r-project.org/.  Sources, binaries, and documentation for R can be obtained via CRAN, the “Comprehensive R Archive Network” (click on CRAN on the R home page menu).  An Introduction to R and related manuals edited by the R Development Core Team are provided under the “Manuals” link.  Additional manuals, tutorials, etc. are provided by users of R under the “Contributed” link.  References are provided under the “Publications” link.  Venables and Ripley (2002) is a highly recommended book on statistical data analysis using R.  All of the R functions discussed in this report and the data used in the examples in Section 4 are available at the web site http://www.csm.ornl.gov/esh/aoed (AOED).  Most of the serious computing is done by the R base functions optim and uniroot.  The R driver functions at AOED are provided to assist the reader who may not have experience with R.  They are not “formal” R functions; i.e., there is limited error checking, and the usual type of R online “help” files are not provided.  Documentation for each function is provided in this report and as comments in the function.  All of the files at the AOED web site are ASCII (txt) files and can be modified using any text editor (e.g., xemacs, wordpad, vi).  The most important functions, with more detailed documentation, are combined into one file, oedmain.R (Exhibit 3).  Additional functions that reflect the authors’ interests and that may require revisions for other applications are provided in the file oedutil.R (Exhibit 4).  Both of these files are available at the AOED web site.

 

In Section 4, several examples of the interactive use of these functions for the analysis of left censored data were provided.  These functions can be used for complete data sets by providing an input matrix with the data in column 1 and the censoring indicator for each data value (all equal to 1) in column 2.  The results will be the ML estimates.  For complete data sets, when n is small, the “exact” results described in Section 2 may be of interest.  The R function lnexact in Exhibit 2 illustrates the use of the functions extol and efcl (see Exhibit 3) for the exact analysis of complete samples.  It is used here to provide an introduction to R (see below).  These functions are appropriate for complete data sets when the “upper tail” of the lognormal distribution is of interest (see Mulhausen and Damiano, 1998, Appendix VII).  In this situation the industrial hygienist picks an upper percentile Xp (often the 95th percentile) and specifies a limit Lp (e.g., the OEL).  The value of p is the minimum proportion of the exposure distribution that must fall below the limit Lp for the exposure profile to be considered “acceptable.”  The exact 100γ% upper confidence limit for the pth percentile Xp is UX(p, γ) = exp(ȳ + K sy) and is referred to as the upper tolerance limit (see Section 2.2).  The exact 100γ% lower confidence limit for Xp is LX(p, γ) = exp(ȳ + K′ sy).  The point estimate of the exceedance fraction FL and the 100γ% lower and upper confidence limits are calculated using efcl(x,gam,L) as described in Section 2.3.  The Shapiro-Wilk W statistic recommended by the AIHA can be applied to complete data sets using the R function shapiro.test for n between 3 and 5000.
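The K factor has a closed form in terms of quantiles of the noncentral t distribution [Johnson and Welch (1940)], so no special tables are needed.  The sketch below shows one way to compute it in R; this is an assumption about how extol can be implemented, not a copy of that function:

## one-sided tolerance limit factor K = t'_gam(n - 1, zp*sqrt(n))/sqrt(n),
## where t'_gam is the gam quantile of the noncentral t distribution;
## K' for the lower limit is obtained by replacing gam with 1 - gam
Kfactor <- function(n, p, gam) {
  qt(gam, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
}
Kfactor(15, p = 0.95, gam = 0.95)   # about 2.57; compare Odeh and Owen tables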

 

To duplicate the results that follow, visit the AOED web site and complete steps 1-4.  The R working directory AIHD will contain all of the functions described in this report and the data frames aiha (example data from Mulhausen and Damiano, 1998, page 259) and aihand (example data from Mulhausen and Damiano, 1998, page 244, with 3 non-detected values).  Note that aihand is used as the default data set in all functions that require a two column matrix or data frame (e.g., mlndln).  The input to lnexact(x,p,gam,L) in Exhibit 2 is a vector x of positive data xi, i = 1,…, n, and the values of p, gam (γ), and L.  The first line is the function name and arguments.  Lines 2-18 are comments that describe what the function does, the arguments, and the values returned.  All characters on a line to the right of the # sign are comments.  In a formal R function this information is obtained from a “help” file by entering ? followed by the name of the function; e.g., typing ?pt at the R prompt (the symbol >) describes the Student’s t distribution functions used by extol and efcl.  Line 20 describes the error check on line 21.  The mean and standard deviation of y = log(x) and the sample size are calculated on line 23 (note that the semicolon separates executable statements on the same line).  The next 3 lines calculate the point estimate and confidence limits for Xp, using extol to calculate the values of K and K′.  Line 27 combines the three values into the vector xp and creates the row name nx.  Line 28 uses efcl to calculate the point estimate and confidence limits for the exceedance fraction, and the next lines combine xp and fl into a table with row and column names based on the input values of p and L.  These functions can be revised using the R functions fix or edit, or by using a text editor (e.g., Notepad or XEmacs) to make a new file.

 

Exhibit 2 in the Appendix

 

lnexact <-function(x=aiha[,1],p=0.95,gam=0.95,L=5){

#    Find point estimates and confidence limits for Xp the pth

#    percentile and the exceedance fraction F_L = Pr[ x > L]

#    for a  (complete) sample from a lognormal distribution

# USAGE: lnexact(x,p,gam,L)

# ARGUMENTS:

#     x is a vector of positive lognormal data

#     p is proportion that should fall below L

#     gam is the one-sided confidence level

#     L is the specified limit of interest

# VALUE: data.frame containing point estimates

#     of Xp and F_L in column 1

#     100gam% lower confidence limits in column 2

#     100gam% upper confidence limits in column 3

#      Xp    LX(p,gam)    UX(p,gam)

#      F_L   Lf(L,gam)    Uf(L,gam)

# NOTE: The combined confidence limits are an approximate

#       100*[1 - (1-gam)*2] Percent Confidence Interval

#

#     The next line is an example of "error checking"

 if( any(x <= 0) ) stop("all data values must be positive")

# calculate mean(yb) and standard deviation(sy) of y=log(x)

yb <- mean(log(x)) ;  sy<- sd(log(x) ) ; n <- length(x)

xp <- exp( yb + qnorm(p)*sy  )        # point estimate of Xp

lxp <- exp( yb + extol(n,p,1-gam)*sy) # exact LCL for Xp

uxp <- exp( yb + extol(n,p,gam)*sy )  # exact UCL for Xp

xp <- c(xp,lxp,uxp)   ; nx <- paste("X",100*p,sep="")

fl <- efcl(x,gam,L,T) ; nf <- paste("F",L,sep="_")

out<-  rbind(xp,fl)

dimnames(out)<- list( c(nx,nf),c("Est","LCL","UCL"))

out

}


The following script demonstrates the use of lnexact

 

> #   Use data from Hewett and Ganser (HG) 1997 page 135 to 

> #   illustrate use of lnexact to obtain exact confidence  

> #   limits for Xp percentile and the exceedance fraction

> #   for complete samples from lognormal distribution the

> #   next line assigns the data to the vector variable xhg

>

> xhg<- c(4.25,1.38,3.11,2.20,2.82)

>

> # now use function lnexact with xhg as input

>

> lnexact(xhg,p=0.95,gam=0.95,L=5)

         Est    LCL.95   UCL.95

X95 5.145787 3.6328368 15.10336

F_5 5.744611 0.3795139 35.55304

>

> # HG results--- F_5= 5.7  LCL= 0.38  UCL= 35.55

> #           --- X95  not calculated

> 

> # Next use aiha[,1] to check results in Appendix VII

> # of  Mulhausen and Damiano, 1998 (DM)

> Ipaiha

[1] "Strategy for Assessing & Managing Occ Exp page 259"

>

>  lnexact(aiha$x,p=0.95, gam=0.95)

         Est       LCL       UCL

X95 4.842716 3.9015308  7.045908

F_5 4.241070 0.8570198 15.282684

>

>

> #  DM Appendix VII page 278-280 results UTL= 7.11  

> #  F_5= 4.4%   95% LCL = 2%   95% UCL  = 15%

> #   difference is due to graphical interpolation and

> #   rounding of the mean and SD of log(x)

> mean( log(aiha[,1]) ) ; sd( log(aiha[,1]) )

[1] 0.9079064 # rounded to 0.91

[1] 0.4070693 # rounded to 0.41

>

> # DM Appendix VII page 277 mean= 0.91 SD = 0.41

>

 

The results from lnexact will agree with those from Odeh and Owen (1980) for any reasonable values of p, gam, and L.  Note what happens if the value of L is changed to the 95% UCL for Xp:

>

> #  change L to 7.046  the 95% UCL for X95

>

> lnexact(aiha[,1],p=0.95, gam=0.95, L=7.046)

              Est     LCL.95   UCL.95

X95     4.8427163 3.90153084 7.045908

F_7.046 0.5143435 0.02923116 4.999718

 

#  and the 95% UCL for F is 5% as expected

 


This shows the equivalence of the relationship between the upper tolerance limit and the exceedance fraction; i.e., with γ = 0.95, p = 0.95, L = 7.046, and F0 = 5%, the upper tolerance limit is UX(0.95,0.95) = 7.04559 and the upper confidence limit for the exceedance fraction is Uf(7.046,0.95) = 4.99972:

 

            H0:  Xp ≥ L        reject if UX(p, γ) < L         REJECT H0
            H0:  FL ≥ 1 − p    reject if Uf(L, γ) < F0        REJECT H0

 

Likewise, if the value p = 1 − Uf(5,0.95)/100 = 0.8472 is used:

 

> #  change p to  1 - 0.1528= 0.8472

>

 

> lnexact(aiha[,1],p=0.8472, gam=0.95, L=5)

           Est    LCL.95    UCL.95

X84.72 3.76199 3.1313547  5.000301

F_5    4.24107 0.8570198 15.282684

> #  and the 95% UCL for Xp is 5 as expected

 

This is equivalent to γ = 0.95, p = 0.8472, L = 5, and F0 = 15.29%: the upper tolerance limit is UX(0.8472,0.95) = 4.999157, and the upper confidence limit for the exceedance fraction is Uf(5,0.95) = 15.282684:

 

            H0:  Xp ≥ L        reject if UX(p, γ) < L         REJECT H0
            H0:  FL ≥ 1 − p    reject if Uf(L, γ) < F0        REJECT H0

 

 

Exhibit 3 in the Appendix: oedmain.R

 

 

Exhibit 4 in the Appendix: oedutil.R

 

 

REFERENCES

 

Aitchison, J. and J.A.C. Brown, 1969, The Lognormal Distribution, Cambridge, U.K., Cambridge University Press.

 

Akritas, M.G., T. F. Ruscitti, and G.P. Patil, 1994.  Statistical Analysis of Censored Environmental Data, Handbook of Statistics, Vol. 12, G.P. Patil and C.R. Rao (eds), 221-242, Elsevier Science, New York.

 

American Conference of Governmental Industrial Hygienists (ACGIH), Notice of Intended Change In: 2004 TLVs® and BEIs®, p. 60, ACGIH, Cincinnati, OH.

 

Armstrong, B. G., 1992.  “Confidence Intervals for Arithmetic Means of Lognormally Distributed Exposures,” American Industrial Hygiene Association Journal, 53(8), 481-485.

 

Chambers, J. M., W. S. Cleveland, B. Kleiner and P. A. Tukey, 1983.  Graphical Methods for Data Analysis, Duxbury Press, Boston.

 

Clopper, C. J. and E. S. Pearson, 1934.  “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial,” Biometrika, 26, 404-413.

 

Cohen, A. C., 1991.  Truncated and Censored Samples.  Marcel Dekker, Inc., New York.

 

Crow, E. L. and K. Shimizu, 1988.  Lognormal Distributions: Theory and Applications, Marcel Dekker, New York.

 

Cox, D. R. and D. V. Hinkley, 1979.  Theoretical Statistics. Chapman & Hall, New York.

 

Cox, D.R. and  D. Oakes, 1984.  Analysis of Survival Data. Chapman and Hall, New York.

 

Department of Energy, 10 CFR Part 850, Chronic Beryllium Disease Prevention Program, Federal Register, Vol. 64, No. 235, 68854-68914, December 1999.

 

Fowlkes, E. B. 1979. “Some Methods for Studying the Mixture of Two Normal (Lognormal) Distributions,” Journal of the American Statistical Association, 74, 561-575.

 

Hahn, G.J. and W.Q. Meeker, 1991. Statistical Intervals.  John Wiley and Sons, New York.

 

Hewett, P. and G. H. Ganser, 1997.  “Simple Procedures for Calculating Confidence Intervals Around the Sample Mean and Exceedance Fraction Derived from Lognormally Distributed Data,” Applied Occupational and Environmental Hygiene, 12(2), 132-147.

 

Helsel, D., 1990.  “Less Than Obvious: Statistical Treatment of Data Below the Detection Limit,” Environmental Science and Technology, 24(12), 1767-1774.

 

Helsel, D.R. and T.A. Cohn, 1988.  “Estimation of Descriptive Statistics for Multiply Censored Water Quality Data,” Water Resources Research, 24, 1997-2004.

 

Johnson, N. L. and B. L. Welch, 1940. “Application of the Non-Central t-Distribution,” Biometrika, 31(3/4), 362-389.

 

Kalbfleisch, J.D. and R.L. Prentice, 1980.  The Statistical Analysis of Failure Time Data.  John Wiley and Sons, New York.

 

Kaplan, E. L. and P. Meier, 1958.  “Nonparametric Estimation from Incomplete Observations,” Journal of the American Statistical Association, 53, 457-481.

 

Land, C. E., 1972.  “An Evaluation of Approximate Confidence Interval Estimation Methods for Lognormal Means,” Technometrics, 14(1), 145-158.

 

Meeker, W.Q and L.A. Escobar, 1998. Statistical Methods for Reliability Data. John Wiley and Sons, New York.

 

Moulton, L.H. and N.A. Halsey, 1995. “A Mixture Model  with Detection Limits for Regression Analysis of Antibody Response on Vaccine,” Biometrics, 51, 1570-1578.

 

Mulhausen, J. R. and J. Damiano, 1998.  A Strategy for Assessing and Managing Occupational Exposures, Second Edition, AIHA Press, Fairfax, VA.

 

Newman, M.C., P.M. Dixon, B.B. Looney, and J.E. Pinder, 1989.  “Estimating Mean and Variance for Environmental Samples with Below Detection Limit Observations,” Water Resources Bulletin, 25, 905-916.

Odeh, R. E. and D. B. Owen, 1980.  Tables for Normal Tolerance Limits, Sampling Plans, and Screening, Marcel Dekker, New York.

R Development Core Team (2004). R: A language and environment for statistical computing. R  Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-00-3, URL http://www.R-project.org.

Schmee, J., D. Gladstein and W. Nelson, 1985.  “Confidence Limits for Parameters of a Normal Distribution From Singly Censored Samples, Using Maximum Likelihood.” Technometrics, 27, 119-128.

Schmoyer, R. L., J. J. Beauchamp, C. C. Brandt and F. O. Hoffman, Jr., 1996.  “Difficulties with the Lognormal Model in Mean Estimation and Testing,” Environmental and Ecological Statistics, 3, 81-97.

 

Sommerville, P. N., 1958.  “Tables for Obtaining Non-Parametric Confidence Limits,” Annals of Mathematical Statistics, 29, 599-601.

 

Taylor, D. J., L. L. Kupper, S. M. Rappaport, and R. H. Lyles, 2001. “A Mixture Model for Occupational Exposure Mean Testing with a Limit of Detection”,  Biometrics, 57, 681-688.

 

Tuggle, R. M., 1982.  “Assessment of Occupational Exposure Using One-Sided Tolerance Limits,” American Industrial Hygiene Association Journal, 43, 338-346.

 

Turnbull, B. W., 1976. “The Empirical Distribution Function with Arbitrarily Grouped, Censored and Truncated Data,” Journal of the Royal Statistical Society, Series B (Methodological), 38(3), 290-295.

 

Venables, W. N. and B. D. Ripley, 2002.  Modern Applied Statistics with S, 4th edition.  Springer-Verlag, New York.

 

Verrill, S. and R. A. Johnson, 1988.  “Tables and Large-Sample Distribution Theory for Censored-Data Correlation Statistics for Testing Normality,” Journal of the American Statistical Association, 83(404), 1192-1197.

 

Waller, L. A. and B. W. Turnbull, 1992.  “Probability Plotting with Censored Data,” The American Statistician, 46(1), 5-12.