Confidence Limits for Linear Functions of the Normal Mean and Variance
B.F. Lyon and C.E. Land

What is it?

 

A program to calculate upper, lower, or two-sided confidence limits for linear functions of the mean and variance of a normal distribution. For Y normally distributed with mean μ and variance σ², the program calculates confidence limits for μ + λσ², where λ is a specified constant, based on the usual estimates of μ and σ².

 

What is it good for?

 

The best known application is probably that of determining confidence limits for the mean of a lognormal distribution. In particular, if X is a lognormal random variable, with Y = log(X) ~ N(μ, σ²) [i.e., the natural logarithm of X has a normal distribution with mean μ and standard deviation σ], then

Mean of X = exp(μ + 0.5 σ²)

 

and confidence limits for mean(X) are obtained by taking the exponentials of the corresponding confidence limits for μ + 0.5 σ².

In the simplest case, where the data are a random sample of n observations on X, μ is estimated by the sample mean of the log observations, which corresponds to a single observation from a normal distribution with mean μ and variance σ²/n, and the sample variance of the same observations corresponds to σ²/(n-1) times a single observation from a χ² distribution with n-1 degrees of freedom. The "lognormal mean" option is for this case.
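
As a rough illustration of the quantities named above, the following Python sketch computes the sample mean and sample variance of the log observations and the plug-in point estimate of the lognormal mean. The function name and the data are hypothetical, and the sketch does not compute the confidence limits themselves, which require the program described on this page.

```python
import math
import statistics

def lognormal_mean_estimate(x):
    """Return (ybar, s2, plug-in estimate of the lognormal mean).

    ybar estimates the normal mean and s2 the normal variance of Y = log(X).
    """
    y = [math.log(v) for v in x]      # Y = log(X), natural logarithm
    ybar = statistics.fmean(y)        # sample mean of the log observations
    s2 = statistics.variance(y)       # sample variance, n - 1 degrees of freedom
    return ybar, s2, math.exp(ybar + 0.5 * s2)

# Hypothetical data, for illustration only.
sample = [1.2, 0.8, 2.5, 1.9, 0.6, 3.1, 1.4]
ybar, s2, estimate = lognormal_mean_estimate(sample)
print(ybar, s2, len(sample) - 1, estimate)
```

The first three printed values (sample mean, sample variance, degrees of freedom) are the kind of summary the simple case is built on; the exact input format expected by the program is not shown here.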

A more complex example is one in which the mean of

Y(z) = log(X(z))


is linearly dependent upon a regression variable z, e.g.,


mean(Y(z)) = a + b z,


where both a and b are unknown parameters, and the data set comprises n paired observations (zi, X(zi)) for known z1,...,zn. In this case, the parameters a, b, and σ² are estimated by linear regression of Y on z. For a particular value z0, the expected value of X(z0) is


exp(a + b z0 + 0.5 σ²),

 

the point estimate of the mean of Y(z0) corresponds to a single observation from a normal distribution with mean


a + b z0

and variance


{1/n + (z0 - mean(z))² / Sum(zi - mean(z))²} σ² ,


and the sample variance corresponds to σ²/(n-2) times a single observation from a χ² distribution with n-2 degrees of freedom. The "general" option corresponds to this and more complex cases.
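
The regression quantities above can be sketched in Python as follows. This is a minimal ordinary least-squares fit under the stated model; the function name, the return layout, and the data are assumptions made for illustration, not the program's interface.

```python
import math

def regression_pieces(z, x, z0):
    """Least-squares fit of Y = log(X) on z, evaluated at z0.

    Returns (estimate of a + b*z0, variance multiplier, residual variance, df),
    where the estimate has variance (variance multiplier) * sigma^2.
    """
    n = len(z)
    y = [math.log(v) for v in x]
    zbar = sum(z) / n
    ybar = sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    b = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz
    a = ybar - b * zbar
    fitted = a + b * z0
    # Multiplier of sigma^2 in the variance of the fitted value at z0.
    multiplier = 1.0 / n + (z0 - zbar) ** 2 / szz
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
    return fitted, multiplier, rss / (n - 2), n - 2

# Hypothetical data, for illustration only.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.1, 2.9, 7.0, 21.5, 52.0]
print(regression_pieces(z, x, z0=2.5))
```

Compared with the simple random-sample case, the only changes are the variance multiplier (no longer 1/n) and the degrees of freedom (n-2 instead of n-1).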

The lognormal distribution is the only one whose mean can be expressed as a function of a non-trivial linear combination of μ and σ² (Land 1971). However, linear combinations arise in other contexts. For example, the program can also be used to determine confidence limits for the lognormal mode and for moments about 0, since

 

Mode X = exp(μ - σ²)

 

and

 

the kth moment about zero of X = exp(kμ + k² σ²/2)
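
To make the link to a linear function of the normal mean and variance explicit, the Python sketch below writes both quantities as the exponential of scale × (mean + lambda × variance): the mode uses scale 1 and lambda = -1, and the kth moment uses scale k and lambda = k/2. The function names are hypothetical, and the factoring is simple algebra on the formulas above rather than a description of how the program handles these cases.

```python
import math

def lognormal_mode(mu, sigma2):
    """Mode of X when log(X) is normal with mean mu and variance sigma2."""
    return math.exp(mu - sigma2)

def lognormal_raw_moment(k, mu, sigma2):
    """k-th moment about zero of X."""
    return math.exp(k * mu + 0.5 * k * k * sigma2)

def moment_as_linear_function(k):
    """Return (scale, lam) such that
    k*mu + k^2*sigma2/2 == scale * (mu + lam * sigma2)."""
    return k, k / 2.0

# Hypothetical parameter values, for illustration only.
mu, sigma2 = 0.2, 0.4
scale, lam = moment_as_linear_function(3)
print(lognormal_raw_moment(3, mu, sigma2), math.exp(scale * (mu + lam * sigma2)))
print(lognormal_mode(mu, sigma2))
```

Under this reading, limits for the kth moment would come from limits for the linear function with lambda = k/2, multiplied by k and then exponentiated.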

 

The lognormal distribution arises in the analysis of data in many different areas, including epidemiology, biology, and environmental engineering.

Other applications arise from calculating approximate confidence limits for variates that can be transformed to normality via a differentiable function. If Y = f(X) is normally distributed with mean μ and variance σ², then the mean of f⁻¹(Y) is a known function of μ and σ²; for example,

Square root transform: Mean Y² = μ² + σ²

Cube root transform: Mean Y³ = μ³ + 3 μ σ²

Arcsine square root transform: Mean sin²(Y) = 0.5 (1 - cos(2μ) exp(-2σ²))

 

In these cases, methods for calculating exact confidence limits for the mean may not exist; however, one can approximate the mean by a linear function of μ and σ², for which the exact confidence limits can be calculated.
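
The text does not say how the linear approximation is chosen. One possibility, shown here purely as an assumption, is a first-order Taylor expansion about the point estimates of the normal mean and variance. The Python sketch below returns a constant c, a scale, and a lambda such that the mean is approximately c + scale × (mean + lambda × variance); all names and expansion points are hypothetical.

```python
def linearize(g, mu0, v0, h=1e-6):
    """First-order Taylor expansion of g(mu, sigma2) about (mu0, v0).

    Returns (c, scale, lam) with g(mu, sigma2) ~ c + scale * (mu + lam * sigma2).
    Partial derivatives are taken by central differences (an assumption of
    this sketch); scale must be non-zero for lam to be defined.
    """
    g_mu = (g(mu0 + h, v0) - g(mu0 - h, v0)) / (2 * h)
    g_v = (g(mu0, v0 + h) - g(mu0, v0 - h)) / (2 * h)
    c = g(mu0, v0) - g_mu * mu0 - g_v * v0
    return c, g_mu, g_v / g_mu

def square_root_mean(mu, sigma2):
    """Mean of Y^2 (square root transform)."""
    return mu ** 2 + sigma2

def cube_root_mean(mu, sigma2):
    """Mean of Y^3 (cube root transform)."""
    return mu ** 3 + 3 * mu * sigma2

# Hypothetical expansion point, for illustration only.
print(linearize(square_root_mean, 2.0, 0.5))
print(linearize(cube_root_mean, 2.0, 0.5))
```

With such an expansion, limits for the linear function with the returned lambda would be rescaled by scale and shifted by c; the quality of the result depends on how close the expansion point is to the true parameters.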

 

What’s so complicated about the problem?

 

Confidence intervals for μ + λσ² are based on the family of uniformly most powerful unbiased tests for the null hypothesis that μ + λσ² = 0 or, equivalently, that μ/σ² = -λ, vs. one-sided or two-sided alternatives. Confidence intervals are obtained by translating (adding or subtracting) all of the original, normally-distributed observations (or, equivalently, just the estimate of μ) by a constant value. The confidence interval for μ + λσ² is the set of translation constants for which the null hypothesis is not rejected, using the translated data, at the designated significance level.

 

The test statistic is the familiar Student’s t statistic, calculated as the translated estimate of μ divided by its estimated standard deviation. For λ = 0, the null distribution of the test statistic is that for Student’s t test, and this null distribution, which is conditional on a weighted sum of the estimate of σ² and the square of the translated estimate of μ (Land, 1971), depends only on the number of degrees of freedom for estimating σ², which is of course invariant under translation of the data. For λ ≠ 0, on the other hand, the null distribution of the test statistic depends upon the value of the weighted sum as well. This means that different critical values have to be calculated, either from published tables or from scratch as is done by our program, for different data sets and for each translation constant.
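
The test-inversion idea can be illustrated in Python for the easy case where the constant multiplying the variance is zero, so that the critical value is a fixed Student's t quantile. The sketch translates the data by a constant, recomputes the t statistic, and uses bisection to find the constants at which the test is just rejected. The data, the function names, and the hard-coded tabulated quantile are assumptions of this sketch; for a non-zero constant the critical value would have to be recomputed at every trial translation, which is not attempted here.

```python
import math
import statistics

T_CRIT_975_DF9 = 2.262157  # tabulated 0.975 quantile of Student's t, 9 df

def t_stat(y, c):
    """Student's t for the null hypothesis 'mean = 0' after subtracting c."""
    shifted = [v - c for v in y]
    se = statistics.stdev(shifted) / math.sqrt(len(shifted))
    return statistics.fmean(shifted) / se

def solve_translation(y, target, iterations=200):
    """Bisection for the constant c at which t_stat(y, c) equals target."""
    ybar, s = statistics.fmean(y), statistics.stdev(y)
    lo, hi = ybar - 10 * s, ybar + 10 * s   # t_stat decreases in c on this bracket
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if t_stat(y, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical sample of n = 10 normal observations, for illustration only.
y = [0.31, -0.12, 0.85, 0.44, 1.02, 0.07, 0.63, 0.29, -0.35, 0.58]
lower = solve_translation(y, +T_CRIT_975_DF9)   # just rejected from below
upper = solve_translation(y, -T_CRIT_975_DF9)   # just rejected from above
print(lower, upper)
```

In this special case the two roots agree with the familiar closed form, sample mean ± critical value × standard error, which is a convenient check on the inversion logic.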

 

The original methods for calculation of these confidence limits were developed by Land (1971). At their heart, these methods boil down to a root-solving problem involving modified Bessel functions of the third kind and incomplete Bessel functions. Tables have been published, but their use is often tedious, requiring repeated interpolation and calculation. An unpublished Fortran program was available in 1987. The program available here, based in part on the 1987 program, represents a significantly faster and more stable implementation of the basic methods.

 

 

