Confidence Limits for Linear Functions of the
Normal Mean and Variance
Home | Download | Run Online | Program Verification
What is it?
A program to calculate upper, lower, or two-sided confidence limits for linear functions of the mean and variance of a normal distribution. For Y normally distributed with mean μ and variance σ², the program calculates confidence limits for μ + λσ², where λ is a specified constant, based on the usual estimates of μ and σ².
What is it good for?
The best known application is probably that of determining confidence limits for the mean of a lognormal distribution. In particular, if X is a lognormal random variable, with Y = log(X) ~ N(μ, σ²) [i.e., the natural logarithm of X has a normal distribution with mean μ and standard deviation σ], then
Mean of X = exp(μ + 0.5 σ²)
and confidence limits for mean(X) are obtained by taking the exponentials of the corresponding confidence limits for μ + 0.5 σ².
In the simplest case, where the data are a random sample of n observations on X, μ is estimated by the sample mean of the log observations, which corresponds to a single observation from a normal distribution with mean μ and variance σ²/n, and the sample variance of the same observations corresponds to σ²/(n-1) times a single observation from a χ² distribution with n-1 degrees of freedom. The "lognormal mean" option is for this case.
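As an illustration of the quantities involved in this simplest case, the following minimal sketch computes the sample mean and sample variance of the log observations, the degrees of freedom, and the plug-in point estimate exp(mean + 0.5 variance). The function name and return layout are hypothetical and are not part of the program described here; the sketch does not compute confidence limits.

```python
import math
import statistics

def lognormal_mean_inputs(x):
    """Summaries of positive observations x for the lognormal-mean case.

    Hypothetical helper for illustration only. Returns
    (ybar, s2, df, point_estimate), where ybar and s2 are the sample mean
    and sample variance (divisor n - 1) of the log observations.
    """
    y = [math.log(v) for v in x]       # Y = log(X)
    n = len(y)
    ybar = statistics.fmean(y)         # estimates mu
    s2 = statistics.variance(y)        # estimates sigma^2, with n - 1 df
    point_estimate = math.exp(ybar + 0.5 * s2)  # plug-in estimate of mean(X)
    return ybar, s2, n - 1, point_estimate
```

The plug-in value is only a point estimate; the confidence limits themselves require the exact method described further down the page.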
A more complex example is one in which the mean of Y(z) = log(X(z)) is linearly dependent upon a regression variable z, e.g.,
mean(Y(z)) = α + β z,
where both α and β are unknown parameters, and the data set comprises n paired observations (zᵢ, X(zᵢ)) for known z₁,...,zₙ. In this case, the parameters α, β, and σ² are estimated by linear regression of Y on z. For a particular value z₀, the expected value of X(z₀) is
exp(α + β z₀ + 0.5 σ²),
the point estimate of the mean of Y(z₀) corresponds to a single observation from a normal distribution with mean
α + β z₀
and variance
{1/n + (z₀ - mean(z))² / Sum(zᵢ - mean(z))²} σ²,
and the sample variance corresponds to σ²/(n-2) times a single observation from a χ² distribution with n-2 degrees of freedom. The "general" option corresponds to this and more complex cases.
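The regression quantities described above can be sketched as follows. This is a minimal illustration using ordinary least squares formulas; the function name and the dictionary keys are hypothetical, and nothing here reflects the actual input format of the "general" option.

```python
def regression_inputs(z, y, z0):
    """Least-squares summaries for simple linear regression of y on z.

    Hypothetical helper for illustration only. Returns the point estimate
    of mean(Y(z0)), the residual variance estimate s2, its degrees of
    freedom (n - 2), and the factor multiplying sigma^2 in the variance
    of the point estimate.
    """
    n = len(z)
    zbar = sum(z) / n
    ybar = sum(y) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    b = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y)) / szz  # slope
    a = ybar - b * zbar                                               # intercept
    rss = sum((yi - a - b * zi) ** 2 for zi, yi in zip(z, y))
    return {
        "mean_y0": a + b * z0,                         # estimate of the mean of Y(z0)
        "s2": rss / (n - 2),                           # estimate of sigma^2
        "df": n - 2,
        "var_factor": 1.0 / n + (z0 - zbar) ** 2 / szz,
    }
```

Here y holds the log observations, y[i] = log(X(z[i])).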
The lognormal distribution is the only one whose mean can be expressed as a function of a non-trivial linear combination of μ and σ² (Land, 1971). However, linear combinations arise in other contexts. For example, the program can also be used to determine confidence limits for the lognormal mode and other moments about 0, since
Mode X = exp(μ - σ²)
and
the kth moment about zero of X = exp(kμ + k² σ²/2)
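To make the link to μ + λσ² explicit: the log of the mode is μ - σ², so λ = -1, and the log of the kth moment is k(μ + (k/2)σ²), so λ = k/2 with the resulting limits multiplied by k before exponentiating. The sketch below encodes these formulas; all function names are hypothetical and chosen only for this example.

```python
import math

def lognormal_mode(mu, sigma2):
    """Mode of X when log(X) ~ N(mu, sigma2)."""
    return math.exp(mu - sigma2)

def lognormal_moment(mu, sigma2, k):
    """kth moment about zero of X when log(X) ~ N(mu, sigma2)."""
    return math.exp(k * mu + 0.5 * k ** 2 * sigma2)

def moment_lambda(k):
    """Lambda for the kth moment: log E[X^k] = k * (mu + (k / 2) * sigma2)."""
    return k / 2

def moment_limit(linear_limit, k):
    """Map a confidence limit for mu + (k / 2) * sigma2 to one for E[X^k].

    Assumes k > 0, so that the map is increasing and a lower (upper) limit
    stays a lower (upper) limit.
    """
    return math.exp(k * linear_limit)
```

With k = 1 this reduces to the lognormal mean case, λ = 0.5.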
The lognormal distribution arises
in the analysis of data in many different areas, including epidemiology,
biology, and environmental engineering.
Other applications arise from calculating approximate confidence limits for variates that can be transformed to normality via a differentiable function. If Y = f(X) is normally distributed with mean μ and variance σ², then the mean of X = f⁻¹(Y) is a known function of μ and σ²; for example,
Square root transform: Mean Y² = μ² + σ²
Cube root transform: Mean Y³ = μ³ + 3 μ σ²
Arcsine square root transf.: Mean sin²(Y) = (1/2) (1 - cos(2 μ) exp(-2 σ²))
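These closed forms can be checked numerically. The sketch below integrates g(y) against the normal density with a plain trapezoid rule; the function name, the integration width, and the step count are assumptions made for this example. For the arcsine case it uses the identity sin²(y) = (1 - cos 2y)/2, which gives a mean of 0.5 (1 - cos(2μ) exp(-2σ²)).

```python
import math

def normal_expectation(g, mu, sigma2, half_width=10.0, steps=20000):
    """Approximate E[g(Y)] for Y ~ N(mu, sigma2) by the trapezoid rule.

    Integrates over mu +/- half_width standard deviations; the tails
    beyond that range are negligible for the polynomial and bounded
    functions used here.
    """
    sigma = math.sqrt(sigma2)
    lo = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoid end weights
        total += weight * g(y) * norm * math.exp(-0.5 * ((y - mu) / sigma) ** 2)
    return total * h

mu, sigma2 = 1.2, 0.3
square = normal_expectation(lambda y: y ** 2, mu, sigma2)            # mu^2 + sigma2
cube = normal_expectation(lambda y: y ** 3, mu, sigma2)              # mu^3 + 3 mu sigma2
arcsine = normal_expectation(lambda y: math.sin(y) ** 2, mu, sigma2)
```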
In these cases, methods for calculating exact confidence limits for the mean may not exist; however, one can approximate the mean by a linear function of μ and σ², for which the exact confidence limits can be calculated.
What’s so complicated about the problem?
Confidence intervals for μ + λσ² are based on the family of uniformly most powerful unbiased tests for the null hypothesis that μ + λσ² = 0 or, equivalently, that μ/σ² = -λ, vs. one-sided or two-sided alternatives. Confidence intervals are obtained by translating (adding or subtracting) all of the original, normally-distributed observations (or, equivalently, just the estimate of μ) by a constant value. The confidence interval for μ + λσ² is the set of translation constants for which the null hypothesis is not rejected, using the translated data, at the designated significance level.
The test statistic is the familiar Student’s t statistic, calculated as the translated estimate of μ divided by its estimated standard deviation. For λ = 0, the null distribution of the test statistic is that for Student’s t test, and this null distribution, which is conditional on a weighted sum of the estimate of σ² and the square of the translated estimate of μ (Land, 1971), depends only on the number of degrees of freedom for estimating σ², which is of course invariant under translation of the data. For λ ≠ 0, on the other hand, the null distribution of the test statistic depends upon the value of the weighted sum as well. This means that different critical values have to be calculated, either from published tables or from scratch as is done by our program, for different data sets and for each translation constant.
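The translation construction can be sketched for the special case λ = 0, where the critical value is a fixed Student's t quantile. The sketch assumes the constant c is subtracted from every observation, takes the critical value as an input (for example 2.262 for a two-sided 95% interval with 9 degrees of freedom), and finds by bisection the constants at which the t statistic reaches the critical value. All names are hypothetical. For λ ≠ 0 the same search applies in principle, but the critical value would have to be recomputed for each c, which this sketch does not attempt.

```python
import math
import statistics

def t_statistic(y, c):
    """Student's t for the translated data y_i - c.

    The sample variance is unchanged by translation, so only the mean shifts.
    """
    n = len(y)
    return (statistics.fmean(y) - c) / math.sqrt(statistics.variance(y) / n)

def solve_decreasing(f, target, lo, hi, iterations=100):
    """Bisection for f(c) = target, with f decreasing and f(lo) > target > f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_limits_lambda0(y, t_crit):
    """Two-sided limits for mu: the endpoints of {c : |t(c)| <= t_crit}."""
    ybar = statistics.fmean(y)
    f = lambda c: t_statistic(y, c)
    width = max(y) - min(y)
    while f(ybar - width) <= t_crit:   # widen until the bracket contains the root
        width *= 2.0
    lower = solve_decreasing(f, t_crit, ybar - width, ybar)
    upper = solve_decreasing(f, -t_crit, ybar, ybar + width)
    return lower, upper
```

In this special case the search reproduces the textbook interval, sample mean plus or minus the critical value times the standard error, which gives a convenient check on the construction.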
The original methods for calculation of these confidence limits were developed by Land (1971). At their heart, these methods boil down to a root-solving problem involving modified Bessel functions of the third kind and incomplete Bessel functions. Tables have been published, but their use is often tedious, requiring repeated interpolation and calculation. An unpublished Fortran program was available in 1987. The program available here, based in part on the 1987 program, represents a significantly faster and more stable implementation of the basic methods.