NOAA Logo

NOAA/PMEL/OERD

Report of Error Analysis for the T-Phase Locations
Computed by the Least-Square Technique*

T-K Andy Lau

Revised March 17, 1998

Purpose of the Error Analysis

The error analysis will provide an indirect method to compute the standard error of the T-Phase locations based on several assumptions. An indirect method is needed because there are not sufficient Degrees of Freedom (df) to calculate the standard errors using a least-square procedure. The procedure will solve for 3 unknown variables: Latitude, Longitude and the origin time of the T-Phase location. Currently there are only 6 hydrophones available for event detection, making the df to be at most 3 ( = 6 observations - 3 variables ). A df<3 or even df=0 will occur because the same event may not be recorded by all the hydrophones. With a small or 0 df, the standard errors computed from the least-square procedure will not give very accurate results or the errors will be very large. This method can provide an improved estimate of error in each variable over the entire field. A note of terminology, a T-Phase location is equivalent to an earthquake epicenter.

Formulation for Computing the T-Phase Location by a Least-Square Method

The Nonlinear Least-Squares Method for estimating the T-Phase origin is as follows:
Minimize Residual = SUMall i Wi( Ai - Bi )2
where Wi = Weight and 1 can be used for unweighted calculations,
Ai = Recorded arrival time i, Bi = Oi + Di/Si,
Bi = Estimated arrival time i, Oi = Estimated Origin time i, Di/Si = Travel time,
Di = Distance between Hydrophone location i and the assumed T-Phase origin in (latitude,longitude), and
Si= Sound Speed i between the path of the hydrophone and the assumed T-Phase origin.

The Wi can be computed using the travel times: Di/Si as follow
wi = 1/travel timei = Si/Di and Normalized it by Wi = (Si/Di) /max( wi )
so that the hydrophone with the shortest travel time or the first arrival time will receive greater weight. Its weight will be 1 and the rest will be < 1 but > 0. The Di's are the shortest distances between 2 points on earth, computed by a geodesics algorithm with a spheroidal earth assumption. The Si's are computed from a GDEM data base provided by the U.S. Navy.

The input parameters are arrival times, hydrophone locations, an initial guess of the T-Phase location and the origin time. The estimators or output paramters latitude, longitude and the origin time of the T-Phase event will be calculated by the Gradient-Expansion (Marquardt) algorithm , a least-squares method. This method combines the best features of the gradient search with the method of linearizing the fitting function.

Calculations of Errors

The objective is to obtain sampling distributions of the estimated variables: Latitude, Longitude and Origin Time. This is done by simulating a sampling experiment many times, and calculating the estimators each time. This builds up the distribution of the estimators, to show how close they are to the true values (Wonnacott 1979).

In this study, before the experiment, a reference point in term of latitude and longitude as a T-Phase origin will be selected. Its origin time will be set to 0, i.e., the arrival times will be same as the travel times. The hydrophones' locations will be defined. They will be used to compute the sound speeds and arrival times. Then the experiment will be as follows: Random Errors with a normal distribution will be added to either sound speeds, arrival times, or both; weights will be calculated from the arrival times if it is asked for; the least-square procedure is applied to determine the origin; and save the results.

The experiment is repeated, e.g. N=100 times per simulation. Let xi = the computed variable: either latitude, longitude, or origin time, X = average of xi for i=1 to N, and T = the true value of either latitude, longitude, or origin time, the following statistics will be calculated:
The Observed Bias = X - T which shows how close the computed value is to the true target. For example, Observed Bias of Latitutde = Average of the computed latitudes - True latitude.
The Observed Mean Square Error (MSE) = [SUMall i ( xi - T )2]/N
which measures on average how far X is off its target. It also measures the square-root of the MSE = Bias Root Mean Square (RMS).
The Observed Variance = [SUMall i ( xi - X )2]/N
which indicates how much X fluctuates. Also, the square-root of Observed Variance = Standard Error. This will be the result for the study. The Standard Error will be accepted if the Observed Bias is small and the MSE is very close to the Observed Variance.

Error Analysis

The error analysis is performed via Monte Carlo simulations using normally distrubuted random numbers to represent measurement errors. These are added to either the sound speeds, arrival or travel times, or both. The distribution of these errors will influence the results of error analysis; therefore, the standard deviation (STD) of the error distribution should be set as realistic as possible. This is done by using events with a known origin as the calibration.

A computer model which includes the least-square procedure and the error calculations is constructed to accept the following parameters: number of hydrophones to be used, their geographic locations, sound speed factor and season, number of reference or study points and their geographic positions to be used as T-Phase event origins, number of experiments to be run at each reference point, and random seeds for a normally distributed random generator with mean = 0 and the adjustable standard deviation.

Note that normally distributed random eorrors are used for the study here. Since the event distribution from any given T-Phase origin is unknown in general, a normal distribution is an arbitrary or more a neutral choice of assumption.

Calibration

Loihi is a seamount located at approximately 18.9oN and 155oW. It is off the SE coast of the big island of Hawaii. Continuous eruptions were recorded by the Hawaii Volcano Observatory from Loihi between Julian days 199 to 210, 1996. They were recorded by the NOAA hydrophones as well as other seismic stations in Hawaii. From all the events recorded only by the 5 NOAA hydrophones (for their locations, see Figures 2, 3, & 4 except the one at 0o, 95oW), 1653 of them were chosen. All their locations were computed and are shown in Figure 1.

Scatter plot of the Loihi events.
Figure 1: Loihi T-Phase Events. The rec cross: + is the average position
(18.978oN, 55.574oW) computed from the 1653 locations.

Assuming a common point of origin, the scatter of the T-Phase locations in Figure 1 gave the following statistics:

Average

Standard Deviations

Latitude

18.978oN

0.1128o

Longitude

155.574oW

0.3601o

The same statistics for the origin time cannot be independently extracted from the Loihi events. Note that the statistics above were computed from the NOAA's hydrophone data which are independent of the results from the Loihi web-site.

Assuming the Standard Deviations from the computed Loihi locations are representive of the location error at that site, simulations were run to determine what arrival time measurement error should be used to obtain approximately the same Standard Deviation from the Loihi statistics, i.e., determine the standard deviation: STD of the error distribution for the random number generator.

Different standard deviations (0.6, 0.65, 0.7, 0.75, & 1 for STD) were used for the random number generator to produce numbers as errors, and these errors were added to the arrival times. The arrival times are the origin time (which is 0 in this case) plus the travel times from the Loihi location: (18.92oN,155.25oW) to the same positions of the 5 hydrophones used to locate the Loihi events. Both arrival times and the errors were in seconds. No errors were added to the sound speeds. The assumption is the sound speeds will not change significently through 11 days time over a contant path through a ocean. For each simulation of STD, 100 experiments were run and both unweighted ( when all weights were set to 1, i.e. an equal weighted ) and weighted conditions were applied. They both gave a very similar results as shown in the following table where the values inside the parentheses () are the weighted results:

STD Standard Errors
Latitude Longitude
0.60 0.089 (0.089) 0.365 (0.369)
0.65 0.095 (0.096) 0.403 (0.400)
0.70 0.103 (0.104) 0.431 (0.431)
0.75 0.111 (0.111) 0.459 (0.462)
1.00 0.147 (0.148) 0.619 (0.615)

STD = 0.75 was selected as a conservative choice. It gave the standard errors in degrees of 0.111 for the latitude and 0.459 for the longitude. They are higher than the Loihi statistics, especially the longitude.

Results from the Simulations

Using normally distributed errors with mean=0, and standard deviation (STD)=0.75 added only to the predicted arrival times, two simulations were performed. They cover the NE and SE of the Pacific Ocean with 6 hydrophones positioned at the 2 meridians 110oW and 95oW, spread 8 degrees both North and South from the equator to form a rectangular array on the map as shown in the following 3 figures. Weighted Error Analyses were also run and the weighted results were not signiticently different from the unweighted results.

For all 3 figures below, weighted and unweighted results can be switched by clicking the pictures. When click on the area inside the square marks ( the hydrophones ), the enlarged view of the area will be shown.

Standard Errors of Latitudes
Figure 2: The contour lines are in minutes, 1o=60 minutes

Standard Errors of Longitudes
^Figure 3: The contour lines are in minutes, 1o=60 minutes.^

Standard Errors of the Origin Times
^Figure 4: The contour lines are in seconds.^

For all 3 figures above, the simulations did not include the gulf of Alaska. The area to the west of the Washington and Oregon coast lines are blind spots for the hydrophones. The abrupt contours at between -50o &-40o latitudes and -180o longitude were caused by the Islands of New Zealand and Chatham.

Figures 2 and 3 show the contours of standard errors in minutes for latitudes and longitudes when computing the T-Phase event origins. For example, if an event is located at 20oN and 150oW, the standard errors for this position will be approximately 6 minutes in latitude and 20 minutes in longitude, i.e., the ranges of the computed location will be 20oN +/- 6 minutes and 150oW +/- 20 minutes. By the Empirical Rule in statistics, these are the approximately 68% confidence intervals. For 95.5%, double the errors, e.g., 20oN +/- 2x(6 minutes).  For 99.7%, triple the errors. Again this is assuming the error of arrival times is +/- 0.75 seconds and the error distribution is normal with the mean = 0 and standard deviation = 0.75.

The positions of the 6 hydrophones were the same locations for the NOAA's deployed hydrophones. This hydrophone geometry is not an ideal orientation. These positions were used because the deployment and recovery of hydrophones were done as a courtesy by the NOAA ship KA'IMIMOANA for the TAO array project. The hydrophone at 0o and 95oW was not operational after the deployment during the period Julian Days 126 to 297, 1996. The other 5 worked and recorded the Loihi events and others.

Hydrophone Geometry

As the indicated before , the hydrophones' positions at figures 2 to 4 are not ideal , and they can, for some cases, create a local solution (local minimum) for the least-squares procedure when computing the T-Phase event lcoation. For example, using the same hydrophone geometry as shown in figure 2, let  26.5o S and 129.5oW be an origin, this will create a local minimum at 7.61oS and 109.515oW. The local minimum problems were solved by using 2 starting points: one inside and other outside of area formed by the hydrophones. Two locations were computed and the one that gave the smallest residual would be chosen as the final answer. The weighted simulations with the same settings as the unweighted created 2 sensitive locations at (14.5oS,104.5oW) and (42.5oS,166.5oW). A stopping tolerance=1E-10 was needed in order for the least-squares procedure to converge. For the rest of area, tolerance=1E-5 was enough to reach the results.

Other problems affecting the ability for the hydrophones to locate in general are shown in Figure 5. The marks: * show the problem spots because they are far away from the hydrophones.

Unsatisfactory Hyrdophones' Geometry
^Figure 5^

A better setting of the hydrophones will be a Hexagon orientation. However, no matter how good the hydrophone geometry will be, they will always have a problem to locate event origins in some instances specially at a far distance, e.g., 10 or 20 degrees away, see Figure 6 below.

Better Hydrophones' Geometry
^Figure 6^

In any case, hydrophone geometry plays an important role for getting better results. In general, the hydrophone orientation should be positioned so that no more than 2 hydrophones and the T-Phase event origin can be collinearly along the T-Phase travel path.

The spacing between the hydrophones will depend on the coverage or study area of the interest. They should be directly correlated. Other research papers (Anderson 1982) & ( Bratt and Bacho 1988) also mentioned the similar problem of the geometry.

Other Results from the Error Analysis

In general, when more hydrophones are available for computing the T-Phase event locations, the results are better than the ones calculated by fewer hydrophones. The following numerical examples show the improvement of the Standard Errors from 2 locations computed by 3 to 6 hydrophones. The 2 locations are P1=(4oS , 109oW) or P1=(-4o,-109o) and P2=(-10o,-117o). All weights were set to 1. Their starndard errors are nonincreasing as number of hydrophones increase.

Number of Hydphones Used Standard Error of P1 Standard Error of P2
Laitutde Longitude Origin Time Latitude Longitude Origin Time
3 0.01972 0.05535 1.3698 0.07507 0.1388 10.818
4 0.007243 0.008782 0.4254 0.06154 0.1054 8.4663
5 0.006744 0.007552 0.3618 0.04344 0.05920 4.9397
6 0.006519 0.006554 0.3747 0.03981 0.05331 4.4831

In P1 the Standard Error of Origin Time computed by 5 hydrophones caluclation is 0.3618 < 0.3747 that computed by the 6 hydrophones. This may be due to the combinations of the bad hydrophone orientation, random probability, and/or numerical round-off errors. The calculations of the statistics above again used STD=0.75 for the error distribution. Errors were added to arrival times, and 100 experiments were run for each simulation. The locations of the hydrophones are: (0o,-110o), (0o,-95o), & (-8o,-95o) for the 3 hydrophone setting; the same 3 locations above plus (-8o,-110o) for the 4 hydrophones; (8o,-110o), (0o,-110o), (-8o,-95o), (8o,-95o) &  (-8o,-95o) for the 5 hydrphones; the same locations in the 5 hydrophones' setting plus (0o,-95o) for the 6 hydrophones' setting.

Summary

An error analysis has provided estimates of the standard error of T-Phase locations derived from the NOAA/PMEL hydrophone array as shown in Figures 2, 3, and 4 with the following assumptions:

It is assumed that the random errors from a normal distribution are a good approximation of the real-time errors. The standard deviation: STD computed from the calibration, used for the error distribution is assumed to be a realistic one. If data for calibration are not available, STD can be set to a value of best educated guess. Bear in mind that the results are inferenced by the STD. The results should be used with care.

The calculations of the sound speeds are based on 3rd season: July to September, and the speeds are modify by the correction factor: 0.998. No errors are added to the sound speeds for this study. It is assumed that the errors of the sound speeds are negligible and can be ignored. The 6 hydrophones with the same locations shown in Figures 2 to 4 are used for the study. If any one of these assumptions is altered, the experiment must be repeated to reflect the changes.

Once the standard errors are calculated, they are in the form of 2-D arrays so that contour plots can be made. The standard errors can be fitted by a 2-D polynomial function so that the coefficients of the function can be saved and incorporated into other computer programs for interpolating the standard errors without searching the results from the contour plots in Figures 2, 3, and 4.

One must understand these results are calibrated by only 1 event for the whole study area, the East Pacific Ocean. Statistically speaking, the results are inferenced beyond the scope of the data. Until more data are available, this is one way to estimate the errors. Therefore, these results must be used with care.

Discussion

For this study, selecting STD for the normally distributed random numbers as errors is the central part of how to compute the standard errors. The normality assumption is a neutral choice to use since the event distribution from any given T-Phase origin is unknown in general. The problem caused by small d.f. is not new. Seismologists have been dealing with this problem for long time. Their solutions are applying prior information to the mean square error of the residual: s2
( Evernden 1969), (Bratt and Bache 1988),( Jordan and Sverdrup 1981) or to the scaling factors for the correction values of the estimated parameters (Pavlis 1986).

This study is no different from the previous research. Calibration is used to insure that the selection of STD is realistic. This approach may be outside of the traditional statistical regression analysis.  However, the Least-Square method being used to solve for the T-Phase event origions here is not the typical setting in statistics either. In statistics, it observes a set of the data from a random sampling and establish rules of cause and effect or association between the explanatory variables and responese variables. This is not the case here. Even the nonlinear regression is employed to solve an optimization problem which in this study is minimizing the residuals; but it does not necessarily make it a statistical problem. Together with small d.f. which leads to difficulty of standard error interpretiation, alternative approach such as this study and others is understandable.

The open question here is: what is the minimum d.f. that is considered to be just large enough so that the standard errors can be computed directly from a regression model? Before address this question, one must understand statistical theory. The properties of the least square estimators are Not based on the large sample size asymptotics, so there is nothing wrong with performing regression analysis with very small sample size. Theoretically, the relevance of sample size, which is used to get d.f., is depended on the strength of the relationship, the size of the error about the reqression, the number of explanatory variables, and the question that is being asked of the reqression. Having said that, there are people still want the "minimum d.f." question to be answered; because, in practice the study like this one and other researches indicate the size of d.f. that will influence the results of the error estimations.

Since statistical theory cannot provide a general answer, one can only speculate that the answer may be at least 50 by judging thestatisitcal tables (because their d.f. usually are listed from 1 to 60 then 120 and infinitive) from most of the statistical text books. Two research papers encounted for this report had addressed the size of d.f. question. (Anderson 1982) indicated at least 3 or 4 observations per d.f. would be desirable. (James, et al 1969) suggested that at least 9 seismic stations for the events occur within the stations. If an event is outside the stations, 9 stations are not enough.

Acknowledgments

Thanks to Dr. Chris Fox for his support. Dr. Dan Schafer and the consulting service from the Oregon State University, Department of Statistics for their helps and suggestions. This work was part of the NOAA/PMEL/ OERD T-Phase research project.

References

Anderson, K. R. Robust earthquake location using M-estimates, Physics of the Earth and Planetary Interiors, Volume 30, pp.119-130, 1982.
Bratt S. R. and Bache T. C. Bulletin of the Seisomological society of America, Volume 78, Number 2, pp. 780-798, 1988.
Evernden J. F. Precision of epicenters obtianed by samll numbers of world-wide stations, Bulletin of the Seisomological society of America, Volume 59, Number 3, pp. 1365-1398, 1969.
James, D. E., Sacks I. S., Eduardo Lazo L., and Pablo Aparicio G. On locating local earthquakes using samll networks, Bulletin of the Seisomological society of America, Volume 59, Number 3, pp. 1201-1212, 1969.
Jordan T. H. and Sverdrup K. A. Teleseimic location techniques and their application to earthquake clusters in the south-central pacific, Bulletin of the Seisomological society of America, Volume 71, Number 4, pp. 1105-1130, 1981.
Pavlis G. L. Appraising earthquake hypocenter location errors: A complete, pactical approach for single-event locations, Bulletin of the Seisomological society of America, Volume 76, Number 6, pp. 1699-1717, 1986
Wonnacott R. J. and Wonneacott T. H. Econotmetrics, 2nd Edition, Wiley, pp. 221-223, 1979.

Presentation

Information of this report is part of the paper: Monitoring Pacific Ocean Seismicity from an Autonomous Hydrophone Array, by Christopher G. Fox, Haruyoshi Matsumoto, Tai-Kwan Andy Lau, published in the Journal of Geophysical Research, Volume 106, Number B3, Pages 4183-4206, March 10, 2001.

This report was presented on December 12, 1997 as a poster section at the AGU meeting in San Francisco, California, Location: Moscine Convention Center, Room number: MC, Paper number: T52B-17. The poster title was Error Analysis of Epicenter Locations Collected from the NOAA/PMEL Autonomous Hydrophone Array.

Back to Other Andy's Pages
[T-K Andy Lau] [T-Phase Project] [ Sidescan Project] [Publications]