CONTENTS
Introduction
Sample Description
Details of the Exercise
Methods
Results
Discussion
Conclusions
References
Northeast Fisheries Science Center Reference Document 01-13

Elemental Composition of Fish Otoliths:
Results of a Laboratory Intercomparison Exercise


by Vincent S. Zdanowicz

National Marine Fisheries Serv., 74 Magruder Rd., Highlands, NJ 07732
Current Address: U.S. Customs Serv., 7501 Boston Blvd., Ste. 113, Springfield, VA 22153


Print publication date September 2001; web version posted December 10, 2001

Citation: Zdanowicz, V.S. 2001. Elemental composition of fish otoliths: results of a laboratory intercomparison exercise. Northeast Fish. Sci. Cent. Ref. Doc. 01-13; 92 p.


Introduction

Within the past decade, otolith elemental analysis has become increasingly utilized in studies of fishery biology.  Based on the premise that differences in habitat chemistry are manifested in otoliths as differences in chemical composition, otolith elemental analysis has been used to address important questions in fishery research.  However, this application has been hindered by the lack of an adequate standard for analytical quality control.  As a newly emerging research technique, otolith elemental analysis requires further standardization between laboratories in order to optimize its usefulness.

Standardization can be achieved through the use of Certified Reference Materials (CRMs) and participation in intercomparison exercises.  A CRM is a substance, one or more of whose properties is sufficiently well characterized to be used for the calibration of an apparatus or the assessment of a measurement method.  Results of CRM analyses can be used to validate a laboratory’s analytical results.  They also provide a basis for comparing data generated at different laboratories or using different analytical methods.  At the time of this exercise, no otolith CRMs were available, although one has since been developed through a collaboration between the Western Australian Marine Research Laboratories and Japan’s National Institute for Environmental Studies (Yoshinaga, et al., 2000).

With the appearance of an increasing number of published studies and the absence of a suitable means of assessing the quality of their results, a laboratory intercomparison exercise was conducted in 1999 in an attempt to benchmark the status of otolith elemental analyses being conducted by investigators in the otolith research community.  In the absence of a suitable otolith CRM, participation in intercomparison exercises can provide participants with a basis for direct comparison of their results with those of other laboratories.


SAMPLE DESCRIPTION

Three samples were used for the exercise: RMRS1, a fish otolith powder; SRM915a, a powdered, high purity calcium carbonate Standard Reference Material (SRM) produced by the National Institute of Standards and Technology (NIST); and SRMSOLN, a solution containing SRM915a dissolved in 1% nitric acid (1 mg CaCO3/ml), spiked in the low ng/ml range with a mixture of metals. 

RMRS1 was composed of ground, sieved, homogenized otoliths of red snapper (Lutjanus campechanus) from the Gulf of Mexico.  Its true elemental composition was unknown.  The otoliths were initially ground at NIST using their cryogenic homogenization technique (Zeisler, et al., 1983).  This technique yields a powder containing particles of a range of sizes, including relatively large particles, so coarser fractions were ground (acid-rinsed, agate mortar and pestle) at the Howard Laboratory until all the powder passed through a 100 mesh screen (acid-rinsed, nylon).  Enough material was obtained to produce 36 bottles of powder, each containing 450-500 mg.  Homogeneity was assessed by analyzing three replicate samples from every sixth bottle, including the first and last, and comparing the concentrations of eight elements (Li, Na, Mg, K, Ca, Mn, Sr and Ba) using the analysis of variance (ANOVA) (Zar, 1984).  No statistical differences in composition were found between bottles, so the powder was judged to be homogeneous.
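The bottle-to-bottle comparison described above amounts to a one-way ANOVA with bottles as groups.  A minimal sketch follows, assuming a single element and made-up replicate values (none of the numbers come from the exercise); the function name one_way_anova_f is hypothetical.  It computes only the F statistic, which would then be compared with a tabulated critical value for the given degrees of freedom.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate lists."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: spread of bottle means about the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of replicates about their own bottle mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical concentrations: three bottles, three replicates each.
bottles = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat, df_b, df_w = one_way_anova_f(bottles)
```

An F statistic below the critical value would be read, as in the text, as no detectable difference between bottles for that element.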

SRM915a was provided as an analytical reference material.  Although it is not certified for trace metal content, some data on trace metal levels are provided in its Certificate of Analysis.  Also, it is very pure, so it can be used to provide a clean matrix match for calibration standards/spikes, or possibly as a calibration standard for probe analyses.

SRMSOLN was intended to provide information on interlaboratory variability due to sample preparation methods.  It was spiked with measurable levels of a suite of metals, some of whose concentrations in the powdered samples were expected to be below the detection limits of some laboratories.  It was thought that laser laboratories that analyze solutions might also find it useful.


DETAILS OF THE EXERCISE

The exercise was performance-based.  No particular sample preparation or instrumental procedure was specified.  Participants were to use their routine procedures, including solid state (probe) methods, total dissolution techniques, isotope dilution, atomic absorption/emission, and others.  Participants were invited to use more than one procedure and report more than one set of results.

Each participant was sent approximately 500 mg of RMRS1, 500 mg of SRM915a and 30 ml of the SRMSOLN.  A copy of the Certificate of Analysis for SRM915a was included.  Participants were asked to prepare five replicates of each material for analysis using their routine sample preparation method(s), to analyze the prepared samples using their routine analytical procedure(s), and to measure as many, or as few, elements / isotopes as they wished.  For the powders, they were asked to calculate dry weight concentrations for each element using their routine procedure.  For the solution, they were asked to report results in ng/ml and not to perform a blank correction.

A formatted spreadsheet was provided for reporting the analytical results and the details of laboratory procedures, including sample preparation methods, calibration methods, etc.  Analytical results were compiled at the Howard Laboratory and each set of results was assigned a unique identification number (Lab #).  When all results had been received, each participant was sent a listing of raw data in order to verify that no transcription errors were made in the compilation of the data.

The true composition of the otolith powder was unknown, and no attempt was made to assign consensus or “correct” values to any analyte.  The purpose of the exercise was to provide a basis for direct comparison of results among laboratories.


METHODS

Two data sets were used for this report.  The Raw (unedited) data set contains all verified, final data, including LTVs ("Less Than" Values).  The Raw data set was used only to construct the tables in Appendix B.  The Quantitative (edited) data set contains all verified, final data, not including LTVs.  The Quantitative data set was used for summary statistics appearing in the tables in Appendix B, for data plots in Appendix C, and for all computations and statistical evaluations and any tables resulting from them (text Tables 1-10 and Appendix D Tables D1 through D6).

Tables

Replicate data for 28 elements are compiled in the tables in Appendix B.  Data are listed as received with respect to significant figures, or using four decimal places, whichever was fewer.  Also shown are the mean, standard deviation and the %CV (coefficient of variation, computed as 100 x laboratory standard deviation / laboratory mean).  For SRM915a, values listed as "Ref" are from the Certificate of Analysis.  For SRMSOLN, values listed as "Actual" are the total concentrations present in the solution.  For example, Mg is present in SRM915a at 1 ug/g dry weight.  SRMSOLN contains 1 mg of the SRM per ml of solution.  This contributes 1 ng Mg per ml of solution to the total solution concentration.  Another 3.5 ng Mg per ml was contributed by the spike.  Thus, the "Actual" concentration was 4.5 ng Mg per ml solution.  For elements present in the spike solution but not listed on the Certificate of Analysis, the amount contributed by the SRM was assumed to be zero.
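The two computations described above (%CV and the "Actual" solution concentration) can be sketched as follows.  This is an illustration only: the function names are hypothetical, the replicate values are made up, and the use of the sample (n-1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev

def percent_cv(replicates):
    """%CV = 100 x laboratory standard deviation / laboratory mean.
    Assumes the sample (n-1) standard deviation."""
    return 100.0 * stdev(replicates) / mean(replicates)

def actual_solution_conc(srm_ug_per_g, spike_ng_per_ml, srm_mg_per_ml=1.0):
    """Total ng/ml = contribution from dissolved SRM + contribution from spike.
    Units: (ug/g) x (mg/ml) = ng/ml, because 1 ug/g equals 1 ng/mg."""
    return srm_ug_per_g * srm_mg_per_ml + spike_ng_per_ml

# Mg example from the text: 1 ug/g in SRM915a, 1 mg SRM per ml, 3.5 ng/ml spike.
mg_actual = actual_solution_conc(1.0, 3.5)

# Hypothetical five replicates (not data from the exercise).
demo_reps = [4.0, 5.0, 6.0, 5.0, 5.0]
demo_cv = percent_cv(demo_reps)
```

For an element in the spike but absent from the Certificate of Analysis, the first argument would be zero, following the assumption stated above.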

Figures

Replicate data for 18 elements are plotted versus Lab # in the data plots in Appendix C.  Elements with fewer than four sets of results for at least one sample were not plotted.

Z-scores and P-scores

Z-scores and p-scores were computed for each laboratory for each element in each sample (Appendix D).

Z-scores were calculated as

z  =  (x_L - X) / S

where x_L = the laboratory mean, X = the accepted value and S = the target value for the standard deviation. 

P-scores were calculated as

p  =  s_L / S

where s_L = the laboratory standard deviation and S = the target value for the standard deviation. 

For this exercise, the value used for X was the overall mean for the element in the sample and the value used for S was the overall standard deviation for the element in the sample.  The term "overall" indicates that all data (from the Quantitative data set) submitted by all participants were used.
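A minimal sketch of the score calculation for one element in one sample is given below.  It assumes that "overall" means pooling every replicate from every laboratory, and that sample (n-1) standard deviations are used; the function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev

def z_and_p_scores(lab_replicates):
    """lab_replicates maps Lab # -> replicate values for one element in one sample.
    Returns Lab # -> (z, p), with X and S taken from the pooled data."""
    pooled = [v for reps in lab_replicates.values() for v in reps]
    X = mean(pooled)    # overall mean
    S = stdev(pooled)   # overall standard deviation
    scores = {}
    for lab, reps in lab_replicates.items():
        z = (mean(reps) - X) / S   # laboratory mean relative to X
        p = stdev(reps) / S        # laboratory SD as a multiple of S
        scores[lab] = (z, p)
    return scores

# Hypothetical replicates for three laboratories (not data from the exercise).
demo = {1: [1.0, 2.0, 3.0], 2: [2.0, 3.0, 4.0], 3: [3.0, 4.0, 5.0]}
scores = z_and_p_scores(demo)
```

Because X and S are derived from the same data being scored, the scores describe position within the group rather than distance from a true value.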


RESULTS

Samples were distributed in January 1999 to 29 laboratories in ten countries, listed in Appendix A.  Results were due by July 1999.  Sixteen sets of results were received from fourteen laboratories from eight countries.  Data were received for 28 elements.  Results are compiled and summarized in tables and plots in Appendices B and C.  Z-scores and p-scores are given in Appendix D.

Methods used and isotopes measured by the participants are listed in Appendix E.  Two laboratories conducted solid state analyses by microprobe; they did not analyze the SRMSOLN sample.  Fourteen laboratories dissolved the powders; they also analyzed the SRMSOLN sample.  In almost every case, the dissolution procedure was performed using nitric acid in an open vessel at room temperature.  Most laboratories used quadrupole inductively coupled plasma mass spectrometry (QICPMS) for the solution analyses; one used high-resolution ICPMS.  Atomic absorption spectrophotometry (AAS) was the next most frequently used technique, followed by inductively coupled plasma atomic emission spectrometry (ICPAES).

After excluding LTVs and elements with fewer than five sets of results, 11 elements remained as candidates for evaluation for SRM915a, 15 for the RMRS1 and SRMSOLN samples.  Three factors greatly influenced the evaluation of the results: 1) few certified or reference data were available for SRM 915a; 2) the composition of RMRS1 was unknown; and 3) no attempt was made to derive consensus or “correct” values for any analyte in any sample.  Nevertheless, an attempt was made to assess the accuracy and precision of the results and the extent of agreement between laboratories.

Accuracy

Certified or informational values are available for six elements in SRM915a.  In general, informational values differ from certified values in that they have not been determined by two independent methods, nor subjected to rigorous statistical evaluation.  An informational value is a "value of a property, not certified but provided because it is believed to be reliable and to provide information important to the certified material" (Taylor, 1985).  Quantitative results for sodium were submitted by only one laboratory, so accuracy was evaluated based on the five elements listed below.

Mg - 1.0 ug/g (informational value)
Ca - 40.0 % (certified value)
Mn - 0.6 ug/g (informational value)
Cu - 0.95 ug/g (average of 0.9 ug/g and 1 ug/g informational values)
Sr - 2.1 ug/g (informational value)

The measure used was % Recovery, computed as 100 x laboratory mean / certificate value.  State-of-the-art accuracy in trace element analysis requires recoveries within 20% of the true concentration at analyte levels less than 1 ppm, within 10% of the true concentration at analyte levels greater than 1 ppm and within 5% of the true concentration at analyte levels in the percent range (greater than 1000 ppm).  In this report, three cut-off levels were used to categorize results: ±10%, ±20% and > 150%, corresponding to "good," "acceptable," and "poor" or "unacceptable" accuracy.
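The recovery calculation and the three cut-off levels can be sketched as follows.  The function names and laboratory means are hypothetical.  The text does not name a category for recoveries that fall outside ±20% without exceeding 150%, so the label used for that case here is an assumption.

```python
def percent_recovery(lab_mean, certificate_value):
    """% Recovery = 100 x laboratory mean / certificate value."""
    return 100.0 * lab_mean / certificate_value

def accuracy_category(recovery):
    """Apply the report's cut-offs: within 10%, within 20%, and > 150%."""
    deviation = abs(recovery - 100.0)
    if deviation <= 10.0:
        return "good"
    if deviation <= 20.0:
        return "acceptable"
    if recovery > 150.0:
        return "poor"
    # Not categorized in the text; label chosen for this sketch only.
    return "outside 20%"

# Hypothetical laboratory means against the SRM915a certificate values.
ca = accuracy_category(percent_recovery(41.0, 40.0))   # Ca, 40.0 %
mg = accuracy_category(percent_recovery(2.0, 1.0))     # Mg, 1.0 ug/g
cu = accuracy_category(percent_recovery(1.1, 0.95))    # Cu, 0.95 ug/g
```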

Of the five elements listed above, good accuracy was achieved by all the laboratories only for Ca, present in the SRM at 40.0% (400,000 ppm, Table 1).  All nine laboratories had recoveries within 10% of the certificate value.  Results for the other four elements were generally poor.  For Mg, present at 1 ug/g, only one of nine laboratories had a recovery within 10% of the certificate value; eight of nine recoveries were >150% of the certificate value.  For Mn (0.6 ug/g) and Cu (0.95 ug/g), only one laboratory had a recovery within 10% of the certificate value; three had recoveries within 20% of the certificate value, but recoveries for three laboratories were >150% of the certificate value.  For Sr (2.1 ug/g), the best recovery was 147%; seven of eight laboratories had recoveries >150% of the certificate value.

Recovery results for the SRMSOLN sample are summarized in Table 2.  Again, the best accuracy was achieved only for Ca, present in the sample at 400,000 ng/ml.  All the laboratories (10 of 10) had recoveries within 20% of the actual value, while eight of 10 recoveries were within 10% of the actual value.  Results for the other four elements were slightly better than for the SRM, but still poor.  For Mg, present at 4.5 ng/ml, no recoveries were within 10% of the actual value, while three (of 11) were within 20% of the actual value, and only four of 11 were >150% of the actual value.  For Mn (4.1 ng/ml), more than half the laboratories (seven of 13) had recoveries within 10% of the actual value, and only three recoveries were >150% of the actual value.  For Cu (4.4 ng/ml), three laboratories had recoveries within 10% of the actual value; five had recoveries within 20% of the actual value, and only two laboratories had recoveries >150% of the actual value.  For Sr (5.6 ng/ml), one laboratory had a recovery within 10% of the actual value; two had recoveries within 20% of the actual value, and three laboratories had recoveries >150% of the actual value.  Thus, for the SRMSOLN sample, there was a slight improvement over SRM915a recoveries.

Accuracy was not assessed for the RMRS1 sample, since its composition was unknown and consensus values were not derived for any element in the sample.

Although most laboratories used QICPMS to measure elemental concentrations, other techniques were also used.  However, accuracy does not appear to be related to the instrumental methods used, nor to the specific isotopes measured.  For example, best recoveries were obtained for Ca in the SRM915a and SRMSOLN samples.  Ca was measured using ICPAES, Flame AA, and ICPMS (five different isotopes) with equally good results.  These good recoveries were most likely related to the high concentrations of Ca in these samples, resulting in greater ease of measurement.  Worst recoveries, in general, were obtained for Mg, which was also measured by several techniques: ICPAES, Graphite Furnace AA, and ICPMS (three different isotopes).  Recoveries in the SRMSOLN sample were better than in the SRM915a sample, but again this was most likely due to higher concentrations of Mg in the SRMSOLN sample.

Precision

Analytical precision was assessed based on intralaboratory %CV.  For each sample, elements with five or more sets of data were evaluated.  Thus, 11 elements were evaluated for SRM915a and 15 for the RMRS1 and SRMSOLN samples.  Cut-off levels used to categorize results were 10%, 20% and 50%, corresponding to "good," "acceptable," and "poor" or "unacceptable" precision.

Table 3 shows the precision results for SRM915a.  Best results were obtained for Ca; all nine laboratories had CVs of 10% or less.  This is not unexpected, however, since precision usually improves with concentration.  If the sample weights reported in Appendix E were used for SRM915a as well as for RMRS1, Ca concentrations in solution would be more than high enough to promote good precision.  This would not be true for other elements, however.  Elements present in SRM915a at ug/g levels would be present in solutions prepared from SRM915a at ng/ml concentrations, and precision would not be expected to be as high as for Ca.  This, in fact, was observed.  Nevertheless, for Mg, Mn, Cu and Sr (elements for which accuracy was assessed), precision results were much better than accuracy results - that is, most laboratories had CVs < 20%, and for Cu and Sr most CVs were < 10%.  The worst precision for this group was for Mg, where two laboratories had CVs > 50%.  Precision results for Co, Ni, Zn, Ba and Pb were comparable to results for the "accuracy group"; half the laboratories (for Co and Ni, almost all the laboratories) had CVs < 20% and many laboratories had CVs < 10%.  Ba was the worst of this group; three laboratories had CVs > 50%.  The worst results were for Cr.  Of four laboratories, only one had a CV < 20%; one laboratory had a CV > 50%.

Results for SRMSOLN are given in Table 4.  Best precision was again obtained for Ca (all laboratories had CVs < 10%), but comparably high precision was also achieved for Mg, Mn, Cu and Sr (the "accuracy group"), as well as for Li, Co, Zn, Cd, Ba and Pb.  For Cr, Ni, As and Rb, most laboratories had CVs < 20%, and for As and Rb, one laboratory had CVs > 50%.  Thus, compared to SRM915a, there was substantial improvement in the precision of measurements of the SRMSOLN sample.  This is interesting, considering SRMSOLN was prepared from SRM915a.  However, this improvement can likely be attributed to two factors.  First, because this sample was spiked, the concentrations of many of these elements were much higher (relatively) in the SRMSOLN sample than in solutions of SRM915a prepared by the participants, promoting better precision.  Second, the SRMSOLN sample was prepared as a single large sample by the organizer and aliquots were sent to the participants.  Thus, the variability caused by sample preparation procedures was removed from the measurements.  A similar result was observed in an intercomparison exercise conducted by NOAA and refereed by NRC Canada in the mid 1980s.

Precision results for RMRS1 are given in Table 5.  Results for the “accuracy group” of elements (Mg, Ca, Mn, Cu and Sr) were mixed.  For Ca and Sr, all laboratories had CVs < 20%, with all laboratories except one within 10%.  For Mg and Mn, all laboratories except two had CVs < 20%, but for Mn only five of 12 laboratories had CVs < 10%; for Mg, eight of 11 laboratories had CVs < 10%, but one laboratory had a CV > 50%.  And for Cu, only four of seven laboratories had CVs < 20%, while only two of seven were < 10%.  Results for the other elements were also mixed.  For Na, K and Ba almost all laboratories had CVs < 20%; most were < 10%.  For Li, Co and Ni most laboratories had CVs < 20%, although for Co one laboratory had a CV > 50%.  And for Cr, Zn, Rb and Pb few laboratories had CVs < 20%; several CVs were > 50%.  These results are also consistent with the trend toward better precision with increasing concentration.  In fish otoliths, elemental concentrations may vary with species and geographic location, but certain elements consistently appear to occur in high abundance, while others occur at low abundance (Table 6, Zdanowicz, unpublished data).  Thus,  higher precision was generally obtained for the more abundant elements (Na, Mg, K, Ca and Sr) than for those occurring in otoliths at low ug/g levels.

As above, precision does not appear to be related to the methods used.  A more significant result is the increase in precision obtained in the measurement of the solution sample relative to the precision in the measurement of the powder samples.

Agreement Among Laboratories

In order to gauge the extent of agreement between laboratories, two measures were used, z-scores and p-scores, listed in Appendix D.  Z-scores are related to accuracy; a z-score is the number of standard deviations by which a laboratory's mean value differs from some accepted value.  P-scores are related to precision; a p-score is a laboratory's standard deviation expressed as a multiple of some accepted value.  Results were assessed by element and by laboratory. 

Z-Scores

Z-scores were calculated for each element in each sample for each laboratory.  Then, for each element in each sample, means (AVZ) and standard deviations (SDZ) of the z-scores were computed and an interval calculated which ranged from (AVZ-SDZ) to (AVZ+SDZ).  Z-scores that fell within those intervals were designated "in agreement."

SUMMARY BY ELEMENT (Table 7).  For each sample, by comparing z-scores for an element, agreement among laboratories in the measurement of that element could be assessed.  It was expressed as Percentage of Laboratories in Measurement Agreement (%LMA), computed as 100 x number of laboratories with z-scores in agreement / the total number of laboratories for which z-scores were calculated.  For example, z-scores were calculated for Mg in SRM915a for nine laboratories.  Z-scores for eight laboratories were in agreement, as defined above.  Thus, for that sample, %LMA was 89% (Mg results were in agreement 89% of the time).

  • for the SRM915a sample, z-scores were computed for 11 elements.  %LMA ranged from 75% to 92%, and was 80% or higher for eight elements and 90% or higher for two elements.
  • for the SRMSOLN sample, z-scores were computed for 15 elements.  %LMA ranged from 50% to 92%, and was 80% or higher for nine elements and 90% or higher for three elements.
  • for the RMRS1 sample, z-scores were computed for 15 elements.  %LMA ranged from 71% to 89%, and was 80% or higher for 12 elements and 90% or higher for 0 elements.
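The agreement interval and the %LMA calculation for a single element can be sketched as follows.  The function name and the nine z-scores are hypothetical, and the sketch assumes that a score lying exactly on an interval boundary counts as "in agreement" and that SDZ is the sample (n-1) standard deviation.

```python
from statistics import mean, stdev

def percent_in_agreement(scores):
    """Percentage of scores inside the interval (AV - SD) to (AV + SD)."""
    av = mean(scores)
    sd = stdev(scores)
    low, high = av - sd, av + sd
    n_agree = sum(1 for s in scores if low <= s <= high)
    return 100.0 * n_agree / len(scores)

# Hypothetical z-scores for one element from nine laboratories, one of them an outlier.
z_scores = [-0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.3, 2.6]
lma = percent_in_agreement(z_scores)
```

With these made-up values, eight of nine scores fall inside the interval, which reproduces the arithmetic of the Mg example above (eight of nine, or 89%).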

SUMMARY BY LABORATORY (Table 8).  For each sample, by examining z-scores for all elements measured by a laboratory, measurement performance of individual laboratories could be assessed.  It was expressed as Percentage of Elements in Measurement Agreement (%EMA), and computed as 100 x number of elements with z-scores in agreement / the total number of elements measured by that laboratory for which z-scores were calculated.  For example, Lab 1 measured 12 elements in SRMSOLN.  Z-scores for two of those elements were in agreement, as defined above.  Thus, for that sample, %EMA for Lab 1 was 17% (Lab 1 was in agreement with other laboratories 17% of the time).

  • for the SRM915a sample, z-scores were computed for at least one element for 15 laboratories.  %EMA ranged from 0% to 100%, and was 80% or higher for nine laboratories, and 90% or higher for eight laboratories.
  • for the SRMSOLN sample, z-scores were computed for at least one element for 14 laboratories.  %EMA ranged from 17% to 100%, and was 80% or higher for 10 laboratories, and 90% or higher for six laboratories.
  • for the RMRS1 sample, z-scores were computed for at least one element for 16 laboratories.  %EMA ranged from 20% to 100%, and was 80% or higher for 10 laboratories, and 90% or higher for nine laboratories.
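The by-laboratory summary applies the same agreement intervals but tallies along the other axis.  A minimal sketch, assuming z-scores for one sample are held in a nested mapping of element to laboratory; the function name, laboratory numbers and scores are hypothetical.

```python
from statistics import mean, stdev

def percent_elements_in_agreement(z_by_element):
    """z_by_element maps element -> {Lab #: z-score}.  Returns Lab # -> %EMA."""
    agree, total = {}, {}
    for element, z_by_lab in z_by_element.items():
        values = list(z_by_lab.values())
        av, sd = mean(values), stdev(values)   # AVZ and SDZ for this element
        for lab, z in z_by_lab.items():
            total[lab] = total.get(lab, 0) + 1
            if av - sd <= z <= av + sd:
                agree[lab] = agree.get(lab, 0) + 1
    return {lab: 100.0 * agree.get(lab, 0) / total[lab] for lab in total}

# Hypothetical z-scores for two elements and four laboratories.
demo = {
    "Mg": {1: 2.0, 2: 0.1, 3: -0.1, 4: 0.0},
    "Sr": {1: 0.0, 2: 0.1, 3: -0.1, 4: 2.0},
}
ema = percent_elements_in_agreement(demo)
```

A laboratory that did not report an element is simply absent from that element's mapping, so the denominator counts only the elements for which it has a z-score.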

On the whole, the extent of measurement agreement among laboratories was moderate.  %LMA ranged from 50% to 92%, and was 80% or higher in 29 of 41 instances (71% of the time), and 90% or higher in five of 41 instances (12% of the time).  %EMA ranged from 0 to 100%, and was 80% or higher in 29 of 45 instances (64% of the time), and 90% or higher in 23 of 45 instances (51% of the time). 
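The pooled figures in this paragraph follow from the per-sample counts listed above (in the order SRM915a, SRMSOLN, RMRS1).  The short check below uses only those counts; the function name is hypothetical.

```python
def pooled_percentage(counts, totals):
    """Sum counts and totals across the three samples; return a whole-number percentage."""
    return round(100.0 * sum(counts) / sum(totals))

elements = [11, 15, 15]       # elements with z-scores, per sample (41 in all)
laboratories = [15, 14, 16]   # laboratories with z-scores, per sample (45 in all)

lma_80 = pooled_percentage([8, 9, 12], elements)       # 29 of 41
lma_90 = pooled_percentage([2, 3, 0], elements)        # 5 of 41
ema_80 = pooled_percentage([9, 10, 10], laboratories)  # 29 of 45
ema_90 = pooled_percentage([8, 6, 9], laboratories)    # 23 of 45
```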

P-Scores

As above, p-scores were calculated for each element in each sample for each laboratory.  Then, for each element in each sample, means (AVP) and standard deviations (SDP) of the p-scores were computed and an interval calculated which ranged from (AVP-SDP) to (AVP+SDP).  P-scores that fell within those intervals were designated "in agreement."

SUMMARY BY ELEMENT (Table 9).  For each sample, by comparing p-scores for an element, agreement among laboratories in precision for that element could be assessed.  It was expressed as Percentage of Laboratories in Precision Agreement (%LPA), computed as 100 x number of laboratories with p-scores in agreement / the total number of laboratories for which p-scores were calculated.  For example, p-scores were calculated for Mg in SRM915a for seven laboratories.  P-scores for six laboratories were in agreement, as defined above.  Thus, for that sample, %LPA was 86% (precision of Mg results was in agreement 86% of the time).

  • for the SRM915a sample, p-scores were computed for 11 elements.  %LPA ranged from 71% to 90%, and was 80% or higher for nine elements and 90% or higher for two elements.
  • for the SRMSOLN sample, p-scores were computed for 15 elements.  %LPA ranged from 75% to 90%, and was 80% or higher for 12 elements and 90% or higher for one element.
  • for the RMRS1 sample, p-scores were computed for 15 elements.  %LPA ranged from 80% to 92%, and was 80% or higher for 15 elements and 90% or higher for two elements.

SUMMARY BY LABORATORY (Table 10).  For each sample, by examining p-scores for all elements measured by a laboratory, precision performance of individual laboratories could be assessed.  It was expressed as Percentage of Elements in Precision Agreement (%EPA), and computed as 100 x number of elements with p-scores in agreement / the total number of elements measured by that laboratory for which p-scores were calculated.  For example, Lab 1 measured 12 elements in SRMSOLN.  P-scores for three of those elements were in agreement, as defined above.  Thus, for that sample, %EPA for Lab 1 was 25% (Lab 1 was in agreement with other laboratories 25% of the time).

  • for the SRM915a sample, p-scores were computed for at least one element for 14 laboratories.  %EPA ranged from 0% to 100%, and was 80% or higher for 12 laboratories, and 90% or higher for nine laboratories.
  • for the SRMSOLN sample, p-scores were computed for at least one element for 11 laboratories.  %EPA ranged from 25% to 100%, and was 80% or higher for 10 laboratories, and 90% or higher for eight laboratories.
  • for the RMRS1 sample, p-scores were computed for at least one element for 15 laboratories.  %EPA ranged from 25% to 100%, and was 80% or higher for 12 laboratories, and 90% or higher for eight laboratories.

On the whole, the extent of precision agreement among laboratories was greater than that observed for measurement agreement.  %LPA ranged from 71% to 92%, and was 80% or higher in 36 of 41 instances (88% of the time), and 90% or higher in five of 41 instances (12% of the time).  %EPA ranged from 0% to 100%, and was 80% or higher in 34 of 40 instances (85% of the time), and 90% or higher in 25 of 40 instances (63% of the time).


DISCUSSION

Three questions are of particular importance to participants in an intercomparison exercise.  How accurate are my results?  How good is my precision?  How do my results compare with results from other labs?

Accuracy can be evaluated using %Recovery, as defined earlier.  In this exercise, accuracy could be assessed for only five elements in two of the three samples (Tables 1 and 2).  Results were not encouraging.  Good accuracy was achieved only for Ca, the major component of the samples.  This result is reassuring, however, since Ca is measured in many studies of otolith chemistry.  Recoveries for the other four elements (Mg, Mn, Cu, and Sr), were widely scattered and generally poor.

SRM915a is not a particularly good reference material for otolith analyses.  First of all, levels of Mg and Sr, two important elements in otolith studies, are much lower in SRM915a than in otoliths.  Nevertheless, good accuracy for Mg and Sr in SRM915a would indicate that otolith analysts can measure these two elements at low levels in a high Ca matrix, thus providing some suggestion that Mg and Sr measurements in otoliths might be done properly.  Second, SRM915a is not a good matrix match for otoliths, which contain protein and other constituents not present in SRM915a.  Finally, SRM915a contains few trace elements.  Consideration was given to including limestone CRMs in the exercise, but they were rejected because they contain alumino-silicate phases.  Alumino-silicates would have ensured the presence of more trace elements for use in gauging accuracy, but would not have improved the matrix match.  At the time of this exercise, there was no reference material available that was composed of otoliths, so SRM915a was, all things considered, the best reference material available.  However, now that the Japanese otolith CRM is available, there will be little reason to use SRM915a as a reference material for otolith analyses in the future.

One method of evaluating precision is by using %CV, as defined earlier.  In general, precision results were much better than accuracy results, although there is ample room for improvement.  Best precision was obtained for Ca in all three samples (Table 3, Table 4, and Table 5).  For elements of high abundance in otoliths (Na, K and Sr), %CVs were generally good (< 10%).  However, for elements of low (Mg) or trace concentrations in otoliths (Li, Cr, Mn, Co, Ni, Cu, Zn, As, Rb, Cd, Ba and Pb), %CVs ranged from 1-2% to almost 200% in the two powder samples (%CVs were lower in the SRMSOLN sample).  For Mg and Ba, two elements important in otolith research, precision was generally not good for SRM915a (containing trace levels of these elements), but was much improved for RMRS1, an otolith powder which contains higher levels.  This result, too, is encouraging and suggests that otolith analysts generally achieve acceptable precision in measuring Mg and Ba in otolith samples.

Finally, regarding comparability of results, the main objective of this exercise was to provide the participants with a basis for direct comparison of their results with those of other laboratories.  This can be accomplished by simple inspection of the data tables and plots.  This inspection reveals that in many cases, there was considerable agreement among laboratories in their measurement results.

An attempt was also made to summarize the extent of agreement among participating laboratories in their measurement and precision results in a concise form, so z-scores and p-scores were used. 

Z-scores are commonly used as an indicator of accuracy of results.  However, their "goodness" as an indicator depends on several factors:

a) well characterized samples of known composition are analyzed
b) a "true" or statistically derived consensus value is used for X
c) the target value used for S is meaningfully related to X - typically X/10 for trace level elements and X/20 for percent level elements, or S is related to a confidence interval around X
d) the exercise produces a well behaved data set - one where the vast majority of laboratories submit quantitative results of high accuracy and precision

Under these circumstances, z-scores will scatter around zero with most values ranging between -2 and +2, and they will be good indicators of accuracy.

In this exercise, z-scores are not good indicators of accuracy.  Other measures show that most results submitted by participants were not very accurate or precise, yet most z-scores ranged between -2 and +2.  This is because the value used for X was the overall mean (of all Quantitative data, including obvious outliers) and the value used for S was the overall standard deviation of those same data.  As stated earlier, no attempt was made to assign consensus or "correct" values to any analyte.  The main consequence of this was that there was no objective rationale for excluding any data from consideration, even obvious outliers.  The inclusion of outliers in X and S caused their values to be excessively large and the values of the z-scores to be lower than they otherwise would have been (in the presence of large outliers, S increases much more rapidly than X).  Consequently, the z-scores computed here are deceptively low. 
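The effect described above can be illustrated numerically.  The sketch below is a simplification: it treats each value as a single laboratory result rather than pooling replicates, the nine values are made up, and the "true" value of 1.0 is an assumption introduced only for the comparison.  It contrasts z-scores computed from the data's own mean and standard deviation with z-scores computed against a fixed target of S = X/10, the typical trace-level choice mentioned among the conditions listed earlier.

```python
from statistics import mean, stdev

# Hypothetical results: eight laboratories report 1.0, one reports 100.0.
results = [1.0] * 8 + [100.0]

# Scores as in this exercise: X and S both come from the data, outlier included.
X_overall = mean(results)    # pulled upward by the outlier
S_overall = stdev(results)   # inflated far more than the mean
z_internal = [(x - X_overall) / S_overall for x in results]

# Scores against an assumed true value with target S = X / 10.
X_true = 1.0
S_target = X_true / 10.0
z_external = [(x - X_true) / S_target for x in results]

outlier_internal = z_internal[-1]
outlier_external = z_external[-1]
```

With these values the outlier, which is 100 times larger than every other result, receives an internal z-score of only about 2.7, while the same result scores 990 against the fixed target.  When X and S are the mean and sample standard deviation of the same n values, no z-score can exceed (n-1)/sqrt(n) in magnitude, which is 8/3 for n = 9.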

Nevertheless (and accuracy notwithstanding), z-scores can be used to show the extent of agreement among laboratories by computing "agreement intervals."  As already mentioned, there was only moderate agreement among participants.  In only 29 of 41 instances were 80% of the laboratories in agreement in the measurement of specific elements (Table 7), and in only five of 41 instances were 90% of the laboratories in agreement.  For elements important in otolith studies (Mg, Ca, Sr and Ba), agreement among laboratories averaged 81%.  In a mature area of analytical chemistry, one where the great majority of practitioners are highly experienced in the subject analyses, one would expect 90% agreement among laboratories much more often than in five of 41 instances.  With respect to individual laboratory performance (Table 8), there were 29 instances out of 45 where 80% of measurement results submitted by an individual laboratory agreed with results submitted by all laboratories, and there were 23 instances out of 45 where 90% of measurement results submitted by an individual laboratory agreed with results submitted by all laboratories.  This suggests that a laboratory either was in agreement with the other laboratories, or it was not.  For example, results submitted by Labs 2 and 10 were generally not in agreement with results submitted by the other laboratories.  Labs 2 and 10 conducted solid-state analyses using microprobe methods.  Thus, not surprisingly, there appear to be significant differences between microprobe and dissolution methods.
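The kind of tally reported above can be sketched as follows.  The sketch assumes, purely for illustration, that a laboratory is "in agreement" for an element when its z-score lies within -2 to +2; the agreement intervals used in this report may be defined differently.  The z-score table and the names percent_agreement and limit are hypothetical.

```python
def percent_agreement(z_values, limit=2.0):
    """Percentage of z-scores whose absolute value is within the limit."""
    within = sum(1 for z in z_values if abs(z) <= limit)
    return 100.0 * within / len(z_values)


# Hypothetical z-scores for ten laboratories, keyed by element.
z_by_element = {
    "Mg": [0.3, -0.5, 1.1, -1.8, 0.2, 2.6, -0.4, 0.9, -2.3, 0.1],
    "Sr": [0.1, -0.2, 0.4, -0.9, 1.5, 0.6, -0.3, 2.4, 0.0, -1.1],
    "Ba": [-0.7, 0.8, 0.2, -0.1, 1.9, -1.2, 0.5, 0.3, -0.6, 1.0],
}

# Percent of laboratories in agreement for each element (assumed |z| <= 2).
agreement = {el: percent_agreement(z) for el, z in z_by_element.items()}

# Count the elements ("instances") reaching the 80% and 90% levels.
at_least_80 = sum(1 for pct in agreement.values() if pct >= 80)
at_least_90 = sum(1 for pct in agreement.values() if pct >= 90)

print(agreement)
print(f"{at_least_80} of {len(agreement)} instances at 80% agreement or better")
print(f"{at_least_90} of {len(agreement)} instances at 90% agreement or better")
```

The same function could be applied across elements for a single laboratory to produce a per-laboratory summary, again under the assumed threshold.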

Using the same approach for precision, p-scores were used to summarize the results.  As with z-scores, the "goodness" of p-scores as indicators of precision depends on the same factors as described above.  And, as above, those conditions were not met in this exercise, so p-scores, too, are deceptively low.  However, "agreement intervals" can be used to show the extent of agreement between laboratories.  As already mentioned, on the whole, the extent of precision agreement between laboratories was greater than that observed for measurement agreement.  In 36 of 41 instances, 80% of the laboratories were in agreement, although in only five of 41 instances were 90% of the laboratories in agreement (Table 9).  For elements important in otolith studies (Mg, Ca, Sr and Ba), agreement among laboratories averaged 86%.  The incidence of 90% agreement will most likely improve as this area of measurement matures.  Regarding individual laboratory performance (Table 10), there were 34 instances out of 40 where 80% of precision results submitted by an individual laboratory agreed with results submitted by all laboratories, and there were 25 instances out of 40 where 90% of precision results submitted by an individual laboratory agreed with results submitted by all laboratories.  As above, this suggests that a laboratory either was in agreement with the other laboratories, or it was not.  For example, results submitted by Lab 1 were generally not in agreement with results submitted by the other laboratories.


CONCLUSIONS

The main objective of this exercise was to provide participants with a basis for direct comparison of their results with those of other laboratories.  On that basis, the exercise was a success.  Data tables and plots contained in this report can be used to achieve that end.  Another valuable goal would have been to evaluate the accuracy and precision of the participants' results and provide summaries of these characteristics.  In these areas, the exercise was, at best, only partially successful.  Use of samples of wholly or largely unknown composition, combined with the lack of consensus values for the analytes, severely hindered the evaluation of accuracy.  For those analytes for which concentration values were available, accuracy was generally poor.  Precision, in contrast, could be assessed for all the analytes.  Results, however, were only slightly better.  Levels of agreement among laboratories observed in this exercise spanned a fairly broad range.  This is normally not considered a good situation, since comparability of results is essential if different studies are to be compared.  Thus, there is considerable room for improvement in this area of analytical chemistry.

Results of this exercise reflect the fact that this is not a mature area of analytical chemistry, one where the great majority of practitioners are highly experienced in the subject analyses.  Currently, the number of analysts in this field is relatively small, partially accounting for the small number of participants in this exercise.  However, that number is growing, and as it increases, so will the need for methods of assessing the accuracy, precision and comparability of data generated at different laboratories using different analytical methods.  Intercomparison exercises will partially fill that need, but acceptable control over the quality of otolith analyses will not be achieved without the use of suitable otolith CRMs.  One such CRM now exists and others surely will follow.  However, more intercomparison exercises are also needed that employ well-characterized samples of known composition, so that "true" or consensus values for analytes of interest are available, allowing meaningful assessments of accuracy and precision to be made.


REFERENCES

Taylor, J.K.  Handbook for SRM Users; NIST Special Publication 260-100; National Institute of Standards and Technology: Gaithersburg, MD, 1985.

Yoshinaga, J., A. Nakama, M. Morita and J. S. Edmonds.  Fish otolith reference material for quality assurance of chemical analyses.  Mar. Chem., 2000, 69:91-97.

Zar, J.H.  Biostatistical Analysis, 2nd Ed.; Prentice Hall: Englewood Cliffs, NJ, 1984.

Zdanowicz, V.S.  James J. Howard Marine Sciences Laboratory, Highlands, NJ.  Unpublished data on the elemental composition of otoliths of fish from four pelagic species (blackfin tuna, bluefin tuna, bluefish, and cod).

Zeisler, R., J.K. Langland and S.H. Harrison.  Cryogenic homogenization of biological tissues.  Anal. Chem., 1983, 55:2431-2434.