Northeast Fisheries Science Center Reference Document 01-13
Elemental Composition of Fish Otoliths:
Results of a Laboratory Intercomparison Exercise
by Vincent S. Zdanowicz
National Marine Fisheries Serv., 74 Magruder Rd., Highlands, NJ 07732
Current Address: U.S. Customs Serv., 7501 Boston Blvd., Ste. 113, Springfield, VA 22153
Print publication date: September 2001; web version posted December 10, 2001
Citation: Zdanowicz, V.S. 2001. Elemental composition of fish otoliths: results of a laboratory intercomparison
exercise. Northeast Fish. Sci. Cent. Ref. Doc. 01-13; 92 p.
INTRODUCTION
Within the past decade, otolith elemental analysis has been increasingly
used in studies of fishery biology. Based on the premise that
differences in habitat chemistry are manifested in otoliths as differences
in chemical composition, otolith elemental analysis has been used
to address important questions in fishery research. However, this
application has been hindered by the lack of an adequate standard
for analytical quality control. As a newly emerging research technique,
otolith elemental analysis requires further standardization between
laboratories in order to optimize its usefulness.
Standardization can be achieved through the use of Certified Reference
Materials (CRMs) and participation in intercomparison exercises. A
CRM is a substance, one or more of whose properties is sufficiently
well characterized to be used for the calibration of an apparatus or
the assessment of a measurement method. Results of CRM analyses can
be used to validate a laboratory’s analytical results. They also provide
a basis for comparing data generated at different laboratories or using
different analytical methods. At the time of this exercise, no otolith
CRMs were available, although one has since been developed through
a collaboration between the Western Australian Marine Research Laboratories
and Japan’s National Institute for Environmental Studies (Yoshinaga
et al., 2000).
With the appearance of an increasing number of published studies and
the absence of a suitable means of assessing the quality of their results,
a laboratory intercomparison exercise was conducted in 1999 in an attempt
to benchmark the status of otolith elemental analyses being conducted
by investigators in the otolith research community. In the absence
of a suitable otolith CRM, participation in intercomparison exercises
can provide participants with a basis for direct comparison of their
results with those of other laboratories.
SAMPLE DESCRIPTION
Three samples were used for the exercise: RMRS1, a fish otolith powder;
SRM915a, a powdered, high purity calcium carbonate Standard Reference
Material (SRM) produced by the National Institute of Standards and
Technology (NIST); and SRMSOLN, a solution containing SRM915a dissolved
in 1% nitric acid (1 mg CaCO3/ml), spiked in the low ng/ml
range with a mixture of metals.
RMRS1 was composed of ground, sieved, homogenized otoliths of red
snapper (Lutjanus campechanus) from the Gulf of Mexico. Its
true elemental composition was unknown. The otoliths were initially
ground at NIST using their cryogenic homogenization technique (Zeisler
et al., 1983). This technique yields a powder containing particles
of a range of sizes, including relatively large particles, so coarser
fractions were ground (acid-rinsed, agate mortar and pestle) at the
Howard Laboratory until all the powder passed through a 100 mesh screen
(acid-rinsed, nylon). Enough material was obtained to produce 36 bottles
of powder, each containing 450-500 mg. Homogeneity was assessed by
analyzing three replicate samples from every sixth bottle, including
the first and last, and comparing the concentrations of eight elements
(Li, Na, Mg, K, Ca, Mn, Sr and Ba) using the analysis of variance (ANOVA)
(Zar, 1984). No statistical differences in composition were found
between bottles, so the powder was judged to be homogeneous.
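The homogeneity check described above can be illustrated with a one-way ANOVA F statistic, treating each bottle as a group of replicate measurements. The sketch below is a minimal pure-Python version; the function name and the replicate values are made up for illustration and are not data from this exercise.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: list of lists, one list of replicate values per bottle.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the bottle means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own bottle mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical values: three bottles, three replicates each, one element.
bottles = [[10.0, 11.0, 12.0], [11.0, 12.0, 13.0], [9.0, 10.0, 11.0]]
f_stat, df_b, df_w = one_way_anova_f(bottles)
```

The resulting F would then be compared with the critical value of the F distribution for (df_between, df_within) degrees of freedom at the chosen significance level; an F below that value is consistent with no detectable difference between bottles.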
SRM915a was provided as an analytical reference material. Although
it is not certified for trace metal content, some data on trace metal
levels are provided in its Certificate of Analysis. Also, it is very
pure, so it can be used to provide a clean matrix match for calibration
standards/spikes, or possibly as a calibration standard for probe analyses.
SRMSOLN was intended to provide information on interlaboratory variability
due to sample preparation methods. It was spiked with measurable levels
of a suite of metals, some of whose concentrations in the powdered
samples were expected to be below the detection limits of some laboratories. It
was thought that laser laboratories that analyze solutions might also
find it useful.
DETAILS OF THE EXERCISE
The exercise was performance-based. No particular sample preparation
or instrumental procedure was specified. Participants were to use
their routine procedures, including solid state (probe) methods, total
dissolution techniques, isotope dilution, atomic absorption/emission,
and others. Participants were invited to use more than one procedure
and report more than one set of results.
Each participant was sent approximately 500 mg of RMRS1, 500 mg of
SRM915a and 30 ml of the SRMSOLN. A copy of the Certificate of Analysis
for SRM915a was included. Participants were asked to prepare five
replicates of each material for analysis using their routine sample
preparation method(s), to analyze the prepared samples using their
routine analytical procedure(s), and to measure as many, or as few,
elements / isotopes as they wished. For the powders, they were asked
to calculate dry weight concentrations for each element using their
routine procedure. For the solution, they were asked to report results
in ng/ml and not to perform a blank correction.
A formatted spreadsheet was provided for reporting the analytical
results and the details of laboratory procedures, including sample
preparation methods, calibration methods, etc. Analytical results
were compiled at the Howard Laboratory and each set of results was
assigned a unique identification number (Lab #). When all results
had been received, each participant was sent a listing of raw data
in order to verify that no transcription errors were made in the compilation
of the data.
The true composition of the otolith powder was unknown, and no attempt
was made to assign consensus or “correct” values to any analyte. The
purpose of the exercise was to provide a basis for direct comparison
of results among laboratories.
METHODS
Two data sets were used for this report. The Raw (unedited) data
set contains all verified, final data, including LTVs ("Less Than" Values). The
Raw data set was used only to construct the tables in Appendix
B. The Quantitative (edited) data set contains all verified, final
data, not including LTVs. The Quantitative data set was used for summary
statistics appearing in the tables in Appendix B, for data plots in
Appendix C, and for all computations and statistical evaluations and
any tables resulting from them (text Tables 1-10 and Appendix D Tables D1 through
D6).
Tables
Replicate data for 28 elements are compiled in the tables in Appendix
B. Data are listed as received with respect to significant figures,
or using four decimal places, whichever was fewer. Also shown are
the mean, standard deviation and the %CV (coefficient of variation,
computed as 100 x laboratory standard deviation / laboratory mean). For
SRM915a, values listed as "Ref" are from the Certificate
of Analysis. For SRMSOLN, values listed as "Actual" are
the total concentrations present in the solution. For example, Mg
is present in SRM915a at 1 ug/g dry weight. SRMSOLN contains 1 mg
of the SRM per ml of solution. This contributes 1 ng Mg per ml of
solution to the total solution concentration. Another 3.5 ng Mg per
ml was contributed by the spike. Thus, the "Actual" concentration
was 4.5 ng Mg per ml solution. For elements present in the spike solution
but not listed on the Certificate of Analysis, the amount contributed
by the SRM was assumed to be zero.
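The arithmetic behind the "Actual" values can be written out as a short sketch. The function names are illustrative only; the Mg numbers are those given in the example above.

```python
def srm_contribution_ng_per_ml(conc_ug_per_g, srm_mg_per_ml=1.0):
    """Analyte contributed by dissolved SRM915a, in ng per ml of solution.

    (ug analyte / g SRM) x (mg SRM / ml) = ng analyte / ml,
    because 1 ug/g is 1e-6 by mass and 1e-6 mg is 1 ng.
    """
    return conc_ug_per_g * srm_mg_per_ml

def actual_ng_per_ml(spike_ng_per_ml, conc_ug_per_g=0.0, srm_mg_per_ml=1.0):
    """Total ("Actual") concentration: SRM contribution plus spike.

    conc_ug_per_g defaults to 0.0, matching the assumption made for
    elements not listed on the Certificate of Analysis.
    """
    return srm_contribution_ng_per_ml(conc_ug_per_g, srm_mg_per_ml) + spike_ng_per_ml

# Mg: 1 ug/g in SRM915a, 1 mg SRM per ml, 3.5 ng/ml from the spike.
mg_actual = actual_ng_per_ml(spike_ng_per_ml=3.5, conc_ug_per_g=1.0)
print(mg_actual)  # 4.5
```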
Figures
Replicate data for 18 elements are plotted versus Lab # in the data
plots in Appendix C. Elements with fewer
than four sets of results for at least one sample were not plotted.
Z-scores and P-scores
Z-scores and p-scores were computed for each laboratory for each element
in each sample (Appendix D).
Z-scores were calculated as
z = (xL - X) / S
where xL = the laboratory mean, X = the accepted value
and S = the target value for the standard deviation.
P-scores were calculated as
p = sL / S
where sL = the laboratory standard deviation and S = the
target value for the standard deviation.
For this exercise, the value used for X was the overall mean for the
element in the sample and the value used for S was the overall standard
deviation for the element in the sample. The term "overall" indicates
that all data (from the Quantitative data set) submitted by all participants
were used.
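The two scores defined above can be sketched as follows. Two details are assumptions, since the text does not state them: the overall mean and standard deviation are computed by pooling every replicate value from every laboratory, and the sample (n-1) standard deviation is used throughout. The function name and the input values are hypothetical.

```python
from statistics import mean, stdev

def z_and_p_scores(replicates_by_lab):
    """Return {lab: (z, p)} for one element in one sample.

    replicates_by_lab: dict mapping a lab id to its list of replicate
    values (Quantitative data only, i.e. no "less than" values).
    """
    # Assumption: "overall" means all replicates from all labs, pooled.
    pooled = [v for reps in replicates_by_lab.values() for v in reps]
    X = mean(pooled)    # value used for the accepted value
    S = stdev(pooled)   # value used for the target standard deviation
    scores = {}
    for lab, reps in replicates_by_lab.items():
        x_l = mean(reps)    # laboratory mean
        s_l = stdev(reps)   # laboratory standard deviation
        scores[lab] = ((x_l - X) / S, s_l / S)
    return scores

# Hypothetical replicate values for two labs.
scores = z_and_p_scores({"A": [1.0, 2.0, 3.0], "B": [3.0, 4.0, 5.0]})
```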
RESULTS
Samples were distributed in January 1999 to 29 laboratories in ten
countries, listed in Appendix A. Results
were due by July 1999. Sixteen sets of results were received from
fourteen laboratories from eight countries. Data were received for
28 elements. Results are compiled and summarized in tables and plots
in Appendices B and C. Z-scores and p-scores are given in Appendix
D.
Methods used and isotopes measured by the participants are listed
in Appendix E. Two laboratories conducted
solid state analyses by microprobe; they did not analyze the SRMSOLN
sample. Fourteen laboratories dissolved the powders; they also analyzed
the SRMSOLN sample. In almost every case, the dissolution procedure
was performed using nitric acid in an open vessel at room temperature. Most
laboratories used quadrupole inductively coupled plasma mass spectrometry
(QICPMS) for the solution analyses; one used High Resolution ICPMS. Atomic
absorption spectrophotometry (AAS) was next most frequently used, followed
by ICPAES (atomic emission spectrometry).
After excluding LTVs and elements with fewer than five sets of results,
11 elements remained as candidates for evaluation for SRM915a, 15 for
the RMRS1 and SRMSOLN samples. Three factors greatly influenced the
evaluation of the results: 1) few certified or reference data were
available for SRM 915a; 2) the composition of RMRS1 was unknown; and
3) no attempt was made to derive consensus or “correct” values for
any analyte in any sample. Nevertheless, an attempt was made to assess
the accuracy and precision of the results and the extent of agreement
between laboratories.
Accuracy
Certified or informational values are available for six elements in
SRM915a. In general, informational values differ from certified values
in that they have not been determined by two independent methods, nor
subjected to rigorous statistical evaluation. An informational value
is a "value of a property, not certified but provided because
it is believed to be reliable and to provide information important
to the certified material" (Taylor, 1985). Quantitative results
for sodium were submitted by only one laboratory, so accuracy was evaluated
based on the five elements listed below.
Mg - 1.0 ug/g (informational value)
Ca - 40.0 % (certified value)
Mn - 0.6 ug/g (informational value)
Cu - 0.95 ug/g (average of 0.9 ug/g and 1 ug/g informational values)
Sr - 2.1 ug/g (informational value)
The measure used was % Recovery, computed as 100 x laboratory mean
/ certificate value. State-of-the-art accuracy in trace element analysis
requires recoveries within 20% of the true concentration at analyte
levels less than 1 ppm, within 10% of the true concentration at analyte
levels greater than 1 ppm and within 5% of the true concentration at
analyte levels in the percent range (greater than 1000 ppm). In this
report, three cut-off levels were used to categorize results: ±10%, ±20%
and > 150%, corresponding to "good," "acceptable," and "poor" or "unacceptable" accuracy.
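A minimal sketch of the % Recovery measure and the three cut-off levels follows. The function names are illustrative. Treating the ±10% and ±20% limits as inclusive is an assumption, as is the "uncategorized" label for recoveries that fall outside ±20% without exceeding 150%, a range the cut-offs above do not name.

```python
def percent_recovery(lab_mean, certificate_value):
    """% Recovery = 100 x laboratory mean / certificate value."""
    return 100.0 * lab_mean / certificate_value

def accuracy_category(recovery_pct):
    """Map a % Recovery onto the report's three cut-off levels."""
    deviation = abs(recovery_pct - 100.0)
    if deviation <= 10.0:        # within 10% of the certificate value
        return "good"
    if deviation <= 20.0:        # within 20% of the certificate value
        return "acceptable"
    if recovery_pct > 150.0:     # more than 150% of the certificate value
        return "poor"
    # Assumption: the text does not label this range.
    return "uncategorized"

# A recovery of 147% is outside +/-20% but not above 150%.
print(accuracy_category(147.0))  # uncategorized
```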
Of the five elements listed above, good accuracy was achieved by all
the laboratories only for Ca, present in the SRM at 40.0% (400,000
ppm, Table 1). All the laboratories (nine
of nine) had recoveries within 10% of the certificate value. Results
for the other four elements were generally poor. For Mg, present at
1 ug/g, only one (of nine) laboratory had a recovery within 10% of
the certificate value; eight of nine recoveries were >150% of the
certificate value. For Mn (0.6 ug/g) and Cu (0.95 ug/g), only one
laboratory had a recovery within 10% of the certificate value; three
had recoveries within 20% of the certificate value, but recoveries
for three laboratories were >150% of the certificate value. For
Sr (2.1 ug/g), the best recovery was 147%; seven of eight laboratories
had recoveries >150% of the certificate value.
Recovery results for the SRMSOLN sample are summarized in Table
2. Again, the best accuracy was achieved only for Ca, present
in the sample at 400,000 ng/ml. All the laboratories (10 of 10)
had recoveries within 20% of the actual value, while eight of 10
recoveries were within 10% of the actual value. Results for the
other four elements were slightly better than for the SRM, but still
poor. For Mg, present at 4.5 ng/ml, no recoveries were within 10%
of the actual value, while three (of 11) were within 20% of the actual
value, and only four of 11 were >150% of the actual value. For
Mn (4.1 ng/ml), more than half the laboratories (seven of 13) had
recoveries within 10% of the actual value, and only three recoveries
were >150% of the actual value. For Cu (4.4 ng/ml), three laboratories
had recoveries within 10% of the actual value; five had recoveries
within 20% of the actual value, and only two laboratories had recoveries >150%
of the actual value. For Sr (5.6 ng/ml), one laboratory had a recovery
within 10% of the actual value; two had recoveries within 20% of
the actual value, and three laboratories had recoveries >150%
of the actual value. Thus, for the SRMSOLN sample, there was a slight
improvement over SRM915a recoveries.
Accuracy was not assessed for the RMRS1 sample, since its composition
was unknown and consensus values were not derived for any element in
the sample.
Although most laboratories used QICPMS to measure elemental concentrations,
other techniques were also used. However, accuracy does not appear
to be related to the instrumental methods used, nor to the specific
isotopes measured. For example, best recoveries were obtained for
Ca in the SRM915a and SRMSOLN samples. Ca was measured using ICPAES,
Flame AA, and ICPMS (five different isotopes) with equally good results. These
good recoveries were most likely related to the high concentrations
of Ca in these samples, resulting in greater ease of measurement. Worst
recoveries, in general, were obtained for Mg, which was also measured
by ICPAES, Graphite Furnace AA, and ICPMS (three different isotopes). Recoveries
in the SRMSOLN sample were better than in the SRM915a sample, but again
this was most likely due to higher concentrations of Mg in the SRMSOLN
sample.
Precision
Analytical precision was assessed based on intralaboratory %CV. For
each sample, elements with five or more sets of data were evaluated. Thus,
11 elements were evaluated for SRM915a and 15 for the RMRS1 and SRMSOLN
samples. Cut-off levels used to categorize results were 10%, 20% and
50%, corresponding to "good," "acceptable," and "poor" or "unacceptable" precision.
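The intralaboratory %CV and its cut-off levels can be sketched in the same way. The sample (n-1) standard deviation is assumed, since the text does not say which form was used; the function names, the inclusive limits, the "uncategorized" label for CVs between 20% and 50%, and the replicate values are likewise illustrative.

```python
from statistics import mean, stdev

def percent_cv(replicates):
    """%CV = 100 x laboratory standard deviation / laboratory mean."""
    return 100.0 * stdev(replicates) / mean(replicates)

def precision_category(cv_pct):
    """Map a %CV onto the 10%, 20% and 50% cut-off levels."""
    if cv_pct <= 10.0:
        return "good"
    if cv_pct <= 20.0:
        return "acceptable"
    if cv_pct > 50.0:
        return "poor"
    # Assumption: CVs between 20% and 50% are not named in the text.
    return "uncategorized"

# Hypothetical replicate values: mean 10, standard deviation 1.
cv = percent_cv([9.0, 10.0, 11.0])
category = precision_category(cv)
```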
Table 3 shows the precision results for SRM915a. Best
results were obtained for Ca; all nine laboratories had CVs of 10%
or less. This is not unexpected, however, since precision usually
improves with concentration. If the sample weights reported in Appendix
E were used for SRM915a as well as for RMRS1, Ca concentrations
in solution would be more than high enough to promote good precision. This
would not be true for other elements, however. Elements present
in SRM915a at ug/g levels would be present in solutions prepared
from SRM915a at ng/ml concentrations, and precision would not be expected
to be as high as for Ca. This, in fact, was observed. Nevertheless,
for Mg, Mn, Cu and Sr (elements for which accuracy was assessed), precision
results were much better than accuracy results - that is, most laboratories
had CVs < 20%, and for Cu and Sr most CVs were < 10%. The worst
precision for this group was for Mg, where two laboratories had CVs > 50%. Precision
results for Co, Ni, Zn, Ba and Pb were comparable to results for the "accuracy
group"; half the laboratories (for Co and Ni, almost all the laboratories)
had CVs < 20% and many laboratories had CVs < 10%. Ba was the
worst of this group; three laboratories had CVs > 50%. The worst
results were for Cr. Of four laboratories, only one had a CV < 20%;
one laboratory had a CV > 50%.
Results for SRMSOLN are given in Table 4. Best
precision was again obtained for Ca (all laboratories had CVs < 10%),
but comparably high precision was also achieved for Mg, Mn, Cu and
Sr (the "accuracy group"), as well as for Li, Co, Zn, Cd,
Ba and Pb. For Cr, Ni, As and Rb, most laboratories had CVs < 20%,
and for As and Rb, one laboratory had CVs > 50%. Thus, compared
to SRM915a, there was substantial improvement in the precision of measurements
of the SRMSOLN sample. This is interesting, considering SRMSOLN was
prepared from SRM915a. However, this improvement can likely be attributed
to two factors. First, because this sample was spiked, the concentrations
of many of these elements were much higher (relatively) in the SRMSOLN
sample than in solutions of SRM915a prepared by the participants, promoting
better precision. Second, the SRMSOLN sample was prepared as a single
large sample by the organizer and aliquots were sent to the participants. Thus,
the variability caused by sample preparation procedures was removed
from the measurements. A similar result was observed in an intercomparison
exercise conducted by NOAA and refereed by NRC Canada in the mid 1980s.
Precision results for RMRS1 are given in Table
5. Results for the “accuracy group” of elements (Mg, Ca, Mn,
Cu and Sr) were mixed. For Ca and Sr, all laboratories had CVs < 20%,
with all laboratories except one within 10%. For Mg and Mn, all
laboratories except two had CVs < 20%, but for Mn only five of
12 laboratories had CVs < 10%; for Mg, eight of 11 laboratories
had CVs < 10%, but one laboratory had a CV > 50%. And for
Cu, only four of seven laboratories had CVs < 20%, while only
two of seven were < 10%. Results for the other elements were
also mixed. For Na, K and Ba almost all laboratories had CVs < 20%;
most were < 10%. For Li, Co and Ni most laboratories had CVs < 20%,
although for Co one laboratory had a CV > 50%. And for Cr, Zn,
Rb and Pb few laboratories had CVs < 20%; several CVs were > 50%. These
results are also consistent with the trend toward better precision
with increasing concentration. In fish otoliths, elemental concentrations
may vary with species and geographic location, but certain elements
consistently appear to occur in high abundance, while others occur
at low abundance (Table 6, Zdanowicz, unpublished
data). Thus, higher precision was generally obtained for the more
abundant elements (Na, Mg, K, Ca and Sr) than for those occurring
in otoliths at low ug/g levels.
As above, precision does not appear to be related to the methods used. A
more significant result is the increase in precision obtained in the
measurement of the solution sample relative to the precision in the
measurement of the powder samples.
Agreement Among Laboratories
In order to gauge the extent of agreement between laboratories, two
measures were used, z-scores and p-scores, listed in Appendix
D. Z-scores are related to accuracy; a z-score is the number
of standard deviations by which a laboratory's mean value differs from
some accepted value. P-scores are related to precision; a p-score is a
laboratory's standard deviation expressed as a multiple of some accepted
value. Results were assessed by element and by laboratory.
Z-Scores
Z-scores were calculated for each element in each sample for each
laboratory. Then, for each element in each sample, means (AVZ) and
standard deviations (SDZ) of the z-scores were computed and an interval
calculated which ranged from (AVZ-SDZ) to (AVZ+SDZ). Z-scores that
fell within those intervals were designated "in agreement."
SUMMARY BY ELEMENT (Table 7). For each sample,
by comparing z-scores for an element, agreement among laboratories
in the measurement of that element could be assessed. It was expressed
as Percentage of Laboratories in Measurement Agreement (%LMA), computed
as 100 x number of laboratories with z-scores in agreement / the total
number of laboratories for which z-scores were calculated. For example,
z-scores were calculated for Mg in SRM915a for nine laboratories. Z-scores
for eight laboratories were in agreement, as defined above. Thus,
for that sample, %LMA was 89% (Mg results were in agreement 89% of
the time).
- for the SRM915a sample, z-scores were computed for 11 elements. %LMA
ranged from 75% to 92%, and was 80% or higher for eight elements
and 90% or higher for two elements.
- for the SRMSOLN sample, z-scores were computed for 15 elements. %LMA
ranged from 50% to 92%, and was 80% or higher for nine elements and
90% or higher for three elements.
- for the RMRS1 sample, z-scores were computed for 15 elements. %LMA
ranged from 71% to 89%, and was 80% or higher for 12 elements and
90% or higher for 0 elements.
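The agreement interval and the %LMA calculation can be sketched as below. The z-scores are hypothetical values chosen so that eight of nine fall inside the interval, mirroring the Mg example; treating the interval limits as inclusive and using the sample (n-1) standard deviation for SDZ are assumptions.

```python
from statistics import mean, stdev

def agreement_flags(scores):
    """Flag each score that lies within (AV - SD) to (AV + SD)."""
    av = mean(scores)
    sd = stdev(scores)
    low, high = av - sd, av + sd
    return [low <= s <= high for s in scores]

def percent_in_agreement(flags):
    """100 x number of scores in agreement / number of scores."""
    return 100.0 * sum(flags) / len(flags)

# Hypothetical z-scores for one element in one sample, nine labs.
z_scores = [-0.3, -0.2, -0.1, 0.0, 0.0, 0.1, 0.1, 0.2, 2.5]
flags = agreement_flags(z_scores)
lma = percent_in_agreement(flags)
print(sum(flags), round(lma))  # 8 89
```

Applied to all laboratories' scores for one element, the percentage is %LMA; applied to one laboratory's flags across the elements it measured, it is the per-laboratory percentage.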
SUMMARY BY LABORATORY (Table 8). For each
sample, by examining z-scores for all elements measured by a laboratory,
measurement performance of individual laboratories could be assessed. It
was expressed as Percentage of Elements in Measurement Agreement (%EMA),
and computed as 100 x number of elements with z-scores in agreement
/ the total number of elements measured by that laboratory for which
z-scores were calculated. For example, Lab 1 measured 12 elements
in SRMSOLN. Z-scores for two of those elements were in agreement,
as defined above. Thus, for that sample, %EMA for Lab 1 was 17% (Lab
1 was in agreement with other laboratories 17% of the time).
- for the SRM915a sample, z-scores were computed for at least one
element for 15 laboratories. %EMA ranged from 0% to 100%, and was
80% or higher for nine laboratories, and 90% or higher for eight
laboratories.
- for the SRMSOLN sample, z-scores were computed for at least one
element for 14 laboratories. %EMA ranged from 17% to 100%, and was
80% or higher for 10 laboratories, and 90% or higher for six laboratories.
- for the RMRS1 sample, z-scores were computed for at least one element
for 16 laboratories. %EMA ranged from 20% to 100%, and was 80% or
higher for 10 laboratories, and 90% or higher for nine laboratories.
On the whole, the extent of measurement agreement among laboratories
was moderate. %LMA ranged from 50% to 92%, and was 80% or higher in
29 of 41 instances (71% of the time), and 90% or higher in five of
41 instances (12% of the time). %EMA ranged from 0 to 100%, and was
80% or higher in 29 of 45 instances (64% of the time), and 90% or higher
in 23 of 45 instances (51% of the time).
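The pooled figures in this paragraph follow directly from the per-sample summaries. The sketch below re-derives them; the counts are transcribed from the lists above, and the variable and function names are illustrative.

```python
# Per sample: (instances scored, count at >= 80%, count at >= 90%).
# For %LMA an instance is an element; for %EMA it is a laboratory.
LMA = {"SRM915a": (11, 8, 2), "SRMSOLN": (15, 9, 3), "RMRS1": (15, 12, 0)}
EMA = {"SRM915a": (15, 9, 8), "SRMSOLN": (14, 10, 6), "RMRS1": (16, 10, 9)}

def pool(summary):
    """Sum the per-sample counts and express them as percentages."""
    total = sum(n for n, _, _ in summary.values())
    ge80 = sum(a for _, a, _ in summary.values())
    ge90 = sum(b for _, _, b in summary.values())
    return {
        "instances": total,
        "ge80": ge80,
        "ge90": ge90,
        "pct_ge80": round(100 * ge80 / total),
        "pct_ge90": round(100 * ge90 / total),
    }

lma_pooled = pool(LMA)  # 29 of 41 (71%) and 5 of 41 (12%)
ema_pooled = pool(EMA)  # 29 of 45 (64%) and 23 of 45 (51%)
```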
P-Scores
As above, p-scores were calculated for each element in each sample
for each laboratory. Then, for each element in each sample, means
(AVP) and standard deviations (SDP) of the p-scores were computed and
an interval calculated which ranged from (AVP-SDP) to (AVP+SDP). P-scores
that fell within those intervals were designated "in agreement."
SUMMARY BY ELEMENT (Table 9). For each sample,
by comparing p-scores for an element, agreement among laboratories
in precision for that element could be assessed. It was expressed
as Percentage of Laboratories in Precision Agreement (%LPA), computed
as 100 x number of laboratories with p-scores in agreement / the total
number of laboratories for which p-scores were calculated. For example,
p-scores were calculated for Mg in SRM915a for seven laboratories. P-scores
for six laboratories were in agreement, as defined above. Thus, for
that sample, %LPA was 86% (precision of Mg results was in agreement
86% of the time).
- for the SRM915a sample, p-scores were computed for 11 elements. %LPA
ranged from 71% to 90%, and was 80% or higher for nine elements and
90% or higher for two elements.
- for the SRMSOLN sample, p-scores were computed for 15 elements. %LPA
ranged from 75% to 90%, and was 80% or higher for 12 elements
and 90% or higher for one element.
- for the RMRS1 sample, p-scores were computed for 15 elements. %LPA
ranged from 80% to 92%, and was 80% or higher for 15 elements
and 90% or higher for two elements.
SUMMARY BY LABORATORY (Table 10). For each
sample, by examining p-scores for all elements measured by a laboratory,
precision performance of individual laboratories could be assessed. It
was expressed as Percentage of Elements in Precision Agreement (%EPA),
and computed as 100 x number of elements with p-scores in agreement
/ the total number of elements measured by that laboratory for which
p-scores were calculated. For example, Lab 1 measured 12 elements
in SRMSOLN. P-scores for three of those elements were in agreement,
as defined above. Thus, for that sample, %EPA for Lab 1 was 25% (Lab
1 was in agreement with other laboratories 25% of the time).
- for the SRM915a sample, p-scores were computed for at least one
element for 14 laboratories. %EPA ranged from 0% to 100%, and was
80% or higher for 12 laboratories, and 90% or higher for nine laboratories.
- for the SRMSOLN sample, p-scores were computed for at least one
element for 11 laboratories. %EPA ranged from 25% to 100%, and was
80% or higher for 10 laboratories, and 90% or higher for eight laboratories.
- for the RMRS1 sample, p-scores were computed for at least one element
for 15 laboratories. %EPA ranged from 25% to 100%, and was 80% or
higher for 12 laboratories, and 90% or higher for eight laboratories.
On the whole, the extent of precision agreement among laboratories
was greater than that observed for measurement agreement. %LPA ranged
from 71% to 92%, and was 80% or higher in 36 of 41 instances (88% of
the time), and 90% or higher in five of 41 instances (12% of the time). %EPA
ranged from 0% to 100%, and was 80% or higher in 34 of 40 instances
(85% of the time), and 90% or higher in 25 of 40 instances (63% of
the time).
DISCUSSION
Three questions are of particular importance to participants in an
intercomparison exercise. How accurate are my results? How good is
my precision? How do my results compare with results from other labs?
Accuracy can be evaluated using %Recovery, as defined earlier. In
this exercise, accuracy could be assessed for only five elements in
two of the three samples (Tables 1 and 2). Results
were not encouraging. Good accuracy was achieved only for Ca, the
major component of the samples. This result is reassuring, however,
since Ca is measured in many studies of otolith chemistry. Recoveries
for the other four elements (Mg, Mn, Cu, and Sr), were widely scattered
and generally poor.
SRM915a is not a particularly good reference material for otolith
analyses. First of all, levels of Mg and Sr, two important elements
in otolith studies, are much lower in SRM915a than in otoliths. Nevertheless,
good accuracy for Mg and Sr in SRM915a would indicate that otolith
analysts can measure these two elements at low levels in a high Ca
matrix, thus providing some suggestion that Mg and Sr measurements
in otoliths might be done properly. Second, SRM915a is not a good
matrix match for otoliths, which contain protein and other constituents
not present in SRM915a. Finally, SRM915a contains few trace elements. Consideration
was given to including limestone CRMs in the exercise, but they were
rejected because they contain alumino-silicate phases. Alumino-silicates
would have ensured the presence of more trace elements for use in gauging
accuracy, but would not have improved the matrix match. At the time
of this exercise, there was no reference material available that was
composed of otoliths, so SRM915a was, all things considered, the best
reference material available. However, now that the Japanese otolith
CRM is available, there will be little reason to use SRM915a as a reference
material for otolith analyses in the future.
One method of evaluating precision is by using %CV, as defined earlier. In
general, precision results were much better than accuracy results,
although there is ample room for improvement. Best precision was obtained
for Ca in all three samples (Table 3, Table 4, and Table 5). For
elements of high abundance in otoliths (Na, K and Sr), %CVs were generally
good (< 10%). However, for elements of low (Mg) or trace concentrations
in otoliths (Li, Cr, Mn, Co, Ni, Cu, Zn, As, Rb, Cd, Ba and Pb), %CVs
ranged from 1-2% to almost 200% in the two powder samples (%CVs were
lower in the SRMSOLN sample). For Mg and Ba, two elements important
in otolith research, precision was generally not good for SRM915a (containing
trace levels of these elements), but was much improved for RMRS1, an
otolith powder which contains higher levels. This result, too, is
encouraging and suggests that otolith analysts generally achieve acceptable
precision in measuring Mg and Ba in otolith samples.
Finally, regarding comparability of results, the main objective of
this exercise was to provide the participants with a basis for direct
comparison of their results with those of other laboratories. This
can be accomplished by simple inspection of the data tables and plots. This
inspection reveals that in many cases, there was considerable agreement
among laboratories in their measurement results.
An attempt was also made to summarize the extent of agreement among
participating laboratories in their measurement and precision results
in a concise form, so z-scores and p-scores were used.
Z-scores are commonly used as an indicator of accuracy of results. However,
their "goodness" as an indicator depends on several factors:
a) well characterized samples of known composition are analyzed
b) a "true" or statistically derived consensus value is used for
X
c) the target value used for S is meaningfully related to X - typically X/10
for trace level elements and X/20 for percent level elements, or S is related
to a confidence interval around X
d) the exercise produces a well behaved data set - one where the vast majority
of laboratories submit quantitative results of high accuracy and precision
Under these circumstances, z-scores will scatter around zero with
most values ranging between -2 and +2, and they will be good indicators
of accuracy.
In this exercise, z-scores are not good indicators of accuracy. Other
measures show that most results submitted by participants were not
very accurate or precise, yet most z-scores ranged between -2 and +2. This
is because the value used for X was the overall mean (of all Quantitative
data, including obvious outliers) and the value used for S was the
overall standard deviation of the same data. As stated earlier, no attempt
was made to assign consensus or "correct" values to any analyte. The
main consequence of this was that there was no objective rationale
for excluding any data from consideration, even obvious outliers. The
inclusion of outliers in X and S caused their values to be excessively
large and the values of the z-scores to be lower than they otherwise
would have been (in the presence of large outliers, S increases much
more rapidly than X). Consequently, the z-scores computed here are
deceptively low.
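The deflating effect of an outlier on z-scores can be illustrated with a small numerical sketch. The values are hypothetical, and for simplicity each laboratory contributes a single value, which is a simplification of the pooled-replicate calculation used in this report. With the sample (n-1) standard deviation, no z-score in a set of n values can exceed (n - 1) / sqrt(n) in magnitude, so a gross outlier in a small data set still scores below 2.

```python
from statistics import mean, stdev

def z_scores(values):
    """z = (value - overall mean) / overall standard deviation."""
    X = mean(values)
    S = stdev(values)
    return [(v - X) / S for v in values]

# Hypothetical lab results: four near 1.0 and one tenfold outlier.
lab_values = [0.9, 1.0, 1.0, 1.1, 10.0]
z = z_scores(lab_values)
max_abs_z = max(abs(s) for s in z)

# Upper limit on |z| when the outlier is included in X and S.
n = len(lab_values)
bound = (n - 1) / n ** 0.5

# The same outlier scored against the other four labs only.
others = lab_values[:-1]
z_if_excluded = (lab_values[-1] - mean(others)) / stdev(others)

print(round(max_abs_z, 2), round(bound, 2))
```

Scored against X and S computed without it, the same value would have a z-score above 100, which is why including outliers in X and S makes the reported z-scores look better than the underlying agreement.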
Nevertheless (and accuracy notwithstanding), by computing "agreement
intervals," they can be used to show the extent of agreement among
laboratories. As already mentioned, there was only moderate agreement
among participants. In only 29 of 41 instances were 80% of the laboratories
in agreement in the measurement of specific elements (Table
7), and in only five of 41 instances were 90% of the laboratories
in agreement. For elements important in otolith studies (Mg, Ca, Sr
and Ba), agreement among laboratories averaged 81%. In a mature area
of analytical chemistry, one where the great majority of practitioners
are highly experienced in the subject analyses, one would expect 90%
agreement among laboratories much more often than in five of 41 instances. With
respect to individual laboratory performance (Table 8), there were 29
instances out of 45 where 80% of measurement
results submitted by an individual laboratory agreed with results submitted
by all laboratories, and there were 23 instances out of 45 where 90%
of measurement results submitted by an individual laboratory agreed
with results submitted by all laboratories. This suggests that a laboratory
either was in agreement with the other laboratories, or it was not. For
example, results submitted by Labs 2 and 10 were generally not in agreement
with results submitted by the other laboratories. Labs 2 and 10 conducted
solid state analyses using microprobe methods. Thus, not surprisingly,
there appear to be significant differences between microprobe and dissolution
methods.
Using the same approach for precision, p-scores were used to summarize
the results. As with z-scores, how well p-scores serve as indicators
of precision depends on the factors described above. And, as above,
those conditions were not met in this exercise,
so p-scores, too, are deceptively low. However, "agreement intervals" can
be used to show the extent of agreement between laboratories. As already
mentioned, on the whole, the extent of precision agreement between
laboratories was greater than that observed for measurement agreement. In
36 of 41 instances, 80% of the laboratories were in agreement, although
in only five of 41 instances were 90% of the laboratories in agreement
(Table 9). For elements important in otolith
studies (Mg, Ca, Sr and Ba), agreement among laboratories averaged
86%. The incidence of 90% agreement will most likely improve as this
area of measurement matures. Regarding individual laboratory performance
(Table 10), there were 34 instances out of
40 where 80% of precision results submitted by an individual laboratory
agreed with results submitted by all laboratories, and there were 25
instances out of 40 where 90% of precision results submitted by an
individual laboratory agreed with results submitted by all laboratories. As
above, this suggests that a laboratory either was in agreement with
the other laboratories, or it was not. For example, results submitted
by Lab 1 were generally not in agreement with results submitted by
the other laboratories.
CONCLUSIONS
The main objective of this exercise was to provide participants with
a basis for direct comparison of their results with those of other
laboratories. On that basis, the exercise was a success. Data tables
and plots contained in this report can be used to achieve that end. Another
valuable goal would have been to evaluate the accuracy and precision
of the participants' results and provide summaries of these characteristics. In
these areas, the exercise was, at best, only partially successful. Use
of samples of wholly or largely unknown composition, combined with
the lack of consensus values for the analytes, severely hindered the
evaluation of accuracy. For those analytes for which concentration
values were available, accuracy was generally poor. Precision, in
contrast, could be assessed for all the analytes. Results, however,
were only slightly better. Levels of agreement among laboratories
observed in this exercise spanned a fairly broad range. This is normally
not considered a good situation, since comparability of results is
essential if different studies are to be compared. Thus, there is
considerable room for improvement in this area of analytical chemistry.
Results of this exercise reflect the fact that this is not a mature
area of analytical chemistry - one where the great majority of practitioners
are highly experienced in the subject analyses. Currently, the number
of analysts in this field is relatively small, partially accounting
for the small number of participants in this exercise. However, that
number is growing, and as it increases, so will the need for
methods of assessing the accuracy, precision and comparability of data
generated at different laboratories using different analytical methods. Intercomparison
exercises will partially fill that need, but acceptable control over
the quality of otolith analyses will not be achieved without the use
of suitable otolith CRMs. One such CRM now exists and others surely
will follow. However, more intercomparison exercises are also needed
that employ well characterized samples of known composition, so that "true" or
consensus values for analytes of interest are available, allowing meaningful
assessments of accuracy and precision to be made.
REFERENCES
Taylor, J.K. Handbook for SRM Users; NIST Special Publication 260-100;
National Institute of Standards and Technology: Gaithersburg, MD, 1985.
Yoshinaga, J., A. Nakama, M. Morita and J. S. Edmonds. Fish otolith
reference material for quality assurance of chemical analyses. Mar.
Chem., 2000, 69:91-97.
Zar, J.H. Biostatistical Analysis, 2nd Ed.; Prentice Hall: Englewood
Cliffs, NJ, 1984.
Zdanowicz, V.S. James J. Howard Marine Sciences Laboratory, Highlands,
NJ. Unpublished data on the elemental composition of otoliths of fish
from four pelagic species (blackfin tuna, bluefin tuna, bluefish, and
cod).
Zeisler, R., J.K. Langland and S.H. Harrison. Cryogenic homogenization
of biological tissues. Anal. Chem., 1983, 55:2431-2434.