Northeast Fisheries Science Center Reference Document 06-27
Accuracy and precision exercises associated with 2005 TRAC production aging
by Sandra J. Sutherland, Nancy J. Munroe, Vaughn Silva, Sarah E. Pregracke, and John M. Burnett
National Marine Fisheries Serv., Woods Hole Lab., 166 Water St., Woods Hole MA 02543-1026
Print publication date November 2006; web version posted January 23, 2007
Citation: Sutherland SJ, Munroe NJ, Silva V, Pregracke SE, Burnett JM. 2006. Accuracy and precision exercises associated with 2005 TRAC production aging. US Dep Commer, Northeast Fish Sci Cent Ref Doc 06-27; 17 p.
INTRODUCTION
In production aging programs, age reader accuracy can be thought of
as how often the “right” age is obtained, and precision
as how often the “same” age is obtained (Campana 2001). It
is possible that, over time, an age reader may inadvertently change
the criteria that are used for determining ages, thereby introducing
a bias into the age data. This bias can be measured with accuracy
tests, which consist of the age reader blindly examining known- or
consensus-aged fish from established reference collections. An
age reader may also make periodic mistakes, which introduce random
errors into the data. The degree of this error can be measured
with precision tests, which consist of the age reader blindly re-aging
fish that they have already aged. Both accuracy and precision
must be considered within a quality-control monitoring program.
Acceptable levels of aging accuracy and precision are influenced by
factors such as species, age structure, and age reader experience. Although
percent agreement is strongly affected by these differences, the staff
of the Fishery Biology Program at the Northeast Fisheries Science Center
(NEFSC) have long considered levels above 80% to be acceptable. The
total coefficient of variation (CV) is less affected by these differences
and, thus, is a better measure of aging error. In many aging
labs around the world, total CVs of under 5% are considered acceptable
among species of moderate longevity and aging complexity (Campana 2001),
such as the species considered here.
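As a minimal sketch of how these two measures can be computed from paired readings, the Python below assumes that total CV is the per-fish coefficient of variation (100 × standard deviation / mean of the ages assigned to that fish) averaged across all fish, following the description in Campana (2001). The function names and the example ages are hypothetical and are not data from this report.

```python
from statistics import mean, stdev


def percent_agreement(first, second):
    """Percentage of fish that received the same age in both readings."""
    if len(first) != len(second) or not first:
        raise ValueError("readings must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(first, second) if a == b)
    return 100.0 * matches / len(first)


def total_cv(first, second):
    """Mean over fish of 100 * sd / mean of the two ages for each fish."""
    if len(first) != len(second) or not first:
        raise ValueError("readings must be non-empty and of equal length")
    cvs = []
    for a, b in zip(first, second):
        m = mean([a, b])
        # A fish aged 0 in both readings shows no variation; count its CV as 0.
        cvs.append(0.0 if m == 0 else 100.0 * stdev([a, b]) / m)
    return mean(cvs)


# Hypothetical paired ages (years) for ten fish; one fish differs by one year.
reference_ages = [2, 3, 3, 4, 5, 5, 6, 7, 2, 4]
test_ages = [2, 3, 4, 4, 5, 5, 6, 7, 2, 4]

agreement = percent_agreement(reference_ages, test_ages)
cv = total_cv(reference_ages, test_ages)
print(agreement)     # 90.0
print(round(cv, 2))  # 2.02
```

In this illustration the single disagreement lowers agreement to 90% and gives a total CV of about 2%, both within the acceptance levels quoted above (above 80% agreement, under 5% CV).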
For over 35 years, scientists at the NEFSC Fishery Biology Program
have regularly conducted production aging, determining the ages for
large numbers of samples over a short period of time using established
methods (Penttila and Dery 1988), for the species assessed by the Transboundary
Resources Assessment Committee (TRAC). Historically, our approach
to age-data quality control and assurance has been a two-reader system. In
this approach, there are both a primary and a secondary age reader
for each species. The primary age reader conducts all production
aging, and the secondary age reader then ages a portion of those same
samples using similar methods. The ages determined by the two
readers are compared, and if they agree sufficiently (above 80% agreement),
the production ages are considered valid. If not, the sources
of disagreement must first be resolved. This interreader approach
is still used in the course of training new readers in order to ensure
consistency in application of aging criteria and in inter-laboratory
sample exchanges. Budgetary and staffing constraints have made
this approach less feasible, however, by reducing the number of species
for which there are two competent age readers at this laboratory.
In response, the NEFSC Fishery Biology Program has implemented a new
approach to quality control and assurance. Intrareader tests
of aging accuracy and precision, as described above, allow us to quantify
the amount of inherent aging error and bias in the ages determined
by each of our staff members. These values provide a measure
of the reliability of the production age data used in stock assessments,
and they may be directly incorporated into population models as a source
of variability.
In conjunction with implementation of these tests, we have begun to
establish reference collections of age samples for each species. These
collections are necessary to evaluate aging accuracy. Fish of
known age are difficult to obtain, so we have focused on assembling
collections from age samples which have been included in aging exchanges
with other laboratories. From those samples, we have selected
those fish for which multiple experienced age readers agree on the
age (see Silva et al. 2004 for more details).
In what has become an annual process, exercises were undertaken to
estimate the accuracy and/or precision of U.S. production aging for
the 2005 TRAC assessments (Hunt et al. 2005; Stone and Legault 2005;
Van Eeckhaute and Brodziak 2005) of Georges Bank stocks of cod (Gadus
morhua), haddock (Melanogrammus aeglefinus), and yellowtail
flounder (Limanda ferruginea). This report lists the results
of those exercises.
METHODS
For all species, subsamples were randomly selected to be re-aged in
order to test age-reader accuracy (versus the reference collections)
or precision (versus samples previously aged by that reader). Some
consideration was given to selecting a range of lengths in these random
samples to include a wider range of ages. When re-aging fish,
the age reader had knowledge of the same data as during production
aging (i.e., fish length, date captured, and area captured) but no knowledge
of previous age estimates.
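The report does not describe the selection procedure in detail. One way to combine random selection with coverage of a range of lengths is a length-stratified random draw, sketched below in Python. The function name, the 10-cm bin width, the number of fish per bin, and the example data are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_subsample(fish, n_per_bin, bin_width_cm=10, seed=None):
    """Draw up to n_per_bin fish at random from each length bin.

    fish is a list of (fish_id, length_cm) pairs. Only the IDs are
    returned, so that previous age estimates stay hidden from the reader.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for fish_id, length_cm in fish:
        bins[int(length_cm // bin_width_cm)].append(fish_id)

    chosen = []
    for key in sorted(bins):
        ids = bins[key]
        # Take every fish in a bin that holds fewer than n_per_bin.
        chosen.extend(rng.sample(ids, min(n_per_bin, len(ids))))
    return chosen


# Hypothetical sample: 60 fish with lengths from 20 to 79 cm.
sample = [(i, 20 + i) for i in range(60)]
subsample_ids = stratified_subsample(sample, n_per_bin=3, seed=1)
print(len(subsample_ids))  # 18
```

Drawing within length bins keeps the selection random while reducing the chance that the oldest or youngest fish are missing from the test sample.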
During age-testing exercises, no attempts were made to improve results
with repeated readings. There was also no attempt to revise the
production ages in cases where differences occurred. Results
are presented in terms of percentage agreement, total coefficient of
variation (CV), age-bias plots, and age-frequency tables (Campana et
al. 1995; Campana 2001).
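The tabular and graphical summaries can be sketched in Python as follows. An age-frequency table is taken here to be a cross-tabulation of the two readings, and the age-bias summary is the mean test age for each reference age, which is the quantity plotted against the 1:1 line in an age-bias plot (Campana et al. 1995). The function names and the example ages are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean


def age_frequency_table(reference, test):
    """Count the fish in each (reference age, test age) cell."""
    return Counter(zip(reference, test))


def age_bias_summary(reference, test):
    """Mean test age for each reference age.

    Means above the reference age suggest overaging in the test
    readings; means below it suggest underaging.
    """
    by_reference = defaultdict(list)
    for ref_age, test_age in zip(reference, test):
        by_reference[ref_age].append(test_age)
    return {age: mean(ages) for age, ages in sorted(by_reference.items())}


# Hypothetical paired ages (years) for six fish.
reference_ages = [2, 2, 3, 3, 4, 4]
test_ages = [2, 3, 3, 3, 4, 5]

table = age_frequency_table(reference_ages, test_ages)
bias = age_bias_summary(reference_ages, test_ages)
print(table[(2, 3)])  # 1
print(bias[4])        # 4.5
```

Cells off the diagonal of the table show where the two readings disagree, and a run of means on one side of the 1:1 line indicates a systematic bias.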
For cod, the current primary age reader was unable to conduct production
aging within the available time, and did not resume production aging
until late in 2005. Therefore, the previous age reader, who aged
cod samples from 1984 to 2003, completed aging of all samples for the
2005 TRAC meeting. Following production aging, the accuracy of
this previous age reader was determined from a random subsample drawn
from the NEFSC cod otolith reference collection. Because of time
constraints, no precision estimates were attempted.
For haddock, age-reader precision was estimated on multiple occasions
from blind second readings of subsamples from each NEFSC survey (autumn
2004 and spring 2005) and from each quarter of the 2004 NEFSC
commercial port samples. These exercises immediately followed
the completion of each cruise or quarter. Following the completion
of production aging, age-reader accuracy was also assessed by re-aging
a subsample from the NEFSC haddock otolith reference collection.
For yellowtail flounder, age-reader precision was estimated three
times from blind second readings of random subsamples from the 2004
Canadian Department of Fisheries and Oceans (DFO) port samples, the
2005 DFO spring survey, and a combination of U.S. samples (autumn 2004
and spring 2005 NEFSC surveys, plus 2004 NEFSC commercial port samples). These
latter samples were also re-aged by the person who was then being trained
as a yellowtail age reader. This trainee assumed yellowtail aging
duties in 2006, after the previous age reader retired. Both age
readers worked together during production aging for the above samples,
precluding an interreader comparison.
RESULTS AND DISCUSSION
The total sample sizes associated with the accuracy and precision
exercises were N = 106, 393, and 367 for cod, haddock, and yellowtail
flounder, respectively. Results for cod are presented in Figure
1, haddock in Figures 2–8, and yellowtail in Figures 9–12. Table
1 summarizes these results.
The accuracy estimate for cod was high (91% agreement), and the total
CV (1.5%) was low. However, there was a tendency toward overaging
by one year in the test readings (Figure 1). Even so, the precision
level was virtually the same as that obtained last year (91% agreement
and 1.9% CV, Sutherland et al. 2004 [unpubl.]), suggesting that the
temporary change to the previous age reader was not problematic.
For haddock, precision levels ranged between 91 and 98% agreement,
with total CVs of 0.3–0.9% (Figures 2–7), indicating a
high level of consistency in age determinations. No disagreement
between readings was more than one year. More importantly, no
pattern of seasonal bias was present across exercises this year, as
was observed last year in samples from the first and second quarters
(Sutherland et al. 2004 [unpubl.]). This year’s results
showed an increase in precision from last year (median of 86% agreement
and 2.0% CV, Sutherland et al. 2004 [unpubl.]). The relatively
high accuracy estimate (94% agreement, 1.3% CV, Figure 8), coupled
with consistently high precision results, supports the conclusion that
the haddock age reader is performing at a reliable level of aging capability.
Precision levels for yellowtail flounder aging were consistent between
the two sets of Canadian samples: the 2004 DFO port samples (86% agreement, 2.5%
CV, Figure 9) and the 2005 DFO spring survey (92% agreement, 1.8% CV,
Figure 10). In the port samples, there was a tendency towards higher
ages for intermediate-age fish in the second readings. The values
obtained for U.S. samples, however, were less precise (71% agreement
and 6.6% CV, Figure 11) and revealed a bias towards underaging of older
fish (age ≥ 4 years) in the second readings.
When the latter exercise was performed by the trainee, results were
comparably precise (73% agreement, 6.1% CV, Figure 12) but did not
exhibit a bias. This may indicate that the change in age readers
for yellowtail flounder could increase the reliability of age determinations. Nevertheless,
the new reader’s progress was closely monitored in the first
year of production aging.
Observations of poor scale condition in yellowtail flounder from eastern
Georges Bank, which began in 2002, have continued in these samples. The
scales were characterized by actual holes and moderate to severe erosion
of the anterior scale edges (illustrated in Sutherland et al. 2004
[unpubl.]). This condition remains unexplained.
In summary, U.S. age determinations for cod and haddock appear to
be reliable during recent production aging. Yellowtail flounder
aging precision was acceptable for Canadian samples, but lower among
U.S. samples. This situation may improve among samples aged in
2006, after the new age reader has taken responsibility for production
aging in this species.
REFERENCES
Campana SE. 2001. Accuracy, precision, and quality control
in age determination, including a review of the use and abuse of age
validation methods. J Fish Biol. 59:197-242.
Campana SE, Annand MC, McMillan JI. 1995. Graphical and
statistical methods for determining the consistency of age determinations. Trans
Am Fish Soc. 124:131-138.
Hunt JJ, O'Brien L, Hatt B. 2005. Population
status of eastern Georges Bank cod (unit areas 5Zj,m) for 1978-2006. TRAC
Ref Doc. 2005/01; 48 p. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.htm
Penttila J, Dery LM. 1988. Age determination methods for
northwest Atlantic species. NOAA Tech Rep NMFS 72; 135 p. Available
at http://www.nefsc.noaa.gov/fbi/age-man.html
Silva V, Munroe N, Pregracke SE, Burnett J. 2004. Age
structure reference collections: the importance of being earnest. In: Johnson
DL, Finneran TW, Phelan BA, Deshpande AD, Noonan CL, Fromm S, Dowds
DM, compilers. Current fisheries research and future ecosystems
science in the Northeast Center: collected abstracts of the Northeast
Fisheries Science Center's Eighth Science Symposium, Atlantic City,
New Jersey, February 3-5, 2004. Northeast Fish Sci Cent Ref Doc.
04-01; p. 60.
Stone HH, Legault CM. 2005. Stock assessment
of Georges Bank (5Zhjmn) yellowtail flounder for 2005. TRAC Ref
Doc. 2005/04; 83 p. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.htm
Van Eeckhaute L, Brodziak J. 2005. Assessment of haddock
on eastern Georges Bank. TRAC Ref Doc. 2005/03; 77 p. Available
at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.htm