CONTENTS
Introduction
Background
Definition of Strata – Fishery Identification
Data Sources
Imputation Rules for Unobserved Fisheries
Sampling Theory and Optimization Models
Application of the Model
Precision, Bias and Sampling Intensity
An Evaluation of Bias in the Northeast Fisheries Observer Program
Sources of Uncertainty
Acknowledgments
References
Northeast Fisheries Science Center Reference Document 05-09

NEFSC Bycatch Estimation Methodology:
Allocation, Precision, and Accuracy


by Paul J. Rago, Susan E. Wigley, and Michael J. Fogarty

National Marine Fisheries Serv., Woods Hole Lab., 166 Water St., Woods Hole, MA 02543

Print publication date August 2005; web version posted August 25, 2005

Citation: Rago PJ, Wigley SE, Fogarty MJ. 2005. NEFSC bycatch estimation methodology: allocation, precision, and accuracy. US Dep Commer, Northeast Fish Sci Cent Ref Doc. 05-09; 44 p.


EXECUTIVE SUMMARY

This report describes the standardized methodology used to estimate bycatch rates of finfish by commercial fisheries in the Northeast.  In this report, bycatch is defined as the observed discarded catch, summed over eleven groundfish species.  Estimates of unobserved discards are not considered.  All retained catches are included whether or not the catches were incidental to the target species.  Emphasis is placed on the methods used to define the sampling frame (i.e., the population of commercial fishing trips to be sampled), appropriate stratification, and efficient allocation of sampling effort to these strata.  Efficient allocation of sampling effort within a stratified survey design improves the precision of the estimate of overall discard rates.  Accuracy of sample estimates is evaluated by comparing various performance measures (e.g., landings, trip duration) between vessels with and without observers present. Although formal statistical distinctions between accuracy and bias of estimators and estimates can be made, in this report we use the terms interchangeably and less formally: a biased estimator is inaccurate, and an accurate estimator is unbiased.

This report focuses on bycatch estimates based on discard to kept ratios.  Use of this ratio is appropriate for trawl, gillnet and longline fisheries in the Northeast US.  A formal assessment of  bycatch estimates based on the ratio of discards to fishing effort is not considered in this report.  Estimators based on ratios of total discard to fishing effort are more appropriate for fisheries that do not target groundfish, such as the sea scallop and herring fisheries.  Evaluations of groundfish bycatch in these fisheries are being conducted by technical committees for their respective fishery management plans.

The Northeast Fisheries Science Center allocates observer sea days to monitor bycatch in commercial fisheries along the Northeast coast.  These fisheries are diverse and therefore it is necessary to stratify commercial trips into fleet sectors (strata) with similar characteristics.  Data from the Northeast Fisheries Observer Program and the Fishing Vessel Trip Report are used together to define the size of the sample and the size of the strata, respectively.  We define a total of 227 fisheries for 2005 observer coverage, consisting of three major gear types, four mesh sizes, two levels of trip duration, six port areas, and four seasonal quarters. The total fishing effort for April 2003 to March 2004 in the defined strata comprises 43,703 trips.  Our examination of the efficacy of observer coverage included results from 1,103 trips and 2,704 sea days.  Every effort has been made to make the sampling program synoptic (i.e., cover all the major fisheries that discard commercially important species) and robust to sources of uncertainty.  In particular, we utilize discard information at the trip level as opposed to the tow level.  Sampling selection relies on observable properties of the strata, rather than desired outcomes (e.g., a targeted “cod” trip).  Trips within strata are also assigned a probability of obtaining useful information relative to the species group of interest.  The “usefulness” of a trip is conditional on the likelihood that a trip will catch one or more of the species within a predefined group of species.

Our analysis of sea-day allocations and use of optimization methods to improve allocations rest on two primary assumptions.  First, the extant data are sufficient to obtain consistent estimates of the underlying variance of the discard ratio per stratum.  Consistency is ensured if the samples are representative.  Second, the relative size of the strata, i.e., the total number of trips, remains constant from year to year.  This is a more tenuous assumption, as the balance of fishing effort can change in response to changes in resource abundance or regulations.  Both of these assumptions are inherent in the use of retrospective data to improve a future sampling program.

The observer sea-day allocation model developed here represents an extension of Neyman optimal allocation (Cochran 1977).  Observer trips are allocated to strata as a function of their contribution to the total variance, the expected number of observer days per trip, and the probability that a trip will provide information on one or more of the species groups of interest.  The essential features of the sampling design and allocation process are summarized below.

  • Strata are defined on the basis of observable properties of the fleet sector
  • The sample unit within a stratum is a trip 
  • The primary response variables are total discards and kept weights of groups of species. Eleven groundfish species constitute one group, monkfish another group, and summer flounder-scup-sea bass, a third group
  • The probability of obtaining information on one or more of the species groups from a future trip in a stratum is estimated from analysis of observer data
  • An estimate of the probability of not obtaining any information about one of the three species groups is incorporated to allow appropriate increases in sample sizes commensurate with this risk
  • Expected average trip durations are defined for each stratum
  • Total observer days at sea serve as a constraint on the allocation process
  • Additional constraints can be imposed on the minimum and maximum numbers of samples per stratum
  • Unsampled strata use imputed (or borrowed) values from adjacent strata to ensure that some information is used for sample selection
  • Imputation also identifies gaps in coverage and allows for updates of the population frame as new data are acquired
  • Discard ratios and standard errors incorporate the approximate covariance of the ratio
  • The precision of the overall discard/kept ratio is the primary performance measure in the allocation process.
  • Total variance can be minimized subject to a total observer day constraint, or the number of observer days can be minimized subject to a desired level of precision 

Results from the optimization model are used as a tool to improve observer coverage.  Some post-processing of the optimized sea days is needed to fine-tune coverage across fleet sectors. Where feasible, the fine-tuning of sea-day allocation capitalizes on the multi-purpose attributes of observer coverage oriented toward assessment of non-finfish species (e.g., acquiring data in the sea scallop fishery from trips designed to evaluate turtle bycatch rates).

Presently the model is based on aggregate Discard/Kept (D/K) ratios. These ratios are relevant to most fisheries but, of course, the Discard/Effort (D/E) ratio is important in others.  D/E ratio data have been prepared but not yet implemented in the model.   D/E ratios are relevant for fisheries such as sea scallops, northern shrimp, and herring.  It should be noted that one of the primary difficulties of implementing the D/E methodology is the selection of an appropriate unit of effort. The “trip” level of effort may be the most useful but additional work will be necessary before extending the methodology  to optimally allocate observer coverage to these fisheries.

The optimization methodology addresses the precision of the overall D/K ratio in the context of multiple objectives and limited resources.  The issue of accuracy/bias is addressed by comparing various properties of vessels with and without observers onboard.   Bias -- the systematic difference between the estimated and true value -- is addressed by first ensuring that the vessel trips are representative, and that a variety of quality assurance/control procedures are employed to accurately monitor vessel performance.  Refusals to take an observer and other forms of non-response by industry are possible sources of bias.  These sources are addressed via increased use of Enforcement personnel.   For these concerns, the NEFSC observer program is consistent with the recommendations of the NMFS National Working Group on Bycatch (NMFS 2004).

Babcock et al. (2003) assert that increases in sampling effort are sufficient to reduce bias.  If the presence of observers onboard alters the vessel's fishing patterns, then it can be argued that all observed trips yield potentially biased results.  If the unobserved vessel fishes with different methods, in different areas, and so forth, then increases in sample size can only reduce but not eliminate the scope for bias.  A variety of statistical techniques for inferring bias can be applied, but a review of the literature suggests that these techniques have been only moderately successful.  Independent measures of vessel behavior may be possible from Vessel Monitoring System data, but such analyses can only detect gross changes from observed trips.  Where possible, verification by independent data sources is encouraged, but one should be careful to avoid incorrectly assuming that a particular methodology is completely unbiased.

Several tests were conducted to address the potential sources of bias by comparing measures of performance for vessels with and without observers present.   Bias can arise if the vessels with observers on board consistently catch more or less than other vessels, if the average trip durations change, or if vessels fish in different areas.  Each of these hypotheses was tested by comparing observable properties in strata having vessels with and without observers. 

Average catches (pounds landed) for observed and total trips compare favorably, following an expected linear relationship. The expected difference of the stratum specific means and standard deviations for both kept weight of groundfish and total trip duration was near zero.   The frequency distribution of these differences provided no evidence of systematic bias.  The mean difference between average catch rates of 238 pounds was not significantly different from zero (p = 0.59, df = 84).   A paired t-test of the stratum specific standard deviations of pounds kept suggested no significant difference from zero (p = 0.08).  A similar analysis of average trip duration revealed a strong correlation between observed and unobserved trips (Figure 7) and a suggestion that the observed trips were about a half-day longer when the observer was on board (p = 0.01).  A paired t-test of the difference in stratum specific standard deviations of trip length was not significantly different from zero (p = 0.60) (Figure 8B).  Some skewing of the differences in mean trip durations was observed, with observed trips being slightly longer.

Two measures of spatial coherence suggest that the spatial distribution of fishing effort for trips having observers closely matches the spatial distribution of all trips.  The null hypothesis of observer proportions equal to the VTR proportions was rejected (P<0.05) in 20 of 65 comparisons.  Of these 20 cases, 10 involved ports in Southern New England and the Mid-Atlantic region where landings of New England groundfish are expected to be low.  Of the remaining ten cases, five involved the large and extra-large gill net fisheries that mainly target monkfish.  Thus, the null hypothesis of equivalent spatial distribution of sampling was rejected in only 5 of 50 fleet sectors, a rejection rate only slightly higher than would be expected from chance alone.
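The comparison with chance can be made concrete with a rough sketch. It assumes the 50 remaining tests are independent and each has a 0.05 false-rejection rate, which the report does not state; the function name `binomial_tail` is illustrative only.

```python
from math import comb

def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 65 comparisons, minus 10 southern-port cases and 5 monkfish gillnet cases.
n_tests = 65 - 10 - 5
alpha = 0.05

# Rejections expected from chance alone if every null hypothesis were true.
expected_by_chance = n_tests * alpha

# Probability of 5 or more rejections under the same (assumed) independence.
prob_five_or_more = binomial_tail(n_tests, 5, alpha)

print(n_tests, expected_by_chance, round(prob_five_or_more, 3))
```

Under these assumptions about 2.5 rejections would be expected by chance, so five observed rejections is a modest excess rather than strong evidence of spatial mismatch.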

A paper by Murawski et al. (2005 in press) compares the spatial distribution of otter trawl fishing effort for vessels with Vessel Monitoring Systems (VMS) with the distribution of tows on observed trips. Qualitatively, the spatial distributions match very well, with high concentrations of effort near the boundaries of the existing closed areas on Georges Bank and within the Gulf of Maine.  Moreover, the effort concentration profiles deduced from VMS data coincided almost exactly with the profiles derived from observed trips. Overall, these comparisons suggest strong coherency between the two independent measures of fishing locations.

An assessment of the sources of uncertainty in the design and data collected in the Northeast Fisheries Observer Program indicates that the level of precision in the discard ratios (d/k) for the New England groundfish fisheries as a whole is high and there is little evidence of bias.  However, at finer temporal and spatial scales, precision of the discard ratios will generally be lower than for the aggregate.  Precision of the discard estimates will also be lower for individual species, age groups and size classes.


Introduction

Estimation of bycatch in any commercial fishery is a difficult task.  At the level of an individual trip, bycatch occurs sporadically over wide geographical ranges.  Proper quantification typically requires presence of trained observers.  The commercial marine fisheries of the Northeastern US comprise many vessels of widely different sizes, targeting multiple species in a variety of habitats.  Overlaying the complexity of the fleet and target species is a complex regulatory environment that constrains fleet behaviors.   Since many stocks are in rebuilding phases, the effects of restrictions on landings per trip, and therefore revenue per trip, are difficult to predict.  The Northeast Fisheries Observer Program (NEFOP) addresses this complexity by first ensuring that the data obtained from any trip are of the highest quality.  This is achieved through a rigorous training program, standardized on-board data collection protocols, and thorough auditing of data.   To allow for extrapolation from the sample data to the fleet as a whole, these procedures must be embedded in a statistical sampling design.  This report provides a summary of the issues relevant to the design and analysis of the observer sampling program particularly with respect to the allocation of observer days to achieve desired levels of precision. 

The NEFOP program incorporates the following important features:

  1. Definition of a sampling frame across all relevant fisheries
  2. Identification of strata based on observable properties
  3. Development of rules for imputing variance estimates in unsampled strata (i.e., “borrowing” estimates from appropriate strata)
  4. Use of a trip as the sample unit (rather than individual tow)
  5. Definition of discards by species groups, corresponding to the major finfish species within the Northeast US.
  6. Use of discard to kept ratios (d/k) for species groups as the primary response variable.
  7. Estimation of approximate variances for d/k for groups of species, rather than individual species
  8. Allocation of sampling effort based on reduction in total variance of the d/k estimate, subject to total cost constraints.
  9. Allowance for observer coverage in remaining fisheries not included in the sampling frame, owing to other priorities (e.g., protected species concerns).
  10. Where feasible, use of the multi-purpose attributes of observer coverage oriented toward assessment of non-finfish species (e.g., acquiring data in the sea scallop fishery from trips designed to evaluate turtle bycatch rates).

In this report we describe the foundations of our standardized approach for bycatch reporting methodologies and the primary sources of uncertainty.


Background

The Northeast Fisheries Science Center (NEFSC) routinely allocates observer coverage to monitor bycatch (fish, invertebrates, and protected species) in the commercial fisheries in the Mid-Atlantic and New England regions.   The observer coverage is administered in units of ‘sea days’.   Based on the daily cost of an observer at sea, the available funds determine the number of potential sea days.  However, for the New England groundfish fishery, the number of sea days is presently mandated to be 5% coverage of the fishery.  The projected fishing activity (in days) for the year is estimated by the available days-at-sea allowed under the Northeast Multispecies Fishery Management Plan.  Thus, in a given year, the NEFSC has a mixture of mandated sea days and non-mandated sea days to monitor bycatch in the Northeast region (North Carolina to Maine) for various fisheries.  

Allocation of sea days is guided by an optimization algorithm that is based on generalization of the well-known Neyman allocation principle in survey sampling.  Precision of the overall estimate of the discard ratio is improved by allocating samples to strata with the greatest contribution to the total variance, subject to an overall constraint on available resources.  In this application, “resources” refers to the total number of observer days available.  Improvement of the allocation process requires an evaluation of the current sampling design and precision of estimators.  The ability to improve the design is contingent on the reliability of the stratum-specific variances and the persistence of these estimates in the future (or at least the next sampling period).

The optimization algorithm can be used to (1) minimize the variance of the discard estimate subject to a given number of sea days, or (2) minimize the number of sea days subject to a desired level of precision.  Results from the optimization model are used as a tool to improve the coverage.  However, the model does not incorporate information regarding sampling for protected species, nor does it include information for fisheries where the discard ratio may be more appropriately measured by a discard to effort ratio (d/e).  Thus the model predictions are conditioned to exploit the multipurpose utility of the protected species sampling, and coverage in important fisheries (like sea scallops) is ensured by reserving some additional days to “level out” sampling that may be required for either protected species or closed area trips.
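The two modes can be illustrated with the textbook cost-weighted Neyman allocation (Cochran 1977). This is a minimal sketch, not the report's extended model: it treats the response as a stratified mean, ignores the finite population correction, and omits both the probability that a trip yields useful information and any minimum or maximum sample constraints. The strata values and function names are hypothetical.

```python
from math import sqrt

# Hypothetical strata: total trips (N_h), standard deviation of the trip-level
# response (S_h), and expected observer days per trip (c_h).
STRATA = [
    {"trips": 4000, "sd": 0.30, "days_per_trip": 1.0},
    {"trips": 1500, "sd": 0.80, "days_per_trip": 6.0},
    {"trips": 500, "sd": 0.10, "days_per_trip": 2.0},
]

def allocate_trips(strata, total_days):
    """Mode 1: trips per stratum that minimize variance for a fixed day budget.

    n_h is proportional to N_h * S_h / sqrt(c_h), scaled so that the
    allocated trips use exactly total_days.
    """
    denom = sum(s["trips"] * s["sd"] * sqrt(s["days_per_trip"]) for s in strata)
    return [
        total_days * s["trips"] * s["sd"] / sqrt(s["days_per_trip"]) / denom
        for s in strata
    ]

def variance_of_mean(strata, n_trips):
    """Variance of the stratified mean (finite population correction ignored)."""
    total = sum(s["trips"] for s in strata)
    return sum(
        (s["trips"] / total) ** 2 * s["sd"] ** 2 / n
        for s, n in zip(strata, n_trips)
    )

def min_days_for_variance(strata, target_variance):
    """Mode 2: fewest sea days that reach a target variance."""
    total = sum(s["trips"] for s in strata)
    term = sum(
        s["trips"] / total * s["sd"] * sqrt(s["days_per_trip"]) for s in strata
    )
    return term ** 2 / target_variance

n_trips = allocate_trips(STRATA, total_days=400)
days_used = sum(n * s["days_per_trip"] for n, s in zip(n_trips, STRATA))
achieved = variance_of_mean(STRATA, n_trips)
print([round(n, 1) for n in n_trips], round(days_used, 1), achieved)
```

The two modes are duals under these assumptions: asking `min_days_for_variance` for the variance achieved with 400 days returns 400 days again. Fractional trip counts would need rounding in practice.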

This report describes: 1) the fishery identification and data sources used; 2) imputation rules for unobserved fisheries; 3) sampling theory and optimization methods; 4) application of the model to observer coverage; and 5) the accuracy issues discussed by Babcock et al. (2003).


Definition of Strata -- Fishery Identification 

Diverse commercial fisheries are prosecuted off the Northeastern coast of the USA.  These fisheries vary in size (number of trips) and have varying bycatch rates.   To monitor these fisheries with at-sea observers, it is necessary to stratify the trips into fleet sectors with similar characteristics.  For this report, fleet sectors are defined as strata within a survey design.

Commercial fishing trips are partitioned into fleet sectors using five classification variables:  calendar quarter, gear type, mesh size, geographical region, and trip length.   These classification variables are selected because they are generally known before a trip occurs. Using these criteria it is possible to generate a list of candidate vessels for each stratum, which simultaneously enables a random selection process and reduces the number of repeat trips on vessels. This is a critical aspect for both strata definition and sample selection.    One cannot base a sampling design on the outcome of a sample observation.  In this exercise, it is not possible to select a sampling design that specifically improves the precision of cod discards, since that objective is dependent on the realization of the actual sample.    However, it is possible to select samples that will improve the probability of obtaining improved discard estimates by estimating the expected proportion of trips that catch species groups of interest.

Calendar quarter was considered the most feasible temporal unit to capture seasonal variations in fishing activity and bycatch rates over the full range of fisheries.  Although some management regulations operate at a finer scale (e.g. weekly), quarterly data can be further subdivided if finer resolution is needed.   Otter trawl, gillnet and longline gear were defined as the three major gear types for finfish.   Otter trawl and gillnet trips were classified into four mesh size groups:  Small (less than 3.99 inch mesh); Medium (between 3.99 and 5.49 inch mesh); Large (between 5.5 and 7.99 inch mesh) and XLarge (8.0 inch mesh or greater).   Additionally, trips are classified into six geographical regions based upon the port of departure: ports located within Maine and New Hampshire (ME_NH); Massachusetts (N_MA, excluding Bristol county); Connecticut, RI, and Bristol county, MA (SNE); New Jersey - New York (NJ/NY); Maryland and Delaware (MD/DE); Virginia and North Carolina (VA/NC).  Trip length serves as a surrogate for spatial resolution (inshore vs. offshore).   Otter trawl trips are further classified into two trip length categories: day trips and multi-day trips.  Longline and gillnet gears are not partitioned by trip length.
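The classification rules above can be sketched as a small function that maps a trip to its fleet sector. The function and field names are hypothetical. Two details are assumptions: mesh sizes that fall between the stated bounds (for example 5.495 inches) are assigned to the lower group, and longline trips receive a mesh group of "none", a label borrowed from the Level 2 imputation rule.

```python
# State-to-region lookup based on port of departure, as listed in the text.
REGION_BY_STATE = {
    "ME": "ME_NH", "NH": "ME_NH",
    "MA": "N_MA",
    "CT": "SNE", "RI": "SNE",
    "NJ": "NJ/NY", "NY": "NJ/NY",
    "MD": "MD/DE", "DE": "MD/DE",
    "VA": "VA/NC", "NC": "VA/NC",
}

def mesh_group(gear: str, mesh_inches: float) -> str:
    """Classify otter trawl and gillnet mesh; longline has no mesh group."""
    if gear == "longline":
        return "none"
    if mesh_inches < 3.99:
        return "Small"
    if mesh_inches < 5.5:      # assumption: the 5.49-5.5 gap goes to Medium
        return "Medium"
    if mesh_inches < 8.0:      # assumption: the 7.99-8.0 gap goes to Large
        return "Large"
    return "XLarge"

def region(state: str, county: str = "") -> str:
    """Bristol county, MA is grouped with Southern New England."""
    if state == "MA" and county.lower() == "bristol":
        return "SNE"
    return REGION_BY_STATE[state]

def fleet_sector(quarter, gear, mesh_inches, state, county, multi_day):
    """Return (quarter, gear, mesh group, region, trip length category)."""
    if gear == "otter trawl":
        trip_length = "multi-day" if multi_day else "day"
    else:
        trip_length = "all"    # longline and gillnet are not split by length
    return (quarter, gear, mesh_group(gear, mesh_inches),
            region(state, county), trip_length)

print(fleet_sector(2, "otter trawl", 6.5, "MA", "Bristol", True))
```

Every input is known before the trip sails, which is the property the text identifies as essential for drawing a random sample within each stratum.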

Due to the mixture of species caught during a trip, it is not sufficient to classify trips with regard to target species because discard of target and non-target species may occur.  To account for target and non-target discard, trips in each fleet sector are classified into one or more of three species groups:  New England groundfish (NEGF); summer flounder, scup and black sea bass (FSB); and monkfish (MONK).   There is often overlap between trips that catch NEGF, FSB and MONK.  The estimated number of trips and sea days needed to cover these fleet sectors may be overestimated when the trips are assumed to be independent; therefore, the overlapping nature of the fishing fleets is taken into account.  Sampling fractions, and how the overlap is accounted for, are described in a later section.

Eleven species constitute the New England groundfish species group: cod, haddock, yellowtail flounder, American plaice, witch flounder, winter flounder, redfish, pollock, white hake, windowpane, and halibut.   If a trip catches (retains or discards) at least 1 of the 11 large-mesh regulated species, the trip is categorized as a NEGF trip and the hail weights of the 11 species are summed to form an aggregate species total for NEGF.  Similarly, if a trip catches (retains or discards) summer flounder, black sea bass or scup, the trip is categorized as a FSB trip and the hail weights of these species are summed to form an aggregate species total for FSB.  If a trip catches (retains or discards) monkfish, then the trip is categorized as a MONK trip.   A trip may be categorized to one or more of the three species groups.
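A minimal sketch of this categorization follows. The input layout (a mapping from species name to kept and discarded hail weights) and the function name are assumptions made for illustration; the species lists come from the text.

```python
GROUPS = {
    "NEGF": {
        "cod", "haddock", "yellowtail flounder", "American plaice",
        "witch flounder", "winter flounder", "redfish", "pollock",
        "white hake", "windowpane", "halibut",
    },
    "FSB": {"summer flounder", "scup", "black sea bass"},
    "MONK": {"monkfish"},
}

def species_group_totals(catch):
    """Sum kept and discarded hail weights by species group.

    `catch` maps species -> (kept_lb, discarded_lb). A group appears in the
    result only if the trip retained or discarded at least one member species.
    """
    totals = {}
    for group, members in GROUPS.items():
        kept = sum(k for sp, (k, d) in catch.items() if sp in members)
        discarded = sum(d for sp, (k, d) in catch.items() if sp in members)
        if kept + discarded > 0:
            totals[group] = (kept, discarded)
    return totals

# Hypothetical trip: skate is outside all three groups and is ignored.
trip = {
    "cod": (900.0, 40.0),
    "haddock": (300.0, 0.0),
    "monkfish": (0.0, 25.0),
    "skate": (500.0, 100.0),
}
print(species_group_totals(trip))
```

The example trip falls into both NEGF and MONK, which is the kind of overlap the sampling fractions have to account for.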


Data Sources

Trip characteristics are recorded in both the NEFOP and Fishing Vessel Trip Reports (VTR) data sets.  Together, these databases are used to define the size of the sample and the size of the strata, respectively.   Data from each source are retrieved and prepared separately before the two sets are combined (Figure 1).     

Fishing Vessel Trip Report Data

Beginning in June 1994, the Northeast Region’s data collection system was changed from a voluntary to a mandatory reporting system for USA fishermen and dealers who catch and buy/sell groundfish species regulated by the Northeast Multispecies Fishery Management Plan.  The mandatory reporting system consists of two components: 1) dealer reporting and 2) vessel trip reporting.  Each component contains information needed for fishery management and stock assessment analyses: the dealer reports contain total landings by market category, while the vessel trip reports contain information on area fished, kept and discarded portions of the catch, and fishing effort.   The VTR data have been routinely used in management analyses and peer-reviewed stock assessments. Examples of the application of VTR data to stock assessments may be found in a large number of reports of the Stock Assessment Review Committee (SARC). Reports prepared since 2000 may be found at http://www.nefsc.noaa.gov/nefsc/saw/. Earlier reports are available by contacting saw_reports@noaa.gov.

In this report, the VTR data are used to: 1) define the sampling frame of the commercial fishing trips, and 2) evaluate the accuracy of the observer data with respect to area fished, kept pounds, and trip length. The VTR data are the only synoptic data source for vessel activity, area fished and fishing effort for commercial fisheries.  The Vessel Monitoring System data and the Days-At-Sea data systems cover only portions of the fisheries and therefore are limited in use. 

The VTR data can be used as a basis for defining the sampling frame, because all federally permitted vessels are required to file a VTR for each fishing trip (see NMFS-NERO http://www.nero.noaa.gov/ro/fso/vtr_inst.pdf).   These self-reported data constitute the basis of the fishing activity of the commercial fleets.  The VTR trip data are collapsed into fleet sectors and species groups as defined above. For each species group within a fleet sector, the number of trips that caught the species group, the average number of days absent, and the weight of the species in the species group are calculated.

The limitations of self-reported catch data are well known (e.g., Walsh et al. 2002, NMFS 2004).  Limitations of the initial VTR data sets were described by the SARC in 1996 (NMFS 1996).  Since then, many of these limitations have been addressed. In particular, subsequent peer reviews through numerous SARCs and a review by the National Research Council (1998) have identified the strengths, weaknesses, and appropriate uses of the VTR data from the Northeast.

The validity of VTR data as a basis for a sampling frame is supported by comparisons with total landings data from dealer records. All dealers that buy and sell groundfish regulated by federal FMPs are required to report 100% of the landings.  These data are generally thought to constitute a near census of landings of groundfish. The NRC (1998) noted that misreporting of landings is “usually a significant issue only when fisheries are managed by setting a total allowable catch.”  On this basis, the magnitude of misreporting by dealers would be low, as Northeast groundfish stocks have been managed primarily through effort controls.  A comparison of total groundfish landings from VTR and dealer records for calendar year 2003 reveals close agreement between the two sources:

Species               VTR Landings (mt)   Dealer Landings (mt)   Difference (mt)   Percent Difference
Cod                                8240                   8692               452                 5.2%
Winter flounder                    5321                   5714               393                 6.9%
Witch flounder                     2971                   3108               137                 4.4%
Yellowtail flounder                5208                   5530               322                 5.8%
American plaice                    2204                   2415               211                 8.7%
Windowpane flounder                 102                     60               -42                 -70%
Haddock                            5778                   5874                96                 1.6%
White hake                         2268                   3305              1037                31.4%
Halibut                              11                     13                 2                15.4%
Redfish                             338                    360                22                 6.1%
Pollock                            3839                   4188               349                 8.3%
Total                             36281                  39258              2977                 7.6%

For the three major species, cod, haddock and yellowtail flounder, the percentage differences range from 1.6% to 5.8%. Only windowpane flounder, white hake and halibut exhibit large percentage differences. Total landings of windowpane flounder and halibut represent small fractions of the total (0.3% of VTR and 0.2% Dealer) landings and these percentage differences are considered negligible.  Large percentage differences for white hake may be attributable to confusion between white hake and red hake. White hake can be difficult to distinguish from red hake (sp) and may be identified simply as “hake” by both dealers and fishermen.  The overall difference of 7.6% is dominated by large differences in the landings of white hake. Excluding white hake from the comparison reduces the overall percentage difference to 5.4%. 
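The percentages quoted here can be reproduced from the landings figures. The sketch below assumes the percent difference is the dealer-minus-VTR difference divided by dealer landings, which matches every row of the comparison; the variable and function names are illustrative.

```python
# (VTR landings, dealer landings) in metric tons, calendar year 2003.
LANDINGS = {
    "Cod": (8240, 8692),
    "Winter flounder": (5321, 5714),
    "Witch flounder": (2971, 3108),
    "Yellowtail flounder": (5208, 5530),
    "American plaice": (2204, 2415),
    "Windowpane flounder": (102, 60),
    "Haddock": (5778, 5874),
    "White hake": (2268, 3305),
    "Halibut": (11, 13),
    "Redfish": (338, 360),
    "Pollock": (3839, 4188),
}

def percent_difference(vtr, dealer):
    """Dealer minus VTR, as a percentage of dealer landings."""
    return 100.0 * (dealer - vtr) / dealer

def overall_difference(landings, exclude=()):
    """Percent difference of summed landings, optionally dropping species."""
    kept = [v for sp, v in landings.items() if sp not in exclude]
    vtr_total = sum(v for v, _ in kept)
    dealer_total = sum(d for _, d in kept)
    return percent_difference(vtr_total, dealer_total)

overall = overall_difference(LANDINGS)
without_white_hake = overall_difference(LANDINGS, exclude={"White hake"})
print(round(overall, 1), round(without_white_hake, 1))
```

The species rows sum to 36,280 mt (VTR) and 39,259 mt (dealer), one ton off the reported totals, presumably because of rounding in the published figures; the overall percentages are unaffected at one decimal place.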

Other measures to ensure the validity of the VTR database include routine auditing procedures, standardized data entry protocols and compliance reviews (pers. comm. Greg Power, Chief, Fisheries Information Section, Northeast Regional Office, NMFS).

Northeast Fisheries Observer Program Data

The NEFOP employs trained, sea-going observers to collect catch data by species and disposition (retained and discarded).  Biological samples, gear characteristics data, and economic information are also collected.  For the optimization data set, only observed hauls from trips classified as ‘standard sea sampling trips’ are used.   Observed trips that were aborted or which used a ‘limited’ fish sampling protocol (no discard data collected) are excluded.   Hail weight can be reported in round or dressed weights; if kept hail weights are reported as ‘dressed’, then the hail weight is converted to round (live) weight using Commercial Fisheries Database System (CFDBS) conversion factors for the species.   All discard hail weights are assumed to be round (live) weight.

The NEFOP data are collapsed into strata as defined above.  For each stratum, the number of observed trips that caught one or more of the three species groups is calculated. For each fleet sector and species group, the number of observed trips, number of observed hauls, average trip length (in days), kept weight of all species in the species group, discarded weight of all species in species group, and the number of observed days are calculated.  A discard ratio and the variance of the ratio are calculated for each stratum (fleet sector and species group). 
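As a rough illustration of the last step, the sketch below computes a d/k ratio and its approximate variance for one stratum using the standard Taylor-series (ratio-estimator) approximation from Cochran (1977). The report's exact formula is not given in this section, so this is a textbook stand-in; the finite population correction is ignored and the function name is hypothetical.

```python
def discard_ratio(trips):
    """Return (d/k ratio, approximate variance) for one stratum.

    `trips` is a list of (discarded_lb, kept_lb) pairs, one per observed trip.
    The variance is None when it cannot be estimated: fewer than two trips,
    or zero total kept weight (in which case the ratio is None as well).
    """
    n = len(trips)
    d_total = sum(d for d, _ in trips)
    k_total = sum(k for _, k in trips)
    if k_total == 0:
        return None, None
    ratio = d_total / k_total
    if n < 2:
        return ratio, None
    k_mean = k_total / n
    # Residuals d_i - r*k_i carry the variances of d and k and their covariance.
    resid_ss = sum((d - ratio * k) ** 2 for d, k in trips)
    variance = resid_ss / ((n - 1) * n * k_mean ** 2)
    return ratio, variance

print(discard_ratio([(10.0, 100.0), (30.0, 100.0)]))
```

Strata with a single observed trip return a ratio but no variance, which is one of the situations the imputation rules are designed to handle.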

Optimization Data Set

The VTR and NEFOP data sets are concatenated by fleet sector and species group.  A list of variables and their definitions is presented in Table 1.  Not all VTR fleet activity may have NEFOP coverage (Table 2).  When fleet sectors do not have observer coverage, imputed values are used (Table 3).  The imputed values are derived from NEFOP data from similar fleet sectors, thus providing an estimate for the non-observed fleets.  Details of the imputation process are provided in the following section.

The optimization tool is flexible and allows the user to select the entire input data set, or a subset.  To allocate sea days for an entire year, four calendar quarters of data are used.  Using the most recent available data, given the time needed for data entry and auditing, the year consists of calendar quarters 3 and 4 of the previous year and calendar quarters 1 and 2 of the current year.

The three gear types (otter trawl, gillnet, and longline) used in the optimization data set are gear types for which fishing regulations allow finfish to be retained, thus a discard to kept ratio estimator (d/k) is used.  Fisheries using other gear types where regulations may prohibit groundfish possession are excluded from the current optimization process because a d/k ratio is not appropriate for these cases.


Imputation Rules for Unobserved Fisheries

Not all of the fishery strata had observed trips between April 2003 and March 2004. To account for the expected variance of the estimates in the missing cells, it was necessary to develop a standardized procedure to handle both missing and minimal levels (e.g., a single trip) of observer coverage.  This procedure is referred to hereafter as ‘imputation’ and the estimates derived by the imputation are referred to as ‘imputed values’.  Imputed values are derived by sequentially relaxing the fleet sector classification. The fleet sectors for each species group (NEGF, FSB, and MONK) are imputed separately.  The imputed values fill in missing values for the unobserved strata.  Fishery strata are defined with respect to rigid definitions of categorical variables such as region or quarter.  A stratum with missing data must be filled with data from similar strata.  To identify suitable candidate strata as “donor” or “parent” cells, it is necessary to “relax” the definitions of the strata.  For example, if no trips occur in the Jan.-Mar. quarter, one might relax the definition to include data from the Jan.-Jun. half year.  The objective process of relaxing strata definitions to impute data is described below.

A fleet sector was not imputed if:

1) VTR number of trips = 0 (no imputation needed when there is no fleet activity for the species group);

2) VTR number of trips > 0 and standard error was not missing (no imputation needed when there is fleet activity for the species group and there is a standard error of the observer d/k ratio); or

3) VTR number of trips > 0 and total observed kept pounds = 0 (no imputation needed when there is fleet activity for the species group and the standard error cannot be calculated); otherwise, the fleet sector was imputed.
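The three exclusion rules above can be read as one decision function. The following is a minimal sketch, not the program's actual code: the function and argument names are hypothetical, and `None` is assumed to represent a missing value (no observed kept pounds recorded, or a standard error that could not be computed).

```python
from typing import Optional


def needs_imputation(vtr_trips: int,
                     obs_kept_lb: Optional[float],
                     se_dk: Optional[float]) -> bool:
    """Return True when a fleet sector / species group cell should be imputed.

    Assumption: None marks a missing value in the merged VTR-NEFOP record.
    """
    # Rule 1: no fleet activity for the species group.
    if vtr_trips == 0:
        return False
    # Rule 2: fleet activity and a usable standard error of the d/k ratio.
    if se_dk is not None:
        return False
    # Rule 3: fleet activity, but observed kept pounds are exactly zero,
    # so the standard error of d/k cannot be calculated.
    if obs_kept_lb == 0:
        return False
    # Otherwise the cell is imputed (unobserved, or a single observed trip).
    return True
```

Under these assumptions, an unobserved but active cell (`needs_imputation(12, None, None)`) is flagged for imputation, while an inactive cell is not.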

The imputation uses three levels of increasingly aggregated NEFOP data (using the same data and calculation methods as the original calculations of observed d/k ratio and associated statistics).  Three of the five stratification factors are relaxed (region, mesh size and calendar quarter).   Gear type and trip length are retained as stratification factors and are not relaxed.  Trip length is not relaxed because the average trip length is used to determine the number of sea days needed to obtain the desired precision level.  Gear type is not relaxed because fundamental differences in catches (retained and discarded) occur among these gear types.

Level 1: Calendar quarter is relaxed to half year and the six geographic regions are relaxed to two regions (NE region = ME/NH, N_MA, SNE; MA region = NY/NJ, DE/MD, NC/VA); gear, mesh size and trip length categories are maintained.

Level 2: Calendar quarter is relaxed to an entire year, the six geographic regions are relaxed to two regions (as in Level 1), and the four mesh groups are relaxed to two mesh groups (SMALL = small and medium mesh groups; LARGE = none, large, and Xlarge mesh groups); gear and trip length categories are maintained.

Level 3: Calendar quarter is relaxed to an entire year (as in Level 2), the six regions are relaxed to one region (all six regions combined), and the four mesh groups are relaxed into one mesh group. This level served as a ‘catch-all’ for all remaining fleet sectors that required imputation.

The VTR-NEFOP data set is merged with Level 1 NEFOP data; if a fleet sector needs imputed values, based on the criteria listed above, then the imputed values from the observed trips in Level 1 are transferred to the corresponding VTR-NEFOP fleet sector and species group only if the number of observed trips in the Level 1 data set is greater than 1.  Data from Level 2 and Level 3 are subsequently merged with the VTR-NEFOP data set.    When imputed values are used in the VTR-NEFOP data set, the fleet sector and species group is ‘flagged’ with the imputation level used.   All fleet sectors that need imputation obtain values at one of the three levels.
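The sequential relaxation can be sketched as a key-mapping step followed by a cascade through the three levels. This is an illustrative sketch only. The tuple layout, the half-year labels (quarters 1-2 versus 3-4), the trip-length label, and the dictionary-based donor lookup are all assumptions. The text states the "more than one trip" condition only for Level 1; applying it to Level 2 as well, and treating Level 3 as an unconditional catch-all, is a further assumption.

```python
NE_REGIONS = {"ME/NH", "N_MA", "SNE"}
MA_REGIONS = {"NY/NJ", "DE/MD", "NC/VA"}
SMALL_MESH = {"small", "medium"}
LARGE_MESH = {"none", "large", "Xlarge"}


def relax_key(gear, trip_len, region, mesh, quarter, level):
    """Map a stratum to its aggregated donor key at imputation level 1, 2 or 3.

    Gear and trip length are never relaxed.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    if region not in NE_REGIONS | MA_REGIONS:
        raise ValueError(f"unknown region: {region}")
    if mesh not in SMALL_MESH | LARGE_MESH:
        raise ValueError(f"unknown mesh group: {mesh}")
    region2 = "NE" if region in NE_REGIONS else "MA"
    if level == 1:
        half = "H1" if quarter in (1, 2) else "H2"  # assumed half-year split
        return (gear, trip_len, region2, mesh, half)
    if level == 2:
        mesh2 = "SMALL" if mesh in SMALL_MESH else "LARGE"
        return (gear, trip_len, region2, mesh2, "YEAR")
    return (gear, trip_len, "ALL", "ALL", "YEAR")


def first_donor(cell, donors_by_level, min_trips=2):
    """Return (level, donor) from the first level with a usable donor cell."""
    for level in (1, 2, 3):
        donor = donors_by_level[level].get(relax_key(*cell, level))
        if donor is not None and (level == 3 or donor["trips"] >= min_trips):
            return level, donor
    return None
```

The returned level corresponds to the ‘flag’ recorded with the imputed fleet sector and species group.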

Below is a summary of the number of fleet sectors, by imputation level and species group used in the 2005 sea day allocation.

Imputation Level    Species group
                    NEGF    FSB     MONK
Level 0             150     116     111
Level 1             30      51      44
Level 2             27      41      35
Level 3             20      19      37
Total               227     227     227

To include all fisheries using otter trawl, gillnet and longline gear in the optimization, approximately 33% to 50% of the mean discard rates and variances are imputed or ‘borrowed’.

When a fleet sector and species group is imputed, five variables (number of observed trips, observed d/k ratio, total observed kept pounds, standard error of the d/k ratio, and number of observed days) are estimated with imputed values. Because the aggregated NEFOP data at each level have more observations than the original VTR-NEFOP fleet sector, the imputed values need to be rescaled before they are used. Except for the imputed d/k ratio, the imputed values for the number of observed trips, the total observed kept pounds, the standard error and the number of observed days are re-scaled using a sampling fraction represented by the ratio of the total NEFOP trips for that level, fleet sector and species group to the total VTR trips for that level, fleet sector and species group. Equations used to re-scale imputed values within stratum h are:

$T_{vtr}$ = total VTR trips of Level i
$T_{obs}$ = total NEFOP trips for Level i

$$T_{imp,h} = \frac{T_{obs}}{T_{vtr}} \times Trips_{vtr,h}$$

$$Kept_{imp} = \frac{T_{imp,h}}{T_{obs}} \times (\text{NEFOP kept pounds sum in Level } i)$$

$$SE_{imp} = \left(\frac{T_{obs}}{T_{imp,h}}\right)^{1/2} \times (\text{NEFOP standard error in Level } i)$$

$$Days_{imp} = \frac{T_{imp,h}}{T_{obs}} \times (\text{total number of NEFOP days in Level } i)$$

$T_{imp,h}$ is rounded to a whole number; if $T_{imp,h} < 1$, then $T_{imp,h} = 1$;

where Level i denotes Imputation Level 1, Level 2 or Level 3.
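The rescaling equations translate directly into a few lines of code. The sketch below is illustrative: the function and argument names are hypothetical, and rounding half up is an assumption, since the text says only that the imputed number of trips is rounded to a whole number.

```python
import math


def rescale_imputed(trips_vtr_h, t_vtr, t_obs, kept_sum, se_level, days_sum):
    """Rescale Level-i aggregate values to the size of stratum h.

    trips_vtr_h : VTR trips in stratum h
    t_vtr, t_obs: total VTR and NEFOP trips in the Level-i donor cell
    kept_sum    : NEFOP kept pounds summed over the donor cell
    se_level    : standard error of the d/k ratio in the donor cell
    days_sum    : total NEFOP days in the donor cell
    """
    # Imputed number of observed trips, rounded (half up) and floored at 1.
    t_imp = max(1, math.floor((t_obs / t_vtr) * trips_vtr_h + 0.5))
    return {
        "trips": t_imp,
        "kept": (t_imp / t_obs) * kept_sum,
        # Fewer trips than the donor cell implies a larger standard error.
        "se": math.sqrt(t_obs / t_imp) * se_level,
        "days": (t_imp / t_obs) * days_sum,
    }
```

The imputed d/k ratio itself is not rescaled, so it is not part of the returned values.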


Sampling Theory and Optimization Methods

Fishing trips are considered the primary sample unit in estimating d/k ratios.   Fishing trips generally catch multiple species, some of which are not landed owing to various regulations or market conditions.  We defined three major groups of species: (1) New England groundfish, (2) summer flounder, scup and sea bass, and (3) monkfish.  Fishing trips in a given stratum may catch species from one or more of these groups.  The degree of overlap among species groups has important implications for the efficacy of sampling within strata, i.e., the number of samples necessary to achieve a desired level of precision.  Because some fraction of trips provides information on more than one species group, estimates of sample size based on the assumption of independence will overestimate the number of required trips.  Developing estimators that explicitly account for the magnitude of overlap can circumvent this potential inefficiency. There are two ways to approach this estimation.  One is based on the pattern of overall trips from the vessel trip reports (VTRs).  The second is based on the pattern in observer sampled trips.  In theory, if the observed trips are a representative sample, the proportions in the vessel trip reports and observer trips should be the same.  In practice, the proportions in the observed trips will deviate from those in the VTRs due to sampling variability and other factors.  The selection of observed trips reflects a practical mix of vessel availability, knowledge of vessel operations, familiarity, and safety considerations.  These are, of course, important factors for program management, but it must be recognized that these factors introduce bias into estimates.

Both approaches follow the algorithm described below.  Let Ihij be an indicator variable denoting the presence or absence of species group j within trip i in stratum h.   Then Ihij =1 if species group j is present, else 0.   A design matrix can be used to describe each unique trip within a stratum.  The design matrix appends to each trip record a set of indicator variables that identify the presence/absence of species groups caught.  The following table illustrates a hypothetical case with 7 trips in stratum h.

Example 1

Trip ID    Ih_1 (j=1, NEGF)    Ih_2 (j=2, Monk)    Ih_3 (j=3, FSB)
1          1                   0                   0
2          1                   1                   0
3          1                   1                   1
4          1                   0                   1
5          0                   1                   1
6          0                   1                   0
7          0                   0                   1

Sum        4 (nh1)             4 (nh2)             4 (nh3)
nh=7

In this simple example, four of the seven trips caught New England groundfish, four trips caught monkfish, and four caught summer flounder, scup or sea bass. If all of these trips (or trip types) are equally likely, then the probability of obtaining a sample that yields information on NEGF is 4/7 and so forth. The probability of obtaining information on species group j is the number of trips in the stratum that caught species group j (i.e., nhj) divided by the total number of unique trips within the stratum (nh). Note that the sum of the nhj over species groups is greater than or equal to nh (here 4 + 4 + 4 = 12 > 7), owing to the overlap in coverage for some trips. The probability that a random trip provides information on species group j is defined as

$$p'_{hj} = \frac{n_{hj}}{n_h} \qquad (1)$$

For each stratum, the probabilities can be computed that a random sample will contain information about species group j. The basis for the probability estimator can either be the observed set of trips within a stratum or the total set of trips represented in the VTRs. Applying the same set of indicator variables to the VTR data, one can obtain the population estimates of these quantities as

(2)

Eq. 1 establishes the basis for a random sample from the set of observed trips. Eq. 2 establishes the same basis from the VTR. On first principles, Eq. 2 is a better estimator if a representative sample can be taken in a stratum. Eq. 1 is more appropriate if the set of observed trips within a stratum is representative of those trips available for observation.
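The indicator-variable calculation can be illustrated with the seven trips of Example 1. The sketch below is a minimal illustration with hypothetical names; the same function would apply unchanged to indicator rows built from VTR trips rather than observed trips.

```python
from fractions import Fraction

GROUPS = ("NEGF", "Monk", "FSB")

# Design matrix for Example 1: one row per trip, one 0/1 column per group.
EXAMPLE_1 = [
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (1, 0, 1),
    (0, 1, 1),
    (0, 1, 0),
    (0, 0, 1),
]


def group_probabilities(trips):
    """Share of trips in a stratum that caught each species group."""
    n_h = len(trips)
    counts = [sum(column) for column in zip(*trips)]  # n_hj for each group
    return {g: Fraction(c, n_h) for g, c in zip(GROUPS, counts)}


probs = group_probabilities(EXAMPLE_1)
# Each group has probability 4/7; the probabilities sum to 12/7, which
# exceeds 1 because some trips inform more than one species group.
```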

Using Eq. 1 or 2, it is now possible to examine the effects of altered sample sizes. Let n’h represent the new total number of trips to be taken in stratum h. For the purpose of evaluating the expected change in variance in the component species groups, the n’hj for each species group need to be redefined. This is accomplished using the equation
$$n'_{hj} = p'_{hj} \, n'_h \qquad (3)$$

if Eq. 1 is used, or

(4)

if Eq. 2 (based on VTR) is used to estimate the expected probabilities that a trip in stratum h will capture fish from species group j.

Another worked example will reinforce the basic concept of the expected proportions of samples likely to sample species group j. Consider a stratum with 10 observed trips with Eq.1 used to estimate p’hj.

Example 2

Trip ID    Ih_1 (j=1, NEGF)    Ih_2 (j=2, Monk)    Ih_3 (j=3, FSB)
1          1                   1                   0
2          1                   0                   0
3          1                   0                   1
4          1                   1                   0
5          1                   1                   1
6          0                   0                   1
7          0                   0                   1
8          1                   0                   1
9          0                   1                   0
10         0                   1                   0

Sum        7 (nh1)             4 (nh2)             5 (nh3)
nh=10
Phj        7/10                4/10                5/10

If nh were increased to n’h = 30, then the revised estimates of n’hj would be

n’h1 = (7/10)(30) = 21,   n’h2 = (4/10)(30) = 12,   n’h3 = (5/10)(30) = 15.

Thus, adding 20 trips to stratum h would translate into an expected increase of 14 trips for NEGF (i.e., 21-7), 8 trips for monkfish (i.e., 12-4) and 10 trips for FSB (i.e., 15-5). The increase in the total number of trips for a stratum differs with respect to the pattern of information in the sample. The allowance for non-integer numbers of trips is considered to have a negligible effect. In practice, the actual implementation of a sampling strategy would be based on rounding to the nearest integer, and subject to a lower bound constraint, say nhj = 2.
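The arithmetic of Example 2 can be checked with a short sketch. It starts from the stated sums (7, 4 and 5 of 10 trips) and scales them to a new stratum total. The function name is hypothetical, and rounding half up with a lower bound of 2 follows the "in practice" remark above rather than any documented implementation.

```python
from fractions import Fraction


def expected_group_trips(n_hj, n_h, n_h_new, lower_bound=2):
    """Expected trips per species group when the stratum total becomes n_h_new."""
    result = {}
    for group, n in n_hj.items():
        expected = Fraction(n, n_h) * n_h_new          # p_hj * n'_h
        rounded = int(expected + Fraction(1, 2))       # round half up
        result[group] = max(lower_bound, rounded)
    return result


current = {"NEGF": 7, "Monk": 4, "FSB": 5}
revised = expected_group_trips(current, n_h=10, n_h_new=30)
increase = {g: revised[g] - current[g] for g in current}
# revised  -> {'NEGF': 21, 'Monk': 12, 'FSB': 15}
# increase -> {'NEGF': 14, 'Monk': 8, 'FSB': 10}
```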

Example 2 could be repeated for estimates derived from the VTR data. For such an example, the universe of trips would be much larger.

Measures of Overlap

Venn diagrams of the number of trips in the VTR and NEFOP depict the degree of overlap between the three species groups in the two data sets. In the April 2003-March 2004 VTR database, half of the trips (22,274 trips out of 43,703 trips) are unique to the species groups (Figure 2), while in the NEFOP database, a third of the trips (286 trips out of 1,103 trips) are unique to the species groups (Figure 3). The sampling fractions (NEFOP trips divided by VTR trips) are given in Figure 4. The numbers of trips (and days) in the Venn diagrams are based on whole trips, and therefore slight differences occur in the number of trips between the Venn diagram and d/k ratio analyses (e.g. there are trips in d/k ratio analysis which used two different mesh sizes during a trip).

Observers Days at Sea Constraints

While trips constitute the sampling unit, the total number of sampling units is constrained by the total number of days available during any interval. To consider this component of the sampling design, it is necessary to consider the average trip duration in stratum h. Let thi be the trip duration (days) for the i-th trip in stratum h. The total number of observed trips in stratum h is nh, and the total number of observed days is the sum of thi over those trips. The average trip duration is estimated as

$$\bar{t}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} t_{hi} \qquad (5)$$

The actual number of future observer days that will be required under some new sampling intensity (n’h) is proportional to n’h/ nh. Eq. 5 can also be defined in terms of the durations of the trips in the VTR database. The expected total number of days allocated to stratum h is defined as

(6)

regardless of whether observer or VTR data are used. The average trip duration in stratum h is not influenced by the number of trips allocated, as long as the trips selected are representative of the basis used to define the species composition of the trips. Recall that either the observer database or the VTR database can be used. Thus the total number of observer days allocated to stratum h under some new allocation is

(7)

The grand total number of days at sea that would be allocated given some new set {n’h} would be

(8)
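The days-at-sea bookkeeping described above can be sketched as follows. This is a minimal illustration under the assumption that the days allocated to a stratum equal the new number of trips multiplied by the average trip duration, and that the grand total is the sum over strata. The function names and the example strata are hypothetical.

```python
def mean_trip_days(durations):
    """Average trip duration (days) in a stratum."""
    return sum(durations) / len(durations)


def allocated_days(new_trips, durations_by_stratum):
    """Days at sea implied by a new allocation of trips to strata.

    Returns (days per stratum, grand total of days).
    """
    days = {
        h: n_new * mean_trip_days(durations_by_stratum[h])
        for h, n_new in new_trips.items()
    }
    return days, sum(days.values())


# Hypothetical strata: "A" holds short trips, "B" holds long trips.
durations = {"A": [1, 1, 2], "B": [5, 7]}
per_stratum, total = allocated_days({"A": 6, "B": 3}, durations)
```

Because the average duration differs among strata, the same number of additional trips costs very different numbers of sea days depending on where the trips are placed.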

Some key points in this derivation are:

  • It is not possible to derive any real-world sampling program without considering the key uncertainties related to the probability that the trip will be “successful” and that the cost of sea days may vary.
  • The number of successful trips, relative to the objective of reducing the variance of the estimate, is a random variable, based on a probability estimate. The expected number of actual trips may not actually result in information necessary to improve the precision of the estimate.
  • The “cost” per trip is expressed as the expected duration. Actual duration may also vary within strata, although the stratification is designed to reduce the variation in this component.

Optimization is a technique for maximizing (or minimizing) some quantity of interest subject to one or more constraints. Constraints are the key concept. In this application, we consider upper and lower bounds on the size of the sample within a stratum, a total constraint on the number of available days, and constraints related to acceptable levels of precision. For problems that do not explicitly consider dynamic (i.e., time dependent) processes, a variety of optimization methods can be used including linear and nonlinear programming. For this project, the optimization program, Premium Solver Platform (Version 5.5) developed by Frontline Systems, Inc. (2003) was used.

To address the optimization problem, the overall variance of the discard to kept ratio must first be estimated. The discard ratio for species group j in stratum h is the sum of discard weight over all trips divided by sum of kept weights over all trips:

$$R_{jh} = \frac{\sum_{i} d_{ijh}}{\sum_{i} k_{ijh}} \qquad (9)$$

where dijh is the discards for species group j within trip i in stratum h and kijh is the kept portion of the catch. Rjh is the discard rate for species group j in stratum h. The stratum weighted discard to kept ratio for species group j is obtained by weighted sum of discard ratios over all strata:

(10)

The variable Ih is a zero/one indicator of whether or not a stratum is included in the computation. The indicator variable can be considered as a composite measure of the suitability of stratum h in the estimator. The indicator variable allows a stratum to be filtered on the basis of one or more metrics. A more complete description of the various types of filtering is given in the next section.

The approximate variance of the estimate of Rjh is obtained from a first order Taylor series expansion about the mean:

(11)

where dijh is the total discard weight of species group j in trip i within stratum h, kijh is the total kept weight of species group j in trip i within stratum h, njh is the sample size (number of trips) that caught species group j in stratum h, and kjh bar is the mean kept weight of species group j within stratum h. Note that in this formulation of the variance, the finite population correction factor (fpc), i.e., one minus the sampling fraction within the stratum, has been omitted. This has been done to improve readability. The fpc is included, however, in Eq. 12 for the total variance of the d/k ratio.
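For readers who want to compute these quantities, the sketch below uses the standard textbook first-order (Taylor series) approximation for the variance of a ratio estimator, without the fpc. It is assumed, not confirmed, that this is algebraically equivalent to the form used in Eq. 11. Function names are hypothetical, and the example weights are invented for illustration.

```python
import math


def dk_ratio(discards, kept):
    """Discard-to-kept ratio: total discards divided by total kept weight."""
    return sum(discards) / sum(kept)


def dk_ratio_variance(discards, kept):
    """Approximate variance of the d/k ratio (no finite population correction).

    Var(R) ~ sum_i (d_i - R * k_i)^2 / ((n - 1) * n * k_bar^2)
    """
    n = len(kept)
    if n < 2:
        raise ValueError("at least two trips are needed to estimate a variance")
    r = dk_ratio(discards, kept)
    k_bar = sum(kept) / n
    ss = sum((d - r * k) ** 2 for d, k in zip(discards, kept))
    return ss / ((n - 1) * n * k_bar ** 2)


# Invented example: three trips with equal kept weight.
d = [2.0, 1.0, 3.0]
k = [10.0, 10.0, 10.0]
ratio = dk_ratio(d, k)                        # 6 / 30
std_err = math.sqrt(dk_ratio_variance(d, k))
```

The requirement of at least two trips mirrors the imputation rule that a single observed trip does not yield a usable standard error.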

The variance of the d/k ratio for species group j over the entire set of strata is estimated using standard sampling theory methodology for a stratified random design as
(12)

The overall coefficient of variation for the discard/kept ratio is defined as

(13)

It is now possible to define an overall estimate of the relative precision of the d/k ratio across all species groups as

(14)

where each species group j is assigned an arbitrary weighting factor. In this formulation, the weighting factors can be used as binary factors (0,1) to examine the allocations individually for species groups.

The optimization tool evaluates the potential improvements in the precision of the discard ratio through reallocation of the number of trips to individual strata. Equation 11 illustrates that the variance of the ratio decreases as the number of trips (nh) increases. Assuming that the data yield representative estimates of the stratum specific variances, then the reduction in total variance can be examined as a function of alternative allocation schemes for each stratum. If n*h is defined as the optimal number of trips taken in stratum h, then the variance of the overall ratio is estimated as

(15)

The optimization problem can now be posed as the minimization of the CV of the composite ratio estimate, subject to a total days at sea constraint (T C) and constraints on the number of trips per stratum.

(16)

Alternatively, the optimization problem can be defined with the objective of minimizing the total number of days at sea, subject to an acceptable coefficient of variation (CVCRIT). This version of the model can be written as:

(17)

Another relevant consideration is that a trip may not yield information on any of the target species groups. In some strata, for example, a number of trips fail to capture groundfish, monkfish or the summer flounder, scup and sea bass mixture. To protect against this possibility, it is desirable to inflate the optimal number of trip estimates by the ratio of Nh to N’h where Nh is the total number of trips in stratum h and N’h is the number of trips that obtained information on one or more of the species groups.
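The report's allocations were produced with a commercial nonlinear solver. As a stand-in that conveys the idea, the sketch below uses the classical closed-form optimum allocation for a stratified design with a linear cost constraint (see Cochran 1977), in which trips are allocated in proportion to the stratum weight times the stratum standard deviation divided by the square root of the cost per trip. It is a deliberately simplified substitute: it treats a single species group, ignores the fpc, the overlap among species groups, and the upper and lower bounds on trips per stratum. All names and inputs are hypothetical.

```python
import math


def cost_optimal_allocation(weights, sds, days_per_trip, total_days):
    """Trips per stratum minimizing sum(W_h^2 * S_h^2 / n_h)
    subject to sum(days_per_trip_h * n_h) == total_days.

    Closed form: n_h = total_days * (W_h * S_h / sqrt(c_h)) / sum(W_h * S_h * sqrt(c_h))
    """
    denom = sum(w * s * math.sqrt(c)
                for w, s, c in zip(weights, sds, days_per_trip))
    return [total_days * (w * s / math.sqrt(c)) / denom
            for w, s, c in zip(weights, sds, days_per_trip)]


# Two equally weighted, equally variable strata; trips in the second
# stratum last four times as long, so it receives half as many trips.
trips = cost_optimal_allocation(
    weights=[0.5, 0.5], sds=[1.0, 1.0], days_per_trip=[1.0, 4.0], total_days=60.0
)
days_used = sum(n * c for n, c in zip(trips, [1.0, 4.0]))
```

The result is fractional; a practical allocation would round the trips and then apply the per-stratum bounds, which is where a numerical solver becomes necessary.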


Application of the Model

Using the optimization algorithm to minimize the variance of the discard estimates subject to a given number of sea days, the allocation of observer sea days for the Mid-Atlantic (M-A) and New England (NE) regions was optimized separately and the resulting allocated sea days combined. Separate analyses were conducted because of differential sea days constraints (mandated sea days for New England groundfish versus non-mandated sea days for the Mid-Atlantic region). Before the optimization began, a portion of the available sea days were set aside to cover fisheries which do not enter the optimization process (e.g. scallop dredge fishery). For these fisheries, sea days are allocated proportional to fishing effort (number of trips or number of days fished).

The Mid-Atlantic optimization used data from the SNE, NY/NJ, DE/MD and NC/VA regions with the species weighting coefficients set to 1 for both FSB and MONK and to 0 for NEGF. The NE optimization used data from the SNE, N_MA, and ME/NH regions, with the species weighting coefficients set to 1 for NEGF and to 0 for both FSB and MONK. Data from the SNE region were included in both optimizations due to the intersection of the NE and M-A regions. Stratum indexes were applied to reduce the data set to contain only the relevant fisheries.

Below is a summary of the indexes and thresholds used in the NE and M-A sea day optimizations.

NE region trip and landings settings and thresholds

Switch                 Setting    Threshold (fraction)    Description of Filters that Operate on Entire Strata
I(L_negf%)             1          0.0025                  Landings of NEGF < Threshold => 0, else 1
I(L_fsb%)              (All)      0.0001                  Landings of FSB < Threshold => 0, else 1
I(L_monk%)             (All)      0.0001                  Landings of Monk < Threshold => 0, else 1
sum(I(L_all%))         (All)      NA                      If any of the landings indices for NEGF, FSB or Monk = 1, then => 1, else 0
I(Nh_negf%)            1          0.0001                  Trips of NEGF < Threshold => 0, else 1
I(Nh_fsb%)             (All)      0.0001                  Trips of FSB < Threshold => 0, else 1
I(Nh_monk%)            (All)      0.0001                  Trips of Monk < Threshold => 0, else 1
I(%TotVTR_3sp)         1          0.00005                 Filter on % of total landings of 3 species groups
Filter on All Trips    0          NA                      Excludes entire strata if value = 0

M-A region trip and landings settings and thresholds

Switch                 Setting    Threshold (fraction)    Description of Filters that Operate on Entire Strata
I(L_negf%)             (All)      0.0025                  Landings of NEGF < Threshold => 0, else 1
I(L_fsb%)              1          0.0001                  Landings of FSB < Threshold => 0, else 1
I(L_monk%)             1          0.0001                  Landings of Monk < Threshold => 0, else 1
sum(I(L_all%))         (All)      NA                      If any of the landings indices for NEGF, FSB or Monk = 1, then => 1, else 0
I(Nh_negf%)            (All)      0.0001                  Trips of NEGF < Threshold => 0, else 1
I(Nh_fsb%)             1          0.0001                  Trips of FSB < Threshold => 0, else 1
I(Nh_monk%)            1          0.0001                  Trips of Monk < Threshold => 0, else 1
I(%TotVTR_3sp)         1          0.00005                 Filter on % of total landings of 3 species groups
Filter on All Trips    0          NA                      Excludes entire strata if value = 0

NE and M-A regions d/k ratio thresholds

Filter          Threshold (d/k ratio)    Description of Filters that Operate on Individual Cells (Species within Strata)    Number of Cells Included    Number of Cells Excluded
Max d/k_NEGF    1                        Maximum d/k ratio used for NEGF. Values > Threshold excluded                        25                          11
Max d/k_FSB     2                        Maximum d/k ratio used for FSB. Values > Threshold excluded                         32                          4
Max d/k_Monk    2                        Maximum d/k ratio used for Monkfish. Values > Threshold excluded                    33                          3

Some ‘post-processing’ of the allocation of optimized sea days was necessary.  Even though one or more indicator variables (i.e., filters) were applied during optimization, it was necessary to fine-tune the sea day allocations by applying a minimum and maximum amount of coverage, and to maintain coverage of fishing activity throughout the year.  The optimized sea days were divided by the average trip duration for each stratum to estimate the projected number of observed trips.  If the projected number of observed trips was less than 3 trips per stratum, then the sea days were redistributed to other strata representing more relevant fisheries.  If the number of potential observed trips in a stratum exceeded 15% of the VTR trips, then the sea days in that stratum were reduced to the number of sea days representing 15% (potential observer trips/VTR trips) coverage.  The sea days from strata exceeding the 15% coverage cap were reassigned to other strata.
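The minimum and maximum coverage rules can be sketched for a single stratum. The sketch converts sea days to projected trips by dividing by the average trip duration (days per trip), then applies the 3-trip minimum and the 15% cap. The function name and return convention are hypothetical, and the report does not specify how released days were distributed among the receiving strata, so the sketch only reports how many days are freed.

```python
def post_process(days, mean_trip_days, vtr_trips, min_trips=3, cap=0.15):
    """Apply minimum and maximum coverage rules to one stratum.

    Returns (days retained in the stratum, days released for reassignment).
    """
    projected_trips = days / mean_trip_days
    if projected_trips < min_trips:
        # Too few trips to be useful: release all days to other strata.
        return 0.0, days
    max_trips = cap * vtr_trips
    if projected_trips > max_trips:
        # Reduce to the days that correspond to 15% trip coverage.
        retained = max_trips * mean_trip_days
        return retained, days - retained
    return days, 0.0
```

For example, a stratum with 100 VTR trips, 2-day trips and 60 optimized sea days projects to 30 trips, so it would be cut back to 15 trips (30 days) with 30 days released.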

The number of unique vessels and the vessel selection protocols in a stratum limit the number of trips that can be observed in that stratum.   The number of unique vessels varies among strata; in the 2005 sea day optimization, the number of unique vessels in a stratum ranged between 1 and 146 vessels, with 85% of the strata having 50 vessels or fewer.   The vessel selection protocols state that a vessel is not to be observed more than twice during a month.  As an approximate guide for balancing between the potential number of observed trips and the number of unique vessels in a stratum, a 15% trip coverage cap was selected to prevent assigning more sea days to a stratum than the number of vessels could support.  The 15% cap prevented clustering of sampling effort, particularly in instances where the estimate of the variance of d/k might be imprecise.  In these instances, the optimization model will tend to allocate a large number of trips to such strata to reduce the standard error of the estimate.  When the analysis was restricted to the relevant strata for the New England groundfish fisheries, the 15% cap was binding in only 4 of 33 strata for the observer coverage allocation scheme based on 2,708 observer days.

The diagnostics within the optimization tool were used to evaluate the imputation process.  The optimization algorithm calculates the d/k ratios and the variance estimates for 'all data' and for 'data without imputed values'.  Generally, the d/k ratios and variance estimates were similar between the 'all data' and 'data without imputed values' for each species group.  This indicates that the imputation generally provided consistent values across the three levels of aggregation.


Precision, Bias and Sampling Intensity: A Rebuttal to E.A Babcock et al. (2003)

Understanding the sampling properties of estimates of bycatch derived from observer programs and other sources with respect to accuracy and bias is critical.  This section reviews issues related to bycatch estimation in observer programs with an emphasis on potential biases that may exist.  The NMFS national bycatch report (NMFS 2004) emphasizes that wherever possible, attempts to detect and guard against bias should be made in observer programs.  The report strongly advocates the development of rigorous randomization procedures in sample selection to help ensure representative sampling.  All can agree that with unlimited resources, the more observer coverage the better.  The real issue, however, is how to allocate finite resources to meet multiple requirements for stock assessment and protected species evaluation.  The cases that Babcock et al. (2003) point to as success stories typically have relatively few boats involved compared to many other fisheries.  These cases are not representative overall of the issues facing program managers.

Babcock et al. (2003) insufficiently distinguish between two very different types of bias.  The first type arises when non-representative sampling occurs.  The second type is related to the statistical properties of the consistency of the estimators.  These two types of bias are very different and it is important to be clear which type of bias is under consideration. The second type of bias is typically reduced with sufficiently large sample size.  However, bias of the first type may not be addressed by increases in sample size if fishermen refuse to take observers, if certain classes of boats cannot accommodate observers, etc.   Babcock et al. (2003) take as an article of faith that increasing the number of trips will reduce bias.  Some of the solutions identified by Babcock et al. (2003) for correcting bias (e.g., the use of bootstrap estimators) apply to correcting bias of the second type.  However, no amount of bootstrapping will overcome non-representative sampling.

The mean square error (MSE) of an estimate is composed of two additive elements, the variance of the estimate and the square of the bias (defined as the difference between the mean of the sample and the true population value).  Cochran  (1977) notes that if bias is less than 10% of the standard deviation of the estimate, the effect of this bias on the accuracy of the estimate is negligible. As noted by Babcock et al. (2003), most work on the properties of estimates derived from observer programs has focused on the variance component, with far fewer studies examining bias.  For reasons described in detail below, we believe that estimating the bias of the first type is more difficult than intimated by Babcock et al. (2003).  It is nonetheless important to try to estimate this quantity.  Focusing on the precision part of the MSE in certain analyses does not imply that bias is unimportant, or that it should be dismissed as insolvable as suggested by Babcock et al. (2003).
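The arithmetic behind the 10% rule of thumb can be made explicit. The block below is a short worked illustration; the symbols (the estimator, its standard deviation and its bias B) are introduced here for convenience and are not notation from the report.

```latex
% Decomposition of the mean square error of an estimator \hat{\theta}
% with variance \sigma^2 and bias B:
\mathrm{MSE}(\hat{\theta}) = \sigma^2 + B^2

% If the bias equals 10% of the standard deviation, B = 0.1\,\sigma:
\mathrm{MSE}(\hat{\theta}) = \sigma^2 + 0.01\,\sigma^2 = 1.01\,\sigma^2,
\qquad
\sqrt{\mathrm{MSE}(\hat{\theta})} = \sqrt{1.01}\,\sigma \approx 1.005\,\sigma
```

A bias of this size therefore inflates the root mean square error by only about half of one percent relative to the standard deviation alone.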

A critical element of the arguments developed by Babcock et al. (2003) appears to be that increasing the number of trips sampled will, by itself, reduce bias of the first type.  This assertion, if true, is important.  However, no corroborative evidence is provided.  The argument is that fishermen will change behavior if they are subjected to a higher probability of being included in a sample, or of being sampled more frequently by observers.  In essence, fishermen will be less likely to fish in a non-typical manner when an observer is on board if the probability of selection is higher.  This may not be true if say a particular fishing trip has a 20% chance of being selected vs. a 10% chance and if the fishermen do not know in advance how many trips they may have to accommodate within a specified time period.   In any event, we doubt that this can be calculated unless a model of human behavior is part of the estimation procedure.  

Babcock et al. (2003) report that Sampson (2002) detected statistically significant differences between a multivariate indicator of landings composition by participants in the Enhanced Data Collection Project (EDCP) of the Oregon Department of Fish and Wildlife and the composition of landings by the entire groundfish trawl fleet.  This analysis is used to indicate that biases exist in voluntary programs such as the EDCP and that it is possible to use similar approaches to identify bias in observer programs in general.  What Babcock et al. do not report is that Sampson indicated that the multivariate analysis employed (Principal Components Analysis) was only “moderately successful” in capturing the properties of the data.  The first three principal components accounted for 15.4, 12.0, and 8.0% of the variance, respectively (35.4% in total), for trips landing more than 10,000 lbs in which hake comprised less than 50% of the total (designated “Big” trips by Sampson).  For trips less than 10,000 lbs in which hake comprised less than 50% of the total (“Small” trips), the first three principal components accounted for 13.7, 10.4, and 9.0% of the variance (33.1% in total).  Sampson (2002) reported significant differences between the participants in the EDCP and the total fleet in the 1st and 3rd principal components for both Big and Small trips and concluded that the EDCP fleet may not be representative of the entire fleet.  However, because the first three PCs captured only a moderate fraction of the variance, these analyses should be viewed with caution. It is worth noting that Sampson provided canonical variable plots of PCA 1 against PCA 2 (Figure 6a and 6b of his report) in which both the information from the EDCP and the whole fleet are superimposed and these show that the data from the EDCP do not appear to be markedly different from the total fleet.  A truly important bias should show up clearly in these plots, which take into account more of the variance of the samples than the individual t-tests actually used in the report.

The general issue of testing for bias in observer data using landings data raises some important questions concerning the inferences that can be drawn.  In particular, if no significant differences are detected between observer and landings data, this does not guarantee that there is no bias in the estimates of discards.

The other major source of information that could be used to test the representativeness of observer data is to test against self-reported estimates by fishermen.  Sampson (2002) made such an analysis for the EDCP data and detected differences.  In this case, it was inferred that the self-reported estimates were not accurate.  In contrast, Liggens (1997) found no differences between observer data for catch and discards against fleet wide estimates.  In general, self-reported estimates are rightly viewed with caution and this is the most commonly available type of discard information against which to compare observer data.

To deal with logistical constraints and their effect on observer programs, Babcock et al. (2003) cite the work of Cotter et al. (2002) using a probability proportional to size (PPS) sampling allocation procedure.  However, Cotter et al. (2002) concluded that this approach did not markedly improve the performance of the estimators.

Babcock et al. (2003) refer to the method of collapsing strata as an ad hoc procedure when, in fact, it is a very well established method (see Cochran 1977).  Bias can occur using this method if an investigator deliberately chooses similar strata to combine.  However, methods in which objective rules for combining strata are employed are much less likely to cause bias.

Babcock et al. (2003) assert that Fogarty and Gabriel (2002) assumed that the sampling fraction did not matter. In fact, Fogarty and Gabriel (2002) noted that the sampling fraction does affect the precision of the estimate through the finite population correction factor.  The effect indicated by Babcock et al. (2003) is a very well established property of the statistical estimators employed.  Fogarty and Gabriel (2002) noted in their analysis that “Ignoring the finite population correction factor results in an overestimate of the standard error…” Fogarty and Gabriel (2002) did not include the FPC in their estimates so as to provide a conservative estimate of the variance (e.g. biased on the high side).  This is very different than assuming that the sampling fraction does not matter.

Recommendations made by the NMFS National Working Group on Bycatch (NMFS 2004) largely address the issues of major concern – the importance of obtaining representative sampling, careful consideration of stratification, etc.  We recommend that information from observer trips (catch, trip duration, number of hauls/tows, fishing location etc.) also be checked against independent sources of information to see if differences can be detected.  The only solution that Babcock et al. (2003) provide when such a bias is detected is to increase the number of trips covered by observers.  As noted above, this may or may not be effective.  Other solutions to the problem need to be explored, as well as increasing observer coverage when analyses indicate it is cost-effective to do so given finite resources and competing programmatic needs. 


An Evaluation of Bias in the Northeast Fisheries Observer (Sea Sampling) Program

Several tests were conducted to address the potential sources of bias.  We compared several measures of performance for vessels with and without observers present.  Bias can arise if the observed trips within a stratum are not representative of the other vessels within the stratum. Such bias could arise if the vessels with observers on board consistently catch more or less than other vessels, if the average trip durations change, or if vessels fish in different areas.  Each of these hypotheses was tested by comparing observable properties in strata having data from vessels with and without observers. 

All vessels are required to report the total trip landings, the number of days absent from port, and the primary statistical area fished.  Average catches (pounds landed) for observed and total trips compare favorably (Figure 5), and follow an expected linear relationship.  If the observed and unobserved trips within a stratum measure the same underlying process, one would expect no statistical difference in the average catches (and the standard deviations) between the VTR and observer data sets.  An examination of the distribution of these differences (Figures 6A and 6B) indicates no evidence of systematic bias.  The mean difference of 238 pounds in average catch rates between the two data sets is not significantly different from zero (p=0.59, df=84).   As well, a paired t-test of the stratum specific standard deviations of pounds kept showed no significant difference from zero (p=0.08).  A strong correlation was detected in trip duration between observed and unobserved trips (Figure 7), with observed trips averaging about a half-day longer (p = 0.01) (Figure 8A).  However, the difference in stratum specific standard deviations of trip length was not significantly different from zero (p = 0.60) (Figure 8B).  Some skewing of the differences in mean trip durations is evident, with observed trips being slightly longer.
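
The paired comparison of stratum means can be sketched as follows.  This is an illustration only: the stratum values are hypothetical, the function name is an assumption, and the sketch returns the t statistic and degrees of freedom rather than a p-value (a p-value would require the t distribution, available for example through scipy.stats.ttest_rel).

```python
import math
import statistics

def paired_t_statistic(observed, reference):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    diffs = [o - r for o, r in zip(observed, reference)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / se, n - 1

# Hypothetical stratum means of pounds kept, one (observer, VTR) pair per stratum.
observer_means = [5200.0, 830.0, 12100.0, 2950.0, 640.0]
vtr_means = [4900.0, 910.0, 11800.0, 3100.0, 600.0]

mean_diff, t_stat, df = paired_t_statistic(observer_means, vtr_means)
print(mean_diff, t_stat, df)
```

The degrees of freedom equal the number of paired strata minus one, so the df=84 reported above corresponds to 85 strata with both observer and VTR data.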

Two measures of spatial coherence were also examined.  Within stratum h, the expected number of observer trips in statistical area j was computed as the product of the proportion of VTR trips in statistical area j of stratum h (Vjh) and the number of observed trips in the stratum (nh).  Thus, Ejh = Vjh * nh.  These expectations can then be compared to the actual frequencies (Ojh) of observed trips by statistical area.  Results of these analyses indicate that the spatial distribution of fishing effort for trips with observers on board closely matches the spatial distribution of trips for the stratum as a whole (Table 4).  It was possible to compute chi-square statistics for 65 strata.  The null hypothesis of observer proportions equal to VTR proportions was rejected (P<0.05) in 20 of the 65 comparisons.  Of these 20 cases, 11 were from ports in Southern New England and Mid-Atlantic states.  Of the remaining nine cases, five involved the large and extra-large gill net fisheries that land both groundfish and monkfish. Thus, the null hypothesis of equivalent spatial distribution of sampling was rejected in only 4 of 50 cases, a rejection rate only slightly higher than expected from chance alone.
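
The comparison of expected and observed frequencies within one stratum can be sketched as follows.  The statistical area codes and trip counts are hypothetical, the function name is an assumption, and the handling of areas with no VTR trips (they are skipped) is a choice made for this sketch rather than a rule stated in this report.

```python
def spatial_chi_square(vtr_trips_by_area, observed_trips_by_area):
    """Chi-square statistic comparing observer trips with VTR proportions.

    Expected count: E_jh = V_jh * n_h, where V_jh is the proportion of VTR
    trips in area j and n_h is the number of observed trips in the stratum.
    """
    total_vtr = sum(vtr_trips_by_area.values())
    n_h = sum(observed_trips_by_area.values())
    chi_square = 0.0
    cells = 0
    for area, vtr_count in vtr_trips_by_area.items():
        if vtr_count == 0:
            continue  # no expectation can be formed for this area
        expected = (vtr_count / total_vtr) * n_h
        observed = observed_trips_by_area.get(area, 0)
        chi_square += (observed - expected) ** 2 / expected
        cells += 1
    return chi_square, cells - 1  # statistic and degrees of freedom

# Hypothetical stratum with four statistical areas.
vtr = {"513": 400, "514": 300, "515": 200, "521": 100}
obs = {"513": 22, "514": 13, "515": 10, "521": 5}

chi_square, df = spatial_chi_square(vtr, obs)
print(chi_square, df)
```

With three degrees of freedom the 5% critical value is about 7.81, so the small statistic in this hypothetical stratum would not lead to rejection of the null hypothesis.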

As a final measure of the potential spatial bias, a paper by Murawski et al. (2005 in press) is instructive.  In this paper, information is presented on the spatial distribution of otter trawl fishing effort for vessels with Vessel Monitoring Systems (VMS) and compared with the distribution of fishing effort from observed trips (Figure 9).  Qualitatively, the spatial distributions match very well with high concentrations of effort near the boundaries of existing closed areas on Georges Bank and within the Gulf of Maine. Moreover, the effort concentration profiles deduced from VMS data coincide almost exactly with the profiles derived from the observed trips.  Overall, these comparisons suggest strong coherency between these two independent measures of fishing locations.


Sources of Uncertainty

In the Northeast, every effort is made to ensure representative observer coverage. This is accomplished by stratifying the fleet into homogeneous spatial, temporal and gear groups and by randomly selecting vessels from these strata. Stratification and randomization of sampling units are basic principles of survey design (e.g., Cochran 1977; Thompson 2002) and have been used in previous studies of bycatch to improve both “knowledge of the fleet” (Cotter et al. 2002) and precision of estimates (Allen et al. 2002; Borges et al. 2004).  VTR data are used to produce a list of fishing vessels, by quarter and fleet sector.  The vessel list contains a randomly ordered list of all vessels that participated in each fleet sector.  To obtain a representative sample of the fleet, the NEFOP Area Coordinators use this vessel list, in addition to their local knowledge of fleet activity, to identify vessels on which to place observers.  Vessels are required to take an observer if requested to do so.  The NEFOP has standard protocols regarding vessel selection.  A vessel, using the same gear, is not observed more than twice in the same month; this prevents repeated observations from the same vessel.  The NEFOP Area Coordinators have protocols for documenting refusals; a refusal occurs when a vessel owner/captain is asked to take an observer and the owner/captain declines, or agrees but does not follow through (i.e., the vessel leaves the dock without the observer on board).  Refusals are forwarded to Law Enforcement.  A vessel owner can be prosecuted for failing to take an observer.
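
The construction of a randomly ordered vessel list by quarter and fleet sector can be sketched as follows.  This is not the NEFOP software: the record layout, the function name, and the vessel identifiers are hypothetical, and the sketch covers only the random ordering, not the Area Coordinators' use of local knowledge or the twice-per-month rule.

```python
import random
from collections import defaultdict

def build_vessel_lists(vtr_records, seed=None):
    """Group vessels by (quarter, fleet sector) and randomly order each group.

    vtr_records is an iterable of (quarter, fleet_sector, vessel_id) tuples,
    one per reported trip; a vessel appears once per stratum in the output.
    """
    rng = random.Random(seed)
    vessels = defaultdict(set)
    for quarter, sector, vessel_id in vtr_records:
        vessels[(quarter, sector)].add(vessel_id)

    lists = {}
    for stratum in sorted(vessels):
        ordered = sorted(vessels[stratum])  # fixed starting order for reproducibility
        rng.shuffle(ordered)                # random order within the stratum
        lists[stratum] = ordered
    return lists

# Hypothetical trip records.
records = [
    (1, "otter trawl", "A"),
    (1, "otter trawl", "B"),
    (1, "otter trawl", "A"),
    (1, "gillnet", "C"),
    (2, "otter trawl", "A"),
]
vessel_lists = build_vessel_lists(records, seed=1)
print(vessel_lists)
```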

An objective process is used for imputation of missing values in unsampled strata.  The imputation methodology helps identify gaps in sampling strategy and is an important component for ongoing improvements of the survey design. Stratoudakis et al. (1999) employed a post-stratification technique of “collapsing strata” as a way of dealing with unsampled strata. Our method of imputing means and variances for unsampled strata builds on this approach by utilizing information in comparable strata as a basis for initial sample allocation. Imputation represents a tradeoff between a realistic survey consistent with known fishing patterns and a less realistic pooled survey. Excessive imputation, however, can be indicative of an overly ambitious stratification approach; utilizing the observer data at an unrealistically fine temporal or spatial scale (say daily estimates in a small area) not only leads to an excessive extrapolation, but also violates the premise that observations in the current year are sufficient to predict patterns in the following year. 

Persistence of annual patterns is critical to the estimation of an ‘optimal’ scheme.  As regulations change and fishing patterns shift, using data based on fleet activity in the preceding year may be problematic. Using the current year’s fishing activity pattern to predict future fishing patterns within strata cannot account for changes induced by variations in resource abundance, revenues, or management regimes. In a study of discards in the North Sea, Stratoudakis et al. (1998) reported immediate increases in discarding rates following increases in minimum size limits but noted consistent patterns over time and among gears for higher value species such as cod and haddock. Without a predictive model of human behavior, it is not possible to anticipate fine-scale changes in fishing patterns. Rochet et al. (2002) were unable to find reliable predictor variables for bycatch, but it should be noted that their study examined only 26 trips, about two orders of magnitude fewer than the number of trips considered in this report.

A related source of uncertainty is the ability to make inferences about specific species, stocks or age groups.  Our evaluation of the Northeast Observer Program considers discard to kept (d/k) ratios at the level of species groups. This approach is consistent with recent literature (Allen et al. 2001; Borges et al. 2004).   An optimal strategy for New England groundfish as a group, however, will not necessarily be optimal for age 2 haddock on Georges Bank.  The precision of discard information required at this level will typically exceed the nominal levels predicted as a result of optimal sampling.  Figure 10 illustrates the coefficient of variation (CV) of the overall New England groundfish discard ratio estimate as a function of total observer days allotted to this fishery.  Assuming that 2,708 sea days can be allocated in an optimal manner in 2005, the predicted CV of the d/k ratio is well below 4%.  The predicted CV drops to 2.5% at about 4,000 days and drops to about 1% at 20,000 days (about 50% coverage).  The continuously decreasing slope of the relationship between CV and observer sea days reflects the reduced effectiveness of additional days as a way of improving overall precision.

Several important points are relevant to the interpretation of Figure 10.  First, any non-optimal allocation of sampling effort will tend to increase the overall CV of the d/k ratio.  Non-optimal allocations occur when the desired sampling plan cannot be followed, or when the pattern of landings among the strata in the current year differs from the pattern used as a basis for the optimal allocation scheme.  Second, the CV of the overall d/k ratio is smaller than the precision of the individual components.  Thus, the CV of the d/k ratio for a particular gear type or for a d/k ratio based on a finer temporal or spatial scale will generally be greater than the composite estimate.  This property is illustrated in Figures 11 and 12 for quarterly estimates in the New England groundfish otter trawl and gillnet fisheries, respectively.  Note that the number of observed otter trawl trips would need to be tripled to reduce the CV of the d/k ratio from 20% to 10%.
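
The diminishing return from additional sampling can be illustrated with a minimal sketch.  It assumes simple random sampling of trips, so that the squared CV is proportional to (1/n - 1/N); this is a simplification and not the stratified optimization used to produce Figures 10 through 12.  The function name and the trip counts are hypothetical.

```python
def trips_needed(current_trips, current_cv, target_cv, total_trips=None):
    """Number of sampled trips needed to reach target_cv.

    Assumes CV^2 is proportional to (1/n - 1/N) when total_trips (N) is
    given, and to 1/n when the finite population correction is ignored.
    """
    ratio_sq = (target_cv / current_cv) ** 2
    if total_trips is None:
        return current_trips / ratio_sq
    inverse_n = 1.0 / total_trips + ratio_sq * (
        1.0 / current_trips - 1.0 / total_trips
    )
    return 1.0 / inverse_n

# Hypothetical fishery: 100 of 900 trips observed, CV to be halved from 20% to 10%.
n_without_fpc = trips_needed(100, 0.20, 0.10)
n_with_fpc = trips_needed(100, 0.20, 0.10, total_trips=900)
print(round(n_without_fpc), round(n_with_fpc))
```

Under these assumptions, halving the CV requires four times as many trips when the finite population correction is ignored, and about three times as many when roughly one ninth of the trips are already observed.  The hypothetical values were chosen to show that a tripling of effort is plausible under this simplified model; they are not estimates for the otter trawl fishery.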

The coefficients of variation (CVs) of the d/k ratios for New England groundfish are well below the 20% - 30% CV range established by the Atlantic Coastal Cooperative Statistics Program (ACCSP) for high priority commercial fisheries (ACCSP 2001) and by NMFS’s National Working Group on Bycatch (NWGB) (NMFS 2004).  The NWGB recommends:  “For fishery resources, excluding protected species, caught as bycatch in a fishery, the recommended precision goal is a 20-30% CV for estimates of total discards (aggregated over all species) for the fishery; or if total catch cannot be divided into discards and retained catch then the recommended goal for estimates of total catch is a CV of 20-30%” (NMFS 2004).  Assuming that landings are known without error, the precision of estimated total discard for New England groundfish equals the precision of the d/k ratio for this fishery.
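
The equivalence of the two CVs follows from scaling the d/k ratio by a constant, as the following sketch shows.  The function name and all numerical values are hypothetical; the only assumption carried over from the text is that landings are known without error.

```python
def total_discard_estimate(dk_ratio, dk_ratio_se, landings):
    """Expand a d/k ratio to total discards, treating landings as a known constant.

    Multiplying by a constant scales the estimate and its standard error
    by the same factor, so the CV is unchanged.
    """
    total = dk_ratio * landings
    total_se = dk_ratio_se * landings
    return total, total_se, total_se / total

# Hypothetical values: d/k ratio of 0.08 with a CV of 4%, landings of 1,000,000 lb.
ratio, ratio_se, landings = 0.08, 0.0032, 1_000_000.0
total, total_se, total_cv = total_discard_estimate(ratio, ratio_se, landings)
print(total, total_se, total_cv, ratio_se / ratio)
```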

A decrease in precision of the d/k ratio is also expected for any single species analysis.  For example, the CV of the d/k ratio for haddock alone will probably be much greater than the CV of the d/k ratio for the overall groundfish complex.  Once again, it is important to remember that the sampling program must be based on observable properties of the strata, not on the outcome of the experiment.  Any efforts to improve the precision of the d/k ratio for a single species will come at the expense of reduced precision for other species.  Moreover, oversampling of a particular group of vessels may introduce undesirable properties (e.g., repeat trips on a single vessel) that can make the sampling less representative. 

An exact definition of an acceptable level of bias and precision depends on the objectives of the analyses and the levels of acceptable risk to the fishery resource and the fishery.  The acceptable level of risk must be defined externally by managers but should, at a minimum, consider the risk of stock collapse if management actions are compromised by imprecise information on discards. From the analyses presented in this report, it would appear that the level of precision is high for the groundfish resource as a whole and that there is little evidence of bias in the discard rates.

Presently the optimization model uses aggregate d/k ratios, which are appropriate for most fisheries; however, for other fisheries, d/e ratios are more appropriate.  The optimization algorithm can handle datasets containing either type of ratio, but not both in the same set (without external weighting).    Input data sets with d/e ratios have been developed, but have not yet been incorporated into the overall process.  A comparison of the precision of alternative estimators of discard ratios is the subject of ongoing research.


ACKNOWLEDGMENTS

We wish to thank Mark Terceiro, Katherine Sosebee, and Ralph Mayo for their insights and assistance in identifying the fishery strata,  the bases for imputation, and the iterative process of refining the application. We also thank Fred Serchuk for his constructive comments and review.


REFERENCES

ACCSP (Atlantic Coastal Cooperative Statistics Program).  2001. Technical Source Document Series V: Biological Module and Discard, Release and Protected Species Interactions Module. June 28, 2001 draft.  137 p.  On-line document:  http://www.accsp.org/tsdocs.htm.

Allen, M., D. Kilpatrick, M. Armstrong, R. Briggs, N. Perez, and G. Course. 2001. Evaluation of sampling methods to quantify discarded fish using data collected during discards project EC 95/-94 by Northern Ireland, England, and Spain.  Fish. Res. 49:241-254.

Allen, M., D. Kilpatrick, M. Armstrong, R. Briggs, G. Course, and N. Perez. 2002. Multistage cluster sampling design and optimal sampling sizes for estimation of fish discards from commercial trawlers. Fish. Res. 55:11-24.

Babcock, E.A., E. K. Pikitch and C.G. Hudson.  2003.  How much observer coverage is enough to adequately estimate bycatch?  Report of the Pew Institute for Ocean Science, Rosenstiel School of Marine and Atmospheric Science, University of Miami, Miami, FL.  On-line version:  http://www.oceana.org/uploads/BabcockPikitchGray2003FinalReport.pdf

Borges, L., A. F. Zuur, E. Rogan, and R. Officer. 2004. Optimum sampling levels in discard sampling programs. Can. J. Fish. Aquat. Sci. 61:1918-1928.

Cochran, W.G. 1977.  Sampling Techniques.  J. Wiley and Sons.  New York.

Cotter, A.J.R., G. Course, S.T. Buckland and C.Garrod.  2002.  A PPS sample survey of English fishing vessels to estimate discarding and retention of North Sea cod, haddock and whiting.  Fisheries Research 55: 25-35.

Fogarty, M.J. and W. Gabriel.  2002.  Relative precision of discard estimates for the Northeast groundfish complex.  Report of National Marine Fisheries Services, Northeast Fisheries Science Center, Woods Hole, MA. 

Frontline Systems. 2003.  Premium Solver Platform version 5.5.  Incline Village, NV. 222 p.

Liggins, G.W., M.J. Bradley, and S.J. Kennelly.  1997.  Detection of bias in observer-based estimates of retained and discarded catches from a multispecies trawl fishery.  Fisheries Research Report 9(3):46-52.  University of British Columbia.

Murawski, S., S. Wigley, M. Fogarty, P. Rago and D. Mountain. (2005 in press).  Effort distribution and catch patterns adjacent to temperate MPAs. ICES Journal of Marine Science.

NMFS (National Marine Fisheries Service). 2004.  Evaluating bycatch: a national approach to standardized bycatch monitoring programs.  U. S. Dep. Comm., NOAA Tech. Memo. NMFS-F/SPO-66, 108 p.  On-line version,  http://www.nmfs.noaa.gov/by_catch/SPO_final_rev_12204.pdf

NMFS-NERO (National Marine Fisheries Service) Northeast Regional Office.  http://www.nero.noaa.gov/ro/fso/vtr_inst.pdf 

National Research Council (NRC) 1998. Review of Northeast Fishery Stock Assessments. National Academy Press. Washington DC

NEFSC (Northeast Fisheries Science Center). 1996. Analysis of the 1994 fishing vessel logbook data. In: 22nd Northeast Regional Stock Assessment Workshop: Stock Assessment Review Committee consensus summary of assessments. NEFSC Reference Doc. 96-13; 242p.

Rochet, M-J, I. Peronnet, and V. M. Trenkel. 2002. An analysis of discards from the French trawler fleet in the Celtic Sea. ICES J. Mar. Sci. 59:538-552.

Sampson, D. 2002.  Final Report to the Oregon Trawl Commission on Analysis of Data from the At-Sea Data Collection Project.  Oregon State University.  Newport, Oregon. On-line http://www.onid.orst.edu/~sampsond/projects/edcp

Stratoudakis, Y., R. J. Fryer, R. M. Cook. 1998. Discarding practices for commercial gadoids in the North Sea. Can. J. Fish. Aquat. Sci. 55:1632-1644.

Stratoudakis, Y., R. J. Fryer, R. M. Cook, and G. J. Pierce. 1999. Fish discarded from Scottish demersal vessels: Estimators of total discards and annual estimates for targeted gadoids. ICES J. Mar. Sci. 56:592-605.

Thompson, S. K. 2002. Sampling. 2nd ed.,  J. Wiley and Sons, Inc. New York.

Walsh, W. A., P. Kleiber, and M. McCracken. 2002. Comparison of logbook reports of incidental blue shark catch rates by Hawaii-based longline vessels to fishery observer data by application of a generalized additive model. Fisheries Research 58:79-94.