FINAL REPORT

HISTORICAL DATA QUALITY REVIEW FOR THE U.S. EPA NATIONAL ESTUARY PROGRAM

to

Office of Marine and Estuarine Protection
U.S. Environmental Protection Agency
Washington, DC

Contract No. 68-03-3319
Work Assignment No. 20
Work Assignment Managers: Joe Hall, Ray Baum

Prepared by

Tetra Tech, Inc.

for

Battelle Ocean Sciences
397 Washington Street
Duxbury, MA 02332

July 1987

CONTENTS

LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
    BACKGROUND
    OBJECTIVES
    AVAILABLE HISTORICAL DATA
OVERVIEW OF DATA USES AND REQUIREMENTS
    DATA USES
    DATA REQUIREMENTS
QUALITY REVIEW OPTIONS
    LEVELS OF QUALITY REVIEW
    TECHNICAL OVERSIGHT OF DATA ENTRY
    COMPUTERIZED CHECKS
    TECHNICAL EVALUATION OF ENTERED DATA
RECOMMENDATIONS
    OVERVIEW
    STANDARD FORMATS AND CODES
    ESTUARINE VARIABLES
    CRITICAL DATA REQUIREMENTS
    RANGE LIMITS
    NATIONAL QUALITY REVIEW
    REGIONAL QUALITY REVIEW

FIGURES

1  Example of a form used to identify priority data sets for use in estuary characterization
2  Overview of the proposed quality review process
3  Schematic of the recommended five-level hierarchy for SAS libraries

TABLES

1  Variables commonly encountered in historical estuarine data sets
2  List of estuarine variables
3  Critical data requirements for estuarine variables
4  Range limits for estuarine variables
5  Upper range limits for chemical contaminants in the water column and bottom sediments
6  Upper range limits for chemical contaminants in muscle and liver tissue

INTRODUCTION

BACKGROUND

The National Estuary Program is administered by the Office of Marine and Estuarine Protection (OMEP) of the U.S. Environmental Protection Agency (EPA). The Program is implemented through U.S. EPA regional offices under the guidance of OMEP. The National Estuary Program has two major components.
The first is oversight and implementation of existing estuarine management programs, such as the Chesapeake Bay and Great Lakes Programs. The second major component is initiation of new programs. At present, new programs are being developed for Puget Sound (WA), San Francisco Bay (CA), Long Island Sound (NY), Buzzards Bay (MA), Narragansett Bay (RI), and Albemarle-Pamlico Sounds (NC).

For each estuary within the National Estuary Program, a five-year program is developed for addressing environmental problems. In the first year, a planning initiative is prepared. This initiative defines the organization of the estuary program and identifies key participants. In the second, third, and fourth years, environmental problems within the estuary are identified and evaluated from both a scientific and a programmatic perspective. In the fourth and fifth years, a comprehensive conservation and management plan is developed. This plan presents the details of how environmental problems will be corrected, including who will conduct various activities and when those activities will be conducted.

A key process in addressing the environmental problems of an estuary is defining those problems and conveying the relevant information to the public. This process, termed characterization, occurs in the following major steps:

• The historical (i.e., existing) data sets needed to define environmental problems are identified, collected, and screened.
• New data are generated to fill important gaps in the historical database.
• Data are analyzed to define the present status of the estuary, historical trends, and likely future trends if current practices are not modified.
• Results of the data analyses are conveyed to the public in a form that can be understood and supported.

Most of the individual estuary programs rely primarily upon historical data to characterize the status and trends of estuarine conditions.
Given the value of historical data to the development of estuary programs, it is essential that these data be treated in a manner that maximizes their usefulness to the individual estuary programs. This treatment includes identification of priority data sets, transfer of data to computer files, and verification of data quality.

The National Estuary Program, in conjunction with U.S. EPA regional offices, has identified a number of historical data sets as useful for characterizing estuarine conditions. These data sets have been or will be transferred to SAS computer files on the U.S. EPA National Computer Center (NCC) mainframe computer. However, before these data are used to characterize the status and trends of estuarine conditions, they will be subjected to a quality review process to ensure that they are appropriate for those evaluations.

Although the quality requirements for new data collected by individual estuary programs generally are known and specified, the requirements for historical data are not well defined. Specification of quality requirements for historical data is difficult because these data often were collected for a variety of purposes using different methods. In addition, much of the information required to conduct a detailed quality review of historical data is not available. Quality review of historical data must therefore strike a balance between the ideal of rigorous scrutiny of all data and the limitations inherent in this kind of data.

OBJECTIVES

The primary objective of this document is to develop an approach for conducting quality reviews of historical data used by the National Estuary Program. The goal is to ensure that all data used to characterize estuarine conditions pass a minimum level of quality review, so that data users can be assured that these data are of known quality.

The proposed quality review approach is described from a national perspective to ensure consistency among individual estuary programs.
However, the approach is flexible enough to be modified as necessary to meet the specific needs of individual programs. For example, additional variables can be added, or more stringent quality review criteria can be specified, for individual programs. To be cost-effective, the proposed quality review approach is based primarily on computerized checks rather than on evaluations by technical experts. However, the general kinds of technical review that may be conducted by the individual estuary programs are also outlined.

The remainder of this section describes the data sets currently selected for use by the National Estuary Program. The following sections provide overviews of how data are used by the program and what options are available for conducting quality review evaluations of those data. The last section of the document presents the quality review approach recommended for historical data used by the National Estuary Program.

AVAILABLE HISTORICAL DATA

Historical estuarine data generally are found in two major forms: measurements and attributes. Measurements are data to which numerical values can be assigned (e.g., concentrations of dissolved oxygen), whereas attributes are data that cannot be measured or ordered but must be expressed qualitatively (e.g., male or female, juvenile or adult). Both kinds of data are valuable for characterizing the status and trends of estuaries.

A wide variety of variables is encountered in historical estuarine data sets (Table 1). Most variables pertain to the characteristics of stations, the water column, sediments, or organisms. The contents of individual data sets range from a few variables (e.g., abundance of fish at a transect) to a very large number of variables (e.g., a large-scale survey of chemical contamination and biological effects).

The specific data sets used for characterization by individual estuary programs generally are a subset of the total number of data sets available for each estuary.
These priority data sets are selected on the basis of the following criteria:

• Relevance of the data to the objectives of characterization.
• Identity of the key variables included in the data set.
• Preliminary quantitative or qualitative assessment of the quality of the data.
• Accessibility of the data set.

To assist in the identification of priority data sets, forms such as the one shown in Figure 1 are frequently sent to investigators to obtain detailed information on candidate data sets. At present, over 40 priority data sets from four estuaries have been identified and entered into SAS files on the U.S. EPA NCC computer. Within the next year, up to 60 additional data sets may be added to this estuarine database.

TABLE 1. ENVIRONMENTAL VARIABLES COMMONLY ENCOUNTERED IN HISTORICAL ESTUARINE DATA SETS

Kind of Data             Variable (a)
Station description      Position - latitude and longitude or other kinds of coordinates
                         Depth
                         Sampling time - date, hour
                         Ambient conditions - tidal stage and height, current speed
                           and direction, wave height, wind speed and direction
Water column variables   Nutrients - various forms of nitrogen and phosphorus
                         Organic carbon
                         Alkalinity
                         Temperature
                         pH
                         Salinity
                         Specific conductivity
                         Dissolved oxygen
                         Transparency
                         Turbidity
                         Total suspended solids
                         Chloride
Sediment variables       Grain size
                         Total solids
                         Total volatile solids
                         Total organic carbon
                         Oil and grease
                         Chemical contaminants (b)
Biological variables     Bacterial indicators - abundance in water and tissue
                         Plankton - species composition and abundance
                         Benthic macroinvertebrates - species composition and abundance
                         Fishes and megainvertebrates (c) - species composition and
                           abundance, tissue concentrations of chemical contaminants (b),
                           histopathology

(a) Variables were selected from the historical data sets already submitted to the National Estuary Program.
(b) U.S. EPA priority pollutants and other chemicals.
(c) Large invertebrates captured in trawls, dredges, and traps.
    Distinguished from smaller benthic macroinvertebrates, which are sampled using grabs or box corers.

LONG ISLAND SOUND DATA CHARACTERIZATION
OXYGEN DEPLETION IN WESTERN LONG ISLAND SOUND

1. LIS Document Reference Number:
2. Organization Contacted:
3. Principal Investigator:
4. Contact:
5. Telephone Number:
6. Address of Contact:
7. Citation: a) Author(s); b) Year; c) Title; d) Journal/Rept.; e) Volume: Number; f) Pages
8. Sample, Survey Type [Y/N; Frequency/Resolution]: a) Station(s); b) Synoptic Survey; c) Vertical Resolution
9. Measurements [Y/N; Units]:
   a) Dissolved Oxygen
   b) % Oxygen Saturation
   c) Temperature
   d) Salinity
   e) Phytoplankton Pigments
   f) Phytoplankton Counts
   g) Inorganic Nutrients (Ammonium, Nitrite, Phosphate, Silicate)
   h) Organic Nutrients (DOC, TOC, DON, TON, DOP, TOP)
   i) BOD, COD
   j) Biological Rates (Primary Productivity, Water Respiration, Sediment Respiration, etc.)
10. Data, Study Area:
11. Time Span of Data [From; To]:
12. Status of Data [Y/N; Availability; Cost]: a) Raw; b) Reprint; c) Computerized; d) Database Name; e) Data Products
13. Comments:

Figure 1. Example of a form used to identify priority data sets for use in estuary characterization.

OVERVIEW OF DATA USES AND REQUIREMENTS

This section provides a general description of how historical data are used by the National Estuary Program and of the requirements the data must meet to be acceptable for the desired uses. This information is needed to evaluate the quality review options and recommendations that are described in subsequent sections of this document.

DATA USES

The primary use of historical estuarine data by the National Estuary Program is for characterizing the status and trends of conditions within specific estuaries. In general, characterization has four major components:

• Identification of important variables.
• Spatial patterns of variables.
• Temporal trends of variables.
• Relationships among variables.
Descriptions of variables include evaluations of the chemical, physical, and biological characteristics of all or part of each estuary. These descriptions are useful as a broad overview of the conditions encountered in each estuary. They may include lists of the species and chemicals that are commonly encountered within an estuary. Descriptions may also include the mean values and ranges of conditions (e.g., water temperature, salinity, depth) within the estuary.

Evaluations of the spatial patterns of variables within an estuary are useful for identifying such locations as critical habitats, resource harvesting areas, pollutant sources, and areas exhibiting environmental impacts. Spatial patterns usually are displayed by mapping or contouring the values of a variable. These kinds of maps can be used by managers and the public to visualize the magnitude and extent of environmental problems.

Evaluations of the temporal trends of variables within an estuary are useful for determining how variables change over time. This information can be used to assess how conditions have varied in the past and how they might change in the future. Temporal trends usually are displayed by plotting the values of a variable observed at different times. This kind of information is important for determining whether environmental conditions are improving or deteriorating over time.

Evaluations of the relationships among variables within an estuary are useful for identifying potential cause-and-effect relationships. For example, similarities in spatial patterns (e.g., between pollutant sources and impacted areas) or in temporal trends (e.g., increasing turbidity and decreasing density of aquatic vegetation) can point to potential cause-and-effect relationships. Relationships among variables can be evaluated by simply plotting values and looking for similar trends.
Alternatively, relationships among variables can be evaluated more rigorously using statistical techniques (e.g., correlation, regression, analysis of variance). Understanding the relationships among variables is an important step in the process of recommending corrective action.

DATA REQUIREMENTS

The requirements necessary for interpreting estuarine data can be subdivided into those that are universal (i.e., they apply to all variables) and those that are specific to each variable. Universal data requirements are the location and time of data collection, the methods used to measure the variable, and the measurement units in which the data are expressed. Variable-specific requirements depend upon the intended use of the data.

Location of data collection for all kinds of estuarine data refers to the geographic position of the sampling site within an estuary. It generally is expressed as latitude and longitude, or as coordinates of alternate systems (e.g., Loran and Raydist navigation systems, state plane grids) that can be converted to latitude and longitude. For some purposes, location can be expressed less precisely as the waterway or estuary segment within which data were collected.

In addition to geographic position, location for some variables also refers to vertical position. Examples of vertical position are depth in the water column, depth below the sediment surface, and elevation above sea level (i.e., altitude). For interpretation of some variables, knowledge of vertical position can be as critical as knowledge of geographic location.

Time of data collection refers to the hour, day, month, or year in which sampling occurred. Depending on the kind of data and the intended use, the precision with which time is expressed can vary widely. For example, evaluations of diel movements of fish might require that sampling time be reported to the nearest hour, whereas stock assessments of fish might only require that data be reported to the nearest month.
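To make the precision point concrete, the following sketch checks whether a reported sampling time is precise enough for a given use. It is a hypothetical illustration, not part of the system proposed in this report: it assumes sampling times are stored as ISO-style strings whose length reflects the precision actually reported (for example, "1985-07" when only the month is known), and the function names are invented for this example.

```python
# Hypothetical sketch: sampling times are assumed to be ISO-style strings
# whose length shows how precisely the time was reported.
PRECISION_BY_LENGTH = {4: "year", 7: "month", 10: "day", 13: "hour"}  # "1985-07-14T06" is 13
PRECISION_RANK = {"year": 0, "month": 1, "day": 2, "hour": 3}  # coarsest to finest


def reported_precision(sampling_time: str) -> str:
    """Return the finest time unit present in the string."""
    try:
        return PRECISION_BY_LENGTH[len(sampling_time)]
    except KeyError:
        raise ValueError(f"unrecognized time format: {sampling_time!r}") from None


def meets_requirement(sampling_time: str, required: str) -> bool:
    """True if the reported time is at least as precise as the intended use requires."""
    return PRECISION_RANK[reported_precision(sampling_time)] >= PRECISION_RANK[required]


# A diel-movement study needs hourly times; a stock assessment needs only months.
print(meets_requirement("1985-07", "hour"))        # False
print(meets_requirement("1985-07", "month"))       # True
print(meets_requirement("1985-07-14T06", "hour"))  # True
```

The same record can therefore be acceptable for one use and unacceptable for another, which is why the precision requirement belongs to the intended use rather than to the data value itself.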
The measurement units of each variable must be known to interpret the absolute magnitude of each data value. In most cases, data must be converted to common units before being compared. The particular units reported for each data value therefore are not important, as long as they can be converted to the units commonly used for the variable.

In some relatively rare cases, estuarine data are unitless by definition. Unitless data frequently are encountered when indices are used. In such cases, it is critical to know how the unitless data were derived in order to interpret the absolute magnitude of each data value.

Variable-specific data requirements are dependent upon the intended uses of the data. In general, the critical data requirements for a variable should include the universal data requirements discussed previously, as well as any other information that is essential for interpretation of a data value. Additional data requirements (i.e., beyond those considered critical) may be necessary for specialized uses of the data (e.g., detailed statistical analyses).

QUALITY REVIEW OPTIONS

This section describes the general kinds of quality review that can be applied to estuarine data. This information provides the basis for the detailed recommendations made in the final section of this document.

LEVELS OF QUALITY REVIEW

The level of quality review applied to a data set can vary from no review to detailed scrutiny of every data value. From a cost-benefit standpoint, neither extreme may be desirable. In the former case, failure to identify and correct substantial errors could lead to costly and ineffective management decisions based on those data. In the latter case, excessive quality review may be costly and yield little additional benefit in terms of enhanced data quality, compared to a more modest review. The optimal level of quality review generally lies between the extremes of no review and detailed review of all data.
This review may consist of a combination of computerized checks and evaluations by technical experts. In general, computerized checks can be conducted inexpensively on a complete data set. By contrast, technical evaluation generally is more expensive and therefore usually cannot be applied to all values in a data set. However, a technical evaluation can assess many aspects of a data set that computerized checks cannot. The optimal quality review approach combines the strengths of both kinds of evaluation to review a data set effectively at a reasonable cost.

The remainder of this section describes the kinds of computerized checks and technical review that can be applied to estuarine data. In the following section, recommendations are made for combining these two kinds of review to evaluate the historical data used by the National Estuary Program.

TECHNICAL OVERSIGHT OF DATA ENTRY

When data in hard-copy form are entered into a machine-readable format, it is desirable that a technical expert oversee the entry process. The two major kinds of technical oversight are 1) assurance that data from the hard copy are interpreted accurately and 2) assurance that data are transferred accurately to the machine-readable format.

Historical data in hard-copy form frequently are found in a variety of locations (e.g., text, tables, appendices) and formats (e.g., different units, significant figures). Because data entry personnel may not have the training and experience required to understand the details of technical information, it may be necessary for a technical expert to ensure that data are interpreted accurately prior to entry. Data interpretation may include transformation to different units, rounding off to fewer significant figures, or calculations (e.g., from wet weight to dry weight). It may also include review of the data source to ensure that all pertinent supporting information is collected with the data values.
Such information might include detection limits for chemical analyses, mesh size for benthic infaunal analyses, or depth for water column variables. Technical oversight at this stage is critical because subsequent data users may not have access to the original hard copies and therefore cannot check for accurate interpretation of the data.

Whenever data are transferred from hard-copy form to a machine-readable format, it is advisable to check at least 10-15 percent of the data for accurate transfer. Accurate transfer refers to use of proper codes and formats, as well as accurate entry of values. Given the complexity of many historical environmental data sets, it is preferable that a technical expert oversee the transfer checks. It is recommended that these checks focus primarily upon the most complex components of each data set (i.e., those components having the highest potential for transfer errors). In many data sets, these components are related to taxonomic names and names of complex organic compounds.

COMPUTERIZED CHECKS

The speed and reliability of computers can be used to conduct a variety of cost-effective quality review checks. The four major kinds of computerized checks are the following:

• Format checks to ensure that data are entered in the proper format.
• Coding checks to ensure that all codes are valid.
• Range checks to ensure that all numerical values fall within specified ranges.
• Checks for critical data requirements to ensure that all essential ancillary information (e.g., station location, sampling time) is available.

To conduct computerized format and coding checks efficiently, it is essential that all machine-readable data sets have a uniform file structure and coding system. Data sets in hard-copy form can be recoded before entry and then entered directly according to the standard format. By contrast, data sets existing in machine-readable form must be recoded and reformatted automatically.
As mentioned in the previous section, reformatting and recoding of data generally require technical oversight to ensure that the diverse kinds of data encountered in unrelated original data sources are translated properly into the desired uniform system.

Format checks ensure that no data field contains inappropriate characters. For example, fields with numeric data should not contain alphabetic characters, and alphabetic fields should not contain numeric characters. Format checks will not ensure that numeric and alphabetic characters were entered accurately.

Coding checks ensure that all coded entries have valid codes. For example, if taxonomic codes are used instead of species names, the coding checks will determine whether each taxonomic code is valid (i.e., whether it is in the code dictionary). These checks will not determine whether each valid code is properly matched with each species name.

To conduct range checks, a list of variable-specific ranges must be developed. Each range establishes the numerical limits within which the value of a variable is expected to occur. The automated checks identify data that lie outside the specified ranges. For example, the range limits for the sediment concentrations of naphthalene might be 0 and 10 mg/kg (dry weight). A value of 20 mg/kg would therefore be identified as being outside the specified range.

Range limits can be established to identify at least two kinds of extreme values. For example, an initial upper range limit of 10 mg/kg might be used to identify naphthalene concentrations that are unusually high but sometimes found. This initial range limit would identify potential errors. A second upper range limit of 50 mg/kg might also be used to identify concentrations that are unusually high and unlikely to be found. This second range limit would identify probable errors.
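The three checks just described can be sketched in a few lines of Python. This is a minimal illustration, not the programs proposed in this report: the taxonomic codes in the dictionary are hypothetical placeholders, the function names are invented, and the two upper limits (10 and 50 mg/kg dry weight) are the naphthalene example values from the text. The last call shows the limitation that a mistyped value falling inside the limits still passes.

```python
import re

# Hypothetical code dictionary of valid taxonomic codes (placeholders only).
CODE_DICTIONARY = {"SP0001", "SP0002", "SP0003"}

# Two-tier upper limits for sediment naphthalene (mg/kg dry weight),
# taken from the example in the text.
POTENTIAL_ERROR_LIMIT = 10.0
PROBABLE_ERROR_LIMIT = 50.0

NUMERIC = re.compile(r"-?\d+(\.\d+)?")
ALPHABETIC = re.compile(r"[A-Za-z ]+")


def format_check(value: str, field_type: str) -> bool:
    """True if every character is allowed for the field type."""
    pattern = NUMERIC if field_type == "numeric" else ALPHABETIC
    return pattern.fullmatch(value) is not None


def coding_check(code: str) -> bool:
    """True if the code is in the dictionary (not that it names the right species)."""
    return code in CODE_DICTIONARY


def range_check(value: float, lower: float = 0.0) -> str:
    """Classify a naphthalene concentration against the range limits."""
    if value < lower:
        return "below range"
    if value > PROBABLE_ERROR_LIMIT:
        return "probable error"
    if value > POTENTIAL_ERROR_LIMIT:
        return "potential error"
    return "within range"


print(format_check("4.0", "numeric"))  # True
print(format_check("4.O", "numeric"))  # False: letter O typed for zero
print(coding_check("SP0002"))          # True
print(range_check(20.0))               # potential error
print(range_check(60.0))               # probable error
print(range_check(4.0))                # within range, even if 0.4 was intended
```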
An important consideration when using range checks is that the results only indicate which values are inside or outside specified ranges. They do not indicate that values within the ranges are correct. For example, a naphthalene concentration of 0.4 mg/kg that was entered mistakenly as 4.0 mg/kg would pass the range-checking procedure but would be incorrect.

To conduct computerized checks for critical data requirements, a list of variable-specific essential ancillary information must be developed. This list represents the supporting information that is essential for interpreting a particular data value. For example, knowledge of sieve mesh size might be a critical data requirement for interpreting the total abundance of benthic invertebrates at a station. An abundance of 10,000 individuals/m² might be interpreted as high if a 1.0-mm mesh was used, whereas the same value might be considered low if a 0.5-mm mesh was used. Once the list of critical data requirements is developed, data sets can be searched automatically, and missing critical data requirements can be identified as such. The identification of a missing critical data requirement does not imply that the respective data value is incorrect; it only suggests that the data value will be difficult to interpret.

TECHNICAL EVALUATION OF ENTERED DATA

A second level of quality review that can be conducted in conjunction with computerized checks is evaluation of entered data by technical experts. This kind of evaluation is most valuable if the experts have access to the documents that describe the field and laboratory techniques used to generate the data. However, technical evaluation is valuable whether or not the original documentation is available.

In many cases, automated quality review checks simply identify aberrant data. The data user must decide how the aberrant data will influence the intended use of a data set.
If the data user does not have the technical training to understand the implications of the aberrant data, the data set may either be used inappropriately or rejected unnecessarily. Thus, to ensure that data sets containing aberrant data are used properly, a technical evaluation may be desirable.

The most detailed kind of technical evaluation involves assessments of the study design, sampling procedures, and analytical methods used to generate the data set of interest. This kind of evaluation usually requires review of the original documents from which the data were taken.

Evaluation of the study design might focus on how the study objectives influence subsequent uses of the data. For example, if the objectives were to characterize conditions near sources of contamination, most stations within a particular water body may be located as close to sources as possible. Use of such a highly biased data set to characterize conditions throughout the water body could produce misleading results.

Evaluation of sampling protocols can determine how they influenced the accuracy of the resulting data. For example, checks can be made to ensure that collection equipment was operated properly (e.g., an otter trawl was fishing on the bottom), that samples were handled appropriately following collection (e.g., preserved as specified), and that the entire sampling effort was documented adequately (e.g., adequate logkeeping and chain-of-custody records). The knowledge that these procedures were executed properly greatly increases confidence in the resulting data.

As with sampling protocols, evaluation of analytical methods can determine how they influenced the accuracy of the resulting data. For example, checks can be made to ensure that acceptable methods were followed (e.g., that departures from standard protocols were justified) and that the methods were applied adequately (e.g., that results of analyses of standards or spiked samples were acceptable).
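One widely used measure for judging spiked-sample analyses is percent recovery: the increase in measured concentration caused by the spike, divided by the known amount of spike added, times 100. The sketch below assumes that formula. The 75 to 125 percent acceptance window and the sample numbers are hypothetical placeholders; acceptable recoveries depend on the analytical method and program and are not specified in this report.

```python
def percent_recovery(spiked_result: float, unspiked_result: float, spike_added: float) -> float:
    """Percent of a known spike recovered by the analysis (all values in the same units)."""
    if spike_added <= 0:
        raise ValueError("spike_added must be positive")
    return 100.0 * (spiked_result - unspiked_result) / spike_added


def recovery_acceptable(recovery: float, low: float = 75.0, high: float = 125.0) -> bool:
    """Compare a recovery with an acceptance window (hypothetical default limits)."""
    return low <= recovery <= high


# Hypothetical sample: 0.20 mg/kg unspiked; 1.05 mg/kg after adding 1.00 mg/kg.
r = percent_recovery(1.05, 0.20, 1.00)
print(round(r, 1))             # 85.0
print(recovery_acceptable(r))  # True
```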
A less detailed kind of technical review would place less emphasis on examining original documents and would focus primarily on the information available in machine-readable form. Automated quality review checks would greatly facilitate this kind of review by identifying data that do not conform to established criteria. As mentioned in the previous section, these automated checks can include checks for proper formats, valid codes, range limits, and critical data requirements.

Data identified as having improper formats or invalid codes can be evaluated to determine the implications of excluding them from subsequent uses of the data set. If data are not considered critical for certain kinds of analyses, they can be excluded from those analyses. However, if data are considered essential for an intended use, the technical expert may need to examine earlier machine-readable or hard-copy versions of the data set to rectify the problem.

Data identified as lying outside range limits can be evaluated to determine whether the values may be accurate or appear to be erroneous. A technical expert familiar with the conditions encountered in a particular estuary often can review supporting information such as station location, season, depth, and habitat characteristics, and judge whether an unusual value was possible under the specific set of existing conditions. In some cases, review of original documentation may be required to evaluate an unusual value.

When critical data are missing, a technical expert may be required to determine the implications of the missing information with respect to subsequent uses of the data set. For example, if information on sieve mesh size is missing for a data set composed of abundances of benthic invertebrates, meaningful comparisons with other data sets based on known mesh sizes would not be possible.
Because abundance is partly a function of sieve mesh size, interpretation of differences in abundances between data sets would be difficult. The differences could be due primarily to mesh size differences rather than to differences in the variable under study (e.g., concentration of a chemical contaminant).

RECOMMENDATIONS

OVERVIEW

This section presents recommendations for conducting quality reviews of historical data used by the National Estuary Program. The background for these recommendations is presented in previous sections. The recommended quality review process (Figure 2) relies primarily upon computerized checks of entered data. However, the potential roles for technical oversight and review are also described. Key assumptions used to derive these recommendations include the following:

• All estuarine data must pass some level of quality review.
• Data not passing quality review criteria will be flagged, but otherwise left intact in the database.
• Individual estuary programs will be responsible for determining their own quality review criteria.
• Funding for quality review will be limited, requiring that emphasis be placed on cost-effective computerized checks.

The initial step of the quality review process involves translating diverse historical data into a set of standard codes and a standard format. For data already in machine-readable form, translation involves computerized recoding and reformatting. For data in hard-copy form, translation entails manual recoding and reformatting as data are entered into computer files. It is recommended that the manual recoding and reformatting be conducted with technical oversight to ensure that data are translated and entered accurately.

[Figure 2 flowchart. Historical data (machine-readable) -> data reformatting and recoding; historical data (hard copy) -> data entry with technical oversight; both -> NCC SAS files (standard format, standard codes) -> computerized checks (format, codes, critical data requirements, range limits) against the quality review dictionary (national and/or regional criteria files: critical data requirements, range limits) -> SAS files with data qualifiers -> optional technical evaluations by regional estuary program.]

Figure 2. Overview of the recommended quality review process.

The next step in the quality review process entails computerized checks of formats, codes, critical data requirements, and range limits for a group of estuarine variables. To accomplish this, a series of computer programs will be developed, each specific to a particular type of data (e.g., sediment chemistry, water quality). These programs will read the SAS data files and compare them with a quality review dictionary, whose files specify the critical data requirements and range limits. Because the specifications in the dictionary files can be modified independently by each estuary program, quality review checks can be tailored to the specific needs of each estuary.

After scanning the SAS data files and the appropriate quality review dictionary, the computer programs will produce new SAS data files containing qualifiers for all data that failed to meet the specifications in the quality review dictionary. Aside from being flagged, these data will be left intact in the database. The initial variables and range limits to be included in the quality review dictionary are discussed in the following sections.

The final step in the proposed quality review process is optional and will be conducted by the regional estuary programs. This step involves evaluations of the machine-checked data by technical experts. In some cases, these evaluations may require review of the original hard-copy documentation of the data.
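The flag-but-keep behavior of these programs can be sketched in Python. This is a hypothetical illustration, not the SAS programs proposed in this report: records are plain dictionaries, the dictionary entries and their limits are placeholders, and the qualifier codes "M" (missing critical data) and "R" (outside range limits) are invented for the example. A regional program tailors the checks by supplying its own entries, which replace the national ones for the same variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criteria:
    """Hypothetical quality review dictionary entry for one variable."""
    critical: tuple[str, ...]  # ancillary fields that must be present
    lower: float               # lower range limit
    upper: float               # upper range limit


# Illustrative national defaults; the values are placeholders, not report criteria.
NATIONAL = {
    "benthic_abundance": Criteria(("station", "date", "sieve_mesh_mm"), 0.0, 100000.0),
    "naphthalene_sed": Criteria(("station", "date", "units"), 0.0, 10.0),
}


def build_dictionary(national: dict, regional: dict) -> dict:
    """Regional entries replace national entries for the same variable."""
    return {**national, **regional}


def review(records: list[dict], dictionary: dict) -> list[dict]:
    """Return copies of the records with qualifiers added; no value is changed or dropped."""
    reviewed = []
    for rec in records:
        qualifiers = []
        crit = dictionary.get(rec["variable"])
        if crit is not None:
            if any(rec.get(name) in (None, "") for name in crit.critical):
                qualifiers.append("M")  # missing critical data requirement
            if not crit.lower <= rec["value"] <= crit.upper:
                qualifiers.append("R")  # outside range limits
        reviewed.append({**rec, "qualifiers": qualifiers})
    return reviewed


records = [
    {"variable": "benthic_abundance", "value": 10000.0,
     "station": "NB01", "date": "1985-07-14", "sieve_mesh_mm": None},
    {"variable": "naphthalene_sed", "value": 20.0,
     "station": "NB02", "date": "1985-07-15", "units": "mg/kg dry wt"},
]

national_only = review(records, build_dictionary(NATIONAL, {}))
print([r["qualifiers"] for r in national_only])  # [['M'], ['R']]

# A regional program that accepts higher naphthalene values overrides one entry.
regional = {"naphthalene_sed": Criteria(("station", "date", "units"), 0.0, 50.0)}
with_regional = review(records, build_dictionary(NATIONAL, regional))
print([r["qualifiers"] for r in with_regional])  # [['M'], []]
```

Returning new records rather than editing the originals mirrors the report's assumption that flagged data are otherwise left intact in the database.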
The remainder of this section describes the proposed coding and formatting systems and the computerized quality review criteria that will be applied to the standard variables in each historical estuarine data set. In addition, general guidance is provided for the kinds of criteria modification and technical review that can be conducted by the regional estuary programs.

STANDARD FORMATS AND CODES

A key element in the recommended quality review procedures shown in Figure 2 is the use of standard data formats and codes. Because these data elements are standardized, the quality review programs will need to be developed only once. Costs for quality review will be minimized, because these programs will not require extensive modifications for each data set that is scanned. An additional benefit of standardization is increased user familiarity with data files and variables.

Formats

This section describes the recommended system for formatting historical estuarine data. The standardized, modular structure of the system is designed for the following purposes:

• Reduce quality review and maintenance costs over a multiyear operational period.
• Ensure consistency in naming conventions and file structures.
• Facilitate system updates and modifications.
• Minimize use of on-line disk space at NCC.
• Facilitate use of data by program participants.
• Reduce training time and associated costs.
• Minimize data retrieval time.
• Facilitate the addition of specialized data from individual estuary programs.

To achieve the above objectives, it is recommended that top-down standards be established for the following system levels:

• Names and organization of SAS libraries as catalogued in the NCC environment.
• Names and organization of members within SAS libraries.
• Names and organization of variables within the SAS members.

Details on standards for these levels are provided in the following sections.
Naming Conventions and Organization of SAS Libraries

At the highest level, SAS library names should be developed in a consistent fashion to allow users to quickly identify and retrieve data of interest. It is recommended that the following standard three-level naming convention be used:

PREFIX.ESTUARY.DATA_TYPE

where:

PREFIX = the catalog prefix assigned by NCC (e.g., XXXODES)
ESTUARY = a two-character code unique to each estuary study area (e.g., "NB" for Narragansett Bay)
DATA_TYPE = a three-character code for standard types of data (e.g., "WAC" for Water Column Data).

For example, all water column data for Narragansett Bay would be stored in an SAS library named "XXXODES.NB.WAC."

Naming Conventions and Organization of SAS Members

Within the SAS libraries, members should be organized in a five-level hierarchy based on the range of information they contain. Member names and hierarchy levels should be the same for all data types. Information common to more than one level should be retained only at the highest level for which it is relevant. All levels should contain one or more primary sort keys that would enable users to move from level to level by using SAS "MERGE" commands.

The standardized five-level hierarchical organization minimizes data retrieval time, user-training time, and system resource demands. Because it is modular, it gives program participants a great deal of flexibility in their use of data, and it simplifies modification and maintenance of the entire system.

It is recommended that the following hierarchy of members and member names be used for all data types:

• DATA_SET - contains basic information about the data collected; provides a descriptive index to the data set.
• VARIABLE - contains a list of all variables in the data set and their general quality review status (data dictionary).
• STATION - contains station-specific information and flags.
• SAMPLE - contains sample-specific information and flags.
• SOURCE - contains variable-specific information and flags; may also contain additional regional variables.

Figure 3 provides a diagram of the hierarchical relationship among these five SAS members. For example, under this organizational scheme, all station-specific data values for Narragansett Bay water column data would be contained in SAS library XXXODES.NB.WAC, member STATION. These values would not be repeated in member SAMPLE. To obtain station-specific values for use with SAMPLE data, users would simply sort and then merge STATION and SAMPLE by their common primary sort key, the station code (STN_CD).

[Figure 3. Schematic of the recommended five-level hierarchy for SAS libraries. Members and keys: DATA_SET (DS_ID); STATION (DS_ID, STN_CD); VARIABLE (DS_ID, STN_CD); SAMPLE (STN_CD, SAMP_ID); SOURCE (STN_CD, SAMP_ID).]

Naming Conventions and Organization of SAS Variables

For each SAS member, there will be a series of standard SAS variables. Variables will remain as uniform as possible across all data types, recognizing the obvious differences in file structures for different data types. For example, members SAMPLE and SOURCE will contain additional depth variables for water column data, and member SOURCE may contain additional regional variables. Currently, DS_ID, STN_CD, and SAMP_ID are designed to be used as primary sort keys on which data from different members may be matched. However, all variables have the potential to be used as keys that may be matched to information in other files, tables, and data dictionaries. Use of these standard variables will enable users to access special tables and data dictionaries in a logical and efficient manner; obtain uniform definitions, ranges for variables, and units of measurement for data comparisons; and document and disseminate information according to a standardized format. For example, program participants may choose to develop specialized tables of values to perform additional edits on their data.
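The following Python sketch illustrates two ideas from this section:
• Building a PREFIX.ESTUARY.DATA_TYPE library name.
• Attaching STATION values to SAMPLE records that share the same STN_CD.

It assumes that plain dictionaries stand in for SAS members and that STN_CD is unique within a library. The helper names library_name and merge_on_station, and the sample identifiers, are hypothetical. In SAS itself, the equivalent is a DATA step with a MERGE statement and BY STN_CD, after both members have been sorted by STN_CD.

```python
def library_name(prefix: str, estuary: str, data_type: str) -> str:
    """Build a PREFIX.ESTUARY.DATA_TYPE library name, checking code lengths."""
    if len(estuary) != 2:
        raise ValueError("ESTUARY code must be two characters, e.g. 'NB'")
    if len(data_type) != 3:
        raise ValueError("DATA_TYPE code must be three characters, e.g. 'WAC'")
    return f"{prefix}.{estuary}.{data_type}".upper()


def merge_on_station(stations, samples):
    """Attach STATION values to each SAMPLE record with the same STN_CD.

    Samples are returned sorted by (STN_CD, SAMP_ID); a sample whose station
    is missing from STATION keeps only its own fields.
    """
    by_code = {row["STN_CD"]: row for row in stations}
    merged = []
    for sample in sorted(samples, key=lambda s: (s["STN_CD"], s["SAMP_ID"])):
        station = by_code.get(sample["STN_CD"], {})
        merged.append({**station, **sample})  # sample values win on name clashes
    return merged


# Hypothetical identifiers sized to the recommended field lengths (A10, A7, A4).
stations = [{"DS_ID": "NB87WQ0001", "STN_CD": "NB00001", "SDEPTH": 12.5}]
samples = [
    {"STN_CD": "NB00001", "SAMP_ID": "0002", "DATE": 870615},
    {"STN_CD": "NB00001", "SAMP_ID": "0001", "DATE": 870614},
]
print(library_name("XXXODES", "NB", "WAC"))
for row in merge_on_station(stations, samples):
    print(row)
```

One design difference is worth noting. A SAS match-merge also keeps stations that have no samples, whereas this sketch is driven by the SAMPLE records, so such stations do not appear in its output.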
This standardized system would allow those program participants to use the same tables to selectively process all data sets in the system. By contrast, a nonstandard method of naming and organization would require the use of multiple tables. Use of these standard variables should also simplify system modification, maintenance, and documentation.

In accordance with these standards, it is recommended that standard variables be used and organized as follows [note that field type (A=alphanumeric, N=numeric, I=integer) and length are listed for each variable]:

• DATA_SET (Standard format for all data types, K = Key Variable)
  K - DS_ID - data set identification code (A10)
    - ESTUARY - name of the estuary from which the data were obtained (A20)
    - DATASET - name of the data set (A40)
    - SUBMITR - name of the individual or organization responsible for submittal of the data (A15)
    - SUB_ADDR - address of the data submitter (A40)
    - SUB_PHON - phone number of the data submitter (A12)
    - SD_ED - starting and ending dates for the sampling period (N12 or SAS date YYMMDD)
    - STACOUNT - number of stations included in the data set (I5)
    - DOC - field indicating whether documentation for the data set is present (A3)
    - PURPOSE - field describing the purpose of the data (A40)
    - QC_LEVEL - field expressing the submitter's subjective review of the overall quality of the data (A5)
    - AUTHOR - if DOC flag is set, author name (A40)
    - YEAR - if DOC flag is set, year of document (I4)
    - TITLE - if DOC flag is set, first 80 characters of title (A80)
    - JOURNAL - if DOC flag is set, journal name (A40)
    - VOL_PAGE - if DOC flag is set, volume and page numbers (A20).
• VARIABLE (Standard format for all data types, K = Key Variable)
  K - DS_ID - data set identification code (A10)
    - VARIABLE - name of variable in data set (A12)
    - QA_RV - flag indicating whether quality review was performed for the above variable (A1)
    - UNITS - units of each variable (A15)
    - VARCOM - comment field for each variable (A60)
    - METHOD - method code (A12).
• STATION (Standard format for all data types, K = Key Variable)
  K - DS_ID - data set identification code (A10)
  K - STN_CD - code identifying the station at which sampling was performed (A7)
    - F_STN_CD - flag providing information about the quality of the value for STN_CD (A1)
    - LAT - latitude (degrees, minutes, and seconds to nearest tenth) at which the station is located (N7)
    - F_LAT - flag providing information about the quality of the value for LAT (A1)
    - LONG - longitude (degrees, minutes, and seconds to nearest tenth) at which the station is located (N8)
    - F_LONG - flag providing information about the quality of the value for LONG (A1)
    - SDEPTH - station depth (meters to nearest tenth) (N5)
    - F_SDEPTH - flag providing information about the quality of the value for SDEPTH (A1).
• SAMPLE (Additional water column variables are preceded by '*', K = Key Variable)
  K - STN_CD - code identifying the station at which sampling was performed (A7)
  K - SAMP_ID - sample identification code (A4)
    - DATE - code indicating the date (year, month, day) on which the sample was taken (N6 or SAS date YYMMDD)
    - F_DATE - flag providing information about the quality of the value for DATE (A1)
    - TIME - code indicating the time (hours, minutes) at which the sample was taken (N4 or SAS format HHMM)
    - F_TIME - flag providing information about the quality of the value for TIME (A1)
    - TIDE_HT - tidal height (meters to nearest tenth) (N3)
    - F_TIDE - flag providing information about the quality of the value for TIDE_HT (A1)
    - WAVE_HT - wave height (meters to nearest tenth) (A1)
    - F_WAVE - flag providing information about the quality of the value for WAVE_HT (A1)
    - CURR_SP - current speed to nearest tenth (N3)
    - F_CURR - flag providing information about the quality of the value for CURR_SP (A1)
    - WIND_SP - wind speed to nearest tenth (N2)
    - F_WIND - flag providing information about the quality of the value for WIND_SP (A1)
  * - DEPTH - depth at which sample was taken (meters to nearest hundredth) (N6)
  * - F_DEPTH - flag providing information about the quality of the value for DEPTH (A1).
• SOURCE (Additional water column variable is preceded by '*', K = Key Variable)
  K - STN_CD - code identifying the station at which sampling was performed (A7)
  K - SAMP_ID - sample identification code (A4)
    - DATE - date (year, month, day) on which sample was taken (N6 or SAS format YYMMDD)
    - TIME - time (hours, minutes) at which sample was taken (N4 or SAS format HHMM)
  * - DEPTH - depth (meters) at which sample was taken (to nearest hundredth) (N5)
    - VARIABLE - name of variable in data set (A12)
    - F_VAR - flag providing information about the quality of VARIABLE (A1)
    - ORIG_AMOUNT - value of the original variable as reported by the investigator (N8)
    - STD_AMOUNT - value of the variable in National Estuary Program units (N8)
    - F_AMOUNT - flag providing information about the quality of the value for AMOUNT (A1)
    - ORIG_UNIT - unit of measurement used to express variable value as reported by the investigator (A3)
    - STD_UNIT - National Estuary Program standard units (A3)
    - F_UNIT - flag providing information about the quality of the value (A1).

Codes

Taxonomic, variable, and method codes as specified in the Ocean Data Evaluation System (ODES) Data Submissions Manual are recommended for use with National Estuary Program data. Key features of these codes are the use of National Oceanographic Data Center (NODC) codes for species identifications and of mnemonic codes for chemical variables (e.g., the code for copper is "copper").

ESTUARINE VARIABLES

The variables encountered most frequently in the historical estuarine data sets submitted to the National Estuary Program are listed in Table 2. These are the variables for which computerized quality review criteria were developed. Other variables that have been measured in estuarine studies are encountered less frequently than those in Table 2 and were not considered for quality review.
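Because only the variables in Table 2 have computerized review criteria, the QA_RV flag in the VARIABLE member can be derived directly from that list. The sketch below makes several assumptions:
• The variable codes are hypothetical lowercase names. Only "copper" follows the ODES example quoted above.
• "Y" and "N" are illustrative flag values.
• The reviewed set contains only a few of the Table 2 variables.

```python
# Hypothetical codes for a few Table 2 variables that have review criteria.
REVIEWED_VARIABLES = {"salinity", "ph", "dissolved_oxygen", "water_temperature", "copper"}


def variable_member(ds_id, variables, units, reviewed=REVIEWED_VARIABLES):
    """Build VARIABLE-member rows, setting QA_RV from the reviewed-variable set."""
    return [
        {
            "DS_ID": ds_id,
            "VARIABLE": name,
            "QA_RV": "Y" if name in reviewed else "N",  # "Y"/"N" are assumed flag values
            "UNITS": units.get(name, ""),
        }
        for name in variables
    ]


rows = variable_member("NB87WQ0001", ["salinity", "copper", "zooplankton"],
                       {"salinity": "ppt", "copper": "mg/L"})
for row in rows:
    print(row)
```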
The estuarine variables can be grouped into the following four categories:

• Station information - geographic location and depth of the station, time and location (i.e., depth) of sample collection, and characteristics of gross environmental variables (i.e., tides, currents, wind) at the time of sampling.
• Water column variables - physical and chemical characteristics of the water column.
• Sediment variables - physical and chemical characteristics of bottom sediments.
• Biological variables - abundances, tissue chemical concentrations, and other characteristics of aquatic organisms.

CRITICAL DATA REQUIREMENTS

The critical data requirements for estuarine variables are listed in Table 3. They include sampling location, sampling time, analytical method, and measurement units for all variables, as well as a range of additional variable-specific requirements. Each critical data requirement should be included on the computerized record for each respective value. Missing critical data will be identified as such during the automated quality reviews of historical estuarine data.

TABLE 2.
LIST OF ESTUARINE VARIABLES

Station Information
    Latitude
    Longitude
    Station depth
    Sampling date
    Sampling time
    Sample depth
    Tidal height
    Wave height
    Current speed
    Wind speed

Water Column
    Water temperature
    pH
    Dissolved oxygen
    Salinity
    Turbidity
    Transparency
    Total suspended solids
    Specific conductivity
    Chloride
    Nitrogen
    Phosphorus
    Carbon
    Total alkalinity
    Silica
    Chemical contaminants(a)

Sediment
    Grain size: gravel, sand, silt, clay
    Total solids
    Total volatile solids
    Total organic carbon
    Oil and grease
    Chemical contaminants(a)

Biological
    Benthic invertebrates: area of sampler, sieve mesh size, species abundance
    Megainvertebrates: species abundance, tissue chemical contaminants(a), tissue lipids
    Demersal fishes: fishing duration, distance fished, species abundance, fish length, fish weight, tissue chemical contaminants(a), tissue lipids
    Phytoplankton: species abundance, chlorophyll a
    Bacteria: total coliforms, fecal coliforms

(a) U.S. EPA priority pollutants and other chemicals.

TABLE 3. CRITICAL DATA REQUIREMENTS FOR ESTUARINE VARIABLES(a)

Variable                                    Additional Critical Data Requirements(b)

WATER COLUMN
  Water temperature                         None
  pH                                        None
  Total alkalinity                          pH for manual titrimetric method (should = 4.5)
  Dissolved oxygen                          Time of day
  Salinity                                  None
  Specific conductivity                     Water temperature (should = 25 °C)
  Turbidity                                 None
  Transparency                              Time of day
  Total suspended solids                    Kind of filter, filter pore size
  Chloride                                  None
  Nitrogen (all kinds)
    Whole                                   None
    Filtered                                Kind of filter, filter pore size
    Particulate                             Kind of filter, filter pore size
  Phosphorus (all kinds)
    Whole                                   None
    Filtered                                Kind of filter, filter pore size
    Particulate                             Kind of filter, filter pore size
  Carbon (all kinds)
    Whole                                   None
    Filtered                                Kind of filter, filter pore size
    Particulate                             Kind of filter, filter pore size
  Total silica
    Filtered                                Kind of filter, filter pore size
  Chemical contaminants                     Detection limits, holding times

SEDIMENT
  Grain size (all fractions)                Presence/absence of oxidation step
  Total solids                              None
  Total volatile solids                     Combustion temperature
  Total organic carbon                      None
  Oil and grease                            None
  Chemical contaminants                     Detection limits, holding times

BIOLOGICAL
  Benthic invertebrates
    Species abundance                       Kind of sampler, area of sampler, sieve mesh size
  Megainvertebrates
    Species abundance                       Kind of sampler, mesh size (if applicable), area or time fished (if applicable)
    Tissue levels of chemical contaminants  Detection limits, holding times
  Demersal fishes
    Species abundance                       Kind of sampler, mesh size (if applicable), area or time fished (if applicable)
    Tissue levels of chemical contaminants  Detection limits, holding times
  Phytoplankton
    Species abundances                      Kind of sampler, enumeration method
    Chlorophyll a                           None
  Bacteria
    Total or fecal coliform abundance       None

(a) Universal requirements for all variables are location, time of measurement, analytical method, and measurement units.
(b) Other than the universal requirements.

In addition to the critical data requirements, various other kinds of information are desirable for interpreting and evaluating most kinds of estuarine data. These additional kinds of information are discussed below (see Technical Evaluations).

Approximately 60 percent of the estuarine variables have some kind of critical data requirement (Table 3) other than sampling location, sampling time, analytical method, and measurement units.
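Because Table 3 maps each variable to its additional requirements, it translates directly into a lookup table for the automated check. The Python sketch below is illustrative only:
• The variable keys and record field names (location, filter_kind, sampler_kind, and so on) are hypothetical.
• The conditional items marked "if applicable" are left out of the required lists.
• Only missing or empty fields are detected; the "should =" values are not checked.

Counting the rows of the transcribed table gives 22 of 35 rows with an additional requirement, about 63 percent. This is consistent with the figure of approximately 60 percent given above; the exact share depends on how the sub-rows are counted.

```python
UNIVERSAL = ("location", "time", "method", "units")  # required for every variable
FILTER = ("filter_kind", "filter_pore_size")
DL_HT = ("detection_limits", "holding_times")

# Rows of Table 3 (hypothetical keys): variable -> additional requirements; () means "None".
TABLE_3 = {
    "water_temperature": (),
    "ph": (),
    "total_alkalinity": ("titration_end_ph",),
    "dissolved_oxygen": ("time_of_day",),
    "salinity": (),
    "specific_conductivity": ("measurement_temperature",),
    "turbidity": (),
    "transparency": ("time_of_day",),
    "total_suspended_solids": FILTER,
    "chloride": (),
    "nitrogen_whole": (),
    "nitrogen_filtered": FILTER,
    "nitrogen_particulate": FILTER,
    "phosphorus_whole": (),
    "phosphorus_filtered": FILTER,
    "phosphorus_particulate": FILTER,
    "carbon_whole": (),
    "carbon_filtered": FILTER,
    "carbon_particulate": FILTER,
    "silica_filtered": FILTER,
    "water_chemical_contaminants": DL_HT,
    "grain_size": ("oxidation_step",),
    "total_solids": (),
    "total_volatile_solids": ("combustion_temperature",),
    "total_organic_carbon": (),
    "oil_and_grease": (),
    "sediment_chemical_contaminants": DL_HT,
    "benthic_abundance": ("sampler_kind", "sampler_area", "sieve_mesh_size"),
    "megainvertebrate_abundance": ("sampler_kind",),  # mesh size, area/time fished if applicable
    "megainvertebrate_tissue_contaminants": DL_HT,
    "fish_abundance": ("sampler_kind",),  # mesh size, area/time fished if applicable
    "fish_tissue_contaminants": DL_HT,
    "phytoplankton_abundance": ("sampler_kind", "enumeration_method"),
    "chlorophyll_a": (),
    "coliform_abundance": (),
}


def missing_critical_data(record):
    """List the critical fields that are absent or empty for this record."""
    required = UNIVERSAL + TABLE_3.get(record.get("variable"), ())
    return [field for field in required if record.get(field) in (None, "")]


def share_with_extra_requirements(table=TABLE_3):
    """Fraction of Table 3 rows with a requirement beyond the universal four."""
    return sum(1 for extra in table.values() if extra) / len(table)


record = {"variable": "total_suspended_solids", "location": "NB00001", "time": "870615 1030",
          "method": "M001", "units": "mg/L", "filter_kind": "glass fiber"}
print(missing_critical_data(record))
print(f"{share_with_extra_requirements():.0%} of Table 3 rows have extra requirements")
```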
The most common kind of variable-specific requirement is related to the collection and partitioning of samples prior to laboratory analysis (e.g., kind of filter, filter pore size, kind of biological sampling equipment, mesh sizes of biological samplers). A second common requirement is related to the conditions under which laboratory measurements were made (e.g., titration endpoints, water temperature, incubation temperature, combustion temperature, presence/absence of oxidation step). Because all of the above factors can bias analytical results, they must be known so that data can be interpreted accurately.

RANGE LIMITS

The range limits for estuarine variables are presented in Tables 4-6. Two kinds of range limits are used, to identify unusual and unlikely values. Unusual values are extreme but are sometimes encountered. Unlikely values are also extreme, but are almost never encountered or are not possible. Values falling outside the specified range limits will be identified as such during the automated quality reviews of historical estuarine data.

The range limits presented in Tables 4-6 were developed from a national perspective. That is, they correspond to the ranges of values encountered over all estuaries of the National Estuary Program. The ranges commonly found in individual estuaries may be narrower than those presented here.

For many kinds of chemical variables (e.g., nutrients, chemical contaminants), ranges were specified for individual chemicals or groups of chemicals. This is appropriate because most of these chemicals could potentially occur in any of the estuaries within the National Estuary Program. By contrast with chemical variables, the primary biological variable (i.e.,

TABLE 4.
RANGE LIMITS FOR ESTUARINE VARIABLES

                                               Range Limits(a)
                                             Lower           Upper
Variable               Units               A      B        A      B

STATION INFORMATION
Latitude               Degrees            34     34       49     49
                       Minutes             0      0       59     59
                       Seconds             0      0       59     59
Longitude              Degrees            70     70      125    125
                       Minutes             0      0       59     59
                       Seconds             0      0       59     59
Station depth          m                   0      0      200    245
Sampling date          Month               1      1       12     12
                       Day                 1      1       31     31
                       Year             1900   1940     1987   1987
Sampling time          h                0000   0000     2400   2400
Sample depth           m                   0      0      200    245
Tidal height           m                -1.2   -1.5      4.0    4.5
Wave height            m                   0      0      2.0    3.0
Current speed          m/sec               0      0      4.0    5.0
Wind speed             m/sec               0      0     13.0   18.0

WATER COLUMN
Water temperature      °C                  0      0       30     35
pH                     Standard units      6      5        9     11
Dissolved oxygen       mg/L                0      0       14     17
Salinity               ppt                 0      0       32     35

TABLE 4. (Continued)
Range Limits Lower Upper Variable Turbidity Transparency (Secchi depth) Total suspended solids Specific conductivity Chloride Total dissolved nitrogen -filtered Total Kjeldahl nitrogen -filtered -whole Particulate organic nitrogen Nitrite -filtered -whole Nitrate -filtered -whole Nitrite and nitrate -filtered -whol e Ammonia -filtered -whole Units NTU m mg/L umhos/cm mg/L mg/L mg/L mg/L mg/L mg/L mg/L mg/L mg/L mg/L mg/L mg/L mg/L as as as as as as as as as as as as N N N N N N N N N N N N A 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 .5 .1 .5 .02 .1 .1 .00005 .0004 .0004 .001 .001 .001 .001 .001 .001 B 0 0. 0. -1 0 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1 5 60 19 02 1 1 A 300 1 10.0 250 ,000 100 ,000 25 2 2.1 2.1 00005 3 0004 0.2 0004 0.2 001 001 001 001 001 001 2 2 2 2 1 1 B ,000 10. 500 ,000 ,000 4 3 4 6 0. 0. 4 4 4 4 2 2 0 4 4 Total inorganic nitrogen mg/L as N 0.001 0.001

TABLE 4.
(Continued) Variable Total phosphorus -filtered -particulate -whol e Orthophosphate -filtered Inorganic phosphorus -whol e Organic phosphorus -filtered Organic carbon -filtered -whole Total carbon Total alkalinity Total silica filtered Chemical contaminants SEDIMENT Grain size -gravel -sand -silt -clay Total solids Total volatile solids Total organic carbon Oil and grease Units mg/L as P mg/L as P mg/L as P mg/L as P mg/L as P mg/L as P mg/L mg/L mg/L mg/L as CaC03 mg/L as Si mg/L % dry weight % dry weight % dry weight % dry weight % wet weight % dry weight % dry weight mg/kg dry weight Lower A 0.003 0.001 0.005 0.001 0.001 0.001 0.4 0.5 0.4 1 0.01 (see 0 1 1 1 5 0.1 0.1 5 Range B 0.003 0.001 0.002 0.001 0.001 0.001 0.4 0.5 0.4 1 0.01 Table 0 0 0 0 0 0 0 0 2, Limits Upper A 0. 0. 1 0. 0. 0. 10 20 30 125 3 5) 98 98 98 98 90 50 20 000 B 5 1 3 0.6 2 2 0.4 2 0.4 2 0.4 20 40 60 250 6 100 100 100 100 100 100 75 20,000 Chemical contaminants mg/kg dry weight (see Table 5) 36 ------- TABLE 4. (Continued) Variable Units Range Limits Lower Upper B BIOLOGICAL Benthic Invertebrates Area of sampler Sieve mesh size Species abundance Megainvertebrates Species abundance Tissue chemical contaminants Tissue total extractable lipids Demersal Fishes Net widthb Net mesh size** Fishing duration** Distance fished** Species abundance** Individual length Individual weight Tissue chemical contaminants Tissue total extractable lipids Phytoplankton Species abundance Chlorophyll a (corrected) mm #/m2 0.05 0.01 0.1 0.25 0.5 0.5 1.0 1.0 0 0 10,000 20,000 0 10 100 ;see Table 6) 0.1 20 100 #/m2 0 mg/kg wet weight % wet weight 0 m 3 mm 6 min 5 m 50 #/haul 0 mm (TL or SL) 2 g wet weight 1 mg/kg wet weight (see Table 6) % wet weight 0.1 0 20 100 1 0 0 10 0 0 0 9 50 30 2,000 100 600 5,000 15 100 60 5,000 500 1,000 10,000 #/mL ug/L 0 0 5,000 10,000 0.01 0.01 200 400 Bacteria Total col i forms -water -tissue Fecal col i forms -water -tissue MPN/100 mL MPN/100 g MPN/100 mL MPN/100 g 0 0 0 0 0 0 0 0 10,000 
100,000 1,000 10,000 10,000 100,000 1,000 10,000 a A = Range limit for unusual values. B = Range limit for unlikely values. b For collections made with otter trawls. 37 ------- TABLE 5. UPPER RANGE LIMITS FOR CHEMICAL CONTAMINANTS IN THE WATER COLUMN AND BOTTOM SEDIMENTS Variable3 Water Column** Sediment^ (mg/L) (mg/kg dry weight) A B B Phenols *phenol 2-methylphenol 4-methylphenol *2,4-dimethylphenol *2-chlorophenol *2,4-dichlorophenol *4-chloro-3-methylphenol *2,4,6-trichlorophenol 2,4,5-trichlorophenol *pentachlorophenol *2-nitrophenol *4-nitrophenol *2,4-dinitrophenol *4,6-dinitro-o-cresol Low Molecular Weight Aromatic Hydrocarbons *naphthalene *acenaphthylene *acenaphthene *fluorene *phenanthrene *anthracene ACID-EXTRACTABLE COMPOUNDS 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.5 10 10 10 10 10 10 10 10 10 10 10 10 10 10 100 100 100 100 100 100 100 100 100 100 100 100 100 100 BASE-NEUTRAL EXTRACTABLE COMPOUNDS 0.05 0.05 0.05 0.05 0.05 0.05 0.5 0.5 0.5 0.5 0.5 0.5 10 10 10 10 10 10 100 100 100 100 100 100 High Molecular Weight Aromatic Hydrocarbons *fluoranthene *pyrene *benzo(a)anthracene *chrysene *benzo(b)f1uoranthene *benzo(k)f1uoranthene *benzo(a)pyrene *indeno(l,2,3-c,d)pyrene *dibenzo(a,h)anthracene *benzo(g,h,i jperylene .05 .05 0.05 0.05 .05 .05 .05 .05 0.05 0.05 0. 0. 0. 0. 0. 0. 0.5 0.5 0.5 0.5 0.5 0 0 0 0 0.5 10 10 10 10 10 10 10 10 10 10 100 100 100 100 100 100 100 100 100 100 38 ------- TABLE 5. (Continued) Variable3 Water Column** (mg/L) A B Sediment** (mg/kg dry weight) A B Chlorinated Aromatic Hydrocarbons *1,3-di chlorobenzene *l,4-dichlorobenzene *l,2-dichlorobenzene *1,2,4-trichlorobenzene *2-chloronaphthalene *hexachlorobenzene (HCB) 0.05 0.05 0.05 0.05 0.05 0.05 0. 0. 0. 
0.5 0.5 0.5 10 0.5 3 0.5 0.1 1 100 5 30 5 1 10 Chlorinated Aliphatic Hydrocarbons *hexachloroethane *hexachlorobutadiene *hexachlorocyclopentadiene 0.05 0.05 0.05 0.5 0.5 0.5 1 1 0.1 10 10 1 Halogenated Ethers *bis(2-chloroethyl) ether 0.05 0.5 *bis(2-chloroisopropyl) ether 0.05 0.5 *bis(2-chloroethoxy)methane 0.05 0.5 *4-chlorophenyl phenyl ether 0.05 0.5 *4-bromophenyl phenyl ether 0.05 0.5 0.1 0.5 0.1 0.1 0.1 1 5 1 1 1 Phthalates *dimethyl phthalate 0.05 0.5 *diethyl phthalate 0.05 0.5 *di-n-butyl phthalate 0.05 0.5 *benzyl butyl phthalate 0.05 0.5 *bis(2-ethylhexyl)phthalate 0.05 0.5 *di-n-octyl phthalate 0.05 0.5 0. 0. 2 1 2 5 5 5 50 20 50 100 Miscellaneous Oxygenated Compounds *isophorone benzyl alcohol benzoic acid *2,3,7,8-tetrachlorodi- benzo-p-dioxin dibenzofuran 0.05 0.05 0.05 0.001 0.05 0.5 0.5 0.5 0.01 0.5 1 0.5 1 0.5 2 10 5 10 5 20 39 ------- TABLE 5. (Continued) Water Column^ (mg/L) Sediment'* (mg/kg dry weight) Variable3 Organonitrogen Compounds aniline *nitrobenzene *N-n i t roso-d i -n-propyl ami ne 4-chloroaniline 2-nitroaniline 3-nitroaniline 4-nitroani 1 ine *2,6-dinitrotoluene *2,4-dinitrotoluene *N-ni trosodi phenyl ami ne *N-ni trosodimethyl ami ne *1 ,2-di phenyl hydrazi ne *benzidine (4,4'-diamino bi phenyl ) *3,3'-dichlorobenzidine A 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 B 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 A 1 0.1 0.1 0.5 0.5 0.5 0.5 0.1 0.1 1 ? 6.1 0.1 0.1 B 10 1 1 5 5 5 5 1 1 10 7 i i i PESTICIDES AND PCBs Pesticides *p,p'-DDE *p,p'-DDD *p,p'-DDT *aldrin *dieldrin *chlordane *alpha-endosulfan *beta-endosulfan *endosulfan sulfate *endrin *endrin aldehyde *heptachlor *heptachlor epoxide *alpha-HCH *beta-HCH *delta-HCH *gamma-HCH (lindane) *toxaphene 0001 0001 0001 0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0001 0001 0001 0.005 0.001 0.001 0.001 0.0001 0. 0. 0. 
0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.05 0.01 0.01 0.01 0.001 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 40 ------- TABLE 5. (Continued) Water Column^ Sediment** (mg/L) (mq/kq dry weiqht) Variable3 PCBs *Aroclor 1016 *Aroclor 1221 *Aroclor 1232 *Aroclor 1242 *Aroclor 1248 *Aroclor 1254 *Aroclor 1260 Total PCBs A 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 0.000005 B 0.00005 0.00005 0.00005 0.00005 0.00005 0.00005 0.00005 0.00005 A 1 1 1 1 1 5 4 10 B 10 10 10 10 10 50 40 100 VOLATILE ORGANIC COMPOUNDS Volatile Halogenated Alkanes dichlorodi fluoromethane *chloromethane *bromomethane *chloroethane *methy1ene chloride (dichloromethane) f1uorotrichloromethane *1, l'-dichloroethane *chloroform *l,2-dichloroethane *l,l,l-trichloroethane *carbon tetrachloride *bromodichloromethane *1,2-di chloropropane *chlorodibromomethane *1,1,2-tri chloroethane *bromoform *1,1,2,2-tetrachloroethane 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.5 0.5 0.5 0.5 0. 0. 0. 0. 0. 0. 0, 0. 0. 0.5 0.5 0.5 0.5 0.1 0.1 0.1 0.1 10 0.1 0.1 1.0 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 1,000 1 1 10 1 1 1 1 1 1 1 1 1 Volatile Halogenated Alkenes *vinyl chloride *l,r-dichloroethene *trans-l,2-dichloroethene cis and trans-l,3-dichloro- propene *trichloroethene *tetrachloroethene 0.05 0.05 0.05 0.05 0.05 0.05 0.5 0.5 0.5 0.5 0.5 0.5 0.1 0.1 0.1 0.1 0.1 1 1 1 1 1 1 10 41 ------- TABLE 5. (Continued) Variable3 Water Column** (mg/L) A B Sediment^ (mg/kg dry weight) A B Volatile Aromatic Hydrocarbons *benzene *toluene *ethylbenzene styrene (ethenylbenzene) total xylenes *chlorobenzene 0, 0. 0. 05 05 05 0.05 0.05 0.05 0. 0. 0. 0. 
0.5 0.5 0.1 0.1 0.5 0.5 1 0.1 1 1 5 5 10 1 Volatile Unsaturated Carbonyl Compounds *acrolein *acrylonitrile 0.05 0.05 0.5 0.5 0.1 0.1 Volatile Ethers bis(chloromethyl)ether *2-chloroethylvinyl ether 0.05 0.05 0.5 0.5 0.1 0.1 Volatile Ketones acetone 2-butanone 2-hexanone 4-methyl-2-pentanone 0.05 0.05 0.05 0.05 0.5 0.5 0.5 0.5 0.1 0.1 0.1 0.1 Miscellaneous Volatile Compounds carbon disulfide vinyl acetate aluminum *antimony *arsenic *beryllium *cadmium *chromium 0.05 0.05 0.05 0.05 0.05 0.1 0.1 0.5 0.5 METALS 100 0.5 0.5 0.5 1 1 0. 0. ,000 20 200 1 50 500 1 1 1 1 500,000 5,000 100,000 100 1,000 50,000 42 ------- TABLE 5. (Continued) Variable3 Water Columnb (mg/L) A B Sediment** (mg/kg dry weight) A B *copper *lead *mercury *nicke1 *selenium *silver *thal Hum *zinc *cyanide iron 0.1 0.6 0.001 0.6 0.05 0.1 0.05 0.1 10 — 1 2 0.01 2 0.5 1 0.5 1 100 — 500 1,000 1 100 5 5 1 1,000 0. 100,000 500,000 100,000 500 5,000 500 500 100 100,000 5 100 500,000 a Each U.S. EPA priority pollutant is preceded by an asterisk. b A = Range limit for unusual values. B = Range limit for unlikely values. 43 ------- TABLE 6. UPPER RANGE LIMITS FOR CHEMICAL CONTAMINANTS IN MUSCLE AND LIVER TISSUE Variable3 Muscle Tissue** (mg/kg wet weight) A B Liver Tissueb (mg/kg wet weight) A B Phenols *phenol 2-methylphenol 4-methylphenol *2,4-dimethylphenol *2-chlorophenol *2,4-d i chlorophenol *4-chl oro-3-methylphenol *2,4,6-trichlorophenol 2,4,5-trichlorophenol *pentachlorophenol *2-nitrophenol M-nitrophenol *2,4-dinitrophenol *4,6-di ni tro-o-cresol Low Molecular Weight Aromatic Hydrocarbons *naphthalene *acenaphthylene *acenaphthene *fluorene *phenanthrene *anthracene ACID-EXTRACTABLE COMPOUNDS 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.01 0.005 0.005 0.005 0.005 0. 0. 0.05 0.05 0.05 0.05 .05 .05 0.05 0.05 0.05 0.1 0.05 0.05 0.05 0.05 0. 0. 0. 0. 0. 0. 0. 0. 0.1 0. 0. 0. 0. 0.1 BASE-NEUTRAL EXTRACTABLE COMPOUNDS 0.1 0.01 0.01 0.01 0.1 0.01 2 1 1 1 1 1 0. 0. 0. 0. 0. 
0.2 2 2 2 2 2 2

High Molecular Weight Aromatic Hydrocarbons
*fluoranthene, *pyrene, *benzo(a)anthracene, *chrysene, *benzo(b)fluoranthene, *benzo(k)fluoranthene, *benzo(a)pyrene, *indeno(1,2,3-c,d)pyrene
0. 0. 0. 0. 0. 0. 0.1 0.1 0. 0. 0. 0. 0. 0. 0. 0.1
*dibenzo(a,h)anthracene               0.1    1      0.1    1
*benzo(g,h,i)perylene                 0.1    1      0.1    1

Chlorinated Aromatic Hydrocarbons
*1,3-dichlorobenzene                  0.02   0.2    0.1    1
*1,4-dichlorobenzene                  0.02   0.2    0.1    1
*1,2-dichlorobenzene                  0.02   0.2    0.1    1
*1,2,4-trichlorobenzene               0.02   0.2    0.1    1
*2-chloronaphthalene                  0.02   0.2    0.1    1
*hexachlorobenzene (HCB)              0.02   0.5    1.0    10

Chlorinated Aliphatic Hydrocarbons
*hexachloroethane                     0.05   0.5    0.2    2
*hexachlorobutadiene                  0.1    1      1      10
*hexachlorocyclopentadiene            0.05   0.5    0.2    2

Halogenated Ethers
*bis(2-chloroethyl) ether             0.01   0.1    0.5    5
*bis(2-chloroisopropyl) ether         0.01   0.1    0.5    5
*bis(2-chloroethoxy)methane           0.01   0.1    0.5    5
*4-chlorophenyl phenyl ether          0.01   0.1    0.5    5
*4-bromophenyl phenyl ether           0.01   0.1    0.5    5

Phthalates
*dimethyl phthalate                   0.01   0.1    0.05   0.5
*diethyl phthalate                    0.01   0.1    0.05   0.5
*di-n-butyl phthalate                 0.01   0.1    0.5    5
*benzyl butyl phthalate               0.01   0.1    0.05   0.5
*bis(2-ethylhexyl)phthalate           1      10     0.05   0.5
*di-n-octyl phthalate                 1      10     0.05   0.5

Miscellaneous Oxygenated Compounds
*isophorone, benzyl alcohol, benzoic acid, *2,3,7,8-tetrachlorodibenzo-p-dioxin, dibenzofuran
Organonitrogen Compounds
aniline, *nitrobenzene, *N-nitroso-di-n-propylamine, 4-chloroaniline, 2-nitroaniline, 3-nitroaniline, 4-nitroaniline, *2,6-dinitrotoluene, *2,4-dinitrotoluene, *N-nitrosodiphenylamine, *N-nitrosodimethylamine, *1,2-diphenylhydrazine, *benzidine (4,4'-diaminobiphenyl), *3,3'-dichlorobenzidine
0.01 0.01 0.01 0.001 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.1 0.1 0.1 0.01 0.1 0.1 0.1 01 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.05 0.05 0.05 0.005 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.05 0.5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

PESTICIDES AND PCBs

Pesticides
*p,p'-DDE, *p,p'-DDD, *p,p'-DDT, *aldrin, *dieldrin, *chlordane, *alpha-endosulfan, *beta-endosulfan, *endosulfan sulfate, *endrin, *endrin aldehyde, *heptachlor, *heptachlor epoxide
.5 .5 0.005 0.01 0.1 0.01 0.01 0.01 0.01 0.01 0.01 0.01 20 10 10 0.05 0.1 1 0.1 0.1 0. 0. 0. 0. 50 1 50 0. 0. 0. 0. 1 1 1 01 0.1 0.01 0.01 0.01 0.01 0.01 0.01 500 10 500 1 1 1 0. 0. 0. 0. 0. 0. 0.1
*alpha-HCH                            0.01   0.1    0.01   0.1
*beta-HCH                             0.01   0.1    0.01   0.1
*delta-HCH                            0.01   0.1    0.01   0.1
*gamma-HCH (lindane)                  0.01   0.1    0.01   0.1
*toxaphene                            0.01   0.1    0.01   0.1

PCBs
*Aroclor 1016                         0.     5      5      50
*Aroclor 1221                         0.     5      5      50
*Aroclor 1232                         0.     5      5      50
*Aroclor 1242                         0.     5      5      50
*Aroclor 1248                         0.     5      5      50
*Aroclor 1254                         1      10     10     100
*Aroclor 1260                         1      10     10     100
Total PCBs                            2      20     20     200

VOLATILE ORGANIC COMPOUNDS

Volatile Halogenated Alkanes
dichlorodifluoromethane, *chloromethane, *bromomethane, *chloroethane, *methylene chloride (dichloromethane), fluorotrichloromethane, *1,1-dichloroethane, *chloroform, *1,2-dichloroethane, *1,1,1-trichloroethane, *carbon tetrachloride, *bromodichloromethane, *1,2-dichloropropane, *chlorodibromomethane, *1,1,2-trichloroethane, *bromoform, *1,1,2,2-tetrachloroethane
0.005 0.005 0.005 0.005 0. 0. 0.005 0.005 .005 .005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.005 0.05 0.05 0.05 0.05 0.05 0.05 .05 .05 .05 .05 0.05 0.05 .05 .05 0.05 0.05 0.05 0. 0. 0. 0. 0. 0. 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0 0 0 0 0 0 0 0 0.5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

Volatile Halogenated Alkenes
*vinyl chloride                       0.005  0.05   0.5    5
*1,1-dichloroethene                   0.005  0.05   0.5    5
*trans-1,2-dichloroethene             0.005  0.05   0.5    5
*cis- and trans-1,3-dichloropropene   0.005  0.05   0.5    5
*trichloroethene                      0.005  0.05   0.5    5
*tetrachloroethene                    0.01   0.1    0.5    5

Volatile Aromatic Hydrocarbons
*benzene                              0.01   0.1    0.5    5
*toluene                              0.01   0.1    0.5    5
*ethylbenzene                         0.01   0.1    0.5    5
styrene (ethenylbenzene)              0.01   0.1    0.5    5
total xylenes                         0.01   0.1    0.5    5
*chlorobenzene                        0.01   0.1    0.5    5

Volatile Unsaturated Carbonyl Compounds
*acrolein                             0.05   0.5    0.5    5
*acrylonitrile                        0.05   0.5    0.5    5

Volatile Ethers
bis(chloromethyl)ether                0.005  0.05   0.5    5
*2-chloroethylvinyl ether             0.005  0.05   0.5    5

Volatile Ketones
acetone                               0.005  0.05   0.5    5
2-butanone                            0.005  0.05   0.5    5
2-hexanone                            0.005  0.05   0.5    5
4-methyl-2-pentanone                  0.005  0.05   0.5    5

Miscellaneous Volatile Compounds
carbon disulfide                      0.005  0.05   0.5    5
vinyl acetate                         0.005  0.05   0.5    5

METALS
aluminum, *antimony, *arsenic, *beryllium, *cadmium, *chromium, *copper, *lead, *mercury, *nickel, *selenium, *silver, *thallium, *zinc
1 5 1 5 5 20 5 0.5 1 1 1 1 50 10 50 10 50 50 200 50 5 10 10 10 10 500 5 10 5 5 5 50 5 20 5 5 5 5 100 50 100 50 50 50 500 50 20 50 50 50 50 1,000

a Each U.S. EPA priority pollutant is preceded by an asterisk.
b A = Range limit for unusual values. B = Range limit for unlikely values.

species abundance) was considered at a general level for all groups except bacteria. Species-specific range limits could not be developed from a national perspective because species composition differs among estuaries. Differences in species composition are most dramatic between East Coast and West Coast estuaries.

NATIONAL QUALITY REVIEW

Automated quality review checks will be made of all historical data sets included in the database of the National Estuary Program. Checks will be made for proper formats and codes, critical data requirements, and range limits. These checks will be made from a national perspective, using the specifications presented in this document. Each data value identified by the automated checks will have a qualifier permanently attached to it but will otherwise remain intact in the database. After the automated quality review, data sets will be made available to the regional offices for use in characterizing their respective estuaries. The regional offices will decide how to treat qualified data values and will have the option of conducting a more rigorous evaluation of the data.

REGIONAL QUALITY REVIEW

After estuarine data sets have been reviewed and qualified at the national level, the regional offices may conduct additional evaluations before the data are used. This section presents general guidance for conducting these additional evaluations using both automated checks and technical review.
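The automated range-limit check of the national quality review, and the estuary-specific narrowing of limits discussed under Automated QA/QC Checks, can be sketched briefly. The following is a minimal Python illustration under stated assumptions, not the program's actual software: the variable keys, the `RangeLimit` record, and the `qualify`, `qualify_records`, and `regional_dictionary` helpers are all hypothetical. It assumes that a value strictly greater than a column A limit of Table 6 is qualified as "unusual" and a value greater than a column B limit as "unlikely," and it treats the 200 m national depth limit as an "unlikely" limit, which is also an assumption. Qualifiers are stored beside each value rather than replacing it, so the data themselves remain intact.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RangeLimit:
    """Upper range limits for one variable (None = no limit of that kind)."""
    unusual: Optional[float] = None   # analogous to column A of Table 6
    unlikely: Optional[float] = None  # analogous to column B of Table 6


# Hypothetical dictionary keys. The benzene and hexachlorobutadiene values are
# the muscle-tissue limits (mg/kg wet weight) from Table 6; treating the 200 m
# national depth limit as an "unlikely" limit is an assumption.
NATIONAL_DICTIONARY: Dict[str, RangeLimit] = {
    "benzene_muscle": RangeLimit(unusual=0.01, unlikely=0.1),
    "hexachlorobutadiene_muscle": RangeLimit(unusual=0.1, unlikely=1.0),
    "depth_m": RangeLimit(unlikely=200.0),
}


def qualify(variable: str, value: float,
            dictionary: Dict[str, RangeLimit]) -> Optional[str]:
    """Return "unlikely", "unusual", or None for a single value."""
    limits = dictionary.get(variable)
    if limits is None:
        return None  # no range limits defined for this variable
    if limits.unlikely is not None and value > limits.unlikely:
        return "unlikely"
    if limits.unusual is not None and value > limits.unusual:
        return "unusual"
    return None


def qualify_records(records: List[Tuple[str, float]],
                    dictionary: Dict[str, RangeLimit]
                    ) -> List[Tuple[str, float, Optional[str]]]:
    """Attach a qualifier to each (variable, value) pair; values are unchanged."""
    return [(var, val, qualify(var, val, dictionary)) for var, val in records]


def regional_dictionary(national: Dict[str, RangeLimit],
                        overrides: Dict[str, RangeLimit]) -> Dict[str, RangeLimit]:
    """Copy the national dictionary, then replace or add estuary-specific entries."""
    tuned = dict(national)
    tuned.update(overrides)
    return tuned


if __name__ == "__main__":
    # Chesapeake Bay: maximum depth is less than 70 m, so narrow the depth limit.
    chesapeake = regional_dictionary(
        NATIONAL_DICTIONARY, {"depth_m": RangeLimit(unlikely=70.0)})
    records = [("depth_m", 85.0), ("benzene_muscle", 0.05)]
    print(qualify_records(records, NATIONAL_DICTIONARY))
    # [('depth_m', 85.0, None), ('benzene_muscle', 0.05, 'unusual')]
    print(qualify_records(records, chesapeake))
    # [('depth_m', 85.0, 'unlikely'), ('benzene_muscle', 0.05, 'unusual')]
```

Copying the national dictionary before applying overrides keeps the national specifications unchanged, so each regional office can maintain its own tuned version alongside them.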
Automated QA/QC Checks

Because all data sets of the National Estuary Program should have a standard format and coding system when they are made available to the regional offices, additional evaluations can readily be conducted with automated checks. The most effective approach might be to modify the quality review dictionary that has been developed for review at the national level. By "fine tuning" the existing dictionary to represent the characteristics of individual estuaries, the regional offices can greatly enhance the effectiveness of the automated quality review checks. Examples of modifications include the following:

• Addition of new variables to the list of estuarine variables.
• Specification of additional critical data requirements for individual variables.
• Adjustment of the range limits for each variable to represent more precisely the conditions encountered in individual estuaries.

Because the list of estuarine variables was limited to variables commonly measured in most estuaries, variables measured primarily in a single estuary are not included. However, these estuary-specific variables may be important for characterizing conditions in a particular estuary. For example, hepatic lesions in demersal fish have been used routinely as indicators of biological effects in Puget Sound but are used much more rarely in other estuaries. Thus, liver pathology should probably be added to the list of standard estuarine variables when historical information for Puget Sound is evaluated.

Narrowing the range limits for each variable would make the quality review checks more precise. For example, the upper range limit for depth from a national perspective is 200 m, because depths in Puget Sound sometimes exceed that value. However, the maximum depth in Chesapeake Bay is less than 70 m.
Thus, although depths of 70-200 m cannot occur in Chesapeake Bay, they would not be flagged as erroneous during the initial quality review.

The greatest benefit from developing estuary-specific quality review dictionaries might be the ability to set species-specific criteria for all groups of organisms (e.g., phytoplankton, benthic invertebrates, megainvertebrates, fishes). As noted earlier, these kinds of criteria generally cannot be developed from a national perspective. A species list for an estuarine study could be examined to detect species known not to occur in that estuary. In addition, different range limits could be set for species that are always rare and for species that are sometimes or always abundant in a particular estuary.

Technical Evaluations

In addition to conducting automated quality review checks, the regional offices may elect to have technical experts examine historical estuarine data sets. In some cases, these technical evaluations may require examination of original documents (e.g., reports, laboratory notebooks, data sheets). For many historical data sets, the amount of information available for a detailed technical review will be limited. The kinds of information that may be required for a technical evaluation are discussed below.

Field Collection

Because field collection techniques can substantially influence the results of subsequent data analyses, those techniques should be evaluated as closely as possible. The evaluation should attempt to verify the following items (if applicable) for each data set:

• Navigation was sufficiently accurate to ensure that the sample was collected at the appropriate location.
• Collection containers and devices were cleaned properly before sample collection.
• Collection devices were operated properly.
• Samples were collected in a representative manner.
• Samples were preserved, stored, and transported properly, so that sample integrity was maintained.

The information needed to verify the above items generally can be found in final reports, cruise reports, field logbooks, and chain-of-custody documents.

Biological Laboratory Analyses

The primary biological measurement for most groups of organisms is the number of individuals. Additional measurements often include biomass and size of organisms. A key concern for all of these measurements is accurate identification of organisms. Technical evaluation of biological laboratory analyses might focus on the following considerations:

• Benthic sorting efficiency.
• Subsampling representativeness.
• Taxonomic accuracy.
• Taxonomic representativeness.
• Interlaboratory comparisons.

Physical and Chemical Laboratory Analyses

The appropriate level of technical review for physical and chemical variables depends on the variable under consideration. For example, review of temperature measurements made with a thermometer may require only verification that the instrument was calibrated against a standard thermometer. By contrast, evaluation of measurements of U.S. EPA priority pollutant organic compounds may require data on extraction efficiency, recovery of spiked compounds, blanks, and replicate samples. Technical evaluation of physical and chemical laboratory analyses might focus on the following considerations:

• Holding times.
• Analytical methods.
• Method modifications.
• Analyses of replicates.
• Analyses of blanks.
• Analyses of spikes.
• Analyses of standard reference materials.
• Instrument calibrations.
• Laboratory audits.
• Interlaboratory comparisons.
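For the spike and replicate analyses listed above, two commonly used summary statistics are the percent recovery of a spiked compound and the relative percent difference (RPD) between replicate results. The sketch below assumes the usual textbook definitions of these quantities; the function names are hypothetical, and because this report does not specify acceptance criteria, the acceptance window is left as a parameter for the reviewer to set. The windows in the usage lines are illustrative only.

```python
def percent_recovery(spiked_result: float, unspiked_result: float,
                     amount_added: float) -> float:
    """Percent recovery of a spike: (spiked - unspiked) / amount added x 100.

    All three arguments must be in the same units (e.g., mg/kg wet weight).
    """
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return 100.0 * (spiked_result - unspiked_result) / amount_added


def relative_percent_difference(a: float, b: float) -> float:
    """RPD between two replicate results: |a - b| / mean(a, b) x 100."""
    if a == b:
        return 0.0  # identical replicates, including two zero results
    mean = (a + b) / 2.0
    if mean == 0:
        raise ValueError("RPD is undefined when the replicate mean is zero")
    return 100.0 * abs(a - b) / mean


def outside_window(statistic: float, low: float, high: float) -> bool:
    """True if a QC statistic falls outside the reviewer's acceptance window."""
    return not (low <= statistic <= high)


if __name__ == "__main__":
    recovery = percent_recovery(spiked_result=2.5, unspiked_result=0.5,
                                amount_added=2.5)
    rpd = relative_percent_difference(9.0, 11.0)
    print(recovery, rpd)  # 80.0 20.0
    # Hypothetical acceptance windows, chosen only for illustration.
    print(outside_window(recovery, 50.0, 150.0), outside_window(rpd, 0.0, 15.0))
    # False True
```

Keeping the acceptance window outside the calculations lets each regional office apply whatever criteria are appropriate to the method and era of a given historical data set.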