NIST Position Statement on the Conduct of Key Comparisons

Approved by the Measurement Services Advisory Group, December 16, 2003

Introduction:

To facilitate international trade beneficial to U.S. industry, NIST participates in international interlaboratory comparisons, called Key Comparisons (KCs), to assess the degree of equivalence of measurement standards used at different National Metrology Institutes (NMIs). Having participated in nearly 250 comparisons and piloted nearly 70, NIST technical staff has asked for the development of a NIST-wide position on several points of the CIPM Mutual Recognition Arrangement (CIPM MRA) that are open to interpretation. Questions have arisen regarding the following issues:

How definitive should the design protocol of a comparison be with regard to the intended statistical analysis?
Is there any leeway in changing results after they have been submitted for Draft A of the comparison report?
What is the status of the key comparison reference value (KCRV), and what constitutes an "exceptional" case in which the calculation of a KCRV does not make technical sense?
Is a single approach to the KCRV estimation required for all comparisons?
What is the relationship between the approval process for a Key Comparison Final Report and the review of relevant Calibration and Measurement Capabilities (CMCs)?

This NIST Position Statement seeks to answer these questions. Recognizing that there are both stochastic and non-stochastic elements in all interlaboratory comparisons, a companion document, "NIST Statement on Statistical Principles for the Design and Analysis of Key Comparisons," also gives guidelines to ensure that the design and analysis of Key Comparisons in which NIST participates will yield clearly interpretable estimates of the difference between measurements and standards of the NMIs and the associated uncertainties of these differences.

Statement of Position:

The conduct of key comparisons

The document Guidelines for Key Comparisons is not part of the CIPM MRA. However, Section T.6 of the MRA calls for use of the Guidelines when carrying out Key Comparisons. Key comparisons will be carried out according to the Guidelines, with particular reference to the following points:
- NIST specifically endorses Section 6, which states that the technical protocol will include a "list of the principal components of the uncertainty budget to be evaluated by each participant, and any necessary advice on how uncertainties are estimated." Moreover, the technical protocol should include both a statistical design and the intended approach to the statistical analysis of results.
- NIST specifically endorses Section 9, which defines how the integrity of the results is to be maintained through the development of Drafts A, B, and Final versions of the Key Comparison Report. Before Draft A of the report is prepared, each participant must submit its result. As stated in the Guidelines, a "result from a participant is not considered complete without an associated uncertainty, and is not included in the draft report unless it is accompanied by an uncertainty supported by a complete uncertainty budget. Uncertainties are drawn up following the guidance given in the technical protocol." Before Draft A of the report is developed, apparent anomalies can be reported to the relevant participants. The corresponding institutes are invited to check their results for numerical errors but without being informed as to the magnitude or sign of the apparent anomaly. If no numerical error is found the result stands and the complete set of results is sent to all participants. Note that once all participants have been informed of the results, individual values and uncertainties may be changed or removed, or the complete comparison abandoned, only with the agreement of all participants and on the basis of a clear failure of the traveling standard or some other phenomenon that renders the comparison or part of it invalid.
- Although the Guidelines implies that every Key Comparison must have a Key Comparison Reference Value, NIST specifically endorses the language in the CIPM MRA, Section T.3, which states that "in some exceptional cases a Consultative Committee may conclude that for technical reasons a reference value for a particular key comparison is not appropriate." If experts in the comparison working group agree that such technical reasons exist, then the exceptional case exists and no KCRV is calculated. In this case, "the results are then expressed directly in terms of the degrees of equivalence between pairs of standards."
- Section T.3 of the CIPM MRA also states that "although a key comparison reference value is normally a close approximation to the corresponding SI value, it is possible that some of the values submitted by individual participants may be even closer. In a few instances, for example in some chemical measurements, there may be difficulty in relating results to the SI. Nevertheless, the key comparison reference value and deviations from it are good indicators of the SI value." There are completed KCs that, for one reason or another, have KCRVs that do not fulfill this characterization. For instance, when a KC transfer standard drifts, and the drift is modeled and participant results are adjusted according to this model, the KCRV may bear no relation to a corresponding SI value (cf. CCEM-K4). NIST recognizes that such KCRVs have no intrinsic value other than as a convenient summary of the ensemble of those specific KC results.
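The drift adjustment mentioned in the last point, and the pairwise degrees of equivalence used when no KCRV is calculated, can be sketched together in Python. This is a minimal illustration, not the procedure used in CCEM-K4 or in any other comparison: it assumes a straight-line drift fitted by ordinary least squares to hypothetical pilot check measurements, an arbitrarily chosen reference epoch t_ref, uncorrelated participant results, and a coverage factor k = 2. It also ignores the uncertainty of the fitted slope and the correlations that a common drift correction introduces. All function names are hypothetical.

```python
import math
from itertools import combinations


def fit_linear_drift(times, values):
    """Ordinary least-squares fit of values = a + b * t to the pilot's
    repeated measurements of the transfer standard. Returns (a, b)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    s_tt = sum((t - t_bar) ** 2 for t in times)
    s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    slope = s_ty / s_tt
    return y_bar - slope * t_bar, slope


def drift_adjust(result, t_meas, slope, t_ref):
    """Refer a result measured at time t_meas to the reference epoch t_ref
    by removing the modeled drift. t_ref is an arbitrary choice, so values
    adjusted this way need not approximate any SI value."""
    return result - slope * (t_meas - t_ref)


def pairwise_doe(values, std_unc, k=2.0):
    """Pairwise degrees of equivalence for the no-KCRV case:
    d_ij = x_i - x_j and U_ij = k * sqrt(u_i**2 + u_j**2).
    Assumes uncorrelated results; d_ji is -d_ij with the same U_ij."""
    table = {}
    for i, j in combinations(sorted(values), 2):
        d = values[i] - values[j]
        u = math.sqrt(std_unc[i] ** 2 + std_unc[j] ** 2)
        table[(i, j)] = (d, k * u)
    return table


# Hypothetical pilot checks of the transfer standard (time in days).
intercept, slope = fit_linear_drift([0, 10, 20, 30], [100.00, 100.01, 100.02, 100.03])
adjusted = {
    "NMI-1": drift_adjust(100.012, t_meas=12, slope=slope, t_ref=0),
    "NMI-2": drift_adjust(100.018, t_meas=25, slope=slope, t_ref=0),
}
print(pairwise_doe(adjusted, {"NMI-1": 0.004, "NMI-2": 0.003}))
```

In this sketch t_ref cancels out of every pairwise difference, although the differences still depend on the fitted slope; a KCRV formed from the adjusted values, by contrast, inherits the arbitrary choice of t_ref and so serves only as a summary of these particular results.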

The analysis of key comparison results

Key comparisons necessarily involve different designs across the wide variety of metrological areas in which they are conducted. Moreover, the sources of uncertainty, let alone their quantitative estimates, differ among measurement methods and therefore among KCs; this may be true even within a single KC when a variety of measurement methods are employed. NIST notes that Section 6 of the "Guidelines for Key Comparisons" asserts "that the purpose of a key comparison is to compare the standards as realized in the participating institutes, not to require each participant to adopt precisely the same conditions of realization. The protocol should, therefore, specify the procedures necessary for the comparison, but not the procedures used for the realization of the standards being compared." Consequently, it is NIST's position that a single approach to developing summary statistics cannot be adopted for all KCs.

The interpretation of key comparison results

Key comparison results are intended to support the statements of Calibration and Measurement Capabilities (CMCs) listed in Appendix C of the CIPM MRA. Degrees of equivalence derived from the analysis of KC results should be consistent with the uncertainties listed in participants' CMCs. However, KC protocols may not exactly match the conditions under which a participant delivers its calibration or measurement service. Key Comparisons necessarily involve transfer standards, which may introduce components of uncertainty unique to the KC. Therefore, the uncertainties of degrees of equivalence developed from a KC may in fact exceed the uncertainties of a participant's relevant CMCs without automatically invalidating those CMCs.
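A short Python sketch can make the point about transfer standards concrete. It assumes the common form d_i = x_i - KCRV with expanded uncertainty U_i = k * sqrt(u_i^2 + u_tr^2 + u_KCRV^2), where u_tr is a hypothetical component attributed only to the transfer standard, the participant's result is assumed uncorrelated with the KCRV, and k = 2. The function name and all numbers are illustrative and are not taken from any comparison.

```python
import math


def unilateral_doe(x_i, u_i, kcrv, u_kcrv, u_transfer=0.0, k=2.0):
    """Degree of equivalence of one participant with respect to the KCRV.

    Returns (d_i, U_i) with d_i = x_i - kcrv and
    U_i = k * sqrt(u_i**2 + u_transfer**2 + u_kcrv**2).
    Assumes x_i is uncorrelated with the KCRV (e.g. the participant did not
    contribute to it); otherwise a covariance term would be needed.
    u_transfer stands for effects of the travelling standard (instability,
    transport) that do not arise in the participant's CMC service.
    """
    d = x_i - kcrv
    u = math.sqrt(u_i ** 2 + u_transfer ** 2 + u_kcrv ** 2)
    return d, k * u


# Hypothetical case: the CMC expanded uncertainty is 0.020 and the
# participant's own standard uncertainty in the KC (0.010) is consistent with it.
U_cmc = 0.020
d, U = unilateral_doe(x_i=5.012, u_i=0.010, kcrv=5.000, u_kcrv=0.004, u_transfer=0.008)
# U exceeds U_cmc only because of u_transfer and u_kcrv, not because the
# participant's measurement capability is worse than claimed.
print(f"d = {d:.4f}, U = {U:.4f}, U_cmc = {U_cmc:.4f}")
```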
It is NIST's position to follow the recommendation in JCRB Document 9/12 (Revised 4 October 2002) that it is the ongoing responsibility of the Working Group on CMCs within each Consultative Committee to monitor the results of key and supplementary comparisons and to provide a written report to the JCRB in the case that these results appear to contradict published CMCs. The relevant Regional Metrology Organization (RMO) representative to the JCRB transmits this report as appropriate within the RMO. It is the responsibility of the NMI providing the CMCs to notify the KCDB Coordinator in order to undertake appropriate action. Such action may involve increasing the uncertainties of CMCs or withdrawing CMCs. The relevant RMO will keep the JCRB informed of the status of such CMCs. Furthermore, it is NIST's position that the process of review and publication of a KC Final Report should not be delayed in any way because of questions related to CMCs.

NIST Statement on Statistical Principles for the Design and Analysis of Key Comparisons

Introduction:

To facilitate international trade beneficial to U.S. industry, NIST participates in international interlaboratory comparisons, called Key Comparisons, to assess the equivalence of measurement standards used at different National Metrology Institutes. Because Key Comparisons affect both scientific and economic decisions made by different countries, there are clearly defined procedures governing their conduct. The primary document governing Key Comparisons is the Mutual Recognition Arrangement (MRA) developed by the CIPM. In addition, NIST has developed a "Position on the Conduct of Key Comparisons," which offers guidance on several points of the MRA that are open to interpretation for NIST participants in Key Comparisons. Having participated in nearly 250 comparisons and piloted nearly 70, NIST technical staff has asked for a clear articulation of the statistical principles that are central to the design, implementation, analysis and interpretation of Key Comparisons and Supplementary Comparisons. Questions have arisen regarding the following issues:

What are the requirements in designing a Key Comparison to assure a clear interpretation from the data once the comparison is completed?
What are the conditions for a statistical analysis of a Key Comparison to be valid?
When is the statistical analysis of a Key Comparison complete?
Is there a single statistical approach to the analysis of a Key Comparison or to the estimation of a reference value (KCRV) or the estimation of degrees of equivalence?

This NIST Statement identifies statistical principles for different types of Key Comparisons that should be followed to ensure that the comparisons in which NIST participates will be clearly interpretable. Interpretability requires statistically sound estimates of the various quantities of interest, including reference values and degrees of equivalence between measurement standards maintained by different NMIs, each with its associated uncertainty. Interpretation also extends to the statistical basis for addressing unexplained deviations, whether in individual observations or in the collective observations from a particular NMI, and to statistically sound methods for combining information from Key and Regional Comparisons in order to address differences between NMIs participating in separate but linked comparisons.

Information on sound statistical procedures and/or methodologies and established statistical practices can be found in the archival and applied journals of statistical societies, other technical and educational publications and reputable statistical software in both the commercial and public domains.

Statement of Principles:

The statistical premises for Key Comparisons

Recognizing that there are both stochastic and non-stochastic elements in all interlaboratory comparisons, the general goal of the analysis of Key Comparison data is to draw statistical inferences. As a particular example, degrees of equivalence among measurements and measurement standards for the various NMIs, with their associated uncertainties, must be determined on the basis of sound statistical procedures.
As expressed in Sections 6 and 9 of the Guidelines for Key Comparisons and endorsed by the NIST Position Statement on the Conduct of Key Comparisons, integrity of the data is essential to the interpretability of a Key Comparison. Thus a prerequisite to the inclusion of an NMI's data in the analysis is the complete submission of the data with an attendant detailed uncertainty budget. Similarly, according to the Guidelines, the integrity of the Key Comparison analysis is protected by explicit documentation of any changes to the data (e.g., changes that may occur when the data are reviewed before Draft A is prepared).
Open accessibility of all data and uncertainty budgets permits alternate or expanded analyses, as these may be appropriate and may serve as additional validation of the conclusions.
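As a sketch of how a pilot laboratory might apply these integrity rules before Draft A, the Python below admits only results that carry an uncertainty and a budget covering the components named in the technical protocol, and then returns a yes/no anomaly flag for each admitted result, so that a participant can be asked to check for numerical errors without learning the magnitude or sign of the deviation. The record layout, the component names, and the flagging rule (distance from the median greater than k times the participant's own standard uncertainty, with k = 3) are hypothetical choices made for illustration; the Guidelines do not prescribe them.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from statistics import median


@dataclass
class Submission:
    """One participant's reported result (hypothetical record layout)."""
    nmi: str
    value: float
    std_uncertainty: float | None = None
    # component name -> standard uncertainty, in the same unit as value
    budget: dict[str, float] = field(default_factory=dict)


def is_complete(sub: Submission, required_components: set[str]) -> bool:
    """True only if the result has a positive standard uncertainty and a
    budget that covers every component required by the protocol."""
    return (
        sub.std_uncertainty is not None
        and sub.std_uncertainty > 0
        and required_components.issubset(sub.budget)
    )


def anomaly_flags(complete: list[Submission], k: float = 3.0) -> dict[str, bool]:
    """Yes/no flag per NMI (hypothetical rule); call only on complete
    submissions. Neither the size nor the sign of the deviation is returned."""
    center = median(s.value for s in complete)
    return {
        s.nmi: abs(s.value - center) > k * s.std_uncertainty
        for s in complete
    }


required = {"repeatability", "reference"}
budget = {"repeatability": 0.0005, "reference": 0.0008}
subs = [
    Submission("NMI-A", 10.0020, 0.0010, dict(budget)),
    Submission("NMI-B", 10.0015, 0.0010, dict(budget)),
    Submission("NMI-C", 10.0300, 0.0020, dict(budget)),
    Submission("NMI-D", 10.0010, 0.0015, dict(budget)),
    Submission("NMI-E", 10.0012),  # no uncertainty budget: not admitted
]
complete = [s for s in subs if is_complete(s, required)]
print(anomaly_flags(complete))
```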

The statistical design of Key Comparisons

The statistical design of the Key Comparison should conform to established principles of sound statistical design of experiments. From the outset, a specific statistical analysis should be posited to ensure that unbiased estimates of the degrees of equivalence and the reference value, and of their associated uncertainties, will be possible; the eventual analysis, however, should not be limited to this particular method. It is established statistical practice to construct the design both for (statistical) efficiency and for robustness against any loss of data or the effects of unforeseen factors.
Since each Key Comparison involves a specific metrology, the statistical design of the Key Comparison needs to be individualized to reflect the particular metrological requirements and practical constraints of each comparison. Statistical design features include, among other things, factors affecting the measurement process, artifact attributes, replication, and randomization (where feasible).
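One way to build replication and randomization into a circulation plan is sketched below in Python, under purely illustrative assumptions: a hypothetical pilot re-measures the travelling standard at the start and after every few participants, so that drift or damage can be detected from the pilot's own repeated data, and the order of the participants is randomized so that position in the circulation is not systematically tied to any one NMI. A fixed seed makes the schedule reproducible, so it can be documented in the protocol. In practice, shipping routes and customs constraints often override a purely random order.

```python
import random


def circulation_schedule(participants, pilot, check_every=2, seed=None):
    """Return a visiting order for the travelling standard (hypothetical design).

    The pilot measures first, then after every `check_every` participants,
    and again at the end, so its repeated results can reveal drift.
    Participant order is shuffled with a seeded generator so the plan
    is reproducible.
    """
    rng = random.Random(seed)
    order = list(participants)
    rng.shuffle(order)
    schedule = [pilot]
    for position, lab in enumerate(order, start=1):
        schedule.append(lab)
        if position % check_every == 0 or position == len(order):
            schedule.append(pilot)
    return schedule


print(circulation_schedule(["NMI-A", "NMI-B", "NMI-C", "NMI-D", "NMI-E"], "Pilot", seed=2003))
```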

The statistical analysis of Key Comparisons

Key Comparisons inherently involve statistical (Type A) sources of uncertainty, non-statistical (Type B) sources of uncertainty, and mathematical constants not subject to uncertainty. The statistical methodology must distinguish among these, assigning the correct (different) mathematical role to each. Statistical sources of uncertainty should be measured from data and be verifiable from data. Non-statistical sources may be statements of individual expert opinion not verifiable directly from data or may incorporate both expert opinion and data-verifiable uncertainties such as offsets.
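The distinction can be illustrated with GUM-style evaluations in Python: a Type A standard uncertainty computed from repeated readings as s / sqrt(n), a Type B standard uncertainty for a quantity believed to lie within plus or minus a (assumed here to follow a rectangular distribution, giving a / sqrt(3)), and a root-sum-of-squares combination that assumes the components are uncorrelated. A defined constant would enter such a calculation as an exact number and contribute no component at all. The readings, the half-width, and the function names are hypothetical.

```python
import math
from statistics import stdev


def type_a_standard_uncertainty(readings):
    """Type A: experimental standard deviation of the mean, s / sqrt(n)."""
    return stdev(readings) / math.sqrt(len(readings))


def type_b_rectangular(half_width):
    """Type B: quantity assumed to lie in [-a, +a] with a rectangular
    distribution, giving a / sqrt(3)."""
    return half_width / math.sqrt(3)


def combined_standard_uncertainty(*components):
    """Root sum of squares; valid only for uncorrelated components."""
    return math.sqrt(sum(c ** 2 for c in components))


readings = [10.01, 10.03, 10.02, 10.00, 10.04]  # hypothetical repeat readings
u_a = type_a_standard_uncertainty(readings)     # verifiable from the data
u_b = type_b_rectangular(0.03)                  # hypothetical +/-0.03 bound, expert judgement
print(u_a, u_b, combined_standard_uncertainty(u_a, u_b))
```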
Because Key Comparisons are necessary in a wide variety of metrological areas, no single statistical methodology can be universally applied either to their design or to their analysis. Therefore an appropriate statistical approach for a Key Comparison will require individualization because of the diversity of the measurement processes, the variety of metrological models and the differences in the designs of the Key Comparisons.
In general, multiple statistical approaches are valid for a Key Comparison. Every statistical approach requires a set of underlying assumptions; for a particular approach to be valid these assumptions must be stated and checked, wherever possible. These assumptions include (but are not limited to) the statistical models used, independence or interdependencies in the data, and distributional assumptions about the data.
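As an example of stating and checking an assumption, a weighted-mean summary presumes that the results are mutually consistent given their stated uncertainties. The Python sketch below computes the inverse-variance weighted mean and one common consistency check, the observed chi-squared statistic and the Birge ratio sqrt(chi2 / (n - 1)), assuming uncorrelated results. A Birge ratio well above 1 suggests the assumption does not hold; a formal test would compare chi2 with a chi-squared distribution (for instance via scipy.stats.chi2.sf), which is omitted here to keep the sketch dependency-free. The data are hypothetical.

```python
import math


def weighted_mean(values, std_unc):
    """Inverse-variance weighted mean and its standard uncertainty,
    assuming uncorrelated results."""
    weights = [1.0 / u ** 2 for u in std_unc]
    total = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


def consistency_check(values, std_unc):
    """Observed chi-squared about the weighted mean, its degrees of
    freedom, and the Birge ratio sqrt(chi2 / dof)."""
    mean, _ = weighted_mean(values, std_unc)
    chi2 = sum(((x - mean) / u) ** 2 for x, u in zip(values, std_unc))
    dof = len(values) - 1
    return chi2, dof, math.sqrt(chi2 / dof)


chi2, dof, birge = consistency_check([10.00, 10.02, 10.01, 10.09], [0.01, 0.01, 0.02, 0.01])
print(f"chi2 = {chi2:.1f} on {dof} dof, Birge ratio = {birge:.2f}")
```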
Critical conclusions drawn from analysis of a Key Comparison should hold generally under analysis by alternative contextually valid statistical approaches. Divergence among valid statistical approaches in the principal conclusions is an indicator of insufficient information or a crucial dependence upon assumptions that are not verifiable.
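The Python sketch below compares two valid but different summaries, an inverse-variance weighted mean and the median, by asking whether they lead to the same set of flagged participants. For simplicity it flags a result when |x_i - ref| > k * u_i using only the participant's own standard uncertainty, ignoring the uncertainty of the reference value and any correlation with it; that rule, k = 2, and the data are assumptions for illustration. In the hypothetical data one participant with a very small stated uncertainty dominates the weighted mean, so the two approaches disagree, which is the kind of divergence the preceding paragraph treats as a warning sign.

```python
from statistics import median


def weighted_mean(values, std_unc):
    """Inverse-variance weighted mean (assumes uncorrelated results)."""
    weights = [1.0 / u ** 2 for u in std_unc]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def flagged(values, std_unc, ref, k=2.0):
    """Indices of results with |x_i - ref| > k * u_i (simplified rule)."""
    return {i for i, (x, u) in enumerate(zip(values, std_unc)) if abs(x - ref) > k * u}


def conclusions_agree(values, std_unc, k=2.0):
    """Do the weighted mean and the median flag the same participants?"""
    by_mean = flagged(values, std_unc, weighted_mean(values, std_unc), k)
    by_median = flagged(values, std_unc, median(values), k)
    return by_mean == by_median, by_mean, by_median


values = [10.00, 10.01, 10.02, 10.50]
std_unc = [0.01, 0.01, 0.01, 0.001]  # the last participant claims a tiny uncertainty
agree, by_mean, by_median = conclusions_agree(values, std_unc)
print(agree, sorted(by_mean), sorted(by_median))
```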
For the purposes of a Key Comparison, the analysis of the Key Comparison data is complete and adequate when it satisfies two criteria: 1) a more elaborate analysis does not alter the conclusions drawn with respect to the primary objectives, and 2) the uncertainty associated with the summary results meets or surpasses the specific requirements for all primary uses of the comparison. Statistical principles endorse more extensive or more focused analyses for other purposes, for example, to shed light on the measurement methodology or on the measurement process of a particular NMI or subset of NMIs, or to resolve deviations of individual observations or of the collective data from a single NMI or a single measurement method.

Date created: 07/29/2004
Last updated: 07/29/2004