NIST Statement on Statistical Principles for the Design and Analysis of Key Comparisons

Approved by the Measurement Services Advisory Group, December 16, 2003

Introduction:

To facilitate international trade beneficial to U.S. industry, NIST participates in international interlaboratory comparisons, called Key Comparisons, to assess the equivalence of measurement standards used at different National Metrology Institutes. Because Key Comparisons impact both scientific and economic decisions made by different countries, there are clearly defined procedures governing their conduct. The primary document governing Key Comparisons is the Mutual Recognition Arrangement (MRA) developed by the CIPM. In addition, NIST has developed a "Position on the Conduct of Key Comparisons," which offers guidance on several interpretable points of the MRA for NIST participants in Key Comparisons. Having participated in nearly 250 comparisons and piloted nearly 70, NIST technical staff has asked for a clear articulation of the statistical principles that are central to the design, implementation, analysis and interpretation of Key Comparisons and Supplementary Comparisons. Questions have arisen regarding the following issues:

What are the requirements in designing a Key Comparison to assure a clear interpretation from the data once the comparison is completed?
What are the conditions for a statistical analysis of a Key Comparison to be valid?
When is the statistical analysis of a Key Comparison complete?
Is there a single statistical approach to the analysis of a Key Comparison, to the estimation of a reference value (KCRV), or to the estimation of degrees of equivalence?

This NIST Statement identifies statistical principles for different types of Key Comparisons that should be followed to ensure that the comparisons in which NIST participates will be clearly interpretable. Interpretability requires statistically sound estimates of the various quantities of interest, including reference values and degrees of equivalence between measurement standards maintained by different NMIs, each with its associated uncertainty. Interpretation also extends to the statistical basis for addressing unexplained deviations, whether of individual observations or of the collective observations from a particular NMI, and to statistically sound methods for combining information from Key and Regional Comparisons in order to address differences between NMIs participating in separate, but linked, comparisons.

Information on sound statistical procedures and/or methodologies and established statistical practices can be found in the archival and applied journals of statistical societies, other technical and educational publications and reputable statistical software in both the commercial and public domains.

Statement of Principles:

The statistical premises for Key Comparisons

Recognizing that there are both stochastic and non-stochastic elements in all interlaboratory comparisons, the general goal of the analysis of Key Comparison data is to draw statistical inference. As a particular example, degrees of equivalence among measurements and measurement standards for the various NMIs with their associated uncertainties must be resolved on the basis of sound statistical procedures.
As expressed in Sections 6 and 9 of the Guidelines for Key Comparisons and endorsed by the NIST Position Statement on the Conduct of Key Comparisons, integrity of the data is essential to the interpretability of a Key Comparison. Thus a prerequisite to the inclusion of an NMI's data in the analysis is the complete submission of the data with an attendant detailed uncertainty budget. Similarly, according to the Guidelines, the integrity of the Key Comparison analysis is protected by explicit documentation of any changes to the data (e.g., changes that may occur when the data are reviewed prior to preparation of Draft A).
Open accessibility of all data and uncertainty budgets permits alternate or expanded analyses, as these may be appropriate and may serve as additional validation of the conclusions.

The statistical design of Key Comparisons

The statistical design of the Key Comparison should conform to established principles of sound statistical design of experiments. From the outset, a specific statistical analysis should be posited to ensure that unbiased estimates of degrees of equivalence and reference value and also of their associated uncertainties will be possible; but the eventual analysis should not be limited to this particular method. It is an established statistical practice to construct the statistical design both for (statistical) efficiency and for robustness against any loss of data or the effects of unforeseen factors.
Since each Key Comparison involves a specific metrology, the statistical design of the Key Comparison needs to be individualized to reflect the particular metrological requirements and practical constraints of each comparison. Statistical design features include, among other things, factors affecting the measurement process, artifact attributes, replication and randomization (where feasible).

The statistical analysis of Key Comparisons

Key Comparisons inherently involve statistical (Type A) sources of uncertainty, non-statistical (Type B) sources of uncertainty, and mathematical constants not subject to uncertainty. The statistical methodology must distinguish among these, assigning the correct (different) mathematical role to each. Statistical sources of uncertainty should be measured from data and be verifiable from data. Non-statistical sources may be statements of individual expert opinion not verifiable directly from data or may incorporate both expert opinion and data-verifiable uncertainties such as offsets.
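As an illustration only, the distinct roles of the two kinds of uncertainty can be sketched in a few lines of Python. The sketch assumes a common convention that this Statement does not itself prescribe: the Type A component is the standard deviation of the mean of repeated readings, the Type B component is derived from an expert-stated bound treated as a rectangular distribution, and the two are combined in quadrature. The readings, the bound and all function names are hypothetical.

```python
import math
import statistics

def type_a_uncertainty(readings):
    """Type A: standard uncertainty of the mean, evaluated from the data."""
    return statistics.stdev(readings) / math.sqrt(len(readings))

def type_b_uncertainty_rectangular(half_width):
    """Type B: expert-stated bound of +/- half_width, assumed here to be
    a rectangular (uniform) distribution; this is an assumption."""
    return half_width / math.sqrt(3)

def combined_uncertainty(components):
    """Root-sum-of-squares combination, assuming uncorrelated components."""
    return math.sqrt(sum(u * u for u in components))

# Hypothetical repeated readings from one laboratory (arbitrary units).
readings = [10.02, 10.05, 9.98, 10.01, 10.04]

u_a = type_a_uncertainty(readings)            # verifiable from the data
u_b = type_b_uncertainty_rectangular(0.03)    # expert judgment, not data
u_c = combined_uncertainty([u_a, u_b])

print(f"Type A: {u_a:.5f}  Type B: {u_b:.5f}  combined: {u_c:.5f}")
```

Only the Type A component can be recomputed and checked from the submitted readings; the Type B component rests on the stated bound and the distributional assumption, which is why the two should be reported separately in an uncertainty budget.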
Because Key Comparisons are necessary in a wide variety of metrological areas, no single statistical methodology can be universally applied either to their design or to their analysis. Therefore an appropriate statistical approach for a Key Comparison will require individualization because of the diversity of the measurement processes, the variety of metrological models and the differences in the designs of the Key Comparisons.
In general, multiple statistical approaches are valid for a Key Comparison. Every statistical approach requires a set of underlying assumptions; for a particular approach to be valid these assumptions must be stated and checked, wherever possible. These assumptions include (but are not limited to) the statistical models used, independence or interdependencies in the data, and distributional assumptions about the data.
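One possible way to check an assumption of this kind is sketched below. It assumes, purely for illustration, a model in which all laboratories measure the same value with independent errors whose standard deviations equal the reported standard uncertainties. Under that model the chi-squared statistic about the weighted mean should be close to its degrees of freedom, so the Birge ratio should be close to 1. The data and function names are hypothetical, and this is one diagnostic among many rather than a required procedure.

```python
import math

def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean of the reported values."""
    weights = [1.0 / u ** 2 for u in uncertainties]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def chi_squared_statistic(values, uncertainties):
    """Sum of squared, uncertainty-normalized deviations from the weighted mean."""
    x_ref = weighted_mean(values, uncertainties)
    return sum(((x - x_ref) / u) ** 2 for x, u in zip(values, uncertainties))

def birge_ratio(values, uncertainties):
    """sqrt(chi-squared / degrees of freedom); near 1 if the model holds."""
    dof = len(values) - 1
    return math.sqrt(chi_squared_statistic(values, uncertainties) / dof)

# Hypothetical results from five laboratories (value, standard uncertainty).
values = [10.01, 10.03, 9.99, 10.02, 10.12]
uncertainties = [0.02, 0.03, 0.02, 0.04, 0.03]

chi2 = chi_squared_statistic(values, uncertainties)
ratio = birge_ratio(values, uncertainties)

print(f"chi-squared = {chi2:.2f} on {len(values) - 1} degrees of freedom")
print(f"Birge ratio = {ratio:.2f}")
```

A ratio well above 1 suggests that the reported uncertainties do not account for the observed scatter, so the common-value assumption, the independence assumption or the uncertainty budgets themselves would need closer examination before the weighted mean is relied upon.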
Critical conclusions drawn from analysis of a Key Comparison should hold generally under analysis by alternative contextually valid statistical approaches. Divergence among valid statistical approaches in the principal conclusions is an indicator of insufficient information or a crucial dependence upon assumptions that are not verifiable.
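The following sketch illustrates such a comparison of approaches under stated assumptions. It computes two candidate reference values that are commonly considered, the inverse-variance weighted mean and the median, and the unilateral degrees of equivalence (laboratory value minus reference value) under each. Neither estimator is prescribed by this Statement; the data and names are hypothetical, and the uncertainties of the degrees of equivalence are deliberately omitted.

```python
import math
import statistics

def weighted_mean_kcrv(values, uncertainties):
    """Weighted mean and its standard uncertainty, assuming independent results."""
    weights = [1.0 / u ** 2 for u in uncertainties]
    total = sum(weights)
    x_ref = sum(w * x for w, x in zip(weights, values)) / total
    return x_ref, math.sqrt(1.0 / total)

def degrees_of_equivalence(values, x_ref):
    """Deviation of each laboratory's value from the reference value."""
    return [x - x_ref for x in values]

# Hypothetical results from five laboratories (value, standard uncertainty).
values = [10.01, 10.03, 9.99, 10.02, 10.12]
uncertainties = [0.02, 0.03, 0.02, 0.04, 0.03]

x_wm, u_wm = weighted_mean_kcrv(values, uncertainties)
x_med = statistics.median(values)

d_wm = degrees_of_equivalence(values, x_wm)
d_med = degrees_of_equivalence(values, x_med)

# How far apart are the two reference values, in units of u(weighted mean)?
shift = abs(x_wm - x_med)
print(f"weighted mean = {x_wm:.4f} (u = {u_wm:.4f}), median = {x_med:.4f}")
print(f"shift between reference values = {shift / u_wm:.2f} x u(weighted mean)")
for i, (a, b) in enumerate(zip(d_wm, d_med), start=1):
    print(f"lab {i}: d (weighted mean) = {a:+.4f}, d (median) = {b:+.4f}")
```

If the two sets of degrees of equivalence led to different conclusions about any laboratory, that divergence would itself be a finding to report, in the sense described above.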
For the purposes of a Key Comparison, the analysis of the Key Comparison data is complete and adequate when it satisfies two criteria: 1) a more elaborate analysis does not alter conclusions drawn with respect to the primary objectives, and 2) the uncertainties associated with the summary results meet or surpass the specific requirements for all primary uses of the comparison. Statistical principles endorse more extensive or more focused analyses for other purposes, for example: to shed light on the measurement methodology or on the measurement process for a particular NMI or subset of NMIs, or to resolve deviations of individual observations or of the collective data from a single NMI or from a single measurement method.

Date created: 07/29/2004
Last updated: 07/29/2004