U.S. Department of Health and Human Services www.hhs.gov
Agency for Healthcare Research and Quality www.ahrq.gov

Appendix E: Information on Statistical Significance

This section is provided for data analysts who want to generate other statistics or perform statistical tests for comparisons other than those provided in the NHQR and NHDR.

Comparing State and Average Estimates Using P-Values

Every measure has error associated with it, and this matters whenever an individual State estimate is compared to another estimate, such as the all-State average or the average for the top tier of States. The error arises from sampling (the size of the sample or the sampling method), the accuracy of respondents' recall and responses, data entry processes, and many other factors. When comparing estimates, it is important to take this error (which can be estimated under statistical assumptions) into account.

P-Values

A common way to determine whether two rates differ is a t-test whose statistic is compared to a normal distribution. The test is evaluated against a pre-specified level of significance, that is, the acceptable probability of error in concluding whether or not two statistics come from the same distribution or population. The p-value, computed here from the normal distribution, indicates whether two measures are likely to come from the same or from different distributions.

Judgments About Comparisons

Statistical significance and the magnitude of the difference should be considered together when comparing two estimates. The first check should be: Is the difference statistically significant? The second check should be: Are the differences large enough to be meaningful for policy purposes?

  • Is the difference statistically significant? Is the p-value less than 0.05? If so, the difference is unlikely to be due to chance alone, and you can infer that the two estimates come from different populations or experiences. But there are other considerations. The statistical test is affected by the number of observations from which the measures were generated. For example, if the measures were generated from hundreds of thousands of records, then summary measures (such as averages) have less variance and lower p-values, which imply “statistical significance” even when the magnitude of the difference is tiny. Conversely, when differences are large and observations are few, the absence of statistical significance might simply mean that the data set does not have enough observations for a powerful test. This happens frequently with the BRFSS measures because the annual sample sizes of the State surveys are small, ranging from about 2,000 to 8,500 observations.
  • Are the differences large enough to be meaningful for policy purposes? Because of the relationship between the statistical test and the number of observations, some judgment must be used to assess the meaning of the differences between State estimates. Thus, in addition to statistical significance, it is important to ask the second question: Is the State-to-benchmark difference large enough to warrant efforts to rectify it? A 1- or 2-percentage-point difference in a measure may not be worth the effort to improve it. A 5- or 10-percentage-point difference may mean that a substantial number of State residents are affected by poor health care quality in the State. These are judgments that local experts and stakeholders who understand the environment of a State can help make.
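The interplay between sample size and statistical significance can be illustrated numerically. The following Python sketch is illustrative only: the rates and sample sizes are hypothetical, the function name is made up for this example, and it assumes simple random sampling (so that the standard error of a proportion is sqrt(p(1-p)/n)), which ignores the complex survey designs behind sources such as BRFSS. It applies the two-sided test described later in this appendix.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_gap(rate_a, rate_b, n):
    """Two-sided p-value for the gap between two proportions, each based on n
    observations. ASSUMPTION: simple random sampling (binomial standard error)."""
    se_a = sqrt(rate_a * (1 - rate_a) / n)
    se_b = sqrt(rate_b * (1 - rate_b) / n)
    t = (rate_a - rate_b) / sqrt(se_a**2 + se_b**2)
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Hypothetical rates: the same 1-percentage-point gap at two sample sizes.
p_small_sample = p_value_for_gap(0.71, 0.70, n=2_000)
p_large_sample = p_value_for_gap(0.71, 0.70, n=200_000)

print(f"n = 2,000:   p = {p_small_sample:.3f}")
print(f"n = 200,000: p = {p_large_sample:.2e}")
```

Under these assumptions the 1-point gap is not significant with 2,000 observations per group but is highly significant with 200,000, even though its magnitude is unchanged.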

How To Calculate P-Values

P-values are used in this Resource Guide to determine whether the estimate of a given State is statistically different—above or below a given average (e.g., the national average or the average of the top decile States). Calculating the p-value is straightforward when the standard errors (SEs) of the estimates are provided, as in the case of the national rate and individual State rates in the first formula and example below. However, when the standard error has not been provided, as is the case for the mean of the top decile of States, then the calculation is more complicated and may require additional data, such as sample sizes. The method for the p-value calculation for the top-decile States is also provided (see second formula and example).

Calculating P-Value To Compare States to the National Average

For an individual State estimate compared to the national average, the following formula shows how to derive a t-test statistic, which tests whether the State average is likely to come from a distribution different from that of the national average. From the t-test statistic, a p-value can be derived; if the p-value is less than 0.05, it can be concluded with 95-percent confidence that the mean from the State distribution is statistically different from the mean from the national distribution. Rates and standard errors are provided for most measures in the NHQR tables.

Two-sided t-test:

$$ t = \frac{R_1 - R_2}{\sqrt{SE_1^2 + SE_2^2}} $$

$$ p = 2 \times \mathrm{Prob}(Z > |t|) $$

where:

$R_1$ = a State rate
$R_2$ = the national rate
$SE_1^2$ = square of the standard error of the State rate (or its variance)
$SE_2^2$ = square of the standard error of the national rate (or its variance)

This formula is more conveniently calculated using SAS or EXCEL with the following commands:

SAS: p = 2 * (1 - PROBNORM(ABS(t)))
EXCEL: p = 2*(1 - NORMDIST(ABS(t),0,1,TRUE))

Example: How does Georgia compare to the national average for annual retinal exams for adults with diabetes? The national rate and standard error for adults with diabetes receiving annual retinal exams are 66.7 and 1.2, respectively. Georgia's rate and standard error for annual retinal exams are 70.4 and 3.7, respectively. Following is the EXCEL statement for the p-value, which encompasses the t-test formula with the Georgia and national values.

p = 2*(1-NORMDIST(ABS(70.4-66.7)/SQRT((3.7*3.7)+(1.2*1.2)),0,1,TRUE))
p = 0.34

Because the p-value is greater than 0.05, we cannot conclude at the 5-percent significance level that Georgia is statistically different from the national average. A difference this large could plausibly arise from sampling error alone.
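For readers working outside SAS or EXCEL, the same calculation can be sketched in Python. This is a minimal sketch, not part of the NHQR materials: the function name two_sided_p_value is hypothetical, and NormalDist comes from Python's standard statistics module (Python 3.8 or later).

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(rate_1, se_1, rate_2, se_2):
    """Two-sided p-value for the difference between two rates,
    using the normal approximation from the formula above."""
    t = (rate_1 - rate_2) / sqrt(se_1**2 + se_2**2)
    # Equivalent to EXCEL's 2*(1 - NORMDIST(ABS(t),0,1,TRUE)).
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Georgia (70.4, SE 3.7) versus the national rate (66.7, SE 1.2).
p_georgia_vs_nation = two_sided_p_value(70.4, 3.7, 66.7, 1.2)
print(f"p = {p_georgia_vs_nation:.2f}")
```

With the Georgia and national values from the example, the sketch gives a p-value that rounds to 0.34, matching the EXCEL result.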

Calculating P-Value To Compare States to the Top Decile Average

To compare individual States to the top decile average, both the top decile rate and its standard error must be estimated. This is done using the fundamental equation of analysis of variance and weighting individual State values by their respective sample sizes. (The NHQR tables do not provide sample sizes, but this information is available from the CDC Web site.)

Let us assume that the top decile is comprised of three States. Using the three top States, the formula determines the three-State sample size, the weighted mean for the three States, and the total sum of squares about the three-State mean. The latter is the sum of the within-State sum of squared deviations from the State mean and the between-State sum of squared deviations from the three-State mean. The within-State sum of squares (SS) is obtained by squaring the State's SE and multiplying by the sample size times the sample size minus one. The between-State sum of squares is obtained by summing the sample-weighted squared difference between the State mean and the overall three-State mean. Here is the formula (note: sqrt(x) = square root of x):

Let n1, n2, and n3 be the sample sizes for each State.
Let m1, m2, and m3 be the means for each State.
Let s1, s2, and s3 be the standard errors for each State.
N = n1 + n2 + n3 is the overall three-State sample size.
M = (n1*m1 + n2*m2 + n3*m3) / N is the overall three-State mean.
Within State SS = n1*(n1-1)*s1^2 + n2*(n2-1)*s2^2 + n3*(n3-1)*s3^2 represents the simplified sum of squared deviations of values within each State from that State's mean.
Between State SS = n1*(m1-M)^2 + n2*(m2-M)^2 + n3*(m3-M)^2 is the sum of squared deviations of the State means from the three-State mean, weighted by sample size.
Total SS = Within State SS + Between State SS
VAR = Total SS/(N-1) is the estimated variance of the pooled three-State sample.
SE = sqrt(VAR/N) is the estimated standard error for the three-State mean.

Using the estimated standard error and weighted mean for the top decile of States, a p-value can be calculated that reflects how a State compares to the average of top decile States.

Example: How does Georgia compare to the top decile of States for rates of annual retinal exams for adults with diabetes? First, determine the number of States in the top decile. If all States are considered, then the top decile would be the top five States; however, not all States report for all data sources. In the case of BRFSS data for diabetes, 41 States and the District of Columbia reported; therefore, the top decile is the top four States.

The four States with the highest rates for retinal eye exams are Wisconsin with a rate of 82.5, SE=3.1, and sample size of 201; Maine with a rate of 82.3, SE=3.5, and a sample size of 172; Nebraska with a rate of 80.4, SE=5.6, and a sample size of 214; and Connecticut with a rate of 77.1, SE=4.3, and a sample size of 492. The overall sample size is 1,079 and the overall weighted average is 79.6. The within State SS=6,642,737, the between State SS=6,156, and the total SS=6,648,893. From the total SS, the weighted SE can be determined for the top decile average and the calculation for p-values can be used to compare States to that top decile average.
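The text does not state the resulting variance and standard error. As a worked step, assuming the VAR and SE formulas given earlier in this section and the rounded sums of squares reported above, the values come out approximately as follows:

```latex
\mathrm{VAR} = \frac{\text{Total SS}}{N - 1} = \frac{6{,}648{,}893}{1{,}079 - 1} \approx 6{,}167.8
\qquad
\mathrm{SE} = \sqrt{\frac{\mathrm{VAR}}{N}} = \sqrt{\frac{6{,}167.8}{1{,}079}} \approx 2.39
```

The top decile average of about 79.6 therefore carries an estimated standard error of roughly 2.4 percentage points.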

The p-value for Georgia compared to the top decile average is 0.03. Because the p-value is less than 0.05, it can be concluded that Georgia, which is below the top-decile average, is significantly different from the top decile and, thus, there is opportunity for improvement in annual retinal exams.
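The whole top-decile comparison can be sketched in Python. This is a minimal sketch under the formulas given in this section, not an official implementation: the function names are hypothetical, and the inputs are the four State rates, standard errors, and sample sizes quoted in the example. Because the inputs are rounded, the exact digits of the resulting p-value may differ slightly from the figure reported in the text.

```python
from math import sqrt
from statistics import NormalDist


def pooled_mean_and_se(states):
    """states: list of (mean, standard_error, sample_size) tuples.
    Returns the sample-weighted mean and its estimated standard error."""
    total_n = sum(n for _, _, n in states)
    pooled_mean = sum(n * m for m, _, n in states) / total_n
    # Within-State SS: n*(n-1)*SE^2 recovers each State's sum of squares.
    within_ss = sum(n * (n - 1) * se**2 for _, se, n in states)
    # Between-State SS: sample-weighted squared distance from the pooled mean.
    between_ss = sum(n * (m - pooled_mean) ** 2 for m, _, n in states)
    variance = (within_ss + between_ss) / (total_n - 1)
    return pooled_mean, sqrt(variance / total_n)


def two_sided_p_value(rate_1, se_1, rate_2, se_2):
    """Two-sided p-value using the normal approximation."""
    t = (rate_1 - rate_2) / sqrt(se_1**2 + se_2**2)
    return 2 * (1 - NormalDist().cdf(abs(t)))


top_decile = [
    (82.5, 3.1, 201),  # Wisconsin
    (82.3, 3.5, 172),  # Maine
    (80.4, 5.6, 214),  # Nebraska
    (77.1, 4.3, 492),  # Connecticut
]

top_mean, top_se = pooled_mean_and_se(top_decile)
p_georgia_vs_top = two_sided_p_value(70.4, 3.7, top_mean, top_se)
print(f"top decile mean = {top_mean:.1f}, SE = {top_se:.2f}")
print(f"p = {p_georgia_vs_top:.3f}")
```

The sketch reproduces the weighted average of 79.6 and yields a p-value below 0.05, in line with the conclusion that Georgia differs significantly from the top decile.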
