Key Concepts About Degrees of Freedom for Performing Statistical Tests and Calculating Confidence Limits

Degrees of Freedom and NHANES Subgroups

Estimates are often calculated for various subgroups of interest within the total NHANES population. When the number of first-stage sampling units (PSUs) is small, the z-statistic should be replaced by a value from a t-distribution when computing confidence limits for these estimates (see SUDAAN 1995, as referenced in the NHANES III analytic guidelines).

 

To obtain the correct t-statistic from a t-distribution at a selected level of significance, you must first calculate the proper degrees of freedom for the estimate.

 

In addition, it is important to examine the number of degrees of freedom on which a standard error estimate is based. Continuing research on the stability of variance estimates in subdomains of NHANES has been published and shows that standard error estimates based on small numbers of paired PSUs (i.e., few degrees of freedom) are prone to instability.

 

The reliability of the estimated standard error, as measured by its relative standard error (i.e., (standard error of the standard error of the estimate/standard error of the estimate)*100), is inversely related to its degrees of freedom. As the number of degrees of freedom increases, the relative standard error decreases and the reliability of the estimate increases. The NHANES guidelines recommended a relative standard error of at most 30%. This corresponds to at least 22 degrees of freedom.
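One way to see where a threshold near 22 could come from is sketched below. It assumes, beyond what the text states, that the variance estimate (the square of the standard error) behaves like a scaled chi-square variable with df degrees of freedom, so that its relative standard error is approximately sqrt(2/df). The text does not say which approximation underlies its figure, and the function names here are illustrative only.

```python
import math


def relative_se_of_variance_percent(df):
    """Approximate relative standard error (percent) of a variance estimate
    with `df` degrees of freedom, under the chi-square assumption above."""
    if df <= 0:
        raise ValueError("df must be positive")
    return 100.0 * math.sqrt(2.0 / df)


def df_at_relative_se(relative_se_percent):
    """Degrees of freedom at which the approximation equals the given percent."""
    if relative_se_percent <= 0:
        raise ValueError("relative_se_percent must be positive")
    return 2.0 / (relative_se_percent / 100.0) ** 2


# The approximate relative standard error shrinks as df grows.
for df in (5, 10, 22, 30):
    print(df, round(relative_se_of_variance_percent(df), 1))

# Under this assumption a 30% threshold is reached at roughly 22 degrees of freedom.
print(round(df_at_relative_se(30.0), 1))
```

Under this approximation, quadrupling the degrees of freedom halves the relative standard error, which is why small subgroups spread over few PSUs give unstable variance estimates.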

 

Degrees of freedom are properly calculated by subtracting the number of clusters in the first level of sampling (strata) from the number of clusters in the second level of sampling (PSUs) for each subgroup you are analyzing, as shown in the equation below.

 

Equation for Degrees of Freedom

degrees of freedom = number of PSUs - number of strata
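The subgroup calculation can be sketched in a few lines of Python. This is a minimal illustration, not NHANES or SUDAAN code: the record layout (stratum, PSU, subgroup label) and the function name are assumptions made for the example, and the data are invented. A PSU or stratum is counted only if it contains at least one observation from the subgroup.

```python
def subgroup_degrees_of_freedom(records, subgroup):
    """Return (#PSUs with subgroup observations) - (#strata with subgroup observations).

    `records` is an iterable of (stratum, psu, subgroup_label) tuples;
    this layout is hypothetical and chosen only for illustration.
    """
    psus = set()
    strata = set()
    for stratum, psu, label in records:
        if label == subgroup:
            # PSU identifiers repeat across strata, so key on the pair.
            psus.add((stratum, psu))
            strata.add(stratum)
    return len(psus) - len(strata)


# Invented design: 3 strata with 2 PSUs each.
records = [
    (1, 1, "A"), (1, 2, "A"), (1, 1, "B"), (1, 2, "B"),
    (2, 1, "A"), (2, 2, "A"), (2, 1, "B"),
    (3, 1, "A"), (3, 2, "A"),
]

df_a = subgroup_degrees_of_freedom(records, "A")  # 6 PSUs - 3 strata
df_b = subgroup_degrees_of_freedom(records, "B")  # 3 PSUs - 2 strata
print(df_a, df_b)
```

Subgroup "B" is absent from stratum 3 and from one PSU in stratum 2, so it has fewer degrees of freedom than subgroup "A", which appears in every PSU.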

 

Differences in Degrees of Freedom for Subgroups in SUDAAN and SAS Survey Procedures

For both SUDAAN and SAS Survey procedures, the degrees of freedom are calculated in the same way when looking at the entire sample population or in subgroups where all strata and PSUs are represented.

 

However, when you analyze data on a subgroup of sample persons who may not be represented in all strata and PSUs (e.g., Mexican Americans), the degrees of freedom reported by the two packages may differ. For example, SAS Survey procedures, such as proc surveymeans, compute the degrees of freedom as the number of clusters (PSUs) in the non-empty strata minus the number of non-empty strata. This means that if your data have empty strata (no persons in the population for either PSU), the number of degrees of freedom will increase. This is incorrect, and SAS is currently working on correcting this problem. For more information on methods of correctly calculating degrees of freedom using SAS Survey procedures, please see the following two SAS Survey procedures macros.

 

%SMSUB macro provides additional capabilities for proc surveymeans

http://support.sas.com/ctx/samples/index.jsp?sid=541

PURPOSE:

Provides additional subgroup capabilities beyond those provided by the domain statement in proc surveymeans. This includes:

  • presenting subgroup and overall estimates in one table (TABLES=),
  • computing ratio estimates for subgroups (RATIO=),
  • computing contrasts for means, totals, and ratios (CONTRAST=),
  • restricting table requests to a subpopulation (SUBPOP=), and
  • incorporating missing values into the variance computations.

 

%SREGSUB macro provides additional capabilities for proc surveyreg

http://support.sas.com/ctx/samples/index.jsp?sid=483

PURPOSE:

Provides linear regression capabilities currently not available in proc surveyreg. This includes:

  • restricting the regression analysis to a subpopulation (SUBPOP=), and
  • incorporating missing values into the variance computations.

 

In contrast, SUDAAN correctly counts the number of PSUs and strata with at least one valid observation for each cell of the table being requested.

 

Both SAS Survey procedures (proc surveymeans) and SUDAAN version 9.1 (proc descript) produce 95% confidence intervals (CIs). These 95% CIs are calculated using the Wald method, which is based on a t-statistic for the number of degrees of freedom in the entire NHANES sample. However, neither procedure corrects for the reduction in the degrees of freedom in subdomains where not all strata and PSUs are represented. Details on how to correctly produce 95% CIs are discussed in the next task, How to Perform Statistical Tests and Calculate Confidence Limits with Degrees of Freedom. Also, the Wald method should not be used when the proportion is close to 0% or 100% (see the Alternate methods for calculating 95% confidence limits section below for more information).

 

Warning: The 95% confidence intervals calculated using the Wald method are based on a t-statistic for the number of degrees of freedom in the entire NHANES sample. The proc surveymeans procedure in SAS Survey procedures and the proc descript procedure in SUDAAN version 9.1 DO NOT correct for the reduction in the degrees of freedom in subdomains where not all strata and PSUs are represented.
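The practical effect of the degrees of freedom on a Wald interval can be illustrated with a short Python sketch. The estimate and standard error below are made up, the function name is hypothetical, and the critical values are the usual two-sided 95% entries from a standard t-table rather than anything produced by SAS or SUDAAN.

```python
from statistics import NormalDist

# Two-sided 95% critical values from a standard t-table, keyed by degrees of freedom.
T_CRIT_95 = {5: 2.571, 10: 2.228, 15: 2.131, 22: 2.074, 30: 2.042}


def wald_interval(estimate, std_error, critical_value):
    """Wald limits: estimate +/- critical_value * std_error."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width


# Made-up prevalence estimate and standard error for a subgroup.
estimate, std_error = 0.20, 0.02

z = NormalDist().inv_cdf(0.975)  # normal critical value, about 1.96
z_limits = wald_interval(estimate, std_error, z)
t30_limits = wald_interval(estimate, std_error, T_CRIT_95[30])
t10_limits = wald_interval(estimate, std_error, T_CRIT_95[10])

for name, limits in (("z", z_limits), ("t, 30 df", t30_limits), ("t, 10 df", t10_limits)):
    print(name, [round(x, 4) for x in limits])
```

With the same estimate and standard error, the interval widens as the degrees of freedom fall, so using the full-sample degrees of freedom for a sparsely represented subgroup gives limits that are too narrow.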

 

Alternate Methods for Calculating 95% Confidence Limits

For prevalence estimates near 0% or near 100%, standard methods of calculating confidence limits, such as the Wald method, may produce lower limits less than 0% or upper limits greater than 100%. In these cases, it is often recommended to use alternative methods for calculating 95% confidence limits using transformations (such as the logit or arcsine transformation), using the Wilson method, or calculating exact confidence limits such as the Clopper-Pearson approach. For applications to survey data, see Korn and Graubard.
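Two of these alternatives can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the SAS code referred to elsewhere in this document: the function names are hypothetical, the inputs are invented, the logit limits use a delta-method standard error, and the Wilson limits treat n as a simple-random-sample (or effective) sample size. For the adjustments needed with complex survey data, see Korn and Graubard.

```python
import math
from statistics import NormalDist


def logit_interval(p, std_error, critical_value):
    """Limits computed on the logit scale and transformed back to a proportion."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    logit = math.log(p / (1.0 - p))
    se_logit = std_error / (p * (1.0 - p))  # delta-method standard error

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return (expit(logit - critical_value * se_logit),
            expit(logit + critical_value * se_logit))


def wilson_interval(p, n, z=None):
    """Wilson score limits for a proportion p observed on n units."""
    if z is None:
        z = NormalDist().inv_cdf(0.975)
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Invented example: 1% prevalence, standard error 0.8 percentage points,
# and a t critical value for 22 degrees of freedom from a standard t-table.
p, std_error, t_crit = 0.01, 0.008, 2.074

wald_lower = p - t_crit * std_error      # falls below 0
logit_limits = logit_interval(p, std_error, t_crit)
wilson_limits = wilson_interval(p, n=200)

print(round(wald_lower, 4))
print([round(x, 4) for x in logit_limits])
print([round(x, 4) for x in wilson_limits])
```

Both alternatives keep the limits between 0 and 1 and give intervals that are asymmetric around the estimate, which is the expected behavior for a proportion close to the boundary.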

 

References

Wilson EB (1927). Probable Inference, the Law of Succession, and Statistical Inference. JASA, 22, 209-212.

Clopper CJ and Pearson ES (1934). The Use of Fiducial Limits Illustrated in the Case of the Binomial. Biometrika, 26, 404-413.

Korn EL and Graubard BI (1999). Analysis of Health Surveys. Wiley Series in Probability and Statistics. New York, New York.

For the arcsine and logit transformations, see Appendix C of Wolter K. Introduction to Variance Estimation. Springer-Verlag. New York.

 

In How to Perform Statistical Tests and Calculate Confidence Limits with Degrees of Freedom, you will learn how to save the results from your SUDAAN procedure as a SAS data file (or, if you prefer, as an ASCII data file). You can then use the mean (prevalence), the standard error, and the degrees of freedom, correctly calculated from the number of strata and the number of PSUs, in SAS, a spreadsheet, or another software program to calculate confidence limits using one of the approaches listed above or other alternative methods.

 

SAS code to calculate 95% confidence limits using the arcsine transformation, log transformation, or the Clopper-Pearson method of calculating exact confidence limits is available at the Sample Code and Datasets page. 

 

To learn more

To understand more about variance estimation methods, you may wish to review the Analytic Guidelines for NHANES analysis on the NHANES web site, read the text by Korn and Graubard (Korn EL and Graubard BI. Analysis of Health Surveys. Wiley Series in Probability and Statistics. 1999. New York, New York.), or take a course in SUDAAN or complex survey sampling.

 

 
