Substate Substance Abuse Estimates from the 1999-2001 NSDUH

Section B: Substate Region Estimation Methodology

This report includes substate region level estimates of 12 substance use measures (see Section B.2) using the combined data from the 1999, 2000, and 2001 National Surveys on Drug Use and Health (NSDUHs).

The survey-weighted hierarchical Bayes (SWHB) methodology used in the production of State estimates from the 1999-2003 surveys also was used in the production of the 1999-2001 substate estimates. The SWHB methodology is described by Folsom, Shah, and Vaish (1999). A brief discussion of the precision and validation of the estimates and interpretation of the prediction intervals (PIs) is given in Section B.1. Section B.2 lists the 12 substance use measures for which substate-level small area estimates were produced. The list of predictors used in the 1999-2001 substate-level small area estimation (SAE) modeling is given in Section B.3. The improved methodology used to select relevant predictors is described in Section B.4. The goals of SAE modeling, general model description, and the implementation of SAE modeling remain the same and are described in Appendix E of the 2001 State report (Wright, 2003). A general model description is given in Section B.5.

Small area estimates obtained using the SWHB methodology are design consistent (i.e., for States or substates with large sample sizes, the small area estimates are close to the robust design-based estimates). The substate small area estimates, when aggregated using the appropriate population totals, result in national small area estimates that are very close to the national design-based estimates. However, for several reasons, such as internal consistency, it is desirable to have national small area estimates exactly match the national design-based estimates. Beginning in 2002, exact benchmarking was introduced (see Appendix A, Section A.4, in Wright & Sathe, 2005).

B.1. Precision and Validation of the Estimates

The primary purpose of this report is to give policy officials a better perspective on the range of prevalence estimates within and across States. Because the data were collected in a consistent manner by field interviewers who adhered to the same procedures and administered the same questions across all States and substate areas, the results are comparable across the 50 States and the District of Columbia.

The 95 percent PI associated with each estimate provides a measure of the accuracy of the estimate. It defines the range within which the true value can be expected to fall 95 percent of the time. For example, the prevalence of past month use of marijuana in Region 1 in Alabama is approximately 3.8 percent, and the 95 percent PI ranges from 2.8 to 5.1 percent. Therefore, the probability is 0.95 that the true value is within that range. The PI indicates the uncertainty due to both sampling variability and model bias. The key assumption underlying the validity of the PIs is that the State- and substate-level error (or bias) terms in the models behave like random effects with zero means and common variance components.

A comparison of the standard errors (SEs) among substate areas with small (n ≤ 500), medium (500 < n ≤ 1,000), and large (n > 1,000) sample sizes for the 12 measures in this report shows that the small area estimates behave in predictable ways. Regardless of whether the substate area is from one of the eight States with a large annual sample size (3,000 to 4,000) or one of the other States (n = 900 annually), the sizes of the PIs are very similar and are primarily a function of the sample size of the substate area and the prevalence estimate of the measure. Substate areas with large sample sizes had the smallest SEs.

For past month use of alcohol, where the national prevalence for all persons aged 12 or older was 47.3 percent (for 1999-2001), the average relative standard error (RSE) was about 3.6 percent for substate areas with a sample size greater than 1,000. For substate areas with sample sizes between 500 and 1,000 records, the average RSE was 5.1 percent; for sample sizes smaller than 500, the average RSE was 6.4 percent.

For past month use of marijuana (with a national prevalence of 5.1 percent), the average RSE was 10.5 percent for substate areas with large samples. For medium sample sizes, the average RSE was 14.0 percent, and for samples smaller than 500, the RSE was 16.3 percent. Substance measures with the lowest prevalence, such as past year use of cocaine (1.7 percent nationally), displayed the highest average RSE. For sample sizes greater than 1,000, the average RSE was 15.6 percent. For substate areas of medium sample sizes, the average RSE was 19.0 percent, and for samples smaller than 500, the average RSE was 20.2 percent.
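The RSE is the standard error divided by the estimate, so an RSE can be converted back to an approximate standard error. The sketch below is illustrative only: it pairs each national prevalence with the average RSE reported above for large-sample substate areas, which assumes (hypothetically) a substate area whose estimate equals the national figure. The function name `implied_se` is not from the report.

```python
def implied_se(estimate_pct: float, rse_pct: float) -> float:
    """Return the standard error (in percentage points) implied by an RSE.

    RSE = SE / estimate, so SE = estimate * RSE.
    Both inputs are expressed in percent.
    """
    return estimate_pct * rse_pct / 100.0


# Illustrative pairing (an assumption): national prevalence with the average
# RSE reported for substate areas with n > 1,000.
examples = {
    "alcohol, past month": (47.3, 3.6),
    "marijuana, past month": (5.1, 10.5),
    "cocaine, past year": (1.7, 15.6),
}

for measure, (prevalence, rse) in examples.items():
    se = implied_se(prevalence, rse)
    print(f"{measure}: implied SE of about {se:.2f} percentage points")
```

Under this pairing, the lower-prevalence measures have larger relative errors but smaller absolute errors, which is why the RSE is the more useful yardstick when comparing precision across measures.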

The SAE methods used for substate regions in this report were previously validated for the NSDUH State-by-age group small area estimates (Wright, 2002). This validation exercise used direct estimates from pairs of large sample states (n = 7,200) as internal benchmarks. These internal benchmarks were compared with small area estimates based on random subsamples (n = 900) that mimicked a single year small State sample. The associated age group–specific small area estimates were based on sample sizes targeted at n = 300. Therefore, validation of the State-by-age group small area estimates should lend some validity to the small sample size substate small area estimates reported here.

Further validation of the substate region small area estimates is being pursued. It may be possible to compare the NSDUH substate estimates with those from State-sponsored surveys having similar data collection procedures. Internal benchmarking to direct NSDUH estimates also is possible for seven of the largest sample substate areas. Pooling of substate areas with similar characteristics also could yield useful benchmarks.

B.2. Variables Modeled

Substate-level small area estimates were produced for the following set of 12 binary (0, 1) substance use measures, using the 1999-2001 NSDUHs:

  1. Marijuana Use in Past Month

  2. Average Annual Rate of First Use of Marijuana

  3. Perceptions of Great Risk of Smoking Marijuana Once a Month

  4. Any Illicit Drug Use in Past Month

  5. Any Illicit Drug Use Other than Marijuana in Past Month

  6. Cocaine Use in Past Year

  7. Alcohol Use in Past Month

  8. Binge Alcohol Use in Past Month

  9. Perceptions of Great Risk of Having Five or More Drinks of an Alcoholic Beverage Once or Twice a Week

  10. Cigarette Use in Past Month

  11. Any Tobacco Product Use in Past Month

  12. Perceptions of Great Risk of Smoking One or More Packs of Cigarettes Per Day

B.3. Predictors Used in Logistic Regression Models

Local area data used as potential predictor variables in the mixed logistic regression models were obtained from several sources, including Claritas, the U.S. Bureau of the Census, the Federal Bureau of Investigation (FBI) (Uniform Crime Reports), Health Resources and Services Administration (Area Resource File), the Substance Abuse and Mental Health Services Administration (SAMHSA) (National Survey of Substance Abuse Treatment Services [N-SSATS]), and the National Center for Health Statistics (mortality data). The list of sources of data used in the modeling is provided below.

To obtain a detailed list of predictors, please see Appendix A, Section A.2, of the 2002-2003 State estimates report (Wright & Sathe, 2005).

B.4. Selection of Independent Variables for the Models

To produce small area estimates based on the pooled 1999-2001 NSDUH data, the fixed effect predictors were selected using the following methodology:

  1. There were 207,399 respondents in the pooled 1999-2001 NSDUH data. Any variable selection performed on such a large dataset would result in an excessive number of predictors in the final model. To avoid this and build parsimonious models, the pooled data were randomly partitioned into modeling and validation samples in such a way that both samples contained respondents from all the survey years. This data partitioning scheme minimized the chance of selecting year-specific predictors at the first stage of modeling. The modeling sample was first used to get a preliminary list of significant predictors using the variable selection methodology described below. These predictors were further reduced by using SUDAAN® logistic regression on the validation dataset, resulting in parsimonious models (RTI, 2001). The modeling sample (henceforth referred to as sample 1) had 136,732 respondents, whereas the validation sample (henceforth referred to as sample 2) had 70,667 respondents.

  2. Separate SAS® stepwise logistic regression models were fit to sample 1 for all outcomes by four age group domains. The input list to these models included all linear polynomials (constructed from continuous predictor variables) and other categorical or indicator variables. All predictors that were significant at 5 percent (except in a few cases, where the 10 percent level was chosen) then were input to the 3rd step of variable selection.

  3. Using sample 1, almost all significant predictors from step 2 then were input to AnswerTree® to identify significant higher order (at most three-way) interaction terms. AnswerTree® is an SPSS® software package that uses decision-tree algorithms to build classification systems. The exhaustive chi-squared automatic interaction detector algorithm (CHAID) was used to create the trees. The constraints for making a tree were maximum depth = 3; minimum number of records in parent node = 1,000; minimum number of records in child node = 300; and splitting criterion = 3 percent.

  4. All the significant variables from step 2 along with their corresponding higher order polynomials (quadratic and cubic), interaction of gender with race, and the significant interactions detected by AnswerTree® in step 3 then were input to SAS® stepwise logistic regression models, run on sample 1. All predictors that remained significant at 5 percent (except in a few cases, where the 10 percent level was chosen) then were input to the 5th step of variable selection.

  5. All significant variables from step 4 were input to SUDAAN® logistic regression models fit to the validation sample 2, and predictors that remained significant at the 5 percent level were input to PROC GIBBS and PROC GSTAT software. In all mixed logistic models, race and gender main effects were forced.
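The random partition in step 1 can be sketched as follows. This is a minimal illustration, not the procedure actually used: the report states only that both samples contained respondents from all survey years, so splitting within each year is an assumption, as are the function name `partition_by_year`, the record layout, and the toy data. The modeling fraction is taken from the reported sample sizes (136,732 of 207,399).

```python
import random
from collections import defaultdict


def partition_by_year(records, modeling_fraction, seed=0):
    """Randomly split records into modeling and validation samples.

    The split is done separately within each survey year (an assumption),
    so both samples contain respondents from every year.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["year"]].append(rec)

    modeling, validation = [], []
    for year in sorted(by_year):
        group = list(by_year[year])
        rng.shuffle(group)
        cut = round(len(group) * modeling_fraction)
        modeling.extend(group[:cut])
        validation.extend(group[cut:])
    return modeling, validation


# Fraction implied by the reported sizes of sample 1 and the pooled data.
MODELING_FRACTION = 136_732 / 207_399

# Toy records standing in for respondents (hypothetical data).
records = [{"id": i, "year": 1999 + i % 3} for i in range(300)]
sample1, sample2 = partition_by_year(records, MODELING_FRACTION)
print(len(sample1), len(sample2))
```

Splitting within each year keeps the year mix the same in both samples, which is one way to reduce the chance of picking up year-specific predictors.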

B.5. General Model Description

The model described here is similar to the logistic mixed hierarchical Bayes (HB) model that has been used successfully since the 1999 NSDUH to produce age group-specific small area estimates for the 50 States and the District of Columbia. The following model was used:

$$\log\left[\frac{\pi_{aijk}}{1 - \pi_{aijk}}\right] = x'_{aijk}\beta_a + \eta_{ai} + v_{aij},$$

where $\pi_{aijk}$ is the probability of engaging in the behavior of interest (e.g., to use marijuana in the past month) for person-$k$ belonging to age group-$a$ in substate region-$j$ of State-$i$. Let $x_{aijk}$ denote a $p_a \times 1$ vector of auxiliary variables associated with age group-$a$ and $\beta_a$ denote the associated vector of regression parameters. The age group-specific vectors of auxiliary variables are defined for every block group in the Nation and also include person-level demographic variables, such as race/ethnicity and gender. The vectors of random effects $\eta_i = (\eta_{1i}, \ldots, \eta_{Ai})'$ and $v_{ij} = (v_{1ij}, \ldots, v_{Aij})'$ are assumed to be mutually independent with $\eta_i \sim N_A(0, D_\eta)$ and $v_{ij} \sim N_A(0, D_v)$, where $A$ is the total number of individual age groups modeled (generally $A = 4$). For HB estimation purposes, an improper uniform prior distribution is assumed for $\beta_a$, and proper inverse Wishart prior distributions are assumed for $D_\eta$ and $D_v$. The HB solution for $\pi_{aijk}$ involves a series of complex Markov Chain Monte Carlo (MCMC) steps to generate values of the desired fixed and random effects from the underlying joint distribution. The basic process is described in Folsom et al. (1999), Shah, Barnwell, Folsom, and Vaish (2000), and Wright (2003).
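As a small illustration of how a logistic mixed model of this kind maps predictors and random effects to a probability, the sketch below evaluates the inverse logit of a linear predictor made up of a fixed-effect term, a State-level random effect, and a substate-level random effect. All numeric values and the function name `logistic_probability` are hypothetical; in the actual methodology the parameters are drawn by MCMC rather than fixed.

```python
import math


def logistic_probability(x, beta, eta_ai, v_aij):
    """Inverse logit of the linear predictor x'beta + eta_ai + v_aij.

    x      : auxiliary variables for one person (list of floats)
    beta   : regression parameters for the person's age group
    eta_ai : State-level random effect for that age group
    v_aij  : substate-level random effect for that age group
    """
    if len(x) != len(beta):
        raise ValueError("x and beta must have the same length")
    linear = sum(xi * bi for xi, bi in zip(x, beta)) + eta_ai + v_aij
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical values for a single person in one age group.
x = [1.0, 0.0, 1.0]        # intercept plus two indicator variables
beta = [-3.0, 0.4, 0.2]    # made-up regression parameters
p = logistic_probability(x, beta, eta_ai=0.1, v_aij=-0.05)
print(f"probability: {p:.4f}")
```

The two random effects shift the log-odds for everyone in the same State and substate region, which is how the model borrows strength across areas while still allowing local departures from the fixed-effect prediction.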

Once the required number of MCMC samples for the parameters of interest are generated and tested for convergence properties (see Raftery & Lewis, 1992), the small area estimates for each age group by race/ethnicity by gender cell within a block group can be obtained. These block group-level small area estimates then can be aggregated using the appropriate population count projections to form State- and substate-level small area estimates for the desired age group(s). These small area estimates then are benchmarked to the national design-based estimates (see Appendix A, Section A.4, in Wright & Sathe, 2005).
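The aggregation step can be sketched as a population-weighted average of cell-level estimates. The cell counts and prevalences below are hypothetical, and the function name `aggregate_prevalence` is not from the report. The sketch covers only the weighting arithmetic, not the benchmarking step.

```python
def aggregate_prevalence(cells):
    """Population-weighted average of cell-level prevalence estimates.

    Each cell is a (population_count, prevalence) pair, for example one
    age group by race/ethnicity by gender cell within a block group.
    """
    total_pop = sum(pop for pop, _ in cells)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(pop * prev for pop, prev in cells) / total_pop


# Hypothetical cells belonging to one substate region.
cells = [(1200, 0.04), (800, 0.06), (2000, 0.03)]
substate_estimate = aggregate_prevalence(cells)
print(f"aggregated prevalence: {substate_estimate:.3f}")
```

One natural way to use this in an MCMC setting would be to apply the same aggregation to each retained draw and then summarize the resulting values; the report does not spell out that detail here, so treat it as an assumption.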

B.6. Calculation of Average Annual Rate (Incidence) of First Use of Marijuana

Incidence rates are typically calculated as the number of new initiates of a substance during a period of time (such as in the past year) divided by the estimate of the number of person years of exposure (in thousands). The incidence definition in this report is the result of a simpler definition based on the model-based methodology and is as follows:

Average annual incidence rate = {(Number of marijuana initiates in past 24 months) /
[(Number of marijuana initiates in past 24 months * 0.5) +
Number of persons who never used marijuana]} / 2.

For details on calculating the average annual rate of first use of marijuana, see Appendix A, Section A.6, of the 2002-2003 State estimates report (Wright & Sathe, 2005).
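The formula above can be written as a short function. The input counts are hypothetical and chosen only to make the arithmetic easy to follow; the function name `average_annual_incidence` is not from the report.

```python
def average_annual_incidence(initiates_24mo: float, never_used: float) -> float:
    """Average annual incidence rate as defined in Section B.6.

    The denominator approximates exposure: persons who never used, plus
    half of the persons who initiated in the past 24 months. The two-year
    rate is then divided by 2 to give an annual average.
    """
    exposure = initiates_24mo * 0.5 + never_used
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return (initiates_24mo / exposure) / 2.0


# Hypothetical counts: 400 initiates in the past 24 months, 9,800 never users.
rate = average_annual_incidence(initiates_24mo=400, never_used=9800)
print(f"average annual incidence rate: {rate:.3f}")
```

With these counts the exposure term is 200 + 9,800 = 10,000, the two-year rate is 400 / 10,000, and halving it gives the annual average.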


This page was last updated on January 15, 2009.
