
2002-2004 Sub State Report of Substance Use & Serious Psychological Distress

Section B: Substate Region Estimation Methodology

This report includes substate region level estimates of 22 substance use measures (see Section B.2) using the combined data from the 2002, 2003, and 2004 National Surveys on Drug Use and Health (NSDUHs).

The survey-weighted hierarchical Bayes (SWHB) methodology used in the production of State estimates from the 1999-2004 surveys also was used in the production of the 2002-2004 substate estimates. The SWHB methodology is described by Folsom, Shah, and Vaish (1999). A brief discussion of the precision and validation of the estimates and interpretation of the prediction intervals (PIs) is given in Section B.1. Section B.2 lists the 22 substance use measures for which substate-level small area estimates were produced. The list of predictors used in the 2002-2004 substate-level small area estimation (SAE) modeling is given in Section B.3. The methodology used to select relevant predictors is described in Section B.4. The goals of the SAE modeling, the general model description, and the implementation of SAE modeling remain the same and are described in Appendix E of the 2001 State report (Wright, 2003). A general model description is given in Section B.5. A short description of the calculation of the rate of first use of marijuana, serious psychological distress, and underage drinking is included in Section B.6.

Small area estimates obtained using the SWHB methodology are design consistent (i.e., for States or substates with large sample sizes, the small area estimates are close to the robust design-based estimates). The substate small area estimates when aggregated by using the appropriate population totals result in national small area estimates that are very close to the national design-based estimates. However, for many reasons, including internal consistency, it is desirable to have national small area estimates exactly match the national design-based estimates. Beginning in 2002, exact benchmarking was introduced (see Appendix A, Section A.4, in Wright & Sathe, 2005). The small area estimates presented here have been benchmarked to the national design-based estimates.

B.1. Precision and Validation of the Estimates

The primary purpose of this report is to give policy officials a better perspective on the range of prevalence estimates within and across States. Because the data were collected in a consistent manner by field interviewers who adhered to the same procedures and administered the same questions across all States and substate areas, the results are comparable across the 50 States and the District of Columbia.

The 95 percent PI associated with each estimate provides a measure of the accuracy of the estimate. It defines the range within which the true value can be expected to fall 95 percent of the time. For example, the prevalence of past month use of marijuana in Region 1 in Alabama is approximately 4.0 percent, and the 95 percent PI ranges from 3.0 to 5.3 percent. Therefore, the probability is 0.95 that the true value is within that range. The PI indicates the uncertainty due to both sampling variability and model bias. The key assumption underlying the validity of the PIs is that the State- and substate-level error (or bias) terms in the models behave like random effects with zero means and common variance components.
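The Alabama Region 1 interval above is not symmetric around the estimate in percentage points (1.0 below, 1.3 above). The short check below, a sketch rather than the report's procedure, shows that the same interval is close to symmetric on the log-odds scale. The source does not state how the PIs were constructed, so reading this as evidence of a log-odds construction is an assumption; the function and variable names are illustrative only.

```python
import math


def logit(p):
    """Return the log-odds of a proportion p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


# Values quoted in the text for past month marijuana use, Region 1, Alabama.
estimate, lower, upper = 0.040, 0.030, 0.053

# Distances from the estimate to each bound, in percentage points.
points_below = 100 * (estimate - lower)
points_above = 100 * (upper - estimate)

# The same distances measured on the log-odds scale.
below = logit(estimate) - logit(lower)
above = logit(upper) - logit(estimate)

print(f"percentage points below/above: {points_below:.1f} / {points_above:.1f}")
print(f"log-odds distance below/above: {below:.3f} / {above:.3f}")
```

The two log-odds distances differ only in the third decimal place, which is within what rounding of the published bounds could produce.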

A comparison of the standard errors (SEs) among substate areas with small (n ≤ 500), medium (500 < n ≤ 1,000), and large (n > 1,000) sample sizes for the 22 measures in this report shows that the small area estimates behave in predictable ways. Regardless of whether or not the substate area is from one of the eight States with a large annual sample size (3,000 to 4,000) or one of the other States (n = 900 annually), the sizes of the PIs are very similar and are primarily a function of the sample size of the substate area and the prevalence estimate of the measure. Substate areas with large sample sizes had the smallest SEs.

For past month use of alcohol, where the national prevalence for all persons aged 12 or older was 50.4 percent (for 2002-2004), the average relative standard error (RSE) was about 5.4 percent, and the RSE for substate areas with a sample size greater than 1,000 was about 3.4 percent. For substate areas with sample sizes between 500 and 1,000 records, the average RSE was 4.7 percent; for sample sizes smaller than 500, the RSE average was 6.0 percent.

For past month use of marijuana (with a national prevalence of 6.1 percent), the average RSE was 9.8 percent for substate areas with large samples. For medium sample sizes, the average RSE was 12.8 percent, and for samples smaller than 500, the RSE was 15.3 percent. Substance use measures with lower prevalences, such as past year use of cocaine (2.5 percent nationally), displayed larger average RSEs. For sample sizes greater than 1,000, the average RSE was 13.8 percent. For substate areas of medium sample sizes, the average RSE was 16.2 percent, and for samples smaller than 500, the average RSE was 17.8 percent.
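To make the RSE figures concrete, the sketch below assumes the usual definition of the RSE as the standard error expressed as a percentage of the estimate (the report's own footnote defining it is not reproduced in this section). It converts the average RSEs quoted above into rough standard errors in percentage points, using the national prevalence as a stand-in for a typical substate estimate. That substitution is a simplification, and the function names are illustrative.

```python
def relative_standard_error(estimate, standard_error):
    """RSE in percent: the standard error as a share of the estimate."""
    return 100.0 * standard_error / estimate


def implied_standard_error(estimate, rse_percent):
    """Invert the RSE definition to recover a standard error."""
    return estimate * rse_percent / 100.0


# (national prevalence in percent, average RSE in percent by sample-size group),
# all taken from the text above.
reported = {
    "past month alcohol use": (50.4, {"large": 3.4, "medium": 4.7, "small": 6.0}),
    "past month marijuana use": (6.1, {"large": 9.8, "medium": 12.8, "small": 15.3}),
    "past year cocaine use": (2.5, {"large": 13.8, "medium": 16.2, "small": 17.8}),
}

for measure, (prevalence, rse_by_group) in reported.items():
    for group, rse in rse_by_group.items():
        se = implied_standard_error(prevalence, rse)
        print(f"{measure:26s} {group:6s} RSE {rse:5.1f}%  SE ~ {se:.2f} points")
```

The pattern in the text shows up directly: rarer measures have larger RSEs even though their standard errors in percentage points are smaller.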

The SAE methods used for substate regions in this report were previously validated for the NSDUH State-by-age group small area estimates (Wright, 2002). This validation exercise used direct estimates from pairs of large sample States (n = 7,200) as internal benchmarks. These internal benchmarks were compared with small area estimates based on random subsamples (n = 900) that mimicked a single year small State sample. The associated age group–specific small area estimates were based on sample sizes targeted at n = 300. Therefore, validation of the State-by-age group small area estimates should lend some validity to the small sample size substate small area estimates reported here.

B.2. Variables Modeled

Substate-level small area estimates were produced for the following set of 22 binary (0, 1) substance use measures, using the 2002-2004 NSDUHs:

  1. past month use of any illicit drugs,
  2. past month use of any illicit drug other than marijuana,
  3. past month use of marijuana,
  4. average annual rate of first use of marijuana,
  5. perception of great risk of smoking marijuana once a month,
  6. past year use of marijuana,
  7. past year use of cocaine,
  8. past year nonmedical use of pain relievers,
  9. past month use of alcohol,
  10. past month binge alcohol use,
  11. perception of great risk of having five or more drinks of an alcoholic beverage once or twice a week,
  12. past month use of cigarettes,
  13. past month use of any tobacco product,
  14. perception of great risk of smoking one or more packs of cigarettes per day,
  15. past year alcohol dependence,
  16. past year any illicit drug dependence,
  17. past year alcohol dependence or abuse,
  18. past year any illicit drug dependence or abuse,
  19. past year dependence on or abuse of any illicit drug or alcohol,
  20. needing but not receiving treatment for illicit drug use in the past year,
  21. needing but not receiving treatment for alcohol use in the past year, and
  22. past year serious psychological distress (SPD).

In addition to the 22 measures listed above, estimates also have been produced for the underage use of alcohol and underage binge alcohol use.

B.3. Predictors Used in Logistic Regression Models

Local area data used as potential predictor variables in the mixed logistic regression models were obtained from several sources, including Claritas, the U.S. Census Bureau, the Federal Bureau of Investigation (FBI) (Uniform Crime Reports), Health Resources and Services Administration (Area Resource File), the Substance Abuse and Mental Health Services Administration (SAMHSA) (National Survey of Substance Abuse Treatment Services [N-SSATS]), and the National Center for Health Statistics (mortality data). The list of sources of data used in the modeling is provided below.

To obtain a detailed list of predictors, please see Appendix A, Section A.2, of the 2003-2004 State estimates report (Wright & Sathe, 2006).

B.4. Selection of Independent Variables for the Models

No new variable selection was done. The same fixed-effect predictors that were used in modeling the 2002-2003 and 2003-2004 State estimates were used to model the 2002-2004 substate estimates.

B.5. General Model Description

The model described here is similar to the logistic mixed hierarchical Bayes (HB) model that was used to produce the 1999-2001 substate small area estimates (Office of Applied Studies [OAS], 2005). The following model was used:

$$\log\left[\frac{\pi_{aijk}}{1-\pi_{aijk}}\right] = x'_{aijk}\beta_a + \eta_{ai} + v_{aij},$$

where $\pi_{aijk}$ is the probability of engaging in the behavior of interest (e.g., using marijuana in the past month) for person $k$ belonging to age group $a$ in substate region $j$ of State $i$. Let $x_{aijk}$ denote a $p_a \times 1$ vector of auxiliary variables associated with age group $a$ and $\beta_a$ denote the associated vector of regression parameters. The age group–specific vectors of auxiliary variables are defined for every block group in the Nation and also include person-level demographic variables, such as race/ethnicity and gender. The vectors of random effects $\eta_i = (\eta_{1i}, \ldots, \eta_{Ai})'$ and $v_{ij} = (v_{1ij}, \ldots, v_{Aij})'$ are assumed to be mutually independent with $\eta_i \sim N_A(0, D_\eta)$ and $v_{ij} \sim N_A(0, D_v)$, where $A$ is the total number of individual age groups modeled (generally $A = 4$). For HB estimation purposes, an improper uniform prior distribution is assumed for $\beta_a$, and proper Wishart prior distributions are assumed for $D_\eta^{-1}$ and $D_v^{-1}$. The HB solution for $\pi_{aijk}$ involves a series of complex Markov Chain Monte Carlo (MCMC) steps to generate values of the desired fixed and random effects from the underlying joint distribution. The basic process is described in Folsom et al. (1999), Shah, Barnwell, Folsom, and Vaish (2000), and Wright (2003).
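As a reading aid for the model, the sketch below evaluates the probability for one person given a single set of parameter values: the fixed-effect term plus the State-level and substate-level random effects for that person's age group, mapped back from the log-odds scale. It is not the SWHB estimation procedure, which draws these quantities by MCMC; all numeric values and names are hypothetical.

```python
import math


def inverse_logit(z):
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(x, beta, eta_ai, v_aij):
    """Probability implied by the model for one person.

    x      -- auxiliary variables for the person (age group a)
    beta   -- regression parameters for age group a, same length as x
    eta_ai -- State-level random effect for age group a in State i
    v_aij  -- substate-level random effect for age group a in region j
    """
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta)) + eta_ai + v_aij
    return inverse_logit(linear_predictor)


# Hypothetical inputs: an intercept and two indicator predictors.
x = [1.0, 0.0, 1.0]
beta = [-3.0, 0.4, 0.2]

p = predicted_probability(x, beta, eta_ai=0.1, v_aij=-0.05)
print(f"predicted probability: {p:.4f}")
```

In the full procedure this calculation would be repeated for every MCMC draw of the parameters, so that each cell has a posterior distribution rather than a single value.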

Once the required number of MCMC samples for the parameters of interest are generated and tested for convergence properties (see Raftery & Lewis, 1992), the small area estimates for each age group by race/ethnicity by gender cell within a block group can be obtained. These block group–level small area estimates then can be aggregated using the appropriate population count projections to form substate- and State-level small area estimates for the desired age group(s). These small area estimates then are benchmarked to the national design-based estimates (see Appendix A, Section A.4, in Wright & Sathe, 2006).
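The aggregation step can be pictured as a population-weighted average of cell-level estimates. The sketch below assumes each cell (an age group by race/ethnicity by gender cell within a block group) carries a projected population count and an estimated probability; the numbers are invented for illustration, and the benchmarking adjustment described in Wright and Sathe (2006) is not shown.

```python
def aggregate(cells):
    """Population-weighted average of cell-level estimates.

    cells -- iterable of (population_count, estimated_probability) pairs
    """
    cells = list(cells)
    total_population = sum(count for count, _ in cells)
    if total_population <= 0:
        raise ValueError("total population must be positive")
    weighted_sum = sum(count * prob for count, prob in cells)
    return weighted_sum / total_population


# Hypothetical cells belonging to one substate region.
region_cells = [
    (1200, 0.05),
    (800, 0.08),
    (2000, 0.03),
]

region_estimate = aggregate(region_cells)
print(f"substate estimate: {100 * region_estimate:.1f} percent")
```

The same function applied to all cells in a State would give the State-level estimate, since both levels use the same cell estimates with different sets of population counts.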

B.6. Calculation of Average Annual Rate (Incidence) of First Use of Marijuana, Serious Psychological Distress, and Underage Drinking

Incidence rates typically are calculated as the number of new initiates of a substance during a period of time (such as in the past year) divided by the estimate of the number of person years of exposure (in thousands). The incidence rate definition per 100 person years of exposure used in this report is the result of a simpler definition based on the model-based methodology and is as follows:

$$\text{Average annual incidence rate} = 100 \times \left\{ \frac{\text{Number of marijuana initiates in past 24 months}}{(0.5 \times \text{Number of marijuana initiates in past 24 months}) + \text{Number of persons who never used marijuana}} \right\} \Big/ \, 2$$

For details on calculating the average annual rate of first use of marijuana, see Appendix A, Section A.5, of the 2003-2004 State estimates report (Wright & Sathe, 2006).
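A direct transcription of the incidence formula in this section is sketched below. The input counts are hypothetical, and in the report both quantities are model-based estimates rather than raw counts.

```python
def average_annual_incidence_rate(initiates_24_months, never_used):
    """Average annual rate of first use per 100 person years of exposure.

    initiates_24_months -- number who first used marijuana in the past 24 months
    never_used          -- number who have never used marijuana
    """
    # Initiates count for half of their exposure; never-users count in full.
    exposure = 0.5 * initiates_24_months + never_used
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    # Divide by 2 to turn the 24-month rate into an average annual rate.
    return 100.0 * (initiates_24_months / exposure) / 2.0


# Hypothetical counts: 40 initiates in 24 months, 980 who never used.
rate = average_annual_incidence_rate(40, 980)
print(f"average annual incidence rate: {rate:.1f} per 100 person years")
```

With these inputs the exposure is 20 + 980 = 1,000, so the 24-month rate is 4.0 per 100 and the average annual rate is 2.0 per 100 person years.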

In 2004, SPD was measured using the K6 screening instrument for nonspecific psychological distress (Furukawa, Kessler, Slade, & Andrews, 2003; Kessler et al., 2003). In previous NSDUH reports, the K6 scale was referred to as a measure of serious mental illness (SMI). An adjusted measure of SPD was created in 2004. For details on how SPD was produced and adjusted for 2004, see Appendix A, Section A.7, of the 2003-2004 State estimates report (Wright & Sathe, 2006). For the purpose of producing substate-level estimates of SPD for this report, SMI data from 2002 and 2003 were pooled with data using the adjusted measure of SPD from 2004.

To obtain small area estimates for persons aged 12 to 20 for past month alcohol use and binge alcohol use, a separate set of models was fit for these two outcomes for the 12 to 17 age group and the 18 to 20 age group (similar to what was done for producing State estimates using the 2003-2004 NSDUH data). For details, refer to Appendix A, Section A.6, of the 2003-2004 State estimates report (Wright & Sathe, 2006).


This page was last updated on January 15, 2009.

SAMHSA, an agency in the Department of Health and Human Services, is the Federal Government's lead agency for improving the quality and availability of substance abuse prevention, addiction treatment, and mental health services in the United States.
