
 

Substance Abuse in States and Metropolitan Areas:
Model Based Estimates from the 1991-1993 National Household Surveys on Drug Abuse

Chapter 2

2. Methodology Used:

2.1 Potential Estimation Strategies for Small Areas

2.2 Small Area Estimation Method Used

2.3 Fitting the Small Area Estimation Model

2.3.1 Data Used

2.3.2 States and MSAs Selected for Estimation

2.3.3 Summary of Methodology

2.3.4 Confidence Intervals

 

  

2. Methodology Used

The methodology used for these small area estimates employs logistic regression models that combine NHSDA data with local area indicators found to be associated with substance abuse. Several innovative strategies, described in Section 2.2, were used to produce these estimates.

  

2.1 Potential Estimation Strategies for Small Areas

Other researchers have used several procedures to produce health and economic statistics for small areas in other fields. Schaible (footnote#6) describes three common indirect estimation methods that federal statistical agencies have used: synthetic estimators, regression estimators, and composite estimators.(footnote#7) State level synthetic estimators are constructed by taking national estimates for demographic subgroups and applying them to the demographic composition of the particular States for which estimates are desired. Synthetic estimators, however, often fail to reflect the actual variation across local areas because demographic characteristics (age, gender, race/ethnicity, etc.) do not fully determine the phenomena being estimated. Regression estimators incorporate additional factors or predictors in the estimation procedure in an attempt to improve the estimates for the small area. Synthetic and regression estimators are indirect estimation methods and do not require any direct measures of the phenomena for the small area. In many cases, however, direct estimates are available for the small area, but the sample size for the small area is not large enough to yield precise estimates. Composite estimators are constructed as a weighted combination of direct survey estimators and indirect estimators. They take maximum advantage of any direct survey information for the small area, reduce the bias associated with indirect estimators, and are likely to be more accurate than either the direct or the indirect estimator alone.(footnote#8)
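To make the synthetic estimator concrete, the Python sketch below applies national subgroup rates to one State's demographic composition. The function name, the subgroup labels, and all rates and counts are hypothetical, and age is used as the only demographic dimension for brevity.

```python
def synthetic_estimate(national_rates, state_counts):
    """Synthetic estimator: national subgroup rates applied to a State's
    demographic composition.

    national_rates: {subgroup: national prevalence rate}
    state_counts:   {subgroup: State population count}
    Returns the State prevalence rate implied by its demographic mix alone.
    """
    total = sum(state_counts.values())
    if total <= 0:
        raise ValueError("State population must be positive")
    expected_cases = sum(national_rates[g] * n for g, n in state_counts.items())
    return expected_cases / total


# Hypothetical figures for illustration only (not NHSDA estimates).
national_rates = {"12-17": 0.08, "18-25": 0.15, "26-34": 0.10, "35+": 0.03}
state_counts = {"12-17": 400, "18-25": 500, "26-34": 600, "35+": 2500}
rate = synthetic_estimate(national_rates, state_counts)  # 242 / 4000 = 0.0605
```

Because the result depends only on the State's demographic mix, two States with the same age composition receive the same estimate regardless of local conditions, which is the limitation of synthetic estimators noted above.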

To produce the small area estimates for the States and MSAs included in this report, we developed procedures that built on earlier small area estimation work covering a variety of social, health, and economic phenomena. The previously cited works by Schaible (1996) and Ghosh and Rao (1994) discuss these types of estimators and note that a variety of research has investigated optimal weighting and estimation schemes for composite estimators. In this study, similar methods were developed to produce estimates of substance abuse.

  

2.2 Small Area Estimation Method Used

This study built on prior methodologies and improved them in several ways. The estimator used can be approximated mathematically as a composite estimator, that is, a weighted average of an indirect regression estimator and a direct survey estimator. The basic form of the composite estimator approximation is:

 

$$\hat{\theta} = \phi_s\,\bar{\pi} + (1 - \phi_s)\left[\bar{p}_s - \left(\bar{\pi}_s - \bar{\pi}\right)\right] \tag{1}$$

 

where $\hat{\theta}$ is the estimated prevalence rate for a given small area $s$. In this equation, $\bar{\pi}$ is the indirect regression estimator, obtained as the population weighted average of block group level predictions of substance abuse prevalence, and the term in square brackets is the direct survey estimator, which is a function of the actual NHSDA survey estimate $\bar{p}_s$ for the local area.(footnote#9) In the bracketed direct survey estimator, $\bar{p}_s$ is the NHSDA survey weighted prevalence rate estimate and $\bar{\pi}_s$ is the corresponding survey weighted mean of person level logistic model predicted probabilities of use.(footnote#10) The weighting factors $\phi_s$ and $1 - \phi_s$ in the composite estimator are functions of the sampling variance of the local area effects and the variances of the logistic regression estimators. They were constructed so that when the NHSDA sample size in a State was large, the estimate was very close to the direct survey estimate.
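The composite form above can be evaluated directly once its components are known. The Python sketch below is a minimal rendering of it; the function and argument names are hypothetical. The report states only that the weight depends on the variance of the local area effects and the variances of the logistic regression estimators, so `fay_herriot_phi` uses a standard Fay-Herriot-style variance ratio purely as an illustrative stand-in, not as the weight actually used in this study.

```python
def composite_estimate(pi_bar, p_bar_s, pi_bar_s, phi_s):
    """Composite estimator: phi_s * indirect + (1 - phi_s) * direct.

    pi_bar   : indirect regression estimate (population weighted mean of
               block group predictions for the area)
    p_bar_s  : NHSDA survey weighted prevalence estimate for the area
    pi_bar_s : survey weighted mean of person level predicted probabilities
    phi_s    : weight on the indirect estimate, between 0 and 1
    """
    if not 0.0 <= phi_s <= 1.0:
        raise ValueError("phi_s must lie in [0, 1]")
    direct = p_bar_s - (pi_bar_s - pi_bar)  # model-adjusted direct estimate
    return phi_s * pi_bar + (1.0 - phi_s) * direct


def fay_herriot_phi(direct_var, area_effect_var):
    """Illustrative stand-in for phi_s (NOT the study's formula): the weight
    on the indirect estimate falls toward 0 as the sampling variance of the
    direct estimate shrinks, i.e. as the State sample grows."""
    return direct_var / (direct_var + area_effect_var)


# Hypothetical inputs for one State.
phi = fay_herriot_phi(direct_var=0.0004, area_effect_var=0.0012)  # 0.25
estimate = composite_estimate(0.10, 0.14, 0.11, phi)  # 0.025 + 0.75 * 0.13
```

With a very small direct-estimate variance the weight approaches 0 and the estimate approaches the direct estimate (0.13 here), mirroring the behavior the report describes for States with large NHSDA samples.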

In fitting these models, three innovations were introduced:

  • First, logistic regression models for the relationship between substance abuse and a variety of predictors, including demographic characteristics and a number of social and economic characteristics, were fit at the block group level using Census block group and tract level predictors of substance abuse. These block group level estimates were then summed to arrive at the estimates for the States and MSAs.

     

  • Second, additional county-level predictors were included in the model to account for still more variation across local areas.

  • Third, the survey design weights were used in the estimation of the logistic regression coefficients and the State and MSA local area effects. This innovation causes the State small area estimates to sum to the NHSDA estimates for US regions and the nation.(footnote#11)


    2.3 Fitting the Small Area Estimation Model

    The logistic regression coefficients and the local area effects for States and MSAs were estimated using an iteratively reweighted least squares algorithm patterned after Breslow and Clayton's (1993)(footnote#12) prescription for analyzing generalized linear mixed models (GLMM).
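Breslow and Clayton's prescription is penalized quasi-likelihood (PQL): each iteration linearizes the logistic model into a working response and solves a weighted linear mixed model. The NumPy sketch below shows that loop for a model with one random intercept per area. It is a simplification under stated assumptions, not the study's implementation: the function name is hypothetical, the area-effect variance `sigma2_u` is held fixed rather than re-estimated, and the survey design weights enter by multiplying the IRLS working weights after being rescaled to sum to the sample size, which is one common pseudo-likelihood convention.

```python
import numpy as np


def fit_weighted_pql(X, y, area, w, sigma2_u, n_iter=50, tol=1e-8):
    """Survey-weighted PQL fit of a logistic model with one random
    intercept per local area (State or MSA).

    X        : (n, p) fixed-effect design matrix, including a column of ones
    y        : (n,) 0/1 outcome indicators
    area     : (n,) integer area codes 0..m-1
    w        : (n,) survey design weights
    sigma2_u : variance of the area effects, held FIXED in this sketch
    Returns (beta, u): fixed-effect coefficients and predicted area effects.
    """
    n, p = X.shape
    m = int(area.max()) + 1
    w = w * n / w.sum()                       # assumption: rescale weights to sum to n
    Z = np.zeros((n, m))
    Z[np.arange(n), area] = 1.0               # area indicator matrix
    C = np.hstack([X, Z])
    penalty = np.zeros((p + m, p + m))
    penalty[p:, p:] = np.eye(m) / sigma2_u    # shrinks area effects toward zero
    coef = np.zeros(p + m)                    # stacked (beta, u)
    for _ in range(n_iter):
        eta = C @ coef
        mu = np.clip(1.0 / (1.0 + np.exp(-eta)), 1e-10, 1 - 1e-10)
        v = mu * (1.0 - mu)                   # binomial variance
        z = eta + (y - mu) / v                # IRLS working response
        W = w * v                             # design weight x binomial variance
        A = C.T @ (C * W[:, None]) + penalty
        new_coef = np.linalg.solve(A, C.T @ (W * z))  # mixed-model equations
        converged = np.max(np.abs(new_coef - coef)) < tol
        coef = new_coef
        if converged:
            break
    return coef[:p], coef[p:]
```

Each pass solves Henderson's mixed-model equations for the current working response; a full Breslow-Clayton fit would also update `sigma2_u` between passes.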

      

    2.3.1 Data Used

    Four types of data were used in the estimation: NHSDA data, Census data, county level (social indicators) correlates of substance abuse, and block group level population projections.

    The NHSDA data were used to fit models for each of the 11 outcome measures described in Chapter 1. Estimates were made for four age groups (12-17, 18-25, 26-34, and 35+). In addition, separate models were fit for two geographic subpopulations: (1) the six large MSAs and (2) the remainder of the nation. Altogether, 88 models were fit (11 outcome measures x 4 age groups x 2 geographic subpopulations). The dependent variables in these models were the rates for the outcome measures. Two types of auxiliary data were used in fitting these models: Census data and county level indicators of drug use. After the models were fit, the population projections were applied to the estimated rates at the block group level.
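The count of 88 models is a full cross-classification of outcome, age group, and geographic subpopulation. A quick Python check follows; the outcome labels are placeholders because the 11 outcome measures themselves are defined in Chapter 1.

```python
from itertools import product

# Placeholder labels: the 11 outcome measures are defined in Chapter 1.
OUTCOMES = [f"outcome_{k}" for k in range(1, 12)]
AGE_GROUPS = ["12-17", "18-25", "26-34", "35+"]
SUBPOPULATIONS = ["six large MSAs", "remainder of nation"]

# One model specification per (outcome, age group, subpopulation) combination.
model_specs = list(product(OUTCOMES, AGE_GROUPS, SUBPOPULATIONS))
print(len(model_specs))  # 88
```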

    NHSDA data: The data used for the estimation came from respondents to the 1991-1993 NHSDA. Essentially the same survey methodology was used in each of these three years. The sample was a deeply stratified, multistage national probability sample of civilian persons age 12 and older living in households and certain group quarters, such as college dormitories and homeless shelters. Civilians living on military installations were included in the target population; military personnel on active duty and most transient populations, such as homeless people not residing in shelters, were not.

    Each year, roughly 120 primary sampling units (PSUs) were selected at the first stage of sampling. These PSUs were generally individual counties or groups of adjacent counties constituting Metropolitan Statistical Areas (MSAs). At the second stage of selection, groups of Census blocks, called sample segments, were selected. Within each segment, dwelling units were selected; within each successfully screened dwelling unit, either zero, one or two occupants were selected for the NHSDA interview. Generally, the NHSDA yields dwelling unit screening response rates of approximately 94 percent and interview response rates of approximately 80 percent. The pooled 1991-1993 NHSDAs yielded a combined sample size of 87,915 people. In 1991-1993, the NHSDA included a special sample from six large MSAs. This sample was designed to provide independent estimates of prevalence of substance abuse in Chicago, Denver, Los Angeles, Miami, New York, and Washington, D.C.
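The two stage-specific response rates can be combined into a rough overall rate. Treating the overall rate as the product of the screening and interview rates is a common convention assumed here, not a figure reported in this chapter.

```python
screening_rate = 0.94  # approximate dwelling unit screening response rate
interview_rate = 0.80  # approximate interview response rate
# Assumed convention: overall response rate = product of the stage rates.
overall_rate = screening_rate * interview_rate
print(f"{overall_rate:.1%}")  # 75.2%
```

With these approximate rates, roughly three quarters of the selected sample would yield a completed interview.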

    Because nearly all of the substance abuse data collected in the survey are highly sensitive, the NHSDA uses rigorous methods to protect the privacy and confidentiality of responses. The interviewer works with the respondent to find a private place for the interview. After answering some of the less sensitive questions posed by the interviewer, the respondent is shown how to complete a series of self-administered questionnaires. These self-administered questionnaires allow the respondent to conceal his or her answers from both the interviewer and any household members who may be nearby. Such privacy and confidentiality protections have been shown to increase the reporting of substance abuse.(footnote#13)

    Census data: Exhibit 2.1 lists the nineteen groups of Census variables that were considered as potential predictors in each of the 88 models formulated for this study. All of these attributes come from the 1990 U.S. Census long form sample.(footnote#15)

      

    Exhibit 2.1  1990 Census Variables Used to Model Prevalence of Substance Abuse

    1.  Race x Hispanic - -   Percent:
     White nonHispanic
     Black nonHispanic
     Hispanic
     Other
    2.  Education for persons 18 or older - - Percent with:
     0-8 years
     9-12 years and no H.S. diploma
     H.S. graduate
     some college and no degree
     associate degree
     bachelors, graduate, or professional degree
    3.  Age - - Percent aged:
     0-18 years
     19-24 years
     25-34 years
     35-44 years
     45-54 years
     55-64 years
     65 and over
    4.  Poverty - - Percent:
     families below poverty level
    5.  Public Assistance - - Percent of:
     households with public assistance income
    6.  Disability - - Percent:
     persons 16-64 with a work disability
    7.  Household composition - - Percent:
     one-person households
     of households with female heads (no spouse present) with children under 18
    8.  Employment - - Percent:
     of men 16 years and older in the labor force
     of women 16 years and older in the labor force
    9.  Housing value - - owner occupied units:
    Median value of owner occupied housing units
    10.  Housing rent - - rental units:
    Median rents for rental units
    11.  Sex by marital status  (persons 16 years and older) - - Percent:
     Females currently married and not separated
     Females separated, divorced, or widowed
     Females never married
     Males currently married and not separated
     Males separated, divorced, or widowed
     Males never married
    12.  Income - -
    Median Household Income
    13.  Urbanicity - - Percent:
     of persons residing in an urban place
    14.  Urbanized Area - - Percent:
     of persons in an MSA urbanized area
    15.  Age of Housing Units (HU) - - Percent:
     of HUs built before 1939
     of HUs built from 1940 to 1949
    16.  High School Dropout Rate (Tract level only) - - Percent:
     of high school age children who have dropped out
    17.  Underclass Tract Indicator (Tract level only)
     
    18.  Hispanic Subpopulations - - Percent:
     of Hispanics that are Cuban
     of Hispanics that are Puerto Rican
    19.  Other Race Subpopulations - - Percent:
     Population that is Asian and Pacific Islander
     Population that is Native American, Alaskan, or Aleut
    From Summary Tape File 3; 1990 Census of Population and Housing.(footnote#14)

     

    County level (social indicator) correlates of substance abuse: In addition to the Census variables, recoded county level 'social indicators' of substance abuse were also considered. These county level variables were obtained from three sources. The first source is the FBI's Uniform Crime Reports database for 1991, which yielded arrest rates per 10,000 persons for illegal drug possession and for drug sales/manufacture, by several reported drug categories, as well as total violent crime arrest rates. The second source combined data from the 1991 and 1992 National Drug and Alcoholism Treatment Unit Survey (NDATUS) conducted by the Substance Abuse and Mental Health Services Administration. From this source, data were obtained on the 1991 and 1992 average treatment rates per 1,000 county residents for (1) alcohol treatment alone and (2) illicit drug treatment (including treatment for both drug and alcohol use). Third, 1990 alcohol related death rates per 10,000 county residents were obtained from the National Center for Health Statistics national death certificate registry. Two such rates were considered in this research: (a) the "any related" rate, which includes all ICD-8 cause-of-death codes that are deemed to have a significant link to alcohol abuse, and (b) a more restrictive rate, which requires explicit mention of alcohol on the death certificate.
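The county indicators are simple rates with different denominators (per 10,000 residents for arrests and deaths, per 1,000 for treatment). The sketch below shows the arithmetic with hypothetical county figures and a hypothetical helper `rate_per`; it assumes the two-year treatment rate is the mean of the 1991 and 1992 rates computed on the same population base, which the text does not spell out.

```python
def rate_per(count, population, per):
    """Events per `per` residents (e.g. per=10_000 for arrest rates)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * per


# Hypothetical county figures, not actual UCR, NDATUS, or NCHS data.
county_population = 250_000
possession_arrests_1991 = 1_100
treated_1991, treated_1992 = 900, 1_000

arrest_rate = rate_per(possession_arrests_1991, county_population, 10_000)  # 44.0
# Assumed: two-year average of annual rates on the same population base.
treatment_rate = (rate_per(treated_1991, county_population, 1_000)
                  + rate_per(treated_1992, county_population, 1_000)) / 2  # 3.8
```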

    Block group level population projections: Population projections for 1992 for each Census block group were obtained from Claritas, Inc.(footnote#16) These projections included counts by age group (12-17, 18-24, 25-29, 30-34, and 35+ years old), by gender, by race (white, black, American Indian plus Asian & Pacific Islander, and other races), and by Hispanic indicator. These block group level population projections were adjusted by gender category to conform to the four age groups and the four race/ethnicity groups used for NHSDA data reporting.
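The Claritas age bands do not line up exactly with the NHSDA groups: age 25 falls in the Claritas 25-29 band but belongs to the NHSDA 18-25 group. The report does not describe how this was resolved, so the sketch below is hypothetical. It assumes ages are spread uniformly within the 25-29 band, moves one fifth of that band into 18-25, and would be applied separately to each gender's counts.

```python
def to_nhsda_age_groups(claritas, share_age_25=0.2):
    """Collapse Claritas age counts into NHSDA reporting groups.

    claritas: counts keyed by '12-17', '18-24', '25-29', '30-34', '35+'
    share_age_25: ASSUMED fraction of the 25-29 count who are age 25
                  (uniform within the band by default); illustrative only.
    """
    moved = share_age_25 * claritas["25-29"]
    return {
        "12-17": claritas["12-17"],
        "18-25": claritas["18-24"] + moved,
        "26-34": claritas["25-29"] - moved + claritas["30-34"],
        "35+": claritas["35+"],
    }


# Hypothetical counts for one block group and one gender.
females = {"12-17": 40, "18-24": 55, "25-29": 50, "30-34": 45, "35+": 210}
print(to_nhsda_age_groups(females))
# {'12-17': 40, '18-25': 65.0, '26-34': 85.0, '35+': 210}
```

The regrouping preserves the block group total, so only the allocation across age groups depends on the assumed share.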

      

    2.3.2 States and MSAs Selected for Estimation

    Composite small area estimators require at least some direct information for the small areas under consideration. Thus, only States and MSAs with some NHSDA sample points were included in the study. The States and MSAs selected for small area estimation are presented in Exhibits 2.2A and 2.2B, which show the number of people who responded to the combined 1991-1993 NHSDA surveys, other NHSDA sample characteristics (the numbers of sample MSA/County units and sample block groups), and the estimated 1992 population.

    The States and MSAs presented in Exhibit 2.2 were chosen for small area estimation because:

    • The NHSDA sample size was large enough (~400 persons) to support model based, indirect estimation, though not necessarily large enough to support direct NHSDA estimation;
    • The number of distinct sample MSA/County units was at least 4; and
    • The number of distinct sample segments was greater than 40.

    Four exceptions to the four or more MSA/County unit rule were allowed for States that met the sample person and area segment minimums.
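The two area-based rules can be checked against the figures in Exhibit 2.2A. The sketch below transcribes the State rows and lists the States with fewer than four MSA/County units. It does not test the person threshold, which is stated only approximately: South Carolina (330 respondents) and West Virginia (394) fall somewhat below 400.

```python
# (State, sample MSA/County units, sample segments, responding persons),
# transcribed from Exhibit 2.2A.
STATES = [
    ("New Jersey", 6, 167, 1523), ("New York", 7, 843, 8505),
    ("Pennsylvania", 11, 269, 2133), ("Florida", 13, 964, 10066),
    ("Georgia", 5, 118, 1061), ("Kentucky", 6, 112, 1081),
    ("Louisiana", 6, 133, 1099), ("North Carolina", 12, 222, 1863),
    ("Oklahoma", 4, 63, 494), ("South Carolina", 4, 48, 330),
    ("Tennessee", 4, 92, 780), ("Texas", 13, 503, 5082),
    ("Virginia", 9, 362, 3538), ("West Virginia", 3, 44, 394),
    ("Illinois", 6, 799, 8088), ("Indiana", 6, 114, 978),
    ("Kansas", 4, 60, 515), ("Michigan", 5, 175, 1187),
    ("Minnesota", 3, 76, 684), ("Missouri", 6, 120, 1059),
    ("Ohio", 12, 264, 2021), ("Wisconsin", 3, 55, 475),
    ("California", 22, 1320, 12364), ("New Mexico", 5, 74, 676),
    ("Oregon", 4, 59, 412), ("Washington", 3, 73, 690),
]
MIN_UNITS = 4      # at least 4 MSA/County units
MIN_SEGMENTS = 40  # more than 40 segments (strict inequality)

unit_exceptions = [s for s, units, _, _ in STATES if units < MIN_UNITS]
all_meet_segments = all(segs > MIN_SEGMENTS for _, _, segs, _ in STATES)
print(unit_exceptions)    # ['West Virginia', 'Minnesota', 'Wisconsin', 'Washington']
print(all_meet_segments)  # True
```

Exactly four States fall below the unit minimum while still meeting the segment minimum, consistent with the four exceptions described above.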

      

    Exhibit 2.2A  States Selected for Inclusion in the Study: Population Size and NHSDA Sample Characteristics

    STATE                  SAMPLE MSA/    SAMPLE          SAMPLE RESPONDING   1992 POPULATION
                           COUNTIES¹      BLOCK GROUPS²   PERSONS³            PROJECTION⁴
    TOTAL UNITED STATES    213            8,942           84,974              205,945
    NORTH EAST REGION      34             1,489           13,681              42,236
      New Jersey           6              167             1,523               6,443
      New York             7              843             8,505               14,892
      Pennsylvania         11             269             2,133               9,945
    SOUTH REGION           49             1,723           15,456              71,396
      Florida              13             964             10,066              11,265
      Georgia              5              118             1,061               5,442
      Kentucky             6              112             1,081               3,053
      Louisiana            6              133             1,099               3,387
      North Carolina       12             222             1,863               5,641
      Oklahoma             4              63              494                 2,563
      South Carolina       4              48              330                 2,939
      Tennessee            4              92              780                 4,117
      Texas                13             503             5,082               13,751
      Virginia             9              362             3,538               5,227
      West Virginia        3              44              394                 1,497
    NORTH CENTRAL REGION   85             3,271           32,346              48,968
      Illinois             6              799             8,088               9,378
      Indiana              6              114             978                 4,581
      Kansas               4              60              515                 2,010
      Michigan             5              175             1,187               7,615
      Minnesota            3              76              684                 3,576
      Missouri             6              120             1,059               4,223
      Ohio                 12             264             2,021               8,946
      Wisconsin            3              55              475                 4,021
    WEST REGION            45             2,459           23,491              43,346
      California           22             1,320           12,364              24,342
      New Mexico           5              74              676                 1,199
      Oregon               4              59              412                 2,397
      Washington           3              73              690                 4,094

    ¹ MSA/Counties refers to geographic entities formed to estimate random effect terms in the logistic model, which are generally analogous to NHSDA primary sampling units (PSUs). The exceptions are the distinct MSA constituents of PSUs that crossed State boundaries or combined more than one MSA.

    ² Block groups refers to the sample segments that were selected at the second stage of selection in the 1991-1993 NHSDA.

    ³ 87,915 people responded to the 1991-1993 NHSDA; however, 2,941 people were omitted from the small area estimation research because of missing local area indicator variables that were used as potential predictors in the models.

    ⁴ Population projections are presented in thousands.
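As a consistency check on Exhibit 2.2A, the four region rows can be summed and compared with the U.S. total, and footnote 3 can be checked against the total number of responding persons.

```python
# Region rows from Exhibit 2.2A:
# (MSA/County units, block groups, responding persons, 1992 population in 1000s)
REGIONS = {
    "North East": (34, 1489, 13681, 42236),
    "South": (49, 1723, 15456, 71396),
    "North Central": (85, 3271, 32346, 48968),
    "West": (45, 2459, 23491, 43346),
}
US_TOTAL = (213, 8942, 84974, 205945)

# Column-wise sums across the four regions.
region_sums = tuple(sum(column) for column in zip(*REGIONS.values()))
print(region_sums)   # (213, 8942, 84974, 205946)
print(87915 - 2941)  # 84974 (footnote 3: respondents minus omitted cases)
```

The first three columns match the U.S. total exactly; the population sum is one thousand higher than the reported total, presumably because each figure is rounded to the nearest thousand.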

      

    Exhibit 2.2B  MSA Small Areas Selected for Inclusion in the Study: Population Size and NHSDA Sample Characteristics

    MSA                        SAMPLE          SAMPLE RESPONDING   1992 POPULATION
                               BLOCK GROUPS²   PERSONS³            PROJECTION⁴
    Anaheim-Santa Ana, CA      *               *                   1,996
    Atlanta, GA                *               *                   2,425
    Baltimore, MD              *               *                   1,996
    Boston, MA                 *               *                   3,145
    Chicago, IL                735             7,537               4,981
    Dallas, TX                 *               *                   2,120
    Denver, CO                 719             7,585               1,346
    Detroit, MI                *               *                   3,593
    El Paso, TX                *               *                   456
    Houston, TX                *               *                   2,661
    Los Angeles, CA            768             7,533               7,127
    Miami-Hialeah, FL          725             8,142               1,600
    Minneapolis-St. Paul, MN   *               *                   2,035
    Nassau-Suffolk, NY         *               *                   2,178
    New York, NY               730             7,676               7,086
    Newark, NJ                 *               *                   1,500
    Oakland, CA                *               *                   1,727
    Philadelphia, PA-NJ        *               *                   4,037
    Phoenix, AZ                *               *                   1,770
    San Antonio, TX            *               *                   1,040
    San Bernardino, CA         *               *                   2,122
    San Diego, CA              *               *                   2,089
    St. Louis, MO-IL           *               *                   2,006
    Tampa-St. Petersburg, FL   *               *                   1,822
    Washington, DC             725             7,795               3,345

    * Number of sample block groups ranged from 40 to 110; number of respondents ranged from 400 to 1,200 in these MSAs.

    ² Block groups refers to the sample segments that were selected at the second stage of selection in the 1991-1993 NHSDA.

    ³ 87,915 people responded to the 1991-1993 NHSDA; however, 2,941 people were omitted from the small area estimation research because of missing local area indicator variables that were used as potential predictors in the models.

    ⁴ Population projections are presented in thousands.

    See Appendix B for a list of counties included in each MSA.

      

    2.3.3 Summary of Methodology

    In summary, the estimates were produced by completing the following three basic steps:

    • Estimate regression parameters: Using NHSDA data, logistic regression models were developed that identified (and estimated the parameters associated with) local area indicators that were significant predictors of the eleven substance abuse measures in NHSDA sample locations. Separate models were run for each of the 11 outcome measures and for each of four age groups (12-17, 18-25, 26-34, and 35+). Because the 1991-1993 NHSDA included very large samples for six large US cities, separate models were fit for these six cities combined and for the remainder of the nation, for a total of 88 separate models. An important feature of these logistic regression models was the inclusion of random-effect parameters that adjust for the actual "direct" estimates obtained from NHSDA sample data in the States and MSAs for which small area estimates were to be generated.
    • Apply regression parameters to the entire U.S.: Predicted substance abuse rates were generated for every Census block group in the U.S. by applying the regression parameters to the local area indicators, which were known for every block group. Within each Census block group, estimates were made for each of the 11 measures and for 32 demographic groups defined by four age groups, four race/ethnicity groups (Hispanic, non-Hispanic black, non-Hispanic white, and non-Hispanic other race), and gender (4 x 4 x 2 = 32 demographic subgroups).
    • Sum up block group estimates to the State and MSA levels: Estimated rates were multiplied by population estimates for each block group, resulting in estimated numbers of people for each of the eleven attributes. These numbers were then summed over all block groups in each State and MSA to give the final State and MSA totals.
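Steps 2 and 3 amount to a logistic prediction for each demographic cell followed by a population-weighted sum. The Python sketch below is illustrative only: the function names, indicator values, and cells are hypothetical, the area effect is simply added to the linear predictor, and only two cells per block group are used instead of the report's 32.

```python
import math
from collections import defaultdict


def predict_rate(x, beta, u_area=0.0):
    """Step 2: logistic prediction for one demographic cell of a block group.
    x: local area indicator values, with a leading 1.0 for the intercept."""
    eta = sum(b * xi for b, xi in zip(beta, x)) + u_area
    return 1.0 / (1.0 + math.exp(-eta))


def aggregate(cells):
    """Step 3: cells is an iterable of (area, predicted_rate, population).
    Returns {area: (estimated_users, population, prevalence_rate)}."""
    users, pop = defaultdict(float), defaultdict(float)
    for area, rate, n in cells:
        users[area] += rate * n  # rate x population = estimated users
        pop[area] += n
    return {a: (users[a], pop[a], users[a] / pop[a]) for a in users}


# Hypothetical cells: two block groups in area "A", two cells each.
cells = [("A", 0.10, 300), ("A", 0.05, 700), ("A", 0.20, 100), ("A", 0.02, 900)]
users, population, rate = aggregate(cells)["A"]
# users ~ 103, population = 2000, rate ~ 0.0515
```

Dividing the summed users by the summed population gives the area prevalence rate, which is equivalent to a population-weighted average of the cell rates.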

    2.3.4 Confidence Intervals

    Asymmetric 95 percent confidence intervals were constructed using the logit transformation, the typical procedure when estimating characteristics with small prevalence. Because the interval is formed on the logit scale and then transformed back, it is asymmetric around the estimate and always stays between 0 and 1. The mean square errors used to construct the confidence intervals include a contribution that accounts for the variances and covariances of the logistic regression coefficients incorporated in the indirect component, and a contribution for the sampling variance of the direct local area estimate. The weight placed on these mean square error contributions depends on the values of the weights in Equation (1).
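The report does not give the interval formula, so the following is a minimal sketch of a standard logit-scale construction: the root mean square error of the estimate is converted to the logit scale with the delta method (an assumption here), a symmetric interval is formed there, and the endpoints are back-transformed. The function names are hypothetical.

```python
import math


def _expit(t):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-t))


def logit_ci(theta, mse, z=1.96):
    """Asymmetric confidence interval for a prevalence theta in (0, 1).

    The interval is symmetric on the logit scale and back-transformed.
    Assumed delta-method conversion of the MSE to the logit scale:
        Var[logit(theta)] ~= mse / (theta * (1 - theta)) ** 2
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    center = math.log(theta / (1.0 - theta))
    half_width = z * math.sqrt(mse) / (theta * (1.0 - theta))
    return _expit(center - half_width), _expit(center + half_width)


low, high = logit_ci(0.05, 0.01 ** 2)  # estimate 5%, root MSE 1 point
```

For an estimate of 5 percent with a root mean square error of 1 percentage point, this gives roughly 3.4 to 7.4 percent, an interval that extends further above the estimate than below it.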


    This page was last updated on June 16, 2008.
