Appendix B: Statistical Methods and Limitations of the Data - 2000 NHSDA - Substance Dependence, Abuse, and Treatment

An important limitation of the National Household Survey on Drug Abuse (NHSDA) estimates of drug use prevalence is that they are designed to describe only the target population of the survey: the civilian, noninstitutionalized population aged 12 or older. Although this population includes almost 98 percent of the total U.S. population aged 12 or older, it excludes some important and unique subpopulations who may have very different drug-using patterns. For example, the survey excludes active military personnel, who have been shown to have significantly lower rates of illicit drug use. Persons living in institutional group quarters, such as prisons and residential drug treatment centers, are not included in the NHSDA and have been shown in other surveys to have higher rates of illicit drug use. Also excluded are homeless persons not living in a shelter on the survey date, another population shown to have higher than average rates of illicit drug use. Appendix D describes other surveys that provide data for these populations.

The sampling error of an estimate is the error caused by the selection of a sample instead of conducting a census of the population. Sampling error is reduced by selecting a large sample and by using efficient sample design and estimation strategies, such as stratification, optimal allocation, and ratio estimation.

With the use of probability sampling methods in the NHSDA, it is possible to develop estimates of sampling error from the survey data. These estimates have been calculated for all estimates presented in this report using a Taylor series linearization approach that takes into account the effects of the complex NHSDA design features. The sampling errors are used to identify unreliable estimates and to test for the statistical significance of differences between estimates.

Estimates of proportions, such as drug use prevalence rates, take the form of nonlinear statistics where the variances cannot be expressed in closed form. Variance estimation for nonlinear statistics is performed using a first-order Taylor series approximation in the SUrvey DAta ANalysis (SUDAAN) statistical software package developed by RTI (Shah, Barnwell, & Bieler, 1996). The approximation is unbiased for sufficiently large samples and has proven to be at least as accurate as, and less costly to implement than, its competitors, such as balanced repeated replication or jackknife methods (Rao & Wu, 1985).
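To make the linearization idea concrete, the following is a minimal Python sketch, not SUDAAN's implementation. It assumes a stratified design in which primary sampling units (PSUs) are treated as sampled with replacement within strata; the record layout `(stratum, psu, weight, y)` and the function name are hypothetical, and the demonstration data are made up.

```python
from collections import defaultdict
from math import sqrt

def linearized_se_of_proportion(records):
    """Taylor-linearized SE of a weighted proportion.

    records: iterable of (stratum, psu, weight, y) with y in {0, 1}.
    Assumes PSUs are sampled with replacement within strata
    (an illustrative simplification, not the NHSDA design itself).
    """
    records = list(records)
    n_hat = sum(w for _, _, w, _ in records)       # estimated population size
    y_hat = sum(w * y for _, _, w, y in records)   # estimated number with y = 1
    p_hat = y_hat / n_hat

    # Linearized variate for the ratio y_hat / n_hat, summed to the PSU level.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for h, i, w, y in records:
        psu_totals[h][i] += w * (y - p_hat) / n_hat

    # Between-PSU variance of the linearized totals, accumulated over strata.
    variance = 0.0
    for h, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least two PSUs")
        mean_h = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in psus.values())
    return p_hat, sqrt(variance)

# Made-up data: two strata, two PSUs each, unequal weights.
demo = [
    ("A", 1, 150.0, 1), ("A", 1, 150.0, 0), ("A", 2, 200.0, 0),
    ("B", 1, 100.0, 1), ("B", 2, 120.0, 0), ("B", 2, 120.0, 1),
]
p_hat, se = linearized_se_of_proportion(demo)
print(f"p_hat = {p_hat:.3f}, SE = {se:.3f}")
```

With equal weights, one stratum, and one respondent per PSU, this estimator reduces to the familiar simple-random-sample form p(1 - p)/(n - 1), which is a convenient sanity check.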

Corresponding to proportion estimates, $\hat{p}_d$, the number of drug users, $\hat{Y}_d$, can be estimated as

$$\hat{Y}_d = \hat{N}_d \, \hat{p}_d ,$$

where $\hat{N}_d$ is the estimated population total for domain d, and $\hat{p}_d$ is the estimated proportion for domain d. The standard error (SE) for the total estimate is obtained by multiplying the SE of the proportion by $\hat{N}_d$, that is,

$$SE(\hat{Y}_d) = \hat{N}_d \, SE(\hat{p}_d) .$$

This approach is theoretically correct when the domain size estimates, $\hat{N}_d$, are among those forced to Census Bureau population projections through the weight calibration process. In these cases, $\hat{N}_d$ is clearly not subject to sampling error.

For domain totals $\hat{Y}_d$ where $\hat{N}_d$ is not fixed, this formulation may still provide a good approximation if we can reasonably assume that the sampling variation in $\hat{N}_d$ is negligible relative to the sampling variation in $\hat{p}_d$. In most analyses conducted for prior years, this has been a reasonable assumption. SUDAAN also provides an option to directly estimate the variance of the linear statistic that estimates a population total. Using this option did not affect the standard error (SE) estimates for the corresponding proportions presented in the same sets of tables.
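As a small illustration of the scaling described above, the sketch below converts a proportion and its SE into an estimated number of users and its SE, treating the domain size as a fixed constant. The function name and all input values are hypothetical and are not taken from the survey.

```python
def total_and_se(n_hat_d, p_hat_d, se_p_hat_d):
    """Return (Y_hat_d, SE(Y_hat_d)), treating N_hat_d as free of sampling error."""
    y_hat_d = n_hat_d * p_hat_d        # estimated number of users in domain d
    se_y_hat_d = n_hat_d * se_p_hat_d  # SE scales by the same fixed constant
    return y_hat_d, se_y_hat_d

# Made-up inputs: a domain of 20 million people, 6.0 percent prevalence,
# and an SE of 0.4 percentage points.
y_hat, se_y = total_and_se(20_000_000, 0.060, 0.004)
print(f"{y_hat:,.0f} users, SE {se_y:,.0f}")
```

Because both quantities are multiplied by the same constant, the relative standard error of the total equals that of the proportion under this assumption.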

As has been done in past NHSDA reports, direct survey estimates considered to be unreliable due to unacceptably large sampling errors are not shown in this report and are noted by asterisks (*) in the tables containing such estimates found in the appendices. The criterion used for suppressing all direct survey estimates was based on the relative standard error (RSE), which is defined as the ratio of the SE to the estimate.

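Proportion estimates (p) within the range 0 < p < 1, rates, and corresponding estimated numbers of users were suppressed if

$$RSE[-\ln(p)] > 0.175 \ \text{when } p \le 0.5, \quad \text{or} \quad RSE[-\ln(1 - p)] > 0.175 \ \text{when } p > 0.5 .$$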

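Using a first-order Taylor series approximation to estimate RSE[-ln(p)] and RSE[-ln(1 - p)], we have the following, which was used for computational purposes:

$$\frac{SE(p)/p}{-\ln(p)} > 0.175 \ \text{when } p \le 0.5, \quad \text{or} \quad \frac{SE(p)/(1 - p)}{-\ln(1 - p)} > 0.175 \ \text{when } p > 0.5 .$$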

The separate formulas for p < 0.5 and p > 0.5 produce a symmetric suppression rule; that is, if p is suppressed, then 1 - p is suppressed as well. This is an ad hoc rule that requires an effective sample size in excess of 50. When 0.05 < p < 0.95, the symmetric properties of the rule produce a local maximum effective sample size of 68 at p = 0.5. Thus, estimates with these values of p along with effective sample sizes falling below 68 are suppressed. A local minimum effective sample size of 50 occurs at p = 0.2 and again at p = 0.8 within this same interval, so estimates are suppressed for values of p with effective sample sizes below 50.

In NHSDAs prior to the 2000 NHSDA, these varying sample size restrictions sometimes produced unusual occurrences of suppression for a particular combination of prevalence rates. For example, in some cases, lifetime prevalence rates near p = 0.5 were suppressed (effective sample size was <68 but >50), while not suppressing the corresponding past year or past month estimates near p = 0.2 (effective sample sizes were >50). To reduce the occurrence of this type of inconsistency, a minimum effective sample size of 68 was added to the suppression criteria. As p approached 0.00 or 1.00 outside the interval (0.05, 0.95), the suppression criteria still required increasingly larger effective sample sizes. For example, if p = 0.01 and 0.001, the effective sample size must exceed 152 and 684, respectively.
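The effective sample sizes quoted above can be checked numerically. The sketch below assumes that SE(p)^2 = p(1 - p)/n_eff, where n_eff is the effective sample size, and that the log-based RSE is approximated to first order as (SE(p)/p)/(-ln p). The cutoff of 0.175 is treated as an assumption here; it is the value that reproduces the figures of 68, 50, 152, and 684 in the text. The function name is hypothetical.

```python
from math import log

RSE_CUTOFF = 0.175  # assumed cutoff; reproduces the sample sizes quoted in the text

def min_effective_n(p, cutoff=RSE_CUTOFF):
    """Effective sample size at which the log-based RSE equals the cutoff.

    Uses SE(p)^2 = p(1 - p) / n_eff and the first-order approximation
    RSE[-ln(p)] ~ (SE(p) / p) / (-ln(p)); values of p above 0.5 use 1 - p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = min(p, 1.0 - p)  # symmetry: p and 1 - p share a threshold
    return (1.0 - q) / (q * (cutoff * log(q)) ** 2)

for p in (0.5, 0.2, 0.8, 0.01, 0.001):
    print(p, round(min_effective_n(p)))
```

Estimates whose effective sample size falls below the returned value would fail the RSE-based part of the rule; the added floor of 68 then applies on top of it.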

Also new to the 2000 survey were minimum nominal sample size suppression criteria (n = 100) that protect against unreliable estimates caused by small design effects and small nominal sample sizes. Prevalence estimates were also suppressed if they were close to 0 or 100 percent (i.e., if p < 0.00005 or if p > 0.99995). Estimates of other totals (e.g., number of initiates) along with means and rates (both not bounded between 0 and 1) were suppressed if RSE(p) > 0.5. Additionally, estimates of the mean age at first use were suppressed if the sample size was smaller than 10 respondents; moreover, the estimated incidence rate and number of initiates were suppressed if they rounded to 0. The suppression criteria for various NHSDA estimates are summarized in Table B.1.



Table B.1. Summary of 2000 NHSDA Suppression Rules
Estimate	Suppress if:
Prevalence rate, p, with nominal sample size, n, and design effect, deff	The estimated prevalence rate, p, is < 0.00005 or > 0.99995, or $\frac{SE(p)/p}{-\ln(p)} > 0.175$ when p ≤ 0.5, or $\frac{SE(p)/(1 - p)}{-\ln(1 - p)} > 0.175$ when p > 0.5, or effective n < 68, or n < 100, where effective n = n/deff. Note: The rounding portion of this suppression rule for prevalence rates will produce some estimates that round at one decimal place to 0.0% or 100.0% but are not suppressed from the tables.
Estimated number (numerator of p)	The estimated prevalence rate, p, is suppressed. Note: In some instances when p is not suppressed, the estimated number may appear as a 0 in the tables; this means that the estimate is > 0 but < 500 (estimated numbers are shown in thousands).
Mean age at first use, $\bar{x}$, with nominal sample size, n	$RSE(\bar{x}) > 0.5$, or n < 10
Incidence rate, $\hat{r}$	Rounds to < 0.1 per 1,000 person-years of exposure, or $RSE(\hat{r}) > 0.5$
Number of initiates, $\hat{t}$	Rounds to < 1,000 initiates, or $RSE(\hat{t}) > 0.5$
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.
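The prevalence-rate rule summarized in Table B.1 can be sketched as a single function. This is an illustration only, not the production code used for the report. The 0.175 cutoff and the definition of effective sample size as n/deff are assumptions of the sketch, and the function name and example inputs are hypothetical.

```python
from math import log, sqrt

def suppress_prevalence(p, se, n, deff, cutoff=0.175):
    """Return True if a prevalence estimate would be suppressed.

    p: estimated proportion; se: its standard error;
    n: nominal sample size; deff: design effect.
    Assumes effective n = n / deff and an RSE cutoff of 0.175.
    """
    if p < 0.00005 or p > 0.99995:    # too close to 0 or 100 percent
        return True
    if n < 100 or n / deff < 68:      # nominal and effective sample size floors
        return True
    q = p if p <= 0.5 else 1.0 - p    # use 1 - p in the upper half
    rse_log = (se / q) / (-log(q))    # first-order RSE of -ln(q)
    return rse_log > cutoff

# Made-up examples with SE derived from an effective sample size of n / deff.
for p, n, deff in ((0.2, 400, 2.0), (0.5, 120, 2.0), (0.001, 1000, 2.0)):
    se = sqrt(p * (1 - p) / (n / deff))
    print(p, n, deff, suppress_prevalence(p, se, n, deff))
```

In these examples the first estimate passes every check, the second fails the effective sample size floor, and the third fails the RSE criterion because very small proportions demand larger effective samples.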

Nonsampling errors can occur from nonresponse, coding errors, computer processing errors, errors in the sampling frame, reporting errors, and other errors not due to sampling. Nonsampling errors are reduced through data editing, statistical adjustments for nonresponse, close monitoring and periodic retraining of interviewers, and improvement in various quality control procedures.

Although nonsampling errors can often be much larger than sampling errors, measurement of most nonsampling errors is difficult or impossible. However, some indication of the effects of some types of nonsampling errors can be obtained from proxy measures, such as response rates, and from other research studies.

Response rates for the NHSDA were stable for the period from 1994 to 1998, with the screening response rate at about 93 percent and the interview response rate at about 78 percent (response rates discussed in this appendix are weighted). In 1999, the computer-assisted interviewing (CAI) screening response rate was 89.6 percent and the interview response rate was 68.6 percent. A more stable and experienced field interviewer (FI) workforce improved these rates in 2000. Of the 182,576 eligible households sampled for the 2000 NHSDA main study, 169,769 were successfully screened, for a weighted screening response rate of 92.8 percent (Table B.2). In these screened households, a total of 91,961 sample persons were selected, and completed interviews were obtained from 71,764 of these sample persons, for a weighted interview response rate of 73.9 percent (Table B.3). A total of 10,109 (15.0 percent) sample persons were classified as refusals, 4,834 (5.5 percent) were not available or never at home, and 5,254 (5.5 percent) did not participate for various other reasons, such as physical or mental incompetence or language barrier. Tables B.4 and B.5 show the distribution of the selected sample by interview code and age group. The weighted interview response rate was highest among 12 to 17 year olds (82.6 percent), females (75.1 percent), blacks and Hispanics (76.2 and 78.0 percent, respectively), in nonmetropolitan areas (77.6 percent), and among persons residing in the South (76.4 percent) (Table B.6).



Table B.3. Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Final Interview Code, among Persons Aged 12 or Older
Final Interview Code	1999 Sample Size	1999 Weighted Percentage	2000 Sample Size	2000 Weighted Percentage
Total Selected Persons	89,883	100.00	91,961	100.00
Interview complete	66,706	68.55	71,764	73.93
No one at dwelling unit	1,795	2.13	1,776	2.02
Respondent unavailable	3,897	4.53	3,058	3.52
Breakoff	50	0.07	72	0.09
Physically/mentally incompetent	1,017	2.62	1,053	2.57
Language barrier—Spanish	168	0.12	109	0.08
Language barrier—Other	480	1.46	441	1.06
Refusal	11,276	17.98	10,109	14.99
Parental refusal	2,888	1.01	2,655	0.88
Other	1,606	1.53	924	0.86
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.
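The nonresponse counts quoted in the text (4,834 not available or never at home, and 5,254 other nonparticipants) can be recovered from the 2000 column of Table B.3. The short sketch below does that arithmetic; the grouping of interview codes into these two categories is inferred from the totals and is not stated explicitly in the source.

```python
# 2000 NHSDA sample sizes by final interview code, copied from Table B.3.
counts_2000 = {
    "Interview complete": 71_764,
    "No one at dwelling unit": 1_776,
    "Respondent unavailable": 3_058,
    "Breakoff": 72,
    "Physically/mentally incompetent": 1_053,
    "Language barrier - Spanish": 109,
    "Language barrier - Other": 441,
    "Refusal": 10_109,
    "Parental refusal": 2_655,
    "Other": 924,
}

total_selected = sum(counts_2000.values())
# Inferred grouping: nobody at the dwelling unit plus respondent unavailable.
not_available = (counts_2000["No one at dwelling unit"]
                 + counts_2000["Respondent unavailable"])
# Everything that is not a completed interview, a refusal, or "not available".
other_reasons = (total_selected - counts_2000["Interview complete"]
                 - counts_2000["Refusal"] - not_available)
unweighted_rate = 100 * counts_2000["Interview complete"] / total_selected

print(total_selected, not_available, other_reasons, round(unweighted_rate, 1))
```

The unweighted completion rate computed this way (about 78 percent) is higher than the weighted rate of 73.93 percent, which illustrates why the appendix reports weighted rates: the two can differ noticeably when sampling rates and response propensities vary across groups.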



Table B.4. Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Final Interview Code, among Youths Aged 12 to 17
Final Interview Code	1999 Sample Size	1999 Weighted Percentage	2000 Sample Size	2000 Weighted Percentage
Total Selected Persons	32,011	100.00	31,242	100.00
Interview complete	25,384	78.07	25,756	82.58
No one at dwelling unit	322	1.09	278	0.86
Respondent unavailable	872	3.04	617	2.05
Breakoff	13	0.03	18	0.05
Physically/mentally incompetent	244	0.76	234	0.76
Language barrier—Spanish	15	0.03	10	0.03
Language barrier—Other	58	0.18	50	0.20
Refusal	1,808	5.97	1,455	4.52
Parental refusal	2,885	9.50	2,641	8.35
Other	410	1.33	183	0.59
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.



Table B.5. Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Final Interview Code, among Persons Aged 18 or Older
Final Interview Code	1999 Sample Size	1999 Weighted Percentage	2000 Sample Size	2000 Weighted Percentage
Total Selected Persons	57,872	100.00	60,719	100.00
Interview complete	41,322	67.41	46,008	72.92
No one at dwelling unit	1,473	2.25	1,498	2.16
Respondent unavailable	3,025	4.71	2,441	3.69
Breakoff	37	0.07	54	0.09
Physically/mentally incompetent	773	2.85	819	2.78
Language barrier—Spanish	153	0.13	99	0.09
Language barrier—Other	422	1.62	391	1.16
Refusal	9,468	19.41	8,654	16.22
Parental refusal	3	0.00	14	0.01
Other	1,196	1.55	741	0.89
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

Table B.6. Response Rates and Sample Sizes for the 1999 and 2000 NHSDAs, by Demographic Characteristics
Demographic Characteristic	1999 Selected Persons	1999 Completed Interviews	1999 Weighted Response Rate	2000 Selected Persons	2000 Completed Interviews	2000 Weighted Response Rate
Total	89,883	66,706	68.55%	91,961	71,764	73.93%
Age in Years
12-17	32,011	25,384	78.07%	31,242	25,756	82.58%
18-25	30,439	22,151	71.21%	29,424	22,849	77.34%
26 or older	27,433	19,171	66.76%	31,295	23,159	72.17%
Gender
Male	43,883	31,987	67.12%	44,899	34,375	72.68%
Female	46,000	34,719	69.81%	47,062	37,389	75.09%
Race/Ethnicity
Hispanic	11,203	8,755	74.59%	11,454	9,396	77.95%
Non-Hispanic, white	63,211	46,272	67.98%	64,517	49,631	73.39%
Non-Hispanic, black	10,552	8,044	70.39%	10,740	8,638	76.19%
Non-Hispanic, all other races	4,917	3,635	59.28%	5,250	4,099	67.31%
Region
Northeast	16,794	11,830	64.03%	18,959	14,394	71.68%
Midwest	24,885	18,103	69.63%	25,428	19,355	73.23%
South	27,390	21,018	70.93%	27,217	22,041	76.38%
West	20,814	15,755	67.47%	20,357	15,974	72.68%
County Type
Large metropolitan	36,101	25,901	65.15%	37,754	28,744	71.77%
Small metropolitan	30,642	22,612	69.98%	31,400	24,579	74.96%
Nonmetropolitan	23,140	18,193	74.97%	22,807	18,441	77.58%
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

The increase in nonresponse between the 1998 and 1999 NHSDAs can be attributed primarily to the hiring of many new and inexperienced FIs in 1999 and a larger than usual turnover. By the end of 2000, the interviewer workforce primarily consisted of experienced interviewers, with fewer FIs leaving for other jobs. In 1999, there were 1,997 FIs hired and trained to conduct the CAI and paper-and-pencil interviewing (PAPI) surveys. More than a third of them did not complete the survey year (37.7 percent). In 2000, the number of trained interviewers decreased to 1,356 (because only CAI interviews were conducted in 2000), and the attrition rate dropped to 29.8 percent. Both prior NHSDA experience and on-the-job experience were shown to be related to nonresponse. Previously experienced interviewers and interviewers with one, two, or three quarters of on-the-job experience were more successful at obtaining an interview.

The overall weighted response rate, defined as the product of the weighted screening response rate and weighted interview response rate, was 61.5 percent in 1999 and 68.6 percent in 2000 (a relative improvement of 11.5 percent over the 1999 rate). Nonresponse bias can be expressed as the product of the nonresponse rate (1 - R) and the difference in the characteristic of interest between respondents and nonrespondents in the population (P_r - P_nr). Thus, assuming the quantity (P_r - P_nr) is fixed over time, the improvement in response rates in 2000 should result in estimates with lower nonresponse bias.
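The arithmetic behind these figures can be sketched in a few lines. The code below uses the standard expression for the bias of a respondent-based estimate, the nonresponse rate (1 - R) multiplied by the respondent-nonrespondent difference. The function names are hypothetical, and the prevalence values for respondents and nonrespondents are made up purely for illustration.

```python
def overall_response_rate(screening_rate, interview_rate):
    """Overall rate as the product of screening and interview response rates."""
    return screening_rate * interview_rate

def nonresponse_bias(response_rate, p_respondents, p_nonrespondents):
    """Bias of a respondent-only estimate: (1 - R) * (P_r - P_nr)."""
    return (1.0 - response_rate) * (p_respondents - p_nonrespondents)

# Weighted screening and interview response rates reported in this appendix.
rate_1999 = round(100 * overall_response_rate(0.896, 0.686), 1)
rate_2000 = round(100 * overall_response_rate(0.928, 0.739), 1)
relative_gain = round(100 * (rate_2000 - rate_1999) / rate_1999, 1)
print(rate_1999, rate_2000, relative_gain)

# Hypothetical prevalences: 6 percent among respondents, 9 percent among
# nonrespondents, held fixed across both years.
for year, rate in ((1999, rate_1999), (2000, rate_2000)):
    print(year, nonresponse_bias(rate / 100, 0.06, 0.09))
```

Holding the respondent-nonrespondent difference fixed, the higher 2000 response rate gives a bias that is smaller in absolute value, which is the point made in the paragraph above.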

Among survey participants, item response rates were above 98 percent for most questionnaire items. However, inconsistent responses for some items, including the drug use items, were common. Estimates of substance use from the NHSDA are based on the responses to multiple questions by respondents, so that the maximum amount of information is used in determining whether a respondent is classified as a drug user. Inconsistencies in responses are resolved through a logical editing process that involves some judgment on the part of survey analysts and is a potential source of nonsampling error. Because of the automatic routing through the CAI questionnaire (e.g., lifetime drug use questions that skip entire modules when answered "no"), there is less editing of this type than in the PAPI questionnaire used in previous years.

In addition, less logical editing is used with the CAI data because statistical imputation is relied upon more heavily to determine the final values of drug use variables, even in cases where logical editing could be used to make a determination. The combined amount of editing and imputation in the CAI data is still considerably less than the total amount used in prior PAPI surveys. For the 2000 CAI data, for example, 3.2 percent of the estimate of past month hallucinogen use is based on logically edited cases and 5.4 percent on imputed cases, for a combined amount of 8.6 percent. For the 1999 CAI data, 1.7 percent of the estimate of past month hallucinogen use is based on logically edited cases and 4.6 percent on imputed cases, for a combined amount of 6.2 percent. In the 1998 NHSDA (administered using PAPI), the amount of editing and imputation for past month hallucinogen use was 60 and 0 percent, respectively, for a total of 60 percent. The combined amount of editing and imputation for the estimate of past month heroin use is 5.0 percent for the 2000 CAI, 14.8 percent for the 1999 CAI, and 37.0 percent for the 1998 PAPI data.

NHSDA estimates are based on self-reports of drug use, and their value depends on respondents' truthfulness and memory. Although many studies have generally established the validity of self-report data and the NHSDA procedures were designed to encourage honesty and recall, some degree of underreporting is assumed (Harrell, 1997; Harrison & Hughes, 1997; Rouse, Kozel, & Richards, 1985). No adjustment to NHSDA data is made to correct for this. The methodology used in the NHSDA has been shown to produce more valid results than other self-report methods (e.g., by telephone) (Aquilino, 1994; Turner, Lessler, & Gfroerer, 1992). However, comparisons of NHSDA data with data from surveys conducted in classrooms suggest that underreporting of drug use by youths in their homes may be substantial (Gfroerer, 1993; Gfroerer, Wright, & Kopstein, 1997). The results of several studies indicate that underreporting of drug use increases as the social stigma associated with the drug increases (Cisin & Parry, 1980, as cited in Harrell, 1997). Because drug dependence and abuse are generally associated with a higher level of social stigma than drug use, estimates of dependence and abuse may be underreported more than the estimates of drug use. In the same study cited by Harrell (1997), failure to report treatment was not associated with the level of stigma attached to the drug at the time of admission. However, when responses on treatment received from former drug treatment clients were compared with their treatment records, almost half denied ever receiving treatment.

A study to validate the dependence and abuse questions currently in the NHSDA has been designed. This study will compare estimates of dependence and abuse based on the current NHSDA questions, administered using audio computer-assisted self-interviewing (ACASI), with measures from a structured clinical interview. Interviewing for the study will begin in 2002.




Table B.2. Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Screening Result Code
Screening Result	1999 Sample Size	1999 Weighted Percentage	2000 Sample Size	2000 Weighted Percentage
Total Sample	223,868	100.00	215,860	100.00
Ineligible cases	36,026	15.78	33,284	15.09
Eligible cases	187,842	84.22	182,576	84.91
Ineligibles	36,026	100.00	33,284	100.00
Vacant	18,034	49.71	16,796	50.76
Not a primary residence	4,516	12.90	4,506	13.26
Not a dwelling unit	4,626	12.70	3,173	9.33
All military personnel	482	1.22	414	1.21
Other, ineligible	8,368	23.46	8,395	25.43
Eligible Cases	187,842	100.00	182,576	100.00
Screening complete	169,166	89.63	169,769	92.84
No one selected	101,537	54.19	99,999	55.36
One selected	44,436	23.63	46,981	25.46
Two selected	23,193	11.82	22,789	12.03
Screening not complete	18,676	10.37	12,807	7.16
No one home	4,291	2.38	3,238	1.82
Respondent unavailable	651	0.36	415	0.24
Physically or mentally incompetent	419	0.24	310	0.16
Language barrier—Hispanic	102	0.06	83	0.05
Language barrier—other	486	0.28	434	0.27
Refusal	11,097	5.92	7,535	4.14
Other, access denied	1,536	1.08	748	0.45
Other, eligible	38	0.02	7	0.00
Other, problem case	56	0.03	37	0.02
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.