Initiation of Marijuana Use: Trends, Patterns, and Implications

Appendix A: Statistical Methods and Limitations of the Data

A.1 Target Population

An important limitation of the National Household Survey on Drug Abuse (NHSDA) estimates of drug use prevalence is that they are designed to describe only the target population of the survey: the civilian, noninstitutionalized population aged 12 or older. Although this population includes almost 98 percent of the total U.S. population aged 12 or older, it excludes some important and unique subpopulations who may have very different drug-using patterns. For example, the survey excludes active military personnel, who have been shown to have significantly lower rates of illicit drug use. Persons living in institutional group quarters, such as prisons and residential drug treatment centers, are not included in the NHSDA and have been shown in other surveys to have higher rates of illicit drug use. Also excluded are homeless persons not living in a shelter on the survey date, another population shown to have higher than average rates of illicit drug use.

A.2 Sampling Error and Statistical Significance

The sampling error of an estimate is the error caused by the selection of a sample instead of conducting a census of the population. Sampling error is reduced by selecting a large sample and by using efficient sample design and estimation strategies, such as stratification, optimal allocation, and ratio estimation.

With the use of probability sampling methods in the NHSDA, it is possible to develop estimates of sampling error from the survey data. These estimates have been calculated for all estimates presented in this report using a Taylor series linearization approach that takes into account the effects of the complex NHSDA design features. The sampling errors are used to identify unreliable estimates and to test for the statistical significance of differences between estimates.

A.2.1 Variance Estimation for Totals

Estimates of proportions, such as drug use prevalence rates, take the form of nonlinear statistics where the variances cannot be expressed in closed form. Variance estimation for nonlinear statistics is performed using a first-order Taylor series approximation in the SUrvey DAta ANalysis (SUDAAN) statistical software package developed by RTI (Shah et al., 1996). The approximation is unbiased for sufficiently large samples and has proven to be at least as accurate as, and less costly to implement than, its competitors, such as balanced repeated replication or jackknife methods (Rao & Wu, 1985).
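The linearization idea can be illustrated with a deliberately simplified sketch. It treats a weighted proportion as a ratio of two weighted totals and applies the first-order Taylor approximation, but it assumes independent, with-replacement sampling and ignores the strata and clusters that SUDAAN takes into account for the NHSDA design. The function name and inputs are hypothetical.

```python
def linearized_variance_of_proportion(y, w):
    """Return (p_hat, var_hat) for a weighted proportion.

    y: list of 0/1 indicators (e.g., 1 = reported use)
    w: list of analysis weights, same length as y

    Simplifying assumption: units are sampled independently with
    replacement; stratification and clustering are ignored.
    """
    n = len(y)
    total_w = sum(w)
    p_hat = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # First-order Taylor (linearized) values for the ratio estimator.
    z = [wi * (yi - p_hat) / total_w for wi, yi in zip(w, y)]

    # The z values sum to zero, so the variance of their total is
    # estimated from their sum of squares.
    var_hat = n / (n - 1) * sum(zi * zi for zi in z)
    return p_hat, var_hat


p_hat, var_hat = linearized_variance_of_proportion([1, 0, 0, 0], [1, 1, 1, 1])
```

With equal weights the result reduces to the familiar p(1 - p)/(n - 1), which is a quick way to check the sketch.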

Corresponding to proportion estimates, $\hat{p}_d$, the number of drug users, $Y_d$, can be estimated as

$$\hat{Y}_d = \hat{N}_d \, \hat{p}_d ,$$

where $\hat{N}_d$ is the estimated population total for domain d, and $\hat{p}_d$ is the estimated proportion for domain d. The standard error (SE) for the total estimate is obtained by multiplying the SE of the proportion by $\hat{N}_d$, that is,

$$SE(\hat{Y}_d) = \hat{N}_d \, SE(\hat{p}_d) .$$

This approach is theoretically correct when the domain size estimates are among those forced to Census Bureau population projections through the weight calibration process. In these cases, $\hat{N}_d$ is clearly not subject to sampling error.

For domain totals $Y_d$ where $\hat{N}_d$ is not fixed, this formulation may still provide a good approximation if we can reasonably assume that the sampling variation in $\hat{N}_d$ is negligible relative to the sampling variation in $\hat{p}_d$. In most analyses conducted for prior years, this has been a reasonable assumption. SUDAAN also provides an option to directly estimate the variance of the linear statistic that estimates a population total. Using this option did not affect the SE estimates for the corresponding proportions presented in the same sets of tables.
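As a small worked illustration of the scaling described above, the following sketch multiplies a proportion and its SE by a domain population total that is treated as fixed. The function name and the numbers are hypothetical and are not taken from the report's tables.

```python
def total_and_se(domain_size, proportion, se_proportion):
    """Scale a domain proportion and its SE to a domain total.

    Assumes the domain size is fixed (not subject to sampling error),
    as when it is calibrated to population projections.
    """
    total = domain_size * proportion
    se_total = domain_size * se_proportion
    return total, se_total


# Hypothetical domain: 23 million persons, 8 percent prevalence, SE of 0.4 points.
total, se_total = total_and_se(23_000_000, 0.08, 0.004)
```

Because both quantities are multiplied by the same constant, the relative standard error of the total equals that of the proportion.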

A.2.2 Suppression Criteria for Unreliable Estimates

As has been done in past NHSDA reports, direct survey estimates considered to be unreliable due to unacceptably large sampling errors are not shown in this report and are noted by asterisks (*) in the tables containing such estimates found in the appendices. The criterion used for suppressing all direct survey estimates was based on the relative standard error (RSE), which is defined as the ratio of the standard error (SE) over the estimate.

Proportion estimates (p) within the range [0<p<1], rates, and corresponding estimated number of users were suppressed if

RSE[-ln(p)] > 0.175 when p < 0.5

or

RSE[-ln(1 - p)] > 0.175 when p ≥ 0.5.

Using a first-order Taylor series approximation to estimate RSE[-ln(p)] and RSE[-ln(1 - p)], we have the following, which was used for computational purposes:

[SE(p)/p] ÷ [-ln(p)] > 0.175 when p < 0.5

or

[SE(p)/(1 - p)] ÷ [-ln(1 - p)] > 0.175 when p ≥ 0.5.

The separate formulas for p < 0.5 and p ≥ 0.5 produce a symmetric suppression rule; that is, if p is suppressed, then 1 - p is suppressed as well. This is an ad hoc rule that requires an effective sample size in excess of 50. When 0.05 < p < 0.95, the symmetric properties of the rule produce a local maximum effective sample size of 68 at p = 0.5. Thus, estimates with these values of p along with effective sample sizes falling below 68 are suppressed. A local minimum effective sample size of 50 occurs at p = 0.2 and again at p = 0.8 within this same interval; so, estimates are suppressed for values of p with effective sample sizes below 50.

In NHSDAs prior to the 2000 NHSDA, these varying sample size restrictions sometimes produced unusual occurrences of suppression for a particular combination of prevalence rates. For example, in some cases, lifetime prevalence rates near p = 0.5 were suppressed (effective sample size was <68 but >50), while not suppressing the corresponding past year or past month estimates near p = 0.2 (effective sample sizes were >50). To reduce the occurrence of this type of inconsistency, a minimum effective sample size of 68 was added to the suppression criteria. As p approached 0.00 or 1.00 outside the interval (0.05, 0.95), the suppression criteria still required increasingly larger effective sample sizes. For example, if p = 0.01 and 0.001, the effective sample size must exceed 152 and 684, respectively.
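The effective sample sizes quoted above can be reproduced with a short sketch. It assumes that the effective sample size n satisfies SE(p)² = p(1 - p)/n, substitutes this into the computational form of the rule, and solves for the n at which the log-transformed RSE equals 0.175. The function name is hypothetical.

```python
import math


def min_effective_n(p, cutoff=0.175):
    """Effective sample size at which the suppression rule is exactly met.

    Assumes SE(p)^2 = p * (1 - p) / n_eff. The rule is symmetric in
    p and 1 - p, so the smaller of the two is used.
    """
    q = min(p, 1.0 - p)
    return (1.0 - q) / (q * (cutoff * math.log(q)) ** 2)


# Thresholds at the values of p discussed in the text.
thresholds = {p: round(min_effective_n(p)) for p in (0.5, 0.2, 0.8, 0.01, 0.001)}
# thresholds == {0.5: 68, 0.2: 50, 0.8: 50, 0.01: 152, 0.001: 684}
```

Estimates whose effective sample size falls below the returned value would fail the RSE criterion for that p.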

Also new to the 2000 survey were minimum nominal sample size suppression criteria (n = 100) that protect against unreliable estimates caused by small design effects and small nominal sample sizes. Prevalence estimates were also suppressed if they were close to 0 or 100 percent (i.e., if p < 0.00005 or if p > 0.99995). Estimates of other totals (e.g., number of initiates) along with means and rates (both not bounded between 0 and 1) were suppressed if RSE(p) > 0.5. Additionally, estimates of the mean age at first use were suppressed if the sample size was smaller than 10 respondents; moreover, the estimated incidence rate and number of initiates were suppressed if they rounded to 0. The suppression criteria for various NHSDA estimates are summarized in Table A.1.
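The criteria for proportion estimates described in this section can be combined into a single check, sketched below. This is an illustration under stated assumptions rather than the survey's production code: the function and argument names are hypothetical, and the effective sample size is taken to be p(1 - p)/SE(p)², which the text does not spell out.

```python
import math


def suppress_proportion(p, se, n_nominal,
                        cutoff=0.175, min_n_eff=68, min_n_nominal=100):
    """Return True if a proportion estimate should be suppressed."""
    if se <= 0:
        raise ValueError("se must be positive")

    # Estimates that are effectively 0 or 100 percent.
    if p < 0.00005 or p > 0.99995:
        return True

    # Minimum nominal sample size.
    if n_nominal < min_n_nominal:
        return True

    # Minimum effective sample size (assumed definition, see lead-in).
    n_eff = p * (1.0 - p) / se ** 2
    if n_eff < min_n_eff:
        return True

    # Log-transformed relative standard error, computational form.
    if p < 0.5:
        rse_log = (se / p) / -math.log(p)
    else:
        rse_log = (se / (1.0 - p)) / -math.log(1.0 - p)
    return rse_log > cutoff
```

For example, `suppress_proportion(0.5, 0.05, 200)` passes every check, whereas the same estimate with a nominal sample of 50 would be suppressed.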

A.3 Nonsampling Error

Nonsampling errors can occur from nonresponse, coding errors, computer processing errors, errors in the sampling frame, reporting errors, and other errors not due to sampling. Nonsampling errors are reduced through data editing, statistical adjustments for nonresponse, close monitoring and periodic retraining of interviewers, and improvement in various quality control procedures.

Although nonsampling errors can often be much larger than sampling errors, measurement of most nonsampling errors is difficult or impossible. However, some indication of the effects of some types of nonsampling errors can be obtained from proxy measures, such as response rates, and from other research studies.

A.3.1 Screening and Interview Response Rate Patterns

Response rates for the NHSDA were stable for the period from 1994 to 1998, with the screening response rate at about 93 percent and the interview response rate at about 78 percent (response rates discussed in this appendix are weighted). In 1999, the computer-assisted interviewing (CAI) screening response rate was 89.6 percent and the interview response rate was 68.6 percent. A more stable and experienced field interviewer (FI) workforce improved these rates in 2000. Of the 182,576 eligible households sampled for the 2000 NHSDA main study, 169,769 were successfully screened, for a weighted screening response rate of 92.8 percent (Table A.2). In these screened households, a total of 91,961 sample persons were selected, and completed interviews were obtained from 71,764 of these sample persons, for a weighted interview response rate of 73.9 percent (Table A.3). A total of 10,109 (15.0 percent) sample persons were classified as refusals, 4,834 (5.5 percent) were not available or never at home, and 5,254 (5.5 percent) did not participate for various other reasons, such as physical or mental incompetence or language barrier. Tables A.4 and A.5 show the distribution of the selected sample by interview code and age group. The weighted interview response rate was highest among 12 to 17 year olds (82.6 percent), females (75.1 percent), blacks and Hispanics (76.2 and 78.0 percent, respectively), in nonmetropolitan areas (77.6 percent), and among persons residing in the South (76.4 percent) (Table A.6).

The increase in nonresponse between the 1998 and 1999 NHSDAs can be attributed primarily to the hiring of many new and inexperienced FIs in 1999 and a larger than usual turnover. By the end of 2000, the interviewer workforce primarily consisted of experienced interviewers, with fewer FIs leaving for other jobs. In 1999, there were 1,997 FIs hired and trained to conduct the CAI and paper-and-pencil interviewing (PAPI) surveys. More than a third of them did not complete the survey year (37.7 percent). In 2000, the number of trained interviewers decreased to 1,356 (because only CAI interviews were conducted in 2000), and the attrition rate dropped to 29.8 percent. Both prior NHSDA experience and on-the-job experience were shown to be related to nonresponse. Previously experienced interviewers and interviewers with one, two, or three quarters of on-the-job experience were more successful at obtaining an interview.

The overall weighted response rate, defined as the product of the weighted screening response rate and weighted interview response rate, was 61.5 percent in 1999 and 68.6 percent in 2000 (an 11.5 percent improvement over the 1999 rate). Nonresponse bias can be expressed as the product of the nonresponse rate (1 - R) and the difference in the characteristic of interest between respondents and nonrespondents in the population (Pr - Pnr). Thus, assuming the quantity (Pr - Pnr) is fixed over time, the improvement in response rates in 2000 should result in estimates with lower nonresponse bias.
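The arithmetic behind these figures can be checked with a short sketch. The screening and interview rates are those reported in Section A.3.1; the bias function uses the standard form (1 - R)(Pr - Pnr), and the prevalence values passed to it are hypothetical, chosen only to show the direction of the effect.

```python
def overall_response_rate(screening_rate, interview_rate):
    """Overall rate as the product of screening and interview rates."""
    return screening_rate * interview_rate


def nonresponse_bias(response_rate, p_respondents, p_nonrespondents):
    """Bias of the respondent-based estimate: (1 - R) * (Pr - Pnr)."""
    return (1.0 - response_rate) * (p_respondents - p_nonrespondents)


r_1999 = overall_response_rate(0.896, 0.686)  # about 0.615
r_2000 = overall_response_rate(0.928, 0.739)  # about 0.686

# Relative improvement computed from the published (rounded) rates.
improvement = (68.6 - 61.5) / 61.5 * 100      # about 11.5 percent

# Hypothetical prevalences: 10 percent among respondents, 14 percent
# among nonrespondents, held fixed across years.
bias_1999 = nonresponse_bias(r_1999, 0.10, 0.14)
bias_2000 = nonresponse_bias(r_2000, 0.10, 0.14)
```

With the difference between respondents and nonrespondents held fixed, the higher 2000 response rate yields a bias that is smaller in absolute value.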

A.3.2 Inconsistent Responses and Item Nonresponse

Among survey participants, item response rates were above 98 percent for most questionnaire items. However, inconsistent responses for some items, including the drug use items, were common. Estimates of substance use from the NHSDA are based on the responses to multiple questions by respondents, so that the maximum amount of information is used in determining whether a respondent is classified as a drug user. Inconsistencies in responses are resolved through a logical editing process that involves some judgment on the part of survey analysts and is a potential source of nonsampling error. Because of the automatic routing through the CAI questionnaire (e.g., lifetime drug use questions that skip entire modules when answered "no"), there is less editing of this type than in the PAPI questionnaire used in previous years.

In addition, less logical editing is used with the CAI data because statistical imputation is relied upon more heavily to determine the final values of drug use variables in cases where logical editing could otherwise be used to make a determination. The combined amount of editing and imputation in the CAI data is still considerably less than the total amount used in prior PAPI surveys. For the 2000 CAI data, for example, 3.2 percent of the estimate of past month hallucinogen use is based on logically edited cases and 5.4 percent on imputed cases, for a combined amount of 8.6 percent. For the 1999 CAI data, 1.7 percent of the estimate of past month hallucinogen use is based on logically edited cases and 4.6 percent on imputed cases, for a combined amount of 6.2 percent. In the 1998 NHSDA (administered using PAPI), the amount of editing and imputation for past month hallucinogen use was 60 and 0 percent, respectively, for a total of 60 percent. The combined amount of editing and imputation for the estimate of past month heroin use is 5.0 percent for the 2000 CAI, 14.8 percent for the 1999 CAI, and 37.0 percent for the 1998 PAPI data.

A.3.3 Validity of Self-Reported Use

NHSDA estimates are based on self-reports of drug use, and their value depends on respondents' truthfulness and memory. Although many studies have generally established the validity of self-report data and the NHSDA procedures were designed to encourage honesty and recall, some degree of underreporting is assumed (Harrell, 1997; Harrison & Hughes, 1997; Rouse, Kozel, & Richards, 1985). No adjustment to NHSDA data is made to correct for this. The methodology used in the NHSDA has been shown to produce more valid results than other self-report methods (e.g., by telephone) (Aquilino, 1994; Turner, Lessler, & Gfroerer, 1992). However, comparisons of NHSDA data with data from surveys conducted in classrooms suggest that underreporting of drug use by youths in their homes may be substantial (Gfroerer, 1993; Gfroerer et al., 1997).

A.4 Incidence Estimates

For diseases, the incidence rate for a population is defined as the number of new cases of the disease, N, divided by the person time, PT, of exposure, or:

$$IR = \frac{N}{PT} .$$

The person time of exposure can be measured for the full period of the study or for a shorter period. The person time of exposure ends at the time of diagnosis (e.g., Greenberg, Daniels, Flanders, Eley, & Boring, 1996, pp. 16-19). Similar conventions are applied for defining the incidence of first use of a substance.

Beginning in 1999, the NHSDA questionnaire allows for collection of year and month of first use for recent initiates. Month, day, and year of birth are also obtained directly or imputed in the process. In addition, the questionnaire call record provides the date of the interview. By imputing a day of first use within the year and month of first use reported or imputed, the key respondent inputs in terms of exact dates are known. Exposure time can be determined in terms of days and converted to an annual basis.

Having exact dates of birth and first use also allows us to determine person time of exposure during the targeted period, t. Let the target time period for measuring incidence be specified in terms of dates; for example, for the period 1998 we would specify:

$$t = [t_1, t_2) = [\text{1 Jan 1998},\ \text{1 Jan 1999}),$$

a period that includes 1 January 1998 and all days up to but not including 1 January 1999. The target age group can also be defined by a half-open interval as $a = [a_1, a_2)$. For example, the age group 12 to 17 would be defined by $a = [12, 18)$ for persons at least age 12, but not yet age 18. If person i was in age group a during period t, the time and age interval, $L_{t,a,i}$, can then be determined by the intersection:

$$L_{t,a,i} = [t_1, t_2) \cap [\mathrm{DOB}_i\,\mathrm{MOB}_i\,\mathrm{YOB}_i + a_1,\ \mathrm{DOB}_i\,\mathrm{MOB}_i\,\mathrm{YOB}_i + a_2),$$

assuming we can write the time of birth as $\mathrm{DOB}_i\,\mathrm{MOB}_i\,\mathrm{YOB}_i$ in terms of day ($\mathrm{DOB}_i$), month ($\mathrm{MOB}_i$), and year ($\mathrm{YOB}_i$). Either this intersection will be empty ($L_{t,a,i} = \emptyset$) or we will designate it by the half-open interval $L_{t,a,i} = [m_{1,i}, m_{2,i})$ where:

$$m_{1,i} = \max\{t_1,\ \mathrm{DOB}_i\,\mathrm{MOB}_i\,\mathrm{YOB}_i + a_1\}$$

and

$$m_{2,i} = \min\{t_2,\ \mathrm{DOB}_i\,\mathrm{MOB}_i\,\mathrm{YOB}_i + a_2\} .$$

The date of first use, $t_{fu,d,i}$, is also expressed as an exact date. An incident of first drug d use by person i in age group a occurs in time t if $t_{fu,d,i} \in [m_{1,i}, m_{2,i})$. The indicator function $I_i(d, a, t)$ used to count incidents of first use is set to 1 when $t_{fu,d,i} \in [m_{1,i}, m_{2,i})$, and to 0 otherwise. The person time exposure measured in years and denoted by $e_i(d, a, t)$ for a person i of age group a depends on the date of first use. If the date of first use precedes the target period ($t_{fu,d,i} < m_{1,i}$), then $e_i(d, a, t) = 0$. If the date of first use occurs after the target period or if person i has never used drug d, then

$$e_i(d, a, t) = \frac{m_{2,i} - m_{1,i}}{365} .$$

If the date for first use occurs during the target period $L_{t,a,i}$, then

$$e_i(d, a, t) = \frac{t_{fu,d,i} - m_{1,i}}{365} .$$

Note that both $I_i(d, a, t)$ and $e_i(d, a, t)$ are set to zero if the target period $L_{t,a,i}$ is empty; that is, person i is not in age group a during time t. The incidence rate is then estimated as a weighted ratio estimate:

$$IR(d, a, t) = \frac{\sum_i w_i \, I_i(d, a, t)}{\sum_i w_i \, e_i(d, a, t)} ,$$

where the $w_i$ are the analytic weights.
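The exposure and incident definitions in this section can be sketched in code. The example below is an illustration only: the function names and record layout are hypothetical, the conversion to years divides day counts by 365 (an assumption of this sketch), and a 29 February birth date is mapped to 28 February in non-leap years.

```python
from datetime import date


def add_years(d, years):
    """Return the date `years` years after d (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def age_time_interval(birth, a1, a2, t1, t2):
    """Intersection of period [t1, t2) with the time spent at ages [a1, a2)."""
    m1 = max(t1, add_years(birth, a1))
    m2 = min(t2, add_years(birth, a2))
    return (m1, m2) if m1 < m2 else None


def incident_and_exposure(birth, first_use, a1, a2, t1, t2):
    """Return (indicator, exposure in years); first_use is None for never users."""
    interval = age_time_interval(birth, a1, a2, t1, t2)
    if interval is None:
        return 0, 0.0              # not in the age group during the period
    m1, m2 = interval
    if first_use is not None and first_use < m1:
        return 0, 0.0              # initiated before the interval: not at risk
    if first_use is None or first_use >= m2:
        return 0, (m2 - m1).days / 365   # at risk for the whole interval
    return 1, (first_use - m1).days / 365  # initiated during the interval


def incidence_rate(people, a1, a2, t1, t2):
    """Weighted ratio estimate; people is a list of (weight, birth, first_use)."""
    numerator = denominator = 0.0
    for weight, birth, first_use in people:
        indicator, exposure = incident_and_exposure(birth, first_use, a1, a2, t1, t2)
        numerator += weight * indicator
        denominator += weight * exposure
    return numerator / denominator if denominator else float("nan")
```

A respondent born on 1 July 1980, for instance, contributes exposure at ages 12 to 17 only from 1 January 1998 until the 18th birthday on 1 July 1998, or until the date of first use if that comes sooner.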

Prior to the 1999 survey, exact date data were not available for computing incidence rates. For these rates, a person was considered to be of age a during the entire time interval t if his or her a-th birthday occurred during time interval t (generally, a single year). If the person initiated use during the year, the person time exposure was approximated as one-half year for all such persons rather than computing it exactly for each person.

Because of the new methodology, incidence estimates discussed in this report are not strictly comparable with the estimates presented before the 1999 NHSDA. Because they are based on retrospective reports by survey respondents, as was the case for earlier estimates, they may be subject to some of the same kinds of biases.

Bias due to differential mortality occurs because some persons who were alive and exposed to the risk of first drug use in the historical periods shown in the tables died before the 1999 NHSDA was conducted. This bias is probably very small for estimates shown in this report. Incidence estimates are also affected by memory errors, including recall decay (tendency to forget events occurring long ago) and forward telescoping (tendency to report that an event occurred more recently than it actually did). These memory errors would both tend to result in estimates for earlier years (i.e., 1960s and 1970s) that are downwardly biased (because of recall decay) and estimates for later years that are upwardly biased (because of telescoping). There is also likely to be some underreporting bias due to social acceptability of drug use behaviors and respondents' fear of disclosure. This is likely to have the greatest impact on recent estimates, which reflect more recent use and reporting by younger respondents. Finally, for drug use that is frequently initiated at age 10 or younger, estimates based on retrospective reports 1 year later underestimate total incidence because 11-year-old children are not sampled by the NHSDA. Prior analyses showed that alcohol and cigarette (any use) incidence estimates could be significantly affected by this.

Johnson et al. (1998) concluded that the marijuana incidence trend from the NHSDA was biased because the reporting of initiation declines as the length of time between initiation and the survey increases. However, their study did not address very recent estimates (i.e., 1996 to 1998), which could be biased because they reflect recent drug use and because they are heavily based on the reports of adolescents. To better understand the size of the biases and to assess the reliability of estimates for recent years, OAS performed an analysis of estimates based on single years of NHSDA data. This analysis focused on three drugs: cocaine, heroin, and marijuana. Using the survey data from 1994 to 1998, estimates were made of the number of initiates, the rate of initiation for youths aged 12 to 17, and the rate of initiation for persons aged 18 to 25. For the 1994 survey, an estimate was made for the year 1993. For the 1995 survey, another estimate was made for the year 1993. In this way, two recent estimates of the same year could be compared. Similarly, the 1995 and 1996 data provided two estimates for 1994, the 1996 and 1997 surveys provided two estimates for 1995, and the 1997 and 1998 surveys provided two estimates for 1996. Because these calculations represent two measurements of the same population characteristic, they would ideally be the same. Examples of these estimates are shown in Table A.7.

Drug initiation rates for youths aged 12 to 17 for the more hard-core drugs (e.g., cocaine and heroin) appeared to be most prone to bias. For example, on average across the 4 survey years, the estimate for the rate of initiation of cocaine use among youths aged 12 to 17 was 48 percent higher the first time the estimate could be made than the second time. This indicates a probable bias in the estimation; however, it is unclear which estimate is the correct one. As a result, one should be cautious in interpreting any changes between the prior year and the most recent year in the initiation rates for youths of the more stigmatized drugs. Because only 5 years of data were used to estimate how the rate of incidence changed between the first year it could be estimated and the second, one should be cautious about inferring the magnitude of the bias (e.g., that it was 48 percent for cocaine).

In Table A.7, the average ratio of 1-year recall to 2-year recall is calculated across 4 "years." Implicit in the table is the fact that the estimates for each ratio vary around the average. For example, taking the 18 to 25 marijuana incidence numbers, the four individual ratios can be calculated as 1.13, 0.75, 0.89, and 1.06. Although the average ratio was 0.96, the year-to-year variation was much larger, ranging from 0.75 to 1.13. So, it is clear that for any single year, the bias implied by the sample estimates could be negative or positive. Because we are not clear whether the 1-year recall or the 2-year recall estimate is closer to an unbiased true value, the estimate that we use for the most recent year could be as much as 25 percent too high or too low in this example. The samples for 1999 and 2000 based on the new CAI method were significantly larger than those in prior years; therefore, estimates of bias should suffer from less sampling variability and the estimates should be less variable than before. Nevertheless, because there are only 2 years under the new CAI method, and, therefore, only one calculation possible of the ratio of the 1- to 2-year recall, more analysis is needed to see how stable the new estimates from CAI will be.
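The summary figures in this example follow directly from the four ratios quoted in the text, as the short calculation below shows. Variable names are illustrative.

```python
# Ratios of 1-year recall to 2-year recall, marijuana, ages 18 to 25.
ratios = [1.13, 0.75, 0.89, 1.06]

mean_ratio = sum(ratios) / len(ratios)            # rounds to 0.96
lowest, highest = min(ratios), max(ratios)        # 0.75 and 1.13
largest_gap = max(abs(r - 1.0) for r in ratios)   # 0.25, i.e., 25 percent
```

The average sits close to 1, yet a single-year ratio departs from 1 by as much as 0.25, which is the source of the 25 percent figure.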

A.5 Changes in NHSDA Measures of Substance Use Initiation

The redesign of the NHSDA in 1999 introduced some changes in the initiation of use questions and the method of administration. In the presence of these changes, the overall data processing and estimation methodologies were revised. A new incidence rate methodology was developed. This section discusses the impact of methodological change on substance use initiation measures: the change in the incidence rate estimation method and its impact, the impact of the editing and imputing changes, and the questionnaire wording and administration mode effects.

A.5.1 Impact of Imputation and Incidence Rate Calculation Method

Prior to 1999, the only questions about initiation of drug use asked the respondent to report his or her age at first use for specified drugs. Each respondent's year at first use was imputed based on the reported age at first use, the interview date, and the respondent's date of birth. The imputed year was used to develop estimates of annual initiates and to develop the respondent-level numerator and denominator inputs to the incidence rate calculation.

For the redesigned CAI instrument, additional questions about initiation of drug use were included. Recent users (persons who first used at their current age or at their current age minus 1 year) were asked to report not only their age at first use for specified drugs, but also the month and year of that first use. As a result, the exact month of first use for specified drugs was known completely or in part (sometimes month or year were not reported) for 7 to 16 percent of the drug users (depending on the drug) in the 1999 NHSDA sample. The questionnaire also changed due to the routing logic used in the CAI instrument, which helped automatically resolve data inconsistencies between related items. For example, respondents were asked their age at first substance use and were prompted to review their response if the reported age at first use was inconsistent with their reported current age.

These changes led to three methodology changes used in the calculation of the 1999 NHSDA drug incidence rates. First, missing age at first use data were imputed, which resulted in consistent and nonmissing age at first use data for all users. Prior to the 1999 data, respondents with a missing age at first use were simply excluded from the calculation of incidence numbers and rates. Second, the assignment of the date of first use was refined such that the assigned date was now consistent with other reported related information, such as drug use recency and frequency data. Third, the improved data on age at first drug use and date of first drug use allow a more exact person time of exposure during the targeted period to be determined. For example, if a respondent was deemed a drug user and did not answer the age at first use question, the response was statistically imputed to eliminate missing data. An exact date of first use (i.e., the month, day, and year of first use) was then assigned by randomly picking a date within the 365 days corresponding to the respondent's age at first drug use. By using this date of first use in conjunction with the birth date, the computation of the period of exposure can be determined exactly in terms of whole days.
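The random assignment of an exact date of first use can be sketched as follows. This is a minimal illustration, assuming a uniform draw over the days on which the respondent was the reported age; it does not reproduce the additional consistency checks against recency and frequency data described above. Function names are hypothetical.

```python
import random
from datetime import date, timedelta


def add_years(d, years):
    """Return the date `years` years after d (29 Feb maps to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def impute_first_use_date(birth, age_at_first_use, rng):
    """Draw a date uniformly from the year the person was age_at_first_use."""
    start = add_years(birth, age_at_first_use)      # birthday at that age
    end = add_years(birth, age_at_first_use + 1)    # next birthday (excluded)
    offset = rng.randrange((end - start).days)
    return start + timedelta(days=offset)


rng = random.Random(2000)  # fixed seed so the illustration is repeatable
imputed = impute_first_use_date(date(1984, 5, 20), 14, rng)
```

The imputed date always falls on or after the 14th birthday and before the 15th, so the implied age at first use matches the reported one.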

The new combined editing and imputation procedures flag more inconsistencies, impute for both missing and inconsistent reports, and retain the imputed date of first use consistent with reported age at first use and other drug use measures. The availability of an imputed date of first use for each lifetime substance user enabled consideration of a more precise approach to calculate substance use incidence rates. The new incidence rate calculation method accounts for the fact that a person's age does not exactly intersect calendar time in whole years. Details on the new methodology are reported by Gfroerer et al. (in press).

A.5.2 Impact on Incidence Rate Estimates

Incidence rate estimates are impacted by both the new editing and imputation procedures and the incidence rate calculation method. To sort out the separate impacts of these two changes, age-specific incidence rates were computed from the 1999 CAI data using three methods: (1) new methodology using imputed data, (2) new methodology using edited data, and (3) old methodology using imputed data.

The effect of the new editing and imputation procedures can be evaluated using the new incidence rate calculation method and comparing the results from the fully imputed data (the first two data columns) with the results from edited data only (the middle two data columns). The annual estimates for marijuana in Table A.8 show 11 statistically significant differences for youths aged 12 to 17, and 7 of the 11 differences were higher with imputation. At ages 18 to 25, all six significant differences favor the imputed data. The general tendency was for incidence rates based on fully edited and imputed data to be higher than those based on the older edit-only approach. The average marijuana incidence estimates for youths aged 12 to 17 were 58.3 with imputation and 57.3 with editing only, a relative increase of 1.7 percent (data not shown). For persons aged 18 to 25, the averages were 40.8 with imputation and 39.9 with editing only, a relative increase of 2.3 percent.

A second set of comparisons looked at the differences between the old and new method of calculating age-specific rates using imputed data in both methods. This comparison illustrates the difference in the two calculation methodologies holding the editing and imputation constant (at the fully edited and imputed level). These differences based on 1999 CAI data are shown by years in Table A.8 by comparing the first two data columns (new method - imputed variables) with the last two data columns (old method - imputed variables).

The new methodology removed some borderline cases (initiates who were still 17 years old at the date of first use but who turned 18 later in the same year) from the calculation of the 18 to 25 age-specific rates and correctly placed them into the calculation of the 12 to 17 age-specific rates. Although both the numerator (new initiates) and the denominator (exposure time) were influenced by the change in method, the main impact was through the classification of initiates by age group in the numerator. Under the new method, new initiates were assigned to an age group based on their attained age at the date of initiation. Under the old method, new initiates were assigned to an age group based on their age at their birthday during the current year. Under the old methodology, many of the 17-year-old initiates were being counted in the 18 to 25 age-specific rate. However, the new methodology placed them in a correct age group based on their attained age at the date of initiation. On average, this resulted in an increase of almost 13 percent for marijuana incidence rates at ages 12 to 17 and a decrease of about 10 percent at ages 18 to 25 (data not shown).
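The difference between the two classification rules can be seen in a small sketch. The old rule is interpreted here as the age reached at the birthday falling in the calendar year of first use; the function names and the example respondent are hypothetical.

```python
from datetime import date


def age_new_method(birth, first_use):
    """Attained age on the date of first use."""
    age = first_use.year - birth.year
    if (first_use.month, first_use.day) < (birth.month, birth.day):
        age -= 1  # birthday not yet reached in the year of first use
    return age


def age_old_method(birth, first_use):
    """Age reached at the birthday in the calendar year of first use."""
    return first_use.year - birth.year


def age_group(age):
    """Map an age to the two reporting groups used in this appendix."""
    if 12 <= age < 18:
        return "12 to 17"
    if 18 <= age < 26:
        return "18 to 25"
    return None


# A respondent born 15 September 1980 who first used on 10 March 1998.
birth, first_use = date(1980, 9, 15), date(1998, 3, 10)
new_group = age_group(age_new_method(birth, first_use))  # "12 to 17"
old_group = age_group(age_old_method(birth, first_use))  # "18 to 25"
```

The respondent was still 17 on the date of first use, so the new rule counts the initiation at ages 12 to 17, whereas the old rule counted it at ages 18 to 25.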

A.5.3 Impact of the New Editing and Imputation Procedures

Table A.9 shows the impact of the new editing and imputation for 1999 CAI data on the annual number of marijuana initiates and the mean age at first use for marijuana. Estimates based on 1994-1998 combined PAPI data are also presented in the table as an indication of the overall impact of interview mode and revised editing and imputation procedures. (1)

Comparisons of the estimated numbers of marijuana initiates based on edited versus imputed 1999 CAI showed an increase for the imputed data: 25 significant differences showing higher marijuana estimates from imputed data. The multiyear average numbers of marijuana initiates increased 2.4 percent (data not shown).

The impact of the 1999 editing and imputation procedures on estimates of average age at first use was small and mixed. Comparisons against edited 1999 CAI data showed two significant differences favoring the imputed data and six favoring the edited data for marijuana. Multiyear averages showed a 0.02 percent increase in average age at first use for marijuana (data not shown). In general, the relative impact of the 1999 imputation procedures on estimated average age at first use was small relative to the impact on estimates of initiates or of incidence rates. With so few significant differences and no correction for multiple comparisons, there is little evidence for concluding any differences between the 1994-1998 PAPI data and the 1999 CAI data with respect to average age at first use.

A.5.4 Impact of Questionnaire Mode Change on Estimates of Marijuana Use Initiation

The change in questionnaire mode (i.e., switching from the PAPI to the CAI questionnaire in 1999) could affect the incidence rate estimates in several ways. The CAI instrument allowed for more internal consistency and more complete responses. In addition, the format of the CAI questionnaire gave the respondent more privacy when answering sensitive questions. Nevertheless, the 1999 CAI data showed a higher level of inconsistent data (0.2 percent for 1994-1998 PAPI and 0.5 percent for 1999 CAI). This probably reflected the more comprehensive editing for inconsistencies within the whole substance module employed with the 1999 CAI data. This increased rigor in the edit process produced an increase in inconsistencies in spite of the programmed consistency checks within the CAI instrument.

Table A.10 displays the age-specific marijuana incidence rates for ages 12 to 17 and 18 to 25 under PAPI and CAI. Annual estimates are provided for the combined 1994-1998 PAPI data, for 1999 PAPI data, (2) and for 1999 CAI data. For both PAPI and CAI, edited data and the old incidence methodology were used to compute these estimates. For comparability, nonimputed edited data were used because the PAPI data did not have imputed versions of the age at first use data. Only a small number of years showed a significant difference between the CAI and PAPI estimates.

Another way to deal with high variability in annual incidence estimates is to average the annual estimates over several years. This approach found a 5.4 percent increase in CAI marijuana estimates at ages 12 to 17 and a 1.2 percent decrease in CAI estimates at ages 18 to 25 (data not shown).
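The multiyear averaging described above can be sketched as follows. This is an illustration only: it uses a five-year subset of the ages 12 to 17 columns of Table A.10 (initiation years 1985 to 1989), chosen arbitrarily, and the function name `percent_change_of_means` is hypothetical. It does not reproduce the 5.4 percent figure, whose underlying data are not shown in this report.

```python
from statistics import mean

# Illustrative subset of Table A.10, ages 12 to 17, initiation years 1985-1989.
# Rates are per 1,000 person-years of exposure.
papi_1994_1998 = [48.8, 48.4, 48.4, 44.9, 37.0]
cai_1999 = [51.2, 47.7, 48.1, 48.5, 40.0]


def percent_change_of_means(baseline, comparison):
    """Percent change from the mean of `baseline` to the mean of `comparison`."""
    base_avg = mean(baseline)
    return 100.0 * (mean(comparison) - base_avg) / base_avg


change = percent_change_of_means(papi_1994_1998, cai_1999)
print(f"{change:.1f} percent")
```

Averaging first and then comparing smooths out the year-to-year sampling noise, at the cost of hiding any year-specific differences.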

Even though the statistical results were mixed, there was evidence of some overall increase in reporting of drug use initiation under the CAI mode, which in turn increased estimates of incidence rates. No appreciable effect on mean age at first use could be established.

A.5.5 Summary

Although the estimates for individual years were quite variable, the overall average impact of editing and imputation was to increase incidence rates for both age groups (12 to 17 and 18 to 25) and to increase the estimated number of new initiates. The new incidence rate calculation rules treated respondents as 17 year olds right up to (but not including) their 18th birthday. The old rule classified respondents as 18 year olds for the entire year in which their 18th birthday occurred. The new rules therefore increased both the estimated time at risk and the number of initiates for 17 year olds. However, because the number of initiates was high at age 17, the overall impact was greater on the numerator than on the denominator. As a result, the incidence rates for youths aged 12 to 17 increased and the incidence rate for persons aged 18 to 25 usually decreased somewhat.
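The difference between the two age-classification rules can be illustrated with a minimal sketch. The function names, the treatment of February 29 birth dates, and the use of a day-count fraction are assumptions made for illustration; the sketch ignores the lower age bound of 12 and is not the survey's production algorithm.

```python
from datetime import date


def eighteenth_birthday(birth: date) -> date:
    """Date of the 18th birthday (Feb 29 births are moved to Mar 1, an assumption)."""
    try:
        return date(birth.year + 18, birth.month, birth.day)
    except ValueError:
        return date(birth.year + 18, 3, 1)


def exposure_under_18(birth: date, year: int, rule: str) -> float:
    """Fraction of calendar `year` counted as time at risk in the under-18 group."""
    turns_18 = eighteenth_birthday(birth)
    start, end = date(year, 1, 1), date(year + 1, 1, 1)
    if rule == "old":
        # Old rule: age 18 for the entire calendar year of the 18th birthday.
        return 1.0 if year < turns_18.year else 0.0
    # New rule: under 18 up to, but not including, the 18th birthday.
    cutoff = min(max(turns_18, start), end)
    return (cutoff - start).days / (end - start).days


birth = date(1980, 7, 2)  # hypothetical respondent
print(exposure_under_18(birth, 1998, "old"))
print(exposure_under_18(birth, 1998, "new"))
```

For this hypothetical respondent, the old rule assigns none of 1998 to the under-18 group, whereas the new rule assigns the part of the year before July 2. Any initiation during that period is likewise counted at age 17 under the new rule.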

Mode effects could not be cleanly isolated because some accompanying changes in the question routing process, along with supplementary questions on date of first use for recent users, were implemented at the same time as CAI. One somewhat surprising result was that the level of missing or inconsistent data actually increased with the introduction of CAI. The increase in detected inconsistencies may have resulted from the increased number of checks employed to identify inconsistent data in the post-survey processing. The increase in the proportion of missing age at first use data may have been facilitated by the respondent's option to answer "don't know" or "refused."

The overall impact of the conversion from PAPI to CAI was assessed by comparing the results from the 1999 PAPI and CAI samples using the edited data. The old method of rate calculation was applied to both samples for mode comparison purposes. Annual estimates were highly variable, and few statistically significant differences were identified.

The larger national sample sizes available since 1999 will help make the study of the initiation of substance use more feasible and more precise. The revisions and corrections introduced in 1999 in the CAI questionnaire, in the coordinated editing and imputation procedures, and in the rate computation methodology should also increase the utility of the survey data for these purposes. Based on the analyses reported in this section, comparisons of data from 1999 and subsequent years with data from 1998 and prior years should either be avoided or tempered with an understanding of the methodological effects reported earlier.

1  This comparison is partially confounded with respondent recall effects for surveys conducted in different years. Sample sizes for 1999 PAPI data were not adequate to permit a meaningful comparison with 1999 CAI. Note also that for each year beginning with 1994, only initiation prior to that year could be estimated using the PAPI data.

2 The weights applied to the PAPI analysis were the initially computed and calibrated weights without any adjustment to match the distribution of field interviewer experience to prior years.

 

Table A.1. Summary of 2000 NHSDA Suppression Rules

Estimate: Prevalence rate, p, with nominal sample size, n, and design effect, deff
Suppress if:
    The estimated prevalence rate, p, is < 0.00005 or > 0.99995, or
    [expression not preserved] when p < 0.5, or
    [expression not preserved] when p > 0.5, or
    Effective n < 68, or
    n < 100,
    where [definition not preserved]
Note: The rounding portion of this suppression rule for prevalence rates will produce some estimates that round at one decimal place to 0.0% or 100.0% but are not suppressed from the tables.

Estimate: Estimated number (numerator of p)
Suppress if:
    The estimated prevalence rate, p, is suppressed.
Note: In some instances when p is not suppressed, the estimated number may appear as a 0 in the tables; this means that the estimate is > 0 but < 500 (estimated numbers are shown in thousands).

Estimate: Mean age at first use, [symbol not preserved], with nominal sample size, n
Suppress if:
    [expression not preserved], or
    n < 10

Estimate: Incidence rate, [symbol not preserved]
Suppress if:
    Rounds to < 0.1 per 1,000 person-years of exposure, or
    [expression not preserved]

Estimate: Number of initiates, [symbol not preserved]
Suppress if:
    Rounds to < 1,000 initiates, or
    [expression not preserved]

Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.
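The prevalence-rate criteria in Table A.1 that are legible in the text can be expressed as a short check. This is a partial sketch: it covers only the bounds on p, the effective sample size, and the nominal sample size. The remaining criteria are omitted because their expressions are not reproduced here. The function name is hypothetical, and computing the effective sample size as n divided by deff is an assumption based on the usual definition of a design effect.

```python
def suppress_prevalence(p: float, n: int, deff: float) -> bool:
    """Return True if a prevalence estimate fails any of the legible criteria.

    p    -- estimated prevalence rate as a proportion (0 to 1)
    n    -- nominal sample size
    deff -- design effect
    """
    effective_n = n / deff  # assumption: effective n = n / deff
    return (
        p < 0.00005
        or p > 0.99995
        or effective_n < 68
        or n < 100
    )


print(suppress_prevalence(0.10, 500, 2.0))  # effective n is 250
print(suppress_prevalence(0.10, 120, 2.0))  # effective n is 60
```

An estimate that passes this check could still be suppressed under the criteria that are not covered by the sketch.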

 

Table A.2 Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Screening Result Code
Screening Result   1999 NHSDA: Sample Size, Weighted Percentage   2000 NHSDA: Sample Size, Weighted Percentage
Total Sample 223,868 100.00   215,860 100.00
    Ineligible cases 36,026 15.78   33,284 15.09
    Eligible cases 187,842 84.22   182,576 84.91
Ineligibles 36,026 100.00   33,284 100.00
    Vacant 18,034 49.71   16,796 50.76
    Not a primary residence 4,516 12.90   4,506 13.26
    Not a dwelling unit 4,626 12.70   3,173 9.33
    All military personnel 482 1.22   414 1.21
    Other, ineligible 8,368 23.46   8,395 25.43
Eligible Cases 187,842 100.00   182,576 100.00
Screening complete 169,166 89.63   169,769 92.84
    No one selected 101,537 54.19   99,999 55.36
    One selected 44,436 23.63   46,981 25.46
    Two selected 23,193 11.82   22,789 12.03
Screening not complete 18,676 10.37   12,807 7.16
    No one home 4,291 2.38   3,238 1.82
    Respondent unavailable 651 0.36   415 0.24
    Physically or mentally incompetent 419 0.24   310 0.16
    Language barrier—Hispanic 102 0.06   83 0.05
    Language barrier—other 486 0.28   434 0.27
    Refusal 11,097 5.92   7,535 4.14
    Other, access denied 1,536 1.08   748 0.45
    Other, eligible 38 0.02   7 0.00
    Other, problem case 56 0.03   37 0.02
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

 

Table A.3 Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Final Interview Code, among Persons Aged 12 or Older
Final Interview Code   1999 NHSDA: Sample Size, Weighted Percentage   2000 NHSDA: Sample Size, Weighted Percentage
Total Selected Persons 89,883 100.00   91,961 100.00
Interview complete 66,706 68.55   71,764 73.93
No one at dwelling unit 1,795 2.13   1,776 2.02
Respondent unavailable 3,897 4.53   3,058 3.52
Breakoff 50 0.07   72 0.09
Physically/mentally incompetent 1,017 2.62   1,053 2.57
Language barrier—Spanish 168 0.12   109 0.08
Language barrier—Other 480 1.46   441 1.06
Refusal 11,276 17.98   10,109 14.99
Parental refusal 2,888 1.01   2,655 0.88
Other 1,606 1.53   924 0.86
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

 

Table A.4 Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Final Interview Code, among Youths Aged 12 to 17
Final Interview Code   1999 NHSDA: Sample Size, Weighted Percentage   2000 NHSDA: Sample Size, Weighted Percentage
Total Selected Persons 32,011 100.00   31,242 100.00
Interview complete 25,384 78.07   25,756 82.58
No one at dwelling unit 322 1.09   278 0.86
Respondent unavailable 872 3.04   617 2.05
Breakoff 13 0.03   18 0.05
Physically/mentally incompetent 244 0.76   234 0.76
Language barrier—Spanish 15 0.03   10 0.03
Language barrier—Other 58 0.18   50 0.20
Refusal 1,808 5.97   1,455 4.52
Parental refusal 2,885 9.50   2,641 8.35
Other 410 1.33   183 0.59
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

 

Table A.5 Weighted Percentages and Sample Sizes for 1999 and 2000 NHSDAs, by Final Interview Code, among Persons Aged 18 or Older
Final Interview Code   1999 NHSDA: Sample Size, Weighted Percentage   2000 NHSDA: Sample Size, Weighted Percentage
Total Selected Persons 57,872 100.00   60,719 100.00
Interview complete 41,322 67.41   46,008 72.92
No one at dwelling unit 1,473 2.25   1,498 2.16
Respondent unavailable 3,025 4.71   2,441 3.69
Breakoff 37 0.07   54 0.09
Physically/mentally incompetent 773 2.85   819 2.78
Language barrier—Spanish 153 0.13   99 0.09
Language barrier—Other 422 1.62   391 1.16
Refusal 9,468 19.41   8,654 16.22
Parental refusal 3 0.00   14 0.01
Other 1,196 1.55   741 0.89
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

 

Table A.6 Response Rates and Sample Sizes for the 1999 and 2000 NHSDAs, by Demographic Characteristics
Demographic Characteristic   1999 NHSDA: Selected Persons, Completed Interviews, Weighted Response Rate   2000 NHSDA: Selected Persons, Completed Interviews, Weighted Response Rate
Total 89,883 66,706 68.55%   91,961 71,764 73.93%
Age in Years
    12-17 32,011 25,384 78.07%   31,242 25,756 82.58%
    18-25 30,439 22,151 71.21%   29,424 22,849 77.34%
    26 or older 27,433 19,171 66.76%   31,295 23,159 72.17%
Gender
    Male 43,883 31,987 67.12%   44,899 34,375 72.68%
    Female 46,000 34,719 69.81%   47,062 37,389 75.09%
Race/Ethnicity
    Hispanic 11,203 8,755 74.59%   11,454 9,396 77.95%
    White 63,211 46,272 67.98%   64,517 49,631 73.39%
    Black 10,552 8,044 70.39%   10,740 8,638 76.19%
    All other races 4,917 3,635 59.28%   5,250 4,099 67.31%
Region
    Northeast 16,794 11,830 64.03%   18,959 14,394 71.68%
    Midwest 24,885 18,103 69.63%   25,428 19,355 73.23%
    South 27,390 21,018 70.93%   27,217 22,041 76.38%
    West 20,814 15,755 67.47%   20,357 15,974 72.68%
County Type
    Large metropolitan 36,101 25,901 65.15%   37,754 28,744 71.77%
    Small metropolitan 30,642 22,612 69.98%   31,400 24,579 74.96%
    Nonmetropolitan 23,140 18,193 74.97%   22,807 18,441 77.58%
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999 and 2000.

 

Table A.7 Comparison of Initiation Rates, by Year of Initiation and Survey Year
Drug   Year of Initiation 1993: 1994 Survey, 1995 Survey   Year of Initiation 1994: 1995 Survey, 1996 Survey   Year of Initiation 1995: 1996 Survey, 1997 Survey   Year of Initiation 1996: 1997 Survey, 1998 Survey   Average of Ratio of 1-Year Recall to 2-Year Recall
Rate for Age 12 to 17
    Marijuana 59.2 53.7   74.2 75.2   75.7 73.6   83.2 75.6   1.055
    Cocaine 8.9 5.0   10.2 5.7   10.6 8.0   11.3 11.0   1.480
    Heroin 0.7 0.5   2.1 1.4   2.5 1.8   3.9 1.5   1.722
Rate for Age 18 to 25
    Marijuana 46.9 41.4   42.1 55.9   47.7 53.4   53.6 50.5   0.960
    Cocaine 12.8 12.8   9.9 11.8   13.8 14.7   14.8 13.9   0.961
    Heroin 0.1 1.4   1.4 2.1   2.4 1.9   2.3 3.0   0.692
Number of Initiates
    Marijuana 2,035 1,783   2,251 2,548   2,368 2,443   2,540 2,384   1.015
    Cocaine 595 538   533 530   652 654   675 664   1.031
    Heroin 41 62   122 97   141 93   171 127   1.195
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1994-1998.
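The final column of Table A.7 can be reproduced from the other columns. The sketch below assumes that, for each year of initiation, the first estimate comes from the survey conducted 1 year later (1-year recall) and the second from the survey conducted 2 years later (2-year recall). It then averages the four ratios. The function name is hypothetical.

```python
from statistics import mean

# (1-year recall, 2-year recall) rate pairs for initiation years 1993-1996,
# taken from the "Rate for Age 12 to 17" rows of Table A.7.
marijuana_12_17 = [(59.2, 53.7), (74.2, 75.2), (75.7, 73.6), (83.2, 75.6)]
cocaine_12_17 = [(8.9, 5.0), (10.2, 5.7), (10.6, 8.0), (11.3, 11.0)]


def average_recall_ratio(pairs):
    """Mean of the per-year ratios of the 1-year to the 2-year recall estimate."""
    return mean(one_year / two_year for one_year, two_year in pairs)


print(f"{average_recall_ratio(marijuana_12_17):.3f}")  # table value: 1.055
print(f"{average_recall_ratio(cocaine_12_17):.3f}")    # table value: 1.480
```

A ratio above 1 means the estimate from the survey closer to the year of initiation was higher on average than the estimate from the later survey.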

 

Table A.8 Marijuana Annual Age-Specific Rates at First Use Per 1,000 Person-Years of Exposure: 1999
Year   New Method - Imputed Variable: 12-17, 18-25   New Method - Edited Variable: 12-17, 18-25   Old Method - Imputed Variable: 12-17, 18-25
1965 9.6 7.5   8.9 6.7   9.1 7.7
1966 19.3 32.1   19.5b 32.1   17.9 32.6
1967 20.7 30.8   20.6 30.9   19.1 29.4
1968 20.3 43.5   20.2 41.2   16.3 44.6
1969 34.1 56.7   34.5b 56.4   29.2b 55.7
1970 55.8 47.7   55.0 46.7   49.1b 50.9
1971 46.7 49.7   46.2 48.5   41.8a 54.0a
1972 59.7 50.2   59.7 47.8a   54.3b 52.7
1973 59.0 39.4   59.6a 36.8   53.3b 43.5a
1974 66.6 53.0   65.6 52.4   56.6b 62.2b
1975 65.3 53.7   63.6 53.4   58.3b 56.8
1976 76.5 60.5   75.4 60.1   63.2b 72.3b
1977 86.5 51.2   86.5 50.0   77.9b 57.8a
1978 84.2 49.6   81.7 48.8   75.0b 56.7
1979 86.3 54.5   84.2 54.4   75.7b 63.2b
1980 75.9 48.2   75.5 44.6   66.6b 53.3
1981 51.8 35.1   51.3 35.0   50.1 36.3
1982 59.0 36.3   59.5b 35.1   51.9b 40.7a
1983 55.3 33.1   53.1a 32.5   48.5b 38.8b
1984 58.5 38.5   58.5 38.3   52.1b 43.7b
1985 58.4 38.7   57.8 37.2   51.8b 44.9b
1986 53.2 29.8   53.3 28.4a   47.6b 33.7a
1987 56.1 37.3   54.3 36.7   49.8b 42.4b
1988 55.7 31.6   54.6a 31.4   49.4b 33.1
1989 46.7 26.8   46.2 25.8   40.2b 32.2b
1990 48.4 29.0   48.2 28.6   42.5b 32.7a
1991 46.1 31.9   45.6 31.4   39.5b 37.5b
1992 51.0 30.5   50.4 29.5a   45.7b 34.2b
1993 60.0 36.7   59.7 36.3   53.4b 41.5b
1994 74.3 42.1   72.9b 41.2   67.2b 47.1b
1995 78.3 46.1   76.7b 45.6   70.8b 53.1b
1996 89.9 44.1   87.7b 42.7b   80.0b 52.8b
1997 90.0 45.1   87.3b 44.5a   79.6b 53.8b
1998 82.6 46.5   79.2b 45.6b   73.5b 54.5b
Note: The numerator of each rate is the number of persons in the age group who first used the drug in the year, while the denominator is the person-time exposure measured in thousands of years.
a Difference between the estimate and New Method - Imputed is statistically significant at the 0.05 level.
b Difference between the estimate and New Method - Imputed is statistically significant at the 0.01 level.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1999.
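The rate definition given in the note to Table A.8 can be written as a one-line calculation. The input values below are hypothetical and serve only to show the units; the actual estimates are based on weighted survey data.

```python
def incidence_rate_per_1000(initiates: float, person_years: float) -> float:
    """New users in the age group per 1,000 person-years of exposure."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    # Denominator: person-time exposure expressed in thousands of years.
    return initiates / (person_years / 1000.0)


# Hypothetical example: 2,000 initiates over 25,000 person-years at risk.
print(incidence_rate_per_1000(2_000, 25_000))
```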

 

Table A.9 Comparison of Numbers of Marijuana Initiates (in Thousands) and Mean Age at First Marijuana Use: 1994-1998 PAPI Versus 1999 CAI
Year   Initiates (1,000s): 1994-1998 PAPI, 1999 CAI - Edited Data, 1999 CAI - Imputed Data   Mean Age (Years): 1994-1998 PAPI, 1999 CAI - Edited Data, 1999 CAI - Imputed Data
1965 601 442 478   18.95 21.77 21.61
1966 977 1,229 1,234   20.05 18.68 18.68
1967 1,423 1,199 1,210   19.76 18.92 18.91
1968 1,621 1,470 1,533   18.97 18.89 18.91
1969 2,245 2,301 2,317   19.19 19.43 19.43
1970 2,611 2,501a 2,585   19.21 18.20 18.34
1971 2,710 2,403 2,456   18.78 17.87 17.84
1972 2,861 2,676b 2,747   18.62 18.16a 18.24
1973 2,897 2,610a 2,697   18.28 19.03 19.15
1974 2,966 2,873b 2,938   18.50 17.85 17.82
1975 3,128 2,923a 2,989   18.51 18.90a 18.84
1976 2,786 3,216 3,267   18.69 18.38 18.34
1977 2,889 3,195a 3,251   18.95 18.03 18.07
1978 2,846 2,959a 3,046   17.77 18.14 18.10
1979 2,654 2,983b 3,052   18.22 18.22 18.18
1980 2,499 2,564a 2,680   18.41 18.13 18.38
1981 2,115 1,820a 1,840   17.94 18.29 18.26
1982 1,964 2,056a 2,090   18.19 18.85 18.98
1983 2,143 1,889b 1,954   17.85 18.90a 18.77
1984 2,010 2,029 2,040   19.19 18.56 18.54
1985 1,775 1,890a 1,938   17.85 18.05 18.03
1986 1,845 1,604b 1,633   19.32 17.15 17.18
1987 1,756 1,708b 1,763   17.92 17.48 17.45
1988 1,565 1,595a 1,620   17.49 17.49a 17.47
1989 1,371 1,353a 1,388   17.87 17.21 17.29
1990 1,423 1,451a 1,470   17.66 17.42 17.44
1991 1,415 1,519b 1,545   17.47 17.78 17.76
1992 1,644 1,544b 1,578   17.60b 16.49 16.49
1993 1,983 1,930b 1,972   16.96 17.33 17.45
1994 2,380 2,235b 2,293   16.90 16.61 16.59
1995 2,409 2,359b 2,421   16.57 16.65b 16.59
1996 2,462 2,532b 2,616   16.62 17.24b 17.18
1997 2,114b 2,493b 2,571   17.09 17.16b 17.08
1998 — 2,345b 2,437   — 17.63b 17.52
— Not available.
a Difference between the estimate and 1999 CAI-Imputed is statistically significant at the 0.05 level.
b Difference between the estimate and 1999 CAI-Imputed is statistically significant at the 0.01 level.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1994-1998 PAPI and 1999 CAI.

 

Table A.10 Comparison of Mode Effect: Marijuana Annual Age-Specific Rates at First Use Per 1,000 Person-Years of Exposure with PAPI and CAI Data
Year   PAPI 1994-1998: 12-17, 18-25   PAPI 1999 Old Method - Edited: 12-17, 18-25   CAI 1999 Old Method - Edited: 12-17, 18-25
1965 8.7 13.7a   11.8 13.8   8.4 6.9
1966 13.9 23.5   7.4a 38.8   18.1 32.6
1967 15.6 38.8   18.4 36.7   18.9 29.5
1968 20.1 45.2   35.0a 47.0   16.2 42.5
1969 31.7 54.1   29.0 50.9   29.5 55.3
1970 35.1 64.3   27.1a 50.1   48.3 50.5
1971 40.8 65.9   41.6 49.3   41.4 52.8
1972 48.4 64.1   44.3 69.4   54.5 50.5
1973 60.2 57.7a   52.4 68.0   54.1 40.8
1974 57.6 61.7   68.7 58.0   55.4 61.7
1975 67.8 57.8   48.7 43.7   56.7 56.4
1976 59.5 52.4a   74.0 61.0   62.2 72.1
1977 66.7 50.2   34.5b 58.5   77.8 56.8
1978 75.2 49.9   62.2 61.1   74.9 53.1
1979 60.6 59.0   63.5 45.6   73.7 62.9
1980 59.2 56.0   58.0 47.9   66.4 49.9
1981 54.3 43.1   54.4 69.2a   49.6 36.3
1982 48.2 42.3   55.8 35.5   52.2 39.6
1983 56.4a 45.1   52.1 39.9   46.3 38.2
1984 53.1 38.4   54.3 28.4a   52.2 43.6
1985 48.8 38.6   52.6 37.9   51.2 43.4
1986 48.4 41.3a   38.4 43.4   47.7 32.3
1987 48.4 40.5   41.6 29.8   48.1 41.7
1988 44.9 36.9   39.8 29.3   48.5 32.9
1989 37.0 32.8   38.1 40.4   40.0 31.1
1990 36.9 36.6   38.9 33.8   42.3 32.3
1991 38.4 34.0   45.4 23.3a   39.0 37.0
1992 44.5 37.0   44.7 25.4   45.2 33.5
1993 55.1 45.9   65.3 39.1   53.1 41.2
1994 72.8 47.9   76.7 41.7   66.0 46.1
1995 74.1 52.6   72.5 47.7   69.1 52.5
1996 79.3 52.1   56.3b 56.8   77.9 51.3
1997 64.4a 47.1   84.4 57.5   76.9 53.2
1998 — —   55.2a 53.5   70.2 53.2
— Not available.
a Difference between PAPI and CAI is statistically significant at the 0.05 level.
b Difference between PAPI and CAI is statistically significant at the 0.01 level.
Source: SAMHSA, Office of Applied Studies, National Household Survey on Drug Abuse, 1994-1998 PAPI and 1999 CAI.


This page was last updated on June 16, 2008.
