Medicines Exposures: Collection, Coding, and Classification Workshop 

Last Reviewed:  6/1/2008
Last Updated:  8/9/2005

December 16, 2002
Baltimore Marriott Waterfront
Baltimore, MD

This meeting was held in conjunction with the National Children’s Study, which is led by a consortium of federal agency partners: the U.S. Department of Health and Human Services (including the National Institute of Child Health and Human Development [NICHD] and the National Institute of Environmental Health Sciences [NIEHS], two parts of the National Institutes of Health, and the Centers for Disease Control and Prevention [CDC]) and the U.S. Environmental Protection Agency (EPA).

Welcome and Opening Remarks

Paul Seligman, M.D., M.P.H., Center for Drug Evaluation and Research, U.S. Food and Drug Administration (FDA), DHHS, said that the workshop objectives were to provide the National Children’s Study (Study) with an updated understanding of exposures to prescription and over-the-counter (OTC) pharmaceuticals, dietary supplements and herbals, and the means by which to measure them. Workshop participants were charged with providing recommendations to the Study on best practices.

How Did the U.S. 1958-1965 Collaborative Perinatal Project Work?

Louis Vernacchio, M.D., Boston University, provided an overview of this study, which was conducted by the NIH predecessor of the National Institute of Neurological Disorders and Stroke. The primary objective was to assess prenatal and perinatal risks for cerebral palsy and other neurological conditions. Twelve university medical centers enrolled more than 58,000 pregnant women. The follow-up period ran through the child’s eighth birthday. Investigators collected data on prescription and OTC products but not on topicals, laxatives, vitamins, minerals, or antacids. The use of herbals was not significant.

Study records were unclear as to how often mothers were asked about use of the substances of interest. The interviews were described as "nonprompted." That is, the interviewer first asked whether the subject had used any of the study drugs and went into detail only if the subject responded in the affirmative. Compared with currently available products, the OTC product list was limited, suggesting that the job of developing categories for modern OTC products could be formidable.

Dr. Vernacchio noted that such studies require rigorous data collection, with special attention paid to developing clear timelines of exposure and use of interview techniques that enhance subject recall. The drug-coding scheme for such a study should be based on pharmacologically active ingredients, with unique codes for those ingredients.

Perspective of Modern Cohorts

Graham Colditz, M.D., Harvard University Medical School, discussed the Nurses’ Health Study (NHS), which began in 1976 as a one-page questionnaire to assess the use of oral contraceptives (OCs) and gauge their effect on breast cancer and other pathologies. Dr. Colditz explained that the initial layout of the questionnaire led to errors. This first phase of the study did not collect information on brand or dose.

NHS II, which began in 1989, drew a younger cohort, with part of the focus on effects in women who began using OCs during adolescence. The questionnaire broke down use by year of life, starting at age 13, with at least 10 months as the minimum duration of use. NHS II attempted to validate subject data by contacting physicians, but these records did not always list the brand. The NHS II questionnaire included multivitamin use, but phrasing proved troublesome because of ambiguity as to what constitutes a multivitamin. However, many of the measures matched serum markers, which largely validated subject self-reporting. Postmenopausal hormones have also been examined in the NHS II, including a review of dosing and prescription patterns.

Key points of using a questionnaire in such a study include:

  • Do subjects recognize the product as named?
  • Is it prescribed? If not, it may be underreported.
  • Do subjects see the data tables the same way researchers see them?
  • Keep it simple. With so many drugs on the market, questionnaires must be engineered carefully.

What Can Health Plans Contribute?

Jonathan Finkelstein, M.D., M.P.H., Harvard Pilgrim Health Care, described health care systems as managed care organizations/insurers and integrated delivery systems, including health maintenance organizations (HMOs) and other large group practices. These organizations can provide a range of data, including frequency and type of service, demographic data, drug-dispensing information, and much more. Information technology (IT) systems make data even more accessible, even though such systems have not been designed as study data systems. The resulting information describes subject age, duration and dates of enrollment, diagnoses, pharmacy benefit, National Drug Code (NDC) categories, and more. Patient records can help validate patient claims regarding use of pharmaceuticals.

The HMO Research Network Center for Education and Research on Therapeutics (CERTs) investigated pediatric antibiotic use from 1995 to 2000 by randomly sampling 25,000 children from 9 health plans. Investigators drew substantial information about decreasing use of antibiotics from this database. CERTs is currently studying prescription drug use by women during pregnancy to evaluate how commonly these women are exposed to contraindicated drugs. Dr. Finkelstein noted that this study demonstrated that health plan data gave a study "great power with relatively little cost."

Dr. Finkelstein commented that using the NDC system is easier for new, proprietary agents and somewhat more complex for classes of drugs and for products with many producers. Distinguishing between forms (for example, topical versus systemic) requires careful attention. CERTs has developed methods for sharing NDC categories among the plans it works with in order to standardize drug data extraction and product categorization.

A limitation of health plan data is that these data measure pharmaceuticals dispensed rather than pharmaceuticals consumed. Dr. Finkelstein added that such a data set will not capture samples dispensed by physicians, OTC purchases, alternative therapies, or any drugs purchased outside the plan, which may occur for convenience or because a drug is available at a cost lower than the plan’s co-pay. He explained that proprietary NDC systems are emerging, that multiple diagnoses can confound the effort, and that privacy laws will need to be followed in any research using these data. The benefits of health plan data are that they cover large, well-defined populations and often draw from populations not likely to participate in clinical trials. This data source presents researchers with efficient, unbiased assessments of exposures.

Collecting Information Directly from Patients

Allen Mitchell, M.D., Boston University, expanded on three themes during his presentation:

  • What exposure information is needed?
  • Why get it from subjects?
  • How does one obtain valid data from subjects?

Dr. Mitchell noted that the Study has targeted the collection of information on exposures to include prescription and OTC drugs, dietary supplements (such as multivitamins), and herbals. He commented that ideally, the Study should identify the agent, condition prompting use, start and stop dates, dose, route, frequency, dose form, and variations in compliance with prescription guidelines or, in the case of nonprescription products, with recommended use. The sources of information on these various exposures include a physician/health care professional, pharmacy, retail store, and mail order. Although selected sources can systematically capture information on receipt of prescription drugs, study subjects are the "final common pathway" for all exposures. If adequately queried, subjects can provide reliable information on exposures to prescription drugs, and they may be the only source for information on OTC/supplement use.
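The exposure attributes Dr. Mitchell listed can be pictured as a single record per reported use. The following is only a minimal sketch of such a record; the class name, field names, and helper methods are hypothetical illustrations and do not come from the Study or from any presenter's system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ExposureRecord:
    """One reported exposure. All field names are illustrative assumptions."""
    agent: str                       # product or ingredient name as reported
    indication: str                  # condition prompting use
    start: date                      # first day of use
    stop: Optional[date] = None      # None means use was ongoing at interview
    dose: Optional[str] = None
    route: Optional[str] = None
    frequency: Optional[str] = None
    dose_form: Optional[str] = None
    prescribed: bool = False         # False for OTC, supplement, or herbal use
    source: str = "subject"          # e.g. subject, pharmacy, physician

    def days_of_use(self, as_of: date) -> int:
        """Inclusive day count from start to stop (or to as_of if ongoing)."""
        end = self.stop or as_of
        return max((end - self.start).days + 1, 0)

    def overlaps(self, window_start: date, window_end: date) -> bool:
        """True if any day of use falls inside the closed window."""
        end = self.stop or window_end
        return self.start <= window_end and end >= window_start
```

A record built this way keeps start and stop dates explicit, so an analyst could ask whether a given use overlapped a period of interest (for example, a trimester) without re-reading the interview notes.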

Slone’s Birth Defect Study (BDS), from 1976 to the present, has obtained medication histories of mothers of more than 24,000 infants. Study interviewers enhance recall through inquiries based on illnesses/indications, drug categories, and specific drug names. BDS included a fictitious drug name to assess false responses; none of the more than 8,000 subjects reported exposure to this "drug." Subject recall is likely to depend on the severity of the condition for which a drug is used, the duration of use, and the interval that has elapsed between last use and the inquiry. For example, aspirin or acetaminophen, if taken for only one day for a headache, will not likely be recalled after a few months have passed, whereas an anticonvulsant taken for at least 6 months is likely to be recalled even after a relatively long time.

Experience from an Existing Cohort

Jørn Olsen, M.D., Danish Epidemiology Science Center, discussed the National Birth Cohort in Denmark, 1996-2004. "Better Health for Mother and Child" is a nationwide prospective study of 100,000 pregnant women and their children. Subjects in this study are recruited by materials available at doctors’ offices. This study aims to assess fetal growth and influences, infections, diet, medications, lifestyle/environment, and other variables. Part of the study involves a questionnaire on diet mailed to subjects’ homes. Telephone interviews (generally less than 15 minutes each) are conducted at 12 weeks, 30 weeks, 6 months, and 18 months.

Whenever possible, study investigators obtain prospective data. The Danish National Registry for prescription drugs has been released and will be linked with the birth cohort. However, the registry does not cover OTCs. Drug use reported by pregnant women in Denmark’s Jutland Region was compared with purchasing records. The comparison indicated that pregnant women did not consistently use the prescription drugs they had purchased.

Dr. Olsen noted that only half the primary care practitioners they approached tried to recruit women, with compensation cited as a problem. Subjects were encouraged to consider their commitment before enrolling. Study endpoints will be grafted on from other data sources, such as hospital treatment records. The interview points at 6 and 18 months (baby’s age) allow some endpoint data generation. The six-month interview is geared to catching women before they go back to work, an approach used because Danish law provides a six-month maternity leave.

Question-and-Answer Session

Participants discussed the importance of collecting intrapartum hospital exposure information and noted that data collection issues include subject reporting versus institutional collection.

A participant asked whether drugs and treatments borrowed to treat socially unacceptable diseases would be underreported. The moderator replied that focused inquiry, anonymity, courtesy, and a nonjudgmental approach would help bolster accurate reporting in such situations.

The use of diaries was offered as a possible means of recording data, but problems included lack of compliance and self-selection bias. Diaries might be useful as supplementary sources of information. Audio and Internet diaries were also discussed.

Computer-based questionnaires were described as an economical means of obtaining subject responses, but validation studies have not been performed. The relevant population of mothers would have to be experienced enough with computers that such questionnaires would not be technically daunting. An in-depth interview may become necessary at some point.

Coding and Classification of Drug Data

Katherine Kelley, R.Ph., Boston University, described the Slone Drug Dictionary, which was developed specifically for use in epidemiologic studies of drug effects. The dictionary describes more than 14,500 agents; lists more than 7,300 multicomponent product codes; and includes prescription products, OTCs, dietary supplements, and herbals. Each chemical/product is indexed by chemical/biological name as well as by trade name. The dictionary contains some information on excipients and, for herbals, will include genus/species and the part of the plant used when known. Products are classified by therapeutic categories and "coalitions," a grouping that can be determined by active ingredient (or combinations of active ingredients). Multicomponent products are assigned a unique number as they are encountered, and individual components are also assigned specific codes. The coalition system of categories can be tailored to meet specific research needs. A coalition includes any product that contains a designated agent, whether as the primary or a secondary agent.
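The coalition idea (every product that contains a designated agent, in any role) can be pictured as an inverted index from ingredient codes to product codes. The sketch below is only an illustration under that reading; the codes, product names, and function names are invented and do not reflect the actual structure of the Slone Drug Dictionary.

```python
from collections import defaultdict

# Hypothetical ingredient and product codes, for illustration only.
INGREDIENTS = {101: "acetaminophen", 102: "pseudoephedrine", 103: "aspirin"}
PRODUCTS = {
    9001: ("Product A", [101]),        # single-ingredient product
    9002: ("Product B", [101, 102]),   # multicomponent product
    9003: ("Product C", [103]),
}


def build_coalitions(products):
    """Map each ingredient code to the set of product codes containing it."""
    coalitions = defaultdict(set)
    for product_code, (_name, ingredient_codes) in products.items():
        for ingredient_code in ingredient_codes:
            coalitions[ingredient_code].add(product_code)
    return dict(coalitions)


def coalition_for(agent_code, products):
    """Sorted product codes containing the agent, as primary or secondary."""
    return sorted(build_coalitions(products).get(agent_code, set()))
```

With these made-up entries, `coalition_for(101, PRODUCTS)` gathers both the single-ingredient and the combination product, which is the behavior the summary attributes to a coalition.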

Comparisons of Existing Commercial Systems in the United States

William Fant, Pharm.D., University of Cincinnati, discussed a number of commercial drug coding systems, all of which are based on FDA’s NDC Directory, found at www.fda.gov/cder/ndc/database/default.htm. These systems were developed as pricing systems but have been modified to include other information necessary for dispensing medications, including patient consultation guides, therapeutic substitution codes, and warning label codes. The original intent of the NDC was to allow pharmacies to obtain reimbursement for Medicare out-of-hospital benefits. The NDC is restricted to human drugs and a few OTCs. The 10 digits are arranged to indicate the drug, dose size, and package size, in a 4-4-2 configuration. The NDC system has been expanded for some products to 11 digits in a 5-4-2 configuration. The commercial pricing systems indicate the configuration used for specific products. As a result of mergers and acquisitions, some NDC codes have been reused, and some manufacturers have not followed the original guidelines in assigning NDC codes.
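Because codes may arrive in either the 10-digit or the 11-digit layout, a study database would likely want one canonical form. The sketch below pads a hyphenated code to the 5-4-2 layout. It is an illustration, not a method described at the workshop: the function name is hypothetical, the segment names (labeler, product, package) follow FDA usage rather than the summary's wording, and the 5-3-2 and 5-4-1 layouts are additional FDA 10-digit configurations that the summary does not mention.

```python
# Accepted hyphenated layouts: three 10-digit forms plus the 11-digit form.
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1), (5, 4, 2)}


def ndc_to_11_digits(ndc: str) -> str:
    """Pad a hyphenated NDC to the 11-digit 5-4-2 layout, without hyphens.

    Hyphens are required on input: an unhyphenated 10-digit code is
    ambiguous, because the short segment cannot be identified.
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected three numeric segments: {ndc!r}")
    if tuple(len(p) for p in parts) not in VALID_LAYOUTS:
        raise ValueError(f"unrecognized NDC layout: {ndc!r}")
    labeler, product, package = parts
    # Left-pad whichever segment is short with a zero.
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)
```

Padding does nothing to resolve the reuse of codes after mergers noted above; tracking that would need effective dates alongside each code.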

There is a need to review classification by chemistry, pharmacology, or therapeutic use. The commercial systems were not designed to track and record changes in products over time. They represent a moment in time and provide no historical reference. Commercial systems do not track unapproved uses. The Red Book™ Data Services database has 180,000 line listings, including more than 89,000 active listings, with 111 fields. Of those fields, 50 refer to pricing for private and various government compensation. MediSpan’s Master Drug Database is a flat database with lookup tables and a 14-digit hierarchical lookup code. FirstDataBank offers a relational database that is the most comprehensive of the three.

U.S. Interagency Efforts to Rationalize Drug Terminology

Stuart Nelson, M.D., National Library of Medicine, NIH, DHHS, pointed out that the Study must assume that terminology will change over time and that researchers need to exploit IT capabilities to control for these changes. The Unified Medical Language System (UMLS) is designed to retrieve and integrate relevant information from computer-based patient records, among others. The UMLS Metathesaurus is built on the notion that the concept should drive the system, giving the look-up a "name that never changes." Dr. Nelson emphasized that no single vocabulary can serve all purposes and that issues such as maintenance and proprietary attitudes deepen the problem. Key points are the use of a concept orientation, tracking meaning as it changes over time, and facilitating interoperation between vocabularies.
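The concept orientation Dr. Nelson described can be illustrated with a small index in which many names, past and present, resolve to one stable identifier. This is only a toy sketch of the idea: the class, its methods, and the identifier "X0001" are invented for illustration and are not the UMLS data model or real UMLS identifiers.

```python
class ConceptIndex:
    """Toy concept-oriented vocabulary: many names, one stable identifier."""

    def __init__(self):
        self._name_to_concept = {}   # case-folded name -> concept id
        self._preferred = {}         # concept id -> current preferred name

    def add_name(self, concept_id, name, preferred=False):
        """Attach a name (brand, generic, or legacy term) to a concept."""
        self._name_to_concept[name.casefold()] = concept_id
        if preferred or concept_id not in self._preferred:
            self._preferred[concept_id] = name

    def lookup(self, name):
        """Return the concept id for a name, or None if it is unknown."""
        return self._name_to_concept.get(name.casefold())

    def preferred_name(self, concept_id):
        """Return the name currently designated as preferred."""
        return self._preferred[concept_id]


index = ConceptIndex()
index.add_name("X0001", "acetaminophen", preferred=True)  # made-up identifier
index.add_name("X0001", "paracetamol")                    # same substance
index.add_name("X0001", "Tylenol")                        # a brand name
```

Data coded against the identifier rather than the name would be unaffected if the preferred name later changed, which is the property the summary calls a "name that never changes."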

There are many supporters of a standardized vocabulary, and the Health Insurance Portability and Accountability Act allows for mandated standards in clinical systems in the United States. UMLS will provide a common distribution format for the standard vocabularies it includes. The effort has to account for identical end products with different names (some chemicals have the same problem). Clinical drug classification will hinge on a finished product, with information on ingredients, strength, and dose form. Dr. Nelson said that NDC codes are too "granular." For nomenclature, the goals are to define a standardized representational format and to relate UMLS clinical drugs to those standard forms. For prescription norms, the key point is to relate brand names to generics. The NDC undergoes about 4,000 changes a month, but there are only about 30 legitimate new molecular components coming online each year. UMLS has difficulty in tracking herbals and other unlicensed products. UMLS does not categorize by excipients, but such information may eventually be incorporated with help from FDA.

General Discussion: Coding

A participant commented that a certain level of detail requires competent personnel and that a coding system must reflect epidemiological imperatives rather than those of the pharmaceutical or pharmacologic laboratory. A participant suggested that the coding system should not be "too crude" because inactive ingredients, for instance, are sometimes the substance of interest. Another participant commented that where biological agents are concerned, there is no such thing as a generic. A participant noted that it is difficult to come up with a coding scheme when the Study hypotheses are not fully determined. Some of the variables in drug use are likely beyond any capture for the Study, due simply to the fact that collecting relevant data is impractical.

On the matter of intervals of measurement, a participant suggested that the onset of puberty might be an opportune time, because many health issues arise during the teen years. Another participant commented that it might be interesting to sample otherwise healthy children to establish their medication use and determine if there are any revealing indicators. A participant with a background in clinical data systems asked everyone to consider the nature of the information and how it would look in the future as well as at present. Although IT systems will not consume a lion’s share of the Study budget, the better these needs can be anticipated, the more quickly IT contractors can respond.

Speakers/Presenters

Graham Colditz, M.D., Ph.D., Harvard University Medical School
William Fant, Pharm.D., University of Cincinnati
Jonathan Finkelstein, M.D., M.P.H., Harvard Pilgrim Health Care
Katherine Kelley, R.Ph., Boston University
Allen Mitchell, M.D., Boston University
Stuart Nelson, M.D., National Library of Medicine, NIH, DHHS
Jørn Olsen, M.D., Ph.D., Danish Epidemiology Science Center
Paul Seligman, M.D., M.P.H., Center for Drug Evaluation and Research, FDA, DHHS
Louis Vernacchio, M.D., M.Sc., Boston University

Other Participants

Kathryn Aikin, Ph.D.
Arthur M. Bennett, M.E.A., B.E.E.
Mark Cosentino, D.P.M., Ph.D.
Christina D. Chambers, Ph.D., M.P.H.
Janet D. Cragan, M.D.
Carry W. Croghan, M.S.
Matthew W. Gillman, M.D., S.M.
Gary L. Ginsberg, Ph.D.
Adrienne B. Goslee, M.S., M.P.H.
Gilman Grave, M.D.
Doris Haire
Janet R. Hardy, M.P.H., M.Sc.
Ralph E. Kauffman, M.D.
Dianne L. Kennedy, R.Ph., M.P.H.
Carole Kimmel, Ph.D.
Michele Koppelman, M.A.
Sandra Kweder, M.D.
Tamar Lasky, Ph.D.
Donald R. Mattison, M.D.
Ruth B. Merkatz, Ph.D., R.N., F.A.A.N
Karin Nelson, M.D.
Chaichana Nimnuan, M.D., Dr.Ph.
Caitlin C. Oppenheimier, M.P.H.
Gladys Reynolds, Ph.D.
William J. Rodriguez, M.D., Ph.D.
Kathi Shea, B.S.
Erik Svendsen, Ph.D., M.S.
Kathleen Uhl, M.D.
Robert Ward, M.D.
Sumner J. Yaffe, M.D.