Evaluation of Cervical Cytology

THIS EVIDENCE REPORT IS OUTDATED AND IS NO LONGER VIEWED AS GUIDANCE FOR CURRENT MEDICAL PRACTICE. IT IS MAINTAINED FOR ARCHIVAL PURPOSES ONLY.

U.S. Department of Health and Human Services
2101 East Jefferson Street
Rockville, MD 20852
http://www.ahcpr.gov

Prepared by:
Duke University, Durham, NC
Douglas C. McCrory, MD, MHSc
David B. Matchar, MD
Co-Project Directors

Lori Bastian, MD
Santanu Datta, MS, MBA
Victor Hasselblad, PhD
Jason Hickey, MS
Evan Myers, MD, MPH
Kavita Nanda, MD
Investigators

The Agency for Health Care Policy and Research (AHCPR), through its Evidence-based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHCPR and conduct additional analyses when appropriate prior to developing their reports and assessments.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHCPR encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHCPR expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the health care system as a whole by providing important information to help improve health care quality.

We welcome written comments on this evidence report. They may be sent to: Director, Center for Practice and Technology Assessment, Agency for Health Care Policy and Research, 6010 Executive Blvd., Suite 300, Rockville, MD 20852.



John M. Eisenberg, M.D.
Administrator
Agency for Health Care Policy and Research

Douglas B. Kamerow, M.D.
Director, Center for Practice and Technology Assessment
Agency for Health Care Policy and Research

The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Health Care Policy and Research or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.

This report compares new technologies for cervical cytological screening with conventional Papanicolaou (Pap) test screening in terms of diagnostic accuracy, costs, effectiveness, and cost-effectiveness in adult women of average cervical cancer risk.

Published literature on the accuracy of cervical cytological screening, costs of screening and treatment, and cost-effectiveness were identified in MEDLINE, CINAHL, CancerLit, EconLit, HealthSTAR, and EMBASE databases.

Diagnostic test studies were included if they compared cervical cytology diagnosis with concurrent colposcopy or biopsy and provided estimates of sensitivity and specificity. For the new technologies, studies were also included that used a cytology reference standard and allowed estimation of either sensitivity or specificity. Articles on costs and health outcomes were selected if they assessed the effect of screening on life expectancy or quality of life, number of cases of cervical cancer, or total health care costs.

For diagnostic test studies, paired reviewers independently abstracted sensitivity and specificity data from each study. Quality scores were assessed on blind interpretation of screening test results, histological reference standard, verification of test negative subjects, description of disease spectrum, avoidance of bias in sample selection, publication type, and source of support. Diverse articles on costs and health outcomes were summarized and quality-scored according to criteria published by an expert panel.

Supplemental analyses include a meta-analysis to generate summary estimates of Pap test discrimination; cost analysis using claims databases to generate costs of treatment and screening; and a Markov model to estimate the effectiveness and costs of different technologies and clinical strategies.

Conventional Pap smear screening, based on the few studies that avoided severe biases, showed specificity of 98 percent (95 percent confidence interval [CI], 97-99 percent) and sensitivity of 51 percent (95 percent CI, 37-66 percent). The sample prevalence of disease is strongly related to between-study variability in Pap test sensitivity and specificity and may reflect bias. Other indicators of study quality were not significant when prevalence was controlled. The Pap test is more accurate when a high-grade squamous intraepithelial lesion (HSIL) threshold is used with the goal of detecting a high-grade lesion than when lower thresholds, such as a low-grade squamous intraepithelial lesion (LSIL) or atypical squamous cells of uncertain significance (ASCUS), are used with the goal of detecting low- or high-grade dysplasia. Few studies of the new technologies used histology or colposcopy as a reference standard or allowed estimates of both sensitivity and specificity. In studies using a cytology reference standard, each of the new technologies appears to significantly improve sensitivity relative to conventional Pap smear screening; however, little information is available on the effects on specificity.

Cost-effectiveness ratios from published models comparing Pap smear screening with no screening fall into an acceptable range, but these models used parameters that overstate Pap test accuracy.

The base case estimate of the incremental cost-effectiveness of conventional Pap screening every 3 years compared with no Pap screening is $4,097 per life-year saved. A technology applied to the initial step in Pap screening that reduces the false negative rate by a factor of 0.6 at an incremental cost per slide of $10 has an incremental cost of $22,010 per life-year saved when performed every 3 years. With more frequent screening intervals, the cost per life-year saved is greater than $50,000. Technologies that allow 100 percent rescreening of slides initially read as normal by conventional Pap screening, at a reduction in false negative rate of 0.85 or higher, are more effective than technologies that improve initial screening with a reduction of 0.6. At these reductions in false negative rate, with identical incremental costs of $10 per slide and a screening interval of every 3 years, the cost per life-year saved of rescreening technologies compared with improved initial screening is $45,375. Findings were relatively insensitive to assumptions about cervical cancer incidence, cost of technologies, diagnostic strategies for abnormal screening results, and age at onset of screening. Findings were sensitive to both the reduction in false negative rate (i.e., improvement in sensitivity) and the relative specificity of the technologies compared with conventional Pap.
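The cost per life-year saved figures above are incremental ratios: the difference in cost between two strategies divided by the difference in life-years. The following is a minimal sketch of that arithmetic only. The function name and the per-woman totals are hypothetical placeholders chosen for illustration; they are not outputs of the report's model.

```python
def icer(cost_new, cost_old, life_years_new, life_years_old):
    """Incremental cost per life-year saved of 'new' relative to 'old'."""
    delta_life_years = life_years_new - life_years_old
    if delta_life_years <= 0:
        # The ratio is not meaningful if the new strategy gains no life-years.
        raise ValueError("new strategy gains no life-years")
    return (cost_new - cost_old) / delta_life_years


# Hypothetical discounted per-woman totals (NOT taken from the report).
no_screening = {"cost": 600.0, "life_years": 10.0}
screening = {"cost": 1000.0, "life_years": 10.5}

ratio = icer(screening["cost"], no_screening["cost"],
             screening["life_years"], no_screening["life_years"])
print(f"${ratio:,.0f} per life-year saved")
```

Each ratio compares a strategy with a specific alternative (no screening, or improved initial screening), so the same technology can carry different ratios depending on the comparator.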

Estimates of the sensitivity of the conventional Pap test are biased in most studies; based on the least biased studies, sensitivity is near 50 percent, much lower than generally believed. Newer technologies improve sensitivity compared with conventional Pap screening; however, there are no precise estimates for their effect on specificity. Under assumptions favorable to improved initial screening technologies and rescreening technologies, either approach can result in acceptable cost per life-year saved at 3-year Pap screening intervals. However, the imprecision in estimates of effectiveness and cost of the new technologies makes drawing firm conclusions about their relative cost-effectiveness problematic.

This document is in the public domain and may be used and reprinted without permission, except for those copyrighted materials noted for which further reproduction is prohibited without the specific permission of copyright holders.

McCrory DC, Matchar DB, Bastian L, et al. Evaluation of Cervical Cytology. Evidence Report/Technology Assessment No. 5. (Prepared by Duke University under Contract No. 290-97-0014.) AHCPR Publication No. 99-E010. Rockville, MD: Agency for Health Care Policy and Research. February 1999.

Worldwide, carcinoma of the cervix is one of the most common malignancies in women. It is estimated that approximately 13,700 new cases of the disease will occur in the United States in 1998. A woman's lifetime risk of being diagnosed with cervical cancer in the United States is currently 0.83 percent, and the risk of dying from the disease is 0.27 percent.

The incidence of cervical cancer and associated mortality have each decreased over 40 percent since 1973; the decreases are largely attributable to the success of mass screening using the Papanicolaou (Pap) test to diagnose premalignant or early-stage cases. The decreases in invasive cervical cancer incidence and mortality since the introduction of the Pap smear have been so dramatic that it is one of the few interventions to receive an "A" recommendation from the U.S. Preventive Services Task Force even though there are no randomized trials demonstrating its effectiveness.

Despite the indisputably dramatic impact of Pap screening, there is still uncertainty about the details of Pap smear performance, and much could be done to improve the performance of the test and followup of patients after screening. Controversy about the details of Pap smear performance is manifest in differing recommendations about the frequency of screening and the age (if any) at which screening may safely be stopped. A significant proportion of patients and providers fail to comply with even the least demanding recommendations for Pap screening frequency. Numerous barriers to screening have been identified that reduce access to Pap smears and other preventive services.

Recently, efforts to improve Pap smear performance have focused on reducing the number of false negative smears, that is, cases in which premalignant or malignant cells have been misdiagnosed as normal. Measures adopted to improve laboratory performance on this point include manual rescreening of a portion of slides initially evaluated as negative, an approach mandated by Federal law (Clinical Laboratory Improvement Amendments [CLIA]). Recently, several technologies have been developed to optimize Pap test screening by reducing the false negative rate. These technologies are a major focus of this report.

1. What is the accuracy of cervical cytology using conventional Pap smears and new technologies (thin-layer cytology, computer rescreening, algorithm-based decisionmaking technology) for detecting cervical cancer and its precursors?

2. What are the direct medical costs associated with cervical cancer screening, evaluation, treatment, and followup of cervical cytological abnormalities and treatment and followup of cervical cancer?

3. What are the effects on total health care cost, morbidity, and mortality of regular cervical cytological screening using thin-layer cytology and computer rescreening using neural network or algorithm-based decisionmaking technology compared with the conventional Pap smear in women participating in a screening program?

On the first point, the report will review published studies comparing cervical cytological diagnosis with clinical diagnosis based on colposcopy or biopsy. The results of this review will form the basis for a meta-analysis.

On the second point, the report will identify and examine current claims data and other datasets to estimate empirically costs associated with cervical cytological screening.

On the third point, the report will review the literature on the effectiveness and cost-effectiveness of cervical cytology screening and use these data to develop a comprehensive cost-effectiveness model to examine the impact of the newer screening technologies. In the absence of definitive clinical trials on key questions of cervical cancer screening, policymakers have relied on decision-modeling studies to integrate epidemiological data on the natural history of cervical cancer precursors, data on the performance of diagnostic tests for early cervical cancer or cervical cancer precursors, and data on cost. These models estimate the efficacy of various screening programs, balance estimated efficacy against estimated cost, and lead to decisions about appropriate screening intervals and age cutoffs.

Recent developments in specimen processing and interpretation may substantially improve the Pap smear as a diagnostic test for cervical cancer and cancer precursors. Three new devices recently approved by the Food and Drug Administration (FDA) are considered in this report: ThinPrep, Papnet, and AutoPap. The three devices employ three different types of technology: thin-layer cytology (ThinPrep) and computerized rescreening utilizing neural-network technology (Papnet) or algorithmic classification (AutoPap).

Each of these technologies was developed to reduce the false negative rate associated with cervical cytological screening. The two major components to this false negative rate are false negatives related to sampling error and false negatives related to detection error. About two-thirds of false negatives are a result of sampling error and the remaining one-third a result of detection error. Each of the new technologies is directed at one of these components of false negatives. Thin-layer cytology aims primarily to fix sampling error, whereas computerized rescreening targets detection error. This implies that neither technology will be able to reduce false negatives beyond a certain threshold.

One newly approved device, Papnet, uses neural-network computerized rescreening of Pap smears initially read as negative by a cytotechnologist. The system works by using automated computerized imaging of Pap smear slides and interpretation of images using a computerized algorithm to identify slides that are likely to contain abnormal cells. The Papnet system (Neuromedical Systems, Inc.) identifies cells or clusters of cells that require review and can display up to 128 images of the slide likely to contain abnormalities. These images can be reviewed by a cytotechnologist who can decide whether or not to review the slide using light microscopy.

The AutoPap 300 QC system (Neopath, Inc.), an algorithm-based decisionmaking technology, identifies slides exceeding a certain threshold for the likelihood of abnormal cells. The laboratory can select different thresholds corresponding to 10, 15, and 20 percent review rates. In contrast to random rescreening, the population of slides selected by the AutoPap 300 QC system is enriched with abnormalities and, at the 10-15 percent sort rate, this population of slides should contain 70-80 percent of the slides containing abnormalities missed by manual screening.
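The contrast between random and enriched rescreening can be made concrete with a little arithmetic. The sketch below assumes that a random sample of negative slides contains missed abnormal slides in proportion to its size, and it uses a hypothetical count of missed abnormal slides. The function names are illustrative and do not come from the report or from any vendor software.

```python
def missed_abnormals_in_review_set(n_missed, capture_fraction):
    """Expected number of missed abnormal slides that land in the review set."""
    return n_missed * capture_fraction


def enrichment_ratio(capture_fraction, review_rate):
    """How many times richer in missed abnormals the review set is than a random sample."""
    return capture_fraction / review_rate


# Hypothetical laboratory: 50 abnormal slides were read as negative.
n_missed = 50
review_rate = 0.10  # 10 percent of negative slides are re-reviewed

# Random rescreening: the review set holds about 10 percent of the missed slides.
random_hits = missed_abnormals_in_review_set(n_missed, review_rate)

# Enriched selection: 70 percent capture, the low end of the 70-80 percent range above.
enriched_hits = missed_abnormals_in_review_set(n_missed, 0.70)

print(random_hits, enriched_hits, enrichment_ratio(0.70, review_rate))
```

These counts describe which slides reach the reviewer, not how many are finally detected; a slide in the review set can still be misread on rescreening.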

A variety of other technologies or clinical strategies have been proposed to improve Pap testing including various devices for collecting a cytological sample from the cervix. Still other technologies have been proposed to augment or replace cervical cytological screening, including colposcopic photographs for review by experts (cervicography) and DNA testing for specific human papillomavirus (HPV). These technologies are not considered in the present report.

The primary target population for this evidence report is women of average cervical cancer risk in the United States who are candidates for Pap smear screening. For the purposes of our analysis, candidates for Pap smear screening include women between the age of onset of sexual activity and the age of 85.

Although a large proportion of cervical cancer occurs in women with very limited or no screening, we did not examine programs or policies designed to improve screening compliance. Some previous studies have focused on special populations such as elderly women and elderly women who have not previously been screened.

The principal practice setting considered is the primary care practice in the United States (general internal medicine, family practice, adolescent medicine, and obstetrics/gynecology) and government and nongovernment family planning clinics (e.g., Planned Parenthood, public health clinics).

The comprehensive review of the literature, from identification of databases through abstraction of individual articles into the evidence tables, was a multistep, sequential process. This process is detailed below.

MEDLINE, CancerLit, HealthSTAR, CINAHL, EMBASE, and EconLit computerized database searches, supplemented by manual journal searches and querying experts and device manufacturers, were the sources used to identify English language reports on the accuracy of cervical cytological screening, costs associated with screening and treatment, and cost-effectiveness.

Citations for the review of accuracy of cervical cytological testing were retrieved with a search strategy that combined various text word and index terms for cervical cytological tests with cervical cancer or dysplasia and sensitivity and specificity. The strategy to retrieve articles on the costs and health outcomes associated with cervical cancer screening combined cervical cytological test terms with terms describing cost analysis and mathematical modeling. Experienced librarians assisted with the design and translation of these search strategies for each database searched.

Separate sets of criteria for including articles in the evidence report were developed for the two topics that were the subject of literature reviews (diagnostic testing and cost and health outcomes). In each case, final screening criteria were developed through an iterative process. Each iteration of criteria was pilot-tested by each reviewer/abstractor on a subset of randomly chosen articles.

Articles on diagnostic testing were first screened based on information available through the online databases (primarily title, authors, and abstract when available). Citations were eliminated in Step 1 of the screening process if cervical cytology was not evaluated as a screening test or if the screening test results were not compared with a reference standard. In Step 2 of the screening process, full texts of articles were reviewed to select articles in which a reference standard of colposcopy or histology was used, the screening test and reference standard were reasonably concurrent (i.e., within 3 months), and sufficient data to calculate both sensitivity and specificity were provided (i.e., all cells of a two-by-two table). Of the 939 bibliographic references reviewed, 561, or approximately 60 percent, were excluded during the first screening, and another 293, or 31 percent, during the second screening. Eighty-six articles were included according to these criteria: 84 studies of conventional Pap screening and one study each of ThinPrep and Papnet. Because so few studies of the new technologies met the original criteria, we modified the criteria to include studies of the new technologies that used a cytology reference standard and allowed estimation of either sensitivity or specificity. We considered a total of 59 studies (12 on AutoPap, 27 on Papnet, and 20 on ThinPrep) during this final stage of the screening process (Step 3). The net result was the inclusion of 6 studies of AutoPap, 11 of Papnet, and 8 of ThinPrep.

Articles on cost and health outcomes of cervical cytological screening were selected if they assessed the effect of screening on life expectancy or quality of life, number of cases of cervical cancer, or total health care costs for any of the following cytological screening technologies: conventional Pap smears, thin-layer cytology, or Pap smears with computerized rescreening. Of the 672 articles identified, 638, or 95 percent, were eliminated during the screening process. Thirty-four articles were included in the review.

Key information was abstracted onto specially designed forms and verified by either duplicate abstraction (two-by-two tables) or overreading by paired clinician-abstractors. Differences were resolved by consensus.

For the diagnostic testing articles, both members of each abstractor team also independently completed two-by-two tables for each study, extracting the key data to calculate sensitivity, specificity, and prevalence and other data to be used in the meta-analysis. The main outcome measures considered were the sensitivity and specificity of cytological abnormality by Pap test for detecting cases, where "cytological abnormality" was defined by one of three thresholds ranging from atypical squamous cells of uncertain significance (ASCUS) (threshold 1) to low-grade squamous intraepithelial lesion (LSIL) (threshold 2) to high-grade squamous intraepithelial lesion (HSIL) (threshold 3), and where a "case" was defined as a histological diagnosis of dysplasia or carcinoma. Equivalent categories in other classification schemes were also used. Two-by-two tables were constructed for four different combinations of cytological versus histological thresholds: ASCUS/ cervical intraepithelial neoplasia (CIN1), LSIL/CIN1, LSIL/CIN2-3, and HSIL/CIN2-3.
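The quantities abstracted from each two-by-two table follow the standard definitions. A minimal sketch is given here; the class name and the example counts are hypothetical and were chosen only so that the arithmetic is easy to check.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of cytology result (test) against histological diagnosis (reference)."""
    tp: int  # cytology abnormal, histology shows a case
    fp: int  # cytology abnormal, histology shows no case
    fn: int  # cytology normal, histology shows a case
    tn: int  # cytology normal, histology shows no case

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical study of 1,000 women (NOT data from any included article).
table = TwoByTwo(tp=51, fp=18, fn=49, tn=882)
print(table.sensitivity(), table.specificity(), table.prevalence())
```

Changing the cytological or histological threshold moves women between cells, which is why a separate table is needed for each of the four threshold combinations.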

Quality scores for articles on diagnostic testing were assigned according to predetermined methodological criteria based on blind interpretation of screening test results, use of a reference standard of histology, selection of test negative patients for verification, avoidance of bias in sample collection, description of the spectrum of disease in the sample, publication as a full report (as opposed to abstract), and source of support.

The quality of articles on costs and health outcomes was described according to recently published criteria by an expert panel on cost and effectiveness in medicine.

We used the effectiveness score to combine data from multiple studies describing the performance of the conventional Pap test in discriminating between patients with and without cervical lesions. The effectiveness score takes account of both sensitivity and specificity by fitting a receiver operating characteristic (ROC) curve through a logistic odds transformation of the two and thus accounts for their interdependence. The effectiveness score is more normally distributed than either sensitivity or specificity and can be thought of as a gauge of the overall discriminatory ability of the test. Standardized effectiveness scores can be interpreted across different diagnostic tests. In general, a score of 3 reflects a test with good discrimination, whereas a score of 1 reflects a test that does not discriminate between disease positives and disease negatives.
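This summary does not state the formula for the effectiveness score. The sketch below assumes one published form of such a score, a scaled sum of the logits of sensitivity and specificity (equivalently, a scaled log diagnostic odds ratio). Both the scaling constant and the function names are assumptions made for illustration, and the report's own computation may differ.

```python
import math


def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def effectiveness_score(sensitivity, specificity):
    """ASSUMED form: (sqrt(3) / pi) * [logit(sensitivity) + logit(specificity)]."""
    return (math.sqrt(3.0) / math.pi) * (logit(sensitivity) + logit(specificity))


# Illustrative inputs only: one sensitivity/specificity pair, not a pooled analysis.
score = effectiveness_score(0.51, 0.98)
print(round(score, 2))
```

Because sensitivity and specificity enter through a single combined quantity, a study that trades one for the other by shifting its positivity threshold changes the score less than it changes either measure alone.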

We used maximum likelihood estimation techniques and a random effects model to calculate summary measures of effectiveness at each of the four explicit diagnostic thresholds (ASCUS/CIN1, LSIL/CIN1, LSIL/CIN2-3, HSIL/CIN2-3). We further evaluated the effect of variations in disease prevalence and in quality of study design and reporting on test discrimination.

Several available datasets were analyzed to estimate direct medical costs of screening, diagnosing, and treating cervical cancer, calculating separate estimates for women 20-64 years of age and those 65 years and older (eligible for Medicare). For women 20-64, the unit cost of screening, diagnosis, and treatment of cervical cancer was estimated from MEDSTAT data from 1992, 1993, and 1994, inflated to reflect 1994 charges and converted to costs using 1994 cost-to-charge ratios published by the American Hospital Association.

For women over 65, Medicare's resource-based relative value scale (RBRVS) fee schedule for physician services, Medicare's clinical laboratory fee schedule for laboratory services, and national average diagnosis-related group (DRG) payments for hospital admissions were used to identify the payments associated with services received for cervical cancer screening, diagnosis, and treatment. Charges and payment information obtained from all sources were then converted to reflect costs associated with the services provided and all costs were inflated to 1997 dollars.
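The two adjustments described above, converting charges to costs and inflating to a common year, are simple multiplications. The sketch below illustrates them under stated assumptions: the cost-to-charge ratio, the price index values, and the function names are hypothetical and are not the figures used in the report.

```python
def charge_to_cost(charge, cost_to_charge_ratio):
    """Estimate the underlying cost of a service from its billed charge."""
    return charge * cost_to_charge_ratio


def inflate(amount, index_from_year, index_to_year):
    """Restate an amount in target-year dollars using a price index."""
    return amount * (index_to_year / index_from_year)


# Hypothetical inputs (NOT from the report or its data sources).
billed_charge = 2000.0   # charge in source-year dollars
ratio = 0.60             # assumed cost-to-charge ratio
index_source = 200.0     # assumed price index, source year
index_target = 230.0     # assumed price index, target year

cost_source_year = charge_to_cost(billed_charge, ratio)
cost_target_year = inflate(cost_source_year, index_source, index_target)
print(cost_source_year, cost_target_year)
```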

We constructed a 20-state Markov model that follows a cohort of women from age 15 to 85 and assumes that there are no prevalent cases of HPV infection or squamous intraepithelial lesion (SIL) at age 15. Cycle lengths are 1 year long. No Pap smear screening is compared with the following screening strategies: conventional Pap smears at 1-, 2- and 3-year intervals, thin-layer cytology smears at 1-, 2- and 3-year intervals, and 100 percent computerized rescreening at 1-, 2- and 3-year intervals.
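The mechanics of a cohort Markov model with 1-year cycles can be shown with a much smaller example. The sketch below uses four states rather than the report's 20. Its state names and transition probabilities are hypothetical, and it does not reproduce the report's model structure or parameters.

```python
STATES = ["well", "sil", "cancer", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
# Row = state at start of cycle, column = state at end of cycle.
P = [
    [0.970, 0.020, 0.000, 0.010],  # well
    [0.300, 0.670, 0.020, 0.010],  # sil (may regress, persist, or progress)
    [0.000, 0.000, 0.850, 0.150],  # cancer
    [0.000, 0.000, 0.000, 1.000],  # dead (absorbing)
]


def step(dist, matrix):
    """Advance the cohort distribution by one 1-year cycle."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def run_cohort(matrix, start_age=15, end_age=85):
    """Follow a cohort that starts entirely in 'well' and count undiscounted life-years."""
    dist = [1.0, 0.0, 0.0, 0.0]  # no prevalent disease at the starting age
    life_years = 0.0
    for _age in range(start_age, end_age):
        life_years += 1.0 - dist[3]  # fraction alive at the start of the cycle
        dist = step(dist, matrix)
    return dist, life_years


final_dist, life_years = run_cohort(P)
print(dict(zip(STATES, final_dist)), life_years)
```

In a sketch like this, a screening strategy would be represented by changing the transition probabilities out of the precursor state in the years when screening occurs, and strategies would be compared by rerunning the cohort under each set of probabilities.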

We used a U.S. health system perspective and evaluated the direct and health-care specific costs associated with screening, diagnosis, and treatment of cervical cancer and its precursors. We did not consider other societal costs such as work loss. The model considers the following outcomes: cost per year of life saved, cost per cervical cancer death prevented and per cervical cancer case prevented, and the number of morbid therapies avoided.

We discounted costs and years of life at 3 percent annually in the base case and varied the discount rate from 0 to 5 percent in a sensitivity analysis. Specific parameter estimates were derived from a preliminary literature assessment conducted for this report and prior published models of cervical cancer screening.
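Discounting converts costs or life-years that occur in future years into present values. The sketch below applies the standard present-value formula to a hypothetical stream of one life-year per year and compares the 0, 3, and 5 percent rates mentioned above. The function names and the stream are illustrative only.

```python
def present_value(amount, rate, years_from_now):
    """Value today of an amount received years_from_now years in the future."""
    return amount / (1.0 + rate) ** years_from_now


def discounted_total(stream, rate):
    """Sum of a yearly stream, where stream[t] occurs t years from now."""
    return sum(present_value(x, rate, t) for t, x in enumerate(stream))


# Hypothetical stream: one life-year in each of 10 consecutive years.
stream = [1.0] * 10

for rate in (0.0, 0.03, 0.05):
    print(rate, round(discounted_total(stream, rate), 3))
```

Higher discount rates shrink benefits that accrue far in the future, which matters for screening because costs are incurred early and most life-years are gained decades later.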

Important findings regarding the accuracy of cervical cytological screening include the following:

The accuracy of the Pap test is strongly affected by disease prevalence. Higher disease prevalence is associated with higher estimates of sensitivity and lower estimates of specificity (with a greater effect on specificity). These findings are consistent with prevalence as a marker for workup bias and perhaps also reflect an imperfect reference standard that is more specific than sensitive.

Important findings regarding the costs of cervical cytological screening and cervical cancer diagnosis and treatment include the following:

Important findings from a review of previously published models of the cost and effectiveness of cervical cytological screening include the following:

Important findings from a new model of cost and effectiveness of cervical cytological screening include the following: