Cancer Screening
Cancer Incidence and Mortality
Summary Development
The Scientific Basis
Detection
High-Risk Populations
Cancer Recurrence
Improved Outcomes
Observed Patterns of Cancer Stage at Diagnosis
Interpreting Changes in Relative Survival Over Time
Study Designs
Disease-Specific and All-Cause Mortality Endpoints
Measures of Risk
Cancer Incidence and Mortality
In 2008, an estimated 1,437,180 people in the United States will be diagnosed
with cancer, and 565,650 will die of cancer.[1] Estimates of the premature deaths that
could have been avoided through screening vary from 3% to 35%, depending on a
variety of assumptions. Beyond the potential for avoiding death, screening may
reduce cancer morbidity since treatment for earlier-stage cancers is often
less aggressive than that for more advanced-stage cancers.
Several potential harms must be considered against any potential benefit of screening for cancer.[2] Although most cancer screening tests are noninvasive or minimally invasive, some involve small risks of serious complications that may be immediate (e.g., perforation with colonoscopy) or delayed (e.g., potential carcinogenesis from radiation). Another harm is the false-positive test result, which may lead to anxiety and unnecessary invasive diagnostic procedures. These invasive diagnostic procedures carry higher risks of serious complications. A less familiar harm is overdiagnosis, i.e., the diagnosis of a condition that would not have become clinically significant had it not been detected by screening. This harm is becoming more common as screening tests become more sensitive at detecting tiny tumors. Finally, a false-negative screening test may falsely reassure an individual with subsequent clinical signs or symptoms of cancer and thereby actually delay diagnosis and effective treatment.
In developing the cancer screening summaries, the PDQ Screening and Prevention
Editorial Board uses the following definitions:
- Screening is a means of detecting disease early in asymptomatic people.
- Positive results of examinations, tests, or procedures used in screening
are usually not diagnostic but identify persons at increased risk for
the presence of cancer who warrant further evaluation.
- Diagnosis is confirmation of disease by biopsy or tissue examination in
the work-up following positive screening tests. (Following a
positive screening result, cancer can often be ruled out by
procedures other than biopsy or tissue examination.)
The purpose of this summary is to present an explicit evidence-based approach
used in the development of the screening summaries. In reaching conclusions,
evidence on the balance of risks and benefits is weighed. Cost and cost-effectiveness, however, are not taken into account. Assignment of levels of
evidence associated with such screening tests is also discussed.
Summary Development
The cancer screening summaries are based on various levels of published
scientific evidence and collective clinical experience. The highest level of
evidence is taken as mortality reduction in controlled, randomized clinical
trials. The results of clinical studies, case-control studies, cohort studies,
and other information are also considered in formulating the summaries. In
addition, the incidence of cancer, stage distribution, treatment, and mortality
rates are considered. The summaries are subject to modification as new
evidence becomes available.
The Scientific Basis
At least two requirements must be met for screening to be efficacious:
- A test or procedure must be available to detect cancers earlier than if
the cancer were detected as a result of the development of symptoms.
- Evidence must be available that treatment initiated earlier as a consequence of
screening results in an improved outcome.
These requirements are necessary but not sufficient to prove the efficacy of
screening, which requires a decrease in cause-specific mortality. For example,
these two criteria are met in the case of screening for childhood neuroblastoma
by assessment of urinary catecholamine metabolites. On the basis of these
criteria, a mass screening program was conducted in Saitama Prefecture, Japan,
between 1981 and 1992 for 6-month-old infants.[3] Over that 12-year period, the
annual incidence of neuroblastoma in children younger than 1 year increased from about
28 per million to 260 per million but without a significant reduction in incidence in
children older than 1 year. Because there also was no reduction in mortality for the
disease, this experience provided strong evidence of overdiagnosis, that is, the
detection by screening of some neuroblastomas that would never have been
diagnosed clinically. Similar experiences have been reported elsewhere
in Japan [4] and in the Quebec Neuroblastoma Screening Project (QNSP) in Canada.[5] The history of screening for neuroblastoma also illustrates the benefit of undertaking well-designed evaluations of emerging screening technologies before implementing screening programs. Although such studies are very costly, it has been shown that the QNSP itself averted unnecessary morbidity for thousands of children, with cost savings plausibly estimated at 64.5 times the investment in the study.[6]
Detection
Direct or assisted visual observation is the most widely available examination
for the detection of cancer. It is useful in identifying suspicious lesions in
the skin, retina, lip, mouth, larynx, external genitalia, and cervix.
The second most available detection procedure is palpation to detect lumps, nodules, or tumors in the breast, mouth, salivary
glands, thyroid, subcutaneous tissues, anus, rectum, prostate, testes, ovaries,
and uterus and enlarged lymph nodes in the neck, axilla, or groin.
Internal cancers require procedures and tests such as endoscopy, x-rays,
magnetic resonance imaging, or ultrasound. Laboratory tests, such as the Pap
smear or the fecal occult blood test, have been employed for detection of
specific cancers.
The performance of screening tests is usually measured in terms of sensitivity,
specificity, positive predictive value (PPV), and negative predictive value (NPV).
Sensitivity is the chance that a person with cancer has a positive test.
Specificity is the chance a person without cancer has a negative test. PPV is
the chance that a person with a positive test has cancer. NPV is the chance
that a person with a negative test does not have cancer. PPV and, to a lesser
degree, NPV are affected by the prevalence of disease in the screened
population. For a given sensitivity and specificity, the higher the
prevalence, the higher the PPV.
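As a rough illustration of the prevalence effect, the following Python sketch computes PPV and NPV from sensitivity, specificity, and prevalence using Bayes' rule. The function name and the test characteristics (90% sensitivity, 95% specificity) are hypothetical and are not drawn from any particular screening test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    All three inputs are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: 90% sensitivity, 95% specificity (assumed values).
for prevalence in (0.001, 0.01, 0.10):
    ppv, npv = predictive_values(0.90, 0.95, prevalence)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
```

With these assumed values, PPV rises from under 2% at a prevalence of 0.1% to about 67% at a prevalence of 10%, while NPV stays above 98% throughout.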
High-Risk Populations
Some individuals are known to
be at high risk for cancer, such as those with a personal history of cancer or
a strong family history of cancer (in two or more first-degree relatives);
increasingly, as genetic mutations and polymorphisms are found to be associated
with specific cancers, high-risk individuals will be identified through genetic
testing. The type, periodicity, and commencement of screening in high-risk populations
for most cancers reflect the judgment of practitioners rather than
evidence from scientifically conducted studies. Physician judgment is needed in such circumstances to determine the
most appropriate application of available screening methods. Prudence suggests
increased vigilance in the higher-risk populations. At a minimum, this means
that the high-risk person is identified, is counseled appropriately, and
regularly undergoes those screening procedures that have been shown to be of benefit to the general
population.
Cancer Recurrence
Please see the PDQ treatment summaries for information on cancer recurrence.
Improved Outcomes
For nearly all cancers, treatment options and survival are related to stage, which is generally characterized by the anatomic extent of disease. On this basis, it is assumed that detection of cancer at an earlier stage may yield better outcomes. In the 1940s, a generalized staging classification of localized,
regional, and distant disease was developed to show long-term trends, and
it is still useful. In the more detailed TNM system, which has been
periodically modified, the (T)umor size, the status of the lymph (N)odes, and
the status of distant (M)etastases are also categorized. These elements are
grouped into stages 0, I, II, III, and IV according to their association with survival.
In general,
larger primary malignant tumors have a higher incidence of metastasis to
regional lymph nodes and to distant sites. Stage has such a profound effect on outcome
that all randomized treatment trials require the comparison of similar stages
in evaluating differences in outcome. Shifts in stage may also herald improved
survival and decreased mortality, though stage shift alone does not establish
benefit.
Biologic cellular characteristics of cancer, such as grade, hormone sensitivity, and gene overexpression, are recognized as important predictors of cancer behavior. For example, high-grade cancer may be fast growing and quick to metastasize regardless of stage at the time of diagnosis. Therefore, detection of these cancers when small may not affect outcome. Randomized controlled trials are most definitive in determining screening benefits.
Observed Patterns of Cancer Stage at Diagnosis
The Surveillance, Epidemiology, and End Results (SEER) Program
of the National Cancer Institute gathers cancer incidence data from 11
geographic areas, covering approximately 14% of the U.S. population. These
population-based data of long duration (1973–present) are a unique and
important resource in monitoring stage-related survival.
Interpreting Changes in Relative Survival Over Time
Increases in survival over time, however, even when based on data from tumor
registries such as SEER, which include all cases in a given population, are
difficult to interpret. They may reflect the benefits of early detection or
improved treatment or both, but they may also result from lead-time bias and
overdiagnosis, both of which occur commonly with screening.
Lead-time bias will result
in longer estimated survival of people with cancers that have been identified through screening because the time before the
cancer would have been diagnosed clinically is included in the calculation of survival.
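The arithmetic behind lead-time bias can be sketched in a few lines of Python. The ages below are hypothetical and chosen only for illustration: the same person dies at the same age in both scenarios, and only the date of diagnosis moves.

```python
def survival_years(age_at_diagnosis, age_at_death):
    """Survival as usually measured: time from diagnosis to death."""
    return age_at_death - age_at_diagnosis


# Hypothetical patient whose age at death is NOT changed by screening.
AGE_AT_DEATH = 70
AGE_AT_CLINICAL_DIAGNOSIS = 67  # diagnosed when symptoms appear
AGE_AT_SCREEN_DIAGNOSIS = 63    # same cancer found earlier by screening

lead_time = AGE_AT_CLINICAL_DIAGNOSIS - AGE_AT_SCREEN_DIAGNOSIS
clinical_survival = survival_years(AGE_AT_CLINICAL_DIAGNOSIS, AGE_AT_DEATH)
screened_survival = survival_years(AGE_AT_SCREEN_DIAGNOSIS, AGE_AT_DEATH)

print(f"lead time: {lead_time} years")
print(f"survival after clinical diagnosis: {clinical_survival} years")
print(f"survival after screen detection:   {screened_survival} years")
print(f"5-year survivor without screening? {clinical_survival >= 5}")
print(f"5-year survivor with screening?    {screened_survival >= 5}")
```

In this assumed case, measured survival increases by exactly the lead time and the person becomes a "5-year survivor," although death occurs at the same age in both scenarios.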
Overdiagnosis may result from finding cancers that would never have become
manifest clinically. By definition, these cancers have a good prognosis. For example,
autopsy series have shown a high percentage of occult early prostate carcinomas
in elderly men who died of causes unrelated to prostate cancer.[7] The
discovery of these cancers through screening could increase the number of cases
and give the appearance of stage shift, and of increases in survival or cure
rates, without necessarily reducing mortality. An analysis of data
reported by the SEER program between 1950 and 1996 found that changes over time in
5-year relative survival rates for 20 major cancers were essentially unrelated
to trends in mortality rates for those cancers over the same period.[8] The
authors suggest that changes in 5-year survival rates are largely due to earlier
diagnosis and to detection of subclinical cases that might never have surfaced
clinically. They conclude that inferences about the effectiveness of early
diagnosis or treatment should not be drawn from temporal changes in 5-year
survival rates, but rather should be based on changes in mortality rates. Thus,
changes in 5-year survival rates or stage shifts are not appropriate measures of the
effectiveness of screening for early disease. Reductions in incidence rates
for late-stage tumors represent a better measure of progress due to screening
than 5-year survival trends, although such evidence is less compelling than
reductions in mortality.
Study Designs
Varying study designs may be available to support a given summary. The
strongest design would be obtained from a
randomized controlled trial. It is, however, not always practical to conduct
such a trial to address every question surrounding the field of screening. For
each summary of evidence statement, the strength of the associated study designs is
listed. Five study designs are generally used in judging the evidence. In order of strength of design, the five levels are as follows:
- Evidence obtained from randomized
controlled trials.
- Evidence obtained from nonrandomized controlled
trials.
- Evidence obtained from cohort or case-control
studies.
- Evidence obtained from ecologic and descriptive studies (e.g., international patterns studies, time series).
- Opinions of respected authorities based on clinical experience, descriptive
studies, or reports of expert committees.
Experimental trials are designed to correct for or eliminate selection,
lead-time, length, healthy volunteer, and other biases when prospectively
testing a detection procedure to determine its effect on health outcome. The
highest level of evidence and greatest benefit from screening is mortality
reduction in a randomized controlled trial. For most sites, such evidence is
not available. Theoretically, it is possible to conduct randomized trials for most interventions, but the sample size that is
needed, the expense, and the duration of such trials for most cancers frequently make this approach impractical.
Therefore, evidence obtained by other methods is often used.
In certain cases, a preliminary alternative to using mortality reduction to evaluate a new screening modality could be a relatively short-term (e.g., several years) comparison of interval cancer rates observed in a randomized trial comparing the new test and the “standard” screening modality. If the new screening test has the potential to improve disease-specific mortality, repeated applications over a discrete period of time should result in a lower proportion of patients in the intervention arm presenting with symptomatic cancer (of the type screened for) between negative screens. That is, through increased early detection and resulting treatment, the new screening test prevents a higher percentage of clinically important asymptomatic lesions from progressing to overt cancer. Unlike cross-sectional sensitivity comparisons in which study participants receive both new and older screening modalities, this trial design allows for an estimation of the degree of overdiagnosis generated by a screening test. This comparison should take place within the context of a randomized controlled trial.[9]
Case-control and cohort studies provide indirect evidence for the effectiveness
of screening, but it is difficult to eliminate the contribution of the selection and healthy volunteer biases evident in these studies.
Ecological studies can demonstrate an association between the use of screening and a stage shift in cancer, which can provide indirect evidence of the value of screening. Such
evidence is particularly compelling for the effectiveness of screening for
cervical cancer.[10] Ecological correlation of mortality and intensity of
screening has been used in this context. Such studies do not prove a mortality-reduction effect, and the potential for bias to
invalidate inferences from nonexperimental studies or to give misleading
results can be substantial.[11-16]
Descriptive uncontrolled studies based on the experience of individual
physicians, hospitals, and nonpopulation-based registries may yield some
information about screening. The performance characteristics of various
detection tests, such as sensitivity, specificity, and PPVs, are generally first reported in such descriptive studies. The first
evidence that screening may be successful is an increase in the incidence of
early cancers as well as a decreased incidence of late-stage metastatic cancers
(stage shift); later, a reduction in deaths may occur. These descriptive
studies do not establish efficacy because of the absence of an appropriate
control group.
A more detailed description of how the overall evidence regarding benefits and harms of screening tests is graded by the PDQ Screening and Prevention Editorial Board can be found in the PDQ summary on Levels of Evidence for Cancer Screening and Prevention Studies.
Disease-Specific and All-Cause Mortality Endpoints
Disease-specific mortality has been the most widely accepted endpoint in
randomized clinical trials of cancer screening; however, the validity of this
endpoint rests on the fundamental assumptions that the cause of death can be
accurately determined and that the screening and subsequent treatment have
negligible effects on other causes of death. Recent reviews of randomized
clinical trials of cancer screening suggest that misclassification in cause of
death has been a major problem and that misclassification has led to an
overestimation of the effectiveness (or an underestimation of the harms) of
screening.[17-19] In contrast to disease-specific mortality, all-cause
mortality depends only on an accurate ascertainment of deaths and when they
occur and therefore is not affected by misclassification in cause of death.
One major limitation of the all-cause mortality endpoint, however, is that it is
unlikely to reveal a statistically significant effect of cancer screening
because this intervention is usually targeted to a disease that causes only a
small proportion of all deaths. Nevertheless, all-cause mortality should be
considered in conjunction with disease-specific mortality to reduce the
possibility that a major harm (or benefit) from screening is hidden by
misclassification in cause of death.
Measures of Risk
Several measures of risk are used in cancer research. Absolute risk or absolute rate
measures the actual cancer risk or rate in a population or subgroup (e.g., U.S.
population, or whites or African Americans). For example, the SEER Program reports risk and rate of cancer in specific
geographic areas of the United States.
Rates are often adjusted (e.g., age-adjusted rates) to allow a more accurate comparison of rates
over time or among groups. The purpose of the adjustment is to make the groups
more alike with respect to important characteristics that may affect the
conclusions. For example, when the SEER Program compares cancer rates over
time in the United States, the rates are adjusted to one age distribution. If
this were not done, cancer rates would seem to increase over time simply because the
U.S. population is getting older and the risk of cancer is higher in older age
groups.
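A minimal Python sketch of direct age adjustment follows. The two age groups, the population counts, and the standard weights are invented for illustration and are not SEER figures. The age-specific rates are identical in both periods, so any change in the crude rate comes only from the older age mix.

```python
def crude_rate(cases, population):
    """Cases per 100,000 persons, ignoring age structure."""
    return 100_000 * sum(cases) / sum(population)


def age_adjusted_rate(cases, population, standard_weights):
    """Directly standardized rate per 100,000.

    Each age-specific rate is weighted by that age group's share of a
    fixed standard population (the weights must sum to 1).
    """
    return 100_000 * sum(
        weight * c / n
        for c, n, weight in zip(cases, population, standard_weights)
    )


# Hypothetical age groups: [under 65, 65 and older].
STANDARD_WEIGHTS = [0.85, 0.15]  # assumed standard age distribution

# Same age-specific rates in both periods (100 and 1,000 per 100,000),
# but the later period has a larger share of older people.
early_population, early_cases = [900_000, 100_000], [900, 1_000]
late_population, late_cases = [800_000, 200_000], [800, 2_000]

for label, cases, population in (
    ("early", early_cases, early_population),
    ("late", late_cases, late_population),
):
    crude = crude_rate(cases, population)
    adjusted = age_adjusted_rate(cases, population, STANDARD_WEIGHTS)
    print(f"{label}: crude={crude:.0f}  age-adjusted={adjusted:.0f} per 100,000")
```

In this constructed example, the crude rate rises between the two periods even though the risk within each age group is unchanged, while the age-adjusted rate stays the same.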
Relative risk (RR) compares the risk of developing cancer among those who have
a particular characteristic or exposure with those who do not. RR
is expressed as a ratio of risks or rates; it ranges from zero to
infinity. If the RR is greater than one, the exposure or
characteristic is associated with a higher cancer risk; if the RR is
one, the exposure and cancer are not associated with one another; if the
RR is less than one, the exposure is associated with a lower cancer
risk (i.e., the exposure is protective). RR is often used in clinical trials of cancer prevention and
screening to estimate the reduction in cancer risk or risk of death,
respectively.
An odds ratio (OR) is often used as an estimate of the RR. It, too,
indicates whether there is an association between an exposure or characteristic
and cancer. It compares the odds of an exposure or characteristic among cancer
cases with the odds among a comparison group without cancer. Although not as intuitively understood as rates or risk, the OR is used because it remains statistically valid in some settings in which other measures of risk are not. For relatively
uncommon events/diseases such as a cancer diagnosis, it can be interpreted like an RR; however, it becomes a progressively less accurate estimate of the RR as the underlying absolute risk of disease in the population under study rises above 10%. ORs are typically used
in case-control studies to identify potential risk factors or protective
factors for cancer.
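The way the OR drifts away from the RR as disease becomes more common can be sketched in Python. The baseline risks and the assumed doubling of risk with exposure are hypothetical. For simplicity the sketch computes the odds of disease in each exposure group; in a 2 x 2 table this ratio is numerically equal to the ratio of exposure odds between cases and noncases.

```python
def relative_risk(risk_exposed, risk_unexposed):
    """Ratio of the two absolute risks."""
    return risk_exposed / risk_unexposed


def odds(risk):
    """Convert a risk (probability) to odds."""
    return risk / (1 - risk)


def odds_ratio(risk_exposed, risk_unexposed):
    """Ratio of the odds of disease in exposed vs. unexposed people."""
    return odds(risk_exposed) / odds(risk_unexposed)


# Hypothetical exposure that doubles the risk (RR = 2) at each baseline.
for baseline_risk in (0.001, 0.01, 0.10, 0.30):
    exposed_risk = 2 * baseline_risk
    rr = relative_risk(exposed_risk, baseline_risk)
    or_ = odds_ratio(exposed_risk, baseline_risk)
    print(f"baseline risk={baseline_risk:.3f}  RR={rr:.2f}  OR={or_:.2f}")
```

Under these assumptions the OR is nearly identical to the RR of 2 when the baseline risk is 0.1% or 1%, but it overstates the RR once the baseline risk reaches 10% or more.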
Risk or rate difference (or excess risk) compares the actual cancer risk or
rate among at least two groups of people, based on an important characteristic or
exposure, by subtracting the risks or rates from one another (e.g., subtracting
lung cancer rates among nonsmokers from those among cigarette smokers estimates the
excess risk of lung cancer due to smoking). This can be used in public health
to estimate the number of cancer cases that could be avoided if an exposure
were reduced or eliminated in the population.
Population-attributable risk measures the proportion of cancers that can be
attributed to a particular exposure or characteristic. It combines information
about the RR of cancer associated with a particular exposure and the
prevalence of that exposure in the population, and estimates the proportion of
cancer cases in a population that could be avoided if an exposure were reduced
or eliminated.
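One common way to combine RR and exposure prevalence is Levin's formula, shown in the Python sketch below. The text above does not specify which formula is intended, so treating Levin's formula as the method is an assumption, and the input values are hypothetical.

```python
def population_attributable_fraction(exposure_prevalence, relative_risk):
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)).

    exposure_prevalence is the proportion of the population exposed (0 to 1);
    relative_risk is the RR of cancer in exposed vs. unexposed people.
    """
    excess = exposure_prevalence * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical inputs: 20% of the population exposed, RR of 10.
fraction = population_attributable_fraction(0.20, 10.0)
print(f"attributable fraction: {fraction:.1%}")
```

The same RR yields a much smaller attributable fraction when the exposure is rare, which is why both quantities are needed.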
Number needed to screen estimates the number of people that must participate in
a screening program for one death to be prevented over a defined time interval.
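Number needed to screen is conventionally computed as the reciprocal of the absolute reduction in the risk of death over the interval. The Python sketch below assumes that definition; the 10-year risks of death are hypothetical.

```python
def number_needed_to_screen(risk_unscreened, risk_screened):
    """Reciprocal of the absolute risk reduction over a defined interval.

    Both arguments are cumulative risks of death from the cancer
    (proportions between 0 and 1) over the same time interval.
    """
    absolute_risk_reduction = risk_unscreened - risk_screened
    if absolute_risk_reduction <= 0:
        raise ValueError("screening must lower the risk for NNS to be defined")
    return 1 / absolute_risk_reduction


# Hypothetical 10-year risks of cancer death: 0.5% unscreened, 0.4% screened.
nns = number_needed_to_screen(0.005, 0.004)
print(f"number needed to screen over 10 years: about {nns:.0f}")
```

In this assumed example the relative reduction in deaths is 20%, yet about 1,000 people must be screened for 10 years to prevent one death, because the underlying absolute risk is small.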
Average life-years saved estimates the number of years that an intervention
saves, on average, for an individual who receives the intervention. This
reflects mortality reduction as well as life extension (or avoidance of
premature deaths).
References
1. American Cancer Society: Cancer Facts and Figures 2008. Atlanta, Ga: American Cancer Society, 2008. Also available online. Last accessed October 1, 2008.
2. Kramer BS: The science of early detection. Urol Oncol 22 (4): 344-7, 2004 Jul-Aug.
3. Yamamoto K, Hayashi Y, Hanada R, et al.: Mass screening and age-specific incidence of neuroblastoma in Saitama Prefecture, Japan. J Clin Oncol 13 (8): 2033-8, 1995.
4. Bessho F: Effects of mass screening on age-specific incidence of neuroblastoma. Int J Cancer 67 (4): 520-2, 1996.
5. Woods WG, Tuchman M, Robison LL, et al.: A population-based study of the usefulness of screening for neuroblastoma. Lancet 348 (9043): 1682-7, 1996 Dec 21-28.
6. Soderstrom L, Woods WG, Bernstein M, et al.: Health and economic benefits of well-designed evaluations: some lessons from evaluating neuroblastoma screening. J Natl Cancer Inst 97 (15): 1118-24, 2005.
7. Woolf SH: Screening for prostate cancer with prostate-specific antigen. An examination of the evidence. N Engl J Med 333 (21): 1401-5, 1995.
8. Welch HG, Schwartz LM, Woloshin S: Are increasing 5-year survival rates evidence of success against cancer? JAMA 283 (22): 2975-8, 2000.
9. Irwig L, Houssami N, Armstrong B, et al.: Evaluating new screening tests for breast cancer. BMJ 332 (7543): 678-9, 2006.
10. Hakama M, Miller AB, Day NE, eds.: Screening for cancer of the uterine cervix. Lyon, France: International Agency for Research on Cancer, 1986.
11. Connor RJ, Prorok PC, Weed DL: The case-control design and the assessment of the efficacy of cancer screening. J Clin Epidemiol 44 (11): 1215-21, 1991.
12. Friedman DR, Dubin N: Case-control evaluation of breast cancer screening efficacy. Am J Epidemiol 133 (10): 974-84, 1991.
13. Janzon L, Andersson I: The Malmo mammographic screening trial. In: Miller AB, Chamberlain J, Day NE, et al., eds.: Cancer Screening. Cambridge: Cambridge University Press, 1991, pp 37-44.
14. Moss SM: Case-control studies of screening. Int J Epidemiol 20 (1): 1-6, 1991.
15. Weiss NS, Lazovich D: Case-control studies of screening efficacy: the use of persons newly diagnosed with cancer who later sustain an unfavorable outcome. Am J Epidemiol 143 (4): 319-22, 1996.
16. Suzuki KJ, Nakaji S, Tokunaga S, et al.: Confounding by dietary factors in case-control studies on the efficacy of cancer screening in Japan. Eur J Epidemiol 20 (1): 73-8, 2005.
17. Black WC: Overdiagnosis: An underrecognized cause of confusion and harm in cancer screening. J Natl Cancer Inst 92 (16): 1280-2, 2000.
18. Olsen O, Gøtzsche PC: Screening for breast cancer with mammography. Cochrane Database Syst Rev (4): CD001877, 2001.
19. Black WC, Haggstrom DA, Welch HG: All-cause mortality in randomized trials of cancer screening. J Natl Cancer Inst 94 (3): 167-73, 2002.