Workshop participants:
Paul Bunn, MD
Gerard T. Kennealey, MD
Richard Pazdur, MD
Ellen Stovall
Laurie Burke, PhD, MPH
Steve Piantadosi, MD, PhD
Martin H. Cohen, MD
Scott Saxman, MD
Thomas R. Fleming, PhD
Mark Somerfield, PhD
Richard J. Gralla, MD
Mary Lopez Wilson
David Johnson, MD
Richard Kaplan, MD
Patricia Keegan, MD
(This document summarizes discussions among the workshop participants and does not necessarily represent the views of FDA/CDER.)
The United States Food and Drug Administration (FDA) is embarking upon a process to evaluate the endpoints used as the basis for approval of cancer drugs. In a series of public workshops, FDA will seek input from panels of experts on disease-specific endpoints to support drug approval. Issues identified by the workshops will be presented to the Oncologic Drugs Advisory Committee (ODAC). Subsequently, FDA will develop a series of guidance documents on endpoints for the approval of cancer drugs.
On
1.2 Endpoints supporting approval of drugs for lung cancer
Drugs approved in the past decade for first-line treatment of NSCLC are Navelbine, paclitaxel, gemcitabine, and docetaxel. Docetaxel was also approved for second-line treatment of NSCLC. In most cases, the primary basis of approval was a statistically significant improvement in survival demonstrated in one or two randomized controlled studies. In most instances the survival advantage was 6-8 weeks at the median (Table 2). The first-line docetaxel application was approved based on a survival benefit established through a non-inferiority comparison with Navelbine/cisplatin. All of the approved first-line regimens were cisplatin doublets except Navelbine, which was also approved for single-agent use.
Recently gefitinib received accelerated approval for third-line
treatment of NSCLC (after failure of platinum and docetaxel regimens) based on
a response rate of 10% and a median response duration of 7 months. Photofrin
was approved for local treatment (intrabronchial photodynamic therapy) of
patients with symptomatic obstructing NSCLC.
The endpoints supporting approval of Photofrin were the intrabronchial
tumor response rate and assessments on a pulmonary symptom severity scale
(evaluating cough, hemoptysis, and dyspnea).
Over the past decade, only one new drug application has been approved for the treatment of small cell lung cancer (SCLC). Topotecan was approved for second-line treatment based primarily on objective response rate (ORR), which was considered to represent benefit in that setting, with supportive evidence from pulmonary symptom assessments.
2. Lung cancer endpoints
2.1 Objective response rate
Effects on ORR have been used twice as the basis of lung cancer drug approval: once as a "reasonably likely" surrogate endpoint to support accelerated approval (AA) in NSCLC refractory to available therapy (gefitinib), and once as a full surrogate to support regular approval of second-line treatment in small cell lung cancer (topotecan). Recently, the RECIST criteria have provided a widely accepted common standard method for ORR measurement. Effects on ORR have not been demonstrated to reliably predict effects on survival in NSCLC, although some studies with higher ORRs have reported higher levels of symptom benefit.
Inclusion of stable disease in the endpoint definition has been
suggested to improve the prediction of survival effects. For regulatory
purposes, meaningful evaluation of such an endpoint would need to be done in
randomized controlled studies.
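The difference between ORR and an endpoint that also counts stable disease comes down to which best-response categories enter the numerator. The following is a minimal sketch, assuming RECIST best-overall-response labels (CR, PR, SD, PD, NE) and keeping non-evaluable patients in the denominator; the function name and the cohort are hypothetical.

```python
from collections import Counter

# RECIST best-overall-response labels assumed: CR (complete response),
# PR (partial response), SD (stable disease), PD (progressive disease),
# NE (not evaluable). NE patients stay in the denominator (an assumption).
def response_rates(best_responses: list[str]) -> dict[str, float]:
    """Return ORR (CR + PR) and a rate that also counts stable disease."""
    if not best_responses:
        raise ValueError("no patients")
    counts = Counter(best_responses)
    n = len(best_responses)
    orr = (counts["CR"] + counts["PR"]) / n
    # Response plus stable disease, often called the disease control rate.
    dcr = (counts["CR"] + counts["PR"] + counts["SD"]) / n
    return {"ORR": orr, "DCR": dcr}


# Hypothetical cohort of 100 patients.
cohort = ["CR"] * 1 + ["PR"] * 9 + ["SD"] * 30 + ["PD"] * 50 + ["NE"] * 10
print(response_rates(cohort))  # {'ORR': 0.1, 'DCR': 0.4}
```

In this hypothetical cohort, adding stable disease quadruples the rate. Whether that broader endpoint predicts survival better is exactly what the text says would have to be tested in randomized studies.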
2.2 Time to Progression
TTP has often been poorly defined and not rigorously evaluated. In most NSCLC studies submitted for drug approval where a survival benefit was demonstrated, a TTP effect was also noted. More importantly, however, it has not been established that benefit on TTP reliably predicts benefit on survival. In Cancer Cooperative Group studies of NSCLC, median overall survival (OS) is usually about twice the median TTP.
Simplified analyses of progression have been suggested, such as the percentage of patients who have progressed at a single pre-specified time. Retrospective analyses of a Cooperative Group database suggest that such a progression endpoint may predict effects on OS. In NSCLC, the relationship between surrogates such as TTP and survival has been difficult to evaluate because drugs have produced only minimal effects on survival. With the advent of more effective agents, these relationships need to be re-examined.
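Even a percent-progressed-at-a-fixed-time endpoint has to handle patients censored before that time. Dividing the number who progressed by the number enrolled treats censored patients as progression-free and so understates the rate. The following is a minimal sketch using the Kaplan-Meier estimator, assuming each patient contributes a follow-up time and a flag saying whether follow-up ended in progression. The function name and the data are hypothetical.

```python
def km_progressed_by(times: list[float], progressed: list[bool], t_star: float) -> float:
    """Kaplan-Meier estimate of the fraction of patients progressed by t_star.

    times[i] is the follow-up time for patient i; progressed[i] is True if
    follow-up ended in progression and False if the patient was censored.
    """
    if len(times) != len(progressed) or not times:
        raise ValueError("need matching, non-empty inputs")
    data = sorted(zip(times, progressed))
    at_risk = len(data)
    surv = 1.0  # estimated probability of remaining progression-free
    i = 0
    while i < len(data) and data[i][0] <= t_star:
        t = data[i][0]
        events = removed = 0
        # Group tied times: everyone observed at time t was at risk just before t.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            removed += 1
            i += 1
        if events:
            surv *= 1 - events / at_risk
        at_risk -= removed
    return 1 - surv


# Hypothetical follow-up in months for six patients.
times = [2, 3, 3, 5, 6, 8]
progressed = [True, True, False, True, False, True]
print(round(km_progressed_by(times, progressed, 6), 3))  # 0.556
```

A naive count (3 progressions among 6 patients by month 6) gives 0.50. The Kaplan-Meier estimate is higher because the two censored patients are removed from the risk set rather than counted as progression-free.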
2.3 Survival
As discussed earlier, most lung cancer drug approvals have been based on a significant improvement in survival. Survival is an optimal endpoint because its measurement is easy and accurate and the value of a survival improvement is unquestioned. Given the variation in survival among different study populations, survival must be assessed in randomized controlled studies. The most efficient design evaluates the potential superiority of the new drug regimen compared with a standard control regimen. In "add-on" designs (typically cisplatin with or without the new drug), the new drug does not have to "beat" a standard drug; it need only demonstrate additional benefit in combination. Because multiple NSCLC treatment doublets are now marketed and are associated with small survival benefits, it may no longer be possible to use a platinum-alone control arm. To date, no new three-drug combination has shown consistent superiority over two-drug combinations in either SCLC or NSCLC, and no application for approval has been submitted using this approach.
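The cost of detecting a small survival benefit can be estimated with a quick calculation. The following is a sketch, assuming exponential survival in both arms (so the hazard ratio equals the inverse ratio of medians) and using Schoenfeld's approximation for the number of deaths a two-sided log-rank test needs with 1:1 allocation. The medians (9.0 vs 7.6 months) are taken from Table 2. Alpha, power, and the function names are illustrative assumptions.

```python
import math
from statistics import NormalDist


def hr_from_medians(median_new: float, median_control: float) -> float:
    """HR (new vs control) implied by two medians if survival is exponential."""
    # Exponential hazard = ln(2) / median, so the ratio of hazards is the
    # inverse ratio of the medians.
    return median_control / median_new


def schoenfeld_deaths(hr: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate deaths needed for a two-sided log-rank test, 1:1 allocation."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hr) ** 2)


hr = hr_from_medians(9.0, 7.6)  # medians in months, from Table 2
print(round(hr, 3), schoenfeld_deaths(hr))
```

Under these assumptions, a 1.4-month median gain (HR of about 0.84) needs roughly 1,100 deaths. An HR of 0.67 would need only about 200, which is why small survival effects translate into very large trials.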
2.4 Non-inferiority survival studies in NSCLC
In many disease settings, efficacy can be established by showing that a new drug is "non-inferior" to an effective standard drug. As discussed below, non-inferiority (NI) studies are difficult to perform in NSCLC at the current time.
One can never prove that two treatments are equal. One can, however, show that a treatment is not worse than a standard treatment by more than some specified acceptable amount (the margin). A critical issue in determining this NI margin is identifying what treatment effect (TE) can reliably be attributed to the standard drug. The goal of an NI study in a regulatory setting is to compare the new drug with the standard drug and, through inference and statistical methodology, determine the fraction of the TE that is demonstrated to be retained (FTEDR). Through clinical and regulatory judgment, one then determines whether this is an acceptable fraction (AF) of the TE for the specific clinical setting. Important considerations in this judgment include the level of improvement that the new drug provides in safety, tolerability, or convenience relative to the standard treatment.
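To make the fraction-retained idea concrete, the following is a minimal sketch of one common way to compute a point estimate on the log hazard ratio scale. On this scale, the fraction retained equals 1 + log HR(test vs control) / log HR(control vs placebo). The sketch assumes that the historical control-vs-placebo effect still holds in the NI trial, which is what is often called the constancy assumption. It also ignores the uncertainty in both hazard ratios, which an actual NI analysis must account for. The function name and the hazard ratios in the example are hypothetical.

```python
import math


def fraction_retained(hr_test_vs_control: float, hr_control_vs_placebo: float) -> float:
    """Point estimate of the fraction of the control's effect retained by the test drug.

    Works on the log hazard ratio scale (HR < 1 favors the first-named arm) and
    assumes the historical control-vs-placebo effect still applies in the NI trial.
    """
    log_control_effect = math.log(hr_control_vs_placebo)
    if log_control_effect >= 0:
        raise ValueError("control must be better than placebo (HR < 1)")
    # Indirect test-vs-placebo effect: log HR(T/P) = log HR(T/C) + log HR(C/P).
    log_test_effect = math.log(hr_test_vs_control) + log_control_effect
    return log_test_effect / log_control_effect


# Hypothetical: the control cut the hazard of death by 20% vs placebo, and in
# the NI trial the test drug's HR vs the control was 1.05.
print(round(fraction_retained(1.05, 0.80), 2))  # 0.78
```

A test drug exactly as good as the control retains 1.0 of the effect. One whose HR vs control is 1/0.80 = 1.25 retains none of it, since it is then estimated to be no better than placebo.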
The field of NI analysis is still developing, and there is ongoing research and discussion about the most reliable and reasonable methods for determining FTEDR and about the most appropriate AF for different clinical settings.
NI trials are most readily conducted in settings in which the standard intervention provides a large TE, where the size of this TE has been precisely established in earlier randomized trials, and where these estimates of the TE from earlier trials are unbiased with respect to the effect the comparator regimen will actually have in the active control trial (this latter condition is often referred to as the “constancy assumption”). In the NSCLC setting, NI designs are difficult because of the small and poorly documented survival benefit associated with the active control treatments (standard first-line chemotherapy). Evidence documenting the TE of currently approved treatments in NSCLC generally consists of one or two trials showing a 1-2 month median survival difference of marginal statistical significance. Taxotere (docetaxel) is the only drug approved for second-line treatment of NSCLC. The only data available to estimate the Taxotere TE come from a 104-patient study comparing Taxotere with best supportive care. The small size of this study does not allow a precise estimate of the treatment effect (HR 0.56, CI 0.35 to 0.88).
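The imprecision matters because the NI margin is derived from the control's estimated effect. The following is a minimal sketch, assuming the reported interval is a 95% CI that is symmetric on the log scale (the confidence level is not stated above). It defines the fraction retained on the log hazard ratio scale. Using the CI limit closest to 1 is shown as one conservative convention, not necessarily the one FDA would apply. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def log_hr_se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error of log(HR) recovered from a CI assumed symmetric on the log scale."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def ni_margin(hr_control_vs_placebo: float, fraction_to_retain: float) -> float:
    """Largest test-vs-control HR that still retains the given fraction of the control effect."""
    # Retained fraction = 1 + log HR(T/C) / log HR(C/P) >= f
    # <=> log HR(T/C) <= -(1 - f) * log HR(C/P), because log HR(C/P) < 0.
    return math.exp(-(1 - fraction_to_retain) * math.log(hr_control_vs_placebo))


se = log_hr_se_from_ci(0.35, 0.88)
print(round(se, 3))                    # 0.235
print(round(ni_margin(0.56, 0.5), 2))  # 1.34, using the point estimate
print(round(ni_margin(0.88, 0.5), 2))  # 1.07, using the CI limit closest to 1
```

Requiring preservation of 50% of the effect allows a test-vs-Taxotere HR of up to about 1.34 if the point estimate of 0.56 is trusted. If the 0.88 limit is used instead, the allowed HR is only about 1.07. A standard error of about 0.24 on the log scale from a 104-patient trial leaves the margin poorly determined.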
2.5 Disease-free survival
FDA has long stated that, for adjuvant treatment, disease-free survival (DFS) would be an adequate approval endpoint in disease settings where most patients are symptomatic when disease recurs or where effects on DFS are a reliable predictor of effects on survival.
Most lung cancer patients are symptomatic when disease recurs, so it seems reasonable for ODAC to discuss whether delay of symptoms is a reasonable basis to justify the use of DFS for lung cancer drug approval and, if so, whether drug approval based on DFS should be restricted to less toxic therapies. Another approach would be to measure treatment-related and/or tumor-related symptoms directly using health-related quality of life (HRQOL) scales to determine whether treatment toxicity outweighs the DFS benefit. The adequacy of DFS as a survival surrogate in lung cancer cannot be rigorously tested with existing data because of the lack of effective adjuvant treatments. Two small studies with a significant or near-significant DFS benefit showed similar effects on survival, and one study showed a significant survival effect but no effect on DFS.
2.6 Patient-reported outcomes (PRO)
At presentation, over 90% of lung cancer patients with Stage III or IV disease report two or more disease-related symptoms. These commonly include pulmonary symptoms such as cough and dyspnea and the general symptoms of fatigue, pain, and anorexia; patients also experience high levels of psychological distress. Consequently, in addition to survival outcomes, information about treatment effects on the patient-reported outcomes of health-related quality of life (HRQOL) and symptom benefit is important.
2.6.1 Symptom assessments
To date, patient morbidity assessments used by FDA as direct support for cancer drug approval have consisted of measurements or observations that allowed FDA to infer symptom benefit, such as tumor responses paired with reported improvements in tumor-related symptoms or signs. The perceived advantages of a targeted assessment of symptoms include the potential brevity of such instruments relative to some multidimensional tools and the assumption that improvement in these measurements could reflect true patient benefit. However, this approach has several problems: a) the definition of effective patient benefit is not always agreed upon, b) clinical correlates of changes in symptom measures are not always known, and c) if the tool used is not validated, uncertainty about the accuracy of the endpoint could be a problem. In addition, instruments assessing improvements in symptoms require that patients have the symptoms at study entry, thus restricting patient eligibility.
2.6.2 Health-Related Quality of Life Instruments (HRQOL)
When evaluating the role of HRQOL instruments in the lung cancer drug-approval process, important issues are each instrument's relevance and validity in that setting. Psychometric properties are important to consider: do the instruments available for evaluating lung cancer meet well-established criteria for acceptability? Several existing measures applicable to lung cancer patients have undergone extensive psychometric evaluation. Because HRQOL instruments attempt to assess the impact of treatment and disease on multiple dimensions of importance to patients, they vary in length. Typically, quality of life scales are longer than scales that evaluate only a single symptom or a few symptoms. This greater length must be considered in the context of the problem of missing data. Additionally, differentiating among cancer-related symptoms, side effects of treatment, and symptoms or problems not related to cancer can often be difficult, and the impact of these may be best assessed using a multidimensional instrument.
Regulatory context will affect the appropriate use of instruments. HRQOL instruments have been proposed as primary efficacy endpoints determining whether a new drug is approved, as co-primary endpoints supporting drug approval, or as secondary endpoints to be described in drug labeling or to guide future research. For drug approval, FDA must find that the drug is both safe and effective. Therefore, assessments that reflect primarily a difference in drug toxicity cannot support approval without separate demonstration of effectiveness. PROs that blend together assessments of efficacy and safety may be acceptable as primary endpoints when the comparator drug is relatively non-toxic. When the comparator drug is toxic, however, it may be necessary to separately assess tumor-related PRO benefit and toxicity-related PRO benefit, or to measure effectiveness by non-PRO endpoints.
There are currently three lung cancer-specific instruments with acceptable psychometric properties published in peer-reviewed literature, using previously established criteria. All three are in common use: the EORTC QLQ-LC13, the FACT-L, and the LCSS. They share several features but also have some differences, which have been discussed in some detail in a recent comprehensive review (Drs. C. Earle and J. Weeks for the NCI-sponsored Clinical Outcomes Working Group (COMWOG)). Although these questionnaires differ in the number and format of their scales, all ask patients to rate the impact of lung cancer-specific symptoms and treatment-related complications on several dimensions of quality of life; all are brief, easy to administer, and have acceptable psychometric properties (feasibility, reliability, and validity). They are likely to measure accurately the positive and negative impact of disease and treatment, as expressed by patients, on the various dimensions of quality of life. All three of these instruments have undergone fairly extensive field testing and have been used in many trials in many countries. In trials using more than one instrument, the different instruments tend to show convergent results. It appears that the EORTC instrument is more frequently used in
· EORTC QLQ-LC13. The EORTC QLQ-C30 consists of 30 general cancer-related questions in Likert and numerical analogue scale (NAS) formats, covering the week before its administration. The LC13 adds 13 lung cancer-related questions (43 in total). The core instrument combined with the lung cancer subscale is estimated to take about 11 minutes to complete. It has been translated into 23 major languages.
· LCSS. The LCSS is a lung cancer-specific instrument. It concentrates on the symptoms of lung cancer, capturing overall quality of life only by a global question. It does not have a “general cancer” component and does not attempt to assess the toxicity of treatment directly. It consists only of 9 visual analogue scales (VAS) and 6 optional items for an observer to fill out for further context if desired, and asks about HRQOL in the previous 24 hours. Of the three lung cancer-specific HRQOL instruments, the LCSS has the most published literature documenting its psychometric properties.
· FACT-L. The FACT-G (the general component) consists of 34 questions, while the FACT-L (for lung cancer) currently adds 7 questions. The FACT has well-documented content validity. As with the LCSS, it was developed using input from patients as well as medical professionals for item generation and review. The FACT-L emphasizes social and emotional well-being, enhancing its multidimensional scope. The FACT may be best in situations where patients are less ill. It does not assess symptoms as comprehensively as the other two lung cancer-specific instruments and, therefore, has been most successful in monitoring patients receiving supportive care rather than aggressive anticancer treatment.
2.6.3 Conclusions regarding PROs in lung cancer
There is a clear need to evaluate patient-reported outcomes (PROs) in patients with lung cancer. While evaluation difficulties remain, recent trials indicate that initiatives to overcome these problems are meeting with some success.
Studies using symptom and quality of life endpoints have been successful in selected indications but present difficulties. Attention to the areas specified in the companion article (Gralla et al) is necessary if trials are to overcome common problems in PRO evaluation. Investigators need education in the importance and conduct of PRO research before trials begin. Patients must understand the study requirements as part of the consent process. Steps must be taken to keep missing data to a minimum. The endpoints and analysis plan need to be specified before the trial begins. Trials need to be properly powered and adequately controlled for the specified endpoints.
Further research in the evaluation and analysis of PROs will enhance this important component of cancer drug evaluation. Concordant evidence of antitumor activity (survival data, response rates, or prolongation of TTP) can be desirable. Indeed, PROs should be viewed as components of the total value of a treatment and, together with these other cancer endpoints, provide a mutually enhanced picture of the benefits and risks of anticancer therapies.
Table 1: Approved treatment indications based on disease-free survival (DFS), 1990-2002
Drug | Indication | Significant findings other than DFS | Year of approval
Anastrozole | Adjuvant therapy of postmenopausal breast cancer | | 2002
Busulfan | Induction therapy for bone marrow transplantation in CML | Time to engraftment | 1999
Paclitaxel | Node-positive breast cancer | Survival benefit | 1999
Epirubicin | Node-positive breast cancer | Survival benefit | 1999
Tamoxifen | Node-negative breast cancer | | 1990
Table 2: NSCLC approved first- and second-line treatments, advanced or metastatic disease
Treatments | # Trials | # of Pts | Endpoint | Result
First-line | | | |
Navelbine vs 5-FU/LV | 1 | 211 (2:1 randomization) | Survival | Median surv. 30w vs 22w, p=.06; 1-year surv. 24% vs 16%; RR 12% vs 3%
(a) Navelbine/cisplat vs cisplat; (b) Navelbine/cisplat vs Navelbine vs vindesine/cisplat | 2 | (a) 432; (b) 612 | Survival | (a) Median surv. 7.8m vs 6.2m, p=.01; 1-year surv. 38% vs 22%; RR 19% vs 8%, p<.001. (b) Median surv. 9.2m vs 7.2m vs 7.4m, p=.05; 1-year surv. 35% vs 30% vs 27%; RR 28% vs 14% vs 15%, p<.001
(a) Gemzar/cisplat vs cisplat; (b) Gemzar/cisplat vs VP16/cisplat | 2 | (a) 522; (b) 135 | Survival | (a) Median surv. 9.0m vs 7.6m, p=.008; TTP 5.2m vs 3.7m, p=.009; RR 26% vs 10%, p<.001. (b) Median surv. 8.7m vs 7.0m, p=.18; TTP 5.0m vs 4.1m, p=.015; RR 33% vs 14%, p<.01
Paclitaxel (135 or 200 mg/m²)/cisplat vs VP16/cisplat | 1 | 599 | Survival | Median surv. 9.3m vs 10.0m vs 7.4m, NS; TTP 4.3m vs 4.9m vs 2.7m, p=.05, .08; RR 25% vs 23% vs 12%, p=.001, <.001
Docetaxel/cisplat vs Navelbine/cisplat vs Docetaxel/carboplat | 1 | 1218 | Survival | Median surv. 10.9m vs 10.0m vs 9.1m, NS. Efficacy established by a non-inferiority analysis. Docetaxel/carboplat did not demonstrate preservation of 50% of the survival effect of Navelbine/cisplat.
Second-line | | | |
(a) Docetaxel vs best supportive care; (b) Docetaxel vs investigator choice | 2 | (a) 104; (b) 248 | Survival | (a) Median surv. 7.5m vs 4.6m, p=.01; TTP 12.3w vs 7.0w, p<.05; RR 5.5%. (b) Median surv. 5.7m vs 5.6m, NS; 1-year surv. 30% vs 20%, p<.05; RR 5.7% vs 0.8%
m = months, NS = nonsignificant, RR = response rate, TTP = time to progression, w = weeks