fresh frozen plasma, and we're asking the same questions now in red
cells.
So in terms of red
cells as a whole, we've talked about the potential for benefit, life-saving
elements, but also possible harm, and I think there are several studies that
have looked at this.
This is actually a very large study looking at almost 80,000 patients with myocardial infarction, and the curves there, it's a very small, detailed graph, but effectively what it's showing is that mortality, or long-term survival after a myocardial infarction, is highly associated with the hematocrit at presentation.
In fact, in this 80,000-patient dataset, there were almost 3700 patients who received a transfusion during their care, and I think this is a very interesting, though again retrospective, study that looks at the association of transfusion with benefit, or the potential for benefit.
And if you look here,
on the table, you can see that the presenting hematocrit relates very
significantly to the association of mortality with transfusion.
So, in fact, if you presented with a hematocrit below 24, then with transfusion your survival was in fact fivefold that of your peers who didn't receive a transfusion. The associated benefit, if you call it a benefit, continues up until a hematocrit of approximately 33.
When you reach 33 to
36, as your presenting hematocrit, you don't appear to--it's pretty even,
whereas if you go above, there's in fact a statistical association with
worsened outcome.
So this is one of the studies that potentially suggests an association of transfusion with both life-saving benefit and a potential threat to outcome. There are several other studies in the kinds of populations I mentioned, after cardiac surgery and in critically-ill patients, and the Hébert study, which was a prospective randomized controlled trial with restrictive and liberal transfusion targets, is also a good example. No difference in overall outcome was noted in that study, but in some subsets, particularly the younger patients and the patients with lower APACHE scores, lower critical-illness scores, a post hoc analysis in fact showed a p value of less than .05 for the association of transfusion strategy with outcome.
So there is some evidence, retrospective and some prospective, suggesting that there is potential for both harm and benefit from red cell transfusion in general.
Well, with this in mind, the next question becomes: if you accept that there is potential harm from a transfusion, then one of the elements to look at is the age of
transfused blood. And what you're looking at here is a distribution of the age of blood when it's transfused to heart surgery patients at our institution. Notably, the median age here is 14 days, and that again is very similar to the Koch study that's been referred to before, and I think that is probably the reason why they chose 14 days, as much as any other cutoff they might have selected.
Notably, in the population of blood cells that are transfused, where it's been looked at, at least 20 percent of red cells are transfused after their fourth week of storage, and depending on the month you look at, up to 38 percent of packed red cells can be transfused within two days of expiry in some months. It's a highly variable variable.
So we've referred to
the storage lesion and it's not something I'm going to go into in great depth,
but very well-described, in many facets.
But I'm going to move on and look at some of the clinical studies that
are out there.
There have been a
whole slew of studies, many of them relatively small, as few as 30 patients,
that have shown associations of the age of blood, with older blood being
associated with a worse outcome in these populations, and as referred to, there
are also several studies that have shown no association.
Again, what's common to all of these, in fact, is the size of the datasets; they are all relatively small, mostly less than about 500 patients, which is why, I think, this publication that came out several weeks ago from Colleen Koch at the Cleveland Clinic is so interesting: it is the first very large dataset with which we've had a chance to assess the issue of blood storage age.
And what they did was
take a period of time, and a relatively homogenous group of cardiac surgery
patients at a single institution, and divide them, again, down the median age
of blood transfusion.
And they selected patients who, by chance alone, had received all of their blood transfusions at either less than or equal to 14 days of storage, or greater than 14 days, and you can see the breakdown there. And
they performed a propensity-adjusted, multivariable logistic regression analysis of the association of storage age with postoperative mortality within 30 days. And
then they looked at a Kaplan-Meier approach to long-term survival.
This is the table
showing the outcomes, and in hospital death was associated with a p value of
.004 for increased mortality with older blood, and as referred to by the
previous speaker, there were specific major organ complications also
associated, potentially implicating certain pathophysiologies in this
association.
And notably, other
organs were not associated, particularly the neurologic outcome and the
myocardial outcome, did not seem to be related to this association with
mortality.
They also, within the dataset that they looked at, had a composite outcome of organ complications and mortality, and developed this in terms of the oldest unit each patient had.
So even among the
patients that were in the older age group blood, they found patients who had
older blood tended to have a higher complication score. And I also want you to focus on the shape of this figure, because I'll be showing you some other data from our institution which you can compare with this.
And the long-term mortality, as we've already seen, was associated with a worse outcome when the blood was older, an effect which seemed to approximately resolve after about half a year.
So if we just move on
now to work at our institution, this was actually work we published prior to
the publication of the Koch data, so I'll show you some data we've done
subsequently to compare our outcomes with theirs, to see if they are similar,
and you can see the data there.
And these are the number of units that the patients were receiving, on average, and you can see that two is by far the most prevalent number of units.
We did not include within this period of time the patients who did not receive any blood transfusions. These were the blood storage solutions for that period of time; you can see they were AS-1 and AS-3.
And in contrast to the
average age of the units transfused, we looked specifically at the oldest unit
that any patient received, and you can see that in contrast to the median age of all transfusions being 14 days, if you look at patients receiving blood with
heart surgery, there tends to be an older average median age, and probably the
larger explanation for that is that many of these patients receive more than
one unit, and of course each time you receive another unit, you have a second
chance to get an older unit.
So median age, 19
days. So apologies for the numbers here
being small, but these are post hoc numbers, just to give you a comparison to
the Koch data, and the first numbers there are essentially reproduction of
their analysis, and you can see the 30-day mortality. The 30-day mortality was 1.68 percent in the patients who received only blood less than 14 days old, and 2.38 percent in those who received older blood.
Those are actually
very similar numbers to those in the Koch database, and the p value, while only
trending, was .14. As I say, this was
not the primary analysis.
When we looked at a
similar analysis, but looking just at the oldest unit, in this case we found
the p value was .003, and again if we used the median value of oldest unit,
less than--or more than 19 days, the value, p value was .01 in terms of
statistical significance.
But, in fact, the
primary analysis we were looking at was the storage age of the oldest unit, and
postoperative mortality, and similar to the Koch study, the long-term survival.
What we found, in
fact, was that the oldest unit was a strong predictor and highly associated
with mortality risk, even when the number of units transfused, and the Hammond
score, which is a well-validated preoperative mortality risk score, were
included in the model, and the associated risk was an increase of 20 percent
for every increase of seven days in the age of the blood.
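As a quick arithmetic check, and assuming (as an illustration, not a claim from the study itself) that the reported 20 percent increase per seven days of storage compounds multiplicatively, the implied per-day risk multiplier can be sketched as:

```python
# Assuming the 20%-per-7-days risk increase compounds multiplicatively
# (an illustrative assumption, not stated in the study):
per_week = 1.20                # reported risk multiplier per 7 days
per_day = per_week ** (1 / 7)  # implied per-day multiplier
print(round(per_day, 4))       # ~1.0264 per day
print(round(per_day ** 14, 2)) # ~1.44, i.e. 20% compounded twice over 14 days
```

So a two-week difference in the age of the oldest unit would correspond, under this assumption, to roughly a 44 percent relative increase in risk.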
But what we were interested in, in addition to evaluating for the presence of an association, was whether there was a curvilinear pattern to this association, if you wish, a shelf life. Was there a best-before date, or anything of that nature?
So we added in a cubic
spline analysis to this initial analysis, and--excuse me. This, again, essentially just shows the same
data I just presented in another format, showing that the effect was similar in
all groups relative to their preoperative predicted risk of mortality.
But this is our figure stratified by risk score, so by patient risk, and in fact very similar patterns, again, very similar to the pattern we saw with the Koch study, showing--or suggesting possibly--that there is a period after which this association changes in the steepness of its curve, if you wish.
Similarly, if we
stratify by the number of units patients received, again, a very similar curve
shape, and again, approximately 28 days, we see a change in the slope of the
curve.
Also similar to the Koch data, we found an association, not as strong as the Koch data showed, with the sicker and more transfused patients showing a statistical increase in their mortality risk when their oldest unit was an older blood unit.
So leukoreduction was
raised, actually, as another variable, and it turns out that in our dataset,
leukoreduction was introduced approximately halfway through the study, and so
we were actually able to introduce this as a question.
Mortality, in hospital and long term, has actually already been looked at with leukoreduction. In fact, one meta-analysis of trials with leukoreduction found no association with outcome.
Interestingly, though, in the three trials with cardiac surgery subgroups, one of the more critically-ill groups evaluated, they did see an association, in the three randomized controlled trials combined, of leukoreduction with reduced mortality over this period of time, 22 to 66 months.
As pointed out, the issue of leukoreduction potentially has some relevance to the age of stored blood, because if there are white cells lingering, then they have longer to show their displeasure and potentially release cytokines and other such substances.
And in fact one study, in postcardiac surgery patients, has looked at the association of leukoreduction with age of blood, and in fact demonstrated an increased incidence of pneumonia in the patients whose blood had not been leukoreduced.
Interestingly, in our
study we did not find any association between leukoreduction and patient
outcome.
So, in conclusion, I
think red cells can contribute to complications, in addition to their
life-saving potential, and certainly in the clinical arena, I think there's
been a move towards using them as a last resort, once we've exhausted all other
safer methods of avoiding extreme anemia. But prolonged storage of blood is relatively consistently associated with higher complication rates, including mortality, in some patient datasets.
And the role of leukoreduction certainly remains to be determined. And overall, my conclusions are that effectiveness trials, which are essentially a lot of what we've done in the past, have been sufficient, and are going to continue to be added to by post-marketing surveillance safety follow-ups.
So, as to how this relates: if you accept that the retrospective data are useful, then I would say they suggest that we shouldn't take steps backwards before we know that we're in the right place in terms of the approach to storage of blood. Thank you.
Questions?
DR. SIEGAL: Okay.
Thank you very much. Are there
questions?
Dr. Szymanski.
DR. SZYMANSKI: Thank you.
I was impressed that you brought the leukoreduction into this picture,
since right now we have been talking only about age of the storage, the length
of the storage as possible problem.
But we have not questioned what other features might be involved here, because when a red cell ages, many changes happen: nonviable cells increase, the ability to deliver oxygen decreases, and that's another possible parameter that can be involved, as well as the release of all these cytokines, and maybe the cell deformability changes, plus free hemoglobin may be a problem.
And I think it would
be really nice if some kind of prospective study could be done, that would
analyze various factors, that might be responsible during the, you know, during
the aging of the stored red cells. I
think that would be very important, because up to now, we have only talked
about viability. We haven't talked about
function and other characteristics of this which could be harmful in a clinical
situation.
DR. STAFFORD-SMITH: My comment is that, as I think we've referred to before, there's one study, by Hebert and colleagues, that looked at, in a small dataset of approximately 60 patients, the feasibility of adjusting the age of blood that people receive, to look at whether there is a difference in outcome.
I think the potential
limitations or the challenges that are going to occur in terms of designing the
ultimate study are, firstly, that--and we, in our institution, are beginning
this process--is the position of equipoise, and is there sufficient equipoise
to justify giving one person, or one group of people a particularly old unit of
blood?
And to demonstrate the
associations that are being demonstrated, you really have to move out towards,
you know, the significantly older blood, and even, for example, the Hebert
study was looking at four days old versus 19 days old, on average, and one
would expect that that is not really where the signal is, if there is a signal.
Now the only other
variable I--well, the only other thing I'd like to mention is we were highly
concerned, obviously, with the retrospective nature, and trying to do
everything we could to evaluate the dataset.
One thing that is certain is that when one is in the operating room and reaching for blood, one doesn't look at the date of expiry when one's looking at the patient characteristics. But we did look at the date of expiry and
patient characteristics for many of the variables that we had available, and
there were none where we were able to show any major deviation from standard,
you know, random, essentially, distribution of units to patients with various
characteristics.
DR. SZYMANSKI: Were these units given during the surgical or postoperative period?
DR.
STAFFORD-SMITH: I beg your pardon?
DR. SZYMANSKI: When were these units transfused, during
surgery or--
DR.
STAFFORD-SMITH: For our study, they were
from the start of the surgery to the discharge from hospital. Or death.
DR. CRYER: You had almost half your patients receive two units or less, it looked like. Was there a difference in mortality in that particular group? Because you wouldn't think that they would be dying from some complication of the operation; it would have to be more likely the blood.
DR.
STAFFORD-SMITH: Right. We broke down--we did a subanalysis of
patients, in fact, with one unit or one and two units, and, in fact, in that
dataset, this association was not present.
Now having said that,
that's a relatively small total number of patients also.
DR. FINNEGAN: I have two questions for you, actually. The first one's a little rude. Do you know, either in your study or in the
DR.
STAFFORD-SMITH: How do you mean, the
processing? You mean the target
hematocrit?
DR. FINNEGAN: Yes; for the 42 days. In other words, what we're looking at now, do
you know if you fell in the 67 percent threshold or in the 75 percent threshold
in the processing of your red cells?
DR.
STAFFORD-SMITH: Right. I apologize.
I don't know that.
DR. FINNEGAN: My second question is, Mark Gladwin has done some work in sickle cell patients and found that the native serum hemoglobin, that is, free hemoglobin in the serum, causes significant problems with nitric oxide scavenging. And so my question would be: Do you think that some of the problem you're seeing with the older cells is in fact related to the free hemoglobin?
DR.
STAFFORD-SMITH: That is very
possible. There are lots of other
circumstances that I can relate to clinically, where you see that. For example, when we've done some of the
trials with free hemoglobin, attempts to replace blood, blood substitutes, one
of the major problems is hypertension, for example, and acute kidney injury,
both of which are related, presumably, to the nitric oxide scavenging by the free hemoglobin.
DR. FINNEGAN: So, obviously, the higher the cell survival rate, the lower the free hemoglobin you're going to have, and the better the outcome?
DR.
STAFFORD-SMITH: Along that rationale,
that would be true; yes.
DR. KATZ: In the recent New England Journal paper, or in what you presented, were you able to control for the operative team that's doing these surgeries in these mortality analyses? The surgeon and--
DR.
STAFFORD-SMITH: No, I understand what
you're saying. We didn't make a specific
attempt, but again, I guess if I was to try and defend, you know, the analysis,
I would say that our desire to pick a younger or an older blood wasn't affected
by the surgical team.
In fact, I think we
were pretty much random in terms of which blood, other than probably the one at
the top of the pile of bloods under the ice bag that we picked, and again that
was pretty much random throughout every variable we looked at.
We didn't specifically
look at surgical team.
DR. SIEGAL: Dr. Rentas.
DR. RENTAS: I just happened to read the paper that you
just mentioned last night, and actually it was mentioned by Dr. Davey as
well. You showed Table 2, but Table 1 was never shown, and it seems to me that the preexisting conditions in some of the patients who were given older blood were increased as compared to the patients who were given fresh blood.
However, when you look
at the discussions, the authors really didn't go into any details about
that. Is there something you can say
about that?
DR.
STAFFORD-SMITH: Well, they actually also
have a figure which attempts to demonstrate a very similar pattern of
transfusion among the older and the younger patient groups in terms of blood.
I'm not sure that I
can explain why those patients had differences in the older and younger
patients.
DR. DI BISCEGLIE: This may be related; a technical thing in the conduct of all of these studies. You had said something like the more blood the patient needs, the more likely they are to --
DR.
STAFFORD-SMITH: Well, the more
opportunity, because sort of like if you throw a dice, there's more chances
you'll get a higher number.
DR. DI BISCEGLIE: I'm sorry.
I couldn't --
DR.
STAFFORD-SMITH: Sorry. Each time you throw a dice, you get another
chance to get a high number. So you just
take more chances to get a high number or an old blood, in fact. It's a potential bias of if you take a
patient with many transfusions, they just have many more chances to get an
older unit.
DR. DI BISCEGLIE: -- that the blood bank is giving you the
oldest, and then the next?
DR.
STAFFORD-SMITH: No.
DR. DI BISCEGLIE: Under a policy of first -- because if that's
correct, the next time you throw the dice, then --
DR. STAFFORD-SMITH: Well, at the risk of bringing out something
that's hard to explain, and maybe even harder to understand, we were as
concerned, as maybe you are about this.
So we tried to find a way in which we could assess for the potential for
bias because of the number of units a patient had received.
And so what we did was
we took the patient's blood unit number, and we took the last number of each
unit, and we considered that a random number generator.
And if you give a
patient more units, sure enough, the mean number that they receive goes
up. Gradually climbs. So what we wanted to see was, in our
analysis, if we used that analysis with this random number generator and
controlled for the number of transfusions, did an association between mortality
and increased transfusions disappear?
In other words, was it
adequately accounted for by controlling in our model for number of
transfusions? And in fact the effect
disappeared with the random number generator being used, whereas with the age
of transfusion being used, it was a robust finding and it didn't
disappear. I don't know if that is
clear.
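The check described above can be sketched in a few lines. This is a hypothetical simulation, not the actual analysis: it only illustrates why the last digit of each unit number behaves like a random number whose maximum climbs as a patient receives more units, just as each additional unit is another chance to draw an older one.

```python
import random

random.seed(0)

def max_last_digit(n_units):
    """Maximum of the last digits of n transfused unit numbers,
    treating each last digit as a uniform random draw from 0-9."""
    return max(random.randrange(10) for _ in range(n_units))

# The more units a patient receives, the higher the expected maximum,
# mirroring the extra chances of receiving an older unit.
for n in (1, 2, 4, 8):
    mean_max = sum(max_last_digit(n) for _ in range(20000)) / 20000
    print(n, round(mean_max, 2))
```

In the speaker's analysis, controlling for the number of transfusions made the association with this artificial "random" maximum disappear, while the association with storage age did not, which is the point of the comparison.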
DR. SIEGAL: Any more questions?
(No response.)
DR. SIEGAL: All right.
Then let's go on. Now we come to
Larry Dumont, director of Cell Labeling Laboratory, assistant professor,
DR. DUMONT: Members of the committee, and Dr. Epstein, and FDA, and those of you beyond the barrier, good afternoon.
It's been a
"fun" time. Dr. AuBuchon sends
his regrets. He really wanted to be
here. He knew it'd be a lot of fun in
the discussion.
I wanted to point out, for some of you who probably don't know, what the BEST Collaborative is. It's a group of manufacturing members and scientific members, and we're an independent group; we meet a couple of times a year, and we run lots of self-initiated, independent studies looking at improvements in blood safety.
This paper, I think you
all have reprints of. It should appear
this month in Transfusion.
My conflicts of interest have not changed in the last couple of hours. And what we want to talk about is the red cell performance criteria, of course, that FDA has on the slate. Now when they evaluate red cells, they of course look at several in vitro characteristics, like hemolysis, maintenance of ATP, etcetera, and also this autologous 24-hour recovery, which is the main point of discussion today, and in some cases they could go to clinical outcome or safety trials.
But again we're going
to look specifically at this one outcome; continue to look at it. I'm going to give you my own rendition of the
history of this. In 1947, we started out
with a mean 24 hour recovery of autologous radiolabeled red cells of greater
than 70 percent. That has morphed into a
greater than 75 percent recovery in 1985, and then the addition of a standard deviation requirement in 1998, and that has continued to morph until what we have today: mean recovery greater than 75 percent, the standard deviation criterion, and this business of the lower one-sided 95 percent confidence limit for the population proportion of successes having to be greater than 70 percent.
And this is where that
75 percent comes into play, and I call that the success threshold, and that's
where the business of 21 successes out of 24 trials, or 18 successes, at least,
out of 20 trials, comes from.
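The 21-of-24 and 18-of-20 figures follow from an exact binomial tail calculation. The sketch below is my own reconstruction, not the FDA's worked example: it checks whether the one-sided 95 percent lower confidence limit for the success proportion clears the 70 percent floor, which happens exactly when the binomial tail probability evaluated at 0.70 falls below 0.05.

```python
from math import comb

def binom_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def lower_limit_clears(n, x, floor=0.70, alpha=0.05):
    """True if the exact one-sided 95% lower confidence limit for the
    population proportion of successes, given x successes in n trials,
    lies above `floor` (i.e. the tail at `floor` is below alpha)."""
    return binom_tail(n, x, floor) < alpha

# 21/24 and 18/20 are the smallest success counts that clear 70 percent.
print(lower_limit_clears(24, 21), lower_limit_clears(24, 20))  # True False
print(lower_limit_clears(20, 18), lower_limit_clears(20, 17))  # True False
```

In other words, 20 of 24 or 17 of 20 successes would leave the lower confidence limit below 70 percent and fail the criterion.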
And I think the major
discussion is not that value. It's that
value, right there.
Well, as you already
know, we're talking about lots of percentages, and it gets confusing, at best,
but I think I can tell you it's going to be okay, we'll get through all these
things.
To try to help that,
I've actually constructed some cartoons, and we're all fairly comfortable, I
think, looking at Gaussian curves, and these are cartoons, because actually
these distributions are not strictly
Gaussian, and so my apologies to the whole horde of biostatisticians in
the room, but I think it'll make the point.
So in 1970, a
distribution that would look like this, with a mean of 70 percent, or higher,
that would be okay. And this, down here,
is the 24 hour red cell recovery.
Well, in about 1985,
that was moved up to a minimum of 75 percent for the mean. Then in the 1990's, there was the addition of
the standard deviation criteria.
Well, about 2004,
there was a decision made that everything less than 75 percent happens to be an
unacceptable individual recovery. So
anything down here is bad. One more little
apology for the cartoon. Microsoft won
the battle the day I was making these.
This is actually supposed to come right down to seventy-five. Couldn't figure that out. But I think you get the idea.
Well, the implication
for this, from a practical standard, is that a test red cell, or a new red cell
product, a new bag, new solution, new machine, really must have at least a 90.3
percent success rate for an 80 percent chance of passing an in vivo recovery trial, something that I would run in my laboratory.
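Assuming success on each unit is independent, the 90.3 percent figure can be checked directly: with a per-unit success probability of 0.903, the chance of at least 21 successes out of 24 comes out at about 80 percent.

```python
from math import comb

def chance_of_passing(p, n=24, need=21):
    """Chance a 24-subject trial yields at least `need` recoveries above
    the 75% success threshold, given per-unit success probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

print(round(chance_of_passing(0.903), 2))  # ~0.80
```

Note that this only covers the 21-of-24 count; the mean and standard deviation criteria would lower the pass probability slightly further.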
So that looks like this.
So for an 80 percent success probability in running a trial, a manufacturer that would come to me and want to contract with my research lab, we would hope that they would be up at about 87 percent, with that standard deviation criterion, to have a reasonable chance of passing the current criteria.
Well, we looked at
that and we asked a couple questions which have been asked today already.
The first one was what
is the clinical evidence that this number down here, this 24 hour recovery
number, has anything to say about the reactions that we see in patients, which
are real; but how does this help us with that?
And when we looked at
that, we kind of got a blank piece of paper out of it, and just one small
comment about the epidemiological studies that have been talked about
today. Certainly, I'm a big proponent of
those. I like them. I think they're great. But we need to realize that-- one
example. Red cells are not issued at
random. They absolutely are not.
I was reading the New England Journal paper from Cleveland one night; my wife, who's a blood banker, was making dinner. I got to the table that described the characteristics of the early and the late groups, and it showed ABO type. And I said, "Hey, Deb," and I briefly described the study for her, and I said, "What do you think the distribution of ABO types for the newer and older blood were?" She filled in the table.
I mean, every blood
banker knows that. So that's a real
fallacy of some of those studies. So we
came up with a blank sheet.
So our next question was, what is the capability of current red cell products to meet these criteria? So that's where we went. And our conclusion, which we'll go through in
more detail, is that a success threshold of 67 to 70 percent, instead of 75,
will provide a reasonable probability of passing FDA-proposed criteria for red
cell products in current use in the
And that would look,
again on this cartoon, a distribution like this, that's clearly moving up but
doesn't meet this previous high standard that I showed you.
So the objective of our study was to define the ability of currently-available red cell collection and storage systems to satisfy these criteria of in vivo recovery proposed by FDA for approval of red cell systems.
And the way we went
about it is we looked for data that was available for all approved, cleared
methods between 1990 and 2006. We went
directly to the laboratories that conducted these studies and we ended up with
data from 11 laboratories, and they sent the data into us and we put it in a
central database, all coded in secret, so nobody knew what it was.
Of course we had to do a data review and cleaning, to find where the obvious mistakes were, with double-checks, etcetera.
And then we also,
after that, we went back to the sponsors of these old studies, and we said
please dig out your records and verify that we have the right data that you had
submitted to FDA. So we went through
that verification.
Then we ended up with
a database that you already heard that we also shared at the request of FDA
with them, and they were actually very helpful in reviewing it in detail, and
so we got a pretty solid database.
We then stratified into three types of products: liquid stored for 42 days, so these are things that are in AS-1, AS-3, AS-5, stored in the refrigerator; gamma-irradiated products, stored 28 days post-irradiation; and products that have been frozen and deglycerolized, stored 15 to 30 days in the freezer.
Then we approached it from a sampling standpoint, where we went into each of these groups, and we drew samples, sets of 24 each, and we repeated that for a total of 5000 times. Some people might call this like a
So we repeated that
for each of these groups. Thank God for
computers. And out of 34 studies, we had
some leukocyte-reduced products, some nonleukocyte-reduced. We had some automated collections, some
manual collections. Of course we had the
frozen/thawed. Liquid stored
gamma-irradiated. In total, we had 941 evaluable recoveries in this dataset.
And for a descriptor
of each of these groups, we'll go through that now.
This shows for the
liquid-stored products, there were 641 of them, and this is the 24 hour
recovery, and this is a frequency histogram, and with the lowest recovery in
this report of 36 percent.
The frequency that was less than 75 percent in this group was 11.7 percent. Remember now, these are called failures, and if we just do a binomial expansion of this descriptor, looking at a sample size of 24, the probability of having greater than or equal to 21 successes by sampling out of this was 69.3 percent.
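The quoted 69.3 percent can be reproduced from the 11.7 percent failure rate by the same binomial expansion; this is a reconstruction of the stated calculation:

```python
from math import comb

p = 1 - 0.117  # per-unit success rate implied by 11.7% below the threshold
prob = sum(comb(24, k) * p**k * (1 - p)**(24 - k) for k in range(21, 25))
print(round(prob, 2))  # ~0.69, in line with the quoted 69.3 percent
```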
But we did this, and then in addition we did the resampling exercise. The gamma-irradiated products had 31 percent that were less than the 75 percent threshold, and you can see the means and standard deviations over here.
And the probability of
having a successful experiment against this population was exceedingly small,
less than 4 percent.
The frozen/thawed
products were--actually, they looked the best of them all. We had 5.6 percent less than 75 percent, with
95.7 percent chance of passing the criteria according to the binomial
expansion.
So how does this
work? Well, this is just an example of
one type of sample of 24 that we took.
So this was the 257th replica of 5000 that we did like this. We had a lab identifier. We knew what storage solution it was in. We had a recovery value.
And in this case we
were evaluating less than 70 percent recovery.
So you can see that this one is 64, this one is 67, this one was 58, and
that one was forty.
So those were failures at the 70 percent criterion, and so this sample of 24 did not pass any of the current criteria: the mean was less than 75 percent, it had greater than a 9 percent standard deviation, and we had four out of 24 less than 70 percent.
So again we repeated
this, we had 5000 groups like this where we had these descriptors for each of
the populations.
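The resampling exercise can be sketched as below. The recovery values here are simulated stand-ins (the real 641-recovery dataset is not reproduced), and only the mean and the 21-of-24 per-unit threshold criteria are checked, omitting the standard deviation criterion for brevity:

```python
import random

random.seed(1)

# Simulated stand-in for the 641 liquid-stored recovery values.
population = [random.gauss(82, 7) for _ in range(641)]

def sample_passes(sample, threshold=75.0, max_failures=3):
    """A set of 24 passes if its mean exceeds 75% and at most 3 of the
    24 recoveries fall below the success threshold (21-of-24 rule)."""
    mean_ok = sum(sample) / len(sample) > 75.0
    failures = sum(1 for r in sample if r < threshold)
    return mean_ok and failures <= max_failures

# Draw 5000 sets of 24 and estimate the chance of a passing trial.
results = [sample_passes(random.sample(population, 24)) for _ in range(5000)]
print(round(sum(results) / len(results), 2))
```

Each of the 5000 replicates plays the role of one hypothetical licensure trial drawn from the pooled historical recoveries.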
And then we
reevaluated at different cut levels, success thresholds, 75, 70, and 67
percent.
So here with the
resampling, this is with the 42-day liquid stored products. This is the number of recoveries in each sample that are greater than 75 percent; the samples down here are the failures, and the number of successes that we would have in doing this study with that population is about 67 percent.
They all passed the
mean criteria. They all passed standard
deviation. Well, 95 percent passed the
standard deviation criteria.
With the
gamma-irradiated product, just like the binomial predicted, it was a miserable
success against that criteria, 3.5 percent passed. The mean greater than 75 percent, 96 percent
of them passed that, didn't do so well on SD criteria.
Frozen/thawed products
did the best of them all, with 95 percent passing the 75 percent success
threshold.
So our initial conclusion on this was based on that failure rate, and on the concern that we didn't have a good correlation between 24-hour recovery and solid clinical outcome: that was an unacceptable number in our eyes.
And we felt that the
general clinical performance of these products is certainly adequate as proved
over years of clinical practice, and it represents the state of the art.
So we asked ourselves,
what's the sensitivity to the success threshold? This shows the chance of passing for the
liquid-stored products, the gamma-irradiated products, and the frozen products,
at success thresholds of 75 percent, 70 percent, and 67 percent, and you can
see that it's quite sensitive to the success threshold between 75 and 70 percent.
That made a big
difference, a little bit more difficulty if we dropped to sixty-seven. The paper describes how I got the 67
percent. We won't go into that today. And of course the gamma-irradiated product is
very sensitive in this range and doesn't have much effect for the
frozen/thawed.
So the summary for the
chance of passing these criteria, for the 42-day liquid-stored, gamma-irradiated,
and frozen products, shows that for the 42-day liquid-stored product, 100
percent of them passed the mean criterion, and 95 percent passed the standard
deviation criterion. With a sample size
of 24, 69 percent passed the 75 percent success threshold.
With a sample size of 20, it gets worse, as we would expect: 58 percent.
If we would modify the success threshold to 70 percent, you can see how
that probability or that power improves, and that's what it looks like for 67
percent.
So some other key
observations that I wanted to show the committee.
One is current red
cell products are not different than the study population that we examined, and
number two, there are differences between laboratories and/or study subjects.
So this is
liquid-stored products. This is the graph I
showed you a minute ago: 11.7 percent of these products were below 75 percent
recovery, and 4.5 percent were below 70 percent recovery.
Now here's some data,
recent data. These are from two studies
that are ongoing in two different laboratories.
These are controlled products.
This is a product that's being transfused this afternoon in our blood
banks, in our operating rooms, and out of 36 products, right here, four out of
36, or 11 percent, are less than 75 percent.
2.8 percent are less than 70 percent.
So, to me, this looks
like that. There are differences between
laboratories, and probably study subjects.
So this shows you the same recent data, two different laboratories, to
be unnamed. This shows 24 hour recovery,
exactly the same conditions, and you can see the distribution in the laboratory
that I call number one and you can see the distribution in laboratory number
two.
So we have some
differences between the labs and we clearly have differences between subjects.
We have some other
observations that we haven't had a chance to evaluate yet, but we're generating
hypotheses about whether this might be caused by effects specific to particular
subjects. So there's a whole host of
unanswered questions, I think, in this assay method.
So our conclusion is
that the FDA proposed success threshold of 75 percent is not validated against
currently-approved red cell products available in the
Based on actual in
vivo recovery performance, a success threshold of 67 to 70 percent will provide
a reasonable probability of passing FDA-proposed criteria for new products, and
it might look something like this, where the one-sided 95 percent lower
confidence limit for the population proportion of successes has to be greater
than 70 percent, with a success threshold of 67 percent.
In my view, if we're
going to use this kind of criterion, the mean and standard deviation are helpful,
but I don't think they should be in the criteria, because I think this takes care
of the whole issue. The other problem is that none of these distributions meet
normality assumptions, so it's a real
problem trying to make inferences with means and standard deviations.
Once again, we would
suggest not to make it unnecessarily burdensome for new innovations to enter
the market, and would suggest that a distribution that looks something like
this with--this is shown with a 70 percent cutoff--would be a reasonable
approach for a criteria for new products.
And I wanted to
acknowledge these are the study laboratories where the studies were done. The sponsors of these studies are shown
here. The individuals that worked really
hard to pull out the old dusty records are shown right here.
Thank you very
much. I'll take questions.
DR. SIEGAL: Questions for Dr. Dumont?
DR. FLEMING: Dr. Dumont, you had a key introductory slide,
about six slides in, that raised two critical questions, what's the clinical
evidence and what's the capability of current RBC products. I don't know if we can put that up here while
we speak.
AUDIENCE MEMBER: We can't hear you.
DR. FLEMING: Oh. So
we were asking for his slide that--it comes right after this, I think. Okay; there you go. Thank you.
So there are two key
issues here. One of them is what I might
call one based on clinical relevance issues, and another that is more
statistical power issues, and surprising to you, maybe, being a statistician, I
really want to focus more, right now, on the clinical relevance issues.
We will come back to
these power issues after the next presentation, when I think there's even more
data to address what is the likelihood of success.
But you spoke--and I
understand--you spoke with some concern about the reliability or validity of
the type of data that we've seen, indicating that these measures of success
could truly be relevant or related to what we really care about, which is the
risk of clinically-relevant outcomes--ventilatory support, renal failure, sepsis,
multiple organ failure, mortality, etcetera, and we spoke a lot about mortality
in the previous presentation.
I understand your
point. There is certainly valid reason
to be concerned about lack of randomization, etcetera. There still is, however, a considerable
amount of evidence there that raised some concern, even though you can validly
question the reliability of that concern.
But I haven't heard
what you've provided as the evidence for why the shift doesn't matter. So specifically what's not shown here in your
slide presentations, but what's shown in your paper, is that the median recovery
when you have frozen is 88 percent, and when it's that high, that's the reason,
as you're saying, when you're shifting, your distribution has shifted over
here. That's the reason you have a high
likelihood of meeting FDA criteria.
The median recovery
for gamma-irradiated is about 79 percent.
You've shifted this distribution over considerably, and as a result, you
correctly noted, you have a low probability under the FDA criteria, that those
interventions would be approved.
In essence, how is it
that you're explaining to us that while you're questioning the evidence that
shifting from here to here is in fact putting you at greater harm, what
evidence are you giving us that it's not putting you at greater harm? That's a very substantial shift here, and are
we saying, are you saying that it's perfectly okay to have a 78 percent, or 79
percent success, or average recovery, rather than 88 percent average recovery,
and that those two differences don't matter clinically?
What is the evidence
that you're--so you're contesting the evidence that says it does matter
clinically, but you're not giving us any evidence that says it doesn't matter
clinically, and that's a pretty substantial shift between the average recovery
for the frozen versus gamma-irradiated.
DR. DUMONT: So you're looking at the right side, I'm
looking at the left side, and I would submit that when we were here, when we
were here without the shaded area in there, that we had products, and in fact
we have products today that are being used, that have--they do have some kind
of risk profile associated with them.
I submit that putting
this mark at 75 percent is strictly arbitrary, and that it is not demonstrated
anywhere, that I'm aware of, that that is associated with any of the negative
events that we see in the clinic. And in
fact there would probably be an even stronger association if we would look at
other parameters such as 2,3-DPG level.
I mean, we can just
pick one, and, you know, there's fifty of them, and we use this one for good
reason, but I think my suggestion to the committee
to consider is that this is applying an unnecessary burden for new innovative
products.
DR. FLEMING: But what you're suggesting, that I agree
with, is the further you require this distribution to be shifted to the right,
the higher the burden it is for a product to achieve the criterion.
But the fundamental
argument for where that should be shouldn't be on a statistical power
calculation. It should be on a clinical
relevance situation. There have been data
put forward that say when you go from this region over toward the left, you're
going to be in a higher risk for clinical outcomes of concern.
You're contesting the
reliability of that data but you're not providing us any evidence that in fact
reassures us, that when you allow this distribution to shift substantially to
the left, that it's not going to be harmful.
It sounds a bit like
absence of evidence is evidence of absence.
We don't have data that it's a problem.
Therefore, it's not a problem.
DR. DUMONT: Well,
okay. I get it. Can I answer?
All right.
DR. DI BISCEGLIE: May I ask a clarifying question? I guess the answer
to Dr. Fleming's question, as you say, is that most of the approved products in
fact shift that curve to the
left now, and so we have the clinical outcomes that we have today. Isn't that the evidence that he's asking
for? No?
DR. DUMONT: I believe, in my view, that's the only
evidence we have. I believe the other
data that--where we say younger red cells have a higher recovery, and younger
red cells may have better clinical outcomes, that may be an example of true,
true and unrelated, because we have no data that says that this particular
measurement is causal in clinical outcomes.
DR. VOSTAL: If I could just make one point. The criteria that we're talking about today
really applies to liquid stored red cells, and the other conditions, which are
gamma-irradiated cells and frozen red cells, they're special cases, and we'll
discuss those at some other time. But
for today's discussion, it's only liquid stored red cells.
DR. SIEGAL: Dr. Cryer.
DR. CRYER: I'd like to ask, in the FDA presentation,
there were three different graphs they put up, and I assume those were all
liquid red cell products. Okay. Were those three all in your study as
well? Do you know? Because one had a huge variation and eight of
them below the line, and--
DR. DUMONT: I think they were; yes.
DR. CRYER: They were all over the place. And another one was really tight.
DR. DUMONT: Those data, I believe, are included in this
dataset. There's the additional 94, but
you're talking about the ABC slide?
DR. CRYER: Yes.
The ABC. Yes. The ABC slide.
DR. DUMONT: Where you had the--
DR. CRYER: And I guess the problem I have is if that's
true, A and--I think it was A and C, I can't remember. But one of them--whatever--there was one bad
one and there was one tight one, and I'm having a little trouble, why you would
think, using your statistical analysis here, that those two products were in
any way similar.
You're saying they
were both safe and fine, basically is what I'm hearing you say. And I wouldn't want one. The other one looked okay.
DR. DUMONT: Well I'm saying we don't have the clinical
outcomes to answer that question.
DR. CRYER: I would
agree with that, but you're measuring a process. You're not measuring clinical outcome. You're
measuring a process. This whole thing
measures a process--
DR. DUMONT: Absolutely.
DR. CRYER: --of how reliable the survival of red cells
is after a process. That's what it
measures.
DR. DUMONT: I'm sorry.
I can't address that any further.
DR. FLEMING: Before we lose this slide, I want to make
sure we keep our eye on the target here, because it doesn't matter that it's
liquid or not liquid. Suppose we are
just focusing on liquid.
What this slide is
saying is if you have liquid, where the distribution is here, centered around
ninety, that's going to be a product that's going to be just fine with the
current FDA criteria.
If you have another
product where it's centered around 78, make it liquid, it's not going to do
just fine under the FDA criterion that's currently in place.
But if you soften the
criterion, it too will do just fine, and so it doesn't--these issues are not
specific to whether it's liquid or frozen.
The point is whatever the formulation is, are we saying that if you have
a recovery that is normally distributed around 90, that's great. We're all agreeing that's great.
But if you have one normally distributed around 78, that's just
fine too. If you believe that, then we
should make these changes, and that's going to get those products on to the
market just as--or very readily. But
what's the scientific clinical rel--this isn't the statistical--clinical
relevance, that when you get to that much lower a recovery, it is in fact just
as good as when you had 90 percent recovery.
DR. ZIMRIN: I guess I'm a little bit naive, but I'm used
to scientific presentations that actually try and present all the data, and a
balanced view of the data. I find this a
little bothersome, that there's studies, that we hear about two studies that
suggest one thing, and we don't hear about the whole host that don't.
I mean--and I'm sorry,
I've forgotten his name--but a speaker implied that there are a bunch of small
studies and then the New England Journal study came along.
But there was actually
a study looking at more than 2000 patients in the
So I feel a little bit
frustrated here because I would like to have a scientific, thoughtful analysis
going on, and it seems that we've gotten sort of--I mean, this has gotten sort
of polarized in this, and I just find that disturbing.
So when you haven't
seen the data--but I don't think we've been presented with the data, actually,
at least all the data that's out there.
DR. VOSTAL: I'd just like to make a comment about the
European study. It didn't show a difference. However, those are red cells stored in a
different storage solution. It's called SAGM, which includes mannitol. So it's not the same storage solutions we use
in the
DR. KATZ: Larry, it seems like a lot of your argument
hinges on the ability to get something important to market, and it might help,
particularly nonblood bankers in the group, to have an idea of what it is that
we're having trouble getting, or will have trouble getting as a result of more
stringent criteria.
DR. DUMONT: You want an example of what kind of product
might be--there could be a new product for the processing of whole blood into
multiple components. That would be one
example.
There could be a new
type of blood bag that would not use DEHP plasticizer. There could be a treatment process to
inactivate pathogens in the blood product.
Those are types of things that would be subjected to this test. Is that what you were going for?
DR. CRYER: Yes.
Can I ask one more. Maybe, Tom,
you can help me with the statistical part, but it seems like if you're looking
for better accuracy in a process, the criterion that the FDA
put down, having a lower confidence limit above seventy, you can
achieve that by having a higher mean with the same variability, or you can
achieve it by having the same mean with less variability.
And it seems to me
that what we're asking for is more consistency in a process and less
variability, and this really only addresses fixing it by moving the mean.
DR. FLEMING: You are correct, and Dr. Dumont I think
acknowledged, when he was presenting, that we're trying to simply--and I
appreciate what he was trying to do. He
was trying to take some complicated issues, simplify them, assuming that you
have normal distributions, and you're absolutely right--you don't necessarily
have normal distributions, and the essence of what the FDA criterion indicates
is that you want the area under the curve, that falls to the left of 75
percent, to be rather low.
And you're exactly
right. You can get that by either
shifting the distribution to the right, with considerable variability, or
tightening the variability, getting more precision around having maybe not a
higher mean but a mean that's sufficiently above 70, 75 percent, that it's
sufficiently precisely estimated that you have a low probability of being below
seventy-five. So you're right.
DR. SZYMANSKI: These studies are really done on a technical level,
and there are technical variabilities, you know, you have to take that into
account. That doesn't mean that they are
the true, absolute--absolute truth.
For instance, if you
measure the red cell mass with different methods, you will get different results,
and right now, most of the red cell masses in these studies are measured with
technetium, and that has a higher grade than chromium.
And when I compared
red cell mass results measured with chromium or with technetium, I found 10 to
15 percent overestimation with technetium.
I presented that at the ASH meeting in 1996. So, you know, this also is one variable; one
has to consider what you use to measure these various values, and there can
be variations between different labs, depending on what their methodology is.
And then again I want
to bring up this donor and recipient variability, because that is a biological
variability which is hard to make, you know, totally uniform, so that there is no
variation. You can't have a
Gaussian curve with a very, very, you know, precise, narrow spread.
And so, I mean, you can
decide whatever you want, you can have perfect, you know, very high levels,
and very high thresholds, but then when you go to labs and you actually measure
these things, you might really not get them, and then you really have
difficulty in obtaining, you know, validation.
And if you apply the methodology as we have used it in the past, these
are the variabilities that there are. I
mean, it would be lovely if it would be much, much better. But those happen.
FDA can put high
values, expected to produce wonderful clinical outcomes. It might; but you might never be able to
measure that.
DR. DUMONT: Mr. Chairman--oh. Sorry.
DR. CRYER: One more methodological question that
addresses the variability issue. Do the
labs that do this sort of testing ever use a paired design, so that it'd be the
same person that got the control product one week, and then a
week later you did it with the new test product on the same person, in an
attempt to get rid of the variability between subjects.
DR. DUMONT: We do use paired designs at times. However, this criterion is not based on a
paired design. This is an absolute criterion. So I agree that the paired design would resolve
a lot of that issue.
DR. VOSTAL: If I could comment to that. When companies come to us and talk about the
design of these studies, we always suggest to them they should use a control
arm in their studies, so they can identify individuals who do have poor
recoveries, and those can then be excluded from the final analysis. But it's up to the companies to make that
choice. We don't require that they run a
control arm since we do have a standard, a cutoff standard.
DR. DUMONT: One of the questions that I had from a
regulatory standpoint, if we're going to make the leap to say that this axis, right
here, relates in some loose way to clinical efficacy, then if company XYZ comes
out, and they have a new product, and they compare it to red cells and ASX,
that's in current use, and they show that they're superior to that, then are
they going to be able to get a claim in the market, that they have a more
efficacious red cell?
DR. FLEMING: That doesn't follow. In
cholesterol-lowering agents, we use LDL changes to approve new agents, and
you have statins now that have a 30 percent reduction in cholesterol, and
they're approved based on that. Fortunately, we didn't approve lipid-lowering
agents when they had a 10 percent lowering of cholesterol that didn't provide a
benefit. But if you have a new cholesterol-lowering agent that gives a 50
percent reduction, it doesn't prove that you are better than one that has a 30
percent reduction.
Let's think of Dr.
Szymanski's concern about the variability -- lab variability--the conclusion
that I draw from that is it makes me more worried about using the surrogate at
all. I wouldn't push that argument too
hard, because the alternative to using a surrogate is to do large-scale
thousands of person trials with non-inferiority analyses ruling out that this
new formulation is unacceptably worse in terms of what we really care about,
which are the clinical outcomes.
So if we don't believe
in the surrogate at all, or we have considerable concerns about it, the
conclusion isn't to make it even weaker.
The conclusion is to turn to something else. So I'm not of the mindset, even though I'm a
critic of surrogates, that we have that level of concern, unless my colleagues
persuade me that the issue here is if we can still believe in this surrogate,
what is a level of rigor that we need to have, in order to be confident that we
are protecting the public, that we don't have a meaningfully less-effective
agent when you're using these measures?
And if the argument is
being given that this standard isn't going to be met by large fractions of
current state-of-the-art agents, I'd like to just defer until Dr. Kim's
presentation, because I think that isn't true either.
DR. RENTAS: If I could say really quick, I think the
numbers presented by Dr. He speak for themselves. Even when you apply the 95/70 rule back to
1998 to 2003, 17 out of 19 will meet that criterion. I just think that speaks for itself.
DR. SIEGAL: Last comment?
That's it.
DR. FINNEGAN: My comment was could you please take a break.
DR. SIEGAL: Yes.
Well, all right. Let's hear from
Dr. Kim and then take our break. And
those who need to take a break now are excused.
DR. KIM: Good afternoon. My name is Jessica Kim from FDA, and I'm a
biostatistician in the Division of Biostatistics. I'm holding a cough drop because I have very
severe coughing, so that's why my voice is a little weird.
My presentation's
title is Statistical Methods in the Evaluation of Red Blood Cell Products (in
vivo study), and I'm going to present statistical ways of understanding the
current FDA acceptance criteria for RBC products that the Agency has accepted
since 2004.
Here is the outline of
my presentation. First, I'm going to go
over timeline of when each element of the current acceptance criteria was
adopted.
And then the detailed
statistical procedures of the current acceptance
criteria will be discussed, and the discussion will focus on two items.
One is the criterion
that emphasizes the viability of the individual RBC products, and the second
one is the statistical power of a study, which played an important role in
analyzing the historical data. And then, briefly,
a summary of the BEST data will be provided, to continue into FDA's analysis of
the BEST and FDA-combined data, and then I will summarize my presentation.
Now here's the timeline.
Up until 1997, 75 percent RBC survival was used as the acceptance
criterion for RBC products. In the period 1998 to 2003, more specific
statistical criteria were settled on for acceptance: mean RBC survival of at
least 75 percent, standard deviation of at most 9 percent, and at least 20
units at two sites.
And then after 2004,
the one-sided 95 percent lower confidence limit for the population proportion of
successes needs to be greater than 70 percent, where a success is defined as
RBC survival of at least 75 percent.
That element was added to the current acceptance criteria for RBC
products.
Now, in summary, we
can see that there are two parts to the current acceptance criteria, and let
me look at the second part of the current acceptance criteria: the sample mean
is at least 75 percent, the sample standard deviation is at most 9 percent,
with at least 20 units in total at at least two sites.
These criteria are
mostly about the sample data, about the study result. They imply something about the population
proportion, but we do not make that connection using this criterion;
this is only a criterion for the sample data. The first
part of the current acceptance criteria is more about emphasizing individual
units' viability, and it connects to the population distribution.
Now, for the in vivo
study, the one-sided 95 percent lower confidence limit criterion is
equivalent to testing that the population proportion of successes is
greater than 70 percent, versus less than or equal to 70 percent.
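That equivalence can be made concrete with a few lines of standard-library Python. This is a sketch: it assumes the lower confidence limit is the exact (Clopper-Pearson) one-sided bound, which is one common way to construct it, not necessarily FDA's exact procedure.

```python
from math import comb

def binom_sf(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def lower_confidence_limit(successes, n, alpha=0.05):
    """One-sided exact (Clopper-Pearson) lower bound for the success
    proportion: the p at which observing >= successes has probability alpha."""
    if successes == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):                      # bisection on p
        mid = (lo + hi) / 2
        if binom_sf(successes, n, mid) < alpha:
            lo = mid                         # mid is still rejected; bound is higher
        else:
            hi = mid
    return (lo + hi) / 2

print(lower_confidence_limit(21, 24))  # just above 0.70 -> meets the 95/70 rule
print(lower_confidence_limit(20, 24))  # below 0.70 -> fails the 95/70 rule
```

Under this construction, 21 successes out of 24 is exactly the boundary case: the one-sided 95 percent lower limit crosses above 70 percent, matching the allowable-failures table discussed below.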
And for the corresponding
testing procedure to test such a hypothesis, we need a couple of
things to be prespecified ahead of time.
One is the definition
of success, and the other one is the determination of the study size with the
significance level and the desired power.
And here an
individual success is defined as in vivo RBC survival of at least 75
percent, and the significance level, in other words the false positive rate in
this case, is defined as one-sided 5 percent. And I would like to point out one thing
about this criterion. In a traditional clinical
trial under FDA regulation, a one-sided 2.5 percent level is used, or two-sided
5 percent, which means this criterion actually has a slightly higher false
positive rate than the traditional clinical trial under FDA regulation.
Now, before I go to the
actual statistical procedure for testing such a hypothesis, this slide shows the
graphical interpretation of the in vivo study hypothesis. This histogram is constructed using the
BEST data. If the population distribution of in vivo RBC survival
percent is given as in this graph in slide six, then, using the threshold value
of 75 percent to categorize each individual unit as a failure or success, the
area under the histogram on the right-hand side gives you the proportion of
successes, and in the in vivo study hypothesis we want to make this area
as large as possible, and actually we set this value at at
least 70 percent.
And you can also
notice, for the data from the BEST study, I checked the normality assumption. Unfortunately the normality assumption was
rejected; I believe this distribution curve is not symmetric, and we also
have some extreme values that violate the normality assumption.
So I want to make sure
that the mean of 75 percent and the standard deviation of 9 percent are about the
sample information, not about the population distribution.
So we are not saying
that the population distribution is centered at 75 percent.
Okay. Now, two.
The next question in testing such a hypothesis is the question about the
sample size. To meet the minimum acceptable proportion of
successes and to take care of the limited resources for conducting such studies,
FDA agreed on and recommended at least 20 units at at least two sites. And this table shows various
study sizes and the number of allowable failures out of a
specific study with a specific study size, to meet this
one-sided 95/70 rule.
So, for example, if a
study is conducted with a size of 24, and 21 of the units are greater than or
equal to 75 percent, then that study will meet the 95/70 rule.
And the next row: if a
study is conducted with a study size of 28, and 24 of them meet at least 75
percent, that study will meet the 95/70 rule.
And I want to point
out from this table that as the study size increases, the number of allowable
failures (the number of individual units that did not meet the 75 percent)
can increase, and the study can still meet the 95/70 rule.
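The allowable-failures table can be reproduced with standard-library Python, reading the 95/70 rule as an exact one-sided binomial test (this reading is an assumption, though it is equivalent to the exact lower-confidence-limit construction):

```python
from math import comb

def binom_sf(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_successes(n, p0=0.70, alpha=0.05):
    """Smallest success count that rejects 'proportion <= p0' at level alpha,
    i.e., that pushes the one-sided 95% lower confidence limit above p0."""
    for k in range(n + 1):
        if binom_sf(k, n, p0) <= alpha:
            return k

for n in (20, 24, 28):
    k = min_successes(n)
    print(n, k, n - k)   # study size, required successes, allowable failures
```

This recovers the examples in the table: a study of 24 needs 21 successes (3 allowable failures), a study of 28 needs 24 (4 allowable failures), and a study of 20 needs 18 (only 2 allowable failures).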
And that's partially
related to the previous question: with a larger sample size, the variation in
the estimated proportion of successes gets smaller, which means that with a
larger sample size you will still have a chance to meet the 95/70 rule even
with more individual failures.
Now the next slide
explains the relation between the sample size and the statistical
power. Here's the technical definition of
statistical power: statistical power
is the likelihood of achieving a statistically significant result if your
research hypothesis is actually true.
What that means, in
our situation, is: if, in fact, the population proportion of successes is
greater than 70 percent, what is the probability, the likelihood, that
an RBC recovery study will meet the acceptance criteria?
That's the
technical definition of statistical power, and that's the practical application
of statistical power to our situation.
Now the big thing
that I want you to notice in this definition of statistical power is the
assumption. How much confidence, how
strong evidence, do we have about the population distribution? Depending on that information, your
likelihood, your probability, your statistical power will be different.
And this next
statement simply states the relationship between sample size and
statistical power.
Now, if the likelihood
is good, if the chance that the conducted study will satisfy the
current acceptance criteria is good, in a sense at least 80 percent, then your
sample size would be considered adequate.
In other words, to
make it a negation, if the likelihood is not good, if you have a low chance of
meeting the current acceptance criteria, then your sample size would not be
considered adequate. To take care of that
issue, you can increase the
sample size to get higher power, a higher chance of meeting the acceptance
criteria.
So this is the
technical definition and the relationship of sample size and
statistical power, and in the next couple of slides I will show numerical
ways of looking at statistical power in relation to the sample size and the
big assumption, the "if" part.
Now this table, I
calculated all the powers with a different sample size, different study
size. So this number, 14/19 indicates
the study size, and the parenthesized value is the number of allowable failures
to meet the 95/70 rule.
And here, the first column
is the assumed true rate. If we had
prior knowledge of the population distribution, you can use .75; that's what
you believe about the population distribution.
Under that assumption you can calculate the power.
If you have prior
knowledge that the population
proportion of successes is .85, you can use that assumption for the population
proportion of successes and calculate the power.
Now I want you to pay
attention. Suppose you have a strong belief
that the population proportion of successes is
.9. As the sample size increases, the
power increases: the likelihood that
that particular study will meet the acceptance criteria increases.
Well, under a fixed
sample size, as the assumed population proportion of successes increases, the
power increases, because the true rate is moving farther away from the
hypothesized testing value, which is the 70 percent.
So what I want to
emphasize from this calculation, from this table, is that power depends on study
size and also on the assumed true rate, the population proportion of successes. Increasing the
sample size to get high power is obvious, but the question becomes:
how much information do we have about the population proportion of successes?
How much statistical
evidence or clinical evidence do we have? Which number
can we use to evaluate the likelihood that a particular study will meet the
acceptance criteria?
So I'm going to talk
about this estimation about the population proportion of successes, using the
FDA and BEST data.
Now this graph shows
with the selected study size, study size 33, 28, and 24, the power curve looks
like this, and again, the power increases as the true rate increases, and the
power increases as the sample size increases, study size increases.
So if you have a
strong belief, strong evidence, that the true population proportion of
successes is 0.875--that's the vertical line--then the larger the sample
size--the upper curve is the sample size of 33--the stronger the likelihood
that you will meet the criteria.
Okay. The next two slides are a brief summary of the
BEST data, and this table is quoted from Dr. Dumont's paper, and FDA
investigated and verified all the information as correct.
Now there were 42
liquid-stored studies--641 data values--and the percentage of individual RBC
recoveries meeting at least 75 percent was 88.3 percent. For the 70 percent
threshold value, 95.5 percent of them met that threshold.
And then 98.1 percent of them met the 67 percent threshold value, and the
mean recovery was 82.1, with a standard deviation of 6.71.
And so using those
proportions meeting the different threshold values from the BEST data, we can
calculate the power--the likelihood of meeting the current acceptance criteria
with a specific sample size of 24--and it was 0.693, 0.979, and 0.999. And as I
emphasized before, depending on the estimate of the true rate, the population
proportion of successes, the power increases, and this is just for a sample
size of 24; if you increase the sample size, the likelihood will increase.
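Those three power values can be reproduced from the BEST proportions just quoted. This is a sketch assuming the same pass rule of at most 3 failures out of 24 subjects (stated later in the discussion); the small gap from the quoted 0.693 plausibly reflects rounding of the input rates.

```python
from math import comb

def power(true_rate: float, n: int = 24, max_failures: int = 3) -> float:
    # Probability that at most `max_failures` of n subjects miss the
    # individual threshold, i.e. that the study meets the 95/70 rule.
    q = 1.0 - true_rate
    return sum(comb(n, k) * q**k * true_rate**(n - k)
               for k in range(max_failures + 1))

# BEST-estimated proportions of individual recoveries meeting the
# 75, 70, and 67 percent thresholds, respectively:
for rate in (0.883, 0.955, 0.981):
    print(f"assumed true rate {rate:.3f} -> power {power(rate):.3f}")
```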
Now this table
summarizes the BEST data by the year the particular product was approved and
whether the particular product satisfies the current acceptance criteria. So,
for example, there were two studies, retrospectively collected, approved in
the year 2000, and those two studies met the 95/70 rule.
In the year 2001, there were two studies collected, and one of them didn't
meet the 95/70 rule. And in the BEST data, there were four studies approved
from 2004 to 2006, and FDA was able to collect four additional approved studies
from 2004 to 2007, and those additional approved studies were added into FDA's
analysis.
Now here's the scatter
plot of all the data; the x axis indicates the year the study was approved,
with a decimal-point extension indicating the study number.
So in 1990, study
number nine was approved, and this is the distribution of the data, the values
of the 24-hour RBC recovery percentage. And
that horizontal line indicates the individual success threshold value for the
individual RBC recovery percentage.
And the circled studies
are from the four additional FDA data.
The next graph shows
the proportion of successes of each study, and again the x axis is categorized
by the year the study was approved and the study number, and the vertical axis
indicates the observed proportion of successes.
And the circled
locations show the studies that didn't meet the current criteria--in year 2001,
study number 31, and in year 2003, study number 21. All the others met the
current criteria, even before the 95/70 rule was adopted.
So overall, this graph
shows the improvement of the product as year goes by.
Now using the BEST and
the FDA combined data, I found the proportion of individuals who met at least
75 percent, and the proportions who met the 74 and the 73 percent thresholds,
and then with different study sizes, I calculated the power.
Again, I want to
emphasize that the power, meaning the likelihood that your particular study
will satisfy the current acceptance criteria, increases as the sample size
increases.
And again, if the assumed
true rate for the population increases with a fixed study size, the likelihood
increases.
So the key point is,
again, the same question that I raised before--how can we verify the
information about the true rate, the true population proportion of successes?
So one way to get a
better estimate of the population proportion of successes is to categorize the
data into different time periods. Now
this table shows 1990 through 1997, and then 1998 to 2007, and the significance
of the year 1998 is that it is the year when the standard deviation criterion
was adopted.
And as you can see,
the success rate from the data in the first period was 0.836, and the success
rate in the second period was 0.898, and because of the higher success rate,
the power--the likelihood that a particular study will meet the current
acceptance criteria--is significantly higher.
And the corresponding
graph to this table shows more obvious evidence. Here, across the first and
second time periods, the power increases significantly for the different
threshold-value groups.
Now in the next table, I
categorized the data into another two different time periods, and in this
table, the significance of the year 2004 is that it is the year when the
current acceptance criteria were adopted. Using the same
threshold values for individual success, the success rate was estimated for the
population proportion of successes, and the power was calculated with a fixed
study size of 24. And again, in the corresponding graph to this table, for the
different threshold-value groups, over the time periods, the power--the
likelihood, or the probability, that a particular study will meet the current
criterion, the 95/70 rule--is significantly increased.
Now the next table
categorizes the data into three different periods. Again, 1998 is when the
standard deviation criterion was adopted and 2004 when the current acceptance
criteria were adopted, and with the threshold value for an individual success
of 75 percent, the estimated population proportion of successes increases from
0.836 to 0.883, and then to 0.931, and the power significantly increases.
And the corresponding
graph to this table is given in this slide for the different success
threshold-value groups.
And as you can see,
over time the power increases, and it confirms the previous line graph: over
the years, the product has better quality, and the manufacturers were able to
produce better-quality RBC products.
So here's my
summary. FDA's current acceptance
criterion carries critical significance in ensuring that each recipient
receives a highly viable RBC product. The
statistical power to meet FDA's 95/70 rule depends on the study size as
well as on the estimate for the population proportion.
So to have a high
likelihood, a high power, one way is to increase the study size; another is to
estimate the true rate under the current acceptance criteria and/or under a
more recent time period, which is more relevant for estimating the population
proportion of successes.
In traditional
clinical trials under FDA's regulation, 80 percent power has been used. And the
data clearly show that the manufacturers are able to produce better products
over time, and the Agency believes the current acceptance criteria serve well
for the purpose of regulating viable RBC products.
And thank you, and
that's all for my presentation, and do we have any questions?
DR. SIEGAL: Are there questions for Dr. Kim?
DR. FLEMING: I'd like to just quickly step through slides
21, 17, and 9, and in reverse order, could you go back to 17, just to make sure
we're drilling down with the same common understanding.
So let's go to 21
first. My apologies. Twenty-one, first.
DR. KIM: I can't see this here. Which one is 21?
DR. FLEMING: Go forward, I think. So it's the fourth slide from the end.
DR. KIM: This one?
DR. FLEMING: Yes.
DR. KIM: Okay.
DR. FLEMING: Okay.
So just to break this down a little bit and try to put it into simple
terms. If you take these eight studies
that are the ones, the eight products that have been approved in the last four
years under the current criteria, these studies, when you pool together all of
the 173 people, in 93 percent they successfully hit the 75 percent
criterion. And so, in fact, if your new
product is just average relative to those eight--that is, in essence, as Dr. Kim
was talking about, from a hypothesis perspective, if we say our product, in
truth, is just the same as the average, not the best of the eight, not the
worst of the eight, just average--then we would have a 92 percent chance of
getting that product approved according to the current criteria.
Now if you say, well,
I'm not going to be that stringent, I'm going to go back to the 19 products
that have been approved since 1998, including a couple products that don't meet
this criterion, should they have been approved or not? We could debate that one.
But let's assume
they're good enough. I'm going to
include them in my average.
So if I go back to the
19, by the way, what we know is that 17 of the 19 do meet this current FDA
criterion. So the argument that these products are having trouble meeting this
criterion is hard to defend when you're looking at, of the products that are
approved, 17 of 19 hitting this criterion.
But if in fact we say
it's good enough to be the same as the average of all of these products--which
weighs more heavily the way things were in the late '90s compared to where they
are now--then go back to slide 17: when you pool those together, what you have
is a 90 percent success rate; 90 percent of all of those 549 people achieve the
75 percent recovery.
And if you, in truth,
have a 90 percent success rate, you've got just under an 80 percent chance that
your product will be approved.
So last slide; go back
to number nine.
DR. KIM: Number nine?
DR. FLEMING: Slide nine.
So you're back about eight slides.
It's the one that says at the top--it's the table for power. It's one before this. There you go.
So essentially, what's
happening is if you use the current FDA criterion based on a sample size of 24,
which means you can only have three or fewer failures, then where are you in
this true rate?
Well, in the last eight
approvals, their true rate, in truth, is 93.
Their true success rate is 93, so 92 percent of those products will get
approved.
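The "three or fewer failures" cutoff for a study size of 24 follows from an exact one-sided binomial test of H0: p <= 0.70 at the 5 percent level, which is one standard reading of the 95/70 rule; a sketch under that assumption:

```python
from math import comb

def upper_tail(successes: int, n: int, p0: float) -> float:
    # P(X >= successes) when each of n subjects succeeds with prob p0.
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
               for k in range(successes, n + 1))

def min_successes_to_pass(n: int, p0: float = 0.70, alpha: float = 0.05) -> int:
    # Smallest success count whose one-sided p-value rejects H0: p <= p0.
    for s in range(n + 1):
        if upper_tail(s, n, p0) <= alpha:
            return s
    raise ValueError("no passing count at this study size")

n = 24
s = min_successes_to_pass(n)
print(f"need {s} of {n} successes, i.e. at most {n - s} failures")
```

With n = 24, 21 successes gives a one-sided p-value of about 0.042 against p0 = 0.70, while 20 successes gives about 0.11, so the study can tolerate at most 3 failures.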
If you just say I
don't have to be as good as the products in the last four years. I'll be as good as the products, the 19 products
over the last decade, including two that didn't, in fact, meet these criteria,
then that success rate is 90, and so you'd still have almost an 80 percent
chance of getting the product approved.
So this is where truth
is in the last four years. This is where
truth is when you look at the average over the last decade. Now under these criteria, you're still going
to let half the products through that have only an 85 percent success rate.
You're going to let a
quarter of the products through that have only an 80 percent success rate. You're going to let one in eight products
through that have only a 75 percent success rate.
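Dr. Fleming's "half," "a quarter," and "one in eight" figures check out under the same binomial model assumed throughout this discussion--a pass requires at most 3 failures among 24 subjects:

```python
from math import comb

def pass_prob(true_rate: float, n: int = 24, max_failures: int = 3) -> float:
    # Chance that a product with this true per-subject success rate
    # slips through the at-most-3-failures-in-24 rule.
    q = 1.0 - true_rate
    return sum(comb(n, k) * q**k * true_rate**(n - k)
               for k in range(max_failures + 1))

for rate, label in ((0.85, "about half"), (0.80, "about a quarter"),
                    (0.75, "about one in eight")):
    print(f"true rate {rate:.2f}: pass probability {pass_prob(rate):.3f} ({label})")
```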
Remember, today, the
successful products are 93. If anything,
the issue is we're still, under the FDA criteria, letting a fair number of
products that look a lot less effective through.
But if your view is 75
percent's good enough, 75 percent is like 95, then let's weaken the FDA
criteria so we let more than one in eight of these products through. We can let six or seven of eight of these
products through.
So this column is
telling you exactly what the current success rate is, according to these
truths, and where truth is for the successful products in the last four years
is here, at 93, and even if you go back to the last decade, it's here, at 90,
and you're seeing a high approval rate, and you're even seeing the possibility
of some approvals for products that are discernibly less successful at
achieving 75 percent.
But if you want these
products, if you want products with a true rate of 75 rather than 93 to get
through, then let's weaken the FDA criterion.
DR. DI BISCEGLIE: A quick one, if I may. Just to understand the regulatory
process. Is it possible for a sponsor,
finding that they've got one too many failures, to go out and recruit another
six patients to increase their sample size?
DR. EPSTEIN: The statistician should answer, but the
answer is yes, but you pay a statistical price.
In other words, the number of additional subjects you have to study is
increased over the number you would have had to study, had you first selected a
larger study cohort.
But the answer is yes,
you can go back and study more and get a more accurate answer. But there is a statistical correction or
price you pay.
DR. CRYER: But that shouldn't hurt you if your product
really is good.
DR. EPSTEIN: That's correct. But I think we alluded to this, and I just
want to make one point clear, and Larry can comment on it, if he wishes.
I think part of the
underlying problem here is a business case, because what hasn't been made
explicit is that the cost per individual patient of doing a radiolabeled red
cell recovery is very high--I think it's in the region of 8- or $10,000 per
subject. And so the problem, from the
business point of view, is if FDA were to say, well, 80 percent power is
adequate because it's a typical standard of a drug trial, from the business
point of view, a company is saying, well, I don't want to run a 20 percent risk
that my product, which complies with the FDA standard, will fail in a trial
that's going to cost me around, you know, $240,000.
And I think that that
is part of the driver for wanting a higher level of assurance that the trial
will succeed.
But, you know, FDA's
point of view isn't to look at the cost of the trial. The trials are feasible and our goal is to
have the highest quality products that are achievable with the current
technology.
But I think that
there's been an undercurrent which has gone unstated, and Larry, if you want to
address it, I think it'd be helpful.
DR. DUMONT: It's true that these are very expensive
studies to run. The other thing, when
you're considering cost from a business case, is the calendar time that it
takes, and calendar time for people that develop products--I don't do that
anymore--to do, for example, a paired trial, I mean that's at least a six-month
endeavor, if not longer, because you have to wait for 42 days, you have to do
the study, etcetera, etcetera.
But the other piece
that doesn't have to do with the money is actually the number of subjects that
you expose to radioactivity.
And so if you drive up
the numbers, I mean, it's easy to work the numbers, but the numbers are
actually people that you expose to radioactivity. So the higher that goes, the more people are
put at risk, and I think that is in the purview of FDA.
I had a quick question
for Dr. Kim while she's standing there, if I could.
I was just curious,
for the last group of studies from 2004 onwards, where there's actually been so
much focus on this, how do you know that those subjects weren't selected
subjects--that, oh, we know that Mary Jean gives low recovery so we're not
going to enter her in the study; we're going to pick these people because they
always give us high recoveries?
How do you know that?
DR. KIM: That's not something that we can answer. That's a matter of the integrity of the
sponsors and how they conduct their study, and if we have that kind of problem
in regulation and submission, that ruins the relationship--the trust--between
the sponsor and FDA.
If you suspect that
there is such a case, then I think you have to report it to FDA.
DR. ZIMRIN: I find it truly amazing that two different
groups could look at the same dataset, or a very similar dataset, and come up
with conclusions that are so amazingly different.
Could you explain--and
this is probably a futile request--but explain in terms that a nonmathematician
could understand--how that could come about?
DR. KIM: The simple answer is the BEST study group focused
on estimating the true population proportion rate, combining all the data, and
the FDA looked at the separate time periods.
So we didn't change anything, we didn't manipulate; we used the same data. The
difference is estimating the true population proportion using everything all
together versus by time period, and the time periods are not randomly
separated.
It depends on when the new criteria were adopted--1998 was when the
significant change in the standard deviation criterion was adopted--so we
believe there is something different between the years 1990 to 1997 and then
1998 to 2003.
And 2004 is the
year when the current acceptance criteria were adopted. So that's the difference.
DR. ZIMRIN: One more question. Of the current additives and things that are
commonly in use, can you give me some sense of when they were approved?
DR. KIM: When they were--
DR. ZIMRIN: I mean these ones that we're talking about,
these awful ones, that only four or eight would be approved today. Do these include things that we commonly use
today?
DR. KIM: The dataset that was used. Doctor--
DR. ZIMRIN: No, no, no.
The product.
DR. VOSTAL: I can't really give you an exact date when those
were approved, but they were approved a long time ago. But there have been changes in the products
from the early times to now, and some of that was already discussed in terms of
leukoreduction--more of the products being leukoreduced--and also
there are more apheresis products on the market. So there has been a subtle change over time.
DR. FLEMING: Actually, I don't think the two analyses are
different, in the following sense. If
you go back to slide 21 again, if you could--it's the fourth slide from the
end. The BEST trial, as I
understand--the fourth slide from the end.
Okay; keep going. Okay.
When I was talking
through this, I focused first on what we've seen in the last four years, and
then I expanded to what you would see over the last decade, and when you bring
in this success rate and dilute this one, then you end up with a 90 percent
success rate on the individual basis, and that gives you 80 percent power.
If you bring in these
eight studies as well, then the success rate goes down and the power goes down,
and that's what BEST is looking at, plus I believe BEST only had four of these
trials.
And so that's the only
difference between what BEST was doing and what the FDA is doing, and again, to
me, slide nine is really the key issue because if you--one more time, if you
could, go back to slide nine. And so the
FDA is looking not only at four trials in the last four years, they're looking
at eight trials, and over those eight trials the success rate is 93 percent and
the power would be 92 for somebody who's just the average, similar to those
eight.
If you then add back
the 11 studies from '98 to 2003, your success rate dilutes to 90 percent, but
you still have almost an 80 percent chance of meeting the criteria.
If you dilute further
back to those studies that go back to 1990, then your success rate is going to
be dropping down into the area of 88 percent, 87 percent, and the power's going
to drop to 67 percent.
So again, the
fundamental issue is, if you have a product that's just average for what you've
had approved over the last four years, you've got a 92 percent chance of
meeting the FDA criteria.
If you have a product
that's just average over the last decade, you have an 80 percent chance of
getting the product approved.
But if you want to
allow products to be approved here that have only a 75 percent success rate,
when we can get 93 today, at least 90 today--but if 75 is good enough, then,
yes, that product only has a one in eight chance of succeeding.
But if we weaken the
FDA criteria, we can get a much greater chance of getting approved.
DR. SIEGAL: One last point.
DR. DUMONT: I just want to address that since that was my
study. I think the difficulty is we
looked at the data as the best representation for what's being used today in
the blood bank.
Dr. Kim looked at the
data differently. She was looking at
more of an instantaneous change in time.
So there is a difference in the way the data were looked at--the data didn't
change, but there was a difference in the way they were looked at. And if you
remember that last slide that I showed, those were current red cell products
that are made in the blood bank; they're leukocyte-reduced, and they look just
like the total population that we sampled from.
So that, in my view,
is the difference between the two.
DR. SIEGAL: Is there any more discussion at this point?
Dr. Cryer.
DR. CRYER: I have one last question on that. What percent of the patients that would not
meet the criteria,