Approaches to Evaluation and Validation of Therapeutically Relevant Biomarkers
Annette Molinaro, PhD
Yale University School of Medicine
To avoid toxicity and expense, better tools are needed for selecting individual patients
for treatment and for accurately predicting who will and who will not respond to
treatment. Although new technologies for genomic profiling have been developed, none has
made it into clinical practice because it is difficult to develop biomarker classifiers
and to validate them sufficiently.
The main steps to developing a classifier are to:
- select a prediction model;
- split the sample data into training and test sets;
- perform feature selection;
- fit the model to the training set; and
- estimate the prediction accuracy with the test set.
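The steps above can be sketched in a few lines of code. The following is a minimal
illustration on simulated data, not the method used in the presentation: the
nearest-centroid model, the ranking of features by difference in class means, the
simulated data set, and every function name are assumptions chosen for brevity.

```python
import random

def make_data(n=60, p=200, n_informative=5, shift=1.5, seed=0):
    """Simulate n samples with p features; only the first few carry signal (assumed)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in range(n_informative):
            row[j] += shift * label
        X.append(row)
        y.append(label)
    return X, y

def class_means(X, y, features):
    """Per-class mean of each listed feature."""
    means = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    return means

def select_features(X, y, k):
    """Rank features by absolute difference in class means; keep the top k."""
    all_features = list(range(len(X[0])))
    m = class_means(X, y, all_features)
    scores = [abs(m[1][j] - m[0][j]) for j in all_features]
    return sorted(all_features, key=lambda j: scores[j], reverse=True)[:k]

def predict(centroids, features, x):
    """Step 1 (model choice): assign x to the class with the nearest centroid."""
    def dist(label):
        return sum((x[j] - c) ** 2 for j, c in zip(features, centroids[label]))
    return 0 if dist(0) <= dist(1) else 1

def run_split_sample(seed=0, k=5):
    X, y = make_data(seed=seed)
    # Step 2: split into training (two-thirds) and test (one-third) sets.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = (2 * len(idx)) // 3
    train, test = idx[:cut], idx[cut:]
    X_train = [X[i] for i in train]
    y_train = [y[i] for i in train]
    # Step 3: feature selection uses the training set only.
    features = select_features(X_train, y_train, k)
    # Step 4: fit the model (class centroids) on the training set.
    centroids = class_means(X_train, y_train, features)
    # Step 5: estimate accuracy on the untouched test set.
    correct = sum(predict(centroids, features, X[i]) == y[i] for i in test)
    return correct / len(test), len(train), len(test)

if __name__ == "__main__":
    accuracy, n_train, n_test = run_split_sample()
    print(f"train={n_train} test={n_test} test accuracy={accuracy:.2f}")
```

The test set is touched only once, in the final step, so the reported accuracy is not
influenced by feature selection or model fitting.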
Because it is possible to find a seemingly perfect classifier even when no signal is
present, some form of training and test set must be used to guard against over-fitting
and chance findings. It is important to remember that the model must not be adjusted or
refit on the test set and that feature selection must be done within the training set only.
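The effect of selecting features outside the training set can be illustrated on pure
noise. The sketch below is an assumption-laden toy example, not taken from the
presentation: the data contain no signal, the classifier is a nearest-centroid rule,
and all names are hypothetical. It compares leave-one-out accuracy when features are
chosen once on all samples with accuracy when they are re-chosen inside each training
set.

```python
import random

def noise_data(n=20, p=1000, seed=1):
    """Pure-noise features with arbitrary alternating labels (no real signal)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

def top_features(X, y, rows, k):
    """Top-k features by absolute class-mean difference, using only `rows`."""
    g0 = [i for i in rows if y[i] == 0]
    g1 = [i for i in rows if y[i] == 1]
    def score(j):
        mean1 = sum(X[i][j] for i in g1) / len(g1)
        mean0 = sum(X[i][j] for i in g0) / len(g0)
        return abs(mean1 - mean0)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def centroid_predict(X, y, rows, features, x):
    """Nearest-centroid prediction for x, with centroids computed from `rows`."""
    dists = {}
    for label in (0, 1):
        group = [i for i in rows if y[i] == label]
        d = 0.0
        for j in features:
            center = sum(X[i][j] for i in group) / len(group)
            d += (x[j] - center) ** 2
        dists[label] = d
    return 0 if dists[0] <= dists[1] else 1

def loocv_accuracy(X, y, k=10, select_inside=True):
    """Leave-one-out accuracy; features are chosen inside or outside the loop."""
    n = len(X)
    everyone = list(range(n))
    # Incorrect variant: features chosen once, using the held-out samples too.
    fixed = None if select_inside else top_features(X, y, everyone, k)
    correct = 0
    for held_out in range(n):
        train = [i for i in everyone if i != held_out]
        feats = top_features(X, y, train, k) if select_inside else fixed
        correct += centroid_predict(X, y, train, feats, X[held_out]) == y[held_out]
    return correct / n

if __name__ == "__main__":
    X, y = noise_data()
    print("selection on all data:    ", loocv_accuracy(X, y, select_inside=False))
    print("selection inside training:", loocv_accuracy(X, y, select_inside=True))
```

Under these assumptions, the estimate with selection on all data tends to look far
better than chance even though the labels are unrelated to the features, whereas
selection inside each training set gives an estimate near chance.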
After statistical significance is assessed and prediction error is estimated, the
investigator should determine whether the confidence interval for the prediction error
includes the error rate expected by chance. Two methods can be used for internal
validation: the split-sample method (in which two-thirds of the sample is placed in the
training set and one-third is placed in the test set) and leave-one-out cross-validation
(in which the training set contains n – 1 observations, the test set contains 1
observation, and the procedure is repeated n times so that each observation is in the
test set exactly once).
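One way to check whether the prediction error confidence interval includes chance is
sketched below. The presentation summary does not say which interval to use. The
Wilson score interval, the 95% level (z = 1.96), and the chance error rate of 0.5
(a balanced two-class problem) are all assumptions made for illustration, as are the
function names.

```python
import math

def wilson_interval(errors, n, z=1.96):
    """Wilson score interval for an error rate from `errors` mistakes in n test cases."""
    if n <= 0 or not 0 <= errors <= n:
        raise ValueError("need n > 0 and 0 <= errors <= n")
    p_hat = errors / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def includes_chance(errors, n, chance=0.5, z=1.96):
    """True if the interval for the error rate contains the assumed chance error rate."""
    low, high = wilson_interval(errors, n, z)
    return low <= chance <= high

if __name__ == "__main__":
    for errors in (8, 13):
        low, high = wilson_interval(errors, 30)
        print(errors, "errors of 30:", round(low, 3), round(high, 3),
              "includes chance:", includes_chance(errors, 30))
```

With 8 errors in 30 test observations, the interval runs from about 0.14 to 0.44 and
excludes 0.5. With 13 errors in 30, the interval includes 0.5, so the classifier
cannot be distinguished from chance at this sample size.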
Internal validation answers three questions: how accurate the classifier is, whether
the classifier enhances prediction accuracy, and whether the classifier is worthy of
further investigation. If the genomic classifier is worthy of further investigation,
then its broad clinical application can be examined through external validation. This
independent validation of prediction accuracy for the completely specified classifier
determines whether patients benefit from use of the classifier (e.g., better efficacy,
reduced incidence of adverse events) compared with not using the classifier.