Assay Guidance Manual
Assay Validation
Copyright © 2008, Eli Lilly and Company and the National Institutes of Health Chemical Genomics Center. All Rights Reserved. For more information, please review the Privacy Policy and Site Usage and Agreement.

Table of Contents
    1. OVERVIEW
    2. STABILITY AND PROCESS STUDIES
    3. PLATE UNIFORMITY AND SIGNAL VARIABILITY ASSESSMENT
    4. REPLICATE-EXPERIMENT STUDY
    5. HOW TO DEAL WITH HIGH ASSAY VARIABILITY
    6. BRIDGING STUDIES FOR ASSAY UPGRADES AND MINOR CHANGES
    7. REFERENCES

    The statistical validation requirements for an assay vary depending upon the prior history of the assay. Stability and Process studies (Section B) should be done for all assays before the formal validation studies begin. If the assay is new, or has never been previously validated, then full validation is required: a 3-day Plate Uniformity study (Section C) and a Replicate-Experiment study (Section D). If the assay has been previously validated in a different laboratory and is being transferred to a new laboratory, then a 2-day Plate Uniformity study (Section C) and a Replicate-Experiment study (Section D) are required. An assay is considered previously validated if it has been assessed by all the methods in this section and is being transferred to a new laboratory without any substantive changes to the protocol. If the intent is to store the data with the results from the previous facility, then an assay comparison study should be done as part of the Replicate-Experiment study (Section D); otherwise only the intra-laboratory part of the Replicate-Experiment study is recommended.

    If the assay is updated from a previous version run in the same facility then the requirements vary, depending upon the extent of the change. Major changes require a validation study equivalent to a laboratory transfer. Minor changes require bridging studies that demonstrate the equivalence of the assay before and after the change. See Section E for examples of major and minor changes.

    These techniques are intended for primary target binding and functional assays in ≥ 96-well formats. For assays with significant time, resource, or expense constraints, discuss alternatives with a statistician to properly balance validation requirements against those constraints.

    Reagent Stability and Storage Requirements
    It is important to determine the stability of reagents under storage and assay conditions.
    • Use the manufacturer’s specifications if the reagent is a commercial product.
    • Identify conditions under which aliquots of the reagent can be stored without loss of activity.
    • If the proposed assay will require that the reagent be frozen and thawed repeatedly, test its stability after similar numbers of freeze-thaw cycles.
    • If possible, determine the storage-stability of the reagent.
    • If reagents are combined and aliquoted together, examine the storage-stability of the mixtures.

    Conduct time-course experiments to determine the range of acceptable times for each incubation step in the assay. This information will greatly aid in addressing logistic and timing issues.

    Reagent Stability During Daily Operations; Use Of Daily Leftover Reagents
    The stability studies require running the assay under standard conditions but with one of the reagents held for various times before addition to the reaction. The results are useful for generating a convenient protocol and for understanding the tolerance of the assay to delays encountered during screening.

    If possible, reagents should be stored in aliquots suitable for daily needs. However, some information pertinent to saving leftover reagents (particularly expensive ones) for future assays should be obtained.

    New lots of critical reagents should be validated using the bridging studies (Section E).

    Test compounds are delivered at fixed concentrations in 100% DMSO, so the solvent tolerance of the assay should be determined. Typically, the uninhibited or fully stimulated assay is performed in the presence of DMSO concentrations spanning the expected final concentration, commonly from 0 to 10%. This study should be done relatively early in assay development because other studies, such as the variability studies, should be performed at the DMSO concentration that will be used in screening. For cell-based assays, it is recommended that the final DMSO concentration be kept under 1%.

    Overview
    All assays should have a plate uniformity assessment. For new assays the plate uniformity study should be run over 3 days to assess uniformity and separation of signals, using DMSO at the concentration to be used in screening. For assay transfers (See Section A for the definition of an assay transfer) the plate uniformity study need be only 2 days.

    The actual variability tests are conducted on three types of signals.

    • "Max" signal: This measures the maximum signal. For agonist assays this would be maximal response of an agonist; for potentiator assays this would be an EC10 concentration of a standard agonist (the actual percentage is as per protocol and may not be 10% in some cases) plus maximal concentration of a standard potentiator. For inhibition type assays this would be a reaction with an EC80 concentration of a standard agonist (again the actual percentage is as per protocol, and may not be 80%). For inverse agonist assays this would be the untreated constitutively active condition in the presence of DMSO alone.
    • "Min" signal: This measures the background signal. For agonist assays this is the basal signal. For potentiator assays this is an EC10 concentration of agonist. For inhibitor assays, including receptor-binding assays, this is an EC80 concentration of the standard agonist plus a maximally inhibiting concentration of a standard antagonist (preferred) or unstimulated reaction.
    • "Mid" signal: This estimates the signal variability at some point between the maximum and minimum signals. Typically, for agonist assays the mid-point is reached by adding an EC50 concentration of a full agonist/activator compound; for potentiator assays it is an EC10 concentration of agonist plus EC50 concentration of a potentiator; and for inhibitor assays it is an EC80 concentration of an agonist plus an IC50 concentration of a standard inhibitor to each well.
    N.B. If calibration of the signals is required, then the concentration levels and all analyses should be conducted on the calibrated responses, not the raw plate-reader counts. The raw signals must lie within the range of the calibration curve; at most 1-2% of the wells should fall outside that range (i.e. above the fitted top or below the fitted bottom of the calibration curve).

    Two different plate formats exist for the plate uniformity studies: an Interleaved-Signal format, where all signals appear on all plates but are varied systematically so that over all plates on a given day each signal is observed in each well, and a Uniform-Signal format, where each signal is run uniformly across entire plates. Neither format is universally superior. The Interleaved-Signal format can be used in all instances and requires fewer plates. The Uniform-Signal format is easier to run and more useful for detecting non-uniform signals, but takes more plates in total; it should not be used if signals vary across plates on a given day. See Section C.3.d for examples of when it should not be used.

    Procedure
    You should use the following plate layouts, for which Excel analysis templates have been developed. These layouts have a combination of wells producing max, min, and mid signals on a plate with proper statistical design. Use the same plate formats on all days of the test. Do not change the concentration producing the mid point signal over the course of the test. See Section C.2.d. for a further discussion about midpoint accuracy. The trials should use independently prepared reagents and preferably be run on separate days.

    Plate 1
    Row C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
    1 H M L H M L H M L H M L
    2 H M L H M L H M L H M L
    3 H M L H M L H M L H M L
    4 H M L H M L H M L H M L
    5 H M L H M L H M L H M L
    6 H M L H M L H M L H M L
    7 H M L H M L H M L H M L
    8 H M L H M L H M L H M L
    H=Max, M=Mid, L=Min

    Plate 2
    Row C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
    1 L H M L H M L H M L H M
    2 L H M L H M L H M L H M
    3 L H M L H M L H M L H M
    4 L H M L H M L H M L H M
    5 L H M L H M L H M L H M
    6 L H M L H M L H M L H M
    7 L H M L H M L H M L H M
    8 L H M L H M L H M L H M
    H=Max, M=Mid, L=Min

    Plate 3
    Row C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
    1 M L H M L H M L H M L H
    2 M L H M L H M L H M L H
    3 M L H M L H M L H M L H
    4 M L H M L H M L H M L H
    5 M L H M L H M L H M L H
    6 M L H M L H M L H M L H
    7 M L H M L H M L H M L H
    8 M L H M L H M L H M L H
    H=Max, M=Mid, L=Min
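The three layouts above are cyclic rotations of the H/M/L column pattern, so that over the three plates each well position receives each signal once. A minimal Python sketch (not part of the manual; the function name is illustrative) that generates these layouts for an analysis script:

```python
def interleaved_layout(plate_number, rows=8, cols=12):
    """Return a rows x cols grid of 'H'/'M'/'L' labels for the given plate.

    Plate 1 starts with H in column 1, Plate 2 with L, Plate 3 with M,
    matching the layouts shown above.
    """
    pattern = "HML"
    # Plate 1 -> offset 0 (H M L ...), Plate 2 -> offset 2 (L H M ...),
    # Plate 3 -> offset 1 (M L H ...)
    offset = {1: 0, 2: 2, 3: 1}[plate_number]
    return [[pattern[(c + offset) % 3] for c in range(cols)]
            for _ in range(rows)]

print(" ".join(interleaved_layout(1)[0]))  # H M L H M L H M L H M L
```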

    The points below describe these calculations and acceptance criteria. The overall requirement for the signals is that the raw signals are sufficiently tight and that there is sufficient separation between the max and min signals to conduct screening. Calculations and acceptance criteria are summarized as follows.
    1. Outliers should be flagged with an asterisk in the plate input section. The outliers should be “obvious”, and the rate of outliers should be less than 2 percent (i.e. on average less than 2 on a 96 well plate, 8 on a 384 well plate).
    2. Compute the mean (AVG), SD, and CV (of the mean) for each signal (max, mid, min) on each plate. Note that the CV should be calculated taking into account the number of wells per test compound per concentration that will be used in the production assay. For example if in the production assay duplicate wells will be run for each concentration of each test substance then
      CV = 100 × (SD/√2) / AVG

      More generally, if there will be n wells per test compound per concentration then
      CV = 100 × (SD/√n) / AVG

      The acceptance criterion is that the CV of each signal be less than or equal to 20%. Note that the min signal often fails this criterion, especially for assays whose min-signal mean is very low. An alternate acceptance criterion for the min signal is SDmin ≤ both SDmid and SDmax. All plates should pass all signal criteria (i.e. all Max and Mid signals should have CVs ≤ 20%, and all Min signals should either pass the CV criterion or all should pass the SD criterion).
    3. For each of the mid-signal wells, compute a percent activity for agonist or stimulation assay relative to the means of the max and min signals on that plate,
      %Activity = 100 × (Mid − AVGmin) / (AVGmax − AVGmin)

      For inhibition assays compute percent inhibition for each mid-signal well, where %Inhibition = 100 - %Activity.
    4. Compute the mean and SD for the mid-signal percent activity values on each plate. The acceptance criterion is SDmid ≤ 20 on all plates.
    5. Compute a Signal Window (SW) or Z’ factor (Z’) for each plate, as described below. The acceptance criterion is SW ≥ 2 or Z’ ≥ 0.4 on all plates (either all SWs ≥ 2 or all Z’ values ≥ 0.4).
    The formula for the signal window is:

    SW = [(AVGmax − 3·SDmax/√n) − (AVGmin + 3·SDmin/√n)] / (SDmax/√n)

    where n is the number of replicates of the test substance that will be used in the production assay. Instead of the SW the Z’ factor can be used to evaluate the signal separation, where the only difference is the denominator (AVGmax – AVGmin) is used instead of SDmax. The complete formula is:

    Z’ = [(AVGmax − 3·SDmax/√n) − (AVGmin + 3·SDmin/√n)] / (AVGmax − AVGmin)

    If one assumes that the SD of the max signal is at least as large as the SD of the min signal, then the Z’ factor will be within a specific range for a given signal window, as illustrated in the following graph. Note that Z’ values greater than 1 are possible only if AVGmax < AVGmin, and so the templates also check that all Z’ values are less than 1.

    [Figure: Z’-factor interval versus Signal Window]

    The recommended acceptance criterion is Z’ factor ≥ 0.4, which is comparable to a SW ≥ 2. Either measure could be used.
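The per-plate calculations above (CVs adjusted for the planned number of wells per test compound, the Signal Window, and the Z’ factor) can be sketched in Python as follows. This is an illustrative helper, not the official Excel template:

```python
import statistics

def plate_stats(max_wells, mid_wells, min_wells, n=1):
    """Summary statistics for one interleaved plate (illustrative sketch).

    n is the number of wells per test compound per concentration planned
    for the production assay; SDs are scaled by sqrt(n) as in the text.
    """
    def avg_sd(wells):
        return statistics.mean(wells), statistics.stdev(wells)

    avg_max, sd_max = avg_sd(max_wells)
    avg_mid, sd_mid = avg_sd(mid_wells)
    avg_min, sd_min = avg_sd(min_wells)

    # CV (of the mean of n wells), in percent
    cv = lambda sd, avg: 100 * (sd / n ** 0.5) / avg

    # Signal window: separation of max and min beyond 3 scaled SDs,
    # expressed in units of the scaled max SD
    sw = ((avg_max - 3 * sd_max / n ** 0.5)
          - (avg_min + 3 * sd_min / n ** 0.5)) / (sd_max / n ** 0.5)

    # Z' factor: same numerator, but divided by the signal range instead
    z_prime = 1 - 3 * (sd_max / n ** 0.5 + sd_min / n ** 0.5) / (avg_max - avg_min)

    return {"CVmax": cv(sd_max, avg_max), "CVmid": cv(sd_mid, avg_mid),
            "CVmin": cv(sd_min, avg_min), "SW": sw, "Z'": z_prime}
```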

    A scatter plot (see examples below) can reveal patterns of drift, edge effects and other systematic sources of variability. The response is plotted against well number, with the wells ordered either by row first, then by column, or by column first, then by row. The overall requirement is that plates do not exhibit material edge or drift effects. In general, drift or edge effects < 20% are considered non-material; effects seen on only one or a few plates, rather than as the predominant pattern, are also considered non-material. Some guidelines for detecting and dealing with these problems follow.

    No drift or edge effects
    The following two plots (of the same data) show an example where there are no edge effects or drift.

    [Scatter plots of the same data in both well orderings: no edge effects or drift]

    Drift
    Use the max and mid signals to look for drift. Consider drift associated with the min only if the mean signal is greater than 10% of the maximum signal. Look for significant trends in the signal from left-to-right and top-to-bottom. If you observe drift that exceeds 20% then you have material drift effects. In the example below, the mean of column 1 is 10.6, while the mean of column 10 is 13.8, and the overall mean is 12.2. The drift is 26% [(13.8-10.6)/12.2], and therefore should be investigated.

    [Scatter plots illustrating material drift]
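The drift calculation in the drift example above can be sketched as a one-line helper (illustrative only):

```python
# Drift % = (right-edge column mean - left-edge column mean) / overall mean,
# using the column means reported in the example above.
def drift_percent(first_col_mean, last_col_mean, overall_mean):
    return 100 * (last_col_mean - first_col_mean) / overall_mean

print(round(drift_percent(10.6, 13.8, 12.2)))  # 26, i.e. material (> 20%)
```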

    Edge Effects
    Edge effects can contribute to variability, and spotting them can be a helpful troubleshooting technique. Edge effects are sometimes due to evaporation from wells that are incubated for long periods of time. Edge effects can also be caused either by a short incubation time or by plate stacking – these conditions allow the edge wells to reach the desired incubation temperature faster than the inner wells. Edge effects may show up in the data as represented in the following example.

    [Scatter plots illustrating edge effects]

    Note: Because of the vertical-axis scale, problems in the min and even mid signals may not be visible. Adjusting the scale to highlight the min and mid signals may be necessary to examine them properly.

    The normalized mid signal should not show any significant shift across plates or days. “Significant” depends to a certain extent on the typical slopes encountered in dose-response curves, so plate-to-plate or day-to-day variation in the mid-point percent activity needs to be assessed in light of the steepness of the dose-response curves of the assay. For receptor binding assays, and other assays with a slope parameter of 1, a 15% difference can correspond to a two-fold change in potency. The template translates the mean normalized mid signal into potency shifts across plates and days. The fold shift should not be ≥ 2 between any two plates within a day, nor ≥ 2 between any two daily average mid-point %activity values. For functional assays whose slopes may not equal 1 you can enter a “typical” slope into the template; this should be based on the slope of a dose-response curve for the substance used to generate the mid-point signal.
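As a hedged illustration of how a mid-point %activity difference can translate into a fold potency shift, the sketch below assumes a logistic concentration-response of the form A = 100/(1 + (EC50/C)^h); the manual’s template may use a different parameterization:

```python
def fold_shift(pct_activity_1, pct_activity_2, slope=1.0):
    """Fold change in implied potency between two mid-point %activity values.

    Assumed model (illustrative): for a logistic curve with Hill slope h,
    a well at fixed concentration with %activity A implies an EC50
    proportional to ((100 - A) / A) ** (1 / h).
    """
    implied = lambda a: ((100.0 - a) / a) ** (1.0 / slope)
    ratio = implied(pct_activity_1) / implied(pct_activity_2)
    return max(ratio, 1.0 / ratio)  # report as a fold >= 1

# A 15% shift in mid-point activity (50% vs 65%) with slope 1
# corresponds to roughly a two-fold potency shift:
print(round(fold_shift(50, 65), 2))  # 1.86
```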

    For these calculations to have utility the mid point %Inhibition/Activity should be “near” the midpoint. Values within the range of 30-70% are ideal. Studies with mean values outside this range should be discussed with a statistician, especially before any studies are repeated solely for this reason. Also note that the conditions used to obtain the midpoint should not be changed over the course of the plate uniformity study.

    1. Intra-plate Tests: Each plate should have a
      CVmax and CVmid ≤ 20%,
      CVmin ≤ 20% or SDmin ≤ min(SDmid, SDmax),
      Normalized SDmid ≤ 20,
      SW ≥ 2 or Z’ ≥ 0.4.
    2. No material edge, drift or other spatial effects. Note that the templates do not check this criterion.
    3. Inter-plate and Inter-Day Tests: The normalized average mid-signal should not translate into a fold shift
      > 2 within days,
      > 2 across any two days.
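The intra-plate tests in point 1 can be collected into a simple checker (an illustrative sketch; the spatial effects in point 2 still require visual inspection, and the inter-plate and inter-day tests are separate):

```python
def plate_passes(cv_max, cv_mid, cv_min, sd_min, sd_mid, sd_max,
                 norm_sd_mid, sw=None, z_prime=None):
    """Apply the intra-plate acceptance criteria listed above (sketch).

    Either sw or z_prime must be supplied. CVs are in percent;
    norm_sd_mid is the SD of the normalized mid-signal %activity.
    """
    signals_ok = cv_max <= 20 and cv_mid <= 20
    # Min signal may pass on CV, or alternatively on the SD criterion
    min_ok = cv_min <= 20 or sd_min <= min(sd_mid, sd_max)
    mid_ok = norm_sd_mid <= 20
    window_ok = ((sw is not None and sw >= 2)
                 or (z_prime is not None and z_prime >= 0.4))
    return signals_ok and min_ok and mid_ok and window_ok
```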

    384-well plates contain 16 rows by 24 columns, and one 384-well plate contains the equivalent of four 96-well plates. Two different formats of interleaved plate uniformity templates have been developed. The first layout expands the 96-well plate format into 4 squares. The plate layouts are as follows:

    [Figure: Standard Interleaved 384-well Format Plate Layouts]

    The second is useful for assays using certain automation equipment such as Tecan and Beckman. In that case column 1 of the 96-well plate corresponds to columns 1 and 2 of the 384-well plate, and is laid out in 8 pairs of columns. The plate layouts for it are as follows:

    [Figure: HHMMLL 384-well Plate Uniformity Plate Layouts]

    The analysis and acceptance criteria are exactly the same as for the 96-well Plate Uniformity Studies. See Section C.2.e for a summary of the acceptance criteria.

    Uniform-Signal plate layouts are an alternative format to conduct the plate uniformity studies. Their main advantage is easier execution since all wells on each plate are exactly the same, and together with heat maps provide for a straightforward assessment of spatial properties. The disadvantages are that this format requires twice as many plates as the Interleaved-Signal format, and that the normalizing calculations are quite artificial in that max and min signals are not on-plate signals and therefore may produce misleading results. See Section C.3.d for further elaboration of this point.

    Max, Mid and Min signals are prepared as defined in Section C.1. Two plates are run for each signal, making six plates per day. On each plate all wells are the same, i.e. either all Max, all Mid, or all Min. The number of days required is the same as for the Interleaved-Signal layout: three days for new assays, two days for transfers of previously validated assays.

    The actual calculations will be performed by the template. Details of the calculations are as follows:
    1. Compute the mean (AVG), standard deviation (SD) and Coefficient of Variation (CV) for each plate (as in the Interleaved-Signal format, the CVs should reflect the number of wells per test condition envisioned in the production assay). The requirements are the same as for the Interleaved-Signal format: the CV of each plate should be less than 20%, or the Min plates should have SD ≤ both SDmid and SDmax, where

      SDmid = √[(SD²mid,plate1 + SD²mid,plate2) / 2]

      is the combined standard deviation from the two Mid plate SDs, and similarly for the Min and Max signals.
    2. For each of the Mid signal plates, compute the percent activity for agonist or stimulation assays, and percent inhibition for antagonist or inhibition assays (including binding assays). In this format the calculation is
      %Activity = 100 × (Mid − AVGmin) / (AVGmax − AVGmin)

      where AVGmin is the average taken over the two Min plate averages, and AVGmax is the average taken over the two Max plate averages. Percent Inhibition = 100 - %Activity.
    3. Compute the SD of the normalized signals on each Mid plate. The acceptance criterion is SD%mid ≤ 20.
    4. Compute the Z’ factor and/or the SW for each day. The formulas are the same as in Section C.2.b, except that AVGmax and AVGmin are defined as in point 2 above, and SDmax and SDmin are defined as in point 1 above. The acceptance criterion is either all Z’ ≥ 0.4 or all SW ≥ 2.
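Points 1, 2, and 4 can be sketched for one day of a Uniform-Signal study as follows (illustrative only; assumes two plates per signal, as described above):

```python
import statistics

def daily_z_prime(max_plates, min_plates, n=1):
    """Z' for one day of a Uniform-Signal study (illustrative sketch).

    max_plates / min_plates are each a list of two lists of well values.
    Uses off-plate averages and the combined (pooled) SD of the two
    same-signal plates, as defined in points 1-2 above.
    """
    def pooled(plates):
        avgs = [statistics.mean(p) for p in plates]
        sds = [statistics.stdev(p) for p in plates]
        avg = statistics.mean(avgs)
        combined_sd = ((sds[0] ** 2 + sds[1] ** 2) / 2) ** 0.5
        return avg, combined_sd / n ** 0.5  # scale for n production wells

    avg_max, sd_max = pooled(max_plates)
    avg_min, sd_min = pooled(min_plates)
    return 1 - 3 * (sd_max + sd_min) / (avg_max - avg_min)
```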

    The Excel template provides scatter plots of the plate signals combined across plates and days, which are interpreted in the same manner as for the Interleaved-Signal format. The acceptance criterion is the same as for the interleaved format: no drift or edge effects that exceed 20% of the mean. Also, as in the Interleaved-Signal format, these effects must appear as the predominant pattern, and not just in single isolated plates, for the assay to fail this criterion.

    The following example illustrates a spatially uniform result, an edge effect, and a drift effect. Day 1 shows an acceptably uniform result. Day 2 shows an assay with a significant edge effect (25% from the mean edge value to the mean of the interior), and Day 3 shows an assay with significant drift (25% change in mean value from left to right as compared to the average in the middle). If patterns are similar or worse than those depicted in Day 2 or Day 3 then the assay does not pass the spatially uniform requirement.

    [Figure: Example Uniform-Signal results for Days 1-3: uniform, edge effect, and drift]

    The Inter-plate and inter-day tests are exactly the same as in Section C.2.d, except the definitions of %Activity and %Inhibition defined above (Section C.3.a) are used in the tests.
    Impact of Plate Variation on Validation Results
    The Uniform-Signal format does make the assumption that plate variation within each run day is negligible. If this assumption is not correct then many of the diagnostic tests described here will be misleading, and the Interleaved-Signal format should be used instead. In particular, Z’ factors and/or Signal Windows may be incorrect in either direction, and the Inter-plate and Inter-Day tests could possibly fail acceptable assays.

    The following example illustrates the problem. The raw signals from one day of an Interleaved-Signal format Plate Uniformity Study are shown on the left in Panel A. The Max and Mid raw signals vary across the 3 plates (Panel A, Plates 1-3), but the %Activity is very stable across the 3 plates (Panel B, Plates 1-3): the maximum fold shift across plates is 1.2.

    The Midpoint Percent Activity plot (Panel B) shows what can happen without on-plate Max and Min controls. The three left-hand panels show the plates normalized to their own controls. To mimic the Uniform-Signal protocol with its off-plate controls, the right-hand columns of Panel B show each plate’s mid signal normalized to the Plate 3 controls: “Plate 1” shows the actual Plate 1 mid signals normalized to the Plate 3 Max and Min signals, “Plate 2” shows the actual Plate 2 mid signals normalized to the Plate 3 Max and Min signals, and “Plate 3” shows the Plate 3 mid signals normalized to its own controls.

    In the presence of plate variation the off-plate controls do not effectively normalize the assay. As Panel B shows, plate-to-plate variation in the raw signals can create the appearance of significant mid-point variation when in fact there is little variation in signals properly normalized to on-plate controls. In this example, using off-plate controls, Plates 1-3 have a maximum fold shift of 2.0, which does not pass the inter-plate acceptance criterion.

    Panel A. Raw data values for 3 plates of an Interleaved-Signal Plate Uniformity Study. Plates 1-3 show the actual plate values obtained on one day of the test.

    Panel B. Normalized midpoint values for 3 plates of an Interleaved-Signal Plate Uniformity Study. Plates 1-3 show the actual plate midpoints normalized to the on-plate controls. Plates 4-6 show the same midpoints all normalized to the Plate 3 Min and Max controls.

    Overview
    It is important to verify that the assay results are reproducible, i.e. that the variability of the key endpoints of the assay is acceptably low. In addition, if results from the assay are to be reported together with results previously generated by another assay, then it should be verified that the two assays (or laboratories) produce equivalent results. In this section we define how to quantify assay variability and determine assay equivalence. It is important to read the entire section below to understand the rationale for the statistical methods used to calculate the reproducibility of potency and efficacy. We strongly recommend consulting a statistician before designing the experiments to estimate the variability described below.

    Rationale
    Replicate-Experiment studies are used to formally evaluate the within-run assay variability and formally compare the new assay to the existing (old) assay. They also allow a preliminary assessment of the overall or between-run assay variability, but two runs are not enough to adequately assess overall variability. Post-production methods (Section III) are used to formally evaluate the overall variability in the assay. Note that the Replicate-Experiment study is a diagnostic and decision tool used to establish that the assay is ready to go into production by showing that the endpoints of the assay are reproducible over a range of potencies. It is not intended as a substitute for post-production monitoring or to provide an estimate of the overall Minimum Significant Ratio (MSR).

    It may seem counter-intuitive to call the differences between two independent assay runs “within-run”. The terminology, however, follows from how the variance components are defined: experimental variation is divided into two distinct components, between-run and within-run sources. Consider the following examples:

    • If there is variation in the concentrations of buffer components between 2 runs then the assay results could be affected. However, assuming that the same buffer is used with all compounds within the run, each compound will be equally affected and so the difference will only show up when comparing one run to another run, i.e. in two runs one run will appear higher on average than the other run. This variation is called between-run variation.
    • If the concentration of the compound in the stock plate varies from the target concentration then all wells where that compound is used will be affected. However, wells used to test other compounds will be unaffected. This type of variation is called within-run as the source of variation affects different compounds in the same run differently.
    • Some sources of variability affect both within- and between-run variation. For example, in a FLIPR assay cells are plated and then incubated for 24-72 hours to achieve a target cell density taking into account the doubling time of the cells. For example, if the doubling time equals the incubation time, and the target density is 30,000 cells/well, then 15,000 cells/well are plated. But even if exactly 15,000 cells are placed in each well there won’t be exactly 30,000 cells in each well after 24 hours. Some will be lower and some will be higher than the target. These differences are within-run as not all wells are equally affected. But also suppose in a particular run only 13,000 cells are initially plated. Then the wells will on average have fewer than 30,000 cells after 24 hours, and since all cells are affected this is between-run variation. Thus cell density has both within- and between-run sources of variation.

    The total variation is the sum of both sources of variation. When comparing two compounds across runs, one must take into account both the within-run and between-run sources of variation. But when comparing two compounds in the same run, one must only take into account the within-run sources, since, by definition, the between-run sources affect both compounds equally.

    In a Replicate-Experiment study the between-run sources of variation cause one run to be on average higher than the other run. However, it would be very unlikely that the difference between the two runs were exactly the same for every compound in the study. These individual compound “differences from the average difference” are caused by the within-run sources of variation. The higher the within-run variability the greater the individual compound variation in the assay runs.

    The analysis approach used in the Replicate-Experiment study is to estimate and factor out between-run variability, and then estimate the magnitude of within-run variability.
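A toy simulation (not from the manual; all parameter values are assumed for illustration) shows why paired differences isolate the within-run component: the shared run shifts move every compound equally and are absorbed into the mean difference, so the spread of the differences reflects only within-run variation:

```python
import random
import statistics

# Toy simulation: each run gets one shared shift (between-run variation);
# each compound within a run gets its own noise (within-run variation),
# all on the log-potency scale.
random.seed(1)
within, between = 0.10, 0.30  # assumed SDs, for illustration only
n_compounds = 2000

run_shift = [random.gauss(0, between) for _ in range(2)]
diffs = [(run_shift[0] + random.gauss(0, within))
         - (run_shift[1] + random.gauss(0, within))
         for _ in range(n_compounds)]

# The mean difference absorbs the between-run shifts; the SD of the
# differences reflects only within-run noise, about sqrt(2) * within.
print(round(statistics.mean(diffs), 2), round(statistics.stdev(diffs), 2))
```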

    All assays should have a reproducibility comparison (Steps 1-3). If the assay is to replace an existing assay and combine the data then an assay comparison study should also be done (Steps 4 and 5).
    1. Select 20-30 compounds that have potencies covering the concentration range being tested and, if applicable, efficacy measures that cover the range of interest. The compounds should be well spaced over these ranges.
    2. All of the compounds should be run in each of two runs of the assay.
    3. Compare the two runs as per Section D.3-D.6.
    4. All compounds should be run in a single run of the previous assay.
    5. Compare the results of the two labs by analyzing the first run of the new assay with the single run of the previous assay.

    For the reproducibility comparison, paste the potency values from the two runs into the Run 1 and Run 2 data columns. All tests are conducted by the spreadsheet, and additional plots and diagnostics are available to assist in judging the results. For the assay comparison study, paste the potency values for the first run of the new assay into the Run 1 column and the potency values for the (single) run of the previous assay into the Run 2 column. Potency values should be calculated according to the methods of Section III.

    The points below describe and define the terms used in the template and the acceptance criteria discussed in the Diagnostic Tests section below.

    1. Compute the difference in log-potency (= first − second) between the first and second run for each compound. Let d̄ and s be the sample mean and standard deviation of the differences in log-potency. Since ratios of EC50 values (relative potencies) are more meaningful than differences in potency (1 and 3, 10 and 30, 100 and 300 have the same ratio but not the same difference), we take logs in order to analyze ratios as differences.
    2. Compute the Mean-Ratio: MR = 10^d̄. This is the geometric average fold difference in potency between two runs.
    3. Compute the Ratio Limits: RLs = 10^(d̄ ± 2·s/√n), where n is the number of compounds. This is the approximate 95% confidence interval for the Mean-Ratio.
    4. Compute the Minimum Significant Ratio: MSR = 10^(2s). This is the smallest potency ratio between two compounds that is statistically significant.
    5. Compute the Limits of Agreement: LsA = 10^(d̄ ± 2s). Most of the compound potency ratios (approximately 95%) should fall within these limits.
    6. For each compound compute the Ratio (= first/second) of the two potencies, and the Geometric Mean potency: GM = √(first × second).
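Steps 1-5 above can be sketched in Python (illustrative only; the constant 2 is used in place of the exact t-quantile, and base-10 log-potencies are assumed):

```python
import math
import statistics

def replicate_experiment_stats(run1, run2):
    """MR, Ratio Limits, MSR, and Limits of Agreement (sketch of steps 1-5).

    run1 / run2 are matched lists of potencies (e.g. IC50s) for the same
    compounds, one value per compound per run.
    """
    # Step 1: per-compound difference in log-potency, and its mean/SD
    d = [math.log10(a) - math.log10(b) for a, b in zip(run1, run2)]
    n = len(d)
    dbar, s = statistics.mean(d), statistics.stdev(d)
    return {
        "MR": 10 ** dbar,                                   # step 2
        "RLs": (10 ** (dbar - 2 * s / n ** 0.5),
                10 ** (dbar + 2 * s / n ** 0.5)),           # step 3
        "MSR": 10 ** (2 * s),                               # step 4
        "LsA": (10 ** (dbar - 2 * s), 10 ** (dbar + 2 * s)) # step 5
    }
```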

    Items 2-6 can be combined into one plot: the Ratio-GM plot. An example is shown in Figure 1. The points represent the compounds; the blue solid, green long-dashed, and red short-dashed lines represent the MR, RLs, and LsA values, respectively.

    Figure 1 shows the desired result of pure chance variation in the difference in activities between runs. The blue solid line shows the geometric mean potency ratio, i.e. the average relationship between the first and second run. The green long-dashed lines show the 95% confidence limits of the mean ratio; these limits should contain the value 1.0, as they do in this case. The red short-dashed lines indicate the limits of agreement between runs, reflecting the individual compound variation between the first and second run. All, or almost all, of the points should fall within the red dashed lines. The lower line should be above 0.33 and the upper line below 3.0, indicating less than a 3-fold difference between runs in either direction. The MSR should be less than 3.0, as it is in this example.

    Figure 1. Potency Ratio versus GM Potency. This is a typical example for an acceptable assay: The MR=0.90, RLs=(0.78-1.03) [contains the value 1.0], MSR=1.86 [under 3.0], LsA=(0.48-1.67) [between 0.33 and 3.0].

    1. If the MSR ≥ 3 then there is poor individual agreement between the two runs. This problem occurs when the within-run variability of the assay is too high. See Figure 2(a) below for an illustration. An assay meets the MSR acceptance criterion if the (within-run) MSR < 3.
    2. If Ratio limits do not contain the value 1, then there is a statistically significant average difference between the two runs. Within a lab (Step 3) this is due to high between-run assay variability. Between labs (Step 4), this could be due to a systematic difference between labs, or high between-run variability in one or both labs. See Figure 2(b) below for an illustration. Note that it is possible with a very “tight” assay (i.e. one with a very low MSR) or with a large set of compounds to have a statistically significant result for this test that is not very material, i.e., the actual MR is small enough to be ignorable. If the result is statistically significant then examine the MR. If it is between 0.67 and 1.5 then the average difference between runs is less than 50% and is deemed immaterial. However, in Figure 2(b) the MR=2.01, indicating a 101% difference between runs, which is too high to be considered “equivalent”. Note that there is no direct requirement for the MR, but values that are this extreme are unlikely to pass the Limits of Agreement criterion in step 3 below.
    3. The MR and the MSR are combined into a single interval referred to as the Limits of Agreement. An assay that either has a high MSR and/or an MR different from 1 will tend to have poor agreement of results between the two runs. An assay meets the Limits of Agreement acceptance criterion if both the upper and lower limits of agreement are between 0.33 and 3.0. Note that assays depicted in both Figures 2a and 2b do not have Limits of Agreement inside the acceptance region and thus do not meet the acceptance criterion.

    Figure 2. Potency Ratio vs. GM Potency. (A) shows a case where the within-run variability is too large (MR = 0.8, RLs = (0.61-1.07), MSR = 3.54, and LsA = (0.23-2.84)), and (B) shows a case where the LsA are outside the acceptable range because the Mean Ratio is too large, i.e., there is a tendency for the activity values in run 1 to be larger than in run 2 (MR = 2.01, RLs = (1.75-2.32), MSR = 1.86, and LsA = (1.08-3.75)). In both cases the reason(s) for these conditions should be investigated.

    The points below describe and define the terms used in the template and the acceptance criterion discussed in the Diagnostic Tests section. Note that the methods described here are intended for functional full/partial agonist assays and non-competitive antagonist assays. Some potentiator assays, as well as assays normalized by fold stimulation, may best be analyzed with the techniques described in the potency section rather than the methods described here. Consult a statistician for the best method of analysis.
    1. Compute the difference in efficacy (= first – second) between the first and second run for each compound. Let d̄ and s be the sample mean and standard deviation of these differences in efficacy.
    2. Compute the Mean-Difference: MD = d̄. This is the average difference in efficacy between the two runs.
    3. Compute the Difference Limits: DLs = d̄ ± 2s/√n, where n is the number of compounds. This is an approximate 95% confidence interval for the Mean-Difference.
    4. Compute the Minimum Significant Difference: MSD = 2s. This is the smallest efficacy difference between two compounds that is statistically significant.
    5. Compute the Limits of Agreement: LsA = d̄ ± 2s. Most of the compound efficacy differences (approximately 95%) should fall within these limits.
    6. For each compound compute the Difference (= first-second) of the two efficacies, and the Mean efficacy (average of first and second).

    Items 2-6 can be combined onto one plot: the Difference-Mean plot (not shown). The plot is very similar to the Ratio-GM plot except that both axes are on the linear scale instead of the log scale.
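    As a sketch, steps 1-5 above can be written in Python (the function name is illustrative, and 2 is used as the approximate 95% critical value); note that the calculations stay on the linear scale:

```python
import math

def efficacy_agreement(first, second):
    """MD, DLs, MSD, and Limits of Agreement for paired efficacy runs.

    first, second: efficacies (% maximum response) for the same
    compounds in the first and second run.
    """
    # Step 1: differences in efficacy, with their mean and SD
    d = [e1 - e2 for e1, e2 in zip(first, second)]
    n = len(d)
    dbar = sum(d) / n
    s = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    return {
        "MD": dbar,                                    # Step 2
        "DLs": (dbar - 2 * s / math.sqrt(n),
                dbar + 2 * s / math.sqrt(n)),          # Step 3
        "MSD": 2 * s,                                  # Step 4
        "LsA": (dbar - 2 * s, dbar + 2 * s),           # Step 5
    }
```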

    Generally the same two problems discussed under potency need to be judged for efficacy as well. However, a general acceptance criterion for efficacy has not been established, as there is no consensus on efficacy standards, and for most projects potency is the primary property of interest. As guidelines, the MD should be less than 5 (i.e., less than a 5% average difference between runs) and the MSD should be less than 20 (i.e., 20% activity). More importantly, the MD and MSD should be used to judge the appropriateness of any efficacy CSFs a project may have. For example, if the CSF for efficacy is >80%, and the MSD is 30%, then the assay will fail too many efficacious compounds - a 90%-active compound would fall below the CSF 25% of the time. A more appropriate CSF in this situation would be 70% or even 60%.
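    The "25% of the time" figure can be checked with a short calculation. Assuming, for illustration only, that single-run efficacy is roughly normally distributed with a standard deviation of about MSD/2:

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

msd = 30.0            # MSD from the example above (% activity)
sd = msd / 2.0        # assumed per-measurement SD (an illustration)
# Probability that a compound with true efficacy 90% is observed
# below a CSF of 80%:
p_fail = normal_cdf((80.0 - 90.0) / sd)   # roughly 0.25
```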

    1. In Step 3 conduct reproducibility and equivalence tests for potency comparing the two runs in the new lab. The assay should pass both tests (MSR < 3 and both Limits of Agreement should be between 0.33 and 3.0).
    2. In Step 5 conduct reproducibility and equivalence tests for potency comparing the first run of the new lab to the single run of the old lab. The assays should pass both tests to be declared equivalent (Limits of Agreement between 0.33 and 3.0).
    3. For full/partial agonist assays and non-competitive antagonist assays, repeat points 1 and 2 for efficacy. Use the informal guidelines discussed above, and project efficacy CSFs to judge acceptability of results.
    Notes
    1. If a project is very new, there may not be 20-30 unique active compounds (where active means some measurable activity above the minimum threshold of the assay). In that case it is acceptable to run compounds more than once to get an acceptable sample size. For example, if there are only 10 active compounds then run each compound twice. However, when doing so, (a) it is important to biologically evaluate them as though they were different compounds, including the preparation of separate serial dilutions, and (b) label the compounds “a”, “b” etc. so that it is clear in the test-retest analyses which results are being compared across runs.
    2. Functional assays need to be compared for both potency (EC50) and efficacy (%maximum response). This may well require a few more compounds in those cases.
    3. In binding assays, it is best to compare Ki’s, and in functional antagonist assays it is best to compare Kb’s.
    4. An assay may pass the reproducibility assessment (Steps 1-3 in the procedure [Section D.2]), but fail the assay comparison study (Steps 4-5 in the procedure [Section D.2]). The assay comparison study may fail either because of an MR different from 1 or a high “MSR” in the assay comparison study. If it is the former, then there is a potency shift between the assays. You should assess the values in the assays to ascertain their validity (e.g. which assay’s results compare best to those reported in the literature?). If it fails because the “MSR” in the lab comparison study is too large (but the new assay passes the reproducibility study), then the old assay lacks reproducibility. In either case, if the problem is with the old assay, then the team should consider rerunning key compounds in the new assay to provide results comparable to compounds subsequently run in the new assay.

    High Variation in Single Concentration Determinations
    The table below can be used as a reference to determine the number of replicates necessary for assays with high variability. For a given CV of the raw data values based on 1 well, it shows the number of replicates needed for the CV of a mean to be less than or equal to 10 or 20%. This table does not indicate how the IC50/Ki/Kb variability will be affected (See Section E.2 for high variation in IC50/Ki/Kb responses).

    CV using 1 well    Number of Wells so that CV < 10%    Number of Wells so that CV < 20%
    < 10.0                          1                                   1
    10.0 to 14.1                    2                                   1
    14.2 to 17.3                    3                                   1
    17.4 to 20.0                    4                                   1
    20.1 to 22.3                    5                                   2
    22.4 to 24.4                    6                                   2
    24.5 to 26.4                    7                                   2
    26.5 to 28.2                    8                                   2
    28.3 to 30.0                    9                                   3
    30.1 to 31.6                   10                                   3
    31.7 to 33.1                   11                                   3
    33.2 to 34.6                   12                                   3
    34.7 to 36.0                   13                                   4
    36.1 to 37.4                   14                                   4
    37.5 to 38.7                   15                                   4
    38.8 to 40.0                   16                                   4

    Adding replicates to reduce variability will also reduce the capacity (i.e., throughput) of the assay to test compounds. Further optimization of the assay could reduce variability and maintain or increase its capacity. The decision to further optimize or add replicates will have to be made for each assay.
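    The table follows from the relation CV(mean of n wells) = CV(1 well)/√n. A sketch of the calculation (the function name is illustrative):

```python
import math

def wells_needed(cv_one_well, target_cv):
    """Replicate wells required so the CV of the mean is at or below
    target_cv, given the CV of a single well.

    The CV of a mean of n independent wells is cv_one_well / sqrt(n),
    so n must be at least (cv_one_well / target_cv) squared.
    """
    return max(1, math.ceil((cv_one_well / target_cv) ** 2))
```

    For example, wells_needed(25, 10) returns 7, matching the table row for a single-well CV of 24.5 to 26.4%.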

    If in Section D the assay fails either test (MSR ≥ 3 or Limits of Agreement outside the interval 0.33-3.0) then the variability of the assay is too high. The following options should be considered to reduce the assay variability:
    1. Optimize the assay to lower the variability in the signal of the raw data values (see Section V). Check that the dose range is appropriate for the compound results; adding doses and/or replicates may improve the results. A minimum of 8 doses at half-log intervals is recommended. In general, it is better to have more doses (up to 12) rather than more replicates.
    2. Consider adding replicates as discussed below. Note that the impact of adding replication may be minimal, and so the Replicate Experiment Study should be used to assess whether increasing the number of replicates will achieve the objective.
    3. Make repeat testing part of the standard protocol. For example, each compound may be tested once per run in 2 or more runs; averaging the results will then reduce the assay variability. (NB: in such cases the individual run results are stored in the database, and the data mining/query tools are then used to average the results.)

    To investigate the impact of adding replicate wells in the concentration-response assay you should conduct the Replicate-Experiment study with the maximum number of wells contemplated (typically 3-4 wells / concentration). To examine the impact of replication compute the MSR versus number-of-replicates curve. To construct this curve, make all data calculations using just the first replicate of each concentration to evaluate the MSR and Limits of Agreement for 1 well per concentration. Then repeat all calculations using the first two replicates per concentration, and so on until you are using all replicates. If the assay does not meet the acceptance criterion when all replicates are used then replication will not sufficiently impact the assay to warrant the replication. If it does meet the criterion using all replicates ascertain how many replicates are needed by noting the smallest number of replicates that are required to meet the Replicate-Experiment acceptance criterion. Two examples below will help illustrate the steps.
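    The MSR-versus-replicates calculation can be sketched as follows. The data layout and the fit_potency routine are assumptions for illustration, standing in for the project's own normalization and curve-fitting steps:

```python
import math

def msr_curve(data, max_reps, fit_potency):
    """For k = 1..max_reps, refit using only the first k wells per
    concentration and recompute the MSR.

    data[compound] is a pair of runs; each run maps concentration to
    the list of replicate well values from a Replicate-Experiment
    study run with the maximum contemplated replication.
    fit_potency is the project's curve-fitting routine: it takes a
    {concentration: [well values]} mapping and returns a potency
    (e.g., an IC50).
    """
    curve_msrs = {}
    for k in range(1, max_reps + 1):
        diffs = []
        for run1, run2 in data.values():
            pots = []
            for run in (run1, run2):
                # keep only the first k replicate wells at each dose
                trimmed = {c: wells[:k] for c, wells in run.items()}
                pots.append(fit_potency(trimmed))
            # difference in log-potency between the two runs
            diffs.append(math.log10(pots[0]) - math.log10(pots[1]))
        n = len(diffs)
        dbar = sum(diffs) / n
        s = math.sqrt(sum((x - dbar) ** 2 for x in diffs) / (n - 1))
        curve_msrs[k] = 10 ** (2 * s)
    return curve_msrs
```

    The smallest k whose MSR (and Limits of Agreement, computed the same way from dbar and s) meets the Replicate-Experiment acceptance criterion is the replication requirement.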

    A binding assay was run using 1 well per concentration and the Replicate-Experiment study did not meet the acceptance criterion. To examine whether replication would help, a new Replicate-Experiment study was conducted using 4 wells per concentration. Using just the first replicate from each concentration, the results were normalized, curves were fit, and Ki’s were calculated for each concentration-response curve. The MSR and LsA were evaluated. These calculation steps were repeated using the first 2 replicates, the first 3 replicates, and all 4 replicates, with the following results:

    Reps MSR LsA
    2 3.62 0.35-4.59
    3 3.32 0.43-4.74
    4 2.44 0.53-3.16

    From the table we can see that it takes all 4 replicates to meet the MSR acceptance criterion, and that no amount of replication (up to 4 replicates) will meet the LsA acceptance criterion.

    In a second study, involving a pair of uptake inhibition assays (the project had two targets, each measured by one assay), the Plate Uniformity Study indicated that two replicates would be required to meet the Plate Uniformity signal acceptance criteria in Assay 2. However, plate uniformity criteria concerning replication do not readily translate to dose-response requirements, and so the requirements were investigated in both assays. The Replicate-Experiment Study was conducted using two replicates. The calculations were performed using both replicates, and then re-calculated using just the first replicate. The MSR and LsA are summarized in the following table:

    Replicates Used       Assay 1             Assay 2
                        MSR    LsA          MSR    LsA
    Rep 1 Only          2.27   0.44-2.27    3.30   0.28-3.08
    Both Reps           1.71   0.57-1.67    2.15   0.44-2.03

    Using two replicates both assays meet all acceptance criteria. Using just a single replicate, Assay 1 still meets all criteria, while Assay 2 does not. Note that in this instance both assays benefited from increased replication. However, Assay 1 is a very tight assay, and hence this benefit is not really needed in that case. So in this example the replication requirements were the same for both single-dose screening and dose-response studies, but in general this will not be the case.

    Overview
    Sections C and D cover the validation of entirely new assays, or assays that are intended to replace existing assays. The replacement assays are “different” from the original assay, either because of facility changes, personnel differences, or substantively different detection and automation equipment. Assay upgrades and changes occur as a natural part of the assay life cycle. Requiring a full validation for every conceivable change is impractical and would serve as a barrier to implementing assay improvements. Hence full validation following every assay change is not recommended. Instead bridging studies or “mini-validation” studies are recommended to document that the change does not degrade the quality of the data generated by the new assay.

    The level of validation recommended has 3 tiers, ranging from a small plate uniformity study (Tier I), to just the assay comparison portion of the Replicate-Experiment study (Tier II), to the full validation package of Sections C and D (Tier III). Examples of changes within each tier are given below, along with the recommended validation study for that tier. Note that if the study indicates the change will have an adverse impact on assay quality (i.e. the study indicates there are problems), then the cause should be investigated and a full (Tier III) validation should be done. If the results from that study indicate the assays are not equivalent, but the new assay has to be implemented, then the results should not be combined into one data set.

    The following applies principally to changes in the biological components of the protocol. If changes are made to the data analysis protocol, these can ordinarily be validated without generating any new data, by comparing the results of the original and new data analysis protocols on a set of existing data. Discuss any changes with a statistician. If changes are made to both the data analysis and biological components of the protocol, then the appropriate tier should be selected according to the severity of the biological change, as discussed below. The data analysis changes should be validated on the new validation data; any additional validation work should be performed as judged by the statistician.

    Tier I modifications are single changes in an assay, such as a change to a reagent, instrumentation, or assay condition, made either to improve the assay quality or to increase capacity without changing the assay quality. The changes can also be made for reasons unrelated to assay throughput or performance (e.g. change of supplier for cost savings). Examples of such changes are:
    • Changes in detection instruments with similar or comparable optics and electronics. E.g.: plate readers, counting equipment, spectrophotometers. A performance check for signal dynamic range, and signal stability is recommended prior to switching instruments.
    • Changes in liquid handling equipment with similar or comparable volume dispensing capabilities. Volume calibration of the new instrument is recommended prior to switching instruments. [Note that plate and pipette tip materials can cause significant changes in derived results (IC50, EC50). This may be due to changes in the adsorption and wetting properties of the plastic material employed by vendors. Under these conditions a full validation may be required].

    The purpose of the validation study is to document the change does not reduce the assay quality.

    Protocol
    Conduct a 4-plate Plate Uniformity Study using the layouts in the “2 Plates per Day” tab of the Plate Uniformity Template (the layouts are the same as Plates 1 and 2 of Section C.2). Plates 1 and 2 should be done using the existing protocol, and Plates 3 and 4 using the new protocol, on the same day and with the same reagents and materials (except for the intentional change). Use the 2 Day / 2 Plates per Day template to conduct the analysis.

    Analysis
    The main analysis is a visual inspection of the “all plates” plots to ensure that the signals have not changed in either magnitude or variability. The mean and SD calculations for each plate can help, but visual inspection is usually sufficient.

    Example
    An assay was changed by replacing a manual pipetting step with a Multidrop instrument. A 4-plate Plate Uniformity study was run as per the protocol, with the manual pipetting done in plates 1 and 2, and the Multidrop in plates 3 and 4. The results show that the mean percent activity is the same, and that the Multidrop’s variability is superior (i.e., lower) to that of the manual pipetting.

    Tier I Validation study comparing manual pipetting (plates 1 and 2) versus Multidrop pipetting (plates 3 and 4) in GTPγS assay

    Tier II changes are more substantive than Tier I changes, and have greater potential to directly impact EC50/IC50 results. Examples of such changes are:
    • Changes in dilution protocols covering the same concentration range for the concentration–response curves. A bridging study is recommended when dilution protocol changes are required.
    • Lot changes of critical reagents such as a new lot of receptor membranes or a new lot of serum antibodies.
    • Assay moved to a new laboratory without major changes in instrumentation, using the same reagent lots, same operators and assay protocols.
    • Assay transfer to an associate or technician within the same laboratory having substantial experience in the assay platform, biology and pharmacology. No other changes are made to the assay.

    Protocol and Analysis
    Conduct the assay comparison portion of the Replicate-Experiment Study discussed in Section D, i.e. compare one run of 20-30 compounds using the existing assay protocol to one run under the proposed format, and compare the results. If the compound set used in the original validation is available, then one need only run the set again under the new assay protocol and compare back to Run 1 of the original Replicate-Experiment Study. The acceptance criterion is the same as for the assay comparison study: both Limits of Agreement should be between 0.33 and 3.0.

    Substantive changes requiring full assay validation: When substantive changes are made to the assay procedures, the measured signal responses, target pharmacology and control compound activity values may change significantly. Under these circumstances, the assay should be re-validated according to the methods described in Sections C and D. The following constitute substantive changes, particularly when multiple changes in the factors listed below are involved:
    • Changes in assay platform: e.g.: Filter binding to Fluorescence polarization for kinase assays.
    • Changes in assay reagents (including lot changes and supplier) that produce significant changes in assay response, pharmacology and control activity values. For example, changes in enzyme substrates, isozymes, cell-lines, label types, control compounds, calibration standards, (radiolabel vs. fluorescent label), plates, tips and bead types, major changes in buffer composition and pH, co-factors, metal ions, etc.
    • Transfer of the assay to a different laboratory location, with distinctly different instrumentation, QB practices or training.
    • Changes in detection instruments with significant difference in the optics and electronics. For example, plate readers, counting equipment, spectrophotometers.
    • Changes in liquid handling equipment with significant differences in volume dispensing capabilities.
    • Changes in liquid handling protocol with significant differences in volume dispensing methods.
    • Changes in assay conditions such as shaking, incubation time, or temperature that produce significant change in assay response, pharmacology and control activity values.
    • Major changes in dilution protocols involving mixed solvents, number of dilution steps and changes in concentration range for the concentration-response curves.
    • Change in the analyst/operator running the assay, particularly one who is new to the job and/or has no experience running the assay in its current format/assay platform.
    • Making more than one of the above-mentioned changes to the assay protocol at any one time.

    Substantive changes require full validation, i.e. a three day Plate Uniformity Study and Replicate Experiment Study. If the intent is to report the data together with the previous assay data then an assay comparison study should be conducted as part of the Replicate Experiment study.

    1. Sittampalam GS, Iversen PW, Boadt JA, Kahl SD, Bright S, Zock JM, Janzen WP and Lister MD: Design of Signal Windows in High Throughput Screening Assays for Drug Discovery. J Biomol Screen 1997;2:159-169.
    2. Zhang J-H, Chung TDY, Oldenburg KR: A simple statistical parameter for use in evaluation and validation of high throughput screening assays. J Biomol Screen 1999;4:67-73.
    3. Taylor PB, Stewart FP, Dunnington DJ, Quinn ST, Schulz CK, Vaidya KS, Kurali E, Tonia RL, Xiong WC, Sherrill TP, Snider JS, Terpstra ND, and Hertzberg RP: Automated Assay Optimization with Integrated Statistics and Smart Robotics. J. Biomol Screen 2000;5:213-225.
    4. Iversen PW, Eastwood BJ and Sittampalam GS: A Comparison of Assay Performance Measures in Screening Assays: Signal Window, Z’-Factor and Assay Variability Ratio. J Biomol Screen 2006;11:247-252.
    5. Bland JM, Altman DG: Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;I:307-310.
    6. Eastwood BJ, Farmen MW, Iversen PW, Craft TJ, Smallwood JK, Garbison KE, Delapp N, and Smith GF: The Minimum Significant Ratio: A Statistical Parameter to Characterize the Reproducibility of Potency Estimates from Concentration-Response Assays and Estimation by Replicate-Experiment Studies. J Biomol Screen 2006;11:253-261.
    7. Eastwood BJ, Chesterfield AK, Wolff MC, and Felder CC: Methods for the Design and Analysis of Replicate-Experiment Studies to Establish Assay Reproducibility and the Equivalence of Two Potency Assays. In Gad S (ed): Drug Discovery Handbook. John Wiley and Sons, New York, 2005, 667-688.