
Template Learning Models for Visual Target Detection in Fixed and Random Noise

 

Albert J. Ahumada, Jr. and Bettina L. Beard

 

NASA Ames Research Center, Moffett Field, California

 

Models for the detection of targets in random backgrounds usually include an observer template that approximately matches the target. When detection thresholds are measured in noise using a two-interval forced-choice paradigm, three randomization conditions have been used: fixed, twin, and random. In the fixed noise condition, a single noise sample is presented in both intervals of all trials. In the twin noise condition, the same noise sample is used in the two intervals of a trial, but a new sample is generated for each trial. In the random noise condition, a new sample is always used. Fixed noise conditions usually result in lower thresholds than twin noise, and twin noise usually results in lower thresholds than random noise. Our template learning models attribute the advantage of fixed over twin noise either (1) to the template learning process introducing more noise into the template in the twin noise condition, or (2) to templates reducing position uncertainty by incorporating the fixed noise. In our models, the template learning process contributes to the accelerating nonlinear increase in performance with signal amplitude at low signal-to-noise ratios.

Introduction

From the theory of signal detectability, the ideal observer trying to find a known signal in a sample of white Gaussian noise should cross-correlate a copy of that signal with the noise and report a detection if the correlation is greater than a criterion value (Peterson, Birdsall & Fox, 1954). The copy of the signal can be regarded as a matched filter or matched template. The first psychophysical application of this theory was visual target detection without external noise; the noise was assumed to be internal to the observer (Swets, Tanner & Birdsall, 1964). Early models developed to predict visual target detection in noise included a template, internal noise, and the external noise. The inefficiency of the observer relative to the ideal observer was attributed to two factors: (1) the strength of the internal noise relative to the external noise and (2) the relation of the observer template to the signal, called (somewhat inappropriately) sampling efficiency (Legge, Kersten & Burgess, 1987). Observer uncertainty about signal parameters also reduces observer efficiency (Pelli, 1985).

If the external noise is Gaussian, but not white, the ideal observer does not use a matched filter. The ideal template is a pre-whitened signal template applied to the pre-whitened signal-plus-noise. Results of some studies that have varied the spectrum of the external masking noise suggest that observers can adjust their templates when the noise spectrum changes (Myers, Barrett, Borgstrom, Patton & Seeley, 1985; Burgess, Li & Abbey, 1997; Burgess, 1999). Barrett, Yao, Rolland and Myers (1993) claim that the observer behavior is consistent with the notion that the observers compute the best linear discriminant function using Hotelling’s rule, but this claim seems to be disputed by the two papers cited above that found the apparent template adjustment result. For stationary Gaussian noise, the Hotelling rule performance is equivalent to that of an observer who pre-whitens the signals assuming the noise spectrum is the average of that on signal-plus-noise and the noise-only trials. To implement the Hotelling rule, the observer must construct estimates of the expected images on both signal-plus-noise trials and noise-only trials. Even if the Hotelling rule can predict the behavior in the long run, it does not provide any mechanism that would allow the observer to develop the discriminant function. Here we propose template learning models. Our models do not find the Hotelling solution, but their predictions might be similar if the adaptation in the visual system pre-processor provided sufficient pre-whitening (Eckstein, Watson & Ahumada, 1997).

 

Figure 1.

 

Figure 1 presents a schematic of a simple image detection model with template learning and positional uncertainty. Gray-shaded boxes represent functions that have yet to be included in the working mathematical model. An input image plus added noise enters the visual system. An internal visual representation is formed of the stimulus, which includes internal noise sources. This noisy visual representation is correlated with memory templates, a decision is made as to which stimulus was present, feedback is given, and the templates are updated.

The experimental result that led us to develop a template learning model is the improvement in detection performance when a fixed noise is used rather than random noise. For a Yes/No detection task, a fixed noise masker can be thought of as contributing additional internal noise (Ahumada, 1987). Burgess and Colborne (1988) used a two-alternative forced-choice detection procedure. They compared a random noise condition where both intervals had new noise samples with a twin noise condition where each trial had a new sample, but the same sample was used in both intervals. They interpreted the increased masking in the random condition over the twin condition as indicating the relative strengths of the internal and external noises. Their fixed template model, however, predicts that the twin condition masking would be the same as fixed noise masking. Ahumada and Beard (1997) and Beard and Ahumada (1999) found lower detection thresholds in the fixed noise condition than in the twin condition. In this paper we describe in a more general notation the template-matching models with adaptable templates introduced by Beard and Ahumada (1999). The models illustrate the possible role of template learning in the lowering of target detection thresholds in fixed noise conditions relative to twin noise conditions.

A Template Learning Model

The observable events in a two-alternative forced-choice experiment consist of a sequence of trials. On trial n there are two stimulus presentation intervals, indexed by j, and two stimuli, indexed by k. The observer responds by selecting the interval thought to have the target (arbitrarily indicated by k = 2). The experimenter then gives correct feedback.

Each trial also has unobservable events, the internal visual representation of the stimulus and the memory representations (templates) for each stimulus. For simplicity we assume there is only one memory representation for each type of stimulus. We represent these internal states by vectors. I(n, j) represents the internal visual representation of the stimulus presented in interval j of trial n. M(n, j, k) represents the memory representation for stimulus k after interval j, j = 0, 1, 2, of trial n. The j = 0 index gives a way of representing the state of the memory representation after the response and feedback of the preceding trial.

A vision model specifies how the internal states I are computed from the stimuli. Detection models specify how the response is generated from the internal states I and the templates M. Here we are concerned with the template learning procedure, which is the procedure for computing M as a function of the initial states M(0, 0, k) and the history of prior events. Template models have been described that combine a reasonable amount of complexity in both the vision and decision components (Eckstein et al., 1997; Lu & Dosher, 1999). We simplify both the vision model and the decision model in order to focus on the learning procedure.

In our simplified vision model, the internal visual representation vector I(n, j) is the sum of three vectors: V(k), an average internal visual representation of signal k; N(n, j), an internal representation of the external noise presented in interval j; and G(n, j), a random vector sample of internal noise,

I(n, j) = V(k) + N(n, j) + G(n, j). (1)

In our simplified decision model, the observer computes the inner product of each internal visual representation with both memory representations and selects the interval whose inner product difference is largest in favor of the target. Assuming the target is stimulus k = 2, the observer responds "interval 1" if

[M(n, 1, 2)*I(n, 1) - M(n, 1, 1)*I(n, 1)] > [M(n, 2, 2)*I(n, 2) - M(n, 2, 1)*I(n, 2)], (2a)

where the star symbol "*" indicates the vector inner product. This expression can be rewritten as

[M(n, 1, 2) - M(n, 1, 1)]*I(n, 1) > [M(n, 2, 2) - M(n, 2, 1)]*I(n, 2). (2b)
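As an illustration, the following is a minimal Python/numpy sketch of the simplified vision model of Equation (1) and the decision rule of Equation (2b); the function and variable names (make_representation, decide, s_G) are ours and not part of the model specification. The arguments M1_diff and M2_diff stand for the template differences M(n, 1, 2) - M(n, 1, 1) and M(n, 2, 2) - M(n, 2, 1).

import numpy as np

rng = np.random.default_rng(0)

def make_representation(V_k, N_nj, s_G):
    # Equation (1): I(n, j) = V(k) + N(n, j) + G(n, j), with G drawn fresh each interval
    return V_k + N_nj + rng.normal(0.0, s_G, size=V_k.shape)

def decide(M1_diff, M2_diff, I1, I2):
    # Equation (2b): respond "interval 1" if the interval-1 template difference
    # correlates more with I(n, 1) than the interval-2 template difference does with I(n, 2)
    return 1 if M1_diff @ I1 > M2_diff @ I2 else 2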

 

The template adjustment process generates a "matched filter" for each signal, a template typical of the internal representation of that signal. After each stimulus and feedback event the template is replaced with a weighted average of itself and the internal visual representation of the stimulus, using a learning rate parameter, r. Equation 3a represents the adjustment after the first stimulus; Equation 3b represents the effect of the second stimulus; and Equation 3c represents the effect of the response and feedback.

M(n, 1, k) = [1 - r(n, 0, 1, k)] M(n, 0, k) + r(n, 0, 1, k) I(n, 1), (3a)

M(n, 2, k) = [1 - r(n, 1, 2, k)] M(n, 1, k) + r(n, 1, 2, k) I(n, 2), (3b)

M(n+1, 0, k) = [1 - r(n, 2, 0, k, 1) - r(n, 2, 0, k, 2)] M(n, 2, k)

+ r(n, 2, 0, k, 1) I(n, 1) + r(n, 2, 0, k, 2) I(n, 2), (3c)

where r(n, 2, 0, k, 1), for example, is the weight for combining image I(n, 1) with template k after the response and feedback. The learning rates should depend on many factors, such as the similarity of M and I, the feedback on the trial, and the experience of the observer.
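To make the update rule concrete, a hedged sketch of the weighted-average step in Equations (3a)-(3c) follows; the learning rates are passed in by the caller rather than computed, since we leave r as an unspecified function of similarity, feedback, and experience.

import numpy as np

def update_template(M, images, rates):
    # M <- [1 - sum(rates)] M + sum_i rates[i] * images[i]
    # With a single image this is step (3a) or (3b); with both interval
    # images and their post-feedback rates it is step (3c).
    M_new = (1.0 - sum(rates)) * M
    for r_i, I_i in zip(rates, images):
        M_new = M_new + r_i * I_i
    return M_new

# For example, step (3a): M(n, 1, k) = [1 - r] M(n, 0, k) + r I(n, 1)
M0, I1 = np.zeros(16), np.ones(16)
M1 = update_template(M0, [I1], [0.5])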

Jakowatz, Shuey, and White (1961) show how matched templates can be learned without any feedback and without any prior knowledge of the signal. Here we show some results for two models that learn only from the feedback. The main difference between the two models is that no position uncertainty is assumed in the first model. This simplification allows closed-form expressions for the model performance and clarifies the role of the different parameters. In the first model we also assume that the observer remembers only the image from the second interval and updates the template that feedback associates with the second interval. This allows the model to predict a difference between fixed and twin conditions without position uncertainty in the model.

 

Model 1: A Template Learning Model without Positional Uncertainty 

Response generation rules

Removing template adjustment during the stimulus intervals removes the dependence of the template upon the interval within the trial. The response rule of Equation (2) is then to respond "interval 1" if

[M(n, 2) - M(n, 1)]*[I(n, 1) - I(n, 2)] > 0. (4)

The template adjustment rule of Equation (3) simplifies to

M(n+1, k) = [1 - r(n, k, 1) - r(n, k, 2)] M(n, k)

+ r(n, k, 1) I(n, 1) + r(n, k, 2) I(n, 2). (5)

For Model 1 we assume that only the interval 2 internal image is effective and learning is constant and perfectly consistent with the correct feedback. That is,

r(n, k, 2) = r, if stimulus k was in interval 2,

r(n, k, 2) = 0, otherwise. (6)
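A minimal per-trial sketch of Model 1 under this rule, written here for the twin noise condition (one external sample per trial, used in both intervals), is given below; the names model1_trial and k_in and the dictionaries indexed by stimulus are our own conventions, not part of the model.

import numpy as np

rng = np.random.default_rng(1)

def model1_trial(M, V, s_N, s_G, r):
    # M and V are dicts over stimulus index k in {1, 2}; k = 2 is the target.
    target_interval = int(rng.integers(1, 3))       # interval containing stimulus 2
    k_in = {target_interval: 2, 3 - target_interval: 1}
    N = rng.normal(0.0, s_N, V[1].shape)            # twin noise: same sample in both intervals
    I = {j: V[k_in[j]] + N + rng.normal(0.0, s_G, V[1].shape) for j in (1, 2)}
    D_M = M[2] - M[1]
    response = 1 if D_M @ I[1] > D_M @ I[2] else 2  # Equation (4)
    k2 = k_in[2]                                    # feedback: the stimulus shown in interval 2
    M[k2] = (1 - r) * M[k2] + r * I[2]              # Equations (5) and (6)
    return response == target_interval, M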

 

Asymptotic behavior of the model

The updated template is a weighted average of the stimulus and the internal noise on all the trials on which it was adjusted. M(0, k) is the initial value of template k. Let I(n(m, k), 2) be the image on the mth trial on which stimulus k was in the second interval; then the template M(n(m, k), k) at the end of the mth update is given by

M(n(m, k), k) = (1 - r)^m M(0, k) + r (1 - r)^(m-1) I(n(1, k), 2) + ... + r I(n(m, k), 2). (7)

From our simplified vision model, we have

I(n(m, k), 2) = V(k) + N(n(m, k), 2) + G(n(m, k), 2). (8)

For a fixed noise masking condition, the first two terms on the right are fixed and the third is random. For the random and twin masking conditions, the last two terms are random variables. Let F(k) be the fixed part of the stimulus for template k and X(n(m, k)) be the random part of the stimulus on update m. For large values of m, the contribution from M(0, k) becomes negligible and the template approaches

F(k) + Sum[ X(n(m-i, k)) r (1 - r)^i ], (9)

where the summation index i ranges from 0 to m-1.

If the X’s are independent, with mean 0 and covariance matrix S(X), the asymptotic template distribution has a mean of F(k) and a covariance matrix S(M) of

S(M) = S(X) r^2 / (1-(1- r)^2)

= S(X) r / (2 - r)

= S(X) / n(r). (10)

The quantity n(r) = (2 - r) / r can be regarded as the effective number of noise components averaged for a given r. For r = 1 it is 1; for r = 0.5 it is 3.
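The geometric weighting can be checked numerically; the short simulation below, with names of our own choosing, verifies that the asymptotic template variance for a single component approaches s(X)^2 / n(r), as in Equation (10).

import numpy as np

def n_eff(r):
    return (2.0 - r) / r                        # n(r) = (2 - r) / r

rng = np.random.default_rng(2)
r, s_X = 0.5, 1.0
M, samples = 0.0, []
for t in range(20000):
    M = (1 - r) * M + r * rng.normal(0.0, s_X)  # one-component template update
    if t > 100:                                 # past the burn-in, near the asymptote
        samples.append(M)
print(np.var(samples), s_X**2 / n_eff(r))       # both approximately 1/3 for r = 0.5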

 

Asymptotic detection performance of the model

The performance of the model is controlled by the quality of the templates. Let the difference of the templates be D(M, n) = M(n, 2) - M(n, 1) and the difference of the stimulus representations be D(I, n) = I(n, 1) - I(n, 2). The response rule of Equation (4) then becomes: respond "interval 1" if

D(M, n) * D(I, n) > 0. (11)

Because the randomness in D(M, n) comes from past trials, D(M, n) and D(I, n) are independent random vectors. The mean of D(M, n) is S = F(2) - F(1). The mean of D(I, n) is S when stimulus 2 is in interval 1 and -S when it is in interval 2. The mean of D(M, n)*D(I, n) is E = S*S when the signal is in the first interval and -E when it is in the second. The performance of the model can be characterized by

d' = (E - (-E))/SD[D(M, n)*D(I, n)] = 2 E/SD[D(M, n)*D(I, n)], (12)

where SD[] indicates the standard deviation. If the individual components of the representations are independent and have the same variance, the variance of D(M, n)*D(I, n) can be expressed as

SD[D(M, n)*D(I, n)]^2 = E ( s(M)^2 + s(I)^2) + c s(M)^2 s(I)^2 (13)

where the s’s are the standard deviations of the individual components and c is the number of components in the vector.

For the twin and fixed noise cases, the variance of a component of D(I, n) is given by

s(I)^2 = 2 s(G)^2, (14)

where s(G) is the standard deviation of an individual internal noise component. For the fixed noise case, the variance of a component of D(M, n) is given by

s(M)^2 = 2 s(G)^2 / n(r), (15)

while for the twin noise case, it is

s(M)^2 = 2 (s(N)^2 + s(G)^2) / n(r), (16)

where s(N) is the standard deviation of a component of the internal representation of the external noise. The predictions for the random noise condition are obtained by substituting the sum s(N)^2+s(G)^2 for s(G)^2 in the fixed noise predictions. Figure 2 plots the predicted performance in the twin and fixed conditions as a function of d’ for the ideal observer, limited only by the internal noise,

d’ = Sqrt[2 E / s(G)^2]. (17)

As the learning parameter r becomes very small, n(r) becomes very large, and the performance for both twin and fixed noise conditions becomes ideal. (It would take many updates to approach the asymptotic level.) The higher curve below the unity slope line shows predicted model performance in the fixed noise condition with the parameters c = 16 and r = 0.5. The lower curve shows the twin noise prediction for those same parameters and s(N) = 2 s(G).
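The closed-form predictions plotted in Figure 2 can be computed directly from Equations (12), (13), and (17); the sketch below assumes independent components with a common variance, and the function names are ours.

import numpy as np

def predicted_dprime(E, s_G, s_N, r, c, condition):
    # E = S*S, the energy of the template difference
    n_r = (2.0 - r) / r
    if condition == "fixed":
        s_I2, s_M2 = 2 * s_G**2, 2 * s_G**2 / n_r
    elif condition == "twin":
        s_I2, s_M2 = 2 * s_G**2, 2 * (s_N**2 + s_G**2) / n_r
    else:                                        # random: substitute s(N)^2 + s(G)^2 for s(G)^2
        s_I2 = 2 * (s_N**2 + s_G**2)
        s_M2 = 2 * (s_N**2 + s_G**2) / n_r
    var = E * (s_M2 + s_I2) + c * s_M2 * s_I2    # Equation (13)
    return 2 * E / np.sqrt(var)                  # Equation (12)

def ideal_dprime(E, s_G):
    return np.sqrt(2 * E / s_G**2)               # Equation (17)

# The Figure 2 parameters: c = 16, r = 0.5, s(N) = 2 s(G)
for E in (1.0, 4.0, 16.0):
    print(ideal_dprime(E, 1.0),
          predicted_dprime(E, 1.0, 2.0, 0.5, 16, "fixed"),
          predicted_dprime(E, 1.0, 2.0, 0.5, 16, "twin"))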

 

Figure 2.

 

Figure 2 illustrates another feature of the template learning model. The model predicts an accelerating nonlinearity of performance with signal level, generated by the poor quality of the templates at low signal levels (Tanner, 1961). Tanner inferred this effect when he failed to find a close correspondence between observer behavior and a stimulus uncertainty model as signal strength increased. Template learning thus joins signal uncertainty (Pelli, 1985) and the nonlinear transducer (Legge & Foley, 1980) as a quantified process for explaining nonlinear performance at low signal-to-noise ratios.

 

Model 2: A Template Learning Model with Position Uncertainty

Position uncertainty can be added to the above model by assuming that each memory template is cross-correlated with the input image representation over a range of positions. In general, these correlations are then weighted by an uncertainty spread function, and then the maximum weighted correlation and its position are found (Barth, Beard & Ahumada, 1999). We can revise Equation (2a) by replacing the inner product * with the operation (*) indicating the maximum of the weighted cross-correlations over positions. That is, the observer responds "interval 1" if

[M(n, 1, 2)(*)I(n, 1) - M(n, 1, 1)(*)I(n, 1)] >

[M(n, 2, 2)(*)I(n, 2) - M(n, 2, 1)(*)I(n, 2)]. (18)

This expression cannot be simplified as in Equation (2b) because the (*) operation is not linear. The memory template updating equation also needs modification. The template adjustment rule of Equation (5) can be rewritten as

M(n+1, k) = [1 - r(n, k, 1) - r(n, k, 2)] M(n, k)

+ r(n, k, 1) I*(n, 1, k) + r(n, k, 2) I*(n, 2, k), (19)

where I*(n, j, k) is the image shifted to the maximum weighted cross-correlation position of template k.
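One way to implement the (*) operation is sketched below; the arrays, positions, and weights are illustrative stand-ins for the uncertainty spread function described above.

import numpy as np

def max_weighted_xcorr(M, I, positions, weights):
    # Return the maximum weighted cross-correlation of template M with image I
    # over the candidate (row, column) offsets, and the offset that achieves it.
    best_val, best_pos = -np.inf, None
    h, w = M.shape
    for (dy, dx), wgt in zip(positions, weights):
        val = wgt * np.sum(M * I[dy:dy + h, dx:dx + w])
        if val > best_val:
            best_val, best_pos = val, (dy, dx)
    return best_val, best_pos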

For Model 2 we assume that for both intervals the internal image is equally effective and learning is constant and consistent with the correct feedback. That is,

r(n, k, j) = r, if stimulus k was in interval j,

r(n, k, j) = 0, otherwise. (20)

 

Model Simulations

In our simulation environment, we used one of two 4 x 4 pixel targets for the internal target V(2), either a simple 2 x 2 checkerboard or a 2 x 4 light bar above a 2 x 4 dark bar. The targets were centered in a 6 x 6 array of white noise. The position uncertainty weighting function was uniform over all 9 possible positions.
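The stimuli can be generated as in the following sketch; the exact check polarity and contrast values are our assumptions, chosen only to illustrate the layout of a 4 x 4 target centered in a 6 x 6 noise field with 9 candidate positions.

import numpy as np

def make_targets(amplitude=1.0):
    # 4 x 4 checkerboard of 2 x 2 checks, and a 2 x 4 light bar over a 2 x 4 dark bar
    check = amplitude * np.kron(np.array([[1.0, -1.0], [-1.0, 1.0]]), np.ones((2, 2)))
    bar = amplitude * np.vstack([np.ones((2, 4)), -np.ones((2, 4))])
    return {"checkerboard": check, "bar": bar}

def make_stimulus(target, s_N, rng):
    # Embed the 4 x 4 target at the center of a 6 x 6 white-noise array
    stim = rng.normal(0.0, s_N, size=(6, 6))
    stim[1:5, 1:5] += target
    return stim

positions = [(dy, dx) for dy in range(3) for dx in range(3)]  # the 9 possible positions
weights = [1.0] * len(positions)                              # uniform uncertainty weighting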

The initial templates M(0, k) in our model were ideal, the asymptotic templates learned with arbitrarily small learning rates and no position uncertainty. For fixed conditions the initial templates included the external noise sample, while for twin conditions the initial templates were the target and a zero image. For fixed noise, the fixed external component of the template noise helps lock in on the signal position. For twin noise, however, the external noise changes on each trial and effectively adds random noise to the templates whenever the target and no-target templates correlate best at different positions.

 

Figure 3.

 

To examine the effects of the learning rate and the ratio of internal to external noise on the fixed/twin effect, we estimated the 79% correct threshold level for the two conditions. Each threshold was estimated from six repetitions of 400 trials at 4 signal levels. Figure 3 shows the threshold difference (twin - fixed) for the two target patterns, checkerboard and bar. The abscissa is the ratio of the internal noise standard deviation to the external noise standard deviation, s(G)/s(N). The parameter is the learning rate. The horizontal lines indicate the average psychophysical values and the associated confidence interval from the data of Beard and Ahumada (1999). The largest fixed/twin noise difference occurred for the smaller internal noise level, since the external noise causes the effect. Of the conditions we simulated, the parameters whose results best matched the data were r = 0.0 or 0.1 and s(G)/s(N) = 1. Even when the model has a learning rate of zero, learning is still needed to predict the result, since the initial templates reflect learning of the fixed noise sample. In a separate simulation, we estimated the fixed/twin threshold difference with a learning rate of zero and the initial templates set to the target and zero noise images for both conditions. In this case, the threshold difference disappeared.
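The text does not specify how thresholds were interpolated from the percent-correct data, so the sketch below simply reads off the 79% point by linear interpolation in log signal amplitude; it is one plausible procedure, not necessarily the one used.

import numpy as np

def threshold_79(signal_levels, pct_correct, target=0.79):
    # Interpolate the signal amplitude giving the target proportion correct;
    # assumes pct_correct increases with signal level.
    x = np.log(np.asarray(signal_levels, dtype=float))
    y = np.asarray(pct_correct, dtype=float)
    return float(np.exp(np.interp(target, y, x)))

# e.g. percent correct pooled over six repetitions of 400 trials at 4 levels
print(threshold_79([0.5, 1.0, 2.0, 4.0], [0.55, 0.68, 0.82, 0.95]))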

Template matching models (Barrett, 1992; Burgess, 1994), which correlate visual representations of the images with one or more memory templates, can predict target detectability in random noise masking conditions, where a different noise sample is added to each input image. If the same template is used in fixed and twin noise conditions, however, they do not predict a fixed/twin noise masking difference. Image discrimination models cannot predict either a fixed/twin difference or random noise effects (since the difference calculation would include not only the target, but also any changes in the noise). Model 2 demonstrates that combining template learning with positional uncertainty can explain the fixed/twin noise threshold difference. Model 1 shows how template learning alone could explain the fixed/twin effect, and how template learning contributes to the accelerating nonlinearity associated with uncertainty and transducer functions (Pelli, 1985; Legge & Foley, 1980).

 

Acknowledgments

This work was supported in part by NASA Grant 199-06-39, NASA Aeronautics RTOP #505-64-53, and NASA Cooperative Agreement NCC2-327 with the San Jose State University Foundation.

 

References

Ahumada, A. J., "Putting the noise of the visual system back in the picture," Journal of the Optical Society of America A, 4, 2372-2378 (1987).

Ahumada, A. J., B. L. Beard, "Image discrimination models predict detection in fixed but not random noise," Journal of the Optical Society of America A, 14, 2471-2476 (1997).

Barrett, H. H., J. Yao, J. P. Rolland, K. J. Myers, "Model observers for assessment of image quality," Proceedings of the National Academy of Sciences, U. S. A., 90, 9758-9765 (1993).

Beard, B. L., A. J. Ahumada, "Detection in fixed and random noise in foveal and parafoveal vision explained by template learning," Journal of the Optical Society of America A, 16, 755-763 (1999).

Burgess, A. E., "Visual signal detection with two-component noise: lowpass spectrum effects," Journal of the Optical Society of America A, 16, 694-704 (1999).

Burgess, A. E., B. Colborne, "Visual signal detection: IV. Observer inconsistency," Journal of the Optical Society of America A, 5, 617-628 (1988).

Burgess, A. E., X. Li, C. K. Abbey, "Visual signal detectability with two noise components: anomalous masking effects," Journal of the Optical Society of America A, 14, 2420-2442 (1997).

Eckstein, M., A. B. Watson, A. J. Ahumada, "Visual signal detection in structured backgrounds: II. Effects of contrast gain control, background variations, and white noise," Journal of the Optical Society of America A, 14, 2406-2419 (1997).

Jakowatz, C. V., R. L. Shuey, G. M. White, "Adaptive waveform recognition," Symposium on 'Information Theory' held at the Royal Institute, London, C. Cherry, ed., (Aug. 29-Sept 2, 1961).

Legge, G. E., J. M. Foley, "Contrast masking in human vision," Journal of the Optical Society of America, 70, 1458-1471 (1980).

Legge, G. E., D. Kersten, and A. E. Burgess, "Contrast discrimination in noise," Journal of the Optical Society of America A, 4, 381-404 (1987).

Lu, Z. L., Dosher, B. A., "Characterizing human perceptual inefficiencies with equivalent internal noise," Journal of the Optical Society of America A, 16, 764-778 (1999).

Myers, K. J., H. H. Barrett, M. C. Borgstrom, D. D. Patton, G. W. Seeley, "Effect of noise correlation on detectability of disk signals in medical imaging," Journal of the Optical Society of America A, 2, 1752-1759 (1985).

Pelli, D. G., "Uncertainty explains many aspects of visual contrast detection and discrimination," Journal of the Optical Society of America A, 2, 1508-1532 (1985).

Peterson, W. W., T. G. Birdsall, and W. C. Fox, "The theory of signal detectability," Transactions of the IRE Group on Information Theory, PGIT-4, 171-212 (1954).

Swets, J. A., W. P. Tanner, T. G. Birdsall, "Decision processes in perception," in J. A. Swets, ed., Signal Detection and Recognition by Human Observers, John Wiley and Sons: New York (1964).

Tanner, W. P., "Physiological implications of psychophysical data," Annals of the New York Academy of Science, 89, 752-765 (1961), reprinted in J. A. Swets, ed., Signal Detection and Recognition by Human Observers, John Wiley and Sons: New York (1964).