
Statistically-Based Validation of Computer Simulation Models in Traffic Operations and Management - Rejoinder

Jerome Sacks

Nagui M. Rouphail

Byungkyu (Brian) Park

Piyushimita (Vonu) Thakuriah

We thank the discussants for their comments. Their points are well taken and add focus to some key issues in planning or carrying out a validation.

COMMENTS AND QUESTIONS OF RILETT AND SPIEGELMAN

Rilett and Spiegelman raise the question of whether a junior engineer can adopt and implement our approach for validation. Our answer is yes. It would be a terrific learning experience, as it was for us!

Data Quality

They ask about the effect of data quality, that is, the accuracy of the input data. This is a subject of great importance, wide open for study, and about which little has been done. An exception is Bayarri, Berger, and Molina (referred to in the Analysis of Uncertainty section), who have incorporated observer error in the manually collected data into their analysis.

Transferability

Is the strategy/methodology transferable to other networks? We think so, provided we stick to urban street networks with few pedestrians. Clearly, attention must be paid to driver behavior. As noted by Rilett and Spiegelman, it is typical to tune driver behavior input distributions. We have been deliberately cautious about doing so for fear of “over-tuning.” We did tune in two instances: we changed the geometry by creating a sink and source to work around CORSIM’s inability to cope with a congested intersection, and we adjusted the free-flow speed parameter on one corridor to conform to actual field conditions. In a third instance, we noted that drivers at one intersection utilized more green time than ostensibly displayed; we would incorporate this change if we were to proceed to a third stage.

A new network may require similar tuning. Our recommendation would be to do so very carefully, in a limited way and only after identifying the specific flaws that can be overcome with defensible tuning. We are some distance away from making this formal, but we are concerned that overly ambitious tuning masks flaws and can fail to account for natural variations.

Accuracy of Predictions

The discussants are correct that more than one accurate prediction is needed to assess predictive validity, lest the evaluation suffer from the Nostradamus or Babe Ruth effect (dubious though legendary). And somebody has to keep score. Our hope is that this is taken seriously and made part of any program that pursues the establishment and use of simulation models.

Variations in Demand

We agree with Rilett and Spiegelman that major changes in signal plans on large networks could well affect demand rates and lead to unexpected system characteristics. However, dramatic changes in an urban context are unlikely in the short run without major changes to the network geometry, at which point a new context must be faced. We would not advocate predicting characteristics under such new conditions on the basis of old demand rates.

Any changes in turning movements (we did not note any exceptional ones) after implementation of the new plan in September 2000 could not have been the result of adaptation to a new signal plan—the plan was in effect for less than 24 hours before data were collected.

Variations in demand rates and turning percentages are being accounted for in the current work of Bayarri, Berger, and Molina (cited above). Their results quantify a decrease in system performance and an increase in the variability of performance.

Further study of the system under scenarios of different demand (e.g., changing the input data by fixed percentages) could be done; how to make meaningful changes to turning percentages is less obvious. Simulators (unlike CORSIM) that induce routes based on origin-destination information may be more amenable to such study, but these are issues further down the road.
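As a rough illustration of what such a scenario study might look like, the sketch below (in Python, with hypothetical entry-link names and volumes; it is not tied to CORSIM's input format) scales observed entry volumes by fixed percentages while leaving turning percentages untouched.

    # A minimal sketch of building fixed-percentage demand scenarios.
    # Entry-link names and volumes are hypothetical, not taken from the study network.

    observed_volumes = {"NB_entry": 820, "SB_entry": 760, "EB_entry": 540}  # veh/h

    def scale_demand(volumes, factor):
        """Scale every entry-link volume by a fixed factor (e.g., 1.10 means +10%)."""
        return {link: round(v * factor) for link, v in volumes.items()}

    # Scenarios at -10%, 0%, +10%, and +20% of observed demand.
    scenarios = {f"{round((f - 1) * 100):+d}%": scale_demand(observed_volumes, f)
                 for f in (0.90, 1.00, 1.10, 1.20)}

    for label, volumes in scenarios.items():
        print(label, volumes)

    # Turning percentages are held fixed here; as noted above, perturbing them in a
    # meaningful way is less obvious, since each intersection's splits must still sum to one.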

DISCUSSION BY MAX MORRIS

Morris points out the complexity inherent in pursuing a validation strategy that can distinguish among the multiple sources of uncertainty and their effect on validation goals. This can be done, as Morris notes, by a Bayesian formulation and analysis and has recently been carried out by a team of researchers at the National Institute of Statistical Sciences in an application to a deterministic computer model. The application to stochastic simulators such as CORSIM is, in principle, doable; the actual implementation will have considerable complexity and has not yet been done.

Morris notes that the effect of misspecification or inadvertent omission of details in the model can induce a bias that should be accounted for. This can be done by adapting the Bayesian formulation of calibration in Kennedy and O’Hagan (2001) to the current situation, modeling the field data as simulator + bias + measurement error and modeling the bias. How to incorporate issues of variability in the model output vis-à-vis the variability in the field is not clear without more extensive, and expensive, field data.
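To sketch that decomposition in symbols (the notation below is ours and only summarizes the Kennedy and O'Hagan formulation, not a complete specification of their model), the field measurement y at input conditions x would be modeled as

    % Sketch of the Kennedy-O'Hagan style decomposition (our notation):
    %   y(x): field measurement at conditions x
    %   \eta(x,\theta): simulator output at x with calibration inputs \theta
    %   \delta(x): bias (model discrepancy), typically given a Gaussian-process prior
    %   \varepsilon: measurement error
    \[
      y(x) = \eta(x,\theta) + \delta(x) + \varepsilon,
      \qquad \varepsilon \sim N(0,\sigma^{2}).
    \]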

We agree with Morris that a more extensive data collection and study would be needed to assure validity under different contexts such as different time periods, days of the week, or weather conditions. This is also a point raised by Rilett and Spiegelman. In reality, validation must be an ongoing, and perhaps never-ending, process interacting with model development. At any point in time, we ought to be able to quantify the reliability of the model.

To conclude, we are gratified that the discussants agree with us about the value and importance of pursuing the multiple issues inherent to validation. We are possibly less skeptical than Rilett and Spiegelman about the utility of our approach in practice, but we seem to be in agreement with them and with Morris about what has to be done.

REFERENCE

Kennedy, M.C. and A. O’Hagan. 2001. Bayesian Calibration of Computer Models. Journal of the Royal Statistical Society B 63:425–64.