A common assumption as far as statistical modeling and their predictive powers is that as databases grow in size and scope and available computing power increases, the models will become more and more accurate at predicting choice, behavior, risk, etc.
The principle of parsimony is the statistical transliteration of Ockham’s razor, and as such sets the limit of statistical accuracy. In other words, beyond a certain limit, adding assumptions to a model by way of additional attributes (individual variables) does not improve the accuracy of its predictions.
Keeping it simple has just as much predictive validity and is more viable in the long run because it assumes less uncertainty, and unless the fundamental mathematical paradigm that gives statistical theory its basis undergoes a major shift that improves accuracy, the models cannot significantly increase their predictive capacity any time in the near future.
I think this is a clear indication that transdisciplinary communication by data miners and trained statisticians works to the advantage of the public good precisely because it raise awareness about the nature of these methods that influence so many aspects of our lives – most notable of those being marketing and sales.
Mark Twain’s popularization of the adage that there are “three kinds of lies: lies, damn lies, and statistics” is of particular importance in this age of digitized information. Statistics and numerical predictions can be manipulated to produce almost any desired result, though many times at a steep price.
Greater public awareness about the use and abuse of statistics is absolutely necessary if we are to control the numbers and avoid letting the numbers control us.