Parsimony & predictive ability: Competing values?

A common assumption about statistical models and their predictive power is that, as databases grow in size and scope and available computing power increases, the models will become more and more accurate at predicting choice, behavior, risk, etc.

But this blogger, a data miner by trade, points out not only that this assumption is false, but also that data mining has an implicit theoretical limit.

The principle of parsimony is the statistical counterpart of Ockham’s razor, and as such it sets a limit on statistical accuracy. In other words, beyond a certain point, adding assumptions to a model by way of additional attributes (individual variables) does not improve the accuracy of its predictions.
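To make the point about additional attributes concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own, not the blogger's method: the data are simulated, the outcome depends on a single real attribute, and the helper names (`make_data`, `fit_ols`, `mse`) are hypothetical. Ordinary least squares is fitted with more and more irrelevant attributes, and the error on the training data is compared with the error on fresh data.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) w = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def mse(X, y, w):
    """Mean squared prediction error of coefficients w on (X, y)."""
    preds = [sum(wi * xi for wi, xi in zip(w, row)) for row in X]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)

def make_data(n, n_noise, rng):
    """Simulated data (an assumption): y depends on one attribute only."""
    X, y = [], []
    for _ in range(n):
        signal = rng.gauss(0, 1)
        noise_cols = [rng.gauss(0, 1) for _ in range(n_noise)]
        X.append([1.0, signal] + noise_cols)      # intercept, real attribute, junk
        y.append(2.0 * signal + rng.gauss(0, 1))  # junk columns play no role
    return X, y

rng = random.Random(0)
N_NOISE = 20
X_train, y_train = make_data(40, N_NOISE, rng)
X_test, y_test = make_data(200, N_NOISE, rng)

results = []
for k in range(0, N_NOISE + 1, 5):
    cols = 2 + k  # intercept + real attribute + k irrelevant attributes
    Xtr = [row[:cols] for row in X_train]
    Xte = [row[:cols] for row in X_test]
    w = fit_ols(Xtr, y_train)
    results.append((k, mse(Xtr, y_train, w), mse(Xte, y_test, w)))

for k, train_err, test_err in results:
    print(f"{k:2d} irrelevant attributes: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The training error can only go down as attributes are added, which is what makes a bigger model look better. The error on fresh data is the one that matters for prediction, and in a setup like this it typically stops improving, or gets worse, once the extra attributes carry no real information.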

Keeping it simple has just as much predictive validity and is more viable in the long run, because a simpler model carries fewer assumptions and therefore less uncertainty. Unless the fundamental mathematical paradigm underlying statistical theory undergoes a major shift that improves accuracy, the models cannot significantly increase their predictive capacity any time in the near future.

I think this is a clear indication that transdisciplinary communication between data miners and trained statisticians works to the advantage of the public good, precisely because it raises awareness about the nature of these methods, which influence so many aspects of our lives, most notably marketing and sales.

The adage popularized by Mark Twain, that there are “three kinds of lies: lies, damned lies, and statistics,” is of particular importance in this age of digitized information. Statistics and numerical predictions can be manipulated to produce almost any desired result, though often at a steep price.

Greater public awareness about the use and abuse of statistics is absolutely necessary if we are to control the numbers and avoid letting the numbers control us.

This entry was posted in Metrics, Transdisciplinarity.
