U.S. Department of Commerce

Research Reports

The Overfitting Principles Supporting AIC

David F. Findley

RR 93/04

ABSTRACT

In the context of statistical model estimation and selection, what is "overfit"? What is "overparameterization"? When is a "principle of parsimony" appropriate? Suggestive answers are usually given to such questions rather than precise definitions and mathematical statistical results. In this article, we investigate some relations that yield asymptotic equality between a variate which is the natural measure of overfit due to parameter estimation and one which is a natural measure of the accuracy loss that occurs when the estimated model is applied to an independent replicate of the data used for estimation. Relations connecting overfit with accuracy loss are what we call overfitting principles. The principles we consider yield a theoretical framework in which questions like those posed above can be answered with some precision and with allowance for the possibility that the model family does not contain the true model. One of the relations is shown to be conditionally equivalent to the bias-correction property used by Akaike to motivate the definition of AIC. Our results establishing this principle also provide the first complete verifications of AIC's bias-correction property for general exponential families for i.i.d. data and for invertible Gaussian ARMA time series models.
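As a rough illustration of the kind of relation the abstract describes, the sketch below simulates a toy case that is not taken from the report: k groups of unit-variance Gaussian observations whose group means are estimated by maximum likelihood. All names here (`neg2_loglik`, `simulate`, and the parameters `k`, `m`, `reps`, `seed`) are hypothetical, and the setup assumes the model family contains the true model, which is narrower than the setting the report allows. "Overfit" is taken to be the drop in -2 log-likelihood on the estimation sample when the true means are replaced by the fitted ones, and "accuracy loss" the rise in -2 log-likelihood on an independent replicate when the fitted means replace the true ones. In this toy case both averages should come out near k, so their sum is near 2k, the penalty term in AIC.

```python
import random


def neg2_loglik(data, means):
    """-2 log-likelihood, up to an additive constant, for unit-variance Gaussian groups."""
    return sum((y - mu) ** 2 for group, mu in zip(data, means) for y in group)


def simulate(k=3, m=20, reps=2000, seed=0):
    """Return the Monte Carlo averages of (overfit, accuracy loss).

    Assumptions of this sketch (not from the report): k groups, m observations
    per group, known unit variance, and true group means all equal to zero.
    """
    rng = random.Random(seed)
    true_means = [0.0] * k
    overfit_sum = 0.0
    loss_sum = 0.0
    for _ in range(reps):
        # Estimation sample and an independent replicate from the same distribution.
        sample = [[rng.gauss(mu, 1.0) for _ in range(m)] for mu in true_means]
        replicate = [[rng.gauss(mu, 1.0) for _ in range(m)] for mu in true_means]
        # Maximum likelihood estimates of the group means.
        fitted = [sum(group) / m for group in sample]
        # Overfit: how much better the fitted means look than the true means
        # on the data used for estimation.
        overfit_sum += neg2_loglik(sample, true_means) - neg2_loglik(sample, fitted)
        # Accuracy loss: how much worse the fitted means do than the true means
        # on the independent replicate.
        loss_sum += neg2_loglik(replicate, fitted) - neg2_loglik(replicate, true_means)
    return overfit_sum / reps, loss_sum / reps
```

Calling `simulate()` returns the two averages for k = 3; changing `k` shows how both scale with the number of estimated parameters. The design is deliberately minimal: a known variance keeps the two quantities exactly equal in expectation in this toy model, whereas the report concerns asymptotic relations in more general families.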
Source: U.S. Census Bureau | Statistical Research Division | (301) 763-3215 (or chad.eric.russell@census.gov) |   Last Revised: October 08, 2010