Statistical Engineering Division
Seminar Series
Data Mining with Stepwise Regression
Dean Foster
Statistics Department
The Wharton School
University of Pennsylvania
This talk discusses using stepwise regression to fit models to very large
datasets. The goal is to show that stepwise regression
can be an easy and effective tool for mining large datasets. The
only prior knowledge I'll assume is that you are comfortable
using linear regression.
I will discuss some of the pitfalls that exist in stepwise regression
and how they can be avoided. To avoid these pitfalls, three
modifications to standard regression are required: (1) use
interactions to capture non-linearities, (2) use Bonferroni to pick
variables to include, and (3) use the sandwich estimator to get robust
standard errors. The talk will explain what each of these three
modifications is and why it is necessary. If all three
are done, we end up with a procedure that can be used on almost any
dataset.
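
The three modifications can be illustrated with a minimal sketch. It is not the
speaker's implementation. The following are assumptions made for illustration:
the synthetic data, the function names (sandwich_t, expand_with_interactions,
bonferroni_stepwise), the HC0 (White) form of the sandwich estimator, the normal
approximation for p-values, and the inclusion threshold alpha / m, where m is the
number of candidate variables.

```python
import itertools
import math

import numpy as np


def sandwich_t(X, y):
    """OLS fit; returns coefficients and t-statistics based on
    HC0 (White) sandwich standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: X' diag(e_i^2) X
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, beta / np.sqrt(np.diag(cov))


def expand_with_interactions(X):
    """Candidate set: main effects plus all pairwise products."""
    p = X.shape[1]
    names = [f"x{j}" for j in range(p)]
    cols = [X[:, j] for j in range(p)]
    for i, j in itertools.combinations(range(p), 2):
        names.append(f"x{i}*x{j}")
        cols.append(X[:, i] * X[:, j])
    return np.column_stack(cols), names


def bonferroni_stepwise(Z, y, alpha=0.05):
    """Forward selection: add the candidate with the smallest robust
    p-value, as long as it is below the Bonferroni threshold alpha / m."""
    n, m = Z.shape
    threshold = alpha / m
    intercept = np.ones((n, 1))
    selected = []
    while len(selected) < m:
        best_j, best_p = None, 1.0
        for j in range(m):
            if j in selected:
                continue
            X = np.hstack([intercept, Z[:, selected + [j]]])
            _, t = sandwich_t(X, y)
            # Two-sided p-value for the new variable (normal approximation).
            p_val = math.erfc(abs(t[-1]) / math.sqrt(2))
            if p_val < best_p:
                best_j, best_p = j, p_val
        if best_j is None or best_p >= threshold:
            break
        selected.append(best_j)
    return selected


# Synthetic example (assumed): one linear effect and one interaction.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(500, 10))
y = 2.0 * X_raw[:, 0] - 3.0 * X_raw[:, 1] * X_raw[:, 2] + rng.normal(size=500)

Z, names = expand_with_interactions(X_raw)
chosen = [names[j] for j in bonferroni_stepwise(Z, y)]
print(chosen)
```

Because the threshold shrinks as the number of candidates grows, adding many
interaction terms does not by itself inflate the chance of including a spurious
variable; the cost is reduced power to detect weak effects.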
The papers behind this talk can be found at my web site
http://diskworld.wharton.upenn.edu/.
Much of this talk is drawn from the paper on predicting bankruptcy.
NIST Contact:
Walter Liggett, x-2851.