
Statistical Engineering Division
Seminar Series

Data Mining with Stepwise Regression

Dean Foster
Statistics Department
The Wharton School
University of Pennsylvania

This talk will discuss using stepwise regression to fit models to very large datasets. The goal is to show that stepwise regression can be an easy and effective tool for data mining. The only prior knowledge I'll assume is that you are comfortable with linear regression.

I will discuss some of the pitfalls of stepwise regression and how they can be avoided. Avoiding them requires three modifications to standard stepwise regression: (1) use interactions to capture non-linearities, (2) use a Bonferroni criterion to pick which variables to include, and (3) use the sandwich estimator to get robust standard errors. The talk will explain what each of these modifications is and why it is necessary. With all three in place, we end up with a procedure that can be used on almost any dataset.
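To make the three modifications concrete, here is a minimal sketch in Python (standard library only) of how they might fit together. It assumes plain forward selection, a candidate pool of main effects plus all pairwise products and squares, HC0 sandwich standard errors, and a two-sided normal cut-off at alpha/m, where m is the number of candidates. The function names, the 0.05 level, and the toy data are hypothetical illustrations, not the procedure from the bankruptcy paper.

```python
import random
from itertools import combinations_with_replacement
from statistics import NormalDist


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("singular matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_with_sandwich(cols, y):
    """OLS of y on [1, cols...]; returns coefficients and HC0 sandwich SEs."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(X[i][a] * beta[a] for a in range(k)) for i in range(n)]
    # "Bread": (X'X)^{-1}, built one column at a time.
    inv_cols = [solve(xtx, [1.0 if r == j else 0.0 for r in range(k)]) for j in range(k)]
    bread = [[inv_cols[j][i] for j in range(k)] for i in range(k)]
    # "Meat": X' diag(e_i^2) X, which lets the error variance differ by row.
    meat = [[sum(X[i][a] * X[i][b] * resid[i] ** 2 for i in range(n))
             for b in range(k)] for a in range(k)]
    # Only the diagonal of bread @ meat @ bread is needed for standard errors.
    se = []
    for j in range(k):
        v = sum(bread[j][a] * meat[a][b] * bread[b][j] for a in range(k) for b in range(k))
        se.append(max(v, 0.0) ** 0.5)
    return beta, se


def candidate_features(X):
    """Main effects plus all pairwise products, including squares."""
    feats = dict(X)
    for a, b in combinations_with_replacement(list(X), 2):
        feats[f"{a}*{b}"] = [u * v for u, v in zip(X[a], X[b])]
    return feats


def stepwise_bonferroni(X, y, alpha=0.05):
    """Forward selection: add the best candidate while its robust |z| beats
    the Bonferroni cut-off; stop as soon as none does."""
    feats = candidate_features(X)
    m = len(feats)
    z_crit = NormalDist().inv_cdf(1 - alpha / (2 * m))
    chosen = []
    while True:
        best = None
        for name, col in feats.items():
            if name in chosen:
                continue
            try:
                beta, se = fit_with_sandwich([feats[c] for c in chosen] + [col], y)
            except ValueError:
                continue  # candidate is collinear with the current model
            if se[-1] == 0:
                continue
            z = abs(beta[-1] / se[-1])
            if best is None or z > best[1]:
                best = (name, z)
        if best is None or best[1] <= z_crit:
            return chosen
        chosen.append(best[0])


def simulate_example(n=200, seed=0):
    """Hypothetical toy data: y depends on the interaction x1*x2 and on x3."""
    rng = random.Random(seed)
    X = {name: [rng.gauss(0, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
    y = [2 * X["x1"][i] * X["x2"][i] + X["x3"][i] + rng.gauss(0, 0.5) for i in range(n)]
    return X, y
```

Calling `stepwise_bonferroni(*simulate_example())` should pick up the `x1*x2` interaction first and then `x3`, even though neither `x1` nor `x2` on its own is related to `y`. Without the interaction candidates (modification 1), that structure would be invisible to the search. The Bonferroni cut-off grows with the size of the candidate pool, which is what keeps the search from chasing noise when there are many candidates.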

The papers behind this talk can be found at my website, http://diskworld.wharton.upenn.edu/. Much of the material comes from the paper on predicting bankruptcy.

NIST Contact: Walter Liggett, x-2851.

Date created: 1/14/2002
Last updated: 1/14/2002
Please email comments on this WWW page to sedwww@nist.gov.