From Santa Fe Institute Events Wiki
Statistical Inference for Complex Networks Workshop, December 3-5, 2008, Santa Fe, NM
Thursday, December 4, 2008
9:00 - 10:00 Cosma Shalizi
Selecting Among Stochastic Models
When several different models are proposed for the same data set, it is generally necessary to pick one of them, at least tentatively, as the superior description of the data. Doing so reliably is the problem of model selection. Beginning with a reminder of why the obvious approach --- pick the model which fits best --- is often a Bad Idea, I describe the major approaches: penalization methods (including information-criterion penalties), cross-validation, formal theories of capacity control (covering numbers, VC dimension, and all that), sieves and encompassing, non-nested hypothesis testing, and model adequacy testing. I also discuss model averaging and ensemble techniques, which avoid picking a single best model, and why they are not (always) recipes for over-fitting.
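The contrast the abstract draws --- raw goodness-of-fit versus penalization and cross-validation --- can be illustrated with a small sketch (not part of the talk; the synthetic data, variable names, and the choice of polynomial regression are illustrative assumptions). In-sample error always falls as the model grows, so "pick the best fit" selects the largest model; a Gaussian AIC penalty and 5-fold cross-validation both push back toward something smaller:

```python
import numpy as np

# Synthetic data: the true curve is quadratic, so degree 2 is the "right" model.
rng = np.random.default_rng(0)
n = 60
x = np.linspace(-1.0, 1.0, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0.0, 0.3, n)

def rss(x_tr, y_tr, x_te, y_te, degree):
    """Fit a polynomial by least squares; return held-out residual sum of squares."""
    coef = np.polyfit(x_tr, y_tr, degree)
    resid = y_te - np.polyval(coef, x_te)
    return float(resid @ resid)

degrees = range(1, 9)
folds = np.array_split(rng.permutation(n), 5)   # shuffled 5-fold split
scores = {}
for d in degrees:
    fit = rss(x, y, x, y, d)                    # in-sample fit: never worsens as d grows
    k = d + 2                                   # free parameters: coefficients + noise variance
    aic = n * np.log(fit / n) + 2 * k           # Gaussian AIC, up to an additive constant
    cv = 0.0                                    # 5-fold cross-validated error
    for te in folds:
        tr = np.setdiff1d(np.arange(n), te)
        cv += rss(x[tr], y[tr], x[te], y[te], d)
    scores[d] = (fit, aic, cv)

best_fit = min(degrees, key=lambda d: scores[d][0])   # raw fit: picks the biggest model
best_aic = min(degrees, key=lambda d: scores[d][1])   # penalized fit: a smaller model
best_cv = min(degrees, key=lambda d: scores[d][2])    # out-of-sample fit: a smaller model
```

Here `best_fit` is always the largest degree tried, while the AIC and cross-validation choices land on a more modest model; which of the two one prefers, and when they agree, is exactly the territory the talk surveys.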