Abstract
This paper introduces and analyzes a procedure called Testing-Based Forward Model Selection (TBFMS) for linear regression problems. The procedure inductively selects covariates that add predictive power to a working statistical model before estimating a final regression. The criterion for deciding which covariate to include next, and when to stop including covariates, is derived from a profile of traditional statistical hypothesis tests. The paper proves probabilistic bounds on prediction error and on the number of selected covariates; these bounds depend on the quality of the tests. The bounds are then specialized to heteroskedastic data, with tests constructed from Huber-Eicker-White standard errors. The performance of TBFMS is compared to that of Lasso and Post-Lasso in simulation studies. TBFMS is then analyzed as a component of larger post-model-selection estimation problems for structural economic parameters. Finally, TBFMS is illustrated in an empirical application estimating determinants of economic growth.