On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models


Seibold, Heidi; Bernau, Christoph; Boulesteix, Anne-Laure; De Bin, Riccardo (2017). On the choice and influence of the number of boosting steps for high-dimensional linear Cox-models. Computational Statistics: Epub ahead of print.

Abstract

In biomedical research, boosting-based regression approaches have gained much attention in the last decade. Their intrinsic variable selection procedure and their ability to shrink regression coefficient estimates toward 0 make these techniques well suited for fitting prediction models to high-dimensional data, e.g. gene expression data. Their prediction performance, however, depends strongly on specific tuning parameters, in particular on the number of boosting iterations. This crucial parameter is usually selected via cross-validation, whose outcome may in turn depend strongly on a purely random component, namely the partition of the data into folds. We empirically study how much this randomness affects the results of the boosting techniques, in terms of both the selected predictors and the prediction ability of the resulting models. We use four publicly available data sets related to four different diseases. In these studies, the goal is to predict survival endpoints when a large number of continuous candidate predictors are available. We focus on two well-known boosting approaches implemented in the R packages CoxBoost and mboost, assuming proportional hazards and linear effects of the predictors. We show that the variability in the selected predictors and in the prediction ability of the model is reduced by averaging over several repetitions of cross-validation when selecting the tuning parameters.
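The tuning problem described above can be illustrated with a small simulation. The Python sketch below is an illustration under stated assumptions, not the authors' code and not the API of CoxBoost or mboost: it uses componentwise gradient boosting with the negative Cox partial log-likelihood as loss (closer in spirit to mboost than to CoxBoost's likelihood-based boosting), scores each held-out fold by the partial likelihood computed within that fold's own risk sets, uses a fixed step size nu = 0.1, and runs on hypothetical simulated data. It compares the number of boosting steps chosen from a single 5-fold cross-validation run, whose fold partition depends on a random seed, with the choice obtained by averaging the cross-validation curves over 10 random fold partitions. The helper names (`boost_path`, `cv_curve`, `choose_mstop`) are hypothetical.

```python
import numpy as np


def neg_log_partial_lik(eta, time, event):
    """Negative Cox log partial likelihood (Breslow form; assumes no tied times)."""
    at_risk = (time[None, :] >= time[:, None]).astype(float)  # [i, j] = 1 if j at risk at time_i
    log_s = np.log(at_risk @ np.exp(eta))
    return -np.sum(event * (eta - log_s))


def boost_path(x_tr, t_tr, d_tr, x_te, t_te, d_te, m_max, nu=0.1):
    """Componentwise gradient boosting on the Cox loss (assumed setup, not a package API).

    Returns the held-out negative partial log-likelihood after 0, 1, ..., m_max steps.
    Assumption: the held-out fold is scored using only its own risk sets.
    """
    at_risk = (t_tr[None, :] >= t_tr[:, None]).astype(float)
    eta_tr, eta_te = np.zeros(len(t_tr)), np.zeros(len(t_te))
    ss = np.sum(x_tr ** 2, axis=0)  # per-covariate sum of squares
    losses = [neg_log_partial_lik(eta_te, t_te, d_te)]
    for _ in range(m_max):
        w = np.exp(eta_tr)
        s = at_risk @ w  # risk-set sums S_i
        u = d_tr - w * (at_risk.T @ (d_tr / s))  # negative gradient of the Cox loss
        xu = x_tr.T @ u
        j = np.argmax(xu ** 2 / ss)  # covariate whose least-squares fit to u is best
        step = nu * xu[j] / ss[j]  # shrunken least-squares coefficient
        eta_tr += step * x_tr[:, j]
        eta_te += step * x_te[:, j]
        losses.append(neg_log_partial_lik(eta_te, t_te, d_te))
    return np.array(losses)


def cv_curve(x, time, event, m_max, k=5, seed=0):
    """K-fold CV loss curve for one random fold partition (controlled by `seed`)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(time)), k)
    total = np.zeros(m_max + 1)
    for test in folds:
        train = np.setdiff1d(np.arange(len(time)), test)
        total += boost_path(x[train], time[train], event[train],
                            x[test], time[test], event[test], m_max)
    return total


def choose_mstop(x, time, event, m_max, seeds):
    """Average the CV curves over several fold partitions, then take the minimiser."""
    curves = np.array([cv_curve(x, time, event, m_max, seed=s) for s in seeds])
    return int(np.argmin(curves.mean(axis=0)))


# Hypothetical simulated data: n = 100 patients, p = 200 covariates, 5 informative.
rng = np.random.default_rng(42)
n, p, m_max = 100, 200, 100
x = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 0.7
t_true = rng.exponential(np.exp(-(x @ beta)))  # exponential times with hazard exp(x @ beta)
t_cens = rng.exponential(2.0, size=n)  # independent censoring times
time = np.minimum(t_true, t_cens)
event = (t_true <= t_cens).astype(float)

# One CV run per choice vs. an average over 10 fold partitions per choice.
single = [choose_mstop(x, time, event, m_max, seeds=[s]) for s in range(5)]
averaged = [choose_mstop(x, time, event, m_max, seeds=range(10 * r, 10 * r + 10))
            for r in range(3)]
print("single 5-fold CV runs:", single)
print("averaged over 10 runs:", averaged)
```

The exact numbers depend on the simulated data and are not reported here; if the effect described in the abstract carries over to this toy setting, the averaged choices should spread less than the single-run choices. In R, the packages ship their own tuning routines (e.g. `cvrisk()` in mboost and `cv.CoxBoost()` in CoxBoost), whose fold handling and criteria differ from this sketch.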

Statistics

Downloads

1 download since deposited on 28 Nov 2017
1 download in the past 12 months

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > Epidemiology, Biostatistics and Prevention Institute (EBPI)
Dewey Decimal Classification: 610 Medicine & health
Language: English
Date: 28 November 2017
Deposited On: 28 Nov 2017 15:06
Last Modified: 19 Feb 2018 09:25
Publisher: Springer
ISSN: 0943-4062
Additional Information: The final publication is available at Springer via http://dx.doi.org/10.1007/s00180-017-0773-8
OA Status: Closed
Publisher DOI: https://doi.org/10.1007/s00180-017-0773-8

Download

Content: Accepted Version
Language: English
Filetype: PDF - Registered users only until 28 November 2018
Size: 1MB
Embargo until: 2018-11-28