Methods for handling missing variables in risk prediction models


Held, Ulrike; Kessels, Alfons; Garcia Aymerich, Judith; Basagaña, Xavier; Ter Riet, Gerben; Moons, Karel G M; Puhan, Milo Alan (2016). Methods for handling missing variables in risk prediction models. American Journal of Epidemiology, 184(7):545-551.

Abstract

Prediction models should be externally validated before being used in clinical practice. Many published prediction models have never been validated. Uncollected predictor variables in otherwise suitable validation cohorts are the main factor precluding external validation. We used individual patient data from 9 different cohort studies conducted in the United States, Europe, and Latin America that included 7,892 patients with chronic obstructive pulmonary disease who enrolled between 1981 and 2006. Data on 3-year mortality and the predictors of age, dyspnea, and airflow obstruction were available. We simulated missing data by omitting the predictor dyspnea cohort-wide, and we present 6 methods for handling the missing variable. We assessed model performance with regard to discriminative ability and calibration and by using 2 vignette scenarios. We showed that the use of any imputation method outperforms the omission of the cohort from the validation, which is a commonly used approach. Compared with using the full data set without the missing variable (benchmark), multiple imputation with fixed or random intercepts for cohorts was the best approach to impute the systematically missing predictor. Findings of this study may facilitate the use of cohort studies that do not include all predictors and pave the way for more widespread external validation of prediction models even if 1 or more predictors of the model are systematically missing.
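The setup described in the abstract (hide one predictor for an entire validation cohort, impute it from cohorts where it was measured, then assess discrimination) can be illustrated with a small synthetic sketch. Nothing below comes from the paper: the simulated data, the coefficients in `risk_score`, the linear imputation model, and the choice to use the mean of the donor-cohort intercepts for the cohort lacking dyspnea are all assumptions made for illustration. The imputation step is also simplified, since it draws only residual noise and not the regression parameters, so it is not a full multiple-imputation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cohort(n, shift):
    """Synthetic cohort (assumed values): age, FEV1 % predicted, dyspnea, 3-year death."""
    age = rng.normal(65, 8, n)
    fev1 = rng.normal(55, 15, n)
    dyspnea = 2 + shift + 0.03 * (age - 65) - 0.03 * (fev1 - 55) + rng.normal(0, 0.8, n)
    logit = -2 + 0.05 * (age - 65) - 0.03 * (fev1 - 55) + 0.6 * (dyspnea - 2)
    death = rng.binomial(1, 1 / (1 + np.exp(-logit)))
    return age, fev1, dyspnea, death

def risk_score(age, fev1, dyspnea):
    """Hypothetical prediction model to be validated (linear predictor only)."""
    return 0.05 * age - 0.03 * fev1 + 0.6 * dyspnea

def auc(score, y):
    """C-statistic via the rank-sum formula (scores are continuous, so ties are ignored)."""
    ranks = np.argsort(np.argsort(score)) + 1
    n1 = int(y.sum())
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# Donor cohorts: dyspnea was measured, each cohort has its own baseline level.
donors = [simulate_cohort(800, s) for s in (-0.3, 0.0, 0.3)]
# Target cohort: dyspnea exists in the simulation but is treated as never collected.
t_age, t_fev1, t_dysp_true, t_death = simulate_cohort(800, 0.1)

# Imputation model: dyspnea ~ fixed cohort intercepts + age + FEV1 (least squares).
rows, y = [], []
for k, (age, fev1, dysp, _) in enumerate(donors):
    dummies = np.zeros((len(age), len(donors)))
    dummies[:, k] = 1.0
    rows.append(np.column_stack([dummies, age, fev1]))
    y.append(dysp)
X, y = np.vstack(rows), np.concatenate(y)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
sigma = np.sqrt(resid @ resid / (len(y) - X.shape[1]))

# Assumption of this sketch: the target cohort gets the mean donor intercept.
intercept = coef[: len(donors)].mean()
b_age, b_fev1 = coef[len(donors):]

# Create m imputed data sets, validate the model on each, and average the c-statistics.
m = 20
imputed_aucs = []
for _ in range(m):
    dysp_imp = intercept + b_age * t_age + b_fev1 * t_fev1 + rng.normal(0, sigma, len(t_age))
    imputed_aucs.append(auc(risk_score(t_age, t_fev1, dysp_imp), t_death))

auc_mi = float(np.mean(imputed_aucs))
auc_full = float(auc(risk_score(t_age, t_fev1, t_dysp_true), t_death))
print(f"c-statistic with observed dyspnea: {auc_full:.3f}")
print(f"c-statistic with imputed dyspnea:  {auc_mi:.3f}")
```

Comparing `auc_mi` with `auc_full` mirrors the idea of judging an imputation approach against the result obtained when the predictor is actually available; a calibration check would be needed for a complete picture.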

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 04 Faculty of Medicine > University Hospital Zurich > Clinic and Policlinic for Internal Medicine; 04 Faculty of Medicine > Epidemiology, Biostatistics and Prevention Institute (EBPI)
Dewey Decimal Classification: 610 Medicine & health
Language: English
Date: 14 September 2016
Deposited On: 21 Sep 2016 08:40
Last Modified: 08 Dec 2017 20:24
Publisher: Oxford University Press
ISSN: 0002-9262
Free access at: Official URL. An embargo period may apply.
Publisher DOI: https://doi.org/10.1093/aje/kwv346
Official URL: http://aje.oxfordjournals.org/content/early/2016/09/13/aje.kwv346.full.pdf+html?sid=1dd8b4dc-e60c-4975-a754-052f72c6a980
PubMed ID: 27630143