Abstract
BACKGROUND: It is unclear whether data-driven machine learning models trained on large epidemiological cohorts can improve the prediction of co-morbidities in people living with HIV.
METHODS: In this proof-of-concept study, we included people living with HIV from the prospective Swiss HIV Cohort Study with a first estimated glomerular filtration rate (eGFR) >60 ml/min/1.73 m² after January 1, 2002. Our primary outcome was chronic kidney disease (CKD), defined as a confirmed decrease in eGFR to ≤60 ml/min/1.73 m² in measurements more than three months apart. We split the cohort data into a training set (80%), a validation set (10%), and a test set (10%), stratified by CKD status and follow-up length.
RESULTS: Of 12,761 eligible individuals (median baseline eGFR, 103 ml/min/1.73 m²), 1,192 (9%) developed CKD after a median of eight years. We used 64 static and 502 time-changing variables. Across prediction horizons and algorithms, and in contrast to expert-based standard models, most machine learning models achieved state-of-the-art predictive performance, with areas under the receiver operating characteristic curve and the precision-recall curve ranging from 0.926 to 0.996 and from 0.631 to 0.956, respectively.
CONCLUSIONS: In people living with HIV, we observed state-of-the-art performance in forecasting individual CKD onset with different machine learning algorithms.
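
The 80/10/10 split stratified by CKD status and follow-up length, as described under METHODS, could be sketched as follows. This is only an illustration on synthetic data, not the study's actual procedure: the record fields (`ckd`, `followup_years`), the follow-up bins (under 5, 5 to 10, and 10 or more years), and the helper names `stratified_split` and `stratum` are all assumptions made for this example.

```python
import random
from collections import defaultdict


def stratified_split(records, key, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/validation/test lists, stratified by key(record)."""
    rng = random.Random(seed)

    # Group the records by stratum.
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)

    train, val, test = [], [], []
    for name in sorted(strata):
        members = strata[name]
        rng.shuffle(members)
        n_train = round(fractions[0] * len(members))
        n_val = round(fractions[1] * len(members))
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        # Whatever remains in the stratum goes to the test set.
        test.extend(members[n_train + n_val:])
    return train, val, test


def stratum(rec):
    """Hypothetical stratum: CKD status crossed with a coarse follow-up bin."""
    years = rec["followup_years"]
    followup_bin = 0 if years < 5 else (1 if years < 10 else 2)
    return (rec["ckd"], followup_bin)


# Synthetic toy cohort (NOT study data): roughly 9% of records flagged as CKD.
data_rng = random.Random(42)
cohort = [
    {
        "id": i,
        "ckd": data_rng.random() < 0.09,
        "followup_years": data_rng.uniform(0.5, 18.0),
    }
    for i in range(2000)
]

train, val, test = stratified_split(cohort, stratum)
```

Splitting within each stratum keeps the proportion of CKD cases and the distribution of follow-up lengths approximately equal across the three sets, which matters when the outcome is as infrequent as the 9% reported here.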