Abstract
Background:
Rapid diagnosis of SARS-CoV-2 infection in patients who were not initially diagnosed with COVID-19 is highly relevant for effectively preventing virus transmission among patients and medical staff.
The purpose of this study is to develop a model that predicts the presence of a SARS-CoV-2 infection before a valid test result is available, and to avoid unnecessary testing in Critical Care Units.
Methods:
Datasets of laboratory and blood gas analysis tests were collected retrospectively for the development and subsequent validation of machine learning (ML) based models. The development dataset comprised (1) 254 SARS-CoV-2 positive cases collected in an ICU dedicated to patients with COVID-19 pneumonia, (2a) 914 SARS-CoV-2 negative patients treated in a Neurocritical Care Unit, and (2b) 32 patients treated for severe influenza pneumonia in a Medical ICU at the same hospital. The models were subsequently validated on a dataset collected from the Neurocritical Care Unit that consisted of data from 7 positive and 42 negative patients. Models were adapted to newly available laboratory values throughout each patient's ICU stay. Extremely Randomized Trees (ERT) and Random Forest (RF) models were evaluated. Three variants were trained: a baseline model comprising fully grown trees, an optimized model with a tuned maximum tree depth, and a simplified model that uses only the 6 most important features.
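The three model variants described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the laboratory values; the number of trees, the depth grid, the fold count, and the helper name `evaluate_variants` are illustrative assumptions, not the settings used in the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for the laboratory data (NOT the study data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def evaluate_variants(model_cls, X, y, n_top=6):
    """Cross-validated ROC AUC for the baseline, optimized, and simplified variants."""
    results = {}

    # Baseline: fully grown trees (no depth limit).
    baseline = model_cls(n_estimators=50, max_depth=None, random_state=0)
    results["baseline"] = cross_val_score(
        baseline, X, y, cv=cv, scoring="roc_auc").mean()

    # Optimized: tune the maximum depth by grid search (grid is an assumption).
    search = GridSearchCV(
        model_cls(n_estimators=50, random_state=0),
        param_grid={"max_depth": [2, 4, 8, None]},
        scoring="roc_auc", cv=cv,
    )
    search.fit(X, y)
    results["optimized"] = search.best_score_
    best_depth = search.best_params_["max_depth"]

    # Simplified: keep only the n_top features ranked by Gini importance.
    importances = search.best_estimator_.feature_importances_
    top = np.argsort(importances)[::-1][:n_top]
    simplified = model_cls(n_estimators=50, max_depth=best_depth, random_state=0)
    results["simplified"] = cross_val_score(
        simplified, X[:, top], y, cv=cv, scoring="roc_auc").mean()
    return results, top


scores = {}
for name, cls in [("RF", RandomForestClassifier), ("ERT", ExtraTreesClassifier)]:
    scores[name], top_features = evaluate_variants(cls, X, y)
    print(name, {k: round(float(v), 3) for k, v in scores[name].items()})
```

In this sketch the feature ranking is computed on the full synthetic dataset before cross-validation, which can make the simplified score slightly optimistic; a stricter setup would repeat the ranking inside each fold.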
Results:
The overall best model, evaluated via cross-validation on the development set, is an optimized ERT model with a ROC AUC value of 0.946. On the validation set, the simplified RF model performs best, achieving a ROC AUC value of 0.701. Gini and permutation feature importance for the simplified RF model revealed hemoglobin, procalcitonin, C-reactive protein, glomerular filtration rate based on the CKD-EPI equation, creatinine, and urea as the most important input features. Using the simplified RF model and a probability threshold of 0.012, a sensitivity above 80% with a specificity of 43% is achieved. Compared to a hypothetical daily testing regimen, the simplified RF model with a threshold of 0.145 detects all positive cases and, with a false positive rate of 35%, daily tests might be reduced by two thirds.
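The trade-off between sensitivity and specificity at a given probability threshold can be made concrete with a small worked example. The following sketch uses plain Python and invented toy probabilities (not study data); the function names and the rule "positive when probability >= threshold" are assumptions made for illustration.

```python
def sensitivity_specificity(y_true, probs, threshold):
    """Classify as positive when prob >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def highest_threshold_for_sensitivity(y_true, probs, min_sensitivity):
    """Largest observed probability that, used as threshold, reaches min_sensitivity."""
    # Sensitivity can only grow as the threshold drops, so the first hit
    # when scanning from the top is the highest admissible threshold.
    for t in sorted(set(probs), reverse=True):
        sens, spec = sensitivity_specificity(y_true, probs, t)
        if sens >= min_sensitivity:
            return t, sens, spec
    raise ValueError("no threshold reaches the requested sensitivity")


# Toy data: 5 positive and 10 negative cases with made-up probabilities.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.40, 0.20, 0.15, 0.02,
         0.30, 0.18, 0.10, 0.05, 0.04, 0.03, 0.01, 0.01, 0.00, 0.00]

print(highest_threshold_for_sensitivity(y_true, probs, 0.8))  # (0.15, 0.8, 0.8)
print(highest_threshold_for_sensitivity(y_true, probs, 1.0))  # (0.02, 1.0, 0.4)
```

In the toy data, requiring that every positive case is detected forces a much lower threshold and reduces specificity from 0.8 to 0.4, which is the kind of trade-off the reported thresholds reflect.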
Conclusions:
The developed model may support medical staff in ICUs by enabling faster and more reliable recognition of COVID-19. Unnecessary serial test sampling might be reduced. To ensure the quality of the model before clinical use, it should be further validated in prospective patient cohorts.