Exploring the Effects of Missingness and Imputation on the Early Prediction of Sepsis

Liam McCoy¹, Srinivasan Sivanandan², Daniel Dastoor², Sneha Desai², Angad Kalra²
¹University of Toronto Faculty of Medicine, ²University of Toronto


Introduction: Sepsis is a major cause of morbidity and mortality in modern intensive care units (ICUs). Early detection and antibiotic treatment of sepsis are critical for improving patient survival in ICUs. Algorithms for the early prediction of sepsis from physiological signals in electronic health records must deal with a high degree of missingness in laboratory values, which are ordered infrequently and at irregular intervals. Because it reflects deliberate clinical decisions to test or not, the pattern of missing and present values implicitly contains important clinical information and must be actively considered in both the prediction and imputation processes. Methods: In this paper we establish and compare results for sepsis prediction with recurrent and non-recurrent prediction models. Further, we compare the predictive performance of our models on datasets imputed by mean imputation, forward imputation, denoising autoencoders, learned decay factors, and a generative adversarial network. By artificially inducing additional missingness, we also compare these imputation strategies by their reconstruction error on the physiological signals. Results: The best performing model in our experiments was an LSTM trained on physiological signals and their corresponding missingness indicators, obtaining an AUROC of 0.839 ± 0.004 with a utility score of 0.444 ± 0.015. We saw no correlation between imputation quality and subsequent predictive performance. Conclusion: In exploring the effect of these imputation strategies on the predictive performance of our models, our experiments show that the simplest imputation methods perform at least as well as more sophisticated imputation techniques, demonstrating the orthogonality of imputation and prediction performance.
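The two simplest baselines named in Methods, together with the missingness indicators and the induced-missingness evaluation, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy lactate-like series, the fallback value used before the first observation, and the choice of RMSE as the reconstruction error are all assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Tuple

# A series of hourly measurements; None marks an hour with no test ordered.
Series = List[Optional[float]]


def missingness_indicators(series: Series) -> List[int]:
    """Return 1 where a value was observed and 0 where it is missing."""
    return [0 if v is None else 1 for v in series]


def mean_impute(series: Series, fallback: float = 0.0) -> List[float]:
    """Replace missing values with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed) if observed else fallback
    return [mean if v is None else v for v in series]


def forward_impute(series: Series, fallback: float = 0.0) -> List[float]:
    """Carry the last observed value forward.

    Assumption: hours before the first observation take `fallback`.
    """
    filled, last = [], fallback
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def induce_missingness(
    series: Series, rate: float, rng: random.Random
) -> Tuple[Series, List[int]]:
    """Hide a random fraction of the observed values.

    Returns the masked series and the indices that were hidden.
    """
    observed_idx = [i for i, v in enumerate(series) if v is not None]
    n_hide = int(round(rate * len(observed_idx)))
    hidden = sorted(rng.sample(observed_idx, n_hide))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def reconstruction_rmse(
    original: Series, imputed: List[float], hidden: List[int]
) -> float:
    """RMSE between true and imputed values at the artificially hidden indices."""
    if not hidden:
        return float("nan")
    squared = [(original[i] - imputed[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Toy values for illustration only; they are not taken from the paper.
lactate: Series = [None, 1.8, None, None, 2.4, None, 3.1, None]

if __name__ == "__main__":
    masked, hidden = induce_missingness(lactate, rate=0.34, rng=random.Random(0))
    for name, impute in [("mean", mean_impute), ("forward", forward_impute)]:
        error = reconstruction_rmse(lactate, impute(masked), hidden)
        print(f"{name}: RMSE on hidden values = {error:.3f}")
```

In a setup like the one described in Results, the imputed signal and its indicator vector would be concatenated as model inputs, so the model sees both the filled-in value and whether it was actually measured.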