Utilizing Informative Missingness for Early Prediction of Sepsis

Janmajay Singh1, Oshiro Kentaro1, Raghava Krishnan2, Masahiro Sato1, Tomoko Ohkuma1, Noriji Kato1
1Fuji Xerox Co., Ltd., 2Fuji Xerox


Aims: Physicians routinely make crucial decisions about patients' health in the ICU. Sepsis affects about 35% of ICU patients and kills approximately 25% of those afflicted. In this paper, we aim to predict the onset of sepsis early by studying the missingness of laboratory variables together with trends in the overall physiological data.

Methods: Initially, owing to the small size and sparsity of the dataset, we employed Random Forests and XGBoost, which were deemed appropriate for the problem. Models were evaluated with several sliding-window sizes and imputation methods. With the availability of additional data, we formally defined the problem as prediction by classification of highly sparse, multivariate time series. For this setting, we plan to switch to sequential models such as Hidden Markov Models and Gated Recurrent Units (GRUs). The objective of this work is to handle the sparsity of the dataset and to make effective use of the informative missingness of laboratory variables. Variants of the GRU have been proposed that handle sparse, sequential data by exploiting informative missingness. As an extension to existing work, we will apply matrix-factorization-based methods from the field of Recommender Systems, which deal with similarly sparse data. We assume that the observation of a laboratory variable reflects a doctor's preventive action against sepsis onset. Under this assumption, research on predicting a user's binary feedback in recommender systems can be used to predict the next observation of laboratory variables, thereby enriching the information derived from missingness patterns.

Results: The XGBoost model with a sliding window of size 6 and no imputation achieved the best scores on the smaller dataset of 5,000 patients. It achieved a mean AUC of 0.858 under 5-fold stratified cross-validation. On the test data, the XGBoost model obtained an AUC of 0.744 and a Utility score of 0.277.
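The sliding-window featurization with informative missingness described in Methods could be sketched as follows. This is a minimal, illustrative NumPy version, not the authors' implementation: it keeps unobserved values as NaN (so a tree model like XGBoost can branch on missingness directly), and adds per-variable observation masks and hours-since-last-measurement features of the kind used by missingness-aware GRU variants. All function names and the exact feature layout are assumptions.

```python
import numpy as np

def featurize(X, window=6):
    """Build sliding-window features from a (T, D) matrix of hourly
    measurements, where unobserved entries are NaN.

    For each time step t, the feature vector concatenates:
      - the last `window` raw values (NaN preserved, no imputation),
      - a binary observation mask for the same window,
      - for each variable, the hours elapsed since its last observation.
    Returns an array of shape (T, window * 2 * D + D).
    """
    T, D = X.shape
    mask = ~np.isnan(X)

    # Hours since each variable was last observed
    # (capped at t + 1 for variables never observed so far).
    delta = np.zeros((T, D))
    since = np.full(D, np.inf)
    for t in range(T):
        since = np.where(mask[t], 0.0, since + 1.0)
        delta[t] = np.where(np.isinf(since), t + 1, since)

    # Pad the history with NaN so early steps still get a full window.
    pad = np.full((window - 1, D), np.nan)
    Xp = np.vstack([pad, X])
    Mp = np.vstack([np.zeros((window - 1, D)), mask.astype(float)])

    feats = []
    for t in range(T):
        vals = Xp[t:t + window].ravel()   # raw values, NaN kept
        obs = Mp[t:t + window].ravel()    # 1 = observed, 0 = missing
        feats.append(np.concatenate([vals, obs, delta[t]]))
    return np.asarray(feats)
```

The resulting matrix can be fed to a classifier that tolerates NaN inputs (XGBoost does natively), so the missingness pattern itself remains visible to the model rather than being erased by imputation.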