Development of a Sepsis Early Warning Indicator

Gregory Arbour, David Dai, Kasthuri Karunanithi, Neal Kaw, Sebnem Kuzulugil, Josh Murray, Chloé Pou-Prom, Michaelia Young
St. Michael's Hospital


Aims: Sepsis is a life-threatening condition caused by infection. It is estimated to affect 15% of intensive care unit (ICU) patients and is associated with approximately 1 in 8 deaths in Canada. Identifying and treating sepsis before onset leads to decreased mortality and shorter lengths of stay. We used the PhysioNet 2019 Challenge dataset to develop a model that identifies sepsis before onset. The dataset consists of routinely collected laboratory results, vital signs and demographic information.

Methods: After exploratory analysis, we normalized each variable and imputed missing values using a forward-fill approach (i.e., a value recorded at hour 1 is carried forward to hour 2 when hour 2 has no measurement). We then used mean imputation to fill in missing values that occur before the first recorded observation. The resulting features were given as input to a gradient boosting algorithm, and we used 10-fold cross-validation with a random search to select the best hyperparameters.

Results: The dataset included 40,330 patient encounters, of which 2,932 (7.3%) met the definition of sepsis. The data comprised 1,551,920 hours of observation, with a median of 38 hours per patient (interquartile range [IQR]: 24-47 hours). Because the data were highly imbalanced, we trained the model after downsampling the majority class (i.e., data points without the sepsis outcome). We selected the model and threshold cutoff that achieved the best utility score. On 10-fold cross-validation, our final model achieved a mean AUROC of 0.796 (SD 0.007), an F-measure of 0.143 (SD 0.0036) and a utility score of 0.298 (SD 0.0129).

Conclusion: We developed a gradient boosting classifier to identify sepsis before onset in ICU patients from three different hospitals. Because our current approach treats each hourly timestamp as a separate data point, future work will focus on making use of each patient's past information and on incorporating features that capture the direction and magnitude of change in all vital signs and laboratory test results.
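The preprocessing described under Methods (normalize, forward-fill, then mean-impute the leading gaps) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes z-score normalization computed from all observed values of each variable, a hypothetical data layout (a dict of patient id to hourly rows, with `None` marking a missing value), and a hypothetical function name `normalize_and_impute`. Because the values are z-scored first, filling with the variable mean is the same as filling with 0.0.

```python
from statistics import mean, pstdev


def normalize_and_impute(patients):
    """Hypothetical helper. `patients` maps patient id -> list of hourly rows;
    each row holds one float (or None if missing) per variable.
    Assumes every variable is observed at least once in the data."""
    n_vars = len(next(iter(patients.values()))[0])

    # Per-variable mean and standard deviation from observed values only.
    stats = []
    for j in range(n_vars):
        observed = [row[j] for rows in patients.values() for row in rows
                    if row[j] is not None]
        sd = pstdev(observed) or 1.0  # guard against constant variables
        stats.append((mean(observed), sd))

    result = {}
    for pid, rows in patients.items():
        last = [None] * n_vars  # most recent normalized value, per variable
        out_rows = []
        for row in rows:
            new_row = []
            for j, value in enumerate(row):
                if value is not None:
                    mu, sd = stats[j]
                    last[j] = (value - mu) / sd  # z-score
                # Forward fill from the last observation within this patient;
                # before the first observation, use the mean (0.0 after z-scoring).
                new_row.append(last[j] if last[j] is not None else 0.0)
            out_rows.append(new_row)
        result[pid] = out_rows
    return result


# Usage: two toy patients, two variables (e.g. temperature, heart rate).
toy = {"p1": [[None, 80.0], [37.0, None], [None, 90.0]],
       "p2": [[39.0, 100.0], [None, None]]}
print(normalize_and_impute(toy))
```

In practice the normalization statistics should come from the training folds only, so that no information from the validation fold leaks into preprocessing.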
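Downsampling the majority class, as mentioned under Results, amounts to randomly discarding hourly rows without the sepsis label. The sketch below assumes a 1:1 class ratio after sampling (the abstract does not state the ratio used) and uses a hypothetical function name, `downsample_majority`. It should be applied only to the training folds, so that validation folds keep the natural prevalence.

```python
import random


def downsample_majority(rows, labels, seed=0):
    """Hypothetical helper: randomly drop negative (label 0) rows so that
    both classes end up the same size. All positive rows are kept."""
    rng = random.Random(seed)  # seeded for reproducibility
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep_neg = rng.sample(neg, k=min(len(neg), len(pos)))
    keep = sorted(pos + keep_neg)  # preserve the original row order
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Usage: 20 toy hourly rows, 4 of them positive.
X = [[float(i)] for i in range(20)]
y = [1 if i % 5 == 0 else 0 for i in range(20)]
X_bal, y_bal = downsample_majority(X, y)
print(len(y_bal), sum(y_bal))  # 8 rows, 4 positives
```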
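Choosing the threshold cutoff that maximizes a score can be done by sweeping candidate cutoffs over the predicted probabilities. The official PhysioNet 2019 utility function is not reproduced here; `score_fn` is where it would be plugged in, and the F-measure is used only as a stand-in. The function names `best_threshold` and `f_measure` are hypothetical.

```python
def f_measure(y_true, y_pred):
    """F1 score from binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, probs, score_fn, candidates=None):
    """Return (threshold, score) maximizing score_fn over candidate cutoffs.
    A row is predicted positive when its probability is >= the cutoff."""
    if candidates is None:
        candidates = sorted(set(probs))  # every distinct predicted probability
    best = (None, float("-inf"))
    for t in candidates:
        y_pred = [1 if p >= t else 0 for p in probs]
        s = score_fn(y_true, y_pred)
        if s > best[1]:
            best = (t, s)
    return best


# Usage on toy validation-fold predictions.
y_val = [0, 0, 1, 1, 0, 1]
p_val = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(best_threshold(y_val, p_val, f_measure))  # threshold 0.35, F-measure 6/7
```

The cutoff should be chosen on held-out predictions (for example, within each cross-validation fold) rather than on the data used to fit the model.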
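The future work in the Conclusion mentions features that capture the direction and magnitude of change. One simple form, sketched here as a hypothetical illustration rather than a planned design, is the difference from the previous observed value of a variable together with its sign. The function name `change_features` and the choice to emit `(0.0, 0)` when no change can be computed are assumptions.

```python
def change_features(values):
    """Hypothetical helper. For one patient's hourly series of one variable
    (None = missing), return (delta, direction) per hour, where delta is the
    change from the previous observed value and direction is +1, -1 or 0."""
    features = []
    prev = None
    for v in values:
        if v is None or prev is None:
            # Missing this hour, or no earlier observation to compare against.
            features.append((0.0, 0))
        else:
            delta = v - prev
            direction = (delta > 0) - (delta < 0)  # +1 rising, -1 falling, 0 flat
            features.append((delta, direction))
        if v is not None:
            prev = v  # remember the most recent observed value
    return features


# Usage: heart rate over four hours, with hour 2 missing.
print(change_features([80.0, None, 90.0, 85.0]))
# [(0.0, 0), (0.0, 0), (10.0, 1), (-5.0, -1)]
```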