Randomly Under-Sampled Boosted Tree for Predicting Sepsis from Intensive Care Unit Databases

Peter Doggart and Megan Rutherford
B-Secur Ltd


Introduction: Bacterial infection can result in sepsis, a toxic immune response by the body. Although the rate of mortality due to sepsis has fallen within the UK, overall rates remain higher than in Europe. Early detection of sepsis has been linked to improved patient outcomes. This work focuses on the use of a random under-sampling (RUS) boosted tree for classifying sepsis from intensive care unit databases.

Method: To account for missing data (e.g. tests not being performed), forward-filling imputation was employed. For each clinical variable (e.g. heart rate, fibrinogen), discrete categories were obtained using K-means clustering. Values still missing after imputation were included in the data set as an extra discrete category. Demographics such as age and gender were left unmodified. Because sepsis has a low prevalence and the classes are therefore heavily imbalanced, the outputs of many weak-learner decision trees were combined into a single ensemble predictor using the RUSBoost algorithm.
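
The preprocessing steps above, together with the random under-sampling step that RUSBoost applies before fitting each weak learner, can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the number of clusters (k=2), the 1:1 class ratio after under-sampling and the toy heart-rate values are all assumptions made for the example, and the boosting loop itself is omitted.

```python
import random

def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def kmeans_1d(values, k, iters=50):
    """Plain Lloyd's algorithm on scalars; returns the sorted centroids."""
    data = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def discretise(values, centroids):
    """Nearest-centroid index per value; None maps to one extra category."""
    missing = len(centroids)
    return [
        missing if v is None
        else min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
        for v in values
    ]

def random_under_sample(labels, rng):
    """Indices of all positives plus an equal-sized random draw of negatives.

    The 1:1 ratio is an assumption of this sketch.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return sorted(pos + rng.sample(neg, min(len(pos), len(neg))))

# Hypothetical hourly heart-rate record for one patient (None = not measured).
heart_rate = [None, 80, None, 82, 120, None, 125, 78]
filled = forward_fill(heart_rate)
centroids = kmeans_1d([v for v in filled if v is not None], k=2)
codes = discretise(filled, centroids)   # category 2 marks "still missing"

# Hypothetical hourly sepsis labels; keep every positive, sample the negatives.
labels = [0, 0, 0, 1, 0, 0, 1, 0]
keep = random_under_sample(labels, random.Random(0))
```

In this sketch the first hour has no earlier value to carry forward, so it falls into the extra "missing" category, which mirrors the treatment of remaining missing values described above.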

Results: On the full training set (40,336 subjects), the model achieved a sensitivity of 61.42% and a specificity of 80.05% (accuracy: 79.71%, AUC: 77.79%) under 5-fold cross-validation. In the unofficial phase of the challenge, the model achieved a normalized utility score of 0.2674.
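
For readers less familiar with these metrics, the short sketch below shows how sensitivity, specificity and accuracy follow from confusion-matrix counts, and how the three reported figures relate to each other. The counts in the first call are hypothetical. The implied positive rate is only an inference from the identity accuracy = p × sensitivity + (1 − p) × specificity, assuming all three figures are pooled over the same samples; it is not a number reported in this work.

```python
def sens_spec_acc(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    return sens, spec, acc

def implied_positive_rate(sens, spec, acc):
    """Solve acc = p * sens + (1 - p) * spec for the positive fraction p."""
    return (spec - acc) / (spec - sens)

# Hypothetical counts, for illustration only.
sens, spec, acc = sens_spec_acc(tp=3, fn=1, tn=8, fp=2)

# Figures reported above; the result is roughly 0.018, i.e. about 1.8%
# of samples positive under the pooling assumption.
p = implied_positive_rate(sens=0.6142, spec=0.8005, acc=0.7971)
```

An accuracy this close to the specificity is what one would expect when positives are rare, which is why sensitivity and the utility score are the more informative figures here.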

Conclusion: The results show that the model can detect sepsis in patients, although further work is needed to improve performance. Future work will investigate the use of fixed time windows rather than individual hourly measurements to increase prediction performance.