Benchmark of Machine Learning Models for Early Sepsis Prediction

Anamika Paul Rupa1, Saddam Al Amin2, Sanjay Purushotham1
1UMBC, 2Researcher at UMBC


As part of the PhysioNet 2019 challenge, we focused on predicting the onset of sepsis using machine learning models. We formulated early sepsis prediction as a classification task and explored several classical machine learning models (logistic regression, linear discriminant analysis, and AdaBoost) as well as deep learning models such as long short-term memory (LSTM) networks to predict the sepsis label 6 hours in advance. The task is challenging because the provided dataset is class-imbalanced and contains missing data. To address these issues, we employed bootstrapping and imputation techniques. We conducted 5-fold stratified cross-validation on the bootstrapped training dataset A. In our preliminary experiments, the linear discriminant analysis model obtained the best accuracy (0.78) and utility score (0.21) among all tested models. In ongoing work, we are exploring a deep-learning-based multi-instance learning framework for predicting if and when a patient develops sepsis.
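
The preprocessing ideas named above (imputation, 6-hour-ahead labeling, and bootstrapping against class imbalance) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the forward-fill strategy, the default fill value, and the oversample-the-minority-class scheme are all assumptions chosen for illustration, since the abstract does not specify which imputation or bootstrapping variant was used.

```python
import random
from typing import List, Optional, Sequence, Tuple


def forward_fill(values: Sequence[Optional[float]], default: float = 0.0) -> List[float]:
    """Impute missing hourly measurements by carrying the last observation forward.

    Hours before the first observation receive `default` (an assumed placeholder).
    """
    filled: List[float] = []
    last = default
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def early_labels(onset_hour: Optional[int], n_hours: int, horizon: int = 6) -> List[int]:
    """Mark hour t positive when sepsis onset occurs within `horizon` hours of t (or earlier).

    `onset_hour` is None for patients who never develop sepsis.
    """
    if onset_hour is None:
        return [0] * n_hours
    return [1 if t >= onset_hour - horizon else 0 for t in range(n_hours)]


def bootstrap_balance(
    rows: Sequence, labels: Sequence[int], seed: int = 0
) -> Tuple[list, List[int]]:
    """Resample the minority class with replacement until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    if not minority:
        raise ValueError("both classes must be present to rebalance")
    # Draw extra minority samples with replacement (the bootstrap step).
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = majority + minority + extra
    rng.shuffle(idx)
    return [rows[i] for i in idx], [labels[i] for i in idx]


if __name__ == "__main__":
    heart_rate = [None, 80.0, None, 90.0]
    print(forward_fill(heart_rate))
    print(early_labels(onset_hour=8, n_hours=10))
    X, y = bootstrap_balance(list(range(10)), [1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
    print(len(X), sum(y))
```

A balanced, imputed table of this kind could then be passed to a stratified 5-fold cross-validation loop with any of the classifiers listed above. When resampling is combined with cross-validation, it is generally safer to resample only within each training fold so that duplicated rows do not leak into the held-out fold.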