Cracking the “Sepsis” Code: Assessing Time Series Nature of EHR Data, and Using Deep Learning for Early Sepsis Prediction

Soodabeh Sarafrazi1, Rohini Choudhari1, Himanshi Mehta2, Chiral Mehta2, Patricia Francis-Lyon3
1Student, 2Student, University of San Francisco, 3University of San Francisco


Sepsis costs US hospitals more each year than any other health condition, and a majority of patients who suffer from sepsis are not diagnosed at the time of admission. Early detection and antibiotic treatment of sepsis are vital to improving outcomes for these patients, as each hour of delayed treatment is associated with increased mortality. In this study, our goal is to predict sepsis 12 hours before its onset using the values of common blood tests routinely taken in the ICU. We investigated the performance of several machine learning models, including XGBoost, CNN, CNN-LSTM, and CNN-XGBoost. Contrary to our expectations, XGBoost outperformed all of the sequential models and yielded the best hour-by-hour predictions. We attribute this to the way we handle missing values: NAs are first replaced by forward-fill, and any values that are still missing are replaced by values in the normal range. In the unofficial phase of submission, our model achieved a normalized score of 0.28.
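
The two-step imputation described above can be illustrated with a minimal sketch. It is not the code used in this study: the variable names (HR, Temp, WBC), the normal-range values, and the helper functions impute_series and impute_patient are hypothetical and chosen only for illustration. Each measurement is carried forward in time, and any hour with no earlier measurement receives a normal-range value.

```python
from typing import Dict, List, Optional

# Hypothetical normal-range values, for illustration only;
# these are NOT the values used in the study.
NORMAL_VALUES: Dict[str, float] = {"HR": 75.0, "Temp": 37.0, "WBC": 7.5}


def impute_series(values: List[Optional[float]], normal: float) -> List[float]:
    """Forward-fill one hourly series; fall back to `normal` where nothing precedes."""
    filled: List[float] = []
    last: Optional[float] = None  # most recent observed value so far
    for v in values:
        if v is not None:
            last = v
        # Use the carried-forward value if one exists, else the normal-range value.
        filled.append(last if last is not None else normal)
    return filled


def impute_patient(
    record: Dict[str, List[Optional[float]]],
    normal_values: Dict[str, float] = NORMAL_VALUES,
) -> Dict[str, List[float]]:
    """Apply the two-step imputation to every variable of one patient."""
    return {
        name: impute_series(series, normal_values[name])
        for name, series in record.items()
    }


# One patient, four hourly rows; None marks a missing measurement.
patient = {
    "HR": [None, 88.0, None, 92.0],
    "Temp": [None, None, 38.1, None],
    "WBC": [None, None, None, None],  # never measured for this patient
}
imputed = impute_patient(patient)
```

Because forward-fill only carries past values forward, the value imputed at a given hour never depends on later measurements, which keeps the scheme compatible with hour-by-hour prediction.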