Early detection of sepsis in hospitalized critical care patients is crucial for improving survival. There is therefore an important need for modern machine learning approaches that support the expert judgement of the treating physician.
In this study, we aim to develop a robust sepsis prediction model using physiological data from the 2019 PhysioNet Challenge. In a preliminary analysis, we trained a recurrent neural network with long short-term memory (LSTM) units and achieved an accuracy of 92%. While the LSTM hyperparameters can be optimized in well-understood ways to produce a more accurate classifier, the impact of preprocessing parameters on sepsis prediction performance remains largely unknown. We therefore set out to systematically evaluate the upstream decisions made in processing the data before a model is trained. Specifically, we built a framework that iterates over combinations of preprocessing techniques and machine learning models, with the goal of identifying optimal parameters for sepsis prediction. We consider two feature selection methods: stepwise forward feature selection and manual feature selection performed by a physician with domain knowledge. We then compare three classification models: 1) a random forest classifier trained on features extracted from the time series, 2) an LSTM, and 3) an LSTM fully convolutional network (LSTM-FCN). The models were trained on a subset of 4000 patient records, each containing between 10 and 256 hourly observations; 42% of the patients in the training set presented with sepsis at some point during their hospitalization. Each model was validated on a separate set of 4000 patient records using a utility function that rewards early prediction of sepsis and penalizes late or missed predictions. Together, our end-to-end framework enables the efficient, automated exploration of different preprocessing and machine learning techniques.
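Stepwise forward feature selection is a greedy search, and a minimal sketch may help make it concrete. The Python below assumes a caller-supplied `score` function (for example, a cross-validated utility of a model trained on the candidate subset) that also accepts an empty subset as a baseline. The function name `forward_selection`, the `min_gain` stopping rule, and the toy scores in the usage example are hypothetical illustrations, not the study's implementation.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Sequence


def forward_selection(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: Optional[int] = None,
    min_gain: float = 0.0,
) -> List[str]:
    """Greedy stepwise forward feature selection (illustrative sketch).

    Starts from an empty subset and repeatedly adds the one remaining feature
    that most increases `score`. Stops when the best candidate improves the
    score by no more than `min_gain`, or when `max_features` are selected.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = score(selected)  # baseline: score of the empty subset
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Score every subset formed by adding one remaining feature.
        candidate_scores = {f: score(selected + [f]) for f in remaining}
        # max() returns the first maximal feature, so ties keep input order.
        best_feature = max(remaining, key=candidate_scores.__getitem__)
        gain = candidate_scores[best_feature] - best_score
        if gain <= min_gain:
            break  # no remaining feature helps enough; stop early
        selected.append(best_feature)
        remaining.remove(best_feature)
        best_score = candidate_scores[best_feature]

    return selected


# Toy usage: hypothetical per-feature contributions minus a size penalty.
toy_weights = {"HR": 0.3, "Temp": 0.2, "Lactate": 0.4, "Age": 0.0}


def toy_score(subset: List[str]) -> float:
    return sum(toy_weights[f] for f in subset) - 0.05 * len(subset)


print(forward_selection(list(toy_weights), toy_score))  # ['Lactate', 'HR', 'Temp']
```

In a real pipeline each call to `score` would train and validate a model, so the search costs on the order of the square of the number of candidate features in model fits; the `max_features` cap is one simple way to bound that cost.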
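The idea of a utility that rewards early prediction and penalizes late or missed prediction can be illustrated with a simplified per-patient score. The sketch below is hypothetical: it is not the official 2019 PhysioNet Challenge scoring function, and the function name, the window lengths `early` and `late`, and the reward and penalty values are placeholders chosen for illustration. Alarms within `early` hours before onset earn full credit, credit decays linearly after onset, missed hours after onset cost progressively more, and alarms for non-septic patients (or far too early) incur a small false-alarm cost.

```python
from __future__ import annotations

from typing import Optional, Sequence


def toy_early_warning_utility(
    predictions: Sequence[bool],
    onset: Optional[int],
    early: int = 12,
    late: int = 3,
    reward: float = 1.0,
    fp_penalty: float = -0.05,
    fn_penalty: float = -2.0,
) -> float:
    """Simplified per-patient utility (hypothetical; not the Challenge scorer).

    predictions[t] is the binary sepsis prediction at hour t; onset is the hour
    of sepsis onset, or None if the patient never develops sepsis.
    Penalties are given as negative numbers and added to the total.
    """
    total = 0.0
    for t, positive in enumerate(predictions):
        if onset is None:
            # Non-septic patient: every alarm is a false positive.
            total += fp_penalty if positive else 0.0
        elif t < onset - early:
            # Alarm too early to be credited: treated as a false positive.
            total += fp_penalty if positive else 0.0
        elif t <= onset:
            # Useful early window: full reward for an alarm, nothing if silent.
            total += reward if positive else 0.0
        elif t <= onset + late:
            # After onset: alarm credit decays and the cost of a miss grows.
            lateness = (t - onset) / late
            total += reward * (1 - lateness) if positive else fn_penalty * lateness
        # Hours after onset + late contribute nothing.
    return total


# Usage with a short hypothetical stay: onset at hour 5, 2-hour windows.
always_alarm = toy_early_warning_utility([True] * 8, onset=5, early=2, late=2)
never_alarm = toy_early_warning_utility([False] * 8, onset=5, early=2, late=2)
print(always_alarm, never_alarm)  # approximately 3.35 and -3.0
```

Summing such per-patient scores over a validation set gives a single number to compare pipelines; the official Challenge additionally normalizes its utility, which this sketch does not attempt.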
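At its core, a framework that iterates over preprocessing techniques and models is a grid search over pipeline configurations. The minimal sketch below assumes each feature selection method is a callable returning a list of feature names and each model is a callable that trains on those features and returns a validation score where higher is better. The names `run_grid`, `toy_selectors`, and `toy_models`, and the toy scoring lambdas, are hypothetical stand-ins for the study's actual components.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Tuple

Config = Tuple[str, str]  # (feature selection method, model name)


def run_grid(
    selectors: Dict[str, Callable[[], List[str]]],
    models: Dict[str, Callable[[List[str]], float]],
) -> Tuple[Config, Dict[Config, float]]:
    """Evaluate every (feature selection, model) pair; return best and all scores."""
    results: Dict[Config, float] = {}
    for sel_name, select in selectors.items():
        features = select()  # run each selector once and reuse its output
        for model_name, train_and_validate in models.items():
            # Each model callable is assumed to train on the chosen features
            # and return a validation score (e.g. utility), higher is better.
            results[(sel_name, model_name)] = train_and_validate(features)
    best = max(results, key=results.__getitem__)
    return best, results


# Hypothetical stand-ins: real selectors and models would fit patient data.
toy_selectors = {
    "forward_stepwise": lambda: ["HR", "Lactate"],
    "physician": lambda: ["HR", "Temp", "Lactate"],
}
toy_models = {
    "random_forest": lambda feats: 0.10 * len(feats),
    "lstm": lambda feats: 0.20 * len(feats),
    "lstm_fcn": lambda feats: 0.15 * len(feats),
}
best, results = run_grid(toy_selectors, toy_models)
print(best)  # ('physician', 'lstm')
```

Running each selector once in the outer loop avoids repeating an expensive search such as forward selection for every model; a fuller version might also key results by additional preprocessing choices (for example, imputation strategy) by extending the `Config` tuple.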