Early Prediction of Sepsis Using Recurrent Neural Networks with Longitudinally Uncertain Labels

Sardar Ansari
Research Fellow - University of Michigan


Introduction: Recurrent neural networks (RNNs) are commonly used to analyze longitudinal data. In particular, stateful RNNs are used when every time point in a sequence has a label, which requires the labels for the entire sequence to be known in advance. However, in many medical applications, including the prediction of sepsis, the true labels of all data points cannot be known. For example, while the exact time of diagnosis may be known for sepsis, it is often impossible to determine when the infection took place. Moreover, patients vary widely in when their measured variables begin to show signs of deterioration. As a result, a conventional RNN, which needs a definite label at every time point, lacks the flexibility needed to predict sepsis.

Methods: To incorporate the uncertainty associated with the labels of sepsis cases, we introduce a three-class RNN model: Negative labels for data points belonging to control cases; Positive labels for data points belonging to sepsis cases after the diagnosis was made; and Uncertain labels for data points of sepsis cases prior to the diagnosis, when the true labels are unknown. The new RNN objective function aims to classify Negative and Positive instances into their corresponding classes, while Uncertain instances may be classified as either positive or negative. The objective function also minimizes the time before a sepsis patient is correctly classified as septic.
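The labeling scheme and objective described above can be illustrated with a short NumPy sketch. This is a minimal example under stated assumptions, not the author's implementation. The helper names `make_labels` and `uncertain_label_loss`, the integer label codes, and the choice to mask Uncertain steps out of the cross-entropy are all hypothetical. The abstract also does not specify how the time-to-detection term is formulated. Here it is assumed to be the expected number of steps before the first positive prediction, treating each output as an independent alarm, scaled by a hypothetical weight `delay_weight`.

```python
import numpy as np

# Hypothetical integer codes for the three classes.
NEGATIVE, POSITIVE, UNCERTAIN = 0, 1, 2


def make_labels(n_steps, diagnosis_step=None):
    """Per-time-step labels for one patient (hypothetical helper).

    Control patients (diagnosis_step is None) are NEGATIVE at every step.
    Sepsis patients are UNCERTAIN before the diagnosis step and POSITIVE
    from the diagnosis step onward.
    """
    labels = np.full(n_steps, NEGATIVE, dtype=int)
    if diagnosis_step is not None:
        labels[:diagnosis_step] = UNCERTAIN
        labels[diagnosis_step:] = POSITIVE
    return labels


def uncertain_label_loss(p, labels, delay_weight=0.1, eps=1e-7):
    """Loss for one sequence of predicted sepsis probabilities p (assumed form).

    Classification term: binary cross-entropy on NEGATIVE and POSITIVE steps
    only. UNCERTAIN steps are masked out, so either prediction is acceptable.
    Delay term (assumed proxy): for sepsis patients, the expected number of
    steps before the first positive prediction, treating each p[t] as an
    independent Bernoulli alarm, divided by the sequence length.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels)

    neg = labels == NEGATIVE
    pos = labels == POSITIVE
    bce = np.concatenate([-np.log(1.0 - p[neg]), -np.log(p[pos])])
    cls_loss = bce.mean() if bce.size else 0.0

    delay_loss = 0.0
    if pos.any():  # only sepsis patients have POSITIVE steps
        # Probability that no positive prediction was made at steps 0..t.
        not_yet_detected = np.cumprod(1.0 - p)
        delay_loss = not_yet_detected.sum() / len(p)

    return cls_loss + delay_weight * delay_loss


# Usage: a sepsis patient diagnosed at step 3 of 6.
labels = make_labels(6, diagnosis_step=3)  # [2, 2, 2, 1, 1, 1]
early = uncertain_label_loss([0.9] * 6, labels)
late = uncertain_label_loss([0.1, 0.1, 0.1, 0.9, 0.9, 0.9], labels)
# early < late: the Positive steps are predicted identically in both cases,
# so the difference comes only from the delay term.
```

Both prediction sequences agree on the Positive steps and are not penalized on the Uncertain steps, so the earlier alarm is preferred only through the delay term. In an actual RNN, `p` would be the per-step sigmoid output, and the loss would be written in an automatic-differentiation framework so that it could be minimized by gradient descent.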

Results: Preliminary scores for the method outlined above were obtained on the public data provided by the challenge. The data were divided into training (50%), validation (20%) and test (30%) sets. On the test set, the AUROC, AUPRC, Accuracy, F-measure and Utility were 0.758, 0.061, 0.832, 0.106 and 0.306, respectively. At the time of submission, official scores on the hidden dataset were not available.
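The abstract does not say how the 50/20/30 split was drawn. For longitudinal ICU data, a common choice is to split by patient so that no patient contributes time steps to more than one set, and that choice is assumed here. The hypothetical `split_patients` helper below sketches this approach with NumPy.

```python
import numpy as np


def split_patients(patient_ids, fractions=(0.5, 0.2, 0.3), seed=0):
    """Randomly partition unique patient IDs into train/validation/test sets.

    Splitting by patient (an assumption, not stated in the abstract) keeps
    every time step of a given patient in a single set, so test scores are
    not inflated by training on other hours of the same stay.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    ids = np.array(sorted(set(patient_ids)))
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)

    n_train = int(round(fractions[0] * len(ids)))
    n_val = int(round(fractions[1] * len(ids)))
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_patients(range(1000))
```

Because the test set takes the remainder, rounding never drops a patient. The fixed `seed` makes the split reproducible across runs.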