Sepsis Detection Using Missingness Information

Clémentine Aguet1, Jérôme Van Zaen2, Mathieu Lemay1


Background: Sepsis is a major healthcare concern. Early detection is crucial to reduce its consequences and the associated death rate. The PhysioNet/Computing in Cardiology Challenge 2019 addresses this issue by providing about 40,000 records from ICU patients. This multivariate database combines demographics, vital signs, and laboratory measurements.

Introduction: Because clinical measurements are taken at irregular intervals, the dataset inevitably contains missing observations. Discarding incomplete observations does not yield good performance, especially when the missing rate is high. A better strategy is data imputation, which substitutes estimated values for the missing data. However, it has been observed that the patterns of missing data themselves hold relevant information about the patient's health state. Inspired by the methodology of Purushotham et al. (2018), this work implements a sepsis detection model that incorporates representations of missingness information as additional features.
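
As an illustration of what "representations of missingness information" can look like, the sketch below derives two features often used in the literature for an hourly series: a binary mask (1 if observed, 0 if missing) and the number of hours since the last observation. The abstract does not state which representations were actually used, so the function name `missingness_features`, the choice of these two features, and the example values are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple


def missingness_features(
    values: List[Optional[float]],
) -> Tuple[List[int], List[int]]:
    """Return (mask, delta) for one variable sampled once per hour.

    mask[t]  is 1 if values[t] was observed, 0 if it is missing (None).
    delta[t] is the number of hours elapsed since the most recent earlier
             observation; it is 0 at the first time step by convention.
    Both definitions are illustrative assumptions, not the study's method.
    """
    mask = [0 if v is None else 1 for v in values]
    delta: List[int] = []
    for t in range(len(values)):
        if t == 0:
            delta.append(0)
        elif mask[t - 1] == 1:
            # The previous hour was observed, so the gap is exactly one hour.
            delta.append(1)
        else:
            # The previous hour was missing, so the gap keeps growing.
            delta.append(delta[t - 1] + 1)
    return mask, delta


# Hypothetical heart-rate series with gaps (None marks a missing value).
heart_rate = [80.0, None, None, 92.0, None]
mask, delta = missingness_features(heart_rate)
print(mask)
print(delta)
```

Features of this kind can be concatenated with the measurements at each time step, so that a recurrent model sees both the values and when they were taken.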

Method: To capture the long-term dependencies in the time series, the model consists of a recurrent neural network (RNN) with two long short-term memory (LSTM) layers, each with 128 hidden units and a dropout rate of 0.3, followed by a fully connected prediction layer with a sigmoid activation function. The model is trained with backpropagation using the Adam optimizer. In a first implementation, the dataset with imputed missing values is given as input. Variables with more than 97% missing values are discarded. For the remaining variables, missing values are replaced by either the mean over the entire training set or the previous measurement, depending on the variable's missing rate. The performance of this implementation will then be compared with that of an equivalent network that uses missingness pattern features instead of imputation.
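
The imputation rule described above can be sketched as follows. Only the 97% discard threshold comes from the text. The cutoff between the two strategies (`FFILL_MAX_RATE`), the direction of the rule (carry the previous measurement forward for frequently measured variables, use the training mean for sparse ones), the fallback to the mean before the first observation, and all example values are assumptions, since the abstract does not specify them.

```python
from typing import Dict, List, Optional

DROP_RATE = 0.97       # from the text: discard variables missing more often
FFILL_MAX_RATE = 0.5   # hypothetical cutoff, not given in the abstract


def impute(
    record: Dict[str, List[Optional[float]]],
    train_means: Dict[str, float],
    train_missing_rates: Dict[str, float],
) -> Dict[str, List[float]]:
    """Impute one patient record using statistics from the training set."""
    imputed: Dict[str, List[float]] = {}
    for name, values in record.items():
        rate = train_missing_rates[name]
        if rate > DROP_RATE:
            continue  # variable is too sparse, drop it entirely
        mean = train_means[name]
        if rate <= FFILL_MAX_RATE:
            # Assumed rule: carry the previous measurement forward, and use
            # the training mean until the first value has been observed.
            last = mean
            filled = []
            for v in values:
                if v is not None:
                    last = v
                filled.append(last)
        else:
            # Assumed rule: replace every gap with the training-set mean.
            filled = [mean if v is None else v for v in values]
        imputed[name] = filled
    return imputed


# Hypothetical record and training statistics (values are illustrative).
record = {
    "HR": [None, 80.0, None, 90.0],
    "Lactate": [None, None, 2.0, None],
    "TroponinI": [None, None, None, None],
}
train_means = {"HR": 85.0, "Lactate": 1.5, "TroponinI": 0.1}
train_missing_rates = {"HR": 0.1, "Lactate": 0.9, "TroponinI": 0.99}

imputed = impute(record, train_means, train_missing_rates)
print(imputed)
```

Computing the means and missing rates on the training set only, as the text indicates for the mean, keeps information from held-out records out of the imputed inputs.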

Results and conclusion: The results obtained so far are those of the imputation approach. Model performance is assessed with several scores (e.g., accuracy = 0.998, AUROC = 0.98, F-measure = 0.7, utility = 0.63). These results can serve as a comparison baseline for the model incorporating missingness information.
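
Accuracy and F-measure can differ widely when positive labels are rare, which is why several scores are reported together. The sketch below computes both from confusion-matrix counts. The counts are invented for illustration and are not the study's results; the challenge-specific utility score is not reproduced here.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score: harmonic mean of precision and recall.

    Written as 2*TP / (2*TP + FP + FN); true negatives do not enter it.
    """
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented counts for an imbalanced problem: 100 positive hourly labels
# out of 10,000, with 70 of them detected and 30 false alarms.
tp, fp, fn, tn = 70, 30, 30, 9870

acc = accuracy(tp, fp, fn, tn)
f1 = f_measure(tp, fp, fn)
print(f"accuracy = {acc:.3f}, F-measure = {f1:.3f}")
```

Because true negatives dominate the accuracy but are ignored by the F-measure, the first can stay close to 1 while the second remains moderate.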