Novelty Detection for the Early Prediction of Sepsis

Oliver Carr, Stefan Bostock, Nicolas Basty, John Prince, Navin Cooray, Kirubin Pillay, Maarten De Vos
University of Oxford


Aims: Sepsis is a life-threatening condition in which the body's response to infection can cause multiple organ failure. It is one of the leading causes of death in critically ill patients, yet it is difficult to diagnose because patients often have other underlying diseases. We aim to detect the early onset of sepsis from physiological measures as part of the PhysioNet/Computing in Cardiology Challenge 2019.

Methods: Physiological measures were provided for each hour of each patient's stay in hospital. Features were first selected by keeping the measures with the fewest missing entries, and highly correlated features were then removed, leaving heart rate, pulse oximetry, temperature, systolic blood pressure, respiration rate, and demographic measures. Because of the large class imbalance, novelty detection was chosen: sepsis is identified as a deviation of the hourly feature vector from a model of normality. This model was built from all non-sepsis data, excluding any data within 16 hours of the onset of sepsis. K-means was used to fit 100 clusters to the data, and a probability density was then centred on each cluster to estimate the probability p(x) that a feature vector x fits the model of normality. The novelty score was calculated as the negative log of this probability, -log p(x), and a single global threshold on this score labelled each feature vector as sepsis or non-sepsis.
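The pipeline above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the correlation cut-off (`corr_thresh=0.9`), z-scoring of features, an isotropic Gaussian with one shared bandwidth as the density centred on each cluster, mixture weights proportional to cluster size, and a threshold taken as a percentile of the training scores are all assumptions not given in the abstract. The sketch also assumes missing values have already been imputed before the model is fitted, and every function name is hypothetical. The plain k-means here builds an n × k × d distance array, which is fine for a toy example but memory-hungry at Challenge scale.

```python
import numpy as np


def sq_dist(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def select_features(X, n_least_missing, corr_thresh=0.9):
    """Keep the columns with the fewest NaNs, then greedily drop any column
    whose |Pearson r| with an already-kept column exceeds corr_thresh."""
    def r(i, j):
        both = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])  # pairwise-complete rows
        return np.corrcoef(X[both, i], X[both, j])[0, 1]

    order = np.argsort(np.isnan(X).mean(axis=0), kind="stable")[:n_least_missing]
    kept = []
    for j in order:
        if all(abs(r(i, j)) <= corr_thresh for i in kept):
            kept.append(int(j))
    return kept


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dist(X, centres).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):  # an empty cluster keeps its old centre
                centres[c] = X[labels == c].mean(axis=0)
    return centres, sq_dist(X, centres).argmin(axis=1)


def fit_normality_model(X_normal, k=100, seed=0):
    """Z-score the normal data, cluster it, and put a Gaussian on each centre."""
    mu, sd = X_normal.mean(axis=0), X_normal.std(axis=0)
    Z = (X_normal - mu) / sd
    centres, labels = kmeans(Z, k, seed=seed)
    weights = np.bincount(labels, minlength=k) / len(Z)  # cluster-size weights
    sigma = np.sqrt(((Z - centres[labels]) ** 2).mean())  # shared bandwidth
    return {"mu": mu, "sd": sd, "centres": centres, "weights": weights, "sigma": sigma}


def novelty_score(model, X):
    """Novelty score = -log p(x) under the Gaussian mixture of normality."""
    Z = (X - model["mu"]) / model["sd"]
    d, s2 = Z.shape[1], model["sigma"] ** 2
    with np.errstate(divide="ignore"):  # empty clusters have weight 0
        log_w = np.log(model["weights"])
    log_comp = (log_w - 0.5 * d * np.log(2 * np.pi * s2)
                - sq_dist(Z, model["centres"]) / (2 * s2))
    m = log_comp.max(axis=1, keepdims=True)  # log-sum-exp over clusters
    return -(m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1)))


def label_sepsis(model, X, threshold):
    """1 = novel (flagged as sepsis), 0 = consistent with normality."""
    return (novelty_score(model, X) > threshold).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X_normal = rng.normal(size=(500, 2))  # toy stand-in for normal hours
    model = fit_normality_model(X_normal, k=10)
    # hypothetical global threshold: 99th percentile of training scores
    threshold = np.percentile(novelty_score(model, X_normal), 99)
    print(label_sepsis(model, np.array([[0.0, 0.0], [8.0, 8.0]]), threshold))
```

The percentile threshold is only one possible way to pick the global threshold; the abstract does not say how it was chosen, and tuning it on the training folds against the utility score would be an equally plausible option.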

Results and Discussion: Ten-fold stratified cross-validation achieved an F1 score of 0.13 and a score of 0.34 using the Challenge utility function. The utility score was 0.49 for patients who had sepsis (possible range 0 to 1) and -0.16 for non-sepsis patients (possible range -2 to 0). The model has been submitted for evaluation on the hidden test set. Future work will focus on further optimizing the model parameters and incorporating time dependency.
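The cross-validation and F1 evaluation can be sketched as below. This assumes folds are formed at the patient level and stratified by whether each patient develops sepsis; the abstract does not state the stratification unit, so that choice, like the hypothetical helper names, is an assumption. The Challenge utility function is not reproduced here.

```python
import numpy as np


def stratified_patient_folds(patient_has_sepsis, n_folds=10, seed=0):
    """Give each patient a fold index so every fold holds roughly the same
    share of sepsis patients (patient-level split, an assumption)."""
    y = np.asarray(patient_has_sepsis, dtype=bool)
    rng = np.random.default_rng(seed)
    fold = np.empty(len(y), dtype=int)
    for cls in (True, False):
        idx = rng.permutation(np.flatnonzero(y == cls))
        fold[idx] = np.arange(len(idx)) % n_folds  # deal patients round-robin
    return fold


def f1(y_true, y_pred):
    """F1 score for the positive (sepsis) class over hourly labels."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # undefined case reported as 0


folds = stratified_patient_folds([True] * 20 + [False] * 180)
print(np.bincount(folds))              # [20 20 20 20 20 20 20 20 20 20]
print(f1([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

Within each fold, the normality model would be fitted on the training patients' non-sepsis hours and then used to score every hour of the held-out patients.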