Aim: This paper describes an approach to detecting the onset of sepsis in patients ahead of time. Such early detection is vital for effective treatment and for reducing mortality.
Method: The PhysioNet 2019 challenge data set has two distinct characteristics: (1) it contains a significant amount of missing data, and (2) far fewer patients develop sepsis than do not, which makes the data set imbalanced. Decision trees with surrogate splits were used to handle the missing data, and random under-sampling boosting (RUSBoost) was used to address the class imbalance. Analysis of the training data showed that, of the 40 features, most of those in the middle of the feature list (features 9 to 33) have missing values. However, these features cannot be dropped, because doing so directly affects the outcome on the validation data. The authors plan to study these features further in the next phase, which may lead to selecting the most significant ones.
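The combination of boosting and under-sampling can be illustrated with a minimal sketch. It is not the authors' implementation: it is a simplified, pure-Python RUSBoost-style ensemble of decision stumps in which each boosting round keeps every minority-class (sepsis) row plus an equal-sized random sample of majority-class rows, fits a weighted stump on that subset, and then applies the discrete AdaBoost weight update on the full training set. Missing values (NaN) are handled by sending them to one learned default label per split, which is a simpler stand-in for CART-style surrogate splits; surrogate splits themselves are not implemented here. The function names, the {-1, +1} label encoding and the 1:1 sampling ratio are hypothetical choices for illustration.

```python
import math
import random


def fit_stump(X, y, w):
    """Fit a weighted one-level tree (decision stump) on rows X, labels y in {-1, +1}.

    Rows whose split feature is NaN all receive one default label, chosen to
    minimise weighted error. This is a simple stand-in, NOT CART surrogate splits.
    """
    best = None  # (error, feature, threshold, polarity, missing_label)
    for j in range(len(X[0])):
        miss_pos = sum(wi for xi, yi, wi in zip(X, y, w) if math.isnan(xi[j]) and yi == 1)
        miss_neg = sum(wi for xi, yi, wi in zip(X, y, w) if math.isnan(xi[j]) and yi == -1)
        m_label, m_err = (1, miss_neg) if miss_pos >= miss_neg else (-1, miss_pos)
        for theta in sorted({xi[j] for xi in X if not math.isnan(xi[j])}):
            for polarity in (1, -1):
                err = m_err
                for xi, yi, wi in zip(X, y, w):
                    v = xi[j]
                    if not math.isnan(v) and (polarity if v > theta else -polarity) != yi:
                        err += wi
                if best is None or err < best[0]:
                    best = (err, j, theta, polarity, m_label)
    return best[1:]


def predict_stump(stump, x):
    j, theta, polarity, m_label = stump
    if math.isnan(x[j]):
        return m_label  # missing value: use the split's default label
    return polarity if x[j] > theta else -polarity


def rusboost_fit(X, y, n_rounds=10, seed=0):
    """Boosting with random under-sampling of the majority class in every round."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    pos = [i for i in range(n) if y[i] == 1]
    neg = [i for i in range(n) if y[i] == -1]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    ensemble = []
    for _ in range(n_rounds):
        # Keep all minority rows plus an equal-sized random sample of majority rows
        keep = minority + rng.sample(majority, len(minority))
        total = sum(w[i] for i in keep)
        stump = fit_stump([X[i] for i in keep], [y[i] for i in keep],
                          [w[i] / total for i in keep])
        # The weak learner's error is measured on the FULL weighted training set
        preds = [predict_stump(stump, X[i]) for i in range(n)]
        err = max(sum(w[i] for i in range(n) if preds[i] != y[i]), 1e-10)
        if err >= 0.5:
            continue  # no better than chance on the full set: discard this round
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Discrete AdaBoost update: misclassified rows gain weight, correct ones lose it
        w = [w[i] * math.exp(-alpha * y[i] * preds[i]) for i in range(n)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def rusboost_predict(ensemble, x):
    """Weighted vote of the stumps; +1 means sepsis."""
    score = sum(alpha * predict_stump(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

Here `X` is a list of feature vectors with `float("nan")` for unmeasured values and `y` holds -1 or +1. Under-sampling inside every round, rather than once before training, lets different rounds see different non-sepsis rows, so less majority-class information is discarded overall.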
Results: The initial data set of 5000 patient records was split into two parts: 10% of the records were held out for validation and the remaining 90% were used for training. With the method described above, we obtained an AUROC of 0.7048, an accuracy of 0.93, an F-measure of 0.146 and a utility score of 0.3453 on the validation set.
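The evaluation protocol can be sketched as follows, assuming the split is made per patient (so that no patient's rows appear in both sets) and that sepsis is encoded as the positive label 1. The helper names are hypothetical; they compute the hold-out split, accuracy, F-measure and AUROC. The challenge utility score is omitted because it depends on the challenge's own scoring rules. AUROC uses the pairwise (Mann-Whitney) definition, whose cost grows with the number of positive-negative pairs but which is easy to check by hand.

```python
import random


def split_patients(patient_ids, holdout_frac=0.10, seed=0):
    """Shuffle unique patient IDs and hold out a fraction of them for validation."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_val = max(1, round(holdout_frac * len(ids)))
    return set(ids[n_val:]), set(ids[:n_val])  # (train_ids, val_ids)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score for the positive (sepsis) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auroc(y_true, scores, positive=1):
    """Probability that a random positive scores above a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # A classifier that never predicts sepsis, on data with 1 positive in 10
    y_true = [1] + [0] * 9
    y_pred = [0] * 10
    print(accuracy(y_true, y_pred))   # 0.9
    print(f_measure(y_true, y_pred))  # 0.0
```

The toy run shows why a high accuracy can coexist with a low F-measure on imbalanced data: a predictor that never flags sepsis still reaches 0.9 accuracy when only one label in ten is positive, while its F-measure is zero.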
Conclusion: The preliminary results are encouraging; however, there is scope for improvement. The next phase will consider the larger data sets that have been made available, feature selection, better cross-validation techniques and parameter tuning for RUSBoost.