A Random Forest Approach for Predicting Early Onset of Sepsis

Lakshman Narayanaswamy, Devendra Garg, Bhargavi Narra


Aims: This study presents a machine learning (ML) model that predicts the onset of sepsis earlier than is possible with common severity scoring systems. The model's predictions are based on patient vitals and lab test data. Methods: Our study focuses on two key aspects to maximise sepsis prediction performance: feature engineering and ML algorithm selection. We selected features based on the observation that patient vitals are available on an hourly basis, whereas lab test parameters, when available, are recorded once every 24 hours or less. To capture the time-series nature of the data, we calculate the hourly variation in the vitals parameters and provide these variations as features. We left out lab test measures that are available less than 10% of the time across patients. For lab values that are not available at a particular time, we substitute the mean of the normal range for that lab test. For the ML algorithm, we chose one that handles unbalanced data and can be tuned to control overfitting and recall. We trained the model using a random forest of decision trees. A random forest makes it possible to understand the influence of each feature on the prediction, and lets the care provider understand and interpret the prediction better than a black-box algorithm that provides only a number. Results: The available data was split 90:10 between training and test data. On the test data, our model achieves an AUROC of 0.982, an AUPRC of 0.766, an accuracy of 0.871, an F-measure of 0.183 and a utility score of 0.724.
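The feature engineering steps described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation. The column names (`HR`, `Temp`, `Lactate`, `WBC`), the normal-range values, and the helper names `sparse_labs` and `build_features` are all hypothetical. The sketch also assumes that "available less than 10%" refers to the fraction of hourly rows across all patients, and that an hourly variation is set to 0.0 when the previous or current reading is missing. Neither detail is specified in the abstract.

```python
from typing import Dict, List, Optional

# One hourly row of measurements; None marks a missing value.
Record = Dict[str, Optional[float]]

# Hypothetical normal ranges (low, high), for illustration only.
NORMAL_RANGE = {"Lactate": (0.5, 2.0), "WBC": (4.0, 11.0)}
VITALS = ["HR", "Temp"]  # hypothetical vitals columns


def sparse_labs(patients: List[List[Record]], labs: List[str],
                min_fraction: float = 0.10) -> List[str]:
    """Return the labs present in fewer than min_fraction of all hourly rows."""
    total_rows = sum(len(rows) for rows in patients)
    dropped = []
    for lab in labs:
        present = sum(1 for rows in patients for row in rows
                      if row.get(lab) is not None)
        if present / total_rows < min_fraction:
            dropped.append(lab)
    return dropped


def build_features(rows: List[Record], labs: List[str]) -> List[Dict[str, Optional[float]]]:
    """Turn one patient's hourly rows into feature rows."""
    features = []
    prev: Optional[Record] = None
    for row in rows:
        out: Dict[str, Optional[float]] = {}
        for vital in VITALS:
            cur = row.get(vital)
            before = prev.get(vital) if prev is not None else None
            out[vital] = cur
            # Hourly variation: change since the previous hour.
            # Assumption: 0.0 when either reading is unavailable.
            if cur is not None and before is not None:
                out[vital + "_delta"] = cur - before
            else:
                out[vital + "_delta"] = 0.0
        for lab in labs:
            low, high = NORMAL_RANGE[lab]
            value = row.get(lab)
            # Missing lab value: substitute the mean of the normal range.
            out[lab] = value if value is not None else (low + high) / 2.0
        features.append(out)
        prev = row
    return features
```

In use, `sparse_labs` would be run once over the training set to decide which lab columns to drop, and `build_features` would then be applied per patient with the remaining labs. The resulting rows could be passed to any random forest implementation that supports class weighting for unbalanced labels.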