Combining Clinical Scores and Statistical Features with a Random Forest for Predicting Sepsis

Sai Pavan Kumar Veeranki1, Günter Schreier2, Martin Kropf3, Dieter Hayn4, Alphons Eggerth4, Andreas Ziegl4
1Technical University Graz, 2Technical University Graz, AIT Austrian Institute of Technology, 3Charité Berlin, 4AIT Austrian Institute of Technology


Predicting sepsis in ICU patients using time series analysis and machine learning

Introduction: Sepsis is a disproportionate response to an infection and is among the leading causes of death. Several studies have reported that early diagnosis and treatment can reduce the risk of adverse outcomes from sepsis. However, sepsis is difficult to detect and predict. Earlier identification of patients at high risk of developing sepsis would provide a valuable window for effective treatment. We present a method, based on time series analysis and machine learning, that makes use of the hourly information available from hospital care.

Methods: Our approach was based on the dataset provided by the CinC Challenge 2019 on PhysioNet. We split the dataset by patient into 80% for training and 20% for testing. A total of 589 features were created from the 40 time series of hourly sampled data, consisting of demographics (age, …), lab values (creatinine, …) and vital signs (blood pressure, …). Features were calculated either by statistical aggregation (mean, variance, count of outliers, …) or by evaluating the time series with respect to clinical scores such as the SOFA, qSOFA and SIRS grading systems. To detect sepsis, we followed a two-stage approach. In the first stage, we classified each case as sepsis yes/no using a random forest of 100 decision trees. In the second stage, we determined the time point of early sepsis diagnosis for the positively classified cases, based on the rule that this time point was found to occur typically 10 hours before the end of the sequence. MATLAB with the Statistics and Machine Learning Toolbox was used for computing.

Results: For our first submission we achieved a normalized utility score of 0.628056706475. Our prior local analysis yielded an AUC of 0.94 for the sepsis yes/no decision and a maximum utility score of 0.66.
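The feature construction and the second-stage rule described in the Methods can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used for the actual work, and it is not the authors' implementation. All function names are hypothetical. The two-standard-deviation outlier threshold, the simplified SIRS variant (temperature, heart rate, respiratory rate and white blood cell count only) and the reading of the 10-hour rule as "label the last 10 hours positive" are assumptions made for illustration. The random forest stage is represented only by its yes/no output.

```python
from statistics import mean, pvariance


def aggregate_features(series):
    """Statistical aggregates for one hourly time series.

    Missing samples are given as None and are ignored.
    """
    values = [v for v in series if v is not None]
    if not values:
        return {"mean": None, "variance": None, "n_outliers": 0}
    mu = mean(values)
    var = pvariance(values, mu)
    sd = var ** 0.5
    # Assumption: an outlier lies more than 2 standard deviations from the mean.
    n_out = sum(1 for v in values if sd > 0 and abs(v - mu) > 2 * sd)
    return {"mean": mu, "variance": var, "n_outliers": n_out}


def sirs_score(temp_c, heart_rate, resp_rate, wbc):
    """Count of SIRS criteria met (0 to 4) for one hour of data.

    Simplified assumption: the PaCO2 and band-count alternatives are omitted.
    wbc is in 10^3 cells per microlitre; None means "not measured".
    """
    score = 0
    if temp_c is not None and (temp_c > 38.0 or temp_c < 36.0):
        score += 1
    if heart_rate is not None and heart_rate > 90:
        score += 1
    if resp_rate is not None and resp_rate > 20:
        score += 1
    if wbc is not None and (wbc > 12.0 or wbc < 4.0):
        score += 1
    return score


def hourly_labels(n_hours, is_sepsis, lead_hours=10):
    """Second stage: turn a per-patient yes/no decision into hourly labels.

    Assumed reading of the rule: for a positively classified case, the
    predicted onset is lead_hours before the end of the sequence.
    """
    if not is_sepsis:
        return [0] * n_hours
    onset = max(0, n_hours - lead_hours)
    return [0] * onset + [1] * (n_hours - onset)
```

In this sketch a 15-hour record classified as sepsis would receive negative labels for its first 5 hours and positive labels for its last 10, while records shorter than 10 hours would be labeled positive throughout.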