Ensemble Tree Classifier with Multi-level Augmented Features in Sepsis Onset Detection

Yanpu Li¹ and Juxian Chen²
¹Worcester Polytechnic Institute, ²Julian


Early sepsis diagnosis is essential for improving personalized treatment and reducing in-hospital mortality among patients at risk of developing sepsis. Because infection and host response are heterogeneous across sepsis patients, physicians must carefully examine a constellation of clinical signs and laboratory tests, which is time-consuming and may still fail to produce a firm consensus. To this end, we aimed to develop a reliable and efficient algorithm for sepsis onset detection. Our method consists of two steps: feature extraction and classification.

In the first step, we generated three levels of augmented features to capture the “time series”, “similarity” and “manifest” properties of the data. The first two features rest on the hypotheses that patients develop sepsis progressively over time and that patients at a similar stage of development share similar phenotypes. We therefore used a recurrent neural network (RNN) to extract sequential information from the raw data, and transformed each record into a vector so that similarity between records could be measured. The third feature addresses the high missing rate. We assumed that missing data reflect clinicians’ expert knowledge, namely that the clinician saw no need for the patient to undergo a laboratory test at that timestamp. We therefore applied convolutional and pooling layers to compress the data, down-weighting the missing values and emphasizing the non-missing ones. In the classification step, we used an ensemble of tree-based models. We applied 5-fold stratified cross-validation during training and tested on a hidden set, obtaining an AUC of 0.84, an accuracy of 0.88 and a utility score of 0.37.
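
To illustrate the intuition behind the “manifest” feature, the following is a minimal sketch of how pooling over time can compress a sparsely observed laboratory series while preserving the fact that a measurement was taken. It is not the network used in this work: the learned convolutional layers are omitted, and the function names, the window size of 3, and the example lactate values are all hypothetical choices made for illustration.

```python
import math


def observed_mask(series):
    """Return 1.0 where a value was recorded and 0.0 where it is missing.

    Missing entries are assumed to be encoded as None or NaN (an assumption;
    the encoding of missing values is not specified in the abstract).
    """
    mask = []
    for value in series:
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        mask.append(0.0 if is_missing else 1.0)
    return mask


def max_pool_1d(values, window):
    """Non-overlapping 1D max pooling; a trailing partial window is dropped."""
    return [
        max(values[start:start + window])
        for start in range(0, len(values) - window + 1, window)
    ]


# Hypothetical hourly lactate readings for one patient; most hours have no test.
lactate = [None, None, 2.1, None, None, None, None, 3.4, None]

mask = observed_mask(lactate)
pooled = max_pool_1d(mask, window=3)

# Each pooled entry is 1.0 if a test was ordered at any hour in its window,
# so nine mostly empty time steps are compressed into three informative ones.
print(pooled)
```

In this sketch a pooled value of 1.0 indicates that the clinician ordered the test at least once within the window, which is the signal the “manifest” feature is assumed to carry.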