An Algorithm for Early Detection of Sepsis Using Traditional Statistical Regression Modeling

Matthias Görges
The University of British Columbia


Sepsis is the final common pathway of many infections, whereby the body’s immune response leads to organ failure and, eventually, death. It is associated with high mortality rates and, if survived, results in significant morbidity. Early detection is imperative to improve sepsis outcomes. However, there is also a need to avoid an overly high false alarm rate, as this places an unnecessary burden on healthcare resources and contributes to significant healthcare costs. The aim of this study was to develop and evaluate a simple algorithm for early sepsis detection.

A substantial amount of data was missing from the dataset. This was addressed by forward-filling missing values and substituting population means into entirely empty columns. The initial risk factors were limited to vital signs, which are frequently measured during routine care. Further features were derived using absolute z-score normalization, the first derivative, and variable combinations, such as the shock index and estimated cardiac output. A logistic regression model was trained, and the risk score threshold was identified using the Youden cut-off from the receiver operating characteristic (ROC) curve. The model coefficients were then implemented in a risk scoring function.
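
The processing steps described above can be illustrated with a minimal Python sketch. It is not the study's implementation: all function names are hypothetical, the choice to fill leading gaps (before the first observation) with the population mean is an assumption, and the coefficients passed to the scoring function would come from a fitted model rather than from this code. The shock index is computed as heart rate divided by systolic blood pressure, and the Youden cut-off is the threshold that maximizes sensitivity + specificity - 1.

```python
import math
from typing import List, Optional, Sequence, Tuple


def forward_fill(values: Sequence[Optional[float]], population_mean: float) -> List[float]:
    """Carry the last observed value forward.

    Assumption: gaps before the first observation (and entirely empty
    columns) are filled with the population mean.
    """
    filled = []
    last = population_mean
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def abs_z_score(values: Sequence[float], mean: float, std: float) -> List[float]:
    """Absolute distance from the population mean, in standard deviations."""
    return [abs(v - mean) / std for v in values]


def first_derivative(values: Sequence[float]) -> List[float]:
    """Change from the previous time step; the first step is set to 0."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def shock_index(heart_rate: Sequence[float], systolic_bp: Sequence[float]) -> List[float]:
    """Shock index = heart rate / systolic blood pressure."""
    return [hr / sbp for hr, sbp in zip(heart_rate, systolic_bp)]


def risk_score(features: Sequence[float], coefficients: Sequence[float], intercept: float) -> float:
    """Logistic risk score from (hypothetical) fitted model coefficients."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A score >= threshold is treated as a positive prediction.
    Requires at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_t, best_j = 0.0, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / positives - fp / negatives  # TPR - FPR
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

In use, each vital sign column would be forward-filled first, the derived features computed per time step, and the resulting risk score compared against the threshold returned by `youden_threshold` on the training data.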

The candidate score demonstrated an AUROC of 0.761 (95% CI 0.750-0.772) when evaluated against the initial 5,000-sample test dataset. With a threshold of 0.0155, it had a positive predictive value of 4.1% and a negative predictive value of 99.3%. When evaluated against training set A, it had an AUROC of 0.682, an accuracy of 0.757, and a utility score of 0.227; against training set B, the results were 0.714, 0.814, and 0.185, respectively. The unofficial phase normalized utility score was 0.200.

Evaluation indicated significant potential for further optimization, including the reduction of false positives. Adding relative change features, which use the first valid value as a reference, and variable interaction terms are expected to be worthwhile directions for further investigation.